RLVR is not offline learning. It's not learning from a static dataset. These are live rollouts that are verified and that update the weights at each pass, based on feedback from the environment.
You might argue that traditional RL involves multiple states the agent moves through. But autoregressive LLMs are the same: a forward pass that generates a token also changes the state, since the new token becomes part of the context for the next pass.
After training, the weights are fixed, of course, but that is the case for most traditional RL systems too. RL does not intrinsically mean continually updating weights in deployment, which carries a bunch of problems of its own.
From the premise that RLVR can be used to benchmaxx (true!), it does not follow that benchmaxxing is all it is good for.
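To make the online/offline distinction concrete, here is a minimal TypeScript sketch of an RLVR loop (hypothetical Policy interface and names, not any particular framework): each pass samples a fresh rollout from the current weights, scores it with a verifier, and updates the weights before the next rollout, which is exactly what a static-dataset setup cannot do.

    // Hypothetical types for illustration only.
    interface Policy {
      generate(prompt: string): string;  // sample a rollout from the current weights
      update(prompt: string, rollout: string, reward: number): void;  // gradient step
    }

    // Each pass: live rollout -> verification -> weight update.
    // Because the policy changes between iterations, later rollouts come
    // from a different distribution than earlier ones: online, not offline.
    function rlvrLoop(policy: Policy, verify: (rollout: string) => number, prompts: string[]) {
      for (const prompt of prompts) {
        const rollout = policy.generate(prompt);  // fresh sample, not a dataset row
        const reward = verify(rollout);           // verifiable feedback from the environment
        policy.update(prompt, rollout, reward);   // weights move before the next rollout
      }
    }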
I don't understand this. In Supabase, the default is to turn on RLS for new tables. With RLS on and no policy set, no user can fetch anything from the table.
You have to explicitly create a read-all policy for the anon key, with no constraints, before anyone can get access to it.
The default is secure.
If you turn off RLS, there are warnings everywhere that the table is unsecured.
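Here is roughly what that looks like with supabase-js (a sketch with hypothetical table, URL, and key names):

    import { createClient } from "@supabase/supabase-js";

    // Hypothetical project URL and anon key.
    const supabase = createClient("https://xyz.supabase.co", "public-anon-key");

    // With RLS enabled (the default for new tables) and no policy defined,
    // this returns no rows for the anon key. Nothing is readable until you
    // opt in with something like:
    //   create policy "anon can read" on profiles for select to anon using (true);
    const { data, error } = await supabase.from("profiles").select("*");
    console.log(data); // [] until a read policy explicitly grants access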
The author goes on to compare this with PocketBase, which he says you "have to go out of your way" to make insecure. You have to go out of your way with Supabase as well!
I wonder if the author tested this? I do agree that some third-party website builders that use Supabase on the back end could have created insecure defaults, but that's not Supabase's fault.
A more likely reason is that Supabase is a BaaS: between the client and the DB there is no backend for secret management, so RLS is the only way to create an API directly on the DB.
I’m not sure anyone’s scared off by this. It’s more that declaring your user queries (as Meteor did, or as GraphQL works) is more intuitive than reasoning about RLS.
It’s not about being scared off; I’m simply challenging the notion that Supabase is secure by default. It depends on your definition of secure, since everyone has a different threat model, but the thread above demonstrates that a good chunk of people would probably say no, it’s not actually secure by default. Being scared off would probably be the best possible outcome compared to the current situation, which is “we don’t really have a good story to tell about whether this is secure or not”.
The fact that it takes a whole thread of conversation even to unpack whether the default approach they took is good enough is a strong signal to me that it isn’t: that level of complexity in the implementation often implies a model with an attack surface large enough to have weaknesses that can be exploited without too much effort.
Much of this is due to vastly better posttraining RL, not models that are much bigger. The idea that most of these gains come from training really big models, or from throwing immensely larger amounts of compute at the problem, is just not true.
This is a regurgitation of the old critique of history: what is its purpose? What do you use it for? What is its application?
One answer is that the study of history helps us understand that the views we hold as "obviously correct" today are as contingent on our current social norms and power structures (and their history) as the "obviously correct" views and beliefs of some point in the past.
It's hard for most people to view two mutually exclusive moral views as both "obviously correct," because we are shaped by a milieu that accepts only one of them as correct.
We look back at some point in history, and say, well, they believed these things because they were uninformed. They hadn't yet made certain discoveries, or had not yet evolved morally in some way; they had not yet witnessed the power of the atomic bomb, the horrors of chemical warfare, women's suffrage, organized labor, or widespread antibiotics and the fall of extreme infant mortality.
An LLM trained on that history - without interference from the subsequent actual path of history - gives us an interactive compression of the views from a specific point in history without the subsequent coloring by the actual events of history.
In that sense - if you believe there is any redeeming value to history at all; perhaps you do not - this is an excellent project! It's not perfect (it is only built from writings, not what people actually said) but we have no other available mass compression of the social norms of a specific time, untainted by the views of subsequent interpreters.
One thing I haven't seen anyone bring up yet in this thread is that there's a big risk of leakage. If even big image models had CSAM sneak into their training material, how can we trust that data from our own time hasn't snuck into these historical models?
I've used Google Books a lot in the past, and Google's time-filtering feature in searches too. Not to mention Spotify's search features targeting date of production. All had huge temporal mislabeling problems.
This is also one of our fears. What we've done so far is to drop docs where the datasource was doubtful about the date of publication; if there are multiple possible dates, we take the latest to be conservative. During training, we validate that the model learns pre- but not post-cutoff facts. https://github.com/DGoettlich/history-llms/blob/main/ranke-4...
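In sketch form, the filtering rule is something like this (hypothetical field names, simplified from the actual pipeline):

    // A doc as it might come out of a datasource.
    interface Doc {
      text: string;
      candidateDates: number[];  // possible publication years
      dateIsDoubtful: boolean;   // datasource flagged the date as uncertain
    }

    // Drop docs with doubtful dates; when several dates are possible,
    // assume the latest, so a doc is kept only if even its most recent
    // candidate date falls before the training cutoff.
    function filterPreCutoff(docs: Doc[], cutoffYear: number): Doc[] {
      return docs
        .filter((d) => !d.dateIsDoubtful && d.candidateDates.length > 0)
        .filter((d) => Math.max(...d.candidateDates) < cutoffYear);
    }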
If you have other ideas or think that's not enough, I'd be curious to know! (history-llms@econ.uzh.ch)
> This is a regurgitation of the old critique of history: what is its purpose? What do you use it for? What is its application?
Feeling a bit defensive? That is not at all my point; I value history highly and read it regularly. I care about it, thus my questions:
> gives us an interactive compression of the views from a specific point in history without the subsequent coloring by the actual events of history.
What validity does this 'compression' have? What is the definition of a 'compression'? For example, I could create random statistics or verbiage from the data; why would that be any better or worse than this 'compression'?
Interactivity seems to be a negative: it's fun, but it seems to highly distort the information output from the data, and it omits the most valuable parts (unless we luckily stumble across them). I'd much rather have a systematic presentation of the data.
These critiques are not the end of the line; they are a step in innovation, which of course raises challenging questions and, if successful, adapts to the problems. But we still need to grapple with them.
If your position is that brains are not actually bound by the laws of physics -- that they operate on some other plane of existence unbound by any scientifically tested principle -- then it is not only your ideological opposites who have quasi-religious faith in a thing not fully comprehended.
My "position" isn't remotely that. The problem with "brains are bound by the laws of physics" isn't that there's something special about brains. It's that physics doesn't consist of "laws" that things are "bound" by. It consists of theories that attempt to describe.
These theories are enormously successful, but they are also known to be variously incomplete, inconsistent, non-deterministic, philosophically problematic, open to multiple interpretations and only partially understood in their implications, with links between descriptions of things at different scales a particularly challenging and little understood topic. The more you learn about physics (and while I'm no physicist, I have a degree in the subject and have learned a great deal more since) the more you understand the limits of what we know.
Anybody who thinks there's no mystery to physics just doesn't know much about it. Anybody who confidently asserts as fact things like "the brain consists of protons, neutrons and electrons, so it's impossible for it to do anything a computer can't do" is deducing things from their own ignorance.
Read the full post. Partway down you will see they agree with you that getting an API key is not hard.
Paying is hard. And setting it up is confusing: you have to create a Vertex billing account, go through a cumbersome process to connect your AIStudio to it, and bring over a "project", which then disconnects all the time and has to be re-selected to use Nano Banana Pro or Gemini 3. It's a very bad process.
It's easy to miss this because they are very generous with the free tier, but Gemini 3 is not free.
Serving a subpoena in this manner is for publicity, not process.
In this case the public defender is issuing a subpoena for records related to the trespassing case. It is directed at OpenAI, not Sam Altman; they could serve any reasonably senior person in the company, and it can also be done by certified mail.
> Serving a subpoena in this manner is for publicity, not process.
Agreed - but considering all of the copyright-holding companies that are pissed at OpenAI right now, it's easy to understand the message being sent by doing this in such a public manner.
Cultural antibodies take a long time to develop. In twenty years you will see more common resistance to what's being produced today, but less to whatever new innovation is released then.
See, for example, the slowly declining efficacy of banner ads, as each cohort of computer users learned to ignore them while they retained efficacy on newer vintages of users.
It's nice because we can just put the JSONata expression into a db field, so you can have arbitrary data transforms for different customers and different data structures, coming or going, set up just by editing the expression via the site, without having to worry about sandboxing it (other than resource exhaustion from recursive loops). It really sped up the iteration process for configuring transforms.
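For illustration, the pattern looks something like this with the jsonata npm package (hypothetical expression and payload; note that evaluate returns a promise in jsonata 2.x):

    import jsonata from "jsonata";

    // A per-customer transform, as it might be stored in a DB column.
    const storedExpression = '{ "id": order_id, "total": $sum(lines.price) }';

    // A hypothetical inbound payload for that customer.
    const payload = { order_id: "A-1001", lines: [{ price: 10 }, { price: 15 }] };

    // Compile the expression pulled from the DB and apply it to the payload.
    // Editing the DB field changes the transform: no redeploy, and no code
    // sandboxing, since JSONata is a declarative query/transform language.
    const expr = jsonata(storedExpression);
    const result = await expr.evaluate(payload);
    console.log(result); // { id: "A-1001", total: 25 }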