Hacker News | hb-robo's comments

That's quite interesting. This is way outside of my wheelhouse - has this kind of approach been tried in other security contexts before? What would you even call that, virtualization?


The word is "bytecode" and the idea is as old as computing.


Java.


I don't really get Mass in general. On paper it seems like it should be quite a bit nicer than it is in reality. I live in the Hudson Valley and feel like it would be a downgrade in a few ways. And for being a deep blue state they are pretty far behind Cali/NY/Wash/Ill on things like food safety and health guidelines.


>pretty far behind Cali/NY/Wash/Ill on things like food safety and health guidelines.

I know that state and local governments are responsible for inspecting restaurants. I also know that NYC banned trans fats in restaurant food. Is that what you mean by "food safety and health guidelines"?


It's politicized by default when it stands in opposition to the most blatantly ideologically slanted social media site in existence this side of Gab and Truth Social.


Considering everyone NOT in a specific ideological umbrella has been fleeing X en masse, yeah it's obviously sensible that alternative ideologies would be more present by default.


Layman question here since this isn't my field: how do you achieve success on closed-system tasks without supervision? Surely at some point along the way, the system must understand whether their answers and reasoning are correct.


In their paper, they explain that "in the case of math problems with deterministic results, the model is required to provide the final answer in a specified format (e.g., within a box), enabling reliable rule-based verification of correctness. Similarly, for LeetCode problems, a compiler can be used to generate feedback based on predefined test cases."

Basically, they have an external source-of-truth that verifies whether the model's answers are correct or not.
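To make the "rule-based verification" idea concrete, here is a minimal sketch of what such a checker could look like for boxed math answers. The function names `extract_boxed` and `math_reward` are hypothetical, as is the exact-string comparison; the paper only says the answer must appear in a specified format, so treat this as an illustration and not their implementation.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are inside the opening brace of \boxed{
    out = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out).strip()
        out.append(ch)
        i += 1
    return None  # braces never closed


def math_reward(response: str, reference: str) -> float:
    """Hypothetical reward: 1.0 if the boxed answer matches the reference, else 0.0."""
    answer = extract_boxed(response)
    if answer is None:
        return 0.0  # wrong format earns no reward
    same = answer.replace(" ", "") == reference.replace(" ", "")
    return 1.0 if same else 0.0


print(math_reward("Adding them up gives \\boxed{42}.", "42"))
print(math_reward("I think it's 42.", "42"))
```

Note that the checker never looks at the reasoning, only the final boxed answer. A real verifier would probably also need to normalize equivalent forms (e.g. `0.5` vs `\frac{1}{2}`), which this sketch does not attempt.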


You're totally right there must be supervision; it's just a matter of how the term is used.

"Supervised learning" for LLMs generally means the system sees a full response (e.g. from a human expert) as supervision.

Reinforcement learning is a much weaker signal: the system has the freedom to construct its own response / reasoning, and only gets feedback at the end whether it was correct. This is a much harder task, especially if you start with a weak model. RL training can potentially struggle in the dark for an exponentially long period before stumbling on any reward at all, which is why you'd often start with a supervised learning phase to at least get the model in the right neighborhood.
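A toy calculation shows why the wait can be exponential. Assume (purely for illustration, this is not from any paper) that a response needs `n_steps` correct steps, each chosen correctly and independently with probability `p_step`, and that reward only arrives if every step is right. The function name `expected_attempts` is made up for this sketch.

```python
def expected_attempts(p_step: float, n_steps: int) -> float:
    """Expected number of sampled responses before the first reward.

    Toy model: all n_steps must be correct, each independently with
    probability p_step, so one attempt succeeds with p_step ** n_steps.
    """
    p_success = p_step ** n_steps
    return 1.0 / p_success


# Weak model: coin-flip accuracy per step over 10 steps.
print(expected_attempts(0.5, 10))  # 1024.0
# Model already "in the right neighborhood" after supervised training.
print(expected_attempts(0.9, 10))
```

Under these assumptions the weak model needs about a thousand attempts per reward at 10 steps, and the count doubles with every extra step, while the 0.9-per-step model needs roughly three. Real models don't make independent per-step choices, so this only illustrates the scaling.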


They use other models to judge correctness and, when possible, just ask the model to output something that can be directly verified, like math equations that can be checked 1:1 against the correct answer.


It's funny, the calculators were incredibly politicized when I was growing up (TI84 generation, so kids were getting caught programming functions to solve exam questions) but GPS was just taken as a given.


Same here. I never really consciously saw it as "defiance" against cognitive decline or anything. More to the point, the answers are much better on average.


They can't afford not to accept it, honestly. They need to work so they don't die.

"A 2023 survey conducted by Payroll.org highlighted that 78% of Americans live paycheck to paycheck"

"71.93% of Americans Living Paycheck to Paycheck Have $2,000 or Less in Savings"

https://www.forbes.com/advisor/banking/living-paycheck-to-pa...


Incredible stuff, really.


> This is the overwhelmingly main reason why Tiktok is getting banned.

Because people are writing Orwell fanfiction?

