Hacker Newsnew | past | comments | ask | show | jobs | submit | sanjams's commentslogin

I agree, but perhaps OP is suggesting that the hand-crafted data can be generated in a more transparent way. For example, via a script/tool that itself can be reviewed.


Could have, absolutely.

Should it never have been in any commit at all, which is basically what it would take to prevent this case? Almost definitely not. Binary test data is normal, and requiring all data to be generated just means extremely complicated generators for precise trigger conditions... where you can still hide malicious data, you just have to obfuscate it further. That does raise the difficulty, which is a good thing, but it does not make the attack impossible.

I completely agree that it's a good/best practice, but hard-requiring it everywhere has significant costs for all the (overwhelmingly more common) legitimate cases.


It would be reasonable, though, to require that error-case data be thoroughly explained, and it must be explainable: otherwise, what are you testing, and why does the test exist?

The xz exploit depended on that explanation being absent but accepted as necessary for unstated reasons.

Whereas it's entirely reasonable to have a test that says something like "simulate an error where the header is corrupted with early nulls for the decoding logic" - i.e. an explanation - plus a generator which flips the targeted bits to their values.

Sure, you _could_ still try inserting an exploit, but now changes to the code also have to surface plausible data changes inline with the thing they claim is being tested.

I wouldn't even regard that as a lot of work: why would a test like that exist, if not because someone has an explanation for the thing they want to test?
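As a minimal sketch of the "explained generator" idea (the function name, byte positions, and file format here are all made up for illustration, not taken from xz): the test derives its corrupted input from a valid one via a documented mutation, rather than committing an opaque blob.

```python
# Hypothetical sketch: generate error-case test data from valid data
# plus a documented mutation, instead of checking in an opaque blob.

def corrupt_header_with_early_nulls(valid_data: bytes, positions=(4, 5, 6)) -> bytes:
    """Simulate an error where the header is corrupted with early NUL
    bytes, to exercise the decoder's error path. The positions are
    illustrative, not from any real format."""
    mutated = bytearray(valid_data)
    for i in positions:
        mutated[i] = 0x00
    return bytes(mutated)

valid = b"MAGIC\x01\x02\x03payload"
corrupted = corrupt_header_with_early_nulls(valid)
assert corrupted != valid
assert corrupted[4:7] == b"\x00\x00\x00"
```

The point is that the mutation itself is reviewable: a diff to this generator has to say, in code, exactly which bytes it touches and why.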


The article references a technical write-up: https://research.swtch.com/xz-script


ah, yes, this is one I remember seeing early on! thank you! I couldn't find much past the blogspam this time :/


> Infrastructure algorithm optimization

> Novel training frameworks

Where can one find more information about these? I keep seeing hand-wavy language like this w.r.t. DeepSeek’s innovation


I think you haven't been looking too hard in that case. Here is the R1 paper: https://arxiv.org/abs/2501.12948

You can find more papers from the attached author: https://arxiv.org/search/cs?searchtype=author&query=DeepSeek... or title https://arxiv.org/search/?query=DeepSeek&searchtype=title&ab... and go through citations for more.

Of course, you could just search by some of the attached authors as well. Daya Guo, the lead author for the R1 paper has 36 papers on Arxiv: https://arxiv.org/search/cs?query=Guo%2C+Daya&searchtype=aut...

Besides the papers, DeepSeek has an active Github https://github.com/deepseek-ai and https://huggingface.co/deepseek-ai


I have read the R1 paper. My observation is that it contains no information whatsoever about how they overcome the limitations of the H800 compared to the H100, which is what the parent article is about. That's the piece I'm curious about.

I will concede that I have not read all their papers or looked through their code, but that's why I asked the question: I hoped someone here might be able to point me to specific places in specific papers instead of an arXiv search.


Give Section 3 of the DeepSeek-V3 paper a read. They discuss their HAI-LLM framework and give a pretty in-depth description of their DualPipe algorithm and how its pipeline bubbles compare to other scheduling approaches. They also describe how they work around NVLink limits, plus tons of other optimizations, in extreme depth. The section is 10 pages long and relatively dense - not fluff!
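As a rough back-of-the-envelope illustration of why pipeline scheduling matters (this is the textbook bubble formula for a simple synchronous pipeline, not DualPipe itself): with p stages and m microbatches, the idle "bubble" fraction is (p - 1) / (m + p - 1), and schedulers like DualPipe aim to shrink or hide that idle time by overlapping computation and communication.

```python
# Classic pipeline-parallelism bubble fraction for a simple synchronous
# schedule: p stages, m microbatches. Not DeepSeek's actual algorithm,
# just the baseline quantity such schedulers try to reduce.

def bubble_fraction(stages: int, microbatches: int) -> float:
    return (stages - 1) / (microbatches + stages - 1)

# More microbatches amortize the pipeline fill/drain bubble:
assert bubble_fraction(8, 8) == 7 / 15
assert bubble_fraction(8, 64) < bubble_fraction(8, 8)
```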


Their paper goes into the details: https://arxiv.org/abs/2501.12948


They wrote a paper. As far as I can tell they applied a smørrebrødsbord approach and that led to the results they got.


FWIW, I think you meant "smörgåsbord", which is basically Swedish-style tapas - a mix of many different dishes. Smørrebrød is a Danish type of open sandwich; I'm guessing "smørrebrødsbord" would be "a table of smørrebrød", but I'm not sure how common that word is, as I'm not Danish :)


So then OP is correct? Your comment confirms the same sentiment about the tradeoff API users make: cheaper inference means you pay with your data.

Sure, DeepSeek may publish their weights so you don't have to use the API, but the point still stands for the API.


It's a matter of degree. If 90% of the cost savings come from a new, smarter architecture, it doesn't make sense to point to the API terms as the reason it's so cheap.


AWS has a large data center near Portland... (And Washington DC for that matter)


So is the euro not a success either? Or any other currency in the world for that matter?


That's completely unrelated. The Euro wasn't designed as an alternative to make USD obsolete. The Euro did do exactly what it was designed for, to unify much of Europe under one currency and economic system.


The value of a euro isn’t how many dollars you can buy with it. It is what goods and services you can buy with it, directly. That’s the point.


Why wouldn’t we want people to continue iterating on designs? Should all the world’s designers just start trying to become manufacturing planners? Of course not. This would be a terrible use of their skill set. But more than that, why in the world are you criticizing people who are just trying to help? At a minimum these people are helping to inspire hope.


Ha. I did this exact same thing for a project in college using EchoNest and linear regression. In the end, we were unable to find a single statistically significant coefficient and had to change our project completely. Kudos to your team for finding something there.


I also did something similar in college but, due to similar issues, pivoted to genre classification with extracted audio features. With that I was actually able to get a pretty accurate classifier going.
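For flavor, here's a toy sketch of that kind of setup (the features, values, and genres are invented; real projects would use extracted features like tempo or spectral centroid and a proper model): a nearest-centroid rule over labeled feature vectors.

```python
# Toy genre classifier on made-up (tempo, brightness) feature vectors
# using a nearest-centroid rule - purely illustrative.

def train_centroids(samples):
    """samples: list of (feature_vector, genre) pairs -> genre centroids."""
    sums, counts = {}, {}
    for vec, genre in samples:
        acc = sums.setdefault(genre, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[genre] = counts.get(genre, 0) + 1
    return {g: [v / counts[g] for v in acc] for g, acc in sums.items()}

def classify(centroids, vec):
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda g: dist2(centroids[g], vec))

training = [
    ([120.0, 0.8], "rock"), ([125.0, 0.9], "rock"),
    ([90.0, 0.3], "jazz"), ([85.0, 0.2], "jazz"),
]
centroids = train_centroids(training)
assert classify(centroids, [122.0, 0.85]) == "rock"
assert classify(centroids, [88.0, 0.25]) == "jazz"
```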


Cool article. I was fortunate enough to experience some places similar to these first hand during a visit to Tokyo. One of the most impressive things to me was the bartenders’ uncanny ability to pick the perfect next song. In a sea of old, unidentifiable records, they were able to pick out exactly which record and subsequent song they wanted. And the timing of their transitions between songs...incredible. All while serving you drinks.


That sounds amazing! I'm taking a trip to Japan in the coming months. Would you mind sharing the names of any of the places?


Some engineers at Mozilla said the same thing about their certs that expired yesterday.


Haha, that's why I updated the URL.

