The US has generally resorted to propaganda rather than addressing the self-inflicted structural conditions responsible for the erosion of our dominance. China also conducted a broad, sustained, large-scale campaign of IP theft across almost every industry.
Obviously there is no natural law preventing China from innovating (we have treated political liberalism as a prerequisite to innovation in a way that was always partly self-congratulatory), but it's also obviously true that the speed of the gap closure is due in significant part to theft.
That doesn't change the fact that they are now a legitimate competitor who has gotten a lot of things right (and among these, some things that we get very wrong) and probably actually leads in some areas.
I like this take a lot and agree with it. The US has been asleep at the wheel for too long in many areas, power generation among them. China has no doubt conducted very deep and sustained espionage campaigns, and even with LLMs there is enough evidence that most of the initial gains came from training on Western models. Again, no complaints here, but I think it’s important to acknowledge both, which can be true at the same time.
Yes, but for most uses that is irrelevant. Most of the complaints are not about them not being top-level writers, but that they stand out negatively from human writing by relying on a bunch of bad tropes and stereotypical language use.
Maybe we shouldn't use it to write novels if we can't push it well beyond average, but you don't need to get it to produce anything more than pretty much average or a little bit better for it to be good enough in competition with average humans.
The math is obvious on this one. It's super well-documented that model performance on complex tasks scales (to some asymptote) with the amount of inference-time compute allocated.
LLM providers must dynamically scale inference-time compute based on current load because they have limited compute. Thus it's impossible for traffic spikes _not_ to cause some degradation in model performance (at least until/unless they acquire enough compute to saturate that asymptotic curve for every request under all demand conditions -- it does not seem plausible that they are anywhere close to this).
Umm. I run multiple benchmarks using APIs for my work, and the inference-time compute allotted has a clear correlation with the metrics. Time of day certainly doesn't. If it is that straightforward, people could prove it very easily rather than relying on anecdotes.
They either overprovision the server during low demand or they might dynamically provision servers based on load.
Yes, every time I see some variant of this come up (and believe me, this has been coming up since before the GPT3.5 days) there’s never any actual data demonstrating that it’s the case. As you say, it should be completely trivial to run the exact same prompt multiple times per day and capture the output to demonstrate this.
But no one ever seems to do that; they are content to “feel” that this is the case instead.
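For anyone who wants to actually test this, here is a minimal sketch in Rust of the analysis side: bucket benchmark scores by hour of day and compare the means. The `mean_score_by_hour` helper and the sample numbers are made up for illustration; collecting real (hour, score) pairs from a provider's API is left out entirely.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: average benchmark score for each hour of the day.
/// `samples` holds (hour_of_day, score) pairs collected by rerunning the
/// exact same prompts at different times.
fn mean_score_by_hour(samples: &[(u8, f64)]) -> BTreeMap<u8, f64> {
    // Accumulate (sum, count) per hour.
    let mut totals: BTreeMap<u8, (f64, u32)> = BTreeMap::new();
    for &(hour, score) in samples {
        let entry = totals.entry(hour).or_insert((0.0, 0));
        entry.0 += score;
        entry.1 += 1;
    }
    // Turn each (sum, count) into a mean.
    totals
        .into_iter()
        .map(|(hour, (sum, count))| (hour, sum / f64::from(count)))
        .collect()
}

fn main() {
    // Illustrative numbers only, not real measurements.
    let samples: [(u8, f64); 3] = [(9, 0.5), (9, 1.0), (21, 0.75)];
    for (hour, mean) in mean_score_by_hour(&samples) {
        println!("hour {hour:02}: mean score {mean:.2}");
    }
}
```

If the per-hour means differ by more than the run-to-run noise across enough samples, that would be actual evidence; if not, the "feeling" is probably just that.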
For what it's worth, I have lived in, and currently spend a lot of time in, both places. You're both very obviously wrong.
There is a serious problem in the US. There is also a serious (though different) problem in the UK. The problem in the US is the chilling effect of the vindictiveness and lawlessness of the current regime. I will not elaborate on this because it's too complicated to communicate effectively in a forum post.
The problem in the UK is a set of vaguely and arbitrarily specified-and-enforced laws that enable the criminalization of "grossly offensive" speech. There is no statutory definition of what constitutes a "grossly offensive" communication -- all enforcement is arbitrary and thus can be abused. Whether it is actually abused in any widespread fashion is irrelevant.
- Communications Act 2003 (Section 127): Makes it an offense to send messages via public electronic networks (internet, phone, social media) that are "grossly offensive," indecent, obscene, or menacing, or to cause annoyance/anxiety.
- Malicious Communications Act 1988 (Section 1): Applies to sending letters or electronic communications with the purpose of causing distress or anxiety, containing indecent or grossly offensive content.
I'm still not quite sure how UK law impacts the US. I was hoping for explicit examples of someone actually being removed from power because they were critical of the president. I think that would be pretty big news and the closest I have heard was one of the ex-military standing congresspeople being threatened with reduced military benefits, or legal action, but not actually anyone being removed from a position.
Another (higher-profile) example is the baseless threats of criminal indictments against Jerome Powell -- it is impossible to argue that these threats have been made for any reason other than that he, as a nonpartisan official, defied the president's demands to execute his duties as Fed chair in such a way (that is, poorly) as to put a temporary thumb on the scale for the current admin.
The more important question, I think, is how many folk in explicitly nonpartisan functions are choosing not to break step with the current admin for fear of some sort of (likely professional) reprisal. I'm not alleging that they're disappearing dissenters or anything that inflammatory, but it would be intellectually dishonest to contend that there isn't a long, well-documented trail of malfeasance here.
The sycophancy is obviously intentional. People are vulnerable to it, and addiction is profitable. It has nothing to do with the nature of LLMs and everything to do with user engagement metrics.
You can certainly do it with RAII. However, what if a language lacks RAII because it prioritizes explicit code execution? Or simply wants to retain simple C semantics?
Because that is the context. It is the constraint that C3, C, Odin, Zig etc. maintain, where RAII is out of the question.
Ok then I understand what you mean (I couldn't respond directly to your answer, maybe there is a limit to nesting in HN?).
Let me respond in some more detail then to at least answer why C3 doesn't have RAII: it tries to follow the principle that data is inert. That is, data doesn't have behaviour in itself, but is acted on by functions. (Even though C3 has methods, they are more of a namespacing detail that lets you create methods which derive data from the value, or mutate it. They are not intended as organizational units.)
To simplify what the goal is: data should be possible to create or destroy in bulk, without executing code for each individual element. If you create 10000 objects in a single allocation it should be as cheap to free (or create) as a single object.
We can imagine things built into the type system, but then we will need these unsafe constructs where a type is converted from its "unsafe" creation to its "managed" type.
I did look at various cheap ways of doing this through the type system, but it stopped resembling C and seemed to put the focus on resource management rather than the problem at hand.
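To make the "inert data, bulk create/destroy" goal concrete, here is a rough sketch. It is written in Rust rather than C3 purely for illustration, and the `Point` type and `make_points` helper are hypothetical. Because `Point` owns nothing and has no destructor, releasing 10000 of them is one deallocation with no per-element code, which `std::mem::needs_drop` confirms.

```rust
use std::mem::needs_drop;

/// Inert data: plain fields, nothing owned, no destructor.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    x: f32,
    y: f32,
}

/// Hypothetical helper: create `n` points in one contiguous allocation.
fn make_points(n: usize) -> Vec<Point> {
    vec![Point { x: 0.0, y: 0.0 }; n]
}

fn main() {
    let points = make_points(10_000);
    assert_eq!(points.len(), 10_000);

    // No code has to run per element when the buffer is released.
    assert!(!needs_drop::<Point>());
    // By contrast, elements that own resources need individual cleanup.
    assert!(needs_drop::<String>());

    // Freeing the whole batch is a single deallocation.
    drop(points);
}
```

Once an element type carries its own cleanup behaviour (the `String` case), the bulk free turns back into a loop over every element, which is the cost the inert-data rule is meant to avoid.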
The idea is, you could have a language like Rust, but with linear rather than affine types. Such a language would have RAII-like idioms, but no implicit destructors; instead, it'd be a compile-time error to have a non-Copy local variable whose value is not always moved out of it before its scope ends (i.e., to write code that in Rust could include an implicit destructor call). So you would have explicit deallocation functions like in C, but unlike in C you could not have resource leaks from forgetting to call them, because the compiler would not let you.
To the extent that you subscribe to a principle like "invisible function calls are never okay", this solves that without undermining Rust's safety story more broadly. I have no idea whether proponents of "better C" type languages have this as their core rationale; I personally don't see the appeal of that flavor of language design.
It is about types that can't be copied and can't go out of scope, and the only way to destroy them is to call one of their destructors. This is compile time checkable.
In theory they can solve a lot of problems easily, mainly resource management. They also generalize C++'s RAII and are similar to Rust's ownership.
In practice they haven't got support in any mainstream programming language yet.
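Since Rust's types are affine rather than linear, it cannot reject a forgotten cleanup at compile time. As a rough approximation only, here is a sketch of the idea using a hypothetical `Resource` type: the only valid way to dispose of it is the explicit, consuming `close`, and a `Drop` guard panics at runtime if a value reaches the end of its scope unconsumed. A truly linear language would turn that panic into a compile error.

```rust
use std::mem;

/// Hypothetical resource that must be closed explicitly.
struct Resource {
    id: u32,
}

impl Resource {
    fn open(id: u32) -> Resource {
        Resource { id }
    }

    /// Explicit "destructor": consumes the value so it cannot be reused.
    fn close(self) -> u32 {
        let id = self.id;
        // Skip the Drop guard below; the value was disposed of properly.
        mem::forget(self);
        id
    }
}

impl Drop for Resource {
    /// Runtime stand-in for the compile-time check a linear type
    /// system would perform.
    fn drop(&mut self) {
        panic!("Resource {} went out of scope without close()", self.id);
    }
}

fn main() {
    let r = Resource::open(7);
    println!("closed resource {}", r.close());

    // Forgetting close() trips the guard at scope exit.
    let leaked = std::panic::catch_unwind(|| {
        let _r = Resource::open(8);
    });
    assert!(leaked.is_err());
}
```

There are still no invisible function calls on the happy path here: every cleanup is a `close()` you can see in the source.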
I'd keep in mind that internet usage in '96 (I was there) bears no resemblance whatsoever to internet usage today. The level of predatory sophistication of today's attention economy makes any sort of comparison between the two misguided at best.
Yes, but complaints about my generation sitting in front of computers were not that much different from my generation's complaints now about the next generation being on social media.
As opposed to taking like 30 seconds to install cargo and rust?
I get that the elegant thing to do would be to bootstrap this, but in practice does this actually cost you anything, or is this a purely aesthetic concern?
> As opposed to taking like 30 seconds to install cargo and rust?
I think you're oblivious to the problem domain. C and C++ projects are tightly coupled with build systems. If you are not smack in the middle of the happy path, you will experience problems. Having to onboard an external language and obscure toolset just to be able to start a hello world is somewhere between a hard sell and an automatic rejection.
I recently tried Cursor for about a week and I was disappointed. It was useful for generating code that someone else has definitely written before (boilerplate etc), but any time I tried to do something nontrivial, it failed no matter how much poking, prodding, and thoughtful prompting I tried.
Even when I asked it for stuff like refactoring a relatively simple Rust file to be more idiomatic or organized, it consistently generated code that did not compile and was unable to fix the compile errors after 5 or 6 reprompts.
For what it's worth, a lot of SWE work is technically trivial -- it makes this much quicker so there's obviously some value there, but if we're comparing it to a pair programmer, I would definitely fire a dev who had this sort of extremely limited complexity ceiling.
It really feels to me (just vibes, obviously not scientific) like it is good at interpolating between things in its training set, but is not really able to do anything more than that. Presumably this will get better over time.
If you asked a junior developer to refactor a rust program to be more idiomatic, how long would you expect that to take? Would you expect the work to compile on the first try?
I love Cline and Copilot. If you carefully specify your task, provide context for uncommon APIs, and keep the scope limited, then the results are often very good. It’s code completion for whole classes and methods or whole utility scripts for common use cases.
"If you asked a junior developer to refactor a rust program to be more idiomatic, how long would you expect that to take? Would you expect the work to compile on the first try?"
The purpose of giving that task to a junior dev isn't to get the task done, it's to teach them -- I will almost always be at least an order of magnitude faster than a junior for any given task. I don't expect juniors to be similarly productive to me, I expect them to learn.
The parent comment also referred to a 'competent pair programmer', not a junior dev.
My point was that for the tasks where I wanted to use the LLM, frequently no amount of specificity could help the model solve them -- I tried for a long time, and if the task wasn't obvious to me, the model generally could not solve it. I'd end up in a game of trying to do nondeterministic/fuzzy programming in English instead of just writing some code to solve the problem.
Again I agree that there is significant value here, because there is a ton of SWE work that is technically trivial, boring, and just eats up time. It's also super helpful as a natural-language info-lookup interface.
I (like a very large plurality, maybe even a majority, of devs) do not work for a consulting firm. There is no client.
I've done consulting work in the past, though. Any leader who does not take into account (at least to some degree) relative educational value of assignments when staffing projects is invariably a bad leader.
All work is training for a junior. In this context, the idea that you can't ethically train a junior "on a client's dime" is exactly equivalent to saying that you can't ever ethically staff juniors on a consulting project -- that's a ridiculous notion. The work is going to get done, but a junior obviously isn't going to be as fast as I am at any task.
What matters here is the communication overhead, not how long it takes between responses. If I’m indefinitely spending more time handholding a jr dev than they save me, eventually I just fire em, same with code gen.
A big difference is that the jr. dev is learning compared to the AI who is stuck at whatever competence was baked in from the factory. You might be more patient with the jr if you saw positive signs that the handholding was paying off.
That was my point, though I may not have been clear.
Most people do get better over time, but for those who don’t (or LLMs) it’s just a question of whether their current skills are a net benefit.
I do expect future AI to improve. My expectation is it’s going to be a long slow slog just like with self driving cars etc, but novel approaches regularly turn extremely difficult problems into seemingly trivial exercises.