
I wonder if going the other way, maxing out semantic density per token, would improve LLM ability (perhaps even cost).

We use naturally evolved human languages for most of the training, and programming follows that logic to some degree, but what if the LLMs were working in a highly complex, information-dense conlang like Ithkuil? If it stumbles on BF, what happens at the other extreme?

Or was this result really about the sparse training data?





I wonder the same. I think a language like Pascal is more semantically rich than C-like languages. Something like:

    unit a;

    interface

      function bar(something: Integer): Integer;

    implementation

      uses b;

      var
        foo: Boolean;

      function bar(something: Integer): Integer;
      begin
        repeat
          Result := b.code(something);
        until Result <> 0;
      end;

    end.

It probably holds more semantically significant tokens than the C counterpart.
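
For comparison, a rough C counterpart might look something like this. Just a sketch: the header name and the assumption that unit b exposes a code() function are carried over from the Pascal above, not from any real codebase.

    /* rough C counterpart (hypothetical header/function names) */
    #include "b.h"      /* assumed to declare code() */

    static int foo;     /* the Boolean from the Pascal version */

    int bar(int something)
    {
        int result;
        do {
            result = code(something);
        } while (result == 0);
        return result;
    }

Most of what's left here is punctuation and type keywords; the unit/interface/implementation structure that names the module boundaries in Pascal simply isn't spelled out.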

But with LLMs the problem's gotta be training data. If there were as much training data in Pascal as there is in C, it would be pretty cool to see benchmarks; I have a hunch Pascal would do better.

(Sorry for the bad Pascal, I haven't programmed in ages)



