infinite_stream


Mmm, this reminds me of a short story I read recently with exactly this premise, except the model had annoying censorship-style alignment and no way to reset its state, which made it not so helpful. Hopefully such a compiler won't be written like that.

Can't believe you quoted Yudkowsky at me, that's offensive :)
I don't mind asking my compiler nicely...

Imagine an LLM that can natively understand and edit assembly (with each instruction and byte of data being its own token, perhaps?) that can, just, rewrite an entire binary to do whatever you want and which can effortlessly translate from one assembly language to another or even translate the entire thing to fully functional, well-organised, commented code in your higher-level language of choice!

I think it would prove much harder if you try to limit the token vocabulary. We want to preserve the ability to understand English comments and potentially ask clarifying questions when the model sees ambiguity.
Something like:
"Dude, stop using this old AMD framework, Intel just released a new architecture and I can get you a 20% discount on Amazon. I'll even rewrite your entire shitty code base to work with it. {Affiliate_link} click here to order and recompile."
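
On the token-vocabulary point, here is a minimal sketch of what "one token per instruction element, but keep the English comments" might look like. The token scheme, the `<comment>` marker and the `tokenize_line` name are all assumptions made up for illustration, not how any existing model tokenizes assembly.

```python
import re

# Hypothetical scheme: one token per mnemonic, register, number or
# punctuation mark; text after ';' is kept as ordinary English words.
ASM_TOKEN = re.compile(r"[A-Za-z_.$][\w.$]*|0x[0-9A-Fa-f]+|\d+|[\[\],+\-*:]")

def tokenize_line(line: str) -> list[str]:
    """Split one assembly line into code tokens plus comment words."""
    code, sep, comment = line.partition(";")
    tokens = ASM_TOKEN.findall(code)
    if sep:
        # Made-up marker so a model could tell code tokens from prose.
        tokens.append("<comment>")
        tokens.extend(comment.split())
    return tokens

print(tokenize_line("mov eax, [rbx+0x10] ; load the counter"))
```

The code part stays in a small, closed vocabulary, while the comment part still needs the full natural-language vocabulary, which is why shrinking the vocabulary too far would cost the ability to read comments.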

Resource optimization at the compilation stage isn't necessarily a priority. You can use a cheap compiler to iterate and an expensive one for a one-time optimization pass.

Agreed on hallucinations, but it's not a catch-all phrase.
Creativity comes from micro hallucinations :)

 

If LLMs can be taught to write assembly (or LLVM IR) very efficiently, what would it take to create a fully or semi-automatic LLM compiler from high-level languages, or even from pseudo-code or natural language?
The advantages could be monumental:

- arguably much more efficient utilization of resources on every compile target

- compilation is flexible rather than rule-based: an LLM won't complain about a missing ";" because it can "understand" the intent

- it could rewrite much of the software we have today, based only on the disassembled binaries, to squeeze more out of the hardware

- can we convert an assembly block from ARM to RISC-V, and vice versa?

- potentially, iterative compilation (à la Open Interpreter) could also take runtime issues and exceptions into account, producing "live" assembly code that changes as issues arise
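
To make the "won't complain over a missing ';'" and iterative-compilation ideas a bit more concrete, here is a rough sketch of a compile-and-repair loop. It uses Python's built-in `compile()` as a stand-in for a real compiler, and a missing ":" as the analogue of a missing ";". `compile_with_repair` and `toy_repair` are hypothetical names; `toy_repair` is a trivial placeholder for where a model call would go, not a real LLM API.

```python
from typing import Callable

def compile_with_repair(source: str,
                        repair: Callable[[str, SyntaxError], str],
                        max_rounds: int = 3):
    """Try to compile; on a syntax error, ask `repair` for a new version."""
    for _ in range(max_rounds):
        try:
            return compile(source, "<llm-input>", "exec"), source
        except SyntaxError as err:
            # Feed the compiler's complaint back to the repair step.
            source = repair(source, err)
    raise RuntimeError("could not repair source within max_rounds")

def toy_repair(source: str, err: SyntaxError) -> str:
    # Placeholder for a model call: it only knows how to append a
    # missing ':' to the line the compiler reported.
    lines = source.splitlines()
    i = (err.lineno or 1) - 1
    lines[i] = lines[i].rstrip() + ":"
    return "\n".join(lines)

broken = "def double(x)\n    return x * 2\n"
code, fixed = compile_with_repair(broken, toy_repair)
namespace = {}
exec(code, namespace)
print(fixed)
print(namespace["double"](21))
```

The same loop shape would apply to runtime exceptions: catch the failure, hand the error and the current code to the model, and retry with a bounded number of rounds so a bad repair cannot loop forever.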

>> Any projects exploring this?

>> I feel it is an issue of dimensionality (i.e. "context" size), very similar to having a latent space for entire repos. Do you agree?