• HiddenLayer555@lemmy.ml
    2 days ago

    I assume this relies on recompiling the generated code to compare against the original binary? I wonder how much more efficient this would be if compilers just produced the same damn binary given the same source code and settings. It honestly blows my mind that people accept a compiler, of all things, not being completely deterministic as normal, and that reproducible builds of the same source code are treated as the holy grail of compilation rather than something that should be a given.

    • ☆ Yσɠƚԋσʂ ☆@lemmy.mlOP
      2 days ago

      There are already a lot of really good decompilation tools, but the problem tends to be that the output isn’t well structured and the names are just random IDs, which makes it difficult to make sense of any non-trivial code that gets generated. However, LLMs can help figure out meaningful names for the variables and produce a sensible structure for the code. I think this stuff could be incredibly useful for reverse engineering things like proprietary drivers, which has been notoriously difficult up to now. A direct application of that would be making Linux available on a lot more devices. And compilers not being deterministic is indeed fucked up.