Binary parsing with Goblin
I’ve started to scratch the surface of how to analyze binaries with Rust. To get a meaningful exercise, I decided to try parsing the file offset of a symbol from a Mach-O binary on ARM64 macOS. Later on I plan to refine the code and implement disassembly of the targeted symbol. I picked Mach-O because it is more alien to me than ELF, or even PE/COFF, which forces me to understand in detail the crates that I might find useful for the job.
For parsing binaries I ended up with a Rust crate called Goblin, which supports all of the aforementioned binary formats.
Here’s the program that I ended up with, after some trial and error:
// SPDX-License-Identifier: MIT
//! Copyright (c) Jarkko Sakkinen 2024

#![deny(clippy::all)]
#![deny(clippy::pedantic)]

use goblin::mach::MachO;
use std::env;
use std::fs::File;
use std::io::Read;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let args: Vec<String> = env::args().collect();
    if args.len() != 3 {
        eprintln!("Usage: {} BINARY SYMBOL", args[0]);
        std::process::exit(1);
    }
    let arg_bin = &args[1];
    let arg_sym = &args[2];

    // Read the whole binary into memory and parse it as a Mach-O image.
    let mut buffer = Vec::new();
    let mut file = File::open(arg_bin)?;
    file.read_to_end(&mut buffer)?;
    let obj = MachO::parse(&buffer, 0)?;

    // Exit if the binary has no symbol table.
    let symbols = obj.symbols.unwrap_or_else(|| std::process::exit(1));

    // Find the nlist entry whose name matches the requested symbol,
    // skipping entries that fail to parse.
    let nlist = symbols
        .iter()
        .find_map(|s| {
            if let Ok(s) = s {
                if s.0 == arg_sym {
                    Some(s.1)
                } else {
                    None
                }
            } else {
                None
            }
        })
        .unwrap_or_else(|| std::process::exit(1));

    // n_value holds the symbol's address; zero is treated as undefined.
    let addr = nlist.n_value;
    if addr == 0 {
        eprintln!("undefined");
        std::process::exit(1);
    }

    // Locate the __TEXT segment, which maps the code into memory.
    let text = obj
        .segments
        .iter()
        .find(|s| s.name().unwrap_or("") == "__TEXT")
        .unwrap_or_else(|| std::process::exit(1));

    // Translate the virtual address into an offset within the file.
    let file_offset = addr - text.vmaddr + text.fileoff;
    println!("{arg_sym} {file_offset:#016x}");
    Ok(())
}
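The final step of the program, turning the symbol's address into a file offset, is plain arithmetic on three segment fields, so it can be tried out without Goblin at all. Below is a minimal sketch of that calculation using only the standard library. The `Segment` struct and the `vm_to_file_offset` function are hypothetical names made up for illustration, and the example address and segment values are invented. The bounds check against `vmsize` and the checked arithmetic are additions that the program above does not have; they are one possible way to reject a symbol that lives outside `__TEXT` instead of printing a bogus offset or overflowing.

```rust
/// Hypothetical stand-in for the segment fields used in the calculation.
/// The field names mirror the Mach-O segment command, but this is not
/// Goblin's type.
#[derive(Debug, Clone, Copy)]
struct Segment {
    vmaddr: u64,  // virtual address where the segment is mapped
    vmsize: u64,  // size of the mapping (assumption: used only for the bounds check)
    fileoff: u64, // offset of the segment's contents in the file
}

/// Translate a virtual address into a file offset.
/// Returns `None` if the address lies outside the segment or the
/// arithmetic would overflow.
fn vm_to_file_offset(seg: &Segment, addr: u64) -> Option<u64> {
    // Distance from the start of the segment; fails if addr < vmaddr.
    let delta = addr.checked_sub(seg.vmaddr)?;
    if delta >= seg.vmsize {
        return None;
    }
    delta.checked_add(seg.fileoff)
}

fn main() {
    // Invented values for illustration only.
    let text = Segment {
        vmaddr: 0x1_0000_0000,
        vmsize: 0x4000,
        fileoff: 0,
    };
    match vm_to_file_offset(&text, 0x1_0000_3f4c) {
        Some(off) => println!("{off:#x}"), // prints "0x3f4c"
        None => eprintln!("address outside segment"),
    }
}
```

With these values the subtraction strips the segment's base address and leaves the offset inside `__TEXT`, which is the same result `addr - text.vmaddr + text.fileoff` would give.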