I’ve started to scratch the surface of analyzing binaries with Rust. To get a meaningful exercise, I decided to try parsing the file offset of a symbol from a Mach-O binary on ARM64 macOS. Later on I plan to refine the code and implement disassembly of the targeted symbol. I picked Mach-O because it is more alien to me than ELF, or even PE/COFF, which forces me to understand in detail the crates that I might find useful for the job.

For parsing binaries I ended up with a Rust crate called Goblin, which supports all of the aforementioned binary formats.

Here’s the program that I ended up with, after some trial and error:

// SPDX-License-Identifier: MIT
//! Copyright (c) Jarkko Sakkinen 2024

#![deny(clippy::all)]
#![deny(clippy::pedantic)]

use goblin::mach::MachO;
use std::env;
use std::fs::File;
use std::io::Read;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let args: Vec<String> = env::args().collect();

    if args.len() != 3 {
        eprintln!("Usage: {} BINARY SYMBOL", args[0]);
        std::process::exit(1);
    }

    let arg_bin = &args[1];
    let arg_sym = &args[2];

    let mut buffer = Vec::new();
    let mut file = File::open(arg_bin)?;
    file.read_to_end(&mut buffer)?;
    let obj = MachO::parse(&buffer, 0)?;
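    // Bail out with a message instead of a silent exit if there is no symbol table.
    let symbols = obj.symbols.unwrap_or_else(|| {
        eprintln!("{arg_bin}: no symbol table");
        std::process::exit(1)
    });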

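    // Skip entries that fail to parse and pick the first one with a matching name.
    // Mach-O symbol names carry a leading underscore for C symbols (e.g. `_main`).
    let nlist = symbols
        .iter()
        .filter_map(Result::ok)
        .find_map(|(name, nlist)| (name == arg_sym).then_some(nlist))
        .unwrap_or_else(|| {
            eprintln!("{arg_sym}: symbol not found");
            std::process::exit(1)
        });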

    let addr = nlist.n_value;
    if addr == 0 {
        eprintln!("undefined");
        std::process::exit(1);
    }

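    let text = obj
        .segments
        .iter()
        .find(|s| s.name().unwrap_or("") == "__TEXT")
        .unwrap_or_else(|| {
            eprintln!("{arg_bin}: no __TEXT segment");
            std::process::exit(1)
        });
    // Assumes the symbol lives in __TEXT. Checked arithmetic avoids a panic
    // on underflow when the address is below the start of the segment.
    let file_offset = addr
        .checked_sub(text.vmaddr)
        .and_then(|delta| delta.checked_add(text.fileoff))
        .ok_or("cannot map the symbol address into __TEXT")?;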

    // Width 18 covers the `0x` prefix plus 16 hex digits of a 64-bit offset.
    println!("{arg_sym} {file_offset:#018x}");
    Ok(())
}
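
The last step of the program is plain address arithmetic: subtract the segment's virtual base address from the symbol's address and add the segment's file offset. Below is a minimal sketch of that step in isolation, using only the standard library. The `Segment` struct and the `file_offset` function are hypothetical stand-ins for illustration and not part of Goblin. The bounds check against `vmsize` is an added assumption that the symbol must fall inside the segment's mapped range.

```rust
/// Hypothetical stand-in for the segment fields the calculation needs.
/// The field names mirror those of a Mach-O segment command.
#[derive(Debug, Clone, Copy)]
struct Segment {
    /// Virtual address where the segment is mapped.
    vmaddr: u64,
    /// Size of the mapping in bytes.
    vmsize: u64,
    /// Offset of the segment's data within the file.
    fileoff: u64,
}

/// Translate a virtual address into a file offset.
///
/// Returns `None` if the address is outside the segment or if the
/// arithmetic would overflow.
fn file_offset(seg: Segment, addr: u64) -> Option<u64> {
    // Distance from the start of the segment; `None` if `addr` is below it.
    let delta = addr.checked_sub(seg.vmaddr)?;
    // Reject addresses past the end of the mapping.
    if delta >= seg.vmsize {
        return None;
    }
    seg.fileoff.checked_add(delta)
}
```

With a segment mapped at `0x1_0000_0000`, a size of `0x4000` and a file offset of `0`, calling `file_offset(seg, 0x1_0000_3f48)` yields `Some(0x3f48)`, while an address at or beyond `0x1_0000_4000` yields `None`. Returning `Option` rather than panicking leaves it to the caller to decide how to report a symbol that does not belong to the segment.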