
Rudroid - Writing the World's worst Android Emulator in Rust 🦀

Introduction

Rudroid - this might arguably be one of the worst Android emulators possible. In this blog, we’ll write an emulator that can run a ‘Hello World’ Android ELF binary. While doing this, we will learn how to go about writing our own emulators.

Writing an emulator is an awesome way to study and probably master the low-level details of the system we are trying to emulate. I assume you have some working knowledge of Rust, a Linux machine with Rust installed or a Docker engine, and a lot of patience to go through the documentation of system calls, file formats, and more.


Topics we need to understand while writing Rudroid:

  • Basic Android Operating System Architecture
  • What are system calls
  • How system calls are handled in AArch64
  • How memory mapping works
  • How the operating system loads an ELF into memory and runs it
  • How we can emulate the operating system's behavior to load an ELF into memory and run it

Let’s start by reading the definition of Android:

Android is an open-source, Linux-based software stack created for a wide array of devices and form factors. The following diagram shows the major components of the Android platform.

Kernel Architecture

The basic architecture of Linux kernel:

[Figure: Kernel Architecture]

Core functionalities of a kernel are:

  • Process management
  • Device management
  • Memory management
  • Interrupt handling
  • Block I/O communication
  • File System Management

For an emulator that just runs an Android ELF binary, the most interesting kernel components are memory management, file system management, process management, interrupt handling, and the system call interface through which the ELF communicates with the kernel.

[Figure: Kernel Architecture]

Signals: The kernel uses signals to call into a process. For example, signals are used to notify a process of certain faults, such as division by zero.

Processes and Scheduler: Creates, schedules, and manages processes.

Virtual Memory: Allocates and manages virtual memory for processes.

File Systems: Implements the file and filesystem-related interfaces for user-space to communicate with the underlying disks.

Traps and faults: Handles traps and faults generated by the processor, such as a memory fault.

Physical memory: Manages the pool of page frames in real memory and allocates pages for virtual memory.

Interrupts: Handles all the interrupts from peripheral devices.

System calls: A system call is the means by which a process requests a specific kernel service, for example reading from a file, writing to a file, or executing a program. There are several hundred system calls, which can be roughly grouped into six categories:

  • file system
  • process
  • scheduling
  • interprocess communication (IPC)
  • socket (networking)
  • miscellaneous

How do Emulators do what they do?

An emulator usually has an MMU to manage the guest's memory requests, an instruction interpreter (decode -> translate -> execute), signal handlers, and interrupt handlers.

These are the steps an emulator usually performs:

  • load the target binary to memory
  • figure out the ISA of target binary
  • if emulator supports the ISA, initialize CPU
  • initialize signal handlers
  • initialize interrupt handlers
  • initialize syscall handlers
  • start CPU loop

What happens inside a CPU Loop:

  • fetch opcode to execute at Program Counter
  • increment Program Counter
  • decode opcode
  • translate opcode from emulated ISA to host ISA
  • execute the translated opcode
  • handle any raised signals/interrupts
  • continue the loop

[Figure: emulated CPU loop]
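
The loop above can be sketched in a few lines of Rust. This is only a toy: the Op instruction set, the Cpu struct, and the Stop reasons are all invented for this illustration and have nothing to do with AArch64 or Unicorn. It is a pure interpreter, so the "translate" step is skipped and the decoded opcode is executed directly.

```rust
/// A toy instruction set, invented for this sketch (not AArch64).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Op {
    /// Load an immediate value into the accumulator.
    LoadImm(i64),
    /// Add an immediate value to the accumulator.
    AddImm(i64),
    /// Stop the CPU loop.
    Halt,
}

/// Hypothetical CPU state: a program counter and a single register.
#[derive(Debug, Default)]
pub struct Cpu {
    pub pc: usize,
    pub acc: i64,
}

/// Reasons the loop can stop.
#[derive(Debug, PartialEq)]
pub enum Stop {
    Halted,
    /// The program counter left the loaded program (a "fetch fault").
    FetchFault(usize),
}

impl Cpu {
    /// Fetch, increment the PC, decode and execute, until something stops us.
    pub fn run(&mut self, program: &[Op]) -> Stop {
        loop {
            // fetch the opcode at the program counter
            let op = match program.get(self.pc) {
                Some(op) => *op,
                None => return Stop::FetchFault(self.pc),
            };
            // increment the program counter
            self.pc += 1;
            // decode and execute
            match op {
                Op::LoadImm(v) => self.acc = v,
                Op::AddImm(v) => self.acc = self.acc.wrapping_add(v),
                Op::Halt => return Stop::Halted,
            }
        }
    }
}
```

A real emulator does the same thing with a much larger decode table, and a fault like FetchFault is where a signal or interrupt handler would get a chance to run.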

Rudroid’s Architecture

So, Rudroid is just going to be a binary that implements an ELF loader, memory management, a system call interface, and a filesystem. The final Rudroid binary should take the ELF that prints ‘Hello World’ to stdout as a command-line argument and execute it on the host. The command should look something like this:

# ./Rudroid hello_world.elf
hello world

We are going to run Rudroid on a Linux machine. This is what its architecture is going to look like:

[Figure: Rudroid Architecture]

ELF loading process

We’ll try not to delve too deeply into the details of the ELF file format. For those, take a look at the comprehensive ELF standard.

Executables (ELFs) and shared object files (libraries) are static representations of programs. When you decide to run a binary, the operating system starts by setting up a new process for the program to run.

ELFs are composed of three major components:

  • an executable header (Ehdr)
  • Sections (section headers are represented as Shdr)
  • Segments (described by program headers, which are represented as Phdr)

Ehdr as defined in /usr/include/elf.h

typedef struct {
    unsigned char e_ident[16]; /* Magic number and other info */
    uint16_t e_type; /* Object file type */
    uint16_t e_machine; /* Architecture */
    uint32_t e_version; /* Object file version */
    uint64_t e_entry; /* Entry point virtual address */
    uint64_t e_phoff; /* Program header table file offset */
    uint64_t e_shoff; /* Section header table file offset */
    uint32_t e_flags; /* Processor-specific flags */
    uint16_t e_ehsize; /* ELF header size in bytes */
    uint16_t e_phentsize; /* Program header table entry size */
    uint16_t e_phnum; /* Program header table entry count */
    uint16_t e_shentsize; /* Section header table entry size */
    uint16_t e_shnum; /* Section header table entry count */
    uint16_t e_shstrndx; /* Section header string table index*/
} Elf64_Ehdr;


Phdr as defined in /usr/include/elf.h

typedef struct elf64_phdr {
  Elf64_Word p_type;
  Elf64_Word p_flags;
  Elf64_Off p_offset;       /* Segment file offset */
  Elf64_Addr p_vaddr;       /* Segment virtual address */
  Elf64_Addr p_paddr;       /* Segment physical address */
  Elf64_Xword p_filesz;     /* Segment size in file */
  Elf64_Xword p_memsz;      /* Segment size in memory */
  Elf64_Xword p_align;      /* Segment alignment, file & memory */
} Elf64_Phdr;


Shdr as defined in /usr/include/elf.h

typedef struct elf64_shdr {
  Elf64_Word sh_name;       /* Section name, index in string tbl */
  Elf64_Word sh_type;       /* Type of section */
  Elf64_Xword sh_flags;     /* Miscellaneous section attributes */
  Elf64_Addr sh_addr;       /* Section virtual addr at execution */
  Elf64_Off sh_offset;      /* Section file offset */
  Elf64_Xword sh_size;      /* Size of section in bytes */
  Elf64_Word sh_link;       /* Index of another section */
  Elf64_Word sh_info;       /* Additional section information */
  Elf64_Xword sh_addralign; /* Section alignment */
  Elf64_Xword sh_entsize;   /* Entry size if section holds table */
} Elf64_Shdr;

The kernel only really cares about Ehdr and Phdrs and only three types of program header entries:

  • PT_LOAD : Loadable Segment
  • PT_INTERP : Segment holding .interp section
  • PT_GNU_STACK : flag to set program’s stack to executable

The ELF loader in the kernel starts by examining the ELF header to check that the file is valid. It then loops over the program header entries, looking for PT_LOAD and PT_INTERP. For every PT_LOAD entry, the loader maps memory at load_address + p_vaddr of size p_memsz and copies the p_filesz bytes of the segment from the file into the mapping. If PT_INTERP is found, the loader reads the interpreter path it names, parses that file as an ELF as well, maps it into memory, and keeps track of the entry points of both the main ELF file and the interpreter.
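
To make the PT_LOAD arithmetic concrete, here is a small sketch of how a loader can turn one segment into a page-aligned mapping. The LoadSegment struct and the map_range and zero_fill_len helpers are names made up for this example, and the 4 KB page size is an assumption that happens to match what Unicorn expects later on.

```rust
/// Page size assumed for this sketch.
const PAGE_SIZE: u64 = 0x1000;

/// Round an address down to the start of its page.
fn align_down(addr: u64) -> u64 {
    addr & !(PAGE_SIZE - 1)
}

/// Round an address up to the next page boundary.
fn align_up(addr: u64) -> u64 {
    (addr + PAGE_SIZE - 1) & !(PAGE_SIZE - 1)
}

/// The few Elf64_Phdr fields a loader needs for a PT_LOAD entry.
pub struct LoadSegment {
    pub p_vaddr: u64,
    pub p_filesz: u64,
    pub p_memsz: u64,
}

/// Returns (start, length) of the page-aligned region to map for a segment.
pub fn map_range(load_address: u64, seg: &LoadSegment) -> (u64, u64) {
    let start = align_down(load_address + seg.p_vaddr);
    let end = align_up(load_address + seg.p_vaddr + seg.p_memsz);
    (start, end - start)
}

/// Bytes past the file contents that stay zeroed (this is where .bss lives).
pub fn zero_fill_len(seg: &LoadSegment) -> u64 {
    seg.p_memsz.saturating_sub(seg.p_filesz)
}
```

For a segment with p_vaddr = 0x1234 and p_memsz = 0x2000 loaded at 0x5555_0000, the mapping starts at 0x5555_1000 and is 0x3000 bytes long, even though the segment itself only needs 0x2000 bytes.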

Once this is done, the loader starts setting up and populating the stack with auxiliary vector (ELF tables), environment variables, and command-line arguments passed to the ELF. An ELF auxiliary vector is an (id, value) pair that describes useful information about the program being run and the environment it is running in.
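
As a rough illustration of the (id, value) layout, the sketch below packs a list of auxiliary vector entries into bytes the way a 64-bit little-endian target expects them: two 8-byte words per entry, ending with an AT_NULL entry. The AT_* values are the ones from elf.h; the pack_auxv function is a hypothetical helper written just for this example.

```rust
// Auxiliary vector ids, as defined in elf.h.
pub const AT_NULL: u64 = 0;
pub const AT_PHDR: u64 = 3;
pub const AT_PAGESZ: u64 = 6;
pub const AT_ENTRY: u64 = 9;

/// Pack (id, value) pairs as little-endian u64 words and append the
/// terminating AT_NULL entry.
pub fn pack_auxv(entries: &[(u64, u64)]) -> Vec<u8> {
    let mut out = Vec::with_capacity((entries.len() + 1) * 16);
    for (id, value) in entries {
        out.extend_from_slice(&id.to_le_bytes());
        out.extend_from_slice(&value.to_le_bytes());
    }
    // AT_NULL marks the end of the vector for whoever walks it.
    out.extend_from_slice(&AT_NULL.to_le_bytes());
    out.extend_from_slice(&0u64.to_le_bytes());
    out
}
```

The dynamic linker walks these pairs at startup to find things like the program headers (AT_PHDR), the page size (AT_PAGESZ), and the real entry point of the program (AT_ENTRY).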

For this, we need an ELF parser in Rust. We can either write our own ELF parser or use the existing xmas-elf crate.

Before we can start writing an ELF loader, we also need a memory manager, as we have to map the ELF into memory, manage the stack, etc. Let’s look at how a memory manager works.
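
If you are curious what "writing our own" would start with, here is a minimal sketch of the validity check on the executable header, using only the standard library. The check_header function and the ElfError enum are invented for this example; the byte offsets follow the Elf64_Ehdr layout shown earlier, and it only accepts 64-bit little-endian AArch64 files.

```rust
/// Errors this sketch can report (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum ElfError {
    TooShort,
    BadMagic,
    Not64Bit,
    NotLittleEndian,
    NotAArch64(u16),
}

/// e_machine value for AArch64, from elf.h.
const EM_AARCH64: u16 = 183;

/// Validate an ELF64 header and return e_entry on success.
pub fn check_header(image: &[u8]) -> Result<u64, ElfError> {
    // An Elf64_Ehdr is 64 bytes long.
    if image.len() < 64 {
        return Err(ElfError::TooShort);
    }
    // e_ident[0..4] holds the magic number.
    if !image.starts_with(b"\x7fELF") {
        return Err(ElfError::BadMagic);
    }
    // e_ident[4] is the class: 2 means 64-bit.
    if image[4] != 2 {
        return Err(ElfError::Not64Bit);
    }
    // e_ident[5] is the data encoding: 1 means little-endian.
    if image[5] != 1 {
        return Err(ElfError::NotLittleEndian);
    }
    // e_machine sits at offset 18, right after e_ident and e_type.
    let machine = u16::from_le_bytes([image[18], image[19]]);
    if machine != EM_AARCH64 {
        return Err(ElfError::NotAArch64(machine));
    }
    // e_entry sits at offset 24 in the 64-bit layout.
    let mut entry = [0u8; 8];
    entry.copy_from_slice(&image[24..32]);
    Ok(u64::from_le_bytes(entry))
}
```

Everything past this point (program headers, sections, string tables) is where a crate like xmas-elf saves a lot of typing.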

Memory Management (MMU)

Linux memory management subsystem is responsible, as the name implies, for managing the memory in the system. This includes implementation of virtual memory and demand paging, memory allocation both for kernel internal structures and userspace programs, mapping of files into processes address space, and many other cool things.

It provides functionality to map and unmap memory allocations. We have to implement these functionalities:

  • map memory at a given location and of a given size
  • unmap memory at a given location and of a given size
  • read from memory
  • write to memory
  • manage permissions of the memory

A mapping spans the range from address to address + size_of_the_mapping. The mmap entry in the Linux manual describes the interface:

void *mmap(void *addr, size_t length, int prot, int flags,
                  int fd, off_t offset);
 mmap() creates a new mapping in the virtual address space of the calling process.  The starting address for the new mapping is specified in addr.  The length argument specifies the length of the mapping (which must be greater than 0).

Memory protections:

PROT_EXEC
    Pages may be executed.

PROT_READ
    Pages may be read.

PROT_WRITE
    Pages may be written.

PROT_NONE
    Pages may not be accessed.

Unicorn Engine offers this functionality:

    /// Map a memory region in the emulator at the specified address.
    ///
    /// `address` must be aligned to 4kb or this will return `Error::ARG`.
    /// `size` must be a multiple of 4kb or this will return `Error::ARG`.
    pub fn mem_map(&mut self, 
            address: u64, 
            size: libc::size_t, 
            perms: Protection
    ) -> Result<(), uc_error>;


    /// Unmap a memory region.
    ///
    /// `address` must be aligned to 4kb or this will return `Error::ARG`.
    /// `size` must be a multiple of 4kb or this will return `Error::ARG`.
    pub fn mem_unmap(&mut self, 
            address: u64, 
            size: libc::size_t
    ) -> Result<(), uc_error>;


    /// Set the memory permissions for an existing memory region.
    ///
    /// `address` must be aligned to 4kb or this will return `Error::ARG`.
    /// `size` must be a multiple of 4kb or this will return `Error::ARG`.
    pub fn mem_protect(&mut self, 
            address: u64, 
            size: libc::size_t, 
            perms: Protection
    ) -> Result<(), uc_error> {
        let err = unsafe { ffi::uc_mem_protect(self.uc, address, size, perms.bits()) };
        if err == uc_error::OK {
            Ok(())
        } else {
            Err(err)
        }
    }

We can define protections and mappings as structs in Rust:

bitflags! {
#[repr(C)]
pub struct Protection : u32 {
        const NONE = 0;
        const READ = 1;
        const WRITE = 2;
        const EXEC = 4;
        const ALL = 7;
    }
}

pub struct MapInfo {
    pub memory_start    : u64,
    pub memory_end      : u64,
    pub memory_perms    : Protection,
    pub description     : String,
}
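
One detail worth spelling out: the permission bits in an ELF program header do not use the same values as the protections above. In elf.h, PF_X is 1, PF_W is 2 and PF_R is 4, while READ is 1 and EXEC is 4 here, so the flags have to be translated before mapping. The sketch below uses plain u32 constants instead of the bitflags struct to stay self-contained, and elf_flags_to_perms is a hypothetical name for the conversion.

```rust
// ELF segment flag bits (p_flags), as defined in elf.h.
pub const PF_X: u32 = 1;
pub const PF_W: u32 = 2;
pub const PF_R: u32 = 4;

// Protection bits, using the same values as the Protection struct above.
pub const READ: u32 = 1;
pub const WRITE: u32 = 2;
pub const EXEC: u32 = 4;

/// Translate ELF p_flags into mapping protections, bit by bit.
pub fn elf_flags_to_perms(p_flags: u32) -> u32 {
    let mut perms = 0;
    if p_flags & PF_R != 0 {
        perms |= READ;
    }
    if p_flags & PF_W != 0 {
        perms |= WRITE;
    }
    if p_flags & PF_X != 0 {
        perms |= EXEC;
    }
    perms
}
```

Passing p_flags straight through would silently swap read and execute, which is the kind of bug that only shows up as a confusing fetch fault much later.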

Using the mem_map and mem_unmap functions from Unicorn, we can implement our MMU as a HashMap from a mapping's starting address to its MapInfo struct.
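
Here is a minimal sketch of just the bookkeeping side of such an MMU, with no Unicorn involved. The Mmu, Region and MapError names are invented for this example, Region is a simplified stand-in for MapInfo, and permissions are a plain u32. In the real thing, map and unmap would also call Unicorn's mem_map and mem_unmap.

```rust
use std::collections::HashMap;

/// Page size assumed for this sketch (Unicorn wants 4 KB alignment).
const PAGE_SIZE: u64 = 0x1000;

/// Simplified stand-in for MapInfo.
#[derive(Debug, Clone, PartialEq)]
pub struct Region {
    pub start: u64,
    pub end: u64,
    pub perms: u32,
    pub description: String,
}

#[derive(Debug, PartialEq)]
pub enum MapError {
    Unaligned,
    OutOfRange,
    Overlap,
    NotMapped,
}

/// Bookkeeping only: regions keyed by their starting address.
#[derive(Default)]
pub struct Mmu {
    regions: HashMap<u64, Region>,
}

impl Mmu {
    /// Record a new mapping after checking alignment and overlap.
    pub fn map(&mut self, start: u64, size: u64, perms: u32, description: &str) -> Result<(), MapError> {
        if size == 0 || start % PAGE_SIZE != 0 || size % PAGE_SIZE != 0 {
            return Err(MapError::Unaligned);
        }
        let end = start.checked_add(size).ok_or(MapError::OutOfRange)?;
        // Two half-open ranges overlap when each starts before the other ends.
        if self.regions.values().any(|r| start < r.end && r.start < end) {
            return Err(MapError::Overlap);
        }
        let region = Region { start, end, perms, description: description.to_string() };
        self.regions.insert(start, region);
        Ok(())
    }

    /// Forget the mapping that starts at `start` and hand it back.
    pub fn unmap(&mut self, start: u64) -> Result<Region, MapError> {
        self.regions.remove(&start).ok_or(MapError::NotMapped)
    }

    /// Find the region containing `addr`, if any.
    pub fn find(&self, addr: u64) -> Option<&Region> {
        self.regions.values().find(|r| r.start <= addr && addr < r.end)
    }
}
```

Keying by start address keeps unmap cheap, at the cost of a linear scan in find. For the handful of mappings a hello world binary needs, that trade-off seems fine.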

We’ll also look at how system calls work and then start writing our Emulator.

System Calls

A system call is a routine that allows a user application to request actions that require special privileges or functionalities. Adding system calls is one of several ways to extend the functions provided by the kernel.

In AArch64, there are special instructions for making such system calls. These instructions cause an exception, which allows controlled entry into a more privileged Exception level.

  • SVC - Supervisor call: Causes an exception targeting EL1. Used by an application to call the OS.
  • HVC - Hypervisor call: Causes an exception targeting EL2. Used by an OS to call the hypervisor, not available at EL0.
  • SMC - Secure monitor call: Causes an exception targeting EL3. Used by an OS or hypervisor to call the EL3 firmware, not available at EL0.

[Figure: AArch64 system call]

In AArch64, the system call number is passed in the X8 register, the arguments in X0 to X5, and the return value comes back in the X0 register. We will use Unicorn’s hooks to intercept these SVC instructions, execute the corresponding system call, and return the result.
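
To show the register convention in action, here is a tiny dispatcher sketch that does not depend on Unicorn. The Regs struct, the Outcome enum and handle_svc are invented for this example, guest memory is modelled as a flat byte slice where a guest address is simply an offset, and output goes to a Vec instead of a real file descriptor. The syscall numbers 64 (write) and 93 (exit) are the AArch64 Linux ones.

```rust
/// AArch64 Linux syscall numbers.
pub const SYS_WRITE: u64 = 64;
pub const SYS_EXIT: u64 = 93;

/// Hypothetical register file: only x0..x8 matter for this sketch.
#[derive(Default)]
pub struct Regs {
    pub x: [u64; 9],
}

/// What the CPU loop should do after the syscall returns.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Continue,
    Exit(u64),
}

/// Dispatch on x8, read arguments from x0..x2, write the result to x0.
pub fn handle_svc(regs: &mut Regs, guest_mem: &[u8], out: &mut Vec<u8>) -> Outcome {
    match regs.x[8] {
        SYS_WRITE => {
            // write(fd, buf, count): the fd in x0 is ignored in this sketch.
            let (buf, len) = (regs.x[1] as usize, regs.x[2] as usize);
            let bytes = buf.checked_add(len).and_then(|end| guest_mem.get(buf..end));
            regs.x[0] = match bytes {
                Some(bytes) => {
                    out.extend_from_slice(bytes);
                    len as u64
                }
                // The buffer is outside guest memory: return -EFAULT.
                None => (-14i64) as u64,
            };
            Outcome::Continue
        }
        SYS_EXIT => Outcome::Exit(regs.x[0]),
        _ => {
            // Unknown syscall: return -ENOSYS.
            regs.x[0] = (-38i64) as u64;
            Outcome::Continue
        }
    }
}
```

Errors come back as negative errno values in X0, which is the raw kernel convention. The C library on the guest side is what turns them into -1 plus errno.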

AArch64 Instruction Emulation

Since emulating all the AArch64 instructions ourselves is a tedious job, we will use Unicorn Engine to emulate the instructions. We will still see how it works.

impl rudroid

Finally, we’ll start writing the code for our Rudroid. Let’s see how easy or complex it will be.

I’m going to use a Linux Docker container on my Apple M1 as the host for running Rudroid.

Rudroid’s Dockerfile:

Dockerfile

FROM rust:latest

RUN apt update -y
RUN apt install -y nano cmake 

WORKDIR /setup
RUN git clone https://github.com/unicorn-engine/unicorn/
WORKDIR /setup/unicorn/
RUN ./make.sh
RUN ./make.sh install

WORKDIR /setup/
RUN git clone https://github.com/keystone-engine/keystone/
RUN mkdir build
WORKDIR /setup/keystone/build
RUN ../make-share.sh
RUN make install

RUN cp /usr/local/lib/libkeystone.so* /usr/lib/

RUN apt-get install -y clang llvm binutils-dev libunwind-dev
WORKDIR /home/

run.sh

#!/bin/bash
# Docker image names must be lowercase, so "Rudroid" would be rejected
image=rudroid
docker build -t $image .
docker run --rm -v `pwd`:/home -v `pwd`/resources/:/resources/ -it $image bash

$ chmod +x run.sh
$ ./run.sh
root@9346e6664ae9:/home/code#

Here we set up Rust, unicorn-engine, and keystone-engine in the image; capstone comes in later as a crate dependency in Cargo.toml.

We will extend the Unicorn impl from the Unicorn Rust crate and add system call handlers, file system management, etc. I kept only the required files and discarded the rest.

➜  src git:(main) ✗ tree core/unicorn/
| |____
| | |____unicorn_const.rs
| | |____ffi.rs
| | |____mod.rs
| | |____arch
| | | |____arm64.rs
| | | |____mod.rs

Directory structure

Let’s set up the below directory structure:

[Figure: directory tree]

We are going to need the libc crate to forward our system calls to the host and the xmas-elf crate to parse the ELF file. Add libc = "0.2.101" and xmas-elf = "0.8.0" to the dependencies in Cargo.toml. I also added some helper functions in utilities.rs to print in color. 🎨

Cargo.toml

[package]
name = "Rudroid"
version = "0.1.0"
edition = "2018"

[dependencies]
libc = "0.2.101"
bitflags = ">=1.1.0"
xmas-elf = "0.8.0"
byteorder = "1.4.3"
keystone = "0.9.0"
capstone="0.10.0"
nix = "0.22.1"

I deleted Unicorn's new() implementation and struct definition and added a new struct definition inside core/rudroid.rs. It declares a struct called Emulator that keeps track of the details of the ELF file, the filesystem, and the Unicorn hooks.

core/rudroid.rs


// #[derive(Debug)]
pub struct Emulator<D>  {
    pub debug               : bool,

    pub rootfs              : String,
    pub elf_path            : String,

    pub machine             : header::Machine,
    pub endian              : header::Data,
    pub arch                : Arch,

    pub uc                  : ffi::uc_handle,
    pub uc_type             : D,

    pub filesystem          : fs::FsScheme,

    // mmu stuff
    pub load_address        : u64,
    pub mmap_address        : u64,
    pub new_stack           : u64,
    pub interp_address      : u64,
    pub entry_point         : u64,
    pub elf_entry           : u64,
    pub brk_address         : u64,

    //elf arguments
    pub args                : Vec<String>,
    pub env                 : Vec<String>,

    pub map_infos           : HashMap<u64, mmu::MapInfo>,

    //hook
    pub code_hooks          : HashMap<*mut libc::c_void, Box<ffi::CodeHook<D>>>,
    pub mem_hooks           : HashMap<*mut libc::c_void, Box<ffi::MemHook<D>>>,
    pub intr_hooks          : HashMap<*mut libc::c_void, Box<ffi::InterruptHook<D>>>,
    pub insn_in_hooks       : HashMap<*mut libc::c_void, Box<ffi::InstructionInHook<D>>>,
    pub insn_out_hooks      : HashMap<*mut libc::c_void, Box<ffi::InstructionOutHook<D>>>,
    pub insn_sys_hooks      : HashMap<*mut libc::c_void, Box<ffi::InstructionSysHook<D>>>,

    // syscalls stuff
    pub sigmap              : HashMap<u64, Vec<u8>>,

    _pin                    : std::marker::PhantomPinned,
}

Now we have to implement Emulator.

impl<D> Emulator<D>
{
    pub fn new(elf_path: &str, rootfs: &str, elf: &mut ElfFile, endian: header::Data, args: Vec<String>, env: Vec<String>, data: D, debug: bool) -> Result<Emulator<D>, uc_error> {

        let machine = elf.header.pt2.machine().as_machine();

        let (arch, mode) = match machine {
            header::Machine::AArch64 => {
                (Arch::ARM64, Mode::LITTLE_ENDIAN)
            },
            _ => {
                panic!("Not implemented yet!")
            }
        };

        let mut handle = std::ptr::null_mut();

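        //uc_open: Create new instance of unicorn engine.
        let err = unsafe { ffi::uc_open(arch, mode, &mut handle) };

        // Bail out before the handle is used if Unicorn failed to initialise.
        if err != uc_error::OK {
            return Err(err);
        }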

        //create a new Emulator and return.
        let mut emu = Emulator {
            debug           : debug,
            rootfs          : String::from(rootfs),

            elf_path        : String::from(elf_path),
            args            : args,
            env             : env,
            
            uc              : handle,
            uc_type         : data,
            
            arch            : arch,
            machine         : machine,
            endian          : endian,

            map_infos       : HashMap::new(),
            entry_point     : 0,
            elf_entry       : 0,
            brk_address     : 0,
            mmap_address    : 0,
            interp_address  : 0,
            new_stack       : 0,
            load_address    : 0,

            //hooks
            code_hooks      : HashMap::new(),
            mem_hooks       : HashMap::new(),
            intr_hooks      : HashMap::new(),
            insn_in_hooks   : HashMap::new(),
            insn_out_hooks  : HashMap::new(),
            insn_sys_hooks  : HashMap::new(),

            _pin            : std::marker::PhantomPinned,
            
            //create a File System object
            filesystem      : fs::FsScheme::new(String::from(rootfs)),
            sigmap          : HashMap::new(),
        };
        
        //parse and load the ELF into memory
        emu.load(elf);

        // display the memory mapping
        emu.display_mapped();

        if err == uc_error::OK {
            Ok(emu)
        } else {
            Err(err)
        }
    }
}

I replaced all the implementations of impl UnicornHandler with impl<D> Emulator<D>. This way, we already have all the capabilities of Unicorn, such as memory management, hooks, the instruction interpreter, and the CPU loop. I think this is called lazy programming? 🙊

As explained in the ELF Loader section above, we parse the ELF using the xmas-elf crate, go through the program headers, and map the respective segments into memory. We also set up the stack for the program.

core/loaders/elfLoader.rs

impl<D> Emulator<D>
{
    pub fn load(& mut self, elf: &mut ElfFile)
    {
        self.enable_vfp();
        
        let profile = match self.machine {
            header::Machine::AArch64 => {
                (linux::OS64::stack_address, linux::OS64::stack_size)
            },
            _ => {
                    panic!("[load_with_ld] Not implemented yet!")
            }
        };

        let mut stack_address = profile.0 as u64;
        let stack_size      = profile.1 as usize;
        
        //initialise stack
        self.mmu_map(stack_address, stack_size, Protection::READ|Protection::WRITE, "[stack]", self.null_mut());

        // load ELF and linker into memory
        self.load_with_ld(stack_address.checked_add(stack_size as u64).unwrap() , 0, self.machine, elf);

        stack_address = self.new_stack;
        self.reg_write(RegisterARM64::SP as i32, stack_address).unwrap();
    }

    fn load_with_ld(&mut self, stack_address: u64, load_address: u64, archbit: header::Machine, elf: &mut ElfFile) {
        let mut load_address = match load_address {
            0 => {
                match  archbit {
                    header::Machine::AArch64 => {
                        self.mmap_address = linux::OS64::mmap_address as u64;
                        linux::OS64::load_address as u64
                    },
                    _ => {
                        panic!("Shouldn't be here");
                    }
                }
            },
            _ => {
                panic!("Shouldn't be here");
            }
        };
        
        let mut mem_start   : u64 = 0xffff_ffff;
        let mut mem_end     : u64 = 0xffff_ffff;
        let mut mem_s       : u64 = 0;
        let mut mem_e       : u64 = 0;

        let mut interp_path : String = String::new();

        match elf.header.pt2.type_().as_type() {
            header::Type::Executable => {
                load_address = 0;
            },
            header::Type::SharedObject => {
                
            }
            _ => {
                panic!("Some error in head e_type: {:?}", elf.header.pt2.type_().as_type());
            }
        }

        for header in elf.program_iter() {
            match header.get_type().unwrap() {

                program::Type::Interp => {
                    let offset      = header.offset() as usize;
                    let end_offset  = (header.offset()+header.mem_size()) as usize;
                    let data = elf.input.get(offset..end_offset).unwrap();
                    interp_path = self.null_str(std::str::from_utf8(data).unwrap());
                },

                program::Type::Load => {
                    if mem_start > header.virtual_addr() || mem_start == 0xffff_ffff {
                        mem_start = header.virtual_addr();
                    };

                    if mem_end < header.virtual_addr()+header.mem_size() || mem_end == 0xffff_ffff {
                        mem_end = header.virtual_addr()+header.mem_size();
                    }
                },
                _ => {

                }
            }
        }

        mem_start = self.uc_align_down(mem_start);
        mem_end   = self.uc_align_up(mem_end);

        for header in elf.program_iter() {
            match header.get_type().unwrap() {
                program::Type::Load => {
                    mem_s = self.uc_align_down(load_address + header.virtual_addr());
                    mem_e = self.uc_align_up(load_address + header.virtual_addr() + header.file_size());
                    let perms =  utilities::to_uc_permissions(header.flags());

                    let desc = self.elf_path.clone();
                    self.mmu_map(mem_s, (mem_e-mem_s) as usize, perms, &desc, self.null_mut());
                    
                    let data = elf.input.get(header.offset() as usize..
                                                                (header.offset()+header.file_size()) as usize).unwrap();

                    self.write(load_address+header.virtual_addr(), data);
                },
                _ => {

                }
            }
        }
        
        let loaded_mem_end = load_address + mem_end;

        if loaded_mem_end > mem_e {
            let desc = self.elf_path.clone();
            self.mmu_map( mem_e, (loaded_mem_end-mem_e) as usize, Protection::ALL, &desc, self.null_mut());
        }

        self.elf_entry = elf.header.pt2.entry_point() + load_address;
        self.debug_print(format!("elf_entry {:x}", self.elf_entry));

        self.brk_address = mem_end + load_address + 0x2000; //not sure why?? seems to be used in ql_syscall_brk

        // load interpreter if there is an interpreter
        if !interp_path.is_empty() {
            self.debug_print(format!("Trying to load interpreter: {}{}", self.rootfs, interp_path));

            let mut interp_full_path = String::new();

            interp_full_path.push_str(&self.rootfs);
            interp_full_path.push_str(&interp_path);

            let interp_data = std::fs::read(&interp_full_path).unwrap();
            let interp_elf  = ElfFile::new(interp_data.get(0..).unwrap()).unwrap();

            let mut interp_mem_size: u64 = 0;
            let mut interp_address : u64 = 0;

            for i_header in interp_elf.program_iter() {
                match i_header.get_type().unwrap() {
                    program::Type::Load => {
                        if interp_mem_size < i_header.virtual_addr() + i_header.mem_size() || interp_mem_size == 0 {
                            interp_mem_size = i_header.virtual_addr() + i_header.mem_size();
                        }
                    },
                    _ => {

                    }
                };
            }

            interp_mem_size = self.uc_align_up(interp_mem_size);

            match archbit {
                header::Machine::AArch64 => {
                    interp_address = linux::OS64::interp_address as u64;
                }
                _ => {
                    panic!("what?");
                }
            };

            //map interpreter into memory
            self.mmu_map(interp_address, interp_mem_size as usize , Protection::ALL, &interp_path, self.null_mut());

            for i_header in interp_elf.program_iter() { 
                match i_header.get_type().unwrap() {
                    program::Type::Load => {
                        let data = interp_elf.input.get(i_header.offset()  as usize..
                                                                            (i_header.offset()+i_header.file_size()) as usize
                                                                                    ).unwrap();
                        // use the virtual address, matching the size computation above
                        self.write( interp_address+i_header.virtual_addr(), data);
                    },
                    _ => {

                    }
                };
            }

            self.interp_address = interp_address;
            self.entry_point    = interp_elf.header.pt2.entry_point() + self.interp_address;
        }

        // setup elf table
        let mut elf_table: Vec<u8> = Vec::new();

        let mut new_stack = stack_address;

        // copy arg0 on to stack. elf_path
        new_stack = self.copy_str(new_stack, &mut self.elf_path.clone());

        elf_table.extend_from_slice(&self.pack(self.args.len() as u64 + 1)); // + 1 is for arg0 = elf path.
        elf_table.extend_from_slice(&self.pack(new_stack));
        
        let mut argc = self.args.len();

        loop {
            if argc <=0 {
                break;
            }
            argc -= 1;

            let mut arg = self.args[argc].clone();
            new_stack = self.copy_str(new_stack, &mut arg);
            elf_table.extend_from_slice(&self.pack(new_stack));
        }

        elf_table.extend_from_slice(&self.pack(0));
        
        let mut envc = self.env.len();

        loop {
            if envc <=0 {
                break;
            }
            envc -= 1;
            let mut env = self.env[envc].clone();
            new_stack = self.copy_str(new_stack, &mut env);
            elf_table.extend_from_slice(&self.pack(new_stack));
        }

        elf_table.extend_from_slice(&self.pack(0));

        new_stack = self.alignment(new_stack);

        //our super secure random string
        let mut randstr   = "a".repeat(0x10);
        let mut cpustr    = String::from("aarch64");

        let mut addr1 = self.copy_str(new_stack, &mut randstr);
        new_stack = addr1;

        let mut addr2 = self.copy_str(new_stack, &mut cpustr);
        new_stack = addr2;

        new_stack = self.alignment(new_stack);

        // Set AUX
        let head = elf.header;
        
        let elf_phdr    = load_address + head.pt2.ph_offset();
        let elf_phent   = head.pt2.ph_entry_size();
        let elf_phnum   = head.pt2.ph_count();
        let elf_pagesz  = 0x1000;
        let elf_guid    = linux::uid;
        let elf_flags   = 0;
        let elf_entry   = load_address + head.pt2.entry_point();
        let randstraddr = addr1; 
        let cpustraddr  = addr2;

        let elf_hwcap: u64 = match head.pt2.machine().as_machine() {
            header::Machine::AArch64 => {
                0x078bfbfd
            },
            _ => {
                panic!("");
            }
        };

        //setup auxiliary vectors
        elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_PHDR  as u64, elf_phdr + mem_start));
        elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_PHENT as u64, elf_phent as u64));
        elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_PHNUM as u64, elf_phnum as u64));
        elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_PAGESZ as u64, elf_pagesz as u64));
        elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_BASE as u64, self.interp_address));
        elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_FLAGS as u64, elf_flags));
        elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_ENTRY as u64, elf_entry));
        elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_UID as u64, elf_guid as u64));
        elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_EUID as u64, elf_guid as u64));
        elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_GID as u64, elf_guid as u64));
        elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_EGID as u64, elf_guid as u64));
        elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_HWCAP as u64, elf_hwcap as u64));
        elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_CLKTCK as u64, 100));
        elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_RANDOM as u64, randstraddr));
        elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_PLATFORM as u64, cpustraddr));
        elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_SECURE as u64, 0));
        elf_table.extend_from_slice(&self.new_aux_ent(AUX::AT_NULL as u64, 0));


        // Pad the table with zero bytes so the stack stays 16-byte aligned.
        let len = 0x10 - ((new_stack - elf_table.len() as u64) & 0xf) as usize;
        let padding = vec![0u8; len];

        elf_table.extend_from_slice(&padding);
        
        let addr = new_stack - elf_table.len() as u64;
        self.write( addr, &elf_table);

        new_stack = new_stack - elf_table.len() as u64;

        self.new_stack = new_stack;
        self.load_address = load_address;
    }

    fn new_aux_ent(&self, key: u64, val: u64) -> Vec<u8>
    {
        //pack the aux key-val pair
        let mut aux: Vec<u8> = Vec::new();
        aux.extend_from_slice(&self.pack(key));
        aux.extend_from_slice(&self.pack(val));
        aux
    }

    // Run linker
    pub fn run_linker(&mut self)
    {
        utilities::context_title(Some("Emulating linker64"));
        let res = self.emu_start(self.entry_point, self.elf_entry, 0, 0);
        self.handle_emu_exception(res);
        utilities::context_title(Some("Emulating linker64 done"));
    }
}
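
The auxiliary vector built above is a flat array of (key, value) pairs terminated by AT_NULL. Here is a minimal standalone sketch of that layout, assuming a little-endian target and 64-bit words. The names `Aux`, `aux_entry`, `build_auxv` and `align_down_16` are made up for this illustration and are not part of Rudroid; the key numbers are the standard Linux AT_* values.

```rust
/// A few auxiliary vector keys (standard Linux AT_* values).
#[allow(dead_code)]
#[derive(Clone, Copy)]
#[repr(u64)]
enum Aux {
    Null = 0,
    Phdr = 3,
    Phent = 4,
    Phnum = 5,
    Pagesz = 6,
    Entry = 9,
}

/// Pack one (key, value) pair as two little-endian 64-bit words.
fn aux_entry(key: Aux, val: u64) -> [u8; 16] {
    let mut out = [0u8; 16];
    out[..8].copy_from_slice(&(key as u64).to_le_bytes());
    out[8..].copy_from_slice(&val.to_le_bytes());
    out
}

/// Build an auxiliary vector; an AT_NULL entry always terminates it.
fn build_auxv(entries: &[(Aux, u64)]) -> Vec<u8> {
    let mut table = Vec::with_capacity((entries.len() + 1) * 16);
    for &(key, val) in entries {
        table.extend_from_slice(&aux_entry(key, val));
    }
    table.extend_from_slice(&aux_entry(Aux::Null, 0));
    table
}

/// Round an address down to a 16-byte boundary (hypothetical helper).
fn align_down_16(addr: u64) -> u64 {
    addr & !0xf
}
```

Calling `build_auxv(&[(Aux::Pagesz, 0x1000)])` yields 32 bytes: the AT_PAGESZ pair followed by the AT_NULL terminator. The dynamic linker walks these pairs at startup to find the program headers, the entry point, and the page size.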

We add three hooks in core/hooks.rs:

pub fn add_hooks(emu: &mut rudroid::Emulator<i64>) {
    //handle SVC
    emu.add_intr_hook(android::syscalls::hook_syscall).unwrap();    

    //handle MEM_FETCH_UNMAPPED
    emu.add_mem_hook(unicorn_const::HookType::MEM_FETCH_UNMAPPED, 1, 0, callback_mem_error).unwrap();

    //handle MEM_READ_UNMAPPED
    emu.add_mem_hook(unicorn_const::HookType::MEM_READ_UNMAPPED, 1, 0, callback_mem_error).unwrap();
}

In the hook_syscall function, we read the x8 register from the execution context, match it against the Android syscall numbers, and emulate the matching syscall. Instead of implementing every syscall ourselves, we can forward some of them to the host system and pass the host's return values back to the emulated binary.

core/android/syscalls/mod.rs

mod syscalls;
mod unistd;

use crate::{core::{rudroid::Emulator, unicorn::arch::arm64::RegisterARM64}, utilities};

pub fn get_syscall(uc: &mut Emulator<i64>) -> syscalls::Syscalls {
    // syscall_num = UC_ARM64_REG_X8
    let syscall = uc.reg_read(RegisterARM64::X8 as i32).unwrap();
    unsafe { ::std::mem::transmute(syscall) }
}

pub fn hook_syscall(uc: &mut Emulator<i64>, intno: u32) {
    let pc = uc.reg_read(RegisterARM64::PC as i32).unwrap();
    let syscall = get_syscall(uc);
    uc.syscall(syscall);
}

impl<D> Emulator<D> {
    pub fn syscall(&mut self, syscall: syscalls::Syscalls) {
        if self.debug {
            utilities::draw_line();
            self.debug_print(format!("got syscall: {:?}", syscall));
        }
        
        match syscall {

            syscalls::Syscalls::__NR_getpid =>
            {
                self.sys_getpid();
            }
            
            _ => {
                panic!("Syscall {:?} not implemented yet!", syscall);
            }
        }; 
    }

    pub fn empty_syscall_return(&mut self) {
        self.reg_write(RegisterARM64::X0 as i32, 0).unwrap();
    }

    pub fn get_arg(&mut self, num: i32) -> u64 {
        // 'x0', 'x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7'
        match num {
            0 => {
                self.reg_read(RegisterARM64::X0 as i32).unwrap()
            },
            1 => {
                self.reg_read(RegisterARM64::X1 as i32).unwrap()
            },
            2 => {
                self.reg_read(RegisterARM64::X2 as i32).unwrap()
            },
            3 => {
                self.reg_read(RegisterARM64::X3 as i32).unwrap()
            },
            4 => {
                self.reg_read(RegisterARM64::X4 as i32).unwrap()
            },
            5 => {
                self.reg_read(RegisterARM64::X5 as i32).unwrap()
            },
            6 => {
                self.reg_read(RegisterARM64::X6 as i32).unwrap()
            },
            7 => {
                self.reg_read(RegisterARM64::X7 as i32).unwrap()
            },
            _ => {
                panic!("i do not support any more arguments :/");
            }
        }
    }

    pub fn set_return_val(&mut self, value: u64) {
        self.reg_write(RegisterARM64::X0 as i32, value).unwrap();
    }
}
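
The get_syscall function above turns the raw x8 value into an enum with transmute, which is only sound if every possible number has a matching variant. As a hedged alternative, here is a small standalone sketch of a checked conversion. The `Syscall` enum, the `Regs` struct and `dispatch` are hypothetical stand-ins, not Rudroid's types; the numbers are the AArch64 values from include/uapi/asm-generic/unistd.h.

```rust
use std::convert::TryFrom;

/// A handful of AArch64 syscalls (hypothetical subset for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Syscall {
    Write,
    Exit,
    Getpid,
    Mmap,
}

impl TryFrom<u64> for Syscall {
    type Error = u64;

    /// Checked conversion: an unknown number comes back as Err
    /// instead of becoming an invalid enum value.
    fn try_from(nr: u64) -> Result<Self, Self::Error> {
        match nr {
            64 => Ok(Syscall::Write),
            93 => Ok(Syscall::Exit),
            172 => Ok(Syscall::Getpid),
            222 => Ok(Syscall::Mmap),
            other => Err(other),
        }
    }
}

/// Hypothetical stand-in for the emulated CPU: only x0..x8 are modelled.
struct Regs {
    x: [u64; 9],
}

/// Read the syscall number from x8, run a handler, put the result in x0.
fn dispatch(regs: &mut Regs) -> Result<Syscall, u64> {
    let syscall = Syscall::try_from(regs.x[8])?;
    match syscall {
        Syscall::Getpid => regs.x[0] = 1337,
        // Other syscalls are left unhandled in this sketch.
        _ => {}
    }
    Ok(syscall)
}
```

With x8 set to 172, `dispatch` returns `Ok(Syscall::Getpid)` and leaves 1337 in x0. With an unknown number it returns that number as `Err`, which the caller can report in the same "not implemented yet" style as the panic above.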

In main.rs, we parse the command-line arguments to Rudroid: the target ‘Hello World’ ELF and the rootfs folder (the /system/ directory copied from an Android device). We then create an Emulator, load the ELF into memory, and start the CPU loop.

main.rs

extern crate byteorder;
extern crate capstone;
extern crate keystone;
extern crate nix;
extern crate xmas_elf;

mod utilities;
mod core;

use std::env;
use xmas_elf::ElfFile;

use crate::utilities::context_title;

fn parse_args() -> env::Args {
    //! Parse Command line arguments
    let mut args = env::args();

    if args.len() != 3 {
        panic!("Please provide an ELF library and rootfs folder");
    }
    args
}

fn main()
{
    utilities::context_title(Some("Hello, world!"));
    let mut args = parse_args();
    let mut elf_filename = args.nth(1).unwrap();
    let rootfs       = args.next().unwrap();
    
    let mut elf_data    = std::fs::read(&mut elf_filename).unwrap();
    let mut elf: ElfFile        = ElfFile::new(&mut elf_data).unwrap();

    //our hello world program takes no arguments or environment variables
    let program_args: Vec<String>   = vec![];
    let program_env: Vec<String>    = Vec::new();
    
    let endian =  elf.header.pt1.data();
    let mut emu = core::rudroid::Emulator::new( &elf_filename, &rootfs, &mut elf, endian, program_args, program_env, 0, true).expect("Emulator initialisation failed");
    
    //set up hooks
    core::hooks::add_hooks(&mut emu);

    //run linker to load dependencies of ELF and then run the main from ELF
    emu.run_linker();
    emu.run_elf();

    context_title(Some("Emulator created"))
}

We are now ready to execute the ELF binary, except that whenever the binary makes a syscall we have not implemented, we panic with panic!("Syscall {:?} not implemented yet!", syscall);

The best part is that we can do this on the fly: implement whichever syscall the panic above reports, then run again.

Let's compile and link it with Unicorn/Keystone/Capstone.

build:
    RUSTFLAGS="-L /usr/lib/ -lunicorn -L /usr/local/lib/ -lkeystone -Awarnings" cargo run -- /setup/hello  /setup/rootfs/


Now compile and execute with make:

Compile and execute with make

As the screenshot above shows, the emulator panicked with 'Syscall __NR_getpid not implemented yet!'. So, let’s implement the __NR_getpid syscall. According to the getpid documentation at https://man7.org/linux/man-pages/man2/getpid.2.html, it just returns the PID of the calling process. Since we are executing the binary in our own emulator, we can return any number as the PID. Let’s return 1337.

So we create a file unistd.rs inside the syscalls folder and implement the __NR_getpid syscall.

core/android/syscalls/unistd.rs

use std::process;
use crate::core::rudroid::Emulator;

impl<D> Emulator<D> {
    pub fn sys_getpid(&mut self) {
        let pid = 1337;
        self.set_return_val(pid as u64);
    }
}
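
Returning a fixed PID is one option. The other, in the spirit of forwarding syscalls to the host, is to hand back the host process's own PID. Here is a small sketch of both choices side by side. `PidPolicy` and `emulated_getpid` are hypothetical names for this illustration; `std::process::id()` is the standard library call that returns the host PID.

```rust
/// Hypothetical switch between a made-up PID and the host's real one.
enum PidPolicy {
    /// Always report the given PID to the emulated binary.
    Fixed(u64),
    /// Forward the question to the host operating system.
    Host,
}

/// Value that an emulated getpid would place in x0.
fn emulated_getpid(policy: &PidPolicy) -> u64 {
    match policy {
        PidPolicy::Fixed(pid) => *pid,
        PidPolicy::Host => u64::from(std::process::id()),
    }
}
```

A fixed value keeps runs reproducible, which is handy when comparing debug traces. The host value is closer to what a forwarded syscall would return.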


And now we run make again to execute.

Compile and execute with make

As you can see, it logged [DEBUG]: got syscall: __NR_getpid and then panicked with 'Syscall __NR3264_mmap not implemented yet!'. If you are not sure what a syscall does, search for it on Bootlin. __NR3264_mmap is defined in https://elixir.bootlin.com/linux/latest/source/include/uapi/asm-generic/unistd.h#L645

__SC_3264(__NR3264_mmap, sys_mmap2, sys_mmap)

On a 64-bit target such as AArch64, this entry resolves to sys_mmap, so it is the plain mmap syscall. We go ahead and implement it as well.

impl<D> Emulator<D> {
    pub fn sys_mmap(&mut self) {
        let addr    = self.get_arg(0);
        let len     = self.get_arg(1);
        let prot    = self.get_arg(2);
        let flags   = self.get_arg(3);
        let fd  : i32    = self.get_arg(4) as i32;
        let off     = self.get_arg(5) ;

        let aligned_len = self.align_len(len);

        let mut mmap_base = addr;
        let mut need_map : bool = true;

        if addr == 0 {
            mmap_base         = self.mmap_address;
            self.mmap_address = mmap_base + aligned_len;
        }
        
        else {
            need_map = false;
        }
        
        let is_fixed = (flags & MAP_FIXED) != 0;
        if self.debug {
            self.debug_print(format!("mmap_base 0x{:x} length 0x{:x} fixed: {} = ({:x}, {:x})", addr, len, is_fixed, mmap_base, aligned_len as usize));
        }

        if need_map {
            self.mmu_map(mmap_base, aligned_len as usize, Protection::ALL, "[syscall_mmap]", self.null_mut());
        }

        if (( flags & MAP_ANONYMOUS) == 0 ) && fd < MAX_FDS && fd > 0 {
            let mut data = vec![0u8; len as usize];
            self.filesystem.pread(fd, &mut data, off).unwrap();

            let mem_info: &str = &self.filesystem.get_path(fd).unwrap();

            let map_info = MapInfo {
                memory_start    : mmap_base,
                memory_end      : mmap_base+((len+0x1000-1)/0x1000) * 0x1000,
                memory_perms    : Protection::ALL,
                description     : String::from(mem_info),
            };

            self.add_mapinfo(map_info);
            self.write(mmap_base, &data);
        }

        self.set_return_val(mmap_base);
    }
}
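
The two pieces of arithmetic in sys_mmap are the page rounding and the base selection when addr is 0. Here is a minimal standalone sketch of both, assuming 4 KiB pages as in elf_pagesz. `align_up` and `MmapArena` are hypothetical names for this illustration and only approximate what align_len and mmap_address do in Rudroid.

```rust
const PAGE_SIZE: u64 = 0x1000;

/// Round `len` up to the next multiple of the page size.
fn align_up(len: u64) -> u64 {
    (len + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE
}

/// Hypothetical bump allocator that hands out mmap base addresses.
struct MmapArena {
    /// Next free address for mappings the caller did not place itself.
    next: u64,
}

impl MmapArena {
    /// Choose a base address: a zero hint means "pick one for me",
    /// any other hint is used as-is and the arena does not move.
    fn reserve(&mut self, hint: u64, len: u64) -> u64 {
        if hint != 0 {
            return hint;
        }
        let base = self.next;
        self.next = base + align_up(len);
        base
    }
}
```

For example, reserving 0x1800 bytes with a zero hint consumes two pages, so the next anonymous mapping starts 0x2000 bytes higher. Like the code above, this sketch never reuses freed ranges and does not check a non-zero hint against existing mappings.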


And we make again.

Compile and execute with make

Do you see where I am going? I just kept doing this for a few more syscalls until I saw the output ‘Hello World’ 💃🕺💃🕺

Hello World from the binary

Uffff. That’s a long post. Hope it’s useful to someone. Please DM me if I made any booboo.


Don’t “panic” if the code sucks! Code is here: https://github.com/ant4g0nist/rudroid

Resources