Dynamically Load Assembler Code in R

IMPORTANT: If the title isn’t triggering enough, it should go without saying that this is a cool trick but terrible idea xD

The code is published along with the typed-sexp library.

Introduction

Imagine this frustration: You are not on your personal support computer, maybe a compute cluster or a friend is asking you ‘why doesn’t this work?’. You need a little native code to call a system API or do some heavy number crunching. You found that you don’t have a working compiler that links with R! You can’t even install a compiler because it’s not your computer. And of course with this all traditional “inline” libraries won’t work. How horrible! There has to be a way around this, right?

Introducing: rasm! A portable R extension that loads object files and executes them.

Motivation

None.

Well, maybe one, malicious compliance: you see, CRAN has a policy that:

Source packages may not contain any form of binary executable code.

It does’t say anything about assembling ASM and loading it into the address space of the R process, right? :D

Methodology

This section assumes you have a basic understanding of X86-64 assembly, memory paging and knows the memory representation of R objects as well as how R interacts with native code using the C FFIs. The last part can be learned from Writing R Extensions.

Getting it to run

Firstly, let’s make a short function that needs nothing but a stack and registers to run:

1
2
3
4
section .text
    id: ; id <- function(x) x
        mov rax, rdi
        ret

This will take an R object and return it back.

This is not a difficult task for low-level developers, but for R users, I will provide an explanation on how this is done:

The assembler code need to be translated to machine code. This is done by an assembler, in this case NASM. This is a pure text-to-binary translation and there is no external dependency required. The resulting binary is called an object file, in Linux it has the .o extension and its format is called “ELF”.
The object file is a container format that contains various “sections”, each representing a different part of the program. In this case, the only section we defined is .text, which means executable code.
We need to put this code into memory so it can be executed. Memory is divided into pages, and each page has its own permissions. A page cannot be both writable and executable at the same time, so we need to request a new page mmap(), copy the code into it, and change the permissions to be executable mprotect(). Since we only have one section, we can just copy the .text section into the new page.
$$ \text{function address} = \text{page base address} + \text{offset} $$
We record all function offsets and their names in a table, and then move memory management for this table and the allocated page to R by using an “external pointer” (EXTPTRSXP) R object. This allows R to automatically clean up these resources when they are no longer needed.

Making it useful

Just being able to run functions is not cool enough. I want to write assembly that can do anything a regular compiled shared library can do. Let’s write a more complex example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
section .rodata
    hello db "Hello, World!", 0

section .data
    counter dd 0

section .text
    extern Rf_ScalarInteger
    extern R_ShowMessage

    get_counter: ; SEXP(void)
        inc DWORD [counter]
        mov rdi, [counter]
        xor eax, eax
        call Rf_ScalarInteger
        ret
    
    hello_world: ; SEXP(SEXP)
        push rdi
        mov rdi, hello
        call R_ShowMessage
        pop rax
        ret

If we assemble this code and inspect it:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
Disassembly of section .text:

0000000000000000 <get_counter>:
   0:   ff 04 25 00 00 00 00    inc    DWORD PTR ds:0x0
   7:   48 8b 3c 25 00 00 00    mov    rdi,QWORD PTR ds:0x0
   e:   00 
   f:   31 c0                   xor    eax,eax
  11:   e8 00 00 00 00          call   16 <get_counter+0x16>
  16:   c3                      ret

0000000000000017 <hello_world>:
  17:   57                      push   rdi
  18:   48 bf 00 00 00 00 00    movabs rdi,0x0
  1f:   00 00 00 
  22:   e8 00 00 00 00          call   27 <hello_world+0x10>
  27:   58                      pop    rax
  28:   c3                      ret

It’s all zeros! But if you think about it, it makes sense. How will the assembler know the location of the counter relative to the get_counter function? It can’t because it is the linker’s job to put these sections together and we don’t have a linker. So we need to do this manually.

The thing we need to do is called relocation. Which means adapting the code to it’s actual loaded address by filling in these blanks the assembler left. We can list the relocations we need to do with readelf -r:

1
2
3
4
5
6
7
8
$ readelf -r get_counter.o
Relocation section '.rela.text' at offset 0x430 contains 5 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000003  00030000000b R_X86_64_32S      0000000000000000 .data + 0
00000000000b  00030000000b R_X86_64_32S      0000000000000000 .data + 0
000000000012  000900000002 R_X86_64_PC32     0000000000000000 Rf_ScalarInteger - 4
00000000001a  000200000001 R_X86_64_64       0000000000000000 .rodata + 0
000000000023  000a00000002 R_X86_64_PC32     0000000000000000 R_ShowMessage - 4

We see that we have 5 “blank” to fill in. So how do we do this?

Let define the problem more formally:

There is an incomplete instruction in the code which references to a location we only know at runtime.
We need to be able to figure out what exactly the relocation is asking for, resolve the address and patch the instruction so that it points to the correct location.

Let’s try to read the first relocation, the important part is:

Offset: Where do I want the relocation to be applied?
Type: What kind of application do I want to do?
Sym. Value: What is the current address of the relocation? (We haven’t relocated yet so this is 0)
Sym. Name: What is the name of the thing I want to reference?
Addend: Where exactly do I want the address of, relative to the symbol?

Formally defined:

$$ \text{TransformFunc} :: \text{Ptr} \Rightarrow \text{Ptr} \Rightarrow \text{Ptr} \\ \text{transform} := \text{TransformOf}(\text{Type}) :: \text{TransformFunc} \\ \text{symValue} := \text{addrOf(referee(Sym. Name))} + \text{addend} \\ \text{dest} := \text{AddrOf(.text)} + \text{Offset} \\ \text{poke}(\text{dest} \lArr \text{transform}(\text{symValue}, \text{dest})) $$

Now let’s figure out how the transformation works:

We will look at the simpler of the two first, the R_X86_64_64: The name simply means “X86-64 platform, 64-bit absolute relocation”. So the transformation is simply:

$$ \text{transform64Abs } \text{symValue } \text{dest} = \text{symValue} $$

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
pub struct X6464Applicator {
    pub r_offset: u64,
    pub value: u64,
}

impl X6464Applicator {
    fn new(r_offset: u64, sym_val: usize, addend: i64) -> Self {
        let value = apply_addend(sym_val, addend) as u64;
        Self { r_offset, value }
    }
}

impl Applicator for X6464Applicator {
    fn apply_on_page(&self, dest: &Page) {
        unsafe {
            dest.as_ptr()
                .cast::<u64>()
                .byte_add(self.r_offset as usize)
                .write_unaligned(self.value);
        }
    }
}

The other one is R_X86_64_PC32: This means “X86-64 platform, 32-bit PC-relative relocation”. This is a bit more complicated. The PC-relative means that the address is relative to the current instruction. So the transformation is:

$$ \text{transform32PC } \text{symValue } \text{dest} = \text{DWORD}(\text{symValue } - \text{dest}) $$

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
pub struct X64PC32Applicator {
    pub r_offset: u64,
    pub value: i64,
}

impl X64PC32Applicator {
    fn new(r_offset: u64, sym_val: usize, addend: i64) -> Self {
        let value = apply_addend(sym_val, addend) as i64;
        Self { r_offset, value }
    }
}

impl Applicator for X64PC32Applicator {
    fn apply_on_page(&self, dest: &Page) {
        let pc = dest.as_ptr() as i64 + self.r_offset as i64;
        unsafe {
            dest.as_ptr()
                .cast::<u32>()
                .byte_add(self.r_offset as usize)
                .write_unaligned(self.value.wrapping_sub(pc) as u32);
        }
    }
}

The last one is R_X86_64_32S: This means “X86-64 platform, 32-bit sign-extended relocation”. It is the 32-bit version of the 64-bit absolute relocation. The transformation is:

$$ \text{transform32S } \text{symValue } \text{dest} = \text{DWORD}(\text{symValue}) $$

This one is a tricky one because we don’t have control over how big $\text{symValue}$ is. We need to make sure that the sign extension is done correctly. This is done by checking whether sign-extension yield the same value as the original. If it doesn’t, we will need to fail. There is a nasm warning option that can be used to make sure these instructions are not generated (-Wreloc-abs-dword).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
impl X6432Applicator {
    fn new(r_offset: u64, sym_val: usize, addend: i64, sign_extend: bool) -> Self {
        let target_value = apply_addend(sym_val, addend);
        let sign32 = target_value & 0x8000_0000;
        let sign_extend_mask: usize = if sign_extend && sign32 != 0 {
            !0 >> 31 << 31
        } else {
            0
        };
        // We don't really have a say on where our page is mapped, so pointers can be really far
        assert_eq!(
            target_value,
            (target_value & 0xffff_ffff) | sign_extend_mask,
            "Relocation overflow, solution: use 64-bit instructions when accessing relocated data"
        );
        Self {
            r_offset,
            value: target_value as u32,
        }
    }
}

impl Applicator for X6432Applicator {
    fn apply_on_page(&self, dest: &Page) {
        unsafe {
            dest.as_ptr()
                .cast::<u32>()
                .byte_add(self.r_offset as usize)
                .write_unaligned(self.value);
        }
    }
}

The problem is some instructions in x86-64 does not take 64-bit address operands, so we need to write them in a different way to make it work. My solution is, for relocations within the assembly file, use relative addressing with [rel my_data] syntax. This way, the assembler will generate R_X86_64_PC32 relocations, which we can make sure will fit. For external symbols, we will use 64-bit addressing by movabs into a temporary register first and then use the register from there.

Lastly, a short note about computing $\text{symValue}$, within the assembly file, the value is simply the loaded address of that section plus the offset of the symbol. For external symbols, we need to use the dlsym function to get the address of the symbol.

Calling from R

We wrap everything we need from above into a struct:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
#[derive(Debug)]
pub struct AsmFunction<I: ISA> {
    text: Page,
    #[allow(unused)]
    data: Option<Page>,
    #[allow(unused)]
    rodata: Option<Page>,
    /// Function offset table.
    func: HashMap<String, usize>,
    _pin: PhantomPinned,
    _isa: std::marker::PhantomData<I>,
}

We put this into a Box, and then into an R external pointer so that R can tell us when it’s time to clean up.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
#[export_name = "assemble"]
/// R external function to assemble a string into a module.
pub extern "C" fn assemble(input: SEXP) -> SEXP {
    let input = input
        .downcast_to::<CharacterVectorSEXP<_>>()
        .expect_r("input is not a string")
        .protect();

    if input.len() != 1 {
        Err::<(), _>("Expected a single string").unwrap_r();
    }
    let f = Box::new(
        AsmFunction::<X64ISA>::assemble(&input.get_elt(0).to_string())
            .expect_r("Failed to assemble"),
    );

    let ptr_inner = CharacterVectorSEXP::scalar("<asm_function>").protect();

    let ptr = Ptr::<SEXP, AsmFunction<X64ISA>>::wrap_boxed(f, r_nil(), ptr_inner);

    ptr.get_sexp()
}

Then, we do some macro magic to generate calling wrappers:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
macro_rules! generate_asmcall {
    ($($name:ident( $( $arg_name:ident: $arg_ty:ident ),*))*) => {
        $(
            /// Call a function by name.
            #[cfg_attr(feature = "clobber_less", inline(never))] // let Rust clean up the registers as this function returns
            #[cfg_attr(not(feature = "clobber_less"), inline)]
            pub unsafe fn $name<R $(, $arg_ty)*>(&self, name: &str $(, $arg_name: $arg_ty)*) -> R {
                    let func =
                        self.text.as_ptr()
                            .byte_add(*self.func.get(name).expect("Function not found"));

                    let func = std::mem::transmute::<*const _, extern "C" fn($($arg_ty),*) -> R>(func);

                    log::debug!("Calling asm function {} at {:p}", name, func);

                    func($($arg_name),*)
            }
        )*
    }
}

Demo

Let’s see some demos in action!

Glue Code

The library is really simple, just equivalent to this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
dyn.load("librasm.so")

assemble <- function(asm, flavor = "nasm", isa = "x86_64") {
    if (flavor != "nasm") {
        stop("Only NASM is supported at the moment!")
    }
    if (isa != "x86_64") {
        stop("Only x86_64 is supported at the moment!")
    }
    .Call("assemble", asm)
}

.Asm <- function(box, name, ...) {
    .Call("asm_call", box, name, list(...))
}

ForkR

For example, you want to call fork system call in Linux, you write this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
wait <- function(pid) {
    exit_labels <- c("exited", "killed", "dumped", "trapped", "stopped", "continued")
    print(sprintf("I am the parent R process! My child is: %d", pid))
    status <- .Asm(asm, "waitpidr", pid) # Dummy, an exercise for the reader
    print(sprintf("My child exited with status: %s!", exit_labels[status]))
}

code <- file("forkR.s", "r")

asm <- assemble(paste(readLines(code), collapse = "\n"))

pid <- .Asm(asm, "forkr")

if (pid == 0) {
    print(sprintf("I am the child R process! My PID is: %d", Sys.getpid()))
    print("Crashing!")
    .Asm(asm, "crashpls")
}

wait(pid)

Output:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
Forking a child process...
I'm the parent process from ASM!
I'm the child process from ASM!
[1] "I am the parent R process! My child is: 218008"
[1] "I am the child R process! My PID is: 218008"
[1] "Crashing!"
Crashing by executing UD2 in 3... 2... 1...

 *** caught illegal operation ***
address 0x7e8c0bf561ee, cause 'illegal operand'

Traceback:
 1: .Asm(asm, "crashpls")
An irrecoverable exception occurred. R is aborting now ...
[1] "My child exited with status: dumped!"

ASM code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
.section .rodata
    notice_msg db "Forking a child process...", 0
    ; omitted for brevity

%define SIGINFO_T_SIZE 128
; more %define omitted for brevity

.section .text
forkr: ; SEXP(void)
    mov rdi, notice_msg
    call R_ShowMessage
    xor eax, eax

    mov rax, sys_fork
    syscall
    test rax, rax
    js .error
    push rax
    
    mov edi, eax
    call Rf_ScalarInteger
    mov rdi, rax
    call Rf_protect

    push rax

    mov rdi, success_parent_msg
    mov r8, success_child_msg
    mov rcx, [rsp + 8]
    test rcx, rcx
    cmovz rdi, r8
    mov rsi, rax
    call R_ShowMessage
    xor eax, eax

    mov rdi, 0x1
    call Rf_unprotect

    pop rax
    pop rcx
    ret

waitpidr: ; SEXP(SEXP)
    call INTEGER; now %rax is the pointer to the pid
    mov r12, [rax]
    mov rax, 0

    mov rdi, P_PID
    mov rsi, r12
    sub rsp, SIGINFO_T_SIZE
    mov rdx, rsp
    mov r10, WEXITED
    xor r8, r8

    mov rax, sys_waitid
    syscall
    test rax, rax
    js .error

    xor rdi, rdi
    mov edi, DWORD [rsp + SIGINFO_T_CODE_OFFSET]
    add rsp, SIGINFO_T_SIZE
    jmp Rf_ScalarInteger

crashpls: ; !(void)
    ud2

SabotageR

You can also “sabotage” your R program by modifying the language itself :D

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
code <- file("sabotageR.s", "r")

asm <- assemble(paste(readLines(code), collapse = "\n"))

y = 0 # I don't like this!!! Do something about it!

invisible(.Asm(asm, "sabotage", "="))

x <- 1
print(sprintf("`<-` still works! x is now: %d", x))
# [1] "`<-` still works! x is now: 1"

tryCatch({
    y = 2
}, error = function(e) {
    print(e) # <simpleError in y = 2: This is R, use <- instead of `=` :D
}, finally = {
    print(sprintf("y is still: %d", y)) # y is still: 0
})

# Error in y = 2 : This is R, use <- instead of `=` :D
# Execution halted

This works under the hood by modifying the built-in function table. However sky is the limit here: we are physically in a different page running machine code, we can just reprotect the R binary, patch it on the fly, protect it back and return to R.

Here’s the ASM code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98

%define FUNTAB_SIZE 40
%define SEXPREC_HEADER_LEN 32

section .rodata
    crashpls_msg db "Something's seriously wrong, crashing by executing UD2 in 3... 2... 1...", 0
    do_set_not_found_msg db "do_set not found in the function table.", 10, 0
    no_eq_sign_msg db "This is R, use <- instead of `=` :D", 0
    equal_sign db "=", 0
    fmt_s db "%s", 0
    fmt_s_nl db "%s", 10, 0

section .data
    real_do_set dq 0
    eq_assign_call_no dq 0

section .text

    extern Rf_errorcall
    extern Rf_error
    extern Rf_ScalarInteger
    extern R_ShowMessage
    extern strcmp
    extern R_FunTab
    extern R_CHAR
    extern Rf_asChar

sabotage: ; SEXP(SEXP) 
    ; for the operator named in the parameter
    ; replace its entry in the function table with a custom wrapper
    call Rf_asChar
    mov rdi, rax
    call R_CHAR
    mov r14, rax
    mov r12, R_FunTab ; lea is too far away :(
    sub r12, FUNTAB_SIZE
    mov r13d, 0
    .loop:
        inc r13d
        add r12, FUNTAB_SIZE
        mov rax, [r12]
        test rax, rax
        jz .notfound
        mov rdi, rax 
        mov rsi, r14 
        call strcmp
        test rax, rax
        jnz .loop

        lea rax, [r12 + 8] ; get the function pointer
        mov rcx, real_do_set
        mov [rcx], rax ; save the original function pointer
        dec r13d
        mov rcx, eq_assign_call_no
        mov [rcx], r13d ; save the index

        mov r13, __patched_do_set
        mov DWORD [r12 + 8], r13d ; patch the table, evil >:)

        mov rdi, 0x1
        jmp Rf_ScalarInteger
    .notfound:
        xor rdi, rdi
        jmp Rf_ScalarInteger

__patched_do_set: ; SEXP(SEXP, SEXP, SEXP, SEXP) // rdi is the call, rsi is the discr
    push rdi
    push rsi
    push rdx
    push rcx

    xor rcx, rcx
    mov ecx, DWORD [rsi + SEXPREC_HEADER_LEN]
    mov r12, eq_assign_call_no ; just a demo, not really complete
    cmp rcx, [r12]
    je .is_equal_sign
    
    pop rcx
    pop rdx
    pop rsi
    pop rdi

    mov r12, real_do_set
    jmp [r12]
    jmp crashpls
    
    .is_equal_sign:
        push rcx
        xor eax, eax
        mov rsi, fmt_s_nl
        mov rdx, no_eq_sign_msg
        call Rf_errorcall
        jmp crashpls

crashpls:
    mov rdi, crashpls_msg
    call R_ShowMessage
    ud2

Conclusion

This is a fun idea and I learned a lot about low-level programming and debugging. I guess it suits my interest since I was formerly an analyst in cybersecurity where we really do things as hacky as this and now I do data science, I am having some giggles on making R support inline assembly :D