diff --git a/articles/f91w/index.html b/articles/f91w/index.html index 82c37b6..f21f897 100644 --- a/articles/f91w/index.html +++ b/articles/f91w/index.html @@ -12,27 +12,12 @@ + + +
++CHERI is an acronym for +Capability Hardware Enhanced RISC Instructions. it is a security-focussed project aimed at +improving memory protection at the hardware level. the project is complex and it has many potential +applications. +
++in this article I will go over some basics to give an understanding of the changes that CHERI +makes to how programs are written and executed. this will focus almost entirely on C, as this +is where my experience lies - it is also where some of the effects of CHERI are most easily felt. +this is going to be a very simple introduction to CHERI, and I'm going to +attempt to explain the basics behind everything I cover. a basic understanding of C will be +beneficial. +
++note: the Morello +platform is an evaluation board produced by Arm to provide a +physical implementation of CHERI extending the Arm +AArch64 ISA. I previously worked on this platform at Arm, +porting the musl C library to +Morello. implementations for CHERI that are worth looking into from a more open perspective +are the MIPS (chapter 4) +and RISC-V (chapter 5) ones. Morello is the only implementation that exists in a true hard core +format, afaik - but this is obviously hard to obtain so you'll just be playing around with +emulators/models anyway. +
+ ++to understand how CHERI tries to fix some simple issues, let's first look at some simplified +examples of issues that arise when we aren't using a CHERI-based architecture. +
+ ++let's take a look at this C code: +
+now let's try using our new program: +
+works on my machine boss! code review +1, and merged... until our good friend + +Hubert Blaine Wolfeschlegelsteinhausenbergerdorff Sr. comes along. he emails me a strange +error he's seen: +
+that's not supposed to happen! his name has spilled over into our my_perfect_string[]
+array! turns out our issue is that when we use fgets()
, we've set the second
+parameter, size
, to 1000
- but our user_name[32]
array can
+only fit 32 characters (and the last of these should be a null terminator, so 31 usable characters).
+fgets
fills up user_name
, but it hasn't finished with the name yet! it
+doesn't care (or know) that user_name
is full, it's just going to keep going until it
+finishes our user input, or reads 999 characters from standard input. and thus it keeps mindlessly
+writing, overwriting the memory we've used to store our precious perfect string (which happens to
+be immediately after user_name
). let's take a look at the stack in GDB to see why this
+happens:
+
+we can see our two character arrays are right next to each other on the stack
+(user_name
contains some gibberish as it is not zero-initialised).
+
+note: this code was compiled with -fno-stack-protector
to reproduce this
+behaviour. compilers have mitigations like the stack protector which can help protect against such attacks,
+but there are often ways around these using more sophisticated attacks.
+
+okay, it's a pretty easy fix, we just need to change the fgets(char *s, int size, FILE *stream)
+parameter size
to 32
.
+
+note: you may initially think "why not 31? don't we need to save a character for
+the null byte at the end?". thankfully, fgets
does this for us. excerpt from man
+fgets
: "fgets() reads in at most one less than size characters from stream and stores
+them into the buffer pointed to by s [...] A terminating null byte ('\0') is stored after the last
+character in the buffer". this is a good question to be asking though, being careful is key when it
+comes to these kinds of things.
+
+okay, so that's an easy fix. why are we talking about doing anything in hardware here? just write +the code correctly! the issue is code gets very complex, and this is a very simplistic situation. +some memory safety bugs can be incredibly complicated and go unnoticed for decades. the C language +especially gives the programmer many, many opportunities to make mistakes - and it only takes one +to be a problem. a lot of the software we are using these days is based on stacks upon stacks of +software written in different languages, and there are going to be bugs in there. CHERI should +give us some protection "for free" (it's not this simple, in actuality). +
++some languages (e.g. Rust) offer you strong memory safety guarantees +at compile-time, but that's not the topic of this article. the trade-offs between doing this +kind of protection in software, in hardware, or in both are beyond the scope of this +article. in addition, CHERI's benefits are broader than just protecting against this +kind of issue. +
+ ++let's quickly recap a basic idea of what a pointer is. we're going to ignore things like +virtual memory for brevity. we can think +of a pointer in a normal 64-bit architecture (e.g. AArch64) simply as a 64-bit unsigned value that +holds the memory address of something we care about. this is a simplification (as are most things), +but it can help us reason about the general idea: +
+and on these normal architectures, this pointer generally is just a number. we can do weird things +with it, treating it as a number... +
...and this code will often still work:
+yikes! now, when you start messing with pointers like this, you're bound to run into a bunch of
+undefined behaviour. but C programmers write undefined behaviour all the time, and my computer
+executes this program fine without complaining at all. doesn't it feel a bit weird that we can take
+a pointer to arr[0]
and modify it to load secret
? they're not even part
+of the same array...
+
+CHERI introduces capabilities, which can be thought of as an extension to pointers. they still +store an address of something we care about, but they have extra information too! in a 64-bit +system, a pointer would typically be a 64-bit value (as discussed previously). the corresponding +capability in a CHERI platform is 128 bits (or 129 bits if you look at it a certain way, more about +that later...). +
++as you might have guessed, this "extra information" takes up 64 bits of the capability. bits are +assigned to three key pieces of metadata: bounds, permissions, and +object type. there is also an additional 1-bit tag which is stored out-of-band: it is +not a 129-bit value - instead each 128-bit capability can be thought of as being associated with a +1-bit validity tag. the architecture manages this. the diagram below is provided as a rough +overview of this. note that it is not to scale. +
+ + ++I am mostly going to focus on bounds in this article, as it is not too difficult to grasp, +and the impact is fairly easy to demonstrate for some simple examples. the bounds represent an +upper and lower bound on the memory region (address space) that this capability is allowed to +access. if we try to use the capability to access some address outside of this range, the hardware +will throw a fault - it simply won't let us do this! +
++note: I am going to oversimplify the way the bounds are +stored in this article, especially in the diagram above. in reality, there is a complex +compression method, needed because full-width bounds would not fit in the space available. the encoding depends on the +address value, alignment, etc. for now, we shouldn't need to think about this much, just know it +will be managed for us. the key take-away from this is that bounds can't always be 100% precise +for all addresses and ranges. +
+
+can you imagine how we can use bounds to prevent our previous memory safety bug from occurring? the
+key is that we can set the bounds on the capability pointing to user_name
which we
+pass to fgets
, such that the capability may only access the contents of the array.
+this means that when fgets
tries to write past the end of the user_name
+array, the processor will throw a capability fault, and execution of our program will cease.
+
+the idea behind CHERI is that we don't have to set up these bounds ourselves. this is something the
+compiler can generate code for. the compiler knows that the user_name
array has a
+length of 32
, and can set the bounds accordingly on capabilities created that point to
+it. let's try it...
+
+unless you're lucky enough to have access to a physical Morello board, there is the issue of +actually using a CHERI implementation. for this article I will be making use of the +QEMU emulator to emulate a +RISC-V CHERI environment. running +CheriBSD on this emulator will allow us to have a nice +FreeBSD-based capability-enabled environment to play around +with. I'll use cheribuild to easily get set +up: +
+now we have our shell inside our CheriBSD emulated platform, we can start to try things out. let's
+compile our membug
program again, this time with the toolchain targeting
+CheriBSD RISC-V - this will have been built as part of the dependencies already. once it's built,
+we can scp
it over to the CheriBSD filesystem, as we set up the SSH forwarding port to
+1111
.
+
+and now we can see what happens when we explore our bug with CHERI: +
+it's working! we are getting a capability fault as we exceed the bounds of the
+user_name
capability. we can use gdb to verify that this is
+caused by the bounds fault:
+
+as we can see, the bounds for our user_name
capability (which is stored in capability
+register ca6
) are 0x3fffdfff44-0x3fffdfff64
, but the address is
+0x3fffdfff78
. this is out of the bounds allowed by the capability, so the architecture
+throws a fault. if we look at the assembly generated by the compiler, we can see it set our
+capability bounds to a size of 32 to enforce this behaviour:
+
+at this point you may be thinking "okay, that's great, but if we can just set +the bounds of a capability with an instruction then what's the point? surely +I can just set global bounds on some random pointer and access whatever I want?" +
+
+fundamental to the idea of capabilities is their provenance and
+monotonicity. simply put, the first says we can only construct a
+capability using specific instructions, from an existing capability. we can't
+just create a capability from some random number. let's see what happens when
+we try to run our ptrs_as_numbers
program on CheriBSD:
+
+we can see we get a fault - the tag isn't set. any capability with a tag not
+set to 1 cannot be dereferenced - it is invalid. in fact, this capability has
+no capability metadata - when we copied it into our unsigned long
,
+we just copied the 64-bit address.
+
+monotonicity is what stops us taking an existing capability, and +creating a capability with more permissions and/or access than the original. it +stipulates that when we create a capability from another capability (which we +have to do - provenance), the permissions and bounds of the new capability must +be equal to or less than the original. so our bounds can only get narrower as +we create new capabilities from an existing capability. this means that +capabilities trace back in a chain - they are all created from other +capabilities, and narrowed as necessary. in this case, (simplified) when the +kernel loads our program it will give us capabilities that are wide enough to +do everything we need to do, and the compiler will try to make sure all the +capabilities that we make and use from these are as tightly bounded and +unpermissive as possible. +
+
+you'll notice we got a lot of these benefits "for free". we only had to
+recompile our code, and we got this extra security. of course, CHERI does
+require changes to programs. naturally, the compiler had to be changed a lot to
+implement this behaviour. it also especially requires changes to things like
+the C library and kernel in order to take advantage of the features fully.
+sufficiently large userspace programs do need changes too. one common issue is
+that a lot of existing C code assumes that
+sizeof(void *) == sizeof(size_t)
. with CHERI, our pointers are
+now twice as big. however, size_t
hasn't changed size, as the
+address space size hasn't changed - for example, if we index into an array with
+size_t
, the index should still be the same size; the extra data in
+our void *
capability is the metadata, not extra address data. any
+program that tries to convert from some unsigned long
or
+size_t
to a capability and then dereference it will fault - this violates provenance. so,
+sometimes code changes have to be made to ensure we are keeping the capability
+metadata around.
+
+ ++I appreciate this has been a fragmented and surface-level introduction to +CHERI. hopefully it has still explained some of the basic aims of CHERI. +the potential benefits and uses of CHERI go much deeper than anything +I've touched on here, so please, read more about everything - and get your +hands dirty messing about with QEMU and CheriBSD! +
++here are some links to check out: +
+ +