---
layout: post
title: "CHERI"
---
## preamble
[CHERI](https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/) is an acronym for Capability Hardware Enhanced RISC Instructions. it is a security-focussed project aimed at improving memory protection at the hardware level. the project is complex and it has many potential applications.
in this article I will go into some basics to give an understanding behind some changes that CHERI makes to how programs execute and are written. this will be focussed almost entirely on C, as this is where my experience lies - it is also where some of the effects of CHERI are most easily felt. this article is going to be a _very simplistic_ introduction to CHERI, and I'm going to attempt to explain the basics behind everything I cover. a basic understanding of C will be necessary.
***note:*** [the Morello platform](https://www.arm.com/architecture/cpu/morello) is an evaluation board produced by [Arm](https://www.arm.com/) to provide a physical implementation of CHERI extending [the Arm AArch64 ISA](https://en.wikipedia.org/wiki/AArch64). I previously worked on this platform at Arm, [porting the musl C library to Morello](https://git.morello-project.org/morello/musl-libc/). implementations for CHERI that are worth looking into from a more open perspective <a href="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-951.pdf"> are the MIPS (chapter 4) and RISC-V (chapter 5) ones</a>. Morello is the only implementation that exists in a true hard core format, afaik - but this is obviously hard to obtain so you'll just be playing around with emulators/models anyway.
## memory safety bugs
to understand how CHERI tries to fix some simple issues, we'll first look at some simplified examples of issues that arise when we aren't using a CHERI-based architecture.
### a simple memory safety bug
let's take a look at this C code:
{% highlight c linenos %}
{% include_relative code/membug.c %}
{% endhighlight %}
and try running the compiled output of said program:
{% highlight console %}
$ ./membug
enter your name: jack
hello jack
my_perfect_string: what a beautiful string
{% endhighlight %}
works on my machine boss! code review +1, and merged...

...until our good friend [Hubert Blaine Wolfeschlegelsteinhausenbergerdorff Sr.](https://en.wikipedia.org/wiki/Hubert_Blaine_Wolfeschlegelsteinhausenbergerdorff_Sr.) comes along. he emails me a strange error he's running into:
{% highlight console %}
$ ./membug
enter your name: Hubert Blaine Wolfeschlegelsteinhausenbergerdorff Sr.
hello Hubert Blaine Wolfeschlegelsteinhausenbergerdorff Sr.
my_perfect_string: hausenbergerdorff Sr.
{% endhighlight %}
***note:*** if you compile and run this on your machine, you may not get the same output. that's because we're invoking *undefined behaviour* here, so the compiler can kind of do whatever it wants. I'll always provide the output that demonstrates what I'm trying to show when giving examples like this. for what it's worth, I'm running `clang 10.0.0-4ubuntu1` with target `x86_64-pc-linux-gnu`. compilation options, the `Makefile`, and such are available in the [code subdirectory of this article's source](https://github.com/jackbondpreston/jackbondpreston.github.io/tree/master/_posts/cheri/code).

that's not supposed to happen! his name has spilled over into our `my_perfect_string[]` array! turns out our issue is that when we use `fgets(char *s, int size, FILE *stream)`, we've set the second parameter (`size`) to `1000` - but our `user_name[32]` array can only fit 32 characters (and the last of these should be a null terminator, so 31 usable characters).

`fgets()` fills up `user_name`, but it hasn't finished with the name yet! it doesn't care (or know) that `user_name` is full, it's just going to keep going until it finishes our user input, or reads 999 characters from standard input. thus it keeps mindlessly writing, overwriting the section of memory we've used to store our precious perfect string (which happens to be immediately after `user_name`).

`fgets()` has a cousin, `gets(char *s)`, which is particularly poor with regards to memory safety, [and has largely been moved away from in modern C](https://linux.die.net/man/3/fgets):

> LSB deprecates `gets()`. POSIX.1-2008 marks `gets()` obsolescent. ISO C11 removes the specification of `gets()` from the C language, and since version 2.16, glibc header files don't expose the function declaration if the `_ISOC11_SOURCE` feature test macro is defined.

let's take a look at the stack in GDB to see how this happens:
{% highlight plaintext %}
(gdb) b membug.c:7
(gdb) run
Breakpoint 1, main () at membug.c:7
7 printf("enter your name: ");
(gdb) n
8 fgets(user_name, 1000, stdin); // get user's name from stdin
(gdb) n
9 printf("hello %s", user_name);
(gdb) x/56bc $sp
0x7fffffffdbf0: 106 'j' 97 'a' 99 'c' 107 'k' 10 '\n' 0 '\000' 0 '\000' 0 '\000'
0x7fffffffdbf8: 77 'M' 82 'R' 85 'U' 85 'U' 85 'U' 85 'U' 0 '\000' 0 '\000'
0x7fffffffdc00: -24 '\350' -78 '\262' -5 '\373' -9 '\367' -1 '\377' 127 '\177' 0 '\000' 0 '\000'
0x7fffffffdc08: 0 '\000' 82 'R' 85 'U' 85 'U' 85 'U' 85 'U' 0 '\000' 0 '\000'
0x7fffffffdc10: 119 'w' 104 'h' 97 'a' 116 't' 32 ' ' 97 'a' 32 ' ' 98 'b'
0x7fffffffdc18: 101 'e' 97 'a' 117 'u' 116 't' 105 'i' 102 'f' 117 'u' 108 'l'
0x7fffffffdc20: 32 ' ' 115 's' 116 't' 114 'r' 105 'i' 110 'n' 103 'g' 0 '\000'
{% endhighlight %}
we can see our two character arrays are right next to each other on the stack (`user_name` contains some gibberish as it is not zero-initialised).
***note:*** this code was compiled with `-fno-stack-protector` to reproduce this behaviour. compilers have certain techniques which can help protect against such attacks, but there are often ways around these by using less primitive attacks. we are just ignoring these in this article for simplicity.

okay, at least it's a pretty easy fix: we just need to change the `fgets()` parameter `size` to `32`.

***note:*** you may initially think "why not `31`? don't we need to save a character for the null byte at the end?". thankfully, `fgets` does this for us. excerpt from `man fgets`:

> `"fgets() reads in _at most one less than size_ characters from stream and stores them into the buffer pointed to by s [...] A terminating null byte ('\0') is stored after the last character in the buffer".`

this is a good question to be asking though, being careful is key when it comes to these kinds of things.
### why hardware?
okay, that wasn't too bad. why are we talking about doing anything in hardware here? just write the code correctly!
we've looked at a very simplistic situation, with no real stakes and nothing to exploit (and an unrealistically simple bug). if this bug was exploitable for malicious gain, it could already be too late by the time we found it.
memory safety problems account for the majority of serious security bugs in large C and C++ codebases. the Chromium project [found 70% of its serious security bugs were memory safety related](https://www.chromium.org/Home/chromium-security/memory-safety/) and [Microsoft found the same prevalence](https://msrc-blog.microsoft.com/2019/07/16/a-proactive-approach-to-more-secure-code/). some memory safety bugs can be incredibly complicated and go unnoticed for decades. the C language especially gives the programmer many, many opportunities to make mistakes - and it only takes one to be a problem. a lot of the software we are using these days is based on layers upon layers of software written in different languages, and there are going to be bugs in there. CHERI aims to give us some protection at a hardware level.

***note:*** some languages (e.g. Rust) are going to offer you strong memory safety guarantees at compile-time, but I'm not going to include the discussions around this and how it compares to CHERI in this article. this article will be focussed on how CHERI applies to C (and to some extent, C++ by extension).
## pointers recap
let's quickly recap a basic idea of what a pointer is. we're going to ignore things like [virtual memory](https://en.wikipedia.org/wiki/Virtual_memory) for brevity. we can think of a pointer in a normal 64-bit architecture (e.g. AArch64) simply as a 64-bit unsigned value that holds the memory address of something we care about. this is a simplification (as are most things), but it can help us reason about the general idea:
{% highlight c %}
int val = 1593;
int *x = &val; // x points to val
{% endhighlight %}
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1920 314"><defs><style>.prefix__prefix__d{fill:none;stroke-miterlimit:10}.prefix__prefix__f,.prefix__prefix__h,.prefix__prefix__i{font-size:24px}.prefix__prefix__f,.prefix__prefix__h,.prefix__prefix__k{fill:#fcfcfc}.prefix__prefix__f,.prefix__prefix__l{font-family:Source Code Pro}.prefix__prefix__d{stroke:gray;stroke-width:4px}.prefix__prefix__h,.prefix__prefix__m{font-family:Source Code Pro;font-weight:700}.prefix__prefix__i{fill:gray}</style></defs><g id="prefix__prefix__a"><path fill="#0c1114" d="M0 0h1920v314H0z"/><text class="prefix__prefix__h" transform="translate(577.46 133.41)"><tspan x="0" y="0">int *x</tspan></text><text class="prefix__prefix__f" transform="translate(490.97 177.1)"><tspan x="0" y="0">0x0000010000000004</tspan></text><path d="M481.16 206v18.5M760.5 206v18.5m-279 0h279" stroke="#fcfcfc" fill="none" stroke-miterlimit="10" stroke-linecap="square" stroke-width="3"/><text transform="translate(578.78 241.33)" font-size="20" font-family="Source Code Pro" fill="#fcfcfc"><tspan x="0" y="0">address</tspan></text><path stroke-width="4" stroke="#fcfcfc" fill="none" stroke-miterlimit="10" d="M752 171h204.56"/><path class="prefix__prefix__k" d="M948.64 182.62L992 171.01l-43.36-11.63v23.24z"/><text transform="translate(1272.76 177.16)" fill="#fcfcfc" font-size="24"><tspan class="prefix__prefix__m" x="0" y="0">mem[</tspan><tspan class="prefix__prefix__l" x="57.6" y="0">0x0000010000000004</tspan><tspan class="prefix__prefix__m" x="316.79" y="0">]</tspan></text><text class="prefix__prefix__i" transform="translate(1272.76 133.16)"><tspan class="prefix__prefix__m" x="0" y="0">mem[</tspan><tspan class="prefix__prefix__l" x="57.6" y="0">0x0000010000000000</tspan><tspan class="prefix__prefix__m" x="316.79" y="0">]</tspan></text><text class="prefix__prefix__i" transform="translate(1271.76 224.16)"><tspan class="prefix__prefix__m" x="0" y="0">mem[</tspan><tspan class="prefix__prefix__l" x="57.6" 
y="0">0x0000010000000008</tspan><tspan class="prefix__prefix__m" x="316.79" y="0">]</tspan></text></g><g id="prefix__prefix__b"><path class="prefix__prefix__d" d="M1260 58v48H985V58"/><path d="M1258 195v40H987v-40h271m4-4H983v48h279v-48zm-4-84v40H987v-40h271m4-4H983v48h279v-48z" fill="gray"/><path class="prefix__prefix__k" d="M756.16 150.93v40h-271v-40h271m4-4h-279v48h279v-48zM1258 151v40H987v-40h271m4-4H983v48h279v-48z"/><text class="prefix__prefix__f" transform="translate(1094 177.09)"><tspan x="0" y="0">1593</tspan></text><text class="prefix__prefix__h" transform="translate(1007.6 45.16)"><tspan x="0" y="0">memory (as ints)</tspan></text><path class="prefix__prefix__d" d="M1260 284v-48H985v48"/></g></svg>
and on these normal architectures, this pointer generally is just a number. we can do weird things with it, treating it as a number...
{% highlight c linenos %}
{% include_relative code/ptrs_as_numbers.c %}
{% endhighlight %}
...and this code will often still work:
{% highlight console %}
$ ./ptrs_as_numbers
*(7fff98640c20)=1234
*(7fff98640c24)=5678
*(7fff98640c28)=9999
{% endhighlight %}
yikes! now, when you start messing with pointers like this, you're bound to run into a bunch of undefined behaviour. but C programmers write undefined behaviour all the time (and not always by accident), and my computer executes this program fine without complaining at all. doesn't it feel a bit weird that we can take a pointer to `arr[0]` and modify it to load `secret`? they're not even part of the same array...
## introducing capabilities

CHERI introduces capabilities, which can be thought of as an extension to pointers. they still store an address of something we care about, but they have extra information too! in a 64-bit system, a pointer would typically be a 64-bit value (as discussed previously). the corresponding capability in a CHERI platform is 128 bits (or 129 bits if you look at it a certain way, more about that later).

as you might have guessed, this "extra information" takes up 64 bits of the capability. bits are assigned to three key pieces of metadata: *bounds*, *permissions*, and *object type*. there is also an additional 1-bit _tag_ which is stored out-of-band: it is not a 129-bit value - instead each 128-bit capability can be thought of as being associated with a 1-bit validity tag. the architecture manages this association for us. the diagram below is provided as a rough overview of this. note that it is not to scale.
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1920 314"><defs><style>.prefix__c{fill:none;stroke:#fcfcfc;stroke-linecap:square;stroke-miterlimit:10;stroke-width:3px}.prefix__f,.prefix__g{fill:#fcfcfc}.prefix__f{font-family:Source Code Pro;font-size:20px}</style></defs><g id="prefix__a"><path fill="#0c1114" d="M0 0h1920v314H0z"/><text transform="translate(101.86 232.41)" font-family="Source Code Pro" font-weight="700" fill="#fcfcfc" font-size="24"><tspan x="0" y="0">int *x (capability)</tspan></text><text transform="translate(1205.97 232.1)" font-family="Source Code Pro" fill="#fcfcfc" font-size="24"><tspan x="0" y="0">0x0000010000000004</tspan></text><path class="prefix__c" d="M1016 261v18.5M1656 261v18.5M1016 279.5h640"/><text class="prefix__f" transform="translate(1293.78 296.33)"><tspan x="0" y="0">address</tspan></text><path class="prefix__c" d="M700 191.5V173M1020 191.5V173M700 173h320"/><text class="prefix__f" transform="translate(823.78 167.74)"><tspan x="0" y="0">bounds</tspan></text><path class="prefix__c" d="M554 260.34v18.5M704 260.34v18.5M554 278.84h150"/><text class="prefix__f" transform="translate(562.78 295.68)"><tspan x="0" y="0">object type</tspan></text><g><path class="prefix__c" d="M391.89 191.56v-18.5M541.89 191.56v-18.5M391.89 173.06h150"/></g><text class="prefix__f" transform="translate(400.67 167.8)"><tspan x="0" y="0">permissions</tspan></text><text class="prefix__f" transform="translate(304.67 31.07)"><tspan x="0" y="0">tag (out-of-band)</tspan></text><g><path class="prefix__c" d="M391.33 55.92v-18.5M421.33 55.92v-18.5M391.33 37.42h30"/></g></g><g id="prefix__b"><path class="prefix__g" d="M1651.66 205.93v40h-632v-40h632m4-4h-640v48h640v-48z"/><path class="prefix__g" d="M1016 206v40H704v-40h312m4-4H700v48h320v-48z"/><path class="prefix__g" d="M700 206v40H558v-40h142m4-4H554v48h150v-48z"/><path class="prefix__g" d="M554 206v40h-12v-40h12m4-4h-20v48h20v-48z"/><path class="prefix__g" d="M538 206v40H396v-40h142m4-4H392v48h150v-48zM418.5 
70v40h-22V70h22m4-4h-30v48h30V66z"/></g></svg>
I am mostly going to focus on _bounds_ in this article, as it is not too difficult to grasp, and the impact is fairly easy to demonstrate for some simple examples. the bounds represent an upper and lower bound on the memory region (address space) that this capability is allowed to access. if we try to use the capability to access some address outside of this range, the hardware will throw a fault - it simply won't let us do this!
**_note:_** it is important to note that I am going to oversimplify the way the bounds are stored in this article. this especially includes the diagram above. in reality, there is a complex compression method, necessitated by the range and sizes required by bounds. this depends on the address value, alignment, etc. for now, we shouldn't need to think about this much, just know it will be managed for us. the key take-away from this is that *bounds can't always be 100% precise for all addresses and ranges*.
can you imagine how we can use bounds to prevent our previous memory safety bug from occurring? the key is that we can set the bounds on the capability pointing to `user_name` which we pass to `fgets`, such that the capability may only access the contents of the array. this means that when `fgets` tries to write past the end of the `user_name` array, the processor will throw a *capability fault*, and execution of our program will cease.
the idea behind CHERI is that we as the C programmer don't have to set up these bounds ourselves most of the time---this is something the compiler can generate code for. the compiler knows that the `user_name` array has a length of `32`, and can set the bounds accordingly on capabilities created that point to it. let's try it...
## playing with CHERI RISC-V
unless you're lucky enough to have access to a physical Morello board, there is the issue of actually using a CHERI implementation. for this article I will be making use of the [QEMU](https://en.wikipedia.org/wiki/QEMU) emulator to emulate a [RISC-V](https://en.wikipedia.org/wiki/RISC-V) CHERI environment. running [CheriBSD](https://www.cheribsd.org/) on this emulator will allow us to have a nice [FreeBSD](https://www.freebsd.org/)-based capability-enabled environment to play around with. I'll use [cheribuild](https://github.com/CTSRD-CHERI/cheribuild) to easily get set up (the `cheribuild.py` step will take a very long time the first time):
{% highlight console %}
$ sudo apt install autoconf automake libtool pkg-config clang bison cmake \
ninja-build samba flex texinfo time libglib2.0-dev libpixman-1-dev \
libarchive-dev libarchive-tools libbz2-dev libattr1-dev libcap-ng-dev
$ git clone git@github.com:CTSRD-CHERI/cheribuild
$ cd cheribuild
$ ./cheribuild.py --include-dependencies --run/ssh-forwarding-port 2222 run-riscv64-purecap
CheriBSD/riscv (cheribsd-riscv64-purecap) (ttyu0)
login: root
root@cheribsd-riscv64-purecap:~ #
{% endhighlight %}
now we have our shell inside our CheriBSD emulated platform, we can start to try things out. let's compile our `membug` program again, this time with the toolchain targeting CheriBSD RISC-V - this will have been built as part of the dependencies already.

once our `membug-cheribsd` executable is built, we can `scp` it over to the CheriBSD filesystem. remember, we set up the SSH forwarding port to `2222`.

from a terminal on your host machine:
{% highlight console %}
$ ~/cheri/output/sdk/utils/cheribsd-riscv64-purecap-clang membug.c -Wall -g -fno-stack-protector -o membug-cheribsd
$ scp -P 2222 ./membug-cheribsd root@localhost:~/
{% endhighlight %}
and now we can see what happens when we explore our bug with CHERI:
{% highlight console %}
$ ./membug-cheribsd
enter your name: jack
hello jack
my_perfect_string: what a beautiful string
$ ./membug-cheribsd
enter your name: Hubert Blaine Wolfeschlegelsteinhausenbergerdorff Sr.
In-address space security exception (core dumped)
{% endhighlight %}
it's working! we are getting a capability fault as we exceed the bounds of the `user_name` capability. we can use gdb to verify that this is a bounds fault:
{% highlight plaintext linenos %}
(gdb) run
Starting program: /root/membug-cheribsd
enter your name: Hubert Blaine Wolfeschlegelsteinhausenbergerdorff Sr.
Program received signal SIGPROT, CHERI protection violation.
Capability bounds fault caused by register ca6.
0x0000000040314ce8 in memcpy (dst0=0x3fffdfff44, src0=<optimized out>, length=54) at /home/jack/cheri/cheribsd/lib/libc/string/bcopy.c:143
(gdb) p $ca6
$1 = () 0x3fffdfff78 [rwRW,0x3fffdfff44-0x3fffdfff64]
{% endhighlight %}
as we can see, the bounds for our `user_name` capability (which is stored in capability register `ca6`) are `0x3fffdfff44-0x3fffdfff64`, but the address is `0x3fffdfff78`. this is out of the bounds allowed by the capability, so the architecture throws a fault. if we look at the assembly generated by the compiler, we can see it set our capability bounds to a size of 32 to enforce this behaviour:
{% highlight armasm linenos %}
0000000000001ce8 <main>:
; int main() {
cincoffset csp, csp, -160
csc cra, 144 (csp)
csc cs0, 128 (csp)
cincoffset cs0, csp, 160
cincoffset ca0, cs0, -36
csetbounds ca2, ca0, 4
cincoffset ca0, cs0, -60
csetbounds ca0, ca0, 24
csc ca0, -128 (cs0)
cincoffset ca1, cs0, -92
csetbounds ca1, ca1, 32
csc ca1, -144 (cs0)
mv a1, zero
csd a1, -104 (cs0)
csw a1, 0 (ca2)
{% endhighlight %}
### chains of capabilities

at this point you may be thinking "okay, that's great, but if we can just set the bounds of a capability with an instruction then what's the point? surely I can just set global bounds on some random pointer and access whatever I want?"

fundamental to the idea of capabilities is their *provenance* and *monotonicity*.

*provenance*, simply put, means we can only construct a capability from an existing capability, using specific instructions. we can't just create a capability from some random `size_t` and use it to load/store something. let's see what happens when we try to run our `ptrs_as_numbers` program on CheriBSD:
{% highlight plaintext %}
(gdb) run
Starting program: /root/ptrs_as_numbers-cheribsd
*x=1234
Program received signal SIGPROT, CHERI protection violation.
Capability tag fault caused by register ca1.
0x0000000000101c66 in main () at ptrs_as_numbers.c:14
14 printf("*x=%d\n", *x);
(gdb) p $ca1
$1 = () 0x3fffdfff74
{% endhighlight %}
we get a fault, because the tag isn't set. any capability with a tag not set to 1 cannot be dereferenced -- it is invalid. in fact, this capability has no capability metadata -- when we copied it into our `unsigned long`, we just copied the 64-bit address.

*monotonicity* is what stops us taking an existing capability and creating a capability with more permissions and/or access than the original. it stipulates that when we create a capability from another capability (which we have to do -- provenance), the permissions and bounds of the new capability must be less than or equal to the original. so our bounds can only get narrower as we create new capabilities from an existing capability. this means that capabilities trace back in a chain - they are all created from other capabilities, and narrowed as necessary. in this case, (simplified) when the kernel loads our program it will give us capabilities that are wide enough to do everything we need to do, and the compiler will try and make sure all the capabilities that we make and use from these are as tightly bound and unpermissive as possible.
### CHERI-fying code
you'll notice we got a lot of these benefits "for free". we only had to recompile our code, and we gained this extra security. of course, CHERI does require changes to program sources. naturally, the compiler was changed a lot to implement this behaviour. in particular, CHERI also requires changes to things like the C library and kernel in order to take advantage of the features fully. sufficiently large userspace programs will generally require source changes.
one common issue is that a lot of existing C code assumes that `sizeof(void *) == sizeof(size_t)`. with CHERI, our pointers are now twice as big. however, `size_t` hasn't changed size, as the address space size hasn't changed - for example, if we index into an array with `size_t`, the index should still be the same size; the extra data in our `void *` capability is the metadata, not extra address data. any program that converts some `unsigned long` or `size_t` to a pointer and then dereferences it will fault - this violates provenance. so, sometimes code changes have to be made to ensure we are keeping the capability metadata around. in CHERI, we can use `ptraddr_t` to store addresses and `[u]intptr_t` to store capabilities.
let's make a program to see some differences in types, and demonstrate how `uintptr_t` can preserve capabilities:
{% highlight c linenos %}
{% include_relative code/ptrtypes.c %}
{% endhighlight %}
running this on our non-CHERI host will give us:
{% highlight terminal %}
$ ./ptrtypes
type size (hex) size (dec)
=====================================
uintptr_t 0x08 08
size_t 0x08 08
void* 0x08 08
=====================================
{% endhighlight %}
running this on CHERI (64-bit):
{% highlight terminal %}
$ ./ptrtypes-cheribsd
type size (hex) size (dec)
=====================================
ptraddr_t 0x08 08
uintptr_t 0x10 16
size_t 0x08 08
void* 0x10 16
=====================================
*b: 888
*b: 111
*b: 999
{% endhighlight %}
## epilogue
I appreciate this has been a fragmented and surface level introduction to CHERI. hopefully it has provided some education in some basic aims of CHERI regardless. potential benefits and uses for CHERI go much deeper than anything I've touched on here, so please, read more about everything - and get your hands dirty messing about with QEMU and CheriBSD!
here are some links to check out:
- [CHERI homepage @ CUCL](https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/)
- [technical report: An Introduction to CHERI](https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-941.pdf)
- [technical report: CHERI C/C++ Programming Guide](https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-947.pdf)
- [technical report: CHERI ISAv8](https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-951.pdf)
- [Morello homepage @ Arm](https://www.arm.com/architecture/cpu/morello)
- [Morello Architecture Reference Manual @ Arm](https://developer.arm.com/documentation/ddi0606/latest)