website/articles/morello/index.html

2022-11-19 11:56:01 +00:00
<!DOCTYPE html>
<html lang="en">
<head>
<title>cheri (and morello) | jack bond-preston</title>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" href="../../style/stylesheet.css">
<link rel="stylesheet" href="../../style/pygments.css">
<link rel="icon" type="image/png" href="">
<style>
</style>
</head>
<body>
<div class="article">
<!-- article body { -->
<h1 id="title"><a href=".">cheri (and morello)</a></h1>
<h2 id="premable"><a href="#preamble">preamble</a></h2>
<p>
<a href="https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/">CHERI</a> is an acronym for
Capability Hardware Enhanced RISC Instructions. it is a security-focussed project aimed at
improving memory protection at the hardware level. the project is complex and it has many potential
applications.
</p>
<p>
in this article I will go into some basics to give an understanding of some of the changes that
CHERI makes to how programs are written and executed. this will be focussed almost entirely on C, as
this is where my experience lies - it is also where some of the effects of CHERI are most easily felt.
this article is going to be a <i>very simplistic</i> introduction to CHERI, and I'm going to
attempt to explain the basics behind everything I cover. a basic understanding of C will be
beneficial.
</p>
<p>
<i><b>note:</b></i> <a href="https://www.arm.com/architecture/cpu/morello">the Morello
platform</a> is an evaluation board produced by <a href="https://www.arm.com/">Arm</a> to provide a
physical implementation of CHERI extending <a href="https://en.wikipedia.org/wiki/AArch64">the Arm
AArch64 ISA</a>. I previously worked on this platform at Arm,
<a href="https://git.morello-project.org/morello/musl-libc/">porting the musl C library to
Morello</a>. implementations for CHERI that are worth looking into from a more open perspective
<a href="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-951.pdf">are the MIPS (chapter 4)
and RISC-V (chapter 5) ones</a>. Morello is the only implementation that exists in a true hard core
format, afaik - but this is obviously hard to obtain so you'll just be playing around with
emulators/models anyway.
</p>
<h2 id="memory-safety-bugs"><a href="#memory-safety-bugs">memory safety bugs</a></h2>
<p>
to understand how CHERI tries to fix some simple issues, let's first look at some simplified
examples of issues that arise when we aren't using a CHERI-based architecture.
</p>
<h3>a simple memory safety bug</h3>
<p>
let's take a look at this C code:
</p>
<div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
<span class="normal"> 2</span>
<span class="normal"> 3</span>
<span class="normal"> 4</span>
<span class="normal"> 5</span>
<span class="normal"> 6</span>
<span class="normal"> 7</span>
<span class="normal"> 8</span>
<span class="normal"> 9</span>
<span class="normal">10</span>
<span class="normal">11</span>
<span class="normal">12</span>
<span class="normal">13</span></pre></div></td><td class="code"><div><pre><span></span><code><span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;stdio.h&gt;</span><span class="cp"></span>
<span class="kt">int</span><span class="w"> </span><span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">my_perfect_string</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;what a beautiful string&quot;</span><span class="p">;</span><span class="w"> </span><span class="c1">// so beautiful, I sure hope no-one touches it</span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">user_name</span><span class="p">[</span><span class="mi">32</span><span class="p">];</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">&quot;enter your name: &quot;</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">fgets</span><span class="p">(</span><span class="n">user_name</span><span class="p">,</span><span class="w"> </span><span class="mi">1000</span><span class="p">,</span><span class="w"> </span><span class="n">stdin</span><span class="p">);</span><span class="w"> </span><span class="c1">// get user&#39;s name from stdin</span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">&quot;hello %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">user_name</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">&quot;my_perfect_string: %s</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">my_perfect_string</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div></td></tr></table></div>
<p>
now let's try using our new program:
</p>
<div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span></pre></div></td><td class="code"><div><pre><span></span><code><span class="gp">$ </span>./membug
<span class="go">enter your name: jack</span>
<span class="go">hello jack</span>
<span class="go">my_perfect_string: what a beautiful string</span>
</code></pre></div></td></tr></table></div>
<p>
works on my machine boss! code review +1, and merged... until our good friend
<a href="https://en.wikipedia.org/wiki/Hubert_Blaine_Wolfeschlegelsteinhausenbergerdorff_Sr.">
Hubert Blaine Wolfeschlegelsteinhausenbergerdorff Sr.</a> comes along. he emails me a strange
error he's seen:
</p>
<div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span></pre></div></td><td class="code"><div><pre><span></span><code><span class="gp">$ </span>./membug
<span class="go">enter your name: Hubert Blaine Wolfeschlegelsteinhausenbergerdorff Sr.</span>
<span class="go">hello Hubert Blaine Wolfeschlegelsteinhausenbergerdorff Sr.</span>
<span class="go">my_perfect_string: hausenbergerdorff Sr.</span>
</code></pre></div></td></tr></table></div>
<p>
that's not supposed to happen! his name has spilled over into our <code>my_perfect_string[]</code>
array! turns out our issue is that when we use <code>fgets()</code>, we've set the second
parameter, <code>size</code>, to <code>1000</code> - but our <code>user_name[32]</code> array can
only fit 32 characters (and the last of these should be a null terminator, so 31 usable characters).
<code>fgets</code> fills up <code>user_name</code>, but it hasn't finished with the name yet! it
doesn't care (or know) that <code>user_name</code> is full, it's just going to keep going until it
finishes our user input, or reads 999 characters from standard input. and thus it keeps mindlessly
writing, overwriting the memory we've used to store our precious perfect string (which happens to
be immediately after <code>user_name</code>). let's take a look at the stack in GDB to see why this
happens:
</p>
<div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
<span class="normal"> 2</span>
<span class="normal"> 3</span>
<span class="normal"> 4</span>
<span class="normal"> 5</span>
<span class="normal"> 6</span>
<span class="normal"> 7</span>
<span class="normal"> 8</span>
<span class="normal"> 9</span>
<span class="normal">10</span>
<span class="normal">11</span>
<span class="normal">12</span>
<span class="normal">13</span>
<span class="normal">14</span>
<span class="normal">15</span>
<span class="normal">16</span></pre></div></td><td class="code"><div><pre><span></span><code><span class="kt">(gdb)</span> <span class="nb">b</span> membug.c:<span class="mh">7</span>
<span class="kt">(gdb)</span> <span class="nb">run</span>
Breakpoint <span class="mh">1</span>, main () at membug.c:<span class="mh">7</span>
<span class="mh">7</span> printf(<span class="s">&quot;enter your name: &quot;</span>);
<span class="kt">(gdb)</span> <span class="nb">n</span>
<span class="mh">8</span> fgets(user_name, <span class="mh">1000</span>, stdin); // get user&#39;s name from stdin
<span class="kt">(gdb)</span> <span class="nb">n</span>
<span class="mh">9</span> printf(<span class="s">&quot;hello %s&quot;</span>, user_name);
<span class="kt">(gdb)</span> <span class="nb">x</span>/<span class="mi">56</span><span class="kc">bc</span> <span class="nv">$sp</span>
<span class="mh">0x7fffffffdbf0</span>: <span class="mh">106</span> &#39;j&#39; <span class="mh">97</span> &#39;a&#39; <span class="mh">99</span> &#39;c&#39; <span class="mh">107</span> &#39;k&#39; <span class="mh">10</span> &#39;\n&#39; <span class="mh">0</span> &#39;\<span class="mh">000</span>&#39; <span class="mh">0</span> &#39;\<span class="mh">000</span>&#39; <span class="mh">0</span> &#39;\<span class="mh">000</span>&#39;
<span class="mh">0x7fffffffdbf8</span>: <span class="mh">77</span> &#39;M&#39; <span class="mh">82</span> &#39;R&#39; <span class="mh">85</span> &#39;U&#39; <span class="mh">85</span> &#39;U&#39; <span class="mh">85</span> &#39;U&#39; <span class="mh">85</span> &#39;U&#39; <span class="mh">0</span> &#39;\<span class="mh">000</span>&#39; <span class="mh">0</span> &#39;\<span class="mh">000</span>&#39;
<span class="mh">0x7fffffffdc00</span>: -<span class="mh">24</span> &#39;\<span class="mh">350</span>&#39; -<span class="mh">78</span> &#39;\<span class="mh">262</span>&#39; -<span class="mh">5</span> &#39;\<span class="mh">373</span>&#39; -<span class="mh">9</span> &#39;\<span class="mh">367</span>&#39; -<span class="mh">1</span> &#39;\<span class="mh">377</span>&#39; <span class="mh">127</span> &#39;\<span class="mh">177</span>&#39; <span class="mh">0</span> &#39;\<span class="mh">000</span>&#39; <span class="mh">0</span> &#39;\<span class="mh">000</span>&#39;
<span class="mh">0x7fffffffdc08</span>: <span class="mh">0</span> &#39;\<span class="mh">000</span>&#39; <span class="mh">82</span> &#39;R&#39; <span class="mh">85</span> &#39;U&#39; <span class="mh">85</span> &#39;U&#39; <span class="mh">85</span> &#39;U&#39; <span class="mh">85</span> &#39;U&#39; <span class="mh">0</span> &#39;\<span class="mh">000</span>&#39; <span class="mh">0</span> &#39;\<span class="mh">000</span>&#39;
<span class="mh">0x7fffffffdc10</span>: <span class="mh">119</span> &#39;w&#39; <span class="mh">104</span> &#39;h&#39; <span class="mh">97</span> &#39;a&#39; <span class="mh">116</span> &#39;t&#39; <span class="mh">32</span> &#39; &#39; <span class="mh">97</span> &#39;a&#39; <span class="mh">32</span> &#39; &#39; <span class="mh">98</span> &#39;b&#39;
<span class="mh">0x7fffffffdc18</span>: <span class="mh">101</span> &#39;e&#39; <span class="mh">97</span> &#39;a&#39; <span class="mh">117</span> &#39;u&#39; <span class="mh">116</span> &#39;t&#39; <span class="mh">105</span> &#39;i&#39; <span class="mh">102</span> &#39;f&#39; <span class="mh">117</span> &#39;u&#39; <span class="mh">108</span> &#39;l&#39;
<span class="mh">0x7fffffffdc20</span>: <span class="mh">32</span> &#39; &#39; <span class="mh">115</span> &#39;s&#39; <span class="mh">116</span> &#39;t&#39; <span class="mh">114</span> &#39;r&#39; <span class="mh">105</span> &#39;i&#39; <span class="mh">110</span> &#39;n&#39; <span class="mh">103</span> &#39;g&#39; <span class="mh">0</span> &#39;\<span class="mh">000</span>&#39;
</code></pre></div></td></tr></table></div>
<p>
we can see our two character arrays are right next to each other on the stack
(<code>user_name</code> contains some gibberish as it is not zero-initialised).
</p>
<p>
<i><b>note:</b></i> this code was compiled with <code>-fno-stack-protector</code> to reproduce this
behaviour. compilers have techniques such as stack protectors which can help protect against such
attacks, but there are often ways around these using less primitive attacks.
</p>
<p>
okay, it's a pretty easy fix, we just need to change the <code>fgets(char *s, int size, FILE *stream)</code>
parameter <code>size</code> to <code>32</code>.
</p>
<p>
<i><b>note:</b></i> you may initially think "why not 31? don't we need to save a character for
the null byte at the end?". thankfully, <code>fgets</code> does this for us. excerpt from <code>man
fgets</code>: "fgets() reads in <i>at most one less than size</i> characters from stream and stores
them into the buffer pointed to by s [...] A terminating null byte ('\0') is stored after the last
character in the buffer". this is a good question to be asking though, being careful is key when it
comes to these kinds of things.
</p>
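<p>
one way to make this mistake harder to repeat is to derive the size from the buffer itself rather
than typing a number by hand. the sketch below is only an illustration: <code>read_line</code> is a
hypothetical helper name (it is not part of the original program), and all it does is wrap
<code>fgets</code> so that the caller can pass <code>sizeof</code> of the array.
</p>

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/*
 * read_line is a hypothetical helper (not from the original program).
 * it reads at most size - 1 characters from stream into buf, always leaves
 * buf null-terminated, and strips a trailing newline if one was read.
 * returns the number of characters stored (0 on EOF or error).
 */
static size_t read_line(char *buf, size_t size, FILE *stream) {
    assert(buf != NULL && stream != NULL);
    assert(size > 0 && size <= INT_MAX); /* fgets takes an int size */

    if (fgets(buf, (int) size, stream) == NULL) {
        buf[0] = '\0';
        return 0;
    }

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[--len] = '\0'; /* drop the newline that fgets keeps */
    }
    return len;
}
```

<p>
with this in place, the call in <code>main</code> could become
<code>read_line(user_name, sizeof user_name, stdin)</code>, so the size can't drift away from the
array it describes. note that <code>sizeof</code> only gives the array length when applied to the
array itself, not to a pointer to it. any input past 31 characters is simply left unread on
<code>stdin</code>.
</p>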
<h3>why hardware?</h3>
<p>
okay, so that's an easy fix. why are we talking about doing anything in hardware here? just write
the code correctly! the issue is code gets very complex, and this is a very simplistic situation.
some memory safety bugs can be incredibly complicated and go unnoticed for decades. the C language
especially gives the programmer many, many opportunities to make mistakes - and it only takes one
to be a problem. a lot of the software we are using these days is based on stacks upon stacks of
software written in different languages, and there are going to be bugs in there. CHERI should
give us some protection "for free" (it's not this simple, in actuality).
</p>
<p>
some languages (e.g. Rust) are going to offer you strong memory safety guarantees
at compile-time, but that's not the topic of this article. the trade-offs between doing this
kind of protection in software or hardware (or both) are beyond the scope of this
article. in addition, CHERI's benefits are broader than just protecting against this
kind of issue.
</p>
<h2 id="pointers-recap"><a href="#pointers-recap">pointers recap</a></h2>
<p>
let's quickly recap a basic idea of what a pointer is. we're going to ignore things like
<a href="https://en.wikipedia.org/wiki/Virtual_memory">virtual memory</a> for brevity. we can think
of a pointer in a normal 64-bit architecture (e.g. AArch64) simply as a 64-bit unsigned value that
holds the memory address of something we care about. this is a simplification (as are most things),
but it can help us reason about the general idea:
</p>
<div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code><span class="kt">int</span><span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1593</span><span class="p">;</span><span class="w"></span>
<span class="kt">int</span><span class="w"> </span><span class="o">*</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">&amp;</span><span class="n">val</span><span class="p">;</span><span class="w"> </span><span class="c1">// x points to val</span>
</code></pre></div></td></tr></table></div>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1920 314"><defs><style>.prefix__c{stroke-linecap:square;stroke-width:3px}.prefix__c,.prefix__d{fill:none;stroke-miterlimit:10}.prefix__c{stroke:#fcfcfc}.prefix__f,.prefix__h,.prefix__i{font-size:24px}.prefix__f,.prefix__h,.prefix__k{fill:#fcfcfc}.prefix__f,.prefix__l{font-family:TeXGyreCursor}.prefix__d{stroke:gray;stroke-width:4px}.prefix__h,.prefix__m{font-family:TeXGyreCursor;font-weight:bold;}.prefix__i,.prefix__n{fill:gray}</style></defs><g id="prefix__a"><path fill="#0c1114" d="M0 0h1920v314H0z"/><text class="prefix__h" transform="translate(577.46 133.41)"><tspan x="0" y="0">int *x</tspan></text><text class="prefix__f" transform="translate(490.97 177.1)"><tspan x="0" y="0">0x0000010000000004</tspan></text><path class="prefix__c" d="M481.16 206v18.5M760.5 206v18.5M481.5 224.5h279"/><text transform="translate(578.78 241.33)" font-size="20" font-family="TeXGyreCursor" fill="#fcfcfc"><tspan x="0" y="0">address</tspan></text><path stroke-width="4" stroke="#fcfcfc" fill="none" stroke-miterlimit="10" d="M752 171h204.56"/><path class="prefix__k" d="M948.64 182.62L992 171.01l-43.36-11.63v23.24z"/><text transform="translate(1272.76 177.16)" fill="#fcfcfc" font-size="24"><tspan class="prefix__m" x="0" y="0">mem[</tspan><tspan class="prefix__l" x="57.6" y="0">0x0000010000000004</tspan><tspan class="prefix__m" x="316.79" y="0">]</tspan></text><text class="prefix__i" transform="translate(1272.76 133.16)"><tspan class="prefix__m" x="0" y="0">mem[</tspan><tspan class="prefix__l" x="57.6" y="0">0x0000010000000000</tspan><tspan class="prefix__m" x="316.79" y="0">]</tspan></text><text class="prefix__i" transform="translate(1271.76 224.16)"><tspan class="prefix__m" x="0" y="0">mem[</tspan><tspan class="prefix__l" x="57.6" y="0">0x0000010000000008</tspan><tspan class="prefix__m" x="316.79" y="0">]</tspan></text></g><g id="prefix__b"><path class="prefix__d" d="M1260 58v48H985V58"/><path class="prefix__n" d="M1258 
195v40H987v-40h271m4-4H983v48h279v-48zM1258 107v40H987v-40h271m4-4H983v48h279v-48z"/><path class="prefix__k" d="M756.16 150.93v40h-271v-40h271m4-4h-279v48h279v-48zM1258 151v40H987v-40h271m4-4H983v48h279v-48z"/><text class="prefix__f" transform="translate(1094 177.09)"><tspan x="0" y="0">1593</tspan></text><text class="prefix__h" transform="translate(1007.6 45.16)"><tspan x="0" y="0">memory (as ints)</tspan></text><path class="prefix__d" d="M1260 284v-48H985v48"/></g></svg>
<p>
and on these normal architectures, this pointer generally is just a number. we can do weird things
with it, treating it as a number...
</p>
<div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
<span class="normal"> 2</span>
<span class="normal"> 3</span>
<span class="normal"> 4</span>
<span class="normal"> 5</span>
<span class="normal"> 6</span>
<span class="normal"> 7</span>
<span class="normal"> 8</span>
<span class="normal"> 9</span>
<span class="normal">10</span>
<span class="normal">11</span>
<span class="normal">12</span>
<span class="normal">13</span>
<span class="normal">14</span>
<span class="normal">15</span>
<span class="normal">16</span>
<span class="normal">17</span>
<span class="normal">18</span>
<span class="normal">19</span>
<span class="normal">20</span></pre></div></td><td class="code"><div><pre><span></span><code><span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;stdio.h&gt;</span><span class="cp"></span>
<span class="kt">int</span><span class="w"> </span><span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">magic</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">9999</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">arr</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="mi">1234</span><span class="p">,</span><span class="w"> </span><span class="mi">5678</span><span class="w"> </span><span class="p">};</span><span class="w"></span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="o">*</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">&amp;</span><span class="p">(</span><span class="n">arr</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span><span class="w"> </span><span class="c1">// x is a pointer to first element of arr</span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">&quot;*x=%d</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="n">x</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">x_addr</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="kt">size_t</span><span class="p">)</span><span class="w"> </span><span class="n">x</span><span class="p">;</span><span class="w"> </span><span class="c1">// we&#39;re going to assume size_t = unsigned long here</span>
<span class="w"> </span><span class="n">x_addr</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">4</span><span class="p">;</span><span class="w"> </span><span class="c1">// sizeof(int) == 4</span>
<span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="n">x_addr</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">&quot;*x=%d</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="n">x</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span>
<span class="w"> </span><span class="n">x_addr</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">4</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="n">x_addr</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">&quot;*x=%d</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="n">x</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div></td></tr></table></div>
<p>...and this code will often still work:</p>
<div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span></pre></div></td><td class="code"><div><pre><span></span><code><span class="gp">$ </span>./ptrs_as_numbers
<span class="go">*x=1234</span>
<span class="go">*x=5678</span>
<span class="go">*x=9999</span>
</code></pre></div></td></tr></table></div>
<p>
yikes! now, when you start messing with pointers like this, you're bound to run into a bunch of
undefined behaviour. but C programmers write undefined behaviour all the time, and my computer
executes this program fine without complaining at all. doesn't it feel a bit weird that we can take
a pointer to <code>arr[0]</code> and modify it to load <code>magic</code>? they're not even part
of the same array...
</p>
<h2 id="introducting-capabilities"><a href="#introducting-capabilities">introducing capabilities</a></h2>
<p>
CHERI introduces capabilities, which can be thought of as an extension to pointers. they still
store an address of something we care about, but they have extra information too! in a 64-bit
system, a pointer would typically be a 64-bit value (as discussed previously). the corresponding
capability in a CHERI platform is 128 bits (or 129 bits if you look at it a certain way, more about
that later...).
</p>
<p>
as you might have guessed, this "extra information" takes up 64 bits of the capability. bits are
assigned to three key pieces of metadata: <i>bounds</i>, <i>permissions</i>, and
<i>object type</i>. there is also an additional 1-bit <i>tag</i> which is stored out-of-band: the
capability is not really a 129-bit value - instead each 128-bit capability can be thought of as
being associated with a 1-bit validity tag, which the architecture manages. the diagram below is
provided as a rough overview of this. note that it is not to scale.
</p>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1920 314"><defs><style>.prefix__c{fill:none;stroke:#fcfcfc;stroke-linecap:square;stroke-miterlimit:10;stroke-width:3px}.prefix__f,.prefix__g{fill:#fcfcfc}.prefix__f{font-family:TeXGyreCursor;font-size:20px}</style></defs><g id="prefix__a"><path fill="#0c1114" d="M0 0h1920v314H0z"/><text transform="translate(101.86 232.41)" font-family="TeXGyreCursor" font-weight="700" fill="#fcfcfc" font-size="24"><tspan x="0" y="0">int *x (capability)</tspan></text><text transform="translate(1205.97 232.1)" font-family="TeXGyreCursor" fill="#fcfcfc" font-size="24"><tspan x="0" y="0">0x0000010000000004</tspan></text><path class="prefix__c" d="M1016 261v18.5M1656 261v18.5M1016 279.5h640"/><text class="prefix__f" transform="translate(1293.78 296.33)"><tspan x="0" y="0">address</tspan></text><path class="prefix__c" d="M700 191.5V173M1020 191.5V173M700 173h320"/><text class="prefix__f" transform="translate(823.78 167.74)"><tspan x="0" y="0">bounds</tspan></text><path class="prefix__c" d="M554 260.34v18.5M704 260.34v18.5M554 278.84h150"/><text class="prefix__f" transform="translate(562.78 295.68)"><tspan x="0" y="0">object type</tspan></text><g><path class="prefix__c" d="M391.89 191.56v-18.5M541.89 191.56v-18.5M391.89 173.06h150"/></g><text class="prefix__f" transform="translate(400.67 167.8)"><tspan x="0" y="0">permissions</tspan></text><text class="prefix__f" transform="translate(304.67 31.07)"><tspan x="0" y="0">tag (out-of-band)</tspan></text><g><path class="prefix__c" d="M391.33 55.92v-18.5M421.33 55.92v-18.5M391.33 37.42h30"/></g></g><g id="prefix__b"><path class="prefix__g" d="M1651.66 205.93v40h-632v-40h632m4-4h-640v48h640v-48z"/><path class="prefix__g" d="M1016 206v40H704v-40h312m4-4H700v48h320v-48z"/><path class="prefix__g" d="M700 206v40H558v-40h142m4-4H554v48h150v-48z"/><path class="prefix__g" d="M554 206v40h-12v-40h12m4-4h-20v48h20v-48z"/><path class="prefix__g" d="M538 206v40H396v-40h142m4-4H392v48h150v-48zM418.5 
70v40h-22V70h22m4-4h-30v48h30V66z"/></g></svg>
<p>
I am mostly going to focus on <i>bounds</i> in this article, as they are not too difficult to grasp,
and the impact is fairly easy to demonstrate for some simple examples. the bounds represent an
upper and lower bound on the memory region (address space) that this capability is allowed to
access. if we try to use the capability to access some address outside of this range, the hardware
will throw a fault - it simply won't let us do this!
</p>
<p>
<b><i>note:</i></b> it is important to note that I am going to oversimplify the way the bounds are
stored in this article. this especially includes the diagram above. in reality, there is a complex
compression method, necessitated by the range and sizes required by bounds. this depends on the
address value, alignment, etc. for now, we shouldn't need to think about this much, just know it
will be managed for us. the key take-away from this is that <i>bounds can't always be 100% precise
for all addresses and ranges</i>.
</p>
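<p>
to get a feel for why compressed bounds can't always be exact, here is a deliberately crude toy
model. it is <i>not</i> the real CHERI encoding: the 14-bit width, the names
(<code>toy_round_bounds</code>, <code>TOY_MANTISSA_BITS</code>) and the rounding rule are all
assumptions made up for illustration. the only idea it borrows is that a large region has to be
described in coarser units, so its base and top get rounded outwards to an alignment boundary.
</p>

```c
#include <assert.h>
#include <stdint.h>

/* hypothetical: how many bits we have to describe a length (not the real CHERI value) */
#define TOY_MANTISSA_BITS 14

struct toy_bounds {
    uint64_t base; /* lowest address covered */
    uint64_t top;  /* one past the highest address covered */
};

/*
 * toy model only: find the smallest power-of-two granule (1 << e) such that
 * the region, rounded outwards to that granule, spans fewer than
 * 2^TOY_MANTISSA_BITS granules.
 */
static struct toy_bounds toy_round_bounds(uint64_t base, uint64_t length) {
    /* keep the toy well away from 64-bit overflow */
    assert(base <= UINT32_MAX && length <= UINT32_MAX);

    struct toy_bounds b;
    for (unsigned e = 0; ; e++) {
        uint64_t granule = (uint64_t) 1 << e;
        b.base = base & ~(granule - 1);                          /* round down */
        b.top = (base + length + granule - 1) & ~(granule - 1);  /* round up */
        if (((b.top - b.base) >> e) < ((uint64_t) 1 << TOY_MANTISSA_BITS)) {
            return b;
        }
    }
}
```

<p>
in this toy, a 32-byte buffer at <code>0x1000</code> comes back with exact bounds, whereas a 1 MiB
region starting at <code>0x1001</code> comes back as <code>0x1000</code> to <code>0x101080</code>:
slightly wider than what was asked for. the real scheme differs in the details, but the flavour of
the trade-off is similar.
</p>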
<p>
can you imagine how we can use bounds to prevent our previous memory safety bug from occurring? the
key is that we can set the bounds on the capability pointing to <code>user_name</code> which we
pass to <code>fgets</code>, such that the capability may only access the contents of the array.
this means that when <code>fgets</code> tries to write past the end of the <code>user_name</code>
array, the processor will throw a <i>capability fault</i>, and execution of our program will cease.
</p>
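<p>
the check described above can be modelled in plain C to make it concrete. this is a software
stand-in, not how the hardware does it: the <code>model_cap</code> struct, its field names and the
helper functions are all invented for this sketch, and a <code>false</code> return value plays the
role of the capability fault.
</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* a software stand-in for a capability; the field names are made up for this sketch */
struct model_cap {
    uintptr_t base;    /* lowest address the capability may touch */
    size_t length;     /* number of bytes covered, starting at base */
    uintptr_t address; /* where the "pointer" currently points */
};

/* build a capability covering exactly the given buffer */
static struct model_cap model_cap_for(char *buf, size_t size) {
    assert(buf != NULL);
    struct model_cap c = { (uintptr_t) buf, size, (uintptr_t) buf };
    return c;
}

/* store one byte through the capability and advance it; false models a capability fault */
static bool model_store_byte(struct model_cap *c, char value) {
    if (c->address < c->base || c->address - c->base >= c->length) {
        return false; /* real hardware would raise a fault here */
    }
    *(char *) c->address = value;
    c->address++;
    return true;
}

/* copy src through the capability; returns how many bytes were written before the end of src or a "fault" */
static size_t model_copy(struct model_cap *c, const char *src) {
    size_t written = 0;
    while (src[written] != '\0' && model_store_byte(c, src[written])) {
        written++;
    }
    return written;
}
```

<p>
copying <code>"jack"</code> through a capability made with
<code>model_cap_for(user_name, sizeof user_name)</code> writes 4 bytes and stops normally. copying
Hubert's full name writes 32 bytes and then the next store is refused, which is the point where a
real CHERI processor would fault instead of scribbling over the neighbouring array.
</p>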
<p>
the idea behind CHERI is that we don't have to set up these bounds ourselves. this is something the
compiler can generate code for. the compiler knows that the <code>user_name</code> array has a
length of <code>32</code>, and can set the bounds accordingly on capabilities created that point to
it. let's try it...
</p>
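<p>
if you want to see the bounds the compiler picked, CHERI toolchains ship a
<code>cheriintrin.h</code> header with accessors such as <code>cheri_length_get</code>. the sketch
below is hedged in two ways: I am assuming that header and the
<code>__CHERI_PURE_CAPABILITY__</code> macro are available in your toolchain (check its
documentation), and <code>reachable_bytes</code> is a hypothetical helper name. on a non-CHERI
compiler it just falls back to the size the caller passes in, so the same file still builds.
</p>

```c
#include <assert.h>
#include <stddef.h>

#if defined(__CHERI_PURE_CAPABILITY__)
#include <cheriintrin.h> /* assumed to be provided by the CHERI toolchain */
#endif

/*
 * reachable_bytes is a hypothetical helper for this sketch.
 * on a pure-capability CHERI build it asks the capability for its length;
 * elsewhere a plain pointer carries no bounds, so we can only echo back
 * the size the caller claims.
 */
static size_t reachable_bytes(const void *p, size_t declared_size) {
    assert(p != NULL);
#if defined(__CHERI_PURE_CAPABILITY__)
    (void) declared_size;
    return cheri_length_get(p);
#else
    (void) p;
    return declared_size;
#endif
}
```

<p>
calling <code>reachable_bytes(user_name, sizeof user_name)</code> should give <code>32</code>
either way. the difference is where the number comes from: on CHERI it is read out of the pointer
itself, whereas on a normal architecture we have to take the programmer's word for it.
</p>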
<h2 id="playing-with-cheri"><a href="#playing-with-cheri">playing with CHERI RISC-V</a></h2>
<p>
unless you're lucky enough to have access to a physical Morello board, there is the issue of
actually using a CHERI implementation. for this article I will be making use of the
<a href="https://en.wikipedia.org/wiki/QEMU">QEMU</a> emulator to emulate a
<a href="https://en.wikipedia.org/wiki/RISC-V">RISC-V</a> CHERI environment. running
<a href="https://www.cheribsd.org/">CheriBSD</a> on this emulator will allow us to have a nice
<a href="https://www.freebsd.org/">FreeBSD</a>-based capability-enabled environment to play around
with. I'll use <a href="https://github.com/CTSRD-CHERI/cheribuild">cheribuild</a> to easily get set
up:
</p>
<div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
<span class="normal"> 2</span>
<span class="normal"> 3</span>
<span class="normal"> 4</span>
<span class="normal"> 5</span>
<span class="normal"> 6</span>
<span class="normal"> 7</span>
<span class="normal"> 8</span>
<span class="normal"> 9</span>
<span class="normal">10</span></pre></div></td><td class="code"><div><pre><span></span><code><span class="gp">$ </span>sudo apt install autoconf automake libtool pkg-config clang bison cmake <span class="se">\</span>
ninja-build samba flex texinfo <span class="nb">time</span> libglib2.0-dev libpixman-1-dev <span class="se">\</span>
libarchive-dev libarchive-tools libbz2-dev libattr1-dev libcap-ng-dev
<span class="gp">$ </span>git clone git@github.com:CTSRD-CHERI/cheribuild
<span class="gp">$ </span><span class="nb">cd</span> cheribuild
<span class="gp">$ </span>./cheribuild.py --include-dependencies --run/ssh-forwarding-port <span class="m">2222</span> run-riscv64-purecap
<span class="go">CheriBSD/riscv (cheribsd-riscv64-purecap) (ttyu0)</span>
<span class="go">login: root</span>
<span class="gp">root@cheribsd-riscv64-purecap:~ #</span>
</code></pre></div></td></tr></table></div>
<p>
now we have our shell inside our CheriBSD emulated platform, we can start to try things out. let's
compile our <code>membug</code> program again, this time with the toolchain targeting
CheriBSD RISC-V - this will have been built as part of the dependencies already. once it's built,
we can <code>scp</code> it over to the CheriBSD filesystem, as we set the SSH forwarding port to
<code>2222</code>.
</p>
<div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span></pre></div></td><td class="code"><div><pre><span></span><code><span class="gp"># </span>on a separate terminal on your host machine
<span class="gp">$ </span>~/cheri/output/sdk/utils/cheribsd-riscv64-purecap-clang membug.c -Wall -g -fno-stack-protector -o membug-cheribsd
<span class="gp">$ </span>scp -P <span class="m">2222</span> ./membug-cheribsd root@localhost:~/
</code></pre></div></td></tr></table></div>
<p>
and now we can see what happens when we explore our bug with CHERI:
</p>
<div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span>
<span class="normal">5</span>
<span class="normal">6</span>
<span class="normal">7</span>
<span class="normal">8</span></pre></div></td><td class="code"><div><pre><span></span><code><span class="gp">$ </span>./membug-cheribsd
<span class="go">enter your name: jack</span>
<span class="go">hello jack</span>
<span class="go">my_perfect_string: what a beautiful string</span>
<span class="gp">$ </span>./membug-cheribsd
<span class="go">enter your name: Hubert Blaine Wolfeschlegelsteinhausenbergerdorff Sr.</span>
<span class="go">In-address space security exception (core dumped)</span>
</code></pre></div></td></tr></table></div>
<p>
it's working! we get a capability fault because the copy runs past the bounds
of the <code>user_name</code> capability. we can use gdb to verify that this is
a bounds fault:
</p>
<div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span>
<span class="normal">5</span>
<span class="normal">6</span>
<span class="normal">7</span>
<span class="normal">8</span>
<span class="normal">9</span></pre></div></td><td class="code"><div><pre><span></span><code><span class="kt">(gdb)</span> <span class="nb">run</span>
Starting program: /root/membug-cheribsd
enter your name: Hubert Blaine Wolfeschlegelsteinhausenbergerdorff Sr.
Program received signal SIGPROT, CHERI protection violation.
Capability bounds fault caused by register ca<span class="mh">6</span>.
<span class="mh">0x0000000040314ce8</span> in memcpy (dst<span class="mh">0</span><span class="o">=</span><span class="mh">0x3fffdfff44</span>, src<span class="mh">0</span><span class="o">=</span>&lt;optimized out&gt;, length<span class="o">=</span><span class="mh">54</span>) at /home/jack/cheri/cheribsd/lib/libc/string/bcopy.c:<span class="mh">143</span>
<span class="kt">(gdb)</span> <span class="nb">p</span> <span class="nv">$ca6</span>
$<span class="mh">1</span> <span class="o">=</span> () <span class="mh">0x3fffdfff78</span> [rwRW,<span class="mh">0x3fffdfff44</span>-<span class="mh">0x3fffdfff64</span>]
</code></pre></div></td></tr></table></div>
<p>
as we can see, the bounds for our <code>user_name</code> capability (which is stored in capability
register <code>ca6</code>) are <code>0x3fffdfff44-0x3fffdfff64</code>, a 32-byte range, but the
address is <code>0x3fffdfff78</code>. this is outside the bounds allowed by the capability, so the
architecture throws a fault. if we look at the assembly generated by the compiler, we can see it
set our capability bounds to a size of 32 to enforce this behaviour:
</p>
<div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
<span class="normal"> 2</span>
<span class="normal"> 3</span>
<span class="normal"> 4</span>
<span class="normal"> 5</span>
<span class="normal"> 6</span>
<span class="normal"> 7</span>
<span class="normal"> 8</span>
<span class="normal"> 9</span>
<span class="normal">10</span>
<span class="normal">11</span>
<span class="normal">12</span>
<span class="normal">13</span>
<span class="normal">14</span>
<span class="normal">15</span>
<span class="normal">16</span>
<span class="normal">17</span></pre></div></td><td class="code"><div><pre><span></span><code><span class="mh">0000000000001ce8</span><span class="w"> </span><span class="p">&lt;</span><span class="nf">main</span><span class="p">&gt;:</span>
<span class="x">; int main() {</span>
<span class="x"> 1ce8: 5b 11 01 f6 cincoffset csp, csp, -160</span>
<span class="x"> 1cec: 23 48 11 08 csc cra, 144(csp)</span>
<span class="x"> 1cf0: 23 40 81 08 csc cs0, 128(csp)</span>
<span class="x"> 1cf4: 5b 14 01 0a cincoffset cs0, csp, 160</span>
<span class="x"> 1cf8: 5b 15 c4 fd cincoffset ca0, cs0, -36</span>
<span class="x"> 1cfc: 5b 26 45 00 csetbounds ca2, ca0, 4</span>
<span class="x"> 1d00: 5b 15 44 fc cincoffset ca0, cs0, -60</span>
<span class="x"> 1d04: 5b 25 85 01 csetbounds ca0, ca0, 24</span>
<span class="x"> 1d08: 23 40 a4 f8 csc ca0, -128(cs0)</span>
<span class="x"> 1d0c: db 15 44 fa cincoffset ca1, cs0, -92</span>
<span class="hll"><span class="x"> 1d10: db a5 05 02 csetbounds ca1, ca1, 32</span>
</span><span class="x"> 1d14: 23 48 b4 f6 csc ca1, -144(cs0)</span>
<span class="x"> 1d18: 81 45 mv a1, zero</span>
<span class="x"> 1d1a: 23 3c b4 f8 csd a1, -104(cs0)</span>
<span class="x"> 1d1e: 23 20 b6 00 csw a1, 0(ca2)</span>
</code></pre></div></td></tr></table></div>
<h3>capability monotonicity</h3>
<p>
at this point you may be thinking "okay, that's great, but if we can just set
the bounds of a capability with an instruction then what's the point? surely
I can just set global bounds on some random pointer and access whatever I want?"
</p>
<p>
fundamental to the idea of capabilities is their <i>provenance</i> and
<i>monotonicity</i>. simply put, provenance says we can only construct a
capability from an existing capability, using specific instructions. we can't
just create a capability from some random number. let's see what happens when
we try to run our <code>ptrs_as_numbers</code> program on CheriBSD:
</p>
<div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
<span class="normal"> 2</span>
<span class="normal"> 3</span>
<span class="normal"> 4</span>
<span class="normal"> 5</span>
<span class="normal"> 6</span>
<span class="normal"> 7</span>
<span class="normal"> 8</span>
<span class="normal"> 9</span>
<span class="normal">10</span></pre></div></td><td class="code"><div><pre><span></span><code><span class="kt">(gdb)</span> <span class="nb">run</span>
Starting program: /root/ptrs_as_numbers-cheribsd
*x<span class="o">=</span><span class="mh">1234</span>
Program received signal SIGPROT, CHERI protection violation.
Capability tag fault caused by register ca<span class="mh">1</span>.
<span class="mh">0x0000000000101c66</span> in main () at ptrs_as_numbers.c:<span class="mh">14</span>
<span class="mh">14</span> printf(<span class="s">&quot;*x=%d\n&quot;</span>, *x);
<span class="kt">(gdb)</span> <span class="nb">p</span> <span class="nv">$ca1</span>
$<span class="mh">1</span> <span class="o">=</span> () <span class="mh">0x3fffdfff74</span>
</code></pre></div></td></tr></table></div>
<p>
we can see we get a fault - the tag isn't set. any capability with a tag not
set to 1 cannot be dereferenced - it is invalid. in fact, this capability has
no capability metadata - when we copied it into our <code>unsigned long</code>,
we just copied the 64-bit address.
</p>
<p>
<i>monotonicity</i> is what stops us taking an existing capability, and
creating a capability with more permissions and/or access than the original. it
stipulates that when we create a capability from another capability (which we
have to do - provenance), the permissions and bounds of the new capability must
be equal to or less than the original. so our bounds can only get narrower as
we create new capabilities from an existing capability. this means that
capabilities trace back in a chain - they are all created from other
capabilities, and narrowed as necessary. in this case, (simplified) when the
kernel loads our program it will give us capabilities that are wide enough to
do everything we need to do, and the compiler will try and make sure all the
capabilities that we make and use from these are as tightly bound and
unpermissive as possible.
</p>
<h3>CHERI-fying code</h3>
<p>
you'll notice we got a lot of these benefits "for free". we only had to
recompile our code, and we got this extra security. of course, CHERI does
require changes to programs. naturally, the compiler had to be changed a lot to
implement this behaviour. it also especially requires changes to things like
the C library and kernel in order to take advantage of the features fully.
sufficiently large userspace programs do need changes too. one common issue is
that a lot of existing C code assumes that
<code>sizeof(void *) == sizeof(size_t)</code>. with CHERI, our pointers are
now twice as big. however, <code>size_t</code> hasn't changed size, as the
address space size hasn't changed - for example, if we index into an array with
<code>size_t</code>, the index should still be the same size; the extra data in
our <code>void *</code> capability is the metadata, not extra address data. any
program that tries to convert from some <code>unsigned long</code> or
<code>size_t</code> to a capability and then dereference it will fault - the
integer carries no tag, so this violates provenance. so, sometimes code changes
have to be made to ensure we are keeping the capability metadata around.
</p>
<h2 id="epilogue"><a href="#epilogue">epilogue</a></h2>
<p>
I appreciate this has been a fragmented and surface level introduction to
CHERI. hopefully it has provided some education in some basic aims of CHERI
regardless. potential benefits and uses for CHERI go much deeper than anything
I've touched on here, so please, read more about everything - and get your
hands dirty messing about with QEMU and CheriBSD!
</p>
<p>
here are some links to check out:
</p>
<ul>
<li><a href="https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/">CHERI homepage @ CUCL</a></li>
<li><a href="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-941.pdf">technical report: An Introduction to CHERI</a></li>
<li><a href="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-947.pdf">technical report: CHERI C/C++ Programming Guide</a></li>
<li><a href="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-951.pdf">technical report: CHERI ISAv8</a></li>
<li><a href="https://www.arm.com/architecture/cpu/morello">Morello homepage @ Arm</a></li>
<li><a href="https://developer.arm.com/documentation/ddi0606/latest">Morello Architecture Reference Manual @ Arm</a></li>
</ul>
</div>
</body>
</html>