BlogWerx

Sunday, August 10, 2025

Are we Teaching Rust Effectively?

Prologue

When I was a young Rustacean, I can remember being profoundly confused by lifetime annotations. The concept is incredibly simple, and I don't believe it's a major concern for students to understand: the borrow cannot outlive the value. This is especially obvious for anyone coming from a C background. Less obvious is that this is an invocation of subtyping in Rust ^[1].

So, why did I find the annotations confusing? I'm talking about not only the syntactic lifetime annotations like 'a, but also their semantics. My first impression was (as far as I can remember), "these are just noise." They appeared to clutter the code without providing clarity. That was back in early 2016. After a year or so, my perspective had completely flipped.

And nearly a decade on, I am pretty comfortable with lifetime annotations, and I find myself frequently naming them with intention. Lifetime names like 'a and 'b are bad style, in my opinion, especially when many lifetimes that do not add clarity can be elided. I might abbreviate some of them like 'win instead of 'window, but it's still a good signal to the reader about what this lifetime represents. Another specific name I frequently use is 'arena when I have an arena allocator. This one is most useful for invariant shared lifetimes like &'arena Node<'arena, T>.

If the term "invariant shared lifetime" means little to you, I can recommend Arenas in Rust to start you on a profound journey through category theory. 😃

This intro is just a bit of exposition to set the stage for a problem that gives me pause. Lifetimes are an important concept in Rust, but I would say they are not the most important concept. Lifetimes are just the cost of doing business. They are tightly coupled with the concept of ownership and borrowing. Which means neither of these concepts are the most important, either. They all sit at the same level of importance, if we could order language features as such.

In my humble opinion, the Aliasing Xor Mutability ^[2] principle (henceforth AXM) is Rust's raison d'être. Everything else is supported by this principle. I am not the first to make this observation. In 2013, Niko Matsakis made the connection clear within "On the connection between memory management and data-race freedom".

In "Notes on a smaller Rust", withoutboats adds algebraic types and RAII as other necessary components. I like these features of Rust a lot, and they certainly make my code more robust. But I can hardly put them at the same level of importance.

The entire functional programming paradigm exists as a result of confining all mutable state (or more broadly, side effects) to an effect system. Rust's primary contribution is proving that programming languages do not need to go as far as eliminating mutability, just eliminating the combination of mutability and sharing is enough.

The point I'm trying to make is that AXM is fundamental. More than ownership, borrowing, lifetimes, automatic memory management, and type systems. Don't misunderstand how I'm framing this. These features all work together to make Rust what it is. But if I had to choose just one to start with when teaching the language, it would be AXM. By a long shot, and it's not even close.

The majority of the introductory material should focus solely on this one property. Because it shows up everywhere from your first borrow to your last unsafe implementation. It is a rule you can never break ^[3] for safe &/&mut references. The rule is absolute; interior mutability is an opt-in runtime-checked or hardware-checked exception. I think this is all strong enough evidence to make the claim that AXM is a big deal.

The magnitude of importance of the AXM principle wasn't brought to my attention while I was initially learning the language. It didn't even show up on my radar until I read The Problem With Single-threaded Shared Mutability (which links back to Niko's post that I mentioned before), probably sometime in 2018. The analogy with read/write locks is what finally made it "click" for me.

Another enlightening read was the "GhostCell: Separating Permissions from Data in Rust" paper, which I also found around the same time. I learned of this principle's significance much later than I should have and it looks like others might be falling into the same trap.

Evaluating Rust Learning Material

Now, let's compare with how Rust is actually taught in the wild, at least in a few specific cases. To start with, The Book elevates the AXM discipline early on in Chapter 4.2: References and Borrowing, directly after an introduction to the language syntax and concepts. The principle is discussed in the literature. I believe that is what's missing is making it the spine for beginner curriculum.

"Learn Rust with Entirely Too Many Linked Lists" is often cited for its humorous display of the difficulty of writing a simple data structure in Rust. Six implementations of a linked list with the goal of teaching pointers, ownership, borrowing, interior mutability, keywords, patterns, generics, destructors, testing, miri, raw pointers, aliasing, and variance. That's a nice list of things that do need to be covered to learn the language.

Ownership is covered in Chapter 2.3: Ownership 101 as a primary principle. I did not get the sense from this chapter that sharing and mutation are mutually exclusive. It's maybe only vaguely implied. The principle is not specified (as far as I could tell) until Chapter 3.6: IterMut.

Subtyping is not covered until the sixth implementation. Subtyping is essential knowledge for all references, not just raw pointers and unsafe code. Just a short and early introduction to variance like the relationships betwen shared and mutable references is enough to set a basic intuition. The prologue to this very post introduces a common example of invariant shared references, and we'll see it used later.

"Too Many Linked Lists" is not the only source that could spend more time on AXM. In the last TWiR, there was an article about the difficulty of writing a binary tree. I empathize with the author, the problem they were given is very frustrating. But I disagree with their conclusion that the difficulty is caused by Rust.

The lesson on LeetCode provides a type definition of the tree using shared ownership and interior mutability. Neither are required for a binary tree, and both of which add unnecessary complexity that makes solving this "easy problem" "hard", in the words of the Monday Morning Haskell blog.

To my point, the blog post in question pins the blame on ownership. My suspicion is that if more attention was given to AXM in educational material, this misconception may have been avoided. In the best case, the student might even see the red flag and question the lesson. Here is the type definition given:

// Definition for a binary tree node.
#[derive(Debug, PartialEq, Eq)]
pub struct TreeNode {
    pub val: i32,
    pub left: Option<Rc<RefCell<TreeNode>>>,
    pub right: Option<Rc<RefCell<TreeNode>>>,
}

This type definition wants to pretend that you can break the AXM principle. If you try it anyway with this type, the code will panic when borrowing (RefCell is a runtime-checked read/write lock). There is no need for RefCell in this type. You can (and should) just use normal references. And there certainly isn't any need for reference counting, or restricting usage to a single thread.

It also allows constructing DAGs and cyclic graphs, which are not binary trees. In a general sense, a binary tree is tree-structured and acyclic. The data structure can be designed intentionally as a graph, but I would not call it a tree in that case.

Binary trees have a purpose, but I would certainly change the type definition if I were using the data structure in learning material. I would instead start with this definition:

// Definition for a binary tree node.
#[derive(Debug, PartialEq, Eq)]
pub struct TreeNode {
    pub value: i32,
    left: Option<Box<TreeNode>>,
    right: Option<Box<TreeNode>>,
}

The remainder of this post will explain why, using the AXM principle.

The number one most important thing that you must always respect is this principle of AXM. Without it, you do not have a Rust program. This discipline constrains how the data structure we are about to implement can be used. It also informs the designer on crucial design considerations, namely ergonomics.

For instance, if a reference is going to be held to any internal state by any caller (because the interface returns references) then the designer should be careful to remember that references act like read/write locks for all intents and purposes. Design your interfaces around this knowledge so that references are held only as long as necessary, and no longer. References are statically-checked read/write locks.

I can imagine a method that takes a path and optionally returns an arbitrarily deep node reference. That would be a lot less fragile than requiring the caller to walk each node individually, and would eliminate any possibility of holding onto references to any intermediate node. Reducing the appeal of holding references for long spans goes a long way in making Rust work. I would even say that this is part of the art form for interface designers.

By the way, did you happen to notice that the left and right fields on this struct are private? That's fairly important for encapsulation, but we are not going to rely on it for upholding invariants. There really are no invariants with this data structure aside from those provided by Option, Box, and references. They could just as well be made public.

The Constructor

The constructor is almost exactly what you might expect. I decided to hide the Box as an implementation detail, so that callers do not have to box everything before Option-wrapping. This is an ergonomic consideration.

impl TreeNode {
    pub fn new(value: i32, left: Option<Self>, right: Option<Self>) -> Self {
        Self {
            value,
            left: left.map(Box::new),
            right: right.map(Box::new),
        }
    }
}

That's enough to construct immutable trees. But we can also add accessor methods to allow callers to walk the tree and insert or update nodes.

impl TreeNode {
    pub fn left(&self) -> Option<&Self> {
        self.left.as_deref()
    }

    pub fn set_left(&mut self, left: Option<Self>) -> Option<Self> {
        std::mem::replace(&mut self.left, left.map(Box::new)).map(|value| *value)
    }

    pub fn right(&self) -> Option<&Self> {
        self.right.as_deref()
    }

    pub fn set_right(&mut self, right: Option<Self>) -> Option<Self> {
        std::mem::replace(&mut self.right, right.map(Box::new)).map(|value| *value)
    }
}

Once again, the Box is hidden as an implementation detail. This is the only reason I decided to make the left and right fields private.

Finally, let's solve this "Invert Tree" problem. I'm going to dismiss the interface provided by LeetCode because it completely ignores mutable references, and instead requires moving ownership of nodes. I came up with this interface and solution, which I believe to be a remarkable improvement:

impl TreeNode {
    pub fn invert(&mut self) {
        std::mem::swap(&mut self.left, &mut self.right);

        if let Some(left) = &mut self.left {
            left.invert();
        }
        if let Some(right) = &mut self.right {
            right.invert();
        }
    }
}

That wasn't so bad, was it? No cloning, no reference counting, no interior mutability, no lifetime annotations. It adheres to the AXM principle. So long as the caller is not currently holding an aliased reference to the tree, it can be inverted.

I didn't code golf it down to one line, but I don't care. Terse code is not "easy" code. It's a compression scheme that depends on the writer and readers having the same foundational knowledge base (hint: they do not).

Unfortunately, I cannot just paste this into LeetCode and see if it solves the problem! LeetCode provided a type definition that I disagree with. One that made this problem seem hard. The problem itself is very approachable, even in Rust. And that is my main concern with how the language is being taught, at least in the few instances I've explored here. In order to write a decent implementation, I have to discard what the learning material is trying to impart.

I cannot reasonably expect anyone to come up with this kind of solution on their own if they are not yet familiar with Rust and the AXM principle. They would have to be intimately familiar with the concept to question the lesson.

What is LeetCode trying to teach in this case? It doesn't appear to be "good data structure design" or "good API design". It isn't even "how to write C code in Rust". I think what's actually going on is that the LeetCode curators are not Rust educators. They are mechanically converting lessons from other languages like C++, Java, and Python. It causes an impedance mismatch that does a harsh disservice to learners.

LeetCode isn't what's Wrong with Rust Education

This is just my opinion (okay, everything in this post is my opinion), but I think LeetCode is projecting a symptom, not the problem at large. The symptom is that shortcuts and escape hatches like cloning, reference counting, and interior mutability are getting overused, and ownership is overemphasized.

The problem as I see it is that the real fundamentals, like the AXM principle and subtyping, are not put at the forefront of lessons and truly hammered in as a pillar or keystone of the language. Not enough time is spent early on dissecting why they are needed and how to use them effectively, especially when designing interfaces.

There is no roadmap for learning the most important parts of the language in beginner courses. You have to learn it the hard way, just like everyone else did. The Curse of Knowledge strikes again.

What else does this Teach Us?

Before I forget, let's also implement that tree traversal by path method that I briefly mentioned earlier:

#[derive(Copy, Clone, Debug)]
pub enum Direction {
    Left,
    Right,
}

impl TreeNode {
    pub fn walk(&self, path: &[Direction]) -> Option<i32> {
        let mut result = self;

        for dir in path {
            result = match dir {
                Direction::Left => result.left(),
                Direction::Right => result.right(),
            }?;
        }

        Some(result.value)
    }
}

No surprises here. Some path is used to recursively traverse the tree, and it returns the value at that path location if reachable. If we also need a mutable version, it ends up looking almost the same:

impl TreeNode {
    pub fn walk_replace(&mut self, path: &[Direction], value: i32) -> Option<i32> {
        let mut result = self;

        for dir in path {
            result = match dir {
                Direction::Left => result.left_mut(),
                Direction::Right => result.right_mut(),
            }?;
        }

        Some(std::mem::replace(&mut result.value, value))
    }

    pub fn left_mut(&mut self) -> Option<&mut Self> {
        self.left.as_deref_mut()
    }

    pub fn right_mut(&mut self) -> Option<&mut Self> {
        self.right.as_deref_mut()
    }
}

This method takes a value and inserts it into the tree, returning the existing value. It does not create missing intermediate nodes, so it is not a general purpose setter.

What I'm trying to show here is that using temporary borrows to access your data structures, even in some non-trivial ways, can be done in a straightforward manner. This is how I would have preferred to learn Rust, if I could start all over ^[4].

Show me that I can do these kinds of things with references and how the AXM principle makes that possible. Show me that the borrow checker is not adversarial. Give me an intuition that Rc, RefCell, Arc, Mutex, RwLock, and Clone are all tools that I can opt into when the situation calls for it. Help me recognize what those situations are. Don't treat every problem as something to be solved by garbage collection, workarounds, or someone more knowledgeable.

A Mini Postmortem

If you attempt to use the data structure we've designed, you'll find that it necessarily has some shortcomings. For one thing, you won't be able to hold a reference to any interior node while also attempting to insert a new node into a non-conflicting branch. This is intentional. But the API as introduced doesn't tell us why. I can logically prove that it is safe to do this only because the interface leaks implementation details. I must know the internal structure of the tree to do the analysis. A caller that did not design this data structure can reach the same conclusion without ever looking at the implementation. The public interface contains all of the information necessary to find this limitation.

My observation on this point is that the interface itself is bad. Requiring users to traverse the tree using "left" and "right" instructions breaks the abstraction barrier. A better interface would encapsulate the implementation detail that the data structure is a binary tree. It might seem like an unusual premise, that a binary tree would want to hide the fact that it's a binary tree. In my experience, I tend to think of abstractions from an outside perspective most of the time. I don't care that this map is implemented as a binary tree, or a hash table, or a B+ tree, or a trie. They might have specific properties that appeal to me, but what I really want is to be able to replace them as requirements change.

If my code ends up being littered with "left, right, right, left, right" paths, that is going to make it more difficult to switch to a keyed data structure later. There are a few ways around this:

Use keys instead of explicit paths. This can use any mechanism that allows a key to uniquely identify every node in the data structure. One example is an opaque handle that is treated directly as a bitmap describing the path (0 for left, 1 for right), or indirectly through the Hash trait for arbitrary key types. Or some other "basic property" of the key like ordering (this is what std::collections::BTreeMap does).
Use the values themselves, constructing a set instead of a map. It requires that values are unique, but the same strategies apply because sets are just maps without values: Set<T> ~= Map<T, ()>

Making this "abstract leap" helps escape the problem where limitations can be identified by logical deduction. It allows the designer to reserve the right to make changes that would otherwise be breaking. This intentional limitation makes it possible for the next version of the data structure to be self-balancing without requiring all users to change how they are interacting with it. Or changing it internally to a B+ tree. We're simultaneously mitigating bad access patterns and future churn.

There's another obvious benefit to raising the abstraction bar. Ergonomics increases ten-fold when the caller can bring their own key without worrying about the specifics of how it might affect memory layout or traversal time. The abstraction is a promise that it will do the right thing.

I'll leave the design of a good abstract interface for this data structure as an exercise for the reader. Presumeably I've given some good ideas to help you get started on what it could look like. I've also glossed over an important detail in regards to making the node's value generic. The original type definition used i32, so that's what I copied. I also designed a variant that used generic values while writing this article. (A binary tree that only holds a specific number type is practically useless.) The biggest concerns came up as part of the next implementation because it needs two separate arenas: One for the nodes and one for the values. This article is long enough as it is!

One final thought on data structure design: The one we've come up with so far is limited not only in how internal references can be held by callers, but also the way in which the tree can be structured. It is strictly a tree. When trees become DAGs or cyclic graphs, you must re-introduce interior mutability (and perhaps also shared ownership). The next section shows one way to do this intentionally.

Arena-based Graph

Let's assume for a moment that the reasoning presented above just isn't convincing. You need to hold a reference while simultaneously mutating, and nothing else could possibly work. That's where arenas come in.

I'll spare you the walk-through this time, but I also implemented the same kind of binary tree interface using an arena and shared references everywhere which allows constructing graphs:

use std::cell::Cell;
use typed_arena::Arena;

// Definition for a graph.
#[derive(Default)]
pub struct Graph<'arena> {
    nodes: Arena<GraphNode<'arena>>,
}

impl<'arena> Graph<'arena> {
    pub fn node(
        &'arena self,
        value: i32,
        left: Option<&'arena GraphNode<'arena>>,
        right: Option<&'arena GraphNode<'arena>>,
    ) -> &'arena GraphNode<'arena> {
        self.nodes.alloc(GraphNode::new(value, left, right))
    }
}

// Definition for a graph node.
#[derive(Debug, PartialEq, Eq)]
pub struct GraphNode<'arena> {
    value: Cell<i32>,
    left: Cell<Option<&'arena GraphNode<'arena>>>,
    right: Cell<Option<&'arena GraphNode<'arena>>>,
}

impl<'arena> GraphNode<'arena> {
    fn new(
        value: i32,
        left: Option<&'arena GraphNode<'arena>>,
        right: Option<&'arena GraphNode<'arena>>,
    ) -> GraphNode<'arena> {
        GraphNode {
            value: Cell::new(value),
            left: Cell::new(left),
            right: Cell::new(right),
        }
    }

    pub fn invert(&self) {
        self.left.swap(&self.right);

        if let Some(left) = self.left.get() {
            left.invert();
        }
        if let Some(right) = self.right.get() {
            right.invert();
        }
    }

    pub fn value(&self) -> i32 {
        self.value.get()
    }

    pub fn set_value(&self, value: i32) {
        self.value.set(value);
    }

    pub fn left(&'arena self) -> Option<&'arena GraphNode<'arena>> {
        self.left.get()
    }

    pub fn set_left(
        &'arena self,
        left: Option<&'arena GraphNode<'arena>>,
    ) -> Option<&'arena GraphNode<'arena>> {
        self.left.replace(left)
    }

    pub fn right(&'arena self) -> Option<&'arena GraphNode<'arena>> {
        self.right.get()
    }

    pub fn set_right(
        &'arena self,
        right: Option<&'arena GraphNode<'arena>>,
    ) -> Option<&'arena GraphNode<'arena> {
        self.right.replace(right)
    }
}

The details of how this works are very interesting, and you have to understand them to write code like this. The key is that invariant shared references are Copyable.

This implementation is structured differently from the original heap-allocated data structure we came up with before. There's a new Graph type that serves as the owner of the Arena. (I spent a great deal of time downplaying the role of ownership only for its importance to surface! But I maintain that the knowledge of AXM is a far superior tool.)

It can also be code golfed a bit by defining a type alias for the node pointers:

// Definition for a graph node pointer.
pub type NodePointer<'arena> = Option<&'arena GraphNode<'arena>>;

Some might consider that the interface with this type alias is easier to read, but I think it hides the invariant shared lifetime. Hiding (or making opaque) one of the crucial reasons this code even works just to save a few lines does not seem like a wise tradeoff.

Usage looks something like this:

let graph = Graph::default();

let foo = graph.node(13, None, None);
let bar = graph.node(42, None, None);
let baz = graph.node(27, Some(bar), None);
let root = graph.node(0, Some(foo), Some(baz));
dbg!(root);

root.invert();
dbg!(root);

Same bad constructor and traversal interface as before, but this time it allows the intuitive "mutate unrelated branches" idea to work. For instance, I can keep the bar reference around and change the node's value to 33 after inserting a right child into baz, or even after inverting the graph at the root node. That saves a graph traversal after each mutation.

Because this implementation is arena-based, it gains a new capability. The graph can contain cycles:

let graph = Graph::default();

let foo = graph.node(13, None, None);
let bar = graph.node(42, Some(foo), None);
foo.set_left(Some(bar));

Just don't dbg!() print it! It will traverse the graph endlessly. A Debug implementation that is aware of potential cycles is better than deriving.

The biggest caveat with arena-based data structures is that they only grow. When growth is left unbounded, that's a memory leak. The only way to reclaim their memory is dropping the Arena. They tend to be useful in niche scenarios, like compilers that need to allow cyclic references within an AST. Additionally, &Cell<T> is not thread-safe, it would have to be changed to Arc<RwLock<T>> to be sharable across threads.

This ties into my earlier point about teaching situational awareness. Students need to learn what situations are a best fit for all of the complex tools at their disposal. And the best way to do that is by teaching abstract concepts. Although, doing that is difficult itself and commonly boils down to explanation by analogy, as prominently criticized by E.W. Dijkstra in "On the cruelty of really teaching computing science".

Conclusion

Where does this leave us? I haven't solved any problems. I didn't make it any easier to learn Rust. I didn't make it any easier to solve the LeetCode problem that catalyzed this rant. Maybe, if anything, I've introduced some important lessons to people that are trying to learn this language. The best outcome would be that I've inspired educators to reformulate their curriculum around fundamental language properties like AXM, and dial down the emphasis on ownership and lifetimes.

Lifetimes aren't that scary, anyway. The naive heap-allocated binary tree above doesn't have any lifetime annotations at all. They just aren't necessary in that case. The arena-based graph is full of them syntactically, but there's only one and it refers to the Arena.

At this time, I do not think we are teaching Rust effectively. But I also think there are ways that we can.

&'long T <: &'short T (covariant). Another common example is that &mut T is invariant.
AKA Shared Xor Mutable.
So important is this rule that it's called out three times in rapid succession in the Nomicon, in the most memorable way possible:
- Transmuting an & to &mut is always Undefined Behavior.
- No you can't do it.
- No you're not special.
Anachronism: This code uses a lot of Quality of Life syntax and language semantics that did not exist when I was learning, circa Rust 1.7.

The most notorious of these has to be Non-Lexical Lifetimes, stabilized in Rust 1.36. Before then, the iterative loop that takes an exclusive borrow in two potential branches could not easily be written. The code can be modified to compile with Rust 1.0 by desugaring and including the workaround for NLL Problem Case #4.

Thursday, June 20, 2024

The GCNrd Story.

The most important piece of software I ever released was a Nintendo GameCube debugger over 20 years ago. It was not a financial success, receiving roughly $200 in donations. The initial version was released in full and for free on 2004-03-31, around 9 months after development began.

The Seed Planted

I was a bit of a boaster. After several years of creating cheat device codes^[1] that were a bit more creative than the standard flare of "infinite health" or "start on level X", I had given myself license to make bold claims without any real need to back them up. I had a knack for finding the developer's hidden debug menus and changing game physics in unintended ways, allowing players to explore the game world outside of the predefined boundaries. So, one day in 2003 when I claimed that I was going to do something to shake up the game hacking community by creating a GameCube debugger, a lot of people probably accepted it without thinking twice. On the other hand, I had no idea if I could actually pull it off.

A few similar tools had already existed for other platforms. The most popular at the time was probably Caetla. Connecting your PlayStation through an ISA "Comm board" and an original Action Replay / GameShark enabled full control over the PS1 software including a debugger, disassembler, and some homebrew development tools. It was an obvious inspiration for GCNrd.

Action Replay was released for GameCube on 2003-02-18. And this, in my opinion, is where it all began. Datel had already done all of the legwork to break into the system and execute arbitrary code. Not much is known about how it all happened, or even who was involved. The biggest problem for the game hacking community was that the codes were all encrypted, and only Datel (CodeJunkies) had the tools to create codes that would be accepted by the AR. Inputting random characters never worked. The AR would always be able to determine it was not a legitimate code, or it would notice that the code was valid but was for the wrong game.

For the first few months, nothing really interesting happened. CodeJunkies continued to very slowly release new codes, usually only one batch for whatever the newest game was at the time. Older games were mostly ignored after the first batch of codes was released for them. If you wanted a specific kind of code for your favorite game, you were out of luck and at the mercy of CodeJunkies to do what they wanted to do.

On 2003-06-17, everything changed. A program called PSUL^[2] was released by CrazyNation (legendary N64 sceners) that allowed anyone to execute arbitrary code through a broadband adapter (BBA) and the Phantasy Star Online Episode 1 & 2 game. In mere hours a number of game disc images were also released in the warez scene. One of them was a version of the Action Replay.

I saw this as an opportunity. After a few days of reverse engineering the PowerPC assembly from the AR dump, I released a tool that was capable of decrypting existing AR codes and encrypting brand new "homebrew" codes without Datel's permission. GCNcrypt released on 2003-06-22, just 5 days after the PSUL release. Not bad but could have happened sooner if not for all of the weird stuff Datel did with the AR codes. I had to interpret the meaning of the "identifier code" which is one of the ways the AR rejected random inputs. Real life was also slowing me down, I had work and nothing to go on but intrinsic motivation.

According to my notes, it wasn't until two weeks after the release of GCNcrypt that I decided to actually make the debugger a reality. Two months into the project, I started working on the network driver that would become the foundation for the debugger. Most of it was reverse engineered from the official Nintendo SDK and compared to datasheets available for a very similar (though not identical) Macronix ethernet controller. Within 5 days, I had enough working that I could send and receive UDP packets to a PC connected on the same network. This was 2003-09-01, and though I worked on the network driver mostly on my own, I was also moonlighting as a scener, releasing GameCube ISOs and writing tools for the group to use.

One of the sceners I worked closely with at the time was Aftermath. He and I came up with an idea to run the ISOs on GameCube hardware by streaming parts of it over the BBA. And after the network driver was usable enough, that's exactly what we did. We picked the easiest possible game to start with, Animal Crossing. Because it was a small game, and it was known at the time that you could remove the disc after the title screen and the game would just keep running forever. Everything it needed was loaded at the start, and then it would not need to read the disc ever again. It was the perfect test case. Animal Crossing Loader was released on 2003-10-10. It was the first publicly available method of loading ISO backups on the GameCube.

ACloader went through several iterations, starting with compatibility improvements, and each release fixed bugs or increased network performance. At its core, the idea was simple. After initializing the network driver and setting a fixed IP, it would send packets to a predetermined server IP to establish a session. The GameCube would request the initial boot blocks from the virtual disc, which the server would send as requested. It followed the basic boot process, reading the boot block and running the AppLoader, etc. After the main executable was loaded by the AppLoader, the memory would be scanned for the Read() function, which would be patched to read from the simulated DVD drive over the network.

The debugger worked on the same basic principle, so it's no wonder GCNrd shares a common ancestry with ACloader. They share about 80% the same code, which we called "libeur". The initial GCNrd release in 2004-03-31 removed all of the "DVDsim" capability, which I kept in private builds. GCNrd was most well-known as a debugger for commercial game discs, but it played a role behind the scenes that made it possible to release new codes as soon as a game hit store shelves, sometimes even earlier.

Sceners raced each other to get the newest games as quickly as possible, and sometimes that meant a scener embedded somewhere in the logistics for game distribution would end up taking a disc home prior to the official release date where it would be copied and uploaded to FTP sites across the internet. On the other end, I would download the ISO and run it locally with the debugger's DVD simulator. I would create codes and designate a new unique game ID for it. In some cases, Datel would adopt the same game ID for their official code list.

The Nitty Gritty Details

It wasn't all roses and rainbows. Though what I was able to achieve with GCNrd is hard to overstate, the saga was laced with drama and technical challenges. Aftermath ended up doing a large number of ACloader-spinoff projects on his own, infrequently contributing his improvements back to GCNrd. Some of them are lost forever. I had a contentious relationship with Costis, of PSOLoad and SDLoad fame. I made some very minor contributions to SDLoad, which made launching GCNrd about 100 times faster. Datel copied the idea for their own SD Media Launcher about 2 years later. LOL.

Development with the PSO loaders was very rough. It took about 2 minutes from power-on to first Arbitrary Code Execution. Several compatibility bugs that GCNrd had were due to state leftover from PSO. Typically, some hardware needed to be reset before starting the new game. Star Wars: Rogue Squadron II - Rogue Leader was the one that gave me the most trouble. I would spend several hours across multiple days trying to find the cause of the game crashing in the level intro. I think I eventually fixed it, but I no longer remember what the actual problem was. And even then, SDLoad didn't have the same crash bug. So, I may have not fixed it at all, instead recommending SDLoad as "the right way" to use GCNrd.

Programming in assembly and low-level C for the console was harsh. It was too easy to screw things up, especially because there is no cache coherency in the CPU. Doing any kind of self-modifying code was a unique challenge. There was no real debugging, apart from what GCNrd itself implemented! I could send formatted strings in UDP packets, which is how I did a lot of the initial development. And ultimately, I added an exception handler to catch crashes and report thread context. It was a slog, lots of time resetting the GC and waiting through the PSO loading screens. Debugging was just trial and error for months on end.

I had limited memory available, about 32 KiB which I reserved for the debugger and its network stack. I was able to offload some of the logic to the PC-side by implementing commands that could read and write arbitrary memory locations. This was probably the saving grace for the project. The memory-resident debugger could get away with just the bare minimum functionality for using a pre-initialized broadband adapter (the menu would initialize prior to loading and running the memory-resident half) and some basic functions to control the debugging circuits on the CPU. Most of the interesting parts of the debugger were built on top of these simple primitives as a "remote debugger". Searching for library SDK functions just needed to read memory, for instance. The comparison code existed on the PC-side. Patching was the same, the assembly routines were on the PC-side and the only thing needed on the GC-side was the capability of writing to memory.

In hindsight, it would have been much easier with a smaller memory footprint to create a serial interface instead of using the BBA. However, I had good reasons (at the time) for preferring the BBA:

I lacked any kind of hardware knowledge. Designing even a simple serial interface at that point was not something I could have done. In 2004, it was unclear if the GC even had a general-purpose serial port available to use (e.g. not the controller ports).
The BBA was already available, and it was much faster than what could be done with a bespoke serial port. The full 24 MiB of main memory could be downloaded in about 14.5 seconds, according to a pre-release note that I kept in my logs. A rate of about 1.7 MiB per second. This was about half of the max theoretical bandwidth of the EXI bus.

Source control was basically non-existent at the time. Mercurial and Git wouldn't have been released for another 2 years. Code was copied to other directories or zipped to make backups, whenever I felt like it. Which clearly wasn't enough. I no longer have the code for the initial released version, or the second release, "v1.10b"^[3]. I instead have a few variations of source code from intermediate points between (and after) both releases. One day I would like to go through these variations and try to understand what was changed between them.

I could release the code, but I don't even know if it compiles. There was a point in time when I thought I would clean it all up and make the memory-resident debugger relocatable and remove all of the duplicate code between the menu and the debugger. That subproject was incomplete, and I am sure it left the code in an unusable state. The code is also absolutely terrible, being a specimen from my earliest developer days. It would be a really bad release without some real curation effort going into it first.

Wrapping Up

So, why write about this now? Well, for starters, it all happened a very long time ago. Everything about GameCube is already known by now, but back then we were pioneering. I left the warez scene sometime in 2008 because I felt like I had done everything I set out to do. I made piracy possible on GameCube and had a pretty big impact on the DS as well. These days, game piracy is completely alien to me. I just felt a need to reminisce about a very interesting time in my life, at least the good parts!

It's very weird watching something like an MVG video where he describes your work but doesn't say anything about who you were or how you did it. Clearly, I had an impact on many people's lives, and they don't know anything about me. The ACloader was built mostly in anonymity (for obvious reasons)^[4], but I'm happy to claim responsibility for it. It was an awesome hack that let hundreds of thousands of people^[5] play games that they would otherwise not have access to. On the other hand, GCNrd was always associated with my internet handle (Parasyte), just stripped down to remove the piracy features.

Given the chance, would I do it again? In a heartbeat! It's as close as I have ever come to feeling like my existence was meaningful. That I wasn't just consuming resources, but also producing something of value. It's what I imagine the early phreakers and hackers must have felt in the 60's and 70's, before I was born. Doing something that was generally looked down upon, but in its own right created immense value by allowing people to connect in ways they couldn't before.

I am proud of GCNrd and ACloader. And even though I have a chip on my shoulder because I was never able to top those projects (despite untold amounts of effort) I still believe it was one of my crowning achievements. I was 20 years old, self-taught, working a terrible low-wage job, and I just personally enjoyed messing around in game internals. It was inevitable that I would create tools to help other people do the same thing.

All that said, GCNrd and ACloader was not my only project that gained wide attention. I hinted about DS earlier, but I also did a lot of stuff with N64, NES, SNES, PS1, all of the GameBoys, and dozens of emulators for several platforms. It's too much to list, but there are a few other good stories in there that have never been told. Maybe someday I'll feel nostalgic enough to write about them, too.

In summary, working on the project was miserable at times and it's now largely forgotten. It was all worth it, though.

"Cheat devices" were mostly unlicensed cartridges or discs for home video game machines that would allow players to make some real-time modifications to games. The most common effects were "cheats" like being able to take an infinite amount of damage or give yourself all of the unlockable upgrades in a game.
As far as I can make out, PSUL and PSOSocks were both released on the same day. PSOSocks contains a disc dumper and other tools. It kicked off the GameCube piracy scene. At least as far as copying went. PSUL was more homebrew-oriented, and it can be said to have kicked off the GameCube homebrew scene.
I DO have the complete source code for the very first ACloader, however. Which probably deserves a release just for the historical significance.
The initial ACloader clearly lists "AFTERMATH AND PARASYTE" in the scroller, and the PC-side server includes the same information. But I tended to distance myself from it in the open.
I really don't know how to estimate the actual number, but 6-digits is easily in the right ballpark.

Monday, March 18, 2024

Goto is an Abomination

In response to Goto is Not a Horror, I must implore you witness:

func main() {
	var i = 0

	loop:
	i += 1
	if i > 10 {
		goto done
	}

	fmt.Println("Hello, world", i)

	goto loop
	done:
}

You might say, "that isn't fair, this is just bad code using bad practices". And you would be half correct.

Goto can be used for evil. And it is. Too often, goto is abused. But here's the thing: Goto can only be abused. The example is fair criticism because there is no "good" use case for it.

That was Dijkstra's point.

The go to statement as it stands is just too primitive; it is too much an invitation to make a mess of one's program.

It's too unconstrained, or "unbridled" as Dijkstra put it. This is still the case with the goto keyword today. And it will remain the case 56 years from now.

"It's not a problem anymore, that goto is gone," says this unusual justification for an abomination, as if Go supposedly fixed it. You've clearly read the original 1968 letter, but seem to have missed its 1987 follow up, "On a Somewhat Disappointing Correspondence."

The real myth is that somehow jumping around arbitrarily just magically stopped being an issue. As the 1987 paper alludes to, we have better problems to solve than your decision to avoid constraint.

You can write code without using the goto keyword at all and it will make your code less enigmatic. Try it sometime. Stop defending the indefensible.

As for me, I thought that by now professional programmers would have learned something.

Wednesday, April 12, 2023

Improving build times for derive macros by 3x or more

I was dissatisfied with argument parsing crates. There didn't seem to be one that checked all the boxes I cared about:

Simple. Don't overcomplicate it! All I want to do is provide a rudimentary human interface to my app. If I wanted interface complexity, I'd put a whole TUI library in it.
Don't make me repeat myself, and don't make me write unnecessary boilerplate. The only thing a parser needs to do is compare given args to a known list and populate some state based on it. It shouldn't require adding a bunch of attributes to define how that state is populated, and it shouldn't require writing the application "help text" that describes the arguments separately from struct declaration.
Correctness. If I give my app invalid UTF-8 and it panics or returns an error inappropriately, I'm going to be furious.
Small! I once used clap as a young Rustacean and a third of my binary size was for parsing arguments! The app was fairly sophisticated, doing things like parsing ELF files. A whole third of the size of the binary was only used in the first few milliseconds of the app's entire runtime. (I may be exaggerating a bit, and I can't find any evidence to back the claim. The very first commit in the project in question doesn't contain any argument parsing library. No doubt due to the experience described.)
Fast! Build times are terrible for all arg parsing #[derive] macros that I've seen. Even on my relatively modern/high-end desktop (with a Ryzen 9 5900X running Windows 11 and Ubuntu 22.04 in WSL2), the fastest derive crate (argh) takes 2.15 seconds for the initial build. For comparison, the fastest all-around crate is pico-args at 370 ms.

There's a list of popular argument parsing crates along with various benchmarks at https://github.com/rosetta-rs/argparse-rosetta-rs. These results sparked my curiosity. If we eliminate all crates that don't support derive and invalid UTF-8 from this list, we're left with just two options: bpaf_derive and clap_derive. Both of these fail the "small" and "fast" requirements, though bpaf_derive is the better of the two.

Stability and versioning: Lock yourself in at your own peril.

I usually write about Rust, but today I want to discuss something broader; flexibility in software, particularly in design and architecture. To begin, let's focus briefly on versioning.

Versioning

Prior to semantic versioning, version numbering conventions used in software releases were nearly as numerous as the releases themselves. Some had implied meaning for releases with odd vs even numbers, some would skip numbers arbitrarily, some would use letters and other non-numeric characters, and some would have unique counting strategies. Semantic versioning provided some much-needed consistency.

Consistency, however, is not the ultimate goal. You might imagine some issues with consistency itself; is it better to have some inconsistencies (mix the good with the bad), or just be consistently bad? One of the most important goals for a software maintainer is to provide repeatability; in other words, consistency within your build process and testing infrastructure. Same word, just scoped differently. Versioning, when used appropriately, can help increase the repeatability of your builds and tests.

So then, let's defined "appropriate versioning". If we follow SemVer closely, then it should not be possible to introduce breaking changes in our software that will negatively impact downstream dependents. If a mistake is made and a breaking change goes out in a version that is not a semantically breaking version, then we have to do the right thing and unpublish/yank the semantically incorrect version. That's all there really is to it. I'll leave you to read the SemVer spec for the precise rules to follow, but this is all that one needs to appropriately version their software.

Stability

This term, "stability", is ironically such a fragile concept. For something to be stable, it must be unchanging. For something to be unchangeable, it must be perfect. Very few things are universally perfect. I like to use science to illustrate this point: the sciences are ever-evolving; knowledge is not static but accumulates and is refined over time precisely because it didn't start out perfect. It is fitting that computer science is one of these fluidly changing organizations of knowledge.

When software engineers think of stability, they are usually looking at it in the context of any of the following layers of abstraction:

API stability. This is what SemVer addresses directly.
ABI stability. The idea that binaries compiled long ago (or by different toolchains) should work transparently with binaries compiled today.
Resilience. Resistance to changing behavior or breaking compatibility, but also resistance to logic bugs and the like.

Resilience in service-oriented software is a very interesting topic of ongoing research, but we can frame resilience in the terms described above. Namely resistance to logic bugs, since changes in behavior or interface are better addressed through tools like SemVer.

So, what do we know about logic bugs in stable interfaces? The first thing that comes to my mind is in the C standard library. It's our old friend, the null-terminated string and its wild band of merry havoc-wreaking unsafe functions! There is good reason this has been dubbed the most expensive one-byte mistake. Now just because C has null-terminated strings and its standard library supports them doesn't mean you have to use these abominations even if you write code purely in C. But the sheer support for them in the wider ecosystem sure does make it harder to integrate various libraries when you are using any of the safe alternatives. Not to mention the fragmentation that each of those contributes to.

We are left with a stable interface whose intractable design makes it almost useless in practice. It may as well not exist at all! We'd be better off without null-terminated strings, de facto. But in some sense, we are stuck with them for all time, and there is nothing that I, you, or anyone else can do about it. This is clearly a highly important lesson of stabilization.

There is another term for stability, often used with critical connotation:

Ossification

No doubt, many will first encounter this term when trying to understand the QUIC protocol. It has been used in various circles for far longer, though.

Here it is seen in an article written in 2007 on the state of the Internet at the time: "The Internet is Broken"
And here it is used in reference to the C++ standard library: "The Day the Standard Library Died"
And Fastly mentions it in regards to the Y2K bug (among other examples): "Minimizing ossification risk is everyone’s responsibility"
Of course, it also makes several appearances in the Cloudflare Blog.

If you read between the lines a little, ossification sounds like an objectively bad thing. It is outside of your control, and you cannot fix it. What could be worse than that? In my opinion, being the decisionmaker in that case would be worse. I wouldn't want to be held accountable for saying "this is our call; it's set in stone and it can never be changed."

Stability comes packaged with this gremlin called ossification, and they are inseparable. What sounds like a good idea right now may turn out to be a bad idea later. It happened with null-terminated strings and nullable pointers, it happened with PHP and Python, and it will happen again. Maybe the next library or application you stabilize will turn out to be totally wrong/broken/inadequate.

You can't tell the future, but you will definitely be stuck with the past.

Outside of standard libraries that ship with programming languages and compilers, it is a pretty safe bet that being totally wrong/broken/inadequate is a reasonable situation to be in; you just patch your library or application when things need to be fixed and all is right, right? I think this actually depends on a few aspects:

The software maintainer needs to be on top of maintenance. If they fall behind on making patches or merging PRs, then you're left with the maintenance burden yourself, often on software that you are not an expert in. It just happens that you depend on its functionality and you get stuck with the bill when the maintainer moves on to other projects.
Other dependents need to maintain their software, too! That means all transient dependencies in the entire tree need the same treatment and amount of prioritization to receive patches in a timely manner.

To echo the sentiment of the Fastly article, as long as everyone keeps up their end of the bargain with maintenance, there isn't a whole lot that can go wrong. It's just an impractical task to herd all of those cats; that's the reality.

What, then, is there to do about standard libraries? Hopefully as little as possible! Considering the likelihood of getting a design or public interface wrong, and the resistance to change with strong stability guarantees, it is in everyone's best interest that standard libraries be as small and self-contained as reasonably possible. Even some primitive types that are useful in general may be better off in a library that can freely be updated (in breaking fasion) by end users. Yes, I'm thinking of strings in C, again; but I'm also thinking about mutex poisoning in Rust; I'm thinking about Python's multiple XML interfaces and its asyncio module that is arguably poorly designed compared to curio and trio; I'm thinking about Go officially supporting largely irrelevant and insecure ciphers like RC4, DES, and MD5 (honorable mention to both pseudorandom number packages, because that's never going to confuse anyone).

Maybe it's better to not include everything in the standard library. We just end up hearing recommendations over and over about how you shouldn't use this feature or that function because it's unsafe/insecure/deprecated/there's a better alternative. Starting with a very small surface area seems like the right way to go. This is not a proposal to start small and grow gradually, though! I don't think a standard library should grow much, if at all.

Think about it this way: Something like JSON might seem like an obvious choice for inclusion now, but in 40 years, is anyone honestly still going to use JSON? I mean, ok, fine. People are still using COBOL. But my point is that 80% of all developers are not COBOL devs. In four decades, sure there will be some JSON stragglers just like there are some COBOL stragglers today, but I can't believe that it will remain popular considering how bad it is for so many reasons. (Let's face it! The only reason protobuf or something else hasn't entirely supplanted JSON is because web browsers have ossified their serialization format.) JSON will very probably end up like null-terminate strings; the bane of some seasoned developer's existence. They both seem just so innocent, don't they?

Be very cautious about what you choose to ossify.

Resilience

I'm not much of a words-person, but when I research things that I write about, I usually go to the thesaurus often to find a good synonym for whatever concept I'm trying to convey. One of the top synonyms given for "flexibility" was "resilience"! And this was a fascinating discovery itself, worthy of a paragraph or two. Flexibility implies resilience because flexible things bend rather than break. When something is rigid and enough pressure is applied, it has to break at some point. And breaking is not very resilient, is it?

This definition of "bend rather than break" applies in a number of relevant ways for software design. A service provider strives for resilience to downtime and outages. A programming language designer and compiler implementer strives for resilience to breaking user code. Interface designers strive for resilience to footguns and obvious logic bugs. In each case, the person wants the software (or user) to bend rather than break.

I like it. In fact, I'm willing to boldly claim that resilience is objectively desired. If that is the case, then its synonym, flexibility, should also be just as desirable, objectively. Among the definitions of this particular word, you'll find "bend rather than break", "ease of adaptation or offering many options", and (my person favorite) "the willingness to adjust one's thinking or behavior." Doesn't that last definition sound suspiciously like science and knowledge? It's no coincidence, and I'm not just playing word games. I'm pointing out the underlying truth that things must change. It's the third law of thermodynamics.

Any resilient system will necessarily resist ossification.

What are these quotes, by the way? Don't worry about it! I just made them up. If I used twitter, these are the kinds of one-liners I would tweet. But I don't, so I don't. You have to read the longform, instead.

I have argued so far that the concept of "stability" has something of a duality. There's the notion that something is stable because it resists external change, and the notion that something is stable because it resists bugs; it resists stagnation. An example of the former is a strong interface stability guarantee from a library. An example of the latter is keeping your browser and OS up-to-date to fix bugs, known security vulnerabilities, and just to acquire new features. Or stated another way, the "it's not a bug, it's a feature" line of thinking is the bug.

Lock yourself in

Having set the stage for stability as a means to lock-in a design or set it in stone, let's dive in to that concept a bit more. There are some examples in package managers where "locking in" your dependency list is a normal best-practice. npm has a package-lock file and a ci command to install packages from it. Cargo has a Cargo.lock file and installs packages from it by default (top-level only; lock files in dependencies are ignored). go.mod files have the same basic use.

Aside (added May 2024):

Nix takes package manager version locking to an extreme. My very broad understanding is that it uses hashes for package references in a similar way that git uses hashes for commit references. Meaning that it will scale well and be resilient, exactly the properties that we are looking for here!

I bring this up because versioning dependencies has been converging on this concept of locking the precise version number of every package in the tree within many major package managers. And it's no surprise, because this is one of the best tools for providing repeatable builds and testing. Semantic versioning plays right into that, as I covered earlier.

There's just one problem. This is not the kind of "locking in" that I allude to in the title. I'm talking about locking yourself into a specific design that is hard to change later. I'm talking about being that decisionmaker that owns the responsibility for putting a crappy middlebox on the Internet without ever updating it to support newer protocol revisions (or newer protocols, period). I'm talking about tying your hands by offering a stable API that just plain sucks. No, I take that back. I'm talking about tying your users' hands. Because it's not the author that is locking themselves into a contract. They are locking their users into a contract.

You may lock yourself in at your own peril, but you lock everyone else in with you at their peril.

This is something I take personal grievance with. Development philosophies like "worse is better" means that everything I use on a daily basis is figuratively worse than it could be otherwise. The Internet is held together with bubblegum and string. Wirth's law is real, and it threatens our very existence. I'll admit, using Wirth's law to criticize Bitcoin is a bit cheeky. But sometimes you just have to be cheeky.

I've rambled on enough, and linked to about a dozen articles that you should read. I'll just leave you with one last thought: Change or die.

Tuesday, July 27, 2021

Mutable statics have scary superpowers! Do not use them

In Rust, it is well-known that static mut is dangerous. The docs for the static keyword have this to say about mutable statics:

If a static item is declared with the mut keyword, then it is allowed to be modified by the program. However, accessing mutable statics can cause undefined behavior in a number of ways, for example due to data races in a multithreaded context. As such, all accesses to mutable statics require an unsafe block.

This makes perfect sense, so far. But my program is single threaded. I know it is single threaded because it runs on bare metal with no_std. I also know that I will need global state to manage access to memory-mapped I/O, and the global state needs to be mutable so that I can replace it at runtime. I can do this with static mut because I do not have to worry about data races with multiple threads right?

Well, that's the thought I initially had, anyway. And I believe I am not alone. But there is a lot of subtlety with static mut references that are problematic, even in a guaranteed single-threaded context. At first I believed it all has to do with the special 'static lifetime, but that is not the case. Rust By Example describes this lifetime as follows:

As a reference lifetime 'static indicates that the data pointed to by the reference lives for the entire lifetime of the running program. It can still be coerced to a shorter lifetime.

That also makes sense, and it's exactly what I want. In my case, the memory-mapped I/O that I wanted to control was a serial port. The primary goal was to reproduce something akin to standard I/O with println! and dbg! macros like those provided by the Rust standard library. A secondary goal was to make the sink configurable at runtime. This would allow the software to run on hardware with different serial interfaces, for example, without recompiling it for each distinct device.

The platform I was targeting is the Nintendo 64. Which, as you might guess, does not have your good old RS-232 serial port. It has a slot for game cartridges, and an expansion bus on the bottom which is basically just another cartridge port. It also has some controller ports on the front. Nothing really stands out as a plug-and-play means to get a serial console interface. As luck would have it, there are development cartridges that have an FTDI chip and USB port on them for exactly this purpose! The bad news is that each dev cart has its very own custom MMIO interface, and none of them are compatible.

Going back in time a bit, there are also debug interfaces on the original development hardware used in the 90's (and for a few years in the early 2000's) which use either parallel or SCSI ports. Some emulators support these interfaces because games (especially prototypes) actually print interesting things to them.

So we're in a situation where we want to write software for a platform (N64) that has a number of different serial interfaces available, but we don't want to build 3 or 4 different executables and put the burden on the user to figure out which one they need. And so here we are; we make the serial I/O implementation configurable at runtime. It can detect which interface is available, and that's what it uses. No sweat!

The core:fmt::Write trait

We want to use core::fmt::Write to make the implementation generic. This trait provides some useful methods for implementing the macros, and the core formatting machinery takes care of the details I don't want to think about. There are far better designs than what I came up with, but the initial implementation of my runtime-configurable I/O API would store a trait object (implementing fmt::Write) in a lazy_static. And the macros just needed a way to get a mutable reference to it. That's &'static mut dyn fmt::Write, mind you. But, I thought to myself, if I just hide all that inside the macro, nothing bad can happen!

What I didn't realize is that static mut has superpowers. I'll try my best to explain what these superpowers are, but please bear with me! I do not fully understand why these superpowers exist or what they are fully capable of. What I do know is that you never, ever, under any circumstance, want to anger them.

The 'static lifetime means that it lives forever (as far as the program is concerned) and making a static mutable means that you can only touch it in unsafe code, because threads and data races, right? Wrong! You need unsafe because the superpowers of static mut allow you to introduce instant Undefined Behavior. To explain, let's revisit what &mut means in the first place.

It's true that &mut allows you to change the contents of the thing behind it, but that's really just a side effect of the real purpose of &mut; it's an exclusive (unique) borrow. There can be only one.

All jokes aside, this is an invariant that you cannot break. Creating mutable aliases is instantly Undefined Behavior. Invoking Undefined Behavior gives the compiler license to do anything to your code. We really only want the compiler to do what we want it to do with our code, so let's just agree to avoid UB at all costs. To really nail this one down, consider what the Nomicon has to say about transmute:

Transmuting an & to &mut is UB.

Transmuting an & to &mut is always UB.

No you can't do it.

No you're not special.

There is some subtlety here that I won't go into, but the core of the issue is that this creates mutable aliases. Don't do it!

A starting point

There are some things that you will naturally begin to intuit while writing Rust after you have been using it for a while. It's helpful to recognize things that will cause the borrow checker to reject your code, for example. A quick internal borrow check in your brain can save a few seconds (or minutes, in extreme cases) of compile time just to be provided with an error message.

Let's begin with this bare minimum State struct:

#[derive(Copy, Clone, Debug)]
struct State {
    x: i32,
}

impl State {
    const fn new() -> Self {
        Self { x: 0 }
    }

    fn get_mut(&mut self) -> &mut i32 {
        &mut self.x
    }
}

There is nothing really special here, so far. You can gain a mutable reference to the x field if you have mutable access to the struct. Standard stuff, so far. If you attempted to write code like the following, it might make you pause as your internal borrow checker raises a red flag:

fn main() {
    let mut state = State::new();

    let a = state.get_mut();
    let b = state.get_mut();

    *a = 42;

    println!("I have two i32s: {}, {}", a, b);
}

Importantly, it isn't the explicit dereference (*a = 42;) that trips the borrow checker, here. It's the println! macro. And the error message illustrates this perfectly:

error[E0499]: cannot borrow `state` as mutable more than once at a time
  --> src/main.rs:22:13
   |
21 |     let a = state.get_mut();
   |             ----- first mutable borrow occurs here
22 |     let b = state.get_mut();
   |             ^^^^^ second mutable borrow occurs here
23 | 
24 |     println!("I have two i32s: {}, {}", a, b);
   |                                         - first borrow later used here

This mutable aliasing issue is something I've become more aware of over time, so it's kind of silly for me to write examples like this. The demonstration does help prove my point about superpowers though.

Let's make it static

Using the same State struct from above, let's make some minor adjustments to the program that uses it, so that State becomes static:

static STATE: State = State::new();

fn main() {
    let a = STATE.get_mut();

    *a = 42;
    
    println!("I have one i32: {}", a);
}

Your internal borrow checker should raise another red flag, here. We made the state static, but not mutable, so we cannot compile this code:

error[E0596]: cannot borrow immutable static item `STATE` as mutable
  --> src/main.rs:20:17
   |
20 |         let a = STATE.get_mut();
   |                 ^^^^^ cannot borrow as mutable

Everything is as expected so far. We need to make the static state mutable so that we can borrow it mutably (exclusively, remember). And that also means we need to use the unsafe keyword. We're about to find out why it's actually unsafe. :)

static mut STATE: State = State::new();

fn main() {
    unsafe {
        let a = STATE.get_mut();

        *a = 42;
    
        println!("I have one i32: {}", a);
    }
}

Well, that's not so bad! My internal borrow checker can't spot anything wrong with that. And neither can the compiler's (much better) borrow checker:

I have one i32: 42

But now I want to get spicy! 🔥 I want two exclusive references to the same memory. Or maybe I don't, but it accidentally happens anyway.

static mut STATE: State = State::new();

fn main() {
    unsafe {
        let a = STATE.get_mut();
        let b = STATE.get_mut();

        *a = 42;
    
        println!("I have two i32s: {}, {}", a, b);
    }
}

My internal borrow checker says this cannot work, just like it didn't work before. And the Rust borrow checker is way better than mine, so surely it won't allow this!

I have two i32s: 42, 42

ಠ_ಠ

(╯°□°）╯︵ ┻━┻

To be fair, trying to run this code in miri does detect the Undefined Behavior. This is a very powerful tool, along with the sanitizers. Always use these when you are writing unsafe code! No exceptions. This is a huge relief, or at the very least it's better than nothing. Here's what miri finds:

error: Undefined Behavior: no item granting write access to tag <2843> at alloc1 found in borrow stack.
  --> src/main.rs:23:9
   |
23 |         *a = 42;
   |         ^^^^^^^ no item granting write access to tag <2843> at alloc1 found in borrow stack.

Still, you may be caught off guard by this. Don't worry, it's a perfectly normal reaction. We didn't fundamentally change the State implementation, and yet putting it into a static mut changes its semantics. We might be given a hint about why this is if we desugar the get_mut method's function signature:

fn get_mut(&'static mut self) -> &'static mut i32

Go ahead, try it out! It compiles just the same. The static lifetime was simply elided. But wait, what if we keep this 'static lifetime and go back to the safe program code that puts State on the stack?

error[E0597]: `state` does not live long enough
  --> src/main.rs:21:13
   |
21 |     let a = state.get_mut();
   |             ^^^^^----------
   |             |
   |             borrowed value does not live long enough
   |             argument requires that `state` is borrowed for `'static`
...
31 | }
   | - `state` dropped here while still borrowed

So it is true that the 'static lifetime is special. We cannot borrow State for 'static because State lives on the stack. We would have to move it to the heap and leak the reference to make it live for 'static if we wanted to call the method with this lifetime bound. However, we cannot create mutable aliases in safe code with leaked boxes because they are not static.

In the case of static mut, we're allowed to borrow with a 'static lifetime, but we're also allowed to exclusively borrow multiple times. If you could only exclusively borrow a static mut once, it would make the feature useless for many valid cases. And this is the tricky part that, at least in my mind, makes static mut such a nightmare. You don't have a choice but to ensure the invariant that mutable aliasing does not occur, but the compiler won't prevent you from doing so. It is particularly insidious when your safe interface allows reentrancy, and that's how I hit this problem. Nested dbg! macro calls are pretty common.

This is the superpower of static mut.

How did I fix this in my configurable serial I/O?

Well, I'm not entirely sure that I did, to be honest! But I did find a way to get my macros to receive an exclusive borrow with a lifetime shorter than 'static. And I did that by not borrowing a static mut. Instead I have statics with interior mutability (STDOUT and STDERR). I opted to use spin::Mutex<RefCell<T>> to keep my uses of the unsafe keyword to a minimum, but it's also possible to store state with interior mutability in a static UnsafeCell. This is the recommended replacement according to Consider deprecation of UB-happy `static mut` · Issue #53639 · rust-lang/rust (github.com)

I also had to jump through hoops to get trait objects working without Box, but that was another rabbit hole entirely. In the end, I learned a valuable lesson about static mut and how bad it is for your health (and mine!) There are ways to make it work that are trivially safe, but reading through that GitHub ticket ought to ruffle your feathers, even if I haven't already convinced you that static mut probably should not be used.

Saturday, June 13, 2020

The Rust compiler isn't slow; we are.

This might be a bit of an unpopular opinion, so the clickbait title should be appropriate!

I've been learning and using Rust for nearly four and a half years, since version 1.6. That's a good amount of time to become familiar with some of the troubles of the language, the tooling, and the ecosystem. But this is a slightly different story, this is a dissent to a common criticism that the compiler is slow. In this discussion, I will argue that these claims are misleading at best.

Screenshot of Activity Monitor showing rustc using 100% CPU

rustc compiling a non-trivial application.

There is plenty of evidence that suggests rustc is slow relative to compilers for other languages; Go can compile applications with similar complexity in 1/5 the time, and clang or gcc can do the same for applications written in C. It might be acceptable to justify that the Rust macro and type systems are significantly more complex than what is usually written in either Go or C, and that it warrants much heavier lifting by the Rust compiler. While there is some truth in that sentiment, I believe there is more to it than that. I believe it's more of a human problem than an algorithmic one.

Tiny MCU 3D Renderer Part 10: Benchmarking, profiling, and optimizations

The reason I have been so quiet recently is because I've been working on a very dry subject; testing. Unit tests are probably the most uninteresting thing I do on a daily basis as a software architect. Create some fixtures, write some assertions, run the test suite, fix any issues, repeat. It makes me cringe, just thinking about it.

But here I am, blogging about how I've been writing tests in my fun personal project, too! I am taking this project very seriously, so tests are a necessary evil. I haven't checked coverage yet (because I hit this bug with tarpaulin vs serde_derive), but it's probably somewhere around 70~80%.

Update 2017-11-05: serde_derive is only used in the app. The lib doesn't have this dependency, so I can run coverage on the lib separately. The result is even better than I imagined: 85.14% coverage, 573/673 lines covered

Surprise! There are only 26 tests (ignoring the 4 benchmark tests), how can coverage to be so high? This is one of the most interesting things about 3D rasterization; it doesn't require a whole lot of code! To be honest, though, I'm only exposing the bare minimum set of APIs; for example there's no support in the fixed function pipeline for stencil/scissor, alpha blending, etc.

Tiny MCU 3D Renderer Part 9: Bug fixes, and first shader tests

It's shader time! I finished the simple sunbeam, which I think looks quite nice in motion.

It could use a little work on the gradient, and some extra geometry wouldn't hurt. There are four sunbeams in this test scene. Each is just a plane (two triangles). I'm thinking of splitting the plane into thirds (vertically) so I can shape it more into a semi-circle, instead of just laying flat in the background. That might give it some depth, and help combat the problem with the right-most sunbeam looking so thin.

Tiny MCU 3D Renderer Part 8: Programmable Pipeline and Asset Pipeline

Oh hey, look at that! I changed the CSS on my blog a smidge. It's worth mentioning, anyway. Say hello to the Blipjoy Invader! You can just make it out on the left side, there. (Depending on your screen resolution, it might be behind the post content, whoops!)

I also finalized the programmable pipeline on the 3D renderer, thanks to @vitalyd over at the Rust user's forum for the hint I needed to push me in the right direction. The API isn't exactly what I had in mind, but it's certainly reasonable.

This is what I built yesterday. The model on the left is drawn with our very familiar Gouraud shader with interleave dithering on a four-color gradient. This is what we've seen exclusively in screenshots to this point. On the right is something new! A much simplified shader that renders something like a cartoon, aka cel shading minus edge detection. Each model rotates in opposite directions for funsies.

Code from the last article will be referenced below.

Tiny MCU 3D Renderer Part 7: Generics and Traits, oh my!

If you've been following my blog, you'll know that I've been writing a 3D renderer in Rust. This is my first real experience using the language. I dabbled a bit in Rust on a project in January 2016, where I was challenged by lifetime annotations, and gave up. This time around, I've managed to write a complete software renderer with all the bells and whistles of a modern shader model, without the need for a single lifetime annotation. And until just recently, without defining a single Trait or Generic.

Earlier articles in this blog series have focused exclusively on the renderer from an end user's point of view. In other words, trying to make it attractive to the everyday gamer. In this episode, I want to describe in detail one of the issues that I have been struggling with in the code design, and how I've been approaching a solution. This is by no means the "right way" to handle this, or even similar situations. I just want to provide some info for anyone who happens upon the right magical incantation in The Googlies, and ends up reading this.

(Edit 2017-10-04: The illustration below was originally flipped vertically. It now shows the correct scanning direction.)

Triangle rasterization scans pixels in the target buffer from left-to-right and bottom-to-top

So let's start from the middle, and work our way out. Above is a simplified representation of the rasterization step of the renderer. The details prior to, and those that follow rasterization can be ignored for the time being.

This image represents a zoomed-in detail of a frame buffer (or any render target) as a 2D triangle is being rasterized. This occurs after the perspective division, so the Z coordinate can simply be dropped for rasterization; the Z component is used later for depth testing.