Sunday, August 10, 2025

Are we Teaching Rust Effectively?

Prologue

When I was a young Rustacean, I can remember being profoundly confused by lifetime annotations. The concept is incredibly simple, and I don't believe the core idea poses a major hurdle for students: the borrow cannot outlive the value. This is especially obvious for anyone coming from a C background. Less obvious is that this is an invocation of subtyping in Rust [1].

So, why did I find the annotations confusing? I'm talking about not only the syntactic lifetime annotations like 'a, but also their semantics. My first impression was (as far as I can remember), "these are just noise." They appeared to clutter the code without providing clarity. That was back in early 2016. After a year or so, my perspective had completely flipped.

And nearly a decade on, I am pretty comfortable with lifetime annotations, and I find myself frequently naming them with intention. Lifetime names like 'a and 'b are bad style, in my opinion, especially when many lifetimes that do not add clarity can be elided. I might abbreviate some of them like 'win instead of 'window, but it's still a good signal to the reader about what this lifetime represents. Another specific name I frequently use is 'arena when I have an arena allocator. This one is most useful for invariant shared lifetimes like &'arena Node<'arena, T>.
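
To make the point concrete, here is a tiny sketch (the Window and Title types are invented purely for illustration) where a descriptive lifetime name documents what the borrow is tied to:

```rust
// A descriptive lifetime name documents the borrow's origin: 'win reads
// better than 'a because it says the data is borrowed from a Window.
struct Window {
    title: String,
}

struct Title<'win>(&'win str);

fn title_of<'win>(window: &'win Window) -> Title<'win> {
    Title(&window.title)
}

fn main() {
    let window = Window { title: String::from("main") };
    assert_eq!(title_of(&window).0, "main");
}
```

The compiler treats 'win exactly like 'a; the name is for the humans reading the signature.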

If the term "invariant shared lifetime" means little to you, I can recommend Arenas in Rust to start you on a profound journey through category theory. 😃

This intro is just a bit of exposition to set the stage for a problem that gives me pause. Lifetimes are an important concept in Rust, but I would say they are not the most important concept. Lifetimes are just the cost of doing business. They are tightly coupled with the concept of ownership and borrowing. Which means neither of these concepts are the most important, either. They all sit at the same level of importance, if we could order language features as such.

In my humble opinion, the Aliasing Xor Mutability [2] principle (henceforth AXM) is Rust's raison d'être. Everything else is supported by this principle. I am not the first to make this observation. In 2013, Niko Matsakis made the connection clear within "On the connection between memory management and data-race freedom".

In "Notes on a smaller Rust", withoutboats adds algebraic types and RAII as other necessary components. I like these features of Rust a lot, and they certainly make my code more robust. But I can hardly put them at the same level of importance.

The entire functional programming paradigm exists as a result of confining all mutable state (or more broadly, side effects) to an effect system. Rust's primary contribution is proving that programming languages do not need to go as far as eliminating mutability, just eliminating the combination of mutability and sharing is enough.

The point I'm trying to make is that AXM is fundamental. More than ownership, borrowing, lifetimes, automatic memory management, and type systems. Don't misunderstand how I'm framing this. These features all work together to make Rust what it is. But if I had to choose just one to start with when teaching the language, it would be AXM. By a long shot, and it's not even close.

The majority of the introductory material should focus solely on this one property. Because it shows up everywhere from your first borrow to your last unsafe implementation. It is a rule you can never break [3] for safe &/&mut references. The rule is absolute; interior mutability is an opt-in runtime-checked or hardware-checked exception. I think this is all strong enough evidence to make the claim that AXM is a big deal.
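
As a minimal sketch of the rule in action (plain std, nothing project-specific): any number of shared borrows may alias, one exclusive borrow may mutate, but the compiler rejects any overlap of the two.

```rust
fn main() {
    let mut data = vec![1, 2, 3];

    // Aliasing without mutability: any number of shared borrows is fine.
    let a = &data;
    let b = &data;
    assert_eq!(a.len() + b.len(), 6);

    // Mutability without aliasing: one exclusive borrow, once the shared
    // borrows above are no longer used.
    let m = &mut data;
    m.push(4);

    // Mixing them is rejected at compile time. Using `a` again down here
    // would make it overlap with `m`, and the borrow checker refuses:
    // error[E0502]: cannot borrow `data` as mutable because it is also
    // borrowed as immutable.
    assert_eq!(data.len(), 4);
}
```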

The magnitude of importance of the AXM principle wasn't brought to my attention while I was initially learning the language. It didn't even show up on my radar until I read The Problem With Single-threaded Shared Mutability (which links back to Niko's post that I mentioned before), probably sometime in 2018. The analogy with read/write locks is what finally made it "click" for me.

Another enlightening read was the "GhostCell: Separating Permissions from Data in Rust" paper, which I also found around the same time. I learned of this principle's significance much later than I should have, and it looks like others might be falling into the same trap.

Evaluating Rust Learning Material

Now, let's compare with how Rust is actually taught in the wild, at least in a few specific cases. To start with, The Book elevates the AXM discipline early on in Chapter 4.2: References and Borrowing, directly after an introduction to the language syntax and concepts. The principle is discussed in the literature; I believe what's missing is making it the spine of the beginner curriculum.

"Learn Rust with Entirely Too Many Linked Lists" is often cited for its humorous display of the difficulty of writing a simple data structure in Rust. Six implementations of a linked list with the goal of teaching pointers, ownership, borrowing, interior mutability, keywords, patterns, generics, destructors, testing, miri, raw pointers, aliasing, and variance. That's a nice list of things that do need to be covered to learn the language.

Ownership is covered in Chapter 2.3: Ownership 101 as a primary principle. I did not get the sense from this chapter that sharing and mutation are mutually exclusive. It's maybe only vaguely implied. The principle is not specified (as far as I could tell) until Chapter 3.6: IterMut.

Subtyping is not covered until the sixth implementation. Subtyping is essential knowledge for all references, not just raw pointers and unsafe code. Just a short and early introduction to variance, like the relationship between shared and mutable references, is enough to set a basic intuition. The prologue to this very post introduces a common example of invariant shared references, and we'll see it used later.

"Too Many Linked Lists" is not the only source that could spend more time on AXM. In the last TWiR, there was an article about the difficulty of writing a binary tree. I empathize with the author, the problem they were given is very frustrating. But I disagree with their conclusion that the difficulty is caused by Rust.

The lesson on LeetCode provides a type definition of the tree using shared ownership and interior mutability. Neither is required for a binary tree, and both add unnecessary complexity that makes solving this "easy problem" "hard", in the words of the Monday Morning Haskell blog.

To my point, the blog post in question pins the blame on ownership. My suspicion is that if more attention were given to AXM in educational material, this misconception might have been avoided. In the best case, the student might even see the red flag and question the lesson. Here is the type definition given:

// Definition for a binary tree node.
#[derive(Debug, PartialEq, Eq)]
pub struct TreeNode {
    pub val: i32,
    pub left: Option<Rc<RefCell<TreeNode>>>,
    pub right: Option<Rc<RefCell<TreeNode>>>,
}

This type definition wants to pretend that you can break the AXM principle. If you try it anyway with this type, the code will panic when borrowing (RefCell is a runtime-checked read/write lock). There is no need for RefCell in this type. You can (and should) just use normal references. And there certainly isn't any need for reference counting, or restricting usage to a single thread.
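
That runtime check is easy to demonstrate (a minimal sketch using only std): RefCell hands out read and write "locks", and a write attempt while a read borrow is outstanding fails. borrow_mut() panics in that case; try_borrow_mut() surfaces the same conflict as an Err.

```rust
use std::cell::RefCell;

fn main() {
    let cell = RefCell::new(5);

    // An outstanding shared borrow acts like a held read lock.
    let reader = cell.borrow();

    // While it is held, a write borrow fails the runtime check.
    // `borrow_mut()` would panic here; `try_borrow_mut()` reports Err.
    assert!(cell.try_borrow_mut().is_err());
    drop(reader);

    // Once the read borrow is released, mutation succeeds.
    *cell.borrow_mut() += 1;
    assert_eq!(*cell.borrow(), 6);
}
```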

It also allows constructing DAGs and cyclic graphs, which are not binary trees. In a general sense, a binary tree is tree-structured and acyclic. The data structure can be designed intentionally as a graph, but I would not call it a tree in that case.

Binary trees have a purpose, but I would certainly change the type definition if I were using the data structure in learning material. I would instead start with this definition:

// Definition for a binary tree node.
#[derive(Debug, PartialEq, Eq)]
pub struct TreeNode {
    pub value: i32,
    left: Option<Box<TreeNode>>,
    right: Option<Box<TreeNode>>,
}

The remainder of this post will explain why, using the AXM principle.

Aliasing Xor Mutability

The single most important thing that you must always respect is the AXM principle. Without it, you do not have a Rust program. This discipline constrains how the data structure we are about to implement can be used. It also informs the designer on crucial design considerations, namely ergonomics.

For instance, if a reference to any internal state is going to be held by any caller (because the interface returns references), then the designer should remember that references are, for all intents and purposes, statically-checked read/write locks. Design your interfaces around this knowledge so that references are held only as long as necessary, and no longer.

I can imagine a method that takes a path and optionally returns an arbitrarily deep node reference. That would be a lot less fragile than requiring the caller to walk each node individually, and would eliminate any possibility of holding onto references to any intermediate node. Reducing the appeal of holding references for long spans goes a long way in making Rust work. I would even say that this is part of the art form for interface designers.

By the way, did you happen to notice that the left and right fields on this struct are private? That's fairly important for encapsulation, but we are not going to rely on it for upholding invariants. There really are no invariants with this data structure aside from those provided by Option, Box, and references. They could just as well be made public.

The Constructor

The constructor is almost exactly what you might expect. I decided to hide the Box as an implementation detail, so that callers do not have to box everything before Option-wrapping. This is an ergonomic consideration.

impl TreeNode {
    pub fn new(value: i32, left: Option<Self>, right: Option<Self>) -> Self {
        Self {
            value,
            left: left.map(Box::new),
            right: right.map(Box::new),
        }
    }
}

That's enough to construct immutable trees. But we can also add accessor methods to allow callers to walk the tree and insert or update nodes.

impl TreeNode {
    pub fn left(&self) -> Option<&Self> {
        self.left.as_deref()
    }

    pub fn set_left(&mut self, left: Option<Self>) -> Option<Self> {
        std::mem::replace(&mut self.left, left.map(Box::new)).map(|value| *value)
    }

    pub fn right(&self) -> Option<&Self> {
        self.right.as_deref()
    }

    pub fn set_right(&mut self, right: Option<Self>) -> Option<Self> {
        std::mem::replace(&mut self.right, right.map(Box::new)).map(|value| *value)
    }
}

Once again, the Box is hidden as an implementation detail. This is the only reason I decided to make the left and right fields private.

Finally, let's solve this "Invert Tree" problem. I'm going to dismiss the interface provided by LeetCode because it completely ignores mutable references, and instead requires moving ownership of nodes. I came up with this interface and solution, which I believe to be a remarkable improvement:

impl TreeNode {
    pub fn invert(&mut self) {
        std::mem::swap(&mut self.left, &mut self.right);

        if let Some(left) = &mut self.left {
            left.invert();
        }
        if let Some(right) = &mut self.right {
            right.invert();
        }
    }
}

That wasn't so bad, was it? No cloning, no reference counting, no interior mutability, no lifetime annotations. It adheres to the AXM principle. So long as the caller is not currently holding an aliased reference to the tree, it can be inverted.

I didn't code golf it down to one line, but I don't care. Terse code is not "easy" code. It's a compression scheme that depends on the writer and readers having the same foundational knowledge base (hint: they do not).

Unfortunately, I cannot just paste this into LeetCode and see if it solves the problem! LeetCode provided a type definition that I disagree with. One that made this problem seem hard. The problem itself is very approachable, even in Rust. And that is my main concern with how the language is being taught, at least in the few instances I've explored here. In order to write a decent implementation, I have to discard what the learning material is trying to impart.

I cannot reasonably expect anyone to come up with this kind of solution on their own if they are not yet familiar with Rust and the AXM principle. They would have to be intimately familiar with the concept to question the lesson.

What is LeetCode trying to teach in this case? It doesn't appear to be "good data structure design" or "good API design". It isn't even "how to write C code in Rust". I think what's actually going on is that the LeetCode curators are not Rust educators. They are mechanically converting lessons from other languages like C++, Java, and Python. It causes an impedance mismatch that does a harsh disservice to learners.

LeetCode isn't what's Wrong with Rust Education

This is just my opinion (okay, everything in this post is my opinion), but I think LeetCode is projecting a symptom, not the problem at large. The symptom is that shortcuts and escape hatches like cloning, reference counting, and interior mutability are getting overused, and ownership is overemphasized.

The problem as I see it is that the real fundamentals, like the AXM principle and subtyping, are not put at the forefront of lessons and truly hammered in as a pillar or keystone of the language. Not enough time is spent early on dissecting why they are needed and how to use them effectively, especially when designing interfaces.

There is no roadmap for learning the most important parts of the language in beginner courses. You have to learn it the hard way, just like everyone else did. The Curse of Knowledge strikes again.

What else does this Teach Us?

Before I forget, let's also implement that tree traversal by path method that I briefly mentioned earlier:

#[derive(Copy, Clone, Debug)]
pub enum Direction {
    Left,
    Right,
}

impl TreeNode {
    pub fn walk(&self, path: &[Direction]) -> Option<i32> {
        let mut result = self;

        for dir in path {
            result = match dir {
                Direction::Left => result.left(),
                Direction::Right => result.right(),
            }?;
        }

        Some(result.value)
    }
}

No surprises here. The path is used to traverse the tree iteratively, and the method returns the value at that path location if it is reachable. If we also need a mutable version, it ends up looking almost the same:

impl TreeNode {
    pub fn walk_replace(&mut self, path: &[Direction], value: i32) -> Option<i32> {
        let mut result = self;

        for dir in path {
            result = match dir {
                Direction::Left => result.left_mut(),
                Direction::Right => result.right_mut(),
            }?;
        }

        Some(std::mem::replace(&mut result.value, value))
    }

    pub fn left_mut(&mut self) -> Option<&mut Self> {
        self.left.as_deref_mut()
    }

    pub fn right_mut(&mut self) -> Option<&mut Self> {
        self.right.as_deref_mut()
    }
}

This method takes a value and swaps it into the node identified by the path, returning the existing value. It does not create missing intermediate nodes, so it is not a general-purpose setter.

What I'm trying to show here is that using temporary borrows to access your data structures, even in some non-trivial ways, can be done in a straightforward manner. This is how I would have preferred to learn Rust, if I could start all over [4].

Show me that I can do these kinds of things with references and how the AXM principle makes that possible. Show me that the borrow checker is not adversarial. Give me an intuition that Rc, RefCell, Arc, Mutex, RwLock, and Clone are all tools that I can opt into when the situation calls for it. Help me recognize what those situations are. Don't treat every problem as something to be solved by garbage collection, workarounds, or someone more knowledgeable.

A Mini Postmortem

If you attempt to use the data structure we've designed, you'll find that it necessarily has some shortcomings. For one thing, you won't be able to hold a reference to any interior node while also attempting to insert a new node into a non-conflicting branch. This is intentional. But the API as introduced doesn't tell us why. I can logically prove that it would be safe to do this, but only because the interface leaks implementation details: I must know the internal structure of the tree to do the analysis. Notably, a caller who did not design this data structure can reach the same conclusion without ever looking at the implementation, because the public interface contains all of the information necessary to find this limitation.

My observation on this point is that the interface itself is bad. Requiring users to traverse the tree using "left" and "right" instructions breaks the abstraction barrier. A better interface would encapsulate the implementation detail that the data structure is a binary tree. It might seem like an unusual premise, that a binary tree would want to hide the fact that it's a binary tree. In my experience, I tend to think of abstractions from an outside perspective most of the time. I don't care that this map is implemented as a binary tree, or a hash table, or a B+ tree, or a trie. They might have specific properties that appeal to me, but what I really want is to be able to replace them as requirements change.

If my code ends up being littered with "left, right, right, left, right" paths, that is going to make it more difficult to switch to a keyed data structure later. There are a few ways around this:

  1. Use keys instead of explicit paths. This can use any mechanism that allows a key to uniquely identify every node in the data structure. One example is an opaque handle that is treated directly as a bitmap describing the path (0 for left, 1 for right), or indirectly through the Hash trait for arbitrary key types. Or some other "basic property" of the key like ordering (this is what std::collections::BTreeMap does).

  2. Use the values themselves, constructing a set instead of a map. It requires that values are unique, but the same strategies apply because sets are just maps without values: Set<T> ~= Map<T, ()>
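
As a rough sketch of the first option (NodeId and its bit layout are my own invention for illustration, not any standard API), an opaque handle can pack the path into an integer, one bit per step:

```rust
// Hypothetical opaque handle: `bits` encodes the path (0 = left,
// 1 = right) and `depth` says how many of those bits are meaningful.
#[derive(Copy, Clone, Debug)]
pub struct NodeId {
    bits: u64,
    depth: u8,
}

impl NodeId {
    pub const ROOT: NodeId = NodeId { bits: 0, depth: 0 };

    pub fn left(self) -> NodeId {
        NodeId { bits: self.bits, depth: self.depth + 1 }
    }

    pub fn right(self) -> NodeId {
        NodeId { bits: self.bits | (1 << self.depth), depth: self.depth + 1 }
    }

    /// Decode the handle back into a left/right path (false = left).
    pub fn path(self) -> impl Iterator<Item = bool> {
        (0..self.depth).map(move |i| self.bits & (1 << i) != 0)
    }
}

fn main() {
    // root -> left -> right decodes back to [left, right].
    let id = NodeId::ROOT.left().right();
    let path: Vec<bool> = id.path().collect();
    assert_eq!(path, vec![false, true]);
}
```

The caller only ever sees an opaque NodeId; how it maps onto the structure stays the implementation's business.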

Making this "abstract leap" helps escape the problem where limitations can be identified by logical deduction. It allows the designer to reserve the right to make changes that would otherwise be breaking. This intentional limitation makes it possible for the next version of the data structure to be self-balancing without requiring all users to change how they are interacting with it. Or changing it internally to a B+ tree. We're simultaneously mitigating bad access patterns and future churn.

There's another obvious benefit to raising the abstraction bar. Ergonomics increases ten-fold when the caller can bring their own key without worrying about the specifics of how it might affect memory layout or traversal time. The abstraction is a promise that it will do the right thing.

I'll leave the design of a good abstract interface for this data structure as an exercise for the reader. Presumably I've given some good ideas to help you get started on what it could look like. I've also glossed over an important detail regarding making the node's value generic. The original type definition used i32, so that's what I copied. I also designed a variant that used generic values while writing this article. (A binary tree that only holds a specific number type is practically useless.) The biggest concerns came up as part of the next implementation because it needs two separate arenas: one for the nodes and one for the values. This article is long enough as it is!

One final thought on data structure design: The one we've come up with so far is limited not only in how internal references can be held by callers, but also the way in which the tree can be structured. It is strictly a tree. When trees become DAGs or cyclic graphs, you must re-introduce interior mutability (and perhaps also shared ownership). The next section shows one way to do this intentionally.

Arena-based Graph

Let's assume for a moment that the reasoning presented above just isn't convincing. You need to hold a reference while simultaneously mutating, and nothing else could possibly work. That's where arenas come in.

I'll spare you the walk-through this time, but I also implemented the same kind of binary tree interface using an arena and shared references everywhere, which allows constructing graphs:

use std::cell::Cell;
use typed_arena::Arena;

// Definition for a graph.
#[derive(Default)]
pub struct Graph<'arena> {
    nodes: Arena<GraphNode<'arena>>,
}

impl<'arena> Graph<'arena> {
    pub fn node(
        &'arena self,
        value: i32,
        left: Option<&'arena GraphNode<'arena>>,
        right: Option<&'arena GraphNode<'arena>>,
    ) -> &'arena GraphNode<'arena> {
        self.nodes.alloc(GraphNode::new(value, left, right))
    }
}

// Definition for a graph node.
#[derive(Debug, PartialEq, Eq)]
pub struct GraphNode<'arena> {
    value: Cell<i32>,
    left: Cell<Option<&'arena GraphNode<'arena>>>,
    right: Cell<Option<&'arena GraphNode<'arena>>>,
}

impl<'arena> GraphNode<'arena> {
    fn new(
        value: i32,
        left: Option<&'arena GraphNode<'arena>>,
        right: Option<&'arena GraphNode<'arena>>,
    ) -> GraphNode<'arena> {
        GraphNode {
            value: Cell::new(value),
            left: Cell::new(left),
            right: Cell::new(right),
        }
    }

    pub fn invert(&self) {
        self.left.swap(&self.right);

        if let Some(left) = self.left.get() {
            left.invert();
        }
        if let Some(right) = self.right.get() {
            right.invert();
        }
    }

    pub fn value(&self) -> i32 {
        self.value.get()
    }

    pub fn set_value(&self, value: i32) {
        self.value.set(value);
    }

    pub fn left(&'arena self) -> Option<&'arena GraphNode<'arena>> {
        self.left.get()
    }

    pub fn set_left(
        &'arena self,
        left: Option<&'arena GraphNode<'arena>>,
    ) -> Option<&'arena GraphNode<'arena>> {
        self.left.replace(left)
    }

    pub fn right(&'arena self) -> Option<&'arena GraphNode<'arena>> {
        self.right.get()
    }

    pub fn set_right(
        &'arena self,
        right: Option<&'arena GraphNode<'arena>>,
    ) -> Option<&'arena GraphNode<'arena>> {
        self.right.replace(right)
    }
}

The details of how this works are very interesting, and you have to understand them to write code like this. The key is that invariant shared references are Copy.
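
To illustrate with a minimal sketch (just std's Cell, no arena): copying a shared reference gives you as many handles as you like, and any of them can mutate through the Cell without ever forming a &mut.

```rust
use std::cell::Cell;

fn main() {
    let node = Cell::new(13);

    // Shared references are Copy: `b` is a plain bitwise copy of `a`,
    // and `a` remains usable afterwards.
    let a: &Cell<i32> = &node;
    let b = a;

    // Mutation happens through the Cell, never through a `&mut`,
    // so the aliasing is sound.
    b.set(42);
    assert_eq!(a.get(), 42);
}
```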

This implementation is structured differently from the original heap-allocated data structure we came up with before. There's a new Graph type that serves as the owner of the Arena. (I spent a great deal of time downplaying the role of ownership only for its importance to surface! But I maintain that the knowledge of AXM is a far superior tool.)

It can also be code golfed a bit by defining a type alias for the node pointers:

// Definition for a graph node pointer.
pub type NodePointer<'arena> = Option<&'arena GraphNode<'arena>>;

Some might consider that the interface with this type alias is easier to read, but I think it hides the invariant shared lifetime. Hiding (or making opaque) one of the crucial reasons this code even works just to save a few lines does not seem like a wise tradeoff.

Usage looks something like this:

let graph = Graph::default();

let foo = graph.node(13, None, None);
let bar = graph.node(42, None, None);
let baz = graph.node(27, Some(bar), None);
let root = graph.node(0, Some(foo), Some(baz));
dbg!(root);

root.invert();
dbg!(root);

Same bad constructor and traversal interface as before, but this time it allows the intuitive "mutate unrelated branches" idea to work. For instance, I can keep the bar reference around and change the node's value to 33 after inserting a right child into baz, or even after inverting the graph at the root node. That saves a graph traversal after each mutation.

Because this implementation is arena-based, it gains a new capability. The graph can contain cycles:

let graph = Graph::default();

let foo = graph.node(13, None, None);
let bar = graph.node(42, Some(foo), None);
foo.set_left(Some(bar));

Just don't dbg!() print it! It will traverse the graph endlessly. A Debug implementation that is aware of potential cycles is better than deriving.

The biggest caveat with arena-based data structures is that they only grow. When growth is left unbounded, that's a memory leak. The only way to reclaim their memory is dropping the Arena. They tend to be useful in niche scenarios, like compilers that need to allow cyclic references within an AST. Additionally, &Cell<T> is not thread-safe; it would have to be changed to Arc<RwLock<T>> to be shareable across threads.

This ties into my earlier point about teaching situational awareness. Students need to learn what situations are a best fit for all of the complex tools at their disposal. And the best way to do that is by teaching abstract concepts. Although, doing that is difficult itself and commonly boils down to explanation by analogy, as prominently criticized by E.W. Dijkstra in "On the cruelty of really teaching computing science".

Conclusion

Where does this leave us? I haven't solved any problems. I didn't make it any easier to learn Rust. I didn't make it any easier to solve the LeetCode problem that catalyzed this rant. Maybe, if anything, I've introduced some important lessons to people that are trying to learn this language. The best outcome would be that I've inspired educators to reformulate their curriculum around fundamental language properties like AXM, and dial down the emphasis on ownership and lifetimes.

Lifetimes aren't that scary, anyway. The naive heap-allocated binary tree above doesn't have any lifetime annotations at all. They just aren't necessary in that case. The arena-based graph is full of them syntactically, but there's only one and it refers to the Arena.

At this time, I do not think we are teaching Rust effectively. But I also think there are ways that we can.


  1. &'long T <: &'short T (covariant). Another common example is that &mut T is invariant.
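
A two-line illustration (the shorter function is invented for this example): because &'long T coerces to &'short T, a long-lived borrow is accepted wherever a shorter one is required.

```rust
// Both parameters are forced to share one lifetime 'a. Covariance lets
// the longer borrow shrink to match the shorter one, so mixed-lifetime
// calls type-check.
fn shorter<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() <= y.len() { x } else { y }
}

fn main() {
    let long_lived = String::from("long-lived");
    {
        let short_lived = String::from("tmp");
        // &long_lived quietly becomes a shorter borrow here.
        assert_eq!(shorter(&long_lived, &short_lived), "tmp");
    }
}
```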

  2. AKA Shared Xor Mutable.

  3. So important is this rule that it's called out three times in rapid succession in the Nomicon, in the most memorable way possible:
    • Transmuting an & to &mut is always Undefined Behavior.
    • No you can't do it.
    • No you're not special.
  4. Anachronism: This code uses a lot of Quality of Life syntax and language semantics that did not exist when I was learning, circa Rust 1.7.

    The most notorious of these has to be Non-Lexical Lifetimes, stabilized in Rust 1.36. Before then, the iterative loop that takes an exclusive borrow in two potential branches could not easily be written. The code can be modified to compile with Rust 1.0 by desugaring and including the workaround for NLL Problem Case #4.