Wednesday, October 4, 2017

Tiny MCU 3D Renderer Part 7: Generics and Traits, oh my!

If you've been following my blog, you'll know that I've been writing a 3D renderer in Rust. This is my first real experience using the language. I dabbled a bit in Rust on a project in January 2016, where I was challenged by lifetime annotations, and gave up. This time around, I've managed to write a complete software renderer with all the bells and whistles of a modern shader model, without the need for a single lifetime annotation. And until just recently, without defining a single Trait or Generic.

Earlier articles in this blog series have focused exclusively on the renderer from an end user's point of view. In other words, trying to make it attractive to the everyday gamer. In this episode, I want to describe in detail one of the issues that I have been struggling with in the code design, and how I've been approaching a solution. This is by no means the "right way" to handle this, or even similar situations. I just want to provide some info for anyone who happens upon the right magical incantation in The Googlies, and ends up reading this.

(Edit 2017-10-04: The illustration below was originally flipped vertically. It now shows the correct scanning direction.)

Triangle rasterization scans pixels in the target buffer from left-to-right and bottom-to-top

So let's start from the middle, and work our way out. Above is a simplified representation of the rasterization step of the renderer. The details prior to, and those that follow rasterization can be ignored for the time being.

This image represents a zoomed-in detail of a frame buffer (or any render target) as a 2D triangle is being rasterized. This occurs after the perspective division, so the Z coordinate can simply be dropped for rasterization; the Z component is used later for depth testing.

The color coded shapes are as follows; the green lines represent a triangle defined by three vertices. These vertices are defined by the attributes from the attribute buffer that get passed to the vertex shader. The red rectangle represents a minimum bounding box that fully encloses the triangle. The black grid represents pixels in the frame buffer (enlarged). The gray pixels have already been rasterized and fall outside of the triangle; these pixels are ignored. The blue pixels have also been rasterized, but they fall within the triangle; these pixels are updated after the fragment shader decides what color each of these pixels should be. There is also an arrow pointing to the right, illustrating the current position and direction of the rasterizer.

Pixels in the bounding box are tested one at a time for hits within the triangle. The test uses barycentric coordinates, since this coordinate system is extremely useful for interpolating between the three vertices. This interpolation is done on all of the shader varyings; UV coordinates, colors, normals, etc.

Rasterization is part of the fixed functions in the rendering pipeline. When you use an API like OpenGL, DirectX, or Vulkan, you do not write any of this logic; the GPU takes care of it for you. The only thing you really have control over are the programmable parts of the pipeline. Those being the various shaders, and the connections between each step in the pipeline.

In my software renderer, this fixed function pipeline is also something that a user does not need to deal with. When I write my game, I'm not going to crack open this source code and hack away at it when I want to add a new fancy graphical effect. Just like OpenGL et al., it's only the shaders that I need to write (along with chaining useProgram and drawArrays/drawElements calls; defining "connections" between each step in the pipeline).

This is wear we meet the first challenge I had to overcome; Rust is a statically compiled language, so the compiler needs to know the data type for every object at compile-time. But, I'm writing a library that has a very incomplete view of the world at compile time! The library does not contain any shaders! The application using the library has the shaders, and the application passes those shaders to the library, so it can make pretty pixels.

Enter Generics

Generics are nothing new in computer science. C++ has class templates which are roughly identical in practice, and Swift has generics out of the box. The concept is simple enough; A function is defined with placeholders for concrete data types, and the compiler can generate code for as many variations of placeholder data types as you use. This is different from saying the compiler will generate code for every possible data type that could be used. In short, a function that is generic over two different data types will result in two different functions being compiled; both accessible by the same name, and the compiler is smart enough to choose the right one based on the data type used.

That's a simplified description of generics, anyway. In truth, a function can be generic over multiple data types at the same time. And this is a feature I had to take advantage of, as you will see.

One tricky part about using generics in Rust is that you don't have access to any properties or methods defined on your data type (which is often a struct) because the compiler does not know anything about the concrete data type at compile time. Rust introduces a Trait system to overcome this limitation, allowing the language to effectively support duck typing. As commonly pointed out, if it walks like a duck and it quacks like a duck, it must be a duck.

Enter Traits

This is a feature of Rust that took some getting used to. Traits define implementations (methods) on a data type. They can also include associated types, which made it possible to write the fixed function pipeline and shaders at all! An associated type is closely related to generics; these are data type placeholders on a trait. Furthermore, generics can be specified with bounding conditions. For example, the function is only valid for types that implement the Foo trait. And the associated type allows further bounding, like the function is only valid for types that implement the Foo trait and whose associated data type named Bar is your favorite concrete type (or generic).

Trait bounds and associated types were the most important cornerstones of my shader exercise. Without further ado, let's have a look at the simple Gouraud lighting shader that I've been using on the head model.

pub struct Attribute {
    position: Vector4<f32>,
    uv: Vector2<f32>,
    normal: Vector3<f32>,

pub struct Uniform {
    pub projection: Matrix4<f32>,
    pub modelview: Matrix4<f32>,
    pub normal_matrix: Matrix3<f32>,
    pub light_dir: Vector3<f32>,
    pub diffuse: Texture,

pub struct Vertex {}

pub struct Fragment {
    uv: Vector2<f32>,
    intensity: f32,

Wow! I just installed PrismJS on my blog. It's so much nicer that trying to highlight code by hand with the font styles... Anyway...

Those <f32>'s all over the code are a concrete type (32-bit floating point number) for generic structs. As an example, Vector4 is a 4-dimensional vector. This type is generic, and here I'm defining a Vector4 of f32's. The Vector and Matrix structs are from the cgmath crate, by the way. And interestingly, it's also the only dependency in the renderer!

The Texture type is a little different. That's my own struct. It has some useful properties, like width and height. It also contains the texel data, obviously. It also contains a Gradient, which is what you see at the bottom of this screenshot, just above the grayscale gradient:
This is the magic behind the shader. :) The Uniform diffuse contains not only the texture data, but also a gradient mapping for the final output color.

Last thing to mention before moving on to the next bits of code is that empty Vertex struct. This is what I've seen referred to as a "phantom" type by the community. It contains no data, and implements no methods, making it invisible at runtime. But it's not purely phantom in this case, since I do end up implementing a trait on it:

impl VertexShader for Vertex {
    type Attribute = Attribute;
    type Uniform = Uniform;
    type Fragment = Fragment;

    fn shader(
        attr: &Self::Attribute,
        uniform: &Self::Uniform,
    ) -> (Vector4<f32>, Self::Fragment) {
        // The transformed position
        let position = uniform.projection * uniform.modelview * attr.position;

        // Create varyings
        let light_dir = attr.normal
            .dot(uniform.normal_matrix * uniform.light_dir)
        let varyings = Fragment::new(attr.uv, light_dir);

        (position, varyings)

impl FragmentShader for Fragment {
    type Uniform = Uniform;

    fn shader(&self, position: Vector3<f32>, uniform: &Self::Uniform) -> (u8, bool) {
        let diffuse = &uniform.diffuse;

        // Linearly interpolate the texture
        let u = (self.uv.x * diffuse.width) as usize;
        let v = (self.uv.y * diffuse.height) as usize;
        let texel = diffuse.texels[u + v * diffuse.width as usize];

        // Apply lighting and convert to Gamma-Space
        let gray = linear_to_gamma(self.intensity * texel as f32) as u8;

        // Apply dither
        let x = position.x as usize;
        let y = position.y as usize;
        let color = dither_interleave(x, y, gray, &diffuse.gradient);

        (color, false)

And here are the shader implementations! 🎉 This is the meat and potatoes of the renderer, right here. You'll have to forgive the crazy whitespace on the vertex shader function signature. rustfmt made it readable, so you don't have to scroll 5 miles to the right.

So you see, my Vertex struct does actually take some storage space. I think it's just one pointer for the shader method, but it's definitely not a pure phantom type. There should be no surprises, here; the vertex shader takes an Attribute and a Uniform as inputs, and returns a position and a Fragment! Yeah, maybe that is a little surprising, come to think of it. The vertex shader returns a Fragment. That's because Fragments are magical; they implement the interpolation between other Fragments. We'll see that in a minute. The fragment shader, on the other hand, takes a position and a Uniform as inputs, and returns a color and "discard" boolean. If you've worked with OpenGL, this should all feel familiar so far.

You'll also notice some function calls in the fragment shader for image processing; the dithering is done here. I'll gloss over it for now, the code for these functions is readily available elsewhere. This is an interesting point though; the shaders can call any Rust code you want. No limitations with GLSL language features or anything. For performance reasons, it is important to keep the shaders fast. It's most useful for debugging... Which I'll cover later, I'm sure.

So, how does a Fragment implement the interpolation? It's clearly not implemented by the FragmentShader trait, seen above. Well, Rust has another little trick up its sleeve; types can implement any number of traits! As many as you want, go nuts! And the trait bounds can additively require that a type implements ALL of the defined traits. So I just have a second trait, simple!

impl Varying for Fragment {
    fn interpolate(&self, s: &Fragment, t: &Fragment, b: Vector3<f32>) -> Fragment {
        // How to interpolate a Vector2
        let u = Vector3::new(self.uv.x, s.uv.x, t.uv.x).dot(b);
        let v = Vector3::new(self.uv.y, s.uv.y, t.uv.y).dot(b);

        // How to interpolate an f32
        let intensity = Vector3::new(self.intensity, s.intensity, t.intensity).dot(b);

        // TODO: Reuse a Fragment object instead of heap-allocate. E.g. object pooling
        Fragment {
            uv: Vector2::new(u, v),

Yep, there's a comment at the bottom about optimizations. Don't want to waste time now on something that probably won't have a noticeable impact (premature optimization, root of all evil, I don't know). Otherwise, this implementation is completely uninteresting. The input args are all one letter because this implementation eventually needs to be automated away; the user should not have to write this code! It's part of the fixed function pipeline. Automation is also the reason the interpolate method is not part of the FragmentShader trait.

Anyway, the input args are short for (s)econd Fragment, (t)hird Fragment, and (b)arycentic coordinates, where self is the first Fragment.

On the topic of automation, there are two possible routes that I know of (for now) in Rust: First, a procedural macro should be able to knock this one out relatively easily. And secondly, a compiler plugin. The procedural macro can derive the Varying implementation on any type, much like std::default::Default. Both will require a nightly Rust compiler, and procedural macros seem like the right way to go.

The automation is super dumb and simple; just a matter of breaking up each Varying vector into its scalar components, after which a new vector is created from the scalar components of all three Fragments, then a dot-product is taken over the resulting vector. These dot-products are finally reassembled into the original vector formats for each Varying. It took me a minute or two to spot the pattern when I was originally writing the interpolation, before I even had a shader model. But it's straightforward, and can be automated away so the user never has to write this boilerplate, or even think about it!

And then?

Well, that's literally all there is to show you, in regard to the shaders! The fixed function pipeline is now filled with generics and trait bounds. It's a little verbose, but manageable. That said, I do have one more thing I can show you, and it should help make it clear how crazy the generics and trait bounds syntax starts to get.

// Shader traits that users must implement
pub trait VertexShader {
    type Attribute;
    type Uniform;
    type Fragment;

    /// Vertex Shader
    /// Inputs:
    /// * `attr`: Current Attribute variables
    /// * `uniform`: Current uniform variables
    /// Outputs:
    /// * `.0`: Transformed position
    /// * `.1`: Fragment (with varyings) for this vertex
    fn shader(
        attr: &Self::Attribute,
        uniform: &Self::Uniform,
    ) -> (Vector4<f32>, Self::Fragment);

pub trait FragmentShader {
    type Uniform;

    /// Fragment Shader
    /// Inputs:
    /// * `position`: Fragment position in Screen Space
    /// * `uniform`: Current uniform variables
    /// Outputs:
    /// * `.0`: Palette index
    /// * `.1`: Discard fragment
    fn shader(&self, position: Vector3<f32>, uniform: &Self::Uniform) -> (u8, bool);

pub trait Varying {
    /// Interpolate all varyings in the triangle
    /// Inputs:
    /// * `s`: Second varying from triangle
    /// * `t`: Third varying from triangle
    /// * `b`: Barycentric coordinates for rasterized triangle
    fn interpolate(&self, s: &Self, t: &Self, b: Vector3<f32>) -> Self;

// The shader program contains structs that implement the various traits
pub struct ShaderProgram<F, U, V>
    F: Varying + FragmentShader<Uniform=U>,
    V: VertexShader<Uniform=U, Fragment=F>,
    pub attributes: Vec<<V>::Attribute>,
    pub uniform: U,
    pub vertex: V,

impl<F, U, V> ShaderProgram<F, U, V>
    F: Varying + FragmentShader<Uniform=U>,
    V: VertexShader<Uniform=U, Fragment=F>,
    /// Create a new ShaderProgram
    pub fn new(uniform: U, vertex: V) -> Self {
        ShaderProgram {
            attributes: Vec::new(),

This is the ShaderProgram struct and its implementation, plus the various trait definitions that the shaders have to implement. This code is part of the fixed function pipeline, and of course it doesn't know anything about those concrete types. This is where things get tricky, so be prepared for madness.

The ShaderProgram owns the Attribute list, the Uniform struct, and the Vertex shader. It only knows that Attribute is the concrete type of whatever VertexShader's associated Attribute type is. You can see that with the <V>::Attribute syntax where V implements VertexShader<Uniform=U, Fragment=F>. (Wow, what a mouthful! I'm so sorry.)

That Foo<Bar=Baz> syntax defines the associated type bounds for trait Foo; that its Bar type must be a Baz. The U and F types are also generics. U is unbounded, so it can be any type (in practice it's the Uniform). The F generic must implement the Varying and FragmentShader<Uniform=U> traits. In other words, FragmentShader and VertexShader must have the same type for Uniform. And VertexShader must have a Fragment type that implements the very same FragmentShader trait.

WHEW! Let that sink in a bit. All the code is there, but it's full of circular references and black magic. Good luck. I wasn't exaggerating when I said that using multiple generics and associated types was a requirement for this project. I couldn't have done this without learning what tools Rust offers for metaprogramming. There might have been an easier way to write it, but this is what I settled on.

My main motivation was a clean separation of concerns between the library and its consumer. The biggest challenge was finding a way to pass opaque data between the various implementations written by the user. Iterating the attribute list and passing them to the vertex shader one-by-one, passing varyings from the vertex shader to the fragment shader... The fixed function pipeline does all of the data handling for these things, but the code knows nothing about what kinds of data these structs store, or the methods they implement. I'm glad traits were enough to get it to compile. (And if there's one thing I've learned about Rust; if it compiles, it will probably work!)

Wrapping Up

I've learned a lot about Rust in the short time I've had working with it. The language is always surprising me. I should be used to the "if it compiles, it will probably work" mantra by now, but it never gets old ripping out half of the code and rewriting it all with a ton of foreign concepts and syntax ... and after fixing the compiler errors ... the application works exactly like it did before you went and broke everything. That's exactly the opposite of my experience with every other language. Color me impressed.

Now that I have a fully generic shader model, I can theoretically add as many shaders as I like and swap between them however I see fit, right? Well, mostly. The one issue I expect to hit right away is that I can't just replace the reference to ShaderProgram when the user calls the useProgram() method, because the two ShaderProgram structs will be different types. What I think might work is something I have only read about, called a trait object. As far as I can tell, this just means that the object needs to be wrapped in a Box that is generic over your trait. So if that works, it means useProgram() should be painless to implement! We shall see...

Then comes the question of content. As I've mentioned before, the head model is just a test subject, it's not going to appear in the final game. It is still useful as a debugging tool, though. And to derail my thought on content, I did work on a bit of a debug GUI before I managed to complete the second shader model refactor. The gradients shown in the screenshot above are now hidden by default. And when they are turned on, a slider overlay is drawn over each of the gradient steps, allowing the position of each step to be moved at runtime.

The code to get the sliders working wasn't a lot of trouble, but it would be extraordinarily tedious to do the same thing for a second GUI widget that I need (a color picker). It would have to handle modal states so you couldn't open multiple color pickers at the same time, and I would have to be really careful not to break the slider implementation, since it's simple and fragile anyway. And then what if I need a third GUI widget? This train of thought went on until it induced panic, and I back peddled on even trying to add a GUI at all. (This is exactly why I believe Electron is the only GUI framework worth using. Basically the complexity grows exponentially, the more you want to do with a GUI, and HTML+CSS is the only framework that handles it well in a platform-agnostic way.)

But let's face it, I'm not going to go through the trouble of integrating Electron with Rust. Or even Servo for that matter (not that Servo is even close in capability to Electron). For GUI stuff in the context of video games, it's hard to beat dear imgui. There are Rust bindings for it, and the bindings include renderer backends for gfx and glium. Trouble is, I'm not using either of these crates! I'm using sdl2 because it only requires around 5 lines of boilerplate, that's like 1/10 of the boilerplate required for either gfx or glium.

Anyway, I went back and forth over which backend renderer I should use, or whether I should write a new renderer for sdl2. And if I was to roll my own a renderer, how would it integrate with the code I've already written? I was stuck on this decision for days, when I finally had enough and told myself I was going to stop working on the debug GUI, and come back to it later. It apparently worked, because I was able to focus solely on the refactor, and that was a complete success. Coming back to the GUI though, I think I'll switch over to gfx. It seems like it will have long term benefits over sdl2, and importantly I don't have to write a renderer for imgui. I will just have to change up my texture draw calls and pull in a bunch of boilerplate code.

Ok, debug GUI thought complete. Now where was I ... content! I still have that test scene in Blender and the start of the export script, ready to spit out some CBOR for the app to consume and send to the renderer as attribute lists. Assuming I can implement useProgram() and a Blender export script and a sunbeam shader, my little test scene should be good to go for the next update! (I know, I know, I've been making this promise for the last month! But I'm ready this time!)

That's what we have to look forward to. I can't wait to show you. It's going to be pretty great.

Onward to LoFi 3D gaming!

No comments: