Sunday, September 24, 2017

Tiny MCU 3D Renderer Part 6: Camera animation and display scaling

Yesterday I finally got around to adding some simple animations. The app has always rendered at 60 fps, but the image was static, which is why I've only been posting PNG screenshots of progress so far. But now I can do this:


This was my first ever foray into quaternions! And I must admit, learning about quaternions suuuuuuucks. Surprisingly, this is one area where I would actually recommend developers keep quaternions as a mysterious black box, though an essential part of their repertoire. Every academic text you will find about quaternions is deep in imaginary-number territory (which we all know is impossible to represent on a computer). The important point is that the imaginary numbers can be optimized out, so having them in the first place is completely stupid, but I digress. Ok, ok, imaginary numbers help make sense of the derivatives... So what? It still sucks. And it's still pointless in the context of 3D graphics.
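
For the curious, the part you actually need is small. Here's a minimal, illustrative sketch (not the renderer's actual code) of the two operations that matter for a camera: building a unit quaternion from an axis and an angle, and using it to rotate a vector.

// Illustrative sketch only -- not the renderer's actual code.
#[derive(Clone, Copy)]
struct Quat {
    w: f32,
    x: f32,
    y: f32,
    z: f32,
}

impl Quat {
    // Build a unit quaternion from a normalized axis and an angle in radians.
    fn from_axis_angle(axis: [f32; 3], angle: f32) -> Quat {
        let half = angle * 0.5;
        let s = half.sin();
        Quat {
            w: half.cos(),
            x: axis[0] * s,
            y: axis[1] * s,
            z: axis[2] * s,
        }
    }

    // Hamilton product: composes two rotations.
    fn mul(self, q: Quat) -> Quat {
        Quat {
            w: self.w * q.w - self.x * q.x - self.y * q.y - self.z * q.z,
            x: self.w * q.x + self.x * q.w + self.y * q.z - self.z * q.y,
            y: self.w * q.y - self.x * q.z + self.y * q.w + self.z * q.x,
            z: self.w * q.z + self.x * q.y - self.y * q.x + self.z * q.w,
        }
    }

    // Rotate a vector: v' = q * v * q^-1 (the conjugate, since q is unit length).
    fn rotate(self, v: [f32; 3]) -> [f32; 3] {
        let p = Quat { w: 0.0, x: v[0], y: v[1], z: v[2] };
        let conj = Quat { w: self.w, x: -self.x, y: -self.y, z: -self.z };
        let r = self.mul(p).mul(conj);
        [r.x, r.y, r.z]
    }
}

fn main() {
    // Orbit a camera position 90 degrees around the Y axis.
    let q = Quat::from_axis_angle([0.0, 1.0, 0.0], std::f32::consts::FRAC_PI_2);
    let eye = q.rotate([0.0, 0.0, 3.0]);
    println!("{:?}", eye); // roughly [3.0, 0.0, 0.0]
}

Feed a slowly increasing angle into something like from_axis_angle each frame and you have a camera orbit. No imaginary numbers required at the call site.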

Scaling

On the secondary topic of display scaling, the video above is scaled 2x in hardware (technically, it's scaled 4x, because macOS implicitly scales by 2x when High-DPI support is disabled). This is the only place in the project where drawing is explicitly done by the GPU. (Of course, the final push of pixels to the screen has always been done by the GPU.) The scaling uses two textures: the first is exactly the same size as the frame buffer, and is simply a 24-bit RGB representation of the 8-bit indexed frame buffer. This copy is done in software with the following code:

// Copy the frame buffer to the texture
texture
    .with_lock(None, |pixel_data, _| {
        for i in 0..pixel_data.len() {
            let idx = shader.pixels[i / 3] as usize;
            pixel_data[i] = PALETTE[idx * 3 + i % 3];
        }
    })
    .expect("Draw to texture failed.");

shader.pixels is the frame buffer, and PALETTE is a flat array of RGB data for each of the palette indices held in the frame buffer. texture is then scaled to a render target texture in hardware with nearest-neighbor using this code:

// Scale the texture with nearest-neighbor (sharp pixels) to the render target
canvas
    .with_texture_canvas(&mut texture_target, |texture_canvas| {
        texture_canvas.copy(&texture, None, r1).unwrap();
    })
    .expect("Draw to render target failed.");

r1 is the scaled destination rectangle, which I only need because the render target has some extra space on the bottom for the debug gradients; it specifies the rectangle that covers the full area with the exception of those gradients. This step only does integer scaling, which is why nearest-neighbor works so nicely here. I've tested the scaling up to 10x, which fills my 3,840 x 2,400 screen, and there's no measurable impact on CPU utilization (exactly as expected).
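
For completeness, r1 itself is nothing fancy: it's just the frame buffer size multiplied by the integer scale factor. Something like the sketch below, where the function name and dimensions are placeholders rather than the exact code:

use sdl2::rect::Rect;

// Placeholder sketch: the rectangle covers the scaled frame buffer and leaves
// the extra rows at the bottom of the render target (the debug gradients) alone.
fn scaled_dest_rect(fb_width: u32, fb_height: u32, scale: u32) -> Rect {
    Rect::new(0, 0, fb_width * scale, fb_height * scale)
}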

The final step in scaling is stretching the render target to the correct pixel aspect ratio, which is done in hardware with linear interpolation. It's just a one-liner:

// Stretch the render target to fit the canvas
canvas.copy(&texture_target, None, None).unwrap();

texture_target has a 1:1 pixel aspect ratio (required for nearest neighbor), and canvas has an 8:7 aspect ratio, as discussed in Part 5 of this article series. (Side note: the pixel AR is completely independent of the scene AR used in the projection matrix. The scene AR needs to be configured appropriately to cancel the effects of the pixel AR. The exact scene AR to use is a little fuzzy, but 13:10 is as accurate as I could get it.)
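
For what it's worth, my working assumption is that the scene AR should simply be the frame buffer's storage aspect multiplied by the pixel AR, something like the sketch below, even though the exact value I ended up using turned out to be a little fuzzy in practice.

// Sketch of the assumed relationship: the projection's aspect ratio is the
// frame buffer's storage aspect corrected by the 8:7 pixel aspect ratio.
fn scene_aspect(fb_width: f32, fb_height: f32) -> f32 {
    (fb_width / fb_height) * (8.0 / 7.0)
}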

Separating the scaling steps like this allows for very cleanly scaled pixels, while preserving the 8:7 pixel AR (which is a fixed value that cannot be changed). The video shown above is not representative of the clean scaling, since it is extremely compressed and introduces several artifacts. Below are some screenshot details with the display scaled to 3x and 10x, showing how pixel definition is cleanly preserved. The detail shots were subsequently zoomed by another 10x to illustrate the effects of linear interpolation on the pixel AR.

3x scaling, zoomed in 10x to show detail

10x scaling, zoomed in 10x to show detail

These detail shots demonstrate how the linear interpolation only affects the horizontal rows, with minimal antialiasing. The antialiasing really cannot be avoided, due to the non-integer scaling of the pixel AR. But it doesn't affect the entire image, so it's absolutely reasonable.

Shader Model Refactoring: Take 2

I hate refactoring almost as much as I hate testing. Sometimes they are both necessary evils. In this case, I have already refactored the shaders once. The end result was not enough, because I can't swap shaders at runtime with the current state of the code. For that I need generics and traits, which are admittedly nice features that Rust supports today. There are many gotchas with this approach, but from my experiments so far, it should be enough for what I need.

On the second refactoring front, I've so far defined three traits: one for VertexShader, one for FragmentShader, and one for Varying. The *Shader traits each define a method named shader which, as you guessed, is the shader code. The Varying trait defines a method named interpolate that, given a triangle containing varyings for each point and a barycentric coordinate, returns a new varying interpolated by that barycentric coordinate.

One implementation detail is that the Fragment struct implements both FragmentShader and Varying. The Fragment struct owns the varyings for the fragment (aka pixel). The Vertex struct implements VertexShader and returns Fragments. Finally, there's a ShaderProgram struct which owns the Vertex struct, a Uniform struct, and a vector of Attribute structs. Both shaders receive a reference to the Uniform struct as an argument, and VertexShader also receives a reference to each Attribute (one at a time). Attributes, Uniforms, and Varyings are conceptually identical to the identically named variable classes in OpenGL ES.
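
To make the shape of all this concrete, here's a condensed, illustrative sketch of how the pieces hang together. The names follow the description above, but the signatures are trimmed down and the shader bodies are stand-ins, not the renderer's real code.

// Condensed sketch of the new shader model; illustrative only.

// Per-draw constants shared by both shaders (matrices, colors, and so on).
struct Uniform {
    gradient: [u8; 4], // e.g. the 4 palette indices of the head gradient
}

// Per-vertex input data (position, normal, UV, ...).
struct Attribute {
    position: [f32; 3],
}

// Values emitted per vertex and interpolated across the triangle face.
trait Varying: Sized {
    fn interpolate(corners: [&Self; 3], barycentric: [f32; 3]) -> Self;
}

// Runs once per vertex and produces the per-fragment data.
trait VertexShader {
    type Frag: Varying + FragmentShader;
    fn shader(&self, uniform: &Uniform, attribute: &Attribute) -> Self::Frag;
}

// Runs once per covered pixel and returns a palette index for the frame buffer.
trait FragmentShader {
    fn shader(&self, uniform: &Uniform) -> u8;
}

// Owns the varyings for one fragment (aka pixel).
struct Fragment {
    intensity: f32, // stand-in varying: e.g. diffuse light intensity
}

impl Varying for Fragment {
    fn interpolate(corners: [&Self; 3], bc: [f32; 3]) -> Self {
        Fragment {
            intensity: corners[0].intensity * bc[0]
                + corners[1].intensity * bc[1]
                + corners[2].intensity * bc[2],
        }
    }
}

impl FragmentShader for Fragment {
    fn shader(&self, uniform: &Uniform) -> u8 {
        // Map light intensity onto one of the 4 gradient colors.
        let step = (self.intensity.max(0.0).min(1.0) * 3.0).round() as usize;
        uniform.gradient[step]
    }
}

// Runs per vertex and returns Fragments.
struct Vertex;

impl VertexShader for Vertex {
    type Frag = Fragment;
    fn shader(&self, _uniform: &Uniform, attribute: &Attribute) -> Fragment {
        // Stand-in: derive "intensity" from the vertex height.
        Fragment {
            intensity: attribute.position[1] * 0.5 + 0.5,
        }
    }
}

// Ties it all together: the vertex shader, its uniforms, and the attributes.
struct ShaderProgram {
    vertex: Vertex,
    uniform: Uniform,
    attributes: Vec<Attribute>,
}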

I moved the frame buffer out of the ShaderProgram, since it doesn't really belong there. A better place for it would be a singleton for managing internal state, which would also be responsible for the z-buffer, viewport matrix, clear colors, etc. Altogether this should let the renderer begin converging toward the OpenGL ES 2.0 API. I'm targeting the ES 2.0 API initially because I am familiar with it (from WebGL) and because it's a nice base API with a programmable rendering pipeline. I might also steal some features from ES 3.0, like texture compression and instanced rendering, but those are quite a ways down the road.
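
As a rough sketch of where that's headed (the names and fields are placeholders, not settled API), the state singleton might look something like this:

// Placeholder sketch of the internal-state singleton; nothing here is final.
struct Context {
    frame_buffer: Vec<u8>,   // palette indices, one per pixel
    z_buffer: Vec<f32>,      // depth value per pixel
    viewport: [[f32; 4]; 4], // viewport matrix
    clear_color: u8,         // palette index used when clearing
}

impl Context {
    // Reset color and depth, roughly what glClear does in GLES.
    fn clear(&mut self) {
        let color = self.clear_color;
        for p in self.frame_buffer.iter_mut() {
            *p = color;
        }
        for z in self.z_buffer.iter_mut() {
            *z = f32::MAX;
        }
    }
}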

This round of refactoring is incomplete; all of the images in this post were rendered with the shader model described in Part 5 of this series. The new shader model is slowly coming together, and should finally allow the app to render more than one material and lighting model. It's important to note that the 4-color gradient used on the head mesh is already easy to change dynamically at runtime; it just requires changing the Uniform gradient and calling draw_array() on the attribute indices. In other words, the refactor will not add a feature to render more than 4 colors; I already have that capability.

Wrapping Up

I'm really happy that I finally get to show it off in motion! I think it looks pretty neat, and the friends I've shown it to have all liked it, too. Being able to scale the window is also a nice touch, and it was important that it be fast and produce accurate results. So, mission accomplished?

The next step beyond the second shader refactoring will be getting some test content into the scene. I already have a simple scene with some textured planes ready to go; it's just awaiting the Blender export plugin and a use_shader() implementation. Both are works in progress and coming along nicely.

Onward to LoFi 3D gaming!
