Saturday, October 14, 2017

Tiny MCU 3D Renderer Part 9: Bug fixes, and first shader tests

It's shader time! I finished the simple sunbeam, which I think looks quite nice in motion.



It could use a little work on the gradient, and some extra geometry wouldn't hurt. There are four sunbeams in this test scene. Each is just a plane (two triangles). I'm thinking of splitting the plane into thirds (vertically) so I can shape it more into a semi-circle, instead of just lying flat in the background. That might give it some depth, and help combat the problem with the right-most sunbeam looking so thin.
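Bending the plane is mostly a matter of placing the vertical edges of each strip on an arc instead of a straight line. A minimal sketch, assuming a hypothetical `arc_edges` helper, the half-circle lying in the X-Z plane, and negative Z pointing away from the viewer. Each consecutive pair of edges, extruded to a top and bottom Y, would become one strip of two triangles.

```rust
use std::f32::consts::PI;

/// Hypothetical helper: X-Z positions of the vertical edges of a sunbeam plane
/// split into `strips` columns and bent into a half-circle of `radius`.
/// Negative z is assumed to point away from the viewer.
fn arc_edges(strips: usize, radius: f32) -> Vec<[f32; 2]> {
    (0..=strips)
        .map(|i| {
            let theta = PI * i as f32 / strips as f32; // sweeps 0..=PI
            [radius * theta.cos(), -radius * theta.sin()]
        })
        .collect()
}
```

Splitting into thirds means `arc_edges(3, r)`: four edges, the two inner ones pushed back into the scene.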

Monday, October 9, 2017

Tiny MCU 3D Renderer Part 8: Programmable Pipeline and Asset Pipeline

Oh hey, look at that! I changed the CSS on my blog a smidge. It's worth mentioning, anyway. Say hello to the Blipjoy Invader! You can just make it out on the left side, there. (Depending on your screen resolution, it might be behind the post content, whoops!)

I also finalized the programmable pipeline on the 3D renderer, thanks to @vitalyd over at the Rust users forum for the hint I needed to push me in the right direction. The API isn't exactly what I had in mind, but it's certainly reasonable.

This is what I built yesterday. The model on the left is drawn with our very familiar Gouraud shader with interleave dithering on a four-color gradient. This is what we've seen exclusively in screenshots to this point. On the right is something new! A much simplified shader that renders something like a cartoon, aka cel shading minus edge detection. Each model rotates in opposite directions for funsies.
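Both shaders boil down to mapping a light intensity to one of the four gradient colors. A minimal sketch of how a trait plus a generic function can express that, assuming a hypothetical `FragmentShader` trait (not the renderer's actual API) that returns a palette index 0..=3. The exact interleave dithering pattern isn't described here, so a 2x2 ordered threshold stands in for it.

```rust
/// Hypothetical trait: maps a light intensity in [0.0, 1.0] at pixel (x, y)
/// to an index into a four-color gradient (0 = darkest, 3 = brightest).
trait FragmentShader {
    fn shade(&self, intensity: f32, x: usize, y: usize) -> u8;
}

/// Smooth shading with a 2x2 ordered threshold (stand-in for interleave dithering).
struct DitheredShader;

impl FragmentShader for DitheredShader {
    fn shade(&self, intensity: f32, x: usize, y: usize) -> u8 {
        // One threshold in [0, 1) per position in a 2x2 tile.
        const THRESHOLDS: [[f32; 2]; 2] = [[0.0, 0.5], [0.75, 0.25]];
        let scaled = intensity.clamp(0.0, 1.0) * 3.0; // position between the four stops
        let base = scaled.floor();
        let frac = scaled - base;
        let bump = if frac > THRESHOLDS[y % 2][x % 2] { 1.0 } else { 0.0 };
        (base + bump).min(3.0) as u8
    }
}

/// Cel shading minus edge detection: hard bands, no dithering.
struct CelShader;

impl FragmentShader for CelShader {
    fn shade(&self, intensity: f32, _x: usize, _y: usize) -> u8 {
        (intensity.clamp(0.0, 1.0) * 4.0).floor().min(3.0) as u8
    }
}

/// Generic over the shader type, so each shader is monomorphized (static dispatch).
fn shade_row<S: FragmentShader>(shader: &S, intensities: &[f32], y: usize) -> Vec<u8> {
    intensities
        .iter()
        .enumerate()
        .map(|(x, &i)| shader.shade(i, x, y))
        .collect()
}
```

Because `shade_row` is generic, each shader gets its own compiled copy with static dispatch; taking a `&dyn FragmentShader` instead would trade that for a single copy and a virtual call per pixel.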

Code from the last article will be referenced below.

Wednesday, October 4, 2017

Tiny MCU 3D Renderer Part 7: Generics and Traits, oh my!

If you've been following my blog, you'll know that I've been writing a 3D renderer in Rust. This is my first real experience using the language. I dabbled a bit in Rust on a project in January 2016, where I was challenged by lifetime annotations, and gave up. This time around, I've managed to write a complete software renderer with all the bells and whistles of a modern shader model, without the need for a single lifetime annotation. And until just recently, without defining a single Trait or Generic.

Earlier articles in this blog series have focused exclusively on the renderer from an end user's point of view. In other words, trying to make it attractive to the everyday gamer. In this episode, I want to describe in detail one of the issues that I have been struggling with in the code design, and how I've been approaching a solution. This is by no means the "right way" to handle this, or even similar situations. I just want to provide some info for anyone who happens upon the right magical incantation in The Googlies, and ends up reading this.

(Edit 2017-10-04: The illustration below was originally flipped vertically. It now shows the correct scanning direction.)

Triangle rasterization scans pixels in the target buffer from left-to-right and bottom-to-top

So let's start from the middle, and work our way out. Above is a simplified representation of the rasterization step of the renderer. The details before and after rasterization can be ignored for the time being.

This image represents a zoomed-in detail of a frame buffer (or any render target) as a 2D triangle is being rasterized. This occurs after the perspective division, so the Z coordinate can simply be dropped for rasterization; the Z component is used later for depth testing.
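The scan described above can be sketched with edge functions, which also yield the barycentric weights used later for interpolation. This is a minimal sketch, not the renderer's code: it samples at pixel centers, treats pixels exactly on an edge as inside (a real renderer would use a tie-breaking fill rule so shared edges aren't drawn twice), and assumes a Y-up target so that increasing `y` scans bottom-to-top.

```rust
type Vec2 = [f32; 2];

/// Twice the signed area of triangle (a, b, p); its sign says which side
/// of the edge a->b the point p lies on.
fn edge(a: Vec2, b: Vec2, p: Vec2) -> f32 {
    (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
}

/// Visit every pixel whose center lies inside the triangle, row by row from
/// low y to high y, each row from low x to high x. The callback receives the
/// pixel and its barycentric weights (one per vertex, summing to 1).
fn rasterize<F: FnMut(usize, usize, [f32; 3])>(
    tri: [Vec2; 3],
    width: usize,
    height: usize,
    mut emit: F,
) {
    let [v0, v1, v2] = tri;
    let area = edge(v0, v1, v2);
    if area == 0.0 {
        return; // degenerate triangle covers no pixels
    }
    // Bounding box of the triangle, clamped to the render target.
    let min_x = v0[0].min(v1[0]).min(v2[0]).floor().max(0.0) as usize;
    let min_y = v0[1].min(v1[1]).min(v2[1]).floor().max(0.0) as usize;
    let max_x = (v0[0].max(v1[0]).max(v2[0]).ceil().max(0.0) as usize).min(width);
    let max_y = (v0[1].max(v1[1]).max(v2[1]).ceil().max(0.0) as usize).min(height);
    for y in min_y..max_y {
        for x in min_x..max_x {
            let p = [x as f32 + 0.5, y as f32 + 0.5]; // sample at the pixel center
            // Dividing by the signed area makes this work for either winding order.
            let w = [edge(v1, v2, p) / area, edge(v2, v0, p) / area, edge(v0, v1, p) / area];
            if w.iter().all(|&wi| wi >= 0.0) {
                emit(x, y, w);
            }
        }
    }
}
```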

Sunday, September 24, 2017

Tiny MCU 3D Renderer Part 6: Camera animation and display scaling

Yesterday I finally got around to adding some simple animations. The app was always rendering at 60 fps, but the image was static because there was no animation. That's why I've only been posting PNG images of progress so far. But now I can do this:


This was my first ever foray into quaternions! And I must admit, learning about quaternions suuuuuuucks. Surprisingly, this is one area where I would actually recommend developers keep quaternions as a mysterious black box, though an essential part of their repertoire. Every academic writing you will find about quaternions is deep in imaginary number territory (which we all know is impossible to represent on a computer). The important point is that the imaginary numbers can be optimized out, so having them in the first place is completely stupid, but I digress. Ok, ok, imaginary numbers help make sense of the derivatives... So what? It still sucks. And it's still pointless in the context of 3D graphics.
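For what it's worth, the imaginary units really do drop out in practice: a unit quaternion is stored as four plain floats, and rotating a vector expands into a couple of cross products. A hand-rolled sketch for illustration; the `Quat` type and its methods are hypothetical, not cgmath's API.

```rust
/// Unit quaternion as four plain floats: scalar part `w`, vector part (x, y, z).
#[derive(Clone, Copy, Debug)]
struct Quat {
    w: f32,
    x: f32,
    y: f32,
    z: f32,
}

fn cross(a: [f32; 3], b: [f32; 3]) -> [f32; 3] {
    [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]
}

impl Quat {
    /// Rotation of `angle` radians about a unit-length `axis`.
    fn from_axis_angle(axis: [f32; 3], angle: f32) -> Quat {
        let (s, c) = (angle * 0.5).sin_cos();
        Quat { w: c, x: axis[0] * s, y: axis[1] * s, z: axis[2] * s }
    }

    /// Rotate v. This is q * v * q^-1 expanded into
    /// v + 2w(u x v) + 2u x (u x v), where u is the vector part.
    fn rotate(self, v: [f32; 3]) -> [f32; 3] {
        let u = [self.x, self.y, self.z];
        let c = cross(u, v);
        let t = [2.0 * c[0], 2.0 * c[1], 2.0 * c[2]]; // t = 2(u x v)
        let ut = cross(u, t); // u x t = 2u x (u x v)
        [
            v[0] + self.w * t[0] + ut[0],
            v[1] + self.w * t[1] + ut[1],
            v[2] + self.w * t[2] + ut[2],
        ]
    }
}
```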

Sunday, September 17, 2017

Tiny MCU 3D Renderer Part 5: Aspect Ratio and Field of View

I had a long week on vacation, and was able to do a little bit of coding almost every night. There was a lot of time spent doing touristy things, so my coding opportunities were limited. I had a good solid 4 hours of nothing but coding time on the plane, though! Both ways.

On my departure flight, I managed to finally fix the aspect ratio (as far as I can tell). This was just a matter of adjusting the projection matrix to use the correct aspect ratio for non-square pixels. On my return flight, I finished almost all of the refactoring for the new Shader API, and finally completed it from the comfort of my own couch.
It shouldn't look too much different from the previous screenshot. There are a few obvious differences if you look closer, though.
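The aspect-ratio fix amounts to using the physical shape of the display, not just its pixel counts. A sketch of a standard OpenGL-style perspective matrix, assuming column-major storage and a hypothetical `pixel_aspect` parameter (width of one physical pixel divided by its height, 1.0 for square pixels):

```rust
/// Column-major 4x4 perspective matrix (OpenGL-style clip space, z in [-1, 1]).
/// Hypothetical helper, not the renderer's code.
fn perspective(
    fov_y: f32,
    width_px: u32,
    height_px: u32,
    pixel_aspect: f32,
    near: f32,
    far: f32,
) -> [[f32; 4]; 4] {
    // Aspect ratio of the image as it physically appears on the display.
    let aspect = (width_px as f32 * pixel_aspect) / height_px as f32;
    let f = 1.0 / (fov_y * 0.5).tan();
    [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), -1.0],
        [0.0, 0.0, 2.0 * far * near / (near - far), 0.0],
    ]
}
```

With square pixels, a 128x64 target gets an aspect of 2; if each pixel is half as wide as it is tall, the same target is physically square, and the horizontal scale matches the vertical one.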

Monday, September 4, 2017

Quick update, progress report, current plans

This weekend I was distracted by a well-intentioned good friend of mine who suggested solving a chess puzzle described as "deceptively simple". Unfortunately, the article is criminally misleading. It claims that computers cannot "solve the conundrum quickly and efficiently", when in fact it is trivial to solve the puzzle in linear time with constant space complexity.
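For reference, the linear-time claim rests on a well-known explicit construction for the n-queens problem (placing n mutually non-attacking queens on an n×n board, which is what the puzzle appears to be). Below is a sketch of that textbook construction, not necessarily the approach used here. It returns a `Vec` for convenience, although each row could be computed on the fly without storing the list.

```rust
/// Explicit n-queens construction: element c is the (0-based) row of the
/// queen in column c. O(n) time, no search or backtracking.
/// Returns None for n = 2 and n = 3, which have no solution.
fn n_queens(n: usize) -> Option<Vec<usize>> {
    if n == 2 || n == 3 {
        return None;
    }
    // The construction is usually stated with 1-based rows.
    let mut evens: Vec<usize> = (2..=n).step_by(2).collect();
    let mut odds: Vec<usize> = (1..=n).step_by(2).collect();
    match n % 6 {
        2 => {
            // Swap 1 and 3, then move 5 to the end of the odd list.
            odds.swap(0, 1);
            if let Some(pos) = odds.iter().position(|&r| r == 5) {
                let five = odds.remove(pos);
                odds.push(five);
            }
        }
        3 => {
            // Move 2 to the end of the evens, and 1, 3 to the end of the odds.
            evens.rotate_left(1);
            odds.rotate_left(2);
        }
        _ => {}
    }
    evens.extend(odds); // evens first, then odds
    Some(evens.into_iter().map(|r| r - 1).collect())
}
```

Note that this only solves the empty-board puzzle; it doesn't help with the harder completion problem described next.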

Solving the puzzle is not the challenge alluded to. The real challenge is this: given any starting position with queens already placed, find a valid solution by adding more queens. A notable related problem is enumerating and counting all valid solutions. To date, it has been shown by brute force that a 27x27 chessboard has over 234 quadrillion solutions; removing all symmetrical solutions leaves about 29 quadrillion. The brute force work took about 7 years with a massively parallel array of custom hardware (implemented on FPGAs).

To provide some context for the size of the numbers involved, it would take a modern CPU (single core at 4.5 GHz, one increment per clock cycle) about 75 days just to increment a counter from 0 to 29 quadrillion (29 × 10^15 increments at 4.5 × 10^9 per second is about 6.4 million seconds). That is the time estimate for the counter alone; nothing more. It's also hopelessly optimistic, since it assumes you already have a list of all 29 quadrillion solutions, or that there are no false positives and no wasted effort at all.

Sunday, August 20, 2017

Tiny MCU 3D Renderer Part 4: Gouraud Shading

Today, it's interpolating normals to render smooth lighting. That's right; Gouraud Shading in full effect. Two screenshots to start with: the first has the diffuse texture disabled, to show the full effect of the shading, and the second is fully textured.
It's surprising that the texture is so dark. But it is what it is. I think this test model has just about reached the end of its usefulness for the project. There's just one duty left for it to serve. Remember those gradients at the bottom of the image that were added in Part 2? It's time to put our friend here through some post-processing!
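Interpolating normals across a triangle can be sketched as below: blend the three vertex normals with the barycentric weights, renormalize, and take a Lambertian N·L term. Strictly speaking, textbook Gouraud shading computes lighting per vertex and interpolates the resulting intensities, while interpolating normals and lighting per pixel is usually called Phong shading. The sketch follows the description above; its names are hypothetical.

```rust
type Vec3 = [f32; 3];

fn dot(a: Vec3, b: Vec3) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

fn normalize(v: Vec3) -> Vec3 {
    let len = dot(v, v).sqrt();
    if len == 0.0 { v } else { [v[0] / len, v[1] / len, v[2] / len] }
}

/// Blend vertex normals with barycentric weights, renormalize (blending
/// shortens the vector), and return the Lambertian N.L intensity in [0, 1].
/// `to_light` is a unit vector pointing from the surface toward the light.
fn smooth_intensity(normals: [Vec3; 3], weights: [f32; 3], to_light: Vec3) -> f32 {
    let mut n = [0.0f32; 3];
    for (vn, w) in normals.iter().zip(weights.iter()) {
        for axis in 0..3 {
            n[axis] += vn[axis] * w;
        }
    }
    dot(normalize(n), to_light).max(0.0) // surfaces facing away get no light
}
```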

Saturday, August 12, 2017

Tiny MCU 3D Renderer Part 3: Textures and Perspective

I was surprised by how easy it was to interpolate over the texture coordinates, given the barycentric coordinate space. I have more boilerplate code to convert the mesh vertices into cgmath vectors than there is code to interpolate the triangles! I'll refactor it all away after the renderer's features begin to stabilize. With a little gamma and luminance love, I now have nearest-neighbor texture mapping:
The gamma correction was crucial, since this texture went through two separate processing passes: first, I dropped all chrominance information from the RGB, leaving just the relative brightness; and second, the global illumination was applied to the texture-mapped geometry as you might imagine. To get it working right, the RGB components are transformed from Gamma Space to Linear Space prior to the luminance transformation. Then the texture stays in Linear Space until after the final illumination pass. The pixels are transformed back to Gamma Space as they are written to the frame buffer.
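The whole path, sketched end to end for a grayscale texture. This is a sketch under assumptions: a simple power-law gamma of 2.2 (the dithering post used 1.8), Rec. 709 luminance weights, and hypothetical helper names. It also uses plain affine interpolation of the UVs; perspective-correct mapping would instead interpolate u/w, v/w, and 1/w and divide per pixel.

```rust
const GAMMA: f32 = 2.2; // assumed value

fn to_linear(c: f32) -> f32 {
    c.powf(GAMMA)
}

fn to_gamma(c: f32) -> f32 {
    c.powf(1.0 / GAMMA)
}

/// Preprocessing: gamma-space RGB texel -> linear relative luminance (Rec. 709 weights).
fn texel_luminance(rgb: [f32; 3]) -> f32 {
    0.2126 * to_linear(rgb[0]) + 0.7152 * to_linear(rgb[1]) + 0.0722 * to_linear(rgb[2])
}

/// Affine interpolation of per-vertex UVs with barycentric weights.
fn interpolate_uv(uvs: [[f32; 2]; 3], w: [f32; 3]) -> [f32; 2] {
    [
        uvs[0][0] * w[0] + uvs[1][0] * w[1] + uvs[2][0] * w[2],
        uvs[0][1] * w[0] + uvs[1][1] * w[1] + uvs[2][1] * w[2],
    ]
}

/// Nearest-neighbor lookup: the single texel containing (u, v), wrapped into [0, 1).
fn sample_nearest(tex: &[f32], tex_w: usize, tex_h: usize, uv: [f32; 2]) -> f32 {
    let u = uv[0] - uv[0].floor();
    let v = uv[1] - uv[1].floor();
    let x = ((u * tex_w as f32) as usize).min(tex_w - 1);
    let y = ((v * tex_h as f32) as usize).min(tex_h - 1);
    tex[y * tex_w + x]
}

/// Per pixel: sample (still linear), light in linear space, and convert
/// back to gamma space only when writing to the frame buffer.
fn shade_pixel(tex: &[f32], tex_w: usize, tex_h: usize, uvs: [[f32; 2]; 3], w: [f32; 3], light: f32) -> f32 {
    let lum = sample_nearest(tex, tex_w, tex_h, interpolate_uv(uvs, w));
    to_gamma((lum * light).clamp(0.0, 1.0))
}
```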

Friday, August 11, 2017

Tiny MCU 3D Renderer Part 2: Dithering

Dithering is an important post-processing technique for color quantization, and is especially useful for smoothing gradients with a low precision color space. I got ahead of myself a little bit on the 3D renderer development, and decided to research and experiment with various dithering algorithms. The most popular algorithm is arguably Floyd-Steinberg, which is based upon error diffusion. I used this algorithm back in 2009 for an image processing side-project.
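For reference, a minimal Floyd-Steinberg sketch for a grayscale image in [0, 1], quantized to `levels` evenly spaced shades (the function name and signature are hypothetical). Each pixel's quantization error is pushed onto its not-yet-visited neighbors; that chain of dependencies is why a small change anywhere can reshuffle the pattern downstream from frame to frame.

```rust
/// Floyd-Steinberg error diffusion to `levels` evenly spaced gray levels.
/// Input and output are row-major intensities in [0, 1], `w` pixels wide.
fn floyd_steinberg(pixels: &[f32], w: usize, levels: u32) -> Vec<f32> {
    assert!(levels >= 2);
    let h = pixels.len() / w;
    let steps = (levels - 1) as f32;
    let mut buf = pixels.to_vec();
    for y in 0..h {
        for x in 0..w {
            let i = y * w + x;
            let old = buf[i];
            let new = (old.clamp(0.0, 1.0) * steps).round() / steps; // nearest level
            buf[i] = new;
            let err = old - new;
            // Distribute the error with the classic 7/16, 3/16, 5/16, 1/16 weights.
            if x + 1 < w {
                buf[i + 1] += err * 7.0 / 16.0;
            }
            if y + 1 < h {
                if x > 0 {
                    buf[i + w - 1] += err * 3.0 / 16.0;
                }
                buf[i + w] += err * 5.0 / 16.0;
                if x + 1 < w {
                    buf[i + w + 1] += err * 1.0 / 16.0;
                }
            }
        }
    }
    buf
}
```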

It's safe to say I've learned a bit more about dithering in the last 8 years. Most obviously that Floyd-Steinberg is not ideal for animations because error diffusion will cause an avalanche of artifacts over the temporal domain. A noisy animation could be nice - even artistic - if the noise was evenly distributed. Avoiding the grainy look may be a better option, however. To that end, Bayer's ordered dithering algorithm is commonly used. Unfortunately, the apparent pattern may be too distracting. Various deterministic noise functions are also useful (e.g. pink noise or blue noise ... definitely not white noise).
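Ordered dithering sidesteps the temporal problem because each output pixel depends only on its own value and its position in a fixed threshold matrix. A minimal sketch using the standard 4x4 Bayer matrix, dithering to black and white (the function name is hypothetical):

```rust
/// Standard 4x4 Bayer index matrix; (index + 0.5) / 16 gives thresholds in (0, 1).
const BAYER4: [[u8; 4]; 4] = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
];

/// Ordered dithering to black/white: the result depends only on this pixel's
/// value and position, so nothing propagates between pixels or frames.
fn bayer_dither(value: f32, x: usize, y: usize) -> bool {
    let threshold = (BAYER4[y % 4][x % 4] as f32 + 0.5) / 16.0;
    value > threshold
}
```

The same regular tiling that makes it stable is what produces the visible cross-hatch pattern mentioned above.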

My first dithering attempt was simple: I would draw a smooth gradient from black to white, using only 2 stops: black at 0.0 and white at 1.0. It was immediately clear that I needed some gamma correction, because my gradient was far too bright overall (when compared to a linear gradient without dithering). Everything I know about gamma, I learned from this article; highly recommended read. This was the first decent dithered gradient I created, using 2 stops:
You'll have to stand pretty far away from the image, and maybe squint a bit to see how the gradient tones line up (note that gamma correction was performed with γ = 1.8, which looks correct on macOS and Windows 10). Not bad for two shades! Close up though, the pattern is a little too strong. It will look better when applied to an image with lower frequency components; the linear gradient is the same pattern of pixels repeated vertically. If applied to the head model, the dithering would probably look rather nice (TBD). But we can always do better!

Tuesday, August 8, 2017

Tiny MCU 3D Renderer Part 1

It's hard to believe that it has been two years since my last blog update. A lot has happened since then, but nothing to write about. I have done surprisingly little in the way of game development or hobby programming since js13k-2015. I experimented with Rust a bit, kept up on some minor maintenance work for my nodeJS Capstone bindings, and I've played a whole lot of Rocket League.

But today I want to share some progress on something that I have been working on periodically for a very long time, because I've been getting more serious about it recently. In the tradition of keeping up my personal motivation, it's time to start sharing what I've been doing. It's not much to look at, but here it is:

3D Renderer in Rust (100% Software)

This is rendered entirely in software using Rust. And, well, that's about all there is to it! Flat-shaded triangles rendered with orthographic projection. I have other screenshots from earlier stages of development, including a wireframe raster, and polygonal (as above) without depth correction. In this screenshot, I had just added a depth buffer which completes all of the geometry rendering work. Next steps are adding diffuse texture mapping and perspective projection. I'll get to that later.
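A depth buffer boils down to one float per pixel and a compare-and-store per fragment. A minimal sketch, assuming smaller z means nearer to the camera (the opposite convention just flips the comparison), with hypothetical type and method names:

```rust
/// Minimal depth buffer: one depth value per pixel, row-major.
struct DepthBuffer {
    width: usize,
    depth: Vec<f32>,
}

impl DepthBuffer {
    fn new(width: usize, height: usize) -> Self {
        // Start "infinitely far" so the first fragment at each pixel always passes.
        DepthBuffer { width, depth: vec![f32::INFINITY; width * height] }
    }

    /// If the fragment is nearer than what's stored, record its depth and
    /// return true (draw it); otherwise return false (it is hidden).
    fn test_and_set(&mut self, x: usize, y: usize, z: f32) -> bool {
        let slot = &mut self.depth[y * self.width + x];
        if z < *slot {
            *slot = z;
            true
        } else {
            false
        }
    }
}
```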

Sunday, September 13, 2015

It's a Lovely Day for a Postmortem

After Liberated Pixel Cup, I promised myself I would never again enter a month-long game jam competition. It was just too exhausting. And then I discovered js13k. I guess I can't refuse a challenge. I also noticed the numerology around the number 13. I dig it! That's my number. To make matters even better, the competition deadline fell on my birthday (today).



What Is It?

The theme word announced for js13k this year was "reversed". That provoked many immediate ideas, the most enticing of which was a driving game where ... you guessed it! You drive in reverse. I envisioned a game like Outrun, Rad Racer, or Road Rash. A road vanishing in the distance, and all you can do is drive away from it. Absurd.

Soon enough, the Scope Creep Monster came knocking. I suddenly had a fully 3D rendered world, with trees, and rocks, and billboards, and mountains. Other cars on the road to avoid, and cops that chase you with sirens blaring (because hey, you're causing moral outrage through reckless driving).

The cutback finally came just a few days into the project. I decided to take inspiration from Desert Bus, known for 8 hours of driving on a perfectly straight road in an empty desert with literally nothing exciting happening ever. That's the game I wanted! The worst game ever made could only be made worse by driving backwards. This is its story.

First Steps

My first task was to find a way to get a full 3D game crammed into 13KB. This is no easy challenge, mind you. The most popular 3D framework is probably Three.js, and it clocks in at over 400KB minified! Another popular framework, Babylon.js, is over 800KB... You know what that means. Roll-your-own 3D engine? Pretty much! I had to use straight-up WebGL without all the fancy doodads provided by a framework. (Note: There are probably minimal 3D frameworks out there, I just don't know of any.)

With my framework chosen (lol), I set out to research some of the tricks used by js1k alumni. This is the same idea, except they get a meager 1,024 bytes to work with. Obviously they have some pretty good tech to do much in a little over a thousand bytes! (The first three paragraphs in this blog post are about 1KB.) I was able to find some good tips on The Googlie. Many of the tips were obvious, but then there were a lot of truly innovative ones, too! I definitely recommend checking out some of the previous js1k entries, even if just for inspiration!

My first commit had a blue triangle rotating and scaling on a white background. Super exciting! Important note: this is my first attempt at making a 3D game. I know the theory, let's put it to the test.

Art, Music, and Maps

When I said many of the tips were "obvious", I was really referring to procedural content generation. You're not going to squeeze a large map into such a tiny space, but you can randomly generate a map that goes on for infinity with just a small piece of code! And you can go deeper; generate all of your art, music, and sound effects procedurally, too!


So, a few lines of code can generate textures and music, huh? An excellent paper from 2006 explains in detail: Procedural Content Generation by Bjarki Guðlaugsson. This is the original source of my own personal connection with PCG; a paper which I read back then eagerly seeking information after the release of the 96KB FPS, .kkrieger.

The true first step, then, is to choose a PCG algorithm. For simplicity, I decided I would only use one, with a generator pattern if I needed it to create different kinds of content. There are numerous algorithms to pick from. The first that came to mind was Perlin Noise, with which I am familiar. The thing is, everyone's using Perlin Noise, since Minecraft exploded its popularity. (Nothing wrong with that, and the noise function is fully capable of doing a lot more than create Minecraft clones.)

I wanted to try something new. What I found was the diamond-square algorithm. What convinced me about this was the happy "starry sky and mountain" image about 1/3 down the page. Followed up by a cloud. Perfect, it's exactly what I needed!
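For reference, a compact diamond-square sketch. The 128x128 and 512x512 sizes mentioned below are powers of two, so this version wraps at the edges (the classic formulation uses a (2^n + 1)-sized grid without wrapping); which variant the game used isn't stated. The xorshift generator, the `roughness` parameter, and all names are assumptions made to keep the sketch self-contained.

```rust
/// Tiny deterministic PRNG (xorshift64), so the sketch needs no crates.
struct XorShift(u64);

impl XorShift {
    /// Pseudo-random f32 in [-1, 1).
    fn next_signed(&mut self) -> f32 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 40) as f32 / (1u64 << 24) as f32 * 2.0 - 1.0
    }
}

/// Diamond-square on a 2^n x 2^n grid that wraps at the edges, so the result
/// tiles seamlessly. The random offset shrinks by `roughness` at each level.
fn diamond_square(n: u32, seed: u64, roughness: f32) -> Vec<f32> {
    let size = 1usize << n;
    let mut rng = XorShift(seed.max(1)); // xorshift must not start at 0
    let mut g = vec![0.0f32; size * size];
    let idx = |x: usize, y: usize| (y % size) * size + (x % size); // wrapped index
    let mut step = size;
    let mut scale = 1.0f32;
    g[0] = rng.next_signed();
    while step > 1 {
        let half = step / 2;
        // Square step: each square's center = average of its 4 corners + noise.
        for y in (0..size).step_by(step) {
            for x in (0..size).step_by(step) {
                let avg = (g[idx(x, y)] + g[idx(x + step, y)]
                    + g[idx(x, y + step)] + g[idx(x + step, y + step)]) / 4.0;
                g[idx(x + half, y + half)] = avg + rng.next_signed() * scale;
            }
        }
        // Diamond step: each edge midpoint = average of its 4 neighbors + noise.
        for y in (0..size).step_by(half) {
            let x0 = if (y / half) % 2 == 0 { half } else { 0 };
            for x in (x0..size).step_by(step) {
                let avg = (g[idx(x + size - half, y)] + g[idx(x + half, y)]
                    + g[idx(x, y + size - half)] + g[idx(x, y + half)]) / 4.0;
                g[idx(x, y)] = avg + rng.next_signed() * scale;
            }
        }
        step = half;
        scale *= roughness;
    }
    g
}
```

A single row of the result is a mountain silhouette; stacking rows gives the layered look described below.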

After a few false starts, I had the diamond-square working in JavaScript. Right away, I had a solid-colored mountain rendered from a single row of my 2D height map. Then I added all of the rows as individual layers, each with a slightly darker color. For added "coolness", I used the same height map to render clouds.


From there, a few adjustments were made to the layer positioning and interpolation within the cloud. Crucially, the 2D height map also grew from 128x128 to 512x512 (which was deemed capable of reasonable quality, but incapable of reasonable speed). This is what I ended up with, still just rendering directly to a single texture (i.e. not usable for animations).


Important note: I've never used procedural content generation in a game. I know the theory, let's put it to the test.

Sundown

Porting that texture renderer to WebGL was a pain. Ideally, I would have generated 3D geometry and used the fragment shader to add haze at runtime. But I wanted an old-school look. Remember, Rad Racer, not Need For Speed! So I settled on 8 layers, each drawn with 16 rows of the height map.

It also didn't help that I had a bug in my vertex shader. It took two days to determine the cause of the unusual rendering of my 3D geometry. The setup was simple: place a textured quad in 3D space at "some appropriate scale", then place another quad in front of it, scaled less, and so on, until all layers were in place. The issue I had was finding out what "the proper scale" was for each layer. From testing, it appeared to be on an exponential scale; for each unit of distance back, the layer size had to double to get decent results. It was truly mind boggling.
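For reference, with a correct perspective projection the required scale grows linearly with distance, not exponentially: a layer twice as far away must be twice as wide to cover the same part of the screen. A tiny sketch, assuming each layer should exactly span a horizontal field of view `fov_x` (a hypothetical parameter):

```rust
/// World-space width a quad needs at `distance` in front of the camera to
/// span exactly the horizontal field of view `fov_x` (radians).
fn width_to_fill_view(distance: f32, fov_x: f32) -> f32 {
    2.0 * distance * (fov_x * 0.5).tan()
}
```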

After fighting with the geometry and triple-checking my perspective matrix, I broke down and compared my code against a perspective demo provided by Gregg Tavares. Perspective matrix computation was spot-on. Then it hit me: the vertex shader! Comparing the two, I noticed the only real difference was the order of multiplication: I was multiplying the position vector by the matrix, instead of the matrix by the position vector. You're doing it wrong.
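In GLSL, `matrix * position` treats the vector as a column, while `position * matrix` treats it as a row, which is the same as multiplying by the transposed matrix. The two hypothetical helpers below mimic both orders on a column-major matrix (the layout WebGL's `uniformMatrix4fv` expects). This assumes the bug was the operand order, as the text suggests. With a translation matrix, the wrong order leaks the translation into w instead of x, y, z, which distorts everything after the perspective divide.

```rust
/// Column-major 4x4 matrix: m[c][r] is column c, row r.
type Mat4 = [[f32; 4]; 4];

/// GLSL `m * v`: v is a column vector (the conventional order).
fn mat_times_vec(m: &Mat4, v: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0; 4];
    for r in 0..4 {
        for c in 0..4 {
            out[r] += m[c][r] * v[c];
        }
    }
    out
}

/// GLSL `v * m`: v is a row vector, equivalent to transpose(m) * v.
fn vec_times_mat(v: [f32; 4], m: &Mat4) -> [f32; 4] {
    let mut out = [0.0; 4];
    for c in 0..4 {
        for r in 0..4 {
            out[c] += v[r] * m[c][r];
        }
    }
    out
}
```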

With the geometry fixed, I went on to add colors to my vertex attributes (for linear gradients). The gradients evolved into a sunset, which demanded a new color scheme.


This was my first attempt at picking the right color scheme. It's pretty close to the final choice. After that, I added four layers of clouds, each with its own random color level in a distinct spectrum: orange, blue, purple, and peach. This is what gives it that "Arizona sunset" look. Seeing it in motion is strikingly realistic.


w00t! Background is done. All that's left is something to drive on...

The Road Ahead

As a dumb test, I created three triangles and laid them out in my 3D coordinate space. Looks like a perspective road to me!


The trouble here is that the vanishing point goes into infinity. Simulating this with a rectangular road would have been infeasible; the road is always going to hit the back of my perspective frustum before it gets to 1px wide. Not to mention that the road needs to curve, which would make it too wonky to fake or force the perspective. The solution, then, is to fade the road into the mountain background using those linear gradients that I love so much.


Shown in this screenshot is bad premultiplied alpha blending, which was fixed afterward. :) It still demonstrates the technique well enough. The perspective looks quite different as well, because the road is now made with a rectangle lying on the surface, instead of a triangle.

Also notice the difference in the clouds. No code changes. Just regenerated with different random values. Pretty striking!
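The post doesn't say exactly what was wrong with the blending. A common mistake is using `gl.blendFunc(gl.SRC_ALPHA, gl.ONE_MINUS_SRC_ALPHA)` on colors that are already premultiplied, which multiplies by alpha twice and darkens translucent areas. A sketch of the correct premultiplied "over" operator, which corresponds to `gl.blendFunc(gl.ONE, gl.ONE_MINUS_SRC_ALPHA)`; the helper names are hypothetical.

```rust
/// RGBA color whose RGB has already been multiplied by alpha.
type Premul = [f32; 4];

fn premultiply(rgb: [f32; 3], a: f32) -> Premul {
    [rgb[0] * a, rgb[1] * a, rgb[2] * a, a]
}

/// Porter-Duff "over" for premultiplied colors: out = src + dst * (1 - src.a),
/// applied to all four channels.
fn over(src: Premul, dst: Premul) -> Premul {
    let k = 1.0 - src[3];
    [
        src[0] + dst[0] * k,
        src[1] + dst[1] * k,
        src[2] + dst[2] * k,
        src[3] + dst[3] * k,
    ]
}
```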

Now about that dull road ...

The Road Behind

I had to mull over the challenge of constructing curved roads for a few days. After all, at this point I only had a single, gray rectangle. I would have to chop it into smaller rectangles, and then somehow squeeze and stretch them in the right way to get a realistic winding road.

Back to the fractals! I already had a pluggable noise generator, since my diamond-square implementation was built with it in mind. I did some research on pink noise and experimented with the algorithm presented on that site. I found that it looked too much like white noise for my tastes. By limiting the range of motion for a random walk of white noise, I could generate much more suitable pink-ish noise with the diamond-square.

I plugged in a visualization of the fractal data (red), and drew a "road" (black) by rotating line segments by a fraction of the fractal data on every step.


And another one for comparison.


Not bad! However, I did find out later that the algorithm which drew these "roads" was not pivoting along the line segment as it was supposed to, but pivoting at its origin in the upper center of the image. That's why the roads seem very smooth at first, and then get more chaotic toward the end, where the small rotations have a greater effect on the more distant segments.
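The intended behavior, where each turn pivots at the current end of the road, is a simple turtle walk: accumulate the heading from the noise, then step one unit along it. A minimal sketch; the `turn_scale` factor and the +Y starting direction are assumptions.

```rust
/// Build a road centerline: turn by a fraction of the noise, then walk one
/// unit. Each segment starts where the previous one ended, so every turn
/// pivots at the current end of the road rather than at the origin.
fn centerline(noise: &[f32], turn_scale: f32) -> Vec<[f32; 2]> {
    let mut pos = [0.0f32, 0.0];
    let mut heading = 0.0f32; // radians; 0 = straight ahead along +y
    let mut points = vec![pos];
    for &n in noise {
        heading += n * turn_scale; // turns accumulate along the road
        pos = [pos[0] + heading.sin(), pos[1] + heading.cos()];
        points.push(pos);
    }
    points
}
```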

Now I have a road paving algorithm. This is going to be awesome!

Extending the 1D Road to 2D

My road is so far still one-dimensional; it's just a line. I have to extend it to 2D by giving it an arbitrary width. Doing this required a plan. For every segment, I would extend perpendicular (to the paving direction) by half of the width in each direction. These two points would be recorded. Then the segment would be rotated, and walk one unit to pave the segment. This process would be repeated, each time creating a rectangle (quad) from the two new points and the two previous points, building a road around my simple 1D curvy line.


This is what I saw in my head as I designed the 2D road paving algorithm. In step 1, the red dot is the current location, and the two blue dots are the "next points" to compute. This is a straight segment.

In step 2, the current location is rotated, walked one unit, and then it's just a straight shot out to the sides to hit the blue points, which are what I want to capture.

The 3rd step shows the current location (red dot) advanced, and the extraneous geometry removed, leaving two triangles (a single quad).

The algorithm repeats like this forever, building an infinite road that is anti-dull. With this idea in place, I updated my visualization to draw 2D roads.


Score.
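The paving steps above can be sketched as follows. Rather than rotating a probe as in the diagram, this version derives the perpendicular from the direction between neighboring centerline points, which gives the same edge points on straight runs; the name `pave` and that choice are assumptions. Consecutive pairs `edges[i]` and `edges[i + 1]` form one quad, i.e. two triangles.

```rust
/// Extrude a centerline into road edges: at each point, step half the road
/// width to either side along the perpendicular of the local direction.
/// Returns one [left, right] pair of edge points per centerline point.
fn pave(center: &[[f32; 2]], width: f32) -> Vec<[[f32; 2]; 2]> {
    assert!(center.len() >= 2, "need at least two centerline points");
    let half = width * 0.5;
    center
        .iter()
        .enumerate()
        .map(|(i, &p)| {
            // Local direction of travel (forward difference; backward at the end).
            let (a, b) = if i + 1 < center.len() { (p, center[i + 1]) } else { (center[i - 1], p) };
            let (dx, dy) = (b[0] - a[0], b[1] - a[1]);
            let len = (dx * dx + dy * dy).sqrt();
            // Rotate the direction 90 degrees and scale it to half the width.
            let (px, py) = (-dy / len * half, dx / len * half);
            [[p[0] + px, p[1] + py], [p[0] - px, p[1] - py]]
        })
        .collect()
}
```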

Extending the 2D Road to 3D

This was so easy, it almost doesn't deserve an entire section of this postmortem. But I'm gonna, because I have another screenshot to show!

The whole process of the 3D extension was to use the X-Z plane. Ta da! While hills were definitely a part of my original Scope Creep Monster plan, there was no way that would have been a realistic goal. So the road never diverges from -1 on the Y axis (which is below the camera at Y:0).

Here is the first 3D road that I was able to render. Notice the black-and-white pattern, clearly showing the individual quad segments, and a small piece of the 2D visualization that it represents in 3D.


I still didn't have a texture for it, but even dull gray would be better than a piano road.


Very nice! I can now generate a road infinitely. My fractal is 128x128, so if I compute segment angles for each point in the fractal, that gives me 16,384 segments of road before it cycles. Each segment is 1 unit deep, or about 1 meter in human length. 16 km of random road for free!

To keep memory usage sane for the road, only 28 segments are kept at a time. (28 happens to be evenly divisible by 4; the number of segments of depth for my asphalt texture, and also happens to be the depth at which the first layer of mountains appears.) When a new segment needs to be created, all segments are shifted up (into the distance) and the farthest segment gets pushed off into the void. Because the road is cyclical, the texture coordinates from the dropped segment are reused by the new segment. (This will become important later on.)
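Why 28 matters: with a texture that repeats every 4 segments, the segment that falls off the far end always has exactly the texture phase the new near segment needs, so its coordinates can be recycled as-is. A sketch with a `VecDeque`, assuming a hypothetical `Segment` type that carries only a centerline point and the texture's V phase:

```rust
use std::collections::VecDeque;

/// Hypothetical road segment: a centerline point plus the V phase of the
/// asphalt texture, which repeats every 4 segments (phase 0..=3).
#[derive(Clone, Copy, Debug)]
struct Segment {
    center: [f32; 2],
    v_phase: u8,
}

const SEGMENTS: usize = 28; // evenly divisible by the texture period
const TEX_PERIOD: u8 = 4;

/// Initial strip: index 0 is nearest the camera, index 27 the farthest.
fn initial_road() -> VecDeque<Segment> {
    (0..SEGMENTS)
        .map(|i| Segment { center: [0.0, i as f32], v_phase: (i % TEX_PERIOD as usize) as u8 })
        .collect()
}

/// Add a new near segment; everything else shifts one slot into the distance
/// and the farthest segment is dropped. Because 28 % 4 == 0, the dropped
/// segment's phase is exactly the one the new segment needs, so it is reused.
fn advance(road: &mut VecDeque<Segment>, new_center: [f32; 2]) {
    let dropped = road.pop_back().expect("road is never empty");
    road.push_front(Segment { center: new_center, v_phase: dropped.v_phase });
}
```

This is also why varying the texture per stretch of road (discussed further below) breaks the trick: the recycled coordinates only line up while every segment uses the same repeating texture.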

Of course, I'm still very far from being done, by this point. I'm already 3 weeks into the competition, and I don't even have any interactive elements. So far it's all been hand-wavy "realistic graphics are so important!" stuff. That was my biggest mistake of this project. (Don't get me wrong, it's a great tech demo.) Time to work on game play elements, right?

Nope

Gouraud shaded roads do not belong with that sunset! We need an asphalt texture. Yes, Leo, yes.


I looked at a lot of photos of asphalt on The Googlie Images. (Try telling that to your girlfriend...) Seems like it should be pretty easy to generate a texture... Just a bunch of white noise made really dark! But white noise doesn't actually look like asphalt, it looks like TV static. And darkening TV static just looks like dark TV static. No, we need a method. Something more natural. Something that looks bumpy, instead of noisy.

In my search, I came across this Gimp tutorial (http://gimpchat.com/viewtopic.php?f=23&t=9760). It isn't super great, but it would do. I have the directions; now I need to codify it. I already have clouds, and white noise is a good replacement for HSV noise. The steps I'm missing are "sparkle", "emboss", and "gaussian blur". (I skipped making any cracks.) Blur is an easy one; it's just a matter of averaging neighboring pixels with some falloff. I was able to combine sparkle and emboss into a single approximation by applying another "neighboring pixel" algorithm after the blur step: if the neighboring pixels are too dark, make this pixel bright. Throw in some desaturation and...


That'll do! Bonus: this texture wraps appropriately, which is important for realism (there I go again!)
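The blur and the combined sparkle/emboss step could look roughly like this on a grayscale texture. Both passes wrap at the edges, which is one way to keep the result tileable. The [1 2 1] kernel is a cheap Gaussian approximation, and the neighbor rule (if the average of the four neighbors is below `dark`, set the pixel to `bright`) is one reading of the description above; names and thresholds are illustrative.

```rust
/// 3x3 blur with [1 2 1] x [1 2 1] weights on a w x h grayscale image,
/// wrapping at the edges so the texture still tiles.
fn blur(src: &[f32], w: usize, h: usize) -> Vec<f32> {
    const K: [f32; 3] = [1.0, 2.0, 1.0];
    let mut out = vec![0.0f32; w * h];
    for y in 0..h {
        for x in 0..w {
            let mut sum = 0.0f32;
            for (dy, ky) in K.iter().enumerate() {
                for (dx, kx) in K.iter().enumerate() {
                    // dx, dy in 0..=2 mean offsets -1..=1, wrapped around the edges.
                    let sx = (x + w + dx - 1) % w;
                    let sy = (y + h + dy - 1) % h;
                    sum += src[sy * w + sx] * ky * kx;
                }
            }
            out[y * w + x] = sum / 16.0; // kernel weights total 16
        }
    }
    out
}

/// Rough "sparkle + emboss" stand-in: a pixel whose four neighbors average
/// darker than `dark` becomes `bright`. Reads only from `src`.
fn sparkle(src: &[f32], w: usize, h: usize, dark: f32, bright: f32) -> Vec<f32> {
    let at = |x: usize, y: usize| src[(y % h) * w + (x % w)];
    let mut out = src.to_vec();
    for y in 0..h {
        for x in 0..w {
            let avg = (at(x + w - 1, y) + at(x + 1, y) + at(x, y + h - 1) + at(x, y + 1)) / 4.0;
            if avg < dark {
                out[y * w + x] = bright;
            }
        }
    }
    out
}
```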

Now it looks [a little bit] like freshly paved asphalt. It's missing something, though. Oh yeah, lines for the driving lanes. This was a fun bit! I created something like a paint roller simulation to make sure the deep crevasses in the asphalt left air bubbles that would pop before the paint dries.


I kid, I kid. The intention, of course, was for the texture to look like the paint had imperfections. That's what makes the game's unique style so "realistic"; I relished the imperfections! The algorithm is simple: if the color is too dark, leave it dark. If its brightness is above a certain threshold, replace it with a yellow or white pixel. Plus some white noise for added roughness.

Two interesting points: First, the vertical lines do not have a perfectly flat edge (again, by design). There is an extra condition along with the brightness test; on the edges of the lines, the brightness threshold is raised, meaning fewer pixels will be colored. Second, the texture shown here is what appears in the final build, even though two other line patterns are generated (double solid, and broken left). This one creates the broken right line.
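The per-texel rule for the lane paint, sketched under assumptions: the stripe occupies columns `line_start..line_end`, the asphalt value is grayscale in [0, 1], `noise` is white noise in [-1, 1], and both thresholds are made-up values, not the ones in the game.

```rust
/// Paint one texel of a lane line. Inside the stripe, bright texels become
/// paint and dark ones stay asphalt (the crevasses). On the stripe's edge
/// columns the threshold is raised, so fewer texels get paint and the edge
/// looks ragged. All constants are illustrative guesses.
fn paint_texel(asphalt: f32, x: usize, line_start: usize, line_end: usize, paint: [f32; 3], noise: f32) -> [f32; 3] {
    let gray = [asphalt; 3];
    if x < line_start || x >= line_end {
        return gray; // outside the stripe
    }
    let at_edge = x == line_start || x + 1 == line_end;
    let threshold = if at_edge { 0.6 } else { 0.35 };
    if asphalt <= threshold {
        return gray; // too dark: the paint didn't reach into this crevasse
    }
    // Paint color roughened by a little white noise.
    let k = 1.0 + 0.1 * noise;
    [paint[0] * k, paint[1] * k, paint[2] * k]
}
```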

My intention was to randomly choose one of the three textures for a random run of highway. It's complicated a bit by my road paving algorithm, which is inherently iterative. The idea was to take the value in the first column of the fractal on the current row, and derive two numbers: which line pattern to use [0..2] and how long to let it run. That would have been deterministic.

The harder part, IMHO, would be putting the proper texture coordinates into the geometry buffer as it shifts. Remember when I said that the texture coordinate reuse would be important? This is where it bites us. With additional textures on the road, my buffer is no longer cyclical, making the shift operation hard. Not impossible, just "I don't have time to do this right now."




A selection of screenshots showing the different line patterns. Now I have a road that is drawn correctly. 2 days left in the competition, and I still need to make the camera move along the road, and add some kind of interactivity. Jeez!

LERP FTW

Linear interpolation (LERP, aka tween) is a glorious thing. You've seen it before. You've probably even used it. It creates smooth (linear) transitions between two states over time. To create the driving animation between plays, I run a LERP between each road segment, along the midline. It interpolates the camera position and rotation in one step. Finally, the camera is translated to the right side of the road, about where a driver would be sitting (if you're in the US).
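A minimal sketch of that interpolation, assuming a hypothetical `Pose` with an X-Z position and a yaw angle. Angles are interpolated the short way around the circle, which avoids a full spin when the heading wraps past 360°; the final offset to the driver's side of the road is left out.

```rust
use std::f32::consts::PI;

fn lerp(a: f32, b: f32, t: f32) -> f32 {
    a + (b - a) * t
}

/// Interpolate an angle along the shorter arc, so 350° -> 10° turns 20°, not 340°.
fn lerp_angle(a: f32, b: f32, t: f32) -> f32 {
    let mut d = (b - a) % (2.0 * PI);
    if d > PI {
        d -= 2.0 * PI;
    }
    if d < -PI {
        d += 2.0 * PI;
    }
    a + d * t
}

/// Hypothetical camera pose: position on the X-Z plane plus yaw (radians).
#[derive(Clone, Copy, Debug)]
struct Pose {
    x: f32,
    z: f32,
    yaw: f32,
}

/// Pose `t` of the way (0..=1) from one segment's midline point to the next.
fn camera_between(a: Pose, b: Pose, t: f32) -> Pose {
    Pose { x: lerp(a.x, b.x, t), z: lerp(a.z, b.z, t), yaw: lerp_angle(a.yaw, b.yaw, t) }
}
```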

The LERP was a bit of a problem to get right. Mostly because camera matrices are awful. I ended up with a decent matrix inversion function based on maths from an awesome website I found. It's slightly cheaper (and easier to understand) than the popular method.
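The post doesn't name the method, but one common shortcut for camera matrices is worth showing: when a transform is only a rotation plus a translation, its inverse is the transposed rotation together with a rotated, negated translation, so no general 4x4 inverse is required. A sketch with a hypothetical `Rigid` type:

```rust
/// Rigid transform (rotation `r`, row-major, plus translation `t`):
/// world = r * local + t.
#[derive(Clone, Copy, Debug)]
struct Rigid {
    r: [[f32; 3]; 3],
    t: [f32; 3],
}

impl Rigid {
    fn apply(&self, p: [f32; 3]) -> [f32; 3] {
        let mut out = self.t;
        for i in 0..3 {
            for j in 0..3 {
                out[i] += self.r[i][j] * p[j];
            }
        }
        out
    }

    /// A rotation's inverse is its transpose; the translation becomes -(R^T t).
    fn inverse(&self) -> Rigid {
        let mut r = [[0.0; 3]; 3];
        for i in 0..3 {
            for j in 0..3 {
                r[i][j] = self.r[j][i];
            }
        }
        let mut t = [0.0; 3];
        for i in 0..3 {
            for j in 0..3 {
                t[i] -= r[i][j] * self.t[j];
            }
        }
        Rigid { r, t }
    }
}
```

The view matrix is the inverse of the camera's world transform, so this is the piece a follow camera needs every frame.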

Anyway, I finally got something to work "good enough", and that's what you see in the final build. It's a little bit bouncy when the camera jumps to the next segment. I haven't looked into any way to make that better.

1.5 days remain, and I still don't have any interactive elements to the game! My pacing on this was atrocious.

Input Output

Scope Creep Monster was getting a little unruly, to be quite honest. I still wanted it all; desktop support, mobile support, keyboard input, gamepad input, motion controls, ... I must be insane.

Sure! Why not? It's not like it's a whole lot of code! In an evening, I had a full featured input module that covered keyboard, gamepad, and motion controls. Each input sets normalized state in an input object, which is used later by the game.

I plugged my new input object into the camera code, which required a lot of refactoring, actually. But it worked! With a little bit of physics code, I was able to drive in reverse! Of course, the first iteration was horribly broken. The auto-driving camera was responsible for shifting the road segments, but with a human in control, I cannot use the same LERP! Humans don't drive linearly.

What I needed was collision detection; determine when the camera crossed the "current location" of the road generator, and generate the next segment! The road would be generated as it was driven on. (You can see the effect of this in the final build by making a really sharp turn... There's nothing but the void behind you!) I needed collision detection anyway to keep the player on the road. Any off-roading could not be allowed.

I could have used any number of collision algorithms, like SAT. But I felt that was unnecessary overkill, there had to be a cheaper way! In a quick flash of inspiration, I decided to try a triangulation between the camera and the two points on either side of the road. (Remember the image earlier with the red dot and two blue dots? Yes those blue dots.) This turned out to be really easy! Not even a triangulation, really. Just two dot-products.

I have no idea what a dot-product is (it's a 1D projection of one vector onto another), but if you take the dot product of a vector with itself, you get the square of the vector's length! It's basically the Pythagorean theorem, so yeah, I guess it is triangulation, of sorts! Anyway, the camera's position minus the point's position is the vector whose length I want. The constraints are as follows:

  • The road is 6 units wide
  • The camera is somewhere within the area covered by the last [generated] segment of the road.
  • The camera can only move backwards

Given these constraints, it's easy to compute when the player passes between the two points: take the square root of each dot product (giving the distance from the camera to each point), and add them. If the sum is near 6 (the width of the road), then the line between the two points is being crossed; anywhere off that line, the two distances add up to more than 6. Try it out on a sheet of graph paper! It's simple geometry. For simplicity, I consider "near 6" to be < 6.1. That gives a margin of 0.1 units (under 2% of the road width), more than enough to account for weird floating point precision and rounding errors.

That takes care of generating more road. Determining whether the player left the road was equally easy! The road is 6 units across, so logically, if either point is ever more than 6 units away from the camera, the player has left the road. This can be simplified a bit because you don't need a square root of the dot-product; just compare the dot-product directly to 6².
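Both tests, sketched in 2D on the road's X-Z plane with hypothetical names. The crossing test needs the square roots because it adds two distances; the off-road test compares squared distances directly against 6².

```rust
/// Squared distance: the dot product of the difference vector with itself.
fn dist_sq(a: [f32; 2], b: [f32; 2]) -> f32 {
    let d = [a[0] - b[0], a[1] - b[1]];
    d[0] * d[0] + d[1] * d[1]
}

const ROAD_WIDTH: f32 = 6.0;
const CROSS_TOLERANCE: f32 = 0.1;

/// The two distances only add up to (about) the road width when the camera
/// is on the line between the segment's edge points.
fn crossed_edge(cam: [f32; 2], left: [f32; 2], right: [f32; 2]) -> bool {
    dist_sq(cam, left).sqrt() + dist_sq(cam, right).sqrt() < ROAD_WIDTH + CROSS_TOLERANCE
}

/// Off the road when either edge point is farther away than the road width;
/// comparing squared values avoids the square roots.
fn off_road(cam: [f32; 2], left: [f32; 2], right: [f32; 2]) -> bool {
    let limit = ROAD_WIDTH * ROAD_WIDTH;
    dist_sq(cam, left) > limit || dist_sq(cam, right) > limit
}
```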

Magic! I have a complete game, with a full day to spare. What could possibly go wrong?

Sound Makes It FUN

This is a tip I learned long, long ago; music and sound effects are the real key to creating an enjoyable experience. This is part of the "juicing" philosophy of game development. You can have the craziest screen shake, most intense particle effects, and super colorful warpy shader things... But without sound effects to go with it, it's kind of dead. While my game has none of those juicy things, I certainly didn't go crazy on sounds, either!

I woke up on "launch day", with a full 16-and-something hours until deadline. The first thing on my TODO list was music. Earlier I had played with the idea of fully procedural music generation, so I did some studying, and now have a cursory understanding of the math behind music theory. There's not much to it, but I don't have enough practical knowledge to make use of any of that. My attempt with the WebAudio API was the ultimate disaster. Thankfully, that never made it into a single commit.

I was then informed on The Twitters by a fellow js13k developer, "Sonant-x to the rescue!" Holy crap, Ryan, you saved the day! Within an hour or two, I had my sound track. It turned out well. So well in fact, that I've had the music playing on repeat almost constantly since I wrapped it up. How in the hell? I'm no musician. Did that really come from my brain? Props to Nicolas Vanhoren, Marcus Geelnard, and Jake Taylor for their work, which culminated in a tool that allows a tool like me to create a melody so peaceful and serene.

You're gonna need it, because this game ... Oh, this game.

The music really brought the project to fruition, IMO. But with over half a day remaining, there was plenty of time to juice it up and make the experience a little more coherent. I added a title screen by reusing the LERP animation (auto-driving) and putting "cursive" font-family text (a div element) over the canvas. I re-reused the animation for the loss screen, and made the loss text as happy as I could. Because let's face it, the game is truly brutal and unforgiving. Every little bit of happiness that can be impressed upon my unfortunate audience will be well-deserved.

Oh yeah, and iOS is a constant source of grief! I had to add a "Tap to continue" message after the loader for iOS, because SCREW YOU APPLE. That's why. iOS silences all audio until audio is played in response to a user interaction. I don't know why. I pinched a small function from howler.js to enable audio on iOS in response to a tap. The tap only enables audio about 10% of the time on the iOS simulator. It seems a bit better on real hardware. Oh well, I give up. Chrome for Android works flawlessly without this terrible hack.
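For the curious, the general shape of that kind of unlock hack is sketched below. This is not the function I took from howler.js, just the common pattern: inside the first touch handler, play a one-sample silent buffer through the AudioContext, and call `resume()` where the browser provides it. The helper name `unlockAudioOnTap` is hypothetical, and the context and event target are passed in so the sketch can be exercised outside a browser. (Older iOS Safari only exposes the prefixed `webkitAudioContext`.)

```javascript
// Hypothetical helper: unlock Web Audio on iOS by playing a silent buffer
// from inside a user-gesture (touch) handler.
function unlockAudioOnTap(ctx, target) {
  function unlock() {
    // A 1-channel, 1-sample buffer of silence.
    const buffer = ctx.createBuffer(1, 1, 22050);
    const source = ctx.createBufferSource();
    source.buffer = buffer;
    source.connect(ctx.destination);
    source.start(0);
    // Newer implementations also expose resume() for suspended contexts.
    if (typeof ctx.resume === "function") {
      ctx.resume();
    }
    // Only needed once.
    target.removeEventListener("touchend", unlock);
  }
  target.addEventListener("touchend", unlock);
  return unlock;
}

// In a browser: unlockAudioOnTap(new AudioContext(), document.body);
```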

Sound effects! I was still missing sound effects. There's only one sound effect that a driving game truly needs. A motor! I knew from my previous music research that synthesis was the way to go. But I stopped short of understanding Fourier Transforms, so I couldn't really get into it that much. Instead, The Googlie showed me the light! A tiny JavaScript library that generates motor sounds! With a bit of tuning, it would be perfect.

First, the library as-is only generated audio for the left channel in a stereo audio buffer. Easy enough to duplicate the wave into both channels. And second, it only generated purely random (white noise) sounds. I wanted more control over it, so I added a simple deterministic generator that produced a fine motor sound. It just had to be hooked up to my physics (completed earlier in the day, following input). Velocity affects both motor volume and frequency, which is pretty darned convincing.
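The two tweaks might look roughly like this. To be clear, this is a hypothetical stand-in, not the library's code or my actual generator: a deterministic sawtooth at the engine's "firing" frequency, written identically into both channels of a stereo buffer, with velocity mapped linearly onto both frequency and volume. The function names and the 30 to 120 Hz range are made up for illustration.

```javascript
// Hypothetical mapping from vehicle speed to engine frequency (Hz) and gain (0..1).
function motorParams(velocity, maxVelocity) {
  const t = Math.min(1, Math.max(0, Math.abs(velocity) / maxVelocity));
  return { frequency: 30 + t * 90, gain: 0.2 + t * 0.8 };
}

// Fill both channels with the same deterministic waveform (a sawtooth here),
// instead of filling only the left channel with white noise.
function fillMotor(left, right, sampleRate, frequency, gain) {
  for (let i = 0; i < left.length; i++) {
    const phase = (i * frequency / sampleRate) % 1; // position within one cycle, 0..1
    const sample = gain * (2 * phase - 1);          // sawtooth in -gain..gain
    left[i] = sample;
    right[i] = sample; // duplicate the wave into the right channel
  }
}

// With a Web Audio AudioBuffer, the two channels would be
// buffer.getChannelData(0) and buffer.getChannelData(1).
```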

Strangely, the motor sound is kind of awful on Chrome for Android. I haven't looked into why that is.

Final Touches

There isn't much more to say about the development process. I added a score counter which tracks the distance driven, and saves the high score to localStorage. Nothing special there! Another late fix was support for 3:1 screen ratios (crazy): on wider displays, the cloud layers could scroll far enough that a gap appeared at the right side. This was fixed by stretching the clouds horizontally a little bit.
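The high score really is nothing special; something along these lines would do. The key name and function are hypothetical, and the storage object is a parameter (defaulting to `localStorage`) so it can be tested without a browser.

```javascript
const HIGH_SCORE_KEY = "highScore"; // hypothetical storage key

// Save the distance if it beats the stored record; returns the current high score.
function saveHighScore(distance, storage = globalThis.localStorage) {
  // getItem() returns null for a missing key; Number(null) is 0.
  const best = Number(storage.getItem(HIGH_SCORE_KEY)) || 0;
  if (distance > best) {
    storage.setItem(HIGH_SCORE_KEY, String(distance));
    return distance;
  }
  return best;
}
```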

And crucially, a game-breaking bug was fixed just before submission; it only occurred in the minified/compressed build, due to overaggressive compression techniques! At just under 9KB (with 4KB to spare, that's about 30% of the 13KB limit) I could really afford to loosen up on the compression. Removing the offending trick cost 70 bytes (lol) but increased compatibility. All good!

What Went Wrong

iOS was painful almost the entire way. First, I was surprised an early build worked at all on my ancient iPhone 4S. Little did I know that adding an awesome hack would completely break it on iOS. I demand iOS compatibility! So I removed the hack very late to get it working acceptably again. And the final punch to my gut from iOS is the terrible motion controls. I didn't bother adding a manual calibration step (which was honestly a mistake). The gyro is not the most accurate thing in the world. It kind of sucks, actually.

The other thing I'm disappointed with is spending 90% of my time on graphics. A fatal rookie mistake if ever there was one. I don't know if I'll ever learn to mitigate this issue properly.

What Went Right

Uh, well, yeah. It's safe to say that pretty much everything (except iOS) went very well! There were normal development hiccups and bugs along the way (par for the course), but looking back, I don't think I've ever had a project that I've been more proud of.

There were a lot of personal firsts, here! First 3D game, first game using Procedural Content Generation, first to use WebAudio, WebGL, Gamepad, and DeviceOrientation APIs... To say that I did poorly with any of these things is missing the point! You gotta start somewhere, and for a game like Lovely Drive to set the bar high for myself right off the bat, in so many areas... Well, that's really quite humbling. (To myself, anyway.)

I feel like I also hit my Desert Bus homage goal (after cutting out the Scope Creep Monster). You can drive forever in this game, backwards of course. Importantly, you can take your time, or blaze a trail recklessly. There's no time limit, there are no obstacles to overcomplicate the already-cramped highway. The presentation is one of serenity and tranquility. I've had the music on loop almost non-stop, not out of hubris or ego, but because it truly instills an emotional reaction when I listen to it. Especially with a beautiful, inviting (ceremonial, even) font overlaid on a never ending desert sunset. It's probably the most gorgeous thing I will ever make.

I didn't have any trouble with the file size. The only reason I almost hit 9KB before release was because I had to add a bunch of one-off code (the devil, to any size-restricted project). I can thank math for that.

Closing Thoughts

The game is not the most fun to play. It is challenging, and perhaps could have an audience in the party game crowd (see who can get the farthest distance in the shortest time?) But really it is not a casual game by any stretch. It is Desert Bus in reverse. That sums it up nicely.



The font is different between Desktop and Mobile. I didn't do that; browser vendors did.



Sunday, February 15, 2015

melonJS should be *All About Speed* Part 6

Let's continue talking WebGL.

The original WebGL article was split into two parts, as the material covered too much ground. In the first half, WebGL was described in agonizing detail. In this second half, the WebGL theory takes a backseat to melonJS. This time around, there are a lot of high-level details revolving around rendering a scene in melonJS using all those crazy triangles.

As always, you can revisit previous articles in the series. Make sure you don't miss Part 5, as this article assumes you've read it and have an intimate understanding of WebGL.

Part 1 : http://blog.kodewerx.org/2014/03/melonjs-should-be-all-about-speed.html
Part 2 : http://blog.kodewerx.org/2014/04/melonjs-should-be-all-about-speed-part-2.html
Part 3 : http://blog.kodewerx.org/2014/10/melonjs-should-be-all-about-speed-part-3.html
Part 4 : http://blog.kodewerx.org/2014/12/melonjs-should-be-all-about-speed-part-4.html
Part 5 : http://blog.kodewerx.org/2015/02/melonjs-should-be-all-about-speed-part-5.html

Here comes the wall of text.

melonJS Gets Fast

The upcoming melonJS v2.1.0 release contains significant improvements in its WebGL support. The most obvious improvement is in rendering speed, which was achieved using the research that led to the WebGL primer in Part 5 of this series. After gaining an understanding of how WebGL works, it was pretty clear that we had to do some things quite a bit differently from the typical 3D scene rendering paradigms used today.

melonJS only renders 2D scenes, and it is historically tied to the HTML5 Canvas API. So what we've developed is a compromise on both sides: an API that is mostly compatible with Canvas, and mostly compatible with WebGL. "Mostly compatible" refers to the fact that there are differences from the pure Canvas API, and some features are missing entirely (gradients, for example). The team believes this "best of both worlds" approach is the right thing to do for v2.1.0. In future releases, we will be focusing on WebGL as the primary renderer, and slowly deprecating Canvas.

What's New

In melonJS, we now have WebGL rendering frames using only two native calls: bufferData (send elements to the GPU) and drawElements (draw the data that was just sent). All of the draw operations are batched up into a series of primitives (quads or lines), and melonJS flushes the batch when it needs to switch the primitive type (or when the entire scene has been processed).
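To make the two-call idea concrete, here is a heavily simplified batcher. It is not the melonJS Compositor; the `QuadBatcher` class, its `pushQuad`/`flush` methods, and the fixed capacity are hypothetical, and `gl` is any WebGLRenderingContext. Quads accumulate in one Float32Array and go to the GPU with a single `bufferData` plus a single `drawElements` per flush.

```javascript
const FLOATS_PER_VERTEX = 9; // vec2 position, vec4 color, float texture unit, vec2 region
const FLOATS_PER_QUAD = FLOATS_PER_VERTEX * 4;

class QuadBatcher {
  constructor(gl, maxQuads) {
    this.gl = gl;
    this.maxQuads = maxQuads;
    this.data = new Float32Array(maxQuads * FLOATS_PER_QUAD);
    this.quadCount = 0;
  }

  // Append one quad's 36 floats; flush first if the stream buffer is full.
  pushQuad(floats) {
    if (this.quadCount === this.maxQuads) this.flush();
    this.data.set(floats, this.quadCount * FLOATS_PER_QUAD);
    this.quadCount++;
  }

  // Upload everything batched so far and draw it with one call.
  // Assumes the stream ARRAY_BUFFER, the shared index buffer, and the
  // vertex attribute pointers are already bound and configured.
  flush() {
    if (this.quadCount === 0) return;
    const gl = this.gl;
    const used = this.data.subarray(0, this.quadCount * FLOATS_PER_QUAD);
    gl.bufferData(gl.ARRAY_BUFFER, used, gl.STREAM_DRAW);
    gl.drawElements(gl.TRIANGLES, this.quadCount * 6, gl.UNSIGNED_SHORT, 0);
    this.quadCount = 0;
  }
}
```

Switching to a different primitive (lines, say) would simply call `flush()` before changing shaders and buffers.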

This new "Compositor" replaces the alpha WebGLRenderer that we shipped in melonJS v2.0.0. I guess you could call the WebGLRenderer in v2.1.0 a beta, now. Building the Compositor required the addition of two new classes, and the simplification of two others:

  • me.WebGLRenderer.Compositor is the new Compositor class that stores WebGL operations into a single streaming buffer, manages all of the texture units and shaders, etc.
  • me.Renderer.TextureCache is a new class that caches Texture objects, providing texture lookups by image reference, texture unit (index) lookups, and bounds the total number of texture units to hardware limitations.
  • me.video.renderer.Texture is the new name for an old class (me.TextureAtlas) that handles texture mapping; an image and its associated regions for e.g. animation sheets and tile sets.
  • me.Matrix2d was overhauled to make it safe for use with WebGL, and me.Matrix3d has been removed (it was not a 3D matrix implementation).

The Compositor is responsible for sending all commands to the GPU. This was implemented as its own class (outside of the WebGLRenderer class) to satisfy a need for custom shaders. Because the Compositor is tied directly to the shaders, game developers who wish to use custom shaders will need a custom Compositor. The WebGLRenderer simply forwards all draw operations to the Compositor, and the Compositor contains the logic to batch these operations into the smallest possible number of calls to the GPU.

Why WebGL is Fast

That last statement is part of what makes WebGL fast. Having the hardware do the rasterization (putting pixels into a frame buffer) is where most of the speed comes from, but you can't take full advantage of that hardware unless you can quickly send it information about "what to draw". Compositing an entire scene on the CPU and sending the information in one large batch is the best way to reduce overhead in vertex memory bandwidth and in the User Agent/GPU driver.

But reducing that overhead is only part of the solution. Higher performance can also be achieved by reducing the total size of payload data sent to the GPU. It follows that sending less data allows the GPU to begin rendering sooner. In fact, a lot of the "WebGL benchmarks" I have seen were written specifically to benchmark the GPU by only sending the minimal amount of information necessary to render a scene.

It's pretty straightforward to design a complex vertex shader which accepts only a single variable (the time delta) to render an animated scene entirely on the GPU without any vertex bandwidth concerns. But that doesn't qualify as a game by any stretch. Sure, throw in some more variables like gamepad inputs and such to make it interactive. Now you have a custom shader that performs incredibly well, but will only work for one game (or in the best case, one style of game). Some notable examples are a Flappy Bird Clone and a Legend of Zelda Clone.

How melonJS Benefits From WebGL

Unfortunately, the non-interactive and inflexible shader approaches are not at all compatible with the melonJS vision. We need shaders that work with any kind of generic renderable (a 2D image) that can be translated, scaled, and rotated arbitrarily and independently of all other renderables. This requirement necessitates a different kind of design, which I will get to later in the article. Suffice it to say we've put our WebGL support through several revisions and experiments to get the most out of WebGL in a way that matches our framework.

That's something that can't be stressed enough: what we have built significantly lowers the entry barrier for any melonJS developer to start using WebGL right away. Just flip the switch, and it works! Not only does it work, but it really is faster than the 2D canvas renderer. And getting to that point was a lot of hard work.

How We Got Here

It didn't happen overnight! In November, we launched v2.0.0 with an alpha quality WebGLRenderer. It's "alpha quality" because it's actually slower than the CanvasRenderer! That may seem shocking since the term "WebGL" is almost a buzzword for "fast 3D graphics on the web". I opened the Part 5 article with a bit of a disclaimer that flipping on the WebGL switch won't magically make your game run faster. And the sad truth is that this is exactly what I meant. Using WebGL doesn't just grant 60fps high resolution, multi-textured, antialiased, dynamically lit, billion-polygon-count rendered 3D scenes for free; you have to get there through sheer will and determination.

The alpha WebGLRenderer is naïve. For everything that needs to be drawn:

  • vertices are computed and uploaded to an attribute buffer (!)
  • texture coordinates are uploaded to an attribute buffer (!)
  • the index buffer is bound to the element buffer (!)
  • a texture is bound to texture unit 0 (!)
  • the transformation matrix is uploaded as a uniform variable
  • the color is uploaded as a uniform variable (!)
  • and 6 vertices (two triangles) are drawn immediately (!)

Everything I marked with (!) is an operation that equates to unnecessary overhead. Of all these steps, only the transformation matrix changes often (representing the result of translate, rotate, scale). This ends up being something on the order of 10-15 calls per renderable. Even the simple platformer example routinely draws about 140 renderables at a time, making roughly 2,000 WebGL calls per frame. At 60 frames per second, that's 120,000 calls per second. That's a lot of overhead for a very simple game.

Internally, the Canvas API in the browser is doing its own compositing, which results in better performance than the naïve GPU-poking approach. Clearly, we had to do better. With the new Compositor, melonJS makes 2 WebGL calls per frame (about 120 per second) in the best case scenario. This is a significant reduction in the number of calls (and driver overhead) but we now have new bandwidth requirements; the payload size for each call has increased. Reducing the payload size is the way forward to get even more performance out of the hardware.

The Architecture of the melonJS Compositor

And now the moment you've all been waiting for! (amirite?) Now that you know how WebGL works, you can probably imagine the difficulty of getting good results out of it while hanging on to a legacy API. We're sticking with the same renderer API that we introduced in melonJS v2.0.0 to ease the transition for game developers familiar with the canvas API. This was an important decision we made so that WebGL can be used and taken advantage of with the least amount of resistance. This is how we did it.

Recall that earlier in this article, I mentioned that the shaders used in melonJS need to be interactive and flexible; they need to support hundreds of individually moving images, each with its own rotation angle, scaling factor, and positioning information. This is all state that needs to be provided to the GPU, and for that number of images, it can only be provided as part of the vertex attributes. Yes, every vertex sent to the GPU contains information about rotation, scaling, etc.

Let's start with one of the Compositor's primitive rendering components, the quad. Quad is short for quadrilateral, meaning four sides; an image has four corners, a quad has four vertices, an image is a quad. Since the GPU works with triangles (not quads), we have to describe a quad as a set of two triangles. We use the ELEMENT_ARRAY_BUFFER to describe the triangles in our quad; every six elements in the ELEMENT_ARRAY_BUFFER point to the four vertices in the quad, in the following order:
[ 0, 1, 2, 2, 1, 3 ]
The first three elements are triangle 1, and the second three are triangle 2. You can see that both triangles share two vertices. That lets us send just the four vertices in our quad using the ARRAY_BUFFER. During initialization, the Compositor creates a large "index buffer" containing indices for 32,000 triangles (16,000 quads) like above. The index array above describes the first two triangles, the next two triangles are described by [ 4, 5, 6, 6, 5, 7 ], and so on, with 16,000 of these blocks in total. (That's 96,000 indices!) This large index buffer is created once and uploaded to the GPU as the ELEMENT_ARRAY_BUFFER. It is never touched by JavaScript again. (Though other index buffers may be bound in its place! The line shader's index buffer, for example.)
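Generating that index buffer takes only a few lines. This is a sketch with a hypothetical function name, not the melonJS source. A Uint16Array works here because 16,000 quads need 64,000 vertices, so the largest index (63,999) still fits in an unsigned 16-bit integer.

```javascript
// Build indices for `quadCount` quads: each quad's four vertices become the
// two triangles [0, 1, 2] and [2, 1, 3], offset by 4 for every quad.
function buildQuadIndices(quadCount) {
  const indices = new Uint16Array(quadCount * 6);
  for (let q = 0, v = 0; q < quadCount; q++, v += 4) {
    const i = q * 6;
    indices[i] = v;
    indices[i + 1] = v + 1;
    indices[i + 2] = v + 2;
    indices[i + 3] = v + 2;
    indices[i + 4] = v + 1;
    indices[i + 5] = v + 3;
  }
  return indices;
}

// In WebGL this would be uploaded once and left alone:
// gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, gl.createBuffer());
// gl.bufferData(gl.ELEMENT_ARRAY_BUFFER, buildQuadIndices(16000), gl.STATIC_DRAW);
```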

Each quad is made of four pieces of information, currently:

  • Vertex (vec2) : A point in pixel coordinate space.
  • Color (vec4) : A color sent to the fragment shader for blending.
  • Texture (float) : The texture unit index used by the fragment shader as the Sampler2D selector.
  • Region (vec2) : Texture coordinates for the Sampler2D.

You can count 9 floats (per vertex) that need to be streamed to the GPU, or 36 floats per quad. Unfortunately, the color and texture unit index are identical for all four vertices in the quad! So we end up sending a lot of duplicated information to the GPU. The good news is that there's a lot of vertex memory bandwidth available, so it's a good tradeoff. (See the "Experiments" section below for our plans to reduce the number of floats per quad.)
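Here's one way that 9-float layout could be described to WebGL. The attribute names are hypothetical; the order and sizes follow the list above. Each float is 4 bytes, so a vertex is 36 bytes wide and each attribute starts at the running byte offset.

```javascript
const BYTES_PER_FLOAT = 4;

// Attribute layout in the order listed above: [name, component count].
const LAYOUT = [
  ["aVertex", 2],  // vec2 position in pixels
  ["aColor", 4],   // vec4 blend color
  ["aTexture", 1], // float texture unit index
  ["aRegion", 2],  // vec2 texture coordinates
];

// Compute the byte stride and each attribute's byte offset within a vertex.
function describeLayout(layout) {
  let offset = 0;
  const attributes = layout.map(([name, size]) => {
    const attr = { name, size, offset };
    offset += size * BYTES_PER_FLOAT;
    return attr;
  });
  return { stride: offset, attributes };
}

// With a real context and linked program, each attribute would then be set up via:
// gl.vertexAttribPointer(gl.getAttribLocation(program, name), size, gl.FLOAT, false, stride, offset);
```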

Architecture Rationale

This vertex streaming approach contrasts with using uniform variables for the blend color, texture index, and texture coordinates. As you know, uniforms are constant across the draw call, and we don't necessarily have the opportunity to share these values across every quad in the scene. Hypothetically, we could have used uniforms and individual draw calls, but it would almost certainly have degenerated to the worst-case scenario nearly every time: one draw call per quad (due to lack of sharing, and draw order requirements). The cost would have been too great.

If you're familiar with WebGL, you might be wondering what happened to the matrices. Well, there's only one matrix used in our vertex shader today, and that's the projection matrix! (The projection matrix transforms our pixel coordinate space into WebGL clip space.) There is no concept of a view matrix or model matrix in the current iteration of the melonJS WebGL Compositor. Instead, those matrices are premultiplied before they ever reach the Compositor (this was done historically, as we only supported the canvas renderer when the code was written). Once inside the Compositor, the global "ModelView" matrix is multiplied with every vertex, and that's what gets sent to the vertex shader!
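A sketch of that CPU-side step, assuming a canvas-style 2D affine matrix stored as `[a, b, c, d, e, f]` (the same argument order as `CanvasRenderingContext2D.setTransform`). melonJS's own `me.Matrix2d` storage may differ, and the function names are hypothetical.

```javascript
// Apply a canvas-style affine transform to the point (x, y):
//   x' = a*x + c*y + e
//   y' = b*x + d*y + f
function transformPoint(m, x, y) {
  const [a, b, c, d, e, f] = m;
  return [a * x + c * y + e, b * x + d * y + f];
}

// Transform the four corners of a w-by-h image drawn at its local origin,
// in the order used by the [ 0, 1, 2, 2, 1, 3 ] index pattern:
// top-left, top-right, bottom-left, bottom-right.
function transformQuad(m, w, h) {
  return [
    transformPoint(m, 0, 0),
    transformPoint(m, w, 0),
    transformPoint(m, 0, h),
    transformPoint(m, w, h),
  ];
}
```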

The vertex shader is very simple; it just multiplies the projection matrix (a uniform variable) with the vertex, and sends the remaining attributes to the fragment shader through varying variables.
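The projection matrix itself is just a scale and an offset: pixel coordinates (origin top-left, y pointing down) map to clip space (-1 to 1, y pointing up). Below is a sketch assuming a 3x3 matrix in the column-major order that `gl.uniformMatrix3fv` expects, plus a minimal GLSL vertex shader in the spirit of the one described. The shader's attribute, uniform, and varying names are hypothetical.

```javascript
// Column-major 3x3 matrix mapping pixel (x, y) to clip space:
//   clipX = 2x/width - 1,  clipY = 1 - 2y/height
function projectionMatrix(width, height) {
  return new Float32Array([
    2 / width, 0, 0,   // column 0
    0, -2 / height, 0, // column 1
    -1, 1, 1,          // column 2 (translation)
  ]);
}

// Multiply a column-major 3x3 matrix by the homogeneous point (x, y, 1).
function project(m, x, y) {
  return [m[0] * x + m[3] * y + m[6], m[1] * x + m[4] * y + m[7]];
}

// Minimal vertex shader: project the position, pass everything else through.
const VERTEX_SHADER = `
attribute vec2 aVertex;
attribute vec4 aColor;
attribute float aTexture;
attribute vec2 aRegion;
uniform mat3 uProjection;
varying vec4 vColor;
varying float vTexture;
varying vec2 vRegion;
void main(void) {
  gl_Position = vec4((uProjection * vec3(aVertex, 1.0)).xy, 0.0, 1.0);
  vColor = aColor;
  vTexture = aTexture;
  vRegion = aRegion;
}`;
```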

The fragment shader selects the correct Sampler2D based on the texture unit index, samples it with the texture coordinates, and finally combines the color. This shader is more interesting because of how difficult it is to do Sampler2D selection. It is not possible to dynamically index arrays within the fragment shader (see the WebGL Spec), so instead we use a method pioneered by Kenneth Russell and Nat Duca from the Chromium team (see: http://webglsamples.org/sprites/readme.html), which uses a series of if-then-else statements to select the correct Sampler2D.

But wait! Their example only supports four textures. Surely that's not going to be enough for melonJS?! As it turns out, WebGL requires a minimum of 8 texture units. But what about hardware that supports more than 8? We don't want anything bad to happen like crashing the GPU process when attempting to index too many Sampler2Ds, assuming the shader GLSL will even compile! And in the case that it does compile and doesn't crash, we would just be wasting GPU memory with a bunch of useless if-statements that never run!

The solution is compiling your GLSL before compiling your GLSL. ;) The GLSL is just a string, after all; we can manipulate it in any way we want at runtime before it is compiled into a working shader program. The best thing I came up with for doing the GLSL preprocessing is running it through a template engine. I've used doT before with great success, so it seemed like the obvious choice; it's tiny, it's fast, and it's extremely expressive.

We now have all of our GLSL sources written as doT templates. The fragment shader template in particular uses JavaScript evaluation to create a series of if-then-else statements in a loop (according to compiler theory, it creates an unrolled loop!) The templates are compiled to functions by doT at build-time, melonJS passes template variables to the template functions at runtime which produces the final GLSL source, and finally it's compiled by the UA into a usable shader program.
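melonJS does this with doT templates; the sketch below swaps in a plain JavaScript function to show the same idea, emitting one branch per available texture unit. The uniform and varying names are hypothetical. In a browser, the unit count would come from `gl.getParameter(gl.MAX_TEXTURE_IMAGE_UNITS)`.

```javascript
// Generate a fragment shader that selects among `maxTextures` samplers with an
// unrolled if/else chain (no dynamic sampler indexing in GLSL ES 1.0).
function buildFragmentShader(maxTextures) {
  const branches = [];
  for (let i = 0; i < maxTextures; i++) {
    const sample = `gl_FragColor = texture2D(uSampler[${i}], vRegion) * vColor;`;
    // vTexture is a float, so compare against i + 0.5 rather than testing equality.
    if (i === 0) {
      branches.push(`  if (vTexture < 0.5) { ${sample} }`);
    } else if (i === maxTextures - 1) {
      branches.push(`  else { ${sample} }`); // last unit catches everything else
    } else {
      branches.push(`  else if (vTexture < ${i}.5) { ${sample} }`);
    }
  }
  return [
    "precision mediump float;",
    `uniform sampler2D uSampler[${maxTextures}];`,
    "varying vec4 vColor;",
    "varying float vTexture;",
    "varying vec2 vRegion;",
    "void main(void) {",
    branches.join("\n"),
    "}",
  ].join("\n");
}
```

Because the source is generated per device, hardware with 8 texture units gets 8 branches, and hardware with 16 gets 16, without any dead code.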


Experiments for a Faster Future

What we have now is pretty good, but it can get a lot better. We experimented with classifying vertex attributes by how often they change, and only sending them to the GPU when necessary. What we found is that the additional bookkeeping required to "detect changes" is in fact a lot of wasted CPU effort. It's more efficient to just send everything regardless.

Our experiments were complicated by the nature of the melonJS rendering pipeline, which already attempts to optimize by only drawing objects that are known to be visible in the scene. As things in the scene move, leaving and entering the viewport, the position of renderables within the stream buffer changes. That explains the extra bookkeeping requirement.

At first glance, it appears a great deal easier to upload everything to the GPU regardless of its visibility, and only send changes as they are performed by the game. In other words, keeping the entity state within the GPU and synchronizing changes. Problems arise when entities get added and removed from the scene, especially particles; that puts you right back into a memory management role, with plenty of bookkeeping. This experiment led us to seek other ways to make the Compositor more efficient.

Another, simpler approach to working around bandwidth limitations is reducing the size of each vertex element. It's common to send a color as a vec4; R, G, B, and A components in the range 0..1. But that's considerably wasteful when a color needs to be included with every vertex. To reduce the size of the color information, these components can be packed into a single float (similar to the more common 32-bit RGBA unsigned integer that most game developers are familiar with). And the GPU can unpack the color into a vec4 within the vertex shader. Some precision information is lost in the packing/unpacking process, so that needs to be taken into consideration.
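One possible packing scheme, sketched under the assumption of 6 bits per channel (not necessarily what melonJS would use): 4 channels x 6 bits = 24 bits, and a 32-bit float represents every integer below 2^24 exactly, so the packed value survives a Float32Array intact. The quantization to 64 levels per channel is where the precision loss comes from. The function names are hypothetical; the unpacking mirrors what a vertex shader could do with `floor()` and `mod()`.

```javascript
// Quantize a 0..1 channel to 6 bits (0..63).
function quantize6(c) {
  return Math.round(Math.min(1, Math.max(0, c)) * 63);
}

// Pack RGBA (each 0..1) into a single number, 6 bits per channel.
// The result is an integer below 2^24, so it is exact as a float32.
function packColor(r, g, b, a) {
  return ((quantize6(r) * 64 + quantize6(g)) * 64 + quantize6(b)) * 64 + quantize6(a);
}

// Reverse the packing. In GLSL this would be mod(p, 64.0) and floor(p / 64.0).
function unpackColor(packed) {
  let p = packed;
  const a = p % 64; p = Math.floor(p / 64);
  const b = p % 64; p = Math.floor(p / 64);
  const g = p % 64; p = Math.floor(p / 64);
  const r = p;
  return [r / 63, g / 63, b / 63, a / 63];
}
```

This turns the vec4 color into one float per vertex, saving 3 of the 9 floats.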

The next step is to rework all of the drawing code outside of the Compositor, getting away from the "old way" of doing things like 2D Canvas. The Canvas API has a consistent global state for color and transformation matrix. The matrix, as I mentioned before, is a bit like a combined "ModelView matrix"; it handles the camera position and entity position in one. Replacing that with just a view matrix means we can remove a lot of heavy math operations from JavaScript, and move them to the GPU! A model matrix is not necessary for quads, because it would only apply to four vertices! (It would make sense for much larger meshes like Spine.) This work will take place in ticket #637. The API will change enough that WebGL will benefit in terms of better performance, but will still work with 2D Canvas.

Strict requirements for draw order have further complicated our WebGL architecture. We've disabled the depth buffer and draw everything with the same depth to simulate a pure 2D rendering environment. With the addition of extra metadata afforded by the work in #637, we will be able to make use of the depth buffer after all; every draw operation can be provided with z-index information. This in turn means the compositor can be transformed from a single stream buffer with many attributes to a series of buffers with fewer attributes and more use of uniforms (all the while being mindful of draw order for proper transparency/blending.)

WebGL 2.0 is an update that will be coming to UAs in the future, and it contains a lot of goodies that we can make use of: Array Textures, Vertex Array Objects, Multiple Render Targets, Instanced Rendering, etc. I'm sure we'll have some very interesting things to look forward to, and some good optimizations by using them.

Benchmarks

It's time for some pretty pictures and geeky numbers! The hardware I used for these completely unscientific benchmarks is my Late 2011 MacBook Pro; Radeon HD 6770M, 2.5GHz quad core i7 Sandy Bridge, 8GB DDR3 RAM. The UA is Chrome 40.0.2214.111 (64-bit).

Since we don't have usable numbers from the particle debug panel (at the time of writing) I used the small Stats.js library and hacked it into the melonJS RAF. Then I added a particle emitter to the platformer example and configured it to spew 2,500 particles. First, some screenshots of this configuration with the CanvasRenderer.

CanvasRenderer

melonJS v2.1.0 CanvasRenderer (FPS)
melonJS v2.1.0 CanvasRenderer (MS)

Ouch! Right away, the CanvasRenderer is under heavy stress, running at an average 50fps. The particle motion was very obviously not smooth during this test. The second screen shows the reason for the lower FPS: each me.game.draw() call takes between 16 and 20ms! It's right on the threshold for 60FPS. Combined with the particle updates, CPU time goes over 16.7ms on many frames.

Still, the results for the CanvasRenderer are very impressive. I expected it to perform a lot worse in this test. Props to the Chrome team for that!

WebGLRenderer

Next, the same exact code running with WebGL. Take note that the only difference is the URI! The code itself is unchanged. Also notice the bilinear filtering, especially around the spaceman's helmet; this is caused by Chrome lacking support for the CSS3 image-rendering: pixelated; property (I'm using Chrome 40; pixelated will be added in Chrome 41). melonJS is already using this CSS property.

melonJS v2.1.0 WebGLRenderer (FPS)
melonJS v2.1.0 WebGLRenderer (MS)

Much better! A solid 60FPS, with each frame draw taking about 11 to 13ms. The FPS screen shows "59", and even stranger, the range shows it hit "61" at some point. While I could just explain it away as a known issue in Chrome, I'm going to be up front and say that I am not affected by that bug on this machine. However, it does manifest on a MacBook Air (especially with an external monitor attached) that I use at work. So it's actually a thing. Instead I'm going to explain it away as an acceptable margin of error. ;)

One thing you can notice in the FPS graph (far left) is that it starts out pretty rocky, dropping as low as 37FPS! This is from a combination of JIT warmup and Compositor behavior as it resets and uploads new textures to the GPU when switching between the loading screen and play screen states. The particle emitter is also a bit of a beast that instantaneously creates 2,500 particles and sleeps until it can create more (as older particles die off). This causes a bit of a CPU spike until the particle distribution evens out.

In case you were wondering, the little red "GL" icon on the right side of the address bar is the WebGL Inspector Chrome extension. Very handy for debugging (disabled in these tests).

If we take it at face value, the 18ms → 11ms change marks a 39% improvement over CanvasRenderer. In other words, WebGLRenderer is 1.6x faster. That's not bad for a beta! Still more improvements to come.
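The two numbers describe the same change from different angles: one is the fraction of draw time eliminated, the other is the ratio of the old time to the new time. Using the 18ms and 11ms figures from the graphs above:

```javascript
// Frame draw times read off the benchmark graphs (ms).
const canvasMs = 18;
const webglMs = 11;

const timeSaved = (canvasMs - webglMs) / canvasMs; // fraction of draw time eliminated
const speedup = canvasMs / webglMs;                 // how many times faster

console.log(`${(timeSaved * 100).toFixed(0)}% less time, ${speedup.toFixed(1)}x faster`);
// prints "39% less time, 1.6x faster"
```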

All Done

While we don't get the raw GPU performance of a non-interactive benchmark, we still get really good results with fully interactive content and unlimited potential flexibility. We also afford the ability to replace the Compositor and shaders with completely custom code, just in case you want to create a non-interactive benchmark in melonJS. ;) Or more likely if you just want to pull off some crazy special effects that require additional attributes to be passed to your shaders. It would also be interesting to see other Compositors designed to be even faster than what we've built!

So far, WebGL support in melonJS is finally starting to shape up. It takes a backward-compatible approach to drawing the same way as the 2D Canvas API, which is limiting in terms of WebGL's strengths. However, it does lower the bar for melonJS users to get their games performing better with less work, and it also provides the kind of flexibility that a general purpose game engine cannot survive without.

In closing, there are a lot of tricky details to deal with when using WebGL. It's not all unicorns and rainbows. New algorithms need to be created to get the most out of the API. The biggest performance gains come from application-specific shaders, if you can manage. For a general purpose game engine, there's only so far you can stretch it. The parallelism really needs to be shared by application code (on the CPU) and shader code (on the GPU).

Saturday, February 14, 2015

melonJS should be *All About Speed* Part 5

Let's talk WebGL.

In the last *All About Speed* article, I resurrected an old draft and retooled it to fit the series. It asserted that a clever algorithm to skip logic updates on objects that don't need them would be a nice win for performance. While I still believe this is the case, we should first look at other bottlenecks that can be eliminated with a greater overall impact. Of course I'm talking about leveraging the GPU hardware to make drawing insanely fast.

The original WebGL article was split into two parts, as the material covered too much ground. In this first half, I discuss the dark internals of WebGL in general (not related to melonJS) and hopefully illuminate some points that are, at best, hidden deep in some specification that is difficult for a normal person to grok. In the second half, I will describe how WebGL is utilized in melonJS, explaining my reasoning behind certain design decisions along the way.

Before we jump into this boat, you can revisit previous articles to get a sense for where we've been:

Part 1 : http://blog.kodewerx.org/2014/03/melonjs-should-be-all-about-speed.html
Part 2 : http://blog.kodewerx.org/2014/04/melonjs-should-be-all-about-speed-part-2.html
Part 3 : http://blog.kodewerx.org/2014/10/melonjs-should-be-all-about-speed-part-3.html
Part 4 : http://blog.kodewerx.org/2014/12/melonjs-should-be-all-about-speed-part-4.html

And now we can look into where we are going. Prepare yourself for a wall of text.

All Aboard the WebGL Train

Most WebGL articles these days focus on how much faster you can draw stuff when using it. That's a dead horse, in my opinion, but I will still jump on the WebGL train. We all know that WebGL is magical and can make your games run faster just by flipping the switch! Right?

Well, not exactly. There are a number of tradeoffs to consider, and we'll get to that in the next article. To start, I want to describe some of the work I've been doing with WebGL, and explain why it's faster than plain ol' 2D canvas. In the next article, I will cover some of the historical steps taken in melonJS to get it ready for WebGL. Most importantly, I intend to dive into a pseudo-catastrophe that proves it's not always possible to make your game faster just by turning on WebGL. As always, I will write with a great level of detail, but try not to get caught up in material that is too dense to consume comfortably.

What I especially like about the blog format is that I can present information in the form of friendly English prose. Because WebGL is such a different beast from anything else in web browsers today, it really deserves some in-depth coverage. Most literature assumes working knowledge or experience with 3D rendering APIs like OpenGL. I'll take it from a "what the heck is going on?" perspective. That means you won't find any matrix maths, 3D projection, vector normals, geometry, or trigonometry here. Just a description of the ins and outs of the WebGL concept. In other words, this is a human-readable explanation of what WebGL does instead of how to draw things in 3D.

Before we get into the nitty gritty, let me express a quick disclaimer: I don't know what I'm talking about. I've been playing with WebGL for a total of three months, and have zero prior experience with any 3D API, theory, or the mathematics behind it. (Ok, so I followed some basic tutorials a few decades ago to rasterize a wireframe cube in software. That doesn't count.)

All About the GPU

This isn't a topic I have seen discussed much. All information I have about GPUs, so far, comes from learning to use WebGL. And the resources I've used for learning WebGL can all be found on The Googlie.  Some notable examples are the MSDN WebGL documentation (surprise! Microsoft has the best API reference available today), the Khronos wiki, multiple StackExchange questions, some really informative WebGL presentations from Google I/O (search YouTube for these), and a few WebGL-specific tutorials here and there. The problem with all of these resources (except for some of the presentations) is that they don't explain what's going on! You just get a heap of code and you're told, "this is how you do it".

For my own sake (and perhaps for you lucky readers) I want to describe how WebGL works with the GPU. Remember folks, I still don't know what I'm talking about. All I have is a vague idea of how things fit together. The only trouble now is to decide where to begin.

Maybe we'll start with the code used to initialize a WebGL context, and work from there. You've seen it before: call the getContext("webgl") method on a canvas, compile shaders, initialize the viewport. Getting a context is simple, and once you have acquired it, that canvas can no longer use the 2D canvas API.
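
A minimal sketch of that first step, assuming a browser canvas element. The helper name createGLContext is hypothetical, and the "experimental-webgl" fallback only matters for older browsers that exposed WebGL under the prefixed name:

```javascript
// Hypothetical helper: acquire a WebGL context and set the viewport once.
function createGLContext(canvas) {
  // Older browsers exposed WebGL under the prefixed "experimental-webgl" name.
  const gl = canvas.getContext("webgl") || canvas.getContext("experimental-webgl");
  if (!gl) {
    throw new Error("WebGL is not available on this browser/GPU");
  }
  // Map normalized device coordinates (-1..1) onto the whole canvas.
  gl.viewport(0, 0, canvas.width, canvas.height);
  return gl;
}
```

In a page this would be called as createGLContext(document.querySelector("canvas")). After that, calling getContext("2d") on the same canvas returns null.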

Next you compile the shaders. This was really mysterious at first, but it's simply a pair of programs (written in a weird C-like language called GLSL) that get compiled and uploaded to the GPU. Most tutorials and frameworks have you place your GLSL code inside a <script> element in the DOM. It's probably a convenient place to put it, but I would actually describe it as a victim of circumstance; <script> happens to be available, and (as long as its type attribute is not a JavaScript type) also happens to bypass the DOM layout engine and JavaScript interpreter. It's literally just used to store a string. You could just as easily put the GLSL into separate files and read them using XHR. For melonJS, I chose neither option. Instead, the GLSL is compiled right into the concatenated JavaScript source code as a string variable! Hopefully that clears up any confusion about why shaders are defined in <script> elements in many tutorials.
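
To illustrate the string-variable approach, here is a sketch of the simplest possible shader pair stored as plain JavaScript strings. The constant names and the aPosition/uColor identifiers are made up for this example; they are not melonJS's actual shaders:

```javascript
// GLSL source kept in ordinary string constants: no <script> element or XHR needed.
// Vertex shader: forwards each 2D position unchanged (already in clip space).
const FLAT_VERTEX_SHADER = [
  "attribute vec2 aPosition;",
  "void main() {",
  "  gl_Position = vec4(aPosition, 0.0, 1.0);",
  "}",
].join("\n");

// Fragment shader: paints every covered pixel with one uniform color.
const FLAT_FRAGMENT_SHADER = [
  "precision mediump float;",
  "uniform vec4 uColor;",
  "void main() {",
  "  gl_FragColor = uColor;",
  "}",
].join("\n");
```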

Speaking of shaders ... What are these things anyway? They are incredibly low-level compared to anything else seen in the HTML+CSS+JS web platform. These are programs that run directly on your GPU. JavaScript doesn't even run directly on your CPU (technically, it does go through a series of compilers to eventually run as optimized assembly on the CPU, but that's arguably a far cry from statically compiling a shader and running it on a GPU). And you'll notice there are two kinds of shaders, termed a vertex shader and a fragment shader.

Some tutorials describe what these shaders do without going into detail on the GLSL language and its peculiarities. Others just claim that you shouldn't worry about it, because most people will just use a library that includes its own shaders. Both cases are counter to educational utility, so I'll attempt to describe what shaders are actually doing behind the scenes; the parts that you don't necessarily think about when writing GLSL because the complexity is all abstracted away. The ultimate goal is of course to learn what shaders do and how to feed data to them so that pretty things show up on your screen.

The Vertex Shader

This shader is responsible for taking an array of attributes and performing computations on them to be passed to the fragment shader. The attributes are things like vertices (points representing a position in 2D or 3D space), vectors (values representing a direction and magnitude, rather than a position), colors, and basically everything that can be defined with a number. The attributes can be anything you want, keeping in mind that there is a limited number of attributes that a shader may use (at least 8; almost all implementations support 16, though). Typically, a vertex shader takes at least one vertex attribute as input, and outputs one vertex plus any additional information (like color, normal vectors, whatever you want) to the rasterizer.

The rasterizer is something you never see or interact with directly. (It is part of what's called the Fixed Functionality Pipeline in WebGL and OpenGL ES.) The information it gets from the vertex shader is interpolated for every pixel covered by the primitive, and is then passed to the fragment shader. The vertex shader is called three times for each triangle (two times for each line, one time for each point), receiving one vertex and its attributes on each call.
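
The interpolation step can be sketched in software. This is an illustration of the idea, not what a GPU literally runs: it computes barycentric weights for a point inside a triangle and blends the three per-vertex values, and it skips the perspective correction that real hardware applies. The function and field names are hypothetical:

```javascript
// Twice the signed area of triangle (a, b, p); its sign tells which side of a->b p is on.
function edge(ax, ay, bx, by, px, py) {
  return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}

// Blend a per-vertex value (an array of numbers, like a vec4 varying) at point (px, py).
// Each vertex is { x, y, value: [...] }. Linear interpolation only.
function interpolateVarying(a, b, c, px, py) {
  const area = edge(a.x, a.y, b.x, b.y, c.x, c.y);
  const wa = edge(b.x, b.y, c.x, c.y, px, py) / area; // weight of vertex a
  const wb = edge(c.x, c.y, a.x, a.y, px, py) / area; // weight of vertex b
  const wc = edge(a.x, a.y, b.x, b.y, px, py) / area; // weight of vertex c
  return a.value.map((v, i) => v * wa + b.value[i] * wb + c.value[i] * wc);
}
```

For a triangle with red, green, and blue corners, a point halfway along the red-green edge comes out as half red and half green, which is exactly the smooth gradient you see when a vertex color varying is rasterized.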

The Fragment Shader

The data from the rasterizer can be modulated or combined in the fragment shader to output a single color back to the rasterizer. This allows for texture sampling and lighting effects using vector normals, for example. Once in the fragment shader stage, the pixel position in the frame buffer has already been determined. The only thing the fragment shader can do is output a single color for that pixel, or discard it (nothing will be drawn to the frame buffer for this pixel).

The fragment shader is called once for every pixel within the area covered by the triangle that was output by the vertex shader. In general, this means you do position adjustments in the vertex shader, and color adjustments in the fragment shader.
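
Continuing the software analogy, the loop below sketches the "once per covered pixel" rule: it tests each pixel center in the triangle's bounding box and calls a fragment shader callback for the covered ones, treating a null return as discard. It is a simplified, hypothetical rasterizer (no top-left fill rule, no depth test, no blending):

```javascript
// Hypothetical software rasterizer for one triangle.
// framebuffer: flat array of width*height pixels; a, b, c: { x, y } in pixel units.
// Returns how many times the fragment shader ran.
function rasterizeTriangle(framebuffer, width, height, a, b, c, fragmentShader) {
  // Twice the signed area of (p, q, (x, y)).
  const edge = (p, q, x, y) => (q.x - p.x) * (y - p.y) - (q.y - p.y) * (x - p.x);
  const area = edge(a, b, c.x, c.y);
  if (area === 0) return 0; // degenerate triangle: covers no pixels

  // Only pixels inside the bounding box (clamped to the framebuffer) can be covered.
  const minX = Math.max(0, Math.floor(Math.min(a.x, b.x, c.x)));
  const maxX = Math.min(width - 1, Math.ceil(Math.max(a.x, b.x, c.x)));
  const minY = Math.max(0, Math.floor(Math.min(a.y, b.y, c.y)));
  const maxY = Math.min(height - 1, Math.ceil(Math.max(a.y, b.y, c.y)));

  let invocations = 0;
  for (let y = minY; y <= maxY; y++) {
    for (let x = minX; x <= maxX; x++) {
      const px = x + 0.5; // sample at the pixel center
      const py = y + 0.5;
      // Barycentric weights; dividing by area makes them non-negative inside for either winding.
      const wa = edge(b, c, px, py) / area;
      const wb = edge(c, a, px, py) / area;
      const wc = edge(a, b, px, py) / area;
      if (wa < 0 || wb < 0 || wc < 0) continue; // pixel center is outside the triangle
      invocations++;
      const color = fragmentShader(wa, wb, wc); // weights let the shader interpolate varyings
      if (color !== null) {
        framebuffer[y * width + x] = color; // null plays the role of GLSL's discard
      }
    }
  }
  return invocations;
}
```

Note that a discarded fragment still costs a shader invocation; it just leaves the frame buffer untouched.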

Not Just For Triangles

A triangle is just a simple polygon. The shaders are for programmable geometry rendering, after all. But triangles are just one type of primitive supported by WebGL. The GPU can also be instructed to draw lines or points. A line is exactly what it sounds like; a line segment defined by two vertices. A point is defined by a single vertex. The rasterizer runs slightly differently depending on what it's drawing. For example, when drawing a line, the vertex shader only runs twice for each primitive, and the fragment shader returns color information for the pixels interpolated along that line. For a point, you get one vertex and one pixel (or more depending on the point size configured by the vertex shader).
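
As a rough sketch of how vertices are grouped into primitives for drawArrays(mode, 0, count), here is a hypothetical helper. Mode names are plain strings so it runs without a WebGL context; real code would pass gl.POINTS, gl.LINES, and so on:

```javascript
// Hypothetical helper: number of primitives drawArrays(mode, 0, count) assembles.
function primitiveCount(mode, count) {
  switch (mode) {
    case "POINTS":
      return count; // one vertex per point
    case "LINES":
      return Math.floor(count / 2); // two vertices per line segment
    case "LINE_STRIP":
      return Math.max(0, count - 1); // each new vertex extends the strip by one segment
    case "TRIANGLES":
      return Math.floor(count / 3); // three vertices per triangle
    case "TRIANGLE_STRIP":
      return Math.max(0, count - 2); // each new vertex adds one triangle
    default:
      throw new Error("Unsupported mode: " + mode);
  }
}
```

With the separate modes (POINTS, LINES, TRIANGLES), the same six vertices become six points, three lines, or two triangles; the strip modes reuse vertices between neighboring primitives.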

Shaders Work Together

It should be apparent that shaders work in concert to create the pretty graphics you want. They have different jobs and different runtime characteristics, but both are required if you want anything to happen. The simplest shaders are ones that output exactly what is given as input. These would be used, for example, to create flat-shaded geometry that is computed on the CPU side: pass each vertex to the vertex shader in an attribute array, and pass the color in a uniform variable that the fragment shader reads.

Attribute vs Uniform vs Varying

You might also notice these keywords in shader GLSL examples. These are data type storage qualifiers, as defined by the OpenGL ES Shading Language specification. (That's a very informative document in its own right, but incomprehensibly dense in scope.) The meaning behind these qualifiers is quite simple, but perhaps a bit unintuitive:

  • Attributes are properties of a vertex; its position, its normal vector, its color, etc. Their data is uploaded with the bufferData and bufferSubData WebGL calls into a buffer bound to ARRAY_BUFFER, and vertexAttribPointer tells WebGL which buffer (and which layout within it) feeds each attribute. Attributes are sent as arrays, and processed by the vertex shader one vertex at a time.
  • Uniforms are variables set through the uniform* WebGL calls, and either shader (or both) in the program can read them. Uniforms are constant for the duration of a draw call (drawArrays or drawElements), so they cannot vary per vertex or per pixel; they can only change between draw calls. They are best used for things like simple ambient lighting, color blending, projection matrices, etc. Things that are static for each batch draw.
  • Varyings are only sent from the vertex shader to the fragment shader. The vertex shader sets the varyings for each vertex, and the rasterizer interpolates these varying values for each pixel, passing them to the fragment shader. This is how a fragment shader is able to get information such as the vertex normal for the current pixel to calculate per-pixel lighting, and the coordinates for texture mapping. Because these values are interpolated, they vary over the entire primitive being rasterized.
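
Here is a sketch of a shader pair that uses all three qualifiers, again stored as JavaScript strings. The names (aPosition, aColor, uOffset, vColor) are illustrative, not taken from any particular library:

```javascript
// Vertex shader: two attributes, one uniform, one varying.
const COLOR_VERTEX_SHADER = [
  "attribute vec2 aPosition;  // per vertex, read from an ARRAY_BUFFER",
  "attribute vec4 aColor;     // per vertex, read from an ARRAY_BUFFER",
  "uniform vec2 uOffset;      // same value for the whole draw call",
  "varying vec4 vColor;       // written per vertex, interpolated per fragment",
  "void main() {",
  "  vColor = aColor;",
  "  gl_Position = vec4(aPosition + uOffset, 0.0, 1.0);",
  "}",
].join("\n");

// Fragment shader: receives the interpolated varying.
const COLOR_FRAGMENT_SHADER = [
  "precision mediump float;",
  "varying vec4 vColor;       // must match the vertex shader's declaration",
  "void main() {",
  "  gl_FragColor = vColor;",
  "}",
].join("\n");
```

Giving each corner of a triangle a different aColor produces a smooth gradient across it, because the rasterizer interpolates vColor for every pixel.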

That's a description of WebGL shaders in a nutshell! I've provided information about all of the inputs and outputs for the shaders in the rendering pipeline, with the exception of one minor detail: the shaders can also exchange information with the fixed functionality pipeline through a set of hardcoded, built-in variables. This is an unfortunate design decision on the part of Khronos Group, which oversees the specification. But it's what we have to live with. Effectively, they look like undefined global variables in a C-like language (and maybe they are exactly that, at the end of the day). It's ugly, but I guess it works! The outputs that will be used most often are:

  • gl_Position : The vertex shader output for a vec4 (4-element vector) describing the vertex position.
  • gl_FragColor : The fragment shader output for a vec4 describing the fragment color.

There are a few others available as well:

  • gl_PointSize : The vertex shader output that describes the size of a point in pixels.
  • gl_FragData : The fragment shader output for providing an array of vec4 data for the fragment, one entry per color buffer. A fragment shader may use gl_FragData or gl_FragColor, but not both. This variable is used with multiple render targets (MRT) in place of gl_FragColor, e.g. through the WEBGL_draw_buffers extension.

And there are a few inputs, all read-only and available to the fragment shader:

  • gl_FragCoord : A read-only variable denoting the window-relative vec4 position of the fragment. Useful for full-screen effects like plasma and other color distortions.
  • gl_FrontFacing : Another read-only fragment shader variable (boolean) that tells whether the fragment belongs to a primitive facing toward the viewer (true) or away (false). This is useful for two-sided lighting; actual back-face culling happens earlier, in the fixed functionality pipeline (enabled with CULL_FACE).
  • gl_PointCoord : The last fragment shader read-only variable (vec2) that provides the fragment's coordinate within a point, ranging from 0.0 to 1.0 across the point. Used with gl_PointSize to do things like radial gradients.
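
To show gl_PointSize and gl_PointCoord working together, here is a hypothetical point-sprite shader pair that draws each point as a soft circle. The uniform name uPointSize is an assumption, and the largest point size a GPU supports varies (it can be queried with gl.getParameter(gl.ALIASED_POINT_SIZE_RANGE)):

```javascript
// Vertex shader: positions the point and sets its size in pixels.
const POINT_VERTEX_SHADER = [
  "attribute vec2 aPosition;",
  "uniform float uPointSize;",
  "void main() {",
  "  gl_Position = vec4(aPosition, 0.0, 1.0);",
  "  gl_PointSize = uPointSize;",
  "}",
].join("\n");

// Fragment shader: radial gradient using gl_PointCoord (0..1 across the point).
const POINT_FRAGMENT_SHADER = [
  "precision mediump float;",
  "void main() {",
  "  float d = distance(gl_PointCoord, vec2(0.5));",
  "  if (d > 0.5) discard;  // trim the square point to a circle",
  "  gl_FragColor = vec4(vec3(1.0 - 2.0 * d), 1.0);  // bright center, dark rim",
  "}",
].join("\n");
```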

Finally, there are also a number of constants available to both shaders, but they aren't necessary for this discussion. And that is everything you need to know about shaders!

Compiling Shaders

After your shaders are written with fancy maths, you need to compile them. Your browser contains a static compiler that takes your GLSL source code (as a text string) and compiles it into something that can run on your GPU. There's a lot of boilerplate around this, including error handling. Most of that you can safely ignore for the discussion. As long as you include it in your WebGL bootstrap code, it's a write-once-and-forget-it kind of thing.

The steps are:
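  1. Create each shader and set its source (shaderSource)
  2. Compile the shaders (compileShader)
  3. Attach the compiled shaders to a shader program (attachShader)
  4. Link the program (linkProgram)
  5. Use the program (useProgram)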
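
The five steps map onto WebGL calls roughly like this. It is a sketch with the usual error handling (checking COMPILE_STATUS and LINK_STATUS and reading the info logs); compileShaderSource and buildProgram are hypothetical wrapper names, not WebGL functions:

```javascript
// Hypothetical wrapper: steps 1 and 2 for a single shader.
function compileShaderSource(gl, type, source) {
  const shader = gl.createShader(type);
  gl.shaderSource(shader, source); // 1. hand the GLSL text to the browser
  gl.compileShader(shader); // 2. compile it for this GPU
  if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
    const log = gl.getShaderInfoLog(shader); // compiler errors, with line numbers
    gl.deleteShader(shader);
    throw new Error("Shader compile failed: " + log);
  }
  return shader;
}

// Hypothetical wrapper: steps 3 to 5, run once at startup.
function buildProgram(gl, vertexSource, fragmentSource) {
  const program = gl.createProgram();
  // 3. attach both compiled shaders
  gl.attachShader(program, compileShaderSource(gl, gl.VERTEX_SHADER, vertexSource));
  gl.attachShader(program, compileShaderSource(gl, gl.FRAGMENT_SHADER, fragmentSource));
  // 4. link; among other things, this checks that the varyings of both shaders match
  gl.linkProgram(program);
  if (!gl.getProgramParameter(program, gl.LINK_STATUS)) {
    throw new Error("Program link failed: " + gl.getProgramInfoLog(program));
  }
  gl.useProgram(program); // 5. make it the current program
  return program;
}
```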

The specifics aren't very interesting, but there you have it! Typically you will use only one shader program, but it's possible to use multiple shader programs to render complex scenes using different shaders. For example, you might use a special vertex shader to simulate grass and leaves blowing in the wind, but you don't want to distort all of the geometry in your scene with this shader. Another example involves using a secondary (or even tertiary) shader program for post-processing; adding a grain filter (noise), bloom, reflections, deferred lighting, etc.

Binding GPU I/O

After the shader program is set up, you will create bindings for each of the attribute and uniform variable inputs for each shader. There are several methods available for this step: getAttribLocation, enableVertexAttribArray, vertexAttribPointer, getUniformLocation, ... You want to make these calls once, just like you only compile the shaders once, and call useProgram once (if you only have a single shader program). Binding these variables and getting their locations multiple times is a waste of CPU effort.

With your variables bound, you can begin populating them by creating even more bindings and finally calling bufferData, bufferSubData, or the uniform* methods.
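
A sketch of the "look things up once, upload every frame" split, assuming a program whose shaders declare an aPosition attribute and a uColor uniform (both hypothetical names) and an arbitrary cap of 1024 vertices:

```javascript
const MAX_VERTICES = 1024; // arbitrary capacity for this sketch

// Run once after the program is linked and in use: locations and buffers never change.
function initBindings(gl, program) {
  const positionLoc = gl.getAttribLocation(program, "aPosition");
  const positionBuffer = gl.createBuffer();
  gl.bindBuffer(gl.ARRAY_BUFFER, positionBuffer);
  // Allocate GPU storage once; DYNAMIC_DRAW hints that the contents change often.
  gl.bufferData(gl.ARRAY_BUFFER, new Float32Array(MAX_VERTICES * 2), gl.DYNAMIC_DRAW);
  gl.enableVertexAttribArray(positionLoc);
  // Two floats per vertex, tightly packed (stride 0, offset 0).
  gl.vertexAttribPointer(positionLoc, 2, gl.FLOAT, false, 0, 0);
  return { positionBuffer, colorLoc: gl.getUniformLocation(program, "uColor") };
}

// Run every frame: only the changing data is sent.
function drawFrame(gl, bindings, positions, color) {
  // Rebinding is only needed if another buffer was bound to ARRAY_BUFFER in the meantime.
  gl.bindBuffer(gl.ARRAY_BUFFER, bindings.positionBuffer);
  gl.bufferSubData(gl.ARRAY_BUFFER, 0, positions);
  gl.uniform4fv(bindings.colorLoc, color); // could be skipped when the color is unchanged
  gl.drawArrays(gl.TRIANGLES, 0, positions.length / 2);
}
```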

There is only one important bind target (aka pointer) to use for uploading attributes: ARRAY_BUFFER. The ARRAY_BUFFER target selects which buffer you are uploading to, and vertexAttribPointer connects the currently bound buffer to an attribute. So for multiple attribute arrays, you will be rebinding ARRAY_BUFFER often. The only other bind target for the bufferData and bufferSubData calls is ELEMENT_ARRAY_BUFFER, which holds a special array that provides a list of array indices for lookups within the attribute buffers. In other words, it allows you to send repetitive information to the vertex shader in a "compressed" format.
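
To make the "compressed" idea concrete, here is a hypothetical quad described by four unique vertices plus six indices. The expandIndexed helper only mimics, on the CPU, which vertex each index selects; with real WebGL you would upload quadIndices to a buffer bound to ELEMENT_ARRAY_BUFFER and call gl.drawElements(gl.TRIANGLES, 6, gl.UNSIGNED_SHORT, 0):

```javascript
// Four unique corners of a quad, two floats (x, y) each.
const quadVertices = new Float32Array([
  -1, -1, // 0: bottom-left
   1, -1, // 1: bottom-right
   1,  1, // 2: top-right
  -1,  1, // 3: top-left
]);

// Two triangles that share the 0-2 diagonal.
const quadIndices = new Uint16Array([0, 1, 2, 0, 2, 3]);

// CPU illustration: the vertex each index selects, in the order the vertex shader sees them.
function expandIndexed(vertices, indices, componentsPerVertex) {
  const out = [];
  for (const index of indices) {
    const start = index * componentsPerVertex;
    out.push(Array.from(vertices.subarray(start, start + componentsPerVertex)));
  }
  return out;
}
```

For a single quad the saving is small (four vertices stored instead of six), but in a mesh where each vertex is shared by several triangles it adds up quickly.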

The reason for all these bindings is simply that they are lightweight; a bind just adjusts a pointer to point at something else that the GPU already knows about. This can be used, for instance, to upload a ton of textures at the start of a level, and then arbitrarily bind the [limited] texture units to any of those textures while rendering the scene.

Texture Units

A texture unit is a register that points to a texture (aka image) in video memory, and defines how the texture should be used (sampled) by the fragment shader. WebGL specifies 2D texture samplers and cube map samplers.

The number of texture units available on the GPU is very limited. Every WebGL implementation provides at least 8 texture units, and the exact number depends on the hardware, with some GPUs supporting up to 192 (as of writing). The number of texture units sets an upper bound on how many textures you can use in a single batch operation. The GPU on my laptop has 16 texture units, meaning I can render a scene that requires up to 16 textures with a single draw call. To support more textures per batch operation, I would have to merge source images into bigger textures (texture atlases) for each texture unit.
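
One way to respect that limit is sketched below, assuming each renderable names a single texture: walk the scene in draw order and start a new batch (a new draw call) whenever one more distinct texture would exceed the available units. The function is hypothetical; real code would read the limit with gl.getParameter(gl.MAX_TEXTURE_IMAGE_UNITS):

```javascript
// Hypothetical batcher: preserves draw order, assigns each texture a unit within its batch.
function batchByTextureUnits(renderables, maxUnits) {
  const batches = [];
  let current = null;
  for (const r of renderables) {
    const needsNewUnit = !current || !current.units.has(r.texture);
    if (!current || (needsNewUnit && current.units.size === maxUnits)) {
      // Every unit is taken by another texture: flush and start a new draw call.
      current = { units: new Map(), items: [] };
      batches.push(current);
    }
    if (!current.units.has(r.texture)) {
      current.units.set(r.texture, current.units.size); // texture -> texture unit index
    }
    current.items.push({ ...r, unit: current.units.get(r.texture) });
  }
  return batches;
}
```

Each batch would then bind its textures with gl.activeTexture(gl.TEXTURE0 + unit) and gl.bindTexture before issuing a single draw call.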

It is also possible to sort the scene by common features like texture, in order to fit all renderables into a single batch that uses a specific texture (or set of textures). You will run into problems with layering (actual draw order vs expected draw order) when using this method. The typical solution is using the depth buffer to occlude fragments which have already been drawn. To support different blending modes (especially translucency) in a batch, the sorting also needs to place fully opaque textures first, followed by translucent ones ordered back to front, because blending a translucent fragment requires whatever lies behind it to be drawn already.
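
A sketch of that ordering, assuming each renderable carries a texture name, a translucent flag, and a depth where larger means farther from the camera (all hypothetical fields):

```javascript
// Hypothetical sort for batching with a depth buffer.
function sortForBatching(renderables) {
  // Opaque first, grouped by texture: the depth test resolves their overlap.
  const opaque = renderables.filter((r) => !r.translucent);
  opaque.sort((a, b) => (a.texture < b.texture ? -1 : a.texture > b.texture ? 1 : 0));
  // Translucent last, farthest first, so blending sees what is behind each one.
  const translucent = renderables.filter((r) => r.translucent);
  translucent.sort((a, b) => b.depth - a.depth);
  return opaque.concat(translucent);
}
```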

The depth buffer is another part of the fixed functionality pipeline. Another fixed function buffer is the stencil buffer. There are also color buffers, which is what the rasterizer writes fragment colors to (the image that eventually is displayed on the screen is a color buffer). I won't cover these buffers here; just keep in mind that they exist, and they help you do things that would otherwise be very tricky!

Creating Textures

Textures are pretty simple; just an image sent to the GPU and internally organized for performance. Textures whose width and height are both powers of two (a requirement for mipmaps in WebGL 1) can be reorganized as a mipmap, which is a fancy way to say that the texture scaling (downsampling) is precomputed. This makes sampling the texture faster when rendering it at a smaller size, and provides higher quality rasterization.
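
A sketch of the power-of-two check and the resulting mip chain. isPowerOfTwo, mipChain, and configureTexture are hypothetical helpers; configureTexture assumes the texture is already bound to TEXTURE_2D with its image uploaded, and applies the WebGL 1 rule that non-power-of-two textures need CLAMP_TO_EDGE wrapping and a non-mipmap filter:

```javascript
// True when n is 1, 2, 4, 8, ... (exactly one bit set).
function isPowerOfTwo(n) {
  return n > 0 && (n & (n - 1)) === 0;
}

// Sizes of every mip level: halve each dimension (never below 1) down to 1x1.
function mipChain(width, height) {
  const levels = [[width, height]];
  while (width > 1 || height > 1) {
    width = Math.max(1, width >> 1);
    height = Math.max(1, height >> 1);
    levels.push([width, height]);
  }
  return levels;
}

// Assumes the texture is bound to TEXTURE_2D and its image was uploaded with texImage2D.
function configureTexture(gl, width, height) {
  if (isPowerOfTwo(width) && isPowerOfTwo(height)) {
    gl.generateMipmap(gl.TEXTURE_2D); // precompute the downsampled levels
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR_MIPMAP_LINEAR);
  } else {
    // WebGL 1 restrictions for non-power-of-two textures.
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
  }
}
```

A 256x64 texture gets nine levels (256x64, 128x32, ... down to 1x1), which is why the full mip chain costs only about a third more memory than the base image.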

Like attributes and uniforms, textures should be uploaded and configured once, and then bound dynamically to texture units as needed. For best results, use all of the texture units available for each batch operation.

Drawing

Finally, we can draw something! This is the part where you fill your dynamic attribute array buffers, send them to the GPU, and request it to draw. You may also set uniform variables here, but if the uniforms aren't changing, don't bother sending them more than once. WebGL is a state machine; it remembers the state until you change it. I see a lot of tutorials fail to describe this behavior, and instead show code that is always doing a bunch of useless calls, wasting your poor CPU and battery time. Luckily, tools like WebGL Inspector can highlight duplicate calls so you can treat WebGL like the state machine that it is!
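
As an illustration of treating WebGL as a state machine on the JavaScript side, here is a hypothetical GLStateCache that skips calls which would not change anything. It only works if every relevant call goes through it, and it would need to be reset after a lost context:

```javascript
// Hypothetical wrapper that remembers the current program and the texture on each unit.
class GLStateCache {
  constructor(gl) {
    this.gl = gl;
    this.program = null;
    this.activeUnit = -1;
    this.textures = []; // texture currently bound to TEXTURE_2D on each unit
  }

  useProgram(program) {
    if (program === this.program) return; // already current: skip the call
    this.gl.useProgram(program);
    this.program = program;
  }

  bindTexture(unit, texture) {
    if (this.textures[unit] === texture) return; // already bound: skip both calls
    if (this.activeUnit !== unit) {
      this.gl.activeTexture(this.gl.TEXTURE0 + unit);
      this.activeUnit = unit;
    }
    this.gl.bindTexture(this.gl.TEXTURE_2D, texture);
    this.textures[unit] = texture;
  }
}
```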

The simplest thing to draw is a single point, which consists of one vertex: an attribute array with a single element. The attribute is a vec2, and when used with the simplest possible shaders, you will get a colored dot on the screen. To move the dot, send a different value for the vertex attribute and redraw. A slightly more complex thing to draw is a triangle, which will need a three-element attribute array, each element describing the position of one of the three vertices. Now you get a solid-colored triangle.

(Aside: This is how most tutorials start; notice how much description went into this writing before we got here? There are a lot of moving parts that need to be covered before one can just jump into rendering a simple triangle! Perhaps after all that WebGL theory, you can truly appreciate how much work it takes just to draw a dumb triangle.)

To draw a textured triangle, the fragment shader needs to be updated to use a texture sampler, and its texture unit bound to a texture. The fragment shader also needs texture coordinates to map the texture onto the triangle. The simplest texture coordinates just use the same triangle vertices, but this is ultra rare in practice; usually the texture coordinates are static, and the vertices are dynamic, allowing a triangular piece of the texture to be drawn at any location on screen. Occasionally you might use dynamic texture coordinates as well, for things like flowing water (though the result is unrealistic).

Then you draw two triangles back-to-back and you have a rectangle (aka quad)! And you can send a complex triangle mesh giving you a fully textured 3D model, or even a complete scene.
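
A sketch of building such a rectangle from two triangles, assuming a sprite sheet (texture atlas) where each frame is a pixel rectangle inside the image; buildSpriteQuad and its parameters are hypothetical. Positions change when the sprite moves, while the texture coordinates for a given frame stay the same:

```javascript
// Hypothetical helper: 6 interleaved vertices (posX, posY, u, v) for a sprite at (x, y)
// of size (w, h), showing the atlas sub-rectangle `frame` ({ x, y, w, h } in pixels).
function buildSpriteQuad(x, y, w, h, frame, atlasWidth, atlasHeight) {
  // Texture coordinates are normalized to 0..1 by the atlas size.
  const u0 = frame.x / atlasWidth;
  const v0 = frame.y / atlasHeight;
  const u1 = (frame.x + frame.w) / atlasWidth;
  const v1 = (frame.y + frame.h) / atlasHeight;
  return new Float32Array([
    // first triangle
    x, y, u0, v0,
    x + w, y, u1, v0,
    x, y + h, u0, v1,
    // second triangle (same winding)
    x, y + h, u0, v1,
    x + w, y, u1, v0,
    x + w, y + h, u1, v1,
  ]);
}
```

Because each vertex interleaves four floats (16 bytes), the attribute pointers for this layout would be gl.vertexAttribPointer(positionLoc, 2, gl.FLOAT, false, 16, 0) and gl.vertexAttribPointer(texCoordLoc, 2, gl.FLOAT, false, 16, 8), where positionLoc and texCoordLoc are whatever getAttribLocation returned for your shader.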

Now You Know

That's all there is to know about the GPU! At least, as far as I have come with WebGL. ;) There is a wealth of information available about everything from matrix transformations to lighting, and cube mapping to render-to-texture... The list is endless. But that should be enough information to get other newbs started with this incredible technology. It also shows the depth and breadth of knowledge required to get good results out of it, and highlights the expansive differences compared to the 2D canvas API.

The biggest topic I have not covered yet is handling lost context events. This is just the reality that you must be prepared to face when writing WebGL code. I won't go into details here, because I haven't actually done this work in melonJS yet. Just know that it's a problem, and you will have to handle it. Here's a decent resource that will point you in the right direction with the what, why, and how of handling lost context: https://www.khronos.org/webgl/wiki/HandlingContextLost
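
A minimal sketch of the pattern described on that wiki page: calling preventDefault() on the webglcontextlost event signals that the page intends to recover, and every shader, buffer, and texture must be recreated once webglcontextrestored fires. The stop and reinitialize callbacks are hypothetical placeholders for your own render-loop and setup code:

```javascript
// Hypothetical helper: wire up lost/restored context handling on a canvas.
function handleContextLoss(canvas, { stop, reinitialize }) {
  canvas.addEventListener("webglcontextlost", (event) => {
    event.preventDefault(); // without this, the context is never restored
    stop(); // e.g. cancelAnimationFrame; every GL object is now invalid
  }, false);

  canvas.addEventListener("webglcontextrestored", () => {
    reinitialize(); // recompile shaders, re-create buffers, re-upload textures
  }, false);
}
```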

Give your brain a few days to let all this stuff sink in! It's not at all intuitive to programmers that are unfamiliar with OpenGL ES. Armed with this knowledge, you will have a much better experience using any WebGL framework. At least, better compared to blindly evaluating something like Three.js without any understanding of the actual work that it is doing. You should also have a good idea of how to start writing custom shaders to do things that the frameworks can't do out-of-the-box. All you need now is a GLSL Cheat Sheet (the good stuff starts on page 3).

Oh, you're all rested up now? Then get out there and make us something beautiful!