Saturday, February 14, 2015

melonJS should be *All About Speed* Part 5

Let's talk WebGL.

In the last *All About Speed* article, I resurrected an old draft and retooled it to fit the series. It asserted that a clever algorithm to skip logic updates on objects that won't need it would be a nice win for performance. While I still believe this is the case, we should first look at other bottlenecks that can be eliminated with a greater overall impact. Of course I'm talking about leveraging the GPU hardware to make drawing insanely fast.

The original WebGL article was split into two parts, as the material covered too much ground. In this first half, I discuss the dark internals of WebGL in general (not related to melonJS) and hopefully illuminate some points that are, at best, hidden deep in some specification that is difficult for a normal person to grok. In the second half, I will describe how WebGL is utilized in melonJS, explaining my reasoning behind certain design decisions along the way.

Before we jump into this boat, you can revisit previous articles to get a sense for where we've been:

Part 1 : http://blog.kodewerx.org/2014/03/melonjs-should-be-all-about-speed.html
Part 2 : http://blog.kodewerx.org/2014/04/melonjs-should-be-all-about-speed-part-2.html
Part 3 : http://blog.kodewerx.org/2014/10/melonjs-should-be-all-about-speed-part-3.html
Part 4 : http://blog.kodewerx.org/2014/12/melonjs-should-be-all-about-speed-part-4.html

And now we can look into where we are going. Prepare yourself for a wall of text.

All Aboard the WebGL Train

The primary importance given to any WebGL article these days is on how much faster you can draw stuff when using it. That's a dead horse, in my opinion, but I will still jump on the WebGL train. We all know that WebGL is magical and can make your games run faster just by flipping the switch! Right?

Well, not exactly. There are a number of tradeoffs to consider, and we'll get to that in the next article. To start, I want to describe some of the work I've been doing with WebGL, and explain why it's faster than plain ol' 2D canvas. In the next article, I will cover some of the historical steps taken in melonJS to get it ready for WebGL. Most importantly, I intend to dive into a pseudo-catastrophe that proves it's not always possible to make your game faster just by turning on WebGL. As always, I will write with a great level of detail, but try not to get caught up in material that is too dense to consume comfortably.

What I especially like about the blog format is that I can present information in the form of friendly English prose. Because WebGL is such a different beast from anything else in web browsers today, it really deserves some in-depth coverage. Most literature assumes working knowledge or experience with 3D rendering APIs like OpenGL. I'll take it from a "what the heck is going on?" perspective. That means you won't find any matrix maths, 3D projection, vector normals, geometry, or trigonometry here. Just a description of the ins and outs of the WebGL concept. In other words, this is a human-readable explanation of what WebGL does instead of how to draw things in 3D.

Before we get into the nitty gritty, let me express a quick disclaimer: I don't know what I'm talking about. I've been playing with WebGL for a total of three months, and have zero prior experience with any 3D API, theory, or the mathematics behind it. (Ok, so I followed some basic tutorials a few decades ago to rasterize a wireframe cube in software. That doesn't count.)

All About the GPU

This isn't a topic I have seen discussed much. All information I have about GPUs, so far, comes from learning to use WebGL. And the resources I've used for learning WebGL can all be found on The Googlie.  Some notable examples are the MSDN WebGL documentation (surprise! Microsoft has the best API reference available today), the Khronos wiki, multiple StackExchange questions, some really informative WebGL presentations from Google I/O (search YouTube for these), and a few WebGL-specific tutorials here and there. The problem with all of these resources (except for some of the presentations) is that they don't explain what's going on! You just get a heap of code and you're told, "this is how you do it".

For my own sake (and perhaps for you lucky readers) I want to describe how WebGL works with the GPU. Remember folks, I still don't know what I'm talking about. All I have is a vague idea of how things fit together. The only trouble now is to decide where to begin.

Maybe we'll start with the code used to initialize a WebGL context, and work from there. You've seen it before: call the getContext("webgl") method on a canvas, compile shaders, initialize the viewport. Getting a context is simple, and once you have acquired it, that canvas can no longer use the 2D canvas API.
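Here is a minimal sketch of that first step, assuming a browser that exposes either the "webgl" or the older prefixed "experimental-webgl" context name. The function name initWebGL is made up for illustration; it is not part of melonJS.

```javascript
// Hypothetical helper: acquire a WebGL context and size the viewport.
// `canvas` is expected to be an HTMLCanvasElement.
function initWebGL(canvas) {
    // Older browsers only exposed the prefixed "experimental-webgl" name.
    var gl = canvas.getContext("webgl") ||
             canvas.getContext("experimental-webgl");
    if (!gl) {
        throw new Error("WebGL is not supported");
    }

    // Cover the entire drawing buffer with the viewport.
    gl.viewport(0, 0, gl.drawingBufferWidth, gl.drawingBufferHeight);

    // Choose opaque black as the clear color, then clear the color buffer.
    gl.clearColor(0.0, 0.0, 0.0, 1.0);
    gl.clear(gl.COLOR_BUFFER_BIT);

    return gl;
}
```

In a page you would call it as initWebGL(document.querySelector("canvas")). Shader compilation comes after this, once the context exists.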

Next you compile the shaders. This was really mysterious at first, but it's simply a pair of programs (written in a weird C-like language called GLSL) that get compiled and uploaded to the GPU. Most tutorials and frameworks have you place your GLSL code inside a <script> element in the DOM. It's probably a convenient place to put it, but I would actually describe it as a victim of circumstance; <script> happens to be available, and (when given a type attribute the browser doesn't recognize) also happens to bypass the DOM layout engine and JavaScript interpreter. It's literally just used to store a string. You could just as easily put the shaders into their own files and read them using XHR. For melonJS, I chose neither option. Instead, the GLSL is compiled right into the concatenated JavaScript source code as a string variable! Hopefully that clears up any confusion about why shaders are defined in <script> elements in many tutorials.

Speaking of shaders ... What are these things anyway? They are incredibly low-level compared to anything else seen in the HTML+CSS+JS web platform. These are programs that run directly on your GPU. JavaScript doesn't even run directly on your CPU (technically, it does go through a series of compilers to eventually run as optimized assembly on the CPU, but that's arguably a far cry from statically compiling a shader and running it on a GPU). And you'll notice there are two kinds of shaders, termed a vertex shader and a fragment shader.

Some tutorials describe what these shaders do without going into detail on the GLSL language and its peculiarities. Others just claim that you shouldn't worry about it, because most people will just use a library that includes its own shaders. Both cases are counter to educational utility, so I'll attempt to describe what shaders are actually doing behind the scenes; the parts that you don't necessarily think about when writing GLSL because the complexity is all abstracted away. The ultimate goal is of course to learn what shaders do and how to feed data to them so that pretty things show up on your screen.

The Vertex Shader

This shader is responsible for taking an array of attributes and performing computations on them to be passed to the fragment shader. The attributes are things like vertices (points representing a position in 2D or 3D space), vectors (points representing a direction and magnitude, rather than a position), colors, and basically everything that can be defined with a number. The attributes can be anything you want, keeping in mind that there are a limited number of attributes that a shader may use (at least 8; almost all implementations support 16 though). Typically, a vertex shader takes at least one vertex attribute as input, and outputs one vertex plus any additional information (like color, normal vectors, whatever you want) to the rasterizer.

The rasterizer is something you never see or interact with. (It is part of what's called the Fixed Functionality Pipeline in WebGL and OpenGL ES.) The information it gets from the vertex shader is interpolated for every pixel that covers the primitive's area and is then passed to the fragment shader. The vertex shader is called three times for each triangle (two times for each line, one time for each point), receiving one vertex and its attributes on each call.

The Fragment Shader

The data from the rasterizer can be modulated or combined in the fragment shader to output a single color back to the rasterizer. This allows for texture sampling and lighting effects using vector normals, for example. Once in the fragment shader stage, the pixel position in the frame buffer has already been determined. The only thing the fragment shader can do is output a single color for that pixel, or discard it (nothing will be drawn to the frame buffer for this pixel).

The fragment shader is called once for every pixel within the area covered by the triangle that was output by the vertex shader. In general, this means you do position adjustments in the vertex shader, and color adjustments in the fragment shader.

Not Just For Triangles

A triangle is just a simple polygon. The shaders are for programmable geometry rendering, after all. But triangles are just one type of primitive supported by WebGL. The GPU can also be instructed to draw lines or points. A line is exactly what it sounds like; a line segment defined by two vertices. A point is defined by a single vertex. The rasterizer runs slightly differently depending on what it's drawing. For example, when drawing a line, the vertex shader only runs twice for each primitive, and the fragment shader returns color information for the pixels interpolated along that line. For a point, you get one vertex and one pixel (or more depending on the point size configured by the vertex shader).

Shaders Work Together

It should be apparent that shaders work in concert to create the pretty graphics you want. They have different jobs and different runtime characteristics, but they are both required if you want anything to happen. The simplest shaders are ones that output exactly what is given as input. This would be used, for example, to create flat-shaded geometry that is computed on the CPU-side: pass each vertex to the vertex shader in an attribute array, and pass the color to the fragment shader in a uniform variable.
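To make "output exactly what is given as input" concrete, here is a rough sketch of such a shader pair, stored as plain JavaScript strings. The names aVertex and uColor (and the two JavaScript variable names) are my own picks for illustration, and this is not the shader code melonJS ships.

```javascript
// Pass-through vertex shader: the position arrives as a vec2 attribute
// and is written to gl_Position unchanged (padded out to a vec4).
var FLAT_VERTEX_SHADER = [
    "attribute vec2 aVertex;",
    "void main(void) {",
    "    // Positions were already computed on the CPU.",
    "    gl_Position = vec4(aVertex, 0.0, 1.0);",
    "}"
].join("\n");

// Flat-color fragment shader: every pixel of the primitive gets the
// same color, supplied once per draw call through a uniform.
var FLAT_FRAGMENT_SHADER = [
    "precision mediump float;",
    "uniform vec4 uColor;",
    "void main(void) {",
    "    gl_FragColor = uColor;",
    "}"
].join("\n");
```

The precision line is required in fragment shaders because GLSL ES defines no default float precision there.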

Attribute vs Uniform vs Varying

You might also notice these keywords in shader GLSL examples. These are data type storage qualifiers, as defined by the OpenGL ES Shading Language specification. (That's a very informative document in its own right, but incomprehensibly dense in scope.) The meaning behind these qualifiers is quite simple, but perhaps a bit unintuitive:

  • Attributes are properties of a vertex; its position, its normal vector, its color, etc. These are sent to the vertex shader via the bufferData and bufferSubData WebGL calls... assuming you've bound the attribute's buffer to ARRAY_BUFFER. Attributes are sent as arrays, and processed by the vertex shader one vertex at a time.
  • Uniforms are variables sent to both shaders simultaneously through the uniform* WebGL calls. The uniforms are constant for the duration of a draw (drawArrays or drawElements) so they cannot be used for values that change per vertex or per pixel. They are best used for things like simple ambient lighting, color blending, projection matrices, etc. Things that are static for each batch draw.
  • Varyings are only sent from the vertex shader to the fragment shader. The vertex shader sets the varyings for each vertex, and the rasterizer interpolates these varying values for each pixel, passing them to the fragment shader. This is how a fragment shader is able to get information such as the vertex normal for the current pixel to calculate per-pixel lighting, and the coordinates for texture mapping. Because these values are interpolated, they vary over the entire primitive being rasterized.
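The three qualifiers can be seen side by side in a small sketch like the one below. All of the names (aVertex, aColor, uAlpha, vColor) are hypothetical, chosen only to show one attribute, one uniform, and one varying at work.

```javascript
// Vertex shader: reads per-vertex attributes, hands the color to the
// rasterizer through a varying.
var VARYING_VERTEX_SHADER = [
    "attribute vec2 aVertex;   // per-vertex position",
    "attribute vec4 aColor;    // per-vertex color",
    "varying vec4 vColor;      // interpolated by the rasterizer",
    "void main(void) {",
    "    vColor = aColor;",
    "    gl_Position = vec4(aVertex, 0.0, 1.0);",
    "}"
].join("\n");

// Fragment shader: receives the interpolated color for this pixel and
// scales its alpha by a uniform that is constant for the whole draw.
var VARYING_FRAGMENT_SHADER = [
    "precision mediump float;",
    "uniform float uAlpha;     // same value for every pixel in the draw",
    "varying vec4 vColor;      // different value for every pixel",
    "void main(void) {",
    "    gl_FragColor = vec4(vColor.rgb, vColor.a * uAlpha);",
    "}"
].join("\n");
```

Give each corner of a triangle a different aColor and the varying produces a smooth gradient across it, without the fragment shader doing any interpolation itself.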

That's a description of WebGL shaders in a nutshell! I've provided information about all of the inputs and outputs for the shaders in the rendering pipeline, with the exception of one minor detail: the shaders can also output information to the rasterizer using a set of hardcoded named variables. This is an unfortunate design decision on the part of Khronos Group, which oversees the specification. But it's what we have to live with. Effectively, they look like undefined global variables in a C-like language (and maybe they are exactly that, at the end of the day). It's ugly, but I guess it works! The outputs that will be used most often are:

  • gl_Position : The vertex shader output for a vec4 (4-element vector) describing the vertex position.
  • gl_FragColor : The fragment shader output for a vec4 describing the fragment color.

There are a few others available as well:

  • gl_PointSize : The vertex shader output that describes the size of a point in pixels.
  • gl_FragData : The fragment shader output for providing an array of vec4 data for the fragment on the fixed functionality pipeline. A fragment shader may use gl_FragData or gl_FragColor, but not both. This variable is used with multiple render targets (MRT) in place of gl_FragColor. E.g. the WEBGL_draw_buffers extension.

And there are a few inputs:

  • gl_FragCoord : A read-only variable denoting the window-relative vec4 position of the fragment. Useful for full-screen effects like plasma and other color distortions.
  • gl_FrontFacing : Another read-only fragment shader variable (boolean) that defines whether the fragment is facing toward the viewer (true) or away (false). This is used for back-face culling and two-sided lighting.
  • gl_PointCoord : The last fragment shader read-only variable (vec2) that provides the pixel coordinate within a point. Used with gl_PointSize to do things like radial gradients.

Finally, there are also a number of constants available to both shaders, but they aren't necessary for this discussion. And that is everything you need to know about shaders!

Compiling Shaders

After your shaders are written with fancy maths, you need to compile them. Your browser contains a static compiler that takes your GLSL source code (as a text string) and compiles it into something that can run on your GPU. There's a lot of boilerplate around this, including error handling. Most of that you can safely ignore for the discussion. As long as you include it in your WebGL bootstrap code, it's a write-once-and-forget-it kind of thing.

The steps are:
  1. Bind shader source
  2. Compile shaders
  3. Attach compiled shaders to a shader program
  4. Link program
  5. Use program
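The five steps above can be sketched roughly as follows. The function names buildShader and buildProgram are hypothetical; every gl.* call is part of the standard WebGL API.

```javascript
// Hypothetical helper covering steps 1 and 2 for a single shader.
function buildShader(gl, type, source) {
    var shader = gl.createShader(type);
    gl.shaderSource(shader, source);   // 1. Bind shader source
    gl.compileShader(shader);          // 2. Compile shader
    if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
        var log = gl.getShaderInfoLog(shader);
        gl.deleteShader(shader);
        throw new Error("Shader compile failed: " + log);
    }
    return shader;
}

// Hypothetical helper covering steps 3 through 5.
function buildProgram(gl, vertexSource, fragmentSource) {
    var program = gl.createProgram();

    // 3. Attach compiled shaders to a shader program
    gl.attachShader(program, buildShader(gl, gl.VERTEX_SHADER, vertexSource));
    gl.attachShader(program, buildShader(gl, gl.FRAGMENT_SHADER, fragmentSource));

    gl.linkProgram(program);           // 4. Link program
    if (!gl.getProgramParameter(program, gl.LINK_STATUS)) {
        var log = gl.getProgramInfoLog(program);
        gl.deleteProgram(program);
        throw new Error("Program link failed: " + log);
    }

    gl.useProgram(program);            // 5. Use program
    return program;
}
```

Most of the bulk is error handling; the info logs are the only way to find out why a shader was rejected.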

The specifics aren't very interesting, but there you have it! Typically you will use only one shader program, but it's possible to use multiple shader programs to render complex scenes using different shaders. For example, you might use a special vertex shader to simulate grass and leaves blowing in the wind, but you don't want to distort all of the geometry in your scene with this shader. Another example involves using a secondary (or even tertiary) shader program for post-processing; adding a grain filter (noise), bloom, reflections, deferred lighting, etc.

Binding GPU I/O

After the shader program is set up, you will create bindings for each of the attribute and uniform variable inputs for each shader. There are several methods available for this step: getAttribLocation, enableVertexAttribArray, vertexAttribPointer, getUniformLocation, ... You want to make these calls once, just like you only compile the shaders once, and use the shader program once (if you only have a single shader program). Binding these variables and getting their location multiple times is a waste of CPU effort.
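A rough sketch of doing this work once at startup is shown below. It assumes a linked program whose shaders declare a vec2 attribute named aVertex and a uniform named uColor; those names, and the helper name bindInputs, are placeholders of mine.

```javascript
// Hypothetical one-time setup: look up locations, create the vertex
// buffer, and describe its layout to the GPU.
function bindInputs(gl, program) {
    var inputs = {
        aVertex: gl.getAttribLocation(program, "aVertex"),
        uColor: gl.getUniformLocation(program, "uColor"),
        vertexBuffer: gl.createBuffer()
    };

    // vertexAttribPointer applies to whatever buffer is bound to
    // ARRAY_BUFFER at the time of the call, so bind first.
    gl.bindBuffer(gl.ARRAY_BUFFER, inputs.vertexBuffer);
    gl.enableVertexAttribArray(inputs.aVertex);

    // Two floats per vertex, not normalized, tightly packed (stride 0),
    // starting at byte offset 0.
    gl.vertexAttribPointer(inputs.aVertex, 2, gl.FLOAT, false, 0, 0);

    return inputs;
}
```

Keep the returned object around and reuse it every frame instead of asking for the locations again.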

With your variables bound, you can begin populating them by creating even more bindings and finally calling bufferData, bufferSubData, or the uniform* methods.

There is only one important bind target (aka pointer) to use for uploading attributes: ARRAY_BUFFER. Whichever buffer is currently bound to ARRAY_BUFFER is the one you are uploading to. So for multiple attribute arrays, you will be rebinding ARRAY_BUFFER often. The only other bind target for the bufferData and bufferSubData calls is ELEMENT_ARRAY_BUFFER, which is a special array that provides a list of array indices for lookups within the attribute buffers. In other words, it allows you to send repetitive information to the vertex shader in a "compressed" format.
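To show what that "compression" looks like, here is a sketch that builds an index list for a batch of rectangles. Each rectangle has 4 unique vertices but needs 6 indices (two triangles sharing an edge). The function names and the vertex ordering (top-left, top-right, bottom-left, bottom-right) are assumptions for the example.

```javascript
// Build indices for `count` quads: 4 vertices and 6 indices per quad.
// Uint16 indices can address at most 65536 vertices (16384 quads).
function createQuadIndices(count) {
    var indices = new Uint16Array(count * 6);
    for (var i = 0, v = 0; i < indices.length; i += 6, v += 4) {
        // First triangle
        indices[i] = v;
        indices[i + 1] = v + 1;
        indices[i + 2] = v + 2;
        // Second triangle, reusing two vertices from the first
        indices[i + 3] = v + 2;
        indices[i + 4] = v + 1;
        indices[i + 5] = v + 3;
    }
    return indices;
}

// Upload the indices once; they never change, hence STATIC_DRAW.
function uploadQuadIndices(gl, count) {
    var buffer = gl.createBuffer();
    gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, buffer);
    gl.bufferData(gl.ELEMENT_ARRAY_BUFFER, createQuadIndices(count), gl.STATIC_DRAW);
    return buffer;
}
```

With the index buffer bound, a call like gl.drawElements(gl.TRIANGLES, count * 6, gl.UNSIGNED_SHORT, 0) draws the whole batch, and each quad only uploads 4 vertices instead of 6.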

The reason for all these bindings is simply that they are fairly lightweight; a binding is just a pointer that is adjusted to point at something else which the GPU already knows about. This can be used for instance to upload a ton of textures at the start of a level, and then arbitrarily bind the [limited] texture units to any of those textures while rendering the scene.

Texture Units

A texture unit is a register that points to a texture (aka image) in video memory, and defines how the texture should be used (sampled) by the fragment shader. WebGL specifies 2D texture samplers and cube map samplers.

The number of texture units available on the GPU is very limited. Every WebGL implementation has at least 8 texture units, and the exact count depends on the hardware, with some GPUs supporting up to 192 (as of writing). The number of texture units sets an upper bound on how many textures you can use in a single batch operation. The GPU on my laptop has 16 texture units, meaning I can render a scene that requires up to 16 textures with a single draw call. To support more textures per batch operation, I would have to merge source images into bigger textures for each texture unit.
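You can ask the GPU how many units the fragment shader gets, and point a unit at a texture, with something like the sketch below. The helper names are made up; it assumes the shader program is already in use and that samplerLocation came from getUniformLocation for a sampler2D uniform.

```javascript
// Number of texture units available to the fragment shader.
function getTextureUnitCount(gl) {
    return gl.getParameter(gl.MAX_TEXTURE_IMAGE_UNITS);
}

// Hypothetical helper: point texture unit `unit` at `texture`, and tell
// the shader's sampler uniform to read from that unit.
function bindTextureToUnit(gl, unit, texture, samplerLocation) {
    if (unit < 0 || unit >= getTextureUnitCount(gl)) {
        throw new RangeError("Texture unit out of range: " + unit);
    }
    // The TEXTURE0..TEXTUREn constants are consecutive integers.
    gl.activeTexture(gl.TEXTURE0 + unit);
    gl.bindTexture(gl.TEXTURE_2D, texture);
    // Samplers are set with an integer unit number, not a texture object.
    gl.uniform1i(samplerLocation, unit);
}
```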

It is also possible to sort the scene by common features like texture, in order to fit all renderables into a single batch that uses a specific texture (or set of textures). You will run into problems with layering (actual draw order vs expected draw order) when using this method. The typical solution is using the depth buffer to occlude fragments which have already been drawn. To support different blending modes (especially translucency) in a batch, the sorting also needs to place fully opaque textures first.
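One possible sort order is sketched below. The shape of a renderable ({ texture, opaque, z }) is invented for this example, and it assumes smaller z means farther from the viewer. Opaque items are grouped by texture and left to the depth buffer; translucent items come last and stay in back-to-front order so blending looks right.

```javascript
// Hypothetical batching sort. Returns a new array; the input is untouched.
function sortForBatching(renderables) {
    return renderables.slice().sort(function (a, b) {
        // Fully opaque renderables always come before translucent ones.
        if (a.opaque !== b.opaque) {
            return a.opaque ? -1 : 1;
        }
        // Opaque: group by texture so each texture is bound only once.
        if (a.opaque && a.texture !== b.texture) {
            return a.texture < b.texture ? -1 : 1;
        }
        // Same group (or both translucent): order back to front.
        return a.z - b.z;
    });
}

var sorted = sortForBatching([
    { name: "glass", texture: "b", opaque: false, z: 2 },
    { name: "tree",  texture: "b", opaque: true,  z: 3 },
    { name: "sky",   texture: "a", opaque: true,  z: 0 },
    { name: "fog",   texture: "a", opaque: false, z: 1 },
    { name: "rock",  texture: "a", opaque: true,  z: 4 }
]);
// Resulting order: sky, rock, tree, fog, glass
```

Note that the translucent items are no longer grouped by texture, which is exactly the tradeoff described above: correct layering costs you some batching.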

The depth buffer is another part of the fixed functionality pipeline. Another fixed function buffer is the stencil buffer. There are also color buffers, which is what the rasterizer writes fragment colors to (the image that eventually is displayed on the screen is a color buffer). I won't cover these buffers here; just keep in mind that they exist, and they help you do things that would otherwise be very tricky!

Creating Textures

Textures are pretty simple; just an image sent to the GPU and internally organized for performance. Textures that are exactly sized to powers-of-two can be reorganized as a mipmap, which is a fancy way to say that the texture scaling (downsampling) is precomputed. This makes sampling the texture faster when rendering it at a smaller size, and provides higher quality rasterization.
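A sketch of uploading a texture with that power-of-two check might look like this. The function names are hypothetical, and `image` is assumed to be a fully loaded image or canvas element. In WebGL 1, non-power-of-two textures cannot use mipmaps and must clamp to the edge, so the fallback branch sets those parameters.

```javascript
// True for 1, 2, 4, 8, ... (expects a positive integer dimension).
function isPowerOfTwo(n) {
    return n > 0 && (n & (n - 1)) === 0;
}

// Hypothetical helper: upload an image and mipmap it when possible.
function createTexture(gl, image) {
    var texture = gl.createTexture();
    gl.bindTexture(gl.TEXTURE_2D, texture);
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, image);

    if (isPowerOfTwo(image.width) && isPowerOfTwo(image.height)) {
        // Precompute the downsampled versions and use them when minifying.
        gl.generateMipmap(gl.TEXTURE_2D);
        gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR_MIPMAP_LINEAR);
    } else {
        // No mipmaps: plain linear filtering, and no texture wrapping.
        gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
        gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
        gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
    }
    return texture;
}
```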

Like attributes and uniforms, textures should be uploaded and configured once, and then bound dynamically to texture units as needed. For best results, use all of the texture units available for each batch operation.

Drawing

Finally, we can draw something! This is the part where you fill your dynamic attribute array buffers, send them to the GPU, and request it to draw. You may also set uniform variables here, but if the uniforms aren't changing, don't bother sending them more than once. WebGL is a state machine; it remembers the state until you change it. I see a lot of tutorials fail to describe this behavior, and instead show code that is always doing a bunch of useless calls, wasting your poor CPU and battery time. Luckily, tools like WebGL Inspector can highlight duplicate calls so you can treat WebGL like the state machine that it is!
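One cheap way to respect the state machine is to remember the last value you sent and skip the call when nothing changed. The sketch below does that for a single vec4 uniform; the helper name is hypothetical. Uniform values belong to a shader program, so a cache like this is only valid for the program the location came from.

```javascript
// Returns a setter that only calls gl.uniform4f when the value differs
// from the last one sent. The setter returns true if it touched WebGL.
function createCachedUniform4f(gl, location) {
    var last = null;
    return function (x, y, z, w) {
        if (last !== null &&
            last[0] === x && last[1] === y &&
            last[2] === z && last[3] === w) {
            return false; // GPU already has this value; do nothing
        }
        gl.uniform4f(location, x, y, z, w);
        last = [x, y, z, w];
        return true;
    };
}
```

Usage would look like var setColor = createCachedUniform4f(gl, uColorLocation); followed by setColor(1, 0, 0, 1) wherever the color is needed. Repeated calls with the same color cost a few comparisons instead of a trip through the WebGL bindings.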

The simplest thing to draw is a single point, which consists of one vertex; an attribute array of size 1. The attribute is a vec2, and when used with the simplest possible shaders, you will get a colored dot on the screen. To move the dot, send a different value for the vertex attribute and redraw. A slightly more complex thing to draw is a triangle, which will need a three-element attribute array; each element describing the point for each of the three vertices. Now you get a solid-colored triangle.
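In code, those two draws might look roughly like this. The sketch assumes the setup has already happened: a program is in use, a buffer is bound to ARRAY_BUFFER, and a vec2 attribute points at it. The function names are mine. Coordinates are in clip space, where both axes run from -1 to 1.

```javascript
// Draw one point. The vertex shader must write gl_PointSize,
// otherwise the size of the dot is undefined.
function drawPoint(gl, x, y) {
    gl.bufferData(gl.ARRAY_BUFFER, new Float32Array([x, y]), gl.DYNAMIC_DRAW);
    gl.drawArrays(gl.POINTS, 0, 1);       // 1 vertex
}

// Draw one triangle from a flat list [x0, y0, x1, y1, x2, y2].
function drawTriangle(gl, vertices) {
    if (vertices.length !== 6) {
        throw new Error("A triangle needs exactly three vec2 vertices");
    }
    gl.bufferData(gl.ARRAY_BUFFER, new Float32Array(vertices), gl.DYNAMIC_DRAW);
    gl.drawArrays(gl.TRIANGLES, 0, 3);    // 3 vertices
}
```

For example, drawTriangle(gl, [0, 0.5, -0.5, -0.5, 0.5, -0.5]) puts a triangle in the middle of the canvas. Calling drawPoint again with different coordinates is all it takes to move the dot.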

(Aside: This is how most tutorials start; notice how much description went into this writing before we got here? There are a lot of moving parts that need to be covered before one can just jump into rendering a simple triangle! Perhaps after all that WebGL theory, you can truly appreciate how much work it takes just to draw a dumb triangle.)

To draw a textured triangle, the fragment shader needs to be updated to use a texture sampler, and its texture unit bound to a texture. The fragment shader also needs texture coordinates to map the texture onto the triangle. The simplest texture coordinates just use the same triangle vertices, but this is ultra rare in practice; usually the texture coordinates are static, and the vertices are dynamic, allowing a triangular piece of the texture to be drawn at any location on screen. Occasionally you might use dynamic texture coordinates as well, for things like flowing water (though the result is unrealistic).

Then you draw two triangles back-to-back and you have a rectangle (aka quad)! And you can send a complex triangle mesh giving you a fully textured 3D model, or even a complete scene.
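Here is a sketch of the data for one such rectangle, written out as two triangles (six vertices, no index buffer). The function name is hypothetical, and the texture coordinates simply map the whole texture onto the quad, with (0, 0) at the first corner.

```javascript
// Build vertex positions and matching texture coordinates for a quad
// made of two triangles sharing the diagonal edge.
function createQuadTriangles(x, y, w, h) {
    var x2 = x + w;
    var y2 = y + h;
    return {
        // These change whenever the quad moves or resizes.
        vertices: new Float32Array([
            x,  y,    x2, y,    x,  y2,   // first triangle
            x,  y2,   x2, y,    x2, y2    // second triangle
        ]),
        // These stay the same no matter where the quad is drawn.
        texCoords: new Float32Array([
            0, 0,   1, 0,   0, 1,
            0, 1,   1, 0,   1, 1
        ])
    };
}
```

Upload each array to its own attribute buffer and draw with gl.drawArrays(gl.TRIANGLES, 0, 6). The positions here are in whatever coordinate space your vertex shader expects.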

Now You Know

That's all there is to know about the GPU! At least, as far as I have come with WebGL. ;) There is a large wealth of information available about everything from matrix transformations to lighting, and cube mapping to render-to-texture... The list is endless. But that should be enough information to get other newbs started with this incredible technology. It also shows the depth and breadth of knowledge required to get good results out of it, and highlights the expansive differences compared to the 2D canvas API.

The biggest topic I have not covered yet is handling lost context events. This is just the reality that you must be prepared to face when writing WebGL code. I won't go into details here, because I haven't actually done this work in melonJS yet. Just know that it's a problem, and you will have to handle it. Here's a decent resource that will point you in the right direction with the what, why, and how of handling lost context: https://www.khronos.org/webgl/wiki/HandlingContextLost
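For a taste of what that involves, here is a bare-bones sketch. It is not what melonJS does (that work doesn't exist yet); the function name and the two callbacks are hypothetical. The event names webglcontextlost and webglcontextrestored are the standard ones fired on the canvas.

```javascript
// Wire up context loss handling on a canvas.
// onLost:     stop your render loop, since every WebGL call is now useless.
// onRestored: recreate shaders, buffers, and textures, then resume drawing.
function handleContextLoss(canvas, onLost, onRestored) {
    canvas.addEventListener("webglcontextlost", function (event) {
        // Without preventDefault(), the browser will never restore the context.
        event.preventDefault();
        onLost();
    }, false);

    canvas.addEventListener("webglcontextrestored", function () {
        // All GPU-side resources created before the loss are invalid now.
        onRestored();
    }, false);
}
```

The painful part is inside onRestored: everything you uploaded to the GPU has to be uploaded again, so it pays to keep the setup code in functions you can call more than once.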

Give your brain a few days to let all this stuff sink in! It's not at all intuitive to programmers that are unfamiliar with OpenGL ES. Armed with this knowledge, you will have a much better experience using any WebGL framework. At least, better compared to blindly evaluating something like Three.js without any understanding of the actual work that it is doing. You should also have a good idea of how to start writing custom shaders to do things that the frameworks can't do out-of-the-box. All you need now is a GLSL Cheat Sheet (the good stuff starts on page 3).

Oh, you're all rested up now? Then get out there and make us something beautiful!
