Tuesday, December 16, 2014

melonJS should be *All About Speed* Part 4

We've reached Part 4 of the melonJS performance improvement series! In the last post, I discussed some benefits that we can get out of WebGL, and wrapped up the inheritance improvements with some simple benchmark numbers. We've seen remarkable speed increases so far, and there's no reason to stop there! Today we'll discuss the game loop, one of the hottest parts of the framework (in terms of profiling), and what we can do to make it faster.

But first, here are the first three posts in this series, in case you want to catch up:

Part 1 : http://blog.kodewerx.org/2014/03/melonjs-should-be-all-about-speed.html
Part 2 : http://blog.kodewerx.org/2014/04/melonjs-should-be-all-about-speed-part-2.html
Part 3 : http://blog.kodewerx.org/2014/10/melonjs-should-be-all-about-speed-part-3.html

Processing AI on Non-Displayed Entities

A common question on the melonJS forum is, "Why don't my entities move when they are outside of the screen?" The answer is short and sweet, but understanding it will require the complete content of this article: melonJS will not attempt to perform updates on entities the player cannot see or interact with.

The usual followup question is "How do I make my entities update anyway?" And the answer to this question is something that most inexperienced developers do not want to hear: You do not want to update entities the player cannot see or interact with.

Trust me, you do not want to do it.

I know, you have big plans for a dynamic world where every NPC goes about its business and may even make some game-changing alterations when they are 1,000 miles away. Or your multiplayer game needs to track projectiles shot by other players on the server. And on, and on.

You still don't want to do it.

I'll tell you why you don't want to do it: The biggest hit to performance your game will face is rendering frames. But it is easy to optimize; only draw the parts that the player can currently see. Once that has been solved, the second performance challenge is going to be updating AI on all of the entities within the game world. This is also easy to optimize; only update the entities that the player can currently see.

The good news is that the end result you want can easily be "faked". You don't have to let AI wander and interact with stuff to create random world-events. Nor do you have to test every projectile for collisions when the player can't even see the things. These two concepts can be categorized under "mental masturbation"; it seems like a really interesting idea, and it also seems like the obvious way that things in a game world should work. The truth is, these are both very naïve implementations of game features that can be done in O(1) constant time; the calculations take far less than even a single frame to complete.

The first one: making NPCs move about the world? That's wasteful! It can be simulated by giving the NPC a new random off-screen position as soon as it leaves the viewport. And the interactions can also be done with a much simpler random event system; with some random/rare chance, perform a random interaction. It's two lines of code, it doesn't burn CPU time unnecessarily, it has the same end result, and it's really quite convincing.
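To make that concrete, here is a minimal sketch of the idea. The helper names (randomOffscreenPosition, triggerRandomEvent) are hypothetical, not melonJS API; the point is that the whole "simulation" is a couple of cheap operations per NPC, performed only when it leaves the screen:

// Hedged sketch: "fake" off-screen NPC activity instead of simulating it.
function onNpcViewportLeave(npc, world) {
    // Teleport the NPC to a new random spot outside the viewport,
    // instead of letting its AI wander while nobody can see it.
    var spot = world.randomOffscreenPosition();
    npc.pos.x = spot.x;
    npc.pos.y = spot.y;

    // The "random/rare chance" event system: two lines of code.
    if (Math.random() < 0.001) {
        world.triggerRandomEvent(npc);
    }
}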

The second one: Handling all remote player actions client-side, even when not viewable? Use events instead! Your server will be tracking the projectiles for collision detection. (Right? Right!) It should also track every player's viewport, and signal the remote player when a projectile first enters their viewport.
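Here's a rough server-side sketch of that event approach. No particular networking library is assumed; player.send and the viewport/bookkeeping helpers are hypothetical stand-ins:

// The server owns the projectile simulation, and notifies each client only
// when a projectile enters or leaves that player's viewport.
function serverTick(projectiles, players) {
    projectiles.forEach(function (p) {
        p.step();    // integrate position, test collisions server-side
        players.forEach(function (player) {
            var visible = player.viewport.contains(p.x, p.y);
            if (visible && !p.visibleTo[player.id]) {
                p.visibleTo[player.id] = true;
                player.send("projectile-enter", { "id" : p.id, "x" : p.x, "y" : p.y });
            } else if (!visible && p.visibleTo[player.id]) {
                p.visibleTo[player.id] = false;
                player.send("projectile-leave", { "id" : p.id });
            }
        });
    });
}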

Here's a quote from a forum post I made a while ago, which I think describes the issue as well as anything: "I like to think of the problem in a similar way to Schrödinger's cat; if I can't observe it, it may or may not exist." It's better to be on the safe side, and assume it doesn't exist until the player can see it.

Why go to this much trouble to discard things the player cannot see? Because the performance win is unprecedented! Imagine a fairly big game that has 300 "alive" objects. Many of the objects are entities that have a renderable, but not all; there are also objects for the HUD, objects for the map, even objects for performing tweens and timers. Now let's say you also want the game to run at 60 fps. So that's 300 objects to update and draw within 16.7ms.

Because there are two loops (an update loop and a draw loop) you can divide 16.7 by 300*2 and derive the average amount of time your game may spend processing an update or a draw for a single object. Pencils down! It's about 28µs! That's 0.000028 seconds each. For reference, JavaScript's Date() resolution only goes down to 1ms (0.001 seconds), and the timer (setTimeout and friends) resolution is only about 10ms (0.01 seconds) at best, or 4ms (0.004 seconds) in recent UAs. Most developers using JavaScript will not have to deal with time slices any smaller than this.

So, can your objects update or draw in just 28µs or less? Well, maybe. Depending on the overall CPU speed (something you do not control!) and whether your code is highly optimized (often for a specific browser) ... maybe. Instead, let's assume we don't have to update and draw all 300 objects. (Because you don't!) Maybe only 50 of them are on screen at any one time. Punch in the new number, and you get 167µs, much better! It's still not a whole lot of time, but it is six times more available time than when operating on all 300 objects.
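The arithmetic, spelled out with the rounded numbers from above:

// Per-object time budget at 60 fps, with separate update and draw passes.
var frameTime = 16.7;                               // ms per frame
console.log(1000 * frameTime / (300 * 2) + " µs");  // ~27.8µs with 300 objects
console.log(1000 * frameTime / (50 * 2) + " µs");   // ~167µs with 50 objects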

Now that you have heard the theory and some very simple workarounds, I will tell you how to ignore all of this advice and allow your objects to update while off-screen anyway: the `alwaysUpdate` flag. Never use it. It is your enemy. It was put in so that it could be used responsibly to provide incredibly powerful functionality, like me.Tween. That is an example of an object that does not need to be in the viewport to be relevant, so it should update every frame.

How we plan to make it better

Some day this might change, and objects like me.Tween could get their own cozy little area to live, outside of the normal game loop. That would allow us to get rid of the alwaysUpdate flag, for the most part. One exception is destroying projectiles when they leave the viewport. For this, melonJS needs viewport-enter and viewport-leave events (see #191). This ticket primarily covers an entity "sleep" API, which will allow entities to be put to sleep even if they are in the viewport, further saving precious CPU cycles for entities you know will not need AI updates.

There is a goal, here, of course. melonJS should automatically take care of putting entities to sleep when they are not needed. The obvious heuristic is whether the player can see the entity; the inViewport flag. Some other heuristics may be chosen automatically, but I don't know exactly what they will be.

For the inViewport stuff, it's pretty easy to optimize, now that we have a QuadTree. This data structure can be used to select all entities near the viewport. The selected list of entities is then considered "awake" for the frame (minus the entities that were forced to sleep). I have a hunch the event handling can be done efficiently with set theory operations.

Effectively we have four sets: all, awake, sleeping, and inViewport.

  • all: All entities in the world container.
  • awake: A subset of all, containing the awake entities.
  • sleeping: A subset (all ^ awake), containing all sleeping entities.
  • inViewport: A subset of all, containing entities that are in the viewport rectangle.
To automatically sleep entities, we take a difference: (awake ^ inViewport); the result is a set containing all of the awake entities outside of the viewport. For each entity in this result set, trigger the viewport-leave event, and move it from awake to sleeping.

To automatically wake up entities when they enter the viewport, we can take the intersection: (inViewport & sleeping); the result is a set containing all sleeping entities within the viewport. melonJS can then decide whether these entities need to be woken; whether it was put to sleep automatically, whether it is set to wake up at some time in the future, etc. Waking an entity is a matter of triggering the viewport-enter event on it, and moving it from sleeping to awake.

Finally we are left with a synchronized awake set that can be updated for the frame. And the inViewport set is drawn.
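Here is a hedged sketch of that per-frame bookkeeping, with plain arrays standing in for sets. None of this is melonJS API; quadTree.retrieve, viewport.overlaps, getBounds, and emit are stand-in names for whatever the real implementation will use:

// a \ b and a & b over plain arrays
function difference(a, b) {
    return a.filter(function (e) { return b.indexOf(e) === -1; });
}
function intersection(a, b) {
    return a.filter(function (e) { return b.indexOf(e) !== -1; });
}

function updateFrame(world) {
    // Entities near the viewport (QuadTree), filtered with a rectangle test.
    var inViewport = world.quadTree.retrieve(world.viewport).filter(function (e) {
        return world.viewport.overlaps(e.getBounds());
    });

    // Awake entities that left the viewport go to sleep.
    difference(world.awake, inViewport).forEach(function (e) {
        e.emit("viewport-leave");
        world.sleeping.push(e);
    });
    world.awake = intersection(world.awake, inViewport);

    // Sleeping entities that entered the viewport wake up
    // (the real thing would also check why they were put to sleep).
    intersection(inViewport, world.sleeping).forEach(function (e) {
        e.emit("viewport-enter");
        world.awake.push(e);
    });
    world.sleeping = difference(world.sleeping, world.awake);

    // Only awake entities update; only inViewport entities draw.
    world.awake.forEach(function (e) { e.update(); });
    inViewport.forEach(function (e) { e.draw(); });
}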

This is not exactly how it will be implemented, but it shows the basic idea: only entities that are awake can be updated, and by default only entities that are in the viewport are awake. There will likely be some overrides available for forcing entities to be awake regardless of their visibility, just because we know how game developers love to shoot themselves in the foot! But we're hoping the viewport events will be used instead for things like destroying projectiles when they leave the screen.

The benefit of doing all of this work with set theory is that melonJS won't have to iterate the entire list of 300 entities each frame; it only has to iterate the list of entities which are potentially in the viewport. I say "potentially" because there is a chance that the list provided by the QuadTree contains entities just outside the viewport boundaries. Those edge cases still must be filtered out with rectangle hit testing. This is an overall win as long as not all of your entities are within the viewport. (That can happen! Especially in single-screen maps.)

The second benefit is keeping track of entities in the awake and sleeping sets to assist efficient event handling; working with a subset of subsets further reduces the number of iterations required to process each frame.

The viewport events, along with the related entity lifecycle events, can make all the difference in a game with large maps. Developers need to be cautious about unnecessarily using the CPU, but the game engine also needs to provide utilities to directly manage how the CPU is used by game entities. That's what these events are all about. And that's why melonJS is all about speed!

Wednesday, October 15, 2014

melonJS should be *All About Speed* Part 3

In the first two parts of this blog series, I discussed classical inheritance and object pooling. You can find links to both parts below.

Part 1 : http://blog.kodewerx.org/2014/03/melonjs-should-be-all-about-speed.html
Part 2 : http://blog.kodewerx.org/2014/04/melonjs-should-be-all-about-speed-part-2.html

Inheritance Wrap-Up

After a long hiatus, I'm back to work on melonJS. Continuing with the performance enhancements we introduced with v1.1.0, we've fixed a lot of the performance burden with the collision detection system for v1.2.0.

But before we get into that, let's see what kind of performance increase was actually gained by transplanting our classical inheritance microlib and getting rid of the deep copy that it had to perform. I haven't actually prepared a benchmark of any kind. Fortunately, there's not much need for one, since we can just use the Particle Editor example that has shipped with melonJS since v1.0.0! The nice thing about the Particle Editor is that it includes a custom particle-debug-panel plugin which reports information like how long the update and draw steps take, and even renders a nice little performance graph.

Here's what the Particle Editor looks like in v1.0.0:

melonJS 1.0.0 Particle Editor

This is what it looked like upon initial release. While the general display hasn't changed much (some rendering bugs aside), you should take note of the performance graph in the lower left corner; In 1.0.0, 300 particles take roughly 3ms to update the Particle Container, and 3.3ms (total) to update the entire frame. Similarly for drawing: 3.7ms to draw the Particle Container, and 4.5ms (total) to draw the entire frame. That's certainly not terrible; 7.9ms used over the entire frame, which leaves about 8.7ms for game logic to maintain 60fps.

How does 1.1.0 compare, with its improved object inheritance?

melonJS 1.1.0 Particle Editor; Note the missing control widgets. A bug that went unnoticed.

Wow ... wait, what?

This is the same Particle Editor example, which itself hasn't really changed. We also didn't touch the Particle Emitter or Container code. Ignoring the missing widgets (which was a bug introduced by the new video renderer) this example is exactly the same. And yet, it's overall 4x faster.

The Particle Container update time reduced from 3ms to under one tenth of a millisecond. And total frame update time is down to about 0.16ms! The performance increase with drawing is equally stunning; Particle Container draw time is down to 1.9ms (from 3.7ms) and total frame drawing time is down to 2.8ms (from 4.5ms). The overall frame time with 300 particles is about 2.9ms! Which means the biggest remaining impact is the rendering of 300 individual images. Nothing a little WebGL can't solve! But more on that later...

These numbers tell us that even with 300 particles, you still have over 13.7ms to do your game logic on each frame to maintain a buttery-smooth 60fps. (It might be telling to note that I ran these tests on a Late-2011 Macbook Pro, with Chrome 38.0.2125.101, and "automatic graphics switching" disabled in System Preferences.)

Furthermore, I've fixed the rendering issues with the control widgets for 1.2.0, and just for kicks here's what the Particle Editor now looks like running under tonight's build!

melonJS 1.2.0 Particle Editor (WIP)

As you can see, the performance numbers have not changed at all. That's because the widgets in 1.1.0 were being drawn, just incorrectly. For some reason, I managed to capture this screenshot on a frame that was actually drawing faster than what 1.1.0 showed. But that's just a coincidence. The average frame times are identical.

So there you go! I actually do know what I'm talking about. ;) Go update your JavaScript projects to Jay Inheritance if you care about speed.

Collision Detection

This is the next major performance boost on our end! Olivier finally ripped out all of the old tile-based collision detection code and replaced it with the lovely SAT.js (Separating Axis Theorem). This gives us pixel-precise polygon collision and hit detection using an efficient algorithm invented by people much smarter than any of us. Olivier also incorporated a QuadTree implementation for the broad phase, which reduces the number of tests that need to be done when collision detection is requested. Together, these two features make collision detection incredibly fast and flexible.

For an example of the flexibility, melonJS now has proper collision detection support for isometric maps for the first time. It is possible to define arbitrary shapes within your maps for world collisions (create bumpy terrain or funky shapes!) and you're also able to define multiple arbitrary shapes for your entities' physics bodies. For anyone familiar with 2D or 3D physics engines, we have the same concept of a "body" where you apply forces, and specify collision shapes. The entity references its body, so the entity itself is no longer responsible for physics within melonJS.

There are still bugs and missing features (like ellipse shapes not working 100% -- perfect circles are fine, though). However, it is a really good start for fixing performance issues around collision detection. There are also a few more things we can do to increase performance even more, and there are open tickets for all of them:


Of these, #551 is probably the least important for speed; it will help a bit, but the algorithm is tricky, and it won't be a super big win. #587 will make a noticeable improvement on maps with many world collision shapes. And #574 will be the biggest win of the three.

What #574 promises is to remove the user's responsibility for performing collision detection, which in turn means the user won't be able to slow down her game by doing collision detection calls too often. In other words, the user won't be able to penalize herself just because she wants to perform collision detection against 1000 entities. She should absolutely be allowed to do that! Bear in mind that such a game might only be feasible on desktop; collision detection is still hard to optimize for mobile platforms.

Onward to WebGL

Aaron has been busy since 1.0.0 adding support for WebGL! This will be the biggest performance improvement on mobile platforms, hands-down. Even the WIP 1.2.0 doesn't have a working WebGL renderer, so I'm not able to show off any crazy benchmarks. Primarily we're still in the architecture stage for implementing WebGL. We're currently missing useful things like font rendering! And our 2D/3D matrix code seems to have some issues, causing the WebGL renderer to display all black. ;)

WebGL is kind of a buzzword for speed, so let me clarify what WebGL means to the melonJS team: WebGL means that we can offload some of the heavy drawing code to the GPU (where it belongs). That seems pretty obvious. WebGL also means we can offload some of the heavy non-drawing code to the GPU. That's where we will gain the biggest wins. GPUs are exceptionally good at parallelizing computations; exactly the kinds of computations we currently do on the CPU, like vector math for physics. Why not write a physics shader to handle some of those transformations on the GPU?

Speaking of parallelization, desktop and mobile platforms have multiple CPU cores, which are typically idle, even in the computationally heaviest of melonJS examples and games. We have an opportunity to utilize all of the cores for the best player experience using Web Workers. We haven't ironed out all the details, but there's an ongoing discussion in the ticket.

WebGL won't be ready for the 1.2.0 release at the end of the month. But the looming thought of ultra fast sexy graphics in melonJS makes my mouth water!

Wednesday, May 21, 2014

The Debugger is a Hacker's Katana

It may be difficult to tell from my blog alone, but I have a strong passion for debuggers. Especially in the context of reverse engineering; analyzing software at the machine level. Today I shall step outside of video game development and write about debuggers, code analysis, and assembly hacking. All of the fun stuff that I grew up with!

Yesterday I released the first version of node-capstone, a node module that provides bindings for the Capstone disassembler library. It's the first piece of a debugger project that I recently restarted. It's a long story, but the project has been in development for about 5 years. These are its latest developments.

Hacking like a Samurai

First, I should probably establish my credentials in this area. I've written debuggers for five different architectures. Two of those were released publicly: FCEUd and GCNrd. The other three targeted Nintendo DS, Nintendo 64, and Nintendo Virtual Boy.

There are a few patterns to be spotted here! Obviously, they are all for Nintendo game systems. But more importantly, the debuggers interact with and analyze assembly code. Games for the most recent of these machines (NDS, GCN, and N64) were written in C and C++, but of course we don't have the luxury of inspecting any of this source code. The next best thing is analysis of the machine code itself; transform the machine code into human-readable assembly language, and begin studying!

In the past three years, I have written a lot of JavaScript and Python, which leaves little room for poking around in assembly languages. But there are good debug environments for each; for JavaScript you have Firebug, the Firefox Browser Debugger, and the Webkit Inspector; for Python there's the excellent pdb module. These tools all hold many similarities to one another: providing breakpoints for trapping code execution, inspecting and modifying variables at runtime, and injecting code. Most importantly, though, they are all source-level debuggers.

Source Level vs Machine Level

So let's talk about source level debuggers and how they differ from machine level debuggers. A source level debugger expects the original source code and debug info from the compiler that maps machine code addresses to the source code. Some of these source level debuggers (gdb, et al.) will disassemble machine code if there is no source (or if you ask it to), but these tools are not designed for static analysis. They can give you a murky picture of the machine code at best.

A machine level debugger is the ultimate super king of static analysis. There is no "source of truth" to make sense of the blob of binary being analyzed. Some executable formats provide hints about code structure, like which segments contain code and which segments are only for data. Then there is the issue of determining which bytes in the code segments are instructions (this is complicated by some architectures using variable-sized instructions. x86 is one such architecture). Finally, it must determine the overall structure of the code; which groups of instructions comprise a complete function, flow control, how functions refer to one another, how they refer to data, pattern matching and heuristics to identify common library functions, etc.

With static analysis you can go even one step further, and determine how sets of instructions can be "decompiled" into a high-level language. A simple example is a load instruction followed by an arithmetic instruction that adds 5, and a store instruction back to the same address; this set could be equivalent to a variable assignment like the following in C:
x += 5;

This method of analysis is not necessary in a source level debugger. But it becomes important with fairly complex code in a machine level debugger. The above snippet of C is immediately recognizable, compared to the sample assembly below:
ldr  r3, [r4, #8]
add  r3, r3, #5
str  r3, [r4, #8]

It is this sophisticated analysis that I want my debugger to do. I have a long, long way to go before it gets to that point.

The Project

I've explained enough to describe my intention: building a machine level debugger with awesome analysis tools. But I haven't really touched on internal design or even specific architecture targets or languages. And with good reason! This debugger is taking an architecture-independent approach, meaning it should be able to do a decent job debugging anything that it can disassemble. But it can do the best job with architectures that it can analyze automatically.

To achieve this abstraction, the debugger needs a solid core with all architectures supported by plugins or modules. Even the tracing of running processes will be written with an abstraction layer; a protocol I'm designing called Scalable Remote Debugger Protocol. SRDP provides a generic interface for transporting raw data and issuing primitive debugger commands like "add breakpoint". The protocol also supports a notification mechanism for event handling (like a breakpoint being triggered).

SRDP

The protocol is still in the design phase, having undergone many revisions already. The latest developments in protocol land have been separating the data blocks from the data streams. CPU memory is an example of a block of data. Things like commands and event notifications are data streams. Conceptually all three can use the same transport, but the endpoints will be a bit different:

  • Data block endpoints are encapsulated as a mountable block storage device, similar to a USB flash drive or SD card. This allows interacting with a machine using native OS utilities; want to read raw memory from a specific address? Open and read the file, just like you are already used to doing!
  • Commands and events are encapsulated in a data stream interface using sockets, which itself uses the same open/read/write/close API as the file system.
Putting the data blocks into a mountable storage interface was an idea from qeed, who is working on something similar for his GameBoy emulator.
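To illustrate the first point, here is a hedged sketch of what reading target memory could look like once the block device is mounted. The mount point and layout are entirely hypothetical; only Node's standard fs API is used:

// Read 16 bytes of target memory at address 0x80003100 as a plain file read,
// using the address as the file offset.
var fs = require("fs");

var fd = fs.openSync("/mnt/srdp-target/mem", "r");
var buf = new Buffer(16);
fs.readSync(fd, buf, 0, 16, 0x80003100);
console.log(buf);
fs.closeSync(fd);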

SRDP gives us a way to connect debugger targets to debugger UIs. A UI could provide a text mode interface like gdb, or a graphical interface like IDA Pro.

The User Interface

This project has been in development for a long time, and it has gone through plenty of revisions itself. When I initiated the project over 5 years ago, I spent a lot of time researching available GUI toolkits. The only one I found promising was the web stack! HTML is already used to layout every website in existence, CSS styles them, and JavaScript breathes life into them. This is the web stack; it is not tied to any particular operating system, and it works everywhere.

As early as '10, the only real web stack that could be used outside of a browser was Mozilla's framework, called XULRunner. I've written a few samples with it [1] [2], and was impressed immediately. But at the time, the technology was not mature enough to support development of a full-featured application by just one man.

XULRunner is still alive and kicking today, in the same capacity as always; it powers Firefox and similar applications. There are three reasons I did not stick with it as my platform of choice:

  1. XUL. This is the name of a structural language with XML syntax that replaces HTML as the GUI layout definition format in XULRunner. You don't so much write HTML+CSS+JavaScript as you write XUL+CSS+JavaScript. It's rather unfamiliar.
  2. XPCOM. This technology allows JavaScript to call functions written in another language (C, C++, Java, Python, etc.). XPCOM itself is written in C++. Even to interface with an existing dynamically linked library, one would have to write an XPCOM component in C++ to create the bindings. This situation has changed since '10, with the introduction of js-ctypes.
  3. Debugging. Native debugger UIs in XULRunner were non-existent at the time. It was possible to install a version of Firebug which worked with XULRunner, but it just wasn't the same. This situation may also have been alleviated, now that Firefox Browser Debugger is available.

Due to the rough edges with XULRunner, I briefly switched gears away from JavaScript for my debugger UI, and went full-force into Python+Tkinter! While Tk certainly has its place, I'm fully convinced that it is not for developing debugger user interfaces. It does not give you full control over element styling (only a little!) and there's a grievous bug in the OSX release that will probably go unfixed for the next decade. The only thing Tkinter has going for it is that it's bundled with Python.

With Python eliminated, I'm back to JavaScript as the host for my application. Being in game development, I was made aware of node-webkit, which is very similar to XULRunner, but uses HTML everywhere instead of XUL, integrates node.js instead of XPCOM, and has the Webkit Inspector built in for debugging. Seems perfect, right?

The problem with node.js is that native modules must be written in C++ (if you can't tell, I have a major aversion to this language). But then I came across the ffi module, which can be compared to js-ctypes. Now it's perfect!

Baby Steps

I have not yet decided on how I want to render the disassembler or hex editor (two widgets that are required in any machine level debugger), but I know exactly how I'm going to implement the disassembler!

In the Python experiment, I wrote a complete MIPS disassembler directly in Python. It wasn't going to be the fastest disassembler ever, but it sure was easy to write and extend! Not wanting to rewrite it in JavaScript, I decided to look for an open source library that would suit my needs. I found Capstone. It is awesome.

Capstone is written in C, based on LLVM, and supports five architectures: x86 (including x86_64), ARM, ARM64, MIPS, and PowerPC, with two more coming in the next version (SystemZ and Sparc). These are all interesting to me (especially x86_64, which is a disassembler I do NOT want to write myself), and the library should be straightforward to extend with other architectures that I want.

All I need now is JavaScript bindings for it!

var capstone = require("capstone");

// Raw x86_64 machine code to disassemble
var code = new Buffer([
    0x55, 0x48, 0x8b, 0x05, 0xb8, 0x13, 0x00, 0x00
]);

// Create a disassembler instance for x86 in 64-bit mode
var cs = new capstone.Cs(capstone.ARCH.X86, capstone.MODE.X64);

// Disassemble, treating the buffer as if it were loaded at address 0x1000
cs.disasm(code, 0x1000).forEach(function (insn) {
    console.log(
        "0x%s:\t%s\t%s",
        insn.address.toString(16), insn.mnemonic, insn.op_str
    );
});

// Release the native handle (no destructors in JavaScript)
cs.close();

Welp! That was easy! :D

In all seriousness, I spent a weekend getting this working. Here's the actual output:

0x1000: push    rbp
0x1001: mov     rax, qword ptr [rip + 0x13b8]

Compare to the Python example: http://www.capstone-engine.org/lang_python.html (the close() method was unavoidable in JavaScript, because there are no destructors.)

It is written as a node module using ffi, and has been released on GitHub under the terms of the MIT license. This will provide the "solid core" that I mentioned earlier. Hook the disassembler up with some static analysis tools, an awesome renderer/editor, and SRDP, and you have a modular debugger foundation that runs everywhere and debugs everything!

It's still just a tool

While that sounds quite glorified and romantic, it must be said that even the one-debugger-to-rule-them-all won't make you the best hacker in the world. It takes years of practice to use such a tool effectively. And it takes many more years just to create the tool, and to refine it to perfection.

The debugger is my katana.

Sunday, May 11, 2014

Trials of Documenting JavaScript Source Code

This should be a solved problem, by now. It's 2014, for crying out loud! Documenting source code in other languages is pretty much trivial; for Python there's Sphinx, Ruby has RDoc, for Java you have JavaDoc, and then there's Doxygen if you need to document C/C++ code. These are all pretty much de facto standards in their respective realms. What do we have for JavaScript?

A horrible wart called JSDoc3.

I hate JSDoc3. It's slow, the default template is unacceptably bad, it's difficult to create links between symbols in a semantic way, formatting requires writing HTML junk into your doc comments, ...

JSDoc3 is so bad, there are tons of replacements available:


And plenty of forks that were omitted. (I've only included generators written in JavaScript for this list. I've also seen Sphinx, JSDuck, and some others that I won't go into, because they cannot be used as node-modules.)

Docco doesn't support structured comments, and it's really designed for micro projects or tutorials (see Jasmine documentation for an example), so that's out.

dox-foundation is a good start with markdown support, but organization is a major issue (everything is sorted by file name!) and it still doesn't solve the problem with linking symbols.

Doxx is also nice (and is a fork of dox-foundation), but it's still not good at organizing generated documentation, and linking between symbols is still a pain. And what's with the jade templates? I like to write my HTML in HTML.

YUIDoc looks just as big and clunky as JSDoc3, and its default template looks like something out of Microsoft or IBM.

dox-docco is similar to Docco, but with structured comment support. It's also best for tutorials.


Dissatisfied with everything available, I decided to use dox and Prism (the same components behind dox-foundation, Doxx, and dox-docco) and combine them with doT to create an incredibly fast and flexible documentation generator that will do everything I need for one of my personal projects. Styling is based on Foundation, a la dox-foundation.

Here's what it looks like so far:

node-capstone documentation
This is a screenshot showing the WIP documentation for a project I've been hacking on for the past few weeks. (I will blog about that project later.) The project is a node module, and I wanted an easy way to generate the documentation that meets the following criteria:

  • It has to be blazing fast! No more waiting 30 seconds to spin up a JVM, please.
  • It must also be lightweight, but not limiting; easy to extend.
  • It should support JSDoc3-style tags in block comments.
  • It must support styling within documentation, preferably without HTML soup.
  • It must be capable of organizing documentation in a more sane manner than sorted-by-filename.
  • It must provide links to contextually relevant information in a semantic way.
  • The theme must not make my eyes bleed.

I think I've accomplished all of these goals, so far. It's a tiny JavaScript app that just does some comment parsing, symbol organization, and template rendering (with a really fast renderer). The template renderer, doT, is awesome because the template itself can run JavaScript, meaning it can do some of the heavy lifting (which is important as you will see later).

dox parses JSDoc3-style block comment tags, and integrates markdown, so that's perfect.

After comments are parsed, symbols are organized; first by kind (classes, functions, constants, miscellaneous). On the TODO List: child symbols below the four top-level categories will be further organized by kind (again), then sorted alphabetically.

Data types are already linked properly (the arguments table for INSN_OFFSET() in the screenshot has a type linking to a class) and automatically; there's no special documentation syntax for it! It just works.

The theme is easy on the eyes; it's pretty, it doesn't get in the way with obnoxious frames, drop-down menus, or accordions. There are no bouncy animations, giant titles, or logos; just the juicy meat of the documentation! Syntax highlighting is all there. The full source code for each symbol is included, hidden by default, and automatically highlighted as well; in the screenshot, source code for INSN_OFFSET() is shown.

All this was built in just the last three days or so, and I've been creating it as a grunt task from the start, so hopefully it can be reused again on other projects. The grunt task uses metadata from the project's package.json, and can get additional information (or overrides) from the task options.

Design of the Generator

After parsing comments with dox, it does some basic symbol organization to build a tree structure that will represent the final documentation output. The tree is simply passed on to the template renderer. The template is responsible for inferring types and generating proper links between symbols; the renderer and parsers do not care about context. Doing the type inference late in the rendering allows the template to choose link destinations (not just DOM structure and styling) to keep the documentation comment semantics separate from the documentation template structure. Only the template knows where the template is putting symbols!
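As a rough sketch of that pipeline (the file paths and kind-detection logic here are simplified stand-ins, not the actual grunt task):

var fs = require("fs");
var dox = require("dox");
var doT = require("dot");

// Parse structured comments from the source.
var source = fs.readFileSync("lib/index.js", "utf8");
var comments = dox.parseComments(source, { "raw" : true });

// Organize symbols into a simple tree by kind.
var tree = { "classes" : [], "functions" : [], "constants" : [], "misc" : [] };
comments.forEach(function (comment) {
    var ctx = comment.ctx || {};
    if (ctx.type === "method" || ctx.type === "function") {
        tree.functions.push(comment);
    } else if (ctx.type === "declaration") {
        tree.constants.push(comment);
    } else {
        tree.misc.push(comment);
    }
});

// Hand the whole tree to a doT template; the template itself runs JavaScript,
// so type inference and symbol linking happen at render time.
var render = doT.template(fs.readFileSync("templates/functions.html", "utf8"));
fs.writeFileSync("docs/functions.html", render(tree));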

At present it is not ready to take on a project the size of melonJS. In fact, it's not even complete enough to fully document my little node module! You'll even notice in the screenshot there are three kind-of-broken links at the end of the side navigation bar. ;) Only the functions and main pages have templates.

It has been a very pleasant side project, though. Of all the components I'm using, the one with the steepest learning curve has been Foundation ... and that's just a bunch of CSS! dox makes comment parsing dead simple, and provides enough context to create beautiful documentation. Prism was absolutely zero-effort to integrate (after installing a dox fork that uses marked for markdown parsing). And doT is extremely versatile; it's almost ... magical. ;) The last piece, which I haven't mentioned yet, is Font Awesome. Four of the five icons I've used can be seen in the screenshot. The last icon I'm using is for Twitter, and it appears in the page footer if you've specified a twitter handle in the grunt task options. How's that for awesome?

Next Steps

I actually have a small TODO list for this side project, and it happens to be a larger list than the node module which required documentation in the first place! One thing that would be handy is printing the file name and line number alongside the source code, but this will require extra patches to dox; may as well fork it at this point.

Another thing to do is make the side navigation bar "sticky", so I don't have to scroll to the top of the page each time I want to visit a different symbol. This is actually more important than it sounds. ;)

I have not started on the "class", "constants", or "miscellaneous" pages at all! But considering the "functions" page was blank just yesterday (and now it looks like the screenshot) I would say it isn't a lot of trouble to do each page. To be honest, I was expecting it to take longer for some reason.

And lastly, to actually package it as a grunt plugin. Right now it's just embedded into the node module's repo, where it doesn't really belong! That should just be a matter of running grunt-init. But then there's the task of documenting it! Hahaha!

Friday, April 11, 2014

melonJS should be *All About Speed* Part 2

If you haven't already read Part 1, head over to that article for some important background information before continuing below.

Part 1 : http://blog.kodewerx.org/2014/03/melonjs-should-be-all-about-speed.html

Deep Copy

In version 0.9.11, we quietly rolled out a little feature that allows including complex objects in class descriptors for the inheritance mechanism we have been using (John Resig's Simple Inheritance). This allows, for example, creating classes with properties that instantiate other classes, or include arrays or dictionaries. It works by modifying Object.extend to do a deep copy on all of the non-function properties in the descriptor. This feature came about by ticket #307. You'll find some discussion in the ticket about why it doesn't work without deep copy, and some of the ugly prototype hacks we had to use to get it working.

No sooner was the deep copy working than it came right back to bite us! That shouldn't be surprising. So far it has caused one bug report, due to cyclic references and object pooling in #443. (Pooling is a method of reusing objects by resetting their internal state, which we will cover momentarily.)

Deep copy is unfortunately very slow. It's the process of iterating over each property and determining its data type, then performing a shallow copy for simple values like numbers and strings, or another deep copy for complex objects like arrays and dictionaries. Because the complex objects are reference types, they may create circular references; the simplest example is object A referencing object B, while B holds a reference back to A. These object cycles are hard to deep copy, because you need to protect against copying the same object more than once. If that protection is missing, you quickly hit an infinite loop, or blow the stack with recursion.
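To see why the cycle protection matters, here is a minimal sketch of a cycle-safe deep copy (not the actual melonJS implementation): the seen list records objects that were already copied, so an A -> B -> A cycle terminates instead of recursing forever.

// Minimal sketch of a cycle-safe deep copy.
function deepCopy(value, seen) {
    if (value === null || typeof value !== "object") {
        return value;               // numbers, strings, booleans, functions
    }
    seen = seen || [];
    for (var i = 0; i < seen.length; i++) {
        if (seen[i].src === value) {
            return seen[i].copy;    // already copied: reuse, don't recurse
        }
    }
    var copy = Array.isArray(value) ? [] : {};
    seen.push({ "src" : value, "copy" : copy });
    for (var key in value) {
        if (value.hasOwnProperty(key)) {
            copy[key] = deepCopy(value[key], seen);
        }
    }
    return copy;
}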

Even worse, this deep copy needs to be done at object instantiation time, so it will be dramatically slow to use the new keyword on any of your classes. Mix that with large classes, and your game will be in a lot of trouble!

That's why I decided not to include deep copy in Jay Inheritance. In fact, I took it even one step further and specifically made all properties immutable when added to a class. So if you define integers using the class descriptors, you'll find that you are not allowed to change the values in those properties. This should get the idea across that class descriptors are for methods only. No exceptions. And all in the name of speed.
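Here is a hedged sketch of how that immutability can be enforced (not the actual jay-extend source): non-function descriptor properties are defined on the prototype as non-writable, so assignments to them silently fail (or throw in strict mode), nudging users toward methods-only descriptors.

// Define a non-function descriptor property as immutable on the prototype.
function addImmutableProperty(proto, name, value) {
    Object.defineProperty(proto, name, {
        "value" : value,
        "writable" : false,
        "enumerable" : true,
        "configurable" : false
    });
}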

Object Pooling

Deep copy was added as part of the object pooling revamp in 0.9.11. The idea was to simplify child object creation, by allowing child objects to be defined in the class descriptor. All children would be copied into the new object automatically as part of the Class constructor. And a bit of background on this decision: The pooling mechanism has to call the constructor to reconfigure objects when reusing them. And you don't want the constructor to create new child objects when the children already exist! That would defeat the purpose of pooling.

But now we know that was not the right way to fix child object creation with object pooling. A better approach is to embrace separation of responsibilities, allowing each part of the pooling mechanism to be responsible for one thing and one thing only. In the case of child object creation, that responsibility lies with the constructor, not the class descriptor.

We've supported pooling for quite a long time. But it has been broken, to be honest. The reason I say "broken" is due to the way the object life cycle works in melonJS. There's the init method which is the "user" constructor, and an onResetEvent method which is for resetting internal state of the object, and finally there is also a destroy method which can be used as a destructor.

  • init : Constructor, called when the object is created with new.
  • destroy : Destructor, called when the object is deactivated.
  • onResetEvent : Reset internal state, called when the object is reactivated.

Unfortunately, these methods are poorly defined within the object life cycle. For example, destroy is not called when the object is ready for garbage collection! Instead, it is called when the object is deactivated; when it is placed into the pool for later reuse. This means destroy cannot safely be used as a destructor unless onResetEvent also does constructor work, like creating child objects and event listeners. This is broken by definition; only the constructor should do constructor work.

To complicate matters, only a few classes have an onDestroyEvent callback that is called by the default destroy method. It's treated as a user-defined destructor, of sorts. But again, it is only called during object deactivation, and only very rarely used.

To fix all of these issues, we need to define the object life cycle concretely. A clear separation between construction and activation, and between deactivation and destruction is necessary to make pooling work effectively. To that end, here is my proposal for the life cycle methods, and when they are called:

  • init : Constructor, called when the object is created with new.
  • destroy : Destructor, called when the object is ready for garbage collection.
  • onResetEvent : Reset internal state, called when the object is activated/reactivated.
  • onDestroyEvent : Disable object, called when the object is deactivated.

For clarity, "deactivation" means the object is removed from the scene graph, and placed into the pool for later reuse. This is the signal to your object that it should remove any event listeners. Likewise, "activation" means placing the object into the scene graph ("reactivation" is when an object is reused from the pool). This is a signal to the object that it should reset its internal state, reset the internal state of its child objects, and add any event listeners.

The constructor still needs to define all of the object's properties, and set values on them. This also means creating new child objects and setting their default internal state.

Finally, the destructor will make sure to remove any kind of "global state" changes that it has made, like event listeners, loaded resources, timers, tweens, etc.

The life cycle of an object without pooling then looks like this:

  1. init
  2. onResetEvent
  3. onDestroyEvent
  4. destroy

And as you might imagine, the life cycle of an object in the pool might look more like this:

  1. init
  2. onResetEvent
  3. onDestroyEvent
  4. onResetEvent
  5. onDestroyEvent
  6. ...

Without ever calling destroy. It will just continually be deactivated and reactivated as needed.

This API can make object pooling faster by being very strict about where in the object life cycle child objects can be created. For example, a shape object needs a vector for positioning. As long as this vector is only created in init, there will be no garbage collection thrashing when the shape object is pooled. Worst case, the position vector only gets reset (x and y properties updated) each time the object is reactivated with onResetEvent.
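Put together, a pooled class following the proposed life cycle might look like this sketch (pseudo-melonJS; the base class and the exact callback signatures are illustrative, not final API):

var Shape = BaseObject.extend({
    "init" : function () {
        // Constructor: the only place child objects are created.
        this.pos = new me.Vector2d(0, 0);
    },
    "onResetEvent" : function (x, y) {
        // Activation/reactivation: reset state, add event listeners.
        this.pos.set(x, y);
    },
    "onDestroyEvent" : function () {
        // Deactivation: remove event listeners; the object goes to the pool.
    },
    "destroy" : function () {
        // Destruction: release "global" state (timers, tweens, resources).
    }
});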

One final note here is that I want to establish these method names as the de facto object lifetime methods within the entire game engine. Not just for objects which get added to the scene graph, but for everything; even me.ScreenObject has had an onResetEvent callback implemented forever. It's slightly different though, called just after the registered state has been switched to. And onDestroyEvent is called when the state is switched away. But still very fitting.

Resetting Object Internal State

Now would be a good time to also touch on how I expect the object's internal state to be set initially in the constructor. The answer is simply: "don't care!" I cannot recommend having init call onResetEvent, due to inheritance rules; it would be a bad idea because both methods must call the overridden method on the super class. You would then have onResetEvent getting called multiple times at constructor time! That would be slow and dumb. The right way to set initial state is to do so in the constructor only if you have to. Otherwise, do all of the configuration when the object is added to the scene (handled by onResetEvent).

Here's some example code to help illustrate the concept:

var x = Math.random() * me.video.getWidth();
var y = Math.random() * me.video.getHeight();
var z = 9;
var particle = me.pool.pull("Particle", x, y, { "image" : "flame" });
me.game.world.addChild(particle, z);

This code will fetch an object from the pool that was registered with the class name "Particle", and pass some constructor/reset parameters. In the case that the "Particle" pool is empty, a new object will be instantiated, and the parameters are passed straight to the constructor. In the case that a "Particle" object is reused from the pool, its constructor will not be called. Finally, the particle is added to the scene graph, and its z-index is set properly.

And this is where we need to make some big decisions! The pooling mechanism today actually does call the constructor in the latter case, which is very bad because the constructor should never have to worry about whether a child object already exists... In my mind, the child objects will never exist when the constructor is called, so it is only responsible for instantiating the children.

I would like this reuse path to rely on onResetEvent being called by me.game.world.addChild(), which will pass along the "object state" configuration parameters, probably even removing these parameters from the constructor entirely. In other words, the example code needs to be changed to this:

var x = Math.random() * me.video.getWidth();
var y = Math.random() * me.video.getHeight();
var z = 9;
var particle = me.pool.pull("Particle");
me.game.world.addChild(particle, x, y, z, { "image" : "flame" });

You will also notice the z parameter when adding the object to the scene graph. This has always been there! And it feels quite nice to finally put all of the coordinates together in the same place. The point to take away from this is that creating the object is not enough to actually use it, so why configure its internal state at constructor time? Especially if the constructor may not even be called because the object is just getting reused!

Simplified: create an unconfigured object, add it to the scene with configuration. The same pattern works equally well without pooling, just replace the me.pool.pull() call with the new keyword:

var x = Math.random() * me.video.getWidth();
var y = Math.random() * me.video.getHeight();
var z = 9;
var particle = new Particle();
me.game.world.addChild(particle, x, y, z, { "image" : "flame" });

Configuring Objects Without the Scene Graph

This is another good point to cover before I wrap up pooling changes! There are some objects which can be reused that never get added directly to the scene graph, or rather they discreetly add themselves to the scene graph without the user's knowledge. me.Tween comes to mind immediately! This class uses the scene graph to ensure it updates object properties at the right time, and it adds itself to the scene graph when its start method is called. For this specific object though, I think it will be enough to reset internal state in onDestroyEvent, when it is removed from the scene. That will set it up for reuse later, without an extra reset step anywhere else.

For other objects like me.Color and me.Vector2d, my first thought is making the configuration step explicit, by calling onResetEvent directly. But that seems kind of silly, especially for people used to passing configuration parameters to the constructor in classical Object-Oriented Programming. But maybe it really is that easy? If you want to use pooling, call the object's onResetEvent to configure the object after it is retrieved from the pool. If you don't need pooling, pass configuration to the constructor!

// Object without pooling
var v1 = new me.Vector2d(10, 50);

// Object with pooling
var v2 = me.pool.pull("me.Vector2d").onResetEvent(10, 50);

These kinds of classes need to duplicate configuration code, because as mentioned earlier it's not a good idea to call onResetEvent from the constructor. Hopefully these classes will all be simple enough that the possibility of double-configuration (constructor directly followed by onResetEvent) will not be a large burden. Though we can be smart about it by counting arguments passed to the constructor, if necessary.
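For illustration, here is one way the argument-counting trick could look (a sketch, not a committed design): the constructor only performs its configuration when values were actually passed, and onResetEvent returns this so the chained pull().onResetEvent() style above keeps working.

var Vector2d = Object.extend({
    "init" : function (x, y) {
        // Skip configuration when pulled from the pool with no arguments.
        this.x = arguments.length ? x : 0;
        this.y = arguments.length ? y : 0;
    },
    "onResetEvent" : function (x, y) {
        this.x = x;
        this.y = y;
        return this;
    }
});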

Conclusion

With that, we should be on our way to a very streamlined, GC-friendly game engine! Next time I'll talk about optimal array management; speeding up node addition/removal in the scene graph and object pool.

Saturday, March 22, 2014

melonJS should be *All About Speed*

I've been working on melonJS since I first used it in July 2012. It's now over 1.5 years later, and we've learned a lot as a team. One particular area of melonJS on which I have spent a lot of time is performance. And in this article, I will try to describe current pain points and what we are doing to make them better.

TL;DR: A super fast prototypal inheritance microlib for the modern web. https://github.com/parasyte/jay-extend

Simple Inheritance

melonJS has always used John Resig's Simple Inheritance pattern, like many of the HTML5 game engines available. While looking for things to optimize, I stumbled upon some funny code with regular expressions in the Object.extend helper function. If you've never looked at the source, you can head over to John's original article where he introduced it. The RegExp is a hack to determine if the browser can serialize functions into a string; in other words, it is a test to determine if the browser is capable of decompiling functions. But why on earth would it need to decompile functions? Well, for syntactic sugar, of course! To save the programmer from typing prototype so often.
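For reference, the capability check in John's original code is essentially this one-liner (paraphrased from memory): if the browser can turn a function back into its source text, the /xyz/ test matches, and a stricter pattern is used later to find methods that mention _super.

var fnTest = /xyz/.test(function () { xyz; }) ? /\b_super\b/ : /.*/;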

To see how that regular expression got there, we first have to do some code archeology. One of the best ways to illustrate is to reinvent it. All the code I will be showing here was run using jsfiddle with my browser's web console open.

Let's start with prototypal inheritance; the concept of creating a function in a prototype form (a class), then instantiating objects from that class. Here is a very simple example of a class which shows a constructor and an accessor method:

function Foobar() {
    this.num = 123;
}

Foobar.prototype.setNum = function (num) {
    return this.num = (typeof(num) === "number") ? num : this.num;
};

In this example, the function named Foobar is the class, and its body is the constructor. Then the class prototype is modified to include an accessor method. Let's instantiate an object with this class and try it out:

console.clear();

var f = new Foobar();
console.log("Expect 123:", f.num);
console.log("Expect 9000:", f.setNum(9000));
console.log("Expect 9000:", f.setNum("abc"));

If you run the code, you'll see that it does work as intended! This is the classic example of prototypal inheritance in JavaScript, and you will see it used in tons of libraries today. But as you start to add more and more methods to your class, you will be typing prototype quite a lot. A little syntactic sugar can hide prototype by iterating over a descriptor which defines the methods you want added to the prototype.

Object.prototype.extend = function (Class, descriptor) {
    for (var method in descriptor) {
        Class.prototype[method] = descriptor[method];
    }
}

function Foobar() {
    this.num = 123;
    this.str = "xyz";
}

Object.extend(Foobar, {
    "setNum" : function (num) {
        return this.num = (typeof(num) === "number") ? num : this.num;
    },
    "setStr" : function (str) {
        return this.str = (typeof(str) === "string") ? str : this.str;
    }
});

I've added a second accessor method to our Foobar class, and also created a new function called Object.extend (same name as used in Simple Inheritance, by the way) which provides the syntactic sugar we were looking for. Hooray! No more prototype in our class definitions. And here's another quick test to prove that it works:

console.clear();

var f = new Foobar();
console.log("Expect 123:", f.num);
console.log("Expect 9000:", f.setNum(9000));
console.log("Expect 9000:", f.setNum("abc"));
console.log("Expect xyz:", f.str);
console.log("Expect abc:", f.setStr("abc"));
console.log("Expect abc:", f.setStr(9000));

Now what about inheritance? In JavaScript, you can do single inheritance quite simply by replacing the class prototype with a copy of the prototype from another class. There's a function built into the language for doing exactly that: Object.create.

(Side note: Object.create is a relatively recent addition to the JavaScript language, and Simple Inheritance pre-dates its inclusion. So Simple Inheritance uses a trick involving the instantiation of the object itself with a little hack to prevent the constructor from running during the prototype creation phase. See John's article for more details on that little catch.)

In order to create the inheritance chain, we need to modify Object.extend to return an empty class whose prototype is given a copy of the base class (thanks to Object.create). Because the empty class has its own constructor, we need a way to run the user's constructor. Again, following the Simple Inheritance API, we'll adopt init as the name of the user's constructor, and simply call that if it exists.

Object.prototype.extend = function (descriptor) {
    function Class() {
        if (this.init) {
            this.init.apply(this, arguments);
        }
        return this;
    }

    Class.prototype = Object.create(this.prototype);

    for (var method in descriptor) {
        Class.prototype[method] = descriptor[method];
    }

    return Class;
}

We'll use this updated Object.extend to recreate our base class Foobar, then create a Duck class that inherits from Foobar.

var Foobar = Object.extend({
    "init" : function () {
        this.num = 123;
        this.str = "xyz";
    },
    "setNum" : function (num) {
        return this.num = (typeof(num) === "number") ? num : this.num;
    },
    "setStr" : function (str) {
        return this.str = (typeof(str) === "string") ? str : this.str;
    }
});

var Duck = Foobar.extend({
    "speak" : function () {
        return "quack";
    }
});

And then we'll write a simple test to verify it all works as I claim:

console.clear();

var d = new Duck();
console.log("Expect 123:", d.num);
console.log("Expect 9000:", d.setNum(9000));
console.log("Expect 9000:", d.setNum("abc"));
console.log("Expect xyz:", d.str);
console.log("Expect abc:", d.setStr("abc"));
console.log("Expect abc:", d.setStr(9000));
console.log("Expect quack:", d.speak());

There you go! The reinvention of Simple Inheritance is complete. Well, almost! What happens if you want to override a method, instead of just adding new ones? You can do that with the above code, sure, but you probably want to run the overridden method, as well. This is done by going back to the prototype chain, again! Let's modify Duck to append "quack" to every string set:

var Duck = Foobar.extend({
    "speak" : function () {
        return "quack";
    },
    "setStr" : function (str) {
        return Foobar.prototype.setStr.call(this, str + " quack");
    }
});

This prototype mess is how we call a method on the super class in JavaScript! Yes, it's normal. Yes, it's ugly. Yes, that's why Simple Inheritance adds more sugar in the form of this._super! The problem to recognize is that binding the instantiated object reference (this) to a method on the prototype chain is hard to do in a generic way. John's solution is creating a proxy method called _super which redirects method invocations to the same method name on the super constructor. It manages this via a series of closures to retain scope in the prototype chain.

In order to create each proxy method, Simple Inheritance adds checks to the descriptor iteration to find methods (functions). For each method found, a RegExp test (!) is run to determine whether a proxy should be added. This, at long last, answers our original question, "why does Simple Inheritance need to decompile functions?" Frankly, it doesn't need to decompile anything! But as an optimization, it won't add the proxy to methods that don't need it. And it determines which methods need the proxy by decompiling the function; looking for a string that matches _super. So if _super is ever used in the method, it gets a proxy that adds the _super property!
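Paraphrased (not verbatim), the proxy creation looks roughly like this: only methods whose decompiled source matches fnTest get wrapped in a closure that points this._super at the parent's method for the duration of the call.

for (var name in prop) {
    prototype[name] =
        typeof prop[name] === "function" &&
        typeof _super[name] === "function" &&
        fnTest.test(prop[name]) ?
        (function (name, fn) {
            return function () {
                var tmp = this._super;
                this._super = _super[name];          // the parent's method
                var ret = fn.apply(this, arguments); // run the override
                this._super = tmp;                   // restore the old value
                return ret;
            };
        })(name, prop[name]) :
        prop[name];
}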

"But isn't decompiling functions slow?" ... Yeah, probably!

"And isn't running a regular expression over a long function-as-a-string also slow?" ... I would say so!

*sigh*

And so began my adventure to make Simple Inheritance fast! Actually, I did so much of my own work here to reinvent Simple Inheritance that I'm just going to release it on its own, with a nod to John Resig for the inspiration.

I made _super fast by adding the proxy within the Class constructor, rather than during prototype creation time. This means that none of the methods get proxied automatically; instead, _super is the proxy. After going through multiple failed tests, I found the best interface for _super is actually closer to Python than Simple Inheritance. Notice that the Python super() function takes two arguments; a reference to the super class, and a reference to the instantiated object.

All that means is _super in Jay Inheritance looks like this:

var B = A.extend({
    "foo" : function (arg1, arg2) {
        this._super(A, "foo");
        this._super(B, "bar", [ arg1, arg2 ]);
    },
    "bar" : function (arg1, arg2) {
        // ...
    }
});

Quite a bit different from Simple Inheritance! And there is also a big difference between my final implementation and Python's super() function, which returns a proxy to the super class. If I were to do that, it would require another iteration, which I'm not willing to do, for the sake of raw performance. So instead, my _super interface accepts a reference to the super class, and a method name, followed by an array of arguments that will be proxied to the super class method.

Requiring a reference to the super class also allows calling sibling methods on the class, without going through the normal prototype chain. This is useful for base classes where you don't want to call overridden methods. Simple Inheritance does not have this functionality in syntactic sugar form; you'll have to use the ugly prototype mess for that.
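A minimal sketch of the idea (not the actual jay-extend source): _super is attached once in the Class constructor, and it simply looks the named method up on the given class's prototype and applies it to the current instance, so no per-method wrapping is ever needed. Passing the class itself instead of its parent is what makes the sibling-method call in the example above work.

function Class() {
    // _super itself is the proxy; no methods are wrapped automatically.
    this._super = function (superClass, method, args) {
        return superClass.prototype[method].apply(this, args);
    };
    if (this.init) {
        this.init.apply(this, arguments);
    }
    return this;
}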

As an added bonus, I also include mixin support, which allows adopting methods from a class without inheriting its prototype.

Finally, I've wrapped all of this knowledge into a single project called Jay Inheritance, named after yours truly. Here is the official gist: https://gist.github.com/parasyte/9712366 and I also have the same code on jsfiddle, where the tests can be run directly: http://jsfiddle.net/Qb7e8/6/ (Update, July 2015: the project has been released as jay-extend: https://github.com/parasyte/jay-extend and is available through the dependency managers npm and bower.)


Jay Inheritance is as screaming fast as classical inheritance patterns can get in JavaScript. From here, we need to refactor melonJS classes to use it, and also to make the classes much smaller! Objects with too many properties are hard for JavaScript engines to optimize. For example, the V8 engine in Chrome switches to slow-mode when an object has about 30 properties. Source: http://youtu.be/XAqIpGU8ZZk?t=25m07s Property access also uses type-specialized optimizations, so we should not be changing the data type stored in any properties, nor adding or removing properties outside of the constructor.

In the next article, I'll talk about some modifications that were recently made to Simple Inheritance in melonJS, and why this was a bad idea from a performance perspective. I'll also go over some ideas that we can use to improve the situation in the future with the inheritance pattern.