How I tried to get into game development and failed

July 17, 2017

If you read this blog regularly, you know that I usually write about DDD, functional programming, and enterprise software development best practices in general. These are the topics I enjoy talking about, as well as applying in my own projects. However, there was another field I always wanted to try my hand at: game development. The ability to write my own games was the reason why I started learning to program in the first place. And I believe many programmers had this motivation behind their careers as well.

But let’s start from the beginning.

Inception

I wrote my first game when I was about 14 years old. It was a poor version of Minesweeper - a ubiquitous game shipped with every version of Windows. I wrote it in Turbo Pascal. It was a console application where you had to enter the coordinates of a field with the keyboard. Not the best choice in terms of user experience but I had no idea how to work with the computer mouse back then.

Since then, the idea of writing a game was always in the back of my mind. I made numerous attempts to implement it but never made it through. In the beginning, it was mainly due to my impatience. I never had enough motivation to finish things that are both hard and don’t yield measurable results in the short term. Later in my career, it was more of a calculated choice. Game development takes a lot of time, is completely unrelated to what I do for a living, and doesn’t have any financial upsides. So the circumstances were never good enough to dive into this field, despite the fact that I always wanted to.

Until last year.

I don’t know if you are familiar with the phenomenon of io games, so let me give you a quick explanation. Up until maybe 2014 or so, game development meant one of a couple of options (although I’m not much into this industry, so I might be wrong on that).

The first one was small single-player flash games similar to what you can find at kongregate.com and the like. They are good for spending a couple of hours but that’s usually it. You forget about them completely as soon as you figure out the mechanics or go through the entire game content. They usually don’t possess much replayability. Only the best bring substantial monetary reward to their creators.

The second option is large AAA games like Civilization, Call of Duty and so on. Needless to say, they require too much investment to be feasible for an indie developer. There’s a better alternative nowadays - Unity. It makes the development process easier but the distribution model stays the same - you need to either publish the game on Steam or find a "traditional" distributor to work with.

Neither option worked for me in terms of the ratio between the time investment and the potential outcome. Flash games require little investment but bring too little in return. "Big" games could potentially become a hit in terms of financial results, but require years of work, even with Unity.

There’s also an option of mobile gaming but the competition there is even harsher than on PC nowadays.

And here came the io games which provided a unique combination.

They are simple, even trivial to play, but not so easy to master so there’s some learning curve involved.
They have a multiplayer which adds a high degree of replayability. It’s always more fun to play with a real player than it is with a bot.
They are easy to develop. The WebSockets technology has opened an immense opportunity for multi-player online games. No need to build a desktop client anymore, and no need to find ways to distribute your game as you can now build it using plain HTML and Javascript.

In short, an io game is a multi-player HTML game. The name "io game" is due to the fact that, historically, such games have been hosted in the .io domain. Among the first such games were agar.io and slither.io.

So, as you can see, the effort required to build an io game is not much higher than it is to develop a typical flash game. Unlike typical flash games, though, an io game can potentially attract more players because of the multi-player aspect.

And what about the potential outcome? This is where things start to become very interesting.

In July 2016 I read this article about slither.io. It says that the author of this game pulls more than $100,000 in revenue daily. A hundred. Freaking. Thousand. A day.

Me after reading the article

I decided that it was time to act. I figured that even if the article is off by 2 or even 3 orders of magnitude, I would still be doing very well should I develop something similar. So here began my game development journey.

The idea

I had lots of ideas for the game from my past failed attempts. Most of them revolved around strategy and MMO RPG games. That’s because I personally prefer strategies (such as Civilization) over action/arcade type of games.

However, I decided not to go this route as the game logic would be too complex to implement and thus the whole endeavor would become too risky. It’s hard enough to dive into a completely unfamiliar field; taking on the additional hurdle of working on a complex game would add unnecessary risk.

Also, I’m a big fan of the stairstep approach. If you want to open a new area for yourself, start small. Learn what you can from this experience, then expand into something bigger. In the case of game development, that means creating a simple game first, almost an MVP, and then moving on to something more ambitious.

The resulting game turned out to be not as simple as I expected, just as any other project would with that many unknown variables. But I’m getting ahead of myself.

So I threw away my initial plans to implement the dream game and instead focused on coming up with something simple but still fun to play. And there the idea of a robot game was born. In it, you would control a robot and try to shoot enemy robots, also controlled by real players. No 3D graphics, just top-view 2D.

Now it was time to decide which development stack to use. The client was going to be HTML + Javascript; there was no doubt about it. However, I thought about using a transpiler as I didn’t like the dynamic typing that comes with vanilla Javascript. For a short moment, I considered using F# with something like BABEL, Fable or FunScript but threw this idea away. I wasn’t proficient enough in F# (and still am not), and so I decided that the additional risk of adding another unfamiliar technology was not worth the potential benefits of using a functional language. Also, I was certain that the game would require some pretty low-level optimizations and so getting as close to bare JS as possible would be beneficial in the long run.

My second option was TypeScript. It’s not that different from pure Javascript, can be used in combination with Javascript, and provides a strong type system you can take advantage of during development. Overall, TypeScript looked like a logical choice for someone with a Microsoft stack background. It almost felt like C# for browsers.

Moving forward, TypeScript proved itself to be a good choice. The thing I like the most about it is the optionality of its type system. You can always fall back to plain Javascript where needed.

As for the server side, the choice was pretty clear to me as well. I was going to use C#. At the time I started the development (August 2016), .NET Core wasn’t mature enough, had no out-of-the-box WebSockets support, and so I stuck to the "big" .NET Framework.

Proof of concept

I quickly developed a prototype in order to check the fundamental assumptions and the technology stack. Here’s what it looked like:

Two "robots" standing across from each other

The two red squares represented robots. You could enter the game by visiting the game’s website and having a "robot" assigned to you. You could move the square around and see other players' movement.

Everything looked good overall, and so I moved on to step 2: finding a partner. I knew that taking on this endeavor on my own would mean longer development time, more problems with unfamiliar technologies, and, therefore, additional risk.

And so I asked a colleague of mine - one of the best programmers I know - to participate in this project. I laid out its potential prospects, and he agreed. He had some personal tasks he had to take care of at that time and I needed to finish my Pragmatic Unit Testing course, so we took a small break. Starting from late September 2016, however, we were working on the game day in and day out during our evenings, weekends, and holidays up until April 2017.

Looking back at this period of time, I have to say that I rarely had a period in my life when I worked so much for that long, and I consider myself a pretty hardworking guy overall. The game absorbed me completely. During those 6 intense months, I stopped working on courses for Pluralsight and scaled back my blogging activity from 1 post a week to 1 post every 2 weeks. I even skipped a New Year vacation my wife and I were supposed to take - a choice I now regret making.

OK, back to the proof of concept. The main objective of it was figuring out how to deal with lag compensation. If you read anything about online game development, you notice that the most fundamental problem there is perceived latency. You absolutely have to ensure that the delay between some action the user takes and the result of that action is minimal. There’s nothing more critical to the game dynamics than this. A noticeable lag, even when it’s not that large, can make it completely unplayable. The game should feel as if there’s no network communication at all.

And the more interactive the game is, the harder it is to achieve this kind of experience. For example, a multi-player mode in a turn-based game like Civilization doesn’t require much effort to implement. Players wait for each other’s turn anyway; they don’t see the results of other players' actions until they are complete.

It’s a different story with games like Quake. Quake or, say, StarCraft must inform your opponents about every move you make as soon as you make it. It’s crucial because those games are highly interactive; any lag could slow down the counter-measures your enemies undertake and therefore worsen their position in the game.

I brought up these two games for a reason. There are interesting stories behind the way they implemented their multi-player modes.

There are basically two ways you can play with other human beings: via a local network connection (LAN) and via the global one (WAN, the Internet). The difference in lag between the two is huge. While the ping between computers in the local network is usually within a single digit millisecond mark, the ping between machines connected via the Internet can reach hundreds of milliseconds and more.

The minimum delay a human can notice is about 100 milliseconds. Everything less than that constitutes a great experience for us, basically a no-lag situation. We do notice delays above 100 ms. And the bigger the delay is, the more it irritates us. A big lag degrades our perception of the game. You will have to come up with really great gameplay in order to make players tolerate such delays.

So the difference between LAN and WAN is that it’s very easy to create a responsive and fast multiplayer game where all players are connected via LAN. And it’s not so easy to do that when playing via the Internet.

This entails two completely different programming models. With LAN, you can implement a simple and intuitive interaction routine between all players:

Multi-player mode implementation

This is (more or less) how the multi-player mode is implemented in StarCraft. When you command a soldier to move, he doesn’t move right away: your computer sends a message to other players first and applies the change only when it receives its own message back. Basically, all computers modify their state simultaneously; the initiator doesn’t have a priority here.

This works due to the fact that the LAN connection is super fast. You can send tons of messages to each other and the delay would be unnoticeable. Which is perfect as it enables a very simple programming model.

The situation changes when you start communicating via the Internet. You no longer can just make the player wait while the messages make a full circle before showing the effect of their actions. This would make the game virtually unplayable.

That is what Quake’s developers experienced when they tried to use the multi-player mode they developed for LAN in the setting of the Internet. The increased lags made the game unresponsive to the point of being unusable. John Carmack then had to make a full re-write of the multiplayer engine in order to compensate for those lags. Here are a couple of articles where you can read more about the compensation logic: one, two.

In short, you have to treat the player’s character differently from his enemies. You can no longer afford the simplicity of the programming model built for LAN. The reason is that the player is more sensitive to his own moves than he is to the moves of his enemies. When the user starts firing, he expects the rifle to respond instantly; no delays are acceptable here. But it’s not the case for the fire started by his enemies. There’s no way for the user to know when they started firing anyway unless the players sit right next to each other.

So on one hand, you must show the player’s actions in the present time: any move or update performed on the main character must be shown immediately. This is done by client-side prediction. Here’s how it works. In most cases, you can safely assume that the action the player takes will be acknowledged by the server and act as if it already is. In other words, show the character responding right away. Then, when you receive the actual game state from the server, correct the local state. If the prediction is implemented correctly, the difference will be small and the correction will go unnoticed.

And that works fine in the vast majority of cases. The only time when the client and the server might go out of sync is when there’s an unexpected network lag between them. In this case, the client prediction can be off by a large margin. I’m sure that you as a gamer can remember instances of when that happened to you.

For the enemy actions, there’s no way you can do any kind of prediction as the only information you have about them is the one you receive from the server. The problem here is that this information itself comes with a delay because of the time the message needs to travel from the server to you.

So what you have here is this convoluted situation where your own character acts in the present (no delays involved), but the enemy characters are shown in the past (because of the ping time between you and the server). And that means that any time you aim at someone, you aim at the position in which that character was hundreds of milliseconds ago. Kind of like when you look at the stars, you see the state they were in years ago due to the time it takes the light to travel through space. Consequently, when you shoot at a moving enemy, you will miss, even though you have the enemy right in your sights.

So how do games like Quake deal with this kind of issue? When they see a message from your client about trying to shoot someone, they rewind the game scene to see where exactly your aim was at the time of shooting. If you indeed targeted the victim correctly, you will get the hit. The downside of this technique is that the victim can potentially get hit even after they hide. From a victim’s perspective, it could be that they quickly pass some open space, hide behind a wall, and then get killed by a lagged shot. Again, kind of like the relativity of simultaneity - one of the concepts in the special theory of relativity in physics.

Overall, though, it’s a small price to pay for the opportunity to get a better user experience in the face of a slow Internet connection.
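To make the rewind idea more concrete, here is a minimal sketch of how a server could keep a short history of positions and check a shot against the past. It only illustrates the general technique: the names (`PositionHistory`, `wasHit`), the circular hit area, and the assumption that positions are recorded in increasing time order are all made up for this example and are not taken from Quake or any other real game.

```typescript
// Hypothetical names throughout; an illustration of the general "rewind" idea.
interface PositionRecord {
    time: number; // server time in milliseconds
    x: number;
    y: number;
}

class PositionHistory {
    private records: PositionRecord[] = [];
    private readonly maxAgeMs: number;

    constructor(maxAgeMs: number) {
        this.maxAgeMs = maxAgeMs;
    }

    // Store the latest position and forget records that are too old to matter.
    // Assumes calls arrive in increasing time order.
    record(time: number, x: number, y: number): void {
        this.records.push({ time, x, y });
        const cutoff = time - this.maxAgeMs;
        this.records = this.records.filter(r => r.time >= cutoff);
    }

    // Latest known position at or before the given time, if any.
    at(time: number): PositionRecord | undefined {
        let result: PositionRecord | undefined;
        for (const r of this.records) {
            if (r.time > time) {
                break;
            }
            result = r;
        }
        return result;
    }
}

// Decide a hit against where the target was when the shot was fired,
// not where it is now.
function wasHit(
    history: PositionHistory,
    shotTime: number,
    aimX: number,
    aimY: number,
    hitRadius: number
): boolean {
    const past = history.at(shotTime);
    if (past === undefined) {
        return false;
    }
    return Math.hypot(past.x - aimX, past.y - aimY) <= hitRadius;
}
```

With this shape, a shot aimed at where the target stood at the time of firing still counts, even if the target has since moved away, which is exactly the "killed behind the wall" effect.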

Alright, why am I telling you all this? I just wanted you to know how important this topic is for any kind of highly interactive multi-player game. My colleague and I tried to tackle it first, before moving forward with the actual game implementation. For about a couple of weeks or even a month, all we had gameplay-wise were red squares moving around and shooting each other with black dots.

Finally, after some period of trial and error, we had a good solution to deal with lags. In order to test it, we inserted random artificial lags in two directions: from client to server and from server to client. The application worked around those lags pretty well; they were mostly unnoticeable for the player.
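Artificial lag of this kind can be simulated with a small wrapper that holds every message for a random amount of time before handing it over. The sketch below is only an assumption about how such a test helper might look: the `LaggyChannel` name, the explicit `now` parameter, and the injectable random function are all hypothetical. It keeps messages in order, since a WebSocket connection does not reorder them.

```typescript
// Hypothetical test helper; names and structure are assumptions.
class LaggyChannel<T> {
    private queue: { deliverAt: number; message: T }[] = [];
    private lastDeliverAt = -Infinity;
    private readonly minLagMs: number;
    private readonly maxLagMs: number;
    private readonly random: () => number;

    constructor(minLagMs: number, maxLagMs: number, random: () => number = Math.random) {
        this.minLagMs = minLagMs;
        this.maxLagMs = maxLagMs;
        this.random = random;
    }

    // Queue a message with a random delay between minLagMs and maxLagMs.
    send(message: T, now: number): void {
        const lag = this.minLagMs + this.random() * (this.maxLagMs - this.minLagMs);
        // Never deliver before an earlier message, so ordering is preserved.
        this.lastDeliverAt = Math.max(this.lastDeliverAt, now + lag);
        this.queue.push({ deliverAt: this.lastDeliverAt, message });
    }

    // Return (and remove) all messages whose delay has elapsed by `now`.
    receive(now: number): T[] {
        const due = this.queue.filter(e => e.deliverAt <= now);
        this.queue = this.queue.filter(e => e.deliverAt > now);
        return due.map(e => e.message);
    }
}
```

One such channel per direction (client to server and server to client) is enough to reproduce a slow connection on a local machine.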

What we ended up with were three models. The first one was client prediction for the player’s character. It was implemented using a technique similar to event sourcing. Whenever the player did something with the robot, the client sent a message to the server and applied that action to the local game scene right away to avoid delays. The event also got recorded to a local list of pending events. When the client received an updated game scene from the server, this update contained the number of the last event the server had processed for that player. The client then overrode the player’s position with the one from the server and reapplied all pending events starting from the first one not yet processed by the server. This ensured a smooth and responsive experience when the player controlled his character.
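The prediction-and-replay routine described above could look roughly like the sketch below. It is a simplified illustration, not the game's actual code: the message shapes (`MoveEvent`, `ServerUpdate`), the `PredictedPlayer` class, and the idea that an event is just a position delta are all assumptions made for the example.

```typescript
// Hypothetical message shapes; the real formats are not shown in this post.
interface MoveEvent {
    seq: number; // sequence number assigned by the client
    dx: number;
    dy: number;
}

interface ServerUpdate {
    x: number;
    y: number;
    lastProcessedSeq: number; // last event the server applied for this player
}

class PredictedPlayer {
    x = 0;
    y = 0;
    private nextSeq = 1;
    private pending: MoveEvent[] = [];

    // Apply the input locally right away and remember it as pending.
    // The returned event is what would be sent to the server.
    move(dx: number, dy: number): MoveEvent {
        const event: MoveEvent = { seq: this.nextSeq++, dx, dy };
        this.apply(event);
        this.pending.push(event);
        return event;
    }

    // Take the authoritative position, drop acknowledged events,
    // then replay the ones the server has not processed yet.
    reconcile(update: ServerUpdate): void {
        this.x = update.x;
        this.y = update.y;
        this.pending = this.pending.filter(e => e.seq > update.lastProcessedSeq);
        for (const event of this.pending) {
            this.apply(event);
        }
    }

    pendingCount(): number {
        return this.pending.length;
    }

    private apply(event: MoveEvent): void {
        this.x += event.dx;
        this.y += event.dy;
    }
}
```

If the server agrees with the client, replaying the pending events lands the character exactly where it already was, so the correction is invisible. Only a genuine disagreement produces a visible jump.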

The second model was for the enemy characters. When the client received updates about the enemies, it didn’t just change their positions on the local game scene; it interpolated them. Interpolation means that the application showed the transition between the previous and the current states smoothly, without weird and quirky jumps.
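As a rough illustration of interpolation, the sketch below blends linearly between the two most recent snapshots of an enemy. The `Snapshot` shape and the `interpolate` function are hypothetical names for this example, and linear blending is an assumption; the post does not say which kind of interpolation was used.

```typescript
// Hypothetical snapshot of an enemy as received from the server.
interface Snapshot {
    time: number; // milliseconds
    x: number;
    y: number;
}

// Position to draw at renderTime, somewhere between the two snapshots.
function interpolate(prev: Snapshot, next: Snapshot, renderTime: number): { x: number; y: number } {
    const span = next.time - prev.time;
    // Fraction of the way from prev to next, clamped to [0, 1].
    const t = span > 0 ? Math.min(1, Math.max(0, (renderTime - prev.time) / span)) : 1;
    return {
        x: prev.x + (next.x - prev.x) * t,
        y: prev.y + (next.y - prev.y) * t,
    };
}
```

Drawing the enemy at the interpolated position each frame turns a handful of updates per second into continuous movement.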

And for the bullets, we used extrapolation, which means that the client simulated the position of the bullet in the future and displayed it a little bit ahead of the actual position received from the server.

During the development period, we experimented with these models further and made small adjustments to them. The final version ended up using extrapolation for both bullets and the enemies. The reasoning behind the initial design was that it was easy to simulate the position of a bullet because, once created, its speed was constant and could not change. As for the enemy robots, they could move unpredictably and so it seemed too risky to extrapolate them, given that this extrapolation would most likely be incorrect. But as it turned out, this fear was unfounded. Even though the enemies did move randomly, the resulting position corrections were small and hard to notice. At the same time, extrapolating the enemy positions meant that all objects on the game scene resided in the present time relative to the player’s character. And that simplified a lot of things for us moving forward.
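In its simplest form, extrapolation just projects the last known state forward along its velocity. The sketch below assumes the server sends a position together with a constant velocity in units per millisecond; the `MovingState` shape and the `extrapolate` function are hypothetical names for this example.

```typescript
// Hypothetical state of a moving object as last reported by the server.
interface MovingState {
    time: number; // milliseconds, when the state was valid
    x: number;
    y: number;
    vx: number; // velocity, units per millisecond
    vy: number;
}

// Estimate where the object is at `now`, assuming it kept its velocity.
function extrapolate(last: MovingState, now: number): { x: number; y: number } {
    const elapsed = Math.max(0, now - last.time);
    return {
        x: last.x + last.vx * elapsed,
        y: last.y + last.vy * elapsed,
    };
}
```

For an object with constant speed, such as a bullet, the estimate stays correct until the next update arrives. For anything that can change direction, the next update simply corrects the guess.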

This lag compensation algorithm worked (and still works) extremely well. With a decent Internet connection, the game feels like you are playing it locally. You can see lags only if your connection is slow or if you are physically too far away from the server (which at the time of writing resides in US East).

Not only that, the algorithm also allowed us to send fewer messages from the server compared to other similar games. Because of the extrapolation logic we implemented, the client behaved great even with fewer data points. And that meant less work for the server (and thus better performance) and lower traffic costs.

Issues with client garbage collection

So, getting back to our story timeline. In parallel with the compensation mechanism, we worked on issues that arose on the client. The game delivered a good 60 FPS for the first minute or two but then the rate plummeted to 10 or even 5 FPS. Which was an obvious show stopper.

We fixed the issue pretty quickly. It turned out we instantiated the game scene several times, hence the immense pressure on the Javascript garbage collector. Another change that helped a lot was combining all game loops into a single one. Game scene rendering, user input processing, and applying updates from the server - all this was processed by a single handler now.
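The "single handler" idea can be sketched as one loop object that runs every per-frame step in a fixed order. This is only an illustration under assumptions: the `GameLoop` class and the `Step` type are hypothetical names, and the actual order of the steps in the game is not stated in this post.

```typescript
// A step is anything that needs to run once per frame.
type Step = (now: number) => void;

class GameLoop {
    private readonly steps: Step[];

    constructor(steps: Step[]) {
        this.steps = steps;
    }

    // One frame: run every step in the order it was registered.
    tick(now: number): void {
        for (const step of this.steps) {
            step(now);
        }
    }
}

// In a browser, the loop would be driven by a single requestAnimationFrame chain:
//
//   const loop = new GameLoop([processInput, applyServerUpdates, render]);
//   const frame = (now: number) => { loop.tick(now); requestAnimationFrame(frame); };
//   requestAnimationFrame(frame);
//
// where processInput, applyServerUpdates and render are placeholder functions.
```

Having one entry point per frame also makes it much easier to see where time and allocations go when profiling.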

By the way, we considered using an existing UI engine for the game but decided not to do so. A 3rd party engine meant giving away a lot of control over the low-level implementation details which could potentially affect our ability to deal with performance issues down the road. Also, we estimated that the time needed to learn that engine would be roughly the same as the time to learn how to implement the same features using plain HTML canvas. The game was a 2D shooter, so the rendering part wasn’t too complicated anyway.

The game client optimization, especially garbage collection optimization, was our ongoing task moving forward. We sacrificed some programming best practices in order to improve the performance. Specifically, we gradually departed from the use of immutable data structures on the client in favor of mutable ones just to decrease the pressure on the GC. Here’s, for example, what our Vector class ended up looking like:

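export class Vector {
    // Coordinates are mutable on purpose: instances get reused instead of reallocated.
    private _x: number;
    private _y: number;

    public getX(): number {
        return this._x;
    }

    public getY(): number {
        return this._y;
    }

    // Overwrites this vector in place, so no new object is created for the GC to collect.
    public set(x: number, y: number): void {
        this._x = x;
        this._y = y;
    }
}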

Note the set method. I wouldn’t recommend this practice to anyone working on a typical enterprise application :)

Issues with server garbage collection

It was October 2016 and the lag compensation mechanism worked well for the most part. But there still was something strange in how the game behaved. From time to time, the lag between server and client spiked. So much that the prediction algorithm couldn’t handle it and the game showed sudden jumps for the objects on the game scene.

I rechecked the math, double checked all prediction rules, tried to change them, but none of that helped. Then we turned to the server. We added extensive logging to catch exactly when that happened, started monitoring CPU and memory usage and sure enough, a pattern soon emerged:

Issues with server garbage collection

This was a VM in Azure which we used as our test game server. Note that in that picture, the server was handling a single game with no players connected to it. So it’s not that we were overloading it or something.

The server ought to have been processing the game scene every 15 milliseconds, and it did for a while. Then the time between ticks increased to 20-30 ms and then it dropped again. This variation resulted in sending game updates at unequal intervals which ultimately led to objects jumping on the game scene.

We tried to tame the garbage collector, tried to tweak its settings, increase and decrease the time between its executions, but nothing helped.

After further investigation, it turned out that we also had an issue with the WebSockets library we used on the server. It didn’t handle the volume of messages we tried to send with it. Or rather, it did handle it but sent those messages with unpredictable delays which also added to the perceived game lag. Although, that could have been due to the issues with the garbage collection.

After days of trial and digging into the WebSockets implementation, we decided that enough is enough. Screw it, we were going to rewrite our server in NodeJs.

It took us several days to convert our code from C# to TypeScript. And when we finally did that, a miracle occurred. All those problems we had been dealing with for the better part of October just went away. No lags. No issues with unequal distribution of messages from the server. No performance problems. Nothing.

Note that it was the exact same code converted from C# into TypeScript; the implementations were identical. We just copied .cs files into .ts ones and fixed the compiler errors. The only exception was the WebSockets implementation: we used one of the NodeJs libraries for that.

I must say that I was skeptical about this whole "Javascript on the server" thing in the past. To me, it always looked like a dumb idea. Apparently, I was wrong. NodeJs proved to be a good solution for our problem: it is fast, stable, and easy to work with. And TypeScript allowed us to keep the strong typing and not deal with the bare bones JS, so I had nothing to complain about.

There were two other benefits we had after the rewrite. First, we were able to share the code between the client and the server. Which was a big thing as they had a lot of logic in common. Second, it opened better hosting opportunities as Linux is cheaper than Windows Server and is much more prevalent. Currently, we use Ubuntu as our host OS.

Change of game concept

Alright, I’m getting ahead of myself once again. So in the beginning of November, we had a working prototype where we solved the majority of our hardest problems. We had red squares firing at each other with black dots and they were doing that very well in spite of lags and poor Internet connection (which we simulated on both the server and the client). It was time for the actual gameplay and content development.

As I mentioned earlier, we were going to create a 2D top-view robot game where you control your character with WASD or arrows and shoot enemy robots with your mouse.

We found some robot animation on the Internet (as a temporary solution). Meanwhile, we started looking for a freelance designer who could draw the game art for us.

This is what those temporary robots looked like:

Temporary robots

The robot foundation rotated independently from its head. You could move in one direction and point the robot’s head with your mouse in another. These two features - movement and firing - were completely separate. This made the game more fun in my opinion. The alternative solution - where the robot’s head direction depended on the direction of its legs - would have been more accurate but then again you would have to think more about how to control your robot instead of focusing on the game’s main goal - shooting everyone around you. We tried to make the game as simple as possible for the players.

Two more game concepts we started implementing were walls and perks. Walls added more tactical variety as they allowed you to hide behind them and be safe from the enemy fire. Perks did the same thing but from a different angle: they allowed your robot to have a temporary advantage in terms of speed, health, firepower, and so on. Walls were destructible, so you still could get your enemy; it just took more time as you needed to destroy the wall in front of him first. As for perks, they appeared on the map from another game object - a crate. Both walls and crates respawned on the map periodically.

The development moved fast after we finished with the lag compensation algorithm and fixed our problem with garbage collection.

This is what it looked like soon:

In this video, my colleague shoots two lonely bots.

As a side note, I must say that the game we were developing was heavily inspired by NES’s Battle City:

Battle City for NES

I don’t know how famous this game was here in America but in Russia, everyone of roughly my age played this game back when they were kids. To say this game was famous in Russia is an understatement. Everyone knew and loved it.

And by the way, the bricks in the video are from Battle City. This was also a temporary solution, of course; we were going to replace them afterward.

We implemented lots of things during November: collision detection, score calculation, HP, perks, centering (where your robot always stays at the center of the screen and it’s the other game objects that move when you walk, not your robot), and so on and so forth. We also added bots. We knew that in the beginning, there would be few real players online, so the game needed to offer those players something even when there’s no one else playing it yet.
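Of the features listed above, centering is the easiest one to show in code: every object is drawn at its position relative to the player, shifted by half the canvas size. The sketch below is a minimal illustration with hypothetical names (`Point`, `worldToScreen`), and it assumes world and screen coordinates use the same scale.

```typescript
interface Point {
    x: number;
    y: number;
}

// Convert a world position into canvas coordinates so that the player
// always ends up in the middle of the canvas.
function worldToScreen(object: Point, player: Point, canvasWidth: number, canvasHeight: number): Point {
    return {
        x: object.x - player.x + canvasWidth / 2,
        y: object.y - player.y + canvasHeight / 2,
    };
}
```

Passing the player's own position as `object` always yields the canvas center, which is why everything else appears to move while the player stays still.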

This is a video of a robot fighting with lots of bots at once:

Note the circle around him. It represented a force shield which made the player invincible for some period of time.

And here’s a video of us both fighting against evil hordes (well, a couple of dumb bots):

By the way, in this video, you can notice that two bullets annihilate each other when they come close. We had this feature for some period of time. And it added a nice tactical capability - dodging the enemy bullets by destroying them with your own. But we removed it later when we were optimizing the game. Collision detection of bullets consumed a lot of CPU time, and we decided it wasn’t worth it. More on this later.

In mid-November, we found a designer and got estimates from her as to how much it would cost to create a robot animation. It was a lot, and not because she was overpricing this work. An animation similar to what you can see in the video is genuinely hard to create; there are quite a few high-resolution details that need to be worked out.

After some thought, I proposed to change the game concept. Let’s turn robots into tanks, I said. There were several reasons behind this proposal. First, tank games are ubiquitous and the players would have a much easier time relating to them. Second, with the top view we had in the game, we could show tanks in much more detail than robots, simply because robots don’t look that attractive from the top, but tanks do. And finally, the animation was simpler and the designer said it was going to be much cheaper for us to have tanks instead.
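To get a feel for why bullet-to-bullet collisions can get expensive, consider a naive check that compares every bullet with every other one. This is an assumption made purely for illustration (the post does not describe how the check was actually implemented), and the `Bullet` shape and `findAnnihilated` function are hypothetical names.

```typescript
interface Bullet {
    ownerId: number; // which player fired it
    x: number;
    y: number;
}

// Returns the indices of bullets that come within `radius` of a bullet
// fired by another player. Every pair is visited, so the work grows
// quadratically with the number of bullets on the scene.
function findAnnihilated(bullets: Bullet[], radius: number): Set<number> {
    const hit = new Set<number>();
    const radiusSquared = radius * radius;
    bullets.forEach((a, i) => {
        bullets.forEach((b, j) => {
            // Skip pairs already handled and bullets from the same player.
            if (j <= i || a.ownerId === b.ownerId) {
                return;
            }
            const dx = a.x - b.x;
            const dy = a.y - b.y;
            if (dx * dx + dy * dy <= radiusSquared) {
                hit.add(i);
                hit.add(j);
            }
        });
    });
    return hit;
}
```

With a few hundred bullets in flight and a check on every server tick, the number of pairs quickly climbs into the tens of thousands per tick.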

So here we were. We decided to switch from robots to tanks and started working on our first tank design.

Conclusion

Heck, I’m 5 thousand words into this post already, and have only covered everything up to November. There are still at least as many things to write about that happened between December and April.

Let me know in the comments below if you want to read the rest of the story. What I think I could do is split it into two or three parts and call this one the first.