Water Simulation in the Blink of an Eye

Fluid simulations, especially non-compressible fluid simulations, are certainly not a new idea. They often are used for simulating aerodynamics, flow throughout pipes, heat transfer, you know… practical purposes. Unfortunately, highly realistic fluid simulations don’t really make their way into video games. Realistic simulations could add interesting, emergent, gameplay and help breathe life into whatever game they’re in. Some approximations have become somewhat commonplace in higher budget games like in Sea of Thieves. Some simplified simulations are seen in various indie games, like Noita. Furthermore, smoke particle effects can be pretty convincing. However, really accurate, interactive, fluid simulations aren’t seen very often because they are very expensive to compute. In reality, realistic water simulations are actually quite doable… just not for us three dimensional beings.

Changing the dimensions of our simulation from 3D to 2D moves a fast and accurate water simulation into the realm of possibilities as the algorithmic requirements loosen greatly. Unfortunately, this removes some practical applications but doesn’t remove the possibility of making a game featuring it! This now leaves us with a choice, do we want our two dimensional simulation to be from a bird’s eye view, or from the side, like an aquarium. I chose to go for a bird’s eye view (top-down) approach because that lends itself better to the type of game I want to make, an action-adventure game. An aquarium (side-view) style fluid simulation would be better suited for a game with more platformer elements, like Super Mario.

Different 2D perspectives in video games

“Basics”

There are two main ways of going about making realistic fluid simulations (in both 2D and 3D), Lagrangian and Eulerian. Lagrangian simulations use a bunch of small particles to represent clumps of water that move together and interact with other clumps of water. Eulerian simulations treat the water like a grid of cells, each containing the average properties of the water in them. For our particular need, an Eulerian water simulation works better as it seems to be more geared for top-down situations. It also allows for other systems to more easily access and change the data of the water which is important if we want to make a game involving it.

Different ways of simulating fluids

This isn’t really supposed to be guide on how to make a fluid simulation, so I’ll spare you some of the gritty details in making it initially work. There are a couple videos and papers on that part already. In short, a realistic water simulation should abide by the Navier-Stokes equations. The ideas of these equations are translated into a few different types of functions: diffuse, advect, and project. These rules are enough to make a realistic 2D simulation but there aren’t really any features yet such as adding water to the system. Not to mention that many of online resources only make the simulation take place in a big square which isn’t very interesting.

Features

The first big issue is that the shape of our fluid simulation should be able to twist and turn in order to make for interesting levels in the game. More specifically, we want to be able to enforce boundaries of some sort where our simulation isn’t simulating. Fortunately, online resources do cover the idea of how these boundaries should behave so we don’t need to be mathematicians to make boundaries work more generally. There are plenty of little decisions I made when implementing boundaries: how they are stored, how the boundary conditions are enforced quickly, what happens inside of the boundaries. Some of the answers to these questions are clearer than others but I like to think that my solutions work pretty well.

Gravity was also an important feature to add as it is what allows for the water to naturally accumulate speed. Flowing water only gains velocity because of gravity as some of the potential energy of the water is converted into kinetic energy according to the standard energy equations. Implementing gravity involved adjusting the velocity according to the height difference of the surrounding cells by using said energy equations. Now with a simple source of water and sink, the water naturally flows just like you’d expect it would.

What can be achieved with boundaries, gravity, sources, and sinks

Profiling

Now that the simulation is realistic, it’s time to try and make it fast. Since each particular instance of the fluid simulation in the game is shaped differently, and of different sizes, trying to measure all of their performances against each other is like comparing apples to oranges. For simplicity, I profile the fluid simulation based on a large area where almost the entire area is part of the simulation and serious computation is happening. While this particular worst-case scenario might not ever appear in game, if it runs smoothly the game will surely perform well for simpler cases.

First comes some basic measurements in order to identify what to focus on as well as figuring out if we even need to optimize in the first place. When I first started to seriously get into optimizing, the test scenario took roughly 9 milliseconds (ms). For a game trying to reach 60fps, it takes up more than half the time in a frame. This is really bad as we might not be able to maintain our framerate with everything else going on during the game loop. Luckily for us, most of the runtime of the simulation happened in one subroutine, “linear solve”, which takes about 1600 microseconds (us) per call. This routine is called multiple times throughout the update step and loops through every cell in the simulation several times. This lets us focus on something specific and use the ideas when optimizing that for our other functions.

Vectorizing

When performing a lot of computation on large arrays of data, like in our case, one way to drastically speed up the CPU’s performance is to vectorize the data and use SIMD operations on it. Each cell needs different computation based on whether it is in the simulation or not. This means that within one vector different actions need to happen. One main optimization of SIMD is that you’re doing the same operation to a group of data. So how do we vary what we do within a vector?

SIMD can do many computations in one instruction

I was able to use branchless programming to account for the fact that we need to take different action within a vector. I first create a mask vector based on whether each cell in the vector is being simulated or not. I then do the calculation for every lane in the vector as if they all were being simulated. Finally, I use the mask to keep the calculation for the lanes that were in the simulation, or undo the computation for lanes that weren’t supposed to be simulated. This optimization was able to make the routine about 4 times faster. However, my CPU can do 8 floating point operations at a time. Wouldn’t an 8x speed increase be expected?

Instruction Level Parallelism

When investigating the profiling data from my routine, I noticed something weird. The first load instruction was taking roughly half the time. There were about 8 load instructions per iteration of the inner loop. Why was the first one taking so long? It turns out that the first instruction needed to wait for the previous iteration to write out its data before executing because of the order the computations were performed. The solution was actually pretty simple. Instead of sequentially accessing the data, access every other chunk of data and then come back and fill in the gaps. Essentially do the even indices, then the odd ones. This removed any local dependency chains as one iteration didn’t rely on the previous one’s data.

After some other small tweaks, our routine runs around 12 times faster then it did starting off. But how do we know when to stop? Can we do even better? Using the profiling info, I figured out that each iteration took approximately 8 cycles where we do roughly 10 reads/writes and 12 arithmetic operations. Knowing that my CPU can do 2 read/writes and 2 arithmetic operations per cycle, it seemed like 8 cycles was about as good as it is going to get.

CPUs have different capabilities for different types of instructions

Hardware Acceleration

One other direction to go when trying to speed up the simulation time is by utilizing the GPU by converting all the subroutines into compute shaders. I implemented this idea fully and unfortunately the shader approach wasn’t quite as good as the SIMD approach. Since the data would need to be transferred to the GPU and back to the CPU every frame, too much time is spent even though the GPU does its computations almost instantly just from the time spent moving the data. The shader strategy is also much more complicated and harder to debug than just using the CPU. 

This isn’t to say that trying to use the GPU for the simulations will never work, just that my first run at it was only a mild success. Eventually I might try to use the GPU again with better support for multithreading so the CPU doesn’t need to wait for the GPU as much. I’m more optimistic about rendering the simulation with the GPU in order to properly show its detail as that doesn’t need to communicate back to the CPU.

Conclusion

So it seems like making real time fluid simulations for games is definitely possible. One full iteration of the simulation was able to be reduced from about 9 ms to 750us, a roughly 12x speed increase after optimizing all the other subroutines using the strategies I mentioned for “linear solve”. In order to further boost performance, multithreading can be used to updated several simulation instances at the same time trivially. Optimizing the simulation was very interesting and I was quite surprised at how much speed could be gained. In this particular instance, more iterations of the simulation leads to a more accurate simulation so all of the optimizations were well worth it. See if these optimizations could make their way into your projects as making something an order of magnitude faster without drastic changes is almost always worth it.

Simple example of my implementation of a fluid simulation

Previous
Previous

Life Without Malloc