Who is this for?
If you are getting started with Unity’s Data-Oriented Technology Stack (DOTS), or trying to wrap your head around the terminology and what is what, this is for you.
Programming Concepts vs Unity’s Concepts
You can think of the programming concepts as the foundation of Unity’s DOTS. So, to understand DOTS, we need to understand its foundations first.
Data-Oriented Design
[Figure: CPU-cache. Source: Wikipedia]
You can think of the CPU-cache as a small, but very fast memory (RAM) on the CPU.
When the CPU needs to fetch some data, it will #1 check the CPU-cache* first. If the data is present, we get a #1.A cache-hit, and the CPU can resume processing.
If the data was not found, we instead get a #1.B cache-miss and need to fetch the value from RAM.
But the catch is that we can’t copy just an individual value from RAM. We need to #2 copy an entire chunk of memory.
* CPUs have several levels of cache (L1-cache, L2-cache, L3-cache). For simplicity I drew just one, but the process is similar: check L1; if the data is not there, check L2; if not there, check L3; if not there, check RAM.
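The lookup described above can be mimicked with a toy model. This is only a sketch and not how real hardware works: it assumes a single cache level that holds a handful of fixed-size chunks and throws away the oldest chunk when it is full. The names ToyCache, ChunkSize and Access are made up for illustration.

```csharp
using System.Collections.Generic;

// Toy model of one cache level: it remembers which chunks of "RAM" were copied in.
public class ToyCache
{
    public const int ChunkSize = 8;                          // hypothetical: addresses per chunk
    private readonly int capacity;                           // how many chunks fit in the cache
    private readonly Queue<int> order = new Queue<int>();    // oldest chunk first
    private readonly HashSet<int> chunks = new HashSet<int>();

    public int Hits { get; private set; }
    public int Misses { get; private set; }

    public ToyCache(int capacity) { this.capacity = capacity; }

    // Assumes a non-negative address.
    public void Access(int address)
    {
        int chunk = address / ChunkSize;
        if (chunks.Contains(chunk)) { Hits++; return; }      // #1.A cache-hit

        Misses++;                                            // #1.B cache-miss
        if (chunks.Count == capacity)                        // make room: evict the oldest chunk
            chunks.Remove(order.Dequeue());
        chunks.Add(chunk);                                   // #2 copy the whole chunk
        order.Enqueue(chunk);
    }
}
```

With a capacity of 4 chunks, touching addresses 0 to 63 in order gives 8 misses and 56 hits, while touching 64 addresses that are each 8 apart misses every single time.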
Why might this be a problem?
When we program using an Object-Oriented approach, using MonoBehaviours, we commonly find something like this:
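using UnityEngine;

public class MyButton : MonoBehaviour {
    // Reference to another object that lives somewhere else in memory
    public MyDoor DoorGO;
    // ... more attributes

    private void Update() {
        // ... some code logic (sets ifButtonWasPressed)
        if (ifButtonWasPressed) {
            // Explicit null check: Unity objects override == for destroyed objects
            if (DoorGO != null) {
                DoorGO.OpenDoor();
            }
        }
    }
}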
When we write code in this form, we rarely think about data locality (where data lives in memory).
The result is something like this:
Without control over where data is stored, we end up jumping around memory quite often, resulting in a bunch of cache-misses.
A solution
Now that we know how the underlying hardware operates, imagine we re-order the data from the previous scenario to look like this:
By arranging the elements sequentially, we incur a cache-miss only on the first element. The rest were copied into the CPU-cache along with it, so we get cache-hits, reducing the time it takes the CPU to process the data.
Data-Oriented Design is about designing programs that use the CPU-cache efficiently. In other words, programs that tend to reduce the number of CPU-cache misses.
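In plain C#, the difference between the two layouts shows up in the choice between classes and structs. An array of class instances stores references to objects that may live anywhere on the heap, while an array of structs stores the values themselves one after another. The sketch below uses made-up types (PositionObject, PositionValue, Layout) just to show the two shapes side by side.

```csharp
// Hypothetical types, for illustration only.
public class PositionObject { public float X, Y; }   // class: each instance is a separate heap object
public struct PositionValue { public float X, Y; }   // struct: values are stored inline in the array

public static class Layout
{
    // The array holds references; every element may live somewhere else in memory.
    public static float SumX(PositionObject[] items)
    {
        float sum = 0f;
        for (int i = 0; i < items.Length; i++) sum += items[i].X;
        return sum;
    }

    // The array holds the values themselves, one after another in memory.
    public static float SumX(PositionValue[] items)
    {
        float sum = 0f;
        for (int i = 0; i < items.Length; i++) sum += items[i].X;
        return sum;
    }
}
```

Both methods return the same result. Only the memory layout differs, and how much that matters depends on the hardware and on how the objects were allocated, so it is worth measuring rather than assuming.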
Memory Management
If you are making games in Unity, you use C#, which has its own Memory Management System.
The 👍 good thing is that we don’t have to think about the underlying memory when programming.
The 👎 bad thing is that we don’t have control over how the underlying memory is accessed / managed.
Given what we know about the CPU-cache, you can start seeing why this might be a problem.
This also has implications when talking about multi-threading.
How does it work in lower-level languages (C, C++)?
When we look at languages that are not Garbage Collected, we usually start talking about pointers to memory and manual memory allocation.
The 👍 good thing is that we gain control over how the underlying memory is accessed / managed.
The 👎 bad thing is that we have unrestricted access to memory, and we are responsible for managing it with caution and for manually removing the data that is no longer in use.
Data-Oriented Design requires the ability to manage memory allocation ourselves, so that we can place data in memory in a way that makes efficient use of the CPU-cache.
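C# is not limited to garbage-collected memory, so the manual style can be tried without leaving the language. The sketch below uses the standard Marshal class from System.Runtime.InteropServices to allocate a block, write values back to back, and free it by hand. The class and method names (ManualMemory, FillAndSum) are made up for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

public static class ManualMemory
{
    // Allocates an unmanaged block, fills it with 0..count-1, sums it, and frees it.
    public static int FillAndSum(int count)
    {
        IntPtr block = Marshal.AllocHGlobal(count * sizeof(int));  // we own this memory now
        try
        {
            for (int i = 0; i < count; i++)
                Marshal.WriteInt32(block, i * sizeof(int), i);     // values sit back to back

            int sum = 0;
            for (int i = 0; i < count; i++)
                sum += Marshal.ReadInt32(block, i * sizeof(int));
            return sum;
        }
        finally
        {
            Marshal.FreeHGlobal(block);   // the garbage collector will not do this for us
        }
    }
}
```

The try/finally is the "with caution" part: if we forget the FreeHGlobal call, the block stays allocated for the lifetime of the process.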
Multi-threading
You probably know that CPUs have multiple cores, and each core is capable of doing operations at the same time.
One core runs the engine (Unity), on what is known as the main thread. The threads running on the other cores are known as worker threads.
There are two aspects to multi-threading:
- Running a part of our application in a worker thread.
- Splitting a part of our application into several task(s) that can run in parallel.
Multi-threading is about running parts of your program in other cores, outside of the main thread, possibly even splitting them into several tasks that run in parallel.
This is easier said than done. Multi-thread programming is a deep topic, and one that is quite complex at its core. But for the purpose of this post, this is as far as we go.
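To make the two aspects a bit more concrete, here is a small sketch that splits a sum into several tasks and runs them off the calling thread. It uses .NET's Task API rather than anything Unity-specific, and the names ParallelSum and Sum are made up for illustration. It assumes taskCount is at least 1.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelSum
{
    // Splits the array into `taskCount` slices and sums each slice in its own task.
    public static long Sum(int[] values, int taskCount)
    {
        var tasks = new Task<long>[taskCount];
        int sliceSize = (values.Length + taskCount - 1) / taskCount;  // round up

        for (int t = 0; t < taskCount; t++)
        {
            int start = t * sliceSize;                                // declared per iteration, safe to capture
            int end = Math.Min(start + sliceSize, values.Length);
            tasks[t] = Task.Run(() =>
            {
                long partial = 0;
                for (int i = start; i < end; i++) partial += values[i];
                return partial;                                       // no shared state is written
            });
        }

        long total = 0;
        foreach (var task in tasks) total += task.Result;             // waits for each task to finish
        return total;
    }
}
```

Each task only reads its own slice and returns its own partial result, which is what keeps this example simple. Whether the tasks really run at the same time depends on the scheduler and on how many cores are free.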
Compiler
What follows is an oversimplification of the process, but the idea remains. Normally C# code is compiled into an Intermediate Language (IL)*. In Unity’s land, Mono is the runtime that works with this IL.
* Common Intermediate Language (CIL), formerly called Microsoft Intermediate Language (MSIL). Wikipedia.
This code is later read by a virtual machine and translated into platform-specific code at run time*.
* This is also known as just-in-time (JIT) compilation. Wikipedia.
A better way
An alternative is to compile into platform-specific code ahead of time, once for each platform we want our game to target.
And there is more 😎. By doing Data-Oriented Design we can leverage other compiler optimizations, like Single Instruction Multiple Data (SIMD) instructions.
It's outside the scope of this post, but Unity is providing us exactly this with its Burst Compiler. It's amazing 🎉💃
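To get a feel for what SIMD means, here is a sketch that adds two arrays several floats at a time. It is not Burst: it uses the Vector<T> type from .NET's System.Numerics, which falls back to ordinary code when the hardware has no support. The names SimdAdd and Add are made up for illustration, and the method assumes both arrays have the same length.

```csharp
using System.Numerics;

public static class SimdAdd
{
    // Adds two equally sized arrays element by element, a group of floats at a time.
    public static float[] Add(float[] a, float[] b)
    {
        var result = new float[a.Length];
        int width = Vector<float>.Count;        // how many floats fit in one vector
        int i = 0;

        for (; i <= a.Length - width; i += width)
        {
            var va = new Vector<float>(a, i);   // load `width` neighbouring floats
            var vb = new Vector<float>(b, i);
            (va + vb).CopyTo(result, i);        // one add for the whole group
        }

        for (; i < a.Length; i++)               // leftovers that do not fill a full vector
            result[i] = a[i] + b[i];

        return result;
    }
}
```

Notice that this only works because the floats sit next to each other in memory, which is exactly the kind of layout Data-Oriented Design pushes us towards.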
Entity Component System (ECS)
An Entity is an ID that identifies a unique entity in the world.
E.g. you can think of it as the equivalent of a GameObject's instance.
Each Entity is formed by one or more Components, each with its own data.
Systems operate over the Components of Entities that match given criteria.
E.g. in the image we see a System called MoveSystem that operates over Entities with a Location and Acceleration.
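The three pieces can be sketched in a few lines of plain C#. This is not Unity's API, and it ignores data layout entirely (the dictionaries here are not cache-friendly). It only shows the shape: an Entity is an ID, Components are data, and a System is code that runs over every Entity that has the right Components. The World class and the movement rule are made up for illustration.

```csharp
using System.Collections.Generic;

// Components: plain data, no behaviour.
public struct Location { public float X; }
public struct Acceleration { public float Value; }

// Hypothetical, heavily simplified world: an entity is just an int ID.
public class World
{
    private int nextId;
    public readonly Dictionary<int, Location> Locations = new Dictionary<int, Location>();
    public readonly Dictionary<int, Acceleration> Accelerations = new Dictionary<int, Acceleration>();

    public int CreateEntity() => nextId++;
}

// System: code only. It runs over every entity that has both components.
public static class MoveSystem
{
    public static void Update(World world, float deltaTime)
    {
        // Copy the keys so we can write back while iterating.
        foreach (int entity in new List<int>(world.Locations.Keys))
        {
            if (!world.Accelerations.TryGetValue(entity, out Acceleration acc))
                continue;                       // entity does not match the criteria

            Location loc = world.Locations[entity];
            loc.X += acc.Value * deltaTime;     // toy rule, not real physics
            world.Locations[entity] = loc;
        }
    }
}
```

An Entity that only has a Location is simply skipped by MoveSystem. Nothing needs to inherit from anything for that to work.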
Key differences between ECS and Object-Oriented
So while Objects tend to hold both data and code together, in ECS, Components hold the data, and Systems hold the code.
And with Objects, we tend to think in single units (batch operations are the exception, not the rule), like Player.Move().
On the other hand, Systems tend to operate on batches of units, like “for all Entities with a Location and Acceleration, I will move them”.
ECS and Data-Oriented Design
While ECS does not require a Data-Oriented Design per se, it pairs really well with it.
By separating data from code, we can store data taking into consideration data locality.
By having Systems operate over a batch of Entities at a time, selected by the Components present in each Entity, we can have several Systems instead of dealing with inheritance.
Not dealing with inheritance has a positive impact on the CPU-cache, not just for the data, but also for the code.
Yes, the CPU-cache also affects your code. When you have Polymorphism, each call may jump to a different implementation somewhere else in memory, so you are likely incurring CPU-cache misses.
Next steps
I wrote this because when I started with Unity’s DOTS I was a bit lost. The concepts are not that hard to grasp (mastery is another thing entirely 😅), but having them thrown at me all at once was a bit overwhelming.
As next steps you can:
- Dive deeper into the concepts described here, especially Data-Oriented Design.
- Start delving into Unity’s DOTS implementation.
Here are some links
To start learning about Unity’s DOTS implementations:
Start here 👉 Basics of DOTS: Jobs and Entities, Unity Learn project
This is an intro to the concepts
I really recommend this bootcamp. But be wary, it does not hold your hand. Expect to revisit parts of it more than once, and keep the code open while going through the video. Pause, experiment a bit, repeat.
Unity’s Github repo with DOTS samples
To learn more about Data-Oriented Design:
🎬 Solving the Right Problems for Engine Programmers, Mike Acton, TGC 2017
This is a talk aimed towards Game Engine Programmers, so be aware. You are going down the rabbit hole fast.
A topic outside of this blog is composition over inheritance:
🎬 Is There More to Game Architecture than ECS?, Bob Nystrom, Roguelike Celebration
Nystrom is also the author of the 📚 Game Programming Patterns book. A really good read 😉 and the main inspiration for this blog.