Who is this for?

If you are getting started with Unity’s Data-Oriented Technology Stack (DOTS), or trying to wrap your head around all the terminology and what is what, this is for you.

🤚 This post focuses on the general programming concepts on which Unity's DOTS is based, not on Unity's implementation of them.

Programming Concepts vs Unity’s Concepts

You can think of the programming concepts as the foundation of Unity’s DOTS. So, to understand DOTS, we need to understand its foundations first.

Data-Oriented Design

💡 Data-Oriented Design (DOD) is an optimization approach to create programs that make efficient use of the CPU-cache.
Source: Wikipedia

You can think of the CPU-cache as a small but very fast memory that lives on the CPU itself, much closer to the cores than main memory (RAM).

When the CPU needs to fetch some data, it will #1 check the CPU-cache* first. If the data is present, we get a #1.A cache-hit, and the CPU can resume processing.

If the data is not found, we instead get a #1.B cache-miss and need to fetch the value from RAM.

But the catch is that we can’t copy just an individual value from RAM. We need to #2 copy an entire chunk of memory (a cache line, commonly 64 bytes).

* CPUs have several levels of cache (L1, L2, L3). For simplicity I drew just one, but the process is similar: check L1; if the data is not there, check L2; then L3; and finally RAM.
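To make the hit/miss steps above concrete, here is a toy model of a single-level cache. It is only a sketch: `ToyCache` is a made-up class, the 64-byte chunk size is an assumption, and real caches also have limited capacity and eviction rules that are ignored here.

```csharp
using System;
using System.Collections.Generic;

// Toy model of a single-level cache (hypothetical, for illustration only).
// Capacity limits, eviction and multiple cache levels are NOT modelled.
public sealed class ToyCache
{
    private readonly int lineSizeBytes;
    private readonly HashSet<long> loadedLines = new HashSet<long>();

    public int Hits { get; private set; }
    public int Misses { get; private set; }

    // 64 bytes is an assumed, commonly seen chunk (cache line) size.
    public ToyCache(int lineSizeBytes = 64)
    {
        this.lineSizeBytes = lineSizeBytes;
    }

    // Simulates the CPU reading the value stored at a byte address.
    public void Read(long address)
    {
        long line = address / lineSizeBytes; // which chunk holds this address

        if (loadedLines.Contains(line))
        {
            Hits++;                 // #1.A: the chunk is already in the cache
        }
        else
        {
            Misses++;               // #1.B: not in the cache
            loadedLines.Add(line);  // #2: copy the whole chunk from RAM
        }
    }
}
```

In this model, reading 16 consecutive 4-byte values starting at address 0 gives 1 miss and 15 hits, because they all share one chunk. Reading 16 values that sit 64 bytes apart gives 16 misses.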

Why might this be a problem?

When we program using an Object-Oriented approach, using MonoBehaviours, we commonly find something like this:

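using UnityEngine;

// MyDoor is assumed to be another MonoBehaviour defined elsewhere.
public class MyButton : MonoBehaviour {

  public MyDoor DoorGO;
  private bool buttonWasPressed;
  // ... more attributes

  private void Update() {
    // ... some code logic that sets buttonWasPressed

    // Explicit null check: the ?. operator bypasses Unity's own
    // "destroyed object" check on UnityEngine.Object references.
    if (buttonWasPressed && DoorGO != null) {
      DoorGO.OpenDoor();
    }
  }
}

With this approach, we don’t tend to think much about data locality (where data lives in memory).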

The result is something like this:

Without having control of where data is stored, we end up jumping around memory quite often, resulting in a bunch of cache-misses.

A solution

Now that we know how the underlying hardware operates, imagine we re-order the data from the previous scenario to look like this:

By arranging the elements sequentially, we incur a cache-miss only on the first element. Its neighbours were copied into the CPU-cache along with it, so reading them is a cache-hit, which reduces the time it takes the CPU to process the data.
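In C#, one way to get this sequential arrangement is an array of structs. This is a rough sketch with made-up types (`DoorData`, `DoorObject`, `DoorCounting`), not Unity's way of doing it: a struct array stores its elements inline, one after the other, while an array of class instances only stores references to objects that can live anywhere on the heap.

```csharp
using System;

// Hypothetical types, for illustration only.
public struct DoorData          // value type: stored inline inside the array
{
    public bool IsOpen;
    public float Angle;
}

public sealed class DoorObject  // reference type: each instance is its own heap allocation
{
    public bool IsOpen;
    public float Angle;
}

public static class DoorCounting
{
    // Walks one contiguous block of memory, element after element.
    public static int CountOpen(DoorData[] doors)
    {
        int open = 0;
        for (int i = 0; i < doors.Length; i++)
        {
            if (doors[i].IsOpen) open++;
        }
        return open;
    }

    // Walks an array of references; every element may point somewhere else.
    public static int CountOpen(DoorObject[] doors)
    {
        int open = 0;
        for (int i = 0; i < doors.Length; i++)
        {
            if (doors[i].IsOpen) open++;
        }
        return open;
    }
}
```

Both methods return the same answer. The difference is only in how the data they read is laid out in memory, which is exactly the part Data-Oriented Design cares about.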

🧠 Concept
Data-Oriented Design is about designing programs that use the CPU-cache efficiently. Said otherwise, it aims to reduce the number of CPU-cache misses.

Memory Management

If you are making games in Unity, you use C#, which has its own Memory Management System (a garbage collector).

The 👍 good thing is that we don’t have to think about the underlying memory when programming.

The 👎 bad thing is that we don’t have control over how the underlying memory is accessed and managed.

🔥 If you remember the section about Data-Oriented Design and the CPU-cache, you can start seeing why this might be a problem.
This also has implications when talking about multi-threading.

How does it work in lower-level languages (C, C++)?

When we look at languages that are not Garbage Collected, we usually start talking about pointers to memory and manual memory allocation.

The 👍 good thing is that we gain control over how the underlying memory is accessed and managed.

The 👎 bad thing is that we have unrestricted access to memory, and we are responsible for managing it with caution and for manually freeing the data that is no longer in use.

🧠 Concept
Data-Oriented Design requires the ability to manage memory allocation ourselves, so that we can place data in memory in a way that makes efficient use of the CPU-cache.
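Plain C# can also allocate memory by hand, outside the garbage collector. The sketch below uses .NET's `Marshal` class for that; it is not how Unity's DOTS does it, and `ManualMemory` is a made-up name. It shows both sides of the trade-off: we decide where the data lives, and we have to free it ourselves.

```csharp
using System;
using System.Runtime.InteropServices;

public static class ManualMemory
{
    // Allocates one contiguous unmanaged block, fills it with 0..count-1,
    // sums the values, and frees the block again.
    public static int SumOfFirst(int count)
    {
        IntPtr buffer = Marshal.AllocHGlobal(count * sizeof(int));
        try
        {
            for (int i = 0; i < count; i++)
            {
                Marshal.WriteInt32(buffer, i * sizeof(int), i);
            }

            int sum = 0;
            for (int i = 0; i < count; i++)
            {
                sum += Marshal.ReadInt32(buffer, i * sizeof(int));
            }
            return sum;
        }
        finally
        {
            // Our responsibility: forgetting this call leaks the memory.
            Marshal.FreeHGlobal(buffer);
        }
    }
}
```

For example, `ManualMemory.SumOfFirst(5)` adds 0 + 1 + 2 + 3 + 4.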

Multi-threading

You probably know that CPUs have multiple cores, and that each core can do its own work at the same time as the others.

The thread that runs the Engine (Unity) is known as the main thread. The threads that run on the other cores are known as worker threads.

There are two aspects to multi-threading:

  1. Running a part of our application in a worker thread
  2. Splitting a part of our application into several tasks that can run in parallel.

🧠 Concept
Multi-threading is about running parts of your program on other cores, outside of the main thread, possibly even splitting them into several tasks that run in parallel.
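Here is a small sketch of both aspects using .NET's `Task` API (not Unity's own job system): the work leaves the calling thread, and it is split into several tasks. `ParallelSum` is a made-up helper, and the runtime decides which threads and cores actually run each task.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelSum
{
    // Splits the array into `taskCount` slices and sums each slice in its own task.
    public static long Sum(int[] values, int taskCount)
    {
        if (taskCount < 1) throw new ArgumentOutOfRangeException(nameof(taskCount));

        var tasks = new Task<long>[taskCount];
        int sliceLength = (values.Length + taskCount - 1) / taskCount; // round up

        for (int t = 0; t < taskCount; t++)
        {
            int start = Math.Min(t * sliceLength, values.Length);
            int end = Math.Min(start + sliceLength, values.Length);

            // Each task only reads its own slice and only writes its own result,
            // so the tasks never touch shared mutable state.
            tasks[t] = Task.Run(() =>
            {
                long partial = 0;
                for (int i = start; i < end; i++) partial += values[i];
                return partial;
            });
        }

        Task.WaitAll(tasks); // the calling thread waits for all workers to finish

        long total = 0;
        foreach (Task<long> task in tasks) total += task.Result;
        return total;
    }
}
```

Keeping every task on its own slice is the design choice that makes this simple. As soon as tasks share data they write to, you need synchronization, and that is where the complexity starts.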

This is easier said than done. Multi-threaded programming is a deep topic, and one that is quite complex at its core. But for the purpose of this post, this is as far as we go.

Compiler

This is an oversimplification of the process, but the idea holds. Normally, C# code is compiled into an Intermediate Language (IL)*. In Unity’s land, Mono is the runtime that executes this IL.

* Common Intermediate Language (CIL), formerly called Microsoft Intermediate Language (MSIL). Wikipedia.

This code is later read by a virtual machine and translated into platform-specific code* while the program runs.

* This is also known as just-in-time (JIT) compilation. Wikipedia.

A better way

An alternative is to compile into platform-specific code ahead of time, once for each platform we want our game to run on.

And there is more 😎. By doing Data-Oriented Design we can leverage other compiler optimizations, like Single Instruction Multiple Data (SIMD) instructions, which apply the same operation to several values at once.
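To give a feel for what SIMD means, here is a sketch that adds two arrays using .NET's `System.Numerics.Vector<T>`. Note the assumptions: this is hand-written SIMD in plain .NET, not a compiler doing it for us and not Unity's tooling, and `SimdAdd` is a made-up helper. It works because the data sits in plain, contiguous arrays.

```csharp
using System;
using System.Numerics;

public static class SimdAdd
{
    // Adds two equally sized arrays element by element.
    public static float[] Add(float[] a, float[] b)
    {
        if (a.Length != b.Length) throw new ArgumentException("Arrays must have the same length.");

        var result = new float[a.Length];
        int width = Vector<float>.Count; // how many floats one vector holds on this machine
        int i = 0;

        // Main loop: each iteration adds `width` floats in one vector operation
        // (a single SIMD instruction where hardware acceleration is available).
        for (; i <= a.Length - width; i += width)
        {
            var va = new Vector<float>(a, i);
            var vb = new Vector<float>(b, i);
            (va + vb).CopyTo(result, i);
        }

        // Tail loop: leftover elements that do not fill a whole vector.
        for (; i < a.Length; i++)
        {
            result[i] = a[i] + b[i];
        }
        return result;
    }
}
```

The result is the same as a plain loop; the difference is how many values each step processes.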

👻 Foreshadow
It's outside the scope of this post, but Unity provides exactly this with its Burst Compiler. It's amazing 🎉💃

Entity Component System (ECS)

An Entity is an ID that identifies a unique entity in the world.

E.g. you can think of an Entity as the equivalent of a GameObject instance.

Each Entity is formed by one or more Components, each with its own data.

Systems operate over the Components of Entities that match given criteria.

E.g. In the image we see a System called MoveSystem that operates over Entities with a Location and Acceleration.
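A bare-bones version of that idea could look like the sketch below. Everything here is hypothetical and simplified: `World` is a made-up container, the components hold a single float, the movement maths is not real physics, and dictionaries are used only to keep the code short (they say nothing about how Unity stores components).

```csharp
using System;
using System.Collections.Generic;

// Components: plain data, no behaviour. Names follow the MoveSystem example.
public struct Location { public float X; }
public struct Acceleration { public float Value; }

// A toy world: an Entity is just an int ID; components live in per-type tables.
public sealed class World
{
    private int nextId;
    public readonly Dictionary<int, Location> Locations = new Dictionary<int, Location>();
    public readonly Dictionary<int, Acceleration> Accelerations = new Dictionary<int, Acceleration>();

    public int CreateEntity() => nextId++;
}

// System: code only. It runs over every Entity that has both components.
public static class MoveSystem
{
    public static void Update(World world, float deltaTime)
    {
        // Copy the keys so we can write back into the table while looping.
        foreach (int entity in new List<int>(world.Locations.Keys))
        {
            if (!world.Accelerations.TryGetValue(entity, out Acceleration acceleration))
            {
                continue; // no Acceleration: this Entity does not match the criteria
            }

            Location location = world.Locations[entity];
            location.X += acceleration.Value * deltaTime; // simplified, not real physics
            world.Locations[entity] = location;
        }
    }
}
```

An Entity that has a Location but no Acceleration is simply skipped by `MoveSystem`. Nothing in the Entity itself knows that the System exists.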

Key differences between ECS and Object-Oriented

So while Objects tend to hold both data and code together, in ECS, Components hold the data, and Systems hold the code.

And in Objects, we tend to think in single units (batch operations are the exception, not the rule), like Player.Move().

On the other hand, Systems tend to operate on batches of units, like “for all Entities with a Location and Acceleration, I will move them”.

ECS and Data-Oriented Design

🧠 Concept
While ECS does not require Data-Oriented Design per se, it pairs really well with it.

By separating data from code, we can store the data with data locality in mind.

By having Systems operate over a batch of Entities at a time, selected by which Components each Entity has, we can have several Systems instead of dealing with inheritance.

Not dealing with inheritance has a positive impact on the CPU-cache, not just for the data, but also for the code.

Yes, the CPU-cache also affects your code: instructions are cached too. When you use Polymorphism, each call may jump to a different method, so you are more likely to incur CPU-cache misses.
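As a rough illustration of that last point, compare a polymorphic loop with loops grouped by kind. All the types here (`Shape`, `Square`, `Rect`, `SquareData`, `RectData`, `AreaTotals`) are made up for this sketch, and it only shows the shape of the code; it does not measure anything.

```csharp
using System;
using System.Collections.Generic;

// Object-oriented version: one mixed list, behaviour chosen per object at run time.
public abstract class Shape { public abstract float Area(); }
public sealed class Square : Shape
{
    public float Side;
    public override float Area() => Side * Side;
}
public sealed class Rect : Shape
{
    public float Width, Height;
    public override float Area() => Width * Height;
}

// Data-oriented version: one plain-data array per kind, one loop per kind.
public struct SquareData { public float Side; }
public struct RectData { public float Width, Height; }

public static class AreaTotals
{
    public static float Polymorphic(List<Shape> shapes)
    {
        float total = 0;
        foreach (Shape shape in shapes) total += shape.Area(); // virtual call per element
        return total;
    }

    public static float Grouped(SquareData[] squares, RectData[] rects)
    {
        float total = 0;
        // Each loop runs the same code over the same kind of data, back to back.
        for (int i = 0; i < squares.Length; i++) total += squares[i].Side * squares[i].Side;
        for (int i = 0; i < rects.Length; i++) total += rects[i].Width * rects[i].Height;
        return total;
    }
}
```

Both versions compute the same total. In the grouped version each loop plays the role of a small System: no inheritance, and both the data and the instructions stay the same from one element to the next.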

Next steps

I wrote this because when I started with Unity’s DOTS I was a bit lost. The concepts are not that hard to grasp (mastery is another thing entirely 😅), but when they are thrown at you all at once, it is a bit overwhelming.

As next steps you can:

  1. Dive deeper into the concepts described here, especially Data-Oriented Design.

  2. Start delving into Unity’s DOTS implementation.

To start learning about Unity’s DOTS implementation:

Start here 👉 Basics of DOTS: Jobs and Entities, Unity Learn project

This is an intro to the concepts.

Unity’s DOTS bootcamp

I really recommend this bootcamp. But be warned: it does not hold your hand. Expect to revisit parts of it more than once, and have the code open while going through the video. Pause, experiment a bit, repeat.

Unity’s Github repo with DOTS samples

To learn more about Data-Oriented Design:

🎬 Solving the Right Problems for Engine Programmers, Mike Acton, TGC 2017
This is a talk aimed at Game Engine Programmers, so be aware: you are going down the rabbit hole fast.

A topic outside the scope of this post is composition over inheritance:

🎬 Is There More to Game Architecture than ECS?, Bob Nystrom, Roguelike Celebration

Nystrom is also the author of the 📚 Game Programming Patterns book. A really good read 😉 and the main inspiration for this blog.