Wednesday, April 14, 2010

Map class: Array & List+Hash

A short presentation about using List+Hash instead of Arrays.
Confirmed programmers can skip this one but I thought it might be quite usefull for novices.
Non-programmers can skip this large nonsense post.

- Good Array

A map is basically a 2D grid of stuff. A 2d grid most natural data structure is a 2d array.
The map is made of tiles. So let's make a 2d array of tiles:

class Map
    Tile[,] m_Tiles;  // easy and fast!

A tile has various properties, such as the terrain there or if the player visited this tile. These core properties are everywhere on your map, so you want them at every tile.

Your game will access the tiles by position a lot so having an array makes sense and is the most efficient solution.

Entity everywhere on the map + frequent access by position = Array of entity is good.

- Bad Array, quite Good List but not good enough
You have other entities on a map. They have a position on the map too, but all the tiles do not necessary have one of these entities and in fact most of them do not. For instance you never have an actor at each and every tile.
Since most of your map will be void of these entities it makes sense to NOT use an array. That would be a waste of memory and your saved games size will skyrocket.

class Map
     Actor[,] m_Actors;  // most of this will be empty! what a waste!

That sucks. Image the wasted memory on a large map (yeah even null pointers take memory).

Entity rare on the map = Array of entity is bad.

Ah. Well what about a List then?

class Map
     List<Actor> m_Actors;  // ok no waste. happy?

Entity rare on the map = List is good.

Hey! But I liked my array! It was so nice to access efficiently to the entity by position! Having to process the whole list of actors to see if there is an actor at a position is awfully inefficient!

Correct. That's why God created Hashtables.

- List+Hash : you are winner!

Ok we have two requierements that seem contradicting:
1 don't waste memory.
2 fast access by position.

List does 1. Array does 2.
Having 1 is nice but we REALLY wants 2, especially if you have a large number of stuff to process in your game.

Hashtables are usefull for this. You can think of them as "hollow" arrays.
- you can access them by index (called key) like an array.
- they hold only the elements you put in them like a list, they have no "holes".

When you think about it, a position (point) on the map is an index. There you have your key.

class Map
    List<actor> m_Actors;
    Dictionary<Point, Actor> m_ActorsByPosition;

Now you can retrieve actors directly by their position by querying the hashtable:

Actor GetActorAt(Point position)
    return m_ActorsByPosition[position];

(This little code doesn't handle error cases but you get the idea)
It works great. It looks and behave like an array, with very little overhead and with the benefit of no wasted memory.

We still keep the list, cause we want to be able to write things like: "for all actors in this map, do this". Getting the list of actors from the hashtable doesn't guarantee a stable order, plus you don't know what it does under the hood, it might reallocate a list each time you call it.

Think of the list as the master data structure, and the dictionary/hashtable as the slave or auxiliary data structure : you don't need it, but it proves quite usefull.

- Real life use : AI improvement
Well not quite real life but real game. I use this dual data structure a lot in Rogue Survivor, from Actors to MapObjects and Items stacks.

But I confess I was lazy once and did not use this for the Odor feature and instead kept the basic list alone.
Some actors leave odors behind them and some AIs use this to follow or chase them around.
Worked good enough, until I wanted to improve the Zombies AIs. I thought they were a bit too passive and I wanted them to more actively chase humans by smelling odors at a distance rather than just by standing around.
Implement. Test. Uh oh. With my 100+ Zombies per map performance was fugly.
I forgot about the list and used a profiler to find the problem, if there was one. I was ready to surrender the AI improvement to the god of performance :(
Well, turned out I still had the list and it was doing "find an odor at position X in a big fucking list" a lot.
I promptly repaired this injustice by implenting the companion hashtable and tadda, not only performance improved a lot, but it improved to better levels than before. In the end I had better AI with better performance at the cost of few lines of code.
So trust me, use it :)

End of post.


  1. Heh, I've used this technique before but I didn't call them hash tables. Remember, arrays that use string indexes instead of just number keys take slightly more memory and processor cycles, but obviously they were a *much* better choice in this situation.

  2. Hashtables are often implemented as balanced binary trees so the overhead is minimal.