Monday, February 28, 2011

More fault tolerance & protocol improvements

Today Niklas and I sat down to look at improving the fault tolerance of our dispatcher process. Currently, for some reason, all of our clients crash when it goes down, and this is bad. We are working on fault-proofing every subsystem of GGS so that, in the worst case, we still won't lose any data, although all the clients will be disconnected when the dispatcher goes down.

The problem seems to be related to the way sockets are implemented in Erlang. I'm sure there is a way around this; we just need to find it. Besides looking at the dispatcher, I changed all outgoing communication to use our protocol as well, which it previously did not. For this purpose I wrote a very rudimentary protocol "message builder", and also a very simple parser for Python.
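To give an idea of what a "message builder" and a matching parser can look like, here is a minimal sketch in Python. The wire format is an assumption made up for this example (header lines of the form "Key: value", a blank line, then the payload), and the names build_message, parse_message, Command and Content-Length are hypothetical; the real GGS protocol may well look different.

```python
# Hypothetical wire format (an assumption, not the actual GGS protocol):
#   "Key: value" header lines, one per line
#   an empty line
#   the payload, whose size in bytes is given by Content-Length


def build_message(command, payload, headers=None):
    """Serialize a command and a text payload into bytes."""
    body = payload.encode("utf-8")
    fields = {"Command": command}
    fields.update(headers or {})
    # Always set the length last so it matches the encoded payload.
    fields["Content-Length"] = str(len(body))
    head = "".join(f"{key}: {value}\n" for key, value in fields.items())
    return head.encode("utf-8") + b"\n" + body


def parse_message(raw):
    """Split raw bytes into a header dict and the decoded payload."""
    head, separator, body = raw.partition(b"\n\n")
    if not separator:
        raise ValueError("no blank line between headers and payload")
    headers = {}
    for line in head.decode("utf-8").split("\n"):
        key, colon, value = line.partition(":")
        if not colon:
            raise ValueError(f"malformed header line: {line!r}")
        headers[key.strip()] = value.strip()
    length = int(headers["Content-Length"])
    if len(body) < length:
        raise ValueError("payload is shorter than Content-Length")
    return headers, body[:length].decode("utf-8")
```

A client would call build_message before writing to the socket and parse_message on what it reads back. Putting the payload length in a header lets the receiver know how many bytes to expect, without relying on a terminator character.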

The chat client now uses the protocol for both receiving and sending messages. As a step in the direction of game loops not written in Erlang, I redid the "identification system" for player and table processes. Previously we handled all communication using process IDs, which is really handy when writing game loops in Erlang, since you get a unique identifier and, at the same time, a way to communicate with the process. When moving outside Erlang and into JavaScript, for example, this is not as convenient.

I wrote a post about why we need UUIDs for describing players, and this is what I have implemented, both for tables and for players. Now all identification of tables and players is in the form of UUIDs and not process IDs.
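The idea can be sketched as a small lookup table that hands out a UUID for each player or table and maps it back to whatever the server uses internally to reach that process. This is only an illustration in Python, not the GGS code: the Registry class and its methods are hypothetical names, and the "handle" stands in for an Erlang process ID.

```python
import uuid


class Registry:
    """Maps UUID strings to internal handles (a stand-in for process IDs)."""

    def __init__(self):
        self._entries = {}

    def register(self, handle):
        # uuid4 gives a random identifier that is safe to hand to any
        # client, whatever language that client is written in.
        ident = str(uuid.uuid4())
        self._entries[ident] = handle
        return ident

    def lookup(self, ident):
        # Raises KeyError for an identifier that was never registered.
        return self._entries[ident]

    def unregister(self, ident):
        self._entries.pop(ident, None)
```

A client outside Erlang only ever sees the UUID string, which is easy to pass around in a text protocol, while the server resolves it to the actual process when a message arrives.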
