The Generic Game server development blog: February 2011

Monday, February 28, 2011

More fault tolerance & protocol improvements

Today Niklas and I sat down and looked at improving the fault tolerance of our dispatcher process. Currently, for some reason, it takes down all of our clients when it goes down, and this is bad. We are looking at fault-proofing every subsystem of GGS, so that in a worst-case scenario we still won't lose any data, but all the clients will be disconnected when the dispatcher goes down.

The problem seems to have to do with the way sockets are implemented in Erlang. I'm sure there is a way around this; we just need to find it. Apart from looking at the dispatcher, I also changed all outgoing communication to use our protocol, which it previously did not. For this purpose I wrote a very rudimentary protocol "message builder", and also a very, very simple protocol parser in Python.
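To make the "message builder" idea concrete, here is a rough sketch of what one could look like. It is written in C to match the other snippets on this blog. The header names and layout (a command header, a Content-Length header, a blank line, then the body) are made up for illustration and are not the actual GGS wire format. build_message is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical wire format (NOT the real GGS protocol):
 *   Server-Command: <command>\n
 *   Content-Length: <bytes in body>\n
 *   \n
 *   <body>
 * Returns the number of bytes written (excluding the NUL), or -1 if the
 * buffer is too small or snprintf reports an error. */
int build_message(char *out, size_t cap, const char *command, const char *body)
{
    size_t body_len = strlen(body);
    int n = snprintf(out, cap,
                     "Server-Command: %s\nContent-Length: %zu\n\n%s",
                     command, body_len, body);
    if (n < 0 || (size_t)n >= cap)
        return -1;  /* encoding error or truncated output */
    return n;
}
```

An explicit length header lets a client parser, such as the Python one, know exactly how many body bytes to read instead of guessing where a message ends.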

The chat client now uses the protocol both for receiving and for sending messages. As a step towards game loops written in languages other than Erlang, I redid the "identification system" for player and table processes. Previously we handled all communication using process IDs. That is really handy when writing game loops in Erlang, since you get a unique identifier and, at the same time, a way to communicate with the process. Outside Erlang, in JavaScript for example, this is not as convenient.

I wrote a post about why we need UUIDs for identifying players, and this is what I have now implemented, both for tables and for players. All identification of tables and players is now done with UUIDs instead of process IDs.
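Once clients only see UUIDs, the server needs a lookup from a UUID back to whatever internal handle it uses (in GGS that is an Erlang process). The sketch below is a minimal C illustration under that assumption: a fixed-size table with linear search. An int stands in for the process, and the names are hypothetical. A real Erlang implementation would more likely use ETS or Mnesia.

```c
#include <string.h>

#define UUID_STR_LEN 36   /* "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" */
#define MAX_ENTRIES 64

/* One mapping from a client-visible UUID to an internal handle
 * (standing in for a pid, which never leaves the server). */
struct registry_entry {
    char uuid[UUID_STR_LEN + 1];
    int handle;
};

struct registry {
    struct registry_entry entries[MAX_ENTRIES];
    int count;
};

/* Returns 0 on success, -1 if the table is full or the UUID is malformed. */
int registry_add(struct registry *r, const char *uuid, int handle)
{
    if (r->count >= MAX_ENTRIES || strlen(uuid) != UUID_STR_LEN)
        return -1;
    strcpy(r->entries[r->count].uuid, uuid);
    r->entries[r->count].handle = handle;
    r->count++;
    return 0;
}

/* Returns the handle registered for uuid, or -1 if it is unknown. */
int registry_lookup(const struct registry *r, const char *uuid)
{
    for (int i = 0; i < r->count; i++)
        if (strcmp(r->entries[i].uuid, uuid) == 0)
            return r->entries[i].handle;
    return -1;
}
```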

Friday, February 25, 2011

Fault tolerance & chat

Today we have seen improvements in two different parts of GGS. I have extended the GGSChat program a bit: it now features nicknames stored in GGS via Mnesia (ggs_db), a list showing the nicks, and some other cosmetic changes.

This is interesting because it is the first use of ggs_db, and I can conclude that it works nicely.

I have also added some fault tolerance to ggs_coordinator, so clients are now unaffected by crashes in it. This was mostly accomplished by adding a backup server from which ggs_coordinator can restore its old state after a crash, but also by changing the startup routine for new tables.

Previously we created new tables using gen_server:start_link, but this didn't work very well. ggs_coordinator became the parent process of all tables, and when ggs_coordinator went down, it brought all the tables down with it. I solved this by using gen_server:start instead, which does not link the table to ggs_coordinator. I also enabled trap_exit to catch any exit messages.

Many features of GGS have now been tested using GGSChat, and next in line is the room creation feature. Currently all clients join the same room when GGSChat starts; I want to be able to specify a room, which would make the chat a bit more IRC-like.

Oh, here's how the chat looks now:

Wednesday, February 23, 2011

Brain storming

I sat down this morning and thought a bit more about which features we need to provide to game developers using GGS. I came up with some things we hadn't written down before, and also described our current plans in a bit more detail.

One thing we had discussed but never really formalized is the idea of a rule enforcer, which checks whether a rule is broken when a move is made, or rather, whether a move is legal. Our supervisor had previously asked us whether it would be possible to change the rules of a game while it is in progress, and we think this is a great way to do that. The enforcer keeps a list of all the rules and checks each move against them. If a rule is broken, the enforcer stops the player from performing the illegal action, a bit like a policeman, or a referee in soccer.

The problem then becomes: how do we specify the rules of a game so that they are easy to formulate and can be changed? Mathias found a specification [1] for GDL (the Game Description Language), which in part does just this (and much more). Using a GDL dialect, or possibly a fully fledged GDL system, we could allow developers to define rules, and players to change some of them, on the fly.

The idea is that a developer writes the rules he or she wants players to be able to change (possibly all the rules) in GDL and passes them to the enforcer. Players can then change them as they wish.
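The enforcer does not exist yet, and the rules would eventually come from GDL rather than C. Still, the core idea can be sketched as a list of predicates that can be added and removed while a game is running. Everything below is hypothetical and only illustrates the "check every move against every active rule" loop: struct move, the example rules and the 64-square board.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical move: a piece moved from one square to another. */
struct move {
    int player;
    int from;
    int to;
};

/* A rule returns true if the move is allowed under that rule. */
typedef bool (*rule_fn)(const struct move *m);

#define MAX_RULES 16

struct enforcer {
    rule_fn rules[MAX_RULES];
    size_t count;
};

/* Returns 0 on success, -1 if the rule table is full. */
int enforcer_add(struct enforcer *e, rule_fn rule)
{
    if (e->count == MAX_RULES)
        return -1;
    e->rules[e->count++] = rule;
    return 0;
}

/* Dropping a rule mid-game is what would let players change the rules
 * on the fly. Rule order does not matter, so the last rule fills the gap. */
void enforcer_remove(struct enforcer *e, rule_fn rule)
{
    for (size_t i = 0; i < e->count; i++) {
        if (e->rules[i] == rule) {
            e->rules[i] = e->rules[--e->count];
            return;
        }
    }
}

/* A move is legal only if no active rule rejects it. */
bool enforcer_check(const struct enforcer *e, const struct move *m)
{
    for (size_t i = 0; i < e->count; i++)
        if (!e->rules[i](m))
            return false;
    return true;
}

/* Two example rules for an 8x8 board (hypothetical). */
bool rule_on_board(const struct move *m) { return m->to >= 0 && m->to < 64; }
bool rule_must_move(const struct move *m) { return m->from != m->to; }
```

In this sketch a GDL front end would only have to produce rule predicates; the enforcer itself does not care where a rule came from.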

On a side note, I will stop time-logging on the blog and do this on ping-pong instead.

Monday, February 21, 2011

Bulk update

A lot has happened, and this post should really be several posts; unfortunately it is not.

We now have an 'emulator' for our game virtual machine. In the future the game logic will be written in a DSL, but for now it is written in Erlang; let's call this emulator gamevm_e.

gamevm_e implements the functions and protocol that game developers will later write and upload themselves, things like placing chess pieces on a board. Its purpose is to let us test our backing architecture before we have a proper VM in place. In the future gamevm_e will probably be removed, since we really want to use other languages, such as JavaScript or Lua, for this.

During the end of last week and this weekend we managed to connect several Player processes to one Table process, so that they share one gamevm_e process. This has allowed us to communicate between players, and thereby also between clients. To test this functionality I developed a small chat client whose server-side logic is written in gamevm_e, that is, in Erlang.

We have also shifted more of our focus towards investigating which support libraries and helpers we need to provide to game developers. So far we have come up with things related to ranking systems, content distribution (for sending graphics, maps, etc. to clients), and perhaps micro-payment systems that allow charging players. More time will be spent investigating these things.

Time report:

From 18/2 to 20/2: approximately 10 hours

Board game classifications

Strategic play: choice is based on multiple turns. Example: openings in chess.
Tactical play: choice is based on the current turn only. Example: giving check with a knight while simultaneously threatening the queen.

Informal games: undirected play, as in a playground.
Formal games: games with means and ends. The means consist of a predetermined set of rules and equipment; the ends are a set of goals to accomplish through contest.

Game board:
* A flat surface. Example: Chess
* Tile laying: tokens stacked on top of each other. Example: Scrabble (Alfapet in Swedish)
* Boardless: boards that aren't flat, like 3D boards, or no board at all. Example: Yatzy

Combinatorial games
Discrete: No simultaneous moves
Deterministic: No luck
Perfect information: No hidden information
Finite: a well-defined outcome after a finite number of moves
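The four properties above combine into a single yes/no test: a game is combinatorial only if all of them hold. Here is a tiny hypothetical C sketch. The field names are mine, and each one is phrased as the thing a game may or may not have.

```c
#include <stdbool.h>

/* Traits from the list above (field names are hypothetical). */
struct game_traits {
    bool simultaneous_moves;  /* false => discrete */
    bool uses_luck;           /* false => deterministic */
    bool hidden_information;  /* false => perfect information */
    bool always_terminates;   /* true  => finite */
};

/* A game is combinatorial only if it has all four properties. */
bool is_combinatorial(const struct game_traits *g)
{
    return !g->simultaneous_moves && !g->uses_luck &&
           !g->hidden_information && g->always_terminates;
}
```

For example, tic-tac-toe passes the test, while Yatzy fails it because the dice introduce luck.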

Race games: players try to be the first to get from start to finish. Example: Parcheesi
Space games: arrange pieces into patterns, developing your own and disrupting the enemy. Example: Three in a row
Chase games: hunter and prey; capturing and entrapping. Example: Cops & robbers.
Displace games: capture a fixed number of the opponent's pieces. Example: Fox games.

Variable geometry:
Fixed: N in a row, target pattern. Example: 3 in a row
Variable: Pathmaking. Territorial. Reach a goal. China chess.

Sunday, February 20, 2011

Board game decomposition

I tried to figure out the elements of board games. Hopefully this will be useful for future design decisions.
Source of inspiration: Decoupling Aspects in Board Games Modeling

Logic:
This is the core of the game.
Agents:
Users of the game. May be humans or computers.
Representation:
How the game is presented to the user, possibly with figures, cards and tokens. It usually expresses the game logic in a way that makes sense to the user.

States:
The conditions of the game at each turn.
Behaviors:
Transitions from state to state.
Rules:
Static laws producing the behaviors in the game.

Administrative agents:
Upholders of the rules of a game, like the bank in Monopoly.
Players:
Players playing the game.
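To make the split between states, behaviors and rules concrete, here is a small sketch using tic-tac-toe. The example game is my choice, not taken from the paper. A state is a plain value, a rule is a predicate, and a behavior is a transition that produces a new state only if the rules allow it. Keeping old states untouched means a history of game states comes for free. All names are hypothetical.

```c
#include <stdbool.h>

/* State: the condition of the game at one turn. */
struct state {
    char cells[9];   /* ' ', 'X' or 'O' */
    char to_move;    /* whose turn it is */
};

struct state initial_state(void)
{
    struct state s;
    for (int i = 0; i < 9; i++)
        s.cells[i] = ' ';
    s.to_move = 'X';
    return s;
}

/* Rule: a static law deciding whether a move is allowed. */
bool rule_cell_free(const struct state *s, int cell)
{
    return cell >= 0 && cell < 9 && s->cells[cell] == ' ';
}

/* Behavior: the transition from one state to the next. The old state is
 * left untouched. Returns false (and leaves *next unchanged) if the move
 * breaks a rule. */
bool transition(const struct state *s, int cell, struct state *next)
{
    if (!rule_cell_free(s, cell))
        return false;
    *next = *s;
    next->cells[cell] = s->to_move;
    next->to_move = (s->to_move == 'X') ? 'O' : 'X';
    return true;
}
```

Here the administrative agent is whatever code calls transition() and refuses illegal moves, and the players are the sources of the cell numbers.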

Saturday, February 19, 2011

Game Semantics

Game semantics is used to study the behavior of simple programming constructs and create an abstract model of them. This model is then used to express game rules, strategies, terminology, etc.

Example: An expression written using a newly defined model compared to a lambda expression.

The goal is to have the simplest, best-fitting syntax for the domain being studied, so that expressions remain easy to understand even as they become complex.

How could GGS benefit from game semantics? There probably exist optimal models for expressing all sorts of game behaviours. I believe you could have an application that reads the history of game states, parses it with respect to such a model, and then presents it to a user. Such applications are not our main concern, but they could probably easily be used in conjunction with GGS, or maybe even be built into the JavaScript game.

Another idea would be to send such expressions to the game at runtime, thereby enabling players to modify the rules of the game. Unfortunately GGS isn't powerful enough to support that idea:
* You would need to translate whatever the game expressions are into JavaScript.
* You would need the ability to reload the game functions inside SpiderMonkey at runtime.
* Alternatively, you would need multiple virtual machines running simultaneously and communicating with each other. This would be even harder, as we would need to write our own VM and make the VMs talk to each other.

To summarize: adding this kind of support to GGS would shift the focus away from our current goal, which is sparing developers from implementing network support themselves, towards creating a game design tool with networking and communication support.

Friday, February 18, 2011

Semantics

I found some links about game semantics, game design languages and structures in board games, which I've been looking into. The purpose is to get acquainted with these subjects so that we can work out which parts GGS should support, what we will implement ourselves, and what others could easily implement. For the parts we leave out, we can give our reasons and use the material when writing standards. Along with this I wrote a brief summary of the word semantics for everyone to enjoy.

Semantics: the study of meaning, according to Wikipedia.

It is about learning the meaning of different words and symbols, and how they together build up sentences, which then have new meanings.

Applied to computer languages, it means learning how they work, comparing their syntaxes, making abstractions of them and creating mathematical models.

Wednesday, February 16, 2011

A bit of a rewrite

Today we realized that our old structure of GGS wasn't very good, so we decided to redo it! We have learned a lot developing the "old" structure of GGS, such as how to use supervisor structures properly and what the big parts of the application are.

Anyway, using this knowledge, we sat down and really thought through which interfaces we need internally. We also asked ourselves "what happens when this part of the application fails?" and provided solutions for this. In the end, I think we came up with a pretty nice solution.

We now represent all clients as separate processes, and we have a coordinator process which connects different clients with different games. The idea is that this coordinator process should be able to rebuild the entire server in a different location, and thereby easily provide replication.

We all sat down, created stubs for the new design and documented it in this document. Later I implemented the basic supervisor structure and socket communication, just so we have something runnable.

I predict we will soon be up and running with the same amount of functionality we had in the previous design, since much of the code was written in such a way that we can simply pop it into the new design.

Time report:

Today: 8 hours

Yesterday: 5 hours

Tuesday, February 15, 2011

Joe Armstrong on Erlang lecture

Today we all attended a lecture on Erlang held by Joe Armstrong, one of the inventors of Erlang. The lecture seemed to have been made exactly for us and our problems.

Is Erlang the answer?
In that case what was the question?
Joe Armstrong
Programming languages are used to solve problems.
But what problems? Erlang solves a few interesting problems.
What problem does it solve, and why are these problems interesting?
What big problems are left in computer science and how might we solve them?
What's gone wrong with computer architectures and what can we do about it?
This talks looks back at the history of Erlang and forward to a future where one day software might actually work and be useful. Right now software is in a bit of mess, but there are ways to fix it ...
I'll tell you more next Tuesday, ...
/Joe

First he talked a little about the history of Erlang, which was interesting, although you already knew most of it from books and so on.

Message passing and processes

He wrote an interesting paper about OOP a while ago: Why OO Sucks

One of his tutors said, after reading the "Why OO Sucks" paper: "You got it all wrong. You know, Erlang is probably the only really object-oriented language, because the main thing is not at all the classes and inheritance and stuff, but message passing and code encapsulation." (I paraphrased here; I can't remember the exact words he used.)

Just yesterday I tried to explain to the other guys why we really should use asynchronous message passing instead of synchronous, but I could never put what I was thinking and feeling about it into words. Lucky me, Joe did exactly that today, so I think we now agree that we want asynchronous message passing for our server.

Another anecdote was that he once gave a talk about Erlang in Germany, and a server programmer said that their server served data to about 10,000 users and that he didn't need Erlang to accomplish that. Joe asked him: "If something goes wrong in your server and it crashes, how many users does this affect? All 10,000? Our server only serves one user, but we start 10,000 server processes, one for every user, so when one crashes it only affects that one user." I must admit, it got me thinking.
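Joe's point can be reproduced with plain OS processes. The C sketch below is only an analogy, since Erlang processes are far lighter than OS processes, and serve_user/run_isolated are made-up names. It forks one process per simulated user, lets one of them crash with abort(), and counts how many of the others still finish normally. It assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-user handler: the user numbered crash_user hits a bug. */
static void serve_user(int user, int crash_user)
{
    if (user == crash_user)
        abort();   /* simulated crash, confined to this one process */
    _exit(0);      /* normal completion */
}

/* Starts one OS process per user and returns how many exited normally,
 * or -1 if fork() failed before any process could be started. */
int run_isolated(int users, int crash_user)
{
    int started = 0, ok = 0;
    for (int u = 0; u < users; u++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                      /* stop starting new users */
        if (pid == 0)
            serve_user(u, crash_user);  /* child never returns */
        started++;
    }
    if (started == 0 && users > 0)
        return -1;
    /* Reap every child; the crashed one is simply not counted as ok. */
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

With this sketch, run_isolated(5, 2) reports 4 users served normally: only the user whose process crashed is affected, and the parent keeps running.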

Protocols

He also talked about protocols and how he never liked the fact that there are sooo many of them; he counted about 4900 or something like that. We felt like he was talking directly to us, because during the last two weeks we have been doing exactly that, designing a new protocol :-/.

In 2002 he proposed UBF, but as he states in his blog in February 2009, "This scheme was never widely adopted - perhaps it was just to strange....". During the whole lecture he never mentioned a real-world alternative to designing your own protocol, so during the Q&A session I asked him directly what we should use today instead of a protocol of our own.

He had to think about this question for a while and said "That is a good question :D", but then he mentioned Google Protocol Buffers and Facebook's Thrift. He said we should have a look at both and then just pick one, and we really should do that.

Realizing that this would be the right thing to do hurt a little, because I was a bit proud that we had developed an easy-to-understand yet flexible and powerful protocol.

So tomorrow we will take a closer look at Protocol Buffers and Thrift and try to decide whether one of them suits us well enough. I also expect that we will rethink our structure, for example whether we should use more processes. After all, it was the perfect lecture at the perfect time for us.

Monday, February 14, 2011

Internal interfaces

Today we found out that there is no way to continue programming together without first defining some internal interfaces for our Erlang modules. Every time someone added some functionality, it broke the entire application because of changed function calls.

So we decided to set up a new Wiki page: Interfaces

So far there are only three interfaces, the ones we are working on right now, but we want to add an interface for each of the modules we use. They will help us structure the application and make it easy to understand what each module is supposed to do.

We tried to follow the single responsibility principle, where each module has only one responsibility. If a module is doing more than one thing, it obviously needs refactoring.

Time: 10 hours

Improved protocol & test app

Today we had a meeting where we discussed the structure of the new protocol module. We ended up with a recursive solution which looks much better than the old one. The new module allows us to write more powerful handlers in ggs_server, which is very good.

I've also written a little GTK application to demonstrate GGS: a simple calculator. The important features of the calculator are the calculate button and the connect button, which interact with GGS.

Time report
Approx. 8 hours today

Thursday, February 10, 2011

Usage of javascript

JavaScript has gained a lot of popularity lately; it is used in large projects such as Riak [1], CouchDB [2] and PhoneGap [3]. On the popular social coding site GitHub.com, 18% of all code is written in JavaScript.

When starting a new project, it is nice to offer a low learning curve in order to attract new users. Using a language that many people already know means you get a lot of potential users right away.

The popularity of JavaScript in the programming community, in combination with the availability of several[4] different JavaScript virtual machines was an important influence in choosing JavaScript as the main control language for GGS.

In addition to JavaScript being a popular language, both CouchDB and Riak mentioned above use Erlang.

References:
1. http://wiki.basho.com/An-Introduction-to-Riak.html
2. http://couchdb.apache.org/
3. http://www.phonegap.com/about
4. http://www.mozilla.org/js/spidermonkey/, https://wiki.mozilla.org/JaegerMonkey, http://code.google.com/p/v8/

Tuesday, February 8, 2011

Literature Quest

Here comes another time report. Today I haven't done much at all, but yesterday we did some good stuff. I assisted Jeena in trying to communicate with Erlang from C (and thus from JS). Before that I read Joe Armstrong's thesis on Erlang, which is a really, really good read for anyone wanting to know more about Erlang and OTP. I believe it was written around 2003, but as far as I can tell it is still current.

I also read a report which was mainly focused on postmortem analysis of crashed Erlang applications / systems, and included a new tool to aid in this. This tool was cool and all, but what was really good about that report was that it also covered a good deal of how to properly write fault tolerant Erlang applications.

In total I spent around 6 hours

Sunday, February 6, 2011

Exposing C-functions to Spidermonkey

Today I added my very first custom function to SpiderMonkey. It is just a test function that responds with a static string, but anyhow, this is hot shit! We will use it to call Erlang functions from JavaScript to implement something like web storage. Here is some code, the js_erlang() function which will be exposed to JavaScript:

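#include <string.h>
#include "jsapi.h"

/* Fast native (cx, argc, vp) returning a static string to JavaScript */
JSBool js_erlang(JSContext *cx, uintN argc, jsval *vp) {
  const char *s = "text comes from C function";
  /* strlen, not sizeof: sizeof(s) is the size of the pointer, not of the text */
  JSString *str = JS_NewStringCopyN(cx, s, strlen(s));
  if (str == NULL)
    return JS_FALSE;  /* allocation failed */
  JS_SET_RVAL(cx, vp, STRING_TO_JSVAL(str));
  return JS_TRUE;     /* a JSBool, not the jsval JSVAL_TRUE */
}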

And here is how you add it to all the other native functions in Spidermonkey:

/* Cast the fast native to JSNative; JSFUN_FAST_NATIVE tells SpiderMonkey its real (cx, argc, vp) signature */
JS_DefineFunction(vm->context, JS_GetGlobalObject(vm->context), "callErlang", (JSNative) js_erlang, 0, JSFUN_FAST_NATIVE);

Because we wanted to add this, we had to fork the erlang_js project. I hope we can write code that is general enough to be merged back into the original project. Here is our version.

The next step is to get the argument passed to the function. I suppose it'll look something like this:

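if (argc < 1) { JS_ReportError(cx, "callErlang: expected one argument"); return JS_FALSE; }
jsval *argv = JS_ARGV(cx, vp);
JSString *js_str = JS_ValueToString(cx, argv[0]);  /* convert any JS value to a string */
if (js_str == NULL) return JS_FALSE;              /* the conversion threw */
char *argument = JS_GetStringBytes(js_str);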

After that we have to call an Erlang function. Jonathan found a neat Erlang helper called erl_call, which lets you call functions on a running node:

me@Zepto$ echo "[2+2, node(), \"Hello world\"]." | erl_call -sname ggs -e
{ok, [4, ggs@Zepto, "Hello world"]}

You get the answer from the Erlang node in plain text. This looks like the easiest way to talk to Erlang yet. You probably saw the argument "-sname ggs" and wondered what it is: it is the short name of the node you want to talk to, here "ggs".
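One thing to watch when the JavaScript argument is forwarded this way: a string containing a double quote or a backslash would break the Erlang expression. Here is a minimal sketch of escaping it into an Erlang string literal before it goes into the text piped to erl_call. erl_string_literal is a hypothetical helper, and it only escapes backslash and double quote, which is a simplifying assumption.

```c
#include <stddef.h>
#include <string.h>

/* Writes src to out as an Erlang string literal, e.g.
 *   say "hi"   becomes   "say \"hi\""
 * Only backslash and double quote are escaped; everything else is copied
 * as-is. Returns 0 on success, or -1 if out (cap bytes) is too small. */
int erl_string_literal(const char *src, char *out, size_t cap)
{
    size_t len = strlen(src), escapes = 0;
    for (size_t i = 0; i < len; i++)
        if (src[i] == '"' || src[i] == '\\')
            escapes++;
    /* opening quote + text + escape chars + closing quote + NUL */
    if (len + escapes + 3 > cap)
        return -1;
    size_t j = 0;
    out[j++] = '"';
    for (size_t i = 0; i < len; i++) {
        if (src[i] == '"' || src[i] == '\\')
            out[j++] = '\\';
        out[j++] = src[i];
    }
    out[j++] = '"';
    out[j] = '\0';
    return 0;
}
```

The resulting literal can then be embedded in a larger expression, terminated with a '.', and piped into erl_call -sname ggs -e as in the echo example.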

Time report

Today: 7 hours

Saturday, February 5, 2011

Fault tolerant GGS

Now we have some fault tolerance in GGS. Currently the server 'crashes' when a client exits, and this causes a new socket to be opened, which is good, because the old one was closed by the client.

The {ReferenceID, JSVM} tuple stored in the ggs_server state was previously lost when ggs_server crashed, but it is now stored in ggs_backup as well, and this is where the fault tolerance comes in.

ggs_server may now crash at pretty much any time and resume its state from ggs_backup. This works as follows:

1. ggs_sup starts
2. mnesia_ctrl starts
3. ggs_server_sup starts
4. ggs_backup starts
5. ggs_server starts

In step 5, when ggs_server starts, it always asks ggs_backup for any previous state, even if this is the initial start of the whole supervisor hierarchy. When the application is first run there is no state, so ggs_backup returns an atom signaling that it has no state to restore from, and that ggs_server should initialize a new state and back it up.

If ggs_backup crashes, ggs_server_sup restarts it just as it restarts ggs_server, and the state is reloaded from ggs_server. This works unless ggs_server and ggs_backup crash at the exact same time.
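The start-up handshake above fits in a few lines. The following single-process C model is only meant to make the logic concrete: in GGS, ggs_backup and ggs_server are separate Erlang processes under ggs_server_sup. The struct and function names are hypothetical, and the real state ({ReferenceID, JSVM} tuples) is reduced to one counter.

```c
#include <stdbool.h>

/* Hypothetical server state, reduced to a counter for brevity. */
struct server_state {
    int next_ref;
};

/* Stand-in for ggs_backup: remembers the last state it was given. */
struct backup {
    bool has_state;
    struct server_state saved;
};

/* Returns false ("no state yet", like the atom ggs_backup replies with)
 * if nothing has been backed up. */
bool backup_get(const struct backup *b, struct server_state *out)
{
    if (!b->has_state)
        return false;
    *out = b->saved;
    return true;
}

void backup_put(struct backup *b, const struct server_state *s)
{
    b->saved = *s;
    b->has_state = true;
}

/* ggs_server start-up: always ask the backup first, even on the very
 * first start of the supervisor hierarchy. */
struct server_state server_init(struct backup *b)
{
    struct server_state s;
    if (!backup_get(b, &s)) {
        s.next_ref = 0;      /* first run: build a fresh state ... */
        backup_put(b, &s);   /* ... and back it up right away */
    }
    return s;
}
```

Calling server_init() a second time plays the role of a restart after a crash: it picks up whatever was last handed to backup_put().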

For some reason though, the JSVMs stored in the ggs_server state do not live through the migration to ggs_backup and back, most likely due to them being spawned from inside ggs_server and not being supervised properly. This problem should be easily solved by supervising erlang_js on a higher level.

UPDATE: Moving erlang_js higher up in the hierarchy is not the solution to the JSVM death issue. The JSVMs disappearing when ggs_server crashes comes down to a design choice in the new/3 function in js_driver.erl in erlang_js.

new/3 creates a connection to the erlang_js driver using an open_port command, and this command registers the calling process (ggs_server) as its port owner. Looking at the documentation for ports we can see that when the port owner dies, the port is supposed to go down as well. I see two possible solutions to this:

Change the owner of the port to ggs_backup when ggs_server crashes, and then back again
Set the owner of the port to a process which should not ever go down (thanks)

Now, according to the SO link above, option 2 is hacky, so option 1 is preferable. This means we start yet another gen_server process somewhere in the hierarchy, which acts somewhat like a wrapper for erlang_js. This is probably the future of js_runner.

Update 2: A different approach is to incorporate the "wrapper" inside erlang_js so that we don't need to see it. We just make our js:call/? calls, and those are handed off to a "persistent" process inside erlang_js to which the JSVMs are registered.

Time report
Today: 5 hours
Updating day: 1 hour

Friday, February 4, 2011

Protocol changes

Now I have implemented much of the new protocol Jeena proposed. GGS has gotten to the point where it is possible to define a function with an arbitrary name and then run that function. A session is maintained via a token that is passed back and forth between client and server.
We still have not implemented unique IDs for the users, so if two users connected at once, there'd be a 0.1% chance that they got the same ID, and that chance grows with every additional user!
The JavaScript-running code is really simple at the moment, and Jeena is working on improving it.
The supervisor tree shown in the last post is now implemented. We have an Mnesia controller which we can use to communicate with Mnesia, but we still only have one ggs_server.
Niklas and I also went to the library to learn more about their databases. It was very informative and will certainly prove to be a great resource.
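The 0.1% figure for two users implies 1000 possible IDs; that number is my assumption, inferred from the percentage. With more simultaneous users the risk grows much faster than linearly; this is the birthday problem. A small C function (hypothetical name) that computes the probability of at least one clash when each of n users gets an ID drawn uniformly at random from id_space values:

```c
/* Probability that at least two of n users get the same ID when each ID
 * is drawn uniformly at random from id_space values (birthday problem):
 *   1 - prod_{i=0}^{n-1} (id_space - i) / id_space */
double collision_probability(int n, int id_space)
{
    if (n > id_space)
        return 1.0;   /* pigeonhole: a clash is certain */
    double all_distinct = 1.0;
    for (int i = 0; i < n; i++)
        all_distinct *= (double)(id_space - i) / id_space;
    return 1.0 - all_distinct;
}
```

With 1000 possible IDs, 38 simultaneous users already give a better-than-even chance of a clash. This is why a random or sequential ID from a small range is not good enough.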

Time report
Yesterday: 3 hours (including meeting w/ supervisor)
Today: 4 hours, working on the protocol, 2 hours in library = 6 hours total

Tuesday, February 1, 2011

Supervisor

Now the supervisor is active and working in our little GGS. Currently there is one root supervisor, which supervises the ggs_server process. Once the ggs_server process dies (which can be forced by a __crash command), the supervisor restarts it.
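For reference, the restart behaviour described above boils down to a loop like the one below. This is a C sketch using fork()/waitpid() on a POSIX system and not how OTP supervisors are implemented. supervise, flaky_worker and always_crashes are hypothetical names. max_restarts only loosely mirrors OTP's restart intensity, which also includes a time window.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs worker(attempt) in a child process and restarts it every time it
 * dies abnormally, giving up after max_restarts restarts. Returns the
 * number of restarts needed, or -1 if the limit was exceeded or
 * fork()/waitpid() failed. */
int supervise(int (*worker)(int attempt), int max_restarts)
{
    for (int attempt = 0; attempt <= max_restarts; attempt++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(worker(attempt));   /* child: run the worker once */
        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return attempt;           /* clean exit: stop supervising */
        /* crash or non-zero exit: loop around and restart it */
    }
    return -1;
}

/* Hypothetical workers. Whatever state a worker built up is lost on every
 * restart, which is the problem with ggs_server's state. */
int flaky_worker(int attempt)
{
    if (attempt < 2)
        abort();                      /* like a forced __crash */
    return 0;
}

int always_crashes(int attempt)
{
    (void)attempt;
    abort();
}
```

Here supervise(flaky_worker, 5) returns 2 (two crashes, then a clean run), while supervise(always_crashes, 3) gives up and returns -1.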

The problem is that the ggs_server process holds valuable state, which we need to propagate to other ggs_server processes for reliability. The next step is figuring out how we can run two (or more) completely mirrored ggs_server processes, so that bringing one down doesn't bring the other down.

The network part of the ggs_server module should also be moved out to ggs_network, similar to how the application level protocol is placed in ggs_protocol. The actions taken by ggs_server when a message is received are currently not implemented in an "OTP fashion" and should be rewritten as such (should be simple).

Yesterday, we spent some time discussing and implementing the protocol which we use to communicate to the clients and back to the server again. This ended up as the ggs_protocol module. Also, yesterday night (this morning..) I implemented {client, JS-VM} mappings in an OTP fashion in ggs_server.

The picture below shows the supervisor setup we are currently working towards. The tree depicts the GGS system running on one machine; when it runs on several machines, we need to link several of these trees together.

Time report
Yesterday (day): 4 hours
Yesterday (night): 2 hours
Today: 3 hours

Randomness and Identifiers

We have a somewhat new problem to consider in our project. Picture GGS as a 'cloud' of computers, where we can add and remove computers as we wish. Now consider adding a game client (player) on node A of the cloud, and at the same time adding a new player on node B. These two players need to be distinguishable from each other. This means that we cannot simply give each player a pseudorandom number from 1-1000, or even a sequential number from 1-1000, because we will have clashes after a while. To solve this, we first considered using Erlang's make_ref() function; however, we encountered several issues with it:

The refs are not really intended to be sent over network in binary form (or so it seems at least)
When converted to binary, they are quite large
According to [1], they are unique to "approximately 2^82 calls" - which is a bit vague.

So, looking beyond make_ref(), we found UUIDs, which have more appealing properties:

They are designed to be used over a network [2]
They are designed to be used for objects with a long life [2]
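The post doesn't say which kind of UUID GGS generates. As an illustration, here is a sketch assuming RFC 4122 version 4 (random) UUIDs: take 16 random bytes, force the version and variant bits, and format them as 36 hex characters and dashes. Reading from /dev/urandom assumes a Unix-like system. Both function names are hypothetical.

```c
#include <stdio.h>

/* Formats 16 random bytes as an RFC 4122 version 4 UUID string.
 * out must hold at least 37 chars (36 + NUL). The bytes are modified in
 * place to carry the version and variant bits. */
void uuid_v4_format(unsigned char b[16], char out[37])
{
    b[6] = (unsigned char)((b[6] & 0x0f) | 0x40);  /* version 4 */
    b[8] = (unsigned char)((b[8] & 0x3f) | 0x80);  /* RFC 4122 variant (10xx) */
    snprintf(out, 37,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x%02x%02x%02x%02x",
             b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7],
             b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);
}

/* Fills out with a fresh UUID using bytes from /dev/urandom
 * (Unix-specific assumption). Returns 0 on success, -1 on failure. */
int uuid_v4_generate(char out[37])
{
    unsigned char b[16];
    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(b, 1, sizeof b, f);
    fclose(f);
    if (n != sizeof b)
        return -1;
    uuid_v4_format(b, out);
    return 0;
}
```

Since 122 of the 128 bits are random, two nodes can hand out IDs independently without coordinating. That is exactly the node A / node B situation described above.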

We will proceed with UUIDs as our identifiers, both internally in the system and externally towards the clients. If needed, we could even salt the UUIDs a bit, but I personally don't think we will need to.

References

1. http://www.erlang.org/doc/man/erlang.html#make_ref-0
2. http://www.opengroup.org/dce/info/draft-leach-uuids-guids-01.txt

Section 3.0