The {ReferenceID, JSVM} tuple stored in the ggs_server state was previously lost whenever ggs_server crashed. It is now stored in ggs_backup as well, and this is where the fault tolerance comes in.
ggs_server may now crash at pretty much any time and resume its state from ggs_backup. The startup order works as follows (a sketch of the supervision tree follows the list):
- ggs_sup starts
- mnesia_ctrl starts
- ggs_server_sup starts
- ggs_backup starts
- ggs_server starts
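A minimal sketch of what ggs_server_sup could look like, assuming a one_for_one restart strategy (so a crash in ggs_server does not take ggs_backup down with it) and assuming start_link/0 entry points for both children; the actual child specs in GGS may differ:

-module(ggs_server_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% one_for_one: a crash in ggs_server restarts only ggs_server,
    %% leaving ggs_backup (and the state it holds) untouched.
    Backup = {ggs_backup, {ggs_backup, start_link, []},
              permanent, 5000, worker, [ggs_backup]},
    Server = {ggs_server, {ggs_server, start_link, []},
              permanent, 5000, worker, [ggs_server]},
    {ok, {{one_for_one, 5, 10}, [Backup, Server]}}.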
If ggs_backup crashes, ggs_server_sup restarts it, just as it restarts ggs_server, and the state is then reloaded from ggs_server. The only case that is not covered is ggs_server and ggs_backup crashing at the exact same time.
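As a hypothetical sketch (the function names backup_state/1 and get_state/0 are mine, not the real GGS API), ggs_backup can be a plain gen_server whose whole job is to hold the last {ReferenceID, JSVM} table it was handed:

-module(ggs_backup).
-behaviour(gen_server).
-export([start_link/0, backup_state/1, get_state/0]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Called by ggs_server after every state change.
backup_state(Table) -> gen_server:cast(?MODULE, {backup, Table}).

%% Called by ggs_server from init/1 to recover the last known table.
get_state() -> gen_server:call(?MODULE, get_state).

init([]) -> {ok, []}.   %% empty table on the very first start

handle_cast({backup, Table}, _OldTable) -> {noreply, Table}.

handle_call(get_state, _From, Table) -> {reply, Table, Table}.

On the ggs_server side, init/1 would then start from ggs_backup:get_state() instead of an empty table, and every change to the table would be followed by a ggs_backup:backup_state(NewTable) cast, so the backup is always in sync.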
For some reason, though, the JSVMs stored in the ggs_server state do not survive the migration to ggs_backup and back, most likely because they are spawned from inside ggs_server and are not supervised properly. This problem should be easily solved by supervising erlang_js at a higher level.
UPDATE: Moving erlang_js higher up in the hierarchy is not the solution to the JSVM death issue. The JSVMs disappearing when ggs_server crashes is by design, and comes down to the new/3 function in js_driver.erl in erlang_js.
new/3 creates a connection to the erlang_js driver using an open_port call, which registers the calling process (ggs_server) as the port owner. Looking at the documentation for ports, we can see that when the port owner dies, the port is supposed to go down as well. I see two possible solutions to this (a sketch of the port-handover mechanics follows the list):
- Change the owner of the port to ggs_backup when ggs_server crashes, and then back again
- Set the owner of the port to a process that should never go down (thanks)
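For the first option, the handover itself is just erlang:port_connect/2 plus an unlink/1, but note that it has to happen while ggs_server is still alive: once ggs_server crashes, the exit signal travels over the link and closes the port. A minimal sketch (the module and function names are made up for illustration):

-module(port_handover).
-export([give_port_to/2]).

%% Called by the current port owner (e.g. ggs_server) to make NewOwner
%% (e.g. ggs_backup) the connected process of Port.
give_port_to(Port, NewOwner) when is_port(Port), is_pid(NewOwner) ->
    true = erlang:port_connect(Port, NewOwner), %% NewOwner becomes the owner and gets linked
    true = erlang:unlink(Port),                 %% drop our own link so our exit cannot kill the port
    ok.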
Update 2: A different approach is to incorporate the "wrapper" inside erlang_js so that we do not need to see it: we just make our js:call/? calls and they are handed off to a "persistent" process inside erlang_js that has the JSVMs registered to it.
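A rough sketch of that idea, assuming the wrapper is a single long-lived gen_server that owns every JSVM port and keeps a RefID-to-JSVM table. The module name is made up, and the erlang_js arities used here (js_driver:new/0, js:call/3) are assumptions that should be checked against the real erlang_js API:

-module(ggs_jsvm_pool).
-behaviour(gen_server).
-export([start_link/0, new_vm/1, call/3]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Create a JSVM for RefID; the pool process becomes the port owner,
%% so the port survives crashes in the processes that merely use it.
new_vm(RefID) -> gen_server:call(?MODULE, {new_vm, RefID}).

%% Run a JS function in the VM registered for RefID.
call(RefID, Fun, Args) -> gen_server:call(?MODULE, {call, RefID, Fun, Args}).

init([]) -> {ok, dict:new()}.

handle_call({new_vm, RefID}, _From, VMs) ->
    {ok, JSVM} = js_driver:new(),   %% opened here, so owned here
    {reply, ok, dict:store(RefID, JSVM, VMs)};
handle_call({call, RefID, Fun, Args}, _From, VMs) ->
    JSVM = dict:fetch(RefID, VMs),
    {reply, js:call(JSVM, Fun, Args), VMs}.

handle_cast(_Msg, VMs) -> {noreply, VMs}.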
Time report
Today: 5 hours
Updating day: 1 hour