Dev Diary: The Hunt for the One Sync Bug!
Tired but Wired!
Well, I have to say I really, really detest sync bugs, probably more than all of you combined. But as of this moment I am elated despite being utterly exhausted (it’s 4:04am as I type this sentence) because I’ve tracked down and found a sync bug that has been in Sins for a VERY long time. One of the great joys of making games is tracking down the nastiest bugs: the kind that hound you for days, months and even years, the kind that constantly show up in your support email, the kind that always end up in the top ten posts of the forums. Sync bugs are the worst.
I’m going to explain the cause of the sync bug, how it was tracked, and how it was fixed. It might be easier to understand if you take a look at my last dev diary, where I spoke at length about sync bugs. You can find it here: https://forums.sinsofasolarempire.com/331090.
Cause and Effect
The sync bug was caused by the second type listed in the previous dev diary: mixing non-deterministic code with deterministic code. Sins uses two random generators, a deterministic one and a non-deterministic one (DRandom and NDRandom as we call them in code). DRandom is always called by anything that affects the simulation, so that the same results are achieved on everyone’s computers without having to transmit data (it’s a property of the math). NDRandom is called by things that don’t affect the simulation, like the random direction particles shoot off in an explosion or the flickering of the exhaust. You can’t mix them, but unfortunately we did!
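To make the distinction concrete, here is a minimal sketch of the two streams. It is not the Sins code: `make_drandom`, `make_ndrandom` and the shared seed value are hypothetical names chosen for illustration, and Python's standard `random.Random` stands in for whatever generator the engine actually uses.

```python
import random

def make_drandom(shared_seed):
    """Deterministic stream: every machine seeds it with the same agreed value."""
    return random.Random(shared_seed)

def make_ndrandom():
    """Non-deterministic stream: seeded locally from system entropy."""
    return random.Random()

# Two "machines" that never exchange random numbers, only the starting seed
# (42 is an arbitrary value assumed for this example).
machine_a = make_drandom(42)
machine_b = make_drandom(42)

rolls_a = [machine_a.randrange(100) for _ in range(5)]
rolls_b = [machine_b.randrange(100) for _ in range(5)]

# Same seed and same sequence of calls, so both machines get the same rolls.
print(rolls_a == rolls_b)

# The non-deterministic stream gives each machine its own sequence, which is
# fine for particles and exhaust flicker but not for anything simulated.
effects_rng = make_ndrandom()
spark_angle = effects_rng.uniform(0.0, 360.0)
```

The catch is that the deterministic stream only stays in step if every machine makes exactly the same calls in the same order, which is why simulation code and effects code need separate streams.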
A Barrage of Problems
In Sins, when certain classes of abilities are activated, the effect you see comes from hidden “buff points” on the mesh. For example, when the ultimate ability of the Marza (“Missile Barrage”) fires, you see lots of missiles shoot out of the right-side missile racks, which are defined as a little grid of “buff points”. When each individual missile goes to launch, one of those invisible points is randomly selected as the missile’s origin. Because this is an effect, it’s supposed to be using the NDRandomStream to make that point selection, and it is. However, the randomly selected point is then used in the math that calculates the time it will take the missile to damage the target. That time then becomes part of the ability’s simulation. Because of this, different computers can generate different results for the time it will take the missile to hit the target, and they may go out of sync.
Of course, Missile Barrage isn’t the only culprit. Any ability that is built using ApplyBuffToTargetWithTravel has a chance to go out of sync. This would include things like EMP blast, Gauss Blast and many more. However, it turns out that for the majority of them, the probability is actually zero. The chance of going out of sync grows with the number of buff points. Most of these abilities have only one buff point, so it doesn’t matter which random generator we use: the same result will always occur because there is only one point to pick! The Marza’s Missile Barrage is far more likely to de-sync because it’s made up of a whole grid of points.
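The chain from "random launch point" to "different hit time" can be sketched as follows. Everything here is an assumption made for illustration: the 3x3 rack layout, the coordinates, the constant missile speed and the function names are hypothetical, and the real travel-time math in ApplyBuffToTargetWithTravel is not shown in this diary.

```python
import math
import random

def travel_time(origin, target, speed):
    """Seconds for a missile to fly from origin to target at a constant speed."""
    return math.dist(origin, target) / speed

def launch_missile(buff_points, target, speed, rng):
    """Pick a launch point with the given generator and return the travel time."""
    origin = rng.choice(buff_points)
    return travel_time(origin, target, speed)

# A 3x3 missile rack (hypothetical layout) and a single-point ability.
missile_rack = [(float(x), float(y), 0.0) for x in range(3) for y in range(3)]
single_point = [(0.0, 0.0, 0.0)]
target = (100.0, 0.0, 0.0)
speed = 50.0

# The bug: each machine picks with its own locally seeded generator, so the
# two travel times are free to disagree whenever the rack has several points.
buggy_a = launch_missile(missile_rack, target, speed, random.Random())
buggy_b = launch_missile(missile_rack, target, speed, random.Random())

# One buff point: any generator has to pick it, so the times always agree.
safe_a = launch_missile(single_point, target, speed, random.Random())
safe_b = launch_missile(single_point, target, speed, random.Random())

# Same seed on both machines (2009 is arbitrary): the picks match every time.
synced_a = launch_missile(missile_rack, target, speed, random.Random(2009))
synced_b = launch_missile(missile_rack, target, speed, random.Random(2009))

print(safe_a == safe_b, synced_a == synced_b)
```

Note that `buggy_a` and `buggy_b` will sometimes match by luck, which is part of what makes this kind of bug so hard to reproduce on demand.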
Detecting a de-sync is conceptually as simple as comparing the complete simulation state of the game on one person’s computer against that of another’s. That is pretty expensive (i.e., it would slow your computer to a crawl), so we add up all the state values into a “checksum” and then just compare that. Every update, each person sends his or her checksum to the host and the host compares them. If they differ, the players are out of sync. Once you know they are out of sync, you need to determine the delta, or the difference, in the states to see where they diverged. I put a post up asking people to enable “snapshots” and to turn the detail up to 3. This writes out a file that contains a very detailed snapshot of that person’s game state at the time the de-sync was detected. Once all players send me the files, I can throw them into a special program that shows me the delta between them (I use Beyond Compare, which I highly recommend for a wide variety of game development related tasks). The delta tells me what diverged and I can then start looking for the cause. The problem is getting the de-sync to happen in the first place!
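As a rough sketch of that detect-then-diff loop: the snippet below folds a toy state into a checksum, lets the "host" compare checksums, and then diffs two snapshots. The state layout, field names and function names are all hypothetical, and it uses CRC32 from Python's `zlib` rather than literally adding values up, purely because it is a convenient stand-in; it is not the checksum or snapshot format Sins uses.

```python
import zlib

def state_checksum(state):
    """Fold a simulation state (name -> value) into a single 32-bit number."""
    checksum = 0
    for key in sorted(state):  # fixed order so every machine hashes alike
        entry = f"{key}={state[key]!r}".encode("utf-8")
        checksum = zlib.crc32(entry, checksum)
    return checksum

def find_desynced_players(checksums_by_player, host):
    """Return the players whose checksum differs from the host's."""
    host_sum = checksums_by_player[host]
    return [p for p, c in checksums_by_player.items() if c != host_sum]

def snapshot_delta(snapshot_a, snapshot_b):
    """Return only the fields whose values differ between two snapshots."""
    keys = sorted(set(snapshot_a) | set(snapshot_b))
    return {k: (snapshot_a.get(k), snapshot_b.get(k))
            for k in keys if snapshot_a.get(k) != snapshot_b.get(k)}

# Two machines that disagree on when one missile lands (made-up values).
host_state = {"marza_1.hull": 3200, "missile_7.hit_tick": 120}
client_state = {"marza_1.hull": 3200, "missile_7.hit_tick": 122}

checksums = {
    "host": state_checksum(host_state),
    "client": state_checksum(client_state),
}
out_of_sync = find_desynced_players(checksums, "host")
delta = snapshot_delta(host_state, client_state)
print(out_of_sync, delta)
```

The checksum is cheap enough to send every update but only says that something differs; the full snapshot is expensive, so it is only written out once a mismatch has been detected.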
The Hunt for the Ring
One of my favourite board games is War of the Ring, which is based on Lord of the Rings. In this game the evil guys get to put aside special Hunt Dice that increase their chances of tracking down the ring-bearer and his hidden fellowship. The more dice you put in there, the better your chances. Tracking sync bugs is like tracking Frodo and Sam while they wear those elven cloaks and have Gollum lead them through all the secret hidden paths, except we’ve only got half a die, which I suppose symbolizes half of one of the nine Nazgul’s horses. We are basically screwed, the ring gets thrown into the lava time and again, and we receive nasty emails that would do Sauron proud. In order to stop this vicious cycle, we’ve called upon the masses of the internet to become our Hunt Dice and track the ring-bearer down! Luckily, a few special Nazgul found Frodo and sent us their reports so we could kill him and take back the ring.
Special thanks to the following nine Dark Riders:
(Yes, by pure coincidence there were exactly 9 people able to recreate it and send in valid snapshot files.)
Out Damn Spot!
Not much to say here. I simply made sure the buff point was selected using the Deterministic Random Generator instead of the Non-Deterministic one. Expect both original Sins and Sins: Entrenchment to be patched up with this fix shortly!
Thanks for reading,