Jashank Jeremy's Blog

round07

Round at 20:03. 30 submissions, 28 Draculas compiled, 27 Hunters compiled, 729 games ran. A slight sag in the number of running games because of compile failures -- all spectacular. At least one group submitted a broken Makefile; others seem to have forgotten to at least write syntactically valid C. One group's Dracula compiled, but their hunter didn't; strange.

Times: fastest game 888.00 μsec; lower quartile 240.96 msec; median 1.02 sec; upper quartile 2.52 sec; mean 2.65 sec; slowest game 55.93 sec. I'm very curious about the mean slowly sliding outside the most common round times.

Here's the customary graph of comparative rounds -- the curve is shorter because the game count was down:

And for illustration, here's all the games thus far:

There's not been a major change in the number of submissions, which is the most striking observation, but the rounds have been fairly consistent in time: not much has changed in the last day or so.

While we're looking at the picture so far, let's have a look at how the timing statistics have changed.

It looks like the typical game time has actually been fairly constant. The mean is skewed surprisingly high, presumably because of just how slow the slow games have been. Here's just the middle 50% of games, and the mean relative to those.

(Google Sheets isn't very good at dealing with area fills on log-scale graphs. *sigh* )

Next round is at 06:12 tomorrow.

posted by Jashank Jeremy Friday 26 January 2018, 08:54:02 PM, last modified Friday 26 January 2018, 08:54:30 PM

round06

Round ran at 06:11. 28 submissions, 28 compiled, 756 games ran. Fastest game 390.00 μsec; lower quartile 263.33 msec; median 977.49 msec; upper quartile 3.04 sec; mean 3.41 sec; slowest game 317.14 sec. Some slow games creeping in as people's strategies get more complex and compute-hungry:

And if you look in at a game log, you'll see why this analysis was so late: I've finally switched the new game log viewer live. It does better round-by-round analysis and lets you skip forward and backwards player- or turn-at-a-time. It hides player output by default, too.

Games before round 5 don't have running score display; in rounds 5 and onwards, the game engine reports the score each turn. I don't get player health until that player's turn, so it may appear, for example, that Dracula is alive right at the end of a game where his health goes to zero.

(The reason for that is I'm too lazy to implement the views in Yet Another programming language this session.)

I've also whacked a few small bugs caused by our bazaar of bizarre Unicode group names. They now render properly in the group list, ranks, and in game log headers.

Let me know if you run into any bugs while you're on the trail.

Next round at 20:03.

posted by Jashank Jeremy Friday 26 January 2018, 03:06:46 PM, last modified Friday 26 January 2018, 03:45:38 PM

round05

Round at 20:03; 27 submissions, 27 compiled, 702 games ran. Fastest game in 439 μsec, lower quartile 230 msec, median 852 msec, upper quartile 2.30 sec, mean 2.34 sec, slowest game 45.36 seconds. The graph looks pretty similar:

And a huge spike in the number of floating-point exceptions: dividing by zero isn't a great way to start a game.

So why is the analysis for this round so late? Here's a sneak peek at what I've been working on this afternoon:

All things going well, this will be the new game log viewer soon. I entirely blame Keiran for making an off-hand suggestion during the lecture today that's brought this to life. Notably, it gives better navigation between turns and rounds.

Also, fixed a bug in the player around truncating overlong player messages: one group, in particular, has very long messages, which should be fine: the player deals with that automatically, by only copying the first MESSAGE_SIZE bytes and ensuring there's a NUL at the end -- but if your message contains Unicode runes, which require multiple `char`s to encode, this will create a malformed Unicode sequence, which the JSON library we use silently drops. Hooray!

I've also added a new "game score" line to the game log. If you (like me) are parsing game logs using regular expressions and finite state machines, you'll need to account for that.

It's looking very congested at the top of the leaderboards... next round at 06:11.

UPDATE 2243: ~~there's a bug in the game runner~~ . more details soon, hopefully.
UPDATE 2336: there's a bug in a group who didn't read the rules. good work, team.

posted by Jashank Jeremy Thursday 25 January 2018, 10:24:35 PM, last modified Friday 26 January 2018, 03:45:34 PM

round04

25 submissions, 24 compiled, 552 games ran. Fastest game 2 msec, lower quartile 221 msec, median 860 msec, mean 2.35 sec, upper quartile 2.36 sec, slowest game 45.40 sec. The graph looks very similar; as I'd expect, not much performance improvement shows up in the morning round.

The sole compile failure was very interesting: Clang's automatic static and semantic analysis beats the mouldering pile of cruft that is GCC. As you've probably noticed, since round00c, we've used the same `3c` compiler shim that inserts AddressSanitizer and UndefinedBehaviourSanitizer, but because its architecture is radically different to GCC, it's a lot more capable of picking up compile errors.

GameView.c:690:15: error: variable 'specialID' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
   } else if (location[0] == 'T'){
              ^~~~~~~~~~~~~~~~~~
GameView.c:693:11: note: uninitialized use occurs here
   return specialID;
          ^~~~~~~~~
GameView.c:690:11: note: remove the 'if' if its condition is always true
   } else if (location[0] == 'T'){
          ^~~~~~~~~~~~~~~~~~~~~~~
GameView.c:673:17: note: initialize the variable 'specialID' to silence this warning
   int specialID;
                ^
                 = 0
1 error generated.

Control flow analysis is actually a very interesting digraph problem. Each "basic block" forms a node, and each conditional creates edges linking other blocks together. It's not undirected -- control flow can't suddenly go backwards -- and it's not acyclic, because control flow is allowed to loop.

On the topic of AddressSanitizer, there was a great question on the forums by Brian Do:

What does the address sanitizer do and why does it pick up stuff that valgrind doesn't? I copied the dryrun tests and ran it under valgrind and it doesn't detect any errors, but the address sanitizer does.

The AddressSanitizer is a memory error checker. It detects use after free/return/scope, heap/stack/global buffer under/overflows, initialization order bugs, and memory leaks. If you've used dcc before, that's what does its memory error management.

Valgrind takes any executable, and instead of running it on the machine directly, it has a great big switch statement in it that interprets each instruction, checks it, and then runs it. Along the way, it replaces calls to a few types of function,s including memory allocation and memory and string manipulation functions, with its own book-keeping versions of them.

The AddressSanitizer, by comparison, uses a very different approach: it must be compiled into a program (which is why the binaries produced when you compile with ASan or dcc are ~1.5 MB, and without them are tens of kilobytes), and it slots into the compiler's code generation pass: it also replaces memory allocator and some string/memory functions, but also inserts function calls to check against its shadow memory, a bit-mask of what memory is accessible and why. On memory errors, you'll see a dump of the shadow memory around what you tried to access. (If you're interested, there's much more excruciating detail on the Sanitizers wiki < https://github.com/google/sanitizers/wiki >.)

Because ASan has a quite different algorithm, a fairly different approach, and is under very active development (by Google, who use it to whack memory bugs in Chrome) you'll often find it picks up a very different set of bugs to Valgrind. The flip-side is that ASan is necessarily invasive: it uses a significant amount more memory and increases executable size.

I (personally) tend to trust ASan more, mainly out of familiarity. For the dryruns, I've hooked up ASan as a back-stop; most of the memory errors you could make would be detected by Valgrind, but a small number aren't. If you run into one of those cases, I'm happy to take a look.

So, inspired by that, I went and scraped the logs for memory errors being reported by AddressSanitizer... uh oh.

 6   SUMMARY: AddressSanitizer: FPE dracula.c:28:52 in decideDraculaMove
 1   SUMMARY: AddressSanitizer: FPE dracula.c:98:30 in decideDraculaMove
13   SUMMARY: AddressSanitizer: heap-buffer-overflow dracula.c:55:20 in decideDraculaMove
41   SUMMARY: AddressSanitizer: heap-buffer-overflow dracula.c:60:33 in decideDraculaMove
 6   SUMMARY: AddressSanitizer: undefined-behavior dracula.c:28:52 in 
 1   SUMMARY: AddressSanitizer: undefined-behavior dracula.c:68:19 in 
 1   SUMMARY: AddressSanitizer: undefined-behavior dracula.c:98:30 in 
 1   SUMMARY: AddressSanitizer: undefined-behavior DracView.c:288:16 in 
 1   SUMMARY: AddressSanitizer: undefined-behavior DracView.c:292:16 in

I'm impressed by the number of actual issues we're picking up, notably buffer overflows and actual crashes. I haven't seen any leaks reported, though, so I guess there's that.

I'll be putting together something to view rank change over time later today, hopefully. Next round is at 20:03 tonight.

posted by Jashank Jeremy Thursday 25 January 2018, 09:46:12 AM, last modified Friday 26 January 2018, 03:45:29 PM

round03

For the first time, all the round automation code Just Worked, and the round went off without a hitch. Rejoice!

24 submissions, 24 compiled, 552 games ran. Fastest round in 383 μsec (?!), lower quartile at 190 msec, median at 614 msec, upper quartile at 1.78 sec, mean 2.05 sec, slowest round 45 seconds. The infamous log-time graph for just rounds 2 and 3:

Next round runs at 06:10 tomorrow. Go go go!

posted by Jashank Jeremy Thursday 25 January 2018, 09:21:23 AM, last modified Friday 26 January 2018, 03:45:23 PM

round02

21 submissions, 20 compiled. 380 games run.

It looks like the people with infinite loops actually fixed them; hooray! The run-time graph looks much less depressing now. And the numbers on that: fastest game was 2.70 msec, lower quartile, 180 msec, median 710 msec, mean 1.71s, upper quartile 1.85s slowest game 18.27 seconds. I'm much happier with the flatter line here.

Interestingly, the line overall has gotten flatter, suggesting that fast games are getting slower. I'd expect that, as algorithms in views and players gets more complex, but it's interesting to note that the fastest game from last round (526 μsec) is now 6.46 msec, just over an order of decimal magnitude slower. Most of the same groups are up the top of my timing chart, too.

I've set up table sorting on all pages with tables on them. Click one of the table headers to sort by that column.

The round this morning didn't start on time; I'm not entirely sure why, because as far as I can tell, it was rigged to start at the right time. Next round is this evening.

posted by Jashank Jeremy Wednesday 24 January 2018, 11:30:09 AM, last modified Friday 26 January 2018, 03:45:18 PM

round00[abcd] - the test rounds

while round02 runs, I want to talk about the test rounds.

round00a: our first actual round. 2 submissions, not much interesting to report

round00b: 4 submissions, 2 compiled. as far as I can tell, the compile errors here were due to the supplied view code interacting badly with Makefiles; I'm not sure why.

round00c: 6 submissions, 6 compiled. we switched to using the `3c` wrapper around clang+ASan here as well, following on from the view phase. it's much more aggressive about memory error checking.

round00d: 8 submissions, 8 compiled. this round ran just after the new web interface was finished; I'm very surprised that so many new submissions showed up.

... oh, hey, round02's done. let's go crunch some numbers.

posted by Jashank Jeremy Wednesday 24 January 2018, 11:00:22 AM, last modified Friday 26 January 2018, 03:44:58 PM

round01

20 submissions, 18 compiled. 306 games ran.

Round 1 broke due to some network issues in CSE, so restarted a bit late. It ran pretty quickly with the exception of one group (you know who you are) whose Hunters and Dracula were just infinite loops. I'm now very tempted to add a penalty for "number of times you hit the move timeout".

Here's the run-times as a graph. y is log-scale total move time in seconds per game; fastest 526 μsec, median 276 msec, mean 35 sec, slowest 1183 sec:

I've also fixed up ranks, so they now make more sense. Congratulations to team Blade who improved as Dracula the most; and to team Kevin and Hui for holding onto the top of the hunter ranks. The next round is scheduled for a bit after 6am.

I'm also going to be keeping some round analysis and notes on my blog; if you're really curious about how rounds run and various interesting bits of rambling about the game engine and interface, keep an eye on it.

posted by Jashank Jeremy Tuesday 23 January 2018, 09:40:50 PM, last modified Tuesday 23 January 2018, 09:52:01 PM

CRICOS Provider No. 00098G