The subtests were almost all marked as parallel, but that's not enough.
That only makes the subtests run in parallel with other subtests within
the same tests, not with any other test.
Since none of the tests make use of globals nor require the entire
program to themselves, properly run all the tests in parallel.
Speeds up 'go test' on my 8-core laptop from an average of ~130s to an
average of ~50s. Many tests hit timeouts and have sleeps, so we want to
avoid running those sequentially whenever possible.