Mike Bostock knows far more about data visualization than I do, so I assume there are good reasons for choosing this presentation. But it strikes me as hard to reason about, so I have to ask: when are stacked+packed circles a good choice?
The drawbacks to me seem to be:
- At any given point on the x-axis, the associated height of the pile is partly taken up by circles but is partly space lost to circle packing. This proportion is hard to estimate visually.
- Because circles extend in the x-direction, they contribute to height over a _variable_ time range. E.g. WaMu is still the tallest circle at the 2010 line. It's a bit like a kernel density plot where the _kernel_ is data dependent.
It's tempting to think of the profile of the pile as a smoothed average of the rate of bank failures, but taken together these effects make that reading pretty misleading, which I suppose is why there's no labeled y-axis.
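To make the "data-dependent kernel" point concrete, here is a rough sketch, assuming each circle's area is proportional to the failed bank's assets and its center sits at the failure date. The names `radius_for` and `circles_over`, the `area_per_unit` scale, and the data are all made up for illustration; none of it comes from the actual chart code.

```python
import math

def radius_for(assets, area_per_unit=1.0):
    """Radius of a circle whose area is proportional to `assets` (assumed encoding)."""
    return math.sqrt(assets * area_per_unit / math.pi)

def circles_over(x, failures, area_per_unit=1.0):
    """Names of failures whose circle overlaps horizontal position x."""
    return [name for name, cx, assets in failures
            if abs(x - cx) <= radius_for(assets, area_per_unit)]

# Hypothetical data: (name, x position of the failure date, assets).
# Units are arbitrary chart units; these are not real figures.
failures = [("big", 0.0, 314.0), ("small_a", 8.0, 3.0), ("small_b", 12.0, 1.0)]

# The big circle has a radius of about 10, so it still covers x = 9,
# while the small circle centered at x = 8 (radius of about 1) does not.
print(circles_over(9.0, failures))
```

The width of each circle's footprint grows with the square root of its size, so a large failure "smears" over a much longer stretch of the time axis than a small one.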
The way I understand the graph is that it is not the "number" of circles that matters, but the area (money) that they take up. So those three bank failures are taking up the same space as all those other failures from 2008.
It is the total volume of money involved vs. the count. The height of both stacks shows just how much damage those three did vs. all the others in the left-hand stack.
Yeah, I get the count vs. area distinction, but I think the point would have been communicated more cleanly with e.g. a stacked bar chart of failures by quarter. With circles, the filled area over any span of the x-axis is not solely due to failures that happened during the corresponding time period. So the area above a short span in early 2010 includes area contributed by WaMu, which failed in Sept 2008.
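For what it's worth, the binning behind a by-quarter stacked bar chart is only a few lines. This is a minimal sketch with hypothetical names (`quarter_of`, `segments_by_quarter`) and invented data, just to show that each bar would contain only the failures dated inside its own quarter.

```python
from collections import defaultdict
from datetime import date

def quarter_of(d):
    """Map a date to a (year, quarter) key, with quarters numbered 1 to 4."""
    return (d.year, (d.month - 1) // 3 + 1)

def segments_by_quarter(failures):
    """Group (name, date, assets) records into per-quarter lists of bar segments."""
    bars = defaultdict(list)
    for name, failed_on, assets in failures:
        bars[quarter_of(failed_on)].append((name, assets))
    return dict(sorted(bars.items()))

# Hypothetical records: names, dates, and asset values are placeholders.
failures = [
    ("bank_a", date(2008, 9, 25), 300.0),
    ("bank_b", date(2008, 7, 11), 30.0),
    ("bank_c", date(2010, 1, 15), 2.0),
]

bars = segments_by_quarter(failures)
for quarter, segments in bars.items():
    total = sum(assets for _, assets in segments)
    print(quarter, segments, total)
```

Here the 2010 Q1 bar holds only `bank_c`; nothing from 2008 leaks into it, which is the property the circles lack.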
My read of it was that the circles are centered on the date of the failure, and the stacking was some flavor of minimum height. The result is that you get an approximation of the "area under the curve" of bank value that has failed, while retaining a sense of whether a particular peak was a few large actors or a mound of tiny ones. I find that clear distinction valuable.
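Here is one way that "minimum height" guess could work, purely as a sketch: fix each circle's x at its failure date, then drop it to the lowest y where it rests on the floor or touches an already placed circle without overlapping anything. I have not checked this against the posted code, so treat `place_lowest` and its greedy strategy as my assumption rather than the chart's actual layout.

```python
import math

def place_lowest(circles, eps=1e-9):
    """Greedy stacking sketch. `circles` is a list of (x, r); returns (x, y, r) tuples.

    Each circle keeps its x and takes the lowest y >= r (the floor is y = 0)
    at which it overlaps no previously placed circle.
    """
    placed = []
    for x, r in circles:
        # Candidate heights: resting on the floor, or tangent on top of a placed circle.
        candidates = [r]
        for px, py, pr in placed:
            dx = abs(x - px)
            reach = r + pr
            if dx < reach:
                candidates.append(py + math.sqrt(reach * reach - dx * dx))
        for y in sorted(candidates):
            above_floor = y >= r - eps
            clear = all(math.hypot(x - px, y - py) >= r + pr - eps
                        for px, py, pr in placed)
            if above_floor and clear:
                placed.append((x, y, r))
                break
    return placed

# Two unit circles at the same x stack vertically; a distant one stays on the floor.
print(place_lowest([(0.0, 1.0), (0.0, 1.0), (5.0, 1.0)]))
```

The highest tangent candidate always clears every placed circle, so each circle gets a position. The result depends on insertion order, which is one more reason the pile's outline is hard to read as a rate.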
The nice thing about Mike having posted the code is that, if you think there's a better way to portray it, the canvas is spread out before you. :)