Tuesday, March 18, 2008

From Telemetry to Trajectory

Cam Moore stopped by my office to chat yesterday, and in 15 minutes he managed to completely revolutionize my thinking about how to visualize software project information over time. (Not bad: most visitors need at least 30 :).

One outcome of our prior research in Hackystat was the idea of Software Project Telemetry, which Cedric Zhang explored for his Ph.D. thesis. Software Project Telemetry defines a nifty domain-specific language for specifying the software measures you want to consider, how to combine them, and how to display them as trend lines. One thing that's neat about telemetry is that it enables you to discover covariances among measures. For example: "Hey, when my percentage of TDD-compliant episodes drops from 85% to 40%, my coverage drops precipitously too!" (That really happened.) It also serves as a way to create an "early warning system" for projects in trouble. For example, if coverage is steadily dropping while complexity and coupling are both steadily increasing, then your design or architecture is probably in trouble, because the trends for three orthogonal static measures of structural quality are all deteriorating. Here's an example of a telemetry chart:

As the above chart illustrates, the X axis is always time, and there can be multiple Y axes, one for each kind of trend line.
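
The telemetry language itself isn't reproduced here, but the two analyses described above (spotting covariance between measures, and flagging a project whose structural trends all point the wrong way) are easy to sketch. The Python below is a minimal illustration, not Hackystat code: `slope` fits a least-squares trend line to a series assumed to be sampled at equal intervals, `pearson` measures how strongly two measures move together, and `structural_warning` encodes the "coverage down, complexity up, coupling up" rule. The function names, the equal-interval assumption, and the sample numbers are all hypothetical.

```python
from typing import Sequence


def slope(values: Sequence[float]) -> float:
    """Least-squares slope of a series against its sample index.

    Assumes samples are taken at equal intervals (e.g. one per week), so the
    result is the average change per interval.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equally long series, in [-1, 1]."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def structural_warning(coverage, complexity, coupling) -> bool:
    """Hypothetical early-warning rule: flag the project when coverage trends
    down while complexity and coupling both trend up."""
    return slope(coverage) < 0 and slope(complexity) > 0 and slope(coupling) > 0


if __name__ == "__main__":
    # Made-up weekly samples, for illustration only.
    tdd_percent = [85, 80, 70, 55, 40]
    coverage = [90, 88, 80, 70, 62]
    complexity = [10, 11, 13, 14, 17]
    coupling = [3, 3, 4, 5, 6]
    print(round(pearson(tdd_percent, coverage), 3))
    print(structural_warning(coverage, complexity, coupling))  # True
```

A correlation near 1 only says the two measures move together; it doesn't say which one drives the other, which is why a human still has to look at the chart.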

Telemetry turns out to work really well for in-process monitoring of a single project, and we definitely want to continue supporting and enhancing software project telemetry.

Recently, we have started to think about software project "portfolio management": what happens when an organization has 100 or more projects under development and wants insight not only into individual projects, but into the "portfolio" as a whole? For example, what projects are similar to each other? What projects constitute "outliers"? What kind of management or organizational changes appear to make broad impacts across multiple projects?

These questions pose difficulties for a conventional telemetry-oriented viewpoint. For example, how would you compare the telemetry for a project that is six months in duration with the telemetry for a project that is 12 months in duration? What if one project started in January and another project started in June? The notion of having multiple X axes in addition to multiple Y axes seems problematic at best.

Enter Cam. His idea is to think about "trajectory" instead of "telemetry". To make it simple, let's consider a situation where we are collecting three measures for a project: coupling, complexity, and coverage. Instead of using a 2D plot with the X axis being time, we use a 3D plot, where the three axes are coupling, complexity, and coverage. Now time is implicit in the "trajectory" of the plots through space.
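
To make the shift from telemetry to trajectory concrete, here is a small sketch under a few assumptions: each telemetry sample is a hypothetical `Snapshot` record holding a date plus the three measures, and a trajectory is simply the time-ordered list of (coupling, complexity, coverage) points. The date is used only for ordering and is then thrown away, which is exactly what makes time implicit.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical telemetry sample: one date plus three measures."""
    day: date
    coupling: float
    complexity: float
    coverage: float


Point3D = Tuple[float, float, float]


def to_trajectory(snapshots: List[Snapshot]) -> List[Point3D]:
    """Turn a telemetry series into a trajectory.

    The date is used only to put the samples in order and is then dropped,
    so time survives only as the sequence of points through
    (coupling, complexity, coverage) space.
    """
    ordered = sorted(snapshots, key=lambda s: s.day)
    return [(s.coupling, s.complexity, s.coverage) for s in ordered]


if __name__ == "__main__":
    samples = [
        Snapshot(date(2008, 3, 1), 4.0, 12.0, 70.0),
        Snapshot(date(2008, 1, 1), 3.0, 10.0, 80.0),
        Snapshot(date(2008, 2, 1), 3.5, 11.0, 75.0),
    ]
    print(to_trajectory(samples))
    # [(3.0, 10.0, 80.0), (3.5, 11.0, 75.0), (4.0, 12.0, 70.0)]
```

In a real plot you would probably also rescale each axis, since coverage is a percentage while complexity and coupling are not; that choice is left open here.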

Here's an exceedingly lame mockup using 3D VOPlot with a little PowerPoint post-processing:

I actually took some astronomical data to make this image, so you should ignore the axis values and pretty much everything else about this image except the basic notion that we are now focusing on trajectory, rather than telemetry, and this makes certain things much easier.

First, since the "time" dimension is now implicit, we can much more easily compare projects that start/end at different times and/or have different durations. Just plot their trajectories using different-colored points for different projects, and the scaling/displacing comes "for free". My mockup illustrates two projects, one with a sequence of blue dots, and one with a sequence of green dots. The visualization makes a couple of things pretty obvious: (a) for a while, Project Blue and Project Green have pretty similar trajectories, except that Project Green's complexity is below Project Blue's, and (b) something weird happened to Project Blue near the end that didn't happen to Project Green.

Note that we have no idea about the relative durations of Project Green or Blue, nor about their start/end dates. And in many cases abstracting away those details might be exactly the right thing to do.

Second, we can now ask ourselves a whole new set of interesting questions about the trajectories associated with different projects, such as:
  • What set of measures create interesting trajectories?
  • Which projects have similar trajectories?
  • Which projects have anomalous trajectories?
  • What about higher dimensionalities, where we want to compare trajectories involving more than three measures?
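
These questions are still open. One plausible starting point for the similarity and anomaly questions, swapped in here as an illustration and not part of Cam's proposal, is dynamic time warping (DTW). DTW compares two sequences of different lengths by letting one point match several consecutive points of the other, so a six-month project and a 12-month project can be compared directly, and it works for points with any number of dimensions. The helper names `dtw_distance`, `most_similar`, and `most_anomalous`, and the "largest mean distance to everyone else" definition of an outlier, are assumptions for this sketch.

```python
import math
from typing import Dict, Sequence

# A point has one coordinate per measure (3 for coupling/complexity/coverage,
# but any number works); a trajectory is a time-ordered list of points.
Point = Sequence[float]
Trajectory = Sequence[Point]


def dtw_distance(a: Trajectory, b: Trajectory) -> float:
    """Dynamic time warping distance between two trajectories.

    The trajectories may have different lengths; all points must have the
    same number of dimensions (math.dist raises ValueError otherwise).
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("trajectories must be non-empty")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between points
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def most_similar(name: str, projects: Dict[str, Trajectory]) -> str:
    """Hypothetical helper: the other project whose trajectory is closest."""
    others = [p for p in projects if p != name]
    return min(others, key=lambda p: dtw_distance(projects[name], projects[p]))


def most_anomalous(projects: Dict[str, Trajectory]) -> str:
    """Hypothetical helper: the project with the largest mean DTW distance
    to all other projects (requires at least two projects)."""
    def mean_distance(name: str) -> float:
        others = [p for p in projects if p != name]
        return sum(dtw_distance(projects[name], projects[p]) for p in others) / len(others)
    return max(projects, key=mean_distance)


if __name__ == "__main__":
    # Made-up 3D trajectories of different lengths.
    projects = {
        "blue": [(0, 0, 0), (1, 1, 1), (2, 2, 2)],
        "green": [(0, 0, 0), (1, 1, 1), (1.5, 1.5, 1.5), (2, 2, 2)],
        "red": [(0, 0, 0), (5, 5, 5), (10, 10, 10)],
    }
    print(most_similar("blue", projects))  # green
    print(most_anomalous(projects))        # red
```

DTW cost grows quadratically with trajectory length and is sensitive to the scale of each axis, so in practice you would want to normalize each measure before comparing projects.
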

I look forward with great anticipation to the next time Cam drops by for a little chat.


Pavel Senin said...

Old joke just popped in my head:
A mathematician and an engineer attend a lecture by a physicist. The topic concerns Kaluza-Klein theories involving physical processes that occur in spaces with dimensions of 9, 12, and even higher. The mathematician sits there clearly enjoying the lecture, while the engineer frowns and looks generally confused and puzzled. By the end, the engineer has a terrible headache.
Afterward, the mathematician remarks on what a wonderful lecture it was.

The engineer asks, "How do you understand this stuff?"
Mathematician: "I just visualize the process."
Engineer: "How can you visualize something that occurs in 9-dimensional space?"
Mathematician: "Easy, first visualize it in N-dimensional space, then let N go to 9."
