Thursday, January 17, 2008

Hackystat Version 8

Hackystat Version 8 is now in public release. Hackystat is an open source framework for collection, analysis, visualization, interpretation, annotation, and dissemination of software development process and product data. This eighth major redesign of the system is intended to retain the advantages of previous versions while incorporating significant new capabilities:

RESTful web service architecture. The most significant change in Version 8 is its re-implementation as a set of web services communicating using REST principles. This architecture facilitates several of the features noted below, including scalability, openness, and platform/language neutrality.

Sensor-based. A primary means of data collection in Hackystat is through "sensors": small software plugins to development tools that unobtrusively collect and transmit low-level data to the Hackystat sensor data repository service called the "SensorBase".

Extensible. Hackystat can be extended to support new development tools by the creation of new sensors. It can also be extended to support new analyses by the creation of new services.

Open. All sensors and services communicate via HTTP PUT, GET, POST, and DELETE, according to RESTful web service principles. This "open" API has two advantages: (1) it makes it easy to extend the Hackystat Framework with new sensors and services; and (2) it makes it easy to integrate Hackystat sensors and services with other information services. Hackystat can participate as just one part of an "ecosystem" of information services for an organization.
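As a rough illustration of what this open API implies, the sketch below builds (but does not send) an HTTP PUT request of the kind a sensor might issue. The host name, URL layout, and XML payload are hypothetical placeholders chosen for this example, not the actual SensorBase API.

```python
from urllib.request import Request

# Hypothetical host for illustration only; not a real SensorBase address.
SENSORBASE_HOST = "http://sensorbase.example.org"


def build_sensor_put(user: str, timestamp: str, payload_xml: str) -> Request:
    """Build an HTTP PUT request carrying one sensor data instance."""
    # Assumed URL layout: one resource per user and timestamp.
    url = f"{SENSORBASE_HOST}/sensordata/{user}/{timestamp}"
    return Request(
        url,
        data=payload_xml.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="PUT",
    )


# Build the request without sending it, then inspect the verb and target.
request = build_sensor_put(
    "alice",
    "2008-01-17T10:00:00",
    "<SensorData><Tool>Emacs</Tool></SensorData>",
)
print(request.get_method(), request.full_url)
```

Because the exchange is plain HTTP, the same request could be produced by any language with an HTTP client library, which is what makes it straightforward to add new sensors and services.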

High performance. The default version of the SensorBase uses an embedded Derby RDBMS for its back-end data store. Initial performance evaluation of this repository, in combination with our multi-threaded client-side SensorShell, has been quite encouraging: we have achieved sustained transmission rates of approximately 1.2 million sensor data instances per hour. The SensorBase is designed to allow "pluggable" back-end data stores. One organization, for example, is using Microsoft SQL Server as the back-end data store.
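To put the hourly figure in more familiar terms, the short calculation below converts it to a per-second rate. It uses only the approximate number reported above. The per-instance time is an average over the whole multi-threaded pipeline, not the latency of any single request.

```python
# Convert the reported hourly rate into per-second and per-instance figures.
INSTANCES_PER_HOUR = 1_200_000  # approximate figure reported above
SECONDS_PER_HOUR = 3600

instances_per_second = INSTANCES_PER_HOUR / SECONDS_PER_HOUR
milliseconds_per_instance = 1000 / instances_per_second

print(f"{instances_per_second:.0f} instances/second")
print(f"{milliseconds_per_instance:.0f} ms per instance on average")
```

That works out to roughly 333 sensor data instances per second, or about 3 milliseconds per instance on average.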

Scalable. A natural outcome of a web service architecture is scalability: one can distribute services across multiple servers or aggregate them on a single server, depending upon load and resource availability. Hackystat is also scalable because each organization can run its own local SensorBase, or even multiple SensorBases if required. Finally, Hackystat can exploit HTTP caching as yet another scalability mechanism.
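One common form of HTTP caching is the conditional GET, in which a client revalidates a stored response using its ETag and the server replies 304 Not Modified when nothing has changed. The sketch below illustrates that general technique. It is an assumption that a client of Hackystat services would cache this way; the class, the stand-in server function, and the URL are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A fetch function takes (url, etag_or_None) and returns (status, etag, body).
Fetch = Callable[[str, Optional[str]], Tuple[int, Optional[str], bytes]]


class ETagCache:
    """Tiny client-side cache that revalidates entries with conditional GETs."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch
        self._entries: Dict[str, Tuple[str, bytes]] = {}

    def get(self, url: str) -> bytes:
        cached = self._entries.get(url)
        known_etag = cached[0] if cached else None
        status, etag, body = self._fetch(url, known_etag)
        if status == 304 and cached:
            return cached[1]  # server confirmed the stored copy is current
        if etag is not None:
            self._entries[url] = (etag, body)
        return body


statuses: List[int] = []  # records what the stand-in server answered


def fake_server(url: str, if_none_match: Optional[str]):
    """Hypothetical stand-in for a real service with one unchanging resource."""
    if if_none_match == '"v1"':
        statuses.append(304)
        return 304, '"v1"', b""  # no body is resent
    statuses.append(200)
    return 200, '"v1"', b"<analysis/>"


cache = ETagCache(fake_server)
first = cache.get("http://service.example.org/analysis/1")
second = cache.get("http://service.example.org/analysis/1")
print(statuses, first == second)
```

The second request transfers no body, so repeated queries for unchanged data cost the server very little.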

Secure. While Hackystat maintains a "public" SensorBase and associated services for use by the community, we expect that most organizations adopting Hackystat will choose to install and run the SensorBase and associated services locally and internally. This facilitates data security and privacy for organizations that do not want sensitive product or process information to go beyond their corporate firewalls.

Platform and language neutrality. Hackystat's implementation as a set of RESTful web services makes it language and platform neutral. For example, a sensor implemented in .NET and running on Windows might send information to a SensorBase written in Java running on a Macintosh, which is queried by a user interface written as a Ruby on Rails web application hosted on a Linux machine.

Open Source. Hackystat is hosted at Google Project Hosting, and distributed among approximately a dozen individual projects. The "umbrella" Hackystat project includes a Component Directory page with links to all of the related subprojects. Since most subprojects correspond to independent Hackystat services, they are typically free to choose their own open source license, though most have chosen the GNU GPL v2.

Out-of-the-box support for process and product data collection and analysis. Standard Hackystat includes a variety of process and product data collection and analysis capabilities, including: editor events and developer editing time, coverage, unit test invocations, build invocations, code issues discovered through static analysis tools, size metrics, complexity metrics, churn, and commits. Of course, the Open API makes it possible to extend this list with more.

When we began work on Hackystat in 2001, we thought of it primarily as a software metrics framework. Seven years later, we find that vision limiting, because it tends to focus one on the collection and display of numbers. Our vision for Hackystat now is broader: we believe that the collection and display of numbers is just the first step in an ongoing process of collaborative sense-making within a software development organization. An organization needs numbers, but it also needs ways to get those numbers to the right people at the right time. More importantly, it needs ways to incrementally interpret, re-interpret, and annotate those numbers over time to build up a collective consensus as to their meaning and implications for the organization. Our goal for Hackystat Version 8 is to be an effective infrastructure for participation in the broader knowledge gathering and refinement processes of an organization, or even the software development community as a whole. If successful, it can play a role in creating new mechanisms for improving the collective intelligence of a software development group.