Thursday, March 29, 2007

Hackystat UI: Swivel Google Gadgets

Swivel is a site where users can upload data sets and combine contributed data sets in various ways.

What I discovered today is their interface to the Google Home Page via Google Gadgets. I think this is a nice example of how simple it could be to provide a Version 8 Hackystat user interface via Google Gadgets.

Check it out here.

Wednesday, March 28, 2007

REST and web services

Some useful links to understand Representational State Transfer:

REST seems like an appropriate architectural style for the Version 8 web service component.

The framework I am most interested in evaluating for Java-based REST components is Restlet.

Tuesday, March 27, 2007

Hackystat UI: Wesabe and Social Software Metrics

Robert recently pointed me to Wesabe, which is a social networking site focusing on personal finances. This is an interesting site to compare/contrast with Hackystat, since it:
  • Deals with numbers and "metrics".
  • Requires members to share aspects of very personal information (finances) in order to exploit the potential of social networks.
Their help guide is in the form of YouTube videos, which is a little weird (or maybe the wave of the future). I will show some blurry screen shots to illustrate some of the interesting aspects of this tool. This first one shows the top-level organization of your Wesabe account, which has three tabs: Accounts, Tips, and Goals.

Accounts basically corresponds to your "raw sensor data" in Hackystat. In Wesabe, you are expected to upload your bank and credit card information.

Tips correspond to information supplied by other users based upon analysis of your account data. The idea is that the raw financial data is parsed to find out what you spend money on.

For example, if you have gas charges, then you will be hooked up with Tips on how to save money on gas. They use a keyword-based mechanism to hook together account data with tip data.

The tips could be generic (don't buy premium gas) or more specific (Don't buy gas from the gas station you're going to; they are a rip-off).

Here's a screen shot of a drill-down into an account, along with the tips and keywords associated with it. Often, you will need to manually annotate your raw financial data in order for Wesabe to start to work its magic on it. You can also see from this screen shot that individual financial items can be rated, and you can also see whether other Wesabeans have recorded a similar kind of purchase.

While tips are a kind of "bottom up" mechanism for producing "actionable" information from your raw account data, goals are more of a top-down approach, in which you first specify your high level goal, and then you get hooked up with other users interested in the same approach.

In Wesabe, it seems that the main focus is to direct you into existing discussion forums rather than explicitly connect you to your financial data. For example, a goal would be something like "Start a College Savings fund for my kids."

So, how does this all relate to Hackystat? I think there are some intriguing possibilities. First, Hackystat currently allows data to be "shared" only within the context of a Project---if there are multiple members of a Project, then they can potentially see each other's data. Wesabe illustrates how you might think about "sharing" on a more global level. The idea is that you don't share the actual financial information: no one knows where you shopped or how much you spent, but via the social bookmarking mechanism, the system can hook you up with "tips" (specific actionable items) or "goals" (a community of people with the same intentions).

To explore how this might work, let's imagine some possible "Tips" from the realm of Java software development:
  • How to convert from Java 1.4 to Generics in Java 5 (See my previous blog posting on this.)
  • Proper use of concurrency mechanisms.
  • Diagnosing a null pointer problem.
Hmm. These tips all seem to require more context than is typically provided by Hackystat data. One could imagine a sensor data type that provides data on the import statements associated with a file you are editing. That would give some insight into the kinds of libraries you are using, which might enable you to be hooked up with helpful tips. Another sensor data type might provide the stack trace and error message associated with a thrown exception.

Now let's think of possible "Goals":
  • Reduce the number of daily build failures.
  • Reduce the time required for running all unit tests.
  • Improve the quality of code.
  • Improve the scalability of the database.
Some of these might be inferable from the kinds of telemetry charts you are monitoring, for example.

In any case, Wesabe indicates an interesting research direction for Hackystat: create the capability for users to add keywords to their data, and then process these keywords as a way to hook users with common interests and mutually useful skills with each other.
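The keyword idea can be sketched in a few lines of Java. This is only an illustration under assumed names: KeywordMatcher, addTip, and tipsFor are hypothetical and are not part of Hackystat or Wesabe. Tips are filed under keywords, users attach keywords to their own data, and the matcher returns the tips whose keywords overlap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: hooks keyword-tagged data up with keyword-tagged tips. */
class KeywordMatcher {
  /** Maps a lower-cased keyword to the tips filed under it. */
  private final Map<String, List<String>> tipsByKeyword = new HashMap<String, List<String>>();

  /** Files a tip under each of the given keywords. */
  void addTip(String tip, String... keywords) {
    for (String keyword : keywords) {
      String key = keyword.toLowerCase(Locale.ROOT);
      List<String> tips = tipsByKeyword.get(key);
      if (tips == null) {
        tips = new ArrayList<String>();
        tipsByKeyword.put(key, tips);
      }
      tips.add(tip);
    }
  }

  /** Returns the tips matching any keyword a user attached to their data, without duplicates. */
  Set<String> tipsFor(String... userKeywords) {
    Set<String> matches = new LinkedHashSet<String>();
    for (String keyword : userKeywords) {
      List<String> tips = tipsByKeyword.get(keyword.toLowerCase(Locale.ROOT));
      if (tips != null) {
        matches.addAll(tips);
      }
    }
    return matches;
  }
}

class KeywordMatcherDemo {
  public static void main(String[] args) {
    KeywordMatcher matcher = new KeywordMatcher();
    matcher.addTip("Converting raw collections to generics", "generics", "java5");
    matcher.addTip("Diagnosing a null pointer problem", "NullPointerException");
    // Only the user's keywords are consulted; the raw data itself is never shared.
    System.out.println(matcher.tipsFor("java5", "nullpointerexception"));
  }
}
```

The same lookup could connect users to each other instead of to tips by storing user names, rather than tip text, under each keyword.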

Hackystat UI: Telemetry and Alexa charts

Alexa is a site that provides information relating to site traffic. Hongbing sent out a link recently to this site with a query as to how they could produce PNG charts so quickly. I assume that with enough CPU and network bandwidth, anything is possible. What I was personally struck by is their user interface, which rather elegantly supports a lot of the features we want from telemetry. Consider the following screen image from their site, which I have annotated with a 1, 2, 3, and 4.

UI Feature (1) is the tabs, which provide various perspectives on the set of sites. From the Telemetry perspective, this is analogous to a set of related Charts. Thus, what they've done is provided the equivalent of the Telemetry Report interface, but in a much nicer package. Instead of scrolling down through an endless series of charts, you click on a tab to see the related chart. Much, much nicer.

UI Feature (2) is the "Range" selection. This is analogous to our "Day", "Week", "Month" interval selection mechanism. While it is not as flexible as ours, it provides easier access to common interval requests: Last 7 days, Last 1 Month, Last 3 months, etc.

UI Feature (3) is the "See Traffic Details". This is analogous to our "Daily Project Summary" drilldown (or maybe a "Project Status To Date" analysis).

UI Feature (4) is the ability to easily add and subtract different trend lines. This is interesting when translated to the Telemetry domain, because it could be interpreted in one of two different ways: (a) add/subtract one or more telemetry streams, or (b) add/subtract one or more Projects. Indeed, we might want to think of providing both abilities: you could specify what telemetry streams you want to display, and then this set of telemetry streams would be displayed for each Project you specify. Thus the number of lines appearing on the chart would be the number of streams times the number of projects. In most cases, you will probably want to have one stream and multiple Projects, or multiple streams for one Project.
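The "streams times projects" rule can be made concrete with a small sketch. ChartLines and lineLabels are hypothetical names, as are the stream and Project names; nothing here is existing Telemetry code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: one chart line per (telemetry stream, Project) pair. */
class ChartLines {
  /** Returns a label for every line the chart would draw. */
  static List<String> lineLabels(List<String> streams, List<String> projects) {
    List<String> labels = new ArrayList<String>();
    for (String project : projects) {
      for (String stream : streams) {
        labels.add(project + ": " + stream);
      }
    }
    return labels;
  }

  public static void main(String[] args) {
    List<String> streams = Arrays.asList("Coverage", "BuildFailures");
    List<String> projects = Arrays.asList("ProjectA", "ProjectB", "ProjectC");
    // Two streams and three Projects produce six lines on the chart.
    System.out.println(lineLabels(streams, projects).size());
  }
}
```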

Monday, March 26, 2007

Software Development Communication Media

As I have maintained this online engineering log over the past couple of weeks, I have started to think more generally about the types of media used in a software development team to communicate and coordinate activities:

(1) Requirements and Design documents. These are relatively static documents, providing a high-level perspective. Relatively non-interactive.

(2) Mailing lists. Provide a forum for threaded communication. Generally ASCII oriented. Generally interactive.

(3) JavaDocs. Generated from the code itself. Communicates API-level information. Non-interactive. Context-sensitive with the code.

(4) Engineering logs. Semi-interactive way for an individual developer to document design issues, questions, and so forth. Others could potentially comment on log entries.

(5) CM Commit messages. Provides a record of the changes made to a set of sources.

(6) Issues. Provides a decomposition of the high level requirements/design documents into a set of tasks. Also records bugs found.

(7) IM. Generally non-persistent, highly interactive means for developers to get immediate, synchronous feedback and help.

(8) Face to face meetings.

Recently, I was thinking about an idea that I wanted to discuss with the Hackystat software development team, and I was unsure of how to do it:
  • Send it as an email to the developer's mailing list?
  • Write about it in my online engineering log?
  • Write about it in my online engineering log, then send a link to that entry in an email to the developer's mailing list?

Wednesday, March 21, 2007

Java 5 Conversion Notes

Having finished hackyCore_Kernel, my basic strategy for updating Hackystat code to 1.5 is currently the following:

(0) SVN update, then 'ant -q freshStart all.junit'. Make sure the system isn't busted before you start busting on it. (Be sure to configure the file to include the modules you will be working on.)

(1) Use Eclipse to identify a class containing at least one warning.

(2) Fix instance variables. Navigate to that class, then go to the top of the file and check for any collection classes as instance variables. If present, add type information. For example,
private static TreeMap numOfPeriodsMap = null;
becomes:
private static TreeMap<String, String> numOfPeriodsMap = null;
Note that this often requires some hunting through the file to determine the kinds of objects being added to the collection.

(3) Fix collection references. Continue through source code, updating references to collection classes to include type information. For example,
numOfPeriodsMap = new TreeMap(new StringIntegerComparator());
becomes:
numOfPeriodsMap = new TreeMap<String, String>(new StringIntegerComparator());

(4) Update comparators to include type information. For example,
public class StringIntegerComparator implements Comparator {
public int compare(Object o1, Object o2) {
becomes:
public class StringIntegerComparator implements Comparator<String> {
public int compare(String o1, String o2) {

(5) Update method signatures to include type information. For example,
public static TreeMap getYearOptions() {
becomes:
public static TreeMap<String, String> getYearOptions() {

(6) Remove occurrences of "old style" for loops. For example,
Set analysisNames = manager.getAnalysisNames();
for (Iterator i = analysisNames.iterator(); i.hasNext();) {
String analysisName = (String) i.next();
String enabled = request.getParameter(analysisName);
becomes:
Set<String> analysisNames = manager.getAnalysisNames();
for (String analysisName : analysisNames) {
String enabled = request.getParameter(analysisName);
In some situations, it doesn't make sense to update them. For example, I've seen loops where the next() method was called twice in each body (the list contained "pairs" of objects that were operated on two at a time). In this case, you must leave it as an "old style" loop.

(7) Implement Iterable<T> if necessary. In some cases, to accomplish (6) you must have a class implement "Iterable". For example, the SdtManager class should enable you to iterate across all instances of SensorDataTypes using the following for/in loop:
for (SensorDataType sdt : SdtManager.getInstance())
To accomplish that, the SdtManager class had to be changed from:
public class SdtManager {
public Iterator iterator() {
to:
public class SdtManager implements Iterable<SensorDataType> {
public Iterator<SensorDataType> iterator() {

(8) Remove vestigial casts. After adding in the type information, you will hopefully be able to remove casts.

(9) Remove vestigial imports. When you're all done with a class, you will hopefully be able to remove some imports. For example:
import java.util.Iterator;

(10) Use @SuppressWarnings for the serialVersionUID warning. There's just no reason to add this field to Hackystat code; these classes will not be serialized.

(11) Continue until Eclipse reports no warnings for this class.

Note that sometimes this strategy requires working on several classes at once when they are interdependent.

(12) 'ant -q freshStart all.junit', then SVN commit. Be sure the system is OK, then commit your changes du jour. I found that my Java 5 updates would sometimes create Checkstyle errors, so be sure to do a 'freshStart'. (Double check that the file includes the modules you have been working on.)
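Steps (2) through (8) can be seen together in one small, self-contained class. This is a sketch rather than actual Hackystat code: YearOptions is a hypothetical class, and the body of StringIntegerComparator is an assumption (the real class appears above only by its signature), taken here to order strings by their integer value.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

/** Assumed behavior: orders numeric strings by integer value rather than alphabetically. */
class StringIntegerComparator implements Comparator<String> {
  public int compare(String o1, String o2) {
    return Integer.valueOf(o1).compareTo(Integer.valueOf(o2));
  }
}

/** Hypothetical class showing typed fields, constructors, signatures, and for/in loops. */
class YearOptions {
  // Steps (2) and (3): the field and the constructor call both carry type information.
  private static TreeMap<String, String> yearOptions =
      new TreeMap<String, String>(new StringIntegerComparator());

  static {
    yearOptions.put("2007", "Year 2007");
    yearOptions.put("987", "Year 987");
  }

  // Step (5): the return type is parameterized.
  public static TreeMap<String, String> getYearOptions() {
    return yearOptions;
  }

  public static void main(String[] args) {
    // Steps (6) and (8): the for/in loop needs neither an Iterator nor a cast.
    for (Map.Entry<String, String> entry : getYearOptions().entrySet()) {
      System.out.println(entry.getKey() + " -> " + entry.getValue());
    }
  }
}
```

Because the comparator is typed as Comparator<String>, the compiler rejects any attempt to use it with a map whose keys are not strings, which is exactly the kind of error the raw version would only reveal at run time.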

Also, I'm cleaning up documentation, removing the @version tag that is a relic of the CVS days, etc. as I work on the class. As long as I'm touching it, I might as well make the JavaDocs better and fix any coding bogosities I encounter.

Monday, March 19, 2007

Scalability and StringListCodec

Now that I'm worrying about Large Scale Hackystat applications, I've run across a scalability issue in StringListCodec. It currently hardwires the maximum string size as 99,999 and the maximum number of strings as 9,999.

These translate to constraints on (a) the size of a sensor data instance and (b) the number of sensor data instances that can be sent in any individual Hackystat SOAP transmission.

Fortunately, this is easy to fix: these are constants that could potentially be overridden, for example, by a property value. The cost of increasing them is an increase in the "fixed cost" of a sensor data transmission, due to the way StringListCodec works.
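To make the trade-off concrete, here is a minimal sketch of a fixed-width, length-prefixed encoding. It is an assumption about how such a codec might work, not the actual StringListCodec implementation: FixedWidthCodec, the property names codec.countDigits and codec.lengthDigits, and the wire layout are all hypothetical. The defaults of 4 and 5 digits correspond to the limits of 9,999 strings and 99,999 characters mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical fixed-width codec; NOT the real StringListCodec. */
class FixedWidthCodec {
  private final int countDigits;   // digits reserved for the number of strings
  private final int lengthDigits;  // digits reserved for each string's length

  FixedWidthCodec(int countDigits, int lengthDigits) {
    this.countDigits = countDigits;
    this.lengthDigits = lengthDigits;
  }

  /** Reads the widths from (hypothetical) system properties, defaulting to 4 and 5. */
  static FixedWidthCodec fromProperties() {
    return new FixedWidthCodec(Integer.getInteger("codec.countDigits", 4),
        Integer.getInteger("codec.lengthDigits", 5));
  }

  /** Largest value that fits in the given number of decimal digits. */
  private static long maxValue(int digits) {
    long max = 1;
    for (int i = 0; i < digits; i++) {
      max *= 10;
    }
    return max - 1;
  }

  /** Encodes as: zero-padded count, then zero-padded length plus content for each string. */
  String encode(List<String> strings) {
    if (strings.size() > maxValue(countDigits)) {
      throw new IllegalArgumentException("Too many strings: " + strings.size());
    }
    StringBuilder buffer = new StringBuilder();
    buffer.append(String.format("%0" + countDigits + "d", strings.size()));
    for (String s : strings) {
      if (s.length() > maxValue(lengthDigits)) {
        throw new IllegalArgumentException("String too long: " + s.length());
      }
      buffer.append(String.format("%0" + lengthDigits + "d", s.length())).append(s);
    }
    return buffer.toString();
  }

  /** Reverses encode(), assuming the same digit widths were used. */
  List<String> decode(String encoded) {
    List<String> strings = new ArrayList<String>();
    int count = Integer.parseInt(encoded.substring(0, countDigits));
    int position = countDigits;
    for (int i = 0; i < count; i++) {
      int length = Integer.parseInt(encoded.substring(position, position + lengthDigits));
      position += lengthDigits;
      strings.add(encoded.substring(position, position + length));
      position += length;
    }
    return strings;
  }
}
```

Under this assumed layout, every extra digit of headroom adds one character per transmission (for the count) or one character per string (for the length), whether or not the larger limit is ever needed. That is one way the "fixed cost" could grow.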

Friday, March 16, 2007

Java 5 Migration

Frustrations du jour:
  • serialVersionUID. Eclipse generates a warning whenever a serializable class (such as any class that inherits from Exception) doesn't define this field. I've decided to add an @SuppressWarnings("serial"), but I'm not sure if this is the right decision.
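For reference, the annotation goes on the class declaration. The sketch below uses a hypothetical exception class, SensorDataException, purely for illustration. The alternative would be to declare a private static final long serialVersionUID field in each such class.

```java
/** Hypothetical exception class; the annotation silences the missing serialVersionUID warning. */
@SuppressWarnings("serial")
class SensorDataException extends Exception {
  SensorDataException(String message) {
    super(message);
  }
}

class SensorDataExceptionDemo {
  public static void main(String[] args) {
    try {
      throw new SensorDataException("Malformed sensor data entry");
    }
    catch (SensorDataException e) {
      System.out.println(e.getMessage());
    }
  }
}
```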

Thursday, March 15, 2007

To Do

  • Remove obsolete configurations from build.
  • Fix Hackystat HPC dependencies
  • Update Version 8 document to include scalability section

Wednesday, March 14, 2007

Java 5 migration

Now beginning the great Java 5 update of the Core subsystem.

Used Eclipse and set the compiler warnings back to 'default'. Got 587 warnings.

HackyCore_Telemetry is going to be a problem since it has JavaCC generated code. I don't know if I can disable Eclipse warnings on just a set of packages. I know I can individually change the warnings settings for a single Project.

Decided to work on one module at a time. Now fiddling with HackyCore_Build.

Here's an issue: I would like to write the following code:

List<Element> propertyList = project.getChildren("property");

However, project is an instance of a JDOM class, and JDOM is not generic, so I get the following warning:

Type safety: The expression of type List needs unchecked conversion to conform to List<Element>

That kind of sucks. Either propertyList is a raw collection class (and I have to do a class cast when iterating through it) or I have this warning. Here are my options:

I'm going to go with the @SuppressWarnings.
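One way to keep the suppression from spreading is to confine the unchecked conversion to a single small method. The sketch below avoids depending on JDOM by using a hypothetical stand-in, LegacyNode, whose getChildren method returns a raw List; PropertyReader and propertyNames are likewise made-up names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a pre-generics library class that returns raw collections. */
class LegacyNode {
  /** Returns a raw List, as pre-Java 5 APIs do. */
  @SuppressWarnings({"rawtypes", "unchecked"})
  List getChildren(String name) {
    List children = new ArrayList();
    children.add(name + "-1");
    children.add(name + "-2");
    return children;
  }
}

class PropertyReader {
  /** Confines the unchecked conversion, and its suppression, to this one method. */
  @SuppressWarnings("unchecked")
  static List<String> propertyNames(LegacyNode project) {
    List<String> propertyList = project.getChildren("property");
    return propertyList;
  }

  public static void main(String[] args) {
    // Callers get a typed list and need no casts or annotations of their own.
    for (String name : propertyNames(new LegacyNode())) {
      System.out.println(name);
    }
  }
}
```

The annotation does not make the conversion safe; it only records that the element type was checked by hand. If the library ever returned something else, the failure would surface as a ClassCastException in the calling loop.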

Online Engineering Logs

A traditional engineering log (EL) is a notebook. Developers maintain an EL to record information that facilitates software development, including "to do" lists, emergent designs, problems they are encountering, and rationales for implementation decisions. An EL serves as a kind of "offline memory" that helps engineers in several ways:
  • It helps them to quickly re-establish what they were working on at the start of each day, or if they have been interrupted from the project for multiple days.
  • It provides a medium in which to work out issues and questions they are having during development.
  • It supports management of the many 'micro-tasks' that emerge during development. By maintaining a "to do" list, the engineer can keep track of things that need to be done later without interrupting the current task.
For over 10 years, I have kept a traditional engineering log in a series of notebooks. This blog represents an experiment in moving my private, personal, offline EL into a shared, public, online setting. I am interested in understanding what the trade-offs are between private offline ELs and public online ELs.

Pros and Cons:

Advantages of Online Engineering Logs:
  • People can comment on postings. So, you can indicate an issue you're having in your EL and someone might help you with it.
  • By tagging a post with the issue ID, you can search the online EL later to recover design rationale information.
  • Other developers in your workgroup can subscribe to your blog in order to keep track of what you're doing without interrupting you.
  • It is easy to include images.
  • It is easy to include links.
Disadvantages of Online Engineering Logs:
  • There is a huge privacy hit. It is definitely way different to be writing in a public forum vs. your own notebook where no one is going to see it. I am interested to see whether this is ultimately positive or negative.
  • You have to be online to manipulate them. I will perhaps be writing things on paper and transferring to this blog?
  • At first, you spend a lot of time fooling around with formatting.
  • It is hard to doodle.

Adapting the Blog medium to Engineering Logs:
  • It's not clear that you want to maintain strict timestamps. For example, I want to maintain this list within this entry over time and edit it repeatedly. It would not make sense to build up this list by scattering it across all of the days that I came up with items. I have edited this entry repeatedly over the past several weeks, which seems perfectly appropriate. This seems quite counter to the conventional wisdom for maintaining engineering notebooks, where timestamping your thoughts is a critical feature.
  • Personal vs. Project Engineering logs. One could imagine keeping a single blog for a project, with multiple authors, or having each author keep their own blog and use tags to indicate the project. I am not at all sure what the pros and cons of these approaches are.
Related links:

Engineering notebooks are related to engineering logs but are more focussed on patent protection:

Some example online engineering logs:

None of these seem to fully exploit the possibilities of modern blogging infrastructure for supporting online engineering logs.