PMJ Engineering Log

Tuesday, August 28, 2007

Grid Computing, Java, and Hackystat

I just finished watching a really interesting screencast called "Grid Application in 15 minutes" that features GridGain, a new open source grid computing framework in Java. See their home page for the link to the screencast.

Things I found interesting while watching the screencast:

It uses some advanced Java features (Generics, Annotations, AOP) to dramatically reduce the number of lines of code required to grid-enable a conventional application.
It is a nice example of how to use the Eclipse framework to maximize the amount of code that Eclipse writes for you and minimize the amount that you have to type yourself.

I think there are some really interesting opportunities in Hackystat for grid computing. Many computations related to DailyProjectData and Telemetry (for example) are "embarrassingly parallel" and GridGain seems like the shortest path to exploiting this domain attribute.
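To make "embarrassingly parallel" concrete, here is a minimal sketch of the shape of such a computation. It is only an illustration under loud assumptions: the class and method names are hypothetical, the "analysis" is a trivial sum, and it uses plain Java parallel streams on one machine rather than GridGain or any real Hackystat API. The point is simply that each day's summary depends on that day's data alone, so the work can be split across workers with no coordination.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: none of these names come from Hackystat or GridGain.
class DailySummaries {

  // Stand-in for an expensive per-day analysis; here it just sums one day's event counts.
  static int summarize(List<Integer> eventCounts) {
    return eventCounts.stream().mapToInt(Integer::intValue).sum();
  }

  // Each day is summarized independently of every other day,
  // so the days can be processed in any order and on any worker.
  static Map<LocalDate, Integer> summarizeAll(Map<LocalDate, List<Integer>> rawByDay) {
    return rawByDay.entrySet().parallelStream()
        .collect(Collectors.toMap(Map.Entry::getKey, e -> summarize(e.getValue())));
  }

  public static void main(String[] args) {
    Map<LocalDate, List<Integer>> raw = Map.of(
        LocalDate.of(2007, 8, 27), List.of(3, 4, 5),
        LocalDate.of(2007, 8, 28), List.of(10, 1));
    System.out.println(summarizeAll(raw));
  }
}
```

A grid framework would presumably distribute the same per-day tasks across machines instead of threads; the independence of the tasks is what makes either approach straightforward.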

Thursday, August 23, 2007

Project Proprioception

In the latest issue of Wired Magazine, there is an interesting article in defense of Twitter. One thing the author says is that you can't really understand Twitter unless you actually use it (which might explain why I don't really understand Twitter).

He goes on to say that the benefit of Twitter is "social proprioception":

When I see that my friend Misha is "waiting at Genius Bar to send my MacBook to the shop," that's not much information. But when I get such granular updates every day for a month, I know a lot more about her. And when my four closest friends and worldmates send me dozens of updates a week for five months, I begin to develop an almost telepathic awareness of the people most important to me.

It's like proprioception, your body's ability to know where your limbs are. That subliminal sense of orientation is crucial for coordination: It keeps you from accidentally bumping into objects, and it makes possible amazing feats of balance and dexterity.

Twitter and other constant-contact media create social proprioception. They give a group of people a sense of itself, making possible weird, fascinating feats of coordination.

Aha! That makes a lot more sense to me, and also suggests the following hypothesis:

Hackystat's fine-grained data collection capabilities can support "project proprioception": the ability for a group of developers to have "a sense of themselves" within a given software development project.

I think that DevEvents and Builds and so forth support a certain level of project proprioception without any further interaction with the developer. But, what if Hackystat had a kind of "Twitter sensor", in which developers could post small nuggets of information about what they were thinking about or struggling with that could be combined with the DevEvents:

"Trying to figure out the JAXB newInstance API"
"WTF is with this RunTime Exception?"
"General housecleaning for the Milestone Release"
"Pair Programming With Pavel"
"Reviewing the Ant Sensor"
"Upgrading Tomcat"

Now imagine these messages being combined with the other Hackystat DevEvents and being visualized using something like Simile/Timeline. Further, imagine the timeline being integrated into a widget with a near-real-time nature like the Sensor Data Viewer, such that you could see the HackyTwitter information along with occurrences of builds, tests, and commits scrolling by on a little window in the corner of your screen. Would this enable "weird, fascinating feats of coordination" within a software development project?
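As a rough sketch of what "combining" might mean, the following merges automatically collected events and hand-written developer notes into a single chronological list that a timeline widget could render. Everything here is an assumption for illustration: the `Entry` type, its fields, and the `merge` method are hypothetical and are not part of Hackystat's sensor data API.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: these types are illustrative, not Hackystat's sensor data API.
class ProjectTimeline {

  static final class Entry {
    final Instant timestamp;
    final String kind;  // e.g. "Build", "Commit", or "Note" for a developer's status message
    final String text;

    Entry(Instant timestamp, String kind, String text) {
      this.timestamp = timestamp;
      this.kind = kind;
      this.text = text;
    }

    @Override
    public String toString() {
      return timestamp + " [" + kind + "] " + text;
    }
  }

  // Merge sensor-collected events and developer notes into one list ordered by time.
  static List<Entry> merge(List<Entry> devEvents, List<Entry> notes) {
    List<Entry> timeline = new ArrayList<>(devEvents);
    timeline.addAll(notes);
    timeline.sort(Comparator.comparing((Entry e) -> e.timestamp));
    return timeline;
  }

  public static void main(String[] args) {
    List<Entry> events = List.of(
        new Entry(Instant.parse("2007-08-23T10:05:00Z"), "Build", "Build failed"),
        new Entry(Instant.parse("2007-08-23T10:20:00Z"), "Commit", "Fix sensor shell"));
    List<Entry> notes = List.of(
        new Entry(Instant.parse("2007-08-23T10:10:00Z"), "Note", "WTF is with this RunTime Exception?"));
    merge(events, notes).forEach(System.out::println);
  }
}
```

In this made-up example the note lands between the failed build and the fixing commit, which is exactly the kind of context the raw events alone would not show.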

Sounds cool to me.

Wednesday, August 8, 2007

Web application development

Here's a really nice screencast that compares web development in several different languages/frameworks (J2EE, Ruby on Rails, Zope/Plone, TurboGears, etc.).

<http://oodt.jpl.nasa.gov/better-web-app.mov>

A few of the things I found interesting:

Presentation style is quite different from the standard PowerPoint "title plus bullet list". I would love to evolve to this style for my lectures.
Provides evidence that we made the right choice for the new ICS website. :-)
One of the more compelling illustrations I've seen of the differences between Ruby on Rails and Java/J2EE for web development. RoR beats J2EE by a mile, but doesn't win overall.

It's fairly long but held my interest all of the way through.

If you want to learn how he puts these presentations together, see here.

Wednesday, July 11, 2007

Empirical Software Engineering and Web 3.0

I came across two interesting web pages today that started me thinking about empirical software engineering in general and Hackystat in particular with respect to the future of web technology.

The first page contains an interview with Tim Berners-Lee on the Semantic Web. In his response to the request to describe the Semantic Web in simple terms, he talks about the lack of interoperability between the data in your mailer, PDA calendar, phone, etc. and pages on the web. The idea of the Semantic Web, I guess, is to add sufficient semantic tagging to the Web to provide seamlessness between your "internal" data and the web's "external" data. So, for example, any web page containing a description of an event would contain enough tagging that you could, say, right click on the page and have the event added to your calendar.

There is a link on that page to another article by Nova Spivack on Web 3.0. It contains a visualization of the web's evolution.

To me, what's interesting about this is the transition we're now in between Web 2.0, which is primarily focused on user-generated, manual "tagging" of pages, and Web 3.0, where this kind of "external" tagging will be augmented by "internal" tagging that provides additional semantics about the content of the web document.

It seems to me that the combination of internal and external tagging can provide interesting new opportunities for empirical software engineering. Let's go back to Tim Berners-Lee's analogy for a second: it's easy to transpose this analogy to the domain of software development. Currently, a development project produces a wide range of artifacts: requirements documents, source code, documentation, test plans, test results, coverage, defect reports, and so forth. All of these evolve over time, all are related to each other, and I would claim that all use (if anything) a form of "external" tagging to show relationships. For example, a configuration management system enables a certain kind of "tagging" between artifacts which is temporal in nature. Some issue management systems, like Jira, will parse CVS commit messages looking for Issue IDs and use them to generate linkages between Issues and the related modifications to the code base.
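The commit-message linkage is simple enough to sketch. The following is a guess at the general technique, not Jira's actual implementation: it assumes issue IDs look like an uppercase project key, a hyphen, and a number (the "HACK" key below is made up), and it pulls every such ID out of a commit message with a regular expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of issue/commit linkage; not how any particular tracker implements it.
class IssueLinker {

  // Assumed ID format: uppercase project key, hyphen, number (e.g. "HACK-123").
  private static final Pattern ISSUE_ID = Pattern.compile("\\b[A-Z][A-Z0-9]+-\\d+\\b");

  // Return every issue ID mentioned in the commit message, in order of appearance.
  static List<String> extractIssueIds(String commitMessage) {
    List<String> ids = new ArrayList<>();
    Matcher matcher = ISSUE_ID.matcher(commitMessage);
    while (matcher.find()) {
      ids.add(matcher.group());
    }
    return ids;
  }

  public static void main(String[] args) {
    // prints [HACK-123, HACK-7]
    System.out.println(extractIssueIds("Fixes HACK-123 and HACK-7: null check in sensor shell"));
  }
}
```

Note that the link exists only because a developer typed the ID into the message, which is what makes this "external" tagging in the sense above.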

Nova Spivack adds a few other technologies to the Web 3.0 mix besides the Semantic Web and its "internal" tagging:

Ubiquitous connectivity
Software as service
Distributed computing
Open APIs
Open Data
Open Identity
Open Reputation
Autonomous Agents

The "Open" stuff is especially interesting to me in light of the recent emergence of "evaluation" sites for open source software such as Ohloh, SQO-OSS, and Coverity/Scan. Each of these sites is trying to address the issue of how to evaluate open source quality. Each of them is more-or-less trying to do it within the confines of Web 2.0.

Evaluation of open source software is an interesting focus for the application of Web 3.0 to empirical software engineering, because open source development is already fairly transparent and accessible to the research community, and also because a growing number of open source systems are becoming mission-critical to organizational and governmental infrastructure. The Coverity/Scan effort was financed by the Department of Homeland Security, for example.

Back to Hackystat. It seems to me that Hackystat sensors are, in some sense, an attempt to take a software engineering artifact (the sensor data "Resource" field, in Hackystat 8 terminology) and retrofit Web 3.0 semantics on top of it (the SensorDataType field being a simple example). The RESTful Hackystat 8 services are then a way to "republish" aspects of these artifacts in a Web 3.0 format (i.e., as Resources with a unique URI and an XML representation). What is currently lacking in Hackystat 8 is the ability to obtain a resource in an RDF representation rather than our home-grown XML, but that is a very small step from where we are now.

There is a lot more thinking I'd like to do on this topic (probably enough for an NSF proposal), but I need to stop this post now. So, I'll conclude with three research questions at the intersection of Web 3.0 and empirical software engineering:

Can Web 3.0 improve our ability to evaluate the quality/security/etc. of open source software development projects?
Can Web 3.0 improve our ability to create a credible representation of an open source programmer's skills?
Can Web 3.0 improve our ability to create autonomous agents that can provide more help in supporting the software development process?

Tuesday, June 19, 2007

Bile Blog, Google Project Hosting, and Download Deletion

First off, a pretty hilarious Bile Blog posting on Google Project Hosting.

Even better, one of the comments describes how to delete download files:

Click on the "Summary + Labels" link for the file you wish to delete.
Click on the "Click to edit download" link.
A "Delete" link will now appear in the toolbar.

Monday, May 28, 2007

Restful Resources

This past week, I've come across a couple of really useful resources for REST-style architectural development that I can recommend:

The first is the O'Reilly book "RESTful Web Services". I'm about halfway through and it has already illuminated some dark corners of RESTful web service design, such as:

When to use POST vs. PUT. (Use POST when the server is responsible for generating the URI of the associated resource; use PUT when the client is responsible for generating the URI).
Authentication.
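The POST vs. PUT rule can be sketched with the JDK's `java.net.http.HttpRequest` builder (Java 11 or later). This is only an illustration: the base URI, the resource ID, and the XML body are hypothetical and do not describe a real Hackystat endpoint. The requests are built but never sent, so nothing needs to be running to try it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical sketch of the POST vs. PUT rule; the endpoint and payload are made up.
class PostVsPut {

  // Assumed base URI of a collection resource; not a real service.
  static final String BASE = "http://localhost:9876/sensordata";

  // POST: the client addresses the collection and the server picks the new resource's URI.
  static HttpRequest createWithServerChosenUri(String xmlBody) {
    return HttpRequest.newBuilder(URI.create(BASE))
        .header("Content-Type", "application/xml")
        .POST(HttpRequest.BodyPublishers.ofString(xmlBody))
        .build();
  }

  // PUT: the client already knows (and therefore chooses) the URI of the resource.
  static HttpRequest createAtClientChosenUri(String id, String xmlBody) {
    return HttpRequest.newBuilder(URI.create(BASE + "/" + id))
        .header("Content-Type", "application/xml")
        .PUT(HttpRequest.BodyPublishers.ofString(xmlBody))
        .build();
  }

  public static void main(String[] args) {
    HttpRequest post = createWithServerChosenUri("<SensorData/>");
    HttpRequest put = createAtClientChosenUri("entry-42", "<SensorData/>");
    System.out.println(post.method() + " " + post.uri());
    System.out.println(put.method() + " " + put.uri());
  }
}
```

The only structural difference is where the identifier lives: with PUT it is in the URI the client sends, while with POST the client would typically learn the new URI from the server's response.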

The authors make a point of distinguishing between REST in general and REST when applied to web service design, for which they describe a set of concrete best practices they call "Resource Oriented Architecture". This is very nice, and reminds me of Effective Java, which provides a set of best practices for Java software development.

The second REST resource I would like to recommend is the "Poster" plugin for Firefox. Poster enables you to make GET, PUT, POST, and DELETE HTTP calls from within Firefox and see the results. It is a nice way to obtain a sanity check on what your web service is doing when you don't quite understand why your unit tests are failing.

Tuesday, May 22, 2007

Hackystat on Ohloh

I came across Ohloh recently, and decided to create Ohloh projects for Hackystat-6, Hackystat-7, and Hackystat-8. Ohloh is a kind of directory/evaluation service for Open Source projects that generates statistics by crawling the configuration management repository associated with the project. It also generates some pretty interesting data about individual committers.

There are a lot of things I found interesting about the Hackystat-7 Ohloh project:

The Hackystat development history is quite truncated and only goes back a year and a half (basically when we switched to Subversion). I consulted the FAQ, where I learned that if I also point Ohloh at our old CVS repository for Hackystat 6, it will end up double counting the code. Oh well. That's why there are three unconnected projects for the last three versions of Hackystat.
They calculate that the Hackystat-7 code base represented 65 person-years of effort and about $3.5M investment. I think that's rather low, but then again, they only had 18 months of data to look at. :-)
There is more XML than Java in Hackystat-7. That's a rather interesting insight into the documentation burden associated with that architecture. I hope we can reduce this in Hackystat-8.
The contributor analyses are very interesting as well; here's mine. This combines the stuff from all three Hackystat projects. I find the Simile/Timeline representation of my commit history particularly cool.

There are a number of interesting collaborative possibilities between Hackystat and Ohloh, which I will post about later. If you have your own ideas, I'm all ears.

Finally, it seems pretty clear from their URLs that they are using a RESTful web service architecture.

There are several other active CSDL open source projects that we could add to Ohloh: Jupiter, LOCC, SCLC.