Wednesday, July 11, 2007

Empirical Software Engineering and Web 3.0

I came across two interesting web pages today that started me thinking about empirical software engineering in general and Hackystat in particular with respect to the future of web technology.

The first page contains an interview with Tim Berners-Lee on the Semantic Web. In his response to the request to describe the Semantic Web in simple terms, he talks about the lack of interoperability between the data in your mailer, PDA calendar, phone, etc. and pages on the web. The idea of the Semantic Web, I guess, is to add sufficient semantic tagging to the Web to provide seamlessness between your "internal" data and the web's "external" data. So, for example, any web page containing a description of an event would contain enough tagging that you could, say, right click on the page and have the event added to your calendar.

There is a link on that page to another article by Nova Spivack on Web 3.0. It contains the following visualization of the web's evolution:

[Figure: Nova Spivack's visualization of the evolution of the web]

To me, what's interesting about this is the transition now underway between Web 2.0, which is primarily focused on user-generated, manual "tagging" of pages, and Web 3.0, in which this kind of "external" tagging will be augmented by "internal" tagging that provides additional semantics about the content of the web document.

It seems to me that the combination of internal and external tagging can provide interesting new opportunities for empirical software engineering. Let's go back to Tim Berners-Lee's analogy for a second: it's easy to transpose it to the domain of software development. Currently, a development project produces a wide range of artifacts: requirements documents, source code, documentation, test plans, test results, coverage, defect reports, and so forth. All of these evolve over time, all are related to each other, and I would claim that all use (if anything) a form of "external" tagging to show relationships. For example, a configuration management system enables a certain kind of "tagging" between artifacts that is temporal in nature. Some issue management systems, like Jira, will parse CVS commit messages looking for Issue IDs, and use them to generate linkages between Issues and the related modifications to the code base.
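To make that last example concrete, here is a minimal sketch of the kind of commit-message scanning such a system might do; the issue-key format and the commit message are invented for illustration:

  import java.util.ArrayList;
  import java.util.List;
  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  public class IssueLinkSketch {
    // Hypothetical issue-key format: an uppercase project key, a dash, a number.
    private static final Pattern ISSUE_KEY = Pattern.compile("[A-Z]+-\\d+");

    /** Returns the issue IDs mentioned in a commit message, in order. */
    public static List<String> extractIssueIds(String commitMessage) {
      List<String> ids = new ArrayList<String>();
      Matcher matcher = ISSUE_KEY.matcher(commitMessage);
      while (matcher.find()) {
        ids.add(matcher.group());
      }
      return ids;
    }

    public static void main(String[] args) {
      // Prints [HACK-42, HACK-17]; each commit can then be linked to its issues.
      System.out.println(extractIssueIds("Fix HACK-42; also touches HACK-17."));
    }
  }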

Nova Spivack adds a few other technologies to the Web 3.0 mix besides the Semantic Web and its "internal" tagging:
  • Ubiquitous connectivity
  • Software as service
  • Distributed computing
  • Open APIs
  • Open Data
  • Open Identity
  • Open Reputation
  • Autonomous Agents
The "Open" stuff is especially interesting to me in light of the recent emergence of "evaluation" sites for open source software such as Ohloh, SQO-OSS, and Coverity/Scan. Each of these sites is trying to address the issue of how to evaluate open source quality, and each is doing so more or less within the confines of Web 2.0.

Evaluation of open source software is an interesting focus for the application of Web 3.0 to empirical software engineering, both because open source development is already fairly transparent and accessible to the research community, and because increasing amounts of open source software are becoming mission-critical to organizational and governmental infrastructure. The Coverity/Scan effort was financed by the Department of Homeland Security, for example.

Back to Hackystat. It seems to me that Hackystat sensors are, in some sense, an attempt to take a software engineering artifact (the sensor data "Resource" field, in Hackystat 8 terminology) and retrofit Web 3.0 semantics on top of it (the SensorDataType field being a simple example). The RESTful Hackystat 8 services are then a way to "republish" aspects of these artifacts in a Web 3.0 format (i.e., as Resources with a unique URI and an XML representation). What is currently lacking in Hackystat 8 is the ability to obtain a resource in an RDF representation rather than our home-grown XML, but that is a very small step from where we are now.
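As a thought experiment, that small step could be exposed through ordinary HTTP content negotiation. A minimal sketch, assuming a hypothetical SensorBase resource URI (the SensorBase does not actually honor an RDF Accept header yet):

  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  import java.net.HttpURLConnection;
  import java.net.URL;

  public class RdfResourceSketch {
    public static void main(String[] args) throws Exception {
      // Hypothetical SensorBase resource URI.
      URL uri = new URL("http://localhost:9876/sensorbase/sensordata/joe@example.com");
      HttpURLConnection connection = (HttpURLConnection) uri.openConnection();
      // Ask for RDF instead of the default XML representation; a Web 3.0 aware
      // service would inspect this header and marshal the resource accordingly.
      connection.setRequestProperty("Accept", "application/rdf+xml");
      BufferedReader reader =
          new BufferedReader(new InputStreamReader(connection.getInputStream()));
      for (String line = reader.readLine(); line != null; line = reader.readLine()) {
        System.out.println(line);
      }
      reader.close();
    }
  }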

There is a lot more thinking I'd like to do on this topic (probably enough for an NSF proposal), but I need to wrap this post up. So, I'll conclude with three research questions at the intersection of Web 3.0 and empirical software engineering:

  • Can Web 3.0 improve our ability to evaluate the quality/security/etc. of open source software development projects?
  • Can Web 3.0 improve our ability to create a credible representation of an open source programmer's skills?
  • Can Web 3.0 improve our ability to create autonomous agents that can provide more help in supporting the software development process?

Tuesday, June 19, 2007

Bile Blog, Google Project Hosting, and Download Deletion

First off, a pretty hilarious Bile Blog posting on Google Project Hosting.

Even better, one of the comments describes how to delete download files:
  1. Click on the "Summary + Labels" link for the file you wish to delete.
  2. Click on the "Click to edit download" link.
  3. A "Delete" link will now appear in the toolbar.

Monday, May 28, 2007

Restful Resources

This past week, I've come across a couple of really useful resources for REST-style architectural development that I can recommend:

The first is the O'Reilly book "RESTful Web Services". I'm about halfway through, and it has already illuminated some dark corners of RESTful web service design, such as:
  • When to use POST vs. PUT. (Use POST when the server is responsible for generating the URI of the associated resource; use PUT when the client is responsible for generating the URI. See the sketch after this list.)
  • Authentication.
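To make the POST vs. PUT rule concrete, here is a minimal sketch using java.net.HttpURLConnection; the URIs are invented for illustration:

  import java.net.HttpURLConnection;
  import java.net.URL;

  public class PostVersusPutSketch {
    public static void main(String[] args) throws Exception {
      // PUT: the client has already decided the new resource's URI (hypothetical).
      URL putUrl = new URL("http://example.com/files/report.txt");
      HttpURLConnection put = (HttpURLConnection) putUrl.openConnection();
      put.setRequestMethod("PUT");
      put.setDoOutput(true);
      put.getOutputStream().write("file contents".getBytes());
      System.out.println(put.getResponseCode()); // 201 Created, if the service obliges.

      // POST: the client hands the representation to the collection resource
      // (hypothetical), and the server reports the URI it chose in Location.
      URL postUrl = new URL("http://example.com/files");
      HttpURLConnection post = (HttpURLConnection) postUrl.openConnection();
      post.setRequestMethod("POST");
      post.setDoOutput(true);
      post.getOutputStream().write("file contents".getBytes());
      System.out.println(post.getHeaderField("Location")); // the server-generated URI.
    }
  }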
The authors make a point of distinguishing between REST in general and REST as applied to web service design, for which they describe a set of concrete best practices they call "Resource-Oriented Architecture". This is very nice, and reminds me of Effective Java, which provides a set of best practices for Java software development.

The second REST resource I would like to recommend is the "Poster" plugin for Firefox. Poster enables you to make GET, PUT, POST, and DELETE HTTP calls from within Firefox and see the results. It is a nice way to sanity-check what your web service is doing when you don't quite understand why your unit tests are failing.

Tuesday, May 22, 2007

Hackystat on Ohloh

I came across Ohloh recently, and decided to create Ohloh projects for Hackystat-6, Hackystat-7, and Hackystat-8. Ohloh is a kind of directory/evaluation service for open source projects that generates statistics by crawling the configuration management repository associated with each project. It also generates some pretty interesting data about individual committers.

There are a number of things I found interesting about the Hackystat-7 Ohloh project:
  • The Hackystat development history is quite truncated and only goes back a year and a half (basically to when we switched to Subversion). I consulted the FAQ, where I learned that if I also point Ohloh at our old CVS repository for Hackystat 6, it will end up double counting the code. Oh well. That's why there are three unconnected projects for the last three versions of Hackystat.
  • They calculate that the Hackystat-7 code base represents 65 person-years of effort and about a $3.5M investment. I think that's rather low, but then again, they only had 18 months of data to look at. :-)
  • There is more XML than Java in Hackystat-7. That's a rather interesting insight into the documentation burden associated with that architecture. I hope we can reduce this in Hackystat-8.
  • The contributor analyses are very interesting as well; here's mine. This combines the data from all three Hackystat projects. I find the Simile/Timeline representation of my commit history particularly cool.
There are a number of interesting collaborative possibilities between Hackystat and Ohloh, which I will post about later. If you have your own ideas, I'm all ears.

Finally, it seems pretty clear from their URLs that they are using a RESTful web service architecture.

There are several other active CSDL open source projects that we could add to Ohloh: Jupiter, LOCC, SCLC.

Friday, May 11, 2007

Sample Restlet Application for Hackystat

I decided to get my feet wet with Restlet by building a small server with the following API:

http://localhost:9876/samplerestlet/file/{filename}

The idea is that it will retrieve and display {filename}, which is an instance of the "file" resource.

This was a nice way to wade into the Restlet framework: it's more than a Hello World app, it leaves out machinery we don't need (like Virtual Hosts), and it requires machinery we do need (URL-based dispatching).

To see what I did, check out the following 'samplerestlet' module from SVN:

svn://www.hackystat.org/csdl/samplerestlet

To build and run it:
  1. Download Restlet-1.0, unzip, and point a RESTLET_HOME env variable at it.
  2. Build the system with "ant jar"
  3. Run the result with "java -jar samplerestlet.jar"
This starts up an HTTP server that listens on port 9876.

Try retrieving the following in your browser: http://localhost:9876/samplerestlet/file/build.xml

Now look through the following three files, each only about 50 LOC:
  1. build.xml, which shows what's needed to build the executable jar file.
  2. Server.java, which creates the server and dispatches to a FileResource instance to handle URLs satisfying the file/{filename} pattern.
  3. FileResource.java, which handles those GET requests by reading the file from disk and returning a string representation of it. (A condensed sketch of the last two files appears below.)
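For flavor, here's a condensed sketch of the shape of those two classes. It is reconstructed from memory of the Restlet 1.0 API rather than copied from the repository, so treat names and signatures as approximate:

  import org.restlet.Application;
  import org.restlet.Component;
  import org.restlet.Restlet;
  import org.restlet.Router;
  import org.restlet.data.Protocol;

  public class Server {
    public static void main(String[] args) throws Exception {
      Component component = new Component();
      component.getServers().add(Protocol.HTTP, 9876);
      // Route file/{filename} URLs to FileResource; the {filename} part
      // becomes a request attribute that the resource can look up.
      Application application = new Application(component.getContext()) {
        @Override
        public Restlet createRoot() {
          Router router = new Router(getContext());
          router.attach("/file/{filename}", FileResource.class);
          return router;
        }
      };
      component.getDefaultHost().attach("/samplerestlet", application);
      component.start();
    }
  }

  import org.restlet.Context;
  import org.restlet.data.MediaType;
  import org.restlet.data.Request;
  import org.restlet.data.Response;
  import org.restlet.resource.Representation;
  import org.restlet.resource.Resource;
  import org.restlet.resource.StringRepresentation;
  import org.restlet.resource.Variant;

  public class FileResource extends Resource {
    private String fileName;

    public FileResource(Context context, Request request, Response response) {
      super(context, request, response);
      this.fileName = (String) request.getAttributes().get("filename");
      getVariants().add(new Variant(MediaType.TEXT_PLAIN));
    }

    @Override
    public Representation getRepresentation(Variant variant) {
      // The real version reads fileName from disk; elided here.
      return new StringRepresentation("contents of " + this.fileName);
    }
  }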
If this looks confusing, the Restlet Tutorial is a reasonable introduction. There's also a pretty good PowerPoint presentation at the Restlet Wiki Home Page that introduces both REST architectural design and the Restlet framework; it comes with some decent sample code as well.

Next step is to add this kind of URL processing to SensorBase.

Wednesday, May 9, 2007

How to browse HTML correctly from SVN repositories

I just committed a bunch of HTML files to SVN, then realized that they don't display as HTML when you browse the repository. After painfully reconstructing the solution, I figured it would be good to jot down a note on how to deal with this.

First, you need to fix the svn:mime-type property on all of your committed HTML files. To do this, use your SVN client to select all of the HTML files under SVN control, then set their svn:mime-type property to text/html, then commit these changes. That fixes the current files.
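If you prefer the command line to a GUI client, the same fix looks something like this (run from the directory containing the files; the wildcard is illustrative and relies on your shell to expand it):

  svn propset svn:mime-type text/html *.html
  svn commit -m "Set svn:mime-type to text/html on HTML files"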

To ensure that all of your current and future HTML files are committed from the get-go with the svn:mime-type property set to text/html, you have to use the SVN auto-props feature. That basically means editing the file called "config" in your per-user Subversion runtime configuration area (for example, ~/.subversion/config on Unix), and uncommenting the following line:

enable-auto-props = yes

Then you have to add a new line that looks like this:

*.html = svn:mime-type=text/html
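One detail that's easy to miss: these two lines live in different sections of the config file. The enable-auto-props line belongs under [miscellany], and the pattern line belongs under [auto-props], so the relevant portions end up looking like this:

  [miscellany]
  enable-auto-props = yes

  [auto-props]
  *.html = svn:mime-type=text/html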

Finally, you (potentially) need to instruct your SVN client to consider auto-props when doing its commits. For example, in SmartSVN, you have to go to Projects | Default Settings | Working Copy and check "Apply Auto-Props from SVN 'config' file to added files".

In TortoiseSVN, there is a "Setting" menu that allows you to edit the 'config' file in a similar manner.

Tuesday, May 8, 2007

JAXB for Hackystat for Dummies

I spent today working through the XML/Java conversion process for SensorBase resources, and it occurred to me near the end that a writeup of my struggles could significantly shorten the learning curve for others writing higher-level services that consume SensorBase data (such as the UI services being built by Alexey, Pavel, and David).

So, I did a quick writeup on the approach, in which I refer to a library jar file I have made available as the first SensorBase download.

After so many years of using JDOM, which was nice in its own way, it is great to move on to an even faster, simpler, and easier approach.
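To give a flavor of why JAXB feels simpler, here is a minimal sketch of the XML-to-Java round trip. The package name and the SensorData class are stand-ins for whatever the generated classes in the library jar are actually called:

  import java.io.File;
  import javax.xml.bind.JAXBContext;
  import javax.xml.bind.Marshaller;
  import javax.xml.bind.Unmarshaller;

  public class JaxbSketch {
    public static void main(String[] args) throws Exception {
      // Build a context from the package containing the JAXB-generated
      // classes (hypothetical package name; SensorData is one such class).
      JAXBContext context = JAXBContext.newInstance("org.hackystat.sensorbase.jaxb");

      // XML to Java: unmarshal a file into an instance of a generated class.
      Unmarshaller unmarshaller = context.createUnmarshaller();
      SensorData data = (SensorData) unmarshaller.unmarshal(new File("sensordata.xml"));

      // Java to XML: marshal the instance back out, pretty-printed.
      Marshaller marshaller = context.createMarshaller();
      marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
      marshaller.marshal(data, System.out);
    }
  }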