PMJ Engineering Log: May 2007

Monday, May 28, 2007

Restful Resources

This past week, I've come across a couple really useful resources for REST style architectural development that I can recommend:

The first is the O'Reilly book "Restful Web Services". I'm about halfway through and it has already illuminated some dark corners of Restful web service design, such as:

When to use POST vs. PUT. (Use POST when the server is responsible for generating the URI of the associated resource; use PUT when the client is responsible for generating the URI).
Authentication.

The authors make a point of distinguishing between REST in general and REST when applied to web service design, for which they describe a set of concrete best practices they call "Resource Oriented Architecture". This is very nice, and reminds me of Effective Java, which provides a set of best practices for Java software development.

The second REST resource I would like to recommend is the "Poster" plugin for FireFox. Poster enables you to make GET, PUT, POST, and DELETE http calls from within FireFox and see the results. It is a nice way to obtain a sanity check on what your web service is doing when you don't quite understand why your unit tests are failing.

Tuesday, May 22, 2007

Hackystat on Ohlo

I came across Ohlo recently, and decided to create Ohlo projects for Hackystat-6, Hackystat-7, and Hackystat-8. Ohlo is a kind of directory/evaluation service for Open Source projects that generates statistics by crawling the configuration management repository associated with the project. It also generates some pretty interesting data about individual committors.

There's a lot of things I found interesting about the Hackystat-7 Ohlo project:

The Hackystat development history is quite truncated and only goes back a year and a half (basically when we switched to Subversion). I consulted the FAQ, where I learned that if I also point Ohlo at our old CVS repository for Hackystat 6, it will end up double counting the code. Oh well. That's why there's three unconnected projects for the last three versions of Hackystat.
They calculate that the Hackystat-7 code base represented 65 person-years of effort and about $3.5M investment. I think that's rather low, but then again, they only had 18 months of data to look at. -)
There is more XML than Java in Hackystat-7. That's a rather interesting insight into the documentation burden associated with that architecture. I hope we can reduce this in Hackystat-8.
The contributor analyses are very interesting as well, here's mine. This combines together the stuff from all three Hackystat projects. I find the Simile/Timeline representation of my commit history particularly cool.

There are a number of interesting collaborative possibilities between Hackystat and Ohlo, which I will post about later. If you have your own ideas, I'm all ears.

Finally, it seems pretty clear from their URLs that they are using a RESTful web service architecture.

There are several other active CSDL open source projects that we could add to Ohloh: Jupiter, LOCC, SCLC.

Friday, May 11, 2007

Sample Restlet Application for Hackystat

I decided to get my feet wet with Restlet by building a small little server with the following API:

http://localhost:9876/samplerestlet/file/{filename}

The idea is that it will retrieve and display {filename}, which is an instance of the "file" resource.

This was a nice way to wade into the Restlet framework; more than a Hello World app, lacking stuff we don't need (like Virtual Hosts), and requiring stuff we do (URL-based dispatching).

To see what I did, check out the following 'samplerestlet' module from SVN:

svn://www.hackystat.org/csdl/samplerestlet

To build and run it:

Download Restlet-1.0, unzip, and point a RESTLET_HOME env variable at it.
Build the system with "ant jar"
Run the result with "java -jar samplerestlet.jar"

This starts up an HTTP server that listens on port 9876.

Try retrieving the following in your browser: http://localhost:9876/samplerestlet/file/build.xml

Now look through the following three files, each only about 50 LOC:

build.xml, which shows what's needed to build the executable jar file.
Server.java, which creates the server and dispatches to a FileResource instance to handle URLs satisfying the file/{filename} pattern.
FileResource.java, which handles those GET requests by reading the file from disk and returning a string representation of it.

If this looks confusing, the Restlet Tutorial is a reasonable introduction. There's also a pretty good Powerpoint presentation that introduces both REST architectural design and the Restlet Framework at the Restlet Wiki Home Page. It comes with some decent sample code as well.

Next step is to add this kind of URL processing to SensorBase.

Wednesday, May 9, 2007

How to browse HTML correctly from SVN repositories

I just committed a bunch of HTML files to SVN, then realized that they don't display as HTML when you browse the repository. After painfully reconstructing the solution, I figured it would be good to jot a note on how to deal with this.

First, you need to fix the svn:mime-type property on all of your committed HTML files. To do this, use your SVN client to select all of the HTML files under SVN control, then set their svn:mime-type property to text/html, then commit these changes. That fixes the current files.

To ensure that all of your current and future HTML files are committed from the get-go with the svn:mime-type property set to text/html, you have to use the SVN auto-props feature. What that basically means is that you have to edit the file called "config" in your local SVN installation, and uncomment the following line:

enable-auto-props = yes

Then you have to add a new line that looks like this:

*.html = svn:mime-type=text/html

Finally, you (potentially) need to instruct your SVN client to consider auto-props when doing its commits. For example, in SmartSVN, you have to go to Projects | Default Settings | Working Copy and check "Apply Auto-Props from SVN 'config' file to added files".

In TortoiseSVN, there is a "Setting" menu that allows you to edit the 'config' file in a similar manner.

Tuesday, May 8, 2007

JAXB for Hackystat for Dummies

I spent today working through the XML/Java conversion process for SensorBase resources, and it occurred to me near the end that my struggles could significantly shorten the learning curve for others writing higher level services that consume SensorBase data (such as the UI services being built by Alexey, Pavel, and David.)

So, I did a quick writeup on the approach, in which I refer to a library jar file I have made available as the first SensorBase download.

After so many years using JDOM, which was nice in its own way, it is great to move onward to an even faster, simpler, and easier approach.

Saturday, May 5, 2007

How to start a new software development project

Alexey made an engineering log post in which he wonders how to get started with a summer job in which he will be asked to develop a "simple client-server system" for decision process simulation. Here's my advice:

1. Create a Google project to host all of your code and documentation. You're going to need to put stuff somewhere. Putting it in a public repository is good for at least two reasons: (a) you get a lot of useful infrastructure (svn, mailing lists, issue tracking, wiki) for free, and (b) your sponsors will feel better about you by having open access to what you're doing. Such transparency is a good thing: it will encourage more effective communication between them and you about the status of the project. If you just show up each week for a meeting and say, "Things are going good", it's easy for the project progress to stall for quite a while before that's apparent. If your project is hosted, then you can show up each week, review with them the issues that you're working on, and show them demos or code or requirements or whatever. The more they understand what you're doing at all points in the process, the happier they will be with you and the more likely you are to succeed.

2. Once you have your project repository infrastructure set up, create a wiki page containing the design of a REST API for client-server communication. Having just created a REST API for the SensorBase, which is itself a "simple client-server system", I can heartily recommend this approach to exploring the requirements for your system. Basically, start asking yourself what the "resources" are in your application, and how the clients will manipulate these resources on the server. At this point, you don't worry too much about the specific look-and-feel of the interface; you're more focussed on the essential information instances and their structure. Of course, you can and should get feedback from your sponsors about your proposed set of resources and the operations available upon them. Having this API specification available makes getting into coding a breeze, as I discovered yesterday. I previously posted a few links that I found useful in learning about REST.

3. Once you feel comfortable with your API and thus understand what resources exist and how they are manipulated, create a mockup of the user interface. This helps you figure out what user interface you need, and what technology you might want to use---GWT, Ruby on Rails, plain old Java, or even .NET. Since REST is an architectural "style", not a technology, your work defining the API will not be wasted regardless of what user interface infrastructure you choose.

4. Apply the project management and quality assurance skills you learned in 613. Create unit tests and monitor your coverage. Associate each SVN commit with an Issue so that your issue list also constitutes a complete Change Log. Create intermediate milestones that you review with your sponsor. Request code reviews from fellow CSDL members. Maintain an Engineering Log with regular entries, and encourage your sponsors to read it so that they know what you're thinking about and the problems you are encountering. Use static and dynamic analysis tools to automate quality assurance whenever possible; for example, if you are programming in Java, use Checkstyle, PMD, FindBugs, and the Eclipse warnings utility.

5. Finally, be confident in your ability to learn what is required to successfully complete this project. A common project risk factor is developers who feel insecure about not knowing everything they need to know about the application domain in advance, and as a result try to "hide" that fact from their sponsors. In any interesting development project, there are going to be things you don't know in advance, and paths you follow that turn out to be wrong and require backtracking. The goal of software engineering practices is to make those risks obvious, and put in place mechanisms to discover the wrong paths as fast as possible. You will make mistakes along the way. Everyone does, we're only human.

I hope that you will have the experience Pavel had when instituting these same kinds of practices for his bioinformatics RAship. He told me that his sponsor was delighted by his use of Google Projects and an Engineering Log to make his work more visible and accessible. I would love to see a similar outcome for you in this project.

Friday, May 4, 2007

SensorBase coding has begun!

To my great delight (given that the May 15 milestone is rapidly approaching) I have committed my first bit of SensorBase code today.

Some interesting tidbits:

First, I am continuing to observe the Hackystat tradition of always including a reference to an Issue in the SVN commit message. In this case, the reference looks like:

http://code.google.com/p/hackystat-sensorbase-uh/issues/detail?id=3

Second, to my surprise, I am coding 100% in a TDD style, not out of any philosophical commitment or moral imperative, but simply out of the sense that this is just the most natural way to start to get some leverage on the SensorBase implementation. The REST API specification turns out to form a very nice specification of the target behavior, and so I just picked the first URI in the table (GET host/hackystat/sensordatatypes) which is supposed to return a list of sensordatatype resource references, and wrote a unit test which tries that call on a server. Of course, the test fails, because I haven't written the server yet.

Third, to my relief, the Restlet framework makes that test case wicked easy to write. In fact, here it is:


@Test public void getSdtIndex () {
 // Set up the call.
 Method method = Method.GET;
 Reference reference = new Reference("http://localhost:9090/hackystat/sensordatatypes");
 Request request = new Request(method, reference);

 // Make the call.
 Client client = new Client(Protocol.HTTP);
 Response response = client.handle(request);

 // Test that the request was received and processed by the server OK.
 assertTrue("Testing for successful status", response.getStatus().isSuccess());

 // Now test that the response is OK.
 XmlRepresentation data = response.getEntityAsSax();
 assertEquals("Checking SDT", "SampleSDT", data.getText("SensorDataTypes/SensorDataType/@Name"));
 }

There's a couple of rough edges (I can't hard code the server URI, and my XPath is probably bogus), but the code runs and does the right thing (i.e. fails at the getStatus call with a connection not found error.)

I'm sure things won't be this easy forever, but it's nice to get off to a smooth start.

Thursday, May 3, 2007

Minimize library imports to your Google Projects

As we transition to Google Project Hosting, one thing we need to be particularly careful about is uploading of third party libraries into SVN. In general, try to avoid doing this. There are two reasons for this. First, there is limited disk space in Google Project Hosting, and its easy to burn up your space with libraries (remember that since SVN never deletes, libraries that need updating frequently will burn through your space quickly.) Second, different services will often share the same library. For example, most of our Java-based services will probably want to use the Restlet framework. It is generally better to install that in one place as a developer.

To avoid uploading libraries to SVN, you can generally do one of the following alternatives:

Instruct your developers in the installation guide to download the library to a local directory, create an environment variable called {LIBRARY}_HOME, and point to those jar files from your Eclipse classpath or Ant environment variable.
For files that need to be in a specific location in your project, such as GWT, download the GWT to a local directory, then copy the relevant subdirectories into your project.

Binary distributions of releases is a different situation. In that case, we will typically want to bundle the libraries into the binary distribution. That will cause its own difficulties, since Google Project Hosting limits us to 10MB files for the download section, but we'll cross that bridge when we come to it.

H4 with Robert

I had an entertaining and enjoyable H4 with Robert yesterday. We spent the time fooling around with the sample Restlet framework applications. I was a bit worried about whether we would be able to make progress, since the samples were almost totally undocumented (the documentation points you to a directory containing sample code, which turns out to be sample code from an as-yet-unpublished O'Reilly book on the Restlet framework.)

Presumably with the book in our hands, we would have had complete instructions on how to run the examples.

Without the book in our hands, we just blasted ahead, created an Eclipse project, imported the code, looked for public static main() methods, and ran them.

To my surprise and delight, after some traipsing around the lib directory and guessing at jar files to add to the classpath, we eventually got all of the sample code to run.

So, the bad news is: the Restlet Framework examples are poorly documented: buy the book when it comes out. The good news is: given a few lucky guesses, you really don't need any documentation, at least to get them up and running.

PMJ Engineering Log