PMJ Engineering Log: Why REST for Hackystat?

Cedric asked a really good question on the hackystat mailing list today, and I thought it was worth posting to this blog:

> Probably my question is too late since you have already decide use REST, but I want to
> know the rationale behind it.
>
> Since you are still returning data in xml format, what makes you decide not to publish
> a collection of WSDL and go along with more industrial standard web service calls?

Excellent question! No, it's not too late at all. This is exactly the right time to be discussing this kind of thing.

It turns out that when I started the Version 8 design process, I was still thinking in terms of a monolithic server and was heading down the SOAP/WSDL route. I was, for example, investigating Glassfish as an alternative to Tomcat due to its purportedly better support for web services.

Then the Version 8 design process took an unexpected turn, and the monolithic server fragmented into a set of communicating services: SensorBase services for raw sensor data, Analysis services that would request data from SensorBases and provide higher level abstractions, and UI services that would request data from SensorBases and Analyses and display it with a user interface.

What worried me about this design initially was that every Analysis service would have to be able to both produce and consume data (kind of like being a web server and a web browser at the same time), and that Glassfish might be overkill for this situation. So, I started looking for a lightweight Java-based framework for producing/consuming web services, and came upon the Restlet Framework (http://www.restlet.org/), which then got me thinking more deeply about REST.

It's hard to quickly sum up the differences between REST and WSDL, but here's a few thoughts to get you started. WSDL is basically based upon the remote procedure call architectural style, with HTTP used as a "tunnel". As a result, you generally have a single "endpoint", or URL, such as /soap/servlet/messagerouter, that is used for all communication. Every single communication with the service, whether it is to "get" data from the service, "put" data to the service, or modify existing data is always implemented (from an HTTP perspective) in exactly the same way: an HTTP POST to a single URL. From the perspective of HTTP, the "meaning" of the request is completely opaque.

In REST, in contrast, you design your system so that your URLs actually "mean" something: they name a "resource". Furthermore, the type of HTTP method also "means" something: GET means "get" a representation of the resource named by the URL, "POST" means create a new resource which will have a unique URL as its name, DELETE means "delete" the resource named by the URL, and so forth.

For example, in Hackystat Version 7, to send sensor data to the server, we use Axis, SOAP, and WSDL to send an HTTP POST to http://hackystat.ics.hawaii.edu/hackystat/soap/rpcrouter, and the content of the message indicates that we want to create some sensor data. All sensor data, of all types, for all users, is sent to the same URL in the same way. If we wanted to enable programmatic access to sensor data in Version 7, we would tell clients to continue to use HTTP POST to http://hackystat.ics.hawaii.edu/hackystat/soap/rpcrouter, but tell them that the content of the POST could now invoke a method in the server to obtain data.

A RESTful interface does it differently: to request data, you use GET with an URL that identifies the data you want. To put data, you use POST with an URL that identifies the resource you are creating on the server. For example:

GET http://hackystat.ics.hawaii.edu/hackystat/sensordata/x3fhU784vcEW/Commit/1176759070170

might return the Commit sensor data with timestamp 1176759070170 for user x3fhU784vcEW. Similarly,

POST http://hackystat.ics.hawaii.edu/hackystat/sensordata/x3fhU784vcEW/Commit/1176759070170

would contain a payload with the actual Commit data contents that should be created on the server. And

DELETE http://hackystat.ics.hawaii.edu/hackystat/sensordata/x3fhU784vcEW/Commit/1176759070170

would delete that resource. (There are authentication issues, of course.)

In fact, REST asserts a direct correspondance between the CRUD (create/read/update/delete) DB operations and the POST, GET, PUT, and DELETE methods for resources named by URLs.

Now, why do we care? What's so good about REST anyway? In the case of Hackystat, I think there are two really significant advantages of a RESTfully designed system over an RPC/SOAP/WSDL designed system:

(1) Caching can be done by the Internet. If you obey a few more principles when designing your system, then you can use HTTP techniques as a way to cache data rather than build in your own caching system. It's exactly the same way that your browser avoids going back to Amazon to get the logo files and so forth when you move between pages. In the case of Hackystat, when someone invokes a GET on the SensorBase with a specific URL, the results can be transparently cached to speed up future GETs of the same URL, since that represents the same resource. (There are cache expiration issues, which I'm pretty sure we can deal with.)

In Hackystat Version 7, there is a huge amount of code that is devoted to caching, and this code is also a huge source of bugs and concurrency issues. With a REST architecture, it is possible that most, perhaps all, of this code can be completely eliminated without a performance hit. Indeed, performance might actually be significantly better in Version 8.

(2) A REST API is substantially more "accessible" than a WSDL API. One thing I want from Hackystat Version 8 is a substantially simpler, more accessible interface, that enables outsiders to quickly learn how to extend Hackystat for their own purposes with new services and/or extract low-level or high-level data from Hackystat for their own analyses. To do this with a RESTful API, it's straightforward: here are some URLs, here's how they translate into resources, invoke GET and you are on your way. Pretty much every programming language has library support for invoking an HTTP GET with an URL. One could expect a first semester programming student to be able to write a program to do that. Shoots, you can do it in a browser. The "barrier to entry" for this kind of API is really, really low.

Now consider a WSDL API. All of a sudden, you need to learn about SOAP, and you need to find out how to do Web Services in your chosen programming language, and you have to study the remote procedure calls that are available, and so forth. The "barrier to entry" is suddenly much higher: there are incompatible versions of SOAP, there's way more to learn, and I bet more than a few people will quickly decide to just bail and request direct access to the database, which cuts them out of 90% of the cool stuff in Hackystat.

So, from my reckoning, if we decided to use Axis/SOAP/WSDL in Version 8, we'd (1) continue to need to do all our own caching with all of the headaches that entails, and (2) we'd be stuck with a relatively complex interface to the data.

I want to emphasize that a RESTful architecture is more subtle than simply using GET, POST, PUT, and DELETE. For example, the following is probably not restful:

GET http://foo/bar/baz&action=delete

For more details, http://en.wikipedia.org/wiki/Representational_State_Transfer has a good intro with pointers to other readings.

Your email made another interesting assertion:

> what makes you decide not to publish
> a collection of WSDL and go along with more industrial standard web service calls?

Although I agree that WSDL is an "industry standard", this doesn't mean that REST isn't one as well. Indeed, my sense after a few weeks of research on the topic is that most significant industrial players have already moved to REST or offer REST as an alternative to WSDL: eBay, Google, Yahoo, Flickr, and Amazon all have REST-based services. I recall reading that the REST API gets far more traffic than the correponding WSDL API for at least some of these services.

Finally, no architecture is a silver bullet, and REST is no exception. For example, if you can't effectively model your domain as a set of resources, or if the CRUD operations aren't a good fit with the kinds of manipulations you want to do, then REST isn't right. Another REST requirement is statelessness, which can be a problem for some applications. So far in my design process, however, I haven't run into any showstoppers for the case of Hackystat.

Version 8 is still in the early stages, and the advantages of REST are still hypothetical, so I'm really happy to have this conversation. There are no hard commitments to anything yet, and if there turns out to be a showstopping problem with REST, then we can of course make a change. The more we talk about it, the greater the odds we'll figure out the right thing.

Cheers,
Philip

PMJ Engineering Log

Wednesday, April 18, 2007

Why REST for Hackystat?

2 comments:

Table of Contents

Labels