Tuesday, August 26, 2008

Reflections on Google Summer of Code 2008

Background

Back in April, we submitted Hackystat's application to the 2008 Google Summer of Code program. We didn't know too much about it, other than that it provided a chance for students to be funded by Google to work on open source projects for the summer.

With great glee, we soon learned that Hackystat had been accepted as one of the 140 projects sponsored by Google. The next step was to solicit student applications, which we did by sending email to the Hackystat discussion lists. We ended up with around 20 applications. There were a few that were totally off the wall from people who had no clue what Hackystat was, and a few others that were disorganized, incomplete, or otherwise indicative of a student who would probably not be successful. But a good dozen of the 20 applications appeared quite promising and deserving of funding.

Google then posted the number of "slots" for each project--the maximum number of students that they would support. Hackystat got 4 slots. The number of slots is apparently based partly on the number of applications a project receives, and partly on the organization's past track record with GSoC. Hackystat had no prior track record, and couldn't compete with the number of applications for, say, the Apache Foundation. The GSoC Program Administrator answered the anguished pleas of new organizations that received fewer slots than they wanted by basically saying, "Look, we don't want to give you a zillion slots and then have half a zillion projects fail. Do a good job this year with the slots you were given and reapply next year." Sound advice, actually.

We then started ranking the applications to figure out which four students should be funded. It was difficult and frustrating, because there were many good applications. At the end, we came up with four students who we felt had a combination of interesting project ideas and a good chance of success based on their skills and situations.

We were right. Three out of four of the students successfully completed their projects, and the fourth student had to drop out of the program due to sudden illness, which no one could have foreseen.

GSoC requires each student to have a mentor. This summer, Greg Wilson of the University of Toronto and I each took two students. Greg's students were physically located at the University of Toronto, so he was able to have face-to-face interactions. My students were in China.

Student support took several forms over the summer. First, there was email and the Hackystat developer mailing lists. At the beginning of the summer, I received a few emails from students that I redirected to the mailing list, so that other project developers could respond, and also because the question asked was of general Hackystat interest. Fairly quickly, the students caught on, and started posting most of their general-interest questions to the list. I think this was one conceptual hurdle for the students to get over: they were not in a relationship just with me or Greg, but also with the entire Hackystat developer and user community. While there were certainly issues pertaining to the GSoC program that they discussed privately with their mentors, they were also "real" Hackystat developers and needed to learn how to interact with the wider community. All of the students acclimated to their new role.

We also requested that the students maintain a blog and post an entry at least once a week that would summarize what they'd been working on, what problems they'd run into, and what they were planning to do next. This was also pretty successful. You can see Shaoxuan's, Eva's, and Matthew's blogs for details. Interestingly, the Chinese students found they could not access their (U.S.-created) blogs once they were in China, and so had to use Wiki pages instead.

Finally, I also set up weekly teleconferences via Skype with the two students I was mentoring in China. This was a miserable failure, probably due to my own lameness. Despite the fact that I live in a timezone (HST) shared by very few of my software engineering colleagues, and thus have lots of experience with multi-timezone teleconferencing, the Hawaii-China difference just totally threw me. The international dateline did not help matters. At any rate, we simply fell back to asynchronous communication via blogs and email and that worked fine.

For source code and documentation hosting, we used two mechanisms. The Hackystat project uses Google Project Hosting, and so the students I mentored used this service. Greg is the force behind Dr. Project, and so the students he mentored used that service. As part of the wrapup activities, his students ported their work to Google Project Hosting to conform to the Hackystat conventions.

Results

So, what did they actually accomplish? Matthew Bassett created a sensor for Microsoft Team Foundation Server. Here's a screen shot of one page where the user can customize the events the sensor collects:

The sensor itself is available at: http://code.google.com/p/hackystat-sensor-tfs/.

Eva Wong worked on a data visualization system for Hackystat based on Flare.

Her project is available at: http://code.google.com/p/hackystat-ui-sensordatavisualizer/.

Finally, Shaoxuan Zhang worked on multi-project analysis mechanisms for Hackystat using Wicket. Here is a screen shot of the Portfolio page:

His project is available at: http://code.google.com/p/hackystat-ui-wicket/.

Reflections

So, what makes for a successful GSoC?

First, and most obviously, it's important to have good students. "Good", with respect to GSoC, seems to boil down to two essential attributes: a passion for the problem, and the ability to be self-starting. (As long as the student "starts", the mentors and other developers can help them "finish"). It was delightful to read Matthew's blog entries about Team Foundation Server: he obviously likes the technology and enjoyed digging into its internals. At one point in the summer, Shaoxuan sent me an email in which he apologized that he had not been working much for the past week because he just got married, but he'd work extra hard the next week to catch up! We clearly had passionate students.

It also helps to have good mentors. In the Hackystat project, we have an embarrassment of riches on this front, since the project includes a large number of academics who mentor as part of their day jobs. In the end, we only needed two active mentors for the four students, but we easily had mentoring capacity for a couple dozen students.

Establishing effective communication systems is critical. Part of this is technological. We found that email and blogs worked well. Skype did not work well for me, but that was probably operator error on my part. Greg had the additional opportunity to use face-to-face communication, which is certainly helpful but not at all necessary to success. The other part is social. Most of our students needed to learn over the summer to: (a) request help quickly when they ran into problems, and (b) direct their questions to the appropriate forum: either the Hackystat developer mailing list or privately to a mentor via email. This wasn't particularly difficult; it was just part of the process of learning the Hackystat project culture.

I think I would have more insightful "lessons learned" had any of the student projects crashed and burned, but fortunately for the students (and unfortunately for this blog posting), that simply didn't happen.

For the Hackystat project, participation in GSoC this summer has had many benefits. Clearly, we'll benefit from the code the students created, which is now publicly visible in the Hackystat Component Directory. We are crossing our fingers that the students will continue to remain active members of the Hackystat community.

GSoC has also helped to create a new "center" of Hackystat expertise at the University of Toronto. We hope to build upon that in the future.

GSoC also catalyzed a number of discussions within the Hackystat developer community about the direction of the project and how students could most effectively participate. These insights will have long term value to the project.

I believe we are now significantly more skillful at mentoring students. I hope we get a chance to participate in GSoC 2009, and that we can build upon our experiences this summer next year.

Saturday, August 23, 2008

Clean code

There is a persistent, subtle perception that "coding is for the young": that as you progress as a technical professional, you "outgrow" coding. Perhaps this is because most organizations pay senior managers way better than their technical staff. Perhaps this is because many software developers hit a glass ceiling and decide they've learned all there is to know about code. Perhaps this is because coding is seen as a low-level skill and vulnerable to outsourcing.

Other disciplines lack this perception: no one would ever want Itzhak Perlman to give up playing violin for a management position, or believe that his musical development ended while he was in his 20s, or that a symphony would outsource his position to a young virtuoso, no matter how talented, on the basis that they could get equivalent quality for less money.

I think part of the reason for this difference in perception is a difference in visibility: one can immediately hear the quality of a great violinist, even if one does not play the violin. The quality of the work produced by a great coder is, unfortunately, almost invisible: how do you "hear" code that is simultaneously flexible, maintainable, understandable, and efficient? How do you hear it if you are a senior manager who doesn't even know how to code?

Clean Code: A Handbook of Agile Software Craftsmanship is a nice attempt to make the quality of great code visible, and in so doing makes some other points as well: that great code is very, very difficult to write; that even apparently well-written code can still be significantly improved; and that the ability to consistently write great code is a goal that will take most of us decades of consistent practice to achieve.

Clean Code is written by Robert Martin and his colleagues at Object Mentor. It begins with a chapter in which Bjarne Stroustrup, Grady Booch, Dave Thomas, Michael Feathers, Ron Jeffries, and Ward Cunningham are all asked to define "clean code". Their responses, and Martin's synthesis, would make a stunning Wikipedia entry for "clean code".

However, as he points out, knowing what clean code is, and even recognizing it when you see it, is far different (and far easier) than being able to actually create it yourself. The most interesting parts of the book are case studies where code from various open source systems (Tomcat, Apache Commons, FitNesse, JUnit) is reviewed and improved.

Along the way, the "Object Mentor School of Clean Code" emerges. I found much to agree with, along with some controversial points. For example, I am a great believer in using Checkstyle to ensure that all public methods and classes have "fully qualified" JavaDoc comments (i.e., that all parameters and, where present, return values are documented). The OMSCC actually has a fairly low opinion of comments: they should be eliminated wherever possible in favor of code so well written that comments are redundant, even for public methods. As a result, I don't think they could use an automated tool such as Checkstyle for QA on their JavaDocs.

In some cases, they create "straw man" examples, such as using three lines for a JavaDoc comment that could easily fit on one line, and then complaining that the JavaDoc takes up too much vertical space.
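To make the disagreement concrete, here is a minimal sketch (the class and methods are my own invention, not code from the book or from Hackystat) showing the same small utility documented both ways: the fully qualified style that Checkstyle can verify mechanically, and the compact style that relies on good naming instead of @param/@return tags:

```java
/**
 * Hypothetical example, not taken from Clean Code or Hackystat: two styles
 * of JavaDoc on a small utility class. The first method uses the "fully
 * qualified" style that a Checkstyle JavadocMethod check can enforce; the
 * second uses the terse style the Object Mentor authors would prefer.
 */
public final class IntervalUtils {

  /** This class provides only static methods. */
  private IntervalUtils() {
  }

  /**
   * Returns the length of the overlap between two closed intervals.
   *
   * @param start1 The start of the first interval.
   * @param end1 The end of the first interval; must be at least start1.
   * @param start2 The start of the second interval.
   * @param end2 The end of the second interval; must be at least start2.
   * @return The overlap length, or 0 if the intervals are disjoint.
   */
  public static int overlapLength(int start1, int end1, int start2, int end2) {
    return Math.max(0, Math.min(end1, end2) - Math.max(start1, start2));
  }

  /** Returns true if the two intervals overlap by a positive length. */
  public static boolean overlaps(int start1, int end1, int start2, int end2) {
    return overlapLength(start1, end1, start2, end2) > 0;
  }
}
```

A Checkstyle configuration with the JavadocMethod module enabled would pass the first method and, depending on its settings, flag the second for its missing tags; that, in a nutshell, is the tooling trade-off at stake.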

From a truth in advertising standpoint, the book should have the word "Java" in its title. All of the examples are in Java, and while the authors attempt to generalize whenever possible, it is clear that many aspects of cleanliness are ultimately language-specific. While the book should be extremely useful to Java programmers, I am not sure how well its lessons would translate to Perl, Scala, or C.

One final nit: some of the chapters are quite short (dare I say too short), such as the five-page chapter on Emergence. On the other hand, the chapter on concurrency gets a 30-page appendix of additional guidelines. If there is a second edition (and I hope there will be), I expect the topics will get more balanced treatment.

Despite these minor shortcomings, I found this book to be well worth reading. I was humbled to see just how much better the authors could make code that already seemed perfectly "clean" to me. And I am happy that someone has made an eloquent and passionate argument for remaining in the trenches writing code for 10, 20, or 30 years, and the maturity and beauty that such discipline and persistence can yield.