Monthly Archives: July 2008

Dabbling with Groovy

I am currently implementing a Bayesian document classifier in Groovy and I have to say that I am really enjoying the language. As far as dynamic languages go, Groovy has all the bells and whistles, such as closures, meta-programming, and dynamic typing. The language has a familiar feel for Java developers, yet has the features to accommodate developers that know Ruby, Python, or Perl.

Furthermore, Groovy and Grails have excellent tool support with IntelliJ IDEA. I cannot recommend that IDE enough. Auto-completion, code navigation, syntax highlighting, and more importantly refactoring all work as expected. This is something that I have found lacking with IDEs for other dynamic languages like Ruby and Python. Yes, I know there is a Ruby plugin for IntelliJ IDEA, but it still isn’t as good as the Groovy plugin!

OK, enough of the sales pitch, let's see some code for the document classifier that I am building. The following FeatureService class contains a method for extracting a set of features from a document. The document is simply a string of text, and the features are all unique words that have three or more letters. So words like [I, a, be, is] will be ignored, and words like [alpha, bravo, charlie] will be identified as features. If we begin with a test, our expectations will look like the following.

class FeatureServiceTests extends GroovyTestCase {

    void testShouldExtractFeatureFromDocument() {
        def sampleDocument = "Groovy is really groovy man."
        def service = new FeatureService()
        def feature = service.getFeature(sampleDocument)
        def expectedFeature = ["groovy", "really", "man"]

        assertEquals(expectedFeature, feature)
    }
}

The test implies that only words that are unique and have three or more characters within a given document will be extracted, as a list of lowercase words. This is a utility method for extracting features, which can then be used to determine their probability of occurring within a document that belongs to a specific category of documents. Since I am implementing this document classifier in Grails, I am following the convention of naming the feature utility class as a service. The implementation of FeatureService is as follows.

class FeatureService {

    boolean transactional = true

    def getFeature(String doc) {
        def feature = []
        def words = doc.split(/\W/)*.toLowerCase().unique()
        words.each {word ->
            if (word.size() >= 3) {
                feature << word
            }
        }
        return feature
    }
}

Walking through the code, line 1 defines the class name. This is fairly standard, except that if you come from Java you will notice there is no need to declare the visibility of the class using public. Groovy classes are public by default, so you can add the modifier if you want, but it is redundant.

Line 3 was generated by Grails. It contains a property called transactional, which tells Grails whether the methods of the service should run inside a transaction.

Line 5 contains the method signature for getFeature(). There is no need to state the return type of the method, but you can if you want.
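To make the point about optional return types concrete, here is a minimal sketch. The class and method names are hypothetical and are not part of the classifier; both methods do the same thing and differ only in how the return type is declared.

```groovy
// Hypothetical class, written only to compare the two declaration styles.
class ReturnTypeExample {

    // "def" leaves the return type dynamic.
    def untyped(String doc) {
        doc.split(/\s+/).toList()
    }

    // An explicit return type documents what callers can expect.
    List<String> typed(String doc) {
        doc.split(/\s+/).toList()
    }
}

def example = new ReturnTypeExample()
assert example.untyped("alpha bravo") == example.typed("alpha bravo")
```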

Line 6 defines an empty list to which we’ll add features as we find them.

Line 7 uses a regular expression to split the string into a list of words. All of the words are converted to lowercase, and only the unique set of words is kept. This line demonstrates Groovy’s ability to chain methods. The “*.” operator (the spread operator) applies the toLowerCase() method to each item in the list that is returned by the split operation.
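The chain on line 7 can be taken apart step by step. The following is a minimal sketch using the sample sentence from the test; the variable names are made up for illustration, and the pattern is written as /\W+/ here on the assumption that runs of punctuation and spaces should count as a single separator.

```groovy
def doc = "Groovy is really groovy man."

// split returns a String[]; each run of non-word characters is a separator.
def tokens = doc.split(/\W+/)

// The spread operator calls toLowerCase() on every element and
// collects the results into a new list.
def lowered = tokens*.toLowerCase()

// It is shorthand for an explicit collect with a closure.
assert lowered == tokens.collect { it.toLowerCase() }

// unique() drops duplicates and keeps the first occurrence of each word.
// Note that it modifies the list in place as well as returning it.
assert lowered.unique() == ["groovy", "is", "really", "man"]
```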

Lines 8 to 12 pass a closure to the each method in order to filter out words that are fewer than 3 characters in length. Only words that have 3 or more characters are added to the feature list using the left-shift operator (<<). This is where a Perl-style “unless” keyword would be useful in the Groovy language, as it would reduce the if statement to a single line. It would also read better, along the lines of “add word to feature list unless size of word is less than 3”. There is demand for this keyword, so we’ll just have to wait and see if it makes it into Groovy.
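In the meantime, the same filter can be written without the if statement by using findAll, which returns the items for which the closure is true. This is only a sketch of an alternative, not the code used in FeatureService, and the word list is a hypothetical sample.

```groovy
// Hypothetical sample input: lowercase, already de-duplicated words.
def words = ["groovy", "is", "really", "man"]

// findAll keeps only the items for which the closure returns true.
def feature = words.findAll { it.size() >= 3 }

assert feature == ["groovy", "really", "man"]
```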

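Finally, line 13 returns the feature list. Groovy does not require a return statement, because a method returns the value of its last evaluated expression by default. In this case the return is necessary: without it, the last expression would be the call to each that starts on line 8, and the method would return the unfiltered word list instead of the features. I prefer to state explicitly what a method returns in any case, because relying on the implicit rule could introduce unexpected behaviour into your code.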
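The effect of leaving out the return statement can be checked with a small sketch. The two helper methods below are hypothetical and exist only to show which value comes back in each case.

```groovy
// No return statement: the last evaluated expression is the call to each,
// and each returns the list it was called on, not the filtered list.
def withoutReturn(List words) {
    def feature = []
    words.each { if (it.size() >= 3) feature << it }
}

// Ending with the bare variable makes it the last expression,
// which has the same effect as writing "return feature".
def withTrailingExpression(List words) {
    def feature = []
    words.each { if (it.size() >= 3) feature << it }
    feature
}

def sample = ["groovy", "is", "man"]
assert withoutReturn(sample) == ["groovy", "is", "man"]
assert withTrailingExpression(sample) == ["groovy", "man"]
```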

This code example is quite brief, but it should give you a taste of what Groovy has to offer.

Sydney Groovy Group

Today I decided to start up a special interest group for Groovy and Grails developers in Sydney. If you are interested in Groovy or would like to be a presenter then please join up at Groovy Sydney. You can register your Groovy talk on the Presentation Topics page, and I will organise for you to present at the next meeting. The Groovy Sydney group will meet on a monthly basis at the ThoughtWorks office, followed by drinks at a nearby pub.

Groovy is a dynamic language that I have been ignoring until now for no particular reason. I first heard about it when I was doing consulting work in Brisbane last year. One of the client developers at a large bancassurance company gave a pretty good presentation on Groovy. But back then I was ignorant and caught up in the Ruby on Rails hype.

The thing about Groovy that won me over was that you can start off by writing your code in Java, and then refactor to make your code groovier. It reduces the learning curve required to be productive in a new programming language, which I believe to be quite novel.
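As a rough illustration of that refactoring path, the sketch below shows the same helper written twice: first in plain Java style, which Groovy also accepts, and then in a groovier style. The class and method names are invented for this example.

```groovy
// Step 1: Java-style code, which compiles as Groovy without changes.
class JavaStyle {
    public List<String> longWords(List<String> words) {
        List<String> result = new ArrayList<String>();
        for (String word : words) {
            if (word.length() >= 3) {
                result.add(word);
            }
        }
        return result;
    }
}

// Step 2: the same behaviour after refactoring to idiomatic Groovy.
class GroovyStyle {
    def longWords(List<String> words) {
        words.findAll { it.size() >= 3 }
    }
}

def words = ["groovy", "is", "really", "man"]
assert new JavaStyle().longWords(words) == new GroovyStyle().longWords(words)
```

Both versions can live side by side in the same code base, so the refactoring can be done one method at a time.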

Groovy makes sense in the enterprise as it allows you to leverage your existing Java-based systems. You can deploy a Groovy application on your expensive J2EE application servers, thereby maximising the return on your infrastructure investment. For example, Grails, a Rails-esque web application framework for Groovy, can be deployed as part of a Spring application. Essentially you can implement a lot of your integration business logic in Java, and use Grails to quickly create frontend CRUD functionality, which is tedious to do in Spring alone. Groovy is a tool worth having for any Java developer looking for productivity gains.