Sunday, February 12, 2012

Neo4j transactions in Scala

Recently I started to work on a project which depends on Neo4j. Neo4j, by its own definition, "is a graph database, storing data in the nodes and relationships of a graph".

Once we have a database instance, it's fairly simple to work with it. There are methods to get, create, update and delete nodes and relationships. There is, however, a requirement: every operation that writes to the database must be in a transaction. A transaction must either be committed or rolled back when it's finished. Neo4j is written in Java, so the simplest way to demonstrate a basic transaction is a Java example:

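import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;

public class Neo4jExample {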

    private final GraphDatabaseService db;

    public Neo4jExample(GraphDatabaseService db) {
        this.db = db;
    }

    public Node createFooNode() {
        Transaction tx = db.beginTx();
        try {
            Node node = db.createNode();
            node.setProperty("foo", "bar");
            tx.success();
            return node;
        } finally {
            tx.finish();
        }
    }
}

First we need to start a new transaction in the database with the db.beginTx() method. When the transaction is started we can create a new node in the database by calling db.createNode(), then we can set the foo property to bar by using node.setProperty("foo", "bar"). To finish a transaction we need to call tx.finish(), which will either commit or roll back the transaction. It will always roll back unless the transaction was marked as successful with tx.success(), in which case the transaction is committed.

Now notice the try-finally block which encloses the whole transaction. This is the recommended way to work with Neo4j transactions, because it ensures that the transaction is always properly finished. If an exception is thrown in the middle of the try-block before tx.success() is called, the transaction will be rolled back, since it was never marked as successful.

The solution is fail-safe and simple, but there is one problem with it: the transaction handling pollutes our code. The only important part is the two middle lines in which we create a node and set a property on it. The other six lines surrounding them are required by design, but they only distract from what matters. Additionally, if we need transactions in many different parts of our code base, we have to repeat those six lines everywhere, which is a lot of code duplication. We could, of course, overcome this by using AOP or by rewriting the above code to use the command pattern, but the former requires an AOP library and the latter makes this simple code a lot more complicated.

A more elegant solution exists in Scala, one which requires neither additional libraries nor an overly complicated design. In Scala, not only can we define functions and call them, but we can also write down functions as unnamed literals and then pass them around as values. Scala supports first-class functions, which means that we can express functions in function literal syntax, e.g. (x: Int) => x + 1, and that functions can be represented by objects, which are called function values.
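To make the terminology concrete, here is a minimal sketch of a function literal stored as a function value and passed to a method. The names FunctionValues, increment and applyTwice are made up for this illustration only.

```scala
object FunctionValues {
  // A function literal assigned to a value; its type is Int => Int.
  val increment: Int => Int = (x: Int) => x + 1

  // A method that takes a function value as a parameter and applies it twice.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  def main(args: Array[String]): Unit = {
    println(increment(1))                     // prints 2
    println(applyTwice(increment, 1))         // prints 3
    println(applyTwice((x: Int) => x * 2, 3)) // prints 12
  }
}
```

The last call passes a literal directly, without ever giving the function a name.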

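So all we need is a method that takes a block of code and executes it inside a transaction. In the version below the block is passed as a by-name parameter (dbOp: => A), which is evaluated only where the method body refers to it: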

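import org.neo4j.graphdb.GraphDatabaseService

trait TransactionSupport {

  // dbOp is a by-name parameter: it is evaluated inside the try block below
  protected def transaction[A](db: GraphDatabaseService)(dbOp: => A): A = {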
    val tx = db.beginTx()
    try {
      val result = dbOp
      tx.success()
      result
    } finally {
      tx.finish()
    }
  }
}

We can use this method in our code where we need a transaction:

import org.neo4j.graphdb.{GraphDatabaseService, Node}

class Neo4jExample(db: GraphDatabaseService) extends TransactionSupport {

  def createFooNode(): Node = transaction(db) {
    val node = db.createNode()
    node.setProperty("foo", "bar")
    node
  }
}

The transaction handling is defined in a trait which can be mixed into any class where transactions are needed. The transaction method takes two parameter lists: the first is the database in which the transaction is started, the second is the code that must be executed inside the transaction boundaries. This signature makes it possible to call the method in the transaction(db) { ... } format, which makes it look as if transaction handling were supported by the language itself. The method simply returns whatever the passed-in code returns. The body of transaction follows the same structure as the Java example above: it starts a transaction, does something in the try-block, and finally finishes the transaction in the finally-block. The only difference is that instead of specific database operations it executes the passed-in code.
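The following self-contained sketch shows the same shape without a Neo4j dependency, so that the evaluation order and the rollback on exceptions can be observed directly. FakeTx is a hypothetical stand-in for a real transaction, not part of Neo4j; it only records whether it would have been committed or rolled back.

```scala
import scala.util.Try

// Hypothetical stand-in for a Neo4j transaction; it only records what happened.
final class FakeTx {
  private var successful = false
  var outcome: String = "open"
  def success(): Unit = successful = true
  def finish(): Unit = outcome = if (successful) "committed" else "rolled back"
}

object ByNameDemo {
  // Same structure as the transaction method above, but written against FakeTx.
  def transaction[A](tx: FakeTx)(op: => A): A =
    try {
      val result = op // the by-name block is evaluated here, inside the try
      tx.success()
      result
    } finally {
      tx.finish()
    }

  def main(args: Array[String]): Unit = {
    val ok = new FakeTx
    val value = transaction(ok) { 21 * 2 }
    println(s"$value, ${ok.outcome}") // prints "42, committed"

    val failing = new FakeTx
    val attempt = Try(transaction[Int](failing) { throw new IllegalStateException("boom") })
    println(s"${attempt.isFailure}, ${failing.outcome}") // prints "true, rolled back"
  }
}
```

Because the block is passed by name, the exception is thrown inside the try-block rather than at the call site, so success() is never reached and the finally-block still runs.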

Using the above transaction support is very simple: just wrap the code which must be executed in a transaction in a transaction(db) { ... } call. The code duplication is reduced to mixing in the TransactionSupport trait and calling the transaction method wherever it's necessary. No external libraries are used and no complicated patterns are implemented, and the resulting code is both simple and easy to read.

The TransactionSupport trait, however, is not perfect. As long as we don't need complex transaction handling it's good enough, but as soon as we have to roll back transactions when certain conditions are met, or we must acquire read or write locks, we need to come up with some other solution.
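One possible direction for the conditional rollback case, sketched here under assumptions, is to hand the transaction to the block so that the block can mark it as failed. Tx and RecordingTx are hypothetical stand-ins whose method names mirror Neo4j's Transaction (success, failure, finish). The stub assumes that a call to failure() cannot be undone by a later success(). The record method and its negative-amount rule are invented for the example.

```scala
// Hypothetical minimal transaction interface; a stand-in, not the Neo4j type.
trait Tx {
  def success(): Unit
  def failure(): Unit
  def finish(): Unit
}

// In-memory stub: once failure() has been called, success() cannot undo it.
final class RecordingTx extends Tx {
  private var markedSuccess = false
  private var markedFailure = false
  var outcome: String = "open"
  def success(): Unit = markedSuccess = true
  def failure(): Unit = markedFailure = true
  def finish(): Unit =
    outcome = if (markedSuccess && !markedFailure) "committed" else "rolled back"
}

object ExplicitRollback {
  // Variant that passes the transaction to the block (a function Tx => A).
  def transaction[A](tx: Tx)(op: Tx => A): A =
    try {
      val result = op(tx)
      tx.success()
      result
    } finally {
      tx.finish()
    }

  // Hypothetical operation: roll back when the amount is negative.
  def record(amount: Int, tx: RecordingTx): Option[Int] =
    transaction(tx) { t =>
      if (amount < 0) { t.failure(); None } else Some(amount)
    }
}
```

The price of this variant is that the block is now an ordinary function taking a parameter, so call sites read transaction(tx) { t => ... } instead of the plainer transaction(db) { ... }.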