Saturday 28 June 2014

First impressions of Neo4j

I have always had an interest in big data - much more than in, say, low-latency processing. I never wanted to optimize code for every processor cycle, or every bit sent across the wire. I appreciate people who are capable of doing that; it's just never been something I was passionate about. It's too... low level. Too C ;). I want abstraction, business domain - and somehow, in particular, I get a buzz from the idea of having and processing loads and loads of information.

In my career so far I have been lucky enough to get varying degrees of exposure to technologies such as Gigaspaces, Cassandra and Attivio, and even though I had heard about Neo4j and briefly touched it quite a while ago, it wasn't until very recently that I properly picked it up for a personal project (which will turn into the next Facebook, obviously...) ;).

I started with an awful lot of enthusiasm, only to be reminded that picking up a new toy can be painful sometimes - especially when the toy is still very much 'work in progress'. :) Below is a short summary of my journey so far.

Hello, world!

As a typical geek, I downloaded the latest and greatest stable versions of everything involved - at that time it was Neo4j 2.1 (2.1.1 or 2.1.2, can't remember) and Spring Data Neo4j (SDN) 3.1.0. I read a book about graph databases from O'Reilly (very good BTW), got another one for Spring Data (joys of Safari), and, well equipped, started my 'hello world' level app. What could possibly go wrong, right? :)

Well, only a startup issue, throwing the following exception:

Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'currencyRepository': Cannot resolve reference to bean 'neo4jTemplate' while setting bean property 'neo4jTemplate'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'neo4jTemplate' defined in class org.springframework.data.neo4j.config.Neo4jConfiguration: Instantiation of bean failed; nested exception is org.springframework.beans.factory.BeanDefinitionStoreException: Factory method [public org.springframework.data.neo4j.support.Neo4jTemplate org.springframework.data.neo4j.config.Neo4jConfiguration.neo4jTemplate() throws java.lang.Exception] threw exception; nested exception is java.lang.NoClassDefFoundError: org/neo4j/kernel/impl/transaction/LockException
 at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveReference(BeanDefinitionValueResolver.java:336)
 [...]

I googled a bit and didn't find an awful lot, except for a logged JIRA issue suggesting that the versions I had picked are not compatible. Fine, I'll go with SDN 3.2.0-SNAPSHOT, surely they would have fixed something like this by now? Well... no.
Okay, still with plenty of enthusiasm, I decided to downgrade to Neo4j 2.0.3. Things started fine. Nice.

Hot like a model. But not unique.

I modeled a couple of domain objects. Although Neo4j only supports primitives and strings as property values, SDN is meant to do automatic conversion. Sweet! I made one of my fields an enum and annotated it as indexed and unique - doesn't work :( It crashes on a save to Neo4j, saying that the type is not supported:

org.springframework.dao.InvalidDataAccessResourceUsageException: Error executing statement MERGE (n:`Weekday` {`weekdayCode`: {value}}) ON CREATE SET n={props} SET n:GraphNode  return n; nested exception is org.springframework.dao.InvalidDataAccessResourceUsageException: Error executing statement MERGE (n:`Weekday` {`weekdayCode`: {value}}) ON CREATE SET n={props} SET n:GraphNode  return n; nested exception is java.lang.IllegalArgumentException: [MONDAY:java.time.DayOfWeek] is not a supported property value
 at org.springframework.data.neo4j.support.query.CypherQueryEngineImpl.query(CypherQueryEngineImpl.java:61)
 at [...]

Well, not surprising - I can clearly see that it sends an enum. But wasn't it meant to do the translation for me? Boohoo. It took me some time to figure out that things get messed up when you add uniqueness to a field that is not a string or a primitive. I asked a question about it on StackOverflow and got no answers. Fine. Enum. Unique. Pick one. Let's go back to String then...
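A minimal sketch of the 'back to String' workaround, in plain Java with no SDN dependency so that it compiles on its own. The stored field is a String (the one that would carry the indexed/unique annotation), and the enum is only exposed through accessors. The class name WeekdayNode and its methods are hypothetical; only the weekdayCode property name comes from the MERGE statement above.

```java
import java.time.DayOfWeek;

// Hypothetical entity: in a real SDN project it would carry the mapping
// annotations, which are left out here so the sketch stands alone.
final class WeekdayNode {
    // Stored as a plain String, which Neo4j accepts as a property value.
    // This is the field that would be marked as indexed and unique.
    private String weekdayCode;

    WeekdayNode(DayOfWeek day) {
        setDay(day);
    }

    // The raw value that ends up in the database.
    String getWeekdayCode() {
        return weekdayCode;
    }

    // Convert back to the enum when the domain code needs it.
    DayOfWeek getDay() {
        return DayOfWeek.valueOf(weekdayCode);
    }

    // Convert the enum to its name before it gets anywhere near the store.
    void setDay(DayOfWeek day) {
        this.weekdayCode = day.name();
    }
}
```

The rest of the code keeps working with DayOfWeek, and only the String ever reaches the MERGE statement.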

Oh, and let's not forget that the uniqueness in SDN is implemented as an update, not an exception. Could easily trip on this one, too - and the number of questions on SO and elsewhere about it seems to imply it's not just me expecting a crash rather than silent modification of the data.
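To illustrate the difference, here is a toy in-memory model in plain Java. It is not SDN code and says nothing about how SDN implements this internally; the class UniqueStore and its method names are made up. One method overwrites an existing unique key silently (the behaviour described above), the other refuses to (the behaviour many people seem to expect).

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a store with a unique key; purely illustrative.
final class UniqueStore {
    private final Map<String, String> valuesByKey = new HashMap<>();

    // "Uniqueness as update": an existing key is silently overwritten.
    void saveOrUpdate(String key, String value) {
        valuesByKey.put(key, value);
    }

    // "Uniqueness as a constraint": an existing key causes an exception
    // and the stored value is left untouched.
    void saveStrict(String key, String value) {
        if (valuesByKey.containsKey(key)) {
            throw new IllegalStateException("Duplicate unique key: " + key);
        }
        valuesByKey.put(key, value);
    }

    String get(String key) {
        return valuesByKey.get(key);
    }
}
```

If the strict behaviour is what you want, one option is to look the entity up by its unique field first and fail explicitly before saving.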

All your data are belong to us (*)

With the super simple model in place, it's time to get some data in. The new Neo4j has a neat CSV import. Desperate to use it, I decided to hack SDN and patch the class that was failing with the NoClassDefFoundError. Surprisingly, it worked (and later I learnt that the SDN team had finally released a SNAPSHOT with a fix, so I switched to that instead). So now, on Neo4j 2.1.2 and SDN 3.2.0-SNAPSHOT, I managed to import some data and create nodes that should be readable by SDN. No joy first time round: SDN doesn't find them and crashes with bizarre errors.

 
java.lang.IllegalStateException: No primary SDN label exists .. (i.e one starting with _) 
 at org.springframework.data.neo4j.support.typerepresentation.LabelBasedNodeTypeRepresentationStrategy.readAliasFrom(LabelBasedNodeTypeRepresentationStrategy.java:126)
 at org.springframework.data.neo4j.support.typerepresentation.LabelBasedNodeTypeRepresentationStrategy.readAliasFrom(LabelBasedNodeTypeRepresentationStrategy.java:39)

org.neo4j.graphdb.NotFoundException: No such property, '__type__'.
 at org.neo4j.kernel.impl.core.RelationshipProxy.getProperty(RelationshipProxy.java:189)
 at org.springframework.data.neo4j.support.typerepresentation.AbstractIndexBasedTypeRepresentationStrategy.readAliasFrom(AbstractIndexBasedTypeRepresentationStrategy.java:126)

It turned out SDN labels the nodes with two labels, not just one as I expected (for a class Foo I thought I only needed the label Foo; it turns out both Foo and _Foo need to be present). For relationships I also needed to add an extra property ('__type__', judging by the second exception above).
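To make the labelling concrete, here is a small sketch in plain Java that builds the kind of LOAD CSV statement which gives imported nodes both labels. The helper class, the CSV location and the column name are all made up for illustration; only the label pair (Foo plus _Foo) comes from the finding above. The relationship property is left out because its expected value is not something I can state with confidence.

```java
// Hypothetical helper, not part of SDN or Neo4j: it only builds a Cypher string.
final class SdnImportCypher {
    private SdnImportCypher() {
    }

    // The two labels SDN expected on a node: the class name and the same
    // name prefixed with an underscore. Backticks keep Cypher happy with
    // any label text.
    static String labelsFor(String simpleClassName) {
        return ":`" + simpleClassName + "`:`_" + simpleClassName + "`";
    }

    // Builds a LOAD CSV statement that creates one node per CSV row and
    // copies a single column into a property of the same name.
    static String loadCsv(String csvUrl, String simpleClassName, String column) {
        return "LOAD CSV WITH HEADERS FROM '" + csvUrl + "' AS row "
                + "CREATE (n" + labelsFor(simpleClassName)
                + " {" + column + ": row." + column + "})";
    }

    public static void main(String[] args) {
        // Example values only; adjust the URL, class name and column to your data.
        System.out.println(loadCsv("file:///tmp/foo.csv", "Foo", "name"));
    }
}
```

The resulting statement can be pasted into the Neo4j browser, so the import stays a one-off script and the Java side only has to read the nodes.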

Everything is cypher-rific.

With data in, basic mapping and basic repos in place, it was time for some Cypher fun. After a couple of classic PEBKAC issues I got a few more sophisticated queries working. All of them were basically copy-pasted from the lovely Neo4j browser tool and worked just fine - but there is one problem with that. What about refactoring? What about more dynamic queries? Well, Neo4j has a Cypher DSL library which allows for dynamic query creation; clearly that should be better, shouldn't it?
Only, finding any documentation for it, beyond mentions here and there that 'it exists and works with querydsl', proved a non-trivial task. The best doc I found? The JUnit test cases in the project's GitHub repository (I honestly cannot count a handful of rather dated articles here and there, with 1-2 simple examples each). Call me old fashioned, but that doesn't make me feel like it's a mature, stable and well supported tool :) Also, although someone kindly committed a patch for labels to work, I couldn't find a SNAPSHOT jar to download, so I had to clone the Git project and build it myself. Not that it's an awful lot of work, but let's just say that in the place I work that would be a total no-go.

That aside, there is a library meant to work well with Cypher, querydsl, which should make my queries even nicer. No-brainer, huh? Well, yes, except that it didn't even pretend to work with my LocalDate fields and generated code that wouldn't compile. Back to Strings. And back to the lack of decent documentation on how to marry this with Cypher.

And the journey continues

And this is where I am at the moment. All in all, the journey so far has been a lot of fun, and Neo4j is starting to feel like an enterprise solution - sadly, the surrounding ecosystem can't quite keep up. There are things that I take for granted in a mature technology: any scenario is covered somewhere, just ask Google, or ask a question on StackOverflow and it shall be answered. There are so many books and articles and examples for pretty much anything that, whatever you come up with, someone has done it before.
Here, problems often turn into a lot of debugging and gluing together little pieces of information spread across the web. It's not necessarily a massive problem; after all, I'm a smart and experienced cookie ;) I know how to use source code and Google (as limited as the resources are, they are somewhere out there, and there are helpful people too). I wouldn't want all this to sound like complaining - I realize a lot of that work is done by passionate people in their spare time, and only some of them actually get paid for it. So it's more of a reflection on and observation of the current state, and not something that is a problem for me. Having said that, I would hesitate a bit before throwing Neo4j at a team of newbies (AKA n00bs), especially on a project with relatively fixed deadlines (as opposed to my leisurely pace of 'whenever').

Or perhaps it's an opportunity to become an expert in a new, exciting technology before everybody starts using it :)

(*) Allegedly attributed to the US government.