Friday, 4 July 2014

Indexing of fulltext properties from cypher and unique relationships

Why are you hiding?

Issues from the previous post aside, I needed to import some data from CSV. That went surprisingly painlessly (well, the first part anyway...) - but despite having an index on one of the fields in the class, after some testing I realized that I couldn't find my entities by that field.

My class mapping looked something like:

public class City extends GraphNode {

    @Indexed(indexType = IndexType.FULLTEXT, indexName = "locations")
    private String name;
...
}

My cypher import (note the extra labels for SDN):

String cypher = "LOAD CSV WITH HEADERS FROM \"" + fileLocation + "\" AS csvLine "
+ "MERGE (country:Country:_Country { name: csvLine.Country } ) "
+ "MERGE (city:City:_City { name: csvLine.City } ) "
+ "MERGE (city) - [:IS_IN] -> (country) "
+ "MERGE (airport:Airport:_Airport {name: csvLine.Airport, iataCode: csvLine.IATAcode, icaoCode: csvLine.ICAOcode} ) "
+ "MERGE (airport) - [:SERVES {__type__: 'AirportCityConnection'}] -> (city)";

SDN repository I used for testing:

public interface CityRepository extends GraphRepository<City> {

    Page<City> findByNameLike(String name, Pageable page);

    List<City> findByName(String cityName);
}

I had a test for the repository, and the lookup worked fine when data was inserted via SDN but not with my CSV Cypher import. With the help of the brilliant Michael Hunger I managed to find the reason and a workaround. For details of why the next step is needed, check Michael's explanation; if all you want is to make it work, for now you'll need to do something like this:

String cypher = "LOAD CSV WITH HEADERS FROM \"" + fileLocation + "\" AS csvLine "
+ "MERGE (country:Country:_Country { name: csvLine.Country } ) "
+ "MERGE (city:City:_City { name: csvLine.City } ) "
+ "MERGE (city) - [:IS_IN] -> (country) "
+ "MERGE (airport:Airport:_Airport {name: csvLine.Airport, iataCode: csvLine.IATAcode, icaoCode: csvLine.ICAOcode} ) "
+ "MERGE (airport) - [:SERVES {__type__: 'AirportCityConnection'}] -> (city) "
+ "RETURN city";
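// Run the import and get the merged City nodes back as raw Neo4j nodes.
Result<Node> cities = neo4jTemplate.query(cypher, ImmutableMap.of()).to(Node.class);
Index<Node> index = db.index().forNodes("locations");
for (Node city : cities) {
    String location = (String) city.getProperty("name");
    // Re-add each node to the "locations" index, which the Cypher import did not populate.
    index.remove(city);
    index.add(city, "name", location);
}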

Modelling marriage relationship

Well, in all honesty I was actually modelling a flight schedule, but the same principle applies - you cannot fly out from two airports at the same time on the same flight number. Yet (with the help of Excel's autocomplete feature, which changed flight code QF1 into QF10...) I managed to create data implying that this can actually happen. My SDN model did not take this situation into account and cried not-so-silently when I tried to retrieve the data from Neo4j:

java.lang.IllegalArgumentException: Cannot obtain single field value for field 'to'
 at org.springframework.data.neo4j.fieldaccess.RelatedToSingleFieldAccessorFactory$RelatedToSingleFieldAccessor.getValue(RelatedToSingleFieldAccessorFactory.java:94)
 at org.springframework.data.neo4j.fieldaccess.DefaultEntityState.getValue(DefaultEntityState.java:97)

So a quick tip is - if you want to avoid bigamy in your database (and this exception) - make sure you're not making node A married to B and C at the same time.
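A sanity check on the data before import could look something like the sketch below. It is plain Java with nothing SDN-specific in it; the class name, the row layout (flight number first, departure airport second) and the sample airport codes are all my own assumptions for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical pre-import check: finds flight numbers that depart from more than one airport. */
final class FlightDataCheck {

    private FlightDataCheck() {}

    /**
     * Each row is assumed to be {flightNumber, originAirport}.
     * Returns only the flight numbers seen with two or more distinct origins.
     */
    static Map<String, Set<String>> conflictingOrigins(List<String[]> rows) {
        Map<String, Set<String>> originsByFlight = new LinkedHashMap<>();
        for (String[] row : rows) {
            originsByFlight.computeIfAbsent(row[0], k -> new LinkedHashSet<>()).add(row[1]);
        }
        // Drop every flight that has a single origin; what remains is the "bigamy".
        originsByFlight.values().removeIf(origins -> origins.size() < 2);
        return originsByFlight;
    }

    public static void main(String[] args) {
        // Made-up sample rows: the second QF10 row stands in for a mangled QF1.
        List<String[]> rows = Arrays.asList(
                new String[] {"QF1", "SYD"},
                new String[] {"QF10", "SYD"},
                new String[] {"QF10", "MEL"});
        System.out.println(conflictingOrigins(rows).keySet());
    }
}
```

Anything this returns is a row worth eyeballing before it turns into a second outgoing relationship on a field that SDN expects to hold a single value.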

Stay tuned for more Neo4j drama. :)

Saturday, 28 June 2014

First impressions of Neo4j

I have always had an interest in big data - much more than, say, low latency processing. I never wanted to optimize the code for every processor cycle, or every bit sent across the wire. I appreciate people who are capable of doing that; it's just never been something I was passionate about. It's too... low level. Too C ;). I want abstraction, business domain - and somehow, in particular, I get a buzz from the idea of having and processing loads and loads of information.

In my career so far I have been lucky enough to have more or less exposure to technologies such as Gigaspaces, Cassandra and Attivio, and even though I'd heard about Neo4j and briefly touched it quite a while ago, it wasn't until very recently that I properly picked it up for a personal project (which will turn into the next Facebook, obviously... ) ;).

I started with an awful lot of enthusiasm, only to be reminded that picking up a new toy can be painful sometimes - especially when the toy is still very much 'work in progress'. :) Below is a short summary of my journey so far.

Hello, world!

As a typical geek, I downloaded the latest and greatest stable versions of everything involved - at that time it was Neo4j 2.1 (2.1.1 or 2.1.2, can't remember) and Spring Data Neo4j 3.1.0. I read a book about graph databases from O'Reilly (very good BTW), got another one for Spring Data (joys of Safari), and well equipped started my 'hello world' level app. What could possibly go wrong, right? :)

Well, only a startup issue, throwing the following exception:

Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'currencyRepository': Cannot resolve reference to bean 'neo4jTemplate' while setting bean property 'neo4jTemplate'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'neo4jTemplate' defined in class org.springframework.data.neo4j.config.Neo4jConfiguration: Instantiation of bean failed; nested exception is org.springframework.beans.factory.BeanDefinitionStoreException: Factory method [public org.springframework.data.neo4j.support.Neo4jTemplate org.springframework.data.neo4j.config.Neo4jConfiguration.neo4jTemplate() throws java.lang.Exception] threw exception; nested exception is java.lang.NoClassDefFoundError: org/neo4j/kernel/impl/transaction/LockException
 at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveReference(BeanDefinitionValueResolver.java:336)
 [...]

Googled a bit, didn't find an awful lot except for a logged JIRA issue suggesting that the versions I had were not compatible. Fine, I'll go with SDN 3.2.0-SNAPSHOT, surely they would have fixed something like this? Well... no.
Okay, still with plenty of enthusiasm I decided to downgrade to Neo4j 2.0.3. Things started fine. Nice.

Hot like a model. But not unique.

I modeled a couple of domain objects. Although Neo4j only supports primitives and strings, SDN is meant to do automatic conversion. Sweet! Made one of my fields an enum, annotated it as indexed and unique - doesn't work :( Crashes on a save to Neo4j saying that the type is incompatible:

org.springframework.dao.InvalidDataAccessResourceUsageException: Error executing statement MERGE (n:`Weekday` {`weekdayCode`: {value}}) ON CREATE SET n={props} SET n:GraphNode  return n; nested exception is org.springframework.dao.InvalidDataAccessResourceUsageException: Error executing statement MERGE (n:`Weekday` {`weekdayCode`: {value}}) ON CREATE SET n={props} SET n:GraphNode  return n; nested exception is java.lang.IllegalArgumentException: [MONDAY:java.time.DayOfWeek] is not a supported property value
 at org.springframework.data.neo4j.support.query.CypherQueryEngineImpl.query(CypherQueryEngineImpl.java:61)
 at [...]

Well, not surprising - I can clearly see that it sends an enum. But wasn't it meant to do translation for me? Boohoo. Took me some time to figure out that things get messed up when you add uniqueness on a non-string/primitive field. Asked a question about it on StackOverflow, got no answers. Fine. Enum. Unique. Pick one. Let's go back to String then...
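Going "back to String" can still leave an enum at the edges of the class. Below is a rough sketch of the idea, stripped of all SDN annotations so that it compiles on its own. The `Weekday` and `weekdayCode` names come from the failing statement above; the constructor and accessors are my assumptions, and in the real entity the unique index annotation would sit on the String field.

```java
import java.time.DayOfWeek;

/** Sketch only: keeps the stored property a plain String and converts at the boundary. */
final class Weekday {

    // Stored as a String, which is a property type Neo4j accepts.
    // In the real entity this is the field that would carry the unique index.
    private final String weekdayCode;

    Weekday(DayOfWeek day) {
        this.weekdayCode = day.name();
    }

    /** The raw value as it would be written to the node. */
    String getWeekdayCode() {
        return weekdayCode;
    }

    /** Convenience view for callers that prefer the enum. */
    DayOfWeek getDay() {
        return DayOfWeek.valueOf(weekdayCode);
    }

    public static void main(String[] args) {
        Weekday monday = new Weekday(DayOfWeek.MONDAY);
        System.out.println(monday.getWeekdayCode() + " <-> " + monday.getDay());
    }
}
```

The rest of the code still gets to work with `DayOfWeek`, while only a String ever reaches the MERGE statement.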

Oh, and let's not forget that the uniqueness in SDN is implemented as an update, not an exception. Could easily trip on this one, too - and the number of questions on SO and elsewhere about it seems to imply it's not just me expecting a crash rather than silent modification of the data.
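To make the difference concrete, here is a plain-Java analogy (no SDN or Neo4j involved, and every name in it is made up) of the two behaviours one might expect from a "unique" field: silently updating the existing entry versus refusing the duplicate.

```java
import java.util.HashMap;
import java.util.Map;

/** Analogy only: contrasts "unique means update" with "unique means fail". */
final class UniquenessSemantics {

    private final Map<String, String> store = new HashMap<>();

    /** Merge-style save: an existing key is silently overwritten. */
    void saveAsUpdate(String key, String value) {
        store.put(key, value);
    }

    /** Constraint-style save: an existing key is left alone and the caller gets an exception. */
    void saveOrFail(String key, String value) {
        if (store.putIfAbsent(key, value) != null) {
            throw new IllegalStateException("Duplicate key: " + key);
        }
    }

    String get(String key) {
        return store.get(key);
    }

    public static void main(String[] args) {
        UniquenessSemantics s = new UniquenessSemantics();
        s.saveAsUpdate("MON", "Monday");
        s.saveAsUpdate("MON", "Montag"); // no complaint, the first value is gone
        System.out.println(s.get("MON"));
    }
}
```

What I observed from SDN matches the first method; what I expected was the second.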

All your data are belong to us (*)

With the super simple model in place, it's time to get some data in. The new Neo4j has the neat CSV import. Desperate to use it, I decided to hack SDN and fix the class that was failing with the class-not-found error. Surprisingly, it worked (and later I learnt the SDN team had finally released a SNAPSHOT with a fix, so I switched to that instead). So now, on Neo4j 2.1.2 and SDN 3.2.0-SNAPSHOT, I managed to import some data and create nodes that should be readable by SDN. No joy first time round: SDN doesn't find the data and crashes with bizarre errors.

 
java.lang.IllegalStateException: No primary SDN label exists .. (i.e one starting with _) 
 at org.springframework.data.neo4j.support.typerepresentation.LabelBasedNodeTypeRepresentationStrategy.readAliasFrom(LabelBasedNodeTypeRepresentationStrategy.java:126)
 at org.springframework.data.neo4j.support.typerepresentation.LabelBasedNodeTypeRepresentationStrategy.readAliasFrom(LabelBasedNodeTypeRepresentationStrategy.java:39)

org.neo4j.graphdb.NotFoundException: No such property, '__type__'.
 at org.neo4j.kernel.impl.core.RelationshipProxy.getProperty(RelationshipProxy.java:189)
 at org.springframework.data.neo4j.support.typerepresentation.AbstractIndexBasedTypeRepresentationStrategy.readAliasFrom(AbstractIndexBasedTypeRepresentationStrategy.java:126)

It turned out SDN labels the nodes with 2 labels, not just one as I expected (for a class Foo I thought I only needed the label Foo; it turns out both Foo and _Foo need to be present). For relationships I also needed an extra property, __type__, added.
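When hand-writing import Cypher, those two conventions can be kept in one place with something like the sketch below. The helper class and its method names are hypothetical; the output format simply mirrors the labels and the `__type__` property used in my import query, and I'm assuming the value SDN expects there is the entity class's simple name, as it was in my case.

```java
/** Hypothetical helper: builds the label and type fragments SDN expected in my hand-written Cypher. */
final class SdnCypherFragments {

    private SdnCypherFragments() {}

    /** For "City" returns ":City:_City", i.e. the plain label plus the underscore-prefixed one. */
    static String nodeLabels(String simpleClassName) {
        return ":" + simpleClassName + ":_" + simpleClassName;
    }

    /** For ("SERVES", "AirportCityConnection") returns ":SERVES {__type__: 'AirportCityConnection'}". */
    static String relationship(String relationshipType, String entityClassName) {
        return ":" + relationshipType + " {__type__: '" + entityClassName + "'}";
    }

    public static void main(String[] args) {
        System.out.println("MERGE (city" + nodeLabels("City") + " { name: csvLine.City } )");
        System.out.println("MERGE (airport) - ["
                + relationship("SERVES", "AirportCityConnection") + "] -> (city)");
    }
}
```

This only saves typing and typos; it does nothing clever, and it would need revisiting if SDN were configured with a different type representation strategy.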

Everything is cypher-rific.

Having data in, basic mapping and basic repos, time for some Cypher fun. After a couple of classic PEBKAC issues I got a few more sophisticated queries working. All of them were basically copy-pasted from the lovely neo4j browser tool and worked just fine - but one problem with that. What about refactoring? What about more dynamic queries? Well, Neo4j has the dsl library which allows for dynamic query creation, clearly that should be better, shouldn't it?
Only, finding any documentation for it (beyond mentions here and there that 'it exists and works with querydsl') proved a non-trivial task. The best doc I found? The JUnit test cases in the project's Github repository (I honestly cannot count a handful of rather dated articles here and there, with 1-2 simple examples each). Call me old fashioned, but that doesn't make me feel like it's a mature, stable and well supported tool :) Also, although someone kindly committed a patch for labels to work, I couldn't find a SNAPSHOT jar to download, so I had to clone the Git project and build it myself. Not that it's an awful lot of work, but let's just say that in the place I work that would be a total no-go.

That aside, there is a library meant to work well with cypher, querydsl, which should make my queries even nicer. No-brainer, huh? Well, yes, except that it didn't even pretend to work with my LocalDate fields and produced code that wouldn't even compile. Back to Strings. And back to the lack of decent documentation on how to marry this and cypher.

And the journey continues

And this is where I am at the moment. All in all, the journey so far has been a lot of fun, and Neo4j is starting to get the feel of an enterprise solution - sadly, the surrounding ecosystem can't quite catch up. There are things that I take for granted in a mature technology - any scenario covered somewhere, just ask Google, ask a question on stackoverflow and it shall be answered. There are so many books and articles and examples for pretty much anything, whatever you come up with - someone has done it before.
Here problems often turn into a lot of debugging and gluing together little pieces of information spread across the web. It's not necessarily a massive problem; after all, I'm a smart and experienced cookie ;) I know how to use source code and google (as limited as the resources are, they are somewhere out there, and there are helpful people too). I wouldn't want all this to sound like complaining - I realize a lot of that work is done in spare time that passionate people put in, and only some of them actually get paid for this. So it's more of a reflection and observation of the current state, and not something that is a problem for me. Having said that, I would hesitate a bit before throwing Neo4j at a team of newbies (AKA n00bs), especially if it was a project with relatively fixed deadlines (as opposed to my leisurely pace of 'whenever').

Or perhaps it's an opportunity to become an expert in a new, exciting technology before everybody starts using it :)

(*) Allegedly attributed to US government.

Thursday, 6 February 2014

Synology 412+ - first impressions

For some time now I've been planning to buy an extra machine as a server - for local storage but also to do some more serious development and testing. My current laptop is starting to show its age (or should I rather say: is on its last legs...) and whilst during pure development it's not bad enough for me to go through the hassle of picking a new one, I decided I could use something extra - just to run some dev tools, an app or two, or a db server, or a little cloud, or whatever it is I'm testing. On top of that I needed a place to store the code - I do not necessarily want to put it all on public github. And some space to store the data (if/when I start playing with things like Neo4j). And a place for the company website (tried google sites but honestly it doesn't seem to be quite what I was looking for). And my hard drive started to make funny noises and I got worried that one day I'll lose all my data.
Theoretically I could have bought a private github account, space in the cloud and what not, but I wanted something that will grow with me, and something entirely under my control. Plus, cloud is all good if you have a handful of gigabytes of data, but anything beyond that and things just get increeeeeeedibly slow to back up.

Finally after some research I decided to buy this little baby: https://www.synology.com/en-us/products/overview/DS412+ . It seemed a bit expensive but after having had it for a few days I think it will turn out to be worth every penny. I am getting more and more excited about it - it's basically a unix server, with full root access, with heaps of inbuilt stuff (NAS being a major one of course). It even has a git server, JDK and Tomcat. Linux on it is rather basic, but I don't think I'll need it much, I will just add maven to the mix and that should be enough for my needs. For now :)