Friday, 4 July 2014

Indexing of fulltext properties from cypher and unique relationships

Why are you hiding?

Issues from the previous post aside, I needed to import some data from CSV. That went surprisingly painless (well, the first part anyway...) - but despite having an index on one of the fields in the class, after some testing I realized that I couldn't find my entities by that field.

My class mapping looked something like:

public class City extends GraphNode {

    @Indexed(indexType = IndexType.FULLTEXT, indexName = "locations")
    private String name;
...
}

My cypher import (note the extra labels for SDN):

String cypher = "LOAD CSV WITH HEADERS FROM \"" + fileLocation + "\" AS csvLine "
+ "MERGE (country:Country:_Country { name: csvLine.Country } ) "
+ "MERGE (city:City:_City { name: csvLine.City } ) "
+ "MERGE (city) - [:IS_IN] -> (country) "
+ "MERGE (airport:Airport:_Airport {name: csvLine.Airport, iataCode: csvLine.IATAcode, icaoCode: csvLine.ICAOcode} ) "
+ "MERGE (airport) - [:SERVES {__type__: 'AirportCityConnection'}] -> (city) "

SDN repository I used for testing:

public interface CityRepository extends GraphRepository {

    Page findByNameLike(String name, Pageable page);
    
    List findByName(String cityName);
}

I had a test for the repository and the lookup worked fine when data was inserted via SDN but not with my CSV Cypher import. With the help of brilliant Michael Hunger I managed to find a reason and workaround. For details of why the next step is needed check Michael's explanation, if all you want is to make it work, for now you'll need to do something like this:

String cypher = "LOAD CSV WITH HEADERS FROM \"" + fileLocation + "\" AS csvLine "
+ "MERGE (country:Country:_Country { name: csvLine.Country } ) "
+ "MERGE (city:City:_City { name: csvLine.City } ) "
+ "MERGE (city) - [:IS_IN] -> (country) "
+ "MERGE (airport:Airport:_Airport {name: csvLine.Airport, iataCode: csvLine.IATAcode, icaoCode: csvLine.ICAOcode} ) "
+ "MERGE (airport) - [:SERVES {__type__: 'AirportCityConnection'}] -> (city) "
+ "RETURN city";
Result cities = neo4jTemplate.query(cypher, ImmutableMap.of()).to(Node.class);
Index index = db.index().forNodes("locations");
for (Node city : cities) {
    String location = (String) city.getProperty("name");
    index.remove(city);
    index.add(city, "name", location);
}

Modelling marriage relationship

Well, in all honesty I was actually modelling a flight schedule but the same principle applies - you cannot fly out from two airports at the same time on the same flight number. Yet (with the help of Excel autocomplete feature, which changed flight code QF1 into QF10...) I managed to create data that implied that this can actually happen. My SDN model was not taking this situation into account and cried not-so-silently when I tried to retrieve the data from Neo4j

java.lang.IllegalArgumentException: Cannot obtain single field value for field 'to'
 at org.springframework.data.neo4j.fieldaccess.RelatedToSingleFieldAccessorFactory$RelatedToSingleFieldAccessor.getValue(RelatedToSingleFieldAccessorFactory.java:94)
 at org.springframework.data.neo4j.fieldaccess.DefaultEntityState.getValue(DefaultEntityState.java:97)

So a quick tip is - if you want to avoid bigamy in your database (and this exception) - make sure you're not making node A married to B and C at the same time.

Stay tuned for more Neo4j drama. :)

Saturday, 28 June 2014

First impressions of Neo4j

I have always had interest in big data - much more than, say, low latency processing. I never wanted to optimize the code for every processor cycle, or every bit sent across the wire. I appreciate people who are capable of doing that, it's just never been something I was passionate about. It's too... low level. Too C ;). I want abstraction, business domain - and somehow, in particular I get the buzz from the idea of having and processing loads and loads of information.

In my career so far I was lucky enough to have more or less exposure to technologies such as Gigaspaces, Cassandra and Attivio, and even though I've heard about neo4j and briefly touched it quite a while ago, it wasn't until very recently that I properly picked it up for a personal project (which will turn into the next Facebook, obviously... ) ;).

I started with an awful lot of enthusiasm, only to be reminded that picking up a new toy can be painful sometimes - especially when the toy is still very much 'work in progress'. :) Below is a short summary of my journey so far.

Hello, world!

As a typical geek, I downloaded the latest and greatest stable versions of everything involved - at that time it was Neo4j 2.1 (2.1.1 or 2.1.2, can't remember) and Spring Data Neo4j 3.1.0. I read a book about graph databases from O'Reilly (very good BTW), got another one for Spring Data (joys of Safari), and well equipped started my 'hello world' level app. What could possibly go wrong, right? :)

Well, only a startup issue, throwing the following exception:

Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'currencyRepository': Cannot resolve reference to bean 'neo4jTemplate' while setting bean property 'neo4jTemplate'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'neo4jTemplate' defined in class org.springframework.data.neo4j.config.Neo4jConfiguration: Instantiation of bean failed; nested exception is org.springframework.beans.factory.BeanDefinitionStoreException: Factory method [public org.springframework.data.neo4j.support.Neo4jTemplate org.springframework.data.neo4j.config.Neo4jConfiguration.neo4jTemplate() throws java.lang.Exception] threw exception; nested exception is java.lang.NoClassDefFoundError: org/neo4j/kernel/impl/transaction/LockException
 at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveReference(BeanDefinitionValueResolver.java:336)
 [...]

Googled a bit, didn't find an awful lot except for a logged jira suggesting that the versions I got are not compatible. Fine, I'll go with the SDN 3.2.0-SNAPSHOT, they surely would fix something like this? Well... no.
Okay, still with plenty of enthusiasm I decided to downgrade to neo4j 2.0.3. Things started fine. Nice.

Hot like a model. But not unique.

I modeled a couple of domain objects. Although neo4j only supports primitives and strings, SDN is meant to do automatic conversion. Sweet! Made one of my fields an enum, annotated it as indexed and unique - doesn't work :( Crashes on a save to neo4j saying that type is incompatible:

org.springframework.dao.InvalidDataAccessResourceUsageException: Error executing statement MERGE (n:`Weekday` {`weekdayCode`: {value}}) ON CREATE SET n={props} SET n:GraphNode  return n; nested exception is org.springframework.dao.InvalidDataAccessResourceUsageException: Error executing statement MERGE (n:`Weekday` {`weekdayCode`: {value}}) ON CREATE SET n={props} SET n:GraphNode  return n; nested exception is java.lang.IllegalArgumentException: [MONDAY:java.time.DayOfWeek] is not a supported property value
 at org.springframework.data.neo4j.support.query.CypherQueryEngineImpl.query(CypherQueryEngineImpl.java:61)
 at [...]

Well, not surprising - I can clearly see that it sends an enum. But wasn't it meant to do translation for me? Boohoo. Took me some time to figure out that things get messed up when you add the uniqueness on a non-string/primitive field. Asked a question about it on StackOverflow about it, got no answers. Fine. Enum. Unique. Pick one. Let's go back to String then...

Oh, and let's not forget that the uniqueness in SDN is implemented as an update, not an exception. Could easily trip on this one, too - and the number of questions on SO and elsewhere about it seems to imply it's not just me expecting a crash rather than silent modification of the data.

All your data are belong to us (*)

Having the super simple model it's time to get some data in. New neo4j has the neat CSV import. Desperate to use it I decided to hack the SDN and fix the class that was failing on class not found. Surprisingly, it worked (and later I learnt SDN team finally released a SNAPSHOT with a fix so switched to that instead). So now on neo4j 2.1.2, SDN 3.2.0-SNAPSHOT I managed to import some data and create nodes that should be read by SDN. No joy first time round, SDN doesn't find it and crashes with bizarre errors.

 
java.lang.IllegalStateException: No primary SDN label exists .. (i.e one starting with _) 
 at org.springframework.data.neo4j.support.typerepresentation.LabelBasedNodeTypeRepresentationStrategy.readAliasFrom(LabelBasedNodeTypeRepresentationStrategy.java:126)
 at org.springframework.data.neo4j.support.typerepresentation.LabelBasedNodeTypeRepresentationStrategy.readAliasFrom(LabelBasedNodeTypeRepresentationStrategy.java:39)

org.neo4j.graphdb.NotFoundException: No such property, '__type__'.
 at org.neo4j.kernel.impl.core.RelationshipProxy.getProperty(RelationshipProxy.java:189)
 at org.springframework.data.neo4j.support.typerepresentation.AbstractIndexBasedTypeRepresentationStrategy.readAliasFrom(AbstractIndexBasedTypeRepresentationStrategy.java:126)

It turned out SDN labels the nodes with 2 labels, not just one as I expected (for a class Foo I thought I only need label Foo, turns out both Foo and _Foo need to be present). For relationships I also needed an extra property added.

Everything is cypher-rific.

Having data in, basic mapping and basic repos, time for some Cypher fun. After a couple of classic PEBKAC issues I got a few more sophisticated queries working. All of them were basically copy-pasted from the lovely neo4j browser tool and worked just fine - but one problem with that. What about refactoring? What about more dynamic queries? Well, Neo4j has the dsl library which allows for dynamic query creation, clearly that should be better, shouldn't it?
Only finding any documentation for it, except for mentions here and there 'it exists and works with querydsl' proved a non-trivial task. The best doc I found? The project's Github repository junit test cases (I honestly cannot count a handful of rather dated articles here and there, with 1-2 simple examples each). Call me old fashioned, but that doesn't make me feel like it's a mature, stable and well supported tool :) Also, although someone kindly committed a patch for labels to work, I couldn't find a SNAPSHOT jar to download so had to clone Git project and build it myself. Not that it's an awful lot of work but let's just say that in the place I work that would be a total no-go.

That aside, there is a library meant to work well with cypher, querydsl, which should make my queries even nicer. No-brainer, huh? Well, yes, except that it didn't even pretend to work with my LocalDate fields and produced code that wouldn't even compile. Back to Strings. And back to lack of decent documentation how to marry this and cypher.

And the journey continues

And this is where I am at the moment. All in all, the journey so far was a lot of fun and neo4j starts to get the feel of an enterprise solution - sadly, the surrounding eco system can't quite catch up. There are things that I take for granted in a mature technology - any scenario covered somewhere, just ask Google, ask a question on stackoverflow and it shall be answered. There are so many books and articles and examples for pretty much anything, whatever you come up with - someone has done it before.
Here problems often turn into a lot of debugging and glueing together little pieces of information spread across the web. It's not necessarily a massive problem, after all I'm a smart and experienced cookie ;) I know how to use source code and google (as limited as the resources are, they are somewhere out there, and there are helpful people too). I wouldn't want all this to sound like complaining - I realize a lot of that work is done in spare time that passionate people put in, and only some of them actually get paid for this. So it's more of a reflection and observation of the current state, and not something that is a problem for me. Having said that, I would hesitate a bit before throwing Neo4j at a team of newbies (AKA n00bs), especially if it was a project with relatively fixed deadlines (as opposed to my leisurely pace of 'whenever').

Or perhaps it's an opportunity to become an expert in a new, exciting technology before everybody starts using it :)

(*) Allegedly attributed to US government.

Thursday, 6 February 2014

Synology 412+ - first impressions

For some time now I've been planning to buy an extra machine as a server - for local storage but also to do some more serious development and testing. My current laptop is starting to show its age (or I shall rather say: is on its last legs...) and whilst during pure development it's not bad enough for me to go through the hassle of picking a new one, I decided I could use something extra - just to run some dev tools, an app or two, or a db server, or a little cloud, or whatever it is I'm testing. On top of that I needed a place to store the code - I do not necessarily want to put it all on public github. And some space to store the data (if/when I start playing with things like Neo4j). And a place for the company website (tried google sites but honestly it doesn't seem to be quite what I was looking for). And my hard drive started to make funny noises and I got worried that one day I'll lose all my data.
Theoretically I could have bought a private github account, space in the cloud and what not but I wanted something that will grow with me, and something entirely under my control. Plus, cloud is all good if you have a handful of gigabytes of data but anything beyond and things just get increeeeeeedibly slow to backup.

Finally after some research I decided to buy this little baby: https://www.synology.com/en-us/products/overview/DS412+ . It seemed a bit expensive but after having had it for a few days I think it will turn out to be worth every penny. I am getting more and more excited about it - it's basically a unix server, with full root access, with heaps of inbuilt stuff (NAS being a major one of course). It even has a git server, JDK and Tomcat. Linux on it is rather basic, but I don't think I'll need it much, I will just add maven to the mix and that should be enough for my needs. For now :)

Sunday, 25 July 2010

Java Decompiler for Eclipse

Recently I had to install a decompiler plugin for Eclipse at work and had to spend some time to actually find it as JadClipse sourceforge only points to version 3.3.0 of the plugin which is rather old. In February 2009 a new version - 3.4.0 - has been released, which also changed the name of the plugin. The update site that works also with Eclipse 3.6.0 is http://jadclipse.sf.net/update .

Friday, 4 June 2010

Perforce and Subversion - a short story of a new-found love

Before my previous project I used CVS and SVN and obviously preferred SVN. In my eyes it wasn't maybe the best thing ever, but I was reasonably confident user, knew how to merge and, more importantly, I knew how to force it to cooperate when I screwed my local .svn folders. I liked it.

First few weeks after joining, working with Perforce was a major pain. Ideas of client specs, changelists, "opening for edit", creating branches etc. - all that was new, and not exactly friendly for ex-svn user. I simply didn't know how to use it. I complained a lot, I was completely confused (and rather complicated build process around it specific to the project didn't help at all).

However, recently I switched jobs, and after first shock of learning that we're still using CVS, I realized that the migration I'd love to see is not to SVN. It's to perforce.

What has changed? I have become much more comfortable with P4 and I realized that it really supports productivity and is more powerful than SVN. Bear in mind that last time I used SVN was not this recently and some of the things I'll mention here may be in SVN - but I don't think they were there when I was using it.

The first great strength of P4, especially when working in a really big team and not much of code ownership (i.e. everybody touches everything) is merging. The fact that P4 can track exactly which changes were already integrated (merged) and which not makes it sooo easy to work on branch/trunk and only merge as and when necessary. And unless there are conflicts the merge task of quite big branch can be literally 2 mins. And - again - because P4 tracks what you already merged and what not, you could merge every day and barely notice you're working on a branch. you can bring changes between branch and trunk back and forth and unless you're (rather stupidly) trying to manually merge files - it's absolutely unbeatable. Of course working on branches will always give some overhead - but I feel that managing this with P4 is way easier than with SVN.

I also loved 3-way diffs (and even though I'm sure that could be done on every version control system, somehow I first saw it when using P4).

I loved changelists. I now can barely imagine living without them in SVN. Perhaps it's because I sometimes have some "side-work" that is of very low priority, but being able to keep it on a seperate changelist until I'm (finally) able to commit is really nice. I don't have to check with each commit what files I want to submit now and which ones later - or not at all. BTW latest P4 has idea of "shelving changes" so there isn't even a worry that if the machine dies, you lose your work.

I absolutely love the fact that it doesn't store any additional information in my code folders and that it's so easy to revert if I accidentally remove a directory.

Despite my initial reservations about "open for edit" sometimes it actually was really useful - for example before doing a major refactoring and deleting/renaming some classes I could check if by any chance someone else is working on them and talk to them, maybe postpone my refactoring a bit so that we don't have to merge. It's not a lock, multiple people can have a file open for edit - it's just a marker. The only drawback is that the first time you start editing a file Eclipse would hang for a fraction of second while opening the file for edit - but you can really live with that.

Absolutely fantastic support for viewing history, with my favourite "time line view" which is basically "blame on steroids". You can see whole history (or selected range) of the file with deleted lines stroked out, additions and modifications clearly marked, with author names alongside, and easy access to details of the commit (when, comment etc.).

What I didn't like?

That you need to remember to explicitly add files for add - in theory you have to do that in svn as well but somehow SVN clients are always much better in detecting new files. The solution here is to make Eclipse always open for Add when you create a file

Eclipse plugin for P4 is nowhere near SVN plugin - but to be honest, it didn't take me very long to get used to having P4Win constantly open on second screen and I didn't miss direct Eclipse integration this much. It would have been much worse if I only had something like Tortoise.

Creating of labels and branches could be easier. Once you grasp the concept and understand how it works - the procedure is not this bad, still I think it could be a bit more user-friendly (perhaps all it needs is a nicer wizard in P4Win!).

Overall, I'd say that yes, perforce has a learning curve (in my case from "omg, wtf is that?!" to "it's actually kind of nice" it was about 2 months), but now given choice I'd go for Perforce without a second of hesitation. Unfortunately the fact that is not free (or perhaps should say: is very expensive) doesn't help - but I just found out that you can get Perforce for free for 2 devs - so you can always try it.

I hope that in near future I'll get a chance to try out Git which gets a lot of praise - however, I'm slightly worried by the lack of something in the class of P4Win/TortoiseSVN. Perhaps it's just a matter of time.

Sunday, 2 May 2010

Useful Eclipse plugins (1) - Grep console

I hope this is just the first in series of posts regarding useful Eclipse plugins - there are so many of them. Quite a lot are widely known - here I want to concentrate on the ones that I wasn't aware of for quite a long time.

Grep Console is one of them - a very useful plugin as, like in my current project, the amount of log output is enormous. It's easy to miss what you're looking for... well, not any more.

Here are my settings for this plugin (unfortunately the line with settings cannot be split!):


#Fri Jan 22 16:46:53 GMT 2010
eclipse.preferences.version=1
settings=<?xml version\="1.0" encoding\="UTF-8" standalone\="no"?><expressions><expression grepExpression\="^(.*Exception\:|.*Caused by\:|\\s*at).*$" name\="EXCEPTION"><style background\="ff80c0" foreground\="000000"/><style/></expression><expression grepExpression\=".*ERROR.*" name\="ERROR"><style background\="ff0000"/></expression><expression grepExpression\=".*WARN.*" name\="WARNING"><style background\="ff8040"/></expression><expression enabled\="false" grepExpression\="^.*INFO .*$" name\="INFO"><style foreground\="808080"/></expression><expression grepExpression\="^.*DEBUG.*$" name\="DEBUG"><style foreground\="c0c0c0"/></expression><expression enabled\="false" grepExpression\="^[^\\d].*$" name\="NONLOG"><style foreground\="c0c0c0"/></expression><expression grepExpression\="^\\t.*$" name\="TAB"><style foreground\="000000"/></expression><expression grepExpression\="^Hibernate.*$" name\="HIBERNATE"><style foreground\="c0c0c0"/></expression></expressions>

It needs to be placed in %ECLIPSE_WORKSPACE%\.metadata\.plugins\org.eclipse.core.runtime\com.musgit.eclipse.grepconsole.prefs file

Saturday, 20 February 2010

Google collections ordering and filtering

Recently on my project I faced a task that could be simplified to a following problem:

I have a list of objects of class Foo, which has a field bar which is a String. I get a list of Strings X and the task is:

a) from the set of objects Foo I need to filter out anything for which the bar field is not equal to one of the Strings on the list X

b) the order of objects Foo according to their bar value must match the order of Strings on the list X

Google Collections came to the rescue allowing me to write a nice and concise (and bug-free!) code in a matter of minutes. Here is how it looked like:


import java.util.Collection;
import java.util.List;

import com.google.common.base.Function;
import com.google.common.base.Predicate;
import com.google.common.collect.Collections2;
import com.google.common.collect.Ordering;

public class Bazz {

 private final static class Foo {
  private String bar;

  public String getBar() {
   return bar;
  }
 }

 /**
  * @param bars
  *            Reference bar List to define filter and sorting
  * @param foos
  *            List of object that needs to be filtered and sorted
  * @return Filtered and sorted Foos
  */
 public List filterAndSort(final List bars, final List foos) {

  Collection filtered = Collections2.filter(foos, new Predicate() {
   public boolean apply(Foo foo) {
    return bars.contains(foo.getBar());
   }
  });
  Ordering fooOrdering = Ordering.explicit(bars).onResultOf(new Function() {
   public String apply(Foo foo) {
    return foo.getBar();
   }
  });
  return fooOrdering.sortedCopy(filtered);
 }
}

The only thing I'm currently wondering about is how would it look in a functional language like Scala for which the book is already in the reading queue...

Geekette's playground