Sunday, 11 December 2016

Yet Another "Tech Interview Process is Broken" post

Recently I came across a few articles on the web - people writing about the process in general, or about their own (nightmarish) experiences.
The thing that tipped me into actually writing this post was somebody describing their interview for Amazon. I read it, and "WTF" would be a rather mild description of what was going on in my head. I actually admire this person's patience, as I don't think I'd have gone beyond the "please give me full control over your machine", and most definitely not past the "take all your written materials off your desk, and you're not allowed to use pen and paper". The privacy aspect of this experience is one thing, covered there quite well, and I'm not going to write about it here. In my world it would be unacceptable, period. But what also struck me was the reason why Amazon considered such extensive control necessary.

Before I go any further with this, let's check if we agree on one primary thing. What is the purpose of an interview? Is it
a) to show the interviewer's superiority?
b) to test someone's IQ?
c) to check resilience to stress and/or the ability to work in unfamiliar circumstances?
or
d) to check that someone can perform the job that we're hiring for?

Now, my answer to this is most definitely d), and obviously anybody selecting another option will most likely not agree with the rest of the reasoning here, so please feel free to drop off the call now.

Quick show of hands. Do you think employees at Amazon (or any other company running similar tests), during their daily job, can:
a) access the internet (you know, google, wikipedia, stackoverflow, the lot...)?
b) use pen and paper?
c) use any other written materials, such as books, personal notes etc.?
Why is it then that suddenly, just because it's "an interview for a job", all of that becomes forbidden? How closely does this mimic actual performance?
I mean, seriously. Imagine for a moment that you had to spend a week in your current role with internet access, all written materials AND your personal notepad taken away. How productive would you be?

Being a good developer is not (just) about knowledge. It's not about how much you remember, or how many books you can almost quote off the top of your head. It's about how you apply the knowledge in practice, and how well you manage to deal with things that maybe you haven't come across before.

We're no longer school kids. The testing surely can move on from the idea that "unless it's in your head it doesn't exist". A vast majority of what we do is about how quickly, how efficiently, how confidently we can use the resources at hand. When I go to google with a problem - how good is my query? How quickly can I filter out the unrelated stuff from what's useful? How well can I translate a solution from SO to my particular circumstances? How well do I understand the root cause of the problem (rather than just being able to copy-paste some duct-tape solution around it)? Am I able to isolate the problem in a more complex situation? And so on and so on and so on. Why is it suddenly considered "cheating" if I do what I always do?

Software development is now so complex, so vast an area, that it really is impossible for anybody to remember it all. So why try? Why make the test focus on our "in memory resources"? Yes, I know, the problems that are given are meant to be simple enough that you shouldn't have to use external resources (whether that actually is the case is a topic for a different conversation). But that only tests that candidates can use what's in their head. Where is your test that they can efficiently use what's out there - which, no doubt, they will have to do on a day-to-day basis?
Ironically, I think that taking away internet access tests LESS how well a person is prepared for the job than if you allow everything and just watch how the person uses the power given to them. If I got a pound every time I had to send a colleague to lmgtfy.com I'd be... well, maybe not rich :), but it certainly would pay for a pint or two.

It always reminds me of one exam we had during my university days. Unlike most other professors, this chap allowed us to bring absolutely anything and everything to the exam - books, notes, slides from lectures. Anything you thought was useful. (Communication between students, and thus the internet, wasn't allowed, but that's rather obvious.) You'd think that everybody came out of this exam with a straight A, but that was definitely not the case. In fact, it was probably one of the more difficult ones, and a lot of people still failed - but I absolutely loved it. It was timed, and you needed to have an idea of what you were doing - but if you forgot this or that detail, you could just look it up. It was the essence of testing how you use your knowledge.

It also reminds me of a fairly simple coding exercise we send to candidates for our India team. If you know what you're doing, it takes 20-30 minutes to write it all - tested on colleagues at work. Candidates can complete it at a time of their choosing, spend as much time as they want, and use any resources they want. You'd think it's a pointless test and we'd only get perfect solutions. The actual statistics?
- probably 30-40% don't work at all - they don't compile, aren't sent in a project (mvn/intellij or eclipse) format despite explicit instructions, don't correctly implement the requirements, or are not remotely thread safe
- roughly as many again sort of work but come with a list of "sins" - no unit tests (again, despite explicit instructions that they must be provided), "tests" that are main methods rather than tests with assertions (see the sketch after this list), strange language constructs/coding style, or effectively single-threaded solutions even though the instructions call for an "efficient" solution and stress the multi-threaded aspect of the task. Even asking "make sure you don't send any binaries" (because our company firewall blocks such attachments) is too much to ask
- that leaves us with maybe 10-20% of solutions where there is any point talking to the candidate, at which point a couple of "why did you solve it this way" questions reveal whether they wrote it themselves.
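
To make the "tests that are not tests" sin concrete, here is a minimal sketch of the difference (the Calculator class is hypothetical, but the pattern is exactly what we see):

import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class CalculatorTest {

    // a real test: JUnit runs it and fails loudly on a wrong result
    @Test
    public void shouldAddTwoNumbers() {
        assertEquals(5, new Calculator().add(2, 3));
    }

    // what we often receive instead: prints something and always "passes"
    public static void main(String[] args) {
        System.out.println(new Calculator().add(2, 3));
    }
}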

I honestly don't understand the obsession with "no internet because it's cheating" coming from anybody who works in this industry. It's not cheating. It's part of my job to know how to use the internet well.

Does it make it more difficult to prepare a task whose solution cannot simply be copy-pasted (and would thus tell you nothing)? Well, of course it does. You can no longer ask the candidate to implement "check if a number is prime". But banning the usual coding resources just because you're too lazy to prepare a better task is no excuse.
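(For illustration, this is the level of task I mean - the first search hit hands the candidate something equivalent to:

public static boolean isPrime(int n) {
    if (n < 2) return false;
    // trial division up to the square root is enough
    for (int i = 2; i * i <= n; i++) {
        if (n % i == 0) return false;
    }
    return true;
}

which tells you precisely nothing about them.)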
So sorry Amazon (and Facebook, and Google, and...). You will not see my CV on that pile you get every day.

Friday, 20 November 2015

The abstraction illusion

It's been an awfully long time since I last posted here about Neo4j. I have actually been using it quite a lot, on and off, on my personal project and doing some prototyping for clients. Sadly that never took off (more due to the political environment of the project than on merit), but I've had some thoughts that I'd like to share here.

Let me start by saying that I absolutely love Neo4j as a database. It has definitely reached the state where you can get a lot for your (metaphorical) buck - i.e. get something working really nicely in a very short space of time. I managed to prepare a simple model, import some CSV data and link it nicely within a couple of hours (and most of that time was getting the data into CSV!). Everybody on the project reacted with a lovely "WOW" when they saw me playing around with the GUI to look into some data patterns.
There were voices that perhaps having nodes bouncing around like little bunnies is a bit too much, but otherwise, a full success.

As long as my queries are 5-liners fitting into the UI, and my data can be imported from CSV, I'm loving it. Things look quite different when I start looking into building an actual, proper project on top of Neo4j - i.e. one where data needs to be updated selectively, and where transactionality, an object model and data retrieval for dynamically built queries all come into play. Basically, if we think of the Neo4j UI app as the equivalent of Embarcadero/SquirrelSQL/Oracle SQL Developer/pick your poison - then what I'm talking about is getting from there to building an actual application. JDBC, Hibernate, the lot.

And I think this is precisely where the problem is. We're trying to re-write JDBC and Hibernate for Neo4j and it just doesn't feel good.

I understand the appeal of "one size fits all". I understand the rationale behind the Spring Data X initiative or creating the JDBC driver. I actually was really excited about it - so much Spring JDBC goodness to reuse! It's lovely to think that we can abstract away from our underlying storage, have a common API for operating on all our data, and swap things underneath. I started by trying to use Spring Data Neo4j (in both incarnations, before and after the re-write). I've tried Neo4j JDBC. I really wanted to love them - but I didn't. With all of them, the more I used them the more frustrated I got. The abstraction was taking away parts of the functionality of Neo4j, or at least making it very awkward and unnatural to work with. It felt almost as if I had to hack it to do what I wanted. So whilst in theory abstraction is great, in practice... Well. It didn't work for me.

My data is not arranged in rows

The whole point of choosing Neo4j over a traditional database is that my data does NOT fit nicely into a tabular world. My data is a graph - and trying to fit it into abstractions that were thought up at a time when data was mostly tabular just doesn't work that well. For NoSQL databases closer to the tabular world (like Cassandra) I suppose it makes a bit more sense - but for something like Neo4j, sorry, it just does not work for me.
The simplest example I can think of: say I have a graph with cities and countries, and a relationship between city and country. I want to fetch all cities in countries X and Y and build an object model where a Country contains its cities. There are 100 cities in country X and 50 in country Y. The "rows" approach of Neo4j JDBC (or an SDN repository) to a query along the lines of

   MATCH (country:Country) -- (city:City) WHERE country.name IN ['X','Y'] RETURN country,city

will mean that I get the details of country X 100 times, and country Y 50 times. Which is 99+49 times too many. It's not wrong - I can get to my object model from that. It's just inefficient. Neo4j itself (even on the REST endpoint) supports a "graph" view as well as a "rows" view (after all, some queries might return more "tabular" data) - but if you're going through an abstraction, the opportunity to choose which one is better for any given query is lost.
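For what it's worth, plain Cypher can avoid the duplication by collecting the cities per country - a sketch along these lines:

   MATCH (country:Country) -- (city:City) WHERE country.name IN ['X','Y'] RETURN country, collect(city) AS cities

Each country then comes back exactly once, with its cities as a list - but that's exactly the kind of shape a rows-first abstraction won't let you pick.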

Cypher is different from SQL

Another thing that really bugged me in SDN/OGM was the lack of proper support for the "MERGE" functionality. One of the things that I absolutely love about cypher is how easy it is to update/enrich things. If I have 3 sources of data that complement each other, it's super easy to combine them into one superset (enrich properties, create missing nodes etc.). I don't need to try to find a matching node (if it exists) first - all this is done for me. SQL doesn't really have a direct equivalent (except perhaps in specific dialects), but that's no reason not to use it with Neo4j. Which sort of brings me to the next point...
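As a sketch of what I mean (the City node and the property names here are purely illustrative):

   MERGE (c:City { name: 'London' })
   ON CREATE SET c.source = 'feed-A'
   ON MATCH SET c.population = 8600000

One statement either creates the node or enriches the existing one - no find-then-decide dance on the client side.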

I don't care about internal ID, I have a business id 

Whilst the support for composite (multi-property) business ids in Neo4j could be better, with a tiny bit of magic (AKA string concatenation) - or if you're lucky and the id is a single property - managing updates is super easy. Sadly, SDN/OGM brings Neo4j's internal ID into the picture, and pretty much all the operations are based on it.
Why is that a problem? I have an externally managed business id (e.g. given by a database, or an externally generated UUID). I process updates to entities, e.g. I get MQ messages with the new state of the entity with a given business id. If I go bare-bones Neo4j with a merge, this is a single, super-simple query. If I go the SDN/OGM route, each message requires me to first fish the entity out of the graph (based on the business id), then update all the properties on it from the received object, and only then can I issue an update. If the object has relationships you have to be really careful about the depth of the fetch, and overall things can get really messy really quickly - I managed to get all my relationships wiped out while trying to update an object's properties, for example... Probably my fault, but it wouldn't have happened if I hadn't been using the "magically" generated queries and had just issued a simple merge with a property update instead.
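To show the difference, here is roughly what the bare-bones route looks like (the Entity label, the Message type and its accessors are hypothetical, and neo4jTemplate is assumed to be injected):

// one round trip per MQ message - merge by business id, then overwrite properties
public void onMessage(Message message) {
    String cypher = "MERGE (e:Entity { businessId: {id} }) SET e += {props}";
    neo4jTemplate.query(cypher, ImmutableMap.<String, Object>of(
            "id", message.getBusinessId(),
            "props", message.getProperties()));
}

No fetching, no depth to worry about - the graph is updated from whatever state the message carries.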

Quo vadis?

I realize that some of the issues I mentioned here can be fixed. However, the point I'm trying to make is that the abstraction we're starting with pushes us towards working with Neo4j in sub-optimal ways. It brings relational database usage patterns into a graph world. We can try to adjust it to this new world, but ultimately it wasn't designed with that in mind and will probably always feel a little bit awkward. It will always push our thinking into a rows-oriented view first, which then (maybe) gets adjusted into a graph view. It creates an illusion that we're working with something familiar - but IMO we're not.
It might be especially dangerous when you try this approach with a team of people who are very familiar and comfortable with the database world, but don't take the time to understand the difference that Neo4j brings. You'll see queries like "MATCH (foo:Foo), (bar:Bar) WHERE foo.id = bar.id" - a cartesian product joined on properties, without a single relationship traversed - and they'll wonder why things are so slow. But it's hard to blame them - if it looks like a database, and it works with Spring JdbcTemplate, shouldn't it behave the same?
Abstractions are nice when we're abstracting from an apple and an orange to a fruit (which is why SQL and JDBC were so successful), but what do you abstract to from an apple and a bunny?

So for now, I decided to go with bare Neo4j. I've started creating a mini-abstraction over embedded querying vs REST API. It is very graph-specific - but I'm fine with that. That's the level of abstraction that I find useful. Neo4j native APIs are actually quite pleasant to work with, so I find that using them directly instead of through an abstraction works much better for me. And contrary to what I expected, I'm much more productive now that I'm not fighting the tools to do what I want to do.
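
For the curious, the shape of that mini-abstraction is roughly this (all names are mine and purely illustrative):

import java.util.List;
import java.util.Map;

// graph-specific on purpose: Cypher stays front and centre,
// only the transport (embedded vs REST) is swappable
public interface CypherExecutor {

    <T> List<T> query(String cypher, Map<String, Object> parameters, RowMapper<T> mapper);

    // maps one result row to a domain object
    interface RowMapper<T> {
        T mapRow(Map<String, Object> row);
    }
}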

Friday, 4 July 2014

Indexing of fulltext properties from cypher and unique relationships

Why are you hiding?

Issues from the previous post aside, I needed to import some data from CSV. That went surprisingly painlessly (well, the first part did, anyway...) - but despite having an index on one of the fields in the class, after some testing I realized that I couldn't find my entities by that field.

My class mapping looked something like:

public class City extends GraphNode {

    @Indexed(indexType = IndexType.FULLTEXT, indexName = "locations")
    private String name;
...
}

My cypher import (note the extra labels for SDN):

String cypher = "LOAD CSV WITH HEADERS FROM \"" + fileLocation + "\" AS csvLine "
+ "MERGE (country:Country:_Country { name: csvLine.Country } ) "
+ "MERGE (city:City:_City { name: csvLine.City } ) "
+ "MERGE (city) - [:IS_IN] -> (country) "
+ "MERGE (airport:Airport:_Airport {name: csvLine.Airport, iataCode: csvLine.IATAcode, icaoCode: csvLine.ICAOcode} ) "
+ "MERGE (airport) - [:SERVES {__type__: 'AirportCityConnection'}] -> (city) "

SDN repository I used for testing:

public interface CityRepository extends GraphRepository<City> {

    Page<City> findByNameLike(String name, Pageable page);

    List<City> findByName(String cityName);
}
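
A typical lookup through it would then be something along these lines (the values are made up):

Page<City> page = cityRepository.findByNameLike("Lond*", new PageRequest(0, 10));
List<City> exact = cityRepository.findByName("London");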

I had a test for the repository, and the lookup worked fine when data was inserted via SDN but not with my CSV Cypher import. With the help of the brilliant Michael Hunger I managed to find the reason and a workaround. For details of why the next step is needed, check Michael's explanation; if all you want is to make it work for now, you'll need to do something like this:

String cypher = "LOAD CSV WITH HEADERS FROM \"" + fileLocation + "\" AS csvLine "
+ "MERGE (country:Country:_Country { name: csvLine.Country } ) "
+ "MERGE (city:City:_City { name: csvLine.City } ) "
+ "MERGE (city) - [:IS_IN] -> (country) "
+ "MERGE (airport:Airport:_Airport {name: csvLine.Airport, iataCode: csvLine.IATAcode, icaoCode: csvLine.ICAOcode} ) "
+ "MERGE (airport) - [:SERVES {__type__: 'AirportCityConnection'}] -> (city) "
+ "RETURN city";
Result<Node> cities = neo4jTemplate.query(cypher, ImmutableMap.<String, Object>of()).to(Node.class);
Index<Node> index = db.index().forNodes("locations");
for (Node city : cities) {
    String location = (String) city.getProperty("name");
    index.remove(city);
    index.add(city, "name", location);
}

Modelling marriage relationship

Well, in all honesty I was actually modelling a flight schedule, but the same principle applies - you cannot fly out from two airports at the same time on the same flight number. Yet (with the help of Excel's autocomplete feature, which changed flight code QF1 into QF10...) I managed to create data implying that this can actually happen. My SDN model was not taking this situation into account and cried not-so-silently when I tried to retrieve the data from Neo4j:

java.lang.IllegalArgumentException: Cannot obtain single field value for field 'to'
 at org.springframework.data.neo4j.fieldaccess.RelatedToSingleFieldAccessorFactory$RelatedToSingleFieldAccessor.getValue(RelatedToSingleFieldAccessorFactory.java:94)
 at org.springframework.data.neo4j.fieldaccess.DefaultEntityState.getValue(DefaultEntityState.java:97)

So a quick tip is - if you want to avoid bigamy in your database (and this exception) - make sure you're not making node A married to B and C at the same time.
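In Cypher terms, the accidental data shape was the equivalent of this (names changed to fit the analogy):

   CREATE (a:Person { name: 'A' }), (b:Person { name: 'B' }), (c:Person { name: 'C' }),
          (a) - [:MARRIED_TO] -> (b),
          (a) - [:MARRIED_TO] -> (c)

Two outgoing relationships where the model expects one - so the single-valued 'to' field from the exception above has no way to choose.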

Stay tuned for more Neo4j drama. :)

Saturday, 28 June 2014

First impressions of Neo4j

I have always had an interest in big data - much more than in, say, low latency processing. I never wanted to optimize code for every processor cycle, or every bit sent across the wire. I appreciate people who are capable of doing that; it's just never been something I was passionate about. It's too... low level. Too C ;). I want abstraction, a business domain - and somehow I get a particular buzz from the idea of having and processing loads and loads of information.

In my career so far I have been lucky enough to get more or less exposure to technologies such as Gigaspaces, Cassandra and Attivio, and even though I'd heard about neo4j and briefly touched it quite a while ago, it wasn't until very recently that I properly picked it up for a personal project (which will turn into the next Facebook, obviously...) ;).

I started with an awful lot of enthusiasm, only to be reminded that picking up a new toy can be painful sometimes - especially when the toy is still very much 'work in progress'. :) Below is a short summary of my journey so far.

Hello, world!

As a typical geek, I downloaded the latest and greatest stable versions of everything involved - at the time that was Neo4j 2.1 (2.1.1 or 2.1.2, I can't remember) and Spring Data Neo4j 3.1.0. I read a book about graph databases from O'Reilly (very good, BTW), got another one for Spring Data (the joys of Safari), and, well equipped, started my 'hello world' level app. What could possibly go wrong, right? :)

Well, only a startup issue, throwing the following exception:

Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'currencyRepository': Cannot resolve reference to bean 'neo4jTemplate' while setting bean property 'neo4jTemplate'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'neo4jTemplate' defined in class org.springframework.data.neo4j.config.Neo4jConfiguration: Instantiation of bean failed; nested exception is org.springframework.beans.factory.BeanDefinitionStoreException: Factory method [public org.springframework.data.neo4j.support.Neo4jTemplate org.springframework.data.neo4j.config.Neo4jConfiguration.neo4jTemplate() throws java.lang.Exception] threw exception; nested exception is java.lang.NoClassDefFoundError: org/neo4j/kernel/impl/transaction/LockException
 at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveReference(BeanDefinitionValueResolver.java:336)
 [...]

Googled a bit, didn't find an awful lot except for a logged JIRA suggesting that the versions I had are not compatible. Fine, I'll go with SDN 3.2.0-SNAPSHOT - surely they would have fixed something like this there? Well... no.
Okay, still with plenty of enthusiasm I decided to downgrade to neo4j 2.0.3. Things started fine. Nice.

Hot like a model. But not unique.

I modeled a couple of domain objects. Although neo4j only supports primitives and strings as property values (plus arrays of those), SDN is meant to do automatic conversion. Sweet! Made one of my fields an enum, annotated it as indexed and unique - doesn't work :( It crashes on save to neo4j, saying the type is not supported:

org.springframework.dao.InvalidDataAccessResourceUsageException: Error executing statement MERGE (n:`Weekday` {`weekdayCode`: {value}}) ON CREATE SET n={props} SET n:GraphNode  return n; nested exception is org.springframework.dao.InvalidDataAccessResourceUsageException: Error executing statement MERGE (n:`Weekday` {`weekdayCode`: {value}}) ON CREATE SET n={props} SET n:GraphNode  return n; nested exception is java.lang.IllegalArgumentException: [MONDAY:java.time.DayOfWeek] is not a supported property value
 at org.springframework.data.neo4j.support.query.CypherQueryEngineImpl.query(CypherQueryEngineImpl.java:61)
 at [...]

Well, not surprising - I can clearly see that it's sending the raw enum. But wasn't it meant to do the translation for me? Boohoo. It took me some time to figure out that things get messed up when you add uniqueness on a non-string/primitive field. Asked a question about it on StackOverflow, got no answers. Fine. Enum. Unique. Pick one. Let's go back to String then...
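
What "back to String" means in practice is roughly this - a sketch based on the Weekday entity from the exception above, with the conversion done by hand at the entity boundary:

@NodeEntity
public class Weekday {

    // stored as a plain String, which SDN handles without complaint
    @Indexed(unique = true)
    private String weekdayCode;

    public DayOfWeek getWeekday() {
        return DayOfWeek.valueOf(weekdayCode);
    }

    public void setWeekday(DayOfWeek weekday) {
        this.weekdayCode = weekday.name();
    }
}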

Oh, and let's not forget that uniqueness in SDN is implemented as an update, not an exception. You could easily trip over this one, too - and the number of questions on SO and elsewhere seems to imply it's not just me expecting a crash rather than a silent modification of the data.

All your data are belong to us (*)

Having the super simple model in place, it was time to get some data in. The new neo4j has the neat CSV import. Desperate to use it, I decided to hack SDN and fix the class that was failing with the class-not-found error. Surprisingly, it worked (and later I learnt the SDN team had finally released a SNAPSHOT with a fix, so I switched to that instead). So now, on neo4j 2.1.2 and SDN 3.2.0-SNAPSHOT, I managed to import some data and create nodes that should be readable by SDN. No joy first time round - SDN didn't find them and crashed with bizarre errors:

 
java.lang.IllegalStateException: No primary SDN label exists .. (i.e one starting with _) 
 at org.springframework.data.neo4j.support.typerepresentation.LabelBasedNodeTypeRepresentationStrategy.readAliasFrom(LabelBasedNodeTypeRepresentationStrategy.java:126)
 at org.springframework.data.neo4j.support.typerepresentation.LabelBasedNodeTypeRepresentationStrategy.readAliasFrom(LabelBasedNodeTypeRepresentationStrategy.java:39)

org.neo4j.graphdb.NotFoundException: No such property, '__type__'.
 at org.neo4j.kernel.impl.core.RelationshipProxy.getProperty(RelationshipProxy.java:189)
 at org.springframework.data.neo4j.support.typerepresentation.AbstractIndexBasedTypeRepresentationStrategy.readAliasFrom(AbstractIndexBasedTypeRepresentationStrategy.java:126)

It turned out SDN labels the nodes with 2 labels, not just one as I expected (for a class Foo I thought I'd only need the label Foo; it turns out both Foo and _Foo need to be present). For relationships I also needed an extra __type__ property added.
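So for a hypothetical class Foo, nodes and relationships created outside SDN need something like:

   CREATE (a:Foo:_Foo { name: 'a' }), (b:Foo:_Foo { name: 'b' }),
          (a) - [:RELATES_TO { __type__: 'FooRelation' }] -> (b)

(RELATES_TO and FooRelation are illustrative - the point is the doubled label and the __type__ property.)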

Everything is cypher-rific.

With data in, basic mapping and basic repos, it was time for some Cypher fun. After a couple of classic PEBKAC issues I got a few more sophisticated queries working. All of them were basically copy-pasted from the lovely neo4j browser tool and worked just fine - but there's one problem with that. What about refactoring? What about more dynamic queries? Well, Neo4j has the dsl library, which allows for dynamic query creation - clearly that should be better, shouldn't it?
Only, finding any documentation for it - beyond mentions here and there that 'it exists and works with querydsl' - proved a non-trivial task. The best doc I found? The JUnit test cases in the project's Github repository (I honestly can't count the handful of rather dated articles here and there, with 1-2 simple examples each). Call me old fashioned, but that doesn't make me feel like it's a mature, stable and well-supported tool :) Also, although someone kindly committed a patch to make labels work, I couldn't find a SNAPSHOT jar to download, so I had to clone the Git project and build it myself. Not that it's an awful lot of work, but let's just say that in the place I work that would be a total no-go.

That aside, there is a library meant to work well with cypher - querydsl - which should make my queries even nicer. No-brainer, huh? Well, yes, except that it didn't even pretend to work with my LocalDate fields and produced code that wouldn't even compile. Back to Strings. And back to the lack of decent documentation on how to marry this with cypher.

And the journey continues

And this is where I am at the moment. All in all, the journey so far has been a lot of fun, and neo4j is starting to get the feel of an enterprise solution - sadly, the surrounding ecosystem can't quite keep up. There are things that I take for granted in a mature technology: any scenario covered somewhere - just ask Google, or ask a question on stackoverflow and it shall be answered. There are so many books and articles and examples that, for pretty much anything you come up with, someone has done it before.
Here, problems often turn into a lot of debugging and glueing together little pieces of information spread across the web. It's not necessarily a massive problem - after all, I'm a smart and experienced cookie ;) I know how to use source code and google (as limited as the resources are, they are out there somewhere, and there are helpful people too). I wouldn't want all this to sound like complaining - I realize a lot of this work is done in the spare time that passionate people put in, and only some of them actually get paid for it. So it's more of a reflection on and observation of the current state than something that is a problem for me. Having said that, I would hesitate a bit before throwing Neo4j at a team of newbies (AKA n00bs), especially on a project with relatively fixed deadlines (as opposed to my leisurely pace of 'whenever').

Or perhaps it's an opportunity to become an expert in a new, exciting technology before everybody starts using it :)

(*) Allegedly attributed to US government.

Thursday, 6 February 2014

Synology 412+ - first impressions

For some time now I've been planning to buy an extra machine as a server - for local storage, but also to do some more serious development and testing. My current laptop is starting to show its age (or shall I rather say: it's on its last legs...) and whilst for pure development it's not bad enough for me to go through the hassle of picking a new one, I decided I could use something extra - just to run some dev tools, an app or two, or a db server, or a little cloud, or whatever it is I'm testing. On top of that I needed a place to store my code - I don't necessarily want to put it all on public github. And some space to store data (if/when I start playing with things like Neo4j). And a place for the company website (I tried google sites, but honestly it doesn't seem to be quite what I was looking for). And my hard drive started to make funny noises, and I got worried that one day I'd lose all my data.
Theoretically I could have bought a private github account, space in the cloud and whatnot, but I wanted something that would grow with me, and something entirely under my control. Plus, the cloud is all good if you have a handful of gigabytes of data, but anything beyond that and things just get increeeeeeedibly slow to back up.

Finally, after some research, I decided to buy this little baby: https://www.synology.com/en-us/products/overview/DS412+ . It seemed a bit expensive, but after having had it for a few days I think it will turn out to be worth every penny. I am getting more and more excited about it - it's basically a unix server, with full root access and heaps of inbuilt stuff (NAS being a major one, of course). It even has a git server, a JDK and Tomcat. The Linux on it is rather basic, but I don't think I'll need much more; I'll just add maven to the mix and that should be enough for my needs. For now :)

Sunday, 25 July 2010

Java Decompiler for Eclipse

Recently I had to install a decompiler plugin for Eclipse at work and had to spend some time actually finding it, as the JadClipse sourceforge page only points to version 3.3.0 of the plugin, which is rather old. In February 2009 a new version - 3.4.0 - was released, which also changed the name of the plugin. The update site, which also works with Eclipse 3.6.0, is http://jadclipse.sf.net/update .

Friday, 4 June 2010

Perforce and Subversion - a short story of a new-found love

Before my previous project I used CVS and SVN, and obviously preferred SVN. In my eyes it maybe wasn't the best thing ever, but I was a reasonably confident user: I knew how to merge and, more importantly, I knew how to force it to cooperate when I screwed up my local .svn folders. I liked it.


For the first few weeks after joining, working with Perforce was a major pain. The ideas of client specs, changelists, "opening for edit", creating branches etc. - all of that was new, and not exactly friendly for an ex-SVN user. I simply didn't know how to use it. I complained a lot and was completely confused (and the rather complicated, project-specific build process around it didn't help at all).

However, I recently switched jobs, and after the first shock of learning that we're still using CVS, I realized that the migration I'd love to see is not to SVN. It's to Perforce.


What has changed? I have become much more comfortable with P4, and I realized that it really supports productivity and is more powerful than SVN. Bear in mind that it's been a while since I last used SVN, so some of the things I'll mention here may have made it into SVN since - but I don't think they were there when I was using it.


  • The first great strength of P4, especially when working in a really big team with not much code ownership (i.e. everybody touches everything), is merging. The fact that P4 can track exactly which changes were already integrated (merged) and which weren't makes it sooo easy to work on a branch/trunk and only merge as and when necessary. Unless there are conflicts, merging quite a big branch can literally take 2 minutes. And - again - because P4 tracks what you have and haven't merged, you could merge every day and barely notice you're working on a branch. You can bring changes between branch and trunk back and forth, and unless you're (rather stupidly) trying to merge files manually, it's absolutely unbeatable. Of course working on branches will always carry some overhead - but I feel that managing this with P4 is way easier than with SVN.

  • I also loved 3-way diffs (and even though I'm sure they could be done with every version control system, somehow I first saw them when using P4).

  • I loved changelists. I can now barely imagine living without them in SVN. Perhaps it's because I sometimes have some "side-work" of very low priority, but being able to keep it in a separate changelist until I'm (finally) able to commit is really nice. I don't have to check with each commit which files I want to submit now and which ones later - or not at all. BTW, the latest P4 has the idea of "shelving changes", so there isn't even the worry that if the machine dies, you lose your work.

  • I absolutely love the fact that it doesn't store any additional information in my code folders and that it's so easy to revert if I accidentally remove a directory.

  • Despite my initial reservations about "open for edit", it sometimes actually proved really useful - for example, before doing a major refactoring and deleting/renaming some classes, I could check whether by any chance someone else was working on them and talk to them, maybe postpone my refactoring a bit so that we wouldn't have to merge. It's not a lock - multiple people can have a file open for edit - it's just a marker. The only drawback is that the first time you start editing a file, Eclipse hangs for a fraction of a second while opening the file for edit - but you can really live with that.

  • Absolutely fantastic support for viewing history, with my favourite "time line view", which is basically "blame on steroids". You can see the whole history (or a selected range) of the file with deleted lines struck out, additions and modifications clearly marked, author names alongside, and easy access to the details of each commit (when, comment etc.).


What didn't I like?



  • That you need to remember to explicitly mark new files for add - in theory you have to do that in SVN as well, but somehow SVN clients are always much better at detecting new files. The solution here is to make Eclipse always open files for add when you create them.

  • The Eclipse plugin for P4 is nowhere near the SVN one - but to be honest, it didn't take me very long to get used to having P4Win constantly open on the second screen, and I didn't miss direct Eclipse integration that much. It would have been much worse if I only had something like Tortoise.

  • Creating labels and branches could be easier. Once you grasp the concept and understand how it works, the procedure is not that bad; still, I think it could be a bit more user-friendly (perhaps all it needs is a nicer wizard in P4Win!).



Overall, I'd say that yes, Perforce has a learning curve (in my case, getting from "omg, wtf is that?!" to "it's actually kind of nice" took about 2 months), but now, given the choice, I'd go for Perforce without a second of hesitation. Unfortunately, the fact that it is not free (or perhaps I should say: is very expensive) doesn't help - but I just found out that you can get Perforce for free for 2 devs - so you can always try it.


I hope that in the near future I'll get a chance to try out Git, which gets a lot of praise - however, I'm slightly worried by the lack of something in the class of P4Win/TortoiseSVN. Perhaps it's just a matter of time.