Which graph database for PHP?

Update: the TinkerPop 3 stack has just been released and now uses Gremlin Server instead of Rexster. If you’re interested in using the TinkerPop 2 stack, then by all means read on. If not, I suggest you read “Get up and running with Tinkerpop 3 and PHP”.

A while back I was confronted with the rewrite of a web application of ours. Why rewrite? Very simple: the code was ugly, poorly documented, hard to maintain and, on some of the most heavily loaded platforms, slow.

Slow can mean anything, but in this case it was quite obvious that the bottleneck was the database (MySQL). It wasn’t so much the DB itself as what was being done with it that led us to our ruin. In fact, some of you will scream when you hear that an Entity Attribute Value (EAV) table was the root of all our problems. Nothing like an anti-pattern to kill your app!

Unfortunately, as ugly as an EAV-patterned table may be, it was the only option that fit the functional requirements of the application… Or so thought the original writers.

In fact, if you ever find yourself needing such a pattern for more than simply finding the Attributes and Values of a given Entity (say, you want to find an Entity from a Value), then you can be fairly certain that a relational database is not for you.
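To make the difference between the two lookups concrete, here is a minimal sketch in plain PHP, with an array standing in for the EAV table. The rows and the helper names (attributesOf, entitiesWithValue) are made up for illustration and are not taken from the original application.

```php
<?php
// Each row is one (entity, attribute, value) triple, as in an EAV table.
$eav = [
    ['entity' => 1, 'attribute' => 'name', 'value' => 'Alice'],
    ['entity' => 1, 'attribute' => 'city', 'value' => 'Paris'],
    ['entity' => 2, 'attribute' => 'name', 'value' => 'Bob'],
    ['entity' => 2, 'attribute' => 'city', 'value' => 'Paris'],
];

// Forward lookup: collect every attribute of one entity.
function attributesOf(array $eav, int $entity): array
{
    $out = [];
    foreach ($eav as $row) {
        if ($row['entity'] === $entity) {
            $out[$row['attribute']] = $row['value'];
        }
    }
    return $out;
}

// Reverse lookup: find the entities that carry a given attribute/value pair.
// In SQL, each additional attribute you filter on costs another self-join.
function entitiesWithValue(array $eav, string $attribute, string $value): array
{
    $out = [];
    foreach ($eav as $row) {
        if ($row['attribute'] === $attribute && $row['value'] === $value) {
            $out[] = $row['entity'];
        }
    }
    return $out;
}

print_r(attributesOf($eav, 1));
print_r(entitiesWithValue($eav, 'city', 'Paris'));
```

The forward lookup is what EAV is built for. The reverse lookup is where things degrade as soon as you combine several conditions, and that kind of query is exactly what a graph traversal handles naturally.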

But don’t despair, there’s a whole world of schema-less NoSQL databases out there just ready to give you a hand. I could go on and on about your options, but today I will focus on graph databases such as Neo4j or OrientDB and, more precisely, their use within PHP.

Before I go any further: I will not explain what a graph database is in this post. You can head over to neo4j.org and check Emil’s video or read about what a graph database is.

Which graph database to choose? OrientDB vs Neo4j vs whatever…

There are more graph databases out there than you probably expect.
Unfortunately, the right choice will always depend on your application and your use case. There is no way of figuring this out other than laying out your limitations and use cases, then prototyping and benchmarking them. So instead of looking at things in this light, let me help you answer the question in the simplest of ways:

Use all of them.

Chances are you don’t know all your limitations, or you don’t have the time to make prototypes for all your graph DB options. So you can’t make an educated decision. Besides, no one likes to be restricted.

It might sound a little unrealistic to expect to be able to use (almost) any graph DB without limiting yourself to a single choice. But thankfully the TinkerPop stack exists. And in the case of PHP, the Rexster project is where it’s at.

TinkerPop and Rexster

Without going into detail, Rexster is a graph DB server that allows you to plug and play graphs from many different Blueprints-enabled graph databases. In other words, you could load a Neo4j graph and/or an OrientDB graph into Rexster and access/modify them in the same way.
This means you can start developing your web app using Neo4j and keep the simple option of switching to OrientDB if it suits your needs better later on. No need to decide on the database right away! Let your project grow and then decide.
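As a rough illustration of that "decide later" idea, here is a minimal sketch assuming the graph name is the only backend-specific thing your application code has to know. The helper connectionSettings(), the graph names neo4jsample and orientdbsample, and the host localhost:8184 are assumptions for the example; the real values are whatever you configure in Rexster.

```php
<?php
// Hypothetical helper: maps a backend choice to the settings a Rexster
// client would need. Nothing here talks to a server.
function connectionSettings(string $backend): array
{
    // Assumed graph names; use the ones declared in your Rexster config.
    $graphs = [
        'neo4j'    => 'neo4jsample',
        'orientdb' => 'orientdbsample',
    ];
    if (!isset($graphs[$backend])) {
        throw new InvalidArgumentException("Unknown backend: $backend");
    }
    return ['host' => 'localhost:8184', 'graph' => $graphs[$backend]];
}

// The script stays the same whichever backend is selected.
$script = 'g.v(2)';

$settings = connectionSettings('neo4j');
echo $settings['graph'], ' <- ', $script, PHP_EOL;

// Switching databases later is a one-word change in application code.
$settings = connectionSettings('orientdb');
echo $settings['graph'], ' <- ', $script, PHP_EOL;
```

Keeping the backend choice in one place like this means the rest of the application only ever deals with scripts and results.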

Since this is aimed at PHP users, I will be concentrating on getting you started with the rexpro-php library, but your favorite language is probably covered by other clients as well.

All scripts run against this server are to be written in Gremlin (the equivalent of SQL, if you will). More on this in the next section.
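To give a rough taste of what such scripts look like, here are a few Gremlin one-liners (TinkerPop 2, Groovy flavour) held as plain PHP strings. The property name "name", the value "marko" and the edge label "knows" are assumptions borrowed from the TinkerGraph toy graph; adapt them to your own data.

```php
<?php
// A few Gremlin (TinkerPop 2) scripts, keyed by what they do.
// Property names, values and edge labels are illustrative assumptions.
$scripts = [
    'vertex by id'           => 'g.v(2)',
    'all vertices'           => 'g.V',
    'lookup by property'     => "g.V.has('name', 'marko')",
    'outgoing neighbours'    => "g.v(1).out('knows')",
    'property of neighbours' => "g.v(1).out('knows').name",
];

// Print them; in a real application each string would be sent to the server.
foreach ($scripts as $label => $script) {
    echo $label, ': ', $script, PHP_EOL;
}
```

Each script starts from the graph object g and chains steps, much like a SQL query chains clauses.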

Getting started with rexpro-php

Before anything else, please make sure you have Rexster server 2.4.0 running. You can find installation instructions here or download the server directly here.

Get rexpro-php

git clone https://github.com/PommeVerte/rexpro-php.git

Once the library is cloned and placed in your project, you can get started with:

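require_once 'path-to-rexpro-php/rexpro/Connection.php';

$db = new \rexpro\Connection;
// you can set $db->timeout = 0.5; if you wish
// Assumption: RexPro listens on localhost:8184 and serves the bundled "tinkergraph"; adjust to your setup.
$db->open('localhost:8184', 'tinkergraph', null, null);
$db->script = 'g.v(2)'; // Gremlin: fetch the vertex with id 2
$result = $db->runScript();
print_r($result); // dump the returned array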

$result should now hold information on vertex 2 in the form of an array. You can find out more about the scripting language in the Gremlin wiki and at GremlinDocs.

I’m going to leave things at that for now. I might post further details in the future. Stay tuned. And please ask questions if anything wasn’t clear.