I have talked about graph databases and PHP, and it occurred to me that there are now any number of fantastic tools available. So this is a curated list of several graph databases with PHP bindings, some comparison between them, and a few other tools that are database agnostic. These NoSQL databases can have some incredible benefits, and they are definitely worth checking out.
The GraphDB landscape is still emerging, and a lot of technologies and people are fighting to define it. That is especially difficult since SQL has dominated since, well, the birth of the internet. Developers know SQL and are comfortable with SQL database structures. Many of the burgeoning NoSQL (Not Only SQL) databases try to mimic SQL in a lot of ways.
But, in many ways, graph databases are just different. They require a different way of thinking. There is no standard, no Structured Query Language that everyone agrees on. At least not yet.
Enter the Tinkerpop group, a collection of very smart engineers who are slowly standardizing graph databases. If you want to use graph data, Tinkerpop is your first step, and NOT just for PHP. In fact, all of their tools are language agnostic.
A great place to start with Tinkerpop is: http://www.tinkerpopbook.com/
They have many standards and technologies that are being developed. Here are a couple of the highlights you must know:
“Blueprints is a collection of interfaces, implementations, ouplementations, and test suites for the property graph data model. Blueprints is analogous to the JDBC, but for graph databases. As such, it provides a common set of interfaces to allow developers to plug-and-play their graph database backend.”
This is that elusive “standard” I was talking about earlier. Blueprints-enabled graph databases let developers learn one set of interfaces (for creating graphs, dropping graphs, and the like) and reuse it across backends.
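To make the plug-and-play idea concrete, here is a minimal sketch in Python (chosen only for brevity). The names `GraphBackend`, `InMemoryGraph`, and `build_social_graph` are hypothetical and are not the real Blueprints API; the point is only that application code talks to one interface while the backend behind it can change.

```python
from abc import ABC, abstractmethod


class GraphBackend(ABC):
    """Hypothetical common interface (an illustration, NOT the Blueprints API)."""

    @abstractmethod
    def add_vertex(self, vertex_id, **properties):
        """Store a vertex with arbitrary key/value properties."""

    @abstractmethod
    def add_edge(self, out_id, label, in_id):
        """Store a labeled, directed edge from out_id to in_id."""

    @abstractmethod
    def neighbors(self, vertex_id, label):
        """Return ids reachable from vertex_id over edges with this label."""


class InMemoryGraph(GraphBackend):
    """Toy backend; a real one would delegate to an actual database."""

    def __init__(self):
        self.vertices = {}
        self.edges = []

    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = dict(properties)

    def add_edge(self, out_id, label, in_id):
        self.edges.append((out_id, label, in_id))

    def neighbors(self, vertex_id, label):
        return [i for (o, l, i) in self.edges if o == vertex_id and l == label]


def build_social_graph(backend):
    # Application code depends only on GraphBackend, never on a concrete class.
    backend.add_vertex("alice", name="Alice")
    backend.add_vertex("bob", name="Bob")
    backend.add_edge("alice", "likes", "bob")
    return backend
```

Swapping databases would then mean writing another `GraphBackend` subclass and passing it to `build_social_graph`, with no change to the application code.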
Along the same lines, Gremlin is a “traversal language.” The power of graph databases isn’t in storing information, or even in retrieving it; any database can do that. A graph database saves connections and can find patterns. Once data is saved, you can land at a specific point and walk or traverse the graph to find who likes whom, what is connected to what, and commonalities like “users who purchased this…” all at blazing fast speeds.
Gremlin is the language to do this. It’s the SQL of the graph database world, but designed with graph databases in mind. It harnesses the power of traversals. On another note, it’s more of a mini programming language, so you can use the power of variables and control structures, even classes and functions. This gives a lot of flexibility.
Rexster puts all this (and more) together. The Rexster server sits between your database of choice (there are plenty) and your application (written in PHP) and provides a RESTful or binary point of access. So, you can make simple AJAX calls to Rexster and it will send back the appropriate information, very fast. This also allows you to switch out your database (or work with multiple databases) in the future.
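To show what a traversal looks like in practice, here is a rough Python sketch of the “users who purchased this…” pattern. It is not Gremlin, and the `purchases` data and the `also_purchased` function are made up for illustration. A real graph database would follow stored edges directly instead of scanning a list.

```python
from collections import Counter

# A tiny purchase graph as (user, product) edges. Illustrative data only.
purchases = [
    ("ann", "book"), ("ann", "lamp"),
    ("ben", "book"), ("ben", "desk"),
    ("cat", "book"), ("cat", "lamp"),
    ("dan", "desk"),
]


def also_purchased(product, edges):
    """Return (other_product, count) pairs, most frequent first."""
    # Step 1: land on the product and walk its edges back to the buyers.
    buyers = {user for (user, item) in edges if item == product}
    # Step 2: walk forward from each buyer to everything else they bought.
    counts = Counter(
        item for (user, item) in edges if user in buyers and item != product
    )
    return counts.most_common()


print(also_purchased("book", purchases))  # [('lamp', 2), ('desk', 1)]
```

The two steps (back along one edge, forward along another) are the whole traversal; a traversal language lets you chain steps like these in a single expression.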
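As a hedged illustration of the RESTful access point, the sketch below only builds the URLs a client might request; it makes no network calls. The host, port, graph name (`mygraph`), and the path layout (`/graphs/<name>/vertices/<id>` and a `tp/gremlin` endpoint taking a `script` parameter) are assumptions here and should be checked against the documentation for your Rexster version. The helper names are hypothetical.

```python
from urllib.parse import quote, urlencode

# Assumed server address; adjust to wherever your Rexster instance runs.
BASE = "http://localhost:8182"


def vertex_url(base, graph, vertex_id):
    """URL for fetching one vertex (path layout is an assumption)."""
    return f"{base}/graphs/{quote(graph)}/vertices/{quote(str(vertex_id))}"


def gremlin_url(base, graph, script):
    """URL for running a traversal script (endpoint name is an assumption)."""
    query = urlencode({"script": script})
    return f"{base}/graphs/{quote(graph)}/tp/gremlin?{query}"


print(vertex_url(BASE, "mygraph", 1))
print(gremlin_url(BASE, "mygraph", "g.v(1).out"))
```

Because the application only ever sees URLs like these, the database behind the server can change without touching the client code.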
There are dozens of graph databases. I will not mention them all here, but you can check out a good list at Wikipedia.
OrientDB is a great little database that can handle a lot of data in a flexible way. Orient is a document-graph database, so it uses concepts from document databases (like MongoDB) but manages the relations between these documents as a graph. Document databases allow data to have children (or nested documents). It is written in Java and (honestly) works best if you use Java to manipulate it. Most of the community seems focused in this direction. However, the database has a RESTful server and a binary connection, so there are packages for just about every language. Plus, it is Tinkerpop compatible, so you could use Rexster for PHP.
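The document-graph idea can be pictured with plain Python dictionaries. This is only a conceptual sketch, not OrientDB's storage format or API: the record ids, the `follows` field, and `followed_cities` are all hypothetical. Each record is a document with a nested child (`address`), and records point directly at other records.

```python
# Documents with nested children, linked to each other by record id.
records = {
    "user:1": {"name": "Ann", "address": {"city": "Oslo"}, "follows": ["user:2"]},
    "user:2": {"name": "Ben", "address": {"city": "Rome"}, "follows": ["user:3"]},
    "user:3": {"name": "Cat", "address": {"city": "Lima"}, "follows": []},
}


def followed_cities(records, start_id, max_hops):
    """Cities of everyone reachable over 'follows' links within max_hops."""
    seen = {start_id}
    frontier = [start_id]
    cities = []
    for _ in range(max_hops):
        next_frontier = []
        for record_id in frontier:
            for target in records[record_id]["follows"]:
                if target not in seen:
                    seen.add(target)
                    next_frontier.append(target)
                    # Read a field from the nested document of the linked record.
                    cities.append(records[target]["address"]["city"])
        frontier = next_frontier
    return cities


print(followed_cities(records, "user:1", 2))  # ['Rome', 'Lima']
```

The nested `address` is the document side; following `follows` from record to record is the graph side.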
There is a free community version that is virtually identical to the paid version, minus support.
From the website:
OrientDB is an Open Source NoSQL DBMS with the features of both Document and Graph DBMSs. Written in Java, it is incredibly fast: it can store up to 150,000 records per second on common hardware. Even for a Document based database, the relationships are managed as in Graph Databases with direct connections among records. You can traverse parts of or entire trees and graphs of records in a few milliseconds.
Neo4j is the de facto leader of the graph database world. They do not claim to be anything but a graph database, but they do what they do very well. The focus is on high scalability (so your database could grow across several machines) and speed. It is well supported and has a large and active community. What really (in my opinion) makes Neo4j shine is its documentation and tutorials.
Neo4j has mature bindings for most languages (including PHP). The downside is that the free version is fairly limited and the licence for the professional version can be pricey. Still, check it out.
From the website:
Neo4j is a highly scalable, robust (fully ACID) native graph database. Neo4j is used in mission-critical apps by thousands of leading startups, enterprises, and governments around the world. The versatility of graph databases allows you to power applications in many different domains with their flexible data-model and high-performance query capabilities. You’ll only see a tiny selection of possibilities here, the sky is your limit.
The last I will mention is Titan.
Titan is a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster. Titan is a transactional database that can support thousands of concurrent users executing complex graph traversals.
As the name implies, Titan is for the really big stuff. Look at that: hundreds of billions. It also has great support for Tinkerpop and plays really nicely with other datastores (even allowing you to use other datastores to hold the actual data). Titan is (from what I gather) for the big boys. Possibly the best thing about Titan is that it is totally open source. There is only one licence, though you can purchase support.