Using Neo4j with NCBI taxonomy and Gene Ontology datasets

Submitted by toniher on Mon, 10/08/2015 - 10:45am

About a year and a half ago I started testing graph databases (only Neo4j so far), using the Gene Ontology and NCBI taxonomy datasets as sample cases. I described my experience in a presentation in February 2015.

After a while, I finally found time to update my import scripts and my Java API extension so that they work with newer versions of Neo4j and Py2neo (2.2.3 and 2.0.7, respectively, at the time of writing).

Regarding Py2neo, I noticed that the Neo4j REST API seems to rely more explicitly on Cypher queries than it did in the past. With the help of this article about multiprocessing in Python and Py2neo, and after several tries, I managed to get the import to run within an acceptable time.
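To give an idea of what a Cypher-centred import can look like, here is a minimal sketch in Python. It only prepares the statement and its parameters and does not talk to a database. The `Taxon` label, the property names and the helper functions are made up for illustration and are not taken from my scripts, and the `{rows}` parameter syntax is the one used by Neo4j 2.x (later versions write `$rows`).

```python
# Hypothetical label and property names, chosen only for illustration.
# One statement creates a whole batch of nodes, properties included.
CREATE_NODES = (
    "UNWIND {rows} AS row "
    "CREATE (n:Taxon) SET n = row"
)


def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def node_batches(rows, size=1000):
    """Pair the Cypher statement with one parameter dict per batch."""
    return [(CREATE_NODES, {"rows": batch}) for batch in chunked(rows, size)]


# A few sample rows, as they might look after parsing the taxonomy dump.
taxa = [
    {"id": 9606, "name": "Homo sapiens", "rank": "species"},
    {"id": 9605, "name": "Homo", "rank": "genus"},
    {"id": 9604, "name": "Hominidae", "rank": "family"},
]

batches = node_batches(taxa, size=2)
```

Each `(statement, parameters)` pair can then be handed to whatever client sends Cypher to the server. Sending rows in batches keeps the number of requests low.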

As final tips, if you plan to use a similar approach with your own data, I would suggest creating nodes and populating their properties in the same step (keeping the data in memory if necessary). I also noticed that creating relationships from multiple parallel processes fails, so keep a single worker for those steps.
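These two tips can be sketched roughly as follows. This is not my actual import code: `run_batch` is a stand-in that only counts rows instead of contacting Neo4j, and the function names and the default worker count are assumptions for the sake of the example.

```python
from multiprocessing import Pool


def run_batch(batch):
    """Stand-in for the database call (hypothetical).

    A real version would open its own connection inside the worker and
    send the statement with its parameters; here we only count the rows.
    """
    statement, params = batch
    return len(params["rows"])


def import_nodes(batches, workers=4, run=run_batch):
    """Node batches are independent, so spread them over several processes."""
    if workers <= 1:
        return sum(run(batch) for batch in batches)
    with Pool(processes=workers) as pool:
        return sum(pool.map(run, batches))


def import_relationships(batches, run=run_batch):
    """Relationship batches go through a single worker, one after another."""
    return sum(run(batch) for batch in batches)
```

On platforms that start worker processes by spawning, the calls should sit under an `if __name__ == "__main__":` guard, with all node batches finished before the first relationship batch is sent.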
