Making a Graph DB ep1

So what’s this all about? This is my first article (hopefully the first of many), where I’ll describe my journey of learning how to build a Graph Database.

I’m not sure how far I’ll go yet. I’m doing this in my spare time, so I make no promises…

My background: I’ve been a software developer 💻 for about 10 years now. I’ve written many kinds of software and used several databases: MySQL, Microsoft SQL Server, SQLite, MongoDB, RethinkDB, LevelDB. I’m no expert though, just your regular user.
I’ve also used IPFS and Ethereum; they’re not the regular kind of database, but they were fun to learn nonetheless.

And this is how I want to do it: I’ll imagine that what I’m trying to do is already finished, then I’ll explain to you how I did it 🙀
This mental hack keeps me focused and confident, even though it’s a HUGE task…

❓: So why am I doing this?
❗️: Because it’s fun. Because it’s a huge challenge, way more than I can handle. Because I need this kind of library. Because I’ll use this knowledge, even if it doesn’t become a stable product.

❓: What are the challenges?
❗️: I don’t know anything about how databases work internally (well, I have some ideas, but I haven’t studied them much). I don’t really know how indexing works. I don’t know much about graphs.

❓: So why the heck am I doing this again?!
❗️: In the end, I want to create a better alternative to table-based and document-based databases, so that I can connect data infinitely and navigate the resulting spider web. And of course, I want to learn in the process.

❓: What’s my use-case?
❗️: I want to store all the countries in the world and the relations between them; all the animals and plants in the world, with all the data I can find about them; all my favorite movies, with their actors and the most relevant relations between them; and unit conversions (length, weight, temperature). I want to use this data in a normal application, where I would otherwise use a NoSQL database. Maybe in the future I’ll implement a nice GUI to visualize and edit the nodes and edges.

❓: What’s the maximum data size?
❗️: I’m thinking 10 million edges. It’s a pretty big number for a toy database (see Visualizing 1 million and multiply by 10). And if this becomes serious, I’ll target more.
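
To get a feel for that number, here is a rough back-of-the-envelope estimate in Python. It assumes each edge is stored as just two 64-bit integer ids (source and target), with no labels, properties or indexes. Those assumptions are only for illustration; a real store would need more space than this.

```python
EDGE_COUNT = 10_000_000   # target from the text above
BYTES_PER_ID = 8          # assumption: 64-bit integer node ids
IDS_PER_EDGE = 2          # assumption: source id + target id only

# Raw size of the edge list, ignoring labels, properties and indexes.
raw_bytes = EDGE_COUNT * IDS_PER_EDGE * BYTES_PER_ID
raw_megabytes = raw_bytes / 1_000_000

print(f"{raw_megabytes:.0f} MB")  # prints "160 MB"
```

So under these assumptions, the bare edges alone would still fit in the memory of an ordinary laptop.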

❓: What programming languages will I use?
❗️: Any or all of Elixir, Python and Node, in this exact order. I like Elixir the most, because it’s really fun to play with and I want to learn it better. But I might end up using Python, because I’m most productive with it. Node is also a great option, because there are hundreds of thousands of libs to help me.

❓: Can I really do this?
❗️: Time will tell. If I can’t do it, I’ll resume watching Game of Thrones; I haven’t finished the last season.

❓: Will this be open-source?
❗️: Yes. Most of it, anyway.


The tasks:

  • study Neo4j, Titan, DataStax
  • study MongoDB, RethinkDB, CouchDB
  • study LevelDB, RocksDB, Redis, Riak
  • study what exactly a graph is, the math, the theory
  • study how to store a graph in a computer
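
For that last task, one common textbook answer is the adjacency list: for every node, keep a list of its outgoing edges. Below is a minimal in-memory sketch in Python, just to make the idea concrete. It is not a design decision for this project, and the names `Graph`, `add_edge`, `neighbors` and `nodes` are made up for illustration, not taken from any library.

```python
from collections import defaultdict


class Graph:
    """Tiny in-memory directed graph stored as an adjacency list."""

    def __init__(self):
        # node -> list of (label, target) pairs, one per outgoing edge
        self._out = defaultdict(list)

    def add_edge(self, source, label, target):
        """Add a labeled edge from `source` to `target`."""
        self._out[source].append((label, target))
        # Touch the target so nodes without outgoing edges are still known.
        self._out[target]

    def neighbors(self, node, label=None):
        """Return targets reachable from `node`, optionally filtered by label."""
        return [
            target
            for (edge_label, target) in self._out.get(node, [])
            if label is None or edge_label == label
        ]

    def nodes(self):
        """Return every node seen so far, as a source or as a target."""
        return list(self._out)


g = Graph()
g.add_edge("France", "borders", "Spain")
g.add_edge("France", "borders", "Italy")
g.add_edge("France", "capital", "Paris")

print(g.neighbors("France", "borders"))  # prints ['Spain', 'Italy']
```

Everything here lives in a plain dictionary, so it says nothing yet about persistence or indexing, which are the hard parts of a real database.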

The first things that I found:


Until next time!

@articles #software #programming #graph #db