Take a breath. Close those tabs. If you’re trying to model friends, fraud rings, or the terrifyingly efficient path from “Add to cart” to “chargeback,” a graph database often saves you from relational-query hell. This isn’t about making your SQL prettier; it’s about avoiding the Cartesian-product-wielding chains of JOINs that show up when relationships matter more than rows.
When you actually need a graph DB (relational gets ugly fast)
Relational databases are great at tables, aggregates, and ACID guarantees. They’re terrible at questions like:
- “Who are Alice’s 3-hop friends who like sci‑fi and work within 50km of me?”
- “Show me the path between Order#123 and a suspicious account, crossing users, devices and IPs.”
- “Find communities or cycles in the social graph.”
If your domain is connected data — social, recommendations, knowledge graphs, fraud detection, dependency analysis — the object you care about is a traversal. Relational solutions (recursive CTEs, adjacency tables) work for small graphs, but they become complex and slow when depth or branching factor grows. That’s the moment a graph DB stops being academic and becomes practical.
That said: don’t reach for a graph DB because somebody said “graphs are cool.” If your joins are shallow, or your app is mostly CRUD with occasional reporting, stick with Postgres.
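To make “traversal” concrete, here is a minimal in-memory sketch of the Order#123 question from the list above: a breadth-first search for the shortest chain between two entities. The node names, the edge list, and the helpers buildAdjacency and shortestPath are all made up for illustration. A graph database does this kind of hop-following inside the engine, against its own storage rather than a JavaScript Map.

```javascript
// Hypothetical sample data: edges link orders, users, devices and IPs.
const edges = [
  ['order:123', 'user:alice'],
  ['user:alice', 'device:d1'],
  ['device:d1', 'user:mallory'],
  ['user:alice', 'ip:10.0.0.7'],
  ['ip:10.0.0.7', 'user:bob'],
];

// Build an undirected adjacency list so links can be followed both ways.
function buildAdjacency(edgeList) {
  const adj = new Map();
  for (const [a, b] of edgeList) {
    if (!adj.has(a)) adj.set(a, []);
    if (!adj.has(b)) adj.set(b, []);
    adj.get(a).push(b);
    adj.get(b).push(a);
  }
  return adj;
}

// Breadth-first search: returns the shortest chain from start to goal, or null.
function shortestPath(adj, start, goal) {
  if (!adj.has(start) || !adj.has(goal)) return null;
  const parent = new Map([[start, null]]); // doubles as the "visited" set
  const queue = [start];
  for (let i = 0; i < queue.length; i++) {
    const node = queue[i];
    if (node === goal) {
      // Walk the parent pointers back to the start to rebuild the chain.
      const path = [];
      for (let n = goal; n !== null; n = parent.get(n)) path.unshift(n);
      return path;
    }
    for (const next of adj.get(node)) {
      if (!parent.has(next)) {
        parent.set(next, node);
        queue.push(next);
      }
    }
  }
  return null;
}

const adj = buildAdjacency(edges);
console.log(shortestPath(adj, 'order:123', 'user:mallory'));
// [ 'order:123', 'user:alice', 'device:d1', 'user:mallory' ]
```

Expressing the same search in SQL means one self-join (or one recursive step) per hop, which is where the relational version starts to hurt.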
Neo4j install (Docker Compose example)
Neo4j is the iconic, purpose-built graph DB. Nice visual tooling, mature ecosystem, and Cypher — a query language that reads like “graph SQL.” Quick local spin-up with Docker Compose:
```yaml
version: "3.8"
services:
  neo4j:
    image: neo4j:5
    container_name: neo4j
    ports:
      - "7474:7474" # HTTP Browser
      - "7687:7687" # Bolt protocol
    environment:
      # Neo4j 5 rejects passwords shorter than 8 characters by default
      NEO4J_AUTH: "neo4j/supersecret"
    volumes:
      - ./neo4j/data:/data
      - ./neo4j/import:/var/lib/neo4j/import
```

Start it:

```bash
docker compose -f docker-compose.neo4j.yml up -d
# visit http://localhost:7474, login neo4j / supersecret
```

Takeaway: Community edition is extremely simple for single‑node local work.
Basic Cypher example (MATCH, CREATE, simple path traversal)
Cypher is expressive for creating and traversing patterns. Example: build a tiny social graph, then find 1–3 hop friends.
```cypher
// create sample nodes and relationships
CREATE
  (alice:Person {name: 'Alice'}),
  (bob:Person {name: 'Bob'}),
  (carol:Person {name: 'Carol'}),
  (dave:Person {name: 'Dave'}),
  (alice)-[:KNOWS {since: 2020}]->(bob),
  (bob)-[:KNOWS {since: 2019}]->(carol),
  (carol)-[:KNOWS {since: 2018}]->(dave);

// find friends up to 3 hops away from Alice
MATCH path = (a:Person {name: 'Alice'})-[:KNOWS*1..3]->(friend)
RETURN [n IN nodes(path) | n.name] AS chain, length(path) AS hops
ORDER BY hops
LIMIT 10;
```

And a tiny Node.js client snippet:
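```javascript
// Save as an ES module (e.g. client.mjs) so top-level await works.
import neo4j from 'neo4j-driver';

// The password must match NEO4J_AUTH in the Compose file.
const driver = neo4j.driver('bolt://localhost:7687', neo4j.auth.basic('neo4j', 'supersecret'));
const session = driver.session();

try {
  const q = `MATCH (a:Person {name: $name})-[:KNOWS*1..3]->(f)
    RETURN DISTINCT f.name AS friend LIMIT 25`;
  const res = await session.run(q, { name: 'Alice' });
  console.log(res.records.map((r) => r.get('friend')));
} finally {
  // Release the connection even if the query throws.
  await session.close();
  await driver.close();
}
```

ArangoDB install (Docker Compose)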
ArangoDB is multi‑model: documents, key-value, and graphs in the same engine. Spin up a single node quickly:
```yaml
version: "3.8"
services:
  arangodb:
    image: arangodb:latest
    container_name: arangodb
    ports:
      - "8529:8529"
    environment:
      ARANGO_ROOT_PASSWORD: "secret"
    volumes:
      - ./arangodb/data:/var/lib/arangodb3
```

```bash
docker compose -f docker-compose.arangodb.yml up -d
# visit http://localhost:8529, login root / secret
```

ArangoDB also supports clustering without forking over for an enterprise license (more on that later).
Same query in AQL
ArangoDB stores graph edges in edge collections and vertices in document collections. You can use the General Graph API or raw AQL traversals. Example creates and queries roughly the same social graph:
AQL has no DDL statements, so create the collections and the named graph from arangosh (or the web UI) first:

```js
// arangosh: create collections and the named graph (run once)
db._create('people');
db._createEdgeCollection('knows');
const graphModule = require('@arangodb/general-graph');
graphModule._create('social', [graphModule._relation('knows', ['people'], ['people'])]);
```

Then run each of the following as its own AQL query:

```aql
// insert vertices
FOR p IN [
  { _key: 'alice', name: 'Alice' },
  { _key: 'bob', name: 'Bob' },
  { _key: 'carol', name: 'Carol' },
  { _key: 'dave', name: 'Dave' }
]
  INSERT p INTO people
```

```aql
// insert edges
FOR e IN [
  { _from: 'people/alice', _to: 'people/bob', since: 2020 },
  { _from: 'people/bob', _to: 'people/carol', since: 2019 },
  { _from: 'people/carol', _to: 'people/dave', since: 2018 }
]
  INSERT e INTO knows
```

```aql
// traverse 1..3 OUTBOUND from alice
FOR v, e, p IN 1..3 OUTBOUND 'people/alice' GRAPH 'social'
  RETURN { path: p.vertices[*].name, hops: LENGTH(p.edges) }
```

A couple of notes: you can also use FOR v IN 1..3 OUTBOUND 'people/alice' knows, which references the edge collection directly (no graph object required).
Traversal performance reality
Talking speed: the cost of a traversal grows with branching factor and depth. If each node points to 10 neighbors and you traverse 4 hops, you’re looking at up to 10 + 100 + 1,000 + 10,000 = 11,110 paths, so roughly 10^4 units of work in the worst case. Indexes only buy you the initial seed lookup; the traversal itself is graph‑engine work.
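The arithmetic is easy to play with. This small sketch (the function name traversalWork is made up here, and it assumes a uniform branching factor, which real graphs rarely have) shows how quickly the worst case grows:

```javascript
// Worst-case number of paths explored at each hop, assuming every node
// has exactly `branching` outgoing neighbours.
function traversalWork(branching, depth) {
  const perHop = [];
  let frontier = 1;
  for (let hop = 1; hop <= depth; hop++) {
    frontier *= branching;
    perHop.push(frontier);
  }
  const total = perHop.reduce((sum, n) => sum + n, 0);
  return { perHop, total };
}

console.log(traversalWork(10, 4)); // { perHop: [ 10, 100, 1000, 10000 ], total: 11110 }
console.log(traversalWork(50, 4).total); // 6377550
```

Going from 10 to 50 neighbors per node at the same depth turns about eleven thousand paths into more than six million, which is why depth limits matter more than raw engine speed.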
Neo4j uses native graph storage: each node record points directly at its relationships (what Neo4j calls index-free adjacency), so a hop is a pointer follow rather than an index lookup. For deep, pointer‑chasing workloads (e.g., recommendation engines, pathfinding, community detection), Neo4j often wins on single‑node latency.
ArangoDB is highly optimized too, but remember it’s multi‑model: edges are documents stored in edge collections, and each hop goes through the edge index on _from and _to. For many real workloads ArangoDB keeps up, and its horizontal sharding can outperform Neo4j when you need scale-out across many machines.
Practical advice:
- Index the properties used to seed traversals (name, id). Both engines rely on an index to find the start node quickly.
- Limit traversal depth or prune by labels/types. Blind deep traversals blow memory/CPU fast.
- For OLAP-style analytics over the whole graph, export to an offline tool (Spark for batch jobs, NetworkX for in-memory analysis). Not every graph query should be online.
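The “limit depth or prune” advice can be sketched in plain JavaScript. Everything here is hypothetical (the typedEdges data, the boundedTraverse helper, and its option names); it only illustrates the three brakes you would normally express in the query itself: an edge-type filter, a hop limit, and a cap on how many nodes you are willing to touch.

```javascript
// Hypothetical typed edges: [from, type, to].
const typedEdges = [
  ['alice', 'KNOWS', 'bob'],
  ['bob', 'KNOWS', 'carol'],
  ['carol', 'KNOWS', 'dave'],
  ['dave', 'KNOWS', 'erin'],
  ['alice', 'FOLLOWS', 'zed'],
];

// Expand outward from `start`, following only `allowedTypes`, stopping at
// `maxDepth` hops or once `maxVisited` nodes have been reached.
function boundedTraverse(edgeList, start, { allowedTypes, maxDepth, maxVisited }) {
  const out = new Map();
  for (const [from, type, to] of edgeList) {
    if (!allowedTypes.includes(type)) continue; // prune by edge type up front
    if (!out.has(from)) out.set(from, []);
    out.get(from).push(to);
  }
  const depthOf = new Map([[start, 0]]);
  const queue = [start];
  let truncated = false;
  for (let i = 0; i < queue.length && !truncated; i++) {
    const node = queue[i];
    const depth = depthOf.get(node);
    if (depth === maxDepth) continue; // depth limit: do not expand further
    for (const next of out.get(node) ?? []) {
      if (depthOf.has(next)) continue; // already reached via a shorter route
      if (depthOf.size - 1 >= maxVisited) {
        truncated = true; // budget exhausted: stop instead of blowing up
        break;
      }
      depthOf.set(next, depth + 1);
      queue.push(next);
    }
  }
  depthOf.delete(start);
  return { reached: Object.fromEntries(depthOf), truncated };
}

console.log(
  boundedTraverse(typedEdges, 'alice', { allowedTypes: ['KNOWS'], maxDepth: 3, maxVisited: 100 })
);
// { reached: { bob: 1, carol: 2, dave: 3 }, truncated: false }
```

In Cypher the hop limit is the *1..3 range and the type filter is the relationship type in the pattern; in AQL it is the 1..3 depth range plus the edge collection or graph you name.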
Multi-model promise (is it complexity bait or actually useful?)
ArangoDB’s multi‑model approach is delightful in the “one tool to rule this dataset” sense. Want documents and graphs tightly coupled with minimal duplication? Nice. Want to mix key‑value lookups, graph traversals and document updates in a single transaction? Also nice.
But there’s a cost: API surface area and cognitive load. You’ll need to learn AQL (it’s SQL-ish but quirky), understand collections vs graphs, and decide when a piece of data is a vertex vs an embedded document. If you don’t actually need multi‑model features, that extra flexibility can be a distraction.
Use cases where multi‑model helps:
- Microservices where a document stores the object and a graph links those objects (e.g., product catalog + recommendation edges).
- When eliminating cross‑store duplication is worth a slightly steeper learning curve.
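A rough picture of the “documents plus edges” shape, as a plain JavaScript sketch rather than real ArangoDB code: the product data, the boughtTogether edge list, and the recommendationsFor helper are invented for illustration, and only the _id, _from and _to field names follow ArangoDB’s conventions.

```javascript
// Hypothetical catalog: documents keyed by _id, as in a document collection.
const products = new Map([
  ['products/tent', { _id: 'products/tent', name: 'Tent', price: 199 }],
  ['products/stove', { _id: 'products/stove', name: 'Camp stove', price: 49 }],
  ['products/lamp', { _id: 'products/lamp', name: 'Lantern', price: 25 }],
]);

// Edges are documents too; _from and _to hold the _id of the linked documents.
const boughtTogether = [
  { _from: 'products/tent', _to: 'products/stove', weight: 0.8 },
  { _from: 'products/tent', _to: 'products/lamp', weight: 0.5 },
];

// One key lookup for the document, then a one-hop walk over its outgoing edges.
function recommendationsFor(productId, minWeight = 0) {
  if (!products.has(productId)) return [];
  return boughtTogether
    .filter((e) => e._from === productId && e.weight >= minWeight)
    .sort((a, b) => b.weight - a.weight)
    .map((e) => ({ ...products.get(e._to), weight: e.weight }));
}

console.log(recommendationsFor('products/tent', 0.6).map((doc) => doc.name)); // [ 'Camp stove' ]
```

The product lives in exactly one place and the relationship carries its own attributes (here, weight), which is the duplication-avoiding property the bullets above are pointing at.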
If you want a single, obvious graph primitive and a rich graph ecosystem (APOC, graph algorithms), Neo4j’s narrower focus can be a feature — less wiggle room for making poor modeling choices.
Licensing breakdown (Neo4j Community single-instance + GPL, ArangoDB OSS clustering)
Short version: Neo4j Community Edition is licensed under GPLv3 and is aimed at single‑node use for local/dev; clustering and some enterprise features are behind the commercial Enterprise license. GPLv3 is a copyleft license and more restrictive than permissive licenses such as Apache 2.0 or MIT; check Neo4j’s site for the exact legal wording before commercial deployment.
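ArangoDB’s open‑source edition has historically given you clustering without an enterprise purchase (releases up to 3.11 shipped under Apache 2.0). That matters if you want to run a fault‑tolerant cluster in a homelab or DIY cloud without paying license fees. Newer releases (3.12 onward) moved to a source‑available license with usage limits on the Community Edition, so read the terms for the exact version you pull.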
Legal nit: licensing changes over the years. Treat this as a signpost: assume Neo4j enterprise features (Fabric, causal clustering, advanced ops) are paid; assume ArangoDB community is more permissive for cluster usage; verify the current licenses before production.
Scaling reality (Neo4j Enterprise for clustering, ArangoDB free clustering)
Neo4j scales vertically and offers clustering and sharding in Enterprise (causal clustering and Fabric in 4.x; clustering and composite databases in 5.x). It’s battle‑tested, but the clustering stack is an enterprise feature and requires a license if you need production HA and sharding.
ArangoDB prides itself on shipping cluster functionality in the OSS version: Agents, DB-Servers, and Coordinators let you shard and replicate collections (including graph data). That can be huge for homelabbers who want a redundant graph cluster for free, but setup and ops are nontrivial.
Operationally:
- Neo4j Enterprise: polished clustering, tooling, and support — expect fewer surprises but a license cost for production.
- ArangoDB OSS cluster: powerful and free, but prepare for more hands‑on orchestration and network configuration.
Tooling (Neo4j Browser vs ArangoDB web UI)
Neo4j: excellent visual tooling. The classic Neo4j Browser is great for ad‑hoc exploration; Neo4j Bloom gives polished visual discovery (commercial). Drivers and the APOC library add a ton of power.
ArangoDB: a solid web UI with query editor, collection management, and graph visualization. Less “batteries‑included” for graph analysis than Neo4j’s ecosystem, but it’s more than usable and integrates nicely with Foxx microservices.
Both have language drivers for Node, Python, Java, etc. If visual, one‑click exploration matters to you, Neo4j’s UX is slightly friendlier out of the box.
Reality check: Postgres + recursive CTEs or AGE extension first
Before dropping to a graph DB, ask whether Postgres can do the job. For many hierarchical or shallow graph problems, recursive CTEs are fine and keep operational complexity low.
Example (Postgres recursive CTE):
```sql
WITH RECURSIVE path AS (
  -- anchor: start from node 1
  SELECT id, name, ARRAY[id] AS chain
  FROM nodes
  WHERE id = 1
  UNION ALL
  -- step: follow outgoing edges, skipping nodes already on this chain (cycle guard)
  SELECT e.to_id, n.name, p.chain || e.to_id
  FROM edges e
  JOIN nodes n ON n.id = e.to_id
  JOIN path p ON e.from_id = p.id
  WHERE NOT e.to_id = ANY(p.chain)
)
SELECT * FROM path;
```

If you’re already Postgres-heavy, try a prototype there. If queries become awkward or slow as depth/branching grows, then consider a graph DB. Also look at the Apache AGE extension for Postgres: it adds graph semantics and openCypher queries inside Postgres if you want a hybrid route.
Decision matrix (use cases per DB)
Use Neo4j when:
- You need a dedicated, polished graph engine and the rich ecosystem (APOC, built-in graph algorithms).
- You prefer Cypher’s expressive pattern matching.
- Your workload is deep pointer-chasing on a single node or you have Enterprise budget for clustering.
Use ArangoDB when:
- You need documents + graphs in one place and want to avoid cross-store duplication.
- You want free clustering for a production-ish homelab without enterprise spend.
- You’re comfortable with AQL and a slightly broader mental model.
Use Postgres (CTEs/AGE) when:
- Your graph needs are modest and you prefer operational simplicity.
- You want to keep everything in a single mature RDBMS with straightforward backups and tooling.
SumGuy-voice conclusion (winner per use case)
Neo4j is the precision forklift: purpose‑built, tidy, and it’ll handle delicate graph hauling like a pro, but the heavy‑duty clamps (clustering, commercial tools) cost money. ArangoDB is the Swiss Army knife: multi‑model, flexible, and lets you run a real cluster without needing a corporate purchase order. It’s slightly messier, but it’s free and powerful.
So: if you want best‑in‑class graph ergonomics and you’re building a graph‑first product, Neo4j is the friendly winner. If you want a single datastore that covers documents, KV and graph, or you want to scale cheaply in a homelab, ArangoDB is probably the more practical pick.
Your 2 AM self will appreciate picking the right tool: use a graph DB when the problem is connectedness, not because the marketing team liked the logo. And if you need a last‑minute escape hatch — try Postgres + CTEs or AGE before committing to a new database.
Happy scheming; don’t hire a forklift to move a couch unless you like explaining yourself to the neighbors.