The Typo That Killed the Demo
Someone types 1600 Pensilvania Ave into your search box. Nominatim returns nothing. Blank. The cursor blinks. Your user assumes the map is broken.
That same input goes to Photon. Photon returns the White House, correctly, without blinking. First result: The President's House, 1600 Pennsylvania Avenue NW, Washington, D.C.
That’s the whole argument for Photon. Nominatim is extremely good at what it does — exact tokenized lookup, structured geocoding, reverse geocoding. It’s the engine the geocoding ecosystem runs on. But it’s essentially a very sophisticated string-matching system, and when the string doesn’t match, it tells you to try again. Users don’t try again. They close the tab.
Photon fixes this by putting Elasticsearch in front of the same OSM data and letting fuzzy matching, partial input, and language-aware scoring do the heavy lifting. It’s not a replacement for Nominatim. It’s a companion to it. And figuring out when you need both is the whole point of this article.
Full example: Hybrid Photon + Nominatim Compose at github.com/KingPin/sumguy-examples/tree/main/self-hosting/photon-deep-dive-nominatim-search
Where Nominatim Falls Flat
Nominatim’s search pipeline tokenizes the input, looks up tokens in its word index, and walks a ranked list of matching OSM objects. It’s tuned for precision — give it a well-formed address, get back a good answer. That’s perfect for batch geocoding, for structured forms where you control the input, for B2B workflows where the upstream data is clean.
The problem is users. Users type fast, autocorrect intervenes, copy-paste eats the first letter, someone’s phone keyboard is aggressive about capitalizing things mid-word. Real input looks like:
- pensilvania ave dc (classic typo)
- Wite House (autocorrect overcorrection)
- 1600 Penn (partial, hoping for autocomplete)
- washington dc capitol (keyword soup, not an address)
Nominatim handles the last case decently. The others it drops. Not because it’s poorly written — it’s because its design goal is precision on structured input, not recall on messy input. Those are different problems that require different indexing strategies.
Autocomplete-as-you-type compounds the issue. Nominatim requires a reasonable amount of input before it starts returning useful results. Send it Pen and you’ll get Penticton, Penrith, and Pensacola before Pennsylvania ever shows up. It’s not designed to handle live search where each keystroke is a new query against a half-formed thought.
How Photon Is Built
Photon was built by Komoot — the cycling and hiking navigation company — and open-sourced because they needed what most geocoding users actually need: a forgiving, fast, autocomplete-first search on OSM data.
The architecture looks like this:
- Elasticsearch back end. Photon’s index lives in Elasticsearch (or OpenSearch, which it also supports). ES gives you fuzzy matching, n-gram tokenization, language-boosted scoring, and per-document relevance tuning out of the box.
- Data from Nominatim. Photon doesn’t re-ingest the raw PBF. It reads from a Nominatim database via a custom export tool. You need a Nominatim instance — or you can use Photon’s prebuilt dump (more on that in a moment).
- Simple Java HTTP server. The Photon server process is a lightweight Java app that takes search requests, hits Elasticsearch, and returns GeoJSON. It’s not complex and it doesn’t need to be.
The index structure is the key difference. Where Nominatim builds inverted token indexes optimized for exact phrase matching, Photon builds Elasticsearch mappings with n-gram analyzers that index substrings. Pennsylvania gets indexed as Pen, Penn, Penns, Pennsyl… and so on. When you type a prefix, it matches. When you transpose two letters, the fuzzy distance scoring still returns a useful result. When you search in French for a German city, multilingual synonym handling kicks in.
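To make the prefix-indexing idea concrete, here is a minimal Python sketch of edge n-gram generation plus an edit-distance check. It is illustrative only: the function names and the gram sizes (3 to 10 characters) are assumptions, and Photon's real analyzers live in its Elasticsearch mappings, not in application code like this. Plain Levenshtein distance is also a simplification, since it counts a transposition as two edits.

```python
def edge_ngrams(term: str, min_len: int = 3, max_len: int = 10) -> list[str]:
    """Return lowercase prefixes of `term`, from min_len up to max_len characters."""
    term = term.lower()
    upper = min(len(term), max_len)
    return [term[:n] for n in range(min_len, upper + 1)]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


# A toy "index": every stored prefix points back at the full term.
index = set(edge_ngrams("Pennsylvania"))
print("penn" in index)                                 # prefix lookup
print(edit_distance("pensilvania", "pennsylvania"))    # typo tolerance
```

A typed prefix like penn hits the index directly, and pensilvania sits two edits away from pennsylvania (one missing n, one i for y), which is the kind of gap fuzzy matching is meant to bridge.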
This comes at a cost. Elasticsearch isn’t small. We’ll get to the resource math later.
Getting Photon Running
You have two paths to a working Photon index.
Path A: prebuilt dump (fastest, most people’s choice). Komoot publishes a daily Elasticsearch dump for the full planet via GraphHopper’s public hosting. The URL structure has shifted over time — check photon.komoot.io or the Photon GitHub repo for the current download link. The planet dump is compressed and large (~90 GB), but it means you skip building the index yourself. Download, extract, point Photon at the data directory, start.
    # Download the planet dump; verify the current URL at github.com/komoot/photon
    wget https://download1.graphhopper.com/public/photon-db-latest.tar.bz2

    # Extract into your data directory
    tar -xjf photon-db-latest.tar.bz2 -C /data/photon/

The downside: the planet dump is the planet. If you only care about North America, you’re still downloading and running the planet index, which means more disk and more RAM.
Path B: build from your own Nominatim DB (slower, right-sized). If you already have a regional Nominatim instance (which you should if you’ve been following along from the Nominatim setup post), you can export just your region into a Photon index.
    # From inside your Nominatim container, run the Photon exporter
    # (Photon ships a nominatim-to-photon export tool)
    java -jar photon-*.jar \
      -nominatim-import \
      -host localhost \
      -port 5432 \
      -database nominatim \
      -user nominatim \
      -password yourpassword \
      -languages en,fr,de

This takes several hours for a large region. The upside is you get a scoped index that’s a fraction of the planet dump’s size. North America ends up around 20–25 GB in the Elasticsearch data directory. A single US state is under 5 GB.
Here’s a Compose stack that runs both Photon and its Elasticsearch dependency together:
    services:
      elasticsearch:
        image: elasticsearch:8.13.4
        container_name: photon-es
        environment:
          - discovery.type=single-node
          - xpack.security.enabled=false
          - ES_JAVA_OPTS=-Xms4g -Xmx4g
        volumes:
          - es-data:/usr/share/elasticsearch/data
        restart: unless-stopped

      photon:
        image: komoot/photon:latest
        container_name: photon
        ports:
          - "2322:2322"
        command: >
          -listen-ip 0.0.0.0
          -listen-port 2322
          -es-host elasticsearch
          -languages en,fr,de
        depends_on:
          - elasticsearch
        restart: unless-stopped

    volumes:
      es-data:

Give Elasticsearch a good 30–60 seconds before Photon starts querying it. The depends_on doesn’t wait for ES readiness, only container start; you may need a restart policy or a health-check wrapper if you’re being precise about it.
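One way to build that health-check wrapper is a small polling loop that runs before Photon starts. The sketch below is an assumption-heavy illustration, not part of Photon: the function names are made up, the URL assumes Elasticsearch's default port 9200 on the Compose service name elasticsearch, and the retry counts are arbitrary.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def es_is_up(url: str = "http://elasticsearch:9200/_cluster/health") -> bool:
    """Return True if the Elasticsearch health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS not ready, timeout: all mean "not yet".
        return False


def wait_until_ready(probe: Callable[[], bool],
                     attempts: int = 30,
                     delay_s: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `probe` until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay_s)  # no point sleeping after the final failed attempt
    return False
```

Calling wait_until_ready(es_is_up) from an entrypoint script and only launching Photon when it returns True covers the gap that depends_on leaves open. The probe and sleep functions are passed in so the loop can be exercised without a live cluster.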
The API Side by Side
The two APIs look similar but aren’t the same. Knowing the differences matters when you’re wiring them up.
Nominatim search:
    # Structured or freeform, returns JSON array
    curl "http://nominatim.lan/search?q=1600+Pennsylvania+Ave&format=json&addressdetails=1&limit=5"

Photon search:

    # Returns GeoJSON FeatureCollection
    curl "http://photon.lan:2322/api?q=1600+Pensilvania+Ave&limit=5&lang=en"

    # Bias results toward a location (good for "near me" autocomplete)
    curl "http://photon.lan:2322/api?q=pizza&lat=38.8977&lon=-77.0365&limit=5"

    # Filter by OSM tag (key:value)
    curl "http://photon.lan:2322/api?q=starbucks&osm_tag=amenity:cafe&limit=5"

The Photon response is a GeoJSON FeatureCollection. Each feature’s properties object includes osm_id, osm_type, osm_key, osm_value, name, country, state, city, postcode, street, housenumber. It does not include a numeric importance score the way Nominatim does. Ordering is determined by Elasticsearch relevance scoring: the first result is the best fuzzy match, not necessarily the highest-importance OSM object globally.
The lang parameter is worth knowing about. Photon will try to return names in the requested language if the OSM data has them: ?lang=de for German city names, ?lang=fr for French. This is useful for multilingual UIs. Nominatim does not read lang; its counterpart is the accept-language parameter or the Accept-Language header.
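On the client side, turning a Photon feature into something you can show in a dropdown is mostly a matter of picking properties. The sketch below assumes the property names listed above; the helper names are made up, and the sample feature is hand-written for illustration, not a captured Photon response.

```python
def format_label(feature: dict) -> str:
    """Build a one-line display label from a Photon GeoJSON feature."""
    props = feature.get("properties", {})
    # Join house number and street only when they are present.
    street = " ".join(p for p in (props.get("housenumber"), props.get("street")) if p)
    parts = [props.get("name"), street, props.get("city"),
             props.get("state"), props.get("country")]
    label = []
    for part in parts:
        if part and part not in label:  # skip gaps and exact repeats
            label.append(part)
    return ", ".join(label)


def coordinates(feature: dict) -> tuple[float, float]:
    """Return (lat, lon). GeoJSON stores points as [lon, lat], so swap them."""
    lon, lat = feature["geometry"]["coordinates"][:2]
    return lat, lon


# Hand-written sample, shaped like one entry of the "features" array.
sample = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-77.0365, 38.8977]},
    "properties": {
        "name": "White House",
        "housenumber": "1600",
        "street": "Pennsylvania Avenue NW",
        "city": "Washington",
        "state": "District of Columbia",
        "country": "United States",
        "postcode": "20500",
    },
}

print(format_label(sample))
print(coordinates(sample))
```

The coordinate swap is the detail that bites people: GeoJSON puts longitude first, while most map libraries and Nominatim's reverse endpoint take lat and lon as separate, named values.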
The Hybrid Pattern: Use Both
This is the architecture that actually makes sense for most serious deployments:
- Photon for the autocomplete box: handles partial input, fuzzy matching, “as you type” UX, POI search by name or category
- Nominatim for structured geocoding and reverse geocoding: taking a finalized address and turning it into coordinates, or turning coordinates into a human-readable address
Neither one replaces the other. Photon is bad at reverse geocoding — it technically supports it but Nominatim’s spatial indexing is far more accurate. Nominatim is bad at autocomplete. They’re complementary.
The routing logic in your app looks like this: user is typing into a search box → hit Photon. User submits a structured form (street, city, zip) or you’re batch-processing records → hit Nominatim. User taps a pin on the map to look up the address → hit Nominatim’s reverse endpoint.
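If you would rather keep that decision in application code, it fits in one small function. This is a sketch under assumptions: the request kinds and the function name are invented for illustration, the hostnames are the ones used in the curl examples earlier, and the structured fields are the street, city, and postalcode parameters from Nominatim's structured search.

```python
from urllib.parse import urlencode

PHOTON = "http://photon.lan:2322"
NOMINATIM = "http://nominatim.lan"


def build_url(kind: str, **params) -> str:
    """Pick the backend for a request kind and return the URL to call."""
    if kind == "autocomplete":
        # User is still typing: fuzzy, prefix-friendly search on Photon.
        query = {"q": params["q"], "limit": params.get("limit", 5)}
        return f"{PHOTON}/api?" + urlencode(query)
    if kind == "structured":
        # Finalized form fields or batch records: Nominatim structured search.
        query = {k: params[k] for k in ("street", "city", "postalcode") if k in params}
        query["format"] = "json"
        return f"{NOMINATIM}/search?" + urlencode(query)
    if kind == "reverse":
        # Map pin to address: Nominatim reverse endpoint.
        query = {"lat": params["lat"], "lon": params["lon"], "format": "json"}
        return f"{NOMINATIM}/reverse?" + urlencode(query)
    raise ValueError(f"unknown request kind: {kind}")


print(build_url("autocomplete", q="1600 Penn"))
print(build_url("reverse", lat=38.8977, lon=-77.0365))
```

Keeping the mapping in one place makes it easy to unit-test and easy to change if you later move one of the services.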
If both live in the same Compose stack, behind Caddy or Nginx, the routing can live at the proxy level:
    maps.lan {
        # autocomplete / fuzzy search
        handle /photon/* {
            uri strip_prefix /photon
            reverse_proxy photon:2322
        }

        # structured + reverse geocoding
        handle /nominatim/* {
            uri strip_prefix /nominatim
            reverse_proxy nominatim:8080
        }
    }

Your frontend calls /photon/api?q=... for the search bar and /nominatim/reverse?lat=...&lon=... for everything else. Clean, testable, no routing logic in the application code.
Resource Math (ES Is Not Polite)
Here’s the honest resource conversation.
Elasticsearch on a single node, running the Photon index for North America, needs at minimum 4–6 GB of JVM heap — set via ES_JAVA_OPTS=-Xms4g -Xmx4g in the Compose file above — plus overhead for the OS page cache. In practice you’re looking at 6–8 GB of RAM dedicated to the ES process for a regional index. The planet index wants 16–24 GB.
Disk: North America Photon index at rest is roughly 20–25 GB in the ES data directory. Planet is 80–100 GB. ES also needs working space for merges and compactions, so add 30–40% headroom.
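The headroom arithmetic is simple enough to script when you are sizing a volume. The sketch below just applies the 30–40% figure to the index sizes quoted above; the function name is made up, and the numbers are this article's estimates, not measurements.

```python
def disk_budget_gb(index_gb: tuple[int, int],
                   headroom_pct: tuple[int, int] = (30, 40)) -> tuple[int, int]:
    """Return (low, high) disk to provision: index size plus merge headroom."""
    low = index_gb[0] * (100 + headroom_pct[0]) / 100
    high = index_gb[1] * (100 + headroom_pct[1]) / 100
    return round(low), round(high)


print(disk_budget_gb((20, 25)))    # North America index
print(disk_budget_gb((80, 100)))   # planet index
```

That works out to roughly 26–35 GB of disk for North America and 104–140 GB for the planet, before you account for anything else living on the same volume.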
Photon itself (the Java HTTP server) is quiet — 512 MB to 1 GB is fine.
Add Nominatim running alongside (PostGIS, its own memory requirements) and you’re looking at a box with 16–32 GB of RAM for a comfortable full hybrid stack with North America data. If you’re running the planet index, budget 64 GB.
On a box that can actually handle it, the performance is very good. Photon fuzzy queries on Elasticsearch come back in 20–80ms. Nominatim reverse geocoding is 5–20ms. Your users will not notice the latency.
At 16 GB, the bottom of that range, the hybrid stack is a tight fit. If that’s your ceiling, or you have less, pick one or the other. Nominatim alone for exact-match use cases. Photon alone if autocomplete is the primary need and you’re willing to accept that reverse geocoding is a weaker experience.
When to Skip Photon
Photon is the right tool specifically when your users are typing freeform into a search box and you can’t guarantee input quality. That’s a meaningful constraint. Many use cases don’t have it.
If your input comes from structured forms, server-to-server API calls, or batch CSV files where you control the formatting — Nominatim is enough and Photon adds cost and complexity for nothing. B2B geocoding pipelines almost universally have clean inputs. Internal tools with a small user base can assume some care in input. Import jobs don’t typo.
If your server is on the smaller side, the Elasticsearch overhead is real. You can’t run Photon responsibly on 4 GB of RAM. If that’s your constraint, stay on Nominatim and accept the exact-match limitation. It’s the honest call.
If your data coverage is narrow — you’re geocoding addresses in one city for a delivery app, and your import reflects that — the typo surface area shrinks dramatically. Users typing 123 Main St into a local-only geocoder are probably not hitting exotic misspellings that Nominatim can’t handle.
Staying In Sync
Photon’s data is a snapshot. Nominatim can apply incremental OSM diffs daily via replication. Photon cannot — the Elasticsearch index is static until you refresh it.
The approach most people use is a periodic full re-index from Nominatim. Monthly is common for non-critical use cases. Weekly if your coverage area is actively being mapped (new developments, changing city layouts). Full planet re-indexes from the prebuilt dump are just a re-download and re-extract — annoying, but not complex.
If you built from your own Nominatim DB, the flow is: run Nominatim replication to keep the PostGIS data fresh, then periodically re-export to Photon with the nominatim-import exporter. You do this to a separate Elasticsearch index, then alias-swap to make the new index live without downtime. The Photon docs cover the alias pattern; it’s standard Elasticsearch practice.
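For reference, the swap itself is a single request to Elasticsearch's _aliases endpoint, with the remove and add actions in one body so they apply atomically. The sketch below only builds that body. The alias and index names are hypothetical, and it assumes your setup reads the index through an alias as described above; check your Photon version's docs before relying on it.

```python
import json


def alias_swap_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Build the body for POST /_aliases that repoints `alias` in one step."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: last month's index out, the fresh re-export in.
body = alias_swap_actions("photon", "photon_2024_05", "photon_2024_06")
print(json.dumps(body, indent=2))
```

Once the swap succeeds and you have checked a few queries, the old index can be deleted to reclaim disk.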
The practical consequence: if someone reports that a newly-mapped street doesn’t show up in autocomplete, the answer is “next re-index cycle.” If that’s unacceptable, Nominatim with replication gives you more current data — at the cost of the UX compromise on fuzzy search.
Wrapping Up
The pattern that wins: Nominatim for structured geocoding and reverse, Photon for autocomplete and fuzzy search. Run them side by side on hardware that can handle both, route at the proxy level, and your users get a search experience that forgives the typo. Your batch pipelines get the precision they need. Nobody has to compromise.
If you only have one box and it’s not massive, pick based on your primary use case. Autocomplete for end users → Photon. Batch geocoding or structured forms → Nominatim. Either beats a rate-limited commercial API on latency and privacy.
The Compose file and Caddy routing config for the full hybrid stack are in the examples repo. Clone it, adjust the region and language settings, and you’re up in an afternoon.
Your users will type Pensilvania and get the White House. Honestly, that’s the bar.
Related posts
- Nominatim: Self-Hosted Geocoding — the foundation you need before adding Photon
- Nominatim vs Photon vs Pelias — choosing the right geocoder for your use case
- Nominatim Hardware Sizing — region vs planet resource requirements
- The Full Self-Hosted Maps Stack: Nominatim + PostGIS + Tiles — tiles + geocoding together