Photon Deep Dive: Search That Forgives

The Typo That Killed the Demo

Someone types 1600 Pensilvania Ave into your search box. Nominatim returns nothing. Blank. The cursor blinks. Your user assumes the map is broken.

That same input goes to Photon. Photon returns the White House, correctly, without blinking. First result: The President's House, 1600 Pennsylvania Avenue NW, Washington, D.C.

That’s the whole argument for Photon. Nominatim is extremely good at what it does: exact tokenized lookup, structured geocoding, reverse geocoding. It’s the engine the geocoding ecosystem runs on. But it’s essentially a very sophisticated string-matching system, and when the string doesn’t match, it tells you to try again. Users don’t try again. They close the tab.

Photon fixes this by putting a search engine (OpenSearch) in front of the same OSM data and letting fuzzy matching, partial input, and language-aware scoring do the heavy lifting. It’s not a replacement for Nominatim. It’s a companion to it. And figuring out when you need both is the whole point of this article.

Full example: Hybrid Photon + Nominatim Compose at github.com/KingPin/sumguy-examples/tree/main/self-hosting/photon-deep-dive-nominatim-search

Where Nominatim Falls Flat

Nominatim’s search pipeline tokenizes the input, looks up tokens in its word index, and walks a ranked list of matching OSM objects. It’s tuned for precision: give it a well-formed address and you get back a good answer. That’s perfect for batch geocoding, for structured forms where you control the input, and for B2B workflows where the upstream data is clean.

The problem is users. Users type fast, autocorrect intervenes, copy-paste eats the first letter, someone’s phone keyboard is aggressive about capitalizing things mid-word. Real input looks like:

pensilvania ave dc (classic typo)
Wite House (autocorrect overcorrection)
1600 Penn (partial, hoping for autocomplete)
washington dc capitol (keyword soup, not an address)

Nominatim handles the last case decently. The others it drops. That’s not because it’s poorly written; its design goal is precision on structured input, not recall on messy input. Those are different problems that require different indexing strategies.

Autocomplete-as-you-type compounds the issue. Nominatim requires a reasonable amount of input before it starts returning useful results. Send it Pen and you’ll get Penticton, Penrith, and Pensacola before Pennsylvania ever shows up. It’s not designed to handle live search where each keystroke is a new query against a half-formed thought.

How Photon Is Built

Photon was built by Komoot, the cycling and hiking navigation company, and open-sourced because they needed what most geocoding users actually need: a forgiving, fast, autocomplete-first search on OSM data.

The architecture looks like this:

OpenSearch back end. Photon’s index lives in OpenSearch. (Older Photon releases ran on Elasticsearch, and the project moved to OpenSearch after the ES license change. If you’re reading a dusty tutorial that still says Elasticsearch, that’s why.) The search engine gives you fuzzy matching, n-gram tokenization, language-boosted scoring, and per-document relevance tuning out of the box.
Data from Nominatim. Photon doesn’t re-ingest the raw PBF. It reads from a Nominatim database via a custom export tool. You either need a Nominatim instance of your own, or you can use Photon’s prebuilt dump (more on that in a moment).
Simple Java HTTP server. The Photon server process is a lightweight Java app that takes search requests, hits OpenSearch, and returns GeoJSON. It’s not complex and it doesn’t need to be. Recent releases bundle the search engine into the Photon JAR itself, so a single-node setup doesn’t even need a separate container.

The index structure is the key difference. Where Nominatim builds inverted token indexes optimized for exact phrase matching, Photon builds search-engine mappings with n-gram analyzers that index substrings. Pennsylvania gets indexed as Pen, Penn, Penns, Pennsyl… and so on. When you type a prefix, it matches. When you transpose two letters, the fuzzy distance scoring still returns a useful result. When you search in French for a German city, multilingual synonym handling kicks in.
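To make the indexing idea concrete, here is a toy sketch in Python. It is not Photon’s or OpenSearch’s actual analyzer: the function names and the minimum prefix length of 3 are assumptions made for illustration. It shows the two ingredients described above, prefix (edge n-gram) expansion for partial input and an edit-distance measure for typos.

```python
def edge_ngrams(term: str, min_len: int = 3) -> list[str]:
    """Return every prefix of `term`, from min_len characters up to the full word."""
    term = term.lower()
    return [term[:i] for i in range(min_len, len(term) + 1)]


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


# Toy "index": every prefix of one place name.
index = set(edge_ngrams("Pennsylvania"))

print("penn" in index)                                # prefix typed so far matches
print(edit_distance("pensilvania", "pennsylvania"))   # how far off the typo is
```

A real search engine does this at index time across millions of names and folds the distance into a relevance score, but the principle is the same: a prefix hits because it was stored, and a misspelling hits because it is only a couple of edits away from a stored term.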

This comes at a cost. A JVM-backed search engine isn’t small. We’ll get to the resource math later.

Getting Photon Running

You have two paths to a working Photon index.

Path A: prebuilt dump (fastest, most people’s choice). A search-index dump for the full planet is published weekly via GraphHopper’s public hosting. The URL structure has shifted over time, so check photon.komoot.io or the Photon GitHub repo for the current download link. The planet dump is compressed and large (~90 GB), but it means you skip building the index yourself. Download, extract, point Photon at the data directory, start.

# Download the planet dump — verify the current URL at github.com/komoot/photon
wget https://download1.graphhopper.com/public/photon-db-latest.tar.bz2

# Extract into your data directory
tar -xjf photon-db-latest.tar.bz2 -C /data/photon/

The downside: the planet dump is the planet. If you only care about North America, you’re still downloading and running the planet index, which means more disk and more RAM.

Path B: build from your own Nominatim DB (slower, right-sized). If you already have a regional Nominatim instance (which you should if you’ve been following along from the Nominatim setup post), you can export just your region into a Photon index.

# From inside your Nominatim container, run the Photon exporter
# (Photon ships a nominatim-to-photon export tool)
java -jar photon-*.jar \
  -nominatim-import \
  -host localhost \
  -port 5432 \
  -database nominatim \
  -user nominatim \
  -password yourpassword \
  -languages en,fr,de

This takes several hours for a large region. The upside is you get a scoped index that’s a fraction of the planet dump’s size. North America ends up around 20 to 25 GB in the search-index data directory. A single US state is under 5 GB.

Here’s a Compose stack for Photon. Recent Photon images bundle OpenSearch inside the JAR, so the common single-node setup is just Photon plus a data volume, no separate search-engine container to babysit:

services:
  photon:
    image: rtsp/photon:latest   # community image bundling current Photon + OpenSearch
    container_name: photon
    ports:
      - "2322:2322"
    environment:
      - JAVA_OPTS=-Xms4g -Xmx4g
    volumes:
      # your imported index lives here as photon_data/
      - photon-data:/photon/photon_data
    restart: unless-stopped

volumes:
  photon-data:

The bundled OpenSearch runs in-process, so there’s no separate container to wait on, but it still needs a good 30 to 60 seconds to load a large index into memory before the first query lands. Give it a moment after a cold start before you point your search box at it. (Pin the image tag instead of latest if you care about reproducible re-indexes.)
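If something else in your stack depends on Photon being up, the cold-start wait can be scripted rather than guessed. The following is a minimal sketch, not part of Photon: the helper names, the retry count, and the delay are assumptions, and the probe simply sends a small query to the /api endpoint on the photon.lan host used elsewhere in this article.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def photon_responds(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET on `url` comes back with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error: not ready yet.
        return False


def wait_until_ready(probe: Callable[[], bool],
                     attempts: int = 30,
                     delay: float = 5.0) -> bool:
    """Call `probe` until it returns True or `attempts` are exhausted."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Example use (hypothetical host name, adjust to your setup):
# url = "http://photon.lan:2322/api?q=test&limit=1"
# ok = wait_until_ready(lambda: photon_responds(url))
```

Keeping the probe separate from the retry loop makes the loop easy to test without a running server, and lets you swap in a different readiness check later.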

The API Side by Side

The two APIs look similar but aren’t the same. Knowing the differences matters when you’re wiring them up.

Nominatim search:

# Structured or freeform, returns JSON array
curl "http://nominatim.lan/search?q=1600+Pennsylvania+Ave&format=json&addressdetails=1&limit=5"

Photon search:

# Returns GeoJSON FeatureCollection
curl "http://photon.lan:2322/api?q=1600+Pensilvania+Ave&limit=5&lang=en"

# Bias results toward a location (good for "near me" autocomplete)
curl "http://photon.lan:2322/api?q=pizza&lat=38.8977&lon=-77.0365&limit=5"

# Filter by OSM tag (key:value)
curl "http://photon.lan:2322/api?q=starbucks&osm_tag=amenity:cafe&limit=5"

The Photon response is a GeoJSON FeatureCollection. Each feature’s properties object includes fields such as osm_id, osm_type, osm_key, osm_value, name, country, state, city, postcode, street, and housenumber. It does not include a numeric importance score the way Nominatim does. Ordering is determined by the search engine’s relevance scoring: the first result is the best fuzzy match, not necessarily the highest-importance OSM object globally.
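As a rough illustration of consuming that response, the sketch below turns features into autocomplete suggestions. The sample payload is hand-written for this example (the osm_id and tag values are made up, not real Photon output), and to_suggestion is a hypothetical helper. It relies only on the property names listed above and on GeoJSON’s [longitude, latitude] coordinate order.

```python
import json

# Hand-written sample in the shape described above; values are illustrative.
SAMPLE = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [-77.0365, 38.8977]},
      "properties": {
        "osm_id": 1, "osm_type": "W",
        "osm_key": "tourism", "osm_value": "attraction",
        "name": "White House",
        "housenumber": "1600", "street": "Pennsylvania Avenue NW",
        "city": "Washington", "state": "District of Columbia",
        "postcode": "20500", "country": "United States"
      }
    }
  ]
}
"""


def to_suggestion(feature: dict) -> dict:
    """Build a display label and a lat/lon pair from one GeoJSON feature."""
    props = feature.get("properties", {})
    lon, lat = feature["geometry"]["coordinates"][:2]  # GeoJSON is [lon, lat]
    street = " ".join(p for p in (props.get("housenumber"), props.get("street")) if p)
    parts = [props.get("name"), street, props.get("city"),
             props.get("state"), props.get("country")]
    # Skip fields the feature does not carry; many POIs have no house number.
    label = ", ".join(p for p in parts if p)
    return {"label": label, "lat": lat, "lon": lon}


# Keep the server's order: the first feature is the best match.
suggestions = [to_suggestion(f) for f in json.loads(SAMPLE)["features"]]
print(suggestions[0]["label"])
```

Every property is read with .get because not every feature carries every field, so a missing street or city simply drops out of the label.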

The lang parameter is worth knowing about. Photon will try to return names in the requested language if the OSM data has them: ?lang=de for German city names, ?lang=fr for French. This is useful for multilingual UIs. Nominatim does not read lang; its equivalent is the accept-language parameter.

The Hybrid Pattern: Use Both

This is the architecture that actually makes sense for most serious deployments:

Photon for the autocomplete box: handles partial input, fuzzy matching, “as you type” UX, POI search by name or category
Nominatim for structured geocoding and reverse geocoding: taking a finalized address and turning it into coordinates, or turning coordinates into a human-readable address

Neither one replaces the other. Photon is weak at reverse geocoding: it technically supports it, but Nominatim’s spatial indexing is far more accurate. Nominatim is bad at autocomplete. They’re complementary.

The routing logic in your app looks like this: user is typing into a search box → hit Photon. User submits a structured form (street, city, zip) or you’re batch-processing records → hit Nominatim. User taps a pin on the map to look up the address → hit Nominatim’s reverse endpoint.

If both live in the same Compose stack, behind Caddy or Nginx, the routing can live at the proxy level:

maps.lan {
  # autocomplete / fuzzy search
  handle /photon/* {
    uri strip_prefix /photon
    reverse_proxy photon:2322
  }

  # structured + reverse geocoding
  handle /nominatim/* {
    uri strip_prefix /nominatim
    reverse_proxy nominatim:8080
  }
}

Your frontend calls /photon/api?q=... for the search bar and /nominatim/reverse?lat=...&lon=... for everything else. Clean, testable, no routing logic in the application code.
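On the client side, that split reduces to two small URL builders. This is a sketch under assumptions: the function names are made up, BASE_URL assumes the maps.lan site from the proxy config is served over HTTPS, and format=json is added to the reverse call to request JSON output. The query parameters themselves (q, limit, lang, lat, lon) are the ones used in the curl examples earlier.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed base URL for the proxy site; change scheme and host to match yours.
BASE_URL = "https://maps.lan"


def autocomplete_url(query: str, limit: int = 5, lang: str = "en",
                     lat: Optional[float] = None,
                     lon: Optional[float] = None) -> str:
    """URL for the search box: goes to Photon behind /photon/."""
    params = {"q": query, "limit": limit, "lang": lang}
    if lat is not None and lon is not None:
        # Optional location bias for "near me" style suggestions.
        params["lat"] = lat
        params["lon"] = lon
    return f"{BASE_URL}/photon/api?{urlencode(params)}"


def reverse_url(lat: float, lon: float) -> str:
    """URL for a map tap: goes to Nominatim's reverse endpoint."""
    params = {"lat": lat, "lon": lon, "format": "json"}
    return f"{BASE_URL}/nominatim/reverse?{urlencode(params)}"


print(autocomplete_url("1600 Pensilvania Ave"))
print(reverse_url(38.8977, -77.0365))
```

Because urlencode handles escaping, user input with spaces or punctuation goes into the query string safely without manual string concatenation.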

Resource Math (The Search Engine Is Not Polite)

Here’s the honest resource conversation.

The OpenSearch engine on a single node, running the Photon index for North America, needs at minimum 4 to 6 GB of JVM heap, set via JAVA_OPTS=-Xms4g -Xmx4g in the Compose file above, plus overhead for the OS page cache. In practice you’re looking at 6 to 8 GB of RAM dedicated to Photon for a regional index. The planet index wants 16 to 24 GB.

Disk: North America Photon index at rest is roughly 20 to 25 GB in the data directory. Planet is 80 to 100 GB. The search engine also needs working space for merges and compactions, so add 30 to 40% headroom.
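The headroom arithmetic is simple enough to check by hand, but here it is as a small Python sketch so you can plug in your own index size. The function name is hypothetical; the inputs are the size ranges and the 30 to 40% headroom quoted above, pairing the low size with the low headroom and the high size with the high headroom.

```python
def with_headroom(size_gb: tuple[float, float],
                  headroom: tuple[float, float] = (0.30, 0.40)) -> tuple[float, float]:
    """Return (low, high) disk to provision for an index size range in GB."""
    low, high = size_gb
    return (round(low * (1 + headroom[0]), 1),
            round(high * (1 + headroom[1]), 1))


print(with_headroom((20, 25)))    # North America index
print(with_headroom((80, 100)))   # planet index
```

That works out to roughly 26 to 35 GB of disk for North America and 104 to 140 GB for the planet, before counting the compressed dump you download and extract from.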

Photon itself (the Java HTTP server) is quiet: 512 MB to 1 GB is fine.

Add Nominatim running alongside (PostGIS, its own memory requirements) and you’re looking at a box with 16 to 32 GB of RAM for a comfortable full hybrid stack with North America data. If you’re running the planet index, budget 64 GB.

On a box that can actually handle it, the performance is very good. Photon fuzzy queries come back in 20 to 80ms. Nominatim reverse geocoding is 5 to 20ms. Your users will not notice the latency.

If you’re on a server with 16 GB of RAM or less, pick one or the other. Nominatim alone for exact-match use cases. Photon alone if autocomplete is the primary need and you’re willing to accept that reverse geocoding is a weaker experience.

When to Skip Photon

Photon is the right tool specifically when your users are typing freeform into a search box and you can’t guarantee input quality. That’s a meaningful constraint. Many use cases don’t have it.

If your input comes from structured forms, server-to-server API calls, or batch CSV files where you control the formatting, Nominatim is enough and Photon adds cost and complexity for nothing. B2B geocoding pipelines almost universally have clean inputs. Internal tools with a small user base can assume some care in input. Import jobs don’t typo.

If your server is on the smaller side, the search-engine overhead is real. You can’t run Photon responsibly on 4 GB of RAM. If that’s your constraint, stay on Nominatim and accept the exact-match limitation. It’s the honest call.

If your data coverage is narrow (say you’re geocoding addresses in one city for a delivery app, and your import reflects that), the typo surface area shrinks dramatically. Users typing 123 Main St into a local-only geocoder are probably not hitting exotic misspellings that Nominatim can’t handle.

Staying In Sync

Photon’s data is a snapshot. Nominatim can apply incremental OSM diffs daily via replication. Photon cannot: the search index is static until you refresh it.

The approach most people use is a periodic full re-index from Nominatim. Monthly is common for non-critical use cases. Weekly if your coverage area is actively being mapped (new developments, changing city layouts). Full planet re-indexes from the prebuilt dump are just a re-download and re-extract: annoying, but not complex.

If you built from your own Nominatim DB, the flow is: run Nominatim replication to keep the PostGIS data fresh, then periodically re-export to Photon with the nominatim-import exporter. You do this to a separate index, then alias-swap to make the new index live without downtime. The Photon docs cover the alias pattern; it’s standard OpenSearch practice.

The practical consequence: if someone reports that a newly-mapped street doesn’t show up in autocomplete, the answer is “next re-index cycle.” If that’s unacceptable, Nominatim with replication gives you more current data, at the cost of the UX compromise on fuzzy search.

Wrapping Up

The pattern that wins: Nominatim for structured geocoding and reverse, Photon for autocomplete and fuzzy search. Run them side by side on hardware that can handle both, route at the proxy level, and your users get a search experience that forgives the typo. Your batch pipelines get the precision they need. Nobody has to compromise.

If you only have one box and it’s not massive, pick based on your primary use case. Autocomplete for end users → Photon. Batch geocoding or structured forms → Nominatim. Either beats a rate-limited commercial API on latency and privacy.

The Compose file and Caddy routing config for the full hybrid stack are in the examples repo. Clone it, adjust the region and language settings, and you’re up in an afternoon.

Your users will type Pensilvania and get the White House. Honestly, that’s the bar.

Related Reading

Nominatim: Self-Hosted Geocoding, the foundation you need before adding Photon
Nominatim vs Photon vs Pelias: choosing the right geocoder for your use case
Nominatim Hardware Sizing: region vs planet resource requirements
The Full Self-Hosted Maps Stack: Nominatim + PostGIS + Tiles, tiles + geocoding together
