
Nominatim: Self-Hosted Geocoding

By SumGuy 11 min read
The Email That Started This

You build a side project. It does something cute with addresses. Maybe a hiking app, maybe a delivery dashboard, maybe just a thing that turns a list of customer ZIPs into a map. You wire up Google Maps Geocoding because it’s “the obvious choice.” Three months later there’s an email about your usage tier, and your enthusiasm for the side project drops by 80%.

That’s the geocoding tax. Per-request pricing on commercial APIs makes total sense for a Fortune 500. It makes zero sense for the home labber and the indie dev. Honestly, even small businesses get hosed. Plenty of commercial geocoders are built on OpenStreetMap data anyway, and you can run an OSM-based engine yourself, on your own hardware, for the cost of one SSD and a weekend.

The tool is called Nominatim. It’s the geocoder powering openstreetmap.org itself. Forward geocoding (address → coordinates), reverse geocoding (coordinates → address), structured search. All of it. And running it is honestly less painful than people make it sound — as long as you don’t try to import the entire planet on a Raspberry Pi.

Full example: Working Compose file and config at github.com/KingPin/sumguy-examples/tree/main/self-hosting/nominatim-self-hosted-geocoding-server

What Nominatim Actually Is

Nominatim is a geocoder built on top of PostgreSQL + PostGIS, fed with OpenStreetMap data. When you hand it “1600 Pennsylvania Avenue, Washington”, it tokenizes the query, hits a bunch of search indexes, and gives you back a lat/lon plus the matched OSM object. When you hand it the coordinates 38.8977, -77.0365, it walks the spatial index and returns the closest meaningful address.

It’s the same engine the public site at nominatim.openstreetmap.org uses. The public service is rate-limited to roughly 1 request per second, which is fine for a status page and useless for anything real. That’s the whole reason self-hosting exists.

Worth knowing: Nominatim is OSM-only. If the data isn’t in OpenStreetMap, Nominatim can’t find it. For most use cases — addresses in populated areas — that’s plenty. For obscure POIs and addresses in countries with sparse OSM coverage, it gets thinner. We’ll come back to that.

The Cost Math That Pushed Me Here

I won’t quote prices because they shift, but the directional shape is consistent. Commercial geocoding APIs charge somewhere in the neighborhood of a few dollars per thousand requests. Sounds tiny. Now imagine a hobby app with 5,000 active users, each producing 20 location lookups a month. That’s 100,000 requests. Now imagine you wrote a script that backfills addresses on a million old records.

Rough rule of thumb I use: if you’re doing more than ~50,000 lookups a month, the math has already swung toward self-hosting. The break-even is faster than people think because the recurring cost of a small VPS or a corner of your home server is essentially fixed, while API costs scale linearly with usage.
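To make the break-even idea concrete, here is a minimal Python sketch. Both dollar figures are hypothetical placeholders, picked only so the break-even lands near the rule of thumb above; plug in your own API price and hosting cost.

```python
def monthly_api_cost(requests: int, price_per_thousand: float) -> float:
    """Linear API pricing: the bill grows with every request."""
    return requests / 1000 * price_per_thousand


def break_even_requests(fixed_monthly_cost: float, price_per_thousand: float) -> float:
    """Monthly request volume at which the API bill equals the fixed hosting cost."""
    return fixed_monthly_cost / price_per_thousand * 1000


# Hypothetical numbers, not real quotes from any provider.
PRICE_PER_THOUSAND = 2.0    # assumed API price in dollars per 1,000 requests
FIXED_MONTHLY_COST = 100.0  # assumed VPS or amortized hardware cost, plus some ops time

hobby_app_requests = 5_000 * 20  # 5,000 users x 20 lookups each = 100,000 requests

print(monthly_api_cost(hobby_app_requests, PRICE_PER_THOUSAND))     # API bill for the hobby app
print(break_even_requests(FIXED_MONTHLY_COST, PRICE_PER_THOUSAND))  # volume where the lines cross
```

The point is the shape, not the numbers: one line is flat, the other climbs with usage, and a single backfill script can push you far past where they cross.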

There’s a second cost too: latency. A round-trip to a commercial API is 100–300ms. A local Nominatim on the same LAN is 5–20ms. If you’re doing batch work, that matters more than the dollars.
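Here is the same latency argument as arithmetic, using the round-trip ranges above and the million-record backfill from earlier. It assumes lookups run strictly one after another with no other overhead; parallel requests shrink both numbers, but the ratio stays the same.

```python
def batch_hours(records: int, latency_ms: float) -> float:
    """Wall-clock hours for a sequential batch where each lookup waits one round-trip."""
    return records * latency_ms / 1000 / 3600


RECORDS = 1_000_000

scenarios = [
    ("commercial API, 100 ms", 100),
    ("commercial API, 300 ms", 300),
    ("local Nominatim, 5 ms", 5),
    ("local Nominatim, 20 ms", 20),
]

for label, latency_ms in scenarios:
    print(f"{label}: {batch_hours(RECORDS, latency_ms):.1f} h")
```

That works out to roughly 28 to 83 hours against the remote API versus about 1.4 to 5.6 hours locally.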

Pick Your PBF: Planet vs Region

The most common reason people bounce off Nominatim is that they try to import the entire planet on the wrong hardware. Don’t do that. The full planet PBF from OpenStreetMap is around 80 GB compressed and balloons to 700 GB+ once Nominatim builds its indexes. The import takes days even on serious hardware.

You almost certainly don’t need that. Geofabrik publishes regional extracts at every level you could want: continents, countries, and for larger countries, individual states or regions.

If you’re in the US and you only care about US addresses, the north-america-latest.osm.pbf extract is the sweet spot. It fits comfortably on a 1 TB NVMe with room to spare, and the import wraps in 6–12 hours on reasonable hardware. If you only need one state or region, smaller extracts import in under an hour.

We’ll dig into the hardware/RAM/disk math more deeply in the hardware sizing post. For now: pick the smallest extract that covers what you actually need. You can always import a bigger one later.
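Whichever extract you pick, you usually need two URLs for it: the PBF itself and its update feed. A small sketch of deriving both, assuming Geofabrik keeps its current naming pattern (`<region>-latest.osm.pbf` and `<region>-updates/`). The helper name `geofabrik_urls` is made up for this example, and it’s worth confirming the paths on the Geofabrik download page before kicking off a long import.

```python
GEOFABRIK_BASE = "https://download.geofabrik.de"


def geofabrik_urls(region_path: str) -> dict[str, str]:
    """Return the extract URL and the matching update-feed URL for a region path.

    region_path is the path as shown on the Geofabrik site,
    e.g. "north-america" or "europe/monaco".
    """
    region = region_path.strip("/")
    return {
        "PBF_URL": f"{GEOFABRIK_BASE}/{region}-latest.osm.pbf",
        "REPLICATION_URL": f"{GEOFABRIK_BASE}/{region}-updates/",
    }


for name, url in geofabrik_urls("europe/monaco").items():
    print(f"{name}={url}")
```

A tiny extract like Monaco is a handy smoke test: it imports in minutes, so you can check the whole stack before committing to the big one.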

The Docker Setup (mediagis/nominatim)

The mediagis/nominatim image is the de-facto community Docker image for Nominatim. It bundles a working Postgres + Nominatim + a startup script that handles the import pipeline. Maintained, well-documented, sane defaults.

Here’s a working Compose stack that imports a regional extract:

docker-compose.yml
services:
  nominatim:
    image: mediagis/nominatim:4.5
    container_name: nominatim
    ports:
      - "8080:8080"
    environment:
      PBF_URL: https://download.geofabrik.de/north-america-latest.osm.pbf
      REPLICATION_URL: https://download.geofabrik.de/north-america-updates/
      NOMINATIM_PASSWORD: ${NOMINATIM_PASSWORD}
      IMPORT_STYLE: full
      THREADS: 4
    volumes:
      - nominatim-data:/var/lib/postgresql/16/main
      - nominatim-flatnode:/nominatim/flatnode
    shm_size: 1gb
    restart: unless-stopped
volumes:
  nominatim-data:
  nominatim-flatnode:

And a .env next to it:

.env
NOMINATIM_PASSWORD=please-change-me-for-real

Bring it up:

Terminal window
docker compose up -d
docker compose logs -f nominatim

The first time you start it, the container will download the PBF, set up Postgres, and run the import. This is the part that takes hours. Watch the logs — you’ll see phases like Downloading..., Importing..., Indexing..., Updating word counts.... When it finishes, you’ll see a healthy server listening on port 8080.
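If you’d rather not babysit the logs, you can poll until the API answers. This sketch assumes the 8080 port mapping from the Compose file and uses Nominatim’s /status endpoint, which returns HTTP 200 once the database is usable. The function names are illustrative, and the check and sleep are passed in as arguments so the loop can be exercised without a running server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

STATUS_URL = "http://localhost:8080/status"  # assumes the port mapping from the Compose file


def http_status(url: str) -> int:
    """Return the HTTP status code for url, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # server answered, but with an error status
    except OSError:
        return 0  # connection refused, timeout, DNS failure, and so on


def wait_until_ready(check: Callable[[], int], attempts: int, delay_s: float,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll check() until it returns 200 or the attempts run out."""
    for attempt in range(attempts):
        if check() == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False


# Example: check once a minute for up to 12 hours.
# wait_until_ready(lambda: http_status(STATUS_URL), attempts=720, delay_s=60)
```

Hook the result into whatever notifies you, and go do something else while the import grinds.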

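A few env vars worth knowing about, all of them used in the Compose file above: PBF_URL is the extract the container downloads and imports on first start. REPLICATION_URL is the update feed that replication pulls diffs from. NOMINATIM_PASSWORD sets the password for the database user. IMPORT_STYLE controls how much of the OSM data gets imported: full takes everything, while lighter styles such as address or street give you a smaller database with less coverage. THREADS is the number of threads used during the import.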

What the Import Actually Does

Under the hood, the import is a multi-stage pipeline:

  1. Parse the PBF into Postgres staging tables
  2. Build the place table — the canonical “thing at a coordinate” table
  3. Build the search table — tokenized text indexes
  4. Build the word counts — frequency stats used for ranking
  5. Build the indexes — GiST spatial indexes, trigram indexes for fuzzy matching

This is CPU-bound during parsing, IO-bound during indexing, and RAM-hungry across the board. A few rules:

Hitting the API

Once the import is done, Nominatim exposes a REST API on port 8080. The endpoints match the public Nominatim site, so any client library written for nominatim.openstreetmap.org works against your local one — just point it at your URL.

Terminal window
# Forward geocode
curl "http://localhost:8080/search?q=1600+Pennsylvania+Ave+Washington&format=json"
# Reverse geocode
curl "http://localhost:8080/reverse?lat=38.8977&lon=-77.0365&format=json"
# Structured search
curl "http://localhost:8080/search?street=Pennsylvania+Avenue&city=Washington&format=json"
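The same three calls from code, as a minimal Python sketch using only the standard library. It assumes the local instance on port 8080 from the Compose file; the function names are made up for this example. One thing to watch: Nominatim returns lat and lon as strings, so convert them before doing math.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8080"  # assumes the port mapping from the Compose file


def search_url(query: str, base: str = BASE_URL) -> str:
    """URL for a free-form forward geocode."""
    return f"{base}/search?" + urllib.parse.urlencode({"q": query, "format": "json"})


def reverse_url(lat: float, lon: float, base: str = BASE_URL) -> str:
    """URL for a reverse geocode of one coordinate pair."""
    return f"{base}/reverse?" + urllib.parse.urlencode({"lat": lat, "lon": lon, "format": "json"})


def structured_url(street: str, city: str, base: str = BASE_URL) -> str:
    """URL for a structured search by street and city."""
    params = {"street": street, "city": city, "format": "json"}
    return f"{base}/search?" + urllib.parse.urlencode(params)


def first_coordinates(results: list) -> Optional[tuple]:
    """Return (lat, lon) of the top search hit as floats, or None if nothing matched."""
    if not results:
        return None
    top = results[0]
    return float(top["lat"]), float(top["lon"])


def fetch_json(url: str):
    """GET a URL and decode the JSON body (needs a running Nominatim)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


# Example, with the server running:
# print(first_coordinates(fetch_json(search_url("1600 Pennsylvania Ave Washington"))))
```

Building the query string with urlencode instead of by hand keeps spaces, commas, and non-ASCII street names from breaking the request.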

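The JSON response includes lat, lon, display_name, importance (a 0–1 ranking score), boundingbox for the matched feature, and an address object with road/city/state/country/postcode. Reverse lookups return the address object by default; on /search you have to ask for it with addressdetails=1. Useful query parameters: limit (cap the number of results), addressdetails=1 (include the address breakdown), countrycodes (restrict results to a comma-separated list of country codes), and accept-language (preferred language for returned names).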

Rate limit it yourself if you’re going to expose it. The defaults don’t include any throttling.
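In practice throttling usually lives in the reverse proxy or an API gateway, but the idea is easy to see in a few lines. This is a sketch of a token bucket, one common way to rate limit; it is not something Nominatim or the mediagis image ships. The class name and parameters are illustrative.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_s` tokens per second."""

    def __init__(self, rate_per_s: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate_per_s = rate_per_s
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start full so the first burst goes through
        self._last = clock()

    def allow(self) -> bool:
        """Return True and spend one token if the request may proceed."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the time that passed, never beyond capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate_per_s)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


# Example: 5 requests per second with bursts of up to 10.
bucket = TokenBucket(rate_per_s=5, capacity=10)
print(bucket.allow())
```

For a shared instance you would keep one bucket per client, keyed by IP address or API key, and answer with HTTP 429 whenever allow() returns False.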

Updates: Keep It Fresh Without Re-Importing

OSM data changes daily. New buildings get mapped, addresses get corrected, roads get added. You don’t want to re-import the whole region every week.

Setting REPLICATION_URL in your Compose file tells the container where to fetch diffs from, and the mediagis image’s start.sh supports an update mode on top of that. The simple approach is a periodic update via cron on the host:

Terminal window
# Run a replication update once
docker exec nominatim sudo -u nominatim nominatim replication --once
# Or run it as a background daemon inside the container
docker exec -d nominatim sudo -u nominatim nominatim replication

Daily diffs are usually plenty. Hourly is overkill unless you have a real reason. The diffs are small — single megabytes — and apply in seconds.
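If you want cron to call something a little smarter than a raw one-liner, here is a small wrapper around the one-shot command shown above. It assumes the container is named nominatim, as in the Compose file; the function names and the injectable runner are illustrative choices for this sketch.

```python
import subprocess


def replication_command(container: str = "nominatim") -> list:
    """The one-shot update command from the text, as an argv list."""
    return ["docker", "exec", container,
            "sudo", "-u", "nominatim", "nominatim", "replication", "--once"]


def run_update(container: str = "nominatim", runner=subprocess.run) -> bool:
    """Run one replication pass; return True if the command exited with 0."""
    result = runner(replication_command(container), capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the error so cron's mail or your log collector picks it up.
        print(result.stderr)
    return result.returncode == 0


# Example, on the Docker host:
# raise SystemExit(0 if run_update() else 1)
```

Exiting non-zero on failure means a broken update shows up wherever you already watch cron jobs. A daily crontab entry pointing at the script (the path is yours to choose) is all the scheduling it needs.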

Putting Caddy In Front

You probably want this on a nice hostname with TLS, even if it’s only on the LAN. Caddy makes this trivial:

Caddyfile
geocode.lan {
    reverse_proxy nominatim:8080
}

Pop that into a Caddy container on the same Docker network and you’re done. If you’re going to expose it to the internet — which honestly, you probably shouldn’t — at minimum add basic auth, an IP allowlist, or a real auth proxy in front. Public Nominatim instances get hammered by bots within hours of going live.

Things That Will Bite You

When Nominatim Is the Wrong Tool

Nominatim is excellent at structured address lookup and reverse geocoding. It’s mediocre at fuzzy autocomplete (the “as you type” search experience). If your use case is a search box where the user types “pizz” and expects “Pizza Hut” to pop up in 50ms, you want Photon: same OSM data, different indexing strategy, optimized for typeahead.

If you need geocoding against multiple data sources (OSM + government address files + GeoNames + custom POI databases), you want Pelias. It’s heavier to operate, but it’s the right tool when “OSM only” is a hard limitation.

If you do <10,000 lookups a month and you genuinely don’t care about the cost or the privacy, just use a commercial API. Self-hosting has a real ops cost — disk, monitoring, replication updates, occasional debugging. Don’t do it for vanity.

Going deeper on the comparison? See Nominatim vs Photon vs Pelias — same OSM data, very different tradeoffs.

Wrapping Up

One Docker image, one regional PBF, a few hours of import time, and you have your own geocoder. No API keys, no per-request pricing, no third-party seeing every coordinate your app touches. The whole thing fits on a Mini PC.

If you’re going to dig in further: the hardware sizing post breaks down planet vs region requirements with real numbers. If you’re a Home Assistant user, reverse geocoding for HA without phoning home wires this into your smart home setup. And if you want the full self-hosted maps stack — geocoding plus tile serving plus PostGIS — the combo guide puts it all together.

Your 2 AM self will appreciate not getting paged about a billing alert.

