I wanted to find farm stays in Italy. Google Maps wasn't enough.
Planning a family trip to Puglia, I hit a wall with Google Maps. So I built a pipeline that searches, deduplicates, and maps places automatically.
- Python
- Maps
- Side Project
The trip that started it all
My family wanted to spend a week in Puglia — the heel of Italy’s boot, where the coast is rocky, the food is incredible, and old farmhouses dot the countryside. The plan was simple: find an agriturismo (a working farm that hosts guests), somewhere quiet, close to the sea.
I opened Google Maps, typed “agriturismo near Punta Prosciutto,” and got a handful of results. Some were plain hotels that happened to tag themselves as farm stays. Others were buried because they didn’t use the word “agriturismo” at all — they called themselves masseria, casale, trullo, or tenuta. These are all regional names for the same kind of place, but Google doesn’t know that unless you search each term individually.
After an hour of tabbing between searches, copying addresses into a spreadsheet, and cross-referencing ratings, I had maybe 15 places. I knew there were more. I just couldn’t find them efficiently. And even the ones I found — I had no owner emails, no direct phone numbers, just whatever Google decided to show. I wanted one clean spreadsheet with every option side by side: name, rating, contact details, location — so we could actually compare and decide.
The real problem
It wasn’t a Google Maps problem. It was a search recall problem and a data consolidation problem. Italian farm stays use a dozen different architectural names that vary by region. A single search query misses most of them. And even if you run multiple searches, you end up with duplicates — the same place appearing under slightly different names or coordinates. On top of that, the useful information is scattered: ratings on Google, emails buried on websites, phone numbers locked in Italian registry sites. There was no single place to see it all.
I needed a way to:
- Run many searches at once with different keywords
- Filter out irrelevant results (standard hotels, restaurants)
- Deduplicate places that show up across searches
- Pull contact info that Google doesn’t surface — owner emails, phone numbers
- Consolidate everything into one spreadsheet with all key info
- See it all on an interactive map
Building the pipeline
I built a Python pipeline called Places Finder — written end-to-end with Claude Code as my pair, from the first config sketch to the deduplication logic and HTML map renderer. It works in five stages:
Discovery. I feed it a location, a radius, and a list of search queries — one for each regional term like masseria, podere, borgo. It runs all of them through Google’s Places API and filters results using a keyword allowlist, so a place tagged “hotel” without any farm-related keywords gets dropped.
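To make the allowlist idea concrete, here is a minimal sketch of that kind of filter. It is not the pipeline's actual code: the keyword set, the `is_relevant` helper, and the shape of the place records (`name` and `types` fields) are all assumptions for illustration, and the place names are made up.

```python
# Hypothetical allowlist; the real list would come from the pipeline's config.
FARM_KEYWORDS = {"agriturismo", "masseria", "casale", "trullo", "tenuta", "podere", "borgo"}


def is_relevant(place: dict, allowlist: set[str] = FARM_KEYWORDS) -> bool:
    """Keep a place only if its name or type tags mention an allowlisted keyword."""
    text = " ".join([place.get("name", ""), *place.get("types", [])]).lower()
    return any(keyword in text for keyword in allowlist)


# Made-up search results, shaped like simplified place records.
results = [
    {"name": "Masseria Example", "types": ["lodging"]},
    {"name": "Grand Hotel Example", "types": ["hotel", "lodging"]},
    {"name": "Agriturismo Example", "types": ["hotel"]},
]

kept = [p["name"] for p in results if is_relevant(p)]
print(kept)
```

The third record shows why an allowlist works better here than a blocklist: a place tagged "hotel" survives as long as something farm-related appears in its name.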
Enrichment. For each place, the pipeline visits its website and looks for a contact or legal page. It extracts owner email addresses — something Google almost never shows.
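The extraction step can be sketched with a regular expression over the page's HTML. This is a rough illustration under assumptions, not the pipeline's real logic: the pattern is deliberately simple, `extract_emails` is a hypothetical helper, and the sample markup and address are invented. Fetching the page and locating the contact or legal link are left out.

```python
import re

# Simple pattern for illustration; it does not cover every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Image names like "logo@2x.png" look like emails to the pattern, so skip them.
IGNORED_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")


def extract_emails(html: str) -> list[str]:
    """Return unique, lowercased email addresses in order of first appearance."""
    found: list[str] = []
    for match in EMAIL_RE.findall(html):
        email = match.lower()
        if email.endswith(IGNORED_SUFFIXES):
            continue
        if email not in found:
            found.append(email)
    return found


# Invented contact-page snippet.
sample = (
    '<a href="mailto:Info@masseria-example.it">Scrivici</a> '
    '<img src="logo@2x.png"> info@masseria-example.it'
)
print(extract_emails(sample))
```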
Domain scraping. Italy has a dedicated agriturismo registry at agriturismo.it with structured data: phone numbers, license codes, details you won’t find on Google. The pipeline scrapes it by region and province.
Merging. Now I have CSVs from multiple sources. The merge step deduplicates by coordinate proximity: any two records within 50 meters of each other are treated as the same place. Between 50 and 250 meters, it checks whether the names share significant tokens before merging. This prevents two genuinely different farms across the road from collapsing into one record.
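The two-tier rule can be sketched in a few lines. The 50 and 250 meter thresholds come from the description above; everything else is an assumption: the haversine distance, the `GENERIC_TOKENS` stop list that decides which name tokens count as "significant", the inclusive boundaries, and the keep-the-first-record policy in `dedupe`. The farm names and coordinates are invented.

```python
import math
import re

SAME_PLACE_M = 50    # closer than this: always a duplicate
MAYBE_SAME_M = 250   # up to this: duplicate only if the names overlap

# Hypothetical stop list of generic words that should not count as a name match.
GENERIC_TOKENS = {"agriturismo", "masseria", "casale", "trullo", "tenuta",
                  "podere", "borgo", "hotel"}


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/lon points."""
    r = 6_371_000  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def significant_tokens(name: str) -> set[str]:
    """Lowercased words longer than two letters, minus generic category words."""
    words = re.findall(r"[^\W\d_]+", name.lower())
    return {w for w in words if len(w) > 2 and w not in GENERIC_TOKENS}


def is_duplicate(a: dict, b: dict) -> bool:
    distance = haversine_m(a["lat"], a["lon"], b["lat"], b["lon"])
    if distance <= SAME_PLACE_M:
        return True
    if distance <= MAYBE_SAME_M:
        return bool(significant_tokens(a["name"]) & significant_tokens(b["name"]))
    return False


def dedupe(places: list[dict]) -> list[dict]:
    """Keep the first record of each duplicate group (quadratic, fine for small sets)."""
    unique: list[dict] = []
    for place in places:
        if not any(is_duplicate(place, kept) for kept in unique):
            unique.append(place)
    return unique


places = [
    {"name": "Masseria Le Querce", "lat": 40.3000, "lon": 17.8000},
    {"name": "Le Querce Agriturismo", "lat": 40.3007, "lon": 17.8000},  # ~78 m away, shared token
    {"name": "Masseria San Pietro", "lat": 40.3007, "lon": 17.8000},    # ~78 m away, no shared token
]
print([p["name"] for p in dedupe(places)])
```

Stripping generic words like "masseria" before comparing matters: without it, nearly every pair of neighboring farms would share a token and merge.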
Map rendering. The final step generates a self-contained HTML map. Markers are color-coded by rating, icons distinguish Google results from registry data, and you can toggle between satellite and street views. No server needed — just open the file.
What I learned
Regional vocabulary matters more than you’d think. The config file for Puglia farm stays has over a dozen search terms. Without them, recall drops dramatically. This applies to any category where naming conventions vary — try searching for “craft brewery” vs. “taproom” vs. “brewpub.”
Deduplication is harder than searching. Two entries for the same place might have coordinates 80 meters apart and slightly different names, so exact matching on either field misses them. The two-tier system (coordinates first, then shared name tokens for anything between 50 and 250 meters) catches most duplicates while avoiding false merges of genuinely separate neighbors.
A good config beats more code. The entire pipeline is driven by a YAML profile. Switching from Italian farm stays to, say, specialty coffee shops in Berlin means writing a new config — not new code. That decision saved me from building category-specific logic I’d never reuse.
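As a rough picture of what "a new config, not new code" means, here is a sketch of a profile and the one function that consumes it. The real pipeline reads a YAML file; this version uses a Python dataclass so it runs without third-party packages. The field names, the `plan_searches` helper, the radius values, the coordinates (approximate), and the Berlin queries are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical search profile; field names are illustrative, not the real schema."""
    name: str
    location: tuple[float, float]  # (lat, lon) of the search center
    radius_m: int
    queries: list[str]
    allowlist: set[str] = field(default_factory=set)


def plan_searches(profile: Profile) -> list[dict]:
    """Expand a profile into one search request per query term."""
    return [
        {"query": q, "location": profile.location, "radius_m": profile.radius_m}
        for q in profile.queries
    ]


puglia = Profile(
    name="puglia-farm-stays",
    location=(40.29, 17.77),  # roughly Punta Prosciutto
    radius_m=20_000,          # assumed value
    queries=["agriturismo", "masseria", "casale", "trullo", "tenuta", "podere", "borgo"],
    allowlist={"agriturismo", "masseria", "casale", "trullo", "tenuta", "podere", "borgo"},
)

berlin = Profile(
    name="berlin-specialty-coffee",
    location=(52.52, 13.405),
    radius_m=8_000,           # assumed value
    queries=["specialty coffee", "coffee roastery", "third wave coffee"],
    allowlist={"coffee", "roastery", "kaffee"},
)

for profile in (puglia, berlin):
    print(profile.name, len(plan_searches(profile)), "searches")
```

Both profiles pass through the same `plan_searches` function unchanged, which is the whole point: the category-specific knowledge lives in data.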
The result
I ended up with a map of 60+ agriturismi within a 20-minute drive of the coast, complete with ratings, contact details, and links.
But finding places was only half the job — I still had to contact them. With 60+ farms and no booking platform for most of them, that meant writing individual emails in Italian. So I used Claude Cowork to draft personalized outreach emails for each place, pulling from the spreadsheet data — name, location, what they offer. It turned a full day of writing into a quick review-and-send workflow.
We booked a masseria that didn’t appear in my first three Google searches. It was one of the best stays we’ve had.
The tool is open source on GitHub. If you’ve ever felt like Google Maps gives you 10 results when there should be 50, this might be useful.