🗺️ SpotThePlace: A Deep Learning Model for Country Classification Using Google Street View Images

SpotThePlace uses deep learning to classify the country in which a random Google Street View image was taken, reaching up to 92.7% accuracy in our experiments. The project also explores geographic regression to predict approximate locations from visual cues. At its core is a diverse dataset of 50,000 images collected through a custom scraping pipeline.
Introduction
Street View images carry geographic cues, such as landscapes, architecture, and environmental features, that are often unique to specific regions. SpotThePlace tackles the challenge of identifying the country and estimating the location of an image from these cues using modern deep learning models.
This project combines machine learning, web scraping, and data engineering to create a scalable framework for analyzing visual geographic data.
Dataset Creation
A reliable and diverse dataset is the cornerstone of this project. We collected 50,000 Street View images spanning urban and rural environments across four countries, combining a random point generator (built with geopandas and shapely) with a Selenium-based scraper.
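Generating random points inside a country can be done with rejection sampling: draw uniform coordinates inside the border's bounding box and keep only those that fall inside the border. The sketch below is a self-contained stand-in, assuming a single polygon given as (lon, lat) vertices; `point_in_polygon` and `random_points_in_polygon` are hypothetical helpers, and in practice shapely's `Polygon.contains` (with country borders loaded through geopandas) would replace the hand-written ray-casting test.

```python
import random


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test. `polygon` is a list of (lon, lat) vertices, not closed."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude where the edge crosses that line (y1 != y2 is guaranteed here).
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def random_points_in_polygon(polygon, num_points, seed=None, max_attempts=1_000_000):
    """Rejection sampling: uniform in the bounding box, keep points inside the polygon."""
    rng = random.Random(seed)
    lons = [x for x, _ in polygon]
    lats = [y for _, y in polygon]
    min_lon, max_lon = min(lons), max(lons)
    min_lat, max_lat = min(lats), max(lats)

    points = []
    for _ in range(max_attempts):
        if len(points) == num_points:
            break
        lon = rng.uniform(min_lon, max_lon)
        lat = rng.uniform(min_lat, max_lat)
        if point_in_polygon(lon, lat, polygon):
            points.append((lon, lat))
    if len(points) < num_points:
        raise RuntimeError("too many rejected samples; check the polygon")
    return points
```

Sampling uniformly in degrees slightly over-represents high latitudes per square kilometre, and real borders are often multi-part polygons; both are details a geopandas/shapely pipeline would have to handle.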

Key Dataset Features
- Balanced geographical diversity for each country.
- Automatic scraping pipeline to ensure high-quality images.
- Dataset tailored to support both classification and regression tasks.
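Supporting both tasks mostly comes down to what each sample stores. A minimal sketch, assuming each image is saved together with its country and the coordinates it was sampled at (the `StreetViewSample` record and helper names are hypothetical, not the project's schema): the country becomes an integer class index for classification, the (lat, lon) pair is the regression target, and the great-circle (haversine) distance is one common way to express a location error in kilometres.

```python
import math
from dataclasses import dataclass


@dataclass
class StreetViewSample:
    image_path: str
    country: str
    lat: float
    lon: float


def build_label_map(samples):
    """Map each country name to a stable integer class index (alphabetical order)."""
    return {name: i for i, name in enumerate(sorted({s.country for s in samples}))}


def targets(sample, label_map):
    """Return (classification target, regression target) for one sample."""
    return label_map[sample.country], (sample.lat, sample.lon)


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against a rounding error pushing sqrt(a) just above 1.
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


sample = StreetViewSample("images/0001.jpg", "France", 48.8566, 2.3522)  # hypothetical sample
label_map = build_label_map([sample])
print(targets(sample, label_map))  # (0, (48.8566, 2.3522))
```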
Example of dataset creation:
```python
from spottheplace import RandomPointGenerator, StreetViewScraper

# Generate random points within a country
generator = RandomPointGenerator()
country_points = generator.generate_points_in_country(country_name="France", num_points=1000)

# Scrape Street View images for these points
scraper = StreetViewScraper(headless=True)
scraper.get_streetview_from_dataframe(country_points)
```
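A Selenium-based scraper needs a page it can open for each sampled point. One option, and only an assumption about how such a scraper could address a location, is the documented Google Maps URLs format, in which `map_action=pano` opens Street View at a given `viewpoint`. The `streetview_url` helper below is hypothetical.

```python
from urllib.parse import urlencode


def streetview_url(lat, lon, heading=None):
    """Build a Google Maps URL that opens Street View near (lat, lon)."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("latitude/longitude out of range")
    params = {"api": 1, "map_action": "pano", "viewpoint": f"{lat},{lon}"}
    if heading is not None:
        params["heading"] = heading  # compass direction of the camera, in degrees
    # safe="," keeps the "lat,lon" separator readable instead of encoding it as %2C.
    return "https://www.google.com/maps/@?" + urlencode(params, safe=",")


print(streetview_url(48.8584, 2.2945))
# https://www.google.com/maps/@?api=1&map_action=pano&viewpoint=48.8584,2.2945
```

A random point often has no panorama nearby, so a real scraper also has to detect and skip locations where Street View fails to load.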

Deep Learning Models
We explored various architectures and fine-tuning strategies to optimize performance. The models tested included:
- ResNet Architectures: ResNet18, ResNet50, and ResNet152.
- Google Vision Transformer (ViT): google/vit-base-patch16-224.
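The ViT checkpoint name encodes its input geometry: `patch16-224` means 224×224 images split into 16×16 patches. A quick calculation (hypothetical helper names) gives the resulting sequence length, including the extra [CLS] token whose output is used for classification, and the number of raw values in each flattened RGB patch.

```python
def vit_num_tokens(image_size: int = 224, patch_size: int = 16) -> int:
    """Sequence length seen by a ViT: one token per patch plus the [CLS] token."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size  # 224 // 16 = 14
    return patches_per_side ** 2 + 1  # 14 * 14 + 1 = 197


def flattened_patch_dim(patch_size: int = 16, channels: int = 3) -> int:
    """Raw values in one flattened RGB patch, before the linear patch projection."""
    return patch_size * patch_size * channels  # 16 * 16 * 3 = 768


print(vit_num_tokens(), flattened_patch_dim())  # 197 768
```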

For the ResNet50 model, multiple levels of fine-tuning were evaluated:
- Unfreeze Classification Head Only: Train only the classification head (the fully connected layer).
- Unfreeze Classification Head and L4 Block: Train the classification head and the final residual block (Layer 4) while freezing the earlier layers.
- Unfreeze All Layers: Fully fine-tune the entire network.
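The three fine-tuning levels amount to choosing which parameters receive gradients. A minimal sketch, assuming torchvision's ResNet50, whose parameter names start with `conv1.`, `bn1.`, `layer1.` to `layer4.`, and `fc.`; the strategy labels are hypothetical. The selection logic is kept framework-free so it can be checked without PyTorch, and the trailing comment shows how it would be applied.

```python
# Parameter-name prefixes that stay trainable under each strategy
# (assumes torchvision-style ResNet naming: conv1, bn1, layer1..layer4, fc).
TRAINABLE_PREFIXES = {
    "head": ("fc.",),                   # classification head only
    "head+layer4": ("fc.", "layer4."),  # head and the last residual stage
    "all": ("",),                       # every parameter ("" prefixes every name)
}


def is_trainable(param_name: str, strategy: str) -> bool:
    """True if `param_name` should have requires_grad=True under `strategy`."""
    return param_name.startswith(TRAINABLE_PREFIXES[strategy])


# Applying it with PyTorch (not executed here):
# import torch, torchvision
# model = torchvision.models.resnet50(weights=torchvision.models.ResNet50_Weights.DEFAULT)
# model.fc = torch.nn.Linear(model.fc.in_features, 4)  # four countries
# for name, param in model.named_parameters():
#     param.requires_grad = is_trainable(name, "head+layer4")
```

Setting `requires_grad=False` stops weight updates, but BatchNorm layers in train mode still update their running statistics; calling `.eval()` on the frozen modules is the usual way to keep those fixed as well.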
Here are the results of the experiments:
| Metric | Experiment 1 | Experiment 2 | Experiment 3 |
|---|---|---|---|
| Accuracy | 0.803 | 0.921 | 0.902 |
💡 Note: Retraining all layers does not necessarily improve the model's performance.
Some other results:
| Metric | Experiment 4 | Experiment 5 | Experiment 6 |
|---|---|---|---|
| Accuracy | 0.899 | 0.927 | 0.719 |
💡 Note: A deeper model does not necessarily perform better.
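To make "deeper" concrete, the number in each ResNet name counts its weighted layers. With the standard stage configurations from the original ResNet paper (also used by torchvision), a basic block contributes two convolutions and a bottleneck block three, plus the stem convolution and the final fully connected layer; by convention the 1×1 projection shortcuts are not counted. The helper below is illustrative.

```python
# Blocks per stage (layer1..layer4) for the three ResNets tested.
RESNET_CONFIGS = {
    "resnet18": ("basic", [2, 2, 2, 2]),
    "resnet50": ("bottleneck", [3, 4, 6, 3]),
    "resnet152": ("bottleneck", [3, 8, 36, 3]),
}


def resnet_depth(block_type: str, blocks_per_stage: list[int]) -> int:
    """Weighted layers = convolutions in residual blocks + stem conv + final fc."""
    convs_per_block = {"basic": 2, "bottleneck": 3}[block_type]
    return convs_per_block * sum(blocks_per_stage) + 2


for name, (block_type, stages) in RESNET_CONFIGS.items():
    print(name, resnet_depth(block_type, stages))
# resnet18 18
# resnet50 50
# resnet152 152
```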
All models are hosted on Hugging Face and can be used with the model_usage.ipynb notebook.
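Whichever checkpoint is loaded, a classifier outputs one raw score (logit) per country. A small sketch of turning those scores into a ranked list of probabilities with a softmax; the function name and the example labels are hypothetical, and the actual label order must come from the checkpoint's own metadata.

```python
import math


def top_k_countries(logits, labels, k=3):
    """Softmax over logits, then return the k most likely (label, probability) pairs."""
    if len(logits) != len(labels):
        raise ValueError("one label per logit is required")
    m = max(logits)  # subtract the max before exp() for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```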
🧠 Explainability
Understanding why a deep learning model makes a decision is crucial for trustworthiness. SpotThePlace implements Grad-CAM visualizations for ResNet models, showing how specific regions in an image contribute to predictions:
```python
from spottheplace.ml import GradCam

MODEL_PATH = "path/to/model"  # placeholder: path to a trained ResNet checkpoint

grad_cam = GradCam(model_path=MODEL_PATH)
grad_cam.explain(image_path="path/to/image.jpg")
```
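The `GradCam` class is the project's own; the computation it is named after is standard. Grad-CAM weights each feature map of a convolutional layer (for a ResNet, typically the last stage, `layer4` in torchvision) by the spatial average of the class score's gradient with respect to that map, sums the weighted maps, and keeps only the positive evidence. A numpy sketch of that final step, assuming the activations and gradients have already been captured (for example with forward and backward hooks) as arrays of shape (K, H, W); the function names are hypothetical.

```python
import numpy as np


def grad_cam_map(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Combine (K, H, W) activations and gradients into an (H, W) heatmap in [0, 1]."""
    weights = gradients.mean(axis=(1, 2))             # alpha_k: one weight per channel
    cam = np.tensordot(weights, activations, axes=1)  # sum_k alpha_k * A_k -> (H, W)
    cam = np.maximum(cam, 0.0)                        # ReLU: keep positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def upsample_nearest(cam: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling, e.g. a 7x7 ResNet50 map with factor 32 -> 224x224."""
    return np.kron(cam, np.ones((factor, factor)))
```

The upsampled map can then be overlaid on the input image as a heatmap; smoother (bilinear) upsampling is a common alternative to the nearest-neighbour version used here.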

These visualizations show that the model bases its predictions on specific regions of the image, such as the road, vegetation, architecture, or the sky.
✅ Conclusion
SpotThePlace combines deep learning with a custom dataset-creation pipeline to tackle geographic classification and regression. It can serve as a tool for geospatial analysis, with potential applications ranging from urban planning to environmental studies.
Explore more on GitHub or use the pre-trained models hosted on Hugging Face.
GitHub @titouanlegourrierec · Email titouanlegourrierec@icloud.com
