🗺️ SpotThePlace: A Deep Learning Model for Country Classification Using Google Street View Images

Using random Google Street View images, SpotThePlace leverages cutting-edge deep learning techniques to classify the country of origin with high accuracy. The project also explores geographic regression to predict approximate locations based on visual cues. Central to this effort is the creation of a robust and diverse dataset of 50,000 images collected through a custom scraping pipeline.

🔍 Introduction

Geographic information embedded in Street View images provides insights into landscapes, architecture, and environmental factors unique to specific regions. SpotThePlace tackles the challenge of identifying countries and estimating locations from these images using ResNet and Vision Transformer models. The best classifier we tested, a fully fine-tuned ResNet152, reaches an accuracy of 0.927.

This project combines machine learning, web scraping, and data engineering to create a scalable framework for analyzing visual geographic data.

🌐 Dataset Conception

A reliable and diverse dataset is the cornerstone of this project. Using a combination of Selenium for web scraping and a Random Point Generator with geopandas and shapely, we collected 50,000 Street View images spanning urban and rural environments across four countries.

Key Dataset Features

Balanced geographical diversity for each country.
Automatic scraping pipeline to ensure high-quality images.
Dataset tailored to support both classification and regression tasks.

Example of dataset creation:

from spottheplace import RandomPointGenerator, StreetViewScraper  

# Generate random points within a country  
generator = RandomPointGenerator()  
country_points = generator.generate_points_in_country(country_name="France", num_points=1000)  

# Scrape Street View images for these points  
scraper = StreetViewScraper(headless=True)  
scraper.get_streetview_from_dataframe(country_points)  
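
The point generation step can be illustrated with rejection sampling: draw candidates uniformly from a region's bounding box and keep those that fall inside its outline. The sketch below is not the project's `RandomPointGenerator`; it uses only the standard library, and the function names and the triangular `toy_country` outline are hypothetical. With shapely, the `point_in_polygon` helper would typically be replaced by `Polygon.contains`.

```python
import random


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon.

    `polygon` is a list of (lon, lat) vertices; the last vertex is
    implicitly connected back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        # by a horizontal ray starting at the point.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def sample_points_in_polygon(polygon, num_points, seed=None):
    """Rejection sampling: draw from the bounding box, keep inside points."""
    rng = random.Random(seed)
    lons = [vertex[0] for vertex in polygon]
    lats = [vertex[1] for vertex in polygon]
    min_lon, max_lon = min(lons), max(lons)
    min_lat, max_lat = min(lats), max(lats)

    points = []
    while len(points) < num_points:
        lon = rng.uniform(min_lon, max_lon)
        lat = rng.uniform(min_lat, max_lat)
        if point_in_polygon(lon, lat, polygon):
            points.append((lon, lat))
    return points


# Hypothetical, heavily simplified outline (a triangle), not a real border.
toy_country = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
points = sample_points_in_polygon(toy_country, num_points=5, seed=0)
```

Sampling uniformly in longitude and latitude is simple, but it is not uniform in surface area: regions at high latitudes are sampled more densely per square kilometre than regions near the equator.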

Example of StreetView images from the dataset

🚀 Deep Learning Models

We explored various architectures and fine-tuning strategies to optimize performance. The models tested included:

ResNet Architectures: ResNet18, ResNet50, and ResNet152.
Google Vision Transformer (ViT): google/vit-base-patch16-224.

Architectures of the models used for the project

For the ResNet50 model, multiple levels of fine-tuning were evaluated:

Unfreeze Classification Head Only: Train only the classification head (the fully connected layer).
Unfreeze Classification Head and L4 Block: Train the classification head and the final residual block (Layer 4) while freezing the earlier layers.
Unfreeze All Layers: Fully fine-tune the entire network.

Results of the ResNet50 fine-tuning experiments:

Metric	Experiment 1	Experiment 2	Experiment 3
Accuracy	0.803	0.921	0.902

Experiment 1: classification head only. Experiment 2: L4 block and classification head. Experiment 3: every layer.

💡 Note: Retraining all layers does not necessarily improve performance. Full fine-tuning (0.902) scored slightly below unfreezing only the L4 block and the classification head (0.921).

Results for the other architectures:

Metric	Experiment 4	Experiment 5	Experiment 6
Accuracy	0.899	0.927	0.719

Experiment 4: ResNet18 with every layer unfrozen. Experiment 5: ResNet152 with every layer unfrozen. Experiment 6: Vision Transformer (google/vit-base-patch16-224).

💡 Note: A deeper or more recent architecture does not necessarily perform better. ResNet152 (0.927) gains only about 0.03 over ResNet18 (0.899), and the Vision Transformer scores lowest (0.719).

All models are hosted on Hugging Face. You can use them with the model_usage.ipynb notebook.

🧠 Explainability

Understanding why a deep learning model makes a decision is crucial for trustworthiness. SpotThePlace implements Grad-CAM visualizations for ResNet models, showing how specific regions in an image contribute to predictions:

from spottheplace.ml import GradCam

# Placeholder: replace with the path to a trained ResNet model
MODEL_PATH = "path/to/model"

grad_cam = GradCam(model_path=MODEL_PATH)
grad_cam.explain(image_path="path/to/image.jpg")

With this feature, we can observe that the model bases its predictions on specific regions of an image, such as the road, vegetation, architecture, or the sky.

✅ Conclusion

SpotThePlace combines the power of AI with innovative dataset creation to tackle geographic classification and regression. This project serves as a valuable tool for geospatial analysis, with applications ranging from urban planning to environmental studies.

Explore more on GitHub or use the pre-trained models hosted on Hugging Face.

GitHub @titouanlegourrierec · Email titouanlegourrierec@icloud.com

Titouan Le Gourrierec