TrackTheGap: Analyzing Gender Performance Gaps in Athletics with Advanced Data Analysis

Introduction
TrackTheGap is a data-driven exploration of gender performance differences in Olympic-level athletics events. By leveraging world-class datasets, advanced statistical methods, and data visualization, this project aims to:
- Analyze historical trends in performance evolution.
- Quantify performance gaps between men and women across disciplines.
- Evaluate rank-based performance disparities within the top-20 global rankings for each discipline.
While this portfolio entry highlights the project's technical design and methodology, a more in-depth discussion of results can be found in my Medium article.
Data Pipeline and Methodology
TrackTheGap is built on a two-stage data pipeline: collection, then processing.
1. Data Collection
Although performance data is publicly available on the World Athletics website, it is not offered in a format suited to large-scale analysis. To overcome this limitation, I developed a web scraping pipeline that extracts the data and structures it into usable datasets.

The collected data includes:
- Comprehensive athletics records for both men and women
- Top-100 performances for each discipline and each year from 2001 to 2024 for both men and women
Tools used:
- Selenium for dynamic scraping.
- BeautifulSoup for parsing HTML content.
- pandas for structuring the extracted data into a usable format.
2. Data Processing
Data processing consists of four steps:
- Converting timestamps to a readable time format to enable performance comparisons.
- Cleaning up performances when multiple records are present for the same discipline.
- Calculating the performance gap, with different calculations for time-based and distance-based performances.
- Calculating the rank-based performance disparity.
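The time conversion and gap calculation can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names `parse_mark` and `performance_gap` are hypothetical, and the formula (the women's shortfall expressed as a percentage of the men's mark) is an assumption, since the exact calculation is not spelled out here.

```python
def parse_mark(mark: str) -> float:
    """Convert a mark such as '9.58', '1:40.91' or '2:00:35' into a float.

    Colon-separated marks are read as [hours:]minutes:seconds and returned
    in seconds. Plain numbers (sprint times in seconds, distances in metres)
    are returned unchanged.
    """
    total = 0.0
    for part in mark.strip().split(":"):
        # Each colon shifts the running total up by a factor of 60.
        total = total * 60 + float(part)
    return total


def performance_gap(men: float, women: float, higher_is_better: bool) -> float:
    """Return the gap between two marks as a percentage of the men's mark.

    Assumed formula: the sign is chosen so that a positive value always
    means the men's mark is the better one.
    """
    if higher_is_better:
        # Distance or height events: a larger mark is better.
        return (men - women) / men * 100
    # Time-based events: a smaller mark is better.
    return (women - men) / men * 100
```

Flipping the sign by event type keeps the gaps comparable across disciplines, so a time-based event and a throwing event can sit on the same chart.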

Advanced Analysis
To understand performance differences between male and female athletes, I performed key analyses:
- Performance Gap Analysis: Calculating percentage differences between male and female records across disciplines.
- Trend Analysis: Examining historical improvements in world records.
- Rank-Based Analysis: Comparing performance gaps across the top-20 rankings for men and women.
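As a rough sketch of the rank-based comparison, the n-th ranked man can be paired with the n-th ranked woman and the gap computed at each rank. The function name `rank_gaps`, its parameters, and the percentage formula are assumptions made for illustration, not the project's actual implementation.

```python
def rank_gaps(men_marks, women_marks, higher_is_better, top_n=20):
    """Return a list of (rank, gap in %) pairs for the top_n ranks.

    Marks are sorted best-first, then the n-th man is paired with the
    n-th woman. The gap is expressed relative to the men's mark (assumed).
    """
    # Best-first: descending for distances, ascending for times.
    men = sorted(men_marks, reverse=higher_is_better)[:top_n]
    women = sorted(women_marks, reverse=higher_is_better)[:top_n]

    gaps = []
    for rank, (m, w) in enumerate(zip(men, women), start=1):
        diff = (m - w) if higher_is_better else (w - m)
        gaps.append((rank, diff / m * 100))
    return gaps
```

Because `zip` stops at the shorter list, a discipline with fewer than `top_n` marks on one side simply yields fewer ranks rather than raising an error.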
For visualizations, I used libraries like matplotlib and seaborn to illustrate trends and gaps.

For detailed results, advanced analysis, and references to the research behind these findings, check out the full Medium article.
Conclusion
TrackTheGap demonstrates how data analysis can uncover meaningful trends in athletic performance. By combining data engineering, statistical modeling, and visual storytelling, this project offers a deeper understanding of gender performance differences in sports.
GitHub @titouanlegourrierec · Email titouanlegourrierec@icloud.com
