🏃‍♂️ TrackTheGap: Analyzing Gender Performance Gaps in Athletics with Advanced Data Analysis



Figure: Performance gap between men and women in Olympic running events


🔍 Introduction

TrackTheGap is a data-driven exploration of gender performance differences in Olympic-level athletics events. Using performance data collected from the World Athletics website, statistical analysis, and data visualization, this project aims to:

  1. Analyze historical trends in performance evolution.
  2. Quantify performance gaps between men and women across disciplines.
  3. Evaluate rank-based performance disparities within the top-20 global rankings for each discipline.

While this portfolio entry highlights the project's technical design and methodology, a more in-depth discussion of results can be found in my Medium article.

📊 Data Pipeline and Methodology

TrackTheGap is built on a two-stage data pipeline: data collection, followed by data processing.

1. Data Collection

Although performance data is publicly available on the World Athletics website, it is not provided in a readily exploitable format for large-scale analysis. To overcome this limitation, I developed a web scraping pipeline to extract and structure the data into usable datasets.

Figure: World Athletics official website

The collected data includes:

  • Comprehensive athletics records for both men and women
  • Top-100 performances for each discipline and each year from 2001 to 2024 for both men and women

Tools used:

  • Selenium for dynamic scraping.
  • BeautifulSoup for parsing HTML content.
  • pandas for structuring the extracted data into a usable format.
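As a rough illustration of how these tools can fit together, the sketch below parses a results table with BeautifulSoup and loads it into a pandas DataFrame. The HTML snippet, its column names, and the `parse_results_table` helper are hypothetical stand-ins, and the values are placeholders rather than real results. In the actual pipeline the page source would come from Selenium (for example through `driver.page_source`), and the real site's markup may look different.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup standing in for the page source returned by Selenium.
# Column names and values are placeholders, not data from the real website.
SAMPLE_HTML = """
<table>
  <thead>
    <tr><th>Rank</th><th>Mark</th><th>Competitor</th><th>Date</th></tr>
  </thead>
  <tbody>
    <tr><td>1</td><td>10.00</td><td>Athlete A</td><td>01 JAN 2001</td></tr>
    <tr><td>2</td><td>10.10</td><td>Athlete B</td><td>02 JAN 2001</td></tr>
  </tbody>
</table>
"""


def parse_results_table(html: str) -> pd.DataFrame:
    """Extract the first HTML table on the page into a DataFrame of strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        # No table found: return an empty frame instead of raising.
        return pd.DataFrame()

    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = [
        [td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in table.find_all("tr")
        if tr.find_all("td")  # skip the header row, which has no <td> cells
    ]
    return pd.DataFrame(rows, columns=headers)


df = parse_results_table(SAMPLE_HTML)
print(df)
```

Every cell is kept as a string at this stage, so converting marks to numbers is left to the processing step.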

2. Data Processing

The data processing consists of four steps:

  • Converting timestamps to a consistent, readable time format to enable performance comparisons.
  • Cleaning up performances when multiple records are present for the same discipline.
  • Calculating the performance gap, with different calculations for time-based and distance-based performances.
  • Calculating the rank-based performance disparity.
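The conversion and gap steps could be sketched as follows. The formulas are assumptions, since the exact calculations are not spelled out here: the gap is expressed as a percentage of the men's mark, with the sign flipped between time-based events (lower is better) and distance-based events (higher is better). The helper names `mark_to_seconds` and `performance_gap` are illustrative only.

```python
def mark_to_seconds(mark: str) -> float:
    """Convert a mark such as '9.58', '1:40.91' or '2:00:35' to seconds."""
    seconds = 0.0
    # Each ':'-separated field is worth 60 times the one to its right.
    for part in mark.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def performance_gap(men: float, women: float, time_based: bool) -> float:
    """Return the gap as a percentage of the men's mark (assumed definition)."""
    if time_based:
        # Running events: a larger time is a slower performance.
        return (women - men) / men * 100
    # Jumps and throws: a larger distance is a better performance.
    return (men - women) / men * 100


# Example inputs: 100 m times in seconds and long jump distances in metres.
gap_100m = performance_gap(mark_to_seconds("9.58"), mark_to_seconds("10.49"), True)
gap_long_jump = performance_gap(8.95, 7.52, False)
print(f"100 m gap: {gap_100m:.2f} %")
print(f"Long jump gap: {gap_long_jump:.2f} %")
```

With both formulas normalized by the men's mark, a positive value always means the men's performance is better, which makes gaps comparable across event types.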

Figure: Final CSV output from the data pipeline

📜 Advanced Analysis

To understand performance differences between male and female athletes, I performed three analyses:

  • Performance Gap Analysis: Calculating percentage differences between male and female records across disciplines.
  • Trend Analysis: Examining historical improvements in world records.
  • Rank-Based Analysis: Comparing performance gaps across the top-20 rankings for men and women.
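One possible way to set up the rank-based comparison is sketched below: sort each list of marks, pair the men's and women's performances rank by rank, and compute the percentage gap at each rank. The `rank_gaps` helper, its column names, and the gap definition (relative to the men's mark) are assumptions for illustration, and the input values are placeholders rather than real results.

```python
import pandas as pd


def rank_gaps(men_marks, women_marks, top_n=20, time_based=True):
    """Pair men's and women's marks by rank and compute the gap at each rank."""
    # Best mark first: ascending for times, descending for distances.
    men = sorted(men_marks, reverse=not time_based)[:top_n]
    women = sorted(women_marks, reverse=not time_based)[:top_n]
    n = min(len(men), len(women))

    df = pd.DataFrame(
        {"rank": list(range(1, n + 1)), "men": men[:n], "women": women[:n]}
    )
    # Assumed definition: gap as a percentage of the men's mark.
    if time_based:
        diff = df["women"] - df["men"]
    else:
        diff = df["men"] - df["women"]
    df["gap_pct"] = diff / df["men"] * 100
    return df


# Placeholder times in seconds, deliberately unsorted.
example = rank_gaps([10.0, 10.2, 10.1], [11.0, 11.22, 11.11])
print(example)
```

Plotting `gap_pct` against `rank` then shows whether the gap stays stable or drifts as one moves down the rankings.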

For visualizations, I used libraries like matplotlib and seaborn to illustrate trends and gaps.
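A minimal matplotlib sketch of such a chart might look like this. The `plot_gaps` helper and the chart styling are assumptions rather than the project's actual plotting code, and the two example gaps are computed from 100 m and long jump marks as a percentage of the men's mark.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_gaps(gaps: dict):
    """Draw one bar per discipline showing the men-women gap in percent."""
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.bar(list(gaps.keys()), list(gaps.values()))
    ax.set_ylabel("Gap (% of men's mark)")
    ax.set_title("Men-women performance gap by discipline")
    fig.tight_layout()
    return fig, ax


# Gaps derived from example marks: times in seconds, distances in metres.
gaps = {
    "100 m": (10.49 - 9.58) / 9.58 * 100,
    "Long jump": (8.95 - 7.52) / 8.95 * 100,
}
fig, ax = plot_gaps(gaps)
```

Calling `fig.savefig(...)` with a file name of your choice writes the chart to disk; seaborn could be layered on top for styling.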

Figure: Data visualizations

For detailed results, advanced analysis, and references to the research behind these findings, check out the full Medium article.

✅ Conclusion

TrackTheGap demonstrates how data analysis can uncover meaningful trends in athletic performance. By combining data engineering, statistical modeling, and visual storytelling, this project offers a deeper understanding of gender performance differences in sports.


GitHub @titouanlegourrierec · Email titouanlegourrierec@icloud.com