Friday, January 28 • 5:00pm - 5:25pm
Uncovering Project Insights from GitHub PR Data

What does your GitHub repo say about your software development process? What’s the average “idea-to-production” time for new features? How long does it typically take before a Pull Request (PR) is merged? How much content does each PR add, remove, or modify? Understanding these aspects of your project can help you better guide its development. It can also help you foster a healthy, thriving open source community around your project.
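As a rough illustration of the kind of metrics mentioned above, the sketch below computes time to merge and PR size from records shaped like GitHub REST API pull request objects. It assumes each record carries the `created_at`, `merged_at`, `additions`, and `deletions` fields; note that `additions` and `deletions` are returned by the single-pull-request endpoint rather than the list endpoint. The function names and sample data are hypothetical and are not taken from the tools presented in the talk.

```python
from datetime import datetime
from statistics import median


def parse_ts(ts):
    # GitHub returns UTC timestamps in ISO 8601 form, e.g. "2022-01-28T16:00:00Z"
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def pr_metrics(prs):
    """Summarize merged PRs (hypothetical helper, not the talk's tooling)."""
    hours_to_merge = []
    lines_changed = []
    for pr in prs:
        if pr.get("merged_at") is None:
            continue  # skip PRs that are still open or were closed without merging
        delta = parse_ts(pr["merged_at"]) - parse_ts(pr["created_at"])
        hours_to_merge.append(delta.total_seconds() / 3600)
        lines_changed.append(pr["additions"] + pr["deletions"])
    if not hours_to_merge:
        return {"merged_prs": 0}
    return {
        "merged_prs": len(hours_to_merge),
        "median_hours_to_merge": median(hours_to_merge),
        "mean_hours_to_merge": sum(hours_to_merge) / len(hours_to_merge),
        "median_lines_changed": median(lines_changed),
    }


# Hypothetical sample records; the third PR is unmerged and is ignored.
sample_prs = [
    {"created_at": "2022-01-10T09:00:00Z", "merged_at": "2022-01-11T09:00:00Z",
     "additions": 120, "deletions": 30},
    {"created_at": "2022-01-12T12:00:00Z", "merged_at": "2022-01-12T18:00:00Z",
     "additions": 10, "deletions": 2},
    {"created_at": "2022-01-13T08:00:00Z", "merged_at": None,
     "additions": 5, "deletions": 0},
]
print(pr_metrics(sample_prs))
```

The median is reported alongside the mean because time-to-merge distributions are often skewed by a few long-lived PRs.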

In this talk we will show you how to use a number of open source tools to collect data about your repo’s PRs, analyze it, and visualize key metrics on a dashboard to gain greater insight into your software development process. Then, we will show you how to build reproducible workflows that use historical PR data to train machine learning models to predict the time taken to merge a PR. Finally, we will walk you through how we packaged our prediction pipeline and deployed it as a service using Seldon Core on OpenShift. This service can then be integrated into GitHub apps to give live predictions of the time to merge for new incoming PRs.
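To make the prediction step concrete, here is a deliberately simple baseline: an ordinary least-squares line fitted to historical (lines changed, hours to merge) pairs. This is a swapped-in illustration, not the model used in the talk, which is not described here. The single feature, the helper names, and the training data are all hypothetical.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least-squares fit of y = a + b * x for one feature (hypothetical baseline)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has zero variance; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_hours(model, lines_changed):
    intercept, slope = model
    # Clamp at zero: a predicted time to merge cannot be negative.
    return max(0.0, intercept + slope * lines_changed)


# Hypothetical history of merged PRs: lines changed vs. hours to merge.
history_lines = [10, 50, 100, 200]
history_hours = [2, 6, 11, 21]
model = fit_simple_linear(history_lines, history_hours)
print(predict_hours(model, 300))
```

With real repository data, it would be sensible to train on older PRs and evaluate on newer ones, so that the model is tested the way it would be used on incoming PRs.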

By the end of this talk, participants will know how to calculate and visualize metrics from their GitHub repos on a dashboard, how to use OpenShift to build and deploy their own ML models, and how to use the open source tool presented here to predict the time to merge PRs on their own projects.

Session chairs: Andrei Veselov and Pavel Yadlouski

Oindrilla Chatterjee

Data Scientist, Red Hat
Oindrilla is a Data Scientist at Red Hat in the Office of the CTO, working on emerging trends and research in ML and AI. She spent the past year developing open source AI applications for CI data.

Karanraj Chauhan

Software Engineer, Red Hat
I like math, machine learning, and deep learning. Big fan of CPUs, GPUs, FPGAs, and other such lightning-powered stones.

Friday January 28, 2022 5:00pm - 5:25pm CET
Session Room 1