Predict F1 Race Outcomes with RACENG F1 Predictor
- Lorenzo Mazzilli
- Jan 8
Updated: Jan 9
RACENG is more than my Personal Project for the IB curriculum: it has been my gateway into machine learning. Follow along to see how I built it.

Figure: Random Forest Regressor graph
1. Predicting the Unpredictable
Formula 1 is known for its unpredictability. A ten-second pit stop, a sudden rain shower, or a rival's engine failure can quickly turn a guaranteed win into a mid-field finish. However, beneath the chaos of race day is a solid foundation of data: lap times, tire wear, and historical performance trends. This project started with the goal of cutting through the noise and using that data to make predictions. I built a Machine Learning model to forecast the final finishing positions for the season-ending Abu Dhabi Grand Prix in the UAE.
2. Building the Data Engine
To train a model that understands racing, you need more than just past podium results. I provided the algorithm with historical data from earlier UAE races and broke down a race weekend into its essential parts. The model doesn't simply focus on who has the fastest car; it also takes into account where a driver starts (Grid Position), their speed in practice sessions (FP2 and FP3), their team's pit stop reliability, and their mechanical failure rate (DNF rate). Combined, these data points give the model a complete "profile" of what success looks like at this specific track.
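To make the idea of a "profile" concrete, here is a minimal sketch of how one driver's race weekend could be turned into a single row of numbers. The class name, field names, helper function, and every value are hypothetical placeholders for illustration; they are not taken from the actual RACENG dataset.

```python
from dataclasses import dataclass, astuple


# Hypothetical feature layout; the real RACENG columns may differ.
@dataclass
class DriverProfile:
    grid_position: int            # starting slot on the grid (1 = pole)
    fp2_best_lap_s: float         # best FP2 lap time, in seconds
    fp3_best_lap_s: float         # best FP3 lap time, in seconds
    pit_stop_reliability: float   # share of team stops without a problem (0 to 1)
    dnf_rate: float               # share of past races not finished (0 to 1)


def compute_dnf_rate(results: list[str]) -> float:
    """Return the fraction of past results that were a DNF."""
    if not results:
        return 0.0
    return sum(r == "DNF" for r in results) / len(results)


# Made-up history for one driver: one DNF in four races.
history = ["P3", "DNF", "P5", "P2"]

profile = DriverProfile(
    grid_position=4,
    fp2_best_lap_s=85.3,
    fp3_best_lap_s=84.9,
    pit_stop_reliability=0.95,
    dnf_rate=compute_dnf_rate(history),
)

# One flat numeric row, the shape a regression model expects per driver.
feature_row = list(astuple(profile))
print(feature_row)
```

Stacking one such row per driver per race gives the table the model trains on, with the finishing position as the value to predict.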
3. The Algorithm Under the Hood
The engine behind these predictions is a "Random Forest Regressor," which I used through Python's scikit-learn library. Instead of relying on a single decision path to predict the outcome, a Random Forest builds hundreds of "decision trees," each trained on a slightly different random sample of the data. Every tree makes its own prediction of the finishing position, and the forest averages them into one final answer. It is a bit like asking hundreds of analysts who each watched a different set of races, then taking the consensus. This approach makes the model much more robust to outliers, like an unusual qualifying lap, and better at spotting true performance trends.
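The averaging idea can be checked directly in scikit-learn. The sketch below trains a forest on synthetic stand-in data (random numbers, not real F1 results) and confirms that the forest's prediction is the mean of its individual trees' predictions. The number of trees, the feature count, and the made-up target are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)

# Synthetic stand-in data: 200 driver-race rows with 5 features each.
# Column 0 plays the role of grid position (1 to 20).
X = rng.uniform(0.0, 1.0, size=(200, 5))
X[:, 0] = rng.integers(1, 21, size=200)

# Made-up target: finishing position tracks grid position plus noise.
y = np.clip(X[:, 0] + rng.normal(0.0, 2.0, size=200), 1, 20)

forest = RandomForestRegressor(n_estimators=200, random_state=42)
forest.fit(X, y)

X_new = X[:3]
forest_pred = forest.predict(X_new)

# Ask every tree separately, then average the answers by hand.
tree_preds = np.array([tree.predict(X_new) for tree in forest.estimators_])
manual_avg = tree_preds.mean(axis=0)

print(forest_pred)
print(np.allclose(forest_pred, manual_avg))
```

Because one odd data point only reaches some of the bootstrap samples, it can pull a few trees off course without moving the average much.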
4. Identifying the Drivers of Success
One of the most interesting parts of this project is looking at how the model makes its decisions. The "Feature Relevance" output ranks each input by how much it contributes to the predictions, which provides insights into race strategy. While the model confirmed that qualifying well (Grid Position) is crucial, it also placed significant emphasis on "Pre-Weekend Form" and consistent practice pace. This matches what you see when watching races: raw speed on Saturday is not enough if a driver lacks the sustained pace and team reliability needed to manage the race on Sunday.
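In scikit-learn, a ranking like this can be read from the fitted model's `feature_importances_` attribute. The sketch below is an assumption about how such an output could be produced, not the actual RACENG code: the feature names are hypothetical, and the data is synthetic with a made-up relationship in which grid position matters most.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical column names, loosely based on the inputs described above.
feature_names = [
    "grid_position",
    "pre_weekend_form",
    "fp2_pace",
    "fp3_pace",
    "pit_stop_reliability",
    "dnf_rate",
]

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(300, len(feature_names)))

# Made-up target: driven mostly by column 0, partly by column 1, plus noise.
y = 10 * X[:, 0] + 5 * X[:, 1] + rng.normal(0.0, 1.0, size=300)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Importances are normalized to sum to 1; sort from most to least important.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name:<22}{score:.3f}")
```

These scores describe how much the model relied on each input, not whether that input causes a better result, so they are best read as hints rather than proof.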
5. From Simulation to the Checkered Flag
In the end, this model produced a complete, ranked prediction for a hypothetical 2025 grid. Its Mean Absolute Error (MAE) is about 2.5 positions, meaning a predicted finish is off by roughly two to three places on average. That won't replace the excitement of watching the race, but it offers a surprisingly accurate starting point. It shows how effective modern data science tools like scikit-learn can be when applied to complex real-world situations. Future improvements could include adding more detailed data, like real-time weather updates or specific tire strategies, to further bridge the gap between simulation and reality.
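As a small worked example of both steps, the sketch below turns raw model scores into a ranked grid and then computes the MAE against an actual result. The driver labels and all numbers are invented, and scoring the ranked positions (rather than the raw regression outputs) is an assumption about how the error could be measured.

```python
def rank_drivers(scores: dict[str, float]) -> list[str]:
    """Order drivers from lowest (best) to highest predicted position."""
    return sorted(scores, key=scores.get)


def mean_absolute_error(predicted: list[int], actual: list[int]) -> float:
    """Average absolute gap, in positions, between prediction and result."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Made-up regression outputs: the model returns decimals, not whole places.
raw_scores = {"Driver A": 2.4, "Driver B": 1.7, "Driver C": 5.1, "Driver D": 3.3}

order = rank_drivers(raw_scores)
predicted_pos = {driver: place for place, driver in enumerate(order, start=1)}

# Made-up actual finishing order for the same four drivers.
actual_pos = {"Driver A": 1, "Driver B": 2, "Driver D": 3, "Driver C": 4}

drivers = list(actual_pos)
mae = mean_absolute_error(
    [predicted_pos[d] for d in drivers],
    [actual_pos[d] for d in drivers],
)

print(order)
print(mae)
```

Here two drivers are each one place off and two are exact, so the error averages out to half a position per driver.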


