Identifying Run Blocking Concepts Using NFL Tracking Data

May 25, 2023

After helping develop the NGS Coverage Classification Model while working for Next Gen Stats this past season, I thought it’d be a fun challenge to build a model that identifies the run blocking concept an offense uses on running plays, using Next Gen Stats tracking data.

This project uses tracking data from the 2020 Big Data Bowl (BDB), with run blocking labels from PFF charting data. The tracking data only contains the frame at the time of handoff, since the 2020 BDB was focused on building a Rushing Yards Over Expected (RYOE) model.

Rather than dividing up the training data further and focusing on all the different subcategories of run types, I decided to break the run blocking schemes into three categories: Zone, Man and Gap. I also only look at the offensive linemen, RB and QB on these plays; all other players on both offense and defense are filtered out.
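As a minimal Python sketch of that filtering step (the original work was done in R), assuming each play's handoff frame is a list of per-player dicts with a "Position" key and that the GSIS labels are the strings "OT", "OG", "C", "QB" and "RB". The data layout, the label set and the keep_blocking_players name are all hypothetical:

```python
# Hypothetical GSIS labels for the players kept in this project.
KEEP_POSITIONS = {"OT", "OG", "C", "QB", "RB"}


def keep_blocking_players(handoff_frame):
    """Return only the offensive linemen, QB and RB from one play's
    handoff frame (a list of per-player dicts with a "Position" key)."""
    return [row for row in handoff_frame if row["Position"] in KEEP_POSITIONS]
```

Filtering on position alone is enough here because none of these labels are used for defenders; if the real file uses other strings for the same roles, the set would need to grow.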

The GSIS position labels in the tracking data break offensive linemen into three positions: OT, OG and C. I decided to specify the positions further, labeling whichever tackle and guard had the higher Y value as the LT and LG.
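"Highest Y" only means "offense's left" if every play is oriented the same way. The sketch below assumes coordinates are first rotated 180 degrees on plays where the 2020 BDB PlayDirection column reads "left" (on a 120 x 53.3 yard field), so the offense always moves toward increasing X and its left side always has the higher Y. The standardize and label_line_sides helpers are hypothetical, not the author's code:

```python
FIELD_LENGTH = 120.0       # yards, including both end zones
FIELD_WIDTH = 160.0 / 3.0  # 53.3 yards


def standardize(row):
    """Rotate a player row 180 degrees when the play moves left, so the
    offense always moves toward increasing X and its left side is high Y."""
    if row["PlayDirection"] == "left":
        return dict(row, X=FIELD_LENGTH - row["X"], Y=FIELD_WIDTH - row["Y"])
    return dict(row)


def label_line_sides(linemen):
    """Relabel OT/OG rows as LT/RT and LG/RG by Y.

    The highest-Y player in each group gets the left-side label and the
    lowest-Y player the right-side label; any extra linemen in between
    (e.g. a sixth lineman in a jumbo package) keep their generic label.
    """
    labeled = [dict(r) for r in linemen if r["Position"] not in ("OT", "OG")]
    for generic, left, right in (("OT", "LT", "RT"), ("OG", "LG", "RG")):
        group = sorted(
            (r for r in linemen if r["Position"] == generic),
            key=lambda r: r["Y"],
            reverse=True,
        )
        for i, r in enumerate(group):
            if i == 0:
                pos = left
            elif i == len(group) - 1:
                pos = right
            else:
                pos = generic
            labeled.append(dict(r, Position=pos))
    return labeled
```

Standardizing first matters: on a play moving left, the raw lowest-Y tackle is the one on the offense's left, and the rotation flips it to the highest Y before labeling.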

The Model

I settled on an XGBoost tree model with 5-fold cross-validation. The main difficulty for this project was the lack of a diverse training set. Zone blocking was by far the dominant blocking scheme in the dataset, so I used the upSample function in the caret R package to even out the number of plays in each class within the training set. Depending on which plays upSample randomly duplicated, the model’s balanced accuracy could swing by about +/- 1%.
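caret's upSample is an R function; the following is only a rough Python sketch of the same idea, random oversampling with replacement until every class matches the size of the largest one. The upsample helper and its seed parameter are hypothetical:

```python
import random
from collections import defaultdict


def upsample(rows, labels, seed=None):
    """Randomly duplicate minority-class rows until every class matches the
    largest class. Returns new (rows, labels) lists; inputs are untouched."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Keep every original row, then add random duplicates (with replacement).
        extra = rng.choices(members, k=target - len(members))
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels
```

Fixing the seed makes the duplicated plays reproducible, which is one way to pin down the run-to-run swing in balanced accuracy. Oversampling is generally applied inside each training fold after the cross-validation split, so copies of the same play never end up in both the training and validation data.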

In addition to the fields supplied by the tracking dataset, I created a few features of my own, mainly looking at the distances between different players and the angles between the linemen. To my surprise, lineman orientation, the thing that sparked this project idea for me, had very little effect as a feature. A play from Week 3 of the 2017 season shows why this might be the case:
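The exact feature definitions aren't spelled out here, so this is only a hypothetical Python sketch of two features in that spirit: the straight-line distance between two players, and a "stagger" angle between two adjacent linemen measured against the line of scrimmage. It assumes coordinates already standardized so the offense moves toward increasing X, with the left-side lineman passed first; both function names are made up for illustration:

```python
import math


def distance(a, b):
    """Euclidean distance in yards between two players' (X, Y) positions."""
    return math.hypot(b["X"] - a["X"], b["Y"] - a["Y"])


def stagger_angle(left, right):
    """Angle in degrees between the left->right lineman segment and the line
    of scrimmage. 0 means the two are level; positive means the right-hand
    player is further upfield (higher X), negative means he is deeper."""
    upfield = right["X"] - left["X"]
    lateral = left["Y"] - right["Y"]  # positive when `left` really is on the left
    return math.degrees(math.atan2(upfield, lateral))
```

A guard who has dropped a yard deeper than the center at a one-yard split, for instance, gives a stagger angle of about -45 degrees relative to the center.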

This screenshot is from about half a second after the snap, and it is already pretty clear from how each lineman is oriented that this is a zone blocking scheme.

However, we don’t have this frame in our dataset; we have this:

A little less clear, right? Both of our guards, LG Stefen Wisniewski (#61) and RG Brandon Brooks (#79), gain depth downfield looking for someone to block, while C Jason Kelce (#62) stalemates with DT Damon “Snacks” Harrison (#98) at the line of scrimmage. The model incorrectly labeled this as a Man blocking scheme rather than Zone.

Here is the entire play if you’d like to watch:

I would expect orientation to carry more importance in a dataset that included every frame from snap to handoff.

Another surprising finding in the feature importance was lineman speed, with 3 of the top 7 features relating to it in some way. The model importance of each feature can be seen here:

After some tweaking, I was able to achieve an overall balanced accuracy of 92.4%, with a 95% confidence interval between 91.2% and 93.5%. The positive predictive values for the run blocking concepts were 93.5%, 86.2%, and 94.4% for Gap, Man, and Zone, respectively.
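For readers who want to compute these kinds of metrics themselves, here is a hedged Python sketch, not necessarily how the figures above were produced. It takes multiclass balanced accuracy as the mean of per-class recall (the definition scikit-learn uses), while caret's confusionMatrix reports a per-class balanced accuracy, the average of sensitivity and specificity, so the two need not match. PPV is the share of plays predicted as a class that truly belong to it. The helper names and toy labels are made up:

```python
def per_class_metrics(y_true, y_pred, classes):
    """Recall (sensitivity) and positive predictive value for each class."""
    metrics = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        actual = sum(1 for t in y_true if t == c)     # plays truly in class c
        predicted = sum(1 for p in y_pred if p == c)  # plays predicted as c
        metrics[c] = {
            "recall": tp / actual if actual else float("nan"),
            "ppv": tp / predicted if predicted else float("nan"),
        }
    return metrics


def balanced_accuracy(y_true, y_pred, classes):
    """Multiclass balanced accuracy as the mean of per-class recall."""
    metrics = per_class_metrics(y_true, y_pred, classes)
    return sum(metrics[c]["recall"] for c in classes) / len(classes)


# Toy labels for illustration only (not the project's data).
y_true = ["Zone", "Zone", "Zone", "Zone", "Man", "Man", "Gap", "Gap"]
y_pred = ["Zone", "Zone", "Zone", "Man", "Man", "Zone", "Gap", "Gap"]
print(balanced_accuracy(y_true, y_pred, ["Gap", "Man", "Zone"]))  # 0.75
```

Averaging recall across classes keeps a dominant class like Zone from inflating the score the way plain accuracy would.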

Here is a confusion matrix of those results:

Limitations

Other than the previously mentioned Zone-heavy dataset, this project would have benefited a lot from more in-play frames. Not only would they probably improve the model’s accuracy, but those additional frames would also allow more specific run concepts to be considered, such as Counter, Split Zone, Trap, and others. It is also worth noting that the PFF charting labels aren’t 100% accurate, which could have led to some false negatives.

Use Cases

Coming from Next Gen Stats, my goal with this project was to start developing a stat that we could use to tell stories about the offensive line. The 2023 Big Data Bowl looked at the impact of the offensive line in the passing game, so a stat focused on the running game seemed like a good complement. From a team perspective, this could free up the time coaches usually spend charting plays manually, and the results would be available immediately after a game.

Look for more posts coming soon.

Where to Find Me

Twitter: @Mike_Lounsberry

LinkedIn: Mike Lounsberry

Email: lounsberrym@gmail.com

