Analytics Posts

Probability of Getting Drafted by Star-Rankings and Ramifications of Arbitrary 5-Star Designation

Many college football fans follow recruiting and understand its importance to winning championships, as well as the idea that recruiting ratings (which largely determine star-ranking status) are a good approximation of team talent. This article offers even more evidence that the rankings do a good job of identifying high school talent.

From my point of view, the NFL draft is the best measure of whether a player is talented or not. NFL teams spend a lot of money, time, and opportunity cost on each draft pick. As such, they have the most skin in the game and (presumably) relevant expertise in talent evaluation.

This article outlines the probability of getting drafted by using a more granular star-ranking (https://thefaircatch.com/2021/06/11/a-more-granular-view-of-star-rankings/), which is an expansion of the 247 Composite ratings. Furthermore, it highlights a possible ramification of the arbitrary cutoff of approximately 30 five-star players in each cycle.

Data

Recruiting class data and NFL draft data were obtained from http://www.collegefootballdata.com. These data sets span the years 2005 to 2021. There were 53,826 rated high school players and 2,307 records for drafted players.

Using a more granular breakdown of star-rankings provided insight into where players’ draft prospects started to diverge.

In looking at the above scatterplot, a few things become noticeable. First, there is a visible cutoff at the 5.0 star-ranking in the upper left quadrant, which represents earlier-round draft picks and highly rated recruits. This is likely because of the arbitrary 30-player limit on 5-star status, which pushes good players down into the high 4-star range.

There is also a clear diminishing presence of higher-rated players to the right of the vertical line (mean line for draft pick).

Below the mean line for star-ranking, things look fairly uniform.

Probabilities

My hypothesis is that the arbitrary cutoff for 5-stars artificially divides them from 4-stars. The stacking up of data points along the 5.0 star range supports this belief. In looking at further breakdowns of the data, this hypothesis continues to be supported.

Taking all 53,826 observations, matching them against the NFL draft data, and breaking them out by incremental star-rankings, the probability of getting drafted for each category is shown in the bar graph above. To achieve even a 10% overall chance of getting drafted, a player needs to be a 4.2 star, or have a composite rating of at least 0.9087. To have a 20% chance of being drafted, a player should be a 4.9 star, which corresponds to a composite score of 0.9738.
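A minimal sketch of the binning behind these probabilities, assuming each recruit has already been matched against the draft records and assigned a granular star value. The input layout and the names `draft_probability_by_bin` and `minimum_bin_for` are hypothetical, not the code used for the graph:

```python
import math
from collections import defaultdict


def draft_probability_by_bin(players):
    """Share of players drafted in each 0.1-star bin.

    players: iterable of (granular_star, was_drafted) pairs (hypothetical layout).
    Returns {bin_floor: share_drafted}, sorted by bin.
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [drafted, total]
    for star, drafted in players:
        # floor to one decimal; the tiny epsilon guards against float error
        b = math.floor(star * 10 + 1e-9) / 10
        counts[b][0] += int(drafted)
        counts[b][1] += 1
    return {b: d / n for b, (d, n) in sorted(counts.items())}


def minimum_bin_for(probs, target):
    """Lowest bin whose draft probability reaches `target`, or None."""
    for b, p in sorted(probs.items()):
        if p >= target:
            return b
    return None
```

Calling `minimum_bin_for(probs, 0.10)` and `minimum_bin_for(probs, 0.20)` on the full data set would answer the 10% and 20% questions. It returns the first bin to reach the target, even if the probabilities are not perfectly monotonic.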

Also visible here, again, is the bottleneck right at the five-star cutoff, which strengthens the case against the arbitrary 5-star cutoff.

Drafted by Round

Just getting drafted is a huge marker of success in my view. But getting drafted in early rounds is that much better.

In the bar graph above, the proportions by round drafted are added. Once again, players at the 4.9 star rating are considerably more likely to be drafted in the first two rounds. Below, I broke these out by NFL draft round.

Conclusion

It’s pretty clear that better players out of high school are more likely to get drafted, and more likely to get drafted earlier in the draft. However, the arbitrary ~30-player cutoff for 5-star designation strikes me as a marketing and hype gimmick. Unfortunately, this probably denies some very deserving players five-star status.

Based on what I’ve found, the 5-star designation should be awarded to any player with a composite rating of at least ~0.9738, which would give them a 20% chance of being drafted.

Gators’ offensive and defensive performance this year using EPA

Just looking at some of the data from http://www.collegefootballdata.com, I was surprised to find that UF’s offense is performing at a lower level than its defense overall.

These boxplots show that UF’s offense is not doing well on 3rd down.

Looking at overall performance, the violin plots below indicate a statistically significant difference in EPA (p = 0.005) between the two sides, with the offense showing much higher variance:
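The post reports p = 0.005 but does not say which test produced it. One assumption-light way to check a difference in mean EPA between two groups of plays is a permutation test, sketched below. This is a swapped-in technique, not necessarily the one behind the reported p-value, and `permutation_test` is a hypothetical helper:

```python
import random
from statistics import fmean


def permutation_test(a, b, n_iter=10_000, seed=0):
    """Two-sided permutation test for a difference in mean EPA.

    a, b: per-play EPA values for the two groups (assumed to share one sign convention).
    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)
    observed = fmean(a) - fmean(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # randomly reassign plays to the two groups
        diff = fmean(pooled[:len(a)]) - fmean(pooled[len(a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    # the +1 terms count the observed split itself, so p is never exactly 0
    return observed, (extreme + 1) / (n_iter + 1)
```

A permutation test makes no normality assumption, which suits EPA distributions with long tails, though it does not by itself test the difference in variance.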

UF Quarterback Performance to Date. More Evidence of Anthony Richardson’s Playmaking

With UF football unexpectedly struggling, there is considerable criticism of head coach Dan Mullen and starting quarterback Emory Jones (EJ). Not to mention the ever-present criticism of defensive coordinator Todd Grantham. However, one source of frustration among UF fans and college football analysts and pundits everywhere seems to be that Mullen continues to start Jones over freshman QB Anthony Richardson, or AR15.

Though this frustration often manifests itself as a slight against Jones, it may be more of a compliment to AR15. Using EPA data from CollegeFootballData.com, I looked at how EJ, AR15, and Florida’s opposing QBs (OPP) have performed so far. I used UF’s opponents as a reference point so we aren’t just looking at EJ vs AR15 without some additional context.

The sample sizes for AR15 vs Jones and OPP are quite different. I was able to get 65 plays for AR, 255 for EJ, and 217 for OPP. Because I am lazy, I only used pass plays for opposing QBs, but I used rushing and passing for Florida’s QBs. Sifting through every play and pulling out which rush play was by an opposing QB would take time, and it isn’t that important to me. I just wanted some level of reference point. Keeping rushing plays for the UF QBs also helps bolster AR15’s sample size. Plus, QB running is a big part of UF’s offensive philosophy with both EJ and AR15.

Just graphing the performance over time shows us something interesting. We can see that AR15 has a lot of plays above the upper bound (2 standard deviations above the group average) and few below the lower bound.

The dotted green lines are the upper and lower bounds, and the black dotted line is the average. Using these as thresholds for Really Good and Really Bad, the simple probabilities (percentages) work out like so:

Player | Really Good | Really Bad
AR | 11% | 2%
EJ | 2% | 2%
OPP | 1% | 1%

The clear extreme score is the probability of AR15 having a Really Good play. EJ has largely performed at the same level as UF’s opposing QBs.
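A minimal sketch of the Really Good / Really Bad tally, assuming the "group average" and standard deviation are computed over all plays pooled together (the post doesn't say whether the bounds were pooled or computed per player) and using the sample standard deviation. `extreme_play_rates` and its input layout are hypothetical:

```python
from collections import defaultdict
from statistics import fmean, stdev


def extreme_play_rates(plays, k=2.0):
    """Share of each player's plays beyond mean +/- k*SD of the pooled EPA.

    plays: list of (player, epa) pairs (hypothetical layout).
    Returns {player: (share_really_good, share_really_bad)}.
    """
    epas = [epa for _, epa in plays]
    mu, sd = fmean(epas), stdev(epas)  # pooled "group average" (an assumption)
    upper, lower = mu + k * sd, mu - k * sd
    tallies = defaultdict(lambda: [0, 0, 0])  # good, bad, total
    for player, epa in plays:
        t = tallies[player]
        t[0] += epa > upper  # Really Good
        t[1] += epa < lower  # Really Bad
        t[2] += 1
    return {p: (g / n, b / n) for p, (g, b, n) in tallies.items()}
```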

This can also be seen in a simple boxplot, where outliers are shown as points more than 1.5 times the interquartile range below the first quartile or above the third quartile.

The boxplots show that EJ has quite a few negative outliers (dots at the bottom), while all of AR15’s outliers are at the top (good).
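Boxplot outliers are conventionally flagged with Tukey's fences: points more than 1.5 × IQR below the first quartile or above the third. A small standard-library sketch follows. Plotting libraries compute quartiles in slightly different ways, so a plotted boxplot may disagree at the margins, and `tukey_outliers` is a hypothetical helper:

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return (low_outliers, high_outliers) using Tukey's fences.

    Fences are Q1 - k*IQR and Q3 + k*IQR. method="inclusive" interpolates
    linearly between data points, like NumPy's default percentile method.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    low = [v for v in values if v < low_fence]
    high = [v for v in values if v > high_fence]
    return low, high
```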

I’ll continue to update throughout the season. The Georgia game will be a great opportunity to see what AR15 can do against an elite defense, though he will surely need some help from his teammates. Once I get some more time, I’ll include just the SEC QBs to get a better reference point.

Differences in mean weight for Offensive Line Recruits by Star Rating

Using data on offensive linemen from the 2005 to 2021 recruiting classes (data: https://www.collegefootballdata.com/), I found a statistically significant difference in weight between the 3-, 4-, and 5-star categories (p < 0.001). Because a Shapiro-Wilk test indicated that weights were not normally distributed (p < 0.001), a Kruskal-Wallis test with pairwise comparisons was used.

Median weights were 285 lb for 3-stars, 295 lb for 4-stars, and 305 lb for 5-stars. Apparently weight is a factor in the ratings process.
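The omnibus part of that analysis can be sketched with the standard library. Below is a minimal Kruskal-Wallis H test with tie correction, assuming exactly three groups (3-, 4-, and 5-star weights), which lets the chi-square p-value use the exact closed form for 2 degrees of freedom, exp(-H/2). The pairwise follow-up comparisons are not shown, and `kruskal_wallis` is a hypothetical helper rather than the tool used in the post:

```python
import math
from collections import Counter


def kruskal_wallis(*groups):
    """Kruskal-Wallis H test with tie correction for exactly three groups.

    Returns (H, p). The p-value uses exp(-H/2), the chi-square survival
    function for df = 2, which is exact only when there are three groups.
    """
    if len(groups) != 3:
        raise ValueError("this sketch's p-value shortcut assumes exactly 3 groups")
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    # tied values share the mean of the ranks they span
    ranks, i = {}, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    h = 12 / (n * (n + 1)) * sum(
        sum(ranks[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)
    # tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n)
    ties = sum(t**3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n**3 - n)
    return h, math.exp(-h / 2)
```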

A more granular view of star-rankings

One of the main problems with star categorization is that, like most ordinal data, it doesn’t indicate the degree of separation between players within the same star grade. Generally speaking, we can use the individual ratings to find that, which usually works well. But that still doesn’t tell us exactly where within a particular star category a player currently sits. Taking the range of composite ratings for each star category, I added a normalized score at each possible level (to 4 decimal places). Applying that to Florida’s current commits and leans (per Crystal Ball) to get an idea of where they stand provides some insight:

Tyler Booker is practically a 5-star (4.98) and Julian Humphrey is a very high 4-star. This is pretty straightforward: Booker sits 98% of the way up the 4-star range, so he is a 4.98 star.
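The normalization can be sketched as a min-max rescaling of a player's composite rating within his star tier, added to the tier number. The tier boundaries below are HYPOTHETICAL placeholders for illustration (the post does not list them, and real cutoffs shift by cycle), as is the function name `granular_star`:

```python
# HYPOTHETICAL composite-rating range for each star tier, illustration only;
# the post does not list the actual boundaries, which vary by cycle.
HYPOTHETICAL_STAR_RANGES = {
    3: (0.7970, 0.8899),
    4: (0.8900, 0.9833),
    5: (0.9834, 1.0000),
}


def granular_star(rating, star, ranges=HYPOTHETICAL_STAR_RANGES):
    """Star tier plus the rating's normalized position within that tier.

    A player 98% of the way up the 4-star range comes out at about 4.98.
    """
    lo, hi = ranges[star]
    position = (rating - lo) / (hi - lo)  # min-max normalization within the tier
    position = min(max(position, 0.0), 1.0)  # clamp ratings outside the range
    return round(star + position, 2)
```

With these illustrative bounds, `granular_star(0.9087, 4)` returns 4.2 and `granular_star(0.9738, 4)` returns 4.9, in line with the composite ratings quoted for those levels in the draft-probability discussion. Different bounds would shift the results.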

Class-wise, Florida has some ground to make up, as they are currently 15th using this metric (and controlling for number of commits/leans). Fortunately they have time:

Down and Distance success as a predictor of winning

As Nick Saban recently stated, offense is what is predominantly winning games in college football. Furthermore, passing and points are at a premium. Converting on a down-and-distance (D&D) situation is intuitively good for the offense, which in turn is good for the team’s chances of winning. I was curious about how converting D&D relates to winning. Since I already had the play-by-play data from the SEC 2020 season courtesy of https://www.collegefootballdata.com/ (with the exception of the Ole Miss vs. Vanderbilt game), I decided to check it out.

Among the 67 regular season games analyzed, the winner converted D&D situations (either a first down or a touchdown) at a higher rate than their opponent 77.6% of the time. On average, the winner’s D&D rate was 9.7% higher than their opponent’s. In the few games in which the team with the lower D&D rate won, this difference shrank to 4.2%. So when the team with the higher D&D rate lost, the gap was typically much smaller, which makes sense.
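A minimal sketch of the game-level comparison, assuming each game's plays have been reduced to (offense team, converted) pairs, where `converted` means the play produced a first down or a touchdown, and assuming the rate is conversions per play (the post doesn't state the denominator). The function names and input layout are hypothetical, and the gap is reported in percentage points:

```python
from collections import defaultdict


def dd_rates(plays):
    """plays: iterable of (offense_team, converted) pairs for one game.
    Returns {team: share of its plays that converted}."""
    tally = defaultdict(lambda: [0, 0])  # converted, total
    for team, converted in plays:
        tally[team][0] += bool(converted)
        tally[team][1] += 1
    return {t: c / n for t, (c, n) in tally.items()}


def winner_higher_share(games):
    """games: iterable of (winner, loser, plays) tuples.
    Returns (share of games where the winner had the higher D&D rate,
             mean winner-minus-loser gap in percentage points)."""
    games = list(games)
    higher, gaps = 0, []
    for winner, loser, plays in games:
        rates = dd_rates(plays)
        gap = rates[winner] - rates[loser]
        higher += gap > 0
        gaps.append(100 * gap)
    return higher / len(games), sum(gaps) / len(gaps)
```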

A regression model for the data indicates a pretty strong relationship between winning and D&D conversion success. The model is statistically significant (p < .001, r² = 0.6281). Yes, it’s a small sample size, but it’s still informative. A quick look at a scatterplot shows the linearity:

8 of 14 observations land within 2 standard errors of the regression line. Adding the team labels shows us who underperformed/overperformed as well:

The team that overperformed the most was Texas A&M, while Ole Miss underperformed. There were only two games in which the loser had a D&D conversion rate more than 10 percentage points higher than their opponent’s: Mississippi State vs. Vanderbilt (MSU won 24-17, but Vanderbilt had a 34.1% D&D rate while MSU had only 22.8%) and LSU vs. Florida (LSU won, but UF had a D&D rate of 40.5% compared to 29.1% for LSU). So, going by this, LSU beating Florida was the SEC’s biggest upset in 2020.

Team | Avg D&D | Win Percentage | D&D Rank
Alabama | 0.424 | 1.00 | 1
Florida | 0.390 | 0.80 | 2
Ole Miss | 0.370 | 0.44 | 3
Texas A&M | 0.350 | 0.89 | 4
Georgia | 0.324 | 0.78 | 5
Auburn | 0.317 | 0.60 | 6
LSU | 0.315 | 0.50 | 7
Missouri | 0.308 | 0.50 | 8
Arkansas | 0.308 | 0.30 | 9
Tennessee | 0.305 | 0.30 | 10
South Carolina | 0.301 | 0.20 | 11
Kentucky | 0.289 | 0.40 | 12
Vanderbilt | 0.278 | 0.00 | 13
Mississippi State | 0.275 | 0.30 | 14

I’ll probably play with this some more with additional years and conferences. I’d like to see how it plays out with larger sample sizes.

The Argument for Kyle Trask as 2020’s Best QB

Projecting the Gators’ QB’s total yards and total touchdowns over a 12-game schedule shows just how far ahead of the competition Trask was this year.

So much has already been said about Trask and the fact that his stats came against an all-SEC schedule, etc. It seems as if there is some sentiment that Florida’s 3 unfortunate losses are being held against Trask, which is absurd.

The University of Florida Gators 2020 football team is very likely one of the better teams in the nation despite having 3 losses prior to the Cotton Bowl against the University of Oklahoma Sooners. Three low-probability plays led to three narrow defeats, any of which, if prevented, could have resulted in a win instead of a loss. If Malik Davis doesn’t fumble on the Gators’ final drive against Texas A&M. If Marco Wilson doesn’t throw a shoe against LSU. If Trey Dean looks left and protects himself against getting vaporized on an interception return against Alabama. Though none of these plays guaranteed victory in any of those games, each was pivotal in the outcome. Each was also very unlikely to occur, yet they all did.

When put into context, the combined margin of defeat in those three losses equals the closest point differential in any of Florida’s 8 victories: 12, against the University of Tennessee. Florida was dominant in all of its games except its three losses, and in each of those, Florida could just as easily have won, but lady luck was simply not on its side.

The above scatter plot shows total yards and touchdowns (rushing and passing) for 103 college QBs in 2020 (https://www.sports-reference.com/cfb/years/2020-passing.html), projected over 12 games. Ole Miss’ Matt Corral projects to have the most total yards, but a big chunk of those (625) come from rushing. Trask projects to have the most passing yards (4,500) and the most touchdowns overall (47).
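The post doesn't spell out the projection, but a natural reading is each QB's per-game rate multiplied by 12 games. A sketch under that assumption; the function name `project_stat_line` and the stat line are hypothetical, not any real quarterback's numbers:

```python
def project_stat_line(stats, games_played, season_games=12):
    """Scale each season total to `season_games` games at the same per-game rate."""
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return {stat: round(total / games_played * season_games)
            for stat, total in stats.items()}


# HYPOTHETICAL stat line for illustration only
example = {"pass_yds": 3000, "rush_yds": 250, "pass_td": 30, "rush_td": 5}
projected = project_stat_line(example, games_played=10)
total_yards = projected["pass_yds"] + projected["rush_yds"]  # rushing + passing
total_tds = projected["pass_td"] + projected["rush_td"]
```

Scaling by games played puts QBs with shortened or extended seasons on the same footing, which matters in a year with canceled and added games.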

Here is another look at the same data with each QB listed:

Touchdown passes by game for SEC QBs who started each game
Cumulative passing yards by SEC QBs who started each game

The consolidation of power in college football recruiting since 2005

Since 2005, it appears as if a few teams have become recruiting superpowers. Of course, there were good recruiters and powerhouse teams before that. But it seems as if many of the top recruits have been ending up at the same ol’ schools: Alabama, Ohio State, Clemson, and Georgia. In looking at this, some noticeable trends emerged.

I took all the teams that had an average composite recruit rating at least 1 standard deviation above the Power 5 mean and separated them out from the rest of college football. The full list is at the bottom of the page. There were 14 teams. I then counted how many recruits rated at least 0.9300 each of these teams had each year back to 2005. I have found this rating to be a sort of cutoff for draft probability: players rated over 0.9300 are more likely to be drafted than those under.
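A minimal sketch of the two steps, assuming team averages are standardized against the mean and population standard deviation of all Power 5 teams (the post doesn't say which SD was used) and that recruits arrive as (team, year, composite) records. The function names and layouts are hypothetical; the 0.9300 cutoff comes from the text:

```python
from collections import Counter
from statistics import fmean, pstdev


def power_teams(p5_averages, k=1.0):
    """Teams whose average composite rating is at least k standard deviations
    above the mean of all teams passed in (e.g. every Power 5 program)."""
    values = list(p5_averages.values())
    mu, sd = fmean(values), pstdev(values)  # population SD: an assumption
    return {team for team, avg in p5_averages.items() if (avg - mu) / sd >= k}


def elite_counts(recruits, teams, cutoff=0.9300):
    """Count recruits rated at or above `cutoff` per (team, year).

    recruits: iterable of (team, year, composite_rating) tuples (hypothetical layout).
    """
    return Counter((team, year) for team, year, rating in recruits
                   if team in teams and rating >= cutoff)
```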

Below, we can see how these teams’ share of those recruits grew over time.

So, over time we can see how the concentration of recruiting power has moved from the bottom left to the top right. We can also see how good UF was under Meyer and how bad we were under Mac. Furthermore, it shows that LSU, Alabama, and Georgia are soaking up a ton of talent in the SEC, making the task of winning the conference (if you’re not one of those teams) probably much harder.

The data:

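Team | Overall Avg | Rank | Standardized | 5-stars | 4-stars | 3-stars | Conf.
USC | 0.9214 | 1 | 2.177 | 48 | 154 | 99 | Pac-12
Ohio State | 0.9201 | 2 | 2.133 | 32 | 188 | 82 | Big Ten
Alabama | 0.9169 | 3 | 2.021 | 52 | 202 | 126 | SEC
Texas | 0.9126 | 4 | 1.866 | 24 | 204 | 116 | Big 12
Georgia | 0.9102 | 5 | 1.783 | 42 | 174 | 135 | SEC
Florida | 0.9091 | 6 | 1.744 | 30 | 177 | 132 | SEC
LSU | 0.9081 | 7 | 1.710 | 27 | 192 | 133 | SEC
Florida State | 0.9036 | 8 | 1.549 | 36 | 152 | 153 | ACC
Notre Dame | 0.9023 | 9 | 1.504 | 13 | 180 | 142 | Ind
Oklahoma | 0.8982 | 10 | 1.359 | 15 | 162 | 152 | Big 12
Michigan | 0.8939 | 11 | 1.208 | 12 | 173 | 167 | Big Ten
Clemson | 0.8895 | 12 | 1.052 | 25 | 125 | 168 | ACC
Miami | 0.8894 | 13 | 1.049 | 13 | 131 | 175 | ACC
Auburn | 0.8886 | 14 | 1.021 | 13 | 155 | 178 | SEC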
The above graph shows the recruiting classes for each SEC team from 2005 to 2020.

The above graph shows each SEC team’s recruiting as it correlates with win percentage.

A look at the Gators’ problems on 3rd down.

All of us Gators have heard – 3rd and Grantham. We certainly feel it each week, but is it backed up by data? Yep.

Using EPA as a measure of whether a play was won or lost, I took a look at how each SEC team has performed so far in 2020. Definitely not a good finding for UF, and it confirms what a lot of us Gators already knew. Now we know it even more (if that is a thing).

In the above table, we can see that Florida is winning on 3rd down (overall) 45.8% of the time, 8th in the SEC in 2020. However, they are only winning 46.3% of the time on 3rd and 7 or longer. That is 12th in the SEC, better than only Vanderbilt and Arkansas.
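A minimal sketch of the 3rd-down win rates, assuming a play counts as "won" by the offense when EPA > 0 and by the defense when EPA < 0. Because "3rd and Grantham" points at the defense, the sketch defaults to the defensive view, which is an assumption about how the table was built. The function name and play layout are hypothetical:

```python
from collections import defaultdict


def third_down_win_rates(plays, side="defense", long_yards=7):
    """Share of 3rd-down plays 'won' by each team, overall and on 3rd-and-long.

    plays: iterable of (offense, defense, down, distance, epa) tuples, with EPA
    measured from the offense's point of view (hypothetical layout). An offense
    wins a play when EPA > 0; a defense wins it when EPA < 0.
    Returns {team: (overall_rate, long_rate)}; long_rate is None with no long plays.
    """
    tally = defaultdict(lambda: [0, 0, 0, 0])  # won, total, won_long, total_long
    for offense, defense, down, distance, epa in plays:
        if down != 3:
            continue
        team = defense if side == "defense" else offense
        won = epa < 0 if side == "defense" else epa > 0
        rec = tally[team]
        rec[0] += won
        rec[1] += 1
        if distance >= long_yards:  # 3rd and 7 or longer by default
            rec[2] += won
            rec[3] += 1
    return {team: (w / n, wl / nl if nl else None)
            for team, (w, n, wl, nl) in tally.items()}
```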

This bar graph shows just how poorly UF has performed on 3rd down.