Introduction
Die-hard NBA fans can watch several games a day, but let's face it: sitting through all 82 of your favorite team's games on TV over a season can sometimes get boring. You sometimes end up glancing absent-mindedly at the screen, waiting for the game to end, all the more so when your team leads by 20 and you just want it to be over. Hopefully, this article will help you decide whether you can stop watching because the result won't change.
Premise
The method is simple and relies on a well-known analysis that many people have already done: "win probability". The idea is to determine, at any moment of a game, the probability of victory based on objective elements (game score, home-court advantage, team ranking...). To do so, one needs historical data from which to infer the probability of victory. For example, if a team that led by 20 points at the end of the third quarter went on to win 990 times out of 1000, we can estimate that a team leading by 20 points at that stage in the future has a 99% probability of victory.
This is the very basic explanation. To create the estimates, we just need to extract data and fit a mathematical model to the results. In this article, we will keep it simple by only taking into account the score difference and the remaining time. Some websites (like ESPN) also take into account the difference in team ranking, possession and many other parameters. It is possible to gather such data and compute these models (which are indeed more accurate), but the results are harder to summarize. It seems easier to say:
- If a team leads by 7 with 6 minutes remaining, it has a 90% chance of winning the game
Instead of:
- If a team leads by 7 with 6 minutes remaining, is ranked ten places higher and plays at home, but played the day before and its best scorer is injured, then it has an 86% chance of winning.
The second statement is certainly more precise, but it creates too many cases to handle and to remember.
Data gathering
To analyze these probabilities, we must collect information about past games. Specialized sites have huge, comprehensive databases for running these analyses. But as you are not currently reading an ESPN article, the data has to be gathered differently. The method is simple (and ugly): go to basketball-reference.com, which lists all NBA games, and extract the play-by-play summaries. For every game since 2001, every possession and action is detailed (who missed the shot, when, what the score was...), which gives the following type of table:
Figure: Example of play-by-play
It is then possible to know the score at each second of the game and thus the score difference of the eventual winner at that point. Extracting this information from 2001 through the 2017-2018 season (playoffs included) yields 23219 games. For the rest of the analysis we will focus only on the last quarter (the last 12 minutes). Keeping just the final 720 seconds of each game creates a dataset of 16.7 million rows (quite big, but not that difficult to handle; anyway, the software does the computation, you don't do it manually), and it looks like this:
Figure: Data subset
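A table of this shape could be built from the play-by-play roughly as follows. This is only a minimal sketch, not the code used for the article: the function name `per_second_rows`, the `(seconds_remaining, home_score, away_score)` event format and the choice to number seconds from 720 down to 1 are all assumptions made for illustration.

```python
LAST_QUARTER_SECONDS = 720  # 12 minutes


def per_second_rows(game_id, events, winner_is_home):
    """Expand one game's last-quarter score changes into one row per second.

    events: list of (seconds_remaining, home_score, away_score) tuples
    (hypothetical format). The entry with seconds_remaining == 720 must
    give the score at the start of the quarter.
    Returns (game_id, seconds_remaining, score_diff) rows, where score_diff
    is measured from the eventual winner's point of view.
    """
    # Walk the clock from the start of the quarter (720) down to 1.
    events = sorted(events, key=lambda e: -e[0])
    rows = []
    idx = 0
    home, away = events[0][1], events[0][2]
    for sec in range(LAST_QUARTER_SECONDS, 0, -1):
        # Apply every score change that has happened by this second.
        while idx < len(events) and events[idx][0] >= sec:
            _, home, away = events[idx]
            idx += 1
        diff = home - away if winner_is_home else away - home
        rows.append((game_id, sec, diff))
    return rows


# Made-up game: the away team trails for most of the quarter, then wins 85-82.
example_events = [(720, 80, 78), (700, 82, 78), (10, 82, 85)]
example_rows = per_second_rows("g1", example_events, winner_is_home=False)
print(len(example_rows))  # 720
```

With 720 rows per game, 23219 games give 23219 x 720 = 16,717,680 rows, which matches the 16.7 million quoted above.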
The first column is the game ID, the second is the number of seconds remaining and the last is the score difference from the winning team's point of view. It is then possible to calibrate a model because we have 23219 observations for each second. Take, for example, the score difference with 60 seconds remaining. The following graph shows that there were 1225 games where the score difference with 60 seconds remaining was 1 point (first bar of the graph) but only 443 games where it was 20 points. Blue shows the games where the team in the lead (again with 60 seconds remaining) went on to win, and orange the games where the leading team ended up losing.
Of the 1225 games with a 1-point difference with 60 seconds remaining, the leading team went on to win 826 times (blue bar) and the trailing team 399 times (orange). Hence, the "observed" probability of winning a game when leading by 1 with 60 seconds left is 826/1225 = 67.4% (overtime games were left out to simplify the computation).
We can easily see that a lead of more than 10 points with 60 seconds remaining guarantees a win (that was the case all 9637 times it happened). It does not mean a comeback will never happen (we are just making an inference); it just means it has never happened before. So, if your team is trailing by more than 10 points with 60 seconds remaining, I advise you to leave: you will only be hurting yourself for nothing (or, if you stay, you may witness a historic moment). The only time a team trailing by 10 went on to win was the famous 2004 Houston Rockets - San Antonio Spurs game, with Tracy McGrady's 13 points in 35 seconds.
We just saw that a team leading by 1 with 60 seconds remaining won 67% of the time. If a team leads by 2, it wins 84% of the time; by 3 points, 90%; by 4 points, 96%; and so on. The same analysis just needs to be repeated for every other number of seconds remaining.
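Repeating the count for every second and every lead can be sketched as below. It assumes rows of the form (game ID, seconds remaining, score difference for the eventual winner), as in the data subset described earlier. The function name `observed_win_probability` is made up for this example. Because the difference is stored from the winner's point of view, a positive value means the leading team won and a negative value means the leading team lost.

```python
from collections import Counter


def observed_win_probability(rows):
    """Return {(seconds_remaining, lead): share of games won by the leader}.

    rows: iterable of (game_id, seconds_remaining, score_diff), where
    score_diff is measured from the eventual winner's point of view.
    Tied scores (score_diff == 0) are ignored.
    """
    counts = Counter((sec, diff) for _, sec, diff in rows)
    # Every (second, lead) pair seen in the data, whoever ended up winning.
    margins = {(sec, abs(diff)) for sec, diff in counts if diff != 0}
    table = {}
    for sec, lead in margins:
        leader_won = counts[(sec, lead)]    # winner was ahead by `lead`
        leader_lost = counts[(sec, -lead)]  # winner was behind by `lead`
        table[(sec, lead)] = leader_won / (leader_won + leader_lost)
    return table


# Reproduce the 60-second, 1-point case from the counts quoted above.
sample = [("won", 60, 1)] * 826 + [("lost", 60, -1)] * 399
print(round(observed_win_probability(sample)[(60, 1)], 3))  # 0.674
```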
Analysis
It is then possible to create a model that fits all the data. The data only needs to be smoothed, because there is slight variability in the results. After making these adjustments, we can produce a graph with win-probability levels to summarize the outcome.
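The article does not say which model was used for the smoothing. Purely as an illustration, one simple candidate is a normal random-walk approximation, where the win probability of the leading team is Phi(lead / (sigma * sqrt(seconds remaining))) and the single parameter sigma is fitted to the observed frequencies. The function names, the grid search and the choice of this model are all assumptions of this sketch; only the four observed values at 60 seconds come from the text above.

```python
import math


def model_win_probability(lead, seconds_remaining, sigma):
    """Assumed model: P(win) = Phi(lead / (sigma * sqrt(seconds_remaining)))."""
    z = lead / (sigma * math.sqrt(seconds_remaining))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fit_sigma(observed):
    """Grid-search the sigma that minimises the squared error.

    observed: {(seconds_remaining, lead): observed win probability}
    """
    def loss(sigma):
        return sum(
            (model_win_probability(lead, sec, sigma) - p) ** 2
            for (sec, lead), p in observed.items()
        )

    candidates = [0.05 + 0.005 * i for i in range(191)]  # 0.05 to 1.00
    return min(candidates, key=loss)


# Observed frequencies with 60 seconds remaining, as quoted in the article.
observed_at_60s = {(60, 1): 0.674, (60, 2): 0.84, (60, 3): 0.90, (60, 4): 0.96}
sigma = fit_sigma(observed_at_60s)
print(sigma, model_win_probability(7, 360, sigma))
```

Fitting on one second only is far too little data for a real model; in practice the whole table of observed frequencies would be passed to `fit_sigma`. The last line gives a smoothed estimate for a 7-point lead with 6 minutes left, which can be compared with the 90% quoted earlier.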
If you are not very motivated to watch a game, you just need to look at the 80% curve. It means that 4 times out of 5, the team leading by 5 with 7 minutes remaining (or by 6 points with 12 minutes remaining) went on to win.
However, if you want to be sure about the winner, look at least at the 95% curve. It shows that a team leading by 10 with 7 minutes remaining will win 95% of the time.
I would advise using the 90% curve: a team leading by 10 at the start of the last quarter will win 9 times out of 10. Most of the time, you will save yourself a useless quarter.
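Reading a level off the graph amounts to finding, for a given time, the smallest lead whose win probability reaches the chosen level. Here is a minimal sketch of that lookup. The helper name `lead_needed` is hypothetical, and the table is deliberately sparse: it contains only the values quoted in this section, whereas a full table would hold one entry per second and per lead.

```python
def lead_needed(table, seconds_remaining, level):
    """Smallest lead in `table` whose win probability reaches `level`.

    table: {(seconds_remaining, lead): win probability of the leading team}
    Returns None when no lead in the table reaches the level at that time.
    """
    leads = [
        lead
        for (sec, lead), p in table.items()
        if sec == seconds_remaining and p >= level
    ]
    return min(leads) if leads else None


# Only the values quoted in this section (420 s = 7 minutes, 720 s = 12 minutes).
quoted = {(420, 5): 0.80, (420, 10): 0.95, (720, 6): 0.80, (720, 10): 0.90}

print(lead_needed(quoted, 420, 0.80))  # 5
print(lead_needed(quoted, 720, 0.90))  # 10
print(lead_needed(quoted, 720, 0.99))  # None
```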
Conclusion
The "fun" part of this analysis is not really the model (which is kept fairly simple so that the results are easy to summarize) but rather the data gathering and formatting needed to give a concise answer to a question every NBA fan has asked: "Does my team still have a chance to win this game?". Of course, numerous parameters have to be considered, and this is why sport is magic: it is due to the uncertainty of the final result. But how many times have you hoped for a positive outcome for your team when it was trailing by 6 with 2 minutes remaining? It seems possible, but in reality it only happens one time out of 40. So don't expect too much.