Stock à Stats: octobre 2021

Introduction

This is the year 2021 and the world only talks about one thing : the new Netflix tv Show called "Squid Game" (no, nobody talks about the coronavirus anymore, it's too 2020).

Warning : limited spoilers ahead.

The show tells the story of a groups of people that have to go through a serie of tests where, each time, some of the contestants will be definitively eliminated from the game. In the episode 7, we will attend to the 5th test out of the 6 planned in the competition. This test relies partially on probability and luck.

The design

!! SPOILERS AHEAD !!

At this point of the game, there are only 16 contestants left. The test consists on 18 rows of two tiles each. The contestants will have to go one by one on each of these 18 rows. For each row, he will have to select between two tiles (left or right) but, one of the two tiles of each row is trapped (and the other one is safe) and if a contestant steps on the trapped tile, he will fail the test and be eliminated. (I tried to be vague in the description to avoid as much to spoil as I can).

To summarize, for each row, a contestant has one chance out of two to be eliminated and one chance out of two to go to the next row. Also, in this game, order really matters because even if the first contestant as to rely on his luck to guess correctly 18 times in a row the good answer in order to go through all the rows (he has one chance out of 2^18 to succeed). The following contestants knows what were the good and the bad tiles for each rows that the previous contestants already manage to go through.

For example, if the first contestant succeed at the first row, selecting the left tile and fail at the second row choosing the right tile, the second contestant already knows for sure the good answer for the first two rows : they are in both cases the left ones. Which means he just have to try is luck on the last 16 remaining rows which slighlty increases its chances. And the probability to have somebody able to go through the 18 tiles increases for each following contestant.

The question is to know, what are the probabilities for any number of contestant to succeed at the game. We already know that the probability to have 16 successful contestant out of 16 means that the first player should not be eliminated and then should guess correctly the 18 good tiles, meaning one chance out of 262 144.

But what are the probabilities to have 15, 14, 13 succesful players?

"Squid Game" has zero chance of clearing through the glass bridge

Simulations

The basic way to know the answer is to run computer simulations, test plenty of cases and have an estimation of the repartition of the outcome (ie the number of successful player).

R code to run such a simulation looks like this :

NPlayers = 16
NRows = 18
# Function to count the number of successful people going one by one
Success = function(Sampling) {
  vec = array(NA, dim = ncol(Sampling))
  playerid = 1
  TileId = 1
  while (playerid <= nrow(Sampling) & TileId <= ncol(Sampling)) {
    if (Sampling[playerid, TileId] == 0) {
      # Bad tile selected the next player has to play
      # The next player will have the right i-th tile 
      #(because the previous contestant failed so he knows the right answer)
      playerid = playerid + 1
    }
    vec[TileId] = playerid
    TileId = TileId + 1 # Next row
  }
  # Return the number of succesful people
  if (any(vec > nrow(Sampling))) return(0) else return(nrow(Sampling) - vec[length(vec)] + 1)
}

Count = c()
for (iter in 1:1000000) {
# Basic random simulation of each participant walk

  # don't take into account their previous steps 
  Sampling = matrix(sample(c(0, 1), size = NPlayers * NRows, replace = T), ncol = NRows)
  Count[iter] <- Success(Sampling)
}
table(Count)

Which gives this kind of result :

Even if the bar does not appears on the graph, there are some cases where 13, 14, 15 and 16 people manages to go through but the values are so small they don't show up. From the simulation we can see that in this configuration ( 16 players and 18 rows ) more than 50% of the time 6 to 8 players will succeed. These are the most common occurences.

Results based on 1 million simulations :

N Success	N Occurences	Observed Probability 1 time out of ...
0	651	1536
1	3 009	332
2	11 658	86
3	32 901	30
4	70 535	14
5	121 481	8
6	166 921	6
7	185 816	5
8	167 026	6
9	121 408	8
10	70 595	14
11	32 585	30
12	11 583	86
13	3 154	317
14	597	1 675
15	75	13 333
16	5	200 000

The theory says that there is one chance out of 262 144 to have 16 players that succeed, in our simulation it happened 5 times out of 1 million, hence once each 200 000 trials, which is close to the theoretical value.

Analytical results

After creating a rough estimation of each proportion we can try to be more accurate in the description. In our case we have N = 18 rows, each of them have 2 possibilities. This means there are 2^18 possible outcomes from people crossing all the rows.

The case where no mistakes are made is unique and then appears once out of the 262 144 (2^18) cases.

The case where 1 failure happened is not unique. The mistake can happened at row 1 or row 2 or row3, etc.. up to row 18. Which means there are 18 cases possible where only one error happened. Hence the probability of an outcome with one error represents 18 of the 262 144 cases.

For the cases with two errors, you have to list all the combinations of two positions among the 18 where the failure occurs. You have 18 possibility for the first error and 17 remaining possibilities for the second. Hence you have 18 x 17 = 306 possibilities. But be careful because you are counting twice the same results. For example the couple of error (10,2) and (2,10) are in fact the same. So you need to divide by two which leads to 153 unique combinations aomng the 262 144 cases.

For the case with three failures, method remains the same, you can count all the possibilities 18x17x16 = 4896 and divide by 6 (because each triplet of number have 6 different way to be sorted but are the same for us). Three failures happens 816 times out of 262 144.

The rest of the computation works the same way except for the 3 last cases. Because we are counting the number of players going through all the cases with 16 or more mistakes will leads to no player left. That is why we should groups them in the same value : "0 player remaining"

N Success	Probability	Real occurence
0	0.06%	1524.09
1	0.31%	321.25
2	1.17%	85.67
3	3.27%	30.6
4	7.08%	14.12
5	12.14%	8.23
6	16.69%	5.99
7	18.55%	5.39
8	16.69%	5.99
9	12.14%	8.23
10	7.08%	14.12
11	3.27%	30.6
12	1.17%	85.67
13	0.31%	321.25
14	0.06%	1713.36
15	0.01%	14563.56
16	0.0004%	262144

Our simulated results were not too far from the reality.

Conclusion

SPOILER AGAIN

In the TV show, only 3 people manage to go through the 18 rows ( because, of course, nothing happened like it was supposed to) but in theory there should have been 6 to 8 people able to do it. The case where only 3 people manage to go through only happens 3% of the time and havig 3 person or less that manage to go through will only occurs with a probability below 5%.

Stock à Stats

Squid Game : Game 5 probability

Introduction

The design

Simulations

Analytical results

Conclusion

Evolution of unisex firstnames in the US