--- title: "Lecture2_BD" author: "Sabrina Streipert" date: "27 February 2019" output: word_document --- ```{r setup} knitr::opts_chunk$set(echo = TRUE) library(dplyr) ``` ### Chapter 2: The Birthday Probability If $N$ is the number of students in a class, then the probability that at least 2 students share the same birthday is $$P = 1- \frac{365!}{(365-n)!}\cdot \frac{1}{365^n}$$ So, let us calculate the probability that none of the students share a birthday and then we just substract that from 1. Frist, we create a class: ```{r cars} Classsize <- 2:70 BDex <- Classsize %>% data.frame() head(BDex) ``` Let us change the column name to the Number: ```{r} BDex <- BDex %>% rename(Number = '.') head(BDex) ``` Now, let us calculate the probability that none of the students share a birthday (= complement) by adding a column: ```{r, message=FALSE, warning=FALSE} BDex1 <- BDex %>% mutate(Prob_compl = factorial(365)/(factorial(365-Number)*(365**Number))) head(BDex1) ``` Oh NO, the values are too large. What now? -> One option: seek online help -> get the inforamtion of lfactorial, which is the logarithm of the factorial value.... If you want some more information about it, use the command: ?lfactorial Can I do that? ```{r} BDex2 <- BDex %>% mutate(lprob_compl = lfactorial(365)/(lfactorial(365-Number)*log(365**Number))) ``` No!!!! recall that $$\log \left( \frac{365!}{(365-N)! 365^N}\right) \neq \left( \frac{\log(365!)}{\log(365-N)! \log(365^N)}\right) $$ So, we need to be careful with these operations! We recall some logarithm operations: $$ \log \left( \frac{365!}{(365-N)! 365^N}\right) = \log(365!)-\log((365-N)!365^N)=\log(365!)-\log((365-N)!)-\log(365^N) $$ And even one more step to get $$ \log \left( \frac{365!}{(365-N)! 365^N}\right) = \log(365!)-\log((365-N)!365^N)=\log(365!)-\log((365-N)!)-N\log(365) $$ Let's try that ```{r} BDex3 <- BDex %>% mutate(lprob_compl = lfactorial(365)-lfactorial(365-Number)- Number*log(365)) head(BDex3) ``` Oh no, why are there negative values??? Don't forget to transform the probability back via the inverse of the logarithm function: ```{r} BDex3 <- BDex3 %>% mutate(prob_compl = exp(lprob_compl)) head(BDex3) ``` Since we are interested in the probability that at least two students share the same birthday, we have to subtract the complement from 1, that is: ```{r} BDex4 <- BDex3 %>% mutate(prob = 1-prob_compl) head(BDex4) ``` Now, let us plot that ```{r} plot(BDex4$Number, BDex4$prob, xlab = 'Number', ylab = 'probability', main = 'Prob that at least 2 students share BD') ``` ## Obtain an approximation of this probability through sampling (Monte Carlo Method): We create several classes, say 10000, and count how many of these classes have students that share the same birthday. For that let us fix a class room size, say of 30 students. ```{r} # pick the year 1998 (or any other year with 365 days) # make a list of all dates: Possible_Dates <- seq( as.Date("1998-01-01"), as.Date("1998-12-31"), by="+1 day") #check head(Possible_Dates) tail(Possible_Dates) ``` Now, we take samples from this list ```{r} Nr_class <- 10000 Nr_students <- 30 counter <- 0 for (i in 1:Nr_class){ Class <- 2:Nr_students %>% data.frame() %>% rename(StudentID = '.' ) Class$BD <- sample(Possible_Dates, (Nr_students-1),replace=TRUE) BD_counter <- Class %>% group_by(BD) %>% summarize(Countpergroup = n()) counter <- counter + max(BD_counter$Countpergroup>1) } prob <- counter/Nr_class print(prob) ``` Compared to the theoretical value: ```{r} Theor <- filter(BDex4, Number == Nr_students) print(Theor$prob) ```