Samples of Thoughts
https://samplesofthoughts.com/
Recent content on Samples of Thoughts
Hugo -- gohugo.io
en-us
© Corrie Bartelheimer {year}
Thu, 17 Sep 2020 00:00:00 +0000

Categorical Variables
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_5/chp5partthree/
Thu, 17 Sep 2020 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_5/chp5partthree/
The Problem with Dummies The Index Variable Approach More Categories Even More Categories Notes of Caution These are code snippets and notes for the fifth chapter, The Many Variables & The Spurious Waffles, section 3, of the book Statistical Rethinking (version 2) by Richard McElreath.
In this section, we go through different ways to add categorical variables to our models.
The Problem with Dummies In the simplest case we only have two categories, e.

Masked Relationship
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_5/chp5parttwo/
Wed, 16 Sep 2020 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_5/chp5parttwo/
Hidden Influence in Milk The Causal Reasoning behind it Markov Equivalence Simulating a Masking Ball These are code snippets and notes for the fifth chapter, The Many Variables & The Spurious Waffles, section 2, of the book Statistical Rethinking (version 2) by Richard McElreath.
Hidden Influence in Milk In the previous section about spurious associations we used multiple regression to eliminate variables that seemed to have an influence when comparing bivariate relationships but whose association vanishes when introducing more variables to the regression.

Spurious Association
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_5/chp5partone/
Wed, 09 Sep 2020 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_5/chp5partone/
Spurious Waffles (and Marriages) Yo, DAG Does this DAG fit? More than one predictor: Multiple regression In Matrix Notation Simulating some Divorces How do we plot these? Predictor residual plots Posterior prediction plots Counterfactual plot Simulating spurious associations Simulating counterfactuals These are code snippets and notes for the fifth chapter, The Many Variables & The Spurious Waffles, section 1, of the book Statistical Rethinking (version 2) by Richard McElreath.

Animated Facebook Messages
https://samplesofthoughts.com/2020/animatedfacebookmessages/
Fri, 04 Sep 2020 00:00:00 +0000
https://samplesofthoughts.com/2020/animatedfacebookmessages/
I recently downloaded my own Facebook data and wanted to find out what kind of data gems I could find. There are some clear advantages when analyzing your own data, foremost, you’re the expert and know the “ground truth” behind the data. That said, there can still be big surprises!
In my case, the most interesting parts of the analysis could be boiled down to two graphics. Since there’s also a time factor in the data, I thought this was a good opportunity to learn about animated plots and indeed, it works quite beautifully with the two plots.

AI Guild Podcast: Data Science Interviews
https://samplesofthoughts.com/2020/aiguildpodcastdatascienceinterviews/
Mon, 18 May 2020 00:00:00 +0000
https://samplesofthoughts.com/2020/aiguildpodcastdatascienceinterviews/
On turning the tables during job interviews as your experience grows Some time ago, I sat down with Leyla from the AI Guild to talk about my own Data Science path but also my experience in job interviews for data positions. Recently, as I’ve been interviewing for positions, I wondered how to find out about some more sensitive topics. Where does the company stand on diversity? Is diversity important for them and do they take active measures to increase e.

Curvy Regression
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_4/chp4partthree/
Tue, 05 May 2020 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_4/chp4partthree/
Polynomial Regression Splines These are code snippets and notes for the fourth chapter, Geocentric Models, section 5, of the book Statistical Rethinking (version 2) by Richard McElreath.
Polynomial Regression Standard linear models that use a straight line to fit the data are nice for their simplicity, but a straight line is also very restrictive: most data does not fall on a straight line. We can use polynomial regression to extend the linear model.

Visa Costs meet Data Viz
https://samplesofthoughts.com/2020/visacostsdataviz/
Tue, 28 Apr 2020 00:00:00 +0000
https://samplesofthoughts.com/2020/visacostsdataviz/
I recently stumbled across this data set about visa costs. It is a collection of visa costs for all countries for different kinds of visas (tourist, business, work, transit, and some other visas). Each row corresponds to visa relations between a source country (the country applying for the visa) and a target country (the country issuing the visa) together with the cost for the different visa types.
Since I had a bit of free time on my hands, I decided to do some “plotcrastinating”: play around with the data and try out some new visualizations.

Chapter 4 - Exercises
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_4/chp4ex/
Wed, 22 Apr 2020 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_4/chp4ex/
Easy. Medium. Hard. These are my solutions to the practice questions of chapter 4, Geocentric Models, of the book Statistical Rethinking (version 2) by Richard McElreath.
Easy. 4E1. In the model definition below, which line is the likelihood: \[ \begin{align*} y_i &\sim \text{Normal}(\mu, \sigma) & & \text{This is the likelihood}\\ \mu &\sim \text{Normal}(0, 10) \\ \sigma &\sim \text{Exponential}(1) \end{align*} \]
4E2. In the model definition just above, how many parameters are in the posterior distribution?

First Linear Predictions
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_4/chp4parttwo/
Wed, 22 Apr 2020 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_4/chp4parttwo/
Prior Predictive Checks Running the Model in R Visualize our Model All the Uncertainty These are code snippets and notes for the fourth chapter, Geocentric Models, section 4, of the book Statistical Rethinking (version 2) by Richard McElreath.
In this section, we work with our first prediction model where we use the weight to predict the height of a person. We again use the !

Why everything so normal
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_4/chp4partone/
Tue, 21 Apr 2020 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_4/chp4partone/
Why normal distributions are normal A Gaussian Model of Height Quadratic Approximation and Prior Predictive Checks These are code snippets and notes for the fourth chapter, Geocentric Models, sections 1 to 3, of the book Statistical Rethinking (version 2) by Richard McElreath.
Why normal distributions are normal The chapter discusses linear models and starts with a recap on the normal distributions. Why is it such a commonly used distribution and how does it arise?

Chapter 2 - Exercises
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_2/chp2ex/
Mon, 20 Apr 2020 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_2/chp2ex/
Easy. Medium. Hard. These are my solutions to the practice questions of chapter 2, Small Worlds and Large Worlds, of the book Statistical Rethinking (version 2) by Richard McElreath.
Easy. 2E1. Which of the expressions below correspond to the statement: the probability of rain on Monday?
Pr(rain) Pr(rain | Monday) Pr(Monday | rain) Pr(rain, Monday) / Pr(Monday) Statement (4) is equivalent to (2) by Bayes’ theorem using joint probability.
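The equivalence of (4) and (2) is just the definition of conditional probability, written out:

\[ \Pr(\text{rain} \mid \text{Monday}) = \frac{\Pr(\text{rain}, \text{Monday})}{\Pr(\text{Monday})} \]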

Chapter 3 - Exercises
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_3/chp3ex/
Thu, 09 Apr 2020 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_3/chp3ex/
Easy. Medium. Hard. These are my solutions to the practice questions of chapter 3, Sampling the Imaginary, of the book Statistical Rethinking (version 2) by Richard McElreath.
Easy. The Easy problems use the samples from the globe tossing example:
p_grid <- seq( from=0, to=1, length.out=1000 ) prior <- rep( 1, 1000 ) likelihood <- dbinom( 6, size=9, prob=p_grid) posterior <- likelihood * prior posterior <- posterior / sum(posterior) set.

Connecting Disinformation with tidygraph
https://samplesofthoughts.com/2020/connectingdisinformationwithtidygraph/
Wed, 25 Mar 2020 00:00:00 +0000
https://samplesofthoughts.com/2020/connectingdisinformationwithtidygraph/
I recently participated in a hackathon organised by the EU’s anti-disinformation task force where they gave us access to their database. The database consists of all disinformation cases the group has collected since it started in 2015. Their data can also be browsed online on their web page www.euvsdisinfo.eu. The data contains more than 7000 cases of disinformation, mostly news articles and videos, that were collected and debunked by the EUvsDisinfo project.

House-Cleaning: Getting rid of outliers II
https://samplesofthoughts.com/2020/outlierhandlingtwo/
Mon, 24 Feb 2020 00:00:00 +0000
https://samplesofthoughts.com/2020/outlierhandlingtwo/
In the previous post, we tried to clean rental offerings of outliers. We first just had a look at our data and tried to clean it by simply using thresholds derived from our own knowledge about flats, with minor success. We got slightly better results by using the IQR rule and learned two things: first, the IQR rule works better if our data is normally distributed and, second, if it’s not, transforming it can work wonders.
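The IQR rule mentioned above can be sketched in a few lines of Python (the function name, the rents, and the factor 1.5 are illustrative choices of mine, not taken from the post):

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# A typo-style outlier: one extra zero turns a small flat into a palace
rents = [450, 500, 520, 610, 700, 720, 800, 850, 950, 9500]
print(iqr_outliers(rents))  # [9500]
```

The rule is sensitive to the distribution shape, which is exactly why the post finds that transforming skewed data first (e.g. taking logs) makes it work better.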

House-Cleaning: Getting rid of outliers I
https://samplesofthoughts.com/2020/outlierhandlingone/
Mon, 17 Feb 2020 00:00:00 +0000
https://samplesofthoughts.com/2020/outlierhandlingone/
Working with real-world data presents many challenges that sanitized textbook data doesn’t have. One of them is how to handle outliers. Outliers are defined as points that differ significantly from other data points and they are especially common in data obtained through manual input processes. For example, on an online listing site, someone might accidentally press the zero key a bit too often and suddenly the small rental flat is as expensive as a palace.

Conference Time: Predictive Analytics World 2019
https://samplesofthoughts.com/2019/conferencetimepredictiveanalyticsworld2019/
Mon, 18 Nov 2019 00:00:00 +0000
https://samplesofthoughts.com/2019/conferencetimepredictiveanalyticsworld2019/
In the last two years, I pretty much only went to very technical conferences, such as PyData Berlin, PyCon or SatRday. They’re all great conferences, organized by awesome people, and I will definitely go again, but this fall I decided to try out a new conference and check out Predictive Analytics World in Berlin. First, because it’s always good to try out new things, and also because in the last months I’ve been wondering a lot how data teams can be made more useful and more aligned with the business challenges, which frankly isn’t talked about much in Python talks on how to deploy machine learning models.

Ordered Categories
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_12/chp12parttwo/
Sun, 28 Jul 2019 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_12/chp12parttwo/
Ordered Categorical Outcomes library(rethinking) data(Trolley) d <- Trolley The data contains answers from 331 individuals for different stories, about how morally permissible the action in the story is. The answer is an integer from 1 to 7. The outcome is thus categorical and ordered.
simplehist( d$response, xlim=c(1,7), xlab="response") Describing an ordered distribution with intercepts We want to redescribe this histogram on the log-cumulative-odds scale. We first compute the cumulative probabilities:
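The cumulative-probability step can be sketched in Python as well (the counts below are made up for illustration, standing in for the Trolley data’s response column):

```python
import math
from collections import Counter

# hypothetical response counts on a 1..7 ordered scale
responses = [1]*13 + [2]*9 + [3]*11 + [4]*19 + [5]*14 + [6]*13 + [7]*21
n = len(responses)
counts = Counter(responses)

# cumulative proportion of answers at or below each response level
cum_pr = []
running = 0.0
for k in range(1, 8):
    running += counts[k] / n
    cum_pr.append(running)

# log-cumulative-odds; the top category has cumulative probability 1
# (log-odds +infinity), so it is left implicit
log_cum_odds = [math.log(p / (1 - p)) for p in cum_pr[:-1]]
print([round(x, 2) for x in log_cum_odds])
```

The six resulting intercepts are what the ordered-logit model estimates.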

Reproducible (Data) Science with Docker and R
https://samplesofthoughts.com/2019/reproducibledatascience/
Mon, 17 Jun 2019 00:00:00 +0000
https://samplesofthoughts.com/2019/reproducibledatascience/
In my data team at work, we’ve been using Docker for a while now. At least, the engineers in our team have been using it; we data scientists have been very reluctant to pick it up so far. Why bother with a new tool (that seems complicated) when you don’t see the reason, right?
Until I was about to hold my Houseprice Talk again and wanted to make some small changes to my xaringan slides and nothing worked.

Analyzing the European Election: The Candidates
https://samplesofthoughts.com/2019/europeanelectiondataanalysis/
Tue, 21 May 2019 00:00:00 +0000
https://samplesofthoughts.com/2019/europeanelectiondataanalysis/
The European Election is coming up and for the first time, I have the impression this election is actually talked about and might have an impact. I don’t remember people caring that much about European Elections in the years before, but this, of course, could also just be because I got more interested in European politics. Unfortunately, European politics are complex and this is also mirrored in the number of parties that are up for vote in Germany.

Of Monsters and Mixtures
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_12/chp12partone/
Sun, 14 Apr 2019 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_12/chp12partone/
Overdispersed outcomes For the beta-binomial model, we’ll make use of the beta distribution. The beta distribution is a probability distribution over probabilities (over the interval \([0, 1]\)).
library(rethinking) pbar <- 0.5 theta <- 5 curve( dbeta2( x, pbar, theta), from=0, to=1, xlab="probability", ylab="Density") There are different ways to parametrize the beta distribution:
dbeta2 <- function( x , prob , theta , log=FALSE ) { a <- prob * theta b <- (1-prob) * theta dbeta( x , shape1=a , shape2=b , log=log ) } We use the beta-binomial for the UCBadmit data, which is overdispersed if we ignore department (since the admission rate varied quite a lot between departments).
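The dbeta2 reparametrization above describes the beta density by its mean prob and concentration theta, with shape parameters a = prob·theta and b = (1−prob)·theta. A stdlib-only Python sketch of that density (the function is my translation, not code from the post):

```python
import math

def dbeta2(x, prob, theta):
    """Beta density parametrized by mean `prob` and concentration `theta`:
    shape1 = prob * theta, shape2 = (1 - prob) * theta."""
    a = prob * theta
    b = (1 - prob) * theta
    # Beta(a, b) density: x^(a-1) * (1-x)^(b-1) / B(a, b)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_beta)

# With prob = 0.5 the shapes are equal (a = b = 2.5),
# so the density is symmetric around 0.5:
print(dbeta2(0.5, 0.5, 5))
print(dbeta2(0.2, 0.5, 5), dbeta2(0.8, 0.5, 5))  # equal by symmetry
```

Larger theta concentrates the distribution more tightly around prob, which is what lets the beta-binomial absorb over-dispersion.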

Chapter 10 - Exercises
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_10/chp10ex/
Sat, 17 Nov 2018 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_10/chp10ex/
Easy. 10E1. If an event has probability 0.35, what are the log-odds of this event?
log( 0.35 / (1 - 0.35)) [1] -0.6190392 10E2. If an event has log-odds 3.2, what is the probability of this event?
1 / (1 + exp(-3.2)) [1] 0.9608343 10E3. A coefficient in a logistic regression has value 1.7. What does this imply about the proportional change in odds of the outcome?
exp(1.7) [1] 5.
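The three conversions are computed in R; the same arithmetic as a Python sketch (the helper names are mine, not from the post):

```python
import math

def log_odds(p):
    """Probability -> log-odds (the logit)."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Log-odds -> probability (the inverse logit)."""
    return 1 / (1 + math.exp(-x))

# 10E1: probability 0.35 corresponds to log-odds of about -0.619
print(round(log_odds(0.35), 7))

# 10E2: log-odds 3.2 corresponds to a probability of about 0.961
print(round(inv_logit(3.2), 7))

# 10E3: a coefficient of 1.7 multiplies the odds by exp(1.7), about 5.47
print(round(math.exp(1.7), 4))
```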

Other Count Regressions
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_10/chp10partthree/
Sun, 11 Nov 2018 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_10/chp10partthree/
Multinomial Regression A multinomial regression is used when more than two things can happen. As an example, suppose we are modelling career choices for some young adults. Let’s assume there are three career choices one can take and expected income is one of the predictors. One option to model the career choices would be the explicit multinomial model which uses the multinomial logit. The multinomial logit uses the multinomial distribution which is an extension of the binomial distribution to the case with \(K>2\) events.
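The multinomial logit maps \(K\) linear scores to \(K\) probabilities via the softmax; a small Python sketch of that link (the career scores are hypothetical numbers for illustration):

```python
import math

def softmax(scores):
    """Multinomial-logit link: turn K linear scores into K probabilities."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# hypothetical linear scores for three career choices,
# e.g. each score proportional to that career's expected income
scores = [0.5 * 1, 0.5 * 2, 0.5 * 5]
probs = softmax(scores)
print([round(p, 3) for p in probs])  # sums to 1; highest income most likely
```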

Poisson Regression
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_10/chp10parttwo/
Sun, 28 Oct 2018 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_10/chp10parttwo/
Poisson Regression Oceanic Tools A binomial distribution with many trials (that is, \(n\) large) and a small probability of an event (\(p\) small) approaches a Poisson distribution where both the mean and the variance are equal:
y <- rbinom(1e5, 1000, 1/1000) c(mean(y), var(y)) [1] 0.996960 1.000841 A Poisson model allows us to model binomial events for which the number of trials \(n\) is unknown.
We work with the Kline data, a dataset about Oceanic societies and the number of found tools.
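The rbinom check above can be reproduced in Python with the standard library (sample sizes reduced here, so the numbers will differ slightly from the R output):

```python
import random
from statistics import mean, variance

random.seed(1)

# Binomial(n=1000, p=1/1000): many trials, tiny success probability.
# Such a distribution is approximately Poisson with lambda = n*p = 1,
# so its mean and variance should both be close to 1.
n, p, draws = 1000, 1 / 1000, 5000
y = [sum(random.random() < p for _ in range(n)) for _ in range(draws)]

print(mean(y), variance(y))  # both close to 1
```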

Binomial Regression
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_10/chp10partone/
Thu, 04 Oct 2018 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_10/chp10partone/
Logistic Regression The chimpanzee data: Do chimpanzees pick the more social option?
library(rethinking) data(chimpanzees) d <- chimpanzees The important variables are condition, indicating whether another chimpanzee sits at the opposite end of the table (1) or not (0), and prosocial_left, which indicates whether the left lever is the more social option. These two variables will be used to predict whether the chimpanzees pull the left lever or not (pulled_left).

Scraping the web or how to find a flat
https://samplesofthoughts.com/2018/scrapingtheweborhowtofindaflat/
Wed, 03 Oct 2018 00:00:00 +0000
https://samplesofthoughts.com/2018/scrapingtheweborhowtofindaflat/
Berlin is a great city that used to have a reputation for affordable rents. While other cities are for sure still much more expensive, the rents in Berlin have risen considerably. Or so say all of my friends and colleagues, and so it feels when looking at rental listings. I decided to have a look at the data myself to find out if there’s still a secret neighborhood in Berlin resisting the rising prices and, who knows, maybe the data can even tell us if the neighborhood Wedding is indeed “coming”.

Chapter 8 - Exercises
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_8/chp8ex/
Tue, 11 Sep 2018 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_8/chp8ex/
Chapter 8 - Exercises Easy. 8E1. Which of the following is a requirement of the simple Metropolis algorithm?
The proposal distribution must be symmetric. 8E2. Gibbs sampling is more efficient than the Metropolis algorithm. How does it achieve this extra efficiency? Are there any limitations?
Gibbs sampling uses conjugate priors, which allow it to make smarter proposals and thus make it more efficient. The downside is that conjugate priors might not be good or valid priors from a scientific perspective.

Markov Chain Monte Carlo
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_8/chp8partone/
Tue, 04 Sep 2018 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_8/chp8partone/
8.1 King Markov and His island kingdom A simple example of the Markov Chain Monte Carlo algorithm:
num_weeks <- 1e5 positions <- rep(0, num_weeks) current <- 10 for (i in 1:num_weeks) { # record current position positions[i] <- current # flip coin to generate proposal proposal <- current + sample( c(-1, 1), size=1) if ( proposal < 1 ) proposal <- 10 if ( proposal > 10 ) proposal <- 1 # move?
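The R loop is cut off mid-snippet at the “# move?” comment; as a sketch, here is the whole algorithm in Python, where the missing step accepts the proposal with probability equal to the ratio of island populations:

```python
import random

random.seed(8)

# King Markov visits 10 islands; island k has population proportional to k.
num_weeks = 100_000
positions = []
current = 10
for _ in range(num_weeks):
    positions.append(current)                     # record current position
    proposal = current + random.choice([-1, 1])   # flip coin for a neighbour
    if proposal < 1:                              # the archipelago wraps around
        proposal = 10
    if proposal > 10:
        proposal = 1
    # move with probability min(1, population ratio)
    if random.random() < proposal / current:
        current = proposal

# long-run time spent on each island is proportional to its population
counts = [positions.count(k) for k in range(1, 11)]
print(counts)
```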

Chapter 7 - Exercises
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_7/chp7ex/
Fri, 17 Aug 2018 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_7/chp7ex/
Chapter 7 - Exercises Easy. 7E1. For the causal relationships below, name a hypothetical third variable that would lead to an interaction effect.
Bread dough rises because of yeast. Sugar, since the yeast needs some food to grow; temperature, since if it’s too hot the yeast dies, and a too cold temperature would slow down the rising of the dough; salt inhibits yeast growth. Education leads to higher income. Class and race could potentially strengthen or weaken the impact of education on income; same for gender. Gasoline makes a car go.

Interaction
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_7/chp7partone/
Tue, 14 Aug 2018 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_7/chp7partone/
7.1 Building an interaction library(rethinking) data(rugged) d <- rugged How does terrain ruggedness influence the GDP?
# make log version of outcome d$log_gdp <- log(d$rgdppc_2000) dd <- d[ complete.cases(d$rgdppc_2000), ] # split into Africa and not-Africa d.A1 <- dd[ dd$cont_africa == 1, ] d.A0 <- dd[ dd$cont_africa == 0, ] Make two models: one for Africa, one for non-Africa:
# Africa m7.1 <- map( alist( log_gdp ~ dnorm( mu, sigma) , mu <- a + bR*rugged , a ~ dnorm(8, 100), bR ~ dnorm( 0, 1 ), sigma ~ dunif( 0, 10 ) ), data=d.

Chapter 6 - Exercises
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_6/chp6ex/
Sun, 08 Jul 2018 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_6/chp6ex/
Chapter 6 - Exercises These are my solutions to the exercises from chapter 6.
Easy. 6E1. State the three motivating criteria that define information entropy.
Information entropy (a measure of uncertainty) should be
continuous. A small change in probability should also lead to only a small change in uncertainty. We don’t want to allow for sudden jumps. increasing as the number of possible events increases. That means, if only one event has a very high chance of happening and all others have only a very small chance, then there is little uncertainty in what comes next and thus less information.

Using Information Criteria
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_6/chp6partthree/
Wed, 04 Jul 2018 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_6/chp6partthree/
Using information criteria Model comparison library(rethinking) data(milk) d <- milk[ complete.cases(milk), ] # remove NA values d$neocortex <- d$neocortex.perc / 100 dim(d) head(d) We will predict kcal.per.g using the predictors neocortex and the logarithm of mass. For this, we use four different models (all with flat priors):
a.start <- mean(d$kcal.per.g) sigma.start <- log( sd( d$kcal.per.g )) m6.11 <- map( alist( kcal.per.g ~ dnorm( a, exp(log.sigma) ) ), data=d, start=list(a=a.start, log.

Information Theory and Model Performance
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_6/chp6parttwo/
Mon, 02 Jul 2018 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_6/chp6parttwo/
Entropy p <- c( 0.3, 0.7) -sum( p*log(p) ) compare this with:
p <- c(0.01, 0.99) -sum( p*log(p) ) # contains much less information Kullback-Leibler Divergence p <- c(0.3, 0.7) q1 <- seq(from=0.01, to=0.99, length.out = 100) q <- data.frame(q1 = q1, q2 = 1 - q1) kl_divergence <- function(p, q) { sum( p* log( p/ q) ) } kl <- apply(q, 1, function(x){kl_divergence(p=p, q=x)} ) plot( kl ~ q1, type="l", col="steelblue", lwd=2) abline(v = p[1], lty=2) text(0.
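The two quantities used above, entropy and KL divergence, as a Python sketch with the same example distributions:

```python
import math

def entropy(p):
    """Information entropy: H(p) = -sum(p_i * log(p_i))."""
    return -sum(pi * math.log(pi) for pi in p)

def kl_divergence(p, q):
    """Kullback-Leibler divergence: D(p||q) = sum(p_i * log(p_i / q_i))."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

p = [0.3, 0.7]
print(entropy(p))             # ~0.611
print(entropy([0.01, 0.99]))  # ~0.056, much less information

# divergence is zero when q equals p and grows as q moves away from p
print(kl_divergence(p, p))
print(kl_divergence(p, [0.5, 0.5]))
```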

Chapter 5 - Exercises
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_5/chp5ex/
Sun, 03 Jun 2018 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_5/chp5ex/
Chapter 5 - Exercises These are my solutions to the exercises from chapter 5.
Easy. 5E1. The following linear models are multiple linear regressions:
\(\mu_i = \beta_x x_i + \beta_z z_i\) \(\mu_i = \alpha + \beta_x x_i + \beta_z z_i\) whereas the following are bivariate linear regressions:
\(\mu_i = \alpha + \beta x_i\) \(\mu_i = \alpha + \beta(x_i - z_i)\) 5E2. Write down a multiple regression to evaluate the claim: Animal diversity is linearly related to latitude, but only after controlling for plant diversity.
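One way to write down such a model (a sketch; the symbols \(A\) for animal diversity, \(L\) for latitude, and \(P\) for plant diversity are my notation, not from the post):

\[ \begin{align*} A_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta_L L_i + \beta_P P_i \end{align*} \]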

How to make a website using blogdown and github
https://samplesofthoughts.com/2018/howtomakeawebsiteusingblogdownandgithub/
Sun, 13 May 2018 00:00:00 +0000
https://samplesofthoughts.com/2018/howtomakeawebsiteusingblogdownandgithub/
In this post, I will describe how to build your own webpage (more specific, a blog) using blogdown and have it hosted on your github.
Set up your github repo so it can serve as a web page Build your website using blogdown Set up Github Let’s start with setting up your github. This is actually super simple: you only need to create a new repository with the name <yourusername>.

Welcome to my blog!
https://samplesofthoughts.com/2018/myfirstpost/
Fri, 11 May 2018 00:00:00 +0000
https://samplesofthoughts.com/2018/myfirstpost/
Hello World! I’m Corrie and this is my blog where I plan to occasionally write about interesting topics. Interesting topics is of course entirely subjective and for me, machine learning, statistics (in particular of the Bayesian flavor), and data science sound very exciting, so most of my posts will touch on one of these topics. As a mathematician by training, I spent quite some time during my studies with differential geometry, topology, and number theory (knots and prime numbers are cool!

https://samplesofthoughts.com/projects/statisticalrethinking/chapter_10/chapter10_ex/
Mon, 01 Jan 0001 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_10/chapter10_ex/
Chapter 10 - Exercises Corrie November 17, 2018
Easy. 10E1. If an event has probability 0.35, what are the log-odds of this event?
log( 0.35 / (1 - 0.35)) [1] -0.6190392 10E2. If an event has log-odds 3.2, what is the probability of this event?
1 / (1 + exp(-3.2)) [1] 0.9608343 10E3. A coefficient in a logistic regression has value 1.7. What does this imply about the proportional change in odds of the outcome?

https://samplesofthoughts.com/projects/statisticalrethinking/chapter_10/chapter10a/
Mon, 01 Jan 0001 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_10/chapter10a/
Binomial Regression Corrie October 4, 2018
Logistic Regression The chimpanzee data: Do chimpanzees pick the more social option?
library(rethinking) data(chimpanzees) d <- chimpanzees The important variables are condition, indicating whether another chimpanzee sits at the opposite end of the table (1) or not (0), and prosocial_left, which indicates whether the left lever is the more social option. These two variables will be used to predict whether the chimpanzees pull the left lever or not (pulled_left).

https://samplesofthoughts.com/projects/statisticalrethinking/chapter_10/chapter10b/
Mon, 01 Jan 0001 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_10/chapter10b/
Poisson Regression Corrie October 28, 2018
Poisson Regression Oceanic Tools A binomial distribution with many trials (that is, \(n\) large) and a small probability of an event (\(p\) small) approaches a Poisson distribution where both the mean and the variance are equal:
y <- rbinom(1e5, 1000, 1/1000) c(mean(y), var(y)) [1] 0.996090 1.000805 A Poisson model allows us to model binomial events for which the number of trials \(n\) is unknown.

https://samplesofthoughts.com/projects/statisticalrethinking/chapter_10/chapter10c/
Mon, 01 Jan 0001 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_10/chapter10c/
Other count regressions Corrie November 11, 2018
Multinomial Regression A multinomial regression is used when more than two things can happen. As an example, suppose we are modelling career choices for some young adults. Let’s assume there are three career choices one can take and expected income is one of the predictors. One option to model the career choices would be the explicit multinomial model which uses the multinomial logit.

https://samplesofthoughts.com/projects/statisticalrethinking/chapter_2/chapter2_ex/
Mon, 01 Jan 0001 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_2/chapter2_ex/
Chapter 2 - Exercises Corrie 2020-04-20
These are my solutions to the practice questions of chapter 2, Small Worlds and Large Worlds, of the book “Statistical Rethinking” (version 2) by Richard McElreath.
Easy. 2E1. Which of the expressions below correspond to the statement: the probability of rain on Monday?
1) Pr(rain) 2) Pr(rain | Monday) 3) Pr(Monday | rain) 4) Pr(rain, Monday) / Pr(Monday)
Statement (4) is equivalent to (2) by Bayes’ theorem using joint probability.

https://samplesofthoughts.com/projects/statisticalrethinking/chapter_3/chapter3_ex/
Mon, 01 Jan 0001 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_3/chapter3_ex/
Chapter 3 - Exercises Corrie 2020-04-09
These are my solutions to the practice questions of chapter 3, Sampling the Imaginary, of the book “Statistical Rethinking” (version 2) by Richard McElreath.
Easy. The Easy problems use the samples from the globe tossing example:
p_grid <- seq( from=0, to=1, length.out=1000 ) prior <- rep( 1, 1000 ) likelihood <- dbinom( 6, size=9, prob=p_grid) posterior <- likelihood * prior posterior <- posterior / sum(posterior) set.

https://samplesofthoughts.com/projects/statisticalrethinking/chapter_4/chapter4_ex/
Mon, 01 Jan 0001 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_4/chapter4_ex/
Chapter 4 - Exercises Corrie May 21, 2018
Chapter 4 - Exercises These are my solutions to the practice questions of chapter 4, Linear Models, of the book “Statistical Rethinking” by Richard McElreath.
Easy Questions. 4E1. In the model definition below, which line is the likelihood:
\[ \begin{align*} y_i &\sim \text{Normal}(\mu, \sigma) & & \text{This is the likelihood}\\ \mu &\sim \text{Normal}(0, 10) \\ \sigma &\sim \text{Normal}(0, 10) \end{align*} \]
4E2. In the model definition just above, how many parameters are in the posterior distribution?

https://samplesofthoughts.com/projects/statisticalrethinking/chapter_5/chapter5_ex/
Mon, 01 Jan 0001 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_5/chapter5_ex/
Chapter 5 - Exercises Corrie June 3, 2018
Chapter 5 - Exercises These are my solutions to the exercises from chapter 5.
Easy. 5E1. The following linear models are multiple linear regressions:
\(\mu_i = \beta_x x_i + \beta_z z_i\) \(\mu_i = \alpha + \beta_x x_i + \beta_z z_i\) whereas the following are bivariate linear regressions:
\(\mu_i = \alpha + \beta x_i\) \(\mu_i = \alpha + \beta(x_i - z_i)\) 5E2. Write down a multiple regression to evaluate the claim: Animal diversity is linearly related to latitude, but only after controlling for plant diversity.

https://samplesofthoughts.com/projects/statisticalrethinking/chapter_6/chapter6_ex/
Mon, 01 Jan 0001 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_6/chapter6_ex/
Chapter 6 - Exercises Corrie July 8, 2018
Chapter 6 - Exercises These are my solutions to the exercises from chapter 6.
Easy. 6E1. State the three motivating criteria that define information entropy.
Information entropy (a measure of uncertainty) should be
continuous. A small change in probability should also lead to only a small change in uncertainty. We don’t want to allow for sudden jumps. increasing as the number of possible events increases.

https://samplesofthoughts.com/projects/statisticalrethinking/chapter_6/chapter6b/
Mon, 01 Jan 0001 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_6/chapter6b/
Information Theory and Model Performance Corrie July 2, 2018
Entropy p <- c( 0.3, 0.7) -sum( p*log(p) ) ## [1] 0.6108643 compare this with:
p <- c(0.01, 0.99) -sum( p*log(p) ) # contains much less information ## [1] 0.05600153 Kullback-Leibler Divergence p <- c(0.3, 0.7) q1 <- seq(from=0.01, to=0.99, length.out = 100) q <- data.frame(q1 = q1, q2 = 1 - q1) kl_divergence <- function(p, q) { sum( p* log( p/ q) ) } kl <- apply(q, 1, function(x){kl_divergence(p=p, q=x)} ) plot( kl ~ q1, type="l", col="steelblue", lwd=2) abline(v = p[1], lty=2) text(0.

https://samplesofthoughts.com/projects/statisticalrethinking/chapter_6/chapter6c/
Mon, 01 Jan 0001 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_6/chapter6c/
Using Information Criteria Corrie July 4, 2018
Using information criteria Model comparison library(rethinking) data(milk) d <- milk[ complete.cases(milk), ] # remove NA values d$neocortex <- d$neocortex.perc / 100 dim(d) ## [1] 17 9 head(d) ## clade species kcal.per.g perc.fat perc.protein ## 1 Strepsirrhine Eulemur fulvus 0.49 16.60 15.42 ## 6 New World Monkey Alouatta seniculus 0.47 21.22 23.58 ## 7 New World Monkey A palliata 0.56 29.

https://samplesofthoughts.com/projects/statisticalrethinking/chapter_7/chapter7/
Mon, 01 Jan 0001 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_7/chapter7/
Interactions Corrie August 14, 2018
7.1 Building an interaction library(rethinking) data(rugged) d <- rugged How does terrain ruggedness influence the GDP?
# make log version of outcome d$log_gdp <- log(d$rgdppc_2000) dd <- d[ complete.cases(d$rgdppc_2000), ] # split into Africa and not-Africa d.A1 <- dd[ dd$cont_africa == 1, ] d.A0 <- dd[ dd$cont_africa == 0, ] Make two models: one for Africa, one for non-Africa:
# Africa m7.1 <- map( alist( log_gdp ~ dnorm( mu, sigma) , mu <- a + bR*rugged , a ~ dnorm(8, 100), bR ~ dnorm( 0, 1 ), sigma ~ dunif( 0, 10 ) ), data=d.

https://samplesofthoughts.com/projects/statisticalrethinking/chapter_7/chapter7_ex/
Mon, 01 Jan 0001 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_7/chapter7_ex/
Chapter 7 - Exercises Corrie August 17, 2018
Chapter 7 - Exercises Easy. 7E1. For the causal relationships below, name a hypothetical third variable that would lead to an interaction effect.
Bread dough rises because of yeast. Sugar, since the yeast needs some food to grow; temperature, since if it’s too hot the yeast dies, and a too cold temperature would slow down the rising of the dough; salt inhibits yeast growth. Education leads to higher income.

https://samplesofthoughts.com/projects/statisticalrethinking/chapter_8/chapter8/
Mon, 01 Jan 0001 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_8/chapter8/
Markov Chain Monte Carlo Corrie September 4, 2018
8.1 King Markov and His island kingdom A simple example of the Markov Chain Monte Carlo algorithm:
num_weeks <- 1e5 positions <- rep(0, num_weeks) current <- 10 for (i in 1:num_weeks) { # record current position positions[i] <- current # flip coin to generate proposal proposal <- current + sample( c(-1, 1), size=1) if ( proposal < 1 ) proposal <- 10 if ( proposal > 10 ) proposal <- 1 # move?

https://samplesofthoughts.com/projects/statisticalrethinking/chapter_8/chapter8_ex/
Mon, 01 Jan 0001 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/chapter_8/chapter8_ex/
Chapter 8 - Exercises Corrie September 11, 2018
Chapter 8 - Exercises Easy. 8E1. Which of the following is a requirement of the simple Metropolis algorithm?
The proposal distribution must be symmetric. 8E2. Gibbs sampling is more efficient than the Metropolis algorithm. How does it achieve this extra efficiency? Are there any limitations?
Gibbs sampling uses conjugate priors, which allow it to make smarter proposals and thus make it more efficient.

https://samplesofthoughts.com/projects/statisticalrethinking/readme/
Mon, 01 Jan 0001 00:00:00 +0000
https://samplesofthoughts.com/projects/statisticalrethinking/readme/
Statistical Rethinking These are code snippets, plots and my solutions to some of the exercises of the book “Statistical Rethinking” by Richard McElreath.
I am currently updating the code snippets and exercise solutions to follow the second version of the book. Solutions to old exercises can still be found on an old branch.
Chapter 2 Exercises Chapter 3 Exercises Chapter 4 Why everything so normal First Linear Predictions Curvy Regression Exercises Chapter 5 Spurious Associations Masked Relationships When adding variables hurts Categorical Variables Ordinary Least Squares Exercises Chapter 6 Overfitting Information Theory and Model Performance Using Information Criteria Exercises Chapter 7 Interactions Exercises Chapter 8 MCMC Exercises Chapter 10 Binomial Regression Poisson Regression Other Count Regressions Exercises