Samples of Thoughts

Samples of Thoughts https://samples-of-thoughts.com/ Recent content on Samples of Thoughts Hugo -- gohugo.io en-us © Corrie Bartelheimer {year} Sat, 24 Sep 2022 00:00:00 +0000 Memory Efficiency in Pandas https://samples-of-thoughts.com/2022/memory-efficiency-in-pandas/ Sat, 24 Sep 2022 00:00:00 +0000 https://samples-of-thoughts.com/2022/memory-efficiency-in-pandas/ If you’re working with big data in pandas you can run into memory problems very quickly. When working locally, your machine might slow down or you even get this lovely message that asks you to please kill some applications. If working in the cloud, one can of course always ramp up memory but trust me, having to restart a couple of thousand killed jobs because of Out-of-Memory errors is not fun and also pricey! Piping in Pandas: Group By and Mutate https://samples-of-thoughts.com/2022/piping-in-pandas-group-by-and-mutate/ Fri, 16 Sep 2022 00:00:00 +0000 https://samples-of-thoughts.com/2022/piping-in-pandas-group-by-and-mutate/ I am a big fan of the tidyverse in R but most of the time, I actually use Python. If the rest of your team uses Python, your production code is in Python, it simply doesn’t make much sense to use R. Anyway, I started to like working with pandas much better once I figured out how to pipe with pandas and how to “translate” from tidyverse to pandas. Then this code in R Categorical Variables https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_5/chp5-part-three/ Thu, 17 Sep 2020 00:00:00 +0000 https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_5/chp5-part-three/ The Problem with Dummies The Index Variable Approach More Categories Even More Categories Notes of Caution These are code snippets and notes for the fifth chapter, The Many Variables & The Spurious Waffles, section 2, of the book Statistical Rethinking (version 2) by Richard McElreath. In this section, we go through different ways how to add categorical variables to our models. 
The Problem with Dummies In the simplest case we only have two categories, e. Masked Relationship https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_5/chp5-part-two/ Wed, 16 Sep 2020 00:00:00 +0000 https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_5/chp5-part-two/ Hidden Influence in Milk The Causal Reasoning behind it Markov Equivalence Simulating a Masking Ball These are code snippets and notes for the fifth chapter, The Many Variables & The Spurious Waffles, section 2, of the book Statistical Rethinking (version 2) by Richard McElreath. Hidden Influence in Milk In the previous section about spurious associations we used multiple regression to eliminate variables that seemed to have an influence when comparing bivariate relationships but whose association vanishes when introducing more variables to the regression. Spurious Association https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_5/chp5-part-one/ Wed, 09 Sep 2020 00:00:00 +0000 https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_5/chp5-part-one/ Spurious Waffles (and Marriages) Yo, DAG Does this DAG fit? More than one predictor: Multiple regression In Matrix Notation Simulating some Divorces How do we plot these? Predictor residual plots Posterior prediction plots Counterfactual plot Simulating spurious associations Simulating counterfactuals These are code snippets and notes for the fifth chapter, The Many Variables & The Spurious Waffles, section 1, of the book Statistical Rethinking (version 2) by Richard McElreath. Animated Facebook Messages https://samples-of-thoughts.com/2020/animated-facebook-messages/ Fri, 04 Sep 2020 00:00:00 +0000 https://samples-of-thoughts.com/2020/animated-facebook-messages/ I recently downloaded my own Facebook data and wanted to find out what kind of data gems I could find. 
There are some clear advantages when analyzing your own data: foremost, you’re the expert and know the “ground truth” behind the data. That said, there can still be big surprises! In my case, the most interesting parts of the analysis could be boiled down to two graphics. Since there’s also a time factor in the data, I thought this was a good opportunity to learn about animated plots and indeed, it works quite beautifully with the two plots. AI Guild Podcast: Data Science Interviews https://samples-of-thoughts.com/2020/ai-guild-podcast-data-science-interviews/ Mon, 18 May 2020 00:00:00 +0000 https://samples-of-thoughts.com/2020/ai-guild-podcast-data-science-interviews/ On turning the tables during job interviews as your experience grows Some time ago, I sat down with Leyla from the AI Guild to talk about my own Data Science path but also my experience in job interviews for data positions. Recently, as I’ve been interviewing for positions, I wondered how to find out more about some sensitive topics. How does the company approach diversity? Is diversity important for them and do they take active measures to increase e. Curvy Regression https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_4/chp4-part-three/ Tue, 05 May 2020 00:00:00 +0000 https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_4/chp4-part-three/ Polynomial Regression Splines These are code snippets and notes for the fourth chapter, Geocentric Models, section 5, of the book Statistical Rethinking (version 2) by Richard McElreath. Polynomial Regression Standard linear models using a straight line to fit data are nice for their simplicity but a straight line is also very restrictive. Most data does not come in a straight line. We can use polynomial regression to extend the linear model. 
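The idea in the Curvy Regression excerpt above, extending a linear model with polynomial terms, can be sketched in a few lines of R. This is only a minimal illustration and not the code from the post: it assumes simulated data and uses ordinary least squares via base R's lm() instead of the Bayesian models fitted in the book, and all variable names are made up.

```r
# Minimal sketch: compare a straight-line fit with a quadratic fit.
# Data are simulated for illustration; nothing here comes from the book's data sets.
set.seed(42)
x <- seq(-2, 2, length.out = 50)
y <- 1 + 0.5 * x + 2 * x^2 + rnorm(50, sd = 0.3)  # truly curved relationship

fit_linear    <- lm(y ~ x)            # straight line only
fit_quadratic <- lm(y ~ x + I(x^2))   # adds a squared term to the linear model

# Residual sum of squares for both models: the quadratic fit should be far smaller
rss <- c(linear    = sum(resid(fit_linear)^2),
         quadratic = sum(resid(fit_quadratic)^2))
print(rss)
print(coef(fit_quadratic))
```

The I(x^2) wrapper is needed because ^ has a special meaning inside R model formulas. The model is still linear in its parameters; only the predictor is transformed.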
Visa Costs meet Data Viz https://samples-of-thoughts.com/2020/visa-costs-data-viz/ Tue, 28 Apr 2020 00:00:00 +0000 https://samples-of-thoughts.com/2020/visa-costs-data-viz/ I recently stumbled across this data set about visa costs. It is a collection of visa costs for all countries for different kinds of visas (tourist, business, work, transit, and some other visas). Each row corresponds to visa relations between a source country (the country applying for the visa) and a target country (the country issuing the visa) together with the cost for the different visa types. Since I had a bit of free time on my hands, I decided to do some “plotcrastinating”, play around with the data and try out some new visualizations. First Linear Predictions https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_4/chp4-part-two/ Wed, 22 Apr 2020 00:00:00 +0000 https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_4/chp4-part-two/ Prior Predictive Checks Running the Model in R Visualize our Model All the Uncertainty These are code snippets and notes for the fourth chapter, Geocentric Models, section 4, of the book Statistical Rethinking (version 2) by Richard McElreath. In this section, we work with our first prediction model where we use the weight to predict the height of a person. We again use the !Kung data and restrict to adults above 18. Why everything so normal https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_4/chp4-part-one/ Tue, 21 Apr 2020 00:00:00 +0000 https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_4/chp4-part-one/ Why normal distributions are normal A Gaussian Model of Height Quadratic Approximation and Prior Predictive Checks These are code snippets and notes for the fourth chapter, Geocentric Models, sections 1 to 3, of the book Statistical Rethinking (version 2) by Richard McElreath. 
Why normal distributions are normal The chapter discusses linear models and starts with a recap on the normal distributions. Why is it such a commonly used distribution and how does it arise? Connecting Disinformation with tidygraph https://samples-of-thoughts.com/2020/connecting-disinformation-with-tidygraph/ Wed, 25 Mar 2020 00:00:00 +0000 https://samples-of-thoughts.com/2020/connecting-disinformation-with-tidygraph/ I recently participated in a hackathon organised by EU’s anti-disinformation task force where they gave us access to their data base. The data base consists of all disinformation cases the group has collected since it started in 2015. Their data can also be browsed online on their web page www.euvsdisinfo.eu. The data contains more than 7000 cases of disinformation, mostly news articles and videos, that were collected and debunked by the EUvsDisinfo project. House-Cleaning: Getting rid of outliers II https://samples-of-thoughts.com/2020/outlier-handling-two/ Mon, 24 Feb 2020 00:00:00 +0000 https://samples-of-thoughts.com/2020/outlier-handling-two/ In the previous post, we tried to clean rental offerings of outliers. We first just had a look at our data and tried to clean by simply using threshold derived from our own knowledge about flats with minor success. We got slightly better results by using the IQR rule and learned two things: First, the IQR rule works better if our data is normally distributed and, if it’s not, transforming it can work wonders. House-Cleaning: Getting rid of outliers I https://samples-of-thoughts.com/2020/outlier-handling-one/ Mon, 17 Feb 2020 00:00:00 +0000 https://samples-of-thoughts.com/2020/outlier-handling-one/ Working with real-world data presents many challenges that sanitized text book data doesn’t have. One of them is how to handle outlier. Outliers are defined as points that differ significantly from other data points and they are especially common in data obtained through manual input processes. 
For example, on an online listing site, someone might accidentally press the zero-key a bit too often and suddenly the small rental flat is as expensive as a palace. Conference Time: Predictive Analytics World 2019 https://samples-of-thoughts.com/2019/conference-time-predictive-analytics-world-2019/ Mon, 18 Nov 2019 00:00:00 +0000 https://samples-of-thoughts.com/2019/conference-time-predictive-analytics-world-2019/ In the last two years, I pretty much only went to very technical conferences, such as the PyData Berlin, the PyCon or the SatRday. They’re all great conferences, organized by awesome people and I will definitely go again but this fall I decided to try out a new conference and check out the Predictive Analytics World in Berlin. First, because it’s always good to try out new things and also because in the last months I was wondering a lot how data teams can be made more useful, somehow more aligned with the business challenges, which frankly isn’t talked about much in Python talks about how to deploy machine learning models. Ordered Categories https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_12/chp12-part-two/ Sun, 28 Jul 2019 00:00:00 +0000 https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_12/chp12-part-two/ Ordered Categorical Outcomes library(rethinking) data(Trolley) d <- Trolley The data contains answers of 331 individuals for different stories, about how morally permissible the action in the story is. The answer is an integer from 1 to 7. The outcome is thus categorical and ordered. simplehist( d$response, xlim=c(1,7), xlab="response") Describing an ordered distribution with intercepts We want to redescribe this histogram on the log-cumulative-odds scale. 
We first compute the cumulative probabilities: Reproducible (Data) Science with Docker and R https://samples-of-thoughts.com/2019/reproducible-data-science/ Mon, 17 Jun 2019 00:00:00 +0000 https://samples-of-thoughts.com/2019/reproducible-data-science/ In my data team at work, we’ve been using Docker for a while now. At least, the engineers in our team have been using it, we data scientists have been very reluctant to pick it up so far. Why bother with a new tool (that seems complicated) when you don’t see the reason, right? Until I was about to hold my Houseprice Talk again and wanted to make some small changes to my xaringan slides and nothing worked. Analyzing the European Election: The Candidates https://samples-of-thoughts.com/2019/european-election-data-analysis/ Tue, 21 May 2019 00:00:00 +0000 https://samples-of-thoughts.com/2019/european-election-data-analysis/ The European Election is coming up and for the first time, I have the impression this election is actually talked about and might have an impact. I don’t remember people caring that much about European Elections in the years before, but this, of course, could also just be because I got more interested in European politics. Unfortunately, European politics are complex and this is also mirrored in the quantity of parties that are up for vote in Germany. Of Monsters and Mixtures https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_12/chp12-part-one/ Sun, 14 Apr 2019 00:00:00 +0000 https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_12/chp12-part-one/ Over-dispersed outcomes For the beta-binomial model, we’ll make use of the beta distribution. The beta distribution is a probability distribution over probabilities (over the interval $[0, 1]$). 
library(rethinking) pbar <- 0.5 theta <- 5 curve( dbeta2( x, pbar, theta), from=0, to=1, xlab="probability", ylab="Density") There are different ways to parametrize the beta distribution: dbeta2 <- function( x , prob , theta , log=FALSE ) { a <- prob * theta b <- (1-prob) * theta dbeta( x , shape1=a , shape2=b , log=log ) } We use the beta-binomial for the UCBadmit data, which is over-dispersed if we ignore department (since the admission rate varied quite a lot for different departments). Scraping the web or how to find a flat https://samples-of-thoughts.com/2018/scraping-the-web-or-how-to-find-a-flat/ Wed, 03 Oct 2018 00:00:00 +0000 https://samples-of-thoughts.com/2018/scraping-the-web-or-how-to-find-a-flat/ Berlin is a great city that used to have the reputation of affordable rents. While for sure other cities are still much more expensive, the rents in Berlin have risen considerably. Or so says everyone of my friends and my colleagues and so does it feel looking at renting listings. I decided to have a look myself at the data to find out if there’s still a secret neighborhood in Berlin resisting the raising prices and who knows, maybe the data can even tell us if the neighborhood Wedding is indeed “coming”. How to make a website using blogdown and github https://samples-of-thoughts.com/2018/how-to-make-a-website-using-blogdown-and-github/ Sun, 13 May 2018 00:00:00 +0000 https://samples-of-thoughts.com/2018/how-to-make-a-website-using-blogdown-and-github/ In this post, I will describe how to build your own webpage (more specific, a blog) using blogdown and have it hosted on your github. Set up your github repo so it can serve as a web page Build your website using blogdown Set up Github Let’s start with setting up your github. This is actually super simple, you only need to create a new repository with the name <yourusername>. Welcome to my blog! 
https://samples-of-thoughts.com/2018/my-first-post/ Fri, 11 May 2018 00:00:00 +0000 https://samples-of-thoughts.com/2018/my-first-post/ Hello World! I’m Corrie and this is my blog where I plan to occasionally write about interesting topics. What counts as interesting is of course entirely subjective; for me, machine learning, statistics (in particular the Bayesian flavor), and data science sound very exciting, so most of my posts will touch on at least one of these topics. As a mathematician by training, I spent quite some time during my studies with differential geometry, topology, and number theory (knots and prime numbers are cool! https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_10/chapter10_ex/ Mon, 01 Jan 0001 00:00:00 +0000 https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_10/chapter10_ex/ Chapter 10 - Exercise Corrie November 17, 2018 Easy. 10E1. If an event has probability 0.35, what are the log-odds of this event? log( 0.35 / (1 - 0.35)) [1] -0.6190392 10E2. If an event has log-odds 3.2, what is the probability of this event? 1 / (1 + exp(-3.2)) [1] 0.9608343 10E3. A coefficient in a logistic regression has value 1.7. What does this imply about the proportional change in odds of the outcome? https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_10/chapter10a/ Mon, 01 Jan 0001 00:00:00 +0000 https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_10/chapter10a/ Binomial Regression Corrie October 4, 2018 Logistic Regression The chimpanzee data: Do chimpanzees pick the more social option? library(rethinking) data(chimpanzees) d <- chimpanzees The important variables are the variable condition, indicating if another chimpanzee sits on the opposite side of the table (1) or not (0), and the variable prosocial_left, which indicates if the left lever is the more social option. These two variables will be used to predict if the chimpanzees pull the left lever or not (pulled_left). 
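The conversions between probability, odds and log-odds used in exercises 10E1 to 10E3 above can be collected into a small sketch. The helper names to_log_odds and to_probability are hypothetical and chosen only for this illustration; base R's qlogis() and plogis() compute the same two transformations.

```r
# Minimal sketch of the log-odds conversions; helper names are made up for illustration.
to_log_odds    <- function(p) log(p / (1 - p))      # probability -> log-odds (logit)
to_probability <- function(x) 1 / (1 + exp(-x))     # log-odds -> probability (inverse logit)

print(to_log_odds(0.35))     # 10E1: about -0.619
print(to_probability(3.2))   # 10E2: about 0.961

# The two functions are inverses of each other
p <- 0.35
print(all.equal(to_probability(to_log_odds(p)), p))

# 10E3: in a logistic regression, a one-unit increase in a predictor with
# coefficient 1.7 multiplies the odds of the outcome by exp(1.7), about 5.47
odds_ratio <- exp(1.7)
print(odds_ratio)
```

Because the model is linear on the log-odds scale, a coefficient acts multiplicatively on the odds. The resulting change in probability depends on the starting point.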
https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_10/chapter10b/ Mon, 01 Jan 0001 00:00:00 +0000 https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_10/chapter10b/ Poisson Regression Corrie October 28, 2018 Poisson Regression Oceanic Tools A binomial distribution with many trials (that is, $n$ large) and a small probability of an event ($p$ small) approaches a Poisson distribution where both the mean and the variance are equal: y <- rbinom(1e5, 1000, 1/1000) c(mean(y), var(y)) [1] 0.996090 1.000805 A Poisson model allows us to model binomial events for which the number of trials is unknown. We work with the Kline data, a dataset about Oceanic societies and the number of found tools. https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_10/chapter10c/ Mon, 01 Jan 0001 00:00:00 +0000 https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_10/chapter10c/ Other count regressions Corrie November 11, 2018 Multinomial Regression A multinomial regression is used when more than two things can happen. As an example, suppose we are modelling career choices for some young adults. Let’s assume there are three career choices one can take and expected income is one of the predictors. One option to model the career choices would be the explicit multinomial model which uses the multinomial logit. The multinomial logit uses the multinomial distribution which is an extension of the binomial distribution to the case with more than two events. https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_2/chapter2_ex/ Mon, 01 Jan 0001 00:00:00 +0000 https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_2/chapter2_ex/ Chapter 2 - Exercises Corrie 2020-04-20 These are my solutions to the practice questions of chapter 2, Small Words and Large Worlds, of the book “Statistical Rethinking” (version 2) by Richard McElreath. Easy. 2E1. 
Which of the expressions below correspond to the statement: the probability of rain on Monday? Pr(rain) Pr(rain | Monday) Pr(Monday | rain) Pr(rain, Monday) / Pr(Monday) Statement (4) is equivalent to (2) by Bayes theorem using joint probability. https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_3/chapter3_ex/ Mon, 01 Jan 0001 00:00:00 +0000 https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_3/chapter3_ex/ Chapter 3 - Exercises Corrie 2020-04-09 These are my solutions to the practice questions of chapter 3, Sampling the Imaginary, of the book “Statistical Rethinking” (version 2) by Richard McElreath. Easy. The Easy problems use the samples from the globe tossing example: p_grid <- seq( from=0, to=1, length.out=1000 ) prior <- rep( 1, 1000 ) likelihood <- dbinom( 6, size=9, prob=p_grid) posterior <- likelihood * prior posterior <- posterior / sum(posterior) set. https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_4/chapter4_ex/ Mon, 01 Jan 0001 00:00:00 +0000 https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_4/chapter4_ex/ Chapter 4 - Exercise Corrie May 21, 2018 Chapter 4 - Exercises These are my solutions to the practice questions of chapter 4, Linear Models, of the book “Statistical Rethinking” by Richard McElreath. Easy Questions. 4E1. In the model definition below, which line is the likelihood: 4E2. In the model definition just above, how many parameters are in the posterior distribution? There are 2 parameters, and . 4E3. Write down the appropriate form of Bayes’ theorem that includes the proper likelihood and priors. https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_5/chapter5_ex/ Mon, 01 Jan 0001 00:00:00 +0000 https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_5/chapter5_ex/ Chapter 5 - Exercises Corrie June 3, 2018 Chapter 5 - Exercises These are my solutions to the exercises from chapter 5. Easy. 5E1. 
The following linear models are multiple linear regressions: whereas the following are bivariate linear regressions: 5E2. Write down a multiple regression to evaluate the claim: Animal diversity is linearly related to latitude, but only after controlling for plant diversity. 5E3. Write down a multiple regression to evaluate the claim: Neither amount of funding nor size of laboratory is by itself a good predictor of time to PhD degree; but together these variables are both positively associated with time to degree. https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_6/chapter6_ex/ Mon, 01 Jan 0001 00:00:00 +0000 https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_6/chapter6_ex/ Chapter 6 - Exercises Corrie July 8, 2018 Chapter 6 - Exercises These are my solutions to the exercises from chapter 6. Easy. 6E1. State the three motivating criteria that define information entropy. Information entropy (a measure of uncertainty) should be continuous. A small change in probability should also lead to only a small change in uncertainty. We don’t want to allow for sudden jumps. It should also be increasing as the number of possible events increases. https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_6/chapter6b/ Mon, 01 Jan 0001 00:00:00 +0000 https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_6/chapter6b/ Information Theory and Model Performance Corrie July 2, 2018 Entropy p <- c( 0.3, 0.7) -sum( p*log(p) ) ## [1] 0.6108643 compare this with: p <- c(0.01, 0.99) -sum( p*log(p) ) # contains much less information ## [1] 0.05600153 Kullback-Leibler Divergence p <- c(0.3, 0.7) q1 <- seq(from=0.01, to=0.99, length.out = 100) q <- data.frame(q1 = q1, q2 = 1 - q1) kl_divergence <- function(p, q) { sum( p* log( p/ q) ) } kl <- apply(q, 1, function(x){kl_divergence(p=p, q=x)} ) plot( kl ~ q1, type="l", col="steelblue", lwd=2) abline(v = p[1], lty=2) text(0. 
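Two properties of the Kullback-Leibler divergence plotted in the excerpt above can be checked numerically: it is zero when the approximating distribution equals the target, and it is not symmetric. The sketch below reuses the kl_divergence definition from the excerpt; the candidate distribution q is an arbitrary choice made only for this illustration.

```r
# Minimal sketch: KL divergence is zero for q == p and asymmetric otherwise.
kl_divergence <- function(p, q) sum(p * log(p / q))

p <- c(0.3, 0.7)   # target distribution, as in the excerpt
q <- c(0.5, 0.5)   # arbitrary candidate distribution (an assumption for this example)

kl_same <- kl_divergence(p, p)   # no divergence when the distributions match
kl_pq   <- kl_divergence(p, q)   # divergence when using q to approximate p
kl_qp   <- kl_divergence(q, p)   # divergence when using p to approximate q

print(c(same = kl_same, p_from_q = kl_pq, q_from_p = kl_qp))
```

This is why the curve in the plot touches zero only at q1 = p[1], the position marked by the dashed vertical line.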
https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_6/chapter6c/ Mon, 01 Jan 0001 00:00:00 +0000 https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_6/chapter6c/ Using Information Criteria Corrie July 4, 2018 Using information criteria Model comparison library(rethinking) data(milk) d <- milk[ complete.cases(milk), ] # remove NA values d$neocortex <- d$neocortex.perc / 100 dim(d) ## [1] 17 9 head(d) ## clade species kcal.per.g perc.fat perc.protein ## 1 Strepsirrhine Eulemur fulvus 0.49 16.60 15.42 ## 6 New World Monkey Alouatta seniculus 0.47 21.22 23.58 ## 7 New World Monkey A palliata 0.56 29.66 23.46 ## 8 New World Monkey Cebus apella 0. https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_7/chapter7/ Mon, 01 Jan 0001 00:00:00 +0000 https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_7/chapter7/ Interactions Corrie August 14, 2018 7.1 Building an interaction library(rethinking) data(rugged) d <- rugged How does terrain ruggedness influence the GDP? # make log version of outcome d$log_gdp <- log(d$rgdppc_2000) dd <- d[ complete.cases(d$rgdppc_2000), ] # split into Africa and not-Africa d.A1 <- dd[ dd$cont_africa == 1, ] d.A0 <- dd[ dd$cont_africa == 0, ] Make two models: one for Africa, one for non-Africa: # Africa m7.1 <- map( alist( log_gdp ~ dnorm( mu, sigma) , mu <- a + bR*rugged , a ~ dnorm(8, 100), bR ~ dnorm( 0, 1 ), sigma ~ dunif( 0, 10 ) ), data=d. https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_7/chapter7_ex/ Mon, 01 Jan 0001 00:00:00 +0000 https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_7/chapter7_ex/ Chapter 7 - Exercises Corrie August 17, 2018 Chapter 7 - Exercises Easy. 7E1. For the causal relationships below, name a hypothetical third variable that would lead to an interaction effect. Bread dough rises because of yeast. 
sugar, since the yeast needs some food to grow temperature, if it’s too hot, the yeast dies, maybe a too cold temperature would slow down the dough rising salt inhibits yeast growth Education leads to higher income. https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_8/chapter8/ Mon, 01 Jan 0001 00:00:00 +0000 https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_8/chapter8/ Markov Chain Monte Carlo Corrie September 4, 2018 8.1 King Markov and His island kingdom A simple example of the Markov Chain Monte Carlo algorithm: num_weeks <- 1e5 positions <- rep(0, num_weeks) current <- 10 for (i in 1:num_weeks) { # record current position positions[i] <- current # flip coin to generate proposal proposal <- current + sample( c(-1, 1), size=1) if ( proposal < 1 ) proposal <- 10 if ( proposal > 10 ) proposal <- 1 # move? https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_8/chapter8_ex/ Mon, 01 Jan 0001 00:00:00 +0000 https://samples-of-thoughts.com/projects/statistical-rethinking/chapter_8/chapter8_ex/ Chapter 8 - Exercises Corrie September 11, 2018 Chapter 8 - Exercises Easy. 8E1. Which of the following is a requirement of the simple Metropolis algorithm? The proposal distribution must be symmetric 8E2. Gibbs sampling is more efficient than the Metropolis algorithm. How does it achieve this extra efficiency? Are there any limitations? Gibbs sampling uses conjugate priors which allows it to make smarter proposals and is thus more efficient. The downside to this, is that it uses conjugate priors which might not be a good or valid prior from a scientific perspective. https://samples-of-thoughts.com/projects/statistical-rethinking/readme/ Mon, 01 Jan 0001 00:00:00 +0000 https://samples-of-thoughts.com/projects/statistical-rethinking/readme/ Statistical Rethinking These are code snippets, plots and my solutions to some of the exercises of the book “Statistical Rethinking” by Richard McElreath. 
I am currently updating code snippets and exercise solutions to follow the second version of the book. Solutions to old exercises can still be found on an old branch.
Chapter 2: Exercises
Chapter 3: Exercises
Chapter 4: Why everything so normal, First Linear Predictions, Curvy Regression, Exercises
Chapter 5: Spurious Associations, Masked Relationships, When adding variables hurt, Categorical Variables, Ordinary Least Squares, Exercises
Chapter 6: Overfitting, Information Theory and Model Performance, Using Information Criteria, Exercises
Chapter 7: Interactions, Exercises
Chapter 8: MCMC, Exercises
Chapter 10: Binomial Regression, Poisson Regression, Other Count Regressions, Exercises