Samples of Thoughts

Memory Efficiency in Pandas

If you’re working with big data in pandas, you can run into memory problems very quickly. When working locally, your machine might slow down, or you might even get that lovely message asking you to please kill some applications. If you’re working in the cloud, you can of course always ramp up memory, but trust me, having to restart a couple of thousand jobs killed by out-of-memory errors is not fun, and it gets pricey too! »

Piping in Pandas: Group By and Mutate

I am a big fan of the tidyverse in R, but most of the time I actually use Python. If the rest of your team uses Python and your production code is in Python, it simply doesn’t make much sense to use R. Anyway, I started to like working with pandas much better once I figured out how to pipe with pandas and how to “translate” from tidyverse to pandas. Then this code in R »

Animated Facebook Messages

I recently downloaded my own Facebook data and wanted to find out what kind of data gems I could find. There are some clear advantages to analyzing your own data: foremost, you’re the expert and know the “ground truth” behind the data. That said, there can still be big surprises! In my case, the most interesting parts of the analysis could be boiled down to two graphics. Since there’s also a time factor in the data, I thought this was a good opportunity to learn about animated plots, and indeed, it works quite beautifully with the two plots. »

AI Guild Podcast: Data Science Interviews

On turning the tables during job interviews as your experience grows. Some time ago, I sat down with Leyla from the AI Guild to talk about my own Data Science path and also my experience in job interviews for data positions. Recently, as I’ve been interviewing for positions, I wondered how to find out about some more sensitive topics. How does the company handle diversity? Is diversity important to them, and do they take active measures to increase e. »

Visa Costs meet Data Viz

I recently stumbled across this data set about visa costs. It is a collection of visa costs for all countries for different kinds of visas (tourist, business, work, transit, and some other visas). Each row corresponds to the visa relation between a source country (the country applying for the visa) and a target country (the country issuing the visa), together with the cost for the different visa types. Since I had a bit of free time on my hands, I decided to do some “plotcrastinating”, play around with the data, and try out some new visualizations. »

about data, statistics and everything in between
