Heisengames and the importance of patience in business

Posted on Mon 08 February 2016 in Business • Tagged with heisengames, business

Most bets businesses take, be it hiring, features, products or strategy, don't work out. Still, many businesses are successful despite setbacks. A negative attitude, even when the analysis of the situation is in fact correct, may be missing the bigger picture.

Continue reading

Cloud9: Cloud coding that actually works

Posted on Sun 07 February 2016 in Coding • Tagged with coding, ide, c9

For the past two months I've been using Cloud9 for writing code in the cloud, and I can wholeheartedly recommend it: it just works for me. It's basically Docker plus an IDE: you get a Docker container running Ubuntu that you can access through a web IDE.

Continue reading

Luigi vs Airflow vs Pinball

Posted on Sat 06 February 2016 in Data • Tagged with data, etl, workflow, luigi, airflow, pinball

A spreadsheet comparing the three open-source workflow tools for ETL.

Continue reading

Pinball review

Posted on Sat 06 February 2016 in Data • Tagged with data, etl, workflow, pinball

Pinball is an ETL tool written by Pinterest. Like Airflow, it supports defining tasks and dependencies as Python code, executing and scheduling them, and distributing tasks across worker nodes. It supports calendar scheduling (hourly/daily jobs, also visualized on the web dashboard). Unfortunately, I found that Pinball has very little documentation, very few recent commits in the GitHub repo, and few meaningful answers to GitHub issues from maintainers, while its architecture is complicated and undocumented.

Continue reading

How to make a blog like this

Posted on Thu 07 January 2016 in Meta • Tagged with blog, pelican

Make a simple blog with GitHub Pages and Pelican.

Continue reading

Airflow review

Posted on Wed 06 January 2016 in Data • Tagged with data, etl, workflow, airflow

Airflow is a workflow scheduler written by Airbnb. It supports defining tasks and dependencies as Python code, executing and scheduling them, and distributing tasks across worker nodes. It supports calendar scheduling (hourly/daily jobs, also visualized on the web dashboard), so it can be used as a starting point for traditional ETL. It has a nice web dashboard for seeing current and past task state, querying the history and making changes to metadata such as connection strings.

Continue reading

Systems thinking and system traps

Posted on Wed 06 January 2016 in Books • Tagged with systems, books

Thinking in Systems, written by the late Donella Meadows, is a book about how to think about systems, how to control systems and how systems change and control themselves. A system can be anything from a heating furnace to a social system. The gem of the book is the part about system traps. System traps are ways a system can go wrong; examples are drift to low performance, seeking the wrong goals, shifting the burden, etc.

Continue reading

Luigi review

Posted on Sun 20 December 2015 in Data • Tagged with data, etl, workflow, luigi

I review Luigi, an execution framework for writing data pipes in Python code. It supports dependencies between tasks, and it has a simple central scheduler with an HTTP API and an extensive library of helpers for building data pipes for Hadoop, AWS, MySQL, etc. It was written by Spotify for internal use and open sourced in 2012. A number of companies use it, such as Foursquare, Stripe, and Asana.

Continue reading

Cargo Cult Data

Posted on Mon 26 January 2015 in Data • Tagged with data

Cargo cult data is when you collect and look at data to make decisions, but you only follow the forms and outside appearances of scientific investigation and miss the essentials, so it doesn't work.

Continue reading