Luigi review

Marton Trencseni - Sun 20 December 2015 • Tagged with data, etl, workflow, luigi

I review Luigi, an execution framework for writing data pipelines in Python. It supports task-to-task dependencies, has a simple central scheduler with an HTTP API, and ships with an extensive library of helpers for building pipelines on Hadoop, AWS, MySQL and more. Spotify wrote it for internal use and open-sourced it in 2012; other companies using it include Foursquare, Stripe and Asana.
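
To make "task-to-task dependencies" concrete, here is a minimal, dependency-free toy that mimics the model Luigi uses: each task declares what it requires and what file it outputs, and a task counts as complete once its output exists, so re-running the pipeline skips finished work. This is a hypothetical sketch, not Luigi's API; the `Task` base class, the `build` function and the `Extract`/`Transform` tasks are made up for illustration. In real Luigi code you would subclass `luigi.Task`, return targets such as `luigi.LocalTarget` from `output()`, and let the scheduler drive execution.

```python
import os
import tempfile

# Toy model of Luigi-style tasks (hypothetical, NOT Luigi's API).
# A task lists its upstream tasks in requires(), names its output file in
# output(), and is considered complete once that file exists.

class Task:
    def requires(self):
        return []  # upstream tasks this task depends on

    def output(self):
        raise NotImplementedError  # path of the file this task writes

    def run(self):
        raise NotImplementedError

    def complete(self):
        return os.path.exists(self.output())


def build(task, ran=None):
    """Depth-first: build all requirements, then run the task if incomplete.

    Returns the names of the tasks that actually ran, in order.
    """
    if ran is None:
        ran = []
    for dep in task.requires():
        build(dep, ran)
    if not task.complete():
        task.run()
        ran.append(type(task).__name__)
    return ran


WORKDIR = tempfile.mkdtemp()  # scratch directory for the example outputs


class Extract(Task):
    def output(self):
        return os.path.join(WORKDIR, "raw.txt")

    def run(self):
        with open(self.output(), "w") as f:
            f.write("3\n1\n2\n")


class Transform(Task):
    def requires(self):
        return [Extract()]

    def output(self):
        return os.path.join(WORKDIR, "sorted.txt")

    def run(self):
        # Read the upstream task's output, sort it, write our own output.
        with open(self.requires()[0].output()) as f:
            numbers = sorted(int(line) for line in f)
        with open(self.output(), "w") as f:
            f.write("\n".join(map(str, numbers)) + "\n")


if __name__ == "__main__":
    print(build(Transform()))  # ['Extract', 'Transform']
    print(build(Transform()))  # [] because both outputs already exist
```

Using file existence as the completion check is what makes this style of pipeline idempotent: a crashed run can simply be restarted, and only the missing outputs get rebuilt.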

Cargo Cult Data

Marton Trencseni - Mon 26 January 2015 • Tagged with data

Cargo cult data is when you collect and look at data to make decisions, but only follow the forms and outward appearances of scientific investigation while missing the essentials, so it doesn't work.