Fetchr Data Science Infra at 1 year

Posted on Tue 14 August 2018 in Data • Tagged with data, etl, workflow, airflow, fetchr, model, ml

A description of our Analytics+ML cluster running on AWS, using Presto, Airflow and Superset.

Continue reading

Building the Fetchr Data Science Infra on AWS with Presto and Airflow

Posted on Wed 14 March 2018 in Data • Tagged with data, etl, workflow, airflow, fetchr

We used Hive/Presto on AWS together with Airflow to build out the Data Science Infrastructure at Fetchr in under six months.

Continue reading

Luigi vs Airflow vs Pinball

Posted on Sat 06 February 2016 in Data • Tagged with data, etl, workflow, luigi, airflow, pinball

A spreadsheet comparing the three open-source workflow tools for ETL.

Continue reading

Airflow review

Posted on Wed 06 January 2016 in Data • Tagged with data, etl, workflow, airflow

Airflow is a workflow scheduler written by Airbnb. It lets you define tasks and their dependencies as Python code, schedule and execute them, and distribute them across worker nodes. It supports calendar scheduling (hourly or daily jobs, also visualized on the web dashboard), so it can be used as a starting point for traditional ETL. Its web dashboard shows current and past task state, and lets you query the history and change metadata such as connection strings.
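
To make "tasks and dependencies as Python code" concrete, here is a minimal sketch in plain Python. It does not use Airflow's API; it only illustrates the idea of declaring tasks, declaring which tasks each one depends on, and running them in dependency order. The task names (`extract`, `transform`, `load`) and the `run` helper are hypothetical, chosen for illustration, and it assumes Python 3.9+ for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical ETL tasks: each receives a dict of its upstream tasks' results.
tasks = {
    "extract": lambda upstream: [1, 2, 3],
    "transform": lambda upstream: [x * 10 for x in upstream["extract"]],
    "load": lambda upstream: sum(upstream["transform"]),
}

# Each task maps to the set of tasks that must finish before it starts.
deps = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}


def run(tasks, deps):
    """Run every task after its upstream tasks; return the order and results."""
    results = {}
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        upstream = {dep: results[dep] for dep in deps[name]}
        results[name] = tasks[name](upstream)
    return order, results


order, results = run(tasks, deps)
print(order)
print(results["load"])
```

A real scheduler adds what this sketch leaves out: calendar-based triggering, retries, persisted task state, and distribution of tasks across worker nodes.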

Continue reading