What not to spend time on
Marton Trencseni - Mon 23 July 2018 • Tagged with warren, buffett, self, help, physics, haskell
Warren Buffett says deciding what not to spend time on is just as important as deciding what to spend time on.
Marton Trencseni - Sat 07 July 2018 • Tagged with statistics, data
When working with averages, we have to be careful. There are pitfalls lurking that can pollute our statistics and the results we report.
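One classic pitfall, offered here as an illustration and not necessarily one the post itself covers, is averaging per-segment averages when the segments have very different sizes. The sketch below uses made-up segment names, sizes and conversion rates (all hypothetical) to compare the naive "mean of means" with the size-weighted pooled mean.

```python
# Hypothetical example data: (segment, number of users, conversion rate in that segment).
segments = [
    ("A", 1000, 0.10),
    ("B", 10, 0.50),
]

def mean_of_means(rows):
    """Naive: average the per-segment rates, ignoring how many users each has."""
    return sum(rate for _, _, rate in rows) / len(rows)

def pooled_mean(rows):
    """Weight each segment's rate by its size: total conversions / total users."""
    total_users = sum(n for _, n, _ in rows)
    return sum(n * rate for _, n, rate in rows) / total_users

print(f"mean of means: {mean_of_means(segments):.3f}")  # 0.300
print(f"pooled mean:   {pooled_mean(segments):.3f}")    # 0.104
```

The tiny segment B pulls the naive figure up to 30%, while only about 10.4% of all users actually converted.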
Marton Trencseni - Wed 14 March 2018 • Tagged with data, etl, workflow, airflow, fetchr
We used Hive/Presto on AWS together with Airflow to rapidly build out the Data Science Infrastructure at Fetchr in less than 6 months.
Marton Trencseni - Wed 09 November 2016 • Tagged with data, science, product, analytics
I used to think that a good analogy for using data is the instrumentation of a cockpit in an airliner. Lots of instruments, and if they fail, the pilot can’t fly the plane and bad things happen. There’s no autopilot for companies. The problem with this analogy is that planes aren’t built in mid-air. Product teams and companies constantly need to build and ship new products.
Marton Trencseni - Sun 05 June 2016 • Tagged with ab-testing, strata, statistics, data
I gave this talk at the O’Reilly Strata Conference London in June 2016, mostly based on what I learned at Prezi from 2012 to 2016.
Marton Trencseni - Sat 14 May 2016 • Tagged with books, programming, hhvm, brooks
I read this book on my first vacation after I started working at Facebook and thus became a semi-regular Hack/HHVM user. I highly recommend reading (parts of) it, but not to learn Hack/PHP, which is irrelevant to most people. Read it instead to learn how Facebook improved its www codebase and performance without rewriting the old PHP code in one big effort, and thus avoided the famous second-system effect.
Marton Trencseni - Tue 16 February 2016 • Tagged with physics, einstein, relativity
This post is about the amazing success of Einstein's general theory of relativity. The theory predicts, among other things, the accelerating Universe, black holes, gravitational lensing and gravitational waves. The real shocker is to remember that Einstein didn't invent general relativity to explain these. He didn't know about them; they hadn't been observed at that time!
Marton Trencseni - Mon 08 February 2016 • Tagged with heisengames, business
Most bets businesses take, be it hiring, features, products or strategy, don't work out. Still, many businesses are successful despite setbacks. A negative attitude, even when the analysis of the situation is in fact correct, may be missing the bigger picture.
Marton Trencseni - Sun 07 February 2016 • Tagged with coding, ide, c9
For the past 2 months I've been using Cloud9 for writing code in the cloud, and I can wholeheartedly recommend it: it just works for me. It's basically Docker plus an IDE: you get a Docker container running Ubuntu that you can access over a web IDE.
Marton Trencseni - Sat 06 February 2016 • Tagged with data, etl, workflow, luigi, airflow, pinball
A spreadsheet comparing three open-source workflow tools for ETL: Luigi, Airflow and Pinball.
Marton Trencseni - Sat 06 February 2016 • Tagged with data, etl, workflow, pinball
Pinball is an ETL tool written by Pinterest. Like Airflow, it supports defining tasks and dependencies as Python code, executing and scheduling them, and distributing tasks across worker nodes. It supports calendar scheduling (hourly/daily jobs, also visualized on the web dashboard). Unfortunately, I found that Pinball has very little documentation, very few recent commits in its GitHub repo and few meaningful answers from maintainers to GitHub issues, while its architecture is complicated and undocumented.
Marton Trencseni - Thu 07 January 2016 • Tagged with blog, pelican
Make a simple blog with GitHub Pages and Pelican.
Marton Trencseni - Wed 06 January 2016 • Tagged with data, etl, workflow, airflow
Airflow is a workflow scheduler written by Airbnb. It supports defining tasks and dependencies as Python code, executing and scheduling them, and distributing tasks across worker nodes. It supports calendar scheduling (hourly/daily jobs, also visualized on the web dashboard), so it can be used as a starting point for traditional ETL. It has a nice web dashboard for seeing current and past task state, querying the history and making changes to metadata such as connection strings.
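To make "tasks and dependencies as Python code" plus calendar scheduling concrete, here is a stdlib-only sketch of the idea. It is not Airflow's API (in Airflow itself you would declare a `DAG` object with operators); the task names, the `UPSTREAM` mapping and the `run_daily` helper are all hypothetical. It orders tasks with `graphlib.TopologicalSorter` (Python 3.9+) and runs the whole graph once per day, roughly like a daily schedule or a backfill over a date range.

```python
from datetime import date, timedelta
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical ETL steps; each receives the run's date string and appends to a log.
def extract(ds, log):
    log.append(f"extract {ds}")

def transform(ds, log):
    log.append(f"transform {ds}")

def load(ds, log):
    log.append(f"load {ds}")

TASKS = {"extract": extract, "transform": transform, "load": load}
# Each task maps to the set of upstream tasks that must finish before it runs.
UPSTREAM = {"transform": {"extract"}, "load": {"transform"}}

def run_daily(start, end, log):
    """Run every task once per day from start to end (inclusive), in dependency order."""
    order = list(TopologicalSorter(UPSTREAM).static_order())
    day = start
    while day <= end:
        for name in order:
            TASKS[name](day.isoformat(), log)
        day += timedelta(days=1)

log = []
run_daily(date(2016, 1, 1), date(2016, 1, 2), log)
print(log)
```

A real scheduler also records task state, retries failures and runs independent tasks in parallel on workers; this sketch only shows the dependency ordering and the per-day runs.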
Marton Trencseni - Wed 06 January 2016 • Tagged with systems, books
Thinking in Systems, written by the late Donella Meadows, is a book about how to think about systems, how to control systems and how systems change and control themselves. A system can be anything from a heating furnace to a social system. The gem of the book is the part about system traps. System traps are ways a system can go wrong; examples are drift to low performance, seeking the wrong goals, shifting the burden, etc.
Marton Trencseni - Sun 20 December 2015 • Tagged with data, etl, workflow, luigi
I review Luigi, an execution framework for writing data pipes in Python code. It supports task-task dependencies, it has a simple central scheduler with an HTTP API and an extensive library of helpers for building data pipes for Hadoop, AWS, MySQL etc. It was written by Spotify for internal use and open-sourced in 2012. A number of companies use it, such as Foursquare, Stripe and Asana.
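Luigi's task-task dependencies follow a pattern where each task declares `requires()`, `output()` and `run()`, and a task counts as complete once its output exists. The sketch below imitates that pattern with the standard library only; it does not use the real `luigi.Task`, `luigi.LocalTarget` or `luigi.build`, and the `Extract`/`Sum` tasks, the `build` helper and the file names are hypothetical.

```python
import os
import tempfile

WORKDIR = tempfile.mkdtemp()  # scratch directory for this sketch's output files
runs = []                     # records which tasks actually executed

class Task:
    """Hypothetical base class imitating Luigi's requires/output/run pattern."""
    def requires(self):
        return []  # upstream tasks; none by default
    def output(self):
        raise NotImplementedError  # path of the file this task produces
    def run(self):
        raise NotImplementedError
    def complete(self):
        # A task counts as done once its output file exists.
        return os.path.exists(self.output())

def build(task):
    """Run any incomplete upstream tasks first, then the task itself."""
    if task.complete():
        return
    for dep in task.requires():
        build(dep)
    task.run()

class Extract(Task):
    def output(self):
        return os.path.join(WORKDIR, "raw.txt")
    def run(self):
        runs.append("Extract")
        with open(self.output(), "w") as f:
            f.write("3\n4\n5\n")

class Sum(Task):
    def requires(self):
        return [Extract()]
    def output(self):
        return os.path.join(WORKDIR, "sum.txt")
    def run(self):
        runs.append("Sum")
        with open(self.requires()[0].output()) as f:
            total = sum(int(line) for line in f)
        with open(self.output(), "w") as f:
            f.write(str(total))

build(Sum())
build(Sum())  # second call is a no-op: both outputs already exist
print(runs)   # ['Extract', 'Sum']
```

Because completeness is judged by outputs, re-running the pipeline is cheap and safe: only missing pieces get rebuilt.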
Marton Trencseni - Mon 26 January 2015 • Tagged with data
Cargo cult data is when you're collecting and looking at data when making decisions, but you're only following the forms and outside appearances of scientific investigation and missing the essentials, so it doesn't work.