- Git Cheatsheet for The Rest of Us - 2010/08/13
Here at Scalien we've been git users for a while now, with Keyspace hosted on Github. We've had great experiences with both git and Github.
The distributed nature of git is wonderful: we can commit into our local repos and later push to the main Github repo. Cloning is a great way for everyone to create forks for experimenting and adding smaller features. Although git in itself is pretty cool, Github is the "killer app": it makes source control management beautiful, gives us a nice clicky interface for repo admin tasks, and lets everyone clone easily. We pay $12 for the small plan, which gives us 10 private repos to keep stealth-mode projects in stealth mode. I also bought Scott Chacon's Pro Git book on Amazon for $23; Chacon, incidentally, works at Github.
- New Scalien Offices - 2010/07/30
We just settled into the new Scalien offices. I've posted some pictures on Flickr! We've found that pair programming on the projector is the best way to get past big issues in the especially tricky parts of the code. We now have these pair programming sessions almost every day.
The full blog post is at the Scalien Blog.
- Making Mysql safe for transactional usage - 2010/05/06
By default, Mysql is not safe to run in transactional mode and store mission-critical data such as financial transactions. By safety I mean correctness with respect to gotchas and silent, undesired behaviour. For example, a Mysql server was migrated to a new server, but the new server had skip-innodb turned on, which resulted in all tables becoming non-transactional. Because of other settings, this only produced warnings, which went unnoticed for months. Or: by default, Mysql will accept the value "0xyz" for an INT column instead of issuing an error, although this is clearly some kind of mistake. The default configuration is optimized to work well for a website such as a blog. This is OK, because most Mysql users use it for just that. It is also why most "advanced" articles discuss how to tune Mysql performance, but I couldn't find good guides (e.g. a checklist) on making it transactionally safe. For a list of gotchas, see the Mysql gotchas page.
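In that spirit, here is a minimal Python sketch of such a checklist. It does not talk to a server: it assumes you have already fetched the server variables (for example from SHOW GLOBAL VARIABLES) into a plain dict. The variable and mode names it checks (`sql_mode`, `default_storage_engine`, `STRICT_ALL_TABLES`, `STRICT_TRANS_TABLES`, `NO_ENGINE_SUBSTITUTION`) are those of MySQL 5.5 and later; older versions name some of them differently, and the list is far from complete.

```python
def check_transactional_safety(variables):
    """Return a list of problems found in a dict of MySQL server variables.

    Assumes `variables` maps names to values as listed by SHOW GLOBAL VARIABLES
    on MySQL 5.5+; this is an illustrative checklist, not a complete one.
    """
    problems = []
    modes = {m.strip().upper()
             for m in variables.get("sql_mode", "").split(",") if m.strip()}

    # Without a strict mode, a value like '0xyz' in an INT column is coerced
    # with only a warning instead of being rejected.
    if not modes & {"STRICT_ALL_TABLES", "STRICT_TRANS_TABLES"}:
        problems.append("sql_mode is not strict: bad values are silently coerced")

    # Without NO_ENGINE_SUBSTITUTION, CREATE TABLE ... ENGINE=InnoDB on a server
    # where InnoDB is unavailable falls back to the default engine with only a
    # warning (the skip-innodb migration story above).
    if "NO_ENGINE_SUBSTITUTION" not in modes:
        problems.append("NO_ENGINE_SUBSTITUTION is off: engine fallbacks only warn")

    # Tables created without an explicit ENGINE clause should still be transactional.
    engine = variables.get("default_storage_engine", "")
    if engine.upper() != "INNODB":
        problems.append(f"default_storage_engine is {engine or 'unset'}, not InnoDB")

    return problems
```

With `sql_mode` set to `STRICT_ALL_TABLES,NO_ENGINE_SUBSTITUTION` and InnoDB as the default engine, the function returns an empty list; an empty dict yields all three complaints.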
- New Logitech M705 Marathon Mouse - 2010/05/02
My old desktop mouse was a Logitech MX Laser. I loved that thing: it fit my hand just the way a mouse should, it had a nice scroll wheel and a back button (and some others I never used), and it just worked.
- Write frameworkey programs - 2010/03/11
We spend many hours a day pondering what separates successful programming projects from not-so-successful ones. At the beginning we are enthusiastic and write lots of code, but towards the end we are often dissatisfied with the actual code, irrespective of whether it gets the job done! The problem is that many, if not most, programs turn out to be brittle, and we know it.
- Recent papers (2009) - 2010/01/26
I collected a bunch of papers I wrote last year, including Scalien whitepapers, a Free Software Foundation conference paper (in Hungarian) and a short literature review of a physics book.
- Keyspace whitepaper: describes Keyspace, the distributed database that is currently Scalien's primary product and is under heavy development!
- PaxosLease whitepaper: describes the Paxos variant invented by my co-authors and myself for negotiating distributed leases (time-expiring locks) in a consistent manner.
- Paper on open-source distributed systems for the 2009 FSF conference (in Hungarian)
- The Exciting physics of an Excited universe: a literature review of Norman Glendenning's book Compact Stars, covering neutron stars, pulsars and strange stars.
You should of course use the Google Docs Preview Firefox addon for previewing PDF files!
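The PaxosLease protocol itself is described in the whitepaper. To illustrate only the basic notion of a lease (a lock that expires on its own), here is a minimal single-node Python sketch. It is not PaxosLease, which gets several nodes to agree on who holds the lease; the `Lease` class and its methods are hypothetical names made up for this example.

```python
import time


class Lease:
    """A single-node, time-expiring lock (a lease); illustration only."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock        # injectable clock, handy for testing
        self._owner = None
        self._expires_at = 0.0

    def owner(self):
        """Return the current holder, or None if the lease is free or expired."""
        if self._owner is not None and self._clock() >= self._expires_at:
            self._owner = None     # expiry frees the lease, no release call needed
        return self._owner

    def try_acquire(self, node, duration):
        """Grant or extend the lease for `duration` seconds; True on success."""
        current = self.owner()
        if current is not None and current != node:
            return False           # someone else holds an unexpired lease
        self._owner = node
        self._expires_at = self._clock() + duration
        return True
```

The point of the time limit is that a holder which crashes never has to release anything: once `duration` has passed, another node can acquire the lease.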
- The Confused World of "NoSQL" - 2009/11/28
Non-relational datastores are usually lumped together under the umbrella term "NoSQL", which recently got its own Wikipedia entry. Like the Wikipedia entry, the world of "NoSQL" is changing quickly. Here I distinguish the different use cases and motivations for using and building such systems.
- Google Wave Has Hit My Shores - 2009/11/09
Last week I got my Google Wave (GW) invite, and within a few days we managed to get a group of about 10 friends invited as well. This is fortunate, because it lets me actually use GW to communicate and collaborate with a common set of people. The short story is: Google Wave has rocked our world. We produced a large number of waves and blips within a few days (blips are the individual comments in GW terminology), far more than we would have exchanged by email over the same period; I estimate that the volume of communication increased 2-5x. Aside from volume, GW also changed the nature of communication: this group did not use IM before, but several real-time chats took place within GW. GW also opened new channels between people; e.g. I chatted with someone I usually only talk to in real life (and rarely at that). The combined synchronous and asynchronous nature of GW is a real killer feature. Even the characters-in-real-time feature (you can watch the other person's keystrokes as they type, which is sometimes annoying and too much) is a good thing on balance, because it engages the parties. GW is such a pleasant communication platform that even its alpha-as-in-buggy-as-hell web interface, uncharacteristic of Google, has not deterred us from using it. Overall, I think GW is a game-changer: this group has not exchanged a single email since we all got on GW. I'm trying to get other groups onto GW as soon as possible.

- SSD Reading List - 2009/10/18
As part of the preparatory work for replacing BerkeleyDB with our own storage engine in Keyspace, I've been looking at papers about SSD technology. It looks like SSDs will replace spinning disks in a couple of years, and in anticipation we're looking into optimizing the engine for their characteristics: cheap random reads, but potentially expensive writes, because of the erases hidden behind complex on-disk logic called the Flash Translation Layer (FTL). Here are some good resources I found.
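To make the FTL point concrete, here is a toy page-mapped FTL in Python. It is an illustration under simplifying assumptions, not how any particular SSD works: a page can be programmed only once between erases, erases happen a whole block at a time, so every write goes to a fresh page and the old copy just becomes stale. When no erased block remains, a naive greedy garbage collector erases the block with the fewest live pages after copying those pages out. The block size, class and method names are all hypothetical.

```python
class ToyFTL:
    """Toy page-mapped Flash Translation Layer (illustration only)."""

    PAGES_PER_BLOCK = 4  # assumed erase-block size, in pages

    def __init__(self, num_blocks):
        # Each physical page holds (logical_page, data), or None if erased.
        self.blocks = [[None] * self.PAGES_PER_BLOCK for _ in range(num_blocks)]
        self.mapping = {}                  # logical page -> (block, page)
        self.free_blocks = list(range(1, num_blocks))
        self.active, self.offset = 0, 0    # where the next page gets programmed
        self.erases = 0
        self.physical_writes = 0           # includes pages copied by GC

    def read(self, logical):
        # One table lookup plus one page read: cheap, even for random reads.
        block, page = self.mapping[logical]
        return self.blocks[block][page][1]

    def write(self, logical, data):
        # No in-place overwrite: program the next free page, remap, and the
        # previous copy of this logical page silently becomes stale.
        if self.offset == self.PAGES_PER_BLOCK:
            self._open_new_block()
        self._program(logical, data)

    def _program(self, logical, data):
        self.blocks[self.active][self.offset] = (logical, data)
        self.mapping[logical] = (self.active, self.offset)
        self.offset += 1
        self.physical_writes += 1

    def _is_live(self, block, page):
        entry = self.blocks[block][page]
        return entry is not None and self.mapping.get(entry[0]) == (block, page)

    def _live_count(self, block):
        return sum(self._is_live(block, p) for p in range(self.PAGES_PER_BLOCK))

    def _open_new_block(self):
        if self.free_blocks:
            self.active, self.offset = self.free_blocks.pop(), 0
            return
        # Garbage collection: buffer the victim's live pages, erase the whole
        # block, then write those pages back before accepting new data.
        victim = min(range(len(self.blocks)), key=self._live_count)
        live = [self.blocks[victim][p] for p in range(self.PAGES_PER_BLOCK)
                if self._is_live(victim, p)]
        if len(live) == self.PAGES_PER_BLOCK:
            raise RuntimeError("device full: every page holds live data")
        self.blocks[victim] = [None] * self.PAGES_PER_BLOCK
        self.erases += 1
        self.active, self.offset = victim, 0
        for logical, data in live:
            self._program(logical, data)
```

For example, with two blocks, writing logical pages 0 to 3 once and then rewriting logical page 4 five times costs 9 logical writes but 10 physical page writes and one erase: the extra write is the live page that garbage collection had to copy. That copying is where the "potentially expensive writes" come from.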
- Google Docs Preview Firefox Addon - 2009/09/10

Download the Google Docs Preview Firefox Addon here. Google has opened up their very cool web-based PDF viewer for outside use. You can use it by telling it which PDF file to open, like this:
http://docs.google.com/gview?url=PDF_URL
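Because the PDF's own address is passed as a query parameter, it should be percent-encoded. A minimal Python sketch that builds such a link (the `gview_link` name and the example.com URLs are made up for illustration):

```python
from urllib.parse import urlencode


def gview_link(pdf_url):
    """Return a Google Docs viewer link for a publicly reachable PDF.

    urlencode percent-encodes the PDF address, so '?', '&' and '=' inside it
    are not confused with the viewer's own query string.
    """
    return "http://docs.google.com/gview?" + urlencode({"url": pdf_url})
```

For example, `gview_link("http://example.com/paper.pdf")` returns `http://docs.google.com/gview?url=http%3A%2F%2Fexample.com%2Fpaper.pdf`.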
- Keyspace whitepaper - 2009/07/23
- Scalable Web Architectures and Application State - 2009/06/17
In this article we follow a hypothetical programmer, Damian, on his quest to make his web application scalable.
- Thoughts on Yahoo's PNUTS distributed database - 2009/02/15
I've updated the Readings in Distributed Databases page with Yahoo's new PNUTS paper.
PNUTS is Yahoo's in-house distributed table store, used to serve some of its web properties. The goal of this post is to find out the basics: how replication is managed, what guarantees the system makes, whether branching can occur...
- My Startup Manifesto - 2009/01/25
This is my startup manifesto.
- CIDR 2009 Proceedings - 2009/01/08
I could not attend this year's meeting, so I was waiting for the proceedings. Here they are:
- Introducing PrimoBlog - 2009/01/08
In an effort to get rid of compromisable targets on my server, I decided to roll my own "blog engine": PrimoBlog. As the name implies, it's quite primitive. It consists of 123 lines of shell code plus the framing HTML and CSS code. It generates static HTML and RSS from text files, so no PHP or Mysql is involved.
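PrimoBlog itself is shell; to keep this page's examples in one language, here is a rough Python sketch of the same idea, not PrimoBlog's actual code. The post file layout (title on line 1, date on line 2, body after that), the output file names and the site title/URL parameters are assumptions made for illustration.

```python
from html import escape
from pathlib import Path
from xml.sax.saxutils import escape as xml_escape


def load_entries(src_dir):
    """Read posts from *.txt files: line 1 is the title, line 2 the date,
    the rest the body (a hypothetical layout). Newest file name first."""
    entries = []
    for path in sorted(Path(src_dir).glob("*.txt"), reverse=True):
        title, date, *body = path.read_text(encoding="utf-8").splitlines()
        entries.append((title, date, "\n".join(body).strip()))
    return entries


def render_html(entries):
    """One static HTML page listing every post."""
    parts = ["<html><body>"]
    for title, date, body in entries:
        parts.append(f"<h2>{escape(title)} - {escape(date)}</h2>")
        parts.append(f"<p>{escape(body)}</p>")
    parts.append("</body></html>")
    return "\n".join(parts)


def render_rss(entries, site_title, site_url):
    """A bare-bones RSS 2.0 feed with one <item> per post."""
    # pubDate is omitted: RSS 2.0 expects RFC 822 dates, not 2009/01/08-style ones.
    items = "".join(
        f"<item><title>{xml_escape(title)}</title>"
        f"<description>{xml_escape(body)}</description></item>"
        for title, _date, body in entries
    )
    return (
        '<?xml version="1.0"?><rss version="2.0"><channel>'
        f"<title>{xml_escape(site_title)}</title>"
        f"<link>{xml_escape(site_url)}</link>"
        f"<description>{xml_escape(site_title)}</description>"
        f"{items}</channel></rss>"
    )


def build(src_dir, out_dir, site_title, site_url):
    """Regenerate index.html and rss.xml; no database or server-side code."""
    entries = load_entries(src_dir)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "index.html").write_text(render_html(entries), encoding="utf-8")
    (out / "rss.xml").write_text(render_rss(entries, site_title, site_url), encoding="utf-8")
```

The web server then only ever serves the two generated files, which is what removes the PHP and Mysql attack surface.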
- Shackleton's Job Advertisement - 2009/01/08
Ernest Shackleton's 1907 ad in London's Times, recruiting a crew to sail with him on his exploration of the South Pole:
Wanted. Men for hazardous journey.
Low wages. Bitter cold.
Long hours of complete darkness.
Safe return doubtful.
Honor and recognition in the event of success.
Would this kind of ad work on programmers?
- Re: Readings in Distributed Systems - 2008/12/06
I've updated the Readings in Distributed Systems page with a few papers: two on implementation issues and one on Sinfonia.
- Readings in Distributed Systems - 2008/10/28
I started maintaining a list of papers on Distributed Systems.
- Entry to Photography - 2008/10/14
I've wanted to pick up photography as a hobby for years, and now that it's the off-season, I finally had a chance to go out and just do it. First of all, I watched the BBC's The Genius of Photography: it's the perfect way to find inspiration and see for yourself that cameras don't take good pictures; photographers do.
- Doing an Ironman - 2008/09/22
Why write about the Ironman here, on a blog meant for programmers and other technical types? From personal experience I know that people have a misconception about the Ironman. They think it's about swim, bike and run training, followed by a grueling day of racing, followed by some bragging, per the Ironman slogan:
- Show Me Your Data Structure... - 2008/09/10
There's an old programming proverb which goes something like this:
Show me your algorithm,
and I will remain puzzled,
but show me your data structure,
and I will be enlightened.
- Transactions in Memory - 2008/09/01
I recently re-read Chapter 24 of Beautiful Code, titled Beautiful Concurrency (PDF). It's very well written; you won't regret reading it. In it, Simon Peyton Jones makes the case for Software Transactional Memory over explicit locking by the programmer.
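The chapter uses Haskell and GHC's STM, with transfers between bank accounts as its running example. Python has no built-in STM, so here is a toy optimistic STM written only to illustrate the idea: each transaction buffers its writes, remembers the version of every variable it reads, and at commit time validates those versions, re-running the whole transaction on conflict. The names `TVar` and `atomically` echo GHC's interface, but this code is a hypothetical sketch; the single global commit lock is a simplification, and there is no equivalent of GHC's blocking `retry`.

```python
import threading

# One global lock makes validation + commit atomic. This is a simplification
# of this sketch; real STM implementations are far finer-grained.
_commit_lock = threading.Lock()


class TVar:
    """A transactional variable: a value plus a version bumped on each commit."""
    def __init__(self, value):
        self.value = value
        self.version = 0


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> (version, value) seen on first read
        self.writes = {}  # TVar -> new value, buffered until commit

    def read(self, tvar):
        if tvar in self.writes:             # see our own pending writes
            return self.writes[tvar]
        if tvar not in self.reads:
            with _commit_lock:              # snapshot value and version together
                self.reads[tvar] = (tvar.version, tvar.value)
        return self.reads[tvar][1]

    def write(self, tvar, value):
        self.writes[tvar] = value           # invisible to others until commit

    def commit(self):
        with _commit_lock:
            # Validate: give up if anything we read was changed by another commit.
            for tvar, (version, _) in self.reads.items():
                if tvar.version != version:
                    return False
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1
            return True


def atomically(fn):
    """Run fn(tx) as one transaction, re-running it from scratch on conflict."""
    while True:
        tx = Transaction()
        result = fn(tx)
        if tx.commit():
            return result


def transfer(src, dst, amount):
    """Move `amount` between two accounts, all-or-nothing."""
    def body(tx):
        tx.write(src, tx.read(src) - amount)
        tx.write(dst, tx.read(dst) + amount)
    atomically(body)
```

Callers of `transfer` never take a lock or worry about lock ordering: a transfer that races with another one simply fails validation and runs again. With four threads each moving 1 unit 100 times from `TVar(1000)` to another `TVar(1000)`, the balances end at 600 and 1400.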
- Ironman - 2008/08/25
I've been busy this weekend finishing up my first Ironman (3.8km swim, 180km bike, 42km run) season.
- P2P vs. the Cloud - 2008/08/16
In a recent article, Marco Kotrotsos claims that the cloud has failed and advocates a P2P cloud architecture instead:
I am referring to the slew of outages the last day’s and weeks of some high profile infrastructures like Amazon S3 and Google. I always thought Googles infrastructure was untouchable. But it seemed that I had woken up to a different world last week. When I saw tweets and messages going around with "Gmail is down".
...
P2P Cloud computing in general will be the future of infrastructure.
- Data-based Recommendations - 2008/08/04
Social recommendation sites like Slashdot, Digg or the smaller Hacker News are used by millions to find interesting content on the web. Some are supposedly worth a lot of money. Although these sites are useful in the sense of being better than nothing, their signal-to-noise ratio is still very low. Hacker News' feed features roughly 75 posts a day, of which maybe 5-10% are interesting to me. In the long run, scanning through 75 posts a day to find 4 to 8 mildly interesting ones is not a good use of time. Granted, social recommendation sites are ideal for discovering "breaking news", but in reality most posted links are not new; they're just supposed to be interesting.
- Beautiful Migration - 2008/07/28
O'Reilly's Beautiful Code is an excellent book with 33 short, bite-sized chapters about interesting code. After reading it, I thought it would be a good retrospective exercise to write my own Chapter 34 and share some insights with other programmers. So here goes:
- Book Recommendation: C Interfaces and Implementations - 2008/07/21
- Bloatware is a Business Opportunity (Part II) - 2008/07/15
Read Part I first.
Given bloatware, what is the end-user to do?
Ben Kenobi would tell you to use an elegant weapon from a more civilized time.
- Bloatware is a Business Opportunity (Part I) - 2008/07/15
Bloatware is defined by Wikipedia as:
Software bloat, also known as bloatware or elephantware, is a term used in both a neutral and disparaging sense, to describe the tendency of newer computer programs to be larger, or to use larger amounts of system resources (mass storage space, processing power or memory) than necessary for the same or similar benefits from older versions to its users. Additionally, the term bloatware is used in common language for pre-installed, huge software bundles, mostly consisting of demos and trial ware.
- The Hollywood Model in Software Engineering - 2008/07/14
One of the downsides of working at a large corporation is low work efficiency and idle time between projects. From my own experience, I estimate that the corporate programmer could be 2-3x more productive; in other words, he is working at 33-50% of capacity. The interesting question is whether there is another way. I've always known how Hollywood movies are produced, but I had never made the connection to software project management that Sue Bushell points out in her piece Replicating the Hollywood Model.

