Culture talk: What I learned from building a distributed database that never took off

Marton Trencseni - Tue 31 December 2024 - Data

Introduction

Recently, I delivered a culture talk on Scalien, my old database startup titled What I learned from building a distributed database that never took off. Below is a short re-cap.

The Scalien journey began after completing my Computer Science degree and working as a C++ programmer at Graphisoft and later at Opennet, a small Hungarian telco company. Driven by a passion for systems programming and inspired by seminal Google File System, MapReduce, and Bigtable papers on distributed systems, my co-founder and I decided to venture into the startup world. We aimed to create a NoSQL database that stood out by focusing on strong consistency and reliability using the Paxos algorithm for replication — versus the scalability-centric approaches of competitors like MongoDB and Cassandra.

At Scalien, we developed two distributed key-value databases: Keyspace and ScalienDB. Keyspace was our initial effort, focusing solely on replication without sharding, while ScalienDB expanded to include sharding, cross-platform support (Windows, Linux, BSD), comprehensive admin tooling, and client libraries for multiple languages including Python, PHP, C#, and C++. Our implementation leveraged the Paxos algorithm to ensure reliable replication and consistency across distributed nodes, addressing a critical gap in the NoSQL landscape where most databases prioritized scalability over reliability. ScalienDB demonstrated superior performance in benchmarks, attracting a few paying customers, including a VC-funded Hungarian startup and a NASDAQ-listed SaaS company. Despite our technical achievements, such as writing approximately 200k lines of robust C++ code and achieving beta-quality software, we faced ultimately insurmountable challenges.

Ultimately, Scalien ceased operations in late 2012 after four years, unable to secure further funding amidst fierce competition from well-funded US companies like MongoDB, which had raised $100M by that point. The rise of cloud providers offering managed database services reduced the demand for our specialized solutions, and our open-source business model proved unsustainable. Additionally, the relentless demands of startup life led to burnout, and we struggled with enterprise sales and understanding the venture capital landscape. Reflecting on this experience, I realized that technical excellence alone isn’t enough for business success. Aspiring entrepreneurs should focus on realistic expectations, niche markets, complementary co-founders, and a deep understanding of business dynamics. If I were to start again, I would prioritize iterative development, customer validation, and perhaps adopt a VC-less studio model to maintain control and profitability, avoiding the pitfalls that Scalien encountered.

ScalienDB

ScalienDB was architected as a robust distributed NoSQL database that emphasized strong consistency and reliability through its innovative use of the Paxos consensus algorithm. The system was divided into four core components: controllers, shard servers, a web-based management console, and client libraries. Controllers managed the cluster state and schema, ensuring that all nodes operated with a consistent configuration. Shard servers handled data storage and replication, organizing data into quorums to maintain high availability and fault tolerance. The web management console provided administrators with intuitive tools to oversee and manage the cluster, while the client libraries facilitated seamless interaction between applications and the database. This modular architecture allowed ScalienDB to efficiently handle large-scale online request processing, making it well-suited for cloud-based SaaS applications.

At the heart of ScalienDB's reliability was its sophisticated implementation of the Paxos algorithm, tailored to meet the demands of distributed systems. Unlike traditional Paxos, ScalienDB introduced TotalPaxos for shard servers, requiring all nodes within a quorum to participate in the consensus process. This approach enhanced consistency by ensuring that any changes were unanimously agreed upon, thereby eliminating the risk of data divergence. Additionally, ScalienDB employed PaxosLease for master node election within the controller quorum, enabling swift and deadlock-free leadership transitions. The system also implemented MultiPaxos and commit chaining to optimize performance by reducing the number of disk commits and network roundtrips required for consensus. These enhancements not only improved the efficiency of replication but also fortified the database against various failure scenarios, ensuring that ScalienDB could maintain consistent state across all replicas even in the face of node outages or network partitions.

A standout feature of ScalienDB was its custom-built storage engine, which was meticulously designed to overcome the limitations encountered with existing solutions like BerkeleyDB. The storage engine utilized a chunk-based architecture, where data was partitioned into fixed-size chunks that were individually managed and replicated across shard servers. To accelerate data retrieval, ScalienDB implemented bloom filters, a probabilistic data structure that significantly reduced the need to scan irrelevant chunks during query execution. This approach enhanced read performance by quickly eliminating chunks that did not contain the desired keys. Furthermore, ScalienDB employed an efficient shard splitting and merging mechanism to maintain balanced data distribution and optimize storage utilization. By dynamically adjusting the number of shards based on data size and access patterns, the storage engine ensured that the database could scale horizontally without compromising on performance. These technical innovations in the storage layer, combined with ScalienDB's rigorous replication strategies, underscored its commitment to delivering a high-performance, reliable distributed database solution.

Links

Slides