Posted: 2017-05-20 13:10:28 by Alasdair Keyes
CockroachDB has been floating around for a few years and version 1.0 has just been released ready for production. I had been aware of the system for some time but had never really played with it, so I decided now would be as good a time as any to prod it a little.
This article won't be any kind of how-to because their own documentation (available at https://www.cockroachlabs.com/docs/) is fantastic and if you do look into using CockroachDB, it is by far the best place to start.
My main DB experience is with configuring, maintaining and developing on MySQL (although I've slowly been using Postgresql more on projects due to the advanced features it provides), so these two RDBMSs are my benchmark going forward. (Within the MySQL/Postgresql labels I'm also including the add-ons and enterprise tools such as Percona for MySQL and EnterpriseDB for Postgresql.)
As the name 'CockroachDB' suggests, the system is designed to be hard to kill: it provides a scalable, fault-tolerant, distributed DB solution which will continue running with multiple nodes missing.
For any system nowadays, high availability is not a 'nice to have' feature or a requirement to consider later, but something that requires careful thought and planning from the outset, even for the most basic set-up. For many companies this often just means setting up MySQL Master/Slave (or Master/Master if you're a sadist and into hacky solutions) or Postgresql's streaming replication to "kind-of-sort-of" get some duplication of data and redundancy. Although this does provide some quick wins over a single-node setup, it is not a good solution for a modern platform that needs to minimise downtime and remove the risk of data loss. Postgres has some solutions such as PG-Pool, EDB Failover Manager, PgBouncer etc., but these are still tacked on and, from experience, not something I would want to force my business to rely on.
It's with this experience that I've been waiting for something like CockroachDB. On top of this, it's good to see that 'old-fashioned' relational databases are still getting new blood after the rapid rise of NoSQL systems over the last 10 years.
From having a play about, these are the key things that popped out at me (but I'm sure there are many others).
With clusters such as MySQL's NDB, there are data nodes and SQL nodes. Clients can only run queries via the SQL nodes, and I've always thought this a limitation as you are not utilising your cluster to the full. With CockroachDB, if your node is running, you can connect to it and run SQL queries against the data stored on it. You will need to look at some way of managing connections; the simple way is with HAProxy, and they even provide a way of generating the HAProxy config automatically for you: https://www.cockroachlabs.com/docs/manual-deployment.html#step-5-set-up-haproxy-load-balancers
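As a rough sketch of that HAProxy step, assuming the insecure test cluster used in this post and a node called cdbnode01 (any running node should do), the documented `cockroach gen haproxy` command could be wrapped along these lines. The function names are my own, purely for illustration, and 26257 is CockroachDB's default SQL port.

```shell
#!/bin/sh
# Sketch only: wraps the commands from the CockroachDB deployment docs.
# gen_lb_config and start_lb are hypothetical helper names, not part of CockroachDB.

# Ask any running node for the cluster layout and write haproxy.cfg
# into the current directory.
gen_lb_config() {
    node="$1"    # hostname of any node already in the cluster
    cockroach gen haproxy --insecure --host="$node" --port=26257
}

# Start HAProxy with the generated config; clients then connect to the
# HAProxy host instead of an individual node.
start_lb() {
    haproxy -f haproxy.cfg
}
```

Running `gen_lb_config cdbnode01` followed by `start_lb` on the load balancer host should be enough for testing; a live set-up would use certificates rather than --insecure.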
And when I say easy.... I mean easy. With any real-world use-case you will be tweaking and configuring your system to use many more switches, but in testing, I just started a node and told it which cluster to join and it just joined, synced data and became usable within seconds.
cockroach start --insecure --host=cdbnode02 --join=cdbnode01 --background
And that's it.
The --insecure switch just allows you to run a local cluster without generating TLS CA/client certs etc.; it would not be used in a live environment.
I'm not usually a fan of pretty interfaces for server applications; they often sacrifice the brevity and conciseness of a command line for very little benefit. However, CockroachDB starts a web interface by default when the node starts... and it's fantastic. The interface is clean and easily understandable. You can view DB logs, statistics, cluster information and node details all through one screen. With DB systems, interfaces like this usually require you to install some bulky Java app or pay a fortune for 'Enterprise' tools, but this is neither, and it is invaluable for monitoring the health and performance of your cluster.
Any web developer will have got used to using MySQL/Postgresql/MSSQL/other RDBMS client libraries for their chosen language, and it can take some time for a new DB to get a mature, reliable library. With CockroachDB this is not an issue. The system is designed to be compatible with Postgresql, so you can use the existing libraries for your language to get stuck right in: https://www.cockroachlabs.com/docs/install-client-drivers.html
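To illustrate the Postgresql compatibility, here is a small sketch that builds a standard PostgreSQL connection URL for a CockroachDB node. It assumes an insecure test cluster (hence sslmode=disable) and the default SQL port of 26257; the helper name `crdb_url` and the database name `bank` are made up for the example.

```shell
#!/bin/sh
# Sketch only: build a PostgreSQL-style connection URL for a CockroachDB node.
# crdb_url is a hypothetical helper; "bank" below is a made-up database name.
crdb_url() {
    user="$1"
    host="$2"
    db="$3"
    port="${4:-26257}"    # CockroachDB's default SQL port
    # sslmode=disable matches a cluster started with --insecure.
    printf 'postgresql://%s@%s:%s/%s?sslmode=disable\n' "$user" "$host" "$port" "$db"
}

# Example (needs a running cluster, so it is left commented out):
#   psql "$(crdb_url root cdbnode01 bank)"
```

The same URL form should be accepted by most Postgresql client libraries, which is really the point: nothing CockroachDB-specific is needed on the client side.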
This is also a benefit for users should CockroachDB not succeed. A company can go down the CockroachDB road early on and be secure in the knowledge that even if it doesn't succeed or shuts up shop after 5 years, there is a migration path to Postgresql and their application will require no big re-write. This will really help adoption of Cockroach early on and is a great decision by CockroachLabs.
I can only speak for Linux, but I assume Mac/Windows is the same. Everything is available in one binary, and installation is just a case of downloading this binary and placing it into /usr/local/bin/ (or elsewhere, should you prefer), and that is it. The same binary provides server tools and client tools all in one. If I were to use this in production, I would likely use BASH aliases or similar to split out the client/server functionality, which would be good to have, but a single binary does mean that upgrades are a doddle.
CockroachDB have taken an interesting path with configuration... there are no config files. Everything is configured using command line switches. There are no SysV init/Upstart/systemd scripts shipped with it; setup is controlled by creating a startup script with all your settings and then placing it into a VCS.
One thing to note is that CockroachDB uses $HOME as a base for creating/storing files by default.
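A startup script of the kind described above might look something like the sketch below. All of the values are assumptions for illustration (the hostnames are reused from earlier and the store path is hypothetical); the flags are the ones already used in this post plus --store, which points the data directory somewhere explicit rather than relying on the default location.

```shell
#!/bin/sh
# Sketch only: a startup script that holds every setting in one place.
# All values are assumptions for illustration, not recommended settings.
NODE_HOST="${NODE_HOST:-cdbnode02}"
JOIN_HOST="${JOIN_HOST:-cdbnode01}"
STORE_DIR="${STORE_DIR:-/var/lib/cockroach}"    # hypothetical path

start_node() {
    cockroach start --insecure \
        --host="$NODE_HOST" \
        --join="$JOIN_HOST" \
        --store="$STORE_DIR" \
        --background
}

# Only start the node when asked to, so the file can also be sourced.
if [ "${1:-}" = "start" ]; then
    start_node
fi
```

Keeping the settings as variables at the top means the file committed to the VCS doubles as the documentation of how each node is configured.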
Going back to the simplicity of a single binary: CockroachDB is designed to use a rolling upgrade path (see here for more information: https://www.cockroachlabs.com/docs/upgrade-cockroach-version.html). To upgrade, you just roll out the new binary and restart.
Most MySQL/Postgresql clusters have a slave set aside solely for backups. This is a node that receives updates from the master but accepts no other client connections, so it can be used to back up data without causing locking/load issues. The same applies with CockroachDB: add another node, firewall off client IPs and set up a backup cron: https://www.cockroachlabs.com/docs/back-up-data.html
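As a sketch of what that backup cron could call, assuming `cockroach dump` is the backup method being used and that the cluster is the insecure test one from this post, something like the following would write a timestamped SQL dump. The helper name, host, database and directory are all made up for the example.

```shell
#!/bin/sh
# Sketch only: dump one database to a timestamped SQL file.
# backup_db is a hypothetical helper; host, database and directory names are made up.
backup_db() {
    db="$1"         # database to dump
    node="$2"       # the node set aside for backups
    out_dir="$3"    # where to write the dump
    out_file="$out_dir/$db-$(date +%Y%m%d-%H%M%S).sql"
    # cockroach dump writes SQL statements to stdout.
    cockroach dump "$db" --insecure --host="$node" > "$out_file" && echo "$out_file"
}

# Run as: crdb-backup.sh bank cdbnode03 /var/backups/cockroach
if [ "$#" -eq 3 ]; then
    backup_db "$1" "$2" "$3"
fi
```

A crontab entry such as `0 2 * * * /usr/local/bin/crdb-backup.sh bank cdbnode03 /var/backups/cockroach` (path and schedule are assumptions) would then run it nightly.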
I will continue to play about with Cockroach. I have not had to use it in anger or run any performance benchmarks against it, so I have no idea how it competes with other RDBMSs, but for a first look, it is outstanding.
One thing that I am impressed with is how stable it feels. The availability of tools such as the web interface and the ease of set-up and configuration really make it feel like an extremely mature and safe system. And although it's not at all logical to make decisions on a hunch, feeling safe and confident in software will be nine-tenths of the battle when a team/company is deciding whether it should implement a specific database solution.
I'm extremely excited about the idea of an easy-to-use, reliable HA database system such as CockroachDB. The only worry I have is that in a cloud-driven era where lots of people are already invested in platforms such as MySQL on AWS, it will be hard for CockroachDB to get a foot in the door. It would, however, be fantastic for an in-house system.
Note: Beware that Cockroach does send scrubbed diagnostics information back to CockroachLabs; see here for information on how to stop it: https://www.cockroachlabs.com/docs/diagnostics-reporting.html
Note #2: Reading the FAQ is a must, as there are some use-cases where CockroachDB is not suitable: https://www.cockroachlabs.com/docs/frequently-asked-questions.html
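For the diagnostics reporting mentioned in the first note, a minimal sketch of switching it off on the insecure test cluster is below. It assumes the `diagnostics.reporting.enabled` cluster setting described in the linked documentation, which remains the authority here; the helper name is my own.

```shell
#!/bin/sh
# Sketch only: turn off diagnostics reporting through a cluster setting.
# disable_diagnostics is a hypothetical helper name.
disable_diagnostics() {
    node="$1"    # any running node; the setting applies cluster-wide
    cockroach sql --insecure --host="$node" \
        -e "SET CLUSTER SETTING diagnostics.reporting.enabled = false;"
}
```

Calling `disable_diagnostics cdbnode01` once should be enough, since cluster settings are shared by all nodes.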