Gentle introduction to Bigtable

In May this year, Google announced that the Bigtable database is available for public use through Google Cloud Platform. This means that we mortals can use the same database that Google uses for its own products, such as Gmail, Google Analytics, Google Earth and many others.

It’s praiseworthy that Google published the Bigtable paper back in 2006. It inspired open-source NoSQL databases such as HBase (the closest to Bigtable) and Cassandra.

Of course, it’s inspiring to read the original paper or the documentation on Google Cloud Platform, but if you want to save time, I’ll try to summarize the key points about Bigtable on Google Cloud Platform.

 

Philosophy

Bigtable is a wide-column NoSQL database. This means that, like most NoSQL databases, it doesn’t enforce a rigid schema: you can add columns to a row as you wish.

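Only the row key is indexed, which means you can query by a single key or by a range of keys. Rows are stored sorted by row key, and the table is split into contiguous key ranges that are served by different nodes. If keys are sequential (for example, they start with a timestamp), all new writes land in the same range and hit the same node. That’s why it’s important to design row keys so that they are evenly distributed.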
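To make the sorted-key idea concrete, here is a minimal in-memory sketch in Go. It is not the Bigtable client API: rowStore, put and scan are hypothetical names, and the user#<id>#<date> key layout is an assumed example. It only shows why keeping rows sorted by key turns a range or prefix query into one contiguous read.

```go
package main

import (
	"fmt"
	"sort"
)

// rowStore is a toy, in-memory stand-in for a table whose rows are kept
// sorted by row key. Illustration only; not the Bigtable client API.
type rowStore struct {
	keys []string          // row keys, always in sorted (lexicographic) order
	rows map[string]string // row key -> simplified row contents
}

func newRowStore() *rowStore {
	return &rowStore{rows: make(map[string]string)}
}

// put inserts or overwrites a row while keeping keys sorted.
func (s *rowStore) put(key, value string) {
	if _, exists := s.rows[key]; !exists {
		i := sort.SearchStrings(s.keys, key) // insertion point
		s.keys = append(s.keys, "")
		copy(s.keys[i+1:], s.keys[i:])
		s.keys[i] = key
	}
	s.rows[key] = value
}

// scan returns the row keys in the half-open range [start, end).
// Both bounds are found by binary search, so only matching rows are touched.
func (s *rowStore) scan(start, end string) []string {
	lo := sort.SearchStrings(s.keys, start)
	hi := sort.SearchStrings(s.keys, end)
	if hi < lo {
		return nil
	}
	return append([]string(nil), s.keys[lo:hi]...)
}

func main() {
	s := newRowStore()
	for _, k := range []string{
		"user#42#2015-07-02",
		"user#7#2015-07-01",
		"user#42#2015-07-01",
		"user#99#2015-06-30",
	} {
		s.put(k, "...")
	}
	// Every row for user 42: "$" is the byte right after "#",
	// so this range covers exactly the keys with prefix "user#42#".
	fmt.Println(s.scan("user#42#", "user#42$"))
	// Output: [user#42#2015-07-01 user#42#2015-07-02]
}
```

A prefix query becomes the range from the prefix to the prefix with its last byte incremented. The same sorted layout explains the hotspot problem: if keys started with a timestamp, every new row would be appended at the end of the key space.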

Everything is stored as strings; more precisely, row keys and values are uninterpreted byte strings.

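Every row has column families, and every family has columns; related columns should be grouped into the same family (it’s a bit weird on first reading, but I will try to explain it with an example later). Column families are defined up front as part of the table, while the columns inside a family can be added freely, row by row.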

There is a third dimension in the data model besides rows and columns, and that is the timestamp: a single cell can hold several versions of its value, each with its own timestamp.
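The timestamp dimension can be pictured as a list of versions per cell. This is a simplified sketch, not how Bigtable stores data internally: the cell type, its methods and the plain int64 timestamps are assumptions for illustration. keepNewest imitates, in spirit, a garbage-collection rule that retains only the newest N versions.

```go
package main

import (
	"fmt"
	"sort"
)

// version is one timestamped value of a cell.
type version struct {
	ts    int64 // timestamp, e.g. microseconds since the epoch
	value string
}

// cell holds all versions of one row/column value, newest first.
type cell struct {
	versions []version
}

// put stores a version; writing the same timestamp again overwrites it.
func (c *cell) put(ts int64, value string) {
	for i := range c.versions {
		if c.versions[i].ts == ts {
			c.versions[i].value = value
			return
		}
	}
	c.versions = append(c.versions, version{ts, value})
	sort.Slice(c.versions, func(i, j int) bool { return c.versions[i].ts > c.versions[j].ts })
}

// latest returns the newest value, if any.
func (c *cell) latest() (string, bool) {
	if len(c.versions) == 0 {
		return "", false
	}
	return c.versions[0].value, true
}

// asOf returns the newest value written at or before ts.
func (c *cell) asOf(ts int64) (string, bool) {
	for _, v := range c.versions { // newest first, so the first match wins
		if v.ts <= ts {
			return v.value, true
		}
	}
	return "", false
}

// keepNewest drops all but the n newest versions.
func (c *cell) keepNewest(n int) {
	if len(c.versions) > n {
		c.versions = c.versions[:n]
	}
}

func main() {
	var email cell
	email.put(100, "ana@old.example")
	email.put(300, "ana@new.example")
	email.put(200, "ana@mid.example")

	fmt.Println(email.latest())  // ana@new.example true
	fmt.Println(email.asOf(250)) // ana@mid.example true
	email.keepNewest(2)
	fmt.Println(len(email.versions)) // 2
}
```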

Empty cells don’t take space and data are compressed.

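It’s transactional only at the row level: all changes to a single row are applied atomically, but there are no transactions spanning multiple rows.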
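Row-level transactions mean that a batch of changes to one row is applied all together or not at all, while nothing ties two rows together. The sketch below models that guarantee with a single mutex; Store, Mutation, NewStore, ApplyRow and ReadRow are hypothetical names, and the validation rule (no empty column names) is invented only to show a rejected batch leaving the row untouched.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Mutation is one change to a single column of a row.
// Columns are written as "family:column" strings to keep the sketch short.
type Mutation struct {
	Column string
	Value  string
	Delete bool
}

// Store is a toy table; one mutex serializes all row updates and reads.
type Store struct {
	mu   sync.Mutex
	rows map[string]map[string]string
}

func NewStore() *Store {
	return &Store{rows: map[string]map[string]string{}}
}

// ApplyRow applies every mutation to one row, or none if the batch is invalid.
// Readers take the same lock, so they never see a half-applied batch.
func (s *Store) ApplyRow(key string, muts []Mutation) error {
	for _, m := range muts { // validate the whole batch before touching the row
		if m.Column == "" {
			return errors.New("mutation with empty column name")
		}
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	row := s.rows[key]
	if row == nil {
		row = map[string]string{}
		s.rows[key] = row
	}
	for _, m := range muts {
		if m.Delete {
			delete(row, m.Column)
		} else {
			row[m.Column] = m.Value
		}
	}
	return nil
}

// ReadRow returns a copy of the row taken under the lock.
func (s *Store) ReadRow(key string) map[string]string {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := map[string]string{}
	for k, v := range s.rows[key] {
		out[k] = v
	}
	return out
}

func main() {
	s := NewStore()
	_ = s.ApplyRow("account#1", []Mutation{
		{Column: "balance:amount", Value: "100"},
		{Column: "balance:currency", Value: "EUR"},
	})
	// The second mutation is invalid, so the first one is not applied either.
	err := s.ApplyRow("account#1", []Mutation{
		{Column: "balance:amount", Value: "0"},
		{Column: ""},
	})
	fmt.Println(err)                                      // mutation with empty column name
	fmt.Println(s.ReadRow("account#1")["balance:amount"]) // 100
}
```

Moving a value from one row to another would take two ApplyRow calls, each atomic on its own; if the process stops between them, the two rows disagree. That is the practical consequence of atomicity ending at the row boundary.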

So far there are two client APIs that can be used: HBase (Java) and Go. I think there are wrappers around the HBase client in other languages, so everybody should be able to pick what suits them.

Of course, there are recommendations on how to model data, what a good way to construct a row key is, etc. It’s all in the documentation.

Setup

I was surprised to see how easy it is to set up a Bigtable cluster:

Under your project in the Developers Console, go to Storage > Bigtable and then click Create cluster.

Then you type a name and a cluster ID, and select a zone and the number of nodes. Performance is, of course, proportional to the number of nodes: a cluster of 3 nodes should handle up to 30,000 QPS or 30 MB/s and costs $1.95 per hour, plus $0.17 per GB per month for storage. The cluster is set up in a matter of seconds, and there you have it. Oh yes, the footer notes that it can store up to 200 PB of data, so you don’t have to worry about running out of space on your hard drive(s). In fact, the documentation says it’s not suitable for less than 1 TB of data. The setup reminds me of Google App Engine Datastore, where there is basically no setup; you just write code. Resizing the cluster is a matter of typing a new number of nodes (throughput scales linearly with the number of nodes, i.e. one node handles ~10,000 QPS).

There are different ways to play with and explore Bigtable. I’ve tried shell access and a simple Go server connected to a Bigtable cluster. I admit I wanted to create my own example, but in this hot weather it’s beyond my focus. Hopefully a simple example will come to mind, and in the next article I will build it step by step.

 