As I wrote in a previous article, Google Cloud Storage has interesting features which aren't always immediately obvious. Today I want to continue this topic with a focus on the gsutil command.
gsutil is a command-line tool for working with Google Cloud Storage. It's part of the Google Cloud SDK, but it can also be installed as a standalone program. The code is open source on GitHub. If you look at the gsutil commands, you'll see that they follow the Unix convention of commands like ls, mv, cp etc. The beauty of gsutil is that in some cases it makes working with GCS easier than, for example, the Cloud Console (web UI), and in some cases it offers unique commands. Those are the ones I want to focus on in this article.
The rsync command syncs a local folder with a bucket:
gsutil rsync -d -r <local folder> gs://<bucket name>
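-r uploads folders recursively, and -d deletes objects in the bucket that no longer exist in the local folder, so that the bucket mirrors the folder exactly. Use the -d flag with caution, because it can delete data you wanted to keep. Running the command with -n first (dry run) shows what would be copied or deleted without changing anything.
If you schedule this with cron on your local computer, you get a simple automated backup.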
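Here is a minimal sketch of how such a backup could look, assuming a POSIX shell. The function name backup_folder, the folder, the bucket name and the script path in the cron line are all made up for this example. By default it does not pass -d, so nothing gets deleted in the bucket unless you ask for it.

```shell
#!/bin/sh
# Sketch of a backup helper around "gsutil rsync".
# backup_folder, the folder path and the bucket name are placeholders
# chosen for this example; they are not part of gsutil.

# usage: backup_folder <local folder> <bucket name> [extra rsync flags...]
backup_folder() {
    src=$1
    bucket=$2
    shift 2
    # -m runs transfers in parallel; -r recurses into subfolders.
    # Extra flags such as -n (dry run) or -d (mirror deletions) are
    # passed through unchanged.
    gsutil -m rsync -r "$@" "$src" "gs://$bucket"
}

# Preview what a mirroring sync would copy or delete, changing nothing:
#   backup_folder "$HOME/documents" my-backup-bucket -n -d
#
# Example crontab entry (added with "crontab -e") that runs a script
# calling backup_folder every night at 02:00; the paths are hypothetical:
#   0 2 * * * /home/me/bin/gcs-backup.sh >> /home/me/gcs-backup.log 2>&1
```

Cron usually runs with a minimal PATH, so the script may need the full path to gsutil.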
The rewrite command is used for bulk operations on objects in a bucket, for example changing encryption keys or the storage class. When you change the default storage class of a bucket, all new objects are stored under the new storage class, but existing objects keep their old one. For example, this command moves all objects in a bucket to the Coldline storage class:
gsutil rewrite -s coldline gs://<bucket name>/**
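To cover both new and existing objects, the two steps can be combined. This is only a sketch: move_to_class is a name I made up, and it assumes you really want every object in the bucket rewritten.

```shell
#!/bin/sh
# Sketch: switch a whole bucket to one storage class.
# move_to_class is a hypothetical helper name, not a gsutil command.

# usage: move_to_class <storage class> <bucket name>
move_to_class() {
    class=$1
    bucket=$2
    # New objects: change the bucket's default storage class.
    gsutil defstorageclass set "$class" "gs://$bucket"
    # Existing objects: rewrite them into the same class.
    # -m parallelises the work; ** matches objects at any depth.
    # The URL is quoted so that the shell leaves ** for gsutil to expand.
    gsutil -m rewrite -s "$class" "gs://$bucket/**"
}

# Example call (bucket name is a placeholder):
#   move_to_class coldline my-archive-bucket
```

Afterwards `gsutil ls -L gs://<bucket name>/<object>` shows the storage class of an individual object, which is a quick way to check the result.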
Logging allows you to record access, i.e. every request to your buckets and objects, and also daily storage consumption. Log files are written in CSV format to a bucket you specify: usage logs hourly when there is activity to report, storage logs once a day. Usage logs include various info like timestamp, IP address, HTTP method, request URL, time taken, user agent and GCS operation. To enable logging, you turn it on for the buckets you want to monitor and name the bucket where the log files will be stored. You can also define a prefix for the log file names to make them easier to work with.
gsutil logging set on -b gs://<bucket name where log files will be created> -o <log file prefix> gs://<monitored bucket name 1> gs://<monitored bucket name 2 etc>
Cloud Storage also needs write access to the bucket where the log data will be stored:
gsutil acl ch -g cloud-storage-analytics@google.com:W gs://<bucket name where log files will be created>
Afterwards you can copy the log files locally and load them into BigQuery for further analysis.
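For a quick look at usage logs copied to your computer you don't even need BigQuery. The sketch below counts requests per HTTP method with awk. It assumes the log files start with a header row containing a cs_method column, as in the documented usage log format, and that the columns before it contain no commas. count_methods is a hypothetical helper name, and the CSV parsing is deliberately naive.

```shell
#!/bin/sh
# Sketch: count requests per HTTP method in local copies of usage logs.
# Assumes each file starts with a header row that names a "cs_method"
# column; count_methods is a made-up name for this example.

# usage: count_methods <usage log file>...
# Prints "<count> <HTTP method>" for each method found in the logs.
count_methods() {
    awk -F',' '
        # Header row of each file: find the column named cs_method.
        FNR == 1 {
            col = 0
            for (i = 1; i <= NF; i++) {
                name = $i
                gsub(/"/, "", name)
                if (name == "cs_method") col = i
            }
            next
        }
        # Data rows: strip the quotes and count the method.
        col > 0 {
            method = $col
            gsub(/"/, "", method)
            count[method]++
        }
        END {
            for (m in count) print count[m], m
        }
    ' "$@" | sort -k2
}

# Example, after copying the logs with something like
#   gsutil cp "gs://<bucket name where log files will be created>/<log file prefix>_usage_*" ./logs/
# run:
#   count_methods ./logs/*
```

Splitting on commas would break for columns that can contain commas themselves, such as the user agent, so for anything more serious a real CSV parser or BigQuery is the better choice.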
Like the Unix command of the same name, cat writes the contents of an object to the console (stdout). With the -r flag it's possible to limit the output to a range of bytes.
This prints out the whole object:
gsutil cat gs://<bucket name>/object
whereas this prints a header (the object path) and the first 1000 bytes of each object matching the prefix (byte ranges start at 0 and include the end position):
gsutil cat -h -r 0-999 gs://<bucket-name>/<object-prefix>*
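Since the range syntax is easy to get wrong by one byte, here is a small sketch of two wrappers that mimic head and tail for bytes. The names head_bytes and tail_bytes are my own. The range forms they use ("start-end" inclusive, and "-n" for the last n bytes) are the ones described in the gsutil cat help.

```shell
#!/bin/sh
# Sketch: byte-range helpers around "gsutil cat -r".
# head_bytes and tail_bytes are hypothetical names for this example.

# usage: head_bytes <object url> <n>   (prints the first n bytes)
head_bytes() {
    # Ranges are inclusive and start at 0, so the first n bytes are 0 .. n-1.
    gsutil cat -r "0-$(( $2 - 1 ))" "$1"
}

# usage: tail_bytes <object url> <n>   (prints the last n bytes)
tail_bytes() {
    # A range of the form "-n" counts from the end of the object.
    gsutil cat -r "-$2" "$1"
}

# Examples (bucket and object names are placeholders):
#   head_bytes gs://my-bucket/big.csv 1000
#   tail_bytes gs://my-bucket/app.log 500
```

This is handy for peeking at the header of a large CSV or the end of a log file without downloading the whole object.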
The compose command concatenates up to 32 objects in GCS into one. If you have multiple objects in Storage, this is an ideal way to put them together. You list the paths of the source objects (they all have to be in the same bucket) and, as the last argument, the path of the resulting object:
gsutil compose gs://<bucket name>/object1 gs://<bucket name>/object2 gs://<bucket name>/object3 gs://<bucket name>/final_object
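If the list of objects is generated by a script, it's easy to go over the 32-object limit without noticing. The following sketch checks the count before calling gsutil. compose_checked is a hypothetical wrapper of mine, not something gsutil provides.

```shell
#!/bin/sh
# Sketch: refuse to call "gsutil compose" with more than 32 sources.
# compose_checked is a made-up wrapper name for this example.

# usage: compose_checked <source url>... <destination url>
compose_checked() {
    if [ "$#" -lt 2 ]; then
        echo "need at least one source and a destination" >&2
        return 2
    fi
    # The last argument is the destination; everything before it is a source.
    sources=$(( $# - 1 ))
    if [ "$sources" -gt 32 ]; then
        echo "compose accepts at most 32 sources, got $sources" >&2
        return 1
    fi
    gsutil compose "$@"
}

# Example (object names are placeholders):
#   compose_checked gs://my-bucket/part1 gs://my-bucket/part2 gs://my-bucket/all
```

Larger sets can still be put together in stages, because an object created by compose can itself be used as a source in the next compose call.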
That's it. gsutil of course has many more commands which make working with Google Cloud Storage easier and simpler.