Sharing local Datastore between GAE and client library

From the beginning (2008), Google App Engine (GAE) had an integrated Datastore API that could also be used for local development. A few years later, Datastore became a separate product on Google Cloud Platform that can be accessed either through HTTP requests or through client libraries for the most popular languages. To use Datastore locally for development, the Datastore emulator was developed as part of Google Cloud SDK. Until recently, it wasn't possible to access data in GAE's local database from outside the GAE application, although in the cloud it's the same database.

There are cases when you want to share data between GAE Standard and some other application in your project, for example GAE Flexible or a Google Compute Engine instance, and it would be very convenient if that were possible for local development as well.

It just so happens that I'm working on a similar project where I need to share data between a GAE Standard application and a web app on GCE, and I accidentally discovered that since May 2018 it's possible to share the local Datastore between a GAE app and the rest of the local world.

Here is the official documentation which explains how it works: https://cloud.google.com/appengine/docs/standard/python/tools/migrate-cloud-datastore-emulator Since it lacks some explanation and steps (the functionality is currently in beta) and it took me some time to figure out, I decided to create a simple application and explain the usage.

Basically, dev_appserver, which is used to run GAE locally, can be forced to use the Datastore emulator (which, by the looks of it, will happen by default in the future), and it can even migrate data from the GAE local datastore to the emulator datastore. Somewhere during the process, though, I lost the local data in my existing GAE application while I was figuring out how to run this; I'm not sure what caused that :)

The complete code is here: https://github.com/zdenulo/local-datastore-gae-rest

I'm using Python 2.7 in the example, since it works with both GAE Standard and the Cloud Datastore Python library.

In order to use this, a few things need to be done:

The first is to download and set up Google Cloud SDK if you haven't already; I wrote instructions on how to do it here: https://www.the-swamp.info/blog/configuring-gcloud-multiple-projects/

The second is to install the Datastore emulator (if you don't have it already). You need to have a Java JRE (the Datastore emulator runs on Java) and execute the command:

gcloud components install cloud-datastore-emulator

The third is to install the GAE SDK:

gcloud components install app-engine-python

You need to set the application variable in the app.yaml file so that it corresponds to the project you have set in Cloud SDK.

To see which project id is set in Cloud SDK, execute:

gcloud config get-value project

To set the project id:

gcloud config set project <project_id>
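
The requirement above (the application value in app.yaml matching the project set in Cloud SDK) can be checked with a few lines of Python. This is only a sketch: the helper names read_application_id and ids_match are made up for illustration, and the parser only understands a plain top-level "application: <id>" line, not full YAML.

```python
import re


def read_application_id(app_yaml_text):
    """Return the value of a top-level `application:` key, or None.

    Hypothetical helper: it only handles a simple
    `application: my-project` line, not arbitrary YAML.
    """
    match = re.search(r'^application:[ \t]*([^\s#]+)',
                      app_yaml_text, re.MULTILINE)
    if match is None:
        return None
    # Drop optional quotes around the value.
    return match.group(1).strip('"\'')


def ids_match(app_yaml_text, sdk_project_id):
    """Compare app.yaml's application with the project id from Cloud SDK."""
    return read_application_id(app_yaml_text) == sdk_project_id.strip()
```

You could feed it the contents of app.yaml together with the output of "gcloud config get-value project"; the strip() call is there because that output ends with a newline.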

 

Next, you need to install the Datastore client library for Python (either in a virtualenv or as superuser):

pip install google-cloud-datastore

 

You start the GAE application with the following command (if you are in the project's folder):

dev_appserver.py --support_datastore_emulator=true --dev_appserver_log_level=debug --datastore_emulator_port=8081 --datastore_emulator_cmd="<GOOGLE_CLOUD_SDK_FOLDER>/platform/cloud-datastore-emulator/cloud_datastore_emulator" .

- support_datastore_emulator - forces dev_appserver to use the Datastore emulator instead of the built-in GAE datastore simulator.

- datastore_emulator_port - explicitly declares the port the Datastore emulator uses. This is convenient because otherwise the port changes after every restart; this way you don't have to change it elsewhere.

- datastore_emulator_cmd - this isn't actually documented at the moment, but it is the path to the Datastore emulator's startup file. I found it in the Google Cloud SDK folder under /platform/cloud-datastore-emulator/cloud_datastore_emulator
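
If you start the server often, the long command with the three flags above can be assembled in a small script. A minimal sketch, assuming the emulator launcher sits at the path I found inside the SDK folder; the function name build_dev_appserver_cmd is my own, and the SDK folder you pass in is whatever applies on your machine.

```python
import os


def build_dev_appserver_cmd(sdk_folder, app_dir='.', emulator_port=8081):
    """Build the dev_appserver.py argument list used in this post.

    sdk_folder is the Google Cloud SDK installation folder
    (an assumption: adjust it to your own setup).
    """
    emulator_cmd = os.path.join(
        sdk_folder, 'platform', 'cloud-datastore-emulator',
        'cloud_datastore_emulator')
    return [
        'dev_appserver.py',
        '--support_datastore_emulator=true',
        '--dev_appserver_log_level=debug',
        '--datastore_emulator_port={}'.format(emulator_port),
        '--datastore_emulator_cmd={}'.format(emulator_cmd),
        app_dir,
    ]
```

The returned list can be handed to subprocess.call() as is. Keeping the port as a parameter makes it easy to use the same value for DATASTORE_EMULATOR_HOST in other scripts.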

That's it!

Check the debug output in the console for any errors.

Here is a code overview, nothing special I guess.

In main.py I've created a simple db model, Book, with only one field: name. I've also created request handlers through which I fetch and create objects.

import time

import webapp2
from google.appengine.ext import ndb


class Book(ndb.Model):
    name = ndb.StringProperty()


class CreateHandler(webapp2.RequestHandler):
    def get(self):
        name = self.request.get('name', 'gae_{}'.format(time.time()))
        book_key = Book(name=name).put()
        self.response.write(book_key)


class ListHandler(webapp2.RequestHandler):
    def get(self):
        books = Book.query().fetch()
        self.response.write(books)


app = webapp2.WSGIApplication([
    ('/', ListHandler),
    ('/create', CreateHandler),

], debug=True)

 

In the file local_datastore.py I created similar functions. I'm also setting the Datastore emulator host and port as an environment variable.

import os

# this can be set also in shell as
# export DATASTORE_EMULATOR_HOST=localhost:8081

os.environ['DATASTORE_EMULATOR_HOST'] = 'localhost:8081'


from google.cloud import datastore

client = datastore.Client()


def list_books():
    """Fetch all Book entities in Datastore and return as list"""
    res = client.query(kind='Book').fetch()
    return list(res)


def create_book(name):
    """Create simple entity and store in Datastore"""
    key = client.key('Book')
    key = client.allocate_ids(key, 1)

    entity = datastore.Entity(key=key[0])
    entity.update({'name': name})
    client.put(entity)


if __name__ == '__main__':
    create_book('local')
    books = list_books()
    for b in books:
        print(b)  # parenthesized form works in Python 2.7 and 3

 

So when I execute local_datastore.py (which creates one entity and then fetches all of them) I get:

<Entity(u'Book', 5770237022568448L) {u'name': 'local'}>

and when I hit the url http://localhost:8080/create, I get

Key('Book', 10)

and when I list http://localhost:8080/ I get

Book(key=Key('Book', 10), name=u'gae_1525870175.3')

Book(key=Key('Book', 5770237022568448), name=u'local')

and when I run just list_books() in the script, I get

<Entity(u'Book', 10L) {u'name': u'gae_1525870175.3'}>
<Entity(u'Book', 5770237022568448L) {u'name': 'local'}>

This works as expected: I see objects from both applications. An interesting thing is that GAE uses incrementing ids whereas the emulator uses random ones; I'm not sure why.
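
Because local_datastore.py talks to whatever is listening on DATASTORE_EMULATOR_HOST, it can help to check that something is actually there before creating the client. A rough sketch using only the standard library; parse_emulator_host and emulator_is_reachable are hypothetical helper names, and a successful TCP connection only shows that some process is listening on the port, not that it is the Datastore emulator.

```python
import socket


def parse_emulator_host(value):
    """Split 'host:port' (as in DATASTORE_EMULATOR_HOST) into (host, port)."""
    host, _, port = value.rpartition(':')
    if not host or not port.isdigit():
        raise ValueError('expected host:port, got {!r}'.format(value))
    return host, int(port)


def emulator_is_reachable(value, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    host, port = parse_emulator_host(value)
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        return False
    conn.close()
    return True
```

In the script this could be called as emulator_is_reachable(os.environ['DATASTORE_EMULATOR_HOST']) right after the environment variable is set, so a missing emulator shows up as a clear message instead of a failing Datastore call.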

 

There are still a few issues and things I didn't figure out:

Per the documentation I mentioned at the beginning, it should be possible to connect GAE to an already running Datastore emulator (instead of having dev_appserver start the emulator), but for some reason it didn't work for me.

In app.yaml, application needs to be set so that dev_appserver knows the project id and can correctly create the local datastore file. I'm not sure how this works with the "gcloud app deploy" command, which doesn't work when application is set.

I'm not sure how it is with the other languages supported in GAE Standard. For Java there is no explicit mention, but it's mentioned that it runs the Datastore emulator for local development; Go is in the same situation as Python (currently beta).

I am aware that there are probably other ways to configure and tweak this, but this one worked for me.

 
