Google Task Queues JSON API Transactions - google-app-engine

Is it possible to put a record to Google Cloud Datastore and a task to the TaskQueue atomically outside of App Engine using their REST APIs?
Basically, I want to achieve the following, but from Container Engine and without using the high-level App Engine-specific APIs:
DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
Queue queue = QueueFactory.getDefaultQueue();
try {
    Transaction txn = ds.beginTransaction();
    // ...
    queue.add(TaskOptions.Builder.withUrl("/path/to/my/worker"));
    // ...
    txn.commit();
} catch (DatastoreFailureException e) {
    // handle the failed commit
}

This is not currently possible using the Cloud Datastore API.

Cloud Endpoints and App Engine

I've just started on Google Cloud and I'm creating an iOS app that interacts with Google Cloud services via a mobile backend. I'm using Python to write the backend for App Engine. I've gone through the tutorials on creating an API based on Endpoints, but I have a question.
Do I have to create a Cloud Endpoints API and then a separate app on App Engine? Basically, I want to be able to register accounts in my iOS app and call an API that then uses Google Datastore to store the account details. From looking at the tutorials (both the Cloud Endpoints one and the guestbook one), am I meant to expose Google Datastore, Cloud Storage etc. within the Endpoints API? Or does that link into another app where all of that is done?
Sorry if this sounds a bit silly, but I just want to make sure!
Thanks in advance.
In a nutshell, your Cloud Endpoints API is your application. Some of the documentation regarding Cloud Endpoints can be a bit confusing (or vague), but on the server side it's essentially a bunch of Python decorators or Java annotations that allow you to expose your application logic as a REST API.
I find the Java implementation of Cloud Endpoints more intuitive than the Python one, which requires a bit more work to (de-)serialise your objects. You could look at endpoints_proto_datastore.ndb.EndpointsModel which might take some of the boilerplate stuff out of the equation (defining messages).
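For comparison, a minimal sketch of what the same kind of endpoint might look like with the Java annotations (the class, method and message names below are illustrative, not taken from an existing project):
import com.google.api.server.spi.config.Api;
import com.google.api.server.spi.config.ApiMethod;
import com.google.api.server.spi.config.ApiMethod.HttpMethod;
import com.google.api.server.spi.config.Named;

@Api(name = "cafeApi", version = "v1", description = "Cafe API")
public class CafeApi {

    // Illustrative response POJO; plain Java beans take the place of protorpc messages.
    public static class CafeListResponse {
        public java.util.List<String> cafes = new java.util.ArrayList<>();
    }

    @ApiMethod(name = "cafes.nearby", path = "cafes/nearby", httpMethod = HttpMethod.GET)
    public CafeListResponse getNearbyCafes(@Named("lat") double lat, @Named("lon") double lon) {
        // Look up cafes near (lat, lon) and map them onto the response POJO.
        return new CafeListResponse();
    }
}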
Essentially, when you write your API, each endpoint maps to a Python function. Inside that function you can do what you like, but typically it will be either:
Deserialise your POSTed JSON, validate it, and write some entities to Datastore (or Cloud SQL, Bigtable, wherever).
Read one or more entities from Datastore, serialise them to JSON, and return them to the client.
For example, you might define your API (the whole collection of endpoint functions) as
@endpoints.api(name='cafeApi', version='v1', description='Cafe API', audiences=[endpoints.API_EXPLORER_CLIENT_ID])
class CafeApi(remote.Service):
    # endpoints here
For example, you might have an endpoint to get nearby cafes:
@endpoints.method(GEO_RESOURCE, CafeListResponse, path='cafes/nearby', http_method='GET', name='cafes.nearby')
def get_nearby_cafes(self, request):
    """Get cafes close to the specified lat,lon"""
    cafes = list()
    for c in search.get_nearby_cafes(request.lat, request.lon):
        cafes.append(c.response_message())
    return CafeListResponse(cafes=cafes)
A couple of things to highlight here. With the Python Endpoints implementation, you need to define your resource and message classes - these are used to encapsulate request data and response bodies.
So, in the above example, GEO_RESOURCE encapsulates the fields required to make a GeoPoint (so we can search by location using Search API, but you might just search Datastore for Cafes with a 5-star rating):
GEO_RESOURCE = endpoints.ResourceContainer(
    message_types.VoidMessage,
    lat=messages.FloatField(1, required=True),
    lon=messages.FloatField(2, required=True)
)
and the CafeListResponse would just encapsulate a list of CafeResponse objects (with Cloud Endpoints you return a single object):
class CafeListResponse(messages.Message):
    cafes = messages.MessageField(CafeResponse, 1, repeated=True)
where the CafeResponse is the message that defines how you want your objects (typically Datastore entities) serialised by your API. e.g.,
class CafeResponse(messages.Message):
    id = messages.StringField(1, required=False)
    coordinates = messages.MessageField(GeoMessage, 3, required=True)
    name = messages.StringField(4, required=False)
With that endpoint signature, you can access it via an HTTP GET at /cafeApi/v1/cafes/nearby?lat=...&lon=... or via, say, the JavaScript API client with cafeApi.cafes.nearby(...).
Personally, I found Flask a bit more flexible for building a REST API in Python.

memcache and a custom DoFn with dataflow

I'm trying to use Google memcache with Dataflow. Essentially, I'd like to write transformed data into memcache. Is it possible to use the Google memcache API inside of Dataflow?
I get the following error:
java.util.concurrent.ExecutionException: com.google.apphosting.api.ApiProxy$CallNotFoundException: The API package 'memcache' or call 'Set()' was not found. com.google.appengine.api.utils.FutureWrapper.setExceptionResult(FutureWrapper.java:65)
This is the line of code:
AsyncMemcacheService asyncCache = MemcacheServiceFactory.getAsyncMemcacheService("namespace");
asyncCache.put("key", "value", Expiration.byDeltaSeconds(100000)).get();
I think memcache is part of App Engine and not directly accessible from outside it, so you won't be able to access it from Dataflow directly. What you could do is create an App Engine service that acts as a proxy and send requests to it from Dataflow.
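For example, a rough sketch of a Dataflow DoFn that sends each element to a hypothetical proxy servlet running on App Engine (the /memcache/put URL and its parameters are assumptions, and this uses the Apache Beam-style DoFn API):
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

public class MemcacheProxyFn extends DoFn<KV<String, String>, Void> {
    @ProcessElement
    public void processElement(ProcessContext c) throws Exception {
        // POST the key/value pair to a servlet that performs the memcache put on App Engine's side.
        String body = "key=" + URLEncoder.encode(c.element().getKey(), "UTF-8")
                + "&value=" + URLEncoder.encode(c.element().getValue(), "UTF-8");
        HttpURLConnection conn = (HttpURLConnection)
                new URL("https://my-app.appspot.com/memcache/put").openConnection(); // assumed proxy URL
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes("UTF-8"));
        }
        if (conn.getResponseCode() != 200) {
            throw new RuntimeException("memcache proxy returned " + conn.getResponseCode());
        }
    }
}
The servlet behind /memcache/put would run inside App Engine, where MemcacheServiceFactory is available, and simply perform the put on behalf of the pipeline.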

Google Cloud Endpoint performance

What is the performance of Google Cloud Endpoints? In my case a large blob is being transferred, anywhere from 1 MB to 8 MB in size. It seems to take about 5 to 10 minutes, with my broadband upload speed being about 1 Mbps.
Note this is being done from a Java client calling an endpoint. The object being transferred looks like this:
public class Item
{
    String type;
    byte[] data;
}
On the Java client side, the code looks like this:
Item item = new Item( type, s );
MyItem.Builder builder = new MyItem.Builder( new NetHttpTransport(), new GsonFactory(), null );
service = builder.build();
PutItem putItem = service.putItem( item );
putItem.execute();
Why does it take so long to send one of these up to an endpoint? Is it the JSON parsing that is slowing it down? Any ideas on how to speed this up?
Endpoints is just a wrapper around HTTP requests made to Java servlets (you mention Java on the client, so I'll assume Java on the server).
This will introduce a very small, fixed overhead, but the transfer speed of a large blob should be no different whether you are using Endpoints or not.
As noted by Gilberto, you should probably consider using Google Cloud Storage (the GCS API is slowly replacing the Blobstore API). You can use it to upload directly to storage and remove the burden on your GAE app.
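As a rough sketch of that approach, the client could write the blob straight to a GCS bucket with the com.google.cloud.storage client library; the bucket and object names below are placeholders:
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

public class BlobUploader {
    // Uploads the raw bytes directly to a GCS object instead of pushing them through an Endpoints method.
    public static void upload(String objectName, String contentType, byte[] data) {
        Storage storage = StorageOptions.getDefaultInstance().getService();
        BlobInfo blobInfo = BlobInfo.newBuilder(BlobId.of("my-bucket", objectName))
                .setContentType(contentType)
                .build();
        storage.create(blobInfo, data); // single-request upload, fine for blobs of a few MB
    }
}
The Endpoints method can then just receive the object name (or a GCS URL) instead of the bytes themselves.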

Remote API, Objectify and the DevServer don't like transactions?

I am using Objectify 4 to write to the HRD datastore. Everything works fine in unit tests and when running the application in the devserver or in production.
But when I try to connect to the devserver datastore using the Remote API, an error is thrown when the code starts an XG (cross-group) transaction. When connecting with the Remote API, it seems to think that HRD is not enabled.
This is how I connect ...
public static void main(String[] args) {
    RemoteApiOptions options = new RemoteApiOptions().server("localhost", 8888).credentials("foo", "bar");
    //options = options.
    RemoteApiInstaller installer = new RemoteApiInstaller();
    StoredUser storedUser = null;
    try {
        installer.install(options);
        ObjectifyInitializer.register();
        storedUser = new StoredUserDao().loadStoredUser(<KEY>);
        log.info("found user : " + storedUser.getEmail());
        // !!! ERROR !!!
        new SomeOtherDao().doSomeDataManipulationInTransaction();
    } catch (Throwable e) {
        e.printStackTrace();
    } finally {
        ObjectifyFilter.complete();
        installer.uninstall();
    }
}
When new SomeOtherDao().doSomeDataManipulationInTransaction() starts a transaction on multiple entity groups, I get this error:
transactions on multiple entity groups only allowed in High Replication applications
How can I tell the Remote API that this is an HRD environment?
If your application is using the High Replication Datastore, add an explicit s~ prefix (or e~ prefix if your application is located in the European Union) to the app id.
For the Java version, add this prefix in the application tag in appengine-web.xml, then deploy the version where you have enabled the remote_api servlet.
Example
<application>myappid</application>
becomes
<application>s~myappid</application>
Source: https://developers.google.com/appengine/docs/python/tools/uploadingdata#Python_Setting_up_remote_api
I had the 'unapplied job percentage' set to 0, and transactions via the Remote API failed as if the devserver were running Master/Slave rather than HRD. Raising the 'unapplied job percentage' above zero fixed the problem.
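If you start the dev server yourself, that percentage is typically raised with a JVM property on the Java dev appserver; assuming the standard dev_appserver.sh launcher, something like this (20 is just an example value):
dev_appserver.sh --jvm_flag=-Ddatastore.default_high_rep_job_policy_unapplied_job_pct=20 path/to/war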

Load data from Google Cloud Storage to BigQuery using Java

I want to upload data from Google Cloud Storage to BigQuery, but I can't find any Java sample code describing how to do this. Could someone give me a hint on how to do this?
What I actually want to do is transfer data from Google App Engine tables to BigQuery (and sync on a daily basis) so that I can do some analysis. I use the Google Cloud Storage service in App Engine to write (new) records to files in Cloud Storage, and the only missing part is appending the data to tables in BigQuery (or creating a new table on first write). Admittedly I can manually upload/append the data using the BigQuery browser tool, but I would like this to be automatic, otherwise I'd have to do it manually every day.
I don't know of any Java samples for loading tables from Google Cloud Storage into BigQuery. That said, if you follow the instructions for running query jobs here, you can run a load job instead with the following:
Job job = new Job();
JobConfiguration config = new JobConfiguration();
JobConfigurationLoad loadConfig = new JobConfigurationLoad();
config.setLoad(loadConfig);
job.setConfiguration(config);
// Set where you are importing from (i.e. the Google Cloud Storage paths).
List<String> sources = new ArrayList<String>();
sources.add("gs://bucket/csv_to_load.csv");
loadConfig.setSourceUris(sources);
// Describe the resulting table you are importing to:
TableReference tableRef = new TableReference();
tableRef.setDatasetId("myDataset");
tableRef.setTableId("myTable");
tableRef.setProjectId(projectId);
loadConfig.setDestinationTable(tableRef);
List<TableFieldSchema> fields = new ArrayList<TableFieldSchema>();
TableFieldSchema fieldFoo = new TableFieldSchema();
fieldFoo.setName("foo");
fieldFoo.setType("string");
TableFieldSchema fieldBar = new TableFieldSchema();
fieldBar.setName("bar");
fieldBar.setType("integer");
fields.add(fieldFoo);
fields.add(fieldBar);
TableSchema schema = new TableSchema();
schema.setFields(fields);
loadConfig.setSchema(schema);
// Also set custom delimiter or header rows to skip here....
// [not shown].
Insert insert = bigquery.jobs().insert(projectId, job);
insert.setProjectId(projectId);
JobReference jobRef = insert.execute().getJobReference();
// ... see rest of codelab for waiting for job to complete.
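As a rough sketch of that last step (waiting for the job to complete), you could poll the inserted job with the same client; the one-second sleep interval is arbitrary:
while (true) {
    Job polled = bigquery.jobs().get(projectId, jobRef.getJobId()).execute();
    if ("DONE".equals(polled.getStatus().getState())) {
        if (polled.getStatus().getErrorResult() != null) {
            throw new RuntimeException("Load failed: " + polled.getStatus().getErrorResult().getMessage());
        }
        break;
    }
    Thread.sleep(1000);
}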
For more information on the load configuration object, see the JobConfigurationLoad javadoc.
