Bulk get using objectify? - google-app-engine

I am using the Objectify Loader's ids method to batch-load entities by id from the Google Datastore. But from the Google Cloud Trace logs, I can see that these requests are batched into groups of 10 ids.
Is there any way to increase this size?
However, if I change the consistency level in Objectify to Eventual and then do the batch get, I see only one get call in the trace logs.
Can someone explain this behavior and how to increase the batch get limit?
EDIT: Here is the query I am using:
ofy().load().type(ClassName.class).ids(list of ids)
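For context, here is a minimal sketch of both variants (assuming Objectify 5.x on the App Engine datastore API; ClassName and the list of ids are the placeholders from above):
import static com.googlecode.objectify.ObjectifyService.ofy;

import com.google.appengine.api.datastore.ReadPolicy.Consistency;
import java.util.List;
import java.util.Map;

public class BatchLoadExample {

    // Strongly consistent batch get: in Cloud Trace this may show up as
    // several underlying datastore calls (the batches of 10 ids described above).
    public Map<Long, ClassName> loadStrong(List<Long> ids) {
        return ofy().load().type(ClassName.class).ids(ids);
    }

    // Eventually consistent batch get: this is the variant that appears
    // as a single get call in the trace logs.
    public Map<Long, ClassName> loadEventual(List<Long> ids) {
        return ofy()
                .consistency(Consistency.EVENTUAL)
                .load()
                .type(ClassName.class)
                .ids(ids);
    }
}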

Related

How do I improve my GCP datastore query performance?

I have data in Google Datastore. The thing I'm developing makes two HTTP requests to my node.js backend. They both take longer than I'd like. I'm only focusing on trying to improve the response time for one of them (the heaviest request). I modified the node.js code on the backend to time the duration of the GCP Datastore query. I've compiled data for 40 calls to the database; I can compile more if you'd like. I put all of the data into an Excel spreadsheet, but I've also created images of the information held within to include in this Stack Overflow post; it is included further below.
The size of the payload from the database is only 1.5 KB, according to the Chrome debugger.
The datastore calls were made once every 3 minutes.
Here is the function that makes the datastore call:
async function listReviews(brand, sku, res) {
  const kind = 'ScoredReviewFragment'
  // brand doubles as the datastore namespace here
  const query = datastore.createQuery(brand, kind)
  if (brand != 'demo')
    query.filter('sku', sku) // filter() mutates the query in place
  const [reviews] = await datastore.runQuery(query)
  res.setHeader('Access-Control-Allow-Origin', '*')
  res.json(reviews)
}
Personally, I'd love to have the highest total time taken for the datastore call to be sub 100ms. As far as an average database call time, I'd love for that to be 50ms or lower. Is this possible with Google datastore? Am I hoping for too much?
Inside of the Chrome debugger, I have looked at these two queries. More than 98% (or almost 90%, depending on how you measure it) of the time spent, from the browser's perspective, is waiting for a response from the server. The download time is almost nothing. Please see the included hover-tooltip for the Chrome debugger information on the particular database call in question.
The data I compiled in the Excel spreadsheet shows the following:
If more information is needed to debug this problem please let me know and I'd be happy to provide it. Help me understand: How do I improve the GCP datastore query response time? Or is there a better way to properly debug the datastore queries?
You can use the best practices listed here as a quick reference: a keys-only query is potentially the fastest way to access query results and reduce latency. However, there are some other good options that you can apply to your system based on your needs. You can also use a projection query, or replace your offsets with cursors; you can find the different kinds of queries described in more detail in the documentation.
Other best practices that should help you improve the efficiency of your application include using batch operations to perform multiple Cloud Datastore calls in a single request.
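As a rough illustration of the keys-only idea (the question's handler is Node.js, but this sketch uses the Java Cloud Datastore client; the kind, property, and namespace usage come from the question, and the rest is illustrative):
import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Key;
import com.google.cloud.datastore.KeyQuery;
import com.google.cloud.datastore.Query;
import com.google.cloud.datastore.QueryResults;
import com.google.cloud.datastore.StructuredQuery.PropertyFilter;

public class KeysOnlyExample {

    private final Datastore datastore = DatastoreOptions.getDefaultInstance().getService();

    // A keys-only query returns only entity keys, which is cheaper and
    // usually faster than pulling back full entities.
    public void listReviewKeys(String brand, String sku) {
        KeyQuery query = Query.newKeyQueryBuilder()
                .setNamespace(brand) // the question's code uses the brand as the namespace
                .setKind("ScoredReviewFragment")
                .setFilter(PropertyFilter.eq("sku", sku))
                .build();
        QueryResults<Key> keys = datastore.run(query);
        while (keys.hasNext()) {
            Key key = keys.next();
            // Use the keys directly, or fetch the full entities later
            // in a single batch with datastore.get(...).
        }
    }
}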
Be sure to use the correct region for your database location. To pinpoint the latency correctly, you need to trace the request and measure latency inside your server.
Also, if this is mission-critical, there are other databases that provide better latency, such as Memorystore and Bigtable. Keep in mind these are different products with a different nature.

Is Google Datastore recommended for storing logs?

I am investigating what might be the best infrastructure for storing log files from many clients.
Google App Engine offers a nice solution that doesn't make the process an IT nightmare: load balancing, sharding, servers, user authentication - all in one place with almost zero configuration.
However, I wonder if the Datastore model is the right one for storing logs. Each log entry would be saved as a single document; each client uploads its documents on a daily basis and can produce 100K log entries each day.
Plus, there are some limitations and questions that could break the requirements:
60-second timeout on bulk transactions - How many log entries per second will I be able to insert? If 100K won't fit into the 60-second frame, this will affect the design and the work that needs to be put into the server.
5 inserts per entity per second - Is a transaction considered a single insert?
Post-analysis - text search, searching for similar log entries across clients. How flexible and efficient is Datastore with these queries?
Real-time data fetch - getting all the recent log entries.
The other option is to deploy an Elasticsearch cluster on Google Compute Engine and write the server ourselves, fetching the data from ES.
Thanks!
It's a bad idea to use Datastore for this, and even worse if you use entity groups with parent/child relationships, as a comment mentions when comparing performance.
Those numbers do not apply anyway, but Datastore is simply not designed for what you want.
BigQuery is what you want. It's designed for this, especially if you later want to analyze the logs in a SQL-like fashion. Any more detail requires that you ask a specific question, as it seems you haven't read much about either service.
I do not agree. Datastore is a fully managed NoSQL document-store database; you can store the logs you want in this type of storage and query them directly in Datastore. The benefit of using it instead of BigQuery is the schemaless part: in BigQuery you have to define the schema before inserting the logs, which is not necessary if you use Datastore. Think of Datastore as a MongoDB-style log-analysis use case in Google Cloud.
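To illustrate the schemaless point, a minimal sketch using the App Engine low-level Java datastore API (the LogEntry kind and property names here are made up for the example):
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Text;
import java.util.Date;

public class LogWriteExample {

    private final DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

    // No schema has to be declared up front: each LogEntry entity simply
    // carries whatever properties you set on it.
    public void storeLogEntry(String clientId, String message) {
        Entity logEntry = new Entity("LogEntry");
        logEntry.setProperty("clientId", clientId);
        logEntry.setProperty("timestamp", new Date());
        logEntry.setProperty("message", new Text(message)); // Text holds long, unindexed strings
        datastore.put(logEntry);
    }
}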

GAE Golang - queries not returning data right after it is saved

I am writing a Google App Engine Go application and I'm having a problem testing some functionality. Here is some sample code. The problem is as follows:
I create a new item and save it to datastore
I make a search query for that item immediately after (for example, getting all items in the namespace)
The item is not there
If I query for the item later (say, in subsequent pages), the item is found normally. I understand this behaviour might be intentional for apps deployed on GAE, but I would expect the local datastore to be instantly good for queries.
Is there any way I can force the datastore to consolidate and be good for such queries?
This is called eventual consistency, and it's a feature of App Engine's Datastore.
You can use a get method instead of a query to test if an entity has been saved.
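The question is about Go, where a get by key behaves the same way, but as a minimal sketch of "get instead of query" using the App Engine low-level Java API:
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;

public class GetAfterPutExample {

    private final DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

    // A get by key is strongly consistent, so it sees the entity
    // immediately after the put, unlike a global query.
    public boolean wasSaved(Entity item) {
        Key key = datastore.put(item);
        try {
            datastore.get(key);
            return true;
        } catch (EntityNotFoundException e) {
            return false;
        }
    }
}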
In Java we can set the desired behavior of the local datastore by changing a run parameter:
By default, the local datastore is configured to simulate the consistency model of the High Replication Datastore, with the percentage of datastore writes that are not immediately visible in global queries set to 10%.
To adjust this level of consistency, set the datastore.default_high_rep_job_policy_unapplied_job_pct system property with a value corresponding to the amount of eventual consistency you want your application to see.
I could not find something similar for Go.

What happens when a function in a GAE app exceeds quota yet is unfinished?

I ran a function that loads a lot of data to GAE using db.put(). However, it raised an over-quota exception when I hit my write quota. When I rechecked the data by running the app, the data returned was indeed incomplete. So when the quota was available again, I ran the data loader again from some index (so I wouldn't write the same data again and again).
Here is the problem: after I ran the data loader manually (again and again), it seems all the data that I need for the app to work is already there, although the first time I loaded the data there was an over-quota exception.
So, my question specifically is: is a function that ran over quota in GAE queued until the quota is available again, or is it terminated?
Background of the project: my friend and I are building a search system. We need a database for the search system, so we load that database into GAE.
If you hit the write quota while adding many values to the datastore, the remaining values will not be saved anywhere and you will have to try again. Datastore admin shows the number of entities based on datastore statistics, but this has a delay in being updated. Though officially it is mentioned as up to 24 hours, it can be even more, as mentioned in this previous post. So to find out whether recently uploaded entities are present in the datastore, we cannot rely on datastore admin and need to query for a particular entity you added recently. Alternatively, you can read the entity key value that is returned for each db.put() and use the last returned value to see which is the last successfully stored entity.
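The question uses the Python db API, but the "keep the last returned key" idea looks roughly like this with the low-level Java API (a sketch; the surrounding resume logic is up to the loader):
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Key;
import java.util.List;

public class LoaderExample {

    private final DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

    // put() returns the keys of the entities that were actually written,
    // so recording the last returned key tells you where to resume if the
    // loader stops part-way through (e.g. on an over-quota exception).
    public Key loadBatch(List<Entity> batch) {
        List<Key> written = datastore.put(batch);
        return written.isEmpty() ? null : written.get(written.size() - 1);
    }
}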

How to do bulk deletes from the Google App Engine Datastore

I need to delete bulk records from the datastore. I went through all the previous links, but they all just talked about fetching the entities from the datastore and then deleting them one by one. The problem in my case is that I have around 80K entities, and the read gets timed out whenever I try to do it using the datastore db.delete() method.
Does anyone here by any chance know a method closer to SQL for performing a bulk delete?
You can use a Task Queue + datastore cursor for the deletion.
A task can run for up to 10 minutes, which is probably enough time to delete all the entities. But if it takes longer, you can get the current cursor position, call the task itself one more time with this cursor as a parameter, and start processing from the last position.
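A rough sketch of that pattern with the low-level Java datastore and task queue APIs (the /tasks/delete URL, the request parameters, and the batch size are illustrative):
import com.google.appengine.api.datastore.Cursor;
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.FetchOptions;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.PreparedQuery;
import com.google.appengine.api.datastore.Query;
import com.google.appengine.api.datastore.QueryResultList;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;
import java.util.ArrayList;
import java.util.List;

public class BulkDeleteTask {

    private static final int BATCH_SIZE = 500;

    // Deletes one batch of keys, then re-enqueues itself with the cursor
    // so the next run continues where this one stopped.
    public void deleteBatch(String kind, String cursorString) {
        DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

        FetchOptions options = FetchOptions.Builder.withLimit(BATCH_SIZE);
        if (cursorString != null) {
            options.startCursor(Cursor.fromWebSafeString(cursorString));
        }

        PreparedQuery prepared = datastore.prepare(new Query(kind).setKeysOnly());
        QueryResultList<Entity> results = prepared.asQueryResultList(options);

        List<Key> keys = new ArrayList<>();
        for (Entity entity : results) {
            keys.add(entity.getKey());
        }
        if (keys.isEmpty()) {
            return; // nothing left to delete
        }
        datastore.delete(keys);

        // Hand the cursor to the next run of this task
        // (e.g. a handler mapped to /tasks/delete that calls deleteBatch again).
        QueueFactory.getDefaultQueue().add(TaskOptions.Builder
                .withUrl("/tasks/delete")
                .param("kind", kind)
                .param("cursor", results.getCursor().toWebSafeString()));
    }
}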
Define what API you're using. JDO? GAE? JPA? You refer to some db.delete, yet tag this as JDO; they are not the same. JDO obviously provides pm.deletePersistentAll(), and if you want more than that you can make use of the Google Mapper API.
You can use Cloud Dataflow to bulk delete entities in Datastore. You can use a GQL query to select the entities to delete:
https://cloud.google.com/datastore/docs/bulk-delete
