Controlling eventual AppEngine datastore consistency during testing - google-app-engine

I have an AppEngine app written in Go, and I'm trying to improve my tests.
Part of the tests that I need to run is a series of create, update, and delete queries on the same object. However, given that the datastore is eventually consistent (these aren't child objects), I am currently stuck using time.Sleep(time.Second * 5) to give the simulated datastore in the SDK enough time for consistency to propagate.
This results in tests that take a long time to run. How can I force something more like strong consistency for tests without rewriting my code to use ancestor queries?

Have a look at the dev_appserver.py arguments. There is an option for setting the consistency policy:
--datastore_consistency_policy {consistent,random,time}
the policy to apply when deciding whether a datastore
write should appear in global queries (default: time)
Note that the default is time; you want consistent.

It's been a while, but the method that I found works well is to create the test context as follows:
c, err := aetest.NewContext(&aetest.Options{StronglyConsistentDatastore: true})
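Where the consistency policy cannot be changed, for example when the same tests also run against a deployed app, one alternative to a fixed five-second sleep is to poll until the expected state appears. The following is a minimal sketch using only the standard library; Eventually is a hypothetical helper name, not part of the App Engine SDK, and the condition function stands in for whatever datastore query the test performs.

```go
package main

import (
	"fmt"
	"time"
)

// Eventually polls cond every interval until it returns true or the
// timeout elapses. It reports whether cond succeeded in time.
// Hypothetical helper: not part of the App Engine SDK.
func Eventually(timeout, interval time.Duration, cond func() bool) bool {
	deadline := time.Now().Add(timeout)
	for {
		if cond() {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(interval)
	}
}

func main() {
	// Stand-in for "the query finally sees the write":
	// the condition becomes true on the third poll.
	calls := 0
	visible := func() bool {
		calls++
		return calls >= 3
	}
	ok := Eventually(2*time.Second, 10*time.Millisecond, visible)
	fmt.Println(ok, calls)
}
```

A test written this way waits only as long as the data actually takes to become visible, and still fails after a bounded time if it never does.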

Related

Can we stop the perf test in Gatling automatically when we reach a certain limit of 504s?

Is it possible to stop the performance tests automatically once a certain number of 504s is reached, instead of running the tests to completion? Are there any options available in Gatling to achieve this?
I don't think this is possible: Gatling users are (by design) not aware of results from other users. Likewise, simulation-level assertions are not checked until all the users have completed.
The best you can do is to put .exitHereIfFailed in your scenario so that individual users don't keep performing their own actions after a 504 or other error.
Depending on your scenarios, you might also be able to rework them to improve matters. For example, if you're not dependent on each user having a distinct authorisation, you could use fewer users in a simulation with looping inside scenarios. But this will be highly dependent on what you're modelling.

Synchronicity and the Datastore in Google App Engine

I seem to be having a consistency problem with some of my data; I'm writing a unit test to see if a certain model has been placed in the datastore. It fails in the unit test unless I put a 5 second sleep before the return of the storing function.
I've been reading about asynchronous functions in GAE, thinking that perhaps I need something along the lines of a promise so that the function will wait before returning until the data has been placed into the datastore. However, all the documentation on asynchronous versions of functions in GAE seems to imply that its non-async functions already act somewhat like promises in that way.
What does it mean for a function like put() to return? It seems to not mean that the data has been appropriately stored. Is there a way to wait until the data has been stored?
EDIT: My problem wasn't simply dealing with consistency; I was unsure whether the problem was a consistency issue at all, and wanted instead to ask specifically how the return of a call to put() relates to what is happening under the hood of GAE.
I think this question is similar to that listed, but is still useful to remain up because it approaches the consistency issue from a different perspective. If other people need to find this information, but aren't entirely sure of the phrasing as I was, or follow a similar train of thought as me, they may be able to reach the information through this question. It's also written more explicitly, with less domain specific terminology.
That being said, I do see the issue in terms of end-goal informational content; I would understand if it's taken down.
https://cloud.google.com/appengine/docs/java/datastore/#Java_Datastore_writes_and_data_visibility
Data writes happen in two stages, commit and apply. Commit records the transactions to a majority of replicas, and apply does two things in parallel: 1) writes the data and 2) writes the indexes.
Your unit test query may be executing on a replica that has a stale version of the data. The write operation returns immediately after the commit phase, but the apply phase happens asynchronously. Ancestor queries and gets by key are guaranteed to be up to date, however, so try testing by getting the object by its key.
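The difference between the two stages can be illustrated with a toy in-memory model. This is only a sketch under loud assumptions: toyStore and its methods are hypothetical names, not the Datastore API, and the apply step is triggered by hand so that the behaviour is deterministic. Put returns once the write is "committed", a get by key sees every committed write, and a global query reads only the index, which lags until apply runs.

```go
package main

import "fmt"

// toyStore is a hypothetical model of the commit/apply split.
// It is NOT the Datastore API.
type toyStore struct {
	committed map[string]string // writes recorded by the commit phase
	index     map[string]string // what index-based (global) queries see
	pending   []string          // keys committed but not yet applied
}

func newToyStore() *toyStore {
	return &toyStore{
		committed: map[string]string{},
		index:     map[string]string{},
	}
}

// Put returns as soon as the write is committed; the index is untouched.
func (s *toyStore) Put(key, value string) {
	s.committed[key] = value
	s.pending = append(s.pending, key)
}

// Apply rolls committed writes forward into the index.
func (s *toyStore) Apply() {
	for _, k := range s.pending {
		s.index[k] = s.committed[k]
	}
	s.pending = nil
}

// GetByKey models a strongly consistent lookup: it sees every committed write.
func (s *toyStore) GetByKey(key string) (string, bool) {
	v, ok := s.committed[key]
	return v, ok
}

// QueryByValue models a global query: it reads only the index, which may be stale.
func (s *toyStore) QueryByValue(value string) []string {
	var keys []string
	for k, v := range s.index {
		if v == value {
			keys = append(keys, k)
		}
	}
	return keys
}

func main() {
	s := newToyStore()
	s.Put("user/1", "alice")

	fmt.Println("query before apply:", s.QueryByValue("alice"))
	v, found := s.GetByKey("user/1")
	fmt.Println("get by key:", v, found)

	s.Apply()
	fmt.Println("query after apply:", s.QueryByValue("alice"))
}
```

In this model the unit test from the question corresponds to calling QueryByValue immediately after Put: the write has been accepted, but the index has not caught up yet.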

Keeping Consistent Count in Google App Engine

I am looking for suggestions on a very common problem on Google App Engine platform for keeping consistent counters.
I have a task to load the groups of a domain and then create a task for each group to load its group members in a separate task. Now as there are thousands of groups and members there will be too many tasks.
I will be creating one task to get one page of groups, and within that task I will be creating a task for each group to get its members. To know whether I have loaded all groups, I just check the nextPageToken and then set the groups-loading flag to finished.
However, as there will be a separate task for each group to load members, I need to keep track of whether all group member tasks have finished. The problem is that many tasks updating a single numGroupMembersFinished count will create concurrency issues, and at some point the count will get corrupted and return incorrect data.
My answer is general because your question doesn't include any code or proposed solution, and you don't say where you plan to keep that counter.
Many articles on the web cover this. Search for "sharding counters" for a semi-scalable way to count datastore entities quickly in O(1) time.
More importantly, look at the memcache API. It has a function to atomically increment/decrement counters stored there. That one is guaranteed never to have concurrency issues; however, you would still need some way to recover and/or double-check that the memcache entry wasn't evicted, maybe by also keeping the count stored in an entity that you set asynchronously and "get by key" to always get its latest value.
This still isn't 100% bulletproof, because the cache could be evicted at the same moment that you have many concurrent attempts to modify it, so your backup datastore entity could miss a "set".
You need to calculate, based on your expected concurrent usage, whether the chances of missing an increment/decrement are greater than those of a comet hitting the earth. Hopefully you won't use it for air traffic control.
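The sharding idea itself can be sketched in memory. The following is an illustrative analogue only: in the datastore recipe each shard would be an entity updated in its own transaction, whereas here each shard is a struct guarded by a mutex, and ShardedCounter is a hypothetical type name. Writers pick a random shard so that concurrent increments rarely contend on the same one, and a read sums all shards.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// shard is one slice of the counter with its own lock, standing in for
// one datastore entity in the sharded-counter recipe.
type shard struct {
	mu sync.Mutex
	n  int64
}

// ShardedCounter spreads increments over several shards so that
// concurrent writers rarely contend on the same one.
type ShardedCounter struct {
	shards []shard
}

// NewShardedCounter creates a counter with numShards independent shards.
func NewShardedCounter(numShards int) *ShardedCounter {
	return &ShardedCounter{shards: make([]shard, numShards)}
}

// Increment adds one to a randomly chosen shard.
func (c *ShardedCounter) Increment() {
	s := &c.shards[rand.Intn(len(c.shards))]
	s.mu.Lock()
	s.n++
	s.mu.Unlock()
}

// Total sums all shards to produce the overall count.
func (c *ShardedCounter) Total() int64 {
	var total int64
	for i := range c.shards {
		s := &c.shards[i]
		s.mu.Lock()
		total += s.n
		s.mu.Unlock()
	}
	return total
}

func main() {
	c := NewShardedCounter(8)
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Increment()
		}()
	}
	wg.Wait()
	fmt.Println(c.Total())
}
```

More shards raise write throughput at the cost of a more expensive read, since Total has to visit every shard.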
You could use the MapReduce or Pipeline API:
https://github.com/GoogleCloudPlatform/appengine-mapreduce
https://github.com/GoogleCloudPlatform/appengine-pipelines
These allow you to split your problem into smaller, manageable parts, while the library handles all of the details of signaling/blocking between tasks, gathering the results, and handing them back to you when it's done.
Google I/O 2010 - Data pipelines with Google App Engine:
https://www.youtube.com/watch?v=zSDC_TU7rtc
Google I/O 2011: Large-scale Data Analysis Using the App Engine Pipeline API:
https://www.youtube.com/watch?v=Rsfy_TYA2ZY
Google I/O 2011: App Engine MapReduce:
https://www.youtube.com/watch?v=EIxelKcyCC0
Google I/O 2012 - Building Data Pipelines at Google Scale:
https://www.youtube.com/watch?v=lqQ6VFd3Tnw
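The fan-out/fan-in shape that such a library manages across task queues can be shown in a single process. This is a rough analogue only, not the Pipeline or MapReduce API: loadMembers is a hypothetical stand-in for the per-group task, and a sync.WaitGroup plays the role of the "all child tasks finished" signal that the question tries to track with a shared counter.

```go
package main

import (
	"fmt"
	"sync"
)

// loadMembers is a hypothetical stand-in for the per-group task.
// It pretends that each group has as many members as its name has bytes.
func loadMembers(group string) int {
	return len(group)
}

// LoadAll starts one worker per group and blocks until every worker has
// finished, returning the member count for each group.
func LoadAll(groups []string) map[string]int {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results = make(map[string]int, len(groups))
	)
	for _, g := range groups {
		wg.Add(1)
		go func(g string) {
			defer wg.Done()
			n := loadMembers(g)
			mu.Lock() // the results map is shared, so guard each write
			results[g] = n
			mu.Unlock()
		}(g)
	}
	wg.Wait() // fan-in: returns only when all workers are done
	return results
}

func main() {
	counts := LoadAll([]string{"eng", "sales", "support"})
	fmt.Println(len(counts), counts["eng"], counts["sales"], counts["support"])
}
```

The caller never inspects a completion counter directly; completion is signalled by LoadAll returning, which is the part the libraries above take care of in a distributed setting.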
Zig Mandel mentioned it; here's the link to Google's own recipe for implementing a counter:
https://cloud.google.com/appengine/articles/sharding_counters
I copy-pasted (renamed some variables, etc...) the configurable sharded counter into my app and it's working great!
I used this tutorial: https://cloud.google.com/appengine/articles/sharding_counters together with the hashid library and created this Go library:
https://github.com/janekolszak/go-gae-uid
gen := gaeuid.NewGenerator("Kind", "HASH'S SALT", 11 /*id length*/)
c := appengine.NewContext(r)
id, err := gen.NewID(c)
The same approach should be easy for other languages.

Objectify queries and strange eventual consistency

I'm seeing some strange behavior related to objectify and eventual consistency. I have noticed this behavior while running some integration tests which make HTTP requests to an App Engine Java development server.
Because I wanted those tests to also work when run against the real App Engine environment, they deal with eventual consistency by repeating requests whose results depend on eventually consistent queries.
I previously had the ObjectifyFilter in the wrong location in web.xml by accident, so the ObjectifyFilter would not run. Now that I have moved it to the start of the filter chain, so that it actually runs, all my queries seem to always return consistent results! No more eventual consistency, that is!
For example one test does the following:
Request which adds a user with some username
Request which tries to authorize user with username and password. This will make a global query for users with given username, and the query should be eventually consistent, but it always finds the user entity.
I have no clue what is happening.
More info:
I have checked that ofy().toString returns a different value for each request.
I'm using -Ddatastore.default_high_rep_job_policy_unapplied_job_pct=50
Appengine SDK version 1.8.6
I'm making all writes inside transactions
Disable eventual consistency in your tests. Adding retries and sleeps does not change the logic of your code, it just complicates testing. There's no point in trying to test around eventually consistent behavior; just be aware that it exists.
I don't know the answer to your specific question because it's really about the specific behavior of the test harness. Re-read the unit testing guide closely; unapplied jobs are applied at odd points like the second time a query is run. It's only a very rough approximation of the eventually consistent behavior of the server environment.

writing then reading entity does not fetch entity from datastore

I am having the following problem. I am now using the low-level
google datastore API rather than JDO, that way I should be in a
better position to see exactly what is happening in my code. I am
writing an entity to the datastore and shortly thereafter reading it
from the datastore using Jetty and Eclipse. Sometimes the written
entity is not being read. This would be a real problem if it were to
happen in production code. I am using the 2.0 RC2 API.
I have tried this several times, sometimes the entity is retrieved
from the datastore and sometimes it is not. I am doing a simple
query on the datastore just after committing a write transaction.
(If I run the code through the debugger things run slow enough
that the entity has a chance of being read back on the second pass).
Any help with this issue would be greatly appreciated.
The development server has the same consistency guarantees as the High Replication datastore on the live server. A "global" query uses an index that is only guaranteed to be eventually consistent with writes. To perform a query with strongly consistent guarantees, the query must be limited to an entity group, using an "ancestor" key.
A typical technique is to group data specific to a single user in a group, so the user can see changes to queries limited to the user's group with strong consistency guarantees. Another technique is to use fancier client logic to update the client's local view as soon as the change is submitted, so the user sees the change in the UI immediately while the update to the global index is in progress.
See the docs on queries and transactions.
