We have started to get a lot of DatastoreTimeoutException lately for this basic query:
select id from Later where at < '2013-07-04' limit 500 (pseudo-SQL)
In code it looks like:
DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
Query query = new Query("Later").setKeysOnly();
Filter f = new FilterPredicate("at", FilterOperator.LESS_THAN, new Date());
query.setFilter(f);
query.addSort("at", SortDirection.DESCENDING);
PreparedQuery pq = datastore.prepare(query);
return pq.asList(FetchOptions.Builder.withLimit(500));
The table has approx. 1.5 million entities. Is this normal Datastore behavior?
UPDATE: If we remove the filter, it works better. Of course we need the filter, so that's not a solution in the long run.
Change to using asIterator() instead of asList().
From the docs:
When iterating through the results of a query using the
PreparedQuery.asIterable() and PreparedQuery.asIterator() methods, the
Datastore retrieves the results in batches. By default each batch
contains 20 results, but you can change this value using
FetchOptions.chunkSize(). You can continue iterating through query
results until all are returned or the request times out.
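For example, here is a minimal sketch of the original query driven through asIterable() with an explicit chunk size (the value 500 is an illustrative choice, not a recommendation):

// Uses com.google.appengine.api.datastore.* and java.util.*.
DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
Query query = new Query("Later").setKeysOnly();
query.setFilter(new FilterPredicate("at", FilterOperator.LESS_THAN, new Date()));
query.addSort("at", SortDirection.DESCENDING);

// Results now arrive in batches of 500 instead of one large asList() fetch.
FetchOptions options = FetchOptions.Builder.withLimit(500).chunkSize(500);
List<Key> keys = new ArrayList<>();
for (Entity entity : datastore.prepare(query).asIterable(options)) {
    keys.add(entity.getKey());
}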
Problem
Running a datastore query with or without FetchOptions.Builder.withLimit(100) takes the same execution time! Why is that? Isn't the limit method supposed to reduce the time it takes to retrieve results?
Test setup
I am locally testing the execution time of some datastore queries with Google's App Engine, using the App Engine standard environment with SDK 1.9.59 from the Google Cloud SDK.
For the test, I created an example entity kind with 5 indexed and 5 unindexed properties and filled the datastore with 50,000 test entities. I run the following method to retrieve 100 of these entities using the withLimit() method.
public List<Long> getTestIds() {
    List<Long> ids = new ArrayList<>();
    FetchOptions fetchOptions = FetchOptions.Builder.withLimit(100);
    Query q = new Query("test_kind").setKeysOnly();
    for (Entity entity : datastore.prepare(q).asIterable(fetchOptions)) {
        ids.add(entity.getKey().getId());
    }
    return ids;
}
I measure the time before and after calling this method:
long start = System.currentTimeMillis();
int size = getTestIds().size();
long end = System.currentTimeMillis();
log.info("time: " + (end - start) + " results: " + size);
I log the execution time and the number of returned results.
Results
When I do not use the withLimit() FetchOptions for the query, I get the expected 50,000 results in about 1740 ms. Nothing surprising here.
If I run the code as displayed above with withLimit(100), I get the expected 100 results. However, the query still takes about the same 1740 ms!
I tested with different numbers of datastore entries and different limits. Every time, the queries with and without withLimit(100) took the same time.
Question
Why is the query still fetching all entities? Surely the query is not supposed to fetch all entities when the limit is set to 100, right? What am I missing? Is there some datastore configuration for this? After testing and searching the web for four days, I still can't find the problem.
FWIW, you shouldn't expect meaningful results from datastore performance tests performed locally, using either the development server or the datastore emulator. They're just emulators; they don't have the same performance (or even 100% equivalent functionality) as the real datastore.
See for example Datastore fetch VS fetch(keys_only=True) then get_multi (including comments)
I'm running a Kind based Query using Java API on GAE.
Here is the sample code:
DatastoreService dataStore = DatastoreServiceFactory.getDatastoreService();
Filter value1Filter = new FilterPredicate(PROPERTY_1, FilterOperator.EQUAL, value1);
Filter value2Filter = new FilterPredicate(PROPERTY_2, FilterOperator.EQUAL, value2);
Filter myFilter = CompositeFilterOperator.and(value1Filter, value2Filter);
Query findQuery = new Query("MyKIND").setFilter(myFilter);
myEntity = dataStore.prepare(findQuery).asSingleEntity();
PROPERTY_1 and PROPERTY_2 are indexed properties. This query sometimes works, but now fails consistently for certain values. Let's say these values are value1 and value2, shown above.
However, if I run the same query using SELECT in the datastore viewer with the same values, value1 and value2, it works and shows the result. I earlier suspected this to be a result of eventual consistency. However, the indexed values were written long ago (months earlier), more than enough time for them to be replicated across the other instances.
Is there a way to correct this situation? Unfortunately, I don't have the key of the entity to query with.
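For what it's worth, a minimal diagnostic sketch: asSingleEntity() returns null when nothing matches and throws PreparedQuery.TooManyResultsException when more than one entity matches, so fetching up to two results distinguishes the two failure modes (variable names reuse the snippet above):

// Probe with a limit of 2 to see whether the query matches zero,
// one, or several entities.
List<Entity> probe = dataStore.prepare(findQuery)
        .asList(FetchOptions.Builder.withLimit(2));
if (probe.isEmpty()) {
    // No match at all: likely an index or consistency issue.
} else if (probe.size() > 1) {
    // Duplicate matches: asSingleEntity() would throw
    // PreparedQuery.TooManyResultsException here.
}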
I'm interested in migrating from JDO queries to Datastore queries to make use of the AsyncDatastore API.
However, I'm unable to make the following query work in Datastore queries:
//JDO query (working correctly)
PersistenceManager pm = PMF.get().getPersistenceManager();
Query query = pm.newQuery("SELECT FROM "
+ Tasks.class.getName()
+ " WHERE archivado==false & arrayUsers=="
+ user.getId()
+ " & taskDate != null & taskDate > best_before_limit "
+ "PARAMETERS Date best_before_limit "
+ "import java.util.Date");
List <Tasks> results= (List<Tasks>) pm.newQuery(query).execute(new Date());
//Datastore query (returning zero entities)
AsyncDatastoreService datastore = DatastoreServiceFactory.getAsyncDatastoreService();
com.google.appengine.api.datastore.Query query = new com.google.appengine.api.datastore.Query("Tasks");
Filter userFilter = new FilterPredicate("arrayUsers", FilterOperator.EQUAL,user.getId());
Filter filterPendingTasks = new FilterPredicate("taskDate", FilterOperator.LESS_THAN_OR_EQUAL , new Date());
Filter completeFilter = CompositeFilterOperator.and(filterPendingTasks,userFilter);
query.setFilter(completeFilter);
List<Entity> results = datastore.prepare(query).asList(FetchOptions.Builder.withDefaults());
Apart from the fact that I have to build my Task objects out of the Entities resulting from the query, these should be the same.
The problem is that the query must check whether the passed user id (user.getId()) is present in the array (arrayUsers). JDO does this without any issues, but no joy with Datastore queries so far.
Any ideas about what is wrong with my code?
As the users commenting pointed out, you use different properties in your datastore query. If you have such a query and you don't have EXACTLY the matching index for it, it won't work. Without seeing what indexes you have, I'd say this query looks good to me, so either you don't have data that matches (unlikely, since your JDO query returns results), or you're missing a filter.
In general, in the datastore, when querying for one of a property's values to equal something specific, you would indeed use something like this:
new Query("Widget").setFilter(new FilterPredicate("x", FilterOperator.EQUAL, 1))
Since you're using an equality filter, you won't get funky results (see the docs, under "Properties with multiple values can behave in surprising ways").
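To illustrate, a minimal sketch of how an equality filter behaves against a multi-valued property with the low-level API (the "Widget" kind and "x" property are hypothetical, and a DatastoreService named datastore is assumed):

// Uses com.google.appengine.api.datastore.* and java.util.Arrays.
Entity widget = new Entity("Widget");
widget.setProperty("x", Arrays.asList(1L, 2L, 3L)); // multi-valued property
datastore.put(widget);

// EQUAL matches when ANY of the stored values equals the operand,
// which effectively tests membership.
Query q = new Query("Widget")
        .setFilter(new FilterPredicate("x", FilterOperator.EQUAL, 1L));
List<Entity> matches = datastore.prepare(q)
        .asList(FetchOptions.Builder.withDefaults());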
Currently, there is no way to do this using Datastore queries due to the lack of a "CONTAINS" operator or similar.
The alternative is to keep using JDO (at least for this kind of query).
NB: I am using db (not ndb) here. I know ndb has a count_async() but I am hoping for a solution that does not involve migrating over to ndb.
Occasionally I need an accurate count of the number of entities that match a query. With db this is simply:
q = some Query with filters
num_entities = q.count(limit=None)
It costs a small db operation per entity but it gets me the info I need. The problem is that I often need to do a few of these in the same request and it would be nice to do them asynchronously but I don't see support for that in the db library.
I was thinking I could use run(keys_only=True, batch_size=1000), since it runs the query asynchronously and returns an iterator. I could first call run() on each query and then later count the results from each iterator. It costs the same as count(); however, run() has proven slower in testing (perhaps because it actually returns results). In fact, batch_size seems to be capped at 300 regardless of how high I set it, so counting thousands of entities takes more RPCs with run() than with count().
My test code for run() looks like this:
queries = list of Queries with filters
iters = []
for q in queries:
    iters.append(q.run(keys_only=True, batch_size=1000))
for it in iters:
    count_entities_from(it)
No, there's no equivalent in db. The whole point of ndb is that it adds these sorts of capabilities, which were missing in db.
I need to get a count of records for a particular model on App Engine. How does one do it?
I bulk-uploaded more than 4,000 records, but modelname.count() only shows me 1,000.
You should use Datastore Statistics:
// "kind" holds the kind name whose statistics entity we look up.
DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
Query query = new Query("__Stat_Kind__");
query.addFilter("kind_name", FilterOperator.EQUAL, kind);
Entity entityStat = datastore.prepare(query).asSingleEntity();
Long totalEntities = (Long) entityStat.getProperty("count");
Please note that the above does not work on the development Datastore but it works in production (when published).
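Since the statistics entities are not populated in the development Datastore, a rough fallback for local testing is a keys-only scan; a minimal sketch, reusing the names above (this reads every key, so it is slow for large kinds):

// Count by iterating a keys-only query; fine for small local datasets.
Query countQuery = new Query(kind).setKeysOnly();
int count = 0;
for (Entity e : datastore.prepare(countQuery).asIterable()) {
    count++;
}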
I see that this is an old post, but I'm adding an answer in benefit of others searching for the same thing.
As of release 1.3.6, there is no longer a cap of 1,000 on count queries. Thus you can do the following to get a count beyond 1,000:
count = modelname.all(keys_only=True).count()
This will count all of your entities, which could be rather slow if you have a large number of entities. As a result, you should consider calling count() with some limit specified:
count = modelname.all(keys_only=True).count(some_upper_bound_suitable_for_you)
This is a very old thread, but just in case it helps other people looking at it, there are 3 ways to accomplish this:
Accessing the Datastore statistics
Keeping a counter in the datastore
Sharding counters
Each one of these methods is explained in this link.
count = modelname.all(keys_only=True).count(some_upper_limit)
Just to add on to dar's earlier post: this 'some_upper_limit' has to be specified; if not, the count will still default to a maximum of 1,000.
In GAE, a count will always make you page through the results once you have more than 1,000 objects. The easiest way to deal with this problem is to add a counter property to your model, or to a separate counters table, and update it every time you create a new object; a sketch of this approach follows.
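A minimal sketch of that counter approach with the Java low-level API (the "Counter" kind, key name, and "count" property are hypothetical; note that a single counter entity limits write throughput, which is what sharded counters address):

// Uses com.google.appengine.api.datastore.*.
DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
Key counterKey = KeyFactory.createKey("Counter", "modelname");
Transaction txn = ds.beginTransaction();
try {
    Entity counter;
    try {
        counter = ds.get(txn, counterKey);
    } catch (EntityNotFoundException e) {
        // First object ever created: start the counter at zero.
        counter = new Entity(counterKey);
        counter.setProperty("count", 0L);
    }
    counter.setProperty("count", (Long) counter.getProperty("count") + 1L);
    ds.put(txn, counter);
    txn.commit();
} finally {
    if (txn.isActive()) {
        txn.rollback();
    }
}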
I still hit the 1,000 limit with count(), so I adapted dar's code (mine's a bit quick and dirty):
class GetCount(webapp.RequestHandler):
    def get(self):
        query = modelname.all(keys_only=True)
        i = 0
        while True:
            result = query.fetch(1000)
            i = i + len(result)
            if len(result) < 1000:
                break
            cursor = query.cursor()
            query.with_cursor(cursor)
        self.response.out.write('<p>Count: ' + str(i) + '</p>')
DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
Query query = new Query("__Stat_Kind__");
Query.Filter eqf = new Query.FilterPredicate("kind_name",
Query.FilterOperator.EQUAL,
"SomeEntity");
query.setFilter(eqf);
Entity entityStat = ds.prepare(query).asSingleEntity();
Long totalEntities = (Long) entityStat.getProperty("count");
Another solution is to use a keys-only query and take the size of the returned iterator. The computing time of this solution rises linearly with the number of entities:
// Iterators.size() is from Guava; this walks the entire result set.
Datastore datastore = DatastoreOptions.getDefaultInstance().getService();
Query<Key> query = Query.newKeyQueryBuilder().setKind("MyKind").build();
int count = Iterators.size(datastore.run(query));