Limit is not working with GAE Search API - google-app-engine

My Code:
SortExpression sortExpr = SortExpression.newBuilder()
        .setExpression(locExpr)
        .setDirection(SortExpression.SortDirection.ASCENDING)
        .setDefaultValueNumeric(distanceInMeters + 1)
        .build();
Query searchQuery = Query.newBuilder()
        .setOptions(QueryOptions.newBuilder()
                .setSortOptions(SortOptions.newBuilder().addSortExpression(sortExpr))
                .setLimit(10)) // this limit is not working, problematic line
        .build(query);
Results<ScoredDocument> results = getIndex().search(searchQuery);
1) It is returning 1020 records instead of 10 (the limit I set). What's wrong with the above code?
2) If I remove the limit it returns 1020 records instead of 2000 (i.e. all records). Why doesn't it return all 2000 records? Is there a fetching limit in the Search API?

That looks right to me. Please file a bug report on the external issue tracker, and we will investigate.
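For what it's worth, if more than one batch of matches is ever needed, the Search API also supports cursors. A minimal sketch, reusing the question's getIndex() and assuming a query string named queryString (that name is not from the original code):
Cursor cursor = Cursor.newBuilder().build();
while (cursor != null) {
    QueryOptions options = QueryOptions.newBuilder()
            .setLimit(10)
            .setCursor(cursor)
            .build();
    Query q = Query.newBuilder().setOptions(options).build(queryString);
    Results<ScoredDocument> page = getIndex().search(q);
    for (ScoredDocument doc : page) {
        // ... process each document
    }
    cursor = page.getCursor(); // null once there are no more results
}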

Related

DatastoreTimeoutException in app engine for basic query

We have started to get a lot of DatastoreTimeoutExceptions lately for this basic query:
select id from Later where at < '2013-07-04' limit 500 (Some Pseudo SQL)
In code it looks like:
DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
Query query = new Query("Later").setKeysOnly();
Filter f = new FilterPredicate("at", FilterOperator.LESS_THAN, new Date());
query.setFilter(f);
query.addSort("at", SortDirection.DESCENDING);
PreparedQuery pq = datastore.prepare(query);
return pq.asList(FetchOptions.Builder.withLimit(500));
The table has approx. 1.5 million entities. Is this normal Datastore behavior?
UPDATE: If we remove the filter it works better. Of course we need the filter, so that's not a solution in the long run.
Change to using asIterator instead of asList.
From the docs:
When iterating through the results of a query using the PreparedQuery.asIterable() and PreparedQuery.asIterator() methods, the Datastore retrieves the results in batches. By default each batch contains 20 results, but you can change this value using FetchOptions.chunkSize(). You can continue iterating through query results until all are returned or the request times out.
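A minimal sketch of that change, reusing the question's query and datastore setup (the 500-row cap is kept with a manual counter, and the chunk size of 100 is an arbitrary choice):
Query query = new Query("Later").setKeysOnly();
query.setFilter(new FilterPredicate("at", FilterOperator.LESS_THAN, new Date()));
query.addSort("at", SortDirection.DESCENDING);
PreparedQuery pq = datastore.prepare(query);

List<Entity> page = new ArrayList<>();
// Stream results in batches of 100 instead of materializing one big list up front.
for (Entity e : pq.asIterable(FetchOptions.Builder.withChunkSize(100))) {
    page.add(e);
    if (page.size() >= 500) {
        break; // stop once we have the 500 we need
    }
}
return page;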

Salesforce limit on SOQL?

Using the PHP library for Salesforce I am running:
SELECT ... FROM Account LIMIT 100
But the LIMIT is always capped at 25 records. I am selecting many fields (60 fields). Is this a hard limit?
The skeleton code:
$client = new SforceEnterpriseClient();
$client->createConnection("EnterpriseSandboxWSDL.xml");
$client->login(USERNAME, PASSWORD.SECURITY_TOKEN);
$query = "SELECT ... FROM Account LIMIT 100";
$response = $client->query($query);
foreach ($response->records as $record) {
    // ... there are only 25 records
}
Here is my checklist:
1) Make sure you have more than 25 records.
2) After your first loop, call queryMore to check whether there are more records.
3) Make sure batchSize is not set to 25.
I don't use the PHP library for Salesforce, but I can assume that before doing
SELECT ... FROM Account LIMIT 100
some more SELECT queries have been performed. If you didn't code them, then maybe the PHP library does it for you ;-)
The Salesforce SOAP API query method will only return a finite number of rows. There are a couple of reasons why it may be returning fewer than your defined limit.
The QueryOptions header batchSize has been set to 25. If this is the case, you could try adjusting it. If it hasn't been explicitly set, you could try setting it to a larger value.
When the SOQL statement selects a number of large fields (such as two or more custom fields of type long text), Salesforce may return fewer records than defined in the batchSize. The reduction in batch size also occurs when dealing with base64-encoded fields, such as Attachment.Body. If this is the case, you can just use queryMore with the QueryLocator from the first response.
In both cases, check the done and size properties of the QueryResult to determine whether you need to use queryMore and the total number of rows that match the SOQL query.
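As a rough illustration of that query/queryMore loop, here is a sketch using the Java WSC partner bindings rather than the PHP library (the connection setup is elided and setQueryOptions is the batchSize header mentioned above; treat the exact calls as assumptions if your library version differs):
// Assumes a ConnectorConfig "config" with username and password + security token.
PartnerConnection connection = Connector.newConnection(config);
connection.setQueryOptions(200); // the QueryOptions batchSize header

QueryResult result = connection.query("SELECT Id, Name FROM Account LIMIT 100");
while (true) {
    for (SObject record : result.getRecords()) {
        // ... process each record
    }
    if (result.isDone()) {
        break;
    }
    // A batch smaller than the limit came back; fetch the next one.
    result = connection.queryMore(result.getQueryLocator());
}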
To avoid governor limits it might be better to add all the records to a list, then do everything you need to do to the records in the list. After you're done, just update your database using: update listName;

How to get all results from solr query?

I executed a query like "Address:Jack*". It shows numFound = 5214 and displays 100 documents on the results page (I changed the default number of displayed results from 10 to 100).
How can I get all the documents?
I remember myself doing &rows=2147483647
2,147,483,647 is the maximum value of an integer. I recall once using a number bigger than that and getting a NumberFormatException because it couldn't be parsed into an int. I don't know if they use Long nowadays, but 2 billion rows is normally more than enough.
Small note:
Be careful if you are planning to do this in production. If you do a query like *:* and your index is big, you could be transferring a couple of gigabytes in that query.
If you know you won't have many docs, go ahead and use integer's max value.
On the other hand, if you are doing a one-time script and just need to dump all results (for example document ID's) then this approach is valid, if you don't mind waiting 3-5 minutes for a query to return.
Don't use &rows=2147483647
Don't use Integer.MAX_VALUE (2147483647) as the value of rows in production. This will heavily slow down your query even if you have a small result set, because Solr preallocates a queue of that size. See https://issues.apache.org/jira/browse/SOLR-7580
I strongly suggest using Exporting Result Sets instead:
It’s possible to export fully sorted result sets using a special rank query parser and response writer specifically designed to work together to handle scenarios that involve sorting and exporting millions of records.
Alternatively, I suggest using Deep Paging.
Simple pagination is easy when you have few documents to read: all you have to do is play with the start and rows parameters. But this is not feasible when you have many documents, I mean hundreds of thousands or even millions.
This is the kind of thing that can bring your Solr server to its knees.
For typical applications displaying search results to a human user, this tends to not be much of an issue since most users don’t care about drilling down past the first handful of pages of search results — but for automated systems that want to crunch data about all of the documents matching a query, it can be seriously prohibitive.
This means that if you have a website and are paging search results, a real user will not go very far down, but consider on the other hand what happens when a spider or a scraper tries to read all of the website's pages.
Now we are talking about Deep Paging.
I suggest reading this amazing post:
https://lucidworks.com/post/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
And take a look at this document page:
https://solr.apache.org/guide/pagination-of-results.html
And here is an example that tries to explain how to paginate using cursors.
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.ORDER;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

SolrQuery solrQuery = new SolrQuery();
solrQuery.setRows(500);
solrQuery.setQuery("*:*");
solrQuery.addSort("id", ORDER.asc); // Pay attention to this line: cursors require a sort with a unique-key tiebreaker
String cursorMark = CursorMarkParams.CURSOR_MARK_START;
boolean done = false;
while (!done) {
    solrQuery.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
    QueryResponse rsp = solrClient.query(solrQuery);
    String nextCursorMark = rsp.getNextCursorMark();
    for (SolrDocument d : rsp.getResults()) {
        // ... process each document
    }
    if (cursorMark.equals(nextCursorMark)) {
        done = true; // the cursor did not advance, so everything has been read
    }
    cursorMark = nextCursorMark;
}
Returning all the results is never a good option, as it would be very slow.
Can you mention your use case?
Also, the Solr rows parameter helps you tune the number of results to be returned.
However, I don't think there is a way to tune rows to return all results; it doesn't accept -1 as a value.
So you would need to set a high value for all the results to be returned.
What you should do is first create a SolrQuery as shown below and set the number of documents you want to fetch in a batch.
int lastResult = 0; // this is for processing the next batch
// just considering id for the sake of simplicity
String query = "id:[" + lastResult + " TO *]";
// setRows sets the batch size; change this to whatever size you want
SolrQuery solrQuery = new SolrQuery(query).setRows(500);
SolrDocumentList results = solrClient.query(solrQuery).getResults(); // execute this statement
Here I am considering an example of searching by id; you can replace it with any parameter you want to search on. The "lastResult" variable is the one you change after executing the first batch of 500 records (500 is the batch size), setting it to the last id in the results. This helps you execute the next batch starting with the last result from the previous batch; the full loop is sketched below.
Hope this helps. Shoot a comment below if you need any clarification.
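A hedged sketch of that full loop, assuming an initialized SolrClient named solrClient, a string id field that sorts lexicographically, and id values that need no query escaping (otherwise run them through ClientUtils.escapeQueryChars):
String lastId = null;
while (true) {
    // The first pass reads from the start; later passes resume just above the last id seen.
    String qs = (lastId == null) ? "id:[* TO *]" : "id:{" + lastId + " TO *]";
    SolrQuery q = new SolrQuery(qs).setRows(500);
    q.addSort("id", SolrQuery.ORDER.asc); // stable order so batches do not overlap
    SolrDocumentList batch = solrClient.query(q).getResults();
    if (batch.isEmpty()) {
        break; // no documents left
    }
    for (SolrDocument d : batch) {
        // ... process each document
    }
    lastId = (String) batch.get(batch.size() - 1).getFieldValue("id");
}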
For selecting all documents in dismax/edismax via the Solarium PHP client, the normal query syntax *:* does not work. To select all documents, set the default query value in the Solarium query to an empty string. This is required because the default query in Solarium is *:*. Also set the alternative query to *:*. The dismax/edismax normal query syntax does not support *:*, but the alternative query syntax does.
For more details the following book can be referred to:
http://www.packtpub.com/apache-solr-php-integration/book
As the other answers pointed out, you can set rows to the maximum integer value to get back all the results for a query.
I would recommend, though, using Solr's pagination feature and building a function that returns all the results using the cursorMark API. The gist of it is: you set the cursorMark parameter to "*", you set the page size (the rows parameter), and on each response you get a cursorMark for the next page, so you execute the same query with the cursorMark obtained from the last response. This gives you more flexibility in how many of the results you want back, in a much more performant way.
The way I dealt with the problem is by running the query twice:
// Start with your (usually small) default page size
solrQuery.setRows(50);
QueryResponse response = solrResponse(query);
long numFound = response.getResults().getNumFound();
if (numFound > 50) {
    // getNumFound() returns a long, setRows() takes an int
    solrQuery.setRows((int) numFound);
    response = solrResponse(query);
}
It makes two calls to Solr, but gets you all matching records... with a small performance penalty.
query.setRows(Integer.MAX_VALUE);
works for me!!

Issue related to Google App Engine query within a date range

I am concerned about querying entities this way:
from datetime import datetime, timedelta

created_start = datetime.today() - timedelta(hours=1)
created_end = datetime.now()
a = Message.all()
a.filter('created >=', created_start)
a.filter('created <', created_end)
This is due to the 1000-query-results restriction. So two questions:
Will this work if .all() returns more than 1000 results? Or to put it in a different way: will all() return more than 1000 results in case there are more?
Is there a better way to query for entities within a given date range?
Thank you very much in advance
Your solution is good. Since version 1.3.6, query results are no longer capped at 1000.
You can iterate over the entities until exhaustion, or fetch chunks of entities using a cursor, as sketched below.
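For comparison, a rough sketch of that chunk-plus-cursor pattern with the low-level Java Datastore API (the question itself uses the Python db API; the kind and property names come from the question, the rest is assumed):
DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
Query q = new Query("Message").setFilter(CompositeFilterOperator.and(
        new FilterPredicate("created", FilterOperator.GREATER_THAN_OR_EQUAL, createdStart),
        new FilterPredicate("created", FilterOperator.LESS_THAN, createdEnd)));
PreparedQuery pq = ds.prepare(q);

Cursor cursor = null;
while (true) {
    FetchOptions opts = FetchOptions.Builder.withLimit(1000);
    if (cursor != null) {
        opts.startCursor(cursor); // resume where the previous chunk ended
    }
    QueryResultList<Entity> chunk = pq.asQueryResultList(opts);
    for (Entity e : chunk) {
        // ... process each entity
    }
    if (chunk.size() < 1000) {
        break; // last chunk
    }
    cursor = chunk.getCursor();
}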

How does one get a count of rows in a Datastore model in Google App Engine?

I need to get a count of records for a particular model on App Engine. How does one do it?
I bulk-uploaded more than 4000 records, but modelname.count() only shows me 1000.
You should use Datastore Statistics:
Query query = new Query("__Stat_Kind__");
query.addFilter("kind_name", FilterOperator.EQUAL, kind);
Entity entityStat = datastore.prepare(query).asSingleEntity();
Long totalEntities = (Long) entityStat.getProperty("count");
Please note that the above does not work on the development Datastore but it works in production (when published).
I see that this is an old post, but I'm adding an answer for the benefit of others searching for the same thing.
As of release 1.3.6, there is no longer a cap of 1,000 on count queries. Thus you can do the following to get a count beyond 1,000:
count = modelname.all(keys_only=True).count()
This will count all of your entities, which could be rather slow if you have a large number of entities. As a result, you should consider calling count() with some limit specified:
count = modelname.all(keys_only=True).count(some_upper_bound_suitable_for_you)
This is a very old thread, but just in case it helps other people looking at it, there are 3 ways to accomplish this:
Accessing the Datastore statistics
Keeping a counter in the datastore
Sharding counters
Each one of these methods is explained in this link.
count = modelname.all(keys_only=True).count(some_upper_limit)
Just to add on to the earlier post by dar: this 'some_upper_limit' has to be specified. If not, the default count will still be capped at a maximum of 1000.
In GAE a count will always make you page through the results when you have more than 1000 objects. The easiest way to deal with this problem is to add a counter property to your model, or to a separate counters kind, and update it every time you create a new object; a transactional sketch of that pattern follows.
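A minimal sketch of that counter pattern with the low-level Java Datastore API (the kind and key names here are illustrative; a write-heavy app would shard this counter across several entities to avoid contention):
DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
Key counterKey = KeyFactory.createKey("Counter", "modelname");

// Increment inside a transaction so concurrent updates are not lost.
Transaction txn = ds.beginTransaction();
try {
    Entity counter;
    try {
        counter = ds.get(txn, counterKey);
    } catch (EntityNotFoundException e) {
        counter = new Entity(counterKey); // first object ever created
        counter.setProperty("count", 0L);
    }
    counter.setProperty("count", (Long) counter.getProperty("count") + 1);
    ds.put(txn, counter);
    txn.commit();
} finally {
    if (txn.isActive()) {
        txn.rollback();
    }
}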
I still hit the 1000 limit with count, so I adapted dar's code (mine's a bit quick and dirty):
class GetCount(webapp.RequestHandler):
    def get(self):
        query = modelname.all(keys_only=True)
        i = 0
        while True:
            result = query.fetch(1000)
            i = i + len(result)
            if len(result) < 1000:
                break
            cursor = query.cursor()
            query.with_cursor(cursor)
        self.response.out.write('<p>Count: ' + str(i) + '</p>')
DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
Query query = new Query("__Stat_Kind__");
Query.Filter eqf = new Query.FilterPredicate("kind_name",
        Query.FilterOperator.EQUAL,
        "SomeEntity");
query.setFilter(eqf);
Entity entityStat = ds.prepare(query).asSingleEntity();
Long totalEntities = (Long) entityStat.getProperty("count");
Another solution is to use a keys-only query and get the size of the iterator. The computing time of this solution rises linearly with the number of entities:
Datastore datastore = DatastoreOptions.getDefaultInstance().getService();
KeyFactory keyFactory = datastore.newKeyFactory().setKind("MyKind"); // not actually needed for the count
Query<Key> query = Query.newKeyQueryBuilder().setKind("MyKind").build();
int count = Iterators.size(datastore.run(query)); // Iterators comes from Guava
