Count of records returned, without considering LIMIT, in CakePHP 3

I want the number of records available in the database for the current query, but without considering the LIMIT.
$this->Orders->find('all')
    ->where(['order_quantity' => 5])
    ->limit(5);
Let's say there are 50 records matching the query above. I just want the number of records available for the current query. I can't use count(), because with the limit it will always report the total number of records available as less than or equal to 5. Is there any solution in CakePHP?

This page in the CakePHP 3 book explains EXACTLY the answer to your question, including how and why it works:
Returning the Total Count of Records
Using a single query object, it is possible to obtain the total number
of rows found for a set of conditions:
$total = $articles->find()->where(['is_active' => true])->count();
The count() method will ignore the limit, offset and page clauses,
thus the following will return the same result:
$total = $articles->find()->where(['is_active' => true])->limit(10)->count();
This is useful when you need to know the total result set size in
advance, without having to construct another Query object. Likewise,
all result formatting and map-reduce routines are ignored when using
the count() method.
Notice the bit about "... will ignore the limit, offset, and page clauses"
So try something like this:
$data = $articles->find()->where(['is_active' => true])->limit(10);
$count = $data->count();

I don't think you are familiar with the use of limit in find, so I suggest you study the docs.
The limit in your query means that the query will only return the first 5 records, even if the query actually matches 50.
So, in order to get the actual count, you just need to remove the limit and make some changes in your code as follows:
$count = $this->Orders->find()->where(['order_quantity' => 5])->count();

Related

Inconsistent values for getNumberFound() in Search API

I have a full-text search index with 42 documents like in the screenshot below:
When I query the index for "" it returns all the 42 documents correctly (good), but when I use the limit and offset options in the query, the value returned for the total number of matches found (results.getNumberFound()) varies from time to time. It gives me different values for different offsets! In short, making the same query with different offset values gives a different value from results.getNumberFound()!
NOTE: This happens only on the production server after I deploy the app. On the local server everything
works perfectly (i.e. for the same query, the number of total hits found is the same regardless of the offset option value).
Query query = Query.newBuilder()
.setOptions(QueryOptions.newBuilder()
.setLimit(limit)
.setOffset(offset).build())
.build(searchPhrase);
Results<ScoredDocument> results = INDEX.search(query);
LOG.warning( "Phrase:'" + searchPhrase +
"' limit:" + limit +
" offset:" + offset +
" num:" + results.getNumberFound());
Here's a screenshot of the log output:
So is there something wrong I'm doing, or is it a bug in the Search API? The weird thing is that the issue only happens on the production server, not the local one.
The Python docs say:
number_found
Returns an approximate number of documents matching the query. If the QueryOptions.number_found_accuracy parameter were set to 100, then number_found <= 100 is accurate.
Similar API components exist in Java. From your code it appears you haven't set an accuracy. See the Java QueryOptions docs: https://developers.google.com/appengine/docs/java/javadoc/com/google/appengine/api/search/QueryOptions
Having said that, I have seen many questions/discussions about the lack of accuracy of the number of found results.
Surprisingly, this is working as intended (as Tim says).
https://developers.google.com/appengine/docs/java/javadoc/com/google/appengine/api/search/QueryOptions.Builder#setNumberFoundAccuracy(int)
In its default state, the datastore scans the minimal set of data to fulfill the request. The backend provides a very rough estimate of the match count by multiplying the ID range by an estimate of the fraction of matching keys (#keys found that matched / #ids scanned during the query).
For small data sets, set the accuracy value higher (500 or 1000) and call it a day. You can also improve the estimate by making sure key IDs are uniformly distributed and by fetching a higher limit on each call (though if you don't need the data, just use the accuracy parameter).
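For instance, a minimal sketch built on the question's own query code (searchPhrase, limit, and offset are the question's variables; the 1000 here is just an example value):
Query query = Query.newBuilder()
    .setOptions(QueryOptions.newBuilder()
        .setLimit(limit)
        .setOffset(offset)
        .setNumberFoundAccuracy(1000) // counts up to 1000 are now exact
        .build())
    .build(searchPhrase);
Results<ScoredDocument> results = INDEX.search(query);
// results.getNumberFound() is accurate whenever the true total is <= 1000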
This might not be applicable here but this is a general workaround for larger data sets:
Use num_accuracy == 1000. When queries return an estimate of <1000, you can trust that. When a query returns an estimate of >1000, perform your own estimate using a second query:
Include an extra numeric field with your data, which is a value of a discrete probabilistic event (e.g. #0s in a hash of some randomish data). When you get a large estimate from the first query, repeat your query with the additional constraint (e.g. AND ZERO_COUNT == y), where y is chosen based on the first query's estimate to match <1000 entities, producing an exact count for the second query which you can accurately extrapolate. Since you don't need the results of this data, you can set limit to 1 & num_accuracy == 1000.

SalesForce limit on SOQL?

Using the PHP library for salesforce I am running:
SELECT ... FROM Account LIMIT 100
But the LIMIT is always capped at 25 records. I am selecting many fields (60 fields). Is this a hard limit?
The skeleton code:
$client = new SforceEnterpriseClient();
$client->createConnection("EnterpriseSandboxWSDL.xml");
$client->login(USERNAME, PASSWORD.SECURITY_TOKEN);
$query = "SELECT ... FROM Account LIMIT 100";
$response = $client->query($query);
foreach ($response->records as $record) {
// ... there's only 25 records
}
Here is my checklist:
1) Make sure you have more than 25 records.
2) After your first loop, do queryMore to check if there are more records.
3) Make sure batchSize is not set to 25.
I don't use the PHP library for Salesforce, but I can assume that before running
SELECT ... FROM Account LIMIT 100
some other SELECT queries have been performed. If you didn't code them, then maybe the PHP library does it for you ;-)
The Salesforce SOAP API query method will only return a finite number of rows. There are a couple of reasons why it may be returning fewer than your defined limit.
The QueryOptions header batchSize has been set to 25. If this is the case, you could try adjusting it. If it hasn't been explicitly set, you could try setting it to a larger value.
When the SOQL statement selects a number of large fields (such as two or more custom fields of type long text), Salesforce may return fewer records than the defined batchSize. The reduction in batch size also occurs when dealing with base64-encoded fields, such as Attachment.Body. If this is the case, then you can just use queryMore with the QueryLocator from the first response.
In both cases, check the done and size properties of the QueryResult to determine whether you need to use queryMore, and the total number of rows that match the SOQL query.
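For illustration, here is a sketch of that queryMore loop using the Java SOAP (WSC) enterprise client; the field list is illustrative, and connection is assumed to be an already-authenticated EnterpriseConnection. The PHP toolkit exposes the same done and queryLocator fields on its query response:
connection.setQueryOptions(200); // QueryOptions SOAP header: request up to 200 rows per batch
QueryResult qr = connection.query("SELECT Id, Name FROM Account LIMIT 100");
while (true) {
    for (SObject record : qr.getRecords()) {
        // process each record
    }
    if (qr.isDone()) break;
    qr = connection.queryMore(qr.getQueryLocator()); // fetch the next batch
}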
To avoid governor limits it might be better to add all the records to a list, then do everything you need to do to the records in the list. After you're done, just update your database using: update listName;

How to get all results from solr query?

I executed a query like "Address:Jack*". It shows numFound = 5214 and displays 100 documents on the results page (I changed the default number of displayed results from 10 to 100).
How can I get all the documents?
I remember doing &rows=2147483647
2,147,483,647 is integer's maximum value. I recall using a number bigger than that once and having a NumberFormatException because it couldn't be parsed into an int. I don't know if they use Long nowadays, but 2 billion rows is normally more than enough.
Small note:
Be careful if you are planning to do this in production. If you do a query like *:* and your index is big, you could be transferring a couple of gigabytes in that query.
If you know you won't have many docs, go ahead and use integer's max value.
On the other hand, if you are doing a one-time script and just need to dump all results (for example document ID's) then this approach is valid, if you don't mind waiting 3-5 minutes for a query to return.
Don't use &rows=2147483647
Don't use Integer.MAX_VALUE (2147483647) as the value of rows in production. This will heavily slow down your query even if you have a small result set, because Solr preallocates a queue of this size. See https://issues.apache.org/jira/browse/SOLR-7580
I strongly suggest using Exporting Result Sets instead:
It’s possible to export fully sorted result sets using a special rank query parser and response writer specifically designed to work together to handle scenarios that involve sorting and exporting millions of records.
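As a rough sketch of what using the /export handler looks like over plain HTTP (assuming a core named mycore and an id field with docValues enabled, which /export requires):
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// /export streams the complete sorted result set; it requires a sort
// and only returns fields backed by docValues.
URL url = new URL("http://localhost:8983/solr/mycore/export?q=*:*&sort=id+asc&fl=id");
try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
    String line;
    while ((line = in.readLine()) != null) {
        // consume the streamed JSON response chunk by chunk
    }
}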
Or I suggest using Deep Paging.
Simple pagination is an easy thing when you have a few documents to read and all you have to do is play with the start and rows parameters. But this is not a feasible way when you have many documents, I mean hundreds of thousands or even millions.
This is the kind of thing that could bring your Solr server to its knees.
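For contrast, simple start/rows paging in SolrJ looks like this (a minimal sketch reusing the question's query; fine for shallow pages, increasingly expensive for deep ones):
SolrQuery q = new SolrQuery("Address:Jack*");
q.setStart(200); // skip the first 200 matches; Solr still has to score and sort them
q.setRows(100);  // return the next 100
QueryResponse page = solrClient.query(q);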
For typical applications displaying search results to a human user, this tends to not be much of an issue since most users don’t care about drilling down past the first handful of pages of search results — but for automated systems that want to crunch data about all of the documents matching a query, it can be seriously prohibitive.
This means that if you have a website and are paging search results, a real user does not go that far, but consider on the other hand what can happen if a spider or a scraper tries to read all the website's pages.
Now we are talking about Deep Paging.
I suggest reading this excellent post:
https://lucidworks.com/post/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
And take a look at this document page:
https://solr.apache.org/guide/pagination-of-results.html
And here is an example that tries to explain how to paginate using cursors.
SolrQuery solrQuery = new SolrQuery();
solrQuery.setRows(500);
solrQuery.setQuery("*:*");
solrQuery.addSort("id", ORDER.asc); // Pay attention to this line: cursorMark requires a sort on the uniqueKey field
String cursorMark = CursorMarkParams.CURSOR_MARK_START;
boolean done = false;
while (!done) {
    solrQuery.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
    QueryResponse rsp = solrClient.query(solrQuery);
    String nextCursorMark = rsp.getNextCursorMark();
    for (SolrDocument d : rsp.getResults()) {
        // process each document here
    }
    if (cursorMark.equals(nextCursorMark)) {
        done = true;
    }
    cursorMark = nextCursorMark;
}
Returning all the results is never a good option, as it would be very slow.
Can you mention your use case?
Also, the Solr rows parameter helps you tune the number of results to be returned.
However, I don't think there is a way to tune rows to return all results; it doesn't take -1 as a value.
So you would need to set a high value for all the results to be returned.
What you should do is to first create a SolrQuery shown below and set the number of documents you want to fetch in a batch.
int lastResult = 0; // this is for processing the next batch
String query = "id:[" + lastResult + " TO *]"; // just considering id for the sake of simplicity
SolrQuery solrQuery = new SolrQuery(query).setRows(500); // setRows sets the batch size; change this to whatever size you want
SolrDocumentList results = solrClient.query(solrQuery).getResults(); // execute this statement
Here I am considering an example of search by id; you can replace it with any of your parameters to search on.
"lastResult" is the variable you change after executing the first batch of 500 records (500 is the batch size); set it to the last id from the results. (You will also want to sort on id, so the last document in each batch has the highest id.)
This will help you execute the next batch starting with the last result from the previous batch.
Hope this helps. Shoot up a comment below if you need any clarification.
For selecting all documents in dismax/edismax via the Solarium PHP client, the normal query syntax *:* does not work. To select all documents, set the default query value in the Solarium query to an empty string. This is required because the default query in Solarium is *:*. Also set the alternative query to *:*. The dismax/edismax normal query syntax does not support *:*, but the alternative query syntax does.
For more details, the following book can be referred to:
http://www.packtpub.com/apache-solr-php-integration/book
As the other answers pointed out, you can configure rows to be the max integer to get back all the results for a query.
I would recommend, though, using Solr's pagination feature, and building a function that returns all the results using the cursorMark API. The gist of it is that you set the cursorMark parameter to '*', set the page size (the rows parameter), and on each result you get a cursorMark for the next page, so you execute the same query, only with the cursorMark obtained from the last result. This gives you more flexibility over how many of the results you want back, in a much more performant way.
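A minimal sketch of such a function, assuming SolrJ and a uniqueKey field named id (exception handling left to the caller):
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

static List<SolrDocument> fetchAll(SolrClient client, String q, int pageSize)
        throws SolrServerException, IOException {
    SolrQuery query = new SolrQuery(q);
    query.setRows(pageSize);
    query.addSort("id", SolrQuery.ORDER.asc); // cursorMark requires a sort on the uniqueKey
    List<SolrDocument> all = new ArrayList<>();
    String cursor = CursorMarkParams.CURSOR_MARK_START; // "*"
    while (true) {
        query.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
        QueryResponse rsp = client.query(query);
        all.addAll(rsp.getResults());
        String next = rsp.getNextCursorMark();
        if (cursor.equals(next)) break; // unchanged cursor means no more results
        cursor = next;
    }
    return all;
}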
The way I dealt with the problem is by running the query twice:
// Start with your (usually small) default page size
solrQuery.setRows(50);
QueryResponse response = solrResponse(query);
if (response.getResults().getNumFound() > 50) {
    solrQuery.setRows((int) response.getResults().getNumFound()); // getNumFound() returns a long; setRows() takes an int
    response = solrResponse(query);
}
It makes two calls to Solr, but gets you all matching records, with a small performance penalty.
query.setRows(Integer.MAX_VALUE);
works for me!!

Adding a projection to an NHibernate criteria stops it from performing default entity selection

I'm writing an NHibernate criteria that selects data supporting paging. I'm using the COUNT(*) OVER() expression from SQL Server 2005(+) to get hold of the total number of available rows, as suggested by Ayende Rahien. I need that number to be able to calculate how many pages there are in total. The beauty of this solution is that I don't need to execute a second query to get hold of the row count.
However, I can't seem to manage to write a working criteria (Ayende only provides an HQL query).
Here's an SQL query that shows what I want and it works just fine. Note that I intentionally left out the actual paging logic to focus on the problem:
SELECT Items.*, COUNT(*) OVER() AS rowcount
FROM Items
Here's the HQL:
select
item, rowcount()
from
Item item
Note that the rowcount() function is registered in a custom NHibernate dialect and resolves to COUNT(*) OVER() in SQL.
A requirement is that the query is expressed using a criteria. Unfortunately, I don't know how to get it right:
var query = Session
.CreateCriteria<Item>("item")
.SetProjection(
Projections.SqlFunction("rowcount", NHibernateUtil.Int32));
Whenever I add a projection, NHibernate doesn't select item (like it would without a projection), just the rowcount(), while I really need both. Also, I can't seem to project item as a whole, only its properties, and I really don't want to list all of them.
I hope someone has a solution to this. Thanks anyway.
I think it is not possible with Criteria; it has its limits.
You could get the id and load items in a subsequent query:
var query = Session
.CreateCriteria<Item>("item")
.SetProjection(Projections.ProjectionList()
.Add(Projections.SqlFunction("rowcount", NHibernateUtil.Int32))
.Add(Projections.Id()));
If you don't like it, use HQL; you can set the maximum number of results there too:
IList<Item> result = Session
    .CreateQuery("select item, rowcount() from Item item where ...")
    .SetMaxResults(100)
    .List<Item>();
Use CreateMultiCriteria.
You can execute 2 simple statements with only one hit to the DB that way.
I am wondering why using Criteria is a requirement. Can't you use session.CreateSQLQuery? If you really must do it in one query, I would have suggested pulling back the Item objects and the count, like:
select {item.*}, count(*) over()
from Item {item}
...this way you can get back Item objects from your query, along with the count. If you experience a problem with NHibernate's caching, you can also configure the query spaces (entity/table caches) associated with a native query so that stale query cache entries will be cleared automatically.
If I understand your question properly, I have a solution. I struggled quite a bit with this same problem.
Let me quickly describe the problem I had, to make sure we're on the same page. My problem came down to paging. I want to display 10 records in the UI, but I also want to know the total number of records that matched the filter criteria. I wanted to accomplish this using the NH criteria API, but when adding a projection for row count, my query no longer worked, and I wouldn't get any results (I don't remember the specific error, but it sounds like what you're getting).
Here's my solution (copy & paste from my current production code). Note that "SessionError" is the name of the business entity I'm retrieving paged data for, according to 3 filter criteria: IsDev, IsRead, and IsResolved.
ICriteria crit = CurrentSession.CreateCriteria(typeof(SessionError))
    .Add(Restrictions.Eq("WebApp", this));
if (isDev.HasValue)
    crit.Add(Restrictions.Eq("IsDev", isDev.Value));
if (isRead.HasValue)
    crit.Add(Restrictions.Eq("IsRead", isRead.Value));
if (isResolved.HasValue)
    crit.Add(Restrictions.Eq("IsResolved", isResolved.Value));
// Order by most recent
crit.AddOrder(Order.Desc("DateCreated"));
// Copy the ICriteria query to get a row count as well
ICriteria critCount = CriteriaTransformer.Clone(crit)
    .SetProjection(Projections.RowCountInt64());
critCount.ClearOrders();
// NOW add the paging vars to the original query
crit = crit
    .SetMaxResults(pageSize)
    .SetFirstResult(pageNum_oneBased * pageSize);
// Set up a multi criteria to get your data in a single trip to the database
IMultiCriteria multCrit = CurrentSession.CreateMultiCriteria()
    .Add(crit)
    .Add(critCount);
// Get the results
IList results = multCrit.List();
List<SessionError> sessionErrors = new List<SessionError>();
foreach (SessionError sessErr in ((IList)results[0]))
    sessionErrors.Add(sessErr);
numResults = (long)((IList)results[1])[0];
So I create my base criteria, with optional restrictions. Then I CLONE it, and add a row count projection to the CLONED criteria. Note that I clone it before I add the paging restrictions. Then I set up an IMultiCriteria to contain the original and cloned ICriteria objects, and use the IMultiCriteria to execute both of them. Now I have my paged data from the original ICriteria (and I only dragged the data I need across the wire), and also a raw count of how many actual records matched my criteria (useful for display or creating paging links, or whatever). This strategy has worked well for me. I hope this is helpful.
I would suggest investigating a custom result transformer by calling SetResultTransformer() on your criteria.
Create a formula property in the class mapping:
<property name="TotalRecords" formula="count(*) over()" type="Int32" not-null="true"/>
IList<...> result = criteria.SetFirstResult(skip).SetMaxResults(take).List<...>();
totalRecords = (result != null && result.Count > 0) ? result[0].TotalRecords : 0;
return result;

What's the best way to count results in GQL?

I figure one way to do a count is like this:
foo = db.GqlQuery("SELECT * FROM bar WHERE baz = 'baz'")
my_count = foo.count()
What I don't like is my count will be limited to 1000 max and my query will probably be slow. Anyone out there with a workaround? I have one in mind, but it doesn't feel clean. If only GQL had a real COUNT Function...
You have to flip your thinking when working with a scalable datastore like GAE: do your calculations up front. In this case, that means you need to keep counters for each baz and increment them whenever you add a new bar, instead of counting at display time.
class CategoryCounter(db.Model):
    category = db.StringProperty()
    count = db.IntegerProperty(default=0)
then when creating a Bar object, increment the counter
def createNewBar(category_name):
    bar = Bar(..., baz=category_name)
    counter = CategoryCounter.all().filter('category =', category_name).get()
    if not counter:
        counter = CategoryCounter(category=category_name, count=1)  # first bar in this category
    else:
        counter.count += 1
    bar.put()
    counter.put()
db.run_in_transaction(createNewBar,'asdf')
now you have an easy way to get the count for any specific category
CategoryCounter.all().filter('category =', category_name).get().count
+1 to Jehiah's response.
The official and blessed method for getting object counters on GAE is to build a sharded counter. Despite the heavy-sounding name, this is pretty straightforward.
Count functions in all databases are slow (e.g., O(n)) - the GAE datastore just makes that more obvious. As Jehiah suggests, you need to store the computed count in an entity and refer to that if you want scalability.
This isn't unique to App Engine - other databases just hide it better, up until the point where you're trying to count tens of thousands of records with each request, and your page render time starts to increase exponentially...
According to the GqlQuery.count() documentation, you can set the limit to be some number greater than 1000:
from models import Troll
troll_count = Troll.all(keys_only=True).count(limit=31337)
Sharded counters are the right way to keep track of numbers like this, as folks have said, but if you figure this out late in the game (like me) then you'll need to initialize the counters from an actual count of objects. But this is a great way to burn through your free quota of Datastore Small Operations (50,000 I think). Every time you run the code, it will use up as many ops as there are model objects.
I haven't tried it, and this is an utter resource hog, but perhaps iterating with .fetch() and specifying the offset would work?
LIMIT = 1000
def count(query):
    result = offset = 0
    gql_query = db.GqlQuery(query)
    while True:
        count = gql_query.fetch(LIMIT, offset)
        if count < LIMIT:
            return result
        result += count
        offset += LIMIT
orip's solution works with a little tweaking:
LIMIT = 1000
def count(query):
    result = offset = 0
    gql_query = db.GqlQuery(query)
    while True:
        count = len(gql_query.fetch(LIMIT, offset))
        result += count
        offset += LIMIT
        if count < LIMIT:
            return result
We now have Datastore Statistics that can be used to query entity counts and other data. These values do not always reflect the most recent changes as they are updated once every 24-48 hours. Check out the documentation (see link below) for more details:
Datastore Statistics
As pointed out by @Dimu, the stats computed by Google on a periodic basis are a decent go-to resource when precise counts are not needed and the % of records is NOT changing drastically during any given day.
To query the statistics for a given Kind, you can use the following GQL structure:
select * from __Stat_Kind__ where kind_name = 'Person'
There are a number of properties returned by this which are helpful:
count -- the number of Entities of this Kind
bytes -- total size of all Entities stored of this Kind
timestamp -- an as of date/time for when the stats were last computed
Example Code
To answer a follow-up question posted as a comment to my answer, I am now providing some sample C# code that I am using, which admittedly may not be as robust as it should be, but seems to work OK for me:
/// <summary>Returns an *estimated* number of entities of a given kind</summary>
public static long GetEstimatedEntityCount(this DatastoreDb database, string kind)
{
    var query = new GqlQuery
    {
        QueryString = $"select * from __Stat_Kind__ where kind_name = '{kind}'",
        AllowLiterals = true
    };
    var result = database.RunQuery(query);
    return (long) (result?.Entities?[0]?["count"] ?? 0L);
}
The best workaround might seem a little counter-intuitive, but it works great in all my appengine apps. Rather than relying on the integer KEY and count() methods, you add an integer field of your own to the datatype. It might seem wasteful until you actually have more than 1000 records, and you suddenly discover that fetch() and limit() DO NOT WORK PAST THE 1000 RECORD BOUNDARY.
class MyObj(db.Model):
    num = db.IntegerProperty()
When you create a new object, you must manually retrieve the highest key:
max = MyObj.all().order('-num').get()
if max: max = max.num + 1
else: max = 0
newObj = MyObj(num=max)
newObj.put()
This may seem like a waste of a query, but get() returns a single record off the top of the index. It is very fast.
Then, when you want to fetch past the 1000th object limit, you simply do:
MyObj.all().filter('num > ' , 2345).fetch(67)
I had already done this when I read Aral Balkan's scathing review: http://aralbalkan.com/1504. It's frustrating, but when you get used to it and you realize how much faster this is than count() on a relational db, you won't mind...
