I'm implementing pagination on sorted models and, as it stands, my queries are fetching way too much data. The way I display the links to the pages is similar to Google: the current page is highlighted and there is a surrounding "padding" of pages that you can navigate to. For instance, if there are 20 pages, the padding is 5, and you're on page 10, the page links would look like this:
... 5 6 7 8 9 [10] 11 12 13 14 15 ...
The thing is, I need to calculate the number of pages AFTER the current page in order to know how many page links past the current page to show. To do this, I sum the number of items I would need for the padding pages, plus the current page, plus one (to know whether to show the trailing "..."), and then fetch that many results. This results in a massive query whose results I ultimately need only a small subset of.
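To make the waste concrete, here's roughly what the current approach fetches (a sketch with made-up numbers, not my actual code):

# Illustrative values only -- page_size, padding and current_page are
# stand-ins, not names from my real app.
page_size = 10
padding = 5
current_page = 10

# To draw the links after the current page, I fetch everything through the
# end of the padding window, plus one extra item to decide whether to show
# the trailing "...".
items_fetched = (current_page + padding) * page_size + 1  # 151 items
items_displayed = page_size                               # only 10 are shown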
Google App Engine's API provides a count() function that returns the number of results a query would fetch. However, it does not allow me to specify an offset.
How do I work around this problem?
I'm considering fetching the first item on the next page after the current page, then executing count() on another query that sorts on the values of that item, if that makes sense. Am I on the right track or am I missing something completely? I'm relatively new to app engine, so go easy! Thanks :)
UPDATE:
Thank you, Peter.
Cursors are indeed the appropriate approach to use. Here's an example for anyone looking to accomplish the same:
# For example, the following query has 27 results.
book_query = Book.all().filter("name_lowercase <", "b")
# Let's fetch 10 books starting at offset 0...
r = book_query.fetch(10, 0)
# This returns a cursor pointing just past the last fetched result (index 10).
c = book_query.cursor()
# Now let's count the number of results after our fetch, limit 100.
# To use a cursor, the query must be exactly the same as the original.
book_query2 = Book.all().filter("name_lowercase <", "b").with_cursor(c)
book_query2.count(100)  # Returns 17 (27 total results - 10 already fetched)
I haven't used them yet, but I believe Query Cursors are what you are looking for.
I think this might be very simple.
I wrote a query in Heap to tell me which users were part of an event and how many times they engaged in it during the year.
The result is a simple table with username and number of occurrences.
It worked. However, Heap has this weird behavior of choosing multiple results (maybe at random?) and throwing them into a single "Other (X other results)" category, where X is the number of grouped results.
So I end up with a table of maybe 20 or 30 users and occurrences, plus one row of "Other (X other results)".
I shrunk the query to see results from a smaller subset of dates and the "Other" category disappeared.
I really need to see every individual row in my query results! Even if it's paginated.
Help! Thank you
You can export the result as a CSV. The downloaded file will contain all the results (every individual entry, without the grouped "Other").
In the current UI, you can find Export to CSV at the top of the report view.
I am currently working on a requirement as follows and would appreciate some help in figuring out a way to configure the aggregation of my measure:
I have a fact table that contains the following columns: ItemID, DateID, StoreID, ReceivedComments. The way received comments work is that on a daily basis a new record is created that adds to the running total of received comments (for example, if Item 5 in Store 5 had 23 received comments on Jan 1 and it received 5 comments the following day, the row for Jan 2 would be Item 5, Store 5, Jan 2, 28).
We created a measure using MAX, and it works fine whenever Item ID is used in the query. When we start moving to a higher level, the MAX produces wrong results. Our requirement is to set up the measure as follows:
If the member selected is on the Item level, then MAX; if it's on any other level (Date or Store), then the measure should aggregate the MAX of all Items under that date or store.
Due to the business rules and structure of the database, Store and Item are different dimensions, so I cannot include them in one hierarchy.
We have been playing around with Custom RollUps but so far haven't been able to get it to work.
Thanks
I would solve this by using a more traditional approach to your fact table. Instead of keeping a cumulative count in the ReceivedComments column, I would keep only the number of comments received THAT DAY.
That way, instead of using MAX, you can create your measure using SUM, and it will automatically rollup when you go to higher levels.
The only disadvantage I can see to this approach is that you will need to use a range of dates, instead of only the most recent date, to get a full total of all the comments for a given item/store/date. But that's a very small change to your MDX.
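To illustrate the restructuring, here is a rough sketch (hypothetical data, assuming the rows for each item/store arrive ordered by date; this is not your actual ETL):

# Cumulative rows for one item/store, using the numbers from the question
# (23 comments on Jan 1, 28 on Jan 2); the Jan 3 row is made up.
rows = [
    ("Item 5", "Store 5", "Jan 1", 23),
    ("Item 5", "Store 5", "Jan 2", 28),
    ("Item 5", "Store 5", "Jan 3", 30),
]

# Convert running totals into per-day counts. With real data this delta
# would be computed per (item, store) group.
daily = []
previous = 0
for item, store, date, cumulative in rows:
    daily.append((item, store, date, cumulative - previous))
    previous = cumulative

# daily is now [("Item 5", "Store 5", "Jan 1", 23),
#               ("Item 5", "Store 5", "Jan 2", 5),
#               ("Item 5", "Store 5", "Jan 3", 2)] -- per-day values that
# a plain SUM measure rolls up correctly at any level.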
Someone suggested using ISLEAF to determine the level. Instead of using ISLEAF, I went with CASE WHEN [Item].[ItemID].CURRENTMEMBER.LEVEL IS [Item].[ItemID].[(All)], so that I don't have to account for other dimensions such as Date and Store; I have several other dimensions that all behave the same way.
I then went with this formula to get the sum of the MAX of the items in a particular store:
SUM({[Item].[Item ID].children}, [Measures].[ReceivedComments])
I expect some performance issues with this measure, but we are currently running tests to see whether it will be reliable to work with on actual data.
I have a search query that returns a bunch of records but we're using paging so we only return back 10, 25, or 50 page subsets of the data. Basically the stored procedure goes along these lines.
WITH search_results AS
(
    SELECT model, brandname, msrp,
           ROW_NUMBER() OVER (ORDER BY @sortExpression) AS rowNumber
    FROM models
    WHERE ...criteria...
)
SELECT * FROM search_results
WHERE rowNumber BETWEEN ((@pageNumber - 1) * @pageSize) + 1
                    AND ((@pageNumber - 1) * @pageSize) + @pageSize
When I use small pages, my sproc comes back very quickly, usually in under a second. However, sometimes our users will enter criteria that may return a few thousand to potentially ten thousand records. They'll page through and just grab a few at a time, but the underlying result set is large. The sproc runs quickly when my page size is small, but when I increase it, it takes a few seconds, which is too long.
This is all fine while I am using smaller pages. The problem is that part of our solution is a filter. This filter lists all of the brands, categories, and four price-range quadrants for the full search results. When the user clicks filter, it takes the lowest and highest prices, breaks that range into four equal-sized groupings, and puts them on the form as checkboxes. The user can then check the price ranges, brands, and categories they want to filter by, which re-submits the search with the new criteria.
I'm not sure how to return the full set of brands, categories, and highest/lowest prices without running the main query (in the WITH) twice. Does it make sense to dump all of that into a temporary table and then return multiple recordsets to my business object - the results, the brand list, the category list, and then the MIN and MAX prices? Is there a pattern for returning filter information for search results like this?
The answer to "is there a pattern" is no, and to "does it make sense" - maybe. Try putting the raw, full result set in a temp table and using it to return multiple recordsets. Test it and see if it works better. Doing this, you are (in general) using more memory and less CPU. In the tuning business there are sometimes trade-offs where you can exchange memory/IO/CPU use to speed things up.
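For a rough, runnable illustration of the pattern (sketched here with SQLite in Python purely to show the shape; in your case it would be a T-SQL proc filling a #temp table and returning multiple recordsets, and every name below is made up):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE models (model TEXT, brandname TEXT, category TEXT, msrp REAL);
    INSERT INTO models VALUES
        ('M1', 'Acme',   'Video', 100.0),
        ('M2', 'Acme',   'Audio', 250.0),
        ('M3', 'Globex', 'Video', 400.0);
""")

# Run the expensive search once, into a temp table.
conn.execute("""
    CREATE TEMP TABLE search_results AS
    SELECT model, brandname, category, msrp
    FROM models
    WHERE msrp > 0  -- stand-in for the real criteria
""")

# Recordset 1: just the requested page.
page = conn.execute(
    "SELECT * FROM search_results ORDER BY msrp LIMIT ? OFFSET ?", (10, 0)
).fetchall()

# Recordsets 2 and 3: the filter lists, reusing the same temp table.
brands = conn.execute("SELECT DISTINCT brandname FROM search_results").fetchall()
categories = conn.execute("SELECT DISTINCT category FROM search_results").fetchall()

# Recordset 4: price bounds for building the four quadrants.
lo, hi = conn.execute("SELECT MIN(msrp), MAX(msrp) FROM search_results").fetchone()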
I'm sure there is something out there about this topic but I just can't figure out how to word a search for it.
I have a table of records that gets loaded into a paging grid in the UI. The user has the ability to update/modify these records. Also, multiple users can use the system at once, all hitting the same data. I have a filter on the paging grid allowing the user to see only X type of records.
When the user first enters with filter X selected, they see items 1-25 on page 1 of 2. They page to the second page, where the items should be 26-50. But say that before they paged, 25 records on the first page had their type changed by another user, so they no longer appear under that filter. Now there are 25 fewer items to page through, which means the items that were 26-50 are now items 1-25, what was page 2 is now page 1, and there is no page 2.
You can probably see the issue I'm getting into: I'm passing an offset to the query to get the next page of results, but now that offset is so high that it returns a blank page of records, confusing both the user and our record processing.
There isn't really an easy solution to this problem. Even GMail/Google doesn't show the exact number of messages/pages found when searching something.
The first thing you can do (if you use a DataGrid/CellTable) is to set the boolean exact to false when you call updateRowCount, and give it your current number of records instead of your total number of records. This will make the pager display "1 - 25 of over 25" instead of "1 - 25 of 50".
The next possibility is to update the row count regularly (using RPC polling to check for new/deleted records - or using server push techniques, see GWTEventService and ServerPushFAQ).
You can also check whether your request returns items or not, and cancel the call or update the row count if it doesn't.
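Server-side, that last option could look roughly like this (a generic Python sketch, not GWT code; every name is illustrative):

def fetch_page(filtered_records, offset, page_size):
    # filtered_records is the freshly filtered, ordered result list;
    # offset is whatever (possibly stale) offset the client sent.
    total = len(filtered_records)
    if offset >= total and total > 0:
        # Another user shrank the result set since the client last paged;
        # snap back to the last non-empty page instead of returning a
        # confusing blank page.
        last_page = (total - 1) // page_size
        offset = last_page * page_size
    rows = filtered_records[offset:offset + page_size]
    # Return the corrected offset and total too, so the client can fix its pager.
    return rows, offset, total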
I'm programming a site in PHP/MySQL that gets search results for products via an API from an external site. This site will also have its own products, and the owners of the site want the two sets of search results to be interleaved.
If someone searches for VIDEO, ordered by date, then the results should all be in date order regardless of which source each one came from.
eg.
July 31 - Video A - our database
July 30 - Video B - via API
July 29 - Video C - via API
July 28 - Video D - our database
...
The problem I'm having is figuring out a way to do this effectively, especially with regard to viewing multiple pages of results. If someone clicks to the second page of results, I need to figure out the last item on the first page (and the last item that came from the API), fetch the next API items starting after that last viewed API item, do the same for our database results, and then re-combine them again.
To avoid this complex algorithm, another idea I had was to limit the results to a large amount - say 500 - grab them all at once, and order them. Then if the user goes forward a few pages, I do not have to re-grab all the data.
Does anyone have suggestions on good algorithms to use to combine two search results?
Whether you use it for caching or not, you will need to grab at least a full page's worth of results from both sources, since in the worst case all of the next page's results will come from a single source.
Grabbing a lot of results and caching them (in the session) is one solution you could use.
If for some reason you don't want to cache all the results (if the operation is expensive and you need this optimized), you could store a simple array in the session that contains the location of the results, and then you would know the starting number for the next page.
For example (pseudocode):
**Request 1**
Get 10 results from API
Get 10 results from Database
Merge the results
Display first 10 and save the order to an array
(A for API, D for Database, ex: A,A,A,D,A,D,D,A,D,A)
User clicks page 2
**Request 2** (Page 2)
Get 10 results from API starting at offset 6 (six A's were shown on page 1)
Get 10 results from Database starting at offset 4 (four D's were shown on page 1)
Repeat the merge and display as above.
You could also optionally cache what you have needed to retrieve so far (and you will have 10 extra results). This would make the first request longer, but could possibly make the second request much faster.
If the user jumps forward several pages, you would need to get the largest number of results that could have been displayed in the preceding unknown pages from each source.
If you are not too worried about performance from either source, I would retrieve up to a large number like you said and cache all results temporarily. As soon as a new search is executed, dump the old results.
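As a minimal sketch of the merge-and-track idea (written in Python for brevity rather than PHP; fetch_api and fetch_db are hypothetical stand-ins for the real API and database calls):

def merged_page(fetch_api, fetch_db, api_offset, db_offset, page_size=10):
    # fetch_api / fetch_db take (offset, limit) and return lists of dicts
    # with a sortable "date" key, newest first.
    # Grab a full page from each source, since in the worst case the whole
    # next page comes from one of them.
    api_items = [("A", item) for item in fetch_api(api_offset, page_size)]
    db_items = [("D", item) for item in fetch_db(db_offset, page_size)]

    # Merge newest-first and keep one page's worth.
    merged = sorted(api_items + db_items,
                    key=lambda pair: pair[1]["date"], reverse=True)
    page = merged[:page_size]

    # Count how many results each source contributed; store these in the
    # session as the per-source offsets for the next page.
    used_api = sum(1 for source, _ in page if source == "A")
    used_db = len(page) - used_api

    return ([item for _, item in page],
            api_offset + used_api,
            db_offset + used_db)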