How to combine two search results effectively?

I'm programming a site in PHP/MySQL that gets search results for products via an API from an external site. The site will also have its own products, and the owners want the two sets of search results merged into a single list.
If someone searches for VIDEO, ordered by date, then the results should all be in order regardless of the source they came from.
e.g.
July 31 - Video A - our database
July 30 - Video B - via API
July 29 - Video C - via API
July 28 - Video D - our database
...
The problem I'm having is figuring out a way to do this efficiently, especially when viewing multiple pages of results. If someone clicks through to the 2nd page, I need to work out the last item from our database and the last item from the API shown on the first page, then fetch only the API items that come after the last API item already shown, do the same for our database results, and re-combine the two sets.
To avoid this complex algorithm, another idea I had was to cap the results at a large number, like 500, grab them all at once, and order them. Then if the user goes forward a few pages, I do not have to re-fetch all the data.
Does anyone have suggestions on good algorithms to use to combine two search results?

Whether or not you cache, you will need to grab at least a page's worth of results from both sources, in case all of the next results come from a single source.
Grabbing a lot of results and caching them (in the session) is one solution you could use.
If for some reason you don't want to cache all the results (if the operation is expensive and you need this optimized), you could store a simple array in the session that contains the location of the results, and then you would know the starting number for the next page.
For example (pseudocode):
**Request 1**
Get 10 results from API
Get 10 results from Database
Merge the results
Display first 10 and save the order to an array
(A for API, D for Database, ex: A,A,A,D,A,D,D,A,D,A)
User clicks page 2
**Request 2** (Page 2)
Get 10 results from API starting at offset 6 (6 API results were shown on page 1)
Get 10 results from Database starting at offset 4 (4 Database results were shown on page 1)
Repeat merge and display above.
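The two requests above could be sketched roughly like this in Python (the same idea carries over to PHP). This is only an illustration under assumptions: `merge_page` and `next_page` are hypothetical names, each result is a dict with an ISO-formatted `date` string, both sources are already sorted newest first, and plain list slices stand in for the real API call and the database query.

```python
def merge_page(api_rows, db_rows, page_size):
    """Merge two newest-first lists; return the page and how many rows each source supplied."""
    page = []
    a = d = 0
    while len(page) < page_size and (a < len(api_rows) or d < len(db_rows)):
        # Take from the API when the database is exhausted or the API row is at least as new.
        take_api = d >= len(db_rows) or (
            a < len(api_rows) and api_rows[a]["date"] >= db_rows[d]["date"]
        )
        if take_api:
            page.append(("A", api_rows[a]))
            a += 1
        else:
            page.append(("D", db_rows[d]))
            d += 1
    return page, {"A": a, "D": d}


def next_page(api_source, db_source, offsets, page_size=10):
    """Fetch one page's worth from each source at its own offset, then merge."""
    # The slices are stand-ins for "get page_size results starting at offset".
    api_rows = api_source[offsets["A"]:offsets["A"] + page_size]
    db_rows = db_source[offsets["D"]:offsets["D"] + page_size]
    page, used = merge_page(api_rows, db_rows, page_size)
    new_offsets = {"A": offsets["A"] + used["A"], "D": offsets["D"] + used["D"]}
    return page, new_offsets
```

The offsets returned by `next_page` are what you would keep in the session: start at `{"A": 0, "D": 0}` and pass the returned value back in on the next request.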
You could also optionally cache what you have needed to retrieve so far (and you will have 10 extra results). This would make the first request longer, but could possibly make the second request much faster.
If the user jumps forward several pages, you would need to fetch, from each source, the largest number of results that could have been displayed in the preceding, unseen pages.
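For a direct jump to page N, the worst case is that every earlier result came from a single source, so N pages' worth from each source is always enough. A rough Python sketch of that, assuming both lists are already sorted newest first and carry ISO-formatted `date` strings (`jump_to_page` is a hypothetical name, and the slices stand in for the real fetches):

```python
import heapq
from itertools import islice


def jump_to_page(api_rows, db_rows, page, page_size):
    """Return page `page` (1-based) of two newest-first lists merged by date."""
    needed = page * page_size  # the most rows either source could contribute so far
    merged = heapq.merge(
        api_rows[:needed],
        db_rows[:needed],
        key=lambda row: row["date"],
        reverse=True,  # inputs are sorted newest first
    )
    # Skip the rows that belong to earlier pages, keep one page.
    return list(islice(merged, needed - page_size, needed))
```

The cost grows with the page number, which is one reason capping and caching the full result list may be simpler when deep jumps are common.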
If you are not too worried about performance from either source, I would retrieve up to a large number like you said and cache all results temporarily. As soon as a new search is executed, dump the old results.

Related

How to stop Heap Analytics grouping assets into "OTHER" Category

I think this might be very simple.
I wrote a query in heap to tell me which users were part of an event and how many times they engaged in it during the year.
The result is a simple table with username and number of occurrences.
It worked. However, Heap has this weird behavior of picking multiple results (maybe at random?) and lumping them into a single "Other (X other results)" row, where X is the number of results that were grouped.
So I end up with a table of 20, maybe 30, users and occurrences, and one row of "Other (X other results)".
I shrunk the query to see results from a smaller subset of dates and the "Other" category disappeared.
I really need to see every individual row in my query results! Even if it's paginated.
Help! Thank you
You can export the result as a CSV. The downloaded file will contain all the results (all single entries without the grouped OTHER).
In the current UI, you can find Export to CSV at the top of the report view.

Google Sheets Array/Filter for Newest Date/Unique ID

I have a sheet "RawCount" with Google Form results that will accumulate over time (people will make entries each week or as their raw number changes).
I need a way to compile the data to obtain the most recent entry for each individual who has entered data via the form:
This data will accumulate with new entries over the period of eight months from up to 100 or more different people.
I would like to sum the most recent entries for each individual onto another tab in the same Google Sheet that contains a scorecard.
Thanks for any help you can offer. I think I've sprained my brain on this.

How to grab filter elements from a large search result when just using smaller pages in the application?

I have a search query that returns a bunch of records, but we're using paging, so we only return subsets of 10, 25, or 50 records per page. Basically the stored procedure goes along these lines:
WITH search_results AS
(
    SELECT model, brandname, msrp,
        ROW_NUMBER() OVER (ORDER BY #sortExpression) AS rowNumber
    FROM models
    WHERE ...criteria....
)
SELECT * FROM search_results
WHERE rowNumber BETWEEN ((#pageNumber-1)*#pageSize)+1 AND ((#pageNumber-1)*#pageSize)+#pageSize
When I use small pages my sproc comes back very quickly, usually in under a second. However, sometimes our users will enter criteria that match a few thousand, potentially up to ten thousand, records. They'll page through and just grab a few at a time, but the full search result is large. The sproc runs quickly when my page size is small, but when I increase it, it takes a few seconds, which is too long.
This is all fine, since I am using smaller pages. The problem is that part of our solution is a filter. This filter lists all of the brands, categories, and 4 price ranges for the full search results. When the user clicks filter, it takes the lowest and the highest price, breaks that span into 4 equal-sized ranges, and shows them on the form with checkboxes. The user can then check the price ranges, brands, and categories they want to filter by. This re-submits the search with the new criteria.
I'm not sure how to return the full set of brands, categories, and highest/lowest price without running the main query (the one in the WITH) twice. Does it make sense to dump all of that into a temporary table and then return multiple recordsets to my business object: the results, the brand list, the category list, and then the MIN and MAX prices? Is there a pattern for returning filter information for search results like this?
There's no standard pattern for this, and the answer to the temp table question is maybe. Try putting the full raw result in a temp table and using it to return multiple record sets, then test whether it performs better. Doing this, you are (in general) using more memory and less CPU. In performance tuning there are often trade-offs where you can exchange memory, I/O, and CPU use to speed things up.
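To make the temp table idea concrete, here is a small self-contained sketch using Python's built-in `sqlite3` module as a stand-in for the real database (in SQL Server the equivalent would be a `#temp` table or table variable inside the sproc). Everything beyond the `models` columns from the question is an assumption: the function names, the single `min_msrp` criterion, and the sort order are hypothetical.

```python
import sqlite3


def search_with_facets(conn, min_msrp, page_number, page_size):
    """Run the expensive search once, then read the page and the filter data from the copy."""
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS temp.search_results")
    cur.execute("CREATE TEMP TABLE search_results (model TEXT, brandname TEXT, msrp REAL)")
    # The main search runs exactly once (min_msrp is a placeholder for the real criteria).
    cur.execute(
        "INSERT INTO search_results "
        "SELECT model, brandname, msrp FROM models WHERE msrp >= ?",
        (min_msrp,),
    )
    # Record set 1: the requested page.
    page = cur.execute(
        "SELECT model, brandname, msrp FROM search_results "
        "ORDER BY msrp, model LIMIT ? OFFSET ?",
        (page_size, (page_number - 1) * page_size),
    ).fetchall()
    # Record set 2: every brand in the full result, not just the page.
    brands = [row[0] for row in cur.execute(
        "SELECT DISTINCT brandname FROM search_results ORDER BY brandname")]
    # Record set 3: price bounds for the filter.
    low, high = cur.execute("SELECT MIN(msrp), MAX(msrp) FROM search_results").fetchone()
    return page, brands, (low, high)


def price_ranges(low, high, buckets=4):
    """Split [low, high] into equal-width ranges for the filter checkboxes."""
    step = (high - low) / buckets
    return [(low + i * step, low + (i + 1) * step) for i in range(buckets)]
```

A category list would be one more `SELECT DISTINCT` against the same temp table. Whether this beats running the search twice depends on how expensive the criteria are, so it is worth measuring both.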

Paging with a data set that can be changing?

I'm sure there is something out there about this topic but I just can't figure out how to word a search for it.
I have a table of records that gets loaded into a paging grid in the UI. The user can update/modify these records, and multiple users can use the system at once, all hitting the same data. I have a filter on the paging grid that lets the user see only records of type X.
When the user first enters with filter X selected, they see items 1-25 on page 1 of 2. They page to the second page, where the items should be 26-50. But before they paged, let's say the 25 records on the first page had their type changed by another user, so they no longer match the filter. Now there are 25 fewer items to page through, which means the items that were 26-50 are now items 1-25, what was page 2 is now page 1, and there is no page 2.
You can probably see the issue I'm running into: I'm passing an offset to the query to get the next page of results, but that offset is now so high that it returns a blank page of records, confusing the user and our record processing.
There isn't really an easy solution to this problem. Even GMail/Google doesn't show the exact number of messages/pages found when searching something.
The first thing you can do (if you use a DataGrid/CellTable) is set the boolean exact as false when you call updateRowCount, and give it your current number of records instead of your total number of records. This will make the pager display "1 - 25 of over 25" instead of "1 - 25 of 50".
The next possibility is to update the row count regularly (using RPC polling to check for new/deleted records - or using server push techniques, see GWTEventService and ServerPushFAQ).
You can also check if your request returns items or not, and cancel the call/update the row count if it doesn't.
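The last suggestion can also be handled on the server side by clamping the requested page to the pages that still exist. A minimal, framework-agnostic sketch in Python, where `fetch_page` is a hypothetical helper and the `rows` list stands in for a fresh count query plus an offset query against the current data:

```python
def fetch_page(rows, page_number, page_size):
    """Return (actual_page_number, items), clamped to the pages that still exist."""
    total = len(rows)  # re-counted on every request, since other users may change the data
    last_page = max(1, -(-total // page_size))  # ceiling division, at least one page
    actual_page = min(max(1, page_number), last_page)
    start = (actual_page - 1) * page_size
    return actual_page, rows[start:start + page_size]
```

The caller compares the returned page number with the one it asked for and updates the pager accordingly, so the user lands on the last real page instead of a blank one.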

Google App Engine: How do you count query results given an offset?

I'm implementing pagination on sorted models and, as it stands, my queries are fetching way too much data. The way I display the links to the pages is similar to Google: the current page is highlighted and there is a surrounding "padding" of pages that you can navigate to. For instance, if there are 20 pages, the padding is 5, and you're on page 10, the page links would look like this:
... 5 6 7 8 9 [10] 11 12 13 14 15 ...
The thing is, I need to calculate the number of pages AFTER the current page in order to know how many page links past the current page should be shown. To do this, I sum the number of items I would need for the padding pages, plus the current page, plus one (to know whether to show the "..."), and then fetch these results. This results in a massive query of results that, ultimately, I only need a small subset of.
Google App Engine's API provides a count() function that returns the number of results a query would fetch. However, it does not allow me to specify an offset.
How do I work around this problem?
I'm considering fetching the first item on the next page after the current page, then executing count() on another query that sorts on the values of that item, if that makes sense. Am I on the right track or am I missing something completely? I'm relatively new to app engine, so go easy! Thanks :)
UPDATE:
Thank you, Peter.
Cursors are indeed the appropriate approach to use. Here's an example for anyone looking to accomplish the same:
# For example, the following query has 27 results.
book_query = Book.all().filter("name_lowercase < ", "b" )
# Let's fetch 10 books starting at offset 0...
r = book_query.fetch(10, 0)
# This returns a cursor to the book after the last fetched result, index 10
c = book_query.cursor()
# Now let's count the number of results after our fetch, limit 100.
# To use cursors, the query must be exactly the same.
book_query2 = Book.all().filter("name_lowercase < ", "b" ).with_cursor(c)
book_query2.count(100) # Returns 17
I haven't used them yet, but I believe Query Cursors are what you are looking for.
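Once the count after the cursor is known, turning it into page links is plain arithmetic. A rough sketch, where `page_links` is a hypothetical helper and `remaining_after` is assumed to be the value returned by count() with a limit of padding * page_size + 1:

```python
import math


def page_links(current_page, remaining_after, page_size, padding=5):
    """Return the page numbers to link and whether to show a trailing '...'."""
    # Pages that actually exist after the current one, capped at the padding.
    pages_after = min(padding, math.ceil(remaining_after / page_size))
    first = max(1, current_page - padding)
    last = current_page + pages_after
    # More results than the padding can show means there are further pages.
    more_after = remaining_after > padding * page_size
    return list(range(first, last + 1)), more_after
```

With the numbers from the update (10 books fetched, 17 remaining, 10 per page), `page_links(1, 17, 10)` gives links for pages 1 to 3 and no trailing "...".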
