How to grab filter elements from a large search result when just using smaller pages in the application? - sql-server

I have a search query that returns a bunch of records, but we're using paging, so we only return 10, 25, or 50 rows per page. Basically, the stored procedure goes along these lines:
WITH search_results AS
(
    SELECT model, brandname, msrp,
           ROW_NUMBER() OVER (ORDER BY @sortExpression) AS rowNumber
    FROM models
    WHERE ...criteria....
)
SELECT * FROM search_results
WHERE rowNumber BETWEEN ((@pageNumber-1)*@pageSize)+1 AND ((@pageNumber-1)*@pageSize)+@pageSize
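The BETWEEN bounds are the part of this procedure where an off-by-one error is easiest to make. As a quick sanity check, here is a small Python sketch that applies the same arithmetic to an in-memory list. The helper names are hypothetical and are not part of the original procedure.

```python
def page_bounds(page_number, page_size):
    """Inclusive (first, last) rowNumber for a 1-based page number,
    using the same arithmetic as the BETWEEN clause."""
    first = (page_number - 1) * page_size + 1
    last = (page_number - 1) * page_size + page_size
    return first, last


def slice_page(ordered_rows, page_number, page_size):
    """Apply those bounds to an already-sorted Python list.
    ROW_NUMBER() starts at 1 while list indexes start at 0, hence first - 1."""
    first, last = page_bounds(page_number, page_size)
    return ordered_rows[first - 1:last]


print(page_bounds(3, 25))  # (51, 75)
```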
When I use small pages, my sproc comes back very quickly, usually in under a second. However, sometimes our users enter criteria that return anywhere from a few thousand to as many as ten thousand records. They page through and grab only a few at a time, but the full result set is large. The sproc runs quickly when the page size is small, but when I increase the page size it takes a few seconds, which is too long.
That part is fine, because I am using smaller pages. The problem is that part of our solution is a filter. This filter lists all of the brands, categories, and four price-range quadrants for the full search results. When users click the filter, it takes the lowest and the highest price, breaks that span into four equal-sized groups, and shows them on the form with checkboxes. The user can then check the price ranges, brands, and categories they want to filter on. This re-submits the search with the new criteria.
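The quadrant step described above (take the lowest and highest price, then split that span into four equal groups) can be sketched in Python as follows. The function name and the half-open range convention are assumptions for illustration, not taken from the application.

```python
def price_ranges(prices, buckets=4):
    """Split [min(prices), max(prices)] into `buckets` equal-width ranges.

    Returns (low, high) tuples. Adjacent ranges share an edge, so a filter
    would treat each one as low <= price < high, except that the last range
    also includes the maximum price.
    """
    if not prices:
        return []
    low, high = min(prices), max(prices)
    width = (high - low) / buckets
    edges = [low + width * i for i in range(buckets)] + [high]
    return list(zip(edges[:-1], edges[1:]))
```

For example, `price_ranges([100, 180, 300, 500])` yields the four ranges 100-200, 200-300, 300-400 and 400-500. For money values, `decimal.Decimal` would avoid float rounding at the edges.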
I'm not sure how to return the full set of brands, categories, and highest/lowest prices without running the main query (the one in the WITH) twice. Does it make sense to dump all of that into a temporary table and then return multiple recordsets to my business object: the results, the brand list, the category list, and then the MIN and MAX prices? Is there a pattern for returning filter information for search results like this?

The short answers: no, there's no standard pattern, and maybe (for the temp table). Try putting the raw, large result in a temp table and use it to return multiple recordsets. Test it and see if it performs better. In general, doing this uses more memory and less CPU. In the tuning business there are sometimes trade-offs where you exchange memory, I/O, or CPU use to speed things up.
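Here is a minimal Python sketch of the temp-table idea, assuming the full search result has already been produced once in the requested sort order. The page and the three pieces of filter data (brands, categories, min/max price) are all derived from the same materialized rows. On the server, that corresponds roughly to a `SELECT ... INTO #results` followed by several `SELECT`s against `#results`. The `Model` class and the sample rows are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Model:
    model: str
    brand: str
    category: str
    msrp: float


def search(all_results, page_number, page_size):
    """Materialize the expensive search once, then derive the page and the
    filter data from the same rows instead of re-running the query."""
    rows = list(all_results)  # stand-in for the temp table; assumed already sorted
    start = (page_number - 1) * page_size
    page = rows[start:start + page_size]
    brands = sorted({r.brand for r in rows})
    categories = sorted({r.category for r in rows})
    prices = [r.msrp for r in rows]
    price_span = (min(prices), max(prices)) if prices else None
    # Four "recordsets" handed back to the business object together.
    return page, brands, categories, price_span


results = [
    Model("X100", "Acme", "Laptop", 899.0),
    Model("Z7", "Bolt", "Tablet", 399.0),
    Model("X200", "Acme", "Laptop", 1299.0),
]
page, brands, categories, price_span = search(results, page_number=1, page_size=2)
```

Whether this is faster than running the aggregate queries separately depends on the data and the server. As the answer says, it trades memory for CPU and should be measured.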

Related

How to stop Heap Analytics grouping assets into "OTHER" Category

I think this might be very simple.
I wrote a query in heap to tell me which users were part of an event and how many times they engaged in it during the year.
The result is a simple table with username and number of occurrences.
It worked. However, Heap has this weird behavior of picking multiple results (maybe at random?) and lumping them into a single "Other (X other results)" row, where X is the number of grouped results.
So I end up with a table of 20, maybe 30, users and occurrences, plus one row of "Other (X other results)".
I shrunk the query to see results from a smaller subset of dates and the "Other" category disappeared.
I really need to see every individual row in my query results! Even if it's paginated.
Help! Thank you
You can export the result as a CSV. The downloaded file will contain all the results (all single entries without the grouped OTHER).
In the current UI, you can find Export to CSV at the top of the report view.

Generating Working Hours using SQL Server Query

I have this data and I need to generate a query that will give the output below
You can do this kind of row grouping with two separate row_number()s: one over all the data ordered by date, and a second one ordered by code and date. To separate the groups, use the difference between these two row_number()s; when it changes, a new block of data starts. You can then group by the code and that difference, and take the minimum / maximum dates for each group.
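This two-row_number() approach is commonly called "gaps and islands". Since the question's data isn't shown, the Python sketch below emulates both numberings on hypothetical (hour, code) rows, assuming each time value appears only once. It groups on the code together with the difference, because two different codes can end up with the same difference.

```python
def islands(rows):
    """rows: (time, code) pairs with unique times.
    Returns (code, first_time, last_time) for each run of consecutive
    rows that share a code, ordered by first_time."""
    # rn_all: ROW_NUMBER() OVER (ORDER BY time)
    rn_all = {row: n for n, row in enumerate(sorted(rows), start=1)}
    # rn_code: ROW_NUMBER() OVER (ORDER BY code, time)
    by_code = sorted(rows, key=lambda r: (r[1], r[0]))
    rn_code = {row: n for n, row in enumerate(by_code, start=1)}

    spans = {}
    for row in rows:
        time, code = row
        # The difference is constant inside a run and changes between runs.
        key = (code, rn_all[row] - rn_code[row])
        first, last = spans.get(key, (time, time))
        spans[key] = (min(first, time), max(last, time))

    return sorted(((code, first, last) for (code, _), (first, last) in spans.items()),
                  key=lambda t: t[1])


# Hypothetical hourly readings: W = working, B = break.
readings = [(1, "W"), (2, "W"), (3, "B"), (4, "B"), (5, "W"), (6, "W"), (7, "W")]
print(islands(readings))  # [('W', 1, 2), ('B', 3, 4), ('W', 5, 7)]
```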
For the final layout you can use PIVOT or SUM + CASE; most likely you'll want a new row_number() to align the rows properly. Depending on whether data can be missing or unmatched, you'll probably need additional checks.

couchdb map / reduce multiple keys filtering by date

I have a view set up with map/reduce. Right now this code works great:
function(doc) {
if (doc.type == 'test'){
if(doc.trash != 1){
for (var id in doc.items) {
emit([id,doc.items[id].name], 1);
}
}
}
}
function(keys, values, rereduce) {
return sum(values);
}
I get results, and when I use the group parameter, everything is condensed just fine.
My question: I want to add a third key, DATE, so that I only reduce records from certain dates. For example:
function(doc) {
if (doc.type == 'test'){
if(doc.trash != 1){
for (var id in doc.items) {
emit([doc.date,id,doc.items[id].name], 1);
}
}
}
}
My issue is that since the date is at the beginning of the array, the reduce groups by date, id, etc. I know I can use group_level to take just the first key from the array, or the first two keys, but that doesn't help either because, as far as I know, group_level goes from left to right in the array. I could put the date at the end of the emitted array, but that doesn't help either, because I need values at the beginning of my startkey and endkey to search on.
Here is an example of the output of data:
{"key":["2012-03-13","356752b8a5f6871f3","Apple"],"value":1},
{"key":["2012-03-20","123752b8a76986857","Pear"],"value":1},
{"key":["2012-04-12","3013531de05871194","Grapefruit"],"value":1},
{"key":["2012-04-12","356752b8a5f6871f3","Apple"],"value":1},
I want all the APPLE rows added up into one row; here it's adding up apples by date first. I was able to add up all the apples by removing DATE as the first key in the array, but then I can't search by date range.
Any ideas on how to accomplish this?
If I correctly understand what you want to do, then you'd want to put the date as the first element of your array, and use group_level as well as start_key and end_key.
E.g. startkey=[1, "someid"] endkey=[1,"someid",{}] group_level=2
This will get you all items from date 1 (obviously choose your own format here), with id "someid" and any name. It seems funny that you emit IDs before names, and without more information about what you're actually trying to accomplish, it's hard to advise on your general data model. If ID is a "type" ID, meaning that many items share the same ID, then this makes sense. If ID is a unique per-item ID, then it does not; in that case, you'd want to emit the name before the ID...
Edit 1
As per your comment, to do a range of dates you do this:
startkey=[1] endkey=[5,{}] group_level=2
You will get everything from date 1 to date 5 grouped by id, i.e. apples, oranges, etc. I use this exact technique in a very large-scale production application. I formatted the dates as easily human-readable integers of the form yyyymmdd (e.g. 20140624), so they sort chronologically. If I want everything from the start of the month until now grouped by my group IDs, I call
startkey=[20140601] endkey=[20140624,{}] group_level=2
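In CouchDB this conversion would happen in the JavaScript map function before `emit`. The Python sketch below only illustrates the yyyymmdd encoding and why its numeric order matches date order. `day_key` is a hypothetical helper name.

```python
from datetime import date


def day_key(d):
    """Encode a date as a yyyymmdd integer, e.g. June 24, 2014 -> 20140624.
    Year outweighs month, which outweighs day, so numeric order is date order."""
    return d.year * 10000 + d.month * 100 + d.day


start, end = day_key(date(2014, 6, 1)), day_key(date(2014, 6, 24))
mid_june = day_key(date(2014, 6, 15))
print(start <= mid_june <= end)  # True
```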
It works perfectly and as far as I can tell that's what you're looking to do. I also have a third key layer "detail" which allows me to provide a deeper level of grouping for items that need it. I can then call
startkey=[20140601, "someid"] endkey=[20140624, "someid",{}] group_level=3
That drills down to the detail level for a particular id; alternatively, I use the previous query with group_level=3 if I want the details for every id. I'm certain you can make this work. I've solved this exact problem in a production application using the techniques described.
Edit 2
If you want to group all apples regardless of date, then you'll need to make apples the first element in the key. You can then get all apples over all time as a single row in the view result using group_level=1, and apples over a date range using group_level=2. The difference here is that you'll only be able to do the group_level=2 query on a single item type at a time. If you want the best of both worlds, you unfortunately just need to make two views. That's just how key ordering works... If you need fast response times for both types of queries (all item types over a date range, and all of a particular item regardless of date), I believe two views is the only way to achieve that.
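The key-ordering trade-off is easier to see with a quick emulation. The Python sketch below is not CouchDB code. It is a rough stand-in for a reduced view using `_sum`, with a hypothetical `query` helper. `end_prefix` plays the role of an endkey ending in `{}`, and `group_level` truncates keys before summing. It uses the sample rows from the question. Python tuple ordering only approximates CouchDB collation, but the two match here because every key element is a string.

```python
# Sample rows from the question: key = (date, id, name), value = 1
ROWS = [
    (("2012-03-13", "356752b8a5f6871f3", "Apple"), 1),
    (("2012-03-20", "123752b8a76986857", "Pear"), 1),
    (("2012-04-12", "3013531de05871194", "Grapefruit"), 1),
    (("2012-04-12", "356752b8a5f6871f3", "Apple"), 1),
]


def query(rows, startkey, end_prefix, group_level):
    """Rough emulation of a reduced view query with a _sum reduce.
    Keys sorting before startkey are skipped; a key is kept while its first
    len(end_prefix) elements sort <= end_prefix (like endkey=[..., {}])."""
    totals = {}
    for key, value in sorted(rows):
        if key < startkey or key[:len(end_prefix)] > end_prefix:
            continue
        group = key[:group_level]
        totals[group] = totals.get(group, 0) + value
    return sorted(totals.items())


# View 1: date first, so date ranges work but apples stay split by date.
by_date = query(ROWS, ("2012-03-01",), ("2012-04-30",), group_level=2)

# View 2 (hypothetical second view): emit [name, date] instead.
ROWS_BY_NAME = [((k[2], k[0]), v) for k, v in ROWS]
apples_all_time = query(ROWS_BY_NAME, ("Apple",), ("Apple",), group_level=1)
```

With the date first, the two Apple rows land in separate (date, id) groups. With the name first, group_level=1 gives a single Apple total of 2, which is why answering both kinds of question quickly needs two views.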
Note
Another thing to note concerns your reduce function. Wherever possible, it is highly recommended that you use the built-in reduce functions. They're implemented in Erlang and are highly optimized compared to custom JavaScript reduce functions.
In your case, just replace your reduce function with this
_sum
Easy hey?
If you post more info about your application, data model etc. then I'd be happy to help out more with your database design.

How to combine two search results effectively?

I'm programming a site in PHP/MySQL that gets product search results via an API from an external site. The site will also have its own products, and the owners want the search results to be interconnected.
If someone searches for VIDEO, ordered by date, then the results should all be in order regardless of the source they came from.
E.g.:
July 31 - Video A - our database
July 30 - Video B - via API
July 29 - Video C - via API
July 28 - Video D - our database
...
The problem I'm having is figuring out a way to do this effectively, especially when viewing multiple pages of results. If someone clicks to the second page of results, I need to figure out the last item on the first page (and the last item from the API), then get only the API items after the last API item viewed on the previous page, do the same for our database results, and re-combine them.
To avoid this complex algorithm, another idea I had was to limit the results to a large number, like 500, grab them all at once, and order them. Then if the user goes forward a few pages, I don't have to re-grab all the data.
Does anyone have suggestions on good algorithms to use to combine two search results?
Whether you use it for caching or not, you will need to grab at least a page's worth of results from both sources, in case all of the next results come from one source.
Grabbing a lot of results and caching them (in the session) is one solution you could use.
If for some reason you don't want to cache all the results (e.g. the operation is expensive and you need it optimized), you could store a simple array in the session that records which source each displayed result came from; then you would know the starting position in each source for the next page.
For example (pseudocode):
**Request 1**
Get 10 results from API
Get 10 results from Database
Merge the results
Display first 10 and save the order to an array
(A for API, D for Database, ex: A,A,A,D,A,D,D,A,D,A)
User clicks page 2
**Request 2** (Page 2)
Get 10 results from API starting at 7 (6 API results were shown on page 1)
Get 10 results from Database starting at 5 (4 database results were shown on page 1)
Repeat merge and display above.
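Instead of storing the whole A/D array, it is enough to keep how many items each source has contributed so far (the counts of A and D in that array), because those counts are the offsets for the next request. Here is a minimal Python sketch under that assumption. `fetch`, `API` and `DB` are hypothetical stand-ins for the real API call and MySQL query, and the offsets here are 0-based.

```python
from datetime import date

# Hypothetical data matching the question's example; each source is
# already sorted newest-first. The year is an arbitrary placeholder.
API = [(date(2024, 7, 30), "Video B"), (date(2024, 7, 29), "Video C")]
DB = [(date(2024, 7, 31), "Video A"), (date(2024, 7, 28), "Video D")]


def fetch(source, offset, limit):
    """Stand-in for 'get N results starting at offset' from one source."""
    return source[offset:offset + limit]


def get_page(api_offset, db_offset, page_size):
    """Fetch a page's worth from each source, merge newest-first, and return
    the page plus the offsets to keep in the session for the next page."""
    api = fetch(API, api_offset, page_size)
    db = fetch(DB, db_offset, page_size)
    page, i, j = [], 0, 0
    while len(page) < page_size and (i < len(api) or j < len(db)):
        # Take the API item if the DB batch is used up or the API item is newer.
        if j >= len(db) or (i < len(api) and api[i][0] >= db[j][0]):
            page.append(("A",) + api[i])
            i += 1
        else:
            page.append(("D",) + db[j])
            j += 1
    return page, api_offset + i, db_offset + j


page1, api_next, db_next = get_page(0, 0, page_size=2)
page2, _, _ = get_page(api_next, db_next, page_size=2)
```

Here page 1 is Video A (database) then Video B (API), and the stored offsets (1, 1) make page 2 start at Video C and Video D.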
You could also optionally cache what you have retrieved so far (you will have 10 extra results). This makes the first request longer, but could make the second request much faster.
If the user jumps forward several pages, you would need to get, from each source, the largest number of results that could have been displayed in the skipped pages.
If you are not too worried about performance from either source, I would retrieve up to a large number like you said and cache all results temporarily. As soon as a new search is executed, dump the old results.

How do I get around the Sum(First(...)) not allowed limitation in SSRS 2005

The problem that I have is SQL Server Reporting Services does not like Sum(First()) notation. It will only allow either Sum() or First().
The Context
I am creating a reconciliation report, i.e. what stock we had at the start of a period, what was ordered, and what stock we had at the end.
The dataset returns something like:
Type,Product,Customer,Stock at Start(SAS), Ordered Qty, Stock At End (SAE)
Export,1,1,100,5,90
Export,1,2,100,5,90
Domestic,2,1,200,10,150
Domestic,2,2,200,20,150
Domestic,2,3,200,30,150
I group by Type, then Product and list the customers that bought that product.
I want to display totals for SAS, Ordered Qty, and SAE, but if I Sum SAS I get 200 and 600 for Products 1 and 2 respectively, when it should be 100 and 200 (SAE is inflated the same way).
I thought I could do a Sum(First()), but SSRS complains that I cannot have an aggregate within an aggregate.
Ideally SSRS needs a Sum(Distinct())
Solutions So Far
1. Don't show the Stock at Start and Stock At End as part of the totals.
2. Write some code directly in the report to do the calculation. Tried this one; it didn't work as I expected.
3. Write an assembly to do the calculation. (Have not tried this one)
Edit - Problem clarification
The problem stems from the fact that this is actually two reports merged into one (as I see it): a production report and a sales report.
The report tries to answer these questions:
the market that we sold it to (export, domestic)
how much did we have in stock,
how much was produced,
how much was sold,
who did we sell it to,
how much do we have left over.
The complicating factor is "who did we sell it to". Without that, it would have been relatively easy. But including it means that the other top-line figures (stock at start and stock at end) have nothing to do with what is sold, other than the particular product.
I had a similar issue and ended up using ROW_NUMBER in my query to provide an integer for the row value and then using SUM(IIF(myRowNumber = 1, myValue, 0)).
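To show why that works, here is a Python emulation on the question's dataset. It assumes the row number is partitioned by product (the answer doesn't say how it was partitioned). Only the first row of each product contributes its stock figure, so each product's stock is counted once, while Ordered Qty, which really is per customer, is summed normally. `add_row_number` is a hypothetical helper.

```python
from itertools import groupby

# The question's dataset: (Type, Product, Customer, SAS, OrderedQty, SAE)
ROWS = [
    ("Export", 1, 1, 100, 5, 90),
    ("Export", 1, 2, 100, 5, 90),
    ("Domestic", 2, 1, 200, 10, 150),
    ("Domestic", 2, 2, 200, 20, 150),
    ("Domestic", 2, 3, 200, 30, 150),
]


def add_row_number(rows):
    """Append ROW_NUMBER() OVER (PARTITION BY Product ORDER BY Customer)."""
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    numbered = []
    for _, group in groupby(ordered, key=lambda r: r[1]):
        for n, row in enumerate(group, start=1):
            numbered.append(row + (n,))
    return numbered


numbered = add_row_number(ROWS)
# SUM(IIF(myRowNumber = 1, value, 0)): count each product's stock once.
sas_total = sum(r[3] if r[6] == 1 else 0 for r in numbered)
sae_total = sum(r[5] if r[6] == 1 else 0 for r in numbered)
# Ordered Qty is per customer, so a plain sum is already correct.
ordered_total = sum(r[4] for r in numbered)
print(sas_total, ordered_total, sae_total)  # 300 70 240
```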
I'll edit this when I get to work and provide more data, but thought this might be enough to get you started. I'm curious about Adolf's solution too.
Have you thought about using windowing/ranking functions in the SQL for this?
This allows you to aggregate data without losing detail
E.g. imagine that for a range of values you want the Min and Max returned, but you also want to return the initial data (no summarizing).
Group  Value  Min  Max
A      3      2    9
A      7      2    9
A      9      2    9
A      2      2    9
B      5      5    7
B      7      5    7
C      etc.
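The syntax looks odd, but it's just
AggregateFunctionYouWant(Column) OVER (PARTITION BY WhatYouWantItGroupedBy) AS AggVal
(ranking functions such as ROW_NUMBER() also take ORDER BY WhatYouWantItOrderedBy inside the OVER clause)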
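As a concrete illustration, the table above is what `MIN(Value) OVER (PARTITION BY [Group])` and `MAX(Value) OVER (PARTITION BY [Group])` produce. The Python sketch below emulates that on the table's rows. It keeps every detail row and attaches its group's min and max instead of collapsing the group. The function name is hypothetical.

```python
from collections import defaultdict

ROWS = [("A", 3), ("A", 7), ("A", 9), ("A", 2), ("B", 5), ("B", 7)]


def with_group_min_max(rows):
    """Emulate MIN/MAX(Value) OVER (PARTITION BY Group): each detail row is
    kept and gets its group's minimum and maximum attached."""
    values = defaultdict(list)
    for group, value in rows:
        values[group].append(value)
    return [(g, v, min(values[g]), max(values[g])) for g, v in rows]


for row in with_group_min_max(ROWS):
    print(row)
```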
Your dataset is a little weird, but I think I understand where you're going.
Try making the dataset return in this order:
Type, Product, SAS, SAE, Customer, Ordered Qty
What I would do is create a report with a table control. I would set up the type, product, and customer as three separate groups. I would put the SAS and SAE data on the product group, and the quantity on the customer group. This should resemble what I believe you are trying to go for. Your SAS and SAE should be in a First().
Write a subquery.
Ideally SSRS needs a Sum(Distinct())
Re-write your query to do this correctly.
I suspect your problem is that you've written a query that gets you the wrong results, or that you have poorly designed tables. Without knowing more about what you're trying to do, I can't tell you how to fix it, but it has a bad "smell".
