Sorting groups in Solr by top result in each group

Is there a way to sort groups in Solr based on just the top result in each group and not all of its members? For example, I'm grouping product variants by model and sorting within each group by the number of resellers. This gives me the product variant with the most resellers at the top of each group. Next I'm applying a low-price sort to the entire result set. Currently the groups are ordered by the lowest price found anywhere within the group, rather than by the price of the top product.
For faceting you can set group.truncate=true if you are only concerned with the top result, but I don't see anything similar for sorting.
Is this possible, or will I have to do my own sorting in a custom request handler?
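No answer is recorded for this question here. As a rough client-side fallback, the groups could be re-sorted after the response comes back. This is only a sketch: it assumes the grouped response has already been parsed into Python dicts with Solr's usual groupValue / doclist / docs layout, and the field names price and resellers and the sample values are made up for illustration.

```python
def sort_groups_by_top_doc(groups, field, reverse=False):
    """Order groups by a field of the FIRST document in each group only."""
    return sorted(groups, key=lambda g: g["doclist"]["docs"][0][field], reverse=reverse)

# Hypothetical parsed groups; docs inside each group are already sorted by resellers desc.
sample_groups = [
    {"groupValue": "model-a", "doclist": {"docs": [
        {"resellers": 9, "price": 30.0}, {"resellers": 2, "price": 5.0}]}},
    {"groupValue": "model-b", "doclist": {"docs": [
        {"resellers": 7, "price": 12.0}, {"resellers": 1, "price": 20.0}]}},
]

by_top_price = sort_groups_by_top_doc(sample_groups, "price")
print([g["groupValue"] for g in by_top_price])
```

Note that this only orders the groups on the page that was fetched. Correct paging across the whole result set would need every group to be retrieved first, which may not be practical for large results.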

Related

Using multiple arrays to find the lowest "Ranking" in a character-defined ranking system

Wordy title, but I was asked this question in an interview, couldn't derive the answer, and would really love to better understand array usage in Excel.
Question:
You have two arrays. One depicts the credit ratings given by various companies (i.e. Company 1, Company 2...) and then an array showing the rankings of all credit ratings. Your goal is to use a formula to find the Lowest Rating by "CUSIP" and which company gave this rating. NOTE: A rating of NR = "NOT RATED" and should be excluded.
Arrays (Red & Black Array is the desired outcome of the formula)
Figured out my own answer... Thought I would document it here for you. One of my problems in the interview was that I had set up the "Ranking" table as a vertical table rather than a horizontal one. This kept the MAX/MATCH array formula from working properly.
I had to rearrange the ranking table so that it runs across a row: | AAA | AA | A |
Lowest Rating Formula:
=INDEX((Ranking Table Array),
2,
MAX(IF((Credit Ratings Array)<>"NR",MATCH((CUSIP row in the table),
(Credit Ratings Array),0))))
I found that this accurately returns the lowest credit rating across all of the companies (a lower rating sits further along the ranking table, so it has a higher position number than a higher-rated bond).
The company formula:
=INDEX(Table1[[#Headers],[Company 1]:[Company 3]],1,MATCH(C11,B4:D4,0))
Piggybacking on the first formula, this one matches the lowest credit rating it found against the company ratings in each row and returns the header of the matching company.
Issues with answers:
They require the answer table to be laid out in the same format as the data being read. The formulas assume the CUSIPs are assessed in the same ascending order as the source data, so each formula shifts down one row as it is filled down. This is not scalable (even though it is only an interview question).
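For readers who want to check the logic outside Excel, the same idea can be sketched in Python: skip "NR", look up each remaining rating's position in the ranking, and keep the one with the highest position. The ranking scale and company names below are invented for illustration and are not the data from the interview.

```python
# Hypothetical ranking scale, ordered best to worst (like the row-wise ranking table).
RANKING = ["AAA", "AA", "A", "BBB", "BB", "B"]

def lowest_rating(ratings_by_company):
    """Return (rating, company) for the worst rating, ignoring 'NR' entries."""
    rated = {c: r for c, r in ratings_by_company.items() if r != "NR"}
    if not rated:
        return None  # every company returned NR
    # Highest position in RANKING means lowest rating; ties keep the first company.
    company = max(rated, key=lambda c: RANKING.index(rated[c]))
    return rated[company], company

cusip_row = {"Company 1": "AA", "Company 2": "NR", "Company 3": "BBB"}
result = lowest_rating(cusip_row)
print(result)
```

Because each CUSIP row is passed in explicitly, this version does not depend on the output table having the same layout as the input.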

Excel: Need to Generate IDs based on multiple criteria with repeating IDs

Looking to create pricing groups based on multiple criteria. Each group could have multiple items within it. I'm struggling with automatically creating the name of each group. I estimate there should be about 6.5K pricing groups out of 14K items.
Below is the criteria -
QTY per case - is the number of bottles in a case
Size - size of the bottle
Family Brand - contains a group of like items
Code - CS1 - This is my unique code for each group that contains each of the above and lowest possible case price.
The "Thinking" column is how I want each group to look, but how do I do this with 14K items quickly?
If I understood correctly, your pricing group name consists of two parts: a simple combination of columns and a "special" column that should be counted.
Part 1 is simple: =C2&"-"&B2&"-"&A2&"-"
To make Part 2 easier, you could sort the data first, using Part 1 and then CODE-CS1 as the sort fields.
After having done this, you could use helper columns. If Part 1 is in column X and CODE-CS1 is in column Y, you could use a formula like this for
Part 2 (column Z, a plain numeric counter): =IF(X1=X2;IF(Y1=Y2;Z1;Z1+1);1)
That means: if Part 1 changes, the counter restarts at 1; if Part 1 stays the same and CODE-CS1 changes, it counts up; otherwise it keeps the last number. The counter is kept numeric so that Z1+1 works, and the "T" prefix is added only in the final code.
The resulting code would be =X2&"T"&Z2
It is untested and I use German Excel (which uses ; as the argument separator where English Excel uses ,), so the formulas may need some adaptation, but in general it should work.
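The counter logic can also be sketched outside Excel. The version below is an illustration, not the answerer's method: the key names family, size, qty and code are hypothetical stand-ins for the real columns, and it numbers codes in order of first appearance within each Part 1 instead of relying on a prior sort.

```python
def assign_group_ids(items):
    """Build 'Part 1' + 'T<n>' ids, where n counts distinct codes within each Part 1."""
    codes_seen = {}  # Part 1 prefix -> {code: counter value}
    ids = []
    for item in items:
        part1 = f"{item['family']}-{item['size']}-{item['qty']}-"
        codes = codes_seen.setdefault(part1, {})
        # A new code gets the next number; a repeated code reuses its number.
        n = codes.setdefault(item["code"], len(codes) + 1)
        ids.append(f"{part1}T{n}")
    return ids

# Hypothetical rows, not the asker's data.
sample_items = [
    {"family": "Acme", "size": "750ml", "qty": 12, "code": "X1"},
    {"family": "Acme", "size": "750ml", "qty": 12, "code": "X2"},
    {"family": "Acme", "size": "750ml", "qty": 12, "code": "X1"},
    {"family": "Acme", "size": "1L", "qty": 6, "code": "X1"},
]
group_ids = assign_group_ids(sample_items)
print(group_ids)
```

If the numbering should follow the sorted order of the codes, as in the spreadsheet approach, sort the rows by Part 1 and code before calling the function.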

Solr - Change how score is calculated? (Sum instead of Max)

We're having some relevance issues with Solr results. In this particular example we have product A showing up above product B. Product A's title contains the search term. Product B's title also contains the search term along with its Description and Category Name. So logically, Product B should be more relevant and appear above Product A, but it does not.
The schema is configured to take all of these extra fields into account. After analyzing the debug info of the query with ...&debugQuery=true&debug.explain.structured=true it appears that both products have achieved the same score. Looking further, I can see scores being calculated for these extra fields, but for some reason the parser only takes the maximum of these scores instead of the sum, which causes the totals to be the same:
Is there a reason that Solr behaves this way? Is there any way to change this behavior to use the sum instead of the max? (Just like in the parent element in the images)
You can control how the score is calculated using the tie parameter, provided that you are using Dismax/eDismax query parser.
The Solr documentation explains it very well:
tie (Tie Breaker) parameter:
The tie parameter specifies a float value (which should be something much less than 1) to use as tiebreaker in DisMax queries.
When a term from the user’s input is tested against multiple fields, more than one field may match. If so, each field will generate a different score based on how common that word is in that field (for each document relative to all other documents).
The tie parameter lets you control how much the final score of the query will be influenced by the scores of the lower scoring fields compared to the highest scoring field.
A value of "0.0" - the default - makes the query a pure "disjunction max query": that is, only the maximum scoring subquery contributes to the final score.
A value of "1.0" makes the query a pure "disjunction sum query" where it doesn’t matter what the maximum scoring sub query is, because the final score will be the sum of the subquery scores.
Typically a low value, such as 0.1, is useful.
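To make the effect of tie concrete, here is a small sketch of the arithmetic described in the quoted documentation: the best field score plus tie times the sum of the remaining field scores. The per-field scores below are invented and are not taken from the debug output in the question.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores: max + tie * (sum of the other scores)."""
    if not field_scores:
        return 0.0
    top = max(field_scores)
    return top + tie * (sum(field_scores) - top)

# Hypothetical per-field scores for the two products in the question.
product_a = [2.0]            # only the title matches
product_b = [2.0, 1.0, 0.5]  # title, description and category name match

print(dismax_score(product_a, tie=0.0), dismax_score(product_b, tie=0.0))
print(dismax_score(product_a, tie=0.1), dismax_score(product_b, tie=0.1))
print(dismax_score(product_a, tie=1.0), dismax_score(product_b, tie=1.0))
```

With tie=0.0 both products score the same, which matches the behaviour in the question. Any positive tie lets the extra matching fields lift product B above product A.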

Solr Group By Field Tokens & Count

I'm using Solr 6.3.0 to store a full tree hierarchy with 3 levels. Each document is a node and its path in the tree is stored in a field, e.g. treePath:>522>12>7 for a level 3 node or treePath:>522>12 for a level 2 node.
Counting the children for a particular level 2 node is easy: I can regex query on treePath:/>522>12>.*/.
Also, I can count all the level 3 nodes with a regex query like />[0-9]+>[0-9]+>.+/.
I'm interested in getting the average branching factor at level 2. I think this should be possible using a faceted query that would group by the prefix of treePath.
The tricky part as I see it is grouping documents that share the prefix of a given field without specifying the actual prefix and letting Solr match them.
Any help is most welcome :)
Thanks!
Edit:
I figured out that I can simply count the level 3 nodes and divide that by the number of level 2 nodes to get the average branching factor, but I'm still interested in finding out whether there's a way of grouping the documents by field prefix.
A possible solution would be to store level2 and level3 in two different fields, then faceting on the level2 field will give you all the level2s with their count. Summing this count and dividing by the number of elements would give you the branching factor.
The advantage of this solution over yours is that it can be applied with queries which restrict the trees you want to consider.
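The counting that a facet on a separate level 2 field would do can be imitated client-side, which may help when checking the numbers. This is only a sketch under assumptions: it works on a plain list of treePath strings in the >a>b>c form from the question, and the sample paths are made up.

```python
def level2_child_counts(tree_paths):
    """Map each level 2 prefix to its number of level 3 children."""
    counts = {}
    for path in tree_paths:
        parts = path.strip(">").split(">")
        if len(parts) < 2:
            continue  # level 1 nodes do not contribute
        prefix = ">" + ">".join(parts[:2])
        if len(parts) == 2:
            counts.setdefault(prefix, 0)  # level 2 node, possibly childless
        elif len(parts) == 3:
            counts[prefix] = counts.get(prefix, 0) + 1
    return counts

def average_branching_factor(tree_paths):
    counts = level2_child_counts(tree_paths)
    return sum(counts.values()) / len(counts) if counts else 0.0

# Hypothetical paths in the format used in the question.
sample_paths = [">522", ">522>12", ">522>13", ">522>12>7", ">522>12>8", ">522>13>9"]
child_counts = level2_child_counts(sample_paths)
avg_branching = average_branching_factor(sample_paths)
print(child_counts, avg_branching)
```

Filtering sample_paths before calling the function plays the same role as restricting the Solr query to the trees of interest.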

mysql: Optimizing query w/ mixed-ascendency ORDER BY

I have a large table (~1M rows now, soon ~10M) that has two ranked columns (in addition to the regular data):
avg_visited, a float 0-1 representing a %age popularity; higher is better
alexa_rank, an integer 1-N giving an a priori ranking
The a priori ranking is from external sources so can't be changed. Many rows have no popularity yet (as no user has yet hit it), so the a priori ranking is the fallback ordering. The popularity however does change very frequently - both to update old entries and to add a popularity to ones that previously only had the a priori ranking, if some user actually hits it.
I frequently run SELECT id, url, alexa_rank, avg_visited FROM sites ORDER BY avg_visited desc, alexa_rank asc LIMIT 49500, 500 (for various values of 49500).
However, ORDER BY cannot use an index with mixed ascendency, per http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html
This is in mysql 5.1, innodb.
How can I best change this situation to give me a sane, fully indexed query?
Unfortunately, MySQL does not support DESC clauses in the indexes, neither does it support indexes on derived expressions.
You can store the negative popularity along with the positive one and use it in the ORDER BY:
CREATE INDEX ix_mytable_negpopularity_apriori ON mytable (neg_popularity, a_priori);
INSERT
INTO mytable (popularity, neg_popularity)
VALUES (@popularity, -@popularity);
SELECT *
FROM mytable
ORDER BY
neg_popularity, a_priori;
Just a simple hack:
Since popularity is a float between 0 and 1, you can multiply it by -1 and the number will be between -1 and 0.
This way you can reverse the sort order of popularity and use ORDER BY popularity ASC, a_priori ASC.
Not sure the overhead outweighs the gain.
This reminds me of the hack of storing emails in reverse form.
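Both answers rely on the same observation: sorting ascending on the negated popularity gives the same order as sorting descending on the popularity itself. A quick way to convince yourself is the Python sketch below, which uses invented rows and plain in-memory sorting rather than MySQL.

```python
# Hypothetical rows: (id, avg_visited, alexa_rank)
rows = [
    (1, 0.0, 300),
    (2, 0.9, 500),
    (3, 0.0, 100),
    (4, 0.9, 200),
    (5, 0.4, 50),
]

# Mixed order: avg_visited DESC, then alexa_rank ASC (two stable sorts).
by_rank = sorted(rows, key=lambda r: r[2])
mixed_order = sorted(by_rank, key=lambda r: r[1], reverse=True)

# Same-direction order on the negated value: neg_popularity ASC, alexa_rank ASC.
negated_order = sorted(rows, key=lambda r: (-r[1], r[2]))

print([r[0] for r in mixed_order])
print([r[0] for r in negated_order])
```

One assumption is worth stating: the rows without popularity hold 0 here. If they hold NULL instead, MySQL sorts NULLs first in ascending order and last in descending order, so the negated column would not reproduce the original ordering for those rows.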
