Solr: How can I group on two different fields?

My schema looks like this:
product_id
category_id
A category contains products.
In solr 3.6, I group results on category_id and it works well.
I just added a new field:
group_id
A group contains products that vary on size or color.
Example: shoes in blue, red and yellow are 3 different products that share the same group_id.
In addition to grouping results on category_id, I would like only one product per group_id in my results, bearing in mind that group_id can be null (for products that aren't part of a group).
Following the shoe example, this means that for the request "shoe", only one of the 3 products should appear in the results.
I thought of doing a second result grouping on group_id, but it doesn't seem possible to do it that way.
Any ideas?
EDIT: For now, I process the results in PHP to remove documents whose group_id already appears in the results. I am leaving this question open in case someone finds a way to group on 2 fields.
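The post-processing workaround described in the EDIT can be sketched as follows. This is only an illustration in Python (the original was done in PHP, which is not shown); the function name and the sample documents are hypothetical, and it assumes documents arrive as dictionaries in relevance order.

```python
def keep_one_per_group(docs, group_field="group_id"):
    """Keep the first document seen for each group value.

    Documents whose group field is missing or None are always kept,
    since they are not part of any group.
    """
    seen = set()
    kept = []
    for doc in docs:
        group = doc.get(group_field)
        if group is None:
            kept.append(doc)      # ungrouped product: never collapsed
            continue
        if group in seen:
            continue              # a product of this group is already listed
        seen.add(group)
        kept.append(doc)
    return kept


# Hypothetical sample data: products 1 and 2 share group 2.
docs = [
    {"product_id": 1, "category_id": 5, "group_id": 2},
    {"product_id": 2, "category_id": 5, "group_id": 2},
    {"product_id": 3, "category_id": 5, "group_id": None},
    {"product_id": 4, "category_id": 5},
]
print([d["product_id"] for d in keep_one_per_group(docs)])
```

Note that filtering after the fact shrinks the page, so the number of rows shown may be smaller than the number requested from Solr.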

If your aim is to get grouping counts based on multiple "group by" fields, you can use pivot faceting to achieve this.
&facet.pivot=category_id,group_id
Solr will give you back a hierarchy of grouped result counts, following the page of search results, under the facet_pivot element.
http://wiki.apache.org/solr/SimpleFacetParameters?highlight=%28pivot%29#Pivot_.28ie_Decision_Tree.29_Faceting
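A minimal sketch of building such a request in Python, assuming a Solr select handler at a hypothetical base URL (the host, port and path below are placeholders, not taken from the question):

```python
from urllib.parse import urlencode


def pivot_facet_url(base_url, query, pivot_fields):
    """Build a select URL that asks for pivot facet counts on the given fields."""
    params = [
        ("q", query),
        ("facet", "on"),
        # facet.pivot takes the fields as one comma-separated list
        ("facet.pivot", ",".join(pivot_fields)),
        ("wt", "xml"),
    ]
    return base_url + "?" + urlencode(params)


# Hypothetical base URL; replace with your own core or collection.
url = pivot_facet_url(
    "http://localhost:8983/solr/select", "shoe", ["category_id", "group_id"]
)
print(url)
```

urlencode percent-encodes the comma, which Solr decodes back to category_id,group_id.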

It is not possible to group query results on two fields.
If you only need counts, you can use facet.field (for a single field) or facet.pivot (for multiple fields).
This is not actual grouping, but it gives you the count for each combination of values across the fields.
Example output:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<bool name="zkConnected">true</bool>
<int name="status">0</int>
<int name="QTime">306</int>
</lst>
<result name="response" numFound="667" start="0" maxScore="0.70710677">
<doc>
<int name="idField">7393</int>
<int name="field_one">12</int>
</doc>
</result>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields"/>
<lst name="facet_ranges"/>
<lst name="facet_intervals"/>
<lst name="facet_heatmaps"/>
<lst name="facet_pivot">
<arr name="field_one,field_two">
<lst>
<str name="field">field_one</str>
<int name="value">3</int>
<int name="count">562</int>
<arr name="pivot">
<lst>
<str name="field">field_two</str>
<bool name="value">true</bool>
<int name="count">347</int>
</lst>
<lst>
<str name="field">field_two</str>
<bool name="value">false</bool>
<int name="count">215</int>
</lst>
</arr>
</lst>
<lst>
<str name="field">field_one</str>
<int name="value">12</int>
<int name="count">105</int>
<arr name="pivot">
<lst>
<str name="field">field_two</str>
<bool name="value">true</bool>
<int name="count">97</int>
</lst>
<lst>
<str name="field">field_two</str>
<bool name="value">false</bool>
<int name="count">8</int>
</lst>
</arr>
</lst>
</arr>
</lst>
</lst>
</response>
Example query:
http://192.168.100.145:7983/solr/<collection>/select?facet.pivot=field_one,field_two&facet=on&fl=idField,field_one&indent=on&q=field_one:(3%2012)&rows=1&wt=xml
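To turn a response like the one above into (field_one, field_two, count) rows, something along these lines could work. It is a sketch that assumes the XML response shape shown in the example output and a two-level pivot; the function name and the trimmed sample are illustrative.

```python
import xml.etree.ElementTree as ET


def flatten_pivot(xml_text, pivot_name):
    """Return (outer_value, inner_value, count) tuples for a two-level pivot."""
    root = ET.fromstring(xml_text)
    rows = []
    for arr in root.iter("arr"):
        if arr.get("name") != pivot_name:
            continue  # skips the nested <arr name="pivot"> elements too
        for outer in arr.findall("lst"):
            outer_value = outer.find("*[@name='value']").text
            for inner in outer.findall("arr[@name='pivot']/lst"):
                inner_value = inner.find("*[@name='value']").text
                count = int(inner.find("int[@name='count']").text)
                rows.append((outer_value, inner_value, count))
    return rows


# Trimmed copy of the example response (first pivot entry only).
sample = """<response>
<lst name="facet_counts">
<lst name="facet_pivot">
<arr name="field_one,field_two">
<lst>
<str name="field">field_one</str>
<int name="value">3</int>
<int name="count">562</int>
<arr name="pivot">
<lst><str name="field">field_two</str><bool name="value">true</bool><int name="count">347</int></lst>
<lst><str name="field">field_two</str><bool name="value">false</bool><int name="count">215</int></lst>
</arr>
</lst>
</arr>
</lst>
</lst>
</response>"""

for row in flatten_pivot(sample, "field_one,field_two"):
    print(row)
```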

If you can change the data that you are posting to Solr, then I suggest creating a string field that holds a concatenation of category_id and group_id. For example, if category_id = 5 and group_id = 2, the string field can be '5,2' (using ',' or any other character as a delimiter). You can then group on this string field.
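At indexing time, the combined value could be computed with a small helper like the one below. This is a sketch, not part of the original answer: the function name is made up, and the fallback for products without a group_id (using the product's own id with a "p" prefix so that ungrouped products are not all collapsed into one group) is purely an assumption about what you might want.

```python
def composite_group_key(category_id, group_id, product_id, delimiter=","):
    """Build the value of the combined string field for one product."""
    if group_id is None:
        # Assumption: ungrouped products get a key of their own, so that
        # they do not all end up in a single "null" group.
        return f"{category_id}{delimiter}p{product_id}"
    return f"{category_id}{delimiter}{group_id}"


print(composite_group_key(5, 2, 10))     # grouped product
print(composite_group_key(5, None, 11))  # product without a group
```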

Related

Two-dimensional facet query

I have a simple query which returns counts for two different facets (e.g. authors and categories) with some limit. For instance:
select?q=*&facet.field=authors&facet.field=categories&facet.limit=2
As the result two lists with two top values each are returned, e.g.:
<lst name="facet_fields">
<lst name="authors">
<int name="Author1">1200</int>
<int name="Author2">1100</int>
</lst>
<lst name="categories">
<int name="Cat1">500</int>
<int name="Cat2">400</int>
</lst>
</lst>
I would like to present these facets on a two-dimensional table with the count for each pair:
Author1 Author2
Cat1 x x
Cat2 x x
How can I get the count for each pair? I tried to use pivot faceting with the query like this:
select?q=*&facet.pivot=authors,categories&facet.limit=2
but the response is not what I expect. As in the following example, the categories in the pivot for Author2 can differ from those for Author1, because the pivot returns the top categories for each author rather than the top categories for the whole query:
<lst name="facet_pivot">
<arr name="authors,categories">
<lst>
<str name="field">authors</str>
<str name="value">Author1</str>
<int name="count">1200</int>
<arr name="pivot">
<lst>
<str name="field">categories</str>
<str name="value">Cat1</str>
<int name="count">450</int>
</lst>
<lst>
<str name="field">categories</str>
<str name="value">Cat2</str>
<int name="count">300</int>
</lst>
</arr>
</lst>
<lst>
<str name="field">authors</str>
<str name="value">Author2</str>
<int name="count">1100</int>
<arr name="pivot">
<lst>
<str name="field">categories</str>
<str name="value">Cat3</str>
<int name="count">300</int>
</lst>
<lst>
<str name="field">categories</str>
<str name="value">Cat4</str>
<int name="count">250</int>
</lst>
</arr>
</lst>
</arr>
</lst>
Can the pivot query be parameterized somehow to achieve the desired result, or is there any other query that I could use for this particular use case?
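One possible approach, not taken from the question itself, is to first fetch the top authors and top categories with the plain facet.field query, and then ask for one facet.query per (author, category) pair. The sketch below only builds the request parameters; the field names come from the question, while the function name and the choice of q=*:* and rows=0 are assumptions.

```python
from itertools import product
from urllib.parse import urlencode


def pair_facet_params(authors, categories):
    """Build request parameters with one facet.query per author/category pair."""
    params = [("q", "*:*"), ("rows", "0"), ("facet", "on")]
    for author, category in product(authors, categories):
        # Each facet.query returns the count of documents matching both values.
        params.append(
            ("facet.query", f'authors:"{author}" AND categories:"{category}"')
        )
    return params


params = pair_facet_params(["Author1", "Author2"], ["Cat1", "Cat2"])
print("select?" + urlencode(params))
```

The counts come back under facet_queries, keyed by the query string, so each cell of the table can be looked up by its pair.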

How to use function queries in Solr to compare against a specific value?

I want to write a query that, in SQL pseudocode, looks like this:
select * from temptable where price + 3 = 188;
The Solr query I tried is below:
http://127.0.0.1:8983/solr/select/?fl=score,id&defType=func&q=sum(price,3):188
but I get the error below. How can I query this in Solr? Please do not advise using the "TO" keyword.
<response>
<lst name="responseHeader">
<int name="status">400</int>
<int name="QTime">1</int>
<lst name="params">
<str name="fl">score,id</str>
<str name="q">sum(price,3):188</str>
<str name="defType">func</str>
</lst>
</lst>
<lst name="error">
<str name="msg">
org.apache.solr.search.SyntaxError: Unexpected text after function: :188
</str>
<int name="code">400</int>
</lst>
</response>
An frange query will do:
{!frange l=188 u=188} sum(price,3)
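As a sketch of how this could be sent, the frange expression can go into a filter query (fq) alongside a match-all main query. The base URL is the one from the question; the helper name and the use of q=*:* are assumptions.

```python
from urllib.parse import urlencode


def frange_filter(function, lower, upper):
    """Build an frange local-params query that keeps documents whose
    function value lies between lower and upper (inclusive by default)."""
    return f"{{!frange l={lower} u={upper}}}{function}"


fq = frange_filter("sum(price,3)", 188, 188)
url = "http://127.0.0.1:8983/solr/select?" + urlencode(
    [("q", "*:*"), ("fq", fq), ("fl", "score,id")]
)
print(fq)
print(url)
```

Setting l and u to the same value is what expresses the equality price + 3 = 188.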

Solr Group By query

I have schema.xml like this:
Sample data
id Country State City Area
1 India abc cd mnv
15131 India Delhi HauzK asdf (ids 1 to 15131 inserted using the CSV handler)
15132 India Karnatka Bang mno (id 15132 inserted using the Solarium API)
All fields are of type text_general, which applies:
Whitespace tokenizer
Lowercase filter factory
NGram filter factory
One thing to note:
I inserted the records from id = 1 to id = 15131 with the CSV request handler, and the document with id = 15132 using the Solarium API.
Now, I have a suggestion box for country. I want to show only distinct countries, so I grouped by country.
http://localhost:8983/solr/searchLocation/country?q=country%3Ain&wt=xml&indent=true&group=true&group.field=country
I got the following result:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">4</int>
</lst>
<lst name="grouped">
<lst name="country">
<int name="matches">15132</int>
<arr name="groups">
<lst>
**<str name="groupValue">ndia</str>**
<result name="doclist" numFound="15131" start="0" maxScore="0.24998347">
<doc>
<str name="country">india</str>
<str name="state">Andaman and Nicobar</str>
<str name="city">A&N Islands</str>
<str name="area">Marine Jetty</str>
<str name="id">02cb8ba4-bffe-4c4e-a976-29f01ad8d275</str>
<float name="score">0.24998347</float>
</doc>
</result>
</lst>
<lst>
**<str name="groupValue">d</str>**
<result name="doclist" numFound="1" start="0" maxScore="0.24998347">
<doc>
<str name="country">india</str>
<str name="state">Kerala</str>
<str name="city">Palghat</str>
<str name="area">Padagirinew</str>
<str name="id">0158f635-24dd-4d2f-9697-e79272684c95</str>
<float name="score">0.24998347</float>
</doc>
</result>
</lst>
</arr>
</lst>
</lst>
</response>
My confusion is how I could have got two groups:
all records from id = 1 to id = 15131, with country value = india
the last record, with id = 15132, with country value = india
Why is it making two different groups? It should be a single group, because the value of the country field is India in every record.
Thanks

What does facet mean in Solr?

Can you please explain what a facet is?
What I understood is the following. Suppose I have these documents:
State Country
karntaka India
Bangalore India
Delhi India
Noida India
It collapses multiple identical values of a field into a single value and returns the number of times that value occurred.
Now, when I search on the field 'Country', I obviously get India 4 times. So I set facet=on and facet.field=Country, hoping to get India only once, but when I fired the query I got
some weird result instead:
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">6</int>
</lst>
<result name="response" numFound="4" start="0">
<doc>
<str name="country">India</str></doc>
<doc>
<str name="country">India</str></doc>
<doc>
<str name="country">India</str></doc>
<doc>
<str name="country">India</str></doc>
</result>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="country">
<int name="a">4</int>
<int name="d">4</int>
<int name="di">4</int>
<int name="dia">4</int>
<int name="i">4</int>
<int name="ia">4</int>
<int name="in">4</int>
<int name="ind">4</int>
<int name="indi">4</int>
<int name="india">4</int>
</lst>
</lst>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>
</response>
Can anyone help me understand this?
Thanks
If you had a Washington, USA entry, the facet would report 4 results for India and 1 for USA.
Use a string field type. You seem to have used a (text) field with lowercasing and n-gramming, which may benefit people who spell India as Inde, for example. A string field is not processed like this and is therefore best suited for a field meant to be faceted.
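To see why the facet lists fragments such as "d", "di" and "dia", here is a small Python imitation of lowercasing followed by n-gramming. It only mimics the effect; it is not Solr's analyzer, and the gram sizes (1 to 5) are an assumption, since the actual schema settings are not shown.

```python
def ngrams(text, min_gram=1, max_gram=5):
    """Return the set of lowercase character n-grams of the given text."""
    text = text.lower()
    grams = set()
    for size in range(min_gram, max_gram + 1):
        for start in range(len(text) - size + 1):
            grams.add(text[start:start + size])
    return grams


terms = sorted(ngrams("India"))
print(len(terms))   # 14 distinct terms are indexed for one value
print(terms[:10])   # ['a', 'd', 'di', 'dia', 'i', 'ia', 'in', 'ind', 'indi', 'india']
```

The first ten terms in index order match the facet values in the response above. Faceting counts indexed terms, so every fragment appears with a count of 4, whereas a string field would index the single term India.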

Multiple cores join query

My Solr version is 4.0.
I have a multicore environment with a core for products and a core for availability records of these products.
The products core will contain detailed descriptions and has about 10,000 documents.
The availabilities core contains up to 4 million documents.
I built a small test set and I'm trying to get results using the join syntax, meant to find all availabilities of products containing "disney".
http://localhost:8080/solr/product/select?q={!join%20from=productid%20to=id%20fromindex=availp}disney&fl=*
I get zero results.
Individual queries on each of the cores do yield results.
Questions:
1. How should I construct the query in order to get results?
2. When I refine my query to filter on a specific date, what would the syntax be?
For example: ?fq=period:"november 2012" AND country:France
country is a field from the product index; period is a field from the availp index.
Results from individual queries: product core
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
<lst name="params">
<str name="fl">id,productname</str>
<str name="indent">1</str>
<str name="q">disney</str>
<str name="rows">1</str>
</lst>
</lst>
<result name="response" numFound="31" start="0">
<doc>
<str name="productname">DPAZ00 DPAZ00-02 DPAZ0002 Disneyland Parijs Hotel Disney's Santa Fe</str>
<str name="id">44044</str></doc>
</result>
</response>
other core: availp
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="fl">*</str>
<str name="indent">1</str>
<str name="q">productid:44044</str>
<str name="rows">1</str>
</lst>
</lst>
<result name="response" numFound="42" start="0">
<doc>
<date name="datefrom">2012-10-01T10:00:00Z</date>
<arr name="period">
<str>oktober 2012</str>
</arr>
<str name="productid">44044</str>
<double name="minpriceperperson">209.0</double>
<int name="durationcode">1</int>
<str name="id">3890</str>
<int name="budgetcode">2</int></doc>
</result>
</response>
1) You should query the inventory core (with product as the inner index).
This is how the query should look:
http://localhost:8080/solr/product/select?q=*&fl={!join from=id to=id fromIndex=availp}productname:disney
2) You can use the same query syntax as above.
http://localhost:8080/solr/product/select?q=period:november&fl={!join from=id to=id fromIndex=availp}productname:disney AND country:France
You can remove productname from the above if it is not needed.
Have you tried changing fromindex to fromIndex (uppercase I)?
According to Adventures with Solr Join, the query looks like this:
http://localhost:8983/solr/parents/select?q=alive:yes AND _query_:"{!join fromIndex=children from=fatherid to=parentid v='childname:Tom'}"
It should work.
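For the original question, here is a sketch of how the request might be assembled in Python. In a join, the inner query runs against fromIndex, the values of the from field are collected, and documents in the queried core whose to field matches are returned. So to list availabilities of products matching "disney", this sketch assumes the request goes to the availp core and joins from the product core; the helper name and the period filter value are illustrative.

```python
from urllib.parse import urlencode


def join_query(from_field, to_field, from_index, inner_query):
    """Build a {!join} local-params query string (note the uppercase I in fromIndex)."""
    return (
        f"{{!join from={from_field} to={to_field} "
        f"fromIndex={from_index}}}{inner_query}"
    )


# product.id values of products matching "disney" select availp docs by productid.
q = join_query("id", "productid", "product", "disney")
url = "http://localhost:8080/solr/availp/select?" + urlencode(
    [("q", q), ("fq", 'period:"november 2012"'), ("fl", "*")]
)
print(q)
print(url)
```

A condition on a product field such as country would have to go inside the inner query, because it is evaluated against the product core, while period is filtered on the availp side.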
