Two-dimensional facet query - solr

I have a simple query which returns counts for two different facets (e.g. authors and categories) with some limit. For instance:
select?q=*&facet.field=authors&facet.field=categories&facet.limit=2
As a result, two lists are returned, each with the top two values, e.g.:
<lst name="facet_fields">
<lst name="authors">
<int name="Author1">1200</int>
<int name="Author2">1100</int>
</lst>
<lst name="categories">
<int name="Cat1">500</int>
<int name="Cat2">400</int>
</lst>
</lst>
I would like to present these facets in a two-dimensional table with the count for each pair:
        Author1  Author2
Cat1    x        x
Cat2    x        x
How can I get the count for each pair? I tried to use pivot faceting with a query like this:
select?q=*&facet.pivot=authors,categories&facet.limit=2
but the response is not what I expect. As in the following example, the categories in the pivot for Author2 can differ from those for Author1. The pivot presents the top categories for each author, not the top categories for the whole query:
<lst name="facet_pivot">
<arr name="authors,categories">
<lst>
<str name="field">authors</str>
<str name="value">Author1</str>
<int name="count">1200</int>
<arr name="pivot">
<lst>
<str name="field">categories</str>
<str name="value">Cat1</str>
<int name="count">450</int>
</lst>
<lst>
<str name="field">categories</str>
<str name="value">Cat2</str>
<int name="count">300</int>
</lst>
</arr>
</lst>
<lst>
<str name="field">authors</str>
<str name="value">Author2</str>
<int name="count">1100</int>
<arr name="pivot">
<lst>
<str name="field">categories</str>
<str name="value">Cat3</str>
<int name="count">300</int>
</lst>
<lst>
<str name="field">categories</str>
<str name="value">Cat4</str>
<int name="count">250</int>
</lst>
</arr>
</lst>
</arr>
</lst>
Can the pivot query be parameterized somehow to achieve the desired result, or is there another query that I could use for this particular use case?
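One approach that may work here, offered as a sketch and not as a confirmed answer from this thread: first fetch the top authors and categories with the plain facet.field request, then send one facet.query per (author, category) pair. Solr returns the count of each facet.query under facet_queries, keyed by the query string. The helper name pair_facet_params is hypothetical, and the quoting assumes the facet values contain no double quotes.

```python
from itertools import product
from urllib.parse import urlencode


def pair_facet_params(authors, categories):
    """Build request parameters with one facet.query per (author, category) pair."""
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true")]
    for author, category in product(authors, categories):
        # Assumption: values contain no double quotes, so simple quoting is enough.
        pair_query = f'authors:"{author}" AND categories:"{category}"'
        params.append(("facet.query", pair_query))
    return params


# Top values taken from the facet_fields response in the question.
params = pair_facet_params(["Author1", "Author2"], ["Cat1", "Cat2"])
print("select?" + urlencode(params))
```

The number of facet.query parameters grows with the product of the two limits, so this is only practical for small tables such as the 2x2 one above.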

Related

How to use function queries in Solr to compare against a specific value?

I want to write a query equivalent to the following SQL pseudocode:
select * from temptable where price + 3 = 188;
The Solr query I tried is below:
http://127.0.0.1:8983/solr/select/?fl=score,id&defType=func&q=sum(price,3):188
but I get the error below. How can I query this in Solr? Please do not advise using the "TO" keyword.
<response>
<lst name="responseHeader">
<int name="status">400</int>
<int name="QTime">1</int>
<lst name="params">
<str name="fl">score,id</str>
<str name="q">sum(price,3):188</str>
<str name="defType">func</str>
</lst>
</lst>
<lst name="error">
<str name="msg">
org.apache.solr.search.SyntaxError: Unexpected text after function: :188
</str>
<int name="code">400</int>
</lst>
</response>
An frange (function range) query will do:
{!frange l=188 u=188} sum(price,3)
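As a rough illustration, the request URL for this frange query could be assembled as follows. The host and handler path are copied from the question; the helper name frange_equals is hypothetical. The sketch drops defType=func on the assumption that the {!frange} local params select the parser on their own.

```python
from urllib.parse import urlencode


def frange_equals(function, value):
    """Build an frange query whose lower and upper bounds are both `value`."""
    return f"{{!frange l={value} u={value}}}{function}"


query = frange_equals("sum(price,3)", 188)

# urlencode percent-encodes the braces, spaces and parentheses for the URL.
url = "http://127.0.0.1:8983/solr/select/?" + urlencode(
    {"q": query, "fl": "score,id"}
)
print(url)
```

Setting l and u to the same value turns the range check into an equality test, which is what the SQL pseudocode asks for.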

Solr : How can I group on two different fields?

My schema looks like this:
product_id
category_id
A category contains products.
In solr 3.6, I group results on category_id and it works well.
I just added a new field:
group_id
A group contains products that vary on size or color.
Example: shoes in blue, red and yellow are 3 different products that share the same group_id.
In addition to the result grouping on category_id, I would like my results to contain only one product per group_id, bearing in mind that group_id can be null (for products that aren't part of a group).
To follow the shoes example, this means that for the request "shoe", only one of the 3 products should appear in the results.
I thought of doing a second result grouping on group_id, but it doesn't seem possible to do it that way.
Any ideas?
EDIT: For now, I process the results in PHP to delete documents whose group_id is already in the results. I am leaving this question open in case someone finds out how to group on 2 fields.
If your aim is to get grouping counts based on multiple "group by" fields, you can use pivot faceting to achieve this.
&facet.pivot=category_id,group_id
Solr will give you back a hierarchy of grouped result counts, following the page of search results, under the facet_pivot element.
http://wiki.apache.org/solr/SimpleFacetParameters?highlight=%28pivot%29#Pivot_.28ie_Decision_Tree.29_Faceting
It is not possible to group on two fields in a single query.
If you need counts, you can use facet.field (for a single field) or facet.pivot (for multiple fields).
This is not actual grouping, but it gives you the count of each group across multiple fields.
Example Output:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<bool name="zkConnected">true</bool>
<int name="status">0</int>
<int name="QTime">306</int>
</lst>
<result name="response" numFound="667" start="0" maxScore="0.70710677">
<doc>
<int name="idField">7393</int>
<int name="field_one">12</int>
</doc>
</result>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields"/>
<lst name="facet_ranges"/>
<lst name="facet_intervals"/>
<lst name="facet_heatmaps"/>
<lst name="facet_pivot">
<arr name="field_one,field_two">
<lst>
<str name="field">field_one</str>
<int name="value">3</int>
<int name="count">562</int>
<arr name="pivot">
<lst>
<str name="field">field_two</str>
<bool name="value">true</bool>
<int name="count">347</int>
</lst>
<lst>
<str name="field">field_two</str>
<bool name="value">false</bool>
<int name="count">215</int>
</lst>
</arr>
</lst>
<lst>
<str name="field">field_one</str>
<int name="value">12</int>
<int name="count">105</int>
<arr name="pivot">
<lst>
<str name="field">field_two</str>
<bool name="value">true</bool>
<int name="count">97</int>
</lst>
<lst>
<str name="field">field_two</str>
<bool name="value">false</bool>
<int name="count">8</int>
</lst>
</arr>
</lst>
</arr>
</lst>
</lst>
</response>
Example query:
http://192.168.100.145:7983/solr/<collection>/select?facet.pivot=field_one,field_two&facet=on&fl=idField,field_one&indent=on&q=field_one:(3%2012)&rows=1&wt=xml
If you can change the data that you are posting to Solr, then I suggest that you create a string field holding the concatenation of category_id and group_id. For example, if category_id = 5 and group_id = 2, the string field can be '5,2' (using ',' or any other character as a delimiter). You can then group on this string field.
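A minimal sketch of building such a concatenated field before posting documents. The field name category_group_key and the helper with_composite_key are hypothetical. The fallback for a null group_id is also an assumption: it reuses product_id so that products without a group are not merged into one group.

```python
def with_composite_key(doc, delimiter=","):
    """Return a copy of `doc` with a combined category/group key added."""
    group = doc.get("group_id")
    if group is None:
        # Assumption: keep ungrouped products distinct by falling back
        # to a marker derived from product_id.
        group = f"p{doc['product_id']}"
    enriched = dict(doc)
    enriched["category_group_key"] = f"{doc['category_id']}{delimiter}{group}"
    return enriched


grouped = with_composite_key({"product_id": 1, "category_id": 5, "group_id": 2})
ungrouped = with_composite_key({"product_id": 7, "category_id": 5, "group_id": None})
print(grouped["category_group_key"], ungrouped["category_group_key"])
```

The delimiter should be a character that cannot occur in either ID, otherwise two different pairs could produce the same key.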

Is it possible to do Solr faceting combining multiple fields, like distinct on multiple columns in an RDBMS?

Let's say I want to do faceting on the combination of two fields in my doc.
For example:
Field1 Field2
A B
C D
A B
A C
C B
C D
This should produce a facet result like:
AB [2]
CD [2]
AC [1]
CB [1]
Is this possible? I mean on the fly: the fields are picked arbitrarily at query time, so I cannot create a copyField at index time.
You can group two fields using pivot facets, which are available as of Solr 4.0.
You can run the following query on your index to get them:
http://localhost:8181/solr/collection1/select?q=*:*&facet=true&facet.pivot=field1,field2
The result will then look like this:
<lst name="facet_pivot">
<arr name="field1,field2">
<lst>
<str name="field">field1</str>
<str name="value">A</str>
<int name="count">3</int>
<arr name="pivot">
<lst>
<str name="field">field2</str>
<str name="value">B</str>
<int name="count">2</int>
</lst>
<lst>
<str name="field">field2</str>
<str name="value">C</str>
<int name="count">1</int>
</lst>
</arr>
</lst>
<lst>
<str name="field">field1</str>
<str name="value">C</str>
<int name="count">3</int>
<arr name="pivot">
<lst>
<str name="field">field2</str>
<str name="value">D</str>
<int name="count">2</int>
</lst>
<lst>
<str name="field">field2</str>
<str name="value">B</str>
<int name="count">1</int>
</lst>
</arr>
</lst>
</arr>
</lst>
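To get from this nested pivot to the flat "AB [2]" style list from the question, the client can walk the two levels. A minimal sketch, assuming the response was requested with wt=json, so each pivot entry is a dictionary with "value", "count" and an optional nested "pivot" list. The sample data below is hand-written to mirror the XML above, and the function name flatten_pivot is hypothetical.

```python
def flatten_pivot(pivot_entries):
    """Turn a two-level pivot facet into {(outer_value, inner_value): count}."""
    pairs = {}
    for outer in pivot_entries:
        # Entries without nested values have no "pivot" key.
        for inner in outer.get("pivot", []):
            pairs[(outer["value"], inner["value"])] = inner["count"]
    return pairs


# Stand-in for response["facet_counts"]["facet_pivot"]["field1,field2"].
sample = [
    {"field": "field1", "value": "A", "count": 3, "pivot": [
        {"field": "field2", "value": "B", "count": 2},
        {"field": "field2", "value": "C", "count": 1},
    ]},
    {"field": "field1", "value": "C", "count": 3, "pivot": [
        {"field": "field2", "value": "D", "count": 2},
        {"field": "field2", "value": "B", "count": 1},
    ]},
]

pairs = flatten_pivot(sample)
# Highest counts first; ties keep their original order.
for (first, second), count in sorted(pairs.items(), key=lambda kv: -kv[1]):
    print(f"{first}{second} [{count}]")
```

For the sample data this prints AB [2], CD [2], AC [1] and CB [1], one per line.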

Multiple cores join query

My solr version is 4.0
I have a multicore environment with a core for products and a core for availability records of these products.
The products core will contain detailed descriptions and has about 10,000 documents.
The availabilities core contains up to 4 million documents.
I built a small test set and I'm trying to get results using the join syntax, meant to find all availabilities of products containing "disney".
http://localhost:8080/solr/product/select?q={!join%20from=productid%20to=id%20fromindex=availp}disney&fl=*
I get zero results.
Individual queries on each of the cores do yield results.
Questions:
1. How should I construct the query in order to get results?
2. When I refine my query to filter on a specific date, what would the syntax be?
For example: ?fq=period:"november 2012" AND country:France
country is a field from the product index; period is a field from the availp index.
Results from individual queries: product core
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
<lst name="params">
<str name="fl">id,productname</str>
<str name="indent">1</str>
<str name="q">disney</str>
<str name="rows">1</str>
</lst>
</lst>
<result name="response" numFound="31" start="0">
<doc>
<str name="productname">DPAZ00 DPAZ00-02 DPAZ0002 Disneyland Parijs Hotel Disney's Santa Fe</str>
<str name="id">44044</str></doc>
</result>
</response>
other core: availp
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="fl">*</str>
<str name="indent">1</str>
<str name="q">productid:44044</str>
<str name="rows">1</str>
</lst>
</lst>
<result name="response" numFound="42" start="0">
<doc>
<date name="datefrom">2012-10-01T10:00:00Z</date>
<arr name="period">
<str>oktober 2012</str>
</arr>
<str name="productid">44044</str>
<double name="minpriceperperson">209.0</double>
<int name="durationcode">1</int>
<str name="id">3890</str>
<int name="budgetcode">2</int>
</doc>
</result>
</response>
1) You should query the inventory core (with product as the inner index).
This is how the query should look:
http://localhost:8080/solr/product/select?q=*&fl={!join from=id to=id fromIndex=availp}productname:disney
2) You can use the same query syntax as above:
http://localhost:8080/solr/product/select?q=period:november&fl={!join from=id to=id fromIndex=availp}productname:disney AND country:France
You can remove productname from above if not needed.
Have you tried changing fromindex to fromIndex (uppercase I)?
According to Adventures with Solr Join, the query looks like this:
http://localhost:8983/solr/parents/select?q=alive:yes AND _query_:"{!join fromIndex=children from=fatherid to=parentid v='childname:Tom'}"
It should work.
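For reference, a hedged sketch of how such a cross-core join could be assembled for the cores in the question. It relies on the documented join semantics: the inner query runs against the fromIndex core, and its "from" field values are matched against the "to" field of the core being queried. To list availabilities, the request therefore goes to the availp core with product as fromIndex. The helper name cross_core_join is hypothetical, and productname as the searched field is an assumption taken from the answer above.

```python
from urllib.parse import urlencode


def cross_core_join(from_field, to_field, from_index, inner_query):
    """Build a {!join} query string; inner_query runs against from_index."""
    return (
        f"{{!join from={from_field} to={to_field} fromIndex={from_index}}}"
        f"{inner_query}"
    )


# Product-side conditions (productname, country) go inside the join,
# because those fields live in the product core.
q = cross_core_join(
    "id", "productid", "product", "productname:disney AND country:France"
)

# period is a field of availp, so it can be a plain filter query there.
url = "http://localhost:8080/solr/availp/select?" + urlencode(
    {"q": q, "fq": 'period:"november 2012"', "fl": "*"}
)
print(url)
```

Note the uppercase I in fromIndex, as pointed out above.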

SOLR date faceting and BC / BCE dates / negative date ranges

Are date ranges that include BC dates possible?
I would like to return facets for all years between 11000 BCE (BC) and 9000 BCE (BC) using SOLR.
A sample query, with the date ranges converted to ISO 8601, might be:
q=*:*&facet.date=myfield_earliestDate&facet.date.end=-92009-01-01T00:00:00&facet.date.gap=%2B1000YEAR&facet.date.other=all&facet=on&f.myfield_earliestDate.facet.date.start=-112009-01-01T00:00:00
However, the returned results seem to suggest that the dates are in the positive range, i.e. CE, not BCE.
See the sample returned results:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">6</int>
<lst name="params">
<str name="f.vra.work.creation.earliestDate.facet.date.start">-112009-01-01T00:00:00Z</str>
<str name="facet">on</str>
<str name="q">*:*</str>
<str name="facet.date">vra.work.creation.earliestDate</str>
<str name="facet.date.gap">+1000YEAR</str>
<str name="facet.date.other">all</str>
<str name="facet.date.end">-92009-01-01T00:00:00Z</str>
</lst>
</lst>
<result name="response" numFound="9556" start="0">omitted</result>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields"/>
<lst name="facet_dates">
<lst name="vra.work.creation.earliestDate">
<int name="112010-01-01T00:00:00Z">0</int>
<int name="111010-01-01T00:00:00Z">0</int>
<int name="110010-01-01T00:00:00Z">0</int>
<int name="109010-01-01T00:00:00Z">0</int>
<int name="108010-01-01T00:00:00Z">0</int>
<int name="107010-01-01T00:00:00Z">0</int>
<int name="106010-01-01T00:00:00Z">0</int>
<int name="105010-01-01T00:00:00Z">0</int>
<int name="104010-01-01T00:00:00Z">0</int>
<int name="103010-01-01T00:00:00Z">0</int>
<int name="102010-01-01T00:00:00Z">0</int>
<int name="101010-01-01T00:00:00Z">0</int>
<int name="100010-01-01T00:00:00Z">5781</int>
<int name="99010-01-01T00:00:00Z">0</int>
<int name="98010-01-01T00:00:00Z">0</int>
<int name="97010-01-01T00:00:00Z">0</int>
<int name="96010-01-01T00:00:00Z">0</int>
<int name="95010-01-01T00:00:00Z">0</int>
<int name="94010-01-01T00:00:00Z">0</int>
<int name="93010-01-01T00:00:00Z">0</int>
<str name="gap">+1000YEAR</str>
<date name="end">92010-01-01T00:00:00Z</date>
<int name="before">224</int>
<int name="after">0</int>
<int name="between">5690</int>
</lst>
</lst>
</lst>
</response>
Any ideas why this is the case? Can Solr handle negative dates such as -112009-01-01T00:00:00Z?
I don't think this is fully supported. At least I don't see any explicit references to BC dates in the source code or the tests.
I even tried defining BC years using date math, e.g:
facet.date.start: NOW-11000YEARS
facet.date.end: NOW
facet.date.gap: +1000YEAR
and got some weird results:
<int name="8991-06-07T20:30:45-.666Z">0</int>
<int name="7991-06-07T20:30:45-.666Z">0</int>
<int name="6991-06-07T20:30:45-.666Z">0</int>
<int name="5991-06-07T20:30:45-.666Z">0</int>
<int name="4991-06-07T20:30:45-.666Z">0</int>
<int name="3991-06-07T20:30:45-.666Z">0</int>
<int name="2991-06-07T20:30:45-.666Z">0</int>
<int name="1991-06-07T20:30:45-.666Z">0</int>
<int name="0991-06-07T20:30:45-.666Z">0</int>
<int name="0010-06-07T20:30:45-.666Z">0</int>
<int name="1010-06-07T20:30:45-.666Z">1435</int>
Note the - after the seconds. Looks like a bug to me...
