Solr: Facet one field with two outputs

I'm using Solr to index products and organise them into several categories. Each document has a multi-valued field, taxon_names, which stores the product's categories as human-readable strings.
Now I want to fetch all the categories from Solr and display them to the user as clickable links, without hitting the database again. At index time, I fetch the permalink of every category from the MySQL database and store them in a multi-valued field, taxon_permalinks. To generate the links to the products, I need both the human-readable name of the category and its permalink (otherwise, using the plain human-readable name in the URL would produce ugly URLs in the browser, e.g. %20 for each space).
When I run a facet search with http://localhost:8982/solr/default/select?q=*%3A*&rows=0&wt=xml&facet=true&facet.field=taxon_names, I get a list of human-readable taxons with their counts. Based on this list, I want to create the links without hitting the database again.
So, is it possible to retrieve the matching permalinks for the different categories from Solr? For example, I get an XML response like this:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
</lst>
<result name="response" numFound="6580" start="0"/>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="taxon_names">
<int name="Books">2831</int>
<int name="Music">984</int>
...
</lst>
</lst>
</lst>
</response>
Inside the taxon_names list, I would also need the permalink for each name.
Maybe this is possible by defining a custom field type in the config XMLs, but I don't have enough experience with Solr for that.

From your description, it appears that you are indexing the permalinks in the taxon_permalinks field, and that its values correspond to the category names in the taxon_names field. Solr allows you to facet on multiple fields, so you can simply facet on both fields and walk the two facet results, taking the display name from the taxon_names facet values and the permalink from the taxon_permalinks facet values.
Query:
http://localhost:8982/solr/default/select?q=*%3A*&rows=0&wt=xml&facet=true&facet.field=taxon_names&facet.field=taxon_permalinks
Your output should then look similar to the following:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
</lst>
<result name="response" numFound="6580" start="0"/>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="taxon_names">
<int name="Books">2831</int>
<int name="Music">984</int>
...
</lst>
<lst name="taxon_permalinks">
<int name="permalink1">2831</int>
<int name="permalink2">984</int>
...
</lst>
</lst>
</lst>
</response>
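Walking the two facet lists can be sketched in Python with only the standard library. This is a minimal sketch, assuming the permalink field is named taxon_permalinks (as in the question), that every category name maps to exactly one permalink, and that Solr returns both lists in matching order. The helper names (facet_url, facet_values, pair_facets) and the "/t/" link prefix are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def facet_url(base: str, fields: list[str]) -> str:
    """Build a rows=0 facet request; facet.field may be repeated."""
    params = [("q", "*:*"), ("rows", "0"), ("wt", "xml"), ("facet", "true")]
    params += [("facet.field", f) for f in fields]
    return base + "/select?" + urlencode(params)


def facet_values(root: ET.Element, field: str) -> list[tuple[str, int]]:
    """Return [(value, count), ...] for one facet field, in Solr's order."""
    path = f"./lst[@name='facet_counts']/lst[@name='facet_fields']/lst[@name='{field}']"
    node = root.find(path)
    if node is None:
        raise KeyError(f"no facet for field {field!r}")
    # Iterate all children so both <int> and <long> counts are accepted.
    return [(item.get("name"), int(item.text)) for item in node]


def pair_facets(xml_text: str, name_field: str, link_field: str) -> list[tuple[str, str, int]]:
    """Zip two facet lists position by position into (name, permalink, count)."""
    root = ET.fromstring(xml_text)
    names = facet_values(root, name_field)
    links = facet_values(root, link_field)
    if len(names) != len(links):
        raise ValueError("facet lists have different lengths")
    pairs = []
    for (name, n_count), (link, l_count) in zip(names, links):
        # Position alone proves nothing; at least insist that the counts agree.
        if n_count != l_count:
            raise ValueError(f"count mismatch: {name}={n_count} vs {link}={l_count}")
        pairs.append((name, link, n_count))
    return pairs


SAMPLE = """<response>
<lst name="facet_counts">
<lst name="facet_fields">
<lst name="taxon_names"><int name="Books">2831</int><int name="Music">984</int></lst>
<lst name="taxon_permalinks"><int name="permalink1">2831</int><int name="permalink2">984</int></lst>
</lst>
</lst>
</response>"""

print(facet_url("http://localhost:8982/solr/default", ["taxon_names", "taxon_permalinks"]))
for name, link, count in pair_facets(SAMPLE, "taxon_names", "taxon_permalinks"):
    print(f'<a href="/t/{link}">{name}</a> ({count})')  # "/t/" prefix is hypothetical
```

The count check only catches gross misalignment. The two facet lists are sorted independently (by count, by default), so two categories with the same count may come back in different relative orders, and the check cannot detect such a swap.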

Related

Solr Facet Search-Spell check

I'm using Solr facet search on a database column. It successfully returns the data:
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="tags">
<int name="lol">58</int>
<int name="scienc">58</int>
<int name="photo">34</int>
<int name="axiom">27</int>
<int name="geniu">14</int>
</lst>
</lst>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>
I want to make sure that only complete words are counted. In the above example, you can see counts for 'scienc' and 'geniu' that should be for 'science' and 'genius'. How can I achieve this? Can I use the spell-checking feature?
This probably has to do with the fieldType associated with your tags field. The field values are most likely being stemmed or otherwise modified by its analyzers. I would suggest one of two things:
1. Remove the stemming and/or other processing so that the words are no longer truncated.
2. (Recommended) Create a separate tags_facet field with type="string" in your schema.xml, and use a copyField directive to copy the values of your original tags field into it. Then facet on the new tags_facet field.
Use Solr's copyField feature to copy the original field into one with a string fieldType. If the values are sets of words rather than single strings, you could use a whitespace-tokenized fieldType instead (without n-grams, of course).

Solr : How can I group on two different fields?

My schema looks like this:
product_id
category_id
A category contains products.
In Solr 3.6, I group results on category_id, and it works well.
I just added a new field:
group_id
A group contains products that vary on size or color.
Example: shoes in blue, red and yellow are three different products and have the same group_id.
In addition to grouping results on category_id, I would like my results to contain only one product per group_id, bearing in mind that group_id can be null (for products that aren't part of a group).
To follow the shoe example, this means that for the query "shoe", only one of the three products should be in the results.
I thought of doing a second result grouping on group_id, but that doesn't seem to be possible.
Any ideas?
EDIT: For now, I process the results in PHP to remove documents whose group_id already appears in the results. I'm leaving this question open in case someone finds out how to group on two fields.
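The client-side workaround described in the EDIT (done there in PHP) can be sketched in Python as follows. It assumes each result document is a dict whose group_id key may be missing or None, and that documents arrive in ranking order; one_per_group is a hypothetical name. Because documents are dropped after Solr paginates, a page can end up with fewer products than rows.

```python
from typing import Any, Iterable


def one_per_group(docs: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep the first document of each group_id; keep every ungrouped document."""
    seen: set[Any] = set()
    kept = []
    for doc in docs:
        gid = doc.get("group_id")
        if gid is None:
            kept.append(doc)      # products outside any group are always kept
        elif gid not in seen:
            seen.add(gid)
            kept.append(doc)      # first (highest-ranked) product of this group
    return kept


docs = [
    {"product_id": 1, "group_id": 7},     # blue shoe
    {"product_id": 2, "group_id": 7},     # red shoe, same group
    {"product_id": 3, "group_id": None},  # not part of any group
    {"product_id": 4, "group_id": 7},     # yellow shoe, same group
]
print([d["product_id"] for d in one_per_group(docs)])  # [1, 3]
```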
If your aim is to get grouping counts based on multiple "group by" fields, you can use pivot faceting to achieve this.
&facet.pivot=category_id,group_id
Solr will give you back a hierarchy of grouped result counts, following the page of search results, under the facet_pivot element.
http://wiki.apache.org/solr/SimpleFacetParameters?highlight=%28pivot%29#Pivot_.28ie_Decision_Tree.29_Faceting
It is not possible to group a query on two fields.
If you only need counts, you can use facet.field (for a single field) or facet.pivot (for multiple fields).
This is not actually grouping, but it gives you the count of each group across multiple fields.
Example Output:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<bool name="zkConnected">true</bool>
<int name="status">0</int>
<int name="QTime">306</int>
</lst>
<result name="response" numFound="667" start="0" maxScore="0.70710677">
<doc>
<int name="idField">7393</int>
<int name="field_one">12</int>
</doc>
</result>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields"/>
<lst name="facet_ranges"/>
<lst name="facet_intervals"/>
<lst name="facet_heatmaps"/>
<lst name="facet_pivot">
<arr name="field_one,field_two">
<lst>
<str name="field">field_one</str>
<int name="value">3</int>
<int name="count">562</int>
<arr name="pivot">
<lst>
<str name="field">field_two</str>
<bool name="value">true</bool>
<int name="count">347</int>
</lst>
<lst>
<str name="field">field_two</str>
<bool name="value">false</bool>
<int name="count">215</int>
</lst>
</arr>
</lst>
<lst>
<str name="field">field_one</str>
<int name="value">12</int>
<int name="count">105</int>
<arr name="pivot">
<lst>
<str name="field">field_two</str>
<bool name="value">true</bool>
<int name="count">97</int>
</lst>
<lst>
<str name="field">field_two</str>
<bool name="value">false</bool>
<int name="count">8</int>
</lst>
</arr>
</lst>
</arr>
</lst>
</lst>
</response>
Example query:
http://192.168.100.145:7983/solr/<collection>/select?facet.pivot=field_one,field_two&facet=on&fl=idField,field_one&indent=on&q=field_one:(3%2012)&rows=1&wt=xml
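To consume a facet_pivot section like the one above, a small recursive walk is enough. This is a sketch using Python's xml.etree.ElementTree, assuming the XML response format (wt=xml); walk and pivot_tree are hypothetical helper names. Values are kept as the raw text Solr returned ("3", "true"), which avoids guessing their types.

```python
import xml.etree.ElementTree as ET


def walk(entry: ET.Element) -> dict:
    """Turn one pivot <lst> into {value: {"count": n, "pivot": {...}}}."""
    value = entry.find("*[@name='value']").text     # <int>, <str> or <bool>
    count = int(entry.find("*[@name='count']").text)
    nested = {}
    children = entry.find("arr[@name='pivot']")
    if children is not None:                          # leaf entries have no pivot
        for child in children.findall("lst"):
            nested.update(walk(child))
    return {value: {"count": count, "pivot": nested}}


def pivot_tree(xml_text: str, pivot: str) -> dict:
    """Parse the facet_pivot section of an XML (wt=xml) response."""
    root = ET.fromstring(xml_text)
    path = f"./lst[@name='facet_counts']/lst[@name='facet_pivot']/arr[@name='{pivot}']"
    arr = root.find(path)
    if arr is None:
        raise KeyError(f"no pivot named {pivot!r}")
    tree = {}
    for entry in arr.findall("lst"):
        tree.update(walk(entry))
    return tree
```

For the example output above, pivot_tree(response_text, "field_one,field_two")["3"]["pivot"]["true"]["count"] would be 347.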
If you can change the data that you post to Solr, I suggest creating a string field that holds the concatenation of category_id and group_id. For example, if category_id=5 and group_id=2, the string field could be '5,2' (using ',' or any other character as a delimiter). You can then group on this string field.
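A sketch of building that combined value at index time, in Python. The field name category_group and the function category_group_key are hypothetical. One caveat with a plain category_id,group_id key: every product whose group_id is null would collapse into a single group per category. This sketch therefore falls back to the product_id for ungrouped products, which is an assumption based on the question's requirement that ungrouped products stay separate.

```python
from typing import Optional


def category_group_key(category_id: int, group_id: Optional[int], product_id: int) -> str:
    """Value for a hypothetical string field 'category_group' to group on."""
    if group_id is None:
        # No group: use the product itself so ungrouped products stay distinct.
        return f"{category_id},p{product_id}"
    return f"{category_id},{group_id}"


doc = {"product_id": 42, "category_id": 5, "group_id": 2}
doc["category_group"] = category_group_key(doc["category_id"], doc["group_id"], doc["product_id"])
print(doc["category_group"])  # 5,2
```

At query time you would then group on it, e.g. with group=true&group.field=category_group.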

Solr Grouping with multifield facets

I want to know whether this is possible with a Solr query:
Two columns to consider: location1, location2
I want to facet on both columns.
The query below works:
http://localhost:8983/solr/select/?q=*:*&version=2.2&rows=0&facet=true&facet.field=location1&facet.field=location2
Response:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">13</int>
</lst>
<result name="response" numFound="7789" start="0"/>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="location1">
<int name="Chicago">100</int>
<int name="NewYork">50</int>
<int name="Washington">30</int>
</lst>
<lst name="location2">
<int name="Washington">200</int>
<int name="Philadelphia">100</int>
<int name="Chicago">50</int>
</lst>
</lst>
</lst>
</response>
What I need is to combine location1 and location2 and get the following results:
Washington :230
Chicago :50
Philadelphia:100
Washington :30
Currently we do this in the service layer. But can it be done using result grouping in Solr? My understanding is that result grouping aggregates documents, but does not aggregate facet counts.
You need to store both location1 and location2 in a single multi-valued field, say locations. Then you can issue this facet query to get what you want:
q=*:*&rows=0&facet=true&facet.field=locations
Solr does not support grouping on multi-valued fields; that has not been implemented yet.
You can probably create a new field at index time with a combined value and use that field for faceting.
EDIT:
Use a copyField to copy the contents of both fields into a single field and facet on it. This only requires schema changes and reindexing the data.
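If reindexing is not an option, the service-layer merge mentioned in the question amounts to summing the two facet lists. This is a Python sketch using collections.Counter, with the counts copied from the sample response. Note the semantic difference: a multi-valued locations facet counts documents, so a document with the same city in both fields is counted once there, whereas summing the two facets counts it twice.

```python
from collections import Counter

# Facet counts copied from the sample response for location1 and location2.
location1 = {"Chicago": 100, "NewYork": 50, "Washington": 30}
location2 = {"Washington": 200, "Philadelphia": 100, "Chicago": 50}

merged = Counter(location1) + Counter(location2)
for city, count in merged.most_common():   # highest combined count first
    print(f"{city}: {count}")
# Washington: 230, Chicago: 150, Philadelphia: 100, NewYork: 50
```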

Multiple cores join query

My Solr version is 4.0.
I have a multicore environment with a core for products and a core for availability records of these products.
The products core will contain detailed descriptions and has about 10,000 documents.
The availabilities core contains up to 4 million documents.
I built a small test set, and I'm trying to use the join syntax to find all availabilities of products containing "disney".
http://localhost:8080/solr/product/select?q={!join%20from=productid%20to=id%20fromindex=availp}disney&fl=*
I get zero results.
Individual queries on each of the cores do yield results.
Questions:
1. How should I construct the query in order to get results?
2. When I refine my query to filter on a specific date, what would the syntax be?
For example: ?fq=period:"november 2012" AND country:France
Here country is a field from the product index, and period is a field from the availp index.
Results from individual queries: product core
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
<lst name="params">
<str name="fl">id,productname</str>
<str name="indent">1</str>
<str name="q">disney</str>
<str name="rows">1</str>
</lst>
</lst>
<result name="response" numFound="31" start="0">
<doc>
<str name="productname">DPAZ00 DPAZ00-02 DPAZ0002 Disneyland Parijs Hotel Disney's Santa Fe</str>
<str name="id">44044</str></doc>
</result>
</response>
Other core: availp
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="fl">*</str>
<str name="indent">1</str>
<str name="q">productid:44044</str>
<str name="rows">1</str>
</lst>
</lst>
<result name="response" numFound="42" start="0">
<doc>
<date name="datefrom">2012-10-01T10:00:00Z</date>
<arr name="period">
<str>oktober 2012</str>
</arr>
<str name="productid">44044</str>
<double name="minpriceperperson">209.0</double>
<int name="durationcode">1</int>
<str name="id">3890</str>
<int name="budgetcode">2</int>
</doc>
</result>
</response>
1) You should query the availability core (availp), with product as the inner (fromIndex) core.
The query should look like this:
http://localhost:8080/solr/availp/select?q={!join from=id to=productid fromIndex=product}productname:disney&fl=*
2) You can use the same query syntax, with the period filter applied to the availp core:
http://localhost:8080/solr/availp/select?q={!join from=id to=productid fromIndex=product}productname:disney AND country:France&fq=period:"november 2012"
You can remove productname:disney from the above if it is not needed.
Have you tried changing fromindex to fromIndex (with an uppercase I)?
According to Adventures with Solr Join, the query looks like this:
http://localhost:8983/solr/parents/select?q=alive:yes AND _query_:"{!join fromIndex=children from=fatherid to=parentid v='childname:Tom'}"
It should work.
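Join queries use local params with braces, spaces and quotes, so they are easiest to send URL-encoded. This is a Python sketch that builds the join request for the question's data against the availp core. It assumes the link is product.id = availp.productid, as in the two sample responses, and that both cores are served from the same Solr instance; build_join_url is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_join_url(base: str, inner_query: str, filters: list[str]) -> str:
    """Query availp, keeping docs whose productid matches the id of a product hit.

    {!join from=id to=productid fromIndex=product} runs inner_query on the
    product core, collects the id values of the matches, and returns availp
    documents whose productid is one of them.
    """
    q = "{!join from=id to=productid fromIndex=product}" + inner_query
    params = [("q", q), ("fl", "*")] + [("fq", f) for f in filters]
    return base + "/availp/select?" + urlencode(params)


url = build_join_url(
    "http://localhost:8080/solr",
    "productname:disney AND country:France",  # fields of the product core
    ['period:"november 2012"'],               # field of the availp core
)
print(url)
print(parse_qs(urlsplit(url).query))          # decoded parameters, for checking
```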

How do I detect "ERROR:SCHEMA-INDEX-MISMATCH" in Solr?

How do I find documents in my index that have a SCHEMA-INDEX-MISMATCH? I have a number of these, and so far I have only found them by trial and error. I want to query for them.
The results that I get have "ERROR:SCHEMA-INDEX-MISMATCH" in a field. An example:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<result name="response" numFound="1" start="0" maxScore="12.993319">
<doc>
<float name="score">12.993319</float>
<str name="articleId">ERROR:SCHEMA-INDEX-MISMATCH,stringValue=555</str>
<str name="articleType">Knowledge Base</str>
<str name="description">Moving to another drive Question: How can I ....</str>
<str name="id">article:555</str>
<str name="title">Moving to another drive</str>
<str name="type">article</str>
</doc>
</result>
</response>
If it matters, my query is along the lines of http://server/solr/select?q=id:%22article:555%22
What is the "type" of articleId?
I had issues with a date field: due to a defect in the indexing program, some values showed up as 'ERROR:SCHEMA-INDEX-MISMATCH'. Since these values fall outside the bounds of a normal date, I was able to find them with the query NOT myDateFieldType:[0001-01-01T00:00:00Z TO NOW].
If you can craft a similar query for your field's data type, you should be able to find these values.
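Where no such range query is available for a field's type, a fallback is to page through the index and check the returned stored values on the client. This is a Python sketch that scans an XML response for stored values beginning with the ERROR:SCHEMA-INDEX-MISMATCH marker seen above. It assumes the id field is included in fl; find_mismatches is a hypothetical name, and paging through the whole index (start/rows) is left out.

```python
import xml.etree.ElementTree as ET

MARKER = "ERROR:SCHEMA-INDEX-MISMATCH"


def find_mismatches(xml_text: str) -> list[tuple[str, str]]:
    """Return (doc id, field name) for each stored field showing the marker."""
    root = ET.fromstring(xml_text)
    hits = []
    for doc in root.iter("doc"):
        id_node = doc.find("*[@name='id']")
        doc_id = id_node.text if id_node is not None else "?"
        for field in doc:  # one child per field; multi-valued fields use <arr>
            values = list(field) if field.tag == "arr" else [field]
            if any((v.text or "").startswith(MARKER) for v in values):
                hits.append((doc_id, field.get("name")))
    return hits
```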
