How to join and search all the fields in Solr? - solr

I have two document types, product and seller.
Product: {ID, NAME, DESCRIPTION}
Seller: {ID, PRODUCT_ID, SELLER_NAME, ATTRIBUTE_NAME, ATTRIBUTE_VALUE}
I need to join these two document types and search across all the fields of both seller and product.
So far I'm trying something like {!join from=product_id to=id}seller_name:"Sample-2". This searches for the value "Sample-2" in the seller_name field of the seller documents. How can I modify this to search all the fields of product and seller along with the join?

Usually you'd implement this either by using copyField directives to copy all the terms into one field and searching on that field, or by supplying a qf= parameter listing the fields you want to search (with the dismax or edismax query parser).
If you're going to do a lot of these, you might want to create a separate core and index pre-processed data into it, with copyField directives to create a catch-all field.
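As a rough sketch of the qf= route combined with the join from the question, the request parameters could be assembled like this. The lowercase field names and the parameter names sellerq and text are assumptions made for illustration; this has not been run against a real index, and it only covers the seller side of the search.

```python
from urllib.parse import urlencode

def build_join_search_params(text):
    """Build Solr params that search seller fields via edismax, then join to products."""
    return {
        # Join seller documents (product_id) onto product documents (id).
        "q": "{!join from=product_id to=id v=$sellerq}",
        # The inner query searches several seller fields at once through qf.
        # "sellerq" is a hypothetical parameter name, dereferenced by v=$sellerq.
        "sellerq": "{!edismax qf='seller_name attribute_name attribute_value' v=$text}",
        # The user's search text, dereferenced by v=$text.
        "text": text,
    }

params = build_join_search_params('"Sample-2"')
query_string = urlencode(params)  # ready to append to .../select?
```

Searching the product fields as well would need a second clause (or the catch-all field approach), which is why the separate core with copyField tends to be simpler if you do this often.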

Related

Solr filter on group

I am looking for a way to apply a filter on a group in Solr. That is, if at least one of the documents in a group is not excluded by the filter, the entire group should be displayed in the search results. I also need to apply a Solr filter at the document level to filter documents inside a group. For example, I have the following documents, grouped by baseProductCode:
group test1
Test1VariantProduct1
restrictedCountries: US
type: ebook
baseProductCode: test1
Test1VariantProduct2
type: paperbook
baseProductCode: test1
In this example I need to filter documents by the restrictedCountries field and filter groups by type. That is, I would like to filter out documents with restricted countries, which could be implemented using fq=-restrictedCountries:US. On the other hand, I want to hide a group from the search results by type if all of its documents are hidden. As a result, I want the following cases to be valid:
Our system shouldn't display this group in search results for the US and the ebook type.
Our system should display this group in search results for any other country and the ebook type.
Could you please advise whether it is possible to implement this using Solr features?
If you're using faceting to get your groups (i.e. you'd be faceting on baseProductCode), the facets are computed over the set of documents that remains after filtering, so a group won't be shown if no documents from that group match.
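A minimal sketch of such a request, assuming the field names from the question. Note that facet.field returns zero-count values unless facet.mincount is raised, so it is set to 1 here; the helper name is made up for illustration.

```python
from urllib.parse import urlencode

def build_group_facet_params(country, product_type):
    """Return Solr params as (key, value) pairs; fq is repeated once per filter."""
    return [
        ("q", "*:*"),
        # Document-level filter: drop documents restricted in this country.
        ("fq", f"-restrictedCountries:{country}"),
        # Second filter: keep only the requested product type.
        ("fq", f"type:{product_type}"),
        # Facet on the grouping field over whatever survives the filters.
        ("facet", "true"),
        ("facet.field", "baseProductCode"),
        # Hide groups whose documents were all filtered out.
        ("facet.mincount", "1"),
        ("rows", "0"),
    ]

group_params = build_group_facet_params("US", "ebook")
group_query_string = urlencode(group_params)
```

With the sample data, the US/ebook request filters out Test1VariantProduct1 by country and Test1VariantProduct2 by type, so test1 would have no remaining documents and would not appear among the facet values.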

Solr search statistics in category scale

I need to implement the functionality shown in the picture attached below. I've already built an application based on Solr search.
In short: a drop-down will list similar search phrases within a specific category, along with the number of items found.
How can I make Solr collect this data and return it?
Yes, you can do that in Solr using facets, which allow grouping results. The default behaviour of facets is to return the group name and the number of items found. You do that by adding these two items to your query string: facet=true and facet.field=category.
An example query in your case will be
http://localhost:8983/solr/NAME_OF_YOUR_INDEX/select/?wt=json&indent=on&q=ipo&fl=category,name&facet=true&facet.field=category
Take a look at the tutorial for more details.
This is roughly equivalent to doing this in SQL:
SELECT category, COUNT(*) FROM items WHERE text LIKE '%ipo%' GROUP BY category;
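With wt=json and default settings, the counts for a facet field come back as a flat list that alternates value and count. A small sketch of turning that into a mapping for the drop-down follows; the sample response is made up for illustration and is not real output.

```python
def facet_counts_to_dict(response, field):
    """Convert Solr's flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    # Pair each even-indexed value with the odd-indexed count that follows it.
    return dict(zip(flat[0::2], flat[1::2]))

# Hypothetical, trimmed-down response body for the example query above.
sample_response = {
    "facet_counts": {
        "facet_fields": {"category": ["phones", 12, "tablets", 3]}
    }
}
counts = facet_counts_to_dict(sample_response, "category")
```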

Solr query time join

I am attempting to do a join on two fields that have the same name (company_id) but are from different entities to query a document based on a field it does not have.
For example: I have a sales entity and a company entity, where the sales entity holds a company ID and the company entity holds the name of the company.
For size reasons, I cannot do this join at index time.
I wish to get the names of the companies that have a sale over x.
I attempted both of the following:
q={!join+from=company_id+to=company_id}sales:[100 TO *]
and
fq={!join+from=company_id+to=company_id}sales:[100 TO *]
For the fq one I just specified *:* as the q parameter.
In both cases I got results, but the results did not have sales in that range.
How can I fix this?
Using Solr 4.4
Note: This appears to work with only one entity involved.
By "different entities", are you referring to two Solr cores?
In that case you have to use a slightly different syntax:
http://localhost:8983/solr/<coreTO>/select?q={!join from=docId to=id fromIndex=<coreFROM>}query
Source: Solr-join
I have found the solution.
According to this:
The join operation is done on a term basis, so the "from" and "to" fields must use compatible field types. For example: joining between a StrField and a TrieIntField will not work, likewise joining between a StrField and a TextField that uses LowerCaseFilterFactory will only work for values that are already lower cased in the string field.
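To see why the field types matter, here is a toy simulation of a term-based join in plain Python. It does not call Solr; the lowercase_to flag merely stands in for an analyzer such as LowerCaseFilterFactory on the "to" field, and the company IDs are invented.

```python
def join_terms(from_values, to_values, lowercase_to=False):
    """Return the 'from' terms that find an identical indexed term on the 'to' side."""
    # Index the "to" side, optionally lowercasing as a text analyzer might.
    to_terms = {v.lower() if lowercase_to else v for v in to_values}
    # A term-based join only matches when the terms are exactly equal.
    return sorted(v for v in set(from_values) if v in to_terms)

from_ids = ["ACME", "globex"]  # stored verbatim, as a string field would store them
to_ids = ["ACME", "Globex"]    # raw values supplied for the "to" field

exact = join_terms(from_ids, to_ids)                       # both sides verbatim
lowered = join_terms(from_ids, to_ids, lowercase_to=True)  # "to" side lowercased
```

With both sides stored verbatim only "ACME" joins; once the "to" side is lowercased, only the "from" value that was already lowercase ("globex") still matches.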

Difference between Solr Facet Fields and Filter Queries

I am using SolrMeter to test the Apache Solr search engine. The difference between facet fields and filter queries is not clear to me. The SolrMeter tutorial lists this as an example of facet fields:
content
category
fileExtension
and this as an example of filter queries:
category:animal
category:vegetable
category:vegetable price:[0 TO 10]
category:vegetable price:[10 TO *]
I am having a hard time wrapping my head around it. Could somebody explain by example? Can I use SolrMeter without specifying either facets or filters?
Facet fields are used to get statistics about the returned documents: specifically, for each value of that field, how many returned documents have that value. For example, if 10 products match a query for "soft rug" and you facet on "origin", you might get 6 documents for "Oklahoma" and 4 for "Texas". The facet field query will give you the numbers 6 and 4.
Filter queries, on the other hand, are used to narrow the returned results by adding another constraint. The thing to remember is that a query used for filtering doesn't affect the scoring or relevancy of the documents. For example, you might search your index for a product but only want to return results constrained to a geographic area.
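The difference can be mimicked in a few lines of plain Python, without Solr, using the "soft rug" numbers from above. The documents and helper names are invented for the illustration.

```python
from collections import Counter

def filter_docs(docs, field, value):
    """Filter query analogue: keep only documents that satisfy the constraint."""
    return [d for d in docs if d.get(field) == value]

def facet(docs, field):
    """Facet field analogue: count documents per value of the field."""
    return Counter(d[field] for d in docs if field in d)

# Ten hypothetical documents matching the main query "soft rug".
results = [{"name": f"soft rug {i}", "origin": "Oklahoma"} for i in range(6)]
results += [{"name": f"soft rug {i}", "origin": "Texas"} for i in range(6, 10)]

origin_counts = facet(results, "origin")              # statistics about the result set
texas_only = filter_docs(results, "origin", "Texas")  # a narrower result set
```

Faceting leaves the result set alone and reports counts per value; filtering returns fewer documents and reports nothing about the rest.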
A facet is a field (type) of the document, so category is the field. As Ansari said, facets are used to get statistics and provide grouping capabilities. You could apply grouping on the category field to show everything vegetable as one group.
Edit: The parts about searching inside a specific field are wrong. A filter query does not search inside that field only; it should be described as 'adding a constraint to the search' instead.
Performing a filter query of category:vegetable will search for vegetable in the category field and no other fields of the document. It is used to search just specific fields rather than every field. Sometimes you know that the term you want is only in one field, so you can search just that one field.

Is there any workaround for sorting on multiValued field?

Sorting can be done on the "score" of the document, or on any multiValued="false" indexed="true" field provided that field is either non-tokenized (ie: has no Analyzer) or uses an Analyzer that only produces a single Term (ie: uses the KeywordTokenizer)
Docs: http://wiki.apache.org/solr/CommonQueryParameters#sort
My original schema is (you can think of the following as a GROUP BY):
products (id, unique)
users who make some comments(multiValued)
last_comment_date for each user (multiValued, one user can make multiple comments, but only the last comment date is captured)
If sorting on a multiValued field were allowed, I could easily get the list of products commented on by certain users and then sort by last_activity_date.
However, it does not work.
The workaround I currently have is to reverse the schema to:
user + product (as id, unique)
user (single value)
last_comment_date
products
This means I can (sort of) get the list of products commented on by certain users, ordered by last_comment_date. Of course, it leads to duplication of products, as a product will appear once for each user's comment.
Any suggestions for simulating a group-by effect?
By the way, I am using Solr 3.1.
Field collapsing does not apply.
Sorting by a multi-valued field is not something that is merely pending implementation or that can be patched in.
It cannot be done because it simply doesn't make sense: there is no single value per document to sort by.
The way to do this is to have a single-valued field per document (populated at index time with the latest date), then sort on that. That is, when indexing, traverse the list of users with their last activity dates, find the latest date, and assign it to the document's last-activity-date field.
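A minimal sketch of that index-time step, assuming dates are kept as Solr-style UTC strings and that every product has at least one comment date. The field name last_activity_date and the sample document are illustrative, not taken from a real schema.

```python
from datetime import datetime

SOLR_DATE = "%Y-%m-%dT%H:%M:%SZ"  # Solr's UTC date format

def add_last_activity_date(doc):
    """Return a copy of doc with a single-valued field holding the latest comment date."""
    # Parse every per-user date from the multi-valued field.
    dates = [datetime.strptime(d, SOLR_DATE) for d in doc["last_comment_date"]]
    enriched = dict(doc)
    # Store only the latest date; this is the field to sort on.
    enriched["last_activity_date"] = max(dates).strftime(SOLR_DATE)
    return enriched

# Hypothetical product document before indexing.
product = {
    "id": "p1",
    "users": ["alice", "bob"],
    "last_comment_date": ["2011-03-01T10:00:00Z", "2011-05-20T08:30:00Z"],
}
indexed = add_last_activity_date(product)
```

The query can then filter on the multi-valued users field and sort on the single-valued last_activity_date. The caveat is that this is the latest date across all users of the product, not the date for the specific user being filtered on.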
