Solr Faceting Multi-valued vs Tokenizers - solr

I'm trying to set up a subject field in my schema. I'm drawing from a database where a single record can have multiple subjects and the subjects are listed in a comma delimited string. Is there a way to facet on just one of the subjects?
Thanks

See SolrFacetingOverview for an overview of faceting.
Its Facet Indexing section describes the field type you should choose for a field that you want to facet on.
You can customize faceting with SimpleFacetParameters.
To restrict the results to records that have a particular subject, use a filter query, e.g. fq=subject:"MATH".
The filter returns only the documents matching that criterion, and the facet counts are then computed over that filtered result set.
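As a rough illustration of how those parameters fit together, here is a small Python sketch that assembles such a request URL. The host, the `/select` path and the field name `subject` are assumptions for the example, not values taken from the asker's setup.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, query, facet_field, filter_value=None):
    """Build a Solr select URL that facets on one field and optionally filters on it."""
    params = [
        ("q", query),
        ("facet", "true"),
        ("facet.field", facet_field),
    ]
    if filter_value is not None:
        # Quote the value so a multi-word subject is matched as a single phrase.
        params.append(("fq", f'{facet_field}:"{filter_value}"'))
    return f"{base_url}?{urlencode(params)}"


# Hypothetical host and field name, for illustration only.
url = build_facet_query("http://localhost:8983/solr/select", "*:*", "subject", "MATH")
print(url)
```

Leaving out `filter_value` gives the plain faceting request; passing it adds the `fq` restriction described above.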

If I understand correctly, you want this in the DIH (DataImportHandler) config file:
<entity name="entity" pk="id" query="..." transformer="RegexTransformer">
<field column="subjects" splitBy=","/>
</entity>
and this query for faceting:
http://localhost:8983/solr/select?q=...&facet=true&facet.field=subjects&facet.query=subjects:the-one-you-want
Would that work?
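To make the effect of splitting concrete, here is a toy Python sketch (not Solr code) of what happens conceptually: each comma-delimited string becomes several values, and a facet on the field counts each value separately. The sample records are invented, and stripping whitespace around each value is a choice made in this sketch.

```python
from collections import Counter


def split_subjects(raw, delimiter=","):
    """Split a delimited subject string into individual, trimmed values."""
    return [part.strip() for part in raw.split(delimiter) if part.strip()]


# Hypothetical database rows, one comma-delimited subject string per record.
records = ["MATH, PHYSICS", "MATH,ART", "ART"]

# A facet on a multi-valued field counts every value of every record.
facet_counts = Counter(
    subject for raw in records for subject in split_subjects(raw)
)
print(facet_counts)
```

Without the split, the facet would instead count whole strings such as "MATH, PHYSICS" as single values.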

Related

Solr filter on facets

Each of my documents can have one or more entries of a field called Classes, describing some properties of the document, always of the form:
<field name="Classes">"<Description> - <TypeLabel> - <OriginLabel>"</field>
So for instance a document about food might have the two fields:
<field name="Classes">"Yellow orange - Fruit - California"</field>
<field name="Classes">"Small broccoli - Vegetable - Florida"</field>
I am using Solr 5.0 with a schema.xml file in which Classes is a multiValued "text_en" field that I copy to a "string" field, Classes_asString, so that I can facet on the whole value and treat it as one big label.
With facet.field on Classes_asString I am getting the facet counts that I want, but now I would like to additionally filter these results.
For example, how do I only get facet results that end with "California"?
Or, in another example, how do I only get facet results that have "Vegetable" between the two "-"?
I have seen the option facet.prefix, but this is not applicable in my case. I would appreciate any help or suggestions.
Maybe this is a good scenario for child documents:
Index the Classes info as child documents. Each value carries at least 3 pieces of information (description, type label, origin label), so it may be worth giving each one its own document with separate fields.
Then you should be able to facet on the specific child field, either with a current Solr version if that is supported (I'm not sure), or with the work in this ticket, which has not been merged yet.
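If restructuring the index is not an option right away, a possible stopgap is to filter the returned facet values on the client side. The sketch below assumes the facet counts have already been read into a plain dict and that every label has exactly the form "Description - TypeLabel - OriginLabel" without the surrounding quotes; the counts and the third label are made up.

```python
def parse_label(label):
    """Split 'Description - Type - Origin' into its three parts.

    Splitting from the right keeps a description that itself contains ' - ' intact.
    Raises ValueError if the label has fewer than three parts.
    """
    description, type_label, origin_label = (
        part.strip() for part in label.rsplit(" - ", 2)
    )
    return description, type_label, origin_label


def filter_facets(facet_counts, type_label=None, origin_label=None):
    """Keep only the facet entries whose type and/or origin match."""
    kept = {}
    for label, count in facet_counts.items():
        _, found_type, found_origin = parse_label(label)
        if type_label is not None and found_type != type_label:
            continue
        if origin_label is not None and found_origin != origin_label:
            continue
        kept[label] = count
    return kept


# Hypothetical facet counts for Classes_asString.
facet_counts = {
    "Yellow orange - Fruit - California": 3,
    "Small broccoli - Vegetable - Florida": 2,
    "Green lettuce - Vegetable - California": 1,
}

print(filter_facets(facet_counts, origin_label="California"))
print(filter_facets(facet_counts, type_label="Vegetable"))
```

This only post-processes what Solr returns, so it does not reduce the work done on the server.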

How to query a specific document by id

From a previous query I already have the document ID (the uniqueKey in this schema is 'track_id') of the document I'm interested in.
Then I would like to query a sequence of words on that document while highlighting the match.
I can't seem to be able to combine the search parameters in a successful way (all my google searches return purple links :\ ), although I've already tried many combinations these past few days. I also know the field where the matches will be if that's any use in terms of improving match speed.
I'm guessing it should be something like this:
/select?q=track_id:{key_i_already_have} AND/&/{part_I_dont_know} word1 word2 word3
Currently, since I can't combine these two search parameters, I'm only querying the words and thus getting several results from several documents.
Thanks in advance.
Since Solr 4 you can use realtime get, which is much faster than searching the index by id.
http://localhost:8983/solr/get?ids=id1,id2,id3
For index updates to be visible (searchable), some kind of commit must reopen a searcher to a new point-in-time view of the index. The realtime get feature allows retrieval (by unique-key) of the latest version of any documents without the associated cost of reopening a searcher. This is primarily useful when using Solr as a NoSQL data store and not just a search index.
You could try applying a filter query on the id. It restricts your search to that one document; the main query then searches that document for the keywords, and with highlighting enabled (hl=true) the matches are highlighted.
Your query will look like:
/select?fq=track_id:DOC_ID&q=word1 word2 word3
Just make sure the "id" field in schema.xml is defined with the type string so that you can apply filter queries to it.
<field name="id" type="string" indexed="true" stored="true" required="true" />
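Putting the pieces together, a request of that shape could be assembled as in the Python sketch below. The field name `lyrics` and the document id are placeholders; `hl`, `hl.fl` and `df` are standard Solr parameters, but whether they suit this particular schema is an assumption.

```python
from urllib.parse import urlencode


def build_highlight_query(doc_id, words, text_field):
    """Build a /select query that searches one document and highlights matches."""
    params = {
        "q": " ".join(words),         # the words to look for
        "fq": f"track_id:{doc_id}",   # restrict the search to the known document
        "df": text_field,             # default field to search in
        "hl": "true",                 # turn highlighting on
        "hl.fl": text_field,          # field to generate highlight snippets from
    }
    return "/select?" + urlencode(params)


# "lyrics" and "abc123" are hypothetical values for illustration.
url = build_highlight_query("abc123", ["word1", "word2", "word3"], "lyrics")
print(url)
```

Because the id constraint sits in `fq` rather than in `q`, it narrows the result set without influencing the relevance score of the match.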

Is it possible to get related documents in a Solr search query?

I use the following query string to get a document indexed in Solr:
http://localhost:8080/solr/newsarchive/select/?q=ID:bbc-55950440dc8e5f1a550bd736214a1e7e&sort=Date%20desc&version=2.2&start=0&rows=10&indent=on&wt=json
Which returns the specified document of ID bbc-55950440dc8e5f1a550bd736214a1e7e.
My question is: Is there any way to make this query return a number of related document IDs?
There is a way to do this in Solr. It's called More Like This: https://wiki.apache.org/solr/MoreLikeThis
You pass Solr a query and the More Like This handler will return similar documents for each document the query you passed in would return. It determines similarity by looking at the terms in fields that you select and running a Lucene query using those terms.
The fields you select need, at a minimum, to be stored; preferably they should also be set up to store term vectors:
<field name="cat" ... termVectors="true" />
An example query (taken from the documentation):
http://localhost:8983/solr/select?q=apache&mlt=true&mlt.fl=manu,cat
In this case you are querying the index for the word "apache" and requesting a More Like This result set (mlt=true). You are asking Solr to base the similarity on the fields manu and cat. Solr will then look at the terms in those fields and perform a search on those fields using those terms to locate similar documents.
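Applied to the question's ID lookup, the request might be built like this in Python. The similarity fields `Title` and `Content` are guesses at what the newsarchive schema contains; `mlt.count` (the number of similar documents per result) is a standard More Like This parameter.

```python
from urllib.parse import urlencode


def build_mlt_query(base_url, doc_id, similarity_fields, count=5):
    """Build a select URL that returns one document plus similar documents."""
    params = {
        "q": f"ID:{doc_id}",
        "mlt": "true",                          # enable More Like This
        "mlt.fl": ",".join(similarity_fields),  # fields used to judge similarity
        "mlt.count": str(count),                # similar documents per result
        "fl": "ID",                             # only return the ID field
        "wt": "json",
    }
    return f"{base_url}?{urlencode(params)}"


# "Title" and "Content" are hypothetical field names.
url = build_mlt_query(
    "http://localhost:8080/solr/newsarchive/select",
    "bbc-55950440dc8e5f1a550bd736214a1e7e",
    ["Title", "Content"],
)
print(url)
```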
A few more articles/examples:
http://blog.brattland.no/node/18
https://cwiki.apache.org/confluence/display/solr/MoreLikeThis

Difference between Solr Facet Fields and Filter Queries

I am using SolrMeter to test the Apache Solr search engine. The difference between facet fields and filter queries is not clear to me. The SolrMeter tutorial lists this as an example of facet fields:
content
category
fileExtension
and this as an example of filter queries:
category:animal
category:vegetable
category:vegetable price:[0 TO 10]
category:vegetable price:[10 TO *]
I am having a hard time wrapping my head around it. Could somebody explain by example? Can I use SolrMeter without specifying either facets or filters?
Facet fields are used to get statistics about the returned documents - specifically, for each value of that field, how many returned documents have that value for that field. So for example, if you have 10 products matching a query for "soft rug" if you facet on "origin," you might get 6 documents for "Oklahoma" and 4 for "Texas." The facet field query will give you the numbers 6 and 4.
Filter queries on the other hand are used to filter the returned results by adding another constraint. The thing to remember is that the query when used in filtering results doesn't affect the scoring or relevancy of the documents. So for example, you might search your index for a product, but you only want to return results constrained by a geographic area or something.
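The distinction can be mimicked in a few lines of plain Python (a toy model, not Solr itself), using the "soft rug" numbers from above: faceting counts values over the result set, while a filter query shrinks the result set.

```python
from collections import Counter

# Ten hypothetical products matching the query "soft rug".
results = (
    [{"name": "soft rug", "origin": "Oklahoma"} for _ in range(6)]
    + [{"name": "soft rug", "origin": "Texas"} for _ in range(4)]
)


def facet(docs, field):
    """Facet field: count how many documents have each value of the field."""
    return Counter(doc[field] for doc in docs)


def filter_query(docs, field, value):
    """Filter query: keep only documents whose field has the given value."""
    return [doc for doc in docs if doc[field] == value]


print(facet(results, "origin"))                      # statistics about the results
print(len(filter_query(results, "origin", "Texas")))  # a narrower result set
```

A typical UI combines the two: it shows the facet counts as clickable links, and clicking one adds the corresponding filter query to the next request.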
A facet is a field (type) of the document, so category is the field. As Ansari said, facets are used to get statistics and provide grouping capabilities. You could apply grouping on the category field to show everything vegetable as one group.
Edit: The parts about searching inside of a specific field are wrong. It will not search inside of the field only. It should be 'adding a constraint to the search' instead.
Performing a filter query of category:vegetable will search for vegetable in the category field and no other fields of the document. It is used to search just specific fields rather than every field. Sometimes you know that the term you want only is in one field so you can search just that one field.

Solr: display some results first when they are part of the results

Consider this Solr pseudo-doc:
<doc>
<field name="title"/>
<field name="name"/>
<field name="keywords"/>
</doc>
Some docs will have the keyword "up", which means that they should appear first (regardless of their initial position) when, and only when, they are part of the search results.
So let's say I have:
doc1('title1','Bob, Alice','people, up, couple')
doc2('title2','Smart Phone, Laptop, Bob','devices, electronics')
If I query with "title:title2 name:Bob" then I should get doc1 first (it has the 'up' keyword).
If I query with "name:Bob" I still get doc1 first for the same reason.
If I query with "name:Laptop" then I should only get doc2 in my results. doc1 should not be included since it doesn't match my search query.
Any suggestions on how to do this?
You have several options to do something like that:
function query / boost query (in dismax handler)
during index time (boost documents)
extract the 'up' keyword to an additional field and sort by this field, then by score
For example (with dismax handler):
/select?defType=dismax&q=...&bq=keywords:"up"^1000
This can be solved with Solr's query-time boosting. Following the guidance in the Solr Relevancy FAQ, you could add an additional boosted search term to all queries, e.g. title:title2 name:Bob keywords:up^2
You could also determine at index time whether the up keyword is present for each document, store that in an additional field (a boolean, for example) in your schema, and boost the query results based on that field.
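The intended behaviour of the "extra field, then sort" option can be sketched in plain Python (a toy model, not Solr): only matching documents are returned, and among those, documents carrying the 'up' keyword come first. The matching logic here is deliberately simplistic and only looks at the name field.

```python
docs = [
    {"id": "doc1", "title": "title1", "name": ["Bob", "Alice"],
     "keywords": ["people", "up", "couple"]},
    {"id": "doc2", "title": "title2", "name": ["Smart Phone", "Laptop", "Bob"],
     "keywords": ["devices", "electronics"]},
]


def search_by_name(docs, name):
    """Return ids of documents whose name field contains `name`, 'up' docs first."""
    matches = [doc for doc in docs if name in doc["name"]]
    # False sorts before True, so documents that have "up" are placed first.
    # sorted() is stable, so the original order is kept within each group.
    ranked = sorted(matches, key=lambda doc: "up" not in doc["keywords"])
    return [doc["id"] for doc in ranked]


print(search_by_name(docs, "Bob"))     # doc1 is pinned ahead of doc2
print(search_by_name(docs, "Laptop"))  # doc1 does not match, so it is absent
```

A boost, by contrast, only raises the score of 'up' documents, so a sufficiently strong match elsewhere could in principle still outrank them unless the boost is very large.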

Resources