Solr Support for wildcard in facet.prefix - solr

I have a system which has huge number of facet values on Country name. So the countries can be USA, United State, Canada etc
Now I wanted facets to be custom sorted. By default solr supports either count based sorting or alphabetic sorting. However I did not wanted sorting in this manner. I wanted to have a custom sort such that also USA variations comets at a top, then europe, then asia and so on.
For this I have written a tokenizer which reads a text file and generates token like this
0001_usa
0002_united state
So basically I prefix my sort sequence and then sort on alphabetic order. I then remove the prefix while displaying on UI. So far it works great. Now since the number of facets are huge, I also want a search feature with auto suggest. So for example if a user types "u" I should be able to display all countries starting with "u" in the type ahead. I was using facet.prefix earlier for this but after my custom token it would not work since I prefix 000x to the token. Also facet.prefix does not seem to support wild card. So how can I implement this type ahead? Any other way to support custom sorting in Solr. I do not want to get all the data on client and sort since its huge.
Please help

You easily achieve this by indexing the country names in an additional field, with the right handlers for autosuggest.
You could have something like country_sort where you put your prefixed values like before (001_usa, 0002_united state) and a country_autosugest field where you put the plain values (usa, united states).
Then query on country_autosugest and sort on country_sort. This way you can also return the value of country_autosuggest, no need to process the string at display time.

Related

solr search with whitespaces and without whitespaces

I want to search products in the document with whitespaces and without whitespaces like "base ball", "baseball"
if someone searches for "baseball" the result should fetch the records of "baseball" & "base ball"
I am not able to that, also i do not want to use "synonyms" for that.
I have used filter class "WordDelimiterFilterFactory" to get that results i use keywords like sunglass for sun glass, keychain for key chain in synonyms files.
but there will be much more words like this so it's been difficult to find such words whose meaning is same even after split.
so I am looking for the solution where I don't have to use synonyms to get the desired result
I've tried by setting catenateWords='1' to get that result but it also did not match the result.
This is not possible without adding the synonyms. You should add the base ball as a synonyms to baseball.
The WordDelimiterFilterFactory is depricated.
Even if you use WordDelimiterGraphFilterFactory its not possible.
generateWordParts : It spilts the words at camelcase like BaseBall...but its not the case for you.
catenateWords : It also wont work in your case as your word is not having any special char or hyphen separated to join. e.g wi-fi will get wifi.
So either you data should have the separate words to be indexed. It means if you dont want to use synonyms then you have to push baseball and base ball for indexing then only you will be able perform search on these words.

Solr multilingual stemisation

I'm using Solr to index documents like .pdf or .docx. These documents are in french or in english and I want to use the stemisation for both languages.
For exemple, if I search "chevaux" I want to find "cheval" (french) and if I search "raise" I want to find "raising" (english).
Is there a way to do this without createting 2 core (one in english and one in french) ?
Have two fields, one with the field definition you want for French, and one with the field definition you want for English. Then use the Language Detection feature to submit the content to the correct field.
When searching, query the field that has the correct language as the user, or if you don't know, search both - or use language detection to try to do a better guess.
You can also index the same content into both fields, but my initial guess is that it'll give you weird results down the road, where someone enters a French word, but due to the processing rules for English, you get hit that wouldn't have happened if you only indexed to the correct field.
By enabling langid.map, you can tell Solr to index the content into fields named fieldname_langcode (where fieldname is picked up from langid.fl).
langid.map: Enables field name mapping. If true, Solr will map field names for all fields listed in langid.fl.
You can use langid.map.replace or langid.map.pattern if you want to change the default fieldname_langcode naming, but I'd leave those alone for now.

SOLR custom similarity for locations

I'd like to store in SOLR some items with addresses (City, State, ...) and I'd like to change how similarity is computed. The thing is that when comparing for example city I'm only interested if they are same and not if those strings are similar. Is there a way how to that? Is it through the custom similarity?
If so, can somebody please point me to how it can be done in Solr 6.2?
Thank you very much.
If you're only interested if something matches exactly, use a StrField (a StrField is case sensitive, so the case has to match as well). As you're only getting exact matches, the scoring will be the same for all documents.
The only time you need to implement a custom similarity class is if you want to score documents in a different way than what the built in similarities (or function queries) allow.
Matching exactly would be a regular query: city:Frankfurt. As long as the field is a StrField, only documents with exactly Frankfurt in that field will be returned (and unless you've added an index time boost for one of them, they'll all score identical).
Also, if you're sorting by a field (such as city), any score calculation will be thrown out.

Solr - How do I get the number of documents for each field containing the search term within that field in Solr?

Imagine an index like the following:
id partno name description
1 1000.001 Apple iPod iPod by Apple
2 1000.123 Apple iPhone The iPhone
When the user searches for "Apple" both documents would be returned. Now I'd like to give the user the possibility to narrow down the results by limiting the search to one or more fields that have documents containing the term "Apple" within those fields.
So, ideally, the user would see something like this in the filter section of the ui after his first query:
Filter by field
name (2)
description (1)
When the user applies the filter for field "description", only documents which contain the term "Apple" within the field "description" would be returned. So the result set of that second request would be the iPod document only. For that I'd use a query like ?q=Apple&qf=description (I'm using the Extended DisMax Query Parser)
How can I accomplish that with Solr?
I already experimented with faceting, grouping and highlighting components, but did not really come to a decent solution to this.
[Update]
Just to make that clear again: The main problem here is to get the information needed for displaying the "Filter by field" section. This includes the names of the fields and the hits per field. Sending a second request with one of those filters applied already works.
Solr just plain Doesn't Do This. If you absolutely need it, I'd try it the multiple requests solution and benchmark it -- solr tends to be a lot faster than what people put in front of it, so an couple few requests might not be that big of a deal.
you could achieve this with two different search requests/queries:
name:apple -> 2 hits
description:apple -> 1 hit
EDIT:
You also could implement your own SearchComponent that executes multiple queries in the background and put it in the SearchHandler processing chain so you only will need a single query in the frontend.
if you want the term to be searched over the same fields every time, you have 2 options not breaking the "single query" requirement:
1) copyField: you group at index time all the fields that should match togheter. With just one copyfield your problem doesn't exist, if you need more than one, you're at the same spot.
2) you could filter the query each time dynamically adding the "fq" parameter at the end
http://<your_url_and_stuff>/?q=Apple&fq=name:Apple ...
this works if you'll be searching always on the same two fields (or you can setup them before querying) otherwise you'll always need at least a second query
Since i said "you have 2 options" but you actually have 3 (and i rushed my answer), here's the third:
3) the dismax plugin described by them like this:
The DisMaxQParserPlugin is designed to process simple user entered phrases
(without heavy syntax) and search for the individual words across several fields
using different weighting (boosts) based on the significance of each field.
so, if you can use it, you may want to give it a look and start from the qf parameters (that is what the option number 2 wanted to be about, but i changed it in favor of fq... don't ask me why...)
SolrFaceting should solve your problem.
Have a look at the Examples.
This can be achieved with Solr faceting, but it's not neat. For example, I can issue this query:
/select?q=*:*&rows=0&facet=true&facet.query=title:donkey&facet.query=text:donkey&wt=json
to find the number of documents containing donkey in the title and text fields. I may get this response:
{
"responseHeader":{"status":0,"QTime":1,"params":{"facet":"true","facet.query":["title:donkey","text:donkey"],"q":"*:*","wt":"json","rows":"0"}},
"response":{"numFound":3365840,"start":0,"docs":[]},
"facet_counts":{
"facet_queries":{
"title:donkey":127,
"text:donkey":4108
},
"facet_fields":{},
"facet_dates":{},
"facet_ranges":{}
}
}
Since you also want the documents back for the field-disjunctive query, something like the following works:
/select?q=donkey&defType=edismax&qf=text+titlle&rows=10&facet=true&facet.query=title:donkey&facet.query=text:donkey&wt=json

Solr: where to store additional information?

I want to provide additional information per each indexed document during index time.
And access this information in the same analyzer during query time to compare it.
So. Theoretically it would be great to write this value into some field present in this document and at query time search this field also.
f.e. I have an animals db. I want to find all documents with 3 words 'dog' inside. (just an example). I can setup for my "animals" field my custom BaseTokenFilterFactory which will produce my custom TokenFilter which will just count all 'dog' words and store this number somewhere. So. Where I can store this value to access it at searching time?
Your example sounds like something which will be better suited to be handled by custom Similarity or a query function in Solr and not as a custom analyzer.
For example if using Solr 4.0 you can use the function termfreq(field,term) to order by the number of times dog appears. or you can use it as a filter like so:
fq={!frange l=3 u=100000}termfreq(animals,"dog")
This will filter all documents whose animals field doesn't have at least 3 occurrences of the word dog.
The advantage of using this method is that you don't affect the scoring of the documents only filter them.
The ability to filter by function exists since Solr 1.4 so even if you are using an earlier version of Solr (>1.4) you can easily write the "termfreq" function query yourself

Resources