Solr search on specific field gives weird results - solr

Solr experts -
I have two records in my solr database:
{
"keywords":["jaime kelly jkelly natixis sales and marketing manager"],
"job_role":"natixis sales & marketing manager",
"empl_name":["jaime kelly jkelly"],
},
{
"keywords":["schwayb jackson sjackson"],
"job_role":"portfolio manager",
"empl_name":["schwayb jackson sjackson"],
}
When I search on the field empl_name with the query:
empl_name:schwayb natixis
the first record returned is jaime kelly instead of schwayb jackson. This is weird. I am explicitly searching the field empl_name and among the two records the second one is the closer match. Why does Solr not order it correctly?
Looks like Solr sees the string "natixis" in the job_role and keywords of the earlier record and is giving it more preference. But I want solr to ONLY look at empl_name and no other field. How can I achieve this?

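You need to group the query terms in parentheses, as in empl_name:(schwayb natixis). Otherwise the query is parsed as empl_name:schwayb OR natixis, so natixis is searched in the default field rather than in empl_name.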
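As a minimal sketch of the grouping described in this answer, a client could build the field-scoped query string like this. The helper name field_query is hypothetical, and the sketch assumes the terms contain no characters that need escaping in the Lucene query syntax.

```python
def field_query(field, terms):
    """Scope every term to `field` by grouping the terms in parentheses."""
    return f"{field}:({' '.join(terms)})"


# Without the parentheses only the first term is tied to empl_name.
grouped = field_query("empl_name", ["schwayb", "natixis"])
print(grouped)  # empl_name:(schwayb natixis)
```

The resulting string can be passed as the q parameter of the select request.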

Related

SOLR : Matching Street Types

I am doing address matching in SOLR and for the most part it is working fine. I have a situation where I would like to return the same value for the following two cases:
10 SMITH COURT REDBANK PLAINS QLD
10 SMITH CT REDBANK PLAINS QLD
The street type abbreviation CT = COURT.
One option I have tried is to have both the records in SOLR, but that just leads to duplication of a lot of data. I have ~30 million records, but these could be halved if there is a way in SOLR to match as explained above.
Any suggestions how to handle this issue?
Synonyms allow users to find documents through multiple terms that might not have been used in the original document definition.
You can try using solr.SynonymGraphFilterFactory
For more details on the synonym filter please refer to the documentation.
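To illustrate the synonym approach, here is a small sketch that renders entries for a synonyms file from a mapping of abbreviations. Only CT = COURT comes from the question; the other pairs, the helper name synonym_lines, and the idea of generating the file from Python are assumptions. Each comma-separated line declares the listed terms as equivalent, which is the format the synonym filters read. How case is handled depends on the rest of the field's analysis chain.

```python
def synonym_lines(abbreviations):
    """Render one 'ABBR, FULL' equivalence line per street type."""
    return [f"{abbr}, {full}" for abbr, full in sorted(abbreviations.items())]


# CT/COURT is from the question; ST and RD are hypothetical additions.
street_types = {"CT": "COURT", "ST": "STREET", "RD": "ROAD"}
for line in synonym_lines(street_types):
    print(line)
```

With such a file in place, only one form of each address needs to be indexed.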

Solr: Newly Observed facets

I have two fields in my solr index data: "userName" and "startTimeISO" along with many other fields.
Now I want to query for all the "userNames" that were seen TODAY but not seen in the last 30 days.
Basically, I am trying to find out Newly Observed UserNames for today.
Now the Solr Facet query I am running is:
facet.pivot: "userName,startTimeISO",
fq: " NOT startTimeISO:["2014-12-20T00:00:00.000Z" TO "2015-01-18T00:00:00.000Z"] AND startTimeISO:["2015-01-19T00:00:00.000Z" TO "2015-01-20T00:00:00.000Z"]"
But I am for some reason getting incorrect results.
For example, I see userName: "bla" in the results of the above query.
If I run the same query for tomorrow, I again see "bla" in my facet results.
I am somehow not able to get the logic right. Perhaps I am not using all the tools provided by Solr, some of which I may be unaware of?
Can someone help me here? I don't mind testing all of your suggestions and going back and forth with different ideas.
In the meanwhile I am looking online to see if there is some other way to facet.
Update:
SOLUTION:
In case your data looks like:
"id": "1",
"userName": "one",
"startTimeISO": "2015-01-20T17:24:32.888Z"
"id": "2",
"userName": "one",
"startTimeISO": "2015-01-16T17:24:50.208Z"
"id": "3",
"userName": "two",
"startTimeISO": "2015-01-20T17:25:06.109Z"
You could use the below query combination:
q=*:*
fq=startTimeISO:[NOW-1DAY TO NOW] // this will give you all the users that were seen today
fq=-_query_:"{!join from=userName to=userName}startTimeISO:[NOW-30DAYS TO NOW-1DAYS]" // don't include those documents that have others with the same name that were seen during the last 30 days
Thanks to Alvaro Cabrerizo for helping me out.
Here is the link to the same question on Solr mailing list:
http://lucene.472066.n3.nabble.com/Newly-observed-Facets-td4180538.html
There isn't one query that will do what you want. Your best bet is to first query for the user names seen today (a smaller set than all those seen in the last 30 days) and return that list to your client. A typical 'fq' restricted to the last day selects those documents; combine it with facet.field=userName, facet.mincount=1, and a facet.limit that is unfortunately high, such as 1000000. Once you have this list on your client, submit a second, large query to Solr that facets on the userName field again, with one filter query covering the preceding 29 days (not including today) and an additional filter query matching just the user names found by the first query. Ideally the user name filter would use the 'terms' QParser in Solr 4.10, but it's not essential. When this second query returns, it shows which of the user names seen today were also seen in those 29 days. With that information, you can subtract the second set of names from the first and you have the user names newly seen today.
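The client-side part of this two-query approach can be sketched as follows. The function names are hypothetical, and the sketch assumes the facet counts arrive in Solr's default JSON layout for facet fields, a flat list that alternates value and count.

```python
def facet_names(flat_facet):
    """Extract the names with a positive count from [name, count, name, count, ...]."""
    names = flat_facet[0::2]
    counts = flat_facet[1::2]
    return {name for name, count in zip(names, counts) if count > 0}


def newly_observed(seen_today, seen_previously):
    """User names seen today that were not seen in the earlier window."""
    return set(seen_today) - set(seen_previously)


# Mirrors the sample data in the question: "one" appears on both
# 2015-01-16 and 2015-01-20, "two" only on 2015-01-20.
today = facet_names(["one", 1, "two", 1])
earlier = facet_names(["one", 1])
print(newly_observed(today, earlier))  # {'two'}
```

The set difference is the only step that has to happen outside Solr; both name lists come from ordinary facet requests.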

How do I create a Solr query that returns results even if one field in my query has no matches?

Suppose I want to create a recommendation system to suggest people you should connect with based off of certain attributes that I know about you and attributes I have about other people that are stored in a Solr index. Is it possible to query the index with a list of attributes (along with boosts for each attribute) and have Solr return scored results even if some of my fields return no matches? The way that I understand that Solr works is that if one of your fields doesn't contain a match in any documents found in your index, you get zero results for the entire query (even if other fields in the query matched) - is that right? What I would hope is that I could query the index and get a list of results back in order of a score given based on how many (and which) fields matched to something, even if some fields have no matches, for example:
Say that there are 2 people documents stored in the index as follows (figuratively):
Person 1:
Industry: Manufacturing
City: Oakland
Person 2:
Industry: Manufacturing
City: San Jose
And say that I perform a pseudo-Solr query that basically says "Search for everyone whose industry is equal to manufacturing and whose city is equal to Oakland". What I would like is to receive both results back in the result set, even though one of the "Persons" does not reside in Oakland. I just want that person to come back as a result with a lower score than Person1. Is this possible? What might a solr query look like to handle this? Assume that I have many more than 2 attributes for each person (so saying that I can use "And" and "Or" in my solr query isn't really feasible.. or is it?) Thanks in advance for your helpful input! (PS I'm using Solr 3.6)
You mention using the AND operator, which is likely your problem.
The default behavior of Lucene, and Solr, query syntax is exactly what you are asking for. A query like:
industry:manufacturing city:oakland
Will match either, with scoring preference on those that match both. See the Lucene query syntax documentation.
You can also use the bq parameter (boost query), which does not affect matching and only affects the scores:
http://localhost:8983/solr/persons/select?q=industry:manufacturing&bq=City:Oakland^2
Play with the boosting factor at the end to get the right balance between the matching score and the boosting score.
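A request like the one above contains characters such as ':' and '^' that are safer to URL-encode when sent from code. The following sketch builds the same request with the standard library; the helper name boosted_select_url is hypothetical, the base URL is taken from the example, and since bq is a DisMax/eDisMax parameter the request handler is assumed to use one of those parsers.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def boosted_select_url(base, q, bq):
    """Build a select URL whose bq clause only influences scoring."""
    params = {"q": q, "bq": bq}
    return f"{base}/select?{urlencode(params)}"


url = boosted_select_url(
    "http://localhost:8983/solr/persons",
    "industry:manufacturing",
    "City:Oakland^2",
)
# Decoding the query string gives back the original parameter values.
print(parse_qs(urlsplit(url).query))
```

Changing the number after '^' is then a one-argument change when tuning the boost.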

Solr - How do I get the number of documents for each field containing the search term within that field in Solr?

Imagine an index like the following:
id | partno   | name         | description
1  | 1000.001 | Apple iPod   | iPod by Apple
2  | 1000.123 | Apple iPhone | The iPhone
When the user searches for "Apple" both documents would be returned. Now I'd like to give the user the possibility to narrow down the results by limiting the search to one or more fields that have documents containing the term "Apple" within those fields.
So, ideally, the user would see something like this in the filter section of the ui after his first query:
Filter by field
name (2)
description (1)
When the user applies the filter for field "description", only documents which contain the term "Apple" within the field "description" would be returned. So the result set of that second request would be the iPod document only. For that I'd use a query like ?q=Apple&qf=description (I'm using the Extended DisMax Query Parser)
How can I accomplish that with Solr?
I already experimented with faceting, grouping and highlighting components, but did not really come to a decent solution to this.
[Update]
Just to make that clear again: The main problem here is to get the information needed for displaying the "Filter by field" section. This includes the names of the fields and the hits per field. Sending a second request with one of those filters applied already works.
Solr just plain doesn't do this. If you absolutely need it, I'd try the multiple-requests solution and benchmark it. Solr tends to be a lot faster than what people put in front of it, so a few extra requests might not be that big of a deal.
You could achieve this with two different search requests/queries:
name:apple -> 2 hits
description:apple -> 1 hit
EDIT:
You also could implement your own SearchComponent that executes multiple queries in the background and put it in the SearchHandler processing chain so you only will need a single query in the frontend.
If you want the term to be searched over the same fields every time, you have 2 options that do not break the "single query" requirement:
1) copyField: at index time, you group all the fields that should match together. With just one copyField your problem doesn't exist; if you need more than one, you're back at the same spot.
2) You could filter the query each time by dynamically adding the "fq" parameter at the end:
http://<your_url_and_stuff>/?q=Apple&fq=name:Apple ...
This works if you'll always be searching on the same two fields (or you can set them up before querying); otherwise you'll always need at least a second query.
Since I said "you have 2 options" but you actually have 3 (and I rushed my answer), here's the third:
3) The DisMax plugin, described in its documentation like this:
The DisMaxQParserPlugin is designed to process simple user entered phrases
(without heavy syntax) and search for the individual words across several fields
using different weighting (boosts) based on the significance of each field.
So, if you can use it, you may want to give it a look and start from the qf parameter (which is what option number 2 was meant to be about, but I changed it in favor of fq... don't ask me why...).
SolrFaceting should solve your problem.
Have a look at the Examples.
This can be achieved with Solr faceting, but it's not neat. For example, I can issue this query:
/select?q=*:*&rows=0&facet=true&facet.query=title:donkey&facet.query=text:donkey&wt=json
to find the number of documents containing donkey in the title and text fields. I may get this response:
{
"responseHeader":{"status":0,"QTime":1,"params":{"facet":"true","facet.query":["title:donkey","text:donkey"],"q":"*:*","wt":"json","rows":"0"}},
"response":{"numFound":3365840,"start":0,"docs":[]},
"facet_counts":{
"facet_queries":{
"title:donkey":127,
"text:donkey":4108
},
"facet_fields":{},
"facet_dates":{},
"facet_ranges":{}
}
}
Since you also want the documents back for the field-disjunctive query, something like the following works:
/select?q=donkey&defType=edismax&qf=text+title&rows=10&facet=true&facet.query=title:donkey&facet.query=text:donkey&wt=json
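As a rough sketch of how a client might use this, the code below generates one facet.query per field and then reads the per-field counts out of a response shaped like the one shown above. The function names are hypothetical, and the search term is assumed to need no escaping.

```python
from urllib.parse import urlencode


def field_count_params(term, fields):
    """Request parameters: an edismax search plus one facet.query per field."""
    params = [
        ("q", term),
        ("defType", "edismax"),
        ("qf", " ".join(fields)),
        ("rows", "10"),
        ("facet", "true"),
        ("wt", "json"),
    ]
    params += [("facet.query", f"{field}:{term}") for field in fields]
    return params


def counts_by_field(response, term, fields):
    """Map each field to its hit count from the facet_queries section."""
    facet_queries = response["facet_counts"]["facet_queries"]
    return {field: facet_queries.get(f"{field}:{term}", 0) for field in fields}


query_string = urlencode(field_count_params("donkey", ["title", "text"]))
sample = {"facet_counts": {"facet_queries": {"title:donkey": 127, "text:donkey": 4108}}}
print(counts_by_field(sample, "donkey", ["title", "text"]))
```

The resulting dictionary holds what the "Filter by field" section needs: the field names and the number of hits per field.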

Returning documents using multi-valued field

I'm quite new to Solr and I'm supporting an existing Solr search engine which was written by someone else. I've been reading on Solr for the last couple of weeks so I'd consider myself beyond the basics.
A particular field, let's say name, is multi-valued. For example, a document has a field "name" with values "Alice, Trudy". We want the document to be returned when "Alice" or "Trudy" is entered, but not when "Alice Trudy" is entered. Currently the document is returned even for "Alice Trudy". How could this be done?
Thanks a lot!
Krt_Malta
If the field value is "Alice, Trudy", normally solr/lucene should match for "alice" or "trudy". If not, there could be special "Text Analysis" or stemming options active for this field.
Take a look at the part "text analysis" at the solr documentation: http://lucene.apache.org/solr/tutorial.html#Text+Analysis
and: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
