Solr: How to implement this logic?

I have the following scenario. Suppose I have a big table like this:
Id (unique) | returnMe | desc | name | value
1 | user1 | all those living in usa | country | USA
2 | user2 | all those like game | game | football
3 | user1 | my hobbies are | hobby | guitar
Now, how can I get results (returnMe) for the following queries?
1. For all those users who live in usa AND like guitar
2. For all those users who live in usa OR like guitar
Please do not modify the query in any way.
In my solrconfig.xml, 'desc', 'name' and 'value' are searchable, indexed fields.
Thanks for any help.
I am editing this to explain my logic:
Step 1: Break the query on AND, e.g. (live in USA) AND (like guitar).
Step 2: Select the returnMes from the first sub-query and the returnMes from the second sub-query.
Step 3: Take the returnMes common to both result sets.
Is there any way Solr can do that? Can it be done through a Solr "join", or in some other way?
I do not want to do that in my PHP code; it would be a massive overhead.

You are going to need to modify the query in some way. A simple step that parses the query and adds parentheses to it, and possibly the field names to search, should be enough. You could reasonably easily transform those queries into something like:
(For all those users who live in usa) AND (like guitar)
(For all those users who live in usa) OR (like guitar)
or perhaps you can cut out "for all those users who" and have simply:
(live in usa) AND (like guitar)
(live in usa) OR (like guitar)
Then set the query field to value. Of course, you could run into issues if you had a document with value=users, or something of that nature, since the query will search for each term present in the value field.
If you really want to be able to work with natural language, then you can take a look at the OpenNLP project.
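The parenthesizing step from this answer could be sketched roughly as follows. This is only an illustration in Python (the question mentions PHP, so it would need porting); the function name parenthesize, the filler-phrase pattern and the value field prefix are assumptions made for the example, and it only rewrites the query string. It does not by itself intersect returnMe values across documents.

```python
import re

# Assumed filler phrase, taken from the example queries in the question.
FILLER = re.compile(r"^\s*for all those users who\s+", re.IGNORECASE)


def parenthesize(user_query, field="value"):
    """Split on upper-case AND/OR and wrap each clause as field:(clause)."""
    text = FILLER.sub("", user_query.strip().rstrip("."))
    # With a capture group, re.split keeps the operators at odd indexes.
    parts = re.split(r"\s+(AND|OR)\s+", text)
    rewritten = []
    for index, part in enumerate(parts):
        if index % 2:  # an operator, kept unchanged
            rewritten.append(part)
        else:  # a clause, scoped to the assumed field
            rewritten.append(f"{field}:({part})")
    return " ".join(rewritten)


print(parenthesize("For all those users who live in usa AND like guitar"))
```

The result can be sent as the q parameter. Lower-case "and"/"or" are deliberately left alone so that ordinary words in the text are not mistaken for operators.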

Related

Solr dynamic sorting

We have a website on which you can search through a large number of products from different shops. Say we have 5 products per result page and the 10 best matches for a search all have the same score. 8 of the products are from one shop (A), and the other two are from two other shops (B, C).
What we often get is this (each letter indicates a product of that shop):
A
A
A
A
A
---- second result page ----
A
B
A
C
A
but what we want to get is something like this:
A
C
B
A
A
---- second result page ----
A
A
A
A
A
Writing a function query seems to be one option:
http://www.solrtutorial.com/custom-solr-functionquery.html
What is the best way to achieve this?
You could group the results by shop using Field Collapsing and display the result either as a group or flattened list (depending on how you want it).
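As a rough sketch of what a field-collapsing request could look like, here is a small Python helper that builds the query string. The group, group.field, group.limit and group.main parameters are Solr's result grouping parameters; the base URL, the field name shop and the helper name are assumptions for illustration. Whether this yields exactly the page layout wanted in the question would need testing.

```python
from urllib.parse import urlencode


def grouped_query_url(base_url, user_query, shop_field="shop", per_shop=2):
    """Build a select URL that collapses results on the (assumed) shop field."""
    params = {
        "q": user_query,
        "group": "true",            # turn on result grouping
        "group.field": shop_field,  # one group per shop
        "group.limit": per_shop,    # at most this many products per shop
        "group.main": "true",       # return a flattened list instead of groups
        "wt": "json",
    }
    return f"{base_url}?{urlencode(params)}"


print(grouped_query_url("http://localhost:8983/solr/select", "television"))
```

Dropping group.main returns the results nested per shop instead, which suits a grouped display.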
Another trick that I've seen used to help users see results from multiple groups is facets. You could have a sidebar (or something similar) that does two things:
By default, it lets the user know that there are other filter criteria (e.g. shops) in the result. This helps a lot when the result is paginated.
With facets present, it is up to the user to choose whatever criteria they wish to apply, which relieves you of implementing heavy scenario-based logic.
Read more about faceting here.
Edit:
If you have to use custom sort logic, you could express it using function queries and use it in the sort parameter when querying Solr. Here is the reference from the docs.

Solr - How do I get the number of documents for each field containing the search term within that field in Solr?

Imagine an index like the following:
id partno name description
1 1000.001 Apple iPod iPod by Apple
2 1000.123 Apple iPhone The iPhone
When the user searches for "Apple" both documents would be returned. Now I'd like to give the user the possibility to narrow down the results by limiting the search to one or more fields that have documents containing the term "Apple" within those fields.
So, ideally, the user would see something like this in the filter section of the ui after his first query:
Filter by field
name (2)
description (1)
When the user applies the filter for field "description", only documents which contain the term "Apple" within the field "description" would be returned. So the result set of that second request would be the iPod document only. For that I'd use a query like ?q=Apple&qf=description (I'm using the Extended DisMax Query Parser)
How can I accomplish that with Solr?
I already experimented with faceting, grouping and highlighting components, but did not really come to a decent solution to this.
[Update]
Just to make that clear again: The main problem here is to get the information needed for displaying the "Filter by field" section. This includes the names of the fields and the hits per field. Sending a second request with one of those filters applied already works.
Solr just plain doesn't do this. If you absolutely need it, I'd try the multiple-requests solution and benchmark it. Solr tends to be a lot faster than what people put in front of it, so a few extra requests might not be that big of a deal.
You could achieve this with two different search requests/queries:
name:apple -> 2 hits
description:apple -> 1 hit
EDIT:
You could also implement your own SearchComponent that executes multiple queries in the background and put it in the SearchHandler processing chain, so you only need a single query in the frontend.
If you want the term to be searched over the same fields every time, you have 2 options that do not break the "single query" requirement:
1) copyField: at index time, you group all the fields that should match together. With just one copyField your problem doesn't exist; if you need more than one, you're back at the same spot.
2) You could filter the query each time by dynamically adding the "fq" parameter at the end:
http://<your_url_and_stuff>/?q=Apple&fq=name:Apple ...
This works if you will always be searching on the same two fields (or if you can set them up before querying); otherwise you'll always need at least a second query.
Since I said "you have 2 options" but you actually have 3 (and I rushed my answer), here's the third:
3) The DisMax plugin, which the docs describe like this:
The DisMaxQParserPlugin is designed to process simple user entered phrases
(without heavy syntax) and search for the individual words across several fields
using different weighting (boosts) based on the significance of each field.
So, if you can use it, you may want to give it a look and start from the qf parameter (that is what option number 2 was meant to be about, but I changed it in favor of fq... don't ask me why).
SolrFaceting should solve your problem.
Have a look at the Examples.
This can be achieved with Solr faceting, but it's not neat. For example, I can issue this query:
/select?q=*:*&rows=0&facet=true&facet.query=title:donkey&facet.query=text:donkey&wt=json
to find the number of documents containing donkey in the title and text fields. I may get this response:
{
  "responseHeader":{"status":0,"QTime":1,"params":{"facet":"true","facet.query":["title:donkey","text:donkey"],"q":"*:*","wt":"json","rows":"0"}},
  "response":{"numFound":3365840,"start":0,"docs":[]},
  "facet_counts":{
    "facet_queries":{
      "title:donkey":127,
      "text:donkey":4108
    },
    "facet_fields":{},
    "facet_dates":{},
    "facet_ranges":{}
  }
}
Since you also want the documents back for the field-disjunctive query, something like the following works:
/select?q=donkey&defType=edismax&qf=text+title&rows=10&facet=true&facet.query=title:donkey&facet.query=text:donkey&wt=json
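To generalize the request above to an arbitrary list of fields, something like the following Python sketch could build the parameters and turn the facet_queries part of the response into the "Filter by field" section. The helper names are made up for this example, and it assumes a single-word search term; a multi-word or special-character term would need quoting or escaping before being placed in facet.query.

```python
from urllib.parse import urlencode


def field_count_query(term, fields):
    """Query string asking for documents plus one facet.query per field."""
    params = [
        ("q", term),
        ("defType", "edismax"),
        ("qf", " ".join(fields)),
        ("rows", "10"),
        ("facet", "true"),
        ("wt", "json"),
    ]
    # One facet.query per field; a list of pairs allows the repeated key.
    params += [("facet.query", f"{field}:{term}") for field in fields]
    return urlencode(params)


def filter_section(response, min_count=1):
    """Map field name to hit count from a parsed JSON response."""
    counts = response["facet_counts"]["facet_queries"]
    return {
        key.split(":", 1)[0]: count
        for key, count in counts.items()
        if count >= min_count
    }


print(field_count_query("donkey", ["title", "text"]))
```

Fields whose count falls below min_count are dropped, so the filter section only lists fields that actually contain the term.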

Solr End User Query Translation

I am wondering if there is any way to transform an end-user query into a more complicated Solr query based on some rules.
For example, if the user types in 32" television, then I want to use the dismax query parser to let solr take care of this user query string like below:
http://localhost:8983/solr/select/?q=32" television&defType=dismax
However, if the user types in "televisions on sale", then I want to do a regular search for the token televisions with the isOnSale flag set to true, like below:
http://localhost:8983/solr/select/?q=name:televisions AND isOnSale:true
Is this possible? Or does this logic require an advanced search form where the user can explicitly tick a checkbox to say that they only want on-sale items?
Thanks.
Transforming the user query is quite possible. You can do it in the following two ways:
Implement a Servlet Filter that intercepts the user query and transforms it before dispatching it to the Solr request handler.
Look at the query parser plugins in Solr and implement one based on an existing parser, such as the standard query parser, modified to apply your transformation rules.
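Whichever of the two places the rule ends up in, the transformation itself can be quite small. Here is a hedged sketch in Python of the single rule from the question, applied before the request reaches Solr; the function name translate and the regular expression are assumptions for illustration, while name, isOnSale and defType=dismax come from the question.

```python
import re

# Assumed rule: a trailing "on sale" means "restrict to on-sale items".
ON_SALE = re.compile(r"\s+on\s+sale\s*$", re.IGNORECASE)


def translate(user_query):
    """Return Solr request parameters for a raw end-user query."""
    match = ON_SALE.search(user_query)
    if match:
        product = user_query[:match.start()].strip()
        # Parentheses keep a multi-word product inside the name field.
        return {"q": f"name:({product}) AND isOnSale:true"}
    # No rule matched: let the dismax parser handle the raw text.
    return {"q": user_query, "defType": "dismax"}


print(translate("televisions on sale"))
print(translate('32" television'))
```

More rules could be added as further pattern/handler pairs tried in order, with the dismax fallback last.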
Let the search happen through the whole index and let the user choose. If a review shows up, render it with the appropriate view. If a product shows up, offer to search for more products.
Samsung 32 in reviews --read more
LG 32 in offers --find more like this
Your offers page can offer more options, such as filtering products on sale.
You may use a global boost field on documents. For example, a product on sale has a score of 1.0, while out-of-stock products have 0.33. A review of a new product has 1.0; reviews of old products have less.
Maybe you can set up the search so that, whatever someone searches for, isOnSale is a secondary sort parameter. So by default sort by score and then by isOnSale, or just sort by isOnSale. That way you still get all "television" ads in the results, just with the ones on sale on top.

Solr complicated faceting

I have problems with faceting. Imagine this situation: a product can be in more than one category. This is the common behavior for faceting:
Category
Android (25)
iPhone (55)
other (25)
Now when I select "Android", I make a new query with "fq" => "category:Android" and I get:
Category
Android
iPhone (15)
other (2)
But this means that there are 15 products that are in both categories, "Android" AND "iPhone". I would like something like this instead ("Android" OR "iPhone"):
Category
Android
iPhone (+5)
other (+1)
Meaning I will get 25 results by selecting "Android (25)" and another 5 by selecting "iPhone (+5)", so in the end I will get 30 search results.
Does anyone know if this is possible with Solr's faceting? Or perhaps with more than one query, calculating it manually?
Thanks for advice!
Try a new query with the negative of the selections, like "fq" => "-category:Android" - you should then get the facet counts you are looking for.
Depending on all the permutations you need, you probably want to look into query facets, which let you get counts for arbitrary queries. For instance, you can do facet.query=category:("Android" OR "iPhone") and get a count keyed on category:("Android" OR "iPhone"). You can do this for any number of queries you want counts for. So, in your case, you can probably get to a final solution with some combination of straight field facets and query facets.
Edit: Re-reading your question, you may also want to look into tagging and excluding parts of an extra fq, depending on how you are allowing your users to "select into" the choices. (The example in the docs is fairly close to your original setup, although I'm not sure the end behavior is exactly what you want.)
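For the tagging and excluding idea, the request parameters might be assembled along these lines. This is a sketch in Python: the {!tag=...} and {!ex=...} local params are Solr's tag/exclude syntax, while the tag name cat, the helper name and the default field category are assumptions for the example.

```python
def multiselect_facet_params(user_query, selected, field="category"):
    """Parameters for an OR filter whose facet counts ignore that filter."""
    params = [
        ("q", user_query),
        ("facet", "true"),
        # Exclude the tagged filter when counting this facet field.
        ("facet.field", f"{{!ex=cat}}{field}"),
    ]
    if selected:
        values = " OR ".join(f'"{value}"' for value in selected)
        # Tag the filter so the facet above can exclude it.
        params.append(("fq", f"{{!tag=cat}}{field}:({values})"))
    return params


for name, value in multiselect_facet_params("*:*", ["Android", "iPhone"]):
    print(name, "=", value)
```

Note that the counts returned this way are the per-category totals as if the category filter were absent (iPhone (55) in the example), not the "+5" deltas. Getting the deltas would still need query facets or a second request.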

How can I find a city and country based on a user search?

I am trying to search a SQL Server 2008 table (containing about 7 million records) for cities and countries based on free-text user input. The search string that I get from the user can be anything like:
"Hotels in San Francisco, US" or "New York, NY" or "Paris sddgdfgxx" or "Toronto Canada". Terms are not always separated by commas, are not in a specific order, and there might be useless data.
This is what I tried:
Method 1: FTS with contains:
ex: select * from cityNames where contains(cityname,'word1 and word2') -- with AND
select * from cityNames where contains(cityname,'word1 or word2') -- with OR
This didn't work very well because a term like 'sddgdfgxx' returns nothing when combined with 'AND'. Using OR works for one-word cities like 'Paris' but not for 'San Diego' or 'San Francisco'.
Method 2: this is actually a reverse search. The logic is to check whether the user input string contains any of the cities or countries from my table. This way I'll know for sure that 'Aix en Provence' or 'New York' was searched for.
ex: select * from cityCountryNames where 'Ontario, Canada, Toronto' like cityCountryNames
Notes: I wasn't able to get results for two-word cities, and the query was slow.
Any help is appreciated.
I would strongly recommend using a 3rd-party API like the Google Geocoding API to take such input and parse it into a location with discrete parts (street address, city, state, country, etc.) Then you could use those discrete parts to search your database if necessary.
Map services like Google and Bing have solved this problem way better than you or I ever would, so why not leverage all the work they've done?
SQL isn't designed for the kinds of queries you are performing, certainly not at scale.
My recommendation would be as follows:
Index all your places (cities + countries) into a Solr index. Solr is a FOSS search server built on Lucene and can easily query a 7-million-record index in milliseconds.
Query Solr with the string the user typed, and the first match should be the best match.
So even if the user typed "Paris sddgdfgxx", Paris should be your first hit. If you want to get really sophisticated, use a word n-gram approach (known in Lucene as shingles).
Since Solr offers a RESTful (HTTP) API, it should integrate easily into whatever platform you are on.
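To make the shingle idea concrete, here is a small stand-alone Python sketch of the same principle, run against an in-memory set rather than Solr. It is not Lucene's shingle filter; the function names and the tiny known_places set are invented for illustration. It breaks the input into word n-grams, longest first, and keeps those that match a known place name, which is essentially the reverse search from Method 2.

```python
def shingles(text, max_words=3):
    """Yield word n-grams of the input, longest first."""
    words = [word.strip(",.").lower() for word in text.split()]
    words = [word for word in words if word]
    for size in range(max_words, 0, -1):
        for start in range(len(words) - size + 1):
            yield " ".join(words[start:start + size])


def find_places(user_input, known_places):
    """Return known place names that occur in the input as whole shingles."""
    found = []
    for candidate in shingles(user_input):
        if candidate in known_places and candidate not in found:
            found.append(candidate)
    return found


# Hypothetical sample data standing in for the 7-million-row table.
known_places = {"san francisco", "paris", "toronto", "canada", "us"}
print(find_places("Hotels in San Francisco, US", known_places))
print(find_places("Paris sddgdfgxx", known_places))
```

Because longer shingles are tried first, multi-word names such as "san francisco" match as a unit, and junk terms simply never match anything.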
