We noticed the following issue with 'categoryname' field search in WebSphere Commerce, and we are trying to understand whether it is a data setup issue or whether Commerce Search/SOLR is simply not designed for this kind of scenario.
We have more than 100 catalogs that are site and customer specific. Customers get their own catalog, categories, and items when they log in, and there is no issue with category browsing or order placement. The problem is with OOB keyword search: the OOB IBM_findProductsBySearchTerm profile includes 'categoryname' as part of 'defaultSearch' when making SOLR calls, along with name, shortDesc, keyword, and a few other fields.
As a result, we see too many unwanted results that are not relevant to the search keyword, because a match is found in the category name(s) of some other customer's catalog. We do get correct results if I comment out the lines below in the wc-search.xml file, but this also prevents searching categories in the current catalog.
<_config:field name="defaultSearch"/>
<_config:field name="categoryname"/>
For example, the following categories match the 'candy' keyword but are not part of the current site and catalog (the site with catalog D). How do we prevent them from being scored during keyword search while still using categoryname search?
Rubys' candy -- in catalog A
Smith dairy stuff -- in catalog B
Kitchen Utensils -- in catalog C
Candy supplies -- in catalog E
Prep kits -- in catalog D, no items in this category contain the word 'candy'.
Basically, keyword search on the site with catalog D also returns items from the 'Prep kits' category, because categories in other catalogs contain the word 'candy'. In a nutshell, we get too many irrelevant results the moment the 'categoryname' field is used in wc-search.xml or in a direct SOLR query (qf=categoryname).
I believe the issue is that categoryname is indexed as wc_text and is multivalued, holding comma-separated data across all catalogs in the system.
What kind of customization needs to be done to fix this issue, so that the search would return relevant results?
Thanks
There is nothing OOB for this, since the categoryname index data has no catalog_id visibility. I solved the issue by adding a dynamic, multivalued categoryname_ field and using it to replace the existing categoryname qf in a custom ExpressionProvider class. This limits keyword searches to the categories of the current catalog(s) only and returns correct results.
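To illustrate the idea of swapping the global field for a catalog-scoped one, here is a minimal sketch in Python. It is not the actual ExpressionProvider code (that would be Java inside WebSphere Commerce). The helper name build_qf and the convention of appending the catalog id to the categoryname_ prefix are assumptions made for illustration only.

```python
def build_qf(base_fields, catalog_ids, prefix="categoryname_"):
    """Build a qf value that replaces the global 'categoryname' field
    with one dynamic field per catalog in scope.

    Assumption: the dynamic fields are named prefix + catalog id,
    e.g. categoryname_10001. The original post only gives the prefix.
    """
    # Drop the field that mixes category names from every catalog.
    fields = [f for f in base_fields if f != "categoryname"]
    # Add one catalog-scoped field per catalog the shopper can see.
    fields += [f"{prefix}{cid}" for cid in catalog_ids]
    return " ".join(fields)


if __name__ == "__main__":
    # 10001 is a made-up catalog id.
    print(build_qf(["name", "shortDesc", "keyword", "categoryname"], [10001]))
```

With this shape, a category name from another customer's catalog can no longer contribute to the score, because its text lives in a field that is never listed in qf for the current session.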
I need to implement an additional feature (a picture of it is attached below). I've already built an application based on Solr search.
In a few words: the drop-down will contain similar search phrases within a specific category, along with the number of items found.
How can I make Solr collect this data and return it?
Yes, you can do that in Solr using facets, which allow grouping results. The default behaviour of facets is to return the group name and the number of items found. You enable this by adding two parameters to your query string: facet=true and facet.field=category.
An example query in your case would be:
http://localhost:8983/solr/NAME_OF_YOUR_INDEX/select/?wt=json&indent=on&q=ipo&fl=category,name&facet=true&facet.field=category
Take a look at the tutorial for more details.
This is roughly equivalent to doing this in SQL:
SELECT category, COUNT(*) FROM items WHERE text LIKE '%ipo%' GROUP BY category;
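A small Python sketch of the client side may help here. It builds the facet request and turns the facet section of a JSON response into (category, count) pairs. It assumes Solr's default flat JSON facet layout, where names and counts alternate in one list. The helper names are made up for illustration, and no request is actually sent.

```python
from urllib.parse import urlencode


def facet_url(base, term, field="category"):
    """Build a select URL that asks Solr for facet counts on one field."""
    params = {
        "wt": "json",
        "q": term,
        "fl": f"{field},name",
        "facet": "true",
        "facet.field": field,
    }
    return f"{base}/select/?{urlencode(params)}"


def facet_pairs(response, field="category"):
    """Convert the flat [name, count, name, count, ...] list into pairs."""
    flat = response["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))


if __name__ == "__main__":
    print(facet_url("http://localhost:8983/solr/NAME_OF_YOUR_INDEX", "ipo"))
    # Hand-written response fragment, for illustration only.
    sample = {"facet_counts": {"facet_fields": {"category": ["phones", 12, "tablets", 3]}}}
    print(facet_pairs(sample))
```

The pairs returned by facet_pairs are what the drop-down would display next to each suggested phrase.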
Hope someone can help guide me in the right direction on this SOLR question...
I have a dataset that includes hotel features and ratings. Examples of features are 'pool', 'gym', 'free wifi'. Each feature has a rating between 1 and 10. I would like to use SOLR to query these feature/rating pairs in conjunction with some other hotel-related criteria like 'hotel name' or 'location' so that the hotels with most matching and highest rated features go to the top of the search results. Here's an example:
Imagine an end user is searching for a hotel in New York City (location:"New York City") that has 'free wifi' and a 'pool'. Ideally, the matches at the top of the search results are hotels in NYC that have 'free wifi' and 'pool' with the highest ratings. Hope that makes sense? Can anyone point me toward the SOLR features that would allow me to execute this type of query?
Thanks.
I would use the edismax handler and do this in two parts.
First, does the hotel have a pool? Either use a field with multiple values, so you can search for amenities:pool, amenities:gym, etc., or use a set of boolean fields, pool:True, gym:True.
Second, have a field for the ratings, and use the boost feature to boost by each rating. The score will be multiplied by the boost. I've used a max() to make sure that a zero or negative rating does not cause problems. Here is a sample definition that would go into the request handler config in solrconfig.xml:
<str name="boost">product(max(pool_rating,1),max(gym_rating,1))</str>
product() accepts multiple args, so you can just keep adding them.
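To make the effect of that boost concrete, here is a rough Python sketch. It generates the same function string for a list of features and mimics the multiplication on a plain number. The <feature>_rating field naming follows the example above. The helper names are hypothetical, and the arithmetic only approximates what Solr does with its own relevance score.

```python
def boost_expr(features):
    """Build a product(max(<feature>_rating,1),...) boost function string."""
    terms = [f"max({name}_rating,1)" for name in features]
    return f"product({','.join(terms)})"


def boosted_score(base_score, ratings):
    """Multiply a base score by each rating, clamping each rating to at least 1
    so that a missing (zero) or negative rating never lowers or zeroes the score."""
    result = base_score
    for rating in ratings:
        result *= max(rating, 1)
    return result


if __name__ == "__main__":
    print(boost_expr(["pool", "gym"]))
    # A hotel with pool rated 8 and no gym rating (0) keeps the pool boost.
    print(boosted_score(2.0, [8, 0]))
```

Because the ratings are multiplied together, a hotel that scores well on every requested feature rises much faster than one that scores well on only one of them.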
Sorting can be done on the "score" of the document, or on any multiValued="false" indexed="true" field, provided that field is either non-tokenized (i.e. has no Analyzer) or uses an Analyzer that only produces a single Term (i.e. uses the KeywordTokenizer).
Docs: http://wiki.apache.org/solr/CommonQueryParameters#sort
My original schema is as follows (you can think of it as a GROUP BY):
products (id, unique)
users who make some comments(multiValued)
last_comment_date for each user (multiValued, one user can make multiple comments, but only the last comment date is captured)
If sorting on a multiValued field were allowed,
I could easily get the list of products commented on by certain users,
then sort by last_comment_date.
However, it does not work.
The workaround I currently have is to reverse the schema to:
user + product (as id, unique)
user (single value)
last_comment_date
products
This means I (sort of) manage to get the list of products commented on by certain users,
ordered by last_comment_date.
Of course, this leads to duplicated products,
as a product will appear once for each user's comment.
Any suggestions on how to simulate a group-by effect?
By the way, I am using Solr 3.1.
Field collapsing does not apply.
Sorting by a multi-valued field is not something that is merely pending implementation or that can be patched in.
It cannot be done because it simply does not make sense: with several values per document, there is no single value to order by.
The way to do this is to have a single-valued field per document (populated at index time with the last date), then sort on that. That is, when indexing, traverse the list of users with their last activity dates, find the latest date, and assign it to the document's last-activity-date field.
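The index-time step could look roughly like this in Python. The document layout and the field names (last_comment_dates, last_activity_date) are assumptions chosen to mirror the schema in the question, and the sketch assumes every product has at least one comment date.

```python
from datetime import date


def with_last_activity(doc):
    """Return a copy of the document with a single-valued last_activity_date
    holding the most recent of the multi-valued last_comment_dates."""
    out = dict(doc)
    # max() picks the latest date; this raises ValueError on an empty list.
    out["last_activity_date"] = max(doc["last_comment_dates"])
    return out


if __name__ == "__main__":
    product = {
        "id": "p1",
        "users": ["u1", "u2"],
        "last_comment_dates": [date(2011, 3, 1), date(2011, 5, 9)],
    }
    print(with_last_activity(product)["last_activity_date"])
```

The multi-valued fields stay in the document for filtering by user, while the new single-valued field is the one passed to the sort parameter.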
We're using Solr to search on a Shop index and a Product index. Currently a Shop has a field shop_keyword, which also contains the keywords of the products assigned to it. The shop keywords are separated by a space. Consequently, if one product has the keyword "apple" and another has "orange", a search for shops having Apple AND Orange returns the shop that owns these products.
However, this is incorrect: we want a search for shops having Apple AND Orange to return only shop(s) that have a product with both "apple" and "orange" as keywords.
We tried solving this problem by making shop keywords multi-valued and adding the keywords of every product of the shop as a new value in shop keywords. However, as was confirmed in another post (Querying Solr documents with one of the fields multi-valued), Solr does not support "all words must match in the same value of a multi-valued field".
(Hope I explained myself well)
How can we go about this? Ideally, we shouldn't change our search infrastructure dramatically.
Thanks!
Krt_Malta
I am going to assume shop_keyword is a text field.
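A keyword search of Apple AND Orange would return only documents whose shop_keyword contains both Apple and Orange, provided you are searching on that field exclusively, e.g. shop_keyword:(Apple AND Orange). Note the parentheses: without them, only the first term is applied to shop_keyword and the second goes to the default field. For example, you should only see results that contain: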
Apple Orange
And not:
Apple Mango
(I was able to confirm this on my local Solr instance with a text field)
However, you would see results that contain:
Apple Lime Orange Tree
(where "Orange Tree" is a single keyword that contains a space)
From the link you posted, it seems like this is the problem. So your real problem is that you have spaces in your keywords, which Solr is also using as a delimiter of sorts, in which case the technical solutions listed there are the only ones I know of. However...
If you have control of the terms and they aren't used in a free text search (or for Google), you could consider replacing the spaces inside each keyword (for example with an underscore, so "Orange Tree" becomes "Orange_Tree") and adding quotes to your search. That would solve your problem:
shop_keyword:("Apple" AND "Orange")
This would not return "Orange_Tree".
If you went this route you could use a separate field to index terms for free text search and other non-programmatic purposes.
Not ideal, but I hope that kinda helps =).
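As a rough illustration of the keyword-normalization idea, the Python sketch below joins the words of each keyword with underscores and builds a quoted AND query on shop_keyword. The helper names are hypothetical, and whether the underscore form survives indexing depends on the analyzer configured for the field, so treat this as a starting point rather than a tested setup.

```python
def normalize_keyword(keyword):
    """Collapse internal whitespace to underscores: 'Orange Tree' -> 'Orange_Tree'."""
    return "_".join(keyword.split())


def shop_query(keywords, field="shop_keyword"):
    """Build field:("k1" AND "k2" ...) so every term is applied to the same field."""
    quoted = [f'"{normalize_keyword(k)}"' for k in keywords]
    return f"{field}:({' AND '.join(quoted)})"


if __name__ == "__main__":
    print(shop_query(["Apple", "Orange Tree"]))
```

The same normalize_keyword function would need to be applied when the keywords are written to the index, so that the indexed and queried forms agree.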
I am trying to search a SQL Server 2008 table (containing about 7 million records) for cities and countries based on free-text user input. The search string that I get from the user can be anything like:
"Hotels in San Francisco, US" or "New York, NY" or "Paris sddgdfgxx" or "Toronto Canada". Terms are not always separated by commas, are not in a specific order, and may include useless data.
This is what I tried:
Method 1: FTS with contains:
ex: select * from cityNames where contains(cityname,'word1 and word2') -- with AND
select * from cityNames where contains(cityname,'word1 or word2') -- with OR
This didn't work very well because a term like 'sddgdfgxx' would return nothing if used with 'AND'. Using OR works for one-word cities like 'Paris' but not for 'San Diego' or 'San Francisco'.
Method 2: this is actually a reverse search. The logic is to check whether the user input string contains any of the cities or countries from my table. This way I'll know for sure that 'Aix en Provence' or 'New York' was searched for.
ex: select * from cityCountryNames where 'Ontario, Canada, Toronto' like cityCountryNames
Notes: I wasn't able to get results for two-word cities, and the query was slow.
Any help is appreciated.
I would strongly recommend using a 3rd-party API like the Google Geocoding API to take such input and parse it into a location with discrete parts (street address, city, state, country, etc.) Then you could use those discrete parts to search your database if necessary.
Map services like Google and Bing have solved this problem way better than you or I ever would, so why not leverage all the work they've done?
SQL isn't designed for the kinds of queries you are performing, certainly not at scale.
My recommendation would be as follows:
Index all your places (cities + countries) into a Solr index. Solr is a FOSS search server built on Lucene and can easily query a 7-million-record index in milliseconds.
Query Solr with the string the user typed, and voila: the first match is the best match.
So even if the user typed "Paris sddgdfgxx", Paris should be your first hit. If you want to get really sophisticated, use a word n-gram approach (known in Lucene as shingles).
Since Solr offers a RESTful (HTTP) API, it should integrate easily into whatever platform you are on.
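To show what the shingle idea buys you, here is a plain-Python sketch, not Solr's own shingle filter. It splits the user input into word n-grams and keeps those that exactly match a known place name, which is essentially the reverse search from Method 2 in the question. The function names and the tiny in-memory place list are assumptions for illustration.

```python
import re


def shingles(text, max_size=3):
    """Return all word n-grams of the text, longest first, lowercased."""
    words = re.findall(r"\w+", text.lower())
    return [
        " ".join(words[i:i + n])
        for n in range(max_size, 0, -1)
        for i in range(len(words) - n + 1)
    ]


def match_places(text, places):
    """Keep the shingles that are exact (case-insensitive) place names."""
    known = {p.lower() for p in places}
    return [s for s in shingles(text) if s in known]


if __name__ == "__main__":
    places = ["San Francisco", "Paris", "US", "Toronto", "Canada"]
    print(match_places("Hotels in San Francisco, US", places))
    print(match_places("Paris sddgdfgxx", places))
```

Multi-word names such as "San Francisco" are found because the two-word shingle is tested as a unit, and junk tokens like "sddgdfgxx" simply match nothing. In Solr, the lookup against the place set would be replaced by a query against the index.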