I have a business case where I need to check whether a search query is looking for businesses, so that I can display them,
e.g. q="night clubs new york"
I have a list of countries, states, cities and regions in my database (3 million+ records) and a list of business categories.
All I want to do is check whether the query contains a business category (night clubs) and a city, state or country name (new york). To do that, I check the number of results returned for the query below: if numResults is 2, this is a business query, and I then query my Solr index for businesses.
query: places_ss:(night clubs new york) OR categories_ss:(night clubs new york)
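A minimal Python sketch of this detection step, assuming (as the question implies) that the index holds exactly one "places" document and one "categories" document, so a match on both gives numFound == 2. It only builds the request parameters for a /select call; the HTTP client is not shown. Two caveats worth keeping in mind: with the default OR operator a single matching word (e.g. just "new") is enough for a document to match, and in Solr's example schemas `*_ss` is a multivalued string field, so each word must equal a stored value exactly ("new york" is searched as "new" and "york").

```python
from urllib.parse import urlencode


def build_detection_query(user_query: str) -> str:
    """Search the same words in the places field and the categories field."""
    terms = " ".join(user_query.split())  # normalise whitespace
    return f"places_ss:({terms}) OR categories_ss:({terms})"


def build_detection_params(user_query: str) -> str:
    # rows=0: only numFound is needed, not the documents themselves
    return urlencode({"q": build_detection_query(user_query),
                      "rows": 0, "wt": "json"})


def is_business_query(num_found: int) -> bool:
    # Assumption from the question: one "places" document and one
    # "categories" document, so both matching means numFound == 2.
    return num_found == 2


if __name__ == "__main__":
    print(build_detection_params("night clubs new york"))
```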
Speed question: how should I store the list of cities, states and countries in Solr to get maximum search speed?
- Have one document (id:places) and add all distinct cities, states and countries to one array field, places_ss.
- Have multiple documents with different IDs, each holding 100,000 place names in an array.
- Have one or more documents with a place_s string field (not an array), with places separated by spaces and the spaces inside each place name replaced by underscores, e.g. new york becomes new_york.
At query time I would then generate the word combinations of night clubs new york,
e.g. night night_clubs night_clubs_new night_clubs_new_york clubs_new clubs_new_york new_york york, and query for each place.
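The query-time expansion described above amounts to generating every contiguous word sequence (word n-gram) of the query. A short Python sketch, assuming plain whitespace tokenisation and lowercasing; unlike the example list, it also yields the single words clubs and new. A query of n words gives n(n+1)/2 candidates, so a four-word query produces 10.

```python
def place_shingles(query: str, sep: str = "_") -> list[str]:
    """Return every contiguous run of words in the query, joined with `sep`."""
    words = query.lower().split()
    shingles = []
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            shingles.append(sep.join(words[start:end]))
    return shingles


if __name__ == "__main__":
    print(" ".join(place_shingles("night clubs new york")))
    # night night_clubs night_clubs_new night_clubs_new_york clubs clubs_new clubs_new_york new new_york york
```

Solr's own ShingleFilterFactory produces comparable word n-grams during analysis, which may be worth comparing against building them by hand.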
Would it be a good idea to have a separate core just for the place documents above, to increase speed?
Is this a good solution?
Document organisation:
It is better to take a document approach, with fields for:
- location
- activity
- any other fields you need
location
You should store your location like this:
Country:state:city:suburb... so that you can search for usa:new york:new york*
or ::new york
There is no need for the underscores (_); avoid them.
activity
Activity should be stored in a separate field, for search precision and speed.
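A Python sketch of the hierarchical layout, assuming `location` is an untokenised string field holding lowercase paths (a string field is not analysed, so the case must match at index and query time) and `activity` is a separate field; the field names and the document itself are hypothetical. Because the path contains colons and spaces, they must be backslash-escaped in the query, similar to what SolrJ's `ClientUtils.escapeQueryChars` does.

```python
# Query-syntax special characters plus space; each is escaped with a backslash.
SPECIAL = set('\\+-!():^[]"{}~*?|&/; ')


def location_path(*parts: str) -> str:
    """Join a location hierarchy as country:state:city:... (lowercased)."""
    return ":".join(p.strip().lower() for p in parts)


def escape(term: str) -> str:
    return "".join("\\" + c if c in SPECIAL else c for c in term)


def location_prefix_query(field: str, *parts: str) -> str:
    """Prefix query matching everything under the given part of the hierarchy."""
    return f"{field}:{escape(location_path(*parts))}*"


doc = {  # hypothetical document: location and activity kept in separate fields
    "id": "biz-1",
    "location": location_path("USA", "New York", "New York", "Manhattan"),
    "activity": "night clubs",
}

if __name__ == "__main__":
    print(doc["location"])  # usa:new york:new york:manhattan
    print(location_prefix_query("location", "USA", "New York", "New York"))
    # location:usa\:new\ york\:new\ york*
```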
I'm fairly comfortable working with Access and can usually figure things out, but I'm currently stumped on this query.
I only have 2 shipment tables. One contains historical data (tblBookedLoads), while the second table (RMCData) contains current orders.
What I need to do is, for each row in RMCData, find out who has driven the same route (origin to destination) recently and send them an email with the details of the matching shipment and any others that match; i.e., if I have 50 shipments from Atlanta to Memphis tomorrow, I don't want to send them 50 emails.
I'm just not sure how to retrieve the information, since it is stored in different formats, or how to create a query based on row-by-row data from fields in another table. Let me know what you think. I'm guessing it involves looping in VBA, but I'm not sure.
Both tables are read only.
tblRMCData 'Contains details for new orders (approx. 1000 rows)
' has fields 'ReferenceNum', 'Ship Date', 'Origin City', 'Origin State', 'Origin Zip', 'destination', 'Weight', 'Comments', 'Comments2'
'Origin/Destination City, State and Zip are separated
tblBookedLoads 'contains history of orders (approx. 20000 rows). CONTAINS DUPLICATES
' has fields 'Carrier', 'Ship Date', 'Order', 'Origin', 'Destination', 'Weight', 'Bill Amount', 'email address'
' Origin/Destination is populated as "LITTLE ROCK, AR 12345"
Any advice is appreciated.
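The matching itself does not have to be a row-by-row loop: normalise both tables to (city, state) route keys, build a route-to-carrier map from the history, then gather every new order per carrier so each carrier gets one message. This is a Python sketch of that logic, not Access/VBA. Assumptions: history rows carry `Ship Date` as a date and use the "LITTLE ROCK, AR 12345" format shown above; the current-order destination is split into hypothetical `Dest City`/`Dest State` fields (the question says the parts are separated but lists only `destination`); and "recently" means on or after a cutoff date you choose.

```python
import re
from collections import defaultdict
from datetime import date

# "LITTLE ROCK, AR 12345" -> ("LITTLE ROCK", "AR")
PLACE_RE = re.compile(r"^\s*(?P<city>[^,]+?)\s*,\s*(?P<state>[A-Z]{2})\b")


def parse_place(text):
    m = PLACE_RE.match(text.upper())
    return (m.group("city"), m.group("state")) if m else None


def norm(city, state):
    return (city.strip().upper(), state.strip().upper())


def emails_by_carrier(current, history, since):
    """Map each carrier email to the ReferenceNums of matching new orders."""
    # 1. Route -> carrier emails from recent history; the set drops duplicates.
    carriers = defaultdict(set)
    for row in history:
        if row["Ship Date"] < since:
            continue
        o, d = parse_place(row["Origin"]), parse_place(row["Destination"])
        if o and d and row["email address"]:
            carriers[(o, d)].add(row["email address"])
    # 2. Collect all matching new orders per carrier: one email each.
    result = defaultdict(list)
    for order in current:
        route = (norm(order["Origin City"], order["Origin State"]),
                 norm(order["Dest City"], order["Dest State"]))
        for email in sorted(carriers.get(route, ())):
            result[email].append(order["ReferenceNum"])
    return dict(result)


if __name__ == "__main__":
    history = [
        {"Ship Date": date(2024, 1, 10), "Origin": "ATLANTA, GA 30301",
         "Destination": "MEMPHIS, TN 38103", "email address": "carrier@example.com"},
    ]
    current = [
        {"ReferenceNum": "R1", "Origin City": "Atlanta", "Origin State": "GA",
         "Dest City": "Memphis", "Dest State": "TN"},
    ]
    print(emails_by_carrier(current, history, since=date(2024, 1, 1)))
```

In Access, the same idea could be expressed as a query that joins the two tables on the normalised route, with the email text built once per carrier.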
Will duplicated documents impact search results?
For example, we have an index in which the same document can be repeated, with the copies differing in only one field.
Index fields: ChannelID, ProductID, ProductName and ProductDescription
We may have the same product on different ChannelIDs. So, if we have 100 ChannelIDs, we will have the same product (document) 100 times if the product is available on all channels.
When searching, will this repetition of documents (same product name and description) affect the quality of the results?
Thanks.
Depending on your search, similar documents would all show up in the search results. For example, in your ‘100 different channel ids but same product’ example, if someone searches by product description (assuming the same product gets the same description), either all 100 documents of that product would be returned (if the search matched) or none of them would.
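A toy illustration of that all-or-nothing behaviour, in plain Python rather than Solr, with made-up product values: since every channel copy carries the same description, a term either matches all 100 copies or none, and counting distinct ProductIDs shows the hits are really one product.

```python
def search(docs, field, term):
    """Toy exact-word search: return docs whose field contains `term`."""
    return [d for d in docs if term.lower() in d[field].lower().split()]


product = {"ProductID": "p1", "ProductName": "Widget",
           "ProductDescription": "compact blue widget"}  # hypothetical values
# The same product indexed once per channel: 100 near-identical documents.
docs = [dict(product, ChannelID=c) for c in range(100)]

hits = search(docs, "ProductDescription", "blue")
misses = search(docs, "ProductDescription", "red")
distinct_products = {d["ProductID"] for d in hits}

if __name__ == "__main__":
    print(len(hits), len(misses), len(distinct_products))  # 100 0 1
```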
Need help figuring out a good way to store data effectively and efficiently
I'm using Parse (JavaScript SDK), here's an example of what I'm trying to store
I'm storing predictions for football (soccer) matches; an example of one match would be:
Team A v Team B
EventID = "abc"
Categories = ["League-1","Sunday-League"]
User123 predicts the score will be Team A 2-0 Team B -> so 2-0
User456 predicts the score will be Team A 1-3 Team B -> so 1-3
Each event has information attached to it, such as an eventId, several categories, a start time, an end time, a result and more.
I need to record one score prediction per user for each event (usually 10 events at a time, so a lot of predictions will be coming in).
I need to store these so that I can compare the correct result against each user's prediction and award points based on the prediction, the teams in the match and the event's categories. Instead of adding to a running total, I need the awarded points stored separately per category and per user, so that I can then filter by predictions between set dates and by certain categories, e.g.
Team A v Team B
EventID = "abc"
Categories = ["League-1","Sunday-League"]
User123 prediction = 2-0
Actual result = 2-0
So now I need to award X points to User123 for Team A, Team B, "League-1", and "Sunday-League" and record it to the event date too.
I would suggest creating a table for games, a table for users, and an associative table to handle the relationship between them. This is a pretty standard many-to-many relationship.
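A minimal Python sketch of that layout, extended for the points requirement: an `Event`, the associative `Prediction` (one row per user per event), and a hypothetical `PointsAward` row per user, event and tag (team or category), stamped with the event date so totals can be filtered by date range and category. The scoring rule (3 points for an exact score) is made up for illustration; in Parse these would be separate classes linked by IDs or pointers.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    event_id: str
    home: str
    away: str
    categories: Tuple[str, ...]
    event_date: date
    result: Optional[Tuple[int, int]] = None


@dataclass(frozen=True)
class Prediction:
    """Associative row: one per (user, event)."""
    user_id: str
    event_id: str
    score: Tuple[int, int]


@dataclass(frozen=True)
class PointsAward:
    """One row per user, event and tag (a team or a category)."""
    user_id: str
    event_id: str
    tag: str
    points: int
    event_date: date


def award_points(event: Event, predictions: List[Prediction],
                 exact_score_points: int = 3) -> List[PointsAward]:
    if event.result is None:  # match not finished yet
        return []
    awards = []
    for p in predictions:
        if p.event_id == event.event_id and p.score == event.result:
            for tag in (event.home, event.away, *event.categories):
                awards.append(PointsAward(p.user_id, event.event_id, tag,
                                          exact_score_points, event.event_date))
    return awards


def total_points(awards: List[PointsAward], user_id: str, tag: str,
                 start: date, end: date) -> int:
    return sum(a.points for a in awards
               if a.user_id == user_id and a.tag == tag
               and start <= a.event_date <= end)


if __name__ == "__main__":
    abc = Event("abc", "Team A", "Team B", ("League-1", "Sunday-League"),
                date(2024, 5, 5), result=(2, 0))
    preds = [Prediction("User123", "abc", (2, 0)),
             Prediction("User456", "abc", (1, 3))]
    awards = award_points(abc, preds)
    print(total_points(awards, "User123", "League-1",
                       date(2024, 5, 1), date(2024, 5, 31)))  # 3
```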
I have two indexed fields in my Solr schema:
Employee Name
Manager Name
Both are plain strings.
My question is: given a search term, I want to display the top 5 suggested completions from Manager Names and the next 5 from Employee Names.
I can use copy fields, but sometimes all of the top 10 results come from Employee Names.
I have a hunch that boosting can help, but I could not figure out how.
Boosting can't help you control the results so that they are split 5 and 5 in the top 10.
You could look at Field Collapsing instead, where you can group per role (Manager and Employee) and limit each group to 5 results.
So you would have 2 groups returned back to you with 5 results each.
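Because the two names live in separate fields rather than in one role field, one way to get two groups is two `group.query` parameters, each producing its own group, with `group.limit` capping the documents per group. A Python sketch that builds those parameters; `manager_name` and `employee_name` are hypothetical field names, and the trailing `*` prefix match assumes plain (case-sensitive) string fields. If one manager appears on many employee documents, the manager group can repeat the same name.

```python
from urllib.parse import urlencode

# Query-syntax special characters plus space; each is escaped with a backslash.
SPECIAL = set('\\+-!():^[]"{}~*?|&/; ')


def escape(term: str) -> str:
    return "".join("\\" + c if c in SPECIAL else c for c in term)


def suggestion_params(prefix: str, per_group: int = 5) -> list:
    """Two query groups (managers, employees), each capped at per_group docs."""
    term = escape(prefix) + "*"  # prefix match on an untokenised string field
    return [
        ("q", "*:*"),
        ("group", "true"),
        ("group.query", f"manager_name:{term}"),
        ("group.query", f"employee_name:{term}"),
        ("group.limit", str(per_group)),
        ("wt", "json"),
    ]


if __name__ == "__main__":
    print(urlencode(suggestion_params("Jo")))
```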
Using Solr 3.3
Key | Store      | Item Name | Description       | Category         | Price
=========================================================================
1   | Store Name | Xbox 360  | Nice game machine | Electronic Games | 199.99
2   | Store Name | Xbox 360  | Nice game machine | Electronic Games | 199.99
3   | Store Name | Xbox 360  | Nice game machine | Electronic Games | 249.99
I have data similar to the table above loaded into Solr. Item Name, Description, Category and Price are searchable.
Expected result
Facet Field
Category
Electronic(1)
Games(1)
**Store Name**
XBox 360 Nice game machine priced from 199.99 - 249.99
What query parameters can I send to Solr to receive the results above? Basically, I want to group by Store, Item Name and Description, with the min and max price.
I also want to keep paging consistent with the main group (Store Name): paging should be based on the Store Name group, so if 20 stores are found, I should be able to page through them correctly.
Please suggest
If you are using Solr 4.0, the new "Grouping" feature (which replaces FieldCollapsing) fixes this issue when you add the parameter "group.facet=true".
So to group your results you would have to add the following parameters to your search request:
group=true // Enables grouping
group.facet=true // Facet counts to be number of groups instead of documents
group.field=Store // Groups results by the field "Store"
group.ngroups=true // Tells Solr to return the number of groups found
The number of groups found is what you would show to the user and use for paging, instead of the normal total count, which is the total number of matching documents.
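A Python sketch that assembles the parameters listed above into a request, with paging counted in groups: in Solr's result grouping, `start` and `rows` apply to the list of groups, and the page count comes from `ngroups` rather than the document count. The `facet.field=Category` line reflects the expected facet in the question; the page size of 10 is an assumption.

```python
from math import ceil
from urllib.parse import urlencode


def store_group_params(q: str, page: int, groups_per_page: int = 10) -> str:
    params = [
        ("q", q),
        ("group", "true"),
        ("group.field", "Store"),
        ("group.ngroups", "true"),
        ("group.facet", "true"),
        ("facet", "true"),
        ("facet.field", "Category"),
        # With grouping, start/rows count groups, not documents.
        ("start", str(page * groups_per_page)),
        ("rows", str(groups_per_page)),
    ]
    return urlencode(params)


def page_count(ngroups: int, groups_per_page: int = 10) -> int:
    """Number of pages based on the group count, not on numFound."""
    return ceil(ngroups / groups_per_page)


if __name__ == "__main__":
    print(store_group_params("xbox", page=0))
    print(page_count(20))  # 2
```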
Have you looked into field collapsing? It is new in Solr 3.3.
http://wiki.apache.org/solr/FieldCollapsing
What I did was create another field that combines the required fields into a single stored field. Problem solved: now I group only on that field and get the correct count.
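A Python sketch of that workaround: build the combined value when indexing, so a single `group.field` covers Store, Item Name and Description. The `|` separator and the field name `group_key_s` are assumptions (a `*_s` dynamic string field in Solr's example schemas). The price range is computed client-side here only to show that the three sample rows collapse into one group priced 199.99 to 249.99.

```python
def group_key(doc: dict, fields=("Store", "Item Name", "Description"),
              sep: str = "|") -> str:
    """Combine the fields to group on into one value, at index time."""
    return sep.join(str(doc[f]).strip() for f in fields)


def price_ranges(docs: list) -> dict:
    """Min/max Price per combined key (client-side, for illustration)."""
    ranges = {}
    for d in docs:
        key = group_key(d)
        lo, hi = ranges.get(key, (d["Price"], d["Price"]))
        ranges[key] = (min(lo, d["Price"]), max(hi, d["Price"]))
    return ranges


rows = [  # the sample rows from the question
    {"Key": 1, "Store": "Store Name", "Item Name": "Xbox 360",
     "Description": "Nice game machine", "Price": 199.99},
    {"Key": 2, "Store": "Store Name", "Item Name": "Xbox 360",
     "Description": "Nice game machine", "Price": 199.99},
    {"Key": 3, "Store": "Store Name", "Item Name": "Xbox 360",
     "Description": "Nice game machine", "Price": 249.99},
]

if __name__ == "__main__":
    for d in rows:
        d["group_key_s"] = group_key(d)  # hypothetical stored string field
    print(price_ranges(rows))
```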