Imagine a search engine that doesn't have the word 'Iran' in its index. If a user searches for 'Iran', we have no results. We could use Solr's spelling correction and suggest 'iron' instead. Alternatively, we could use a synonym dictionary and replace 'Iran' with 'Persia' (assuming 'Persia' is in the index). However, we don't know what the user actually wants to search for, so I would like to present both words, 'Persia' and 'iron', as suggestions.
Hence my question: can I access the synonym dictionary from the Solr client?
I am not sure whether this should be solved by Solr at all. It would of course be easy to store the synonym list in, say, a SQL database and get suggestions from there. On the other hand, it might be good to keep the number of systems and dependencies as small as possible.
You can fetch the synonyms file with a simple GET request and then do the mapping yourself:
http://localhost:8983/solr/coreName/admin/file?file=synonyms.txt
(The #/coreName/files URL shown in the admin UI is just a front end for this admin/file handler; the fragment part of a URL is never sent to the server, so the GET has to go to the handler directly.)
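For example, a minimal sketch (the core name `coreName` is a placeholder, and the `admin/file` handler is assumed to be enabled) that downloads the file and parses Solr's synonyms.txt format, covering both comma-separated groups (`iran, persia`) and explicit mappings (`iran => persia`):

```python
from urllib.request import urlopen

def parse_synonyms(text):
    """Parse Solr's synonyms.txt format into a dict of word -> synonyms."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        if "=>" in line:  # explicit mapping: left-hand terms map to right-hand terms
            lhs, rhs = line.split("=>", 1)
            targets = [t.strip() for t in rhs.split(",") if t.strip()]
            for term in lhs.split(","):
                mapping.setdefault(term.strip(), []).extend(targets)
        else:  # comma-separated group: every term maps to all the others
            group = [t.strip() for t in line.split(",") if t.strip()]
            for term in group:
                mapping.setdefault(term, []).extend(t for t in group if t != term)
    return mapping

def fetch_synonyms(base="http://localhost:8983/solr/coreName"):
    """Fetch and parse the core's synonyms file (core name is hypothetical)."""
    url = base + "/admin/file?file=synonyms.txt&contentType=text/plain"
    with urlopen(url) as resp:
        return parse_synonyms(resp.read().decode("utf-8"))
```

With the parsed mapping in hand, the client can look up 'Iran' and present 'Persia' alongside the spellcheck suggestion.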
Scenario:
I have a document in the database which has thousands of items in 'productList', as below.
All the objects in the array 'productList' have the same shape and the same fields, with different values.
Now I want to search in the following way:
- When a user types 'c' against the 'Ingrediants' field, the list should show all 'Ingrediants' values starting with the letter 'c'.
- When a user types 'A' against the 'brandName' field, the list should show all 'brandName' values starting with the letter 'A'.
Please give an example of how to search for this, either by:
- creating an index (json, text),
- creating a search index (design document), or
- using views, etc.
Note: I don't want to create an index at run time (the index can be defined from the Cloudant dashboard); I just want to query it from this library in the application.
I have read the documentation and I understand the concepts. Now I want to implement this with the best approach, and I will use the same approach to handle all such scenarios in the future.
Sorry if the question is stupid :)
Thanks.
CouchDB isn't designed to do exactly what you're asking. You'd need one index for 'Ingrediants' and another for 'brandName', and it isn't particularly performant to query both at once. The best approach, I think, would be to check out the Mango query feature (http://docs.couchdb.org/en/2.0.0/api/database/find.html), try the queries you're interested in, and then add indexes as required (the explain endpoint helps make this more efficient).
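As a rough sketch, such a Mango query could look like the following, posted to the database's `_find` endpoint (the database URL is hypothetical, and for simplicity this treats the field as top-level; since 'productList' is an array of objects, you would combine this with `$elemMatch` in practice):

```python
import json
from urllib.request import Request, urlopen

def prefix_query(field, prefix, limit=25):
    """Build a Mango selector matching docs where `field` starts with `prefix`."""
    return {
        "selector": {field: {"$regex": "^" + prefix}},
        "fields": [field],
        "limit": limit,
    }

def find(db_url, query):
    """POST the query to the _find endpoint, e.g. db_url='http://localhost:5984/mydb'."""
    req = Request(db_url + "/_find",
                  data=json.dumps(query).encode("utf-8"),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)["docs"]
```

Note that `$regex` selectors cannot be served from a json index alone, which is exactly why trying the query first and then checking the explain output is worthwhile.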
I have a set of keywords, defined by client requirements, stored in a Solr field. I also have a never-ending stream of sentences entering the system.
By using each sentence as the query against the keywords, I am able to find the sentences that match the keywords. This is working well and I am pleased. What I have essentially done is reverse the way Solr is normally used: storing the query in Solr and passing the text in as the query.
Now I would like to extend the idea from having just a keyword in a field to having a more fully formed Solr query in a field. Doing so would allow proximity searching, etc. But, of course, this is where life becomes awkward: placing Solr query operators into a field will not work, as they would need to be escaped.
Does anyone know if it might be possible to use the Solr "query" function, or perhaps write a Java class that would enable such functionality? Or is the idea blowing just a bit too much against the Solr winds?
Thanks in advance.
ES has percolate for this; with Solr you'll usually index the document as a single document in a memory-based core/index and then run the stored queries against it (which is what ES at least used to do internally, IIRC).
I would check out the percolate API in Elasticsearch. It would surely be easier to use that API than having to write your own equivalent in Solr.
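The Solr-side approach from the first answer can be sketched roughly as follows (the core name `matcher` is an assumption, and `run_query` stands in for whatever HTTP client you use): index the incoming sentence as a single temporary document, replay each stored query against that one-document core, and collect the queries that hit.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr/matcher"  # hypothetical one-document core

def select_url(query):
    """Build the select URL that replays one stored query against the core."""
    return SOLR + "/select?" + urlencode({"q": query, "wt": "json", "fl": "id"})

def matching_queries(doc_id, stored_queries, run_query):
    """Return the stored queries that match the indexed document.

    run_query(url) is injected and expected to return the list of matching
    doc ids (e.g. by GETting the URL and reading response.docs).
    """
    return [q for q in stored_queries if doc_id in run_query(select_url(q))]
```

Since the stored queries are replayed verbatim through the query parser, operators like proximity (`"ice cream"~3`) work without any field-level escaping, which is the part that breaks when you try to keep operators inside an indexed field.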
I have an object Person with fields firstName, lastName, etc.
After finding the list of persons available, I need to find the persons whose firstName contains the substring "ho". How can I do this?
I would have used LIKE with wildcards, but my application is hosted on Google App Engine, so I can't use LIKE in the SQL query (tried it before; it did not work). Any suggestions on how I can do this without traversing each object in the list?
You really need to think of the datastore differently from a relational database. What that essentially means is that you have to be smart about how you store your data in order to get at it. Without full-text search, you can mimic it by creating a key list of searchable words and storing it in a child entity within an entity group. Then you can construct your query to return the keys of the parent objects that match your "query string". This gives you indexing without the overhead of full-text search.
Here's a great example of this using Objectify, but you can accomplish the same thing with anything (JPA, JDO, the low-level API):
http://novyden.blogspot.com/2011/02/efficient-keyword-search-with-relation.html
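A rough in-memory sketch of that relation-index idea (the names here are made up; on App Engine the `keywords` set would live in an indexed list property on the child entity, and the equality filter below would be a datastore query rather than a Python loop):

```python
def keywords_for(text):
    """Derive the searchable-word key list stored alongside the entity."""
    words = text.lower().split()
    # Store every prefix of every word, so "starts-with" lookups become
    # plain equality filters on the indexed list property.
    return {w[:i] for w in words for i in range(1, len(w) + 1)}

# Simulated entities: (parent_key, keyword set) pairs.
people = [
    ("person1", keywords_for("Horatio Nelson")),
    ("person2", keywords_for("John Hope")),
    ("person3", keywords_for("Alice Smith")),
]

def search(term):
    """Equality match on the keyword list -> parent keys, as the datastore would do."""
    term = term.lower()
    return [key for key, kws in people if term in kws]
```

Note this supports whole-word and prefix matches, not arbitrary substrings; a true "contains" query in the middle of a word still needs real full-text search.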
You can't, at least, not if you're using the BigTable-based datastore.
I'm not entirely sure of the vocabulary, but what I'd like to do is send a document (or really just a string) and a bunch of keywords to a Solr server (using SolrNet), and get back an answer telling me whether the document matches the keywords, without the document being stored or indexed on the server.
Is this possible, and if so, how do I do it?
If not, any suggestions for a better way? The idea is to check whether a document is a match before storing it. Could it work to store it first with just a soft commit, and delete it again if it is not a match? How would this affect the index?
Index a document - send it to Solr to be tokenized and analyzed, with the resulting strings stored in the index
Store a document - send it to Solr to be stored as-is, without any modifications
So if you want a document to be searchable, you need to index it first. If you want a document's fields to be retrievable in their original form, you need to store the document.
What exactly are you trying to accomplish? Avoiding duplicate documents? Can you expand a little on your use case?
I have a database that stores details of code check-ins from various SCRs. One of the tables in this database stores the commit comments for each check-in. I am trying to develop a search feature which, using Postgres POSIX regular expressions, searches through this table, matching a regular expression against the comment field and returning all the matches.
I have already got this to work, but the main problem is the performance of the search. For a fairly big database it takes almost 15-20 minutes for a search to complete, and since a web front end is waiting for the result, this is a totally unacceptable time for a medium-sized database.
I figured that creating an index on this text field might help, but I am unable to create a btree index because the data in some of the rows is too big for Postgres to index.
Is there any other solution to this? Are there other kinds of indexes that can be created which, again, should not be language dependent?
Check out the full text search functions; plain regular expressions generally can't use indexes.
As of PostgreSQL 9.1, you can use the pg_trgm extension.
Documentation:
http://www.postgresql.org/docs/9.1/static/pgtrgm.html
A good starting point:
http://www.depesz.com/2011/02/19/waiting-for-9-1-faster-likeilike/
Yeah, full text search is your answer here. PostgreSQL has a pretty robust and fast FTS capability.
Others have mentioned full text search. If you need regular expressions rather than full text search, there is no way to index them generically. As long as the expression is anchored at the beginning of the string (using ^ at the start), an index can usually be used, but for generic regular expressions there is no way to use an index for the search.
Use the pg_trgm extension:
CREATE EXTENSION pg_trgm;
Then you can create a trigram index on the name field:
CREATE INDEX companies_name_trgm_idx ON companies USING GIN (name gin_trgm_ops);
This index will be used for searches like:
SELECT * FROM companies WHERE name ~* 'jet';