I'm pretty new to Solr. I was reading the Solr highlighting documentation to discover whether there's a way, in the returned results, to highlight the stems of words that would be returned in an analysis query, but I was not able to find anything in the Solr documentation.
For example, if I use the Solr admin interface to run an index/query analysis of "named beetles running management" on the Solr sample field text_en, the last filter, the PorterStemFilter (PSF), changes it to "name beetl run manag". I'd like to find a way to query Solr so that it highlights the stems of any found matches.
For example, if my server data contained a phrase
naming beetles running managers
and I queried using the phrase
Named beetles running management
it would return the data in Solr, naming beetles running managers, but I would like it to return that matching data highlighted:
nameing beetles running managers
because
name beetl run manag
are the stems of the words in my query that match the server data.
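To make the desired output concrete, here is a small client-side sketch in Python. It is not a Solr feature and not an answer to the question: it only marks the part of each word that literally starts with one of the stems. The stems are hard-coded from the analysis output quoted above, and the `<em>` tags and the function name `highlight_stems` are choices made for this illustration.

```python
import re

def highlight_stems(text, stems, pre="<em>", post="</em>"):
    """Wrap the leading stem of each word in pre/post markers."""
    # Try longer stems first so the longest matching prefix wins.
    ordered = sorted(stems, key=len, reverse=True)

    def mark(match):
        word = match.group(0)
        lower = word.lower()
        for stem in ordered:
            if lower.startswith(stem):
                return pre + word[:len(stem)] + post + word[len(stem):]
        return word  # no stem is a literal prefix of this word

    return re.sub(r"[A-Za-z]+", mark, text)

# Stems as reported by the text_en analysis in the question.
stems = ["name", "beetl", "run", "manag"]
highlighted = highlight_stems("naming beetles running managers", stems)
print(highlighted)
```

Note that "naming" is left unmarked by this sketch, because its stem "name" is not a literal prefix of the word. Any real solution would have to map stems back to character offsets in the original text rather than rely on prefix matching.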
Related
I would like to add omitNorms=true to the title field.
It is wrongly overboosting some of our titles.
However, I don't know how the title field is indexed. What is its name, just dc.title?
I don't see anything about it in schema.xml. What is the type of that field, and what analyzer or other settings are used for it? Is there any way to find out?
Most metadata fields in DSpace are handled via dynamic fields. That's why you don't see each specified individually in the search core's schema.xml file.
I'm not sure where the boosting is happening (or even whether DSpace does any). I don't recall seeing any boost clauses when looking through the Solr log files. I see some extraction parameters being set in SolrServiceImpl#writeDocument, where the document is being indexed. It looks like there is an extraction parameter for boosting individual fields; perhaps you can experiment with that to get what you'd like.
If you want to see the field type for any Solr field, the easiest option is probably the Schema Browser in the Solr admin user interface, e.g.
http://localhost:8080/solr/#/search/schema-browser?field=title (you may need to use an SSH tunnel or the like to access Solr running on a different host since the DSpace solr install is typically IP-limited to access from localhost).
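To illustrate the dynamic-field point above, here is a simplified Python sketch of how a field name that is not declared explicitly can still end up with a type. The field names, patterns and type names below are hypothetical and are not copied from DSpace's schema.xml, and the "longest pattern first" rule is a simplification of what Solr does, so treat the Schema Browser as the authority.

```python
def resolve_field_type(name, explicit_fields, dynamic_fields):
    """Return the type for a field name, or None if nothing matches.

    explicit_fields maps exact field names to types; dynamic_fields maps
    glob-like patterns (a single leading or trailing '*') to types.
    """
    if name in explicit_fields:
        return explicit_fields[name]
    # Simplification: try longer (more specific) patterns first.
    for pattern in sorted(dynamic_fields, key=len, reverse=True):
        if pattern.startswith("*") and name.endswith(pattern[1:]):
            return dynamic_fields[pattern]
        if pattern.endswith("*") and name.startswith(pattern[:-1]):
            return dynamic_fields[pattern]
    return None

# Hypothetical schema contents, for illustration only.
explicit = {"handle": "string"}
dynamic = {"*": "text", "*_sort": "lowerCaseSort"}

print(resolve_field_type("handle", explicit, dynamic))
print(resolve_field_type("dc.title", explicit, dynamic))
print(resolve_field_type("dc.title_sort", explicit, dynamic))
```

With a catch-all pattern like the hypothetical `*` above, a field such as dc.title never appears by name in the schema, which would explain why searching schema.xml for it finds nothing.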
Is there a way we can add documents into a specific shard?
For example, documents of type A would always be inserted into shard1 and documents of type B would always go to shard2.
I have tried using a custom router, but it does not guarantee that different prefixes will route to different shards.
PS. I am on Solr 5 using cloud mode.
A caveat: I'm using SolrNet to access SolrCloud, and it doesn't integrate with ZooKeeper yet. For Java clients, this might be far easier.
Despite what I read here and here with regard to the compositeId router, I could never get it to work. What @jay helped me figure out is a way to use "implicit" routing to achieve this. If you create your collection like this (leave out the numShards parameter):
http://localhost:8983/solr/admin/collections?action=CREATE&name=myCol&maxShardsPerNode=2&router.name=implicit&shards=shard1,shard2&router.field=shard
and then add a field named "shard" to your schema.xml (matching the router.field parameter), you can index to a specific shard simply by adding the shard field to the document being indexed and specifying the shard name. At query time, you can specify which shards to search; more here (I was able to simply specify the shard name without a specific address).
I haven't tested this in production yet, but have verified using multiple VirtualBox instances, with ZooKeeper, HAProxy, and several Solr nodes, and it's doing exactly what I expected. Corrections and comments welcome.
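As a rough sketch of the steps above, the following Python builds the collection-creation URL and two documents that carry the routing field. The host, the collection name myCol and the shard names come from the URL above; the `id` and `type` fields and the helper names are made up for illustration, and nothing here is actually sent to Solr.

```python
import json
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # host and port taken from the URL above

def create_collection_url(name, shards, router_field="shard"):
    """Build a Collections API CREATE request that uses the implicit router."""
    params = {
        "action": "CREATE",
        "name": name,
        "maxShardsPerNode": 2,
        "router.name": "implicit",
        "shards": ",".join(shards),
        "router.field": router_field,
    }
    # safe="," keeps the shard list readable instead of encoding it as %2C
    return f"{SOLR}/admin/collections?{urlencode(params, safe=',')}"

def route(doc, shard, router_field="shard"):
    """Return a copy of doc with the routing field set to the target shard."""
    return {**doc, router_field: shard}

url = create_collection_url("myCol", ["shard1", "shard2"])

# "id" and "type" are hypothetical fields, used only for this example.
docs = [
    route({"id": "a-1", "type": "A"}, "shard1"),
    route({"id": "b-1", "type": "B"}, "shard2"),
]
payload = json.dumps(docs)  # JSON body that could be posted to the collection

print(url)
print(payload)
```

Keeping the type-to-shard decision in one small function like `route` means the mapping lives in a single place on the client side, which seems useful for a client such as SolrNet that cannot ask ZooKeeper about the cluster layout.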
I'm moving a search from ColdFusion 9 Verity to ColdFusion 10 Solr, but I'm getting some weird results.
For example, if I search for "Fishing and Camping England" (including the quotation marks) on Verity, I get 7 results and, as you'd expect, the results contain the correct phrase "Fishing and Camping England".
But when I search on Solr, I get 1 result, and it's a result I didn't get back previously. The context shows:
about fish! Camping England and
If I search the Solr collection using different search terms, the results/documents I want are actually there. Is there something strange with Solr and search terms in quotation marks? I looked on the Adobe site for Solr terms, and it seems it should be fine. But it's not! I get the same strange results on our local development server and our remote server.
For this example I changed the actual search words, but I hope you get the idea.
There is a difference between how the Verity and Solr search engines work. Verity is a classic search engine, whereas Solr is more modern, and Solr is more robust and faster. Raymond Camden has explained it well on his blog.
To address the difference in Solr's results, you have to choose a search syntax that returns the results you want. Solr supports multiple search syntaxes for finding matches. Here are some examples of Solr search syntax.
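As a hedged illustration of what "different search syntax" can mean, the Python below builds a few query strings in the standard Lucene/Solr query syntax: an exact phrase, a phrase with slop (proximity), and a query that requires every term. The helper names are made up, the terms are assumed to be plain words without special characters, and whether ColdFusion's search tag passes such strings through unchanged is an assumption that would need checking.

```python
def escape_phrase(text):
    """Escape the two characters that are special inside a quoted phrase."""
    return text.replace("\\", "\\\\").replace('"', '\\"')

def phrase(text, slop=None):
    """Quoted phrase query, optionally with a proximity (slop) value."""
    quoted = '"' + escape_phrase(text) + '"'
    return quoted if slop is None else f"{quoted}~{slop}"

def all_terms(text):
    """Require every whitespace-separated term (assumes plain words)."""
    return " ".join("+" + word for word in text.split())

examples = {
    "exact phrase": phrase("Fishing and Camping England"),
    "phrase with slop": phrase("Fishing and Camping England", slop=2),
    "all terms required": all_terms("Fishing Camping England"),
}

for label, query in examples.items():
    print(f"{label}: {query}")
```

Comparing the result counts for these variants against the same collection can help narrow down whether the phrase handling or the field's analysis is behind the unexpected match.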
I have done an extensive search and found example curl commands that contain REST requests,
but I have not come across a document that lists all of the available commands and their
options. Does such a document exist?
Good places to start are the following:
Solr Reference Guide: Searching
Solr Common Query Parameters
Solr Query Syntax (as @Wils commented above)
XML Messages for Updating a Solr Index
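To give a flavour of the XML update messages mentioned in the last item, here is a minimal Python sketch that builds an `<add>` message with the standard library. The field names and values are invented for the example; such a message is typically POSTed to a core's update handler (for example with curl), which is not done here.

```python
import xml.etree.ElementTree as ET

def add_message(docs):
    """Build a Solr XML <add> message from a list of {field: value} dicts."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")

# "id" and "title" are hypothetical fields used only for illustration.
message = add_message([{"id": "doc1", "title": "Hello & welcome"}])
print(message)
```

Building the XML with a library rather than by string concatenation avoids malformed messages when field values contain characters such as `&` or `<`.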
Recently I got involved in a task, part of which requires using Apache Solr (for document search) and Apache Tika (to extract the meta-text or plain text from documents).
I haven't integrated Solr and Tika yet, but I have worked with each of them individually. I have a set of questions related to Apache Solr and Apache Tika; they may be at beginner or intermediate level.
With Solr, I did the following practical work: created a dummy database, wrote a program, configured schema.xml, ran the Solr server along with a program that fetches documents from the database and stores them in the Solr document index, made a simple client to fetch data from Solr via the JSON interface, and made a program that keeps a MySQL database in sync with Solr's document index.
With Tika, I did the following practical work: compiled and installed Tika, and understood its document parsing capabilities.
..
My Sample Task statement:
Part of my project requires storing around 100,000 documents for full-text search and searching them via a client interface (browser). The data of these 100,000 documents (DOC, PDF, TXT) is extracted by Apache Tika, pushed to a MySQL database, and later pushed from there to Apache Solr's document index.
At a simple programming level, this task can be done.
I would like to understand the challenges related to managing the index, or anything else in Solr, e.g.:
** At an advanced level, does it require optimizing Solr's open-source code?
** Even when Solr works properly, does it pose any specific challenges?
** What key things need to be considered initially so that Solr works properly?
** Do you think any extra tool needs to be developed to monitor how Solr is working?
I hope you get the idea of the questions I have.
** Also, I would like to know whether you have any experience of using Apache Tika with Apache Solr, and any challenges or key things to consider.
Could you recommend any specific sources, or any document or anything else that you feel would be helpful?