Where is Solr core schema.xml?

I have started Solr and created a core using the following commands. I would like to modify the schema.xml file but cannot find it anywhere. Do the following commands create a schema.xml file?
bin\solr.cmd start
bin\solr.cmd create -c test

If you're using the managed schema (which you are by default when creating a core), the schema is meant to be changed through the Schema API.
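For the Schema API route, a request that adds a field could look roughly like the Python sketch below (standard library only). It assumes Solr is running on localhost:8983 and uses the core test from the question. The field name my_text is a hypothetical example, and the field type text_general is assumed to exist in your schema, as it does in the stock configsets. The function only builds the request; send() posts it to a running instance.

```python
import json
import urllib.request

SOLR_URL = "http://localhost:8983/solr"  # assumption: default local Solr on port 8983


def add_field_request(core: str, name: str, field_type: str, stored: bool = True) -> urllib.request.Request:
    """Build a Schema API request that adds a single field to a managed schema."""
    body = {"add-field": {"name": name, "type": field_type, "stored": stored}}
    return urllib.request.Request(
        f"{SOLR_URL}/{core}/schema",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send a request to a running Solr instance and decode its JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, send(add_field_request("test", "my_text", "text_general")) should come back with a responseHeader status of 0 if the field was added. Solr then writes the change to managed-schema itself, so the file never needs to be edited by hand.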
However, if you stop Solr first, you can safely edit the managed-schema file by hand, even though the file tells you not to. Just make sure that nothing is running and relying on the state it read from the file earlier; otherwise your edits will be overwritten when the current in-memory state of the schema is written back to the file.
If you want to use the classic schema.xml file, you can change your configuration to use the ClassicIndexSchemaFactory instead of the ManagedIndexSchemaFactory.
To do so, replace the schemaFactory definition in solrconfig.xml (or add one if none exists) with the following, and rename managed-schema to schema.xml:
<schemaFactory class="ClassicIndexSchemaFactory" />

Related

Searching PDF files stored in database using SOLR

I have a lot of PDF files stored as BLOBs in a database (MSSQL) that I need to search, and I need a walkthrough of how to search them using SOLR.
I have a DB; let's call it "fred". Inside fred is a table, which we'll call pdffiles. pdffiles has a column named pdfdata, of type BLOB.
The pdfs are stored in this table, with the binary data stored in the column. What steps do I take to get SOLR to extract this data and index it?
I'm guessing it involves the TikaEntityProcessor but having the pdfs stored in the database rather than just being regular files adds a level of complexity. I have previously worked with SOLR and have it running in production.
Sample dataconfig and schema files would be very useful.
1. Create a new file called tika-data-config.xml that holds the database configuration and the query used to fetch the data.
2. Open solrconfig.xml in a text editor and, within the <config> tags, register the data import request handler (a requestHandler of class solr.DataImportHandler whose config points at tika-data-config.xml).
3. Add <lib> entries for the jars the data import handler needs.
4. Provide the JDBC driver jar for your database.
5. Add your fields to schema.xml, choosing a fieldType that suits your search requirements.
6. Once the setup is ready, ask Solr to index the data using http://localhost:8983/solr/collection1/dataimport?command=full-import
For more detail, see the "Configure DIH" page in the Solr documentation.
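The full-import request can also be scripted. The following is a rough Python sketch (standard library only) that triggers a full import and then polls the handler's status command until it reports idle. It assumes the handler is registered at /dataimport on a local Solr at port 8983, as in the full-import URL above; the polling interval is an arbitrary choice.

```python
import json
import time
import urllib.parse
import urllib.request

SOLR_URL = "http://localhost:8983/solr"  # assumption: default host and port


def dataimport_url(core: str, command: str, **params: str) -> str:
    """Build a DataImportHandler URL such as ...?command=full-import or ...?command=status."""
    query = urllib.parse.urlencode({"command": command, "wt": "json", **params})
    return f"{SOLR_URL}/{core}/dataimport?{query}"


def run_full_import(core: str, poll_seconds: float = 2.0) -> dict:
    """Start a full import, then poll command=status until DIH reports "idle"."""
    with urllib.request.urlopen(dataimport_url(core, "full-import")):
        pass  # by default DIH runs the import in the background and returns immediately
    while True:
        time.sleep(poll_seconds)  # give the import a moment to start before each check
        with urllib.request.urlopen(dataimport_url(core, "status")) as resp:
            status = json.loads(resp.read().decode("utf-8"))
        if status.get("status") == "idle":
            return status
```

Extra DIH parameters can be passed through, for example dataimport_url("collection1", "full-import", clean="false") to keep existing documents during the import.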

Highlighting Solr search results with bin/post and managed schema

I've got Solr 6.6.1 installed. I run bin/post to fetch and index some documents into a new core. I'd like to add a text field and highlight on that field. I notice that in server/solr/myCore/conf there is a file, managed-schema, with a warning telling me not to edit it.
What's the supported way to use bin/post AND enable highlighting on a text field?
Solr implicitly uses a ManagedIndexSchemaFactory, which is by default "mutable" and keeps schema information in a managed-schema file.
You have two choices:
1. Go back to <schemaFactory class="ClassicIndexSchemaFactory"/>, so that you can edit the schema file manually.
2. Stay with the managed schema and use the Schema API's modification operations over HTTP to add the new field you will use for highlighting.
I would recommend sticking with option 2, but it's entirely up to you. The official documentation will help you decide which schema options your text fields need to get the most out of highlighting.
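Once a suitable field exists, highlighting is requested at query time. A minimal Python sketch that builds such a /select URL, assuming a local Solr on port 8983, the core myCore from the question, and a hypothetical text field called content_txt. The standard highlighter needs the field to be stored.

```python
import urllib.parse

SOLR_URL = "http://localhost:8983/solr"  # assumption: default local instance


def highlight_query_url(core: str, query: str, field: str, rows: int = 10) -> str:
    """Build a /select URL that searches `field` and asks Solr to highlight matches in it."""
    params = {
        "q": f"{field}:{query}",
        "hl": "true",    # turn highlighting on
        "hl.fl": field,  # field(s) whose matches should be highlighted
        "rows": str(rows),
        "wt": "json",
    }
    return f"{SOLR_URL}/{core}/select?{urllib.parse.urlencode(params)}"
```

The highlighted snippets come back in a separate "highlighting" section of the response, keyed by document id.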

Solr: synonyms.txt vs. managed synonyms

I realize there are two ways to add synonyms:
1. using synonyms.txt and SynonymFilterFactory
2. using the REST API with ManagedSynonymFilterFactory
The question is: can both of these be used together? If so, if a new entry is added to synonyms.txt, will it be returned when fetching synonyms through the REST API, and vice versa?
We currently use synonyms.txt but want to use the REST APIs to update synonyms on the fly without having to restart Solr. At the same time, we want to keep the synonyms added through synonyms.txt and also retain the ability to add new words through the text file.
Also, if we add synonyms through the text file, should Solr always require a restart before the changes are reflected, or is a core reload supposed to be enough? If the latter, for some reason it doesn't work for us.
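For reference, the REST side of option 2 looks roughly like the Python sketch below. It assumes a local Solr on port 8983, a hypothetical core named techproducts, and a ManagedSynonymFilterFactory declared with managed="english" in the schema, so the resource lives at /schema/analysis/synonyms/english. Changes made through this API only take effect after the core is reloaded, which is why a reload URL is included; in SolrCloud you would reload the collection through the Collections API instead. The sketch covers only the managed resource and does nothing with synonyms.txt.

```python
import json
import urllib.parse
import urllib.request

SOLR_URL = "http://localhost:8983/solr"  # assumption: default local instance


def put_synonyms_request(core: str, resource: str, mappings: dict) -> urllib.request.Request:
    """Build a PUT that adds synonym mappings to a managed synonyms resource."""
    return urllib.request.Request(
        f"{SOLR_URL}/{core}/schema/analysis/synonyms/{resource}",
        data=json.dumps(mappings).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def reload_core_url(core: str) -> str:
    """CoreAdmin RELOAD URL; managed-resource changes apply only after a reload."""
    query = urllib.parse.urlencode({"action": "RELOAD", "core": core})
    return f"{SOLR_URL}/admin/cores?{query}"
```

For example, put_synonyms_request("techproducts", "english", {"mad": ["angry", "upset"]}) maps "mad" to two synonyms; send it with urllib.request.urlopen, then open reload_core_url("techproducts").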

Solr runtimelib usage

I am trying to use the support that was added for specifying jars as runtime libraries when creating request handlers and other components. However, the documentation (https://cwiki.apache.org/confluence/display/solr/Adding+Custom+Plugins+in+SolrCloud+Mode) does not make clear to me whether this only works for components created through the Config API, or whether it should also work when runtimeLib="true" is added to solrconfig.xml.
For example:
<requestHandler name="/browse" class="solr.SearchHandler" runtimeLib="true">
  <!-- ... -->
</requestHandler>
I added runtimeLib="true" to all of my searchComponents and requestHandlers in solrconfig.xml to see if it would work, but when the Solr instances start, they all fail because they look for a class that lives in a custom jar file. I've added the .system collection and uploaded the jars following the Solr Reference Guide/Wiki documentation. I can see the .system collection, and I can also see that my collection's configoverlay.json lists the two jars I uploaded.
My collection's configoverlay.json contents:
{
  "runtimeLib": {
    "my-custom-jar": {
      "name": "my-custom-jar",
      "version": 1
    },
    "sqljdbc41-jar": {
      "name": "sqljdbc41-jar",
      "version": 1
    }
  }
}
Is specifying a runtimeLib attribute in solrconfig.xml supported? If so, what is the proper usage?
You're almost there. Further down on the page you linked to, there is an example of creating a parser. The example uses completely different values from the rest of the page, so I can understand why you may have glossed over it.
The point is that you need to register your request handler using the curl command provided on the page. Unfortunately, the command you need, create-requesthandler, is one I had to dig into the source code to find. To create a request handler using your values above, I think you should issue this command:
curl "http://{servername}:8983/solr/{collection}/config" -H 'Content-type:application/json' -d '{
"create-requesthandler": {
"name":"my-custom-jar",
"runtimeLib": true,
"class": "solr.SearchHandler"
}
}'
Remember to replace the {servername} and {collection} placeholders, and change the port if you are not using the default (8983).
Restart your solr server and the plugin should be available.
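Putting the pieces together, the whole runtime-library flow might be scripted as in the Python sketch below: upload the jar to the .system collection with the Blob Store API, register it with add-runtimelib (which is what produces entries like the ones in the configoverlay.json from the question), and then send the create-requesthandler command from this answer. Host, port and collection name are assumptions, and the handler values simply mirror the curl command in this answer. The functions only build requests.

```python
import json
import urllib.request

SOLR_URL = "http://localhost:8983/solr"  # assumption: adjust host and port for your cluster


def upload_blob_request(blob_name: str, jar_bytes: bytes) -> urllib.request.Request:
    """Step 1: store a jar in the .system collection through the Blob Store API."""
    return urllib.request.Request(
        f"{SOLR_URL}/.system/blob/{blob_name}",
        data=jar_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


def config_command_request(collection: str, command: dict) -> urllib.request.Request:
    """Steps 2 and 3: POST a Config API command to /{collection}/config."""
    return urllib.request.Request(
        f"{SOLR_URL}/{collection}/config",
        data=json.dumps(command).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Step 2: register version 1 of the uploaded blob as a runtime library for the collection.
ADD_RUNTIMELIB = {"add-runtimelib": {"name": "my-custom-jar", "version": 1}}

# Step 3: the create-requesthandler command, with the same values as the curl example.
CREATE_HANDLER = {
    "create-requesthandler": {
        "name": "my-custom-jar",
        "runtimeLib": True,
        "class": "solr.SearchHandler",
    }
}
```

For example, with jar_bytes = Path("my-custom.jar").read_bytes(), you would open upload_blob_request("my-custom-jar", jar_bytes), then config_command_request("mycollection", ADD_RUNTIMELIB) and config_command_request("mycollection", CREATE_HANDLER), each with urllib.request.urlopen. The jar file name and mycollection are placeholders.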
Sadly, loading classes from plugins in managed schemas seems to be unsupported at the moment:
https://issues.apache.org/jira/browse/SOLR-8751
This probably means that you have to add such fields dynamically via the API, as mentioned above. One workaround could be to start with a minimal managed schema and add the fields that require external jars afterwards.
For me, the simplest solution was to skip the Blob API entirely and add the required jars directly to the classpath of the Solr instance, as described here:
http://lucene.472066.n3.nabble.com/Problems-while-setting-classpath-while-upgrading-to-Solr-5-1-from-Solr-4-X-tp4209853p4209863.html

managed-schema and other fancy stuff. What is it?

I created a new core in standalone mode (please correct me if I'm saying something wrong). I did it like this, following the Apache Solr Reference Guide 5.2:
$ bin/solr create -c test
I hoped that everything (in fact, almost nothing) I did would match the Reference Guide. On pages 13 and 14, the Guide clearly describes what the Solr home directory should look like:
solr.xml
core_name1/
    core.properties
    conf/
        solrconfig.xml
        schema.xml
    data/
...
However, when I go to ./server/solr/test/conf, I see 8 files and one directory there:
currency.xml
lang/
params.json
solrconfig.xml
synonyms.txt
elevate.xml
managed-schema
protwords.txt
stopwords.txt
Phew... Terrible-looking stuff that isn't covered at all in the first chapter of the Reference Guide. I do not understand what I did wrong or what made the home directory of a new core look so ugly. I never meant to create currency.xml or any other fancy files. But worst of all, I cannot find any schema file, which, judging by the Reference Guide, should be the most important one. I guess that I should now use managed-schema instead, but when I open it I see a really dreadful message:
<!-- Solr managed schema - automatically generated - DO NOT EDIT -->
Bump! A newbie like me creates the first core of his life, hopes to see a lovely schema file, and finds something he cannot even edit. So, my questions are:
1. How do I actually create a core and get my hands dirty editing a schema.xml file?
2. Is it possible to edit the managed schema somehow, or not?
3. Is there a reference guide that can be followed line by line and produces the expected results?
A schema.xml file is present in the solr\configsets\basic_configs\conf directory. You can copy it into your core's conf directory and modify it to add your fields. This will not affect any other cores.
The other files are needed too: Solr uses them for things like stop words when searching text, currency details, etc.
These details were mentioned in the documentation for previous versions, and that's how I figured it out, the hard way. If you delete the stopwords and other files, you will eventually get error messages saying they are missing.
Hope this information helps. Happy learning!
To add to this: in your solrconfig.xml, find the schemaFactory tag and replace the "ManagedIndexSchemaFactory" definition with one whose class is "ClassicIndexSchemaFactory" (settings such as mutable belong only to the managed factory). This makes Solr use schema.xml in the core's conf directory instead of generating the managed-schema file.
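If you go the classic route with an existing core, the managed-schema file also has to become schema.xml. The Python sketch below does both steps on a core's conf directory while Solr is stopped: it renames managed-schema to schema.xml and replaces the first schemaFactory element outside XML comments with the classic one (or adds one before </config> if none exists), keeping a backup of solrconfig.xml. The regex-based editing is a simplification that assumes a conventionally formatted solrconfig.xml; restart or reload the core afterwards.

```python
import re
import shutil
from pathlib import Path

CLASSIC_FACTORY = '<schemaFactory class="ClassicIndexSchemaFactory"/>'
COMMENT = re.compile(r"(<!--.*?-->)", re.DOTALL)
# Matches <schemaFactory .../> as well as <schemaFactory ...>...</schemaFactory>.
FACTORY = re.compile(r"<schemaFactory\b[^>]*?(?:/>|>.*?</schemaFactory>)", re.DOTALL)


def replace_schema_factory(config: str) -> str:
    """Swap the first schemaFactory element found outside XML comments for the classic one."""
    parts = COMMENT.split(config)  # even indexes hold markup, odd indexes hold comments
    for i in range(0, len(parts), 2):
        new_part, count = FACTORY.subn(CLASSIC_FACTORY, parts[i], count=1)
        if count:
            parts[i] = new_part
            return "".join(parts)
    # No explicit schemaFactory: add one just before the closing </config> tag.
    if "</config>" not in config:
        raise ValueError("solrconfig.xml has no closing </config> tag")
    head, sep, tail = config.rpartition("</config>")
    return f"{head}  {CLASSIC_FACTORY}\n{sep}{tail}"


def switch_to_classic_schema(conf_dir: Path) -> None:
    """Rename managed-schema to schema.xml and point solrconfig.xml at ClassicIndexSchemaFactory.

    Run this only while Solr is stopped.
    """
    managed = conf_dir / "managed-schema"
    classic = conf_dir / "schema.xml"
    if managed.exists() and not classic.exists():
        managed.rename(classic)

    config_path = conf_dir / "solrconfig.xml"
    shutil.copyfile(config_path, conf_dir / "solrconfig.xml.bak")  # keep a backup
    config = config_path.read_text(encoding="utf-8")
    config_path.write_text(replace_schema_factory(config), encoding="utf-8")
```

For the core from the question, that would be switch_to_classic_schema(Path("server/solr/test/conf")). Skipping comments matters because Solr's sample solrconfig.xml files can mention schemaFactory inside explanatory comments.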
