managed-schema and other fancy stuff. What is it?

I created a new core in Standalone mode (please correct me if I'm saying something wrong). I did it like this (following the Apache Solr Reference Guide 5.2):
$ bin/solr create -c test
I hoped that everything (in fact, almost nothing) I did would go hand in hand with the Reference Guide. On pages 13 and 14 the Guide clearly describes what the Solr home directory should look like:
solr.xml
core_name1/
    core.properties
    conf/
        solrconfig.xml
        schema.xml
    data/
...
However, when I go to ./server/solr/test/conf I see there 8 files and one directory:
currency.xml
lang/
params.json
solrconfig.xml
synonyms.txt
elevate.xml
managed-schema
protwords.txt
stopwords.txt
Phew... terrible-looking stuff that is not touched at all in the first chapter of the Reference Guide. I do not understand what I did wrong and what made the home directory of a new core look so ugly. I did not mean to create any currency.xml or other fancy files. But worst of all, I cannot find any schema file, which, judging by the Reference Guide, should be the most important one. I guess that now I should use managed-schema instead, but when I open it I see a really dreadful message:
<!-- Solr managed schema - automatically generated - DO NOT EDIT -->
Bump! A newbie like me creates the first core of his life, hopes to see a lovely schema file, but finds something that he cannot even edit. So, my questions are:
How do I actually create a core and get my hands dirty editing a schema.xml file?
Is it possible to edit the managed schema somehow, or not?
Is there any reference guide that can be followed line by line and produce the expected results?

The schema.xml is present in the solr\configsets\basic_configs\conf location. You can copy it into your core's /conf directory and modify it to add your fields. This will not affect any other cores.
The other files are needed for how Solr handles stopwords when searching strings, currency details, and so on.
In the documentation of previous versions these details were mentioned, and that's how I figured it out the hard way. If you delete the stopwords and other files, you will eventually see error messages saying that they are missing.
Hope this information helps. Happy learning!
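The copy step above can be sketched as a one-liner. The configsets path and the core name "test" are taken from the question and answer; adjust both to your own install layout.

```shell
# Sketch: copy the bundled example schema into the new core's conf/ directory.
# Run from the Solr install root; "test" is the core created earlier and the
# configsets path is where Solr 5.x keeps its sample schema.xml.
SRC=server/solr/configsets/basic_configs/conf/schema.xml
DST=server/solr/test/conf/
if [ -f "$SRC" ]; then
  cp "$SRC" "$DST"
else
  echo "schema.xml not found at $SRC - adjust to your install layout" >&2
fi
```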

To add onto this, in your solrconfig.xml you can find the schemaFactory tag and change the class attribute from "ManagedIndexSchemaFactory" to "ClassicIndexSchemaFactory". This will make Solr use schema.xml in the core's conf directory instead of generating the managed-schema file.
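In solrconfig.xml the change looks like this (a sketch of the before/after states):

```xml
<!-- solrconfig.xml: switch from the managed schema to the classic schema.xml -->
<!-- before: <schemaFactory class="ManagedIndexSchemaFactory"> ... </schemaFactory> -->
<schemaFactory class="ClassicIndexSchemaFactory"/>
```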

Related

Where is Solr core schema.xml?

I have started Solr and created a core using the following commands. I would like to modify the schema.xml file but cannot find it anywhere. Do the following commands create a schema.xml file?
bin\solr.cmd start
bin\solr.cmd create -c test
If you're using the managed schema (which you are by default when creating a core), the schema is meant to be changed through the Schema API.
However, if you stop Solr first, you can safely make edits to the managed-schema file, even if it tells you that you shouldn't hand edit it. Just make sure that nothing is running and relying on the state read from the file earlier - otherwise it'll be overwritten as the current state of the schema is written to the file.
If you want to use the classic schema.xml file, you can change your configuration to use the ClassicIndexSchemaFactory instead of the ManagedIndexSchemaFactory.
You can change this definition in solrconfig.xml by adding
<schemaFactory class="ClassicIndexSchemaFactory" />
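If you keep the managed schema instead, fields can be added over HTTP through the Schema API mentioned above. A sketch: the core name "test" matches the question, and the field name and type are illustrative.

```shell
# Sketch: add a field over HTTP with the Schema API. The core name ("test")
# and the field definition are illustrative - adjust both to your setup.
SOLR=http://localhost:8983/solr/test
PAYLOAD='{"add-field": {"name": "title", "type": "text_general", "stored": true}}'

# Only send the request when a Solr instance is actually reachable:
if curl -sf -o /dev/null "$SOLR/admin/ping" 2>/dev/null; then
  curl -X POST -H 'Content-Type: application/json' --data "$PAYLOAD" "$SOLR/schema"
fi
```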

Highlighting Solr search results with bin/post and managed schema

I've got Solr 6.6.1 installed. I run bin/post to fetch and index some documents into a new core. I'd like to add a text field and highlight on that field. I notice that in server/solr/myCore/conf there is a file, managed-schema, with a warning that tells me not to edit the file.
What's the supported way to use bin/post AND enable highlighting on a text field?
Solr implicitly uses a ManagedIndexSchemaFactory, which is by default "mutable" and keeps schema information in a managed-schema file.
You have several choices:
Go back to <schemaFactory class="ClassicIndexSchemaFactory"/>, so you will be able to change the schema file manually.
Stay with the managed schema and use the Schema API's modification operations via HTTP to add the new field, which you will use for highlighting.
I would recommend sticking with #2, but it's totally up to you. The official documentation will help you choose which schema options you need for your text fields to get the best out of highlighting.
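Once the field exists, a highlighting request is a matter of query parameters. A sketch: the core name "myCore" comes from the question, while the field name "content" is an assumption for illustration (it must be stored for highlighting to work).

```shell
# Sketch: a highlighting query against the core from the question ("myCore").
# The highlighted field (here "content") is an assumption and must be stored.
SOLR=http://localhost:8983/solr/myCore
QUERY='select?q=content:solr&hl=true&hl.fl=content'

# Only send the request when a Solr instance is actually reachable:
if curl -sf -o /dev/null "$SOLR/admin/ping" 2>/dev/null; then
  curl "$SOLR/$QUERY"
fi
```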

automating the solr index mechanism

I have indexed a few PDF files in Solr, using the curl command for now. My requirement is that when files are pushed to a particular directory, those files must be indexed automatically; no manual indexing should be done. When files come in, they must be indexed. Is there any way to achieve this? I am new to Solr. Please give brief suggestions. Thanks in advance.
I can see two options:
Create a cron job (or something like that)
Try to use the DataImportHandler's scheduler
I would probably lean more towards the cron-like solution, option 1.
That way, after a file is indexed it can be moved to a separate folder. This is a very basic solution; using a proper queueing system would give you the option to process many files at once.
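The cron approach above can be sketched as a small script. The directory names and the core name "test" are assumptions; bin/post is Solr's bundled posting tool.

```shell
#!/bin/sh
# Sketch of a cron-driven indexer: index each new PDF with bin/post, then move
# it aside so it is not indexed twice. Directory names and the core name
# ("test") are assumptions; schedule with e.g.  */5 * * * *  in crontab.
INCOMING=incoming
DONE=indexed

mkdir -p "$INCOMING" "$DONE"
for f in "$INCOMING"/*.pdf; do
  [ -e "$f" ] || continue                      # nothing waiting to be indexed
  if bin/post -c test "$f" 2>/dev/null; then   # index the file...
    mv "$f" "$DONE"/                           # ...and archive it on success
  fi
done
```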

solr/browse gives page not found error.

How do I make the browse page load? I have added the handler as given on this page:
https://wiki.apache.org/solr/VelocityResponseWriter
It's still not working. Can anyone brief me on this? Thanks in advance.
A couple of things to check:
Have you restarted Solr?
Is the core you are trying to 'browse' a default core? If not, you need to include the core name in the URL, e.g. /solr/collection1/browse
Are your library statements in solrconfig.xml pointing at the right Velocity jar? Use absolute paths unless you are very sure that you know what your base directory is for the relative paths.
Are you getting any errors in the server logs?
If all fails, start comparing what you have with the collection1 example in the Solr distribution. It works there, so you can compare the relevant entries nearly line by line, and even experiment with collection1 to make it more like your failing example.
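The library statements mentioned in the checklist look roughly like this in solrconfig.xml. The /opt/solr install path is an assumption for illustration; substitute your own absolute path.

```xml
<!-- solrconfig.xml: library statements for the Velocity response writer.
     The /opt/solr install path is an assumption - use your own absolute path. -->
<lib dir="/opt/solr/contrib/velocity/lib" regex=".*\.jar" />
<lib dir="/opt/solr/dist/" regex="solr-velocity-\d.*\.jar" />
```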

How do I set up a new Solr core using data from an existing core?

I saw that there was a similar question asked 3 years ago, but I figure it's OK to duplicate, as 1) the existing question is 3 years old and 2) I have different problems and a different version of Solr.
Here's the story. I was given a copy of the "Index" directory of an existing Solr core by a collaborator. I am trying to set up my own core locally using that index. The existing core was from a Solr 4.1.0 installation. (I have tried, and failed, to set up both Solr 4.3.1 and Solr 4.1.0.) I'm running Solr with Jetty.
What's the problem, you ask? Well, I replace the config files (schema.xml and solrconfig.xml) in the default example core with the ones my collaborator gave me. And then I run Jetty. This creates a new Index folder. I delete the contents of that Index folder and copy in the contents of the Index folder I was given.
The result is that Solr gives me an error indicating that "segments" files cannot be found. So I noticed that there are two files (segments.gen and segments_1) that are created with the initial Index folder. I experimented with leaving those in the Index folder but replacing everything else. Now Solr seems to be working (the browser interface is up) but it reports "Num docs: 0" and a *:* query gives me 0 results.
Anyone have any ideas? I'm happy to provide more info. Thanks in advance.
You have to use the segments.gen and segments_1 from the original index. Ask your collaborator to give you those files as well. But since you mentioned that your collaborator gave you a copy of the index folder, you must already have those files.
Note that it will not necessarily be segments_1 in your copy of the original index; it can be any segments_N. Whatever segments_ file is in the original copy, copy that to the new index and restart Jetty.
segments.gen records the current generation (the _N in segments_N) in the index, as a fallback in case directory listing of the files fails to locate the segments_N file (eg on filesystems, like NFS, where the directory listing may come from a stale cache)
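Rather than juggling individual files, the whole index can be swapped in at once, which guarantees the segments files match the rest. A sketch, with both paths as placeholder assumptions; stop Solr/Jetty before doing this.

```shell
# Sketch: replace the new core's index with the collaborator's copy wholesale,
# segments files included. Stop Solr/Jetty first; both paths are assumptions.
SRC=/path/to/collaborators/index
DST=example/solr/collection1/data/index
if [ -d "$SRC" ]; then
  rm -rf "$DST"          # remove the freshly generated index entirely
  cp -r "$SRC" "$DST"    # copy everything, including segments.gen/segments_N
fi
```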
