Solr tries to upgrade my schema to managed

This is on Solr 7.1.0. I have a classic schema, with the proper line in solrconfig.xml:
<schemaFactory class="ClassicIndexSchemaFactory"/>
However, I still get this line in the log:
ManagedIndexSchemaFactory The schema has been upgraded to managed, but the non-managed schema schema.xml is still loadable. PLEASE REMOVE THIS FILE.
And when I inspect that core's schema, it is the generic schema, not the one defined in my schema.xml.

This sounds like the core you think is loading is not the one that is actually loading, because:
The upgraded schema should have been the same as the original one, but you are seeing a different one.
You don't see managed-schema in the directory where you expect it to be.
You keep getting the message.
So, have a look at the overview page of that core and check whether the instance directory points to where you expect.
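If you prefer to check this outside the admin UI, the CoreAdmin STATUS action reports the same instance directory. Here is a minimal Python sketch, assuming Solr is reachable at http://localhost:8983/solr and the core is called "mycore" (both are placeholders, and the sample response is trimmed and made up):

```python
import json
from urllib.parse import urlencode


def core_status_url(base_url, core):
    """Build the CoreAdmin STATUS URL for a single core."""
    params = urlencode({"action": "STATUS", "core": core, "wt": "json"})
    return f"{base_url.rstrip('/')}/admin/cores?{params}"


def instance_dir(status_response, core):
    """Return the instanceDir reported for `core`, or None if it is not listed."""
    return status_response.get("status", {}).get(core, {}).get("instanceDir")


# Trimmed, made-up example of a STATUS response (paths are hypothetical).
sample = json.loads("""
{"status": {"mycore": {"name": "mycore",
                       "instanceDir": "/var/solr/data/mycore",
                       "config": "solrconfig.xml",
                       "schema": "schema.xml"}}}
""")

print(core_status_url("http://localhost:8983/solr", "mycore"))
print(instance_dir(sample, "mycore"))
```

Fetch the printed URL with a browser or curl and compare the reported instanceDir with the directory that holds the solrconfig.xml and schema.xml you have been editing.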

Related

Managed-schema.xml file is overwritten when I populate Solr Managed Schema from Sitecore

In my solr managed-schema.xml file I added the following:
<copyField source="computedtitle_t" dest="computedtitlecopy_t" />
When I run populate-schema from Sitecore, the managed-schema file is overwritten and my changes are lost.
Is there a patch file on the Sitecore side where I can add this and to what section?
Yes, Sitecore manages the Solr schema for you through the populate-schema function in the Control Panel. This is done via the SchemaPopulateHelper. You can implement your own class, implementing the ISchemaPopulateHelper interface and register it in the config.
A while back, I wrote a generic implementation of this where you can put your entire managed schema into the Sitecore config instead. This also lets you use the Sitecore config file patch feature, so that your schema changes can go along with other Sitecore configs if needed.
You can read more about it here: https://mikael.com/2020/10/dealing-with-solr-managed-schema-through-sitecore-config-files/
Here is some more general information about how Sitecore works with Solr and managed schema: https://mikael.com/2018/01/working-with-content-search-and-solr-in-sitecore-9/
You can use the code here as a starting point: https://github.com/mikaelnet/sitecore-solr-config
Please note that there was a small interface change in Sitecore 9.3 (I think), so the sample code may need some changes for it to work. Also, make sure you start with a managed schema that is equal to the one that's provided with the Sitecore version you're using. There may be a few changes in the default schema between the versions.

How does Solr's schema-less feature work? How to revert it to classic schema?

I just found that Solr 5 doesn't require a schema file to be predefined and that it generates the schema based on the documents being indexed. How does this work in the background?
Is it a good practice or not? Is there any way to disable it?
The schemaless feature has been in Solr since version 4.3, but it may only have become stable more recently, as a concurrency issue with it was fixed in 4.10.
It is built on what Solr calls managed schema. When you configure Solr to use managed schema, Solr uses a special UpdateRequestProcessor to intercept document indexing requests and guess the field types.
Solr starts with your schema.xml file and creates a new file called, by default, managed-schema to store all the inferred schema information. This file is automatically overwritten by Solr as it detects changes to the schema.
You should then use the Schema API if you want to make changes to the Schema. See also the Schemaless Mode documentation.
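As a rough illustration of a Schema API change, here is a Python sketch that builds (but does not send) an add-field request. It assumes a Solr version whose Schema API accepts modification commands, a core named "mycore", and a field type called "string" in your schema; the field name "title_s" is made up for the example:

```python
import json
from urllib.request import Request


def schema_command(base_url, core, command, body):
    """Build (but do not send) a Schema API POST request for one command."""
    payload = json.dumps({command: body}).encode("utf-8")
    return Request(
        f"{base_url.rstrip('/')}/{core}/schema",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical field definition; adjust name and type to your own schema.
req = schema_command(
    "http://localhost:8983/solr",
    "mycore",
    "add-field",
    {"name": "title_s", "type": "string", "stored": True},
)

print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually apply the change against a running Solr:
# urllib.request.urlopen(req)
```

Solr then rewrites managed-schema itself, so there is no need to edit the file by hand.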
How to change Solr managed schema to classic schema
Stop Solr: $ bin/solr stop
Go to server/solr/mycore/conf, where "mycore" is the name of your core/collection.
Edit solrconfig.xml:
search for <schemaFactory class="ManagedIndexSchemaFactory"> and comment out the whole element
search for <schemaFactory class="ClassicIndexSchemaFactory"/> and uncomment it
search for the <initParams> element that refers to add-unknown-fields-to-the-schema and comment out the whole <initParams>...</initParams>
Rename managed-schema to schema.xml and you are done.
You can now start Solr again: $ bin/solr start, go to http://localhost:8983/solr/#/mycore/documents and check that Solr now refuses to index a document with a new field not yet specified in schema.xml.
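Before restarting, you may want to double-check which schemaFactory is actually active in solrconfig.xml, since a leftover or still-commented element is an easy mistake. A small Python sketch, assuming the file parses as plain XML and that schemaFactory sits directly under the root element (the sample config below is a stripped-down, made-up stand-in for a real one):

```python
import xml.etree.ElementTree as ET


def active_schema_factory(solrconfig_xml):
    """Return the class of the uncommented <schemaFactory> element, or None.

    ElementTree drops XML comments while parsing, so a commented-out
    element is never returned here.
    """
    root = ET.fromstring(solrconfig_xml)
    element = root.find("schemaFactory")
    return None if element is None else element.get("class")


# Stripped-down stand-in for conf/solrconfig.xml after the edit.
sample_config = """
<config>
  <!-- <schemaFactory class="ManagedIndexSchemaFactory"/> -->
  <schemaFactory class="ClassicIndexSchemaFactory"/>
</config>
"""

print(active_schema_factory(sample_config))
```

To run it on the real file, pass in the text of server/solr/mycore/conf/solrconfig.xml instead of the sample string. A result of None means no schemaFactory element is declared at all, in which case Solr falls back to its default for that version.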
Is it a good practice? When to use it?
It depends on what you want. If you want to enforce a specific document structure (e.g. to make sure that all docs are "well-formed" according to your definition), then you want to use classic schema management.
If on the other hand you don't know upfront what the doc structure is then you might want to use the schema-less feature.
Limits
While it is called schema-less, there are limits to the kinds of structures that you can index. This is true both for Solr and Elasticsearch, by the way. For example, if you first index this doc:
{"name":"John Doe"}
then you will get an error if you try to index a doc like this next:
{
  "name": {
    "first": "Daniel",
    "second": "Dennett"
  }
}
That is because in the first case the field "name" was of type string, while in the second case it is an object.
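To make the conflict concrete, here is a toy Python model of "the first value seen fixes the field type". This is only an illustration of the idea, not how Solr or Elasticsearch implement type guessing; the class and function names are invented for the example:

```python
def json_kind(value):
    """Classify a decoded JSON value into a coarse kind."""
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return "null"


class ToySchema:
    """Remembers the kind of the first value seen for each field."""

    def __init__(self):
        self.fields = {}

    def index(self, doc):
        for name, value in doc.items():
            kind = json_kind(value)
            # setdefault stores the kind on first sight, otherwise returns the old one
            known = self.fields.setdefault(name, kind)
            if known != kind:
                raise ValueError(
                    f"field {name!r} was first indexed as {known}, got {kind}"
                )


schema = ToySchema()
schema.index({"name": "John Doe"})
try:
    schema.index({"name": {"first": "Daniel", "second": "Dennett"}})
except ValueError as err:
    print("rejected:", err)
```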
If you would like to use indexing which goes beyond these limitations then you could use SIREn - it is an open source semi-structured information retrieval engine which is implemented as a plugin for both Solr and Elasticsearch. (Disclaimer: I worked for the company that develops SIREn)
This is the so-called schemaless mode in Solr. I don't know the internal details of how it is implemented.
bin/solr start -e schemaless
The snippet above will start Solr in schemaless mode. If you don't do that, Solr will work as usual.
For more information on schemaless, take a look here - https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode

Disappearing cores in Solr

I am new to Solr.
I have created two cores from the admin page, let's call them "books" and "libraries", and imported some data there. Everything works without a hitch until I restart the server. When I do so, one of these cores disappears, and the logging screen in the admin page contains:
SEVERE CoreContainer null:java.lang.NoClassDefFoundError: net/arnx/jsonic/JSONException
SEVERE SolrCore REFCOUNT ERROR: unreferenced org.apache.solr.core.SolrCore#454055ac (papers) has a reference count of 1
I was testing my query in the admin interface; when I refreshed it, the "libraries" core was gone, even though I could normally query it just a minute earlier. The contents of solr.xml are intact. Even if I restart Tomcat, it remains gone.
Additionally, I was trying to build a query similar to this: "Find books matching 'war peace' in libraries in Atlanta or New York". So given cores "books" and "libraries", I would issue "books" the following query (which might be wrong, if it is please correct me):
(title:(war peace) blurb:(war peace))
AND _query_:"{!join
fromIndex=libraries from=libraryid to=libraryid
v='city:(new york) city:(atlanta)'}"
When I do so, the query fails and the "libraries" core disappears, with the above symptoms. If I re-add it, I can continue working (as long as I don't restart the server or issue another join query).
I am using Solr 4.0; if anyone has a clue what is happening, I would be very grateful. I could not find out anything about the meaning of the error message, so if anyone could suggest where to look for that, or how to go about debugging this, it would be really great. I can't even find where the log file itself is located...
I would avoid the Debian package, which may be misconfigured and quirky. It also contains (a very early build of?) Solr 4.0, which, as the first release in a new major version, may itself have lingering issues. The package maintainer may not have incorporated the latest and safest Solr release into the package.
A better way is to download Solr 4.1 and set it up yourself with Tomcat or another servlet container.
In case you are looking to install and configure Solr 4.0, you can follow the installation procedure from here
Update the solr config for the cores to be persistent.
In your solr.xml, update <solr> or <solr persistent="false"> to <solr persistent="true">

Custom version number for Solr schema

Is it possible to store a custom version number somewhere in the Solr schema so that it could be retrieved by the client in order to verify that it is connected to a compatible Solr instance?
When I'm deploying a new version of the application to QA or production I need to be sure that all the data sources (Solr, RDBMS, etc) my app is connected to have been properly updated/migrated. So I want to perform some validation at the startup. It's easy with the database (e.g. storing current schema version in the VERSION table), but it's less obvious where to store the version information for the Solr schema.
The SystemInfoHandler will provide the version along with other information about the Solr instance. In later versions of Solr (3.x & 4.x), this is already enabled as part of the admin requestHandler.
You can access the information via http://localhost:8983/solr/admin/system from the example site distributed with Solr. Modify the url accordingly for your Solr configuration.
Note: If you are running an older version of Solr this can be enabled by adding the following line to the solrconfig.xml file.
<requestHandler name="/admin/system" class="solr.admin.SystemInfoHandler" />
Update:
The specific scenario of knowing when the schema has changed (i.e. versioning the schema) can be handled by updating the name attribute of the root node every time the schema file is modified. This name value will then be available in the SystemInfoHandler response.
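A startup check on the client side could then look roughly like this Python sketch. It assumes the JSON form of the SystemInfoHandler response carries the schema name under "core" then "schema", and that you encode the version in the name, e.g. "myapp-schema-v7" (a made-up convention). Verify the response layout against your own Solr version:

```python
import json


def schema_name(system_info):
    """Extract the schema name from a decoded SystemInfoHandler response."""
    return system_info.get("core", {}).get("schema")


def is_compatible(system_info, expected_name):
    """True when Solr reports exactly the schema name the application expects."""
    return schema_name(system_info) == expected_name


# Trimmed, made-up response; a real one contains many more sections.
sample = json.loads('{"core": {"schema": "myapp-schema-v7"}}')

print(is_compatible(sample, "myapp-schema-v7"))
print(is_compatible(sample, "myapp-schema-v8"))
```

In a real application you would fetch the response from the /admin/system URL shown above (with wt=json) and fail fast at startup if the check returns False.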

How to start work on Solrnet

I have installed Apache Tomcat 6 and configured Solr 1.4. The Solr service is now running successfully.
Solr 1.4 has two important files, solrconfig.xml and schema.xml, for configuring a C# application to work with the Solr service.
But whenever I change the schema.xml file according to the SQL table's fields, the Solr service stops.
So please tell me step by step how to configure solrconfig.xml and schema.xml.
Thanks,
Unfortunately, the question isn't clear or is too broad, so I can only give general advice and point to documentation.
But whenever I change the schema.xml file according to the SQL table's fields, the Solr service stops.
Yes, after changing your schema you either have to restart your Solr instance, or, if you are using Cores (recommended), you have to reload the changed core.
tell me step by step how to configure solrconfig.xml and schema.xml
Just change the files according to your needs. The Solr package contains numerous examples, all thoroughly commented. Documentation on solrconfig.xml is here. Documentation on schema.xml is here. After making any change in solrconfig.xml you have to restart your Solr instance.
Also, when making changes to the schema, make sure you reflect those changes in your SolrNet mapping.
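For the core reload mentioned above, the CoreAdmin handler accepts a RELOAD action. A minimal Python sketch that only builds the URL, assuming the handler is exposed at /admin/cores under a base URL of http://localhost:8080/solr and that the core is named "core0" (all placeholders for your Tomcat setup):

```python
from urllib.parse import urlencode


def core_reload_url(base_url, core):
    """Build the CoreAdmin RELOAD URL for one core."""
    params = urlencode({"action": "RELOAD", "core": core})
    return f"{base_url.rstrip('/')}/admin/cores?{params}"


print(core_reload_url("http://localhost:8080/solr", "core0"))
```

Requesting that URL (from a browser, curl, or your deployment script) after editing schema.xml avoids restarting the whole servlet container.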
