Solr 5 custom field and filter

I'm new to Solr. After trying the Solr 5 client, I now want to try the Solr 5 source code.
My questions are:
1. Can I create a custom field for my own core in Solr 5 by editing schema.xml? If so, please tell me where it is located (it wasn't in my conf folder; should I create a new one?).
2. Is there any method for adding a custom field other than the Schema API?
3. Every time I create a new core and then index files, my conf folder contains only currency.xml, elevate.xml, managed-schema (the generated schema), params.json, protwords.txt, solrconfig.xml, stopwords.txt and synonyms.txt; there is no schema.xml. Did I miss something?
4. Is there a simple tutorial that explains custom filters in Solr 5?
I'd really appreciate your answers. Thanks.

When you create a core in Solr 5, schemaless mode is active by default. In this mode the schema is kept in managed-schema instead of schema.xml, and all changes are expected to go through the Schema API. If you want to manage the schema yourself, you can rename managed-schema to schema.xml and modify solrconfig.xml so that it no longer uses the managed schema. In solrconfig.xml, replace
<schemaFactory class="ManagedIndexSchemaFactory">
<bool name="mutable">true</bool>
<str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>
with
<schemaFactory class="ClassicIndexSchemaFactory"/>
Solr will now use the schema.xml that you maintain yourself.
The only mandatory configuration files are solrconfig.xml and, in your case, schema.xml. The other files are needed only if you configure filters that use them. If you start from the example schema.xml, you will probably need all of these files, but it is worth trimming the configuration down to just the fields and field types you actually expect to use.
To learn more about filters, tokenizers and analyzers, take a look at https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters.
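If you would rather keep the managed schema and add the field through the Schema API, the request could look roughly like the Python sketch below. It only builds the request and does not send it. The base URL, the core name mycore, the field name my_custom_field and the field type text_general are assumptions for illustration, so adjust them to your setup.

```python
import json
from urllib import request


def build_add_field_request(base_url, core, name, field_type, stored=True, indexed=True):
    """Build (but do not send) a Schema API request that adds one field."""
    # "add-field" is the Schema API command for defining a new field.
    payload = {
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": stored,
            "indexed": indexed,
        }
    }
    return request.Request(
        url=f"{base_url}/solr/{core}/schema",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values: core "mycore", field "my_custom_field", type "text_general".
req = build_add_field_request(
    "http://localhost:8983", "mycore", "my_custom_field", "text_general"
)
print(req.full_url)
print(req.data.decode("utf-8"))
# To send it to a running Solr instance: request.urlopen(req)
```

Note that this route only works while the core still uses the managed (mutable) schema; after switching to ClassicIndexSchemaFactory you edit schema.xml by hand instead.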

1. Yes.
2. Manually editing the schema and using the Schema API are the only two ways, as far as I know.
3. How exactly are you creating this core? Are you using install_solr_service.sh? Assuming it's a Linux system, check the /var/solr/configs folder. That's where the config files are if you ran that script.
4. Yes, of course :). See https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide. There is a "Getting Started" section, which should answer all your questions, including where configs are stored and how to use them.
Happy searching!

Related

Managed-schema.xml file is overwritten when I populate Solr Managed Schema from Sitecore

In my solr managed-schema.xml file I added the following:
<copyField source="computedtitle_t" dest="computedtitlecopy_t" />
When I run populate-schema from Sitecore, the managed-schema file is overwritten and my changes are lost.
Is there a patch file on the Sitecore side where I can add this, and if so, in which section?
Yes, Sitecore manages the Solr schema for you through the populate-schema function in the Control Panel. This is done via the SchemaPopulateHelper. You can write your own class that implements the ISchemaPopulateHelper interface and register it in the config.
A while back, I wrote a generic implementation of this where you can put your entire managed schema into the Sitecore config instead. This also lets you use the Sitecore config file patch feature, so that your schema changes can go along with other Sitecore configs if needed.
You can read more about it here: https://mikael.com/2020/10/dealing-with-solr-managed-schema-through-sitecore-config-files/
Here is some more general information about how Sitecore works with Solr and the managed schema: https://mikael.com/2018/01/working-with-content-search-and-solr-in-sitecore-9/
You can use the code here as a starting point: https://github.com/mikaelnet/sitecore-solr-config
Please note that there was a small interface change in Sitecore 9.3 (I think), so the sample code may need some changes for it to work. Also, make sure you start with a managed schema that is equal to the one that's provided with the Sitecore version you're using. There may be a few changes in the default schema between the versions.

solr tries to upgrade my schema to managed

This is on solr 7.1.0. I have a classic schema, with the proper line in solrconfig.xml:
<schemaFactory class="ClassicIndexSchemaFactory"/>
However I still get this line in the log:
ManagedIndexSchemaFactory The schema has been upgraded to managed, but the non-managed schema schema.xml is still loadable. PLEASE REMOVE THIS FILE.
And when I inspect that core's schema it's the generic schema, not the one defined in my schema.xml.
This sounds like the core you think is loading is not the one that is actually loading, because:
An upgraded schema should be the same as the original one, but you are seeing a different one.
You don't see managed-schema in the directory where you expect it to be.
You keep getting the message.
So have a look at the overview page of that core and check whether the instance directory points to where you expect it to be.

Solr no content field = no highlighting

I want to add highlighting to my search results from Solr. My problem is that the query response doesn't contain any content field.
Search seems to work, but I guess that when I create the index I need to tell Solr to store the text or something.
I am running Solr on Windows and index my files with:
java -Dc=aceapps -Dauto=yes -Ddata=files -Drecursive=yes -Dfiletypes=pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html -jar example/exampledocs/post.jar "\user\PowerBI"
It's hard to answer without seeing your schema.xml or solrconfig.xml.
Have a look at the wiki page for the standard highlighter:
https://cwiki.apache.org/confluence/display/solr/Standard+Highlighter
Solr also comes with an example core called techproducts, which contains a good example of highlighting. Look at the request handler and highlighter in its solrconfig.xml and at the fields they are applied to.
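As a rough illustration of what a highlighting request looks like, the Python sketch below builds a select URL with the standard hl and hl.fl parameters. The core name aceapps comes from the question; the field name content and the search term report are assumptions, and the field has to be stored in the schema for highlighting to return snippets.

```python
from urllib.parse import urlencode


def build_highlight_query(base_url, core, text, field):
    """Build a /select URL that asks Solr to highlight matches in one field."""
    params = {
        "q": f"{field}:{text}",  # search in the field we want highlighted
        "hl": "true",            # switch highlighting on
        "hl.fl": field,          # field(s) to generate snippets for
        "wt": "json",
    }
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


# "content" and "report" are hypothetical; use a stored field from your own schema.
url = build_highlight_query("http://localhost:8983", "aceapps", "report", "content")
print(url)
```

If the response has a highlighting section with empty entries, the usual suspect is that the field is indexed but not stored.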

How does Solr's schema-less feature work? How to revert it to classic schema?

I just found out that Solr 5 doesn't require a schema file to be predefined; it generates the schema based on the documents being indexed. How does this work in the background?
Is it good practice or not? Is there any way to disable it?
The schemaless feature has been in Solr since version 4.3. It may only have become reasonably stable more recently, though, as a concurrency issue with it was fixed in 4.10.
It relies on what Solr calls a managed schema. When you configure Solr this way, it uses a special UpdateRequestProcessor to intercept document indexing requests and guess the field types.
Solr starts with your schema.xml file and creates a new file called, by default, managed-schema to store all the inferred schema information. This file is automatically overwritten by Solr as it detects changes to the schema.
You should then use the Schema API if you want to make changes to the Schema. See also the Schemaless Mode documentation.
How to change Solr managed schema to classic schema
1. Stop Solr: $ bin/solr stop
2. Go to server/solr/mycore/conf, where "mycore" is the name of your core/collection.
3. Edit solrconfig.xml:
   a. search for <schemaFactory class="ManagedIndexSchemaFactory"> and comment out the whole element
   b. search for <schemaFactory class="ClassicIndexSchemaFactory"/> and uncomment it
   c. search for the <initParams> element that refers to add-unknown-fields-to-the-schema and comment out the whole <initParams>...</initParams>
4. Rename managed-schema to schema.xml and you are done.
You can now start Solr again with $ bin/solr start, go to http://localhost:8983/solr/#/mycore/documents and check that Solr now refuses to index a document with a new field not yet specified in schema.xml.
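The solrconfig.xml edits above can also be scripted. The following Python sketch is one possible way to do it, under a few assumptions: it removes the elements instead of commenting them out, it expects schemaFactory and initParams to be direct children of the root element, and because xml.etree.ElementTree drops XML comments when parsing, it is best run on a copy of the file. The sample config is a made-up minimal fragment, not a complete solrconfig.xml.

```python
import xml.etree.ElementTree as ET


def switch_to_classic_schema(solrconfig_xml: str) -> str:
    """Return solrconfig.xml content that uses ClassicIndexSchemaFactory."""
    root = ET.fromstring(solrconfig_xml)

    # Drop any existing schemaFactory (managed or classic) and add the classic one.
    for factory in root.findall("schemaFactory"):
        root.remove(factory)
    ET.SubElement(root, "schemaFactory", {"class": "ClassicIndexSchemaFactory"})

    # Drop initParams blocks that refer to the add-unknown-fields-to-the-schema chain.
    for init in root.findall("initParams"):
        if any("add-unknown-fields-to-the-schema" in (el.text or "") for el in init.iter()):
            root.remove(init)

    return ET.tostring(root, encoding="unicode")


# Made-up minimal fragment for demonstration purposes only.
sample = """<config>
  <schemaFactory class="ManagedIndexSchemaFactory">
    <bool name="mutable">true</bool>
    <str name="managedSchemaResourceName">managed-schema</str>
  </schemaFactory>
  <initParams path="/update/**">
    <lst name="defaults">
      <str name="update.chain">add-unknown-fields-to-the-schema</str>
    </lst>
  </initParams>
</config>"""

result = switch_to_classic_schema(sample)
print(result)
```

Renaming managed-schema to schema.xml is still a separate step, and Solr should be stopped while the files are changed.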
Is it a good practice? When to use it?
It depends on what you want. If you want to enforce a specific document structure (e.g. to make sure that all docs are "well-formed" according to your definition), then you want to use classic schema management.
If, on the other hand, you don't know the document structure up front, then you might want to use the schema-less feature.
Limits
While it is called schema-less, there are limits to the kinds of structures that you can index. This is true both for Solr and Elasticsearch, by the way. For example, if you first index this doc:
{"name":"John Doe"}
then you will get an error if you try to index a doc like this next:
{
  "name": {
    "first": "Daniel",
    "second": "Dennett"
  }
}
That is because in the first case the field "name" was of type string, while in the second case it is an object.
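To make the conflict concrete, here is a toy Python model of the idea. It is not how Solr or Elasticsearch implement type guessing; the ToySchema class and infer_kind function are invented for illustration. It simply remembers the kind of value first seen for each field and rejects later documents that disagree.

```python
def infer_kind(value):
    """Very rough stand-in for field type guessing."""
    if isinstance(value, dict):
        return "object"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    return "string"


class ToySchema:
    """Hypothetical schema that locks in the first kind seen per field."""

    def __init__(self):
        self.fields = {}

    def index(self, doc):
        for name, value in doc.items():
            kind = infer_kind(value)
            # The first document to use a field decides its kind.
            known = self.fields.setdefault(name, kind)
            if known != kind:
                raise ValueError(f"field '{name}' is already {known}, got {kind}")


schema = ToySchema()
schema.index({"name": "John Doe"})  # "name" is recorded as a string
try:
    schema.index({"name": {"first": "Daniel", "second": "Dennett"}})
except ValueError as err:
    print("rejected:", err)
```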
If you would like to index structures that go beyond these limitations, you could use SIREn, an open source semi-structured information retrieval engine implemented as a plugin for both Solr and Elasticsearch. (Disclaimer: I worked for the company that develops SIREn.)
This is the so-called schemaless mode in Solr. I don't know the internal details of how it's implemented.
bin/solr start -e schemaless
The command above starts Solr in schemaless mode. If you don't do that, it will work as usual.
For more information on schemaless mode, take a look here: https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode

How to start work on Solrnet

I have installed Apache Tomcat 6 and configured Solr 1.4. The Solr service is now running successfully.
Solr 1.4 has two important files, solrconfig.xml and schema.xml, for configuring a C# application to work with the Solr service.
But whenever I change the schema.xml file to match my SQL table's fields, the Solr service stops.
Please tell me, step by step, how to configure solrconfig.xml and schema.xml.
Thanks,
Unfortunately, the question isn't clear or is too broad, so I can only give general advice and point to documentation.
"But whenever I change the schema.xml file to match my SQL table's fields, the Solr service stops."
Yes, after changing your schema you either have to restart your Solr instance, or, if you are using Cores (recommended), you have to reload the changed core.
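For the core reload mentioned above, Solr's CoreAdmin handler accepts an action=RELOAD request. The Python sketch below only builds the URL; the Tomcat address http://localhost:8080 and the core name core0 are assumptions for illustration, and it presumes a multicore setup with the admin handler at its usual /solr/admin/cores path.

```python
from urllib.parse import urlencode


def build_core_reload_url(base_url, core):
    """Build the CoreAdmin URL that reloads one core after a config change."""
    params = {"action": "RELOAD", "core": core}
    return f"{base_url}/solr/admin/cores?{urlencode(params)}"


# Hypothetical host, port and core name.
reload_url = build_core_reload_url("http://localhost:8080", "core0")
print(reload_url)
# Opening this URL (browser, curl, urllib.request.urlopen) triggers the reload.
```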
"tell me step by step to configure solrconfig.xml and schema.xml"
Just change the files according to your needs. The Solr package contains numerous examples, all thoroughly commented. Documentation on solrconfig.xml is here. Documentation on schema.xml is here. After making any change in solrconfig.xml you have to restart your Solr instance.
Also, when making changes to the schema, make sure you reflect those changes in your SolrNet mapping.

Resources