I have a Solr instance with lakhs of documents already indexed. I want to create a new copyField that is the concatenation of two existing fields.
Do I need to repopulate my data?
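Yes. From the Solr documentation:
Fields are copied before analysis is done
By "analysis" in the copyField context they mean the index analyzer, which is executed when a document is indexed. Documents that are already in the index therefore never pass through the new copyField rule until they are indexed again.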
Related
When I add any field in Solr and then index some data, Solr creates a copy field for this field.
For example I added a field named app_id and after indexing there are data both in app_id and another field named app_id_str.
Is there any way to prevent creating these copy fields ?
I am assuming you are using a reasonably new Solr version. You can prevent Solr from automatically creating copy fields at index time: configure the "add-schema-fields" update processor not to create copy fields on the fly. Here is how:
Open the solrconfig.xml file of the core for which you want to disable automatic copy field creation.
Comment out the configuration that creates a copy field for text fields (or for any other field type that is configured to generate a copy field).
Save and restart the Solr instance.
Index the documents.
Schema.xml
Search for copyField definitions using wildcards in their glob pattern in schema.xml.
The copyField command can use a wildcard (*) character in the dest
parameter only if the source parameter contains one as well. copyField
uses the matching glob from the source field for the dest field name
into which the source content is copied.
You need to comment out anything that looks like this:
<copyField source="*" dest="*_str"/>
You may also have some dynamicField definitions like the following that would create any copied fields (otherwise you would perhaps remember having explicitly defined such fields like app_id_str) :
<dynamicField name="*_str" type="string"/>
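To see why a rule like the one above produces app_id_str, here is a minimal Python sketch of the glob substitution described in the quoted documentation. The function name copy_field_dest is hypothetical, and the matching is a simplification that assumes at most one * in each pattern; it is not Solr's actual implementation.

```python
from typing import Optional


def copy_field_dest(field_name: str, source_glob: str, dest_glob: str) -> Optional[str]:
    """Return the destination field name for a copyField rule, or None if
    the rule does not apply. Simplified model: at most one '*' per pattern."""
    if "*" not in source_glob:
        # A literal source only applies on an exact match.
        return dest_glob if field_name == source_glob else None

    prefix, suffix = source_glob.split("*", 1)
    if len(field_name) < len(prefix) + len(suffix):
        return None
    if not (field_name.startswith(prefix) and field_name.endswith(suffix)):
        return None

    # The part of the name matched by '*' is substituted into the dest pattern.
    matched = field_name[len(prefix):len(field_name) - len(suffix)]
    return dest_glob.replace("*", matched, 1)


print(copy_field_dest("app_id", "*", "*_str"))  # app_id_str
```

With source="*" every incoming field matches, which is why each new field gets a *_str twin until the rule is commented out.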
SchemaLess Mode
Internally, the Schema API and the Schemaless Update Processors both
use the same Managed Schema functionality.
If you are using Solr in "schemaless mode", you can do the same either by using the Schema API :
Delete a Copy Field Rule
Delete a Dynamic Field Rule
Or by reconfiguring the dedicated update processor in solrconfig.xml as stated by Kusal.
See the paragraph titled You Can Still Be Explicit below this section.
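As a rough sketch of the Schema API route, the two deletions could be sent in one request. This assumes a managed schema, a core named mycorename on localhost:8983 (both placeholders), and the delete-copy-field and delete-dynamic-field commands as I recall them from the Schema API documentation; check them against your Solr version. The helper names are made up for illustration.

```python
import json
import urllib.request


def build_delete_commands(source: str, dest: str, dynamic_field: str) -> dict:
    """Build a Schema API body that removes a copy field rule and then the
    dynamic field rule it copies into (dict order is preserved)."""
    return {
        "delete-copy-field": {"source": source, "dest": dest},
        "delete-dynamic-field": {"name": dynamic_field},
    }


def post_schema_commands(schema_url: str, commands: dict) -> bytes:
    """POST the commands as JSON to the schema endpoint and return the raw reply."""
    request = urllib.request.Request(
        schema_url,
        data=json.dumps(commands).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


commands = build_delete_commands("*", "*_str", "*_str")
print(json.dumps(commands, indent=2))

# Assumed URL; uncomment to send against a running Solr instance:
# post_schema_commands("http://localhost:8983/solr/mycorename/schema", commands)
```

The copy field rule is listed first because the dynamic field it points at cannot sensibly be removed while a rule still copies into it.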
I added a new field in my schema that is indexed but not stored, so that I can copy another field into it. Do I still have to re-index all the documents because of this schema change? Or can I just restart my Solr server? It looks like I have to re-index all documents, since sorting on that new non-stored field is giving me unexpected results, but I would like a confirmation on that.
You have to do a full re-index. Because a schema change can involve different index analyzers, Solr can't apply schema changes to already-indexed documents by itself.
Yes, you have to run the indexer to actually fill in the data for that field.
Is there any Solr API to read the Solr schema.xml?
The reason I need it is that Solr faceting is not backwards compatible. If the index doesn't define field A, but the program tries to generate facets for field A, all the facets will fail. Therefore I need to check in the runtime what fields we have in the index, and generate the facets dynamically.
Since Solr 4.2 the Schema REST API allows you to get the schema with :
http://localhost:8983/solr/schema
or with a core name :
http://localhost:8983/solr/mycorename/schema
Since Solr 4.4 you may also modify your schema.
more details on the Solr Wiki page
You can get the schema with http://localhost:8983/solr/admin/file/?contentType=text/xml;charset=utf-8&file=schema.xml
It's the raw XML, so you have to parse it to get the information you need.
However, if your program generates an invalid facet, maybe you should just fix the program instead of trying to work around this.
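If you do go the raw XML route, a minimal parsing sketch in Python could look like this. The sample schema and the function name are made up for illustration; in practice the string would be the body returned by the admin/file URL above.

```python
import xml.etree.ElementTree as ET


def field_names_from_schema_xml(schema_xml: str) -> set:
    """Collect the names of all explicitly defined <field> elements."""
    root = ET.fromstring(schema_xml)
    # iter() walks the whole tree, so this works whether <field> elements sit
    # inside a <fields> wrapper or directly under <schema>.
    return {el.get("name") for el in root.iter("field") if el.get("name")}


# Hypothetical schema used only to exercise the function.
SAMPLE = """<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="app_id" type="string" indexed="true" stored="true"/>
    <dynamicField name="*_str" type="string"/>
  </fields>
</schema>"""

print(sorted(field_names_from_schema_xml(SAMPLE)))  # ['app_id', 'id']
```

Note that dynamicField rules are deliberately ignored here, so a field that only exists through a dynamic rule would not show up.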
One alternative is to use LukeRequestHandler. It is modeled after the Luke tool, which is used to inspect the contents of a Lucene index. The query /admin/luke?show=schema will show you the schema. However, you will need to define the handler in solrconfig.xml like so:
<requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />
Documentation of LukeRequestHandler link
Actually you have the Schema API for that.
The Solr schema API allows using a REST API to get information about the schema.xml
In Solr 4.2 and 4.3, it only allows GET (read-only) access, but in
Solr 4.4, new fields and copyField directives may be added to the schema. Future Solr releases will extend this functionality to allow more schema
elements to be updated
API Entry Points
/collection/schema: retrieve the entire schema
/collection/schema/fields: retrieve information about all defined fields, or create new fields with optional copyField directives
/collection/schema/fields/name: retrieve information about a named field, or create a new named field with optional copyField directives
/collection/schema/dynamicfields: retrieve information about dynamic field rules
/collection/schema/dynamicfields/name: retrieve information about a named dynamic rule
/collection/schema/fieldtypes: retrieve information about field types
/collection/schema/fieldtypes/name: retrieve information about a named field type
/collection/schema/copyfields: retrieve information about copy fields, or create new copyField directives
/collection/schema/name: retrieve the schema name
/collection/schema/version: retrieve the schema version
/collection/schema/uniquekey: retrieve the defined uniqueKey
/collection/schema/similarity: retrieve the global similarity definition
/collection/schema/solrqueryparser/defaultoperator: retrieve the default operator
Examples
Input
Get a list of all fields.
curl http://localhost:8983/solr/collection1/schema/fields?wt=json
Input
Get the entire schema in JSON.
curl http://localhost:8983/solr/collection1/schema?wt=json
More info here: apache-solr-ref-guide-4.5.pdf (search for Schema API)
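For the original use case (only faceting on fields that exist), a small Python sketch could filter the wanted facet fields against the /schema/fields response. It assumes the JSON response has a top-level "fields" list whose entries carry a "name" key; the sample response and the function name are invented for illustration.

```python
import json


def existing_facet_fields(schema_fields_json: str, wanted: list) -> list:
    """Keep only the wanted facet fields that the schema explicitly defines."""
    payload = json.loads(schema_fields_json)
    defined = {field["name"] for field in payload.get("fields", [])}
    return [name for name in wanted if name in defined]


# Hypothetical response body, shaped like the output of
# curl http://localhost:8983/solr/collection1/schema/fields?wt=json
SAMPLE_RESPONSE = json.dumps({
    "responseHeader": {"status": 0},
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "category", "type": "string"},
    ],
})

print(existing_facet_fields(SAMPLE_RESPONSE, ["category", "A"]))  # ['category']
```

Fields that exist only through a dynamic field rule are not listed under /schema/fields, so those would need a separate check against /schema/dynamicfields.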
I have a schema with 10 fields. One of the fields is text (the content of a file); all the other fields are custom metadata. The document text doesn't change, but the metadata changes frequently.
Is there any way to skip the document text while re-indexing? Can I index only the custom metadata? If I skip the document text when re-indexing, does Solr update the index by removing the text field from the indexed document?
To my knowledge there's no way to selectively update specific fields. An update operation performs a complete replace of all document data. Since Solr is open source, it's possible that you could produce your own component for this if really desired.
I have a field with indexed="no" and stored="yes" and now I need to query that field.
How can I build this index after setting indexed="yes"? Or do I need to do a complete reindex (re-import)?
Thanks
No, you'll need to do a complete reindex. Solr indexes a document at a time and in order to have any change done to any of its fields, the whole document will have to be reindexed.
If all your fields are stored, you might be able to write some code to have the complete reindexing done without having to fetch the data again from the data source -- you can fetch the documents from Solr and then add them back to Solr.
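A rough sketch of the preparation step for that approach, in Python. It assumes the documents were already fetched from Solr as a list of dicts (for example from the "docs" array of a JSON select response) and that fields such as _version_ or stored copyField destinations should be dropped before sending them back; the function name and sample data are hypothetical.

```python
def prepare_for_reindex(docs, drop_fields=("_version_",)):
    """Return copies of the fetched documents without fields that should not
    be sent back to Solr (internal fields, copyField destinations, ...)."""
    return [
        {name: value for name, value in doc.items() if name not in drop_fields}
        for doc in docs
    ]


# Hypothetical documents as they might come back from a select query.
fetched = [
    {"id": "1", "title": "first", "_version_": 1600000000000000001},
    {"id": "2", "title": "second", "_version_": 1600000000000000002},
]

print(prepare_for_reindex(fetched))
# [{'id': '1', 'title': 'first'}, {'id': '2', 'title': 'second'}]
```

The cleaned list can then be posted to the update handler like any other batch of documents. This only works if every field you need is stored; anything that was indexed but not stored is lost.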