Does Solr have an API to read schema.xml? - solr

Is there any Solr API to read the Solr schema.xml?
The reason I need it is that Solr faceting is not backwards compatible. If the index doesn't define field A, but the program tries to generate facets for field A, all the facets will fail. Therefore I need to check at runtime which fields the index has, and generate the facets dynamically.

Since Solr 4.2, the Schema REST API lets you retrieve the schema with:
http://localhost:8983/solr/schema
or, with a core name:
http://localhost:8983/solr/mycorename/schema
Since Solr 4.4 you can also modify your schema through it.
More details are on the Solr Wiki page.

You can get the schema with http://localhost:8983/solr/admin/file/?contentType=text/xml;charset=utf-8&file=schema.xml
It's the raw XML, so you have to parse it to get the information you need.
However, if your program generates an invalid facet, maybe you should just fix the program instead of trying to work around this.
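Parsing the raw XML for field names could look roughly like the sketch below. The sample schema is made up for illustration, and the function name explicit_field_names is my own; in practice you would feed it the body returned by the /admin/file URL above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down schema.xml used only for illustration.
SAMPLE_SCHEMA = """<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="author" type="string" indexed="true" stored="true"/>
    <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
  </fields>
</schema>
"""


def explicit_field_names(schema_xml):
    """Return the names of all explicitly declared <field> elements."""
    root = ET.fromstring(schema_xml)
    # iter() walks the whole tree, so this works whether or not the
    # <field> elements are wrapped in a <fields> element.
    return {element.get("name") for element in root.iter("field")}


if __name__ == "__main__":
    print(sorted(explicit_field_names(SAMPLE_SCHEMA)))
```

Note that this only sees explicitly declared fields; fields that exist only through a dynamicField rule are not listed.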

One alternative is to use the LukeRequestHandler. It is modeled after Luke, a tool used to diagnose the contents of a Lucene index. The query /admin/luke?show=schema will show you the schema. However, you will need to define the handler in solrconfig.xml like so:
<requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />
See the LukeRequestHandler documentation (link).

Actually you have the Schema API for that.
The Solr Schema API provides a REST API for getting information about schema.xml.
In Solr 4.2 and 4.3, it only allows GET (read-only) access, but in Solr 4.4, new fields and copyField directives may be added to the schema. Future Solr releases will extend this functionality to allow more schema elements to be updated.
API Entry Points
/collection/schema: retrieve the entire schema
/collection/schema/fields: retrieve information about all defined fields, or create new fields with optional copyField directives
/collection/schema/fields/name: retrieve information about a named field, or create a new named field with optional copyField directives
/collection/schema/dynamicfields: retrieve information about dynamic field rules
/collection/schema/dynamicfields/name: retrieve information about a named dynamic rule
/collection/schema/fieldtypes: retrieve information about field types
/collection/schema/fieldtypes/name: retrieve information about a named field type
/collection/schema/copyfields: retrieve information about copy fields, or create new copyField directives
/collection/schema/name: retrieve the schema name
/collection/schema/version: retrieve the schema version
/collection/schema/uniquekey: retrieve the defined uniqueKey
/collection/schema/similarity: retrieve the global similarity definition
/collection/schema/solrqueryparser/defaultoperator: retrieve the default operator
Examples
Input
Get a list of all fields.
curl http://localhost:8983/solr/collection1/schema/fields?wt=json
Input
Get the entire schema in JSON.
curl http://localhost:8983/solr/collection1/schema?wt=json
More info here: apache-solr-ref-guide-4.5.pdf (search for Schema API)
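For the original use case (only faceting on fields that exist), a minimal Python sketch might look like this. It assumes the /schema/fields?wt=json response carries a top-level "fields" list of objects with a "name" key; the function names and the collection1 base URL are placeholders of mine, not part of Solr.

```python
import json
from urllib.request import urlopen


def field_names(schema_fields_json):
    """Extract field names from a /schema/fields?wt=json response body."""
    payload = json.loads(schema_fields_json)
    return {field["name"] for field in payload.get("fields", [])}


def safe_facet_fields(wanted, available):
    """Keep only the wanted facet fields that the schema defines, in order."""
    return [name for name in wanted if name in available]


def fetch_field_names(base_url="http://localhost:8983/solr/collection1"):
    """Fetch field names from a running Solr; base_url is an assumption."""
    with urlopen(base_url + "/schema/fields?wt=json") as response:
        return field_names(response.read().decode("utf-8"))
```

You would call fetch_field_names() once at startup (or cache it) and pass the result to safe_facet_fields() before building the facet parameters. This covers explicitly defined fields only; dynamic field rules live under /schema/dynamicfields and would need separate pattern matching.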

Related

Apache solr search for multiple fields without specifying field name

How do I search across multiple fields in Solr 7.7.2 without specifying a field name? I created a copyField for all fields with dest="text", where text is a field of a text type, but the search returns no results. It only works for a single field, with df=fieldName.
The core uses a managed schema, which automatically overrides my changes after indexing. Please let me know what the issue could be.

Prevent Solr from creating default copy fields

When I add any field in Solr and then index some data, Solr creates a copy field for this field.
For example, I added a field named app_id, and after indexing there is data in both app_id and another field named app_id_str.
Is there any way to prevent these copy fields from being created?
I am assuming you are using a reasonably recent Solr version. You can prevent Solr from automatically creating copy fields at index time: you just have to configure the "add-schema-fields" update processor not to create copy fields on the fly. Here is how:
Open the solrconfig.xml file of the core you wish to disable adding copy fields automatically.
Comment out the configuration that creates copy fields for text fields (or for any other field type that is configured to generate a copy field).
Save and restart the Solr instance.
Index the documents.
Schema.xml
Search for copyField definitions using wildcards in their glob pattern in schema.xml.
The copyField command can use a wildcard (*) character in the dest parameter only if the source parameter contains one as well. copyField uses the matching glob from the source field for the dest field name into which the source content is copied.
You need to comment out anything that looks like this:
<copyField source="*" dest="*_str"/>
You may also have dynamicField definitions like the following, which allow the copied fields to be created (otherwise you would probably remember having explicitly defined fields such as app_id_str):
<dynamicField name="*_str" type="string"/>
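To make the glob behaviour described above concrete, here is a small Python model of how a wildcard copyField rule maps a source field name to a dest field name. It is only an illustration of the quoted rule, not Solr's actual code, and copy_field_dest is a name I made up; it handles a single * in each pattern.

```python
def copy_field_dest(source_pattern, dest_pattern, field_name):
    """Return the dest field name produced for field_name, or None if the
    source pattern does not match. Simplified model: at most one '*'."""
    if "*" not in source_pattern:
        # No wildcard: the rule applies to exactly one field.
        return dest_pattern if field_name == source_pattern else None
    prefix, _, suffix = source_pattern.partition("*")
    if len(field_name) < len(prefix) + len(suffix):
        return None
    if not (field_name.startswith(prefix) and field_name.endswith(suffix)):
        return None
    # The part of the name matched by '*' is reused in the dest pattern.
    matched = field_name[len(prefix):len(field_name) - len(suffix)]
    return dest_pattern.replace("*", matched, 1)


if __name__ == "__main__":
    # The rule from this answer: every field gets a *_str twin.
    print(copy_field_dest("*", "*_str", "app_id"))
```

With source="*" and dest="*_str", the field app_id maps to app_id_str, which is exactly the extra field described in the question.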
SchemaLess Mode
Internally, the Schema API and the Schemaless Update Processors both use the same Managed Schema functionality.
If you are using Solr in "schemaless mode", you can do the same either by using the Schema API:
Delete a Copy Field Rule
Delete a Dynamic Field Rule
Or by reconfiguring the dedicated update processor in solrconfig.xml as stated by Kusal.
See the paragraph titled You Can Still Be Explicit below this section.
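As a rough sketch of the Schema API route, the two deletions can be sent as one JSON request using the delete-copy-field and delete-dynamic-field commands. The Python helper names below are mine, the core name mycorename is a placeholder, and the rule values are the ones from this answer; adjust them to whatever your schema actually contains.

```python
import json
from urllib.request import Request, urlopen


def delete_rules_payload(source="*", dest="*_str", dynamic_field="*_str"):
    """Build the JSON body that removes the copyField rule and then the
    dynamic field rule it copies into (in that order)."""
    return json.dumps({
        "delete-copy-field": {"source": source, "dest": dest},
        "delete-dynamic-field": {"name": dynamic_field},
    })


def post_schema_commands(payload,
                         schema_url="http://localhost:8983/solr/mycorename/schema"):
    """POST the commands to the Schema API; schema_url is an assumption."""
    request = Request(
        schema_url,
        data=payload.encode("utf-8"),
        headers={"Content-type": "application/json"},
        method="POST",
    )
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

The copy field rule is listed first because it refers to the dynamic field; deleting the dynamic field while a copy rule still points at it would likely be rejected.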

Does a copy field require data re-upload?

I have a Solr instance with lakhs of documents uploaded. I want to create a new copyField that is a concatenation of two existing fields.
Do I need to repopulate my data?
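Yes. From the Solr documentation:
Fields are copied before analysis is done
By "analysis" in the copyField context they mean the index analyzer, which is executed when a document is indexed. So the copy only happens at index time, and documents indexed before the copyField was added will not have it until they are indexed again.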

conversion of DateField to TrieDateField in Solr

I'm using Apache Solr to power the search functionality on my Drupal site, via a contributed Drupal module named ApacheSolr Search Integration. I'm fairly new to Solr and have only a basic understanding of it, so I apologize in advance if this question sounds naive.
I have a date field named ds_myDate, added through one of Drupal's hooks, which I initially used for sorting the search results. I decided to switch to date boosting, so that results are ranked by relevancy and boosted by their date rather than simply listed in descending date order. Once I had updated my hook to do this by adding a boost function, recip(ms(NOW/HOUR,ds_myDate),3.16e-11,1,1), I got an HTTP 400 error stating
Can't use ms() function on non-numeric legacy date field ds_myDate
Googling for the same suggested that I use a TrieDateField instead of the Legacy DateField to prevent this error. Adding a TrieDate field named tds_myDate following the suggested naming convention and implementing the boost as recip(ms(NOW/HOUR,tds_myDate),3.16e-11,1,1) did effectively achieve the boosting. However this requires me to reindex all the content (close to 500k records) to populate the new TrieDate field so that I may be able to use it effectively.
Is there an effective workaround that avoids re-indexing all my content, such as converting ds_myDate to a TrieDateField in place, the way an ALTER query changes a column's type in a MySQL table? Since I'm unfamiliar with how Solr works, I'd like to know whether such an option is feasible and what the right thing to do would be in this case.
You may be able to achieve it with a partial update, but for that you need to be on Solr 4+ and storing all indexed fields.
Here is how I would go about this:
Make sure the Solr version is 4+
Make sure all indexed fields are stored (a requirement for partial updates)
If both conditions are met, write a script (PHP) that does the following:
1) Iterate through the full Solr index, and for each doc:
   a) Read the value stored in the ds_myDate field
   b) Convert it to the TrieDateField format
   c) Push it to Solr via a partial update to the tds_myDate field only (see sample query)
Sample query:
curl 'localhost:8983/solr/update?commit=true' -H 'Content-type:application/json' -d '[{"id":"$id","tds_myDate":{"set":"$converted_Val"}}]'
For more details on partial updates: http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/
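The answer suggests PHP, but the payload-building step of the script above could be sketched in Python like this. The helper names are mine; the only Solr-specific parts are the {"set": ...} atomic update syntax from the sample query and the UTC date format with a trailing Z that Solr date fields expect. Reading the existing ds_myDate values and posting the body to /update are left out.

```python
import json
from datetime import datetime, timezone


def to_solr_date(value):
    """Format an aware datetime as a Solr date string (UTC, trailing Z).
    Note: naive datetimes would be interpreted as local time here."""
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def atomic_set_payload(docs, field="tds_myDate"):
    """Build the JSON body for a partial update that sets `field` on each
    document. `docs` is an iterable of (id, datetime) pairs."""
    return json.dumps([
        {"id": doc_id, field: {"set": to_solr_date(value)}}
        for doc_id, value in docs
    ])


if __name__ == "__main__":
    sample = [("42", datetime(2013, 5, 1, 12, 30, tzinfo=timezone.utc))]
    print(atomic_set_payload(sample))
```

Batching many documents into one request and committing once at the end should be much cheaper than a commit=true call per document.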
Unfortunately, once a document has been indexed a certain way and you change the schema, you cannot have the new schema changes be applied to existing documents until those documents are re-indexed.
Please see this previous question - Does Schema Change need Reindex for additional details.

Update document field with solrj

I want to edit a document field in Solr, for example the author name, so I use the following code in SolrJ:
params.set("literal.author","anaconda")
But author is multiValued="true" in the schema, so "anaconda" does not replace the previous name; it is appended to the end of the author values. If I omit the multiValued attribute or set it to false, a Bad Request exception occurs when re-indexing the file with the new author field. How can I solve this problem and delete or modify the previous document field in SolrJ?
Or is there some configuration I am missing in the schema?
Thanks
The only option I know of would be to query the full document (all fields, using the &fl=* parameter) into a local construct with SolrJ, update the appropriate field(s), and then submit the entire document back to Solr.
No, there is no way to update a specific field of a document in Solr, nor through any of its client APIs.
EDIT: With Solr 4.0 it is possible to partially update documents, changing only certain fields.
This post should be the correct answer to your question (if you are using Solr 4.x)
In Solr 4.0 you can update a single field on a document, but be aware that this version is still ALPHA, if that concerns you.
As for the update itself, I think it is only possible via curl; I didn't find any way to update a single field on a document from the Java side with SolrJ.
You have two options:
As stated in other answers, you can query for the original document, update the field, and then re-save it, which will overwrite the original document with the new values.
Your other option is to install a nightly build of Solr, where Yonik has added a patch for updateable documents. You should keep an eye on https://issues.apache.org/jira/browse/SOLR-139 as this patch is pretty new and still being worked on.
