How to edit the Solr 5 schema that is created by default

How do I edit a schema such as the gettingstarted collection as mentioned in
https://lucene.apache.org/solr/quickstart.html
Thanks
Joyce

Solr 5 uses a managed schema by default, while Solr 4 used the schema.xml file. Solr 5 automatically creates the schema for you by guessing the type of the field. Once the type is assigned to the field, you can't change it. You have to set the type of the field before you add data to Solr 5.
To change the schema in Solr 5, use the Schema API, which is a REST interface.
Schemaless Mode states the following:
You Can Still Be Explicit - Even if you want to use schemaless mode for most fields, you can still use the Schema API to pre-emptively create some fields, with explicit types, before you index documents that use them.
... Once a field has been added to the schema, its field type is fixed.
If you are using the quick start guide for Solr 5, here's what you have to do if you want to explicitly specify the field types:
After you run the following command: bin/solr start -e cloud -noprompt
Then enter a command like this:
curl -X POST -H 'Content-type:application/json' --data-binary '{
"add-field" : { "name":"MYFIELDNAMEHERE", "type":"tlong",
"stored":true}}' http://localhost:8983/solr/gettingstarted/schema
The previous command will force the MYFIELDNAMEHERE field to be a tlong. Replace MYFIELDNAMEHERE with the field name that you want to be explicitly set, and change tlong to the Solr type that you want to use.
After that, load your data as usual.
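For scripted setups, the same Schema API request can be prepared from Python. This is a minimal sketch using only the standard library. The helper name build_add_field_request and the field name price_cents are illustrative assumptions; the URL and the add-field payload shape are taken from the curl command above.

```python
import json
import urllib.request

# URL taken from the curl command above (quick start "gettingstarted" collection).
SCHEMA_URL = "http://localhost:8983/solr/gettingstarted/schema"


def build_add_field_request(name, field_type, stored=True, url=SCHEMA_URL):
    """Build (but do not send) a Schema API add-field request."""
    payload = {"add-field": {"name": name, "type": field_type, "stored": stored}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # "price_cents" is a hypothetical field name used only for illustration.
    request = build_add_field_request("price_cents", "tlong")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To send it to a running Solr instance, uncomment:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the payload before any field type gets fixed in the schema.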

Related

Solr: ignore or modify fields with incorrect type or other type errors

I am reading hundreds of thousands of documents into a Solr instance using the post tool. I've found that out of this corpus, about 150 documents failed due to some type of schema type error e.g. I defined a field "created" as a date field, but a few documents had an invalid date value in that field.
Rather than changing the schema and reindexing all the documents (a process that takes about 20 hours), for my purposes it is fine to just read the documents that failed into the index by ignoring the "created" field.
How can I configure Solr to ingest the documents it receives and simply drop the created field from them? Even better, how can I tell Solr to simply drop any field which fails schema validation?
There are two possibilities:
1) Ignore the Field
To ignore the field in question, use the fmap parameter to map it to an ignored field. For example:
bin/post -c mycollection -params \
"fmap.created=ignored_created" files...
which leverages the dynamic ignored_* field of type ignored in the schema e.g.:
<dynamicField name="ignored_*" type="ignored" multiValued="true"/>
2) Change the Format
If the information is valid but not in the correct format, use an update processor to parse the format, or to modify the input. For example, to parse dates in an unusual format, add the format to the solr.ParseDateFieldUpdateProcessorFactory in solrconfig.xml.
To modify the input, use a RegexReplaceProcessorFactory.

How to delete a doc at a specific shard in Solr

I want to delete a specific doc at a specific shard in Solr, below is my query:
http://localhost:8080/solr/collections_1_replica1/update?stream.body=<delete><query>id:1</query></delete>&commit=true&distrib=false
But this still affects collections_2_replica1. What is the correct query in this case?
If you use the default SolrCloud collection configuration, Solr chooses where to put a document by hashing its id (conceptually, docId.hash() % number of shards).
In other words, you're not supposed to delete from a specific shard because you can't be sure whether the document is there or on the other shards.
If I'm not wrong, the distrib=false parameter has no effect on updates.
Try this one:
curl http://xx.xx.xx.xx:8983/solr/collection_name/update/?commit=true -H "Content-Type: text/xml" --data-binary '<delete><query>id:1</query></delete>'
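The collection-level delete above can also be prepared from Python. This is a minimal sketch with the standard library; the helper name build_delete_request and the localhost:8983 address in the usage lines are assumptions, since the original host is elided.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_delete_request(base_url, collection, query):
    """Build (but do not send) a delete-by-query request for a collection."""
    delete = ET.Element("delete")
    # ElementTree escapes &, < and > in the query text.
    ET.SubElement(delete, "query").text = query
    body = ET.tostring(delete, encoding="unicode")
    return urllib.request.Request(
        f"{base_url}/solr/{collection}/update?commit=true",
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


if __name__ == "__main__":
    # Host and port are assumptions for illustration.
    request = build_delete_request("http://localhost:8983", "collection_name", "id:1")
    print(request.full_url)
    print(request.data.decode("utf-8"))
```

Sending the request to the collection rather than to a single core lets Solr route the delete to whichever shard holds the document.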

Solr fields mapping?

I am indexing documents into Solr from a source. At the source, each document has some associated properties which I am indexing and fetching into Solr.
I am mapping some fields from the source properties to Solr schema fields, but I can see a couple of extra fields in the Solr logs which I am not mapping. While querying in the Solr admin UI, I can see only the mapped fields.
E.g. in the logs below, I am using only content_name & content_modifier, but I can see the Template field as well.
INFO - 2014-09-18 12:07:47.185; org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr path=/update/extract params={literal.content_name=1_.000&literal.content_modifier=System&literal.Template={8ad4d8f0-93a7-4941-9657-cf3706f00409} {add=[1_.000 (1479581071766978560)]} 0 0
So what's happening here? Will Solr index only the mapped fields and skip the unmapped ones? Or will Solr index all fields, mapped and unmapped, but show only the mapped fields in the admin UI?
Please suggest.
The answer depends on what your solrconfig.xml and schema say, because you can configure this any way you want. Here is how it works for the example schema for Solr 4.10:
1) In solrconfig.xml, the handler uses the "uprefix" parameter to map all fields NOT in the schema to the dynamic field ignored_*
2) In schema.xml, that dynamic field has type ignored
3) The type ignored (in the same file) is defined with stored=false and indexed=false. This means Solr does not complain when it receives a field matching the pattern, but does nothing with it; the field is literally ignored.
So, if you don't like that, you can modify any part of that pipeline. The easiest test would be to change the dynamic field to use type string and reindex. Then, you should see the rest of the fields.
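The pipeline described above can be modelled in a few lines of Python. This is a toy simulation of the three steps as the answer describes them, not Solr code; the function names, the "string" types for the mapped fields, and the sample values are assumptions for illustration.

```python
from fnmatch import fnmatchcase


def resolve_field(name, explicit_fields, dynamic_fields, uprefix="ignored_"):
    """Return (final_name, field_type) following the three steps above."""
    if name in explicit_fields:
        return name, explicit_fields[name]
    prefixed = uprefix + name  # step 1: unknown fields get the uprefix
    for pattern, field_type in dynamic_fields.items():
        if fnmatchcase(prefixed, pattern):  # step 2: dynamic field match
            return prefixed, field_type
    raise KeyError(f"no field or dynamic field matches {prefixed!r}")


def indexed_document(doc, explicit_fields, dynamic_fields):
    """Drop every field whose resolved type is 'ignored' (step 3)."""
    kept = {}
    for name, value in doc.items():
        final_name, field_type = resolve_field(name, explicit_fields, dynamic_fields)
        if field_type != "ignored":
            kept[final_name] = value
    return kept


if __name__ == "__main__":
    explicit = {"content_name": "string", "content_modifier": "string"}
    doc = {"content_name": "1_.000", "content_modifier": "System", "Template": "some-guid"}
    # Default example config: Template is silently dropped.
    print(indexed_document(doc, explicit, {"ignored_*": "ignored"}))
    # After switching the dynamic field to type string: it is kept as ignored_Template.
    print(indexed_document(doc, explicit, {"ignored_*": "string"}))
```

The second call mirrors the suggested test of changing the dynamic field's type: the unmapped field then survives under its prefixed name.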

Solr not searching (Very basic example)

Solr version :
4.2.1
Objective:
I am trying to get a very simplistic Solr example off the ground
So far:
Installed solr
Was able to run the example\tutorial successfully http://lucene.apache.org/solr/4_2_1/tutorial.html
Next:
Now I am trying to create my own schema
I have created a schema: http://pastebin.com/vj4ATa8d
And a test doc: http://pastebin.com/7fvZ5GTQ
I have added the doc to Solr using the command
java -jar post.jar testdoc.xml
What’s working:
In Solr Admin- I can see the schema
I can see one document uploaded
I can go to Admin console and query as follows:
Specify q as “*:*”. This works and shows the document
http://localhost:8983/solr/collection2/select?q=*%3A*&wt=xml&indent=true
What does not work:
If I give q as Nashua- I see no results
This is the default search field
Other attributes didn't work either
http://localhost:8983/solr/collection2/select?q=Nashua&wt=xml&indent=true
The debug response http://pastebin.com/fTneyEba
You need to either copy your fields into the default search field (in this case text) or qualify your query with the field you want to search against:
.../select?q=city:Nashua&wt=xml&indent=true
Things to read up on:
Default Search Field
Copy Fields
Both are documented here:
https://wiki.apache.org/solr/SchemaXml
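To see the difference between the unqualified and the field-qualified query, the two select URLs can be generated with a short Python sketch. The helper name select_url is an assumption; the host, collection name, and city field come from the question and the answer above.

```python
from urllib.parse import urlencode


def select_url(base_url, collection, query, field=None):
    """Build a /select URL, optionally qualifying the query with a field name."""
    q = f"{field}:{query}" if field else query
    params = urlencode({"q": q, "wt": "xml", "indent": "true"})
    return f"{base_url}/solr/{collection}/select?{params}"


if __name__ == "__main__":
    base = "http://localhost:8983"
    # Searches only the default search field.
    print(select_url(base, "collection2", "Nashua"))
    # Searches the city field explicitly.
    print(select_url(base, "collection2", "Nashua", field="city"))
```

The colon in city:Nashua is percent-encoded as %3A by urlencode, the same encoding visible in the `*:*` URL from the question.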

SOLR Tika: add text of file to existing record (ExtractingRequestHandler)

I am indexing posts in SOLR with "name", "title", and "description" fields. I'd like to later be able to add a file (like a Word doc or a PDF) using Tika / the ExtractingRequestHandler.
I know I can add documents like so: (or through other interfaces)
curl 'http://localhost:8983/solr/update/extract?literal.id=post1&commit=true' \
  -F "myfile=@tutorial.html"
But this replaces the correct post (post1 above) -- is there a parameter I can pass to have it only add to the record?
In Solr (ver < 4.0) you can't modify fields in a document. You can only delete or add/replace whole documents. Therefore, when "appending" a file to the Solr document you have to rebuild your document from its current values (using literal), i.e. query for the document and then:
http://localhost:8983/solr/update/extract?literal.id=post1&literal.name=myName&literal.title=myTitle&literal.description=myDescription&commit=true
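Rebuilding the literal parameters by hand is error-prone, so the URL can be generated from the document's current values. This is a minimal sketch; the helper name extract_url is an assumption, and it presumes the existing fields are stored so that a query can return them.

```python
from urllib.parse import urlencode


def extract_url(base_url, existing_doc):
    """Build an /update/extract URL that re-sends current values as literals."""
    params = {f"literal.{name}": value for name, value in existing_doc.items()}
    params["commit"] = "true"
    return f"{base_url}/solr/update/extract?{urlencode(params)}"


if __name__ == "__main__":
    # Values as they would come back from querying for post1.
    existing = {
        "id": "post1",
        "name": "myName",
        "title": "myTitle",
        "description": "myDescription",
    }
    print(extract_url("http://localhost:8983", existing))
```

The file itself is still posted to this URL as before; only the query string changes.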
