SOLR 4.3 Modification of schema.xml not considered by the server

I have the following error: [doc=testIngestID411] unknown field 'dateImport'
At the beginning I did not have the field 'dateImport' in my solr schema. I decided to add it after launching solr a few times.
1. I added this field to schema.xml:
<filed name="dateImport" type="string" indexed="true" stored="true" required="true"/>
after the other pre-existing fields.
2. I removed all my existing documents using:
<delete><query>*:*</query></delete>
3. Stopped Solr (using Ctrl+C or by killing the jar process)
4. Restarted Solr (using java -jar start.jar)
Then, when I tried to insert a document with a field named dateImport, I got:
"unknown field 'dateImport'"
Extra information:
If I modify a field that already existed (i.e. one that was there the first time I launched this Solr core), the modification is picked up. For instance, if I change a field that was not required to required="true" and restart Solr, I can no longer add a document without specifying this field.
Also I have noticed, using the web admin interface:
On the left there is a tab called "Schema"; this schema contains all my modifications (such as the field dateImport). Above this tab there is another tab named "Schema Browser". The field 'dateImport' DOES NOT appear there.
What can I do to get this new field working??
Thank you

The element name is misspelled, so the declaration is never registered as a field. Change <filed ... to <field ...
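A typo like this is easy to miss by eye. As a minimal sketch (not part of Solr itself), the Python snippet below scans a schema for elements that look like field declarations, meaning they carry both a name and a type attribute, but whose tag is not field or dynamicField. The function name, the heuristic and the sample schema are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Tags that legitimately carry both "name" and "type" attributes.
# This is a heuristic chosen for the sketch, not an official Solr rule.
FIELD_TAGS = {"field", "dynamicField"}


def misspelled_field_elements(schema_xml):
    """Return (tag, name) pairs for elements that look like field
    declarations but use an unexpected tag name."""
    root = ET.fromstring(schema_xml)
    return [
        (el.tag, el.get("name"))
        for el in root.iter()
        if "name" in el.attrib
        and "type" in el.attrib
        and el.tag not in FIELD_TAGS
    ]


# Hypothetical schema fragment reproducing the typo from the question.
SAMPLE = """<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <filed name="dateImport" type="string" indexed="true" stored="true" required="true"/>
  </fields>
</schema>"""

suspects = misspelled_field_elements(SAMPLE)
print(suspects)
```

Running it against the real conf/schema.xml (read with open(...).read()) would list any declaration whose tag name is not one of the expected ones.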

Related

Reindexing Solr Data with different field type

I am facing an issue while reindexing Solr data.
I indexed some documents with a wrong field type specified in the managed-schema file.
Now, instead of the wrong field definition, I would like to use:
<field name="documentDate" type="date" indexed="true" stored="true"/>
To do this I have:
deleted all the documents that were previously indexed with the wrong type;
updated the managed-schema;
reloaded the core.
After these steps I tried to reindex documents, but this fails; looking at logs:
org.apache.solr.common.SolrException: Exception writing document id 2ecde3eb2b5964b2c44362f752f7b90d to the index; possible analysis error: cannot change DocValues type from NUMERIC to SORTED_SET for field "documentDate".
How is this possible? I have removed all the documents that stored the field documentDate. How can I solve this issue?
Try deleting the data folder in your core.
You can add new fields to your schema without deleting the data folder, but in my experience, when you modify an existing field you have to delete the data folder and build a fresh index.
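The "delete the data folder" step can be sketched in Python as below. This is only an illustration under stated assumptions: the index is assumed to live in the default data directory under the core's instance directory, Solr is assumed to be stopped before the folder is removed, and the helper name clear_core_index is hypothetical. The demonstration runs against a throwaway directory, not a real core.

```python
import shutil
import tempfile
from pathlib import Path


def clear_core_index(instance_dir):
    """Delete <instance_dir>/data and report whether anything was removed.

    Assumes the default layout (conf/ and data/ side by side) and that
    Solr is not running while the folder is deleted.
    """
    data_dir = Path(instance_dir) / "data"
    if not data_dir.is_dir():
        return False
    shutil.rmtree(data_dir)
    return True


# Demonstration on a temporary directory that mimics a core layout.
with tempfile.TemporaryDirectory() as tmp:
    core = Path(tmp) / "my_core"
    (core / "conf").mkdir(parents=True)
    (core / "data" / "index").mkdir(parents=True)
    (core / "data" / "index" / "segments_1").write_text("placeholder")

    removed = clear_core_index(core)        # first call removes data/
    data_gone = not (core / "data").exists()
    conf_kept = (core / "conf").is_dir()    # configuration is untouched
    removed_again = clear_core_index(core)  # nothing left to remove
```

After restarting Solr, the documents have to be indexed again from the original source, since the deletion discards the whole index.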

Additional fields in schema.xml are not showing up when I do a query

Solr version information: 6.6.0
The core is named: solr
Instance: /var/solr/data/new_core
In the /var/solr/data/new_core/conf/ directory I have a custom schema.xml file
I have multiple custom fields like this in the schema.xml file
<field name="nid" type="int" indexed="true" stored="true"/>
When I select the 'solr' core and go to query, these custom fields are not showing up in the results. Here's an example of the results:
{
"response":{"numFound":200,"start":0,"docs":[
{
"id":"koe1eh/node/49",
"site":"https://example.com:1881/",
"hash":"koe1eh",
"ss_language":"und",
"url":"https://example.com:1881/node/49",
"ss_name":"tfadmin",
"tos_name":"tfadmin",
"ss_name_formatted":"tfadmin",
"tos_name_formatted":"tfadmin",
"is_uid":1,
"bs_status":true,
"bs_sticky":false,
"bs_promote":false,
"is_tnid":0,
"bs_translate":false,
"ds_created":"2009-03-12T17:46:06Z",
"ds_changed":"2009-06-18T15:25:33Z",
"ds_last_comment_or_change":"2009-06-18T15:25:33Z",
"tos_content_extra":" (Gifts) ",
"sm_field_apptype":["mousepad"],
"_version_":1588589404094464000,
"timestamp":"2018-01-03T16:28:34Z"}]
}}
The query performed is: http://example.com/solr/solr/select?indent=on&q=*:*&rows=1&wt=json
solrconfig.xml is here: https://pastebin.com/iVhZCqTW
schema.xml is here: https://pastebin.com/UBaUN5EK
I have tried restarting solr, reloading the core, and reindexing with no effect.
It turns out that I was just looking at some entries that did not have those fields. When I altered my query to start on record 900, then I saw the fields I was looking for. I'm not sure what else I may have done to get this working as I've been trying many different things.
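Rather than paging through results until a document with the field turns up, a filter query can restrict the results to documents that have a value in that field. The sketch below only builds such a URL; the helper name is hypothetical, and the base URL and the nid field are taken from the question.

```python
from urllib.parse import urlencode


def select_docs_with_field(select_url, field, rows=1):
    """Build a select URL that matches only documents with a value in `field`."""
    params = {
        "q": "*:*",
        "fq": f"{field}:[* TO *]",  # open-ended range: any value present
        "fl": f"id,{field}",        # return just the id and the field itself
        "rows": rows,
        "wt": "json",
    }
    return f"{select_url}?{urlencode(params)}"


url = select_docs_with_field("http://example.com/solr/solr/select", "nid")
print(url)
```

If this query returns numFound greater than zero, the field is being indexed and the earlier results simply came from documents without it.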

Solr-Retrieve name of document where the word is found

I am using queries (Solr Admin) to search for words in two text documents that are in my HDFS. How can I retrieve the name of the document in which the word is found? I am using this project: https://github.com/lucidworks/hadoop-solr
I am creating a collection using bin/solr -e cloud and I am using the "data_driven_schema_configs" config set from the server/solr/configsets/ directory.
I tried adding <field name="fileName" type="string" indexed="true" stored="true" /> inside managed-schema at ~/solr-6.1.0/server/solr/configsets/data_driven_schema_configs/conf, and also renamed the file to schema.xml. However, this directory has no dataConfig file in which to add <field column="file" name="fileName"/>, as I have seen in other posts with similar questions (though not for SolrCloud), so I don't know whether what I am trying is correct. What changes do I have to make, and in which directories, to make this work?
Example: I am searching for the word "greatest", which can be found in both documents. How can I see which document each result comes from, sample1.txt or sample2.txt?
Same thing I said when you mentioned this question on IRC:
Your Solr schema must contain a field where you put the name, set to stored="true", and you must include that field, with a relevant value, in every document when you index. Most schema changes require a full reindex.
https://wiki.apache.org/solr/HowToReindex
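To illustrate "include that field, with a relevant value, in every document", here is a minimal sketch that builds plain JSON documents carrying a fileName value. It does not use the hadoop-solr ingest job from the question; the id and text field names, the HDFS-style paths and the helper name are assumptions for illustration only.

```python
import json
from pathlib import PurePosixPath


def build_docs(files):
    """Turn a {path: text} mapping into Solr-style documents.

    Every document gets a fileName value so that it can be returned
    with each search hit (the field must be stored in the schema).
    """
    return [
        {
            "id": path,                              # assumed unique key
            "fileName": PurePosixPath(path).name,    # e.g. "sample1.txt"
            "text": text,                            # hypothetical content field
        }
        for path, text in files.items()
    ]


docs = build_docs({
    "/user/demo/sample1.txt": "the greatest show",
    "/user/demo/sample2.txt": "greatest hits",
})
payload = json.dumps(docs, indent=2)  # body for a JSON update request
print(payload)
```

With documents shaped like this, a search for "greatest" would return both documents, each with its own fileName value.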

Stored fields in Solr are getting displayed in queries , why?

I am new to Solr. I have created a new core and copied the default schema.xml to the conf/ folder. The only change I made is trivial:
<field name="id" type="string" indexed="true" stored="false" required="true" multiValued="false" />
As you can see, I set the id field to stored="false". As I understand it, the id field should no longer be displayed when I run a query, but it still is. I have tried restarting the Solr instance and re-running the command that indexes the file:
curl 'http://localhost:8983/solr/TwitterCore/update/json?commit=true' \
--data-binary @$(echo TwitterData_Core_Conf/TwitterText_en_demo.json) \
-H 'Content-type:application
According to the Solr wiki, this should have re-indexed my file. However, when I run my query again, I still see the id.
An example of a returned document (this is not the complete JSON node; I copied only some parts):
"text": [
"RT #FollowTrainTV: Moonseternity just joined #FollowTrainTV - Watch them stream on http://t.co/oMcOGA51kT"
],
"lang": [
"en"
],
"id": "0a8edfea-68f7-4b05-b370-27b5aba640b7", // I dont want to see this
"_version_": 1512067627994841000
Maybe someone can give me detailed steps on re-indexing.
When you change the schema.xml file and restart the Solr server, the changes apply only to newly indexed documents. This means you have to clear the index and re-index all documents. (Changes to the query-time tokenizer are the exception: they are active immediately after a server restart, but that is not the case here.) After re-indexing, the id field should no longer be visible.
Another remark: You don't have to test your queries with curl. When you connect to http://localhost:8983/solr with your web-browser you should find an admin interface there. There you can select a core and test your queries.
Refer to this document: https://lucene.apache.org/solr/guide/6_6/docvalues.html
Non-stored docValues fields will be also returned along with other stored fields when all fields are specified to be returned (e.g. “fl=*”) for search queries depending on the effective value of the useDocValuesAsStored parameter for each field. For schema versions >= 1.6, the implicit default is useDocValuesAsStored="true".
The string field type has docValues="true", which is why the id field appears in the search response.
You can either add the useDocValuesAsStored="false" parameter to the field or you can use a different fieldType, say text_general.
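The rule quoted above can be written down as a small truth function. This is a deliberately simplified model of the documented behaviour for fl=*, not Solr's actual implementation, and the function name is hypothetical.

```python
def returned_with_fl_star(stored, doc_values=False, use_doc_values_as_stored=True):
    """Simplified model: a field comes back under fl=* if it is stored,
    or if it has docValues and useDocValuesAsStored is in effect."""
    return bool(stored or (doc_values and use_doc_values_as_stored))


# The id field from the question: stored="false", but the string type
# has docValues="true" and useDocValuesAsStored defaults to true.
id_as_configured = returned_with_fl_star(stored=False, doc_values=True)

# Same field with useDocValuesAsStored="false" added to the definition.
id_with_override = returned_with_fl_star(
    stored=False, doc_values=True, use_doc_values_as_stored=False
)

print(id_as_configured, id_with_override)
```

Under this model the id field keeps appearing until either docValues or useDocValuesAsStored is switched off for it, which matches the two options suggested above.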

Solr highlighting for external fields

I would like to use Solr highlighting, but our documents are only indexed and not stored. The field values are found in a separate database. Is there a way to pass in the text to be highlighted without Solr needing to pull that text from its own stored fields? Or is there an interface that would allow me to pass in a query, a field name, a field value and get back snippets?
I'm on Solr 5.1.
Lucene also supports highlighting (it returns offsets) for non-stored content by using docValues.
Enabling a field for docValues only requires adding docValues="true" to the field (or field type) definition, e.g.:
<field name="manu_exact" type="string" indexed="true" stored="false" docValues="true" />
(This was introduced in Lucene 8.5, see SOLR-14194, so it is not available on Solr 5.1.)
Alternatively, you could reindex the result set (read from the database) in an embedded Solr instance, run the query with the same set of keywords with highlighting turned on, and get the highlighted text back.
To get this setup working, you could read the schema and solrconfig as resources from a local jar and extract them to a temporary Solr core directory.
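Once the database text has been indexed into such a temporary core with the field stored, the highlighting request itself is an ordinary select query with the hl parameters switched on. The sketch below only builds that URL; the core name temp_core, the field name body and the helper name are hypothetical.

```python
from urllib.parse import urlencode


def highlight_query_url(select_url, query, field, snippets=1):
    """Build a select URL with highlighting enabled for one field."""
    params = {
        "q": query,
        "fl": "id",               # only ids are needed next to the snippets
        "hl": "true",             # turn highlighting on
        "hl.fl": field,           # field to highlight
        "hl.snippets": snippets,  # snippets per field
        "wt": "json",
    }
    return f"{select_url}?{urlencode(params)}"


url = highlight_query_url(
    "http://localhost:8983/solr/temp_core/select", "body:example", "body"
)
print(url)
```

The snippets come back in the highlighting section of the response, keyed by document id, so they can be matched to the rows read from the database.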

Resources