Is there a way to overcome this limitation of Solr?
How do I add an additional field (column) to a collection that I have already created and that already holds crores of documents?
To add a new field to an existing schema, you can use the Solr Schema API:
curl -X POST -H 'Content-type:application/json' --data-binary '{
"add-field":{
"name":"sell_by",
"type":"pdate",
"stored":true }
}' http://localhost:8983/solr/gettingstarted/schema
The type parameter corresponds to the field type you want the new field to have.
If you're using the old schema.xml format, you can add the field definition to the XML there:
<field name="sell_by" type="pdate" indexed="true" stored="true"/>
You'll have to reload the configuration for the collection after changing it. If you're using Zookeeper (i.e. you're manually uploading your configuration to Zookeeper), you can use zkCli.sh and downconfig and upconfig to download and upload the configuration set.
After adding the field, you'll have to reindex the documents that should contain the field by submitting them to Solr again, so that the content is added to the field as expected.
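As a rough sketch, the same Schema API call can be prepared from Python using only the standard library. The helper name build_add_field_request is made up for illustration, and the base URL and collection are simply the ones from the curl example above; the request is built but not sent, so nothing here needs a running Solr.

```python
import json
import urllib.request


def build_add_field_request(base_url, collection, name, field_type, stored=True):
    """Build (but do not send) a Schema API request that adds one field.

    Hypothetical helper: it only mirrors the curl example's JSON body.
    """
    payload = {"add-field": {"name": name, "type": field_type, "stored": stored}}
    return urllib.request.Request(
        url=f"{base_url}/{collection}/schema",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-type": "application/json"},
        method="POST",
    )


# Values taken from the curl example; adjust them for your own setup.
req = build_add_field_request(
    "http://localhost:8983/solr", "gettingstarted", "sell_by", "pdate"
)
print(req.full_url)
print(req.data.decode("utf-8"))
# To actually send it to a running Solr: urllib.request.urlopen(req)
```

Keeping the build step separate from the send step makes it easy to inspect the JSON before it touches a collection that already holds a lot of data.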
I am using Solr 6.6. I am trying atomic updates on a date field. The field is defined in schema as
<field name="inventory_update_time" type="date" indexed="true" stored="true" omitNorms="true" multiValued="false" omitTermFreqAndPositions="true"/>
and I am firing the curl request as
curl 'localhost:8081/solr/sitename/update' -H 'Content-type:application/json' -d '[{"id":"9988062","inventoryUpdateTime":"2018-07-03T06:29:29Z"}]'
but the date is not getting updated.
any suggestions?
Your field name and the key in your JSON are not the same. You're not doing an atomic update either, since that would require a "set" command.
Your schema has the field name set as inventory_update_time, but in your JSON structure you're using inventoryUpdateTime as the key.
To actually perform an atomic update:
[
{
"id":"9988062",
"inventory_update_time":{
"set":"2018-07-03T06:29:29Z"
}
}
]
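If you generate these requests from code, a small helper can wrap each value in a "set" command so the body has the atomic-update shape shown above. This is only a sketch: atomic_set is a made-up name, and the keyword arguments must match the field names in your schema exactly.

```python
import json


def atomic_set(doc_id, **fields):
    """Return an atomic-update body that sets each given field on one document.

    Hypothetical helper; keyword names must match the schema's field names.
    """
    doc = {"id": doc_id}
    for name, value in fields.items():
        # Wrapping the value in {"set": ...} is what makes this an atomic update.
        doc[name] = {"set": value}
    return [doc]


body = json.dumps(
    atomic_set("9988062", inventory_update_time="2018-07-03T06:29:29Z")
)
print(body)
```

The printed string can be passed to curl with -d, in place of the hand-written JSON.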
I am new to Solr. I have created a new core and copied the default schema.xml to the conf/ folder. The change I have made is trivial:
<field name="id" type="string" indexed="true" stored="false" required="true" multiValued="false" />
As you can see, I set the id field to stored=false. As I understand it, the id field should no longer be returned when I run a query, but it still is. I have tried restarting the Solr instance and re-running the command that indexes the file:
curl 'http://localhost:8983/solr/TwitterCore/update/json?commit=true'
--data-binary @$(echo TwitterData_Core_Conf/TwitterText_en_demo.json)
-H 'Content-type:application
According to the Solr Wiki, this should have re-indexed my file. However, when I run my query again, I still see the id.
An example of the document returned (this is not the complete JSON node, I just copied some parts):
"text": [
"RT #FollowTrainTV: Moonseternity just joined #FollowTrainTV - Watch them stream on http://t.co/oMcOGA51kT"
],
"lang": [
"en"
],
"id": "0a8edfea-68f7-4b05-b370-27b5aba640b7", // I dont want to see this
"_version_": 1512067627994841000
Maybe someone can give me detailed steps on re-indexing.
When you change schema.xml and restart the Solr server, the changes apply only to newly indexed documents. This means you have to clear the index and re-index all documents. (The exception is changes to query-time analysis, such as the query tokenizer, which take effect immediately after a restart, but that is not the case here.) After re-indexing, the id field should no longer be visible.
Another remark: You don't have to test your queries with curl. When you connect to http://localhost:8983/solr with your web-browser you should find an admin interface there. There you can select a core and test your queries.
Refer to this document: https://lucene.apache.org/solr/guide/6_6/docvalues.html
Non-stored docValues fields will be also returned along with other
stored fields when all fields are
specified to be returned (e.g. “fl=*”) for search queries depending on
the effective value of the useDocValuesAsStored parameter for each
field. For schema versions >= 1.6, the implicit default is
useDocValuesAsStored="true".
The string field type has docValues="true". That is why the field still appears in the search response.
You can either add the useDocValuesAsStored="false" parameter to the field or you can use a different fieldType, say text_general.
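If the core uses a managed schema rather than a hand-edited schema.xml (an assumption, since the question edits schema.xml directly), the same property could be set through the Schema API's replace-field command. The sketch below only builds the JSON body. replace-field replaces the whole definition, so the other attributes are repeated from the field definition in the question.

```python
import json

# Full definition of the id field, copied from the question, with
# useDocValuesAsStored switched off. Assumes a managed schema.
payload = {
    "replace-field": {
        "name": "id",
        "type": "string",
        "indexed": True,
        "stored": False,
        "required": True,
        "multiValued": False,
        "useDocValuesAsStored": False,
    }
}

body = json.dumps(payload, indent=2)
print(body)
# This body would be POSTed to http://localhost:8983/solr/TwitterCore/schema
# with the header Content-type: application/json.
```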
How do I edit a schema such as the gettingstarted collection as mentioned in
https://lucene.apache.org/solr/quickstart.html
Thanks
Joyce
Solr 5 uses a managed schema by default, while Solr 4 used the schema.xml file. Solr 5 automatically creates the schema for you by guessing the type of the field. Once the type is assigned to the field, you can't change it. You have to set the type of the field before you add data to Solr 5.
To change the schema in Solr 5, you will want to use the Schema API, which is a REST interface.
Schemaless Mode states the following:
You Can Still Be Explicit - Even if you want to use schemaless mode for most fields, you can still use the Schema API to pre-emptively create some fields, with explicit types, before you index documents that use them.
... Once a field has been added to the schema, its field type is fixed.
If you are using the quick start guide for Solr 5, here's what you have to do if you want to explicitly specify the field types:
After you run the following command: bin/solr start -e cloud -noprompt
Then enter a command like this:
curl -X POST -H 'Content-type:application/json' --data-binary '{
"add-field" : { "name":"MYFIELDNAMEHERE", "type":"tlong",
"stored":true}}' http://localhost:8983/solr/gettingstarted/schema
The previous command will force the MYFIELDNAMEHERE field to be a tlong. Replace MYFIELDNAMEHERE with the field name that you want to be explicitly set, and change tlong to the Solr type that you want to use.
After doing that, then load your data as usual.
I have the following error: [doc=testIngestID411] unknown field 'dateImport'
At the beginning I did not have the field 'dateImport' in my solr schema. I decided to add it after launching solr a few times.
1. I added this field to schema.xml:
<filed name="dateImport" type="string" indexed="true" stored="true" required="true"/>
after the other pre-existing fields.
I removed all my existing documents using :
<delete><query>*:*</query></delete>
Stopped SOLR (using ctrl+c or by killing the jar process)
Restarted SOLR (using java -jar start.jar)
Then, when I try to insert a document with a field named dateImport, I get:
"unknown field 'dateImport'"
Extra information:
If I modify a field that already existed (i.e. one that was there the first time I launched this Solr core), the modification takes effect. For instance, if I change a field from not required to required=true and restart Solr, I can no longer add a document without specifying this field.
Also I have noticed, using the web admin interface:
On the left there is a tab called "Schema", and this schema contains all my modifications (including the field dateImport). Above this tab there is another tab named "Schema Browser". The field 'dateImport' DOES NOT appear here :( .
What can I do to get this new field working??
Thank you
Change <filed ... to <field ...
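A misspelled element name like this is easy to miss by eye. As a rough sketch, a few lines of Python can flag elements in a schema file whose tag is not one you expect. The KNOWN_TAGS set below is an assumed, deliberately incomplete list, and SCHEMA_SNIPPET is a shortened stand-in for the real schema.xml.

```python
import xml.etree.ElementTree as ET

# Assumption: a partial list of element names; a real schema.xml allows more.
KNOWN_TAGS = {
    "field", "dynamicField", "copyField", "fieldType",
    "uniqueKey", "types", "fields", "similarity",
}

# Shortened stand-in for schema.xml, including the typo from the question.
SCHEMA_SNIPPET = """
<schema name="example" version="1.5">
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <filed name="dateImport" type="string" indexed="true" stored="true" required="true"/>
</schema>
"""


def unknown_elements(schema_xml):
    """Return (tag, name) pairs for elements whose tag is not in KNOWN_TAGS."""
    root = ET.fromstring(schema_xml.strip())
    return [
        (el.tag, el.get("name"))
        for el in root.iter()
        if el is not root and el.tag not in KNOWN_TAGS
    ]


print(unknown_elements(SCHEMA_SNIPPET))  # [('filed', 'dateImport')]
```

To check a real file, read its contents and pass them to unknown_elements, extending KNOWN_TAGS with whatever other elements your schema legitimately uses.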
We are using Solr 1.4.
How do we delete documents that are more than a month old?
We are doing something similar where we purge items from one of our indexes, using curl and taking advantage of the timestamp field in the Solr schema.
Here is the curl command that you would issue to delete items older than 30 days (using DateMathParser to calculate based on current day), using the timestamp field in the schema.
curl "http://localhost:8983/solr/update?commit=true" -H "Content-Type: text/xml" \
--data-binary "<delete><query>timestamp:[* TO NOW/DAY-30DAYS]</query></delete>"
Of course you will need to change the url to match your solr instance and you may choose to use a different field.
Also here is the field definition for the timestamp field from the schema.xml that comes with the Solr distribution in the example folder.
<field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
You need to POST in order to do deletes but if you use the post.jar from the example folder in the installation it is simply:
java -Ddata=args -Dcommit=yes -jar post.jar "<delete><query>$DateField:[* TO $DateOneMonthAgo]</query></delete>"
where $DateField is the name of the field where the date is stored and $DateOneMonthAgo is the date one month before now (for example, 2011-11-09T11:48:00Z).
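If you would rather compute the cutoff date yourself than rely on Solr's date math, the sketch below builds the delete message with an explicit UTC timestamp. The function name delete_older_than is made up, "one month" is approximated as 30 days, and the field name timestamp is the one from the example schema.

```python
from datetime import datetime, timedelta, timezone


def delete_older_than(field, days, now=None):
    """Build a delete-by-query message for documents older than `days` days.

    Hypothetical helper; `now` can be passed in to get reproducible output.
    """
    now = now or datetime.now(timezone.utc)
    # Solr expects UTC timestamps in the form 2011-11-09T11:48:00Z.
    cutoff = (now - timedelta(days=days)).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"<delete><query>{field}:[* TO {cutoff}]</query></delete>"


# Fixed "now" so the result is predictable.
fixed_now = datetime(2011, 12, 9, 11, 48, 0, tzinfo=timezone.utc)
print(delete_older_than("timestamp", 30, now=fixed_now))
# <delete><query>timestamp:[* TO 2011-11-09T11:48:00Z]</query></delete>
```

The resulting string can be passed to post.jar or to curl's --data-binary exactly as in the commands above.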