How to delete the documents a month ago - solr

We are using Solr 1.4.
How to delete the documents a month ago?

We are doing something similar where we purge items from one of our indexes, using curl and taking advantage of the timestamp field in the Solr schema.
Here is the curl command that you would issue to delete items older than 30 days (using DateMathParser to calculate based on current day), using the timestamp field in the schema.
curl "http://localhost:8983/solr/update?commit=true" -H "Content-Type: text/xml"
--data-binary "<delete><query>timestamp:[* TO NOW/DAY-30DAYS]</query></delete>"
Of course you will need to change the url to match your solr instance and you may choose to use a different field.
Also here is the field definition for the timestamp field from the schema.xml that comes with the Solr distribution in the example folder.
<field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>

You need to POST in order to do deletes but if you use the post.jar from the example folder in the installation it is simply:
java -Ddata=args -Dcommit=yes -jar post.jar "<delete><query>$DateField:[* TO $DateOneMonthAgo]</query></delete>"
where $DateField is the name of the field where the date is stored and $DateOneMonthAgo is the date one month from now (2011-11-09T11:48:00Z)

Related

Is there a way to overcome the limitation of Solr

Is there a way to overcome the limitation of Solr
How to add an additional column to the collection that I have already created and have crores of data in it.
To add a new field to an existing schema, you can use the Solr Schema API:
curl -X POST -H 'Content-type:application/json' --data-binary '{
"add-field":{
"name":"sell_by",
"type":"pdate",
"stored":true }
}' http://localhost:8983/solr/gettingstarted/schema
The type parameter corresponds to the field type you want the new field to have.
If you're using the old schema.xml format, you can add the field type in the XML there:
<field name="sell_by" type="pdate" indexed="true" stored="true"/>
You'll have to reload the configuration for the collection after changing it. If you're using Zookeeper (i.e. you're manually uploading your configuration to Zookeeper), you can use zkCli.sh and downconfig and upconfig to download and upload the configuration set.
After adding the field, you'll have to reindex the documents that should contain the field by submitting them to Solr again, so that the content is added to the field as expected.

Is there a way to view search document fields that are only indexed but not stored via the solr admin panel using the query tool?

I want to view the indexed but not stored fields of a solr search document in the solr admin query tool, is there any provision for this?
Example Field Configuration:
<field name="product_data" type="string" indexed="true" stored="false" multiValued="false" docValues="true" />
If you're using schema version 1.6, Solr will automagically fetch the values from the stored docValues, even if the field itself is set as stored="false". Include the field name in fl to get the values.
However, even if you're looking for the actual tokens indexed for a document / field / value, using the Analysis page is usually the preferred way as it allows you to tweak the value and see the response quickly. The Luke Request Handler / Tool is useful if you want to explore the actual indexed tokens.

Solr-Retrieve name of document where the word is found

I am using queries (Solr Admin) to search words through two text documents that are in my HDFS. How can i retrieve the name of the document that the word is found in. I am using this project https://github.com/lucidworks/hadoop-solr
I am creating a collection using bin/solr -e cloud and i am using "data_driven_schema_configs" from server/solr/configsets/ directory.
I tryied adding <field name="fileName" type="string" indexed="true" stored="true" /> inside managed-schema at ~/solr-6.1.0/server/solr/configsets/data_driven_schema_configs/conf, and also change it name to schema.xml, but in this directory there isn't any dataConfig file to add <field column="file" name="fileName"/> as i see it in some other posts with similar questions, but not for SolrCloud, so i don't know if that i am trying is correct. What changes, and in which directories, i have to do, to be able to make it happen.
Example: I am searching the word "greatest" which can found in both documents. How can i see in which document is every result, sample1.txt or sample2.txt
Same thing I said when you mentioned this question on IRC:
Your Solr schema must contain a field where you put the name, set to stored="true", and you must include that field, with a relevant value, in every document when you index. Most schema changes require a full reindex.
https://wiki.apache.org/solr/HowToReindex

Stored fields in Solr are getting displayed in queries , why?

I am new to using Solr , and I have made a new core and copied the default schema.xml to the conf/ folder. The changes I have made is very trivial .
<field name="id" type="string" indexed="true" stored="false" required="true" multiValued="false" />
As you can see, I set the id field to stored=false. As per my understanding, the field id should not be displayed now when I do a query search. But that is not happening. I have tried restarting solr instance, and did the query to index the file again.
curl 'http://localhost:8983/solr/TwitterCore/update/json?commit=true'
--data-binary #$(echo TwitterData_Core_Conf/TwitterText_en_demo.json)
-H 'Content-type:application
As per Solr Wiki , this should have re-indexed my file. However when I run my query again, I still see the Id .
An example of the document returned (this is not the complete JSON node , I just copied some parts ) :
"text": [
"RT #FollowTrainTV: Moonseternity just joined #FollowTrainTV - Watch them stream on http://t.co/oMcOGA51kT"
],
"lang": [
"en"
],
"id": "0a8edfea-68f7-4b05-b370-27b5aba640b7", // I dont want to see this
"_version_": 1512067627994841000
Maybe someone can give me detailed steps on re-indexing.
When you change the schema.xml file and restart the solr-server, the changes only apply for new documents. This means you have to clear the index and re-index all documents (Except at query tokenizer, these changes are active immediately after server restart, but this is not the case here). After re-indexing, the id field should not be visible any more.
Another remark: You don't have to test your queries with curl. When you connect to http://localhost:8983/solr with your web-browser you should find an admin interface there. There you can select a core and test your queries.
Refer to this https://lucene.apache.org/solr/guide/6_6/docvalues.html document.
Non-stored docValues fields will be also returned along with other
stored fields when all fields are
specified to be returned (e.g. “fl=*”) for search queries depending on
the effective value of the useDocValuesAsStored parameter for each
field. For schema versions >= 1.6, the implicit default is
useDocValuesAsStored="true".
The String field type has docValues="true" . That is the reason why it is appearing in the search response.
You can either add the useDocValuesAsStored="false" parameter to the field or you can use a different fieldType, say text_general.

SOLR 4.3 Modification of schema.xml not considered by the server

I have the following error: [doc=testIngestID411] unknown field 'dateImport'
At the beginning I did not have the field 'dateImport' in my solr schema. I decided to add it after launching solr a few times.
1. I added this field to schema.xml:
<filed name="dateImport" type="string" indexed="true" stored="true" required="true"/>
after the other pre-existing fields.
I removed all my existing documents using :
<delete><query>*:*</query></delete>
Stopped SOLR (using ctrl+c or by killing the jar process)
Restarted SOLR (using java -jar start.jar)
Then, when I try to insert a document with a filed named dateImport I got :
"unknown field 'dateImport'"
Extra information:
If I modify one field which existed before (i.e which was there the first time I launched this SOLR core) the modification is well considered. For instance, if I change one field that was not required for required=true (and restart solr). Then I cannot add a document without specifying this field.
Also I have noticed, using the web admin interface:
On the left there is a tab call "Schema", this schema contains all modifications (like the field dateImport). Above this tab there is another tab named "Schema Browser". The field 'dateImport' DOES NOT appear here :( .
What can I do to get this new field working??
Thank you
Change <filed ... to <field ...

Resources