I'm new to Solr and I'm trying out different ways to delete a document.
I have tested
update?commit=true&stream.body=<delete><query>*:*</query></delete>
Another method, which I found in the Solr quick start, says that a document can be deleted
using: bin/post -c gettingstarted -d "<delete><id>SP2514N</id></delete>"
When I try it, it seems to work, but when I search for the id I still find the document.
Why doesn't it work? I also wonder whether there are other ways to delete a document (for example, using the admin console).
Try this
http://localhost:8983/solr/gettingstarted/update?stream.body=<delete><query>id:SP2514N</query></delete>&commit=true
check Here for reference
It doesn't look like you issued a commit after you deleted (hence the commit=true in the URL deletion). Add a <commit/> after your delete and that should work. (see more on Solr Update XML here)
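As a rough illustration of the delete-plus-commit request described above, here is a minimal Python sketch that only builds the URL (it does not contact Solr). The function name `delete_by_id_url` is hypothetical; the host, collection and id are the ones from the question, and it assumes the update handler accepts `stream.body` as in the URL shown earlier.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape


def delete_by_id_url(base_url: str, collection: str, doc_id: str) -> str:
    """Build an update URL that deletes one document and commits (hypothetical helper)."""
    # Escape the id so characters such as & or < cannot break the XML command.
    body = f"<delete><id>{escape(doc_id)}</id></delete>"
    # commit=true makes the deletion visible to searches right away.
    params = urlencode({"stream.body": body, "commit": "true"})
    return f"{base_url}/{collection}/update?{params}"


url = delete_by_id_url("http://localhost:8983/solr", "gettingstarted", "SP2514N")
print(url)
```

Without the `commit=true` part the document would typically still show up in searches until a commit happens some other way.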
How do I delete all the documents in my Solr index using the Solr Admin?
I tried using the URL and it works, but I want to know if the same can be done using the Admin UI.
Use one of the queries below in the Document tab of Solr Admin UI:
XML:
<delete><query>*:*</query></delete>
JSON:
{'delete': {'query': '*:*'}}
Make sure to set the Document Type drop-down to Solr Command (raw XML or JSON).
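If you prefer to generate the command text rather than type it, a small Python sketch along these lines could produce both forms for pasting into the Documents tab. The helper names are made up for illustration; the JSON variant uses strict double-quoted JSON, which is an assumption about what is safest to paste rather than a requirement stated above.

```python
import json
from xml.sax.saxutils import escape


def delete_command_xml(query: str) -> str:
    """Return the XML delete-by-query command for the given query (hypothetical helper)."""
    return f"<delete><query>{escape(query)}</query></delete>"


def delete_command_json(query: str) -> str:
    """Return the JSON delete-by-query command for the given query (hypothetical helper)."""
    return json.dumps({"delete": {"query": query}})


print(delete_command_xml("*:*"))
print(delete_command_json("*:*"))
```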
Update: newer versions of Solr may work better with this answer: https://stackoverflow.com/a/48007194/3692256
My original answer is below:
I'm cheating a little, but not as much as writing the query by hand.
Since I've experienced the pain of accidental deletions before, I try to foolproof my deletions as much as possible (in any kind of data store).
1) Run a query in the Solr Admin Query screen, by only using the "q" parameter at the top left. Narrow it to the items you actually want to delete. For this example, I'm using *:*, but you can use things like id:abcdef or a range or whatever. If you have a crazy complex query, you may find it easier to do this multiple times, once for each part of the data you wish to delete.
2) On top of the results, there is a grayed out URL. If you hover the mouse over it, it turns black. This is the URL that was used to get the results. Right (context) click on it and open it in a new tab/window. You should get something like:
http://localhost:8983/solr/my_core_name/select?q=*%3A*&wt=json&indent=true
Now, I want to get it into a delete format. I replace the select?q= with update?commit=true&stream.body=<delete><query> and, at the end, the &wt=json&indent=true with </query></delete>.
So I end up with:
http://localhost:8983/solr/my_core_name/update?commit=true&stream.body=<delete><query>*%3A*</query></delete>
Take a deep breath, do whatever you do for good luck, and submit the url (enter key works).
Now, you should be able to go back to the Solr admin page and run the original query and get zero results.
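The manual URL rewrite above can also be sketched in Python, which avoids editing the string by hand. This is only an illustration under the same assumptions as the steps above (a `/select` URL with a `q` parameter, and an update handler that accepts `stream.body`); the function name `select_url_to_delete_url` is hypothetical, and nothing is sent to Solr.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit
from xml.sax.saxutils import escape


def select_url_to_delete_url(select_url: str) -> str:
    """Turn a Solr select URL into the matching delete-by-query URL (hypothetical helper)."""
    parts = urlsplit(select_url)
    if not parts.path.endswith("/select"):
        raise ValueError("expected a /select URL")
    # parse_qs decodes percent-escapes, so q=*%3A* becomes *:*.
    query = parse_qs(parts.query)["q"][0]
    body = f"<delete><query>{escape(query)}</query></delete>"
    # Swap the trailing "select" path segment for "update".
    update_path = parts.path[: -len("select")] + "update"
    new_query = urlencode({"commit": "true", "stream.body": body})
    return urlunsplit((parts.scheme, parts.netloc, update_path, new_query, ""))


delete_url = select_url_to_delete_url(
    "http://localhost:8983/solr/my_core_name/select?q=*%3A*&wt=json&indent=true"
)
print(delete_url)
```

The printed URL is percent-encoded, so it looks different from the hand-written one but decodes to the same delete command.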
For everyone who doesn't like a lot of words :-)
curl http://localhost:8080/solr/update -H "Content-type: text/xml" --data-binary '<delete><query>*:*</query></delete>'
curl http://localhost:8080/solr/update -H "Content-type: text/xml" --data-binary '<commit />'
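For anyone scripting this instead of using curl, a rough Python equivalent with only the standard library might look like the sketch below. It builds the two POST requests but leaves sending them as a commented-out step, so nothing is deleted by running it as written; the helper name `xml_update_request` is made up for this example.

```python
from urllib.request import Request

# Same endpoint as in the curl commands above.
UPDATE_URL = "http://localhost:8080/solr/update"


def xml_update_request(xml_body: str) -> Request:
    """Build (but do not send) a POST request carrying an XML update command."""
    return Request(
        UPDATE_URL,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


delete_all = xml_update_request("<delete><query>*:*</query></delete>")
commit = xml_update_request("<commit/>")

# To actually run them against a live Solr:
# from urllib.request import urlopen
# urlopen(delete_all); urlopen(commit)
```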
Select XML in the collection's Documents tab and submit the command below.
<delete><query>*:*</query></delete>
This solution only applies if you are deleting all the documents in multiple collections, not to selective deletion:
I had the same scenario, where I needed to delete all the documents in multiple collections. There were close to 500k documents in each shard, and each collection had multiple shards. Updating and deleting the documents by query was a big task, so I followed the process below:
Used the Solr API for getting the details for all the collections -
http://<solrIP>:<port>/solr/admin/collections?action=clusterstatus&wt=json
This gives the details like name of collection, numShards, configname, router.field, maxShards, replicationFactor, etc.
Saved the output json with the above details in a file for future reference and took the backups of all the collections I needed to delete the documents in, using the following API:
http://<solr-ip>:<port>/solr/admin/collections?action=BACKUP&name=myBackupName&collection=myCollectionName&location=/path/to/my/shared/drive
Next, I deleted all the collections whose documents I needed to remove, using the following:
http://<solr-ip>:<port>/solr/admin/collections?action=DELETE&name=collectionname
Re-created all the collections using the details in the Step 1 and the following API:
http://<solr-ip>:<port>/solr/admin/collections?action=CREATE&name=collectionname&numShards=number&replicationFactor=number&maxShardsPerNode=number&collection.configName=configname&router.field=routerfield
I executed the above steps in a loop for all the collections, and it finished in seconds for around 100 collections with huge amounts of data. Plus, I had backups of all the collections.
Refer to the Solr Collections API documentation for the other APIs.
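The backup, delete and re-create loop described above could be scripted roughly as follows. This Python sketch only builds the three Collections API URLs for one collection; it assumes `action=DELETE` is the call that removes a collection, and the host `solr-host`, the collection name, the backup name suffix and the create parameters are all placeholders, not values from the original setup.

```python
from urllib.parse import urlencode


def collections_api_url(base_url: str, action: str, params: dict) -> str:
    """Build a Collections API URL for the given action (hypothetical helper)."""
    query = urlencode({"action": action, **params})
    return f"{base_url}/solr/admin/collections?{query}"


def reset_collection_urls(base_url, name, backup_location, create_params):
    """Return the BACKUP, DELETE and CREATE URLs, in the order they would be called."""
    return [
        collections_api_url(
            base_url,
            "BACKUP",
            {"name": f"{name}_backup", "collection": name, "location": backup_location},
        ),
        collections_api_url(base_url, "DELETE", {"name": name}),
        collections_api_url(base_url, "CREATE", {"name": name, **create_params}),
    ]


urls = reset_collection_urls(
    "http://solr-host:8983",
    "products",
    "/backups",
    {"numShards": 2, "replicationFactor": 1, "collection.configName": "products_conf"},
)
for url in urls:
    print(url)
```

In a real run the create parameters would come from the saved cluster status output, and each URL would be requested only after the previous call succeeded.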
Under the Documents tab, select "raw XML or JSON" under Document Type and just add the query you need using the unique identifiers for each document.
{'delete': {'query': 'filter(product_id:(25634 25635 25636))'}}
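When the list of identifiers comes from somewhere else, the command can be generated instead of typed. The Python sketch below is one possible way to do that; the helper name is hypothetical, it assumes the values are simple tokens such as numeric ids that need no query escaping, and it leaves out the `filter(...)` wrapper used above to keep the example minimal.

```python
import json


def delete_by_field_values(field: str, values) -> str:
    """Build a JSON delete-by-query command matching any of the given values."""
    # Values are joined with spaces inside parentheses, as in the command above.
    joined = " ".join(str(value) for value in values)
    return json.dumps({"delete": {"query": f"{field}:({joined})"}})


command = delete_by_field_values("product_id", [25634, 25635, 25636])
print(command)
```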
If you want to delete some documents by ID, you can use the Solr POST tool.
./post -c $core_name ./delete.xml
Where the delete.xml file contains the document IDs:
<delete>
<id>a3f04b50-5eea-4e26-a6ac-205397df7957</id>
</delete>
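A file like this can be generated from a list of ids. The following Python sketch is one way to do it with the standard library; the function name `build_delete_xml` is made up, and writing the result to `delete.xml` is left as a commented-out step.

```python
import xml.etree.ElementTree as ET


def build_delete_xml(doc_ids) -> str:
    """Return a <delete> command with one <id> element per document id."""
    root = ET.Element("delete")
    for doc_id in doc_ids:
        # ElementTree escapes special characters in the text for us.
        ET.SubElement(root, "id").text = doc_id
    return ET.tostring(root, encoding="unicode")


xml_text = build_delete_xml(["a3f04b50-5eea-4e26-a6ac-205397df7957"])
print(xml_text)

# To save it for the post tool:
# with open("delete.xml", "w", encoding="utf-8") as handle:
#     handle.write(xml_text)
```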
Whenever I query with q=*:* it shows all the documents, but when I query with q=programmer, 0 docs are found (contents is the default search field).
My schema has these fields: id (unique), author, title, contents.
The query also works fine for:
q=author:"Value" or q=title:"my book" etc.; only the contents field returns no results.
Also, when I query using the spell checker (/spell?q=programmer), the output shows spelling suggestions for this word, even though 'programmer' is spelled correctly and present in many documents.
I used the example docs as a reference for my configuration.
I am getting this all of a sudden; initially it worked fine.
I guess there is a problem only in the contents field, but I cannot figure it out.
Is it because the index was not created properly for the contents field?
(I am using Solr 4.2 on Windows 7 with Tomcat as the web server.)
Please help. Thanks a lot in advance.
Are you sure you set the default search field? The reason you have this problem might be because you didn't set the <defaultSearchField> field in your schema.xml file. This is why "q=author:value" works while q=WHATEVER doesn't.
The <defaultSearchField> is used by Solr when parsing queries to
identify which field name should be searched in queries where an
explicit field name has not been used.
But also consider this:
The <defaultSearchField> is used by Solr when parsing queries to
identify which field name should be searched in queries where an
explicit field name has not been used. It is preferable to not use or
rely on this setting; instead the request handler or query LocalParams
for a search should specify the default field(s) to search on. This
setting here can be omitted and it is being considered for
deprecation.
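One way to follow that advice is to pass the default field with each request through the `df` parameter instead of relying on the schema setting. The Python sketch below only builds such a URL; the core name `collection1` and the helper name `select_url` are assumptions for illustration, and the field name `contents` is taken from the question.

```python
from urllib.parse import urlencode


def select_url(core_url: str, query: str, default_field=None) -> str:
    """Build a select URL, optionally naming the default search field via df."""
    params = {"q": query, "wt": "json"}
    if default_field is not None:
        # df tells the query parser which field unqualified terms apply to.
        params["df"] = default_field
    return f"{core_url}/select?{urlencode(params)}"


print(select_url("http://localhost:8983/solr/collection1", "programmer", "contents"))
print(select_url("http://localhost:8983/solr/collection1", "contents:programmer"))
```

The second call shows the alternative of qualifying the term with the field name directly.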
Do you have any data in your instance? Try q=*:* and see what it returns. "for" is a stop word, so maybe it was filtered out. Look for some other value to test with.
Solr version :
4.2.1
Objective:
I am trying to get a very simplistic Solr example off the ground
So far:
Installed solr
Was able to run the example\tutorial successfully http://lucene.apache.org/solr/4_2_1/tutorial.html
Next:
Now I am trying to create my own schema
I have created a schema : http://pastebin.com/vj4ATa8d
And a Test Doc:http://pastebin.com/7fvZ5GTQ
I have added the doc to Solr using the command
java -jar post.jar testdoc.xml
What’s working:
In Solr Admin- I can see the schema
I can see one document uploaded
I can go to Admin console and query as follows:
Specify q as "*:*". This works and shows the document
http://localhost:8983/solr/collection2/select?q=*%3A*&wt=xml&indent=true
What does not work:
If I give q as Nashua- I see no results
This is the default search field
Other attributes didn't work either
http://localhost:8983/solr/collection2/select?q=Nashua&wt=xml&indent=true
The debug response http://pastebin.com/fTneyEba
You need to either copy your fields into the default search field (in this case text) or qualify your query with the field you want to search against:
.../select?q=city:Nashua&wt=xml&indent=true
Things to read up on:
Default Search Field
Copy Fields
Both are documented here:
https://wiki.apache.org/solr/SchemaXml
How do I pick/delete all the documents from Solr using the boolean NOT notion?
i.e. How do I delete all the documents from Solr whose id does NOT start with A59?
Use - to indicate NOT.
For example, to query documents with id not starting with A59, the query would be: -id:A59*, that is: /solr/select/?q=-id:A59*
To delete by query, post the query in a delete message to the update handler, as specified here.
EDIT: NOT (all uppercase) can also be used as boolean operator
Exclamation works for NOT as well, so:
/solr/select/?q=!id:A59*
should work in the case above.
I don't believe that a negative delete by query works. See this Jira ticket: https://issues.apache.org/jira/browse/SOLR-381
They do say that there is a workaround, which is to prefix the query with *:*, but I have not had any luck with that.
This does not work (same with using NOT)
java -Ddata=args -jar /opt/solr/example/exampledocs/post.jar "-userid:*"
java -jar /opt/solr/example/exampledocs/post.jar *.xml
Adding in a *:* gives a syntax error (same with using NOT)
java -Ddata=args -jar /opt/solr/example/exampledocs/post.jar "*:* -userid:*"
java -jar /opt/solr/example/exampledocs/post.jar *.xml
SimplePostTool: version 1.4
SimplePostTool: POSTing args to http://localhost:8983/solr/update..
SimplePostTool: FATAL: Solr returned an error #400 Error parsing Lucene query
SimplePostTool: version 1.4
Putting the - symbol in front of a field excludes documents with that particular value. It gives a result like NOT EQUAL.
The following is a sample URL query string in which I have added "&fq=-HQ_City_Code:MEL".
It will skip all the results that have the HQ_City_Code value MEL.
http://localhost:8983/solr/HQ_SOLR_Hotels/select?q=*:*&fq=HQ_National_Code:TH&fq=HQ_TYPE:hotel_EN&fq=HQ_Country_Code:AU&fq=-HQ_City_Code:MEL&wt=json&indent=true
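A query string with several include and exclude filters can be assembled programmatically. The Python sketch below shows one way; the function name `filtered_select_url` is hypothetical, and the field names are the ones from the sample URL.

```python
from urllib.parse import urlencode


def filtered_select_url(core_url: str, include, exclude) -> str:
    """Build a match-all select URL with positive and negated fq filters."""
    params = [("q", "*:*")]
    params += [("fq", f"{field}:{value}") for field, value in include]
    # A leading - turns the filter into "field is NOT value".
    params += [("fq", f"-{field}:{value}") for field, value in exclude]
    params += [("wt", "json"), ("indent", "true")]
    return f"{core_url}/select?{urlencode(params)}"


url = filtered_select_url(
    "http://localhost:8983/solr/HQ_SOLR_Hotels",
    include=[("HQ_National_Code", "TH")],
    exclude=[("HQ_City_Code", "MEL")],
)
print(url)
```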
Before deleting, please make sure that the ids you are referring to are strings and are never formed by combining two terms.
The way I would do it is to read the data from Solr with a script and delete the ids singly or in batches, because that gives better control and validation over each id, which reduces the risk of a wrong deletion.
Hence:
1 Read the data from Solr with a script using
/solr/select/?q=id:A59*
2 Verify and validate the ids
3 Delete them one by one or in groups of 10 ids at a time
Regards
Rajat
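The validate-then-batch approach could be sketched in Python as below. Fetching the ids from Solr is left out; the sketch starts from a list of ids already read. The validation rule (keep only ids with the expected prefix) and all function names are assumptions for illustration, not part of the original suggestion.

```python
from xml.sax.saxutils import escape


def batched(ids, size=10):
    """Yield consecutive slices of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def delete_batch_xml(ids) -> str:
    """Build one XML delete command covering a batch of ids."""
    return "<delete>" + "".join(f"<id>{escape(i)}</id>" for i in ids) + "</delete>"


def plan_deletes(ids, prefix="A59", size=10):
    """Validate ids against the expected prefix, then build one command per batch."""
    valid = [i for i in ids if i.startswith(prefix)]
    return [delete_batch_xml(batch) for batch in batched(valid, size)]


# Twelve ids with the expected prefix plus one that should be rejected.
payloads = plan_deletes([f"A59{n:03d}" for n in range(12)] + ["B10001"])
for payload in payloads:
    print(payload)
```

Each payload would then be posted to the update handler, followed by a commit once all batches have been checked.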
As Mauricio stated:
Use the - symbol to indicate what you want to exclude in your query.
The following two queries will delete all documents except the ones that begin with A59.
GET http://<url>/solr/<core>/update?stream.body=<delete><query>-id:A59*</query></delete>
GET http://<url>/solr/<core>/update?stream.body=<commit/>
The first line does the delete operation.
The second line does the commit.
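The two requests can also be built in code. In the Python sketch below the negated clause is prefixed with `*:*`, which is the workaround some Solr versions reportedly need for purely negative queries; that prefix is an assumption on my part and differs from the plain `-id:A59*` form above. The core URL and function names are placeholders, and nothing is sent to Solr.

```python
from urllib.parse import urlencode


def negated_delete_body(field: str, prefix: str) -> str:
    """Build a delete command for every document whose field does NOT start with prefix."""
    # "*:*" selects everything; the negated clause then removes the ones to keep.
    return f"<delete><query>*:* -{field}:{prefix}*</query></delete>"


def update_url(core_url: str, body: str) -> str:
    """Build an update URL carrying the command in stream.body."""
    return f"{core_url}/update?{urlencode({'stream.body': body})}"


delete_body = negated_delete_body("id", "A59")
delete_url = update_url("http://localhost:8983/solr/mycore", delete_body)
commit_url = update_url("http://localhost:8983/solr/mycore", "<commit/>")
print(delete_url)
print(commit_url)
```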
I have lots of Solr documents indexed that have the field
uri = nntp://msnews.microsoft.com/microsoft.public.windows.server.sbs
but when I search with the query
uri:nntp\://msnews.microsoft.com/microsoft.public.windows.server.sbs
it returns zero results. The search query works with other, similar URIs (nntp://msnews.microsoft.com/microsoft.public.windows.windowsxp.general), though.
What am I missing here?
If your search URI is similar to
/select?q=uri%3Anntp*&rows=0
you should still be able to get a good idea of how many items in that field begin with nntp without returning any rows; the numFound attribute of the result tag should tell you.
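A scripted version of that count check might look like the Python sketch below. It builds the `rows=0` URL with `wt=json` added (an assumption, so the response can be parsed as JSON) and reads `numFound` from a response. The core URL is a placeholder and the sample response is invented purely to show the shape of the data.

```python
import json
from urllib.parse import urlencode


def count_url(core_url: str, query: str) -> str:
    """Build a select URL that returns only the match count, not the documents."""
    return f"{core_url}/select?{urlencode({'q': query, 'rows': 0, 'wt': 'json'})}"


def num_found(response_text: str) -> int:
    """Extract numFound from a Solr JSON select response."""
    return json.loads(response_text)["response"]["numFound"]


# Hypothetical response body, shown only to illustrate where numFound lives.
sample = '{"responseHeader": {"status": 0}, "response": {"numFound": 42, "start": 0, "docs": []}}'

print(count_url("http://localhost:8983/solr/mycore", "uri:nntp*"))
print(num_found(sample))
```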
If this is blank, I would check your logfile. It is entirely likely you're adding documents with commit turned off. I would use the command line scripts to force things to commit and refresh the readers:
sync
bin/commit
sync
bin/readercycle
Then I would issue that search again and see if you can see your data again.