Deleting solr documents from Solr Admin - solr

How do I delete all the documents in my Solr index using the Solr Admin UI?
I tried using the URL and it works, but I want to know whether the same can be done from the Admin UI.

Use one of the queries below in the Document tab of Solr Admin UI:
XML:
<delete><query>*:*</query></delete>
JSON:
{"delete": {"query": "*:*"}}
Make sure to set the Document Type drop-down to Solr Command (raw XML or JSON).
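If you would rather script the same delete-all command than paste it into the Admin UI, something like the following might work. It is only a sketch: the base URL and core name are assumptions taken from the examples elsewhere in this thread, and the request is built but deliberately not sent, so you can inspect it first.

```python
import json
import urllib.request


def build_delete_all_request(base_url, core):
    """Build (but do not send) a POST that deletes every document in `core`."""
    # commit=true makes the deletion visible as soon as the request completes.
    url = f"{base_url}/{core}/update?commit=true"
    body = json.dumps({"delete": {"query": "*:*"}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed values: adjust the base URL and core name to your installation.
req = build_delete_all_request("http://localhost:8983/solr", "my_core_name")
print(req.full_url)
print(req.data.decode("utf-8"))
# To actually run the delete: urllib.request.urlopen(req)
```

Printing the request before sending it is a cheap safety check, since the delete cannot be undone once committed.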

Update: newer versions of Solr may work better with this answer: https://stackoverflow.com/a/48007194/3692256
My original answer is below:
I'm cheating a little, but not as much as writing the query by hand.
Since I've experienced the pain of accidental deletions before, I try to foolproof my deletions as much as possible (in any kind of data store).
1) Run a query in the Solr Admin Query screen, by only using the "q" parameter at the top left. Narrow it to the items you actually want to delete. For this example, I'm using *:*, but you can use things like id:abcdef or a range or whatever. If you have a crazy complex query, you may find it easier to do this multiple times, once for each part of the data you wish to delete.
2) On top of the results, there is a grayed out URL. If you hover the mouse over it, it turns black. This is the URL that was used to get the results. Right (context) click on it and open it in a new tab/window. You should get something like:
http://localhost:8983/solr/my_core_name/select?q=*%3A*&wt=json&indent=true
Now, I want to get it into a delete format. I replace the select?q= with update?commit=true&stream.body=<delete><query> and, at the end, the &wt=json&indent=true with </query></delete>.
So I end up with:
http://localhost:8983/solr/my_core_name/update?commit=true&stream.body=<delete><query>*%3A*</query></delete>
Take a deep breath, do whatever you do for good luck, and submit the url (enter key works).
Now, you should be able to go back to the Solr admin page and run the original query and get zero results.
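The manual URL rewrite described above can be sketched in a few lines of Python, which avoids typos when editing the URL by hand. The function name is hypothetical, and the sketch assumes the select URL carries the query in its q parameter. Newer Solr versions may have stream.body disabled by default, in which case the generated URL will be rejected.

```python
from urllib.parse import parse_qs, quote, urlsplit, urlunsplit
from xml.sax.saxutils import escape


def select_url_to_delete_url(select_url):
    """Turn a Solr /select URL into the matching delete-by-query /update URL."""
    parts = urlsplit(select_url)
    if not parts.path.endswith("/select"):
        raise ValueError("expected a URL whose path ends in /select")
    params = parse_qs(parts.query)
    if "q" not in params:
        raise ValueError("the select URL has no q parameter")
    query = params["q"][0]  # percent-decoded, e.g. "*:*"
    # Escape &, < and > so the query stays valid inside the XML body.
    body = f"<delete><query>{escape(query)}</query></delete>"
    update_path = parts.path[: -len("select")] + "update"
    update_query = "commit=true&stream.body=" + quote(body, safe="")
    return urlunsplit((parts.scheme, parts.netloc, update_path, update_query, ""))


url = select_url_to_delete_url(
    "http://localhost:8983/solr/my_core_name/select?q=*%3A*&wt=json&indent=true"
)
print(url)
```

The output is fully percent-encoded, so it looks noisier than the hand-edited URL, but it decodes to the same delete command.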

For everyone who doesn't like a lot of words :-)

curl http://localhost:8080/solr/update -H "Content-type: text/xml" --data-binary '<delete><query>*:*</query></delete>'
curl http://localhost:8080/solr/update -H "Content-type: text/xml" --data-binary '<commit />'

In the collection's Documents tab, select XML as the Document Type and submit the following:
<delete><query>*:*</query></delete>

This solution is only applicable if you are deleting all the documents in multiple collections and not for selective deletion:
I had the same scenario, where I needed to delete all the documents in multiple collections. There were close to 500k documents in each shard, and each collection had multiple shards. Updating and deleting the documents by query was a big task, so I followed the process below:
Used the Solr API for getting the details for all the collections -
http://<solrIP>:<port>/solr/admin/collections?action=clusterstatus&wt=json
This gives details such as the collection name, numShards, configName, router.field, maxShardsPerNode, replicationFactor, etc.
I saved the output JSON with the above details in a file for future reference, and took backups of all the collections whose documents I needed to delete, using the following API:
http://<solr-ip>:<port>/solr/admin/collections?action=BACKUP&name=myBackupName&collection=myCollectionName&location=/path/to/my/shared/drive
Next, I deleted all the collections whose documents I needed to remove, using the following (note that the action for removing a collection is DELETE; DELETEALIAS only removes an alias):
http://<solr-ip>:<port>/solr/admin/collections?action=DELETE&name=collectionname
Then I re-created all the collections using the details saved in the first step and the following API:
http://<solr-ip>:<port>/solr/admin/collections?action=CREATE&name=collectionname&numShards=number&replicationFactor=number&maxShardsPerNode=number&collection.configName=configname&router.field=routerfield
I ran the above steps in a loop over all the collections, and it finished in seconds for around 100 collections holding a large amount of data. Plus, I had backups of all the collections as well.
Refer to the Solr Collections API documentation for other actions, for example DELETEALIAS: Delete a Collection Alias.
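The backup, delete and re-create loop might look roughly like this in Python. This is a sketch under assumptions: the function name, the "_backup" suffix and the example settings are hypothetical, and it uses action=DELETE, which is the Collections API action that removes a collection. It only builds the URLs; calling them is left to you.

```python
from urllib.parse import urlencode


def collection_reset_urls(solr_base, name, settings, backup_location):
    """Return the BACKUP, DELETE and CREATE URLs for one collection, in order."""
    admin = f"{solr_base}/admin/collections"
    backup = {
        "action": "BACKUP",
        "name": f"{name}_backup",  # hypothetical naming scheme for backups
        "collection": name,
        "location": backup_location,
    }
    delete = {"action": "DELETE", "name": name}
    # `settings` holds the values saved from the clusterstatus output.
    create = {"action": "CREATE", "name": name, **settings}
    return [f"{admin}?{urlencode(p)}" for p in (backup, delete, create)]


# Assumed example values; take the real ones from your clusterstatus output.
saved_settings = {
    "numShards": 2,
    "replicationFactor": 2,
    "collection.configName": "configname",
}
for collection in ["collection_a", "collection_b"]:
    for url in collection_reset_urls(
        "http://localhost:8983/solr",
        collection,
        saved_settings,
        "/path/to/my/shared/drive",
    ):
        print(url)
```

Keeping the three URLs in order matters: the backup has to succeed before the collection is deleted, so it is worth checking each response before moving on.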

Under the Documents tab, select "raw XML or JSON" under Document Type and just add the query you need using the unique identifiers for each document.
{"delete": {"query": "filter(product_id:(25634 25635 25636))"}}
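When the list of identifiers comes from somewhere else, the command can be generated instead of typed. The helper below is a hypothetical sketch that assumes the identifiers are integers, as in the product_id example, and it omits the filter(...) wrapper for simplicity.

```python
import json


def delete_by_field_values(field, values):
    """Build a JSON delete-by-query command matching any of the given integer values."""
    # int() rejects anything that is not a plain number, so a stray value
    # cannot widen the delete query by accident.
    numbers = [int(v) for v in values]
    if not numbers:
        raise ValueError("refusing to build a delete command with no values")
    joined = " ".join(str(n) for n in numbers)
    return json.dumps({"delete": {"query": f"{field}:({joined})"}})


command = delete_by_field_values("product_id", [25634, 25635, 25636])
print(command)
```

The printed command can be pasted into the Documents tab as is.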

If you want to delete some documents by ID, you can use the Solr post tool.
./post -c $core_name ./delete.xml
Where the delete.xml file contains the document IDs:
<delete>
<id>a3f04b50-5eea-4e26-a6ac-205397df7957</id>
</delete>
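For more than a handful of IDs, the delete.xml file can be generated rather than written by hand. A minimal sketch using Python's standard library follows; the function name is hypothetical, and writing the result to a file is left as a comment.

```python
import xml.etree.ElementTree as ET


def build_delete_xml(doc_ids):
    """Return a <delete> document with one <id> element per document ID."""
    root = ET.Element("delete")
    for doc_id in doc_ids:
        # ElementTree escapes special characters in the text for us.
        ET.SubElement(root, "id").text = doc_id
    return ET.tostring(root, encoding="unicode")


xml_text = build_delete_xml(["a3f04b50-5eea-4e26-a6ac-205397df7957"])
print(xml_text)
# To save it for the post tool:
# with open("delete.xml", "w", encoding="utf-8") as f:
#     f.write(xml_text)
```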

Related

Solr Cloud 6.4.0 Document routing compositeID does not route to another shard

I am completely confused about what is going on in my SolrCloud setup. As you can tell, I am new to the topic.
My current setup for testing is:
I have created a SolrCloud on a single server (I do not have more for testing at the moment). I use the embedded ZooKeeper.
I have three shards on three nodes, with two replicas each. Every node runs on a different port (8984, 7574, 7575).
I set up a collection with
curl 'http://localhost:8984/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=2&collection.configName=myconfigset&router.name=compositeId'
When I index some Documents with
curl 'http://localhost:8984/solr/mycollection/update/csv?update.chain=all-into-one&commit=true&separator=%09&encapsulator=%00&header=true&trim=true&f.itau_name3.split=true&f.itau_name3.separator=;&f.itau_lnx3.split=true&f.itau_lnx3.separator=;&f.itau_af3.split=true&f.itau_af3.separator=;&f.itau_init3.split=true&f.itau_init3.separator=;' --data-binary @file.txt -H 'Content-type:text/plain; charset=utf-8'
the routing works ... sometimes. It seems to depend on the prefix/compositeID I use:
When I use r1!<uniquekey>, r2!<uniquekey>, r3!<uniquekey> as prefixes, a *:* query with _route_=r1! returns documents with both the r1 and r2 prefixes, while _route_=r3! seems to be routed correctly. In the very same setup (after clearing the index and reindexing), the same test with year1!, year2!, year3! as prefixes works: a query with q=*:*&_route_=year1! etc. returns only results with the respective route prefix.
Can somebody help me find out what's going on here?
Are there any restrictions on how to construct the route prefix?
You did not show your full query, but I suspect this might be happening:
when you query with _route_=r1!, you are not adding an fq=r_field:r1.
_route_=r1! only tells Solr which shard to query; it does not exclude other r_field values (like r2 and r3) that are ALSO indexed in that same shard. That is why you need to add the fq.
Does this make sense?
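To make the suggestion concrete, here is a sketch of how such a query URL could be assembled. It assumes the route prefix is also stored in a regular field, called r_field here as in the answer above; that field name and the function name are hypothetical, and your schema may differ.

```python
from urllib.parse import urlencode


def routed_query_url(collection_url, prefix, prefix_field):
    """Build a select URL that routes to one shard and filters on the prefix."""
    params = {
        "q": "*:*",
        "_route_": f"{prefix}!",  # picks the shard(s) to query
        "fq": f"{prefix_field}:{prefix}",  # drops other prefixes stored on that shard
    }
    return f"{collection_url}/select?{urlencode(params)}"


print(routed_query_url("http://localhost:8984/solr/mycollection", "r1", "r_field"))
```

The _route_ parameter narrows where Solr looks, while the fq narrows what it returns, so the two complement each other.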

solr :cannot delete a document using post

I'm new to Solr and am trying out different ways to delete a document.
I have tested
update?commit=true&stream.body=<delete><query>*:*</query></delete>.
Another method, which I found in the Solr quick start, says that a document can be deleted
using: bin/post -c gettingstarted -d "<delete><id>SP2514N</id></delete>"
When I try it, it seems to work, but when I search for the id I still find the document.
Why doesn't it work? Also, are there other ways to delete a document (for example, using the admin console)?
Try this
http://localhost:8983/solr/gettingstarted/update?stream.body=<delete><query>id:SP2514N</query></delete>&commit=true
check Here for reference
It doesn't look like you issued a commit after the delete (hence the commit=true in the deletion URL). Add a <commit/> after your delete and that should work. See the Solr documentation on XML update commands for more.

SOLR Search. Search for additional documents after initial match

I have the following problem to solve.
Client sends the id of the document. This is an HTTP Get to a proxy (not directly to SOLR). Example:
baseURL/movies/{id}
The response of this call will be a list of variants of this movie.
In order to find the variants I want to perform a SOLR search using title and some other fields, e.g.
/movies/select?q=title:spiderman+year:2001
I expect it to return the different variants of Spiderman, e.g. SpiderMan, Spiderman HD, etc.
The problem I have now is that the proxy service will not have the title of the original movie. It will get only the id of this movie from the API.
My approach so far is to get the original movie information using the id,
e.g.
/movies/select?q=id:{id}
After I get the original movie then I perform a second request to SOLR search for the variants.
Any ideas how to avoid the two calls to SOLR search?

How to delete a doc at a specific shard in Solr

I want to delete a specific doc on a specific shard in Solr. Below is my query:
http://localhost:8080/solr/collections_1_replica1/update?stream.body=<delete><query>id:1</query></delete>&commit=true&distrib=false
But this still affects collections_2_replica1, so what is the correct query in this case?
If you use the default SolrCloud collection configuration, Solr chooses where to put a document according to the document id (docId.hash() % number of shards).
In other words, you're not supposed to delete from a specific shard, because you can't be sure whether the document is there or on one of the other shards.
If I'm not wrong, the distrib=false parameter has no effect on updates.
Try this one:
curl http://xx.xx.xx.xx:8983/solr/collection_name/update/?commit=true -H "Content-Type: text/xml" --data-binary '<delete><query>id:1</query></delete>'

Solr weird search behaviour

I have a lot of Solr documents indexed that have the field
uri = nntp://msnews.microsoft.com/microsoft.public.windows.server.sbs
but when I search with the query
uri:nntp\://msnews.microsoft.com/microsoft.public.windows.server.sbs
it returns zero results. The search works with other, similar URIs (nntp://msnews.microsoft.com/microsoft.public.windows.windowsxp.general), though.
What am I missing here?
If your search URI is similar to
/select?q=uri%3Anntp*&rows=0
you should still be able to get a good idea of how many items in that field begin with nntp without returning any rows: the numFound attribute of the result tag should tell you.
If this is blank, I would check your logfile. It is quite likely that you're adding documents with commit turned off. I would use the command line scripts to force a commit and refresh the readers:
sync
bin/commit
sync
bin/readercycle
Then I would issue that search again and see if you can see your data again.

Resources