How to delete a doc at a specific shard in Solr

I want to delete a specific doc from a specific shard in Solr. Below is my query:
http://localhost:8080/solr/collections_1_replica1/update?stream.body=<delete><query>id:1</query></delete>&commit=true&distrib=false
But this still affects collections_2_replica1. What is the correct query in this case?

If you use the default SolrCloud collection configuration, Solr chooses where to put a document based on a hash of the document id: each shard owns a range of hash values, and the hash of the id decides which shard receives the document.
In other words, you're not supposed to delete from a specific shard, because you can't be sure whether the document is there or on one of the other shards.
If I'm not wrong, the distrib=false parameter has no effect on updates.

Try this one:
curl http://xx.xx.xx.xx:8983/solr/collection_name/update/?commit=true -H "Content-Type: text/xml" --data-binary '<delete><query>id:1</query></delete>'
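For readers who script this, here is a minimal Python sketch of the same delete-by-query request sent to the collection (not to a single replica core). The function name build_delete_request, the base URL, and the collection name are illustrative assumptions; the request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request
from xml.sax.saxutils import escape


def build_delete_request(base_url, collection, query):
    """Build (but do not send) a delete-by-query request for a whole collection."""
    # Target the collection's update handler so Solr routes the delete itself.
    url = f"{base_url}/{collection}/update?" + urlencode({"commit": "true"})
    # Escape XML special characters that may appear in the query string.
    body = f"<delete><query>{escape(query)}</query></delete>".encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "text/xml"}, method="POST")


# Hypothetical host and collection name, mirroring the curl command above.
req = build_delete_request("http://localhost:8983/solr", "collection_name", "id:1")
print(req.full_url)
print(req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req)
```

Because the request goes to the collection endpoint, the caller does not need to know which shard holds the document.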

Related

Document is not returned when searched using query parameter in solr

I updated a document in Solr using the request below and it was successful. The document has other fields, such as organization_name and place, apart from what is shown in the request.
curl -X POST -H 'Content-Type: application/json' 'http://localhost:8390/solr/collection/update?commit=true' -d '[{"id" : "12345","code" : {"set" : "500"}}]'
{
"responseHeader":{
"rf":1,
"status":0,
"QTime":11}}
After the update, when I query Solr with the name in the query parameter (q), it does not return any document. At the same time, if I query using the name in the fq parameter, the document comes back fine.
This query does not work:
http://localhost:8390/solr/collection/select?q=test&qf=content^0.1%20name_display_name^1.0&defType=edismax
But this query works (with the fq parameter):
http://localhost:8390/solr/collection/select?q=*%3A*&rows=1000&fq=ngram_info_organization_name:test
The field type of organization_name is string, and it is both indexed and stored.
This issue is seen only for the document that I updated. If I query for other documents that were not updated, I am able to see the results.
Please help to figure out why the document is not listed when I use the name with query parameter.
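As a side note on the update request in this question, the atomic-update body can be generated with a JSON library instead of being typed by hand, which rules out quoting mistakes. This is only a sketch; the helper name atomic_set is hypothetical, and the id and field values are the ones from the question.

```python
import json


def atomic_set(doc_id, **fields):
    """Return a JSON body that applies a 'set' atomic update to each given field."""
    doc = {"id": doc_id}
    for name, value in fields.items():
        # {"set": value} asks Solr to replace the field's current value.
        doc[name] = {"set": value}
    return json.dumps([doc])


payload = atomic_set("12345", code="500")
print(payload)
```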

Solr Cloud 6.4.0 Document routing compositeID does not route to another shard

I am completely confused about what is going on in my SolrCloud setup. That said, you see I am new to the topic.
My current setup for testing is:
I have created a SolrCloud on a single server (I do not have more for testing at the moment). I use the embedded Zookeeper.
I have three shards on three nodes with two replicas in each one. Every node has a different port (8984, 7574, 7575).
I set up a collection with
curl 'http://localhost:8984/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=2&collection.configName=myconfigset&router.name=compositeId'
When I index some Documents with
curl 'http://localhost:8984/solr/mycollection/update/csv?update.chain=all-into-one&commit=true&separator=%09&encapsulator=%00&header=true&trim=true&f.itau_name3.split=true&f.itau_name3.separator=;&f.itau_lnx3.split=true&f.itau_lnx3.separator=;&f.itau_af3.split=true&f.itau_af3.separator=;&f.itau_init3.split=true&f.itau_init3.separator=;' --data-binary @file.txt -H 'Content-type:text/plain; charset=utf-8'
the routing works ... sometimes. It seems to depend on the prefix/compositeID I use:
When I use r1!<uniquekey>, r2!<uniquekey>, r3!<uniquekey>, a query with _route_=r1! returns documents with both the r1 and r2 prefixes, while _route_=r3! seems to be routed correctly. In the very same setup (after clearing the index and reindexing), the same test with year1!, year2!, year3! as prefixes works: a query with q=*:*&_route_=year1! etc. only returns results for the respective route prefix.
Can somebody help me to find out what's going on here?
Are there any restrictions on how to construct the route prefix?
You did not show your full query, but I suspect this might be happening:
When you query with _route_=r1!, you are not adding an fq=r_field:r1.
_route_=r1! only tells Solr which shard to query; it does not exclude other r field values (like r2 and r3) that are ALSO indexed in that same shard. That is why you need to add the fq.
Does this make sense?
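To make the suggestion concrete, here is a small Python sketch that builds a select URL combining _route_ with a filter query. It assumes the route prefix is also stored in a regular field (called r_field here, as in the answer; that field name is hypothetical), and the function name routed_query_url is illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def routed_query_url(base_url, collection, route_prefix, prefix_field):
    """Build a select URL that limits both the shard queried and the documents returned."""
    params = {
        "q": "*:*",
        # _route_ only narrows which shard(s) Solr sends the query to.
        "_route_": f"{route_prefix}!",
        # fq removes documents with other prefixes that live on the same shard.
        "fq": f"{prefix_field}:{route_prefix}",
    }
    return f"{base_url}/{collection}/select?" + urlencode(params)


url = routed_query_url("http://localhost:8984/solr", "mycollection", "r1", "r_field")
print(url)
```

The two parameters do different jobs: one picks the shard, the other filters the results, so a shard that hosts several prefixes still returns only the requested one.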

how to edit solr 5 schema which is created by default

How do I edit a schema such as the gettingstarted collection as mentioned in
https://lucene.apache.org/solr/quickstart.html
Thanks
Joyce
Solr 5 uses a managed schema by default, while Solr 4 used the schema.xml file. Solr 5 automatically creates the schema for you by guessing the type of the field. Once the type is assigned to the field, you can't change it. You have to set the type of the field before you add data to Solr 5.
To change the schema in Solr 5, you will want to use the Schema API, which is a REST interface.
Schemaless Mode states the following:
You Can Still Be Explicit - Even if you want to use schemaless mode for most fields, you can still use the Schema API to pre-emptively create some fields, with explicit types, before you index documents that use them.
... Once a field has been added to the schema, its field type is fixed.
If you are using the quick start guide for Solr 5, here's what you have to do if you want to explicitly specify the field types:
After you run the following command: bin/solr start -e cloud -noprompt
Then enter a command like this:
curl -X POST -H 'Content-type:application/json' --data-binary '{
"add-field" : { "name":"MYFIELDNAMEHERE", "type":"tlong",
"stored":true}}' http://localhost:8983/solr/gettingstarted/schema
The previous command will force the MYFIELDNAMEHERE field to be a tlong. Replace MYFIELDNAMEHERE with the field name that you want to be explicitly set, and change tlong to the Solr type that you want to use.
After doing that, load your data as usual.
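The same Schema API call can be prepared from a script. The following Python sketch builds the add-field request shown in the curl command; the function name add_field_request is an assumption, and the request is constructed but not sent.

```python
import json
from urllib.request import Request


def add_field_request(solr_url, collection, name, field_type, stored=True):
    """Build (but do not send) a Schema API request that adds one explicitly typed field."""
    body = json.dumps({"add-field": {"name": name, "type": field_type, "stored": stored}})
    return Request(
        f"{solr_url}/{collection}/schema",
        data=body.encode("utf-8"),
        headers={"Content-type": "application/json"},
        method="POST",
    )


# Placeholder field name, as in the curl example; replace it with a real one.
req = add_field_request("http://localhost:8983/solr", "gettingstarted", "MYFIELDNAMEHERE", "tlong")
print(req.full_url)
print(req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req)
```

Run it before indexing any document that uses the field, since the type cannot be changed afterwards.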

Deleting solr documents from Solr Admin

How do I delete all the documents in my Solr index using the Solr Admin UI?
I tried using the URL and it works, but I want to know if the same can be done using the Admin UI.
Use one of the queries below in the Document tab of Solr Admin UI:
XML:
<delete><query>*:*</query></delete>
JSON:
{'delete': {'query': '*:*'}}
Make sure to set the Document Type drop-down to Solr Command (raw XML or JSON).
Update: newer versions of Solr may work better with this answer: https://stackoverflow.com/a/48007194/3692256
My original answer is below:
I'm cheating a little, but not as much as writing the query by hand.
Since I've experienced the pain of accidental deletions before, I try to foolproof my deletions as much as possible (in any kind of data store).
1) Run a query in the Solr Admin Query screen, by only using the "q" parameter at the top left. Narrow it to the items you actually want to delete. For this example, I'm using *:*, but you can use things like id:abcdef or a range or whatever. If you have a crazy complex query, you may find it easier to do this multiple times, once for each part of the data you wish to delete.
2) On top of the results, there is a grayed out URL. If you hover the mouse over it, it turns black. This is the URL that was used to get the results. Right (context) click on it and open it in a new tab/window. You should get something like:
http://localhost:8983/solr/my_core_name/select?q=*%3A*&wt=json&indent=true
Now, I want to get it into a delete format. I replace the select?q= with update?commit=true&stream.body=<delete><query> and, at the end, the &wt=json&indent=true with </query></delete>.
So I end up with:
http://localhost:8983/solr/my_core_name/update?commit=true&stream.body=<delete><query>*%3A*</query></delete>
Take a deep breath, do whatever you do for good luck, and submit the url (enter key works).
Now, you should be able to go back to the Solr admin page and run the original query and get zero results.
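The manual URL rewrite described in this answer can be sketched in Python as follows. The function name select_url_to_delete_url is hypothetical, and the sketch assumes the Solr instance accepts stream.body on the update handler (some newer versions may have it disabled by default).

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit
from xml.sax.saxutils import escape


def select_url_to_delete_url(select_url):
    """Turn a /select?q=... URL into the matching /update delete-by-query URL."""
    parts = urlsplit(select_url)
    if not parts.path.endswith("/select"):
        raise ValueError("expected a URL whose path ends in /select")
    # parse_qs decodes the q parameter (for example *%3A* becomes *:*).
    q_values = parse_qs(parts.query).get("q")
    if not q_values:
        raise ValueError("the select URL has no q parameter")
    body = f"<delete><query>{escape(q_values[0])}</query></delete>"
    # Swap the handler and drop wt, indent and any other select-only parameters.
    path = parts.path[: -len("/select")] + "/update"
    query = urlencode({"commit": "true", "stream.body": body})
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


delete_url = select_url_to_delete_url(
    "http://localhost:8983/solr/my_core_name/select?q=*%3A*&wt=json&indent=true"
)
print(delete_url)
```

Testing the select URL first and converting it afterwards keeps the safety check from the answer: the delete uses exactly the query whose results were just reviewed.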
For everyone who doesn't like a lot of words :-)
curl http://localhost:8080/solr/update -H "Content-type: text/xml" --data-binary '<delete><query>*:*</query></delete>'
curl http://localhost:8080/solr/update -H "Content-type: text/xml" --data-binary '<commit />'
Select XML in the collection's Documents tab and submit the command below.
<delete><query>*:*</query></delete>
This solution only applies if you are deleting all the documents in multiple collections, not to selective deletion:
I had the same scenario, where I needed to delete all the documents in multiple collections. There were close to 500k documents in each shard, and each collection had multiple shards. Updating and deleting the documents with a query was a big task, so I followed the process below:
I used the Solr API to get the details of all the collections:
http://<solrIP>:<port>/solr/admin/collections?action=clusterstatus&wt=json
This returns details such as the collection name, numShards, configName, router.field, maxShards, replicationFactor, etc.
I saved the output JSON with the above details in a file for future reference and took backups of all the collections whose documents I needed to delete, using the following API:
http://<solr-ip>:<port>/solr/admin/collections?action=BACKUP&name=myBackupName&collection=myCollectionName&location=/path/to/my/shared/drive
Next, I deleted all the collections whose documents I needed to remove, using the following (action=DELETE removes a collection; DELETEALIAS only removes an alias):
http://<solr-ip>:<port>/solr/admin/collections?action=DELETE&name=collectionname
I then re-created all the collections using the details saved in the first step and the following API:
http://<solr-ip>:<port>/solr/admin/collections?action=CREATE&name=collectionname&numShards=number&replicationFactor=number&maxShardsPerNode=number&collection.configName=configname&router.field=routerfield
I executed the above steps in a loop for all the collections, and it was done in seconds for around 100 collections with huge data. Plus, I had backups of all the collections.
Refer to the Solr Collections API documentation for other APIs (for example, DELETEALIAS: Delete a Collection Alias).
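The backup, delete, and re-create sequence from this answer could be scripted roughly as follows. This Python sketch only builds the Collections API URLs, in order, for one collection. It assumes action=DELETE is the call that drops a collection (DELETEALIAS only removes an alias), and the function names and the "_backup" naming are illustrative, not part of the original answer.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, params):
    """Build one Collections API URL for the given action and parameters."""
    return f"{base_url}/admin/collections?" + urlencode({"action": action, **params})


def recreate_plan(base_url, name, num_shards, replication_factor, config_name, backup_location):
    """Return the URLs to call, in order, to empty a collection by re-creating it."""
    return [
        # 1. Keep a backup so the data can be restored if needed.
        collections_api_url(base_url, "BACKUP", {
            "name": f"{name}_backup", "collection": name, "location": backup_location,
        }),
        # 2. Drop the collection together with all of its documents.
        collections_api_url(base_url, "DELETE", {"name": name}),
        # 3. Re-create it empty with the settings saved from CLUSTERSTATUS.
        collections_api_url(base_url, "CREATE", {
            "name": name, "numShards": num_shards,
            "replicationFactor": replication_factor,
            "collection.configName": config_name,
        }),
    ]


urls = recreate_plan("http://localhost:8983/solr", "mycollection", 3, 2,
                     "myconfigset", "/path/to/my/shared/drive")
for u in urls:
    print(u)
```

Looping this plan over the collection names taken from the saved CLUSTERSTATUS output reproduces the per-collection loop the answer describes.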
Under the Documents tab, select "raw XML or JSON" under Document Type and just add the query you need using the unique identifiers for each document.
{'delete': {'query': 'filter(product_id:(25634 25635 25636))'}}
If you want to delete some documents by ID, you can use the Solr post tool.
./post -c $core_name ./delete.xml
where the delete.xml file contains the document ids:
<delete>
<id>a3f04b50-5eea-4e26-a6ac-205397df7957</id>
</delete>
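When there are many ids, the delete.xml file can be generated rather than written by hand. A minimal Python sketch, assuming the ids are available as a list of strings (the function name delete_by_ids_xml is hypothetical):

```python
from xml.sax.saxutils import escape


def delete_by_ids_xml(ids):
    """Return the contents of a delete.xml file listing one <id> element per document."""
    lines = ["<delete>"]
    # Escape each id in case it contains XML special characters such as & or <.
    lines += [f"<id>{escape(doc_id)}</id>" for doc_id in ids]
    lines.append("</delete>")
    return "\n".join(lines)


xml_text = delete_by_ids_xml(["a3f04b50-5eea-4e26-a6ac-205397df7957"])
print(xml_text)
# To write the file used by the post tool:
# with open("delete.xml", "w", encoding="utf-8") as f:
#     f.write(xml_text)
```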

Solr stop words question

My Solr installation is set up with the default stop words plus a few extra ones that I added.
Once in a while a user types a query string that consists entirely of stopwords. The result is that Solr returns no documents at all.
What I would like to happen instead is that Solr returns all documents. Is this possible?
Frank
Off the top of my head, you could fetch the stopwords in your application (e.g. http://localhost:8983/solr/admin/file/?file=stopwords.txt), and use that list to detect stopwords in your user query before sending the query to Solr. If you detect that they're all stopwords, replace with *:* to fetch all documents.
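A rough Python sketch of that client-side check follows. It assumes the stopwords file has already been fetched as text, that comment lines start with "#", and that splitting the user query on whitespace is close enough to what the Solr analyzer does; the function names are illustrative.

```python
def parse_stopwords(text):
    """Parse the contents of a stopwords.txt file into a set of lowercase words."""
    words = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if line and not line.startswith("#"):
            words.add(line.lower())
    return words


def rewrite_query(user_query, stopwords):
    """Return *:* when every term of the query is a stopword, else the query unchanged."""
    terms = user_query.lower().split()
    if terms and all(term in stopwords for term in terms):
        return "*:*"
    return user_query


stopwords = parse_stopwords("# sample list\nthe\nand\nof\n")
print(rewrite_query("The and of", stopwords))
print(rewrite_query("the history of solr", stopwords))
```

An empty query is passed through untouched here; whether it should also map to *:* is a choice for the application.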

Resources