How to delete all data from Solr and HBase

How do I delete all data from Solr by command? We are using Solr with Lily and HBase.
How can I delete data from both HBase and Solr?
http://lucene.apache.org/solr/4_10_0/tutorial.html#Deleting+Data

If you want to clean up the Solr index, you can fire this HTTP URL:
http://host:port/solr/[core name]/update?stream.body=<delete><query>*:*</query></delete>&commit=true
(replace [core name] with the name of the core you want to delete from). Or use this if posting XML data:
<delete><query>*:*</query></delete>
Be sure to use commit=true to commit the changes.
I don't have much idea about clearing HBase data, though.

I've used this request to delete all my records, but sometimes it's necessary to commit it.
For that, add &commit=true to your request:
http://host:port/solr/core/update?stream.body=<delete><query>*:*</query></delete>&commit=true

Post JSON data (e.g. with curl):
curl -X POST -H 'Content-Type: application/json' \
'http://<host>:<port>/solr/<core>/update?commit=true' \
-d '{ "delete": {"query":"*:*"} }'

You can use the following commands to delete everything.
Use the "match all docs" query in a delete-by-query command:
<delete><query>*:*</query></delete>
You must also commit after running the delete, so to empty the index, run the following two commands:
curl http://localhost:8983/solr/update --data '<delete><query>*:*</query></delete>' -H 'Content-type:text/xml; charset=utf-8'
curl http://localhost:8983/solr/update --data '<commit/>' -H 'Content-type:text/xml; charset=utf-8'
Another strategy would be to add two bookmarks in your browser:
http://localhost:8983/solr/update?stream.body=<delete><query>*:*</query></delete>
http://localhost:8983/solr/update?stream.body=<commit/>
Source docs from SOLR:
https://wiki.apache.org/solr/FAQ#How_can_I_delete_all_documents_from_my_index.3F

If you want to delete all of the data in Solr via SolrJ, do something like this:
import java.io.IOException;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public static void deleteAllSolrData() {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8080/solr/core/");
    try {
        solr.deleteByQuery("*:*");
        solr.commit(); // the delete only becomes visible once it is committed
    } catch (SolrServerException | IOException e) {
        throw new RuntimeException("Failed to delete data in Solr. " + e.getMessage(), e);
    }
}
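On newer SolrJ versions (6.x and later), HttpSolrServer has been replaced by HttpSolrClient. A minimal sketch of the equivalent, assuming the same core URL:
import java.io.IOException;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public static void deleteAllSolrData() throws SolrServerException, IOException {
    // try-with-resources closes the client; the commit makes the delete visible
    try (HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8080/solr/core").build()) {
        solr.deleteByQuery("*:*");
        solr.commit();
    }
}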
If you want to delete all of the data in HBase, you can drop the whole table like this (note that this removes the table itself, not just its rows):
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public static void deleteHBaseTable(String tableName, Configuration conf) {
    HBaseAdmin admin = null;
    try {
        admin = new HBaseAdmin(conf);
        admin.disableTable(tableName); // a table must be disabled before it can be deleted
        admin.deleteTable(tableName);
    } catch (IOException e) {
        // covers MasterNotRunningException and ZooKeeperConnectionException,
        // which are both subclasses of IOException
        throw new RuntimeException("Unable to delete the table " + tableName
                + ". The actual exception is: " + e.getMessage(), e);
    } finally {
        close(admin); // helper (not shown) that null-checks and closes the admin
    }
}
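If you only want to empty the HBase table rather than remove it, a minimal sketch against the same old-style client API (newer Admin APIs have a built-in truncateTable; see the truncate answer further down) is to save the descriptor, delete the table, and recreate it:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public static void emptyHBaseTable(String tableName, Configuration conf) throws IOException {
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
        // Capture the schema so the table can be recreated empty afterwards.
        HTableDescriptor descriptor = admin.getTableDescriptor(Bytes.toBytes(tableName));
        admin.disableTable(tableName);
        admin.deleteTable(tableName);
        admin.createTable(descriptor);
    } finally {
        admin.close();
    }
}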

Use the "match all docs" query in a delete by query command: :
You must also commit after running the delete so, to empty the index, run the following two commands:
curl http://localhost:8983/solr/update --data '<delete><query>*:*</query></delete>' -H 'Content-type:text/xml; charset=utf-8'
curl http://localhost:8983/solr/update --data '<commit/>' -H 'Content-type:text/xml; charset=utf-8'

I came here looking to delete all documents from a Solr instance through the .NET Framework using SolrNet. Here is how I was able to do it:
Startup.Init<MyEntity>("http://localhost:8081/solr");
ISolrOperations<MyEntity> solr = ServiceLocator.Current.GetInstance<ISolrOperations<MyEntity>>();
SolrQuery sq = new SolrQuery("*:*");
solr.Delete(sq);
solr.Commit();
This cleared all the documents. (I am not sure whether this can be recovered; I am in the learning and testing phase with Solr, so please take a backup before using this code.)

From the command line use:
bin/post -c core_name -type text/xml -out yes -d $'<delete><query>*:*</query></delete>'

Fire this in the browser:
http://localhost:8983/solr/update?stream.body=<delete><query>*:*</query></delete>&commit=true
This command will delete all the documents in the index in Solr.

I've used this query to delete all my records.
http://host/solr/core-name/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&commit=true

To delete all documents of a Solr collection, you can use this request:
curl -X POST -H 'Content-Type: application/json' --data-binary '{"delete":{"query":"*:*" }}' http://localhost:8983/solr/my_collection/update?commit=true
It uses a JSON request body.

The curl examples above all failed for me when I ran them from a Cygwin terminal. I got errors like this when I ran the script example:
curl http://192.168.2.20:7773/solr/CORE1/update --data '<delete><query>*:*</query></delete>' -H 'Content-type:text/xml; charset=utf-8'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
</response>
<!--
It looks like it deleted stuff, but the data did not go away,
maybe because the commit call failed like so:
-->
curl http://192.168.1.2:7773/solr/CORE1/update --data-binary '' -H 'Content-type:text/xml; charset=utf-8'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">2</int></lst><lst name="error"><str name="msg">Unexpected EOF in prolog
at [row,col {unknown-source}]: [1,0]</str><int name="code">400</int></lst>
</response>
(The "Unexpected EOF in prolog" error happens because --data-binary '' sends an empty body, so no <commit/> XML ever reaches Solr.)
I needed to use the delete in a loop over core names to wipe them all out in a project.
The query below worked for me in the Cygwin terminal script:
curl "http://192.168.1.2:7773/hpi/CORE1/update?stream.body=<delete><query>*:*</query></delete>&commit=true"
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
</response>
This one line made the data go away, and the change persisted.

I tried the steps below, and they work well.
Make sure the Solr server is running.
Just click the link below, which will hit Solr and delete all your indexed data; you will then get the following details on the screen as output:
http://localhost:8080/solr/collection1/update?stream.body=<delete><query>*:*</query></delete>&commit=true
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">494</int>
</lst>
</response>
If you are not getting the above output, please check the following:
I used the default host (localhost) and port (8080) in the above link; please alter the host and port if they are different on your end.
The default core name should be collection/collection1. I used collection1 in the above link; please change it too if your core name is different.

If you need to clean out all data, it might be faster to recreate the collection, e.g.:
solrctl --zk localhost:2181/solr collection --delete <collectionName>
solrctl --zk localhost:2181/solr collection --create <collectionName> -s 1
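Outside of Cloudera's solrctl, the same drop-and-recreate approach works through the Solr Collections API. A hedged SolrJ sketch (assuming SolrJ 7+; the ZooKeeper address mirrors the solrctl example above, and the collection/config names are placeholders):
import java.util.Collections;
import java.util.Optional;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public static void recreateCollection() throws Exception {
    CloudSolrClient client = new CloudSolrClient.Builder(
            Collections.singletonList("localhost:2181"), Optional.of("/solr")).build();
    try {
        CollectionAdminRequest.deleteCollection("collectionName").process(client);
        // 1 shard, 1 replica, matching the -s 1 in the solrctl example
        CollectionAdminRequest.createCollection("collectionName", "configName", 1, 1).process(client);
    } finally {
        client.close();
    }
}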

I made a JavaScript bookmarklet which adds a delete-all link to the Solr Admin UI:
javascript: (function() {
var str, $a, new_href, href, upd_str = 'update?stream.body=<delete><query>*:*</query></delete>&commit=true';
$a = $('#result a#url');
href = $a.attr('href');
str = href.match('.+solr\/.+\/(.*)')[1];
new_href = href.replace(str, upd_str);
$('#result').prepend('<a id="url_upd" class="address-bar" href="' + new_href + '"><strong>DELETE ALL</strong> ' + new_href + '</a>');
})();

If you're using Cloudera 5.x, the documentation mentions that Lily also maintains real-time updates and deletions; see "Configuring the Lily HBase NRT Indexer Service for Use with Cloudera Search":
As HBase applies inserts, updates, and deletes to HBase table cells,
the indexer keeps Solr consistent with the HBase table contents, using
standard HBase replication.
I am not sure whether truncate 'hTable' is also supported in the same way.
Otherwise, you can create a trigger or service to clear your data from both Solr and HBase on a particular event, as sketched below.
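For the trigger/service route, here is a minimal hedged sketch of clearing an HBase table row by row through the client API, so that the deletes flow through standard HBase replication and the Lily indexer can mirror them into Solr (the table name is a placeholder; a shell truncate drops and recreates the table, which may not generate replicable delete events):
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public static void deleteAllRows(String tableName) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, tableName);
    try {
        ResultScanner scanner = table.getScanner(new Scan());
        List<Delete> deletes = new ArrayList<Delete>();
        for (Result row : scanner) {
            deletes.add(new Delete(row.getRow())); // one Delete per row key
        }
        scanner.close();
        table.delete(deletes); // row deletes are visible to HBase replication
    } finally {
        table.close();
    }
}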

When clearing out a Solr index, you should also do a commit and an optimize after running the delete-all query. The full steps (curl is all you need) are here: http://www.alphadevx.com/a/365-Clearing-a-Solr-search-index
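For reference, the same delete-commit-optimize sequence as a SolrJ sketch (the core URL is a placeholder):
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public static void clearIndex() throws Exception {
    try (HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/core").build()) {
        solr.deleteByQuery("*:*"); // remove every document
        solr.commit();             // make the deletes visible
        solr.optimize();           // merge segments and reclaim the freed space
    }
}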

I am not sure about Solr, but you can delete all the data from HBase using the truncate command like below:
truncate 'table_name'
It will delete all row keys from the HBase table.
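The same thing from Java, assuming HBase 1.0+ where the Admin API has truncateTable built in (a sketch; the table name is a placeholder):
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public static void truncateTable(String name) throws Exception {
    try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = connection.getAdmin()) {
        TableName table = TableName.valueOf(name);
        admin.disableTable(table);         // truncate requires the table to be disabled first
        admin.truncateTable(table, false); // false = do not preserve region split points
    }
}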

Related

Solr query to Postman's raw data query

This is my Solr query:
qt section: /select
q section: -nest_path:*
fl section: *, [child limit=-1]
I want to convert this to Postman's form-data, like "stmt: select id,title from TABLE_NAME where author_code is not null limit 100", to reach the same result.
If you execute your query from the Solr Admin web page, you should see the corresponding URL that you will need for Postman at the top of the page.
In this case the URL will be:
http://localhost:8983/solr/puldata/select?fl=*%2C%20%5Bchild%20limit%3D-1%5D&q=-nest_path%3A*
I am not sure your query is valid, though. But that would be a different issue.

Problem exporting from mongo and then importing to SQL Server

Question: how do I export from Mongo such that I can import into SQL Server if I use $unwind?
I need to use $unwind, which means I can't use mongoexport.exe. Mongo.exe gives different output for JSON, as shown below, output that I can't load into SQL Server. I would export as CSV output, but my data includes commas. I would use $out to first copy my data to a new collection and then use mongoexport, but I'm querying a production server in the cloud where I only have read access.
To illustrate my problem, I created a collection with one record that has a date field "edited_on". You can see here that the mongoexport output starts with [{"_id":{"$oid"... while the mongo output starts with { "_id" : ObjectId(....
*** MONGOEXPORT
The command:
mongoexport --quiet --host localhost:27017 --db "zzz" -c "Test_Structures" --fields edited_on --type json --jsonArray --out C:\export_test.json
The output:
[{"_id":{"$oid":"5aaa1d85b8078250f1000c0e"},"edited_on":{"$date":"2018-03-15T07:15:17.583Z"}}]
I can import this data into SQL with OPENROWSET along with OPENJSON.
Described here: https://www.mssqltips.com/sqlservertip/5295/different-ways-to-import-json-files-into-sql-server/
*** MONGO
The command:
mongo localhost/UW --quiet -eval "db.Test_Structures.aggregate( { $project: { _id: 1 , edited_on: 1} } )" > C:\aggregate_test.json
The output:
{ "_id" : ObjectId("5aaa1d85b8078250f1000c0e"), "edited_on" :
ISODate("2018-03-15T07:15:17.583Z") }
My coworker answered my question: use replace() to remove the text in the JSON file that was causing problems, as follows.
DECLARE @JSON varchar(max)
SELECT @JSON = BulkColumn
FROM OPENROWSET (BULK 'C:\aggregate_test.json', SINGLE_CLOB) as j
SET @JSON = replace(replace(replace(@JSON,'ObjectId(',''),'ISODate(',''),'")','"')
SELECT * FROM OPENJSON (@JSON) WITH (...)

Getting Issue in Solr Search with DSE

I am using DSE version 5.0.1 and Solr for search activities. I created a core successfully for a particular table.
After that, once I try to execute a Solr search query, I get the issue below:
cqlsh:tradebees_dev> SELECT * FROM yf_product_books where solr_query = ':';
ServerError: <Error from server: code=0000 [Server error]
message="java.io.IOException:
No shards available for ranges: [(2205014674981121837,2205014674981121837]]">
Please suggest some solutions.
Your query should use the match-all Solr query *:*, like this:
SELECT * FROM yf_product_books where solr_query = '*:*';

How to partial update multiple documents at once in solr?

I have the following command, which I was expecting to work:
curl http://localhost/solr/collection1/update?commit=true -H 'Content-type:application/json' -d '
[
{
"meta.something": "78c93c7d-2a9d-4cee-8cbc-1a8bba544678",
"meta.type": "newsletter",
"meta.type": { "set": "report" }
}
]'
But it fails with
"error":{"msg":"Document is missing mandatory uniqueKey field: _id","code":400}}
So it seems it is not possible to do this without specifying the primary key. But is there some way I can update everything that matches those criteria with some script or something?
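Solr has delete-by-query but no update-by-query, so one workaround (a hedged SolrJ sketch; setTypeToReport is just a name for the sketch, the field names and values follow the question, and result paging is omitted for brevity) is to fetch the ids of the matching documents first and then send one atomic update per id:
import java.util.Collections;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public static void setTypeToReport() throws Exception {
    try (HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost/solr/collection1").build()) {
        SolrQuery query = new SolrQuery("meta.something:\"78c93c7d-2a9d-4cee-8cbc-1a8bba544678\"");
        query.setFields("_id");
        query.setRows(1000); // page through larger result sets in real code
        for (SolrDocument match : solr.query(query).getResults()) {
            SolrInputDocument update = new SolrInputDocument();
            update.addField("_id", match.getFieldValue("_id"));
            update.addField("meta.type", Collections.singletonMap("set", "report")); // atomic "set"
            solr.add(update);
        }
        solr.commit();
    }
}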

Solr Error Document is missing mandatory uniqueKey field id

While importing data into Solr using the DataImportHandler, I am getting the error below. Could someone please provide a suggestion?
org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey field: id
at org.apache.solr.update.AddUpdateCommand.getIndexedId(AddUpdateCommand.java:92)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:717)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:557)
at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:70)
at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:235)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:512)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:416)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:331)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:239)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:464)
In the schema.xml file you have marked id as a required field (required="true"), but the document that you are trying to index in Solr does not contain this id field, and hence Solr is throwing this error.
Solution
Either add id to all your documents,
OR
remove required="true" from the schema file for the id field.
Please share your schema.xml file and the documents that you are trying to index into Solr.
Also keep in mind that if you want a quick response, provide as much detail as you can.
This is probably an area where Solr's error reporting should be improved. In my case, I had defined a string field unrelated to the key and was trying to put a null value into that field. I should probably indicate that the field is nullable.
I had this problem too, and the reason was in my data import configuration. If you are using an HSQLDB connection to import your data, pay attention to the way you map the database columns to your Solr fields. HSQLDB is case sensitive, so, for example, the following was producing an error:
<field column="item_id" name="id" />
While this is working perfectly:
<field column="ITEM_ID" name="id" />
I got this issue when I had either nulls or duplicates in the id field.
Check that there are no duplicates in your id field with:
SELECT MAX(c) FROM (SELECT COUNT(*) as c, id FROM <table> GROUP BY id) t
and check that the result is 1. If it is not, you can find the duplicated ids with:
SELECT COUNT(*) as c, id FROM <table> GROUP BY id HAVING c > 1
I had the same issue, even though everything looked fine.
The fix was to create a new SolrInputDocument on each loop iteration while indexing:
HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/shard1");
Collection<SolrInputDocument> docs = new ArrayList<>();
int i = 0;
for (Map<String, String> data : datas) { // assuming each record is a map of column name to value
    SolrInputDocument doc = new SolrInputDocument(); // a fresh document per iteration
    doc.addField("id", data.get("id"));
    doc.addField("title", data.get("fileName"));
    doc.addField("size", getFileSizeInMb(fileEntry.length())); // helper from the surrounding (omitted) code
    doc.addField("column2", data.get("column2"));
    doc.addField("column3", data.get("column3"));
    System.out.println("doc: " + doc);
    docs.add(doc);
    i++;
    if (i % 100 == 0) {
        System.out.println("Committed: " + i);
        server.add(docs); // send this batch of 100 documents
        docs.clear();
    }
}
if (!docs.isEmpty()) {
    server.add(docs); // send any leftover documents from the last partial batch
}
server.commit();
Initializing this inside each loop iteration is what made my code work:
SolrInputDocument doc = new SolrInputDocument();
