Watson Discovery - Delete documents - ibm-watson

Is there any way to delete documents from a Watson Discovery collection by date?
Something I would do in a SQL database like this:
DELETE FROM collection_name
WHERE publication_date < '2018-01-01';
I know I can delete single documents by name, and I could query the documents with a publication_date filter, then iterate over the document names and delete each one individually, but that seems a rather tedious approach for such a simple task.

@user3609367 there is no way to delete multiple documents with a single API call. The approach you have mentioned in your post is the best way to do what you are asking.
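A minimal sketch of that query-then-delete loop, assuming the Discovery v1 REST endpoints with IAM basic auth; the paths, version date, and filter syntax below are assumptions to adapt to your instance:

```python
# Sketch: emulate DELETE ... WHERE publication_date < '2018-01-01' against
# Watson Discovery by querying matching ids and deleting them one by one.
# Endpoint paths, version date, and auth scheme are assumptions.
import base64
import json
import urllib.parse
import urllib.request

VERSION = "2019-04-30"  # assumed API version date

def extract_ids(query_json):
    """Pull document ids out of a Discovery query response body."""
    return [doc["id"] for doc in query_json.get("results", [])]

def _call(url, api_key, method="GET"):
    req = urllib.request.Request(url, method=method)
    token = base64.b64encode(f"apikey:{api_key}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read() or b"{}")

def delete_by_filter(base_url, env_id, coll_id, api_key, date_filter):
    coll = f"{base_url}/v1/environments/{env_id}/collections/{coll_id}"
    params = urllib.parse.urlencode(
        {"version": VERSION, "filter": date_filter,
         "return": "id", "count": 100})
    found = _call(f"{coll}/query?{params}", api_key)
    # No bulk delete exists, so issue one DELETE per matching document.
    for doc_id in extract_ids(found):
        _call(f"{coll}/documents/{doc_id}?version={VERSION}", api_key,
              method="DELETE")

# e.g. delete_by_filter(url, env, coll, key, "publication_date<2018-01-01")
```

For large collections you would also need to repeat the query until no results remain, since each query returns at most one page of ids.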

Related

Azure Search: What is the best way to update a batch of documents

We need to update a batch of documents based on some criteria in an Azure Search Index. The only way we can think of with the current implementation is :
Search for the required documents (e.g. Category = 1)
Create new documents using the document Id of the result
In the new documents update the required fields (e.g. Price = Price*1.1)
Use a Merge action with the newly created documents to update the existing ones.
The above code looks like we are back in the 1960s or that we have a few screws loose in our brain! Is this the only way to achieve this in Azure Search?
We are using the .NET SDK.
Your algorithm to update documents matching a query is indeed correct. One note: use filter instead of full-text search - it will be more efficient.
Azure Search is a search engine, not a general-purpose database, so it doesn't directly support an equivalent of SQL's UPDATE ... WHERE pattern.
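The filter-then-merge loop can be sketched against the Azure Search REST API like this; the index name, field names (Id, Price, Category), and api-version are assumptions, and the .NET SDK exposes the same "merge" action type:

```python
# Sketch: emulate UPDATE ... WHERE by filtering for hits, building "merge"
# actions that carry only the changed field, and posting them as one batch.
import json
import urllib.request

def price_bump_actions(docs, factor=1.1):
    """Turn search hits into 'merge' actions carrying only the changed field."""
    return [{"@search.action": "merge", "Id": d["Id"],
             "Price": round(d["Price"] * factor, 2)} for d in docs]

def post_actions(service, index, api_key, actions):
    url = (f"https://{service}.search.windows.net/indexes/{index}"
           "/docs/index?api-version=2019-05-06")  # assumed api-version
    req = urllib.request.Request(
        url, data=json.dumps({"value": actions}).encode(),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# 1) query with $filter=Category eq 1 and $select=Id,Price to get the hits
# 2) post_actions(service, index, key, price_bump_actions(hits))
```

Because the actions use "merge" rather than "upload", fields not listed in each action are left untouched on the existing documents.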

Retrieve all documents but only specific fields from Cloudant database

I want to return all the documents in a Cloudant database but only include some of the fields. I know that you can make a GET request against https://$USERNAME.cloudant.com/$DATABASE/_all_docs, but there doesn't seem to be a way to select only certain fields.
Alternatively, you can POST to /db/_find and include selector and fields in the JSON body. However, is there a universal selector, similar to SELECT * in SQL databases?
You can use {"_id":{"$gt":0}} as your selector to match all the documents, although you should note that it is not going to be performant on large data sets.
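Putting that selector together with a field list, a _find request looks like the following sketch; the account and database names are placeholders:

```python
# Sketch: POST /db/_find with the match-everything selector and a field list.
import json
import urllib.request

def find_body(fields, limit=25):
    """Request body selecting every document but returning only `fields`."""
    return {"selector": {"_id": {"$gt": 0}}, "fields": fields, "limit": limit}

def find(account, database, body):
    req = urllib.request.Request(
        f"https://{account}.cloudant.com/{database}/_find",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["docs"]

# find("myaccount", "mydb", find_body(["name", "publication_date"]))
```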

hbase execute batch statement

I am using Lucene 3.0.1 to index a column in HBase. After running a Lucene query I get, in Java, an array of keys (in the same format as my HBase row keys); now, for all of these keys, I want to query HBase and fetch the corresponding rows. I cannot find an IN operator in the HBase documentation. The other option is to loop over the set of keys and issue one query per key, but then I would be making a lot of HBase calls. Is there any other option? Any help is much appreciated. Thanks
The get method of the HTable class can accept a list of Get objects and fetch them all as a batch; see the documentation.
You essentially need to do something like:
// Build one Get per Lucene hit, then fetch all rows in a single batch call
List<Get> rowsToGet = new ArrayList<Get>();
for (String id : resultsFromLucene) {
    rowsToGet.add(new Get(Bytes.toBytes(id)));
}
Result[] results = htable.get(rowsToGet);

Can SOLR perform an UPSERT?

I've been attempting to do the equivalent of an UPSERT (insert, or update if it already exists) in Solr. I only know what does not work, and the Solr/Lucene documentation I have read has not been helpful. Here's what I have tried:
curl 'localhost:8983/solr/update?commit=true' -H 'Content-type:application/json' -d '[{"id":"1","name":{"set":"steve"}}]'
{"responseHeader":{"status":409,"QTime":2},"error":{"msg":"Document not found for update. id=1","code":409}}
I do up to 50 updates in one request, and a request may contain the same id with mutually exclusive fields (title_en and title_es, for example). If there were a way of querying whether or not a list of ids exists, I could split the data and perform separate insert and update commands... That would be an acceptable alternative, but is there already a handler that does this? I would like to avoid writing any in-house routines at this point.
Thanks.
With Solr 4.0 you can do a partial update of those documents, sending just the fields that have changed while keeping the rest of the document the same. The id must match.
Solr does not support UPSERT mechanics out of the box: you can create a record or you can update a record, and the syntax is different.
And if you update a record, you must make sure all your other previously inserted fields are stored (not just indexed). Under the covers, an update creates a completely new record that is pre-populated with the previously stored values. But that functionality is buried very deep (probably in Lucene itself).
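The two payload shapes differ only in how field values are wrapped; a small illustration of both forms (field names taken from the question's curl example):

```python
# Sketch: the two document shapes Solr accepts on /update.
def insert_doc(doc_id, fields):
    """Plain add: bare field values create (or overwrite) the whole document."""
    return {"id": doc_id, **fields}

def update_doc(doc_id, fields):
    """Atomic update: each value is wrapped in {"set": ...}; only those fields
    change, and the document must already exist with its fields stored."""
    return {"id": doc_id, **{k: {"set": v} for k, v in fields.items()}}

# insert_doc("1", {"name": "steve"}) -> {"id": "1", "name": "steve"}
# update_doc("1", {"name": "steve"}) -> {"id": "1", "name": {"set": "steve"}}
```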
Have you looked at DataImportHandler? You reverse the control flow (start from Solr), but it does have support for checking which records need to be updated and which records need to be created.
Or you can just run a solr query like http://solr.example.com:8983/solr/select?q=id%3A(ID1+ID2+ID3)&fl=id&wt=csv where you ask Solr to look for your ID records and return only ID of records it does find. Then, you could post-process that to segment your Updates and Inserts.
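That post-processing step can be sketched as follows; the core URL is a placeholder, and wt=json is used instead of csv purely because it is easier to parse:

```python
# Sketch: ask Solr which ids already exist, then split the batch so known
# ids become atomic updates and unknown ids become plain inserts.
import json
import urllib.parse
import urllib.request

def split_updates_inserts(all_ids, existing):
    """Ids Solr already knows become updates; the rest become inserts."""
    known = set(existing)
    updates = [i for i in all_ids if i in known]
    inserts = [i for i in all_ids if i not in known]
    return updates, inserts

def existing_ids(solr_core_url, ids):
    """Return the subset of `ids` that Solr can find, fetching only the id field."""
    q = "id:(" + " ".join(ids) + ")"
    params = urllib.parse.urlencode(
        {"q": q, "fl": "id", "wt": "json", "rows": len(ids)})
    with urllib.request.urlopen(f"{solr_core_url}/select?{params}") as resp:
        return [d["id"] for d in json.loads(resp.read())["response"]["docs"]]

# updates, inserts = split_updates_inserts(batch, existing_ids(url, batch))
```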

updating solr document using a query?

In the Solr documentation, there are options to delete documents using a query, something like the following:
<delete><query>*:*</query></delete>
<delete><query>id:298253</query>
<query>entitytype:BlogEntry</query></delete>
However, I could not find any references to updating documents based on a query. Is this possible with updates in Solr? Basically, I would like to update the values of all the documents that match a query.
Something like update prop1=val1, prop2=val2 where ( prop3 < val3 and prop4=val4 )
Thanks,
-Vineel
The ability to update documents is being added to the Solr 4.0 release, which just went into Beta this week. I am not sure if there will be the ability to update documents based on a query, but you could ask in the Solr Users List. Unfortunately, I have not had a chance to explore the 4.0 version yet to see how atomic updates work.
Keep in mind that for partially updating documents in Solr, the fields need to be stored, which increases the index size. Check this for some background.
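With atomic updates available, the UPDATE ... WHERE pattern can be emulated client-side: query for the matching ids, then post one atomic-update document per hit. A sketch under those assumptions (core URL and field names are placeholders):

```python
# Sketch: emulate "update prop1=val1 where <query>" against Solr by
# querying for ids and posting {"set": ...} atomic updates for each hit.
import json
import urllib.parse
import urllib.request

def atomic_updates(ids, changes):
    """One update doc per id, every changed field wrapped in {"set": ...}."""
    wrapped = {k: {"set": v} for k, v in changes.items()}
    return [{"id": i, **wrapped} for i in ids]

def update_where(solr_core_url, query, changes):
    # 1) ask Solr only for the ids of the matching documents
    params = urllib.parse.urlencode(
        {"q": query, "fl": "id", "wt": "json", "rows": 1000})
    with urllib.request.urlopen(f"{solr_core_url}/select?{params}") as resp:
        ids = [d["id"] for d in json.loads(resp.read())["response"]["docs"]]
    # 2) post all the atomic updates back in a single request
    req = urllib.request.Request(
        f"{solr_core_url}/update?commit=true",
        data=json.dumps(atomic_updates(ids, changes)).encode(),
        headers={"Content-Type": "application/json"}, method="POST")
    urllib.request.urlopen(req).read()

# update_where("http://localhost:8983/solr/mycore",
#              "prop3:[* TO 10] AND prop4:val4", {"prop1": "val1"})
```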
