Would an IndexBatch operation with a single document ever throw IndexBatchException? - azure-cognitive-search

The IndexBatchException documentation, e.g., when calling IndexAsync, states:
Thrown when some of the indexing actions failed, but other actions succeeded and modified the state of the index. This can happen when the Search Service is under heavy indexing load. It is important to explicitly catch this exception and check its IndexResult property. This property reports the status of each indexing action in the batch, making it possible to determine the state of the index after a partial failure.
Does this mean this exception can be safely ignored when there is just a single document in the IndexBatch? Since, it seems impossible for an IndexBatch with just a single document to fail partially.

I tried calling IndexAsync with a Merge batch containing a single document to update, but with a non-existing document key (as recommended by Bruce):
var nonExistingDocument = SomeDocument()
var work = IndexBatch.Merge( nonExistingDocument );
try
{
await _search.Documents.IndexAsync( work );
}
catch ( IndexBatchException e )
{
var toRetry = e.FindFailedActionsToRetry( work, d => d.Id);
}
IndexBatchException was triggered, which is different behavior than what is documented in two ways:
"Thrown when some of the indexing actions failed, but other actions succeeded and modified the state of the index." Instead, the exception is thrown when any action fails.
"This can happen when the Search Service is under heavy indexing load." This can also happen for incorrect requests.
But, FindFailedActionsToRetry is seemingly smart enough to not suggest retrying requests which have failed due to erroneous requests. The toRetry enumeration is empty in the code sample above.
In short, no, this exception cannot be safely ignored. The documentation is misleading and it would be nice if it were updated.

Related

Akka.net - Streams with parallelism, backpressure and ActorRef

Tying to learn how use Akka.net Streams to process items in parallel from a Source.Queue, with the processing done in an Actor.
I've been able to get it to work with calling a function with Sink.ForEachParallel, and it works as expected.
Is it possible to process items in parallel with Sink.ActorRefWithAck (as I would prefer it utilize back-pressure)?
About to press Post, when tried to combine previous attempts and viola!
Previous attempts with ForEachParallel failed when I tried to create the actor within, but couldn't do so in an async function. If I use an single actor previous declared, then the Tell would work, but I couldn't get the parallelism I desired.
I got it to work with a router with roundrobin configuration.
var props = new RoundRobinPool(5).Props(Props.Create<MyActor>());
var actor = Context.ActorOf(props);
flow = Source.Queue<Element>(2000,OverflowStrategy.Backpressure)
.Select(x => {
return new Wrapper() { Element = x, Request = ++cnt };
})
.To(Sink.ForEachParallel<Wrapper>(5, (s) => { actor.Tell(s); }))
.Run(materializer);
The Request ++cnt is for console output to verify the requests are being processed as desired.
MyActor has a long delay on every 10th request to verify the backpressure was working.

php file() triggers no Exception on valid URL

try {
$antwort = file_get_contents('http://not_existing.notnotnot', false);
if($antwort===false) echo 'ERROR';
} catch(Exception $e) {
$e->getMessage();
}
var_dump($antwort); // returns string(0) ""
I get no Exception, no false, just empty content for every URL. A valid URL returns with this snippet the right content. Why can't I get exceptions for an invalid URL?
I came to this question because a wget on the same server leads to a valid return, but with a php script I can't file() the same URL. Really weird and I have no idea how to debug it.
It won't throw an exception if the file isn't found; it will raise a warning-level error. Those are different things. From the docs:
An E_WARNING level error is generated if filename cannot be found, maxlength is less than zero, or if seeking to the specified offset in the stream fails.
You should check for a false return, as you do, and not expect to catch an exception.
Also keep in mind when fetching a URL that the remote server may return an incorrect status code (instead of the expected 404), causing your script to think the file exists when it does not. You may need to check for empty values ("") as well.
As a rule, you should avoid using file_get_contents to access files via HTTP. It's not terribly secure, and many hosts don't even allow you to use it that way. Instead, use cURL, which is specifically designed for retrieving data over the web, including via HTTP.

Cassandra query returns empty array on first query but non-empty array on subsequent queries

I am trying to do a simple query against a Cassandra cluster using the Node.js Cassandra driver. I am following the examples, but it seems like my callback functions are getting called even if there are no results in the returned set.
var q = function (param) {
var query = 'SELECT * FROM my_table WHERE my_col=?';
var params = [param];
console.log("Cassandra query is being called with parameters " + params);
client.execute(query, params, function (err, result) {
console.log("Cassandra callback is being called. Row count is " + result.rows.length);
if (err != null) {
console.log("Got error");
}
if (result.rows.length <= 0) {
console.log("Sending empty response.");
}
else {
console.log("Sending non-empty response.");
}
});
};
q('value');
q('value');
Outputs
Cassandra query is being called with parameters value
Cassandra query is being called with parameters value
Cassandra callback is being called. Row count is 0
Sending empty response.
Cassandra callback is being called. Row count is 1
Sending non-empty response.
This is happening fairly consistently, but sometimes both calls will come up empty, and sometime both calls will return values.
I guess I'm doing something wrong with the async calls here, but I'm not really sure what it is.
I think this was due to a bad node in my cluster. One of the nodes was missing a bunch of data, and so one third of the time I would get back no results.
When you open the connection do you send down a consistency command?
I had this issue as well, when I open a connection call execute and send use <keyspace>; followed by consistency local_quorum;. This tells Cassandra to look a little harder for the answer and agree with other nodes that you are getting the latest version.
Note: I keep my connections around for a while, so this doesn't add any meaningful overhead (for me).
Note2: The above doesn't seem to work from my python app anymore (using Cassandra 3.5). However, using SimpleStatement and setting the consistency before calling execute does: https://datastax.github.io/python-driver/getting_started.html#setting-a-consistency-level
I hope this helps.

Indexeddb: Differences between onsuccess and oncomplete?

I use two different events for the callback to respond when the IndexedDB transaction finishes or is successful:
Let's say... db : IDBDatabase object, tr : IDBTransaction object, os : IDBObjectStore object
tr = db.transaction(os_name,'readwrite');
os = tr.objectStore();
case 1 :
r = os.openCursor();
r.onsuccess = function(){
if(r.result){
callback_for_result_fetched();
r.result.continue;
}else callback_for_transaction_finish();
}
case 2:
tr.oncomplete = callback_for_transaction_finish();
It is a waste if both of them work similarly. So can you tell me, is there any difference between them?
Sorry for raising up quite an old thread, but it's questioning is a good starting point...
I've looked for a similar question but in a bit different use case and actually found no good answers or even a misleading ones.
Think of a use case when you need to make several writes into the objectStore of even into several ones. You definitely don't want to manage each single write and it's own success and error events. That is the meaning of transaction and this is the (proper) implementation of it for indexedDB:
var trx = dbInstance.transaction([storeIdA, storeIdB], 'readwrite'),
storeA = trx.objectStore(storeIdA),
storeB = trx.objectStore(storeIdB);
trx.oncomplete = function(event) {
// this code will run only when ALL of the following requests are succeed
// and only AFTER ALL of them were processed
};
trx.onerror = function(error) {
// this code will run if ANY of the following requests will fail
// and only AFTER ALL of them were processed
};
storeA.put({ key:keyA, value:valueA });
storeA.put({ key:keyB, value:valueB });
storeB.put({ key:keyA, value:valueA });
storeB.put({ key:keyB, value:valueB });
Clue to this understanding is to be found in the following statement of W3C spec:
To determine if a transaction has completed successfully, listen to the transaction’s complete event rather than the success event of a particular request, because the transaction may still fail after the success event fires.
While it's true these callbacks function similarly they are not the same: the difference between onsuccess and oncomplete is that transactions complete but requests, which are made on those transactions, are successful.
oncomplete is only defined in the spec as related to a transaction. A transaction doesn't have an onsuccess callback.
I would only caution that there is no garentee that getting a successful trx.oncomplete means the data was written to the disk/database:
We are seeing a problem with trx.oncomplete where the data is not being written to the db on disk. FireFox has an explanation of what they did that is causing this problem here: https://developer.mozilla.org/en-US/docs/Web/API/IDBTransaction/oncomplete
It seems that windows/edge is also having the same issue. Basically, there is no guarantee that your app will have data written to the database, if/when the user decides to kill or power down the device. We've even tried waiting up to 15 minutes before shutting down in some cases and haven't seen the data written. For me I'd always want to ensure that a data write completes and is committed.
Are there other solutions for a real persistent database, or enhancements to the IndexedDB beyond FF experimental add...

AppEngine - Optimize read/write count on POST request

I need to optimize the read/write count for a POST request that I'm using.
Some info about the request:
The user sends a JSON array of ~100 items
The servlet needs to check if any of the received items is newer then its counterpart in the datastore using a single long attribute
I'm using JDO
what i currently do is (pseudo code):
foreach(item : json.items) {
storedItem = persistenceManager.getObjectById(item.key);
if(item.long > storedItem.long) {
// Update storedItem
}
}
Which obviously results in ~100 read requests per request.
What is the best way to reduce the read count for this logic? Using JDO Query? I read that using "IN"-Queries simply results in multiple queries executed after another, so I don't think that would help me :(
There also is PersistenceManager.getObjectsById(Collection). Does that help in any way? Can't find any documentation of how many requests this will issue.
I think you can use below call to do a batch get:
Query q = pm.newQuery("select from " + Content.class.getName() + " where contentKey == :contentKeys");
Something like above query would return all objects you need.
And you can handle all the rest from here.
Best bet is
pm.getObjectsById(ids);
since that is intended for getting multiple objects in a call (particularly since you have the ids, hence keys). Certainly current code (2.0.1 and later) ought to do a single datastore call for getEntities(). See this issue

Resources