Cassandra query returns empty array on first query but non-empty array on subsequent queries

I am trying to do a simple query against a Cassandra cluster using the Node.js Cassandra driver. I am following the examples, but it seems like my callback functions are getting called even if there are no results in the returned set.
var q = function (param) {
    var query = 'SELECT * FROM my_table WHERE my_col=?';
    var params = [param];
    console.log("Cassandra query is being called with parameters " + params);
    client.execute(query, params, function (err, result) {
        // Check the error first; result is undefined when err is set.
        if (err) {
            console.log("Got error");
            return;
        }
        console.log("Cassandra callback is being called. Row count is " + result.rows.length);
        if (result.rows.length === 0) {
            console.log("Sending empty response.");
        } else {
            console.log("Sending non-empty response.");
        }
    });
};
q('value');
q('value');
Outputs
Cassandra query is being called with parameters value
Cassandra query is being called with parameters value
Cassandra callback is being called. Row count is 0
Sending empty response.
Cassandra callback is being called. Row count is 1
Sending non-empty response.
This happens fairly consistently, but sometimes both calls come up empty, and sometimes both calls return values.
I guess I'm doing something wrong with the async calls here, but I'm not really sure what it is.

I think this was due to a bad node in my cluster. One of the nodes was missing a bunch of data, and so one third of the time I would get back no results.

When you open the connection, do you send down a consistency command?
I had this issue as well. When I open a connection, I call execute and send use <keyspace>; followed by consistency local_quorum;. This tells Cassandra to look a little harder for the answer and to agree with other nodes that you are getting the latest version.
Note: I keep my connections around for a while, so this doesn't add any meaningful overhead (for me).
Note 2: The above doesn't seem to work from my Python app anymore (using Cassandra 3.5). However, using SimpleStatement and setting the consistency before calling execute does: https://datastax.github.io/python-driver/getting_started.html#setting-a-consistency-level
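Since the question uses the Node.js driver: with the DataStax cassandra-driver package, the consistency level can also be set per query through the execute options. A rough sketch (the contact points and keyspace below are assumptions, adjust them to your setup):
var cassandra = require('cassandra-driver');

var client = new cassandra.Client({
    contactPoints: ['127.0.0.1'],   // assumption: your cluster's contact points
    keyspace: 'my_keyspace'         // assumption: your keyspace
});

var query = 'SELECT * FROM my_table WHERE my_col=?';
client.execute(query, ['value'], {
    prepare: true,
    consistency: cassandra.types.consistencies.localQuorum
}, function (err, result) {
    if (err) {
        return console.log('Got error', err);
    }
    console.log('Row count is ' + result.rows.length);
});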
I hope this helps.

Related

Would an IndexBatch operation with a single document ever throw IndexBatchException?

The IndexBatchException documentation, e.g., when calling IndexAsync, states:
Thrown when some of the indexing actions failed, but other actions succeeded and modified the state of the index. This can happen when the Search Service is under heavy indexing load. It is important to explicitly catch this exception and check its IndexResult property. This property reports the status of each indexing action in the batch, making it possible to determine the state of the index after a partial failure.
Does this mean this exception can be safely ignored when there is just a single document in the IndexBatch? After all, it seems impossible for an IndexBatch with just a single document to fail partially.
I tried calling IndexAsync with a Merge batch containing a single document to update, but with a non-existing document key (as recommended by Bruce):
var nonExistingDocument = new SomeDocument();
var work = IndexBatch.Merge( nonExistingDocument );
try
{
    await _search.Documents.IndexAsync( work );
}
catch ( IndexBatchException e )
{
    var toRetry = e.FindFailedActionsToRetry( work, d => d.Id );
}
IndexBatchException was thrown, which differs from the documented behavior in two ways:
"Thrown when some of the indexing actions failed, but other actions succeeded and modified the state of the index." Instead, the exception is thrown when any action fails.
"This can happen when the Search Service is under heavy indexing load." This can also happen for incorrect requests.
However, FindFailedActionsToRetry is seemingly smart enough not to suggest retrying actions that failed because the request itself was incorrect: the toRetry enumeration is empty in the code sample above.
In short, no, this exception cannot be safely ignored. The documentation is misleading and it would be nice if it were updated.

Filter cached REST-Data vs multiple REST-calls

I'm building an Angular Shop-Frontend which consumes a REST-API with Restangular.
To get the articles from the API, I use Restangular.all("articles") and I set up Restangular to cache this request.
When I want to get one article from the API, for example on the article-detail page by its linkname and later somewhere else (on the cart summary) by its id, I would need 3 REST calls:
/api/articles
/api/articles?linkname=some_article
/api/articles/5
But actually, the data for the latter two calls is already available from the cached first call.
So instead I thought about using the cached articles and filtering them to save the additional REST calls.
I built these functions into my ArticleService and it works as expected:
function getOne(articleId) {
    var article = $q.defer();
    restangular.all("articles").getList().then(function(articles) {
        var filtered = $filter('filter')(articles, {id: articleId}, true);
        article.resolve((filtered.length === 1) ? filtered[0] : null);
    });
    return article.promise;
}

function getOneByLinkname(linkname) {
    var article = $q.defer();
    restangular.all("articles").getList().then(function(articles) {
        var filtered = $filter('filter')(articles, {linkname: linkname}, true);
        article.resolve((filtered.length === 1) ? filtered[0] : null);
    });
    return article.promise;
}
My questions concerning this approach:
Are there any downsides I don't see right now? What would be the correct way to go? Is my approach of making as few REST calls as possible legitimate?
Thanks for your help.
Are there any downsides I don't see right now?
That depends on the functionality of your application. If it requires real-time data, then performing REST calls to obtain the latest data would be a requirement.
What would be the correct way to go? Is my approach of making as few REST calls as possible legitimate?
It still depends. If you want, you can explore push notifications, so that when the data on the server is changed or modified, that information is pushed to your client. That way, the REST operations happen based on conditions you have defined.
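As a rough sketch of that idea (the endpoint URL, message format and refreshArticles helper below are placeholders, not part of your code):
var socket = new WebSocket('wss://example.com/updates');

socket.onmessage = function(event) {
    var message = JSON.parse(event.data);
    if (message.type === 'articles-changed') {
        // Only hit the REST API again when the server says the data changed.
        refreshArticles(); // hypothetical helper that re-runs the articles request
    }
};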

rx.js catchup subscription from two sources

I need to combine a catch-up query with a subscription to a live feed. So first I query the database for all the records I've missed, then switch to a pub/sub for all new records that are coming in.
The first part is easy: do your query, perhaps in batches of 500; that gives you an array and you can rx.observeFrom that.
The second part is easy: you just put an rx.observe on the pub/sub.
But I need to do this sequentially, so I need to play all the old records before I start playing the new ones coming in.
I figure I can start the subscription to the pub/sub, put those events in an array, then start processing the old ones, and when I'm done either remove the dups (or, since I do a dup check, allow the few dups), play the accumulated records until they are gone, and then go one in, one out.
My question is: what is the best way to do this? Should I create a subscription that starts building up new records in an array, then start processing the old ones, and then in the "then" of the old-record processing subscribe to the other array?
OK, this is what I have so far. I need to build up the tests and finish up some pseudocode to find out whether it even works, much less whether it is a good implementation. Feel free to stop me in my tracks before I bury myself.
var catchUpSubscription = function catchUpSubscription(startFrom) {
    EventEmitter.call(this);
    var subscription = this.getCurrentEventsSubscription();

    // Calling map to start the subscription and catch events in an array.
    // Not sure if this is right.
    var events = rx.Observable.fromEvent(subscription, 'event').map(x => x);

    // getPastEvents gets batches of 500, iterates over them and emits each
    // until no more are returned, then resolves a promise.
    this.getPastEvents({count: 500, start: startFrom})
        .then(function() {
            rx.Observable.fromArray(events).forEach(x => emit('event', x));
        });
};
I don't know that this is the best way. Any thoughts?
thx
I would avoid mixing your different async strategies unnecessarily. You can use concat to join together the two sequences:
var catchUpSubscription = function catchUpSubscription(startFrom) {
    var subscription = this.getCurrentEventsSubscription();
    return Rx.Observable.fromPromise(this.getPastEvents({count: 500, start: startFrom}))
        .flatMap(x => x)
        .concat(Rx.Observable.fromEvent(subscription, 'event'));
};
// Sometime later
catchUpSubscription(startTime).subscribe(x => { /* handle event */ });

Redis: Wrong data in wrong tables

I am trying to solve a problem that has been blocking me for a month.
I am building the backend of an application using Node.js and Redis, and due to our structure we have to transfer data from one Redis table to another (what I mean by a table is the kind we switch to with "select", e.g. "select 2").
We receive a lot of requests and push a lot of responses per second, and no matter how much I tried, I could not stop the data from getting mixed. Assume we have a "teacherID" that has to be stored in Redis table #2, and a "studentID" that has to be stored in Redis table #4. No matter what I tried (I've checked my code multiple times), I could not stop teacherID from getting into studentID. The last trick I tried was placing a callback at each select:
redisClient.select(4, function(err) {
    if (err) {
        console.log("You could not select the table. Function will be aborted");
    } else {
        // Proceed with the logic
    }
});
What could be the reason that I simply cannot stop this mess? One detail that drives me crazy is that it works really well locally and also online; however, whenever multiple requests reach the server, the data gets mixed. Any suggestions to prevent this error? (Even though I cannot share the code due to an NDA, I can assure you the logic has been coded correctly.)
I'm not sure about your statement about having to "transfer data from one redis table to another". Reading through your example, it seems like you could simply have two Redis clients that write to different databases (what you called "tables"). Keep in mind that select applies to the whole connection, so if concurrent requests share a single client and keep switching databases, commands can end up running against the wrong database.
It would look similar to this:
var redis = require("redis");

var client1 = redis.createClient(6379, '127.0.0.1');
var client2 = redis.createClient(6379, '127.0.0.1');

client1.select(2, function(err) {
    console.log('client 1 is using database 2');
});

client2.select(4, function(err) {
    console.log('client 2 is using database 4');
});
Then, wherever your read/write logic is, you just use the appropriate client:
client1.set("someKey", "teacherID", function(err){
// ...
});
client2.set("someKey", "studentID", function(err){
// ...
});
You can obviously encapsulate the above into functions with callbacks, nest the operations, use some async library, etc. to make it do whatever you need it to do. If you need to transfer values from database 2 to database 4 you could do a simple client1.get() and client2.set().
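For example, a transfer could look like this small sketch, reusing client1 and client2 from above (the key name is just a placeholder):
client1.get("someKey", function(err, value) {
    if (err) {
        return console.log("Read from database 2 failed", err);
    }
    client2.set("someKey", value, function(err) {
        if (err) {
            console.log("Write to database 4 failed", err);
        }
    });
});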

AppEngine - Optimize read/write count on POST request

I need to optimize the read/write count for a POST request that I'm using.
Some info about the request:
The user sends a JSON array of ~100 items
The servlet needs to check if any of the received items is newer than its counterpart in the datastore, using a single long attribute
I'm using JDO
What I currently do is (pseudocode):
foreach (item : json.items) {
    storedItem = persistenceManager.getObjectById(item.key);
    if (item.long > storedItem.long) {
        // Update storedItem
    }
}
Which obviously results in ~100 read requests per request.
What is the best way to reduce the read count for this logic? Using a JDO query? I read that "IN" queries simply result in multiple queries executed one after another, so I don't think that would help me :(
There is also PersistenceManager.getObjectsById(Collection). Does that help in any way? I can't find any documentation on how many requests this will issue.
I think you can use the query below to do a batch get:
Query q = pm.newQuery("select from " + Content.class.getName() + " where contentKey == :contentKeys");
Something like the above query would return all the objects you need, and you can handle the rest from there.
Your best bet is
pm.getObjectsById(ids);
since that is intended for getting multiple objects in a single call (particularly since you have the ids, hence the keys). Current code (2.0.1 and later) certainly ought to do a single datastore call for getEntities(). See this issue.
