Would Camel be appropriate for very high volume transaction processing - apache-camel

Just looking for opinions... I have used Camel for relatively low-volume transactional processing (data flowing into a queue). I'm just wondering whether Camel would still be appropriate for very high-volume processing, where relatively short transaction-related messages (< 2 KB each) would flow in at a rate of over 200,000 messages a minute. I'm sure anything could be made to work; I'm just wondering if it would be a reasonable go-to solution.
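For concreteness, the kind of route I have in mind would look something like this minimal sketch (endpoint names are hypothetical, and an in-memory seda: queue stands in for the real broker endpoint, where the equivalent tuning would be concurrentConsumers on the JMS endpoint):

import org.apache.camel.CamelContext;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;

public class HighVolumeRouteSketch {
    public static void main(String[] args) throws Exception {
        CamelContext context = new DefaultCamelContext();
        context.addRoutes(new RouteBuilder() {
            @Override
            public void configure() {
                // Several concurrent consumers drain the queue in parallel;
                // the log endpoint prints throughput stats every 10,000 messages.
                from("seda:transactions?concurrentConsumers=20")
                    .to("log:processed?groupSize=10000");
            }
        });
        context.start();
        Thread.sleep(60_000);   // let the route run for a minute in this sketch
        context.stop();
    }
}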

Related

Kinds of multi-partitioned stored procedures and will they still lock the entire cluster in VoltDB 9?

I am trying to understand the impact of multi-partitioned transactions in VoltDB 9.x. I know it is designed for single-partitioned transactions, but I want to know what it will cost me if I can't avoid them.
In summary, my question is whether it is still the case that multi-partitioned transactions in VoltDB always lock the entire cluster, and how the different kinds of multi-partitioned transactions are related to each other with regard to their execution behaviour.
From H-Store-FAQ:
[...] this allows H-Store to support additional optimizations, such as speculative execution and arbitrary multi-partition transactions. For example, in VoltDB every transaction is either single-partition or all-partition. That is, any transaction that needs to touch multiple partitions will cause the VoltDB’s transaction coordinator to lock all partitions in the cluster, even if the transaction only needs to touch data at two partitions. [...] It is likely VoltDB will support these features in the future [...]
The papers The VoltDB Main Memory DBMS and How VoltDB does Transactions claim that there is at least one split of multi-partitioned transactions in VoltDB: One-Shot-Reads and General-2PC-Transactions.
In the class MpTransactionTaskQueue there is a distinction as to whether a transaction will be routed to the multi-partitioned site (count 1) or to a pool of read-only sites (default count up to 20) of the MPI, and the two can't be executed interleaved.
So these are my sub-questions:
Are One-Shot-Reads always executed on RO sites?
Do RO sites also execute multi-partitioned transactions that are read-only but not one-shot?
If there is at least one write fragment in a multi-partitioned transaction, is it executed on the RW site and committed atomically with 2PC?
In both cases it is possible that I don't have to touch all partitions in the cluster. Are uninvolved partitions locked, or can they execute single-partitioned transactions in the meantime (if several One-Shot-Reads or one 2PC transaction are running on other partitions)? If they are locked, how? Do they get a FragmentTaskMessage with an empty or dummy plan fragment, for example?
The class SystemProcedureCatalog defines an "Every-Flag" which is checked in code in addition to the read-only and single-partitioned flags. How is this flag related to One-Shot-Reads or the Run-Everywhere pattern?
To make things easier for developers, procedures are called the same way regardless of what type they are. Internally there are different types of multi-partition procedures as they provide some optimizations, although there is more to be done and some H-Store projects have done research in these areas.
MP transactions still ultimately involve sending tasks to be done on all the partitions. The one exception you noticed is a special two-partition transaction that is only used in rebalancing data during elastic add or shrink.
Partitions consist of one or more sites (on separate servers) depending on kfactor. These sites stay in sync without a 2PC by requiring deterministic procedures. The partitions work through the backlog in a queue as fast as the process time (or local execution time) allows. All sites handle both reads and writes.
MP tasks sent to those partition queues have to wait on all the pending items to finish. That is why there is a pool of 20 (by default) threads for MP reads. This allows 20 tasks to be sent out at once, so that the next MP read usually doesn't have to wait for 2 network hops + the max queue wait time + processing time before it can even get queued.
MP reads that are not "single-shot" would be Java procedures with multiple voltExecuteSQL() calls, such as a procedure where subsequent SQL queries depend on the results of prior queries. When these transactions send tasks to the partitions, the partitions have to wait for the max queue wait time + processing time + 2 network hops before they can do the next part of the transaction.
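As a rough illustration, a multi-round MP read procedure looks something like this sketch (table and column names are hypothetical, and the procedure would be declared as multi-partition in the DDL):

import org.voltdb.SQLStmt;
import org.voltdb.VoltProcedure;
import org.voltdb.VoltTable;

// Hypothetical MP read procedure: the second query depends on the first result,
// so it needs two voltExecuteSQL() rounds and is not "one-shot".
public class OrdersAtMaxTotal extends VoltProcedure {
    public final SQLStmt maxTotal = new SQLStmt(
            "SELECT MAX(total) FROM orders;");
    public final SQLStmt ordersWithTotal = new SQLStmt(
            "SELECT * FROM orders WHERE total = ?;");

    public VoltTable[] run() {
        voltQueueSQL(maxTotal);
        VoltTable[] first = voltExecuteSQL();   // first round of fragments to the partitions
        long max = first[0].asScalarLong();     // result feeds the next query
        voltQueueSQL(ordersWithTotal, max);
        return voltExecuteSQL(true);            // final batch, second round trip
    }
}

Each voltExecuteSQL() call is a separate round of fragments sent to the partitions, which is exactly where the extra queue-wait and network-hop delays described above come in.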
MP writes can also have multiple voltExecuteSQL() calls, plus they have to wait for a final commit signal, so this all delays the progress on the partitions.
There are certainly examples of MP transactions that shouldn't need to involve all of the partitions and could benefit from future optimizations, but it's not as easy as it may seem on a database that has to support durability to disk, k-safety, elastic add and shrink, multi-cluster active-active replication, and many of the other features that have been added to VoltDB over the years since it grew out of the H-Store project.
Disclosure: I work at VoltDB

Frequently Updated Table in Cassandra

I am doing an IoT sensor-based project, in which each sensor sends data to the server every minute. I am expecting a maximum of 100k sensors in the future.
I log the data sent by each sensor in a history table, but I also have a Live Information table in which the latest status of each sensor is updated.
So I want to update the row corresponding to each sensor in the Live table every minute.
Is there any problem with this? I read that frequent update operations are bad in Cassandra.
Is there a better way?
I am already using Redis in my project for storing sessions etc. Should I move this Live table to Redis?
This is what you're looking for: https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_memtable_thruput_c.html
How you tune memtable thresholds depends on your data and write load. Increase memtable throughput under either of these conditions:
The write load includes a high volume of updates on a smaller set of data.
A steady stream of continuous writes occurs. This action leads to more efficient compaction.
So increasing commitlog_total_space_in_mb will make Cassandra flush memtables to disk less often. This means most of your updates will happen in memory only and you will have fewer duplicates of data.
In C* there are consistency levels for reads and consistency levels for writes. If you are going to have only one node, this doesn't apply and there is zero problem, but if you are going to use more than one DC or rack, you need to increase the read consistency level to guarantee that what you are retrieving is the latest version of the updated row, or use a high consistency level at write time. In my case I'm using ANY to write and QUORUM to read. This allows writes to succeed with all nodes except one down, and reads as long as 51% of the nodes are up. This is a trade-off in the CAP theorem. Please take a look at:
http://docs.datastax.com/en/cassandra/latest/cassandra/dml/dmlConfigConsistency.html
https://wiki.apache.org/cassandra/ArchitectureOverview
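As a hedged sketch of that write/read split with the DataStax Java driver 3.x (contact point, keyspace, table, and column names are hypothetical):

import java.util.Date;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class LiveTableAccessSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("iot");   // hypothetical keyspace

        // Upsert the latest sensor status with a cheap write consistency level (ANY).
        Statement write = new SimpleStatement(
                "UPDATE live_status SET last_value = ?, updated_at = ? WHERE sensor_id = ?",
                42.0, new Date(), "sensor-123")
                .setConsistencyLevel(ConsistencyLevel.ANY);
        session.execute(write);

        // Read it back at QUORUM so the latest version wins.
        Statement read = new SimpleStatement(
                "SELECT * FROM live_status WHERE sensor_id = ?", "sensor-123")
                .setConsistencyLevel(ConsistencyLevel.QUORUM);
        ResultSet rs = session.execute(read);
        System.out.println(rs.one());

        cluster.close();
    }
}

Note that ConsistencyLevel.ANY is valid only for writes.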

GAE Task Queues with ETA and large number of tasks

In my app, I need to send emails to a large number of users when an event happens. I'd like to send those emails out gradually instead of all at once. For clarity in explaining, let's say I need to send out emails to 10,000 users.
I currently do this with a task queue with a maximum rate of 1 task/second. I enqueue 10,000 tasks in batches, and the emails get sent out at a rate of 1/second.
I'd like to change this to using an ETA for the tasks instead of limiting the task queue to a maximum rate. Conceptually it would be like this (except that task submission would be batched):
from datetime import datetime, timedelta
from google.appengine.ext import deferred

now = datetime.utcnow()
for i, email in enumerate(email_list):
    # stagger each task's ETA by one second so the sends go out gradually
    eta = now + timedelta(seconds=i)
    deferred.defer(send_email, email, _eta=eta)
Before implementing a change like this, I'd like to have some confidence that GAE can do this efficiently.
If I have 10,000 tasks in a task queue, each with a different ETA, will the GAE task queue be able to efficiently monitor all the tasks and start them at approximately (the precise ETA isn't important) the appropriate times? I don't know what algorithm Google uses for this.
EDIT:
Imagine if you inserted a billion tasks in a single day each with an ETA. How would GAE monitor those tasks to make sure they got fired off at the right time? Polling all the tasks at some time interval (e.g., every minute) would be a terrible solution. Perhaps GAE uses some kind of priority queue. It would be nice to have some confidence that GAE has implemented an algorithm that will scale for a lot of tasks with an ETA.
With the stated daily quota of 10 billion tasks one would think they should be able to handle 10,000 of them :)
In my current project I'm also sending ~10,000 emails (SendGrid) with tasks & _eta (although in batches of 25) which works fine so far...
In the current infrastructure, the logic can be a little batchy when the throughput is significantly below the configured rate. Queues prepare tasks 5s in advance but processing can slow down if there are no tasks in a given 5s window.
It should work in general, but you might see a pattern of delays of up to 20s followed by bursts.
At a total throughput of 1B tasks/day, you would probably want to split to run over 40 queues at a rate of around 300 tasks/sec/queue. With a rate that steady, delays would be uncommon.

How to decide Kafka Cluster size

I am trying to decide how many nodes should be present in a Kafka cluster, but I am not sure about the parameters to take into consideration. I am sure it has to be >= 3 (with a replication factor of 2 and a failure tolerance of 1 node).
Can someone tell me what parameters should be kept in mind while deciding the cluster size, and how they affect the size?
I know of the following factors, but I don't know how they quantitatively affect the cluster size (I do know how they qualitatively affect it). Is there any other parameter which affects the cluster size?
1. Replication factor (cluster size >= replication factor)
2. Node failure tolerance. (cluster size >= node-failure + 1)
What should the cluster size be for the following scenario, taking all the parameters into consideration?
1. There are 3 topics.
2. Each topic has messages of different sizes. Message sizes range from 10 KB to 500 KB, with an average of 50 KB.
3. Each topic has a different number of partitions: 10, 100, and 500.
4. The retention period is 7 days.
5. 100 million messages get posted every day for each topic.
Can someone please point me to relevant documentation or a blog that discusses this? I have searched Google but to no avail.
As I understand it, getting good throughput from Kafka doesn't depend only on the cluster size; there are other configurations which need to be considered as well. I will try to share as much as I can.
Kafka's throughput is supposed to be linearly scalable with the number of disks you have. The new multiple data directories feature introduced in Kafka 0.8 allows Kafka's topics to have different partitions on different machines. As the partition number increases greatly, so do the chances that the leader election process will be slower, also affecting consumer rebalancing. This is something to consider, and it could be a bottleneck.
Another key thing could be the disk flush rate. As Kafka always immediately writes all data to the filesystem, the more often data is flushed to disk, the more "seek-bound" Kafka will be, and the lower the throughput. Then again, a very low flush rate might lead to different problems, as in that case the amount of data to be flushed will be large. So providing an exact figure is not very practical, and I think that is the reason you couldn't find such a direct answer in the Kafka documentation.
There will be other factors too: for example the consumer's fetch size, compression, the batch size for asynchronous producers, socket buffer sizes, etc.
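As a rough sketch of where those knobs live, using config key names from the newer Java clients (the values below are placeholders, not recommendations):

import java.util.Properties;

public class KafkaClientTuningSketch {
    public static void main(String[] args) {
        // Producer-side knobs mentioned above.
        Properties producer = new Properties();
        producer.put("compression.type", "snappy");       // compression
        producer.put("batch.size", "65536");              // batch size for async sends
        producer.put("send.buffer.bytes", "131072");      // socket send buffer

        // Consumer-side knobs.
        Properties consumer = new Properties();
        consumer.put("fetch.min.bytes", "1048576");       // consumer fetch size
        consumer.put("receive.buffer.bytes", "131072");   // socket receive buffer

        System.out.println("producer: " + producer);
        System.out.println("consumer: " + consumer);
    }
}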
Hardware & OS will also play a key role in this, as using Kafka in a Linux-based environment is advisable due to its page cache mechanism for writing data to disk. Read more on this here.
You might also want to take a look at how OS flush behavior plays a key role before you actually tune it to fit your needs. I believe it is key to understand the design philosophy, which makes Kafka so effective in terms of throughput and fault tolerance.
Some more resources I found useful to dig into:
https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
http://blog.liveramp.com/2013/04/08/kafka-0-8-producer-performance-2/
https://grey-boundary.io/load-testing-apache-kafka-on-aws/
https://cwiki.apache.org/confluence/display/KAFKA/Performance+testing
I recently worked with Kafka and these are my observations.
Each topic is divided into partitions, and all the partitions of a topic are distributed across the Kafka brokers. First of all, this helps store topics whose size is larger than the capacity of a single Kafka broker, and it also increases consumer parallelism.
To increase reliability and fault tolerance, replicas of the partitions are made, and they do not increase consumer parallelism. The rule of thumb is that a single broker can host only a single replica per partition. Hence, the number of brokers must be >= the number of replicas.
All partitions are spread across all the available brokers. The number of partitions can be independent of the number of brokers, but to get the best throughput the number of partitions should equal the number of consumer threads in a consumer group.
The cluster size should be decided keeping in mind the throughput you want to achieve at the consumer.
The total MB/s per broker would be:
Data/day = (100×10^6 messages/day) × 0.05 MB = 5 TB/day per topic
That gives us ~58 MB/s per topic. Assuming that the messages are split equally between partitions, for the total cluster we get 58 MB/s × 3 topics ≈ 174 MB/s, or, spread over 3 brokers, ~58 MB/s of original data per broker.
Now, for the replication, you have 1 extra replica per topic. Therefore this becomes 58 MB/s per broker of incoming original data + 58 MB/s per broker of outgoing replication data + 58 MB/s per broker of incoming replication data.
That is roughly ~116 MB/s per broker ingress and ~58 MB/s per broker egress.
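Spelled out as a small calculation (assuming 3 brokers, 3 topics, an even spread of the load, and a replication factor of 2):

public class KafkaSizingSketch {
    public static void main(String[] args) {
        double messagesPerDay = 100e6;           // per topic
        double avgMessageMb = 0.05;              // 50 KB average message size
        int topics = 3;
        int brokers = 3;                         // minimum cluster size from the question
        int replicationFactor = 2;               // 1 extra replica

        double mbPerDayPerTopic = messagesPerDay * avgMessageMb;               // ~5 TB/day
        double mbPerSecPerTopic = mbPerDayPerTopic / 86_400;                   // ~58 MB/s
        double clusterMbPerSec = mbPerSecPerTopic * topics;                    // ~174 MB/s
        double originalIngressPerBroker = clusterMbPerSec / brokers;           // ~58 MB/s
        // Each broker also receives one replicated copy and sends one copy out.
        double ingressPerBroker = originalIngressPerBroker * replicationFactor;        // ~116 MB/s
        double egressPerBroker = originalIngressPerBroker * (replicationFactor - 1);   // ~58 MB/s

        System.out.printf("per-topic: %.0f MB/s, cluster: %.0f MB/s%n",
                mbPerSecPerTopic, clusterMbPerSec);
        System.out.printf("per-broker ingress: %.0f MB/s, egress: %.0f MB/s%n",
                ingressPerBroker, egressPerBroker);
    }
}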
The system load will get very high, and this is without taking any stream processing into consideration.
The system load could be handled by increasing the number of brokers and splitting your topics into more partitions.
If your data are very important, then you may want a different (high) replication factor. Fault tolerance is also an important factor for deciding the replication.
For example, if you had very, very important data, then apart from the N active brokers (with the replicas) that are managing your partitions, you may need to add stand-by followers in different areas.
If you require very low latency, then you may want to further increase your partitions (by adding additional keys). The more keys you have, the fewer messages you will have on each partition.
For low latency, you may want a separate cluster (with its replicas) that manages only that special topic, with no additional computation done for other topics.
If a topic is not very important, then you may want to lower the replication factor of that particular topic and be more tolerant of some data loss.
When building a Kafka cluster, the machines supporting your infrastructure should be equally capable. Since the partitioning is done round-robin style, you expect each broker to be capable of handling the same load, so the size of your messages does not matter.
The load from stream processing will also have a direct impact. Good software to monitor your Kafka cluster and manage your streams is Lenses, which I personally favor a lot since it does an amazing job with processing real-time streams.

How many shards in a Google App Engine sharded counter?

I read today about sharded counters in Google App Engine. The article says that you should expect to max out at about 5 updates per second per entity in the datastore. But it seems to me that this solution doesn't 'scale' unless you have some way of knowing how many updates you are doing per second. For example, you can allocate 10 shards, but you will then start choking at 50 updates per second.
So how do you know how fast the updates are coming, and how do you feed that number back into the number of shards?
My guess is that along with the counter you could keep some record of recent activity, and if you detect a spike you can increase the number of shards. Is that generally how it's done? And if so, why isn't it done in the sample code? (That last question may be unanswerable.) Is it more common practice to monitor website activity and update shard counts as traffic rises, as opposed to doing it automatically in the code?
Update: What are the practical consequences of having too few shards and choking? Does it simply mean that the website becomes unresponsive, or is it possible to lose counter updates because of timeouts?
As an aside, this question talks about implementing counters without sharding, but one of the answers implies that even memcache needs to be sharded if traffic is high. So this issue of shard allocation and tuning seems to be important.
It is clearly simpler to manually monitor your website's popularity and increase the number of shards as needed. I would guess that most sites take this approach. Doing it programmatically would not only be difficult, but it sounds like it would add an unacceptable amount of overhead to keep a record of all recent activity and try to analyze it to dynamically adjust the number of shards you're using.
I would prefer the simpler approach of just erring a little on the high side with the number of shards you choose.
You are correct about the practical consequences of having too few shards. Updating a datastore entity more frequently than possible will initially cause some requests to take a long time (while the writes retry). If you have enough of them pile up, then they will start to fail as requests time out. This will certainly lead to missed counter updates. On the upside, your page will be so slow that users should start leaving, which should relieve the pressure on the datastore :).
To address the last part of your question: Your memcache values will not require sharding. A single memcache server can handle tens of thousands of QPS of fetches and updates, so no plausibly large app is going to need to shard its memcache keys.
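For reference, a minimal sketch of what erring on the high side with a fixed shard count looks like (entity kind, property names, and the shard count are hypothetical):

import java.util.Random;
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.datastore.Transaction;

public class ShardedCounter {
    private static final int NUM_SHARDS = 20;   // err a little on the high side
    private static final Random RNG = new Random();

    public static void increment(String counterName) {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        int shard = RNG.nextInt(NUM_SHARDS);    // spread writes across the shards
        Key key = KeyFactory.createKey("CounterShard", counterName + "_" + shard);
        Transaction tx = ds.beginTransaction();
        try {
            Entity e;
            try {
                e = ds.get(tx, key);
            } catch (EntityNotFoundException notFound) {
                e = new Entity(key);
                e.setProperty("count", 0L);
            }
            e.setProperty("count", (Long) e.getProperty("count") + 1);
            ds.put(tx, e);
            tx.commit();
        } finally {
            if (tx.isActive()) {
                tx.rollback();
            }
        }
    }
}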
Why not add to the number of shards when Exceptions begin to occur?
Based on this GAE Example:
// ds is a com.google.appengine.api.datastore.DatastoreService;
// addShards() and getShardCount() are helper methods you would write yourself.
DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
try {
    Transaction tx = ds.beginTransaction();
    // increment shard
    tx.commit();
} catch (DatastoreFailureException e) {
    // Datastore is struggling to handle the current load: increase / double the shard count
    addShards(getShardCount());
} catch (DatastoreTimeoutException e) {
    // Datastore is struggling to handle the current load: increase / double the shard count
    addShards(getShardCount());
} catch (ConcurrentModificationException e) {
    // Datastore is struggling to handle the current load: increase / double the shard count
    addShards(getShardCount());
}
