We have a use case where there may be a high volume of requests for the same key. Aerospike routes requests to partitions based on a hash of the primary key, and additional partitions and rebalancing can happen to support higher traffic. However, how can Aerospike handle the case where the hot spotting is caused by a single key?
Is there a better database solution for this case?
This forum thread has some input: https://discuss.aerospike.com/t/hot-key-error-code-14/986.
The first thing is to turn on the read-page-cache setting, and then to spread reads between different replicas... or, depending on the use case, to keep multiple copies of the record and reconcile across them on reads (depending on whether it is a write hot key or a read one).
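For the read side, here's a minimal sketch with the Aerospike Java client (the host, namespace, set, and key names are hypothetical); read-page-cache itself is a server-side namespace setting, so only the replica spreading shows up in client code:

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.policy.Policy;
import com.aerospike.client.policy.Replica;

public class HotKeyRead {
    public static void main(String[] args) {
        // Hypothetical host and key names.
        try (AerospikeClient client = new AerospikeClient("127.0.0.1", 3000)) {
            Policy readPolicy = new Policy();
            // Round-robin reads across the master and replica copies of the
            // record's partition instead of always hitting the master node.
            readPolicy.replica = Replica.MASTER_PROLES;

            Key hotKey = new Key("test", "demo", "the-hot-key");
            Record record = client.get(readPolicy, hotKey);
            System.out.println(record);
        }
    }
}
```

Note this trades a little read consistency for throughput: a replica can briefly lag the master.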
Related
I have a stream of data, containing a key, that I need to mix and match with data associated with that key. Each key belongs to a partition, and each partition can be loaded from a database.
The data is quite big, and only a few hundred out of hundreds of thousands of partitions can fit in a single task manager.
My current approach is to use partitionCustom based on key.partition and to cache the partition data inside a RichMapFunction, so I can mix and match without reloading the partition data multiple times.
When the message rate on a single partition gets too high, I hit a hot-spot/performance bottleneck.
What tools do I have in Flink to improve the throughput in this case?
Are there ways to customize the scheduling and optimize job placement based on setup time on the machines and on the history of maximum processing times?
It sounds like (a) your DB-based data is also partitioned, and (b) you have skew in your keys, where one partition gets a lot more keys than other partitions.
Assuming the above is correct, and you've done code profiling on your "mix and match" code to make that reasonably efficient, then you're left with manual optimizations. For example, if you know that keys in partition X are much more common, you can put all of those keys in one partition, and then distribute the remaining keys amongst the other partitions.
Another approach is to add a "batcher" operator, which puts up to N keys for the same partition into a group (typically this also needs a timeout to flush, so data doesn't get stuck). If you can batch enough keys, then it might not be so bad to load the DB data on demand for the partition associated with each batch of keys.
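A minimal sketch of such a batcher as a Flink KeyedProcessFunction, assuming the stream is keyed by the DB partition id and that events are plain strings (adapt the types); it emits a batch either when N elements have accumulated or when the flush timeout fires:

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Groups up to MAX_BATCH events per DB partition, flushing after a timeout
// so a slow partition's events don't get stuck. Types are placeholders.
public class Batcher extends KeyedProcessFunction<Integer, String, List<String>> {
    private static final int MAX_BATCH = 100;
    private static final long FLUSH_TIMEOUT_MS = 1_000;

    private transient ListState<String> buffer;
    private transient ValueState<Integer> count;

    @Override
    public void open(Configuration parameters) {
        buffer = getRuntimeContext().getListState(
                new ListStateDescriptor<>("buffer", String.class));
        count = getRuntimeContext().getState(
                new ValueStateDescriptor<>("count", Integer.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<List<String>> out)
            throws Exception {
        buffer.add(value);
        int n = (count.value() == null) ? 1 : count.value() + 1;
        count.update(n);
        if (n == 1) {
            // First element of a fresh batch: arm the flush timer.
            ctx.timerService().registerProcessingTimeTimer(
                    ctx.timerService().currentProcessingTime() + FLUSH_TIMEOUT_MS);
        }
        if (n >= MAX_BATCH) {
            flush(out);
        }
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<List<String>> out)
            throws Exception {
        flush(out);
    }

    private void flush(Collector<List<String>> out) throws Exception {
        List<String> batch = new ArrayList<>();
        for (String s : buffer.get()) {
            batch.add(s);
        }
        if (!batch.isEmpty()) {
            // Downstream can now load the DB partition once per batch.
            out.collect(batch);
            buffer.clear();
            count.clear();
        }
    }
}
```

A downstream RichMapFunction (or an async I/O operator) then loads the partition once per emitted batch instead of once per record.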
I've read in a book that

> Flink maintains one state instance per key value and partitions all records with the same key to the operator task that maintains the state for this key.
My question is: let's say I have 4 tasks with 2 slots each, and there's a key that accounts for 95% of the data. Does that mean that 95% of the data is routed to the same machine?
Yes, it does mean that. If you have a hot key, then partitioning by key doesn't scale well.
In some cases, there are ways to work around this limitation. For example, if you are computing analytics (e.g., you want to count page views per page per minute, and one page gets 95% of the page views), you can do pre-aggregation -- split the work for the hot key across several parallel instances, and then do one final, non-parallel reduction of the partial results. (This is just standard map/reduce logic.)
This is called "data skew" and it is the bane of scalable applications everywhere.
It's also possible that the entire (100%) load goes to the same machine. There's no guarantee that the data is spread as evenly as possible by key, only that each key gets processed on a single machine. Technically, each key gets mapped to a key group (the number of key groups is the max parallelism for the topology) and each key group gets handled by a specific instance of an operator.
One way to handle this situation involves adding a second field to the key, resulting in a greater number of possible keys and possibly reducing the data skew across the keys. Then aggregate the results in a subsequent operator using just the one original key.
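A minimal sketch of that two-stage idea in the Flink DataStream API, reusing the hypothetical page-view-count example from above (the salt range, window size, and types are assumptions, and the second window simply merges whatever partials arrive in it, so exact window alignment between the stages is glossed over):

```java
import java.util.concurrent.ThreadLocalRandom;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

// Two-stage "salted" aggregation: stage 1 spreads the hot key over
// SALT_BUCKETS partial counts, stage 2 merges them back per page.
public class SaltedPageViews {
    private static final int SALT_BUCKETS = 8;

    public static DataStream<Tuple2<String, Long>> countPerPage(DataStream<String> pageViews) {
        DataStream<Tuple3<String, Integer, Long>> partials = pageViews
                .map(new MapFunction<String, Tuple3<String, Integer, Long>>() {
                    @Override
                    public Tuple3<String, Integer, Long> map(String page) {
                        // Random salt: one hot page fans out across subtasks.
                        return Tuple3.of(page, ThreadLocalRandom.current().nextInt(SALT_BUCKETS), 1L);
                    }
                })
                .keyBy(t -> t.f0 + "#" + t.f1)   // key on (page, salt)
                .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
                .reduce(new ReduceFunction<Tuple3<String, Integer, Long>>() {
                    @Override
                    public Tuple3<String, Integer, Long> reduce(Tuple3<String, Integer, Long> a,
                                                                Tuple3<String, Integer, Long> b) {
                        return Tuple3.of(a.f0, a.f1, a.f2 + b.f2);
                    }
                });

        // Stage 2: drop the salt and merge the partial counts per page.
        return partials
                .map(new MapFunction<Tuple3<String, Integer, Long>, Tuple2<String, Long>>() {
                    @Override
                    public Tuple2<String, Long> map(Tuple3<String, Integer, Long> t) {
                        return Tuple2.of(t.f0, t.f2);
                    }
                })
                .keyBy(t -> t.f0)
                .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
                .reduce(new ReduceFunction<Tuple2<String, Long>>() {
                    @Override
                    public Tuple2<String, Long> reduce(Tuple2<String, Long> a, Tuple2<String, Long> b) {
                        return Tuple2.of(a.f0, a.f1 + b.f1);
                    }
                });
    }
}
```

The final keyBy still routes each page to a single subtask, but by then every element is a pre-aggregated partial, so the hot instance merges at most SALT_BUCKETS values per page per window.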
I'm new to Couchbase and wondering if there is any way to implement a parallel read from a bucket. Given that a bucket contains 1024 vbuckets by default, could a N1QL query like `select * from b1` be split into several queries, so that one of those queries reads data only from vbucket1 through vbucket100? Because the partition key is used to decide which node a value is persisted on, I think it should be possible to read a part of the data in a bucket according to a range of partition keys. Could someone help me out with this?
Thanks
I don't recommend proceeding down this route. If you are just starting out, you should be worrying about how to represent your data in JSON, how to write effective N1QL queries against it, and how to get a useful set of indexes that support those queries and let them run quickly. You should also make sure that your cluster is properly set up, and you have a proper mix of KV, N1QL, and indexing nodes, with none of them as an obvious bottleneck. And of course you should be measuring performance. Exotic strategies like query partitioning should come after that, if you are still unsatisfied with performance.
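To make that concrete, here is a minimal sketch with the Couchbase Java SDK 3.x (the credentials, bucket, and customer_id field are hypothetical): an index that matches the WHERE clause lets the query engine avoid the full-bucket scan you were trying to parallelize by hand.

```java
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.json.JsonObject;
import com.couchbase.client.java.query.QueryResult;

import static com.couchbase.client.java.query.QueryOptions.queryOptions;

public class IndexedQuery {
    public static void main(String[] args) {
        // Hypothetical connection details.
        Cluster cluster = Cluster.connect("127.0.0.1", "user", "password");

        // One-time: index the field used in the filter.
        cluster.query("CREATE INDEX idx_customer ON b1(customer_id)");

        // The query now does an index scan instead of reading every document.
        QueryResult result = cluster.query(
                "SELECT b1.* FROM b1 WHERE customer_id = $id",
                queryOptions().parameters(JsonObject.create().put("id", "c-42")));
        result.rowsAsObject().forEach(System.out::println);

        cluster.disconnect();
    }
}
```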
We have a table that is 100 TB in size, and multiple customers use the same table (i.e., every customer uses different WHERE conditions). The problem is that every time a customer queries the table, it gets scanned from top to bottom.
This makes all the queries slow, and we cannot even partition/bucket the table based on any business keys. Can someone provide a solution or point to similar problem statements and their resolutions?
You can offer suggestions as well as alternative technologies so that we can pick the most suitable one. Thanks.
My 2 cents: experiment with an ORC table with ZLIB compression (the default) and clever partitioning / ordering:
- every SELECT that uses a partition key in its WHERE clause will do "partition pruning" and thus avoid scanning everything [OK, OK, you said you had no good candidate in your specific case, but in general it can be done, so I had to mention it first]
- then, within each ORC file in scope, the min/max counters will be checked for "stripe pruning", limiting the I/O further
With clever partitioning & clever ordering of the data at INSERT time, using the most-frequent filters, the pruning can be quite efficient.
Then you can look into optimizations such as using a non-default ORC stripe size, a non-default "bytes-per-reducer" threshold, etc.
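To illustrate, a minimal sketch of such a table setup over Hive JDBC (the endpoint, table, and column names are hypothetical, and an unauthenticated HiveServer2 is assumed for brevity); the partition column carries the most-frequent filter, and distributing/sorting at INSERT time keeps each stripe's min/max ranges tight:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class OrcTableSetup {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://hiveserver:10000/default");
             Statement stmt = conn.createStatement()) {

            // ORC with the default ZLIB compression, partitioned on the
            // most frequently filtered column.
            stmt.execute(
                "CREATE TABLE events_orc (customer_id STRING, payload STRING) " +
                "PARTITIONED BY (event_date STRING) " +
                "STORED AS ORC TBLPROPERTIES ('orc.compress'='ZLIB')");

            stmt.execute("SET hive.exec.dynamic.partition.mode=nonstrict");

            // Clustering rows by customer_id inside each partition keeps the
            // ORC stripe min/max ranges narrow, so stripe pruning kicks in
            // for WHERE customer_id = ... filters.
            stmt.execute(
                "INSERT OVERWRITE TABLE events_orc PARTITION (event_date) " +
                "SELECT customer_id, payload, event_date FROM events_raw " +
                "DISTRIBUTE BY event_date SORT BY customer_id");
        }
    }
}
```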
References:
http://fr.slideshare.net/oom65/orc-andvectorizationhadoopsummit
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC
https://streever.atlassian.net/wiki/display/HADOOP/Optimizing+ORC+Files+for+Query+Performance
http://thinkbig.teradata.com/hadoop-performance-tuning-orc-snappy-heres-youre-missing/
One last thing: with 15 nodes for running queries and a replication factor of 3, each HDFS block is available "locally" on 3 of the nodes (20%) and only "remotely" on the rest (80%). A higher replication factor may reduce I/O and network bottlenecks, at the cost of disk space, of course.
Let's say I had a table with N rows, but no existing columns that could act as a Primary Key.
I'd like to generate one (for my convenience and completeness).
I have a few options for doing this.
- I could use a GUID
- I could use a sequence and generate an integer for each one (e.g., populated 1 to N)
- I could generate a random integer

(and many more)
I get that GUIDs have their advantages and disadvantages.
Is there some advantage to using a randomly generated integer over a sequential integer?
Any CRUD operations on an indexed column shouldn't be affected, and if I were doing a bulk load, I would temporarily disable the index and rebuild it afterwards.
I can't see a reason, but I've come across a situation (in this case Oracle) where someone has done just that, and I'm hoping it's more than "What's a sequence?".
Since you're seeing a specific implementation that has chosen this approach, we can only speculate about what the original developer might have been thinking, and that's always subject to error.
My guess is that the original developer was trying to avoid the issue where the right-most block in the index on the sequence-generated key becomes the resource that blocks many different sessions trying to do an insert. The "hot block" problem occurs because every session doing an insert needs to modify the data in the right-most block (assuming sequential keys), so Oracle needs to serialize access. In most systems, this isn't a big deal; the amount of serialization needed is minimal, and most systems don't have enough simultaneous insert operations for this to be a meaningful issue. But if you have a very high-volume system, particularly if you're running on a RAC cluster, those wait events can be meaningful. If you had this sort of issue, generating a random key would eliminate it by causing the various sessions to (generally) write to different blocks in the index.
Of course, generating random keys would not be the recommended approach even if you found yourself waiting on the right-most block of an index frequently. Oracle provides reverse-key indexes to take care of the hot block issue by indexing the data in reverse, which distributes I/O across the blocks in the index. If you're licensed to use the partitioning option, a hash-partitioned index would be even better. For a more detailed discussion of reverse-key indexes, RAC, and mitigating hot block issues, here's a link to a related SO question.
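For completeness, a minimal sketch of that recommended alternative over Oracle JDBC (the connection string, credentials, and table/column names are hypothetical): keep the convenient sequential key values, but back the primary key with a reverse-key index.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ReverseKeyPk {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1", "app", "secret");
             Statement stmt = conn.createStatement()) {

            stmt.execute("CREATE SEQUENCE orders_seq");
            stmt.execute("CREATE TABLE orders (id NUMBER NOT NULL, payload VARCHAR2(100))");

            // REVERSE stores the key bytes back-to-front, so consecutive
            // sequence values scatter across index blocks instead of all
            // landing in the right-most one.
            stmt.execute("CREATE UNIQUE INDEX orders_pk_ix ON orders (id) REVERSE");
            stmt.execute("ALTER TABLE orders ADD CONSTRAINT orders_pk " +
                         "PRIMARY KEY (id) USING INDEX orders_pk_ix");

            // Inserts still draw plain sequential values from the sequence.
            stmt.execute("INSERT INTO orders (id, payload) " +
                         "VALUES (orders_seq.NEXTVAL, 'example')");
        }
    }
}
```

The trade-off to keep in mind: a reverse-key index cannot serve range scans on the key, only equality lookups.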