Tuning RocksDB to handle a lot of missing keys - apache-flink

I'm trying to configure the RocksDB instance I'm using as a state backend for my Flink job. The state RocksDB needs to hold is not too big (around 5 GB), but it needs to deal with a lot of missing keys: about 80% of the get requests will not find the key in the database. I wonder whether there is a specific configuration to help with the memory consumption. I have tried using bloom filters with 3 bits per key and increasing the block size to 16 KB, but it doesn't seem to help and the job fails with out-of-memory exceptions.
I'll be glad to hear more suggestions 😊

I wonder whether there is a specific configuration to help with the memory consumption.
If you are able to obtain a heap profile (e.g. with https://gperftools.github.io/gperftools/heapprofile.html), that will help figure out which part of RocksDB consumes the most memory.
Given the memory budget (i.e. expectation) you plan for your RocksDB, you might start with some general memory controls such as the following (a minimal RocksJava sketch follows this list):
WriteBufferManager (for the memtables, which are a large memory consumer): https://github.com/facebook/rocksdb/wiki/Write-Buffer-Manager
Tweaking the block cache size (another large memory consumer): https://github.com/facebook/rocksdb/wiki/Block-Cache
Tracking/capping memory in block cache (https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB#block-cache, https://github.com/facebook/rocksdb/wiki/Projects-Being-Developed#improving-memory-efficiency)
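For illustration, here is a minimal RocksJava sketch (not Flink-specific) wiring those controls together: a shared LRU block cache, a WriteBufferManager charged against that cache so memtables and cache share one budget, and a bloom filter with more bits per key so lookups of missing keys can be answered without hitting disk. The class name, path, and sizes are placeholders, and setter names can differ slightly across RocksJava versions.

import org.rocksdb.*;

public class RocksMemoryBudgetSketch {
    public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();

        // One shared block cache; size it against your overall memory budget.
        Cache blockCache = new LRUCache(256 * 1024 * 1024);            // 256 MB (placeholder)

        // Cap total memtable memory and charge it to the same cache,
        // so memtables + block cache stay under a single budget.
        WriteBufferManager writeBufferManager =
                new WriteBufferManager(128 * 1024 * 1024, blockCache); // 128 MB (placeholder)

        BlockBasedTableConfig tableConfig = new BlockBasedTableConfig()
                .setBlockCache(blockCache)
                .setBlockSize(16 * 1024)                       // 16 KB blocks, as tried in the question
                .setFilterPolicy(new BloomFilter(10, false));  // ~10 bits/key; 3 bits/key leaves a high false-positive rate

        Options options = new Options()
                .setCreateIfMissing(true)
                .setWriteBufferManager(writeBufferManager)
                .setTableFormatConfig(tableConfig);

        try (RocksDB db = RocksDB.open(options, "/tmp/rocksdb-memory-test")) {
            db.put("key".getBytes(), "value".getBytes());
            byte[] value = db.get("absent-key".getBytes());            // most reads miss, per the question
            System.out.println(value == null ? "miss" : "hit");
        }
    }
}

In a Flink job you would not open RocksDB yourself; the equivalent settings go through the RocksDB state backend's options factory, but the memory knobs are the same.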
I am not clear on how missing keys specifically would affect your memory consumption, though.

Related

What if the size of state is larger than the Flink memory size?

I am wondering what happens when the size of the state is larger than Flink's memory size.
Since the state is defined through Flink's APIs, e.g. a MapState<K,V> at the code level, it seems possible for the state to hold values far larger than memory (such as 100 GB or 200 GB).
Can that be possible?
You might be interested in reading about State Backends
The HashMapStateBackend holds data internally as objects on the Java heap
HashMapStateBackend will OOM your task managers if your MapStates are too big.
The EmbeddedRocksDBStateBackend holds in-flight data in a RocksDB database that is (per default) stored in the TaskManager local data directories
[...] Note that the amount of state that you can keep is only limited by the amount of disk space available. This allows keeping very large state, compared to the HashMapStateBackend that keeps state in memory. This also means, however, that the maximum throughput that can be achieved will be lower with this state backend. All reads/writes from/to this backend have to go through de-/serialization to retrieve/store the state objects, which is also more expensive than always working with the on-heap representation as the heap-based backends are doing.
EmbeddedRocksDBStateBackend will use the disk, so you have more capacity. Note that it is slower, but caches can help alleviate some of that slowness; I suggest you look at how they are configured (in Flink, using RocksDB's mechanism).
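As a rough illustration of switching to that backend, here is a minimal sketch. It assumes the flink-statebackend-rocksdb dependency is on the classpath (the package name varies across Flink versions), and the checkpoint path, job name, and placeholder pipeline are just illustrative.

import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDBBackendSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // State lives in RocksDB on local disk, so it can grow far beyond the JVM heap.
        // 'true' enables incremental checkpoints, which upload only changed SST files.
        env.setStateBackend(new EmbeddedRocksDBStateBackend(true));

        // Checkpoints still go to a durable location (placeholder path).
        env.getCheckpointConfig().setCheckpointStorage("file:///tmp/flink-checkpoints");
        env.enableCheckpointing(60_000);

        // Placeholder pipeline; your keyed state (e.g. MapState<K, V>) would live in
        // operators defined here and be stored in RocksDB rather than on the heap.
        env.fromElements(1, 2, 3).print();

        env.execute("rocksdb-backend-sketch");
    }
}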

Memory is not coming down after data processing in Apache Flink

I am using a BroadcastProcessFunction to perform simple pattern matching, broadcasting around 60 patterns. Once the processing completes, the memory does not come down. I have the garbage collection setting env.java.opts = "-XX:+UseG1GC" in my Flink configuration file to perform GC, but that is not working either. The CPU percentage does come down after the data processing completes, though. I am checkpointing every 2 minutes and my state backend is filesystem. Below are screenshots of memory and CPU usage.
I don't see anything surprising or problematic in the graphs you have shared. After ingesting the patterns, each instance of your BroadcastProcessFunction will be holding onto a copy of all of the patterns -- so that will consume some memory.
If I understand correctly, it sounds like the situation is that as data is processed for matching against those patterns, the memory continues to increase until the pods crash with out-of-memory errors. Various factors might explain this:
If your patterns involve matching a sequence of events over time, then your pattern matching engine has to keep state for each partial match. If there's no timeout clause to ensure that partial matches are eventually cleaned up, this could lead to a combinatorial explosion.
If you are doing key-partitioned processing and your keyspace is unbounded, you may be holding onto state for stale keys (state TTL can help here; see the sketch after this list).
The filesystem state backend has considerable overhead. You may have underestimated how much memory it needs.
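For the stale-keys point above, this is roughly what enabling state TTL looks like; the one-hour TTL, the descriptor name, and the value type are arbitrary placeholders.

import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;

public class TtlStateSketch {
    // Expire per-key state that has not been written for an hour (placeholder value),
    // so entries for keys that stop arriving are eventually dropped instead of accumulating.
    static ValueStateDescriptor<Long> buildDescriptor() {
        StateTtlConfig ttlConfig = StateTtlConfig.newBuilder(Time.hours(1))
                .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
                .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
                .cleanupFullSnapshot()   // also drop expired entries when a full snapshot is taken
                .build();

        ValueStateDescriptor<Long> descriptor =
                new ValueStateDescriptor<>("lastSeenTimestamp", Long.class);
        descriptor.enableTimeToLive(ttlConfig);
        return descriptor;
    }
}

The descriptor is then registered as usual in the function's open() method via getRuntimeContext().getState(descriptor).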

What does a cluster size mean?

When someone says "I have a cluster size of 6TB", what do they mean?
In terms of database size? Or in terms of the amount of data being processed at any given time? I don't know what this means.
That may depend on the context.
If someone mentions 6 TB, they are probably talking about storage.
Strictly speaking, a cluster refers to the chunks in which data is stored, where related data should be kept in the same cluster.
To understand a database cluster, you should first understand a file system cluster, which may be 4, 8, 16, 32, 64, ... KB, for example; your data is stored in one or many clusters. If your file system is intended to store big files, then your cluster size should be bigger and performance is improved; if you want to store small files, then your clusters should be smaller in order to optimize space usage.
In a database, your goal should be to store related data in the same cluster; in that case performance is optimized.
Anyway, 6 TB is not a number that makes sense as a cluster size, so it probably refers to storage space (or the sum of several storage volumes).
You may want to check the following doc to get an idea:
https://docs.oracle.com/database/121/ADMIN/clustrs.htm#ADMIN018

EHCache heap and off-heap: how do I evaluate whether the resources are enough?

EHCache version 3.2
How do I evaluate "ehcache:resources" ?
File: ehcache.xml
<ehcache:cache alias="messageCache">
  <ehcache:key-type>java.lang.String</ehcache:key-type>
  <ehcache:value-type>org.cache.messageCacheBean</ehcache:value-type>
  <ehcache:resources>
    <ehcache:heap unit="entries">10</ehcache:heap>
    <ehcache:offheap unit="MB">1</ehcache:offheap>
  </ehcache:resources>
</ehcache:cache>
Assume:
Table name: Message, which has 4 columns of type varchar2(100 byte) and holds 1000 rows or more in the database.
How large do the heap/offheap values need to be to be enough?
Thank you.
Sizing caches is part of the exercise and does not really have a general answer.
However, understanding a bit how Ehcache works is needed. In the context of your configuration:
All mappings will always be present in offheap, so this tier should be sized large enough to balance memory usage against the latency benefits your application needs.
The onheap tier will be used for the hot set, giving an even better latency reduction. However, if it is too small, it will be evicting all the time, making its benefit less interesting; if it is too large, it impacts Java garbage collection too much.
One thing you can do to help size the onheap tier is to switch to byte-based sizing in one test. While it will have a performance impact, it will let you evaluate how much memory a mapping takes, and thus derive how many mappings fit in the memory you are ready to spare.
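For illustration, a sketch of that byte-sized test using the Ehcache 3 programmatic API; the sizes are placeholders and MessageCacheBean is a stand-in for the question's org.cache.messageCacheBean value type (offheap storage needs it to be Serializable).

import java.io.Serializable;

import org.ehcache.Cache;
import org.ehcache.CacheManager;
import org.ehcache.config.builders.CacheConfigurationBuilder;
import org.ehcache.config.builders.CacheManagerBuilder;
import org.ehcache.config.builders.ResourcePoolsBuilder;
import org.ehcache.config.units.MemoryUnit;

public class MessageCacheSizingSketch {

    // Stand-in for the question's value type: four varchar2(100 byte) columns.
    public static class MessageCacheBean implements Serializable {
        private static final long serialVersionUID = 1L;
        public String col1, col2, col3, col4;
    }

    public static void main(String[] args) {
        // Size both tiers in bytes instead of entries for this test, so you can
        // observe how much memory a given number of mappings actually occupies.
        CacheManager cacheManager = CacheManagerBuilder.newCacheManagerBuilder()
                .withCache("messageCache",
                        CacheConfigurationBuilder.newCacheConfigurationBuilder(
                                String.class, MessageCacheBean.class,
                                ResourcePoolsBuilder.newResourcePoolsBuilder()
                                        .heap(8, MemoryUnit.MB)       // placeholder hot-set budget
                                        .offheap(64, MemoryUnit.MB))) // placeholder full-data budget
                .build(true);

        Cache<String, MessageCacheBean> cache =
                cacheManager.getCache("messageCache", String.class, MessageCacheBean.class);
        // ... load a representative sample of rows here and watch tier sizes / evictions ...

        cacheManager.close();
    }
}

As a rough sanity check on the raw data: 1000 rows times 4 varchar2(100 byte) columns is on the order of 400 KB of column data, but Java object and serialization overhead can multiply that several times, which is why measuring beats estimating.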

Handling large data in a C program

I am working on a project which runs queries on a database, and the results are greater than the memory size. I have heard of memory pool libraries, but I'm not sure they are the best solution to this problem.
Do memory pool libraries support writing to and reading back from disk (since the result of a query needs to be parsed many times)? Are there other ways to achieve this?
P.S.
I am using a MySQL database and its C API to access it.
EDIT: here's an example:
Suppose I have five tables, each having a million rows. I want to find how similar one table is to another, so I am creating a bloom filter for each table and then checking each filter against the data in the other four tables.
Extending your logical memory beyond the physical memory by using secondary storage (e.g. disks) is usually called swapping, not memory pooling. Your operating system already does it for you, and you should try letting it do its job first.
Memory pool libraries provide more speed and real-time predictability to memory allocation by using fixed-size allocation, but don't increase your actual memory.
You should restructure your program to not use so much memory. Instead of pulling the whole (or a large part) of the DB into memory, you should use a cursor and incrementally update the data structure your program is maintaining, or incrementally update the metric you are querying.
EDIT: you added that you might want to run a bloom filter on the tables?
Have a look at incremental bloom filters: here
How about the Physical Address Extension (PAE)?
