When I think about keying by something, I traditionally use the analogy of throwing all the events that match the key into the same bucket. As you can imagine, once the Flink application starts handling lots of data, what you opt to key by becomes important, because you want to make sure you clean up state well. This leads me to my question: how exactly does Flink clean up these "buckets"? If a bucket is empty (all the MapStates and ValueStates are empty), does Flink close that area of the key space and delete the bucket?
Example:
Incoming Data Format: {userId, computerId, amountOfTimeLoggedOn}
Key: UserId/ComputerId
Current Key Space:
Alice, Computer 10: Has 2 events in it. Both events are stored in state.
Bob, Computer 11: Has no events in it. Nothing is stored in state.
Will Flink come and remove Bob, Computer 11 from the Key Space eventually or does it just live on forever because at one point it had an event in it?
Flink does not store any data for state keys which do not have any user value associated with them, at least in the existing state backends: Heap (in memory) or RocksDB.
The key space is virtual in Flink; Flink does not make any assumptions about which concrete keys can potentially exist. There are no pre-allocated buckets per key or per subset of keys. Only once the user application writes some value for some key does it occupy storage.
The general idea is that all records with the same key are processed on the same machine (somewhat like being in the same bucket as you say). The local state for a certain key is also always kept on the same machine (if stored at all). This is not related to checkpoints though.
For your example, if some value was written for [Bob, Computer 11] at some point in time and then subsequently removed, Flink will remove it completely, along with the key.
Short Answer
It cleans up with the help of the Time To Live (TTL) feature of Flink state and the Java Garbage Collector (GC). The TTL feature removes any reference to the state entry, and the GC takes back the allocated memory.
Long Answer
Your question can be divided into 3 sub-questions:
I will try to be as brief as possible.
How does Flink partition the data based on Key?
For an operator over a keyed stream, Flink partitions the data on a key with the help of a consistent hashing algorithm. It creates max_parallelism buckets, and each operator instance is assigned one or more of them. Whenever a datum is to be sent downstream, the key is assigned to one of those buckets and the datum is consequently sent to the concerned operator instance. No key is stored here because the ranges are calculated mathematically. Hence, no area is ever cleared and no bucket is ever deleted. You can create any type of key you want; it won't affect the memory in terms of key space or ranges.
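For illustration, here is a minimal, hedged sketch of that bucketing idea in plain Java; the hash and the exact formula are simplified, and Flink's real logic lives in its internal key-group assignment utilities:

// Simplified sketch of key-group (bucket) assignment; not Flink's actual code.
public final class KeyGroupSketch {

    // Map a key to one of maxParallelism "buckets" (key groups).
    static int keyGroupFor(Object key, int maxParallelism) {
        // Math.floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Map a key group to one of the currently running operator instances.
    static int operatorIndexFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128;   // number of buckets, fixed for the job
        int parallelism = 4;        // running operator instances
        int bucket = keyGroupFor("Alice/Computer 10", maxParallelism);
        System.out.println("bucket=" + bucket
                + " operator=" + operatorIndexFor(bucket, maxParallelism, parallelism));
    }
}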
How does Flink store state with a Key?
All operator instances have an instance-level state store. This store defines the state context of that operator instance and it can store multiple named-state-storages e.g. "count", "sum", "some-name" etc. These named-state-storages are Key-Value stores that can store values based on the key of the data.
These KV stores are created when we initialize the state with a state descriptor in the open() function of an operator, e.g. getRuntimeContext().getState(descriptor).
These KV stores will store data only when something is needed to be stored in the state. (like HashMap.put(k,v)). Thus no key or value is stored unless state update methods (like update, add, put) are called.
So,
If Flink hasn't seen a key, nothing is stored for that key.
If Flink has seen the key but didn't call the state update methods, nothing is stored for that key.
If a state update method is called for a key, the key-value pair will be stored in the KV store.
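As an illustration, here is a hedged sketch of a keyed function that declares a named state in open() and only writes something for a key when update() is called; the LoginEvent type and its fields are placeholders modelled on the question:

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Hypothetical event type matching the question's {userId, computerId, amountOfTimeLoggedOn}.
class LoginEvent {
    String userId;
    String computerId;
    long amountOfTimeLoggedOn;
}

public class LoginTimeCounter extends KeyedProcessFunction<String, LoginEvent, Long> {

    private transient ValueState<Long> totalTime;

    @Override
    public void open(Configuration parameters) {
        // Declares the named state "totalTime"; no per-key storage exists yet.
        totalTime = getRuntimeContext().getState(
                new ValueStateDescriptor<>("totalTime", Long.class));
    }

    @Override
    public void processElement(LoginEvent event, Context ctx, Collector<Long> out) throws Exception {
        Long current = totalTime.value();                         // null if this key was never written
        long updated = (current == null ? 0L : current) + event.amountOfTimeLoggedOn;
        totalTime.update(updated);                                // only now does this key occupy storage
        out.collect(updated);
    }
}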
How does Flink clean up the state for a Key?
Flink does not delete state unless the user requires it or deletes it manually. As mentioned earlier, Flink has the TTL feature for state. This TTL marks the state entry as expired and removes it when a cleanup strategy is invoked. These cleanup strategies vary with the backend type and the time of cleanup. For the heap state backend, it removes the entry from a state table, i.e. it removes any reference to the entry; the memory occupied by this unreferenced entry is then cleaned up by the Java GC. For the RocksDB state backend, it simply calls the native delete method of RocksDB.
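For reference, a minimal sketch of enabling TTL on a state descriptor, assuming Flink's StateTtlConfig API; the one-day TTL and the descriptor name "totalTime" are placeholders:

import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;

// Entries expire one day after their last write and are removed by the
// backend-specific cleanup strategies described above.
public class TtlExample {

    static ValueStateDescriptor<Long> descriptorWithTtl() {
        StateTtlConfig ttlConfig = StateTtlConfig
                .newBuilder(Time.days(1))                                        // expire one day after last write
                .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)       // refresh TTL on writes
                .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
                .build();

        ValueStateDescriptor<Long> descriptor = new ValueStateDescriptor<>("totalTime", Long.class);
        descriptor.enableTimeToLive(ttlConfig);
        return descriptor;
    }
}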
Related
We have a keyed process function that uses state, with a "key by" done immediately before it. The "key by" attribute involves transactional values, and hence we expect many keys to be created. But these will be short-lived, and we don't expect them to last for more than a day. Is there any way we can delete all the state associated with a key, and the key itself, manually from within the keyed process function?
Will simply setting the value of the associated state variables to null enable Flink to clean it up?
We are worried that even a very minimal amount of residual data that might be left back for every key-value would accumulate and contribute to huge state size.
One solution would be to configure state TTL so that the state is automatically deleted after some period of not being used. Or you can register a keyed timer in your keyed process function, and call clear() in the onTimer method to delete the state when the timer fires.
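A hedged sketch of the timer-based variant; the Transaction type, the one-day timeout and the state name are placeholders:

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Placeholder input type standing in for the transactional records mentioned in the question.
class Transaction {
}

public class ExpiringState extends KeyedProcessFunction<String, Transaction, Transaction> {

    private static final long ONE_DAY_MS = 24 * 60 * 60 * 1000L;
    private transient ValueState<Long> lastSeen;

    @Override
    public void open(Configuration parameters) {
        lastSeen = getRuntimeContext().getState(new ValueStateDescriptor<>("lastSeen", Long.class));
    }

    @Override
    public void processElement(Transaction tx, Context ctx, Collector<Transaction> out) throws Exception {
        long now = ctx.timerService().currentProcessingTime();
        lastSeen.update(now);
        // (Re-)register a timer; when it fires we drop everything stored for this key.
        ctx.timerService().registerProcessingTimeTimer(now + ONE_DAY_MS);
        out.collect(tx);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Transaction> out) throws Exception {
        Long last = lastSeen.value();
        if (last != null && timestamp >= last + ONE_DAY_MS) {
            lastSeen.clear();   // removes the value; the key then occupies no state at all
        }
    }
}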
How frequently does Flink de/serialise operator state? Per get/update or based on checkpoints? Does the state backend make a difference?
I suspect that in the case of a keyed-stream with a diverse key (millions) and thousands of events per second for each key, the de/serialization might be a big issue. Am I right?
Your assumption is correct. It depends on the state backend.
Backends that store state on the JVM heap (MemoryStateBackend and FSStateBackend) do not serialize state for regular read/write accesses but keep it as objects on the heap. While this leads to very fast accesses, you are obviously bound to the size of the JVM heap and also might face garbage collection issues. When a checkpoint is taken, the objects are serialized and persisted to enable recovery in case of a failure.
In contrast, the RocksDBStateBackend stores all state as byte arrays in embedded RocksDB instances. Therefore, it de/serializes the state of a key for every read/write access. You can control "how much" state is serialized by choosing an appropriate state primitive, i.e., ValueState, ListState, MapState, etc.
For example, ValueState is always de/serialized as a whole, whereas a MapState.get(key) only serializes the key (for the lookup) and deserializes the returned value for the key. Hence, you should use MapState<String, String> instead of ValueState<HashMap<String, String>>. Similar considerations apply for the other state primitives.
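To make the difference concrete, here is a hedged sketch contrasting the two; the state names and the surrounding function are illustrative:

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;
import java.util.HashMap;

public class MapVsValueState extends RichFlatMapFunction<String, String> {

    private transient MapState<String, String> perEntry;               // preferred with RocksDB
    private transient ValueState<HashMap<String, String>> wholeMap;    // de/serialized as one blob

    @Override
    public void open(Configuration parameters) {
        perEntry = getRuntimeContext().getMapState(
                new MapStateDescriptor<>("attributes", String.class, String.class));
        wholeMap = getRuntimeContext().getState(
                new ValueStateDescriptor<>("attributesBlob",
                        TypeInformation.of(new TypeHint<HashMap<String, String>>() {})));
    }

    @Override
    public void flatMap(String value, Collector<String> out) throws Exception {
        perEntry.put("lastSeen", value);          // RocksDB: serializes only this key/value pair
        String one = perEntry.get("lastSeen");    // RocksDB: deserializes only this value

        HashMap<String, String> all = wholeMap.value();    // RocksDB: deserializes the whole map
        if (all == null) all = new HashMap<>();
        all.put("lastSeen", value);
        wholeMap.update(all);                              // RocksDB: re-serializes the whole map
        out.collect(one);
    }
}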
The RocksDBStateBackend checkpoints its state by copying its files to a persistent filesystem. Hence, there is no additional serialization involved when a checkpoint is taken.
In my application I run a cron job that loops over all users (2,500 users) to choose an item for every user out of 4k items, considering that:
- choosing the item is based on some user info,
- I need to make sure that each user takes a unique item that wasn't taken by anyone else, so the relation is one-to-one.
To achieve this I have to run this cron job and loop over the users one by one sequentially, pick an item for each, remove it from the list (so it won't be chosen by the next user(s)), and then move on to the next user.
Actually, in my system the number of users/items is getting bigger and bigger every single day; this cron job now takes 2 hours to assign items to all users.
I need to improve this. One of the things I've thought about is using threads, but I can't do that since I'm using automatic scaling, so I started thinking about push queues: when the cron job runs, it will loop like this:
for (User user : users) {
    getMyItem(user.getId());
}
where getMyItem will push a task to a servlet to handle it and choose the best item for this person based on their data.
Let's say I start doing that; what would be the best/most robust solution to avoid assigning an item to more than one user?
Since I'm using basic scaling and 8 instances, I can't rely on static variables.
One of the things that came to my mind is to create a table in the DB that accepts only unique items and insert the taken items into it; if the insertion succeeds, it means nobody else took this item, so I can just assign it to that person. But this will lower performance a bit, because I would need to make a DB write with every call (I want to avoid that).
Also, I thought about Memcache. It's really fast but not robust enough: if I save a set of items into it which accepts only unique items, and more than one thread tries to access this set at the same time to update it, only one thread will be able to save its data, and the other threads' data might be overwritten and lost.
I hope you guys can help to find a solution for this problem, thanks in advance :)
First - I would advise against using solely memcache for such an algorithm - the key thing to remember about memcache is that it is volatile and might disappear at any time, breaking the algorithm.
From Service levels:
Note: Whether shared or dedicated, memcache is not durable storage. Keys can be evicted when the cache fills up, according to the cache's LRU policy. Changes in the cache configuration or datacenter maintenance events can also flush some or all of the cache.
And from How cached data expires:
Under rare circumstances, values can also disappear from the cache prior to expiration for reasons other than memory pressure. While memcache is resilient to server failures, memcache values are not saved to disk, so a service failure can cause values to become unavailable.
I'd suggest adding a property, let's say called assigned, to the item entities, unset by default (or set to null/None) and, when an item is assigned to a user, set to the user's key or key ID. This allows you:
to query for unassigned items when you want to make assignments
to skip items recently assigned but still showing up in the query results due to eventual consistency, so no need to struggle for consistency
to be certain that an item can uniquely be assigned to only a single user
to easily find items assigned to a certain user if/when you're doing per-user processing of items, eventually setting the assigned property to a known value signifying done when its processing completes
Note: you may need a one-time migration task to update this assigned property for any existing entities when you first deploy the solution, to have these entities included in the query index, otherwise they would not show up in the query results.
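A hedged sketch of claiming an item inside a Datastore transaction, using the App Engine low-level Java API; the Item kind, the assigned property and the candidate limit are assumptions based on the suggestion above:

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.FetchOptions;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.Query;
import com.google.appengine.api.datastore.Transaction;

public class ItemAssigner {

    private final DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

    // Atomically claim one unassigned item for a user; returns null if none is free.
    public Entity assignItemTo(Key userKey) {
        Query q = new Query("Item")
                .setFilter(new Query.FilterPredicate("assigned", Query.FilterOperator.EQUAL, null))
                .setKeysOnly();
        for (Entity candidate : datastore.prepare(q).asIterable(FetchOptions.Builder.withLimit(10))) {
            Transaction txn = datastore.beginTransaction();
            try {
                Entity item = datastore.get(txn, candidate.getKey());  // re-read inside the transaction
                if (item.getProperty("assigned") != null) {            // someone else claimed it meanwhile
                    txn.rollback();
                    continue;
                }
                item.setProperty("assigned", userKey);
                datastore.put(txn, item);
                txn.commit();                                          // fails if the item changed concurrently
                return item;
            } catch (Exception e) {
                if (txn.isActive()) txn.rollback();
            }
        }
        return null;
    }
}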
As for the growing execution time of the cron jobs: just split the work into multiple fixed-size batches (as many as needed) to be performed in separate requests, typically push tasks. The usual approach for splitting is using query cursors. The cron job would only trigger enqueueing the initial batch processing task, which would then enqueue an additional such task if there are remaining batches for processing.
To get a general idea of how such a solution works, take a peek at Google appengine: Task queue performance (it's Python, but the general idea is the same).
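A hedged Java sketch of such a batch task; the servlet URL, batch size, User kind and processUser() are placeholders for your own handler:

import com.google.appengine.api.datastore.Cursor;
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.FetchOptions;
import com.google.appengine.api.datastore.Query;
import com.google.appengine.api.datastore.QueryResultList;
import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;

public class AssignBatchServlet extends HttpServlet {

    private static final int BATCH_SIZE = 100;

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
        FetchOptions fetch = FetchOptions.Builder.withLimit(BATCH_SIZE);
        String cursorParam = req.getParameter("cursor");
        if (cursorParam != null) {
            fetch.startCursor(Cursor.fromWebSafeString(cursorParam));   // resume where the last batch stopped
        }

        QueryResultList<Entity> users =
                datastore.prepare(new Query("User")).asQueryResultList(fetch);
        for (Entity user : users) {
            processUser(user);   // e.g. pick and assign an item for this user
        }

        // If this batch was full there may be more work: enqueue the next batch as a push task.
        if (users.size() == BATCH_SIZE) {
            Queue queue = QueueFactory.getDefaultQueue();
            queue.add(TaskOptions.Builder.withUrl("/tasks/assign-batch")
                    .param("cursor", users.getCursor().toWebSafeString()));
        }
    }

    private void processUser(Entity user) {
        // placeholder for the per-user assignment logic
    }
}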
If you are planning to push jobs inside a cron and you want the jobs to update key-value pairs as an add-on to improve speed and performance, you can split the users and items into multiple key-(list of values) pairs, so that each push job picks a key at random (write logic to pick one key out of 4 or 5), removes an item from that key's list of items, and updates the key again; try to take a lock before working on this part. Example of key-value pairs:
Userlist1: ["vijay",...]
Userlist2: ["ramana",...]
Can I use Elasticsearch to store statistics information with less overhead?
It should be used, for example, to record how often a function call was made and how much time it has taken.
Or how many requests have been made to a specific endpoint, and also the time it has taken, and so on.
My idea would be to store a key, a timestamp and a takenTime.
And I can query the results in different ways.
This would simply be handled by functions like profile_start and profile_done:
void endpointGetUserInformation()
{
    profile_start("requests.GetUserInformation");
    ...
    // profile_done stores to the database
    profile_done("requests.GetUserInformation");
}
In a normal SQL database I would make a table which holds all keys, and a second table that holds key_ids, timestamps and timeTaken values. This storage would need less space on disk.
When I store to Elasticsearch, it stores a lot of additional data, and the key is also redundant. Is there a solution to store this more simply as well?
Dears, I have a problem where multiple redis-clients are accessing a common structure stored in redis-server.
The requirements are as follows:
If a particular redis-client is accessing the structure stored in redis-server (it may do read and write operations on the structure), no other redis-client should be able to access it; they should wait until it is released.
Every time another redis-client accesses the structure, it should see the updated structure.
How can I implement a locking mechanism in C code to fulfill these requirements?
Thanks in Advance.
Redis provides the following:
1) Use Redis transactions and optimistic locking. See Redis Transactions
2) Or Lua scripting, which will be executed in Redis atomically. See EVAL
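As a hedged illustration of option 2, a small Lua script executed via EVAL; it is shown here with the Jedis client rather than C, and the key, limit and script logic are made up:

import java.util.Collections;
import redis.clients.jedis.Jedis;

// The whole read-check-increment runs atomically on the Redis server;
// no other client can interleave commands while the script executes.
public class AtomicLuaUpdate {

    private static final String SCRIPT =
            "local v = tonumber(redis.call('GET', KEYS[1]) or '0') " +
            "if v < tonumber(ARGV[1]) then " +
            "  return redis.call('INCR', KEYS[1]) " +
            "end " +
            "return -1";

    public static long incrementIfBelow(Jedis jedis, String key, long limit) {
        Object result = jedis.eval(SCRIPT, Collections.singletonList(key),
                Collections.singletonList(Long.toString(limit)));
        return (Long) result;   // new counter value, or -1 if the limit was reached
    }
}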
Use the WATCH command (https://redis.io/commands/watch) to detect modification by other clients. It applies only to the specified keys and only within a Redis transaction.
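A hedged sketch of the optimistic-locking pattern with WATCH/MULTI/EXEC, written with the Jedis client for brevity (the question asks about C; the same command sequence applies with hiredis); the key name and update logic are placeholders:

import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

public class OptimisticUpdate {

    public static void updateSharedStructure(Jedis jedis) {
        while (true) {
            jedis.watch("shared:structure");                  // watch for concurrent modification
            String current = jedis.get("shared:structure");   // read the current value

            String updated = transform(current);              // compute the new value locally

            Transaction txn = jedis.multi();
            txn.set("shared:structure", updated);
            if (txn.exec() != null) {                         // null means another client changed the key
                return;                                       // success: the write was applied atomically
            }
            // otherwise retry: re-read the structure and re-apply the change
        }
    }

    private static String transform(String current) {
        return current == null ? "1" : current + "!";         // placeholder update logic
    }
}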
Dears, thanks all for your responses. It was helpful to learn about the various features and the general approaches available with Redis.
However, I took the following approach, as it met my requirement.
I used the second timestamp (say t_sec) as the key and a counter as the hash value. If a further request comes in during that particular second, the counter value corresponding to the t_sec key is incremented (HINCRBY command) in an atomic way. The rest of the parameters are stored locally in a structure. If the counter reaches a particular set limit, requests are dropped.
When the next second arrives, a new t_sec key value is used and the counter is incremented from zero.
The t_sec key corresponding to the previous second is deleted (HDEL command).
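For completeness, a hedged sketch of this per-second counter, written with the Jedis client rather than the original C; the hash name, field layout and limit are placeholders:

import redis.clients.jedis.Jedis;

public class PerSecondLimiter {

    private static final String HASH = "request:counters";
    private static final long LIMIT = 1000;

    // Returns true if the request may proceed, false if the per-second limit is hit.
    public static boolean allow(Jedis jedis) {
        long tSec = System.currentTimeMillis() / 1000;

        // Atomic increment of the counter for the current second (HINCRBY).
        long count = jedis.hincrBy(HASH, Long.toString(tSec), 1);

        // Drop the counter for the previous second (HDEL), as in the answer above.
        jedis.hdel(HASH, Long.toString(tSec - 1));

        return count <= LIMIT;
    }
}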