How to clear the whole MapSate state with only one call - apache-flink

I know that if I do mapState.clear() I will be able to clean all the values into the state for the specific key, but my question is: Is there a way to do something like mapState.clear() and clean all the states into the mapStates with just one call? will be something like mapState.isEmpty() it will say "true" because all the keys into the mapState were cleaned up, not just for the current key.
Thanks.
Kind regards!

Because we are talking about a situation with nested maps, it's easy to get our terminology confused. So let's put this question into the context of an example.
Suppose you have a stream of events about users, and inside a KeyedProcessFunction you are using a MapState<ATTR, VALUE> to maintain a map of attribute/value pairs for each user:
userEvents
.keyBy(e -> e.userId)
.process(new ManageUserData())
Inside the process function, any time you are working with MapState you can only manipulate the one map for the user corresponding to the event being processed,
public static class ManageUserData extends KeyedProcessFunction<...> {
MapState<ATTR, VALUE> userMap;
}
so userMap.clear() will clear the entire map of attribute/value pairs for one user, but leave the other maps alone.
I believe you are asking if there's some way to clear all of the MapStates for all users at once. And yes, there is a way to do this, though it's a bit obscure and not entirely straightforward to implement.
If you change the KeyedProcessFunction in this example to a KeyedBroadcastProcessFunction, and connect a broadcast stream to the stream of user events, then in that KeyedBroadcastProcessFunction you can use KeyedBroadcastProcessFunction.Context.html#applyToKeyedState inside of the processBroadcastElement() method to iterate over all of the users, and for each user, clear their MapState.
You will have to arrange to send an event on the broadcast stream whenever you want this to happen.
You should pay attention to the warnings in the documentation regarding working with broadcast state. And keep in mind that the logic implemented in processBroadcastElement() must have the same deterministic behavior across all parallel instances.

Related

Flink AggregateFunction vs KeyedProcessFunction with ValueState

We have an application that consumes events from a kafka source. The logic from processing each element needs to take into account the events that were previously received (having the same partition key), without using time for windowing. The first implementation used a GlobalWindow, with an AggregateFunction for keeping the current state information and a trigger that would always fire in onElement call. I am guessing that the alternative of using a KeyedProcessFunction that and holds the state in a ValueState object would be more adequate, since we are not really taking timing into account, nor using any custom triggering. Is this assumption correct and are there any downsides to either one of these approaces?
In prefer using a KeyedProcessFunction in cases like this. It puts all of the related logic into one object -- rather than having to coordinate what's going on in a GlobalWindow, an AggregateFunction, and a Trigger (and perhaps also an Evictor). I find this results in implementations that are more maintainable and testable, plus you have more straightforward control over state management.
I don't see any advantages to a solution based on windows.

What is the difference between seeding a action and call a 'setter' method of a store in reflux data flow?

What is the difference between seeding a action and call a 'setter' method of a store in reflux data flow?
TodoActions['add'](todo)
vs
TodoStore.add(todo)
Action will trigger your store via RefluxJS lib, but Store.Add() is calling add method directly
First off, it's useful to note that Whatever.func() and Whatever['func']() are just two different syntaxes for the same thing. So the only difference here in your example is what you're calling it on.
As far as calling a method in a store directly, vs. an action which then ends up calling that method in a store, the difference is architectural, and has to do with following a pattern that is more easily scaled, works more broadly, etc. etc.
If any given event within the program (such as, in this case, adding something) emits 1 clear action that anything can listen for, then it becomes MUCH easier to build large programs, edit previously made programs, etc. The component saying that this event has happened doesn't need to keep track of everywhere that might need to know about it...it just needs to say TodoActions.add(todo), and every other part of the program that needs to know about an addition happening can manage itself to make sure it's listening for that action.
So that's why we follow the 1 way looping pattern:
component -> action -> store -> back to component
Because then the flow of events happening is much more easily managed, because each part of the program can manage its own knowledge about the program state and when it needs to be changed. The component emitting the action doesn't need to know every possible part of the program that might need that action...it just needs to emit it.

what is this difference between this flux action and this function call?

I could a have a flux action like this:
{type: 'KILL', payload: {target: 'ogre'}}
But I am not seeing what the difference is between having a method on a class People (wrapping the store) like this,
People.kill('ogre')
IF People is the only receiver of the action?
I see that the flux dispatcher gives me two advantages (possibly)
The "kill" method can be broadcast to multiple unknown receivers (good!)
The dispatcher gives me a handy place to log all action traffic (also good!)
These might be good things sure, but is there any other reasons that I am missing?
What I don't see is how putting the actions in the form of JSON objects, suddenly enforces or helps with "1-way" communication flow, which is what I read everywhere is the big advantage of having actions, and of flux.
Looks to me like I am still effectively sending a message back to the store, no matter how I perfume the pig. Sure the action is now going through a couple of layers of indirection (action creator, dispatcher) before it gets to the store, but unless I am missing something the component that sends that action for all practical purposes is updating whatever stores are listening for the kill message.
What I am missing here?
Again I know on Stack Overflow we can't ask too general a question, so I want to keep this very specific. The two snippets of code while having different syntax, appear to be semantically (except for the possibility of broadcasting to multiple stores) exactly the same.
And again if the only reason is that it enables broadcasting and enables a single point of flow for debug purposes, I am fine with that, but would like to know if there is some other thing about flux/the dispatcher I am missing?
The major features of the flux-style architecture are roughly the following:
the store is the single source of truth for application state
only actions can trigger mutation of the store's state
store state should not be mutated directly, i.e. via assigning object values, but by creating new objects via cloning/destructuring instead
Like a diet, using this type of architecture really doesn't work if you slip and go back to the old ways intermittently.
Returning to your example. The benefit for using the action here is not broadcasting or logging aspects, but simply the fact that the People class should only be able to either consume data from a store and express its wishes to mutate the state of said store with actions. Imagine for example that Elves want to sing to the the ogre and thus are interested in knowing the said ogre is still alive. At the same time the People want to be polite and do not wish to kill the ogre while it is being serenaded. The benefits of the flux-style architecture are clear:
class People {
kill(creature) {
if (creatureStore.getSerenadedCreature() !== creature)
store.dispatch({ type: 'KILL', payload: { target: creature } })
return `The ${creature} is being serenaded by those damn elves, let's wait until they've finished.`
}
}
class Elves {
singTo(creature) {
if (!creatureStore.getCreatures().includes(creature))
return store.dispatch({ type: 'SING_TO', payload: { target: creature } })
return `Oh no, the ${creature} has been killed... I guess there will be no serenading tonight..`
}
}
If the class People were to wrap the store, you'd need the Elves class to wrap the same store as well, creating two places where the same state would be mutated in one way or the other. Now imagine if there were 10 other classes that need access to that store and want to change it: adding those new features is becoming a pain because all those classes are now at the mercy of the other classes mutating the state from underneath them, forcing you to handle tons of edge cases not possibly even related to the business logic of those classes.
With the flux style architecture, all those classes will only consume data from the creatureStore and dispatch actions based on that state. The store handles reconciling the different actions with the state so that all of its subscribers have the right data at the right times.
The benefits of this pattern may not be evident when you only have a couple of stores that are consumed by one or two entities each. When you have tens (or hundreds) of stores with tens (or hundreds) of components consuming data from several stores each, this architecture saves you time and money by making it easier to develop new features without breaking existing ones.
Hope this wall-o-text helped to clarify!
What I don't see is how putting the actions in the form of JSON objects, suddenly enforces or helps with "1-way" communication flow, which is what I read everywhere is the big advantage of having actions, and of flux.
Looks to me like I am still effectively sending a message back to the store, no matter how I perfume the pig. Sure the action is now going through a couple of layers of indirection (action creator, dispatcher) before it gets to the store, but unless I am missing something the component that sends that action for all practical purposes is updating whatever stores are listening for the kill message.
What I am missing here?
Facebook Flux took the idea from the event driven GUI systems.
In there even if you move your mouse you get messages. This was called message loop then, and now we have actions dispatching.
Also, we have lists of subscribers inside stores.
And it is really the same principle in Redux where you have one store, while in Flux you may have multiple stores.
Now little mathematics. Having 2 components A and B you need to have just a few possible update chains A updates B and B update A, or self-update (non including in here the updates from outside of the app). This is the possible case.
With just three components we have much more possible chains.
And with even more components it gets complicated. So to suppress the exponential complexity of possible components interaction we have this Flux pattern which in nothing more than IDispatch, IObservable if you worked with these interfaces from some other programming languages. One would be for spitting the actions, and the other for entering the listener's chain that exists inside the store.
With this pattern, your React code will be organized in a different way than common React approach. You will not have to use React.Component state anymore. Instead, you will use the Store(s) that will hold the application state.
Your component can only show the desire to mutate the application state by dispatching the action. For instance: onClick may dispatch the action to increment the counter. The actions are objects with the property type: that is usually a string, and usually in upper case, but the action object may have many other props such as ID, value,...
Since the components are responsible for rendering based on the application state we need somehow to deliver them the application state. It may be via the props = store.getState() or we may use the context. But also check this.
Finally, it is even not forbidden that component uses the internal state (this.state) in case this has no impact on the application. You should recognize these cases.

Why do we need dispatcher in Flux?

This is not a React specific question. I'm thinking of implementing Flux in Aurelia/Angularjs.
While reading up on flux, I'm not convinced of the need of the dispatcher step. Why can't a component call the store directly to update and retrieve data? Is there anything wrong with that approach?
For example: If I have a CarStore that can create new cars, update cars and get a list of cars(just a thin layer on the CRUD api), I should be able to retrieve/update the list by directly calling the store from the car-grid component. Since the store is a singleton, whenever the list updates, car-grid should automatically get the new items. What is the benefit of using a dispatcher in this scenario?
I've created several large apps using React-native with Redux as the store / view state updater.
The dispatch action is synchronous regardless. There's a big disadvantage to using dispatchers, you lose the function signature. (Debugging, auto-catching type-errors, refactoring lost, multiple declarations of the same function, list goes on)
Never had to use a dispatcher and its caused no issues. Within the actions we simply call getState().dispatch. The store is a singleton anyhow, it's heavily recommended you don't have multiple stores. (Why would you do that...)
You can see here why are dispatchers important (check out the section Why We Need a Dispatcher). The way I see it, the idea is basically being able to access to various stores in a synchronous way (one callback finishes before another one is called). You can make this thanks to the waitFor method, which allows you to wait for a store to finish processing an action (or more tan one). There is a good example in the docs. For example, your application may grow and instead of having just that CarStore you have another Store whose updates depend on the CarStore updates.
If you will only ever have one store, then a dispatcher is redundant in my opinion. If you have multiple stores however, then a dispatcher is important so that actions don't need to know about each of these stores.
Please note that I am not saying that you should ditch the dispatcher if you only have one store. It's still a good pattern as it gives you the option of supporting multiple stores if you ever need to in the future.

Why should I use Actions in Flux?

An application that I develop was initially built with Flux.
However, over time the application became harder to maintain. There was a very large number of actions. And usually one action is only listened to in one place (store).
Actions make it possible to not write all event handler code in one place. So instead of this:
store.handleMyAction('ha')
another.handleMyAction('ha')
yetAnotherStore.handleMyAction('ha')
I can write:
actions.myAction('ha')
But I never use actions that way. I am almost sure, that this isn't an issue of my application.
Every time I call an action, I could have just called store.onSmthHappen instead of action.smthHappen.
Of course there are exceptions, when one action is processed in several places. But when that happens it feels like something went wrong.
How about if instead of calling actions I call methods directly from the store? Will my application not be so flexible? No! Occurs just rename (with rare exceptions). But at what cost! It becomes much harder to understand what is happening in the application with all these actions. Each time, when tracking the processing of complex action, I have to find in stores where they are processed. Then in these Stores I should find the logic that calls another action. Etcetera.
Now I come to my solution:
There are controllers that calls methods from stores directly. All logic to how handle action is in the Store. Also Stores calls to WebAPI (as usually one store relating to one WebAPI). If the event should be processed in several Stores (usually sequentially), then the controller handles this by orchestrating promises returned from stores. Some of sequentials (common used) in private methods of itself. And method of controllers can use them as simple part of handling. So I will never be duplicating code.
Controller methods do not return anything (one-way flow).
In fact the controller does not contain the logic of how to process the data. It's only points where, and in what sequence.
You can see almost the complete picture of the data processing in the Store. There is no logic in stores about how to interact with another stores (with flux it's like a many-to-many relation but just through actions). Now the store is a highly cohesive module that is responsible only for the logic of domain model (collection).
The main (in my opinion) advantages of flux are still here.
As a result, there are Stores, which are the only true source of the data. Components can subscribe to the Stores. And the components calls the same methods as before, but instead of actions uses controller. Interaction with React did not change at all.
Also, event processing becomes much obvious. Now I can just look at the handler in the controller and all becomes clear, and it's much easier to debug.
The question is:
Why were actions created in flux? And what are their advantages that I have missed?
Actions where implemented to capture a certain interaction on the view or from the server which can then be dispatched to as many different stores as you like. The developers explained this with the example of the facebookchat.
There is a messageStore and a threadstore. When the action eg. messagePost was emitted it got dispatched into both stores doing different work to update their attributes. Threadstore increased the number of unread messages and messageStore added the new message to its messagearray.
So its basicly channeling one action to perform datachanges in more than one store.
I had the same questions and thought process as you, and now I started using Flummox which makes it cleaner to have the Flux architecture.
I define my Actions in the same file where I define my Store, and that's close enough. I can still subscribe to the dispatcher to log events so I can see all actions being called, and I have the option to create multi-store Actions if needed.
It comes with a nice FluxComponent that lets you wrap all store-related code in a single place so its children are stateless components that get updated on store changes, like
<FluxComponent connectToStores={['storeA', 'storeB']}>
<InnerComponent />
</FluxComponent>
Keeping the Flux architecture (it uses Facebook's Flux behind the scenes) will hopefully make it easy to use other technologies like GraphQL.

Resources