Flink: Can we update a keyed state for only some elements in processBroadcastElement function? - apache-flink

As mentioned in the answer here, I can use applyToKeyedState to update all states across all keys in the same manner.
If my broadcast event has a subset of all keys and I only want to update those, can I make it a part of the KeyedStateFunction?
Example
ctx.applyToKeyedState(stateDescriptor, new KeyedStateFunction[K, ValueState[Boolean]]() {
  override def process(k: K, state: ValueState[Boolean]): Unit = {
    val key = k.asInstanceOf[String]
    // Only touch the state of keys named in the broadcast event.
    if (broadcastEvent.contains(key)) {
      state.update(true)
    }
  }
})

Nothing prevents you from employing whatever logic you desire in your KeyedStateFunction, but you could get yourself into trouble. The issue is this: each instance of your keyed broadcast function operator will be applying this function independently. And the job might crash at any point -- perhaps after some instances have applied the KeyedStateFunction, and others have not.
You should limit yourself to operations on the keyed state that will never give rise to inconsistencies, even after failure/recovery or after rescaling.

Related

Deeply updating React state (array) without Immutable, any disadvantages?

I know using Immutable is a great way to deeply update React state, but I was wondering if there are any drawbacks I'm not seeing with this approach:
Assuming this.state.members has the shape Array<MemberType> where MemberType is { userId: String, role: String }.
If the user changes a user's role, the following method is executed:
changeMemberRole = (userId, event, key, value) => {
  const memberIndex = _findIndex(this.state.members,
    (member) => member.userId === userId);
  if (memberIndex >= 0) {
    const newMembers = [...this.state.members];
    newMembers[memberIndex].role = value;
    this.setState({ members: newMembers });
  }
};
Would there be any advantage to replacing this with Immutable's setIn, other than potentially more terse syntax?
The difference between using Immutable.js and not using it is, of course, immutability¹ :)
When you declare const newMembers = [...this.state.members] you're copying an array of references. This is indeed a new array (replacing a direct child by index, like 0, 1, 2, is not reflected in the original), but it is filled with the same object references, so inner/deep changes are shared. This is called a shallow copy.
Therefore any change to a newMembers element is also made in the corresponding this.state.members element. This is fine for your example, so there are no real advantages so far.
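To make the shallow-copy behaviour concrete, here is a minimal sketch in plain JavaScript, with no React or Immutable.js involved. The member data is made up for illustration and simply stands in for this.state.members.

```javascript
// Hypothetical data standing in for this.state.members.
const members = [
  { userId: 'u1', role: 'admin' },
  { userId: 'u2', role: 'viewer' },
];

// Spread creates a new outer array...
const newMembers = [...members];

// ...but each slot still points at the same member object.
const sameArray = newMembers === members;
const sameFirstMember = newMembers[0] === members[0];

// Mutating through the copy is therefore visible through the original.
newMembers[1].role = 'editor';
const originalRoleAfterMutation = members[1].role;

// Replacing a whole slot by index only affects the copy.
newMembers[0] = { userId: 'u3', role: 'viewer' };
const originalFirstUserId = members[0].userId;
```

Here sameArray is false while sameFirstMember is true, which is exactly why the role change leaks into the original array but the slot replacement does not.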
So, why immutability?
Its true benefits are not easily observed in small snippets because it's more about the mindset. Taken from the Immutable.js homepage:
Much of what makes application development difficult is tracking
mutation and maintaining state. Developing with immutable data
encourages you to think differently about how data flows through your
application.
Immutability brings many of the functional paradigm benefits such as avoiding side effects or race conditions since you think of variables as values instead of objects, making it easier to understand their scope and lifecycle and thus minimizing bugs.
One specific advantage for React is that you can safely check for state changes in shouldComponentUpdate, whereas with plain mutable objects:
// assume this.props.value is { foo: 'bar' }
// assume nextProps.value is { foo: 'bar' },
// but this reference is different to this.props.value
this.props.value !== nextProps.value; // true
When working with objects instead of values, nextProps.value and this.props.value will be considered distinct references (unless we perform a deep comparison) and trigger a re-render, which at scale could be really expensive.
¹ Unless you're simulating your own immutability, for which I trust Immutable.js more.
You're not copying role, so any component that takes the role as a prop (if there is one) cannot benefit from the pure render optimization (overriding shouldComponentUpdate and detecting when props have actually changed).
But since you can make a copy of the role without Immutable.js, there is no effective difference except that you have to type more (and thus have more opportunities to make a mistake), which is itself a big drawback because it reduces your productivity.
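As a rough sketch of what making that copy without Immutable.js could look like, the helper below returns a new array in which only the changed member is a new object. The name withMemberRole and the sample data are hypothetical and not from the question.

```javascript
// Hypothetical helper: returns a new members array where the member with
// the given userId is replaced by a copy carrying the new role.
// Unchanged members keep their original object references.
function withMemberRole(members, userId, role) {
  return members.map((member) =>
    member.userId === userId ? { ...member, role } : member
  );
}

// Made-up sample data for illustration.
const before = [
  { userId: 'u1', role: 'admin' },
  { userId: 'u2', role: 'viewer' },
];
const after = withMemberRole(before, 'u2', 'editor');

// The old array and its objects are untouched; only the changed
// member is a new object, so reference checks can detect the change.
const oldRoleKept = before[1].role === 'viewer';
const changedMemberIsNew = after[1] !== before[1];
const untouchedMemberIsShared = after[0] === before[0];
```

In the component this might be used as this.setState({ members: withMemberRole(this.state.members, userId, value) }). If no member matches, the result is a new array holding the same objects.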
From the setIn docs:
Returns a new Map having set value at this keyPath. If any keys in keyPath do not exist, a new immutable Map will be created at that key.
This is probably not what you are looking for since you may not want to insert a new member with the given role if it does not exist already. This comes down to whether you are able to control the userId argument passed in the function and verify whether it exists beforehand.
This solution is fine. You can replace it with update instead, if you want to.

Most efficient way to increment a value of everything in Firebase

Say I have entries that look like this:
And I want to increment the priority field by 1 for every Item in the list of Estimates.
I can grab the estimates like this:
var estimates = firebase.child('Estimates');
After that how would I auto increment every Estimates priority by 1?
FOR FIRESTORE API ONLY, NOT FIREBASE
Thanks to the latest Firestore patch (March 13, 2019), you don't need to follow the other answers above.
Firestore's FieldValue class now hosts an increment method that atomically updates a numeric document field in the Firestore database. You can use this FieldValue sentinel with either the set method (with the merge option set to true) or the update method of the DocumentReference object.
The usage is as follows (from the official docs, this is all there is):
DocumentReference washingtonRef = db.collection("cities").document("DC");
// Atomically increment the population of the city by 50.
washingtonRef.update("population", FieldValue.increment(50));
If you're wondering, it's available from version 18.2.0 of firestore. For your convenience, the Gradle dependency configuration is implementation 'com.google.firebase:firebase-firestore:18.2.0'
Note: Increment operations are useful for implementing counters, but
keep in mind that you can update a single document only once per
second. If you need to update your counter above this rate, see the
Distributed counters page.
EDIT 1: FieldValue.increment() is purely "server" side (happens in firestore), so you don't need to expose the current value to the client(s).
EDIT 2: While using the admin APIs, you can use admin.firestore.FieldValue.increment(1) for the same functionality. Thanks to #Jabir Ishaq for voluntarily letting me know about the undocumented feature. :)
EDIT 3: If the target field that you want to increment/decrement is not a number or does not exist, the increment method sets the field to the given value. This is helpful when you are creating a document for the first time.
This is one way to loop over all items and increase their priority:
var estimatesRef = firebase.child('Estimates');
estimatesRef.once('value', function(estimatesSnapshot) {
  estimatesSnapshot.forEach(function(estimateSnapshot) {
    // update() takes an object of key/value pairs, so name the field.
    estimateSnapshot.ref().update({
      priority: estimateSnapshot.val().priority + 1
    });
  });
});
It loops over all children of Estimates and increases the priority of each.
You can also combine the calls into a single update() call:
var estimatesRef = firebase.child('Estimates');
estimatesRef.once('value', function(estimatesSnapshot) {
  var updates = {};
  estimatesSnapshot.forEach(function(estimateSnapshot) {
    updates[estimateSnapshot.key+'/priority'] = estimateSnapshot.val().priority + 1;
  });
  estimatesRef.update(updates);
});
The performance will be similar to the first solution (Firebase is very efficient when it comes to handling multiple requests). But in the second case a single command is sent to the server, so it will either fail or succeed completely.
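The multi-path update object can also be built separately from the Firebase call, which makes it easy to inspect before sending. This is a minimal sketch, assuming the estimates have already been read into a plain object keyed by child key (for example via estimatesSnapshot.val()). The helper name buildPriorityUpdates and the sample data are illustrative and not part of the Firebase API.

```javascript
// Hypothetical helper: maps { key: { priority: n, ... } } to a flat
// multi-path update object of the form { 'key/priority': n + 1 }.
function buildPriorityUpdates(estimates) {
  const updates = {};
  Object.keys(estimates).forEach((key) => {
    updates[key + '/priority'] = estimates[key].priority + 1;
  });
  return updates;
}

// Made-up sample data standing in for the Estimates node.
const sampleEstimates = {
  est1: { priority: 1 },
  est2: { priority: 4 },
};
const priorityUpdates = buildPriorityUpdates(sampleEstimates);
// priorityUpdates is { 'est1/priority': 2, 'est2/priority': 5 }
```

The result would then be passed to estimatesRef.update(...) in a single call, as in the second snippet above.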

How best to store a number in google realtime model, and get atomic change events?

Sounds pretty simple, however...
This number holds an enumerated type, and should be a field within a custom realtime object. Here's its declaration in the custom object registration routine:
MyRTObjectType.prototype.myEnumeratedType =
gapi.drive.realtime.custom.collaborativeField('myEnumeratedType');
I can store it in the model as a simple javascript number, and initialize it like this:
function initializeMyRTObjectType() {
// other fields here
this.myEnumeratedType = 0;
}
...but the following doesn't work, of course, since it's just a number:
myRTObject.myEnumeratedType.addEventListener(
gapi.drive.realtime.EventType.OBJECT_CHANGED, self.onTypeChanged);
I can add the event listener to the whole object:
myRTObject.addEventListener(
gapi.drive.realtime.EventType.OBJECT_CHANGED, self.onTypeChanged);
But I'm only interested in changes to that number (and if I were interested in other changes, I wouldn't want to examine every field to see what's changed).
So let's say I store it as a realtime string, initializing it like this:
function initializeMyRTObjectType() {
var model = gapi.drive.realtime.custom.getModel(this);
// other fields here
this.myEnumeratedType = model.createString();
}
Now I'll get my change events, but they won't necessarily be atomic, and I can't know whether a change, say from "100" to "1001", is merely a change en route to "101", and so whether I should react to it (this exact example may not be valid, but the idea is there...)
So the question is, is there either a way to know that all (compounded?) changes, insertions/deletions are complete on a string field, or (better) a different recommended way to store a number, and get atomic notification when it has been changed?
You also get a VALUE_CHANGED event on the containing object like you would for a map:
myRTObject.addEventListener(gapi.drive.realtime.EventType.VALUE_CHANGED,
  function(event) {
    if (event.property === 'myEnumeratedType') {
      // business logic
    }
  });

Django: lock particular rows in table

I have the following django method:
def setCurrentSong(request, player):
    try:
        newCurrentSong = ActivePlaylistEntry.objects.get(
            song__player_lib_song_id=request.POST['lib_id'],
            song__player=player,
            state=u'QE')
    except ObjectDoesNotExist:
        toReturn = HttpResponseNotFound()
        toReturn[MISSING_RESOURCE_HEADER] = 'song'
        return toReturn
    try:
        currentSong = ActivePlaylistEntry.objects.get(song__player=player, state=u'PL')
        currentSong.state = u'FN'
        currentSong.save()
    except ObjectDoesNotExist:
        pass
    except MultipleObjectsReturned:
        # This is bad. It means that
        # this function isn't getting executed atomically like we hoped it would be
        # I think we may actually need a mutex to protect this critical section :(
        ActivePlaylistEntry.objects.filter(song__player=player, state=u'PL').update(state=u'FN')
    newCurrentSong.state = u'PL'
    newCurrentSong.save()
    PlaylistEntryTimePlayed(playlist_entry=newCurrentSong).save()
    return HttpResponse("Song changed")
Essentially, I want it to be so that for a given player, there is only one ActivePlaylistEntry that has a 'PL' (playing) state at any given time. However, I have actually experienced cases where, as a result of quickly calling this method twice in a row, I get two songs for the same player with a state of 'PL'. This is bad as I have other application logic that relies on the fact that a player only has one playing song at any given time (plus semantically it doesn't make sense to be playing two different songs at the same time on the same player). Is there a way for me to do this update atomically? Just running the method as a transaction with the commit_on_success decorator doesn't seem to work. Is there a way to lock the table for all songs belonging to a particular player? I was thinking of adding a lock column to my model (boolean field) and either just spinning on it or pausing the thread for a few milliseconds and checking again, but these feel super hackish and dirty. I was also thinking about creating a stored procedure, but that's not really database independent.
Locking queries were added in 1.4.
with transaction.commit_manually():
    ActivePlaylistEntry.objects.select_for_update().filter(...)
    aple = ActivePlaylistEntry.objects.get(...)
    aple.state = ...
    transaction.commit()
But you should consider refactoring so that a separate table with a ForeignKey is used to indicate the "active" song.

Can State Pattern help with read only states?

I'm trying to model a certain process and I'm thinking that the State Pattern might be a good match. I'd like to get your feedback though about whether State will suit my needs and how it should be combined with my persistence mechanism.
I have a CMS that has numerous objects, for example, Pages. These objects (we'll use the example of Pages, but it's true of most objects) can be in one of a number of states, 3 examples are:
Unpublished
Published
Reworking
When Unpublished, they are editable. Once Published, they are not editable, but can be moved into the Reworking state. In the Reworking state they are editable again and can be Republished.
Obviously the decision for whether these Pages are editable should be in the models themselves and not the UI. So, the State pattern popped into mind. However, how can I prevent assigning values to the object's properties? It seems like a bad idea to have a check on each property setter:
if (!CurrentState.ReadOnly)
Any ideas how to work this? Is there a better pattern for this?
Using wikipedia's Java example, the structure has a Context, which calls methods defined in the base State, which the concrete states override.
In your case, the context is something like a page. In some states, the edit() method is simply a no-op. Some of the actions on the context may execute a state change implicitly. There is never any need in the client code to test which state you are in.
Update:
I actually thought of a method this morning that would work with your specific case and be a lot easier to maintain. I'll leave the original two points here, but I'm going to recommend the final option instead, so skip to the "better method" section.
(1) Create a ThrowIfReadOnly method, which does what it says on the tin. This is slightly less repetitive and avoids the nesting.
(2) Use an interface. Have an IPage that implements the functionality you want, have every public method return an IPage, then have two implementations, an EditablePage and a ReadOnlyPage. The ReadOnlyPage just throws an exception whenever someone tries to modify it. Also put an IsReadOnly property (or State property) on the IPage interface so consumers can actually check the status without having to catch an exception.
Option (2) is more or less how IList and ReadOnlyCollection<T> work together. It saves you the trouble of having to do a check at the beginning of every method (thus eliminating the risk of forgetting to validate), but requires you to maintain two classes.
-- Better Method --
A proper technical spec would help a lot to clarify this problem. What we really have here is:
A series of arbitrary "write" actions;
Each action has the same outcome, dependent on the state:
Either the action is taken (unpublished/reworking), or fails/no-ops (read-only).
What really needs to be abstracted is not so much the action itself, but the execution of said action. Therefore, a little bit of functional goodness will help us here:
public enum PublishingState
{
    Unpublished,
    Published,
    Reworking
}
public delegate void Action();
public class PublishingStateMachine
{
    public PublishingState State { get; set; }
    public PublishingStateMachine(PublishingState initialState)
    {
        State = initialState;
    }
    public void Write(Action action)
    {
        switch (State)
        {
            case PublishingState.Unpublished:
            case PublishingState.Reworking:
                action();
                break;
            default:
                throw new InvalidOperationException("The operation is invalid " +
                    "because the object is in a read-only state.");
        }
    }
}
Now it becomes almost trivial to write the classes themselves:
public class Page
{
    private PublishingStateMachine sm = new
        PublishingStateMachine(PublishingState.Unpublished);
    private string title;
    private string category;
    // Snip other methods/properties
    // ...
    public string Title
    {
        get { return title; }
        set { sm.Write(() => title = value); }
    }
    public string Category
    {
        get { return category; }
        set { sm.Write(() => category = value); }
    }
    public PublishingState State
    {
        get { return sm.State; }
        set { sm.State = value; }
    }
}
Not only does this more-or-less implement the State pattern, but you don't need to maintain separate classes or even separate code paths for the different states. If you want to, for example, turn the InvalidOperationException into a no-op, just remove the throw statement from the Write method. Or, if you want to add an additional state, like Reviewing or something like that, you just need to add one case line.
This won't handle state transitions for you or any really complex actions that do different things depending on the state (other than just "succeed" or "fail"), but it doesn't sound like you need that. So this gives you a drop-in state implementation that requires almost no extra code to use.
Of course, there's still the option of dependency injection/AOP, but there's obviously a lot of overhead associated with that approach, and I probably wouldn't use it for something so simple.
