Flink KeyedCoProcessFunction working with state - apache-flink

I use a KeyedCoProcessFunction to enrich the main datastream with data that comes from another stream.
Code:
class AssetDataEnrichment extends KeyedCoProcessFunction[String, PacketData, AssetCommandState, AssetData] with LazyLogging {

  case class AssetStateDoc(assetId: Option[String])

  private var associatedDevices: ValueState[AssetStateDoc] = _

  override def open(parameters: Configuration): Unit = {
    val associatedDevicesDescriptor =
      new ValueStateDescriptor[AssetStateDoc]("associatedDevices", classOf[AssetStateDoc])
    associatedDevices = getRuntimeContext.getState[AssetStateDoc](associatedDevicesDescriptor)
  }

  override def processElement1(
      packet: PacketData,
      ctx: KeyedCoProcessFunction[String, PacketData, AssetCommandState, AssetData]#Context,
      out: Collector[AssetData]): Unit = {
    val tmpState = associatedDevices.value
    val state = if (tmpState == null) AssetStateDoc(None) else tmpState
    state.assetId match {
      case Some(assetId) =>
        logger.debug(s"There are state for ${packet.tag.externalId} = $assetId")
        out.collect(AssetData(assetId, packet.tag.externalId.get, packet.toString))
      case None => logger.debug(s"No state for a packet ${packet.tag.externalId}")
      case _ => logger.debug("Smth went wrong")
    }
  }

  override def processElement2(
      value: AssetCommandState,
      ctx: KeyedCoProcessFunction[String, PacketData, AssetCommandState, AssetData]#Context,
      out: Collector[AssetData]): Unit = {
    value.command match {
      case CREATE =>
        logger.debug(s"Got command to CREATE state for tag: ${value.id} with value: ${value.assetId}")
        logger.debug(s"current state is ${associatedDevices.value()}")
        associatedDevices.update(AssetStateDoc(Some(value.assetId)))
        logger.debug(s"new state is ${associatedDevices.value()}")
      case _ =>
        logger.error("Got unknown AssetCommandState command")
    }
  }
}
processElement2() works fine: it accepts data and updates the state.
But in processElement1() I always hit case None => logger.debug(s"No state for a packet ${packet.tag.externalId}"), although I expect to see the value that was set in processElement2.
As an example I used this guide: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/state/

processElement1 and processElement2 do share state, but keep in mind that this is key-partitioned state. This means that a value set in processElement2 when processing a given value v2 will only be seen in processElement1 when it is called later with a value v1 having the same key as v2.
Also keep in mind that you have no control over the race condition between the two streams coming into processElement1 and processElement2.
The RidesAndFares exercise from the official Apache Flink training is all about learning to work with this part of the API. https://nightlies.apache.org/flink/flink-docs-stable/docs/learn-flink/etl/ is the home for the corresponding tutorial.
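For illustration, here is a minimal sketch of how the two streams might be wired together so that they share the same key and therefore the same keyed state. The key selectors are assumptions based on the fields shown in the question (packet.tag.externalId and value.id); use whatever identifier is actually common to both streams:
// Sketch: both streams must be keyed by the same value for the state
// written in processElement2 to be visible in processElement1.
val enriched: DataStream[AssetData] =
  packetStream
    .keyBy((packet: PacketData) => packet.tag.externalId.getOrElse(""))
    .connect(commandStream.keyBy((command: AssetCommandState) => command.id))
    .process(new AssetDataEnrichment)
If a packet can arrive before its CREATE command, the usual approach (covered in that exercise) is to buffer it in keyed state inside processElement1 and emit it later once the matching command arrives.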

Related

Custom Layer with kwargs in tfjs

I'm new to TensorFlow.js and I'm struggling to implement some custom layers; if someone could point me in the right direction that would be really helpful!
For example, I have a layer in the InceptionResnetV1 architecture where I'm multiplying the layer by a constant scale (this was originally an unsupported Lambda layer which I'm switching out for a custom layer), but the value of this scale changes per block. This works fine in Keras with an implementation such as the one below, using load_model with ScaleLayer in the custom objects:
class ScaleLayer(tensorflow.keras.layers.Layer):
    def __init__(self, **kwargs):
        super(ScaleLayer, self).__init__(**kwargs)

    def call(self, inputs, **kwargs):
        return tensorflow.multiply(inputs, kwargs.get('scale'))

    def get_config(self):
        return {}

x = ScaleLayer()(x, scale = tensorflow.constant(scale))
I tried defining this in a similar way in JavaScript and then registered the class:
class ScaleLayer extends tf.layers.Layer {
  constructor(config?: any) {
    super(config || {});
  }

  call(input: tf.Tensor, kwargs: Kwargs) {
    return tf.tidy(() => {
      this.invokeCallHook(input, kwargs);
      const a = input;
      const b = kwargs['scale'];
      return tf.mul(a, b);
    });
  }

  static get className() {
    return 'ScaleLayer';
  }
}

tf.serialization.registerClass(ScaleLayer);
However, I'm finding that the kwargs are always empty. I tried another similar method where I passed scale as another dimension of the input and then did input[0] * input[1], which again worked fine for the Keras model but not in JavaScript.
I feel like I'm missing something key about defining this kind of custom layer with a value that changes per block on the JavaScript side, so if someone could point me in the right direction it would be much appreciated! Thanks.
constructor(config?: any) {
  super(config || {});
}
The config is passed to the parent constructor. But, as indicated by the question, the ScaleLayer also needs to keep some config properties of its own:
constructor(config?: any) {
  super(config || {});
  // this.propertyOfInterest = config.propertyOfInterest
  // make sure that config is an object
  this.scale = config.scale;
}
Then, for the computation, that ScaleLayer property (scale here) can be used:
call(input: tf.Tensor, kwargs?: any) {
  return tf.tidy(() => {
    this.invokeCallHook(input, kwargs);
    const a = input;
    return tf.mul(a, this.scale); // scale now comes from the layer's config, not from kwargs
  });
}
Use the layer this way:
const model = tf.sequential();
...
model.add(new ScaleLayer({scale: 1}));
...

flink how to combine two windows?

I have a stream, and I want to compare the number of events in the current window with the number in the previous window.
It can be done by keeping the number of events in the window in globalState and doing something like:
class Foo[I, O] extends ProcessWindowFunction[I, O, String, TimeWindow] {
  override def process(key: String, context: Context, elements: Iterable[I], out: Collector[O]): Unit = {
    val state = context.globalState.getState(windowStateDescriptor)
    if (state.value != null) {
      if (state.value > elements.size) {
        // do some out.collect
      } else {
        state.update(elements.size)
      }
    } else {
      // first window for this key: just remember the count
      state.update(elements.size)
    }
  }
}
However, I am trying to avoid keeping persistent state. Is there a better, more idiomatic way to achieve that?

Entity Pattern in Redux?

Is there a standard entity pattern in the Redux framework/library?
I am fairly new to React/Redux, and I am building a simple pie chart application where you can add pie slices and change the pie chart name. (I am using the Immutable.js library.)
In my reducer the code looks really nasty and bulky:
switch (action.type) {
  case 'CREATE_SLICE':
    var myList = imState.getIn(['app', 'pie', 'data']);
    myList = myList.toJS();
    myList.push(action.slice);
    var v = Immutable.fromJS(myList);
    imState = imState.setIn(['app', 'pie', 'data'], v);
    break;
  case 'CHANGE_NAME':
    var newName = action.newName;
    imState = imState.setIn(['app', 'pie', 'name'], newName);
    break;
So I decided to refactor this into a kind of entity class:
class PieChart {
  static get path() {
    return ['app', 'pie'];
  }

  static createSlice(imState, action) {
    var myList = imState.getIn([...this.path, 'data']);
    myList = myList.toJS();
    myList.push(action);
    var v = Immutable.fromJS(myList);
    imState = imState.setIn([...this.path, 'data'], v);
    return imState;
  }

  static changeName(imState, newName) {
    imState = imState.setIn([...this.path, 'name'], newName);
    return imState;
  }
}
This class contains no state. It only gets the state object passed to it via functions.
The static path getter returns the path inside the state object to the entity that the class is concerned with.
My question is, is this a common pattern? Is this an appropriate way to encapsulate entity data or functionality?
This would generally be considered an anti-pattern. Redux encourages the use of plain data and plain functions to manipulate that data. You don't need to wrap that data up in entity classes just to work with it.

Preserve null values in array of Play framework form mapping

I'm trying to figure out how I can force the Play (Scala) form mapper to preserve null values in an array property.
Example request body (printed by the snippet below):
AnyContentAsJson({
  "entities": ["ENI", "GDF Suez", "Procter & Gamble"],
  "entityValues": [null, "42", null]
})
Resulting value of the entityValues property after binding:
List(Some(42.0))
But I want to see:
List(None, Some(42.0), None)
Code snippet of controller:
def actionX = Action { implicit request =>
  println(request.body)
  TaskForm.form.bindFromRequest.fold(
    formWithErrors => {
      BadRequest("error")
    },
    taskData => {
      println(taskData.entityValues)
    }
  )
}
Form class with mapping:
case class TaskForm(entities: List[String],
                    entityValues: List[Option[Double]]) { }

object TaskForm {
  val map = mapping(
    "entities" -> list(text),
    "entityValues" -> list(optional(of(doubleFormat)))
  )(TaskForm.apply)(TaskForm.unapply)

  val form = Form(
    map
  )
}
I also tried some combinations of the optional and default mapping parameters, but the result is still the same.
Using 0 or any other numeric value instead of null is not an option.
Does anyone have any ideas on how to implement such form behaviour?
Thanks in advance for your time and attention.
It looks like you're sending JSON to a form endpoint. While this will work for simple JSON structures, you get no control over how it is done and hence get problems like the one you're seeing.
I'd be explicit about being a JSON-endpoint, and then you can define your own Reads[Option[Double]] that works precisely how you want it to:
First, define the implicits at the controller level; this is where we get to control the null-handling, and it ends up being pretty easy:
implicit val optionalDoubleReads = new Reads[Option[Double]] {
  def reads(json: JsValue) = json match {
    case JsNumber(n) => JsSuccess(Some(n.toDouble))
    case JsString(n) => JsSuccess(Some(n.toDouble))
    case JsNull      => JsSuccess(None) // The important one
    case _           => JsError("error.expected.jsnumber")
  }
}

implicit val taskReads = Json.reads[TaskForm]
With that done, we modify your Action to require JSON (using parse.json). The function itself remains remarkably similar to the original form-binding fold:
def actionX = Action(parse.json) { implicit request =>
  println(request.body)
  request.body.validate[TaskForm].fold(
    jsonErrors => {
      BadRequest(s"Error: $jsonErrors")
    },
    taskData => {
      println(taskData.entityValues)
      Ok(taskData.entityValues.toString)
    }
  )
}
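As a quick sanity check of the null-handling, here is a small sketch that feeds the sample payload from the question through these Reads directly (it assumes the question's TaskForm case class and both implicits above are in scope):
// Sketch: exercising the custom Reads with the question's sample JSON
val js = Json.parse("""{"entities":["ENI","GDF Suez","Procter & Gamble"],"entityValues":[null,"42",null]}""")
println(js.as[TaskForm].entityValues) // expected: List(None, Some(42.0), None)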

Extended event emitter functions across stores are clashing in Flux

I have multiple Flux stores. Now, clearly, all of them are extending the same EventEmitter singleton. This has led to events across stores clashing with each other (even the most common one, emitChange). There seems to be no difference between calling Store1.getID() and Store2.getID(), because the stores seem to be one large object extended from every other store. What am I doing wrong?
I have been having this issue for a while now, and it's driving me nuts. I am sure this has a simple answer that I am missing. It's one of the reasons I am waiting for Relay and GraphQL.
EDIT: this is what all my stores look like in code:
var Events = require('events'),
    extend = require('deep_extend'),
    EventEmitter = Events.EventEmitter,
    CHANGE_EVENT = 'change';

var SomeStore = extend(EventEmitter.prototype, {
  someGetter: function() {
    return _someVar;
  },
  dispatchToken: AppDispatcher.register(function(action) {
    switch (action.type) {
      case 'SOME_ACTION':
        _someVar = 'someValue';
        break;
      default:
        return true;
    }
    SomeStore.emitChange();
    return true;
  })
});

return SomeStore;
stores seem to be one large object extended from every other store.
There must be some problem with how you extend from EventEmitter; otherwise your code should be working fine.
There are a few ways to do the same thing; here is how Facebook implemented it in their official examples:
var assign = require('object-assign');
var EventEmitter = require('events').EventEmitter;

var TodoStore = assign({}, EventEmitter.prototype, {
  ...
UPDATE
Now, looking at your code:
extend(EventEmitter.prototype, {
is actually writing on the prototype itself, hence the errors you got. Instead, you should be extending an empty object:
extend({}, EventEmitter.prototype, {
