Flink-statefun dynamic function discovery and fan-out execution - apache-flink

What would be a scalable way to dynamically register and call remote StateFun functions? I know I can register functions when submitting a Flink job, but submitting a new build for every new function is not ideal. I wonder why Flink needs to know about remote functions at job start.
If I used the StateFun URL template as the HTTP endpoint, would it be possible to dynamically discover remote functions under a namespace?
spec:
  functions: com.example/*
  urlPathTemplate: https://bar.foo.com/{function.name}
Here function.name is a dynamically generated UUID. I don't yet understand how this would work, though.
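To make the template idea concrete, here is a minimal sketch of how a wildcard entry could be resolved to a URL at call time. It only illustrates the substitution and is not StateFun's actual code: the EndpointSpec class and its matches and resolve methods are hypothetical names, and the UUID is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointSpec:
    """Hypothetical stand-in for one endpoint entry in the module config."""

    functions: str          # e.g. "com.example/*"
    url_path_template: str  # e.g. "https://bar.foo.com/{function.name}"

    def matches(self, namespace: str) -> bool:
        # A trailing "/*" is read as: every function name under this namespace.
        target_namespace, _, name_pattern = self.functions.partition("/")
        return name_pattern == "*" and target_namespace == namespace

    def resolve(self, namespace: str, name: str) -> str:
        # The URL is derived from the target address when a message is sent,
        # so a new function name needs no new registration in this sketch.
        if not self.matches(namespace):
            raise LookupError(f"no endpoint configured for {namespace}/{name}")
        return self.url_path_template.replace("{function.name}", name)


spec = EndpointSpec("com.example/*", "https://bar.foo.com/{function.name}")
url = spec.resolve("com.example", "3f2b8c1e-0000-4000-8000-000000000001")
print(url)
```

Under this reading, the job only needs to know the namespace and the template up front. The caller still has to know which function names exist, which is the discovery problem the other approaches try to address.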
Alternatively, we might be able to leverage the broadcast state option (assuming remote StateFun functions can be invoked from a KeyedBroadcastProcessFunction). Say we maintain a map of functions in external storage, e.g. S3.
The second approach:
Create a KeyedBroadcastProcessFunction that reads the current state of the function map in open(..)
Send an SNS notification when a new function is deployed
In processBroadcastElement, read the SNS notification, fetch the newly added S3 file, and update the Flink broadcast state registered under the broadcast state descriptor
All operator instances will share the same underlying broadcast function map
The KeyedBroadcastProcessFunction will send each new message received in processElement to every function in the broadcast function map
A third, and possibly the simplest, approach would be to register a processing-time timer and fetch the updated function map from S3 in the onTimer handler every 5 minutes.
Which would be the preferred option? Any pointers on a trade-off analysis between these approaches (apart from the time lag in discovering newly added functions in the third approach)?
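The timer-based variant can be sketched outside Flink to show where the discovery lag comes from. This is a plain-Python simulation under stated assumptions, not Flink API code: FanOutRouter, load_function_map and the fn-a / fn-b entries are hypothetical, and the clock is passed in explicitly so the behaviour is deterministic.

```python
from typing import Callable, Dict, List, Tuple


class FanOutRouter:
    """Simulates an operator that fans each message out to all known functions."""

    def __init__(
        self,
        load_function_map: Callable[[], Dict[str, str]],
        now: float,
        refresh_interval_s: float = 300.0,
    ) -> None:
        self._load = load_function_map
        self._interval = refresh_interval_s
        # Initial snapshot, analogous to reading the map in open(..).
        self._functions = dict(load_function_map())
        self._next_refresh = now + refresh_interval_s

    def on_timer(self, now: float) -> None:
        # Analogous to the timer callback: reload the map once the interval elapsed.
        if now >= self._next_refresh:
            self._functions = dict(self._load())
            self._next_refresh = now + self._interval

    def process_element(self, message: str) -> List[Tuple[str, str, str]]:
        # One outgoing call per known function: (name, url, payload).
        return [(name, url, message) for name, url in sorted(self._functions.items())]


# The dict stands in for the function map stored in S3.
store = {"fn-a": "https://bar.foo.com/fn-a"}
router = FanOutRouter(lambda: store, now=0.0)

store["fn-b"] = "https://bar.foo.com/fn-b"   # a new function is deployed
before = router.process_element("event-1")   # still sees the old snapshot

router.on_timer(now=300.0)                    # the 5-minute timer fires
after = router.process_element("event-2")     # now fans out to both functions
print(len(before), len(after))
```

In this sketch the only difference between the second and third approach is what triggers the reload: a broadcast element would call the same reload immediately, while the timer bounds the lag by the refresh interval.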

Related

What is the best way to solve the problem of sending mutations from multiple browser tabs in RTK Query?

I have a simple React application that allows performing CRUD operations on various entities. It uses RTK Query to interact with the backend, and I want to use its built-in caching system, invalidating the cache only when a mutation is performed on an endpoint with the given tag. It works fine as long as I have only one tab open. If multiple users are interacting with the application and one of them performs a mutation, the cache is invalidated only in that user's browser tab. The update is not propagated to other browsers or tabs that are currently connected to the app, so each user would have to manually refresh the page after another user performs a mutation. One way to solve that is to invalidate the cache periodically, which isn't ideal; another is to force every query to re-fetch on focus, which is even worse. The best scenario would be to somehow detect that another user has sent a mutation for the given endpoint, and then invalidate the cache of this endpoint (by tags) in every other browser tab that is connected to the application. I'm looking for a solution that is better than what I've already implemented, which is the following:
There's a global component with a single websocket, which immediately connects to the backend
The backend assigns a unique identifier to the socket, saves it in a socket pool and sends back an event with the identifier
The component with the socket saves the identifier in Redux
RTK Query adds the identifier as a custom header to every request sent to the backend
The backend checks the HTTP method of each request. If it is a mutation (POST / PUT / PATCH / DELETE), it extracts the identifier from the custom header, filters the socket with that identifier out of the socket pool, and sends an event carrying the tag of the mutated service to all remaining sockets
The component's socket receives the event and uses RTK Query's invalidateTags utility function to invalidate the cache of the mutated service
Thanks to that, the whole app functions as if it was a real-time collaboration tool, where every change made by any user is immediately reflected in all the connected browser tabs. However, I think it is a bit too complicated and I feel like I'm reinventing the wheel. This scenario is surely quite popular, and there must be something that I'm missing, like an npm package that solves this problem, an RTK Query option that I've omitted, or a well-known design pattern. Of course, there are multiple packages that allow synchronizing Redux Store across multiple tabs, but that doesn't solve the problem of having multiple users connecting from different devices.
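The backend half of the steps above can be sketched as a small in-memory socket pool. This is an illustration only: SocketPool, notify_mutation and the event shape are hypothetical names, and plain callables stand in for real websocket connections.

```python
from typing import Callable, Dict, List

MUTATING_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


class SocketPool:
    """Keeps one send callback per connected socket, keyed by its identifier."""

    def __init__(self) -> None:
        self._senders: Dict[str, Callable[[dict], None]] = {}

    def register(self, socket_id: str, send: Callable[[dict], None]) -> None:
        self._senders[socket_id] = send

    def notify_mutation(self, method: str, tag: str, origin_id: str) -> List[str]:
        # Only mutating requests trigger an invalidation event.
        if method.upper() not in MUTATING_METHODS:
            return []
        notified = []
        for socket_id, send in self._senders.items():
            if socket_id == origin_id:
                continue  # the originating tab already invalidated its own cache
            send({"type": "invalidate", "tag": tag})
            notified.append(socket_id)
        return notified


# Each inbox list stands in for one browser tab's websocket.
inbox: Dict[str, List[dict]] = {"a": [], "b": [], "c": []}
pool = SocketPool()
for socket_id in inbox:
    pool.register(socket_id, inbox[socket_id].append)

notified = pool.notify_mutation("PATCH", "Posts", origin_id="a")
ignored = pool.notify_mutation("GET", "Posts", origin_id="a")
print(notified, ignored)
```

On the client, each received event would then be passed to the invalidateTags utility mentioned in the last step.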
"It works fine as long as I have only one tab open"
JS code lives within a single tab / open page by default.
That includes JS variables and logic, and the Redux store is just another JS variable.
RTK Query is not specifically designed to interact across tabs. You'll have to write that part yourself.
If the concern is multiple users interacting with the same backend server, that's what RTK Query's polling options are for. Alternately, yeah, you could have the server send a signal via websocket to let the client know it needs to refetch data. But again, that's something you'll need to write yourself, as it's specific to the needs of your own application.

Apache Flink Statefun - Remote Deployment - State propagation

I have a few questions about the remote deployment of functions as shown in the diagram:
If I have remote StateFun functions (multiple instances running Undertow, as shown in the examples, fronted by an API gateway):
Do we need to configure the API gateway to send calls with the same URL to the same backend hosting the function, or does the framework take care of it?
From my understanding, each function keeps local state. If one instance is relocated, or we scale the functions, how does the local state get redistributed?
If there is any documentation on this, please let me know.
Thanks.
The functions are stateless. All of the state they need in order to respond to an invocation is included in the message, and the response will include any updates to the state that are needed. Because they are stateless there's no need to worry about sessions or instance affinity or rescaling for remote functions.
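A minimal sketch of what "stateless" means here, assuming a simplified request shape: the handler receives the current state together with the message and returns the updated state with its reply. The greet_counter function and the dict-based state are hypothetical and do not reflect the StateFun SDK or its wire format.

```python
from typing import Dict, Tuple


def greet_counter(state: Dict[str, int], message: Dict[str, str]) -> Tuple[Dict[str, int], Dict[str, str]]:
    """Pure handler: everything it needs arrives in the call, nothing is kept locally."""
    seen = state.get("seen", 0) + 1
    reply = {"text": f"Hello {message['name']}, visit #{seen}"}
    # The updated state travels back with the response instead of staying in the process.
    return {**state, "seen": seen}, reply


# Two names for the same handler stand in for two instances behind a gateway.
instance_a = greet_counter
instance_b = greet_counter

# The caller (standing in for the Flink runtime) owns the state between invocations.
runtime_state: Dict[str, int] = {}
runtime_state, first_reply = instance_a(runtime_state, {"name": "Ada"})
runtime_state, second_reply = instance_b(runtime_state, {"name": "Ada"})
print(second_reply["text"], runtime_state)
```

Because the second call produces the right count no matter which instance serves it, the gateway needs no affinity rule in this model.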
The developers have given talks that get into some of these details. I'll suggest a talk by Tzu-Li (Gordon) Tai, Stateful Functions: Polyglot Event-Driven Functions for Stateful Distributed Applications.

How to find current value of variable from logic app instance?

Inside my logic app, I initialize a variable whose value can change over the course of the logic app's execution. While the logic app is still running (waiting for an external event to happen), I want a way to find the current value of that variable.
I can always store the value of this variable in a data store such as SQL Server or blob storage and read it from there, but I don't want to use external storage. Given that logic apps are stateful in a way, I am wondering if there is a way to get the variable's value.
So, there isn't a way to peek at the state of a Logic App while it's running. Some data might be available in the Run History, but that's not necessarily real-time and there's no easy way to correlate it with any external info.
That means an external mechanism is your only practical solution, but there's nothing wrong with that.
My suggestion would be an Azure Function + Redis Cache. The Logic App can update its state periodically under some key, [LogicAppName]+[OrderID] for example, and another client can then query using that same pattern.
Eventually, you may want to elevate this to Application Insights if you find the need to track the entire app or business processes.
I use a simple action such as an HTTP POST action with a fake URL and "post" my variable in the request body.
If you need it to run in the actual environment (not only when debugging), you can set "Configure run after" so the run continues even when the HTTP step fails.

Programmatically listing and sending requests to dynamic App Engine instances

I want to send a particular HTTP request (or otherwise communicate a message) to every (dynamic/autoscaled) instance which is currently running for a particular App Engine application.
My goal is to trigger each instance to discard some locally cached data (because I have just modified the underlying data and want them to reload it).
One possible solution is to store a value in Memcache, and have instances check this each time they handle a request to see if they should flush their cache. But this adds latency to every request.
Another possible solution would be to somehow stop all running instances. No fixed overhead, but some impact while instances are restarted.
An even less desirable solution would be to redeploy the application code in order to cause all instances to be stopped. This now adds additional delay on my end as a deployment takes some time.
You could use the management API to list instances for a given version, but I'd suggest that you'd probably want to use something like the PubSub API to create a subscription on each of your App Engine instances. Since each instance has its own subscription, any messages sent to the monitored queue will be received by all instances.
You can create the subscription at startup (the /_ah/start endpoint may be useful), and then delete it at shutdown (using the /_ah/stop endpoint).
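The one-subscription-per-instance idea can be illustrated with an in-memory stand-in. This is not the Pub/Sub client API: Topic, Instance, the flush-cache message and the subscription naming are all hypothetical, and start / stop only mimic what the startup and shutdown handlers would do.

```python
from typing import Callable, Dict


class Topic:
    """In-memory stand-in for a topic: every subscription gets every message."""

    def __init__(self) -> None:
        self._subscriptions: Dict[str, Callable[[str], None]] = {}

    def subscribe(self, subscription_id: str, callback: Callable[[str], None]) -> None:
        self._subscriptions[subscription_id] = callback

    def unsubscribe(self, subscription_id: str) -> None:
        self._subscriptions.pop(subscription_id, None)

    def publish(self, message: str) -> int:
        for callback in list(self._subscriptions.values()):
            callback(message)
        return len(self._subscriptions)


class Instance:
    """Stand-in for one running instance with a local cache."""

    def __init__(self, instance_id: str, topic: Topic) -> None:
        self.instance_id = instance_id
        self.cache = {"config": "stale"}
        self._topic = topic

    def start(self) -> None:
        # Create this instance's own subscription at startup.
        self._topic.subscribe(f"flush-{self.instance_id}", self._on_message)

    def stop(self) -> None:
        # Delete the subscription at shutdown so it does not pile up messages.
        self._topic.unsubscribe(f"flush-{self.instance_id}")

    def _on_message(self, message: str) -> None:
        if message == "flush-cache":
            self.cache.clear()


topic = Topic()
first, second, stopped = Instance("1", topic), Instance("2", topic), Instance("3", topic)
for instance in (first, second, stopped):
    instance.start()
stopped.stop()

delivered = topic.publish("flush-cache")
print(delivered, first.cache, second.cache, stopped.cache)
```

Unlike the Memcache check, nothing here runs on the request path, so the per-request latency concern from the question does not apply to this design.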

Creating Web-service for modifying XPO objects by timer

I have several clients that create new objects. When a new object is created, I need to start a timer that will change some of the object's properties once the time has elapsed (each object should be visible to defined client groups only for a certain time).
I want to use a web service for this purpose, and I wrote a method that starts a timer.
For example, I need to set the timer to 5 minutes. Are there any restrictions on execution time? Will a timer keep my web service alive?
Perhaps I don't understand your task completely, but your idea of using a web service looks strange to me. Web services are usually used to process requests from remote clients, i.e. a client calls a method of a web service and the web service returns a result to that client.
I think I got your idea :). If you just need to change data in the DB, I think the better solution is to create a Windows service that pings the web service when needed.
