Akka.net - Streams with parallelism, backpressure and ActorRef - akka-stream

Trying to learn how to use Akka.net Streams to process items in parallel from a Source.Queue, with the processing done in an Actor.
I've been able to get it to work by calling a function with Sink.ForEachParallel, and it works as expected.
Is it possible to process items in parallel with Sink.ActorRefWithAck (as I would prefer that it use back-pressure)?

I was about to press Post when I tried combining my previous attempts, and voilà!
Previous attempts with ForEachParallel failed when I tried to create the actor inside it, which I couldn't do in an async function. If I used a single, previously declared actor, then the Tell would work, but I couldn't get the parallelism I wanted.
I got it to work with a router using a round-robin configuration.
var props = new RoundRobinPool(5).Props(Props.Create<MyActor>());
var actor = Context.ActorOf(props);
flow = Source.Queue<Element>(2000, OverflowStrategy.Backpressure)
    .Select(x => {
        return new Wrapper() { Element = x, Request = ++cnt };
    })
    .To(Sink.ForEachParallel<Wrapper>(5, (s) => { actor.Tell(s); }))
    .Run(materializer);
The Request ++cnt is for console output to verify the requests are being processed as desired.
MyActor has a long delay on every 10th request to verify the backpressure was working.

Related

Does requestAnimationFrame belong to microtasks or macrotasks in main-thread task management? If not, how can we categorize this kind of render-side task?

How does React schedule effects? I ran some tests, and it seems hooks are called after requestAnimationFrame but before setTimeout. So I was wondering how the scheduler is really implemented. I checked the React source code, and it seems to be built upon the MessageChannel API.
Also, how does the event loop run the macrotask sequence, for instance setTimeout/script etc.?
const addMessageChannel = (performWorkUntilDeadline: any) => {
  const channel = new MessageChannel();
  const port = channel.port2;
  channel.port1.onmessage = performWorkUntilDeadline;
  port.postMessage(null);
}
const Component1 = () => {
  const [value,] = useState('---NOT INITIALISED')
  requestIdleCallback(() => {
    console.log('requestIdleCallback---')
  })
  useEffect(() => {
    console.log('useEffect---')
  }, [])
  Promise.resolve().then(() => {
    console.log('promise---')
  })
  setTimeout(() => {
    console.log('setTimeout---')
  });
  addMessageChannel(()=> {
    console.log('addMessageChannel---')
  })
  requestAnimationFrame(() => {
    console.log('requestAnimationFrame---')
  })
  return <div>{value}</div>;
}
export default Component1
browser console result:
promise---
requestAnimationFrame---
addMessageChannel---
useEffect---
setTimeout---
requestIdleCallback---
I'm not sure about the useEffect so I'll take your word they use a MessageChannel and consider both addMessageChannel and useEffect a tie.
First the title (part of it at least):
[Does] requestAnimationFrame belong to microtask or macrotask[...]?
Technically... neither. requestAnimationFrame (rAF)'s callbacks are ... callbacks.
Friendly reminder that there is no such thing as a "macrotask": there are "tasks" and "microtasks", the latter being a subset of the former.
Now, while microtasks are tasks, they have a peculiar processing model: they have their own microtask queue (which is not a task queue), which gets visited several times during each event-loop iteration. There are multiple "microtask checkpoints" defined in the event-loop processing model, and every time the JS call stack is empty this microtask queue gets visited too.
Then there are tasks, colloquially called "macro-tasks" here and there to differentiate from the micro-tasks. Only one of these tasks will get executed per event-loop iteration, selected at the first step.
Finally there are callbacks. These may be called from a task (e.g. when the task is to fire an event), or in some particular event-loop iterations, called "painting frames".
Indeed the step labelled update the rendering is to be called once in a while (generally when the monitor sends its V-Sync signal), and will run a series of operations, calling callbacks, among which are our dear rAF's callbacks.
Why is this important? Because it means that rAF (and the other callbacks in the "painting frame") have a special place in the event loop, where they may seem to be called with the highest priority. Actually, they don't participate in the task prioritization system per se (which happens in the first step of the event loop); they may even be called from the same event-loop iteration as the task that queued them.
setTimeout(() => {
  console.log("timeout 1");
  requestAnimationFrame(() => console.log("rAF callback"));
  const now = performance.now();
  while(performance.now() - now < 1000) {} // lock the event loop
});
setTimeout(() => console.log("timeout 2"));
Which we can compare with this other snippet where we start the whole thing from inside a rAF callback:
requestAnimationFrame(() => {
  setTimeout(() => {
    console.log("timeout 1");
    requestAnimationFrame(() => console.log("rAF callback"));
  });
  setTimeout(() => console.log("timeout 2"));
});
While it may seem like an exceptional case to have our task called in a painting frame, it's actually quite common, because browsers have recently decided to make the first call to rAF trigger a painting frame instantly when the document is not animated.
So any test with rAF should start long after the document has started, with an rAF loop already running in the background...
OK, so the rAF result may be a fluke. What about your other results?
Promise first, yes. Not part of the task prioritization either: as said above, the microtask queue gets visited as soon as the JS call stack is empty, as part of the "clean up after running script" step.
rAF, fluke.
addMessageChannel, see this answer of mine. Basically, in Chrome it's due to both setTimeout having a minimum timeout of 1ms, and a higher priority of the message-tasksource over the timeout-tasksource.
setTimeout currently has a 1ms minimum delay in Chrome and a lower priority than MessageEvents, still it would not be against the specs to have it called before the message.
requestIdleCallback, that one is a bit complex, but given that it waits until the event loop has been idle for some time, it will be the last.
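For what it's worth, the microtask part of this can be checked without React or rAF. Below is a minimal sketch in plain JavaScript that should behave the same in a browser console or in Node.js; the `order` array is only a helper I made up to record what ran when. It shows that microtasks, including ones queued from inside another microtask, are drained before the next task (the timeout) gets picked.

```javascript
// Records the order in which each kind of callback runs.
const order = [];

// A task (what the question calls a "macrotask"): picked in a later event-loop iteration.
setTimeout(() => order.push("timeout"), 0);

// A microtask: runs once the current script has finished and the call stack is empty.
Promise.resolve().then(() => {
  order.push("microtask 1");
  // A microtask queued from a microtask is drained in the same checkpoint,
  // so it still runs before the timeout above.
  queueMicrotask(() => order.push("microtask 2"));
});

// Plain synchronous code always runs first.
order.push("sync");

// Print the recorded order once everything has run.
setTimeout(() => console.log(order.join(" -> ")), 20);
// prints "sync -> microtask 1 -> microtask 2 -> timeout"
```

Note that this says nothing about rAF or MessageChannel ordering, which depend on the browser as described above.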

Best/Quickest way to execute Promises in-parallel? (React)

Suppose I need to fetch data to create a card. What is the quickest way to get this data using promises? This is the current way I'm doing it:
async function getCards() {
  const promises = []
  for (let i = 0; i < 10; i++) {
    promises.push(getCard(i))
  }
  const cards = await Promise.allSettled(promises)
  setCards(cards)
}
async function getCard(i) {
  const property1 = await getProperty1(i)
  const property2 = await getProperty2(i)
  const property3 = await getProperty3(i)
  const card = <div>
    <div>Property 1: {property1}</div>
    <div>Property 2: {property2}</div>
    <div>Property 3: {property3}</div>
  </div>
  return card
}
For my purposes, I don't need Promise.allSettled, since I don't need to wait for all 10 cards to finish awaiting (I may just create a component), I can render each one as they complete. But I'd still like it to be parallel/execute as fast as possible. What other options do I have there? And is there a better way to handle what I'm doing in getCard?
If the getPropertyN() calls are indeed asynchronous operations (such as network requests), then getCards() will run all the calls in your for loop in parallel, so that they are all in flight at the same time, which will generally reduce the end-to-end time versus running them serially.
There are some other factors in play, such as what the receiving host does when it receives a bunch of requests at once. If it only handles them one at a time, then you may not gain a whole lot. But, if the host has any parallelism, then you will definitely see a speedup by putting multiple requests in flight at the same time.
Note that your getCard(i) implementation is serializing the three calls to getProperty1(), getProperty2() and getProperty3() which perhaps could also be done in parallel with something like:
const [property1, property2, property3] = await Promise.all([
  getProperty1(i),
  getProperty2(i),
  getProperty3(i)
]);
Instead of this:
const property1 = await getProperty1(i)
const property2 = await getProperty2(i)
const property3 = await getProperty3(i)
Another thing to keep in mind is that a browser (for example, when you use fetch()) will only make N simultaneous requests to the same host (where N is around 6). Once you exceed that number of in-flight requests to the same host, the browser queues the rest until one of the previous ones finishes. The way it's implemented, it doesn't slow things down to issue more than the maximum, but you don't gain any more parallelism beyond the browser's limit. If you were running this code in a different JavaScript environment such as Node.js, that limit would not apply, as it is a browser-specific thing.
Note, the key thing to achieving the parallelism is launching multiple requests to be in-flight at the same time. There is no requirement that you use Promise.allSettled() before acting on any results unless you need to get all the results in order before you can process the results.
If the results can be processed individually as they finish and can be processed in any order, you can also write the code that way without using Promise.allSettled() such as:
getProperty(1).then(processResult).catch(processErr);
getProperty(2).then(processResult).catch(processErr);
getProperty(3).then(processResult).catch(processErr);
Note: I also don't see any error handling in your code. Any outside network request can fail and you must have some handler for rejected promises.
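Putting those pieces together, here is a rough sketch of how it might look end to end: every card is launched at once, each card's three properties are fetched in parallel, and each card is handled (or its error recorded) as soon as it settles. The `delay` helper, the fake `getProperty1/2/3` implementations and the `cards`/`errors` arrays are made up so the snippet runs on its own, and `getCard` returns a plain object rather than JSX for the same reason. In a component you would presumably call something like setCards(prev => [...prev, card]) instead of pushing to an array.

```javascript
// Hypothetical stand-ins for the question's getProperty1/2/3: each resolves after a short delay.
const delay = (ms, value) => new Promise((resolve) => setTimeout(() => resolve(value), ms));
const getProperty1 = (i) => delay(5, `p1-${i}`);
const getProperty2 = (i) => delay(10, `p2-${i}`);
// One request fails on purpose so that the error path is exercised.
const getProperty3 = (i) =>
  i === 2 ? Promise.reject(new Error(`property 3 failed for card ${i}`)) : delay(15, `p3-${i}`);

// Fetch the three properties of one card in parallel.
async function getCard(i) {
  const [property1, property2, property3] = await Promise.all([
    getProperty1(i),
    getProperty2(i),
    getProperty3(i),
  ]);
  return { i, property1, property2, property3 };
}

const cards = [];  // cards that completed, in completion order
const errors = []; // messages of cards that failed

// Launch every card at once and handle each one as soon as it settles.
function getCards(count) {
  const pending = [];
  for (let i = 0; i < count; i++) {
    pending.push(
      getCard(i)
        .then((card) => cards.push(card))
        .catch((err) => errors.push(err.message))
    );
  }
  // Resolves once every card has either been added or reported as failed.
  return Promise.all(pending);
}

getCards(4).then(() => console.log(`${cards.length} cards, ${errors.length} error(s)`));
// prints "3 cards, 1 error(s)"
```

Because each card has its own .catch(), one failing card does not prevent the others from rendering.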

share() vs ReplaySubject: Which one, and neither works

I'm trying to implement short-term caching in my Angular service -- a bunch of sub-components get created in rapid succession, and each one has an HTTP call. I want to cache them while the page is loading, but not forever.
I've tried the following two methods, neither of which have worked. In both cases, the HTTP URL is hit once for each instance of the component that is created; I want to avoid that -- ideally, the URL would be hit once when the grid is created, then the cache expires and the next time I need to create the component it hits the URL all over again. I pulled both techniques from other threads on StackOverflow.
share() (in service)
getData(id: number): Observable<MyClass[]> {
  return this._http.get(this.URL)
    .map((response: Response) => <MyClass[]>response.json())
    .share();
}
ReplaySubject (in service)
private replaySubject = new ReplaySubject(1, 10000);
getData(id: number): Observable<MyClass[]> {
  if (this.replaySubject.observers.length) {
    return this.replaySubject;
  } else {
    return this._http.get(this.URL)
      .map((response: Response) => {
        let data = <MyClass[]>response.json();
        this.replaySubject.next(data);
        return data;
      });
  }
}
Caller (in component)
ngOnInit() {
  this.myService.getData(this.id)
    .subscribe((resultData: MyClass[]) => {
      this.data = resultData;
    },
    (error: any) => {
      alert(error);
    });
}
There's really no need to hit the URL each time the component is created -- they return the same data, and in a grid of rows that contain the component, the data will be the same. I could call it once when the grid itself is created, and pass that data into the component. But I want to avoid that, for two reasons: first, the component should be relatively self-sufficient. If I use the component elsewhere, I don't want the parent component to have to cache data there, too. Second, I want to find a short-term caching pattern that can be applied elsewhere in the application. I'm not the only person working on this, and I want to keep the code clean.
Most importantly, if you want to make something persistent even when creating/destroying Angular components it can't be created in that component but in a service that is shared among your components.
Regarding RxJS, you usually don't have to use ReplaySubject directly and can just use publishReplay(1, 10000)->refCount() instead.
The share() operator is just a shorthand for publish()->refCount() that uses Subject internally which means it doesn't replay cached values.
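If it helps to see the idea without any RxJS operators, here is a rough library-free sketch of the same behaviour (share one in-flight request and replay its result for a limited time window), using a plain Promise instead of an Observable. To be clear, this swaps in a different technique, it is not how publishReplay is implemented. `createTimedCache`, `fetchData` and the injectable `now` clock are names I made up for illustration; they are not Angular or RxJS APIs.

```javascript
// Wraps a promise-returning function so that calls made within `ttlMs` share one result.
function createTimedCache(fetchData, ttlMs, now = Date.now) {
  let cached = null;  // the shared promise (still in flight or already resolved)
  let fetchedAt = 0;  // when the current cached promise was created

  return function getData() {
    const expired = cached === null || now() - fetchedAt >= ttlMs;
    if (expired) {
      fetchedAt = now();
      const request = fetchData().catch((err) => {
        // Do not keep a failed request in the cache.
        if (cached === request) cached = null;
        throw err;
      });
      cached = request;
    }
    return cached;
  };
}

// Usage with a fake request and a fake clock so that the expiry is easy to see.
let requests = 0;
let clock = 0;
const getData = createTimedCache(
  () => { requests += 1; return Promise.resolve(["row"]); },
  10000,
  () => clock
);

getData(); getData(); getData(); // three components created in quick succession
console.log(requests);           // 1: they all share the same request
clock = 10001;                   // a bit more than 10 seconds later
getData();
console.log(requests);           // 2: the cache expired, so the URL is hit again
```

As with the RxJS version, this has to live in the shared service rather than in the component, otherwise every component instance gets its own cache.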

Angular Service and Web Workers

I have an Angular 1 app in which I am trying to improve the performance of a particular service that does a lot of calculations. (It is probably not optimized, but that's beside the point for now; the goal right now is to run it in another thread to improve animation performance.)
The App
The app runs calculations on your GPA, Terms, Courses, Assignments etc. The service name is calc. Inside calc there are user, term, course and assign namespaces. Each namespace is an object in the following form:
{
  //Times for the calculations (for development only)
  times:{
    //an array of calculation times for logging and average calculation
    array: [],
    //Print out the min, max average and total calculation times
    report: function(){...}
  },
  //Hashes the object (with service.hash()) and checks to see if we have cached calculations for the item, if not calls runAllCalculations()
  refresh: function(item){...},
  //Runs calculations, saves it in the cache (service.calculations array) and returns the calculation object
  runAllCalculations: function(item){...}
}
Here is a screenshot from the very nice structure tab of IntelliJ to help visualization
What Needs To Be Done?
1. Detect Web Worker Compatibility (MDN)
2. Build the service depending on Web Worker compatibility
   a. Structure it the exact same as it is now
   b. Replace with a Web Worker "proxy" (Correct terminology?) service
The Problem
The problem is how to create the Web Worker "Proxy" to maintain the same service behavior from the rest of the code.
Requirements/Wants
A few things that I would like:
Most importantly, as stated above, keep the service behavior unchanged
To keep one code base for the service, keep it DRY, not having to modify two spots. I have looked at WebWorkify for this, but I am unsure how to implement it best.
Use Promises while waiting for the worker to finish
Use Angular and possibly other services inside the worker (if it's possible); again, WebWorkify seems to address this
The Question
...I guess there hasn't really been a question thus far, it's just been an explanation of the problem...So without further ado...
What is the best way to use an Angular service factory to detect Web Worker compatibility, conditionally implement the service as a Web Worker, while keeping the same service behavior, keeping DRY code and maintaining support for non Web Worker compatible browsers?
Other Notes
I have also looked at VKThread, which may be able to help with my situation, but I am unsure how best to implement it.
Some more resources:
How to use a Web Worker in AngularJS?
http://www.html5rocks.com/en/tutorials/workers/basics/
https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers#Worker_feature_detection
In general, a good way to write manageable code that works in a worker, and especially code that can also run in the same window (e.g. when workers are not supported), is to make the code event-driven and then use a simple proxy to drive the events through the communication channel, in this case the worker.
I first created an abstract "class" that didn't really define a way of sending events to the other side.
function EventProxy() {
  // Object that will receive events that come from the other side
  this.eventSink = null;
  // Bind the callback so that "this" still refers to the proxy when the
  // method is passed around as an event listener.
  // It also gives you a reference you can use to remove the callback
  this.eventFromObject = this.eventFromObject.bind(this);
}
// The object gets this as its all-events callback;
// typically, you will extract event parameters from the "arguments" variable.
// Regular functions (not arrow functions) are used on the prototype so that
// "this" and "arguments" refer to the call, not to the enclosing scope.
EventProxy.prototype.eventFromObject = function (name) {
  // This is not implemented. We should have a WorkerProxy inherited class.
  throw new Error("This is abstract method. Object dispatched an event " +
    "but this class doesn't do anything with events.");
};
EventProxy.prototype.setObject = function (object) {
  // If object is already set, remove event listener from old object
  if (this.eventSink != null) {
    // do it depending on your framework
    // ... something ...
  }
  this.eventSink = object;
  // Listen on all events. Obviously, your event framework must support this
  object.addListener("*", this.eventFromObject);
};
// Child classes will call this when they receive
// events from other side (eg. worker)
EventProxy.prototype.eventReceived = function (name, args) {
  // put event name as first parameter
  args.unshift(name);
  // Run the event on the object
  this.eventSink.dispatchEvent.apply(this.eventSink, args);
};
Then you implement this for worker for example:
function WorkerProxy(worker) {
  // call superconstructor
  EventProxy.call(this);
  // worker
  this.worker = worker;
  worker.addEventListener("message", this.eventFromWorker = this.eventFromWorker.bind(this));
}
WorkerProxy.prototype = Object.create(EventProxy.prototype);
// Override the abstract method on WorkerProxy (not on EventProxy).
// The object gets this as its all-events callback
WorkerProxy.prototype.eventFromObject = function (name) {
  // include event args but skip the first one, the name
  var args = Array.prototype.slice.call(arguments, 1);
  // Send the event to the script in worker
  // You could use additional parameter to use different proxies for different objects
  this.worker.postMessage({type: "proxyEvent", name: name, arguments: args});
};
WorkerProxy.prototype.eventFromWorker = function (event) {
  if (event.data.type == "proxyEvent") {
    // Use superclass method to handle the event
    this.eventReceived(event.data.name, event.data.arguments);
  }
};
The usage then would be that you have some service and some interface and in the page code you do:
// Or other proxy type, eg socket.IO, same window, shared worker...
var proxy = new WorkerProxy(new Worker("runServiceInWorker.js"));
//eg user clicks something to start calculation
var interface = new ProgramInterface();
// join them
proxy.setObject(interface);
And in the runServiceInWorker.js you do almost the same:
importScripts("myservice.js", "eventproxy.js");
// Here we're of course really lucky that the web worker API is symmetric
var proxy = new WorkerProxy(self);
// 1. make a service
// 2. assign to proxy
proxy.setObject(new MyService());
// 3. profit ...
In my experience, I eventually had to detect which side I was on in some cases, but that was with web sockets, which are not symmetric (there's a server and many clients). You could run into similar problems with a shared worker.
You mentioned Promises - I think the approach with promises would be similar, though maybe more complicated as you would need to store the callbacks somewhere and index them by ID of the request. But surely doable, and if you're invoking worker functions from different sources, maybe better.
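To make the promise variant a bit more concrete, here is a minimal sketch of the "store the callbacks and index them by request ID" idea. It is not part of the proxy classes above, and every name in it (`createPromiseProxy`, `serveRequests`, `createLoopbackPair`, the `{ id, method, args }` message shape) is an assumption of mine. The loopback pair only imitates the postMessage/onmessage surface of a Worker so that the snippet runs anywhere; with a real worker you would pass the Worker on the page side and `self` on the worker side.

```javascript
// Page side: turns postMessage round trips into promises.
// `port` is anything with postMessage() and an assignable onmessage handler.
function createPromiseProxy(port) {
  let nextId = 0;
  const pending = new Map(); // request id -> { resolve, reject }

  port.onmessage = (event) => {
    const { id, result, error } = event.data;
    const callbacks = pending.get(id);
    if (!callbacks) return; // not a reply to one of our requests
    pending.delete(id);
    if (error !== undefined) callbacks.reject(new Error(error));
    else callbacks.resolve(result);
  };

  return {
    pending,
    call(method, args) {
      return new Promise((resolve, reject) => {
        const id = nextId++;
        pending.set(id, { resolve, reject });
        port.postMessage({ id, method, args });
      });
    },
  };
}

// Worker side: runs the requested method on the service and replies with the same id.
function serveRequests(port, service) {
  port.onmessage = (event) => {
    const { id, method, args } = event.data;
    try {
      port.postMessage({ id, result: service[method](...args) });
    } catch (err) {
      port.postMessage({ id, error: String(err.message) });
    }
  };
}

// Hypothetical in-memory stand-in for the page <-> worker channel.
function createLoopbackPair() {
  const a = { onmessage: null, postMessage(data) { queueMicrotask(() => b.onmessage && b.onmessage({ data })); } };
  const b = { onmessage: null, postMessage(data) { queueMicrotask(() => a.onmessage && a.onmessage({ data })); } };
  return [a, b];
}

const [pagePort, workerPort] = createLoopbackPair();
serveRequests(workerPort, { add: (n, m) => n + m });
const proxy = createPromiseProxy(pagePort);
proxy.call("add", [1, 2]).then((sum) => console.log(sum)); // prints 3
```

The rest of the code only ever sees `proxy.call(...)` returning a promise, so a same-window fallback could expose the same method and call the service directly when workers are not available.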
I am the author of the vkThread plugin which was mentioned in the question. And yes, I developed an Angular version of the vkThread plugin which allows you to execute a function in a separate thread.
Function can be defined directly in the main thread or called from an external javascript file.
Function can be:
Regular functions
Object's methods
Functions with dependencies
Functions with context
Anonymous functions
Basic usage:
/* function to execute in a thread */
function foo(n, m){
  return n + m;
}
// to execute this function in a thread: //
/* create an object, which you pass to vkThread as an argument*/
var param = {
  fn: foo,      // <-- function to execute
  args: [1, 2]  // <-- arguments for this function
};
/* run thread */
vkThread.exec(param).then(
  function (data) {
    console.log(data); // <-- thread returns 3
  }
);
Examples and API doc: http://www.eslinstructor.net/ng-vkthread/demo/
Hope this helps,
--Vadim

rx.js catchup subscription from two sources

I need to combine a catch up and a subscribe to new feed. So first I query the database for all new records I've missed, then switch to a pub sub for all new records that are coming in.
The first part is easy: do your query, perhaps in batches of 500; that will give you an array, and you can rx.observeFrom that.
The second part is easy too: you just put an rx.observe on the pubsub.
But I need to do it sequentially, so I need to play all the old records before I start playing the new ones coming in.
I figure I can start the subscription to pubsub and put those records in an array, then start processing the old ones. When I'm done, I either remove the dups or (since I do a dup check anyway) allow the few dups, play the accumulated records until they are gone, and then go one in, one out.
My question is: what is the best way to do this? Should I create a subscription to start building up new records in an array, then start processing the old ones, then in the "then" of the old-record processing subscribe to the other array?
OK, this is what I have so far. I need to build up the tests and finish up some pseudocode to find out if it even works, much less whether it is a good implementation. Feel free to stop me in my tracks before I bury myself.
var catchUpSubscription = function catchUpSubscription(startFrom) {
  EventEmitter.call(this);
  var subscription = this.getCurrentEventsSubscription();
  // calling map to start subscription and catch in an array.
  // not sure if this is right
  var events = rx.Observable.fromEvent(subscription, 'event').map(x=> x);
  // getPastEvents gets batches of 500 iterates over and emits each
  // till no more are returned, then resolves a promise
  this.getPastEvents({count:500, start:startFrom})
    .then(function(){
      rx.Observable.fromArray(events).forEach(x=> emit('event', x));
    });
};
I don't know that this is the best way. Any thoughts?
thx
I would avoid mixing your different async strategies unnecessarily. You can use concat to join together the two sequences:
var catchUpSubscription = function catchUpSubscription(startFrom) {
  var subscription = this.getCurrentEventsSubscription();
  return Rx.Observable.fromPromise(this.getPastEvents({count:500, start:startFrom}))
    .flatMap(x => x)
    .concat(Rx.Observable.fromEvent(subscription, 'event'));
};
// Sometime later
catchUpSubscription(startTime).subscribe(x => { /* handle event */ });
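One caveat: concat only subscribes to the second sequence once the first has completed, so events published while the past events are still loading would not be seen by fromEvent. If that matters, the buffering approach described in the question can be sketched without Rx roughly as below. This is plain JavaScript, not the answer's Rx pipeline, and `getPastEvents`, `subscribeLive`, `handle` and the `id` field used for de-duplication are hypothetical stand-ins for the database query, the pub/sub subscription and the event shape.

```javascript
// Replays past events first, then live ones, without losing events that arrive during catch-up.
function catchUpSubscription(getPastEvents, subscribeLive, handle) {
  const seen = new Set(); // ids already delivered, used to drop duplicates
  let buffer = [];        // live events received while catching up
  let caughtUp = false;

  const deliver = (event) => {
    if (seen.has(event.id)) return;
    seen.add(event.id);
    handle(event);
  };

  // Subscribe first so that nothing published during the query is missed.
  subscribeLive((event) => {
    if (caughtUp) deliver(event);
    else buffer.push(event);
  });

  return getPastEvents().then((pastEvents) => {
    pastEvents.forEach(deliver); // old records first
    buffer.forEach(deliver);     // then whatever accumulated in the meantime
    buffer = [];
    caughtUp = true;             // from here on: one in, one out
  });
}

// Usage with fake sources: the query takes 10 ms, and two events are published meanwhile.
let publish;
const handled = [];
const done = catchUpSubscription(
  () => new Promise((resolve) => setTimeout(() => resolve([{ id: 1 }, { id: 2 }, { id: 3 }]), 10)),
  (handler) => { publish = handler; },
  (event) => handled.push(event.id)
);
publish({ id: 3 }); // arrives during catch-up and is also in the query result
publish({ id: 4 }); // arrives during catch-up
done.then(() => {
  publish({ id: 5 });             // arrives after catch-up
  console.log(handled.join(",")); // prints "1,2,3,4,5"
});
```

In this sketch the `seen` set grows for as long as the subscription lives; in practice you would probably stop tracking ids once the buffered events have been flushed, or rely on the dup check the question already mentions.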
