Clarification of StrongLoop Subscription Pricing: CPU versus Node Processes

I'm trying to figure out whether StrongLoop support is priced by CPU/VCPU or by the number of Node.js processes.
The StrongLoop site, https://strongloop.com/node-js/subscription-plans/, says:
"Subscriptions are sized by number of Node.js application processes/instances, equivalent to the number of CPUs/VCPUs used."
But I can have multiple Node processes on 1 CPU. So if I had 6 Node processes running on 4 physical CPUs, how many "processes" would I need to buy for the support plan?

StrongLoop subscription plans do not take the number of CPUs into account, only processes. You can have as many or as few CPUs as you would like; the number of processes will determine the appropriate subscription plan.
Please let me know if you would like to jump on a quick call to discuss your use case in further detail.

Related

Flink JobManager or TaskManager instances

I have a few questions about the Flink stream processing framework. Please let me know your comments on them.
Let's say I build a cluster with n nodes, of which m nodes are job managers (for HA). Are the remaining (n-m) nodes then the task managers?
If each node has n cores, how can we control the specific number of cores used by a task manager or job manager?
If we add a new node as a task manager, does the job manager automatically assign tasks to the newly added task manager?
Does Flink have a concept of partitions and data skew?
If Flink connects to Pulsar and needs to read data from a partitioned topic, what is the parallelism? (Is it equal to the number of partitions, or does it depend entirely on the number of task slots in the Flink task managers?)
Does Flink have any built-in optimization of the job graph? (For example, my job graph has many filter, map, flatMap, etc. operators.) Can you suggest any docs or materials on Flink job optimization?
Is there an option to dedicate one core to Prometheus metrics scraping?

Yes
Configuring the number of slots per TM: https://nightlies.apache.org/flink/flink-docs-stable/docs/concepts/flink-architecture/#task-slots-and-resources. However, each operator runs in its own thread and you have no control over which core it runs on, so you don't really have fine-grained control of how cores are used. Configuring resource groups also allows you to distribute operators across slots: https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/datastream/operators/overview/#task-chaining-and-resource-groups
Not for currently running jobs, you'd need to re-scale them. New jobs will use it though.
Yes. https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/datastream/sources/
It will depend on the Flink source parallelism.
It automatically optimizes the graph as it sees fit. You have some control over rescaling and chaining/splitting operators: https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/datastream/operators/overview/ (towards the end). As a rule of thumb, I would start by deploying a full job per slot and then, once you properly understand where the bottlenecks are, try to optimize the graph. Most of the time it is not worth it due to increased serialization and shuffling of data.
You can export Prometheus metrics, but not have a core dedicated to it: https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/metric_reporters/#prometheus
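
To illustrate the point about source parallelism and partitioned topics, here is a minimal sketch of how a fixed number of topic partitions might be divided among source subtasks. The round-robin rule and the name assign_partitions are assumptions made for illustration only; the actual assignment is decided by the connector and may differ.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, source_parallelism: int) -> Dict[int, List[int]]:
    """Toy round-robin assignment of topic partitions to source subtasks.

    Hypothetical helper, not a Flink or Pulsar API: it only shows that the
    number of reading subtasks is set by the source parallelism, not by the
    number of partitions.
    """
    if source_parallelism <= 0:
        raise ValueError("source_parallelism must be positive")
    assignment: Dict[int, List[int]] = {s: [] for s in range(source_parallelism)}
    for partition in range(num_partitions):
        assignment[partition % source_parallelism].append(partition)
    return assignment


if __name__ == "__main__":
    # Fewer subtasks than partitions: each subtask reads several partitions.
    print(assign_partitions(6, 3))
    # More subtasks than partitions: some subtasks have nothing to read.
    print(assign_partitions(2, 4))
```

In this toy model, a parallelism above the partition count leaves some subtasks idle, while a parallelism below it makes each subtask read several partitions.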

Configuring core usage per slot in Flink

I have a cluster of 3 machines with 4 cores each. Each machine has one task manager. I know that the number of slots in Flink can be controlled by taskmanager.numberOfTaskSlots. I initially had allotted 12 slots in total (every task manager had 4 slots). Although, there is no explicit CPU isolation among slots (as mentioned here), I assume that each slot is roughly using 1 core. Am I right in assuming this?
I haven't mentioned any slot sharing group in my code and my pipeline does not have any blocking edges. The parallelism of each task is the same and is equal to the number of slots. I am assuming that one subtask from each task will be in a slot. Am I correct in this understanding?
After some conversation (link for the curious minds :-)), I wanted to increase the cores per slot to 2 for my experiments. So, I reduced taskmanager.numberOfTaskSlots to 2 on each machine. After doing this, I see that the Flink WebUI shows 6 slots in total and 2 slots for each task manager. I have also reduced the parallelism of each task to 6. Is this all that I need to do?
Note: I am not using the MVP feature of fine grained resource management right now.

That sounds right.
Each Task Manager is a single JVM. A task slot doesn't correspond to anything physical -- it's just an abstract resource managed by the Flink scheduler. Each task in a task slot is an instance of an operator chain in the execution graph, and each task is single-threaded. No two instances of the same operator chain will ever be scheduled into the same slot.
All of the threads for all of the tasks in all of the slots in a given task manager will compete for the resources available to that JVM: cores, memory, etc.
As you have noted, there is no way to explicitly set the number of cores per slot. And there's no requirement that the number be an integer. You could, for example, decide that your 4-core TMs are each providing 3 slots, for a total parallelism of 9 across the 3 TMs.
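
The arithmetic behind these numbers can be sketched as follows. The function names are made up for illustration, and the "cores per slot" figure is only a nominal average, since Flink does not isolate CPU between slots.

```python
from fractions import Fraction


def total_slots(task_managers: int, slots_per_tm: int) -> int:
    """Total slots in the cluster, which bounds the job parallelism."""
    return task_managers * slots_per_tm


def cores_per_slot(cores_per_tm: int, slots_per_tm: int) -> Fraction:
    """Nominal average share of cores per slot (not enforced by Flink)."""
    return Fraction(cores_per_tm, slots_per_tm)


if __name__ == "__main__":
    # 3 task managers with 4 cores each, as in the question.
    for slots in (4, 3, 2):
        share = float(cores_per_slot(4, slots))
        print(f"{slots} slots/TM -> {total_slots(3, slots)} slots in total, "
              f"~{share:.2f} cores per slot")
```

With 2 slots per TM this gives 6 slots and a nominal 2 cores per slot; with 3 slots per TM it gives 9 slots and a non-integer 4/3 cores per slot.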

Apache Flink: number of TaskManagers per machine

The number of CPU cores per machine is four. In flink standalone mode, how should I set the number of TaskManagers on each machine?
1 TaskManager, each TaskManager has 4 slots.
2 TaskManagers, each TaskManager has 2 slots.
4 TaskManagers, each TaskManager has 1 slot. This setting is like apache-storm.

Normally you'd have one TaskManager per server, and (as per the doc that bupt_ljy referenced) one slot per physical CPU core. So I'd go with your option #1.

There's also the consideration of Flink's scheduling algorithm. We've frequently run into problems where, with multiple hosts running one large task manager apiece, all jobs get scheduled to one host, which can cause load problems.
We ended up running multiple smaller task managers per host, and jobs seem to be distributed better (although they still often cluster on one node).
So, in my experience, I'd lean more towards 4 task managers with 1 slot apiece, or maybe compromise at 2 task managers with 2 slots apiece.

I think it depends on your application.
The official documentation, Distributed Runtime Environment, says: "As a rule-of-thumb, a good default number of task slots would be the number of CPU cores. With hyper-threading, each slot then takes 2 or more hardware thread contexts."
But if you have to use a lot of memory in your application, then you don't need too many slots in one task manager.
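
A rough sketch of the memory side of that trade-off: Flink divides a task manager's managed memory equally among its slots, so fewer slots means more managed memory per slot. The 8192 MiB figure and the function name below are assumptions for illustration only.

```python
def managed_memory_per_slot_mib(tm_managed_memory_mib: int, slots: int) -> float:
    """Equal share of a task manager's managed memory per slot, in MiB."""
    if slots <= 0:
        raise ValueError("slots must be positive")
    return tm_managed_memory_mib / slots


if __name__ == "__main__":
    # Hypothetical task manager with 8192 MiB of managed memory.
    for slots in (4, 2, 1):
        share = managed_memory_per_slot_mib(8192, slots)
        print(f"{slots} slot(s) -> {share:.0f} MiB managed memory per slot")
```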

Google Cloud Spanner - technical implication of 3 node recommendation in production

I'm wondering if there is any technical implication behind that recommendation.
It can't be split-brain because even when configuring 1 node there are 3 instances in total running in different data-centers within the configured region, each having all the data.
So when configuring 3 nodes google will run 9 instances in total sharding the data as well.
I would be happy if anybody could provide information on what's behind that recommendation.
Thanks,
Christian

"when configuring 1 node there are 3 instances"
Quickly correcting terminology to make sure we're on the same page, 1 node still has 3 replicas. Or, more fully: "When configuring your instance to have only 1 node, there are still 3 replicas". When you have 3 nodes, you still have 3 replicas, however each replica will have more physical machines assigned to it.
The reason 3 nodes is the recommended minimum for a 'production grade' instance comes down to the number of machines that are assigned. When working with highly available systems we need to consider the probability of a replica being unavailable (zone maintenance, unforeseen network issues, etc.) combined with the probability of other simultaneous issues, such as physical machine failures.
It's true that even with 1 node Cloud Spanner can continue to operate if a replica becomes unavailable. However, to provide the uptime guarantees of Cloud Spanner, 3 nodes reduces the probability of physical machine failures being catastrophic to the overall system.

Task distribution in Apache Flink

Consider a Flink cluster with some nodes where each node has a multi-core processor. If we configure the number of the slots based on the number of cores and equal share of memory, how does Apache Flink distribute the tasks between the nodes and the free slots? Are they fairly treated?
Is there any way to configure Flink to treat the slots equally when we configure the task slots based on the number of cores available on a node?
For instance, assume that we partition the data equally and run the same task over the partitions. Flink uses all the slots on some nodes while, at the same time, other nodes are totally free. A node with fewer CPU cores in use outputs its result much faster than a node with more CPU cores in use. Apart from that, the speedup ratio is not proportional to the number of cores used on each node. In other words, if one core is occupied on one node and two cores are occupied on another, then, if each core is fairly treated as a slot, each slot should produce the result of the same task in almost the same amount of time, irrespective of which node it belongs to. But this is not the case here.
With this assumption, I would say that the nodes are not treated equally. This in turn produces a result time wise that is not proportional to the number of the nodes available. We can not say that increasing the number of the slots necessarily decreases the time cost.
I would appreciate any comment from the Apache Flink Community!!

Flink's default strategy as of version >= 1.5 considers every slot to be resource-wise the same. With this assumption, it should not matter wrt resources where you place the tasks since all slots should be the same. Given this, the main objective for placing tasks is to colocate them with their inputs in order to minimize network I/O.
If we are now in a standalone setup where we have a fixed number of TaskManagers running, Flink will pick slots in an arbitrary fashion (no guarantee given) for the sources and then colocate their consumers in the same slots if possible.
When running Flink on Yarn or Mesos where Flink can start new TaskManagers, Flink will first use up all slots of an existing TaskManager before it requests a new one. In this case, you will see that all sources will end up on as few TaskManagers as possible.
Since CPUs are not isolated wrt slots (they are a shared resource), the above-mentioned assumption does not hold true in all cases. Hence, in some cases where you have a fixed set of TaskManagers it is actually beneficial to spread the tasks out as much as possible to make use of the shared CPU resources.
In order to support this kind of scheduling strategy, the Flink community added the task spread out strategy via FLINK-12122. In order to use a scheduling strategy which is more similar to the pre FLIP-6 behaviour where Flink tries to spread out the workload across all available TaskExecutors, one needs to set cluster.evenly-spread-out-slots: true in the flink-conf.yaml
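
The difference between the two placement behaviours described above can be sketched with a toy model. This is not Flink's scheduler code; fill_first and spread_out are hypothetical names, and the selection rules are simplified assumptions meant only to show the resulting slot distribution.

```python
from typing import List


def fill_first(slots_per_tm: List[int], requested: int) -> List[int]:
    """Use up all slots of one TaskManager before moving to the next."""
    used = [0] * len(slots_per_tm)
    remaining = requested
    for i, capacity in enumerate(slots_per_tm):
        take = min(capacity, remaining)
        used[i] = take
        remaining -= take
    if remaining > 0:
        raise ValueError("not enough free slots")
    return used


def spread_out(slots_per_tm: List[int], requested: int) -> List[int]:
    """Place each request on the TaskManager with the lowest utilisation."""
    if requested > sum(slots_per_tm):
        raise ValueError("not enough free slots")
    used = [0] * len(slots_per_tm)
    for _ in range(requested):
        # Only TaskManagers that still have a free slot are candidates.
        candidates = [i for i, cap in enumerate(slots_per_tm) if used[i] < cap]
        target = min(candidates, key=lambda i: used[i] / slots_per_tm[i])
        used[target] += 1
    return used


if __name__ == "__main__":
    cluster = [4, 4, 4]  # three TaskManagers with four slots each
    print("fill first:", fill_first(cluster, 6))
    print("spread out:", spread_out(cluster, 6))
```

For six slot requests on three four-slot TaskManagers, the first rule leaves one TaskManager idle while the second uses two slots on each, which is the effect the spread-out option aims for.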

Very old thread, but there is a newer thread that answers this question for current versions.
With Flink 1.5 we added resource elasticity. This means that Flink is now able to allocate new containers on a cluster management framework like Yarn or Mesos. Due to these changes (which also apply to the standalone mode), Flink no longer reasons about a fixed set of TaskManagers because if needed it will start new containers (this does not work in standalone mode). Therefore, it is hard for the system to make any decisions about spreading slots belonging to a single job out across multiple TMs. It gets even harder when you consider that some jobs like yours might benefit from such a strategy whereas others would benefit from co-locating their slots. It gets even more complicated if you want to do scheduling wrt multiple jobs which the system does not have full knowledge about because they are submitted sequentially. Therefore, Flink currently assumes that slot requests can be fulfilled by any TaskManager.