Can DBT run with a pool size greater than 1 in Airflow? - snowflake-cloud-data-platform

I am running dbt Core with Airflow 2.0.2 against a Snowflake database. Currently, my dbt tasks run in an Airflow pool with 1 slot. Is it safe to increase this to 2 (or more) slots?
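For context, a stripped-down sketch of how the tasks are wired up (the DAG id, pool name, and dbt project path below are placeholders; the pool itself is created in the Airflow UI/CLI with the desired number of slots):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_snowflake",          # placeholder DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        # placeholder project path; each run hits Snowflake
        bash_command="dbt run --project-dir /opt/dbt/my_project",
        pool="dbt_pool",             # currently a pool with 1 slot
    )
```

Each running task occupies one slot of dbt_pool, so the pool size is effectively the number of dbt commands Airflow will run at the same time.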
Thank you.

Related

How does parallelism work in Apache Flink?

Suppose I have a Flink cluster of 3 nodes: one node for the Job Manager and the other 2 nodes for Task Managers, each with 3 task slots. When I submit my job with parallelism equal to 2, Flink will assign two task slots. My question is: how will Flink assign these task slots?
Some scenarios:
Does Flink assign one task slot from each task manager?
Is there a possibility that both task slots get assigned from the same task manager? If so, my job will not run if that particular node goes down for some reason. How can I avoid downtime in this scenario?
Since Flink 1.10 you can use the configuration setting cluster.evenly-spread-out-slots: true to cause the scheduler to spread out the slots evenly across all available task managers. Otherwise it will use all of the slots from one task manager before taking slots from the other.
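For reference, that option is set cluster-wide in flink-conf.yaml:

```yaml
# flink-conf.yaml (Flink 1.10+): spread slots evenly across available task managers
cluster.evenly-spread-out-slots: true
```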
Yes, both task slots can be assigned to the same task manager, given that each TM has 3 slots. If a node running an active slot goes down, the job will fail and try to restart, and at that point all of the slots will be assigned on the only remaining node.

Flink yarn-session mode is becoming unstable when running ~10 batch jobs at same time

I am trying to set up a Flink YARN session to run 100+ batch jobs. After ~40 task managers have connected and ~10 jobs are running (each task manager with 2 slots and 1 GB of memory), the session seems to become unstable, even though there were enough resources available. The Flink UI suddenly becomes unavailable; I guess the job manager might have already died. Eventually, the YARN application also got killed.
The job manager is running on a 4-core, 16 GB node with 12 GB available.
Is there any guide for doing the math on job manager resources vs. the number of task managers it can handle?
I got this fixed. The reason the Flink session was breaking was the low network bandwidth of the worker machines in the cluster. The worker machines running the task manager containers should have at least 750 Mbps. With each task manager having 2 slots and 1 GB of memory, a moderate bandwidth of ~450 Mbps won't cut it. If the job is IO-intensive, communication between actors (job manager to workers, or worker to worker) can time out (the default ask timeout is 100 ms).
I decided not to increase the ask timeout, so that the jobs wouldn't simply take longer because of this bottleneck.
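For reference, the ask-timeout knob here is presumably akka.ask.timeout in flink-conf.yaml; the value below is only an illustrative placeholder, since I chose to fix the network bottleneck rather than raise it:

```yaml
# flink-conf.yaml -- illustrative value only; not the route I took
akka.ask.timeout: 30 s
```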

Running multiple Flink programs in a Flink Standalone Cluster (v1.4.2)

I have a Flink Standalone Cluster based on Flink 1.4.2 (1 job manager, 4 task slots) and want to submit two different Flink programs.
I am not sure if this is possible at all, as some Flink archives say that a job manager can only run one job. If this is true, any ideas on how I can get around this issue? There is only one machine available for the Flink cluster, and we don't want to use a resource manager such as Mesos or YARN.
Any hints?
Flink jobs (programs) run in task slots, which are located in a task manager. Assuming you have 4 task slots, you can run up to 4 Flink programs. Also, be careful with the parallelism of your Flink jobs: if you set the parallelism to 2 in both jobs, then 2 is the maximum number of jobs you can run on 4 task slots, since each parallel instance runs in a task slot.
Check this image: https://ci.apache.org/projects/flink/flink-docs-master/fig/slots_parallelism.svg
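As a rough sketch, submitting the two programs to the standalone cluster could look like this (the jar names are placeholders; -p sets the job parallelism and -d detaches the client):

```sh
./bin/flink run -d -p 2 job-a.jar   # occupies 2 of the 4 task slots
./bin/flink run -d -p 2 job-b.jar   # occupies the remaining 2 slots
```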

Spring Batch Processor Deadlocks

I am working on a Spring Boot application that runs a Spring Batch processor upon receiving a request. The processor step involves calling multiple APIs. The database is SQL Server. The following are the details of the job:
Number of partitions: 10, each reading 2000 records from the database table
Task executor max pool size: 1000
Task executor core pool size: 500
Task executor queue capacity: 1000
Write chunk size: 100
Only one job is running at any point in time.
I am using Spring Data JPA to save lists of Hibernate entities. Inserting the results into the database from multiple partitions causes deadlocks.
Any help in figuring out the issue would be appreciated. Thanks in advance.

Running the same sync process on a local DB vs. Azure

I have a WCF service on Azure which performs multiple tasks.
Tasks:
1. Download a file from a remote server (about 50 MB).
2. Perform a bulk insert into an Azure database (about 360,000 records) at once.
3. Run 2 different stored procedures (about 15 seconds tops for both).
The total process takes about 1 minute against my local SQL Server 2012 database, but when I try to run it from the cloud WCF service (not a Cloud Service), it takes longer than the connection timeout can handle (4-30 minutes).
I still don't understand why there is such a significant difference in performance.
Is it the cloud resources? If so, how can I make it perform as well as it does locally (or as close as I can get)?
Regards,
Avishai
