Utilising multiple SQL Servers simultaneously via SSIS - sql-server

I recently discovered how to utilise the processing resources of multiple SQL Servers simultaneously through SSMS (from a brilliant thread on this forum). One registers multiple servers from View --> Registered Servers in SSMS.
My question: is it possible to encapsulate SQL statements in an Execute SQL command container that then utilises the resources of multiple servers simultaneously in SSIS, just as can be done within SSMS?

SSIS can certainly execute tasks against multiple servers at the same time, but you can't use multiple servers to share the execution of a single task. If you want the same SQL to execute against multiple servers simultaneously, in the same way multiserver execution works in SSMS, you must create a separate Execute SQL Task for each server; they can't share one task. Changing the executed SQL statement would then mean editing every task that contains it, but you can avoid this by sourcing the statement from an SSIS variable. That way, you only need to change a single variable.
To execute multiple tasks at the same time, simply drag multiple execution arrows out of the parent task. If there is no parent task, just drop the execute SQL tasks down on the design surface with no connection between them. James Serra wrote a quick blog entry about controlling parallel execution in SSIS quite a while ago, but the information is still current.
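Outside the SSIS designer, the same idea (one statement defined once, one independent task per server, all started together) can be sketched in Python. This is only an illustration under stated assumptions: the server names are hypothetical, and run_on_server is a placeholder that a real version would replace with an actual connection and query.

```python
from concurrent.futures import ThreadPoolExecutor

# Defined once, like an SSIS variable shared by every Execute SQL Task.
SQL_STATEMENT = "SELECT @@SERVERNAME"

# Hypothetical server names, for illustration only.
SERVERS = ["ServerA", "ServerB", "ServerC"]


def run_on_server(server, sql):
    """Placeholder for one Execute SQL Task: a real version would open a
    connection to `server` and execute `sql` there."""
    return f"{server}: executed {sql!r}"


def run_everywhere(servers, sql, task=run_on_server):
    """Run the same statement against every server concurrently, one worker per server."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        # map() returns results in the same order as `servers`.
        return list(pool.map(lambda s: task(s, sql), servers))


if __name__ == "__main__":
    for line in run_everywhere(SERVERS, SQL_STATEMENT):
        print(line)
```

Changing SQL_STATEMENT changes what every worker runs, which mirrors the single-variable approach described above.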

Related

Running several SQL jobs simultaneously under one job

Is it possible in SQL Server to run several jobs simultaneously, in different sessions, under the same job?
For example, I have N stored procedures to run. They all have to run in different sessions and start at the same time. I don't want to create N jobs; I want all of them to start at the same time under one job.
In the past I've had one job create and start several other jobs using the sp_add_job command. If you set the delete level to 3 then the job will then get automatically deleted once it has completed.
The disadvantages are security and monitoring all the jobs.
I don't see any other option than using SSIS Execute SQL tasks, one per script, with no link between them. This allows the different stored procedures or SQL scripts to run in parallel.

Lock an SSIS package from multiple simultaneous executions

I have an SSIS package (Package1.dtsx) that has been deployed to SSISDB. Currently I have the package scheduled with some parameters in SQL Server Agent.
How do I lock the package (Package1.dtsx) if someone attempts to run it from another SQL Server Agent job with different parameters?
You can do this yourself by adding a flag and having your package check this flag before processing. Either quit out, loop until flag is clear or some other logic.
I personally have only ever had one agent per package and the agent handles the multiple execution scenarios.
Locking a package to prevent it from multiple executions is not possible. Think of it as a file. There is no way to lock a file from a user who has the rights to use it.
You can, however, create user groups/roles on SQL Server to segregate who may execute the package, depending on your needs and usage. To me, there is no straightforward way of locking a file against multiple executions. Sorry!
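The flag approach suggested in the first answer can be sketched as follows. This is a minimal illustration, not a built-in SSIS feature: SQLite stands in for SQL Server so the example is self-contained, and the package_lock table and function names are hypothetical. The primary key makes "check the flag and set it" a single atomic step, so two runs cannot both believe they hold the flag.

```python
import sqlite3


def make_lock_table():
    """Create an in-memory database with a hypothetical flag table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE package_lock (package_name TEXT PRIMARY KEY)")
    return conn


def try_acquire(conn, package_name):
    """Return True if this call set the flag, False if it was already set."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO package_lock (package_name) VALUES (?)",
                (package_name,),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key rejected a second row: another run holds the flag.
        return False


def release(conn, package_name):
    """Clear the flag once the package has finished."""
    with conn:
        conn.execute(
            "DELETE FROM package_lock WHERE package_name = ?",
            (package_name,),
        )
```

A package would call try_acquire as its first step and either quit or wait when it returns False, then call release as its last step (including on failure paths, otherwise the flag stays set).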

DataFlow task in SSIS is very slow as compared to writing the sql query in Execute SQL task

I am new to SSIS and have a pair of questions.
I want to transfer 1,25,000 rows from one table to another in the same database. When I use a Data Flow Task, it takes too much time. I tried an ADO NET Destination as well as an OLE DB Destination, but the performance was unacceptable. When I wrote the equivalent query inside an Execute SQL Task, it provided acceptable performance. Why is there such a difference in performance?
INSERT INTO table1 select * from table2
Based on the first observation, I changed my package. It is now composed exclusively of Execute SQL Tasks, each with either a direct query or a stored procedure. If I can solve my problem using only the Execute SQL Task, then why would one use SSIS, as so many documents and articles recommend? I have read that it's reliable, easy to maintain and comparatively fast.
Difference in performance
There are many things that could cause the performance of a "straight" data flow task and the equivalent Execute SQL Task to differ.
Network latency. You are performing an insert into table a from table b on the same server and instance. In an Execute SQL Task, that work would be performed entirely on the same machine. I could run a package on server B that queries 1.25M rows from server A, which will then be streamed over the network to server B. That data will then be streamed back to server A for the corresponding INSERT operation. If you have a poor network, wide data (especially binary types), or simply great distance between servers (server A is in the US, server B is in India), performance will be poor.
Memory starvation. Assuming the package executes on the same server as the target/source database, it can still be slow as the Data Flow Task is an in-memory engine. Meaning, all of the data that is going to flow from the source to the destination will get into memory. The more memory SSIS can get, the faster it's going to go. However, it's going to have to fight the OS for memory allocations as well as SQL Server itself. Even though SSIS is SQL Server Integration Services, it does not run in the same memory space as the SQL Server database. If your server has 10GB of memory allocated to it and the OS uses 2GB and SQL Server has claimed 8GB, there is little room for SSIS to operate. It cannot ask SQL Server to give up some of its memory so the OS will have to page out while trickles of data move through a constricted data pipeline.
Shoddy destination. Depending on which version of SSIS you are using, the default access mode for an OLE DB Destination was "Table or View." This was a nice setting to try and prevent a low level lock escalating to a table lock. However, it results in row by agonizing row (RBAR) inserts (1.25M unique insert statements being sent). Contrast that with the set-based approach of the Execute SQL Task's INSERT INTO. More recent versions of SSIS default the access method to the "Fast" version of the destination. This will behave much more like the set-based equivalent and yield better performance.
OLE DB Command Transformation. There is an OLE DB Destination and some folks confuse that with the OLE DB Command Transformation. Those are two very different components with different uses. The former is a destination and consumes all the data. It can go very fast. The latter is always RBAR. It will perform singleton operations for each row that flows through it.
Debugging. There is overhead running a package in BIDS/SSDT. That package execution gets wrapped in the DTS Debugging Host, which can cause a "not insignificant" slowdown of package execution. There's not much the debugger can do about an Execute SQL Task: it runs or it doesn't. With a data flow, there's a lot of memory it can inspect, monitor, etc., which reduces the amount of memory available (see pt 2) and also slows things down because of the assorted checks it's performing. To get a more accurate comparison, always run packages from the command line (dtexec.exe /file MyPackage.dtsx) or schedule them from SQL Server Agent.
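The row-by-row versus set-based distinction from the points above can be illustrated with a small, self-contained sketch. It uses SQLite in place of SQL Server purely so it runs anywhere, and the helper names are made up for the example; it counts statements rather than measuring time, since timings depend on the environment.

```python
import sqlite3


def make_db(rows):
    """Build an in-memory database with a populated table2 and an empty table1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table2 (id INTEGER, val TEXT)")
    conn.execute("CREATE TABLE table1 (id INTEGER, val TEXT)")
    conn.executemany(
        "INSERT INTO table2 VALUES (?, ?)",
        [(i, f"row{i}") for i in range(rows)],
    )
    conn.commit()
    return conn


def copy_row_by_row(conn):
    """One INSERT statement per source row (the RBAR pattern)."""
    statements = 0
    for row in conn.execute("SELECT id, val FROM table2").fetchall():
        conn.execute("INSERT INTO table1 VALUES (?, ?)", row)
        statements += 1
    conn.commit()
    return statements


def copy_set_based(conn):
    """A single statement moves every row, as the Execute SQL Task query does."""
    conn.execute("INSERT INTO table1 SELECT * FROM table2")
    conn.commit()
    return 1
```

Both functions leave table1 with the same contents; the difference is that the first issues as many statements as there are rows, while the second issues one.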
Package design
There is nothing inherently wrong with an SSIS package that is just Execute SQL Tasks. If the problem is easily solved by running queries, then I'd forgo SSIS entirely and write the appropriate stored procedure(s) and schedule it with SQL Agent and be done.
Maybe. What I still like about using SSIS even for "simple" cases like this is it can ensure a consistent deliverable. That may not sound like much, but from a maintenance perspective, it can be nice to know that everything that is mucking with the data is contained in these source controlled SSIS packages. I don't have to remember or train the new person that tasks A-C are "simple" so they are stored procs called from a SQL Agent job. Tasks D-J, or was it K, are even simpler than that so it's just "in line" queries in the Agent jobs to load data and then we have packages for the rest of stuff. Except for the Service Broker thing and some web services, those too update the database. The older I get and the more places I get exposed to, the more I can find value in a consistent, even if overkill, approach to solution delivery.
Performance isn't everything, but the SSIS team did set the ETL benchmarks using SSIS so it definitely has the capability to push some data in a hurry.
As this answer grows long, I'd simply leave it as the advantages of SSIS and the Data Flow over straight TSQL are native, out of the box
logging
error handling
configuration
parallelization
It's hard to beat those for my money.
If you are passing SSIS variables as parameters in the Parameter Mapping tab and assigning values to those variables by expression, then your Execute SQL Task spends a lot of time evaluating that expression.
Use a separate Expression Task to assign the variables instead of using an expression in the Variables tab.

Can we use single database connection to execute multiple database select statements simultaneously from multiple threads?

I want to use a single database connection from multiple threads to read (that is, to execute only select statements) in MS SQL Server simultaneously. Is it possible to execute all these select statements simultaneously from different threads?
I'm using MS SQL Server from C++ in a Linux environment. I need to create separate database connection pools for reading and writing, so I want to know whether the same connection can be shared between threads for reads only.
The select statements may return multiple rows (more than one row or result set). Will this be a problem?
Yes there will be a problem. Only one command can be executed at a time.
But you'll be fine using multiple connections, connection pooling works great for SQL server.
Don't use the same connection across threads. Only one command can be executed per connection. Create a connection for each thread. I'd suggest making a helper class to make this easier for you.
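The "one connection per thread" advice can be sketched like this. It is only an illustration: the question is about C++ and SQL Server, whereas this uses Python and a throwaway SQLite file so that it runs without a server, and all names are made up for the example. The point is the shape: each worker opens, uses and closes its own connection.

```python
import os
import sqlite3
import tempfile
import threading


def make_database():
    """Create a small throwaway database file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE items (id INTEGER)")
    conn.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(100)])
    conn.commit()
    conn.close()
    return path


def count_rows(path, results, index):
    """Worker: open a private connection, run one SELECT, close the connection."""
    conn = sqlite3.connect(path)
    try:
        results[index] = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    finally:
        conn.close()


def read_in_parallel(path, workers=4):
    """Start `workers` threads, each with its own connection, and collect the results."""
    results = [None] * workers
    threads = [
        threading.Thread(target=count_rows, args=(path, results, i))
        for i in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With a real server, a connection pool would typically hand out the per-thread connections instead of opening a new one each time.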
"I want to use single database connection from multiple threads to read (or to execute only select statements) in MS SQL Server in simultaneously"
You should start by reading the documentation, and not start with the funny idea that what you want matters here.
Yes, you CAN do that (MARS, Multiple Active Result Sets, is the topic; read up on it), but one connection can only ever have one transaction context. So it is a good approach for running multiple selects inside one transaction (an insert, a couple of upserts, etc.) but a bad generic approach for programming database connections.
Follow the standard recipe: open a connection when you need it, close it when done, and don't be afraid to run multiple connections.
If you are worried about concurrent processing in SQL Server, there are two options to consider:
Parallelism settings
AlwaysOn setup
The first helps by executing a single query on more than one logical CPU core, while the second helps balance the load when you have concurrent connections.

SSIS Parallelism - Microsoft HPC Cluster?

I am new to SSIS, and am trying to use its Parallelism Feature to import data from a database.
My job is to do this: Import a multi terabyte database into a set of flat files as quickly as possible.
I was thinking of this:
I have a Microsoft Server 2008 HPC Cluster (of 3 nodes) at my disposal. I was thinking of writing an HPC SOA job so that all three compute nodes can make independent connections to the SQL Server and import a portion of the data in parallel. Of course, this would have nothing to do with SSIS and would be an independent utility.
Then I came across SSIS and its parallel import features. My SSIS server is not very high end, only a 4GB machine. I am somewhat inclined to use SSIS because it is the standard Microsoft way of doing data import, and I won't have to rewrite a lot of stuff and can possibly use existing transformations etc.
What is the best way to use Custom Tasks (or available ones) and do this import in parallel?
Gitmo, I may misunderstand your question but will give it a shot. You need to move data from a SQL Server instance to multiple files, correct? You want to leverage the parallelised data movement functionality provided by SSIS. That means multiple simultaneously running Data Flow Tasks (DFTs). For each target file you can have only one DFT because of problems with concurrent writes.
To get multiple simultaneously running Data Flow Tasks where your source is a SQL Server database and your target is a set of files, you can possibly try the following ways (please note there are upper limits on the parallelization you can get out of SSIS based upon many factors including your CPU Core count, whether you are running in BIDS/Visual Studio or not, and various settings in your packages, your server(s), your SQL Server instance, and many other considerations):
1. The Multiple Simultaneous DFT Solution: A single SSIS Package with one Connection Manager pointed to the source SQL Server database and many Connection Managers each pointed to a separate target file, plus one DFT for each target file. The DFTs are all disconnected from one another (no precedence constraints or green/red/blue lines/arrows). If there are pre or post ETL steps that need to run, a great way to parallelize these DFTs is to drop them all in a Sequence Container that is connected to the earlier and later tasks through precedence constraints/arrows. These disconnected DFTs in their own Sequence Container will try to all run simultaneously.
2. The Multiple Simultaneous DTEXEC Solution: Multiple SSIS packages, each with their own target file-specific DFT. You manually run separate DTEXEC processes either through separate CMD windows or through the GUI. #3 below is a variation on this solution and possibly a better one.
3. The Parent Master Package Running Multiple Children Packages Solution: Wrap the per target file packages developed in #2 above in a single Parent Master package. In the Parent package have multiple simultaneously running Execute Package Tasks. Again, these Execute Package Tasks would be disconnected from other tasks. A good way to do this is to drop the multiple Execute Package Tasks in their own Sequence Container. As before, if the Execute Package Tasks are disconnected (no precedence constraints/arrows) they will all try to run simultaneously.
Take a look at this excellent article from the Microsoft SQLCAT Team for some more ideas/insight: Top 10 SQL Server Integration Services Best Practices
There are likely variations on these same ideas and possibly other solutions available both inside and outside of SSIS. Good luck!
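Whichever of the options above is used, the underlying pattern is the same: split the source into non-overlapping portions and give each portion its own writer and its own target file. A minimal sketch of that pattern, outside SSIS, is below. The split by key range is an assumption (the answer does not say how the data is divided), fetch_rows is a placeholder for the real source query, and the file naming is made up for the example.

```python
import csv
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def split_range(lo, hi, parts):
    """Split the half-open key range [lo, hi) into `parts` contiguous chunks."""
    step, extra = divmod(hi - lo, parts)
    chunks, start = [], lo
    for i in range(parts):
        end = start + step + (1 if i < extra else 0)
        chunks.append((start, end))
        start = end
    return chunks


def fetch_rows(start, end):
    """Placeholder source query; a real one would select rows whose key is in [start, end)."""
    return [(key, f"value{key}") for key in range(start, end)]


def export_chunk(out_dir, chunk):
    """One 'data flow': read one key range and write it to its own file."""
    start, end = chunk
    path = os.path.join(out_dir, f"export_{start}_{end}.csv")
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(fetch_rows(start, end))
    return path


def export_parallel(out_dir, lo, hi, parts):
    """Export every chunk concurrently; no two workers ever share a target file."""
    with ThreadPoolExecutor(max_workers=parts) as pool:
        return list(
            pool.map(lambda c: export_chunk(out_dir, c), split_range(lo, hi, parts))
        )
```

For example, export_parallel(tempfile.mkdtemp(), 0, 10, 3) writes three files covering keys 0-3, 4-6 and 7-9. Because each worker owns exactly one file, the concurrent-write problem mentioned earlier does not arise.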
Please look at this post on using multithreading outside SSIS to achieve parallelism (multithreaded serial execution) without modifying much of the package:
http://sqljunkieshare.com/2011/12/21/parallelism-in-etl-process-ssis-2008-and-ssis-2012/

Resources