Loading data from source table into multiple tables - sql-server

I have a source table called table1, which is populated every night with new data. I have to load that data into 4 relational tables: tblCus, tblstore, tblEmp, tblExp. Should I use a stored procedure or a trigger to accomplish that?
Thanks

If it is a simple load, you can use a Data Flow Task that selects from table1, assuming table1 is the source for your 4 tables.
Then you can use a Conditional Split transformation, which acts like a WHERE clause: define one output each for tblCus, tblstore, tblEmp and tblExp, and then add 4 destinations for them.
Look at my example:
[Screenshot: Conditional Split configuration]
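As a rough illustration of "acts like a where clause": each Conditional Split output corresponds to a filter over table1. In T-SQL terms the four outputs might look like the following (the RecordType column and its values are assumptions for illustration, not taken from the question):

    -- Hypothetical filters matching the four Conditional Split outputs
    SELECT * FROM dbo.table1 WHERE RecordType = 'Customer';  -- rows routed to tblCus
    SELECT * FROM dbo.table1 WHERE RecordType = 'Store';     -- rows routed to tblstore
    SELECT * FROM dbo.table1 WHERE RecordType = 'Employee';  -- rows routed to tblEmp
    SELECT * FROM dbo.table1 WHERE RecordType = 'Expense';   -- rows routed to tblExp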

In SQL Server, there is always more than one way to skin a cat. From your question, I am making the assumption that you are denormalizing 4 tables from an OLTP-style database, into a single dimension in a data warehouse style application.
If this assumption is correct, and the databases reside on the same instance, then you could use a stored procedure.
If the databases are on separate instances, or if only simple transformations are required, you could use SSIS (SQL Server Integration Services).
If the transformation is part of a larger load, then you could combine the two methods: use SSIS to orchestrate the transformations, but simply call off to stored procedures from the control flow.
The general rule I use to decide between a data flow and a stored procedure for a specific transformation is: the Data Flow is my preference, but if I would need any asynchronous transformations within the data flow, I revert to a stored procedure. This general rule usually gives the best performance profile.
I would avoid triggers, especially if there are a large number of DML operations against the 4 tables, as the trigger will fire for each modification and potentially cause performance degradation.
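If the stored procedure route fits your scenario, a minimal sketch could look like the following. The column names are assumptions, and the NOT EXISTS checks are just one way to avoid re-inserting rows on the nightly run:

    CREATE PROCEDURE dbo.usp_LoadFromTable1
    AS
    BEGIN
        SET NOCOUNT ON;
        BEGIN TRANSACTION;

        -- Customers (column names are illustrative)
        INSERT INTO dbo.tblCus (CusID, CusName)
        SELECT DISTINCT s.CusID, s.CusName
        FROM dbo.table1 AS s
        WHERE NOT EXISTS (SELECT 1 FROM dbo.tblCus AS c WHERE c.CusID = s.CusID);

        -- Stores
        INSERT INTO dbo.tblstore (StoreID, StoreName)
        SELECT DISTINCT s.StoreID, s.StoreName
        FROM dbo.table1 AS s
        WHERE NOT EXISTS (SELECT 1 FROM dbo.tblstore AS t WHERE t.StoreID = s.StoreID);

        -- ...repeat the same pattern for tblEmp and tblExp...

        COMMIT TRANSACTION;
    END;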

Related

How to get a list of updated/inserted rows into a SQL Server database after multiple stored procedure have executed?

Consider a Java application that reads and modifies data in a SQL Server database using only stored procedures.
I am interested in knowing exactly what rows were inserted/updated after execution of some code.
The code being executed could call multiple stored procedures, and in the general case these procedures work with different tables.
My current solution is to debug the low-level Java code executed before any of the stored procedures is called and inspect the parameters passed, in order to manually reconstruct the impact.
This seems ineffective and unreliable.
Is there a better approach?
To know exactly what rows were inserted/updated after execution of some code, you can implement triggers for UPDATE, DELETE and INSERT operations for the tables involved. These triggers should be almost the same for every table, changing just the name and the association with its table.
For this suggestion, these tables should have audit columns, such as one for the datetime when the rows were inserted and one for the datetime when they were last updated - at a minimum. You can look for more audit ideas if you want (and need) them, like a column recording which user triggered the insert/update, how many times the row was altered, and so on.
You should work out a different approach depending on how much data you intend to generate with these triggers.
I'm assuming you know how to do this following best practices (for example, you can [and should, IMHO] create these triggers dynamically to facilitate maintenance).
Finally, you will be able to write a query using the sys tables that contain information about tables and rows, returning only the rows involved, ordered by these new columns (just an idea that I hope fits your particular case).
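As a rough sketch of the audit-column-plus-trigger idea (the table, key and column names here are hypothetical, and this assumes the default setting where recursive triggers are off):

    -- Hypothetical audit columns on an existing table
    ALTER TABLE dbo.Orders ADD
        InsertedAt DATETIME2 NOT NULL CONSTRAINT DF_Orders_InsertedAt DEFAULT SYSDATETIME(),
        UpdatedAt  DATETIME2 NULL;
    GO

    -- Stamp rows whenever they are updated
    CREATE TRIGGER dbo.trg_Orders_Update
    ON dbo.Orders
    AFTER UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;
        UPDATE o
        SET    UpdatedAt = SYSDATETIME()
        FROM   dbo.Orders AS o
        INNER JOIN inserted AS i ON i.OrderID = o.OrderID;
    END;
    GO

    -- After the Java code runs, the touched rows can then be read back with:
    -- SELECT * FROM dbo.Orders WHERE InsertedAt >= @start OR UpdatedAt >= @start;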

How to resolve a deadlock caused by inserting into and updating the same table

I have an SSIS package with a task to load data. For some reason I need to both update and insert into the same destination table, and this causes a deadlock.
I use the SSIS Multicast component.
What should I do? How can I resolve this situation?
In your OLE DB Destination, change the access mode from "FastLoad" to "Table or View". The former takes a table lock, which is generally better for large inserts, but in your scenario you need the table to remain "unlocked." Your performance will suffer since you'll be issuing singleton inserts, but I guess that doesn't really matter since you'll also be doing singleton updates with your "OLE DB Command".
Finally, I think you're doing this wrong. The multicast essentially duplicates a row so that you can direct it to N components. I generally see people trying to detect whether a row exists in the target and then either insert or update it based on that lookup. But that's the lookup component, not a multicast. Maybe you're doing a type 2 dimension or something but even then, there will be better ways to accomplish this versus what you're showing in the picture.
Your way seems strange. As billinkc said, you are effectively duplicating the data rows and performing INSERT and UPDATE actions against the same table concurrently from two different connections/contexts. This is bound to end in a deadlock.
I would use an alternative approach: do the required transforms on the data, and then write it to an intermediate table in the Data Flow. Then, in the next SSIS task, execute a SQL Server MERGE (Microsoft's table upsert) with an OLE DB Command. This ensures you do not have a deadlock between concurrent operations, and the logic of the MERGE can be quite flexible.
Last but not least, use a dedicated staging table or a global ##temp table as the intermediate table; working with regular SQL Server #temp tables in SSIS is a little tricky. Do not forget to clean up the intermediate table before and after the MERGE, or to create and drop the ##temp table properly.
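A minimal sketch of the staging-plus-MERGE step, run in the SSIS task that follows the Data Flow (table and column names are hypothetical):

    -- Upsert from the staging table into the destination in one statement,
    -- avoiding the concurrent INSERT/UPDATE paths that caused the deadlock.
    MERGE dbo.DimCustomer AS tgt
    USING dbo.StageCustomer AS src
        ON tgt.CustomerID = src.CustomerID
    WHEN MATCHED THEN
        UPDATE SET tgt.CustomerName = src.CustomerName,
                   tgt.City         = src.City
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CustomerID, CustomerName, City)
        VALUES (src.CustomerID, src.CustomerName, src.City);

    -- Clean up the intermediate table for the next run
    TRUNCATE TABLE dbo.StageCustomer;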

How to wrap or observe using SQL CLR types

I am aware of DB triggers and am not seeking that option. I am wondering if it's possible to observe DB tables with a reader from the middleware.
Question: is it possible to get an observable wrapper over a database table using the SQL CLR types in C#? For example, I have a table of tickets; how do I watch that table?
No, SQLCLR does not provide any special means of doing this.
The best, most appropriate, and only way to accomplish getting notifications of table modifications is through Triggers. That is what they are meant to do. You can capture data changes via DML Triggers (for INSERT, UPDATE, and DELETE operations), and you can capture structural changes via DDL Triggers (for ALTER TABLE, CREATE / ALTER / DROP TRIGGER, and CREATE / DROP INDEX operations).
You can create either type of Trigger using either pure T-SQL or using SQLCLR, though the SQLCLR option doesn't afford much benefit over T-SQL besides being able to access the inserted and deleted pseudo-tables via Dynamic SQL. The other reason to use a SQLCLR Trigger would be if you just need all rows of data from one or both of those pseudo-tables for a single operation. Else you could just call a SQLCLR User-Defined Function in a T-SQL Trigger if you needed to handle something on a per-row basis.
Remember, Triggers are part of the Transaction that is internally created (if no Transaction is currently active) when the DML operation starts. This way any changes made by the Trigger can be rolled-back if the DML operation ultimately fails.
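For example, a plain T-SQL DML trigger that records every modification to a hypothetical dbo.Tickets table might look like this (the names are assumptions, since the original question did not show a schema):

    -- Hypothetical audit table
    CREATE TABLE dbo.TicketAudit
    (
        AuditID   INT IDENTITY(1, 1) PRIMARY KEY,
        TicketID  INT       NOT NULL,
        Operation CHAR(1)   NOT NULL,   -- 'I', 'U', or 'D'
        AuditedAt DATETIME2 NOT NULL DEFAULT SYSDATETIME()
    );
    GO

    CREATE TRIGGER dbo.trg_Tickets_Audit
    ON dbo.Tickets
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Rows only in inserted = INSERT, in both = UPDATE, only in deleted = DELETE
        INSERT INTO dbo.TicketAudit (TicketID, Operation)
        SELECT COALESCE(i.TicketID, d.TicketID),
               CASE WHEN i.TicketID IS NOT NULL AND d.TicketID IS NOT NULL THEN 'U'
                    WHEN i.TicketID IS NOT NULL THEN 'I'
                    ELSE 'D'
               END
        FROM inserted AS i
        FULL OUTER JOIN deleted AS d ON d.TicketID = i.TicketID;
    END;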
If you want very light-weight notifications, you can do one of the following, but keep in mind that both will side-step the Transaction (i.e. cannot take back notification of an operation that fails to complete) and so can easily result in false-positives (i.e. notifications of modifications that never committed):
Send emails via sp_send_dbmail. This is asynchronous, so it should not adversely impact performance. But you do need to format the full data-modification report as a string (perhaps as HTML?) inside the trigger rather than attaching the results of a query, since the query run for the email won't have access to the pseudo-tables. (A rough sketch follows after these two options.)
Use SQLCLR to dump desired info to a text file. You just need to be careful to allow for multiple, concurrent write-requests to the file, else concurrent DML statements will be negatively impacted.
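A sketch of the email option, as it might appear inside a trigger body (the mail profile, recipient and column names are assumptions; STRING_AGG requires SQL Server 2017 or later):

    -- Inside an AFTER INSERT/UPDATE trigger on a hypothetical dbo.Tickets table:
    DECLARE @body NVARCHAR(MAX) =
        (SELECT STRING_AGG(CONCAT('TicketID=', TicketID, ', Status=', Status), CHAR(10))
         FROM inserted);

    EXEC msdb.dbo.sp_send_dbmail
        @profile_name = 'DefaultMailProfile',   -- hypothetical Database Mail profile
        @recipients   = 'dba@example.com',
        @subject      = 'Tickets modified',
        @body         = @body;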
A quick note about Query Notifications since the question was tagged with SqlDependency:
Query Notifications are not really helpful here. In addition to a list of restrictions on what queries are eligible for Query Notifications, they only indicate that the result set of a particular query has changed. So even if you set up simple notifications for SELECT * FROM table;, it won't tell you what changed specifically (i.e. the rows in the INSERTED and/or DELETED pseudo-tables available in Triggers). Still, if you are interested in reading up on them, here are two helpful links:
Working with Query Notifications
Query Notifications in SQL Server

Is it possible to use SSIS in order to fill multiple tables with inheritance at the same time?

I've got an MS SQL database with 3 tables that model inheritance: a general one called "Tools" and two more specific ones, "DieCastTool" and "Deburrer".
My task is to get data out of an old MS Access "database". My problem is that I have to do a lot of searching and filtering until I have the data that I'd like to import into the new DB, so I don't want to repeat these steps several times, but rather populate the 3 tables at the same time.
Therefore I am using a data flow destination in which I am not selecting a certain table, but instead using a SQL SELECT statement (with inner joins on the id columns) to retrieve all the fields of the 3 tables. Then I map the columns, and in my opinion it should work. It does, as long as I only select the columns of the "Tools" table to be filled. When I add the columns of the child tables, unfortunately it does not, and I get error code -1071607685 => "No status is available".
I can think of 2 reasons why my solution does not work:
SSIS simply can't handle inheritance in SQL tables and sees them as individual tables. (Maybe SSIS can't even handle filling multiple tables from one data flow destination?)
I am using SSIS in a wrong way.
It would be nice if someone could confirm or refute reason 1, because I have not found anything on this topic.
Yes, SSIS is not aware of table relationships. Think of it as: SSIS is aware of physical objects only, not your logical model.
I don't understand how you got that error. Anyway, here are some solutions:
1. If there are no FK constraints between those tables, you can use one source component, one Multicast, and 3 destinations in one data flow.
2. If there are FK constraints and you can disable them, use an Execute SQL Task to disable (or drop) the constraints, add the same data flow as in 1, and add another Execute SQL Task to enable (or create) the constraints afterwards (see the sketch after this list).
3. If there are FK constraints and you can't disable them, you can use one data flow to read all the data and pass it to subsequent data flows to fill the tables one by one. See SSIS Pass Datasource Between Control Flow Tasks.
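For option 2, the Execute SQL Tasks before and after the data flow might run something like the following (the constraint names are assumptions based on the tables in the question):

    -- Before the data flow: disable the child tables' FK constraints
    ALTER TABLE dbo.DieCastTool NOCHECK CONSTRAINT FK_DieCastTool_Tools;
    ALTER TABLE dbo.Deburrer    NOCHECK CONSTRAINT FK_Deburrer_Tools;

    -- ...the data flow fills Tools, DieCastTool and Deburrer...

    -- After the data flow: re-enable them; WITH CHECK re-validates the loaded rows
    ALTER TABLE dbo.DieCastTool WITH CHECK CHECK CONSTRAINT FK_DieCastTool_Tools;
    ALTER TABLE dbo.Deburrer    WITH CHECK CHECK CONSTRAINT FK_Deburrer_Tools;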

What is a good approach to preloading data?

Are there best practices out there for loading data into a database, to be used with a new installation of an application? For example, for application foo to run, it needs some basic data before it can even be started. I've used a couple options in the past:
T-SQL for every row that needs to be preloaded:

    IF NOT EXISTS (SELECT * FROM [Master].[Site] WHERE [Name] = @SiteName)
        INSERT INTO [Master].[Site] ([EnterpriseID], [Name], [LastModifiedTime], [LastModifiedUser])
        VALUES (@EnterpriseId, @SiteName, GETDATE(), @LastModifiedUser);
Another option is a spreadsheet. Each tab represents a table, and data is entered into the spreadsheet as we realize we need it. Then, a program can read this spreadsheet and populate the DB.
There are complicating factors, including the relationships between tables, so it's not as simple as loading each table by itself. For example, if we create Security.Member rows and then want to add those members to Security.Role, we need a way of maintaining that relationship.
Another factor is that not all databases will be missing this data. Some locations will already have most of the data, and others (which may be new locations around the world) will start from scratch.
Any ideas are appreciated.
If it's not a lot of data - just the bare initialization of configuration data - we typically script it along with any database creation/modification.
With scripts you have a lot of control, so you can insert only missing rows, remove rows which are known to be obsolete, not override certain columns which have been customized, etc.
If it's a lot of data, then you probably want an external file (or files) - I would avoid a spreadsheet and use plain text files instead (BULK INSERT). You could load these into a staging area and still use the same techniques you might use in a script to ensure you don't clobber any special customization in the destination. And because it's under script control, you control the order of operations to ensure referential integrity.
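A small sketch of that approach, reusing the Master.Site example from the question (the file path, staging table and field terminators are assumptions):

    -- Load the external text file into a staging table
    BULK INSERT Staging.Site
    FROM 'C:\seed\site.txt'
    WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n', FIRSTROW = 2);

    -- Insert only the rows that are missing, leaving existing/customized rows alone
    INSERT INTO [Master].[Site] ([EnterpriseID], [Name], [LastModifiedTime], [LastModifiedUser])
    SELECT s.[EnterpriseID], s.[Name], GETDATE(), 'seed-script'
    FROM Staging.Site AS s
    WHERE NOT EXISTS (SELECT 1 FROM [Master].[Site] AS m WHERE m.[Name] = s.[Name]);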
I'd recommend a combination of the 2 approaches indicated by Cade's answer.
Step 1. Load all the needed data into temp tables (on Sybase, for example, load data for table "db1..table1" into "temp..db1_table1"). In order to handle large datasets, use a bulk copy mechanism (whichever one your DB server supports) that does not write to the transaction log.
Step 2. Run a script whose main step iterates over each table to be loaded: if needed, create indexes on the newly created temp table, compare the data in the temp table to the main table, and insert/update/delete the differences. Then, as needed, the script can do auxiliary tasks like the security role setup you mentioned.
