SQL Server Delta Records Pulling

In our SQL Server database we have about 800+ tables, of which 40-50 are business-critical. The MIS team needs to generate reports based on those 50 business tables.
Those 50 tables get updated frequently, and the MIS team requires the delta records (updated/inserted/deleted).
What would be the best solution?
We have a few approaches here:
1. Always On
2. Replication
3. Mirroring
4. Introducing a new column (LastModifiedDate, with an index) in those 50 tables, pulling those records periodically and populating them into the MIS environment.
The LastModifiedDate approach would mean a huge code change: based on those 50 tables we have a large number of stored procedures containing INSERT/UPDATE statements, and each of them would need a code change for LastModifiedDate.
What would be the best solution from the above approaches?
Please let us know if there is any other approach. Note: we are using SQL Server 2008 R2.
Regards, Karthik

One approach is to have insert, update and delete triggers on these tables, and for each table an archive table with exactly the same columns plus, e.g., username, modified datetime, and a bit to indicate new vs. old. The triggers then simply insert into the archive table, selecting from inserted/deleted together with the current user, the current time, and 1 for inserted or 0 for deleted.
Then all your MIS team needs to concern themselves with is the archive tables, and you will not need to make a structural change to the existing tables.
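
A minimal sketch of that pattern, assuming a hypothetical business table dbo.Orders(OrderID, Amount); the real tables and columns will differ:

CREATE TABLE dbo.Orders_Archive
(
    OrderID      INT           NOT NULL,
    Amount       DECIMAL(18,2) NULL,
    ModifiedBy   SYSNAME       NOT NULL,
    ModifiedDate DATETIME      NOT NULL,
    IsNew        BIT           NOT NULL   -- 1 = row came from inserted, 0 = row came from deleted
);
GO
CREATE TRIGGER dbo.trg_Orders_Audit
ON dbo.Orders
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- New/changed versions of the rows
    INSERT INTO dbo.Orders_Archive (OrderID, Amount, ModifiedBy, ModifiedDate, IsNew)
    SELECT OrderID, Amount, SUSER_SNAME(), GETDATE(), 1 FROM inserted;
    -- Old/removed versions of the rows
    INSERT INTO dbo.Orders_Archive (OrderID, Amount, ModifiedBy, ModifiedDate, IsNew)
    SELECT OrderID, Amount, SUSER_SNAME(), GETDATE(), 0 FROM deleted;
END;

An UPDATE fires both inserts, so the archive holds the before and after images; the MIS extract can then simply read Orders_Archive filtered by ModifiedDate.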

Related

Is it possible to create a trigger to move data between databases in PostgreSQL?

I will try to simplify my problem:
Let's say that I have 2 databases, call them DBA and DBB.
I have this table on DBA:
shopping
id - name - amount
and on DBB I have this other table:
shopping_hist
id - name - amount
At the end of every month I generate a dump of the shopping table on DBA and copy its data into shopping_hist on DBB. Is it possible to create a trigger so that every insert on shopping also performs an insert on shopping_hist, even though they are not in the same database?
I know that if they were in the same database, even if not in the same schema, it would be possible, but I'm not finding anything to automate this across distinct databases.
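
One way this is commonly handled is with the dblink extension (or postgres_fdw), which lets a trigger function in DBA write to DBB over a regular connection. A minimal sketch under that assumption; the connection string is a placeholder and the table/column names follow the question:

-- Run in DBA
CREATE EXTENSION IF NOT EXISTS dblink;

CREATE OR REPLACE FUNCTION copy_shopping_to_hist() RETURNS trigger AS $$
BEGIN
    -- Push the newly inserted row into shopping_hist on DBB
    PERFORM dblink_exec(
        'dbname=DBB host=localhost user=someuser password=secret',  -- assumed connection info
        format('INSERT INTO shopping_hist (id, name, amount) VALUES (%L, %L, %L)',
               NEW.id, NEW.name, NEW.amount));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER shopping_to_hist
AFTER INSERT ON shopping
FOR EACH ROW EXECUTE PROCEDURE copy_shopping_to_hist();

Note that the remote insert is not part of the local transaction; if that matters, postgres_fdw with a foreign table for shopping_hist is the more robust route.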

How to use the pre-copy script from the copy activity to remove records in the sink based on the change tracking table from the source?

I am trying to use change tracking to copy data incrementally from a SQL Server to an Azure SQL Database. I followed the tutorial in the Microsoft Azure documentation, but I ran into some problems when implementing this for a large number of tables.
In the source part of the copy activity I can use a query that gives me a change table of all the records that were updated, inserted or deleted since the last change tracking version. That table looks something like this:
PersonID  Age   Name   SYS_CHANGE_OPERATION
--------  ----  -----  --------------------
1         12    John   U
2         15    James  U
3         NULL  NULL   D
4         25    Jane   I
with PersonID being the primary key for this table.
The problem is that the copy activity can only append data to the Azure SQL Database, so when a record gets updated it fails with a duplicate primary key error. I can deal with this by letting the copy activity use a stored procedure that merges the data into the table on the Azure SQL Database, but the problem is that I have a large number of tables.
I would like the pre-copy script to delete the deleted and updated records on the Azure SQL Database, but I can't figure out how to do this. Do I need to create separate stored procedures and corresponding table types for each table that I want to copy, or is there a way for the pre-copy script to delete records based on the change tracking table?
You have to use a LookUp activity before the Copy Activity. With that LookUp activity you can query the database so that you get the deleted and updated PersonIDs, preferably all in one field, separated by commas (so it's easier to use in the pre-copy script). More information here: https://learn.microsoft.com/en-us/azure/data-factory/control-flow-lookup-activity
Then you can do the following in your pre-copy script:
delete from TableName where PersonID in (#{activity('MyLookUp').output.firstRow.PersonIDs})
This way you will be deleting all the deleted or updated rows before inserting the new ones.
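
The LookUp query itself can build that comma-separated field directly from the change tracking function. A hedged sketch, assuming the source table is dbo.Person with change tracking enabled and that you store the previously synced version somewhere:

DECLARE @lastVersion BIGINT = 0;  -- replace with the previously synced change tracking version

SELECT STUFF((
        SELECT ',' + CAST(CT.PersonID AS VARCHAR(20))
        FROM CHANGETABLE(CHANGES dbo.Person, @lastVersion) AS CT
        WHERE CT.SYS_CHANGE_OPERATION IN ('U', 'D')   -- updated or deleted rows
        FOR XML PATH('')
    ), 1, 1, '') AS PersonIDs;

FOR XML PATH('') concatenation works on older SQL Server versions; on SQL Server 2017+ STRING_AGG is the simpler alternative.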
Hope this helped!
In the meantime, Azure Data Factory provides a metadata-driven copy task. After going through the dialogue-driven setup, a metadata table is created which has one row for each dataset to be synchronized. I solved this UPSERT problem by adding a stored procedure and a table type for each dataset to be synchronized, and then added the relevant information to the metadata table for each row, like this:
{
    "preCopyScript": null,
    "tableOption": "autoCreate",
    "storedProcedure": "schemaname.UPSERT_SHOP_SP",
    "tableType": "schemaname.TABLE_TYPE_SHOP",
    "tableTypeParameterName": "shops"
}
After that you need to adapt the sink properties of the copy task like this (stored procedure, table type, table type parameter name):
#json(item().CopySinkSettings).storedProcedure
#json(item().CopySinkSettings).tableType
#json(item().CopySinkSettings).tableTypeParameterName
If the destination table does not exist, you need to run the whole task once before adding the above variables, because auto-creation of tables only works as long as no stored procedure is given in the sink properties.
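
For reference, a hedged sketch of what such a per-dataset table type and UPSERT stored procedure could look like; the type and procedure names come from the metadata example above, while the target table dbo.Shop and its columns are illustrative assumptions:

CREATE TYPE schemaname.TABLE_TYPE_SHOP AS TABLE
(
    ShopID   INT           NOT NULL,
    ShopName NVARCHAR(100) NULL
);
GO
CREATE PROCEDURE schemaname.UPSERT_SHOP_SP
    @shops schemaname.TABLE_TYPE_SHOP READONLY   -- parameter name matches tableTypeParameterName
AS
BEGIN
    SET NOCOUNT ON;
    MERGE dbo.Shop AS target
    USING @shops AS source
        ON target.ShopID = source.ShopID
    WHEN MATCHED THEN
        UPDATE SET target.ShopName = source.ShopName
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (ShopID, ShopName) VALUES (source.ShopID, source.ShopName);
END;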

SSIS Move Data Between Databases - Maintain Referential Integrity

I need to move data between two databases and wanted to see if SSIS would be a good tool. I've pieced together the following solution, but it is much more complex than I was hoping it would be - any insight into a better approach to tackling this problem would be greatly appreciated!
So, what makes my situation unique: we have a large volume of data, so to keep the system performant we have split our customers across multiple database servers. These servers have databases with the same schema, but each is populated with unique data. Occasionally we need to move a customer's data from one server to another. Because of this, simply recreating the tables and moving the data in place won't work: the same table might hold 20 records in the database on server A but 30 records in the database on server B, so when moving record 20 from A to B it will need to be assigned ID 31. Getting past this wasn't difficult, but the trouble comes when moving the tables which have a foreign key reference to what is now record 31...
An example:
Here's a sample schema for a simple example:
There is a table to track manufacturers, and a table to track products which each reference a manufacturer.
Example of data in the source database:
To handle moving this data while maintaining relational integrity, I've taken the approach of gathering the manufacturer records, looping through them, and for each manufacturer moving the associated products. Here's a high level look at the Control Flow in SSDT:
The first Data Flow grabs the records from the source database and pulls them into a Recordset Destination:
The OLE DB Source pulls all columns from the source database's Manufacturer table and places them into a recordset:
Back in the control flow, I then loop through the records in the Manufacturer recordset:
For each record in the Manufacturer recordset I then execute a SQL task which determines the next available auto-incrementing ID in the destination database, inserts the record, and then returns the result of a SELECT MAX(ManufacturerID) in the Execute SQL Task result set, so that the newly created ManufacturerID can be used when inserting the related products into the destination database:
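
A hedged sketch of what that Execute SQL Task's statement can look like (the ? parameter marker and column names are illustrative):

-- Insert the manufacturer in the destination, then return the new identity value
INSERT INTO dbo.Manufacturer (ManufacturerName)
VALUES (?);   -- ? is mapped from the current recordset row in the SSIS parameter mapping
SELECT MAX(ManufacturerID) AS NewManufacturerID FROM dbo.Manufacturer;

(Under concurrent inserts, returning SCOPE_IDENTITY() immediately after the INSERT is safer than SELECT MAX.)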
The above works; however, once you get more than a few layers deep into tables that reference one another, this is no longer very tenable. Is there a better way to do this?
You could always try this:
Populate your manufacturers table.
Get your products data (ensure you have a reference, such as the name, back to the manufacturer).
Use a lookup to get the ID where the name (or whatever you choose) matches.
Insert into the destination database.
This will keep your FK constraints intact and not require all of that max-key selection; a T-SQL equivalent of the lookup step is sketched below.
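
A hedged T-SQL illustration of that lookup step, assuming the source tables are reachable from the destination (for example via a linked server or a staged copy; three-part names are used here for brevity) and using illustrative column names Name and ProductName:

-- Insert products into the destination, resolving the new ManufacturerID by name
INSERT INTO DestDB.dbo.Product (ManufacturerID, ProductName)
SELECT dm.ManufacturerID, sp.ProductName
FROM SourceDB.dbo.Product AS sp
JOIN SourceDB.dbo.Manufacturer AS sm
    ON sm.ManufacturerID = sp.ManufacturerID
JOIN DestDB.dbo.Manufacturer AS dm
    ON dm.Name = sm.Name;   -- the "lookup": match on the natural key rather than the identity

In SSIS this is a Lookup transformation against the destination Manufacturer table keyed on Name, feeding the OLE DB Destination for Product.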

SQL Server Alternative to reseeding identity column

I am currently working on a phone directory application. For this application I get a flat file (CSV) from corporate SAP, updated daily, which I use to update a SQL database twice a day via a Windows service. Additionally, users can add themselves to the database if they do not exist (i.e. are not included in the SAP file). Thus, a contact can be one of 2 types: 'SAP' or 'ECOM'.
So, the Windows service downloads the file from an SAP FTP, deletes all existing contacts of type 'SAP' from the database, and then adds all the contacts in the file to the database. To insert the contacts into the database (some 30k), I load them into a DataTable and then make use of SqlBulkCopy. This works particularly well, running in only a few seconds.
The only problem is that the primary key for this table is an auto-incremented identity. This means that my contact IDs grow at a rate of 60k per day. I'm still in development and my IDs are already in the area of 20 million:
http://localhost/CityPhone/Contact/Details/21026374
I started looking into reseeding the ID column, but if I were to reseed the identity to the current highest number in the database, the following scenario would pose issues:
Windows service loads 30,000 contacts
User creates an entry for himself (ID = 30,001)
Windows service deletes all SAP contacts and reseeds the column to just after the current highest ID: 30,002
Also, I frequently query for users based on this ID, so I'm concerned that using something like a GUID instead of an auto-incremented integer would cost too much in performance. I also tried looking into SqlBulkCopyOptions.KeepIdentity, but this won't work: I don't get any IDs from SAP in the file, and if I did they could easily conflict with the values of manually entered contacts. Is there any other solution to reseeding the column that would not cause the ID values to grow at such a rate?
I suggest the following workflow:
Import into a brand new table, e.g. tempSAPImport, with your current workflow.
Add only the changed rows to your main table:
INSERT INTO ContactDetails (Detail1, Detail2)   -- list the non-identity columns explicitly
SELECT Detail1, Detail2
FROM tempSAPImport
EXCEPT
SELECT Detail1, Detail2
FROM ContactDetails;
Your SAP data presumably has a primary key, so you can also use it to detect which rows changed and update only those:
UPDATE ContactDetails ... (XXX: your update criteria)
This way you will import your data fast and keep your existing identity values. Depending on your speed requirements, adding the indexes after the import can speed up the process.
If your SQL Server version is 2012 or later, then I think the best solution for the scenario above would be to use a sequence for the PK values. That way you have control over the seeding process (you can cycle values).
More details here: http://msdn.microsoft.com/en-us/library/ff878091(v=sql.110).aspx
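
A minimal sketch of that idea (SQL Server 2012+); the sequence, table and column names are illustrative, and CYCLE makes the values wrap around instead of growing forever:

CREATE SEQUENCE dbo.ContactIdSeq
    AS INT
    START WITH 1
    INCREMENT BY 1
    MINVALUE 1
    MAXVALUE 2147483647
    CYCLE;
GO
CREATE TABLE dbo.ContactDetails
(
    ContactID   INT NOT NULL
        CONSTRAINT DF_ContactDetails_ContactID DEFAULT (NEXT VALUE FOR dbo.ContactIdSeq)
        CONSTRAINT PK_ContactDetails PRIMARY KEY,
    ContactName NVARCHAR(200) NOT NULL,
    ContactType VARCHAR(10)   NOT NULL   -- 'SAP' or 'ECOM'
);

Unlike an identity column, the sequence can also be restarted explicitly with ALTER SEQUENCE dbo.ContactIdSeq RESTART when needed.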

Split a large table to some tables and cross table query

I have a SQL Server 2008 R2 Enterprise Edition instance that holds a single database, which itself contains almost nothing except one large table (a blog table).
The table is 100+ million rows (35 columns) and growing at around 200,000 rows per day. We need all the data to be "online", and most of the columns need to be searchable in some fashion.
I would like to split the table into smaller tables by month. Example:
table1: 1/1/2013 - 1/31/2013
table2: 2/1/2013 - 2/28/2013
table3: 3/1/2013 - 3/31/2013
table4: 4/1/2013 - 4/30/2013
.....
table12: 12/1/2013 - 12/31/2013
Assume that a user posts blog entries in month 2 (1 entry), month 4 (2 entries), month 10 (5 entries), month 11 (no entries) and month 12 (no entries).
There is a requirement:
Assume the current date is 12/20/2013. In order to get the 10 most recent blog entries for this user, I have to UNION all 12 tables.
I think this design is inefficient. Is that so? How should it be designed? Thanks!
You might want to have a look at using Partitioned Table and Index Concepts instead.
Partitioning makes large tables or indexes more manageable, because
partitioning enables you to manage and access subsets of data quickly
and efficiently, while maintaining the integrity of a data collection.
SQL Server has this functionality built in for you, so don't try to manage this on your own.
Also look at Designing Partitioned Tables and Indexes
Please take note that
Partitioned tables and indexes are available only on the Enterprise,
Developer, and Evaluation editions of SQL Server.
Since you are on Enterprise Edition, that would be fine in your case.
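
A hedged sketch of monthly partitioning for 2013, keeping everything in one logical table; the table and column names are illustrative (the real table has 35 columns), and the partitions are mapped to a single filegroup for simplicity:

CREATE PARTITION FUNCTION pfBlogMonthly (DATETIME)
AS RANGE RIGHT FOR VALUES
('2013-02-01', '2013-03-01', '2013-04-01', '2013-05-01', '2013-06-01', '2013-07-01',
 '2013-08-01', '2013-09-01', '2013-10-01', '2013-11-01', '2013-12-01');
GO
CREATE PARTITION SCHEME psBlogMonthly
AS PARTITION pfBlogMonthly ALL TO ([PRIMARY]);
GO
CREATE TABLE dbo.BlogEntry
(
    BlogEntryID BIGINT IDENTITY(1,1) NOT NULL,
    UserID      INT           NOT NULL,
    PostedDate  DATETIME      NOT NULL,
    Title       NVARCHAR(200) NOT NULL,
    CONSTRAINT PK_BlogEntry PRIMARY KEY CLUSTERED (PostedDate, BlogEntryID)
) ON psBlogMonthly (PostedDate);
GO
-- The "10 most recent entries for a user" stays a single query; no UNION over 12 tables is needed.
DECLARE @UserID INT = 42;   -- illustrative user
SELECT TOP (10) BlogEntryID, Title, PostedDate
FROM dbo.BlogEntry
WHERE UserID = @UserID
ORDER BY PostedDate DESC;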
