How to perform Quality Control in SQL Server 2008 R2 - sql-server

I am very new to SQL Server. I am trying to determine the best way to conduct quality control of data stored within a SQL Server 2008 R2 Standard Edition database.
The types of QC tests to be conducted include data integrity, referential integrity, and business logic checks. The output needs to be a table where each record represents a dataset tested and each column represents a test conducted. Depending on the test, the value in each column should either be a number representing how many records in the dataset failed, or a list of IDs of the records that failed.
I'm not sure where to begin... Can this be done using simple SQL queries or should this be done using Reporting Services or some other tool provided with SQL Server?

Start by building your queries in SSMS.
Once you have stable queries, you can then go to SSRS if you want to enhance the presentation and delivery of the data, to SSIS if you want automation and the flexibility to output to many different systems, or to a simple SQL Agent job if you just want to copy data to a different table.
SSRS is aimed at read-only access with nice graphical presentation and delivery formats.
SSIS is aimed at flexible data integration tasks, but not much UI.
SSMS is the general-purpose SQL Server authoring tool. Both SSRS and SSIS can use the SQL you write in SSMS.
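For example, a referential integrity check built as a plain query in SSMS might look like this (the Orders/OrderDetail tables, their columns, and the QCResults output table are hypothetical, just to illustrate the shape of the result the question describes):

-- Count of detail rows whose parent order is missing (hypothetical tables/columns)
INSERT INTO dbo.QCResults (DatasetName, TestName, FailedRecords)
SELECT 'OrderDetail' AS DatasetName,
       'Orphaned_OrderID' AS TestName,
       COUNT(*) AS FailedRecords
FROM dbo.OrderDetail d
WHERE NOT EXISTS (SELECT 1 FROM dbo.Orders o WHERE o.OrderID = d.OrderID);

A set of queries like this, one per test, can populate the results table; SSRS or SSIS then only needs to present or move that table.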
(I think this answers your question; Is this what you were looking for?)

Related

Compare millions of records from Oracle to SQL Server

I have an Oracle database and a SQL Server database. There is one table, say Inventory, which contains millions of rows in both databases, and it keeps growing.
I want to compare the Oracle table data with the SQL Server data to find out which records are missing in the SQL Server table on a daily basis.
Which is the best approach for this?
Create an SSIS package.
Create a Windows service.
I want to achieve this with as little time and as few resources as possible.
E.g. 18 million records in Oracle and 16/17 million in SQL Server.
This situation of two different databases arises because there are two different applications, one online and one offline.
EDIT: How about connecting to SQL Server from Oracle through Oracle Database Gateway for SQL Server, in order to:
1) Query SQL Server directly from Oracle to insert the missing records into SQL Server the first time.
2) Create a trigger on Oracle which executes when a record is deleted from Oracle and inserts the deleted record into a new Oracle table.
3) Create an SSIS package to map the newly created Oracle table to SQL Server and update the SQL Server records. This way only a few records have to be processed daily through SSIS.
What do you think of this approach ?
I would create an SSIS package and load the data from the Oracle table using a Data Flow with an OLE DB Source. If you have SQL Server Enterprise, the Attunity connectors are a bit faster.
Then I would load the keys from the SQL Server table into a Lookup transformation, where I would match the two sources on the key and direct unmatched rows into a separate output.
Finally I would direct the unmatched-rows output to an OLE DB Command to update the SQL Server table.
This SSIS package will require a lot of memory, but as the matching is done in memory with minimal IO, it will probably outperform other solutions for speed. It will need enough free memory to cache all the keys from the SQL Server Table.
SSIS also has the advantage that it has lots of other transformation functions available if you need them later.
What you basically want to do is replication from Oracle to SQL Server.
You could do this with SSIS, a Windows service, or indeed a multitude of platforms.
The real trick is using the correct design pattern.
There are two general design patterns:
Snapshot Replication
You take all records from both systems and compare them somewhere (so far we have suggestions to compare in SSIS or on Oracle, but not yet a suggestion to compare on SQL Server, although that is also valid).
You are comparing 18 million records here, so this is a lot of work.
Differential Replication
You record the changes in the publisher (i.e. Oracle) since the last replication, then apply those changes to the subscriber (i.e. SQL Server).
You can do this manually by implementing triggers and log tables on the Oracle side, then use a regular ETL process (SSIS, command line tools, text files, whatever), probably scheduled in SQL Agent to apply these to the SQL Server.
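A minimal sketch of the trigger-and-log-table side of that manual approach, assuming an Inventory table with a numeric key (all table, column, and trigger names are hypothetical):

-- Change log populated by the trigger; the nightly ETL only has to read this delta
CREATE TABLE inventory_change_log (
  inventory_id NUMBER      NOT NULL,
  change_type  VARCHAR2(1) NOT NULL,  -- 'I' = insert, 'D' = delete
  changed_at   TIMESTAMP   DEFAULT SYSTIMESTAMP
);

CREATE OR REPLACE TRIGGER trg_inventory_change
AFTER INSERT OR DELETE ON inventory
FOR EACH ROW
BEGIN
  IF INSERTING THEN
    INSERT INTO inventory_change_log (inventory_id, change_type) VALUES (:NEW.inventory_id, 'I');
  ELSE
    INSERT INTO inventory_change_log (inventory_id, change_type) VALUES (:OLD.inventory_id, 'D');
  END IF;
END;
/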
Or you could do this by using the out of the box replication capability to set up Oracle as a publisher and SQL as a subscriber: https://msdn.microsoft.com/en-us/library/ms151149(v=sql.105).aspx
You're going to have to try a few of these and see what works for you.
Given this objective:
I want to achieve this with as little time and as few resources as possible
transactional replication is far more efficient, but more complicated. For maintenance purposes, which platforms (.NET, SSIS, Python, etc.) are you most comfortable with?
Other alternatives:
If you can use Oracle Database Gateway for SQL Server, then you do not need to transfer data and can run the query directly.
If you can't use the Oracle gateway, you can use Pentaho Data Integration or another ETL tool to compare the tables and get the results. It is easy to use.
I think the best approach is using Oracle Gateway. Just follow these steps; I have had a similar kind of experience.
Install and Configure Oracle Database Gateway for SQL Server.
https://docs.oracle.com/cd/B28359_01/gateways.111/b31042/installsql.htm
Now you can create a database link from Oracle to SQL Server.
Create a procedure which finds the missing records in the Oracle database and inserts them into the SQL Server database.
For example, you can use this statement inside your procedure.
INSERT INTO "dbo"."sql_server_table"#dblink_name("column1","column2"...."column5")
VALUES
(
select column1,column2....column5 from oracle_table
minus
select "column1","column2"...."column5" from "dbo"."sql_server_table"#dblink_name
)
Create a scheduler job which executes the procedure daily.
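For example, a DBMS_SCHEDULER job along these lines would run the procedure every night (the job name and the procedure name are hypothetical):

BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'SYNC_MISSING_RECORDS_JOB',
    job_type        => 'STORED_PROCEDURE',
    job_action      => 'SYNC_MISSING_RECORDS',  -- the procedure from the previous step (hypothetical name)
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY;BYHOUR=2;BYMINUTE=0',
    enabled         => TRUE);
END;
/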
When both databases are online, the missing records will be inserted into SQL Server. Otherwise the scheduled job will fail, and you can execute the procedure manually.
It takes minimal resources.
I would suggest a homemade ETL solution.
Schedule an Oracle job to export the source table data (on a daily basis, according to the application logic) to plain CSV format.
Schedule a SQL Server job (with an acceptable delay after the first Oracle job) to read this CSV file and import it into a staging table inside SQL Server using BULK INSERT.
The last part of the SQL Server job will read the staging table data and apply the logic (insert into or update the target table). I suggest having another table to store the results of this daily job.
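A rough sketch of the SQL Server side of that job (the file path, staging table, target table, and columns are all hypothetical):

-- Load the CSV produced by the Oracle job into the staging table
BULK INSERT dbo.InventoryStaging
FROM 'D:\etl\inventory_export.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

-- Apply the staged rows to the target table (insert new rows, update existing ones)
MERGE dbo.Inventory AS tgt
USING dbo.InventoryStaging AS src
  ON tgt.InventoryId = src.InventoryId
WHEN MATCHED THEN
  UPDATE SET tgt.Quantity = src.Quantity
WHEN NOT MATCHED BY TARGET THEN
  INSERT (InventoryId, Quantity) VALUES (src.InventoryId, src.Quantity);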

How to continuously sync the schema of two databases in SQL Server (for unit testing)

We have a database against which we run unit tests for components that require a database (for several reasons we are not mocking the DAL everywhere).
We are using SQL Server 2008 R2, and on the development DB server we have our development database (ApplicationName_Dev) and our testing DB (ApplicationName_UT).
The unit tests create the test data they need and delete it afterwards so the tables could/should be empty when no tests are running.
The problem is keeping the schema of the unit test database up to date.
The best solution for me (to my limited knowledge) would be to have a SQL Server Agent job that runs once a night (or when manually started), drops all the tables in the UT database, generates create scripts for all tables, indexes, and relationships in the Dev database, and runs those create scripts against the UT database. Note that we don't need to insert any data.
Is there any way of programmatically (T-SQL, SMO, etc.) generating CREATE scripts for all tables, including indexes and relationships?
In Management Studio I can right-click the database -> Tasks -> Generate Scripts... -> Choose Objects -> Tables and I get just the scripts that I want (except for the "Use [ApplicationName_Dev]" on the first line).
Please help.
Regards,
Mathias
I'd create an SSIS package - there's a task called "Transfer SQL Server Objects Task". Specify your source and destination connections and databases, set DropObjectsFirst to True and CopyAllObjects (or just CopyAllTables and CopyAllViews) as well, and you should be set. (And obviously, don't set CopyData to True.)
You also need to set CopyIndexes and the other table options for the table structures you want.
Setting up a job to run an SSIS package is also quite easy.
You could use a tool like SQL Delta. You create a "script" (a SQL Delta-specific script) using SQL Delta and essentially get it to sync the source database with the destination database. It can also pump data into some or all tables if needed.
The whole process can be automated as a scheduled job using Task Scheduler (part of Windows).

How do you pull data from SQL Server to Oracle?

I want to take data from a SQL Server table and populate an Oracle table. Right now, my solution is to dump the data into an Excel table and write a macro to create a SQL file that I can load into Oracle. The problem with this is that I want to automate the process, and I'm not sure I can.
Is there an easy way to automate populating an Oracle table with data from a SQL Server table?
Thanks in advance
I suppose it depends on your definition of "easy".
The most robust approach would be to either use heterogeneous connectivity in Oracle to create a database link to the SQL Server database and then pull the data from SQL Server or to create a linked server in SQL Server that connects to Oracle and then push the data from SQL Server to Oracle.
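For the linked-server route, the push from SQL Server to Oracle can be a plain INSERT over OPENQUERY, roughly like this (the linked server name, schema, table, and columns are hypothetical):

-- Push rows from a local SQL Server table into Oracle over a linked server
INSERT INTO OPENQUERY(ORACLE_LINK, 'SELECT ID, NAME, AMOUNT FROM TARGET_SCHEMA.TARGET_TABLE')
SELECT Id, Name, Amount
FROM dbo.SourceTable;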
Yes. Take a look at MS SQL's SSIS, which stands for SQL Server Integration Services. SSIS provides all sorts of advanced capabilities, including automation with SQL Server Agent jobs, for moving data between disparate data sources. In your case, connecting to Oracle can be achieved in a variety of ways.
There are three ways to automate this:
1) You can do as Paul suggested and create an SSIS package that will do this; it can be scheduled via SQL Agent.
2) If you don't want to deal with SSIS, you can download the free SQL# (SQLsharp) CLR library from http://www.SQLsharp.com/ and use the DB_BulkCopy stored procedure to do this in a T-SQL stored proc, which can also be scheduled via SQL Agent. [note: I am the author of SQL#]
3) You can also set up a Linked Server from SQL Server to Oracle, but this has the drawback of being a potential security hole. Of course, you could use an Oracle login that only has write access to that single table (or something similar).
There are lots and lots of ways to do it. Which you choose depends on your requirements.
Using Excel is fine if it's a one time thing.
If it's a once-in-a-while thing, then you could write a simple .NET app that uses a single DataSet and multiple DataAdapters to do the data dump. C# code example here.
If it's a regular thing, then you could put the above in a Schtasks task, or you could use SSIS. I think SSIS is an extra-cost option.
If the requirement is for "online access", then a linked server is probably appropriate.

Best way to migrate export/import from SQL Server to Oracle

I'm faced with needing access for reporting to some data that lives in Oracle and other data that lives in a SQL Server 2000 database. For various reasons these live on different sides of a firewall. Now we're looking at doing an export/import from SQL Server to Oracle and I'd like some advice on the best way to go about it... The procedure will need to be fully automated and run nightly, so that excludes using the SQL Developer tools. I also can't make a live link between databases from our (Oracle) side as the firewall is in the way. The data needs to be transformed in the process from a star schema to a de-normalised table ready for reporting.
What I'm thinking about is writing a monster query for SQL Server (which I mostly have already) that will denormalise and read out the data from SQL Server into a flat file using the SQL Server equivalent of sqlplus as a scheduled task, dump it into a well-known location, then on the Oracle side have a cron job that copies down the file, loads it with SQL*Loader, and rebuilds indexes etc.
This is all doable, but very manual. Is there one tool, or a combination of FOSS or standard Oracle/SQL Server tools, that could automate this for me? The irreducible complexity is the query on one side and building indexes on the other, but I would love to not have to write the CSV dumping detail or the SQL*Loader script; just say dump this view out to CSV on one side, and on the other truncate and insert into this table from CSV, without worrying about mapping column names and all the other arcane sqlldr voodoo...
Best practices? Thoughts? Comments?
EDIT: I have 50+ columns, all of varying types and lengths, in my dataset, which is why I'd prefer not to have to write out how to generate and map each individual column...
"The data needs to be transformed in the process from a star schema to a de-normalised table ready for reporting."
You are really looking for an ETL tool. If you have no money in the till, I suggest you check out the Open Source Talend and Pentaho offerings.

Backup data for reporting

What is the best method to transfer data from a sales table to a sales history table in SQL Server 2005? The sales history table will be used for reporting.
Take a look at SSAS. OLAP is built for reporting and is easy to query with tools like Excel pivot tables.
Bulk copy is fast and keeps transaction log usage to a minimum (it is minimally logged under the simple or bulk-logged recovery model). One batch run at the end of the day.
Deleting the copied records from your production server is a different matter that needs to be planned as part of that server's maintenance approach/plans. Your reporting server solution should not interfere with or affect the production server.
Keep in mind that your reporting server is not meant to be a backup of the data but rather a copy made exclusively for reporting purposes.
Also check the settings of your reporting server; it should be on the simple recovery model.
Most solutions will require two steps:
- copy the records from source to target
- delete the records from source.
It is essential that your source table have a primary key.
The "best" method depends on a lot of things.
How many records?
Is this a production environment?
What tools do you have?
Unless you are moving a large amount of data, a simple stored procedure should do the trick.
A SQL Server Agent job can manage the timing of when to call the proc.
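A minimal sketch of such a procedure, assuming the rows to move are identified by a cutoff date and the tables share a primary key (all table, column, and procedure names are hypothetical):

CREATE PROCEDURE dbo.MoveSalesToHistory
    @CutoffDate DATE
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION;

    -- Copy the records from source to target
    INSERT INTO dbo.SalesHistory (SaleID, SaleDate, Amount)
    SELECT SaleID, SaleDate, Amount
    FROM dbo.Sales
    WHERE SaleDate < @CutoffDate;

    -- Delete the copied records from source
    DELETE FROM dbo.Sales
    WHERE SaleDate < @CutoffDate;

    COMMIT TRANSACTION;
END;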
If you just want to move the data to another table, use bulk copy / BULK INSERT. If you want to build reporting, I would suggest a BI solution such as MS Analysis Services (OLAP).
It is difficult, and in my opinion ugly, to maintain two or more history/archive tables in the same database. For a reporting solution you will be considering all the tables for that piece of information anyway. History/archive tables should only be used if you are going to put the data away and not touch it for a long period of time, i.e. archive it away outside the operational DB.
