Using MSFT SQL Server for data warehouse (star schema) - sql-server

We currently have a Microsoft SQL Server instance (oltp) we use as our transactional and reporting database. We want to pull out and create a separate database for reporting.
We are currently vetting Redshift and Snowflake. We came up with a question today which is why can't we create a new SQL Server instance for reporting which has the star schema and just use that (instead of redshift or snowflake)? We don't have many tables over a million rows. So maybe using a columnar data warehouse is over kill for us.
Does any know the pros and cons of using Microsoft SQL Server as a reporting database (data warehouse) with a star schema?
We also have a requirement to handle real time or near real time updates.

You can use SQL Server as a data warehouse repository. As long as you have a well designed star schema there is no reason not to use it for that purpose.

Related

Create a Data Warehouse with the database on SQL Developer

I have a database in SQL Developer which pull data from an ERP tool and I would like to create a Data warehouse in order to connect it then to PowerBI.
It's my first time that I am doing all this process from the beginning so I am not so experienced.
So where are you suggesting to create the Data Warehouse (I was thinking on SSMS) and how can I connect it with PowerBI ?
My Data Warehouse will consist from some View of my tables and some Joins to get some data in the structure that I want since it is not possible to change anything in the DB.
Thanks in advance.
A "data warehouse" is just a database. The distinction is really more about the commonly used schema design, in the sense that a warehouse is often built along the lines of a star or snowflake design.
So if you already have a database that is extracting data from your ERP, there is nothing to stop you from pointing PowerBI directly at that and performing some analytics etc. If your intention is to start with this database, and then clone/extract/load this data into a new database which is a star/snowflake schema, then that's a much bigger exercises.

Oracle to SQLServer export

I have to move data from existing database oracle to which I don't have direct access. The data is about 11 tables, 5GB each. The database admin can export the tables to some .csv or xml. The problem with csv is that some data is textual with lots of special characters. The problem with xml is that the markup is an overhead which will increase significantly the size of the files. The DBA admin is not competent enough to provide a working and neat solution. He uses toad as the database tool. Can you provide some ideas how to perform such a migration in the best possible way?
Please refer the below steps to migrate the data from Oracle to SQL server.
Recommended Migration Process
To successfully migrate objects and data from Oracle databases to SQL Server, Azure SQL DB, or Azure SQL Data Warehouse, use the following process:
1.Create a new SSMA project.
2.After you create the project, you can set project conversion, migration, and type mapping options. For information about project settings, see Setting Project Options (OracleToSQL). For information about how to customize data type mappings, see Mapping Oracle and SQL Server Data Types (OracleToSQL).
3.Connect to the Oracle database server.
4.Connect to an instance of SQL Server.
5.Map Oracle database schemas to SQL Server database schemas.
6.Optionally, Create assessment reports to assess database objects for conversion and estimate the conversion time.
7.Convert Oracle database schemas into SQL Server schemas.
8.Load the converted database objects into SQL Server.
You can do this in one of the following ways:
* Save a script and run it in SQL Server.
* Synchronize the database objects.
9. Migrate data to SQL Server.
10.If necessary, update database applications.
For more details :
[https://learn.microsoft.com/en-us/sql/ssma/oracle/migrating-oracle-databases-to-sql-server-oracletosql?view=sql-server-2017]
After the admin export data into CSV, try to convert it into a character set which will recognize all special characters.
Then, try to follow the steps from this link: link, it might work.
If after the import, there are still special characters, thy to manually convert them.
Get the DBA to export the tables using the ASCII delimiters which were designed for this purpose:
Row delimiter: Decimal 30 / 0x1E
Column delimiter: Decimal 31 / 0x1F
Then you can use BCP (or any other similar product) to upload the data to SQL Server.

AWS Glue: SQL Server multiple partitioned databases ETL into Redshift

Our team is trying to create an ETL into Redshift to be our data warehouse for some reporting. We are using Microsoft SQL Server and have partitioned out our database into 40+ datasources. We are looking for a way to be able to pipe the data from all of these identical data sources into 1 Redshift DB.
Looking at AWS Glue it doesn't seem possible to achieve this. Since they open up the job script to be edited by developers, I was wondering if anyone else has had experience with looping through multiple databases and transfering the same table into a single data warehouse. We are trying to prevent ourselves from having to create a job for each database... Unless we can programmatically loop through and create multiple jobs for each database.
We've taken a look at DMS as well, which is helpful for getting the schema and current data over to redshift, but it doesn't seem like it would work for the multiple partitioned datasource issue as well.
This sounds like an excellent use-case for Matillion ETL for Redshift.
(Full disclosure: I am the product manager for Matillion ETL for Redshift)
Matillion is an ELT tool - it will Extract data from your (numerous) SQL server databases and Load them, via an efficient Redshift COPY, into some staging tables (which can be stored inside Redshift in the usual way, or can be held on S3 and accessed from Redshift via Spectrum). From there you can add Transformation jobs to clean/filter/join (and much more!) into nice queryable star-schemas for your reporting users.
If the table schemas on your 40+ databases are very similar (your question doesn't clarify how you are breaking your data down into those servers - horizontal or vertical) you can parameterise the connection details in your jobs and use iteration to run them over each source database, either serially or with a level of parallelism.
Pushing down transformations to Redshift works nicely because all of those transformation queries can utilize the power of a massively parallel, scalable compute architecture. Workload Management configuration can be used to ensure ETL and User queries can happen concurrently.
Also, you may have other sources of data you want to mash-up inside your Redshift cluster, and Matillion supports many more - see https://www.matillion.com/etl-for-redshift/integrations/.
You can use AWS DMS for this.
Steps:
set up and configure DMS instance
set up target endpoint for redshift
set up source endpoints for each sql server instance see
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.SQLServer.html
set up a task for each sql server source, you can specify the tables
to copy/synchronise and you can use a transformation to specify
which schema name(s) on redshift you want to write to.
You will then have all of the data in identical schemas on redshift.
If you want to query all those together, you can do that by wither running some transformation code inside redsshift to combine and make new tables. Or you may be able to use views.

Compare millions of records from Oracle to SQL server

I have an Oracle database and a SQL Server database. There is one table say Inventory which contains millions of rows in both database tables and it keeps growing.
I want to compare the Oracle table data with the SQL Server data to find out which records are missing in the SQL Server table on daily basis.
Which is best approach for this?
Create SSIS package.
Create Windows service.
I want to consume less resource to achieve this functionality which takes less time and less resource.
Eg : 18 millions records in oracle and 16/17 millions in SQL Server
This situation of two different database arise because two different application online and offline
EDIT : How about connecting SQL server from oracle through Oracle Gateway to SQL server to
1) Direct query to SQL server from Oracle to update missing record in SQL server for 1st time.
2) Create a trigger on Oracle which gets executed when record is deleted from Oracle and it insert deleted record in new oracle table.
3) Create SSIS package to map newly created oracle table with SQL server to update SQL server record.This way only few records have to process daily through SSIS.
What do you think of this approach ?
I would create an SSIS package and load the data from the Oracle table use a Data Flow / OLE DB Data Source. If you have SQL Enterprise, the Attunity Connectors are a bit faster.
Then I would load key from the SQL Server table into a Lookup transformation, where I would match the 2 sources on the key, and direct unmatched rows into a separate output.
Finally I would direct the unmatched rows output to a OLE DB Command, to update the SQL Server table.
This SSIS package will require a lot of memory, but as the matching is done in memory with minimal IO, it will probably outperform other solutions for speed. It will need enough free memory to cache all the keys from the SQL Server Table.
SSIS also has the advantage that it has lots of other transformation functions available if you need them later.
What you basically want to do is replication from Oracle to SQL Server.
You could do this in SSIS, A windows Service or indeed a multitude of platforms.
The real trick is using the correct design pattern.
There are two general design patterns
Snapshot Replication
You take all records from both systems and compare them somewhere (so far we have suggestions to compare in SSIS or compare on Oracle but not yet a suggestion to compare on SQL Server, although this is valid)
You are comparing 18 million records here so this is a lot of work
Differential replication
You record the changes in the publisher (i.e. Oracle) since the last replication then you apply those changes to the subscriber (i.e. SQL Server)
You can do this manually by implementing triggers and log tables on the Oracle side, then use a regular ETL process (SSIS, command line tools, text files, whatever), probably scheduled in SQL Agent to apply these to the SQL Server.
Or you could do this by using the out of the box replication capability to set up Oracle as a publisher and SQL as a subscriber: https://msdn.microsoft.com/en-us/library/ms151149(v=sql.105).aspx
You're going to have to try a few of these and see what works for you.
Given this objective:
I want to consume less resource to achieve this functionality which takes less time and less resource
transactional replication is far more efficient but complicated. For maintenance purposes, which platforms (.Net, SSIS, Python etc.) are you most comfortable with?
Other alternatives:
If you can use Oracle gateway for SQL Server then you do not need to transfer data and can make the query directly.
If you can't use Oracle gateway, you can use Pentaho data integration or another ETL tool to compare tables and get results. Is easy to use.
I think the best approach is using oracle gateway.Just follow the steps. I have similar type of experience.
Install and Configure Oracle Database Gateway for SQL Server.
https://docs.oracle.com/cd/B28359_01/gateways.111/b31042/installsql.htm
Now you can create a dblink from oracle to sql server.
Create a procedure which compare the missing records in oracle database and insert into sql server database.
For example, you can use this statement inside your procedure.
INSERT INTO "dbo"."sql_server_table"#dblink_name("column1","column2"...."column5")
VALUES
(
select column1,column2....column5 from oracle_table
minus
select "column1","column2"...."column5" from "dbo"."sql_server_table"#dblink_name
)
Create a scheduler which execute the procedure daily.
When both databases are online, missing records will be inserted to sql server. Otherwise the scheduler fail or you can execute the procedure manually.
It takes minimum resource.
I will suggest having a homemade ETL solution.
Schedule an oracle job to export source table data (on a daily
manner based on the application logic ) to plain CSV format.
Schedule a SQL-Server job (with acceptable delay from first oracle job) to read this CSV file and import it
to a medium table inside sql-servter using BULK INSERT.
Last part of the SQL-Server job will be reading medium table data
and do the logic(insert, update target table). I suggest having another table to store reports of this daily job result.

How to performing Quality Control in SQL Server 2008 R2

I am very new to SQL Server. I am trying to determine the best way to conduct quality control of data stored within a SQL Server 2008 R2 Standard Edition database.
The types of QC tests to be conducted include data integrity, referential integrity, and business logic checks. The output needs to be a table where each record represents a dataset tested and each column represents a test conducted. Depending on the test, values for each column should either be a number representing how many records in the dataset failed, or a list of ID's representing records that failed.
I'm not sure where to begin... Can this be done using simple SQL queries or should this be done using Reporting Services or some other tool provided with SQL Server?
Start by building your queries in SSMS.
Once you get to stable queries, then you could go to SSRS if you want to enhance the presentation and delivery of the data, or to SSIS if you want to automate and flexibility to output to many different systems, or look at a simple SQL Agent job if you just want to copy data to a different table.
SSRS is aimed at read-only access with nice graphical presentation and delivery formats.
SSIS is aimed at flexible data integration tasks, but not much UI.
SSMS is the general purpose SQL server authoring tool. Both SSRS and SSIS can use the SQL you write in SSMS.
(I think this answers your question; Is this what you were looking for?)

Resources