Database Problems - database

I have a database schema in an Oracle database. I also have data dumps from third-party vendors. I load their data using SQL*Loader scripts on a Linux machine.
We also have batch updates every day.
The data is assumed to be free from errors. For example, if a record 'A' is inserted into the database on the first day, the assumption is that 'A' will not occur in any later load. If we do get a record named 'A' again, we get a primary key violation.
Question: To avoid these violations, should we build an analyzer to detect the data errors, or are there better solutions?

I built an ETL system for a company that had daily feeds of flat files containing line of business transaction data. The data was supposed to follow a documented schema but in practice there were lots of different types of violations from day to day and file to file.
We built SQL staging tables in which every column was nullable and the varchar columns were sized larger than should ever be needed, and loaded the flat-file data into these staging tables using efficient bulk-loading utilities. Then we ran a series of data consistency checks within the context of the database to ensure that the raw (staged) data could be cross-loaded to the proper production tables.
Nothing got out of the staging environment until all of the edits (consistency checks) had passed.
The advantage of loading the flat files into staging tables is that you can take advantage of the RDBMS to perform set actions and to easily compare new values with existing values from previous files, all without having to build special flat file handling code.
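As a rough sketch of that pattern in Oracle-style SQL (all table and column names here, such as stg_vendor_feed and prod_records, are made up for illustration), the staging table, the duplicate-key check, and the cross-load might look like this:

    -- Staging table: every column nullable and generously sized
    CREATE TABLE stg_vendor_feed (
        record_key   VARCHAR2(400),
        record_value VARCHAR2(4000),
        feed_date    VARCHAR2(40)
    );

    -- Consistency check: flag staged keys that already exist in production,
    -- so a re-sent 'A' is caught before it can cause a primary key violation
    SELECT s.record_key
    FROM   stg_vendor_feed s
    JOIN   prod_records p ON p.record_key = s.record_key;

    -- Cross-load only the rows that pass the checks
    INSERT INTO prod_records (record_key, record_value, load_date)
    SELECT s.record_key, s.record_value, SYSDATE
    FROM   stg_vendor_feed s
    WHERE  NOT EXISTS (
               SELECT 1
               FROM   prod_records p
               WHERE  p.record_key = s.record_key
           );

In practice each consistency check would either write its failures to an error table or stop the run, so nothing reaches production until the staged file is clean.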

Related

Load all CSVs from path on local drive into AzureSQL DB w/Auto Create Tables

I frequently need to validate CSVs submitted from clients to make sure that the headers and values in the file meet our specifications. Typically I do this by using the Import/Export Wizard and having the wizard create the table based on the CSV (the file name becomes the table name, and the headers become the column names). Then we run a set of stored procedures that check information_schema for said table(s) and match that up with our specs, etc.
Most of the time this involves loading multiple files at a time for a client, which becomes very time-consuming and laborious very quickly when using the Import/Export Wizard. I tried using an xp_cmdshell SQL script to load everything from a path at once to get the same result, but xp_cmdshell is not supported by Azure SQL DB.
https://learn.microsoft.com/en-us/azure/azure-sql/load-from-csv-with-bcp
The above says that one can load using bcp, but it also requires the table to exist before the import... I need the table structure to mimic the CSV. Any ideas here?
Thanks
If you want to load the data into your target SQL database, you can use Azure Data Factory (ADF) to upload your CSV files to Azure Blob Storage and then use the Copy Data activity to load the data from those CSV files into Azure SQL Database tables, without creating those tables upfront.
ADF supports 'auto create' of sink tables; see the Copy activity documentation for details.
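For the header validation described in the question, once a table has been auto-created (by the wizard or by ADF), the comparison against INFORMATION_SCHEMA could look roughly like the sketch below; the spec table dbo.ClientSpec and the table name 'Client_File_01' are placeholders, not part of the original setup:

    -- Columns present in the loaded table but not in the spec
    SELECT c.COLUMN_NAME AS unexpected_column
    FROM   INFORMATION_SCHEMA.COLUMNS AS c
    LEFT JOIN dbo.ClientSpec AS s
           ON s.column_name = c.COLUMN_NAME
    WHERE  c.TABLE_NAME = 'Client_File_01'
      AND  s.column_name IS NULL;

    -- Spec columns missing from the loaded file
    SELECT s.column_name AS missing_column
    FROM   dbo.ClientSpec AS s
    LEFT JOIN INFORMATION_SCHEMA.COLUMNS AS c
           ON c.COLUMN_NAME = s.column_name
          AND c.TABLE_NAME  = 'Client_File_01'
    WHERE  c.COLUMN_NAME IS NULL;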

Methods to transfer Tables from source database to destination database using SSIS dynamically

I am relatively new to SSIS and have to come up with an SSIS package for work such that certain tables are dynamically moved from one SQL Server database to another SQL Server database. I have the following constraints that need to be met:
Source table names and destination table names may differ, so a direct copy with the Transfer SQL Server Objects task does not work.
Only certain columns may be transferred from source table to destination table.
This package needs to run every 5 minutes so it has to be relatively fast.
The transfer must be dynamic such that if there are new source tables, the package need not be reconfigured with hard coded values.
I have the following ideas for now:
Use the Transfer SQL Server Objects task, but I'm not sure the above requirements can be met, especially the selective transfer of tables and dynamic mapping of columns.
Use SqlBulkCopy in a Script Component to perform the migration.
I would appreciate it if anyone could give some direction as to how I can go about meeting the requirements and whether my existing ideas are feasible.
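One possible way to satisfy the "no hard-coded values" constraint, sketched here only as an idea rather than a ready-made solution, is a control table that the package reads at run time (for example via an Execute SQL Task, or to drive a Script Component using SqlBulkCopy). Every object name below is hypothetical, and the dynamic SQL assumes the destination is reachable from where it runs (same instance or a linked server):

    -- Control table listing what to move and which columns
    CREATE TABLE etl.TransferMap (
        SourceTable NVARCHAR(256) NOT NULL,  -- e.g. 'SrcDb.dbo.Orders'
        TargetTable NVARCHAR(256) NOT NULL,  -- e.g. 'DestDb.dbo.OrderFact'
        ColumnList  NVARCHAR(MAX) NOT NULL   -- e.g. 'OrderId, CustomerId, Amount'
    );

    -- Build one INSERT ... SELECT per mapping row and execute it
    DECLARE @sql NVARCHAR(MAX) = N'';

    SELECT @sql = @sql
        + N'INSERT INTO ' + TargetTable + N' (' + ColumnList + N') '
        + N'SELECT ' + ColumnList + N' FROM ' + SourceTable + N';' + CHAR(10)
    FROM etl.TransferMap;

    EXEC sys.sp_executesql @sql;

With this layout, adding a new source table only requires a new row in etl.TransferMap rather than a change to the package itself.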

How to transfer staging table data to fact tables in Snowflake with custom validations

Good day.
I need help. I want to move data in Snowflake from staging tables to fact tables automatically, whenever data is available in a staging table. While moving data from the staging tables to the fact tables, I have a couple of custom validations to apply on each column and row.
Any idea how to do this in Snowflake?
If anyone knows, could you please suggest an approach?
Thanks in advance!
There are many ways to do this and how you go about it depends on what tools you have available. The simplest way to do this without using tools outside of the Snowflake ecosystem would be:
On each of the staging tables you have, set up a stream (see the Snowflake documentation on streams).
Create a task that runs on a schedule (see the Snowflake documentation on tasks) to pull from the streams and write into the fact table.
This is really a general data warehousing question rather than a Snowflake-specific one. There is also further documentation on building SCD Type 2 dimensions, written by someone at Snowflake, that goes into more depth.
Assuming "staging tables" refers to a Snowflake table and not a file in a Snowflake stage, I would recommend using a Stream and Task for this. A stream will identify the delta of data that needs to be loaded, and a Task can execute on a schedule and will only actually run something if there is data in the stream. Create a stored procedure that is executed in the Task to run your validations and Merge the outcome of those into your Fact.

Load data, keys and indexes with SQL Server Integration Services (SSIS)

I have created a SQL Server Integration Services (SSIS) package that loads data from one server to another (records from one table into records of another table).
It works properly, but unfortunately the destination table does not have the keys and indexes that the source table has.
How to load data with keys and indexes?
SSIS is used to move data from one place to another. Keys and indexes are part of the structure of the destination table, not part of the data itself, so SSIS cannot "load" them. Potentially the destination structure you move the data into could be very different from the source (and in fact I'd expect this in most cases if you're moving data out of a transactional system into a data warehouse, for example). You also need to consider that the package could be reading from multiple sources, each with different indexes and keys.
If you're looking to replicate structure rather than data, then you need a different tool. This could be as simple as using SSMS to script the table out from the source and re-running the script on the destination, or something more advanced such as using Visual Studio database projects.
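For illustration only, this is the sort of scripted definition SSMS would generate from the source table and that could be re-run on the destination before loading the data; the table, key, and index names here are invented:

    CREATE TABLE dbo.Customer (
        CustomerId   INT            NOT NULL,
        CustomerName NVARCHAR(200)  NOT NULL,
        Region       NVARCHAR(50)   NULL,
        CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (CustomerId)
    );

    CREATE NONCLUSTERED INDEX IX_Customer_Region
        ON dbo.Customer (Region);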

Design for importing definition data from Excel into SQL Server

We have a Restaurant Inventory Control system that uses SQL Server 2008 R2.
It takes a very long time to add all the definition data: stock items, yields, packsizes, recipes, categories etc. So, our clients have asked if they can upload it from Excel.
Before I just jump in and start, I want to find out if there is a best practice way to do this.
I know all the tools: SSIS, stored procedures etc. But I'm looking for advice/resources that can help with the design process. How best to setup the spreadsheet, validate the data, create the child/parent relationships etc.
This must be a fairly common project -- so it must have a standard design/approach and that's what I'm looking for.
I think the design will depend on the technologies you're most comfortable with. If you're comfortable with SSIS and stored procedures, this is the general pattern I would use:
Excel Template - I wouldn't spend too much time on this; add the headers and sheets necessary for the tables. You can lock down certain things and/or implement rules, but most of your validation would be done in stored procedures.
SSIS - Have a package that loads the Excel data into the staging tables, and have rows with errors added to an error log to be presented to the user along with the validation issues from the stored procedures.
Staging Tables - Have one staging table per sheet/production table, with an ExecutionId column in each staging table to allow parallel processing. Allow all columns to be NULL so you can get the data into the staging tables, or set the proper null conditions and have SSIS redirect those rows on error. Don't have any primary key / foreign key relationships in the staging tables; these can be validated in the stored procedures.
Stored Procedures - Validate the staging data, any issues found would be added to the error log to be presented to the user or person performing the import. If there are no issues, import the data into the production tables. If there is existing data in the production tables, you could do a comparison and update if applicable.
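As a rough illustration of the staging-table, error-log, and validation pieces described above (every object name here, such as stg.StockItem and dbo.Category, is hypothetical), a minimal T-SQL sketch might be:

    -- One staging table per sheet: all columns nullable, ExecutionId
    -- keeps parallel imports separate
    CREATE TABLE stg.StockItem (
        ExecutionId  UNIQUEIDENTIFIER NOT NULL,
        ItemCode     NVARCHAR(100)    NULL,
        ItemName     NVARCHAR(255)    NULL,
        CategoryName NVARCHAR(255)    NULL
    );

    CREATE TABLE stg.ImportError (
        ExecutionId  UNIQUEIDENTIFIER NOT NULL,
        TableName    SYSNAME          NOT NULL,
        ErrorMessage NVARCHAR(1000)   NOT NULL
    );
    GO

    -- Validate one import run; load production only if the run is clean
    CREATE PROCEDURE stg.ValidateAndLoadStockItems
        @ExecutionId UNIQUEIDENTIFIER
    AS
    BEGIN
        -- Example parent/child check: every staged category must already exist
        INSERT INTO stg.ImportError (ExecutionId, TableName, ErrorMessage)
        SELECT @ExecutionId, 'StockItem',
               'Unknown category: ' + ISNULL(s.CategoryName, '(blank)')
        FROM stg.StockItem AS s
        LEFT JOIN dbo.Category AS c ON c.CategoryName = s.CategoryName
        WHERE s.ExecutionId = @ExecutionId
          AND c.CategoryName IS NULL;

        IF NOT EXISTS (SELECT 1 FROM stg.ImportError WHERE ExecutionId = @ExecutionId)
            INSERT INTO dbo.StockItem (ItemCode, ItemName, CategoryId)
            SELECT s.ItemCode, s.ItemName, c.CategoryId
            FROM stg.StockItem AS s
            JOIN dbo.Category  AS c ON c.CategoryName = s.CategoryName
            WHERE s.ExecutionId = @ExecutionId;
    END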
