SQL Server SSIS and Oracle Data Pump

I'm a beginner trying to improve my knowledge on the DB side.
I am learning SSIS with SQL Server 2008 R2. Going by tutorials from the web, it feels somewhat similar to what I've read about Oracle Data Pump.
Can someone enlighten me as to whether there is any real similarity between SSIS and Data Pump?
If they are totally different, please forgive the question; otherwise, let me know how they are similar.
Regards,
Justin

Data Pump is not a complete ETL tool; it is a utility that ships with the Oracle database (introduced in 10g). It moves data and metadata from a single source to a single destination via dump files. With SSIS you get the full extraction, transformation, and loading facilities.
The Oracle product that corresponds to SSIS is Oracle Warehouse Builder.
Oracle Data Pump is roughly the Oracle counterpart of SQL Server's export and import utilities, and it replaces Oracle's older exp/imp tools.
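To make the comparison concrete, a minimal Data Pump round trip looks roughly like the sketch below. The schema name, directory path, and file names are placeholders (not anything from the question), and the expdp/impdp commands shown in comments run from the OS shell, not from SQL:

```sql
-- Create a directory object that Data Pump can write dump files to (path is a placeholder).
CREATE DIRECTORY dp_dir AS '/u01/app/oracle/dpdump';
GRANT READ, WRITE ON DIRECTORY dp_dir TO hr;

-- The export and import themselves run from the OS shell:
--   expdp hr/password DIRECTORY=dp_dir DUMPFILE=hr.dmp LOGFILE=hr_exp.log SCHEMAS=hr
--   impdp hr/password DIRECTORY=dp_dir DUMPFILE=hr.dmp LOGFILE=hr_imp.log SCHEMAS=hr
-- Note there is no transformation step in between, which is the key difference from SSIS.
```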

I have never heard of Data Pump, but some initial googling shows it is closer to a Data Flow Task within an SSIS package than a substitute for a whole SSIS package. Data Pump simply ports data from a single source to a single destination. An SSIS package can extract, transform, and load from any number of sources to any number of destinations within the same package. You also get the extensibility (if that is a word?) of writing .NET code or using any third-party assemblies you like to manipulate the data further. You can also do file and DB maintenance with an SSIS package (cleaning up after files are processed, maintaining backups, etc.).

Related

Extract from Progress Database to SQL Server

I'm looking for the best approach (or a couple of good ones to choose from) for extracting from a Progress database (v10.2b). The eventual target will be SQL Server (v2008). I say "eventual target", because I don't necessarily have to connect directly to Progress from within SQL Server, i.e. I'm not averse to extracting from Progress to a text file, and then importing that into SQL Server.
My research on approaches came up with scenarios that don't match mine:
Migrating an entire Progress DB to SQL Server
Exporting entire tables from Progress to SQL Server
Using Progress-specific tools, something to which I do not have access
I am able to connect to Progress using ODBC, and have written some queries from within Visual Studio (v2010). I've also done a bit of custom programming against the Progress database, building a simple web interface to prove out a few things.
So, my requirement is to use ODBC and build a routine that runs a specific query on a daily basis. The results of this query will then be imported into a SQL Server database. Thanks in advance for your help.
Update
After some additional research, I found that a Linked Server is what I'm looking for. Some notes for others working with SQL Server Express:
If it's SQL Server Express that you are working with, you may not see a program on your desktop or in the Start Menu for DTS. I found DTSWizard.exe nested in my SQL Server Program Files (for me, C:\Program Files (x86)\Microsoft SQL Server\100\DTS\Binn), and was able to simply create a shortcut.
Also, because I'm using the Express edition of SQL Server, I wasn't able to save the package I'd created. So, after creating the package and running it once, I simply re-ran it and saved off my SQL for use in the future.
Bit of a late answer, but in case anyone else was looking to do this...
You can use a linked server, but you will find that the performance won't be as good as connecting directly via the ODBC drivers, and the translation of the data types may mean that you cannot access some tables. The linked server might be handy, though, for exploring the data.
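As a rough sketch of the linked-server route (the linked server name, DSN, login, and Progress table below are made-up placeholders): you point a linked server at the Progress ODBC DSN through the OLE DB Provider for ODBC (MSDASQL), then query it with OPENQUERY so the statement is passed through to Progress:

```sql
-- Linked server over the Progress ODBC DSN via the OLE DB Provider for ODBC.
-- 'PROGRESS_LNK' and 'ProgressDSN' are placeholder names.
EXEC sp_addlinkedserver
     @server     = N'PROGRESS_LNK',
     @srvproduct = N'',
     @provider   = N'MSDASQL',
     @datasrc    = N'ProgressDSN';

EXEC sp_addlinkedsrvlogin
     @rmtsrvname  = N'PROGRESS_LNK',
     @useself     = 'FALSE',
     @rmtuser     = N'progress_user',
     @rmtpassword = N'progress_pwd';

-- OPENQUERY passes the query text through to Progress, which usually performs better
-- and avoids some of the data-type translation issues of four-part-name queries.
SELECT *
INTO   dbo.DailyExtract
FROM   OPENQUERY(PROGRESS_LNK, 'SELECT custnum, name, balance FROM pub.customer');
```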
If you use SSIS with the ODBC drivers (you will have to use ADO.NET data sources), this will perform most efficiently, and you should also get more accurate data types (remember that data types within Progress can change dynamically).
If you have to extract a lot of tables, I would look at BIML (Business Intelligence Markup Language) to help you achieve this. BIML can dynamically generate many SSIS packages on the fly, which can then be called from a master package. This master package can be scheduled or run ad hoc, as can any of the child packages as needed.
Can you connect to the Progress DB using OLE DB? If so, you could use a SQL Server linked server to bypass the need to extract to a file that would then be loaded into SQL Server. Alternatively, you could extract to Excel and then import from Excel into SQL Server.

Large Excel File Imported Into SQL Server Database

I have a client who needs to import rows from a LARGE Excel file (72K rows) into their SQL Server database. This file is uploaded by users of the system. Performance became an issue when we tried to upload and process the file at the same time as the user upload. Now we just save it to disk, and an admin picks it up, splits it into 2K-row chunks, and runs them through an upload tool one by one. Is there an easier way to accomplish this without hurting performance or causing timeouts?
If I understand your problem correctly you get a large spreadsheet and need to upload it into a SQL Server database. I'm not sure why your process is slow at the moment, but I don't think that data volume should be inherently slow.
Depending on what development tools you have available it should be possible to get this to import in a reasonable time.
SSIS can read from Excel files. You could schedule a job that wakes up periodically and checks for a new file. If it finds one, it uses a data flow task to import it into a staging table, and then an Execute SQL Task can run some processing on it.
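To make that last step concrete, the Execute SQL Task after the data flow might run an upsert from the staging table into the real table. This is only a sketch: it assumes SQL Server 2008 or later (for MERGE), and the table and column names are placeholders, not anything from the question:

```sql
-- Hypothetical post-load step: merge freshly imported staging rows into the target table.
MERGE dbo.Orders AS tgt
USING dbo.ExcelStaging AS src
      ON tgt.OrderNumber = src.OrderNumber
WHEN MATCHED THEN
     UPDATE SET tgt.CustomerName = src.CustomerName,
                tgt.Amount       = src.Amount
WHEN NOT MATCHED BY TARGET THEN
     INSERT (OrderNumber, CustomerName, Amount)
     VALUES (src.OrderNumber, src.CustomerName, src.Amount);

-- Clear the staging table for the next file.
TRUNCATE TABLE dbo.ExcelStaging;
```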
If you can use .NET, you could write an application that reads the data out through the OLE automation API and loads it into a staging area with SqlBulkCopy. You can read an entire range into a variant array through the Excel COM API. This is not super-fast but should be fast enough for your purposes.
If you don't mind using VBA, you can write a macro that does something similar. However, I don't think traditional ADO has a bulk-load feature, so you would need to export a .CSV (or something similar) to a drive the server can see and then BULK INSERT from that file. You may also need a bcp format file describing the .CSV layout.
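For reference, the BULK INSERT half of that approach could look like the sketch below; the path, staging table, and delimiters are assumptions, and a format file is only required if the CSV columns don't line up with the table:

```sql
-- Load the exported CSV into a staging table in one bulk operation.
BULK INSERT dbo.ExcelStaging
FROM 'C:\uploads\big_spreadsheet.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2,   -- skip the header row
    TABLOCK            -- take a table lock so the load can be minimally logged
);
```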
Headless imports from user-supplied spreadsheets are always troublesome, so there is quite a bit of merit in doing it through a desktop application. The principal benefit is with error reporting. A headless job can really only send an email with some status information. If you have an interactive application the user can troubleshoot the file and make multiple attempts until they get it right.
I could be wrong, but from your description it sounds like you were doing the processing in code in your application (i.e. the file is uploaded and the code that handles the upload then processes the import, possibly row by row).
In any event, I've had the most success importing large datasets like that using SSIS. I've also set up a spreadsheet as a linked server, which works but has always felt a bit hacky to me.
Take a look at this article which details how to import data using several different methods (a distributed-query sketch follows the list), namely:
SQL Server Data Transformation Services (DTS)
Microsoft SQL Server 2005 Integration Services (SSIS)
SQL Server linked servers
SQL Server distributed queries
ActiveX Data Objects (ADO) and the Microsoft OLE DB Provider for SQL Server
ADO and the Microsoft OLE DB Provider for Jet 4.0
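As a small illustration of the "distributed queries" and "Jet 4.0" items from that list, an ad hoc query against the workbook could look like this. The file path and sheet name are placeholders, the 32-bit Jet provider must be installed, and 'Ad Hoc Distributed Queries' has to be enabled on the server:

```sql
-- Ad hoc distributed query against an .xls workbook via the Jet 4.0 OLE DB provider.
SELECT *
INTO   dbo.ExcelStaging
FROM   OPENROWSET('Microsoft.Jet.OLEDB.4.0',
                  'Excel 8.0;Database=C:\uploads\big_spreadsheet.xls;HDR=YES',
                  'SELECT * FROM [Sheet1$]');
```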

Transferring data between different DBMSs

I would like to transfer a whole database from Informix to Oracle. We have an application that works on both databases; one of our customers is moving from Informix to Oracle and needs to transfer the whole database (the structure is the same).
We also often need to transfer data between Oracle/MSSQL/Informix, sometimes only one table rather than the whole database.
Does anybody know about any good program which does this kind of job?
Pentaho Data Integration (also known under its former name "Kettle") is an open-source ETL tool suited to cross-database migration and many other use cases.
From their data sheet:
Common Use Cases
Data warehouse population with built-in support for slowly changing dimensions and junk dimensions
Export of database(s) to text file(s) or other databases
Import of data into databases, ranging from text files to Excel sheets
Data migration between database applications
...
A list of input/output data formats can be found in the accepted answer to this question: Does anybody know the list of Pentaho Data Integration (Kettle) connectors list?
It supports all databases with a JDBC driver, which means most of them.
Check this question of mine; it includes some very good ideas: Searching for (freeware) database migration tool
You could give the Oracle Migration Workbench a try; see http://download.oracle.com/docs/html/B15858_01/toc.htm. If you want to read Informix data into Oracle on a regular basis, using Heterogeneous Services might be a better option. Check for hs4odbc or dg4odbc, depending on the Oracle release you have.
I hope this helps,
Ronald.
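To illustrate the Heterogeneous Services / dg4odbc route Ronald mentions: once the gateway and listener are configured for an Informix ODBC DSN, the Informix tables can be read from Oracle SQL through a database link. Every name below (link name, credentials, DSN, tables) is a placeholder assumption:

```sql
-- Database link that goes through the dg4odbc gateway to the Informix ODBC DSN.
-- 'informix_dsn' must match the gateway SID configured in the listener/tnsnames.
CREATE DATABASE LINK informix_lnk
  CONNECT TO "informix_user" IDENTIFIED BY "informix_pwd"
  USING 'informix_dsn';

-- Pull one table across on demand; quoted identifiers preserve Informix's lower-case names.
INSERT INTO customers_stage
SELECT * FROM "customer"@informix_lnk;
COMMIT;
```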
I have done this in the past and it is not a trivial task. We ended up writing each table out to a pipe-delimited flat file and reloading each table into Oracle with Oracle SQL*Loader. There were a ton of Perl scripts to scrub the source data and shell scripts to automate the process as much as possible and run things in parallel.
Gotchas that can come up:
1. Pick a delimiter that is as unique as possible.
2. Try to find data types that match the Informix ones as closely as possible, e.g. date vs. timestamp.
3. Get the data as clean as possible before dumping out the flat files.
4. HS (Heterogeneous Services) will most likely be too slow.
This was done years ago. You may want to investigate GoldenGate (now owned by Oracle), which may help with the process (it did not exist when I did this).
Another idea is to use an ETL tool to read Informix and dump the data into Oracle (Informatica comes to mind).
Good luck :)
sqlldr - Oracle's import utility
Here's what I did to transfer 50 TB of data from MySQL to Oracle: I generated CSV files from MySQL and used the sqlldr utility to load all the data from those files into the Oracle DB. It is the fastest way to import data. I researched this for a few weeks and ran a lot of benchmark test cases, and sqlldr is hands down the best and fastest way to import into Oracle.
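For anyone new to sqlldr, a minimal control file is sketched below. The table, columns, delimiter, and date mask are placeholders, and DIRECT=TRUE requests the direct-path load that makes it fast:

```
-- customers.ctl : minimal SQL*Loader control file (all names are placeholders)
OPTIONS (DIRECT=TRUE, ERRORS=1000)
LOAD DATA
INFILE 'customers.csv'
APPEND
INTO TABLE customers
FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
  customer_id,
  customer_name,
  created_at DATE "YYYY-MM-DD HH24:MI:SS"
)
```

It would then be invoked from the shell along the lines of: sqlldr userid=scott/tiger control=customers.ctl log=customers.log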

Shared Data Sources vs. OLE DB Connections In SSIS

I've been using Shared Data Sources in all of my SSIS projects because I thought it was a "best practice". However, now that everything is under source control (TFS) just about every time I open a package it updates the Data Source connection in the package. I either have to roll the change back or check it in with some nonsense description.
I saw this SSIS Best Practice blog entry and it got me thinking about whether Shared Data Sources are really the way to go.
Don't use Data Sources: No, I don't mean data source components. I mean the .ds files that you can add to your SSIS projects in Visual Studio in the "Data Sources" node that is there in every SSIS project you create. Remember that Data Sources are not a feature of SSIS - they are a feature of Visual Studio, and this is a significant difference. Instead, use package configurations to store the connection string for the connection managers in your packages. This will be the best road forward for a smooth deployment story, whereas using Data Sources is a dead-end road. To nowhere.
What are your experiences with data sources, configuration and source control?
We use SVN, so it doesn't integrate in the same way TFS does. When starting out with SSIS, I used Shared Data Sources, but they got me into all sorts of trouble when I finally uploaded the package to run on a schedule. So now I use XML configuration files (package configurations) to provide the connection properties, and I've never had any trouble with these.
So I agree: shared data sources = bad idea/lack of hair
When we were migrating from SSIS 2005 to 2008, data sources were quite painful. Configurations, on the other hand, are pretty flexible, especially if you store them in one database table - that way you can easily change connections with just one UPDATE statement!
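Assuming the default table created by the Package Configuration Wizard (named [SSIS Configurations], with a ConfiguredValue column), repointing a connection might look like this; the filter name and connection string are placeholders:

```sql
-- Repoint every package that reads the 'ProdWarehouse' configuration filter at a new server.
UPDATE dbo.[SSIS Configurations]
SET    ConfiguredValue = 'Data Source=NEWSERVER;Initial Catalog=Warehouse;Integrated Security=SSPI;'
WHERE  ConfigurationFilter = 'ProdWarehouse';
```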

What is the best way to import standalone data into a database?

A little background:
I have a remote, standalone SQL Server database that is truncated at the end of every weekend. The data is hardly relational, not normalized at all, and pretty annoying to work with. On top of that, the schema for this database cannot be modified at all, because it is recreated by a third-party application. Before the database is destroyed each week, a backup is created of that week's data. On average each database will have between 500,000 and 2,000,000 records.
My task is to create a historical version of this database that is a superset of all of these database backups. It should tie into our other databases, which contain related sets of information. I have already started on an application to perform this task, and I've gotten to the point where I'm able to match data with our other databases, but I'm wondering if there's any best practice for handling this kind of import.
How do I make sure that I have unique IDs in my historical version of this database? Are there any features in SQL Server that can do some of the heavy lifting for me?
Thanks for your time on this.
There's definitely a feature in SQL Server that can assist you, and that feature is called SSIS (SQL Server Integration Services). One of the main uses of SSIS is ETL (Extract, Transform, Load): extracting data from several diverse sources, transforming it into whatever you need to get into your destination database (such as a data warehouse - any linking with existing data also happens here), and finally loading it into the destination DB.
I think the best way to get started, if that's what you want of course, is to pick up a good book on SSIS and go through it. While reading, don't forget to play around with the BIDS (Business Intelligence Development Studio - one of the SQL Server tools) to create some test packages.
Furthermore, on the internet you'll find plenty of "getting started" articles.
For your case in particular, what I would do is:
create a generic package that can import the data from a source DB (one of your weekly DBs) and insert it into the destination DB - this package can be parameterized using Parent Package Configuration;
create a main package that loops over all backups in a certain folder, restores them one by one, and calls the generic import package for each restore; after each successful import, the control flow drops the previously restored DB (a rough T-SQL sketch of that restore-and-import step follows).
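A rough T-SQL sketch of what that restore-and-import step could boil down to for one weekly backup. Every name here (backup path, logical file names, tables, and the SourceBatch tag) is a made-up assumption, not something from the question:

```sql
-- Restore one weekly backup under a scratch name.
RESTORE DATABASE WeeklyScratch
FROM DISK = N'C:\Backups\week_2011_09_05.bak'
WITH MOVE N'WeeklyDb'     TO N'C:\Data\WeeklyScratch.mdf',
     MOVE N'WeeklyDb_log' TO N'C:\Data\WeeklyScratch.ldf',
     REPLACE;

-- Copy the rows into the historical superset. The history table has its own IDENTITY
-- surrogate key, and SourceBatch records which weekly database each row came from,
-- so IDs that repeat across the weekly databases stay unique in the history.
INSERT INTO History.dbo.OrdersHistory (SourceOrderId, SourceBatch, CustomerName, OrderDate)
SELECT o.OrderId, 'week_2011_09_05', o.CustomerName, o.OrderDate
FROM   WeeklyScratch.dbo.Orders AS o;

-- Drop the scratch copy once the import succeeds.
DROP DATABASE WeeklyScratch;
```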
I think I've given you enough material to investigate on now :-)
Good luck,
Valentino.
