I have a requirement at work where we have to move large CSV/Excel data files from a vendor FTP server to our local server. I have been researching this and am thinking of using the SSIS FTP Task for the job. Can the SSIS FTP Task handle large data files? How large a file can it support?
Any recommendations you can think of for transferring large files would also help.
Any suggestions in this regard would be appreciated; they will help us decide on the right way to meet this requirement.
Thanks in advance.
We are evaluating whether to go with SSIS or another tool that can help us accomplish this task.
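For comparison, this is roughly the plain-Python FTP pull we would otherwise script ourselves; the host name, credentials, and paths below are placeholders, not our actual setup.

    # Minimal sketch of a plain-Python FTP download as an alternative to the
    # SSIS FTP Task. Host, credentials, and paths are placeholders.
    import ftplib

    FTP_HOST = "ftp.vendor.example.com"           # placeholder
    REMOTE_FILE = "/outgoing/daily_extract.csv"   # placeholder
    LOCAL_FILE = r"D:\inbound\daily_extract.csv"  # placeholder

    with ftplib.FTP(FTP_HOST, user="ftp_user", passwd="ftp_password") as ftp:
        ftp.set_pasv(True)  # passive mode is usually friendlier to firewalls
        with open(LOCAL_FILE, "wb") as out:
            # retrbinary streams the file in blocks, so a multi-GB file never
            # has to fit in memory all at once.
            ftp.retrbinary(f"RETR {REMOTE_FILE}", out.write, blocksize=1024 * 1024)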
Could anyone help me with the following scenario? Any insight would be greatly appreciated.
I have a PostgreSQL database running on an instance, and I want to transfer the same setup, with data, to another instance. The requirement is to retire the original instance after the transfer. The course of action should include copying the millions of records stored in the database, the Postgres log files, the configuration files, and any other necessary files. What is my best course of action? Can Docker be used in this situation?
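For what it's worth, one route I have been looking at is a plain dump-and-restore, roughly as sketched below; the host names, database name, and credentials are placeholders, and the Postgres configuration files and logs would still have to be copied separately (e.g. with rsync).

    # Rough dump-and-restore sketch between two Postgres instances.
    # Hosts, database name, and credentials are placeholders; postgresql.conf,
    # pg_hba.conf, and log files still need to be copied separately.
    import subprocess

    SRC = {"host": "old-instance.example.com", "db": "appdb", "user": "postgres"}
    DST = {"host": "new-instance.example.com", "db": "appdb", "user": "postgres"}
    DUMP_FILE = "appdb.dump"

    # Custom-format dump of the source database (schema + data).
    subprocess.run(
        ["pg_dump", "-Fc", "-h", SRC["host"], "-U", SRC["user"],
         "-d", SRC["db"], "-f", DUMP_FILE],
        check=True,
    )

    # Restore into a pre-created, empty database on the new instance.
    subprocess.run(
        ["pg_restore", "-h", DST["host"], "-U", DST["user"],
         "-d", DST["db"], "--no-owner", DUMP_FILE],
        check=True,
    )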
What are the steps to migrate a historical data load from Teradata to Snowflake?
Imagine there is 200 TB+ of historical data combined across all tables.
I am thinking of two approaches, but I don't have enough expertise or experience to execute them, so I am looking for someone to fill in the gaps and offer some suggestions.
Approach 1: Using TPT/FEXP scripts
I know that TPT/FEXP scripts can be written to generate files for a table. How can I create a single script that generates files for all the tables in the database? (Creating 500-odd individual scripts is impractical; the kind of templating I have in mind is sketched after these steps.)
Once this script is ready, how is it executed in practice? Do we create a shell script and schedule it through an enterprise scheduler like Autosys or Tidal?
Once these files are generated, how do we split them on a Linux machine if each file is huge (the recommended file size for loading into Snowflake is roughly 100-250 MB)?
How to move these files to Azure Data Lake?
Use COPY INTO / Snowpipe to load into Snowflake Tables.
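To make the first point above concrete, this is the kind of templating I have in mind for generating one export script per table; the table list, template text, and paths below are made up, and the template body is a placeholder rather than tested TPT/FEXP syntax.

    # Generate one export script per table from a template instead of
    # hand-writing 500-odd scripts. Table list, template, and paths are
    # placeholders; the template body is not validated TPT/FEXP syntax.
    from pathlib import Path

    TABLES = ["SALES.ORDERS", "SALES.ORDER_ITEMS", "HR.EMPLOYEES"]  # e.g. pulled from DBC.TablesV

    TEMPLATE = """\
    /* auto-generated export for {table} */
    .EXPORT OUTFILE {outfile}
    SELECT * FROM {table};
    """

    out_dir = Path("generated_scripts")
    out_dir.mkdir(exist_ok=True)

    for table in TABLES:
        name = table.replace(".", "_")
        script = TEMPLATE.format(table=table, outfile=f"/data/exports/{name}.csv")
        (out_dir / f"export_{name}.fexp").write_text(script)

    # A wrapper shell script scheduled via Autosys/Tidal could then loop over
    # generated_scripts/ and invoke fexp/tbuild on each file.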
Approach 2
Using ADF copy activity to extract data from Teradata and create files in ADLS.
Use COPY INTO/ Snowpipe to load into Snowflake Tables.
Which of these two is the recommended approach?
In general, what are the challenges faced in each of these approaches?
Using ADF will be a much better solution. It also allows you to design the data lake as part of your solution.
You can design a generic solution that imports all the tables provided in a configuration. For this you can choose the recommended file format (Parquet), the target file size, and the degree of parallel loading.
The main challenge you will probably encounter is a poorly working ADF connector for Snowflake. Here you will find my recommendations on how to work around the connector problem and how to use Data Lake Gen2:
Trouble loading data into Snowflake using Azure Data Factory
More recommendations on how to structure Azure Data Lake Storage Gen2 can be found here: Best practices for using Azure Data Lake Storage Gen2
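As a rough illustration of the final load step (the account, stage, table, and path below are placeholders, not a tested setup), the COPY INTO can be issued from any client that can open a Snowflake session, for example:

    # Illustrative COPY INTO from an external stage pointing at ADLS Gen2.
    # Account, credentials, warehouse, stage, and table names are placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account",      # placeholder
        user="loader_user",        # placeholder
        password="********",
        warehouse="LOAD_WH",
        database="EDW",
        schema="STAGING",
    )

    copy_sql = """
        COPY INTO STAGING.ORDERS
        FROM @ADLS_STAGE/teradata/orders/
        FILE_FORMAT = (TYPE = PARQUET)
        MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
    """

    cur = conn.cursor()
    try:
        cur.execute(copy_sql)
        print(cur.fetchall())  # one result row per loaded file
    finally:
        cur.close()
        conn.close()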
I have a remote XML file, which is zipped (approx. 100 MB in size). I need to download, extract, read, parse, and import it into SQL Server.
Before I start coding this solution (in Python), is there any ready-made utility that could do this? Note that it needs to run on a schedule, preferably as a service, or via the Windows Task Scheduler.
What's really important is that it needs to be really fast!
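In case it helps frame the question, this is roughly the Python pipeline I would otherwise write myself; the URL, element names, table, and connection string below are placeholders.

    # Rough download -> unzip -> parse -> load pipeline. URL, element names,
    # table, and connection string are placeholders.
    import io
    import zipfile
    import urllib.request
    import xml.etree.ElementTree as ET
    import pyodbc  # assumes pyodbc and a SQL Server ODBC driver are installed

    URL = "https://example.com/feed/data.zip"  # placeholder
    CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
                "SERVER=myserver;DATABASE=Staging;Trusted_Connection=yes;")

    # Download and unzip in memory (the archive is ~100 MB, so this is manageable).
    with urllib.request.urlopen(URL) as resp:
        archive = zipfile.ZipFile(io.BytesIO(resp.read()))
    xml_bytes = archive.read(archive.namelist()[0])

    # Parse the XML and batch-insert the rows.
    root = ET.fromstring(xml_bytes)
    rows = [(item.findtext("Id"), item.findtext("Name")) for item in root.iter("Item")]

    conn = pyodbc.connect(CONN_STR)
    cur = conn.cursor()
    cur.fast_executemany = True  # speeds up the executemany batch considerably
    cur.executemany("INSERT INTO dbo.ImportedItems (Id, Name) VALUES (?, ?)", rows)
    conn.commit()
    conn.close()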
Thank you,
Giorgoc
Following on from my comments:
You can quite easily do this with SSIS.
Download the remote XML file... Here is an example: How to read a remote xml file from SQL Server 2005
You can then transform the data if needed by using transformation tasks.
Then use SSIS to load the XML data into the SQL Server DB... Here is an example: How to load an XML file into a database using an SSIS package?
Hope these point you in the right direction and help you with your task.
I'm a beginner, trying to improve my knowledge on the database side.
I am learning SSIS with SQL Server 2008 R2. Going by tutorials on the web, it feels somewhat similar to what I've read about Oracle Data Pump.
Can someone enlighten me as to whether there is any similarity between SSIS and Data Pump?
If they are totally different, please forgive me for this question. Otherwise, let me know how they are similar.
Regards,
Justin
Data Pump is not a complete ETL tool; it is a feature in Oracle 11g. It transfers data from a single source to a single destination. With SSIS you get all the extraction, transformation, and loading facilities.
The Oracle counterpart to SSIS is Oracle Warehouse Builder.
Oracle Data Pump is roughly the equivalent of the export and import utilities in SQL Server.
I have never heard of Data Pump, but some initial googling shows it is closer to a Data Flow Task within an SSIS package than a substitute for a whole SSIS package. Data Pump simply ports data from a single source to a single destination. An SSIS package can facilitate extracting, transforming, and loading any number of sources to any number of destinations within the same package. You also get the extensibility (if that is a word?) of writing .NET code or using any other 3rd-party assemblies to further manipulate the data. You can also do file and DB maintenance with an SSIS package (cleaning up after processing files, maintaining backups, etc.).
I need to be able to extract and transform data from a data source on a client machine and ship it off via a web service call to be loaded into our data store. I would love to be able to leverage SSIS, but the SQL Server licensing agreement prevents me from installing Integration Services on a client machine. Can I just provide the client with copies of the Integration Services assemblies to be referenced by my app? Does anyone have ideas on how best to implement a solution to this problem, apart from building a custom solution from the ground up? Ideally the solution would leverage an existing ETL tool.
Thanks for your suggestions.
If you are providing your client with a service around their data, you should define a standard format they need to deliver their data in, and negotiate a delivery method for that file well before you ever consider what to do with SSIS. Since the comments indicate that your data is on a machine at a client's remote location, the most common method I have seen is either having the client SecureFTP a file into your network for processing, or having a job on your end that fetches the file using SecureFTP. Once you have the file on your network, writing the SSIS package to process it is trivial.
If the server can reach out to the client machine, then you can just run the SSIS package on the server. What kind of data are you moving? If it's a flat file, you could FTP it to the server.
Another way to go about this is to use BCP. I'm not a big fan of this approach (SSIS is much faster, more robust, etc), but it can work in a pinch.
http://msdn.microsoft.com/en-us/library/ms162802.aspx
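For illustration only, a bcp import wrapped in a small script might look like the sketch below; the server, database, table, and file path are placeholders.

    # Sketch of invoking bcp from a script. Server, database, table, and file
    # path are placeholders.
    import subprocess

    cmd = [
        "bcp", "MyDatabase.dbo.ClientData", "in", r"C:\drops\client_data.csv",
        "-S", "myserver",  # target SQL Server instance
        "-T",              # trusted (Windows) authentication
        "-c",              # character data
        "-t,",             # comma field terminator
        "-F2",             # skip the header row
    ]
    subprocess.run(cmd, check=True)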