I am using Oracle ODI 11.1.1.7.
I have six pipe-delimited files. Each file has a different number of columns, but the number of columns within each file is fixed, so I know the format of each. I want to load all of these files into a single table in the database.
I can create an ODI process with steps in sequential order and call the interfaces created for these files to accomplish the task.
Is there a better way to do this, such as creating only one interface that can work with all of these files? Something I could do through a loop?
Thanks in advance.
Unfortunately, as these files have different structures (different numbers of columns), you will need a different source datastore for each of them, and therefore different interfaces.
If the structure were the same, you could use only one datastore and one interface. You would need to use a variable as the filename in the datastore definition and create a loop in a package that would change the value of the variable and execute the interface loading the file.
I am working on an Azure Logic App based solution to export data from a database to a CSV file.
So far, I am able to do the following:
Use SQL database connector and execute the stored procedure to retrieve data
Create a CSV file
Use FTP connector to upload the file to an FTP server
With this approach, the issue is going to be data size. I am trying to figure out ways to handle large data exports. Here are the different approaches I can think of:
Option 1: Use pagination in the stored procedure and collect the data iteratively.
Question 1.1: Each loop iteration in the Logic App will generate its own CSV file. How do I combine the data from each iteration into a single CSV file?
Question 1.2: I am thinking of using an array variable to collect the data from each iteration and then create the CSV file from that array. Will I run into any issues if the array becomes too large?
Option 2: I read somewhere that you can overcome the data size issue with chunking.
Question: I am not even sure whether the database connector supports chunking, and will I be able to export the data into a single CSV file?
Option 3: Create multiple CSV files and then merge them into one CSV file.
Question: Is it possible to do that in Logic Apps, or will I need to implement an Azure Function to handle the merging of the CSV files?
It's hard to weigh the different options without knowing the amount of data you are talking about.
1 - Yes, you can use pagination. I would recommend using a stored procedure in your SQL Server that takes the page size and page number as parameters and returns that page of data: https://social.technet.microsoft.com/wiki/contents/articles/40060.sql-pagination-for-bulk-data-transfer-with-logic-apps.aspx
1.1 - By default, a for-each in Logic Apps is executed in parallel; check the For each settings and change the concurrency control if you need sequential runs: https://learn.microsoft.com/sv-se/azure/logic-apps/logic-apps-control-flow-loops#foreach-loop-sequential
The data returned from the SP can be converted to CSV by using the "Create CSV table" action:
https://learn.microsoft.com/sv-se/azure/logic-apps/logic-apps-perform-data-operations#create-csv-table-action
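On Option 3: the merge itself is simple enough that it could live in a small Azure Function rather than in the Logic App designer. Below is a minimal sketch in Python, under the assumptions that each page has already been written out as its own CSV chunk and that every chunk repeats the same header row; the file names and paths are placeholders.

import csv
import glob

def merge_csv_chunks(chunk_pattern: str, output_path: str) -> None:
    """Concatenate CSV chunk files into one file, keeping a single header row."""
    chunk_files = sorted(glob.glob(chunk_pattern))
    with open(output_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        header_written = False
        for chunk in chunk_files:
            with open(chunk, newline="", encoding="utf-8") as in_file:
                reader = csv.reader(in_file)
                header = next(reader, None)   # every chunk repeats the header
                if header is not None and not header_written:
                    writer.writerow(header)
                    header_written = True
                writer.writerows(reader)      # append only the data rows

# Example usage with placeholder names:
# merge_csv_chunks("export_page_*.csv", "export_full.csv")

In a real setup the chunks would more likely be read from and written to blob storage rather than local disk, but the merge logic itself would not change.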
Before I begin, I would like to express my appreciation for all of the insight I've gained on Stack Overflow and for everyone who contributes. I have a general question about managing large numbers of files, and I'm trying to determine my options, if any. Here goes.
Currently, I have a large number of files and I'm on Windows 7. What I've been doing is categorizing the files by copying them into folders based on what needs to be processed together. So, I have one set that contains the files by date (for long-term storage) and another that contains the copies by category (for processing and calculations). Of course, this doubles my data each time. Now I'm having to create more than one set of categories; three copies, to be exact. This is quadrupling my data.
For the processing side of things, the data ends up in Excel. Originally, all the data was brought into Excel, and all organization and filtering was performed there. This was time-consuming and not easily maintainable over the long term. Later the workload was shifted to the file system itself, which lightened the work in Excel.
The long and short of it is that this is an extremely inefficient use of disk space. What would be a better way of handling this?
Things that have come to mind:
Overlapping Folders
Is there a way to create a folder that only holds references to files, rather than copies of the files? That way I could have two folders reference the same file.
To my understanding, a folder is a file listing the memory addresses of the files inside of it, but on Windows a file can only be contained in one folder.
Microsoft SQL Server
Not sure what could be done here.
Symbolic Links
I'm not an administrator, so I cannot execute the mklink command.
Also, I'm uncertain about any performance issues with this.
A Junction
Apparently junctions are only allowed for folders, not individual files, on Windows.
Search folders (*.search-ms)
Maybe I'm missing something, but to my knowledge there is no way to specify individual files to be listed.
Hashing the files
Creating hashes for all the files would allow each file to be stored only once, but then I have no idea how I would manage the hashes.
XML
Maybe I could use XML files to attach metadata to the files and somehow search using them.
Database File System
I recently came across this concept in my search. Not sure how it would apply to Windows.
I have found a partial solution. First, I discovered that the laptop I'm using is actually logged in as Administrator. As an alternative to options 3 and 4, I have decided to use hard-links, which are part of the NTFS file system. However, due to the large number of files, this is unmanageable using the following command from an elevated command prompt:
mklink /H <new\link> <existing\file>
Luckily, Hermann Schinagl has created the Link Shell Extension application for Windows Explorer, along with a very insightful write-up on how junctions, symbolic links, and hard links work. The only reason this is currently a partial solution is a separate problem with Windows Explorer, which I intend to post as a separate question. Thank you, Hermann.
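For anyone who would rather script the bulk creation than use a shell extension, the same thing mklink /H does can be driven from a few lines of Python (os.link creates NTFS hard links). This is only a sketch: the folder names are placeholders, and it assumes the category folder sits on the same NTFS volume as the originals, which hard links require.

import os

def link_into_category(source_dir: str, category_dir: str) -> None:
    """Create hard links in category_dir for every file in source_dir,
    so the file data itself is stored only once."""
    os.makedirs(category_dir, exist_ok=True)
    for entry in os.scandir(source_dir):
        if entry.is_file():
            link_path = os.path.join(category_dir, entry.name)
            if not os.path.exists(link_path):
                # equivalent to: mklink /H <link_path> <entry.path>
                os.link(entry.path, link_path)

# Placeholder paths:
# link_into_category(r"D:\Archive\2013-05", r"D:\Categories\Calculations")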
Is there a way to automatically generate SSIS packages? I need to create a lot of SSIS packages that just erase data from one table and import data from a text file. The file name matches the table name, and the column headers are in the first line of the file.
For more detailed information:
I am working on a project in which I have to separate two systems that are currently coupled (one system has direct access to the other's database). After the modifications, one system will provide data through text files to be loaded into the other database.
We have to use SSIS to load data into the database from the text files.
The text files will be provided in CSV format with column headers in the first line.
The tables from both databases have matching column names, and all we need to do is clear the table and load data from the files.
I have more than one hundred tables with different numbers of columns. Do I need to create each package manually?
I'm familiar with 2 free options.
EzAPI might be a good place to start if you're a .NET-heavy shop or just really want to geek out with the API. This approach allows you to control pretty much the entire package generation, but at the cost of coding time. I find EzAPI generally easier than working with the base COM/.NET libraries for SSIS.
Biml is an interesting beast. Varigence will be happy to sell you a license to Mist, but it's not needed. All you would need is BIDSHelper; then browse through BimlScript and look for a recipe that approximates your needs. Once you have that, click the context-sensitive menu button in BIDSHelper and, whoosh, it generates packages.
I did this just using VB. I passed in the table names as a command parameter and used VB to generate the insert and the clear, and it worked a charm. I can try to dig it out tomorrow when I'm back in the office, but it was pretty simple. There didn't seem to be any other way to say "just get x and export it" or "just take y and import it into z", so VB it had to be. In fact, come to think of it, I think I actually used a small XML file to pass the table info for the export and then determined the table name for the import from the CSV file name. To be clear, this was only one package, but it could dynamically choose the number of imports/exports it did. Further clarification: this was VB within SSIS, as a processing step.
I want to export all my queries as individual files so I can put them into Mercurial source control, but I don't know how to export the individual queries as individual files without having to open each one, save it to the folder, and then add it into the project, or some equally convoluted process.
I wouldn't mind having to add each one individually, but how do I get them out of the database as individual files without opening them all and doing a Save As on each one? Ideally I would like them named with the names they have in the database right now.
I could easily dump the whole lot into one long file using database tasks, but that's not really super helpful, is it?
I have SSMS 2k5 and 2k8 (and VS 2k5, 2k8, and 2010 to boot) to work with; any thoughts?
Right-click on the database and select Generate Scripts. On the last page, under Script to File, you can choose a single file or one file per object.
When you script a database in SSMS you have the option of one file per object.
SMO is useful if you want to write a small app to iterate through the objects.
Third-party tools like Red Gate SQL Compare (there are also free tools) can script too.
I would write a small C# program which extracts your database objects via SMO and stores them in your filesystem the way you want.
It is also rather easy to write stored procedures which fetch the definitions as text; sp_helptext could be used as a start.
Then you can use PowerShell to write the output to the file system.
It sounds as if this would fit rather well into the Really Simple Data Dictionary CodePlex project.
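As a rough sketch of that fetch-and-write idea, here is the same approach using Python and pyodbc in place of PowerShell, and OBJECT definitions from sys.sql_modules rather than sp_helptext; the connection string, output folder, and object-type filter are assumptions you would adjust.

import pyodbc
from pathlib import Path

# Placeholder connection string -- adjust driver, server, database, authentication.
CONN_STR = (
    "DRIVER={SQL Server};"
    "SERVER=localhost;DATABASE=MyDatabase;Trusted_Connection=yes;"
)

def export_definitions(output_dir: str) -> None:
    """Write each stored procedure, view, and function to its own <name>.sql file."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    with pyodbc.connect(CONN_STR) as conn:
        cursor = conn.cursor()
        cursor.execute(
            "SELECT o.name, m.definition "
            "FROM sys.sql_modules m "
            "JOIN sys.objects o ON o.object_id = m.object_id "
            "WHERE o.type IN ('P', 'V', 'FN', 'IF', 'TF')"
        )
        for name, definition in cursor.fetchall():
            (out / f"{name}.sql").write_text(definition or "", encoding="utf-8")

# export_definitions(r"C:\scripts\MyDatabase")

From there the folder of .sql files can be added to Mercurial in one go.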
I'm working on a school project. The program is supposed to read from a text file that has a record about a song on every line, with fields separated by ";".
Anyway, I have no knowledge of databases, and I just want the quickest way to create a database from that text file. I will also need to change some of the fields of the records once in a while from the program, and the program needs to search through the database based on certain fields.
So far none of our projects kept any data, so when we closed the program, all the information was gone; now I actually need to keep some information for the next time the program runs. What's the fastest way to accomplish this?
I also want to be able to keep some information about the software itself, like the path of the original text file for weekly updates. Where can I save information like that?
EDIT: it doesn't have to be an actual database, as long as I can search and edit it efficiently.
If you can use a SQL database, I'd suggest the simple file-based database SQLite.
With SQLite, you can query, insert and update records by executing regular SQL statements.
Here you will find an introduction to the C++ interface. It's easy to embed SQLite support in an application because SQLite comes as a library, meaning a few header files and one or two binary archives.
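To show the shape of that workflow, here is a minimal sketch using Python's built-in sqlite3 module; the SQLite C++ interface follows the same open / execute / query pattern. The database file name and the song fields (title, artist, album, year) are assumptions, so adjust them to your record layout.

import sqlite3

def load_songs(db_path: str, text_path: str) -> None:
    """Create the database from the semicolon-delimited text file."""
    conn = sqlite3.connect(db_path)  # creates the file if it does not exist
    conn.execute(
        "CREATE TABLE IF NOT EXISTS songs ("
        "  title TEXT, artist TEXT, album TEXT, year TEXT)"  # assumed fields
    )
    with open(text_path, encoding="utf-8") as f:
        # assumes each non-empty line has exactly these four fields
        rows = [line.rstrip("\n").split(";") for line in f if line.strip()]
    conn.executemany("INSERT INTO songs VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    conn.close()

def find_by_artist(db_path: str, artist: str):
    """Search on a field, as the program needs to do."""
    conn = sqlite3.connect(db_path)
    results = conn.execute(
        "SELECT title, artist, album, year FROM songs WHERE artist = ?", (artist,)
    ).fetchall()
    conn.close()
    return results

# load_songs("songs.db", "songs.txt")
# print(find_by_artist("songs.db", "Some Artist"))

Changing a field once in a while is just an UPDATE statement, and settings such as the path of the original text file can be kept in a small extra table in the same database file.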
Your delimited text file is already a database. You can add records, delete records, and modify records using the standard text-file routines provided by the C++ standard library.
Alternatively, you can import your text file into SQL Server using BULK INSERT.
Finally, you can access your CSV (comma-delimited text) file using SQL queries. You need to find the correct connection string. See http://www.connectionstrings.com/textfile.