Creating a Netezza table from a csv file with header

I need to create a table in Netezza from a csv file with a header. The target table should have the same columns as the headers in the source file, and its structure should change flexibly based on the source file's headers. Is this possible?

I'm not sure what your expectations are here. You will need datatypes as well as column names, and short of using NVARCHAR(100) for everything and crossing your fingers, I don't know how to get those from the csv file alone...
If you can get your source system to provide another csv file with metadata in the form COLNAME,DATATYPE(Precision), you can certainly turn that into a valid 'create table' statement in Netezza, and then use something like this to get you the rest of the way:
CREATE EXTERNAL TABLE demo_ext SAMEAS emp USING (dataobject
('/tmp/demo.out') DELIMITER '|');
More info here:
https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.load.doc/c_load_create_external_tbl_expls.html
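To make that route concrete, here's a rough sketch. The table, columns, and file name are all invented, and it assumes the metadata file yielded the two columns below; selecting from the external table is what loads the csv into the real table:
-- DDL generated (by your script) from the metadata csv of COLNAME,DATATYPE(Precision)
CREATE TABLE emp (
    emp_id   INTEGER,
    emp_name VARCHAR(100)
);
-- external table over the data file, then a plain insert to load it
CREATE EXTERNAL TABLE emp_src SAMEAS emp USING (dataobject
('/tmp/demo.csv') DELIMITER ',');
INSERT INTO emp SELECT * FROM emp_src;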

From the question, I'm assuming the requirement is that you need to create a new table / alter it every time, based on your source csv file. Is that right?
@Moutusi Das
As mentioned by Lars G Olsen, the option will be having a separate file for table creation which contains the column names & datatypes. You can use a unix script to read the details from the table-creation file & create/alter the table in Netezza.
Then load the table using the external table command.
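For that load step, here's a minimal sketch using a transient external table; the table name and file path are made up, and the SKIPROWS option (to jump over the csv header row) assumes a Netezza version that supports it:
-- load the csv into the already-created table, skipping the header row
INSERT INTO emp
SELECT * FROM EXTERNAL '/tmp/source.csv'
SAMEAS emp
USING (DELIMITER ',' SKIPROWS 1);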

Related

Is it possible to generate both an FMT and a table from a CSV file?

So I need a way to import CSVs that vary in column names, column order, and number of columns. They will always be CSV and of course comma-delimited.
Is it possible to generate both an FMT file and a temp table creation script from a CSV file?
From what I can gather, you need one or the other. For example, you need the table to generate the FMT file using the bcp utility. And you need the FMT file to dynamically build a create script for a table.
Using just SQL to dynamically load text files, there is no quick way to do this. I see one option:
1. Get the data into SQL Server as a single column (bcp it in, or use T-SQL and OPENROWSET to load it, SSIS, etc.). Be sure to include in this table a second column that is an identity (I'll call it "row_nbr"); you will need it to find the first row, which holds the column names from the file's header.
2. Parse the first record ("where row_nbr = 1") to get the header record. You will need a string-parsing function (find one online, or create your own) to substring out each column name.
3. Build a dynamic SQL statement to create a new table with the number of fields you just parsed out. You must calculate lengths and use a generic "varchar" data type, since you won't know how to type the data. Use the column names found above.
4. Once you have a table created with the correct number of adequately sized columns, you can create the format file.
I assumed, in my answer, that you are comfortable with doing all these things, and just shared the logical flow at a high level. I can add more detail if you need it.
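Here's a rough T-SQL sketch of that flow, just to make the steps concrete. The staging table, target table name, and the VARCHAR(255) sizing are all made up, STRING_SPLIT needs SQL Server 2016+, and a real header parser would also need to handle quoted column names:
-- 1. staging table: one wide column, plus an identity to locate the header row
CREATE TABLE dbo.raw_stage (
    row_nbr INT IDENTITY(1,1),
    line    VARCHAR(MAX)
);
-- (load the whole file into dbo.raw_stage here, via bcp, BULK INSERT, or SSIS)

-- 2. grab the header record
DECLARE @header VARCHAR(MAX) =
    (SELECT line FROM dbo.raw_stage WHERE row_nbr = 1);

-- 3. build and run a CREATE TABLE with generic varchar columns
--    (note: STRING_SPLIT does not guarantee column order without the
--    ordinal argument, which needs SQL Server 2022+)
DECLARE @sql NVARCHAR(MAX) = N'CREATE TABLE dbo.imported (';
SELECT @sql += QUOTENAME(LTRIM(RTRIM(value))) + N' VARCHAR(255), '
FROM STRING_SPLIT(@header, ',');
SET @sql = LEFT(@sql, LEN(@sql) - 1) + N')';
EXEC sp_executesql @sql;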

Mass import txt files in a single SQL Server table, using filename as key column

I have a folder of txt files. The filenames are of the form [integer].txt (like 1.txt, 2.txt and so on).
I have a table, let's say TableA (id int not null, contents varchar(max))
I want a way to mass import the contents of those files into TableA, populating the id column from the filename. Each file will be a single record in the table. It's not a delimited file.
I've looked into SSIS and flat-file source, but I could not find a way to select a folder instead of a single file (this answer claims it can be done, but I could not find out how).
Bulk Insert is my next bet, but I'm not sure how I can populate the id column with the filename.
Any ideas?
For anyone that might need it, I ended up solving this by:
1. Using a ForEach loop container (thanks for the hint, @Panagiotis Kanavos).
2. Using a flat-file source, setting as row and column delimiters a sequence I knew didn't exist in the file (for example '$$$').
3. Assigning the filename to a variable, and the full path to a computed variable (check this great post on how to assign the variables).
4. Using a derived column to pass the filename in the output (check out this answer).
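For anyone who'd rather stay in plain T-SQL, here's a rough sketch of the same idea without SSIS. The folder path and file count are hypothetical, and dynamic SQL is needed because OPENROWSET wants a literal path:
-- import C:\files\1.txt .. C:\files\3.txt; id comes from the filename
DECLARE @id INT = 1, @sql NVARCHAR(MAX);
WHILE @id <= 3
BEGIN
    SET @sql = N'INSERT INTO TableA (id, contents) SELECT '
             + CAST(@id AS NVARCHAR(10))
             + N', BulkColumn FROM OPENROWSET(BULK ''C:\files\'
             + CAST(@id AS NVARCHAR(10))
             + N'.txt'', SINGLE_CLOB) AS f;';
    EXEC sp_executesql @sql;
    SET @id += 1;
END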

Use SSIS to import multiple .csv files that each have unique columns

I keep running into issues creating an SSIS project that does the following:
inspects a folder for .csv files -> for each csv file -> insert into [db].[each csv file's name]
Each csv and its corresponding table in the database have their own unique columns.
I've tried the foreach loop found in many write-ups, but the issue comes down to the flat file connection: it seems to expect each csv file to have the same columns as the file before it, and it errors out when not presented with those column names.
Is anyone aware of a workaround for this?
Every flat file format would have to have its own connection, because the connection is what tells SSIS how to interpret the data set contained within the file. If it didn't exist, it would be the same as telling SQL Server you want data out of a database without specifying a table or its columns.
I guess the thing you have to consider is: how are you going to tell a data flow task which column in a source component maps to which column in a destination component? Will it always be the same column name? Without a Connection Manager there is no way to map the columns unless you do it dynamically.
There are still a few ways you can do what you want and you just need to search around because I know there are answers on this subject.
You could create a Script Task and do the import in .NET.
You could create a SQL Script Task and use BULK INSERT or OPENROWSET into a temporary staging table, and then use dynamic SQL to map and import into the final table.
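A bare-bones sketch of that second option, with invented file, staging, and target names, and assuming a three-column csv whose first row is the header:
-- stage the raw file; the staging table has to match the file's column count
CREATE TABLE #staging (col1 VARCHAR(255), col2 VARCHAR(255), col3 VARCHAR(255));

BULK INSERT #staging
FROM 'C:\import\sales.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

-- then map into the real table; in practice this statement would be built
-- dynamically from whatever metadata tells you the target's columns
EXEC sp_executesql
    N'INSERT INTO dbo.sales (region, amount, sold_on)
      SELECT col1, col2, col3 FROM #staging;';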
Try keeping a mapping table with the below columns:
FileLocation
FileName
TableName
Add all the details in the table, then:
1. Create user variables for all the column names & one for the result set.
2. Read the data from the table using an Execute SQL task & keep it in the single result-set variable.
3. In the For Each Loop container's variable mappings, map all the columns to the user variables.
4. Create two Connection Managers, one for Excel & the other for the csv file.
5. Pass the CSV file connection string as @[User::FileLocation] + @[User::FileName].
6. Inside the For Each Loop container, use bulk insert & assign the source & destination connections, as well as the table name, from the @[User::TableName] variable.
If you need any details, please post and I will try to help.
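As a sketch, the mapping table itself (all names here are invented) could be as simple as:
-- one row per file/table pair; the For Each Loop walks these rows
CREATE TABLE dbo.file_table_mapping (
    FileLocation VARCHAR(260),   -- e.g. 'C:\imports\'
    FileName     VARCHAR(260),   -- e.g. 'sales.csv'
    TableName    VARCHAR(128)    -- e.g. 'dbo.sales'
);

INSERT INTO dbo.file_table_mapping VALUES
    ('C:\imports\', 'sales.csv',     'dbo.sales'),
    ('C:\imports\', 'customers.csv', 'dbo.customers');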
You could look into BiML Script, which dynamically creates and executes a package, based on available meta data.
I've got 2 options for you here.
1) A Script Component, to dynamically create table structures in SQL Server.
2) With a For Each Loop container, use an Execute SQL Task with an OPENROWSET clause.
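For the second option, the statement behind the Execute SQL Task might look like this sketch (the file, format file, and table names are hypothetical; the format file is what lets each csv keep its own layout):
INSERT INTO dbo.sales
SELECT *
FROM OPENROWSET(
    BULK 'C:\imports\sales.csv',
    FORMATFILE = 'C:\imports\sales.fmt',
    FIRSTROW = 2      -- skip the header row
) AS src;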

SSIS Dynamic Mapping column

I'm a little new to SSIS and I need to import some flat files into SQL tables with the same structure.
(Assume the table already exists with the same structure, and the table name and flat file name are the same.)
I thought to create a generic package (SQL 2014) to import all those files by looping through a folder.
I tried to create a data flow task in a foreach loop container; in the data flow task I dropped a flat file source and an ADO.NET destination.
I have set the file source to a variable so that every time it loops through, it gets the new file. Similarly, I set the ADO.NET table name to a variable so that each time it selects a different table according to the file name.
Since both the source column names and destination column names are the same, I assumed it would map the columns automatically.
But with a simple map it didn't let me run the package, so I added a column on the source, selected a table, and mapped it.
When I ran the package I assumed it would automatically re-map everything, but it ran for the first file and failed for the second, complaining about mapping issues.
Can someone let me know whether this is achievable by doing some dynamic mapping, or any other way?
Any help would be much appreciated.
Thanks,
Ned

Need to map csv file to target table dynamically

I have several CSV files, and their corresponding tables (which have the same columns as the CSVs, with appropriate datatypes) exist in the database with the same names as the CSVs. So, every CSV has a table in the database.
I somehow need to map all of them dynamically. Once I run the mapping, the data from all the csv files should be transferred to the corresponding tables. I don't want a different mapping for every CSV.
Is this possible through informatica?
Appreciate your help.
PowerCenter does not provide such feature out-of-the-box. Unless the structures of the source files and target tables are the same, you need to define separate source/target definitions and create mappings that use them.
However, you can use Stage Mapping Generator to generate a mapping for each file automatically.
My understanding is that you have many CSV files with different column layouts and you need to load them into the appropriate tables in the database.
Approach 1: If you use any RDBMS, you should have some kind of import option; explore that route to create the tables based on the csv files. This is a manual task.
Approach 2: Open the csv file and write a formula that uses the header to generate a CREATE TABLE statement. Execute the formula's result in your DB, and you will have the many tables created. Then use Informatica to read each CSV and load it into its table.
Approach 3: Using Informatica alone. You need to do a lot of coding to create a dynamic mapping on the fly.
Proposed solution:
Mapping 1:
1. Read the CSV file and pass the header information to a Java transformation.
2. The Java transformation should normalize and split the header column into rows; you can write them to a text file.
3. Now you have all the columns in a text file. Read this text file and use a SQL transformation to create the tables in the database.
Mapping 2:
Now that the table is available, read the CSV file excluding the header and load the data into the table created by mapping 1, via a SQL transformation (insert statement).
You can follow this approach for all the CSV files. I haven't tried this solution at my end, but I am sure the above approach would work.
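To illustrate what mapping 1's SQL transformation might emit, take a hypothetical file emp.csv with the header emp_id,emp_name,salary; since a header alone carries no datatypes, every column falls back to a generic string type:
-- generated from the header row of emp.csv; sizes are arbitrary
CREATE TABLE emp (
    emp_id   VARCHAR(255),
    emp_name VARCHAR(255),
    salary   VARCHAR(255)
);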
If you're not using any transformations, it's wise to use the import option of the database (e.g. a bteq script in Teradata). But if you are doing transformations, then you have to create as many sources and targets as the number of files you have.
On the other hand, you can achieve this in one mapping.
1. Create a separate flow for every file (i.e. Source-Transformation-Target) in the single mapping.
2. Use the target load plan to choose which file gets loaded first.
3. Configure the file names and corresponding database table names in the session for that mapping.
If all the mappings (if you have to create them separately) are the same, use the Indirect File method. You will find this option in the session properties, under the Mappings tab, in the source options; the default is Direct, change it to Indirect.
I don't have the tool at hand now to explore further and guide you clearly, but look into this Indirect file load type in Informatica. I am sure it will solve the requirement.
I have written a workflow in Informatica that does it, but some of the complex steps are handled inside the database. The workflow watches a folder for new files. Once it sees all the files that constitute a feed, it starts to process the feed. It takes a backup in a time stamped folder and then copies all the data from the files in the feed into an Oracle table. An Oracle procedure gets to work and then transfers the data from the Oracle table into their corresponding destination staging tables and finally the Data Warehouse. So if I have to add a new file or a feed, I have to make changes in configuration tables only. No changes are required either to the Informatica Objects or the db objects. So the short answer is yes this is possible but it is not an out of the box feature.
