Loading columns selectively from TSV in Redshift

I am loading a TSV file from S3 into a Redshift table. The TSV file has 10 columns.
I want to load only columns 2 and 5 from that file.
Previously I was using the COPY command to populate the table. Can I selectively specify that I want only columns 2 and 5?

You cannot do this (currently) with a COPY command. The target table and the TSV file being loaded need to have the same columns. You can COPY into a temporary table with all the columns and then insert the needed data into your table, as sketched below.
If this doesn't work for you, then define an external table that points to the TSV file and insert from the external table into your target.
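A minimal sketch of the staging approach, assuming an IAM role for the COPY; the table names, column types, bucket path, and role ARN are all placeholders:
-- Staging table matching all 10 columns of the TSV (types are assumptions)
CREATE TEMP TABLE staging_all (
    col1 VARCHAR(256), col2 VARCHAR(256), col3 VARCHAR(256), col4 VARCHAR(256),
    col5 VARCHAR(256), col6 VARCHAR(256), col7 VARCHAR(256), col8 VARCHAR(256),
    col9 VARCHAR(256), col10 VARCHAR(256)
);
-- Load the whole tab-delimited file from S3
COPY staging_all
FROM 's3://my-bucket/path/file.tsv'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
DELIMITER '\t';
-- Keep only the two columns you care about
INSERT INTO target_table (col_a, col_b)
SELECT col2, col5
FROM staging_all;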

Related

BULK INSERT not inserting properly from CSV

I am trying to use BULK INSERT to add rows to an existing table from a .csv file. For now I have a small file for testing purposes with the following formatting:
UserID,Username,Firstname,Middlename,Lastname,City,Email,JobTitle,Company,Manager,StartDate,EndDate
273,abc,dd,dd,dd,dd,dd,dd,dd,dd,dd,dd
274,dfg,dd,dd,dd,dd,dd,dd,dd,dd,dd,dd
275,hij,dd,dd,dd,dd,dd,dd,dd,dd,dd,dd
And this is what my query currently looks like:
BULK INSERT DB_NAME.dbo.Users
FROM 'C:\data.csv'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
When I execute this query it returns 1 row affected. I checked the table and noticed that all the data in the file was inserted as a single row.
What could be causing this? What I am trying to accomplish is to insert those rows as individual rows in the table.
The first column is actually an IDENTITY column, so in the file I just specified an integer, even though it will be overwritten by the auto-generated ID, as I am not sure how to tell the query to start inserting from the second field yet.
There are more columns in the actual table than in the file, as not everything needs to be filled. Could that be causing it?
The problem is that you are loading data into the first column. To skip a column, create a view over your table with just the columns you want to load and BULK INSERT into the view. See the example below (from MSDN: https://msdn.microsoft.com/en-us/library/ms179250.aspx):
USE AdventureWorks2012;
GO
-- View exposing only the columns to be loaded
CREATE VIEW v_myTestSkipCol AS
SELECT Col1, Col3
FROM myTestSkipCol;
GO
-- Load through the view; the format file maps the file fields to Col1 and Col3
BULK INSERT v_myTestSkipCol
FROM 'C:\myTestSkipCol2.dat'
WITH (FORMATFILE = 'C:\myTestSkipCol2.xml');
GO
What I would recommend you do instead is to create a staging table which matches the file exactly. Load the data into that, and then use an INSERT statement to copy it into your permanent table, as sketched below. This approach is much more robust and flexible. For example, after loading the staging table you can perform data validation or cleanup before loading the permanent table.
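A minimal sketch of that approach, reusing the file layout from the question; dbo.Users_Staging and the column widths are assumptions:
-- Staging table matching the CSV layout exactly (widths are assumptions)
CREATE TABLE dbo.Users_Staging (
    UserID INT, Username VARCHAR(50), Firstname VARCHAR(50),
    Middlename VARCHAR(50), Lastname VARCHAR(50), City VARCHAR(50),
    Email VARCHAR(100), JobTitle VARCHAR(50), Company VARCHAR(50),
    Manager VARCHAR(50), StartDate VARCHAR(20), EndDate VARCHAR(20)
);
BULK INSERT dbo.Users_Staging
FROM 'C:\data.csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');
-- Copy into the permanent table, letting its IDENTITY column generate UserID
INSERT INTO dbo.Users (Username, Firstname, Middlename, Lastname, City,
                       Email, JobTitle, Company, Manager, StartDate, EndDate)
SELECT Username, Firstname, Middlename, Lastname, City,
       Email, JobTitle, Company, Manager, StartDate, EndDate
FROM dbo.Users_Staging;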

How to perform bulk insert when we have identity column in the table

I excluded the values of the table's identity column from the text file I used to load. This resulted in an error while loading. Please let me know how to deal with this scenario.
Your source CSV file should include all columns, even the identity column.
The destination table's identity column will generate its own values when you bulk load, and the values in the corresponding column of your source CSV file will be ignored unless you specify the KEEPIDENTITY option.
See BULK INSERT (Transact-SQL)
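For example (dbo.MyTable and C:\data.txt are placeholders):
-- Keep the identity values that are in the file
BULK INSERT dbo.MyTable
FROM 'C:\data.txt'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', KEEPIDENTITY);
-- Without KEEPIDENTITY the identity column is regenerated by the table,
-- but the file must still contain a (dummy) value in that position
BULK INSERT dbo.MyTable
FROM 'C:\data.txt'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');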

Is it possible to append a column from a file to an existing table in MonetDB?

Is it possible in MonetDB to append a column from a file to an existing table, or does the table have to be recreated? Also, is it possible to drop a single column from a table?
You could load your data into a new table with record numbers in a column and then inner join, as sketched below.
And yes, dropping a column (ALTER TABLE ... DROP COLUMN) is supported.
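A minimal sketch, assuming an existing table old_t that already carries a record-number column rn; the table names, column types, and file path are placeholders:
-- New table holding the extra column, keyed by record number
CREATE TABLE new_col (rn INT, extra VARCHAR(100));
-- MonetDB bulk load from a comma-delimited file
COPY INTO new_col FROM '/tmp/extra.csv' USING DELIMITERS ',', '\n';
-- Combine the original table with the appended column
CREATE TABLE combined AS
SELECT o.*, n.extra
FROM old_t o
INNER JOIN new_col n ON o.rn = n.rn
WITH DATA;
-- Dropping a single column is also supported
ALTER TABLE combined DROP COLUMN extra;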

update a table in SSIS periodically

How do I do regular updates to a database table in SSIS? The table has foreign key constraints.
I have a package running every week, and I have to update the data in the table from a flat file. Most of the contents are the same, with some updated values and some new rows.
UPDATE: My data file contains updated contents (some rows missing, some rows added, some modified). The data file does not have the primary keys (I create the primary keys when I first bulk insert the data from the data file); on subsequent SSIS package runs, I need to update the table with the new data file contents.
e.g.
table
---------------------------------------------
1 Mango $0.99
2 Apple $0.59
3 Orange $0.33
data file
---------------------------------------------
Mango 0.79
Kiwi 0.45
Banana 0.54
How would I update the table with data from the file. The table has foreign key constraints with other tables.
Another approach, to load the data as one set instead of dealing with it row by row:
On the database
Create a staging table (e.g. StagingTable ([Name], [Price]))
Create a procedure (you may need to change the object names, and add
transaction control, error handling, etc.; this is just a draft):
create procedure spLoadData
as
begin
    -- Update rows that already exist in the destination
    update DestinationTable
    set DestinationTable.Price = StagingTable.Price
    from DestinationTable
    join StagingTable
        on DestinationTable.Name = StagingTable.Name;

    -- Insert rows that are new
    insert into DestinationTable (Name, Price)
    select Name, Price
    from StagingTable
    where not exists (select 1
                      from DestinationTable
                      where DestinationTable.Name = StagingTable.Name);
end
On SSIS
Execute SQL Task to clear the staging table (TRUNCATE TABLE [staging_table_name])
Data Flow Task transferring from your Flat File to the staging table
Execute SQL Task calling the procedure you created (spLoadData).
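On SQL Server 2008 and later, the draft procedure's update-then-insert can also be written as a single MERGE statement; a sketch with the same placeholder table names:
merge DestinationTable as d
using StagingTable as s
    on d.Name = s.Name
when matched then
    update set d.Price = s.Price
when not matched by target then
    insert (Name, Price) values (s.Name, s.Price);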
Following are a few thoughts/steps:
Create a Flat File Connection Manager.
Take a Data Flow Task.
Create a Flat File Source with the connection manager just created.
Take as many Lookup transformations as you need to get FK values based on your source file values.
Take a Lookup transformation after all the above lookups, to get all values from the destination table.
Use a Conditional Split to compare the source values to the destination values.
If all columns match, then UPDATE; else INSERT.
Map the Conditional Split outputs accordingly to an OLE DB Destination / OLE DB Command.
Give it a try and let me know the results/comments.

How to load data from a single CSV file into multiple tables with relations? [duplicate]

This question already has answers here:
How do I split flat file data and load into parent-child tables in database?
(2 answers)
Closed 10 years ago.
I have a CSV file which I need to load into a SQL database. The issue is that I need to split some data into different tables. During the load I need to make sure that when the first part is loaded into the first table, I get the ID that will be put into the foreign key field in the second table when I load the other data into the database.
How do I load the data into multiple tables from a CSV file while maintaining data integrity?
-- My source file
Part1,Part2
A,a
B,b
C,c
-- Steps
Create a temp table
CREATE TABLE [dbo].[Part1And2](
    [RowID] [int] IDENTITY (1,1) NOT NULL,
    [Part1] varchar(50) NULL,  -- the length is an assumption; size it to your data
    [Part2] varchar(50) NULL
) ON [PRIMARY]
Truncate this table using an Execute SQL Task
DFT 1: Load the file into the temp table. Now the Part1 and Part2 sets of fields are coupled through the same RowID.
Source : Your source file
Destination : The table you created.
DFT 2: Split the data
Source: the temp table you just loaded
Feed it to a Multicast task
Create two outputs from the Multicast and connect them to the individual tables (say, Part1 and Part2)
Select the fields for each table
In our example, table Part1 will have two fields: RowID and Part1
Table Part2: RowID and Part2
This is just the starting point; see the sketch below. Think about what you would do when you get the next set of files. How would the IDs be assigned?
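On the database side, one way to hand the generated parent IDs to the child table is MERGE with an OUTPUT clause keyed by RowID (a plain INSERT ... OUTPUT cannot reference source columns). A sketch, assuming parent/child tables Part1Table (with an IDENTITY column Part1ID) and Part2Table; all names here are placeholders:
-- Map each staged RowID to the IDENTITY value generated for it
declare @idmap table (RowID int, Part1ID int);

merge Part1Table as t
using [dbo].[Part1And2] as s
    on 1 = 0                      -- never matches, so every staged row is inserted
when not matched then
    insert (Part1) values (s.Part1)
output s.RowID, inserted.Part1ID into @idmap;

-- Child rows reference the captured parent IDs
insert into Part2Table (Part1ID, Part2)
select m.Part1ID, s.Part2
from [dbo].[Part1And2] s
join @idmap m on m.RowID = s.RowID;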
