I am trying to load a table using the COPY command with a column list, as described in the Redshift documentation: https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-column-mapping.html#copy-column-mapping-jsonpaths
The file s3://mybucket/data/listing/data.csv contains a header row followed by the data. File content below:
c1, c2, c3, c4, c5, c6, c7
1,2,3,4,5,6,7
11,11,11,11,11,11,11
21,21,21,21,21,21,21
31,31,31,31,31,31,31
.........................
.........................
.........................
I am using the following command to load the listing table, which has only three columns: c1, c2, c3.
copy listing(c1, c2, c3)
from 's3://mybucket/data/listing/data.csv'
iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
CSV;
However, Redshift does not allow the copy and fails with the following error:
1202 Extra column(s) found
Why is that? I am specifying the selected columns with the same names.
What am I missing here?
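You cannot currently limit the columns in a COPY statement this way. The column list names the target table columns to fill; it does not pick fields out of the source file, so each seven-field row still carries four extra fields. You can either load all columns into a temporary table and then INSERT them into your target table, or you can define the file(s) to be loaded as an external table and then INSERT directly into your target using a SELECT from the external table.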
https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_TABLE.html
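The temporary-table option could look roughly like this. It is a minimal sketch, not a tested load: the staging table name listing_staging and the INT column types are assumptions (the question does not give the real types), and IGNOREHEADER 1 is added because the file starts with a header row.

```sql
-- Hypothetical staging table with one column per field in the file.
-- Column types are assumed; adjust them to the real data.
CREATE TEMP TABLE listing_staging (
    c1 INT, c2 INT, c3 INT, c4 INT, c5 INT, c6 INT, c7 INT
);

-- Load every field from the file; IGNOREHEADER 1 skips the header row.
COPY listing_staging
FROM 's3://mybucket/data/listing/data.csv'
IAM_ROLE 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
CSV
IGNOREHEADER 1;

-- Keep only the three columns that exist in the target table.
INSERT INTO listing (c1, c2, c3)
SELECT c1, c2, c3
FROM listing_staging;
```

The temporary table is dropped automatically at the end of the session, so no cleanup step is needed.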
I have a staged file and I am trying to query the first line/row of it because it contains the column headers of the file. Is there a way I can create an external table using this file so that I can query the first line?
I am able to query the staged file using
SELECT a.$1
FROM @my_stage (FILE_FORMAT=>'my_file_format',PATTERN=>'my_file_path') a
and then to create the table I tried doing
CREATE EXTERNAL TABLE MY_FILE_TABLE
WITH
LOCATION='my_file_path'
FILE_FORMAT = my_file_format;
Reading headers from a CSV file is not supported; however, this answer from Stack Overflow gives a workaround.
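One possible way to look at the header line directly from the stage, which may or may not be what the linked answer describes, is to filter on the METADATA$FILE_ROW_NUMBER pseudo-column. This sketch assumes that my_file_format does not set SKIP_HEADER (otherwise the header row is skipped before the query sees it) and that the file has at least three fields.

```sql
-- Assumes my_file_format does NOT skip the header, so the header is row 1.
-- $1, $2, $3 are the first three fields; add more as needed.
SELECT a.$1, a.$2, a.$3
FROM @my_stage (FILE_FORMAT => 'my_file_format', PATTERN => 'my_file_path') a
WHERE METADATA$FILE_ROW_NUMBER = 1;
```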
I have the following user-defined table type in SQL, which is used in Azure Data Factory to import data from a flat file in a bulk copy activity via a stored procedure. File 1 has all three columns in it, so this works fine. File 2 only has Column1 and Column2, but NOT Column3. I figured that since the column was defined as NULL it would be OK, but ADF complains that it's attempting to pass in 2 columns when the table type expects 3. Is there a way to reuse this type for both files and make Column3 optional?
CREATE TYPE [dbo].[TestType] AS TABLE(
Column1 varchar(50) NULL,
Column2 varchar(50) NULL,
Column3 varchar(50) NULL
)
Operation on target LandSource failed:
ErrorCode=SqlOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A
database operation failed with the following error: 'Trying to pass a
table-valued parameter with 2 column(s) where the corresponding
user-defined table type requires 3 column(s)
It would be nice if the copy activity behaved consistently regardless of whether a stored procedure with a table type or native BCP is used in the activity. When not using the table type and using the default bulk insert, missing columns in the source file end up being NULL in the target table without error (assuming the column is nullable).
This causes a mapping error in ADF.
In the Copy activity, every column needs to be mapped.
If the source file only has two columns, it causes a mapping error.
So I suggest you create two different Copy activities and a second, two-column table type.
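A two-column table type along those lines might look like the sketch below. The names TestTypeTwoColumns, LoadTwoColumnFile, and dbo.TestTarget are hypothetical and not from the original post; the procedure simply fills Column3 with NULL for files that do not carry it.

```sql
-- Hypothetical second table type for files that lack Column3.
CREATE TYPE [dbo].[TestTypeTwoColumns] AS TABLE(
    Column1 varchar(50) NULL,
    Column2 varchar(50) NULL
);
GO

-- Hypothetical procedure for the second Copy activity.
-- dbo.TestTarget is an assumed three-column target table.
CREATE PROCEDURE [dbo].[LoadTwoColumnFile]
    @rows [dbo].[TestTypeTwoColumns] READONLY
AS
BEGIN
    INSERT INTO dbo.TestTarget (Column1, Column2, Column3)
    SELECT Column1, Column2, NULL
    FROM @rows;
END
GO
```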
You can pass an optional column. I've tested this successfully, but the steps are a bit complex. In my case, File 1 has all three columns, and File 2 only has Column1 and Column2, but NOT Column3. The pipeline uses a Get Metadata activity, a Set Variable activity, a ForEach activity, and an If Condition activity.
Please follow these steps:
Define a variable FileName to use inside the ForEach loop.
In the Get Metadata1 activity, specify the file path.
In the ForEach1 activity, use @activity('Get Metadata1').output.childItems to iterate over the file list. It needs to be Sequential.
Inside the ForEach1 activity, use Set Variable1 to set the FileName variable.
In Get Metadata2, use item().name to specify the file.
In Get Metadata2, use Column count to get the column count from the file.
In If Condition1, use @greater(activity('Get Metadata2').output.columnCount,2) to determine whether the file has more than two columns.
In the True activity, use the variable FileName to specify the file.
In the False activity, use Additional columns to add a column.
When I run debug, the result shows:
Is there a way to create a table (with columns) dynamically by using the JSON file from the staging area?
I used the command: 'copy into TableName from @StageName;'
This put all the different rows of my JSON file into a single column.
However, I want different columns. For example, column 1 should be "IP", column 2 should be "OS", and so on.
Thank you in advance!!
I have implemented the same thing in my project.
It's a two-step process.
Step 1 - Create a staging table with a single VARIANT column and copy into it from the stage, which I can see you have already done.
Step 2 - Create either a table or a view (since Snowflake is very fast, a view is the way to go for this dynamic extract of JSON data) that reads the data directly from the VARIANT column, something like this:
create or replace view schema.vw_tablename copy grants as
SELECT
v:Duration::int Duration,
v:Connectivity::string Connectivity
...
from public.tablename
If your JSON contains an array of structures, use the following:
create or replace view schema.vw_tablename copy grants as
SELECT
v:Duration::int Duration,
v:Connectivity::string Connectivity,
f.value:Time::int as Event_Time
from public.tablename,
table(flatten(v:arrayname)) f
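For completeness, the first step (the VARIANT staging table) could be sketched as follows. The table name public.tablename and the column name v are taken from the views above; the inline JSON file format is an assumption about how the stage is set up.

```sql
-- Staging table with a single VARIANT column named v, as used by the views.
CREATE OR REPLACE TABLE public.tablename (v VARIANT);

-- Load each JSON document from the stage into one row of v.
-- If the file is a single top-level array, adding STRIP_OUTER_ARRAY = TRUE
-- to the file format loads each array element as its own row.
COPY INTO public.tablename
FROM @StageName
FILE_FORMAT = (TYPE = 'JSON');
```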
When publishing a dacpac with sqlpackage.exe, it runs Schema Compare first, followed by pre-deployment scripts. This causes a problem when, for instance, you need to drop a table or rename a column. Schema Compare was done before the object was modified and the deployment fails. Publish must be repeated to take the new schema into account.
Anyone have a work-around for this that does not involve publishing twice?
Gert Drapers called it a pre-pre-deployment script here
It is indeed a challenge. If you need to add a non-nullable foreign key column to a table full of data, you can only do it with a separate script.
If you are the only developer, that is not a problem, but when you have a large team, that "separate script" has to be executed somehow before every DB publish.
The workaround we used:
Create a separate SQL "Before-publish" script (in the DB project) with the property [Build action = None]
Create a custom MSBuild task that calls the SQLCMD.EXE utility, passing the "Before-publish" script as a parameter, and then calls the SQLPACKAGE.EXE utility, passing DB.dacpac
Add a call to the custom MSBuild task to the db.sqlproj file. For example:
<UsingTask
TaskName="MSBuild.MsSql.DeployTask"
AssemblyFile="$(MSBuildProjectDirectory)\Deploy\MsBuild.MsSql.DeployTask.dll" />
<Target Name="AfterBuild">
<DeployTask
Configuration="$(Configuration)"
DeployConfigPath="$(MSBuildProjectDirectory)\Deploy\Deploy.config"
ProjectDirectory="$(MSBuildProjectDirectory)"
OutputDirectory="$(OutputPath)"
DacVersion="$(DacVersion)">
</DeployTask>
</Target>
MsBuild.MsSql.DeployTask.dll above is that custom MSBuild Task.
Thus the "Before-publish" script could be called from Visual Studio.
For CI we used a batch file (*.bat) where the same two utilities (SQLCMD.EXE & SQLPACKAGE.EXE) were called.
The final process we ended up with is somewhat complicated and deserves a separate article; here I have only pointed out the direction :)
Move from using Visual Studio to using scripts that drive sqlpackage.exe, and you have the flexibility to run scripts before the compare:
https://the.agilesql.club/Blog/Ed-Elliott/Pre-Deploy-Scripts-In-SSDT-When-Are-They-Run
ed
We faced a situation where we needed to transform data from one table into another during deployment of the database project. This is a problem with a DB project, because in the pre-deployment script the destination table (column) does not exist yet, while in the post-deployment script the source table (column) is already gone.
To transform data from TableA to TableB we used the following idea (this approach can be used for any data modification):
The developer adds the destination table (dbo.TableB) to the DB project and deploys it to the local DB (without committing to SVN).
He or she creates a pre-deployment transformation script. The trick is that the script puts the resulting data into a temporary table: #TableB.
The developer deletes dbo.TableA from the DB project. The table is expected to be dropped when the main generated script runs.
The developer writes a post-deployment script that copies the data from #TableB to dbo.TableB, which has just been created by the main script.
All of the changes are committed to SVN.
This way we don't need the pre-pre-deployment script, because we store the intermediate data in the temporary table.
Note that the approach with a pre-pre-deployment script has the same intermediate (temporary) data, but stores it in real tables rather than temporary ones. That data exists between pre-pre-deployment and pre-deployment; after the pre-deployment script runs, it disappears.
What is more, the temporary-table approach lets us handle the following complicated but real situation. Imagine that we have two transformations in our DB project:
TableA -> TableB
TableB -> TableC
Apart from that we have two databases:
DatabaseA, which has TableA
DatabaseB, where TableA has already been transformed into TableB. TableA is absent from DatabaseB.
Nonetheless, we can deal with this situation. We need just one new action in the pre-deployment script: before the transformation, we try to copy the data from dbo.TableA into #TableA. The transformation script then works with temporary tables only.
Let me show you how this idea works in DatabaseA and DatabaseB.
It is assumed that the DB project has two pairs of pre- and post-deployment scripts: "TableA -> TableB" and "TableB -> TableC".
Below is the example of the scripts for "TableB -> TableC" transformation.
Pre-deployment script
----[The data preparation block]---
--We must prepare for a possible transformation
--The condition should verify the existence of the necessary columns
IF OBJECT_ID('dbo.TableB') IS NOT NULL AND
   OBJECT_ID('tempdb..#TableB') IS NULL
BEGIN
    CREATE TABLE #TableB
    (
        [Id] INT NOT NULL PRIMARY KEY,
        [Value1] VARCHAR(50) NULL,
        [Value2] VARCHAR(50) NULL
    )
    INSERT INTO [#TableB]
    SELECT [Id], [Value1], [Value2]
    FROM dbo.TableB
END
----[The data transformation block]---
--The condition that starts the transformation
--It is very important. It must be as strict as possible to ward off wrong executions.
--The condition should verify the existence of the necessary columns
--Note that the condition and the transformation must use #TableB instead of dbo.TableB
IF OBJECT_ID('tempdb..#TableB') IS NOT NULL
BEGIN
    CREATE TABLE [#TableC]
    (
        [Id] INT NOT NULL PRIMARY KEY,
        [Value] VARCHAR(50) NULL
    )
    --Data transformation. The source and destination tables must be temporary tables.
    INSERT INTO [#TableC]
    SELECT [Id], Value1 + ' ' + Value2 AS Value
    FROM [#TableB]
END
Post-deployment script
--There must be a strict condition here to ward off a failure
--Checking for the existence of the fields is a good idea
IF OBJECT_ID('dbo.TableC') IS NOT NULL AND
   OBJECT_ID('tempdb..#TableC') IS NOT NULL
BEGIN
    INSERT INTO [TableC]
    SELECT [Id], [Value]
    FROM [#TableC]
END
In DatabaseA, the pre-deployment script has already created #TableA. The data preparation block won't be executed, because there is no dbo.TableB in the database.
However, the data transformation will be executed, because #TableB already exists: it was created by the transformation block of "TableA -> TableB".
In DatabaseB, the data preparation and transformation blocks of the "TableA -> TableB" script won't be executed. However, we already have the transformed data in dbo.TableB. Hence the data preparation and transformation blocks of "TableB -> TableC" will be executed without any problem.
I use the following workaround in such scenarios.
If you would like to drop a table:
Retain the table within the dacpac (under the Tables folder).
Create a post-deployment script to drop the table.
If you would like to drop a column:
Retain the column in the table definition within the dacpac (under the Tables folder).
Create a post-deployment script to drop the column.
This way you can drop tables and columns from your database. Whenever you make the next deployment (maybe after a few days or even months), exclude that table or column from the dacpac so that the dacpac is updated with the latest schema.
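The post-deployment drops described above could be sketched like this. The object names dbo.ObsoleteTable, dbo.SomeTable, and ObsoleteColumn are hypothetical; the existence checks keep the script safe to rerun on later deployments.

```sql
-- Drop a table that is still defined in the dacpac (hypothetical name).
IF OBJECT_ID('dbo.ObsoleteTable', 'U') IS NOT NULL
BEGIN
    DROP TABLE dbo.ObsoleteTable;
END

-- Drop a column that is still in the table definition (hypothetical names).
-- COL_LENGTH returns NULL when the column does not exist.
IF COL_LENGTH('dbo.SomeTable', 'ObsoleteColumn') IS NOT NULL
BEGIN
    ALTER TABLE dbo.SomeTable DROP COLUMN ObsoleteColumn;
END
```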
I have a stored procedure on a server which generates a table in my database. Then, in SSIS, I query some columns from that table and append some dummy columns filled with static values. I query the database by holding the query in a variable (SQL command from variable); the query is of the form select a, b, c from X where @[User::variable1] = '' and @[User::variable2] = '', and so on for all 4 variables.
My question is: I need to be able to change the values of those variables (variable1 to variable4) for 48 different scenarios (or possibly more), so replacing them manually would be a pain, since it would lead to over 130 combos. Is there a way to pass the values from an Excel file to the package at runtime?
ex:
column1 column2 column3 column4
12.03.2015 def ghi jkl
12.04.2015 456 789 012
..
..
I need to loop through all the columns in the Excel file, and the results should be exported to files.
I have already built everything described above except the part that gets the values for those 4 variables from the Excel file. I need help only with this part.
Any help would be great.
Thank you,
Cristian
Yes, this is possible.
Create a connection to Excel.
Create a Transit table to store the Excel content (using the same column names).
Create a "Data Flow" task to transfer the content from Excel into the Transit table.
Create an "Execute SQL Task".
Read the rows one by one from the Transit table in a loop or cursor.
Dynamically create a SQL string with the values read from the Transit table.
Execute it by using sp_executesql.
Use "Result set" if you want to output any recordset.
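The cursor and sp_executesql steps above could be sketched as follows. The names dbo.TransitTable, dbo.X, and filter1 to filter4 are assumptions for illustration (the question does not show the real filter columns), and all four Excel columns are treated as text.

```sql
-- Hypothetical names: dbo.TransitTable holds the Excel rows,
-- dbo.X is the queried table, filter1..filter4 are its filter columns.
DECLARE @v1 NVARCHAR(50), @v2 NVARCHAR(50), @v3 NVARCHAR(50), @v4 NVARCHAR(50);

-- Parameterized statement; the values are bound on each iteration.
DECLARE @sql NVARCHAR(MAX) = N'
    SELECT a, b, c
    FROM dbo.X
    WHERE filter1 = @p1 AND filter2 = @p2
      AND filter3 = @p3 AND filter4 = @p4;';

DECLARE scenario_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT column1, column2, column3, column4
    FROM dbo.TransitTable;

OPEN scenario_cursor;
FETCH NEXT FROM scenario_cursor INTO @v1, @v2, @v3, @v4;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Run the query once per scenario row.
    EXEC sp_executesql
        @sql,
        N'@p1 NVARCHAR(50), @p2 NVARCHAR(50), @p3 NVARCHAR(50), @p4 NVARCHAR(50)',
        @p1 = @v1, @p2 = @v2, @p3 = @v3, @p4 = @v4;

    FETCH NEXT FROM scenario_cursor INTO @v1, @v2, @v3, @v4;
END

CLOSE scenario_cursor;
DEALLOCATE scenario_cursor;
```

Binding the values as parameters instead of concatenating them into the string avoids quoting problems with values read from the spreadsheet.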