I have created a CSV file format and an external table to load CSV data from Azure Blob Storage.
The external table shows all columns as NULL except the "VALUE" column.
File Format Code
COMPRESSION = 'NONE'
FIELD_DELIMITER = ','
RECORD_DELIMITER = '\n'
SKIP_HEADER = 0
FIELD_OPTIONALLY_ENCLOSED_BY = 'NONE'
EMPTY_FIELD_AS_NULL = FALSE
TRIM_SPACE = FALSE
ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE
ESCAPE = 'NONE'
ESCAPE_UNENCLOSED_FIELD = '\134'
DATE_FORMAT = 'AUTO'
TIMESTAMP_FORMAT = 'AUTO'
NULL_IF = ('NULL');
External Table Code
CREATE OR REPLACE EXTERNAL TABLE EXT_DIM_TESTTABLE
(
COL1 VARCHAR (1000) AS (value:"COL1"::string),
COL2 VARCHAR (1000) AS (value:"COL2"::string),
COL3 VARCHAR (1000) AS (value:"COL3"::string),
COL4 VARCHAR (1000) AS (value:"COL4"::string),
COL5 VARCHAR (1000) AS (value:"COL5"::string),
COL6 VARCHAR (1000) AS (value:"COL6"::string)
)
WITH
LOCATION=@TESTSTAGE
AUTO_REFRESH = true
FILE_FORMAT = 'FILE_TESTFORMAT_CSV'
PATTERN='.*TEST_DATA.csv';
Now, when I SELECT * FROM EXT_DIM_TESTTABLE, all columns show NULL except the VALUE one.
The VALUE column comes back as below; the column names are not captured as "COL1" / "COL2" etc., but the values are correct. All the other columns are NULL:
{
"c1": "TESTING",
"c2": "TESTING",
"c3": "TESTING",
"c4": "TESTING",
"c5": "TESTING",
"c6": "TESTING"
}
I'm not sure what is missing here.
It seems you are referencing value:"COL1"::string incorrectly. For CSV files, Snowflake stores each parsed row in the VALUE variant under positional keys (c1, c2, ...), not under the header names, so the virtual columns must reference those keys.
Can you try the DDL below for the external table?
CREATE OR REPLACE EXTERNAL TABLE EXT_DIM_TESTTABLE
(
COL1 VARCHAR(1000) AS (value:c1::string),
COL2 VARCHAR(1000) AS (value:c2::string),
COL3 VARCHAR(1000) AS (value:c3::string),
COL4 VARCHAR(1000) AS (value:c4::string),
COL5 VARCHAR(1000) AS (value:c5::string),
COL6 VARCHAR(1000) AS (value:c6::string)
)
WITH
LOCATION=@TESTSTAGE
AUTO_REFRESH = true
FILE_FORMAT = 'FILE_TESTFORMAT_CSV'
PATTERN='.*TEST_DATA.csv'
;
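If in doubt, a quick check against the table defined above is to query the raw VALUE variant and one positional key directly:
-- Inspect the raw VARIANT to see the positional keys (c1, c2, ...)
SELECT value FROM EXT_DIM_TESTTABLE LIMIT 5;
-- Extract one positional key the same way the virtual columns do
SELECT value:c1::string AS col1 FROM EXT_DIM_TESTTABLE LIMIT 5;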
Merge statement throws: "Boolean value is not recognized"
I'm reading all varchar values from streams and writing to a master table. I don't have any Boolean columns in the source or destination table, and I am unable to find out why I'm getting the "Boolean value is not recognized" error.
create table "EMP_TEST" (EMPID integer, EMPNAME VARCHAR(500), EMPADD VARCHAR(500), EMPSALARY INTEGER);
create table "EMP_TEST_MAIN" (EMPID integer, EMPNAME VARCHAR(500), EMPADD VARCHAR(500), EMPSALARY INTEGER);
create or replace stream ST_EMP_TEST on table "EMP_TEST";
insert into "EMP_TEST"
select 1, 'AAA','PLACE 1', 100 UNION
select 2, 'BBB','PLACE 2', 200 UNION
select 3, 'CCC','PLACE 3', 300;
MERGE INTO "EMP_TEST_MAIN" AS T USING (select * from ST_EMP_TEST where NOT (METADATA$ACTION ='DELETE' AND METADATA$ISUPDATE = TRUE)) AS S ON T.EMPID = S.EMPID WHEN MATCHED AND S.METADATA$ACTION = 'INSERT' AND S.METADATA$ISUPDATE THEN UPDATE SET T.EMPNAME = S.EMPNAME AND T.EMPADD = S.EMPADD AND T.EMPSALARY = S.EMPSALARY WHEN MATCHED AND S.METADATA$ACTION = 'DELETE' THEN
DELETE WHEN NOT MATCHED AND S.METADATA$ACTION = 'INSERT' THEN
INSERT (T.EMPID, T.EMPNAME, T.EMPADD, T.EMPSALARY) VALUES (S.EMPID, S.EMPNAME, S.EMPADD, S.EMPSALARY);
The columns in the UPDATE SET clause are separated with AND:
WHEN MATCHED AND S.METADATA$ACTION = 'INSERT' AND S.METADATA$ISUPDATE
THEN UPDATE SET T.EMPNAME = S.EMPNAME
AND T.EMPADD = S.EMPADD
AND T.EMPSALARY = S.EMPSALARY
-- AND is incorrect in this context
They should be separated with commas instead:
WHEN MATCHED AND S.METADATA$ACTION = 'INSERT' AND S.METADATA$ISUPDATE
THEN UPDATE SET T.EMPNAME = S.EMPNAME
,T.EMPADD = S.EMPADD
,T.EMPSALARY = S.EMPSALARY
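For reference, a sketch of the full MERGE with only the SET list changed from the statement in the question:
MERGE INTO "EMP_TEST_MAIN" AS T
USING (
    SELECT * FROM ST_EMP_TEST
    WHERE NOT (METADATA$ACTION = 'DELETE' AND METADATA$ISUPDATE = TRUE)
) AS S
ON T.EMPID = S.EMPID
WHEN MATCHED AND S.METADATA$ACTION = 'INSERT' AND S.METADATA$ISUPDATE THEN
    -- Comma-separated assignments instead of AND
    UPDATE SET T.EMPNAME = S.EMPNAME,
               T.EMPADD = S.EMPADD,
               T.EMPSALARY = S.EMPSALARY
WHEN MATCHED AND S.METADATA$ACTION = 'DELETE' THEN
    DELETE
WHEN NOT MATCHED AND S.METADATA$ACTION = 'INSERT' THEN
    INSERT (EMPID, EMPNAME, EMPADD, EMPSALARY)
    VALUES (S.EMPID, S.EMPNAME, S.EMPADD, S.EMPSALARY);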
I'm using external tables to load data from a CSV stored in a blob to a table in Azure SQL Data Warehouse. The CSV uses a string delimiter (double quote); empty strings are represented as two double quotes ("").
I want the empty columns to be treated as NULL in the table. The external file format I use is set up with USE_TYPE_DEFAULT = FALSE, but this does not seem to work, since empty columns are imported as empty strings. This only happens when the columns are strings; numeric columns are correctly converted to NULL.
I'm also importing a different CSV which does not have a string delimiter, using a different external file format, and those empty columns are imported as NULL. So it looks like it has something to do with the STRING_DELIMITER option.
The csv:
col1;col2;col3;col4;col5;col6
"a";"b";"c";"1";"2";"3"
"d";"";"f";"4";"";"6"
The code of the external file format:
CREATE EXTERNAL FILE FORMAT eff_string_del
WITH (
FORMAT_TYPE = DELIMITEDTEXT
,FORMAT_OPTIONS(
FIELD_TERMINATOR = ';'
,STRING_DELIMITER = '0x22'
,FIRST_ROW = 2
,USE_TYPE_DEFAULT = False)
)
Code of the table using the external file format:
CREATE EXTERNAL TABLE dbo.test (
col1 varchar(1) null
,col2 varchar(1) null
,col3 varchar(1) null
,col4 int null
,col5 int null
,col6 int null
)
WITH (
DATA_SOURCE = [EDS]
,LOCATION = N'test.csv'
,FILE_FORMAT = eff_string_del
,REJECT_TYPE = VALUE
,REJECT_VALUE = 0
)
The result when querying the external table:
SELECT *
FROM [dbo].[test]
col1 col2 col3 col4 col5 col6
---- ---- ---- ----------- ----------- -----------
a b c 1 2 3
d f 4 NULL 6
Can someone please explain what is happening or what I'm doing wrong?
Use USE_TYPE_DEFAULT = False in the external file format. With that setting, any NULL values that are stored by using the word NULL in the delimited text file are imported as the string 'NULL'.
For example:
CREATE EXTERNAL FILE FORMAT example_file_format
WITH (FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS(
FIELD_TERMINATOR = ',',
STRING_DELIMITER = '"',
FIRST_ROW = 2,
USE_TYPE_DEFAULT = False)
)
Reference : https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-file-format-transact-sql?view=sql-server-2017
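If the file itself can't be changed, one workaround (not from the original answer; a sketch using a hypothetical target table dbo.test_clean) is to land the quoted values as-is and convert empty strings to NULL afterwards with NULLIF in a CTAS from the external table:
-- Hypothetical cleanup step: turn empty strings from the quoted CSV into NULLs
CREATE TABLE dbo.test_clean
WITH (DISTRIBUTION = ROUND_ROBIN)
AS
SELECT
    NULLIF(col1, '') AS col1,
    NULLIF(col2, '') AS col2,
    NULLIF(col3, '') AS col3,
    col4,
    col5,
    col6
FROM dbo.test;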
Have you considered adding the value NULL in that field instead of ""?
See below a test I've performed using the following code:
declare @mytable table
(id int identity primary key, column1 varchar(100))
insert into @mytable (column1) values ('test1')
insert into @mytable (column1) values ('test2')
insert into @mytable (column1) values (null)
insert into @mytable (column1) values ('test3')
insert into @mytable (column1) values (null)
select *
from @mytable
The results look like this:
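id          column1
----------- -----------
1           test1
2           test2
3           NULL
4           test3
5           NULL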
Would this work for you?
I am new to Azure and PolyBase; I am trying to read a CSV file into a SQL external table.
On some forums I read that it is not possible to skip the first row, the header.
I'm hoping for the opposite. Can you help me?
The code I used is below.
Thanks in advance
CREATE EXTERNAL TABLE dbo.Test2External (
[Guid] [varchar](36) NULL,
[Year] [smallint] NULL,
[SysNum] [bigint] NULL,
[Crc_1] [decimal](15, 2) NULL,
[Crc_2] [decimal](15, 2) NULL,
[Crc_3] [decimal](15, 2) NULL,
[Crc_4] [decimal](15, 2) NULL,
[CreDate] [date] NULL,
[CreTime] [datetime] NULL,
[UpdDate] [date] NULL,
...
WITH (
LOCATION='/20160823/1145/FIN/',
DATA_SOURCE=AzureStorage,
FILE_FORMAT=TextFile
);
-- Run a query on the external table
SELECT count(*) FROM dbo.Test2External;
There is a workaround: use an EXTERNAL FILE FORMAT with FIRST_ROW = 2.
E.g., if we create a file format
CREATE EXTERNAL FILE FORMAT [CsvFormatWithHeader] WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (
FIELD_TERMINATOR = ',',
FIRST_ROW = 2,
STRING_DELIMITER = '"',
USE_TYPE_DEFAULT = False
)
)
GO
And then use this file format with CREATE EXTERNAL TABLE:
CREATE EXTERNAL TABLE [testdata].[testfile1]
(
[column1] [nvarchar](4000) NULL
)
WITH (
    DATA_SOURCE = data_source,
    LOCATION = file_location,
    FILE_FORMAT = [CsvFormatWithHeader],
    REJECT_TYPE = PERCENTAGE,
    REJECT_VALUE = 100,
    REJECT_SAMPLE_VALUE = 1000
)
It will skip the first row when executing queries against testdata.testfile1.
You have a few options:
get the file headers removed permanently because Polybase isn't really meant to work with file headers
Use Azure Data Factory which does have options for skipping header rows when the file is in Blob storage
set the rejection options of the PolyBase table to try and ignore the header row, i.e. set REJECT_TYPE to VALUE and REJECT_VALUE to 1, e.g.
this is a bit hacky as you don't have any control over whether or not this is actually the header row, but it would work if you only have one header row and it is the only error in the file. Example below.
For a file called temp.csv with this content:
a,b,c
1,2,3
4,5,6
A command like this will work:
CREATE EXTERNAL TABLE dbo.mycsv (
colA INT NOT NULL,
colB INT NOT NULL,
colC INT NOT NULL
)
WITH (
DATA_SOURCE = eds_esra,
LOCATION = N'/temp.csv',
FILE_FORMAT = eff_csv,
REJECT_TYPE = VALUE,
REJECT_VALUE = 1
)
GO
SELECT *
FROM dbo.mycsv
My results:
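colA        colB        colC
----------- ----------- -----------
1           2           3
4           5           6
(The header row a,b,c fails the INT conversion and is rejected under REJECT_VALUE = 1, leaving the two data rows.)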
set the data types of the external table to VARCHAR just for staging the data, then remove the header row when converting to an internal table using something like ISNUMERIC, e.g.
CREATE EXTERNAL TABLE dbo.mycsv2 (
colA VARCHAR(5) NOT NULL,
colB VARCHAR(5) NOT NULL,
colC VARCHAR(5) NOT NULL
)
WITH (
DATA_SOURCE = eds_esra,
LOCATION = N'/temp.csv',
FILE_FORMAT = eff_csv,
REJECT_TYPE = VALUE,
REJECT_VALUE = 0
)
GO
CREATE TABLE dbo.mycsv3
WITH (
CLUSTERED INDEX ( colA ),
DISTRIBUTION = ROUND_ROBIN
)
AS
SELECT
colA,
colB,
colC
FROM dbo.mycsv2
WHERE ISNUMERIC( colA ) = 1
GO
HTH
Skip header rows on SQL Data Warehouse PolyBase load
Delimited text files are often created with a header row that contains the column names. These rows need to be excluded from the data set during the load. Azure SQL Data Warehouse users can now skip these rows by using the First_Row option in the delimited text file format for PolyBase loads.
The First_Row option defines the first row that is read in every file loaded. By setting the value to 2, you effectively skip the header row for all files.
For more information, see the documentation for the CREATE EXTERNAL FILE FORMAT statement.
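For example, a minimal file format using this option (the format name here is a placeholder) might look like:
CREATE EXTERNAL FILE FORMAT csv_skip_header
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',
        FIRST_ROW = 2
    )
);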
When I attempt to import a comma-delimited .csv flat file into a Microsoft SQL Server 2008 R2 64-bit instance, a NULL in the original data becomes the literal string "NULL" for string columns, and for numeric columns I receive an import error. Can anyone please help?
KISS
Pre-process it: replace all "NULL" with "".
I.e. the .csv file will have
,,
Instead of
NULL,NULL,
Seems to do the job for me.
Put the data into a staging table and then insert to the production table using SQL code.
update table1
set field1 = NULL
where field1 = 'null'
Or if you want to do a lot of fields
update table1
set field1 = case when field1 = 'null' then Null else Field1 End
, field2 = case when field2 = 'null' then Null else Field2 End
, field3 = case when field3 = 'null' then Null else Field3 End
Adding to HLGEM's answer, I do it dynamically: I load into a staging table where all column types are VARCHAR, and then run:
DECLARE @sql VARCHAR(MAX) = '';
SELECT @sql = CONCAT(@sql, '
UPDATE [staging].[',[TABLE_NAME],']
SET [',[COLUMN_NAME],'] = NULL
WHERE [',[COLUMN_NAME],'] = ''NULL'';
')
FROM INFORMATION_SCHEMA.COLUMNS
WHERE [TABLE_SCHEMA] = 'staging'
AND [TABLE_NAME] IN ('MyTableName');
SELECT @sql;
EXEC(@sql);
Then do:
INSERT INTO [dbo].[MyTableName] ([col1], [col2], [colN])
SELECT [col1], [col2], [colN]
FROM [staging].[MyTableName]
Here table [dbo].[MyTableName] is defined with the desired column types; if anything can't be converted, this step also fails and reports the type conversion errors...
I have about 100 columns in my table, 50 of which need to be changed to (smallmoney, null) format. Currently they are all (varchar(3), null).
How do I do that with the columns I want? Is there a quick way? Let's pretend I have 5 columns:
col1 (varchar(3), null)
col2 (varchar(3), null)
col3 (varchar(3), null)
col4 (varchar(3), null)
col5 (varchar(3), null)
how do I make them look like this:
col1 (smallmoney, null)
col2 (smallmoney, null)
col3 (smallmoney, null)
col4 (varchar(3), null)
col5 (varchar(3), null)
You can programmatically create the ALTER script, and then execute it. I just chopped this out, you'll need to validate the syntax:
SELECT
'ALTER TABLE "' + TABLE_NAME + '" ALTER COLUMN "' + COLUMN_NAME + '" SMALLMONEY'
FROM
INFORMATION_SCHEMA.COLUMNS
WHERE
TABLE_NAME = 'MyTable'
AND COLUMN_NAME LIKE 'Pattern%'
Give this a shot, but make a backup of the table first... no idea how the automatic conversion of that data will go.
alter table badlyDesignedTable alter column col1 smallmoney;
alter table badlyDesignedTable alter column col2 smallmoney;
alter table badlyDesignedTable alter column col3 smallmoney;
edit: changed syntax
You can query the system tables or ANSI views for the columns in question and generate the ALTER table statements. This
select SQL = 'alter table'
+ ' ' + TABLE_SCHEMA
+ '.'
+ TABLE_NAME
+ ' ' + 'alter column'
+ ' ' + COLUMN_NAME
+ ' ' + 'smallmoney'
from INFORMATION_SCHEMA.COLUMNS
where TABLE_SCHEMA = 'dbo'
and TABLE_NAME = 'MyView'
order by ORDINAL_POSITION
will generate an alter table statement for every column in the table. You'll need to either filter it in the where clause or paste the results into a text editor and remove the ones you don't want.
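For example, to restrict the generated statements to specific columns (hypothetical names col1 to col3 from the question), add a filter on COLUMN_NAME:
select 'alter table ' + TABLE_SCHEMA + '.' + TABLE_NAME
     + ' alter column ' + COLUMN_NAME + ' smallmoney'
from INFORMATION_SCHEMA.COLUMNS
where TABLE_SCHEMA = 'dbo'
  and TABLE_NAME = 'MyTable'
  and COLUMN_NAME in ('col1', 'col2', 'col3')  -- only the columns to convert
order by ORDINAL_POSITION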
Read up on ALTER TABLE ... ALTER COLUMN, though: modifying a column with ALTER COLUMN comes with constraints and limitations, especially if it is indexed. The ALTER TABLE will fail if any column can't be converted to the target data type. For a varchar-to-smallmoney conversion, it will fail if any row contains something that can't be interpreted as a money value; if a value won't convert with CONVERT(smallmoney, ...), the ALTER TABLE will fail. If the column contains an empty string ('') or whitespace (' '), the conversion will most likely succeed (in the case of a smallmoney target, I suspect you'll get 0.0000 as a result).
Bear in mind that multiple values may wind up converted to the same value in the target datatype. This can hose indexes.
If you're trying to convert a nullable column to a non-nullable column, you'll need to first ensure that every row has a non-null value. Otherwise the conversion will fail.
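A sketch of that pre-step, assuming a hypothetical backfill value of 0 for the col1 column used earlier:
-- Backfill NULLs first, then tighten the column to NOT NULL
update badlyDesignedTable set col1 = 0 where col1 is null;
alter table badlyDesignedTable alter column col1 smallmoney not null;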
Good luck.