I am unable to find a way to create an external table in Azure SQL Data Warehouse (Synapse SQL pool) with PolyBase when some fields contain embedded commas.
For a CSV file with 4 columns as below:
myresourcename,
myresourcelocation,
"""resourceVersion"": ""windows"",""deployedBy"": ""john"",""project_name"": ""test_project""",
"{ ""ResourceType"": ""Network"", ""programName"": ""v1""}"
I tried the following CREATE EXTERNAL FILE FORMAT and CREATE EXTERNAL TABLE statements.
CREATE EXTERNAL FILE FORMAT my_format
WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS(
FIELD_TERMINATOR=',',
STRING_DELIMITER='"',
FIRST_ROW = 2
)
);
CREATE EXTERNAL TABLE my_external_table
(
resourceName VARCHAR,
resourceLocation VARCHAR,
resourceTags VARCHAR,
resourceDetails VARCHAR
)
WITH (
LOCATION = 'my/location/',
DATA_SOURCE = my_source,
FILE_FORMAT = my_format
)
But querying this table gives the following error:
Failed to execute query. Error: HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: HadoopExecutionException: Too many columns in the line.
Any help will be appreciated.
Currently this is not supported in PolyBase; you need to modify the input data accordingly to get it working.
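For example, one possible workaround (a sketch, assuming you can regenerate the source file with a delimiter such as a pipe that never appears inside the field values) is to re-export the data and point the external file format at that character instead:
-- Hypothetical workaround: the source file is re-exported as pipe-delimited;
-- my/location/ and my_source are the names from the question above.
CREATE EXTERNAL FILE FORMAT my_pipe_format
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS(
        FIELD_TERMINATOR = '|',
        STRING_DELIMITER = '"',
        FIRST_ROW = 2
    )
);
CREATE EXTERNAL TABLE my_external_table
(
    resourceName VARCHAR(200),
    resourceLocation VARCHAR(200),
    resourceTags VARCHAR(4000),
    resourceDetails VARCHAR(4000)
)
WITH (
    LOCATION = 'my/location/',
    DATA_SOURCE = my_source,
    FILE_FORMAT = my_pipe_format
);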
I'm executing select * from owner_name.tablename; where the table is an external table backed by a .dat file.
It returns no rows, and not even count(0) fetches results. I can see the error below in the log file:
error processing column ID in row 1 for datafile /external/ab/output/user/table_name.dat
ORA-01722: invalid number
...
error processing column ID in row 140489 for datafile /external/ab/output/user/table_name.dat
ORA-01722: invalid number
But the same .dat file and the same table work fine in the owner schema.
I checked the .dat file, the table DDL, the file location, spaces in the data file, and the delimiters; everything looks the same, but I still get ORA-01722.
What am I missing here? Previous Stack Overflow questions cover INSERT queries, not external tables.
DDL:
CREATE TABLE "OWNER"."TABLE_EXT"
( "ID" NUMBER(22,0),
"SOURCE_CODE" NUMBER(3,0)
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY "OWNER_EXT"
ACCESS PARAMETERS
( records delimited BY newline FIELDS TERMINATED BY "Ç" missing field VALUES are NULL
(
CLAIM_ID ,
SOURCE_CODE ,
) )
LOCATION
( "OWNER_EXT":'table.dat'
)
)
REJECT LIMIT UNLIMITED
PARALLEL 16 ;
DAT file:
Ç3Ç5278260051Ç557065100415Ç5278260051ÇÇÇÇÇÇÇ
Hi, I have declared a table like this:
create or replace table app_event (
ID varchar(36) not null primary key,
VERSION number,
ACT_TYPE varchar(255),
EVE_TYPE varchar(255),
CLI_ID varchar(36),
DETAILS variant,
OBJ_TYPE varchar(255),
DATE_TIME timestamp,
AAPP_EVENT_TO_UTC_DT timestamp,
GRO_ID varchar(36),
OBJECT_NAME varchar(255),
OBJ_ID varchar(255),
USER_NAME varchar(255),
USER_ID varchar(255),
EVENT_ID varchar(255),
FINDINGS varchar(255),
SUMMARY variant
);
The DETAILS column will contain an XML document so that I can run XML functions and get elements of that XML.
My sample row looks like this:
dfjkghdfkjghdf8gd7f7997,0,TEST_CASE,CHECK,74356476476DFD,<?xml version="1.0" encoding="UTF-8"?><testPayload><testId>3495864795uiyiu</testId><testCode>COMPLETED</testCode><testState>ONGOING</testState><noOfNewTest>1</noOfNewTest><noOfReviewRequiredTest>0</noOfReviewRequiredTest><noOfExcludedTest>0</noOfExcludedTest><noOfAutoResolvedTest>1</noOfAutoResolvedTest><testerTypes>WATCHLIST</testerTypes></testPayload>,CASE,41:31.3,NULL,948794853948dgjd,(null),dfjkghdfkjghdf8gd7f7997,test user,dfjkghdfkjghdf8gd7f7997,NULL,(null),(null)
When I declare DETAILS as varchar I am able to load the file, but when I declare it as variant I get the error below for that column only:
Error parsing JSON:
dfjkghdfkjghdf8gd7f7997COMPLETED</status
File 'SNOWFLAKE/Sudarshan.csv', line 1, character 89 Row 1, column
"AUDIT_EVENT"["DETAILS":6]
Can you please help with this?
I cannot use varchar because I also need to query elements of the XML in my queries.
This is how I load into the table; I use the default CSV format and the file is available in S3:
COPY INTO demo_db.public.app_event
FROM @my_s3_stage/
FILES = ('app_Even.csv')
file_format=(type='CSV');
Based on the answer below, this is how I am loading:
copy into demo_db.public.app_event from (
select
$1,$2,$3,$4,$5,
parse_xml($6),$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,parse_xml($17)
from @~/Audit_Even.csv d
)
file_format = (
type = CSV
)
But when I execute it, it says zero rows processed, and there is no stage mentioned here.
If you are using a COPY INTO statement then you need to use a subquery to convert the data before loading it into the table. Use parse_xml within your COPY statement's subquery, something like this:
copy into app_event from (
select
$1,
parse_xml($2) -- <---- "$2" is the column number in the CSV that contains the xml
from @~/test.csv.gz d -- <---- This is my own internal user stage. You'll need to change this to your external stage or whatever
)
file_format = (
type = CSV
)
It is hard to provide you with a good SQL statement without a full example of your existing code (your COPY / INSERT statement). In my example above, I'm copying a file from my own user stage (@~/test.csv.gz) with the default CSV file format options. You are likely using an external stage, but it should be easy to adapt this to your own example.
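Once the DETAILS column is loaded as a VARIANT, individual elements can be pulled out of the XML with XMLGET. A minimal sketch, assuming the table and column names from the question and the testId/testCode tags from the sample row:
-- XMLGET returns the matching element; :"$" extracts its content as text.
select
    XMLGET(DETAILS, 'testId'):"$"::varchar   as test_id,
    XMLGET(DETAILS, 'testCode'):"$"::varchar as test_code
from demo_db.public.app_event;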
I'm getting the following error while creating an External File Format in Azure SQL DB:
Incorrect syntax near 'EXTERNAL'.
I am using the following commands (I used the T-SQL syntax from the Microsoft Docs link - https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-file-format-transact-sql?view=sql-server-ver15&tabs=delimited) but I still get the syntax error:
--Example 1
CREATE EXTERNAL FILE FORMAT textdelimited1
WITH ( FORMAT_TYPE = DELIMITEDTEXT
, FORMAT_OPTIONS ( FIELD_TERMINATOR = '|')
);
GO
--Example 2
CREATE EXTERNAL FILE FORMAT skipHeader_CSV
WITH (FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS(
FIELD_TERMINATOR = ',',
STRING_DELIMITER = '"',
FIRST_ROW = 2,
USE_TYPE_DEFAULT = True)
)
As @wBob mentioned, External File Format is not supported on Azure SQL DB and Managed Instance; we can use an EXTERNAL DATA SOURCE instead. There are many reasons for this problem (Cannot bulk load because the ... could not be opened).
Check whether the SAS key has expired, and please check the Allowed permissions.
Did you delete the leading question mark when you created the SECRET?
CREATE DATABASE SCOPED CREDENTIAL UploadInvoices
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=2019-12-12******2FspTCY%3D'
I've tried the following test and it works well:
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '***';
go
CREATE DATABASE SCOPED CREDENTIAL UploadInvoices
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=2019-12-12&ss=bfqt&srt=sco&sp******%2FspTCY%3D'; -- dl
CREATE EXTERNAL DATA SOURCE MyAzureInvoices
WITH (
TYPE = BLOB_STORAGE,
LOCATION = 'https://***.blob.core.windows.net/<container_name>',
CREDENTIAL = UploadInvoices
);
BULK INSERT production.customer
FROM 'bs140513_032310-demo.csv'
WITH
(
DATA_SOURCE = 'MyAzureInvoices',
FORMAT = 'CSV',
FIRSTROW = 2
)
GO
I am trying to load data from Azure Blob Storage.
The data has already been staged.
But the issue is when I try to run:
copy into random_table_name
from @stage_name_i_created
file_format = (type='csv')
pattern ='*.csv'
Below is the error I encounter:
raise error_class(
snowflake.connector.errors.ProgrammingError: 001757 (42601): SQL compilation error:
Table 'random_table_name' does not exist
Basically, it says the table does not exist, which it does not, but the syntax on the website is the same as mine.
COPY INTO query on Snowflake returns TABLE does not exist error
In my case the table name was case-sensitive. Snowflake resolves unquoted identifiers to upper case. I changed the database/schema/table names to all upper case and it started working.
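A short illustration of that behaviour (the names here are just examples, reusing the stage from the question): an unquoted identifier is resolved as upper case, while a quoted identifier keeps its exact casing:
-- Created with a quoted, mixed-case name, so it must be referenced exactly as "random_Table_Name".
create table "random_Table_Name" (field1 varchar);
-- An unquoted reference is resolved as RANDOM_TABLE_NAME and does not match the quoted name above:
copy into random_Table_Name
from @stage_name_i_created
file_format = (type='csv');   -- fails: Table 'RANDOM_TABLE_NAME' does not exist
-- Quoting the name (or creating the table with an unquoted, upper-case name) resolves it:
copy into "random_Table_Name"
from @stage_name_i_created
file_format = (type='csv');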
First run the query below to fetch the column headers:
select $1 FROM @stage_name_i_created/filename.csv limit 1
Assuming the line below is the header line from your CSV file:
id;first_name;last_name;email;age;location
Create a CSV file format:
create or replace file format semicolon
type = 'CSV'
field_delimiter = ';'
skip_header=1;
Then you should define the data types and field names as below:
create or replace table <yourtable> as
select $1::varchar as id
,$2::varchar as first_name
,$3::varchar as last_name
,$4::varchar as email
,$5::int as age
,$6::varchar as location
FROM @stage_name_i_created/yourfile.csv
(file_format => semicolon );
The table must exist prior to running a COPY INTO command. In your post, you say that the table does not exist...so that is your issue.
If your table exists, try forcing the full table path like this:
copy into <database>.<schema>.<random_table_name>
from @stage_name_i_created
file_format = (type='csv')
pattern ='*.csv'
or by steps like this:
use database <database_name>;
use schema <schema_name>;
copy into <database_name>.<schema_name>.random_table_name
from @stage_name_i_created
file_format = (type='csv')
pattern ='*.csv';
rbachkaniwala, what do you mean by 'How do I create a table? (according to Snowflake syntax it is not possible to create empty tables)'?
You can just do the below to create a table:
CREATE TABLE random_table_name (FIELD1 VARCHAR, FIELD2 VARCHAR)
The table does need to exist. You should check the documentation for COPY INTO.
Other areas to consider are
do you have the right context set for the database & schema
does the user / role have access to the table or object.
It basically seems like you don't have the table defined yet. You should
ensure the table is created
ensure all columns in the CSV exist as columns in the table
ensure the order of the columns is the same as in the CSV
I'd check data types too; a minimal sketch of this checklist follows below.
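A minimal sketch of that checklist (the column names and types here are hypothetical; match them to your actual CSV):
-- Columns are declared in the same order as they appear in the CSV,
-- with data types that match the values in each field.
create table random_table_name (
    id         integer,
    first_name varchar,
    created_at timestamp
);
copy into random_table_name
from @stage_name_i_created
file_format = (type='csv');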
"COPY INTO" is not a query command, it is the actual data transfer execution from source to destination, which both must exist as others commented here but If you want just to query without loading the files then run the following SQL:
//Display list of files in the stage to verify stage
LIST @stage_name_i_created;
//Create a file format
CREATE OR REPLACE FILE FORMAT RANDOM_FILE_CSV
type = csv
COMPRESSION = 'GZIP' FIELD_DELIMITER = ',' RECORD_DELIMITER = '\n' SKIP_HEADER = 0 FIELD_OPTIONALLY_ENCLOSED_BY = '\042'
TRIM_SPACE = FALSE ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE ESCAPE = 'NONE' ESCAPE_UNENCLOSED_FIELD = 'NONE' DATE_FORMAT = 'AUTO' TIMESTAMP_FORMAT = 'AUTO'
NULL_IF = ('\\N');
//Now select the data in the files
Select $1 as first_col, $2 as second_col //add as many columns as necessary ...etc
from @stage_name_i_created
(FILE_FORMAT => RANDOM_FILE_CSV)
More information can be found in the documentation here:
https://docs.snowflake.com/en/user-guide/querying-stage.html
The WHERE condition in my T-SQL query is not returning any rows when querying an .xlsx file through SQL Server 2019 PolyBase.
Here's the code that created the PolyBase external data source and external table:
create master key encryption by password = 'Polybase2CSV';
create external data source myODBCxlsx with
(
LOCATION = 'odbc://localhost',
CONNECTION_OPTIONS = 'Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)}; DBQ=F:\PolybaseSourceData\CustomerData.xlsx'
);
create external table CustomerData(
CUSTOMERID FLOAT(53),
CUSTOMERNAME Nvarchar(255),
DEPARTMENT Nvarchar(255)
) with (
LOCATION='[sheet1$]',
DATA_SOURCE=myODBCxlsx
);
This query works:
select * from customerData
However this doesn't:
select * from customerData where customername = 'Steve'
The query doesn't return any rows, although there is a customer named Steve.
PUSHDOWN is automatically enabled by default if you don't specify a setting when creating an external data source. Unfortunately, that particular driver doesn't work with PUSHDOWN enabled, so you will get errors with simple queries. Turning off PUSHDOWN will resolve that.
The external data source definition should look like this:
create external data source myODBCxlsx with
(
LOCATION = 'odbc://localhost',
CONNECTION_OPTIONS = 'Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)}; DBQ=F:\Files\CustomerData.xlsx',
PUSHDOWN=OFF
);