I'm a new ClickHouse user, and I'm trying to set up a dictionary that maps two-character airline codes to airline names, for use with the ontime databases I create from the sample data here: https://clickhouse.com/docs/en/getting-started/example-datasets/ontime/
I manually created a CSV file with these contents:
id,code,name
1,UA,United Airlines
2,HA,Hawaiian Airlines
3,OO,SkyWest
4,B6,Jetblue Airway
5,QX,Horizon Air
6,YX,Republic Airway
7,G4,Allegiant Air
8,EV,ExpressJet Airlines
9,YV,Mesa Airlines
10,WN,Southwest Airlines
11,OH,PSA Airlines
12,MQ,Envoy Air
13,9E,Endeavor Air
14,NK,Spirit Airlines
15,AA,American Airlines
16,DL,Delta Air Lines
17,AS,Alaska Airlines
18,F9,Frontier Airlines
I created the dictionary:
CREATE DICTIONARY airlinecompany
(
id UInt64,
code String,
company String
)
PRIMARY KEY id
SOURCE(FILE(path '/var/lib/clickhouse/user_files/airlinenames.csv' format 'CSV'))
LAYOUT(FLAT())
LIFETIME(3600)
I can see the dictionary has been created:
┌─name───────────┐
│ airlinecompany │
│ ontime │
└────────────────┘
But when I try to list its contents, I get this error:
Received exception from server (version 22.3.3):
Code: 27. DB::Exception: Received from localhost:9000. DB::Exception: Cannot parse input: expected ',' before: 'id,code,name\r\n1,UA,United Airlines\r\n2,HA,Hawaiian Airlines\r\n3,OO,SkyWest\r\n4,B6,Jetblue Airway\r\n5,QX,Horizon Air\r\n6,YX,Republic Airway\r\n7,G4,Allegiant Air\r\n8,EV,':
Row 1:
Column 0, name: id, type: UInt64, ERROR: text "id,code,na" is not like UInt64
: While executing CSVRowInputFormat. (CANNOT_PARSE_INPUT_ASSERTION_FAILED)
But I don't think a CSV starts with a ',' before id. Am I missing something in my creation statement, or do I need to generate the CSV in a certain way?
*** Edit with the correct statement:
Two main things I did wrong before:
- The layout needs to be COMPLEX_KEY_HASHED()
- The primary key should be code
CREATE DICTIONARY airlinecompany
(
id UInt64,
code String,
company String
)
PRIMARY KEY code
SOURCE(FILE(path '/var/lib/clickhouse/user_files/airlinenames.csv' format 'CSVWithNames'))
LAYOUT(COMPLEX_KEY_HASHED())
LIFETIME(3600)
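With code as the primary key, lookups pass the string key as a tuple. A minimal sketch of the lookup, assuming the dictionary above:

SELECT dictGet('airlinecompany', 'company', tuple('UA')) AS company;
-- returns 'United Airlines'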
The CSV file contains a header row, so it needs to use the CSVWithNames format instead of CSV:
CREATE DICTIONARY airlinecompany
(
id UInt64,
code String,
company String
)
PRIMARY KEY id
SOURCE(FILE(path '/var/lib/clickhouse/user_files/airlinenames.csv' format 'CSVWithNames'))
LAYOUT(FLAT())
LIFETIME(3600)
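This version keeps the FLAT layout keyed on the numeric id, so lookups pass a UInt64. A minimal sketch, assuming the dictionary above:

SELECT dictGet('airlinecompany', 'company', toUInt64(16)) AS company;
-- returns 'Delta Air Lines'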
Related
I created an external table; when I select from it, this error shows.
I am working with Oracle 19c.
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-04040: file customer.csv in EXTERNAL not found
Code:
CREATE TABLE customers
(Email VARCHAR2(255) NOT NULL,
Name VARCHAR2(255) NOT NULL,
Phone VARCHAR2(255) NOT NULL,
Address VARCHAR2(255) NOT NULL)
ORGANIZATION EXTERNAL(
type oracle_loader
DEFAULT DIRECTORY external
ACCESS PARAMETERS
(
records delimited by newline
fields terminated by ','
missing field values are null
REJECT ROWS WITH ALL NULL FIELDS)
LOCATION ('customer.csv'))
REJECT LIMIT UNLIMITED;
customer.csv data
salma.55#gmm.com,salma,0152275522,44al,
mariam.66#hotmail.com,mariam,011145528,552www,
ahmed.85#gmail.com,ahmed,0111552774,44eee,
"DEFAULT DIRECTORY external" means you are looking in a named directory that you have called "external".
For example, if I had done:
create directory XYZ as '/tmp';
then
default directory XYZ
means I'll be searching in /tmp for my files. So look at DBA_DIRECTORIES to see where your "EXTERNAL" directory is pointing.
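For example, a quick way to check (assuming you can read the DBA views):

SELECT directory_name, directory_path
FROM dba_directories
WHERE directory_name = 'EXTERNAL';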
I am learning Snowflake. I was trying to read the headers of a CSV file stored in an AWS bucket. I used the metadata fields, which required me to input $1, $2, and so on as column names to obtain the headers (for COPY INTO table creation).
Is there a better alternative to this?
Statement:
select
Top 100 metadata$filename,
metadata$file_row_number,
t.$1,
t.$2,
t.$3,
t.$4,
t.$5,
t.$6
from
@aws_stage t
where
metadata$filename = 'OrderDetails.csv'
Hi, I have a question about SSIS.
The source location has different files, and each file name comes with a location name. We want to load each file into its corresponding table using an SSIS package.
The source location has multiple files for each location name.
Example: files location: c:\Sourcefile\
File names come like: hyd files, bang files.
Hyd files come like hyd.txt, hyd1.txt, hyd2.txt, all with the same structure; all hyd-related files load into the hyd table only.
Bang files come like bang.txt, bang1.txt, bang2.txt, all with the same structure; all bang-related files load into the bang table only.
All source files and target tables have the same structure.
Source file structure, for hyd.txt:
Id,name,loc
1,abc,hyd
2,hari,hyd
For hyd1.txt:
id,name,loc
4,banu,hyd
5,ran,hyd
Similarly for bang.txt:
id,name,loc
10,gop,bang
11,union,loc
For bang1.txt:
id,name,loc
14,ja,bang
All hyd-related text files load into the hyd table in the SQL Server database; similarly, bang files load into the bang table.
hyd table structure :
CREATE TABLE [dbo].[hyd](
[id] [int] NULL,
[name] [varchar](50) NULL,
[loc] [varchar](50) NULL
)
Similarly for bang:
CREATE TABLE [dbo].[bang](
[id] [int] NULL,
[name] [varchar](50) NULL,
[loc] [varchar](50) NULL
)
I tried like below:
With the above, the table names are not set dynamically. I kept static values in the table variable, and then all location-related records were loaded into one table.
How do I load multiple files into multiple destination tables in SSIS? Please tell me how to achieve this task in SSIS.
From the screenshots, I have 3 suggestions:
You have to set the Data Flow Task Delay Validation property to True
You have to change the User::location variable value outside the Data Flow Task; you can add an Expression Task before the Data Flow Task with the following expression:
@[User::location] = SUBSTRING(@[User::FileName], 1, FINDSTRING(@[User::FileName], ".", 1) - 1)
or use a Script Component to achieve this.
Or you can add a Script Task followed by 2 Data Flow Tasks inside the Foreach Loop; the Script Task checks the filename: if it is hyd, it executes the first DFT; if it is bang, it executes the second. (Check this link: Working with Precedence Constraints in SQL Server Integration Services)
I created a table with this statement:
CREATE TABLE event(
date Date,
src UInt8,
channel UInt8,
deviceTypeId UInt8,
projectId UInt64,
shows UInt32,
clicks UInt32,
spent Float64
) ENGINE = MergeTree(date, (date, src, channel, projectId), 8192);
Raw data looks like:
{ "date":"2016-03-07T10:00:00+0300","src":2,"channel":18,"deviceTypeId ":101, "projectId":2363610,"shows":1232,"clicks":7,"spent":34.72,"location":"Unknown", ...}
...
Files with data are loaded with the following command:
cat *.data|sed 's/T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]+0300//'| clickhouse-client --query="INSERT INTO event FORMAT JSONEachRow"
clickhouse-client throws an exception:
Code: 117. DB::Exception: Unknown field found while parsing JSONEachRow format: location: (at row 1)
Is it possible to skip fields from JSON object that not presented in table description?
The latest ClickHouse release (v1.1.54023) supports the input_format_skip_unknown_fields user option, which enables skipping unknown fields for the JSONEachRow and TSKV formats.
Try
clickhouse-client -n --query="SET input_format_skip_unknown_fields=1; INSERT INTO event FORMAT JSONEachRow;"
See more details in the documentation.
Currently, it is not possible to skip unknown fields.
You may create a temporary table with the additional field, INSERT data into it, and then do an INSERT SELECT into the final table. The temporary table may use the Log engine; INSERTs into that "staging" table will be faster than into the final MergeTree table.
It is relatively easy to add the ability to skip unknown fields to the code (something like a 'format_skip_unknown_fields' setting).
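A minimal sketch of that staging approach, assuming location is the only extra field and using a hypothetical table name event_staging:

-- Staging table with the extra `location` column; Log engine for fast inserts
CREATE TABLE event_staging (
    date Date,
    src UInt8,
    channel UInt8,
    deviceTypeId UInt8,
    projectId UInt64,
    shows UInt32,
    clicks UInt32,
    spent Float64,
    location String
) ENGINE = Log;

-- Copy only the columns the final table knows about
INSERT INTO event
SELECT date, src, channel, deviceTypeId, projectId, shows, clicks, spent
FROM event_staging;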
I have a flat file with the following data:
Id,FirstName,LastName,Address,PhoneNumber
1,ben,afflick,xyz Address,5014123746
3,christina,smith,test address,111000110
1,ben,afflick,xyz Address,5014123746
3,christina,smith,test address,111000110
4,nash,gordon,charlotte NC ADDRESS,111200110
I have created an SSIS package that has a Flat File Source, an Aggregate transformation that makes sure only unique rows are inserted and not duplicate records from the flat file, and a SQL table as my destination.
Everything is fine when I run the package; I get the below output in the SQL table:
Id FName LName Address phoneNumber
1 ben afflick xyz Address 5014123746
4 nash gordon charlotte NC ADDRESS 111200110
3 christina smith test address 111000110
But when I add some new data to the flat file, as below:
Id,FirstName,LastName,Address,PhoneNumber
1,ben,afflick,xyz Address,5014123746
3,christina,smith,test address,111000110
1,ben,afflick,xyz Address,5014123746
3,christina,smith,test address,111000110
4,nash,gordon,charlotte NC ADDRESS,111200110
5,abc,xyz,New York,9999988888
and re-run the package, the data that is already present in the table gets re-inserted, as below:
1 ben afflick xyz Address 5014123746
4 nash gordon charlotte NC ADDRESS 111200110
3 christina smith test address 111000110
1 ben afflick xyz Address 5014123746
5 abc xyz New York 9999988888
4 nash gordon charlotte NC ADDRESS 111200110
3 christina smith test address 111000110
But I DO NOT want this; I don't want data to be inserted that is already present.
I want only the newly added data to be inserted into the SQL table.
Can someone please help me achieve this?
Another method is to load your file into a staging table in the database and then use a merge statement to insert data into your destination table.
In practice, this would look like a data flow from your flat file to your staging table, followed by an Execute SQL Task containing a merge statement.
You can then update any matching values as well, should you wish to.
MERGE INTO table_a AS target
USING stage_a AS src
    ON src.[key] = target.[key]
WHEN NOT MATCHED THEN
    INSERT (a, b, c, d) VALUES (src.a, src.b, src.c, src.d);
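Applied to the columns in this question, a sketch might look like the following, assuming a hypothetical destination table dbo.customers and staging table dbo.stage_customers with matching columns:

MERGE INTO dbo.customers AS target
USING dbo.stage_customers AS src
    ON src.Id = target.Id
WHEN NOT MATCHED THEN
    INSERT (Id, FirstName, LastName, Address, PhoneNumber)
    VALUES (src.Id, src.FirstName, src.LastName, src.Address, src.PhoneNumber);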
Your data flow task would look something like this. Here, the Flat File Source reads the CSV file and then passes the data to a Lookup transformation. This transformation checks for existing data in the destination table. If there are no matching records, the data from the CSV file is sent to the OLE DB Destination; otherwise, the data is discarded.
Lookup transformation link: http://www.codeproject.com/Tips/574437/Term-Lookup-Transformation-in-SSIS