Load CSV file data into tables

I created the tables as below:
source:([id:`symbol$()] ric:();source:();Date:`datetime$())
property:([id:`symbol$()] Value:())
Then I have two .csv files which contain the data for these two tables.
property.csv is as below:
id,Value
TEST1,1
TEST2,2
source.csv is as below:
id,ric,source,Date
1,TRST,QO,2017-07-07 11:42:30.603
2,TRST2,QOT,2018-07-07 11:42:30.603
Now, how do I load the data from each CSV file into its table in one go?

You can use 0: to load delimited records: https://code.kx.com/wiki/Reference/ZeroColon
The simplest form of the function is (types; delimiter) 0: filehandle
The types should be given as their uppercase letter representations, one for each column, or a blank space to ignore a column. e.g. using "SJ" for property.csv would mean I want to read in the id column as a symbol and the Value column as a long.
The delimiter specifies how the columns are separated, in your case Comma Separated Values (CSV). You can pass in the delimiter as a string ",", which will treat every row as part of the data and return a nested list of the columns. You can either insert that into a table with a matching schema, or append column names and flip the dictionary to get a table, like so: flip `id`value!("IS";",") 0: `:test.txt.
If you have column headers as the first row in the csv you can pass an enlisted delimiter, enlist ",", which will then use the column headers and return a table in kdb with these as the headers, which you can then rename if you see fit.
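For instance, a minimal sketch of the two loading styles against the property.csv above (headerless.csv is a hypothetical copy of it without the header row):
("SJ";enlist ",") 0: `:property.csv / header row present: returns a table with those headers
flip `id`Value!("SJ";",") 0: `:headerless.csv / no header row: returns a list of columns to name and flip yourself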
As the files you want to read in have different column types and are to be read into different tables, you could create a function to read them in, for example:
{x insert (y;enlist ",") 0:z}'[(`source;`property);("SSSP";"SJ");(`:source.csv;`:property.csv)]
Here the each adverb (') applies the function to the three lists in parallel, which lets you specify, for each file, the name of the table to insert into, the column types, and the file handle.
I would suggest a timestamp instead of the (deprecated) datetime, as it is stored as a long instead of a float, so there will be no issues with comparison.
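For example, the source table from the question could be declared with a timestamp column instead (a sketch; the untyped columns are left empty as in the original definition):
source:([id:`symbol$()] ric:();source:();Date:`timestamp$())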

You can use key to list the contents of the directory:
files: key `:.; /get the contents of the dir
files:files where files like "*.csv"; /filter the csv files
m:`property.csv`source.csv!("SJ";"JSSZ"); /create the mappings for each csv file
{[f] .[first ` vs f;();:;(m f;enlist csv) 0: hsym f]}each files /parse each file and assign it to a global named after it
And finally, load each csv file. Please note that here the directory is the current working directory (pwd); you might need to prepend the directory path to each file before passing it to 0:.
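After running the loader you can sanity-check that both globals were created (a quick check, assuming only the two files above are in the directory):
tables[] / should now include `property and `source
count each (property;source) / row counts of the loaded tables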

Related

ADF COPY ACTIVITY unable to identify the (") double quotes in between the column value while loading CSV files to snowflake

I'm facing an issue with the ADF copy activity while loading CSV data into a Snowflake table: it is treating the data of a single column as data for multiple columns.
for example: "My brother often watches different cricket shows on different ""screens"", but on the same different platform"
This is the value of the single column_A, but the ADF copy activity is reading it as values for two columns instead of one,
i.e col_A=My brother often watches different cricket shows on different ""screens"
col_B= but in the same different platform
But I want this value to be in a single column, i.e. column_A:
column_A="My brother often watches different cricket shows on different ""screens"", but on the same different platform"
Are there any alternatives I could try for this?
In your source data, the column value contains a comma (,) and double quotes (") which are the same as your dataset properties' column delimiter and quote character.
The column delimiter is used to separate the columns based on the given delimiter value.
If the column value also contains the delimiter character, the quote character is used to identify the complete value as a single column.
Example:
sample data : "1,abc",def
(In the Azure Data Factory dataset preview, "1,abc" is read as a single column and def as the second, because the quote character keeps the embedded comma inside one value.)
In your case you have both the column delimiter and the quote character within your column value, so it is not identified as a single column but instead separated into 2 columns based on the dataset property values (comma , and double quotes ").
Your sample data :
"My brother often watches different cricket shows on different ""screens"", but on the same different platform"
To fix this you can change the column delimiter in your source file, or replace the double quotes within the column value with something else.
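One further possibility, if you control the dataset definition (an assumption, since it is not mentioned above): the DelimitedText dataset has an escapeChar property, and setting it to a double quote rather than the default backslash tells the copy activity to treat the doubled "" inside a quoted value as an escaped quote instead of a field boundary. A sketch of the relevant typeProperties:
"typeProperties": {
    "columnDelimiter": ",",
    "quoteChar": "\"",
    "escapeChar": "\"",
    "firstRowAsHeader": true
}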

Modify the delimiter of an external table with HiveQL

I'm taking a CSV file from HDFS and transferring it to my external table in Hive.
But my CSV file has the delimiter " ; ", and my second column's value itself contains a " ; " along with the information (for example a user-agent string such as Mozilla//5.0;(Linux)).
Can you guide me what I should do? Are there any Hive properties that allow me to do this or any other solution?
By default, ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' will split it apart.
If you want the (OS) value to be part of the second column, you need to quote that column. e.g. A;"Mozilla//5.0;(Linux)";BR. In other words, change how the file is written/stored outside of Hive
If you cannot modify the file, you can make your queries simply concatenate those two columns, e.g. SELECT CONCAT(user_agent, ';', os) FROM data;
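If you can regenerate the file with quoted fields, one commonly used option (an addition beyond the answer above, so treat the table details as assumptions) is to declare the external table with the OpenCSVSerde, which honours the quote character; a sketch:
CREATE EXTERNAL TABLE data_quoted (
  col1 STRING,
  user_agent STRING,  -- "Mozilla//5.0;(Linux)" stays one field when quoted
  col3 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ('separatorChar' = ';', 'quoteChar' = '"')
LOCATION '/path/to/csv/dir';
Note that OpenCSVSerde reads every column as STRING, so non-string columns need a cast when queried.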

Copy data files from internal stage table to Logical tables

I am dealing with JSON and CSV files moving from a Unix/S3 bucket to an Internal/External stage respectively,
and I don't have any issue copying JSON files from the Internal/External stages to a static or logical table, where I am storing them as JsonFileName and JsonFileContent.
Trying to copy to Static table ( parse_json($1) is working for JSON)
COPY INTO LogicalTable (FILE_NM, JSON_CONTENT)
from (
select METADATA$FILENAME AS FILE_NM, parse_json($1) AS JSON_CONTENT
from @$TSJsonExtStgName
)
file_format = (type='JSON' strip_outer_array = true);
I am looking for something similar for CSV: copy the csv file name and csv file content from the internal/external stage to a static or logical table. I am mainly looking for this to separate the file copy from the file load, since the load may fail due to a mismatch in the number of columns, newline characters, or bad data in one of the records.
If any one of the below gets clarified, that is fine; please suggest:
1) Trying to copy to Static table (METADATA$?????? not working for CSV)
select METADATA$FILENAME AS FILE_NM, METADATA$?????? AS CSV_CONTENT
from @INT_REF_CSV_UNIX_STG
2) Trying for dynamic columns (T.* not working for CSV)
SELECT METADATA$FILENAME,$1, $2, $3, T.*
FROM @INT_REF_CSV_UNIX_STG(FILE_FORMAT => CSV_STG_FILE_FORMAT)T
Regardless of whether the file is CSV or JSON, you need to make sure that your SELECT matches the table layout of the target table. I assume with your JSON, your target table is 2 columns...filename and a VARIANT column for your JSON contents. For CSV, you need to do the same thing. So, you need to do the $1, $2, etc. for each column that you want from the file...that matches your target table.
I have no idea what you are referencing with METADATA$??????, btw.
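As a sketch of what that looks like for CSV (CsvTable and its two data columns are hypothetical, and skip_header=1 assumes the files carry a header row; match everything to your own target table):
COPY INTO CsvTable (FILE_NM, COL1, COL2)
from (
select METADATA$FILENAME AS FILE_NM, $1 AS COL1, $2 AS COL2
from @INT_REF_CSV_UNIX_STG
)
file_format = (type='CSV' skip_header=1);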
---ADDED
Based on your comment below, you have 2 options, which aren't native to a COPY INTO statement:
1) Create a Stored Procedure that looks at a table's DDL and generates a COPY INTO statement with the static columns defined, and then executes the COPY INTO from within the SP.
2) Leverage an External Table. By defining an External Table with the METADATA$FILENAME and the rest of the columns, the External Table will return the CSV contents to you as JSON. From there, you can treat it in the same way you are treating your JSON tables.
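A minimal sketch of that external-table approach (the table name is hypothetical, and since external tables require an external stage, @EXT_REF_CSV_STG stands in for the external-stage variant of your stage):
CREATE EXTERNAL TABLE INT_REF_CSV_EXT (
  FILE_NM varchar as METADATA$FILENAME
)
LOCATION = @EXT_REF_CSV_STG
FILE_FORMAT = (type = 'CSV')
AUTO_REFRESH = false;
-- each record's full CSV content comes back as JSON in the VALUE variant column
select FILE_NM, VALUE from INT_REF_CSV_EXT;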

"EmptyHeader" in CSV Export Options?

I have a CSV file I am attempting to create, and the recipient requires a header row. In this header row (and in the data) there is a field that used to be present that was removed. However, they did not remove the column that held that data, so now there is an empty column name surrounded by delimiters ("|"). How can I recreate this?
The expected results for the following columns should be:
RxType1|RxType2|RxType3|RxType4|RxType5||DelivID
(There is an empty column between RxType5 and DelivID) and the results would be:
|Rx|OTC|Legend|Generic|Other||Express
I am using SSRS, and have attempted adding an extra pipe to the column header for RxType5 with an empty column behind it, but the CSV seems to generate a header row based on the column names from the stored procedure and not from the RDL data. I have also attempted, in the stored proc, to create the column by using:
Select
'' AS ""
OR
'' AS "|"
but when I refresh the fields in SSRS, it says that the column is called "ID_" (because a space, an empty name, or a pipe is not CLS-compliant).
Any suggestions on how I can achieve this? Thanks so much :)
Try creating the column with a known name, like SELECT '' AS [RemoveMe], and then just remove that name from the row header text box.
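A sketch of that in the stored procedure (dbo.RxExport and the surrounding column list are assumptions based on the sample above):
SELECT
    RxType1, RxType2, RxType3, RxType4, RxType5,
    '' AS [RemoveMe],  -- placeholder column; clear this name from its header text box in the report
    DelivID
FROM dbo.RxExport;
With the name removed from the header text box, per the answer above, the export should yield the empty column name between the two pipes.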

SAP Data Services .csv data file load from Excel with special characters

I am trying to load data from an Excel .csv file to a flat file format to use as a datasource in a Data Services job data flow which then transfers the data to an SQL-Server (2012) database table.
I consistently lose 1 in 6 records.
I have tried various parameter values in the file format definition and settled on setting Adaptable file scheme to "Yes", file type "delimited", column delimiter "comma", row delimiter {windows new line}, text delimiter ", language eng(English) and all else as defaults.
I have also set "write errors to file" to "yes", but it just creates an empty error file (I expected the 6,000-odd unloaded rows to be in there).
If we strip out three of the columns containing special characters (visible in Excel) it loads fine, so I think these characters are the problem.
The thing is, we need the data in those columns and unfortunately, this .csv file is as good a data source as we are likely to get and it is always likely to contain special characters in these three columns so we need to be able to read it in if possible.
Should I try to specifically strip the columns in the Query source component of the dataflow? Am I missing a data-cleansing trick in the query or file format definition?
OK, so I didn't get the answer I was looking for, but I did get it to work by setting the "Row within Text String" parameter to "Row delimiter".
