PostgreSQL CSV file validation - database

I know CSV files can be imported into a database using COPY.
But is there any way I can validate/check the CSV file structure/syntax before I use the COPY function? Is there any built-in function?
Or is it possible to process the CSV file row by row, and if so, how?
For example, a user uploads a new CSV file. I want to make sure that every row contains the required columns a, b, c and d, that all values are set, that no field value exceeds the allowed length, that the file itself is valid CSV, that the correct delimiter is used, that the file is UTF-8 encoded, and similar checks.
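One way to run checks like these is in the application, row by row, before calling COPY. Below is a minimal sketch using only Python's standard `csv` module. The column names, the limits in `MAX_LENGTHS` and the comma delimiter are hypothetical placeholders for your own rules, and the delimiter check is a heuristic: it assumes that a wrong delimiter makes the header parse as a single field.

```python
import csv
import io

# Hypothetical rules for illustration; replace them with your own schema.
REQUIRED_COLUMNS = ["a", "b", "c", "d"]
MAX_LENGTHS = {"a": 10, "b": 50, "c": 50, "d": 20}
DELIMITER = ","


def validate_csv(raw: bytes) -> list[str]:
    """Check an uploaded CSV file; an empty result means every check passed."""
    # Encoding: strict UTF-8 decoding raises on any invalid byte sequence.
    # "utf-8-sig" also accepts (and drops) a leading byte-order mark.
    try:
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError as exc:
        return [f"file is not valid UTF-8: {exc}"]

    # strict=True makes the csv module raise csv.Error on malformed quoting.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=DELIMITER, strict=True)
    errors: list[str] = []
    try:
        header = next(reader, None)
        if header is None:
            return ["file is empty"]
        # Delimiter/structure: with the wrong delimiter the header usually
        # parses as a single field, so the required names are not found.
        missing = [name for name in REQUIRED_COLUMNS if name not in header]
        if missing:
            return [f"header is missing columns {missing} (wrong delimiter?)"]
        positions = {name: header.index(name) for name in REQUIRED_COLUMNS}

        # Row-by-row checks; the header is record 1.
        for record_no, row in enumerate(reader, start=2):
            if len(row) != len(header):
                errors.append(f"record {record_no}: expected {len(header)} fields, got {len(row)}")
                continue
            for name, pos in positions.items():
                value = row[pos]
                if not value.strip():
                    errors.append(f"record {record_no}: column {name!r} is empty")
                elif len(value) > MAX_LENGTHS[name]:
                    errors.append(f"record {record_no}: column {name!r} exceeds {MAX_LENGTHS[name]} characters")
    except csv.Error as exc:
        errors.append(f"malformed CSV near line {reader.line_num}: {exc}")
    return errors
```

If the returned list is empty you would go on to run COPY. Otherwise the messages can be shown to the user who uploaded the file.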

Related

Parsing CEF files in Snowflake

We have staged the log files in an external S3 stage. The staged log files are in CEF format. How do we parse the CEF files in the stage and load the data into Snowflake?
If the files have a fixed format (i.e. there are record and field delimiters and each record has the same number of columns) then you can just treat it as a text file and create an appropriate file format.
If the file has a semi-structured format, you should be able to load it into a VARIANT column; whether you can create multiple rows per file or only one depends on the file structure. If you can only create one record per file, you may run into issues with file size, as a VARIANT value has a maximum size.
Once the data is in a VARIANT column you should be able to process it to extract usable data from it. If there is a structure Snowflake can process (e.g. XML or JSON), you can use the native capabilities. If there is no recognisable structure, you'd have to write your own parsing logic in a stored procedure.
Alternatively, you could try and find another tool that will convert your files to an xml/json format and then Snowflake can easily process those files.
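As an illustration of the "convert to JSON first" route, here is a minimal Python sketch that turns CEF lines into newline-delimited JSON, which could then be loaded with a JSON file format. It assumes the common CEF layout `CEF:Version|Vendor|Product|Version|Signature ID|Name|Severity|key=value ...` with `\|` escaping pipes in the header and `\=` escaping equals signs in values. The output field names (`device_vendor` etc.) are hypothetical. This is a simplified parser: an escaped backslash directly before a separator (`\\|`) is not handled.

```python
import json
import re

# CEF header: CEF:Version|Vendor|Product|Version|Signature ID|Name|Severity|Extension
HEADER_FIELDS = ["version", "device_vendor", "device_product",
                 "device_version", "signature_id", "name", "severity"]

# Header fields are separated by "|" that is not escaped as "\|".
_PIPE = re.compile(r"(?<!\\)\|")
# Extension is "key=value" pairs; a value runs until the next " key=" or the end.
_PAIR = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")


def parse_cef(line: str) -> dict:
    """Parse one CEF line (optionally preceded by a syslog prefix) into a dict."""
    start = line.find("CEF:")
    if start == -1:
        raise ValueError("no CEF header found")
    parts = _PIPE.split(line[start + 4:].rstrip("\r\n"), maxsplit=7)
    if len(parts) < 7:
        raise ValueError("expected 7 pipe-separated header fields")
    record = {key: value.replace("\\|", "|").replace("\\\\", "\\")
              for key, value in zip(HEADER_FIELDS, parts)}
    extension = parts[7] if len(parts) > 7 else ""
    for key, value in _PAIR.findall(extension):
        record[key] = value.replace("\\=", "=").replace("\\\\", "\\")
    return record


def cef_to_ndjson(lines) -> str:
    """Emit one JSON object per non-blank CEF line."""
    return "\n".join(json.dumps(parse_cef(line)) for line in lines if line.strip())
```

Writing the output as one JSON object per line keeps each log event a separate row when loaded, which avoids the one-VARIANT-per-file size issue mentioned above.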

How to convert a CSV file into bcp formatted file?

I'm working on importing a CSV file into a SQL Server table. I'm using the bcp tool because my data can be large. The issue I'm facing is that the target table can have a mix of data types such as date, int, etc., and if I use bcp in native mode (-n), it expects a native-format bcp file as input, but I have a CSV file.
Is there any way to convert a CSV file into a bcp file? Or:
How can I import a CSV file into a SQL Server table given that my table columns can have any data type, not just character types?
If all columns were character types, I would have used the bcp tool with the -c option.
Actually... the safest thing to do when importing data, especially in bulk like this, is to import it into a staging table first, one in which all of the fields are strings (varchar). That allows you to scrub and validate the data and make sure it's safe for consumption. Once you've verified it, move/copy it to your production tables, converting it to the proper types as you go. That's typically what I do when dealing with imported data.
A CSV file is just a text file delimited by commas, and for text imports there is no such thing as a 'BCP file'. BCP has an option to work with native SQL data (unreadable to the human eye in a text editor), but by default it works with plain text, the same as what you have in your CSV file. No conversion is needed; it's just an ASCII text file.
Whoever created the text file has already converted their native data types into text. As others have suggested, you will save yourself some pain later if you just load the textual CSV data file you have into a "load" table of all "VARCHAR" fields. Then, from that load table, you can manipulate the data into whatever data types you require in your final destination table. Better to do this than to make SQL Server do implicit conversions by having BCP insert data directly into the final destination table.
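The "load as strings, then convert" step can be sketched in Python to show the idea. Each staged row is a dict of strings, as `csv.DictReader` would produce, and a per-column converter either yields a typed value or rejects the row. The column names, the ISO date format and the `SCHEMA` mapping are hypothetical. In T-SQL the same step would typically be done with TRY_CONVERT when copying from the load table.

```python
from datetime import date, datetime
from decimal import Decimal, InvalidOperation


def to_date(text: str) -> date:
    # Hypothetical source format: ISO dates such as 2024-01-31.
    return datetime.strptime(text, "%Y-%m-%d").date()


# Hypothetical target schema: column name -> converter from the staged string.
SCHEMA = {"id": int, "order_date": to_date, "amount": Decimal}


def convert_staged(rows):
    """Split staged all-string rows into typed rows and rejected rows."""
    typed, rejected = [], []
    for row_no, row in enumerate(rows, start=1):
        try:
            typed.append({col: convert(row[col].strip()) for col, convert in SCHEMA.items()})
        except (KeyError, ValueError, InvalidOperation) as exc:
            # Keep the original strings so the bad row can be reported or fixed.
            rejected.append((row_no, row, repr(exc)))
    return typed, rejected
```

Keeping the rejected rows (with their original text) alongside the typed ones is what makes the staging approach safer than letting the bulk load fail on the first bad value.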

BAI2 file needs to be loaded into SSIS

How can I load a BAI2 file into SSIS?
BAI2 is an industry-standard format used by banks. Below is a truncated example:
01,021000021,CST_USER,110520,1610,1627,,,2/
02,CST_USER,089900137,1,110509,1610,,2/
03,000000370053368,USD,010,782711622,,,015,7620008 12,,,040,760753198,,/
88,043,760000052,,,045,760010026,,,050,760000040,, ,055,760000045,,/
Use a Flat File connection manager
I think you can import these files using a Flat File connection manager, because they are similar to comma-separated text. Try changing the row delimiter and column delimiter properties to find the appropriate ones.
From the example you posted, I think you should use:
, as Column delimiter
/ as Row delimiter
To learn more about how to interpret a BAI2 file check the following link:
EBS – How to interpret a BAI2 file
Based on this link:
The BAI2 file is a plain text file (.TXT Format), which contains values / texts one after the other.
Because the number of columns is not fixed across rows, you must define only one column (DT_STR, 4000) in the Flat File connection manager and split the columns using a Script Component:
SSIS ragged file not recognized CRLF
how to check column structure in ssis?
SSIS : Creating a flat file with different row formats
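The splitting logic such a Script Component would need is short. Here it is sketched in Python rather than the C# a Script Component would use. It treats each line of the sample above as one record, drops the trailing "/" record terminator, splits on commas, and appends the fields of type 88 (continuation) records to the preceding record. Assuming that "/" only appears as the terminator at the end of a line is a simplification.

```python
def parse_bai2(text: str) -> list[list[str]]:
    """Split BAI2 text into records (lists of fields), merging 88 continuations."""
    records: list[list[str]] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        # A trailing "/" ends the record's data (see the sample above).
        if line.endswith("/"):
            line = line[:-1]
        fields = line.split(",")
        if fields[0] == "88" and records:
            # Record type 88 continues the previous record: append its fields.
            records[-1].extend(fields[1:])
        else:
            records.append(fields)
    return records
```

Because every record starts with its type code (01, 02, 03, ...), the first field of each returned list tells you which column layout to apply downstream.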
Helpful links
SQL SERVER – Import CSV File into Database Table Using SSIS
Importing Flat Files with Inconsistent Formatting Using SSIS
SSIS Lesson 2: First Package

Mass import txt files in a single SQL Server table, using filename as key column

I have a folder of txt files. The filenames are of the form [integer].txt (like 1.txt, 2.txt and so on).
I have a table, let's say TableA (id int not null, contents varchar(max))
I want a way to mass import the contents of those files into TableA, populating the id column from the filename. Each file will be a single record in the table. It's not a delimited file.
I've looked into SSIS and flat-file source, but I could not find a way to select a folder instead of a single file (this answer claims it can be done, but I could not find out how).
Bulk Insert is my next bet, but I'm not sure how I can populate the id column with the filename.
Any ideas?
For anyone who might need it, I ended up solving this by:
Using a Foreach Loop container (thanks for the hint, @Panagiotis Kanavos)
Using a Flat File source, setting both the row delimiter and the column delimiter to a sequence I know doesn't occur in the files (for example '$$$')
Assigning the filename to a variable, and the full path to a computed variable (check this great post on how to assign the variables)
Using a Derived Column to pass the filename into the output (check out this answer)
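Outside SSIS, the same "one file = one row, id from the filename" idea can be sketched in a few lines of Python. The sketch assumes the files are UTF-8 text and simply skips names that are not of the form [integer].txt. It only builds the (id, contents) pairs; sending them to TableA is left to whatever database driver you use.

```python
import tempfile
from pathlib import Path


def load_folder(folder) -> list[tuple[int, str]]:
    """Return (id, contents) pairs for every [integer].txt file in folder."""
    rows = []
    for path in Path(folder).glob("*.txt"):
        if not path.stem.isdigit():
            continue  # skip files that don't match the [integer].txt pattern
        rows.append((int(path.stem), path.read_text(encoding="utf-8")))
    rows.sort(key=lambda row: row[0])  # numeric order: 1, 2, 10 rather than 1, 10, 2
    return rows


if __name__ == "__main__":
    # Demo on a throwaway folder.
    with tempfile.TemporaryDirectory() as tmp:
        for name, text in [("2.txt", "second"), ("10.txt", "tenth"),
                           ("1.txt", "first"), ("notes.txt", "skip me")]:
            Path(tmp, name).write_text(text, encoding="utf-8")
        print(load_folder(tmp))  # [(1, 'first'), (2, 'second'), (10, 'tenth')]
```

With pyodbc, for example, the result could be passed to cursor.executemany("INSERT INTO TableA (id, contents) VALUES (?, ?)", rows).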

How to import variable record length CSV file using SSIS?

Has anyone been able to get a variable record length text file (CSV) into SQL Server via SSIS?
I have tried time and again to get a CSV file into a SQL Server table using SSIS, where the input file has varying record lengths. For this question, the two record lengths are 63 and 326 bytes. All records will be imported into the same 326-byte-wide table.
There are over 1 million records to import.
I have no control of the creation of the import file.
I must use SSIS.
I have confirmed with MS that this has been reported as a bug.
I have tried several workarounds. Most involved writing custom code to intercept each record, and I can't seem to get that to work as I want.
I had a similar problem, and used custom code (Script Task), and a Script Component under the Data Flow tab.
I have a Flat File Source feeding into a Script Component. Inside it, I use code to manipulate the incoming data and fix it up for the destination.
My issue was that the provider was using '000000' to mean "no date available", and another column had a padding/trim issue.
You should have no problem importing this file. Just make sure that when you create the Flat File connection manager, you select the Delimited format, then set the SSIS column length to the maximum file column length so it can accommodate any data.
It appears that you are using the Fixed width format, which is not correct for CSV files (since you have variable-length columns), or maybe you've set the column delimiter incorrectly.
Same issue. In my case, the target CSV file has header & footer records with formats completely different than the body of the file; the header/footer are used to validate completeness of file processing (date/times, record counts, amount totals - "checksum" by any other name ...). This is a common format for files from "mainframe" environments, and though I haven't started on it yet, I expect to have to use scripting to strip off the header/footer, save the rest as a new file, process the new file, and then do the validation. Can't exactly expect MS to have that out-of-the box (but it sure would be nice, wouldn't it?).
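The strip-then-validate step could look like the Python sketch below. The record layout is entirely hypothetical: it assumes the header line starts with "HDR" and the trailer is "TRL,<record count>". It checks the record count up front rather than after processing; date/time and amount-total checks would be added the same way.

```python
def strip_and_validate(lines: list[str]) -> list[str]:
    """Return the body records after checking the header and trailer."""
    if len(lines) < 2:
        raise ValueError("file needs at least a header and a trailer record")
    header, body, trailer = lines[0], lines[1:-1], lines[-1]
    # Hypothetical layout: header starts with "HDR", trailer is "TRL,<record count>".
    if not header.startswith("HDR"):
        raise ValueError(f"first line is not a header: {header!r}")
    fields = trailer.split(",")
    if fields[0] != "TRL" or len(fields) < 2:
        raise ValueError(f"last line is not a trailer: {trailer!r}")
    if int(fields[1]) != len(body):
        raise ValueError(f"trailer expects {fields[1]} records, body has {len(body)}")
    return body
```

The returned body lines can then be written to a new file (for example with Path.write_text("\n".join(body) + "\n")) and that file handed to the SSIS package.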
You can write a Script Task in C# that iterates through each line and pads it with the proper number of commas. This assumes, of course, that all of the data aligns with the proper columns.
I.e., as you read each record, you "count" the number of commas, then append commas to the end of the record until it has the correct number.
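The padding logic described above can be sketched as a preprocessing step. It is shown here in Python rather than the C# Script Task suggested. Like the answer, it counts raw commas, so it assumes no field contains a quoted comma. If fields can be quoted, counting fields with a real CSV parser would be the safer variant.

```python
def pad_record(line: str, expected_fields: int) -> str:
    """Append commas until the record has expected_fields fields."""
    # Assumes no field contains a quoted comma, so every comma is a delimiter.
    missing = (expected_fields - 1) - line.count(",")
    if missing < 0:
        raise ValueError(f"record has more than {expected_fields} fields: {line!r}")
    return line + "," * missing


def pad_lines(lines, expected_fields: int) -> list[str]:
    # Strip line endings first so the padding goes before CR/LF.
    return [pad_record(line.rstrip("\r\n"), expected_fields) for line in lines]
```

After padding, every record has the same number of delimiters, so a plain Delimited flat file connection can read the short and long records alike.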
Excel has an issue that causes this kind of file to be created when converting to CSV.
If you can do this "by hand", the best way to solve it is to open the file in Excel, create a column at the "end" of the record, and fill it all the way down with 1s or some other character.
Nasty, but can be a quick solution.
If you don't have the ability to do this, you can do the same thing programmatically as described above.
Why can't you just import it as a text file and set the column delimiter to "," and the row delimiter to CRLF?

Resources