I am using the following Postgres command in a terminal to export a very large query result in CSV format:
psql -d ecoprod -t -A -F"," -f queries/query.sql > exports/output.csv
It works just fine except it's not valid CSV: text values should be wrapped in double quotes, but they aren't, which causes many parsing problems when the text contains commas and so on.
Of course I could use another delimiter like a semicolon, but that has the same problem. In addition, some text values contain line-break characters, which also breaks the parsing.
I didn't find any way to modify the command in the documentation. I hope you can help me. Thank you.
-F doesn't promise to generate valid CSV. There is a --csv option you could use instead, which is at least intended for this purpose. But it seems like COPY or \copy would be better suited.
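A minimal sketch of both suggestions (assuming psql 12 or later for --csv; the SELECT in the \copy line is a placeholder for the real query):

```shell
# Real CSV output (quoting, embedded commas and newlines handled) from a script file:
psql -d ecoprod --csv -f queries/query.sql > exports/output.csv

# Or the \copy meta-command, which quotes and escapes per the CSV rules
# and runs with the client's file permissions:
psql -d ecoprod -c "\copy (SELECT * FROM my_table) TO 'exports/output.csv' WITH (FORMAT csv, HEADER)"
```

With either approach, values containing commas, quotes, or line breaks come out properly quoted and escaped, so downstream parsers can handle them.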
Related
I have some sensitive information that I need to import into SQL Server that is proving to be a challenge. I'm not sure what the original database that housed this information was, but I do know it is provided to us in a Unix fixed-length text file with an LF row terminator. I have two files: a small file that covers a month's worth of data, and a much larger file that covers five years' worth. I have created a BCP format file and command that successfully imports and maps the data to my SQL Server table.
The five-year data is supposedly in the same format, so I've used the same command and format file on that text file. It starts processing records, but somewhere along the way (after several thousand records) it throws "Unexpected EOF encountered". I can see in the database that some rows are mapped correctly according to the fixed lengths, but then something goes horribly wrong and parts of the data end up in columns they most definitely do not belong in. Is there a character that would cause BCP to mess up and terminate early?
BCP Command: BCP DBTemp.dbo.svc_data_temp in C:\Test\data2.txt -f C:\test\txt2.fmt -T -r "0x0A" -S "stageag,90000" -e log.rtf
Again, the format file and command work perfectly for the smaller data set, but something in the five-year dataset is screwing up BCP.
Thanks in advance for the replies!
So I found the offending characters in my fixed-width file. Whoever pulled the data originally (I don't have access to the source) escaped the double quotes incorrectly (or not at all) in some of the text, injecting extra spaces that broke the fixed-width layout we were supposed to be following. After correcting the double quotes by hex-editing the file, BCP was able to process all records using the format file without issue. I had used the -F and -L flags to examine specific row ranges and narrow things down until I could visually compare the rows that were OK with the rows where the problems started, which led me to the double-quote issue. Hope this helps somebody else with a similar issue!
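The same narrowing-down can be done outside BCP: in a fixed-width file, every record should have the same length, so any line of a different length is suspect. A small sketch (the sample data and the expected width of 9 are made up):

```shell
# Sample fixed-width data; the third line has an extra injected space,
# making it one character too long.
printf 'AAAA BBBB\nCCCC DDDD\nEEEE  FFFF\n' > records.txt

# Print the line number and content of every record that deviates
# from the expected width.
awk -v want=9 'length($0) != want { print NR": "$0 }' records.txt
```

Running this prints only the malformed third line, pointing directly at the record a fixed-width loader would choke on.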
My BCP command looks like this:
BCP azuredatabase.dbo.rawdata IN "clientPathToCSVFile" -S servername -U user#servername -P pw -c -t,-r\n
My CSV file is in {cr}{lf} format.
My CSV looks like this
125180896918,20,9,57.28,2020-01-04 23:02:21,282992,1327,4,2850280,49552
125180896919,20,10,57.82,2020-01-04 23:02:21,282992,1298,4,2850280,48881
125180896920,16,11,58.20,2020-01-04 23:02:21,282992,1065,4,2850280,48612
125180896921,20,12,69.10,2020-01-04 23:02:21,282992,515,4,2850280,10032
125180896922,20,13,70.47,2020-01-04 23:02:21,282992,1280,4,2850280,48766
125180896923,1,1,105.04,2020-01-04 23:02:21,,1296,4,2969398,49161
As you can see there are also empty fields.
My output looks like this
Starting copy...
0 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total : 547
So how do I correctly setup my command for BCP?
You stated that your data is in CRLF format (that means \r\n). But your bcp command is told to look for a line terminator of \n (using the -r option).
I would have expected the first half of your actual CRLF line terminator to be split off, with the \r being included in your last column and the \n being found as the line terminator, but it looks like BCP loaded no rows because it found no \n in your file.
I have not worked with Azure/BCP much, so maybe someone else knows how BCP for Azure would handle this, but the SQL Server version of BCP would still find your \n and then load the \r into your last column.
Either that, or your line terminator is not what you think it is. Have you viewed the file with a text editor (not Notepad, not WordPad... something that will show hidden characters like line terminators and tabs)?
Usually, when BCP loads no rows (and there are rows in the file to load), it could be a mixup with line terminators.
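A quick way to confirm what the terminators really are is to dump the file's bytes (the sample file below is made up). A CRLF file would call for a bcp row terminator of \r\n rather than \n, though exact terminator handling should be checked against the bcp documentation for your version:

```shell
# Hypothetical file written with CRLF line endings.
printf 'row1\r\nrow2\r\n' > sample.csv

# od -c prints control characters explicitly, so \r and \n are visible;
# a CRLF file shows "\r \n" pairs at the end of each record.
od -c sample.csv | head -n 2
```

If the dump shows only \n, the -r \n in the original command was right and the problem lies elsewhere; if it shows \r \n pairs, the row terminator option needs to include the \r.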
I've seen some suggestions to use eval, but that assumes the quoted text is an entire string in itself, and not part of one.
Simple example:
Split the string
<SERVICETYPE Name="Two words">
So that we get
<SERVICETYPE
Name="Two words">
Is this possible? Ideally in a statement that I can then use to loop through the values. (Yes, I know Perl or something would be easier, but I don't have anything more useful than bash available, so I have to get this working.)
I'm currently splitting into an array with the following
IFS=" " read -ra xmlfield <<< "${xmlline}"
for i in "${xmlfield[@]}"; do
But then of course that gives me:
<SERVICETYPE
Name="Two
words">
Which is a pain.
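One way to get that split in pure bash is iterative regex matching rather than IFS word splitting, so a quoted value counts as part of its token. A sketch (variable names are illustrative):

```shell
# Each token is a run of non-space characters that may contain one
# double-quoted section; the regex consumes the input token by token.
xmlline='<SERVICETYPE Name="Two words">'
tokens=()
rest=$xmlline
re='^[[:space:]]*([^[:space:]"]*("[^"]*")?[^[:space:]"]*)'
while [[ -n $rest && $rest =~ $re ]]; do
    [[ -z ${BASH_REMATCH[0]} ]] && break   # safety guard against a stuck match
    tokens+=("${BASH_REMATCH[1]}")         # the token, leading spaces stripped
    rest=${rest:${#BASH_REMATCH[0]}}       # drop what was consumed
done

for t in "${tokens[@]}"; do printf '%s\n' "$t"; done
```

Printing the tokens yields `<SERVICETYPE` on one line and `Name="Two words">` on the next, which is exactly the split asked for above.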
EDIT: Fixed! Changing grep to find fixed the newline issue.
I'm trying to output newline characters to a results file when I'm running a script to log test results. After every test result, I want the script to output a newline character for proper formatting.
The command I'm sending is:
tester.bat | grep "Passed all trials" > results.txt
It works properly, but it outputs like this inside results.txt:
Test #1: Passed all trialsTest #2: Passed all trials
and so on.
I'd like it to output like this inside results.txt:
Test #1: Passed all trials (newline)
Test #2: Passed all trials
Can it be done in a single line? If not, I'm open to longer solutions. If this is not possible, I guess I could write a separate script that will update the text file with appropriate newlines between "trialsTest," but this route is not preferable. Thanks in advance!
Why use external utilities when there are built-in commands that do exactly what you need?
tester.bat|find "Passed all trials"
or
tester.bat|findstr /c:"Passed all trials"
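For comparison, a standard Unix grep already emits one matched line per line, newline intact; the missing newlines in the question came from the particular Windows grep port. A small illustration with made-up test output:

```shell
# Simulated tester output piped through grep: each matching line is
# written out with its trailing newline, one result per line.
printf 'Test #1: Passed all trials\nTest #2: failed\nTest #3: Passed all trials\n' |
    grep "Passed all trials"
```

The two matching lines come out on separate lines, which is the formatting the question was after.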
I'm trying to do a bulk copy from a file into a table in SQL Server 2008. I only want to copy part of the file (in the BCP command example below, it is only line number 3 that I will copy).
Sometimes I receive the "Unexpected EOF encountered" error message.
The command is:
BCP tblBulkCopyTest in d:\bc_file.txt -F3 -L3 -c -t, -S(local) -Utest -Ptest
When the file looks like the following, the copy works fine (the line number 3 is inserted into my table):
44261233,207,0,0,2168
44261234,207,0,0,2570
44261235,207,0,0,2572
When the file looks like the following, I receive "Unexpected EOF encountered" error message:
Test
44261233,207,0,0,2168
44261234,207,0,0,2570
44261235,207,0,0,2572
It seems that when the file starts with something not in the correct format, the BCP command fails, even though that is not a line I want to copy (in the BCP command I have specified line number 3).
When the file looks like the following, the copy works fine:
44261233,207,0,0,2168
44261234,207,0,0,2570
44261235,207,0,0,2572
Test
So it is only when the file has some incorrect data "before" the lines I want to copy, that I receive the error.
Any suggestions for how to make the BCP command ignore the lines not in question?
The way I solve that kind of error every day:
1) Create a table with a single column: tblRawBulk(RawRow varchar(1000)).
2) Insert all of the rows there.
3) Remove the unnecessary unformatted rows (e.g. 'Test' in your example) with a WHERE clause. Or even remove all rows except the ones you need to load, to simplify step 5.
4) Export this table with BCP to some work folder.
5) Load this new file.
It's not exactly what you want, but it may help if you write a corresponding stored procedure that can do all these things automatically.
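If a staging table is overkill, another option is to pre-filter the file so BCP only ever sees well-formed rows. A hypothetical sketch (file names and the numeric pattern are assumptions based on the sample data shown above):

```shell
# Reproduce the failing input: a junk header line followed by real records.
printf 'Test\n44261233,207,0,0,2168\n44261234,207,0,0,2570\n' > bc_file.txt

# Keep only lines that match the expected 5-field numeric CSV layout.
grep -E '^[0-9]+(,[0-9]+){4}$' bc_file.txt > bc_file_clean.txt

# then, for example:
# BCP tblBulkCopyTest in bc_file_clean.txt -c -t, -S(local) -Utest -Ptest
```

Note that after filtering, any -F/-L flags would refer to line numbers in the cleaned file, not the original.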