Issue with record length when creating CSV in SQLCMD - sql-server

I am experiencing an issue when generating a CSV file from SQLCMD. I have a test table with 2 numeric columns and 2 nvarchar columns (one 64 characters long and the other 2000 characters long).
If I issue the command select * from test, what I get is this:
number1 number2 varcharacter1 varcharacter2
1 2 test longertext
2 3 test2 longertext2
3 4 test3 longertext3
If I want to output this as CSV I use this SQL:
select cast(number1 as nvarchar) + ',' +
varcharacter2 + ',' +
varcharacter1 + ',' +
cast(number2 as nvarchar)
from test
This gives me the following, which looks OK:
(No column name)
1,longertext,test,2
2,longertext2,test2,3
3,longertext3,test3,4
The issue comes when I run it via SQLCMD. If I issue the following command:
Scripts>sqlcmd -S localhost\STUDIO -d studiodb -U sa -P Studio2016! -h-1 -i test.sql -o test.csv
What I get is a CSV file which contains the data, but when I open it in TextPad the last column is padded out to 2000 characters.
1,longertext,test,2
2,longertext2,test2,3
3,longertext3,test3,4
What is causing this and how can I fix it? What am I doing wrong?
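For what it's worth, a quick way to check whether the extra width is trailing whitespace on each record is to post-process the file; a minimal Python sketch with hypothetical file names (this inspects/works around the symptom, not the cause):

# Trim trailing whitespace from each record (assumes the extra width is padding, not data)
with open("test.csv", "r", encoding="utf-8") as src, \
        open("test_trimmed.csv", "w", encoding="utf-8") as dst:
    for line in src:
        dst.write(line.rstrip() + "\n")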

Related

Export data in csv using bat file

I'm trying to export my data to a CSV file using the command below. I am using -s, for comma-delimited output.
It works fine, but I face one problem when a customer name has a comma (,) inside it. In that case the customer name gets split across two columns.
How do I resolve this issue?
sqlcmd -S . -d MYDB -E -Q "set nocount on; select 'customer_id','customer_name','salesrep_id'; select customer_id,customer_name,salesrep_id from customer where customer_id=106866" -b -o C:\customer.csv -h-1 -s, -w 700
It would be nice to know your datatypes, but generally, put varchar values in double quotes. Here's the very basic idea:
select customer_id, '"' + customer_name + '"', salesrep_id from customer where customer_id=106866
I generally like to explicitly create the csv line myself, so my selects usually look more like this:
select '"' + customer_id + '","' + customer_name + '"," + salesrep_id + '"' from customer where customer_id=106866
If there are columns that are numbers, then I convert the column to varchar and don't put double quotes around it.
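To see why the double quotes matter, here is a small illustration with Python's csv module (made-up data, just to show that a quoted field keeps its embedded comma intact):

import csv, io

row = ["106866", "Smith, John", "REP01"]   # hypothetical customer with a comma in the name

buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(row)
print(buf.getvalue().strip())              # "106866","Smith, John","REP01"

# Reading it back yields 3 fields, not 4
print(next(csv.reader(io.StringIO(buf.getvalue()))))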
Based on your comments:
Here is a .cmd file that I use in production to create a .csv file:
set proc=NoBillsReport
sqlcmd -E -S win11.net.davisbrownlaw.com -d ProLaw -Q "exec %proc%" -h -1 -W > "c:\2019\no-bills.csv"
It uses a stored procedure and redirects the output instead of trying to have SQL do the output.
Here is a simplified version of my stored procedure:
--Header row
select '"Matter ID","Client Sort","Matter Description","Billing Initials","Area of Law","Fees No-Billed","Costs No-Billed"'
-- Costs that were no-billed in previous month
select '"' + matterid + '","' + clientsort + '","' + shortdesc + '","'
+ initials + '","' + areaoflaw + '",' + convert(varchar(50),nobilled_fees)
+ ',' + convert(varchar(50),nobilled_costs)
from Matters
And its output:
"Matter ID","Client Sort","Matter Description","Billing Initials","Area of Law","Fees No-Billed","Costs No-Billed"
"70441233","ACME","Closing Wheel, Whenever","LA","IP",0.00,7.98
which works great as the contents of a .csv file.
The stored procedure is nice because it doesn't all have to fit on 1 long command line. But, for your use, I think this has a shot at working:
sqlcmd -S . -d MYDB -E -Q "set nocount on; select '""customer_id"",""customer_name"",""salesrep_id""'; select convert(varchar(50),customer_id) + ',""' + customer_name + '"",""' + salesrep_id + '""' from customer where customer_id=106866; " -h -1 -W > C:\customer.csv
Note that this is called from a .cmd file. The double double quotes might be different if called directly from the command line. Also, on my system, the 11 spaces after the ; at the end of the SQL line are necessary due to how the double double quotes are consolidated. And yes, there are single quotes and double double quotes all over the place.
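If the quote escaping in the .cmd file becomes too painful, the same CSV can be produced outside sqlcmd entirely; a rough sketch in Python (assumes the pyodbc package and a SQL Server ODBC driver are installed; the connection string, column types and file path are illustrative only):

import csv
import pyodbc  # assumption: pyodbc + "ODBC Driver 17 for SQL Server" are available

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=.;DATABASE=MYDB;Trusted_Connection=yes"
)
cur = conn.cursor()
cur.execute("select customer_id, customer_name, salesrep_id "
            "from customer where customer_id = 106866")

with open(r"C:\customer.csv", "w", newline="") as f:
    writer = csv.writer(f, quoting=csv.QUOTE_NONNUMERIC)  # quote strings, leave numbers bare
    writer.writerow(["customer_id", "customer_name", "salesrep_id"])
    for row in cur:
        writer.writerow(list(row))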
Based on further comments, here is how I script a .cmd file to pull year, month, day:
:: Get current year/month/day
for /f "skip=1 tokens=2-4 delims=(-)" %%a in ('"echo.|date"') do (
for /f "tokens=1-3 delims=/.- " %%A in ("%Date:* =%") do (
set %%a=%%A&set %%b=%%B&set %%c=%%C)
)
)
set /a "yy=10000%yy% %% 10000 %% 2000 + 2000,mm=100%mm% %% 100,dd=100%dd% %% 100"
:: Pad with leading zeros if needed
set mm=0%mm%
set mm=%mm:~-2%
set dd=0%dd%
set dd=%dd:~-2%
Then your filename would be "customer%yy%%mm%%dd%.csv"
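For comparison, if invoking Python is an option, the same zero-padded date stamp is a one-liner (file name pattern assumed from the question):

from datetime import date

# customerYYYYMMDD.csv, zero padding handled by strftime
print(date.today().strftime("customer%Y%m%d.csv"))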

Split file in Unix based on occurence of some specific string

The contents of my file are as follows:
Tenor|CurrentCoupon
15Y|3.091731898890382
30Y|3.5773546584901617
Id|Cusip|Ticker|Status|Error|AsOfDate|Price|LiborOas
1|01F020430|FN 15 2 F0|1||20180312|95.19140625|-0.551161358515
2|01F020448|FN 15 2 F1|1||20180312|95.06640625|1.18958768351
3|01F020547|FN 20 2 F0|1||20180312|90.484375|50.742896921
4|01F020554|FN 20 2 F1|1||20180312|90.359375|52.4642397071
5|01F020646|FN 30 2 F0|1||20180312|90.25|6.26649840403
and I have to split it into 2 files like this:
Tenor,CurrentCoupon
15Y,3.294202313
30Y,3.727696014
and
Id,Cusip,Ticker,Status,Error,AsOfDate,Price,LiborOas
1,01F020489,FN 15 2 F0,1,,20180807,94.27734375,6.199343069
2,01F020497,FN 15 2 F1,1,,20180807,94.15234375,8.225144379
3,01F020588,FN 20 2 F0,1,,20180807,89.984375,48.11248894
I have very little knowledge of UNIX scripts. The number of rows will vary.
Using awk you can do something very simple
awk -F '|' '{print $0 > NF ".txt"}' yourfile.txt
This command will split your file into 2.txt (all rows containing 2 columns) and 8.txt (all rows containing 8 columns).
To understand this command: the -F option sets the delimiter, awk parses your file line by line, $0 stands for the entire row, and NF for the number of fields in the parsed row.
If you want to change the delimiter from | to , :
awk -F '|' 'BEGIN{OFS=","};{$1=$1; print > NF ".txt"}' yourfile.txt
OFS stands for Output Field Separator; $1=$1 is an ugly hack to rebuild your row with the right separator ^^
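The same split can also be sketched in a few lines of Python, if awk is not available (same file and output naming as the awk version):

# Write each row to <field-count>.txt, converting | to ,
handles = {}
with open("yourfile.txt") as src:
    for line in src:
        fields = line.rstrip("\n").split("|")
        out = handles.setdefault(len(fields), open(f"{len(fields)}.txt", "w"))
        out.write(",".join(fields) + "\n")
for handle in handles.values():
    handle.close()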

Pick 20 records each time and transpose from a big file

I have a big file with 1 column and 800,000 rows
Example:
123
234
...
5677
222
444
I want to transpose it into 20 numbers per line.
Example:
123,234,....
5677,
222,
444,....
I tried using a while loop like this:
while [ $(wc -l < list.dat) -ge 1 ]
do
cat list.dat | head -20 | awk -vORS=, '{ print $1 }'| sed 's/,$/\n/' >> sample1.dat
sed -i -e '1,20d' list.dat
done
but this is insanely slow.
Can anyone suggest a faster solution?
pr is the right tool for this, for example:
$ seq 100 | pr -20ats,
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40
41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60
61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80
81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100
For your file, try pr -20ats, list.dat
Depending on the width of the column text, you might run into the error pr: page width too narrow. In that case, try:
$ seq 100000 100100 | pr -40ats,
pr: page width too narrow
$ seq 100000 100100 | pr -J -W79 -40ats,
100000,100001,100002,100003,100004,100005,100006,100007,100008,100009,100010,100011,100012,100013,100014,100015,100016,100017,100018,100019,100020,100021,100022,100023,100024,100025,100026,100027,100028,100029,100030,100031,100032,100033,100034,100035,100036,100037,100038,100039
100040,100041,100042,100043,100044,100045,100046,100047,100048,100049,100050,100051,100052,100053,100054,100055,100056,100057,100058,100059,100060,100061,100062,100063,100064,100065,100066,100067,100068,100069,100070,100071,100072,100073,100074,100075,100076,100077,100078,100079
100080,100081,100082,100083,100084,100085,100086,100087,100088,100089,100090,100091,100092,100093,100094,100095,100096,100097,100098,100099,100100
The formula for the -W value is (col-1)*len(delimiter) + col, where col is the number of columns required.
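As a quick sanity check of that formula against the run above (a throwaway calculation, nothing more):

# -W needed for 40 columns joined with a 1-character delimiter
cols, delim = 40, ","
print((cols - 1) * len(delim) + cols)   # 79, matching the -J -W79 example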
From man pr
pr - convert text files for printing
-a, --across
print columns across rather than down, used together with -COLUMN
-t, --omit-header
omit page headers and trailers; implied if PAGE_LENGTH <= 10
-s[CHAR], --separator[=CHAR]
separate columns by a single character, default for CHAR is the <TAB> character without -w and 'no char' with -w. -s[CHAR] turns off line truncation of all 3 column options (-COLUMN|-a -COLUMN|-m) except -w is set
-COLUMN, --columns=COLUMN
output COLUMN columns and print columns down, unless -a is used. Balance number of lines in the columns
on each page
-J, --join-lines
merge full lines, turns off -W line truncation, no column alignment, --sep-string[=STRING] sets separators
-W, --page-width=PAGE_WIDTH
set page width to PAGE_WIDTH (72) characters always, truncate lines, except -J option is set, no interference with -S or -s
See also Why is using a shell loop to process text considered bad practice?
If you don't wish to use any other external binaries, you can refer to the SO link below, which answers a similar question in depth.
bash: combine five lines of input to each line of output
If you want to use sed:
sed -n '21~20 { x; s/^\n//; s/\n/, /g; p;}; 21~20! H;' list.dat
The first command
21~20 { x; s/^\n//; s/\n/, /g; p;},
is triggered at lines matching 21+(n*20); n>=0. Here, everything that was put into the hold space on the complementary lines via the second command:
21~20! H;
is processed:
x;
puts the content of the hold buffer (20 lines) in the pattern space and places the current line (21+(n*20)) in the hold buffer. In the pattern space:
s/^\n//
removes the leading newline and:
s/\n/, /g
does the desired substitution. Then:
p;
prints the now 20-columned row.
After that, the following lines are again collected in the hold buffer and the process continues.
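If neither pr nor sed appeals, the same reshaping is easy to sketch in Python (hypothetical file names from the question, 20 numbers per output line):

with open("list.dat") as src, open("sample1.dat", "w") as dst:
    chunk = []
    for line in src:
        chunk.append(line.strip())
        if len(chunk) == 20:                 # emit a full group of 20
            dst.write(",".join(chunk) + "\n")
            chunk = []
    if chunk:                                # emit the final partial group, if any
        dst.write(",".join(chunk) + "\n")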

Find content of one file from another file in UNIX

I have 2 files. The first file contains a list of row IDs of tuples of a table in the database,
and the second file contains SQL queries with these row IDs in the "where" clause of the query.
For example:
File 1
1610657303
1610658464
1610659169
1610668135
1610668350
1610670407
1610671066
File 2
update TABLE_X set ATTRIBUTE_A=87 where ri=1610668350;
update TABLE_X set ATTRIBUTE_A=87 where ri=1610672154;
update TABLE_X set ATTRIBUTE_A=87 where ri=1610668135;
update TABLE_X set ATTRIBUTE_A=87 where ri=1610672153;
I have to read File 1, search File 2 for all the SQL commands that match the row IDs from File 1, and dump those SQL queries into a third file.
File 1 has 100,000 entries and File 2 contains 10 times the entries of File 1, i.e. 1,000,000.
I used grep -f File_1 File_2 > File_3. But this is extremely slow and the rate is 1000 entries per hour.
Is there any faster way to do this?
You don't need regexps, so grep -F -f file1 file2
One way with awk:
awk -v FS="[ =]" 'NR==FNR{rows[$1]++;next}(substr($NF,1,length($NF)-1) in rows)' File1 File2
This should be pretty quick. On my machine, it took under 2 seconds to create a lookup of 1 million entries and compare it against 3 million lines.
Machine Specs:
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz (8 cores)
98 GB RAM
I suggest using a programming language such as Perl, Ruby or Python.
In Ruby, a solution reading both files (f1 and f2) just once could be:
idxes = File.readlines('f1').map(&:chomp)
File.foreach('f2') do | line |
next unless line =~ /where ri=(\d+);$/
puts line if idxes.include? $1
end
or with Perl
open $file, '<', 'f1';
while (<$file>) { chomp; $idxs{$_} = 1; }
close($file);
open $file, '<', 'f2';
while (<$file>) {
next unless $_ =~ /where ri=(\d+);$/;
print $_ if $idxs{$1};
}
close $file;
The awk/grep solutions mentioned above were slow or memory hungry on my machine (file1 10^6 rows, file2 10^7 rows). So I came up with an SQL solution using sqlite3.
Turn file2 into a CSV-formatted file where the first field is the value after ri=
cat file2.txt | gawk -F= '{ print $3","$0 }' | sed 's/;,/,/' > file2_with_ids.txt
Create two tables:
sqlite> CREATE TABLE file1(rowId char(10));
sqlite> CREATE TABLE file2(rowId char(10), statement varchar(200));
Import the row IDs from file1:
sqlite> .import file1.txt file1
Import the statements from file2, using the "prepared" version:
sqlite> .separator ,
sqlite> .import file2_with_ids.txt file2
Select all and only the statements in table file2 with a matching rowId in table file1:
sqlite> SELECT statement FROM file2 WHERE file2.rowId IN (SELECT file1.rowId FROM file1);
File 3 can be easily created by redirecting output to a file before issuing the select statement:
sqlite> .output file3.txt
Test data:
sqlite> select count(*) from file1;
1000000
sqlite> select count(*) from file2;
10000000
sqlite> select * from file1 limit 4;
1610666927
1610661782
1610659837
1610664855
sqlite> select * from file2 limit 4;
1610665680|update TABLE_X set ATTRIBUTE_A=87 where ri=1610665680;
1610661907|update TABLE_X set ATTRIBUTE_A=87 where ri=1610661907;
1610659801|update TABLE_X set ATTRIBUTE_A=87 where ri=1610659801;
1610670610|update TABLE_X set ATTRIBUTE_A=87 where ri=1610670610;
Without creating any indices, the select statement took about 15 secs on an AMD A8 1.8GHz 64-bit Ubuntu 12.04 machine.
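The same approach can be scripted end to end with Python's built-in sqlite3 module; a sketch under the same file names, with the ri= extraction regex being an assumption about the statement format:

import re
import sqlite3

def id_and_line(path):
    """Yield (rowId, full statement) pairs; the ri= regex is an assumption."""
    pattern = re.compile(r"where ri=(\d+);")
    with open(path) as f:
        for line in f:
            match = pattern.search(line)
            if match:
                yield match.group(1), line.rstrip("\n")

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE file1(rowId TEXT PRIMARY KEY)")
con.execute("CREATE TABLE file2(rowId TEXT, statement TEXT)")

with open("file1.txt") as f:
    con.executemany("INSERT OR IGNORE INTO file1 VALUES (?)",
                    ((line.strip(),) for line in f if line.strip()))
con.executemany("INSERT INTO file2 VALUES (?, ?)", id_and_line("file2.txt"))

with open("file3.txt", "w") as out:
    for (statement,) in con.execute(
            "SELECT statement FROM file2 WHERE rowId IN (SELECT rowId FROM file1)"):
        out.write(statement + "\n")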
Most of the previous answers are correct, but the only thing that worked for me was this command:
grep -oi -f a.txt b.txt
Maybe try awk and use the numbers from File 1 as keys, for example with a simple script.
The first script produces a second awk script. Put this in script1.awk:
{ print "$0 ~ " $0 " { print $0 }" > "script2.awk" }
run it against File 1:
awk -f script1.awk file1
and then invoke script2.awk with File 2:
awk -f script2.awk file2 > file3
I may be missing something, but wouldn't it be sufficient to just iterate the IDs in file1 and for each ID, grep file2 and store the matches in a third file? I.e.
for ID in `cat file1`; do grep $ID file2; done > file3
This is not terribly efficient (since file2 will be read over and over again), but it may be good enough for you. If you want more speed, I'd suggest using a more powerful scripting language which lets you read file2 into a map, which quickly allows identifying the lines for a given ID.
Here's a Python version of this idea:
queryByID = {}

# Map each row ID (the value between the last '=' and the ';') to its full query line
for line in open('file2'):
    lastEquals = line.rfind('=')
    semicolon = line.find(';', lastEquals)
    id = line[lastEquals + 1:semicolon]
    queryByID[id] = line.rstrip()

# Print the stored query for every ID listed in file1
for line in open('file1'):
    id = line.rstrip()
    if id in queryByID:
        print(queryByID[id])
## reports any lines contained in <file1> missing from <file2>
IFS=$(echo -en "\n\b") && for a in $(cat <file1>);
do ((\!$(grep -F -c -- "$a" <file2>))) && echo $a;
done && unset IFS
or, to do what the asker wants, take off the negation and redirect:
(IFS=$(echo -en "\n\b") && for a in $(cat <file1>);
do (($(grep -F -c -- "$a" <file2>))) && echo $a;
done && unset IFS) >> <file3>

How to download Postgres bytea column as file

Currently, I have a number of files stored in Postgres 8.4 as bytea. The file types are .doc, .odt, .pdf, .txt, etc.
May I know how to download all the files stored in Postgres, because I need to do a backup?
I need them in their original file types instead of bytea format.
Thanks!
One simple option is to use the COPY command with encode to hex format and then apply the xxd shell command (with the -p continuous hexdump style switch). For example, let's say I have a jpg image in a bytea column in the samples table:
\copy (SELECT encode(file, 'hex') FROM samples LIMIT 1) TO '/home/grzegorz/Desktop/image.hex'
$ xxd -p -r image.hex > image.jpg
As I checked, it works in practice.
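If xxd happens not to be available, the decode step can also be done with a couple of lines of Python (same hypothetical file names as above):

# Strip whitespace, then decode the hex dump back to binary
with open("image.hex") as src, open("image.jpg", "wb") as dst:
    dst.write(bytes.fromhex(src.read().strip()))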
Try this:
COPY (SELECT yourbyteacolumn FROM yourtable WHERE <add your clauses here> ...) TO 'youroutputfile' (FORMAT binary)
Here's the simplest thing I could come up with:
psql -qAt "select encode(file,'base64') from files limit 1" | base64 -d
The -qAt is important as it strips off any formatting of the output. These options are available inside the psql shell, too.
base64
psql -Aqt -c "SELECT encode(content, 'base64') FROM ..." | base64 -d > file
xxd
psql -Aqt -c "SELECT encode(content, 'hex') FROM ..." | xxd -p -r > file
If you have a lot of data to download, you can get the rows first and then iterate through each one, writing the bytea field to a file.
$resource = pg_connect('host=localhost port=5432 dbname=website user=super password=************');
// grab all the user IDs
$userResponse = pg_query('select distinct(r.id) from resource r
join connection c on r.id = c.resource_id_from
join resource rfile on c.resource_id_to = rfile.id and rfile.resource_type_id = 10
join file f on rfile.id = f.resource_id
join file_type ft on f.file_type_id = ft.id
where r.resource_type_id = 38');
// need to work through one by one to handle data
while($user = pg_fetch_array($userResponse)){
    $user_id = $user['id'];
    $query = 'select r.id, f.data, rfile.resource_type_id, ft.extension from resource r
        join connection c on r.id = c.resource_id_from
        join resource rfile on c.resource_id_to = rfile.id and rfile.resource_type_id = 10
        join file f on rfile.id = f.resource_id
        join file_type ft on f.file_type_id = ft.id
        where r.resource_type_id = 38 and r.id = ' . $user_id;
    $fileResponse = pg_query($query);
    $fileData = pg_fetch_array($fileResponse);
    $data = pg_unescape_bytea($fileData['data']);
    $extension = $fileData['extension'];
    $fileId = $fileData['id'];
    $filename = $fileId . '.' . $extension;
    $fileHandle = fopen($filename, 'w');
    fwrite($fileHandle, $data);
    fclose($fileHandle);
}
DO $$
DECLARE
    l_lob_id OID;
    r record;
BEGIN
    for r in
        select data, filename from bytea_table
    LOOP
        l_lob_id := lo_from_bytea(0, r.data);
        PERFORM lo_export(l_lob_id, '/home/...' || r.filename);
        PERFORM lo_unlink(l_lob_id);
    END LOOP;
END; $$
As best I'm aware, getting from bytea to a file needs to be done at the app level.
(9.1 might change this with the filesystem data wrapper contrib. There's also a lo_export function, but it is not applicable here.)
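For example, a minimal app-level sketch in Python using psycopg2 (the package, connection string, table and column names here are assumptions for illustration, not from the question):

import psycopg2  # assumption: psycopg2 is installed

conn = psycopg2.connect("dbname=mydb user=postgres")
cur = conn.cursor()
cur.execute("SELECT filename, file FROM samples")  # hypothetical table/columns
for filename, data in cur:
    with open(filename, "wb") as f:
        f.write(bytes(data))   # bytea arrives as a memoryview/bytes buffer
cur.close()
conn.close()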
If you want to do this from a local Windows machine, and not from the server, you will have to run every statement individually, and you will need pgAdmin and certutil:
Have PGAdmin installed.
Open cmd from the runtime folder or cd "C:\Program Files\pgAdmin 4\v6\runtime"
Run this query in pgAdmin to get every statement that you will have to paste into cmd:
SELECT 'set PGPASSWORD={PASSWORD} && psql -h {host} -U {user} -d {db name} -Aqt -c "SELECT encode({bytea_column}, ''base64'') FROM {table} WHERE id='||id||'" > %a% && CERTUTIL -decode %a% "C:\temp\{name_of_the_folder}\FileName - '||{file_name}||' ('||TO_CHAR(current_timestamp,'DD.MM.YYYY,HH24 MI SS')||').'||{file_extension}||'"'
FROM table WHERE ....;
Replace {...}
It will generate something like:
set PGPASSWORD=123 && psql -h 192.1.1.1 -U postgres -d my_test_db -Aqt -c "SELECT encode(file_bytea, 'base64') FROM test_table_bytea WHERE id=33" > %a% && CERTUTIL -decode %a% "C:\temp\DB_FILE\FileName - test1 - (06.04.2022,15 42 26).docx"
set PGPASSWORD=123 && psql -h 192.1.1.1 -U postgres -d my_test_db -Aqt -c "SELECT encode(file_bytea, 'base64') FROM test_table_bytea WHERE id=44" > %a% && CERTUTIL -decode %a% "C:\temp\DB_FILE\FileName - test2 - (06.04.2022,15 42 26).pdf"
Copy and paste all the generated statements into CMD. The files will be saved to your local machine.
