Export an array into a CSV file in PL/pgSQL

I have a function, which RETURNS SETOF text[]. Sample result of this function:
{080213806381,"personal data1","question 1",answer1,"question 2",answer2,"question 3","answer 3"}
{080213806382,"personal data1","question 1",answer1,"question 2",answer2,"question 3","answer 3"}
I'm forming each row with a statement like:
resultRow := array_append(resultRow, fetchedRow.data::text);
and then:
RETURN NEXT resultRow;
And here's my COPY command:
COPY(
SELECT myFunction()
) TO 'D:\test_output.csv' WITH (FORMAT 'csv', DELIMITER E',', HEADER false)
And I have a couple of problems:
Even though the values are appended to the array in the same way, some of them come out double-quoted and some do not; this seems to depend on whether the value contains a space. Look, for instance, at the first element of the array, or at answer2 and "answer 3" in each row. I want uniform behavior.
After exporting to CSV with the COPY command, I get the same rows with curly braces at the beginning and the end. I don't want them in the CSV.
What can I do to solve these issues?

You want to export rows with varying numbers of columns: your function produces a set of arrays, and from those you want a CSV file.
The immediate issue - array literals aren't CSV
Your function returns text[] literals, i.e. PostgreSQL array literals.
These are not CSV as commonly recognised. They're comma-separated, yes, but they follow different syntax rules. You can't reliably treat an array literal as a CSV row or vice versa.
Don't attempt to just chop the delimiting {...} off and treat the array literal as a CSV row.
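To see the mismatch concretely, here is a quick illustration (values invented for the example) of how the two formats quote the same three values:

postgres=# SELECT ARRAY['a,b', 'say "hi"', 'plain']::text AS array_literal;
       array_literal
----------------------------
 {"a,b","say \"hi\"",plain}

The same three values written as a CSV row would be "a,b","say ""hi""",plain - quoting is triggered by different characters, and embedded quotes are escaped differently (backslashes in the array literal, doubled quotes in CSV).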
COPY won't work well or at all
COPY is not going to work well for you. It's designed to handle relations, i.e. uniform sets of structured rows where each column is of a well defined type and each row has the same number of columns.
You could redefine your function to return a setof record and pad your records with nulls to always be the same width, but it'll be pretty ugly and limited, plus the CSV will then incorporate the nulls.
What COPY will do is export a single column CSV containing array literals in a single CSV field. This certainly will not be what you want.
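For illustration, running the question's COPY against stdout would emit one quoted CSV field per row holding the entire array literal, with the internal quotes doubled by the CSV writer, something like:

COPY (SELECT myFunction()) TO STDOUT WITH (FORMAT csv);
"{080213806381,""personal data1"",""question 1"",answer1,""question 2"",answer2,""question 3"",""answer 3""}"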
Solution 1: Export client-side
You might be better off doing this on the client side, via a script or program to generate the CSV. Have the program receive the set of arrays and then write it to CSV via a suitable library, like Python's csv module. Choose a client scripting language where the PostgreSQL driver understands arrays and can transform them to arrays in the language's format - again, like psycopg2 for Python.
e.g. given a dummy function:
CREATE OR REPLACE FUNCTION get_rows() RETURNS setof text[] AS $$
VALUES
('{080213806381,"personal data1","question 1",answer1,"question 2",answer2,"question 3","answer 3"}'::text[]),
('{080213806382,"personal data1","question 1",answer1,"question 2",answer2,"question 3","answer 3","q4","a4"}'::text[])
$$ LANGUAGE SQL;
a client script could be as simple as:
#!/usr/bin/env python
import psycopg2
import csv

with psycopg2.connect('dbname=craig') as conn:
    curs = conn.cursor()
    with open("test.csv", "w") as csvfile:
        f = csv.writer(csvfile)
        curs.execute("SELECT * FROM get_rows()")
        for row in curs:
            # row[0] is the text[] column; psycopg2 hands it over as a
            # Python list, and csv.writer takes care of the quoting.
            f.writerow(row[0])
Solution 2: Export CSV directly from a procedure
Alternately, if the CSV document isn't too big, you could produce the entire CSV in a single procedure, perhaps using plpythonu and the csv module, or a similar CSV library for your preferred procedural language. Because the whole CSV document must be accumulated in memory, this won't scale to very large documents.
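A minimal sketch of that approach, assuming plpython3u is available and reusing the dummy get_rows() function from above (the name rows_to_csv is just for illustration):

CREATE OR REPLACE FUNCTION rows_to_csv() RETURNS text AS $$
    import csv
    import io

    buf = io.StringIO()
    writer = csv.writer(buf)
    # Each result row has a single column (named after the function) holding
    # the text[] value, which PL/Python exposes as a Python list.
    for rec in plpy.execute("SELECT * FROM get_rows()"):
        writer.writerow(rec["get_rows"])
    return buf.getvalue()
$$ LANGUAGE plpython3u;

SELECT rows_to_csv() then returns the whole document as a single text value for the client to write out; the entire document sits in memory while it is built, which is the scaling limit mentioned above.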

Using a text array as the result format is the wrong idea - an array's text representation is not simply convertible to CSV. Return a table instead:
CREATE OR REPLACE FUNCTION foo()
RETURNS TABLE(c1 text, c2 text, c3 text, c4 text, c5 text, c6 text, c7 text, c8 text)
AS $$
VALUES('080213806381','personal data1','question 1','answer1','question 2','answer2','question 3','answer 3'),
('080213806382','personal data1','question 1','answer1','question 2','answer2','question 3','answer 3');
$$ LANGUAGE sql;
postgres=# COPY (SELECT * FROM foo()) TO stdout CSV;
080213806381,personal data1,question 1,answer1,question 2,answer2,question 3,answer 3
080213806382,personal data1,question 1,answer1,question 2,answer2,question 3,answer 3
Time: 1.228 ms

Related

SQL Server: STRING_SPLIT() result in a computed column

I couldn't find good documentation on this, but I have a table that has a long string as one of its columns. Here's some example data of what it looks like:
Hello:Goodbye:Apple:Orange
Example:Seagull:Cake:Chocolate
I would like to create a new computed column using the STRING_SPLIT() function to return the third value in the string.
Result #1: "Apple"
Result #2: "Cake"
What is the proper syntax to achieve this?
At this time, what you're asking for is not possible.
The output rows might be in any order. The order is not guaranteed to
match the order of the substrings in the input string.
STRING_SPLIT reference
There is no way to guarantee which item was the third item in the list using STRING_SPLIT, and the order may change without warning.
If you're willing to build your own, I'd recommend reading up on the work done by
Brent Ozar and Jeff Moden.
You shouldn't be storing data like that in the first place. This points to a potentially serious database design problem. BUT you could convert this string into JSON by replacing : with ",", surrounding it with [" and "], and retrieving the third array element, e.g.:
declare @value nvarchar(200)='Example:Seagull:Cake:Chocolate'
select json_value('["' + replace(@value,':','","') + '"]','$[2]')
The string manipulations convert the string value to :
["Example","Seagull","Cake","Chocolate"]
After that, JSON_VALUE parses the JSON string and retrieves the 3rd item in the array using a JSON PATH expression.
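If you do want this as a computed column rather than an ad-hoc SELECT, here's a sketch assuming a hypothetical table dbo.Example with an nvarchar column named Value (adjust the names to your schema):

ALTER TABLE dbo.Example
    ADD ThirdItem AS JSON_VALUE('["' + REPLACE([Value], ':', '","') + '"]', '$[2]');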
Needless to say, this will be slow and can't take advantage of indexing. If those values are meant to be read or written individually, they should be stored in separate columns. They'll probably take less space than one long string.
If you have a lot of optional fields but only a subset contain values at any time, you could use sparse columns. This way you could have thousands of columns, only a few of which would contain data in any given row.

Snowflake: Export data in multiple delimiter format

Requirement:
I need the file to be exported in the format below, where gender, age, and interest are columns and the value after : is the data for that column. Can this be achieved using Snowflake? If not, is it possible to export the data using Python?
User1234^gender:male;age:18-24;interest:fishing
User2345^gender:female
User3456^age:35-44
User4567^gender:male;interest:fishing,boating
EDIT 1: Solution as given by @demircioglu
It displays NULL values instead of the other column values.
Below is the EMPLOYEES table data.
When I ran the query below:
SELECT 'EMP_ID'||EMP_ID||'^'||'FIRST_NAME'||':'||FIRST_NAME||';'||'LAST_NAME'||':'||LAST_NAME FROM tempdw.EMPLOYEES ;
Create your SQL with the desired format and write it to a file
COPY INTO @~/stage_data
FROM
(
SELECT 'User'||User||'^'||'gender'||':'||gender||';'||'age'||':'||age||';'||'interest'||':'||interest FROM table
)
file_format = (TYPE=CSV compression='gzip')
The file format here is not important, because each line will be treated as a single field due to your delimiter requirements.
Edit:
CONCAT function (aliased with ||) returns NULL if you have a NULL value.
In order to eliminate NULLs you can use the NVL2 function.
So your SQL will have a series of NVL2s.
NVL2 checks the first parameter: if it's not NULL it returns the first expression, if it is NULL it returns the second expression.
So for the User column,
'User'||User||'^' will turn into
NVL2(User,'User','')||NVL2(User,User,'')||NVL2(User,'^','')
P.S. I am leaving it up to you to create the rest of the SQL, because Stack Overflow's purpose is to help find the solution, not to spoon-feed the solution.
No, I do not believe multiple delimiters like this are supported in Snowflake at this time. Multiple byte and multiple character delimiters are supported, but they will need to be specified as the same delimiter repeated for either record or line.
Yes, it may be possible to do some post-processing or use Python scripts to achieve this. Or even SQL transformative statements. This is not really my area of expertise so if someone has an example for you, I'll let them add to the discussion.

Stream Analytics GetArrayElements as String

I have a Stream Analytics job that gets the data from an external source (I do not have a say on how the data is formatted). I am trying to import the data into my data lake, storing it as JSON. This works fine, but I also want to get the output as CSV, and this is where I am having trouble.
As the input data has an array as one of the columns, when importing to JSON it recognizes it and provides the right data, i.e. places the values in brackets [A, B, C], but when I output to CSV I get the column represented as the word "Array". I thought I would convert it to XML, use STUFF and get them on one line, but it does not like using a SELECT statement in a CROSS APPLY.
Has anyone worked with Stream Analytics importing data into CSV, that has array column? If so, how did you manage to import the array values?
Sample data:
[
{"GID":"10","UID":1,"SID":"5400.0","PG:["75aef","e5f8e"]},
{"GID":"10","UID":2,"SID":"4400.0","PG:["75aef","e5f8e","6d793"]}
]
PG is the column I am trying to extract, so the output CSV should look something like:
GID|UID|SID|PG
10|1|5400.0|75aef,e5f8e
10|2|4400.0|75aef,e5f8e,6d793
This is the query I am using,
SELECT
D.GID ,
D.UID ,
D.SID ,
A.ArrayValue
FROM
dummy AS D
CROSS APPLY GetArrayElements(D.PG) AS A
As you could imagine, this gives me results in this format.
GID|UID|SID|PG
10|1|5400.0|75aef
10|1|5400.0|e5f8e
10|2|4400.0|75aef
10|2|4400.0|e5f8e
10|2|4400.0|6d793
As Pete M said, you could try creating a JavaScript user-defined function to convert the array to a string, and then call this user-defined function in your query.
JavaScript user-defined function (the body is always named main; it is registered in the job under an alias, here extractdatafromarray):
function main(inputobj) {
    // Array.prototype.toString() joins the elements with commas,
    // e.g. ["75aef","e5f8e"] becomes "75aef,e5f8e".
    var outstring = inputobj.toString();
    return outstring;
}
Call UDF in query:
SELECT
TI.GID,TI.UID,TI.SID,udf.extractdatafromarray(TI.PG)
FROM
[TEST-SA-DEMO-BLOB-Input] as TI

Import CSV data into SQL Server

I have data in a CSV file similar to this:
Name,Age,Location,Score
"Bob, B",34,Boston,0
"Mike, M",76,Miami,678
"Rachel, R",17,Richmond,"1,234"
While trying to BULK INSERT this data into a SQL Server table, I encountered two problems.
If I use FIELDTERMINATOR=',' then it splits the first (and sometimes the last) column
The last column is an integer column but it has quotes and comma thousand separator whenever the number is greater than 1000
Is there a way to import this data (using XML Format File or whatever) without manually parsing the csv file first?
I appreciate any help. Thanks.
You can parse the file with http://filehelpers.sourceforge.net/
And with that result, use the approach here: SQL Bulkcopy YYYYMMDD problem or straight into SqlBulkCopy
Use MySQL load data:
LOAD DATA LOCAL INFILE 'path-to-/filename.csv' INTO TABLE `sql_tablename`
CHARACTER SET 'utf8'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
IGNORE 1 LINES;
The OPTIONALLY ENCLOSED BY '\"' part (escape character plus quote) will keep the data in the first column together as a single field.
IGNORE 1 LINES skips the header row with the field names.
The CHARACTER SET 'utf8' line is optional, but good to use if names contain diacritics, as in José.

Commas within CSV Data

I have a CSV file which I am directly importing into a SQL Server table. In the CSV file each column is separated by a comma. But my problem is that I have a column "address", and the data in this column contains commas. So some of the data from the address column ends up in other columns when importing into SQL Server.
What should I do?
For this problem the solution is very simple.
First select Flat File Source => browse to your file => go to "Text qualifier" (it is none by default) and enter a double quote (") => then follow the rest of the wizard.
Good luck.
If there is a comma in a column then that column should be surrounded by single or double quotes. Then if inside that column there is a single or double quote, it should have an escape character before it, usually a \
Example format of CSV
ID - address - name
1, "Some Address, Some Street, 10452", 'David O\'Brian'
Newer versions of SQL Server (2017 and later) support the CSV format in BULK INSERT fully, including mixed use of " and , .
BULK INSERT Sales.Orders
FROM '\\SystemX\DiskZ\Sales\data\orders.csv'
WITH ( FORMAT='CSV');
I'd suggest either using a format other than CSV or trying other characters as the field separator and/or text delimiter. Look for a character that isn't used in your data, e.g. |, #, ^ or @. The format of a single row would become
|foo|,|bar|,|baz, qux|
A well-behaved parser must not interpret 'baz, qux' as two columns.
Alternatively, you could write your own import voodoo that fixes any problems. For the latter, you might find this Groovy skeleton useful (not sure what languages you're fluent in, though).
Most systems, including Excel, will allow for the column data to be enclosed in single quotes...
col1,col2,col3
'test1','my test2, with comma',test3
Another alternative is to use the Macintosh version of CSV, which uses TABs as delimiters.
The best, quickest and easiest way to resolve the comma-in-data issue is to use Excel to save a comma-separated file after setting Windows' list separator to something other than a comma (such as a pipe). This will then generate a pipe- (or whatever-) separated file that you can then import. This is described here.
I don't think adding quotes will help. The best way I can suggest is replacing the commas in the content with another mark, like a space:
replace(COLUMN,',',' ') as COLUMN
Appending a quotation mark to the selected column on both sides works. You must also cast the column as NVARCHAR(MAX) to turn it into a string if the column is TEXT.
SQLCMD -S DB-SERVER -E -Q "set nocount on; set ansi_warnings off; SELECT '""' + cast ([Column1] as nvarchar(max)) + '""' As TextHere, [Column2] As NormalColumn FROM [Database].[dbo].[Table]" /o output.tmp /s "," -W
