I have a Stream Analytics job that gets its data from an external source (I have no say in how the data is formatted). I am trying to import the data into my data lake, storing it as JSON. This works fine, but I also want the output as CSV, and this is where I am having trouble.
One of the input columns is an array. When writing JSON, the job recognizes it and outputs the right data, i.e. it places the values in brackets [A, B, C], but when I write CSV the column is represented as the word "Array". I thought I would convert it to XML, use STUFF and get the values on one line, but it does not like a SELECT statement inside a CROSS APPLY.
Has anyone worked with Stream Analytics writing data to CSV where the data has an array column? If so, how did you manage to output the array values?
Sample data:
PG is the column I am trying to extract, so the output CSV should look something like.
This is the query I am using,
dummy AS D
CROSS APPLY GetArrayElements(D.PG) AS A
As you could imagine, this gives me results in this format.

As Pete M said, you could create a JavaScript user-defined function (UDF) that converts the array to a string, and then call that UDF in your query.
JavaScript user-defined function:
function main(inputobj) {
    // toString() on a JavaScript array joins its elements with commas, e.g. "A,B,C"
    var outstring = inputobj.toString();
    return outstring;
}
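As a rough illustration of why this helps, here is a minimal sketch in Python (not Stream Analytics itself). Once the array is flattened to a single comma-joined string, which is what toString() returns for a JavaScript array, a CSV writer emits it as one quoted field. The helpers array_to_field and to_csv_line and the sample values are hypothetical, made up for this sketch.

```python
import csv
import io


def array_to_field(values, sep=","):
    """Join array elements into one string, like JavaScript's Array.toString()."""
    return sep.join(str(v) for v in values)


def to_csv_line(record_id, values):
    """Write one CSV line with the whole array stored in a single field."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([record_id, array_to_field(values)])
    return buf.getvalue()


# Hypothetical record: an id plus the PG array
print(to_csv_line(1, ["A", "B", "C"]), end="")  # 1,"A,B,C"
```

Because the joined string contains commas, the CSV writer quotes it, so the array stays in one column instead of spilling into several.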
Call UDF in query:


Big query nested json strings to arrays then new tables

Bigquery Database
I've got a webhook that's pushing to my BigQuery table. The problem is that it has lots of nested JSON values which are brought in as strings. I ultimately want to turn each column holding these JSON strings into its own table, but I'm stuck because I can't figure out how to get them unnested and into an array.
[{"id":"63bddc8cfe21ec002d26b7f4","description":"General Admission", "src_currency":"USD","src_price":50.0,"src_fee":0.0,"src_commission":1.79,"src_discount":0.0,"applicable_pass_id":null,"seats_label":null,"seats_section_label":null,"seats_parent_type":null,"seats_parent_label":null,"seats_self_type":null,
Here's the sample return from the original source and below is a screenshot of what I'm working with.
[Screenshot: Current Database]
I've tried a number of things but the multiple nestings and string to array issue are really hampering everything I've tried.
I'm honestly not sure exactly what output/structure is best for this data set. I assume that each of the json returns probably just needs to be its own table and I can reference or join them based off that first "id" value in the json strings but I'm wide open to suggestions.
You can use a combination of JSON functions, and array functions to manipulate this kind of data.
JSON_EXTRACT_ARRAY can convert the JSON formatted string into an array, UNNEST then can make each entry into rows, and finally JSON_EXTRACT_SCALAR can pull out individual columns.
So here's an example of what I think you're trying to accomplish:
with sampledata as (
select """[{"id":"63bddc8cfe21ec002d26b7f4","description":"General Admission", "src_currency":"USD","src_price":50.0,"src_fee":0.0,"src_commission":1.79,"src_discount":0.0,"applicable_pass_id":null,"seats_label":null,"seats_section_label":null,"seats_parent_type":null,"seats_parent_label":null,"seats_self_type":null,"seats_self_label":null,"rate_type":"Rate","option_name":null,"price_level_id":null,"src_discount_price":50.0,"rate_id":"636d6d5cea8c6000222c640d","cost_item_id":"63bddc8cfe21ec002d26b7f4"},{"id":"63bddc8cfe21ec002d26b7f4","description":"General Admission", "src_currency":"USD","src_price":50.0,"src_fee":0.0,"src_commission":1.79,"src_discount":0.0,"applicable_pass_id":null,"seats_label":null,"seats_section_label":null,"seats_parent_type":null,"seats_parent_label":null,"seats_self_type":null,"seats_self_label":null,"rate_type":"Rate","option_name":null,"price_level_id":null,"src_discount_price":50.0,"rate_id":"636d6d5cea8c6000222c640d","cost_item_id":"63bddc8cfe21ec002d26b7f4"}]""" as my_json_string
)
select JSON_EXTRACT_SCALAR(f,'$.id') as id, JSON_EXTRACT_SCALAR(f,'$.rate_type') as rate_type, JSON_EXTRACT_SCALAR(f,'$.cost_item_id') as cost_item_id
from sampledata, UNNEST(JSON_EXTRACT_ARRAY(my_json_string)) as f
This produces one row per entry in the JSON array, with id, rate_type, and cost_item_id as columns.
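For comparison, the same parse, unnest, extract sequence can be sketched in plain Python. This is only an illustration of the idea, not BigQuery code: json.loads plays the role of JSON_EXTRACT_ARRAY, the loop plays the role of UNNEST, and the dictionary lookups stand in for JSON_EXTRACT_SCALAR. The helper extract_rows is hypothetical, and the sample string is trimmed from the question's data. Unlike JSON_EXTRACT_SCALAR, which returns strings, Python keeps the native JSON types.

```python
import json

# Trimmed stand-in for one JSON-string column value from the question.
my_json_string = """[
  {"id": "63bddc8cfe21ec002d26b7f4", "rate_type": "Rate", "src_price": 50.0},
  {"id": "63bddc8cfe21ec002d26b7f4", "rate_type": "Rate", "src_price": 50.0}
]"""


def extract_rows(json_string, fields):
    """Parse a JSON array string and keep only the requested fields per entry."""
    entries = json.loads(json_string)  # string -> list of dicts
    return [{field: entry.get(field) for field in fields} for entry in entries]


rows = extract_rows(my_json_string, ["id", "rate_type"])
for row in rows:
    print(row)
```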

Convert PostgreSQL nested JSON to numeric array in Tableau

I have a PostgreSQL database containing a table test_table with individual records. The first column is a simple store_id; the second column, measurement, is nested JSON.
store_id | measurement
0 | {...}
The format of the measurement column is as follows:
{
  'file_info': 'xxxx',
  'data': {
    'contour_data': {
      'X': [-97.0, -97.0, -97.0, -97.0, -97.0, -97.0],
      'Y': [-43.0, -41.0, -39.0, -39.0, -38.0, -36.0]
    }
  }
}
I would like to plot Y vs. X in a scatter plot in Tableau, so I connected the database with Tableau's PostgreSQL connector. From this page I learned that I have to use Custom SQL queries to extract data from the json object, since Tableau doesn't directly support the json datatype of Postgres. I have already tried the following Custom SQL query in Tableau:
select
store_id as store_id,
measurement#>>'{data, contour_data, X}' as contour_points_x,
measurement#>>'{data, contour_data, Y}' as contour_points_y
from test_table
which successfully extracts the two arrays into two new columns, contour_points_x and contour_points_y. However, both new columns have type string in Tableau, so I cannot use them as a data source for a plot.
How do I have to adjust the Custom SQL query to make the data arrays plottable in a Tableau scatter plot?
Looks like you need to split the columns. Check this
EDIT - the linked approach works when you can reliably assume an upper bound for the number of points in each list. One way to split arbitrarily sized lists is described here
The answer is a combination of several functions and operators. One has to
use the #> operator to dig into the json and return it as json type (not as text, as #>> does).
use json_array_elements_text() to expand the json array into a set of text values.
use the type cast operator :: to convert text to float.
/* custom SQL Query in Tableau */
select
store_id as store_id,
json_array_elements_text(measurement#>'{data, contour_data, X}')::float as contour_points_x,
json_array_elements_text(measurement#>'{data, contour_data, Y}')::float as contour_points_y
from test_table
Both resulting columns now appear in a Tableau sheet as discrete measures. Changing them to discrete dimensions allows plotting contour_points_y vs. contour_points_x as desired.
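The effect of that query on a single record can be sketched in Python, assuming the measurement structure shown in the question. This is an illustration only: the helper contour_points is hypothetical, and it pairs the X and Y arrays element by element, which is what expanding the two arrays side by side amounts to when they have the same length.

```python
# One record's measurement value, using the structure from the question.
measurement = {
    "file_info": "xxxx",
    "data": {
        "contour_data": {
            "X": [-97.0, -97.0, -97.0, -97.0, -97.0, -97.0],
            "Y": [-43.0, -41.0, -39.0, -39.0, -38.0, -36.0],
        }
    },
}


def contour_points(record):
    """Walk the nested path and return one (x, y) float pair per array position."""
    contour = record["data"]["contour_data"]  # like #>'{data, contour_data}'
    return [(float(x), float(y)) for x, y in zip(contour["X"], contour["Y"])]


for x, y in contour_points(measurement):
    print(x, y)
```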

Postgres: is there any row_to_json equivalent that returns values only?

In a project I'm working on, I need to stream potentially large data sets from a Postgres database to the client, for analytics purposes.
The application is built in Rails (irrelevant for this question) and after a bit of research I'm currently able to stream query results by using COPY in Postgres:
COPY (SELECT row_to_json(t) from (#{query}) t) TO STDOUT;
Sources (for who's interested):
This works, but it yields every row as a key-value pair, e.g.:
In the spirit of minimising the size (in bytes) of the response and especially since this is getting served through the web, I want to return just an array of values for every row, i.e.:
["[403457, \"\", \"Firstname403457\", \"Lastname403457\", \"adwords\", \"2015-08-05T22:43:07.295796\", \"2017-01-19T04:48:29.464051\"]"]
Is there a way to achieve this within Postgres, even by nesting functions, starting from the query above?
You could create a simple SQL function that converts a row into the desired format:
CREATE FUNCTION row2json(anyelement) RETURNS json
LANGUAGE sql AS
'SELECT json_agg(z.value) FROM json_each(row_to_json($1)) z';
Then you use that to transform the output:
SELECT row2json(mytab) FROM mytab;
If performance is more important than JSON output, just cast the result to a string:
SELECT CAST(mytab AS text) FROM mytab;
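To make the transformation performed by row2json concrete, here is a minimal Python sketch of the same idea: take the key-value object that row_to_json produces and keep only the values, in their original order. The helper row_to_values and the column names in the sample are hypothetical; the values are borrowed from the question.

```python
import json


def row_to_values(row_json):
    """Turn a JSON object string into a JSON array string of its values."""
    row = json.loads(row_json)  # the resulting dict keeps the key order of the JSON text
    return json.dumps(list(row.values()))


# Column names here are assumptions made for the example.
sample = '{"id": 403457, "first_name": "Firstname403457", "source": "adwords"}'
print(row_to_values(sample))  # [403457, "Firstname403457", "adwords"]
```

Dropping the keys saves bytes per row, but the client then has to know the column order from somewhere else.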

Export an array into a CSV-file in PL/pgSQL

I have a function, which RETURNS SETOF text[]. Sample result of this function:
{080213806381,"personal data1","question 1",answer1,"question 2",answer2,"question 3","answer 3"}
{080213806382,"personal data1","question 1",answer1,"question 2",answer2,"question 3","answer 3"}
I'm forming each row with a statement like:
resultRow := array_append(resultRow,;
and then:
RETURN NEXT resultRow;
And here's my COPY command:
COPY (
SELECT myFunction()
) TO 'D:\test_output.csv' WITH (FORMAT 'csv', DELIMITER E',', HEADER false)
And I have a couple of problems:
Regardless of the fact that values are appended to the array in the same way, some of them are double-quoted and some are not. This somehow depends on the presence of a space character in the value. Look, for instance, at the 1st element of the array, or at answer2 and "answer 3" in each row. I want some unified behavior.
After exporting to CSV with the COPY command, I'm getting the same rows with all these curly braces at the beginning and the end. I don't want them in the CSV.
What can I do to solve these issues?
You wish to export rows of varying numbers of columns. You're producing a set of arrays, but from there want to produce a CSV file.
The immediate issue - array literals aren't CSV
Your function returns text[] literals, i.e. PostgreSQL array literals.
These are not CSV as commonly recognised. They're comma-separated, yes, but they follow different syntax rules. You can't reliably treat an array literal as a CSV row or vice versa.
Don't attempt to just chop the delimiting {...} off and treat the array literal as a CSV row.
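A small Python sketch shows how the two syntaxes diverge, assuming a hypothetical row whose last value contains both a comma and double quotes. In PostgreSQL array output, an element with a space such as "personal data1" is quoted and embedded quotes are escaped with a backslash. A CSV writer leaves the space alone and doubles embedded quotes instead. The helper csv_line is made up for this example.

```python
import csv
import io


def csv_line(values):
    """Format one list of values as a single CSV line using the csv module."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(values)
    return buf.getvalue().rstrip("\n")


# Hypothetical row; the last value contains a comma and double quotes.
row = ["080213806381", "personal data1", "answer1", 'say "hi", ok']
print(csv_line(row))
# 080213806381,personal data1,answer1,"say ""hi"", ok"
```

Stripping the braces from the array literal would leave quoting and escaping that a CSV reader interprets differently, which is why the conversion should go through a real CSV writer.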
COPY won't work well or at all
COPY is not going to work well for you. It's designed to handle relations, i.e. uniform sets of structured rows where each column is of a well defined type and each row has the same number of columns.
You could redefine your function to return a setof record and pad your records with nulls to always be the same width, but it'll be pretty ugly and limited, plus the CSV will then incorporate the nulls.
What COPY will do is export a single column CSV containing array literals in a single CSV field. This certainly will not be what you want.
Solution 1: Export client-side
You might be better off doing this on the client side, via a script or program to generate the CSV. Have the program receive the set of arrays and then write it to CSV via a suitable library, like Python's csv module. Choose a client scripting language where the PostgreSQL driver understands arrays and can transform them to arrays in the language's format - again, like psycopg2 for Python.
e.g. given dummy function:
CREATE OR REPLACE FUNCTION get_rows() RETURNS setof text[] AS $$
VALUES
('{080213806381,"personal data1","question 1",answer1,"question 2",answer2,"question 3","answer 3"}'::text[]),
('{080213806382,"personal data1","question 1",answer1,"question 2",answer2,"question 3","answer 3","q4","a4"}'::text[])
$$ LANGUAGE sql;
a client script could be as simple as:
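#!/usr/bin/env python
import csv

import psycopg2

# Connect, call the function, and write each returned array as one CSV row.
with psycopg2.connect('dbname=craig') as conn:
    curs = conn.cursor()
    # newline="" lets the csv module control line endings (Python 3)
    with open("test.csv", "w", newline="") as csvfile:
        f = csv.writer(csvfile)
        curs.execute("SELECT * FROM get_rows()")
        for row in curs:
            # each result row has a single column; psycopg2 returns the text[] as a list
            f.writerow(row[0])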
Solution 2: Export CSV directly from a procedure
Alternately, if the CSV document isn't too big, you could produce the entire CSV in a single procedure, perhaps using plpythonu and the csv module, or a similar CSV library for your preferred procedural language. Because the whole CSV document must be accumulated in memory this won't scale to very very large documents.
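The in-memory approach can be sketched in plain Python as follows. This is not a PL/Python function, only the core of what such a procedure might do with rows it has already collected, and the helper name rows_to_csv is hypothetical. The csv module accepts rows of different lengths, so the varying column counts are not a problem.

```python
import csv
import io


def rows_to_csv(rows):
    """Build the whole CSV document in memory and return it as one string."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        writer.writerow(row)  # rows may have different numbers of columns
    return buf.getvalue()


# Shortened sample rows based on the question's data.
rows = [
    ["080213806381", "personal data1", "question 1", "answer1"],
    ["080213806382", "personal data1", "question 1", "answer1", "q4", "a4"],
]
print(rows_to_csv(rows), end="")
```

Since the full document lives in the StringIO buffer until it is returned, memory use grows with the size of the export.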
Using a text array as the result format is the wrong idea: the text array format cannot be simply converted to CSV. Return a table instead:
CREATE OR REPLACE FUNCTION foo()
RETURNS TABLE(c1 text, c2 text, c3 text, c4 text, c5 text, c6 text, c7 text, c8 text)
AS $$
VALUES('080213806381','personal data1','question 1','answer1','question 2','answer2','question 3','answer 3'),
('080213806382','personal data1','question 1','answer1','question 2','answer2','question 3','answer 3');
$$ LANGUAGE sql;
postgres=# COPY (SELECT * FROM foo()) TO stdout CSV;
080213806381,personal data1,question 1,answer1,question 2,answer2,question 3,answer 3
080213806382,personal data1,question 1,answer1,question 2,answer2,question 3,answer 3
Time: 1.228 ms

Importing JSON Array into Hive/Hadoop

I'm using the HortonWorks Hadoop Sandbox and I have imported a JSON string into a table, as per this guide. However, I run into trouble because my string contains an array, which is not handled well by json_tuple(). I have tried the explode() function, but that returns the following error:
"Error occurred executing hive query: OK FAILED: UDFArgumentException explode() takes an array or a map as a parameter"
What does that mean exactly and how can I fix it? Would the problem be with the format of the table? I just followed as per the guide above and created the table with:
value STRING
LOCATION '/user/hue/games'
The value I am trying to explode is an array like this:
Any help very much appreciated!
The explode() Hive function takes a Hive array or map, and you gave it a string value. json_tuple() worked in the guide because it made your string into a map.
You'll want to convert your JSON array into a format that Hive can accept, or use one of the JSON SerDes or something of that nature in order to query the way you want.
JSON SerDe for Hive that supports JSON arrays
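The distinction behind the error can be illustrated in Python, which is only an analogy and not Hive code. A string that looks like a JSON array is still a string until it is parsed, and only the parsed list can be expanded into one row per element. The sample value and the helper explode_json_array are hypothetical, since the original array is not shown.

```python
import json


def explode_json_array(value):
    """Parse a JSON array string and return its elements as a list."""
    parsed = json.loads(value)
    if not isinstance(parsed, list):
        raise TypeError("expected a JSON array, got " + type(parsed).__name__)
    return parsed


# Hypothetical column value: a string, not an array, until it is parsed.
value = '["action", "puzzle", "strategy"]'
print(type(value).__name__)  # str
for element in explode_json_array(value):
    print(element)
```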
