Please pardon the level of detail; I'm not completely sure how to phrase this question.
I am new to Scala and still learning the intricacies of the language. I have a project where all the data I need is contained in a table with a layout like this:
CREATE TABLE demo_data ( table_key varchar(10), description varchar(40), data_key varchar(10), data_value varchar(10) );
The table_key column contains the main key I'm searching on, and the description repeats for every row with that table_key. In addition, descriptive keys and values are stored as data_key and data_value pairs.
I need to consolidate a set of these data_keys into my resulting class so that the class will end up like this:
// Imports assume the Play 2.x Anorm setup implied by DB.withConnection
import anorm._
import anorm.SqlParser._
import play.api.db.DB
import play.api.Play.current

case class Tab(tableKey: String, description: String, valA: String, valB: String, valC: String)

object Tab {
  // Row parser: builds one Tab per result row
  val simple = {
    get[String]("table_key") ~
    get[String]("description") ~
    get[String]("val_a") ~
    get[String]("val_b") ~
    get[String]("val_c") map {
      case tableKey ~ description ~ valA ~ valB ~ valC =>
        Tab(tableKey, description, valA, valB, valC)
    }
  }

  def list(tabKey: String): List[Tab] = {
    DB.withConnection { implicit connection =>
      SQL(
        """
        SELECT DISTINCT p.table_key, p.description,
               a.data_value val_a,
               b.data_value val_b,
               c.data_value val_c
        FROM demo_data p
        JOIN demo_data a on p.table_key = a.table_key and a.data_key = 'A'
        JOIN demo_data b on p.table_key = b.table_key and b.data_key = 'B'
        JOIN demo_data c on p.table_key = c.table_key and c.data_key = 'C'
        WHERE p.table_key = {tabKey}
        """
      ).on('tabKey -> tabKey).as(Tab.simple *)
    }
  }
}
This returns what I want. However, I have more than 30 data keys to retrieve in this manner, and the self-joins rapidly become unmanageable: the query ran for 1.5 hours and used up 20 GB of temporary tablespace before running out of disk space.
So instead I am writing a separate class that retrieves a list of data keys and data values for a given table key using "where data_key in ('A','B','C',...)", and now I'd like to "flatten" the returned list into a resulting object that has valA, valB, valC, ... in it. I still want to return a list of the flattened objects to the calling routine.
Let me try to idealize what I'd like to accomplish.
Take a header result set and a detail result set, extract selected keys from the detail result set to populate additional properties on the header, and produce a list of objects containing all the elements of the header result set plus the selected properties from the detail result set. So I get a list of TabHeader(tabKey, Desc), and for each one I retrieve a list of interesting TabDetail(DataKey, DataValue). I then extract the element where DataKey == 'A' and put its DataValue in Tab(valA), and do the same for DataKey == 'B', 'C', ... When I'm done I want a Tab(tabKey, Desc, valA, valB, valC, ...) in place of the corresponding TabHeader. I could quite possibly muddle through this in Java, but I'm treating this as a learning opportunity and would like to know a good way to do this in Scala.
I have a feeling that something in Scala's collection mapping should do what I need, but I haven't been able to track down exactly what.
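The header/detail flattening described above comes down to grouping the detail rows by header and then looking up each wanted key. Below is a minimal sketch of that logic, written in Python only to keep it language-neutral. The sample rows, the `flatten` name and the dictionary output are illustrative assumptions, not part of the original project. In Scala the same shape would likely be a `groupBy` on the header fields followed by a key lookup per group.

```python
from collections import defaultdict

# Hypothetical rows shaped like demo_data after filtering with
# data_key IN ('A', 'B', 'C'): (table_key, description, data_key, data_value)
rows = [
    ("T1", "first table", "A", "a1"),
    ("T1", "first table", "B", "b1"),
    ("T1", "first table", "C", "c1"),
    ("T2", "second table", "A", "a2"),
    ("T2", "second table", "C", "c2"),
]


def flatten(rows, wanted=("A", "B", "C")):
    """Group rows by header and pivot the wanted data keys into columns."""
    grouped = defaultdict(dict)
    for table_key, description, data_key, data_value in rows:
        grouped[(table_key, description)][data_key] = data_value
    # One flattened record per header; a missing data key becomes None.
    return [
        {
            "tableKey": table_key,
            "description": description,
            **{f"val{key}": details.get(key) for key in wanted},
        }
        for (table_key, description), details in grouped.items()
    ]


tabs = flatten(rows)
```

This needs a single pass over one result set instead of one self-join per data key, so adding a 30th key only lengthens the `wanted` list.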


How to assign a unique ID to each row in a table in the Flink Table API?

I'm using Flink to compute a series of operations. Each operation produces a table which is both used for the next operation as well as stored in S3. This makes it possible to view the data at each intermediate step in the calculation and see the effect of each operation.
I need to assign a unique identifier to each row in each table, so that when that identifier appears again in the following step (possibly in a different column) I know that two rows are associated with each other.
The first obvious candidate for this seems to be the ROW_NUMBER() function, but:
It doesn't seem to be anywhere in the table expression API. Do I have to construct SQL strings?
How do I use it? When I try this query:
I get this error:
org.apache.flink.table.api.ValidationException: Over Agg: The window rank function without order by. please re-check the over window statement.
Does it always require sorting the table? This seems like an overhead I'd rather avoid.
The next option was just to generate a random UUID for every row. But when I try this, the same UUID is never used twice, so it's completely useless. Here's an example:
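import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api._
import org.apache.flink.table.api.bridge.scala.StreamTableEnvironment
object Sandbox {
  def main(args: Array[String]): Unit = {
    val env = StreamTableEnvironment.create(
      StreamExecutionEnvironment.getExecutionEnvironment)
    val inp = env.fromValues(row(1)).as("id")
    val out1 = inp.addColumns(uuid().as("u"))
    val out2 = out1.addColumns($"u".as("u2"))
    // Print sinks; column types are assumed from the printed output below
    env.executeSql("""
      CREATE TABLE out1 (id INT, u STRING)
      WITH ('connector' = 'print')
    """)
    env.executeSql("""
      CREATE TABLE out2 (id INT, u STRING, u2 STRING)
      WITH ('connector' = 'print')
    """)
    env.createStatementSet()
      .addInsert("out1", out1)
      .addInsert("out2", out2)
      .execute()
    // Equivalent to the createStatementSet method:
  }
}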
The output I get:
[info] +I(1,4e6008ad-868a-4f95-88b0-38ee7969067d)
[info] +I(1,55da264d-1e15-4c40-94d4-822e1cd5db9c,c9a78f93-580c-456d-9883-08bc998124ed)
I need the UUID from out1 to reappear in out2 in both columns, e.g:
[info] +I(1,4e6008ad-868a-4f95-88b0-38ee7969067d)
[info] +I(1,4e6008ad-868a-4f95-88b0-38ee7969067d,4e6008ad-868a-4f95-88b0-38ee7969067d)
I suppose this is due to this note in the docs:
This function is not deterministic which means the value would be recalculated for each record.
How can I calculate a UUID just once and make it 'concrete' so that the same value is sent to both out1 and out2?
I get a similar result with a user defined function:
import java.util.UUID
import org.apache.flink.table.functions.ScalarFunction

class uuidUdf extends ScalarFunction {
  def eval(): String = UUID.randomUUID().toString
}

val out1 = inp.addColumns(call(new uuidUdf()).as("u"))

BigQuery: extract keys from json object, convert json from object to key-value array

I have a table with a column that contains a JSON object whose values are always strings.
I need two kinds of information:
a list of the JSON keys
the JSON converted into an array of key-value pairs
This is what I have so far, which works:
CREATE TEMP FUNCTION jsonObjectKeys(input STRING)
RETURNS Array<String>
LANGUAGE js AS """
  return Object.keys(JSON.parse(input));
""";
CREATE TEMP FUNCTION jsonToKeyValueArray(input STRING)
RETURNS Array<Struct<key String, value String>>
LANGUAGE js AS """
  let json = JSON.parse(input);
  return Object.keys(json).map(e => {
    return { "key" : e, "value" : json[e] }
  });
""";
WITH input AS (
  SELECT "{\"key1\": \"value1\", \"key2\": \"value2\"}" AS json_column
  UNION ALL
  SELECT "{\"key1\": \"value1\", \"key3\": \"value3\"}" AS json_column
  UNION ALL
  SELECT "{\"key5\": \"value5\"}" AS json_column
)
SELECT
  jsonObjectKeys(json_column) AS keys,
  jsonToKeyValueArray(json_column) AS key_value
FROM input
The problem is that a JavaScript FUNCTION is not the best option in terms of compute optimization, so I'm trying to understand whether there is a way to achieve these two needs (or at least one of them) in plain SQL, without functions.
Below is for BigQuery Standard SQL
select
  array(select trim(split(kv, ':')[offset(0)]) from t.kv kv) as keys,
  array(
    select as struct
      trim(split(kv, ':')[offset(0)]) as key,
      trim(split(kv, ':')[offset(1)]) as value
    from t.kv kv
  ) as key_value
from input,
unnest([struct(split(translate(json_column, '{}"', '')) as kv)]) t
Applied to the sample data from your question, this returns one row per input row, each with a keys array and a key_value array of structs.
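To see what the translate and split steps do to a single row, the same string manipulation can be mimicked outside BigQuery. This is only a Python sketch of the idea; the function names `split_pairs` and `keys_and_pairs` are made up for illustration, and it is not a substitute for running the query.

```python
def split_pairs(json_column):
    """Mimic translate(json_column, '{}"', '') followed by split on ','."""
    stripped = json_column.translate(str.maketrans("", "", '{}"'))
    return stripped.split(",")


def keys_and_pairs(json_column):
    """Return the key list and the key/value records for one JSON string."""
    kv = split_pairs(json_column)
    # Element 0 of each 'key: value' item is the key, element 1 the value.
    keys = [item.split(":")[0].strip() for item in kv]
    pairs = [
        {"key": item.split(":")[0].strip(), "value": item.split(":")[1].strip()}
        for item in kv
    ]
    return keys, pairs


keys, pairs = keys_and_pairs('{"key1": "value1", "key2": "value2"}')

# A value that itself contains a comma is cut in two by the split step.
broken = split_pairs('{"k": "a,b"}')
```

Because the approach treats the JSON as plain text, it appears to rely on keys and values containing no commas, colons, braces or double quotes, which holds for the sample data but may not hold in general.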

Cypher statement with distinct match conditions is returning the same result

I am using Neo4j as a database to store voting information related to another database object.
I have a Vote object which has fields:
type:String with values of UP or DOWN.
argId:String which is a string ID value linking to a unique argument object
I am trying to query the number of votes assigned to a given argId using the following queries:
MATCH (v:Vote) WHERE v.argId = '214' AND v.type='DOWN'
RETURN {downvotes: COUNT(v)} AS votes
MATCH (v:Vote) WHERE v.argId = '214' AND v.type='UP'
RETURN {upvotes: COUNT(v)} AS votes
Note that the queries above work and return the expected results, like so:
{
  "downvotes": 1
}
{
  "upvotes": 10
}
But I feel like the query could be a bit neater and want to write something like this:
MATCH (v:Vote) WHERE v.argId = '214' AND v.type='UP'
MATCH (b:Vote) WHERE b.argId = '214' AND b.type='DOWN'
RETURN {upvotes: COUNT(v), downvotes: COUNT(b)}
Just reading it through, I think it makes sense, b and v are declared as separate variables, so all should be good (so I thought).
But running it gives me this:
{
  "upvotes": 10,
  "downvotes": 10
}
But it should be what I have above.
Why is this?
I'm kinda new to neo4j and cypher so I've probably not understood how cypher works fully.
Can anyone shine any light?
Thank you!
p.s. I'm using Neo4j 3.5.6 and running the queries via the Desktop web browser app.
I think if you run this query you will get a clearer picture of what is happening. Your query produces a cartesian product of the upvotes (10) and the downvotes (1). The product is a result set of 10 rows. When they are subsequently counted, there are ten of each.
MATCH (v:Vote) WHERE v.argId = '214' AND v.type='UP'
MATCH (b:Vote) WHERE b.argId = '214' AND b.type='DOWN'
RETURN v.type, b.type
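The row arithmetic behind that result can be reproduced outside the database. Here is a small Python sketch in which the vote lists are hypothetical stand-ins for the matched nodes (10 UP, 1 DOWN, as in the question):

```python
from itertools import product

# Hypothetical stand-ins for the matched nodes: 10 UP votes and 1 DOWN vote.
up_votes = [f"up{i}" for i in range(10)]
down_votes = ["down0"]

# Two independent MATCH clauses pair every v with every b.
rows = list(product(up_votes, down_votes))

# COUNT(v) and COUNT(b) each count the rows in which the variable is
# non-null, so both see all 10 rows of the product.
count_v = sum(1 for v, b in rows if v is not None)
count_b = sum(1 for v, b in rows if b is not None)

# Counting distinct values instead recovers the per-type totals.
distinct_v = len({v for v, _ in rows})
distinct_b = len({b for _, b in rows})
```

The distinct counts correspond to what `count(DISTINCT ...)` would report, but that workaround stops working when one side matches nothing, because the product is then empty and both counts come out as zero.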
In order to get the result you want you need to filter the values and count them individually.
Rather than having two MATCH statements, use a single MATCH statement that retrieves all of the values of interest, and then use a conditional expression to sort them into upvotes and downvotes buckets.
Something like this may suit you.
MATCH (v:Vote {argId: '214'})
WHERE v.type IN ['UP', 'DOWN']
RETURN {
  upvotes: count(CASE WHEN v.type = 'UP' THEN 1 END),
  downvotes: count(CASE WHEN v.type = 'DOWN' THEN 1 END)
} AS vote_result
Using APOC, you could do something like this: aggregate the counts by the type values themselves, then use APOC to convert the pairs into a map with the types as its keys.
MATCH (v:Vote {argId: '214'})
WHERE v.type IN ['UP', 'DOWN']
WITH [v.type, count(*)] AS vote_pair
RETURN apoc.map.fromPairs(collect(vote_pair)) AS vote_result

Csv file to a Lua table and access the lines as new table or function()

Currently my code has simple tables containing the data needed for each object, like this:
infantry = {class = "army", type = "human", power = 2}
cavalry = {class = "panzer", type = "motorized", power = 12}
battleship = {class = "navy", type = "motorized", power = 256}
I use the table names as identifiers in various functions, which are simply called to access the values and process them one by one.
Now I want to have this data stored in a spreadsheet (csv file) instead that looks something like this:
Name class type power
Infantry army human 2
Cavalry panzer motorized 12
Battleship navy motorized 256
The spreadsheet will not have more than 50 lines, and I want to be able to add columns in the future.
I tried a couple of approaches from similar situations I found here, but due to my lack of skill I failed to access any values from the nested table. I think this is because I don't fully understand how the table is structured after each line of the csv file has been read into it, and therefore I fail to print any values at all.
If there is a way to get the name, class, type and power from the table and use that line just like my old simple tables, I would appreciate an educational example. Another approach could be to declare new tables from the csv, line by line, that behave exactly like my old simple tables. I don't know if this is doable.
Using Lua 5.1
You can read the csv file in as a string. I will use a multi-line string here to represent the csv.
gmatch with the pattern [^\n]+ returns each row of the csv.
gmatch with the pattern [^,]+ returns the value of each column in a given row.
If more rows or columns are added, or if the columns are moved around, the information is still converted reliably as long as the first row holds the header information.
The only column that cannot move is the first one, the Name column: if it is moved, the key used to store the row in the table changes.
Using gmatch and the two patterns, [^,]+ and [^\n]+, you can separate the string into each row and column of the csv. Comments in the following code:
local csv = [[
Name,class,type,power
Infantry,army,human,2
Cavalry,panzer,motorized,12
Battleship,navy,motorized,256
]]
local items = {} -- Store our values here
local headers = {} -- Column names taken from the first row
local first = true
for line in csv:gmatch("[^\n]+") do
  if first then -- this is to handle the first line and capture our headers.
    local count = 1
    for header in line:gmatch("[^,]+") do
      headers[count] = header
      count = count + 1
    end
    first = false -- set first to false to switch off the header block
  else
    local name
    local i = 2 -- We start at 2 because we won't increment for the Name column
    for field in line:gmatch("[^,]+") do
      name = name or field -- check if we know the name of our row
      if items[name] then -- if the name is already in the items table then this is a field
        items[name][headers[i]] = field -- assign our value at the header in the table with the given name.
        i = i + 1
      else -- if the name is not in the table we create a new index for it
        items[name] = {}
      end
    end
  end
end
Here is how you can load a csv using the I/O library:
-- Example of how to load the csv.
local path = "some\\path\\to\\file.csv"
local f = assert(io.open(path, "r"))
local csv = f:read("*all")
f:close()
Alternatively, you can use io.lines(path), which would take the place of csv:gmatch("[^\n]+") in the for loop as well.
Here is an example of using the resulting table:
-- print table out
print("items = {")
for name, item in pairs(items) do
  print("  " .. name .. " = { ")
  for field, value in pairs(item) do
    print("    " .. field .. " = ".. value .. ",")
  end
  print("  },")
end
print("}")
The output:
items = {
  Infantry = {
    type = human,
    class = army,
    power = 2,
  },
  Battleship = {
    type = motorized,
    class = navy,
    power = 256,
  },
  Cavalry = {
    type = motorized,
    class = panzer,
    power = 12,
  },
}

How to delete array element in JSONB column based on nested key value?

How can I remove an object from an array, based on the value of one of the object's keys?
The array is nested within a parent object.
Here's a sample structure:
{
  "foo1": [ { "bar1": 123, "bar2": 456 }, { "bar1": 789, "bar2": 42 } ],
  "foo2": [ "some other stuff" ]
}
Can I remove an array element based on the value of bar1?
I can query based on the bar1 value using: columnname @> '{ "foo1": [ { "bar1": 123 } ]}', but I've had no luck finding a way to remove { "bar1": 123, "bar2": 456 } from foo1 while keeping everything else intact.
Running PostgreSQL 9.6
Assuming that you want to search for a specific object with an inner key of a certain value, and that this object can appear anywhere in the array, you need to unpack the document and each of the arrays, test the inner sub-documents for containment and drop the matches, then re-assemble the arrays and the JSON document (untested):
SELECT id, jsonb_object_agg(key, jarray)
FROM (
  SELECT foo.id, foo.key, jsonb_agg(bar.value) AS jarray
  FROM ( SELECT id, key, value
         FROM my_table, jsonb_each(jdoc) ) foo,
       jsonb_array_elements(foo.value) AS bar (value)
  WHERE NOT bar.value @> '{"bar1": 123}'::jsonb
  GROUP BY 1, 2 ) x
GROUP BY 1
Now, this may seem a little dense, so picked apart you get:
SELECT id, key, value
FROM my_table, jsonb_each(jdoc)
This uses a lateral join on your table to take the JSON document jdoc and turn it into a set of rows foo(id, key, value) where the value contains the array. The id is the primary key of your table.
Then we get:
SELECT foo.id, foo.key, jsonb_agg(bar.value) AS jarray
FROM foo, -- abbreviated from above
     jsonb_array_elements(foo.value) AS bar (value)
WHERE NOT bar.value @> '{"bar1": 123}'::jsonb
GROUP BY 1, 2
This uses another lateral join to unpack the arrays into bar(value) rows. These objects can now be searched with the containment operator to remove the matching objects from the result set: WHERE NOT bar.value @> '{"bar1": 123}'::jsonb. In the select list the arrays are re-assembled by id and key with the aggregate jsonb_agg, but now without the offending sub-documents.
Finally, in the main query the JSON documents are re-assembled, one per id:
SELECT id, jsonb_object_agg(key, jarray)
FROM x -- from above
GROUP BY 1
The important thing to understand is that PostgreSQL JSON functions only operate on the level of the JSON document that you can explicitly indicate. Usually that is the top level of the document, unless you have an explicit path to some level in the document (like {foo1, 0, bar1}, but you don't have that). At that level of operation you can then unpack the document to do your processing, such as removing objects.
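The unpack, filter and re-assemble steps can also be traced on the sample document outside the database. The following Python sketch only illustrates the logic; `remove_matching` is a hypothetical helper, and its containment test is a simplified, top-level version of what the jsonb containment operator checks.

```python
import json


def remove_matching(doc, needle):
    """Drop array elements that are objects containing every pair in needle."""

    def contains(element):
        # Simplified containment: only top-level key/value pairs are compared.
        return isinstance(element, dict) and all(
            key in element and element[key] == value
            for key, value in needle.items()
        )

    # Unpack each top-level key, filter its array, and rebuild the document.
    return {
        key: [e for e in value if not contains(e)] if isinstance(value, list) else value
        for key, value in doc.items()
    }


doc = json.loads("""
{
  "foo1": [ { "bar1": 123, "bar2": 456 }, { "bar1": 789, "bar2": 42 } ],
  "foo2": [ "some other stuff" ]
}
""")
result = remove_matching(doc, {"bar1": 123})
```

Unlike the SQL version, this sketch keeps a key whose array ends up empty; in the query, a key whose elements are all filtered out produces no rows and would therefore drop out of the rebuilt document.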
