Snowflake FLATTEN query for arrays

A Snowflake table has one VARIANT column, loaded with 3 JSON records, as follows:
{"address":{"City":"Lexington","Address1":"316 Tarrar Springs Rd","Address2":null}}
{"address":{"City":"Hartford","Address1":"318 Springs Rd","Address2":"319 Springs Rd"}}
{"address":{"City":"Avon","Address1":"38 Springs Rd","Address2":[{"txtvalue":null},{"txtvalue":"Line 1"},{"Line1":"Line 1"}]}}
If you look at the Address2 field in the JSON, the first record holds NULL, the second a string, and the third an array.
When I execute the flatten query on Address2, only the record holding an array (the 3rd) is exploded. How do I get all the records, with exploded values, in a single query?
select data:address:City::string, data:address:Address1::string, value:txtvalue::string
from add1, lateral flatten( input => data:address:Address2 );

When I execute the flatten query on Address2, only the record holding an array (the 3rd) is exploded
The default behaviour of the FLATTEN table function in Snowflake is to skip any rows that do not have a structure to expand, and the OUTER argument controls this behaviour. Quoting the relevant portion of the documentation (emphasis mine):
OUTER => TRUE | FALSE
If FALSE, any input rows that cannot be expanded, either because they cannot be accessed in the path or because they have zero fields or entries, are completely omitted from the output.
If TRUE, exactly one row is generated for zero-row expansions (with NULL in the KEY, INDEX, and VALUE columns).
Default: FALSE
Since your VARIANT data is oddly formed, you'll need to leverage conditional expressions and data type predicates to check if the column in the expanded row is of an ARRAY type, a VARCHAR, or something else, and use the result to emit the right value.
A sample query illustrating all of the above:
SELECT
    t.v:address.City AS city
    , t.v:address.Address1 AS address1
    , CASE
        WHEN IS_ARRAY(t.v:address.Address2) THEN f.value:txtvalue::string
        ELSE t.v:address.Address2::string
      END AS address2
FROM
    add1 t
    , LATERAL FLATTEN(INPUT => t.v:address.Address2, OUTER => TRUE) f;
P.S. Consider standardizing your input at ingest or at the source to reduce query complexity.
Note: Your data example is inconsistent (the array of objects does not have homogeneous keys), but going by your example query I've assumed that all keys of objects in the array will be named txtvalue.
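Against the three sample records, the query above should yield something like:
CITY       ADDRESS1               ADDRESS2
Lexington  316 Tarrar Springs Rd  NULL
Hartford   318 Springs Rd         319 Springs Rd
Avon       38 Springs Rd          NULL
Avon       38 Springs Rd          Line 1
Avon       38 Springs Rd          NULL
The NULL and string records survive thanks to the OUTER => TRUE row and fall through to the ELSE branch; the third array element yields NULL because its key is Line1, not txtvalue.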

Related

How to query specific information from a JSON column?

I have searched Stack Overflow to get an answer to my question, but while I found many interesting cases, none of them quite address mine.
I have a column called fields in my data, that contains JSON information, such as presented below:
Row Fields
1 [{"label":"Label 1","key":"label_1","description":"Value of label_1"},{"label":"Label 2","key":"label_2","error":"Something"}]
2 [{"description":"something","label":"Row 1","key":"row_1"},{"label":"Row 2","message":"message_1","key":"row_2"}]
In essence, I have many rows of JSON that contain label and key, and a bunch of other parameters like that. From every {}, I want to extract only label and key, and then (optionally, but ideally) expand every label and key in every {} into its own row. So, as a result, I would have the following output:
Row Label Key
1 Label 1 label_1
1 Label 2 label_2
2 Row 1 row_1
2 Row 2 row_2
Please note, the contents of label and key within the JSON can be anything (strings, integers, special characters, a mix of everything, etc.). In addition, key and label can be anywhere in relation to the other parameters within each {}.
Here is the BigQuery SQL dummy data for convenience:
SELECT '1' AS Row, '[{"label":"Label 1","key":"label_1","description":"Value of label_1"},{"label":"Label 2","key":"label_2","error":"Something"}]' AS Fields
UNION ALL
SELECT '2' AS Row, '[{"description":"something","label":"Row 1","key":"row_1"},{"label":"Row 2","message":"message_1","key":"row_2"}]' AS Fields
I first thought of using REGEX to isolate all the brackets and only show information with label and key. Then I looked into the BQ documentation of JSON functions and got very stuck on the json_path parameters, specifically because their examples don't match mine.
Consider the approach below:
select `row`,
  json_extract_scalar(el, '$.label') label,
  json_extract_scalar(el, '$.key') key
from your_table, unnest(json_extract_array(fields)) el
If applied to the sample data in your question, the output is:
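Row  label    key
1    Label 1  label_1
1    Label 2  label_2
2    Row 1    row_1
2    Row 2    row_2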

PostgreSQL count results within jsonb array across multiple rows

As stated in the title, I am in a situation where I need to return a count of occurrences of a value within an array, inside a jsonb column. A pseudo example is as follows:
CREATE TABLE users (id int primary key, tags jsonb);
INSERT INTO users (id, tags) VALUES
(1, '{"Friends": ["foo", "bar", "baz"]}'),
(2, '{"Friends": ["bar", "bar"]}');
Please note that the array under "Friends" can contain the same value more than once. This will be relevant later (in this case, the second row contains the name "bar" twice in the jsonb column under the key "Friends").
Question:
For the example above, if I were to search for the value "bar" (with the query I need help writing), I want the number of times "bar" appears in the tags (jsonb) column under the key "Friends"; in this case the end result I'm looking for is the integer 3, as the term "bar" appears 3 times across the 2 rows.
Where I'm at:
Currently I have SQL written that returns a text array containing all of the friends values (from the multiple selected rows) in a single, one-dimensional array. That SQL is as follows:
SELECT jsonb_array_elements_text(tags->'Friends') FROM users;
yielding the following result:
jsonb_array_elements_text
-------------------------
foo
bar
baz
bar
bar
Given that this is an array, is it possible to filter this by the term "bar" in some fashion in order to get the count of the number of times it appears? Or am I way off in my approach?
Other Details:
Version: psql (PostgreSQL) 9.5.2
The table in question has a GIN index on it.
Please let me know if any additional information is needed, thanks in advance.
You need to use the result of the function as a proper table; then you can easily count the number of times the value appears.
select count(x.val)
from users
cross join lateral jsonb_array_elements_text(tags->'Friends') as x(val)
where x.val = 'bar'
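Against the sample data this returns 3: one "bar" in the first row's array and two in the second. If you want the count per user instead of the grand total, a minimal variation of the same query (same table and column assumed):
select u.id, count(*) as bar_count
from users u
cross join lateral jsonb_array_elements_text(u.tags->'Friends') as x(val)
where x.val = 'bar'
group by u.id;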

Single search box Web2py, union usage

I am trying to create a single search box on my website.
First I split the search input into multiple strings using split().
Then I loop over those strings; for every string I create a query. These queries are stored in a list.
In the next step I execute all those queries and store the results (rows) in another list.
Finally, I want to union all these results (rows), so the final result is the output of a query covering all the different keywords entered in the search box.
This is my code:
def ajaxlivesearch():
    str = request.vars.values()[0]
    a = str.split()
    items = []
    q = []
    r = []
    for partialstr in a:
        q.append((db.profiel.sport.like('%'+partialstr+'%')) | (db.profiel.speelsterkte.like('%'+partialstr+'%')) | (db.profiel.plaats.like('%'+partialstr+'%')))
    for query in q:
        r.append(db(query).select(groupby=db.profiel.id))
    for results in r:
        for (i, row) in enumerate(results):
            items.append(DIV(A(B(row.id_user.first_name), NBSP(1), B(row.id_user.last_name), BR(), I(row.sport), I(','), NBSP(1), I(row.speelsterkte), I(','), NBSP(1), I(row.plaats), HR(), _id="res%s" % i, _href=row.id_user, _onclick="copyToBox($('#res%s').html())" % i), _id="resultLiveSearch"))
    return TAG[''](*items)
My question is: how do I union the multiple results (rows)?
You can get the union of two Rows objects (removing duplicates) as follows:
rows_union = rows1 | rows2
However, it would be more efficient to get all the records in a single query. To simplify, you can also use the .contains method rather than using .like and wrapping each term in % wildcards.
fields = ['sport', 'speelsterkte', 'plaats']
query_terms = [db.profiel[f].contains(term) for f in fields for term in a]
query = reduce(lambda a, b: a | b, query_terms)
results = db(query).select()
Also, you are not using any aggregation functions, so it is not clear why you have specified the groupby argument (and in any case, each record has a unique id, so grouping would have no effect). Perhaps you instead meant orderby=db.profiel.id.
Finally, it is probably not a good idea to do request.vars.values()[0], as request.vars is a dictionary-like object, and the particular value of interest is not guaranteed to be the first item in .values(). Instead, just refer to the name of the particular variable (e.g., request.vars.keyword), which is also more efficient because you are extracting a single item rather than converting all values to a list.
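Putting these suggestions together, a minimal sketch of the revised controller (assuming the search text is posted as request.vars.keyword; adjust to whatever name your client actually sends):
def ajaxlivesearch():
    terms = (request.vars.keyword or '').split()
    if not terms:
        return TAG['']()
    fields = ['sport', 'speelsterkte', 'plaats']
    # One OR-ed query covering every term in every field.
    query_terms = [db.profiel[f].contains(term) for f in fields for term in terms]
    query = reduce(lambda a, b: a | b, query_terms)
    rows = db(query).select(orderby=db.profiel.id)
    items = []
    for (i, row) in enumerate(rows):
        # Same markup as in the question, now built from a single Rows object.
        items.append(DIV(A(B(row.id_user.first_name), NBSP(1), B(row.id_user.last_name), BR(), I(row.sport), I(','), NBSP(1), I(row.speelsterkte), I(','), NBSP(1), I(row.plaats), HR(), _id="res%s" % i, _href=row.id_user, _onclick="copyToBox($('#res%s').html())" % i), _id="resultLiveSearch"))
    return TAG[''](*items)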

How to choose the first row when using group by in web2py

I have a table in the db, and when I use this code
results = db().select(db.project.ALL, orderby=db.project.id, groupby=db.project.status)
I can only get the last row of each group of repeats. How could I choose the first one?
The select() method will return a Rows object, which is an iterable collection of Row objects. The order in which rows appear in your results variable depends on the orderby parameter passed to select(). If you simply want a single record, the first one out of your select(), you can change the above code to:
results = db().select(db.project.ALL, orderby=db.project.id, groupby=db.project.status).first()
But if your intent is to reverse the order in which you get the rows in results, add the tilde operator (~) to your orderby parameter, which results in an ORDER BY ____ DESC in relational databases. E.g.:
results = db().select(db.project.ALL, orderby=~db.project.id, groupby=db.project.status)

Best database design (model) for user tables

I'm developing a web application using Google App Engine and Django, but I think my problem is more general.
Users can create tables, but note: these tables are not represented as TABLES in the database. Let me give you an example:
First form:
Name of the table: __________
First column name: __________
Second column name: _________
...
The number of columns is not fixed, but there is a maximum (100, for example). The type in every column is the same.
Second form (after choosing a particular table, the user can fill it in):
column_name1: _____________
column_name2: _____________
....
I'm using this solution, but it's wrong:
class Table(db.Model):
    name = db.StringProperty(required=True)

class Column(db.Model):
    name = db.StringProperty(required=True)
    number = db.IntegerProperty()
    table = db.ReferenceProperty(Table, collection_name="columns")

class Value(db.Model):
    time = db.TimeProperty()
    column = db.ReferenceProperty(Column, collection_name="values")
When I want to list a table, I take its columns, and from every column I take its values:
data = []
for column in table.columns:
    column_data = []
    for value in column.values:
        column_data.append(value.time)
    data.append(column_data)
data = zip(*data)
I think the problem is the order of the values, because it is not guaranteed that the order for one column is the same as for the others. This is the bug I'm waiting for (though so far I have never seen it):
Table as I want it:    As I might get it:
a z c                  a e c
d e f                  d h f
g h i                  g z i
Better solutions? Maybe using ListProperty?
Here's a data model that might do the trick for you:
class Table(db.Model):
    name = db.StringProperty(required=True)
    owner = db.UserProperty()
    column_names = db.StringListProperty()

class Row(db.Model):
    values = db.ListProperty(yourtype)
    table = db.ReferenceProperty(Table, collection_name='rows')
My reasoning:
You don't really need a separate entity to store column names. Since all columns are of the same data type, you only need to store the name, and the fact that they are stored in a list gives you an implicit order number.
By storing the values in a list in the Row entity, you can use an index into the column_names property to find the matching value in the values property.
By storing all of the values for a row together in a single entity, there is no possibility of values appearing out of their correct order.
Caveat emptor:
This model will not work well if the table can have columns added to it after it has been populated with data. To make that possible, every time a column is added, every existing row belonging to that table would have to have a value appended to its values list (see the sketch below). If it were possible to efficiently store dictionaries in the datastore, this would not be a problem, but lists can really only be appended to.
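A minimal sketch of that backfill, assuming the Table/Row model above (add_column is a hypothetical helper, not an existing API):
def add_column(table, name, default):
    # The new column's position is its index in column_names.
    table.column_names.append(name)
    table.put()
    # Backfill: every existing row needs a value at the new index.
    # default must be a value of the list's item type; GAE ListProperty
    # items cannot be None.
    for row in table.rows:
        row.values.append(default)
        row.put()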
Alternatively, you could use Expando...
Another possibility is to define the Row model as an Expando, which allows you to dynamically create properties on an entity. You could set column values only for the columns that actually have values, and you could also add columns to the table after it has data in it without breaking anything:
class Row(db.Expando):
    table = db.ReferenceProperty(Table, collection_name='rows')

    @staticmethod
    def __name_for_column_index(index):
        return "column_%d" % index

    def __getitem__(self, key):
        # Allows one to get at the columns of Row entities with
        # subscript syntax:
        #   first_row = Row.get()
        #   col1 = first_row[1]
        #   col12 = first_row[12]
        value = None
        try:
            value = self.__dict__[Row.__name_for_column_index(key)]
        except KeyError:
            # The given column is not defined for this Row
            pass
        return value

    def __setitem__(self, key, value):
        # Allows one to set the columns of Row entities with
        # subscript syntax:
        #   first_row = Row.get()
        #   first_row[5] = "New value for column 5"
        self.__dict__[Row.__name_for_column_index(key)] = value
        # In order to allow efficient multiple column changes,
        # the put() could go somewhere else.
        self.put()
Why don't you add an IntegerProperty to Value for rowNumber and increment it every time you add a new row of values? Then you can reconstruct the table by sorting on rowNumber.
You're going to make life very hard for yourself unless your users' 'tables' are actually stored as real tables in a relational database. Find some way of actually creating tables and use the power of an RDBMS, or you're reinventing a very complex and sophisticated wheel.
This is the conceptual idea I would use:
I would create two classes for the data-store:
table: this would serve as a dictionary, storing the structure of the pseudo-tables your app creates. It would have three fields: table_name, column_name, and column_order, where column_order gives the position of the column within the table.
data: this would store the actual data in the pseudo-tables. It would have four fields: row_id, table_name, column_name, and column_data. row_id would be the same for data pertaining to the same row and would be unique for data across the various pseudo-tables.
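A sketch of those two models in GAE db syntax (TableDef and DataCell are hypothetical names; "table" is renamed to avoid clashing with the Table model earlier in this thread):
class TableDef(db.Model):
    # One entity per (pseudo-table, column) pair.
    table_name = db.StringProperty(required=True)
    column_name = db.StringProperty(required=True)
    column_order = db.IntegerProperty()

class DataCell(db.Model):
    # One entity per cell; row_id ties the cells of one row together.
    row_id = db.StringProperty(required=True)
    table_name = db.StringProperty(required=True)
    column_name = db.StringProperty(required=True)
    column_data = db.StringProperty()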
Put the data in a LongBlob.
The power of a database is being able to search and organise data so that you can get only the part you want, for performance and simplicity: you don't want the whole database, you just want a part of it and you want it fast. But from what I understand, when you retrieve a user's data, you retrieve it all and display it. So you don't need to store the data in the normal "database" way.
What I would suggest is to simply format and store the whole of a user's data in a single column with a suitable type (a LongBlob, for example). The format would be an object with a list of columns and rows, and you define the object in whatever language you use to communicate with the database.
The columns in your (real) database would be: User int, TableNo int, Table LongBlob.
If user 8 has 3 tables, you will have the following rows:
8, 1, objectcontainingtable1;
8, 2, objectcontainingtable2;
8, 3, objectcontainingtable3;
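A minimal sketch of that approach in Python, assuming a MySQL-style table user_tables(user INT, table_no INT, tbl LONGBLOB) and JSON as the serialization format (both the table name and the format are assumptions, not part of the answer above):
import json

def save_table(cursor, user, table_no, columns, rows):
    # Serialize the whole pseudo-table into one blob.
    blob = json.dumps({"columns": columns, "rows": rows})
    cursor.execute(
        "REPLACE INTO user_tables (user, table_no, tbl) VALUES (%s, %s, %s)",
        (user, table_no, blob))

def load_table(cursor, user, table_no):
    # Fetch and deserialize the blob in one round trip.
    cursor.execute(
        "SELECT tbl FROM user_tables WHERE user = %s AND table_no = %s",
        (user, table_no))
    return json.loads(cursor.fetchone()[0])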
