How to get a JavaScript/JSON array of arrays in PostgreSQL?

The JSON object format is verbose:
'[{"id":1,"name":"John"}, {"id":2,"name":"Jack"}]'
Sometimes the repeating field names take up more space than the actual data. To save bandwidth and speed up page loading, I would like to generate a JavaScript array of arrays in string form instead and send that to the client. For example, for this data:
create table temp (
    id int,
    name text
);
insert into temp values (1, 'John'), (2, 'Jack');
I would like to get '[[1, "John"], [2, "Jack"]]'. How can I do that?
I do not want to aggregate columns by typing them out, since that would be hard to maintain. I also know that PostgreSQL, unlike JavaScript, does not allow multiple types in an array, so one possibility is to use composite types, but then the stringified/aggregated result ends up with '()' in it.

select array_to_json(array_agg(json_build_array(id, name)))
from temp;
array_to_json
---------------------------
[[1, "John"],[2, "Jack"]]
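Since this answer already relies on json_build_array (PostgreSQL 9.4+), json_agg is also available there and can replace the array_agg/array_to_json pair:

select json_agg(json_build_array(id, name))
from temp;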

Related

Hive UDF to mask/transform certain attributes of an array<struct> data type in Hive

I need to mask certain attributes of a column with array<struct> data type in Hive. For example, a field biodata = [{'name':'Rahul','age':20,'gender':'male'},{'name':'Kavita','age':25,'gender':'female'}]
Here, I need to mask/encrypt the 'name' attribute and return array<struct> as below:
biodata = [{'name':'xvdff','age':20,'gender':'male'},{'name':'ddkfld','age':25,'gender':'female'}]
How can I achieve this by writing a Hive UDF?
If you want to do it without exploding, then you need to write a custom UDF.
A sha256 hash (in Hive, the sha2(input, 256) function) is a good method for data obfuscation because it is a collision-resistant, deterministic, one-way function. One-way means it is not feasible to reverse (cryptographically strong); collision resistance means the probability of two different inputs producing the same hash is very low; and deterministic means the same input always yields the same hash. The last property lets you join on the hashed attribute, count distinct hashed values, and perform other analytics and aggregations just as if the values were not hashed.
Using native Hive functions, you can explode, apply sha256, then collect array again.
For example like this:
select t.id,
       collect_list(named_struct('name', sha2(e.name, 256), 'age', e.age, 'gender', e.gender)) as result_array
from mytable t
lateral view outer inline(t.biodata) e as name, age, gender
group by t.id;
Applying sha256 across all the data in your warehouse still lets you analyze and join by hashed values, though it is not possible to reverse sha256 without an original value-to-hash mapping.
Additionally, you may want to map empty values or other "special values" to NULL or an empty string instead of hashing them, like this: case when name = '' or name = 'NA' then '' else sha2(name, 256) end. This makes such values more convenient to analyze and filter.
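Combining that special-value handling with the query above (same assumed table and struct fields):

select t.id,
       collect_list(named_struct(
           'name', case when e.name = '' or e.name = 'NA' then '' else sha2(e.name, 256) end,
           'age', e.age,
           'gender', e.gender)) as result_array
from mytable t
lateral view outer inline(t.biodata) e as name, age, gender
group by t.id;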
The length of a sha256 hash is 64 hex digits and does not depend on the input length. Example for the input string 'test': 9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
For older Hive versions without native sha2, you can call the commons-codec DigestUtils method using reflect or java_method: reflect('org.apache.commons.codec.digest.DigestUtils', 'sha256Hex', input)
A less secure and less collision-resistant hashing method is MD5: md5(input).
Hive also has a mask_hash function for masking data; it is based on MD5 in Hive 2.x and was changed to use sha256 in Hive 3.0. Studying that code, along with examples of sorting array<struct> by a specified struct field in a GenericUDF, will give you a good start if you want a custom UDF.

How to make dynamic references to tables in AnyLogic?

I've modeled six machines. Each of them has a different electricity load profile. The load profile is provided in a table in AnyLogic, and every machine has its own table storing these values. I iterate through the values to implement the same in TableFunctions. Now I face the following challenge: how can I make a dynamic reference to the relevant table? I would like to pick a specific table depending on a machine index. How can I define a variable that dynamically refers to the relevant table object?
Thank you for your help!
Not sure it is really necessary in your case, but here goes:
You can store a reference to a database table in a variable of the following type:
com.mysema.query.sql.RelationalPathBase
When selecting values of double (int, String, etc.) type in a particular column, you can get the column by index by calling variable.getColumns().get(index). Then you need to cast it to the corresponding type, like below:
List<Double> resultRows = selectFrom(variable)
    .where(((com.mysema.query.types.path.NumberPath<Double>) variable.getColumns().get(1)).eq(2.0))
    .list((com.mysema.query.types.path.NumberPath<Double>) variable.getColumns().get(1));
Are you always going to have a finite number of machines, and how is your load profile represented?
If you have a finite number of machines, and the load profile is a set of individual values (or indeed as long as you can hold those values in a single field per element), then you can create a single table, e.g. machine_load_profile, where the first column, load_profile_element, holds element IDs and the further columns, named machine_0, machine_1, machine_2, etc., hold the values for each load profile element. You can then get the load profile elements for a single machine like this:
List<Double> dblReturnLPEs = main.selectValues(
    "SELECT machine_" + oMachine.getIndex()
    + " FROM machine_load_profile"
    + " ORDER BY load_profile_element;"
);
and either iterate that list or convert them into an array:
dblLPEValues = dblReturnLPEs.stream().mapToDouble(Double::doubleValue).toArray();
and iterate that.
Of course, you could also use the opposite orientation for your columns and rows, using WHERE; I simply had a handy example oriented in this manner.
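For illustration, a sketch of that long-format orientation (the table and column names here are assumptions, not from the model above): one row per machine per element, filtered with WHERE and passed to main.selectValues as before:

SELECT load_value
FROM machine_load_profile_long
WHERE machine_index = 0  -- or concatenate oMachine.getIndex() as above
ORDER BY load_profile_element;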

PostgreSQL count results within jsonb array across multiple rows

As stated in the title, I am in a situation where I need to return a count of occurrences within an array, that is within a jsonb column. A pseudo example is as follows:
CREATE TABLE users (id int primary key, tags jsonb);
INSERT INTO users (id, tags) VALUES
(1, '{"Friends": ["foo", "bar", "baz"]}'),
(2, '{"Friends": ["bar", "bar"]}');
Please note that the value for Friends can contain the same value more than once. This will be relevant later (in this case, the second row contains the name "bar" twice in the jsonb column under the key "Friends").
Question:
For the example above, if I were to search for the value "bar" (given a query that I need help to solve), I want the number of times "bar" appears in the tags (jsonb) column within the key "Friends"; in this case the end result I am looking for is the integer 3, as the term "bar" appears 3 times across 2 rows.
Where I'm at:
Currently I have SQL written that returns the Friends values (from the multiple selected rows) as a single, one-dimensional list. That SQL is as follows:
SELECT jsonb_array_elements_text(tags->'Friends') FROM users;
yielding the following result:
jsonb_array_elements_text
-------------------------
foo
bar
baz
bar
bar
Given that this is an array, is it possible to filter this by the term "bar" in some fashion in order to get the count of the number of times it appears? Or am I way off in my approach?
Other Details:
Version: psql (PostgreSQL) 9.5.2
The table in question also has a GIN index on it.
Please let me know if any additional information is needed, thanks in advance.
You need to use the result of the function as a proper table, then you can easily count the number of times the value appears.
select count(x.val)
from users
cross join lateral jsonb_array_elements_text(tags->'Friends') as x(val)
where x.val = 'bar'
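As a usage note, the same lateral join gives per-friend counts if you group instead of filter:

select x.val, count(*) as occurrences
from users
cross join lateral jsonb_array_elements_text(tags->'Friends') as x(val)
group by x.val;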

How to store data that can be either a numeric range or a numeric value?

In my table I need to store a physical quantity that can be given either as a numeric value or as a numeric interval. The table below illustrates the idea:
------------------------------
Isotope_ID  | Atomic_Weight
------------------------------
1           | 1.00784
2           | [6.938, 6.997]
...         | ...
This table is unacceptable because the field Atomic_Weight contains values of different types. What is the best practice in such cases?
Edit1: There are three possible ways to represent information about atomic weight:
value + (uncertainty), e.g. 1.00784 (9)
interval, e.g. [6.938, 6.997]
mass number of the most stable isotope, e.g. 38
These three subtypes cannot be stored in one field because this would violate First Normal Form (1NF). This is why the example table is unacceptable.
I will try to restate my question more clearly: What are possible ways to store information about atomic weight (that can be given in one of the three different subtypes) in my database?
either as a numeric value or as a numeric interval
In the case of intervals you can store a single value x as [x,x].
In this application it's not as if the single values are exact values; they only represent a measurement to a certain accuracy. Even the interval endpoints only represent measurements to a certain accuracy.
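In PostgreSQL specifically, this idea maps directly onto the built-in numrange type (available since 9.2); a minimal sketch, with names assumed from the question:

create table isotope (
    isotope_id int primary key,
    atomic_weight numrange not null
);
-- a single value stored as the degenerate inclusive range [x,x]
insert into isotope values (1, numrange(1.00784, 1.00784, '[]'));
-- a true interval
insert into isotope values (2, numrange(6.938, 6.997, '[]'));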
This table is unacceptable because the field Atomic_Weight contains values of different types.
The relational model doesn't care what values are in a "type". If no DBMS type fits yours, then you must encode your ideal table and column into one or more tables and/or columns.
You can encode them as strings, but then the DBMS cannot optimize queries involving their constituent values as well as it could with a multiple-column encoding, and you must constantly decode and encode them to manipulate their parts.
Weight_string (isotope, weight)
// VALUES (1, '1.00784'), (2, '[6.938, 6.997]')
What is the best practice in such cases?
The main pattern is to have a table for every non-primitive subtype, and to encode the values of a subtype as one or more columns. (If a subtype has a subtype in turn, repeat.)
Weight_single (isotope, weight_single)
// VALUES (1, 1.00784)
Weight_interval(isotope, weight_min, weight_max)
// VALUES (2, 6.938, 6.997)
Another pattern is to encode each value in as many columns as necessary, whether they are used or not.
Weight_1_row_NULL_unused(isotope, single, interval_min, interval_max)
// VALUES (1, 1.00784, NULL, NULL), (2, NULL, 6.938, 6.997)
Weight_1_row_type_tag(isotope, type_tag, if_single, if_finite_min, if_finite_max)
// VALUES (1, 'single', 1.00784, 0, 0), (2, 'interval', 0, 6.938, 6.997)
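A runnable PostgreSQL sketch of the NULL-unused pattern, with a CHECK constraint (my addition, not required by the pattern) enforcing exactly one encoding per row:

create table weight_1_row_null_unused (
    isotope int primary key,
    single numeric,
    interval_min numeric,
    interval_max numeric,
    check ((single is not null and interval_min is null and interval_max is null)
        or (single is null and interval_min is not null and interval_max is not null))
);
insert into weight_1_row_null_unused values
    (1, 1.00784, null, null),
    (2, null, 6.938, 6.997);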
Search for SQL subtypes/subtyping tables.
I would go with a table with three columns:
Isotope_ID,
Atomic_Weight_From,
Atomic_Weight_To.
In the case where there is only one value, Atomic_Weight_From and Atomic_Weight_To will contain the same value.
This way you keep your table as clean as possible, as well as the code that needs to deal with it.
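A minimal sketch of that three-column table (the CHECK constraint is my addition, to keep intervals well-formed):

create table isotope_weight (
    isotope_id int primary key,
    atomic_weight_from numeric not null,
    atomic_weight_to numeric not null,
    check (atomic_weight_from <= atomic_weight_to)
);
-- a single value: both columns hold the same number
insert into isotope_weight values (1, 1.00784, 1.00784);
-- an interval
insert into isotope_weight values (2, 6.938, 6.997);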

Selective PostgreSQL database querying

Is it possible to have selective queries in PostgreSQL which select different tables/columns based on values of rows already selected?
Basically, I've got a table in which each row contains a sequence of two to five characters (tbl_roots), optionally with a length field which specifies how many characters the sequence is supposed to contain (it's meant to be made redundant once I figure out a better way, i.e. by counting the length of the sequences).
There are four tables containing patterns (tbl_patterns_biliteral, tbl_patterns_triliteral, ...etc), each of which corresponds to a root_length, and a fifth table (tbl_patterns) which is used to synchronise the pattern tables by providing an identifier for each row, so that row #2 in tbl_patterns_biliteral corresponds to the same row in tbl_patterns_triliteral. The four pattern tables are restricted such that no row in tbl_patterns_(bi|tri|quadri|quinqui)literal can have a pattern_id that doesn't exist in tbl_patterns.
Each pattern table has nine other columns, each of which corresponds to an identifier (root_form).
The last table in the database (tbl_words), contains a column for each of the major tables (word_id, root_id, pattern_id, root_form, word). Each word is defined as being a root of a particular length and form, spliced into a particular pattern. The splicing is relatively simple: translate(pattern, '12345', array_to_string(root, '')) as word_combined does the job.
Now, what I want to do is select the appropriate pattern table based on the length of the sequence in tbl_roots, and select the appropriate column in the pattern table based on the value of root_form.
How could this be done? Can it be combined into a simple query, or will I need to make multiple passes? Once I've built up this query, I'll then be able to code it into a PHP script which can search my database.
EDIT
Here's some sample data (it's actually the data I'm using at the moment) and some more explanations as to how the system works: https://gist.github.com/823609
It's conceptually simpler than it appears at first, especially if you think of it as a coordinate system.
I think you're going to have to change the structure of your tables to have any hope. Here's a first draft for you to think about. I'm not sure what the significance of the "i", "ii", and "iii" in your column names is. In my ignorance, I'm assuming they're meaningful to you, so I've preserved them in the table below. (I preserved their information as integers. Easy to change that to lowercase roman numerals if it matters.)
create table patterns_biliteral (
    pattern_id integer not null,
    root_num integer not null,
    pattern varchar(15) not null,
    primary key (pattern_id, root_num)
);
insert into patterns_biliteral values
(1,1, 'ya1u2a'),
(1,2, 'ya1u22a'),
(1,3, 'ya12u2a'),
(1,4, 'me11u2a'),
(1,5, 'te1u22a'),
(1,6, 'ina12u2a'),
(1,7, 'i1u22a'),
(1,8, 'ya1u22a'),
(1,9, 'e1u2a');
I'm pretty sure a structure like this will be much easier to query, but you know your field better than I do. (On the other hand, database design is my field . . . )
Expanding on my earlier answer and our comments, take a look at this query. (The test table isn't even in 3NF, but the table's not important right now.)
create table test (
    root_id integer,
    root_substitution varchar[],
    length integer,
    form integer,
    pattern varchar(15),
    primary key (root_id, length, form, pattern)
);
insert into test values
(4,'{s,ş,m}', 3, 1, '1o2i3');
This is the important part.
select root_id
, root_substitution
, length
, form
, pattern
, translate(pattern, '12345', array_to_string(root_substitution, ''))
from test;
That query returns, among other things, the translation soşim.
Are we heading in the right direction?
Well, that's certainly a bizarre set of requirements! Here's my best guess, but obviously I haven't tried it. I used UNION ALL to combine the patterns of different sizes and then filtered them based on length. You might need to move the length condition inside each of the subqueries for speed reasons, I don't know. Then I chose the column using the CASE expression.
select word,
translate(
case root_form
when 1 then patinfo.pattern1
when 2 then patinfo.pattern2
... up to pattern9
end,
'12345',
array_to_string(root.root, '')) as word_combined
from tbl_words word
join tbl_roots root
on word.root_id = root.root_id
join tbl_patterns pat
on word.pattern_id = pat.pattern_id
join (
select 2 as pattern_length, pattern_id, pattern1, ..., pattern9
from tbl_patterns_biliteral bi
union all
select 3, pattern_id, pattern1, pattern2, ..., pattern9
from tbl_patterns_triliteral tri
union all
...same for quad and quin...
) patinfo
on
patinfo.pattern_id = pat.pattern_id
and length(root.root) = patinfo.pattern_length
Consider combining all the different patterns into one pattern_details table with a root_length field to filter on. I think that would be easier than combining them all together with UNION ALL. It might be even easier if you had multiple rows in the pattern_details table and filtered based on root_form. Maybe the best would be to lay out pattern_details with fields for pattern_id, root_length, root_form, and pattern. Then you just join from the word table through the pattern table to the pattern detail that matches all the right criteria.
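A sketch of that combined pattern_details layout (all names here are assumptions, and array_length on the root array stands in for the redundant length field):

create table pattern_details (
    pattern_id integer not null,
    root_length integer not null,
    root_form integer not null,
    pattern varchar(15) not null,
    primary key (pattern_id, root_length, root_form)
);

select w.word_id,
       translate(pd.pattern, '12345', array_to_string(r.root, '')) as word_combined
from tbl_words w
join tbl_roots r on r.root_id = w.root_id
join pattern_details pd
    on pd.pattern_id = w.pattern_id
    and pd.root_length = array_length(r.root, 1)
    and pd.root_form = w.root_form;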
Of course, maybe I've completely misunderstood what you're looking for. If so, it would be clearer if you posted some example data and an example result.
