Aggregate JSON arrays with column values as keys

I've got a scenario where I'm trying to aggregate data and insert that aggregated data into another table, all from inside a function. The data is inserted into the other table as arrays and JSON. I've been able to aggregate into arrays perfectly fine, but I'm running into some trouble trying to aggregate the data into JSON the way I want.
Basically here is a sample of the data I'm aggregating:
 id_1 | id_2 | cat_ids_array
------+------+---------------
  201 | 4232 | {9,10,11,13}
  201 | 4236 | {11}
  201 | 4249 | {12}
  201 | 4251 | {9,10}
  202 | 4245 | {11}
  202 | 4249 | {12}
  202 | 4251 | {9,10}
  202 | 4259 | {9}
  203 | 4232 | {9,10,11,13}
  203 | 4236 | {11}
  203 | 4249 | {12}
  203 | 4251 | {9,10}
  203 | 4377 | {14}
  204 | 4232 | {15,108}
  204 | 4236 | {15}
  205 | 4232 | {17,109}
  205 | 4245 | {17}
  205 | 4377 | {18}
  206 | 4253 | {20}
When I use json_agg() to aggregate id_2 and cat_ids_array into a JSON string, here is what I get:
 id_1 | json_agg
------+----------------------------------
  201 | [{"f1":4232,"f2":[9,10,11,13]}, +
      |  {"f1":4236,"f2":[11]},         +
      |  {"f1":4249,"f2":[12]},         +
      |  {"f1":4251,"f2":[9,10]}]
  202 | [{"f1":4245,"f2":[11]},         +
      |  {"f1":4249,"f2":[12]},         +
      |  {"f1":4251,"f2":[9,10]},       +
      |  {"f1":4259,"f2":[9]}]
  203 | [{"f1":4232,"f2":[9,10,11,13]}, +
      |  {"f1":4236,"f2":[11]},         +
      |  {"f1":4249,"f2":[12]},         +
      |  {"f1":4251,"f2":[9,10]},       +
      |  {"f1":4377,"f2":[14]}]
  204 | [{"f1":4232,"f2":[15,108]},     +
      |  {"f1":4236,"f2":[15]}]
  205 | [{"f1":4232,"f2":[17,109]},     +
      |  {"f1":4245,"f2":[17]},         +
      |  {"f1":4377,"f2":[18]}]
  206 | [{"f1":4253,"f2":[20]}]
Here is what I'm trying to get:
 id_1 | json_agg
------+---------------------------------------------------------------------------
  201 | [{"4232":[9,10,11,13],"4236":[11],"4249":[12],"4251":[9,10]}]
  202 | [{"4245":[11],"4249":[12],"4251":[9,10],"4259":[9]}]
  203 | [{"4232":[9,10,11,13],"4236":[11],"4249":[12],"4251":[9,10],"4377":[14]}]
  204 | [{"4232":[15,108],"4236":[15]}]
  205 | [{"4232":[17,109],"4245":[17],"4377":[18]}]
  206 | [{"4253":[20]}]
I'm thinking that I will have to do some kind of string concatenation, but I'm not entirely sure of the best way to go about this. As stated before, I'm doing this from inside a function, so I've got some flexibility in what I can do, since I'm not limited to just SELECT syntax magic.
Also pertinent, I'm running PostgreSQL 9.3.4 and cannot upgrade to 9.4 in the near future.

It's a pity you cannot upgrade: Postgres 9.4 has jsonb and much added functionality for JSON. In particular, json_build_object() would be perfect for you:
Return multiple columns of the same row as JSON array of objects
Almost, but not quite
While stuck with Postgres 9.3, you can get help from hstore to construct an hstore value with id_2 as key and cat_ids_array as value:
hstore(id_2::text, cat_ids_array::text)
Or:
hstore(id_2::text, array_to_json(cat_ids_array)::text)
Then:
json_agg(hstore(id_2::text, array_to_json(cat_ids_array)::text))
But the array is not recognized as an array: once cast to hstore, it's just a text string to Postgres. There is hstore_to_json_loose(), but it only identifies boolean and numeric types, so the array value still comes out as a quoted string.
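To see why it falls short, a minimal sketch (run against the question's table, here called tbl):

-- Sketch only: json_agg() over the hstore detour on Postgres 9.3.
-- The hstore value is plain text, so hstore_to_json() emits the array
-- as one quoted string, e.g. {"4232": "[9,10,11,13]"}.
SELECT id_1
     , json_agg(hstore_to_json(hstore(id_2::text, array_to_json(cat_ids_array)::text)))
FROM   tbl
GROUP  BY 1;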
Solution
So I ended up with string manipulation, like you predicted. There are various ways to construct the JSON string, each more or less fast and elegant:
format('{"%s":[%s]}', id_2::text, translate(cat_ids_array::text, '{}',''))::json
format('{"%s":%s}', id_2::text, to_json(cat_ids_array))::json
replace(replace(to_json((id_2, cat_ids_array))::text, 'f1":',''),',"f2', '')::json
I picked the second variant: it seems the most reliable, and it also works for array types other than the simple int[], whose elements might need escaping:
SELECT id_1
, json_agg(format('{"%s":%s}', id_2::text, to_json(cat_ids_array))::json)
FROM tbl
GROUP BY 1
ORDER BY 1;
Result as desired.
SQL Fiddle demonstrating all.
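A quick sketch of the escaping point (the text[] literal here is made up): to_json() double-quotes each element properly, which the translate() of the first variant would not.

SELECT format('{"%s":%s}', 42, to_json('{a b,"c,d"}'::text[]))::json;
-- -> {"42":["a b","c,d"]}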

Related

Group first started date and last end date for records with a common field

Here is an excerpt of my table data:
| deviceid | failcount | started             | ended               |
|----------|-----------|---------------------|---------------------|
| a1078    | 2         | 2020-12-07 14:51:33 | 2020-12-07 17:30:16 |
| a1006    | 2         | 2020-12-09 15:58:01 | 2020-12-09 23:59:59 |
| a1006    | 2         | 2020-12-10 00:00:00 | 2020-12-10 16:40:02 |
| a136     | 71        | 2020-12-18 10:12:19 | 2020-12-18 23:59:59 |
| a136     | 71        | 2020-12-19 00:00:00 | 2020-12-19 04:27:23 |
| a1078    | 36        | 2020-12-21 10:07:09 | 2020-12-21 14:36:40 |
What I am trying to get is the earliest start date and the latest end date for each deviceid, BUT only for the same failcount number. The failcount number is the # of failures for each deviceid, so multiple deviceids can have the same failcount number (should be very few).
This is what I'm trying to get to:
| deviceid | failcount | started             | ended               |
|----------|-----------|---------------------|---------------------|
| a1078    | 2         | 2020-12-07 14:51:33 | 2020-12-07 17:30:16 |
| a1006    | 2         | 2020-12-09 15:58:01 | 2020-12-10 16:40:02 |
| a136     | 71        | 2020-12-18 10:12:19 | 2020-12-19 04:27:23 |
| a1078    | 36        | 2020-12-21 10:07:09 | 2020-12-21 14:36:40 |
I've tried variations of min/max but can't figure out how to keep different failcounts from the same deviceid from being combined - for instance, for deviceid a1078 above, I don't want the started from failcount 2 paired with the ended from failcount 36.
What I have so far is this, but it combines different failcounts for the same deviceid:
select deviceid
     , min(started) as started
     , max(ended) as ended
from dayfail
where eventno = '600509'
and convert(char(6), started, 112) = '202012'
group by deviceid
Thx in advance for any assistance to this sql newbie ;)
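A minimal sketch of one way out, assuming the same dayfail table and columns (the fix is mine, not from the thread): adding failcount to the GROUP BY keeps the two a1078 episodes apart.

select deviceid
     , failcount
     , min(started) as started
     , max(ended) as ended
from dayfail
where eventno = '600509'
and convert(char(6), started, 112) = '202012'
group by deviceid, failcount  -- one row per device/failcount pair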

How to calculate a percentage difference between two columns in a Data Studio pivot?

I have a data source table in Google Sheets, looking like:
+------------+--------------+--------+
| Date | Search query | Clicks |
+------------+--------------+--------+
| 09.11.2020 | keyword 1 | 20 |
+------------+--------------+--------+
| 16.11.2020 | keyword 1 | 24 |
+------------+--------------+--------+
| 16.11.2020 | keyword 2 | 23 |
+------------+--------------+--------+
| 09.11.2020 | keyword 2 | 18 |
+------------+--------------+--------+
| 09.11.2020 | keyword 3 | 19 |
+------------+--------------+--------+
| 16.11.2020 | keyword 3 | 17 |
+------------+--------------+--------+
With this data source I have a Data Studio pivot, looking like:
+--------------+------------+------------+
| Search query | 09.11.2020 | 16.11.2020 |
+--------------+------------+------------+
| keyword 1 | 20 | 24 |
+--------------+------------+------------+
| keyword 2 | 18 | 23 |
+--------------+------------+------------+
| keyword 3 | 19 | 17 |
+--------------+------------+------------+
How can I create an additional column in Data Studio that calculates the percentage difference in clicks between the dates? So the Data Studio table will look like:
+--------------+------------+------------+---------------------------------+-----------------------------+
| Search query | 09.11.2020 | 16.11.2020 | Difference between B and C in % | Formula for Difference in % |
+--------------+------------+------------+---------------------------------+-----------------------------+
| keyword 1 | 20 | 24 | 17 | =100-((B2*100)/C2) |
+--------------+------------+------------+---------------------------------+-----------------------------+
| keyword 2 | 18 | 23 | 22 | =100-((B3*100)/C3) |
+--------------+------------+------------+---------------------------------+-----------------------------+
| keyword 3 | 19 | 17 | -12 | =100-((B4*100)/C4) |
+--------------+------------+------------+---------------------------------+-----------------------------+
The last column just shows the formula as an example.
I tried all the possibilities available in Data Studio, but failed. The cause of my failure may be a bug I've run into.
My other attempt was to build a pivot in Google Sheets directly and calculate the difference there, but that doesn't work for me either, because Google Sheets breaks the pivot when its data refreshes.
The key is a calculated field. For each pair of columns where you need a difference, you create a calculated field. Then you add this field to your table and see the calculated difference, as on the following screenshot.
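For the sample above, a hypothetical calculated field could look like the following (Date and Clicks are the field names from the sheet; treating the dates as text literals is an assumption about how they arrive from Google Sheets):

100 - (SUM(CASE WHEN Date = "09.11.2020" THEN Clicks ELSE 0 END) * 100)
    / SUM(CASE WHEN Date = "16.11.2020" THEN Clicks ELSE 0 END)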

Building index for specific value

I have a table that keeps inventory information for products in stores on a daily basis. It looks like:
|------------|-----------|---------|-----------------|
| Date | ProductId | StoreId | InventoryOnHand |
|------------|-----------|---------|-----------------|
| 2017-10-11 | 348 | 121 | 2 |
| 2017-10-11 | 110 | 200 | 0 |
| 2017-10-11 | 254 | 587 | -2 |
| 2017-10-12 | 311 | 875 | 26 |
| 2017-10-12 | 954 | 364 | 15 |
| 2017-10-12 | 348 | 121 | 0 |
| 2017-10-12 | 441 | 121 | 7 |
| . | . | . | . |
| . | . | . | . |
| . | . | . | . |
|------------|-----------|---------|-----------------|
Most of my queries have a condition like WHERE InventoryOnHand > 0, and I need to speed them up.
Therefore, I want to build an index that separates the values of the InventoryOnHand column by whether or not they are greater than 0.
A filtered index does not solve my problem: with a filtered index, every value greater than 0 would be indexed, which increases the index size, while I only need to know whether a value is greater than 0 or not.
In other words, I want an index that only serves the condition InventoryOnHand > 0. Is there any way to do this in SQL Server?
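One hedged sketch of a workaround, not an answer from the thread: persist the yes/no answer as a computed bit column and index that instead. The table name Inventory is an assumption.

-- Sketch: the index stores only the 0/1 flag, not the quantities.
ALTER TABLE Inventory
    ADD HasStock AS CAST(CASE WHEN InventoryOnHand > 0 THEN 1 ELSE 0 END AS bit) PERSISTED;

CREATE INDEX IX_Inventory_HasStock
    ON Inventory (HasStock)
    INCLUDE ([Date], ProductId, StoreId);  -- covers the typical lookup columns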

SQL Server: Islands and Gaps

I'm struggling with an "Islands and Gaps" issue. This is for SQL Server 2008 / 2012 (we have databases on both).
I have a table which tracks "available" Serial-#'s for a Pass Outlet; i.e., Bus Passes, Admission Tickets, Disneyland Tickets, etc. Those Serial-#'s are VARCHAR, and can be any combination of numbers and characters... any length, up to the max value of the defined column... which is VARCHAR(30). And this is where I'm mightily struggling with the syntax/design of a VIEW.
The table (IM_SER) which contains all this data has a primary key consisting of:
ITEM_NO...VARCHAR(20),
SERIAL_NO...VARCHAR(30)
In many cases... particularly with different types of the "Bus Passes" involved, those Serial-#'s could easily track into the TENS of THOUSANDS. What is needed... is a simple view in SQL Server... which simply outputs the CONSECUTIVE RANGES of Available Serial-#'s...until a GAP is found (i.e. a BREAK in the sequences). For example, say we have the following Serial-#'s on hand, for a given Item-#:
123
124
125
139
140
ABC123
ABC124
ABC126
XYZ240003
XYY240004
In my example above, the output would be displayed as follows:
123 -to- 125
139 -to- 140
ABC123 -to- ABC124
ABC126 -to- ABC126
XYZ240003 -to- XYZ240004
In total, there would be 10 Serial-#'s... but since we're outputting the sequential ranges... only 5 lines of output would be necessary. Does this make sense? Please let me know... and, again, THANK YOU!... Mark
This should get you started... the fun part will be determining whether there are gaps. You will have to handle each serial format a little differently to make that call...
SELECT x.item_no, x.s_format, x.s_length, x.serial_no,
       LAG(x.serial_no) OVER (PARTITION BY x.item_no, x.s_format, x.s_length
                              ORDER BY x.serial_no) AS PreviousValue,
       LEAD(x.serial_no) OVER (PARTITION BY x.item_no, x.s_format, x.s_length
                               ORDER BY x.serial_no) AS NextValue
FROM (
    SELECT item_no, serial_no,
           LEN(serial_no) AS s_length,
           CASE
               WHEN PATINDEX('%[0-9]%', serial_no) > 0
                AND PATINDEX('%[a-z]%', serial_no) = 0 THEN 'NUMERIC'
               WHEN PATINDEX('%[0-9]%', serial_no) > 0
                AND PATINDEX('%[a-z]%', serial_no) > 0 THEN 'ALPHANUMERIC'
               ELSE 'ALPHA'
           END AS s_format
    FROM table1
) x
ORDER BY item_no, s_format, s_length, serial_no
http://sqlfiddle.com/#!3/5636e2/7
| item_no | s_format     | s_length | serial_no | PreviousValue | NextValue |
|---------|--------------|----------|-----------|---------------|-----------|
| 1       | ALPHA        | 4        | ABCD      | (null)        | ABCF      |
| 1       | ALPHA        | 4        | ABCF      | ABCD          | (null)    |
| 1       | ALPHANUMERIC | 6        | ABC123    | (null)        | ABC124    |
| 1       | ALPHANUMERIC | 6        | ABC124    | ABC123        | ABC126    |
| 1       | ALPHANUMERIC | 6        | ABC126    | ABC124        | (null)    |
| 1       | ALPHANUMERIC | 9        | XYY240004 | (null)        | XYZ240003 |
| 1       | ALPHANUMERIC | 9        | XYZ240003 | XYY240004     | (null)    |
| 1       | NUMERIC      | 3        | 123       | (null)        | 124       |
| 1       | NUMERIC      | 3        | 124       | 123           | 125       |
| 1       | NUMERIC      | 3        | 125       | 124           | 139       |
| 1       | NUMERIC      | 3        | 139       | 125           | 140       |
| 1       | NUMERIC      | 3        | 140       | 139           | (null)    |
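From there, a sketch of the classic next step for the purely NUMERIC serials (my addition, not part of the original answer): consecutive numbers share the same value-minus-row_number group key, so each group is one island.

SELECT item_no,
       MIN(CAST(serial_no AS bigint)) AS range_start,
       MAX(CAST(serial_no AS bigint)) AS range_end
FROM (
    SELECT item_no, serial_no,
           CAST(serial_no AS bigint)
             - ROW_NUMBER() OVER (PARTITION BY item_no
                                  ORDER BY CAST(serial_no AS bigint)) AS grp
    FROM table1
    WHERE serial_no NOT LIKE '%[^0-9]%'  -- digits only
) x
GROUP BY item_no, grp
ORDER BY item_no, range_start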

Transform ranged data in an Access table

I have a table in Access database as below;
Name | Range   | X  | Y  | Z
------------------------------
A    | 100-200 | 1  | 2  | 3
A    | 200-300 | 4  | 5  | 6
B    | 100-200 | 10 | 11 | 12
B    | 200-300 | 13 | 14 | 15
C    | 200-300 | 16 | 17 | 18
C    | 300-400 | 19 | 20 | 21
I have been trying to write a query that converts this into the following format.
Name | X_100_200 | Y_100_200 | Z_100_200 | X_200_300 | Y_200_300 | Z_200_300 | X_300_400 | Y_300_400 | Z_300_400
-----------------------------------------------------------------------------------------------------------------
A    | 1         | 2         | 3         | 4         | 5         | 6         |           |           |
B    | 10        | 11        | 12        | 13        | 14        | 15        |           |           |
C    |           |           |           | 16        | 17        | 18        | 19        | 20        | 21
After trying for a while, the best method I could come up with is to write a bunch of short queries that select the data for each Range and then put them back together using a UNION query. The problem is that this example shows only 3 value columns (X, Y and Z), but I actually have many more, and Access is starting to strain under the amount of SQL I have come up with.
Is there a better way to achieve this?
The answer was simple: just use the Access PivotTable view. I'm finding it hard to export the results to Excel, though.
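For reference, a crosstab query does the same reshaping in plain Access SQL, one crosstab per value column. This sketch is my assumption, not part of the original answer, and the table name tblRanges is hypothetical (Access's SQL view does not accept comments, so all caveats live here):

TRANSFORM First(t.X)
SELECT t.[Name]
FROM tblRanges AS t
GROUP BY t.[Name]
PIVOT t.[Range];

Repeating this for Y and Z and joining the three crosstabs on Name yields the wide layout; the generated column headings (100-200, 200-300, ...) can then be aliased to X_100_200 and so on.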
