Postgres select by array element range - arrays

In my table I've got column facebook where I store facebook data ( comment count, share count etc.) and It's an array. For example:
{{total_count,14},{comment_count,0},{comment_plugin_count,0},{share_count,12},{reaction_count,2}}
Now I'm trying to SELECT rows that facebook total_count is between 5 and 10. I've tried this:
SELECT * FROM pl where regexp_matches(array_to_string(facebook, ' '), '(\d+).*')::numeric[] BETWEEN 5 and 10;
But I'm getting an error:
ERROR: operator does not exist: numeric[] >= integer
Any ideas?

There is no need to convert the array to text and use regexp. You can access a particular element of the array, e.g.:
with pl(facebook) as (
values ('{{total_count,14},{comment_count,0},{comment_plugin_count,0},{share_count,12},{reaction_count,2}}'::text[])
)
select facebook[1][2] as total_count
from pl;
total_count
-------------
14
(1 row)
Your query may look like this:
select *
from pl
where facebook[1][2]::numeric between 5 and 10
Update. You could avoid the troubles described in the comments if you would use the word null instead of empty strings ''''.
with pl(id, facebook) as (
values
(1, '{{total_count,14},{comment_count,0}}'::text[]),
(2, '{{total_count,null},{comment_count,null}}'::text[]),
(3, '{{total_count,7},{comment_count,10}}'::text[])
)
select *
from pl
where facebook[1][2]::numeric between 5 and 10
id | facebook
----+--------------------------------------
3 | {{total_count,7},{comment_count,10}}
(1 row)
However, it would be unfair to leave your problems without an additional comment. The case is suitable as an example for the lecture How not to use arrays in Postgres. You have at least a few better options. The most performant and natural is to simply use regular integer columns:
create table pl (
...
facebook_total_count integer,
facebook_comment_count integer,
...
);
If for some reason you need to separate this data from others in the table, create a new secondary table with a foreign key to the main table.
If for some mysterious reason you have to store the data in a single column, use the jsonb type, example:
with pl(id, facebook) as (
values
(1, '{"total_count": 14, "comment_count": 0}'::jsonb),
(2, '{"total_count": null, "comment_count": null}'::jsonb),
(3, '{"total_count": 7, "comment_count": 10}'::jsonb)
)
select *
from pl
where (facebook->>'total_count')::integer between 5 and 10
hstore can be an alternative to jsonb.
All these ways are much easier to maintain and much more efficient than your current model. Time to move to the bright side of power.

Related

Snowflake: Trouble getting numbers to return from a PIVOT function

I am moving a query from SQL Server to Snowflake. Part of the query creates a pivot table. The pivot table part works fine (I have run it in isolation, and it pulls numbers I expect).
However, the following parts of the query rely on the pivot table- and those parts fail. Some of the fields return as a string-type. I believe that the problem is Snowflake is having issues converting string data to numeric data. I have tried CAST, TRY_TO_DOUBLE/NUMBER, but these just pull up 0.
I will put the code down below, and I appreciate any insight as to what I can do!
CREATE OR REPLACE TEMP TABLE ATTR_PIVOT_MONTHLY_RATES AS (
SELECT
Market,
Coverage_Mo,
ZEROIFNULL(TRY_TO_DOUBLE('Starting Membership')) AS Starting_Membership,
ZEROIFNULL(TRY_TO_DOUBLE('Member Adds')) AS Member_Adds,
ZEROIFNULL(TRY_TO_DOUBLE('Member Attrition')) AS Member_Attrition,
((ZEROIFNULL(CAST('Starting Membership' AS FLOAT))
+ ZEROIFNULL(CAST('Member Adds' AS FLOAT))
+ ZEROIFNULL(CAST('Member Attrition' AS FLOAT)))-ZEROIFNULL(CAST('Starting Membership' AS FLOAT)))
/ZEROIFNULL(CAST('Starting Membership' AS FLOAT)) AS "% Change"
FROM
(SELECT * FROM ATTR_PIVOT
WHERE 'Starting Membership' IS NOT NULL) PT)
I realize this is a VERY big question with a lot of moving parts... So my main question is: How can I successfully change the data type to numeric value, so that hopefully the formulas work in the second half of the query?
Thank you so much for reading through it all!
EDITED FOR SHORTENING THE QUERY WITH UNNEEDED SYNTAX
CAST(), TRY_TO_DOUBLE(), TRY_TO_NUMBER(). I have also put the fields (Starting Membership, Member Adds) in single and double quotation marks.
Unless you are quoting your field names in this post just to highlight them for some reason, the way you've written this query would indicate that you are trying to cast a string value to a number.
For example:
ZEROIFNULL(TRY_TO_DOUBLE('Starting Membership'))
This is simply trying to cast a string literal value of Starting Membership to a double. This will always be NULL. And then your ZEROIFNULL() function is turning your NULL into a 0 (zero).
Without seeing the rest of your query that defines the column names, I can't provide you with a correction, but try using field names, not quoted string values, in your query and see if that gives you what you need.
You first mistake is all your single quoted columns names are being treated as strings/text/char
example your inner select:
with ATTR_PIVOT(id, studentname) as (
select * from values
(1, 'student_a'),
(1, 'student_b'),
(1, 'student_c'),
(2, 'student_z'),
(2, 'student_a')
)
SELECT *
FROM ATTR_PIVOT
WHERE 'Starting Membership' IS NOT NULL
there is no "starting membership" column and we get all the rows..
ID
STUDENTNAME
1
student_a
1
student_b
1
student_c
2
student_z
2
student_a
So you need to change 'Starting Membership' -> "Starting Membership" etc,etc,etc
As Mike mentioned, the 0 results is because the TRY_TO_DOUBLE always fails, and thus the null is always turned to zero.
now, with real "string" values, in real named columns:
with ATTR_PIVOT(Market, Coverage_Mo, "Starting Membership", "Member Adds", "Member Attrition") as (
select * from values
(1, 10 ,'student_a', '23', '150' )
)
SELECT
Market,
Coverage_Mo,
ZEROIFNULL(TRY_TO_DOUBLE("Starting Membership")) AS Starting_Membership,
ZEROIFNULL(TRY_TO_DOUBLE("Member Adds")) AS Member_Adds,
ZEROIFNULL(TRY_TO_DOUBLE("Member Attrition")) AS Member_Attrition
FROM ATTR_PIVOT
WHERE "Starting Membership" IS NOT NULL
we get what we would expect:
MARKET
COVERAGE_MO
STARTING_MEMBERSHIP
MEMBER_ADDS
MEMBER_ATTRITION
1
10
0
23
150

Find valid combinations based on matrix

I have a in CALC the following matrix: the first row (1) contains employee numbers, the first column (A) contains productcodes.
Everywhere there is an X that productitem was sold by the corresponding employee above
| 0302 | 0303 | 0304 | 0402 |
1625 | X | | X | X |
1643 | | X | X | |
...
We see that product 1643 was sold by employees 0303 and 0304
What I would like to see is a list of what product was sold by which employees but formatted like this:
1625 | 0302, 0304, 0402 |
1643 | 0303, 0304 |
The reason for this is that we need this matrix ultimately imported into an SQL SERVER table. We have no access to the origins of this matrix. It contains about 50 employees and 9000+ products.
Thanx for thinking with us!
try something like this
;with data as
(
SELECT *
FROM ( VALUES (1625,'X',NULL,'X','X'),
(1643,NULL,'X','X',NULL))
cs (col1, [0302], [0303], [0304], [0402])
),cte
AS (SELECT col1,
col
FROM data
CROSS apply (VALUES ('0302',[0302]),
('0303',[0303]),
('0304',[0304]),
('0402',[0402])) cs (col, val)
WHERE val IS NOT NULL)
SELECT col1,
LEFT(cs.col, Len(cs.col) - 1) AS col
FROM cte a
CROSS APPLY (SELECT col + ','
FROM cte B
WHERE a.col1 = b.col1
FOR XML PATH('')) cs (col)
GROUP BY col1,
LEFT(cs.col, Len(cs.col) - 1)
I think there are two problems to solve:
get the product codes for the X marks;
concatenate them into a single, comma-separated string.
I can't offer a solution for both issues in one step, but you may handle both issues separately.
1.
To replace the X marks by the respective product codes, you could use an array function to create a second table (matrix). To do so, create a new sheet, copy the first column / first row, and enter the following formula in cell B2:
=IF($B2:$E3="X";$B$1:$E$1;"")
You'll have to adapt the formula, so it covers your complete input data (If your last data cell is Z9999, it would be =IF($B2:$Z9999="X";$B$1:$Z$1;"")). My example just covers two rows and four columns.
After modifying it, confirm with CTRL+SHIFT+ENTER to apply it as array formula.
2.
Now, you'll have to concatenate the product codes. LO Calc lacks a feature to concatenate an array, but you could use a simple user-defined function. For such a string-join function, see this answer. Just create a new macro with the StarBasic code provided there and save it. Now, you have a STRJOIN() function at hand that accepts an array and concatenates its values, leaving empty values out.
You could add that function using a helper column on the second sheet and apply it by dragging it down. Finally, to get rid of the cells with the single product IDs, copy the complete second sheet, paste special into a third sheet, pasting only the values. Now, you can remove all columns except the first one (employee IDs) and the last one (with the concatenated product ids).
I created a table in sql for holding the data:
CREATE TABLE [dbo].[mydata](
[prod_code] [nvarchar](8) NULL,
[0100] [nvarchar](10) NULL,
[0101] [nvarchar](10) NULL,
[and so on...]
I created the list of columns in Calc by copying and pasting them transposed. After that I used the concatenate function to create the columnlist + datatype for the create table statement
I cleaned up the worksheet and imported it into this table using SQL Server's import wizard. Cleaning meant removing unnecessary rows/columns. Since the columnnames were identical mapping was done correctly for 99%.
Now I had the data in SQL Server.
I adapted the code MM93 suggested a bit:
;with data as
(
SELECT *
FROM dbo.mydata <-- here i simply referenced the whole table
),cte
and in the next part I uses the same 'worksheet' trick to list and format all the column names and pasted them in.
),cte
AS (SELECT prod_code, <-- had to replace col1 with 'prod_code'
col
FROM data
CROSS apply (VALUES ('0100',[0100]),
('0101', [0101] ),
(and so on... ),
The result of this query was inserted into a new table and my colleagues and I are querying our harts out :)
PS: removing the 'FOR XML' clause resulted in a table with two columns :
prodcode | employee
which containes al the unique combinations of prodcode + employeenumber which is a lot faster and much more practical to query.

SQL Server : LIKE - Finding 1 and not 11 in a string

Say I have a SQL Server table with these values:
ID test
-----------------
1 '1,11,X1'
2 'Q22,11,111,51'
3 '1'
4 '5,Q22,1'
If I want to find out which rows contain the comma-separated value '1', I can just do the following and it will work but I'd like to find a better or less wordy way of doing so if it exists. Unfortunately I cannot use RegExp because using \b1\b would be awesome here.
Select test
FROM ...
WHERE
test LIKE '1,%'
OR test = '1'
OR test LIKE '%,1'
OR test LIKE %,1,%
Something like...
WHERE
test LIKE '%[,{NULL}]1[,{NULL}]%'
I know this line isn't correct but you get what I'm after... hopefully ;)
EDITED based on comments below
You shouldn't use comma-delimited values to store lists. You should use a junction table. But, if you have to, the following logic might help:
Select test
FROM ...
WHERE ',' + test + ',' like '%,' + '1' + ',%' ;
This assumes that what you are looking for is "1" as the entire item in the list.
Note: You can/should write the like pattern as '%,1,%'. I just put it in three pieces to separate out the pattern you are looking for.
There are plenty of SplitString functions available if you google around (many here on StackOverflow) that take a comma delimited string like you have, and split it out into multiple rows. You can CROSS APPLY that table-value function to your query, and then just select for those rows that have '1'.
For example, using this splitstring function here (just one of many):
T-SQL split string
You can write this code to get exactly what you want (note, the declare and insert are just to set up test data so you can see it in action):
DECLARE #test TABLE (ID int, test varchar(400));
INSERT INTO #test (ID, test)
VALUES(1, '1,11,X1'),
(2, 'Q22,11,111,51'),
(3, '1'),
(4, '5,Q22,1')
SELECT *
FROM #test
CROSS APPLY splitstring(test)
WHERE [Name] = '1'
This query returns this:
1 1,11,X1 1
3 1 1
4 5,Q22,1 1
select *
from table
where ',' + test + ',' like '%,1,%'
You have to "normalize" your database. If you have multiple attributs for one row, it's a problem!
Add a "One to Many" relation with yours attributs.
You can do like that:
ID, test
1, 1
1, 11
1, X1
2, Q22
2, 11
[...]
SELECT test FROM ...
WHERE ID = (SELECT ID FROM ... WHERE test = 1)
You primary key is (ID, test) now.
You need something like:
SELECT test
FROM _tableName_
WHERE (test LIKE '1,%'
OR test LIKE '%,1'
OR test LIKE '%,1,%'
OR test LIKE '1')
This will return rows that match in order
1 starts a list
1 ends a list
1 is in the middle of a list
1 is its own list

Sum of values of json array in PostgreSQL

In PostgreSQL 9.3, I have a table like this
id | array_json
---+----------------------------
1 | ["{123: 456}", "{789: 987}", "{111: 222}"]
2 | ["{4322: 54662}", "{123: 5121}", "{1: 5345}" ... ]
3 | ["{3232: 413}", "{5235: 22}", "{2: 5453}" ... ]
4 | ["{22: 44}", "{12: 4324}", "{234: 4235}" ... ]
...
I want to get the sum of all values in array_json column. So, for example, for first row, I want:
id | total
---+-------
1 | 1665
Where 1665 = 456 + 987 + 222 (the values of all the elements of json array). No previous information about the keys of the json elements (just random numbers)
I'm reading the documentation page about JSON functions in PostgreSQL 9.3, and I think I should use json_each, but can't find the right query. Could you please help me with it?
Many thanks in advance
You started looking at the right place (going to the docs is always the right place).
Since your values are JSON arrays -> I would suggest using json_array_elements(json)
And since it's a json array which you have to explode to several rows, and then combine back by running sum over json_each_text(json) - it would be best to create your own function (Postgres allows it)
As for your specific case, assuming the structure you provided is correct, some string parsing + JSON heavy wizardry can be used (let's say your table name is "json_test_table" and the columns are "id" and "json_array"), here is the query that does your "thing"
select id, sum(val) from
(select id,
substring(
json_each_text(
replace(
replace(
replace(
replace(
replace(json_array,':','":"')
,'{',''),
'}','')
,']','}')
,'[','{')::json)::varchar
from '\"(.*)\"')::int as val
from json_test_table) j group by id ;
if you plan to run it on a huge dataset - keep in mind string manipulations are expensive in terms of performance
You can get it using this:
/*
Sorry, sqlfiddle is busy :p
CREATE TABLE my_table
(
id bigserial NOT NULL,
array_json json[]
--,CONSTRAINT my_table_pkey PRIMARY KEY (id)
)
INSERT INTO my_table(array_json)
values (array['{"123": 456}'::json, '{"789": 987}'::json, '{"111": 222}'::json]);
*/
select id, sum(json_value::integer)
from
(
select id, json_data->>json_object_keys(json_data) as json_value from
(
select id, unnest(array_json) as json_data from my_table
) A
) B
group by id

Avoid generating duplicating random number

As per the module requirement file name length to be as 8 chars, for that to implement first 4 char DDMM and remaining 4 char trying to fetch the random numbers from the database by using function and view, the same what I am using in database I have pasted below:
Function:
CREATE FUNCTION [dbo].[GenerateRandomNumbersLetters]
(
#NumberOfCharacters TINYINT
)
RETURNS VARCHAR(32)
AS
BEGIN
RETURN
(
SELECT LEFT(REPLACE([NewID], '-', ''), #NumberOfCharacters)
FROM dbo.RetrieveNewID
);
END
View:
CREATE VIEW [dbo].[RetrieveNewID]
AS
SELECT [NewID] = NEWID();
My query:
select
SUBSTRING(replace(convert(varchar(10), getdate(), 3), '/', ''), 1, 4) +
dbo.GenerateRandomNumbersLetters(4) as FileNamerandomNUM
Ex: 0907CCE7
For every row it will provide a random number, but in one scenario recently the random generate duplicates, how can I avoid such scenarios also, kindly advice
There is a risk of 'value repeating' for random numbers especially if you take only the first four digit of a random number.
Instead of that , generate sequence numbers. to implement this you can create a table with structure
file_date | seq_no
Ex: 0907 | 1000
0907 | 1001
then each time you want to get a file name, query against this table for the next sequence number
select max(seq_no)+1 from <table>

Resources