Let's say I have a table students with a column of type jsonb where I store a list of students' additional emails. A student row looks like this:
| student_id | name | emails |
| --- | --- | --- |
| 1 | John Doe | [j.doe@email.com] |
I'm using the following query to update the emails column:
UPDATE students SET emails = emails || '["j.doe@email.com"]'::jsonb
WHERE student_id = 1
AND NOT emails @> '["j.doe@email.com"]'::jsonb;
Once the column emails is filled, if I reuse the query above with the parameter ["j.doe@email.com", "john@email.com"], the column emails gets updated with a repeated value:
| student_id | name | emails |
| --- | --- | --- |
| 1 | John Doe | [j.doe@email.com, j.doe@email.com, john@email.com] |
Is there a way to make sure that in the column emails I'll always have a jsonb list with only unique values?
Use this handy function, which removes duplicates from a jsonb array:
create or replace function jsonb_unique_array(jsonb)
returns jsonb language sql immutable as $$
-- unnest the array and re-aggregate only the distinct elements
select jsonb_agg(distinct value)
from jsonb_array_elements($1)
$$;
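A quick sanity check with a hypothetical input value:
select jsonb_unique_array('["a", "b", "a"]'::jsonb);
-- returns: ["a", "b"]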
Your update statement may look like this:
update students
set emails = jsonb_unique_array(emails || '["j.doe@email.com", "john@email.com"]'::jsonb)
where student_id = 1
and not emails @> '["j.doe@email.com", "john@email.com"]'::jsonb;
Note that the @> guard only skips the update when all the new emails are already present; when they overlap partially, it is the deduplication inside jsonb_unique_array that keeps the array clean.
I have a table that I query like this...
select *
from table
where productId = 'abc123'
Which returns 2 rows (even though the productId is unique) because one of the columns (orderName) is an Array...
| productId | productName | created | featureCount | orderName |
| --- | --- | --- | --- | --- |
| abc123 | someProductName | 2020-01-01 | 12 | someOrderName |
|  |  |  |  | someOtherOrderName |
I'm not sure whether the missing values in the 2nd row are empty strings or nulls because of the way the orderName array expands my search results, but I now want to run a query like this...
select productName, ARRAY_TO_STRING(orderName,'-')
from table
where productId = 'abc123'
and ifnull(featureCount,0) > 0
But this query returns...
someProductName, someOrderName-someOtherOrderName
i.e. both array values came back even though I specified a condition of featureCount>0.
I'm sure I'm missing something very basic about how Arrays function in BigQuery, but from Google's ARRAY_TO_STRING documentation I don't see any way to add a condition to the extraction of ARRAY values. Appreciate any thoughts on the best way to go about this.
From what I understand, this is because you are querying a single row of data that has a column of type ARRAY<STRING>. Since ARRAY_TO_STRING accepts an ARRAY<STRING> value, all the array values are concatenated into a single cell.
So, when you run your script, the output fits your criteria and returns the columns, with the array rendered as additional rows for visibility.
The visualization on the UI should look like the one you mention in your question:
| Row | productId | productName | created | featureCount | orderName |
| --- | --- | --- | --- | --- | --- |
| 1 | abc123 | someProductName | 2020-01-01 | 12 | someOrderName |
|  |  |  |  |  | someOtherOrderName |
Note: On BigQuery this additional row is grayed out; it is part of row 1, but it shows as an additional row for visibility. So this output only has 1 row in the table.
And the visualization as JSON will be:
[
  {
    "productId": "abc123",
    "productName": "someProductName",
    "created": "2020-01-01",
    "featureCount": "12",
    "orderName": [
      "someOrderName",
      "someOtherOrderName"
    ]
  }
]
I don't think there is specific documentation about how arrays are visualized on the UI, but I can share the docs that talk about flattening your row outputs into a single row line; check:
Working with Arrays
Flattening Arrays
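If the goal is to apply a condition while extracting individual array values, flattening with UNNEST is the usual approach. A minimal sketch against the working_table created below (featureCount is stored as STRING there, hence the cast):
select productName, orderNameElement
from `project-id.dataset.working_table`
cross join unnest(orderName) as orderNameElement
where productId = 'abc123'
and ifnull(cast(featureCount as INT64), 0) > 0;
Each array element becomes its own row, so per-element filters and joins behave as you'd expect.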
I used the following to replicate your issue:
CREATE OR REPLACE TABLE `project-id.dataset.working_table` (
productId STRING,
productName STRING,
created STRING,
featureCount STRING,
orderName ARRAY<STRING>
);
insert into `project-id.dataset.working_table` (productId,productName,created,featureCount,orderName)
values ('abc123','someProductName','2020-01-01','12',['someOrderName','someOtherOrderName']);
insert into `project-id.dataset.working_table` (productId,productName,created,featureCount,orderName)
values ('abc123X','someProductNameX','2020-01-02','15',['someOrderName','someOtherOrderName','someData']);
Output:
| Row | productId | productName | created | featureCount | orderName |
| --- | --- | --- | --- | --- | --- |
| 1 | abc123 | someProductName | 2020-01-01 | 12 | someOrderName |
|  |  |  |  |  | someOtherOrderName |
| 2 | abc123X | someProductNameX | 2020-01-02 | 15 | someOrderName |
|  |  |  |  |  | someOtherOrderName |
|  |  |  |  |  | someData |
Note: Table contains 2 rows.
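A quick sanity check confirms the extra lines are only a display artifact:
select count(*) as row_count
from `project-id.dataset.working_table`;
-- row_count: 2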
I have an employee table in Postgres with a JSONB column "mobile". It stores a JSON array value:
| e_id (integer) | name (char) | mobile (jsonb) |
| --- | --- | --- |
| 1 | John | [{"mobile": "1234567891", "status": "verified"}, {"mobile": "1265439872", "status": "verified"}] |
| 2 | Ben | [{"mobile": "6453637238", "status": "verified"}, {"mobile": "4437494900", "status": "verified"}] |
I have a search API which queries this table to search for an employee using a mobile number.
How can I query mobile numbers directly?
How should I create an index on the jsonb column to make the query faster?
You can query like this:
SELECT e_id, name
FROM employees
WHERE mobile @> '[{"mobile": "1234567891"}]';
The following index would help:
CREATE INDEX ON employees USING gin (mobile);
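If the column is only ever searched with the @> containment operator, a jsonb_path_ops GIN index is a smaller alternative worth considering:
CREATE INDEX ON employees USING gin (mobile jsonb_path_ops);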
I have an array of jsonb elements (jsonb[]), with id and text. To remove an element I could use:
UPDATE "Users" SET chats = array_remove(chats, '{"id": 2, "text": "my message"')
But I want to delete the message just by the id, cause getting the message will cost me another query.
Assuming missing information:
- Your table has a PK called user_id.
- You want to remove all elements with id = 2 across the whole table.
- You don't want to touch other rows.
- id is unique within each array of chats.
UPDATE "Users" u
SET chats = array_remove(u.chats, d.chat)
FROM (
SELECT user_id, chat
FROM "Users", unnest(chats) chat
WHERE chat->>'id' = '2'
) d
WHERE d.user_id = u.user_id;
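If you only need to fix a single row, rebuilding the array in place also works. A sketch under the same assumptions (user_id PK, every element carries an id), for the hypothetical user_id = 1:
UPDATE "Users"
SET chats = (
    SELECT coalesce(array_agg(chat), '{}'::jsonb[])
    FROM unnest(chats) AS chat
    WHERE chat->>'id' <> '2'  -- keep every element whose id is not 2
)
WHERE user_id = 1;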
I have two Hive tables as shown below, along with their columns
Tbl_Customer: Id, Name
Tbl_Cntct: Id, Phone
One Id can have many phone numbers, so I have a table:
Tbl_All: Id, Name, Phn_List ARRAY
My question is on how to load data from Tbl_Customer and Tbl_Cntct into Tbl_All.
I can do it in PIG, but want to do same in Hive.
Thanks
insert overwrite table Tbl_All
select cus.id, cus.name, collect_set(ctc.phone)
from Tbl_Customer cus
join Tbl_Cntct ctc on cus.id = ctc.id
group by cus.id, cus.name
The collect_set UDAF collects the column values into an array with no duplicates. If you want to retain all values, including duplicated ones, use the collect_list function.
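A quick contrast of the two, assuming hypothetical rows in Tbl_Cntct with id 1 and phones 111, 111, 222:
select collect_set(phone) from Tbl_Cntct where id = 1;   -- ["111","222"]
select collect_list(phone) from Tbl_Cntct where id = 1;  -- ["111","111","222"]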
I am a bit new to updating multiple records and I wanted to know the best way to go about a solution for this. I am writing a stored proc where basically I have two tables:
one that matches a server id to a user id,
and another table with record information for each user id, with multiple columns of values.
Basically, here is how it's going to work:
Get all the matching user ids for the specific server id in the tb_UserServerMap table,
then for each userId in the tb_setting table, update the columns with the new values.
Basic structure of your stored procedure would be:
CREATE PROCEDURE Blah
@Server_ID int /* or whatever data type is appropriate */
as
UPDATE ts
SET
ColumnA = 10 /* New value for column A - maybe passed as a parameter? */
/* More columns here */
FROM
tb_setting ts
inner join
tb_UserServerMap usm
on
ts.user_id = usm.user_id
WHERE
usm.server_id = @Server_ID
I can't fill in more of it without knowing the names of columns to be updated, how those values are obtained, data types, etc.
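Once those are filled in, calling it is straightforward (hypothetical server id):
EXEC Blah @Server_ID = 5;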
You don't need a foreach:
UPDATE tblName SET firstCol = val1, secondCol = val2 WHERE id IN (id1, id2, id3)