Need guidance in removing + in the result using REGEXP - snowflake-cloud-data-platform

SELECT ID,
lower(LISTAGG(DISTINCT COL_A, ',') WITHIN GROUP(ORDER BY COL_A)) AS COL_1
FROM table_1
WHERE date = '2022-02-02'
GROUP BY ID
ID     COL_1
12345  abc,+bda,+beach,relax
23456  unknown_user,+unknown_member,+others_to_denote
When I run the above query, I get results like those shown above. I want the + symbol removed from the results. Is it possible to use REGEXP in this case?

You should do your cleaning before the LISTAGG, because currently abc can appear in your output many times:
SELECT
column1 AS ID
,LISTAGG(DISTINCT lower(column2), ',') WITHIN GROUP(ORDER BY lower(column2)) AS COL_1_a
,lower(LISTAGG(DISTINCT column2, ',') WITHIN GROUP(ORDER BY column2)) AS COL_1_b
,replace(COL_1_b,'+')
FROM VALUES
(12345, 'abc'),
(12345, 'ABC'),
(12345, '+ABC'),
(12345, '+ABc'),
(12345, '+AbC'),
(12345, '+bda')
GROUP BY ID;
gives:
ID     COL_1_A        COL_1_B                      REPLACE(COL_1_B,'+')
12345  +abc,+bda,abc  +abc,+abc,+abc,+bda,abc,abc  abc,abc,abc,bda,abc,abc
Thus, with a layer of "cleaning" first:
SELECT
ID
,LISTAGG(DISTINCT col2_cleaned, ',') WITHIN GROUP(ORDER BY col2_cleaned) AS COL_1
FROM (
SELECT column1 as id,
replace(lower(column2),'+') AS col2_cleaned
FROM (
VALUES
(12345, 'abc'),
(12345, 'ABC'),
(12345, '+ABC'),
(12345, '+ABc'),
(12345, '+AbC'),
(12345, '+bda')
)
)
GROUP BY ID;
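To see the difference between cleaning after and cleaning before aggregation without a Snowflake account, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in. Assumptions: SQLite has no LISTAGG, so the distinct values are joined in Python; SQLite's replace() needs an explicit empty-string third argument; the table name t is hypothetical.

```python
import sqlite3

# Same sample rows as the VALUES list above (table name "t" is hypothetical).
rows = [(12345, 'abc'), (12345, 'ABC'), (12345, '+ABC'),
        (12345, '+ABc'), (12345, '+AbC'), (12345, '+bda')]

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE t (id INTEGER, col2 TEXT)')
con.executemany('INSERT INTO t VALUES (?, ?)', rows)

# Aggregate first, clean afterwards: DISTINCT sees '+ABC' and 'abc' as different
# values, so duplicates reappear once lower() and the '+' removal are applied.
raw = [v for (v,) in con.execute('SELECT DISTINCT col2 FROM t ORDER BY col2')]
clean_after = ','.join(v.lower() for v in raw).replace('+', '')

# Clean first, then DISTINCT: every variant collapses to a single value.
cleaned = [v for (v,) in con.execute(
    "SELECT DISTINCT replace(lower(col2), '+', '') AS c FROM t ORDER BY c")]
clean_before = ','.join(cleaned)

print(clean_after)   # abc,abc,abc,bda,abc,abc
print(clean_before)  # abc,bda
```

The first string matches the REPLACE(COL_1_B,'+') column above, and the second matches the cleaned result.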
More sensible results are produced:
ID     COL_1
12345  abc,bda
Other cleaning points:
Do you really want all + characters removed? Sometimes only the ones at the beginning or the end should go, and REPLACE can remove too much:
SELECT column1 as orig
,replace(lower(column1),'+') AS all_cleaned
,ltrim(lower(column1),'+') AS lt_cleaned
,rtrim(lower(column1),'+') AS rt_cleaned
,trim(lower(column1),'+') AS t_cleaned
FROM VALUES
('abc'),
('ABC'),
('+A+BC'),
('+AB+c+'),
('+AbC+')
;
ORIG    ALL_CLEANED  LT_CLEANED  RT_CLEANED  T_CLEANED
abc     abc          abc         abc         abc
ABC     abc          abc         abc         abc
+A+BC   abc          a+bc        +a+bc       a+bc
+AB+c+  abc          ab+c+       +ab+c       ab+c
+AbC+   abc          abc+        +abc        abc
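The same comparison can be reproduced with Python's sqlite3 module, assuming SQLite's two-argument ltrim/rtrim/trim behave like Snowflake's here. In both, the second argument is a set of characters to strip from the ends. SQLite's replace() needs the empty-string third argument spelled out.

```python
import sqlite3

con = sqlite3.connect(':memory:')
query = """
SELECT column1                           AS orig,
       replace(lower(column1), '+', '')  AS all_cleaned,
       ltrim(lower(column1), '+')        AS lt_cleaned,
       rtrim(lower(column1), '+')        AS rt_cleaned,
       trim(lower(column1), '+')         AS t_cleaned
FROM (VALUES ('abc'), ('ABC'), ('+A+BC'), ('+AB+c+'), ('+AbC+'))
"""
# Key each result row by its original value for easy comparison.
results = {row[0]: row[1:] for row in con.execute(query)}
for orig, cleaned in results.items():
    print(orig, cleaned)
```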
But you can also use REGEXP_REPLACE if you want to remove up to two + characters from the beginning, but not more; the ^ anchors the pattern to the start of the string:
,regexp_replace(lower(column1), '^\\+{1,2}','',1,1,'e')
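The pattern needs a ^ anchor to act only at the start of the string. Without it, occurrence 1 matches the first run of + wherever it appears. Python's re module is used below only to illustrate that difference. Snowflake uses POSIX extended regular expressions, but ^ and {1,2} mean the same thing there.

```python
import re

ANCHORED = re.compile(r'^\+{1,2}')    # like '^\\+{1,2}' in a Snowflake string literal
UNANCHORED = re.compile(r'\+{1,2}')   # like '\\+{1,2}'

def strip_leading(s):
    # Removes one or two '+' only when they start the string.
    return ANCHORED.sub('', s.lower(), count=1)

def strip_first_run(s):
    # Removes the first run of one or two '+' wherever it occurs.
    return UNANCHORED.sub('', s.lower(), count=1)

for s in ['+A+BC', '+++abc', 'ab+c+']:
    print(s, strip_leading(s), strip_first_run(s))
```

For 'ab+c+' the anchored version leaves the value alone, while the unanchored one removes the inner +.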

If you just want to remove +, you can use REPLACE:
SELECT ID,COL_1, replace(col_1,'+') FROM VALUES
('12345', 'abc,+bda,+beach,relax')
,('23456','unknown_user,+unknown_member,+others_to_denote') as tab(ID,COL_1)

Related

TSQL, change value on a comma delimited column

I have a column called empl_type_multi which is just a comma delimited column, each value is a link to another table called custom captions.
For instance, I might have the following as a value in empl_type_multi:
123, RHN, 458
Then in the custom_captions table these would be individual values:
123 = Dog
RHN = Cat
458 = Rabbit
All of these fields are NTEXT.
What I am trying to do is convert the empl_type_multi column and change it to the respective names in the custom_captions table, so in the example above:
123, RHN, 458
Would become
Dog, Cat, Rabbit
Any help on this would be much appreciated.
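The split, look up, and rejoin steps described above can be sketched in plain Python before writing them in T-SQL. This is only an illustration. The captions dictionary stands in for the custom_captions table, and translate is a hypothetical helper.

```python
# Hypothetical lookup mirroring the custom_captions rows in the question.
captions = {'123': 'Dog', 'RHN': 'Cat', '458': 'Rabbit'}

def translate(multi, lookup, sep=', '):
    """Split a comma-delimited value, map each trimmed code, and rejoin.

    Codes missing from the lookup are kept unchanged.
    """
    codes = [c.strip() for c in multi.split(',') if c.strip()]
    return sep.join(lookup.get(c, c) for c in codes)

print(translate('123, RHN, 458', captions))  # Dog, Cat, Rabbit
```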
----- EDIT ------------------------------------------------------------------
OK, so I've managed to convert the values to the corresponding captions and put it all into a temporary table; the following is the output from a CTE query on the table:
ID1    ID2    fName   lName   Caption_name      Row_Number
10007  22841  fname1  lname1  DENTAL ASSISTANT  1
10007  22841  fname1  lname1                    2
10007  22841  fname1  lname1                    3
10008  23079  fname2  lname2  OPS WARD          1
10008  23079  fname2  lname2  DENTAL            2
10008  23079  fname2  lname2                    3
How can I update this so that everything under Caption_name is appended to the Caption_name of Row_Number 1, separated by a comma?
If I can do that, all I need to do is delete all records where Row_Number != 1.
------ EDIT --------------------------------------------------
The solution to the first edit was:
WITH CTE AS
(
SELECT
p.ID1
, p.ID2
, p.fname
, p.lname
, p.caption_name--
, ROW_NUMBER() OVER (PARTITION BY p.id1 ORDER BY caption_name DESC) AS RN
FROM tmp_cs p
)
UPDATE tblPerson SET empType = empType + ', ' + c.Data
FROM CTE c WHERE [DB1].dbo.tblPerson.personID = c.personID AND RN = 2
And then I just incremented RN = 2 until I got 0 rows affected.
This was after i ran:
DELETE FROM CTE WHERE RN != 1 AND Caption_name = ''

select ID1, ID2, fname, lname, left(captions, len(captions) - 1) as captions
from (
select distinct ID1, ID2, cast(fname as nvarchar) as fname, cast(lname as nvarchar) as lname, (
select cast(t1.caption_name as nvarchar) + ','
from #temp as t1
where t1.ID1 = t2.ID1
and t1.ID2 = t2.ID2
and cast(caption_name as nvarchar) != ''
order by t1.[row_number]
for xml path ('')) captions
from #temp as t2
) yay_concatenated_rows
This will give you what you want. You'll see casts from ntext to nvarchar. These are necessary for comparison because many logical operations can't be performed on ntext. The value can be implicitly cast back the other way, so no worries there. Note that when casting I did not specify a length; CAST without a length defaults to 30 characters, so adjust to nvarchar(length) as needed to avoid truncation. I also assumed that ID1 and ID2 together form a composite key (it appears this is so). Adjust the join as needed for the relationship.
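The FOR XML PATH query does three things: it groups by the (ID1, ID2) key, orders captions by row_number, and skips empty values before joining them with commas. Those steps can be sketched in plain Python to check the expected result. The rows below copy the asker's temp table, and '' stands in for the blank captions.

```python
from itertools import groupby

# (ID1, ID2, fName, lName, Caption_name, Row_Number), as in the asker's temp table.
rows = [
    (10007, 22841, 'fname1', 'lname1', 'DENTAL ASSISTANT', 1),
    (10007, 22841, 'fname1', 'lname1', '', 2),
    (10007, 22841, 'fname1', 'lname1', '', 3),
    (10008, 23079, 'fname2', 'lname2', 'OPS WARD', 1),
    (10008, 23079, 'fname2', 'lname2', 'DENTAL', 2),
    (10008, 23079, 'fname2', 'lname2', '', 3),
]

def person(r):
    return r[:4]  # ID1, ID2 (assumed composite key) plus the names

result = []
# Sort by person, then Row_Number, so each group is contiguous and ordered.
for key, grp in groupby(sorted(rows, key=lambda r: (person(r), r[5])), key=person):
    captions = [r[4] for r in grp if r[4]]      # skip NULL/empty captions
    result.append((*key, ','.join(captions)))   # join adds no trailing comma

for r in result:
    print(r)
```

Using join avoids the trailing comma that the left(captions, len(captions) - 1) step removes in the SQL version.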

You have shared only part of your problem, not the exact problem.
Try this:
DECLARE @T TABLE(ID1 VARCHAR(50),ID2 VARCHAR(50),fName VARCHAR(50),LName VARCHAR(50),Caption_name VARCHAR(50),Row_Number INT)
INSERT INTO @T VALUES
(10007,22841,'fname1','lname1','DENTAL ASSISTANT', 1)
,(10007,22841,'fname1','lname1', NULL, 2)
,(10007,22841,'fname1','lname1', NULL, 3)
,(10008,23079,'fname2','lname2','OPS WARD', 1)
,(10008,23079,'fname2','lname2','DENTAL', 2)
,(10008,23079,'fname2','lname2', NULL, 3)
SELECT *
,STUFF((SELECT ','+Caption_name
FROM @T T1 WHERE T.ID1=T1.ID1 ORDER BY T1.Row_Number FOR XML PATH('')
),1,1,'') AS Captions
FROM @T T

You can construct the Caption_name string with a WHILE loop, using a CASE expression to skip NULL or empty values:
declare @i int = 2,@Caption_name varchar(100)= (select Caption_name from
#temp where Row_Number= 1)
-- assumes #temp holds the rows of a single person and Caption_name is (n)varchar, not ntext
while @i <= (select count(*) from #temp)
begin
select @Caption_name = @Caption_name
+ case when isnull(Caption_name ,'') = '' then '' else ',' + Caption_name end
from #temp where Row_Number = @i
set @i = @i+1
end
update #temp set Caption_name = @Caption_name where Row_Number = 1

How to create a new "shared" group in a single SELECT query in MS SQL

I'm trying to group a SELECT like you'd normally do - AND at the same time make a new "shared/aggregate group" adding that to the original result-set without a secondary SELECT and UNION.
The secondary SELECT and UNION is out of the question since the real use of this is with some very big tables, with a lot of joins, so it would be way too slow. So the UNION way is definitely out of the question.
I've tried my best to illustrate this with the following simplified example:
BEGIN TRAN
CREATE TABLE #MyTable
(
id INT,
name VARCHAR(255)
)
INSERT INTO #MyTable VALUES (1,'cola');
INSERT INTO #MyTable VALUES (2,'cola');
INSERT INTO #MyTable VALUES (3,'cola');
INSERT INTO #MyTable VALUES (4,'fanta');
INSERT INTO #MyTable VALUES (5,'fanta');
INSERT INTO #MyTable VALUES (6,'fanta');
INSERT INTO #MyTable VALUES (7,'water');
INSERT INTO #MyTable VALUES (8,'water');
INSERT INTO #MyTable VALUES (9,'water');
INSERT INTO #MyTable VALUES (10,'cola');
INSERT INTO #MyTable VALUES (11,'cola');
SELECT
CASE
WHEN name = 'cola' OR name = 'fanta'
THEN 'soda'
ELSE
name
END as name,
COUNT(distinct id) as count
FROM #MyTable
GROUP BY name
ROLLBACK TRAN
Actual output:
soda 5
soda 3
water 3
Desired output:
cola 5
fanta 3
soda 8 <- this is the "shared/aggregate group"
water 3

As Panagiotis Kanavos correctly pointed out in the comment above, this can be done using ROLLUP:
BEGIN TRAN
CREATE TABLE #BeverageType
(
name VARCHAR(255)
)
INSERT INTO #BeverageType VALUES ('Soda');
INSERT INTO #BeverageType VALUES ('Other');
CREATE TABLE #UserBeverage
(
id INT,
name VARCHAR(255)
)
INSERT INTO #UserBeverage VALUES (1,'cola');
INSERT INTO #UserBeverage VALUES (2,'cola');
INSERT INTO #UserBeverage VALUES (3,'cola');
INSERT INTO #UserBeverage VALUES (1,'fanta'); -- <- NOTE: user 1 drinks both cola and fanta, so, as intended, the user is counted only once in the ROLLUP 'Soda' group (7)
INSERT INTO #UserBeverage VALUES (5,'fanta');
INSERT INTO #UserBeverage VALUES (6,'fanta');
INSERT INTO #UserBeverage VALUES (7,'water');
INSERT INTO #UserBeverage VALUES (8,'water');
INSERT INTO #UserBeverage VALUES (9,'water');
INSERT INTO #UserBeverage VALUES (10,'cola');
INSERT INTO #UserBeverage VALUES (11,'cola');
SELECT ub.name, bt.name AS groupName, COUNT(distinct id) as uniqueUserCount
FROM #UserBeverage as ub
JOIN #BeverageType as bt
ON CASE
WHEN (ub.name = 'water')
THEN 'Other'
ELSE
'Soda'
END = bt.name
GROUP BY ROLLUP(bt.name, ub.name)
ROLLBACK TRAN
Outputs:
cola Soda 5
fanta Soda 3
water Other 3
NULL Other 3
NULL Soda 7
NULL NULL 10
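Why is the Soda subtotal 7 rather than 5 + 3 = 8? Each ROLLUP level recomputes COUNT(DISTINCT id) over its own rows, so user 1 is counted once. Below is a plain-Python sketch of the three grouping levels (detail, per group, grand total), using sets in place of COUNT(DISTINCT ...). None stands in for the NULLs that ROLLUP emits.

```python
# (user id, beverage) pairs from the #UserBeverage inserts above.
pairs = [(1, 'cola'), (2, 'cola'), (3, 'cola'), (1, 'fanta'), (5, 'fanta'),
         (6, 'fanta'), (7, 'water'), (8, 'water'), (9, 'water'),
         (10, 'cola'), (11, 'cola')]

def group_of(name):
    # Same rule as the CASE expression in the JOIN condition.
    return 'Other' if name == 'water' else 'Soda'

detail, subtotal, total = {}, {}, set()
for uid, name in pairs:
    g = group_of(name)
    detail.setdefault((name, g), set()).add(uid)   # GROUP BY bt.name, ub.name
    subtotal.setdefault(g, set()).add(uid)         # ROLLUP level: bt.name only
    total.add(uid)                                 # grand total

counts = {k: len(v) for k, v in detail.items()}
counts.update({(None, g): len(v) for g, v in subtotal.items()})
counts[(None, None)] = len(total)
print(counts)
```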

You should repeat the CASE expression everywhere it is used, including the GROUP BY:
SELECT
CASE WHEN name = 'cola' OR name = 'fanta'
THEN 'soda' ELSE name END as name,
COUNT((CASE WHEN name = 'cola' OR name = 'fanta'
THEN 'soda' ELSE name END)) as count
FROM #MyTable
GROUP BY CASE WHEN name = 'cola' OR name = 'fanta'
THEN 'soda' ELSE name END
+-------+-------+
| name | count |
+-------+-------+
| soda | 8 |
+-------+-------+
| water | 3 |
+-------+-------+
Alternatively, I'd suggest a subquery, so the CASE expression is written only once:
SELECT name, count(*) AS count
FROM (SELECT CASE WHEN name = 'cola' OR name = 'fanta'
THEN 'soda' ELSE name END as name
FROM #MyTable) x
GROUP BY name;

If you need the aggregate as well as the individual products, then an alternative may be to use a UNION and select the aggregates as a second query.
SELECT name, count(distinct id) as count
FROM #MyTable
GROUP BY name
UNION
SELECT 'SODA', COUNT(distinct id) as count
FROM #MyTable
WHERE name = 'cola' or name ='fanta'
You might also use Søren Høyer Kristensen's summary table to get the aggregate names if you need more groupings.
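As a quick check that the UNION answer produces the desired output, here is the same query run through Python's sqlite3 module. Assumptions: SQLite has no #temp tables, so the table is named MyTable; UNION ALL is used because the two branches can never return identical rows; the alias cnt replaces count.

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE MyTable (id INTEGER, name TEXT)')  # #MyTable in T-SQL
con.executemany('INSERT INTO MyTable VALUES (?, ?)', [
    (1, 'cola'), (2, 'cola'), (3, 'cola'), (4, 'fanta'), (5, 'fanta'),
    (6, 'fanta'), (7, 'water'), (8, 'water'), (9, 'water'),
    (10, 'cola'), (11, 'cola')])

rows = con.execute("""
    SELECT name, COUNT(DISTINCT id) AS cnt FROM MyTable GROUP BY name
    UNION ALL
    SELECT 'soda', COUNT(DISTINCT id) FROM MyTable WHERE name IN ('cola', 'fanta')
    ORDER BY name
""").fetchall()
for row in rows:
    print(row)
```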

SQL Server array_agg (master - detail)

How can I convert this PostgreSQL code to SQL Server?
select
countries.title,
(select array_to_json(array_agg(row_to_json(t)))
from postcodes t
where t.country_id = countries.id) as codes
from countries
My underlying problem is that I need to select the complete master table and, with each row, all of its details.
Countries:
id title
1 SLO
2 AUT
PostCodes:
id country_id code title
1 1 1000 Lj
2 1 2000 Mb
3 2 22180 Vi
4 2 22484 De
Desired result:
1 SLO 1000;Lj|2000;Mb
2 AUT 22180;Vi|22484;De
Not:
1 SLO 1000 Lj
1 SLO 2000 Mb
2 AUT 22180 Vi
2 AUT 22484 De
The best solution would be using FOR JSON, but unfortunately I need support for 2008 or at least 2012.
With a LEFT JOIN, all master data is duplicated once per detail row, and I do not want that. Even worse would be selecting all countries and then running a SELECT on postcodes for every country in a loop.

select countries.title,
STUFF((select '|' + t.code + ';' + t.title
from postcodes t
where t.country_id = countries.id
FOR XML PATH('')
),1,1,'') as codes
from countries
-- CAST t.code to VARCHAR if it's numeric
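To check the expected output of the STUFF(... FOR XML PATH('')) query, here is a sketch in Python with sqlite3 standing in for SQL Server. Assumptions: SQLite has no FOR XML PATH, so the detail rows are fetched in order and collapsed per country with itertools.groupby. The CAST mirrors the note about numeric codes. Countries without post codes would need a LEFT JOIN and a NULL check.

```python
import sqlite3
from itertools import groupby

con = sqlite3.connect(':memory:')
con.executescript("""
CREATE TABLE countries (id INTEGER, title TEXT);
CREATE TABLE postcodes (id INTEGER, country_id INTEGER, code INTEGER, title TEXT);
INSERT INTO countries VALUES (1, 'SLO'), (2, 'AUT');
INSERT INTO postcodes VALUES (1, 1, 1000, 'Lj'), (2, 1, 2000, 'Mb'),
                             (3, 2, 22180, 'Vi'), (4, 2, 22484, 'De');
""")

# One row per detail; the numeric code is cast before concatenation.
joined = con.execute("""
    SELECT c.id, c.title, CAST(p.code AS TEXT) || ';' || p.title
    FROM countries c
    JOIN postcodes p ON p.country_id = c.id
    ORDER BY c.id, p.id
""").fetchall()

# Collapse to one row per country with '|' between the details.
result = [(cid, title, '|'.join(part for _, _, part in grp))
          for (cid, title), grp in groupby(joined, key=lambda r: (r[0], r[1]))]
for row in result:
    print(row)
```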

Try this:
Select Main.COUNTRY_ID,c.title,Left(Main.POSTCODES,Len(Main.POSTCODES)-1) As "POSTCODES"
From
(
Select distinct ST2.COUNTRY_ID,
(
Select ST1.CODE+';'+ST1.TITLE + '|' AS [text()]
From dbo.POSTCODES ST1
Where ST1.COUNTRY_ID = ST2.COUNTRY_ID
ORDER BY ST1.COUNTRY_ID
For XML PATH ('')
) [POSTCODES]
From dbo.POSTCODES ST2
) [Main]
inner join countries c on c.id=main.country_id

Using XML PATH for concatenation can increase the complexity of your code. It's better to implement a CLR aggregation function. Then, you can do the following:
SELECT C.[id]
,C.[title]
,REPLACE([dbo].[Concatenate] (P.[code] + ';' + P.[title]), ',', '|')
FROM @Countries C
INNER JOIN @PostCodes P
ON C.[id] = P.[country_id]
GROUP BY C.[id]
,C.[title];
You can create your own version of the concatenate aggregate - you can specify the delimiter, the order, etc. I can show you examples if you want.
DECLARE @Countries TABLE
(
[id] TINYINT
,[title] VARCHAR(12)
);
INSERT INTO @Countries ([id], [title])
VALUES (1, 'SLO')
,(2, 'AUT');
DECLARE @PostCodes TABLE
(
[id] TINYINT
,[country_id] TINYINT
,[code] VARCHAR(12)
,[title] VARCHAR(12)
);
INSERT INTO @PostCodes ([id], [country_id], [code], [title])
VALUES (1, 1, 1000, 'Lj')
,(2, 1, 2000, 'Mb')
,(3, 2, 22180, 'Vi')
,(4, 2, 22484, 'De');
SELECT C.[id]
,C.[title]
,REPLACE([dbo].[Concatenate] (P.[code] + ';' + P.[title]), ',', '|')
FROM @Countries C
INNER JOIN @PostCodes P
ON C.[id] = P.[country_id]
GROUP BY C.[id]
,C.[title];

Find columns that contain all matches from a comma delimited list using TSQL

I have a table "AddressSearch". In one column I have stored a comma separated list of strings.
eg: Table: AddressSearch
col1
-----------------
UK, east, london
UK, Cambridge, Museum
Maryland, Johns University
I also have another table named "Main" which has a column "full_address" that stores the full addresses in the format "xxx, east london, UK, E15 xxx".
I need a way to find all occurrences in the "Main" table where full_address contains all the strings that are comma separated in each row in the AddressSearch table.
eg: For the first row in AddressSearch, it should match all the rows in Main and filter out the rows that contain "UK" AND "east" AND "London".
I have already tried to split the strings into a table variable and do an inner join between Main and AddressSearch with PATINDEX. But this will only give me rows from Main that has either "UK" OR "east" OR "London".
Any suggestions?

Perhaps this may help
Declare @Main table (Full_Address varchar(150))
Insert into @Main values
('xxx, east london, UK, E15 xxx'),
('The Museum, Cambridge , UK, E25 xxx'),
('Mary Land, University , UK, E25 xxx')
Declare @AddressSearch table (col varchar(150))
Insert into @AddressSearch values
('UK, east, london'),
('UK, Cambridge, Museum'),
('Maryland, Johns University')
;with cteSearch as (Select *,RowNr=Row_Number() over (Order by Col) from @AddressSearch
),cteParsed as (Select RowNr,B.Key_Value,Items=count(*) over (Partition By RowNr) From cteSearch A Cross Apply (Select * From [dbo].[udf-Str-Parse](A.col,',')) B
),cteFinal as (Select A.Full_Address,Col,Items,Hits = count(*) over (Partition By Full_Address,B.RowNr) From @Main A Join cteParsed B on (charindex(B.Key_Value,A.Full_Address)>0) Join cteSearch C on (B.RowNr=C.RowNr)
)
Select Distinct * from cteFinal Where Hits=Items
Returns
Full_Address Col Items Hits
The Museum, Cambridge , UK, E25 xxx UK, Cambridge, Museum 3 3
xxx, east london, UK, E15 xxx UK, east, london 3 3
I forgot the UDF; here it is if needed:
CREATE FUNCTION [dbo].[udf-Str-Parse] (@String varchar(max),@Delimeter varchar(10))
--Usage: Select * from [dbo].[udf-Str-Parse]('Dog,Cat,House,Car',',')
-- Select * from [dbo].[udf-Str-Parse]('John Cappelletti was here',' ')
Returns @ReturnTable Table (Key_PS int IDENTITY(1,1), Key_Value varchar(max))
As
Begin
Declare @XML xml;Set @XML = Cast('<x>' + Replace(@String,@Delimeter,'</x><x>')+'</x>' as XML)
Insert Into @ReturnTable Select ltrim(rtrim(String.value('.', 'varchar(max)'))) FROM @XML.nodes('x') as T(String)
Return
End
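The core of the CTE answer is the Hits = Items test: a search row matches only when every one of its terms is found in the address. Below is a plain-Python sketch of that rule, assuming a case-insensitive collation, which casefold() approximates. matches_all is a hypothetical helper.

```python
# Sample data from the answer's table variables.
main = ['xxx, east london, UK, E15 xxx',
        'The Museum, Cambridge , UK, E25 xxx',
        'Mary Land, University , UK, E25 xxx']
searches = ['UK, east, london', 'UK, Cambridge, Museum', 'Maryland, Johns University']

def matches_all(address, search):
    """True when every comma-separated term of `search` occurs in `address`.

    Like CHARINDEX, this is a plain substring test, so 'UK' would also
    match inside a word such as 'Dukes'.
    """
    terms = [t.strip().casefold() for t in search.split(',') if t.strip()]
    addr = address.casefold()
    return all(t in addr for t in terms)   # Hits = Items

pairs = [(a, s) for s in searches for a in main if matches_all(a, s)]
for a, s in pairs:
    print(a, '|', s)
```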

When you REPLACE something in a string and no match is found, the entire string is returned unchanged:
select replace('abc','d','')--abc
Based on that, and using a split-string function from here, I wrote the logic below:
create table #t
(
addr varchar(max)
)
insert into #t
select 'UK, east, london'
create table #main
(
addr1 varchar(max)
)
insert into #main
select 'xxx, east london, UK, E15 xxx'
;with cte
as
(select b.* from #t m
cross apply
(
select * from [dbo].[SplitStrings_Numbers](m.addr,',')) b
)
select case when addr1<>rplc then 'Exists' else 'Not Exists' end as 'status',item
from #main m
cross apply
(select replace(m.addr1,c.item,'') as rplc,c.item from cte c)b
Output:
status item
Exists UK
Exists east
Exists london

SQL Server Script Quick Replace all found strings with incrementing integer

I have a large INSERT SQL script that I want to modify with Quick Replace, replacing each found string with an integer, where each integer is the previous integer + 1.
Before:
INSERT Compartment (CompartmentID) VALUES ('A')
INSERT Compartment (CompartmentID) VALUES ('B')
After:
INSERT Compartment (CompartmentID) VALUES (1)
INSERT Compartment (CompartmentID) VALUES (2)
I know how to find the specific strings, but I can't find any syntax or way to replace them with incrementing integers.

You can replace all your char CompartmentID values with ordered numbers like this:
declare @Compartment table(CompartmentID varchar(10), name varchar(10), intID int)
INSERT INTO @Compartment(CompartmentID, name) values
('a', 'a')
, ('b', 'b')
, ('c', 'c')
, ('d', 'd')
, ('e', 'e')
UPDATE c SET CompartmentID = o.ID
FROM @Compartment c
INNER JOIN (
SELECT CompartmentID, ID = ROW_NUMBER() over(ORDER BY CompartmentID)
FROM @Compartment
) o ON c.CompartmentID = o.CompartmentID
SELECT * FROM @Compartment
Output:
CompartmentID name
1 a
2 b
3 c
4 d
5 e
It would be better to create a new column of type int, or to change the type of CompartmentID once the update is finished.
You should also use an identity column if you want the numbers to be incremented automatically.
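You may want to rewrite the script text itself rather than the table data. In that case, a small script outside the editor can number the literals in the order they appear. Below is a sketch in Python, assuming every value to replace is a single-quoted literal written as VALUES ('...'). number_values is a hypothetical helper. Unlike the ROW_NUMBER approach, it does not sort, and repeated letters get different numbers.

```python
import re
from itertools import count

script = """INSERT Compartment (CompartmentID) VALUES ('A')
INSERT Compartment (CompartmentID) VALUES ('B')"""

def number_values(sql, start=1):
    counter = count(start)
    # Replace each VALUES ('...') literal with the next integer, in script order.
    return re.sub(r"VALUES \('[^']*'\)",
                  lambda m: f"VALUES ({next(counter)})", sql)

print(number_values(script))
```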

Not sure how you want to handle empty strings. You can select the rows where CompartmentID contains a character that isn't numeric and update the result set like this:
DECLARE @Compartment table(CompartmentID varchar(20))
INSERT @Compartment(CompartmentID) VALUES ('A'),('A'),('B'),('1'),('A1')
-- EDIT: Changed answer
;WITH CTE as
(
SELECT CompartmentID, DENSE_RANK() over (ORDER BY CompartmentID) rn
FROM @Compartment
--WHERE CompartmentID LIKE '%[^0-9]%' OR CompartmentID = ''
)
UPDATE CTE
SET CompartmentID = rn
FROM CTE
Result:
CompartmentID
2
2
4
1
3
Note: now every CompartmentID is changed (including the already-numeric ones), and identical old values get identical numeric values.
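DENSE_RANK gives equal values the same number and leaves no gaps, which is why both 'A' rows become 2. Below is a quick check with Python's sqlite3 module, assuming an SQLite build of 3.25 or later, the version that added window functions. SQLite's binary ordering puts '1' < 'A' < 'A1' < 'B', the same order the result above reflects.

```python
import sqlite3

values = ['A', 'A', 'B', '1', 'A1']  # same rows as the answer's table variable

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE Compartment (CompartmentID TEXT)')
con.executemany('INSERT INTO Compartment VALUES (?)', [(v,) for v in values])

# DENSE_RANK gives equal values the same rank and leaves no gaps.
mapping = dict(con.execute("""
    SELECT DISTINCT CompartmentID, DENSE_RANK() OVER (ORDER BY CompartmentID)
    FROM Compartment
""").fetchall())

new_ids = [mapping[v] for v in values]
print(new_ids)  # [2, 2, 4, 1, 3]
```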
