Comma separated values count - sql-server

Table Pattern has a single column with the following values:

NewsletteridPattern
-------------------
%50%
%51%

Table B has the following values:

SubscriberId  NewsletterIdCsv
------------  ---------------
47421584      51
45551047      50,51
925606902     50
47775985      51
I have the following query, which basically counts the comma-separated values by matching them against the patterns:
SELECT *
FROM TABLEB t WITH (nolock)
JOIN Patterns p ON (t.NewsletterIdCsv LIKE p.NewsletteridPattern)
The problem is that the count is incorrect. For example, my pattern table has %50% and %51%, so row 2 from Table B should be counted twice, but with my query it is only counted once. How do I fix that?
EDIT :
I forgot to add DISTINCT in my original query which was causing the issue:
SELECT Count(Distinct Subscriberid)
FROM TABLEB t WITH (nolock)
JOIN Patterns p ON (t.NewsletterIdCsv LIKE p.NewsletteridPattern)
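One caveat: a bare %50% pattern also matches ids such as 150 or 502. A delimiter-padding variant avoids that (a sketch only, assuming the patterns can be stored with comma delimiters, e.g. %,50,%):

SELECT COUNT(DISTINCT t.SubscriberId)
FROM TABLEB t
JOIN Patterns p
    ON ',' + t.NewsletterIdCsv + ',' LIKE p.NewsletteridPattern
-- here the stored pattern would be '%,50,%' instead of '%50%',
-- so ',150,' no longer matches while ',50,51,' still does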

I mocked up your data as such:
create table #pattern (pattern varchar(50))
insert into #pattern values ('%50%')
insert into #pattern values ('%51%')
create table #subscriber (id varchar(50), newsletter varchar(50))
insert into #subscriber values ('47421584', '51')
insert into #subscriber values ('45551047', '50,51')
insert into #subscriber values ('925606902', '50')
insert into #subscriber values ('47775985', '51')
SELECT pattern, COUNT(*) AS Counter
FROM #subscriber t WITH (nolock)
JOIN #pattern p ON (t.newsletter LIKE p.pattern)
GROUP BY pattern
And my select statement returns:
pattern  Counter
-------  -------
%50%     2
%51%     3
What is your final goal? Are you just concerned about counting the number of rows by pattern or are you trying to do a select of rows by pattern?

Related

How can I count pairs when the columns are equal/not equal in TSQL

I have a dataset in which I want to count the pairs that are equal and not equal, grouping by one column. A toy dataset would look like this:
DECLARE @t TABLE (
    SampleNumber varchar(max),
    SampleType varchar(max),
    A varchar(max),
    B varchar(max))

INSERT INTO @t VALUES
('B1','DD','PASS','FAIL'),
('B1','DS','PASS','FAIL'),
('B2','DD','PASS','PASS'),
('B2','DS','PASS','PASS'),
('B3','DD','NA','NA'),
('B3','DS','NA','PASS'),
('B4','DD','PASS','PASS'),
('B4','DS','PASS','FAIL')

SELECT * FROM @t
So for this dataset I would like the output to look something like this:
I will note that I have many SampleNumbers (100+) and about 10 columns (e.g. A through J) that I need to roll the data up from, so I was hoping for a flexible solution.
It would be horribly inefficient to split the SampleType into two temp tables (e.g. DD and DS) and join by SampleNumber.
How would I accomplish this?
SELECT IQ.A, SUM(IQ.X) AS DD_Agree, SUM(1 - IQ.X) AS DD_Disagree
FROM
    (SELECT A, CASE WHEN A = B THEN 1 ELSE 0 END X FROM @t) IQ
GROUP BY IQ.A
UNION SELECT B, 0, 0 FROM @t WHERE B NOT IN (SELECT t2.A FROM @t t2)
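If the intent is to compare the DD row against the DS row per SampleNumber (one reading of the question), a self-join plus a CROSS APPLY (VALUES ...) unpivot scales to the extra columns: adding C through J only means extending the VALUES list. This is a sketch run in the same batch as the @t declaration above; ColName, DDVal, and DSVal are illustrative names, not from the original post:

SELECT ca.ColName,
       SUM(CASE WHEN ca.DDVal = ca.DSVal THEN 1 ELSE 0 END) AS Agree,
       SUM(CASE WHEN ca.DDVal <> ca.DSVal THEN 1 ELSE 0 END) AS Disagree
FROM @t dd
JOIN @t ds ON ds.SampleNumber = dd.SampleNumber
CROSS APPLY (VALUES ('A', dd.A, ds.A),
                    ('B', dd.B, ds.B)) ca(ColName, DDVal, DSVal)  -- extend for C..J
WHERE dd.SampleType = 'DD'
  AND ds.SampleType = 'DS'
GROUP BY ca.ColName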

PATINDEX all values of a column

I'm making a query that will delete all rows from table1 whose id column matches a value in table2.id.
The table1.id column is nvarchar(max), holding XML like this:
<customer><name>Paulo</name><gender>Male</gender><id>12345</id></customer>
EDIT:
The id column is just a part of a huge XML so the ending tag may not match the starting tag.
I've tried using the nodes() method, but it only applies to xml columns, and changing the column datatype is not an option. So far this is my code using PATINDEX:
DELETE t1
FROM table1 t1
WHERE PATINDEX('%12345%',id) != 0
But what I need is to search for all the values from table2.id, which contains values like these:
12345
67890
10000
20000
30000
Any approach would be fine, e.g. sp_executesql and/or a while loop. Or is there a better approach than using PATINDEX? Thanks!
Select *
--Delete A
From Table1 A
Join Table2 B on CharIndex('id>'+SomeField+'<',ID)>0
I don't know the name of the field in Table2. I am also assuming it is a varchar. If not, cast(SomeField as varchar(25))
EDIT - This is what I tested. It should work
Declare @Table1 table (id varchar(max))
Insert Into @Table1 values
('<customer><name>Paulo</name><gender>Male</gender><id>12345</id></customer>'),
('<customer><name>Jane</name><gender>Female</gender><id>7895</id></customer>')

Declare @Table2 table (SomeField varchar(25))
Insert into @Table2 values
('12345'),
('67890'),
('10000'),
('20000'),
('30000')

Select *
--Delete A
From @Table1 A
Join @Table2 B on CharIndex('id>'+SomeField+'<',ID)>0
;with cteBase as (
Select *,XMLData=cast(id as xml) From Table1
)
Select *
From cteBase
Where XMLData.value('(customer/id)[1]','int') in (12345,67890,10000,20000,30000)
If you are satisfied with the results, change the final Select * to Delete
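Putting the two ideas together, the delete form driven by Table2 might look like this (a sketch only; SomeField is the assumed Table2 column name from above, and it assumes every id value casts cleanly to xml, which the question's edit warns may not always hold):

;with cteBase as (
    Select *, XMLData = cast(id as xml) From Table1
)
Delete From cteBase
Where XMLData.value('(customer/id)[1]','int') in
      (Select cast(SomeField as int) From Table2)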

T-SQL: Two Level Aggregation in Same Query

I have a query that joins a master and a detail table. Master table records are duplicated in the results, as expected. I aggregate on the detail table and it works fine. But I also need another aggregation on the master table in the same query, and since the master rows are duplicated, that aggregation's results are inflated.
I demonstrate this situation below:
If Object_Id('tempdb..#data') Is Not Null Drop Table #data
Create Table #data (Id int, GroupId int, Value int)
If Object_Id('tempdb..#groups') Is Not Null Drop Table #groups
Create Table #groups (Id int, Value int)
/* insert groups */
Insert #groups (Id, Value)
Values (1,100), (2,200), (3, 200)
/* insert data */
Insert #data (Id, GroupId, Value)
Values (1,1,10),
(2,1,20),
(3,2,50),
(4,2,60),
(5,2,70),
(6,3,90)
My select query is
Select Sum(data.Value) As Data_Value,
Sum(groups.Value) As Group_Value
From #data data
Inner Join #groups groups On groups.Id = data.GroupId
The result is:
Data_Value Group_Value
300 1000
The expected result is:
Data_Value Group_Value
300 500
Please note that, derived table or sub-query is not an option. Also Sum(Distinct groups.Value) is not suitable for my case.
If I am not wrong, you just want to sum the Value column of both tables and show the results in a single row. In that case you don't need to join them; just select each sum as a column, like:
SELECT (SELECT SUM(Value) FROM #data) AS Data_Value,
       (SELECT SUM(Value) FROM #groups) AS Group_Value
SELECT
    (SELECT Sum(d.Value)
     FROM #data d
     WHERE EXISTS (SELECT 1 FROM #groups WHERE Id = d.GroupId)) AS Data_Value,
    (SELECT Sum(g.Value)
     FROM #groups g
     WHERE EXISTS (SELECT 1 FROM #data WHERE GroupId = g.Id)) AS Group_Value
I'm not sure exactly what you are looking for, but it seems like you want the detail total alongside each group's value counted only once.
In that case I would suggest something like this.
select Sum(t.Data_Value) as Data_Value, Sum(t.Group_Value) as Group_Value
from
    (select Sum(data.Value) As Data_Value, groups.Value As Group_Value
     from #data data
     inner join #groups groups on groups.Id = data.GroupId
     group by groups.Id, groups.Value)
as t
The edit should do the trick for you.
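For completeness, a ROW_NUMBER variant can keep the single join and count each group's value exactly once, by tagging one detail row per group. This is a sketch only; note that it, too, relies on a derived table, which the question rules out:

Select Sum(d.Value) As Data_Value,
       Sum(Case When d.rn = 1 Then g.Value Else 0 End) As Group_Value
From (Select *, Row_Number() Over (Partition By GroupId Order By Id) As rn
      From #data) d
Inner Join #groups g On g.Id = d.GroupId
-- yields Data_Value = 300, Group_Value = 100 + 200 + 200 = 500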

Inserting random number of rows in SQL Server via join to integer list is inconsistent

I am creating a database with sample data. Each time I run the stored procedure to generate some new data for my sample database, I would like to clear out and repopulate table B ("Item") based on all the rows in table A ("Product").
If table A contained the rows with primary key values 1, 2, 3, 4, and 5, I would want table B to have a foreign key for table A and insert a random number of rows into table B for each table A row. (We are essentially stocking the shelves with a random number of "item" for any given "product.")
I am using code from this answer to generate a list of numbers. I join to the results of this function to create the rows to insert:
WITH cte AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS i
    FROM sys.columns c1 CROSS JOIN sys.columns c2 CROSS JOIN sys.columns c3
)
SELECT i
FROM cte
WHERE i BETWEEN @p_Min AND @p_Max
  AND i % @p_Increment = 0
Random numbers are generated in a view (to get around the limitations of functions) as follows:
-- Mock.NewGuid view
SELECT id = ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT))
And a function that returns the random numbers:
-- Mock.GetRandomInt(@MinValue, @MaxValue) function definition
DECLARE @random int;
SELECT @random = Id % (@MaxValue - @MinValue + 1) FROM Mock.NewGuid;
RETURN @random + @MinValue;
However, when you look at this code and execute it...
WITH Products AS
(
SELECT ProductId, ItemCount = Mock.GetRandomInt(1,5)
FROM Product.Product
)
SELECT A = Products.ProductId, B = i
FROM Products
JOIN (SELECT i FROM Mock.GetIntList(1,5,1)) Temp ON
i < Products.ItemCount
ORDER BY ProductId, i
... this returns some inconsistent results!
A,B
1,1
1,2
1,3
2,1
2,2
3,2 <-- where is 1?
3,3
4,1
5,3 <-- where is 1, 2?
6,1
I would expect that, for every product id, the JOIN results in 1-5 rows. However, it seems like values get skipped! This is even more apparent with larger data sets. I was originally trying to generate 20-50 rows in Item for each Product row, but this resulted in only 30-40 rows for each product.
The question: Any idea why this is happening? Each product should have a random number of rows (between 1 and 5) inserted for it and the B value should be sequential! Instead, some numbers are missing!
This issue also happens if I store numbers in a table I created and then join to that, or if I use a recursive CTE.
I am using SQL Server 2008R2, but I believe I see the same issue on my 2012 database as well. Compatibility levels are 2008 and 2012 respectively.
This is a fun problem; I've dealt with it in a roundabout way a number of times. I am sure there is a way to avoid the cursor, but why not. This is a cheap problem memory-wise, so long as @RandomMaxRecords doesn't get huge or you have a significant number of product records. If the data in the Item table is meaningless, I would suggest truncating it each run; the hash tables below are just for testing, and obviously you will pull from your real Product table, not the #Product table I have created.
This fantastic article describes in detail how I arrived at my solution: Less Than Dot Blog
CODE
--This is your product table with 5 random products
IF OBJECT_ID('tempdb..#Product') IS NOT NULL DROP TABLE #Product
CREATE TABLE #Product
(
ProductID INT PRIMARY KEY IDENTITY(1,1),
ProductName VARCHAR(25),
ProductDescription VARCHAR(max)
)
INSERT INTO #Product (ProductName,ProductDescription) VALUES ('Product Name 1','Product Description 1'),
('Product Name 2','Product Description 2'),
('Product Name 3','Product Description 3'),
('Product Name 4','Product Description 4'),
('Product Name 5','Product Description 5')
--This is your item table. This would probably just be a truncate statement so that your table is reset for the new values to go in
IF OBJECT_ID ('tempdb..#Item') IS NOT NULL DROP TABLE #Item
CREATE TABLE #Item
(
ItemID INT PRIMARY KEY IDENTITY(1,1),
FK_ProductID INT NOT NULL,
ItemName VARCHAR(25),
ItemDescription VARCHAR(max)
)
--Declare a bunch of variables for the cursor and the insert into the item table
DECLARE @ProductID INT
DECLARE @ProductName VARCHAR(25)
DECLARE @ProductDescription VARCHAR(max)
DECLARE @RandomItemCount INT
DECLARE @RowEnumerator INT
DECLARE @RandomMaxRecords INT = 10

--We declare a cursor to iterate over the records in product and generate random amounts of items
DECLARE ItemCursor CURSOR
FOR SELECT * FROM #Product

OPEN ItemCursor
FETCH NEXT FROM ItemCursor INTO @ProductID, @ProductName, @ProductDescription
WHILE (@@FETCH_STATUS <> -1)
BEGIN
    --Get the random number into the variable. We only want 1 or more records; mod division can produce a 0.
    SELECT @RandomItemCount = ABS(CHECKSUM(NEWID())) % @RandomMaxRecords
    SELECT @RandomItemCount = CASE @RandomItemCount WHEN 0 THEN 1 ELSE @RandomItemCount END
    --Iterate the RowEnumerator up to the RandomItemCount and insert item rows
    SET @RowEnumerator = 1
    WHILE (@RowEnumerator <= @RandomItemCount)
    BEGIN
        INSERT INTO #Item (FK_ProductID, ItemName, ItemDescription)
        SELECT @ProductID, REPLACE(@ProductName,'Product','Item'), REPLACE(@ProductDescription,'Product','Item')
        SELECT @RowEnumerator = @RowEnumerator + 1
    END
    FETCH NEXT FROM ItemCursor INTO @ProductID, @ProductName, @ProductDescription
END
CLOSE ItemCursor
DEALLOCATE ItemCursor
GO
GO
--Look at the result
SELECT
*
FROM
#Product AS P
RIGHT JOIN #Item AS I ON (P.ProductID = I.FK_ProductID)
--Cleanup
DROP TABLE #Product
DROP TABLE #Item
It looks like a LEFT OUTER JOIN to GetIntList (as opposed to INNER JOIN) fixes the problem I am having. (The likely root cause: the non-deterministic ItemCount expression in the CTE is re-evaluated for each row of the join, so every comparison against i sees a fresh random number instead of one fixed count per product.)
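If you want strictly sequential 1..N values per product, one hedged fix is to materialize the random count first, so the non-deterministic expression is evaluated exactly once per product, and then join with <= rather than <. A sketch against the question's own tables and number-list function:

IF OBJECT_ID('tempdb..#counts') IS NOT NULL DROP TABLE #counts

SELECT ProductId, ItemCount = ABS(CHECKSUM(NEWID())) % 5 + 1  -- fixed 1..5 per product at insert time
INTO #counts
FROM Product.Product

SELECT A = c.ProductId, B = n.i
FROM #counts c
JOIN (SELECT i FROM Mock.GetIntList(1,5,1)) n ON n.i <= c.ItemCount
ORDER BY c.ProductId, n.i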

Table variable in SQL Server

I am using SQL Server 2005. I have heard that we can use a table variable instead of a LEFT OUTER JOIN.
What I understand is that we first put all the values from the left table into the table variable, then UPDATE the table variable with the right table's values, and finally select from the table variable.
Has anyone come across this kind of approach? Could you please suggest a real-world example (with a query)?
I have not written any query for this. My question is: if someone has used a similar approach, I would like to know the scenario and how it is handled. I understand that in some cases it may be slower than the LEFT OUTER JOIN.
Please assume that we are dealing with tables that have less than 5000 records.
Thanks
It can be done, but I have no idea why you would ever want to do it.
This really does seem like it is being done backwards. But if you are trying this for your own learning only, here goes:
DECLARE @MainTable TABLE(
    ID INT,
    Val FLOAT
)

INSERT INTO @MainTable SELECT 1, 1
INSERT INTO @MainTable SELECT 2, 2
INSERT INTO @MainTable SELECT 3, 3
INSERT INTO @MainTable SELECT 4, 4

DECLARE @LeftTable TABLE(
    ID INT,
    MainID INT,
    Val FLOAT
)

INSERT INTO @LeftTable SELECT 1, 1, 11
INSERT INTO @LeftTable SELECT 3, 3, 33

SELECT *,
    mt.Val + ISNULL(lt.Val, 0)
FROM @MainTable mt LEFT JOIN
    @LeftTable lt ON mt.ID = lt.MainID

DECLARE @Table TABLE(
    ID INT,
    Val FLOAT
)

INSERT INTO @Table
SELECT ID, Val
FROM @MainTable

UPDATE t
SET Val = t.Val + lt.Val
FROM @Table t INNER JOIN
    @LeftTable lt ON t.ID = lt.ID

SELECT *
FROM @Table
I don't think it's very clear from your question what you want to achieve? (What your tables look like, and what result you want). But you can certainly select data into a variable of a table datatype, and tamper with it. It's quite convenient:
DECLARE @tbl TABLE (id INT IDENTITY(1,1), userId int, foreignId int)

INSERT INTO @tbl (userId)
SELECT id FROM users
WHERE name LIKE 'a%'

UPDATE t
SET foreignId = (SELECT id FROM foreignTable f WHERE f.userId = t.userId)
FROM @tbl t
In that example I gave the table variable an identity column of its own, distinct from the one in the source table. I often find that useful. Adjust as you like... Again, it's not very clear what the question is, but I hope this might guide you in the right direction...?
Every scenario is different, and without full details on a specific case it's difficult to say whether it would be a good approach for you.
Having said that, I would not be looking to use the table variable approach unless I had a specific functional reason to - if the query can be fulfilled with a standard SELECT query using an OUTER JOIN, then I'd use that as I'd expect that to be most efficient.
The times when you may want to use a temp table/table variable instead are when you need an intermediate resultset and then some processing on it before returning it, i.e. the kind of processing that cannot be done in a straightforward query.
Note that table variables are very handy, but take into account that they are not guaranteed to reside in memory; they can get persisted to tempdb like standard temp tables.
Thank you, astander.
I tried the example given below. Both approaches took 19 seconds. However, I guess some tuning would help the table variable update approach become faster than the LEFT JOIN.
As I am not a master of tuning, I request your help. Any SQL expert ready to prove it?
CREATE TABLE #MainTable (
    CustomerID INT PRIMARY KEY,
    FirstName VARCHAR(100)
)

DECLARE @Count INT
SET @Count = 0
DECLARE @Iterator INT
SET @Iterator = 0

WHILE @Count < 8000
BEGIN
    INSERT INTO #MainTable SELECT @Count, 'Cust' + CONVERT(VARCHAR(10), @Count)
    SET @Count = @Count + 1
END
CREATE TABLE #RightTable
(
OrderID INT PRIMARY KEY,
CustomerID INT,
Product VARCHAR(100)
)
CREATE INDEX [IDX_CustomerID] ON #RightTable (CustomerID)
WHILE @Iterator < 400000
BEGIN
    IF @Iterator % 2 = 0
    BEGIN
        INSERT INTO #RightTable SELECT @Iterator, 2, 'Prod' + CONVERT(VARCHAR(10), @Iterator)
    END
    ELSE
    BEGIN
        INSERT INTO #RightTable SELECT @Iterator, 1, 'Prod' + CONVERT(VARCHAR(10), @Iterator)
    END
    SET @Iterator = @Iterator + 1
END
-- Using LEFT JOIN
SELECT mt.CustomerID,mt.FirstName,COUNT(rt.Product) [CountResult]
FROM #MainTable mt
LEFT JOIN #RightTable rt ON mt.CustomerID = rt.CustomerID
GROUP BY mt.CustomerID,mt.FirstName
---------------------------
-- Using Table variable Update
DECLARE @WorkingTableVariable TABLE
(
    CustomerID INT,
    FirstName VARCHAR(100),
    ProductCount INT
)

INSERT INTO @WorkingTableVariable (CustomerID, FirstName)
SELECT CustomerID, FirstName FROM #MainTable

UPDATE wt
SET ProductCount = IV.[Count]
FROM @WorkingTableVariable wt
INNER JOIN
    (SELECT CustomerID, COUNT(rt.Product) AS [Count]
     FROM #RightTable rt
     GROUP BY CustomerID) IV ON wt.CustomerID = IV.CustomerID

SELECT CustomerID, FirstName, ISNULL(ProductCount, 0) [CountResult]
FROM @WorkingTableVariable
ORDER BY CustomerID
--------
DROP TABLE #MainTable
DROP TABLE #RightTable
Thanks
Lijo
In my opinion there is one reason to do this:
If you have a complicated query with lots of inner joins and one left join, you can get into trouble because the query may run hundreds of times slower than the same query without the left join.
If you query lots of records but only very few of them participate in the left join, you can get faster results by materializing the intermediate result into a table variable or temp table.
But usually there is no need to actually update the data in the table variable; you can query the table variable using the left join to return the result.
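A minimal sketch of that pattern, reusing the #MainTable/#RightTable test tables from the timing example above (the first INSERT stands in for the expensive "lots of inner joins" part):

DECLARE @Intermediate TABLE (CustomerID INT PRIMARY KEY, FirstName VARCHAR(100))

-- Materialize the expensive part once
INSERT INTO @Intermediate (CustomerID, FirstName)
SELECT CustomerID, FirstName FROM #MainTable

-- Then apply the single LEFT JOIN against the small intermediate set
SELECT i.CustomerID, i.FirstName, COUNT(rt.Product) AS CountResult
FROM @Intermediate i
LEFT JOIN #RightTable rt ON rt.CustomerID = i.CustomerID
GROUP BY i.CustomerID, i.FirstName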
... just my two cents.
