MERGE between 2 tables, one table with 10million rows - sql-server

TableA
match / Keyword
0 Stackoverflow
1 Youtube
1 Google
0 Yandex
1 Twitter
0 Facebook
0 Teacher
TableA has 10 million rows in total.
There is a clustered index on the Keyword column.
TableB
match / word
1 You
1 Go
1 Twit
0 Home
0 Car
0 Pencil
0 Money
0 Weather
0 Her
TableB has 500 rows in total.
There is a clustered index on the word column.
My Question
I want to write a SQL query that checks every word in TableB against the keywords in TableA and sets TableB.match to 1 when a keyword starts with that word.
(TableA.keyword like TableB.word+'%') (will be matched)
Matches in the middle of a keyword must NOT count, i.e. not (TableA.keyword like '%'+TableB.word+'%')
For example, Her occurs inside Teacher (won't be matched)
I Tried to use MERGE
First Try;
I tried to match keywords against words and update TableB.
I get an error, because there are multiple matches in TableA and MERGE does not allow a row in the target table (TableB) to be updated more than once.
MERGE INTO [TableB] As XB
USING (Select keyword FROM [TableA]) As XA
ON XB.word LIKE ''+XA.keyword+'%'
WHEN MATCHED THEN UPDATE SET XB.match=1;
Second Try;
I tried to match words against keywords and update TableA.
I get what I want. The problem is that the query takes 1 hour to execute for 500 words against 10 million keywords.
MERGE INTO [TableA] As XA
USING (Select word FROM [TableB]) As XB
ON XB.word LIKE ''+XA.keyword+'%'
WHEN MATCHED THEN UPDATE SET XA.match=1;
Is there a way to speed up these lookups in the second try?

An update statement will suffice for what you're trying to do. Note that this will probably not perform very well as SQL isn't great at comparing strings.
create table #a (match int, keyword varchar(50))
create table #b (match int, keyword varchar(50))
insert into #a values (0, 'Stackoverflow')
insert into #a values (0, 'Youtube')
insert into #a values (0, 'Google')
insert into #a values (0, 'Yandex')
insert into #a values (0, 'Twitter')
insert into #a values (0, 'Facebook')
insert into #a values (0, 'Teacher')
insert into #b values (0, 'You')
insert into #b values (0, 'Go')
insert into #b values (0, 'Twit')
insert into #b values (0, 'Home')
insert into #b values (0, 'Car')
insert into #b values (0, 'Pencil')
insert into #b values (0, 'Money')
insert into #b values (0, 'Weather')
insert into #b values (0, 'Her')
--commented out because user didn't want this, but it matches the provided data
--update #a
--set match = 1
--where keyword in
--(
-- select
-- distinct a.keyword
-- from #a a
-- cross apply #b b
-- where a.keyword like b.keyword + '%'
--)
update #b
set match = 1
where keyword in
(
select
distinct b.keyword
from #a a
cross apply #b b
where a.keyword like b.keyword + '%'
)
select *
from #a
select *
from #b
--EDIT BY Sean--
Here is how you could do this as a correlated subquery so you can use EXISTS.
update b
set match = 1
from #b b
where exists
(
select b.keyword
from #a a
where a.keyword like b.keyword + '%'
)
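Applied to the tables from the question rather than the mock-up, the EXISTS form might look like the sketch below. It assumes the real tables are named TableA(keyword, match) and TableB(word, match) as described in the question. Since only the 500 rows of TableB are updated and the pattern has no leading wildcard, the optimizer may be able to seek on the clustered index of keyword for each word instead of scanning all 10 million rows, but that is not guaranteed, so check the actual execution plan.

```sql
-- Sketch only: table and column names are taken from the question.
UPDATE b
SET    b.[match] = 1
FROM   TableB AS b
WHERE  EXISTS
(
    SELECT 1
    FROM   TableA AS a
    -- Prefix match only: 'Her' does not match 'Teacher'.
    WHERE  a.keyword LIKE b.word + '%'
);
```

Note that any %, _ or [ characters stored in TableB.word would be treated as LIKE wildcards by this predicate.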

Related

Find data based on data-driven conditions from another table ("At least One" of and "Required")

Using data in the following format as an example, in which table B stores orders, and table A stores conditions for which a promotional discount may be applied to orders in B:
DROP TABLE #A
DROP TABLE #B
-- Table #A stores information about the requirements for allowing promotions
CREATE TABLE #A(
PromoName varchar(50),
Product varchar(250),
ConditionType varchar(50)
)
INSERT INTO #A VALUES ('PromoA','Product1','AT LEAST ONE')
INSERT INTO #A VALUES ('PromoA','Product2','AT LEAST ONE')
INSERT INTO #A VALUES ('PromoA','Product3','REQUIRED')
INSERT INTO #A VALUES ('PromoA','Product4','REQUIRED')
INSERT INTO #A VALUES ('PromoA','Product5','REQUIRED')
-- Table B contains order information, and whether products from #A are in the order
CREATE TABLE #B(
QuoteID varchar(50),
ProductName varchar(250)
)
INSERT INTO #B VALUES ('Quote1','Product1')
INSERT INTO #B VALUES ('Quote2','Product3')
INSERT INTO #B VALUES ('Quote3','Product4')
INSERT INTO #B VALUES ('Quote4','Product5')
-- Select * from #A
-- Select * from #B
I need to find data in #B that matches the requirements set in #A. So, in the example data I provided, the records in #B should be returned because the requirements from A are met...that being that the orders in #B contain "at least one" of either "Product1" or "Product2" (because #B indeed contains Product 1) and it also contains all of the "REQUIRED" items of Product3, Product4, and Product5.
But if one of the required products were missing, e.g. if we removed Product5 from #B, then no records should be returned from table #B. Likewise, no records should be returned if table #B contained neither Product1 nor Product2.
How can I get this data? I attempted something here which seems logically correct to me, but it's not and I think this may be turning more complex than I initially thought. Here's my code:
;WITH CTE_Required as --These are "required" promotion requirements indicating that an item must be
-- on the order
(
Select PromoName,Product,ConditionType from #A
where ConditionType = 'REQUIRED'
),
CTE_AtLeastOne as --These are requirements that "at least one" of the "at least one" items must exist
-- on the order
(
Select PromoName,Product,ConditionType from #A
where ConditionType = 'AT LEAST ONE'
),
CTE_PromoRequiredRestrictionNotMet as -- The "required" restriction test has failed for these
(
Select a.Product
from CTE_Required a
left join #B b on b.ProductName = a.Product
where b.QuoteID is null -- Data is in the "required" list but it's not in #B
),
CTE_PromoAtLeastOneRestrictionNotMet as --This data needs at least one in #B, but none exist in #B
(
Select a.Product
from CTE_AtLeastOne a
left join #B b on b.ProductName = a.Product
where b.QuoteID is not null
),
CTE_PromoRequiredRestrictionMet as --These are items not in the failed items ("required" test passes)
(
Select * from #B where ProductName not in
(
Select * from CTE_PromoRequiredRestrictionNotMet
)
),
CTE_PromoAtLeastOneRestrictionMet as -- These pass the "At least one" test
(
Select * from #B where ProductName not in
(
Select * from CTE_PromoRequiredRestrictionNotMet
)
)
Select * from CTE_PromoRequiredRestrictionMet c -- Get items that passed both tests
join CTE_PromoAtLeastOneRestrictionMet d on c.ProductName = d.ProductName
This returns all records correctly when the products/conditions match (in the listed data example above), however it's not correct if I remove a "required" product from #B. So, if I remove product 3 from #B, then the results still return Products 1,2 and 4, which I do not want. I only want to return records where all conditions are met.
In my code, I kind of get why it doesn't work. I have several CTEs set up to get the data in small bits: I'm trying to separate the data that meets the "REQUIRED" requirements from the data that meets the "AT LEAST ONE" requirements, then find the rows that meet both. (There are also intermediate CTEs that use left joins to find which things are in #A but NOT in #B, which I then use with a "not in" to decide what is in both #A and #B.) I have a feeling I need a group-by clause somewhere.
In any case, what query could I use to select either all-or-none records from #B, based on whether all of the data-driven conditions are met defined in #A? Aside from the inner join at the end on the CTE's containing passed data items, I've tried a few different joins including union and none are totally working. Thank you in advance!
EDIT: TO be clear, there may be more data, so in table #A I have "promoA" listed, but there may also be a promoB and I don't want promoA results to affect promoB results.
This might work best as a stored procedure, but see my logic below.
--DROP #C IF EXISTS
IF OBJECT_ID('tempdb..#C') IS NOT NULL DROP TABLE #C
--INSERT ALL OF #B INTO WORKING TABLE #C
SELECT * INTO #C FROM #B
--IF A REQUIRED PRODUCT IS MISSING, DELETE RECORDS FROM #C
IF EXISTS (SELECT * FROM #A A LEFT JOIN #C C ON A.PRODUCT = C.PRODUCTNAME WHERE A.CONDITIONTYPE ='REQUIRED' AND C.PRODUCTNAME IS NULL) DELETE FROM #C
--IF NONE OF THE 'AT LEAST ONE' PRODUCTS ARE PRESENT, DELETE RECORDS FROM #C
IF NOT EXISTS(SELECT * FROM #C C WHERE C.PRODUCTNAME IN (SELECT PRODUCT FROM #A A WHERE A.CONDITIONTYPE ='AT LEAST ONE')) DELETE FROM #C
--RETURN RECORDS (IF ANY)
SELECT * FROM #C
John
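To keep promotions from affecting each other (the EDIT in the question), the two checks can also be evaluated per PromoName in a single grouped query. This is only a sketch: like the question, it treats all of #B as one order, and it assumes every promo has at least one 'AT LEAST ONE' row (a promo without any would never qualify here).

```sql
-- Returns the promos whose conditions are all met by the products in #B.
SELECT a.PromoName
FROM   #A AS a
LEFT JOIN (SELECT DISTINCT ProductName FROM #B) AS b
       ON b.ProductName = a.Product
GROUP BY a.PromoName
HAVING
    -- no REQUIRED product is missing from #B
    SUM(CASE WHEN a.ConditionType = 'REQUIRED'
              AND b.ProductName IS NULL THEN 1 ELSE 0 END) = 0
    -- and at least one of the 'AT LEAST ONE' products is present in #B
    AND SUM(CASE WHEN a.ConditionType = 'AT LEAST ONE'
                  AND b.ProductName IS NOT NULL THEN 1 ELSE 0 END) >= 1;
```

With the sample data this returns PromoA. Removing Product5 from #B, or removing Product1, leaves the result empty. The list of qualifying promos can then drive an EXISTS check that decides whether to return the rows from #B.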

TSQL IF/ELSE or CASE (UPSERT)

I'm not sure whether IF/ELSE is the right way to go for the following. It always takes the ELSE branch, so it seems it's not working correctly.
IF ((SELECT COUNT(CAST(StudentuserID AS int)) FROM StudentAttendance WHERE StudentUserID=1)>0)
PRINT 'Yes'
ELSE
PRINT 'No'
This test should print 'Yes', since the count is 8 and 8 > 0.
I will be replacing PRINT with an UPDATE ELSE INSERT statement.
IF ((SELECT COUNT(CAST(StudentuserID AS int)) FROM StudentAttendance WHERE StudentUserID=1)>0)
UPDATE StudentAttendance
SET
CID = CAST('[querystring:CID]' AS int),
CalendarEventID = CAST('[querystring:CEID]' AS int),
StudentUserID = CAST('[StudentUserID]' AS int),
Attendance = '[Attendance]'
ELSE
INSERT INTO StudentAttendance
(CID,CalendarEventID,StudentUserID,Attendance)
VALUES
(CAST('[querystring:CID]' AS int), CAST('[querystring:CEID]' AS int), CAST('[StudentsUserID]' AS int),'[Attendance]')
It looks like your IF/ELSE would work fine (it looks like you're doing this for one record in a stored procedure or something?). If it's currently returning 'No' and you don't think it should be, I'd perhaps do a more basic check on your table, e.g.:
SELECT *
FROM StudentAttendance
WHERE StudentUserID = 1
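For the UPDATE/INSERT version, a minimal sketch of the same IF/ELSE shape is shown below. The variables are hypothetical stand-ins for the [querystring:...] tokens, the varchar(20) type for Attendance is a guess, and the sketch assumes a row is identified by StudentUserID plus CalendarEventID, so adjust the key to your actual schema. The UPDATE in the question has no WHERE clause, which would change every row in the table.

```sql
-- Hypothetical parameters standing in for the template tokens in the question.
DECLARE @CID             int         = 10,
        @CalendarEventID int         = 20,
        @StudentUserID   int         = 1,
        @Attendance      varchar(20) = 'Present';

-- Assumption: one row per student per calendar event.
IF EXISTS (SELECT 1
           FROM   StudentAttendance
           WHERE  StudentUserID   = @StudentUserID
             AND  CalendarEventID = @CalendarEventID)
    UPDATE StudentAttendance
    SET    CID        = @CID,
           Attendance = @Attendance
    WHERE  StudentUserID   = @StudentUserID      -- restrict the update to the matching row
      AND  CalendarEventID = @CalendarEventID;
ELSE
    INSERT INTO StudentAttendance (CID, CalendarEventID, StudentUserID, Attendance)
    VALUES (@CID, @CalendarEventID, @StudentUserID, @Attendance);
```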
You can also use a MERGE statement for this, and you can use multiple source tables by joining them within the USING part. Here is a basic example of that:
CREATE TABLE #A (Aid int, value int)
CREATE TABLE #B (Aid int, Cid int)
CREATE TABLE #C (Cid int, value int)
INSERT INTO #A VALUES (1, 1)
INSERT INTO #B VALUES (1, 2)
INSERT INTO #B VALUES (2, 3)
INSERT INTO #C VALUES (2, 4)
INSERT INTO #C VALUES (3, 6)
;
SELECT *
FROM #A
;
MERGE INTO #A tgt
USING (SELECT B.Aid, B.Cid, C.value FROM #B B JOIN #C C ON B.Cid = C.Cid) src
ON tgt.Aid = src.Aid
WHEN MATCHED THEN UPDATE
SET tgt.value = src.value
WHEN NOT MATCHED THEN
INSERT
(
Aid
, value
)
VALUES
(
src.Aid
, src.value
)
;
SELECT *
FROM #A
;

T-SQL: How to merge two tables with delete and insert when matched and only insert when not matched?

Here is an example statement to explain what I mean:
CREATE TABLE #sourceTable (ID int, tmstmp datetime, data varchar(max))
CREATE TABLE #targetTable (ID int, tmstmp datetime, data varchar(max))
INSERT INTO
#sourceTable
VALUES
(1, '2015-07-23T01:01:00', 'Testdata6')
,(1, '2015-07-23T02:02:00', 'Testdata7')
,(2, '2015-07-23T03:03:00', 'Testdata8')
,(2, '2015-07-23T04:04:00', 'Testdata9')
INSERT INTO
#targetTable
VALUES
(2, '2015-07-23T00:01:00', 'Testdata1')
,(2, '2015-07-23T00:02:00', 'Testdata2')
,(2, '2015-07-23T00:03:00', 'Testdata3')
,(3, '2015-07-23T00:04:00', 'Testdata4')
,(3, '2015-07-23T00:05:00', 'Testdata5')
MERGE INTO
#targetTable T
USING
#sourceTable S
ON
S.ID = T.ID
WHEN MATCHED THEN
DELETE
-- also want to INSERT newer ID 2 source records here after delete
WHEN NOT MATCHED THEN
INSERT (ID, tmstmp, data)
VALUES (S.ID, S.tmstmp, S.data)
;
When I make a select...
SELECT
*
FROM
#targetTable
...I get the following table:
ID tmstmp data
3 2015-07-23 00:04:00.000 Testdata4
3 2015-07-23 00:05:00.000 Testdata5
1 2015-07-23 01:01:00.000 Testdata6
1 2015-07-23 02:02:00.000 Testdata7
But I want to get the following table instead:
ID tmstmp data
3 2015-07-23 00:04:00.000 Testdata4
3 2015-07-23 00:05:00.000 Testdata5
1 2015-07-23 01:01:00.000 Testdata6
1 2015-07-23 02:02:00.000 Testdata7
2 2015-07-23 03:03:00.000 Testdata8
2 2015-07-23 04:04:00.000 Testdata9
How can I do this in one statement? I need a single statement because I use an extensive CTE for the source table.
Thanks in advance...
We can add some extra rows to the "source" table to take care of clearing out the existing rows, then let all of the current rows fall into the NOT MATCHED clause, which is the only one allowed to perform INSERT operations:
;With Clears as (
SELECT *,0 as Rem from #sourceTable
union all
select distinct ID,'1900-01-01','',1 from #sourceTable
)
MERGE INTO
#targetTable T
USING
Clears S
ON
S.ID = T.ID and s.Rem = 1
WHEN MATCHED THEN
DELETE
WHEN NOT MATCHED and Rem = 0 THEN
INSERT (ID, tmstmp, data)
VALUES (S.ID, S.tmstmp, S.data)
;
The basic rule with trying to achieve multiple operations within a MERGE statement is you need at least one source row for each action you want to take. It's then a challenge to formulate the ON clause and the various additional conditions after the WHEN clauses such that each operation applies when you want it to.
E.g. without the extra condition "and Rem = 0" added to WHEN NOT MATCHED above, the extra row we added into Clears to remove any rows with an ID of 1 would instead end up being inserted as an extra row, since there aren't any ID 1 rows in the target table for it to match.
Wouldn't a simple DELETE-INSERT work here?
DELETE t FROM #targetTable t
WHERE EXISTS(
SELECT 1
FROM #sourceTable
WHERE ID = t.ID
)
INSERT INTO #targetTable(ID, tmstmp, data)
SELECT ID, tmstmp, data
FROM #sourceTable s
WHERE NOT EXISTS(
SELECT 1
FROM #targetTable
WHERE ID = s.ID
)
You may want to keep the two statements under one transaction.
EDIT: I just realized you wanted a single statement. But I'll leave it here as an alternate solution.
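A sketch of the two statements wrapped in one transaction, using the same table names as above. SET XACT_ABORT ON is an addition not in the original answer: with it, a run-time error in either statement rolls back the whole transaction, so the DELETE cannot be committed without the INSERT.

```sql
SET XACT_ABORT ON;   -- a run-time error rolls back the entire transaction

BEGIN TRANSACTION;

-- Remove every target row whose ID also appears in the source.
DELETE t
FROM   #targetTable AS t
WHERE  EXISTS (SELECT 1 FROM #sourceTable AS s WHERE s.ID = t.ID);

-- Insert the source rows; after the DELETE no target row shares their IDs.
INSERT INTO #targetTable (ID, tmstmp, data)
SELECT s.ID, s.tmstmp, s.data
FROM   #sourceTable AS s
WHERE  NOT EXISTS (SELECT 1 FROM #targetTable AS t WHERE t.ID = s.ID);

COMMIT TRANSACTION;
```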

Comma separated values count

Table Patterns has a single column with the following values:
NewsletteridPattern
------------
%50%
%51%
Table B has the following values:
SubscriberId NewsletterIdCsv
------------ -----------------
47421584 51
45551047 50,51
925606902 50
47775985 51
I have the following query, which basically counts the comma-separated values by using the pattern:
SELECT *
FROM TABLEB t WITH (nolock)
JOIN Patterns p ON (t.NewsletterIdCsv LIKE p.NewsletteridPattern)
The problem is that the count is incorrect. For example, my patterns are %50% and %51%, so row number 2 of Table B should be counted twice, but my query counts it only once. How do I fix that?
EDIT :
I forgot to add DISTINCT in my original query which was causing the issue:
SELECT Count(Distinct Subscriberid)
FROM TABLEB t WITH (nolock)
JOIN Patterns p ON (t.NewsletterIdCsv LIKE p.NewsletteridPattern)
I mocked up your data as such:
create table #pattern (pattern varchar(50))
insert into #pattern values ('%50%')
insert into #pattern values ('%51%')
create table #subscriber (id varchar(50), newsletter varchar(50))
insert into #subscriber values ('47421584', '51')
insert into #subscriber values ('45551047', '50,51')
insert into #subscriber values ('925606902', '50')
insert into #subscriber values ('47775985', '51')
SELECT pattern, COUNT(*) AS Counter
FROM #subscriber t WITH (nolock)
JOIN #pattern p ON (t.newsletter LIKE p.pattern)
GROUP BY pattern
And my select statement returns:
pattern Counter
------- -------
%50% 2
%51% 3
What is your final goal? Are you just concerned about counting the number of rows by pattern or are you trying to do a select of rows by pattern?
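One caveat with patterns like %50%: they also match ids that merely contain those digits, such as 150 or 500. A possible variant, sketched against the same mock tables, wraps both the list and the id in commas so that only whole ids match. It assumes the CSV values contain no spaces around the commas.

```sql
SELECT p.pattern, COUNT(*) AS Counter
FROM   #subscriber AS t
JOIN   #pattern    AS p
  -- ',50,51,' LIKE '%,50,%' matches; ',150,' LIKE '%,50,%' does not
  ON   ',' + t.newsletter + ',' LIKE '%,' + REPLACE(p.pattern, '%', '') + ',%'
GROUP BY p.pattern;
```

On the mock data above the counts are unchanged (2 for %50% and 3 for %51%), because none of the sample ids is a substring of another.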

SQL Query Uniqueness with subjoin

Help! Here is a very simple a,b,c sample of what I need to accomplish. I have been pulling my hair out. I've written this before but can't get my head around it now! So here it is, with actual and expected results demonstrated below:
set nocount on
create table #a (id int, a varchar(10))
create table #b (ref int, b varchar(10), c varchar(20))
insert into #a select 1, 'bingo'
insert into #a select 2, 'bongo'
insert into #b select 1, 'T5', 'asdfwef'
insert into #b select 1, 'T8', 'asfqwez'
insert into #b select 1, 'T6', 'qweoae'
insert into #b select 1, 'T8', 'qzoeqe'
insert into #b select 1, 'T9', 'oqeizef'
insert into #b select 2, 'T3', 'awega'
insert into #b select 2, 'T6', 'fhaeaw'
insert into #b select 2, 'T3', 'fqsegw'
select * from #a a join #b b on a.id = b.ref
-- Expected (Uniqueness is: a's id to b's ref and the first b value, ignoring b's c value)
----1,bingo,1,T5,asdfwef
----1,bingo,1,T8,asfqwez
----1,bingo,1,T6,qweoae
----1,bingo,1,T9,oqeizef
----2,bongo,2,T3,awega
----2,bongo,2,T6,fhaeaw
-- Actual
----1,bingo,1,T5,asdfwef
----1,bingo,1,T8,asfqwez
----1,bingo,1,T6,qweoae
----1,bingo,1,T8,qzoeqe
----1,bingo,1,T9,oqeizef
----2,bongo,2,T3,awega
----2,bongo,2,T6,fhaeaw
----2,bongo,2,T3,fqsegw
Your query is returning the correct results: all the matching rows from #b.
If you want the first b value, you need to do two things. First, you need to include an ordering column in b so you know what "first" is. Remember, SQL tables are unordered. This is easy:
create table #b (id int identity(1,1) not null, ref int, b varchar(10), c varchar(20));
You then have to change the inserts to insert all but the id:
insert into #b(ref, b, c) select 1, 'T5', 'asdfwef';
Now you are ready for the actual query:
select *
from #a a join
(select b.*, row_number() over (partition by b.ref, b.b order by b.id) as seqnum
from #b b
) b
on a.id = b.ref and b.seqnum = 1
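Put together, the steps above can be sketched as one script. It uses temporary tables and an explicit column list so the helper columns (the identity id and seqnum) stay out of the output. The rows are inserted one statement at a time so the identity values follow the insert order.

```sql
IF OBJECT_ID('tempdb..#a') IS NOT NULL DROP TABLE #a;
IF OBJECT_ID('tempdb..#b') IS NOT NULL DROP TABLE #b;

CREATE TABLE #a (id int, a varchar(10));
-- The identity column records insert order, which defines what "first" means.
CREATE TABLE #b (id int IDENTITY(1,1) NOT NULL, ref int, b varchar(10), c varchar(20));

INSERT INTO #a (id, a) VALUES (1, 'bingo');
INSERT INTO #a (id, a) VALUES (2, 'bongo');

INSERT INTO #b (ref, b, c) VALUES (1, 'T5', 'asdfwef');
INSERT INTO #b (ref, b, c) VALUES (1, 'T8', 'asfqwez');
INSERT INTO #b (ref, b, c) VALUES (1, 'T6', 'qweoae');
INSERT INTO #b (ref, b, c) VALUES (1, 'T8', 'qzoeqe');
INSERT INTO #b (ref, b, c) VALUES (1, 'T9', 'oqeizef');
INSERT INTO #b (ref, b, c) VALUES (2, 'T3', 'awega');
INSERT INTO #b (ref, b, c) VALUES (2, 'T6', 'fhaeaw');
INSERT INTO #b (ref, b, c) VALUES (2, 'T3', 'fqsegw');

-- Keep only the earliest row for each (ref, b) pair.
SELECT a.id, a.a, b.ref, b.b, b.c
FROM   #a AS a
JOIN  (SELECT x.*,
              ROW_NUMBER() OVER (PARTITION BY x.ref, x.b ORDER BY x.id) AS seqnum
       FROM   #b AS x) AS b
  ON   a.id = b.ref
 AND   b.seqnum = 1
ORDER BY a.id, b.id;
```

This drops the second T8 row (qzoeqe) and the second T3 row (fqsegw), leaving the six rows listed as expected in the question.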
