sql query for non-existing entries - sql-server

I have inherited a website and its corresponding database (SQL Server). The website uses stored procedures to pull data from the database. One of these stored procedures contains a pivot and it the pivot is taking over 4 hours to run. This is currently unacceptable. I am looking for help in replacing the pivot with standard SQL queries because I assume that will be faster and provide better performance.
Here is the pivot in question:
SELECT *
FROM (
SELECT ac.AID
,ac.CatName AS t
,convert(INT, ac.Code) AS c
FROM categories AS ac
) AS s
Pivot(Sum(c) FOR t IN (
[tob]
,[ecit]
,[tobwcom]
,[rnorm]
,[raddict]
,[rpolicy]
,[ryouth]
,[rhealth]
,…
)) AS p;
And the results of the pivot
| AID | tob | ecit | tobwcom | rnorm |
|-----------|-----------|------------|---------------|-------------|
| 1 | 1 | NULL | NULL | 0 |
| 2 | 1 | NULL | NULL | 1 |
| 3 | 1 | NULL | NULL | 0 |
| 4 | 1 | NULL | NULL | 0 |
| 5 | 1 | NULL | NULL | 0 |
| 6 | 1 | NULL | NULL | 1 |
Here’s the source table categories and some sample data:
CREATE TABLE categories(
ArticleID INTEGER NOT NULL
,ThemeID INTEGER NOT NULL
,ThemeName VARCHAR(7) NOT NULL
,Code BIT NOT NULL
,CreatedTime VARCHAR(7) NOT NULL
);
INSERT INTO categories(ArticleID,ThemeID,ThemeName,Code,CreatedTime) VALUES (1,1,'tob',1,'57:30.7');
INSERT INTO categories(ArticleID,ThemeID,ThemeName,Code,CreatedTime) VALUES (1,2,'ecig',1,'03:58.3');
INSERT INTO categories(ArticleID,ThemeID,ThemeName,Code,CreatedTime) VALUES (1,5,'rnorm',0,'42:56.5');
INSERT INTO categories(ArticleID,ThemeID,ThemeName,Code,CreatedTime) VALUES (2,1,'tob',1,'57:30.7');
INSERT INTO categories(ArticleID,ThemeID,ThemeName,Code,CreatedTime) VALUES (2,2,'ecig',0,'03:58.3');
INSERT INTO categories(ArticleID,ThemeID,ThemeName,Code,CreatedTime) VALUES (2,5,'rnorm',1,'42:56.5');
INSERT INTO categories(ArticleID,ThemeID,ThemeName,Code,CreatedTime) VALUES (2,6,'raddict',0,'42:59.8');
INSERT INTO categories(ArticleID,ThemeID,ThemeName,Code,CreatedTime) VALUES (3,1,'tob',1,'57:30.7');
INSERT INTO categories(ArticleID,ThemeID,ThemeName,Code,CreatedTime) VALUES (3,2,'ecig',0,'03:58.3');
INSERT INTO categories(ArticleID,ThemeID,ThemeName,Code,CreatedTime) VALUES (3,5,'rnorm',0,'42:56.5');
INSERT INTO categories(ArticleID,ThemeID,ThemeName,Code,CreatedTime) VALUES (21,1,'tob',1,'57:30.7');
INSERT INTO categories(ArticleID,ThemeID,ThemeName,Code,CreatedTime) VALUES (21,2,'ecig',0,'03:58.3');
INSERT INTO categories(ArticleID,ThemeID,ThemeName,Code,CreatedTime) VALUES (21,5,'rnorm',0,'42:56.5');
INSERT INTO categories(ArticleID,ThemeID,ThemeName,Code,CreatedTime) VALUES (21,6,'raddict',0,'42:59.8');
And here’s the table containing the category names – (mytable for now)
CREATE TABLE mytable(
CatID INTEGER NOT NULL PRIMARY KEY
,CatName VARCHAR(7) NOT NULL
,CreatedTime DATETIME NOT NULL
);
INSERT INTO mytable(CatID,CatName,CreatedTime) VALUES (1,'tob','2015-03-12 10:07:54.173');
INSERT INTO mytable(CatID,CatName,CreatedTime) VALUES (2,'ecig','2015-05-18 11:48:16.297');
INSERT INTO mytable(CatID,CatName,CreatedTime) VALUES (4,'tobwcom','2015-06-19 11:12:01.537');
INSERT INTO mytable(CatID,CatName,CreatedTime) VALUES (5,'rnorm','2015-06-22 14:24:02.317');
INSERT INTO mytable(CatID,CatName,CreatedTime) VALUES (6,'raddict','2015-06-22 14:24:13.957');
INSERT INTO mytable(CatID,CatName,CreatedTime) VALUES (7,'ecit','2015-06-22 14:26:18.437');
What I need is a way to perform the pivot’s ability to find the non-existing data in categories. The output would be something like:
| AID | tob | ecit | tobwcom | rnorm |
|-----------|-----------|------------|---------------|-------------|
| 1 | 1 | NULL | NULL | 0 |
| 2 | 1 | NULL | NULL | 1 |
Or the list of AIDs and the CatNames that don’t have any values. Such as:
| AID | CatName |
|-----|---------|
| 1 | ecit |
| 1 | tobwcom |
| 2 | ecit |
| 2 | tobwcom |
I have tried
select distinct(AID) FROM [categories]
where [CatName] not in ( 'ecit', 'tobwcom')
but the results from this, the numbers don't seem to add up, however this could be an error on my part.

Not sure if it would be fast enough for such a huge table. But for that second expected result then something this could help to find the missing.
select a.ArticleID, c.CatName
from #myarticles a
cross join #mycategories c
left join categories ca on (ca.ArticleID = a.ArticleID and ca.ThemeID = c.CatID)
where ca.ArticleID is null;
A test can be found here
Note that this method benefits from a combined primary key index on (ArticleID, ThemeID)
As an alternative, the LEFT JOIN with a NULL check can be changed to a NOT EXISTS.
select a.ArticleID, c.CatName
from #myarticles a
join #mycategories c on c.CatID between 1 and 7
where NOT EXISTS
(
select 1
from categories ca
where ca.ArticleID = a.ArticleID
and ca.ThemeID = c.CatID
);

Related

Separate comma values into individual values

I need to separate columns in SQL Server
Table: columnsseparates
CREATE TABLE [dbo].[columnsseparates](
[id] [varchar](50) NULL,
[name] [varchar](500) NULL
)
INSERT [dbo].[columnsseparates] ([id], [name]) VALUES (N'1,2,3,4', N'abc,xyz,mn')
GO
INSERT [dbo].[columnsseparates] ([id], [name]) VALUES (N'4,5,6', N'xy,yz')
GO
INSERT [dbo].[columnsseparates] ([id], [name]) VALUES (N'7,100', N'yy')
INSERT [dbo].[columnsseparates] ([id], [name]) VALUES (N'101', N'oo,yy')
GO
based on above data I want output like below:
id | Name
1 |abc
2 |xyz
3 |mn
4 |null
4 |xy
5 |yz
6 |null
7 |yy
100 |null
101 |oo
null |yy
How to achieve this task in SQL Server?
Storing non-atomic values in column is a sign that schema should be normalised.
Naive approach using PARSENAME(up to 4 comma separated values):
SELECT DISTINCT s.id, s.name
FROM [dbo].[columnsseparates]
CROSS APPLY(SELECT REVERSE(REPLACE(id,',','.')) id,REVERSE(REPLACE(name, ',','.')) name) sub
CROSS APPLY(VALUES (REVERSE(PARSENAME(sub.id,1)), REVERSE(PARSENAME(sub.name,1))),
(REVERSE(PARSENAME(sub.id,2)), REVERSE(PARSENAME(sub.name,2))),
(REVERSE(PARSENAME(sub.id,3)), REVERSE(PARSENAME(sub.name,3))),
(REVERSE(PARSENAME(sub.id,4)), REVERSE(PARSENAME(sub.name,4)))
) AS s(id, name)
ORDER BY s.id;
db<>fiddle demo
Output:
+------+------+
| id | name |
+------+------+
| | |
| | yy |
| 1 | abc |
| 100 | |
| 101 | oo |
| 2 | xyz |
| 3 | mn |
| 4 | |
| 4 | xy |
| 5 | yz |
| 6 | |
| 7 | yy |
+------+------+
If you have more than 4 values, then you'll to use a string splitter that can return the ordinal value. I use delimitedsplit8k_LEAD here:
WITH Ids AS(
SELECT cs.id,
cs.name,
DS.ItemNumber,
DS.Item
FROM dbo.columnsseparates cs
CROSS APPLY dbo.DelimitedSplit8K_LEAD (cs.id,',') DS),
Names AS (
SELECT cs.id,
cs.name,
DS.ItemNumber,
DS.Item
FROM dbo.columnsseparates cs
CROSS APPLY dbo.DelimitedSplit8K_LEAD (cs.[name],',') DS)
SELECT I.Item AS ID,
N.Item AS [Name]
FROM Ids I
FULL OUTER JOIN Names N ON I.id = N.id
AND I.ItemNumber = N.ItemNumber
ORDER BY CASE WHEN I.Item IS NULL THEN 1 ELSE 0 END,
TRY_CONVERT(int,I.Item);

SQL: How to extract a check on these values

I'm having troubling extracting some data from a certain (strangely built) table. What I want to do is extract value X from these values, to ensure the value exists:
Name | Fieldinfo | ValueDateTime | X
Date | NULL | 2017-07-05 | 1
AmountGross | 123.45 | NULL |1
AmountNet | 137.02 | NULL | 1
AmountVat | 28.77 | NULL | 1
I'm asware the database is weird, as these are all records, but I want to extract the value from it. I tried doing:
SELECT tel from (select count(X) as tel
from headfields where Fieldinfo in ('123.45', '137.02', '28.77')
UNION
select Xfrom headfields where ValueDateTime = '2017-07-05' ) as tel
But this returns all the records that have the values, indeependent of the UNION.
Maybe this?
drop table if exists x;
create table x (xName varchar(20), Fieldinfo numeric(10,2), ValueDateTime date, X int);
insert into x values
('Date' , NULL , '2017-07-05' ,1),
('AmountGross' , 123.45 , NULL ,1),
('AmountNet' , 137.02 , NULL ,1),
('AmountVat' , 28.77 , NULL ,1);
SELECT x, count(*) as tel
from x where Fieldinfo in ('123.45', '137.02', '28.77') or
valuedatetime = '2017-07-05'
group by x.x
result
+------+-----+
| x | tel |
+------+-----+
| 1 | 4 |
+------+-----+
1 row in set (0.00 sec)

TSQL: How to update one column if the value that exists in my row equals the same value of another row that may exist in a different column

I'm using SQL Server 2012 version 11.0.6020.0. Apologies in advance, I'm new to SQL.
I have one table for a person ID. Due to duplication, a person can have multiple ID's. In an attempt to clean this up, a master ID is created. However, there still exists duplications. Currently, it looks like this ...
IF OBJECT_ID('tempdb..#table1') IS NOT NULL
BEGIN
DROP TABLE #table1
END
CREATE TABLE #table1 (MasterID varchar(1), PersonID1 varchar(3), PersonID2 varchar(3), PersonID3 varchar(3), PersonID4 varchar(3), PersonID5 varchar(3))
INSERT INTO #table1 VALUES ('A', '12', '34', '56', '78', null);
INSERT INTO #table1 VALUES ('B', '34', '12', '90', null, null);
INSERT INTO #table1 VALUES ('C', '777', '888', null, null, null);
The table looks like this when the code above is executed.
+----------+-----------+-----------+-----------+-----------+-----------+
| MasterID | PersonID1 | PersonID2 | PersonID3 | PersonID4 | PersonID5 |
+----------+-----------+-----------+-----------+-----------+-----------+
| A | 12 | 34 | 56 | 78 | |
| B | 34 | 12 | 90 | | |
| C | 777 | 888 | | | |
+----------+-----------+-----------+-----------+-----------+-----------+
MasterID A and MasterID B is the same person because some of the PersonID overlap. MasterID C is a different person because it shares none of the ID's. If one ID is shared, then it's safe for me to assume that it is the same patient. So the output I want is ...
+----------+-----------+-----------+-----------+-----------+-----------+
| MasterID | PersonID1 | PersonID2 | PersonID3 | PersonID4 | PersonID5 |
+----------+-----------+-----------+-----------+-----------+-----------+
| A | 12 | 34 | 56 | 78 | 90 |
| C | 777 | 888 | | | |
+----------+-----------+-----------+-----------+-----------+-----------+
I thought about unpivoting the data and grouping it.
IF OBJECT_ID('tempdb..#t1') IS NOT NULL
BEGIN
DROP TABLE #t1
END
SELECT MasterID, PersonID
INTO #t1
FROM
(
SELECT MasterID, PersonID1, PersonID2, PersonID3, PersonID4, PersonID5
FROM #table1
) t1
UNPIVOT
(
PersonID FOR PersonIDs IN (PersonID1, PersonID2, PersonID3, PersonID4, PersonID5)
) AS up
GO
---------------------------------------------------
SELECT min(MasterID) as MasterID, PersonID
FROM #t1
GROUP BY PersonID
ORDER BY 1, 2
However, this solution will leave me with this below where it looks like 90 is its own person.
+----------+-----------+
| MasterID | PersonID |
+----------+-----------+
| A | 12 |
| A | 34 |
| A | 56 |
| A | 78 |
| B | 90 |
| C | 777 |
| C | 888 |
+----------+-----------+
I looked through stack overflow and the closest solution I found is this but it involves two tables whereas mine is within the same table
SQL UPDATE SET one column to be equal to a value in a related table referenced by a different column?
I also found this but the max aggregate function probably won't work for my case. Merge two rows in SQL
This solution looks like it'll work but it'll require me to manually check each field for duplicate PersonID first before updating my MasterID. set a row equal to another row in the same table apart the primary key column
My goal is to have SQL check for duplicates and if found, remove the duplicates and update add the new PersonID. And as for which masterID to use, it doesn't matter whether I keep A or B.
Let me know if you know of any solutions or can direct me to one. I'm new to SQL so I may be searching the wrong keywords and vocabularies. Thanks, I really appreciate it!
Please try the following query. It adds a MainMasterID column to identify the main MasterID for each record.
select *,
(select min(MasterID)
from #table1 t2
where t1.PersonID1 in (t2.PersonID1, t2.PersonID2, t2.PersonID3, t2.PersonID4, t2.PersonID5)
or t1.PersonID2 in (t2.PersonID1, t2.PersonID2, t2.PersonID3, t2.PersonID4, t2.PersonID5)
or t1.PersonID3 in (t2.PersonID1, t2.PersonID2, t2.PersonID3, t2.PersonID4, t2.PersonID5)
or t1.PersonID4 in (t2.PersonID1, t2.PersonID2, t2.PersonID3, t2.PersonID4, t2.PersonID5)
or t1.PersonID5 in (t2.PersonID1, t2.PersonID2, t2.PersonID3, t2.PersonID4, t2.PersonID5)
) AS MainMasterID
from #table1 t1
/* Sample data output
MasterID PersonID1 PersonID2 PersonID3 PersonID4 PersonID5 MainMasterID
-------- --------- --------- --------- --------- --------- ------------
A 12 34 56 78 NULL A
B 34 12 90 NULL NULL A
C 777 888 NULL NULL NULL C
*/

Troubleshooting to implement SQL Server trigger

I have this table called InspectionsReview:
CREATE TABLE InspectionsReview
(
ID int NOT NULL AUTO_INCREMENT,
InspectionItemId int,
SiteId int,
ObjectId int,
DateReview DATETIME,
PRIMARY KEY (ID)
);
Here how the table looks:
+----+------------------+--------+-----------+--------------+
| ID | InspectionItemId | SiteId | ObjectId | DateReview |
+----+------------------+--------+-----------+--------------+
| 1 | 3 | 3 | 3045 | 20-05-2016 |
| 2 | 5 | 45 | 3025 | 01-03-2016 |
| 3 | 4 | 63 | 3098 | 05-05-2016 |
| 4 | 5 | 5 | 3041 | 03-04-2016 |
| 5 | 3 | 97 | 3092 | 22-02-2016 |
| 6 | 1 | 22 | 3086 | 24-11-2016 |
| 7 | 9 | 24 | 3085 | 15-12-2016 |
+----+------------------+--------+-----------+--------------+
I need to write trigger that checks before the new row is inserted to the table if the table already has row with columns values 'ObjectId' and 'DateReview' that equal to the columns values of the row that have to be inserted, if it's equal I need to get the ID of the exited row and to put to trigger variable called duplicate .
For example, if new row that has to be inserted is:
INSERT INTO InspectionsReview (InspectionItemId, SiteId, ObjectId, DateReview)]
VALUES (4, 63, 3098, '05-05-2016');
The duplicate variable in SQL Server trigger must be equal to 3.
Because the row in InspectionsReview table were ID = 3 has ObjectId and DateReview values the same as in new row that have to be inserted. How can I implement this?
With the extra assumption that you want to log all the duplicate to a different table, then my solution would be to create an AFTER trigger that would check for the duplicate and insert it into your logging table.
Of course, whether this is the solution depends on whether my extra assumption is valid.
Here is my logging table.
CREATE TABLE dbo.InspectionsReviewLog (
ID int
, ObjectID int
, DateReview DATETIME
, duplicate int
);
Here is the trigger (pretty straightforward with the extra assumption)
CREATE TRIGGER tr_InspectionsReview
ON dbo.InspectionsReview
AFTER INSERT
AS
BEGIN
DECLARE #tableVar TABLE(
ID int
, ObjectID int
, DateReview DATETIME
);
INSERT INTO #tableVar (ID, ObjectID, DateReview)
SELECT DISTINCT inserted.ID, inserted.ObjectID, inserted.DateReview
FROM inserted
JOIN dbo.InspectionsReview ir ON inserted.ObjectID=ir.ObjectID AND inserted.DateReview=ir.DateReview AND inserted.ID <> ir.ID;
INSERT INTO dbo.InspectionsReviewLog (ID, ObjectID, DateReview, duplicate)
SELECT ID, ObjectID, DateReview, 3
FROM
#tableVar;
END;

SQL Server : Insert Multiple Rows into Multiple Tables From Table Type Paramater

I'm trying to write a stored procedure which takes in a table type parameter and inserts into two tables at once.
I have an entity table which is a base table holding the id for various tables, below is the entity table and a sample Site table.
------ Entity Table ------------------------------------------
| Id | bigint | NOT NULL | IDENTITY(1,1) | PRIMARY KEY
| TypeId | tinyint | NOT NULL |
| Updated | datetime | NULL |
| Created | datetime | NOT NULL |
| IsActive | bit | NOT NULL |
------- Site Table ---------------------------------------
| EntityId | bigint | NOT NULL | PRIMARY KEY
| ProductTypeCode | nvarchar(8) | NOT NULL | PRIMARY KEY
| SupplierCode | nvarchar(8) | NOT NULL | PRIMARY KEY
| Name | nvarchar(128) | NOT NULL |
| Description | nvarchar(max) | NULL |
And here is my table type used to pass into the stored procedure
------- Site Table Type ----------------------------------
| EntityTypeId | tinyint | NOT NULL |
| ProductTypeCode | nvarchar(8) | NOT NULL | PRIMARY KEY
| SupplierCode | nvarchar(8) | NOT NULL | PRIMARY KEY
| Name | nvarchar(128) | NOT NULL |
| Description | nvarchar(max) | NULL |
The idea is that I will pass in a table type parameter into the stored procedure and insert multiple rows at once to save looping inserting one row at a time.
Here's what I have so far
CREATE PROCEDURE InsertSites
#Sites SiteTypeTable READONLY
AS
BEGIN
-- Insert into Entity & Site Tables here, using the Id from the Entity Table in the Site table
INSERT INTO Entity (TypeId, Updated, Created, IsActive)
OUTPUT [inserted].[Id], S.ProductTypeCode, S.SupplierCode, S.Name, S.Description
INTO Site
SELECT EntityTypeId, NULL, GETDATE(), 1
FROM #Sites S
END
I've read about using insert and output together but cannot get this to work. I've also read about merge but also cannot get this to work.
Any help or pointers you can give will be greatly appreciated.
Thanks
Neil
---- Edit ----
Could I do something like this? I'm not sure how to finish this off...
CREATE PROCEDURE InsertSites
#Sites SiteTypeTable READONLY
AS
BEGIN
-- First insert enough rows into Entity table, saving the inserted Ids to a table variable
DECLARE #InsertedOutput TABLE (EntityId bigint)
INSERT INTO Entity (TypeId, Updated, Created, IsActive)
OUTPUT [inserted].[id]
INTO #InsertedOutput
SELECT EntityTypeId, NULL, GETDATE(), 1
FROM #Sites S
-- Use the Ids in #InsertedOutput against the rows in #Sites to insert into Sites
END

Resources