I have inherited a website and its SQL Server database. The website uses stored procedures to pull data from the database. One of these stored procedures contains a pivot, and the pivot takes over 4 hours to run, which is unacceptable. I am looking for help replacing the pivot with standard SQL queries, because I assume that will perform better.
Here is the pivot in question:
SELECT *
FROM (
SELECT ac.AID
,ac.CatName AS t
,convert(INT, ac.Code) AS c
FROM categories AS ac
) AS s
Pivot(Sum(c) FOR t IN (
[tob]
,[ecit]
,[tobwcom]
,[rnorm]
,[raddict]
,[rpolicy]
,[ryouth]
,[rhealth]
,…
)) AS p;
And the results of the pivot:
| AID | tob | ecit | tobwcom | rnorm |
|-----------|-----------|------------|---------------|-------------|
| 1 | 1 | NULL | NULL | 0 |
| 2 | 1 | NULL | NULL | 1 |
| 3 | 1 | NULL | NULL | 0 |
| 4 | 1 | NULL | NULL | 0 |
| 5 | 1 | NULL | NULL | 0 |
| 6 | 1 | NULL | NULL | 1 |
Here’s the source table categories and some sample data:
CREATE TABLE categories(
ArticleID INTEGER NOT NULL
,ThemeID INTEGER NOT NULL
,ThemeName VARCHAR(7) NOT NULL
,Code BIT NOT NULL
,CreatedTime VARCHAR(7) NOT NULL
);
INSERT INTO categories(ArticleID,ThemeID,ThemeName,Code,CreatedTime) VALUES (1,1,'tob',1,'57:30.7');
INSERT INTO categories(ArticleID,ThemeID,ThemeName,Code,CreatedTime) VALUES (1,2,'ecig',1,'03:58.3');
INSERT INTO categories(ArticleID,ThemeID,ThemeName,Code,CreatedTime) VALUES (1,5,'rnorm',0,'42:56.5');
INSERT INTO categories(ArticleID,ThemeID,ThemeName,Code,CreatedTime) VALUES (2,1,'tob',1,'57:30.7');
INSERT INTO categories(ArticleID,ThemeID,ThemeName,Code,CreatedTime) VALUES (2,2,'ecig',0,'03:58.3');
INSERT INTO categories(ArticleID,ThemeID,ThemeName,Code,CreatedTime) VALUES (2,5,'rnorm',1,'42:56.5');
INSERT INTO categories(ArticleID,ThemeID,ThemeName,Code,CreatedTime) VALUES (2,6,'raddict',0,'42:59.8');
INSERT INTO categories(ArticleID,ThemeID,ThemeName,Code,CreatedTime) VALUES (3,1,'tob',1,'57:30.7');
INSERT INTO categories(ArticleID,ThemeID,ThemeName,Code,CreatedTime) VALUES (3,2,'ecig',0,'03:58.3');
INSERT INTO categories(ArticleID,ThemeID,ThemeName,Code,CreatedTime) VALUES (3,5,'rnorm',0,'42:56.5');
INSERT INTO categories(ArticleID,ThemeID,ThemeName,Code,CreatedTime) VALUES (21,1,'tob',1,'57:30.7');
INSERT INTO categories(ArticleID,ThemeID,ThemeName,Code,CreatedTime) VALUES (21,2,'ecig',0,'03:58.3');
INSERT INTO categories(ArticleID,ThemeID,ThemeName,Code,CreatedTime) VALUES (21,5,'rnorm',0,'42:56.5');
INSERT INTO categories(ArticleID,ThemeID,ThemeName,Code,CreatedTime) VALUES (21,6,'raddict',0,'42:59.8');
And here’s the table containing the category names (mytable for now):
CREATE TABLE mytable(
CatID INTEGER NOT NULL PRIMARY KEY
,CatName VARCHAR(7) NOT NULL
,CreatedTime DATETIME NOT NULL
);
INSERT INTO mytable(CatID,CatName,CreatedTime) VALUES (1,'tob','2015-03-12 10:07:54.173');
INSERT INTO mytable(CatID,CatName,CreatedTime) VALUES (2,'ecig','2015-05-18 11:48:16.297');
INSERT INTO mytable(CatID,CatName,CreatedTime) VALUES (4,'tobwcom','2015-06-19 11:12:01.537');
INSERT INTO mytable(CatID,CatName,CreatedTime) VALUES (5,'rnorm','2015-06-22 14:24:02.317');
INSERT INTO mytable(CatID,CatName,CreatedTime) VALUES (6,'raddict','2015-06-22 14:24:13.957');
INSERT INTO mytable(CatID,CatName,CreatedTime) VALUES (7,'ecit','2015-06-22 14:26:18.437');
What I need is a way to reproduce the pivot’s ability to show which categories have no data in categories for a given article. The output would be something like:
| AID | tob | ecit | tobwcom | rnorm |
|-----------|-----------|------------|---------------|-------------|
| 1 | 1 | NULL | NULL | 0 |
| 2 | 1 | NULL | NULL | 1 |
Or the list of AIDs and the CatNames that don’t have any values. Such as:
| AID | CatName |
|-----|---------|
| 1 | ecit |
| 1 | tobwcom |
| 2 | ecit |
| 2 | tobwcom |
I have tried
select distinct(AID) FROM [categories]
where [CatName] not in ( 'ecit', 'tobwcom')
but the numbers in the results don't seem to add up, although this could be an error on my part.
I'm not sure whether this would be fast enough for such a huge table, but for the second expected result something like this could help find the missing combinations:
select a.ArticleID, c.CatName
from #myarticles a
cross join #mycategories c
left join categories ca on (ca.ArticleID = a.ArticleID and ca.ThemeID = c.CatID)
where ca.ArticleID is null;
A test can be found here
Note that this method benefits from a combined primary key index on (ArticleID, ThemeID).
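The query above reads from two temp tables that are not defined in this post. A minimal sketch of how they might be built, together with the composite key mentioned in the note, could look like the following. It assumes that every article has at least one row in categories, that mytable holds the category list, that (ArticleID, ThemeID) is unique, and that categories has no primary key yet. The constraint name PK_categories is made up for illustration.

```sql
-- Assumption: one row per article that appears in categories.
SELECT DISTINCT ArticleID
INTO   #myarticles
FROM   categories;

-- Assumption: mytable is the list of category names.
SELECT CatID, CatName
INTO   #mycategories
FROM   mytable;

-- Hypothetical constraint name. This fails if (ArticleID, ThemeID)
-- contains duplicates or if the table already has a primary key.
ALTER TABLE categories
ADD CONSTRAINT PK_categories PRIMARY KEY CLUSTERED (ArticleID, ThemeID);
```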
As an alternative, the LEFT JOIN with a NULL check can be changed to a NOT EXISTS.
select a.ArticleID, c.CatName
from #myarticles a
join #mycategories c on c.CatID between 1 and 7
where NOT EXISTS
(
select 1
from categories ca
where ca.ArticleID = a.ArticleID
and ca.ThemeID = c.CatID
);
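For the first expected result, the PIVOT can also be written as conditional aggregation, which is plain SQL. This is only a sketch: it uses the column names from the CREATE TABLE statement (ArticleID, ThemeName) rather than the AID and CatName names in the original pivot, and it lists only the four categories shown in the sample output. It is not guaranteed to be faster than PIVOT, so compare the execution plans before replacing anything.

```sql
SELECT  ca.ArticleID AS AID,
        -- Code is a BIT column, so it is converted to INT before SUM.
        -- With no ELSE branch, a category with no rows yields NULL,
        -- which matches the PIVOT output.
        SUM(CASE WHEN ca.ThemeName = 'tob'     THEN CONVERT(INT, ca.Code) END) AS tob,
        SUM(CASE WHEN ca.ThemeName = 'ecit'    THEN CONVERT(INT, ca.Code) END) AS ecit,
        SUM(CASE WHEN ca.ThemeName = 'tobwcom' THEN CONVERT(INT, ca.Code) END) AS tobwcom,
        SUM(CASE WHEN ca.ThemeName = 'rnorm'   THEN CONVERT(INT, ca.Code) END) AS rnorm
        -- add one line per remaining category
FROM    categories AS ca
GROUP BY ca.ArticleID
ORDER BY ca.ArticleID;
```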
I have a table "temp"
author | title | bibkey | Data
-----------------------------------
John | JsPaper | John2008 | 65
Kate | KsPaper | | 60
| | Data2015 | 80
From this I want to produce two tables, a 'sample_table' and a 'ref_table' like so:
sample_table:
sample_id|ref_id| data
--------------------------
1 | 1 | 65
2 | 2 | 60
3 | 3 | 80
ref_table:
ref_id | author | title | bibkey
--------------------------------------
1 | John | JsPaper | John2008
2 | Kate | KsPaper |
3 | | | Data2015
I've created both tables
CREATE TABLE ref_table (
ref_id serial PRIMARY KEY,
author text,
title text,
bibkey text
);
CREATE TABLE sample_table (
sample_id serial PRIMARY KEY,
ref_id integer REFERENCES ref_table(ref_id),
data numeric
);
And inserted the unique author, title, bibkey rows into the reference table as above. What I want to do now is join temp to ref_table to get the ref_ids for sample_table. My insert statement is currently:
INSERT INTO sample_table (
ref_id,data
)
SELECT ref.ref_id, t.data
FROM
temp t
LEFT JOIN
ref_table ref ON COALESCE(ref.author,'00000') = COALESCE(t.author,'00000')
AND COALESCE(ref.title,'00000') = COALESCE(t.title,'00000')
AND COALESCE(ref.bibkey,'00000') = COALESCE(t.bibkey,'00000');
However, I really want a conditional join rather than matching on all three columns as I do now:
If a bibkey exists for that row, I know it is unique, so join only on that.
If bibkey is NULL, then join on both author and title for the unique pair, and not on bibkey.
Is this possible?
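One way the two rules above might be expressed is with an OR between two mutually exclusive branches in the ON clause. This is a sketch for PostgreSQL, assuming the blank cells in temp are real NULLs rather than empty strings. The extra ref.bibkey IS NULL test in the second branch is an added assumption, meant to stop a row without a bibkey from also matching a reference row that has the same author and title plus a bibkey.

```sql
INSERT INTO sample_table (ref_id, data)
SELECT ref.ref_id, t.data
FROM temp t
LEFT JOIN ref_table ref
  ON (    t.bibkey IS NOT NULL          -- rule 1: bibkey present, join on it alone
      AND ref.bibkey = t.bibkey)
  OR (    t.bibkey IS NULL              -- rule 2: no bibkey, join on author + title
      AND ref.bibkey IS NULL
      AND ref.author IS NOT DISTINCT FROM t.author   -- NULL-safe equality
      AND ref.title  IS NOT DISTINCT FROM t.title);
```

An OR in a join condition can prevent index use on large tables; splitting the statement into two inserts, one per rule, is an alternative if that turns out to matter.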
I am looking for advice on the best way to accomplish the following:
I have a table in SQL Server that holds downloaded data from an external system. I need to use it to update another database. Some records will be inserts and others will be updates. There is a comment table and a main table to insert/update. Comments are linked by an ID created in the comments table and stored in a column of the main table record. (one to one relationship)
For an insert, I insert into the comment table, get the SCOPE_IDENTITY() return value, and then use that as part of the insert statement for the main table.
For an update, I get the comment ID from the record in the main table and then update the comment table and the main table where necessary.
E.g. the table has 5 records:
Get first record
If exists in database
get commentID column from comment table and update comment and main table
If not exists
insert into comment table and return comment ID and insert the record into the main table with that comment ID
get the next record
I'm struggling to figure out how to best do this in SQL Server. Can't find the right combination of cursor, while loops, stored procedure etc. Haven't done much by way of procedural work in SQL Server.
Any advice/help is greatly appreciated
Thanks
Habo. I appreciate the feedback. I do struggle to write a clear concise question. The linked page provides good advice. Hope this script below helps clarify.
Thanks again.
USE TEMPDB
--TABLE TO HOLD JOB RECORDS
create table tbl_jobs(
jobnumber varchar(16) primary key clustered,
jobdesc varchar(50),
commentID int
)
GO
INSERT INTO tbl_jobs VALUES ('Job1','Desc1', 1)
INSERT INTO tbl_jobs VALUES ('Job2','Desc2', 2)
INSERT INTO tbl_jobs VALUES ('Job3','Desc3', 3)
--TABLE TO HOLD JOB RECORD COMMENTS
create table tbl_jobComments(
commentID INT IDENTITY(1,1) NOT NULL,
comment text
)
GO
Insert into tbl_jobComments VALUES ('Comment1')
Insert into tbl_jobComments VALUES ('Comment2')
Insert into tbl_jobComments VALUES ('Comment3')
--TABLE TO HOLD RECORDS DOWNLOADED FROM EXTERNAL SYSTEM
create table tbl_updates(
jobnumber varchar(16) primary key clustered,
jobdesc varchar(50),
comment text
)
GO
INSERT INTO tbl_updates VALUES ('Job1','Desc1Modified', 'Comment1')
INSERT INTO tbl_updates VALUES ('Job2','Desc2', 'Comment2a')
INSERT INTO tbl_updates VALUES ('Job3','Desc3Modified', 'Comment3')
INSERT INTO tbl_updates VALUES ('Job4','Desc4', 'Comment4')
GO
--OUTPUT FROM tbl_Jobs
+-----------+---------+-----------+
| jobnumber | jobdesc | commentID |
+-----------+---------+-----------+
| Job1 | Desc1 | 1 |
| Job2 | Desc2 | 2 |
| Job3 | Desc3 | 3 |
+-----------+---------+-----------+
--OUTPUT FROM tbl_JobComments
+-----------+----------+
| commentID | comment |
+-----------+----------+
| 1 | Comment1 |
| 2 | Comment2 |
| 3 | Comment3 |
+-----------+----------+
--OUTPUT FROM tbl_updates
+-----------+---------------+-----------+
| jobnumber | jobdesc | comment |
+-----------+---------------+-----------+
| Job1 | Desc1Modified | Comment1 |
| Job2 | Desc2 | Comment2a |
| Job3 | Desc3Modified | Comment3 |
| Job4 | Desc4 | Comment4 |
+-----------+---------------+-----------+
--DESIRED RESULTS tbl_jobs
+-----------+-----------------+-----------+
| jobnumber | jobdesc | commentID |
+-----------+-----------------+-----------+
| Job1 | Desc1Modified | 1 |
| Job2 | Desc2 | 2 |
| Job3 | Desc3Modified | 3 |
| Job4 | Desc4 | 4 |
+-----------+-----------------+-----------+
--DESIRED RESULTS tbl_jobComments
+-----------+-----------+
| commentID | comment |
+-----------+-----------+
| 1 | Comment1 |
| 2 | Comment2a |
| 3 | Comment3 |
| 4 | Comment4 |
+-----------+-----------+
You can break this into 2 statements, an update and an insert.
(This assumes there is only 1 comment per ID.) First update the rows that already exist:
UPDATE mt
SET mt.Comment = upd.comment
FROM maintable mt
JOIN updatestable upd
ON mt.id = upd.id
then insert what is missing:
INSERT INTO maintable (id,comment)
SELECT id, comment
FROM updatestable
WHERE id NOT IN (SELECT id FROM maintable)
I have 2 tables with details as follows
Table 1
Name | City | Employee_Id
-----------------
Raj | CA | A2345
Diya | IL | A1234
Max | PL | A2321
Anna | TX | A1222
Luke | DC | A5643
Table 2
Name | City | Employee_Id | Phone | Age
---------------------------------------
Raj | CA | A2345 | 4094 | 25
Diya | IL | A1234 | 4055 | 19
Max | PL | A2321 | 4076 | 23
As you can see, Employee_Id is the common column in both tables. I want to copy all the entries present in table 1 into table 2.
Raj, Diya and Max are already present in Table 2, so those 3 entries should be skipped rather than duplicated. Anna and Luke are not present in table 2, so they should be added as new rows.
The SQL should be able to merge these 2 tables and ignore the rows which are already present. The final table 2 must be similar to this:
Table 2
Name | City | Employee_Id | Phone | Age
---------------------------------------
Raj | CA | A2345 | 4094 | 25
Diya | IL | A1234 | 4055 | 19
Max | PL | A2321 | 4076 | 23
Anna | TX | A1222 | |
Luke | DC | A5643 | |
Is there a way I could achieve this? I am pretty new to SQL, so any inputs would be of great help. I read about merge and update feature but I guess merge is in Transact-SQL. Also read about joins but could not find a way to crack this.
Demo Setup
CREATE TABLE Table1
([Name] varchar(4), [City] varchar(2), [Employee_Id] varchar(5));
INSERT INTO Table1
([Name], [City], [Employee_Id])
VALUES
('Raj', 'FL', 'A2345'),
('Diya', 'IL', 'A1234'),
('Max', 'PL', 'A2321'),
('Anna', 'TX', 'A1222'),
('Luke', 'DC', 'A5643');
CREATE TABLE Table2
([Name] varchar(4), [City] varchar(2), [Employee_Id] varchar(5), [Phone] int, [Age] int);
INSERT INTO Table2
([Name], [City], [Employee_Id], [Phone], [Age])
VALUES
('Raj', 'CA', 'A2345', 4094, 25),
('Diya', 'IL', 'A1234', 4055, 19),
('Max', 'PL', 'A2321', 4076, 23);
MERGE QUERY
MERGE Table2 AS target
USING Table1 AS source
ON (target.[Employee_Id] = source.[Employee_Id])
WHEN MATCHED THEN
UPDATE SET [Name] = source.[Name],
[City] = source.[City]
WHEN NOT MATCHED THEN
INSERT ([Name], [City], [Employee_Id], [Phone], [Age])
VALUES (source.[Name], source.[City], source.[Employee_Id], NULL, NULL);
SELECT *
FROM Table2
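Note that the MERGE above also updates Name and City for rows that already exist. If existing rows should be left untouched, as the question asks, a plain INSERT with NOT EXISTS might be enough. This sketch uses the Table1 and Table2 definitions from the demo setup and relies on Phone and Age being nullable, so they are left as NULL for the new rows.

```sql
-- Insert only the employees from Table1 that are not yet in Table2.
INSERT INTO Table2 ([Name], [City], [Employee_Id])
SELECT s.[Name], s.[City], s.[Employee_Id]
FROM Table1 AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM Table2 AS t
    WHERE t.[Employee_Id] = s.[Employee_Id]
);

SELECT *
FROM Table2;
```

This form does not depend on MERGE, so it should also work on database engines other than SQL Server.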
I'm trying to write a stored procedure which takes in a table type parameter and inserts into two tables at once.
I have an entity table which is a base table holding the id for various tables, below is the entity table and a sample Site table.
------ Entity Table ------------------------------------------
| Id | bigint | NOT NULL | IDENTITY(1,1) | PRIMARY KEY
| TypeId | tinyint | NOT NULL |
| Updated | datetime | NULL |
| Created | datetime | NOT NULL |
| IsActive | bit | NOT NULL |
------- Site Table ---------------------------------------
| EntityId | bigint | NOT NULL | PRIMARY KEY
| ProductTypeCode | nvarchar(8) | NOT NULL | PRIMARY KEY
| SupplierCode | nvarchar(8) | NOT NULL | PRIMARY KEY
| Name | nvarchar(128) | NOT NULL |
| Description | nvarchar(max) | NULL |
And here is my table type used to pass into the stored procedure
------- Site Table Type ----------------------------------
| EntityTypeId | tinyint | NOT NULL |
| ProductTypeCode | nvarchar(8) | NOT NULL | PRIMARY KEY
| SupplierCode | nvarchar(8) | NOT NULL | PRIMARY KEY
| Name | nvarchar(128) | NOT NULL |
| Description | nvarchar(max) | NULL |
The idea is that I will pass in a table type parameter into the stored procedure and insert multiple rows at once to save looping inserting one row at a time.
Here's what I have so far
CREATE PROCEDURE InsertSites
@Sites SiteTypeTable READONLY
AS
BEGIN
-- Insert into Entity & Site Tables here, using the Id from the Entity Table in the Site table
INSERT INTO Entity (TypeId, Updated, Created, IsActive)
OUTPUT [inserted].[Id], S.ProductTypeCode, S.SupplierCode, S.Name, S.Description
INTO Site
SELECT EntityTypeId, NULL, GETDATE(), 1
FROM @Sites S
END
I've read about using insert and output together but cannot get this to work. I've also read about merge but also cannot get this to work.
Any help or pointers you can give will be greatly appreciated.
Thanks
Neil
---- Edit ----
Could I do something like this? I'm not sure how to finish this off...
CREATE PROCEDURE InsertSites
@Sites SiteTypeTable READONLY
AS
BEGIN
-- First insert enough rows into Entity table, saving the inserted Ids to a table variable
DECLARE @InsertedOutput TABLE (EntityId bigint)
INSERT INTO Entity (TypeId, Updated, Created, IsActive)
OUTPUT [inserted].[id]
INTO @InsertedOutput
SELECT EntityTypeId, NULL, GETDATE(), 1
FROM @Sites S
-- Use the Ids in @InsertedOutput against the rows in @Sites to insert into Sites
END
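One way the edit above might be finished: the OUTPUT clause of an INSERT cannot reference columns of the source rows, but the OUTPUT clause of a MERGE can. A MERGE whose ON condition never matches therefore behaves like an INSERT that can also return the source key next to each new Id. The sketch below assumes the table and type definitions shown earlier and that (ProductTypeCode, SupplierCode) is unique in the table type, as its primary key suggests. The extra columns in @InsertedOutput are additions for illustration, not part of the original attempt.

```sql
CREATE PROCEDURE InsertSites
    @Sites SiteTypeTable READONLY
AS
BEGIN
    SET NOCOUNT ON;

    -- Maps each new Entity Id back to the source row that produced it.
    DECLARE @InsertedOutput TABLE (
        EntityId        bigint      NOT NULL,
        ProductTypeCode nvarchar(8) NOT NULL,
        SupplierCode    nvarchar(8) NOT NULL
    );

    -- ON 1 = 0 never matches, so every source row is inserted.
    MERGE INTO Entity AS tgt
    USING @Sites AS src
        ON 1 = 0
    WHEN NOT MATCHED THEN
        INSERT (TypeId, Updated, Created, IsActive)
        VALUES (src.EntityTypeId, NULL, GETDATE(), 1)
    OUTPUT inserted.Id, src.ProductTypeCode, src.SupplierCode
        INTO @InsertedOutput (EntityId, ProductTypeCode, SupplierCode);

    -- Use the captured Ids to insert the matching Site rows.
    INSERT INTO Site (EntityId, ProductTypeCode, SupplierCode, Name, Description)
    SELECT o.EntityId, s.ProductTypeCode, s.SupplierCode, s.Name, s.Description
    FROM @InsertedOutput AS o
    JOIN @Sites AS s
        ON  s.ProductTypeCode = o.ProductTypeCode
        AND s.SupplierCode    = o.SupplierCode;
END
```

The Ids are captured in a table variable rather than written straight into Site because OUTPUT INTO places restrictions on its target table, for example it cannot take part in a foreign key, which Site.EntityId presumably does. Wrapping both statements in a transaction would be a reasonable addition if a partial insert must never be visible.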