SQL Pivot with multiple joins

The SQL PIVOT command seems difficult, to say the least. I've read a lot about it and have been tinkering with this query for a while, but all I get are obscure error messages that don't help, like "The column name 'Id' was specified multiple times.." or "The multi-part identifier X could not be bound."
Our database collects client answers to questions. I'd like to create a table which contains a row for each client and one column per question (ID) they've answered, holding the AVG ResponseTime across all of that user's login sessions. This is made more difficult because the UserId isn't stored directly in the UserSessionData table. It's stored in the UserSession table, so I have to do a join first, which seems to complicate the issue.
The tables I'm trying to pivot are roughly of the following form:
CREATE TABLE [dbo].[UserSessionData](
[Id] [int] IDENTITY(1,1) NOT NULL,
[UserSessionId] [int] NOT NULL,
[UserWasCorrect] [bit] NULL,
[ResponseTime] [float] NULL,
[QuestionId] [int] NULL)
--This table contains user answers to a number of questions.
CREATE TABLE [dbo].[UserSession](
[Id] [int] IDENTITY(1,1) NOT NULL,
[UserId] [int] NOT NULL,
[SessionCode] [nvarchar](50) NOT NULL)
--This table contains details of the user's login session.
CREATE TABLE [dbo].[Question](
[Id] [int] IDENTITY(1,1) NOT NULL,
[QuestionText] [nvarchar](max) NOT NULL,
[GameId] [int] NOT NULL,
[Description] [nvarchar](max) NULL)
--This table contains question details
I'll continue trying to mangle a solution, but if anyone can shed any light (or suggest an easier method than PIVOT to achieve the desired result), then that would be great.
Cheers

It's because you've got the same column names in multiple tables, so after the join the PIVOT sees several columns with the same name. Have a look at my example below:
SELECT
    *
FROM (
    SELECT
        usd.Id AS usdId
        ,UserSessionId
        ,UserWasCorrect
        ,ResponseTime
        ,QuestionId
        ,us.Id AS usId
        ,SessionCode
        ,UserId
        ,Description
        ,GameId
        ,qu.Id AS quId
        ,QuestionText
    FROM #UserSessionData usd
    LEFT JOIN #UserSession us
        ON usd.UserSessionId = us.Id
    LEFT JOIN #Question qu
        ON usd.QuestionId = qu.Id
) AS tbl PIVOT (
    -- As no example data was provided, 'quest' and 'voyage' are just random values I put in. Change the pivot statement to match what you want
    MIN(ResponseTime) FOR SessionCode IN ([quest], [voyage])
) AS pvt
'quest' and 'voyage' are example values of the kind found in the SessionCode column; change them to match your own data. In PIVOT and UNPIVOT you cannot use a query to supply these values; they have to be written out statically. You could use dynamic SQL to generate the list, although this is usually heavily advised against.
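For the result the question actually asks for (one row per user, one column per question, average response time), a minimal sketch might look like the following. It assumes the dbo tables from the question and that question IDs 1, 2 and 3 exist; those IDs and the Q1..Q3 column aliases are made up for illustration. PIVOT groups by every column of the derived table that is not aggregated or pivoted, so the inner query selects only UserId, QuestionId and ResponseTime.

```sql
-- Row per user, column per question, AVG(ResponseTime) in each cell.
-- Question IDs 1, 2, 3 and the Q1..Q3 aliases are assumptions.
SELECT pvt.UserId,
       pvt.[1] AS Q1,
       pvt.[2] AS Q2,
       pvt.[3] AS Q3
FROM (
    SELECT us.UserId, usd.QuestionId, usd.ResponseTime
    FROM dbo.UserSessionData AS usd
    INNER JOIN dbo.UserSession AS us
        ON us.Id = usd.UserSessionId
) AS src
PIVOT (
    AVG(ResponseTime) FOR QuestionId IN ([1], [2], [3])
) AS pvt
ORDER BY pvt.UserId;

-- The same result without PIVOT, using conditional aggregation.
SELECT us.UserId,
       AVG(CASE WHEN usd.QuestionId = 1 THEN usd.ResponseTime END) AS Q1,
       AVG(CASE WHEN usd.QuestionId = 2 THEN usd.ResponseTime END) AS Q2,
       AVG(CASE WHEN usd.QuestionId = 3 THEN usd.ResponseTime END) AS Q3
FROM dbo.UserSessionData AS usd
INNER JOIN dbo.UserSession AS us
    ON us.Id = usd.UserSessionId
GROUP BY us.UserId
ORDER BY us.UserId;
```

A user who never answered a given question gets NULL in that column in both versions.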

Related

How to delete documents from Filetable?

I am trying to delete some documents from a SQL Server FileTable.
I have one table in which I store all my attachments' details, and the documents themselves are stored in a SQL Server FileTable named Attachments.
The AttachmentDetails table has the following schema:
CREATE TABLE [dbo].[AttachmentDetails](
[Id] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
[DocumentName] [nvarchar](max) NULL,
[DocumentType] [nvarchar](max) NULL,
[ModifiedDateTime] [datetime] NOT NULL,
[CreatedDateTime] [datetime] NOT NULL,
[CreatedBy] [nvarchar](254) NULL,
[ModifiedBy] [nvarchar](254) NULL,
[IsDeleted] [bit] NULL,
)
Whenever I upload a document to the FileTable, I insert that document's details into the AttachmentDetails table as per the table schema.
Here is the solution I have tried:
CREATE PROCEDURE [dbo].[DeleteFiles]
AS
BEGIN
DELETE Attachments
FROM AttachmentDetails a
WHERE
DocumentType = 'video/mp4' AND DATEDIFF(day, a.CreatedDateTime, GETDATE())<11
end
This procedure is supposed to delete only video/mp4 files that are more than 10 days old, but it deletes documents of every type from the FileTable.

SQL is a set-based language. For almost every cursor- or loop-based script there's a far simpler and faster set-based solution. In any case, the way this query is written would result in random deletions, since there's no guarantee what all those TOP 1 queries will return without an ORDER BY clause.
It looks like you're trying to delete all video attachments older than 30 days. It also looks like the date is stored in a separate table called table1. You can write a DELETE statement whose rows come from a JOIN if you use the FROM clause, e.g.:
DELETE Attachments
FROM Attachments inner join table1 a on a.ID=Attachments.ID
WHERE
DocumentType = 'video/mp4' AND
CreatedDateTime < DATEADD(day,-30,getdate())
EDIT
The original query contained DATEADD(day,30,getdate()) when it should be DATEADD(day,-30,getdate())
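One way to check a joined DELETE like the one above before it removes anything is to run the same FROM and WHERE clauses as a SELECT inside a transaction and roll back. This is only a sketch against the Attachments and table1 names used in this answer; the att alias is made up for illustration.

```sql
BEGIN TRANSACTION;

-- Preview: the rows the DELETE below is expected to remove.
SELECT att.ID, att.DocumentType, a.CreatedDateTime
FROM Attachments AS att
INNER JOIN table1 AS a
    ON a.ID = att.ID
WHERE att.DocumentType = 'video/mp4'
  AND a.CreatedDateTime < DATEADD(day, -30, GETDATE());

-- Same join and filter, now as a DELETE on the aliased target table.
DELETE att
FROM Attachments AS att
INNER JOIN table1 AS a
    ON a.ID = att.ID
WHERE att.DocumentType = 'video/mp4'
  AND a.CreatedDateTime < DATEADD(day, -30, GETDATE());

SELECT @@ROWCOUNT AS DeletedRows;

-- Switch to COMMIT TRANSACTION once the preview and the count agree.
ROLLBACK TRANSACTION;
```

Without the join condition, every row of Attachments is paired with every row of the other table, which is why the procedure in the question deleted documents of every type.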
Example
Assuming we have those two tables :
create table attachments (ID int primary key,DocumentType nvarchar(100))
insert into attachments (ID,DocumentType)
values
(1,'video/mp4'),
(2,'audio/mp3'),
(3,'application/octet-stream'),
(4,'video/mp4')
and
create table table1 (ID int primary key, CreatedDateTime datetime)
insert into table1 (ID,CreatedDateTime)
values
(1,dateadd(day,-40,getdate())),
(2,dateadd(day,-40,getdate())),
(3,getdate()),
(4,getdate())
Executing the DELETE query will only delete the attachment with ID=1. The query
```
select *
from Attachments
```
will return:
```
ID DocumentType
2 audio/mp3
3 application/octet-stream
4 video/mp4
```

Dynamic SQL to execute large number of rows from a table

I have a table with a very large number of rows which I wish to execute via dynamic SQL. They are basically existence checks and insert statements and I want to migrate data from one production database to another - we are merging transactional data. I am trying to find the optimal way to execute the rows.
I've found the COALESCE method of appending all the rows to one another inefficient for this, particularly when more than ~100 rows are executed at a time.
Assume the structure of the source table is something arbitrary like this:
CREATE TABLE [dbo].[MyTable]
(
[ID] [int] IDENTITY(1,1) NOT NULL,
[DataField1] [int] NOT NULL,
[FK_ID1] [int] NOT NULL,
[LotsMoreFields] [NVARCHAR] (MAX),
CONSTRAINT [PK_MyTable] PRIMARY KEY CLUSTERED ([ID] ASC)
)
CREATE TABLE [dbo].[FK1]
(
[ID] [int] IDENTITY(1,1) NOT NULL,
[Name] [int] NOT NULL, -- Unique constrained value
CONSTRAINT [PK_FK1] PRIMARY KEY CLUSTERED ([ID] ASC)
)
The other requirement is I am tracking the source table PK vs the target PK and whether an insert occurred or whether I have already migrated that row to the target. To do this, I'm tracking migrated rows in another table like so:
CREATE TABLE [dbo].[ChangeTracking]
(
[ReferenceID] BIGINT IDENTITY(1,1),
[Src_ID] BIGINT,
[Dest_ID] BIGINT,
[TableName] NVARCHAR(255),
CONSTRAINT [PK_ChangeTracking] PRIMARY KEY CLUSTERED ([ReferenceID] ASC)
)
My existing method is executing some dynamic sql generated by a stored procedure. The stored proc does PK lookups as the source system has different PK values for table [dbo].[FK1].
E.g.
IF NOT EXISTS (<ignore this existence check for now>)
BEGIN
INSERT INTO [Dest].[dbo].[MyTable] ([DataField1],[FK_ID1],[LotsMoreFields]) VALUES (333,(SELECT [ID] FROM [Dest].[dbo].[FK1] WHERE [Name]=N'ValueFoundInSource'),N'LotsMoreValues');
INSERT INTO [Dest].[dbo].[ChangeTracking] ([Src_ID],[Dest_ID],[TableName]) VALUES (666,SCOPE_IDENTITY(),N'MyTable'); --666 is the PK in [Src].[dbo].[MyTable] for this inserted row
END
So when you have a million of these, it isn't quick.
Is there a recommended performant way of doing this?

As mentioned, the MERGE statement works well when you're looking at a complex JOIN condition (if any of these fields are different, update the record to match). You can also look into creating a HASHBYTES hash of the entire record to quickly find differences between source and target tables, though that can also be time-consuming on very large data sets.
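As a rough illustration of the HASHBYTES idea, the sketch below lists migrated rows whose content differs between source and target. It assumes the [Src] and [Dest] databases and the ChangeTracking table from the question, pairs rows through Src_ID and Dest_ID, and leaves FK_ID1 out of the hash because those keys differ between the two systems. Note that before SQL Server 2016, HASHBYTES input is limited to 8000 bytes, which matters for an NVARCHAR(MAX) column.

```sql
-- Migrated rows whose hashed content no longer matches.
-- The '|' separator keeps adjacent values from running together.
SELECT ct.Src_ID, ct.Dest_ID
FROM [Dest].[dbo].[ChangeTracking] AS ct
INNER JOIN [Src].[dbo].[MyTable] AS s
    ON s.ID = ct.Src_ID
INNER JOIN [Dest].[dbo].[MyTable] AS d
    ON d.ID = ct.Dest_ID
WHERE ct.TableName = N'MyTable'
  AND HASHBYTES('SHA2_256', CONCAT(s.DataField1, N'|', s.LotsMoreFields))
   <> HASHBYTES('SHA2_256', CONCAT(d.DataField1, N'|', d.LotsMoreFields));
```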

It sounds like you're making these updates like a front-end developer, by checking each row for a match and then doing the insert. It will be far more efficient to do the inserts with a single query. Below is an example that looks for names that are in the tblNewClient table, but not in the tblClient table:
INSERT INTO tblClient
( [Name] ,
TypeID ,
ParentID
)
SELECT nc.[Name] ,
nc.TypeID ,
nc.ParentID
FROM tblNewClient nc
LEFT JOIN tblClient cl
ON nc.[Name] = cl.[Name]
WHERE cl.ID IS NULL;
This will be far more efficient than doing it RBAR (row by agonizing row).

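Taking the two answers from @RusselFox and putting them together, I reached this tentative solution, which looks a LOT more efficient:
MERGE INTO [Dest].[dbo].[MyTable] AS [MT_D]
USING (
    -- Translate the source FK into the destination FK by matching on the unique [Name]
    SELECT [MT_S].[ID] AS [SrcID], [MT_S].[DataField1], [FK1_D].[ID] AS [FK_ID1], [MT_S].[LotsMoreFields]
    FROM [Src].[dbo].[MyTable] AS [MT_S]
    JOIN [Src].[dbo].[FK1] AS [FK1_S] ON [MT_S].[FK_ID1] = [FK1_S].[ID]
    JOIN [Dest].[dbo].[FK1] AS [FK1_D] ON [FK1_S].[Name] = [FK1_D].[Name]
) AS [SRC] ON 1 = 0 -- never matches, so every source row takes the INSERT branch
WHEN NOT MATCHED THEN
    INSERT ([DataField1], [FK_ID1], [LotsMoreFields])
    VALUES ([SRC].[DataField1], [SRC].[FK_ID1], [SRC].[LotsMoreFields])
-- Unlike INSERT ... OUTPUT, MERGE lets OUTPUT reference source columns such as [SrcID].
-- [AlreadyExists] is not in the ChangeTracking definition shown in the question; it is assumed to have been added.
OUTPUT [SRC].[SrcID], INSERTED.[ID], 0, N'MyTable'
INTO [Dest].[dbo].[ChangeTracking] ([Src_ID], [Dest_ID], [AlreadyExists], [TableName]);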

Inserting into a joined view SQL Server

This is a question more about design than about solving a problem.
I created three tables as such
CREATE TABLE [CapInvUser](
[UserId] [int] IDENTITY(1,1) NOT NULL,
[Name] [varchar](150) NOT NULL,
[AreaId] [int] NULL,
[Account] [varchar](150) NULL,
[mail] [varchar](150) NULL,
[UserLevelId] [int] NOT NULL
)
CREATE TABLE [CapInvUserLevel](
[UserLevelId] [int] IDENTITY(1,1) NOT NULL,
[Level] [varchar](50) NOT NULL
)
CREATE TABLE [CapInvUserRegistry](
[UserRegistryId] [int] IDENTITY(1,1) NOT NULL,
[UserLevelId] int NOT NULL,
[DateRegistry] DATE NOT NULL,
[RegistryStatus] VARCHAR(50) NOT NULL,
)
I also have a view that shows all the data from the first table, with "AreaId" resolved to the varchar identifier from its own table, the UserLevelId resolved to the varchar Level value from its table, and the registry status joined in from the last table.
Right now when I want to register a new user, I insert into all three tables using separate queries, but I feel like I should have a way to insert into all of them at the same time.
I thought about using a stored procedure to insert, but I still don't know if that would be appropriate.
My questions are:
"Is there a more appropriate way of doing this?"
"Is there a way to create a view that will let me insert over it? (without passing the int value manually)"
--These are just representations of the tables, not the real ones.
-- I'm still learning how to work with SQL Server properly.
Thank you for your answers and/or guidance.

The most common way of doing this, in my experience, is to write a stored procedure that does all three inserts in the necessary order to create the FK relationships.
This would be my unequivocal recommendation.
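A minimal sketch of such a procedure, using the three tables from the question, might look like this. The procedure name, its parameters, and the choice to look a level up by its text (creating it if missing) are all assumptions made for illustration, not part of the original schema.

```sql
-- Hypothetical procedure: name, parameters and lookup logic are illustrative.
CREATE PROCEDURE RegisterCapInvUser
    @Name           varchar(150),
    @Level          varchar(50),
    @RegistryStatus varchar(50),
    @AreaId         int          = NULL,
    @Account        varchar(150) = NULL,
    @Mail           varchar(150) = NULL
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;  -- roll the whole transaction back on a run-time error

    BEGIN TRANSACTION;

    -- Resolve the level text to its int key so the caller never passes the int.
    DECLARE @UserLevelId int;
    SELECT @UserLevelId = UserLevelId
    FROM CapInvUserLevel
    WHERE [Level] = @Level;

    IF @UserLevelId IS NULL
    BEGIN
        INSERT INTO CapInvUserLevel ([Level]) VALUES (@Level);
        SET @UserLevelId = SCOPE_IDENTITY();
    END;

    INSERT INTO CapInvUser ([Name], AreaId, Account, mail, UserLevelId)
    VALUES (@Name, @AreaId, @Account, @Mail, @UserLevelId);

    INSERT INTO CapInvUserRegistry (UserLevelId, DateRegistry, RegistryStatus)
    VALUES (@UserLevelId, CAST(GETDATE() AS date), @RegistryStatus);

    COMMIT TRANSACTION;
END;
```

It could then be called as EXEC RegisterCapInvUser @Name = 'Jane Doe', @Level = 'Admin', @RegistryStatus = 'Active'; with the sample values being placeholders.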

Row update if row exists. Insert it if row doesn't exist

I'm developing a solution for SQL Server 2012 (Express and Developer editions).
I will receive an XML document in a stored procedure. In the stored procedure I will parse the XML and insert its data into a table.
My problem is that this XML could contain data that already exists in the table, and I need to update the existing rows with the new data.
I don't want to check whether each row in the XML exists in the table.
I think I can use IGNORE_DUP_KEY, but I'm not sure.
How can I update or insert new data without checking it?
This is the table where I want to insert (or update) the new data:
CREATE TABLE [dbo].[CODES]
(
[ID_CODE] [bigint] IDENTITY(1,1) NOT NULL,
[CODE_LEVEL] [tinyint] NOT NULL,
[CODE] [nvarchar](20) NOT NULL,
[COMMISIONING_FLAG] [tinyint] NOT NULL,
[IS_TRANSMITTED] [bit] NOT NULL,
[TIMESPAN] [datetime] NULL,
[USERNAME] [nvarchar](50) NULL,
[SOURCE] [nvarchar](50) NULL,
[REASON] [nvarchar](200) NULL
CONSTRAINT [PK_CODES] PRIMARY KEY CLUSTERED
(
[CODE_LEVEL] ASC,
[CODE] ASC
)
)

The IGNORE_DUP_KEY option only skips inserting a new row if a row with the same key already exists; it does not update the existing row.
The solution to your request is MERGE, or separate DML operations (INSERT/UPDATE/DELETE).
BTW, IGNORE_DUP_KEY only checks for existing values in the index key (the indexed columns).
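As a hedged sketch of the MERGE approach against the CODES table from the question: the XML layout below (a codes root with code elements carrying level, value and flag attributes) is invented for illustration, since the question does not show the real document, and IS_TRANSMITTED is simply set to 0 for new rows.

```sql
-- The XML shape here is an assumption; adapt the paths to the real document.
DECLARE @xml xml = N'<codes>
  <code level="1" value="ABC123" flag="0" />
  <code level="2" value="XYZ789" flag="1" />
</codes>';

MERGE dbo.CODES AS tgt
USING (
    -- Shred the XML into one row per code element.
    SELECT c.value('@level', 'tinyint')      AS CODE_LEVEL,
           c.value('@value', 'nvarchar(20)') AS CODE,
           c.value('@flag',  'tinyint')      AS COMMISIONING_FLAG
    FROM @xml.nodes('/codes/code') AS x(c)
) AS src
    ON  tgt.CODE_LEVEL = src.CODE_LEVEL   -- match on the primary key columns
    AND tgt.CODE       = src.CODE
WHEN MATCHED THEN
    UPDATE SET tgt.COMMISIONING_FLAG = src.COMMISIONING_FLAG
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CODE_LEVEL, CODE, COMMISIONING_FLAG, IS_TRANSMITTED)
    VALUES (src.CODE_LEVEL, src.CODE, src.COMMISIONING_FLAG, 0);
```

The match runs once over the whole set, so no per-row existence check is written by hand.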

Sql server query using function and view is slower

I have a table with an XML column named Data:
CREATE TABLE [dbo].[Users](
[UserId] [int] IDENTITY(1,1) NOT NULL,
[FirstName] [nvarchar](max) NOT NULL,
[LastName] [nvarchar](max) NOT NULL,
[Email] [nvarchar](250) NOT NULL,
[Password] [nvarchar](max) NULL,
[UserName] [nvarchar](250) NOT NULL,
[LanguageId] [int] NOT NULL,
[Data] [xml] NULL,
[IsDeleted] [bit] NOT NULL,...
In the Data column there's this xml
<data>
<RRN>...</RRN>
<DateOfBirth>...</DateOfBirth>
<Gender>...</Gender>
</data>
Now, executing this query:
SELECT UserId FROM Users
WHERE data.value('(/data/RRN)[1]', 'nvarchar(max)') = @RRN
after clearing the cache takes (if I execute it a few times in a row) 910, 739, 630, 635, ... ms.
Now, a db specialist told me that adding a function and a view and changing the query would make searching for a user with a given RRN much faster. Instead, these are the results when I execute it with the db specialist's changes: 2584, 2342, 2322, 2383, ... ms.
This is the added function:
CREATE FUNCTION dbo.fn_Users_RRN(@data xml)
RETURNS nvarchar(100)
WITH SCHEMABINDING
AS
BEGIN
RETURN @data.value('(/data/RRN)[1]', 'varchar(max)');
END;
The added view:
CREATE VIEW vwi_Users
WITH SCHEMABINDING
AS
SELECT UserId, dbo.fn_Users_RRN(Data) AS RRN from dbo.Users
Indexes:
CREATE UNIQUE CLUSTERED INDEX cx_vwi_Users ON vwi_Users(UserId)
CREATE NONCLUSTERED INDEX cx_vwi_Users__RRN ON vwi_Users(RRN)
And then the changed query:
SELECT UserId FROM Users
WHERE dbo.fn_Users_RRN(Data) = @RRN
Why is the solution with a function and a view going slower?

The point of the view was to pre-compute the XML value into a regular column. To then use that precomputed value through the index on the view, shouldn't you actually query the view?
SELECT
UserId
FROM vwi_Users
WHERE RRN= '59021626919-61861855-S_FA1E11'
Also, make the index this:
CREATE NONCLUSTERED INDEX cx_vwi_Users__RRN ON vwi_Users(RRN) INCLUDE (UserId)
This is called a covering index, since all columns needed by the query are in the index.
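One caveat worth checking: outside Enterprise edition, SQL Server generally uses an indexed view's indexes only when the query names the view with the NOEXPAND table hint. A minimal sketch, assuming the view and its indexes from the question exist in the dbo schema and using a placeholder RRN value:

```sql
DECLARE @RRN nvarchar(100) = N'example-rrn';  -- placeholder value

-- NOEXPAND tells the optimizer to read the indexed view directly
-- instead of expanding it back into the base table and the function call.
SELECT UserId
FROM dbo.vwi_Users WITH (NOEXPAND)
WHERE RRN = @RRN;
```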

Have you tried adding that function result to your table (not a view) as a persisted computed column?
ALTER TABLE dbo.Users
ADD RRN AS dbo.fn_Users_RRN(Data) PERSISTED -- RRN is an illustrative column name
Doing so will extract that piece of information from the XML, store it in a computed, always up-to-date column, and the persisted flag makes it physically stored along side the other columns in your table.
If this works (the PERSISTED flag is a bit iffy in terms of all the limitations it has), then you should see nearly the same performance as querying any other string field on your table... and if the computed column is PERSISTED, you can even put an index on it if you feel the need for that.
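Put together, the whole sequence might be sketched as follows. The column name RRN and the index name IX_Users_RRN are made up for illustration, and the sketch assumes dbo.fn_Users_RRN is schema-bound and that SQL Server accepts it as deterministic, which persisting and indexing the column both require.

```sql
-- 1. Add the computed column (the name RRN is an assumption).
ALTER TABLE dbo.Users
    ADD RRN AS dbo.fn_Users_RRN(Data) PERSISTED;
GO

-- 2. Index it like any other column (index name is hypothetical).
CREATE NONCLUSTERED INDEX IX_Users_RRN
    ON dbo.Users (RRN);
GO

-- 3. Filter on the column itself rather than calling the function.
DECLARE @RRN nvarchar(100) = N'example-rrn';  -- placeholder value
SELECT UserId
FROM dbo.Users
WHERE RRN = @RRN;
```

GO is the batch separator used by SSMS and sqlcmd, not a T-SQL statement; it keeps each step in its own batch.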

Check the query execution plan and confirm whether or not the new query is even using the view. If the query doesn't use the view, that's the problem.
How does this query fare?
SELECT UserId FROM vwi_Users
WHERE RRN = '59021626919-61861855-S_FA1E11'
I see you're freely mixing nvarchar and varchar. Don't do that! It can cause implicit conversions that force full index scans (eeeeevil).

Scalar functions tend to perform very poorly in SQL Server. I'm not sure why making it a persisted computed column and indexing it doesn't give identical performance to a normal indexed column, but it may be because the UDF is still being called even though you'd expect that to be unnecessary once the data is computed.
I think you know this from another answer, but your final query wrongly calls the scalar UDF on every row (defeating the point of persisting the computation):
SELECT UserId FROM Users
WHERE dbo.fn_Users_RRN(Data) = @RRN
It should be
SELECT UserId FROM vwi_Users
WHERE RRN = @RRN

