rSQL While Loop insert - sql-server

*Updated - Please see below(Past the picture)
I am really stuck with this particular problem, I have two tables, Projects and Project Allocations, they are joined by the Project ID.
My goal is to populate a modified projects table's columns using the rows of the project allocations table. I've included an image below to illustrate what I'm trying to achieve.
A project can have up to 6 Project Allocations. Each Project Allocation has an Auto increment ID (Allocation ID) but I can't use this ID in a sub-selects because it isn't in a range of 1-6 so I can distinguish between who is the first PA2 and who is PA3.
Example:
(SELECT pa1.name FROM table where project.projectid = project_allocations.projectid and JVID = '1') as [PA1 Name],
(SELECT pa2.name FROM table where project.projectid = project_allocations.projectid and JVID = '1') as [PA2 Name],
The modified Projects table has columns for PA1, PA2, PA3. I need to populate these columns based on the project allocations table. So the first record in the database FOR EACH project will be PA1.
I've put together an SQL Agent job that drops and re-creates this table with the added columns so this is more about writing the project allocation row's into the modified projects table by row_num?
Any advice?
--Update
What I need to do now is to get the row_number added as a column for EACH project in order of DESC.
So the first row for each project ID will be 1 and for each row after that will be 2,3,4,5,6.
I've found the following code on this website:
use db_name
with cte as
(
select *
, new_row_id=ROW_NUMBER() OVER (ORDER BY eraprojectid desc)
from era_project_allocations_m
where era_project_allocations_m.eraprojectid = era_project_allocations_m.eraprojectid
)
update cte
set row_id = new_row_id
update cte
set row_id = new_row_id
I've added row_id as a column in the previous SQL Agent step and this code and it runs but it doesn't produce me a row_number FOR EACH projectid.
As you can see from the above image; I need to have 1-2 FOR Each project ID - effectively giving me thousands of 1s, 2s, 3s, 4s.
That way I can sort them into columns :)

From what I can tell a query using row number is what you are after. (Also, it might be a pivot table..)
Example:
create table Something (
someId int,
someValue varchar(255)
);
insert into Something values (1, 'one'), (1, 'two'), (1, 'three'), (1, 'four'), (2, 'ein'), (2, 'swei'), (3, 'un')
with cte as (
select someId,
someValue,
row_number() over(partition by someId order by someId) as rn
from Something
)
select distinct someId,
(select someValue from cte where ct.someId = someId and rn = 1) as value1,
(select someValue from cte where ct.someId = someId and rn = 2) as value2,
(select someValue from cte where ct.someId = someId and rn = 3) as value3,
(select someValue from cte where ct.someId = someId and rn = 4) as value4
into somethingElse
from cte ct;
select * from somethingElse;
Result:
someId value1 value2 value3 value4
1 one two three four
2 ein swei NULL NULL
3 un NULL NULL NULL

Related

Efficiently query for the latest version of a record using SQL

I need to query a table for the latest version of a record for all available dates (end of day time-series). The example below illustrates what I am trying to achieve.
My question is whether the table's design (primary key, etc.) and the LEFT OUTER JOIN query is accomplishing this goal in the most efficient manner.
CREATE TABLE [PriceHistory]
(
[RowID] [int] IDENTITY(1,1) NOT NULL,
[ItemIdentifier] [varchar](10) NOT NULL,
[EffectiveDate] [date] NOT NULL,
[Price] [decimal](12, 2) NOT NULL,
CONSTRAINT [PK_PriceHistory]
PRIMARY KEY CLUSTERED ([ItemIdentifier] ASC, [RowID] DESC, [EffectiveDate] ASC)
)
INSERT INTO [PriceHistory] VALUES ('ABC','2016-03-15',5.50)
INSERT INTO [PriceHistory] VALUES ('ABC','2016-03-16',5.75)
INSERT INTO [PriceHistory] VALUES ('ABC','2016-03-16',6.25)
INSERT INTO [PriceHistory] VALUES ('ABC','2016-03-17',6.05)
INSERT INTO [PriceHistory] VALUES ('ABC','2016-03-18',6.85)
GO
SELECT
L.EffectiveDate, L.Price
FROM
[PriceHistory] L
LEFT OUTER JOIN
[PriceHistory] R ON L.ItemIdentifier = R.ItemIdentifier
AND L.EffectiveDate = R.EffectiveDate
AND L.RowID < R.RowID
WHERE
L.ItemIdentifier = 'ABC' and R.EffectiveDate is NULL
ORDER BY
L.EffectiveDate
Follow up: Table can contain thousands of ItemIdentifiers each with dacades worth of price data. Historical version of data needs to be preserved for audit reasons. Say I query the table and use the data in a report. I store #MRID = Max(RowID) at the time the report was generated. Now if the price for 'ABC' on '2016-03-16' is corrected at some later date, I can modify the query using #MRID and replicate the report that I ran earlier.
A slightly modified version of #SeanLange's answer will give you the last row per date, instead of per product:
with sortedResults as
(
select *
, ROW_NUMBER() over(PARTITION by ItemIdentifier, EffectiveDate
ORDER by ID desc) as RowNum
from PriceHistory
)
select ItemIdentifier, EffectiveDate, Price
from sortedResults
where RowNum = 1
order by 2
I assume you have more than 1 ItemIdentifier in your table. Your design is a bit problematic in that you are keeping versions of the data in your table. You can however do something like this quite easily to get the most recent one for each ItemIdentifier.
with sortedResults as
(
select *
, ROW_NUMBER() over(PARTITION by ItemIdentifier order by EffectiveDate desc) as RowNum
from PriceHistory
)
select *
from sortedResults
where RowNum = 1
Short answer, no.
You're hitting the same table twice, and possibly creating a looped table scan, depending on your existing indexes. In the best case, you're causing a looped index seek, and then throwing out most of the rows.
This would be the most efficient query for what you're asking.
SELECT
L.EffectiveDate,
L.Price
FROM
(
SELECT
L.EffectiveDate,
L.Price,
ROW_NUMBER() OVER (
PARTITION BY
L.ItemIdentifier,
L.EffectiveDate
ORDER BY RowID DESC ) RowNum
FROM [PriceHistory] L
WHERE L.ItemIdentifier = 'ABC'
) L
WHERE
L.RowNum = 1;

SQL Server Hierarchical Sum of column

I have my database design as per the diagram.
Category table is self referencing parent child relationship
Budget will have all the categories and amount define for each category
Expense table will have entries for categories for which the amount has been spend (consider Total column from this table).
I want to write select statement that will retrieve dataset with columns given below :
ID
CategoryID
CategoryName
TotalAmount (Sum of Amount Column of all children hierarchy From BudgetTable )
SumOfExpense (Sum of Total Column of Expense all children hierarchy from expense table)
I tried to use a CTE but was unable to produce anything useful. Thanks for your help in advance. :)
Update
I just to combine and simplify data I have created one view with the query below.
SELECT
dbo.Budget.Id, dbo.Budget.ProjectId, dbo.Budget.CategoryId,
dbo.Budget.Amount,
dbo.Category.ParentID, dbo.Category.Name,
ISNULL(dbo.Expense.Total, 0) AS CostToDate
FROM
dbo.Budget
INNER JOIN
dbo.Category ON dbo.Budget.CategoryId = dbo.Category.Id
LEFT OUTER JOIN
dbo.Expense ON dbo.Category.Id = dbo.Expense.CategoryId
Basically that should produce results like this.
This is an interesting problem. And I'm going to solve it with a hierarchyid. First, the setup:
USE tempdb;
IF OBJECT_ID('dbo.Hierarchy') IS NOT NULL
DROP TABLE dbo.[Hierarchy];
CREATE TABLE dbo.Hierarchy
(
ID INT NOT NULL PRIMARY KEY,
ParentID INT NULL,
CONSTRAINT [FK_parent] FOREIGN KEY ([ParentID]) REFERENCES dbo.Hierarchy([ID]),
hid HIERARCHYID,
Amount INT NOT null
);
INSERT INTO [dbo].[Hierarchy]
( [ID], [ParentID], [Amount] )
VALUES
(1, NULL, 100 ),
(2, 1, 50),
(3, 1, 50),
(4, 2, 58),
(5, 2, 7),
(6, 3, 10),
(7, 3, 20)
SELECT * FROM dbo.[Hierarchy] AS [h];
Next, to update the hid column with a proper value for the hiearchyid. I'll use a bog standard recursive cte for that
WITH cte AS (
SELECT [h].[ID] ,
[h].[ParentID] ,
CAST('/' + CAST(h.[ID] AS VARCHAR(10)) + '/' AS VARCHAR(MAX)) AS [h],
[h].[hid]
FROM [dbo].[Hierarchy] AS [h]
WHERE [h].[ParentID] IS NULL
UNION ALL
SELECT [h].[ID] ,
[h].[ParentID] ,
CAST([c].[h] + CAST(h.[ID] AS VARCHAR(10)) + '/' AS VARCHAR(MAX)) AS [h],
[h].[hid]
FROM [dbo].[Hierarchy] AS [h]
JOIN [cte] AS [c]
ON [h].[ParentID] = [c].[ID]
)
UPDATE [h]
SET hid = [cte].[h]
FROM cte
JOIN dbo.[Hierarchy] AS [h]
ON [h].[ID] = [cte].[ID];
Now that the heavy lifting is done, the results you want are almost trivially obtained:
SELECT p.id, SUM([c].[Amount])
FROM dbo.[Hierarchy] AS [p]
JOIN [dbo].[Hierarchy] AS [c]
ON c.[hid].IsDescendantOf(p.[hid]) = 1
GROUP BY [p].[ID];
After much research and using test data, I was able to get the running totals starting from bottom of hierarchy.
The solution is made up of two steps.
Create a scalar-valued function that will decide whether a categoryId is a direct or indirect child of another categoryId. This is given in first code-snippet. Note that a recursive query is used for this since that is the best approach when dealing with hierarchy in SQL Server.
Write the running total query that will give totals according to your requirements for all categories. You can filter by category if you wanted to on this query. The second code snippet provides this query.
Scalar-valued function that tells if a child category is a direct or indirect child of another category
CREATE FUNCTION dbo.IsADirectOrIndirectChild(
#childId int, #parentId int)
RETURNS int
AS
BEGIN
DECLARE #isAChild int;
WITH h(ParentId, ChildId)
-- CTE name and columns
AS (
SELECT TOP 1 #parentId, #parentId
FROM dbo.Category AS b
UNION ALL
SELECT b.ParentId, b.Id AS ChildId
FROM h AS cte
INNER JOIN
Category AS b
ON b.ParentId = cte.ChildId AND
cte.ChildId IS NOT NULL)
SELECT #isAChild = ISNULL(ChildId, 0)
FROM h
WHERE ChildId = #childId AND
ParentId <> ChildId
OPTION(MAXRECURSION 32000);
IF #isAChild > 0
BEGIN
SET #isAChild = 1;
END;
ELSE
BEGIN
SET #isAChild = 0;
END;
RETURN #isAChild;
END;
GO
Query for running total starting from bottom of hierarchy
SELECT c.Id AS CategoryId, c.Name AS CategoryName,
(
SELECT SUM(ISNULL(b.amount, 0))
FROM dbo.Budget AS b
WHERE dbo.IsADirectOrIndirectChild( b.CategoryId, c.Id ) = 1 OR
b.CategoryId = c.Id
) AS totalAmount,
(
SELECT SUM(ISNULL(e.total, 0))
FROM dbo.Expense AS e
WHERE dbo.IsADirectOrIndirectChild( e.CategoryId, c.Id ) = 1 OR
e.CategoryId = c.Id
) AS totalCost
FROM dbo.Category AS c;

How can I increment an integer field in a SQL Server database table only when another field changes value?

I have an integer field (essentially a counter) called PID in a sql server table called Projects that I want to increment only when the value of another field changes (Task). Initially PID=1 for all rows. The following query (which I got from one of your answers elsewhere) does exactly what I want but I need to update my table with the result and that I cannot figure out.
SELECT Task,
dense_rank() over(order by Task) PID
FROM dbo.Projects;
If I do something like
Update Projects
SET Projects.PID =(SELECT Task,
dense_rank() over(order by Task) PID
FROM dbo.Projects);
I get "The select list for the INSERT statement contains more items than the insert list. The number of SELECT values must match the number of INSERT columns." How can I update my table with a query that gives me what I want?
This is the table design:
CREATE TABLE [dbo].[Projects]
(
[PID] [int] NULL
, [TID] [int] IDENTITY(1,1) NOT NULL
, [Project] [nvarchar](127) NOT NULL
, [Task] [nvarchar](127) NOT NULL
, [Dollars] [decimal](18, 0) NOT NULL
, [TaskLead] [nvarchar](127) NULL
) ON [PRIMARY];
I populate the table with
INSERT INTO dbo.Projects(Project, Task, Dollars, TaskLead)
SELECT Project + ' ' + ProjectDescription, Task + ' ' + TaskDescription, Dollars, TaskLead
FROM TM1_1
ORDER BY Project ASC, Task ASC;
E.g. data:
PID TID Project Task
1 1 Prj1 Tsk11
1 2 Prj1 Tsk12
2 1 Prj2 Tsk21
I want to update the table such that all projects that are the same have the same PID. I am now trying:
use mbt_tm1;
;WITH cteRank AS (
SELECT PID, DENSE_RANK() OVER ( PARTITION BY Project ORDER BY Project ASC) AS Calculated_Rank
FROM Projects )
UPDATE cteRank
SET PID = Calculated_Rank
Without knowing more about your exact question, perhaps this would work?
Update Projects
SET Projects.PID = dense_rank() over(order by Task);
To help with your question, I've put together a little sample that I think shows what you are asking. If this does not reflect your environment, please add details to your question.
CREATE TABLE Projects
(
ProjectID INT
, Task INT
, PID INT
);
GO
INSERT INTO Projects (ProjectID, Task, PID) VALUES (1, 1, 1);
INSERT INTO Projects (ProjectID, Task, PID) VALUES (2, 1, 2);
INSERT INTO Projects (ProjectID, Task, PID) VALUES (3, 1, 3);
INSERT INTO Projects (ProjectID, Task, PID) VALUES (4, 2, 1);
UPDATE Projects
SET Projects.PID = (
SELECT MAX(PID) + 1
FROM Projects p
WHERE P.Task = Projects.Task
);
SELECT *
FROM Projects;
[EDIT # 2]
Unless I am misunderstanding your requirements, this seems a simple way to update the PID for all associated projects:
UPDATE Projects
SET PID = (SELECT MAX(PID) FROM Projects P WHERE P.Project = Projects.Project);
You have to return only one value from the inner query and you're looking for a rank:
Update Projects
SET Projects.PID = X.Calculated_Rank
FROM Project P
INNER JOIN
(SELECT P1.PID, DENSE_RANK() OVER ( PARTITION BY P1.Project ORDER BY P1.Project ASC) AS Calculated_Rank
FROM Projects P1) X
ON X.PID = P.PID
This is what worked:
;WITH cte AS (
SELECT PID,
DENSE_RANK() OVER (
ORDER BY Project ASC) AS Calculated_Rank
FROM Projects
)
UPDATE cte SET PID = Calculated_Rank
It transformed a query that showed the result, namely:
SELECT Project,
dense_rank() over(order by Project) PID
FROM dbo.Projects;
into one that actually updated the table. Thank you so much for your support!
sorry to say but your question is very unclear. yet the original SELECT query you told is working for you and you want to use it for update i see that update is wrong.
correct update syntex would be something like this.
update PID in table with new Rank where Task is matching.
UPDATE t1 SET t1.Projects.PID = t2.pid
FROM Projects t1
JOIN
(
SELECT Task ,DENSE_RANK() OVER ( ORDER BY Task ) PID
FROM Projects
)t2
ON t1.Task=t2.task
EDIT:-1
this query is to get the output as you explained and shown in example.
no need to use CTE.
set nocount on
declare #prj table
(
pid int null
,tid int null
,prj sysname
,tsk sysname
)
insert into #prj (prj,tsk)
select 'prj1','tsk11'
union all select 'prj1','tsk12'
union all select 'prj2','tsk21'
union all select 'prj2','tsk22'
update t1 set t1.pID=t2.pID, t1.tID=t2.tID
from #prj t1
join
(
select dense_RANK() over (order by prj) as pID
,ROW_NUMBER() over (partition by prj order by prj,tsk) as tID
,prj,tsk
from #prj
)t2
on t1.prj=t2.prj and t1.tsk=t2.tsk
select * from #prj

SQL Newbie Needs Assistance w/ Query

Below is what I am trying to do in SQL Server 2012. I want to Update Table 2 with the % of total that each AMT value is to the total in Table 1. But the denominator to get the % should only be the total of the rows that have the same MasterDept. I can use this SELECT query to get the correct percentages when I load the table with only one MasterDept but do not know how to do it when there are multiple MasterDept. The first 2 columns in each table are identical both in structure and the data within the columns.
SELECT ABCID,
[AMT%] = ClientSpreadData.AMT/CONVERT(DECIMAL(16,4),(SELECT SUM(ClientSpreadData.AMT)
FROM ClientSpreadData))
FROM ClientSpreadData
Table data
TABLE 1 (MasterDept varchar(4), ABCID varchar(20), AMT INT)
Sample Data (4700, 1, 25),
(4300, 2, 30),
(4700, 3, 50),
(4300, 4, 15)
TABLE 2 (MasterDept varchar(4), ABCID varchar(20), [AMT%] INT)
Sample Data (4700, 1, AMT%)
AMT% should equal AMT / SUM(AMT). SUM(AMT) should only be summing the values where the MasterDept on Table 1 matches the MasterDept from the record on Table 2.
Does that make sense?
You can use a window to get a partitioned SUM():
SELECT MasterDept, ABCID, AMT, SUM(AMT) OVER(PARTITION BY MasterDept)
FROM #Table1
You can use that to get the percentage for each row to update your second table (this assumes 1 row per MasterDept/ABCID combination):
UPDATE A
SET A.[AMT%] = B.[AMT%]
FROM Table2 A
JOIN (SELECT MasterDept
, ABCID
, AMT
, CASE WHEN SUM(AMT) OVER(PARTITION BY MasterDept) = 0 THEN 0
ELSE AMT*1.0/SUM(AMT) OVER(PARTITION BY MasterDept)
END 'AMT%'
FROM #Table1
) B
ON A.MasterDept = B.MasterDept
AND A.ABCID = B.ABCID
As you can see in the subquery, a percent of total can be added to your Table1, so perhaps you don't even need Table2 as it's a bit redundant.
Update: You can use a CASE statement to handle a SUM() of 0.

SQL Select set of records from one table, join each record to top 1 record of second table matching 1 column, sorted by a column in the second table

This is my first question on here, so I apologize if I break any rules.
Here's the situation. I have a table that lists all the employees and the building to which they are assigned, plus training hours, with ssn as the id column, I have another table that list all the employees in the company, also with ssn, but including name, and other personal data. The second table contains multiple records for each employee, at different points in time. What I need to do is select all the records in the first table from a certain building, then get the most recent name from the second table, plus allow the result set to be sorted by any of the columns returned.
I have this in place, and it works fine, it is just very slow.
A very simplified version of the tables are:
table1 (ssn CHAR(9), buildingNumber CHAR(7), trainingHours(DEC(5,2)) (7200 rows)
table2 (ssn CHAR(9), fName VARCHAR(20), lName VARCHAR(20), sequence INT) (708,000 rows)
The sequence column in table 2 is a number that corresponds to a predetermined date to enter these records, the higher number, the more recent the entry. It is common/expected that each employee has several records. But several may not have the most recent(i.e. '8').
My SProc is:
#BuildingNumber CHAR(7), #SortField VARCHAR(25)
BEGIN
DECLARE #returnValue TABLE(ssn CHAR(9), buildingNumber CAHR(7), fname VARCHAR(20), lName VARCHAR(20), rowNumber INT)
INSERT INTO #returnValue(...)
SELECT(ssn,buildingNum,fname,lname,rowNum)
FROM SELECT(...,CASE #SortField Row_Number() OVER (PARTITION BY buildingNumber ORDER BY {sortField column} END AS RowNumber)
FROM table1 a
OUTER APPLY(SELECT TOP 1 fName,lName FROM table2 WHERE ssn = a.ssn ORDER BY sequence DESC) AS e
where buildingNumber = #BuildingNumber
SELECT * from #returnValue ORDER BY RowNumber
END
I have indexes for the following:
table1: buildingNumber(non-unique,nonclustered)
table2: sequence_ssn(unique,nonclustered)
Like I said this gets me the correct result set, but it is rather slow. Is there a better way to go about doing this?
It's not possible to change the database structure or the way table 2 operates. Trust me if it were it would be done. Are there any indexes I could make that would help speed this up?
I've looked at the execution plans, and it has a clustered index scan on table 2(18%), then a compute scalar(0%), then an eager spool(59%), then a filter(0%), then top n sort(14%).
That's 78% of the execution so I know it's in the section to get the names, just not sure of a better(faster) way to do it.
The reason I'm asking is that table 1 needs to be updated with current data. This is done through a webpage with a radgrid control. It has a range, start index, all that, and it takes forever for the users to update their data.
I can change how the update process is done, but I thought I'd ask about the query first.
Thanks in advance.
I would approach this with window functions. The idea is to assign a sequence number to records in the table with duplicates (I think table2), such as the most recent records have a value of 1. Then just select this as the most recent record:
select t1.*, t2.*
from table1 t1 join
(select t2.*,
row_number() over (partition by ssn order by sequence desc) as seqnum
from table2 t2
) t2
on t1.ssn = t1.ssn and t2.seqnum = 1
where t1.buildingNumber = #BuildingNumber;
My second suggestion is to use a user-defined function rather than a stored procedure:
create function XXX (
#BuildingNumber int
)
returns table as
return (
select t1.ssn, t1.buildingNum, t2.fname, t2.lname, rowNum
from table1 t1 join
(select t2.*,
row_number() over (partition by ssn order by sequence desc) as seqnum
from table2 t2
) t2
on t1.ssn = t1.ssn and t2.seqnum = 1
where t1.buildingNumber = #BuildingNumber;
);
(This doesn't have the logic for the ordering because that doesn't seem to be the central focus of the question.)
You can then call it as:
select *
from dbo.XXX(<building number>);
EDIT:
The following may speed it up further, because you are only selecting a small(ish) subset of the employees:
select *
from (select t1.*, t2.*, row_number() over (partition by ssn order by sequence desc) as seqnum
from table1 t1 join
table2 t2
on t1.ssn = t1.ssn
where t1.buildingNumber = #BuildingNumber
) t
where seqnum = 1;
And, finally, I suspect that the following might be the fastest:
select t1.*, t2.*, row_number() over (partition by ssn order by sequence desc) as seqnum
from table1 t1 join
table2 t2
on t1.ssn = t1.ssn
where t1.buildingNumber = #BuildingNumber and
t2.sequence = (select max(sequence) from table2 t2a where t2a.ssn = t1.ssn)
In all these cases, an index on table2(ssn, sequence) should help performance.
Try using some temp tables instead of the table variables. Not sure what kind of system you are working on, but I have had pretty good luck. Temp tables actually write to the drive so you wont be holding and processing so much in memory. Depending on other system usage this might do the trick.
Simple define the temp table using #Tablename instead of #Tablename. Put the name sorting subquery in a temp table before everything else fires off and make a join to it.
Just make sure to drop the table at the end. It will drop the table at the end of the SP when it disconnects, but it is a good idea to make tell it to drop to be on the safe side.

Resources