I have a lot of data from an old system that defines each item's place in a Bill of Materials by the position of its row in a table.
The BoM data table coming from the old system looks like this:
ID level ItemNumber
1 1 TopItem
2 .2 FirstChildOfTop
3 .2 2ndChildofTop
4 .2 3ChildOfTop
5 ..3 1stChildof3ChildofTop
6 ..3 2ndChildof3ChildofTop
7 .2 4thChildofTop
8 ..3 1stChildof4ChildTop
9 ...4 1stChildof4ChildTop
10 ..3 2ndChildof4ChildofTop
11 .2 5thChildofTop
12 ..3 1stChildof5thChildofTop
13 ...4 1stChildof1stChildof5thChildofTop
14 ..3 2ndChildof5thChildofTop
15 1 2ndTopItem
16 1 3rdTopItem
In my example the IDs are consecutive. In the real data the IDs can have gaps, but they always run from lowest to highest, as that is how the hierarchy is defined.
By using some simple code to replace the level number with tabs we can get a visual hierarchy:
1 TopItem
2     FirstChildOfTop
3     2ndChildofTop
4     3ChildOfTop
5         1stChildof3ChildofTo
6         2ndChildof3ChildofTo
7     4thChildofTop
8         1stChildof4ChildTop
9             1stChildof4ChildTop
10         2ndChildof4ChildofTo
11     5thChildofTop
12         1stChildof5thChildof
13             1stChildof1stChildof
14         2ndChildof5thChildof
15 2ndTopItem
16 3rdTopItem
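The "simple code" mentioned above is not shown in the question. As a rough illustration only, the idea could be sketched in Python like this. The function name `indent_rows` and the `(id, level, item)` tuple layout are assumptions made for the example, not part of the original system.

```python
# Hypothetical helper: names and row layout are assumptions for illustration.
def indent_rows(rows, indent="\t"):
    """Turn (id, level, item) rows into indented text lines.

    `level` is the legacy string such as '1', '.2' or '..3';
    the digits after the dots give the depth.
    """
    lines = []
    for row_id, level, item in rows:
        depth = int(level.lstrip("."))          # '..3' -> 3
        lines.append(f"{row_id} {indent * (depth - 1)}{item}")
    return lines


sample = [
    (1, "1", "TopItem"),
    (2, ".2", "FirstChildOfTop"),
    (5, "..3", "1stChildof3ChildofTop"),
]

for line in indent_rows(sample):
    print(line)
```

Top-level rows get no indentation, and each deeper level gets one more copy of `indent`.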
I have about 5,000 of these lists, each between 25 and 55 thousand lines long, so I need some code to convert this hierarchy to a SQL Server hierarchyid so that we can query at any level in the list. At the moment, as I hope my explanation shows, you have to work down from the top to find out whether an item is at the 2nd, 3rd or some other level and whether it has any children. The items in the third column exist in a simple Item Master table, but their role in a BoM is defined in these tables only.
I'd offer some code, but all my attempts at conversion have failed miserably. I'd claim I'm OK at set-based queries.
The target is Microsoft SQL Server 2014.
The primary aim is to data warehouse the data, but also to enable people to find sub-assemblies and where-used information.
Edit:
In answer to Anthony Hancock's very pertinent question I did some work. Please consider the following
ID level ItemNumber sampH lft rgt
1 1 TopItem 1/2 2 28
2 .2 FirstChildOfTop 1/2/3 3 4
3 .2 2ndChildofTop 1/2/3 5 6
4 .2 3ChildOfTop 1/2/3 7 11
5 ..3 1stChildof3ChildofTop 2/3/4 8 9
6 ..3 2ndChildof3ChildofTop 2/3/4 10 11
7 .2 4thChildofTop 1/2/3 13 20
8 ..3 1stChildof4ChildTop 2/3/4 14 17
9 ...4 1stChildof4ChildTop 3/4/5 15 16
10 ..3 2ndChildof4ChildofTop 2/3/4 18 19
11 .2 5thChildofTop 1/2/3/ 20 25
12 ..3 1stChildof5thChildofTop 2/3/4 21 24
13 ...4 1stChildof1stChildof5thChildofTop 3/4/5 22 23
14 ..3 2ndChildof5thChildofTop 2/3/4 26 27
15 1 2ndTopItem 1/2 2 28
16 1 3rdTopItem 1/2 2 28
17 0 verytop 1/ 1 29
Apologies for the awful formatting
1) I have added at line 17 the item we are making, i.e. this BoM makes the 'verytop' item, so I have renumbered the 'level'
2) I have added in the 'sampH' column my hand-edited PathEnumeratedTree values
3) In the two columns 'lft' and 'rgt' I have added some NestedSets identifiers
Please forgive me if my hand-edited columns aren't correct.
My aim is to get a structure so that someone can query these very deep lists to find where an item sits in the tree and what its children are. So I'm open to whatever works.
My testing of the NestedSets - so far - has shown I can do stuff like this:
-- Children of a given parent ItemNumber
Select c.itemnumber, ' is child of 2ndTopItem'
from [dbo].[Sample] as p, [dbo].[Sample] as c
where (c.lft between p.lft and p.rgt)
and (c.lft <> p.lft)
and p.ItemNumber = '2ndTopItem'
But I am completely open to any suggestions on how to enumerate the tree structure.
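Since the hand-edited 'lft' and 'rgt' values may not be correct, here is a minimal Python sketch of how nested-set bounds could be derived from the ordered level numbers alone. It assumes the rows arrive in BoM order and that the level has already been converted to an integer depth. The function name `nested_sets` is made up for this example.

```python
def nested_sets(rows):
    """rows: iterable of (id, depth) in BoM order. Returns {id: (lft, rgt)}."""
    bounds = {}
    stack = []      # (id, depth, lft) for rows whose subtree is still open
    counter = 1
    for row_id, depth in rows:
        # A row at the same or a shallower depth closes every open subtree
        # that is at least as deep as the new row.
        while stack and stack[-1][1] >= depth:
            open_id, _, lft = stack.pop()
            bounds[open_id] = (lft, counter)
            counter += 1
        stack.append((row_id, depth, counter))
        counter += 1
    # Close whatever is still open at the end of the list.
    while stack:
        open_id, _, lft = stack.pop()
        bounds[open_id] = (lft, counter)
        counter += 1
    return bounds


example = [(1, 1), (2, 2), (3, 2), (4, 3), (5, 1)]
print(nested_sets(example))
```

With these bounds a row is a descendant of another row when its `lft` lies strictly between the other row's `lft` and `rgt`, which is the test the query above relies on.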
Try the following code:
declare @Source table (
    Id      int ,
    [Level] varchar(20) ,
    [Name]  varchar(50)
);
declare @Target table (
    Id       int ,
    [Level]  int ,
    [Name]   varchar(50) ,
    ParentId int ,
    Hid      hierarchyid ,
    primary key (Id),
    unique ([Level], Id),
    unique (ParentId, Id)
);
-- 1. The Test Data (Thanks Anthony Hancock for it)
insert into @Source
values
    ( 1 , '1' , 'TopItem' ),
    ( 2 , '.2' , 'FirstChildOfTop' ),
    ( 3 , '.2' , '2ndChildofTop' ),
    ( 4 , '.2' , '3ChildOfTop' ),
    ( 5 , '..3' , '1stChildof3ChildofTop' ),
    ( 6 , '..3' , '2ndChildof3ChildofTop' ),
    ( 7 , '.2' , '4thChildofTop' ),
    ( 8 , '..3' , '1stChildof4ChildTop' ),
    ( 9 , '...4' , '1stChildof4ChildTop' ),
    ( 10 , '..3' , '2ndChildof4ChildofTop' ),
    ( 11 , '.2' , '5thChildofTop' ),
    ( 12 , '..3' , '1stChildof5thChildofTop' ),
    ( 13 , '...4' , '1stChildof1stChildof5thChildofTop' ),
    ( 14 , '..3' , '2ndChildof5thChildofTop' ),
    ( 15 , '1' , '2ndTopItem' ),
    ( 16 , '1' , '3rdTopItem' );
-- 2. Insert the Test Data into the @Target table,
-- converting the Level column to the int data type
-- so it can be used as an indexed column in query # 3
-- (once there are millions of records, that index will be highly useful)
insert into @Target (Id, [Level], [Name])
select
    Id,
    [Level] = cast(replace([Level],'.','') as int),
    [Name]
from
    @Source;
-- 3. Calculate the ParentId column and update the @Target table
-- so ParentId can be used as an indexed column in query # 4.
-- The parent is the nearest preceding row with a lower level.
update t set
    ParentId = (
        select top 1 Id
        from @Target as p
        where p.Id < t.Id and p.[Level] < t.[Level]
        order by p.Id desc )
from
    @Target t;
-- 4. Calculate the Hid column
-- based on the ParentId link and in accordance with the Id order
with Recursion as
(
    select
        Id ,
        ParentId ,
        Hid = cast(
            concat(
                '/',
                row_number() over (order by Id),
                '/'
            )
            as varchar(1000)
        )
    from
        @Target
    where
        ParentId is null
    union all
    select
        Id = t.Id ,
        ParentId = t.ParentId ,
        Hid = cast(
            concat(
                r.Hid,
                row_number() over (partition by t.ParentId order by t.Id),
                '/'
            )
            as varchar(1000)
        )
    from
        Recursion r
        inner join @Target t on t.ParentId = r.Id
)
update t set
    Hid = r.Hid
from
    @Target t
    inner join Recursion r on r.Id = t.Id;
-- 5. See the result ordered by Hid
select
    Id ,
    [Level] ,
    [Name] ,
    ParentId ,
    Hid ,
    HidPath = Hid.ToString()
from
    @Target
order by
    Hid;
Read more about Combination of Id-ParentId and HierarchyId Approaches to Hierarchical Data
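To make the logic of steps 3 and 4 easier to follow, here is a small Python sketch of the same idea done in a single pass over the ordered rows. It is not a replacement for the T-SQL above, only an illustration: the parent of a row is the nearest preceding row with a lower level, and the path is the parent's path plus the row's ordinal among its siblings. The function name `enumerate_paths` is an assumption made for this example.

```python
def enumerate_paths(rows):
    """rows: (id, level_string, name) tuples ordered by id.

    Returns {id: (parent_id, path)} where path looks like '/1/2/'.
    """
    result = {}
    stack = []          # (id, depth, path): chain from a root to the previous row
    child_counts = {}   # parent_id (None for roots) -> children seen so far
    for row_id, level, _name in rows:
        depth = int(level.replace(".", ""))
        # Drop rows that are not ancestors of the current row.
        while stack and stack[-1][1] >= depth:
            stack.pop()
        if stack:
            parent_id, parent_path = stack[-1][0], stack[-1][2]
        else:
            parent_id, parent_path = None, "/"
        child_counts[parent_id] = child_counts.get(parent_id, 0) + 1
        path = f"{parent_path}{child_counts[parent_id]}/"
        result[row_id] = (parent_id, path)
        stack.append((row_id, depth, path))
    return result


sample = [
    (1, "1", "TopItem"),
    (2, ".2", "FirstChildOfTop"),
    (3, ".2", "2ndChildofTop"),
    (4, "..3", "ChildOf2ndChild"),   # made-up name for the example
    (5, "1", "2ndTopItem"),
]
for row_id, (parent_id, path) in enumerate_paths(sample).items():
    print(row_id, parent_id, path)
```

The stack only ever holds the current chain of ancestors, so each row is pushed and popped at most once. The strings it produces have the same shape as the varchar paths built in step 4.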
Using your example data to create a test table and then creating parent IDs for each row, I think this is what you are after. The big caveat is that this is entirely dependent on your table being ordered correctly for the hierarchies, but I don't see any other options from the information provided.
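-- DROP TABLE IF EXISTS needs SQL Server 2016 or later; this form also works on 2014
IF OBJECT_ID('TEST', 'U') IS NOT NULL DROP TABLE TEST;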
CREATE TABLE TEST
(
ID INT
,[Level] VARCHAR(20)
,ItemNumber VARCHAR(50)
)
;
INSERT INTO TEST
(ID,[Level],ItemNumber)
VALUES
(1,'1','TopItem')
,(2,'.2','FirstChildOfTop')
,(3,'.2','2ndChildofTop')
,(4,'.2','3ChildOfTop')
,(5,'..3','1stChildof3ChildofTop')
,(6,'..3','2ndChildof3ChildofTop')
,(7,'.2','4thChildofTop')
,(8,'..3','1stChildof4ChildTop')
,(9,'...4','1stChildof4ChildTop')
,(10,'..3','2ndChildof4ChildofTop')
,(11,'.2','5thChildofTop')
,(12,'..3','1stChildof5thChildofTop')
,(13,'...4','1stChildof1stChildof5thChildofTop')
,(14,'..3','2ndChildof5thChildofTop')
,(15,'1','2ndTopItem')
,(16,'1','3rdTopItem')
;
SELECT T.*
,V.ParentID
FROM TEST AS T
OUTER APPLY
(
SELECT TOP 1 ID AS ParentID
FROM TEST AS _T
WHERE _T.ID < T.ID
-- Compare the levels as integers; as strings, '10' would sort before '2'
AND CAST(REPLACE(_T.[Level],'.','') AS INT) < CAST(REPLACE(T.[Level],'.','') AS INT)
ORDER BY _T.ID DESC
) AS V
ORDER BY T.ID
;
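-- DROP TABLE IF EXISTS needs SQL Server 2016 or later; this form also works on 2014
IF OBJECT_ID('TEST', 'U') IS NOT NULL DROP TABLE TEST;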
I have this little script that shall return the first number in a column of type int which is not used yet.
SELECT t1.plu + 1 AS plu
FROM tovary t1
WHERE NOT EXISTS (SELECT 1 FROM tovary t2 WHERE t2.plu = t1.plu + 1)
AND t1.plu > 0;
This returns unused numbers like
3
11
22
27
...
The problem is that when I run a simple select like
SELECT plu
FROM tovary
WHERE plu > 0
ORDER BY plu ASC;
the results are
1
2
10
20
...
Why isn't the first script returning some of the free numbers like 4, 5, 6 and so on?
Compiling a formal answer from the comments.
Credit to Larnu:
It seems what the OP really needs here is an (inline) Numbers/Tally (table) which they can then use a NOT EXISTS against their table.
Sample data
create table tovary
(
plu int
);
insert into tovary (plu) values
(1),
(2),
(10),
(20);
Solution
The tally table is isolated in a common table expression, First1000, which produces the numbers 1 to 1000. The amount of generated numbers can be scaled up as needed.
with First1000(n) as
(
select row_number() over(order by (select null))
from ( values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0) ) a(n) -- 10^1
cross join ( values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0) ) b(n) -- 10^2
cross join ( values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0) ) c(n) -- 10^3
)
select top 20 f.n as Missing
from First1000 f
where not exists ( select 'x'
from tovary
where plu = f.n);
top 20 is used in the query above to limit the output. This gives:
Missing
-------
3
4
5
6
7
8
9
11
12
13
14
15
16
17
18
19
21
22
23
24
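To see why the original query skips 4, 5, 6 and so on, it may help to compare the two approaches side by side. The Python sketch below is only an illustration of the logic, using the sample values 1, 2, 10 and 20. The function names are made up for this example.

```python
def successors_only(used):
    """Mirrors the original query: for each used value, report value + 1 if it is free."""
    taken = set(used)
    return sorted(v + 1 for v in taken if v > 0 and v + 1 not in taken)


def all_missing(used, upper):
    """Mirrors the tally approach: test every candidate from 1 to upper."""
    taken = set(used)
    return [n for n in range(1, upper + 1) if n not in taken]


plu = [1, 2, 10, 20]
print(successors_only(plu))   # only the first number of each gap
print(all_missing(plu, 12))   # every free number up to 12
```

The original query can only ever return `plu + 1` for a row that exists, so it reports the start of each gap and nothing else. The tally table supplies the candidates that have no row in the table at all.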
I have a data set produced by a UNION query that aggregates data from 2 sources.
I want to select that data based on whether data was found in only one of those sources, or in both.
The relevant parts of the data set look like this (there are a number of other columns):
row preference group position
1 1 111 1
2 1 111 2
3 1 111 3
4 1 135 1
5 1 135 2
6 1 135 3
7 2 111 1
8 2 135 1
The [preference] column combined with the [group] column is what I'm trying to filter on. I want to return all the rows that have the same [preference] as the MIN([preference]) for their [group].
The desired output given the data above would be rows 1 to 6.
The [preference] column indicates the original source of the data in the UNION query so a legitimate data set could look like:
row preference group position
1 1 111 1
2 1 111 2
3 1 111 3
4 2 111 1
5 2 135 1
In which case the desired output would be rows 1, 2, 3 and 5.
What I can't work out is how to do (not real code):
SELECT * WHERE [preference] = MIN([preference]) PARTITION BY [group]
One way to do this is using RANK:
SELECT row
, preference
, [group]
, position
FROM (
SELECT row
, preference
, [group]
, position
, RANK() OVER (PARTITION BY [group] ORDER BY preference) AS seq
FROM t) t2
WHERE seq = 1
Demo here
Should be doable via a simple inner join:
SELECT t1.*
FROM t AS t1
INNER JOIN (SELECT [group], MIN(preference) AS preference
FROM t
GROUP BY [group]
) t2 ON t1.[group] = t2.[group]
AND t1.preference = t2.preference
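Both answers apply the same rule: find the lowest preference per group, then keep only the rows that match it. As a rough illustration of that rule outside SQL, here is a Python sketch run against the second sample data set from the question. The function name `keep_min_preference` is an assumption made for this example.

```python
def keep_min_preference(rows):
    """rows: (row, preference, group, position) tuples.

    Keeps the rows whose preference equals the lowest preference in their group.
    """
    best = {}
    for _row, preference, group, _position in rows:
        best[group] = min(preference, best.get(group, preference))
    return [r for r in rows if r[1] == best[r[2]]]


data = [
    (1, 1, 111, 1),
    (2, 1, 111, 2),
    (3, 1, 111, 3),
    (4, 2, 111, 1),
    (5, 2, 135, 1),
]
print([r[0] for r in keep_min_preference(data)])
```

Group 135 only appears with preference 2, so row 5 is kept even though its preference is not 1.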
There are three columns. Wherever D_ID=13, Value_amount holds the mode of payment, and wherever D_ID=10, Value_amount holds the amount.
ID D_ID Value_amount
1 13 2
1 13 2
1 10 1500
1 10 1500
2 13 1
2 13 1
2 10 2000
2 10 2000
Now I have to add two more columns, amount and mode_of_payment, and the result should look like this:
ID amount mode_of_payment
1 1500 2
1 1500 2
2 2000 1
2 2000 1
This is too long for a comment.
Simply put, your data is severely flawed. For the example data you've given, you're "ok", because the rows have the same values for the same ID, but what about when they don't? Let's assume, for example, we have data that looks like this:
ID D_ID Value_amount
1 13 1 --1
1 13 2 --2
1 10 1500 --3
1 10 1000 --4
2 13 1 --5
2 13 2 --6
2 10 2000 --7
2 10 3000 --8
I've added a "row number" next to data, for demonstration purposes only.
Here, what row is row "1" related to? Row "3" or row "4"? How do you know? There's no always ascending value in your data, so row "3" could just as easily be row "4". In fact, if we were to order the data using ID ASC, D_ID DESC, Value_amount ASC then rows 3 and 4 would "swap" in order. This could mean that when you attempt a solution, the order is wrong.
Tables aren't stored in any particular order; they are unordered. What determines the order the data is presented in is the ORDER BY clause, and if you don't have a value to define that "order", then that "order" is lost as soon as you INSERT it.
If, however, we add an always ascending value to your data, you can achieve this.
CREATE TABLE dbo.YourTable (UID int IDENTITY,
ID int,
DID int,
Value_amount int);
GO
INSERT INTO dbo.YourTable (ID, DID, Value_amount)
VALUES (1,13,1 ),
(1,13,2 ),
(1,10,1500),
(1,10,1000),
(2,13,1 ),
(2,13,2 ),
(2,10,2000),
(2,10,3000);
GO
WITH RNs AS(
SELECT ID,
DID,
Value_amount,
ROW_NUMBER() OVER (PARTITION BY ID, DID ORDER BY UID ASC) AS RN
FROM dbo.YourTable)
SELECT ID,
-- DID 10 holds the amount and DID 13 holds the mode of payment
MAX(CASE DID WHEN 10 THEN Value_amount END) AS Amount,
MAX(CASE DID WHEN 13 THEN Value_amount END) AS PaymentMode
FROM RNs
GROUP BY RN,
ID;
GO
DROP TABLE dbo.YourTable;
Of course, you need to fix your design to implement this, but you need to do that anyway.
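The pairing idea is: number the rows within each (ID, D_ID) combination in insertion order, then join the amount and the mode of payment that share the same number. Here is a minimal Python sketch of that logic, assuming an always ascending `uid` exists as described above. The function name `pair_payments` is made up for this example.

```python
from collections import defaultdict


def pair_payments(rows):
    """rows: (uid, id, d_id, value) tuples. Returns a list of (id, amount, mode)."""
    seen = defaultdict(int)      # (id, d_id) -> rows numbered so far
    paired = defaultdict(dict)   # (id, row_number) -> {'amount': .., 'mode': ..}
    for _uid, id_, d_id, value in sorted(rows):   # sorted by uid = insertion order
        seen[(id_, d_id)] += 1
        row_number = seen[(id_, d_id)]
        if d_id == 10:
            paired[(id_, row_number)]["amount"] = value
        elif d_id == 13:
            paired[(id_, row_number)]["mode"] = value
    return [
        (id_, values.get("amount"), values.get("mode"))
        for (id_, _rn), values in sorted(paired.items())
    ]


data = [
    (1, 1, 13, 1), (2, 1, 13, 2), (3, 1, 10, 1500), (4, 1, 10, 1000),
    (5, 2, 13, 1), (6, 2, 13, 2), (7, 2, 10, 2000), (8, 2, 10, 3000),
]
print(pair_payments(data))
```

Without the `uid` there would be nothing reliable to sort on, which is the point made above about the missing ordering column.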
Imagine a table :
ID Month Year Value 1
1 May 17 58
2 June 09 42
3 December 18 58
4 December 18 58
5 September 10 84
6 May 17 42
7 January 16 3
I want to return all the data that shares the same month and year where Value 1 is different. So in our example, I want to return 1 and 6 only but not 3 and 4 or any of the other entries.
Is there a way to do this? I am thinking about a combination of distinct and group by but can't seem to come up with the right answer being new to SQL.
Thanks.
It could be done without grouping, with a simple self-join:
select distinct t1.*
from [Table] t1
inner join [Table] t2 on
t1.Month = t2.Month
and t1.Year = t2.Year
and t1.Value_1 <> t2.Value_1
You can find some information and self-join examples here and here.
For each row you can examine aggregates in its group with the OVER clause. eg:
create table #t(id int, month varchar(20), year int, value int)
insert into #t(id,month,year,value)
values
(1,'May' ,17, 58 ),
(2,'June' ,09, 42 ),
(3,'December' ,18, 58 ),
(4,'December' ,18, 58 ),
(5,'September',10, 84 ),
(6,'May' ,17, 42 ),
(7,'January' ,16, 3 );
with q as
(
select *,
min(value) over (partition by month,year) value_min,
max(value) over (partition by month,year) value_max
from #t
)
select id,month,year,value
from q
where value_min <> value_max;
If I understood your question correctly, you are looking for the HAVING keyword.
If you GROUP BY Month, Year, Value_1 HAVING COUNT(*) = 1, you get all combinations of Month, Year and Value_1 that have no other occurrence.
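To compare what the approaches above actually return on the sample table, here is a small Python sketch of the two rules. It is only an illustration, and the function names are assumptions made for this example.

```python
from collections import Counter, defaultdict

table = [
    (1, "May", 17, 58),
    (2, "June", 9, 42),
    (3, "December", 18, 58),
    (4, "December", 18, 58),
    (5, "September", 10, 84),
    (6, "May", 17, 42),
    (7, "January", 16, 3),
]


def conflicting_rows(rows):
    """Rows whose (month, year) group contains more than one distinct value."""
    values = defaultdict(set)
    for _id, month, year, value in rows:
        values[(month, year)].add(value)
    return [r for r in rows if len(values[(r[1], r[2])]) > 1]


def single_occurrence_combinations(rows):
    """(month, year, value) combinations that occur exactly once."""
    counts = Counter((month, year, value) for _id, month, year, value in rows)
    return [combo for combo, n in counts.items() if n == 1]


print([r[0] for r in conflicting_rows(table)])
print(single_occurrence_combinations(table))
```

On this data the first rule keeps ids 1 and 6, which is what the question asks for. The second rule also keeps the June, September and January combinations, because each of those occurs only once, so it would need an extra condition to match the requested output.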
I'm trying to write an incremental update statement using SQL Server 2012.
Current Data:
RecNo Budget_ID Item_Code Revision
---------------------------------------
1 16 xxx 2
2 16 xxx NULL
3 16 xxx NULL
12 19 yyy 3
13 19 yyy NULL
14 19 yyy NULL
15 19 yyy NULL
Expected result:
RecNo Budget_ID Item_Code Revision
---------------------------------------
1 16 xxx 2
2 16 xxx 1
3 16 xxx 0
12 19 yyy 3
13 19 yyy 2
14 19 yyy 1
15 19 yyy 0
However, with the following approach, I ended up with the result set below.
UPDATE a
SET a.Revision = (SELECT MIN(b.Revision)
FROM [dbo].[foo] b
WHERE b.item_code = a.item_code
AND b.budget_id = a.budget_id
GROUP BY b.item_code ) -1
FROM [dbo].[foo] a
WHERE a.Revision is NULL
Result:
RecNo Budget_ID Item_Code Revision
---------------------------------------
1 16 xxx 2
2 16 xxx 1
3 16 xxx 1
12 19 yyy 3
13 19 yyy 2
14 19 yyy 2
15 19 yyy 2
Can anyone help me to get this right?
Thanks in advance!
Try this:
;with cte as
(select *, row_number() over (partition by Budget_ID order by RecNo desc) rn from dbo.foo)
update cte
set Revision = rn - 1
Basically, since the Revision value seems to decrease as RecNo increases, we simply use the row_number() function to number each record within the subset of all records with a particular Budget_ID, sorted in descending order of RecNo. Since the least possible value of row_number() is 1, we subtract 1 so that the last record in the partition has Revision set to 0 instead of 1.
You may test the code here
I found this example from this link https://stackoverflow.com/a/13629639/1692632
First you select the MIN value into a variable, and then you can update the table while decreasing the variable at the same time.
DECLARE @table TABLE (ID INT, SomeData VARCHAR(10))
INSERT INTO @table (SomeData, ID) SELECT 'abc', 6 ;
INSERT INTO @table (SomeData) SELECT 'def' ;
INSERT INTO @table (SomeData) SELECT 'ghi' ;
INSERT INTO @table (SomeData) SELECT 'jkl' ;
INSERT INTO @table (SomeData) SELECT 'mno' ;
INSERT INTO @table (SomeData) SELECT 'prs' ;
DECLARE @i INT = (SELECT ISNULL(MIN(ID),0) FROM @table)
UPDATE @table
SET ID = @i, @i = @i - 1
WHERE ID IS NULL
SELECT *
FROM @table
I'm not sure if this will do the trick but you can try with
Update top(1) a
SET a.Revision = (Select MIN(b.Revision)
FROM [dbo].[foo] b where b.item_code = a.item_code and b.budget_id = a.budget_id
group by b.item_code ) -1
FROM [dbo].[foo] a
WHERE a.Revision is NULL
and repeat until there are no changes left
Update Data
set Revision = x.Revision
from
(select RecNo, Budget_ID, Item_Code, case when Revision is null then ROW_NUMBER() over(partition by Budget_ID order by RecNo desc) - 1 else Revision end Revision
from Data
) x
where x.RecNo = data.RecNo
You basically use ROW_NUMBER() to count backwards for each Budget_ID, and use that row number minus 1 where Revision is null. This is basically the same as Shree's answer, just without the CTE.
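The counting-backwards rule can be checked against the sample data from the question with a short Python sketch. This only illustrates the logic. It assumes RecNo is unique across the table, and the function name `fill_revisions` is made up for this example.

```python
from collections import defaultdict


def fill_revisions(rows):
    """rows: (rec_no, budget_id, item_code, revision_or_None) tuples.

    Fills missing revisions with (row number within the budget, counted
    from the highest rec_no) minus 1. Assumes rec_no is unique.
    """
    by_budget = defaultdict(list)
    for rec_no, budget_id, _item, _rev in rows:
        by_budget[budget_id].append(rec_no)

    row_number = {}
    for rec_nos in by_budget.values():
        for rn, rec_no in enumerate(sorted(rec_nos, reverse=True), start=1):
            row_number[rec_no] = rn

    return [
        (rec_no, budget_id, item, rev if rev is not None else row_number[rec_no] - 1)
        for rec_no, budget_id, item, rev in rows
    ]


foo = [
    (1, 16, "xxx", 2), (2, 16, "xxx", None), (3, 16, "xxx", None),
    (12, 19, "yyy", 3), (13, 19, "yyy", None), (14, 19, "yyy", None),
    (15, 19, "yyy", None),
]
print([row[3] for row in fill_revisions(foo)])
```

The rule produces the expected result here only because the existing revision on the first record of each budget happens to equal the number of records after it. If that did not hold, the filled-in values would not line up with the existing one.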