Resolving hierarchy in a database table


I have an org chart table which is modeled like this:
+-------------+------------+-----------------+
| Employee_ID | Manager_ID | Department_Name |
+-------------+------------+-----------------+
|           1 |          2 | Level1          |
|           2 |          3 | Level2          |
|           3 |            | Level3          |
+-------------+------------+-----------------+
So, each employee refers to another row, in a chain which represents the org chart. With all employees, this model is used to represent the hierarchy.
However, for reporting purposes, we'd need to query a denormalized table, i.e. where the data is represented like this:
+-------------+--------+--------+--------+
| Employee_ID | ORG_1  | ORG_2  | ORG_3  |
+-------------+--------+--------+--------+
|           1 | Level1 |        |        |
|           2 | Level1 | Level2 |        |
|           3 | Level1 | Level2 | Level3 |
+-------------+--------+--------+--------+
with as many ORG_x columns as needed to represent all levels that can be found. Then you can do simple groupings such as GROUP BY ORG_1, ORG_2, ORG_3. Note that one can reasonably assume a maximum number of levels.
So here's my question: since the database sits on SQL Server, can I expect this to be feasible in Transact-SQL, so that I could build a view?
Before I start learning T-SQL, I want to make sure I'm on the right track.
(BTW, if yes, I'd be interested in recommendations for a good tutorial!)
Thanks!
R.

I would use common table expressions with PIVOT:
DECLARE @T TABLE
(
    Employee_ID int,
    Manager_ID int,
    Department_Name varchar(10)
);
INSERT @T VALUES
    (1,2,'Level 1'),
    (2,3,'Level 2'),
    (3,NULL,'Level 3');
WITH C AS (
    -- Anchor: every employee with its own department
    SELECT Employee_ID, Manager_ID, Department_Name
    FROM @T
    UNION ALL
    -- Recursion: pass each collected department name on to the manager's row
    SELECT T.Employee_ID, T.Manager_ID, C.Department_Name
    FROM C
    JOIN @T T ON C.Manager_ID=T.Employee_ID
), N AS (
    -- Number the collected names per employee (here in alphabetical order)
    SELECT ROW_NUMBER() OVER (PARTITION BY Employee_ID ORDER BY Department_Name) N, *
    FROM C
)
SELECT Employee_ID, [1] ORG_1, [2] ORG_2, [3] ORG_3
FROM N
PIVOT (MAX(Department_Name) FOR N IN ([1],[2],[3])) P
ORDER BY Employee_ID
Result:
Employee_ID ORG_1      ORG_2      ORG_3
----------- ---------- ---------- ----------
1           Level 1    NULL       NULL
2           Level 1    Level 2    NULL
3           Level 1    Level 2    Level 3
Note: If you have only 3 levels, you can also simply join the table to itself three times.

Yes, the pattern you have here is known as an adjacency list. It is very common. The downside is that building your tree requires recursion, which can lead to performance problems on large sets. Another approach, which is a lot faster to query, is the Nested Sets model. It is a little less intuitive at first, but once you understand the concept it is super easy.
No matter which model you use to store your data, it is going to require a dynamic pivot or a dynamic crosstab to get it into the denormalized format you need.
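To illustrate the recursion that an adjacency list calls for, here is a minimal sketch of a recursive CTE that walks the sample chain from the top manager down and records how deep each employee sits. The table variable @Org and the Depth column are illustrative names, not part of the original schema.

```sql
-- Hypothetical sample data mirroring the table in the question.
DECLARE @Org TABLE
(
    Employee_ID     int,
    Manager_ID      int NULL,
    Department_Name varchar(10)
);

INSERT @Org VALUES
    (1, 2,    'Level1'),
    (2, 3,    'Level2'),
    (3, NULL, 'Level3');

WITH Chain AS (
    -- Anchor member: employees without a manager (top of the hierarchy).
    SELECT Employee_ID, Manager_ID, Department_Name, 1 AS Depth
    FROM @Org
    WHERE Manager_ID IS NULL
    UNION ALL
    -- Recursive member: direct reports of the rows found so far.
    SELECT O.Employee_ID, O.Manager_ID, O.Department_Name, C.Depth + 1
    FROM Chain AS C
    JOIN @Org AS O ON O.Manager_ID = C.Employee_ID
)
SELECT Employee_ID, Department_Name, Depth
FROM Chain
ORDER BY Depth;
```

By default SQL Server stops a recursive CTE after 100 recursion levels; OPTION (MAXRECURSION n) on the final SELECT changes that limit if a hierarchy is deeper.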

Related

How to INSERT rows based on other rows?

I need to run a query that will INSERT new rows into a SQL Server join table.
Suppose I have the following tables to describe which products a store sells and in which states:
products:
+------------+--------------+
| product_id | product_name |
+------------+--------------+
|          1 | Laptop       |
|          2 | Aspirin      |
|          3 | Mattress     |
+------------+--------------+
stores:
+----------+------------+
| store_id | store_name |
+----------+------------+
|        1 | Walmart    |
|        2 | Best Buy   |
|        3 | Sam's Club |
+----------+------------+
products_stores_states:
+------------+----------+-------+
| product_id | store_id | state |
+------------+----------+-------+
|          1 |        2 | AL    |
|          1 |        2 | AR    |
|          2 |        2 | AL    |
|          2 |        2 | AR    |
|          3 |        2 | AL    |
|          3 |        2 | AR    |
+------------+----------+-------+
So here we see that Best Buy sells all 3 products in AL and AR.
What I need to do is somehow insert rows into the products_stores_states table to add AZ for all products it currently sells.
With a small dataset, I could do this manually, row by row:
INSERT INTO products_stores_states (product_id, store_id, state) VALUES
(1,2,'AZ'),
(2,2,'AZ'),
(3,2,'AZ');
Since this is a large dataset, this is not really an option.
How would I go about inserting a new state for Best Buy for every product_id that the products_stores_states table already contains for Best Buy?
Bonus: If a query could be made to do this for multiple states at the same time, that would be even better.
Right now, I cannot wrap my head around how to do this, but I assume there would need to be a subquery to get the list of matching product_id values I need to use.

The following query will do what you want to do
DECLARE @temp TABLE (
    state VARCHAR(20)
)
-- we are inserting state names into a table variable to use it further
INSERT INTO @temp (state)
VALUES
    ('AZ'),
    ('MA'),
    ('TX');
INSERT INTO products_stores_states(product_id, store_id, state)
SELECT
    T.product_id,
    T.store_id,
    temp.state
FROM (
    SELECT DISTINCT
        product_id, store_id
    FROM
        products_stores_states
    WHERE store_id = 2 -- the store_id for which you want to make changes
) AS T
CROSS JOIN
    @temp AS temp
First, we store the state names in a table variable. Then we select only the distinct store_id and product_id combinations for a specific store.
Finally, we insert the cross join of those distinct combinations with the table variable holding the state names, which yields one new row per product and state.
Hope this helps! Thanks.

If you are positive the inserted state is "NEW" to the table, something like this would work; set the state variable to whichever state you want to insert.
DECLARE @State CHAR(2), @StoreId INT;
SET @State = 'AZ';
SET @StoreId = 2;
INSERT INTO products_stores_states (product_id, store_id, state)
SELECT DISTINCT product_id, @StoreId, @State
FROM dbo.products_stores_states
WHERE store_id = @StoreId;
You could first see what this statement would add using this:
DECLARE @State CHAR(2), @StoreId INT;
SET @State = 'AZ';
SET @StoreId = 2;
SELECT DISTINCT product_id, @StoreId, @State
FROM dbo.products_stores_states
WHERE store_id = @StoreId;
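If you cannot be sure the state is new for every product, a NOT EXISTS guard is one way to skip rows that are already present. This is only a sketch built on the table and column names from the question; the aliases pss and x are illustrative.

```sql
DECLARE @State CHAR(2) = 'AZ', @StoreId INT = 2;

INSERT INTO dbo.products_stores_states (product_id, store_id, state)
SELECT DISTINCT pss.product_id, @StoreId, @State
FROM dbo.products_stores_states AS pss
WHERE pss.store_id = @StoreId
  -- Skip products that already have a row for this store and state.
  AND NOT EXISTS (
        SELECT 1
        FROM dbo.products_stores_states AS x
        WHERE x.product_id = pss.product_id
          AND x.store_id   = pss.store_id
          AND x.state      = @State
      );
```

Running it a second time inserts nothing, because every candidate row is then filtered out by the guard.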

SQL Server Indexes - Column Order

Going off the diagram here, I'm confused about columns 1 and 3.
I am working on a data warehouse table, and there are two columns that are used as a key that gets you the primary key.
The first column is the source system; there are three possible values, let's say IBM, SQL, and ORACLE. The second part of the composite key is the transaction ID, which could be numeric or varchar. There is no 3rd column, other than the secret key, which would be a key generated by IDENTITY(1,1) as the record gets loaded. So, looking at the graph below, I imagine passing in a query like this:
Select a.Patient,
b.SourceSystem,
b.TransactionID
from Patient A
right join Transactions B
on A.sourceSystem = B.sourceSystem and
a.transactionID = B.transactionID
where SourceSystem = "SQL"
The graph leads me to think that column 1 in the index should be set to the SourceSystem, since it would immediately narrow the drill-down into the next level of the index to a third of the rows. But when I showed this graph to a coworker, they interpreted it as column 1 being the transactionID and column 2 the source system.
Cols
1 2 3
-------------
| | 1 | |
| A |---| |
| | 2 | |
|---|---| |
| | | |
| | 1 | 9 |
| B | | |
| |---| |
| | 2 | |
| |---| |
| | 3 | |
|---|---| |

First, you should qualify all column names in a query. Second, left join usually makes more sense than a right join (the semantics are: keep all rows in the first table). Finally, if you have proper foreign key relationships, then you probably don't need an outer join at all.
Let's consider this query:
Select p.Patient, t.SourceSystem, t.TransactionID
from Patient p join
Transactions t
on t.sourceSystem = p.sourceSystem and
t.transactionID = p.transactionID
where t.SourceSystem = 'SQL';
The correct index for this query is Transactions(SourceSystem, TransactionId).
Notes:
Outer joins affect the choice of indexes. Basically if one of the tables has to be scanned anyway, then an index might be less useful.
t.SourceSystem = 'SQL' and p.SourceSystem = 'SQL' would probably optimize differently.
Does the patient really have a transaction id? That seems strange.
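As a minimal sketch, the index recommended above could be created like this. The index name is a hypothetical placeholder; the table and column names come from the query in this answer, and any schema prefix would need to match your environment.

```sql
-- IX_Transactions_SourceSystem_TransactionID is a made-up name for illustration.
CREATE NONCLUSTERED INDEX IX_Transactions_SourceSystem_TransactionID
    ON Transactions (SourceSystem, TransactionID);
```

With SourceSystem as the leading column, the filter t.SourceSystem = 'SQL' can seek straight to the matching range of the index, and TransactionID is then available in order within that range for the join.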

sum column with duplicates in another table

Wrong Result
So I have two tables:
Order
Staging
The Order table has this column structure:
+-------+---------+-------------+---------------+----------+
| PO    | cashAmt | ClaimNumber | TransactionID | Supplier |
+-------+---------+-------------+---------------+----------+
| 12345 |     100 |       99876 | abc123        | 0101     |
| 12346 |      50 |       99875 | abc123        | 0102     |
| 12345 |     100 |       99876 | abc123        | 0101     |
+-------+---------+-------------+---------------+----------+
The Staging table has this column structure:
+----------+------------+-------------+---------------+
| PONumber | paymentAmt | ClaimNumber | TransactionID |
+----------+------------+-------------+---------------+
|    12345 |        100 |       99876 | abc123        |
|    12346 |         50 |       99875 | abc123        |
+----------+------------+-------------+---------------+
The query I am executing is:
select sum(cashAmt) CheckAmount, count(ClaimNumber) TotalLines
FROM [order] with (nolock)
WHERE TransactionID='abc123'
union
select sum(paymentAmt) CheckAmount, count(ClaimNumber) TotalLines
from Staging with (nolock)
where TransactionID='abc123'
but the sum comes out wrong because there is a duplicate row in one of the tables.
How can I change the query so that only unique rows from the Order table are counted and the sums are correct?

First, ask yourself why there are duplicates in the Orders table. There must be a reason why they are there. I would deal with that first.
That issue aside, if the duplicates in the Orders table have a purpose and yet are not to be considered for this particular query, then you should be able to leave out the duplicates by simply changing the query to use DISTINCT on whatever field in the Orders table can reliably identify a duplicate.
select Distinct fieldname sum(cashAmt)... etc.
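One way to apply DISTINCT before aggregating is a derived table. The sketch below assumes that a row is a duplicate only when every column matches, which is an assumption about your data; the alias d is illustrative.

```sql
SELECT SUM(d.cashAmt) AS CheckAmount, COUNT(d.ClaimNumber) AS TotalLines
FROM (
    -- Collapse fully identical rows before summing and counting.
    SELECT DISTINCT PO, cashAmt, ClaimNumber, TransactionID, Supplier
    FROM [order]
    WHERE TransactionID = 'abc123'
) AS d;
```

For the sample rows in the question this returns 150 and 2. Note that UNION itself removes duplicate rows, so if the Order and Staging halves produce identical totals, the combined query returns a single row; UNION ALL keeps both.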

Assuming duplicates in your table are OK.
Not sure why you are using NOLOCK; it seems like it shouldn't be included.
You could use a table variable to store the distinct values. You'll need to adjust the data types in the table variable to match your table structure.
I haven't tested the code below but it should look something like this.
DECLARE @OrderTmp TABLE (
    cashAmt numeric(10,2)
    , ClaimNumber int
    , TransactionID varchar(20) -- 'abc123' in the sample data is not numeric
)
INSERT INTO @OrderTmp
select Distinct
    cashAmt
    ,ClaimNumber
    ,TransactionID
FROM
    [order]
WHERE TransactionID='abc123'
select sum(cashAmt) CheckAmount, count(ClaimNumber) TotalLines
FROM @OrderTmp
where TransactionID='abc123'
union
select sum(paymentAmt) CheckAmount, count(ClaimNumber) TotalLines
from Staging
where TransactionID='abc123'

Data Type conversion error in Pivot Table

I have a database that stores our company's positions and "requirements", i.e. each position needs to have undergone a building induction, etc. There's a program that allows the users to see and manage all this, but of course someone wants an export for a client, and that's not really possible with the current setup. I'm thinking a quick pivot table will get the job done, though.
I have the following tables:
---------------------------
| Positions |
---------------------------
| PositionID | int |
| PositionName | nvarchar |
---------------------------
------------------------------
| Requirements |
------------------------------
| RequirementID | int |
| RequirementName | nvarchar |
| RequirementType | bit |
------------------------------
-------------------------
| Position Requirements |
-------------------------
| Position_ID | int |
| Requirement_ID | int |
-------------------------
What I would like to do is pull out the data for a specific Position or Positions, i.e. SELECT * FROM Positions WHERE PositionName LIKE '%Manager%';
These Positions would form the leftmost column of the PivotTable.
For the top row of the PivotTable, I would like to have each RequirementName.
The internal data would be the RequirementType field (i.e. '0' or '1', maybe 'Any' / 'All').
I've read and read and read, but I can never quite seem to get my head around the concept of pivot tables, so this is my current attempt:
SELECT *
FROM Requirements
PIVOT (MAX(RequirementType) FOR RequirementName IN ([Requirement], [Names], [Go], [Here])) AS pivtable
WHERE [Requirement], [Names], [Go], [Here] IN (
SELECT RequirementName FROM Requirements WHERE RequirementID IN (
SELECT Requirement_ID FROM PositionRequirements WHERE Position_ID IN (
SELECT PositionID FROM Positions WHERE PositionName LIKE '%Manager%')));

Your PIVOT query is not arranged properly. Try this one:
SELECT * FROM
(
    SELECT
        B.PositionName,
        -- MAX() does not accept bit, so cast RequirementType first
        CAST(C.RequirementType AS int) AS RequirementType,
        C.RequirementName
    FROM [Position Requirements] A
    LEFT JOIN Positions B ON A.Position_ID=B.PositionID
    LEFT JOIN Requirements C ON A.Requirement_ID=C.RequirementID
    WHERE B.PositionName LIKE '%Manager%'
) AS src
PIVOT(MAX(RequirementType) FOR RequirementName IN ([Requirement],[Names],[Go],[Here])) AS pvt

Search for string in a text column and list the count

Ok, I may be asking a very stupid question, but somehow I can't find a way to do the following.
I have a table that contains two columns as below
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| SL No | Work |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | Identify Process Champs across all teams for BCUK processes |
| 2 | Impart short training on FMEA to all the Process Champs |
| 2 | List down all critical steps involved in the Process to ascertain the risk involved, feed the details back to FMEA template to analyze the risk |
| 3 | Prioritize the process steps based on Risk Priority Number |
| 4 | Identity the Process Gaps, suggest process improvement ideas to mitigate/mistake proof or reduce the risk involved in the process |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------+
Now I have another table that holds the "Key Words" like below
+-------+----------+
| Sl No | Tags     |
+-------+----------+
|     1 | BCUK     |
|     2 | FMEA     |
|     3 | Priority |
|     4 | Process  |
+-------+----------+
Now I would like to "Search for String" in the first table based on "tags" in the second table and return something like this
+----------+-------+
| Tags     | Count |
+----------+-------+
| BCUK     |     1 |
| FMEA     |     2 |
| Priority |     1 |
| Process  |     8 |
+----------+-------+
As the "Process" keyword appears eight times in the first table, across multiple rows, it returns a count of 8.
I am using SQL Server 2014 Express Edition

Adam Machanic has a function, GetSubstringCount, for this kind of operation. I modified it a bit for your needs. For more info: http://dataeducation.com/counting-occurrences-of-a-substring-within-a-string/
SAMPLE DATA
CREATE TABLE MyTable(
SLNo INT,
Work VARCHAR(4000)
)
INSERT INTO MyTable VALUES
(1, 'Identify Process Champs across all teams for BCUK processes'),
(2, 'Impart short training on FMEA to all the Process Champs'),
(2, 'List down all critical steps involved in the Process to ascertain the risk involved, feed the details back to FMEA template to analyze the risk'),
(3, 'Prioritize the process steps based on Risk Priority Number'),
(4, 'Identity the Process Gaps, suggest process improvement ideas to mitigate/mistake proof or reduce the risk involved in the process');
CREATE TABLE KeyWord(
SLNo INT,
Tag VARCHAR(20)
)
INSERT INTO KeyWord VALUES
(1, 'BCUK'),
(2, 'FMEA'),
(3, 'Priority'),
(4, 'Process');
SOLUTION
;WITH E1(N) AS(
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)
,E2 AS(SELECT 1 AS N FROM E1 a, E1 b)
,E4 AS(SELECT 1 AS N FROM E2 a, E2 b)
,Tally(N) AS(
    SELECT TOP(11000) ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) FROM E4 a, E4 b
)
SELECT
    k.Tag,
    [Count] = SUM(x.cc)
FROM KeyWord k
CROSS JOIN MyTable m
CROSS APPLY(
    SELECT COUNT(*) AS cc
    FROM Tally
    WHERE
        SUBSTRING(m.Work, N, LEN(k.Tag)) = k.Tag
)x
GROUP BY k.Tag
RESULT
Tag                  Count
-------------------- -----------
BCUK                 1
FMEA                 2
Priority             1
Process              8

Instead of counting the matches directly, I replace each match with itself plus one extra character and compare the resulting length with the original length. The difference in length is the number of matches, which makes the counting very easy and fast.
Test tables and data
DECLARE @texts table(SL_No int identity(1,1),Work varchar(max))
INSERT @texts VALUES
    ('Identify Process Champs across all teams for BCUK processes'),
    ('Impart short training on FMEA to all the Process Champs'),
    ('List down all critical steps involved in the Process to ascertain the risk involved, feed the details back to FMEA template to analyze the risk'),
    ('Prioritize the process steps based on Risk Priority Number'),
    ('Identity the Process Gaps, suggest process improvement ideas to mitigate/mistake proof or reduce the risk involved in the process')
DECLARE @searchvalues table(S1_No int identity(1,1),Tags varchar(max))
INSERT @searchvalues
VALUES('CUK'),('FMEA'),('Priority'),('Process')
Query:
SELECT
    -- every match becomes one character longer, so the difference is the match count
    sum(len(replace(txt.work, sv.tags, sv.tags + '#')) - len(txt.work)) [count],
    tags
FROM
    @texts txt
CROSS APPLY
    @searchvalues sv
WHERE charindex(sv.tags, txt.work) > 0
GROUP BY tags
Result:
count  tags
1      CUK
2      FMEA
1      Priority
8      Process
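To see the length-difference trick in isolation, here is a small sketch on a single string. The variable names @work and @tag are illustrative; the text is the first sample row.

```sql
DECLARE @work varchar(200) = 'Identify Process Champs across all teams for BCUK processes';
DECLARE @tag  varchar(20)  = 'Process';

-- Each match grows by exactly one character ('#'),
-- so the length difference equals the number of matches.
SELECT LEN(REPLACE(@work, @tag, @tag + '#')) - LEN(@work) AS occurrences;
```

REPLACE compares according to the collation in effect, so under a case-insensitive collation both "Process" and "processes" match and the result is 2, while a case-sensitive collation would give 1. The count of 8 for "Process" in the results above relies on case-insensitive matching.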
