Creating SQL View that checks for data in multiple tables - sql-server

So I have a sql table that will hold an ID like so
---------------------------------------------------
| RecordID | Date | Price
--------------------------------------------------
| 1 | 8/31/2016 | 49
--------------------------------------------------
| 2 | 8/31/2016 | 101
--------------------------------------------------
And I have 3 other tables that will hold information about this ID and can have different amount of columns
Table 1
---------------------------------------------------
| RecordID | Date | Price | Name
--------------------------------------------------
| 1 | 8/31/2016 | 50 | System
--------------------------------------------------
Table 2
---------------------------------------------------
| RecordID | Date | Price | Coupon
--------------------------------------------------
| 2 | 8/31/2016 | 100 | 7
--------------------------------------------------
but the IDs are split between them. Meaning table 1 can have ID 1 then table 2 can have ID 2 and so on. An ID can only exist in one of the three tables.
So my desire is to create a view where i can get the ID and price from the original table and find which table this ID exists in and put it all together nicely like this
---------------------------------------------------
| RecordID | Date | Price1 | Price2
--------------------------------------------------
| 1 | 8/31/2016 | 49 | 50
--------------------------------------------------
| 2 | 8/31/2016 | 101 | 100
--------------------------------------------------

You can do it using LEFT OUTER JOINS and coalesce.
Assuming you have a main table called tRecords and 3 other tables called tInfo1, tInfo2 and tInfo3, all of which have a RecordID and Price columns:
SELECT
A.RecordID
,A.[Date]
,A.Price AS Price1
,COALESCE(B.Price,C.Price,D.Price) AS Price2
FROM
tRecords A
LEFT OUTER JOIN tInfo1 B
ON A.RecordID = B.RecordID
LEFT OUTER JOIN tInfo2 C
ON A.RecordID = C.RecordID
LEFT OUTER JOIN tInfo3 D
ON A.RecordID = D.RecordID

Depending on the type of data you have in your tables, consider creating a View to combine the results of your 3 other tables, to make a virtual combination table:
create view vRecordCombo
select RecordId, Date, Price, Name as 'InfoString', NULL as 'InfoValue'
from table1
UNION ALL
select RecordId, Date, Price, NULL as 'InfoString', Coupon as 'InfoValue'
from table2
Then you can do joins on the combo as if it's a single table containing all your records.

Related

SQL - split data into two columns based on identifier

I have a table that contains the product categories/headers and product names but all in one column. I need to split it out into separate Category and Product columns. I also have a helper column Header_Flag which can be used to determine if the row contains the header or product name. The input table looks like this:
|---------------------|------------------|
| Product | Header_Flag |
|---------------------|------------------|
| Furniture | Y |
| Bed | N |
| Table | N |
| Chair | N |
| Cosmetics | Y |
| Lip balm | N |
| Lip stick | N |
| Eye liner | N |
| Apparel | Y |
| Shirt | N |
| Trouser | N |
|---------------------|------------------|
The output format I'm looking for would be like this:
|---------------------|------------------|
| Category | Product |
|---------------------|------------------|
| Furniture | Bed |
| Furniture | Table |
| Furniture | Chair |
| Cosmetics | Lip balm |
| Cosmetics | Lip stick |
| Cosmetics | Eye liner |
| Apparel | Shirt |
| Apparel | Trouser |
|---------------------|------------------|
With the data is it stands, you cannot get the results you are after. To be able to achieve this, you need to be able to order your data, using an ORDER BY clause, and ordering on either of these column does not achieve the same result as the sample data:
CREATE TABLE dbo.YourTable (Product varchar(20),HeaderFlag char(1));
GO
INSERT INTO dbo.YourTable
VALUES('Furniture','Y'),
('Bed','N'),
('Table','N'),
('Chair','N'),
('Cosmetics','Y'),
('Lipbalm','N'),
('Lipstick','N'),
('Eyeliner','N'),
('Apparel','Y'),
('Shirt','N'),
('Trouser','N');
GO
SELECT *
FROM dbo.YourTable
ORDER BY Product;
GO
SELECT *
FROM dbo.YourTable
ORDER BY HeaderFlag
GO
DROP TABLE dbo.YourTable;
AS you can see, the orders both differ.
If you add a column you can order on though (I'm going to use an IDENTITY) then you can achieve this:
CREATE TABLE dbo.YourTable (I int IDENTITY, Product varchar(20),HeaderFlag char(1));
GO
INSERT INTO dbo.YourTable
VALUES('Furniture','Y'),
('Bed','N'),
('Table','N'),
('Chair','N'),
('Cosmetics','Y'),
('Lipbalm','N'),
('Lipstick','N'),
('Eyeliner','N'),
('Apparel','Y'),
('Shirt','N'),
('Trouser','N');
GO
SELECT *
FROM dbo.YourTable YT
ORDER BY I;
Then you can use a cumulative COUNT to put the values into groups and get the header:
WITH Grps AS(
SELECT YT.I,
YT.Product,
YT.HeaderFlag,
COUNT(CASE YT.HeaderFlag WHEN 'Y' THEN 1 END) OVER (ORDER BY I ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM dbo.YourTable YT),
Split AS(
SELECT G.I,
MAX(CASE G.HeaderFlag WHEN 'Y' THEN Product END) OVER (PARTITION BY G.Grp) AS Category,
G.Product,
G.HeaderFlag
FROM Grps G)
SELECT S.Category,
S.Product
FROM Split S
WHERE HeaderFlag = 'N'
ORDER BY S.I;

SQL Server find sum of values based on criteria within another table

I have a table consisting of ID, Year, Value
---------------------------------------
| ID | Year | Value |
---------------------------------------
| 1 | 2006 | 100 |
| 1 | 2007 | 200 |
| 1 | 2008 | 150 |
| 1 | 2009 | 250 |
| 2 | 2005 | 50 |
| 2 | 2006 | 75 |
| 2 | 2007 | 65 |
---------------------------------------
I then create a derived, aggregated table consisting of an ID, MinYear, and MaxYear
---------------------------------------
| ID | MinYear | MaxYear |
---------------------------------------
| 1 | 2006 | 2009 |
| 2 | 2005 | 2007 |
---------------------------------------
I then want to find the sum of Values between the MinYear and MaxYear foreach ID in the aggregated table, but I am having trouble determining a proper query.
The final table should look something like this
----------------------------------------------------
| ID | MinYear | MaxYear | SumVal |
----------------------------------------------------
| 1 | 2006 | 2009 | 700 |
| 2 | 2005 | 2007 | 190 |
----------------------------------------------------
Right now I can perform all the joins to create the second table. But then I use a fast forward cursor to iterate through each record of the second table with the code inside the for loop looking like the following
DECLARE #curMin int
DECLARE #curMax int
DECLARE #curID int
FETCH Next FROM fastCursor INTo #curISIN, #curMin , #curMax
WHILE ##FETCH_STATUS = 0
BEGIN
SELECT Sum(Value) FROM ValTable WHERE Year >= #curMin and Year <= #curMax and ID = #curID
Group By ID
FETCH Next FROM fastCursor INTo #curISIN, #curMin , #curMax
Having found the sum of values between specified years, I can connect it back to the second table and I wind up the desired result (the third table).
However, the second table in reality is roughly 4 million rows, so this iteration is extremely time consuming (~generating 300 results a minute) and presumably not the best solution.
My question is, is there a way to generate the third table's results without having to use a cursor/for loop?
During a group by the sum will only be for the ID in question -- since the min year and max year is for the ID itself then you don't need to double query. The query below should give you exactly what you need. If you have a different requirement let me know.
SELECT ID, MIN(YEAR) as MinYear, MAX(YEAR) as MaxYear, SUM(VALUE) as SUMVALUE
FROM tablenameyoudidnotsay
GROUP BY ID
You could use query as bellow
TableA is your first table, and TableB is the second one
SELECT *,
(select SUM(Value) FROM TableA where tablea.ID=TableB.ID AND tableA.Year BETWEEN
TableB.MinYear AND TableB.MaxYear) AS SumValue
from TableB
You can put your criteria into a join and obtain the result all as one set which should be faster:
SELECT b.Id, b.MinYear, b.MaxYear, sum(a.Value)
FROM Table2 b
JOIN Table1 a ON a.Id=b.Id AND b.MinYear <= a.Year AND b.MaxYear >= a.Year
GROUP BY b.Id, b.MinYear, b.MaxYear

Get all categories with number of associated records with where clause

So I have two tables:
Categories
-------------------
| Id | Name |
-------------------
| 1 | Category1 |
-------------------
| 2 | Category2 |
-------------------
| 3 | Category3 |
-------------------
Products
--------------------------------------------
| Id | CategoryId | Name | CreatedDate |
--------------------------------------------
| 1 | 1 | Product1 | 2017-05-05 |
--------------------------------------------
| 1 | 1 | Product2 | 2017-05-06 |
--------------------------------------------
| 2 | 2 | Product3 | 2017-12-21 |
--------------------------------------------
I need a query to select all categories along with the number of products for each for a specific time range in which those products were created (CreatedDate).
What I currently have is this:
SELECT c.[Name], COUNT(p.[Id]) AS ProductCount
FROM Categories AS c
LEFT JOIN Products AS p ON p.[CategoryId] = c.[Id]
WHERE p.[CreatedDate] BETWEEN '2017-05-01' AND '2017-06-01'
GROUP BY c.[Name]
My issue is that I'm not seeing Category2 and Category3 in the results set because they don't pass the WHERE clause. I want to see all categories in the results set.
Put the where condition in the left join clause
SELECT c.[Name], COUNT(p.[Id]) AS ProductCount
FROM Categories AS c
LEFT JOIN Products AS p ON p.[CategoryId] = c.[Id]
AND p.[CreatedDate] BETWEEN '2017-05-01' AND '2017-06-01'
GROUP BY c.[Name]
This way it is applied to the join only and not to the complete result set.

PostgreSQL conditional join - if column is not NULL

I have a table "temp"
author | title | bibkey | Data
-----------------------------------
John | JsPaper | John2008 | 65
Kate | KsPaper | | 60
| | Data2015 | 80
From this I want to produce two tables, a 'sample_table' and a 'ref_table' like so:
sample_table:
sample_id|ref_id| data
--------------------------
1 | 1 | 65
2 | 2 | 60
3 | 3 | 80
ref_table:
ref_id | author | title | bibkey
--------------------------------------
1 | John | JsPaper | John2008
2 | Kate | KsPaper |
3 | | | Data2015
I've created both tables
CREATE TABLE ref_table ( CREATE TABLE sample_table (
ref_id serial PRIMARY KEY, sample_id serial PRIMARY KEY,
author text, ref_id integer REFERENCES ref_table(ref_id),
title text, data numeric
bibkey text );
);
And inserted the unique author,title,bibkey rows into the reference table as above. What I want to do now is do the join for the sample_table to get the ref_id's. For my insert statement i currently have:
INSERT INTO sample_table (
ref_id,data
)
SELECT ref.ref_id, t.data
FROM
temp t
LEFT JOIN
ref_table ref ON COALESCE(ref.author,'00000') = COALESCE(t.author,'00000')
AND COALESCE(ref.title,'00000') = COALESCE(t.title,'00000')
AND COALESCE(ref.bibkey,'00000') = COALESCE(t.bibkey,'00000');
However i really want to have a conditional statement in the join, rather than all 3 like I have:
IF a bibkey exists for that row, I know it is unique, and join only on that.
If bibkey is NULL, then join on both author and title for the unique pair, and not bibkey.
Is this possible?

How do I utilize Row_Number() (partitioning) for my datapool correctly

we have following table (output is already ordered and separated for understanding):
| PK | FK1 | FK2 | ActionCode | CreationTS | SomeAttributeValue |
+----+-----+-----+--------------+---------------------+--------------------+
| 6 | 100 | 500 | Create | 2011-01-02 00:00:00 | H |
----------------------------------------------------------------------------
| 3 | 100 | 500 | Change | 2011-01-01 02:00:00 | Z |
| 2 | 100 | 500 | Change | 2011-01-01 01:00:00 | X |
| 1 | 100 | 500 | Create | 2011-01-01 00:00:00 | Y |
----------------------------------------------------------------------------
| 4 | 100 | 510 | Create | 2011-01-01 00:30:00 | T |
----------------------------------------------------------------------------
| 5 | 100 | 520 | CreateSystem | 2011-01-01 00:30:00 | A |
----------------------------------------------------------------------------
what is ActionCode? we use this in c# and there it represents an enum-value
what do i want to achieve?
well, i need the following output:
| FK1 | FK2 | ActionCode | SomeAttributeValue |
+-----+-----+--------------+--------------------+
| 100 | 500 | Create | H |
| 100 | 500 | Create | Z |
| 100 | 510 | Create | T |
| 100 | 520 | CreateSystem | A |
-------------------------------------------------
well, what is the actual logic?
we have some logical groups for composite-key (FK1 + FK2). each of these groups can be broken into partitions, which begin with Create or CreateSystem. each partition ends with Create, CreateSystem or Change. The actual value of SomeAttributeValue for each partition should be the value from the last line of the partition.
it is not possible to have following datapool:
| PK | FK1 | FK2 | ActionCode | CreationTS | SomeAttributeValue |
+----+-----+-----+--------------+---------------------+--------------------+
| 7 | 100 | 500 | Change | 2011-01-02 02:00:00 | Z |
| 6 | 100 | 500 | Create | 2011-01-02 00:00:00 | H |
| 2 | 100 | 500 | Change | 2011-01-01 01:00:00 | X |
| 1 | 100 | 500 | Create | 2011-01-01 00:00:00 | Y |
----------------------------------------------------------------------------
and then expect PK 7 to affect PK 2 or PK 6 to affect PK 1.
i don't even know how/where to start ... how can i achieve this?
we are running on mssql 2005+
EDIT:
there's a dump available:
instanceId: my PK
tenantId: FK 1
campaignId: FK 2
callId: FK 3
refillCounter: FK 4
ticketType: ActionCode (1 & 4 & 6 are Create, 5 is Change, 3 must be ignored)
ticketType, profileId, contactPersonId, ownerId, handlingStartTime, handlingEndTime, memo, callWasPreselected, creatorId, creationTS, changerId, changeTS should be taken from the Create (first line in partition in groups)
callingState, reasonId, followUpDate, callingAttempts and callingAttemptsConsecutivelyNotReached should be taken from the last Create (which then would be a "one-line-partition-in-group" / the same as the upper one) or Change (last line in partition in groups)
I'm assuming that each partition can only contain a single Create or CreateSystem, otherwise your requirements are ill-defined. The following is untested, since I don't have a sample table, nor sample data in an easily consumed format:
;With Partitions as (
Select
t1.FK1,
t1.FK2,
t1.CreationTS as StartTS,
t2.CreationTS as EndTS
From
Table t1
left join
Table t2
on
t1.FK1 = t2.FK1 and
t1.FK2 = t2.FK2 and
t1.CreationTS < t2.CreationTS and
t2.ActionCode in ('Create','CreateSystem')
left join
Table t3
on
t1.FK1 = t3.FK1 and
t1.FK2 = t3.FK2 and
t1.CreationTS < t3.CreationTS and
t3.CreationTS < t2.CreationTS and
t3.ActionCode in ('Create','CreateSystem')
where
t1.ActionCode in ('Create','CreateSystem') and
t3.FK1 is null
), PartitionRows as (
SELECT
t1.FK1,
t1.FK2,
t1.ActionCode,
t2.SomeAttributeValue,
ROW_NUMBER() OVER (PARTITION_FRAGMENT_ID BY t1.FK1,T1.FK2,t1.StartTS ORDER BY t2.CreationTS desc) as rn
from
Partitions t1
inner join
Table t2
on
t1.FK1 = t2.FK1 and
t1.FK2 = t2.FK2 and
t1.StartTS <= t2.CreationTS and
(t2.CreationTS < t1.EndTS or t1.EndTS is null)
)
select * from PartitionRows where rn = 1
(Please note than I'm using all kinds of reserved names here)
The basic logic is: The Partitions CTE is used to define each partition in terms of the FK1, FK2, an inclusive start timestamp, and exclusive end timestamp. It does this by a triple join to the base table. the rows from t2 are selected to occur after the rows from t1, then the rows from t3 are selected to occur between the matching rows from t1 and t2. Then, in the WHERE clause, we exclude any rows from the result set where a match occurred from t3 - the result being that the row from t1 and the row from t2 represent the start of two adjacent partitions.
The second CTE then retrieves all rows from Table for each partition, but assigning a ROW_NUMBER() score within each partition, based on the CreationTS, sorted descending, with the result that ROW_NUMBER() 1 within each partition is the last row to occur.
Finally, within the select, we choose those rows that occur last within their respective partitions.
This does all assume that CreationTS values are distinct within each partition. I may be able to re-work it using PK also, if that assumption doesn't hold up.
It is solvable with a recursive CTE. Here (assuming rows within partitions are ordered by CreationTS):
WITH partitioned AS (
SELECT
*,
rn = ROW_NUMBER() OVER (PARTITION BY FK1, FK2 ORDER BY CreationTS)
FROM data
),
subgroups AS (
SELECT
PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue, rn,
Subgroup = 1,
Subrank = 1
FROM partitioned
WHERE rn = 1
UNION ALL
SELECT
p.PK, p.FK1, p.FK2, p.ActionCode, p.CreationTS, p.SomeAttributeValue, p.rn,
Subgroup = s.Subgroup + CASE p.ActionCode WHEN 'Change' THEN 0 ELSE 1 END,
Subrank = CASE p.ActionCode WHEN 'Change' THEN s.Subrank ELSE 0 END + 1
FROM partitioned p
INNER JOIN subgroups s ON p.FK1 = s.FK1 AND p.FK2 = s.FK2
AND p.rn = s.rn + 1
),
finalranks AS (
SELECT
PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue, rn,
Subgroup, Subrank,
rank = ROW_NUMBER() OVER (PARTITION BY FK1, FK2, Subgroup ORDER BY Subrank DESC)
/* or: rank = MAX(Subrank) OVER (PARTITION BY FK1, FK2, Subgroup) - Subrank + 1 */
FROM subgroups
)
SELECT PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue
FROM finalranks
WHERE rank = 1

Resources