SSRS 100% Stacked Bar - Group on Field Names to show totals - sql-server

I am trying to create a 100% stacked bar chart to display totals of various counts I'm getting through a query. I cannot figure out how to use grouping to allow the y axis of the chart to display the different column names of the totals I have. I have created what I want in separate charts, but because of the way HTML formats everything there is a lot of unwanted spacing, so I'm trying to create this in one chart. Example below.
In this I am just getting the sum of each column, which returns a 1 or 0 for each item based on the status. Then for the remaining 100% I'm using an expression to subtract them from a total.
=Sum(Count(Fields!Computer_Name.Value)) - Sum(Fields!ClientHealthEvaluation.Value)
I can modify my query to group all this in the results and just have a query of the totals, but I still haven't found a way to make that work in one bar chart either. I would like to know if there is a way to have all this displayed in one bar chart, with the y axis groups being the separate columns and the bars showing the totals, so it's cleaner, or at least to know of a better way to structure all these separate charts in the image so they look more uniform. Is this possible?
--Edit - Adding sample of dataset
resourceID IsActiveDDR IsActivePolicyRequest IsActiveStatusMessages IsActiveHW IsActiveSW ClientHealthEvaluation
16784171 0 0 0 0 0 1
16784668 1 1 1 1 0 1
16784901 0 0 0 0 0 1
16785366 1 0 1 0 0 1
16786781 0 0 0 0 0 1
16786855 0 0 0 0 0 1
16787070 1 1 1 0 0 1
16787571 0 0 0 0 0 1
16787996 1 1 1 1 0 0
16788182 1 1 1 1 0 1
This is the data I currently have, and I use the Sum function on each column to get the totals. I can group this like below if it's easier.
Total IsActiveDDR IsActivePolicyRequest IsActiveStatusMessages IsActiveHW IsActiveSW ClientHealthEvaluation
10 5 4 5 3 0 9
---Edit: Updated dataset as suggested to now format the table in a way that will work better with a chart.
select
'Client Health Evaluation' as 'PolicyType',
'Success' as 'Status',
SUM(a.ClientHealthEvaluation) as 'Amount'
from(
Select
summ.Resourceid,
case summ.LastEvaluationHealthy when 1 then 1 else 0 end as 'ClientHealthEvaluation'
from v_CH_ClientSummary summ) a
UNION
select
'Client Health Evaluation' as 'PolicyType',
'Failure' as 'Status',
(Count(a.resourceid) - SUM(a.ClientHealthEvaluation)) as 'Amount'
from(
Select
summ.Resourceid,
case summ.LastEvaluationHealthy when 1 then 1 else 0 end as 'ClientHealthEvaluation'
from v_CH_ClientSummary summ) a
-------------
UNION
select
'Policy Request' as 'PolicyType',
'Success' as 'Status',
SUM(a.IsActivePolicyRequest) as 'Amount'
from(
Select
summ.Resourceid,
summ.IsActivePolicyRequest
from v_CH_ClientSummary summ) a
UNION
select
'Policy Request' as 'PolicyType',
'Failure' as 'Status',
(Count(a.resourceid) - SUM(a.IsActivePolicyRequest)) as 'Amount'
from(
Select
summ.Resourceid,
summ.IsActivePolicyRequest
from v_CH_ClientSummary summ) a
-------------
UNION
select
'Data Discovery' as 'PolicyType',
'Success' as 'Status',
SUM(a.IsActiveDDR) as 'Amount'
from(
Select
summ.Resourceid,
summ.IsActiveDDR
from v_CH_ClientSummary summ) a
UNION
select
'Data Discovery' as 'PolicyType',
'Failure' as 'Status',
(Count(a.resourceid) - SUM(a.IsActiveDDR)) as 'Amount'
from(
Select
summ.Resourceid,
summ.IsActiveDDR
from v_CH_ClientSummary summ) a
-------------
UNION
select
'Hardware Inventory' as 'PolicyType',
'Success' as 'Status',
SUM(a.IsActiveHW) as 'Amount'
from(
Select
summ.Resourceid,
summ.IsActiveHW
from v_CH_ClientSummary summ) a
UNION
select
'Hardware Inventory' as 'PolicyType',
'Failure' as 'Status',
(Count(a.resourceid) - SUM(a.IsActiveHW)) as 'Amount'
from(
Select
summ.Resourceid,
summ.IsActiveHW
from v_CH_ClientSummary summ) a
-------------
UNION
select
'Software Inventory' as 'PolicyType',
'Success' as 'Status',
SUM(a.IsActiveSW) as 'Amount'
from(
Select
summ.Resourceid,
summ.IsActiveSW
from v_CH_ClientSummary summ) a
UNION
select
'Software Inventory' as 'PolicyType',
'Failure' as 'Status',
(Count(a.resourceid) - SUM(a.IsActiveSW)) as 'Amount'
from(
Select
summ.Resourceid,
summ.IsActiveSW
from v_CH_ClientSummary summ) a
-------------
UNION
select
'Status Messages' as 'PolicyType',
'Success' as 'Status',
SUM(a.IsActiveStatusMessages) as 'Amount'
from(
Select
summ.Resourceid,
summ.IsActiveStatusMessages
from v_CH_ClientSummary summ) a
UNION
select
'Status Messages' as 'PolicyType',
'Failure' as 'Status',
(Count(a.resourceid) - SUM(a.IsActiveStatusMessages)) as 'Amount'
from(
Select
summ.Resourceid,
summ.IsActiveStatusMessages
from v_CH_ClientSummary summ) a
Order by PolicyType, Status desc
Output:
PolicyType Status Amount
Client Health Evaluation Success 13862
Client Health Evaluation Failure 210
Data Discovery Success 13967
Data Discovery Failure 105
Hardware Inventory Success 13854
Hardware Inventory Failure 218
Policy Request Success 14025
Policy Request Failure 47
Software Inventory Success 13713
Software Inventory Failure 359
Status Messages Success 14018
Status Messages Failure 54
Chart:

If this does not help, please edit your question and include a sample of your base data and dataset output.
I created some sample data as follows. I think this is where you might need to make a change, by calculating this in your dataset query, but until I see your data it's hard to tell.
Anyway, here is the sample data:
CREATE TABLE #t (Caption varchar(30), Colour varchar(10), Amount int) -- temp table standing in for the real dataset
INSERT INTO #t VALUES
('Client Health Evaluation', 'Green', 10),
('Client Health Evaluation', 'Red', 1),
('Policy Request', 'Green', 12),
('Policy Request', 'Red', 3),
('Data Discovery', 'Green', 15),
('Data Discovery', 'Red', 2),
('Hardware Inventory', 'Green', 20),
('Hardware Inventory', 'Red', 2),
('Software Inventory', 'Green', 30),
('Software Inventory', 'Red', 5),
('Status Messages', 'Green', 10),
('Status Messages', 'Red', 2)
SELECT * FROM #t
Then I simply added a 100% bar chart and configured it like this...
All I did was drag the fields into the respective bins 'Values'/'Category Groups'/'Series Groups' and added a bit of colour formatting.
Which gives the following output.
EDIT using example data
I've taken your table (again as a temporary stand-in table, so just swap out #t for the real table name).
I then unpivot the data inside a CTE so I can reference it more than once, then UNION two copies of the data, one is the actual values and one is the values subtracted from the record count to give you the "red" values.
CREATE TABLE #t (resourceID bigint, IsActiveDDR int, IsActivePolicyRequest int, IsActiveStatusMessages int, IsActiveHW int , IsActiveSW int, ClientHealthEvaluation int)
INSERT INTO #t VALUES
(16784171, 0, 0, 0, 0, 0, 1),
(16784668, 1, 1, 1, 1, 0, 1),
(16784901, 0, 0, 0, 0, 0, 1),
(16785366, 1, 0, 1, 0, 0, 1),
(16786781, 0, 0, 0, 0, 0, 1),
(16786855, 0, 0, 0, 0, 0, 1),
(16787070, 1, 1, 1, 0, 0, 1),
(16787571, 0, 0, 0, 0, 0, 1),
(16787996, 1, 1, 1, 1, 0, 0),
(16788182, 1, 1, 1, 1, 0, 1);
-- SELECT * from #t
-- UNPIVOT THE DATA
WITH nd (caption, FlagSum, RecordCount) AS
(
SELECT caption, SUM(flag) AS flagCount, (SELECT COUNT(*) FROM #t) AS TCount
FROM
(SELECT IsActiveDDR, IsActivePolicyRequest, IsActiveStatusMessages, IsActiveHW, IsActiveSW, ClientHealthEvaluation FROM #t) p
UNPIVOT
(flag FOR caption
IN (IsActiveDDR, IsActivePolicyRequest, IsActiveStatusMessages, IsActiveHW, IsActiveSW, ClientHealthEvaluation)
) unpiv
GROUP BY caption
)
SELECT caption, 'Green' as SeriesGroup, FlagSum FROM nd
UNION ALL
SELECT caption, 'Red', RecordCount - FlagSum FROM nd
This gives us the following results
You should be able to drop this straight onto your chart as per this answer above.
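As a rough cross-check of the unpivot-and-subtract idea, here is a small Python sketch (not SSRS or T-SQL) that builds the same caption / series / amount rows from the sample data in the question. The names COLUMNS, ROWS and chart_rows are made up for this illustration.

```python
# Sample rows from the question:
# (resourceID, IsActiveDDR, IsActivePolicyRequest, IsActiveStatusMessages,
#  IsActiveHW, IsActiveSW, ClientHealthEvaluation)
COLUMNS = [
    "IsActiveDDR", "IsActivePolicyRequest", "IsActiveStatusMessages",
    "IsActiveHW", "IsActiveSW", "ClientHealthEvaluation",
]
ROWS = [
    (16784171, 0, 0, 0, 0, 0, 1),
    (16784668, 1, 1, 1, 1, 0, 1),
    (16784901, 0, 0, 0, 0, 0, 1),
    (16785366, 1, 0, 1, 0, 0, 1),
    (16786781, 0, 0, 0, 0, 0, 1),
    (16786855, 0, 0, 0, 0, 0, 1),
    (16787070, 1, 1, 1, 0, 0, 1),
    (16787571, 0, 0, 0, 0, 0, 1),
    (16787996, 1, 1, 1, 1, 0, 0),
    (16788182, 1, 1, 1, 1, 0, 1),
]


def chart_rows(rows, columns):
    """Return (caption, series, amount) tuples: one 'Green' and one 'Red' per column."""
    total = len(rows)  # record count, shared by every caption
    out = []
    for idx, caption in enumerate(columns, start=1):  # index 0 is resourceID
        flag_sum = sum(row[idx] for row in rows)
        out.append((caption, "Green", flag_sum))          # flags that are set
        out.append((caption, "Red", total - flag_sum))    # remainder up to 100%
    return out


result = chart_rows(ROWS, COLUMNS)
for caption, series, amount in result:
    print(caption, series, amount)
```

Each caption gets exactly two rows whose amounts add up to the record count, which is the shape a 100% stacked bar needs (caption as category group, series as series group, amount as value).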

Related

PIVOT returns NULL When Converting Rows to Columns

Using the following example data:
INVNUM ORDNUM SHIPNUM INVLINE CHGCODE TAXBODY TAXRATE
I1 O1 0 1 0 36 4.00
I1 O1 0 1 0 51000 4.50
I1 O1 0 1 0 359071 0.37
I2 O2 0 1 0 13 4.00
I2 O2 0 1 0 211 .25
I3 O1 1 1 0 36 4.00
I3 O1 1 1 0 51000 4.50
I3 O1 1 2 A 36 4.00
I3 O1 1 2 A 51000 4.50
I4 O1 0 1 0 359071 6.35
I5 O4 0 1 0 6 6.00
I5 O4 0 1 0 65 0.25
I5 O4 0 1 0 AIHK0 1.00
I5 O4 0 1 0 EMBA0 0.50
I5 O4 0 1 0 EMTQ0 1.00
There can be up to 10 TAXBODY rows for each INVNUM, ORDNUM, SHIPNUM, INVLINE, CHGCODE combination, but there may be fewer.
Using I2 as an example, I would like the result to be:
INVNUM ORDNUM SHIPNUM INVLINE CHGCODE TAXBODY0 TAXBODY1...TAXBODY9 TAXRATE0 TAXRATE1...TAXRATE9
I2 O2 0 1 0 13 211 NULL 4.00 .25 NULL
I attempted to start small so I could test using PIVOT to get me just the tax rate and I'm getting back NULL in all the TAXRATEx columns. Here is my SQL statement I used:
SELECT * FROM (
SELECT
[invnum],
[ordnum],
[shipnum],
[taxbody],
[taxrate]
FROM #mytable
) TaxDetails
PIVOT (
AVG([taxrate])
FOR [taxbody]
IN (
[taxrate0],
[taxrate1],
[taxrate2],
[taxrate3],
[taxrate4],
[taxrate5],
[taxrate6],
[taxrate7],
[taxrate8],
[taxrate9]
)) AS PivotTable
I realize it doesn't contain everything I need for my unique combination but again, I was trying to start out small for testing purposes. When I run the query, I end up with NULL in TAXRATE0 through TAXRATE9. It also doesn't seem to matter what aggregate function I use, e.g. SUM or MAX. I've not used PIVOT before so I'm sure I'm doing something wrong.
I am requesting help figuring out why my SQL statement doesn't work and ultimately would like help, e.g. links or examples, with the following end result:
One row with 10 TAXBODY and TAXRATE columns for each INVNUM, ORDNUM, SHIPNUM, INVLINE, CHGCODE combination as mentioned above in the example.
The table should allow NULL in the TAXBODYx and TAXRATEx columns where I don't have a value, i.e. I only have 2 tax body rows out of a possible 10.
To be able to use the query or a stored procedure to update the appropriate TAXBODY and TAXRATE column(s) in a table that already exists. The existing table already supports the ability to hold values from multiple TAXBODY and TAXRATE columns.
I've found a couple examples of using PIVOT but they don't seem to work for me. I'm guessing it's because I don't understand how PIVOT works. Resources, e.g. links, would be appreciated as well as an explanation if you provide examples to help get me on the right path.
Basic Example
An example that pivots multiple columns
I think this is more in the direction you need:
Sample data:
create table #tbl (INVNUM varchar(50), ORDNUM varchar(50), SHIPNUM varchar(50), INVLINE varchar(50), CHGCODE varchar(50), TAXBODY varchar(50), TAXRATE decimal(18,2) )
insert into #tbl
select 'I1', 'O1', '0', ' 1', ' 0', '36', '4.00'
union all select 'I1', 'O1', '0', ' 1', ' 0', '51000','4.50'
union all select 'I1', 'O1', '0', ' 1', ' 0', '359071', '0.37'
union all select 'I2', 'O2', '0', ' 1', ' 0', '13', '4.00'
union all select 'I2', 'O2', '0', ' 1', ' 0', '211', '.25'
union all select 'I3', 'O1', '1', ' 1', ' 0', '36', '4.00'
union all select 'I3', 'O1', '1', ' 1', ' 0', '51000', '4.50'
union all select 'I3', 'O1', '1', ' 2', ' A', '36', '4.00'
union all select 'I3', 'O1', '1', ' 2', ' A', '51000', '4.50'
union all select 'I4', 'O1', '0', ' 1', ' 0', '359071', '6.35'
union all select 'I5', 'O4', '0', ' 1', ' 0', '6', ' 6.00'
union all select 'I5', 'O4', '0', ' 1', ' 0', '65', '0.25'
union all select 'I5', 'O4', '0', ' 1', ' 0', 'AIHK0', '1.00'
union all select 'I5', 'O4', '0', ' 1', ' 0', 'EMBA0', '0.50'
union all select 'I5', 'O4', '0', ' 1', ' 0', 'EMTQ0', '1.00'
For a basic pivot, you need to put actual row values as column names. In your example, this is what you pivot so you have values:
SELECT * FROM (
SELECT [invnum],[ordnum],[shipnum],[taxbody],[taxrate] FROM #tbl) TaxDetails
PIVOT (AVG([taxrate]) FOR [taxbody] IN ([36],[51000],[taxrate2],[taxrate3],[taxrate4],[taxrate5],[taxrate6],[taxrate7],[taxrate8],[taxrate9])) AS PivotTable
The second link you posted basically 'creates' two new columns on which to pivot the data, so that you obtain nice column names (good link, by the way).
SELECT [invnum],
[ordnum],
[shipnum],
max(taxrate0) as taxrate0,
max(taxrate1) as taxrate1,
max(taxrate2) as taxrate2,
max(taxrate3) as taxrate3,
max(taxrate4) as taxrate4,
max(taxrate5) as taxrate5,
max(taxbody0) as taxbody0,
max(taxbody1) as taxbody1,
max(taxbody2) as taxbody2,
max(taxbody3) as taxbody3,
max(taxbody4) as taxbody4,
max(taxbody5) as taxbody5
FROM (
SELECT
[invnum],
[ordnum],
[shipnum],
[taxbody],
[taxrate],
'taxrate'+ CAST(DENSE_RANK() OVER (PARTITION BY [invnum], [ordnum] ORDER BY taxrate ASC)-1 AS NVARCHAR) AS taxratenumber,
'taxbody'+ CAST(DENSE_RANK() OVER (PARTITION BY [invnum], [ordnum] ORDER BY taxrate ASC)-1 AS NVARCHAR) AS taxbodynumber
FROM #tbl
) TaxDetails
PIVOT (
AVG([taxrate])
FOR taxratenumber
IN ([taxrate0],[taxrate1],[taxrate2],[taxrate3],[taxrate4],[taxrate5],[taxrate6],[taxrate7],[taxrate8],[taxrate9])) AS PivotTable1
PIVOT (
max([taxBody])
FOR taxbodynumber
IN ([taxbody0],[taxbody1],[taxbody2],[taxbody3],[taxbody4],[taxbody5],[taxbody6],[taxbody7],[taxbody8],[taxbody9])) AS PivotTable2
group by invnum, ordnum, shipnum
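To make the "number the rows first, then pivot on that number" idea concrete, here is a small Python sketch of the same reshaping, not a replacement for the T-SQL. It assumes slots are assigned in input order within each INVNUM, ORDNUM, SHIPNUM, INVLINE, CHGCODE combination (the query above ranks by tax rate instead); pivot_tax_rows, MAX_SLOTS and SAMPLE are names invented for this example.

```python
from collections import defaultdict

MAX_SLOTS = 10  # the question allows up to 10 TAXBODY rows per combination


def pivot_tax_rows(rows, max_slots=MAX_SLOTS):
    """Turn one row per tax body into one row per key with numbered slots.

    Returns {key: [taxbody0..taxbodyN, taxrate0..taxrateN]}, padded with None.
    """
    groups = defaultdict(list)
    for invnum, ordnum, shipnum, invline, chgcode, taxbody, taxrate in rows:
        key = (invnum, ordnum, shipnum, invline, chgcode)
        groups[key].append((taxbody, taxrate))  # position in list = slot number

    result = {}
    for key, pairs in groups.items():
        if len(pairs) > max_slots:
            raise ValueError(f"more than {max_slots} tax rows for {key}")
        padding = [None] * (max_slots - len(pairs))  # unused slots stay NULL
        bodies = [body for body, _ in pairs] + padding
        rates = [rate for _, rate in pairs] + padding
        result[key] = bodies + rates
    return result


SAMPLE = [
    ("I1", "O1", "0", "1", "0", "36", 4.00),
    ("I1", "O1", "0", "1", "0", "51000", 4.50),
    ("I1", "O1", "0", "1", "0", "359071", 0.37),
    ("I2", "O2", "0", "1", "0", "13", 4.00),
    ("I2", "O2", "0", "1", "0", "211", 0.25),
]

pivoted = pivot_tax_rows(SAMPLE)
print(pivoted[("I2", "O2", "0", "1", "0")])
```

The point is that the pivot column has to be a generated slot label (taxbody0, taxbody1, ...), not the tax body value itself, which is why the original query returned only NULLs.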

Count(*) automatically rounds

I have a query where I am trying to determine what percentage of events happen on certain days and I'm getting nothing but zeroes back. I think (but am not sure) that something is causing my query to round. This is happening to me in SQL Server but not MySQL.
/* create the event table */
create table event (id int
, dayOf datetime
, description varchar(32)
);
/* add some events */
insert into event( id, dayOf, description ) values
( 1, '2018-01-01', 'Thing 1'),
( 2, '2018-01-01', 'Thing 2'),
( 3, '2018-01-02', 'Thing 3'),
( 4, '2018-01-02', 'Thing 4'),
( 5, '2018-01-03', 'Thing 5');
/* try to get % of events by day, but actually get zeroes */
select event_daily.dayOf, event_daily.cnt, event_total.cnt,
event_daily.cnt / event_total.cnt as pct_daily /* this is the zero */
from ( select dayOf, count(*) as cnt from event group by dayOf ) event_daily
, ( select count(*) as cnt from event ) event_total;
Anticipated result:
DateOf cnt cnt pct_daily
1/1/2018 2 5 0.40
1/2/2018 2 5 0.40
1/3/2018 1 5 0.20
Actual result:
DateOf cnt cnt pct_daily
1/1/2018 2 5 0
1/2/2018 2 5 0
1/3/2018 1 5 0
Any help would be much appreciated!
That is because SQL Server performs integer division when both operands are integers. You can convert the counts to float first with CAST:
select event_daily.dayOf, event_daily.cnt, event_total.cnt,
CAST(event_daily.cnt AS float) / CAST(event_total.cnt AS float) as pct_daily
from ( select dayOf, count(*) as cnt from event group by dayOf ) event_daily
, ( select count(*) as cnt from event ) event_total;
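If you want to see the effect without a SQL Server instance, the following Python sketch reproduces it with the standard sqlite3 module. This is only an illustration under the assumption that SQLite is an acceptable stand-in: it also truncates when dividing two integers, but it is not SQL Server, and the column aliases int_pct and real_pct are made up here.

```python
import sqlite3

# In-memory database with the same five events as in the question.
conn = sqlite3.connect(":memory:")
conn.execute("create table event (id integer, dayOf text, description text)")
conn.executemany(
    "insert into event (id, dayOf, description) values (?, ?, ?)",
    [
        (1, "2018-01-01", "Thing 1"),
        (2, "2018-01-01", "Thing 2"),
        (3, "2018-01-02", "Thing 3"),
        (4, "2018-01-02", "Thing 4"),
        (5, "2018-01-03", "Thing 5"),
    ],
)

QUERY = """
select d.dayOf,
       d.cnt,
       t.cnt,
       d.cnt / t.cnt               as int_pct,   -- integer / integer truncates
       cast(d.cnt as real) / t.cnt as real_pct   -- cast one side to get a fraction
from (select dayOf, count(*) as cnt from event group by dayOf) d
cross join (select count(*) as cnt from event) t
order by d.dayOf
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Casting only one operand is enough, because the other one is then promoted before the division happens.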
Try the below approach
declare @TotalCount DECIMAL(18, 2)
select @TotalCount = count(*) from #event
select
a.dayOf, a.DailyCount, a.TotalCount, CONVERT(DECIMAL(18, 2), (A.DailyCount/A.TotalCount)) AS pct_daily
FROM
(select
dayOf, Count(Id) AS DailyCount, @TotalCount as TotalCount
from
#event
group by
dayOf ) a

Retrieve a count of all duplicated values from one column grouping by unique values in the other column

I have a big table with the two columns:
Bld_id - which has multiple unique apartments, so Bld_id may repeat multiple times depending on the number of apartments in it.
Second column is Appartment_Status which has four possible values:
ACTIVE,
NOT ACTIVE
NULL
(blank).
So I want to have my output to look like a table of 6 columns
Bld_id (unique)
Count(ACTIVE Status)
Count(NOT ACTIVE Status)
COUNT (NULL Status)
Count (blank Status)
Count (Total statuses)
Grouped by all unique Bld_id.
It would be also beneficial to display results of the two statuses below in just one column with the name of Count(No Status)
Count (NULL Status)
Count (blank Status)
Thanks,
Let's try this for fun
with
--
-- Test case supplied
--
test(Building, Appartement, Status) as
(
select 1, 1, 'ACTIVE' from dual union all
select 1, 2, 'ACTIVE' from dual union all
select 1, 3, 'NOT ACTIVE' from dual union all
select 1, 4, 'BLANK' from dual union all
select 1, 5, NULL from dual union all
select 2, 1, 'ACTIVE' from dual union all
select 2, 2, 'BLANK' from dual union all
select 2, 3, 'NOT ACTIVE' from dual union all
select 2, 4, 'BLANK' from dual union all
select 2, 5, NULL from dual
)
--
-- SELECT statement
--
select Building,
sum(case when Status = 'ACTIVE' then 1 else 0 end) active,
sum(case when Status = 'NOT ACTIVE' then 1 else 0 end) NOT_active,
sum(case when Status = 'BLANK' then 1 else 0 end) Blanks,
sum(case when Status is null then 1 else 0 end) IS_NULLS,
sum(case when Status is null or status = 'BLANK' then 1 else 0 end) no_status
from test
group by building;
Result :
BUILDING ACTIVE NOT_ACTIVE BLANKS IS_NULLS NO_STATUS
---------- ---------- ---------- ---------- ---------- ----------
1 2 1 1 1 2
2 1 1 2 1 3
Is that what you were looking for?
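For comparison, here is the same conditional counting written as a short Python sketch, using the test rows from the query above. Like that query, it assumes a blank status is stored as the literal string 'BLANK' and NULL is represented by None; it also adds the total column the question asked for. The names TEST_ROWS and status_counts are invented for this example.

```python
from collections import defaultdict

# (building, apartment, status) rows mirroring the test case above;
# None plays the role of NULL, 'BLANK' is the placeholder used in that test case.
TEST_ROWS = [
    (1, 1, "ACTIVE"), (1, 2, "ACTIVE"), (1, 3, "NOT ACTIVE"),
    (1, 4, "BLANK"), (1, 5, None),
    (2, 1, "ACTIVE"), (2, 2, "BLANK"), (2, 3, "NOT ACTIVE"),
    (2, 4, "BLANK"), (2, 5, None),
]


def status_counts(rows):
    """Count statuses per building, one counter per output column."""
    counts = defaultdict(lambda: {
        "active": 0, "not_active": 0, "blanks": 0,
        "is_nulls": 0, "no_status": 0, "total": 0,
    })
    for building, _apartment, status in rows:
        c = counts[building]
        c["total"] += 1
        if status == "ACTIVE":
            c["active"] += 1
        elif status == "NOT ACTIVE":
            c["not_active"] += 1
        elif status == "BLANK":
            c["blanks"] += 1
            c["no_status"] += 1   # blank counts as "no status"
        elif status is None:
            c["is_nulls"] += 1
            c["no_status"] += 1   # NULL counts as "no status" too
    return dict(counts)


summary = status_counts(TEST_ROWS)
for building, c in sorted(summary.items()):
    print(building, c)
```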

Using the HAVING clause to identify groups with a defined combination of records

I have a table with names, types and values.
CREATE TABLE #t_Table
(
Name VARCHAR(10),
[Type] VARCHAR(10),
Value INT
)
INSERT INTO #t_Table
VALUES('Jill', 'Yellow', 100)
INSERT INTO #t_Table
VALUES('Jill', 'Blue', 200)
INSERT INTO #t_Table
VALUES('Jill', 'Green', 300)
INSERT INTO #t_Table
VALUES('Jill', 'Green', 400)
INSERT INTO #t_Table
VALUES('Jill', 'Green', 500)
INSERT INTO #t_Table
VALUES('Bob', 'Yellow', 100)
INSERT INTO #t_Table
VALUES('Bob', 'Blue', 200)
INSERT INTO #t_Table
VALUES('Bob', 'Green', 300)
INSERT INTO #t_Table
VALUES('Bob', 'Orange', 400)
INSERT INTO #t_Table
VALUES('Bob', 'Orange', 400)
INSERT INTO #t_Table
VALUES('Bob', 'Purple', 500)
INSERT INTO #t_Table
VALUES('Steve', 'Yellow', 100)
INSERT INTO #t_Table
VALUES('Steve', 'Blue', 200)
INSERT INTO #t_Table
VALUES('Steve', 'Green', 300)
INSERT INTO #t_Table
VALUES('Steve', 'Orange', 400)
INSERT INTO #t_Table
VALUES('Steve', 'Orange', 400)
I want to get the total value for groups of names where the underlying records in the group satisfy a constraint on the occurrence of specific types. I want to accomplish this with a single aggregate in the HAVING clause.
In the case where I want a group with exactly one record of type x, exactly one record of type y, zero or more records of type z and no other records, I've arrived at the following solution, for example, when I want exactly one Yellow, one Blue and zero or more Green:
SELECT Name,
TotalValue = SUM(Value)
FROM #t_Table
GROUP BY Name
HAVING SUM(CASE WHEN [Type] = 'Yellow' THEN 1
WHEN [Type] = 'Blue' THEN 2
WHEN [Type] = 'Green' THEN 0
ELSE 4 END) = 3
Which correctly returns this result:
Name TotalValue
---------- -----------
Jill 1500
How would I go about constructing the following?
SELECT Name,
TotalValue = SUM(Value)
FROM #t_Table
GROUP BY Name
/*HAVING exactly one record with [Type] = 'Yellow'
and exactly one record with [Type] = 'Blue'
and exactly one record with [Type] = 'Green'
and zero or more records with [Type] = 'Orange'
and no records of any other type
*/
Where the expected result given the data above would be
Name TotalValue
---------- -----------
Steve 1400
I know of the following solution (below), but I need one that has a single aggregate in the HAVING clause. I am also open to another query structure that solves my problem, as long as it is as simple as or simpler than the structure I have proposed and performs similarly or better.
SELECT
Name,
TotalValue = SUM(Value)
FROM
#t_Table
GROUP BY
Name
HAVING
SUM(CASE WHEN [Type] = 'Yellow' THEN 1 ELSE NULL END) = 1
AND SUM(CASE WHEN [Type] = 'Blue' THEN 1 ELSE NULL END) = 1
AND SUM(CASE WHEN [Type] = 'Green' THEN 1 ELSE NULL END) = 1
AND SUM(CASE WHEN [Type] IN ('Yellow','Blue','Green','Orange') THEN 0 ELSE 1 END) = 0
How about using your concept but with decimal weight on every type:
SqlFiddleDemo
SELECT Name,
TotalValue = SUM(Value)
FROM #t_Table
GROUP BY Name
HAVING SUM(
CASE [Type]
WHEN 'Yellow' THEN 1
WHEN 'Blue' THEN 10
WHEN 'Green' THEN 100
WHEN 'Orange' THEN 0
ELSE 1000 -- any other type pushes the sum past 111, so the group is rejected
END) = 111
This means exactly one Yellow, one Blue, and one Green.
More complex conditions can be expressed with BETWEEN or the comparison operators (<, <=, >, >=). Note that this only works as long as each type occurs at most 9 times in a group.
If you are afraid of overflow in the base-10 system, consider using a larger base, for example 1000:
SELECT Name,
TotalValue = SUM(Value)
FROM #t_Table
GROUP BY Name
HAVING SUM(
CASE [Type]
WHEN 'Yellow' THEN 1.0
WHEN 'Blue' THEN 1000.0
WHEN 'Green' THEN 1000000.0
WHEN 'Orange' THEN 0
ELSE 1000000000.0 -- any other type pushes the sum past the target
END) = 1 * 1000000.0 + 1 * 1000.0 + 1.0 -- For clarity, use the calculated version
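The weighting trick is easy to check outside the database. The Python sketch below runs it over the sample rows from the question with base-10 weights. One assumption of this sketch: any type outside the four named ones gets a weight of 1000 (OTHER_WEIGHT), so a group containing such a record cannot land on the target of 111. The names WEIGHTS, OTHER_WEIGHT, TARGET and matching_totals are made up for the illustration.

```python
from collections import defaultdict

ROWS = [
    ("Jill", "Yellow", 100), ("Jill", "Blue", 200), ("Jill", "Green", 300),
    ("Jill", "Green", 400), ("Jill", "Green", 500),
    ("Bob", "Yellow", 100), ("Bob", "Blue", 200), ("Bob", "Green", 300),
    ("Bob", "Orange", 400), ("Bob", "Orange", 400), ("Bob", "Purple", 500),
    ("Steve", "Yellow", 100), ("Steve", "Blue", 200), ("Steve", "Green", 300),
    ("Steve", "Orange", 400), ("Steve", "Orange", 400),
]

# One decimal digit per constrained type; Orange is free, so it weighs nothing.
WEIGHTS = {"Yellow": 1, "Blue": 10, "Green": 100, "Orange": 0}
OTHER_WEIGHT = 1000  # assumption: disallowed types must break the equality
TARGET = 111         # exactly one Yellow, one Blue, one Green


def matching_totals(rows):
    """Emulate GROUP BY Name HAVING SUM(weight) = TARGET, returning SUM(Value)."""
    score = defaultdict(int)
    total = defaultdict(int)
    for name, type_, value in rows:
        score[name] += WEIGHTS.get(type_, OTHER_WEIGHT)
        total[name] += value
    return {name: total[name] for name in score if score[name] == TARGET}


matches = matching_totals(ROWS)
print(matches)
```

With these rows Jill scores 311 (three Greens), Bob scores 1111 (the Purple record), and only Steve scores 111. As noted above, the encoding is only unambiguous while no type occurs more than 9 times in a group.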
Imagine in the HAVING clause you can compare two polynomials.
Then imagine you define (from the data you have) this polynomial:
count_YELLOW * x^3 +
count_BLUE * x^2 +
count_GREEN * x^1 +
count_ORANGE
/*
HAVING exactly one record with [Type] = 'Yellow'
and exactly one record with [Type] = 'Blue'
and exactly one record with [Type] = 'Green'
and zero or more records with [Type] = 'Orange'
and no records of any other type
*/
Now ... to express what you want in the HAVING clause you would say:
HAVING
count_YELLOW * x^3 +
count_BLUE * x^2 +
count_GREEN * x^1 +
count_ORANGE >=
1 * x^3 +
1 * x^2 +
1 * x^1 +
0
AND
count_YELLOW * x^3 +
count_BLUE * x^2 +
count_GREEN * x^1 +
count_ORANGE <
1 * x^4
Now ...
just pick x = sum (all counts) + 1,
or x = max (all counts) + 1,
and you can turn this into numbers.
I think this will work. I may try it tomorrow in T-SQL.
You will run into rather big numbers, though. This is unavoidable,
since you want to unambiguously encode a vector of 4 numbers into a single number.
I think this query is more efficient
with cte as
(
select name, [type] tp, nb = count(*)
from #t_table
group by name, [type]
)
Select t1.name, sum(t1.Value)
from #t_Table t1 inner join cte t2 on t1.name = t2.name
where nb = (case t2.tp when 'Yellow' then 1
when 'Blue' then 1
...
end)
AND EXISTS (select * from cte where name = t2.name and t2.tp = 'Yellow')
AND EXISTS (select * from cte where name = t2.name and t2.tp = 'Blue')
...
group by t1.name

T-SQL - Deduplicate large table

Sorry if this has already been asked. I see a lot of similar questions but none exactly like this one. I am trying to de-dup a large set of records (about 500 M):
Sample data:
CUST_ID PROD_TYPE VALUE DATE
------------------------------------
1 1 Y 5/1/2015 *
1 2 N 5/1/2015 *
1 1 N 5/2/2015 *
1 2 N 5/2/2015
1 1 Y 5/3/2015 *
1 2 Y 5/3/2015 *
1 1 Y 5/6/2015
1 2 N 5/6/2015 *
By CUST_ID and PROD_TYPE, I need to retain the initial records as well as any records having a changed VALUE (the records with the asterisks). There can sometimes be gaps between the dates. There are around 5M unique CUST_IDs.
Any help would be greatly appreciated.
Not sure why LAG isn't working for you; this returns your results:
with t as (
select 1 as CUST_ID, 1 as PROD_TYPE, 'Y' as VALUE, '5/1/2015' as [Date]
union
select 1, 2, 'N', '5/1/2015'
union
select 1, 1, 'N', '5/2/2015'
union
select 1, 2, 'N', '5/2/2015'
union
select 1,1, 'Y', '5/3/2015'
union
select 1, 2, 'Y','5/3/2015'
union
select 1,1, 'Y', '5/6/2015'
union
select 1, 2,'N','5/6/2015')
select
*,
case when
value <>
isnull(lag(value) over (partition by cust_id, prod_type order by [date]),'')
then 1 else 0
end as keep
from
t
order by
[date],
cust_id,
prod_type
Thanks Kyle, that is exactly correct, and I was able to use that as a solution to my problem. The issue I was having (not being familiar with lag) was that I had failed to provide a default, so the gap in dates was creating a NULL value which was giving me problems, but once I provided that, it worked like a charm. Thanks!
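For anyone who wants to sanity-check the "keep the first row and every change" rule outside the database, here is a small Python sketch of the same logic on the sample data. It is only an illustration: it uses real date objects for ordering and assumes VALUE itself is never NULL, and the names ROWS and keep_changes are invented here.

```python
from datetime import date

# (CUST_ID, PROD_TYPE, VALUE, DATE) rows from the question.
ROWS = [
    (1, 1, "Y", date(2015, 5, 1)),
    (1, 2, "N", date(2015, 5, 1)),
    (1, 1, "N", date(2015, 5, 2)),
    (1, 2, "N", date(2015, 5, 2)),
    (1, 1, "Y", date(2015, 5, 3)),
    (1, 2, "Y", date(2015, 5, 3)),
    (1, 1, "Y", date(2015, 5, 6)),
    (1, 2, "N", date(2015, 5, 6)),
]


def keep_changes(rows):
    """Keep the first row per (cust, prod) and every row whose value changed."""
    last_value = {}  # previous VALUE per (CUST_ID, PROD_TYPE), like LAG()
    kept = []
    for cust, prod, value, day in sorted(rows, key=lambda r: (r[0], r[1], r[3])):
        key = (cust, prod)
        # .get() returns None for the first row of a partition, which never
        # equals a real value, so the first row is always kept.
        if last_value.get(key) != value:
            kept.append((cust, prod, value, day))
        last_value[key] = value
    return kept


kept = keep_changes(ROWS)
for row in kept:
    print(row)
```

The None returned for the first row of each partition plays the same role as the default supplied to LAG in the query: without something to compare against, the first row would be lost to a NULL comparison.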
