Trying to to do a Sum(MAX) calculation in SQL Server - sql-server

Thanks for all the help that you all provided and though it was an eye opener unfortunately it did not produce the expected results I was looking for. In an effort to better get the help I'm looking for I will try to explain what I'm looking to achieve.
I think the main columns of focus are "IN", "AA_Now", "STF_Now", "dbo.Sheet1$.LOB_name", "dbo.Sheet1$.LifeCycleName" and "dbo.Sheet1$.AreaOfBusiness". Each "IN" have an "AA_Now" and "STF_Now". A group of "IN" rolls up under "dbo.Sheet1$.LOB_name". Under "dbo.Sheet1$.LOB_name" I just want the max value of the Group of "IN" that is rolled up. Now "dbo.Sheet1$.LOB_name" is rolled up under "dbo.Sheet1$.LifeCycleName" and what I want is the sum of of the max values that are rolled up under "dbo.Sheet1$.LOB_name" to show in the rollup of "dbo.Sheet1$.LifeCycleName". Finally "dbo.Sheet1$.LifeCycleName" rolls up to "dbo.Sheet1$.AreaOfBusiness". As before what I'm looking for is the sum of "dbo.Sheet1$.LifeCycleName" to show. These are only for the columns of "AA_Now" and "STF_Now"
I tried doing it from a Pivot table but to no avail and figured that it would be best to sort it out in the raw data.
I'm trying to to do a SUM(MAX) calculation in SQL server and getting the follow error when executing the command
Msg 130, Level 15, State 1, Line 6
Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
I'm sure the error is caused by both
,SUM(MAX(convert(float,replace([AA_Now], 'N/A','0')))) As [AA2_Now]
and
,SUM(MAX(convert(float,replace([STF_Now], 'N/A','0')))) As [STF2_Now]
but have no idea how to rewrite it without causing an error.
Below is the full code.
SELECT dbo.CCA_Merged.id, dbo.CCA_Merged.timeStamp, dbo.CCA_Merged.name, dbo.CCA_Merged.lN
,dbo.CCA_Merged.type, dbo.CCA_Merged.id2, dbo.CCA_Merged.aG
,dbo.CCA_Merged.regionId, dbo.CCA_Merged.sgcc
,convert(float,replace([SLC_Today],'N/A','0')) As [SLC_Today]
,convert(float,replace([AA_Now],'N/A','0')) As [AA_Now]
,SUM(MAX(convert(float,replace([AA_Now],'N/A','0')))) As [AA2_Now]
,convert(float,replace([SLCO_Today],'N/A','0')) As [SLCO_Today]
,convert(float,replace([CABN_Today],'N/A','0')) As [CABN_Today]
,convert(float,replace([COF_Today],'N/A','0')) As [COF_Today]
,convert(float,replace([HT_Today],'N/A','0')) As [HT_Today]
,convert(float,replace(replace([CH_Today],'N/A','0'),'-','0')) As [CH_Today]
,convert(float,replace([SLC_Now],'N/A','0')) As [SLC_Now]
,convert(float,replace([SLCO_Now],'N/A','0')) As [SLCO_Now]
,convert(float,replace([SLC_Thirty],'N/A','0')) As [SLC_Thirty]
,convert(float,replace(replace([SLCO_Thirty],'N/A','0'),'-','0')) As [SLCO_Thirty]
,convert(float,replace([ACWT_Today],'N/A','0')) As [ACWT_Today]
,convert(float,replace([CQ_Now],'N/A','0')) As [CQ_Now]
,convert(float,replace([LCQ_Now],'N/A','0')) As [LCQ_Now]
,convert(float,replace([SLCH_Now],'N/A','0')) As [SLCH_Now]
,convert(float,replace([STF_Now],'N/A','0')) As [STF_Now]
,SUM(MAX(convert(float,replace([STF_Now],'N/A','0')))) As [STF2_Now]
,dbo.Sheet1$.AreaOfBusiness, dbo.Sheet1$.LifeCycleName, dbo.Sheet1$.LOB_name
FROM dbo.Sheet1$ RIGHT OUTER JOIN
dbo.CCA_Merged ON dbo.Sheet1$.Skill_Name = dbo.CCA_Merged.lN
Group by ROLLUP (stf_now) ,dbo.CCA_Merged.id, dbo.CCA_Merged.timeStamp, dbo.CCA_Merged.name, dbo.CCA_Merged.lN
,dbo.CCA_Merged.type, dbo.CCA_Merged.id2, dbo.CCA_Merged.aG ,dbo.CCA_Merged.regionId
,dbo.CCA_Merged.sgcc,AA_Now,SLC_Today,SLCO_Today,CABN_Today,COF_Today,HT_Today,CH_Today
,SLC_Now,SLCO_Now,SLC_Thirty,SLCO_Thirty,ACWT_Today,CQ_Now,LCQ_Now,SLCH_Now
,dbo.Sheet1$.AreaOfBusiness, dbo.Sheet1$.LifeCycleName, dbo.Sheet1$.LOB_name
I'm relatively new with SQL Server and any help would be greatly appreciated.
Thanks in advance
Updated Stripped down script
SELECT dbo.CCA_Merged.lN
,convert(float,replace([STF_Now],'N/A','0')) As [STF_Now]
,dbo.Sheet1$.LOB_name, dbo.Sheet1$.LifeCycleName, dbo.Sheet1$.AreaOfBusiness
FROM dbo.Sheet1$ RIGHT OUTER JOIN
dbo.CCA_Merged ON dbo.Sheet1$.Skill_Name = dbo.CCA_Merged.lN
Group by stf_now ,AA_Now,dbo.CCA_Merged.lN,dbo.Sheet1$.AreaOfBusiness, dbo.Sheet1$.LifeCycleName, dbo.Sheet1$.LOB_name
Order by AreaOfBusiness DESC
+----+---------+----------+---------------+----------------+
| LN | STF_Now | LOB_name | LifeCycleName | AreaOfBusiness |
+----+---------+----------+---------------+----------------+
| A | 46 | BSW | BS | Business |
| B | 46 | BSW | BS | Business |
| C | 0 | BOSS | BS | Business |
| D | 112 | MSD | BS | Business |
| E | 112 | MSD | BS | Business |
| F | 42 | BHV | BR | Business |
| G | 23 | BCR | BR | Business |
| H | 23 | BHV | BR | Business |
| I | 55 | BSW2 | BS | Business |
| J | 1 | BSW2 | BS | Business |
| K | 46 | BSW | BS | Business |
| L | 112 | MSD | BS | Business |
| M | 112 | MSD | BS | Business |
| N | 57 | BSW | BS | Business |
| O | 0 | BOSS | BS | Business |
| P | 38 | MSD | BS | Business |
| Q | 38 | MSD | BS | Business |
| R | 19 | BHV | BR | Business |
| S | 0 | BCR | BR | Business |
| T | 19 | BHV | BR | Business |
| U | 2 | BSW | BS | Business |
| V | 1 | BSW | BS | Business |
| W | 57 | BSW | BS | Business |
| X | 38 | MSD | BS | Business |
| Y | 38 | MSD | BS | Business |
+----+---------+----------+---------------+----------------+
Below is the the expected results in 3 added columns
LOB_Name2 (This is the Max of STF_Now resulting from LN)
57 BSW
0 BOSS
112 MSD
42 BHV
23 BCR
55 BSW2
LifeCycleName2 (This is the Sum of the Max of the Rollup of LOB_Name2)
224 BS
65 BR
AreaOfBusiness2 (This is the Sum of the Rollup of LifeCycleName2)
289 Business

You can't sum a max because it would be the same amount anyhow, if you have the same group by. You probably need to have an inner and outer parts with different group by, something like:
select
product_group,
sum(max_cost)
from
(
select
product,
product_group,
max(cost) as max_cost
from
orders
group by
product_group,product
) X
group by product_group
This imaginary SQL will fetch maximum cost for each product, and them sum them up to the product group level. That's the only way I can figure out you'd actually need to sum a max

You're having issues because you're trying to do two nested aggregates. This will roughly give you what you're after, if SUM(MAX) is actually what you're trying to do.
However, as James pointed out, You'll need some group by logic that pulls this together.
SELECT ...
, SUM(AA2_Now_Max)
...
, SUM(STF2_Now_Max)
FROM(
SELECT dbo.CCA_Merged.id, dbo.CCA_Merged.timeStamp, dbo.CCA_Merged.name, dbo.CCA_Merged.lN
,dbo.CCA_Merged.type, dbo.CCA_Merged.id2, dbo.CCA_Merged.aG
,dbo.CCA_Merged.regionId, dbo.CCA_Merged.sgcc
,convert(float,replace([SLC_Today],'N/A','0')) As [SLC_Today]
,convert(float,replace([AA_Now],'N/A','0')) As [AA_Now]
,MAX(convert(float,replace([AA_Now],'N/A','0'))) As [AA2_Now_Max]
,convert(float,replace([SLCO_Today],'N/A','0')) As [SLCO_Today]
,convert(float,replace([CABN_Today],'N/A','0')) As [CABN_Today]
,convert(float,replace([COF_Today],'N/A','0')) As [COF_Today]
,convert(float,replace([HT_Today],'N/A','0')) As [HT_Today]
,convert(float,replace(replace([CH_Today],'N/A','0'),'-','0')) As [CH_Today]
,convert(float,replace([SLC_Now],'N/A','0')) As [SLC_Now]
,convert(float,replace([SLCO_Now],'N/A','0')) As [SLCO_Now]
,convert(float,replace([SLC_Thirty],'N/A','0')) As [SLC_Thirty]
,convert(float,replace(replace([SLCO_Thirty],'N/A','0'),'-','0')) As [SLCO_Thirty]
,convert(float,replace([ACWT_Today],'N/A','0')) As [ACWT_Today]
,convert(float,replace([CQ_Now],'N/A','0')) As [CQ_Now]
,convert(float,replace([LCQ_Now],'N/A','0')) As [LCQ_Now]
,convert(float,replace([SLCH_Now],'N/A','0')) As [SLCH_Now]
,convert(float,replace([STF_Now],'N/A','0')) As [STF_Now]
,MAX(convert(float,replace([STF_Now],'N/A','0'))) As [STF2_Now_Max]
,dbo.Sheet1$.AreaOfBusiness, dbo.Sheet1$.LifeCycleName, dbo.Sheet1$.LOB_name
FROM dbo.Sheet1$ RIGHT OUTER JOIN
dbo.CCA_Merged ON dbo.Sheet1$.Skill_Name = dbo.CCA_Merged.lN
Group by ROLLUP (stf_now) ,dbo.CCA_Merged.id, dbo.CCA_Merged.timeStamp, dbo.CCA_Merged.name, dbo.CCA_Merged.lN
,dbo.CCA_Merged.type, dbo.CCA_Merged.id2, dbo.CCA_Merged.aG ,dbo.CCA_Merged.regionId
,dbo.CCA_Merged.sgcc,AA_Now,SLC_Today,SLCO_Today,CABN_Today,COF_Today,HT_Today,CH_Today
,SLC_Now,SLCO_Now,SLC_Thirty,SLCO_Thirty,ACWT_Today,CQ_Now,LCQ_Now,SLCH_Now
,dbo.Sheet1$.AreaOfBusiness, dbo.Sheet1$.LifeCycleName, dbo.Sheet1$.LOB_name ) x
GROUP BY ...
This other question is similar- follow the pattern! SQL: SUM the MAX values of results returned

You need to add another query level to SUM your MAX values.
The idea would be to MAX in one select and then SUM the results using a outer query. I use AVG and MAX in the example below, however, any aggregate function could be used.
SELECT
LocationID,
MaxAverageSalePriceByLocation=MAX(AvgerageSalePriceByUserLocation)
FROM
(
SELECT
UserID,
AvgerageSalePriceByUserLocation=AVG(SalePrice)
FROM
MyTable
GROUP BY
UserID,LocationID
)AS A
GROUP BY
LocationID

If you want to sum all the rows Max values then use OVER()
Sum(Max(CONVERT(FLOAT, Replace([STF_Now], 'N/A', '0'))))OVER() AS [STF2_Now]
If you want to sum all the rows Max values for each group then use OVER(Partition by)
Sum(Max(CONVERT(FLOAT, Replace([STF_Now], 'N/A', '0'))))OVER(partition by grp1,grp2,..) AS [STF2_Now]
Note Converting the numeric data to float could lead to approximation issues.. Use Numeric with precision and scale

Related

How do you make a table into one long row in SAS?

I have a table with a number of variables such as:
+-----------+------------+---------+-----------+--------+
| DateFrom | DateTo | Price | Discount | Cost |
+-----------+------------+---------+-----------+--------+
| 01jan17 | 01jul17 | 17 | 4 | 5 |
| 01aug17 | 01feb18 | 15 | 1 | 3 |
| 01mar18 | 01dec18 | 12 | 2 | 1 |
| ... | ... | ... | ... | ... |
+-----------+------------+---------+-----------+--------+
However I want to split this so I have:
+------------+------------+----------+-------------+---------+-------------+------------+----------+-------------+-------------+
| DateFrom1 | DateTo1 | Price1 | Discount1 | Cost1 | DateFrom2 | DateTo2 | Price2 | Discount2 | Cost2 ... |
+------------+------------+----------+-------------+---------+-------------+------------+----------+-------------+-------------+
| 01jan17 | 01jul17 | 17 | 4 | 5 | 01aug17 | 01feb18 | 15 | 1 | 3 |
+------------+------------+----------+-------------+---------+-------------+------------+----------+-------------+-------------+
There's a cool (not at all obvious) solution using proc summary and the idgroup statement that only takes a few lines of code. This runs in memory and you're likely to come into problems if the dataset is large, otherwise this works very well.
Note that out[3] relates to the number of rows in the source data. You could easily make this dynamic by adding a prior step that calculates the number of rows and stores it in a macro variable.
/* create initial dataset */
data have;
input (DateFrom DateTo) (:date7.) Price Discount Cost;
format DateFrom DateTo date7.;
datalines;
01jan17 01jul17 17 4 5
01aug17 01feb18 15 1 3
01mar18 01dec18 12 2 1
;
run;
/* transform data into 1 row */
proc summary data=have nway;
output out=want (drop=_:)
idgroup(out[3] (_all_)=) / autoname;
run;

Database schema for product prices which can change daily

I am looking for suggestions to design a database schema based on following requirements.
Product can have variant
each variant can have different price
Price can be different for specific day of a week
Price can be different for a specific time of the day
all prices will have validity for specific dates only
Prices can be defined for peak, high or medium seasons
Suppliers offering any product can define their own prices and above rules still applies
what could be best possible schema where data is easy to retrieve without impacting performance?
Thanks in advance for your suggestions.
Regards
Harmeet
SO isn't really a forum for suggestions as much as answers. No answer anyone gives can be definitively correct. With that said, I would keep the tables as granular as possible to allow for easy changes across products.
In regards to #5, I would place start and end dates on products. If the price is no-longer valid, the product should no longer be available.
This includes relations for different seasonal prices, however you would either need to hardcode the seasons or create another table to define those.
For prices, if this is more than 1 region you may want a regions table in which case a currency column would be appropriate.
This is operational data, not temporal data. If you want it available for historical analysis of pricing you would need to create temporal tables as well.
Product Table
+-------------+-----------+------------------+----------------+
| ProductName | ProductID | ProductStartDate | ProductEndDate |
+-------------+-----------+------------------+----------------+
| Product1 | 1 | 01/01/2017 | 01/01/2018 |
| Product2 | 2 | 01/01/2017 | 01/01/2018 |
+-------------+-----------+------------------+----------------+
Variant Table
+-----------+-----------+-------------+---------------+-------------+-------------+
| ProductID | VariantID | VariantName | NormalPriceID | HighPriceID | PeakPriceID |
+-----------+-----------+-------------+---------------+-------------+-------------+
| 1 | 1 | Blue | 1 | 3 | 5 |
| 1 | 2 | Black | 2 | 4 | 5 |
+-----------+-----------+-------------+---------------+-------------+-------------+
Price Table
+---------+-----+-----+-----+-----+-----+-----+-----+
| PriceID | Mon | Tue | Wed | Thu | Fri | Sat | Sun |
+---------+-----+-----+-----+-----+-----+-----+-----+
| 1 | 30 | 30 | 30 | 30 | 35 | 35 | 35 |
| 2 | 35 | 35 | 35 | 35 | 40 | 40 | 40 |
| 3 | 33 | 33 | 33 | 33 | 39 | 39 | 39 |
| 4 | 38 | 38 | 38 | 38 | 44 | 44 | 44 |
| 5 | 40 | 40 | 40 | 40 | 50 | 50 | 50 |
+---------+-----+-----+-----+-----+-----+-----+-----+

Design a table around matrix table.

So I have a database that is for selling products. The orders tables hold's the customer's order info like shipping province and Shipping method:
+--------------------+----+
| order | |
+--------------------+----+
| someOtherInfo | |
| shippingMethodID | fk |
| shippingProvinceID | fk |
| basePrice | |
+--------------------+----+
Basically what's happen now is we have a base price for shipping that's based on the Province and Shipping Method. I need to now add that base price for shipping to my orders table and the fact that base price for shipping is a Matrix table or 2d array where col(shipping Method) + Row(Province) = baseCost is throwing me off on how to implement it. I never had to deal with this so Don't even know what to look up.
Example of what the matrix looks like:
+--------+----+----+----+-----+
| | BC | AB | SK | etc |
+--------+----+----+----+-----+
| ground | 9 | 9 | 9 | |
| Air | 15 | 21 | 21 | |
| etc | | | | |
+--------+----+----+----+-----+

SQL Database Constraint | Multi-table Constraint

I need to make 2 database constraints that connect two different tables at one time.
1. The total score of the four quarters equals the total score of the game the quarters belong to.
2. The total point of all the players equals to the score of the game of that team.
Here is what my tables look like.
quarter table
+------+--------+--------+--------+
| gNum | Period | hScore | aScore |
+------+--------+--------+--------+
| 1 | 1 | 13 | 18 |
| 1 | 2 | 12 | 19 |
| 1 | 3 | 23 | 31 |
| 1 | 4 | 32 | 18 |
| | | Total | Total |
| | | 80 | 86 |
+------+--------+--------+--------+
Game Table
+-----+--------+--------+--------+
| gID | hScore | lScore | tScore |
+-----+--------+--------+--------+
| 1 | 86 | 80 | 166 |
+-----+--------+--------+--------+
Player Table
+-----+------+--------+--------+
| pID | gNum | Period | Points |
+-----+------+--------+--------+
| 1 | 1 | 1 | 20 |
| | | 2 | 20 |
| | | 3 | 20 |
| | | 4 | 20 |
+-----+------+--------+--------+
So Virtually I need to use CHECK I think to make sure that players points = score of their team ie (hScore, aScore) and also make sure that the hScore and aScore = the total score in the Game table.
I was thinking of creating a foreign key variable on one of the tables and setting up constraints on that would this be the best way of going about it?
Thanks

What database technologies should I consider for building a scalable "running average" view?

We are working on an application where millions of users will be entering information at the same time. Suppose the application allows people to rate geographic regions on where they would like to live. Each participant is allowed to rate each region using a decimal value from 0-10. Each person belongs to one or more groups based upon attributes such as gender, and people that consider themselves active, or enjoy culture.
Every time a rating is made, we need to have a view which shows us the average rating for each region/group. I'm aware that most DB's have an "average" function, but for our purposes we need to be able to use our own function as we may use a the geometric mean instead of the arithmetic mean.
Below are some tables which might be used. Note: I did not include the relationship table PeopleGroups which map which groups a person is a member of for brevity purposes.
Regions People Groups RegionScoresByPerson
+-----+------------+ +-----+-------+ +-----+----------+ +-----+-----+-------+
| RID | NAME | | PID | Name | | GID | Name | | RID | PID | Score |
+-----+------------+ +-----+-------+ +-----+----------+ +-----+-----+-------+
| 1 | Flordia | | P1 | Alice | | G0 | Everyone | | 1 | P1 | 6 |
| 2 | California | | P2 | Bob | | G1 | Women | | 1 | P2 | 8 |
+-----+------------+ | P3 | Frank | | G2 | Men | | 1 | P3 | 3 |
| P4 | Mary | | G3 | Active | | 1 | P4 | 2 |
+-----+-------+ | G4 | Culture | | 1 | P1 | 7 |
+-----+----------+ | 1 | P2 | 5 |
| 1 | P3 | 8 |
| 1 | P4 | 2 |
+-----+-----+-------+
Our current implementation uses a similar set of tables for storing ratings, but we don't calculate averages real-time. Anytime we need the results (e.g. show me the average score California for women), we have to pull all the information into memory and run the calculations manually.
I was wondering how I leverage database technologies such as views, triggers, stored procedures, etc. to present to me a simple table that will allow me to get scores by for people and groups so we don't have to manually run calculations.
I would like some table like the following, where everything is handled by the DB. Any insert,update,delete actions on the RegionScoresByPerson or Groups tables would automatically be reflected in this table. If it is not apparent, the rows marked with * calculated rows. In this case I'm using a simple arithmetic average, but I the design should allow for any type of function.
EID stands for entity ID (a person or group)
Besides deciding how to build such a view, I'm unsure of what sort of datatypes to use (and index) for People and Groups. I suppose I'd like the index to be integers, but that would prevent me from creating the table below because I couldn't distinguish between Person 1 and Group 1 -- Would having ID's such as P1 and G1 be a performance hit? I'm obviously concerned about the design being scalable.
ScoreView
+-----------+-----+-------+
| RID | EID | Score |
| 1 | P1 | 6 |
| 1 | P2 | 8 |
| 1 | P3 | 3 |
| 1 | P4 | 2 |
| 1 | P1 | 7 |
| 1 | P2 | 5 |
| 1 | P3 | 8 |
| 1 | P4 | 2 |
| 1 | G0 | 4.75 |*
| 1 | G1 | 4 |*
| 1 | G2 | … |*
| 1 | G3 | … |*
+-----------+-----+-------+
Apache Flume is the open source tool designed to solve this kind of problem. Also have a look at Google Cloud Dataflow.
https://flume.apache.org/

Resources