Grouping Column Results by Result Name - snowflake-cloud-data-platform

In Snowflake, I am trying to group results by a specific column to reduce the number of duplicate rows reported. For example, instead of getting 1,227 rows with the ItemName value "New Client Session", I would like them condensed to 1 row when all the other data in those rows match.
Below is the query I've constructed, but I can't get it to work.
SELECT
StudioID,
StudioName,
ItemName,
ServiceCategory,
RevenueCategory,
SecondRevenueCategory
FROM
BV_SalesOverview
WHERE
SecondRevenueCategory <> 'Null'
ORDER BY
ItemName
Any ideas/options are greatly appreciated.
I've tried using GROUP BY, but can't seem to get it right. I continue to get errors.
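Below is a minimal sketch of two common approaches. It runs against an in-memory SQLite table that stands in for Snowflake, which is an assumption for illustration, and the sample rows are hypothetical. SELECT DISTINCT collapses rows whose selected columns all match. GROUP BY does the same, provided every non-aggregated column in the SELECT also appears in the GROUP BY. Leaving a column out is a common cause of GROUP BY errors. In the GROUP BY version, COUNT(*) shows how many original rows each output row represents.

```python
import sqlite3

# In-memory SQLite database standing in for Snowflake (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE BV_SalesOverview (
        StudioID INTEGER, StudioName TEXT, ItemName TEXT,
        ServiceCategory TEXT, RevenueCategory TEXT, SecondRevenueCategory TEXT
    )
""")
# Hypothetical sample data: three identical rows plus one different row.
rows = [(1, "Main", "New Client Session", "Training", "Services", "Sessions")] * 3 + [
    (1, "Main", "Monthly Pass", "Membership", "Passes", "Recurring")
]
conn.executemany("INSERT INTO BV_SalesOverview VALUES (?, ?, ?, ?, ?, ?)", rows)

# Option 1: DISTINCT keeps one copy of each fully identical row.
distinct_rows = conn.execute("""
    SELECT DISTINCT StudioID, StudioName, ItemName,
           ServiceCategory, RevenueCategory, SecondRevenueCategory
    FROM BV_SalesOverview
    WHERE SecondRevenueCategory <> 'Null'
    ORDER BY ItemName
""").fetchall()

# Option 2: GROUP BY every selected column; COUNT(*) reports how many rows were collapsed.
grouped_rows = conn.execute("""
    SELECT StudioID, StudioName, ItemName,
           ServiceCategory, RevenueCategory, SecondRevenueCategory,
           COUNT(*) AS DuplicateCount
    FROM BV_SalesOverview
    WHERE SecondRevenueCategory <> 'Null'
    GROUP BY StudioID, StudioName, ItemName,
             ServiceCategory, RevenueCategory, SecondRevenueCategory
    ORDER BY ItemName
""").fetchall()

for row in grouped_rows:
    print(row)
```

DISTINCT is the simpler choice when all you need is one copy of each row. GROUP BY is the choice when you also want aggregates such as the count.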

Related

how to select first rows distinct by a column name in a sub-query in sql-server?

I am building a Skype-like tool in which I have to show the last 10 distinct users who have logged in to my web application.
I maintain a table in SQL Server with a field called last_active_time. My requirement is to sort the table by last_active_time and show all the columns of the last 10 distinct users.
There is another field called WWID which uniquely identifies a user.
I am able to find the distinct WWIDs, but I can't select all the columns of those rows.
I am using the query below to find the distinct WWIDs:
select distinct(wwid) from(select top 100 * from dbo.rvpvisitors where last_active_time!='' order by last_active_time DESC) as newView;
But how do I get those distinct rows? I want to show how long each user has been away from the web app, using the difference between the current time and the last active time.
I am new to SQL, so maybe the question is naive, but I am struggling to get it right.

If you are using proper data types for your columns, you won't need a subquery to get that result. The following query should do the trick:
SELECT TOP 10
[wwid]
,MAX([last_active_time]) AS [last_active_time]
FROM [dbo].[rvpvisitors]
WHERE
[last_active_time] != ''
GROUP BY
[wwid]
ORDER BY
[last_active_time] DESC
If the column [last_active_time] is of type varchar/nvarchar (which is probably the case, since you check for empty strings in the WHERE clause), you might need to use CAST or CONVERT to treat it as an actual date. Otherwise MIN/MAX compare the values as strings rather than chronologically.
In general, I suggest using proper data types for your columns. If you have date or timestamp data, use the "date" or "datetime2" data types.
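Here is a small sketch of the string-versus-date problem, using SQLite with a hypothetical MM/DD/YYYY text column. SQLite has no CAST to a date type, so rearranging the text into ISO 8601 (YYYY-MM-DD) stands in for CAST/CONVERT here. Strings in that format sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table storing dates as MM/DD/YYYY text, like a varchar column.
conn.execute("CREATE TABLE visits (wwid TEXT, last_active_time TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [("u1", "12/31/2022"), ("u1", "01/15/2023")],
)

# MAX on text compares character by character, so "12/..." beats "01/..."
# even though 01/15/2023 is the later date.
text_max = conn.execute("SELECT MAX(last_active_time) FROM visits").fetchone()[0]

# Rebuild the value as YYYY-MM-DD (standing in for CAST/CONVERT) so MAX compares chronologically.
iso_max = conn.execute("""
    SELECT MAX(substr(last_active_time, 7, 4) || '-' ||
               substr(last_active_time, 1, 2) || '-' ||
               substr(last_active_time, 4, 2))
    FROM visits
""").fetchone()[0]

print(text_max)  # 12/31/2022
print(iso_max)   # 2023-01-15
```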
Edit:
The query aggregates the data by the column [wwid] and, for each wwid, returns the maximum [last_active_time].
The result is then sorted, and TOP 10 keeps the first 10 rows.
To add more columns as-is (without aggregating them), add them to both the SELECT and GROUP BY clauses. Note that this also groups by those columns, so a wwid that has different values in them will appear more than once.
If you need more aggregated columns, add them to the SELECT with the appropriate aggregate function (MIN/MAX/SUM/etc.).
I suggest you have a look at GROUP BY on W3.
To learn more about the logical execution order of the clauses, you can have a look here.
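You can try the GROUP BY + MAX approach with Python's built-in sqlite3 module, which stands in for SQL Server here. SQLite uses LIMIT where SQL Server uses TOP. The rvpvisitors rows and the page column are hypothetical. The second query illustrates the caveat about adding non-aggregated columns: grouping by wwid and page returns one row per combination instead of one row per user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rvpvisitors table with ISO 8601 text timestamps (these sort chronologically).
conn.execute("CREATE TABLE rvpvisitors (wwid TEXT, page TEXT, last_active_time TEXT)")
conn.executemany(
    "INSERT INTO rvpvisitors VALUES (?, ?, ?)",
    [
        ("alice", "home", "2024-05-01 09:00:00"),
        ("alice", "chat", "2024-05-01 10:30:00"),
        ("bob", "home", "2024-05-01 10:00:00"),
        ("carol", "chat", "2024-05-01 08:15:00"),
    ],
)

# One row per wwid with its latest activity, most recent first; LIMIT plays the role of TOP.
latest = conn.execute("""
    SELECT wwid, MAX(last_active_time) AS last_seen
    FROM rvpvisitors
    WHERE last_active_time <> ''
    GROUP BY wwid
    ORDER BY last_seen DESC
    LIMIT 10
""").fetchall()

# Adding a non-aggregated column to SELECT and GROUP BY splits each wwid by that column.
by_page = conn.execute("""
    SELECT wwid, page, MAX(last_active_time)
    FROM rvpvisitors
    GROUP BY wwid, page
""").fetchall()

print(latest)        # alice, then bob, then carol
print(len(by_page))  # 4: alice now appears once per page
```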

You can solve a problem like this by ranking the rows within each key, keeping only the first-ranked row per key, and then taking the top x of those rows. This removes duplicates while still returning every column.
;WITH RankOrdered AS
(
SELECT
*,
wwidRank = ROW_NUMBER() OVER (PARTITION BY wwid ORDER BY last_active_time DESC)
FROM
dbo.rvpvisitors
WHERE
last_active_time != ''
)
-- Without an ORDER BY, TOP(10) would return an arbitrary 10 users rather than the most recent ones
SELECT TOP(10) * FROM RankOrdered WHERE wwidRank = 1
ORDER BY last_active_time DESC
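Here is a runnable sketch of the ROW_NUMBER approach in SQLite. Window functions require SQLite 3.25 or later. The sample rows and the status column are hypothetical. SQLite writes the alias as "expr AS wwidRank" rather than SQL Server's "wwidRank = expr", and it uses LIMIT instead of TOP.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rvpvisitors (wwid TEXT, status TEXT, last_active_time TEXT)")
conn.executemany(
    "INSERT INTO rvpvisitors VALUES (?, ?, ?)",
    [
        ("alice", "away", "2024-05-01 09:00:00"),
        ("alice", "online", "2024-05-01 10:30:00"),
        ("bob", "online", "2024-05-01 10:00:00"),
        ("bob", "away", ""),  # excluded by the WHERE clause
        ("carol", "away", "2024-05-01 08:15:00"),
    ],
)

# Rank each user's rows newest-first, keep rank 1, then take the 10 most recent users.
latest_rows = conn.execute("""
    WITH RankOrdered AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY wwid ORDER BY last_active_time DESC
               ) AS wwidRank
        FROM rvpvisitors
        WHERE last_active_time <> ''
    )
    SELECT wwid, status, last_active_time
    FROM RankOrdered
    WHERE wwidRank = 1
    ORDER BY last_active_time DESC
    LIMIT 10
""").fetchall()

for row in latest_rows:
    print(row)
```

Unlike the GROUP BY version, every column comes from the same (latest) row. This is why the status shown for alice is "online".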

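If my understanding is right, the query below will give the desired output. You can add conditions according to your needs.
Note that SQL Server requires DISTINCT to come before TOP, and with SELECT DISTINCT every ORDER BY item must appear in the select list. For that reason, the query groups by wwid and orders by each user's latest activity instead:
select top 10 wwid from dbo.rvpvisitors group by wwid
order by max(last_active_time) desc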

how to add a total column to a table in sql server

I am trying to sum my column named TARGET where MEASURED_COMPONENT matches a specific value and add the result to my table, but I am having trouble. Ultimately, I want to add 4 new rows to my current table, one for each of the 4 conditions. All the columns would be null except TIME_VALUE, which would be the month for each condition's total.
I am using the query below.
select sum(TARGET) as TARGET_TOTAL
from REF_targets
where MEASURED_COMPONENT ='dispatch'
or MEASURED_COMPONENT='acknoweledge'
or MEASURED_COMPONENT= 'DRIVE'
or MEASURED_COMPONENT= 'ENROUTE'
group by TIME_VALUE
When I combine the conditions, I get an unexpectedly large sum. But if I create a separate query for each condition, I get the correct total.
select time_value
, sum(TARGET) as TARGET_TOTAL
from REF_targets
where MEASURED_COMPONENT ='dispatch'
group by TIME_VALUE
I can't select all the columns with this query because I keep getting an error saying that I need to add ALL the columns to the GROUP BY. Doing that ultimately just gives me a mirror of the TARGET data I already have, in a new column.
Please help,
Thanks!

You get a large number because MEASURED_COMPONENT is not in the GROUP BY, so the targets of all four components are added together for each TIME_VALUE. This should give you a sum for each MEASURED_COMPONENT:
select TIME_VALUE, MEASURED_COMPONENT, sum(TARGET) as TARGET_TOTAL
from REF_targets
where MEASURED_COMPONENT ='dispatch'
or MEASURED_COMPONENT='acknoweledge'
or MEASURED_COMPONENT= 'DRIVE'
or MEASURED_COMPONENT= 'ENROUTE'
group by TIME_VALUE, MEASURED_COMPONENT
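The effect of the missing GROUP BY column can be reproduced with a small, hypothetical REF_targets table in SQLite. IN (...) is equivalent to the chain of OR conditions. The 'acknoweledge' spelling is kept from the question in case that is how the value is actually stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE REF_targets (TIME_VALUE TEXT, MEASURED_COMPONENT TEXT, TARGET INTEGER)")
# Hypothetical targets for two months.
conn.executemany(
    "INSERT INTO REF_targets VALUES (?, ?, ?)",
    [
        ("2024-01", "dispatch", 10),
        ("2024-01", "DRIVE", 20),
        ("2024-01", "ENROUTE", 30),
        ("2024-02", "dispatch", 5),
    ],
)

components = ("dispatch", "acknoweledge", "DRIVE", "ENROUTE")
placeholders = ", ".join("?" for _ in components)

# Grouping only by TIME_VALUE adds every component together for each month.
per_month = conn.execute(f"""
    SELECT TIME_VALUE, SUM(TARGET) FROM REF_targets
    WHERE MEASURED_COMPONENT IN ({placeholders})
    GROUP BY TIME_VALUE
    ORDER BY TIME_VALUE
""", components).fetchall()

# Adding MEASURED_COMPONENT to the GROUP BY gives one total per component per month.
per_component = conn.execute(f"""
    SELECT TIME_VALUE, MEASURED_COMPONENT, SUM(TARGET) FROM REF_targets
    WHERE MEASURED_COMPONENT IN ({placeholders})
    GROUP BY TIME_VALUE, MEASURED_COMPONENT
""", components).fetchall()

print(per_month)      # January shows 60: dispatch + DRIVE + ENROUTE combined
print(per_component)
```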

What did I do wrong with this subquery for SQL Server?

I've got a table called tblEventLocationStock. It stores sales information for stock at a certain location and event. I'm trying to get a list of items that have a different starting count than the end count from the previous event. I've got this query, but I get the "subquery returned more than 1 value" error:
SELECT ID,EventID,LocationID,StockID,StartQty,UnitPrice,PhysicalSalesQty,PhysicalSalesValue,PhysicalEndQty,TillSoldQty,TillSoldValue
FROM tblEventLocationStock ELS
where StartQty <> (
select PhysicalEndQty from tblEventLocationStock ELSO
where ELS.StockID=ELSO.StockID
and ELS.LocationID=ELSO.LocationID
and ELS.EventID=(ELSO.EventID+1000))
ORDER BY ID desc
I use ELS.EventID=ELSO.EventID+1000 because the event IDs go up in intervals of 1000.
What's odd is that even though I get the "subquery returned more than 1 value" error, I still get 10 rows in the results tab. Those 10 results do appear to have a different starting count than the same item at the same location from the previous event. Also, I get no results if I use an ORDER BY, but I still get 10 results if I don't.
What's even odder is that I get those 10 results if I run the query with some joins to other tables (so I can get the names of the stock items and locations instead of just IDs). But if I run it without the joins, I get no results.

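Try this. EXISTS only checks whether a matching row exists, so it doesn't fail when the correlated subquery matches more than one row: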
SELECT ID, EventID, LocationID, StockID, StartQty, UnitPrice, PhysicalSalesQty,
PhysicalSalesValue, PhysicalEndQty, TillSoldQty, TillSoldValue
FROM tblEventLocationStock ELS
WHERE EXISTS (
SELECT 1
FROM tblEventLocationStock ELSO
WHERE ELS.StockID = ELSO.StockID AND
ELS.StartQty <> ELSO.PhysicalEndQty AND
ELS.LocationID = ELSO.LocationID AND
ELS.EventID = (ELSO.EventID+1000)
)
ORDER BY ID DESC
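Here is a small SQLite sketch of the EXISTS check, using hypothetical rows. Two rows share the same stock, location and event (IDs 4 and 5). This is the situation in which a scalar "StartQty <> (subquery)" fails in SQL Server with "subquery returned more than 1 value". SQLite does not raise that error, so the sketch only shows the EXISTS form. A row is kept when at least one row from the previous event has a PhysicalEndQty different from its StartQty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tblEventLocationStock (
        ID INTEGER PRIMARY KEY, EventID INTEGER, LocationID INTEGER,
        StockID INTEGER, StartQty INTEGER, PhysicalEndQty INTEGER
    )
""")
conn.executemany(
    "INSERT INTO tblEventLocationStock VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1000, 1, 7, 50, 40),  # first event at location 1
        (2, 2000, 1, 7, 40, 30),  # starts at 40, matching the previous end
        (3, 3000, 1, 7, 25, 20),  # starts at 25, but the previous event ended at 30
        (4, 1000, 2, 7, 10, 8),
        (5, 1000, 2, 7, 10, 9),   # second row for the same stock/location/event
        (6, 2000, 2, 7, 8, 5),    # matches row 4's end (8) but not row 5's (9)
    ],
)

# Keep rows where some previous-event row for the same stock and location ended differently.
mismatches = conn.execute("""
    SELECT ELS.ID
    FROM tblEventLocationStock ELS
    WHERE EXISTS (
        SELECT 1
        FROM tblEventLocationStock ELSO
        WHERE ELSO.StockID = ELS.StockID
          AND ELSO.LocationID = ELS.LocationID
          AND ELS.EventID = ELSO.EventID + 1000
          AND ELS.StartQty <> ELSO.PhysicalEndQty
    )
    ORDER BY ELS.ID DESC
""").fetchall()

print(mismatches)
```

Rows with no previous event (IDs 1, 4 and 5) are never returned, because the subquery finds nothing for them.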

Difference between duplicate check if using Distinct and Group by with aggregate

Okay, it has been quite some time since I used SQL Server intensively for writing queries.
There has to be some gotcha that I am missing.
As I understand it, the following two queries should return the same number of duplicate records:
SELECT COUNT(INVNO)
, INVNO
FROM INVOICE
GROUP BY INVNO
HAVING COUNT(INVNO) > 1
ORDER BY INVNO
SELECT DISTINCT invno
FROM INVOICE
ORDER BY INVNO
There are no null values in INVNO.
Where could I possibly be going wrong?

Those queries will not return the same results. The first one only gives you INVNO values that have duplicates. The second gives all unique INVNO values, including those that appear only once in the entire table.

The GROUP BY query filters out all the single invoices, while DISTINCT simply returns each invoice number once. The first query's results are a subset of the second's.
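Here is a quick way to see the difference, using a hypothetical INVOICE table in SQLite. The HAVING query returns only invoice numbers that occur more than once, while DISTINCT returns every invoice number once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE INVOICE (INVNO TEXT)")
# Hypothetical data: A1 twice, B2 once, C3 three times.
conn.executemany(
    "INSERT INTO INVOICE VALUES (?)",
    [("A1",), ("A1",), ("B2",), ("C3",), ("C3",), ("C3",)],
)

# Only invoice numbers that appear more than once, with their counts.
duplicates = conn.execute("""
    SELECT COUNT(INVNO), INVNO
    FROM INVOICE
    GROUP BY INVNO
    HAVING COUNT(INVNO) > 1
    ORDER BY INVNO
""").fetchall()

# Every invoice number once, including B2, which has no duplicate.
distinct_invnos = conn.execute(
    "SELECT DISTINCT INVNO FROM INVOICE ORDER BY INVNO"
).fetchall()

print(duplicates)       # [(2, 'A1'), (3, 'C3')]
print(distinct_invnos)  # [('A1',), ('B2',), ('C3',)]
```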

In addition to what Adam said, GROUP BY may happen to return the data sorted on the grouped columns, but SQL Server does not guarantee that order. Only an ORDER BY does.

Grouping by single column but returning all the columns without including other columns in aggregate function

I am working on a SQL query that should group by the column BidBroker and return all the columns in the table.
I tried it using the following query:
select Product,
Term,
BidBroker,
BidVolume,
BidCP,
Bid,
Offer,
OfferCP,
OfferVolume,
OfferBroker,
ProductID,
TermID
from canadiancrudes
group by BidBroker
The above query threw the following error:
Column 'canadiancrudes.Product' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Is there another way to return all the data grouped by BidBroker without changing the order of the data coming from canadiancrudes?

First, if you are going to aggregate, you should learn about aggregate functions. Then grouping becomes much more obvious.
I think you should explain what you are trying to accomplish here, because I suspect that you are trying to SORT by BidBroker rather than group by it.

If you mean you want to sort by BidBroker, you can use:
SELECT Product,Term,BidBroker,BidVolume,BidCP,Bid,Offer,OfferCP,OfferVolume,OfferBroker,ProductID,TermID
FROM canadiancrudes
ORDER BY BidBroker
If you want to GROUP BY and return one example row per BidBroker, you can use:
SELECT c1.Product,c1.Term,c1.BidBroker,c1.BidVolume,c1.BidCP,c1.Bid,c1.Offer,c1.OfferCP,c1.OfferVolume,c1.OfferBroker,c1.ProductID,c1.TermID
FROM canadiancrudes c1
WHERE c1.YOURPRIMARYKEY IN (
select MIN(c2.YOURPRIMARYKEY) from canadiancrudes c2 group by c2.BidBroker
)
Replace YOURPRIMARYKEY with the column that holds your unique row ID.
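Here is the MIN(primary key) pattern as a runnable SQLite sketch. The hypothetical id column stands in for YOURPRIMARYKEY, and the table is trimmed to a few hypothetical columns and rows. Each broker is represented by the row with its smallest id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down canadiancrudes table; "id" plays the role of YOURPRIMARYKEY.
conn.execute("CREATE TABLE canadiancrudes (id INTEGER PRIMARY KEY, Product TEXT, BidBroker TEXT, Bid REAL)")
conn.executemany(
    "INSERT INTO canadiancrudes VALUES (?, ?, ?, ?)",
    [
        (1, "WCS", "BrokerA", 55.0),
        (2, "SYN", "BrokerB", 70.5),
        (3, "LSB", "BrokerA", 60.25),
        (4, "WCS", "BrokerB", 54.75),
    ],
)

# The subquery picks the smallest id per BidBroker; the outer query returns those full rows.
examples = conn.execute("""
    SELECT c1.id, c1.Product, c1.BidBroker, c1.Bid
    FROM canadiancrudes c1
    WHERE c1.id IN (
        SELECT MIN(c2.id) FROM canadiancrudes c2 GROUP BY c2.BidBroker
    )
    ORDER BY c1.id
""").fetchall()

print(examples)
```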

As others have said, don't use GROUP BY if you don't want to aggregate something. If you do want to aggregate by one column but include the other columns as well, consider researching window functions with PARTITION BY.
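Here is a sketch of the PARTITION BY idea in SQLite (3.25 or later), with hypothetical columns and rows. Window aggregates attach per-BidBroker values to every row, so all rows and columns are kept and no GROUP BY is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down canadiancrudes table.
conn.execute("CREATE TABLE canadiancrudes (id INTEGER PRIMARY KEY, Product TEXT, BidBroker TEXT, Bid REAL)")
conn.executemany(
    "INSERT INTO canadiancrudes VALUES (?, ?, ?, ?)",
    [
        (1, "WCS", "BrokerA", 55.0),
        (2, "SYN", "BrokerB", 70.5),
        (3, "LSB", "BrokerA", 60.25),
        (4, "WCS", "BrokerB", 54.75),
    ],
)

# Each row keeps its own columns and gains its broker's row count and best bid.
rows = conn.execute("""
    SELECT id, Product, BidBroker, Bid,
           COUNT(*) OVER (PARTITION BY BidBroker) AS BrokerRows,
           MAX(Bid) OVER (PARTITION BY BidBroker) AS BrokerMaxBid
    FROM canadiancrudes
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```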
