aggregate / count rows grouped by geography (or geometry) - sql-server

I have a table such as:
Create Table SalesTable
( StuffID int identity not null,
County geography not null,
SaleAmount decimal(12,8) not null,
SaleTime datetime not null )
It has a recording of every sale with amount, time, and a geography of the county that the sale happened in.
I want to run a query like this:
Select sum(SaleAmount), County from SalesTable group by County
But if I try to do that, I get:
The type "geography" is not comparable. It cannot be used in the GROUP BY clause.
But I'd like to know how many sales happened per county. Annoyingly, if the counties were stored as abbreviations (SDC, LAC, SIC, etc.), I could group on them, because they would simply be varchar values. But I use the geography data type for other reasons.

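The geography type has a method, STAsText(), that returns the instance's Well-Known Text (WKT) representation as a string, and you can group on that string.
Try this: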
Select sum(SaleAmount), County.STAsText() from SalesTable
group by County.STAsText()

I would propose a slightly different structure:
create table dbo.County (
CountyID int identity not null
constraint [PK_County] primary key clustered (CountyID),
Name varchar(200) not null,
Abbreviation varchar(10) not null,
geo geography not null
);
Create Table SalesTable
(
StuffID int identity not null,
CountyID int not null
constraint FK_Sales_County foreign key (CountyID)
references dbo.County (CountyID),
SaleAmount decimal(12,8) not null,
SaleTime datetime not null
);
From there, your aggregate looks something like:
Select c.Abbreviation, sum(SaleAmount)
from SalesTable as s
join dbo.County as c
on s.CountyID = c.CountyID
group by c.Abbreviation;
If you really need the geography column in the result, you're only a subquery or a common table expression away. Aggregate on the key first, then join back to County to pick up geo:
with totals as (
Select c.CountyID, c.Abbreviation,
sum(s.SaleAmount) as [TotalSalesAmount]
from SalesTable as s
join dbo.County as c
on s.CountyID = c.CountyID
group by c.CountyID, c.Abbreviation
)
select t.Abbreviation, c.geo, t.TotalSalesAmount
from totals as t
join dbo.County as c
on t.CountyID = c.CountyID;
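A minimal runnable sketch of this aggregate-then-join pattern, using Python's built-in sqlite3 module as a stand-in for SQL Server. SQLite has no geography type, so geo is stored here as a placeholder WKT string purely for illustration, and the rows are hypothetical sample data. The point is that the GROUP BY only touches the comparable key (CountyID), and the non-comparable column is picked up by the join afterwards.

```python
import sqlite3

# Stand-in schema: SQLite has no geography type, so geo is placeholder WKT text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE County (
    CountyID     INTEGER PRIMARY KEY,
    Name         TEXT NOT NULL,
    Abbreviation TEXT NOT NULL,
    geo          TEXT NOT NULL
);
CREATE TABLE SalesTable (
    StuffID    INTEGER PRIMARY KEY,
    CountyID   INTEGER NOT NULL REFERENCES County (CountyID),
    SaleAmount REAL NOT NULL,
    SaleTime   TEXT NOT NULL
);
INSERT INTO County VALUES
    (1, 'San Diego',   'SDC', 'POINT(1 1)'),
    (2, 'Los Angeles', 'LAC', 'POINT(2 2)');
INSERT INTO SalesTable (CountyID, SaleAmount, SaleTime) VALUES
    (1, 10.0, '2020-01-01'), (1, 5.5, '2020-01-02'), (2, 7.25, '2020-01-03');
""")

# Group only on the comparable key, then join back for the geo column.
rows = conn.execute("""
WITH totals AS (
    SELECT s.CountyID,
           COUNT(*)          AS SaleCount,
           SUM(s.SaleAmount) AS TotalSalesAmount
    FROM SalesTable AS s
    GROUP BY s.CountyID
)
SELECT c.Abbreviation, c.geo, t.SaleCount, t.TotalSalesAmount
FROM totals AS t
JOIN County AS c ON t.CountyID = c.CountyID
ORDER BY c.Abbreviation
""").fetchall()

for row in rows:
    print(row)
```

The same CTE also gives the per-county sale count the question title asks about, via COUNT(*).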


Is there a way to display top 10 average ratings of a business type?

I have the following tables in my database schema:
CREATE TABLE public.organization_rating
(
rating integer NOT NULL,
user_id integer NOT NULL,
organization_id integer NOT NULL,
CONSTRAINT organization_rating_pkey PRIMARY KEY (user_id, organization_id),
CONSTRAINT user_id FOREIGN KEY (user_id)
REFERENCES public.users (user_id) MATCH SIMPLE
ON UPDATE CASCADE
ON DELETE CASCADE,
CONSTRAINT stars CHECK (rating >= 1 AND rating < 5)
)
And
CREATE TABLE public.organization
(
org_id integer NOT NULL DEFAULT nextval('organization_org_id_seq'::regclass),
name character varying(90) COLLATE pg_catalog."default" NOT NULL,
description character varying(90) COLLATE pg_catalog."default" NOT NULL,
email text COLLATE pg_catalog."default" NOT NULL,
phone_number character varying COLLATE pg_catalog."default" NOT NULL,
bt_id integer NOT NULL,
bs_id integer NOT NULL,
CONSTRAINT organization_pkey PRIMARY KEY (org_id),
CONSTRAINT bs_id FOREIGN KEY (bs_id)
REFERENCES public.business_step (bs_id) MATCH SIMPLE
ON UPDATE CASCADE
ON DELETE CASCADE
NOT VALID,
CONSTRAINT bt_id FOREIGN KEY (bt_id)
REFERENCES public.business_type (bt_id) MATCH SIMPLE
ON UPDATE CASCADE
ON DELETE CASCADE
NOT VALID
)
I would like to implement a query that gives me the following:
Top 10 organization ratings per business type
Top 10 organizations per business stage
Top 3 organizations with worst rating
The queries appear similar (I just have to order DESC or ASC, depending on the requirement), so once one query works I can derive the other two.
Here is my select statement:
SELECT O.org_id, O.bt_id, R.rating
FROM public.organization as O
INNER JOIN public.organization_rating as R ON O.org_id = R.organization_id
WHERE bt_id=1
GROUP by org_id, bt_id, rating
ORDER BY ROUND(AVG(rating)) DESC LIMIT 10
But in the output, various organizations are duplicated.
Why are the organization IDs being duplicated?
Thanks in advance.
You see duplicated records because the ratings in organization_rating are per user: several users can rate the same organization. Your query also groups by rating, so it returns one row per distinct rating of each organization. You should first compute the average rating per organization and then join with the organization table.
You can do something like this for bt_id = 1:
with average_rating as (
select organization_id as org_id, avg(rating) as avg_rating
from organization_rating r
group by organization_id
)
select r.org_id, o.bt_id, r.avg_rating
from average_rating r
join organization o on o.org_id = r.org_id
where o.bt_id = 1
order by r.avg_rating desc limit 10;
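To see why the original query returns duplicates, here is a small sketch using Python's sqlite3 as a stand-in for PostgreSQL, with hypothetical ratings: organization 1 is rated 4 and 2 by two users, and organization 2 is rated 4 by two users. Grouping by rating returns one row per distinct rating, while averaging first returns one row per organization.

```python
import sqlite3

# Hypothetical data; both organizations 1 and 2 belong to business type 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organization (org_id INTEGER PRIMARY KEY, bt_id INTEGER NOT NULL);
CREATE TABLE organization_rating (
    rating INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    organization_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, organization_id)
);
INSERT INTO organization VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO organization_rating VALUES
    (4, 10, 1), (2, 11, 1),   -- org 1: two different ratings
    (4, 10, 2), (4, 11, 2),   -- org 2: the same rating twice
    (4, 10, 3);
""")

# Original approach: grouping by rating yields one row per distinct rating.
grouped_by_rating = conn.execute("""
SELECT o.org_id, o.bt_id, r.rating
FROM organization AS o
JOIN organization_rating AS r ON o.org_id = r.organization_id
WHERE o.bt_id = 1
GROUP BY o.org_id, o.bt_id, r.rating
ORDER BY o.org_id, r.rating
""").fetchall()

# Fixed approach: average per organization first, then join and filter.
averaged_first = conn.execute("""
WITH average_rating AS (
    SELECT organization_id AS org_id, AVG(rating) AS avg_rating
    FROM organization_rating
    GROUP BY organization_id
)
SELECT r.org_id, o.bt_id, r.avg_rating
FROM average_rating AS r
JOIN organization AS o ON o.org_id = r.org_id
WHERE o.bt_id = 1
ORDER BY r.avg_rating DESC, r.org_id
LIMIT 10
""").fetchall()

print(grouped_by_rating)
print(averaged_first)
```

Organization 1 shows up twice in the first result (once per distinct rating) but only once in the second.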
If you want to get all data in a single query, you could use a window function:
with average_rating as (
select organization_id as org_id, avg(rating) as avg_rating
from organization_rating r
group by organization_id
),
ordered_data as (
select r.org_id, o.bt_id, r.avg_rating, row_number() over (partition by o.bt_id order by r.avg_rating desc) as rnk
from average_rating r
join organization o on o.org_id = r.org_id
)
select org_id, bt_id, avg_rating
from ordered_data
where rnk <= 10
order by bt_id, rnk;
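A sketch of the window-function version run end to end, again with sqlite3 standing in for PostgreSQL (SQLite 3.25 or later is needed for ROW_NUMBER()). The data is hypothetical, and the cut-off is passed as a parameter so the same query can return the top 10 or, as here, the top 2. Adding org_id as a tie-breaker in the window's ORDER BY keeps ROW_NUMBER() deterministic when two organizations share an average. Reversing the sort order gives the worst-rated organizations, as the question suggests.

```python
import sqlite3

# Hypothetical data: business type 1 has orgs 1-3, business type 2 has orgs 4-5.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organization (org_id INTEGER PRIMARY KEY, bt_id INTEGER NOT NULL);
CREATE TABLE organization_rating (
    rating INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    organization_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, organization_id)
);
INSERT INTO organization VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2);
INSERT INTO organization_rating VALUES
    (4, 10, 1), (2, 11, 1),
    (1, 10, 2),
    (4, 10, 3), (4, 11, 3),
    (3, 10, 4),
    (2, 10, 5), (1, 11, 5);
""")

# Top N per business type: number rows within each bt_id, keep the first N.
top_per_type = conn.execute("""
WITH average_rating AS (
    SELECT organization_id AS org_id, AVG(rating) AS avg_rating
    FROM organization_rating
    GROUP BY organization_id
),
ordered_data AS (
    SELECT r.org_id, o.bt_id, r.avg_rating,
           ROW_NUMBER() OVER (PARTITION BY o.bt_id
                              ORDER BY r.avg_rating DESC, r.org_id) AS rnk
    FROM average_rating AS r
    JOIN organization AS o ON o.org_id = r.org_id
)
SELECT org_id, bt_id, avg_rating
FROM ordered_data
WHERE rnk <= ?
ORDER BY bt_id, rnk
""", (2,)).fetchall()

# Worst three overall: same averages, ascending order, no partition.
worst_three = conn.execute("""
WITH average_rating AS (
    SELECT organization_id AS org_id, AVG(rating) AS avg_rating
    FROM organization_rating
    GROUP BY organization_id
)
SELECT org_id, avg_rating
FROM average_rating
ORDER BY avg_rating ASC, org_id
LIMIT 3
""").fetchall()

print(top_per_type)
print(worst_three)
```

Partitioning by bs_id instead of bt_id would give the per-business-stage variant.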
Thanks to mihai_f87, I was able to construct this query:
with average_rating as (
SELECT organization_id as Organization_ID, ROUND(AVG(rating)) as Rating
FROM organization_rating
GROUP BY organization_id
),
ordered_data as (
select org_id, bt_id, rating, row_number() over (partition by bt_id order by rating desc) rank
from average_rating r
join organization o on o.org_id = r.organization_id
where bt_id = 1
order by bt_id, rating desc
)
select org_id, bt_id, rating
from ordered_data
where rank <= 10
With this query, I was able to get the top 10 organizations per business type.

SQL Server 2016 Compare values from multiple columns in multiple rows in single table

USE dev_db
GO
CREATE TABLE T1_VALS
(
[SITE_ID] [int] NULL,
[LATITUDE] [numeric](10, 6) NULL,
[UNIQUE_ID] [int] NULL,
[COLLECT_RANK] [int] NULL,
[CREATED_RANK] [int] NULL,
[UNIQUE_ID_RANK] [int] NULL,
[UPDATE_FLAG] [int] NULL
)
GO
INSERT INTO T1_VALS
(SITE_ID,LATITUDE,UNIQUE_ID,COLLECT_RANK,CREATED_RANK,UNIQUE_ID_RANK)
VALUES
(207442,40.900470,59664,1,1,1),
(207442,40.900280,61320,1,1,2),
(204314,40.245220,48685,1,2,2),
(204314,40.245910,59977,1,1,1),
(202416,39.449530,9295,1,1,2),
(202416,39.449680,62264,1,1,1);
I generated the COLLECT_RANK and CREATED_RANK columns from two date columns (not shown here), and the UNIQUE_ID_RANK column from the UNIQUE_ID column shown here.
I generated these columns with a ranking function and an OVER clause in a SELECT. A _RANK value of 1 means the latest date or the greatest UNIQUE_ID value. I thought my solution would be pretty straightforward using these rank values via array and cursor processing, but I seem to have painted myself into a corner.
My problem: I need to choose the LATITUDE value and its UNIQUE_ID based on the following business rules, and set that record's UPDATE_FLAG column to the update value (1).
Select the record with the most recent Collection Date (i.e. COLLECT_RANK = 1) for a given SITE_ID. If multiple records share the same Collection Date (i.e. the same rank value), select the record with the most recent Created Date (CREATED_RANK = 1) for that SITE_ID. If multiple records also share the same Created Date, select the record with the highest UNIQUE_ID for that SITE_ID (i.e. UNIQUE_ID_RANK = 1).
Your suggestions would be most appreciated.
I think you can use TOP and ORDER BY:
select top 1 t1.*
from t1_vals t1
order by collect_rank asc, created_rank asc, unique_id desc;
If you want this per site, which might be what your question is asking, then use row_number():
select t1.*
from (select t1.*,
row_number() over (partition by site_id order by collect_rank asc, created_rank asc, unique_id desc) as seqnum
from t1_vals t1
) t1
where seqnum = 1;
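The question also asks to set UPDATE_FLAG on the chosen record. Below is a minimal sketch with sqlite3 standing in for SQL Server, using the question's sample rows: the row_number() ordering from the answer picks one row per SITE_ID, and an UPDATE flags those rows by rowid. The rowid subquery is a SQLite-specific substitute; in SQL Server the same idea is usually written as an UPDATE through a CTE that computes ROW_NUMBER().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1_VALS (
    SITE_ID INTEGER, LATITUDE NUMERIC, UNIQUE_ID INTEGER,
    COLLECT_RANK INTEGER, CREATED_RANK INTEGER, UNIQUE_ID_RANK INTEGER,
    UPDATE_FLAG INTEGER
);
INSERT INTO T1_VALS
    (SITE_ID, LATITUDE, UNIQUE_ID, COLLECT_RANK, CREATED_RANK, UNIQUE_ID_RANK)
VALUES
    (207442, 40.900470, 59664, 1, 1, 1),
    (207442, 40.900280, 61320, 1, 1, 2),
    (204314, 40.245220, 48685, 1, 2, 2),
    (204314, 40.245910, 59977, 1, 1, 1),
    (202416, 39.449530,  9295, 1, 1, 2),
    (202416, 39.449680, 62264, 1, 1, 1);
""")

# Flag the first row per SITE_ID: latest collection date, then latest
# created date, then highest UNIQUE_ID.
conn.execute("""
UPDATE T1_VALS
SET UPDATE_FLAG = 1
WHERE rowid IN (
    SELECT rid FROM (
        SELECT rowid AS rid,
               ROW_NUMBER() OVER (
                   PARTITION BY SITE_ID
                   ORDER BY COLLECT_RANK ASC, CREATED_RANK ASC, UNIQUE_ID DESC
               ) AS seqnum
        FROM T1_VALS
    )
    WHERE seqnum = 1
)
""")

flagged = conn.execute(
    "SELECT SITE_ID, UNIQUE_ID, LATITUDE FROM T1_VALS "
    "WHERE UPDATE_FLAG = 1 ORDER BY SITE_ID"
).fetchall()
print(flagged)
```

Ordering on UNIQUE_ID directly (as the answer does) rather than on UNIQUE_ID_RANK matters for site 207442: its sample UNIQUE_ID_RANK marks the lower UNIQUE_ID (59664) as rank 1, whereas this sketch flags 61320.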

Self Join on large tables slowness issue

I have two tables like...
table1 (cid, duedate, currency, value)
main_table1 (cid)
My query is below. I am finding the correlation between each pair of cids. table1 contains 3 million records (cid and duedate are unique together), and main_table1 contains 1500 records, all unique.
SELECT
b.cid, c.cid,
(COUNT(*) * SUM(b.value * c.value) -
SUM(b.value) * SUM(c.value)) /
(SQRT(COUNT(*) * SUM(b.value * b.value) -
SUM(b.value) * SUM(b.value)) *
SQRT(COUNT(*) * SUM(c.value * c.value) -
SUM(c.value) * SUM(c.value))
) AS correl_ij
FROM
main_table1 a
JOIN
table1 AS b ON a.cid = b.cid
JOIN
table1 AS c ON b.cid < c.cid
AND b.duedate = c.duedate
AND b.currency = c.currency
GROUP BY
b.cid, c.cid
Please suggest how to optimize this query because it is running slow.
CREATE TABLE #table1(
id int identity,
cid int NOT NULL,
duedate date NOT NULL,
currency char(3) NOT NULL,
value float,
PRIMARY KEY(id,currency,cid,duedate)
);
CREATE TABLE #main_table1(
cid int NOT NULL PRIMARY KEY,
currency char(3)
);
--#main_table1 contains 155000 cid records; there are no duplicate values
insert into #main_table1
values(19498,'ABC'),(19500,'ABC'),(19534,'ABC')
INSERT INTO #table1(CID,DUEDATE,currency,value)
VALUES(19498,'2016-12-08','USD',-0.0279702098021799) ,
(19498,'2016-12-12','USD',0.0151285161000268),
(19498,'2016-12-15','USD',-0.00965080868337728),
(19498,'2016-12-19','USD',0.00808331709091531)
There are 3 million records in this table for different dates and cids, and most of the cids are present in #main_table1.
I am using b.cid < c.cid to remove duplicate relationships between pairs of cids, because I am deriving the correlation between each pair.
So once the 19498 -> 19500 correlation is calculated, I do not want 19500 -> 19498, because it would be the same value, just duplicated.
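A small sketch of the same correlation query in sqlite3, on hypothetical data for three cids over four dates, showing what the b.cid < c.cid condition does: each unordered pair of cids appears exactly once, with Pearson's r computed from the sums in a single pass. SQLite builds may not include SQRT, so the sketch registers Python's math.sqrt under that name with create_function. main_table1 is reduced to its cid column.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Some SQLite builds lack SQRT; register Python's math.sqrt under that name.
conn.create_function("SQRT", 1, math.sqrt)
conn.executescript("""
CREATE TABLE main_table1 (cid INTEGER PRIMARY KEY);
CREATE TABLE table1 (
    cid INTEGER NOT NULL, duedate TEXT NOT NULL, currency TEXT NOT NULL, value REAL,
    PRIMARY KEY (cid, currency, duedate)
);
INSERT INTO main_table1 VALUES (1), (2), (3);
""")

# Hypothetical series: cid 2 moves with cid 1, cid 3 moves against it.
dates = ["2016-12-08", "2016-12-12", "2016-12-15", "2016-12-19"]
series = {1: [1.0, 2.0, 3.0, 4.0], 2: [2.0, 4.0, 6.0, 8.0], 3: [4.0, 3.0, 2.0, 1.0]}
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, 'USD', ?)",
    [(cid, d, v) for cid, values in series.items() for d, v in zip(dates, values)],
)

rows = conn.execute("""
SELECT b.cid, c.cid,
       (COUNT(*) * SUM(b.value * c.value) - SUM(b.value) * SUM(c.value)) /
       (SQRT(COUNT(*) * SUM(b.value * b.value) - SUM(b.value) * SUM(b.value)) *
        SQRT(COUNT(*) * SUM(c.value * c.value) - SUM(c.value) * SUM(c.value))) AS correl_ij
FROM main_table1 AS a
JOIN table1 AS b ON a.cid = b.cid
JOIN table1 AS c ON b.cid < c.cid
                AND b.duedate = c.duedate
                AND b.currency = c.currency
GROUP BY b.cid, c.cid
ORDER BY b.cid, c.cid
""").fetchall()

for b_cid, c_cid, r in rows:
    print(b_cid, c_cid, round(r, 6))
```

Three cids produce three pairs, (1, 2), (1, 3) and (2, 3); without the b.cid < c.cid condition, each pair would also appear reversed and each cid would be paired with itself.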
That PK is silly. Why would you include the identity column (id) in a composite PK, let alone in the first position? Drop id unless you have to have it for some misguided reason.
PRIMARY KEY(cid, currency, duedate)
Or use the natural key, if it is different.
If you're commonly joining or sorting on the cid column, you probably want a clustered index on that column or a composite beginning with that column.
If cid, duedate is unique then you can consider removing the id altogether.
If you want to retain id for some reason, make it PRIMARY KEY NONCLUSTERED, and specify a clustered index on cid, duedate.

SQL Server table design to define WHERE condition

I have an existing stored procedure with a lot of hard-coded IF conditions. The procedure checks the values of the following input fields and displays the relevant message:
• BrandId
• ProductId
• SchemeId
• RegionId
The existing Message table:
MsgId MsgText
1 AAAA
2 BBBB
3 CCCC
4 MMMM
Existing stored proc. pseudo code:
IF(@BrandId in (5,10))
IF(@ProductId in (5))
SELECT 'BBBB' as MsgText
END IF
END IF
IF(@SchemeId in (1,5,10))
SELECT 'AAAA' as MsgText
IF(@SchemeId = 2 AND @RegionId = 4)
SELECT 'BBBB' as MsgText
IF (@RegionId = 6)
SELECT 'MMMM' as MsgText
To remove the hard-coding and rewrite the procedure cleanly from scratch, I want to design new tables that store MsgIds against a BrandId/ProdId/PlanId/SchemeId value, or against a combination of these fields (e.g. SchemeId = 2 AND RegionId = 4). With this kind of design I can directly fetch the relevant MsgId for a specific field or combination of fields.
Could anybody suggest table designs to meet the requirement?
Based on your responses to the comments, this might work out.
create table dbo.[Messages] (
MessageId int not null
, MessageText nvarchar(1024) not null
, constraint pk_Messages primary key clustered (MessageId)
);
insert into dbo.[Messages] (MessageId,MessageText) values
(1,'AAAA')
, (2,'BBBB')
, (13,'MMMM');
create table dbo.Messages_BrandProduct (
BrandId int not null
, ProductId int not null
, MessageId int not null
, constraint pk_Messages_BrandProduct primary key clustered
(BrandId, ProductId, MessageId)
);
insert into dbo.Messages_BrandProduct (BrandId, ProductId, MessageId) values
(5,5,2)
,(10,5,2);
create table dbo.Messages_SchemeRegion (
SchemeId int not null
, RegionId int not null
, MessageId int not null
, constraint pk_Messages_SchemeRegion primary key clustered
(SchemeId, RegionId, MessageId)
);
insert into dbo.Messages_SchemeRegion (SchemeId, RegionId, MessageId)
select SchemeId = 1, RegionId , MessageId = 1 from dbo.Regions
union all
select SchemeId = 5, RegionId , MessageId = 1 from dbo.Regions
union all
select SchemeId = 10, RegionId , MessageId = 1 from dbo.Regions
union all
select SchemeId = 2, RegionId = 4, MessageId = 2
union all
select SchemeId , RegionId = 6, MessageId = 13 from dbo.Schemes;
In your procedure you could pull the messages like this:
select m.MessageId, m.MessageText
from dbo.Messages_BrandProduct mbp
inner join dbo.[Messages] m on mbp.MessageId=m.MessageId
where mbp.BrandId = @BrandId and mbp.ProductId = @ProductId
union -- union all if you don't need to deduplicate messages
select m.MessageId, m.MessageText
from dbo.Messages_SchemeRegion msr
inner join dbo.[Messages] m on msr.MessageId=m.MessageId
where msr.SchemeId = @SchemeId and msr.RegionId = @RegionId;
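A runnable sketch of this lookup-table design, with sqlite3 standing in for SQL Server. Regions and Schemes are hypothetical stand-ins for dbo.Regions and dbo.Schemes (regions 1 to 6, schemes 1 to 10), and T-SQL's `SchemeId = 1` alias syntax is written as `1 AS SchemeId`. The hypothetical get_messages function plays the role of the procedure, so the IF logic from the question lives entirely in table rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Messages (MessageId INTEGER PRIMARY KEY, MessageText TEXT NOT NULL);
CREATE TABLE Messages_BrandProduct (
    BrandId INTEGER NOT NULL, ProductId INTEGER NOT NULL, MessageId INTEGER NOT NULL,
    PRIMARY KEY (BrandId, ProductId, MessageId)
);
CREATE TABLE Messages_SchemeRegion (
    SchemeId INTEGER NOT NULL, RegionId INTEGER NOT NULL, MessageId INTEGER NOT NULL,
    PRIMARY KEY (SchemeId, RegionId, MessageId)
);
-- Hypothetical stand-ins for dbo.Regions and dbo.Schemes.
CREATE TABLE Regions (RegionId INTEGER PRIMARY KEY);
CREATE TABLE Schemes (SchemeId INTEGER PRIMARY KEY);
INSERT INTO Regions VALUES (1), (2), (3), (4), (5), (6);
INSERT INTO Schemes VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);

INSERT INTO Messages VALUES (1, 'AAAA'), (2, 'BBBB'), (13, 'MMMM');
INSERT INTO Messages_BrandProduct VALUES (5, 5, 2), (10, 5, 2);
INSERT INTO Messages_SchemeRegion (SchemeId, RegionId, MessageId)
    SELECT 1, RegionId, 1 FROM Regions
    UNION ALL SELECT 5, RegionId, 1 FROM Regions
    UNION ALL SELECT 10, RegionId, 1 FROM Regions
    UNION ALL SELECT 2, 4, 2
    UNION ALL SELECT SchemeId, 6, 13 FROM Schemes;
""")

def get_messages(brand_id, product_id, scheme_id, region_id):
    """Return the distinct message texts that apply to the given inputs."""
    rows = conn.execute("""
        SELECT m.MessageText
        FROM Messages_BrandProduct AS mbp
        JOIN Messages AS m ON mbp.MessageId = m.MessageId
        WHERE mbp.BrandId = :brand AND mbp.ProductId = :product
        UNION
        SELECT m.MessageText
        FROM Messages_SchemeRegion AS msr
        JOIN Messages AS m ON msr.MessageId = m.MessageId
        WHERE msr.SchemeId = :scheme AND msr.RegionId = :region
        ORDER BY 1
    """, {"brand": brand_id, "product": product_id,
          "scheme": scheme_id, "region": region_id}).fetchall()
    return [text for (text,) in rows]

print(get_messages(5, 5, 1, 6))
print(get_messages(10, 5, 2, 4))
```

In the second call, BBBB matches both tables but is returned once, because UNION removes duplicates.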
This should do it.
CREATE TABLE [dbo].[IDs](
[BrandID] [int] NOT NULL,
[ProductID] [int] NOT NULL,
[SchemeID] [int] NOT NULL,
[RegionID] [int] NOT NULL,
[MsgID] [int] NOT NULL
)
You can adjust the table and column names as needed. Cheers.

TSQL to insert a set of rows and dependent rows

I have 2 tables:
Order (with an identity order id field)
OrderItems (with a foreign key to order id)
In a stored proc, I have a list of orders that I need to duplicate. Is there a good way to do this in a stored proc without a cursor?
Edit:
This is on SQL Server 2008.
A sample spec for the table might be:
CREATE TABLE [Order] (
OrderID INT IDENTITY(1,1),
CustomerName VARCHAR(100),
CONSTRAINT PK_Order PRIMARY KEY (OrderID)
)
CREATE TABLE OrderItem (
OrderID INT,
LineNumber INT,
Price money,
Notes VARCHAR(100),
CONSTRAINT PK_OrderItem PRIMARY KEY (OrderID, LineNumber),
CONSTRAINT FK_OrderItem_Order FOREIGN KEY (OrderID) REFERENCES [Order](OrderID)
)
The stored proc is passed a CustomerName of 'fred', so it's trying to clone all orders where CustomerName = 'fred'.
To give a more concrete example:
Fred happens to have 2 orders:
Order 1 has line numbers 1,2,3
Order 2 has line numbers 1,2,4,6.
If the next identity in the table was 123, then I would want to create:
Order 123 with lines 1,2,3
Order 124 with lines 1,2,4,6
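On SQL Server 2008 you can use MERGE with the OUTPUT clause to capture the mapping between the original and cloned id values while inserting into Orders, then join onto that mapping to clone the OrderItems. MERGE is used instead of a plain INSERT because the OUTPUT clause of an INSERT ... SELECT can only reference the inserted columns, whereas the OUTPUT clause of a MERGE can also reference source columns such as S.OrderId.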
DECLARE @IdMappings TABLE(
New_OrderId INT,
Old_OrderId INT);
;WITH SourceOrders AS
(
SELECT *
FROM Orders
WHERE CustomerName = 'fred'
)
MERGE Orders AS T
USING SourceOrders AS S
ON 0 = 1 -- never matches, so every source row is inserted
WHEN NOT MATCHED THEN
INSERT (CustomerName)
VALUES (S.CustomerName)
OUTPUT inserted.OrderId,
S.OrderId INTO @IdMappings;
INSERT INTO OrderItems (OrderID, LineNumber, Price, Notes)
SELECT New_OrderId,
LineNumber,
Price,
Notes
FROM OrderItems OI
JOIN @IdMappings IDM
ON IDM.Old_OrderId = OI.OrderID;
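SQLite has neither MERGE nor an OUTPUT clause, so this sqlite3 sketch swaps in a different technique to build the same old-to-new mapping without a cursor. A hypothetical SourceOrderID column on Orders records which order each clone came from, and the OrderItems insert joins through it. The data follows the question's example (fred's orders 1 and 2 with lines 1,2,3 and 1,2,4,6); the prices and the third customer's order are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (
    OrderID       INTEGER PRIMARY KEY,  -- auto-assigned, like an identity column
    CustomerName  TEXT,
    SourceOrderID INTEGER               -- hypothetical: which order this one was cloned from
);
CREATE TABLE OrderItems (
    OrderID INTEGER REFERENCES Orders (OrderID),
    LineNumber INTEGER, Price REAL, Notes TEXT,
    PRIMARY KEY (OrderID, LineNumber)
);
INSERT INTO Orders (OrderID, CustomerName) VALUES (1, 'fred'), (2, 'fred'), (3, 'wilma');
INSERT INTO OrderItems VALUES
    (1, 1, 9.99, NULL), (1, 2, 5.00, NULL), (1, 3, 1.50, NULL),
    (2, 1, 2.00, NULL), (2, 2, 3.00, NULL), (2, 4, 4.00, NULL), (2, 6, 6.00, NULL),
    (3, 1, 7.00, NULL);
""")

def clone_orders(conn, customer_name):
    # Commits both INSERTs together, or rolls back if either one fails.
    with conn:
        (last_id,) = conn.execute("SELECT COALESCE(MAX(OrderID), 0) FROM Orders").fetchone()
        # Set-based insert of the header rows, remembering each source id.
        conn.execute("""
            INSERT INTO Orders (CustomerName, SourceOrderID)
            SELECT CustomerName, OrderID FROM Orders
            WHERE CustomerName = ?
            ORDER BY OrderID
        """, (customer_name,))
        # Copy the items of each source order onto its new header row.
        conn.execute("""
            INSERT INTO OrderItems (OrderID, LineNumber, Price, Notes)
            SELECT o.OrderID, i.LineNumber, i.Price, i.Notes
            FROM Orders AS o
            JOIN OrderItems AS i ON i.OrderID = o.SourceOrderID
            WHERE o.OrderID > ?
        """, (last_id,))

clone_orders(conn, "fred")
clones = conn.execute(
    "SELECT OrderID, SourceOrderID FROM Orders WHERE OrderID > 3 ORDER BY OrderID"
).fetchall()
print(clones)
```

The extra column plays the role of @IdMappings; in SQL Server the MERGE ... OUTPUT approach above avoids changing the table.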
