Performance loss in sql left join when using int - sql-server

I have a question about left join performance.
Suppose I have these two tables:
CREATE TABLE "Jobs" (
"Id" INT NOT NULL,
"Department" VARCHAR(25) NULL,
"Job" VARCHAR(50) NOT NULL,
PRIMARY KEY ("Id", "Job")
);
CREATE TABLE "Workers" (
"Id" INT NOT NULL,
"JobID" INT NOT NULL,
"WorkerName" VARCHAR(25) NULL,
"WorkerSurname" VARCHAR(25) NULL,
"JobName" VARCHAR(50) NOT NULL,
PRIMARY KEY ("Id")
);
Now I want to left join both tables to get all the jobs in a specific department for a specific worker, even if the worker doesn't have that job.
select t1.job, t1.department, t2.WorkerName, t2.WorkerSurname
from (SELECT distinct job, department FROM Jobs WHERE Department = @depto) t1
left join dbo.Workers t2 on t1.id=t2.Jobid and t2.WorkerName=@NAME
This query takes about 0,171 ms.
But, if I join with "Job" instead of "id":
select t1.job, t1.department, t2.WorkerName, t2.WorkerSurname
from (SELECT distinct job, department FROM Jobs WHERE Department = @depto) t1
left join dbo.Workers t2 on t1.Job=t2.JobName and t2.WorkerName=@NAME
It takes about 0,030 ms.
Can anyone explain why this is happening? I thought integer joins were faster than varchar ones.
Thanks

Related

Select all Main table rows with detail table column constraints with GROUP BY

I have two tables, tblMain and tblDetail, on SQL Server that are linked by tblMain.id=tblDetail.OrderID for order handling. I have not found exactly the same situation on Stack Overflow.
Here is the sample table design:
/* create and populate tblMain: */
CREATE TABLE tblMain (
ID int IDENTITY(1,1) NOT NULL,
DateOrder datetime NULL,
CONSTRAINT PK_tblMain PRIMARY KEY
(
ID ASC
)
)
GO
INSERT INTO tblMain (DateOrder) VALUES('2021-05-20T12:12:10');
INSERT INTO tblMain (DateOrder) VALUES('2021-05-21T09:13:13');
INSERT INTO tblMain (DateOrder) VALUES('2021-05-22T21:30:28');
GO
/* create and populate tblDetail: */
CREATE TABLE tblDetail (
ID int IDENTITY(1,1) NOT NULL,
OrderID int NULL,
Gencod VARCHAR(255),
Quantity float,
Price float,
CONSTRAINT PK_tblDetail PRIMARY KEY
(
ID ASC
)
)
GO
INSERT INTO tblDetail (OrderID, Gencod, Quantity, Price) VALUES(1, '1234567890123', 8, 12.30);
INSERT INTO tblDetail (OrderID, Gencod, Quantity, Price) VALUES(1, '5825867890321', 2, 2.88);
INSERT INTO tblDetail (OrderID, Gencod, Quantity, Price) VALUES(3, '7788997890333', 1, 1.77);
INSERT INTO tblDetail (OrderID, Gencod, Quantity, Price) VALUES(3, '9882254656215', 3, 5.66);
INSERT INTO tblDetail (OrderID, Gencod, Quantity, Price) VALUES(3, '9665464654654', 4, 10.64);
GO
Here is my SELECT with grouping:
SELECT tblMain.id,SUM(tblDetail.Quantity*tblDetail.Price) AS TotalPrice
FROM tblMain LEFT JOIN tblDetail ON tblMain.id=tblDetail.orderid
WHERE (tblDetail.Quantity<>0) GROUP BY tblMain.id;
GO
This gives:
The desired output:
We see that id=2 is not shown even with LEFT JOIN, as there are no records with OrderID=2 in tblDetail.
How can I design a new query that shows tblMain.id = 2? Meanwhile, I must keep the WHERE (tblDetail.Quantity<>0) constraint. Many thanks.
EDIT:
The above query serves as a CTE (Common Table Expression) for a main query that also takes the payments table tblPayments into account.
After testing, both solutions work.
In my case, the main table has 15K records, while the detail table has several million. With (tblDetail.Quantity<>0 OR tblDetail.Quantity IS NULL) AND tblDetail.IsActive=1 added to the JOIN ON clause, the query takes 37 s to run, while with @pwilcox's first solution, where the condition is added to the WHERE clause, it finishes in 29 s. So a gain of about 20%.
The tblDetail.IsActive column lets me ignore detail rows that are temporarily disabled by setting it to false.
So for me the solution is @pwilcox's answer:
where (tblDetail.quantity <> 0 or tblDetail.quantity is null)
Change
WHERE (tblDetail.Quantity<>0)
to
where (tblDetail.quantity <> 0 or tblDetail.quantity is null)
as the former will omit id = 2 because the corresponding quantity would be null in a left join.
And as HABO mentions, you can also make the condition part of your join logic instead of your where clause, avoiding the need for the 'or' condition.
select m.id,
totalPrice = sum(d.quantity * d.price)
from tblMain m
left join tblDetail d
on m.id = d.orderid
and d.quantity <> 0
group by m.id;
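To see the three variants side by side, here is a minimal sketch that replays the sample data from the question. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so the example runs anywhere); the helper name order_ids is hypothetical.

```python
import sqlite3

# SQLite stands in for SQL Server here; the join semantics shown are standard SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblMain (ID INTEGER PRIMARY KEY, DateOrder TEXT);
    CREATE TABLE tblDetail (ID INTEGER PRIMARY KEY, OrderID INTEGER,
                            Gencod TEXT, Quantity REAL, Price REAL);
    INSERT INTO tblMain (DateOrder) VALUES
        ('2021-05-20T12:12:10'), ('2021-05-21T09:13:13'), ('2021-05-22T21:30:28');
    INSERT INTO tblDetail (OrderID, Gencod, Quantity, Price) VALUES
        (1, '1234567890123', 8, 12.30),
        (1, '5825867890321', 2, 2.88),
        (3, '7788997890333', 1, 1.77),
        (3, '9882254656215', 3, 5.66),
        (3, '9665464654654', 4, 10.64);
""")

def order_ids(sql):
    """Run a query and return the order ids it yields (hypothetical helper)."""
    return [row[0] for row in conn.execute(sql)]

base = """
    SELECT m.ID, SUM(d.Quantity * d.Price) AS TotalPrice
    FROM tblMain m
    LEFT JOIN tblDetail d ON m.ID = d.OrderID {on_extra}
    {where}
    GROUP BY m.ID
    ORDER BY m.ID
"""

# 1) Filter in WHERE only: the NULL-extended row for order 2 is discarded.
where_only = order_ids(base.format(on_extra="", where="WHERE d.Quantity <> 0"))
# 2) WHERE with an explicit IS NULL escape hatch.
where_or_null = order_ids(
    base.format(on_extra="", where="WHERE d.Quantity <> 0 OR d.Quantity IS NULL"))
# 3) Filter moved into the ON clause.
in_join = order_ids(base.format(on_extra="AND d.Quantity <> 0", where=""))

print(where_only)     # [1, 3]
print(where_or_null)  # [1, 2, 3]
print(in_join)        # [1, 2, 3]
```

Note that the last two variants are not strictly equivalent: if an order has detail rows but all of them have quantity 0, the WHERE variant drops that order, while the ON variant keeps it with a NULL total.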

Is there a way to display top 10 average ratings of a business type?

I have the following tables in my database scheme
CREATE TABLE public.organization_rating
(
rating integer NOT NULL,
user_id integer NOT NULL,
organization_id integer NOT NULL,
CONSTRAINT organization_rating_pkey PRIMARY KEY (user_id, organization_id),
CONSTRAINT user_id FOREIGN KEY (user_id)
REFERENCES public.users (user_id) MATCH SIMPLE
ON UPDATE CASCADE
ON DELETE CASCADE,
CONSTRAINT stars CHECK (rating >= 1 AND rating < 5)
)
And
CREATE TABLE public.organization
(
org_id integer NOT NULL DEFAULT nextval('organization_org_id_seq'::regclass),
name character varying(90) COLLATE pg_catalog."default" NOT NULL,
description character varying(90) COLLATE pg_catalog."default" NOT NULL,
email text COLLATE pg_catalog."default" NOT NULL,
phone_number character varying COLLATE pg_catalog."default" NOT NULL,
bt_id integer NOT NULL,
bs_id integer NOT NULL,
CONSTRAINT organization_pkey PRIMARY KEY (org_id),
CONSTRAINT bs_id FOREIGN KEY (bs_id)
REFERENCES public.business_step (bs_id) MATCH SIMPLE
ON UPDATE CASCADE
ON DELETE CASCADE
NOT VALID,
CONSTRAINT bt_id FOREIGN KEY (bt_id)
REFERENCES public.business_type (bt_id) MATCH SIMPLE
ON UPDATE CASCADE
ON DELETE CASCADE
NOT VALID
)
I would like to implement a query that gives me the following:
Top 10 organization ratings per business type
Top 10 organizations per business stage
Top 3 organizations with worst rating
Since the queries appear to be similar, I just have to order DESC or ASC, depending on the requirement, I just need one query to work and I will have the other 2. I tried implementing this query:
Here is my select statement:
SELECT O.org_id, O.bt_id, R.rating
FROM public.organization as O
INNER JOIN public.organization_rating as R ON O.org_id = R.organization_id
WHERE bt_id=1
GROUP by org_id, bt_id, rating
ORDER BY ROUND(AVG(rating)) DESC LIMIT 10
But the output is as follows:
There seems to be an error: several organizations are duplicated. These are the real average values of the duplicated organizations:
And
Why are the organization ids being duplicated?
Thanks in advance.
You see duplicated records because the ratings in organization_rating are per user, and several users can rate the same organization. Since your query also groups by rating, every distinct rating value produces its own row for an organization. You should first compute an average rating and then join with the organization table.
You can do something like this for bt_id=1:
with average_rating as (
select organization_id as org_id, avg(rating) as avg_rating
from organization_rating
group by organization_id
)
select o.org_id, o.bt_id, r.avg_rating
from average_rating r
join organization o on o.org_id = r.org_id
where o.bt_id = 1 -- bt_id lives in organization, not in the CTE
order by r.avg_rating desc limit 10;
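The duplication is easy to reproduce. The sketch below uses Python's built-in sqlite3 module instead of PostgreSQL (an assumption for portability), a cut-down version of the two tables, and a few hypothetical rating rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organization (org_id INTEGER PRIMARY KEY, bt_id INTEGER NOT NULL);
    CREATE TABLE organization_rating (
        rating INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        organization_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, organization_id));
    -- Hypothetical sample rows: three users rate org 1, one user rates org 2.
    INSERT INTO organization VALUES (1, 1), (2, 1);
    INSERT INTO organization_rating (rating, user_id, organization_id) VALUES
        (4, 10, 1), (2, 11, 1), (4, 12, 1),
        (3, 10, 2);
""")

# Grouping by rating as well: one row per (organization, distinct rating).
grouped_with_rating = conn.execute("""
    SELECT o.org_id, o.bt_id, r.rating
    FROM organization o
    JOIN organization_rating r ON o.org_id = r.organization_id
    WHERE o.bt_id = 1
    GROUP BY o.org_id, o.bt_id, r.rating
    ORDER BY o.org_id, r.rating
""").fetchall()

# Averaging first, then joining: exactly one row per organization.
averaged = conn.execute("""
    WITH average_rating AS (
        SELECT organization_id AS org_id, AVG(rating) AS avg_rating
        FROM organization_rating
        GROUP BY organization_id
    )
    SELECT o.org_id, o.bt_id, r.avg_rating
    FROM average_rating r
    JOIN organization o ON o.org_id = r.org_id
    WHERE o.bt_id = 1
    ORDER BY r.avg_rating DESC, o.org_id
    LIMIT 10
""").fetchall()

print(grouped_with_rating)  # org 1 appears twice (ratings 2 and 4)
print(averaged)             # org 1 and org 2 appear once each
```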
If you want to get all data in a single query, you could use a window function:
with average_rating as (
select organization_id as org_id, avg(rating) as avg_rating
from organization_rating
group by organization_id
),
ordered_data as (
select o.org_id, o.bt_id, r.avg_rating,
row_number() over (partition by o.bt_id order by r.avg_rating desc) as rank
from average_rating r
join organization o on o.org_id = r.org_id
)
select org_id, bt_id, avg_rating
from ordered_data
where rank <= 10
order by bt_id, rank;
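Here is a small runnable sketch of the "top N per group" pattern with row_number(). It assumes SQLite (3.25 or later, for window functions) through Python's sqlite3 module rather than PostgreSQL, uses hypothetical sample ratings, and keeps the top 2 per business type so the effect is visible on a tiny data set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organization (org_id INTEGER PRIMARY KEY, bt_id INTEGER NOT NULL);
    CREATE TABLE organization_rating (
        rating INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        organization_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, organization_id));
    -- Hypothetical data: orgs 1-3 belong to business type 1, org 4 to type 2.
    INSERT INTO organization VALUES (1, 1), (2, 1), (3, 1), (4, 2);
    INSERT INTO organization_rating (rating, user_id, organization_id) VALUES
        (4, 10, 1), (4, 11, 1),
        (3, 10, 2),
        (2, 10, 3), (2, 11, 3),
        (1, 10, 4);
""")

top_per_type = conn.execute("""
    WITH average_rating AS (
        SELECT organization_id AS org_id, AVG(rating) AS avg_rating
        FROM organization_rating
        GROUP BY organization_id
    ),
    ordered_data AS (
        SELECT o.org_id, o.bt_id, r.avg_rating,
               ROW_NUMBER() OVER (PARTITION BY o.bt_id
                                  ORDER BY r.avg_rating DESC) AS rn
        FROM average_rating r
        JOIN organization o ON o.org_id = r.org_id
    )
    SELECT org_id, bt_id, avg_rating
    FROM ordered_data
    WHERE rn <= :n
    ORDER BY bt_id, rn
""", {"n": 2}).fetchall()

print(top_per_type)  # [(1, 1, 4.0), (2, 1, 3.0), (4, 2, 1.0)]
```

For the worst-rated organizations, flipping the ORDER BY inside the window to ascending should be enough. Keep in mind that row_number() breaks ties arbitrarily; rank() or dense_rank() would keep tied organizations together.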
Thanks to mihai_f87, I was able to construct this query:
with average_rating as (
SELECT organization_id as Organization_ID, ROUND(AVG(rating)) as Rating
FROM organization_rating
GROUP BY organization_id
),
ordered_data as (
select org_id, bt_id, rating, row_number() over (partition by bt_id order by rating desc) rank
from average_rating r
join organization o on o.org_id = r.organization_id
where bt_id = 1
order by bt_id, rating desc
)
select org_id, bt_id, rating
from ordered_data
where rank <= 10
With this query, I was able to search top 10 organizations per business type.
Output

How to Join 4 tables in SQL

What I'm trying to do is select from all 4 of these tables with joins, but I can't figure out how, because there isn't a table connected to all of the others.
create table Encomenda(
idEncomenda int identity,
idFornededor int not null,
estado varchar not null,
Constraint pk_Encomenda Primary Key (idEncomenda),
);
create table Produto_Encomenda(
idProduto_Encomenda int identity,
idProduto int not null,
idEncomenda int not null,
quantidade int not null,
constraint pk_Produto_Encomenda Primary Key (idProduto_Encomenda),
constraint fk_Produto foreign key (idProduto) references Produto (idProduto) ,
constraint fk_idEncomenda foreign key (idEncomenda) references Encomenda (idEncomenda) ,
);
create table Fornecedor(
idFornecedor int identity,
nomeFornecedor varchar(60) not null,
moradaFornecedor varchar(60) not null,
contactoFornecedor int not null,
constraint pk_Fornecedor Primary Key (idFornecedor),
);
create table Produto(
idProduto int identity,
nomeProduto varchar(60) not null,
quantidadeExistenteProduto int not null,
precoUnidade float not null,
Constraint pk_produto Primary Key (idProduto),
);
I was trying to make a join between the 4 of them, and what I would like to show/select is:
Fornecedor.nomeFornecedor, idEncomenda, Produto.nomeProduto and Produto_encomenda.quantidade, joined together where
Produto.idproduto = produto_Encomenda.idproduto
Fornecedor.idFornecedor = Encomenda.idFornecedor
I don't think I can explain it better, but in the end I want to select a result that contains Fornecedor.nomeFornecedor, idEncomenda, Produto.nomeProduto and Produto_encomenda.quantidade. Because the 4 tables don't have one common table, I'm lost on how to make the join. I'm probably just tired, but if someone could help me I would appreciate it.
Ok, now that I think I better understand the question you need the following fields: Fornecedor.nomeFornecedor, idEncomenda, Produto.nomeProduto and Produto_encomenda.quantidade.
So, let's see if this works:
SELECT f.nomeFornecedor,
e.idEncomenda,
p.nomeProduto,
pe.quantidade
FROM Fornecedor as f
INNER JOIN Encomenda AS e
ON f.idFornecedor = e.idFornededor
INNER JOIN Produto_Encomenda as pe
ON e.idEncomenda = pe.idEncomenda
INNER JOIN Produto as p
ON p.idProduto = pe.idProduto
I think this should work
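To check the chain of joins, here is a minimal sketch using Python's built-in sqlite3 module in place of SQL Server (an assumption for portability). The tables are cut down to the columns the query needs, the sample rows are hypothetical, and the column name idFornededor is kept exactly as spelled in the question's Encomenda table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Fornecedor (idFornecedor INTEGER PRIMARY KEY,
                             nomeFornecedor TEXT NOT NULL);
    CREATE TABLE Produto (idProduto INTEGER PRIMARY KEY,
                          nomeProduto TEXT NOT NULL);
    CREATE TABLE Encomenda (idEncomenda INTEGER PRIMARY KEY,
                            idFornededor INTEGER NOT NULL);
    CREATE TABLE Produto_Encomenda (
        idProduto_Encomenda INTEGER PRIMARY KEY,
        idProduto INTEGER NOT NULL REFERENCES Produto (idProduto),
        idEncomenda INTEGER NOT NULL REFERENCES Encomenda (idEncomenda),
        quantidade INTEGER NOT NULL);
    -- Hypothetical sample rows
    INSERT INTO Fornecedor VALUES (1, 'Fornecedor A'), (2, 'Fornecedor B');
    INSERT INTO Produto VALUES (1, 'Parafuso'), (2, 'Porca');
    INSERT INTO Encomenda VALUES (100, 1), (101, 2);
    INSERT INTO Produto_Encomenda (idProduto, idEncomenda, quantidade) VALUES
        (1, 100, 50), (2, 100, 20), (2, 101, 5);
""")

# No single table touches all the others, so the joins are chained:
# Fornecedor -> Encomenda -> Produto_Encomenda -> Produto.
rows = conn.execute("""
    SELECT f.nomeFornecedor, e.idEncomenda, p.nomeProduto, pe.quantidade
    FROM Fornecedor AS f
    INNER JOIN Encomenda AS e ON f.idFornecedor = e.idFornededor
    INNER JOIN Produto_Encomenda AS pe ON e.idEncomenda = pe.idEncomenda
    INNER JOIN Produto AS p ON p.idProduto = pe.idProduto
    ORDER BY e.idEncomenda, p.idProduto
""").fetchall()

for row in rows:
    print(row)
```

Each join only needs one table that is already in the chain, which is why Produto_Encomenda can act as the bridge between orders and products.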
You join tables using the JOIN clause. There are four main types:
INNER - Returns only the rows where a match is found in both tables.
LEFT - Returns every row of the left-hand table, plus the matching rows of the right-hand table (NULLs where there is no match).
RIGHT - Returns every row of the right-hand table, plus the matching rows of the left-hand table (NULLs where there is no match).
FULL OUTER - Returns all rows of both tables, matched where possible, with NULLs where no match is found.
A basic JOIN goes like this:
INNER JOIN MyTable ON MyTable.ID = SomeTable.ID
You should read this.
Hope it helps!
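The join types can be compared on two tiny tables. This is a sketch only: it runs on SQLite through Python's sqlite3 module, reuses the MyTable and SomeTable names from the snippet above with hypothetical ids, and emulates RIGHT and FULL OUTER with LEFT joins because older SQLite versions do not support them natively.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SomeTable (ID INTEGER PRIMARY KEY);
    CREATE TABLE MyTable (ID INTEGER PRIMARY KEY);
    -- Hypothetical ids: 2 and 3 exist in both tables.
    INSERT INTO SomeTable VALUES (1), (2), (3);
    INSERT INTO MyTable VALUES (2), (3), (4);
""")

def run(sql):
    """Return the (SomeTable.ID, MyTable.ID) pairs as a set (hypothetical helper)."""
    return set(conn.execute(sql).fetchall())

inner = run("""
    SELECT s.ID, m.ID FROM SomeTable s
    INNER JOIN MyTable m ON m.ID = s.ID
""")
left = run("""
    SELECT s.ID, m.ID FROM SomeTable s
    LEFT JOIN MyTable m ON m.ID = s.ID
""")
# RIGHT JOIN emulated by swapping the table order of a LEFT JOIN.
right = run("""
    SELECT s.ID, m.ID FROM MyTable m
    LEFT JOIN SomeTable s ON m.ID = s.ID
""")
# FULL OUTER JOIN emulated as the union of both directions.
full = run("""
    SELECT s.ID, m.ID FROM SomeTable s LEFT JOIN MyTable m ON m.ID = s.ID
    UNION
    SELECT s.ID, m.ID FROM MyTable m LEFT JOIN SomeTable s ON m.ID = s.ID
""")

print(sorted(inner))  # [(2, 2), (3, 3)]
print(len(left), len(right), len(full))  # 3 3 4
```

Python shows the unmatched side as None, which corresponds to NULL in SQL.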

SQL Query for Outer Join with Group By

I have a Months table with MonthName and MonthNumber. The fiscal year starts in July, so I have assigned the values to the months like this:
MonthName=July and MonthNumber=1
MonthName=August and MonthNumber=2.
I have another domain table, BudgetCategory, which has BudgetCategoryId and BudgetCategoryName.
The PurchaseOrder table has OrderID, PurchaseMonth and BudgetCategoryId.
Now I want a query that finds the monthly purchases, SUM(TotalCost), for every BudgetCategory. If there are no purchases for a BudgetCategoryId, I want to display zero in the report.
Schema of Table:
CREATE TABLE [dbo].[BudgetCategory](
[BudgetCategoryId] [numeric](18, 0) NOT NULL,
[BudgetCategoryName] [varchar](50) NULL,
[TotalBudget] [nvarchar](50) NULL)
CREATE TABLE [dbo].[PurchaseOrder](
[OrderId] [bigint] NOT NULL,
[BudgetCategoryId] [bigint] NULL,
[PurchaseMonth] [nvarchar](50) NULL,
[QTY] [bigint] NULL,
[CostPerItem] [decimal](10, 2) NULL,
[TotalCost] [decimal](10, 2) NULL)
CREATE TABLE [dbo].[MonthTable](
[MonthNumber] [bigint] NULL,
[MonthName] [nvarchar](30) NULL)
Try this:
select a.BudgetCategoryName,
ISNULL(c.MonthName,'No purchase') as Month,
sum(ISNULL(TotalCost,0)) as TotalCost
from #BudgetCategory a left join #PurchaseOrder b on a.BudgetCategoryId = b.BudgetCategoryId
left join #MonthTable c on b.PurchaseMonth = c.[MonthName]
group by a.BudgetCategoryName,c.MonthName
order by a.BudgetCategoryName
Tested with this data
INSERT #BudgetCategory
VALUES (1,'CategoryA',1000),
(2,'CategoryB',2000),
(3,'CategoryC',1500),
(4,'CategoryD',2000)
INSERT #PurchaseOrder (OrderId,BudgetCategoryId,TotalCost,PurchaseMonth)
VALUES (1,1,550,'July'),
(2,1,700,'July'),
(3,2,600,'August')
INSERT #MonthTable
VALUES
(1,'July'),
(2,'August')
It will produce these results:
Let me know if this could help you
SELECT b.*, m.MonthNumber, q.[BudgetCategoryId], q.[PurchaseMonth], ISNULL(q.[TotalCost],0)
FROM [dbo].[BudgetCategory] b
LEFT JOIN
(
SELECT [BudgetCategoryId], [PurchaseMonth], sum([TotalCost]) [TotalCost]
FROM [dbo].[PurchaseOrder] p
GROUP BY p.[BudgetCategoryId], [PurchaseMonth]
) q ON b.BudgetCategoryId = q.BudgetCategoryId
LEFT JOIN [dbo].[MonthTable] m ON q.[PurchaseMonth] = m.[MonthName]
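If the report needs a row for every category in every month, zeros included, one possible approach (not taken from either answer above) is to build the category and month grid with a CROSS JOIN first and LEFT JOIN the purchases onto it. The sketch below assumes that reading of the question, reuses the test data from the first answer, and runs on SQLite through Python's sqlite3 module rather than SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BudgetCategory (BudgetCategoryId INTEGER,
                                 BudgetCategoryName TEXT, TotalBudget TEXT);
    CREATE TABLE PurchaseOrder (OrderId INTEGER, BudgetCategoryId INTEGER,
                                PurchaseMonth TEXT, TotalCost NUMERIC);
    CREATE TABLE MonthTable (MonthNumber INTEGER, MonthName TEXT);
    INSERT INTO BudgetCategory VALUES
        (1, 'CategoryA', 1000), (2, 'CategoryB', 2000),
        (3, 'CategoryC', 1500), (4, 'CategoryD', 2000);
    INSERT INTO PurchaseOrder (OrderId, BudgetCategoryId, TotalCost, PurchaseMonth) VALUES
        (1, 1, 550, 'July'), (2, 1, 700, 'July'), (3, 2, 600, 'August');
    INSERT INTO MonthTable VALUES (1, 'July'), (2, 'August');
""")

report = conn.execute("""
    SELECT b.BudgetCategoryName, m.MonthName,
           COALESCE(SUM(p.TotalCost), 0) AS TotalCost
    FROM BudgetCategory b
    CROSS JOIN MonthTable m              -- every category/month combination
    LEFT JOIN PurchaseOrder p            -- purchases, where they exist
           ON p.BudgetCategoryId = b.BudgetCategoryId
          AND p.PurchaseMonth = m.MonthName
    GROUP BY b.BudgetCategoryId, b.BudgetCategoryName, m.MonthNumber, m.MonthName
    ORDER BY b.BudgetCategoryId, m.MonthNumber
""").fetchall()

for row in report:
    print(row)
```

In T-SQL, ISNULL(SUM(p.TotalCost), 0) plays the same role as COALESCE here.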

Speed up view performance

I have an old view that takes 4 minutes to run, and I have been asked to speed it up. The FROM looks like this:
FROM TableA
CROSS JOIN ViewA
INNER JOIN TableB on ViewA.Name = TableB.Name
AND TableA.Code = TableB.Code
AND TableA.Location = TableB.Location
WHERE (DATEDIFF(m, ViewA.SubmitDate, GETDATE()) = 1) -- Only pull last months rows
TableA has around 99K rows, ViewA has around 2000 rows and TableB has around 101K rows. I think the problem is at the INNER JOIN, because if I remove it, the query takes 1 second.
My first thought was to see if I could cut down the number of rows in ViewA by breaking the whole thing into CTEs, but this had zero impact. I am thinking I need to index TableB, because it is just a bunch of varchars being used in the joins. I am now changing it to temp tables so I can index it. I cannot change the underlying tables and views. Is indexing temp tables a good way to go, or is there a better solution?
Edit to add info regarding existing indexes. Only thing with an index on it right now is TableA.Id which is the PK and a clustered Index. TableB has an Id field but it is not the PK. ViewA is not indexed.
Edit again to correct some structure. SubmitDate is in the View, not the table.
Here is a very basic structure:
CREATE TABLE TableA
(
Id int NOT NULL PRIMARY KEY,
Section varchar(20) NULL,
Code varchar(20) NULL
)
CREATE TABLE TableB
(
Id int NOT NULL PRIMARY KEY,
Name varchar(20) NULL,
Code varchar(20) NULL,
Section varchar(20) NULL
)
CREATE TABLE TableC
(
Id int NOT NULL PRIMARY KEY,
Name varchar(20) NULL,
SubmitDate DateTime NOT NULL
)
CREATE TABLE TableD
(
Id int NOT NULL PRIMARY KEY,
Section varchar(20) NULL
)
CREATE VIEW ViewA
AS
SELECT d.Section, c.Name, c.SubmitDate
FROM TableC c
JOIN TableD d ON c.Id = d.Id
One improvement is to rewrite the WHERE clause into a sargable form. Add an index on SubmitDate if there is none, and change the query to:
FROM TableA
CROSS JOIN ViewA
INNER JOIN TableB on ViewA.Name = TableB.Name
AND TableA.Code = TableB.Code
AND TableA.Location = TableB.Location
WHERE
ViewA.SubmitDate >= DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()) - 1, 0) -- first day of last month
AND ViewA.SubmitDate < DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0) -- first day of this month
Also add nonclustered indexes on Name, Code and Location columns.
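The effect of a sargable predicate can be illustrated with a query plan. This is only a sketch of the principle: it uses SQLite through Python's sqlite3 module, whose planner differs from SQL Server's, and the Submissions table, the index name and the previous_month_bounds helper are all hypothetical.

```python
import sqlite3
from datetime import date

def previous_month_bounds(today):
    """Return (first day of last month, first day of this month) as ISO strings."""
    first_this = today.replace(day=1)
    if first_this.month == 1:
        first_prev = first_this.replace(year=first_this.year - 1, month=12)
    else:
        first_prev = first_this.replace(month=first_this.month - 1)
    return first_prev.isoformat(), first_this.isoformat()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Submissions (Id INTEGER PRIMARY KEY, Name TEXT,
                              SubmitDate TEXT NOT NULL);
    CREATE INDEX IX_Submissions_SubmitDate ON Submissions (SubmitDate);
""")

def plan(sql, params):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " | ".join(row[3] for row in rows)

start, end = previous_month_bounds(date(2024, 3, 15))

# Function wrapped around the column: the index on SubmitDate cannot be used.
non_sargable = plan(
    "SELECT Name FROM Submissions WHERE strftime('%Y-%m', SubmitDate) = ?",
    (start[:7],))

# Bare column compared against a half-open range: the index can be used.
sargable = plan(
    "SELECT Name FROM Submissions WHERE SubmitDate >= ? AND SubmitDate < ?",
    (start, end))

print(non_sargable)
print(sargable)
```

The half-open range (greater than or equal to the first day of last month, strictly less than the first day of this month) selects the same rows as the DATEDIFF test while leaving the column untouched, which is what lets an index on it be used.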
