SQL Query to get Merchant Name with max date - sql-server

I would like to achieve the following but am not sure how to go about it; any query pointing me in the right direction would be a great help.
Tables: I have the three tables below:
Merchant(MerchantId, Name, Date),
MerchantCategory(MerchantId, CategoryId),
Category (CategoryId, Name)
How do I return the category name, the merchant count, and the name of the merchant with the max date?

From the requirement I understand that there should be one row per category, showing the number of merchants and the name of the merchant with the most recent date.
The query below generates some sample data and returns the intended result, as I understand it.
The merchant volume is calculated by joining the MerchantCategory table to the Category table and counting the merchant IDs per category. The name is trickier: it uses OUTER APPLY, which, for each category (each row), finds the top 1 merchant name in that category ordered by max(date) desc.
I hope this helps; if you have any questions, please let me know.
declare @Merchant table (
MerchantId int,
Name nvarchar(25),
Date date
);
declare @MerchantCategory table (
MerchantId int,
CategoryId int
);
declare @Category table (
CategoryId int,
Name nvarchar(25)
);
insert into @Merchant (MerchantId, Name, Date)
values
(1, 'Lucy', '2019-01-05'),
(2, 'Dave', '2019-01-30'),
(3, 'Daniel', '2019-02-01');
insert into @MerchantCategory (MerchantId, CategoryId)
values
(1, 4),
(1, 5),
(2, 4),
(3, 5);
insert into @Category (CategoryId, Name)
values
(4, 'Cat1'),
(5, 'Cat2');
-- one row per category: merchant count plus the name of the most recently dated merchant
select c.Name, max(m.Name) as MaxMerchantName, count(distinct mc2.MerchantId) as MerchantVol
from @Category c
left join @MerchantCategory mc2 on c.CategoryId = mc2.CategoryId
outer apply (
select top 1 Name, max(Date) as Date
from @Merchant m
inner join @MerchantCategory mc on m.MerchantId = mc.MerchantId
where c.CategoryId = mc.CategoryId
group by Name
order by max(Date) desc
) m
group by c.Name;
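The sketch below reproduces the same result with Python's built-in sqlite3 module, so it runs without a SQL Server instance. SQLite has no OUTER APPLY or TOP. It therefore swaps in a correlated subquery with ORDER BY ... LIMIT 1 to pick the latest merchant per category. Table and column names follow the question, and the dates are stored as ISO-8601 text so they sort correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Merchant (MerchantId INTEGER, Name TEXT, Date TEXT);
    CREATE TABLE MerchantCategory (MerchantId INTEGER, CategoryId INTEGER);
    CREATE TABLE Category (CategoryId INTEGER, Name TEXT);
    INSERT INTO Merchant VALUES (1, 'Lucy', '2019-01-05'), (2, 'Dave', '2019-01-30'),
                                (3, 'Daniel', '2019-02-01');
    INSERT INTO MerchantCategory VALUES (1, 4), (1, 5), (2, 4), (3, 5);
    INSERT INTO Category VALUES (4, 'Cat1'), (5, 'Cat2');
""")

# One row per category: the merchant count, plus the name of the merchant in
# that category with the latest Date. The correlated subquery with LIMIT 1
# plays the role of OUTER APPLY ... TOP 1 in the T-SQL version.
rows = conn.execute("""
    SELECT c.Name AS CategoryName,
           COUNT(DISTINCT mc.MerchantId) AS MerchantCount,
           (SELECT m.Name
              FROM Merchant m
              JOIN MerchantCategory mc2 ON mc2.MerchantId = m.MerchantId
             WHERE mc2.CategoryId = c.CategoryId
             ORDER BY m.Date DESC
             LIMIT 1) AS LatestMerchant
      FROM Category c
      LEFT JOIN MerchantCategory mc ON mc.CategoryId = c.CategoryId
     GROUP BY c.CategoryId, c.Name
     ORDER BY c.Name
""").fetchall()
print(rows)  # [('Cat1', 2, 'Dave'), ('Cat2', 2, 'Daniel')]
```

If two merchants in a category share the latest date, LIMIT 1 (like TOP 1) returns either one. Adding a second ORDER BY column would make the choice deterministic.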

I would have expected to see your own attempt first,
so I am going against SO principles at the moment.
Try this:
select c.Name as category_name, count(*) as Merchant_Count, m.Name as Merchant_Name, max(m.Date) as Max_Date
from Merchant m
join MerchantCategory mc
on m.MerchantId = mc.MerchantId
join Category c
on mc.CategoryId = c.CategoryId
group by c.Name, m.Name

Related

Why does my record count blow up with a left join?

If I run this:
Select *
FROM RAW_DATA
WHERE Portfolio like '%deposit%'
I get 131047 records. Now, I join to another table, and I want all records from RAW_DATA and matches from another table, like this:
Select *
FROM RAW_DATA AS RawData LEFT OUTER JOIN
DATAHIST AS HIST ON RawData.Parse2 = HIST.CONTACT_ID AND RawData.AsofDate = HIST.ASOFDATE
WHERE RawData.Portfolio like '%deposit%'
Now my count blows up to 158745. If I want everything from RAW_DATA and only the matches from DATAHIST, how do I write the join? There are only a couple of options here.
Will I have to count rows, and select where rn = 1?
Determine which rows of the HIST data are duplicated on your join key (the combination of id and date). You should also check for duplicates in the RAW data. Each duplicated key multiplies the result: a key that appears n times in RAW and m times in HIST produces n x m joined rows.
drop table if exists #RAW; -- DROP TABLE IF EXISTS needs SQL Server 2016 or later
drop table if exists #HIST;
create table #RAW (contact_id int, asofdate date);
insert into #RAW values
(1, '2017-01-01'),
(2, '2017-01-01');
create table #HIST (contact_id int, asofdate date, balance int);
insert into #HIST values
(1, '2017-01-01', 100), (1, '2017-01-01', 150), (1, '2017-01-02', 200),
(2, '2017-01-01', 125);
--select * from #RAW
--select * from #HIST
-- find duplicates in HIST
select contact_id, asofdate, count(*) as occur_count
from #HIST
group by contact_id, asofdate
having count(*) > 1;
-- because of the duplicates the result has 3 rows instead of the 2 you might be expecting
select raw.contact_id, raw.asofdate, hist.balance
from #RAW raw left join #HIST hist
on raw.contact_id = hist.contact_id and raw.asofdate = hist.asofdate;
You probably want to clean that data, or reduce it (as dlatikay suggests) via aggregation or a further selection condition.
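The sketch below shows the n x m effect and the aggregation fix. It uses Python's sqlite3 module as a stand-in for SQL Server and the same sample rows as above, with the tables renamed raw_data and hist. Summing the balance is only an illustrative choice of aggregate; which aggregate, if any, is right depends on what the history rows mean.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_data (contact_id INTEGER, asofdate TEXT);
    CREATE TABLE hist (contact_id INTEGER, asofdate TEXT, balance INTEGER);
    INSERT INTO raw_data VALUES (1, '2017-01-01'), (2, '2017-01-01');
    INSERT INTO hist VALUES (1, '2017-01-01', 100), (1, '2017-01-01', 150),
                            (1, '2017-01-02', 200), (2, '2017-01-01', 125);
""")

# Plain LEFT JOIN: key (1, '2017-01-01') appears once in raw_data and twice in
# hist, so it yields 1 x 2 = 2 rows; key (2, '2017-01-01') yields 1 x 1 = 1.
exploded = conn.execute("""
    SELECT r.contact_id, r.asofdate, h.balance
      FROM raw_data r
      LEFT JOIN hist h ON r.contact_id = h.contact_id AND r.asofdate = h.asofdate
""").fetchall()

# Pre-aggregate hist to one row per join key, so each raw_data row matches at
# most once. SUM is an illustrative choice of aggregate.
reduced = conn.execute("""
    SELECT r.contact_id, r.asofdate, h.total_balance
      FROM raw_data r
      LEFT JOIN (SELECT contact_id, asofdate, SUM(balance) AS total_balance
                   FROM hist
                  GROUP BY contact_id, asofdate) h
        ON r.contact_id = h.contact_id AND r.asofdate = h.asofdate
     ORDER BY r.contact_id
""").fetchall()
print(len(exploded), reduced)  # 3 [(1, '2017-01-01', 250), (2, '2017-01-01', 125)]
```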
A history table with only 1 row for any source row would be very unhelpful because a history table usually holds all history for each row of source data. So you should expect the number of rows to expand.
What I suspect you want is "the most recent entry" of history, and for such a need it will help to number the rows before joining, like this:
SELECT
*
FROM RAW_DATA AS rawdata
LEFT OUTER JOIN (
SELECT
*
, ROW_NUMBER() OVER (PARTITION BY CONTACT_ID
ORDER BY ASOFDATE DESC) AS rn
FROM DATAHIST
) AS hist ON rawdata.Parse2 = hist.CONTACT_ID
AND hist.rn = 1
WHERE rawdata.Portfolio LIKE '%deposit%'
So if there is more than one row in history for any row of rawdata, only the most recent matching row in the history table is joined.
Vary the ORDER BY to change which rows are joined. For example, changing to ascending order gives the "earliest" history instead of the "latest". If the asofdate column alone is not sufficient (it can have ties), add other columns as tie-breakers, e.g. order by asofdate desc, ID desc.
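The same rn = 1 pattern can be tried in Python's sqlite3 module (window functions need SQLite 3.25 or later, which recent Python builds bundle). The raw_data.contact_id column is a hypothetical stand-in for RAW_DATA.Parse2, and the rows are invented for illustration. Contact 4 has no history, so the LEFT JOIN keeps it with NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_data (contact_id INTEGER, portfolio TEXT);
    CREATE TABLE datahist (contact_id INTEGER, asofdate TEXT, balance INTEGER);
    INSERT INTO raw_data VALUES (1, 'deposit A'), (2, 'deposit B'),
                                (3, 'loan C'), (4, 'deposit D');
    INSERT INTO datahist VALUES (1, '2017-01-01', 100), (1, '2017-01-02', 200),
                                (2, '2017-01-01', 125);
""")

# Number each contact's history rows newest-first, then join only rn = 1,
# so every raw_data row matches at most one history row.
rows = conn.execute("""
    SELECT r.contact_id, h.asofdate, h.balance
      FROM raw_data r
      LEFT JOIN (SELECT contact_id, asofdate, balance,
                        ROW_NUMBER() OVER (PARTITION BY contact_id
                                           ORDER BY asofdate DESC) AS rn
                   FROM datahist) h
        ON r.contact_id = h.contact_id AND h.rn = 1
     WHERE r.portfolio LIKE '%deposit%'
     ORDER BY r.contact_id
""").fetchall()
print(rows)  # [(1, '2017-01-02', 200), (2, '2017-01-01', 125), (4, None, None)]
```

Keeping `h.rn = 1` in the ON clause rather than in WHERE matters. A WHERE filter on h.rn would discard contact 4, whose h.rn is NULL.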

An SQL query to give cheapest product by each supplier

I need to find the cheapest product from each supplier.
The tables I have are tb_supplier, tb_consumer, tb_offers, tb_requests, tb_transactions, and tb_products.
I have the following code, which currently shows only the single cheapest product overall, its supplier, and the amount. How do I find the cheapest product for EVERY supplier?
SELECT Tb_Product.Name, Tb_Supplier.Name, Tb_Offers.Price as 'Price'
FROM Tb_Product, Tb_Supplier, Tb_Offers
WHERE Tb_Product.Prod_ID = Tb_Offers.Prod_ID
AND Tb_Offers.Supp_ID = Tb_Supplier.Supp_ID
AND Tb_Offers.Price = (SELECT Min(Tb_Offers.Price)
FROM Tb_Offers)
Thanks
Assuming a product is offered by several suppliers and each supplier offers several products, we have the following schema.
declare @prod table(id int, name varchar(50));
declare @supplier table(id int, name varchar(50));
declare @offer table(prodId int, supplierId int, price float);
--populate tables
insert @prod (id, name) values
(1,'A'),(2,'B'),(3,'C');
insert @supplier (id, name) values
(1,'A'),(2,'B'),(3,'C');
insert @offer (prodId, supplierId, price) values
(1,1,10),(1,2,8),(2,1,12),(2,3,11),(3,1,15),(3,3,20);
-- use a common table expression to number each product's offers by price
;with cte as (
select prodId, supplierId, price,
row_number() over(partition by prodId order by price) rn
from @offer
)
select prodId, p.name, supplierId, s.name, price --beautify with names
from cte
inner join @prod p on cte.prodId = p.id
inner join @supplier s on cte.supplierId = s.id
where rn = 1 --only first (cheapest) price
--result returned
prodId  name  supplierId  name  price
1       A     2           B     8
2       B     3           C     11
3       C     1           A     15
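Side note: the comma-separated list of tables in the FROM clause is the old-style join syntax. It has generally been considered obsolete since ANSI SQL-92 introduced explicit JOIN ... ON syntax, some 25 years ago.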
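The query above partitions by prodId, so it returns the cheapest offer for each product. The question asks for the cheapest product from each supplier, which needs supplierId as the partition key instead. The sketch below shows that variant with the same sample data, run through Python's sqlite3 module (ROW_NUMBER needs SQLite 3.25+).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prod (id INTEGER, name TEXT);
    CREATE TABLE supplier (id INTEGER, name TEXT);
    CREATE TABLE offer (prodId INTEGER, supplierId INTEGER, price REAL);
    INSERT INTO prod VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO supplier VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO offer VALUES (1,1,10), (1,2,8), (2,1,12), (2,3,11), (3,1,15), (3,3,20);
""")

# Number each supplier's offers from cheapest to dearest and keep the first.
rows = conn.execute("""
    WITH cte AS (
        SELECT prodId, supplierId, price,
               ROW_NUMBER() OVER (PARTITION BY supplierId ORDER BY price) AS rn
          FROM offer
    )
    SELECT s.name AS supplier, p.name AS product, cte.price
      FROM cte
      JOIN prod p ON cte.prodId = p.id
      JOIN supplier s ON cte.supplierId = s.id
     WHERE rn = 1
     ORDER BY s.name
""").fetchall()
print(rows)  # [('A', 'A', 10.0), ('B', 'A', 8.0), ('C', 'B', 11.0)]
```

If a supplier offers two products at the same lowest price, ROW_NUMBER keeps an arbitrary one of them. Using RANK() with `rn = 1` would keep both.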

Select all records for customers where mindate in 2015

I want to select all records for customers whose first order is from 2015. I want any orders they placed after 2015 too, but I DON'T want the records for customers whose first order was in 2016. I am ultimately trying to find the percentage of people who order more than twice, but I want to exclude the customers who were new in 2016.
This doesn't work because 'mindate' is an invalid column name, but I'm not sure why, or how else to try it.
Select
od.CustomerID, OrderID, OrderDSC, OrderDTS
From
OrderDetail OD
Join
(Select
OrderID, Min(orderdts) as mindate
From
OrderDetail
Where
mindate Between '2015-1-1' and '2015-12-31'
Group By Orderid) b on od.OrderID = b.OrderID
This is because of the logical processing order of the query: the engine evaluates the WHERE clause before the SELECT list, so when WHERE is evaluated, your mindate alias does not exist yet.
You can replace mindate with orderdts:
select OrderID, min(orderdts) as mindate
from OrderDetail
where orderdts between '2015-1-1' and '2015-12-31'
group by Orderid
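A second option is to use a HAVING clause, which is evaluated after GROUP BY. Repeat the aggregate there (e.g. HAVING MIN(orderdts) BETWEEN '2015-1-1' AND '2015-12-31'), because the SELECT alias is not visible in HAVING either.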
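The sketch below applies the HAVING approach to the original goal, using Python's sqlite3 module with invented sample rows. It groups by CustomerID rather than OrderID. That is an assumption, based on the question being about each customer's first order. The query keeps customers whose earliest OrderDTS falls in 2015, then joins back to return all of their orders, including later ones. The range is half-open so that orders late on 2015-12-31 still count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OrderDetail (CustomerID INTEGER, OrderID INTEGER, OrderDTS TEXT)")
conn.executemany("INSERT INTO OrderDetail VALUES (?, ?, ?)", [
    (1, 10, "2014-04-04 09:00:00"), (1, 11, "2015-05-05 09:00:00"),  # first order 2014
    (2, 20, "2015-03-01 12:00:00"), (2, 21, "2016-05-01 08:30:00"),  # first order 2015
    (3, 30, "2016-02-01 10:00:00"),                                  # new in 2016
    (4, 40, "2015-12-31 18:45:00"),                                  # late on 2015-12-31
])

rows = conn.execute("""
    SELECT od.CustomerID, od.OrderID, od.OrderDTS
      FROM OrderDetail od
      JOIN (
            -- customers whose first order falls in 2015
            SELECT CustomerID
              FROM OrderDetail
             GROUP BY CustomerID
            HAVING MIN(OrderDTS) >= '2015-01-01' AND MIN(OrderDTS) < '2016-01-01'
           ) first2015 ON first2015.CustomerID = od.CustomerID
     ORDER BY od.CustomerID, od.OrderDTS
""").fetchall()
print(rows)
# [(2, 20, '2015-03-01 12:00:00'), (2, 21, '2016-05-01 08:30:00'), (4, 40, '2015-12-31 18:45:00')]
```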
What I did was select the distinct CustomerIDs that have an order within your date range, and then join that list back to the table, so anyone without an order in that range is filtered out.
SELECT * FROM
(Select DISTINCT(CustomerID) as CustomerID
FROM OrderDetail WHERE OrderDTS between '2015-1-1' AND '2015-12-31') oIDs
LEFT JOIN
OrderDetail OD
ON oIDs.CustomerID = OD.CustomerID
Try using an EXISTS predicate, which checks whether a correlated subquery returns any rows. Below is an example you should be able to adapt (the date literals use the unambiguous 'YYYYMMDD' form, so they do not depend on the session's language or DATEFORMAT setting).
create table Test (Id int, aDate datetime);
insert Test values (1, '20140404');
insert Test values (1, '20150505');
insert Test values (1, '20160606');
insert Test values (2, '20160430');
insert Test values (3, '20140227');
select t.* from Test t
where
t.aDate >= '20150101'
and exists (select * from Test x where x.Id = t.Id and x.aDate >= '20150101' and x.aDate < '20160101');
I don't know the orderdts data type, but if it is datetime, orders placed during the day on 2015-12-31 will not be included (only an order stamped exactly 2015-12-31 00:00:00.000 would be). Note how this query skips the first record:
DECLARE @orders TABLE (CustomerID INT, orderDate DATETIME);
INSERT @orders VALUES (1, '2015-12-31 00:00:01.000'), (1, '2015-12-30'), (2, '2015-01-04');
SELECT * FROM @orders WHERE orderDate BETWEEN '2015-01-01' AND '2015-12-31';
In this case you would want the WHERE clause filter to use a half-open range:
WHERE orderDate >= '2015-01-01' AND orderDate < '2016-01-01';
Or
WHERE CAST(orderDate AS date) BETWEEN '2015-01-01' AND '2015-12-31';
(the first example will almost certainly perform better).
Avoid an upper bound of '2015-12-31 23:59:59.999'. datetime is only accurate to about 3 milliseconds, so that literal rounds up to 2016-01-01 00:00:00.000 and would also pull in orders stamped at midnight on 2016-01-01.
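The boundary effect can be reproduced with Python's sqlite3 module. SQLite has no datetime type, so the timestamps here are stored as ISO-8601 text and compared as strings. That mimics the behaviour described above but does not model SQL Server's datetime rounding.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (CustomerID INTEGER, orderDate TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    (1, "2015-12-31 00:00:01.000"),
    (1, "2015-12-30 00:00:00.000"),
    (2, "2015-01-04 00:00:00.000"),
])

# BETWEEN with a bare date as the upper bound: '2015-12-31 00:00:01.000' sorts
# after '2015-12-31', so the order placed that day is dropped.
between_rows = conn.execute(
    "SELECT CustomerID, orderDate FROM orders "
    "WHERE orderDate BETWEEN '2015-01-01' AND '2015-12-31'").fetchall()

# Half-open range: everything on or after 2015-01-01 and before 2016-01-01.
half_open_rows = conn.execute(
    "SELECT CustomerID, orderDate FROM orders "
    "WHERE orderDate >= '2015-01-01' AND orderDate < '2016-01-01'").fetchall()

print(len(between_rows), len(half_open_rows))  # 2 3
```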
Now, using this sample data:
-- Sample data
CREATE TABLE #LIST (LISTName varchar(10) NOT NULL);
INSERT #LIST
SELECT TOP (100) LEFT(newid(), 8)
FROM sys.all_columns a, sys.all_columns b;
-- You will probably want LISTName to be indexed
CREATE NONCLUSTERED INDEX nc_LISTName ON #LIST(LISTName);
You can implement Paul's solution like this:
DECLARE @LIST_Param varchar(8) = 'No List';
SELECT LISTName
FROM
(
SELECT distinct LISTName
FROM #LIST
UNION ALL
SELECT 'No List'
WHERE (SELECT COUNT(DISTINCT LISTName) FROM #LIST) < 1000000
) Distinct_LISTName
WHERE (@LIST_Param = 'No List' or @LIST_Param = LISTName);
Alternatively you can do this:
DECLARE @LIST_Param varchar(8) = 'No List';
WITH x AS
(
SELECT LISTName, c = COUNT(*)
FROM #LIST
WHERE (@LIST_Param = 'No List' or @LIST_Param = LISTName)
GROUP BY LISTName
),
c AS (SELECT s = SUM(c) FROM x)
SELECT LISTName
FROM x CROSS JOIN c
WHERE s < 1000000;

How to add a row before first row of every group in SQL?

I want to add a row before every customer group in my table, which is ordered by the customer ID. Is it possible to do so using FIRST_VALUE(), or is there some other technique?
declare @customer table (id int, name varchar(500));
insert into @customer (id, name) values
(1, 'Client1'),
(2, 'Client2'),
(3, 'Client3'),
(4, 'Client4');
-- original rows get ord = 2 and the added rows get ord = 1,
-- so sorting by id, ord puts each added row before its customer
select
c.id, c.name, 2 ord
from
@customer c
union all
select
c.id, 'some new value before', 1 ord
from
@customer c
order by
id, ord
;
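The answer relies on a helper sort column (ord). Each customer's extra row gets 1 and the original row gets 2, so ORDER BY id, ord places the extra row first, and no FIRST_VALUE() is needed. A runnable sketch of the same query through Python's sqlite3 module follows. It gives the first SELECT explicit AS aliases because SQLite matches ORDER BY terms in a UNION against those names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "Client1"), (2, "Client2"), (3, "Client3"), (4, "Client4")])

# UNION ALL the real rows (ord = 2) with one extra row per customer (ord = 1),
# then sort by id first and ord second.
rows = conn.execute("""
    SELECT c.id AS id, c.name AS name, 2 AS ord FROM customer c
    UNION ALL
    SELECT c.id, 'some new value before', 1 FROM customer c
    ORDER BY id, ord
""").fetchall()
for row in rows:
    print(row)
# (1, 'some new value before', 1)
# (1, 'Client1', 2)
# ... and so on for ids 2 to 4
```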

Using MAX() with GROUP BY

I have a history table and I want to get the latest modification of one employee.
In this example, does MAX always return one record?
CREATE TABLE EmployeeHIST
(
Id INT PRIMARY KEY,
EmployeeId INT,
FirstName NVARCHAR(50),
LastName NVARCHAR(50),
ModifiedDate DATETIME
)
INSERT INTO EmployeeHIST VALUES (1, 1, 'Jhon', 'Doo', '2013-01-24 23:45:12')
INSERT INTO EmployeeHIST VALUES (2, 1, 'Jhon', 'Doo', '2013-02-24 15:45:12')
INSERT INTO EmployeeHIST VALUES (3, 1, 'Jhon', 'Doo', '2013-02-24 15:45:12')
SELECT EmployeeId, MAX([ModifiedDate])
FROM EmployeeHIST
WHERE EmployeeId = 1
GROUP BY EmployeeId
OK, yes, you are right, but if I need to get the Id column for EmployeeId = 1, I will receive two values, 2 and 3, so I need to apply a TOP 1, right?
MAX() returns one record for each combination of values listed in the GROUP BY.
In your sample data, yes, always one record.
If you grouped by Id, EmployeeId you would get three records, as there are three unique combinations of those values.
This also applies to other aggregate functions such as MIN(), AVG(), COUNT(), etc.
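The sketch below checks that explanation with the question's sample rows, run through Python's sqlite3 module. Grouping by EmployeeId yields one row, while grouping by Id, EmployeeId yields one row per Id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE EmployeeHIST (Id INTEGER PRIMARY KEY, EmployeeId INTEGER,
                FirstName TEXT, LastName TEXT, ModifiedDate TEXT)""")
conn.executemany("INSERT INTO EmployeeHIST VALUES (?, ?, ?, ?, ?)", [
    (1, 1, "Jhon", "Doo", "2013-01-24 23:45:12"),
    (2, 1, "Jhon", "Doo", "2013-02-24 15:45:12"),
    (3, 1, "Jhon", "Doo", "2013-02-24 15:45:12"),
])

# One group per distinct EmployeeId -> one row.
by_employee = conn.execute(
    "SELECT EmployeeId, MAX(ModifiedDate) FROM EmployeeHIST GROUP BY EmployeeId").fetchall()

# One group per distinct (Id, EmployeeId) pair -> three rows.
by_id_and_employee = conn.execute(
    "SELECT Id, EmployeeId, MAX(ModifiedDate) FROM EmployeeHIST "
    "GROUP BY Id, EmployeeId ORDER BY Id").fetchall()

print(by_employee)              # [(1, '2013-02-24 15:45:12')]
print(len(by_id_and_employee))  # 3
```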
UPDATE
If you want to get the id of the record that has the max(date) then you have the following option (there may be better ones):
;With MyCTE AS
(
SELECT EmployeeId, MAX([ModifiedDate]) AS MaxDate
FROM EmployeeHIST
GROUP BY EmployeeId
)
SELECT E.Id,E.EmployeeId,ModifiedDate
FROM EmployeeHIST E
JOIN MyCTE M
ON M.EmployeeId = E.EmployeeId
AND M.MaxDate = E.ModifiedDate
WHERE E.EmployeeId = 1
In this case both ids 2 and 3 are returned. I do not know the business requirement here, but I believe you would want only 3 to be returned, so here is a solution:
;With MyCTE AS
(
SELECT EmployeeId, MAX([ModifiedDate]) AS MaxDate
FROM EmployeeHIST
GROUP BY EmployeeId
)
SELECT MAX(E.Id),E.EmployeeId,ModifiedDate
FROM EmployeeHIST E
JOIN MyCTE M
ON M.EmployeeId = E.EmployeeId
AND M.MaxDate = E.ModifiedDate
WHERE E.EmployeeId = 1
GROUP BY E.EmployeeId,ModifiedDate
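The sketch below checks the answer's MAX(Id) query with Python's sqlite3 module. It also checks a ROW_NUMBER alternative, which is not part of the answer; it applies the tie-breaker idea of ordering by ModifiedDate DESC, Id DESC. ROW_NUMBER needs SQLite 3.25+.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE EmployeeHIST (Id INTEGER PRIMARY KEY, EmployeeId INTEGER,
                FirstName TEXT, LastName TEXT, ModifiedDate TEXT)""")
conn.executemany("INSERT INTO EmployeeHIST VALUES (?, ?, ?, ?, ?)", [
    (1, 1, "Jhon", "Doo", "2013-01-24 23:45:12"),
    (2, 1, "Jhon", "Doo", "2013-02-24 15:45:12"),
    (3, 1, "Jhon", "Doo", "2013-02-24 15:45:12"),
])

# The answer's approach: join back to the per-employee MAX(ModifiedDate),
# then break the tie between Ids 2 and 3 with MAX(Id).
cte_rows = conn.execute("""
    WITH MyCTE AS (
        SELECT EmployeeId, MAX(ModifiedDate) AS MaxDate
          FROM EmployeeHIST
         GROUP BY EmployeeId
    )
    SELECT MAX(E.Id), E.EmployeeId, E.ModifiedDate
      FROM EmployeeHIST E
      JOIN MyCTE M ON M.EmployeeId = E.EmployeeId AND M.MaxDate = E.ModifiedDate
     WHERE E.EmployeeId = 1
     GROUP BY E.EmployeeId, E.ModifiedDate
""").fetchall()

# Alternative: number rows newest-first with Id DESC as the tie-breaker.
rn_rows = conn.execute("""
    SELECT Id, EmployeeId, ModifiedDate
      FROM (SELECT Id, EmployeeId, ModifiedDate,
                   ROW_NUMBER() OVER (PARTITION BY EmployeeId
                                      ORDER BY ModifiedDate DESC, Id DESC) AS rn
              FROM EmployeeHIST) latest
     WHERE rn = 1 AND EmployeeId = 1
""").fetchall()

print(cte_rows)  # [(3, 1, '2013-02-24 15:45:12')]
print(rn_rows)   # [(3, 1, '2013-02-24 15:45:12')]
```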

Resources