CROSS APPLY in Oracle/SQL Server substitute in Snowflake

I am looking for the best alternative for mapping CROSS APPLY to Snowflake (SF).
Something like:
select department_name, employee_id, employee_name
from departments d
cross apply (select employee_id, employee_name
from employees e
where salary >= 2000
and e.department_id = d.department_id)
order by 1, 2, 3;

The ANSI SQL equivalent of CROSS APPLY is JOIN LATERAL:
select department_name, employee_id, employee_name
from departments d
join lateral (select employee_id, employee_name
from employees e
where salary >= 2000
and e.department_id = d.department_id)
order by 1, 2, 3;
and for OUTER APPLY is LEFT JOIN LATERAL () ON TRUE:
select department_name, employee_id, employee_name
from departments d
left join lateral (select employee_id, employee_name
from employees e
where salary >= 2000
and e.department_id = d.department_id) ON TRUE
order by 1, 2, 3;
For source data:
CREATE OR REPLACE TABLE departments(department_id INT, department_name TEXT,
department_location TEXT)
AS
SELECT 1, 'HR', 'London' UNION
SELECT 2, 'SALES', 'Berlin' UNION
SELECT 3, 'RESEARCH', 'Paris';
CREATE OR REPLACE TABLE employees(employee_id INT, employee_name TEXT,
salary INT, department_id INT)
AS
SELECT 100, 'John', 2000, 1 UNION
SELECT 101, 'Anna', 4000, 2;
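As a rough illustration of the difference between the two forms, here is a minimal sketch that loads the source data above into an in-memory SQLite database through Python's built-in sqlite3 module. This is an assumption made only to keep the example runnable: SQLite has no LATERAL, so a plain INNER JOIN and LEFT JOIN stand in for JOIN LATERAL and LEFT JOIN LATERAL ... ON TRUE. For this particular query they return the same rows, because the subquery only filters on salary and correlates on department_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INT, department_name TEXT);
    INSERT INTO departments VALUES (1, 'HR'), (2, 'SALES'), (3, 'RESEARCH');
    CREATE TABLE employees (employee_id INT, employee_name TEXT,
                            salary INT, department_id INT);
    INSERT INTO employees VALUES (100, 'John', 2000, 1), (101, 'Anna', 4000, 2);
""")

# Stand-in for CROSS APPLY / JOIN LATERAL:
# departments without a matching employee are dropped.
inner_rows = conn.execute("""
    SELECT d.department_name, e.employee_id, e.employee_name
    FROM departments d
    JOIN employees e
      ON e.department_id = d.department_id AND e.salary >= 2000
    ORDER BY 1, 2, 3
""").fetchall()

# Stand-in for OUTER APPLY / LEFT JOIN LATERAL ... ON TRUE:
# every department is kept, with NULLs where nothing matches.
outer_rows = conn.execute("""
    SELECT d.department_name, e.employee_id, e.employee_name
    FROM departments d
    LEFT JOIN employees e
      ON e.department_id = d.department_id AND e.salary >= 2000
    ORDER BY 1, 2, 3
""").fetchall()

print(inner_rows)  # [('HR', 100, 'John'), ('SALES', 101, 'Anna')]
print(outer_rows)  # [('HR', 100, 'John'), ('RESEARCH', None, None), ('SALES', 101, 'Anna')]
```

RESEARCH has no employees, so it only shows up in the outer variant.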
Related: CROSS/OUTER APPLY in MySQL

From what I have seen, the same results can be achieved with a CROSS JOIN LATERAL,
using this test data:
create table departments (
department_id number(2) ,
department_name varchar2(14),
location varchar2(13)
);
insert into departments values (10,'ACCOUNTING','NEW YORK');
insert into departments values (20,'RESEARCH','DALLAS');
insert into departments values (30,'SALES','CHICAGO');
insert into departments values (40,'OPERATIONS','BOSTON');
create table employees (
employee_id number(4) ,
employee_name varchar2(10),
job varchar2(9),
manager_id number(4),
hiredate date,
salary number(7,2),
commission number(7,2),
department_id number(2)
);
insert into employees values (7369,'SMITH','CLERK',7902,to_date('17-12-1980','dd-mm-yyyy'),800,NULL,20);
insert into employees values (7499,'ALLEN','SALESMAN',7698,to_date('20-2-1981','dd-mm-yyyy'),1600,300,30);
insert into employees values (7521,'WARD','SALESMAN',7698,to_date('22-2-1981','dd-mm-yyyy'),1250,500,30);
insert into employees values (7566,'JONES','MANAGER',7839,to_date('2-4-1981','dd-mm-yyyy'),2975,NULL,20);
insert into employees values (7654,'MARTIN','SALESMAN',7698,to_date('28-9-1981','dd-mm-yyyy'),1250,1400,30);
insert into employees values (7698,'BLAKE','MANAGER',7839,to_date('1-5-1981','dd-mm-yyyy'),2850,NULL,30);
insert into employees values (7782,'CLARK','MANAGER',7839,to_date('9-6-1981','dd-mm-yyyy'),2450,NULL,10);
insert into employees values (7788,'SCOTT','ANALYST',7566,to_date('13-JUL-87','dd-mm-rr')-85,3000,NULL,20);
insert into employees values (7839,'KING','PRESIDENT',NULL,to_date('17-11-1981','dd-mm-yyyy'),5000,NULL,10);
insert into employees values (7844,'TURNER','SALESMAN',7698,to_date('8-9-1981','dd-mm-yyyy'),1500,0,30);
insert into employees values (7876,'ADAMS','CLERK',7788,to_date('13-6-87', 'dd-mm-yyyy')-51,1100,NULL,20);
insert into employees values (7900,'JAMES','CLERK',7698,to_date('3-12-1981','dd-mm-yyyy'),950,NULL,30);
insert into employees values (7902,'FORD','ANALYST',7566,to_date('3-12-1981','dd-mm-yyyy'),3000,NULL,20);
insert into employees values (7934,'MILLER','CLERK',7782,to_date('23-1-1982','dd-mm-yyyy'),1300,NULL,10);
As mentioned in the question, the following:
select department_name, employee_id, employee_name
from departments d
cross join lateral (select employee_id, employee_name
from employees e
where salary >= 2000
and e.department_id = d.department_id)
order by 1, 2, 3;
is equivalent, but is it the best option?

Related

SQL subquery with GROUP BY that returns duplicate minimums. how is this interpreted by outer query?

I was working through some subquery questions, and the code below was provided as the answer.
My question:
If the inner query returns two minimum salaries that are the same but belong to different departments, how will the outer query interpret this? Will it recognize that the salaries refer to different departments?
SELECT first_name, last_name, salary, department_id
FROM employees
WHERE salary IN ( SELECT MIN(salary)
FROM employees
GROUP BY department_id );
thank you
No, it does not know anything about the department information. You need to change the IN to a JOIN:
SELECT e.first_name, e.last_name, e.salary, e.department_id
FROM employees e
INNER JOIN ( SELECT department_id, MIN(salary) AS salary
FROM employees
GROUP BY department_id) s
ON s.department_id=e.department_id
AND s.salary=e.salary;
As the statement is currently written, it returns every record from the employees table whose salary equals the minimum salary of any department, regardless of which department that employee belongs to. The outer query knows nothing about the department_id of the inner query, so there is no correlation between the two on that attribute. You would need to change your logic, for example to a JOIN, to achieve that.
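To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The engine and the four sample rows are assumptions chosen for illustration: employee E earns 300 in department 1, which is not the minimum there but happens to equal the minimum of department 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (first_name TEXT, salary INT, department_id INT);
    INSERT INTO employees VALUES
        ('A', 100, 1), ('E', 300, 1),
        ('C', 300, 2), ('D', 500, 2);
""")

# IN version: the subquery yields the bare set {100, 300}; departments are lost.
in_rows = conn.execute("""
    SELECT first_name FROM employees
    WHERE salary IN (SELECT MIN(salary) FROM employees GROUP BY department_id)
    ORDER BY first_name
""").fetchall()

# JOIN version: each minimum stays tied to its own department.
join_rows = conn.execute("""
    SELECT e.first_name
    FROM employees e
    INNER JOIN (SELECT department_id, MIN(salary) AS salary
                FROM employees
                GROUP BY department_id) s
      ON s.department_id = e.department_id AND s.salary = e.salary
    ORDER BY e.first_name
""").fetchall()

in_names = [row[0] for row in in_rows]
join_names = [row[0] for row in join_rows]
print(in_names)    # ['A', 'C', 'E']
print(join_names)  # ['A', 'C']
```

E is returned by the IN version only because 300 is the minimum of a different department.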
You can use rank() :
select e.*
from (select e.*, rank() over (partition by e.department_id order by e.salary) as seq
from employees e
) e
where e.seq = 1;
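A small sketch of how the rank() approach behaves with ties, run through Python's built-in sqlite3 module (an assumption for runnability; window functions need SQLite 3.25 or later). The sample rows are hypothetical: A and B share the lowest salary in department 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (first_name TEXT, salary INT, department_id INT);
    INSERT INTO employees VALUES
        ('A', 100, 1), ('B', 100, 1), ('C', 300, 1), ('D', 200, 2);
""")

# RANK() restarts per department; tied salaries get the same rank,
# so every employee sharing the department minimum is returned.
lowest_paid = conn.execute("""
    SELECT first_name, salary, department_id
    FROM (SELECT e.*,
                 RANK() OVER (PARTITION BY e.department_id
                              ORDER BY e.salary) AS seq
          FROM employees e) ranked
    WHERE seq = 1
    ORDER BY department_id, first_name
""").fetchall()

print(lowest_paid)  # [('A', 100, 1), ('B', 100, 1), ('D', 200, 2)]
```

Swapping RANK() for ROW_NUMBER() would keep only one row per department, picked arbitrarily among ties.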
You can pass additional information/condition like this for the department.
CREATE TABLE employees (
empname VARCHAR(20)
,salary DECIMAL(18, 2)
,department_id INT
)
INSERT INTO employees
VALUES ('A', 100,1), ('B', 100, 2), ('C', 300, 2)
SELECT *
FROM employees
WHERE salary IN (
SELECT MIN(salary)
FROM employees
GROUP BY department_id
HAVING department_id = 2
)
AND department_id = 2;
Here is the output
empname salary department_id
-----------------------------
B 100.00 2
Although 100.00 is the minimum salary for both A and B, the query also restricts the department id to 2, so only B appears in the output.

SQL Query to get Merchant Name with max date

I would like to achieve the below but am not sure how to go about it. Any query pointing in the right direction would be a great help.
Tables: I have the three tables below:
Merchant(MerchantId, Name, Date),
MerchantCategory(MerchantId, CategoryId),
Category (CategoryId, Name)
How do I return the category name, the merchant count, and the name of the merchant with the max date?
From the requirement I understand that there should be 1 row per category, that the number of merchants should be shown and that the name of the merchant with the most recent date should be shown.
I have prepared a query below that generates some sample data and provides the result intended as I understand it.
The way this works is that the merchant volume is calculated by joining the merchant category table to the category table and then counting the merchant ids per category. The name is trickier and requires an OUTER APPLY that, per category (per row), works out the top 1 name in the merchant table ordered by max(date) desc.
I hope this helps, any questions please let me know.
declare @Merchant table (
MerchantId int,
Name nvarchar(25),
Date Date
);
declare @MerchantCategory table (
MerchantId int,
CategoryId int
);
declare @Category table (
CategoryId int,
Name nvarchar(25)
);
insert into @Merchant (MerchantId, Name, Date)
values
(1, 'Lucy', '2019-01-05'),
(2, 'Dave', '2019-01-30'),
(3, 'Daniel' ,'2019-02-01');
insert into @MerchantCategory (MerchantId, CategoryId)
values
(1, 4),
(1, 5),
(2, 4),
(3, 5);
insert into @Category (CategoryId, Name)
values
(4, 'Cat1'),
(5, 'Cat2');
select c.Name, max(m.name) as MaxMerchantName, count(distinct mc2.merchantid) as Merchantvol
from @Category c
left join @MerchantCategory mc2 on c.CategoryId=mc2.CategoryId
outer apply (
select top 1 name, max(date) as date
from @Merchant m
inner join @MerchantCategory mc on m.MerchantId=mc.MerchantId
where c.CategoryId=mc.CategoryId
group by Name
order by max(date) desc
) m
group by c.Name;
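For readers without SQL Server at hand, the same idea can be sketched in SQLite through Python's built-in sqlite3 module. This is an assumption-laden port, not the answer's exact query: SQLite has no OUTER APPLY or TOP, so a correlated scalar subquery with ORDER BY ... LIMIT 1 takes their place, and the column aliases MerchantCount and LatestMerchant are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Merchant (MerchantId INT, Name TEXT, Date TEXT);
    CREATE TABLE MerchantCategory (MerchantId INT, CategoryId INT);
    CREATE TABLE Category (CategoryId INT, Name TEXT);
    INSERT INTO Merchant VALUES
        (1, 'Lucy', '2019-01-05'), (2, 'Dave', '2019-01-30'), (3, 'Daniel', '2019-02-01');
    INSERT INTO MerchantCategory VALUES (1, 4), (1, 5), (2, 4), (3, 5);
    INSERT INTO Category VALUES (4, 'Cat1'), (5, 'Cat2');
""")

# One row per category: the merchant count comes from the join,
# the latest merchant name from a subquery correlated on the category.
per_category = conn.execute("""
    SELECT c.Name,
           COUNT(DISTINCT mc.MerchantId) AS MerchantCount,
           (SELECT m.Name
            FROM Merchant m
            JOIN MerchantCategory mc2 ON m.MerchantId = mc2.MerchantId
            WHERE mc2.CategoryId = c.CategoryId
            ORDER BY m.Date DESC
            LIMIT 1) AS LatestMerchant
    FROM Category c
    LEFT JOIN MerchantCategory mc ON mc.CategoryId = c.CategoryId
    GROUP BY c.CategoryId, c.Name
    ORDER BY c.Name
""").fetchall()

print(per_category)  # [('Cat1', 2, 'Dave'), ('Cat2', 2, 'Daniel')]
```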
I would have expected to see your efforts,
so I am going against SO principles at the moment.
Try this:
select c.Name as category_name, count(*) as Merchant_Count, m.Name as Merchant_Name, max(Date) from
Merchant m join MerchantCategory mc
on m.MerchantId = mc.MerchantId
join Category c
on mc.CategoryId = c.CategoryId
group by c.Name, m.Name

Select row with max value with having clause

create table Users
(
ID int primary key,
Username char(13) not null,
Salary int,
DepartmentID int,
PCID int
);
insert into Users values (1, 'Jenson', 180000, 4,12);
insert into Users values (2, 'John', 161000, 2,11);
insert into Users values (3, 'Jack', 150000, 1,10);
insert into Users values (4, 'James', 150000, 3,9);
insert into Users values (5, 'Jeremy', 151000, 3,7);
create table Departments
(
ID int primary key,
Name char(13) not null,
);
insert into Departments values (1, 'Programming');
insert into Departments values (2, 'Supply');
insert into Departments values (3, 'Medicine');
insert into Departments values (4, 'Economic');
insert into Departments values (5, 'Communication');
SELECT
s.dep_id as dep_id, s.Sum_Salary
FROM
(SELECT
d.ID AS dep_id, SUM(u.Salary) AS Sum_Salary
FROM
dbo.users u
INNER JOIN
Departments d ON u.DepartmentID = d.id
GROUP BY
d.ID) s
With this query I can select dep_id and Sum_Salary.
How can I select the row with the max value of Sum_Salary, without using a CTE or similar approaches?
You can use TOP and ORDER BY for this:
SELECT TOP 1
d.ID AS dep_id,
sum(u.Salary) AS Sum_Salary
from dbo.users u
INNER JOIN Departments d ON u.DepartmentID=d.id
GROUP BY d.ID
order by Sum_Salary desc;
It'll return the top 1 row with maximum Sum_salary.
If you just want to find maximum sum_salary, use MAX:
SELECT
MAX(s.Sum_Salary)
FROM
(SELECT
SUM(u.Salary) AS Sum_Salary
FROM
dbo.users u
INNER JOIN
Departments d ON u.DepartmentID = d.id
GROUP BY
d.ID) s
WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER( ORDER BY SUM_SALARY DESC) AS RN FROM (SELECT D.ID AS DEP_ID ,SUM(U.SALARY) AS SUM_SALARY FROM DBO.USERS U
INNER JOIN DEPARTMENTS D ON U.DEPARTMENTID=D.ID
GROUP BY D.ID )A
)
SELECT SUM_SALARY, RN
FROM CTE WHERE RN=1
OR
SELECT D.ID AS DEP_ID ,SUM(U.SALARY) AS SUM_SALARY FROM DBO.USERS U
INNER JOIN DEPARTMENTS D ON U.DEPARTMENTID=D.ID
GROUP BY D.ID
HAVING SUM(U.SALARY) = (SELECT TOP 1 SUM(U.SALARY) AS SUM_SALARY FROM DBO.USERS U
INNER JOIN DEPARTMENTS D ON U.DEPARTMENTID=D.ID
GROUP BY D.ID
ORDER BY SUM_SALARY DESC)
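The two non-CTE approaches can be checked against the question's data with a minimal sketch in Python's built-in sqlite3 module. SQLite is an assumption here, chosen for runnability: it has no TOP, so ORDER BY ... LIMIT 1 plays the role of TOP 1, and the Departments join is left out because only DepartmentID is needed for the sums.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (ID INT PRIMARY KEY, Username TEXT, Salary INT, DepartmentID INT);
    INSERT INTO Users VALUES
        (1, 'Jenson', 180000, 4), (2, 'John', 161000, 2), (3, 'Jack', 150000, 1),
        (4, 'James', 150000, 3), (5, 'Jeremy', 151000, 3);
""")

# Counterpart of TOP 1 ... ORDER BY: exactly one row, even if two departments tie.
top_one = conn.execute("""
    SELECT DepartmentID AS dep_id, SUM(Salary) AS Sum_Salary
    FROM Users
    GROUP BY DepartmentID
    ORDER BY Sum_Salary DESC
    LIMIT 1
""").fetchall()

# HAVING against the maximum of the per-department sums: keeps all tied departments.
with_having = conn.execute("""
    SELECT DepartmentID AS dep_id, SUM(Salary) AS Sum_Salary
    FROM Users
    GROUP BY DepartmentID
    HAVING SUM(Salary) = (SELECT MAX(total)
                          FROM (SELECT SUM(Salary) AS total
                                FROM Users
                                GROUP BY DepartmentID) t)
""").fetchall()

print(top_one)      # [(3, 301000)]
print(with_having)  # [(3, 301000)]
```

Department 3 wins because its two salaries add up to 301000.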

Using MAX() with GROUP BY

I have a history table and I want to get the latest modification of one employee.
In this example, does MAX always return one record?
CREATE TABLE EmployeeHIST
(
Id INT PRIMARY KEY,
EmployeeId INT,
FirstName NVARCHAR(50),
LastName NVARCHAR(50),
ModifiedDate DATETIME
)
INSERT INTO EmployeeHIST VALUES (1, 1, 'Jhon', 'Doo', '2013-01-24 23:45:12')
INSERT INTO EmployeeHIST VALUES (2, 1, 'Jhon', 'Doo', '2013-02-24 15:45:12')
INSERT INTO EmployeeHIST VALUES (3, 1, 'Jhon', 'Doo', '2013-02-24 15:45:12')
SELECT EmployeeId, MAX([ModifiedDate])
FROM EmployeeHIST
WHERE EmployeeId = 1
GROUP BY EmployeeId
Ok yes, you are right, but in case I need to get the Id column for EmployeeId = 1, in this case I will receive two values 2 and 3, so I need to apply a top one right?
MAX() returns one record for each combination of values defined in the GROUP BY.
With your sample data, yes, it is always one record.
If you grouped by Id, EmployeeId you would get three records, as there are three unique combinations of those values.
This also applies to other aggregate functions such as MIN(), AVG(), COUNT(), etc.
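The point about grouping can be checked with a short sketch against the question's rows, using Python's built-in sqlite3 module (an assumption for runnability; the name columns are omitted and dates are stored as text).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EmployeeHIST (Id INTEGER PRIMARY KEY, EmployeeId INT, ModifiedDate TEXT);
    INSERT INTO EmployeeHIST VALUES
        (1, 1, '2013-01-24 23:45:12'),
        (2, 1, '2013-02-24 15:45:12'),
        (3, 1, '2013-02-24 15:45:12');
""")

# One group (EmployeeId = 1), therefore one row.
per_employee = conn.execute("""
    SELECT EmployeeId, MAX(ModifiedDate)
    FROM EmployeeHIST
    GROUP BY EmployeeId
""").fetchall()

# Three distinct (Id, EmployeeId) combinations, therefore three rows.
per_id = conn.execute("""
    SELECT Id, EmployeeId, MAX(ModifiedDate)
    FROM EmployeeHIST
    GROUP BY Id, EmployeeId
    ORDER BY Id
""").fetchall()

print(per_employee)  # [(1, '2013-02-24 15:45:12')]
print(len(per_id))   # 3
```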
UPDATE
If you want to get the id of the record that has the max(date) then you have the following option (there may be better ones):
;With MyCTE AS
(
SELECT EmployeeId, MAX([ModifiedDate]) AS MaxDate
FROM EmployeeHIST
GROUP BY EmployeeId
)
SELECT E.Id,E.EmployeeId,ModifiedDate
FROM EmployeeHIST E
JOIN MyCTE M
ON M.EmployeeId = E.EmployeeId
AND M.MaxDate = E.ModifiedDate
WHERE E.EmployeeId = 1
SQLFiddle 1
Now, in this case both ids 2 and 3 are returned. I do not know what the business requirement is here, but I believe you would want only 3 to be returned, so here is a solution for that:
;With MyCTE AS
(
SELECT EmployeeId, MAX([ModifiedDate]) AS MaxDate
FROM EmployeeHIST
GROUP BY EmployeeId
)
SELECT MAX(E.Id),E.EmployeeId,ModifiedDate
FROM EmployeeHIST E
JOIN MyCTE M
ON M.EmployeeId = E.EmployeeId
AND M.MaxDate = E.ModifiedDate
WHERE E.EmployeeId = 1
GROUP BY E.EmployeeId,ModifiedDate
SQLFiddle 2

SQL Server 2008 - Get Latest Record from Joined Table

I have a SQL Server 2008 database. This database has two tables called Customer and Order. These tables are defined as follows:
Customer
--------
ID,
First Name,
Last Name
Order
-----
ID,
CustomerID,
Date,
Description
I am trying to write a query that returns all of the customers in my database. If the user has placed at least one order, I want to return the information associated with the most recent order placed. Currently, I have the following:
SELECT
*
FROM
Customer c LEFT OUTER JOIN [Order] o ON c.[ID]=o.[CustomerID]
As you can imagine, this will return all of the orders associated with a customer. In reality though, I only want the most recent one. How do I do this in SQL?
Thank you!
Here's a method that doesn't assume that the order dates are unique:
SELECT
Customer.ID CustomerID,
Customer.FirstName,
Customer.LastName,
T1.ID OrderID,
T1.Date OrderDate,
T1.Description OrderDescription
FROM Customer
LEFT JOIN (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY CustomerId ORDER BY Date DESC) AS rn
FROM [Order]
) T1
ON Customer.ID = T1.CustomerID AND T1.rn = 1
Result:
CustomerID FirstName LastName OrderID OrderDate OrderDescription
1 FirstName1 LastName1 2 2010-05-02 Description2
2 FirstName2 LastName2 3 2010-05-03 Description3
3 FirstName3 LastName3 NULL NULL NULL
Test data:
CREATE TABLE Customer (ID INT NOT NULL, FirstName VARCHAR(100) NOT NULL, LastName VARCHAR(100) NOT NULL);
INSERT INTO Customer (ID, FirstName, LastName) VALUES
(1, 'FirstName1', 'LastName1'),
(2, 'FirstName2', 'LastName2'),
(3, 'FirstName3', 'LastName3');
CREATE TABLE [Order] (ID INT NOT NULL, CustomerID INT NOT NULL, Date DATE NOT NULL, Description NVARCHAR(100) NOT NULL);
INSERT INTO [Order] (ID, CustomerID, Date, Description) VALUES
(1, 1, '2010-05-01', 'Description1'),
(2, 1, '2010-05-02', 'Description2'),
(3, 2, '2010-05-03', 'Description3'),
(4, 2, '2010-05-03', 'Description4');
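A trimmed-down sketch of the ROW_NUMBER() approach, run in SQLite through Python's built-in sqlite3 module (an assumption for runnability; window functions need SQLite 3.25 or later, and the reserved table name is quoted as "Order" instead of [Order]). One deliberate deviation, not part of the answer above: ID DESC is added as a tie-breaker, so that customer 2, who has two orders on the same date, gets a deterministic result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (ID INT NOT NULL, FirstName TEXT NOT NULL);
    INSERT INTO Customer VALUES (1, 'FirstName1'), (2, 'FirstName2'), (3, 'FirstName3');
    CREATE TABLE "Order" (ID INT NOT NULL, CustomerID INT NOT NULL, Date TEXT NOT NULL);
    INSERT INTO "Order" VALUES
        (1, 1, '2010-05-01'), (2, 1, '2010-05-02'),
        (3, 2, '2010-05-03'), (4, 2, '2010-05-03');
""")

# Number each customer's orders from newest to oldest and keep only rn = 1;
# the LEFT JOIN preserves customers that have no orders at all.
latest = conn.execute("""
    SELECT c.ID AS CustomerID, t.ID AS OrderID
    FROM Customer c
    LEFT JOIN (
        SELECT ID, CustomerID,
               ROW_NUMBER() OVER (PARTITION BY CustomerID
                                  ORDER BY Date DESC, ID DESC) AS rn
        FROM "Order"
    ) t ON c.ID = t.CustomerID AND t.rn = 1
    ORDER BY c.ID
""").fetchall()

print(latest)  # [(1, 2), (2, 4), (3, None)]
```

Without the tie-breaker, either order 3 or order 4 could be chosen for customer 2.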
select c.ID, c.FirstName, c.LastName, o.ID as OrderID, o.Date, o.Description
from Customer c
left outer join (
select CustomerID, max(Date) as MaxDate
from [Order]
group by CustomerID
) om on c.ID = om.CustomerID
left outer join [Order] o on om.CustomerID = o.CustomerID and om.MaxDate = o.Date
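I would use a WHERE clause with the MAX() function to pick the most recently added record (this assumes that a higher order ID means a later order; the aggregate has to sit in a subquery, since it cannot appear directly in WHERE):
(your code...)
WHERE o.ID = (SELECT MAX(o2.ID) FROM [Order] o2 WHERE o2.CustomerID = c.ID)
OR o.ID IS NULL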
Select * from
Customer C Left join
(
Select o.CustomerID, Description, Date from
[Order] o inner join
(
Select CustomerID, Max(Date) as LastOrder
From [Order] Group by CustomerID
) SubLatest on o.CustomerID = SubLatest.CustomerID
and o.Date = SubLatest.LastOrder
) SubDetails
on C.id = SubDetails.CustomerID
Not Homework I hope! :O)
Select Top 1 C.*
,O.*
From Customer C left outer join
[Order] O on O.CustomerId = C.Id
Order by O.[Date] Desc
Hope that helps
