MSSQL DISTINCT again


This query works perfectly on MySQL, but I need to rewrite it for MSSQL, and there it doesn't work:
SELECT DISTINCT TOP 20 [UF].[id], [UF].[created], [Company].[name]
FROM [user_functions] AS [UF]
LEFT JOIN [companies] AS [Company] ON ([Company].[code] = [UF].[company_code])
WHERE [UF].[user_id] = 8923 AND [UF].[state] != 500
ORDER BY [UF].[created] DESC
This query returns duplicate rows even though I specified DISTINCT.
However, when I remove [Company].[name] from the SELECT list, it returns the correct rows.
I would like to select many fields from both the [Company] and [UF] tables.

You can try ROW_NUMBER() and keep only the first row of each partition, as below. TOP 20 belongs on the outer query, next to the ORDER BY, so that it is applied after the RowNum filter; DISTINCT is no longer needed once you filter on RowNum:
select top 20 * from
(
SELECT [UF].[id], [UF].[created], [Company].[name],
RowNum = row_number() over(partition by [Company].[name] order by [UF].[ID])
FROM [user_functions] AS [UF]
LEFT JOIN [companies] AS [Company] ON ([Company].[code] = [UF].[company_code])
WHERE [UF].[user_id] = 8923 AND [UF].[state] != 500
) a where RowNum = 1
order by a.Created Desc
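
If the duplicates come from several [companies] rows sharing one code (an assumption; the question does not show the data), partitioning by [UF].[id] instead of the company name would keep exactly one row per user function. A minimal sketch, using the table variables @uf and @companies with made-up sample rows as stand-ins for the real tables:

```sql
-- Hypothetical sample data standing in for [user_functions] and [companies].
DECLARE @uf TABLE (id INT, created DATETIME, company_code VARCHAR(10), [user_id] INT, [state] INT);
DECLARE @companies TABLE (code VARCHAR(10), name VARCHAR(50));

INSERT INTO @uf VALUES
    (1, '2017-01-01', 'A', 8923, 100),
    (2, '2017-02-01', 'B', 8923, 100);
-- Code 'A' appears twice, which is what multiplies rows in the join.
INSERT INTO @companies VALUES ('A', 'Acme'), ('A', 'Acme Ltd'), ('B', 'Bolt');

SELECT TOP 20 a.id, a.created, a.name
FROM (
    SELECT uf.id, uf.created, c.name,
           -- One numbering sequence per user function; the first company name wins.
           ROW_NUMBER() OVER (PARTITION BY uf.id ORDER BY c.name) AS RowNum
    FROM @uf AS uf
    LEFT JOIN @companies AS c ON c.code = uf.company_code
    WHERE uf.[user_id] = 8923 AND uf.[state] <> 500
) AS a
WHERE a.RowNum = 1
ORDER BY a.created DESC;
```

Which company row should win when several match is a business decision; the ORDER BY inside OVER() is where that choice is expressed.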

Related

Return only rows from the joined table with the latest date

By running the following query I realized that I have duplicates on the column QueryExecutionId.
SELECT DISTINCT qe.QueryExecutionid AS QueryExecutionId,
wfi.workflowdefinitionid AS FlowId,
qe.publishing_date AS [Date],
c.typename AS [Type],
c.name As Name
INTO #Send
FROM
[QueryExecutions] qe
JOIN [Campaign] c ON qe.target_campaign_id = c.campaignid
LEFT JOIN [WorkflowInstanceCampaignActivities] wfica ON wfica.queryexecutionresultid = qe.executionresultid
LEFT JOIN [WorkflowInstances] wfi ON wfica.workflowinstanceid = wfi.workflowinstanceid
WHERE qe.[customer_idhash] IS NOT NULL;
E.g. when I test with one of these QueryExecutionIds, I get two results:
select * from #Send
where QueryExecutionId = 169237
We realized the reason is that these two rows have a different FlowId (the second returned value in the first query). After discussing this issue, we decided to take the record with the FlowId that has the latest date. This date is a column called lastexecutiontime that sits in the third joined table, [WorkflowInstances], which is also the table where FlowId comes from.
How do I get only unique values of QueryExecutionId, each with the latest value of WorkflowInstances.lastexecutiontime, and remove the duplicates?

You can use a derived table with first_value partitioned by workflowinstanceid ordered by lastexecutiontime desc:
SELECT DISTINCT qe.QueryExecutionid AS QueryExecutionId,
wfi.FlowId,
qe.publishing_date AS [Date],
c.typename AS [Type],
c.name As Name
INTO #Send
FROM
[QueryExecutions] qe
JOIN [Campaign] c ON qe.target_campaign_id = c.campaignid
LEFT JOIN [WorkflowInstanceCampaignActivities] wfica ON wfica.queryexecutionresultid = qe.executionresultid
LEFT JOIN
(
SELECT DISTINCT workflowinstanceid, FIRST_VALUE(workflowdefinitionid) OVER(PARTITION BY workflowinstanceid ORDER BY lastexecutiontime DESC) As FlowId
FROM [WorkflowInstances]
) wfi ON wfica.workflowinstanceid = wfi.workflowinstanceid
WHERE qe.[customer_idhash] IS NOT NULL;

Please note that DISTINCT applies to the whole set of selected columns, e.g.
Data 1 (QueryExecutionId = 169237 and typename = test 1)
Data 2 (QueryExecutionId = 169237 and typename = test 2)
These two rows are considered distinct.
Try ROW_NUMBER() with PARTITION BY and select only the rows where [Seq] = 1 (the code below partitions by QueryExecutionid and orders by date):
SELECT *
into #Send
FROM
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY [QueryExecutionid] ORDER BY [Date] DESC) [Seq]
FROM
(
SELECT qe.QueryExecutionid AS QueryExecutionId,
wfi.FlowId,
qe.publishing_date AS [Date], --should not have any null values
qe.[customer_idhash],
c.typename AS [Type],
c.name As Name
FROM [QueryExecutions] qe
JOIN [Campaign] c
ON qe.target_campaign_id = c.campaignid
LEFT JOIN [WorkflowInstanceCampaignActivities] wfica
ON wfica.queryexecutionresultid = qe.executionresultid
LEFT JOIN
(
SELECT DISTINCT workflowinstanceid, FIRST_VALUE(workflowdefinitionid) OVER(PARTITION BY workflowinstanceid ORDER BY lastexecutiontime DESC) As FlowId
FROM [WorkflowInstances]
) wfi ON wfica.workflowinstanceid = wfi.workflowinstanceid
) a
WHERE [customer_idhash] IS NOT NULL
) b
WHERE [Seq] = 1
ORDER BY [QueryExecutionid]
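
Since the question asks for the row with the latest lastexecutiontime, another option is to order the row numbers by that column directly. A minimal sketch on simplified, made-up stand-ins for the three tables (the table variables and sample values are assumptions, and only the columns needed for the join are included):

```sql
-- Hypothetical, simplified stand-ins for the joined tables in the question.
DECLARE @qe TABLE (QueryExecutionId INT, executionresultid INT);
DECLARE @wfica TABLE (queryexecutionresultid INT, workflowinstanceid INT);
DECLARE @wfi TABLE (workflowinstanceid INT, workflowdefinitionid INT, lastexecutiontime DATETIME);

INSERT INTO @qe VALUES (169237, 1);
INSERT INTO @wfica VALUES (1, 10), (1, 11);
INSERT INTO @wfi VALUES (10, 500, '2017-01-01'), (11, 501, '2017-03-01');

SELECT QueryExecutionId, FlowId
FROM (
    SELECT qe.QueryExecutionId,
           wfi.workflowdefinitionid AS FlowId,
           -- Newest workflow instance first within each QueryExecutionId.
           ROW_NUMBER() OVER (PARTITION BY qe.QueryExecutionId
                              ORDER BY wfi.lastexecutiontime DESC) AS Seq
    FROM @qe AS qe
    LEFT JOIN @wfica AS wfica ON wfica.queryexecutionresultid = qe.executionresultid
    LEFT JOIN @wfi AS wfi ON wfica.workflowinstanceid = wfi.workflowinstanceid
) AS ranked
WHERE Seq = 1;
```

With this sample data the query keeps a single row for 169237, carrying FlowId 501 from the instance executed last.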

Getting most recent date from multiple SQL columns

The suggested answer, in this post, works great for two columns.
I have about 50 different date columns, where I need to be able to report on the most recent interaction, regardless of table.
In this case, I am bringing the columns in to a view, since they are coming from different tables in two different databases...
CREATE VIEW vMyView
AS
SELECT
comp_name AS Customer
, Comp_UpdatedDate AS Last_Change
, CmLi_UpdatedDate AS Last_Communication
, Case_UpdatedDate AS Last_Case
, AdLi_UpdatedDate AS Address_Change
FROM Company
LEFT JOIN Comm_Link on Comp_CompanyId = CmLi_Comm_CompanyId
LEFT JOIN Cases ON Comp_CompanyId = Case_PrimaryCompanyId
LEFT JOIN Address_Link on Comp_CompanyId = AdLi_CompanyID
...
My question is: how can I easily account for the many possibilities of one column being greater than the others?
Using only the first two columns, as per the example above, works great. But one row could have column 3 as the highest value, another row could have column 14, etc.
SELECT Customer, MAX(CASE WHEN (Last_Change IS NULL OR Last_Communication> Last_Change)
THEN Last_Communication ELSE Last_Change
END) AS MaxDate
FROM vMyView
GROUP BY Customer
So, how can I easily grab the highest value for each row in any of the 50(ish) columns?
I am using SQL Server 2008 R2, but I also need this to work in versions 2012 and 2014.
Any help would be greatly appreciated.
EDIT:
I just discovered that the second database is storing the dates in NUMERIC fields, rather than DATETIME. (Stupid! I know!)
So I get the error:
The type of column "ARCUS" conflicts with the type of other columns specified in the UNPIVOT list.
I tried to resolve this with a CAST to make it DATETIME, but that only resulted in more errors.
;WITH X AS
(
SELECT Customer
,Value [Date]
,ColumnName [Entity]
,BusinessEmail
,ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY Value DESC) rn
FROM (
SELECT comp_name AS Customer
, Pers_EmailAddress AS BusinessEmail
, Comp_UpdatedDate AS Company
, CmLi_UpdatedDate AS Communication
, Case_UpdatedDate AS [Case]
, AdLi_UpdatedDate AS [Address]
, PLink_UpdatedDate AS Phone
, ELink_UpdatedDate AS Email
, Pers_UpdatedDate AS Person
, oppo_updateddate as Opportunity
, samdat.dbo.ARCUS.AUDTDATE AS ARCUS
FROM vCompanyPE
LEFT JOIN Comm_Link on Comp_CompanyId = CmLi_Comm_CompanyId
LEFT JOIN Cases ON Comp_CompanyId = Case_PrimaryCompanyId
LEFT JOIN Address_Link on Comp_CompanyId = AdLi_CompanyID
LEFT JOIN PhoneLink on Comp_CompanyId = PLink_RecordID
LEFT JOIN EmailLink on Comp_CompanyId = ELink_RecordID
LEFT JOIN vPersonPE on Comp_CompanyId = Pers_CompanyId
LEFT JOIN Opportunity on Comp_CompanyId = Oppo_PrimaryCompanyId
LEFT JOIN Orders on Oppo_OpportunityId = Orde_opportunityid
LEFT JOIN SAMDAT.DBO.ARCUS on IDCUST = Comp_IdCust
COLLATE Latin1_General_CI_AS
WHERE Comp_IdCust IS NOT NULL
AND Comp_deleted IS NULL
) t
UNPIVOT (Value FOR ColumnName IN
(
Company
,Communication
,[Case]
,[Address]
,Phone
,Email
,Person
,Opportunity
,ARCUS
)
)up
)
SELECT Customer
, BusinessEmail
,[Date]
,[Entity]
FROM X
WHERE rn = 1 AND [DATE] >= DATEADD(year,-2,GETDATE()) and BusinessEmail is not null
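
UNPIVOT requires every listed column to have exactly the same data type, so the numeric column has to be converted inside the derived table, before UNPIVOT sees it. A minimal sketch, assuming the numeric field stores dates as yyyymmdd with 0 meaning "no date" (both are assumptions; check the actual values), with a table variable standing in for the joined source:

```sql
-- Hypothetical sample: one DATETIME column and one NUMERIC column holding yyyymmdd.
DECLARE @src TABLE (Customer VARCHAR(50), Company DATETIME, ARCUS NUMERIC(9, 0));
INSERT INTO @src VALUES ('Acme', '2016-05-01', 20170315);

SELECT Customer, ColumnName, Value
FROM (
    SELECT Customer,
           Company,
           -- Assumes yyyymmdd (style 112); NULLIF turns 0 into NULL so it does not fail to convert.
           CONVERT(DATETIME, CONVERT(CHAR(8), CAST(NULLIF(ARCUS, 0) AS INT)), 112) AS ARCUS
    FROM @src
) AS t
UNPIVOT (Value FOR ColumnName IN (Company, ARCUS)) AS up;
```

UNPIVOT drops rows whose value is NULL, so entries without a date simply disappear from the result.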

You could use CROSS APPLY to manually pivot your fields, then use MAX()
SELECT
vMyView.*,
greatest.val
FROM
vMyView
CROSS APPLY
(
SELECT
MAX(val) AS val
FROM
(
SELECT vMyView.field01 AS val
UNION ALL SELECT vMyView.field02 AS val
...
UNION ALL SELECT vMyView.field50 AS val
)
AS manual_pivot
)
AS greatest
The innermost query unpivots each field into its own row, then MAX() aggregates them back into a single value per row of vMyView. (MAX() also skips NULLs, so you don't need to cater for them explicitly.)
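
The same idea can be written more compactly with a VALUES table constructor, which is available from SQL Server 2008 onwards. A minimal sketch with three date columns; the table variable @v and its sample rows are made up and stand in for vMyView:

```sql
-- Hypothetical stand-in for vMyView with three of the date columns.
DECLARE @v TABLE (Customer VARCHAR(50), Last_Change DATETIME,
                  Last_Communication DATETIME, Last_Case DATETIME);
INSERT INTO @v VALUES
    ('Acme', '2016-01-10', '2016-03-05', NULL),
    ('Bolt', '2016-07-01', NULL, '2016-02-02');

SELECT v.Customer, greatest.MaxDate
FROM @v AS v
CROSS APPLY (
    -- One row per column of the current outer row; MAX() ignores the NULLs.
    SELECT MAX(d.val) AS MaxDate
    FROM (VALUES (v.Last_Change), (v.Last_Communication), (v.Last_Case)) AS d(val)
) AS greatest;
```

For this sample the result is 2016-03-05 for Acme and 2016-07-01 for Bolt. Extending it to 50 columns means adding one entry per column to the VALUES list.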

;WITH X AS
(
SELECT Customer
,Value [Date]
,ColumnName [CommunicationType]
,ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY Value DESC) rn
FROM (
SELECT comp_name AS Customer
, Comp_UpdatedDate AS Last_Change
, CmLi_UpdatedDate AS Last_Communication
, Case_UpdatedDate AS Last_Case
, AdLi_UpdatedDate AS Address_Change
FROM Company
LEFT JOIN Comm_Link on Comp_CompanyId = CmLi_Comm_CompanyId
LEFT JOIN Cases ON Comp_CompanyId = Case_PrimaryCompanyId
LEFT JOIN Address_Link on Comp_CompanyId = AdLi_CompanyID
) t
UNPIVOT (Value FOR ColumnName IN (Last_Change,Last_Communication,
Last_Case,Address_Change))up
)
SELECT Customer
,[Date]
,[CommunicationType]
FROM X
WHERE rn = 1

select records After top 1000 rows

I want to select rows 1000 to 2000, and so on, in batches of 1000.
I have written a query to select the top 1000 records, but how can I select rows 1000 to 2000?
Can you help me with a query that selects those records?
SELECT TOP 1000 *
FROM tblProductInformation p1 INNER JOIN tblProduct1 p
ON p.productname = p1.productname

I think you need to order on a specific column, for example the primary key.
SELECT *
FROM
(
-- select from one table only: a derived table cannot contain duplicate column names such as productname
SELECT tbl.*, ROW_NUMBER() OVER (ORDER BY ProductID_PRIMARYKEY) rownum
FROM tblProductInformation as tbl INNER JOIN tblProduct1 p
ON p.productname = tbl.productname
) seq
WHERE seq.rownum BETWEEN 1001 AND 2000

WITH cte AS(
SELECT ROW_NUMBER()OVER(Order By p1.productname ASC, p1.ID ASC) As RowNum
,p1 .*
from tblProductInformation p1
inner join tblProduct1 p on p.productname = p1.productname
)
SELECT * FROM cte
WHERE RowNum BETWEEN @FromRowNum AND @ToRowNum
ROW_NUMBER: http://msdn.microsoft.com/en-us/library/ms186734.aspx
Paging on SQL-Server: http://www.mssqltips.com/sqlservertip/1175/page-through-sql-server-results-with-the-rownumber-function/
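
On SQL Server 2012 or later, the same batching can be expressed with OFFSET ... FETCH instead of ROW_NUMBER(). A minimal sketch against the tables from the question; the variables @PageNumber and @PageSize are hypothetical, and p1.ID is assumed to exist as a tie-breaker, as in the query above:

```sql
DECLARE @PageNumber INT = 2;     -- 1-based page index: page 2 is rows 1001 to 2000
DECLARE @PageSize   INT = 1000;

SELECT p1.*
FROM tblProductInformation AS p1
INNER JOIN tblProduct1 AS p ON p.productname = p1.productname
-- A deterministic ORDER BY is required; without it pages may overlap or skip rows.
ORDER BY p1.productname, p1.ID
OFFSET (@PageNumber - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;
```

On SQL Server 2008 R2 and earlier this syntax is not available, so the ROW_NUMBER() form remains the option there.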

WITH Results AS (
-- no TOP here: TOP 1000 would cap RowNumber at 1000, so the filter below would return nothing
select f.*, ROW_NUMBER() OVER (ORDER BY f.[type]) as RowNumber
from tblProductInformation f
) select *
from Results
where RowNumber between 1001 and 2000

A late answer, but it could be helpful for someone coming here. Another simple approach:
You can create a similar table, tblProductInformation_tmp (or a temp table, #tblProductInformation_tmp), with an extra auto-increment IDENTITY column named UniqueID.
Then just insert the same data into that table:
insert into tblProductInformation_tmp
select * from tblProductInformation
Now it's simple, right?
select * from tblProductInformation_tmp where UniqueID < 1001
select * from tblProductInformation_tmp where UniqueID between 1001 and 2000
:) Don't forget to drop tblProductInformation_tmp afterwards.
Rigin

SQL Server: join on derived table that contains WITH clause?

I'd like to join on a subquery / derived table that contains a WITH clause (the WITH clause is necessary to filter on ROW_NUMBER() = 1). In Teradata something similar would work fine, but Teradata uses QUALIFY ROW_NUMBER() = 1 instead of a WITH clause.
Here is my attempt at this join:
-- want to join row with max StartDate on JobModelID
INNER JOIN (
WITH AllRuns AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY JobModelID ORDER BY StartDate DESC) AS RowNumber
FROM Runs
)
SELECT * FROM AllRuns WHERE RowNumber = 1
) Runs
ON JobModels.JobModelID = Runs.JobModelID
What am I doing wrong?

You could define multiple CTEs in a single WITH clause at the start of the statement. Something like
;WITH AllRuns AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY JobModelID ORDER BY StartDate DESC) AS RowNumber
FROM Runs
),
Runs AS(
SELECT *
FROM AllRuns
WHERE RowNumber = 1
)
SELECT *
FROM ... INNER JOIN
Runs ON JobModels.JobModelID = Runs.JobModelID
For more detail on the usages/structure/rules see WITH common_table_expression (Transact-SQL)
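
A complete statement may make the placement rule easier to see: the WITH clause opens the statement, and the CTE is then joined like any table. A minimal sketch; the table variables, the Name and RunID columns, and the sample rows are assumptions standing in for the real JobModels and Runs tables:

```sql
-- Hypothetical stand-ins for JobModels and Runs.
DECLARE @JobModels TABLE (JobModelID INT, Name VARCHAR(50));
DECLARE @Runs TABLE (RunID INT, JobModelID INT, StartDate DATETIME);

INSERT INTO @JobModels VALUES (1, 'Nightly'), (2, 'Weekly');
INSERT INTO @Runs VALUES (10, 1, '2017-01-01'), (11, 1, '2017-01-02'), (12, 2, '2017-01-01');

-- The WITH clause must start the statement; it cannot sit inside the parentheses of a join.
WITH AllRuns AS (
    SELECT RunID, JobModelID, StartDate,
           ROW_NUMBER() OVER (PARTITION BY JobModelID ORDER BY StartDate DESC) AS RowNumber
    FROM @Runs
)
SELECT jm.JobModelID, jm.Name, r.RunID, r.StartDate
FROM @JobModels AS jm
INNER JOIN AllRuns AS r
    ON jm.JobModelID = r.JobModelID
   AND r.RowNumber = 1;
```

With this sample data each job model is paired with its most recent run: run 11 for model 1 and run 12 for model 2.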

Adding a join condition is probably less efficient, but usually works fine for me.
INNER JOIN (
SELECT *,
ROW_NUMBER() OVER
(PARTITION BY JobModelID
ORDER BY StartDate DESC) AS RowNumber
FROM Runs
) Runs
ON JobModels.JobModelID = Runs.JobModelID
AND Runs.RowNumber = 1

select top 1 with a group by

I have two columns:
namecode name
050125 chris
050125 tof
050125 tof
050130 chris
050131 tof
I want to group by namecode, and return only the name with the most number of occurrences. In this instance, the result would be
050125 tof
050130 chris
050131 tof
This is with SQL Server 2000

I usually use ROW_NUMBER() to achieve this. Not sure how it performs against various data sets, but we haven't had any performance issues as a result of using ROW_NUMBER.
The PARTITION BY clause specifies which value to "group" the row numbers by, and the ORDER BY clause specifies how the records within each "group" should be sorted. So partition the data set by NameCode, and get all records with a Row Number of 1 (that is, the first record in each partition, ordered by the ORDER BY clause).
SELECT
i.NameCode,
i.Name
FROM
(
SELECT
RowNumber = ROW_NUMBER() OVER (PARTITION BY t.NameCode ORDER BY t.Name),
t.NameCode,
t.Name
FROM
MyTable t
) i
WHERE
i.RowNumber = 1;
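
Ordering by t.Name picks the alphabetically first name in each partition. To pick the most frequent name instead, the rows can be grouped first and numbered by their count. A minimal sketch using the sample data from the question in a table variable; note that ROW_NUMBER() needs SQL Server 2005 or later and the multi-row VALUES insert needs 2008 or later, so this does not apply to SQL Server 2000:

```sql
DECLARE @t TABLE (namecode CHAR(6), name VARCHAR(20));
INSERT INTO @t VALUES
    ('050125', 'chris'), ('050125', 'tof'), ('050125', 'tof'),
    ('050130', 'chris'), ('050131', 'tof');

SELECT c.namecode, c.name
FROM (
    SELECT namecode, name,
           -- Highest count first; name is only a tie-breaker to keep the result deterministic.
           ROW_NUMBER() OVER (PARTITION BY namecode
                              ORDER BY COUNT(*) DESC, name) AS RowNumber
    FROM @t
    GROUP BY namecode, name
) AS c
WHERE c.RowNumber = 1;
```

For the sample rows this returns tof for 050125, chris for 050130 and tof for 050131, matching the result asked for in the question.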

select distinct namecode
, (
-- ORDER BY is not allowed inside a derived table without TOP, so group and order directly in the subquery
select top 1 name
from myTable i
where i.namecode = o.namecode
group by name
order by count(*) desc
) as name
from myTable o

SELECT max_table.namecode, count_table2.name
FROM
(SELECT namecode, MAX(count_name) AS max_count
FROM
(SELECT namecode, name, COUNT(name) AS count_name
FROM mytable
GROUP BY namecode, name) AS count_table1
GROUP BY namecode) AS max_table
INNER JOIN
(SELECT namecode, COUNT(name) AS count_name, name
FROM mytable
GROUP BY namecode, name) count_table2
ON max_table.namecode = count_table2.namecode AND
count_table2.count_name = max_table.max_count

I did not try but this should work,
select top 1 t2.* from (
select namecode, count(*) count from temp
group by namecode) t1 join temp t2 on t1.namecode = t2.namecode
order by t1.count desc

Here are two examples that you could use. The temp table was more efficient than the view, but this was measured on a small data sample, so you would want to check your own statistics.
--Creating A View
GO
CREATE VIEW StateStoreSales AS
SELECT t.state,t.stor_id,t.stor_name,SUM(s.qty) 'TotalSales'
,ROW_NUMBER() OVER (PARTITION BY t.state ORDER BY SUM(s.qty) DESC) AS 'Rank'
FROM [dbo].[sales] s
JOIN [dbo].[stores] t ON (s.stor_id = t.stor_id)
GROUP BY t.state,t.stor_id,t.stor_name
GO
SELECT * FROM StateStoreSales
WHERE Rank <= 1
ORDER BY TotalSales Desc
DROP VIEW StateStoreSales
---Using a Temp Table
SELECT t.state,t.stor_id,t.stor_name,SUM(s.qty) 'TotalSales'
,ROW_NUMBER() OVER (PARTITION BY t.state ORDER BY SUM(s.qty) DESC) AS 'Rank' INTO #TEMP
FROM [dbo].[sales] s
JOIN [dbo].[stores] t ON (s.stor_id = t.stor_id)
GROUP BY t.state,t.stor_id,t.stor_name
SELECT * FROM #TEMP
WHERE Rank <= 1
ORDER BY TotalSales Desc
DROP TABLE #TEMP
