SQL Server - Insert into with select and union - duplicates being inserted - sql-server

When I execute my "select union select", I get the correct number or rows (156)
Executed independently, select #1 returns 65 rows and select #2 returns 138 rows.
When I use this "select union select" with an Insert into, I get 203 rows (65+138) with duplicates.
I would like to know if it is my code structure that is causing this issue ?
INSERT INTO dpapm_MediaObjectValidation (mediaobject_id, username, checked_date, expiration_date, notified)
(SELECT FKMediaObjectId, CreatedBy,#checkdate,dateadd(ww,2,#checkdate),0
FROM dbo.gs_MediaObjectMetadata
LEFT JOIN gs_MediaObject mo
ON gs_MediaObjectMetadata.FKMediaObjectId = mo.MediaObjectId
WHERE UPPER([Description]) IN ('CAPTION','TITLE','AUTHOR','DATE PHOTO TAKEN','KEYWORDS')
AND FKMediaObjectId >=
(SELECT TOP 1 MediaObjectId
FROM dbo.gs_MediaObject
WHERE DateAdded > #lastcheck
ORDER BY MediaObjectId)
GROUP BY FKMediaObjectId, CreatedBy
HAVING count(*) < 5
UNION
SELECT FKMediaObjectId, CreatedBy,getdate(),dateadd(ww,2,getdate()),0
FROM gs_MediaObjectMetadata yt
LEFT JOIN gs_MediaObject mo
ON yt.FKMediaObjectId = mo.MediaObjectId
WHERE UPPER([Description]) = 'KEYWORDS'
AND FKMediaObjectId >=
(SELECT TOP 1 MediaObjectId
FROM dbo.gs_MediaObject
WHERE DateAdded > #lastcheck
ORDER BY MediaObjectId)
AND NOT EXISTS
(
SELECT *
FROM dbo.fnSplit(Replace(yt.Value, '''', ''''''), ',') split
WHERE split.item in (SELECT KeywordEn FROM gs_Keywords) or split.item in (SELECT KeywordFr FROM gs_Keywords)
)
)
I would appreciate any clues into resolving this problem ...
Thank you !

The UNION keyword should only return distinct records between the two queries. However, if I recall correctly, this is only true if the datatypes are the same. The date variables might be throwing that off. Depending on the collation type, whitespace might be handled differently as well. You might want to do a SELECT DISTINCT on the dpapm_MediaObjectValidation table after doing your insert, but be sure to trim whitespace from both sides in your comparison. Another approach is to do your first insert, then on your second insert, forgo the UNION altogether and do a manual EXISTS check to see if the items to be inserted already exist.

Related

SQL stored procedure for picking a random sample based on multiple criteria

I am new to SQL. I looked for all over the internet for a solution that matches the problem I have but I couldn't find any. I have a table named 'tblItemReviewItems' in an SQL server 2012.
tblItemReviewItems
Information:
1. ItemReviewId column is the PK.
2. Deleted column will have only "Yes" and "No" value.
3. Audited column will have only "Yes" and "No" value.
I want to create a stored procedure to do the followings:
Pick a random sample of 10% of all ItemReviewId for distinct 'UserId' and distinct 'ReviewDate' in a given date range. 10% sample should include- 5% of the total population from Deleted (No) and 5% of the total population from Deleted (Yes). Audited ="Yes" will be excluded from the sample.
For example – A user has 118 records. Out of the 118 records, 17 records have Deleted column value "No" and 101 records have Deleted column value "Yes". We need to pick a random sample of 12 records. Out of those 12 records, 6 should have Deleted column value "No" and 6 should have Deleted column value "Yes".
Update Audited column value to "Check" for the picked sample.
How can I achieve this?
This is the stored procedure I used to pick a sample of 5% of Deleted column value "No" and 5% of Deleted column value "Yes". Now the situation is different.
ALTER PROC [dbo].[spItemReviewQcPickSample]
(
#StartDate Datetime
,#EndDate Datetime
)
AS
BEGIN
WITH CTE
AS (SELECT ItemReviewId
,100.0
*row_number() OVER(PARTITION BY UserId
,ReviewDate
,Deleted
order by newid()
)
/count(*) OVER(PARTITION BY UserId
,Reviewdate
,Deleted
)
AS pct
FROM tblItemReviewItems
WHERE ReviewDate BETWEEN #StartDate AND #EndDate
AND Deleted in ('Yes','No')
AND Audited='No'
)
SELECT a.*
FROM tblItemReviewItems AS a
INNER JOIN cte AS b
ON b.ItemReviewId=a.ItemReviewId
AND b.pct<=6
;
WITH CTE
AS (SELECT ItemReviewId
,100.00
*row_number() OVER(PARTITION BY UserId
,ReviewDate
,Deleted
ORDER BY newid()
)
/COUNT(*) OVER(PARTITION BY UserId
,Reviewdate
,Deleted
)
AS pct
FROM tblItemReviewItems
WHERE ReviewDate BETWEEN #StartDate AND #EndDate
AND deleted IN ('Yes','No')
AND audited='No'
)
UPDATE a
SET Audited='Check'
FROM tblItemReviewItems AS a
INNER JOIN cte AS b
ON b.ItemReviewId=a.ItemReviewId
AND b.pct<=6
;
END
Any help would be highly appreciated. Thanks in advance.
This may assist you in getting started. My idea is, you create the temp tables you need, and load the specific data into the (deleted, not deleted etc.). You then run something along the lines of:
IF OBJECT_ID('tempdb..#tmpTest') IS NOT NULL DROP TABLE #tmpTest
GO
CREATE TABLE #tmpTest
(
ID INT ,
Random_Order INT
)
INSERT INTO #tmpTest
(
ID
)
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5 UNION ALL
SELECT 6 UNION ALL
SELECT 7 UNION ALL
SELECT 8 UNION ALL
SELECT 9 UNION ALL
SELECT 10 UNION ALL
SELECT 11 UNION ALL
SELECT 12 UNION ALL
SELECT 13 UNION ALL
SELECT 14 UNION ALL
SELECT 15 UNION ALL
SELECT 16;
DECLARE #intMinID INT ,
#intMaxID INT;
SELECT #intMinID = MIN(ID)
FROM #tmpTest;
SELECT #intMaxID = MAX(ID)
FROM #tmpTest;
WHILE #intMinID <= #intMaxID
BEGIN
UPDATE #tmpTest
SET Random_Order = 10 + CONVERT(INT, (30-10+1)*RAND())
WHERE ID = #intMinID;
SELECT #intMinID = #intMinID + 1;
END
SELECT TOP 5 *
FROM #tmpTest
ORDER BY Random_Order;
This assigns a random number to a column, that you then use in conjunction with a TOP 5 clause, to get a random top 5 selection.
Appreciate a loop may not be efficient, but you may be able to update to a random number without it, and the same principle could be implemented. Hope that gives you some ideas.

Issue while using 'case when ' in 'where' clause sql server

While using case when in where clause in sql query it's not working.
Problem :
I have two tables named TblEmployee and TblAssociate.Both tables contains common columns PeriodId, EmpId and AssociateId. My requirement is to fetch values from
TblEmployee with combination of EmpId and AssociateId from TblAssociate should be excluded.And the exclusion should be based on PeriodId condition.`
If(#PeriodID<50)
BEGIN
SELECT *
FROM TblEmployee
WHERE (EmpId+AssociateId) NOT IN (SELECT EmpId+AssociateId FROM TblAssociate)
END
ELSE
BEGIN
SELECT *
FROM TblEmployee
WHERE (EmpId) NOT IN (SELECT EmpId FROM TblAssociate)
END
The above code is working, but I need to avoid that IF-ELSE condition and I wish to use 'case when' in where clause.Please help
Try this:
SELECT *
FROM TblEmployee
WHERE (EmpId + CASE WHEN #PeriodID<50 THEN AssociateId ELSE 0 END) NOT IN
(SELECT EmpId + CASE WHEN #PeriodID<50 THEN AssociateId ELSE 0 END FROM TblAssociate)
You say your code is working but this is rather odd, since it doesn't make much sense to add together id values. In any case, the above statement produces a result that is equivalent to the code originally posted.
You could use AND-OR combination in the WHERE clause. Additionally, you should not be using + as it may lead to incorrect result. You can rewrite your query as:
SELECT e.*
FROM TblEmployee e
WHERE
(
#PeriodID < 50
AND NOT EXISTS(
SELECT 1
FROM TblAssociate a
WHERE
a.EmpId = e.EmpId
AND a.AssociateId = e.AssociateId
)
)
OR
(
#PeriodID >= 50
AND NOT EXISTS(
SELECT 1
FROM TblAssociate a
WHERE a.EmpId = e.EmpId
)
)
The addition of IDs do not guarantee uniqueness. For instance, if EmpId is 5 and AssociateId is 6, then EmpId + AssociateId = 11, while EmpId + AssociateId = 11 even if EmpId is 6 and AssociateId is 5. In the query below, I made sure that the subquery will stop searching when the first record is found and will return a single record, having the value of 1. We select the employee if and only if 1 is among the results. In the subquery we check the operand we are sure of first and then check if we are not in a period where AssociateId must be checked, or it matches.
select *
from TblEmployee
where 1 in (select top 1 1
from TblAssociate
where TblEmployee.EmpId = TblAssociate.EmpId and
(#PeriodID >= 50 or TblEmployee.AssociateId = TblAssociate.AssociateId))

Write the output of a select into a table

I have this statement:
insert into Admin.VersionHistory --do not know what to put here
select COUNT(*) as cnt
from membership.members as mm
left join aspnet_membership as asp
on mm.aspnetuserid=asp.userid
left join trade.tradesmen as tr
on tr.memberid=mm.memberid
where asp.isapproved = 0 and tr.ImportDPN IS NOT NULL and tr.importDPN <> ''
and it gives me a total of 179956. I want to write this total to another table called Admin.VersionHistory which has id(autoinc), version(varchar) and date(sysdate) columns,
How can I do this please?
Thanks
Your query returns resultset with one column and one row. It could be inserted in some table that has one integer column. Your target table is not suitable for this because it has three columns.
You should be able to do this:
INSERT INTO Admin.VersionHistory(ColumnName)
SELECT COUNT(*)
-- the resut of your query here
INSERT INTO Admin.VersionHistory (ColCount, version, Sysdate)
VALUES(
(SELECT COUNT(*) AS cnt FROM membership.members AS mm
lLEFT JOIN aspnet_membership AS asp ON mm.aspnetuserid = asp.userid
LEFT JOIN trade.tradesmen AS tr ON tr.memberid=mm.memberid
WHERE
asp.isapproved = 0 AND
tr.ImportDPN IS NOT NULL AND
tr.importDPN <> ''),
VERSIONVALUE, GETDATE())
Though I have tried similar thing in Oracle I guess it should not be very different in SQL.
Give it a shot.

Need to return all columns from a table when using GROUP BY

I have a table let's say it has four columns
Id, Name, Cell_no, Cat_id.
I need to return all columns whose count of Cat_id is greater than 1.
The group should be done on Cell_no and Name.
What i have done so far..
select Cell_no, COUNT(Cat_id)
from TableName
group by Cell_Number
having COUNT(Cat_id) > 1
But what i need is some thing like this.
select *
from TableName
group by Cell_Number
having COUNT(Cat_id) > 1
Pratik's answer is good but rather than using the IN operator (which only works for single values) you will need to JOIN back to the result set like this
SELECT t.*
FROM tableName t
INNER JOIN
(SELECT Cell_no, Name
FROM TableName
GROUP BY Cell_no , Name
HAVING COUNT(Cat_id) > 1) filter
ON t.Cell_no = filter.Cell_no AND t.Name = filter.Name
you just need to modify your query like below --
select * from tableName where (Cell_no, Name) in (
select Cell_no, Name from TableName
Group by Cell_no , Name
having COUNT(Cat_id) > 1
)
as asked in question you want to group by Cell_no and Name.. if so you need to change your query for group by columns and select part also.. as I have mentioned
This version requires only one pass over the data:
SELECT *
FROM (SELECT a.*
,COUNT(cat_id) OVER (PARTITION BY cell_no)
AS count_cat_id_not_null
FROM TableName a)
WHERE count_cat_id_not_null > 1;

How do I select last 5 rows in a table without sorting?

I want to select the last 5 records from a table in SQL Server without arranging the table in ascending or descending order.
This is just about the most bizarre query I've ever written, but I'm pretty sure it gets the "last 5" rows from a table without ordering:
select *
from issues
where issueid not in (
select top (
(select count(*) from issues) - 5
) issueid
from issues
)
Note that this makes use of SQL Server 2005's ability to pass a value into the "top" clause - it doesn't work on SQL Server 2000.
Suppose you have an index on id, this will be lightning fast:
SELECT * FROM [MyTable] WHERE [id] > (SELECT MAX([id]) - 5 FROM [MyTable])
The way your question is phrased makes it sound like you think you have to physically resort the data in the table in order to get it back in the order you want. If so, this is not the case, the ORDER BY clause exists for this purpose. The physical order in which the records are stored remains unchanged when using ORDER BY. The records are sorted in memory (or in temporary disk space) before they are returned.
Note that the order that records get returned is not guaranteed without using an ORDER BY clause. So, while any of the the suggestions here may work, there is no reason to think they will continue to work, nor can you prove that they work in all cases with your current database. This is by design - I am assuming it is to give the database engine the freedom do as it will with the records in order to obtain best performance in the case where there is no explicit order specified.
Assuming you wanted the last 5 records sorted by the field Name in ascending order, you could do something like this, which should work in either SQL 2000 or 2005:
select Name
from (
select top 5 Name
from MyTable
order by Name desc
) a
order by Name asc
You need to count number of rows inside table ( say we have 12 rows )
then subtract 5 rows from them ( we are now in 7 )
select * where index_column > 7
select * from users
where user_id >
( (select COUNT(*) from users) - 5)
you can order them ASC or DESC
But when using this code
select TOP 5 from users order by user_id DESC
it will not be ordered easily.
select * from table limit 5 offset (select count(*) from table) - 5;
Without an order, this is impossible. What defines the "bottom"? The following will select 5 rows according to how they are stored in the database.
SELECT TOP 5 * FROM [TableName]
Well, the "last five rows" are actually the last five rows depending on your clustered index. Your clustered index, by definition, is the way that he rows are ordered. So you really can't get the "last five rows" without some order. You can, however, get the last five rows as it pertains to the clustered index.
SELECT TOP 5 * FROM MyTable
ORDER BY MyCLusteredIndexColumn1, MyCLusteredIndexColumnq, ..., MyCLusteredIndexColumnN DESC
Search 5 records from last records you can use this,
SELECT *
FROM Table Name
WHERE ID <= IDENT_CURRENT('Table Name')
AND ID >= IDENT_CURRENT('Table Name') - 5
If you know how many rows there will be in total you can use the ROW_NUMBER() function.
Here's an examble from MSDN (http://msdn.microsoft.com/en-us/library/ms186734.aspx)
USE AdventureWorks;
GO
WITH OrderedOrders AS
(
SELECT SalesOrderID, OrderDate,
ROW_NUMBER() OVER (ORDER BY OrderDate) AS 'RowNumber'
FROM Sales.SalesOrderHeader
)
SELECT *
FROM OrderedOrders
WHERE RowNumber BETWEEN 50 AND 60;
In SQL Server 2012 you can do this :
Declare #Count1 int ;
Select #Count1 = Count(*)
FROM [Log] AS L
SELECT
*
FROM [Log] AS L
ORDER BY L.id
OFFSET #Count - 5 ROWS
FETCH NEXT 5 ROWS ONLY;
Try this, if you don't have a primary key or identical column:
select [Stu_Id],[Student_Name] ,[City] ,[Registered],
RowNum = row_number() OVER (ORDER BY (SELECT 0))
from student
ORDER BY RowNum desc
You can retrieve them from memory.
So first you get the rows in a DataSet, and then get the last 5 out of the DataSet.
There is a handy trick that works in some databases for ordering in database order,
SELECT * FROM TableName ORDER BY true
Apparently, this can work in conjunction with any of the other suggestions posted here to leave the results in "order they came out of the database" order, which in some databases, is the order they were last modified in.
select *
from table
order by empno(primary key) desc
fetch first 5 rows only
Last 5 rows retrieve in mysql
This query working perfectly
SELECT * FROM (SELECT * FROM recharge ORDER BY sno DESC LIMIT 5)sub ORDER BY sno ASC
or
select sno from(select sno from recharge order by sno desc limit 5) as t where t.sno order by t.sno asc
When number of rows in table is less than 5 the answers of Matt Hamilton and msuvajac is Incorrect.
Because a TOP N rowcount value may not be negative.
A great example can be found Here.
i am using this code:
select * from tweets where placeID = '$placeID' and id > (
(select count(*) from tweets where placeID = '$placeID')-2)
In SQL Server, it does not seem possible without using ordering in the query.
This is what I have used.
SELECT *
FROM
(
SELECT TOP 5 *
FROM [MyTable]
ORDER BY Id DESC /*Primary Key*/
) AS T
ORDER BY T.Id ASC; /*Primary Key*/
DECLARE #MYVAR NVARCHAR(100)
DECLARE #step int
SET #step = 0;
DECLARE MYTESTCURSOR CURSOR
DYNAMIC
FOR
SELECT col FROM [dbo].[table]
OPEN MYTESTCURSOR
FETCH LAST FROM MYTESTCURSOR INTO #MYVAR
print #MYVAR;
WHILE #step < 10
BEGIN
FETCH PRIOR FROM MYTESTCURSOR INTO #MYVAR
print #MYVAR;
SET #step = #step + 1;
END
CLOSE MYTESTCURSOR
DEALLOCATE MYTESTCURSOR
Thanks to #Apps Tawale , Based on his answer, here's a bit of another (my) version,
To select last 5 records without an identity column,
select top 5 *,
RowNum = row_number() OVER (ORDER BY (SELECT 0))
from [dbo].[ViewEmployeeMaster]
ORDER BY RowNum desc
Nevertheless, it has an order by, but on RowNum :)
Note(1): The above query will reverse the order of what we get when we run the main select query.
So to maintain the order, we can slightly go like:
select *, RowNum2 = row_number() OVER (ORDER BY (SELECT 0))
from (
select top 5 *, RowNum = row_number() OVER (ORDER BY (SELECT 0))
from [dbo].[ViewEmployeeMaster]
ORDER BY RowNum desc
) as t1
order by RowNum2 desc
Note(2): Without an identity column, the query takes a bit of time in case of large data
Get the count of that table
select count(*) from TABLE
select top count * from TABLE where 'primary key row' NOT IN (select top (count-5) 'primary key row' from TABLE)
If you do not want to arrange the table in ascending or descending order. Use this.
select * from table limit 5 offset (select count(*) from table) - 5;

Resources