Duplicate Query in Microsoft Access or SQL Server - sql-server

I have over 1,000 entries with duplicate values in an Access database. I want to create a column called Duplicate and set it to true if a record is NOT THE FIRST ONE with a particular value. This means, if the record is the first one with the value "Red Chair", the Duplicate field is set to false, but all subsequent records with value "Red Chair" will have the Duplicate field set to true.
How do I perform this query in Access?
This database will be upsized to a SQL Server database, so another option is to ignore the duplicate records while retrieving them in my SQL query. If this option is viable, I'd like to know how to do this in SQL as an alternative. Thanks.

You will have to use subqueries. Try this:
UPDATE Tabelle1 SET Tabelle1.b = 'Duplicate'
WHERE
((Tabelle1.[ID] In
(SELECT Tabelle1.[ID] FROM Tabelle1 WHERE
((Tabelle1.[a] In
(SELECT [a] FROM [Tabelle1] As Tmp GROUP BY [a] HAVING Count(*)>1 )
)
AND
(Tabelle1.[ID] Not In
(SELECT min([id]) FROM [Tabelle1] as grpid GROUP BY [a] HAVING Count(*)>1)
))
)));

I'm no expert on the Access dialect, but this adaptation of RJIGO's answer or something similar may also work and be more efficient:
UPDATE Tabelle1 SET
b = 'Duplicate'
WHERE
Tabelle1.[ID] > (
SELECT min([id])
FROM [Tabelle1] as T2
WHERE T2.[a] = Tabelle1.[a]
);
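The correlated-subquery update above can be tried out quickly in a scratch database. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Access or SQL Server (an assumption: the dialects differ, but this particular statement is plain SQL). The helper name flag_duplicates and the sample rows are hypothetical.

```python
import sqlite3


def flag_duplicates(rows):
    """Load (id, a) rows into a scratch table, flag every row that is not
    the first one for its value of `a`, and return (id, a, b) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Tabelle1 (id INTEGER PRIMARY KEY, a TEXT, b TEXT)")
    conn.executemany("INSERT INTO Tabelle1 (id, a) VALUES (?, ?)", rows)
    # A row is a duplicate when its id is greater than the smallest id
    # that carries the same value in column a.
    conn.execute(
        """
        UPDATE Tabelle1
        SET b = 'Duplicate'
        WHERE id > (SELECT MIN(T2.id) FROM Tabelle1 AS T2 WHERE T2.a = Tabelle1.a)
        """
    )
    result = conn.execute("SELECT id, a, b FROM Tabelle1 ORDER BY id").fetchall()
    conn.close()
    return result


flagged = flag_duplicates(
    [(1, "Red Chair"), (2, "Blue Table"), (3, "Red Chair"), (4, "Red Chair")]
)
for row in flagged:
    print(row)
```

Rows 3 and 4 end up flagged, while the first "Red Chair" and the unique "Blue Table" keep a NULL in b.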

I hope this SQL helps:
SELECT [table].[field], Count(*) AS test, IIf(Count(*)>1,"TRUE","FALSE") AS [check]
FROM [table]
GROUP BY [table].[field];

Related

Data sync between Linked Server and Local server using Merge when table structure is different (SQL)

I have a linked server and Test Db under my localDb Server (SQL 2014).
A linked server has a table:
Valid State(StateId(char Pk), Name, Desc, CreatedBy, UpdatedBy)
Inside my Test Db I have a table:
Valid State(Id(int PK),Abbreviation(char) ,Name, IsActive)
I need to sync the data between the linked server's table and my table. What approach would let me use SQL MERGE in this situation?
I got some idea from http://www.sqlservercentral.com/articles/T-SQL/66066/
I got to the point where the query works if the table structures are the same, but here the structures are different. Any suggestions will be appreciated :)
Thank you !
Assuming the id's between the systems actually match, this will do what you want:
-- Insert any new records
INSERT INTO [Valid State](Id, Name)
SELECT StateId, Name
FROM LinkedServer.database.schema.[Valid State] SRC
WHERE NOT EXISTS (
SELECT * FROM [Valid State] TGT WHERE TGT.ID = SRC.StateID
)
-- Update any existing records
UPDATE TGT
SET Name = SRC.Name
FROM [Valid State] TGT
INNER JOIN
LinkedServer.database.schema.[Valid State] SRC
ON SRC.StateID = TGT.ID
Even after all this prompting you haven't explained what you want to do with the leftover fields, so I've left them out.
I would suggest updating first and then inserting, so you don't waste time updating rows you have just inserted.
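The update-then-insert order suggested above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module instead of SQL Server with a linked server, both tables live in one scratch database, and the table names src_state and tgt_state as well as the helper sync_states are hypothetical. As in the answer, it assumes the ids of the two systems match and it ignores the leftover columns.

```python
import sqlite3


def sync_states(source_rows, target_rows):
    """Bring tgt_state in line with src_state: update matches, then insert new rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_state (StateId INTEGER PRIMARY KEY, Name TEXT)")
    conn.execute("CREATE TABLE tgt_state (Id INTEGER PRIMARY KEY, Name TEXT)")
    conn.executemany("INSERT INTO src_state VALUES (?, ?)", source_rows)
    conn.executemany("INSERT INTO tgt_state VALUES (?, ?)", target_rows)

    # Step 1: update rows that already exist in the target.
    conn.execute(
        """
        UPDATE tgt_state
        SET Name = (SELECT s.Name FROM src_state AS s WHERE s.StateId = tgt_state.Id)
        WHERE EXISTS (SELECT 1 FROM src_state AS s WHERE s.StateId = tgt_state.Id)
        """
    )
    # Step 2: insert rows that are missing from the target. Running this
    # second means the update never touches freshly inserted rows.
    conn.execute(
        """
        INSERT INTO tgt_state (Id, Name)
        SELECT s.StateId, s.Name
        FROM src_state AS s
        WHERE NOT EXISTS (SELECT 1 FROM tgt_state AS t WHERE t.Id = s.StateId)
        """
    )
    result = conn.execute("SELECT Id, Name FROM tgt_state ORDER BY Id").fetchall()
    conn.close()
    return result


synced = sync_states(
    source_rows=[(1, "Alabama"), (2, "Alaska"), (3, "Arizona")],
    target_rows=[(1, "Alabma"), (9, "Legacy")],
)
print(synced)
```

The misspelled row 1 is corrected, rows 2 and 3 are added, and the target-only row 9 is left alone because nothing here deletes.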

SQL Server : using SELECT in NOT IN WHERE Clause

I've been using this query statement ever since. I wonder why this does not work on SQL Server 2008 R2.
SELECT
UserName
FROM
Users
WHERE
UserName NOT IN (SELECT UserName FROM UserTableT2)
The code does not return any data. The goal is to select every UserName in the Users table that does not appear in UserTableT2.
EDIT:
Here's the actual query
Update using #Tim Schelmter's query:
Thank you!
I would use NOT EXISTS:
SELECT u.UserName
FROM Users u
WHERE NOT EXISTS
(
SELECT 1 FROM UserTableT2 ut2
WHERE u.UserName = ut2.UserName
)
Why? Because it also works if there are NULL values in UserTableT2.UserName.
Worth reading:
Instead of NOT IN, use a correlated NOT EXISTS for this query pattern.
Always. Other methods may rival it in terms of performance, when all
other variables are the same, but all of the other methods introduce
either performance problems or other challenges.
With your updated columns and tables:
SELECT u.usr_id
FROM ousr u
WHERE NOT EXISTS
(
SELECT 1 FROM ApprovalStageApprovers asa
WHERE u.usr_id = asa.ApprovalUser
)
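The NULL behaviour that motivates NOT EXISTS can be reproduced with a small experiment. This is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server 2008 R2 (an assumption, although the three-valued logic involved is standard SQL); the sample user names and the helper missing_users are made up for illustration.

```python
import sqlite3


def missing_users(users, t2_users):
    """Return (not_in_result, not_exists_result) for the two query styles."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Users (UserName TEXT)")
    conn.execute("CREATE TABLE UserTableT2 (UserName TEXT)")
    conn.executemany("INSERT INTO Users VALUES (?)", [(u,) for u in users])
    conn.executemany("INSERT INTO UserTableT2 VALUES (?)", [(u,) for u in t2_users])

    # NOT IN: a single NULL in the subquery makes the predicate UNKNOWN
    # for every non-matching row, so no such row is returned.
    not_in = conn.execute(
        """
        SELECT UserName FROM Users
        WHERE UserName NOT IN (SELECT UserName FROM UserTableT2)
        ORDER BY UserName
        """
    ).fetchall()

    # NOT EXISTS: the NULL row simply never matches the correlation.
    not_exists = conn.execute(
        """
        SELECT u.UserName FROM Users u
        WHERE NOT EXISTS (
            SELECT 1 FROM UserTableT2 ut2 WHERE u.UserName = ut2.UserName
        )
        ORDER BY u.UserName
        """
    ).fetchall()
    conn.close()
    return [r[0] for r in not_in], [r[0] for r in not_exists]


not_in_result, not_exists_result = missing_users(
    users=["alice", "bob", "carol"],
    t2_users=["alice", None],  # None is stored as NULL
)
print(not_in_result)
print(not_exists_result)
```

With the NULL present, the NOT IN query returns nothing, while NOT EXISTS returns bob and carol.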

DB2 SELECT COUNT, if NULL then default to 0

GOAL
I am trying to select a user ID from one table and a count of associated items from another table in DB2. I am trying to execute this query in SSIS to import the data into a SQL Server database where I will perform additional transformation processes on the data. I mention SSIS not because I think it's part of the issue, but just for background info. I'm fairly certain the problem lies with my inexperience in DB2. My background is in SQL Server, and I'm very new to DB2.
ISSUE
The problem occurs when (I'm assuming) the count is 0. When I execute the query in my DB2 command editor, it just returns a blank row. I would expect it to at least return the user ID and then just have a blank field for the count, but instead the whole row is blank. This then causes issues with my SSIS package, where it's trying to do a bunch of inserts with no data.
Again, a lot of this is an assumption because I'm not experienced with DB2 Command Editor, or DB2, but if I open up the dates a bit, I will get the expected results (user id and count).
I've tried wrapping the count in a COALESCE() function, but that didn't resolve the issue.
QUERY
SELECT a.user_id, COUNT(DISTINCT b.item_number) as Count
FROM TABLE_A a
LEFT OUTER JOIN TABLE_B b
ON a.PRIMARY_KEY = b.PRIMARY_KEY
WHERE a.user_id = '1234'
AND b.DATE_1 >= '01/01/2013'
AND b.DATE_1 <= '01/05/2013'
AND b.DATE_2 >= '01/01/2013'
AND b.DATE_2 <= '12/23/2014'
AND a.OTHER_FILTER_FIELD = 'ABC'
GROUP BY a.user_id
Wrap the field in COALESCE():
COUNT(DISTINCT COALESCE(b.item_number,0))
Also make sure that a row matching
WHERE a.user_id = '1234'
exists in TABLE_A. If there is no user 1234, you will get no results with the query as written.
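One more thing that may be worth checking, offered here as a possibility rather than a confirmed diagnosis: the date filters on b sit in the WHERE clause, and a WHERE condition on the outer-joined table discards the NULL-extended rows that the LEFT OUTER JOIN would otherwise keep. That would match the observation that widening the dates brings the row back. The sketch below uses Python's built-in sqlite3 module as a stand-in for DB2 (an assumption), with a single date column, ISO-formatted date strings, and made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE_A (PRIMARY_KEY INTEGER, user_id TEXT)")
conn.execute("CREATE TABLE TABLE_B (PRIMARY_KEY INTEGER, item_number INTEGER, DATE_1 TEXT)")
conn.execute("INSERT INTO TABLE_A VALUES (1, '1234')")
# The only item for this user falls outside the requested date range.
conn.execute("INSERT INTO TABLE_B VALUES (1, 100, '2014-06-01')")

# Filter on b in WHERE: the joined row fails the date test and is dropped,
# so no group exists for the user and the result is empty.
filter_in_where = conn.execute(
    """
    SELECT a.user_id, COUNT(DISTINCT b.item_number) AS cnt
    FROM TABLE_A a
    LEFT OUTER JOIN TABLE_B b ON a.PRIMARY_KEY = b.PRIMARY_KEY
    WHERE a.user_id = '1234'
      AND b.DATE_1 >= '2013-01-01' AND b.DATE_1 <= '2013-01-05'
    GROUP BY a.user_id
    """
).fetchall()

# Filter on b in ON: the user row survives with NULLs for b, and COUNT
# over the NULL item_number yields 0.
filter_in_on = conn.execute(
    """
    SELECT a.user_id, COUNT(DISTINCT b.item_number) AS cnt
    FROM TABLE_A a
    LEFT OUTER JOIN TABLE_B b
      ON a.PRIMARY_KEY = b.PRIMARY_KEY
     AND b.DATE_1 >= '2013-01-01' AND b.DATE_1 <= '2013-01-05'
    WHERE a.user_id = '1234'
    GROUP BY a.user_id
    """
).fetchall()
conn.close()

print(filter_in_where)
print(filter_in_on)
```

Conditions on TABLE_A (such as the user_id filter) can stay in the WHERE clause; only the conditions on the outer-joined TABLE_B need to move into the ON clause for the zero-count row to appear.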

Single condition slows down SQL query drastically

I have a SQL query looking something like this:
WITH Results_CTE AS
(SELECT
COLUMN1,
COLUMN2,
[MORE COLUMNS...],
ROW_NUMBER() OVER (ORDER BY R.RANKING DESC) AS RowNum
FROM TABLE1 As R, TABLE2 As A, TABLE3 As U, TABLE4 As S, TABLE5 As T
WHERE R.RID = A.LID
AND S.QRYID = R.QRYID
AND A.AID = U.AID
AND CONDITION1 = 'VALUE'
AND CONDITION2 = 'VALUE'
AND [MORE CONDITIONS...]
),
Results_Cnt AS
(SELECT COUNT(*) CNT FROM Results_CTE)
SELECT * FROM Results_CTE, Results_Cnt WHERE RowNum >= 1 AND RowNum <= 25
Now, this query typically runs under 1 sec and returns the 25 records out of 5000 based on CONDITION1.
Recently though, I added a new column to TABLE1 and then used its values as CONDITION2 in the query above. The column is populated going forward, but all the values in the past are NULL.
I read something about joining tables that contain NULLs being a reason for slow execution. The table has about 1,300,000 records, and 90% of them are NULL in the problematic column. But that column is not being joined on. (The one that is being joined on has an INDEX.)
However, I wanted to try that anyway by creating a new column and simply copying the data like so:
ALTER TABLE TABLE1 ADD COL_NEW
UPDATE TABLE1 SET COL_NEW = COL_OLD
My next step was to replace the NULLs with an actual value but first, just for kicks, I changed the query to use as a condition the new field COL_NEW, and the problem went away.
Although I'm happy the problem is gone, I can't explain it to myself. Why was the execution slow in the first place if it had nothing to do with the NULLs?
UPDATE: It appears the problem may have resulted from a cached query plan. So the question essentially becomes, how to force a query plan refresh?
UPDATE: Although doing ALTER TABLE may have refreshed the execution plan, the problem returned. How can I find out what is happening?
It sounds like your query plan got cached while the stats for the new column showed it completely full of nulls, forcing a table scan. Following the ALTER TABLE the query plan was refreshed, replacing the table scan with an index lookup again, and performance returned to normal.
The only way to know for sure if that is what happened would be to examine the query plans for both queries, but those are long gone now.

UPSERT in SSIS

I am writing an SSIS package to run on SQL Server 2008. How do you do an UPSERT in SSIS?
IF KEY NOT EXISTS
INSERT
ELSE
IF DATA CHANGED
UPDATE
ENDIF
ENDIF
See SQL Server 2008 - Using Merge From SSIS. I've implemented something like this, and it was very easy. Just using the BOL page Inserting, Updating, and Deleting Data using MERGE was enough to get me going.
Apart from T-SQL based solutions (and this is not even tagged as sql/tsql), you can use an SSIS Data Flow Task with a Merge Join as described here (and elsewhere).
The crucial part is the Full Outer Join in the Merge Join of your sorted sources (if you only want to insert/update and not delete, a Left Outer Join works as well), followed by a Conditional Split to know what to do next: insert into the destination (which is also my source here), update it (via SQL Command), or delete from it (again via SQL Command).
INSERT: If the gid is found only on the source (left)
UPDATE: If the gid exists on both the source and destination
DELETE: If the gid is not found in the source but exists in the destination (right)
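The three Conditional Split rules above can be sketched outside SSIS to make the logic concrete. This is a plain Python illustration, not SSIS code: the function conditional_split and the dictionaries keyed by gid are hypothetical, and walking the sorted union of keys plays the role of the Full Outer Join over sorted inputs.

```python
def conditional_split(source, destination):
    """Classify every gid as INSERT, UPDATE or DELETE.

    source and destination map gid -> row data. Iterating over the sorted
    union of keys mimics a full outer join of two sorted inputs.
    """
    actions = {"INSERT": [], "UPDATE": [], "DELETE": []}
    for gid in sorted(source.keys() | destination.keys()):
        if gid not in destination:
            # Found only on the source (left) side.
            actions["INSERT"].append(gid)
        elif gid not in source:
            # Found only on the destination (right) side.
            actions["DELETE"].append(gid)
        else:
            # Present on both sides.
            actions["UPDATE"].append(gid)
    return actions


split = conditional_split(
    source={1: "a", 2: "b", 3: "c"},
    destination={2: "x", 3: "c", 4: "d"},
)
print(split)
```

If deletes are not wanted, dropping the DELETE branch corresponds to using a Left Outer Join instead, as the answer notes. A real package might also compare the row data before issuing an update, which this sketch leaves out.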
I would suggest you have a look at Mat Stephen's weblog on SQL Server's upsert.
SQL 2005 - UPSERT: In nature but not by name; but at last!
Another way to create an upsert in sql (if you have pre-stage or stage tables):
--Insert Portion
INSERT INTO FinalTable
( Columns )
SELECT T.TempColumns
FROM TempTable T
WHERE
(
SELECT 'Bam'
FROM FinalTable F
WHERE F.Key(s) = T.Key(s)
) IS NULL
--Update Portion
UPDATE FinalTable
SET NonKeyColumn(s) = T.TempNonKeyColumn(s)
FROM TempTable T
WHERE FinalTable.Key(s) = T.Key(s)
AND CHECKSUM(FinalTable.NonKeyColumn(s)) <> CHECKSUM(T.NonKeyColumn(s))
The basic Data Manipulation Language (DML) commands that have been in use over the years are Update, Insert and Delete. They do exactly what you expect: Insert adds new records, Update modifies existing records and Delete removes records.
An UPSERT statement modifies existing records and, if a record is not present, inserts a new one.
The functionality of an UPSERT statement can be achieved with two newer T-SQL set operators:
EXCEPT
INTERSECT
Except:-
Returns any distinct values from the query to the left of the EXCEPT operand that are not also returned from the right query
Intersect:-
Returns any distinct values that are returned by both the query on the left and right sides of the INTERSECT operand.
Example:- Let's say we have two tables, Table_1 and Table_2
Table_1 column name(Number, datatype int)
----------
1
2
3
4
5
Table_2 column name(Number, datatype int)
----------
1
2
5
SELECT * FROM TABLE_1 EXCEPT SELECT * FROM TABLE_2
will return 3,4 as they are present in Table_1 but not in Table_2
SELECT * FROM TABLE_1 INTERSECT SELECT * FROM TABLE_2
will return 1,2,5 as they are present in both tables Table_1 and Table_2.
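The two results above can be checked with a short script. This is a minimal sketch that runs the same set operators in Python's built-in sqlite3 module instead of SQL Server (an assumption; EXCEPT and INTERSECT are standard SQL and behave the same way for this data). An ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table_1 (Number INTEGER)")
conn.execute("CREATE TABLE Table_2 (Number INTEGER)")
conn.executemany("INSERT INTO Table_1 VALUES (?)", [(n,) for n in (1, 2, 3, 4, 5)])
conn.executemany("INSERT INTO Table_2 VALUES (?)", [(n,) for n in (1, 2, 5)])

# Rows in Table_1 that do not appear in Table_2: candidates for insert.
only_in_table_1 = [
    row[0]
    for row in conn.execute(
        "SELECT Number FROM Table_1 EXCEPT SELECT Number FROM Table_2 ORDER BY Number"
    )
]

# Rows present in both tables: candidates for update.
in_both = [
    row[0]
    for row in conn.execute(
        "SELECT Number FROM Table_1 INTERSECT SELECT Number FROM Table_2 ORDER BY Number"
    )
]
conn.close()

print(only_in_table_1)
print(in_both)
```

In an upsert, the EXCEPT result would feed the INSERT and the INTERSECT result would identify the keys to UPDATE.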
All the pains of Complex joins are now eliminated :-)
To use this functionality in SSIS, all you need to do is add an "Execute SQL" task and put the code in there.
I usually prefer to let the SSIS engine manage the delta merge. Only new items are inserted and only changed items are updated.
If your destination server does not have enough resources to handle a heavy query, this method lets you use the resources of your SSIS server instead.
We can use slowly changing dimension component in SSIS to upsert.
https://learn.microsoft.com/en-us/sql/integration-services/data-flow/transformations/configure-outputs-using-the-slowly-changing-dimension-wizard?view=sql-server-ver15
I would use the 'slow changing dimension' task
