How can I check and remove duplicate rows?

How can I check and remove duplicate rows? - sql-server

Have problem with quite big table, where are some null values in 3 columns - datetime2 (and 2 float columns).
Nice simple request from similar question returns only 2 rows where datetime2 is null, but nothing else (same as lot of others):
DELETE FROM MyTable
LEFT OUTER JOIN (
SELECT MIN(RowId) as RowId, allRemainingCols
FROM MyTable
GROUP BY allRemainingCols
) as KeepRows ON
MyTable.RowId = KeepRows.RowId
WHERE
KeepRows.RowId IS NULL
Seems to work without datetime2 column having nulls ??
There is manual workaround, but is there any way to create request or procedure using TSQL only ?
SELECT id,remainingColumns
FROM table
order BY remainingColumns
Compare all columns in XL (15 in my case, placed =ROW() in first column as a check and formula next to last column + auto filter for TRUEs): =AND(B1=B2;C1=C2;D1=D2;E1=E2;F1=F2;G1=G2;H1=H2;I1=I2;J1=J2;K1=K2;L1=L2;M1=M2;N1=N2;O1=O2;P1=P2)
Or compare 3 rows like this and select all non-unique rows
=OR(
AND(B1=B2;C1=C2;D1=D2;E1=E2;F1=F2;G1=G2;H1=H2;I1=I2;J1=J2;K1=K2;L1=L2;M1=M2;N1=N2;O1=O2;P1=P2);
AND(B2=B3;C2=C3;D2=D3;E2=E3;F2=F3;G2=G3;H2=H3;I2=I3;J2=J3;K2=K3;L2=L3;M2=M3;N2=N3;O2=O3;P2=P3)
)

Quite much work to find my particular data/answer...
Most of float numbers were slightly different.
Hard to find, but simple CAST(column as binary) can show these invisible differences...
Like 96,6666666666667 vs 0x0000000000000000000000000000000000000000000040582AAAAAAAAAAD vs 0x0000000000000000000000000000000000000000000040582AAAAAAAAAAB etc.
And visible 96.6666666666667 can return something different way again:
0x0000000000000000000000000000000000000F0D0001AB6A489F2D6F0300

Related

Need to be able to check column by one by one and for a certain string but can't figure out a way to do so

TablesIs there a way for me to have a String and check within multiple columns in another table in order starting from 1 until I get a match?
I have table with a few fields
Medicine
-----------
Advil
Tylenol
Midol
I need to check it against another table and check column in order for the medicine above.
MedsToTry1 | MedsToTry2 | MedsToTry3 | MedsToTry4 | MedsToTry5 | MedsToTry6 |
------------|------------|------------|------------|------------|------------|
NotAdvil Advil Null Null Null Null
Tylenol Ibuprofen NotTylenol Null Null Null
NotMidol NotAdvil Ibuprofen Midol Null Null
So I have to go through each one of the fields in the first table and search for them in the 'MedsToTry1' field if not there then on 'MedsToTry2' and so on until found.
I've tried concatentation on all the strings in the MedsToTry fields and searching for the string in there but it doesn't guarantee that it'll be in order and I need for 'MedsToTry1' to be checked first.
I tried to use COALESCE but it returns the fields on MedsToTry1 since they're all not null but won't go to MedsToTry2 to see if it's there.
Is there a way for me to do this? Have a String and check within multiple columns in another table in order starting from 1 until I get a match?
If I need to provide more information please let me know. I pretty new to SQL so I'm take any and all help I can get.
Thank you.

Your table is - quite probably - not the real source table. If you can query against this, it was much better. This is - again my guessing - the result of a pivot operation, where a row-wise table is transformed to a side-by-side or column-wise format.
You did not state the expected output. For the next time, or to improve this question, please look at my code and try to prepare a stand-alone example to reproduce your issue yourself. This time I've done it for you:
First we have to declare mockup-tables to simulate your issue:
DECLARE #tblA TABLE(Medicine VARCHAR(100));
INSERT INTO #tblA VALUES
('Advil')
,('Tylenol')
,('Midol');
DECLARE #tblB TABLE(RowId INT,MedsToTry1 VARCHAR(100),MedsToTry2 VARCHAR(100),MedsToTry3 VARCHAR(100),MedsToTry4 VARCHAR(100),MedsToTry5 VARCHAR(100),MedsToTry6 VARCHAR(100));
INSERT INTO #tblB VALUES
(1,'NotAdvil','Advil',Null,Null ,Null,Null)
,(2,'Tylenol','Ibuprofen','NotTylenol',Null ,Null,Null)
,(3,'NotMidol','NotAdvil','Ibuprofen','Midol',Null,Null);
--This is the query (use your own table names to test this in your environment)
SELECT B.RowId
,C.*
FROM #tblB b
CROSS APPLY (VALUES(1,b.MedsToTry1)
,(2,b.MedsToTry2)
,(3,b.MedsToTry3)
,(4,b.MedsToTry4)
,(5,b.MedsToTry5)
,(6,b.MedsToTry6)) C(MedRank,Medicin)
WHERE EXISTS(SELECT 1 FROM #tblA A WHERE A.Medicine=C.Medicin);
The idea in short:
The trick with CROSS APPLY (VALUES... will return each name-numbered column (MedsToTry1, MedsToTry2...) in one row, together with a rank. This way we do not lose the information of the sort order or the position within the table.
The WHERE will reduce the set to rows, where there exists a corresponding medicine in the other table.
The result
RowId MedRank Medicin
1 2 Advil
2 1 Tylenol
3 4 Midol

How can this expression reach the NULL expression?

I'm trying to randomly populate a column with values from another table using this statement:
UPDATE dbo.SERVICE_TICKET
SET Vehicle_Type = (SELECT TOP 1 [text]
FROM dbo.vehicle_typ
WHERE id = abs(checksum(NewID()))%21)
It seems to work fine, however the value NULL is inserted into the column. How can I get rid of the NULL and only insert the values from the table?

This can happen when you don't have an appropriate index on the ID column of your vehicle_typ table. Here's a smaller query that exhibits the same problem:
create table T (ID int null)
insert into T(ID) values (0),(1),(2),(3)
select top 1 * from T where ID = abs(checksum(NewID()))%3
Because there's no index on T, what happens is that SQL Server performs a table scan and then, for each row, attempts to satisfy the where clause. Which means that, for each row it evaluates abs(checksum(NewID()))%3 anew. You'll only get a result if, by chance, that expression produces, say, 1 when it's evaluated for the row with ID 1.
If possible (I don't know your table structure) I would first populate a column in SERVICE_TICKET with a random number between 0 and 20 and then perform this update using the already generated number. Otherwise, with the current query structure, you're always relying on SQL Server being clever enough to only evaluate abs(checksum(NewID()))%21once for each outer row, which it may not always do (as you've already found out).

#Damien_The_Unbeliever explained why your query fails.
My first variant was not correct, because I didn't understand the problem in full.
You want to set each row in SERVICE_TICKET to a different random value from vehicle_typ.
To fix it simply order by random number, rather than comparing a random number with ID. Like this (and you don't care how many rows are in vehicle_typ as long as there is at least one row there).
WITH
CTE
AS
(
SELECT
dbo.SERVICE_TICKET.Vehicle_Type
CA.[text]
FROM
dbo.SERVICE_TICKET
CROSS APPLY
(
SELECT TOP 1 [text]
FROM dbo.vehicle_typ
ORDER BY NewID()
) AS CA
)
UPDATE CTE
SET Vehicle_Type = [text];
At first we make a Common Table Expression, you can think of it as a temporary table. For each row in SERVICE_TICKET we pick one random row from vehicle_typ using CROSS APPLY. Then we UPDATE the original table with chosen rows.

SQL Server 2005 SELECT TOP 1 from VIEW returns LAST row

I have a view that may contain more than one row, looking like this:
[rate] | [vendorID]
8374 1234
6523 4321
5234 9374
In a SPROC, I need to set a param equal to the value of the first column from the first row of the view. something like this:
DECLARE #rate int;
SET #rate = (select top 1 rate from vendor_view where vendorID = 123)
SELECT #rate
But this ALWAYS returns the LAST row of the view.
In fact, if I simply run the subselect by itself, I only get the last row.
With 3 rows in the view, TOP 2 returns the FIRST and THIRD rows in order. With 4 rows, it's returning the top 3 in order. Yet still top 1 is returning the last.
DERP?!?
This works..
DECLARE #rate int;
CREATE TABLE #temp (vRate int)
INSERT INTO #temp (vRate) (select rate from vendor_view where vendorID = 123)
SET #rate = (select top 1 vRate from #temp)
SELECT #rate
DROP TABLE #temp
.. but can someone tell me why the first behaves so fudgely and how to do what I want? As explained in the comments, there is no meaningful column by which I can do an order by. Can I force the order in which rows are inserted to be the order in which they are returned?
[EDIT] I've also noticed that: select top 1 rate from ([view definition select]) also returns the correct values time and again.[/EDIT]

That is by design.
If you don't specify how the query should be sorted, the database is free to return the records in any order that is convenient. There is no natural order for a table that is used as default sort order.
What the order will actually be depends on how the query is planned, so you can't even rely on the same query giving a consistent result over time, as the database will gather statistics about the data and may change how the query is planned based on that.
To get the record that you expect, you simply have to specify how you want them sorted, for example:
select top 1 rate
from vendor_view
where vendorID = 123
order by rate

I ran into this problem on a query that had worked for years. We upgraded SQL Server and all of a sudden, an unordered select top 1 was not returning the final record in a table. We simply added an order by to the select.
My understanding is that SQL Server normally will generally provide you the results based on the clustered index if no order by is provided OR off of whatever index is picked by the engine. But, this is not a guarantee of a certain order.
If you don't have something to order off of, you need to add it. Either add a date inserted column and default it to GETDATE() or add an identity column. It won't help you historically, but it addresses the issue going forward.

While it doesn't necessarily make sense that the results of the query should be consistent, in this particular instance they are so we decided to leave it 'as is'. Ultimately it would be best to add a column, but this was not an option. The application this belongs to is slated to be discontinued sometime soon and the database server will not be upgraded from SQL 2005. I don't necessarily like this outcome, but it is what it is: until it breaks it shall not be fixed. :-x

Preserving ORDER BY in SELECT INTO

I have a T-SQL query that takes data from one table and copies it into a new table but only rows meeting a certain condition:
SELECT VibeFGEvents.*
INTO VibeFGEventsAfterStudyStart
FROM VibeFGEvents
LEFT OUTER JOIN VibeFGEventsStudyStart
ON
CHARINDEX(REPLACE(REPLACE(REPLACE(logName, 'MyVibe ', ''), ' new laptop', ''), ' old laptop', ''), excelFilename) > 0
AND VibeFGEventsStudyStart.MIN_TitleInstID <= VibeFGEvents.TitleInstID
AND VibeFGEventsStudyStart.MIN_WinInstId <= VibeFGEvents.WndInstID
WHERE VibeFGEventsStudyStart.excelFilename IS NOT NULL
ORDER BY VibeFGEvents.id
The code using the table relies on its order, and the copy above does not preserve the order I expected. I.e. the rows in the new table VibeFGEventsAfterStudyStart are not monotonically increasing in the VibeFGEventsAfterStudyStart.id column copied from VibeFGEvents.id.
In T-SQL how might I preserve the ordering of the rows from VibeFGEvents in VibeFGEventsStudyStart?

I know this is a bit old, but I needed to do something similar. I wanted to insert the contents of one table into another, but in a random order. I found that I could do this by using select top n and order by newid(). Without the 'top n', order was not preserved and the second table had rows in the same order as the first. However, with 'top n', the order (random in my case) was preserved. I used a value of 'n' that was greater than the number of rows. So my query was along the lines of:
insert Table2 (T2Col1, T2Col2)
select top 10000 T1Col1, T1Col2
from Table1
order by newid()

What for?
Point is – data in a table is not ordered. In SQL Server the intrinsic storage order of a table is that of the (if defined) clustered index.
The order in which data is inserted is basically "irrelevant". It is forgotten the moment the data is written into the table.
As such, nothing is gained, even if you get this stuff. If you need an order when dealing with data, you HAVE To put an order by clause on the select that gets it. Anything else is random - i.e. the order you et data is not determined and may change.
So it makes no sense to have a specific order on the insert as you try to achieve.
SQL 101: sets have no order.

Just add top to your sql with a number that is greater than the actual number of rows:
SELECT top 25000 *
into spx_copy
from SPX
order by date

I've found a specific scenario where we want the new table to be created with a specific order in the columns' content:
Amount of rows is very big (from 200 to 2000 millions of rows), so we are using SELECT INTO instead of CREATE TABLE + INSERT because needs to be loaded as fast as possible (minimal logging). We have tested using the trace flag 610 for loading an already created empty table with a clustered index but still takes longer than the following approach.
We need the data to be ordered by specific columns for query performances, so we are creating a CLUSTERED INDEX just after the table is loaded. We discarded creating a non-clustered index because it would need another read for the data that's not included in the ordered columns from the index, and we discarded creating a full-covering non-clustered index because it would practically double the amount of space needed to hold the table.
It happens that if you manage to somehow create the table with columns already "ordered", creating the clustered index (with the same order) takes a lot less time than when the data isn't ordered. And sometimes (you will have to test your case), ordering the rows in the SELECT INTO is faster than loading without order and creating the clustered index later.
The problem is that SQL Server 2012+ will ignore the ORDER BY column list when doing INSERT INTO or when doing SELECT INTO. It will consider the ORDER BY columns if you specify an IDENTITY column on the SELECT INTO or if the inserted table has an IDENTITY column, but just to determine the identity values and not the actual storage order in the underlying table. In this case, it's likely that the sort will happen but not guaranteed as it's highly dependent on the execution plan.
A trick we have found is that doing a SELECT INTO with the result of a UNION ALL makes the engine perform a SORT (not always an explicit SORT operator, sometimes a MERGE JOIN CONCATENATION, etc.) if you have an ORDER BY list. This way the select into already creates the new table in the order we are going to create the clustered index later and thus the index takes less time to create.
So you can rewrite this query:
SELECT
FirstColumn = T.FirstColumn,
SecondColumn = T.SecondColumn
INTO
#NewTable
FROM
VeryBigTable AS T
ORDER BY -- ORDER BY is ignored!
FirstColumn,
SecondColumn
to
SELECT
FirstColumn = T.FirstColumn,
SecondColumn = T.SecondColumn
INTO
#NewTable
FROM
VeryBigTable AS T
UNION ALL
-- A "fake" row to be deleted
SELECT
FirstColumn = 0,
SecondColumn = 0
ORDER BY
FirstColumn,
SecondColumn
We have used this trick a few times, but I can't guarantee it will always sort. I'm just posting this as a possible workaround in case someone has a similar scenario.

You cannot do this with ORDER BY but if you create a Clustered Index on VibeFGEvents.id after your SELECT INTO the table will be sorted on disk by VibeFGEvents.id.

I'v made a test on MS SQL 2012, and it clearly shows me, that insert into ... select ... order by makes sense. Here is what I did:
create table tmp1 (id int not null identity, name sysname);
create table tmp2 (id int not null identity, name sysname);
insert into tmp1 (name) values ('Apple');
insert into tmp1 (name) values ('Carrot');
insert into tmp1 (name) values ('Pineapple');
insert into tmp1 (name) values ('Orange');
insert into tmp1 (name) values ('Kiwi');
insert into tmp1 (name) values ('Ananas');
insert into tmp1 (name) values ('Banana');
insert into tmp1 (name) values ('Blackberry');
select * from tmp1 order by id;
And I got this list:
1 Apple
2 Carrot
3 Pineapple
4 Orange
5 Kiwi
6 Ananas
7 Banana
8 Blackberry
No surprises here. Then I made a copy from tmp1 to tmp2 this way:
insert into tmp2 (name)
select name
from tmp1
order by id;
select * from tmp2 order by id;
I got the exact response like before. Apple to Blackberry.
Now reverse the order to test it:
delete from tmp2;
insert into tmp2 (name)
select name
from tmp1
order by id desc;
select * from tmp2 order by id;
9 Blackberry
10 Banana
11 Ananas
12 Kiwi
13 Orange
14 Pineapple
15 Carrot
16 Apple
So the order in tmp2 is reversed too, so order by made sense when there is a identity column in the target table!

The reason why one would desire this (a specific order) is because you cannot define the order in a subquery, so, the idea is that, if you create a table variable, THEN make a query from that table variable, you would think you would retain the order(say, to concatenate rows that must be in order- say for XML or json), but you can't.
So, what do you do?
The answer is to force SQL to order it by using TOP in your select (just pick a number high enough to cover all your rows).

I have run into the same issue and one reason I have needed to preserve the order is when I try to use ROLLUP to get a weighted average based on the raw data and not an average of what is in that column. For instance, say I want to see the average of profit based on number of units sold by four store locations? I can do this very easily by creating the equation Profit / #Units = Avg. Now I include a ROLLUP in my GROUP BY so that I can also see the average across all locations. Now I think to myself, "This is good info but I want to see it in order of Best Average to Worse and keep the Overall at the bottom (or top) of the list)." The ROLLUP will fail you in this so you take a different approach.
Why not create row numbers based on the sequence (order) you need to preserve?
SELECT OrderBy = ROW_NUMBER() OVER(PARTITION BY 'field you want to count' ORDER BY 'field(s) you want to use ORDER BY')
, VibeFGEvents.*
FROM VibeFGEvents
LEFT OUTER JOIN VibeFGEventsStudyStart
ON
CHARINDEX(REPLACE(REPLACE(REPLACE(logName, 'MyVibe ', ''), ' new laptop', ''), ' old laptop', ''), excelFilename) > 0
AND VibeFGEventsStudyStart.MIN_TitleInstID <= VibeFGEvents.TitleInstID
AND VibeFGEventsStudyStart.MIN_WinInstId <= VibeFGEvents.WndInstID
WHERE VibeFGEventsStudyStart.excelFilename IS NOT NULL
Now you can use the OrderBy field from your table to set the order of values. I removed the ORDER BY statement from the query above since it does not affect how the data is loaded to the table.

I found this approach helpful to solve this problem:
WITH ordered as
(
SELECT TOP 1000
[Month]
FROM SourceTable
GROUP BY [Month]
ORDER BY [Month]
)
INSERT INTO DestinationTable (MonthStart)
(
SELECT * from ordered
)

Try using INSERT INTO instead of SELECT INTO
INSERT INTO VibeFGEventsAfterStudyStart
SELECT VibeFGEvents.*
FROM VibeFGEvents
LEFT OUTER JOIN VibeFGEventsStudyStart
ON
CHARINDEX(REPLACE(REPLACE(REPLACE(logName, 'MyVibe ', ''), ' new laptop', ''), ' old laptop', ''), excelFilename) > 0
AND VibeFGEventsStudyStart.MIN_TitleInstID <= VibeFGEvents.TitleInstID
AND VibeFGEventsStudyStart.MIN_WinInstId <= VibeFGEvents.WndInstID
WHERE VibeFGEventsStudyStart.excelFilename IS NOT NULL
ORDER BY VibeFGEvents.id`

Removing characters from a alphanumeric field SQL

Im moving data from one table to another using insert into. in the select bit need to transfer from column with characters and numerical in to another with only the numerical. The original column is in varchar format.
original column -
ABC100
XYZ:200
DD2000
Wanted column
100
200
2000
Cant write a function because cant have a function in side select statement when inserting
Using MS SQL

I encourage you to read this:
Extracting Data
There is an example function that removes alpha characters from a string. This will be much faster than a bunch of replace statements.

You can probably do that with a regex replace. The syntax for this depends on your database software (which you haven't specified).
You should be able to do function calls in your SELECT statement, even when you're using it to INSERT INTO.

If your data is fixed-format I'd do something like
INSERT INTO SOME_TABLE(COLUMN1, COLUMN2, COLUMN3)
SELECT TO_NUMBER(SUBSTR(SOURCE_COLUMN, 4, 3)),
TO_NUMBER(SUBSTR(SOURCE_COLUMN, 12, 3)),
TO_NUMBER(SUBSTR(SOURCE_COLUMN, 18, 4))
FROM SOME_OTHER_TABLE
WHERE <conditions>;
The above code is for Oracle. Depending on the database you're using you may have to do things a bit differently.
I hope this helps.

You certainly can have a function inside a SELECT statement during an INSERT:
INSERT INTO CleanTable (CleanColumn)
SELECT dbo.udf_CleanString(DirtyColumn)
FROM DirtyTable
Your main problem is going to be getting the function right (the one the G Mastros linked to is pretty good) and getting it performing. If you're only talking thousands of rows, this should be fine. If you are talking about millions of rows, you might need a different strategy.

Writing a UDF is how I've solved this problem in the past. However, I got to thinking if there was a set-based solution. Here's what I have:
First my table which I used Red Gate's Data Generator to populate with a bunch of random alpha numeric values:
Create Table MixedValues (
Id int not null identity(1,1) Primary Key
, AlphaValue varchar(50)
)
Next I built a Tally table on the fly using a CTE but normally I have a fixed table for this. A Tally table is just a table of sequential numbers.
;With Tally As
(
Select ROW_NUMBER() OVER ( ORDER BY object_id ) As Num
From sys.columns
)
, IndividualChars As
(
Select MX.Id, Substring(MX.AlphaValue, Num, 1) As CharValue, Num
From Tally
Cross Join MixedValues As MX
Where Num Between 1 And Len(MX.AlphaValue)
)
Select MX.Id, MX.AlphaValue
, Replace(
(
Select '' + CharValue
From IndividualChars As IC
Where IC.Id = MX.Id
And PATINDEX('[ 0-9]', CharValue) > 0
Order By Num
For Xml Path('')
)
, ' ', ' ') As NewValue
From MixedValues As MX
From a top level, the idea here is to split the string into one row per individual character, test the type of pattern you want and then re-constitute it.
Note that my sys.columns table only contains 500 some odd rows. If you had strings larger than 500 characters, you could simply cross join sys.columns to itself and get 500^2 rows. In addition, For Xml Path returns a string with spaces escaped (note the space in my pattern index [ 0-9] which tells the system to include spaces.) so I use the replace function to reverse the escaping.
EDIT: Btw, this will only work in SQL 2005+ because of my use of the CTE. If you wanted a SQL 2000 solution, you would need to break up the CTE into separate table creation calls (e.g. Temp tables) but it could still be done.
EDIT: I added the Num column in the IndividualChars CTE and added an Order By to the NewValue query at the end. Although it probably will reconstitute the string in order, I wanted to ensure that it would by explicitly ordering the results.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

How can I check and remove duplicate rows? - sql-server

Related

Need to be able to check column by one by one and for a certain string but can't figure out a way to do so

How can this expression reach the NULL expression?

SQL Server 2005 SELECT TOP 1 from VIEW returns LAST row

Preserving ORDER BY in SELECT INTO

Removing characters from a alphanumeric field SQL

Categories

Resources