Duplicate all columns except for some of them - sql-server

Is there an option to duplicate row in sql server without write all the columns? I just want to write the columns that I want to insert it by myself.
for example:
MYTABLE:
Id|Name|Status|Date
-------------------
2|abca|active|03.10
so I can do:
INSERT INTO MYTABLE (Id, Name, Status, Date )
SELECT NEWID(), "bird", status, Date
FROM MYTABLE
WHERE Id = "2"
it will duplicate the row:
Id |Name|Status|Date
-------------------
2 |abca|active|03.10
fg35|bird|active|03.10
can't I duplicate all columns, except for what I write?
in this example:
something like (pseudo code):
duplicate all columns in MYTABLE where Id="2" except for (id="newID()", name="bird")

No, in SQL Server you either need to provide all values for all columns:
INSERT INTO TableName VALUES(value1, value2, ...)
or provide the columns you want to insert to
INSERT INTO TableName (column1, column2, ...) VALUES(value1, value2, ...)
You cannot achieve what you want in a general sense.
There may be a way to do it in some specific cases buy duplicating all columns and updating some but that will depend on your ability to distinguish the newly inserted row from the previously existing one.

Related

How to delete Duplicate records in snowflake database table

how to delete the duplicate records from snowflake table. Thanks
ID Name
1 Apple
1 Apple
2 Apple
3 Orange
3 Orange
Result should be:
ID Name
1 Apple
2 Apple
3 Orange
Adding here a solution that doesn't recreate the table. This because recreating a table can break a lot of existing configurations and history.
Instead we are going to delete only the duplicate rows and insert a single copy of each, within a transaction:
-- find all duplicates
create or replace transient table duplicate_holder as (
select $1, $2, $3
from some_table
group by 1,2,3
having count(*)>1
);
-- time to use a transaction to insert and delete
begin transaction;
-- delete duplicates
delete from some_table a
using duplicate_holder b
where (a.$1,a.$2,a.$3)=(b.$1,b.$2,b.$3);
-- insert single copy
insert into some_table
select *
from duplicate_holder;
-- we are done
commit;
Advantages:
Doesn't recreate the table
Doesn't modify the original table
Only deletes and inserts duplicated rows (good for time travel storage costs, avoids unnecessary reclustering)
All in a transaction
If you have some primary key as such:
CREATE TABLE fruit (key number, id number, name text);
insert into fruit values (1,1, 'Apple'), (2,1,'Apple'),
(3,2, 'Apple'), (4,3, 'Orange'), (5,3, 'Orange');
as then
DELETE FROM fruit
WHERE key in (
SELECT key
FROM (
SELECT key
,ROW_NUMBER() OVER (PARTITION BY id, name ORDER BY key) AS rn
FROM fruit
)
WHERE rn > 1
);
But if you do not have a unique key then you cannot delete that way. At which point a
CREATE TABLE new_table_name AS
SELECT id, name FROM (
SELECT id
,name
,ROW_NUMBER() OVER (PARTITION BY id, name) AS rn
FROM table_name
)
WHERE rn > 1
and then swap them
ALTER TABLE table_name SWAP WITH new_table_name
Here's a very simple approach that doesn't need any temporary tables. It will work very nicely for small tables, but might not be the best approach for large tables.
insert overwrite into some_table
select distinct * from some_table
;
The OVERWRITE keyword means that the table will be truncated before the insert takes place.
Snowflake does not have effective primary keys, their use is primarily with ERD tools.
Snowflake does not have something like a ROWID either, so there is no way to identify duplicates for deletion.
It is possible to temporarily add a "is_duplicate" column, eg. numbering all the duplicates with the ROW_NUMBER() function, and then delete all records with "is_duplicate" > 1 and finally delete the utility column.
Another way is to create a duplicate table and swap, as others have suggested.
However, constraints and grants must be kept. One way to do this is:
CREATE TABLE new_table LIKE old_table COPY GRANTS;
INSERT INTO new_table SELECT DISTINCT * FROM old_table;
ALTER TABLE old_table SWAP WITH new_table;
The code above removes exact duplicates. If you want to end up with a row for each "PK" you need to include logic to select which copy you want to keep.
This illustrates the importance to add update timestamp columns in a Snowflake Data Warehouse.
this has been bothering me for some time as well. As snowflake has added support for qualify you can now create a dedupped table with a single statement without subselects:
CREATE TABLE fruit (id number, nam text);
insert into fruit values (1, 'Apple'), (1,'Apple'),
(2, 'Apple'), (3, 'Orange'), (3, 'Orange');
CREATE OR REPLACE TABLE fruit AS
SELECT * FROM
fruit
qualify row_number() OVER (PARTITION BY id, nam ORDER BY id, nam) = 1;
SELECT * FROM fruit;
Of course you are left with a new table and loose table history, primary keys, foreign keys and such.
Based on above ideas.....following query worked perfectly in my case.
CREATE OR REPLACE TABLE SCHEMA.table
AS
SELECT
DISTINCT *
FROM
SCHEMA.table
;
Your question boils down to: How can I delete one of two perfectly identical rows? . You can't. You can only do a DELETE FROM fruit where ID = 1 and Name = 'Apple';, then both rows will go away. Or you don't, and keep both.
For some databases, there are workarounds using internal rows, but there isn't any in snowflake, see https://support.snowflake.net/s/question/0D50Z00008FQyGqSAL/is-there-an-internalmetadata-unique-rowid-in-snowflake-that-i-can-reference . You cannot limit deletes, either, so your only option is to create a new table and swap.
Additional Note on Hans Henrik Eriksen's remark on the importance of update timestamps: This is a real help when the duplicates where added later. If, for example, you want to keep the newer values, you can then do this:
-- setup
create table fruit (ID Integer, Name VARCHAR(16777216), "UPDATED_AT" TIMESTAMP_NTZ);
insert into fruit values (1, 'Apple', CURRENT_TIMESTAMP::timestamp_ntz)
, (2, 'Apple', CURRENT_TIMESTAMP::timestamp_ntz)
, (3, 'Orange', CURRENT_TIMESTAMP::timestamp_ntz);
-- wait > 1 nanosecond
insert into fruit values (1, 'Apple', CURRENT_TIMESTAMP::timestamp_ntz)
, (3, 'Orange', CURRENT_TIMESTAMP::timestamp_ntz);
-- delete older duplicates (DESC)
DELETE FROM fruit
WHERE (ID
, UPDATED_AT) IN (
SELECT ID
, UPDATED_AT
FROM (
SELECT ID
, UPDATED_AT
, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY UPDATED_AT DESC) AS rn
FROM fruit
)
WHERE rn > 1
);
simple UNION eliminate duplicates on use case of just all columns/no pks.
anyway problem should he solved as early on ingestion pipeline, and/or use scd etc.
Just a raw magic best way how to delete is wrong in principle, use scd with high resolution timestamp, solves any problem.
you want fix massive dups load ? then add column like batch id and remove all batch loaded records
Its like being healthy, you have 2 approaches:
eat a lot > get far > go-to a gym to burn it
eat well > have healthy life style and no need for gym.
So before discussing best gym, try change life style.
hope this helps, learn to do pressure upstream on data producers instead of living like jesus christ trying to clean up the mess of everyone.
The following solution is effective if you are looking at one or few columns as primary key references for the table.
-- Create a temp table to hold our duplicates (only second occurrence)
CREATE OR REPLACE TRANSIENT TABLE temp_table AS (
SELECT [col1], [col2], .. [coln]
FROM (
SELECT *, ROW_NUMBER () OVER(
PARTITION BY [pk]1, [pk]2, .. [pk]m
ORDER BY [pk]1, [pk]2, .. [pk]m) AS duplicate_count
FROM [schema].[table]
) WHERE duplicate_count = 2
);
-- Delete all the duplicate records from the table
DELETE FROM [schema].[table] t1
USING temp_table t2
WHERE
t1.[pk]1 = t2.[pk]1 AND
t1.[pk]2 = t2.[pk]2 AND
..
t1.[pk]n = t2.[pk]m;
-- Insert single copy using the temp_table in the original table
INSERT INTO [schema].[table]
SELECT *
FROM temp_table;
This is inspired by #Felipe Hoffa's answer:
##create table with dupes and take the max id
create or replace transient table duplicate_holder as (
select max(S.ID) ID, some_field, count(some_field) numberAssets
from some_table S
group by some_field
having count(some_field)>1
)
##join back to the original table on the field excluding the ID in the duplicate table and delete.
delete from some_table as t
USING duplicate_holder as d
WHERE t.some_field=d.some_field
and t.id <> d.id
Not sure if people are still interested in this but I've used the below query which is more elegant and seems to have worked
create or replace table {{your_table}} as
select * from {{your_table}}
qualify row_number() over (partition by {{criteria_columns}} order by 1) = 1

Export data from Microsoft SQL Server using a query to target data

I know how to generate scripts to script insert lines allowing me to backup some data. I was wondering though if it was possible to write a query (using WHERE clause as an example) to target a very small subset of data in a very large table?
In the end I want to generate a script that has a bunch of insert lines and will allow for inserting primary key values (where it normally would not let you).
SSMS will not let you to have the INSERT queries for specific rows in a table. You can do this by using GenerateInsert stored procedure. For example :
EXECUTE dbo.GenerateInsert #ObjectName = N'YourTableName'
,#SearchCondition='[ColumnName]=ColumnValue';
will give you similar result for the filtered rows specified in the #SearchCondition
Let's say your table name is Table1 which has columns Salary & Name and you want the insert queries for those who have salary greater than 1000 whose name starts with Mr., then you can use this :
EXECUTE dbo.GenerateInsert #ObjectName = N'Table1'
,#SearchCondition='[Salary]>1000 AND [Name] LIKE ''Mr.%'''
,#PopulateIdentityColumn=1;
If I read your requirement correctly, what you actually want to do is simply make a copy of some data in your table. I typically do this by using a SELECT INTO. This will also generate the target table for you.
CREATE TABLE myTable (Column1 int, column2 NVARCHAR(50))
;
INSERT INTO myTable VALUES (1, 'abc'), (2, 'bcd'), (3, 'cde'), (4, 'def')
;
SELECT * FROM myTable
;
SELECT
*
INTO myTable2
FROM myTable WHERE Column1 > 2
;
SELECT * FROM myTable;
SELECT * FROM myTable2;
DROP TABLE myTable;
DROP TABLE myTable2;
myTable will contain the following:
Column1 column2
1 abc
2 bcd
3 cde
4 def
myTable2 will only have the last 2 rows:
Column1 column2
3 cde
4 def
Edit: Just saw the bit about the Primary Key values. Does this mean you want to insert the data into an existing table, rather than just creating a backup set? If so, you can issue SET IDENTITY_INSERT myTable2 ON to allow for this.
However, be aware that might cause issues in case the id values you are trying to insert already exist.

MSSQL Insert Data From One Table To Another With Fixed Value

Hello I would like to insert in a table called Data multiple columns from another table called SourceTable and one colum that has a standar value for every row added in Data.
Assume that you have Column1 and Column2 in the table called SourceTable and source_id is precalculated and it will be the same for every row added into Data on this query.
INSERT INTO Data (Columns1, Column2, source_id)
SELECT Column1, Column2
FROM SourceTable
UNION SELECT 2;
I tried this one but is not working, most likely because the SELECT 2 returns only one row.
Your issue is that you're giving SQL 3 columns to insert 2 values into, if source_id is going to be 2 as your union selects then you'd want something like this;
INSERT INTO Data (Columns1, Column2, source_id)
SELECT Column1, Column2, 2
FROM SourceTable
The number of columns you're inserting needs to match the number of columns that you're inserting to. The way you were doing it would have produced this result;
Column1 Column2 source_id
Value1 Value2
2
but even the union would have failed as the queries that you're unioning need to have the same number of columns.

How to use INSERT SELECT?

I have a table's structure:
[Subjects]:
id int Identity Specification yes
Deleted bit
[Juridical]:
id int
Name varchar
typeid int
[Individual]:
id int
Name varchar
Juridical and Individual it's a children classes of Subjects class. So it's mean that same rows in tables Individual and Subjects have a same id.
Now I have a table:
[MyTable]:
typeid varchar
Name varchar
And I want to select data from this table and insert it into my table structure. But I don't know what to do. I tried to use OUTPUT:
INSERT INTO [Individual](Name)
OUTPUT false
INTO [Subjects].[Deleted]
SELECT [MyTable].[Name] as Name
FROM [MyTable]
WHERE [MyTable].[type] = 'Indv'
But the syntax is not correct.
Just use:
INSERT INTO Individual(Name)
SELECT [MyTable].[Name] as Name
FROM [MyTable]
WHERE [MyTable].[type] = 'Indv'
and
INSERT INTO Subjects(Deleted)
SELECT [MyTable].[Name] as Name
FROM [MyTable]
WHERE [MyTable].[type] = 'Indv'
You can't insert in a single query in two tables, you need two separate queries for that. For that reason I split your initial query into two INSERT statements, to add records to both your Individual and Subjects table.
Just as #marc_s said, you must select the exact number of columns in your SELECT statement with the number of columns you want to insert data into your tables.
Other than these two constraints, which are both related to syntax, you are fully allowed to do any filtering in the SELECT part or make any complex logic as you would do in a normal SELECT query.
You need to use this syntax:
INSERT INTO [Individual] (Name)
SELECT [MyTable].[Name]
FROM [MyTable]
WHERE [MyTable].[type] = 'Indv'
You should define the list of column to insert into in the INSERT INTO line, and then you must have a SELECT that returns exactly that many columns as you need (and the column types need to match, too)

SQL Server query - how to insert selected values into another table

Here's what I am trying to do.
This is the select statement
select Id
from tblUsersPokemons
where UserId = 1 and PokemonPlace = 'bag'
Now I want to insert those returned Ids into another table like this:
foreach all returned Ids
insert into tblNpcBattleUsersPokemons values (Id, 1)
How can I do that ?
Like this:
insert into tblNpcBattleUsersPokemons
select Id, 1 from tblUsersPokemons where UserId=1 and PokemonPlace='bag'
This can be done in a single sql call
insert into tblNpcBattleUsersPokemons (id, [whatever the name of the other column is])
select Id, 1 from tblUsersPokemons where UserId=1 and PokemonPlace='bag'
I've used the longhand here because it does not make any assumption about the ordering of columns in the destination table, which can change over time and invalidate your insert statement.
You can insert the set retrieved by a SELECT into another existing table using the INSERT...SELECT syntax.
For example:
INSERT INTO tblNpcBattleUsersPokemons (Id, Val) -- not sure what the second column name was
SELECT Id FROM tblUsersPokemons WHERE UserID = 1 and PokemonPlace='bag';

Resources