Ignore/Remove duplicate rows in a column - sql-server

I have a SQL table:
Table code:
CREATE TABLE Gender
(
GenderID int primary key identity,
Gender char(20)
)
I would like to ignore or remove duplicate rows in Gender, whilst maintaining the auto incrementation of GenderID (specified in my create table code), so that it results in:
----------------
| 1 | Male |
----------------
| 2 | Female |
----------------
My attempt:
DELETE
FROM Gender
WHERE GenderID NOT IN (
SELECT MIN(GenderID)
FROM Gender
GROUP BY Gender)
Returns: (result screenshot not preserved)

You can create a new table and load the values as given below.
DECLARE @datasource TABLE(id int identity(1,1), gender CHAR(10))
INSERT INTO @datasource(gender)
SELECT * FROM
(
VALUES
('male'),
('male'),
('male'),
('female'),
('female'),
('female')
) as t(gender)
-- One row per distinct gender, renumbered from 1, written to a new table
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT null)) as id, gender
INTO NewTableName
FROM @datasource
group by gender
Result Set
+----+-----------+
| id | gender |
+----+-----------+
| 1 | female |
| 2 | male |
+----+-----------+

I suggest you drop and re-create the table.
Add a unique index with IGNORE_DUP_KEY=ON to prevent duplicates.
See also: Can I set ignore_dup_key on for a primary key?
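A minimal sketch of that suggestion, assuming the Gender table from the question has been re-created (or already de-duplicated, since a unique index cannot be created over existing duplicates). The index name UX_Gender_Gender is made up for illustration.

```sql
-- Hypothetical index name; duplicate inserts are discarded instead of raising an error
CREATE UNIQUE NONCLUSTERED INDEX UX_Gender_Gender
    ON Gender (Gender)
    WITH (IGNORE_DUP_KEY = ON);

INSERT INTO Gender (Gender) VALUES ('Male'), ('Female');

-- This row is skipped with a "Duplicate key was ignored." warning
INSERT INTO Gender (Gender) VALUES ('Male');

SELECT GenderID, Gender FROM Gender ORDER BY GenderID;
```

Note that discarded rows may still consume identity values, so GenderID can end up with gaps after ignored inserts.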


SQL Server: how to create a composite unique key when one of the columns is nullable

I am not an expert with indexing.
I would like to create a composite unique key constraint. How do I create it if one of the columns is nullable?
CREATE UNIQUE CLUSTERED INDEX [IX_User_Email_PeronsId] ON [dbo].[User]
(
[Email] ASC,
[PersonId] ASC
)
GO
PersonId is a nullable column.
In fact, you can create a unique clustered index with nullable columns. I just tried it:
USE tempdb;
CREATE TABLE dbo.[User]
(
Email nvarchar(50) NOT NULL,
PersonID int NULL
);
CREATE UNIQUE CLUSTERED INDEX [IX_User_Email_PersonID]
ON dbo.[User]
(
Email,
PersonID
);
Commands completed successfully.
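A small sketch of how that unfiltered unique index behaves with NULLs, assuming the dbo.[User] table and index created just above. SQL Server treats NULL as an ordinary key value in a unique index, so only one row per Email may have a NULL PersonID. The email address is a made-up test value.

```sql
-- Succeeds: first (Email, NULL) combination
INSERT INTO dbo.[User] (Email, PersonID)
VALUES (N'x@mydomain.com', NULL);

-- Fails with a duplicate key error: the same (Email, NULL) pair already exists
INSERT INTO dbo.[User] (Email, PersonID)
VALUES (N'x@mydomain.com', NULL);
```

This is the behavior that the filtered index avoids by leaving rows with a NULL PersonID out of the index entirely.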
You didn't mention what exactly you are trying to achieve, so let me guess: I think you want the combination of Email and PersonID to be unique, except for rows where PersonID is NULL.
In this case a clustered index is not useful, but you can use a filtered nonclustered index:
USE tempdb;
-- Create the test table
CREATE TABLE dbo.[User]
(
Email nvarchar(50) NOT NULL,
PersonID int NULL
);
-- Create a filtered unique index
CREATE UNIQUE NONCLUSTERED INDEX [IX_User_Email_PersonID]
ON dbo.[User]
(
Email,
PersonID
)
WHERE PersonID IS NOT NULL;
-- Insert test data
INSERT INTO dbo.[User]
(
Email,
PersonId
)
VALUES
( N'a@mydomain.com', 1 ),
( N'b@mydomain.com', 2 ),
( N'b@mydomain.com', 3 ),
( N'c@mydomain.com', 3 ),
( N'c@mydomain.com', 4 ),
( N'd@mydomain.com', NULL ),
( N'e@mydomain.com', NULL ),
( N'f@mydomain.com', NULL );
Test which rows can be inserted:
-- Works
INSERT INTO dbo.[User] ( Email, PersonId )
VALUES ( N'c@mydomain.com', 5 );
-- Fails (same Email and PersonId as the previous insert)
INSERT INTO dbo.[User] ( Email, PersonId )
VALUES ( N'c@mydomain.com', 5 );
-- Works
INSERT INTO dbo.[User] ( Email, PersonId )
VALUES ( N'f@mydomain.com', NULL );
-- Works
INSERT INTO dbo.[User] ( Email, PersonId )
VALUES ( N'f@mydomain.com', NULL );
Content of the table after step-by-step execution:
| Email | PersonID |
| ----------------- | -------- |
| a@mydomain.com | 1 |
| b@mydomain.com | 2 |
| b@mydomain.com | 3 |
| c@mydomain.com | 3 |
| c@mydomain.com | 4 |
| d@mydomain.com | NULL |
| e@mydomain.com | NULL |
| f@mydomain.com | NULL |
| c@mydomain.com | 5 |
| f@mydomain.com | NULL |
| f@mydomain.com | NULL |

Query for multiple many-to-many relations

I am trying to learn normalization on a SQL Server 15.x database. My initial table would look like this:
| TeacherId(PK) | FirstName | LastName | Course | GroupCode |
| 1 | Smith | Jane | AAA,BBB | A1,A2,B2 |
| 2 | Smith | John | BBB,CCC | A2,B1,B2 |
After normalization I ended up with three tables
Teachers:
| TeacherId(PK) | FirstName | LastName |
| 1 | Smith | Jane |
| 2 | Smith | John |
Courses:
| Course(PK)(FK) |
| AAA |
| BBB |
| CCC |
GroupCodes:
| GroupCode(PK)(FK) |
| A1 |
| A2 |
| B1 |
| B2 |
and two joining tables
TeacherCourse:
| TeacherId(PK) | Course(PK) |
| 1 | AAA |
| 1 | BBB |
| 2 | BBB |
| 2 | CCC |
TeacherGroup:
| TeacherId(PK) | GroupCode(PK) |
| 1 | A1 |
| 1 | A2 |
| 1 | B2 |
| 2 | A2 |
| 2 | B1 |
| 2 | B2 |
The code for these tables is:
CREATE TABLE [dbo].[Teachers]
(
[TeacherId] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
[FirstName] [nchar](10) NOT NULL,
[LastName] [nchar](10) NOT NULL
)
GO
CREATE TABLE [dbo].[Courses]
(
[Course] [nchar](10) NOT NULL PRIMARY KEY
)
GO
CREATE TABLE [dbo].[GroupCodes]
(
[GroupCode] [nchar](10) NOT NULL PRIMARY KEY
)
GO
CREATE TABLE [dbo].[TeacherCourse]
(
[TeacherId] [int] NOT NULL,
[Course] [nchar](10) NOT NULL,
PRIMARY KEY (TeacherId, Course),
CONSTRAINT FK_TeacherCourses FOREIGN KEY (TeacherId)
REFERENCES Teachers(TeacherId),
CONSTRAINT FK_TeachersCourse FOREIGN KEY (Course)
REFERENCES Courses(Course)
)
GO
CREATE TABLE [dbo].[TeacherGroup]
(
[TeacherId] [int] NOT NULL,
[GroupCode] [nchar](10) NOT NULL,
PRIMARY KEY (TeacherId, GroupCode),
CONSTRAINT FK_TeacherGroups FOREIGN KEY (TeacherId)
REFERENCES Teachers(TeacherId),
CONSTRAINT FK_TeachersGroup FOREIGN KEY (GroupCode)
REFERENCES GroupCodes(GroupCode)
)
GO
INSERT INTO Teachers(FirstName,LastName)
VALUES ('Smith','Jane'),('Smith','John')
GO
INSERT INTO Courses(Course)
VALUES ('AAA'),('BBB'),('CCC')
GO
INSERT INTO GroupCodes(GroupCode)
VALUES ('A1'),('A2'),('B1'),('B2')
GO
INSERT INTO TeacherCourse(TeacherId,Course)
VALUES ('1','AAA'),('1','BBB'),('2','BBB'),('2','CCC')
GO
INSERT INTO TeacherGroup(TeacherId,GroupCode)
VALUES ('1','A1'),('1','A2'),('1','B2'),('2','A2'),('2','B1'),('2','B2')
GO
I need to come up with a query that returns each teacher entry in the same form as my initial table:
| 1 | Smith | Jane | AAA,BBB | A1,A2,B2 |
So, my question is: how do I make the two joins?
I have tried
SELECT t.TeacherId AS TeacherId, t.FName AS FirstName, t.LName AS LastName,
c.Course AS Course, g.GroupCode AS GroupCode
FROM TeacherCourse tc, TeacherGroup tg
JOIN Teachers t ON tc.TeacherId=t.TeacherId
JOIN Courses c ON tc.Course=c.Course
JOIN Teachers t ON tg.TeacherId=t.TeacherId
JOIN GroupCodes g ON tg.GroupCode=g.GroupCode
ORDER BY TeacherId
with no success.
Thank you.
EDIT:
I managed to sort out the concatenation using the STUFF ... FOR XML PATH approach.
In case this helps anyone, the select code I used is:
SELECT t.TeacherId AS TeacherId, t.FirstName AS FirstName, t.LastName AS LastName,
(SELECT STUFF((SELECT DISTINCT ', ' + LTRIM(RTRIM(tc.Course)) FROM TeacherCourse tc
INNER JOIN Courses c ON tc.Course = c.Course WHERE tc.TeacherId = t.TeacherId
FOR XML PATH('')),1,2,(''))) AS Courses,
(SELECT STUFF((SELECT DISTINCT ', ' + LTRIM(RTRIM(tg.GroupCode)) FROM TeacherGroup tg
INNER JOIN GroupCodes g ON tg.GroupCode = g.GroupCode WHERE tg.TeacherId = t.TeacherId
FOR XML PATH('')),1,2,(''))) AS Group_Codes
FROM TeacherCourse tc
JOIN TeacherGroup tg ON
tc.TeacherId = tg.TeacherId
JOIN Teachers t ON
tc.TeacherId=t.TeacherId AND tg.TeacherId=t.TeacherId
JOIN Courses c ON
tc.Course=c.Course
JOIN GroupCodes g ON
tg.GroupCode=g.GroupCode
/* A WHERE clause can be inserted here for specific results. Example:
WHERE T.LastName = 'Jane'
*/
GROUP BY t.TeacherId, t.FirstName, t.LastName
ORDER BY TeacherId
Thank you very much for all your help.
Your normalization scheme seems to be incorrect.
You created three tables (teacher, course, group), which is correct.
Next, create a table that relates course and group. This course_group table must be treated as one more entity: it stores which course each group studies.
Then create a table that relates teacher and course_group. It records which teacher teaches which course to which group.
I would also create one more table relating teacher and course, storing which courses each teacher may teach. This is a pattern table, not an entity table, and is used for client-side consistency checking: if a teacher is assigned to a course that is not present in this table, the operator receives a warning that the course is not registered for this teacher, but may still make the assignment.
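The structure described in this answer could be sketched roughly as follows, assuming the Teachers, Courses, and GroupCodes tables from the question already exist. The table names CourseGroup and TeacherCourseGroup are hypothetical, and the question's existing TeacherCourse table could play the role of the "may teach" pattern table.

```sql
-- Which course each group studies (treated as an entity of its own)
CREATE TABLE dbo.CourseGroup
(
    Course    nchar(10) NOT NULL REFERENCES dbo.Courses (Course),
    GroupCode nchar(10) NOT NULL REFERENCES dbo.GroupCodes (GroupCode),
    PRIMARY KEY (Course, GroupCode)
);
GO
-- Which teacher teaches a given course to a given group
CREATE TABLE dbo.TeacherCourseGroup
(
    TeacherId int       NOT NULL REFERENCES dbo.Teachers (TeacherId),
    Course    nchar(10) NOT NULL,
    GroupCode nchar(10) NOT NULL,
    PRIMARY KEY (TeacherId, Course, GroupCode),
    -- Composite key ensures only existing course/group pairs can be assigned
    FOREIGN KEY (Course, GroupCode)
        REFERENCES dbo.CourseGroup (Course, GroupCode)
);
GO
```

The composite foreign key is one way to keep assignments consistent with course_group; whether the "may teach" check is enforced or only warned about is left to the client, as the answer suggests.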
Solving your problem takes some cleanup, so let's go step by step.
I re-created your tables below and, at the end, changed your query in a few critical places.
Please keep in mind that this is MySQL, not MS SQL Server.
CREATE TABLE Teachers (
TeacherId int NOT NULL AUTO_INCREMENT,
FirstName char(10) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
LastName char(10) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
PRIMARY KEY (TeacherId)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
CREATE TABLE Courses (
Course char(10) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
PRIMARY KEY (Course)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
CREATE TABLE GroupCodes (
GroupCode char(10) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
PRIMARY KEY (GroupCode)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
CREATE TABLE TeacherGroup (
TeacherId int NOT NULL,
GroupCode char(10) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
PRIMARY KEY (TeacherId,GroupCode),
KEY FK_TeachersGroup (GroupCode),
CONSTRAINT FK_TeacherGroups FOREIGN KEY (TeacherId) REFERENCES Teachers (TeacherId),
CONSTRAINT FK_TeachersGroup FOREIGN KEY (GroupCode) REFERENCES GroupCodes (GroupCode)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
CREATE TABLE TeacherCourse (
TeacherId int NOT NULL,
Course char(10) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
PRIMARY KEY (TeacherId,Course),
KEY FK_TeachersCourse (Course),
CONSTRAINT FK_TeacherCourses FOREIGN KEY (TeacherId) REFERENCES Teachers (TeacherId),
CONSTRAINT FK_TeachersCourse FOREIGN KEY (Course) REFERENCES Courses (Course)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
End result:
Here I corrected some joins and added a GROUP BY to summarize the result set by TeacherId, FirstName, and LastName.
In the select list I aggregated the group codes and courses with the GROUP_CONCAT() function, which does the job well.
SELECT t.TeacherId AS TeacherId, t.FirstName AS FirstName, t.LastName AS LastName,
GROUP_CONCAT(DISTINCT c.Course separator ', ') as courses,
GROUP_CONCAT(DISTINCT g.GroupCode separator ', ') as group_codes
FROM TeacherCourse tc
JOIN TeacherGroup tg ON
tc.TeacherId = tg.TeacherId
JOIN Teachers t ON
tc.TeacherId=t.TeacherId AND tg.TeacherId=t.TeacherId
JOIN Courses c ON
tc.Course=c.Course
JOIN GroupCodes g ON
tg.GroupCode=g.GroupCode
GROUP BY TeacherId, FirstName, LastName
ORDER BY TeacherId;
The result matches your initial dataset (screenshot not preserved).
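GROUP_CONCAT() is MySQL-specific. Since the question targets SQL Server 15.x, a comparable sketch there could use STRING_AGG (available from SQL Server 2017), assuming the tables and sample data from the question. Correlated subqueries are used here so that courses and group codes are aggregated independently and no DISTINCT is needed.

```sql
SELECT t.TeacherId,
       RTRIM(t.FirstName) AS FirstName,
       RTRIM(t.LastName)  AS LastName,
       -- nchar(10) values are padded, so trim before concatenating
       (SELECT STRING_AGG(RTRIM(tc.Course), ',') WITHIN GROUP (ORDER BY tc.Course)
        FROM dbo.TeacherCourse AS tc
        WHERE tc.TeacherId = t.TeacherId) AS Courses,
       (SELECT STRING_AGG(RTRIM(tg.GroupCode), ',') WITHIN GROUP (ORDER BY tg.GroupCode)
        FROM dbo.TeacherGroup AS tg
        WHERE tg.TeacherId = t.TeacherId) AS GroupCodes
FROM dbo.Teachers AS t
ORDER BY t.TeacherId;
```

Teachers without any course or group rows would still be returned, with NULL in the corresponding column.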

Oracle 11g Insert into from another table that has duplicates

Ok so I have 2 new tables: Client and Contract. I'm gonna focus on the first one as they have the same structure. Client looks like:
+-----------+---------+
| client_id | name |
+-----------+---------+
| Value 1 | Value 2 |
+-----------+---------+
And created like this:
CREATE TABLE Client (
client_id varchar2(15) NOT NULL,
name varchar2(100) NOT NULL,
CONSTRAINT Client_pk PRIMARY KEY (client_id)
) ;
Also I have an old table: old_contracts looking like:
+------------+----------+------+
| contractid | clientid | name |
+------------+----------+------+
| con1 | cli1 | n1 |
| con2 | cli2 | n2 |
| con3 | cli2 | n2 |
| con4 | cli3 | n3 |
| con5 | cli3 | n3 |
+------------+----------+------+
Defined:
CREATE TABLE old_contracts(
contractid varchar2(15) NOT NULL,
clientid varchar2(15) NOT NULL,
name varchar2(100) NOT NULL
) ;
I want to take the data from old_contracts and insert it into Client.
This old_contracts table has rows with duplicate clientid (one client can have more than one contract) but I don't want to have duplicates on Client table so I am doing this:
INSERT INTO Client (
client_id,
name
) SELECT DISTINCT
clientid,
name
FROM old_contracts;
to not get duplicates. Anyway, I'm getting this error:
Error SQL: ORA-00001: unique constraint (USER.CLIENT_PK) violated
00001.00000 - "unique constraint (%s.%s) violated"
What's going on? I believe the DISTINCT keyword was going to do the thing.
I've also tried adding a WHERE NOT EXISTS clause as suggested in related posts (i.e. this one), but the result I'm getting is the same error.
Most likely, the name is not always the same for a given clientid.
Try this instead:
INSERT INTO Client (
client_id,
name
) SELECT clientid,
max(name)
FROM old_contracts
GROUP BY clientid;
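To check whether that is really the cause, a quick diagnostic along these lines lists every clientid that appears with more than one distinct name in old_contracts (the alias name_count is just illustrative):

```sql
-- Each row returned is a clientid that SELECT DISTINCT clientid, name
-- would emit more than once, violating the primary key on Client
SELECT clientid,
       COUNT(DISTINCT name) AS name_count
FROM old_contracts
GROUP BY clientid
HAVING COUNT(DISTINCT name) > 1;
```

If this returns rows, differences such as trailing spaces or letter case in name are worth checking before deciding which value to keep.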

SQL Server split SELECT XML column as arbitrary individual columns

In my application, I have few pre-defined fields for an object and user can define custom fields. I am using XML data type to store the custom fields in a name value format.
e.g. I have Employees table that has FN, LN, Email as pre-defined columns and CustomFields as XML column to hold the user defined fields.
And different rows can contain different custom fields.
e.g. Row 1 -> John, Smith, jsmith@example.com,
<root>
<phone>123-123-1234</phone>
<country>USA</country>
</root>
and then Row 2 -> Smith, John, sjohn@example.com,
<root>
<age>50</age>
<sex>Male</sex>
</root>
And there can be any number of such custom fields defined for different employee records. The format will always be the same
<root><field>value</field></root>
How can I return Phone and Country as columns while selecting Row1 and return Age and Sex as columns while selecting Row2?
Take this temp table for all examples
CREATE TABLE #tbl (ID INT IDENTITY, FirstName VARCHAR(100),LastName VARCHAR(100),eMail VARCHAR(100),CustomFields XML);
INSERT INTO #tbl VALUES
('John','Smith','john.smith@test.com'
,'<root>
<phone>123-123-1234</phone>
<country>USA</country>
</root>')
, ('Jane','Miller','jane.miller@test.com'
,'<root>
<age>50</age>
<sex>Male</sex>
</root>');
Option 1
Assuming that there is a fixed, known set of custom fields.
This allows type-safe reading (age as INT).
All possible columns are returned; unused ones are NULL.
Try this code:
SELECT tbl.ID
,tbl.FirstName
,tbl.LastName
,tbl.eMail
,tbl.CustomFields.value('(/root/phone)[1]','nvarchar(max)') AS phone
,tbl.CustomFields.value('(/root/country)[1]','nvarchar(max)') AS country
,tbl.CustomFields.value('(/root/age)[1]','int') AS age
,tbl.CustomFields.value('(/root/sex)[1]','nvarchar(max)') AS sex
FROM #tbl AS tbl
This is the result
+----+-----------+----------+----------------------+--------------+---------+------+------+
| ID | FirstName | LastName | eMail | phone | country | age | sex |
+----+-----------+----------+----------------------+--------------+---------+------+------+
| 1 | John | Smith | john.smith@test.com | 123-123-1234 | USA | NULL | NULL |
+----+-----------+----------+----------------------+--------------+---------+------+------+
| 2 | Jane | Miller | jane.miller@test.com | NULL | NULL | 50 | Male |
+----+-----------+----------+----------------------+--------------+---------+------+------+
Option 2
Assuming you do not know the field names in advance, you cannot name the output columns directly.
But you can use generic names, read the data row-wise, and PIVOT it.
Try this:
SELECT p.*
FROM
(
SELECT tbl.FirstName
,tbl.LastName
,tbl.eMail
,N'Col_' + CAST(ROW_NUMBER() OVER(PARTITION BY tbl.ID ORDER BY (SELECT NULL)) AS NVARCHAR(max)) AS ColumnName
,A.cf.value('local-name(.)','nvarchar(max)') + ':' + A.cf.value('.','nvarchar(max)') AS cf
FROM #tbl AS tbl
CROSS APPLY tbl.CustomFields.nodes('/root/*') AS A(cf)
) AS x
PIVOT
(
MAX(cf) FOR ColumnName IN(Col_1,Col_2,Col_3,Col_4 /*add as many as you need*/)
) AS p
This is the result
+-----------+----------+----------------------+--------------------+-------------+-------+-------+
| FirstName | LastName | eMail | Col_1 | Col_2 | Col_3 | Col_4 |
+-----------+----------+----------------------+--------------------+-------------+-------+-------+
| Jane | Miller | jane.miller@test.com | age:50 | sex:Male | NULL | NULL |
+-----------+----------+----------------------+--------------------+-------------+-------+-------+
| John | Smith | john.smith@test.com | phone:123-123-1234 | country:USA | NULL | NULL |
+-----------+----------+----------------------+--------------------+-------------+-------+-------+
Option 3
Assuming you do not know the columns, but you need the columns correctly named.
Attention: be aware that such an approach is never allowed in ad-hoc SQL such as a VIEW or an inline TVF, which might be a major drawback.
This needs dynamic creation of a statement. I will create the statement of Option 1 but replace the fixed list with a dynamically created list:
-- Build one ".value()" column expression per custom field found in the data
DECLARE @DynamicColumns NVARCHAR(MAX)=
(
SELECT ',tbl.CustomFields.value(''(/root/' + A.cf.value('local-name(.)','nvarchar(max)') + ')[1]'',''nvarchar(max)'') AS ' + A.cf.value('local-name(.)','nvarchar(max)')
FROM #tbl AS tbl
CROSS APPLY tbl.CustomFields.nodes('/root/*') AS A(cf)
FOR XML PATH('')
);
-- Splice the generated column list into the Option 1 statement and run it
DECLARE @DynamicSQL NVARCHAR(MAX)=
' SELECT tbl.ID
,tbl.FirstName
,tbl.LastName
,tbl.eMail'
+ @DynamicColumns +
' FROM #tbl AS tbl;'
EXEC(@DynamicSQL);
The result would be the same as in Option 1, but with a completely dynamic approach.
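One caveat with the dynamic list: if two rows share a field name, that column would be generated twice. A possible variation, sketched under the assumption that #tbl still exists, collects the distinct field names first and wraps the aliases in QUOTENAME. The variable names @Cols and @Sql are illustrative only, and the column order follows the derived table rather than document order.

```sql
DECLARE @Cols NVARCHAR(MAX) =
(
    SELECT ',tbl.CustomFields.value(''(/root/' + f.FieldName
           + ')[1]'',''nvarchar(max)'') AS ' + QUOTENAME(f.FieldName)
    FROM (
        -- one entry per field name, however many rows use it
        SELECT DISTINCT A.cf.value('local-name(.)', 'nvarchar(128)') AS FieldName
        FROM #tbl AS tbl
        CROSS APPLY tbl.CustomFields.nodes('/root/*') AS A(cf)
    ) AS f
    FOR XML PATH('')
);

DECLARE @Sql NVARCHAR(MAX) =
    N'SELECT tbl.ID, tbl.FirstName, tbl.LastName, tbl.eMail'
    + ISNULL(@Cols, N'')   -- no custom fields at all: fall back to the fixed columns
    + N' FROM #tbl AS tbl;';

EXEC sys.sp_executesql @Sql;
```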
Cleanup
DROP TABLE #tbl;

Need help in a reverse pivot - Column names become data and then the values in that column

I am looking to pull data from a table and insert the results into a #temp table where the column name is part of the result set. I know I can get the column names from the schema information table, but I need the data in one of the columns. There will be only one row from the original table, so I am basically doing a reverse STUFF command or reverse pivot. The result set would be columnName and Value, but in multiple rows: as many rows as there are columns.
So basically the result set or table will have just two columns, one for the column name and one for the value in that column. That is my goal. I know a pivot does this in reverse, but I can't seem to find a "reverse pivot". I am using SQL Server 2008.
Any help would be appreciated. Thanks!
Are you able to give a better description of what you're after? For example, more information on the table structures, etc.
Regardless, please see below an example of using a CROSS APPLY statement to transform a 'pivot table' into a flat table.
Data within the pivot table
+----+-----------+----------+----------------+
| Id | FirstName | LastName | Company |
+----+-----------+----------+----------------+
| 1 | Joe | Bloggs | A Company |
| 2 | Jane | Doe | Lost and Found |
+----+-----------+----------+----------------+
SQL statement to turn pivot table to flat table
IF OBJECT_ID('PivotedTable', 'U') IS NOT NULL
DROP TABLE PivotedTable
GO
CREATE TABLE PivotedTable (
Id INT IDENTITY,
FirstName VARCHAR(255),
LastName VARCHAR(255),
Company VARCHAR(255)
)
INSERT PivotedTable (FirstName, LastName, Company)
VALUES ('Joe', 'Bloggs', 'A Company'), ('Jane', 'Doe', 'Lost and Found')
SELECT
FlatTable.ColumnName,
FlatTable.Value
FROM PivotedTable t
CROSS APPLY (
VALUES
('FirstName', FirstName),
('LastName', LastName),
('Company', Company)
) FlatTable (ColumnName, Value)
Output of the query after turning into a flat table
+------------+----------------+
| ColumnName | Value |
+------------+----------------+
| FirstName | Joe |
| LastName | Bloggs |
| Company | A Company |
| FirstName | Jane |
| LastName | Doe |
| Company | Lost and Found |
+------------+----------------+
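SQL Server also has a built-in UNPIVOT operator (available since SQL Server 2005) that does the "reverse pivot" directly. A sketch against the same PivotedTable, assuming all unpivoted columns share one data type (they are all VARCHAR(255) here):

```sql
SELECT u.ColumnName,
       u.Value
FROM (
    -- select only the columns to turn into rows
    SELECT FirstName, LastName, Company
    FROM PivotedTable
) AS src
UNPIVOT (
    Value FOR ColumnName IN (FirstName, LastName, Company)
) AS u;
```

One difference worth knowing: UNPIVOT silently drops rows whose value is NULL, whereas the CROSS APPLY (VALUES ...) form keeps them, so the CROSS APPLY version is the safer choice when every column must appear in the output.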
