Computing a column based on comparisons of other columns in table - sql-server

OK, I am working on a database table with 4 columns, let's say first name, middle name, last name and group ID. I want to group people who have the same first and last names, regardless of their middle name. I also want any new entry that comes in to be given the correct group ID.
Here is an example of the result:
----------------------------------------------------------
| First_Name | Middle_Name | Last_Name | Group_ID |
----------------------------------------------------------
| Jon | Jacob | Schmidt | 1 |
----------------------------------------------------------
| William | B. | Schmidt | 1 |
----------------------------------------------------------
| Sally | Anne | Johnson | 2 |
----------------------------------------------------------
I'm not sure whether this falls under the jurisdiction of a computed column, some kind of join, or something far less obscure. Please help!

If you only need to enumerate the groups within a query then row_number() will work for you:
declare @Names table ( First_Name varchar(10), Middle_Name varchar(10), Last_Name varchar(10))
insert into @Names
select 'Jon', 'Jacob', 'Schmidt' union all
select 'William', 'B.', 'Schmidt' union all
select 'Sally', 'Anne', 'Johnson' union all
select 'Jon', 'Two', 'Schmidt'
;with Groups (First_Name, Last_Name, Group_ID) as
( select First_Name, Last_Name, row_number() over (order by Last_Name, First_Name)
  from @Names
  group by First_Name, Last_Name
)
select n.First_Name, n.Middle_Name, n.Last_Name, g.Group_ID
from @Names n
join Groups g on
     n.First_Name = g.First_Name and
     n.Last_Name = g.Last_Name;
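If you prefer a single query, DENSE_RANK() gives every row with the same (First_Name, Last_Name) pair the same number, so the CTE and join are not strictly needed. This is a sketch on the same sample rows. Ordering by Last_Name, First_Name is only an assumption that keeps the numbering deterministic, and the @Grouped table variable exists just to hold the result.

```sql
-- Same sample data as above, held in a table variable.
DECLARE @Names TABLE (First_Name varchar(10), Middle_Name varchar(10), Last_Name varchar(10));
INSERT INTO @Names VALUES
    ('Jon', 'Jacob', 'Schmidt'),
    ('William', 'B.', 'Schmidt'),
    ('Sally', 'Anne', 'Johnson'),
    ('Jon', 'Two', 'Schmidt');

-- Rows sharing a (First_Name, Last_Name) pair get the same rank, so the rank
-- works as a group number; Middle_Name plays no part in it.
DECLARE @Grouped TABLE (First_Name varchar(10), Middle_Name varchar(10),
                        Last_Name varchar(10), Group_ID int);
INSERT INTO @Grouped
SELECT First_Name, Middle_Name, Last_Name,
       DENSE_RANK() OVER (ORDER BY Last_Name, First_Name) AS Group_ID
FROM @Names;

SELECT * FROM @Grouped ORDER BY Group_ID;
```

Like row_number() in the CTE, these numbers are recomputed on every run.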
Be aware that the Group_ID values will change as new name groups are introduced.
If you want to assign and persist a Group_ID, I would suggest creating an ancillary table and assigning the Group_IDs there.
By storing the mapping outside of the @Names table, you allow users to change their names without having to worry about re-evaluating the group assignment. It also lets you modify the grouping logic without re-assigning names, and you can map similar-enough values to the same group (John, Jon, Jonny).
A Group_ID is defined by a First_Name and Last_Name pair, so store it that way.
declare @Names table ( First_Name varchar(10), Middle_Name varchar(10), Last_Name varchar(10))
insert into @Names
select 'Jon', 'Jacob', 'Schmidt' union all
select 'William', 'B.', 'Schmidt' union all
select 'Sally', 'Anne', 'Johnson' union all
select 'Jon', 'Two', 'Schmidt'
declare @NameGroup table (Group_ID int identity(1,1), First_Name varchar(10), Last_Name varchar(10), unique (First_Name, Last_Name));
insert into @NameGroup (First_Name, Last_Name)
select 'Jon', 'Schmidt' union all
select 'Sally', 'Johnson';
declare @Group_ID int;
declare @First_Name varchar(10),
        @Middle_Name varchar(10),
        @Last_Name varchar(10);
select @First_Name = 'Jon',
       @Middle_Name = 'X',
       @Last_Name = 'Schmidt';
--make sure a Group_ID has already been assigned to this (First_Name, Last_Name) pair
insert into @NameGroup (First_Name, Last_Name)
select @First_Name, @Last_Name
where not exists(select 1 from @NameGroup where First_Name = @First_Name and Last_Name = @Last_Name);
--resolve the id
select @Group_ID = Group_ID
from @NameGroup
where First_Name = @First_Name and
      Last_Name = @Last_Name;
--store the name
insert into @Names (First_Name, Middle_Name, Last_Name)
values (@First_Name, @Middle_Name, @Last_Name);
select n.First_Name, n.Middle_Name, n.Last_Name, ng.Group_ID
from @Names n
join @NameGroup ng on
     n.First_Name = ng.First_Name and
     n.Last_Name = ng.Last_Name;

Related

Get the max number in a varchar column SQL Server

I have a varchar column that holds comma-separated numbers.
I want to fetch the max number in that column.
+-------------------+--------------+
| reference_no | Name |
+-------------------+--------------+
| 17530, 20327 | John |
| , 14864 | Smith |
| 8509 | Michael |
| 14864, 17530 | Kelly |
+-------------------+--------------+
So, for the reference_no column in the example above, the output should be 20327.
I then need to select the row that contains this number.
Assuming you are NOT on SQL Server 2016+ (which adds STRING_SPLIT()):
If no row has more than 4 values in reference_no, then perhaps PARSENAME() will do.
If there are more than 4, you may need to fix the data or use a split/parse function.
Example
Select MaxValue = max(V)
From YourTable A
Cross Apply ( values (replace([reference_no],',','.')) ) B(S)
Cross Apply ( values (try_convert(int,parsename(S,1)))
,(try_convert(int,parsename(S,2)))
,(try_convert(int,parsename(S,3)))
,(try_convert(int,parsename(S,4)))
) C(V)
Returns
MaxValue
20327
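The question also asks for the row that contains the maximum. A possible sketch keeps each parsed value next to its source row and takes TOP (1) WITH TIES, ordered by that value. It keeps the same assumptions: at most 4 values per row, and TRY_CONVERT, which needs SQL Server 2012+. @YourTable and @Result are placeholder names.

```sql
DECLARE @YourTable TABLE (reference_no varchar(100), [Name] varchar(100));
INSERT INTO @YourTable VALUES
    ('17530, 20327', 'John'),
    (', 14864', 'Smith'),
    ('8509', 'Michael'),
    ('14864, 17530', 'Kelly');

DECLARE @Result TABLE (reference_no varchar(100), [Name] varchar(100), MaxValue int);

-- Each row is expanded into up to four parsed values (missing parts become NULL,
-- which sorts last in DESC order); TOP (1) WITH TIES keeps the row(s) holding the max.
INSERT INTO @Result
SELECT TOP (1) WITH TIES A.reference_no, A.[Name], C.V
FROM @YourTable A
CROSS APPLY (VALUES (REPLACE(A.reference_no, ',', '.'))) B(S)
CROSS APPLY (VALUES (TRY_CONVERT(int, PARSENAME(B.S, 1))),
                    (TRY_CONVERT(int, PARSENAME(B.S, 2))),
                    (TRY_CONVERT(int, PARSENAME(B.S, 3))),
                    (TRY_CONVERT(int, PARSENAME(B.S, 4)))) C(V)
ORDER BY C.V DESC;

SELECT * FROM @Result;
```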
Try the following:
declare @tab table (reference_no varchar(max), [Name] varchar(100))
insert into @tab
select '17530, 20327','John' union
select ', 14864 ','Smith' union
select '8509 ','Michael' union
select '14864, 17530','Kelly'
create table #final (val int)
insert into #final
SELECT Split.a.value('.', 'VARCHAR(100)') AS String
FROM (SELECT reference_no,
             CAST ('<S>' + REPLACE(reference_no, ',', '</S><S>') + '</S>' AS XML) AS String
      FROM @tab) AS A CROSS APPLY String.nodes ('/S') AS Split(a);
--match whole numbers only, so that e.g. 120327 would not match 20327
select * from @tab
where ',' + replace(reference_no, ' ', '') + ',' like '%,' + (select convert(varchar(100), max(val)) from #final) + ',%'
drop table #final
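On SQL Server 2016+ (database compatibility level 130 or higher), STRING_SPLIT() can replace the XML trick. Comparing parsed integers instead of matching text with LIKE also avoids substring hits, such as 20327 matching inside 120327. This is a sketch that reuses the sample rows; @Result is just a holder for the output.

```sql
-- Requires SQL Server 2016+ with database compatibility level 130 or higher.
DECLARE @tab TABLE (reference_no varchar(max), [Name] varchar(100));
INSERT INTO @tab VALUES
    ('17530, 20327', 'John'),
    (', 14864 ', 'Smith'),
    ('8509 ', 'Michael'),
    ('14864, 17530', 'Kelly');

DECLARE @Result TABLE (reference_no varchar(max), [Name] varchar(100), MaxValue int);

-- STRING_SPLIT returns one row per piece in a column named value.
-- TRY_CONVERT tolerates the surrounding spaces, turns empty pieces into 0
-- and anything non-numeric into NULL.
INSERT INTO @Result
SELECT TOP (1) WITH TIES t.reference_no, t.[Name], TRY_CONVERT(int, s.value)
FROM @tab t
CROSS APPLY STRING_SPLIT(t.reference_no, ',') s
ORDER BY TRY_CONVERT(int, s.value) DESC;

SELECT * FROM @Result;
```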

Grouping similar items recursively

I have been reading the following Microsoft article on recursive queries using CTEs, and I just can't seem to wrap my head around how to use them to group common items.
I have a table that contains the following columns:
ID
FirstName
LastName
DateOfBirth
BirthCountry
GroupID
What I need to do is start with the first person in the table, iterate through the table and find all the people that have the same (LastName and BirthCountry) or the same (DateOfBirth and BirthCountry).
Now the tricky part is that I have to assign them the same GroupID. Then, for each person in that GroupID, I need to see if anyone else has the same information and put them in the same GroupID too.
I think I could do this with multiple cursors but it is getting tricky.
Here is sample data and output.
ID FirstName LastName DateOfBirth BirthCountry GroupID
----------- ---------- ---------- ----------- ------------ -----------
1 Jonh Doe 1983-01-01 Grand 100
2 Jack Stone 1976-06-08 Grand 100
3 Jane Doe 1982-02-08 Grand 100
4 Adam Wayne 1983-01-01 Grand 100
5 Kay Wayne 1976-06-08 Grand 100
6 Matt Knox 1983-01-01 Hay 101
John Doe and Jane Doe are in the same Group (100) because they have the same (LastName and BirthCountry).
Adam Wayne is in Group (100) because he has the same (BirthDate and BirthCountry) as John Doe.
Kay Wayne is in Group (100) because she has the same (LastName and BirthCountry) as Adam Wayne who is already in Group (100).
Matt Knox is in a new group (101) because he does not match anyone in previous groups.
Jack Stone is in Group (100) because he has the same (BirthDate and BirthCountry) as Kay Wayne, who is already in Group (100).
Data scripts:
CREATE TABLE #Tbl(
ID INT,
FirstName VARCHAR(50),
LastName VARCHAR(50),
DateOfBirth DATE,
BirthCountry VARCHAR(50),
GroupID INT NULL
);
INSERT INTO #Tbl VALUES
(1, 'Jonh', 'Doe', '1983-01-01', 'Grand', NULL),
(2, 'Jack', 'Stone', '1976-06-08', 'Grand', NULL),
(3, 'Jane', 'Doe', '1982-02-08', 'Grand', NULL),
(4, 'Adam', 'Wayne', '1983-01-01', 'Grand', NULL),
(5, 'Kay', 'Wayne', '1976-06-08', 'Grand', NULL),
(6, 'Matt', 'Knox', '1983-01-01', 'Hay', NULL);
Here's what I came up with. I have rarely written recursive queries, so it was good practice for me. By the way, Kay and Adam do not share a birth country in your sample data.
with data as (
    select
        LastName, DateOfBirth, BirthCountry,
        row_number() over (order by LastName, DateOfBirth, BirthCountry) as grpNum
    from #Tbl group by LastName, DateOfBirth, BirthCountry
), r as (
    select
        d.LastName, d.DateOfBirth, d.BirthCountry, d.grpNum,
        cast('|' + cast(d.grpNum as varchar(8)) + '|' as varchar(1024)) as equ
    from data as d
    union all
    select
        d.LastName, d.DateOfBirth, d.BirthCountry, r.grpNum,
        cast(r.equ + cast(d.grpNum as varchar(8)) + '|' as varchar(1024))
    from r inner join data as d
        on d.grpNum > r.grpNum
        and charindex('|' + cast(d.grpNum as varchar(8)) + '|', r.equ) = 0
        and (d.LastName = r.LastName or d.DateOfBirth = r.DateOfBirth)
        and d.BirthCountry = r.BirthCountry
), g as (
    select LastName, DateOfBirth, BirthCountry, min(grpNum) as grpNum
    from r group by LastName, DateOfBirth, BirthCountry
)
-- dense_rank() starts at 1, so + 99 makes the first group 100, as in the question
select t.*, dense_rank() over (order by g.grpNum) + 99 as GroupID
from #Tbl as t
inner join g
    on g.LastName = t.LastName
    and g.DateOfBirth = t.DateOfBirth
    and g.BirthCountry = t.BirthCountry
For the recursion to terminate, it's necessary to keep track of the equivalences (via string concatenation) so that at each level it only needs to consider newly discovered equivalences (or connections, transitivities, etc.). Notice that I've avoided using the word "group" so as not to bleed into the GROUP BY concept.
http://rextester.com/edit/TVRVZ10193
EDIT: I used an almost arbitrary numbering for the equivalences. If you want them to appear in a sequence based on the lowest ID within each block, that's easy to do: instead of using row_number(), use min(ID) as grpNum, presuming of course that IDs are unique.
I assume GroupID is the output you want, starting from 100.
Even if GroupID comes from another table, that is not a problem.
First, sorry for my earlier "no cursor" comments. A cursor or RBAR operation is required for this task. In fact, it has been a very long time since I met a requirement like this; it took me a long time, and I used an RBAR operation.
If I manage to do it with a set-based method tomorrow, I will come back and edit this answer.
Most importantly, using an RBAR operation makes the script easier to understand, and I think it will work for other sample data too.
Please also give feedback about the performance and how it works with other sample data.
Also, you will notice that in my script the IDs are not in serial order. That does not matter; I did this in order to test.
I used PRINT for debugging purposes; you can remove it.
SET NOCOUNT ON
DECLARE @Tbl TABLE(
ID INT,
FirstName VARCHAR(50),
LastName VARCHAR(50),
DateOfBirth DATE,
BirthCountry VARCHAR(50),
GroupID INT NULL
);
INSERT INTO @Tbl VALUES
(1, 'Jonh', 'Doe', '1983-01-01', 'Grand', NULL),
(2, 'Jack', 'Stone', '1976-06-08', 'Grand', NULL),
(3, 'Jane', 'Doe', '1982-02-08', 'Grand', NULL),
(4, 'Adam', 'Wayne', '1983-01-01', 'Grand', NULL),
(5, 'Kay', 'Wayne', '1976-06-08', 'Grand', NULL),
(6, 'Matt', 'Knox', '1983-01-01', 'Hay', NULL),
(7, 'Jerry', 'Stone', '1976-06-08', 'Hay', NULL)
DECLARE @StartGroupID INT = 100
DECLARE @ID INT
DECLARE @GroupID INT
DECLARE @MaxID INT
DECLARE @i INT = 1
DECLARE @MinGroupID int = @StartGroupID
DECLARE @MaxGroupID int = @StartGroupID
DECLARE @LastGroupID int
SELECT @MaxID = max(ID)
FROM @Tbl
WHILE (@i <= @MaxID)
BEGIN
SELECT @ID = ID
,@GroupID = GroupID
FROM @Tbl a
WHERE ID = @i
if (@GroupID is not null and @GroupID < @MinGroupID)
set @MinGroupID = @GroupID
if (@GroupID is not null and @GroupID > @MaxGroupID)
set @MaxGroupID = @GroupID
if (@GroupID is not null)
set @LastGroupID = @GroupID
UPDATE a
SET GroupID = case
when @ID = 1 and b.GroupID is null then @StartGroupID
when @ID > 1 and b.GroupID is null then @MaxGroupID + 1 --(Select max(GroupID)+1 from @Tbl where ID<@ID)
when @ID > 1 and b.GroupID is not null then @MinGroupID --(Select min(GroupID) from @Tbl where ID<@ID)
end
FROM @Tbl a
INNER JOIN @Tbl b ON b.ID = @ID
WHERE (
(
a.BirthCountry = b.BirthCountry
and a.DateOfBirth = b.DateOfBirth
)
or (a.LastName = b.LastName and a.BirthCountry = b.BirthCountry)
or (a.LastName = b.LastName and a.DateOfBirth = b.DateOfBirth)
)
--if (@ID = 7) --@ID = 2, @ID = 3 and so on (for debugging)
--break
SET @i = @i + 1
SET @ID = @i
END
SELECT *
FROM @Tbl
SELECT *
FROM #Tbl
Alternate method, but it still returns 56,000 rows without the rownum = 1 filter. See if it works with other sample data, or whether you can optimize it further.
;with CTE as
(
select a.ID, a.FirstName, a.LastName, a.DateOfBirth, a.BirthCountry
,@StartGroupID GroupID
,1 rn
FROM @Tbl a where a.ID = 1
UNION ALL
Select a.ID, a.FirstName, a.LastName, a.DateOfBirth, a.BirthCountry
,case when ((a.BirthCountry = b.BirthCountry and a.DateOfBirth = b.DateOfBirth)
or (a.LastName = b.LastName and a.BirthCountry = b.BirthCountry)
or (a.LastName = b.LastName and a.DateOfBirth = b.DateOfBirth)
) then b.GroupID else b.GroupID + 1 end
, b.rn + 1
FROM @Tbl a
inner join CTE b on a.ID > 1
where b.rn < @MaxID
)
,CTE1 as
(select *, row_number() over (partition by ID order by GroupID) rownum
from CTE)
select * from CTE1
where rownum = 1
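For comparison, here is a set-based sketch that does not come from either answer. It uses iterative minimum-label propagation, a common way to compute connected components in SQL. It applies the question's rule (same BirthCountry and either the same LastName or the same DateOfBirth) to the question's six rows. The + 99 offset, which makes the first group 100, is an assumption chosen to match the expected output.

```sql
DECLARE @Tbl TABLE (
    ID int PRIMARY KEY,
    FirstName varchar(50),
    LastName varchar(50),
    DateOfBirth date,
    BirthCountry varchar(50),
    GroupID int NULL
);
INSERT INTO @Tbl VALUES
    (1, 'Jonh', 'Doe',   '1983-01-01', 'Grand', NULL),
    (2, 'Jack', 'Stone', '1976-06-08', 'Grand', NULL),
    (3, 'Jane', 'Doe',   '1982-02-08', 'Grand', NULL),
    (4, 'Adam', 'Wayne', '1983-01-01', 'Grand', NULL),
    (5, 'Kay',  'Wayne', '1976-06-08', 'Grand', NULL),
    (6, 'Matt', 'Knox',  '1983-01-01', 'Hay',   NULL);

-- Every row starts as its own component, labelled with its ID.
UPDATE @Tbl SET GroupID = ID;

-- Each pass lowers a row's label to the smallest label among the rows it is
-- linked to. Labels only ever decrease, so the loop stops once a pass changes
-- nothing; at that point every connected row carries its component's minimum ID.
WHILE 1 = 1
BEGIN
    UPDATE a
    SET GroupID = m.MinLabel
    FROM @Tbl a
    CROSS APPLY (
        SELECT MIN(b.GroupID) AS MinLabel
        FROM @Tbl b
        WHERE b.BirthCountry = a.BirthCountry
          AND (b.LastName = a.LastName OR b.DateOfBirth = a.DateOfBirth)
    ) m
    WHERE m.MinLabel < a.GroupID;

    IF @@ROWCOUNT = 0 BREAK;
END;

-- Renumber the components consecutively, starting at 100 as in the question.
UPDATE a
SET GroupID = r.NewGroupID
FROM @Tbl a
JOIN (SELECT ID, DENSE_RANK() OVER (ORDER BY GroupID) + 99 AS NewGroupID
      FROM @Tbl) r ON r.ID = a.ID;

SELECT * FROM @Tbl ORDER BY ID;
```

Each pass is one set-based UPDATE. The number of passes grows with the length of the longest chain of links, not with the number of rows.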
Maybe you can run it this way:
SELECT FirstName,
LastName,
GroupID
FROM table_name
GROUP BY
FirstName,
LastName,
GroupID
HAVING COUNT(GroupID) >= 2
ORDER BY GroupID

How can I select first inserted row in SQL Server?

Maybe this question is unrelated to Stack Overflow, but this is my problem and I do not know the syntax.
With this query I select the persons who had transactions, by their date and time.
This is my query:
I want to write a query that selects their first TransactionsTimeStamp. How can I do that?
I assume that you are looking for ranking functions like ROW_NUMBER. You could use them, for example, with a Common Table Expression (CTE):
WITH CTE AS
(
SELECT ..., RN = ROW_NUMBER() OVER (PARTITION BY FirstName, LastName
ORDER BY TransactionsTimeStamp ASC)
FROM dbo.TableName ... (join tables here)
)
SELECT ....
FROM CTE
WHERE RN = 1
Here ... stands for the columns that you want to select; unlike with a GROUP BY, you can select all of them.
But if you just want to select the TransactionsTimeStamp column for every user:
SELECT MIN(TransactionsTimeStamp) AS TransactionsTimeStamp, FirstName, LastName
FROM dbo.tableName ... (join tables here)
GROUP BY FirstName, LastName
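Here is a concrete version of the ROW_NUMBER() approach. It uses made-up rows, and the Amount column and the @Transactions/@Result table variables are hypothetical. The point is that the whole first row per person comes back, not just the timestamp.

```sql
DECLARE @Transactions TABLE (FirstName varchar(20), LastName varchar(20),
                             TransactionsTimeStamp datetime, Amount decimal(10, 2));
INSERT INTO @Transactions VALUES
    ('A', 'B', '20150101', 10.00),
    ('A', 'B', '20150120', 25.00),
    ('C', 'D', '20150103', 7.50),
    ('C', 'D', '20150119', 3.00);

DECLARE @Result TABLE (FirstName varchar(20), LastName varchar(20),
                       TransactionsTimeStamp datetime, Amount decimal(10, 2));

-- RN = 1 marks each person's earliest transaction; every column of that row
-- is available, which GROUP BY with MIN() alone would not give you.
WITH CTE AS
(
    SELECT FirstName, LastName, TransactionsTimeStamp, Amount,
           RN = ROW_NUMBER() OVER (PARTITION BY FirstName, LastName
                                   ORDER BY TransactionsTimeStamp ASC)
    FROM @Transactions
)
INSERT INTO @Result
SELECT FirstName, LastName, TransactionsTimeStamp, Amount
FROM CTE
WHERE RN = 1;

SELECT * FROM @Result ORDER BY FirstName;
```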
The problem with your query is that you are grouping by the Date column, so you get every distinct Date value in the result. You should group only by FirstName and LastName and apply an aggregate function to the Date column.
If only the minimum date is needed, you can get it with an aggregate function, like this:
DECLARE @test TABLE
(
first_name NVARCHAR(MAX) ,
last_name NVARCHAR(MAX) ,
transaction_date DATETIME
)
INSERT INTO @test
VALUES ( 'A', 'B', '20150101' )
INSERT INTO @test
VALUES ( 'A', 'B', '20150120' )
INSERT INTO @test
VALUES ( 'C', 'D', '20150103' )
INSERT INTO @test
VALUES ( 'C', 'D', '20150119' )
SELECT first_name ,
last_name ,
MIN(transaction_date) AS min_transaction_date
FROM @test
GROUP BY first_name ,
last_name
SELECT first_name ,
last_name ,
MIN(transaction_date) AS min_transaction_date
FROM #test
GROUP BY first_name ,
last_name
Output:
first_name last_name min_transaction_date
A B 2015-01-01 00:00:00.000
C D 2015-01-03 00:00:00.000
Select firstname, lastname, min(date) as minimum_date from clubprofile_cp
group by firstname, lastname

Dynamic Pivot Tables MS SQL 2008

I'm trying to create a dynamic pivot table to get results in different columns rather than rows.
This is the table I'm using to test
CREATE TABLE [dbo].[Authors](
[Client_Id] [nvarchar](50) NOT NULL,
[Project_Id] [nvarchar](50) NOT NULL,
[Person_Id] [int] NOT NULL,
[Author_Number] [int] NOT NULL,
[Family_Name] [nvarchar](50) NULL,
[First_Name] [nvarchar](50) NULL,
)
INSERT INTO Authors (Client_Id, Project_Id, Person_Id, Author_Number, Family_Name, First_Name)
VALUES ('TEST','TEST',12345,1,'Giust','Fede')
INSERT INTO Authors (Client_Id, Project_Id, Person_Id, Author_Number, Family_Name, First_Name)
VALUES ('TEST','TEST',12345,2,'Ma','Ke')
INSERT INTO Authors (Client_Id, Project_Id, Person_Id, Author_Number, Family_Name, First_Name)
VALUES ('TEST','TEST',12346,1,'Jones','Peter')
INSERT INTO Authors (Client_Id, Project_Id, Person_Id, Author_Number, Family_Name, First_Name)
VALUES ('TEST','TEST',12346,2,'Davies','Bob')
INSERT INTO Authors (Client_Id, Project_Id, Person_Id, Author_Number, Family_Name, First_Name)
VALUES ('TEST','TEST',12346,3,'Richards','Craig')
So far I get the results like this
CLIENT_ID PROJECT_ID PERSON_ID AUTHOR_NUMBER FAMILY_NAME FIRST_NAME
TEST TEST 12345 1 Giust Fede
TEST TEST 12345 2 Ma Ke
TEST TEST 12346 1 Jones Peter
TEST TEST 12346 2 Davies Bob
TEST TEST 12346 3 Richards Craig
I need the results to come out like this, and the query must be dynamic, because sometimes I can have 2 authors, or 10 authors.
CLIENT_ID PROJECT_ID PERSON_ID FAMILY_NAME_1 FIRST_NAME_1 FAMILY_NAME_2 FIRST_NAME_2 FAMILY_NAME_3 FIRST_NAME_3
TEST TEST 12345 Giust Fede Ma Ke
TEST TEST 12346 Jones Peter Davies Bob Richards Craig
I've been trying to use this code, but keep getting errors
SQL Fiddle
Here are a few suggestions to solve your issue.
First, your current code is unpivoting all of the columns in the table; you only need to UNPIVOT the Family_Name and First_Name columns. As a result, I would not use a variable to get this list of columns; just hard-code the two columns you need to unpivot.
Then, to get the list of columns to PIVOT, alter the code to use both the Author_Number and a string with the column names Family_Name and First_Name:
select @colsPivot = STUFF((SELECT ',' + QUOTENAME(c.col + '_'+cast(Author_Number as varchar(10)))
from Authors
cross apply
(
select 'Family_Name' col, 1 so union all
select 'First_Name', 2
) c
group by col, author_number, so
order by author_number, so
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
This gives a list of cols:
| COLUMN_0 |
------------------------------------------------------------------------------------------------
| [Family_Name_1],[First_Name_1],[Family_Name_2],[First_Name_2],[Family_Name_3],[First_Name_3] |
With these changes to your code, the final query will be:
DECLARE @colsPivot AS NVARCHAR(MAX),
@query AS NVARCHAR(MAX)
select @colsPivot = STUFF((SELECT ',' + QUOTENAME(c.col + '_'+cast(Author_Number as varchar(10)))
from Authors
cross apply
(
select 'Family_Name' col, 1 so union all
select 'First_Name', 2
) c
group by col, author_number, so
order by author_number, so
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
set @query
= 'select Client_id, Project_id, Person_Id, '+@colsPivot+'
from
(
select Client_id, Project_id, Person_Id,
col+''_''+cast(Author_Number as varchar(10)) col, val
from
(
select Client_id, Project_id,
Person_Id,
Author_Number,
Family_Name,
First_Name
from Authors
) s
unpivot
(
val
for col in (Family_Name, First_Name)
) u
) x1
pivot
(
max(val)
for col in ('+ @colsPivot +')
) p'
exec(@query)
See SQL Fiddle with Demo. This gives a result:
| CLIENT_ID | PROJECT_ID | PERSON_ID | FAMILY_NAME_1 | FIRST_NAME_1 | FAMILY_NAME_2 | FIRST_NAME_2 | FAMILY_NAME_3 | FIRST_NAME_3 |
-----------------------------------------------------------------------------------------------------------------------------------
| TEST | TEST | 12345 | Giust | Fede | Ma | Ke | (null) | (null) |
| TEST | TEST | 12346 | Jones | Peter | Davies | Bob | Richards | Craig |
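To see what the generated statement amounts to, here is a static sketch for exactly three authors. It uses conditional aggregation (MAX(CASE ...)) instead of UNPIVOT/PIVOT. It also uses a table variable @Authors in place of dbo.Authors, filled with the names from the expected output. The dynamic version is still the one to use when the number of authors varies.

```sql
DECLARE @Authors TABLE (
    Client_Id nvarchar(50), Project_Id nvarchar(50), Person_Id int,
    Author_Number int, Family_Name nvarchar(50), First_Name nvarchar(50));
INSERT INTO @Authors VALUES
    (N'TEST', N'TEST', 12345, 1, N'Giust', N'Fede'),
    (N'TEST', N'TEST', 12345, 2, N'Ma', N'Ke'),
    (N'TEST', N'TEST', 12346, 1, N'Jones', N'Peter'),
    (N'TEST', N'TEST', 12346, 2, N'Davies', N'Bob'),
    (N'TEST', N'TEST', 12346, 3, N'Richards', N'Craig');

DECLARE @Result TABLE (
    Client_Id nvarchar(50), Project_Id nvarchar(50), Person_Id int,
    Family_Name_1 nvarchar(50), First_Name_1 nvarchar(50),
    Family_Name_2 nvarchar(50), First_Name_2 nvarchar(50),
    Family_Name_3 nvarchar(50), First_Name_3 nvarchar(50));

-- One MAX(CASE ...) per output column; an author that does not exist yields NULL.
INSERT INTO @Result
SELECT Client_Id, Project_Id, Person_Id,
       MAX(CASE WHEN Author_Number = 1 THEN Family_Name END),
       MAX(CASE WHEN Author_Number = 1 THEN First_Name END),
       MAX(CASE WHEN Author_Number = 2 THEN Family_Name END),
       MAX(CASE WHEN Author_Number = 2 THEN First_Name END),
       MAX(CASE WHEN Author_Number = 3 THEN Family_Name END),
       MAX(CASE WHEN Author_Number = 3 THEN First_Name END)
FROM @Authors
GROUP BY Client_Id, Project_Id, Person_Id;

SELECT * FROM @Result ORDER BY Person_Id;
```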

SQL Server T-SQL concat aggregate

Imagine I have this table
BirthDay |Name
1-10-2010 | 'Joe'
2-10-2010 | 'Bob'
2-10-2010 | 'Alice'
How can I get a result like this
BirthDay |Name
1-10-2010 | 'Joe'
2-10-2010 | 'Bob', 'Alice'
Thanks.
Try this:
set nocount on;
declare @t table (BirthDay datetime, name varchar(20))
insert into @t VALUES ('1-10-2010', 'Joe' )
insert into @t VALUES ('2-10-2010', 'Bob' )
insert into @t VALUES ('2-10-2010', 'Alice')
set nocount off
SELECT p1.BirthDay,
stuff(
(SELECT
', ' + p2.name --use this if you want quotes around the names: ', ''' + p2.name+''''
FROM @t p2
WHERE p2.BirthDay=p1.BirthDay
ORDER BY p2.name
FOR XML PATH(''), TYPE --TYPE plus .value() stops characters such as & coming back as &amp;
).value('.', 'varchar(max)')
,1,2, ''
) AS Names
FROM @t p1
GROUP BY
BirthDay
OUTPUT:
BirthDay Names
----------------------- ------------
2010-01-10 00:00:00.000 Joe
2010-02-10 00:00:00.000 Alice, Bob
(2 row(s) affected)
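If you are on SQL Server 2017 or later, STRING_AGG() with WITHIN GROUP does the same concatenation without FOR XML PATH and without any XML entity escaping. This is a sketch with the same names. The 'yyyymmdd' date literals are an assumption, chosen so the dates do not depend on the session's DATEFORMAT.

```sql
-- Requires SQL Server 2017 or later (STRING_AGG ... WITHIN GROUP).
DECLARE @t TABLE (BirthDay datetime, name varchar(20));
INSERT INTO @t VALUES
    ('20100110', 'Joe'),
    ('20100210', 'Bob'),
    ('20100210', 'Alice');

DECLARE @Result TABLE (BirthDay datetime, Names varchar(max));

-- STRING_AGG joins the names of each BirthDay group with ', ';
-- WITHIN GROUP (ORDER BY name) fixes the order inside each list.
INSERT INTO @Result
SELECT BirthDay, STRING_AGG(name, ', ') WITHIN GROUP (ORDER BY name)
FROM @t
GROUP BY BirthDay;

SELECT * FROM @Result ORDER BY BirthDay;
```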
This solution needs no deployment from Visual Studio and no DLL file on the server.
Just copy, paste, and it works!
http://groupconcat.codeplex.com/
dbo.GROUP_CONCAT(VALUE)
dbo.GROUP_CONCAT_D(VALUE, DELIMITER)
dbo.GROUP_CONCAT_DS(VALUE, DELIMITER, SORT_ORDER)
dbo.GROUP_CONCAT_S(VALUE, SORT_ORDER)
