How do I generate Unique Username using TSQL? - sql-server

I am trying to generate a unique login username for each user in my system.
What I have done so far is:
Select p.FirstName+P.LastName + p.PersonId from person As P
Here PersonId is the primary key (1255 in this example), so the output looks like JamesHarley1255.
However, I don't want to use the primary key. What are the other
alternatives? There will be numerous duplicate records too.
Let's say I need a function that generates a unique number between 1 and n every time, so that I can produce output like JamesHarley125.

Try this,
Select p.FirstName+P.LastName +CAST(ROW_NUMBER() OVER (ORDER BY p.FirstName) AS VARCHAR(20))
from person As P

If you want a unique value for each record without using the primary key column, I think you can use the ROW_NUMBER function. The query below should work in this case:
Select p.FirstName+P.LastName + CAST((row_number() OVER ( PARTITION BY p.FirstName ORDER BY p.FirstName)) AS varchar(100)) from person As P

You can create a column of incremental integers using the IDENTITY property and concatenate it with the first name and last name, so that each row is unique. Because the column is an int, cast it to a string before concatenating; otherwise SQL Server tries to convert the name to int and fails.
Select p.FirstName+P.LastName + CAST(P.Id AS varchar(20)) from person As P
Here the column P.Id can be declared as below in the table definition. IDENTITY(seed, increment) starts at the value 1 and increments in steps of 1.
id int IDENTITY(1,1)

Building off the previous responses, this generates usernames without the numbers where possible, and then only adds numbers to make the usernames unique where actually required. I know this isn't the exact way you are generating your usernames, but this is how I decided to generate mine. I just changed the table names to match yours.
with Usernames(PersonId, Username) as
(
select PersonId,
left(Firstname, 1) + Middlename + Lastname as Username
from Person
),
NumberedUserNames(PersonId, Username, Number) as
(
select PersonId,
Username,
ROW_NUMBER() OVER (PARTITION BY Username ORDER BY Username) as Number
from Usernames
),
UniqueUsernames(PersonId, UniqueUsername) AS
(
select PersonId,
case when Number = 1 then UserName
else UserName + CAST(Number-1 as VarChar(20))
End as UniqueUsername
from NumberedUserNames
)
select PersonId,
UniqueUsername
from UniqueUsernames;
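The same "only add a number when needed" idea can be tried out quickly outside SQL Server. The following is a minimal sketch, assuming SQLite (through Python's built-in sqlite3 module) as a stand-in: it uses || instead of + for concatenation, a simplified FirstName + LastName username, and made-up sample rows. Ordering by PersonId inside each partition is my own assumption so that the numbering is repeatable.

```python
import sqlite3

# In-memory SQLite database standing in for the Person table (sample rows are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Person (PersonId INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)"
)
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?)",
    [(1, "James", "Harley"), (2, "James", "Harley"), (3, "Ann", "Lee"), (4, "James", "Harley")],
)

rows = conn.execute(
    """
    WITH Numbered AS (
        SELECT PersonId,
               FirstName || LastName AS Username,
               -- number the rows that share the same base username
               ROW_NUMBER() OVER (
                   PARTITION BY FirstName || LastName
                   ORDER BY PersonId
               ) AS Number
        FROM Person
    )
    SELECT PersonId,
           -- first occurrence keeps the bare name, later ones get a suffix
           CASE WHEN Number = 1 THEN Username
                ELSE Username || CAST(Number - 1 AS TEXT)
           END AS UniqueUsername
    FROM Numbered
    ORDER BY PersonId
    """
).fetchall()

usernames = [name for _, name in rows]
print(usernames)
```

Note that a suffixed name could still collide with a person whose real name already ends in a digit, so a unique constraint on the stored username is a sensible safety net.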

Related

How can I delete duplicates from a table based on a location id and name

columns are: Name, Location_Name, Location_ID
I want to check Names and Location_ID, and if there are two that are the same I want to delete/remove that row.
For example: If Name John Fox at location id 4 shows up two or more times I want to just keep one.
After that, I want to count how many people per location.
Location_Name1: 45
Location_Name2: 66
Etc...
The location name and Location Id are related.
Sample data
Code I tried
Deleting duplicates is a common pattern. You apply a sequence number to all of the duplicates, then delete any that aren't first. In this case I order arbitrarily but you can choose to keep the one with the lowest PK value or that was modified last or whatever - just update the ORDER BY to sort the one you want to keep first.
;WITH cte AS
(
SELECT *, rn = ROW_NUMBER() OVER
(PARTITION BY Name, Location_ID ORDER BY @@SPID)
FROM dbo.TableName
)
DELETE cte WHERE rn > 1;
Then to count, assuming there can't be two different Location_IDs for a given Location_Name (this is why schema + sample data is so helpful):
SELECT Location_Name, People = COUNT(Name)
FROM dbo.TableName
GROUP BY Location_Name;
Example db<>fiddle
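For anyone who wants to experiment without a SQL Server instance, here is a rough sketch of the same number-then-delete pattern, assuming SQLite through Python's sqlite3 module. SQLite cannot delete through a CTE the way T-SQL does, so this version deletes by rowid instead; the table name and sample rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (Name TEXT, Location_Name TEXT, Location_ID INTEGER)")
conn.executemany(
    "INSERT INTO TableName VALUES (?, ?, ?)",
    [
        ("John Fox", "Moon", 4),
        ("John Bear", "Moon", 4),
        ("John Fox", "Moon", 4),   # duplicate of the first row
        ("Peter", "Saturn", 5),
        ("Peter", "Saturn", 5),    # duplicate
    ],
)

# Number the rows inside each (Name, Location_ID) group, then delete all but the first.
conn.execute(
    """
    DELETE FROM TableName
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY Name, Location_ID
                       ORDER BY rowid      -- keeps the earliest inserted row
                   ) AS rn
            FROM TableName
        )
        WHERE rn > 1
    )
    """
)

counts = conn.execute(
    "SELECT Location_Name, COUNT(Name) FROM TableName "
    "GROUP BY Location_Name ORDER BY Location_Name"
).fetchall()
print(counts)
```

Changing the ORDER BY inside the window is what decides which copy survives.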
If Location_Name and Location_ID are not tightly coupled (e.g. there could be Location_ID = 4, Location_Name = Place 1 and Location_ID = 4, Location_Name = Place 2), then you're going to have to define how to determine which place to display if you group by Location_ID, or admit that perhaps one of those columns is meaningless.
If Location_Name and Location_ID are tightly coupled, they shouldn't both be stored in this table. You should have a lookup/dimension table that stores both of those columns (once!) and you use the smallest data type as the key you record in the fact table (where it is repeated over and over again). This has several benefits:
Scanning the bigger table is faster, because it's not as wide
Storage is reduced because you're not repeating long strings over and over and over again
Aggregation is clearer and you can join to get names after aggregation, which will be faster
If you need to change a location's name, you only need to change it in exactly one place
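To make the lookup-table idea concrete, here is a small sketch, again assuming SQLite via Python's sqlite3 module. The table names Locations and People are hypothetical, not from the question. It aggregates on the narrow integer key first and joins to fetch the names afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- lookup/dimension table: each location name is stored exactly once
    CREATE TABLE Locations (
        Location_ID   INTEGER PRIMARY KEY,
        Location_Name TEXT NOT NULL
    );
    -- fact table: only the small integer key is repeated
    CREATE TABLE People (
        Name        TEXT NOT NULL,
        Location_ID INTEGER NOT NULL REFERENCES Locations(Location_ID)
    );
    INSERT INTO Locations VALUES (1, 'Sun'), (4, 'Moon'), (5, 'Saturn');
    INSERT INTO People VALUES
        ('John Fox', 4), ('John Bear', 4), ('Peter', 5),
        ('Micheal', 1), ('Jackie', 1), ('Tito', 1);
    """
)

# Aggregate on the key, then join to the lookup table to get display names.
counts = conn.execute(
    """
    SELECT l.Location_Name, c.People
    FROM (SELECT Location_ID, COUNT(*) AS People
          FROM People
          GROUP BY Location_ID) AS c
    JOIN Locations AS l ON l.Location_ID = c.Location_ID
    ORDER BY l.Location_Name
    """
).fetchall()
print(counts)

# Renaming a location touches a single row in the lookup table.
conn.execute("UPDATE Locations SET Location_Name = 'Luna' WHERE Location_ID = 4")
renamed = conn.execute(
    "SELECT l.Location_Name FROM People p "
    "JOIN Locations l ON l.Location_ID = p.Location_ID WHERE p.Name = 'John Fox'"
).fetchone()[0]
```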
Sample code
CREATE TABLE People_Location
(
Name VARCHAR(30) NOT NULL,
Location_Name VARCHAR(30) NOT NULL,
Location_ID INT NOT NULL,
)
INSERT INTO People_Location
VALUES
('John Fox', 'Moon', 4),
('John Bear', 'Moon', 4),
('Peter', 'Saturn', 5),
('John Fox', 'Moon', 4),
('Micheal', 'Sun', 1),
('Jackie', 'Sun', 1),
('Tito', 'Sun', 1),
('Peter', 'Saturn', 5)
Get location and count
select Location_Name, count(1)
from
(select Name, Location_Name,
rn = ROW_NUMBER() OVER (PARTITION BY Name, Location_ID ORDER BY Name)
from People_Location
) t
where rn = 1
group by Location_Name
Result
Moon 2
Saturn 1
Sun 3

BigQuery Convert Columns to RECORD

In BigQuery, How can I turn many columns into a RECORD or Array of Key Value pairs
e.g.
Source Table
id  Name  DOB         Sex
1   Fred  01.01.2001  M
Destination Table
Id  Name  Key  Value
1   Fred  DOB  01.01.2001
          Sex  M
I've tried a few things but cant get there, is there a nice way of doing it?
Not sure what exactly the issue was or could be here, as it is as simple and straightforward as below:
select id, Name,
[struct<key string, value string>('DOB', DOB),('Sex', Sex)] info
from `project.dataset.table`
with output
Usually, though, the issue arises when you don't know the column names in advance and want a generic approach. In that case you can use the approach below, where the column names DOB and Sex are not referenced:
select id, Name,
array(
select as struct
split(replace(kv, '"', ''),':')[offset(0)] key,
split(replace(kv, '"', ''),':')[offset(1)] value,
from unnest(split(trim(to_json_string((select as struct * except (id, name) from unnest([t]))), '{}'))) kv
) info
from `project.dataset.table` t
with exact same result/output
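To illustrate the target shape outside BigQuery, here is a small plain-Python sketch of the same reshaping. The function name to_key_values and the dict-based row are my own illustrative choices. It may also help to keep in mind that the JSON-splitting query above relies on values not containing ':' or ',' characters, since those are used as split delimiters.

```python
def to_key_values(row, keep=("id", "Name")):
    """Keep the columns listed in `keep` and fold the rest into key/value records."""
    result = {column: row[column] for column in keep}
    result["info"] = [
        {"key": column, "value": str(value)}
        for column, value in row.items()
        if column not in keep
    ]
    return result


row = {"id": 1, "Name": "Fred", "DOB": "01.01.2001", "Sex": "M"}
converted = to_key_values(row)
print(converted)
```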
Here you go with a solution:-
WITH `proj.dataset.tbl` AS
(SELECT '1' AS id, 'Fred' AS Name, '2020-12-07' AS DOB, 'M' as Sex
)
SELECT id, Name,
[struct('DOB' as key, cast (DOB as string) as value ),
struct('Sex' as key, Sex as value)
] as key_values
FROM `proj.dataset.tbl`
Output will be as :-

TSQL - De-duplication report - grouping

I'm trying to create a report that ranks duplicate records. The idea behind this is that the customer wants to merge a whole lot of duplicate records that came about from a migration.
I need the ranking so that my report can show which record should be the "main" record, i.e. the record that will have missing data pulled into it.
The duplicate definition is pretty simple:
If the email addresses are the same then it is always a duplicate, if
the emails do not match, then the first name, surname, and mobile must
match.
The ranking will be based on a whole bunch of columns in the table, so:
email address isn't NULL = 50
phone number isn't NULL = 20
etc. Whichever record gets the highest total in its duplicate group becomes the main record. This is where I am having issues: I can't seem to find a way to get an incremental number for each duplicate set. This is some of the code I have so far:
( I took out some of the rank columns in the temp table and CTE expression to shorten it )
DECLARE @tmp_Duplicates TABLE (
tmp_personID INT
, tmp_Firstname NVARCHAR(100)
, tmp_Surname NVARCHAR(100)
, tmp_HomeEmail NVARCHAR(300)
, tmp_MobileNumber NVARCHAR(100)
--- Ratings
, tmp_HomeEmail_Rating INT
--- Groupings
, tmp_GroupNumber INT
)
;WITH cteDupes AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY personHomeEmail ORDER BY personID DESC) AS RND,
ROW_NUMBER() OVER(PARTITION BY personHomeEmail ORDER BY personId) AS RNA,
p.personID, p.PersonFirstName, p.PersonSurname, p.PersonHomeEMail
, personMobileTelephone
FROM tblCandidate c INNER JOIN tblPerson p ON c.candidateID = p.personID
)
INSERT INTO @tmp_Duplicates
SELECT PersonID, PersonFirstName, PersonSurname, PersonHomeEMail, personMobileTelephone
, 10, RND
FROM cteDupes
WHERE RNA + RND > 2
ORDER BY personID, PersonFirstName, PersonSurname
SELECT * FROM @tmp_Duplicates
This gives me the results I want, but the group number isn't showing how I need it:
What I need is for each group to be an incremental value:
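One common way to get a number that increases once per duplicate set, rather than once per row, is DENSE_RANK() ordered by the grouping column. The sketch below is only an illustration under simplifying assumptions: it uses SQLite via Python's sqlite3 module, treats a matching email as the only duplicate rule, scores just the two columns mentioned above (50 and 20), and uses invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblPerson ("
    "personID INTEGER PRIMARY KEY, personHomeEmail TEXT, personMobileTelephone TEXT)"
)
conn.executemany(
    "INSERT INTO tblPerson VALUES (?, ?, ?)",
    [
        (1, "a@example.com", None),
        (2, "a@example.com", "0711"),
        (3, "b@example.com", "0722"),
        (4, "b@example.com", None),
        (5, "c@example.com", "0733"),  # unique email, so not a duplicate
    ],
)

rows = conn.execute(
    """
    WITH Dupes AS (
        SELECT personID,
               personHomeEmail,
               -- simplified rating: 50 for an email, 20 for a mobile number
               (CASE WHEN personHomeEmail IS NOT NULL THEN 50 ELSE 0 END
                + CASE WHEN personMobileTelephone IS NOT NULL THEN 20 ELSE 0 END) AS Score,
               COUNT(*) OVER (PARTITION BY personHomeEmail) AS GroupSize
        FROM tblPerson
    )
    SELECT personID,
           -- same value for every row in a duplicate set, +1 for the next set
           DENSE_RANK() OVER (ORDER BY personHomeEmail) AS GroupNumber,
           -- 1 marks the highest-scoring ("main") record of the set
           ROW_NUMBER() OVER (
               PARTITION BY personHomeEmail
               ORDER BY Score DESC, personID
           ) AS RankInGroup
    FROM Dupes
    WHERE GroupSize > 1
    ORDER BY GroupNumber, RankInGroup
    """
).fetchall()
print(rows)
```

DENSE_RANK and ROW_NUMBER are also available in T-SQL, so the window expressions should carry over largely unchanged.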

Sql find Row with longest String and delete the Rest

I am currently working on a table with approx. 7.5 million rows and 16 columns. One of the columns is an internal identifier (let's call it ID) we use at my university. Another column contains a string.
ID is NOT a unique index for a row, so it is possible that one identifier appears more than once in the table, the only difference between the rows being the string.
For each ID I need to keep only the row with the longest string and delete every other row from the original table. Unfortunately I am more of a SQL novice, and I am really stuck at this point. If anyone could help, that would be really nice.
Take a look at this sample:
SELECT * INTO #sample FROM (VALUES
(1, 'A'),
(1,'Long A'),
(2,'B'),
(2,'Long B'),
(2,'BB')
) T(ID,Txt)
DELETE S FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY LEN(Txt) DESC) RN
FROM #sample) S
WHERE RN!=1
SELECT * FROM #sample
Results:
ID Txt
-- ------
1 Long A
2 Long B
It might be possible just in SQL, but the way I know how to do it would be a two-pass approach using application code - I assume you have an application you are writing.
The first pass would be something like:
SELECT theid, COUNT(*) AS num, MAX(LEN(thestring)) AS keepme FROM thetable GROUP BY theid HAVING COUNT(*) > 1
Then you'd loop through the results in whatever language you're using and delete every row with that ID except the one whose string has the returned length. The language I know is PHP, so I'll use it for my example, but the method would be the same in any language (for brevity, I'm skipping error checking, prepared statements, and such, and not testing - please use carefully):
// HAVING is required here: an aggregate or column alias cannot be filtered in WHERE
$sql = 'SELECT theid, COUNT(*) AS num, MAX(LEN(thestring)) AS keepme FROM thetable GROUP BY theid HAVING COUNT(*) > 1';
$result = sqlsrv_query($resource, $sql);
while ($row = sqlsrv_fetch_object($result)) {
// keepme is the LENGTH of the longest string, so compare lengths, not strings
$delete = 'DELETE FROM thetable WHERE theid = '.$row->theid.' AND LEN(thestring) <> '.$row->keepme;
// use a separate variable so the outer $result is not overwritten mid-loop
sqlsrv_query($resource, $delete);
}
You didn't say what you would want to do if two strings are the same length, so this solution does not deal with that at all - I'm assuming that each ID will only have one longest string.
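If ties in length are possible, a second sort key inside the window makes the choice of survivor explicit. The following is a hedged sketch, assuming SQLite via Python's sqlite3 module (LENGTH in place of T-SQL's LEN, deleting by rowid) and breaking ties alphabetically, which is an arbitrary rule picked for illustration. The sample rows extend the ones shown earlier with one tied string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (ID INTEGER, Txt TEXT)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?)",
    [(1, "A"), (1, "Long A"), (2, "B"), (2, "Long B"), (2, "Long C"), (2, "BB")],
)

# Keep one row per ID: longest string first, alphabetical order as the tie-breaker.
conn.execute(
    """
    DELETE FROM sample
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY ID
                       ORDER BY LENGTH(Txt) DESC, Txt
                   ) AS rn
            FROM sample
        )
        WHERE rn <> 1
    )
    """
)

remaining = conn.execute("SELECT ID, Txt FROM sample ORDER BY ID").fetchall()
print(remaining)
```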

Problem with unique SQL query

I want to select all records, but have the query only return a single record per Product Name. My table looks similar to:
SellId ProductName Comment
1 Cake dasd
2 Cake dasdasd
3 Bread dasdasdd
where the Product Name is not unique. I want the query to return a single record per ProductName with results like:
SellId ProductName Comment
1 Cake dasd
3 Bread dasdasdd
I have tried this query,
Select distinct ProductName,Comment ,SellId from TBL#Sells
but it is returning multiple records with the same ProductName. My table is not really as simple as this; this is just a sample. What is the solution? Is it clear?
Select ProductName,
min(Comment) , min(SellId) from TBL#Sells
group by ProductName
If you only want one record per ProductName, you of course have to choose which value you want for the other fields.
If you aggregate (using GROUP BY) you can choose an aggregate function,
that is, a function that takes a list of values and returns only one. Here I have chosen MIN, the smallest value for each field.
NOTE: Comment and SellId can come from different records, since MIN is taken separately for each column.
Other aggregates you might find useful:
FIRST : first record encountered (not available in every DBMS; SQL Server, for example, has no FIRST aggregate)
LAST : last record encountered (same caveat as FIRST)
AVG : average
COUNT : number of records
Where they are supported, FIRST/LAST have the advantage that all fields come from the same record.
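The caveat about MIN mixing values from different records is easy to see with a small experiment. This sketch assumes SQLite via Python's sqlite3 module and uses invented comments chosen so that the smallest Comment and the smallest SellId for Cake sit on different rows. The second query shows one way to keep all columns from a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (SellId INTEGER PRIMARY KEY, ProductName TEXT, Comment TEXT)")
conn.executemany(
    "INSERT INTO Sells VALUES (?, ?, ?)",
    [(1, "Cake", "zzz"), (2, "Cake", "aaa"), (3, "Bread", "mmm")],
)

# Each MIN is evaluated independently, so Cake gets Comment from row 2 and SellId from row 1.
mixed = conn.execute(
    "SELECT ProductName, MIN(Comment), MIN(SellId) FROM Sells "
    "GROUP BY ProductName ORDER BY ProductName"
).fetchall()

# Pick one key per product first, then join back to read the whole row.
same_row = conn.execute(
    """
    SELECT s.ProductName, s.Comment, s.SellId
    FROM Sells AS s
    JOIN (SELECT MIN(SellId) AS SellId
          FROM Sells
          GROUP BY ProductName) AS m ON m.SellId = s.SellId
    ORDER BY s.ProductName
    """
).fetchall()

print(mixed)
print(same_row)
```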
SELECT S.ProductName, S.Comment, S.SellId
FROM
Sells S
JOIN (SELECT MAX(SellId) AS SellId
FROM Sells
GROUP BY ProductName) AS TopSell ON TopSell.SellId = S.SellId
This will get the latest comment as your selected comment assuming that SellId is an auto-incremented identity that goes up.
I know you've got an answer already, but I'd like to offer the approach that performed fastest for me in a similar situation. I'm assuming that SellId is the primary key and an identity column. You'd want an index on ProductName for best performance.
select
Sells.*
from
(
select
distinct ProductName
from
Sells
) x
join
Sells
on
Sells.ProductName = x.ProductName
and Sells.SellId =
(
select
top 1 s2.SellId
from
Sells s2
where
x.ProductName = s2.ProductName
Order By SellId
)
A slower method, (but still better than Group By and MIN on a long char column) is this:
select
*
from
(
select
*,ROW_NUMBER() over (PARTITION BY ProductName order by SellId) OccurenceId
from sells
) x
where
OccurenceId = 1
An advantage of this one is that it's much easier to read.
create table Sale
(
SaleId int not null
constraint PK_Sale primary key,
ProductName varchar(100) not null,
Comment varchar(100) not null
)
insert Sale
values
(1, 'Cake', 'dasd'),
(2, 'Cake', 'dasdasd'),
(3, 'Bread', 'dasdasdd')
-- Option #1 with over()
select *
from Sale
where SaleId in
(
select SaleId
from
(
select SaleId, row_number() over(partition by ProductName order by SaleId) RowNumber
from Sale
) tt
where RowNumber = 1
)
order by SaleId
-- Option #2
select *
from Sale
where SaleId in
(
select min(SaleId)
from Sale
group by ProductName
)
order by SaleId
drop table Sale

Resources