I'm trying to understand what this query is doing (SQL Server):
Select StudentId, left(fullname,charindex(' ',fullname + ' ')) as mysteryCol
From (select StudentId, ltrim (rtrim (fullname)) as fullname from STUDENTS) as S
And the query:
Select top 10 mysteryCol , count(*) as howMany from
(Select left (fullname, charindex (' ', fullname + ' ')) as mysteryCol
From (select StudentId, ltrim (rtrim (fullname)) as fullname from STUDENTS) as S) as Z
Group by mysteryCol
Having count (*) > 100
Order by 2 desc
I only understand that charindex finds the position of a space ' ' in fullname, but I don't understand what the final output of this is.
Thanks to everyone who helps.
Short answer: the first query extracts the first name from a full name. The second query groups by that first name, keeps only the first names that occur more than 100 times, and returns the 10 most frequent ones with their counts in descending order.
Explanation: From (select StudentId, ltrim (rtrim (fullname)) as fullname from STUDENTS) removes any leading and trailing spaces from fullname. The only spaces left in the name are those between its parts, if any.
charindex(' ',fullname + ' ') returns the position of the first space in the full name. Appending ' ' guarantees a match even when the name is a single word; in that case the result is the length of the name plus one.
left(fullname,charindex(' ',fullname + ' ')) returns everything up to and including that first space, which is the first name (with a trailing space when the name has more than one part). SQL Server ignores trailing spaces when comparing strings, so this does not affect the grouping in the second query.
Select top 10 mysteryCol , count(*) as howMany from
(Select left (fullname, charindex (' ', fullname + ' ')) as mysteryCol
From (select StudentId, ltrim (rtrim (fullname)) as fullname from STUDENTS) as S) as Z
Group by mysteryCol
Having count (*) > 100
Order by 2 desc
This query groups the rows by first name and counts the occurrences of each one. It keeps only the first names that occur more than 100 times and returns the 10 with the highest counts.
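For anyone who wants to step through the logic outside the database, here is a minimal sketch in Python (not T-SQL) of what the two queries compute. The function names first_name and top_first_names are made up for illustration, and the rstrip call stands in for SQL Server ignoring trailing spaces when it groups.

```python
from collections import Counter


def first_name(fullname: str) -> str:
    """Mimic LEFT(fullname, CHARINDEX(' ', fullname + ' ')) applied after LTRIM/RTRIM."""
    trimmed = fullname.strip(" ")            # ltrim(rtrim(fullname))
    pos = (trimmed + " ").index(" ") + 1     # CHARINDEX is 1-based
    return trimmed[:pos]                     # LEFT(trimmed, pos), includes the space


def top_first_names(fullnames, min_count=100, limit=10):
    """Mimic GROUP BY / HAVING count(*) > min_count / ORDER BY 2 DESC / TOP limit."""
    # rstrip is an assumption standing in for trailing-space-insensitive grouping
    counts = Counter(first_name(name).rstrip(" ") for name in fullnames)
    frequent = [(name, n) for name, n in counts.items() if n > min_count]
    return sorted(frequent, key=lambda pair: pair[1], reverse=True)[:limit]
```

With a small threshold, top_first_names(["John Doe"] * 3 + ["Ann Lee"] * 2 + ["Bob"], min_count=1) keeps John and Ann and drops Bob.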
I have a few strings containing numbers like the ones below, about 3000 records in total.
Column
------------
Cell 233567-3455
Cell123-4567
Cell#123-7449
Local 456-0987
1 616 468-7796
1234567-5x2345
234/625-1234
(C)755-7442
5732878-2
5721899-23
6712909-3
7894200-234
2144-57238
5673893/588218
437-4737-5772
How can I find records like the ones below?
Column
-------------
5732878-2
5721899-23
6712909-3
7894200-234
Once I find them, I need to split each one into two parts:
1st column | 2nd column
-----------|-----------
5732878 | 5732872
5721899 | 5721823
6712909 | 6712903
7894200 | 7894234
I tried to solve this using PATINDEX and CHARINDEX, but somehow it's not working. Please help.
I don't know your filtering logic to get to your intermediate set, but this should get your expected final result set. I assumed you only want records where the length of the string to the left of the hyphen is greater than the length on the right and also exclude records with more than 1 hyphen.
SELECT LEFT(telephone, CHARINDEX('-', telephone)-1) AS [1stTelephone],
STUFF(
--get the string before the hyphen
LEFT(telephone, CHARINDEX('-', telephone)-1),
--get the starting location of chars we are going to replace
LEN(LEFT(telephone, CHARINDEX('-', telephone)))-LEN(RIGHT(telephone, CHARINDEX('-', REVERSE(telephone))-1)),
--get the length of the section we are replacing
LEN(RIGHT(telephone, CHARINDEX('-', REVERSE(telephone))-1)),
--replace that section with the string after the hyphen
RIGHT(telephone, CHARINDEX('-', REVERSE(telephone))-1)
) AS [2nd telephone]
FROM your_table
WHERE LEN(LEFT(telephone, CHARINDEX('-', telephone))) > LEN(RIGHT(telephone, CHARINDEX('-', REVERSE(telephone))))
AND len(telephone) - len(REPLACE(telephone, '-', '')) = 1
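To make the STUFF arithmetic easier to follow, here is a rough Python sketch (not T-SQL) of the same rule: keep rows with exactly one hyphen where the part before the hyphen is longer than the part after it, then overwrite the tail of the first part with the second. The name split_phone is hypothetical.

```python
def split_phone(value: str):
    """Return (first, second) for 'base-suffix', or None when the row is filtered out."""
    if value.count("-") != 1:                # exactly one hyphen
        return None
    base, suffix = value.split("-")
    if len(base) <= len(suffix):             # left side must be longer than right side
        return None
    # Overwrite the last len(suffix) characters of base with suffix
    second = base[: len(base) - len(suffix)] + suffix
    return base, second
```

Like the query above, this sketch does not check that the value consists only of digits, so a row such as 'Cell123-4567' would also pass the filter.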
Somewhat dirty method (looks specifically for 7 digits followed by hyphen followed by any number of digits):
SELECT BasePhone AS Phone1, LEFT(BasePhone, 7-LEN(OtherPhoneEnd)) + OtherPhoneEnd AS Phone2
FROM (
SELECT LEFT(Telephone, 7) AS BasePhone, SUBSTRING(Telephone,9,7) AS OtherPhoneEnd
FROM Telephones
WHERE Telephone LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9]-%'
) AS T
Based on the information you have given, I assumed that you want numbers with a hyphen (-) at the 8th position. Try this:
create table #TelNo (
Tel varchar(30)
)
insert #TelNo(Tel)
values ('5732878-2'),
('5721899-23'),
('6712909-3'),
('7894200-234'),
('2144-57238'),
('5673893/588218'),
('437-4737-5772')
select Tel, LEFT(Tel, Len(tel) - len(suffix)) + suffix [SecondTel] from (
select substring(Tel, 1, 7) [Tel], substring(Tel, 9, 10) [suffix] from #TelNo
where CHARINDEX('-', Tel) = 8
)a
You could use something like this:
DDL
use tempdb
create table TelNo (
Tel varchar(30)
)
insert TelNo(Tel)
values ('5732878-2'),
('5721899-23'),
('6712909-3'),
('7894200-234'),
('2144-57238'),
('5673893/588218'),
('437-4737-5772')
Code
select Tel,
case
when Tel like '%_-[0-9]' then left(Tel, len(Tel)-2)
when Tel like '%__-[0-9][0-9]' then left(Tel, len(Tel)-3)
when Tel like '%___-[0-9][0-9][0-9]' then left(Tel, len(Tel)-4)
else Tel
end Tel1,
case
when Tel like '%_-[0-9]' then left(Tel, len(Tel)-3) + right(Tel, 1)
when Tel like '%__-[0-9][0-9]' then left(Tel, len(Tel)-5) + right(Tel, 2)
when Tel like '%___-[0-9][0-9][0-9]' then left(Tel, len(Tel)-7) + right(Tel, 3)
else NULL
end Tel2
from TelNo
We handle a lot of sensitive data and I would like to mask passenger names by keeping only the first and last letter of each name part, joined by three asterisks (***).
For example, the name 'John Doe' becomes 'J***n D***e'.
For a name that consists of two parts this is doable by finding the space, using the expression:
LEFT(CardHolderNameFromPurchase, 1) +
'***' +
CASE WHEN CHARINDEX(' ', PassengerName) = 0
THEN RIGHT(PassengerName, 1)
ELSE SUBSTRING(PassengerName, CHARINDEX(' ', PassengerName) -1, 1) +
' ' +
SUBSTRING(PassengerName, CHARINDEX(' ', PassengerName) +1, 1) +
'***' +
RIGHT(PassengerName, 1)
END
However, the passenger name can have more than two parts; there is no real limit to it. How can I find the indices of all spaces within an expression? Or should I tackle this problem in a different way?
Any help or pointer is much appreciated!
This solution does what you want it to, but is really the wrong approach to use when trying to hide personally identifiable data, as per Gordon's explanation in his answer.
SQL:
declare @t table(n nvarchar(20));
insert into @t values('John Doe')
,('JohnDoe')
,('John Doe Two')
,('John Doe Two Three')
,('John O''Neill');
select n
,stuff((select ' ' + left(s.item,1) + '***' + right(s.item,1)
from dbo.fn_StringSplit4k(t.n,' ',null) as s
for xml path('')
),1,1,''
) as mask
from @t as t;
Output:
+--------------------+-------------------------+
| n | mask |
+--------------------+-------------------------+
| John Doe | J***n D***e |
| JohnDoe | J***e |
| John Doe Two | J***n D***e T***o |
| John Doe Two Three | J***n D***e T***o T***e |
| John O'Neill | J***n O***l |
+--------------------+-------------------------+
String splitting function based on Jeff Moden's Tally Table approach:
create function [dbo].[fn_StringSplit4k]
(
@str nvarchar(4000) = ' ' -- String to split.
,@delimiter as nvarchar(1) = ',' -- Delimiting value to split on.
,@num as int = null -- Which value to return, null returns all.
)
returns table
as
return
-- Start tally table with 10 rows.
with n(n) as (select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1)
-- Select the same number of rows as characters in @str as incremental row numbers.
-- Cross joins increase exponentially to a max possible 10,000 rows to cover largest @str length.
,t(t) as (select top (select len(isnull(@str,'')) a) row_number() over (order by (select null)) from n n1,n n2,n n3,n n4)
-- Return the position of every value that follows the specified delimiter.
,s(s) as (select 1 union all select t+1 from t where substring(isnull(@str,''),t,1) = @delimiter)
-- Return the start and length of every value, to use in the SUBSTRING function.
-- ISNULL/NULLIF combo handles the last value where there is no delimiter at the end of the string.
,l(s,l) as (select s,isnull(nullif(charindex(@delimiter,isnull(@str,''),s),0)-s,4000) from s)
select rn
,item
from(select row_number() over(order by s) as rn
,substring(@str,s,l) as item
from l
) a
where rn = @num
or @num is null;
GO
If you consider PassengerName as sensitive information, then you should not be storing it in clear text in generally accessible tables. Period.
There are several different options.
One is to have reference tables for sensitive information. Any table that references this would have an id rather than the name. Voilà. No sensitive information is available without access to the reference table, and that access would be severely restricted.
A second method is reversible encryption. The stored value is gibberish, but with the right key it can be transformed back into a meaningful value. Typical methods for this are public-key encryption algorithms such as the one devised by Rivest, Shamir, and Adleman (RSA).
If you want to keep the first and last letters of names, I would be really careful about Asian names. Many of them consist of two or three letters when written in Latin script. That isn't much hiding. SQL Server does not have simple mechanisms to do this. You can write a user-defined function with a loop to manage the process. However, I view this as the least secure and least desirable approach.
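To make the point about short names concrete, here is a small Python sketch (not T-SQL) of the first-letter, three asterisks, last-letter rule from the question. The name mask_name is hypothetical, and the treatment of one-letter parts (the letter appears on both sides) is an assumption that follows from applying the left/right rule literally.

```python
def mask_name(full_name: str) -> str:
    """Mask each space-separated part as first letter + '***' + last letter."""
    parts = [part for part in full_name.split(" ") if part]   # skip empty parts from repeated spaces
    return " ".join(part[0] + "***" + part[-1] for part in parts)
```

A two-letter part such as 'Li' comes out as 'L***i', which still shows every letter of the original, so the mask hides nothing for names like that.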
This uses Jeff Moden's DelimitedSplit8K, as well as the new functionality in SQL Server 2017 STRING_AGG. As I don't know what version you're using, I've just gone "whole hog" and assumed you're using the latest version.
Jeff's function is invaluable here, as it returns the ordinal position, something which Microsoft have foolishly omitted from their own function, STRING_SPLIT (and didn't add in 2017 either). Ordinal position is key here, so we can't make use of the built in function.
WITH VTE AS(
SELECT *
FROM (VALUES ('John Doe'),('Jane Bloggs'),('Edgar Allan Poe'),('Mr George W. Bush'),('Homer J Simpson')) V(FullName)),
Masking AS (
SELECT *,
ISNULL(STUFF(Item, 2, LEN(item) -2,'***'), Item) AS MaskedPart
FROM VTE V
CROSS APPLY dbo.delimitedSplit8K(V.Fullname, ' '))
SELECT STRING_AGG(MaskedPart,' ') WITHIN GROUP (ORDER BY ItemNumber) AS MaskedFullName
FROM Masking
GROUP BY Fullname;
Edit: Never mind, the OP has commented that they are using 2008, so STRING_AGG is out of the question. @iamdave, however, has posted an answer which is very similar to my own, just done the "old fashioned XML way".
Depending on your version of SQL Server, you may be able to use the built-in string split to rows on spaces in the name, do your string formatting, and then roll back up to name level using an XML path.
create table dataset (id int identity(1,1), name varchar(50));
insert into dataset (name) values
('John Smith'),
('Edgar Allen Poe'),
('One Two Three Four');
with split as (
select id, cs.Value as Name
from dataset
cross apply STRING_SPLIT (name, ' ') cs
),
formatted as (
select
id,
name,
left(name, 1) + '***' + right(name, 1) as out
from split
)
SELECT
id,
(SELECT ' ' + out
FROM formatted b
WHERE a.id = b.id
FOR XML PATH('')) [out_name]
FROM formatted a
GROUP BY id
Result:
id out_name
1 J***n S***h
2 E***r A***n P***e
3 O***e T***o T***e F***r
You can do that using this function.
create function [dbo].[fnMaskName] (@var_name varchar(100))
RETURNS varchar(100)
WITH EXECUTE AS CALLER
AS
BEGIN
declare @var_part varchar(100)
declare @var_return varchar(100)
declare @n_position smallint
set @var_return = ''
set @n_position = 1
WHILE @n_position<>0
BEGIN
SET @n_position = CHARINDEX(' ', @var_name)
IF @n_position = 0
SET @n_position = LEN(@var_name)
SET @var_part = SUBSTRING(@var_name, 1, @n_position)
SET @var_name = SUBSTRING(@var_name, @n_position+1, LEN(@var_name))
if @var_part<>''
SET @var_return = @var_return + stuff(@var_part, 2, len(@var_part)-2, replicate('*',len(@var_part)-2)) + ' '
END
RETURN(@var_return)
END
Let's say I have a simple query like this:
select
subgroup,
subgroup + ' (' + cast(grade as varchar(1)) + 'G)' as grade,
count(*) as 'count'
From table_empl
where year(EnterDate) = year(getdate())
group by subgroup, grade
order by grade
It seems that order by grade is being ordered by the alias grade instead of the actual column grade; at least that's what the result shows.
Is this correct?
Since I can't change the columns that are included in the result, is the solution to add an alias to the actual column? Something like this?
select
grade as 'grade2',
subgroup,
subgroup + ' (' + cast(grade as varchar(1)) + 'G)' as grade,
count(*) as 'count'
From table_empl
where year(EnterDate) = year(getdate())
group by subgroup,grade
order by grade2
If you prefix the column name by its table name (or an alias given to the table in the FROM clause) in the ORDER BY clause, then it will use the column, not the expression computed in the SELECT clause and given the same name as the column.
So this should sort using the original grade column:
select
subgroup,
subgroup + ' (' + cast(grade as varchar(1)) + 'G)' as grade,
count(*) as 'count'
From table_empl
where year(EnterDate) = year(getdate())
group by subgroup, grade
order by table_empl.grade
Or:
select
subgroup,
subgroup + ' (' + cast(grade as varchar(1)) + 'G)' as grade,
count(*) as 'count'
From table_empl t
where year(EnterDate) = year(getdate())
group by subgroup, grade
order by t.grade
ORDER BY is processed after every other clause, including SELECT, so in this case it is correct for it to pick the alias rather than the underlying column.
The clauses are processed in the following order:
FROM
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
You can prefix the column with the table name (or the table alias) to refer to the table column instead.
A very good question. Apparently, the official documentation does not provide a direct answer to it. However, one can infer the observed behaviour from the following fact: the difference between a column alias and a column is that the latter can be prefixed with its parent table name (or alias), whereas the former cannot.
Since you didn't specify the table name in the ORDER BY clause, the column alias takes precedence.
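If you want to watch the alias-versus-column difference without a SQL Server instance, here is a small sketch using Python's built-in sqlite3 module. Note the assumption: this runs on SQLite, not SQL Server. SQLite also resolves a bare ORDER BY name to the output alias first, so it only illustrates the rule described above and proves nothing about SQL Server itself. The two sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_empl (subgroup TEXT, grade INTEGER)")
# Made-up rows chosen so that the two orderings differ
conn.executemany("INSERT INTO table_empl VALUES (?, ?)", [("B", 1), ("A", 2)])

# The computed column reuses the name of the underlying column, as in the question
select = "SELECT subgroup || ' (' || grade || 'G)' AS grade FROM table_empl t "

# Bare name: resolved to the alias, so rows are sorted by the computed text
by_alias = [row[0] for row in conn.execute(select + "ORDER BY grade")]

# Qualified name: resolved to the table column, so rows are sorted by the number
by_column = [row[0] for row in conn.execute(select + "ORDER BY t.grade")]
```

Here by_alias starts with 'A (2G)' because the text sorts alphabetically, while by_column starts with 'B (1G)' because grade 1 sorts before grade 2.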
I have a front-end search box where the user can search for someone by first name, middle name, surname or job title, and the bulk of the backend code looks like this:
SELECT TOP 50 * FROM (SELECT [EmployeeId], SUM(MatchOrder) as MatchOrder
FROM (SELECT
[EmployeeId],
CASE WHEN A.[EmployeeFieldId] = 4 Then 15 --Surname
WHEN A.[EmployeeFieldId] in (1, 2) Then 15 --PreferredName, FirstName
WHEN A.[EmployeeFieldId] = 3 Then 5 --MiddleName
WHEN A.[EmployeeFieldId] = 5 Then 20 --JobTitle
ELSE 3
END as MatchOrder
FROM [latest].[EmployeeAttributes] A
WHERE (' + @search + ')
) internal
GROUP BY EmployeeId) A
join dbo.vwEmployees E on E.EmployeeId = A.EmployeeId -- TEMP
ORDER BY 2 DESC'
Each EmployeeId is given a score (MatchOrder), totalled according to how many of the above criteria are met (e.g. first name + surname match = 30), and the results are ordered by that score for display in the front end. The problem is that when someone's first name and surname are very similar, e.g. Patrick Patterson, and I search for Pat Rice, then Patrick Patterson (30 pts) appears above Patrick Rice (30 pts) because "Pat" is matched twice.
I'd like either to lower the score when the same search term matches twice, or to modify my CASE expression to handle this somehow (a nested CASE?).
Do you know how I can combat this? Any help would be appreciated.
Thanks
Since [EmployeeFieldId] is always mapped to the same [MatchOrder], you should be able to control this by including [EmployeeFieldId] in the "internal" result set and slapping a DISTINCT clause on the SELECT:
SELECT DISTINCT
[EmployeeId],
[EmployeeFieldId],
CASE WHEN A.[EmployeeFieldId] = 4 Then 15 --Surname
WHEN A.[EmployeeFieldId] in (1, 2) Then 15 --PreferredName, FirstName
WHEN A.[EmployeeFieldId] = 3 Then 5 --MiddleName
WHEN A.[EmployeeFieldId] = 5 Then 20 --JobTitle
ELSE 3
END as MatchOrder
FROM [latest].[EmployeeAttributes] A
WHERE (' + @search + ')
That way, each field ID counts at most once towards an employee's score.
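As a rough illustration of what the DISTINCT changes, here is a Python sketch (not T-SQL) of the scoring. The weights are copied from the CASE expression in the question; the function name match_scores and the sample rows in the usage note are made up.

```python
# Weights from the CASE expression: field id -> points, anything else scores 3
FIELD_SCORES = {1: 15, 2: 15, 3: 5, 4: 15, 5: 20}


def match_scores(matches):
    """matches: iterable of (employee_id, field_id) rows, possibly repeated."""
    totals = {}
    for employee_id, field_id in set(matches):   # set() plays the role of DISTINCT
        totals[employee_id] = totals.get(employee_id, 0) + FIELD_SCORES.get(field_id, 3)
    return totals
```

For example, match_scores([(1, 2), (1, 2), (1, 4), (2, 5)]) counts the repeated (1, 2) row once, so employee 1 scores 30 rather than 45. Matches on two different fields, such as first name and surname, are still both counted.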
I dynamically select a string built using another string. So, if string1='David Banner', then MyDynamicString should be 'DBanne'
Select
...
, Left(
left((select top 1 strval from dbo.SPLIT(string1,' ')) --first word
,1) --first character
+ (select top 1 strval from dbo.SPLIT(string1,' ')
--second word
where strval not in (select top 1 strval from dbo.SPLIT(string1,' ')))
,6) --1st character of 1st word, followed by up to 5 characters of second word
[MyDynamicString]
,...
From table1 Join table2 on table1pkey=table2fkey
Where MyDynamicString <> table2.someotherfield
I know table2.someotherfield is not equal to the dynamic string. However, when I replace MyDynamicString in the Where clause with the full left(left(etc.. function, it works as expected.
Can I not reference this string later in the query? Do I have to build it using the left(left(etc.. function each time in the where clause?
If you do it as you have it above, then the answer is yes, you have to recreate it again in the where clause.
As an alternative, you could use an inline view:
Select
...
, X.theString
,...
From table1 Join table2 on table1pkey=table2fkey
, (SELECT
string1
,Left(
left((select top 1 strval from dbo.SPLIT(string1,' ')) --first word
,1) --first character
+ (select top 1 strval from dbo.SPLIT(string1,' ')
--second word
where strval not in (select top 1 strval from dbo.SPLIT(string1,' ')))
,6) theString --1st character of 1st word, followed by up to 5 characters of second word
FROM table1
) X
Where X.theString <> table2.someotherfield
AND X.string1 = <whatever you need to join it to>
In SQL Server 2008 you can use the alias in the ORDER BY clause, but not in the WHERE clause.
Why not wrap the calculation up into a user defined function (UDF) to avoid violating DRY and also to make the query more readable?
If it matters, here is an article that explains why you can't use a column alias in a HAVING, WHERE, or GROUP BY.