Find/replace string values based upon table values - sql-server

I have a somewhat unusual need to find/replace values in a string from values in a separate table.
Basically, I need to standardize a bunch of addresses, and one of the steps is to replace things like St, Rd or Blvd with Street, Road or Boulevard. I was going to write a function with a bunch of nested REPLACE() statements, but this is 1) inefficient; and 2) not practical. There are over 500 possible abbreviations for street types according to the USPS website.
What I'd like to do is something akin to:
REPLACE(Address1, Col1, Col2) where col1 and col2 are abbreviation and full street type in a separate table.
Anyone have any insight into something like this?

You can do such replacements using a recursive CTE. Something like this:
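with r as (
-- number the replacement rows so they can be applied one at a time
select r.*, row_number() over (order by id) as seqnum
from replacements r
),
cte as (
-- anchor: apply the first replacement to every address
select replace(t.address, r.col1, r.col2) as address, r.seqnum as i
from t cross join
r
where r.seqnum = 1
union all
-- recursive step: apply replacement i + 1 to the result of step i
select replace(cte.address, r.col1, r.col2), cte.i + 1
from cte join
r
on r.seqnum = cte.i + 1
)
select cte.*
from (select cte.*, max(i) over () as maxi
from cte
) cte
where maxi = i
-- the default recursion limit is 100 levels; 0 removes the limit
option (maxrecursion 0);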
That said, this is basically iteration. It will be quite expensive to do this on a table where there are 500 replacements per row.
SQL is probably not the best tool for this.
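If the replacement is moved outside the database, as the last remark hints, each address can be handled in a single pass with a lookup table instead of 500 iterations. The following is only a minimal Python sketch under assumptions: the three dictionary entries stand in for the real abbreviation table, and the name `expand_street_types` is made up for illustration. Unlike a plain REPLACE(), it matches whole words only, so "St" inside "Stone" is left alone.

```python
import re

# Hypothetical sample of the abbreviation table (col1 -> col2).
REPLACEMENTS = {"St": "Street", "Rd": "Road", "Blvd": "Boulevard"}

# One alternation of all abbreviations, matched as whole words only.
# Longer keys come first so a longer abbreviation wins over a shorter prefix.
# An optional trailing period ("Rd.") is consumed with the abbreviation.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(REPLACEMENTS, key=len, reverse=True))
    + r")\b\.?"
)


def expand_street_types(address):
    """Replace every known abbreviation in one pass over the string."""
    return _PATTERN.sub(lambda m: REPLACEMENTS[m.group(1)], address)


print(expand_street_types("12 Main St"))
print(expand_street_types("5 Stone Rd."))
```

The match is case-sensitive and purely textual, so it cannot tell "St" (Street) from "St" (Saint); real address data would need extra rules for that.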

Related

Top vs Rank/Row Number functions - Which performs higher?

I attempted to Google the Cost of using Top in a query vs using a Ranking or Row_Number type function.
Does the cost of each depend on the situation or can the cost of these two features be determined across the board for all situations?
Some mock SQL using a simple CTE to demonstrate my question is below:
WITH fData AS
(
SELECT 1 AS ID, 'John' AS fName, 'Black' AS lName, CAST('05/19/1975' AS DATE) AS birthDate UNION ALL
SELECT 2 AS ID, 'John' AS fName, 'Black' AS lName, CAST('04/1/1989' AS DATE) AS birthDate UNION ALL
SELECT 3 AS ID, 'John' AS fName, 'Black' AS lName, CAST('11/16/1995' AS DATE) AS birthDate UNION ALL
SELECT 4 AS ID, 'John' AS fName, 'Black' AS lName, CAST('01/16/1968' AS DATE) AS birthDate UNION ALL
SELECT 5 AS ID, 'John' AS fName, 'Black' AS lName, CAST('01/16/1968' AS DATE) AS birthDate
)
/* Using TOP 1 vs Row_Number() - Uncomment this and comment the below to VIEW TOP version */
--SELECT TOP 1 d.ID, d.fName, d.lName, d.birthDate
--FROM fData d
--ORDER BY d.birthDate
/* Using the below vs TOP 1 */
SELECT * FROM
( SELECT d.ID, d.fName, d.lName, d.birthDate, Row_Number() OVER (ORDER BY d.birthDate) AS ranker
FROM fData d
) r
WHERE r.ranker = 1
When using TOP there is no need to wrap a secondary query around it, and it looks cleaner. After applying Row_Number or a ranking function you must then wrap the query to say which rows you want, either by applying WHERE ranker = 1 or ranker <= 5 to achieve the same as TOP 1 or TOP 5.
Which is better or faster, if this is even something that can be determined?
In the case of your example the TOP is somewhat more efficient.
The execution plan for TOP is below
The TOP N sort with N=1 just needs to keep track of the row with the lowest birthDate that it sees.
For the row_number query it recognises that the row number is always ascending and does itself add a TOP 1 to the plan but it doesn't combine the separated TOP and SORT into a TOP N Sort - so it does a full sort of all 5 rows.
In the case that an index supplies rows in the desired order without the need for a sort there won't be much in it. The row_number query will have an extra couple of operators that are fairly inexpensive anyway.
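The difference between the two plans can be pictured outside SQL Server. The following is a loose Python analogy, not a description of how the engine is implemented: a TOP N Sort with N=1 behaves like one pass that remembers the smallest value seen so far, while the row_number plan sorts every row and then keeps the first. The sample rows mirror the IDs and birth dates in the question's fData; the function names are made up.

```python
from datetime import date

# (ID, birthDate) pairs taken from the question's sample data.
rows = [
    (1, date(1975, 5, 19)),
    (2, date(1989, 4, 1)),
    (3, date(1995, 11, 16)),
    (4, date(1968, 1, 16)),
    (5, date(1968, 1, 16)),
]


def top_1(rows):
    """Analogy for TOP N Sort with N=1: one pass, keep only the current minimum."""
    best = None
    for row in rows:
        if best is None or row[1] < best[1]:
            best = row
    return best


def row_number_1(rows):
    """Analogy for the row_number plan: sort every row, then keep the first."""
    return sorted(rows, key=lambda r: r[1])[0]


print(top_1(rows), row_number_1(rows))
```

Both functions pick a row with the earliest birth date; with ties (IDs 4 and 5 here) SQL Server makes no promise about which row wins unless the ORDER BY includes a tie-breaker.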
WHY use ranking functions in SQL Server when it has TOP
Ranking functions in general are more powerful than TOP.
For the cases where both would work consider that TOP is a fairly ancient proprietary syntax and not standard SQL. It was in the product a long time before window functions were added. If portable SQL is a concern you should not use TOP.
Though you might not use ranking functions either, as another (standard SQL) alternative is
SELECT d.ID, d.fName, d.lName, d.birthDate
FROM fData d
ORDER BY d.birthDate
OFFSET 0 ROWS
FETCH NEXT 1 ROW ONLY
which gives the same plan as TOP 1

How can I access a specific field in a named subquery when the field name might not be unique?

I am trying to create a routine that can accept an SQL query as a string and the [table].[primaryKey] of the primary record in the returned dataset, then wrap that original query to implement pagination (return records 40-49 when requesting page 4 and 10 records per page).
The dataset returned by the original queries will frequently contain multiple instances of the primary record, one for each occurrence of supporting records. For the example provided, if a customer has three phone numbers on record the results for that customer in the original query would look like:
{5; John Smith; 205 W. Fort St; 17; Home; 123-123-4587}
{5; John Smith; 205 W. Fort St; 18; Work; 123-123-8547}
{5; John Smith; 205 W. Fort St; 19; Mobile; 123-123-1147}
I'm almost there, I think, with the following query:
DECLARE @PageNumber int = 4;
DECLARE @RecordsPerPage int = 10;
WITH OriginalQuery AS (
SELECT [Customer].[Id],
[Customer].[Name],
[Customer].[Address],
[Phone].[Id],
[Phone].[Type],
[Phone].[Number]
FROM [Customer] INNER JOIN [Phone] ON [Customer].[Id] = [Phone].[CustomerId]
)
SELECT [WrappedQuery].[RowNumber], [OriginalQuery].* FROM (
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) [RowNumber], *
FROM (
SELECT DISTINCT [OriginalQuery].[{Customer.Id}] [PrimaryKey]
FROM [OriginalQuery]
) [RowNumberQuery]
) [WrappedQuery]
INNER JOIN [OriginalQuery] ON [WrappedQuery].[PrimaryKey] = [OriginalQuery].[{Customer.Id}]
WHERE [WrappedQuery].[RowNumber] >= @PageNumber
AND [WrappedQuery].[RowNumber] < @PageNumber + @RecordsPerPage
This solution performs a SELECT DISTINCT on the primary key for the Primary (Customer) record and uses the SQL routine Row_Number() then joins the result with the results of the original query such that each unique primary (customer) record is numbered 1 - {end of file}, and I can pull only the RowNumber counts that I want.
But because OriginalQuery may have multiple fields named Id (from different tables), I can't figure out how to properly access [Customer].[Id] in my SELECT DISTINCT clause of [RowNumberQuery] or in the INNER JOIN.
Is there a better way to implement pagination at the SQL level, or a more direct method of accessing the field I need from within the subquery based on the table to which it belongs?
EDIT:
I've caused confusion in the pagination I am looking for. I am using Dapper in C# to compile the resulting dataset into individual complex objects, so the goal in the example would be to retrieve customers 31-40 in the list regardless of how many individual records exist for each customer. If Customer 31 had five phone records, Customer 32 had three phone records, Customer 33 had 1 phone record, and the remaining seven customers had two phone records each, I would expect the resulting dataset to contain 23 records total, but only 10 distinct customers.
SOLUTION
Thank you for all of the assistance, and I apologize for those areas I should have clarified sooner. I am creating a toolset that will allow C# Data Access Libraries to implement a set of standard parameters. If I have an option to implement the pagination in an internal function that can accept the SQL statement, I can defer to the toolset and not have to remember (or count on others to remember) to add the appropriate text each time. I'll set it up to return the finished objects, but if I were going to just modify the original query string it would look like:
public static string AddPagination(string sql, string primaryKey, Parameter requestParameters)
{
return $"WITH OriginalQuery AS ({sql.Replace("SELECT ", $"SELECT DENSE_RANK() OVER (ORDER BY {primaryKey}) AS PrimaryRecordCount, ", StringComparison.OrdinalIgnoreCase)}) " +
$"SELECT TOP ({requestParameters.MaxRecords}) * " +
$"FROM OriginalQuery " +
$"WHERE PrimaryRecordCount >= 1 + (({requestParameters.PageNumber - 1}) * {requestParameters.RecordsPerPage})" +
$" AND PrimaryRecordCount <= {requestParameters.PageNumber} * {requestParameters.RecordsPerPage}";
}
Just give your columns a different alias in your original query, e.g. [Customer].[Id] AS CustomerId, [Phone].[Id] AS PhoneId..., then you can reference OriginalQuery.CustomerId, or OriginalQuery.PhoneId
e.g.
DECLARE @PageNumber int = 4;
DECLARE @RecordsPerPage int = 10;
WITH OriginalQuery AS (
SELECT [Customer].[Id] AS CustomerId,
[Customer].[Name],
[Customer].[Address],
[Phone].[Id] AS PhoneId,
[Phone].[Type],
[Phone].[Number]
FROM [Customer] INNER JOIN [Phone] ON [Customer].[Id] = [Phone].[CustomerId]
)
SELECT [WrappedQuery].[RowNumber], [OriginalQuery].* FROM (
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) [RowNumber], *
FROM (
SELECT DISTINCT [OriginalQuery].[CustomerId] [PrimaryKey]
FROM [OriginalQuery]
) [RowNumberQuery]
) [WrappedQuery]
INNER JOIN [OriginalQuery] ON [WrappedQuery].[PrimaryKey] = [OriginalQuery].[CustomerId]
WHERE [WrappedQuery].[RowNumber] >= @PageNumber
AND [WrappedQuery].[RowNumber] < @PageNumber + @RecordsPerPage
It's worth noting that your paging logic is wrong too. Currently you start at the page number itself and stop at the page number plus the number of records per page, so you are searching for:
Page 1: Customers 1 - 10
Page 2: Customers 2 - 11
Page 3: Customers 3 - 12
Your logic should be:
WHERE [WrappedQuery].[RowNumber] >= 1 + ((@PageNumber - 1) * @RecordsPerPage)
AND [WrappedQuery].[RowNumber] <= (@PageNumber * @RecordsPerPage)
Page 1: Customers 1 - 10
Page 2: Customers 11 - 20
Page 3: Customers 21 - 30
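The corrected bounds can be checked with a few lines of arithmetic. This is a small Python sketch of the same formula; the function name `page_bounds` is made up for illustration.

```python
def page_bounds(page_number, records_per_page):
    """Return the inclusive (first, last) row numbers for a 1-based page."""
    first = 1 + (page_number - 1) * records_per_page
    last = page_number * records_per_page
    return first, last


for page in (1, 2, 3):
    print(page, page_bounds(page, 10))
```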
With that being said, you could just use DENSE_RANK() Rather than ROW_NUMBER which would simplify everything. I think this would give you the same result:
DECLARE @PageNumber int = 4;
DECLARE @RecordsPerPage int = 10;
WITH OriginalQuery AS (
SELECT c.Id AS CustomerId,
c.Name,
c.Address,
p.Id AS PhoneId,
p.Type,
p.Number,
DENSE_RANK() OVER(ORDER BY c.Id) AS RowNumber
FROM Customer AS c INNER JOIN Phone AS p ON c.Id = p.CustomerId
)
SELECT oq.CustomerId, oq.Name, oq.Address, oq.PhoneId, oq.Type, oq.Number
FROM OriginalQuery AS oq
WHERE oq.RowNumber >= 1 + ((@PageNumber - 1) * @RecordsPerPage)
AND oq.RowNumber <= (@PageNumber * @RecordsPerPage);
I've added table aliases to try and make the code a bit cleaner, and also removed all the unnecessary square brackets. This is not necessary, but I personally find them quite hard on the eye, and only use them to escape key words.
Another difference is that in adding ORDER BY c.Id you ensure consistent results for your paging. Using ORDER BY (SELECT NULL) implies that you don't care about the order, but you should if you are using it for paging.
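To see why a dense rank pages by customer rather than by row, here is a rough Python sketch of the same idea. The rows, the function name `dense_rank_page` and its parameters are hypothetical; only the ranking and the bounds formula come from the query above.

```python
def dense_rank_page(rows, key, page_number, records_per_page):
    """Return all rows whose key falls on the requested page of distinct keys."""
    # Dense rank: position of each distinct key in sorted order, starting at 1.
    # Every row of the same customer therefore shares one rank.
    ranks = {k: i for i, k in enumerate(sorted({key(r) for r in rows}), start=1)}
    first = 1 + (page_number - 1) * records_per_page
    last = page_number * records_per_page
    return [r for r in rows if first <= ranks[key(r)] <= last]


# Hypothetical joined rows: (customer_id, phone_id).
rows = [(5, 17), (5, 18), (5, 19), (7, 20), (9, 21), (9, 22)]

# Two customers per page: page 1 holds customers 5 and 7, page 2 holds customer 9.
print(dense_rank_page(rows, key=lambda r: r[0], page_number=1, records_per_page=2))
print(dense_rank_page(rows, key=lambda r: r[0], page_number=2, records_per_page=2))
```

Page 1 returns four rows for two customers, which is the behaviour described in the question's edit: the page size counts distinct customers, not result rows.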
There are many concerns with what you are trying to do and you might be better off explaining why you are trying to make this process.
SQL query as a string
You are receiving a SQL query as a string, how are you parsing that string into the OriginalQuery CTE? This has both concerns about sql injection and concerns about global temp tables if you are using those.
Secondly, your example isn't doing pagination as it is commonly understood. If someone were to request page 1 with 10 records per page, the calling application would expect to receive the first 10 records of the result set, but your example will return all records for the first 10 customers. That means the result could be 40+ records if each customer had 4 phone numbers.
You should take a look at OFFSET and FETCH NEXT, and reconsider why you need to parse an arbitrary SQL string. There is probably a better way to do that.
Here is a rough example using OFFSET and FETCH NEXT from a static query, returning only @RecordsPerPage records.
DECLARE @PageNumber int = 1;
DECLARE @RecordsPerPage int = 10;
SELECT [Customer].[Id],
[Customer].[Name],
[Customer].[Address],
[Phone].[Id],
[Phone].[Type],
[Phone].[Number]
FROM [Customer] INNER JOIN [Phone] ON [Customer].[Id] = [Phone].[CustomerId]
ORDER BY [Customer].[Id]
OFFSET (@PageNumber-1)*@RecordsPerPage ROWS
FETCH NEXT @RecordsPerPage ROWS ONLY
If you wanted to return all records for the first @RecordsPerPage customers that have a corresponding phone number, then it would be something like...
DECLARE @PageNumber int = 1;
DECLARE @RecordsPerPage int = 10;
SELECT [Customer].[Id],
[Customer].[Name],
[Customer].[Address],
[Phone].[Id],
[Phone].[Type],
[Phone].[Number]
FROM [Customer] INNER JOIN [Phone] ON [Customer].[Id] = [Phone].[CustomerId]
WHERE Customer.ID IN (
SELECT DISTINCT Customer.ID FROM Customer INNER JOIN [Phone] ON [Customer].[Id] = [Phone].[CustomerId]
ORDER BY [Customer].[Id]
OFFSET (@PageNumber-1)*@RecordsPerPage ROWS
FETCH NEXT @RecordsPerPage ROWS ONLY
)
This does leave a question, what is the point of this query when the calling application can just use their own OFFSET and FETCH NEXT? They already have the SQL to generate the initial dataset, all they need to do is add OFFSET / FETCH NEXT to the end of it and they have their own pagination without trying to wrap it in a procedure of some sort.
To create a comparison, would you create a stored procedure that accepts a SQL string and then filters specific fields by specific values? Or would the people calling that stored procedure just add a Where clause to their own queries instead?
You can use an alias for each duplicated column name.
For example:
WITH OriginalQuery AS (
SELECT [Customer].[Id] as CustomerID,
[Customer].[Name],
[Customer].[Address],
[Phone].[Id] as PhoneID,
[Phone].[Type],
[Phone].[Number]
FROM [Customer] INNER JOIN [Phone] ON [Customer].[Id] = [Phone].[CustomerId]
)
Now you can refer to the two ids by their aliases in the next query.

Finding point of interest on a square wave using sql

Good day,
I have a sql table with the following setup:
DataPoints{ DateTime timeStampUtc , bit value}
The points are on a minute interval, and store either a 1(on) or a 0(off).
I need to write a stored procedure to find the points of interest from all the data points.
I have a simplified drawing below:
I need to find the corner points only. Please note that there may be many data points between a value change. For example:
{0,0,0,0,0,0,0,1,1,1,1,0,0,0}
This is my thinking atm (high level)
Select timeStampUtc, Value
From Data Points
Where Value before or value after differs by 1 or -1
I am struggling to convert this concept to sql, and I also have a feeling there is a more elegant mathematical solution that I am not aware of. This must be a common problem in electronics?
I have wrapped the table into a CTE. Then, I am joining every row in the CTE to the previous row of itself (by row number). Also, I've added a condition that the consecutive rows should differ in value.
This would return you all rows where the value changes.
;WITH CTE AS(
SELECT ROW_NUMBER() OVER(ORDER BY TimeStampUTC) AS id, VALUE, TIMESTAMPUTC
FROM DataPoints
)
SELECT CTE.TimeStampUTC as "Time when the value changes", CTE.id, *
FROM CTE
INNER JOIN CTE as CTE2
ON CTE.id = CTE2.id + 1
AND CTE.Value != CTE2.Value
Here's a working fiddle: http://sqlfiddle.com/#!6/a0ddc/3
If I got it correct, you are looking for something like this:
with cte as (
select * from (values (1,0),(2,0),(3,1),(4,1),(5,0),(6,1),(7,0),(8,0),(9,1)) t(a,b)
)
select
min(a), b
from (
select
a, b, sum(c) over (order by a rows unbounded preceding) grp
from (
select
*, iif(b = lag(b) over (order by a), 0, 1) c
from
cte
) t
) t
group by b, grp
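Both answers come down to comparing each point with the one before it. As a cross-check, here is a minimal Python sketch of that comparison, run on the sample sequence from the question. The function name `change_points` is made up, and the list index stands in for timeStampUtc. It follows the first answer's behaviour: the very first point is not reported, because it has no predecessor to differ from.

```python
def change_points(points):
    """Return the (timestamp, value) pairs whose value differs from the previous one."""
    changes = []
    previous = None
    for ts, value in points:
        if previous is not None and value != previous:
            changes.append((ts, value))
        previous = value
    return changes


values = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
points = list(enumerate(values))  # index stands in for timeStampUtc

print(change_points(points))
```

To get the other corner of each edge (the last point before the change), the same loop could also record the previous timestamp whenever a change is detected.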

Generate a row range based on "FromRange" "ToRange" field of each row

I have a table with following fields:
DailyWork(ID, WorkerID, FromHour, ToHour) assume that, all of the fields are of type INT.
This table needs to be expanded in a T-SQL statement to be part of a JOIN.
By expanding a row, I mean generating an hour for each number in the range from FromHour to ToHour, and then joining it with the rest of the statement.
Example:
Assume, I have another table like this: Worker(ID, Name). and a simple SELECT statement would be like this:
SELECT * FROM
Worker JOIN DailyWork ON Worker.ID = DailyWork.WorkerID
The result has columns similar to this: WorkerID, Name, DailyWorkID, WorkerID, FromHour, ToHour
But what I need has columns like this: WorkerID, Name, Hour.
In other words, the range from FromHour to ToHour is expanded, and each individual hour is placed in a separate row, in the Hour column.
Although I read a similar question about generating a range of numbers, it didn't really help.
If you start with a list of numbers, then this is pretty easy. Often, the table master.spt_values is used for this purpose:
with nums as (
select row_number() over (order by (select null)) - 1 as n
from master.spt_values
)
select dw.*, (dw.fromhour + nums.n) as specifichour
from dailywork dw join
nums
on dw.tohour >= dw.fromhour + nums.n;
The table master.spt_values generally has a few thousand rows at least.
Another solution would be...
WITH [DayHours] AS (
SELECT 1 AS [DayHour]
UNION ALL
SELECT [DayHour] + 1 FROM [DayHours] WHERE [DayHour] + 1 <= 24
)
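SELECT [Worker].[ID] AS [WorkerID], [Worker].[Name], [DayHours].[DayHour] AS [Hour]
FROM [Worker]
JOIN [DailyWork] ON [Worker].[ID] = [DailyWork].[WorkerID]
JOIN [DayHours] ON [DailyWork].[FromHour] <= [DayHours].[DayHour]
AND [DailyWork].[ToHour] >= [DayHours].[DayHour]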
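The expected output of either query can be described in a few lines of Python. This is only a sketch of the expansion itself, with made-up sample rows and a made-up function name `expand_hours`; it assumes, as both queries do, that FromHour and ToHour are inclusive.

```python
def expand_hours(daily_work):
    """Yield one (worker_id, hour) pair per hour in each inclusive FromHour..ToHour range."""
    for worker_id, from_hour, to_hour in daily_work:
        for hour in range(from_hour, to_hour + 1):
            yield worker_id, hour


# Hypothetical DailyWork rows: (WorkerID, FromHour, ToHour).
daily_work = [(1, 9, 11), (2, 14, 14)]

print(list(expand_hours(daily_work)))
```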

TSQL self join to get results

I run the following query
Select * From
(
Select
GUID,
MFG_CODE,
STK_NAME,
parentid,
masteritem,
ROW_NUMBER() over(order by guid) r
From Fstock Where MasterItem=1 OR isNull(parentID, '')=''
) a
Where r between 4716 And 4716
And I get following results
GUID MFG_CODE parentid masteritem r
31955 369553 0 1 4717
As you can see GUID 31955 is actually a parentITEM & I need to bring in all the children of this parent item within the same query.
For example if I do:
Select * From Fstock where parentID = 31955
It returns 3 children of it
GUID
31956
31957
31958
So is there a way to combine these two queries? I only want to return a fixed number of rows using the row_number() function; however, those returned rows sometimes contain a parent item, and I would like to return the children of those parent items as well within the same query.
Performance is very important for me.
--- EDIT ----
I got it to work with following query, does anyone have other ideas?
With CTE
As
(
Select
GUID,
Manufacturer,
SELL_PRICE,
MFG_CODE,
parentid,
masteritem,
ROW_NUMBER() over(order by GUID) r
From Fstock Where MasterItem=1 OR isNull(parentID, '')=''
)
Select A.*,F.parentID From
(
Select * From CTE
Where r between 4717 And 6000
) A
Left join Fstock F on F.parentID = A.GUID
Order by A.r
This is crude and untested, but I believe you're looking for a recursive Common Table Expression (CTE) that will combine the parent-child relationships for you. Now, natively, this does not integrate any row limitations you mentioned in terms of returning a "fixed number of rows," which I was not precisely sure how to interpret, but the basic query below should be a start for you.
With Products(GUID, MFG_CODE,STK_NAME, parentid,masteritem)
as
(
Select GUID,MFG_CODE,STK_NAME,parentid,masteritem
from fstock
where masteritem=1 OR isNull(parentID, '')=''
Union all
Select f.GUID,f.MFG_CODE,f.STK_NAME,f.parentid,f.masteritem
from fstock f
inner join products g
on f.parentid=g.guid
)
Select * From Products;
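The shape of the result the question asks for (each parent on the page, followed by its children) can be sketched in Python as follows. This is an illustration under assumptions, not a translation of either query: the function name `with_children` is made up, rows are reduced to (GUID, parentID) pairs, and the GUIDs are the ones quoted in the question plus one invented childless row. A parentID of 0, None or '' is treated as "no parent", in the spirit of the isNull check.

```python
from collections import defaultdict


def with_children(page_of_parents, all_rows):
    """Return each parent GUID on the page followed by the GUIDs of its children."""
    children = defaultdict(list)
    for guid, parent_id in all_rows:
        if parent_id:  # 0, None and '' all mean "no parent"
            children[parent_id].append(guid)
    result = []
    for guid in page_of_parents:
        result.append(guid)
        result.extend(children[guid])
    return result


# (GUID, parentID); 31960 is an invented row without children.
all_rows = [(31955, 0), (31956, 31955), (31957, 31955), (31958, 31955), (31960, 0)]

print(with_children([31955, 31960], all_rows))
```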

Resources