I need a query or function to count the 0's between 1's in a string.
For example:
String1 = '10101101' -> Result=3
String2 = '11111001101' -> Result=1
String3 = '01111111111' -> Result=1
I only need to search for the 101 pattern, or the 01 pattern if it's at the beginning of the string.
You may try to decompose the input strings using SUBSTRING() and a number table:
SELECT
    String, COUNT(*) AS [101Count]
FROM (
    SELECT
        v.String,
        SUBSTRING(v.String, t.No - 1, 1) AS PreviousChar,
        SUBSTRING(v.String, t.No, 1) AS CurrentChar,
        SUBSTRING(v.String, t.No + 1, 1) AS NextChar
    FROM (VALUES
        ('10101101'),
        ('11111001101'),
        ('01111111111')
    ) v (String)
    CROSS APPLY (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)) t (No)
) cte
WHERE
    CASE WHEN PreviousChar = '' THEN '1' ELSE PreviousChar END = '1' AND
    CurrentChar = '0' AND
    NextChar = '1'
GROUP BY String
Result:
String 101Count
10101101 3
11111001101 1
01111111111 1
Notes:
The table with alias v is the source table, the table with alias t is the number table. If the input strings have more than 10 characters, use an appropriate number (tally) table.
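If the strings can be longer than 10 characters, one option (a sketch only) is to generate the numbers inline rather than listing them; the query below builds a 100-row tally that could be wrapped in the CROSS APPLY above in place of the fixed VALUES list:
-- 10 x 10 cross join yields 100 rows; ROW_NUMBER turns them into 1..100
SELECT TOP (100)
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS No
FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a (n)
CROSS JOIN (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b (n);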
-- This converts "111101101010111" in "01101010" and "011101000" in "01110"
regexp_replace(field, '^1*(.*)1*0*$', '\1')
-- This converts "01101010" in "0000"
regexp_replace(field, '1', '')
-- This counts the string length, returning 4 for '0000':
LENGTH(field)
-- Put all together:
LENGTH(
regexp_replace(
regexp_replace(field, '^1*(.*?)1*0*$', '\1')
, '1', '')
)
Different or more complicated cases require a modification of the regular expression.
Update
For "zeros between 1s" I see now you mean "101" sequences. This is more complicated because of the possibility of having "10101". Suppose you want to count this as 2:
replace 101 with 11011. Now 10101 will become either 1101101 or 1101111011. In either case, you have the "101" sequences well apart and still only have two of them.
replace all 101s with 'X'. You now have 1X11X1
replace [01] with the empty string. You now have XX.
use LENGTH to count the X's.
Any extra special sequence, such as "01" at the beginning, can be converted first into "X1" ("10" at the end would become "1X"), which then folds neatly back into the workflow above.
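Putting those steps together, a minimal sketch assuming PostgreSQL-style regexp_replace, replace and length (my_table and field are placeholder names; adjust functions and flags for your dialect):
SELECT length(
         regexp_replace(                              -- 4. strip the remaining 0s and 1s, leaving only the X markers
           replace(                                   -- 3. mark each now non-overlapping "101" with an X
             replace(                                 -- 2. split overlapping matches: 101 -> 11011
               regexp_replace(field, '^01', 'X1'),    -- 1. fold a leading "01" into the same scheme
               '101', '11011'),
             '101', 'X'),
           '[01]', '', 'g')
       ) AS zeros_between_ones
FROM my_table;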
By using the LIKE operator with % you can decide how to search a specific string. In this SQL query I am saying that I want every record that starts with 101 or 01.
SELECT ColumnsYouWant FROM TableYouWant
WHERE ColumnYouWant LIKE '101%' OR ColumnYouWant LIKE '01%';
You can simply COUNT the ColumnYouWant, like this:
SELECT COUNT(ColumnYouWant) FROM TableYouWant
WHERE ColumnYouWant LIKE '101%' OR ColumnYouWant LIKE '01%';
Or you can use a method of your backend language to count the results that the first query returns. This count method will depend on the language you are working with.
SQL Documentation for LIKE: https://www.w3schools.com/sql/sql_like.asp
SQL Documentation for COUNT: https://www.w3schools.com/sql/sql_count_avg_sum.asp
The other solutions do not account for all of the characters (a maximum of 11 in the examples shown).
Data
drop table if exists #tTEST;
go
select * INTO #tTEST from (values
(1, '10101101'),
(2, '11111001101'),
(3, '01111111111')) V(id, string);
Query
;with
split_cte as (
select id, n, substring(t.string, v.n, 1) subchar
from #tTEST t
cross apply (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),
(11),(12),(13),(14),(15),(16),(17),(18),(19),(20)) v(n)
where v.n<=len(t.string)),
lead_lag_cte as (
select id, n, lead(subchar, 1, 9) over (partition by id order by n) lead_c, subchar,
lag(subchar, 1, 9) over (partition by id order by n) lag_c
from split_cte)
select id, sum(case when (lead_c=1 and lag_c=9) then 1 else
case when (lead_c=1 and lag_c=1) then 1 else 0 end end) zero_count
from lead_lag_cte
where subchar=0
group by id;
Results
id zero_count
1 3
2 1
3 1
Another way, perhaps quicker:
DECLARE @T TABLE (ID INT, STRING VARCHAR(32));
INSERT INTO @T
VALUES (1, '10101101'),
(2, '11111001101'),
(3, '01111111111');
SELECT *, LEN(STRING) - LEN(REPLACE(STRING, '0', '')) AS NUMBER_OF_ZERO
FROM @T
Result:
ID STRING NUMBER_OF_ZERO
----------- -------------------------------- --------------
1 10101101 3
2 11111001101 3
3 01111111111 1
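-- Prefix '1' so a leading "01" counts, replace 101 with 11011 to separate overlapping
-- matches, then count the remaining "101" occurrences (the LEN difference divided by 3):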
select (len(replace('1'+x, '101', '11011')) - len(replace(replace('1'+x, '101', '11011'), '101', '')))/3
from
(
values
('10101101'),
('11111001101'),
('01111111111'),
('01010101010101010101')
) v(x);
I have a dataset which has 12 different values for an ID, as well as start and end values. What I want to do is take the start value as the beginning argument of a loop and the end value as the last argument, search through the values according to them, and find their maximum. After finding the maximum, search through the values again with respect to the start and end values and find the longest consecutive occurrence of that maximum value.
Below I posted an example dataset:
create table #sample_data(
ID VARCHAR(10), val1 INT, val2 INT, val3 INT, val4 INT, val5 INT,
val6 INT, val7 INT, val8 INT, val9 INT, val10 INT, val11 INT, val12 INT,
startValue INT, endValue INT );
insert into #sample_data values
(1001,3,2,1,0,1,2,3,0,0,0,0,0,1,7),
(1002,1,2,3,4,0,0,0,1,2,3,0,0,1,12),
(1003,0,3,2,1,0,0,0,3,3,0,0,0,1,12),
(1004,0,1,2,4,4,0,0,0,0,0,0,0,3,9),
(1005,1,2,2,1,0,0,2,2,2,1,0,0,1,8);
The result I expect for ID=1001: start=1, end=7, the max value is 3 and it occurs 2 times, but the occurrences aren't consecutive, therefore the final output I'd like to get is 1.
For ID=1002: start=1, end=12, the max is 4 and it only occurs 1 time, so the final output should be 1.
For ID=1003: start=1, end=12, the max is 3; it occurs three times but only 2 of them are consecutive, therefore I expect to get 2.
For ID=1004: start=3, end=9, the max is 4; it occurs two times consecutively, therefore the output should be 2.
For ID=1005: start=1, end=8, the max is 2; it occurs 5 times in total, in runs of 2 and 3, and I expect to get 3 as my final output since that is the longest.
If I understand the question correctly, the result for the row with Id 1005 should be 2 and not 3, because the max value (which is 2) appears consecutively in places 2,3 and then again in places 7,8,9 - but the endValue of that row is 8, and therefore the larger consecutive run should not be counted.
Based on that understanding (which might be incorrect, hence the comment I've written to the question), this can be done with a set based approach (meaning, without any loops), with the help of some nice SQL tricks.
So the first thing you want to do is to use cross apply with a table value constructor to convert the val1...val12 columns to rows. I guess this can also be done using Pivot but I never quite got the hang of pivot so I prefer other solutions to get the same thing.
In my code, this step is done in the first common table expression (called CTEValues).
Next, you use a trick from Itzik Ben-Gan for handling gaps and islands problems to identify the groups of consecutive values within each row. This step is done in the second cte (CTEGroups).
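If that trick is unfamiliar, here is a small standalone illustration with made-up values (not part of the answer's code) showing how the difference of the two ROW_NUMBER() calls assigns one Grp value per run of consecutive IsMax = 1 rows:
WITH demo AS (
    SELECT * FROM (VALUES (1, 1), (2, 1), (3, 0), (4, 1), (5, 1), (6, 1), (7, 0)) v (ValId, IsMax)
)
SELECT ValId, IsMax,
       ROW_NUMBER() OVER (ORDER BY ValId)
     - ROW_NUMBER() OVER (PARTITION BY IsMax ORDER BY ValId) AS Grp
FROM demo;
-- The IsMax = 1 rows come out as Grp 0,0 (ValId 1-2) and Grp 1,1,1 (ValId 4-6):
-- each consecutive run shares one Grp value, so grouping by it yields the run lengths.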
The third and final cte, called CTEConsecutive, uses a simple group by and count to get the number of consecutive max values within each row of the original table, provided their column location is between the startValue and EndValue.
The last thing to do is get the max value of that count for each id - and that should give you the desired results.
Here's the full code:
WITH CTEValues AS
(
SELECT ID, startValue, EndValue, Val, ValId, IIF(Val = MAX(Val) OVER(PARTITION BY ID), 1, 0) As IsMax
FROM #sample_data
CROSS APPLY
(
SELECT *
FROM (VALUES
(Val1, 1),
(Val2, 2),
(Val3, 3),
(Val4, 4),
(Val5, 5),
(Val6, 6),
(Val7, 7),
(Val8, 8),
(Val9, 9),
(Val10, 10),
(Val11, 11),
(Val12, 12)
)V(Val, ValId)
) vals
), CTEGroups AS
(
SELECT ID, startValue, EndValue, Val, ValId, IsMax,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY ValId) -
ROW_NUMBER() OVER(PARTITION BY ID, IsMax ORDER BY ValId) As Grp
FROM CTEValues
), CTEConsecutive AS
(
SELECT ID, COUNT(Val) As NumOfConsecutiveMaxValues --*, OVER(PARTITION BY Id, Grp) As NumOfValues
FROM CTEGroups
WHERE IsMax = 1
AND ValId >= startValue
AND ValId <= EndValue
GROUP BY ID, Grp
)
SELECT ID, MAX(NumOfConsecutiveMaxValues)
FROM CTEConsecutive
GROUP BY ID
ORDER BY Id
You can see a live demo on rextester.
If, however, I'm wrong in my initial assumption and the startvalue and endvalue are only relevant to the range in which to search for the max value (and that would give you the expected results you've posted in the question), you will need another cte.
WITH CTEValues AS
(
SELECT ID, startValue, EndValue, Val, ValId
FROM #sample_data
CROSS APPLY
(
SELECT *
FROM (VALUES
(Val1, 1),
(Val2, 2),
(Val3, 3),
(Val4, 4),
(Val5, 5),
(Val6, 6),
(Val7, 7),
(Val8, 8),
(Val9, 9),
(Val10, 10),
(Val11, 11),
(Val12, 12)
)V(Val, ValId)
) vals
), CTEValuesWithMax AS
(
SELECT ID, startValue, EndValue, Val, ValId,
IIF(Val = (
SELECT MAX(Val)
FROM CTEValues AS T1
WHERE T0.ID = T1.ID
AND T1.ValId >= T1.startValue
AND T1.ValId <= T1.EndValue
), 1, 0) As IsMax
FROM CTEValues AS T0
)
The rest of the code remains the same, except that CTEGroups now selects from CTEValuesWithMax instead of from CTEValues.
You can see a live demo of this as well.
I am querying for a date value that follows a specific phrase in the same text column value. There are a number of date values in this text column, but I need only the DATE following the phrase I've found via a PATINDEX value. Any help/direction would be appreciated. Thanks.
Here is my SQL code:
SELECT
NotesSysID,
PATINDEX('%Demand Due Date:%', NoteText) AS [Index of DemandDueDate text],
PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%', NoteText) AS [Index of DemandDueDate date],
SUBSTRING(NoteText, PATINDEX('%Demand Due Date:%', NoteText), (PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%', NoteText)))
FROM
#temp_ExtractedNotes;
Here is a pic of my data; please note that the 2nd index is LESS than the index of the DemandDueDate text found, and I need the subsequent date AFTER the Index of DemandDueDate text column. Hopefully this makes sense.
I think your code will be easier for you to work with if you use a CTE and some stepwise refinement. That'll free you from trying to do everything with one hugely-nested SELECT statement.
;
WITH FirstCut as (
SELECT
NotesSysID,
LocationOfText = PATINDEX('%Demand Due Date:%', NoteText),
NoteText
FROM #temp_ExtractedNotes
),
SecondCut as (
SELECT
NotesSysID,
NoteText,
-- making assumption date will be within first 250 chars of text
DemandDueDateSection = SUBSTRING( NoteText, [LocationOfText], 250)
FROM FirstCut
),
ThirdCut as (
SELECT
NotesSysID,
NoteText,
DemandDueDateSection,
LocationOfDate = PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%', DemandDueDateSection)
FROM SecondCut
),
FourthCut as (
SELECT
NotesSysID,
NoteText,
DateAsText = SUBSTRING( DemandDueDateSection, LocationOfDate, 10 )
FROM ThirdCut
)
SELECT
NotesSysID,
NoteText,
DemandDueDate = CONVERT( DateTime, DateAsText)
FROM FourthCut
The issue you're having is because your second PATINDEX isn't finding the date you want; it's finding the first date in the string, which, as you noted, appears before the PATINDEX of the phrase you're searching for %Demand Due Date:%. The third parameter of SUBSTRING is LENGTH, which is to say how many characters after the second parameter you want to pull. By using that second PATINDEX value as the third parameter in your SUBSTRING you're returning a sub-string that starts where you want it to, and is of LENGTH equal to the number of characters into the string where that first date appears.
Which, of course, isn't what you want. To @ZLK's point in the comments, first, you need to do a nested PATINDEX. That's going to be pretty slow, so I'm hoping there aren't a bazillion records in your temp table.
Based on the sample image, it looks like the date you're interested in can appear a variable number of characters after %Demand Due Date:%. We'll start by adding 16 to the PATINDEX of %Demand Due Date:% (because that's how many characters are in 'Demand Due Date:', so we'll just start right after it). Then we'll pick up the next 100 characters. You can tweak that later if you need more or fewer.
So your first SUBSTRING will look like this:
SUBSTRING(NoteText, PATINDEX('%Demand Due Date:%', NoteText) + 16, 100)
Now we have to search that sub-string for the second pattern, the one that should yield a date for you.
PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%', SUBSTRING(NoteText, PATINDEX('%Demand Due Date:%', NoteText) + 16, 100))
The number returned there is the point where your date value starts within the 100 characters following %Demand Due Date:%. Armed with that number, you just need to SUBSTRING out the next ten characters, and, just for fun, CAST it as a DATE. That big ugly formula will look like this:
DECLARE @test VARCHAR(200) = 'foo bar Demand Due Date: 12/21/2018 bar foo foo bar';

SELECT
    CAST(
        SUBSTRING(
            SUBSTRING(@test, PATINDEX('%Demand Due Date:%', @test) + 16, 100),
            PATINDEX(
                '%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%',
                SUBSTRING(@test, PATINDEX('%Demand Due Date:%', @test) + 16, 100)
            ),
            10)
        AS DATE);
Result:
2018-12-21
Rextester: https://rextester.com/KCY79989
Grab a copy of PatternSplitCM.
CREATE FUNCTION dbo.PatternSplitCM
(
    @List VARCHAR(8000) = NULL
    ,@Pattern VARCHAR(50)
) RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
WITH numbers AS (
    SELECT TOP(ISNULL(DATALENGTH(@List), 0))
        n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
    FROM
        (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n),
        (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n),
        (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n),
        (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) g (n))
SELECT
    ItemNumber = ROW_NUMBER() OVER(ORDER BY MIN(n)),
    Item = SUBSTRING(@List, MIN(n), 1 + MAX(n) - MIN(n)),
    Matched
FROM (
    SELECT n, y.Matched, Grouper = n - ROW_NUMBER() OVER(ORDER BY y.Matched, n)
    FROM numbers
    CROSS APPLY (
        SELECT Matched = CASE WHEN SUBSTRING(@List, n, 1) LIKE @Pattern THEN 1 ELSE 0 END
    ) y
) d
GROUP BY Matched, Grouper;
Then it's pretty easy:
-- Sample Data
DECLARE @temp_ExtractedNotes TABLE (someID INT IDENTITY, NoteText VARCHAR(8000));
INSERT @temp_ExtractedNotes (NoteText) VALUES
('blah blah blah 2/4/2016... Demand Due Date: 1/05/2011; blah blah 12/1/2017...'),
('Yada yad..... Demand Due Date: 11/21/2016;...'),
('adas dasd asd a sd asdas Demand Due Date: 09/09/2019... 5/05/2005, 10/10/2010....'),
('nothing to see here - moving on... 01/02/2003!');
-- Solution
SELECT TOP (1) WITH TIES t.someID, ns.s, DueDate = CASE SIGN(nt.s) WHEN 1 THEN f.item END
FROM @temp_ExtractedNotes AS t
CROSS APPLY (VALUES(CHARINDEX('Demand Due Date:',t.NoteText))) AS nt(s)
CROSS APPLY (VALUES(SUBSTRING(t.noteText, nt.s+16, 8000))) AS ns(s)
CROSS APPLY dbo.patternsplitCM(ns.s,'[0-9/]') AS f
WHERE f.matched = 1
ORDER BY ROW_NUMBER() OVER (PARTITION BY t.someID ORDER BY f.itemNumber);
Results:
someID s DueDate
----------- ---------------------------------------- ------------
1 1/05/2011; blah blah 12/1/2017... 1/05/2011
2 11/21/2016;... 11/21/2016
3 09/09/2019... 5/05/2005, 10/10/2010.... 09/09/2019
4 here - moving on... 01/02/2003! NULL
I thought the following three SQL statements were semantically the same, i.e. the database engine would expand the second and third queries into the first one internally.
select ....
from T
where Id = 1
select *
from
(select .... from T) t
where Id = 1
select *
from
(select .... from T where Id = 1) t
However, I found the window function behaves differently. I have the following code.
-- Prepare test data
with t1 as
(
select *
from (values ( 2, null), ( 3, 10), ( 5, -1), ( 7, null), ( 11, null), ( 13, -12), ( 17, null), ( 19, null), ( 23, 1759) ) v ( id, col1 )
)
select *
into #t
from t1
alter table #t add primary key (id)
go
The following query returns all the rows.
select
id, col1,
cast(substring(max(cast(id as binary(4)) + cast(col1 as binary(4)))
over (order by id
rows between unbounded preceding and 1 preceding), 5, 4) as int) as lastval
from
#t
id col1 lastval
-------------------
2 NULL NULL
3 10 NULL
5 -1 10
7 NULL -1
11 NULL -1
13 -12 -1
17 NULL -12
19 NULL -12
23 1759 -12
Without CTE/subquery: I then added a condition to return just the row where Id = 19.
select
id, col1,
cast(substring(max(cast(id as binary(4)) + cast(col1 as binary(4))) over (order by id rows between unbounded preceding and 1 preceding), 5, 4) as int) as lastval
from
#t
where
id = 19;
However, lastval returns null?
With CTE/subquery: now the condition is applied to the CTE:
with t as
(
select
id, col1,
cast(substring(max(cast(id as binary(4)) + cast(col1 as binary(4))) over (order by id rows between unbounded preceding and 1 preceding ), 5, 4) as int) as lastval
from
#t)
select *
from t
where id = 19;
-- Subquery
select
*
from
(select
id, col1,
cast(substring(max(cast(id as binary(4)) + cast(col1 as binary(4))) over (order by id rows between unbounded preceding and 1 preceding), 5, 4) as int) as lastval
from
#t) t
where
id = 19;
Now lastval returns -12 as expected?
The logical order of operations of the SELECT statement is important to understanding the results of your first example. From the Microsoft documentation, the order is, from top to bottom:
FROM
ON
JOIN
WHERE
GROUP BY
WITH CUBE or WITH ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP
Note that the WHERE clause processing happens logically before the SELECT clause.
The query without the CTE is being filtered where id = 19. The order of operations causes the where to process before the window function in the select clause. There is only 1 row with an id of 19. Therefore, the where limits the rows to id = 19 before the window function can process the rows between unbounded preceding and 1 preceding. Since there are no preceding rows for the window function to use, lastval is null.
Compare this to the CTE. The outer query's filter has not yet been applied, so the CTE operates on all of the data. The rows between unbounded preceding frame finds the prior rows. The outer part of the query then applies the filter to the intermediate results and returns just row 19, which already has the correct lastval.
You can think of the CTE as creating a temporary #Table with the CTE data in it. All of the data is logically processed into a separate table before returning data to the outer query. The CTE in your example creates a temporary work table with all of the rows that includes the lastval from the prior rows. Then, the filter in the outer query gets applied and limits the results to id 19.
(In reality, the CTE can shortcut and skip generating data, if it can do so to improve performance without affecting the results. Itzik Ben-Gan has a great example of a CTE that skips processing when it has returned enough data to satisfy the query.)
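As a rough illustration of that mental model (an illustration only; the engine does not literally materialize the CTE), you can reproduce the CTE behaviour by materializing the windowed result into a temp table first and filtering afterwards:
-- Sketch: materialize the windowed result, then filter, mirroring the CTE/subquery versions.
select id, col1,
       cast(substring(max(cast(id as binary(4)) + cast(col1 as binary(4)))
            over (order by id rows between unbounded preceding and 1 preceding), 5, 4) as int) as lastval
into #work
from #t;

select * from #work where id = 19;   -- lastval = -12, as with the CTE/subquery

drop table #work;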
Consider what happens if you put the filter in the CTE. This should behave exactly like the first example query that you provided. There is only 1 row with an id = 19, so the window function does not find any preceding rows:
with t as ( select id, col1,
cast(substring(max(cast(id as binary(4)) + cast(col1 as binary(4))) over ( order by id
rows between unbounded preceding and 1 preceding ), 5, 4) as int) as lastval
from #t
where id = 19 -- moved filter inside CTE
)
select *
from t
Window functions operate on your result set, so when you added where id = 19 your result set only had 1 row. Since your window function specifies rows between unbounded preceding and 1 preceding, there was no preceding row, which resulted in null.
By using the subquery/cte you are allowing the window function to operate over the unfiltered result set (where the preceding rows exist), then retrieving only those rows from that result set where id = 19.
The queries you are comparing are not equivalent.
select id ,
(... ) as lastval
from #t
where id = 19;
will take only 1 row, so lastval will be NULL because the window function does not find a preceding row.
What I'm looking for is a way in MSSQL to create a complex IN or LIKE clause that contains a SET of values, some of which will be ranges.
Sort of like this, there are some single numbers, but also some ranges of numbers.
EX: SELECT * FROM table WHERE field LIKE/IN '1-10, 13, 24, 51-60'
I need to find a way to do this WITHOUT having to specify every number in the ranges separately AND without having to say "field LIKE blah OR field BETWEEN blah AND blah OR field LIKE blah".
This is just a simple example but the real query will have many groups and large ranges in it so all the OR's will not work.
One fairly easy way to do this would be to load a temp table with your values/ranges:
CREATE TABLE #Ranges (ValA int, ValB int)
INSERT INTO #Ranges
VALUES
(1, 10)
,(13, NULL)
,(24, NULL)
,(51,60)
SELECT *
FROM Table t
JOIN #Ranges R
ON (t.Field = R.ValA AND R.ValB IS NULL)
OR (t.Field BETWEEN R.ValA and R.ValB AND R.ValB IS NOT NULL)
The BETWEEN won't scale that well, though, so you may want to consider expanding this to include all values and eliminating ranges.
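For example, one way to do that expansion (a sketch, assuming a numbers/tally table dbo.Numbers with a Number column starting at 1):
-- Expand each range into individual values once, then join on plain equality instead of BETWEEN.
SELECT R.ValA + N.Number - 1 AS Val
INTO #ExpandedValues
FROM #Ranges R
JOIN dbo.Numbers N             -- dbo.Numbers(Number) is an assumed tally table
    ON N.Number <= ISNULL(R.ValB, R.ValA) - R.ValA + 1;

SELECT t.*
FROM Table t
JOIN #ExpandedValues e
    ON t.Field = e.Val;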
You can do this with CTEs.
First, create a numbers/tally table if you don't already have one (it might be better to make it permanent instead of temporary if you are going to use it a lot):
;WITH Numbers AS
(
SELECT
1 as Value
UNION ALL
SELECT
Numbers.Value + 1
FROM
Numbers
)
SELECT TOP 1000
Value
INTO ##Numbers
FROM
Numbers
OPTION (MAXRECURSION 1000)
Then you can use a CTE to parse the comma delimited string and join the ranges with the numbers table to get the "NewValue" column which contains the whole list of numbers you are looking for:
DECLARE @TestData varchar(50) = '1-10,13,24,51-60'

;WITH CTE AS
(
    SELECT
        1 AS RowCounter,
        1 AS StartPosition,
        CHARINDEX(',', @TestData) AS EndPosition
    UNION ALL
    SELECT
        CTE.RowCounter + 1,
        EndPosition + 1,
        CHARINDEX(',', @TestData, CTE.EndPosition + 1)
    FROM CTE
    WHERE
        CTE.EndPosition > 0
)
SELECT
    u.Value,
    u.StartValue,
    u.EndValue,
    n.Value as NewValue
FROM
(
    SELECT
        Value,
        SUBSTRING(Value, 1, CASE WHEN CHARINDEX('-', Value) > 0 THEN CHARINDEX('-', Value) - 1 ELSE LEN(Value) END) AS StartValue,
        SUBSTRING(Value, CASE WHEN CHARINDEX('-', Value) > 0 THEN CHARINDEX('-', Value) + 1 ELSE 1 END, LEN(Value) - CHARINDEX('-', Value)) AS EndValue
    FROM
    (
        SELECT
            SUBSTRING(@TestData, StartPosition, CASE WHEN EndPosition > 0 THEN EndPosition - StartPosition ELSE LEN(@TestData) - StartPosition + 1 END) AS Value
        FROM
            CTE
    ) t
) u INNER JOIN ##Numbers n ON n.Value BETWEEN u.StartValue AND u.EndValue
All you would need to do once you have that is query the results using an IN statement, so something like
SELECT * FROM MyTable WHERE Value IN (SELECT NewValue FROM (/*subquery from above*/)t)