Extracting Data between two characters with T-SQL [duplicate] - sql-server

This question already has answers here:
Using T-SQL, return nth delimited element from a string
(14 answers)
Closed 2 years ago.
I do not have write access to the database; queries are limited to retrieval only. CTEs are not allowed, and I cannot write functions. (Here is a document describing what I'm limited to: https://help.salesforce.com/articleView?id=mc_as_sql_reference.htm&type=5)
Here's an example of a string I'm working with. I'm looking to extract the email address, and I consistently know that the email begins after the fourth | and ends at the fifth |.
D||John Smith|EML|test#gmail.com|Y|2014/01/03 17:14:01.000000|
This is what I have tried so far; it does not return any email addresses:
SELECT
CASE
WHEN CHARINDEX('|',AllData) > 0
THEN SUBSTRING(AllData, CHARINDEX('|', AllData, 22) + 1, ABS((CHARINDEX('|', AllData, CHARINDEX('#', AllData))) - (CHARINDEX('|', AllData, 22) + 1)))
ELSE 'NotWorking'
END AS email
FROM
[test_file]

I would suggest string_split():
select s.value
from test_file tf cross apply
string_split(alldata, '|') s
where s.value like '%#%.%';
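This doesn't extract the fourth value; it extracts any value that looks like an email. But it is pretty simple code that should work. Note that STRING_SPLIT requires SQL Server 2016 or later (database compatibility level 130) and does not guarantee the order of its output rows, which is why it cannot be relied on to pick the fourth element by position.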
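Within the stated restrictions (no CTEs, no functions), another option is to nest CHARINDEX calls, each starting one character after the previous delimiter, to locate the fourth and fifth |. The asker's own query already uses CHARINDEX with a start position, so this assumes that form is available. The sketch below is Python, not T-SQL: charindex, substring, nth_pipe and fourth_field are hypothetical helpers that mimic the 1-based T-SQL functions so the position arithmetic can be checked against the sample row. The equivalent T-SQL expression is shown in the comments, and it assumes every row has at least five delimiters.

```python
def charindex(needle: str, haystack: str, start: int = 1) -> int:
    """Mimics T-SQL CHARINDEX: 1-based position at or after start, 0 if absent."""
    pos = haystack.find(needle, max(start, 1) - 1)
    return pos + 1 if pos >= 0 else 0


def substring(s: str, start: int, length: int) -> str:
    """Mimics T-SQL SUBSTRING for start >= 1 and length >= 0."""
    return s[start - 1:start - 1 + length]


def nth_pipe(s: str, n: int) -> int:
    """Position of the n-th '|', found the same way nested CHARINDEX calls would."""
    pos = 0
    for _ in range(n):
        pos = charindex('|', s, pos + 1)
    return pos


def fourth_field(all_data: str) -> str:
    # T-SQL equivalent (assumes at least five '|' in AllData):
    #   p4 = CHARINDEX('|', AllData, CHARINDEX('|', AllData,
    #            CHARINDEX('|', AllData, CHARINDEX('|', AllData) + 1) + 1) + 1)
    #   p5 = CHARINDEX('|', AllData, p4 + 1)
    #   SUBSTRING(AllData, p4 + 1, p5 - p4 - 1)
    p4 = nth_pipe(all_data, 4)
    p5 = charindex('|', all_data, p4 + 1)
    return substring(all_data, p4 + 1, p5 - p4 - 1)


row = "D||John Smith|EML|test#gmail.com|Y|2014/01/03 17:14:01.000000|"
print(fourth_field(row))
```

In the SQL itself, p4 and p5 would be written out inline (or computed once in a CROSS APPLY (VALUES ...) if that is permitted), since variables and CTEs are off the table.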

Related

Issue where clause FOR JSON AUTO has generated the incomplete answer [duplicate]

This question already has answers here:
FOR JSON PATH results in SSMS truncated to 2033 characters
(10 answers)
SQL Server json truncated (even when using NVARCHAR(max) )
(10 answers)
Closed 9 months ago.
Getting JSON from SQL Server is great, but I ran into a problem.
For example, I have a LithologySamples table with a very basic structure:
[Id] [uniqueidentifier],
[Depth1] [real],
[Depth2] [real],
[RockId] [nvarchar](8),
The table contains roughly 600 rows. I want to generate JSON to transport the data to another database, so I use FOR JSON AUTO, which has worked perfectly with other tables that have fewer rows. In this case, however, the output is incomplete, and it has me baffled. This is what I noticed when examining the output:
[{
"Id": "77769039-B2B7-E511-8279-DC85DEFBF2B6",
"Depth1": 4.2000000e+001,
"Depth2": 5.8000000e+001,
"RockId": "MIC SST"
}, {
"Id": "78769039-B2B7-E511-8279-DC85DEFBF2B6",
"Depth1": 5.8000000e+001,
"Depth2": 6.3000000e+001,
"RockId": "CGL"
}, {
"Id": "79769039-B2B7-E511-8279-DC85DEFBF2B6",
"Depth1": 6.3000000e+001,
"Depth2": 8.3000000e+001,
"RockId": "MIC SST"
}, {
// ... OK, continue fine, but it breaks off towards the end:
}, {
"Id": "85769039-B2B7-E511-8279-DC85DEFBF2B6",
"Depth1": 2.0500000e+002,
"Depth2": 2.1500000e+002,
"RockId": "MIC SST"
}, {
"Id": "86769039-
// inexplicably it cuts here !?
I've searched, but I can't find any option that makes the output come out complete.
The SQL query is as follows:
SELECT * FROM LithologySamples FOR JSON AUTO;
AUTO and PATH give the same result.
Does anyone know what I should do so that the statement generates the JSON of the entire table?
But in this case I see that the response is generated incomplete.
If you are checking this in SSMS, it truncates text in various ways depending on the output method you're using (PRINT, SELECT, results to text/grid). The string is complete, it's just the output that has been mangled.
One way to validate that the string is in fact complete is to:
SELECT * INTO #foo FROM
(SELECT * FROM LithologySamples FOR JSON AUTO) x(y);
Then check LEN(y), DATALENGTH(y), or RIGHT(y, 50) (see example db<>fiddle), or select from that table with a CONVERT to xml (see this article for more info).
In your case it seems the problem is coming from how C# is consuming the output. If the consumer is treating the JSON as multiple rows, then assigning a variable there will ultimately assign one arbitrary row of <= 2033 characters, not the whole value. I talked about this briefly back in 2015. Let's say you are using reader[0] or similar to test:
CREATE TABLE dbo.Samples
(
[Id] [uniqueidentifier] NOT NULL DEFAULT NEWID(),
[Depth1] [real] NOT NULL DEFAULT 5,
[Depth2] [real] NOT NULL DEFAULT 5,
[RockId] [nvarchar](8)
);
INSERT dbo.Samples(RockId) SELECT TOP (100) LEFT(name, 8) FROM sys.all_columns;
-- pretend this is your C# reader:
SELECT * FROM dbo.Samples FOR JSON AUTO;
-- reader[0] here would be something like this:
-- [{"Id":"054EC9A2-760B-4EBA-BF06-...,"RockId":"ser
-- which is the first 2,033 characters
SELECT LEN('[{"Id":"054EC9A2-760B-4EBA-BF06-..."RockId":"ser')
-- instead, since you want C# to assign a scalar,
-- assign output to a scalar first:
DECLARE @json nvarchar(max) = (SELECT * FROM dbo.Samples FOR JSON AUTO);
SELECT json = @json;
-- now reader[0] will be the whole thing
Example db<>fiddle
The 2,033 figure comes from the same place it comes from for XML (SQL Server's JSON implementation is just a wrapper over the existing underlying XML functionality); as Charlie points out, Martin explained it here:
SELECT FOR XML AUTO and return datatypes
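To see why reading only reader[0] truncates the value, here is a rough Python model, not SQL Server code: a long JSON string is cut into rows of at most 2,033 characters (the figure from the answer above), reading just the first row gives text that does not parse, and joining every row restores the full document. The helper split_like_for_json and the generated sample rows are hypothetical. Besides assigning to a scalar first, as the answer shows, concatenating all returned rows on the client is the other way around the split.

```python
import json
from typing import List

# Row size taken from the answer above; this model does not verify it.
CHUNK = 2033


def split_like_for_json(payload: str, size: int = CHUNK) -> List[str]:
    """Hypothetical model of a long FOR JSON result arriving as several rows."""
    return [payload[i:i + size] for i in range(0, len(payload), size)]


rows = [{"Id": f"{i:08d}-0000-0000-0000-000000000000", "Depth1": 5.0,
         "Depth2": 5.0, "RockId": "MIC SST"} for i in range(100)]
payload = json.dumps(rows)
chunks = split_like_for_json(payload)

# Reading only the first row (like reader[0]) yields a truncated document.
try:
    json.loads(chunks[0])
    first_ok = True
except json.JSONDecodeError:
    first_ok = False

# Concatenating every row restores the complete, parseable JSON.
whole = "".join(chunks)
print(len(chunks), first_ok, len(whole) == len(payload))
```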

SQL Server: String Manipulation

I have managed to pull out a list of codes from some descriptions, all of which should be formatted similarly, for example:
'%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%' (00000-0000)
'%[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%' (0000-0000)
'%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9]%' (00000-000)
'%[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9]%' (0000-000)
'%[0-9][0-9][0-9][0-9]-[0-9][0-9]%' (0000-00)
I have noticed that the codes I have not yet pulled out are formatted as 00000000. If I were to pull these codes out, is there a way to restructure them to 00000-000?
The code I am currently using, based on Larnu's suggestion yesterday, is below. Any help with this string-manipulation question would be greatly appreciated. If possible, I would like to place the restructured string in the same column as the others, in the correct format.
WITH VTE AS (
SELECT *
FROM [Remedies].[dbo].[ShortageCompany] V)
SELECT
V.[ShortageDetailID]
,V.[ShortageID]
,V.[Company]
,V.[CompanyID]
,V.[Presentation]
,V.[Availability]
,V.[Information]
,V.[Reason]
,V.[StandardReason],
CASE WHEN PI1.C > 0 THEN SUBSTRING(V.[Presentation],PI1.C, 10)
WHEN PI2.C > 0 THEN SUBSTRING(V.[Presentation],PI2.C, 9)
WHEN PI3.C > 0 THEN SUBSTRING(V.[Presentation],PI3.C, 9)
WHEN PI4.C > 0 THEN SUBSTRING(V.[Presentation],PI4.C, 8)
WHEN PI5.C > 0 THEN SUBSTRING(V.[Presentation],PI5.C, 7)
ELSE NULL
END AS N
FROM VTE V
CROSS APPLY (VALUES(PATINDEX('%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%',V.[Presentation]))) PI1(C)
CROSS APPLY (VALUES(PATINDEX('%[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%',V.[Presentation]))) PI2(C)
CROSS APPLY (VALUES(PATINDEX('%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9]%',V.[Presentation]))) PI3(C)
CROSS APPLY (VALUES(PATINDEX('%[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9]%',V.[Presentation]))) PI4(C)
CROSS APPLY (VALUES(PATINDEX('%[0-9][0-9][0-9][0-9]-[0-9][0-9]%',V.[Presentation]))) PI5(C)
ORDER BY [ShortageDetailID]
This method actually works, despite what I had said. The data I ran this query against was inconsistent, and the code matched values where possible, although I had to expand the search to look for a couple more varieties of character structure.
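For the unformatted 00000000 codes asked about in the question, one approach is a sixth PATINDEX for eight consecutive digits, placed after the existing CASE branches so hyphenated codes still win, with STUFF inserting the hyphen. The Python sketch below mirrors that cascade with re.search standing in for PATINDEX (both report the leftmost match); extract_code, HYPHENATED and EIGHT_DIGITS are illustrative names, and the T-SQL form of the new branch, using a hypothetical PI6 alias, is given in a comment. It is a sketch under those assumptions, not a tested query.

```python
import re
from typing import Optional

# Stand-ins for the PATINDEX patterns in the query, in the same order as the
# CASE branches, each paired with the SUBSTRING length used for it.
HYPHENATED = [
    (r"[0-9]{5}-[0-9]{4}", 10),  # 00000-0000
    (r"[0-9]{4}-[0-9]{4}", 9),   # 0000-0000
    (r"[0-9]{5}-[0-9]{3}", 9),   # 00000-000
    (r"[0-9]{4}-[0-9]{3}", 8),   # 0000-000
    (r"[0-9]{4}-[0-9]{2}", 7),   # 0000-00
]
# Assumed extra pattern for the unformatted codes: eight consecutive digits.
EIGHT_DIGITS = r"[0-9]{8}"


def extract_code(presentation: str) -> Optional[str]:
    for pattern, length in HYPHENATED:
        m = re.search(pattern, presentation)  # leftmost match, like PATINDEX
        if m:
            return presentation[m.start():m.start() + length]
    m = re.search(EIGHT_DIGITS, presentation)
    if m:
        raw = m.group()
        # T-SQL sketch of the new branch, with a sixth CROSS APPLY such as
        #   CROSS APPLY (VALUES(PATINDEX('%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%',
        #                V.[Presentation]))) PI6(C)
        # and: WHEN PI6.C > 0 THEN STUFF(SUBSTRING(V.[Presentation], PI6.C, 8), 6, 0, '-')
        return raw[:5] + "-" + raw[5:]
    return None


print(extract_code("Code 12345678 here"))
```

STUFF with a length of 0 deletes nothing and inserts '-' before the sixth character, which is what the slice concatenation does here.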

SQL-manipulating strings

I'll try and make this clear...
Let's say I have a table with two columns, issue_number and issue_text. I need to grab two strings out of the issue_text column. The first can be hard-coded with CASE expressions, since there are only so many types of issues that can be logged (note: I know this isn't the best way):
case
when issue_text like '%error%' then 'error'
else 'not found'
end as error_type
issue_text is a string that is mostly formatted the same way: an error, more info, then an incident number, which is the end of the string.
e.g. "Can't add address. Ref Number: 9999999"
The problem I'm having is that the number will not always be the same number of characters away from the error message.
I was wondering if there is a way to access the substring that caused a match in the LIKE clause, e.g. in another CASE expression using a regex (which I know aren't well supported in SQL):
case
when issue_text like '%[0-9 .]%' then (the substring match from like '%[0-9 .]%')
else 00000
end as issue_number
I am restricted to solving this and parsing these strings from SQL Server Management Studio; otherwise, yes, I'd use .NET or something similar.
Declare @YourTable table (ID int,issue_text varchar(150))
Insert Into @YourTable values
(1,'Can''t add address. Ref Number: 9999999'),
(2,'error')
Select ID
,Issue = Left(issue_text,PatIndex('%:%',issue_text+':')-1)
,IssueNo = substring(issue_text,PatIndex('%:%',issue_text+':')+2,25)
From @YourTable
Returns:
ID  Issue                          IssueNo
1   Can't add address. Ref Number  9999999
2   error
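The answer's position arithmetic can be checked outside SQL Server. This Python sketch imitates PATINDEX('%:%', issue_text + ':'), LEFT and SUBSTRING, converting the 1-based offsets to Python slices; patindex_literal and split_issue are hypothetical names. Appending ':' guarantees a match, so a row without a colon returns its whole text as Issue and an empty IssueNo, as in the output above.

```python
from typing import Tuple


def patindex_literal(ch: str, s: str) -> int:
    """Like PATINDEX('%<ch>%', s) for one literal character: 1-based, 0 if absent."""
    pos = s.find(ch)
    return pos + 1 if pos >= 0 else 0


def split_issue(issue_text: str) -> Tuple[str, str]:
    # issue_text + ':' guarantees a hit, exactly as in the query.
    p = patindex_literal(':', issue_text + ':')
    # LEFT(issue_text, p - 1): everything before the colon.
    issue = issue_text[:p - 1]
    # SUBSTRING(issue_text, p + 2, 25): skip the colon and the space after it.
    start = p + 2 - 1  # 1-based start converted to a 0-based index
    issue_no = issue_text[start:start + 25]
    return issue, issue_no


for text in ["Can't add address. Ref Number: 9999999", "error"]:
    print(split_issue(text))
```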
If there's always a space just before the number, and the number is the last part of the string, you can do:
RIGHT(issue_text, CHARINDEX(' ', REVERSE(issue_text)) - 1)
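The REVERSE trick finds the last space by searching the reversed string: CHARINDEX then gives that space's distance from the end, and RIGHT keeps everything after it. A minimal Python model follows (last_token is a hypothetical name). One caveat worth guarding against: if issue_text contains no space, CHARINDEX returns 0 and RIGHT receives a length of -1, which SQL Server rejects with an error, so the sketch returns None in that case, much as a CASE WHEN CHARINDEX(' ', issue_text) > 0 guard would in T-SQL.

```python
from typing import Optional


def last_token(issue_text: str) -> Optional[str]:
    # CHARINDEX(' ', REVERSE(issue_text)): 1-based offset of the last space,
    # counted from the end of the string; 0 when there is no space at all.
    offset = issue_text[::-1].find(' ') + 1
    if offset == 0:
        # RIGHT(issue_text, -1) would raise an error in SQL Server; guard instead.
        return None
    length = offset - 1  # the length argument passed to RIGHT
    # Slice from len - length so that a length of 0 yields '' like RIGHT(s, 0).
    return issue_text[len(issue_text) - length:]


print(last_token("Can't add address. Ref Number: 9999999"))
```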

Oracle ROWTOCOL Function oddities

I have a requirement to pull data in a specific format and I'm struggling slightly with the ROWTOCOL function and was hoping a fresh pair of eyes might be able to help.
I'm using Oracle 10g (10.2), so LISTAGG, which appears to do what I need, is not an option.
I need to aggregate a number of usernames into a string delimited with '$', but I also need to concatenate another column to build up email addresses.
select
rowtocol('select username_id from username where user_id = '||s.user_id|| ' order by USERNAME_ID asc','#'||d.domain_name||'$')
from username s, domain d
where s.user_id = d.user_id
(I've simplified the query specific to just this function as the actual query is quite large and all works except for this particular function.)
In the DOMAIN table I have a number of domains, such as 'hotmail.com', 'gmail.com', etc.
I need to concatenate the username, a '#' symbol and the domain, with each address delimited by '$', such as:
joe.bloggs#gmail.com$joeblogs#gmail.com$joe_bloggs#gmail.com
I've battled with this and got close, but the result comes out in reverse:
gmail.com$joe.bloggs#gmail.com$joeblogs#gmail.com$joe_bloggs
I've also noticed that if I play around with the delimiter argument ('#'||d.domain_name||'$'), it tends to drop the first character; as can be seen above, the leading '#' has been dropped from the first email address.
Can anyone offer any suggestions as to how to get this working?
Many Thanks in advance!
Assuming you're using the rowtocol function from OTN, and have tables something like:
create table username (user_id number, username_id varchar2(20));
create table domain (user_id number, domain_name varchar2(20));
insert into username values (1, 'joe.bloggs');
insert into username values (1, 'joebloggs');
insert into username values (1, 'joe_bloggs');
insert into domain values (1, 'gmail.com');
Then your original query gets three rows back:
gmail.com$joe.bloggs
gmail.com$joe_bloggs#gmail.com$joebloggs
gmail.com$joe_bloggs#gmail.com$joebloggs
You're passing the data from each of your user IDs to a separate call to rowtocol, which isn't really what you want. You can get the result I think you're after by reversing it: pass the main query that joins the two tables as the select argument to the function, and have that passed query do the username/domain concatenation, which is a separate step from the string aggregation:
select
rowtocol('select s.username_id || ''#'' || d.domain_name from username s join domain d on d.user_id = s.user_id', '$')
from dual;
which gets a single result:
joe.bloggs#gmail.com$joe_bloggs#gmail.com$joebloggs#gmail.com
Whether that fits into your larger query, which you haven't shown, is a separate question. You might need to correlate it with the rest of your query.
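Since the source of rowtocol isn't shown, the following is purely a guess: a hypothetical Python model in which the function prefixes every value with the delimiter and then drops only the first character of the result. Under that assumption it reproduces both strings from the question exactly, which would explain the "reversed" output and the missing leading '#': with '#'||domain||'$' as the delimiter, the domain lands in front of each username rather than after it. rowtocol_model is not the real function.

```python
from typing import Iterable


def rowtocol_model(values: Iterable[str], delimiter: str) -> str:
    """HYPOTHETICAL stand-in for rowtocol: prefix every value with the
    delimiter, then drop only the first character of the result.
    Chosen only because it reproduces the output shown in the question."""
    joined = "".join(delimiter + v for v in values)
    return joined[1:]


names = ["joe.bloggs", "joeblogs", "joe_bloggs"]

# Question's call: the domain is folded into the delimiter argument.
as_asked = rowtocol_model(names, "#gmail.com$")

# Answer's approach: build each address first, then aggregate with a plain '$'.
as_answered = rowtocol_model([n + "#gmail.com" for n in names], "$")

print(as_asked)
print(as_answered)
```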
There are other ways to do string aggregation in Oracle, but this function is one way, and you already have it installed. I'd look at alternatives though, such as ThomasG's answer, which I think make it a bit clearer what's going on.
As Alex told you in comments, this ROWTOCOL isn't a standard function, so if you don't show its code, there's nothing we can do to fix it.
However, you can accomplish what you want in Oracle 10 using the XMLAGG built-in function.
Try this:
SELECT
rtrim (xmlagg (xmlelement (e, s.username_id || '#' || d.domain_name || '$')).extract ('//text()'), '$') whatever
FROM username s
INNER JOIN domain d ON s.user_id = d.user_id
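The XMLAGG version builds one "username#domain$" text node per joined row, concatenates them, and RTRIM strips the trailing '$'. A small Python analogue follows, using the sample names from the first answer's test data; xmlagg_rtrim is a hypothetical name. Oracle's RTRIM(x, '$') and Python's str.rstrip('$') both remove any run of trailing '$' characters, so this is safe as long as no address itself ends in '$'.

```python
from typing import List, Tuple


def xmlagg_rtrim(rows: List[Tuple[str, str]]) -> str:
    # Each XMLELEMENT contributes the text "username#domain$";
    # XMLAGG(...).extract('//text()') concatenates those texts.
    aggregated = "".join(f"{user}#{domain}$" for user, domain in rows)
    # RTRIM(..., '$') then removes the trailing delimiter.
    return aggregated.rstrip("$")


sample = [("joe.bloggs", "gmail.com"), ("joebloggs", "gmail.com"), ("joe_bloggs", "gmail.com")]
print(xmlagg_rtrim(sample))
```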

Intra-SELECT variables?

Would it be possible to alias an expression returned by a SELECT statement in order to refer to it in other parts of the same SELECT, as if it were an ordinary column?
A kind of "temporary variable" whose scope would be limited to the SELECT statement, a little like the WITH clause before a SELECT that defines a temporary named recordset.
A naive example of what I'd like to achieve:
SELECT
FIRSTNAME + ' ' + NAME AS FULLNAME,
CASE WHEN LEN(FULLNAME)>3 THEN 1 ELSE 0 END AS ISCORRECT
FROM USERS
where FULLNAME could be used to determine the subsequent output field ISCORRECT even though it is not a real column of the USERS table, instead of this laborious, error-prone (but working) copy/paste:
SELECT
FIRSTNAME + ' ' + NAME AS FULLNAME,
CASE WHEN LEN(FIRSTNAME + ' ' + NAME)>3 THEN 1 ELSE 0 END AS ISCORRECT
FROM USERS
This example describes what I want well, but I can easily imagine similar needs where FULLNAME might also be used in other parts of the SELECT statement: in a JOIN, the WHERE clause, GROUP BY, ORDER BY, etc.
PS: I use SQL Server 2005 but would also be interested in any 2008-specific answer.
Thanks a lot ! :-)
Edit:
With all due respect to those of you proposing a side query or inner query, I don't feel at ease with those options. My example really is a naive one. The real queries have about 30 output fields including complex expressions (with calls to CLR functions), 15 inner/left outer joins, and 20 additional WHERE criteria. I would rather not multiply indirections through subqueries if I can avoid it.
I believe you would have to put it in an inner query, and then be able to refer to it outside of the query.
Simplest example based on yours:
select a.fullname, case when len(a.fullname) > 3 then 1
else 0 end as iscorrect
from (select firstname + ' ' + name as fullname
from users) a
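The inner-query approach can be tried end to end with Python's built-in sqlite3 module. This is SQLite, not SQL Server, so concatenation is || and LEN becomes length(); note also that T-SQL's LEN ignores trailing spaces while SQLite's length() does not. The table contents are made up for the test. The point is that fullname is defined once in the derived table and then reused in the outer SELECT list, WHERE and ORDER BY, which covers the other uses mentioned in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (firstname TEXT, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("Ann", "Lee"), ("A", ""), ("Bob", "Ray")])

# fullname is computed once in the derived table "a"; the outer query then
# uses it like an ordinary column in the SELECT list, WHERE and ORDER BY.
rows = conn.execute("""
    SELECT a.fullname,
           CASE WHEN length(a.fullname) > 3 THEN 1 ELSE 0 END AS iscorrect
    FROM (SELECT firstname || ' ' || name AS fullname FROM users) AS a
    WHERE a.fullname <> 'Bob Ray'
    ORDER BY a.fullname
""").fetchall()
conn.close()

print(rows)
```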
Example with a CTE
;with names (FULLNAME) as (
SELECT FIRSTNAME + ' ' + NAME
FROM USERS
) select
FULLNAME,
CASE WHEN LEN(FULLNAME) > 3 THEN 1 ELSE 0 END AS ISCORRECT
FROM names
You can use CROSS APPLY to concatenate strings or do other calculations that involve just the current row.
select T.fullname,
case when len(T.fullname) > 3
then 1
else 0
end iscorrect
from users as U
cross apply
(select U.firstname+' '+U.name) as T(fullname)
order by T.fullname
Though not very satisfied with it, I have chosen (temporarily?) a third option: I avoid subqueries and copy/pasting my complex, hard-to-read expression (represented here by the simple one aliased as FULLNAME) by embedding it in a scalar function, which is therefore called several times in different parts of my SELECT.
SELECT
dbo.GetFULLNAME(FIRSTNAME,NAME) AS FULLNAME,
CASE WHEN LEN(dbo.GetFULLNAME(FIRSTNAME,NAME))>3 THEN 1 ELSE 0 END AS ISCORRECT
FROM USERS
What do you think of it ?
(To be precise: although the real expression is more complex and less readable than in my original post, it remains a "simple" matter of string manipulation using several input fields and doesn't involve any subqueries or anything like that.)
