TSQL search exact match into a string - sql-server

I stumbling on an issue with string parsing; what I'm trying to achieve is substitute a marker string with a value but the string match needs to be perfect.
Keep in mind that before the compare I split the entire string in a table (rowID int, segment nvarchar(max)) wherever i find a space so, a thing like 'The delta_s is §delta_s' will look like:
rowID | segment
1 | the
2 | deltaT_s
3 | is
4 | §deltaT_s
After this i cycle each row with my table of "replacements" (idString nvarchar(max), val float); example:
Marker string (#segment): '§deltaT_s'
String to replace (#idString): '§deltaT_s'
The instruction I am using (since "like" is a lost cause as far I can see):
SELECT STUFF(#segment, PATINDEX('%'+#idString+'[^a-z]%', #segment), LEN(#idString), CAST(#val AS NVARCHAR(MAX)))
with #val being the number to substitute taken from the "replacements" table.
Now, in my table of "replacements" i have 2 delta like markers
1) §deltaT_s
2) §deltaT
My issue is that when the cycle start comparing the segments with the markers and the §deltaT comes up it will substitute the first part of the string in this way
'§deltaT_s' -> '10_s'
I don't understand what I am doing wrong with the REGEX anyone can give me and hand on this matter?
I am available in case more info are required.
Thank you,
F.

If possible you should change the marking style putting a paragraph symbol (§) at both side of the token, making one of the example in your comment
the deltaT_s is §deltaT_s§, see ya!
doing that the sentence will be split as
rowID | segment
--------------------
1 | the
2 | deltaT_s
3 | is
4 | §deltaT_s§,
5 | see
6 | ya!
if the replace values are stored in a fact table you will have something like
token | value
------------------
§deltaT§ | foo
§deltaT_s§ | 10
or you can fake it putting the symbol at the end of the token in you query.
Than it's possible to search for the substitution with a LIKE and a LEFT JOIN between the two tables
SELECT COALESCE(REPLACE(segment, t.token, t.value), segment) Replaced
FROM Sentence s
LEFT JOIN Token t ON s.segment LIKE '%' + t.token + '%'
SQLFiddle demo
If you cannot change the fact table you can fake the change adding the symbol after the token
SELECT COALESCE(REPLACE(segment, t.token, t.value), segment) Replaced
FROM Sentence s
LEFT JOIN Token t ON s.segment LIKE '%' + t.token + '§%'

Maybe it is not an option, but for me helped ones.
If you can use Regex in sql or create CLR functions, look at this link http://www.sqllion.com/2010/12/pattern-matching-regex-in-t-sql/ last 2 options.
For you the best will be to take last choice using CLR function.
Then you will can do like this:
Text: the deltaT_s is §delta, see ya!
Regex: (?<=[^a-z])§delta(?![a-z_]) - this (?<=[^a-z]) means that will not take to match and (?![a-z_]) is not followed by letters and underline.
Replace to : 10
I also have tried regex \b§delta\b (\b :Start or End of word), but it seems it doesn't like §

Related

Use TQuery.Locate() function to find other then first matching

Locate moves the cursor to the first row matching a specified set of search criteria.
Let's say that q is TQuery component, which is connected to the database with two columns TAG and TAGTEXT. With next code I am getting letter a. And I would like to use Locate() function to get letter d.
If q.Locate('TAG','1',[loPartialKey]) Then
begin
tag60 := q.FieldByName('TAGTEXT');
end
For example if I got table like this:
TAG | TAGTEXT
+---+--------+
| 1 | a |
+---+--------+
| 2 | b |
+---+--------+
| 3 | c |
+---+--------+
| 1 | d |
+---+--------+
| 4 | e |
+---+--------+
| 1 | f |
+---+--------+
is it possible to locate the second time number one occurred in table?
EDIT
My job is to find the occurrence of TAG with value 1 (which occurrence I need depends on the parameter I get), I need to iterate through table and get the values from all the TAGTEXT fields till I find that value in TAG field is again number 1. Number 1 in this case represents the start of new segment, and all between the two number 1s belongs to one segment. It doesn't have to be same number of rows in each segment. Also I am not allowed to do any changes on table.
What I thought I could do is to create a counter variable that is going to be increased by one every time it comes to TAG with value 1 in it. When the counter equals to the parameter that represents the occurrence I know that I am in the right segment and I am going to iterate through that segment and get the values I need.
But this might be slow solution, and I wanted to know if there was any faster.
You need to be a bit wary of using Locate for a purpose like this, because some
TDataSet descendants' implementation of Locate (or the underlying db-access layer) construct a temporary index on the dataset. which can be discarded immediately afterwards, so repeatedly calling Locate to iterate the rows of a given segment may be a lot more inefficient than one might expect it to be.
Also, TClientDataSet constructs, uses and then discards an expression parser for each invocation of Locate (in its internal call to LocateRecord), which is a lot of overhead for repeated calls, especial when they are entirely avoidable.
In any case, the best way to do this is to ensure that your table records which segment a given row belongs to, adding a column like the SegmentID below if your table does not already have one:
TAG | TAGTEXT|SegmentID
+---+--------+---------+
| 1 | a | 1
| 2 | b | 1
| 3 | c | 1
| 1 | d | 2
+---+--------+---------+ // btw, what happened to the 2 missing rows after this one?
| 4 | e | 2
| 1 | f | 3
+---+--------+---------+
Then, you could use code like this to iterate the rows of a segment:
procedure IterateSegment(Query : TSomeTypeOfQueryComponent; SegmentID : Integer);
var
Sql; String;
begin
Sql := Format('select * from mytable where SegmentID = %d order by Tag', [SegmentID]);
if Query.Active then
Query.Close;
Query.Sql.Text := Sql;
Query.Open;
Query.DisableControls;
try
while not Query.Eof do begin
// process row here
Query.Next;
end;
finally
Query.EnableControls;
end;
end;
Once you have the SegmentID column in the table, if you don't want to open a new query to iterate a block, you can set up a local index (by SegmentID then Tag), assuming your dataset type supports it, set a filter on the dataset to restrict it to a given SegmentID and then iterate over it
You have much options to do this.
If your component don´t provide a locateNext you can make your on function locateNext, comparing the value and make next until find.
You can also bring the sql with order by then use locate for de the first value and test if the next value match the comparision.
If you use a clientDataset you can filter into the component filter propertie, or set IndexFieldNames to order values instead the "order by" of sql in the prior suggestion.
You can filter it on the SQL Where clausule too.

Find first appearance of a character in a set of possible characters in a string in SQL Server 2012

I'm aware of the SQL Server CHARINDEX function which returns the position of a character (or sub-string) within another string. Still, I did not find any evident that there is support for regular expressions (unless I develop my own UDF).
What I'm looking for is the ability to find the first position of any character in a set within a string.
Example:
DECLARE #_Source_String NVARCHAR(100) = 'This is "MY" string \ and here is more text' ;
SELECT <some function> (#_Source_String,'"\') ;
This should return 9 because " appears before \. On the other hand:
SELECT <some function> (#_Source_String,'x\') ;
should return 21 because \ is before x.
I should add that performance is very important since this function/mechanism will be invoked with very high frequency.
Pattern matching capabilities in TSQL are pretty basic and often you would require CLR and regular expressions.
You can do this requirement with PATINDEX though. A list of characters in square brackets denotes a set of characters to match.
DECLARE #_Source_String NVARCHAR(100) = 'This is "MY" string \ and here is more text';
SELECT PATINDEX('%["\]%', #_Source_String),
PATINDEX('%[x\]%', #_Source_String);
Returns
+------------------+------------------+
| (No column name) | (No column name) |
+------------------+------------------+
| 9 | 21 |
+------------------+------------------+

How do I match a substring of variable length?

I am importing data into my SQL database from an Excel spreadsheet.
The imp table is the imported data, the app table is the existing database table.
app.ReceiptId is formatted as "A" followed by some numbers. Formerly it was 4 digits, but now it may be 4 or 5 digits.
Examples:
A1234
A9876
A10001
imp.ref is a free-text reference field from Excel. It consists of some arbitrary length description, then the ReceiptId, followed by an irrelevant reference number in the format " - BZ-0987654321" (which is sometimes cropped short, or even missing entirely).
Examples:
SHORT DESC A1234 - BZ-0987654321
LONGER DESCRIPTION A9876 - BZ-123
REALLY LONG DESCRIPTION A2345 - B
REALLY REALLY LONG DESCRIPTION A23456
The code below works for a 4-digit ReceiptId, but will not correctly capture a 5-digit one.
UPDATE app
SET
[...]
FROM imp
INNER JOIN app
ON app.ReceiptId = right(right(rtrim(replace(replace(imp.ref,'-',''),'B','')),5)
+ rtrim(left(imp.ref,charindex(' - BZ-',imp.ref))),5)
How can I change the code so it captures either 4 (A1234) or 5 (A12345) digits?
As ughai rightfully wrote in his comment, it's not recommended to use anything other then columns in the on clause of a join.
The reason for that is that using functions prevents sql server for using any indexes on the columns that it might use without the functions.
Therefor, I would suggest adding another column to imp table that will hold the actual ReceiptId and be calculated during the import process itself.
I think the best way of extracting the ReceiptId from the ref column is using substring with patindex, as demonstrated in this fiddle:
SELECT ref,
RTRIM(SUBSTRING(ref, PATINDEX('%A[0-9][0-9][0-9][0-9]%', ref), 6)) As ReceiptId
FROM imp
Update
After the conversation with t-clausen-dk in the comments, I came up with this:
SELECT ref,
CASE WHEN PATINDEX('%[ ]A[0-9][0-9][0-9][0-9][0-9| ]%', ref) > 0
OR PATINDEX('A[0-9][0-9][0-9][0-9][0-9| ]%', ref) = 1 THEN
SUBSTRING(ref, PATINDEX('%A[0-9][0-9][0-9][0-9][0-9| ]%', ref), 6)
ELSE
NULL
END As ReceiptId
FROM imp
fiddle here
This will return null if there is no match,
when a match is a sub string that contains A followed by 4 or 5 digits, separated by spaces from the rest of the string, and can be found at the start, middle or end of the string.
Try this, it will remove all characters before the A[number][number][number][number] and take the first 6 characters after that:
UPDATE app
SET
[...]
FROM imp
INNER JOIN app
ON app.ReceiptId in
(
left(stuff(ref,1, patindex('%A[0-9][0-9][0-9][0-9][ ]%', imp.ref + ' ') - 1, ''), 5),
left(stuff(ref,1, patindex('%A[0-9][0-9][0-9][0-9][0-9][ ]%', imp.ref + ' ') - 1, ''), 6)
)
When using equal, the spaces after is not evaluated

sphinx - Column count doesn't match

I have the following in my sphinx
mysql> desc rec;
+-----------+---------+
| Field | Type |
+-----------+---------+
| id | integer |
| desc | field |
| tid | uint |
| gid | uint |
| no | uint |
+-----------+---------+
And I ran the following successfully in sphinx sql
replace into rec VALUES ('24','test test',1,1, 1 );
But when I run in the C mysql API I get this error
Column count doesn't match value count at row 1
the c code is this
if (mysql_query(con, "replace into rec VALUES ('24','test test',1,1, 1 )") )
{
fprintf(stderr, "%s\n", mysql_error(con));
mysql_close(con);
exit(1);
}
Please note that the C program is connecting to the sphinx sql with no issues
One problem may be that you are quoting the integer for the id column. I would try taking out the single quotes around the 24. The column named desc is also concerning, since that is a reserved word in MySQL.
A good best practice is to always specify the column names, even if you are inserting into all columns. The reason is that you may want to alter the table later to add a column and you don't necessarily want to go back and change all your code to match the new structure. It also makes your code clearer since you don't have to reference the table structure to know what the values mean and it helps in case a tool like Sphinx is using a different order for the columns than you expect. Try changing your code to this, which specifies the columns and quotes them (mysql uses backticks for quotes) and also removes the quotes around the value for the id column:
if (mysql_query(con, "replace into rec (`id`, `desc`, `tid`, `gid`, `no`) VALUES (24, 'test test', 1, 1, 1)") )

A single MySQL query for 'bouncing' table selects

So, say for the sake of simplicity, I have a master table containing two fields - The first is an attribute and the second is the attributes value. If the second field is set to reference a value in another table it is denoted in parenthesis.
Example:
MASTER_TABLE:
Attr_ID | Attr_Val
--------+-----------
1 | 23(table1) --> 23rd value from `table1`
2 | ...
1 | 42 --> the number 42
1 | 72(table2) --> 72nd value from `table2`
3 | ...
1 | txt --> string "txt"
2 | ...
4 | ...
TABLE 1:
Val_Id | Value
--------+-----------
1 | some_content
2 | ...
. | ...
. | ...
. | ...
23 | some_content
. | ...
Is it possible to perform a single query in SQL (without parsing the results inside the application and requerying the db) that would iterate trough master_table and for the given <attr_id> get only the attributes that reference other tables (e.g. 23(table1), 72(table2), ...), then parse the tables names from the parenthesis (e.g. table1, table2, ...) and perform a query to get the (23rd, 72nd, ...) value (e.g. some_content) from that referenced table?
Here is something I've done, and it parses the Attr_Val for the table name, but I don't know how to assign it to a string and then do a query with that string.
PREPARE pstmt FROM
"SELECT * FROM information_schema.tables
WHERE TABLESCHEMA = '<my_db_name>' AND TABLE_NAME=?";
SET #str_tablename =
(SELECT table.tablename FROM
(SELECT #string:=(SELECT <string_column> FROM <table> WHERE ID=<attr_id>) as String,
#loc1:=length(#string)-locate("(", reverse(#string))+2 AS from,
#loc2:=length(#string)-locate(")", reverse(#string))+1-#loc1 AS to,
substr(#string,#loc1, #loc2) AS tablename
) table
); <--this returns 1 rows which is OK
EXECUTE pstmt USING #str_tablename; <--this then returns 0 rows
Any thoughts?
I love the purity of this approach, if pulled off. But I'm thinking you're creating a maintenance bomb. With a cure like this, who needs to be sick?
No one has ever said of a web site "Man, their data sure is pure!" They compliment what is being done with the data. I don't recommend you keep your hands tied behind your back on this one. I guarantee your competitors aren't.

Resources