PostgreSQL full text search for numbers preceded by a hash sign

The documents that I want to run full text search on contain sequences of a hash sign followed by a series of digits, e.g. #12345, #9999. None of the parsers seem to recognize such a sequence as a single token.
The parser does recognize '#' as a blank token, so I thought I could use a synonym dictionary to map '#' to 'num' and then use the follows operator, e.g. # <-> 1234. However, the parser groups all the blank characters into one token, so the token usually contains a leading space, ' #'. I can't make a synonym entry with a leading space (or at least I don't know how to).
If I include the english_stem dictionary in the mapping for the blank token type, then ' #' is recognized as a lexeme, but so are all the other blank tokens, which adds far too much noise to the generated tsvector.
Short of creating a custom parser, is there any way I can configure the search so that I can use full text search to query explicitly for #0000 patterns?
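One possible workaround (a sketch, not from the original post; the docs table, body column, and hashnum prefix are made up for illustration) is to pre-process the text so that each #digits sequence becomes a single word-like token that the default parser keeps intact:

-- Rewrite '#12345' as 'hashnum12345' before building the tsvector.
SELECT to_tsvector('english', regexp_replace(body, '#(\d+)', 'hashnum\1', 'g'))
FROM docs;

-- Apply the same rewrite on the query side.
SELECT *
FROM docs
WHERE to_tsvector('english', regexp_replace(body, '#(\d+)', 'hashnum\1', 'g'))
      @@ to_tsquery('english', 'hashnum12345');

The same expression can be used in an expression index so the query remains indexable.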


Replacing Unicode characters using TRANSLATE function

A customer asked for a custom character-mapping function that converts specific names to ASCII in their SQL database.
Here is a simplified fragment that works (shortened for brevity):
select TRANSLATE(N'àáâãäåāąæậạả',
N'àáâãäåāąæậạả',
N'aaaaaaaaaaaa');
While analyzing the results on the customer's dataset, I noticed one more unmapped symbol, ă, so I added it to the mapper as follows:
select TRANSLATE(N'àáâãäåāąæậạảă',
N'àáâãäåāąæậạảă',
N'aaaaaaaaaaaaa');
Unexpectedly, it started failing with the message:
The second and third arguments of the TRANSLATE built-in function must contain an equal number of characters.
Obviously, TRANSLATE thinks that ă is special and consists of more than one character. Actually, even Notepad thinks the same (copy ă and try to delete it with the Backspace key: something unusual happens. The Delete key works normally, though).
Then I thought: if TRANSLATE considers it a two-character symbol, let's add a two-character mapping:
select TRANSLATE(N'àáâãäåāąæậạảă',
N'àáâãäåāąæậạảă',
N'aaaaaaaaaaaaaa');
No errors this time, yay. But the input string was not processed correctly: ă was not replaced with a.
What is the correct (case-sensitive) way to replace such "double symbols"? Can it be done using TRANSLATE at all? I don't want to add a bunch of REPLACE calls for every such symbol I find.
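One way to confirm that suspicion (a sketch, assuming SQL Server, which is where this TRANSLATE error message comes from) is to look at how the pasted character is actually encoded:

-- If LEN() returns 2, the pasted ă is stored as 'a' (U+0061) followed by a
-- combining breve (U+0306) rather than the single precomposed code point U+0103.
DECLARE @c nvarchar(10) = N'ă';
SELECT LEN(@c)                      AS char_count,
       UNICODE(SUBSTRING(@c, 1, 1)) AS first_code_point,
       UNICODE(SUBSTRING(@c, 2, 1)) AS second_code_point;

If it is indeed a decomposed pair, TRANSLATE cannot map it on its own, because TRANSLATE only maps single characters one-to-one; the two-character sequence has to be handled separately (for example by replacing it, or by normalizing the input before it reaches the mapper).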

Azure Search filter when a search value contains a single quote

I am building an app where you can search for gamers. I have the filter below:
var filter = $"(Game/any(x: search.in(x, '{string.Join("|", query.Games)}', '|')))";
The property Game is an Edm.Collection, and there should be a match if one or more of the entries in the query.Games list is present. This works fine in most scenarios, but not if one of the game names contains a single quote ('). As far as I know it is not a special character, so how should it be escaped?
Single quotes are reserved characters in the OData filter syntax: they are used to delimit the literal in the filter expression. To work around this, you can add an extra single quote next to the one in the name of your game, and it will be interpreted as a quote that is part of the text.
For example, if you have a game named ga'me2, you can use it in the filter as shown below.
$filter=Game/any(x: search.in(x, 'game1, ga''me2'))
In your case, you will probably have to do a String.Replace on the individual game names.
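A sketch of that String.Replace step (assuming query.Games is a collection of strings; the variable names come from the question):

using System.Linq;   // for Select()

// Double every single quote in each game name before building the OData filter,
// so a name like "ga'me2" becomes "ga''me2" inside the quoted literal.
var escapedGames = query.Games.Select(g => g.Replace("'", "''"));
var filter = $"(Game/any(x: search.in(x, '{string.Join("|", escapedGames)}', '|')))";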

Mongo/Mongoose query to match a field while ignoring spaces

Can we create a Mongoose query that matches a specific field's data while ignoring the spaces in it?
For example, the series 'KA 04 A' and 'KA04A' are the same. I want to check, while adding a new series, whether that series already exists in my MongoDB database (through Mongoose).
Currently my series field has spaces in it.
My current Mongoose code is:
seriesModel.find({series_name: req.body.seriesName.toUpperCase(), status: 'active'}, function (err, data) {
    if (err) {
        logger.info(err);
        return false;
    }
    else if (data.length) {
        return res.json({error: true, message: 'Series name already exists.'});
    }
});
How can I compare against the field data from the DB without the spaces, so that I can check whether the same series already exists?
Ideally you would standardize your series_name field before saving the record, using a custom hook that strips out whitespace.
That being said, you can use a regular expression in your query. For example, you can search for uppercase/lowercase variations on the word "mongoose" as follows:
model.findOne({name: /mongoose/i}, function (err, doc) { ... });
One way to solve your problem would be to take your string, split it into individual characters, and then inject the optional-whitespace pattern \s* between all of the characters (I am assuming there is no leading or trailing whitespace in this field):
"^" + "KA04A".split("").join("\\s*") + "$" // "^K\s*A\s*0\s*4\s*A$"
(The start ^ and end $ are necessary so that we do not get false positives on e.g. "xKA04Ax")
Next, you would need to convert this string into regex form (/^K\s*A\s*0\s*4\s*A$/) and plug it into your query:
model.findOne({name: new RegExp("^K\\s*A\\s*0\\s*4\\s*A$")}, function (err, doc) { ... });
Major caveat: you will need to be very careful that your original string contains only a-zA-Z0-9 and whitespace. It cannot contain anything that could be mistaken for a regex pattern without first escaping those characters (see here for more on this approach); the sketch below includes such an escaping step.
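Putting it together, a sketch of the whole check (the seriesNameRegex helper and the metacharacter-escaping step are additions for illustration, not part of the original code):

// Build a regex that matches the series name regardless of spacing.
function seriesNameRegex(name) {
    var chars = name.toUpperCase().replace(/\s+/g, '').split('');   // drop spaces in the input
    var escaped = chars.map(function (ch) {
        return ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');           // escape regex metacharacters
    });
    return new RegExp('^' + escaped.join('\\s*') + '$');            // allow optional spaces between characters
}

seriesModel.findOne(
    { series_name: seriesNameRegex(req.body.seriesName), status: 'active' },
    function (err, data) {
        if (err) {
            logger.info(err);
            return false;
        }
        if (data) {
            return res.json({ error: true, message: 'Series name already exists.' });
        }
        // ...otherwise go ahead and save the new series
    }
);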

Word wrap issues with SSIS Flat file destination

Background: I need to generate a text file with 5 records, each 1565 characters long. This text file is then used to feed the data into another piece of software.
There are therefore some required fields and some optional fields. I created a query that concatenates all the fields into one single field, populating the optional fields with blanks.
For example:
Here is the sample input layout for the fields:
Field        CharLength  Required
ID           7           Yes
Name         15          Yes
Address      15          No
DOB          10          Yes
Age          1           No
Information  200         No
IDNumber     13          Yes
I then generated a query that combines the above fields into a single row for each unique ID, which looks like the following:
SELECT CAST(1 AS CHAR(7)) + CAST('XYZ' AS CHAR(15)) + CAST('' AS CHAR(15)) + CAST('22/12/2014' AS CHAR(10))
     + CAST('' AS CHAR(1)) + CAST('' AS CHAR(200)) + CAST('123456' AS CHAR(13))
UNION
SELECT CAST(2 AS CHAR(7)) + CAST('XYZ' AS CHAR(15)) + CAST('' AS CHAR(15)) + CAST('22/12/2014' AS CHAR(10))
     + CAST('' AS CHAR(1)) + CAST('' AS CHAR(200)) + CAST('123456' AS CHAR(13))
Then I created an SSIS package to produce the output text file through a Flat File Destination (delimited).
Problem:
Even though the flat file is generated at the desired length (1565), the text file looks different depending on whether word wrap is on or off. When word wrap is off, I get each record on a single line. When word wrap is on, the line is broken into multiple lines. The length of the record is the same in either case.
I even tried using VARCHAR plus SPACE() in the query instead of CHAR for each field, but with no success; it still breaks the line at the blank fields.
For example, for the Information field: CAST('' AS VARCHAR(1)) + SPACE(200 - LEN(CAST('' AS VARCHAR(1))))
Question: How do I get the record onto a single line even when word wrap is on?
Since this is my first post, please excuse the formatting of the question.
The purpose of word wrap is to put characters on the next line when they would overflow, rather than producing an extremely wide document that you have to scroll horizontally.
Word wrap is an additional feature of most text editors, word processors, and web browsers that breaks lines between words rather than within words, where possible.
Because that is what word wrap is, there's nothing you can do to change its behavior. What does it matter anyway? The document will still be parsed as you would expect; just don't turn word wrap on.
As far as I'm aware, having word wrap on or off has no impact on the document itself; it's simply a presentation option.
Applications parsing a document parse it as if word wrap were off. Something that could throw off parsing is actual line breaks in the data, but that is a completely different thing from word wrap.
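If you want to reassure yourself that each assembled record really is the fixed width you expect, regardless of how an editor displays it, you can measure it in the query itself (a sketch wrapping the sample SELECT from the question; DATALENGTH is used because LEN would trim the trailing spaces):

SELECT DATALENGTH(record) AS record_length
FROM (
    SELECT CAST(1 AS CHAR(7)) + CAST('XYZ' AS CHAR(15)) + CAST('' AS CHAR(15))
         + CAST('22/12/2014' AS CHAR(10)) + CAST('' AS CHAR(1))
         + CAST('' AS CHAR(200)) + CAST('123456' AS CHAR(13)) AS record
) AS t;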

.NET Regex for SQL Server string... but not Unicode string?

I'm trying to build a .NET regex to match SQL Server constant strings... but not Unicode strings.
Here's a bit of SQL:
select * from SomeTable where SomeKey = 'abc''def' and AnotherField = n'another''value'
Note that, within a string, two single quotes escape a single quote.
The regex should match 'abc''def' but not n'another''value'.
I have a regex now that manages to locate a string, but it also matches the Unicode string (starting just after the N):
'('{2})*([^']*)('{2})*([^']*)('{2})*'
Thanks!
This pattern will do most of what you are looking for:
(?<unicode>n)?'(?<value>(?:''|[^'])*)'
The upside is that it should accurately match any number of escaped quotes. (SomeKey = 'abc''''def''' will match abc''''def''.)
The downside is it also matches Unicode strings, although it captures the leading n to identify it as a Unicode string. When you process the regular expression, you can ignore matches where the match group "unicode" was successful.
The pattern creates the following groups for each match:
unicode: Success if the string is a Unicode string, fails to match if ASCII
value: the string value; escaped single quotes remain escaped (doubled)
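In C#, consuming the pattern and skipping the Unicode matches might look like this (a sketch; the sample sql text and the IgnoreCase option are assumptions):

using System;
using System.Text.RegularExpressions;

var pattern = @"(?<unicode>n)?'(?<value>(?:''|[^'])*)'";
var sql = "select * from SomeTable where SomeKey = 'abc''def' and AnotherField = n'another''value'";

foreach (Match m in Regex.Matches(sql, pattern, RegexOptions.IgnoreCase))
{
    if (m.Groups["unicode"].Success)
        continue;                                  // skip the N'...' Unicode literal
    Console.WriteLine(m.Groups["value"].Value);    // prints abc''def (quotes still doubled)
}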
If you are using .NET regular expressions, you could add (?(unicode)(?<-value>)) to the end of the pattern to suppress matching the value, although the pattern as a whole would still match.
Edit
Having thought about it some more, the following pattern should do exactly what you wanted; it will not match Unicode strings at all. The above approach might still be more readable, however.
(?:n'(?:''|[^'])*'[^']*)*(?<!n)'(?<value>(?:''|[^'])*)'
