QRadar log parsing

I want to parse some application logs. I wrote a lot of regexes that work correctly in Notepad++ and on www.regex101.com, but when I apply them in QRadar they match nothing.
For example, given this event:
12/2/2017 9:53:58,4040007,blablablbla,blablabla --- Abonnement Mobile N° : 0663016666 | balbalbal | 03/06/2006 11:11:22 --- Soldes,10.10.10.10
I wrote this regex (?<=---)\s+[A-Za-z+ \/\w+0-9._%+-]+(?=(\sN°|\s\sN°|\sID)) to match "Abonnement Mobile". It works correctly elsewhere, but it doesn't match anything in QRadar.

QRadar does not accept every regex construct. When you are building a parser you can test your pattern with the extract-property field. Here is a regex that works fine on my system:
\-\-\-\s(\w+\s\w+)\s
This regex works only as long as the "Abonnement Mobile" field contains just letters or digits. If you want to capture "Abonnement Mobile N°" and have it work whatever appears in this field, you can use this regex instead:
\-\-\-\s([^\:]+)\:
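As a quick sanity check outside QRadar, here is a minimal Python sketch (my own, not from the original answers) that runs both suggested patterns against the sample event; it only shows what the patterns capture, not how QRadar itself will behave:

import re

line = ("12/2/2017 9:53:58,4040007,blablablbla,blablabla --- "
        "Abonnement Mobile N° : 0663016666 | balbalbal | "
        "03/06/2006 11:11:22 --- Soldes,10.10.10.10")

# Two runs of word characters after "--- ": captures "Abonnement Mobile"
print(re.search(r"\-\-\-\s(\w+\s\w+)\s", line).group(1))

# Everything after "--- " up to the first colon: captures "Abonnement Mobile N°"
print(re.search(r"\-\-\-\s([^\:]+)\:", line).group(1).strip())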

Related

Speech Synthesis Markup Language (SSML) on blueprint?

I was trying to use SSML syntax while filling out the Alexa blueprint skill form but then I got an error. Is there a way the form can support SSML?
Entry:
<amazon:effect name="whispered">I am not a real human.</amazon:effect>.
Error:
Special characters are not supported. However, Alexa can speak special characters ( @ # $ % _ + = | ; ), if enclosed in single quotes ( '#' ).
Sorry, but (currently) you can only use PlainText in blueprint-based skills, because (as the error message you mentioned says) the special characters needed for an XML-based syntax are not supported by the form.
Just an additional hint for your SSML text, if you want to use it in a regular skill:
As an XML-based syntax it needs an enclosing speak tag and no final dot outside the tag.
<speak><amazon:effect name="whispered">I am not a real human.</amazon:effect></speak>
https://developer.amazon.com/en-US/docs/alexa/custom-skills/speech-synthesis-markup-language-ssml-reference.html
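For context, in a regular custom skill the SSML goes into the response's outputSpeech object. A minimal sketch of such a response body, built here as a Python dict (the handler and the rest of the skill are omitted):

ssml = '<speak><amazon:effect name="whispered">I am not a real human.</amazon:effect></speak>'

# Minimal custom-skill response body carrying SSML output speech.
response = {
    "version": "1.0",
    "response": {
        "outputSpeech": {
            "type": "SSML",   # as opposed to "PlainText"
            "ssml": ssml,
        },
        "shouldEndSession": True,
    },
}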

Regex Negative lookbehind in React app breaking app on iOS devices

After a recent build my app has stopped displaying on iOS devices (just shows a blank screen).
After a lot of digging, I've been able to narrow down the cause to this regex expression:
(?<!@)
Here's the context in which I used it:
/\b(?<!@)gmail\b|\b(?<!@)google\b/i
which means I want to capture the words "gmail" and "google", but only if they are not preceded by an "@" symbol.
My question is, what is the correct regex expression that will do the same job, and work on all browsers/devices?
Thank you
You could capture the words "gmail" and "google" only when they are not preceded by an "@" symbol by first consuming the prefixed variants with a non-capturing group @(?:gmail|google).
Use an alternation | and a capture group (gmail|google) for the google or gmail that you do want:
@(?:gmail|google)\b|\b(gmail|google)\b
See a regex demo
For example, if you are doing a replacement you could check for the existence of group 1.
const regex = /@(?:gmail|google)\b|\b(gmail|google)\b/g;
const str = `gmail
google
@gmail
@google
test@google.com
@agmail`;
let res = Array.from(str.matchAll(regex), m => m[1] ? `[REPLACED]${m[1]}[REPLACED]` : m[0]);
console.log(res)
You could just directly match a non-@ character:
/[^@](google|gmail)\b/i
If you also need the domain itself (which can only be google or gmail), you can read it from the first capture group.
It seems that a combination of your two answers worked for me:
/[^@](?:gmail|google)\b|\b(gmail|google)\b/i
Thanks!

Azure search not behaving as expected for dashes

I'm having an issue when using Azure Search with the following example data set: abc-123-456, abc-123-457, abc-123-458, etc.
When searching for abc-123-456, I'd expect to get only one result, but instead I get all results containing abc-123-...
Is there some setting or way to change this behavior?
Current search settings:
TheSearchIndex.TokenFilters.Add(new EdgeNGramTokenFilter("frontEdgeNGram")
{
    Side = EdgeNGramTokenFilterSide.Front,
    MinGram = 3,
    MaxGram = 20
});

TheSearchIndex.Analyzers.Add(new CustomAnalyzer("FrontEdgeNGram", LexicalTokenizerName.Whitespace)
{
    TokenFilters =
    {
        TokenFilterName.Lowercase,
        new TokenFilterName("frontEdgeNGram"),
        TokenFilterName.Classic,
        TokenFilterName.AsciiFolding
    }
});

SearchOptions UsersSearchOptions = new SearchOptions
{
    QueryType = SearchQueryType.Simple,
    SearchMode = SearchMode.All,
};
Using azure.search.documents ver 11.1.1
Edit: Searching for abc-123-456* with a trailing asterisk gives me the one result as expected. How do I get this behavior by default?
Just to add to this:
The portal version is 2020-06-30.
The SDK version we use is azure.search.documents ver 11.1.1.
abc-123-456 does NOT work as expected
"abc-123-456" does NOT work as expected
"abc-123-456"* does NOT work
"abc-123-456*" does NOT work
If we append an asterisk to the end of the search text and it is not within a phrase, it works as expected, e.g.:
abc-123-456* works as expected.
(abc-123-456* | abc-123-457* ) works as expected.
Why is the asterisk required? How can we make this work within a phrase?
This is expected behavior when using the EdgeNGramTokenFilter inside the custom analyzer configuration. The text “abc-123-456” is broken into smaller tokens like “abc”, “abc-1”, “abc-12”, “abc-123”, …, “abc-123-456”. Check out the Analyze Text API for the full list of tokens generated by a particular analyzer.
For a query such as abc-123, if the default analyzer is used, the query terms will be abc and 123, and they will match all documents that contain those terms.
The prefix query, on the other hand, is not analyzed and looks for documents that contain the prefix as is, “abc-123”. A prefix search bypasses full-text search and looks for verbatim matches, which is why the correct result comes back. Full-text search is over tokens in inverted indexes; everything else (filters, fuzzy, regex, prefix/wildcard, etc.) is over verbatim strings in a separate unprocessed internal index.
Another option is to set only the search analyzer on the field to keyword, so the input query is not broken into tokens.
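A rough sketch of that idea using the Python azure-search-documents SDK (the field name sku is a placeholder; the .NET SDK used in the question exposes the equivalent IndexAnalyzerName / SearchAnalyzerName properties):

from azure.search.documents.indexes.models import SearchField, SearchFieldDataType

# Hypothetical field definition: index text with the custom n-gram analyzer,
# but analyze queries with the built-in keyword analyzer so a search for
# "abc-123-456" stays a single token instead of being broken apart.
sku_field = SearchField(
    name="sku",                            # placeholder field name
    type=SearchFieldDataType.String,
    searchable=True,
    index_analyzer_name="FrontEdgeNGram",  # custom analyzer from the question
    search_analyzer_name="keyword",        # built-in keyword analyzer
)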

Validate Angular regex so it doesn't allow info@, admin@, help@, sales@

I would like to block some kinds of emails using Angular's ng-pattern.
The emails below should not be valid:
info@anything.com
admin@anything.com
help@anything.com
sales@anything.com
The regex below worked:
^((?!info)(?!admin)(?!help)(?!sales)[a-zA-Z0-9._%+-])+@[a-zA-Z0-9.-]+\.[a-zA-Z]{1,63}$
But not the way I expected, because I would like to allow, for example:
information@anything.com
How can I block info@, admin@, help@, sales@?
Thanks
You may join the lookaheads into one and add @ after the values to ensure you match the whole user part up to the @:
/^(?!(?:info|admin|help|sales)@)[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{1,63}$/
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
See the regex demo.
The (?!(?:info|admin|help|sales)@) negative lookahead fails the match if, right after the start of the string (^), there is info@, admin@, help@, or sales@.
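A quick way to see the pattern's effect (a plain Python check of the same regex; Angular's ng-pattern just takes the pattern string itself):

import re

pattern = re.compile(
    r'^(?!(?:info|admin|help|sales)@)[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{1,63}$'
)

# Blocked: the lookahead sees "info@" / "sales@" right after the start.
print(bool(pattern.match('info@anything.com')))         # False
print(bool(pattern.match('sales@anything.com')))        # False
# Allowed: "information@" is not one of the blocked prefixes followed by "@".
print(bool(pattern.match('information@anything.com')))  # True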

How to query atom field with unicode value in Google App Engine production search?

I wrote some text search using Google App Engine search.
In the SDK I tested this kind of query on an atom field:
u'tag:"wartości"'
In production I run the same query on the same data, but it does not work.
How can I do a unicode query on an atom field?
Is it possible to use unicode in Google App Engine search?
We are aware of this issue and plan to fix it ASAP. The fix that we're currently planning will require that the atom field value include exactly the same accent characters in order to match. Matches will continue to be case-insensitive. We expect that, at least initially, values that use combining diacritical marks will be treated as different values than those using precomposed characters. We may revisit that decision depending on feedback, but it's the most straightforward fix on our end.
For more on the precomposed characters vs. combining diacritical marks, see this Wikipedia article:
http://en.wikipedia.org/wiki/Precomposed_character
Chris
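As a small illustration of that precomposed-vs-combining distinction (my own example, using Python's unicodedata module):

import unicodedata

precomposed = u'wartości'      # uses U+015B, precomposed s with acute
combining = u'wartos\u0301ci'  # uses 's' + U+0301, combining acute accent

print(precomposed == combining)                                # False
print(unicodedata.normalize('NFC', combining) == precomposed)  # True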
It looks like I need to translate the AtomField values into a new string, and I need to translate the queries too. This workaround only covers Polish unicode search. I do not know the tokenization rules, so I use 'q' and 'x' to expand the alphabet, since they are not used in Polish.
# coding=utf-8
# Map Polish diacritics to ASCII digraphs built from letters unused in Polish.
translate = {
    u'ą': u'aq',
    u'Ą': u'Aq',
    u'ć': u'cq',
    u'Ć': u'Cq',
    u'ę': u'eq',
    u'Ę': u'Eq',
    u'ł': u'lq',
    u'Ł': u'Lq',
    u'ń': u'nq',
    u'Ń': u'Nq',
    u'ó': u'oq',
    u'Ó': u'Oq',
    u'ś': u'sq',
    u'Ś': u'Sq',
    u'ż': u'zx',
    u'Ż': u'Zx',
    u'ź': u'zq',
    u'Ź': u'Zq',
}

import re

# Single alternation matching any of the characters above.
reTranslate = re.compile(u'(%s)' % u'|'.join(translate))
print reTranslate.pattern

test = u"""\
Właściwie prowadzona komunikacja wewnętrzna w firmie, \
zwłaszcza dużej czy posiadającej rozproszoną sieć oddziałów, \
może przynieść oszczędność czasu, a co za tym idzie, również pieniędzy."""

# Replace every diacritic with its ASCII digraph.
print reTranslate.sub(lambda match: translate[match.group(0)], test)
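Continuing the snippet above, translating the query before it is sent could look roughly like this; the index name 'docs', the field name 'tag', and the use of google.appengine.api.search are my assumptions, not part of the original answer:

from google.appengine.api import search

def normalize(text):
    # Apply the same diacritic substitution to query text as to the stored values.
    return reTranslate.sub(lambda m: translate[m.group(0)], text)

# u'wartości' becomes u'wartosqci', matching the translated AtomField value.
query = u'tag:"%s"' % normalize(u'wartości')
results = search.Index(name='docs').search(query)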
