What is a better set of settings than plain text for comparing XML files using Collaborator's DiffMerge?

Collaborator uses DiffMerge to compare files. It provides a means to add rulesets, but nothing is provided for XML files. I'd like to be able to compare without including the comments. I can get comments that sit on a single line to behave with \<!--.*--\>
but multi-line comments are not working.

Better, but not close to perfect. XML really needs ...
In any case, creating a Custom Context for the multi-line comments does exclude those comments from the testing of "this changed".
Ruleset: XML Files
    Suffixes: xml runsettings config
    Line Match Handling: [0x00000010]
        Ignore/Strip EOLs: true
        Ignore/Fold Case: true
        Strip Whitespace: true
        Also Treat TABs as Whitespace: true
    Default Context Guidelines: [0x0000001a]
        Classify Differences as Important: true
        EOL differences are important: N/A
        Case differences are important: true
        Whitespace differences are important: false
        Treat TABs as Whitespace: true
    Custom Contexts: [1 contexts]
        Context[0]: Comment: \<!-- to --\> (Escape character \)
            Guidelines: [0x0000001b]
                Classify Differences as Important: false
                EOL differences are important: N/A
                Case differences are important: N/A
                Whitespace differences are important: N/A
                Treat TABs as Whitespace: N/A
    Character Encoding:
        Automatically detect Unicode BOM: true
        Fallback Handling: Use System Local/Default
    Lines To Omit: [3 patterns]
        LOmit[0]: Each Line Matching: ^[[:blank:]]*$
        LOmit[1]: Each Line Matching: \f
        LOmit[2]: Each Line Matching: \<!--.*--\>
The important parts are the context start \<!--, the context end --\>, and the escape character \,
and realizing that the ignored content does not get grayed out.
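
As a rough illustration of why a per-line pattern such as <!--.*--> misses comments that span several lines, here is a minimal Python sketch. It is not part of DiffMerge or Collaborator. It only shows the regex behaviour and one possible way to strip comments before comparing files yourself. The function name strip_xml_comments and the sample text are assumptions made for this example.

```python
import re

# Per-line style pattern: '.' does not cross line breaks by default.
SINGLE_LINE_COMMENT = re.compile(r"<!--.*-->")

# Non-greedy pattern with DOTALL so '.' also matches newlines.
ANY_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_xml_comments(text):
    """Return text with every <!-- ... --> block removed (hypothetical helper)."""
    return ANY_COMMENT.sub("", text)


sample = "<a><!-- first line\nsecond line --><b/></a>"

# The single-line pattern finds nothing in a comment that spans two lines.
print(SINGLE_LINE_COMMENT.search(sample))

# The DOTALL pattern removes the whole comment.
print(strip_xml_comments(sample))
```

Regex-based stripping is only a sketch: it ignores CDATA sections and other XML corner cases, so a real XML parser would be safer for anything beyond a quick comparison.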

Related

VBSCRIPT REPLACE not removing spaces from Decrypted fields

Got quite a head-scratcher....
I'm using the VBScript function REPLACE to replace spaces in a decrypted field from a MSSQL DB with "/".
But the REPLACE function isn't "seeing" the spaces.
For example, if I run any one of the following, where the decrypted value of the field "ITF_U_ClientName_Denc" is "Johnny Carson":
REPLACE(ITF_U_Ledger.Fields("ITF_U_ClientName_Denc")," ","/")
REPLACE(ITF_U_Ledger.Fields("ITF_U_ClientName_Denc")," ","/")
REPLACE(ITF_U_Ledger.Fields("ITF_U_ClientName_Denc"),"Chr(160)","/")
REPLACE(CSTR(ITF_U_Ledger.Fields("ITF_U_ClientName_Denc"))," ","/")
REPLACE(ITF_U_Ledger.Fields("ITF_U_ClientName_Denc")," ","/",1,-1,1)
REPLACE(ITF_U_Ledger.Fields("ITF_U_ClientName_Denc")," ","/",1,-1,0)
The returned value is "Johnny Carson" (space not replaced with /)
The issue seems to be exclusively with spaces, because when I run this:
REPLACE(ITF_U_Ledger.Fields("ITF_U_ClientName_Denc"),"a","/")
I get "Johnny C/rson".
Also, the issue seems to be exclusively with spaces in the decrypted value, because when I run this:
REPLACE("Johnny Carson"," ","/")
Of course, the returned value is "Johnny/Carson".
I have checked what is being written to the source of the page and it is simply "Johnny Carson" with no encoding or special characters.
I have also tried the SPLIT function to see if it would "see" the space, but it doesn't.
Finally, thanks to a helpful comment, I tried VBS REGEX searching for \s.
Set regExp = New RegExp
regExp.IgnoreCase = True
regExp.Global = True
regExp.Pattern = "\s" 'Add here every character you don't consider as special character
strProcessed = regExp.Replace(ITF_U_Ledger.Fields("ITF_U_ClientName_Denc"), "?")
Unfortunately, strProcessed returns "Johnny Carson" (i.e. spaces not detected/removed).
If I replace regExp.Pattern = "a", strProcessed returns "Johnny C?rson".
Many thanks for your help!!
As we found, the right character code is 160, and that did the trick:
replace(..., ChrW(160), "...")
This seems to be data-specific. As an alternative, you can try to use the same encoding for the source script (i.e. save it with Save As with Encoding), or convert the value received from the database into a different target encoding.
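
The same effect can be reproduced outside VBScript. The Python sketch below assumes the field contains a non-breaking space (character code 160) instead of an ordinary space (code 32), which is what the answer above found. The sample value and the helper name find_odd_whitespace are made up for illustration.

```python
# Hypothetical value: looks like "Johnny Carson" but uses U+00A0 (code 160).
name = "Johnny\u00a0Carson"


def find_odd_whitespace(text):
    """Return (index, code point) for whitespace that is not a plain space."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if ch.isspace() and ch != " "]


# Replacing the ordinary space changes nothing, because there is none.
print(name.replace(" ", "/") == name)

# Listing the code points reveals the real character.
print(find_odd_whitespace(name))

# Replacing code 160 works, mirroring replace(..., ChrW(160), "...").
print(name.replace("\u00a0", "/"))
```

Printing the character codes of a suspicious string is usually the quickest way to find out which "space" is actually stored.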

IRI.getShortForm() does not work as expected with spaces and some extra symbols

It seems that getShortForm() of the IRI class is not able to process the classnames with spaces and some other symbols.
Is there a method in the OWL-API that parses IRIs more correctly (the same way as Protege does)?
For this code
for (OWLClass cls : clses) {
    String s = cls.toString();
    String s1 = cls.asOWLClass().getIRI().getShortForm();
    System.out.println("SHORT: "+s1+" LONG: "+s);
}
I've got the following strange results:
SHORT: CAPECCWEAttackPatterns#DoS: resource consumption (memory) LONG: <http://www.grsu.by/net/CAPECCWEAttackPatterns#DoS: resource consumption (memory)>
SHORT: restart LONG: <http://www.grsu.by/net/CAPECCWEAttackPatterns#DoS: crash / exit / restart>
SHORT: data LONG: <http://www.grsu.by/net/CAPECCWEAttackPatterns#Modify application data>
IRIs cannot contain spaces. They need to be escaped as %20 sequences. The same is true for a number of other characters.
For spaces and other characters, the common approach is to use rdfs:label annotations. Protege has renderers that use these to show classes and properties on screen.
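
To illustrate the escaping mentioned above, here is a small Python sketch that percent-encodes a human-readable name before appending it to a namespace. It does not use the OWL API. The helpers make_iri and short_form are hypothetical, and short_form is only a naive "text after #" split, not a reimplementation of getShortForm().

```python
from urllib.parse import quote, unquote

# Namespace taken from the output shown in the question.
BASE = "http://www.grsu.by/net/CAPECCWEAttackPatterns#"


def make_iri(label):
    """Build an IRI whose fragment is the percent-encoded label (hypothetical helper)."""
    return BASE + quote(label, safe="")


def short_form(iri):
    """Naive short form: the decoded text after the last '#' (hypothetical helper)."""
    return unquote(iri.rsplit("#", 1)[1])


iri = make_iri("DoS: crash / exit / restart")
print(iri)
print(short_form(iri))
```

With the spaces, colon and slashes escaped, there is no ambiguity about where the fragment ends, and the readable text can still be kept separately in an rdfs:label annotation.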

Validate angular regex so it doesn’t have info@, admin@, help@, sales@

I would like to block some kinds of emails using angular ng-pattern
The emails below should not be valid
info@anything.com
admin@anything.com
help@anything.com
sales@anything.com
The regex below worked
^((?!info)(?!admin)(?!help)(?!sales)[a-zA-Z0-9._%+-])+@[a-zA-Z0-9.-]+\.[a-zA-Z]{1,63}$
But not as I expected, because I would like to allow, e.g.,
information@anything.com
How can I block the info@, admin@, help@, sales@?
Thanks
You may join the lookaheads into one and add @ after the values to ensure you match the user part up to @ (as a whole):
/^(?!(?:info|admin|help|sales)@)[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{1,63}$/
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
See the regex demo
The (?!(?:info|admin|help|sales)@) negative lookahead fails the match if, after the start of a string (^), there is info@, admin@, help@, or sales@.
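
The lookahead can be checked quickly outside Angular. The sketch below uses Python's re module, which supports the same negative-lookahead syntax, and writes the addresses with @ as the separator, as in ordinary email addresses. The function name is_allowed is an assumption for this example.

```python
import re

# Same pattern as in the answer, with '@' as the address separator.
EMAIL_PATTERN = re.compile(
    r"^(?!(?:info|admin|help|sales)@)[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{1,63}$"
)


def is_allowed(email):
    """Return True when the address matches and is not a blocked mailbox."""
    return EMAIL_PATTERN.match(email) is not None


for address in ("info@anything.com", "sales@anything.com", "information@anything.com"):
    print(address, is_allowed(address))
```

Because the @ sits inside the lookahead, only the exact user parts are rejected, while longer names that merely start with one of the blocked words still pass.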

removing portion of filename

I have done some searching but cannot see how to actually code this. I am new to Python and not really sure what method I should use to try to do this.
I have some files that I would like to rename. Unfortunately the portion towards the file extension is never the same and would like to just remove it.
File name is like AC_DC - Shot Down In Flames (Official Video)-UKwVvSleM6w.mp3
Any help would be appreciated.
Since this looks like the result from youtube-dl, the "random" substring is most likely the unique video id, which in my experience is always 11 characters long. It can, however, include dashes (-), so the regex-approach suggested by smitrp would not always work.
I use this "dirty" workaround:
>>> original_name="AC_DC - Shot Down In Flames (Official Video)-UKwVvSleM6w.mp3"
>>> new_name=original_name[:-16]+".mp3"
>>> new_name
'AC_DC - Shot Down In Flames (Official Video).mp3'
Edit:
If you really, REALLY want to find the "-XXXX"-portion, have a look at str.rfind(). This will help you to find the index of the last dash (-), which you can directly use for the slice notation of the string.
Disclaimer:
This will provide wrong results, if the video id contains a dash, e.g. here: https://www.youtube.com/watch?v=7WVBEB8-wa0
Then you will find the last dash, remove -wa0 and be left with -7WVBEB8 at the end of the filename.
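
One way to avoid the dash problem described above is to anchor on the length of the id rather than on the last dash. The sketch below assumes, as this answer does, that the id is always 11 characters long, and further assumes it consists only of letters, digits, underscores and dashes. The helper name strip_video_id is made up for illustration.

```python
import re
from pathlib import Path

# A dash followed by exactly 11 id characters at the very end of the stem.
# The length and character set are assumptions, not a documented guarantee.
ID_SUFFIX = re.compile(r"-[A-Za-z0-9_-]{11}$")


def strip_video_id(filename):
    """Return filename without a trailing '-<11 character id>' before the extension."""
    path = Path(filename)
    return ID_SUFFIX.sub("", path.stem) + path.suffix


print(strip_video_id("AC_DC - Shot Down In Flames (Official Video)-UKwVvSleM6w.mp3"))
print(strip_video_id("Some Title-7WVBEB8-wa0.mp3"))
```

A file whose real title happens to end in a dash plus 11 such characters would also be shortened, so it is worth previewing the new names before actually renaming anything.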
Building on the idea in the answer above, one can also take into account that a normal word does not
contain more than one capital letter.
def youtube_name_fix(folder):
    import os
    from pathlib import Path
    import re
    REGEX = re.compile(r'[A-Z]')
    for name in os.listdir(folder):
        basename = Path(name)
        last_12 = basename.stem[-12:]
        # check if the end string is not all uppercase (then it could be part of a valid name)
        if not last_12.isupper():
            # check if the last string has more than one uppercase letters
            if len(REGEX.findall(last_12)) > 1:
                # remove the end youtube string and create new full path
                new_name = os.path.join(folder, basename.stem[:-12] + basename.suffix)
                try:
                    os.rename(os.path.join(folder,name), new_name)
                except Exception as e:
                    print(e)
> youtube_name_fix(p)
old name -> "4-Discrete and Continuous Probability Models-esHwigpYggU.mp4"
new name -> "4-Discrete and Continuous Probability Models.mp4"

Specify alternative line_end character for Elixir File.stream! function

Elixir's File.stream! splits on an assumed \r character.
Is it possible to specify for example, \r\n or any other pattern?
Such a convenience would make File.stream! a handy file parser.
Edit: Added source file content:
iex(1)> File.read! "D:\\Projects\\Telegram\\PQ.txt"
"1039027537039357001\r\n1124138842463513719\r\n1137145765766942221\r\n1159807134726147157\r\n1162386423249503807\r\n1166092057686212149\r\n1192934946182607263\r\n1239437837009623463\r\n1242249431735251217\r\n1286092661601003031\r\n1300223652350017207\r\n1320700236992142661\r\n1322986082402655259\r\n1342729635050601557\r\n1342815051384338027\r\n1361578683715077199\r\n1381265403472415423\r\n1387654405700676857\r\n1414719090657425471\r\n1438176310698548801\r\n1440426998028857687\r\n1444777794598883737\r\n1448786004429696643\r\n1449069084476072141\r\n1449922801627060913\r\n1459186197300152561\r\n1470497644058466497\r\n1497532721434112879\r\n1514370843858307907\r\n1528087672407582373\r\n1530255914631110911\r\n1537681216742780453\r\n1547498566041252091\r\n1563354550428106363\r\n1570520040759209689\r\n1570650619548126013\r\n1572342415580617699\r\n1595238677050713949\r\n1602246062455069687\r\n1603930707387709439\r\n1620038771342153713\r\n1626781435762382063\r\n1628817368590631491\r\n1646011824126204499\r\n1654346190847567153\r\n1671293643237388043\r\n1674249379765115707\r\n1683876665120978837\r\n1700490364729897369\r\n1724114033281923457\r\n1729626235343064671\r\n1736390408379387421\r\n1742094280210984849\r\n1750652888783086363\r\n1756848379834132853\r\n1769689620230136307\r\n1791811376213642701\r\n1802412521744570741\r\n1816018323888992941\r\n1816202297040826291\r\n1833488086890603497\r\n1834281595607491843\r\n1840295490995033057\r\n1843931859412695937\r\n1845134226412607369\r\n1847514467055999659\r\n1868936961235125427\r\n18733753
Example:
iex(134)> s|> Enum.to_list
["1039027537039357001\n", "1124138842463513719\n", "1137145765766942221\n",
"1159807134726147157\n", "1162386423249503807\n", "1166092057686212149\n",
"1192934946182607263\n", "1239437837009623463\n", "1242249431735251217\n",
"1286092661601003031\n", "1300223652350017207\n", "1320700236992142661\n",
"1322986082402655259\n", "1342729635050601557\n", "1342815051384338027\n",
"1361578683715077199\n", "1381265403472415423\n", "1387654405700676857\n",
"1414719090657425471\n", "1438176310698548801\n", "1440426998028857687\n",
"1444777794598883737\n", "1448786004429696643\n", "1449069084476072141\n",
"1449922801627060913\n", "1459186197300152561\n", "1470497644058466497\n",
"1497532721434112879\n", "1514370843858307907\n", "1528087672407582373\n",
"1530255914631110911\n", "1537681216742780453\n", "1547498566041252091\n",
"1563354550428106363\n", "1570520040759209689\n", "1570650619548126013\n",
"1572342415580617699\n", "1595238677050713949\n", "1602246062455069687\n",
"1603930707387709439\n", "1620038771342153713\n", "1626781435762382063\n",
"1628817368590631491\n", "1646011824126204499\n", "1654346190847567153\n",
"1671293643237388043\n", "1674249379765115707\n", "1683876665120978837\n",
"1700490364729897369\n", "1724114033281923457\n", ...]
iex(135)> s|> String.to_integer|> Primes.factorize|> Enum.to_list
Elixir handles the differences between Windows and Unix just fine by always normalizing "\r\n" into "\n", so developers don't need to worry about both formats. That's what is happening in the example above and that's what you should expect from the operations in both IO and File module.
You could open the file in raw mode (see here) and check the characters yourself.
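
For comparison only, the difference between normalized and raw reading can be observed in other runtimes too. The sketch below is Python, not Elixir, and says nothing about how File.stream! is implemented. It writes a temporary file with \r\n endings (two numbers borrowed from the question) and reads it back with and without newline translation. The helper names read_lines and demo are assumptions.

```python
import os
import tempfile


def read_lines(path, raw):
    """Read lines as text; raw=True turns newline translation off."""
    # newline="" keeps "\r\n" untouched, the default (None) maps it to "\n".
    with open(path, "r", newline="" if raw else None) as handle:
        return handle.readlines()


def demo():
    """Write a small CRLF file and return (normalized, untouched) line lists."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(b"1039027537039357001\r\n1124138842463513719\r\n")
        return read_lines(path, raw=False), read_lines(path, raw=True)
    finally:
        os.remove(path)


normalized, untouched = demo()
print(normalized)
print(untouched)
```

Inspecting the raw form is a simple way to confirm which line terminator a file really contains before deciding how to split it.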
