How to change PCRE2 so that \w matches marks? - c

I use the PCRE2 library, which can be found here.
As you can see here, PCRE2's \w matches only the L and N categories plus the underscore; it does not match the M (mark) category (see here). The .NET Regex \w, however, does match marks (see here).
I want to change the PCRE2 source code so that it behaves like .NET Regex, but I'm not sure I'm doing it right.
What I want to do is find the places in the code where PT_WORD is handled, like this:
case PT_WORD:
if ((PRIV(ucp_gentype)[prop->chartype] == ucp_L ||
PRIV(ucp_gentype)[prop->chartype] == ucp_N ||
fc == CHAR_UNDERSCORE) == (Fop == OP_NOTPROP))
And add another line, like this:
case PT_WORD:
if ((PRIV(ucp_gentype)[prop->chartype] == ucp_L ||
PRIV(ucp_gentype)[prop->chartype] == ucp_N ||
PRIV(ucp_gentype)[prop->chartype] == ucp_M || // <-- new line
fc == CHAR_UNDERSCORE) == (Fop == OP_NOTPROP))
Is this the right approach? Are there other things to consider? What else do I need to change elsewhere in the code?

A .NET \w construct matches
Category Description
Ll Letter, Lowercase
Lu Letter, Uppercase
Lt Letter, Titlecase
Lo Letter, Other
Lm Letter, Modifier
Mn Mark, Nonspacing
Nd Number, Decimal Digit
Pc Punctuation, Connector. This category includes ten characters, the most commonly used of which is the LOWLINE character (_), u+005F.
Note the differences: .NET \w does not match all numbers, only those in the Nd category, and from the M category it matches only the Mn subset.
Make sure your code matches these Unicode categories, and \w will behave as it does in .NET regex.
Use
case PT_WORD:
/* prop->chartype is the particular type (Ll, Mn, ...); PRIV(ucp_gentype)[] would
   reduce it to the general category (L, M, N, ...), so compare chartype directly. */
if ((prop->chartype == ucp_Ll ||
     prop->chartype == ucp_Lu ||
     prop->chartype == ucp_Lt ||
     prop->chartype == ucp_Lo ||
     prop->chartype == ucp_Lm ||
     prop->chartype == ucp_Mn ||
     prop->chartype == ucp_Nd ||
     prop->chartype == ucp_Pc) == (Fop == OP_NOTPROP))
RRETURN(MATCH_NOMATCH);
break;
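Note that you do not need to test fc == CHAR_UNDERSCORE separately, because the underscore is part of \p{Pc}. The five letter types listed are exactly the members of \p{L} (\p{LC} covers only Lu, Ll and Lt), so the letter tests could also be written as a single general-category check against ucp_L; the N and M general categories, by contrast, are too broad.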
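As a standalone illustration of the category list above, here is a minimal sketch of the same membership test that does not depend on PCRE2's headers. The enum cat_t and the function is_dotnet_word_category are hypothetical names invented for this example; they are not PCRE2's ucp_* constants, and in PCRE2 itself the value being tested would come from prop->chartype.

```c
#include <stdbool.h>

/* Hypothetical stand-in for Unicode "particular type" values.
   These names are invented for this sketch and are not PCRE2's ucp_* constants. */
typedef enum {
    CAT_LL, CAT_LU, CAT_LT, CAT_LO, CAT_LM,   /* letters */
    CAT_MN, CAT_MC, CAT_ME,                   /* marks */
    CAT_ND, CAT_NL, CAT_NO,                   /* numbers */
    CAT_PC, CAT_PO,                           /* some punctuation */
    CAT_ZS                                    /* space separator */
} cat_t;

/* Returns true for the categories that the .NET \w class covers:
   all five letter types, Mn, Nd and Pc. */
static bool is_dotnet_word_category(cat_t c)
{
    switch (c) {
    case CAT_LL: case CAT_LU: case CAT_LT: case CAT_LO: case CAT_LM:
    case CAT_MN:
    case CAT_ND:
    case CAT_PC:
        return true;
    default:
        return false;
    }
}
```

With this predicate a spacing mark (Mc) or a letter number (Nl) is rejected, which is the difference from simply adding the whole M category to the original test.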

Related

Darker Google userscript not working in Tampermonkey 4.7

I'm trying to install the Darker Google userscript in Tampermonkey 4.7 (for Safari 12), but it's not working.
Since I'm a total newbie with Tampermonkey, I don't really know where to start.
In the Tampermonkey dashboard I see that this userscript does not match any particular website, whereas, for example, Darker Facebook shows "*.facebook.com" and works; but perhaps this is just a bad guess.
The beginning of the userscript is:
(function() {var css = "";
css += [
"/* Darker Google by Zigboom Designs */",
"",
"@namespace url(http://www.w3.org/1999/xhtml);"
].join("\n");
if (false ||
(document.location.href.indexOf("http://blogsearch.google") == 0) ||
(document.location.href.indexOf("http://books.google") == 0) ||
(document.location.href.indexOf("http://209.85.165.104") == 0) ||
(document.location.href.indexOf("http://translate.google") == 0) ||
(document.location.href.indexOf("http://video.google") == 0) ||
(document.location.href.indexOf("https://encrypted.google") == 0) ||
(document.location.href.indexOf("https://translate.google") == 0) ||
(document.location.href.indexOf("http://scholar.google") == 0) ||
(document.location.href.indexOf("https://scholar.google") == 0) ||
(document.location.href.indexOf("http://images.google") == 0) ||
(document.location.href.indexOf("https://images.google") == 0) ||
(document.location.href.indexOf("https://www.google.com/fonts") == 0) ||
(new RegExp("^https?://www\\.google\\.[a-z.]*/(?!calendar|nexus|adsense|analytics|maps).*$")).test(document.location.href))
which makes me think it should match any *.google.com website... but it doesn't.
It's because the userscript is not running on the page. Add the following line to the script's metadata block (the // ==UserScript== header), save it, and reload the page.
// @match *://*.google.com/*
This makes the userscript run on Google sites.
For match patterns, refer here

Multiple logical operator || (OR) conditions in for loop in C

I started studying C a week ago and decided to write my own tic-tac-toe game for practice.
I have a game loop in main:
for(int i = 1; player1.isWinner!=1 || player2.isWinner!=1 || noWinner!=1; i++){...}
Here i counts the turns, and the game should end when one of the players has won or nobody has won (a draw).
At the moment the loop stops only when all three flags are 1.
How can I make it work right?
Does a value of 1 mean that someone has won?
If so, you need to check whether any of those conditions is true and keep looping while none of them is:
!(player1.isWinner==1 || player2.isWinner==1 || noWinner==1)
Or, using AND, loop while none of them is set:
(player1.isWinner!=1 && player2.isWinner!=1 && noWinner!=1)
Consider extracting the condition into a well-named function to aid readability and maintainability:
int hasWinner(/*...*/)
{
return player1.isWinner == 1 || player2.isWinner == 1 || noWinner == 1;
}
It then becomes obvious what the condition should be:
for(int i = 1; !hasWinner(/*...*/); i++){ /*...*/ }
You seem to be using some sort of backwards boolean logic. If 1 represents the boolean value true, then the condition should be
!(player1.isWinner || player2.isWinner || noWinner)
This assumes that you set player1.isWinner to 1 when player1 has won.
It would probably be easier to use bool with values true or false from stdbool.h.
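To make the stdbool.h suggestion concrete, here is a minimal sketch of the loop with bool flags. The struct player, the helper game_over, and the simulation in play_until are assumptions made for this example (the question does not show its types or its win detection); the point is only the shape of the loop condition.

```c
#include <stdbool.h>

/* Hypothetical game state for this sketch; the field name follows the question. */
struct player { bool isWinner; };

/* True once the game should stop: either player has won or it is a draw. */
static bool game_over(const struct player *p1, const struct player *p2, bool noWinner)
{
    return p1->isWinner || p2->isWinner || noWinner;
}

/* Simulates a game in which player 1 wins on turn `winning_turn`
   and returns the number of turns the loop body ran. */
static int play_until(int winning_turn)
{
    struct player player1 = { false }, player2 = { false };
    bool noWinner = false;
    int turns = 0;

    /* Loop while the game is NOT over. */
    for (int i = 1; !game_over(&player1, &player2, noWinner); i++) {
        turns++;
        if (i == winning_turn)
            player1.isWinner = true;   /* stand-in for real win detection */
        else if (i == 9)
            noWinner = true;           /* board full after nine turns: draw */
    }
    return turns;
}
```

Because the flags are real booleans, the condition reads directly as "not game over", and there is no need to compare anything against 1.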

Libxml2: Outputting XML element with attribute and content

I'm using the libxml2 XMLTextWriter API (of which an official example is provided here) to output XML, but can't find any examples or see how to produce an element with both attributes and content, like so:
<MyElement myAttrib="x">Content</MyElement>
Surprisingly, I'm not seeing any questions on SO that address this. Maybe because people just output XML themselves rather than using a library.
The C code I have so far is:
if (xmlTextWriterStartElement(writer, BAD_CAST "MyElement") < 0
|| xmlTextWriterWriteAttribute(writer, BAD_CAST "myAttrib", BAD_CAST "x") < 0
|| somehow print out content < 0
|| xmlTextWriterEndElement(writer) < 0)
{
// Handle error
}
It looks like xmlTextWriterWriteFormatString or xmlTextWriterWriteString will do the trick. Somehow I missed those at first when looking through the API details.
Rather than delete the question, I'll leave it here, as it might be useful for others looking for this information quickly.
Example:
if (xmlTextWriterStartElement(writer, BAD_CAST "MyElement") < 0
|| xmlTextWriterWriteAttribute(writer, BAD_CAST "myAttrib", BAD_CAST "x") < 0
|| xmlTextWriterWriteString(writer, BAD_CAST "Content") < 0
|| xmlTextWriterEndElement(writer) < 0)
{
// Handle error
}
Update: Tested and confirmed this works.

Adding numbers Conditionally

This seems basic, but I need to add numbers depending on whether or not their condition is "on" (I'll probably change this to a boolean). So my question is how to do this in C, if it is possible. I tried something like this, in various renditions:
dfTotalTaxOwed[nIndex] = dfFedTaxOwed[nIndex] + if(arrNYStateTaxStatus[nIndex] == 1){dfNYStateTaxOwed[nIndex];}
+ if(arrNDStateTaxStatus[nIndex] == 1){dfNDStateTaxOwed[nIndex];}
+ if(arrNHStateTaxStatus[nIndex] == 1){dfNHStateTaxOwed[nIndex];}
+ if(arrOHStateTaxStatus[nIndex] == 1){dfOHStateTaxOwed[nIndex];}
+ if(arrPAStateTaxStatus[nIndex] == 1){dfPAStateTaxOwed[nIndex];}
+ if(arrNJStateTaxStatus[nIndex] == 1){dfNJStateTaxOwed[nIndex];}
+ dfFicaTaxOwed[nIndex];
thanks
You can use the ternary operator.
expr ? true value : false value
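I.e. replace if(arrNYStateTaxStatus[nIndex] == 1){dfNYStateTaxOwed[nIndex];} with ((arrNYStateTaxStatus[nIndex] == 1) ? dfNYStateTaxOwed[nIndex] : 0). The outer parentheses are needed because ?: has lower precedence than +, so without them the whole left-hand side of the sum would become the condition.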
On a side note, you might want to consider redesigning your program to use a dictionary instead of having an array for each state.
C control statements don't have return values so this approach won't work. Is there a reason you don't want to do a series of if statements like
if(arrNYStateTaxStatus[nIndex] == 1) {
dfTotalTaxOwed[nIndex] += dfNYStateTaxOwed[nIndex];
}
?
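Both suggestions can be combined in a small helper. The following is a minimal sketch, assuming the per-state status and amount are gathered into one record; struct state_tax and total_tax are hypothetical names for this example, since the question keeps a separate pair of arrays for each state.

```c
#include <stddef.h>

/* Hypothetical per-state record; the question uses one array pair per state instead. */
struct state_tax {
    int    enabled;  /* 1 when this state's tax applies */
    double owed;     /* amount owed to this state */
};

/* Adds the federal and FICA amounts plus every enabled state amount. */
static double total_tax(double fed, double fica,
                        const struct state_tax *states, size_t n)
{
    double total = fed + fica;
    for (size_t i = 0; i < n; i++) {
        /* The parentheses around the conditional expression matter:
           ?: binds more loosely than +. */
        total = total + ((states[i].enabled == 1) ? states[i].owed : 0.0);
    }
    return total;
}
```

Adding a state then means adding one record to the array rather than one more term to a long expression.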

Comparing chars

For my word game I'm comparing chars in words: 6 in the first word, 6 in the random word.
Now I want it to do things when certain letters are in their place.
For example:
if (aChar1 == aChar7 && aChar2 == aChar8){
//do something
}
but later in my code there is
if (aChar1 == aChar7 && aChar2 == aChar8 && aChar3 == aChar9){
//do something
}
Now I want only the second block to run in that case, not the first, because the first should apply only when just the first 2 letters are in place. I need to add code like:
if (aChar1 == aChar7 && aChar2 == aChar8 && aChar3 isnotequalto aChar9){
//do something
}
What code should I use to say "not equal"?
Use the != operator and tell me if it works:
aChar3 != aChar9
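If the chain of comparisons keeps growing, an alternative is to count how many leading letters agree and branch on that count. The question does not say which language the game is written in, so this is only a sketch assuming C-style strings; matching_prefix is a hypothetical helper name.

```c
#include <stddef.h>

/* Counts how many leading characters of `guess` equal those of `target`,
   stopping at the first mismatch or at the end of either string. */
static size_t matching_prefix(const char *guess, const char *target)
{
    size_t n = 0;
    while (guess[n] != '\0' && target[n] != '\0' && guess[n] == target[n])
        n++;
    return n;
}
```

A result of exactly 2 then corresponds to "the first two letters match and the third does not", and a result of 3 or more to the second condition in the question.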
