Regular expression to extract string in C Code (not inside comment) - c

I have this C code, and I need a regular expression that extracts only the strings that are not inside comments:
1. /* * "path_build()" function in "home.c" for more information.
2. * this is an example basic"
3. */
4.
5. /*** Free ***/
6. VALOR = string_make(format("%sxtra", libpath));
7. event_signal_string(EVENT_INITSTATUS, "Inicializando...");
It should only return:
"%sxtra"
"Inicializando..."
I tried:
".*"
but it doesn't work: it matches all the text inside "", including the strings that are inside /*...*/
I use EditPad Pro, RegExp panel.
It's a game translation project: I take the strings from every C file and translate them to Spanish. I can't remove the comments from the original files.
The only thing I'm sure of is that this is the regex to find comments in C; maybe it will help with the solution:
(/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)|(//.*)
Any help?
Edit: I added line numbers.

Hernaldo, this is an interesting question.
Here are two versions because I am not sure if you want to capture the "inside of the string" or "the whole string"
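The regexes below capture the strings into capture Group 1. Ignore the overall match (Group 0) and look only at Group 1: the first alternative matches and consumes each comment without capturing anything, so a quote inside a comment never reaches the string alternative. To retrieve the strings, iterate over the Group 1 matches in your language (discarding empty ones, if any).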
Version 1: "The inside of the string"
(?s)/\*.*?\*/|"([^"]+)"
This will capture %sxtra and Inicializando... to Group 1.
Version 2: "The whole string"
(?s)/\*.*?\*/|("[^"]+")
This will capture "%sxtra" and "Inicializando..." to Group 1.
Please let me know if you have any questions!
Note: I did not handle /* nested /* comments */ */ as that was not specified in the question. That would require a bit of tweaking and probably a regex engine supporting recursion.
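As a rough illustration of "iterate over Group 1", here is a minimal Python sketch of Version 1. The function name extract_strings and the SAMPLE constant are made up for this example; the pattern is the one above, with re.DOTALL standing in for the inline (?s). Like the original, it does not handle // comments, escaped quotes or nested comments.

```python
import re

# Same alternation as Version 1: a comment is matched and thrown away,
# while the contents of a string literal land in group 1.
PATTERN = re.compile(r'/\*.*?\*/|"([^"]+)"', re.DOTALL)

def extract_strings(c_source):
    """Return the contents of string literals that sit outside /* */ comments."""
    # group(1) is None whenever the comment alternative matched.
    return [m.group(1) for m in PATTERN.finditer(c_source) if m.group(1)]

SAMPLE = '''/* * "path_build()" function in "home.c" for more information.
 * this is an example basic"
 */

/*** Free ***/
VALOR = string_make(format("%sxtra", libpath));
event_signal_string(EVENT_INITSTATUS, "Inicializando...");
'''

print(extract_strings(SAMPLE))  # ['%sxtra', 'Inicializando...']
```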

The final solution for EditPad 6/7 is:
(?<!^[ \t]*/?[*#][^"\n]*")(?<=^[^"\n]*")[^"]+
Link:
Regular expression for a string that does not start with a /*

Related

Multi-line inputs in RIDE (APL)?

I wish to type in a multi-line array as follows:
ast ← ('∘'
('a' ('p'))
('b'
('q' ('v'))
('r'))
('c'
('s' ('w' 'x'))
('t' ('y' 'z'))))
This is correctly parenthesized, but I am unable to copy-paste it into the Dyalog APL RIDE interface. I searched around and found two answers, neither of which helps me:
A GitHub issue, "Unable to paste any DFN from dfns website", asks about pasting dfns and explains that one can use ∇. When I type the ∇, the text box changes and becomes dark gray, which is encouraging, but on typing ast ← { <ENTER> or ast ← ( <ENTER> it errors out.
This SO question on multi-line text values in APL asks how to input text. I tried to use the { } method, but when I type ast ← { <ENTER> it already errors out.
So, how does one type multiline data in APL?
The session doesn't currently support multi line arrays.
For now, you still have to create multi dimensional arrays programmatically for the most part (although you can, for example, create an editable text matrix, fill it with "numbers" and then use ⍎¨)
cmat←⍪''
)ed cmat
paste this
0123
2314
1244
then fix it (press Esc) and use
⍎¨cmat
For me, I find Shift-Enter and Ctrl-Enter are my best friends most of the time
It looks like you're trying to represent a tree as a nested array (look at dfns tview and tnest and other tree stuff for more on that). As such, it doesn't look like you really need multiline (all arrays in APL are hyperrectangular)?
ast←('∘'('a' ('p'))('b'('q'('v'))('r'))('c'('s' ('w' 'x'))('t' ('y' 'z'))))
Traditional functions (tradfns) can be copy and pasted readily, if they use the session input format:
∇ r ← larg Fun rarg
r ← larg, rarg
∇
Multi-line dfns can be pasted. First use the ]dinput user command.
]dinput
then paste
dfn ← {
⍺, ⍵
}
(btw, regarding ∇ from the previous comment, you can paste the multiline dfn and prepend with ∇, but you have to put ∇ on the last line [n] and press enter for it to fix the function. The ]dinput user command is a bit simpler)
In addition to Richard Park's excellent answer, it should be noted that Dyalog is working on multi-line arrays on two fronts:
Designing a new multi-line array notation
The newest edition was presented in 2018
⎕SE.Link.Serialise can create multi-line notation from most any array
⎕SE.Link.Deserialise will return the array specified by its argument notation array
Multi-line session input
Version 18.0 (due summer 2020) includes experimental multi-line session support. It has to be enabled with a configuration parameter.
It will detect unfinished dfns (e.g. MyFn←{ and 4{) and control structures (e.g. :If myVar>5 and :Class MyCl) but not array notation.
18.0 will also contain a tool, ⎕SE.Link.Array which allows wrapping multi-line array notation in a dfn:
{
[1 2 3
4 5 6]
}⎕SE.Link.Array⍬

regex: extract text between two string with text that match a specific word

I'm refactoring a very big C project and I need to find the parts of the code written by a specific programmer.
Fortunately, everyone involved in this project marks their own code with their email address in standard C-style comments.
Someone could say that this is easily achieved with grep from the command line, but that is not my goal: I may need to remove these comments or replace them with other text, so a regex is the only solution.
Ex.
/*********************************************
*
* ... some text ....
*
* author: user#domain.com
*
*********************************************/
From this post I found the right expression to search for C style comments which is:
\/\*(\*(?!\/)|[^*])*\*\/
But that is not enough! I only need the comments that contain a specific email address. Fortunately, the domain of the email address I'm looking for seems to be unique in the whole project, so this could make it simpler.
I think I must use some positive lookahead assertion. I've tried this one:
(\/\*)(\*(?!\/)|[^*](?=.*domain.com))*(\*\/)
but it doesn't work!
Any advice?
You can use
\/\*[^*]*(?:\*(?!\/)[^*]*)*#domain\.com[^*]*(?:\*(?!\/)[^*]*)*\*\/
See the regex demo
Pattern details:
/\* - comment start
[^*]*(?:\*(?!\/)[^*]*)* - everything but */
#domain\.com - literal #domain.com
[^*]*(?:\*(?!\/)[^*]*)* - everything but */
\*\/ - comment end
A faster alternative (as the first part will be looking for everything but the comment end and the word #domain):
\/\*[^*#]*(?:\*(?!\/)[^*#]*|#(?!domain\.com)[^*#]*)*#domain\.com[^*]*(?:\*(?!\/)[^*]*)*\*\/
See another demo
In these patterns, I used an unrolled construct for (\*(?!\/)|[^*])*: [^*]*(?:\*(?!\/)[^*]*)*. Unrolling helps construct more efficient patterns.
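A minimal Python sketch of the first (unrolled) pattern, in case it helps to see it applied. The helper name comments_containing is invented for this example, and the needle is passed through re.escape so that the dot is matched literally; the rest is the pattern above with the unnecessary escaping of / dropped.

```python
import re

def comments_containing(source, needle):
    """Return every /* ... */ comment in source that contains needle literally."""
    # Unrolled form of (\*(?!/)|[^*])*: anything except the closing */
    body = r"[^*]*(?:\*(?!/)[^*]*)*"
    pattern = re.compile(r"/\*" + body + re.escape(needle) + body + r"\*/")
    # No capturing groups are used, so findall returns the full matches.
    return pattern.findall(source)

source = (
    "/* author: alice#domain.com */\n"
    "int x; /* author: bob#other.org */\n"
    "/** multi\n * line user#domain.com\n **/\n"
)

for comment in comments_containing(source, "#domain.com"):
    print(repr(comment))
```

The same compiled pattern could be handed to re.sub to remove or replace the matching comments, which is what the question ultimately asks for.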

Flex lexical analyzer: remember starting position

I'm working on a flex program and I want to do the following:
- read a line, split it into tokens, and remember the tokens (let's say in array1)
- if the number of tokens equals the number of tokens on the next line (which were remembered in array2), print array1[i] : array2[i];
At first I thought of remembering each of the tokens in a matrix, but that is too much work: allocating dynamic memory and so on. I'm sure there is a simple way; I just don't have enough experience with flex.
Thank you.
As far as I know, there is no built-in functionality in Flex to store a sequence of tokens, so that you can print them later. So just do it in normal code. Just use two (possibly malloced) arrays. – Thomas Padron-McCarthy
You could try implementing a simple vector structure yourself and use this. Then let flex return a special value on newline. Just check for this value and you know you're done. – Shadowwolf
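The bookkeeping suggested in the two comments above can be sketched independently of flex. This is Python rather than flex/C, and everything in it is an assumption made for illustration: NEWLINE stands in for the special value the scanner would return at end of line, tokens is whatever sequence the scanner produces, and each line is compared with the line directly before it.

```python
NEWLINE = object()  # stand-in for the special end-of-line token value

def pair_matching_lines(tokens):
    """Yield 'a : b' for consecutive lines that have the same number of tokens."""
    previous, current = None, []
    for token in tokens:
        if token is not NEWLINE:
            current.append(token)  # still inside the current line
            continue
        # End of line: compare with the previous line, if there is one.
        if previous is not None and len(previous) == len(current):
            for a, b in zip(previous, current):
                yield f"{a} : {b}"
        previous, current = current, []
    # A last line that is not terminated by NEWLINE is ignored.

stream = ["a", "b", NEWLINE, "1", "2", NEWLINE, "x", NEWLINE]
for line in pair_matching_lines(stream):
    print(line)
```

In C the two lists would be the "(possibly malloced) arrays" from the first comment, swapped at each newline instead of being copied.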

Regular expression to replace (if|then)

I have some verse references in articles that I want to link to the adjacent verses file.
Example:
some text (Gen 2:15, 16), other text (Ex 4:12, 13) more.. etc.
I could replace the first one with the following regex:
\(Gen \1: \2, \3\)
Here I hard-coded the "1" (book=) and the "Gen".
But I couldn't figure out how to use if|then so that I could give it the whole list of books (Gen|Ex|Lev.. etc.) and have it replace Gen with book number "1", Ex with "2", etc.
You need to somewhere define what all the book orders are. And you'll need to use some sort of scripting language, not just a plain old regex. For example, you could do something along the lines of:
books = ["Gen", "Ex", ..., "Rev"]
...and then replace book_name with books.index(book_name)+1
The exact code/syntax obviously depends on which language you choose to use.
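For instance, a minimal Python sketch of that idea, using a replacement function with re.sub. BOOKS here is only an illustrative three-entry subset, and the output format "book=N" is a placeholder, since the real link syntax is not given in the question; both would need to be adapted.

```python
import re

# Illustrative subset only: the full list must be supplied, in canonical order.
BOOKS = ["Gen", "Ex", "Lev"]

# Matches references such as "(Gen 2:15, 16)".
REF = re.compile(r"\((" + "|".join(BOOKS) + r") (\d+):(\d+), (\d+)\)")

def link_reference(match):
    book, chapter, first, second = match.groups()
    number = BOOKS.index(book) + 1  # Gen -> 1, Ex -> 2, ...
    # Placeholder output format; substitute the real link syntax here.
    return f"(book={number} {chapter}:{first}, {second})"

def link_all(text):
    """Replace every recognised verse reference in text."""
    return REF.sub(link_reference, text)

print(link_all("some text (Gen 2:15, 16), other text (Ex 4:12, 13) more"))
```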
With notepad++ you won't be able to get the order numbers.
But everything else is possible. You need to put each book on a new line:
find \), and replace by \n
Then use this pattern:
[a-z\s]+\(([a-z]+)\s+([0-9:]+)\,\s+([0-9]+)\)
and replace by:
\1: \2, \3
You'll get the list of URLs, which you can then merge back into one line if needed.
The only problem is the book number.
Demo is here: https://regex101.com/r/qN8mO7/2

Sublime Text Snippet: Create camelcased string from the hyphenated file name

I am trying to create a Sublime Text snippet for AngularJS. This snippet should expand to an AngularJS controller (or service, or any other ng component). In the resulting code, it should construct the controller name in CamelCase from the hyphenated file name.
For example:
when I type the snippet's trigger string, say ngctrl, in an empty file called employee-benefits-controller.js, it should expand as given below:
angular.module('').controller('EmployeeBenefitsController', ['', function(){
}]);
I am trying to use the $TM_FILENAME variable by applying a regex on it to achieve this conversion. If anyone has already done this, please let us know.
You could use something like this:
<snippet>
<content><![CDATA[
angular.module('${1:moduleName}').controller('${TM_FILENAME/(^|-|\.js)(.?)|/\U\2\E/g}', ['', function(){
${2://functionCode}
}]);
]]></content>
<tabTrigger>ngctrl</tabTrigger>
</snippet>
Notes:
Note 1: maybe you want to change the scope so that the snippet is only triggered in a JavaScript context.
Note 2: I'm not familiar with AngularJS, so I don't know its naming conventions (I have assumed that an uppercase letter is needed after a hyphen [-] character and at the beginning of the name, but I don't know whether an uppercase character is needed after a dot character, for example). So you'll probably have to adapt the snippet.
Note 3: expression explained:
${TM_FILENAME/(^|-|\.js)(.?)/\U\2\E/g}
TM_FILENAME is the var_name item.
(^|-|\.js)(.?) is the regex (the parts of the variable we select).
\U\2\E is the format_string (how we format what we have selected).
g is the options (g means globally, so the format is applied every time something is selected).
TM_FILENAME: the file name with the extension included.
\U => start uppercase conversion. \E => end uppercase conversion. \2 => second group, i.e. the second parenthesis, (.?), which is a single char or an empty string.
(^|-|\.js)(.?) First we look for the beginning of the word (^), or for a hyphen character (-), or for the extension (\.js).
(.?) Then we select in a parenthesis group (the second group) the character (if any) after that hyphen (or at the beginning of the word, or after the extension).
Finally we apply the uppercase conversion to that selected character, as explained. Note that as there is no character after the extension, we are simply removing the extension from the output.
Note 4: as you probably know, using ${1:moduleName} and ${2://functionCode} allows you to quickly move (using tab) and edit the important parts of the snippet once it has been triggered, such as the module or the function code.
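To check what the substitution from Note 3 does without opening Sublime Text, the same regex can be tried in Python. This is only an approximation: Sublime's \U...\E format string is replaced by str.upper() on group 2, and the function name controller_name is invented for the example.

```python
import re

def controller_name(filename):
    """Mimic ${TM_FILENAME/(^|-|\\.js)(.?)/\\U\\2\\E/g} for a hyphenated file name."""
    # Group 1 (start of name, hyphen, or .js extension) is dropped;
    # group 2 (the following character, if any) is upper-cased.
    return re.sub(r"(^|-|\.js)(.?)", lambda m: m.group(2).upper(), filename)

print(controller_name("employee-benefits-controller.js"))  # EmployeeBenefitsController
```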
