Pattern matching in C, alternative to PCRE - c

I am trying to write C code that will find hyperlinks in an email and replace them.
Is using the PCRE library a good thing to do?
Since PCRE is, allegedly, too slow, is there an alternative?

C is the last language I would choose to do this. Firstly, if you want to do this with high accuracy - use a MIME parser to get the HTML body out. Java has mime4j, Perl has MIME::Parser, Python has email, etc. This isn't too hard and I'm willing to help with this step in any of these languages if you'd like. Secondly, use an HTML parser to isolate the links.
If you're OK with some mistakes, then just write a one-line program in Perl or PHP. Or even sed. Really. If you are replacing the links with a fixed URL, use sed. If you are modifying the URL, the only reason this won't work as-is is that you'll probably have to URL-encode it, which a P-language can handle in one line.
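If you do stay in C, one option that needs no extra library on Unix-like systems is POSIX <regex.h> (regcomp()/regexec()). Below is a minimal sketch of the "OK with some mistakes" approach, not an HTML or MIME parser. It assumes the HTML body has already been extracted and decoded, and replace_hrefs() and reserve() are hypothetical helper names. No claim is made about its speed relative to PCRE.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Grow *buf so it can hold at least `need` bytes. Returns 0 on failure,
 * leaving *buf untouched. */
static int reserve(char **buf, size_t *cap, size_t need)
{
    if (need <= *cap)
        return 1;
    size_t ncap = (*cap * 2 > need) ? *cap * 2 : need;
    char *nb = realloc(*buf, ncap);
    if (nb == NULL)
        return 0;
    *buf = nb;
    *cap = ncap;
    return 1;
}

/* Hypothetical helper: replace the value of every href="..." attribute
 * (case-insensitive) in `html` with `new_url`. Returns a malloc'd string
 * that the caller must free(), or NULL on error. Naive by design:
 * single-quoted and unquoted attribute values are not handled. */
char *replace_hrefs(const char *html, const char *new_url)
{
    regex_t re;
    regmatch_t m[2];                        /* m[1] = URL between the quotes */
    if (regcomp(&re, "href=\"([^\"]*)\"", REG_EXTENDED | REG_ICASE) != 0)
        return NULL;

    size_t cap = strlen(html) + 1, len = 0, ulen = strlen(new_url);
    char *out = malloc(cap);
    const char *p = html;

    while (out != NULL && regexec(&re, p, 2, m, 0) == 0) {
        size_t keep = (size_t)m[1].rm_so;   /* text before the old URL */
        if (!reserve(&out, &cap, len + keep + ulen + 1)) {
            free(out);
            out = NULL;
            break;
        }
        memcpy(out + len, p, keep);
        memcpy(out + len + keep, new_url, ulen);
        len += keep + ulen;
        p += m[1].rm_eo;                    /* resume at the closing quote */
    }
    if (out != NULL) {
        size_t rest = strlen(p);
        if (reserve(&out, &cap, len + rest + 1)) {
            memcpy(out + len, p, rest + 1); /* copies the terminator too */
        } else {
            free(out);
            out = NULL;
        }
    }
    regfree(&re);
    return out;
}
```

For example, replace_hrefs("<a href=\"http://a.example\">x</a>", "http://new.example") yields "<a href=\"http://new.example\">x</a>". The pattern is compiled on every call to keep the sketch short; for many messages you would compile it once and reuse it.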

Related

Is there a way to get help for some C functions inside of vim/Neovim?

This question may be a little off topic, but I was wondering if there is a way to look at the descriptions of C functions using Vim or Neovim. Is it possible to look at their documentation with something like :help? This would be really helpful, since I wouldn't need to look things up in my browser every time.
I am unclear about these things:
Can :help be my friend here?
Can I use LSPs to do something like this?
I am using the latest Neovim inside Ubuntu 20.04 in WSL. Is this helpful somehow?
By pressing K, the keyword under the cursor is looked up using a configured keyword lookup program, the default being man. This works pretty much out of the box for the C standard library.
For C++, you might want to look at something like cppman.
Well yes, you can get the description of C functions by using a language server through LSP (the Language Server Protocol)! For example, I use clangd as my language server.
You'd "just" need to install the language server and start it. I don't know how familiar you are with Neovim, but in case you don't know how to install a plugin or, more specifically, a language server:
There are plenty of videos on how to set up an LSP server for your language.
If you don't want to set it up on your own, you can pick one of the preconfigured Neovim setups (some of my friends recommend LunarVim).
But yeah, that's it. If you have any further questions feel free to ask them in the comments.
Happy vimming c(^-^)c
Let's explain how the K command works in more detail.
You can run external commands by prefixing them with :!, so running the man tool is as easy as
:!man <C-R><C-W>
Here <C-R><C-W> is a special key combination that inserts the word under the cursor into the command line.
The same works for showing Vim's built-in help page:
:help <C-R><C-W>
As it feels tedious to type that, Vim also defines the K Normal mode command, which does pretty much the same thing, except that the tool name is taken from the value of an option named "keywordprg".
So set keywordprg=man (the default on *nix systems) makes K invoke the man tool, while set keywordprg=:help uses the built-in help.
Also, the option (see :h 'keywordprg') is global or local to buffer, so any Vim buffer can override the global setting. For example, the standard runtime already does this for "vim" and "help" buffers, so K calls :help there instead of man.
The problem with the :!man command is that it shows the output in a "black console". It'd be nice to capture man's output and open it inside Vim just like a built-in help page. Then we could also apply some pretty highlighting, assign key mappings, and so on. This is a pretty common trick, and it is already done by a standard plugin shipped with Vim/Neovim.
The command that the plugin provides is called :Man, so you can run :Man man instead of :!man man, for example. The plugin is enabled by default in Neovim; in Vim you still need to source one file manually. So to make use of this plugin you'll need something like this:
set keywordprg=:Man
if !has("nvim")
source $VIMRUNTIME/ftplugin/man.vim
endif
The previous answer recommending cppman is the way to go. There is no need to install a bulky language server just for the purpose of having the hover functionality. However, make sure you're caching the man pages via cppman -c. Otherwise, there will be a noticeable delay since cppman is fetching the page from cppreference.com on the fly.
If you like popups for displaying documentation, convert the uncompressed man pages (groff -t -e -mandoc -Tascii <man-page> | col -bx), and set keywordprg to your own wrapper to search for keywords according to your needs.

xmlParseFile vs xmlReadFile (libxml2)

I'm writing some C code using the libxml2 library to read an XML file. There seem to be two different functions for this purpose, xmlParseFile and xmlReadFile, and I'm not sure of the difference between them (besides the fact that xmlReadFile() takes some additional parameters).
Some of the examples on the libxml2 website use xmlParseFile and others use xmlReadFile.
So when should you use xmlParseFile and when should you use xmlReadFile?
I haven't been able to find anything that explains this.
xmlReadFile() is a bit more powerful, as it can take a URL instead of a local file path and lets you specify parser options (http://xmlsoft.org/html/libxml-parser.html#xmlParserOption), so I tend to use it instead of xmlParseFile(). That said, if you are parsing a local XML file and don't need the parser options, you will be fine with xmlParseFile().
xmlReadFile() is more powerful and is the newer API for parsing XML. I also use it in place of xmlParseFile().
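I have XML arriving in a character buffer 'msg' over a TCP connection, so I use the libxml2 call xmlReadDoc() instead, with the options XML_PARSE_NOBLANKS and XML_PARSE_OLD10, as follows:
xmlDocPtr parsed_xml_dom;
/* msg must be NUL-terminated: xmlReadDoc() parses a zero-terminated string */
parsed_xml_dom = xmlReadDoc((const xmlChar *)msg, NULL, NULL, XML_PARSE_NOBLANKS | XML_PARSE_OLD10);
if (parsed_xml_dom == NULL) { /* parse error: handle it before using the tree */ }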

Prolog, Definite Clause Grammar and File

I'm new to Prolog and I have just started looking around. I read the Definite Clause Grammar chapter in both Simply Logical and Learn Prolog Now!, so now I wanted to get started with some exercises, but I'm stuck.
I have to read from a file with this syntax
setName = {element1, element2, ..., elementN}.
element1: element2 > element3.
Now I have read that when you define a DCG you have a parser for free, so I wanted to do that to get the data from my file to the Prolog program.
My problem is that in all the examples I have read they always provide a basic dictionary like
article --> [the].
but I cannot do that because I don't know what is going to be written in the file.
Any suggestions?
In SWI-Prolog, consider using library(dcg/basics). It provides building-blocks that you can use in your DCG. Focus on a clear declarative description of what the contents of the file look like, state this with a DCG. Then use phrase_from_file/2 from library(pio) to apply the DCG to a file.

Regular expression to filter C comments

For a merge with a tool I need to compare only non-commented parts of source lines.
So I am trying to create a filter that detects actual code, i.e. a regular expression that matches all text EXCEPT comments.
Perhaps something like this:
^.*(?!((/\**([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)|(//.*)))
This one will do:
(/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)|(//.*)
Source: http://ostermiller.org/findcomment.html
Or, using non-greedy matching: (/\*([\r\n]|.)*?\*/)|(//.*)
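Amine's answer is right, but you could also search for all comments and remove them from the string.
This regex will find all comments (note that in most engines . does not match a newline, so it only catches block comments that fit on one line; use [\s\S] instead of . inside the block-comment part to span lines):
(/\*.*?\*/)|//.*?\n
This will replace the matches with "" (in C++11 or later, using <regex>), where source is the input std::string and comment_regex is a std::regex built from the pattern above:
std::string str2 = std::regex_replace(source, comment_regex, "");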
Maybe your compiler can help. Some might have an option to preprocess source and strip comments. Maybe the preprocessor can be made to only strip comments. This would be the Unix way of having one tool do one thing right--the C preprocessor knows what a comment is (while regexen are a kluge for parsing, IMNSHO).
As a second option, writing a lexer with lex or flex to recognize comments is easy. There should be plenty of examples on the 'net. Any search engine will turn up tons of hits.
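To show why a small scanner is more robust than the regexes above, here is a hand-written sketch of such a comment recognizer in C (not lex/flex output). It keeps string and character literals intact, so the // inside "http://x" is not mistaken for a comment, which a pattern like (//.*) gets wrong. strip_comments() is a hypothetical name, and the sketch ignores trigraphs and backslash-newline line splicing.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of C source `src` with the
 * comments removed, or NULL if allocation fails. Each block comment becomes
 * one space; a line comment is dropped up to (not including) its newline.
 * String and character literals, including escapes, are copied verbatim. */
char *strip_comments(const char *src)
{
    enum { CODE, LINE_COMMENT, BLOCK_COMMENT, STRING, CHAR_LIT } state = CODE;
    char *out = malloc(strlen(src) + 1);   /* output is never longer */
    if (out == NULL)
        return NULL;
    char *o = out;

    for (const char *p = src; *p; p++) {
        switch (state) {
        case CODE:
            if (p[0] == '/' && p[1] == '*') {
                state = BLOCK_COMMENT;
                p++;                       /* skip the '*' as well */
            } else if (p[0] == '/' && p[1] == '/') {
                state = LINE_COMMENT;
                p++;
            } else {
                if (*p == '"')
                    state = STRING;
                else if (*p == '\'')
                    state = CHAR_LIT;
                *o++ = *p;
            }
            break;
        case LINE_COMMENT:
            if (*p == '\n') {              /* keep the newline itself */
                state = CODE;
                *o++ = '\n';
            }
            break;
        case BLOCK_COMMENT:
            if (p[0] == '*' && p[1] == '/') {
                state = CODE;
                p++;
                *o++ = ' ';
            }
            break;
        case STRING:
        case CHAR_LIT:
            *o++ = *p;
            if (*p == '\\' && p[1] != '\0')
                *o++ = *++p;               /* copy the escaped character too */
            else if ((state == STRING && *p == '"') ||
                     (state == CHAR_LIT && *p == '\''))
                state = CODE;
            break;
        }
    }
    *o = '\0';
    return out;
}
```

Replacing a block comment with a single space mirrors what the C preprocessor does, so a/**/b does not collapse into the identifier ab.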
For multi-line comments use:
/\/\*([\s\S]*?)\*\//mg
and for matching single-line comments:
/\/\/([\s\S]*?)[\n\r]?$/mg
or combine these two to match all comments:
/(\/\*(?<multiline>[\s\S]*?)\*\/)|(\/\/(?<singleline>[\s\S]*?)[\n\r]?$)/mg

Makefile running on Linux - how to ignore case sensitivity?

I have a huge project, written entirely in C, and a single makefile that is used to compile it. The project's C files contain lots of capitalization problems in their header includes, meaning there are tons of header file names that were misspelled (wrong case) in lots of C files.
The problem is that I need to migrate this project to compile on a Linux machine, and since Linux file systems are case-sensitive, I get tons of errors.
Is there an elegant way to run the makefile on Linux and tell it to ignore case?
Any other solution will be welcome as well.
Thanks a lot.
Motti.
You'll have to fix everything by hand: rename every file or fix every #include. Even if you have a huge project (comparable with the Linux kernel), it should be possible to do this in an hour or two. Automation may be possible, but the manual way should be better, because a script won't be able to guess which name is right: the file name or the name used in #include.
Besides, this situation is the fault of the original project developer. If he/she hadn't been sloppy and had named every header in every #include correctly, this wouldn't have happened. Technically, this is a code problem similar to a syntax error. The only right way to deal with it is to fix it.
I think it doesn't take long to write a small script that first goes through the directories, then fixes the #include lines in the C files. Explained:
Scan the headers' folder and collect filenames.
Make a lowercase list of them. You now have original/lowercase pairs.
Scan the C source files and find each line that contains "#include".
Lowercase it.
Find the lowercased file name in the lowercased list collected from the headers.
Replace the file name in the source line with the original spelling collected from the headers.
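The matching step of this script (look up the name case-insensitively, substitute the real spelling) can also be sketched in C, using POSIX strcasecmp() instead of building an explicit lowercase list. fix_include() is a hypothetical helper; it assumes the header names in the list are written exactly as the #include refers to them apart from case (same relative path), and it only handles the quoted #include "..." form.

```c
#include <stdio.h>
#include <string.h>
#include <strings.h>   /* strcasecmp() (POSIX) */

/* Hypothetical helper: if `line` is an #include "..." directive whose file
 * name matches one of the n entries in `headers` ignoring case, write the
 * line with the header's real spelling into `out` and return 1. Otherwise
 * copy `line` to `out` unchanged and return 0. */
int fix_include(const char *line, const char *const headers[], size_t n,
                char *out, size_t outsz)
{
    const char *open = strchr(line, '"');
    const char *close = open ? strchr(open + 1, '"') : NULL;

    snprintf(out, outsz, "%s", line);        /* default: unchanged */
    if (strstr(line, "#include") == NULL || open == NULL || close == NULL)
        return 0;

    char name[256];
    size_t len = (size_t)(close - open - 1);
    if (len >= sizeof name)
        return 0;
    memcpy(name, open + 1, len);
    name[len] = '\0';

    for (size_t i = 0; i < n; i++) {
        if (strcasecmp(name, headers[i]) == 0) {
            /* text before the quote, the real name, then the rest */
            snprintf(out, outsz, "%.*s\"%s\"%s",
                     (int)(open - line), line, headers[i], close + 1);
            return 1;
        }
    }
    return 0;
}
```

Feeding every source line through a function like this and writing the result into the separate output tree gives the same effect as the lowercase-pair lookup, without a second list to keep in sync.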
You should put the modified files into a separate folder structure to avoid overwriting the whole source with some buggy output. Don't forget to create the target folders during the source tree scan.
I recommend a scripting language for that task. I prefer PHP, but only because it's the only server-side scripting language I know. Yes, it will run for a while, but only once.
(I bet you will have other difficulties with that project; this problem is not a typical sign of high-quality work.)
Well, I can only tell you that you need to change the case of those header file names. I don't know of any way to make it automatic, but you can use cscope to do it more easily.
http://www.linux-tutorial.info/modules.php?name=ManPage&sec=1&manpage=cscope
You can mount the files on a case-insensitive file system. FAT comes to mind. ntfs-3g does not appear to support this.
I use the Find All and Replace All functionality of Source Insight when I have to do a complete replacement. Your problem seems quite big, but you can try replacing every header file name in all its occurrences in the source files using the "Find All" + "Replace" functionality. You can use Notepad++ for the same.
A long time ago there was a great tool under MPW (Macintosh Programmer's Workshop) called Canon. It was used to canonize text files, i.e. make all symbols found in a given reference list have the same usage of upper/lower case. This tool would be ideal for a task like this; I wonder if anything similar exists under Linux?
