How to use regular expressions in C?

I need to write a little program in C that parses a string. I wanted to use regular expressions since I've been using them for years, but I have no idea how to do that in C. I can't find any straightforward examples (i.e., "use this library", "this is the methodology").
Can someone give me a simple example?

You can use PCRE:
The PCRE library is a set of functions that implement regular expression pattern matching using the same syntax and semantics as Perl 5. PCRE has its own native API, as well as a set of wrapper functions that correspond to the POSIX regular expression API. The PCRE library is free, even for building commercial software.
See pcredemo.c for a PCRE example.
If you cannot use PCRE, POSIX regular expression support is probably available on your system (as tinkertim pointed out). For Windows, you can use the GnuWin32 Regex for Windows package.
The regcomp documentation includes the following example:
#include <regex.h>
/*
 * Match string against the extended regular expression in
 * pattern, treating errors as no match.
 *
 * Return 1 for match, 0 for no match.
 */
int
match(const char *string, char *pattern)
{
    int status;
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED|REG_NOSUB) != 0) {
        return(0);      /* Report error. */
    }
    status = regexec(&re, string, (size_t) 0, NULL, 0);
    regfree(&re);
    if (status != 0) {
        return(0);      /* Report error. */
    }
    return(1);
}
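The example above only answers "does it match?" (it passes REG_NOSUB and an nmatch of 0). Since the question is about parsing a string, here is a sketch of pulling a capture group out with a regmatch_t array. The name extract_group and its 1/0 return convention are hypothetical, chosen for illustration; only regcomp(), regexec(), regfree() and regmatch_t come from POSIX.

```c
#include <regex.h>
#include <string.h>

/* Copies capture group `group` of the first match of `pattern` in `str`
   into `out` (buffer size `cap`), truncating if necessary.
   Returns 1 on success, 0 on no match, unused group, or compile error. */
int extract_group(const char *str, const char *pattern, size_t group,
                  char *out, size_t cap)
{
    regex_t re;
    regmatch_t m[10];   /* whole match plus up to 9 groups */
    int ok = 0;

    if (group >= 10 || cap == 0)
        return 0;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;
    /* Unused entries of m[] are set to -1 by regexec(). */
    if (regexec(&re, str, 10, m, 0) == 0 && m[group].rm_so != -1) {
        size_t len = (size_t)(m[group].rm_eo - m[group].rm_so);
        if (len >= cap)
            len = cap - 1;              /* truncate to fit the buffer */
        memcpy(out, str + m[group].rm_so, len);
        out[len] = '\0';
        ok = 1;
    }
    regfree(&re);
    return ok;
}
```

For example, with the pattern "^([a-z]+)=([a-z]+)$", group 1 of "key=value" is "key" and group 2 is "value"; group 0 is always the whole match.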

If you are forced into POSIX only (no PCRE), here's a bit of fallback code:
#include <regex.h>
#include <stdbool.h>
bool reg_matches(const char *str, const char *pattern)
{
    regex_t re;
    int ret;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return false;
    ret = regexec(&re, str, (size_t) 0, NULL, 0);
    regfree(&re);
    if (ret == 0)
        return true;
    return false;
}
You might call it like this:
int main(void)
{
    static const char *pattern = "/foo/[0-9]+$";
    /* Going to return 1 always, since pattern wants the last part of the
     * path to be an unsigned integer */
    if (! reg_matches("/foo/abc", pattern))
        return 1;
    return 0;
}
I highly recommend making use of PCRE if it's available, but it's nice to check for it and have some sort of fallback.
I pulled the snippets from a project currently in my editor. It's just a very basic example, but it gives you types and functions to look up should you need them. This answer more or less augments Sinan's answer.

Another option besides a native C library is to use an interface to another language like Python or Perl. Not having to deal with C's string handling, plus those languages' better regex support, should make things much easier for you. You can also use a tool like SWIG to generate wrappers for calling the code from C.

Related

How to use regex in C to verify that keyboard input is a real number?

I have been researching regex in C and trying to understand it, but I have trouble writing the pattern for this kind of string.
In this program I want to verify that the input string is a number (only digits; no letters, spaces, or special characters).
#include<stdio.h>
#include <regex.h>
void print_result(int return_value){
    if (return_value == 0){
        printf("Pattern found.\n");
    }
    else if (return_value == REG_NOMATCH){
        printf("Pattern not found.\n");
    }
    else{
        printf("An error occured.\n");
    }
}
int main() {
    regex_t regex;
    int return_value;
    int return_value2;
    return_value = regcomp(&regex,"[^a-fA-F_][0-9]+",0);
    return_value = regexec(&regex, "4324", 0, NULL, 0);
    return_value2 = regcomp(&regex,"\d+",0);
    return_value = regexec(&regex, "4324", 0, NULL, 0);
    print_result(return_value); //not found
    print_result(return_value); //no found
    print_result(return_value2);
    return 0;
}
Can you give me some ideas for verifying the input? I want to find a way that doesn't rely on ASCII values.
If you specify the flags as 0 in regcomp:
return_value = regcomp(&regex,"[^a-fA-F_][0-9]+",0);
then you are accepting the default regex syntax, which is a so-called Basic Regular Expression (BRE). The only sensible thing that can be said about BREs is "don't use them." Always specify the REG_EXTENDED flag (at least), and then you will be working with a regular expression syntax that at least bears a passing resemblance to what you expect. (Otherwise, your strings will be dominated by what's technically called "leaning timber": \ characters which enable metacharacters in the regex, and more \ characters so that the \ characters you need are not treated as escape characters in the character string.)
Take a look at man regexec and man 7 regex for more details. Make sure you read the second link thoroughly (although you can ignore basic regular expression syntax :-) ) because there are many commonly-used syntaxes in more modern regex libraries which are not present in Posix regexes, not even extended ones. (That includes \d, used in your second regex. Posix has named character classes, such as [[:digit:]].)

regexec in C does not match when \b is used in the expression

I am trying to use regular expressions in my C code to find a string in each line of a text file that I am reading, and the \b boundary does not seem to work. The string must not be part of a bigger string.
After that failure I also tried the following hand-written boundary expression, and could not make it work in my code either (source here):
(?i)(?<=^|[^a-z])MYWORDHERE(?=$|[^a-z])
But when I try something simple like "a" as the regular expression, it finds what is expected.
Here is my shortened snippet:
#include <regex.h>
#include <stdio.h>
#include <string.h>
void readFromFile(char arr[], char * wordToSearch) {
    regex_t regex;
    int regexi;
    char regexStr [100];
    strcpy(regexStr, "\\b(");
    strcat(regexStr, wordToSearch);
    strcat(regexStr, ")\\b");
    regexi = regcomp(&regex, regexStr, 0);
    printf("regexi while compiling: %d\n", regexi);
    if (regexi) {
        fprintf(stderr, "compile error\n");
    }
    FILE* file = fopen(arr, "r");
    char line[256];
    while (fgets(line, sizeof(line), file)) {
        regexi = regexec(&regex, line, 0, NULL, 0);
        printf("%s\n", line);
        printf("regexi while execing: %d\n", regexi);
        if (!regexi) {
            printf("there is a match.");
        }
    }
    fclose(file);
}
In the regcomp function, I also tried to pass the REG_EXTENDED as the flag and it also did not work.
The regular expressions supported by POSIX are documented in the Linux regex(7) manual page and re_format(7) for MacOS X.
Unfortunately the POSIX standard regular expressions (which come in 2 standard flavours: obsolete basic, and the REG_EXTENDED) support neither \b nor any of the (?...) formats, both of which I believe originated in Perl.
Mac OS X (and possibly other BSD derived systems) additionally has the REG_ENHANCED format, which is not portable.
Your best choice would be to use some other regular expression library such as PCRE. While word boundaries themselves can be expressed in a regular language, the use of capturing groups makes this harder, as POSIX doesn't even support non-capturing grouping; otherwise you could use something like (^|[^[:alpha:]])(.*)($|[^[:alpha:]]*), but it would surely get really messy.
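If you do have to stay with POSIX, a rough emulation of \b is to require either a non-word character or the start/end of the string on each side of the word. The sketch below is an approximation under stated assumptions: contains_word is a hypothetical helper, "word character" is taken to mean [[:alnum:]_], and the word is assumed to contain no regex metacharacters, because it is pasted into the pattern unescaped.

```c
#include <regex.h>
#include <stdio.h>

/* Returns 1 if `word` occurs in `line` as a whole word, 0 if not,
   -1 if the pattern is too long or fails to compile.
   Assumption: `word` contains no regex metacharacters. */
int contains_word(const char *line, const char *word)
{
    char pattern[256];
    regex_t re;
    int n, rc;

    /* Emulate \b...\b: start of string or a non-word character before
       the word, a non-word character or end of string after it. */
    n = snprintf(pattern, sizeof pattern,
                 "(^|[^[:alnum:]_])%s([^[:alnum:]_]|$)", word);
    if (n < 0 || (size_t)n >= sizeof pattern)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Because the neighbouring characters become part of the match, this is fine for a yes/no test, but any match offsets you request would include the character before and after the word.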

List files in directories using Glob() in C

Basically, so far I have this code:
#include <glob.h>
#include <string.h>
#include <stdio.h>
# define ERROR 1
# define FAILURE -1
int main(int ac, char **av)
{
    glob_t globlist;
    int i;
    i = 0;
    if (ac == 1)
        return (-1);
    else
    {
        if (glob(av[1], GLOB_PERIOD, NULL, &globlist) == GLOB_NOSPACE
            || glob(av[1], GLOB_PERIOD, NULL, &globlist) == GLOB_NOMATCH)
            return (FAILURE);
        if (glob(av[1], GLOB_PERIOD, NULL, &globlist) == GLOB_ABORTED)
            return (ERROR);
        while (globlist.gl_pathv[i])
        {
            printf("%s\n", globlist.gl_pathv[i]);
            i++;
        }
    }
    return (0);
}
When I type ./a.out "*", for example, it prints all the files in the current directory as well as the directories, but it doesn't print what is inside the directories. How can I print ALL files, including those in subdirectories?
Thanks
Use nftw() instead of glob() if you want to examine entire trees, rather than one specific path and filename pattern.
(It is absolutely silly to reinvent the wheel by going at it using opendir()/readdir()/closedir(), especially because nftw() should handle filesystem changes gracefully, whereas self-spun tree walking code usually ignores all the hard stuff, and only works in optimal conditions on your own machine, failing in spectacular and wonderful ways elsewhere.)
In the filter function, use fnmatch() to decide whether the file name is acceptable using glob patterns.
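A minimal sketch of that approach, assuming a POSIX system with nftw() and fnmatch(): list_matching and name_matches are hypothetical names, and the pattern is kept in a file-scope variable because nftw() has no way to pass user data to the callback. ftw->base is the offset of the file name within the path, so the glob is applied to the name only.

```c
#define _XOPEN_SOURCE 700  /* exposes nftw() and struct FTW */
#include <fnmatch.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* nftw() passes no user data to the callback, so the pattern is kept
   in a file-scope variable for the duration of one walk. */
static const char *g_pattern;

/* Returns 1 if the file-name part of path (starting at offset base)
   matches the glob pattern, 0 otherwise. */
static int name_matches(const char *path, size_t base, const char *pattern)
{
    return fnmatch(pattern, path + base, 0) == 0;
}

/* Called by nftw() for every entry in the tree. */
static int visit(const char *path, const struct stat *sb,
                 int type, struct FTW *ftw)
{
    (void)sb;
    if (type == FTW_F && name_matches(path, (size_t)ftw->base, g_pattern))
        printf("%s\n", path);
    return 0;  /* 0 tells nftw() to keep walking */
}

/* Walks the tree rooted at root, printing regular files whose names
   match pattern. Returns 0 on success, -1 if nftw() reported an error. */
int list_matching(const char *root, const char *pattern)
{
    g_pattern = pattern;
    return nftw(root, visit, 16, FTW_PHYS);  /* at most 16 open dirs */
}
```

Calling list_matching(".", "*.c") would print every regular file ending in .c under the current directory, descending into subdirectories. FTW_PHYS keeps the walk from following symbolic links; drop it if you want them followed.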
If you wish to filter using regular expressions instead, use regcomp() to compile the pattern(s) before calling nftw(), then regexec() in your filter function. (Regular expressions are more powerful than glob patterns, and they are compiled to a tight state machine, so they are quite efficient, too.)
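The regular-expression variant looks much the same; the point is that regcomp() runs once before the walk and only regexec() runs per file. Again this is a sketch with hypothetical names (list_matching_re, g_name_re), assuming extended regex syntax and a POSIX system with nftw().

```c
#define _XOPEN_SOURCE 700  /* exposes nftw() and struct FTW */
#include <ftw.h>
#include <regex.h>
#include <stdio.h>
#include <sys/stat.h>

/* Compiled once before the walk; a global because nftw() passes
   no user-data pointer to the callback. */
static regex_t g_name_re;

static int visit_re(const char *path, const struct stat *sb,
                    int type, struct FTW *ftw)
{
    (void)sb;
    /* Match the regex against the file name only (path + ftw->base). */
    if (type == FTW_F && regexec(&g_name_re, path + ftw->base, 0, NULL, 0) == 0)
        printf("%s\n", path);
    return 0;  /* keep walking */
}

/* Returns 0 on success, -1 if the walk failed, -2 for an invalid pattern. */
int list_matching_re(const char *root, const char *pattern)
{
    int rc;

    if (regcomp(&g_name_re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -2;
    rc = nftw(root, visit_re, 16, FTW_PHYS);
    regfree(&g_name_re);
    return rc;
}
```

For instance, list_matching_re(".", "\\.c$") would be the regex counterpart of the "*.c" glob.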
If you are unsure about the difference, the Wikipedia articles on glob patterns and regular expressions are very useful and informative.
All of the above are defined in POSIX.1-2008, so they are portable across all POSIX-y operating systems.

C code in Swift Project

I have a C program whose output I would like to print from Swift, and when it reads input I would like to supply that input through Swift. Is such a thing possible? I tried this with a simple function and it worked, but how can someone do this with many different functions that call other functions?
I know the question is a bit vague, but can someone point me into the right direction?
Example of code:
int main(int argc, char **argv) {
    int i;
    int hitme;
    char ch;
    prelim();
    if (argc > 1) { // look for -f option
        if (strcmp(argv[1], "-f")== 0) {
            coordfixed = 1;
            argc--;
            argv++;
        }
    }
    if (argc > 1) {
        fromcommandline = 1;
        line[0] = '\0';
        while (--argc > 0) {
            strcat(line, *(++argv));
            strcat(line, " ");
        }
    }
    else fromcommandline = 0;
    while (TRUE) { /* Play a game */
        setup();
        if (alldone) {
            score(0);
            alldone = 0;
        }
        else makemoves();
        skip(2);
        stars();
        skip(1);
        if (tourn && alldone) {
            printf("Do you want your score recorded?");
            if (ja()) {
                chew2();
                freeze(FALSE);
            }
        }
        printf("Do you want to play again?");
        if (!ja()) break;
    }
    skip(1);
    prout("May the Great Bird of the Galaxy roost upon your home planet.");
    return 0;
}
Yes.
This is extensively covered in Using Swift with Cocoa and Objective-C. Objective-C is a superset of C, so all the instructions for Objective-C work equally well for C.
The short version is that you just add the C code to your project, import its header in your Objective-C Bridging Header, and then the C functions will be available in Swift (using various automatic translations).
That said, if you really want to read the output (i.e. the results of these printf calls), that's a somewhat different problem. I'd avoid it if you can. Otherwise you'd need to do something like build the C program as its own executable and use NSTask within Swift to call it and capture the output, or you'd have to hijack stdout with something like fdopen. It's a pain to do that completely correctly.
I will focus on the second part of your question, how to interact with C code that uses the standard IO facilities:
The obvious choice, as Rob Napier pointed out, is just compiling the C code into an executable and using something akin to popen(3) to read from and write to its standard IO, the same way you would read from or write to any other FILE*.
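As a sketch of the reading side, assuming a POSIX system: popen() runs a command through the shell and hands you its standard output as an ordinary FILE*. capture_output is a hypothetical helper name. Note that a single popen() stream goes in one direction only ("r" to read the child's output or "w" to write to its input); talking to an interactive program in both directions needs something like pipe() plus fork(), or NSTask (Process in newer Swift) on the Swift side.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes popen()/pclose() */
#include <stdio.h>

/* Runs `cmd` through the shell and copies up to cap-1 bytes of its
   standard output into `out`, NUL-terminated.
   Returns pclose()'s wait status (0 = exited successfully), or -1. */
int capture_output(const char *cmd, char *out, size_t cap)
{
    FILE *p;
    size_t n;

    if (cap == 0)
        return -1;
    p = popen(cmd, "r");          /* "r": we read the child's stdout */
    if (p == NULL)
        return -1;
    n = fread(out, 1, cap - 1, p);
    out[n] = '\0';
    return pclose(p);             /* waits for the child to exit */
}
```

For example, capture_output("echo hello", buf, sizeof buf) should leave "hello\n" in buf and return 0.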
Another way would be to seek out places where stdio is used and change these functions. For example you could use
#ifdef STANDALONE
#define print printf
#else
#define print passToSwift
#endif
Then you can change all the printfs to prints and just #define which mode you want your C code to operate in. In case STANDALONE is left undefined, you will have to provide a passToSwift function that will connect your C and Swift functionality.
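As one possible shape for passToSwift, here is a sketch that keeps printf's signature so the #define can swap it in unchanged, formats the text with vsnprintf(), and forwards it to a function pointer that the Swift side registers. setOutputHandler and output_handler are hypothetical names, not part of any existing API, and the 1024-byte buffer is an arbitrary limit.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical callback type; Swift would register one of these. */
typedef void (*output_handler)(const char *text);

static output_handler g_handler = NULL;

/* Hypothetical registration function, exposed via the bridging header. */
void setOutputHandler(output_handler handler)
{
    g_handler = handler;
}

/* printf-compatible stand-in used when STANDALONE is not defined.
   Formats into a fixed buffer (longer output is truncated) and hands the
   text to the registered handler, if any. Returns vsnprintf()'s result. */
int passToSwift(const char *fmt, ...)
{
    char buf[1024];
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);
    if (n >= 0 && g_handler != NULL)
        g_handler(buf);
    return n;
}
```

Once the header declaring setOutputHandler is imported through the bridging header, Swift should be able to pass it a closure that captures no context and converts the pointer with String(cString:). Input would need a similar hook in place of scanf().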
One more way without having to change all printfs is using funopen(3) or friends, particularly fwopen(3). With fwopen(3) (man fwopen) you can provide a passToSwift function to be called whenever something is written to stdout.
#include <stdio.h>
int passToSwift(void * cookie, const char * buffer, int len)
{
    (void)cookie;
    // do stuff with the buffer you received
    return len;
}
int main(void)
{
    fflush(stdout);
    stdout = fwopen(NULL, passToSwift);
    printf("Hey\n");
}
The assignment to stdout is not portable, but it works for me on OS X. I am not aware of any other way to achieve it. (dup2 gives EBADF for funopen'd streams, and freopen expects an entry in the filesystem.)
I am addressing a quite similar problem.
I have a solution open to discussion on Code Review: C hack: replace printf to collect output and return complete string by using a line buffer
Maybe you could use that (or a part of it) for your text game as well.
The improved version of C hack: replace printf to collect output and return complete string by using a line buffer is now available on GitHub as the Xcode 7 project swift-C-string-passing (plus a standalone gcc version).
Especially look at the #define preprocessor statements that make use of the bridge to Swift (similar to a3f's answer).
My solution is able to pass strings in and out to the C code. But how are the answers retrieved from the user? I.e. what does the ja() function do?

getopt.h: Compiling Linux C-Code in Windows

I am trying to get a set of nine *.c files (and nine related *.h files) to compile under Windows.
The code was originally designed on Linux to take command-line arguments using the standard GNU/Linux C library header "getopt.h", and that header is not available when building the C code on Windows.
I want to ignore what my code does right now and ask the following question. For those of you familiar with this C-library "getopt.h": will it be possible to build and run my code in Windows if it depends on POSIX-style command-line arguments? Or will I have to re-write the code to work for Windows, passing input files differently (and ditching the "getopt.h" dependency)?
getopt() is actually a really simple function. I made a GitHub gist for it; the code is also below:
#include <string.h>
#include <stdio.h>
int     opterr = 1,             /* if error message should be printed */
        optind = 1,             /* index into parent argv vector */
        optopt,                 /* character checked for validity */
        optreset;               /* reset getopt */
char    *optarg;                /* argument associated with option */
#define BADCH   (int)'?'
#define BADARG  (int)':'
#define EMSG    ""
/*
 * getopt --
 *      Parse argc/argv argument vector.
 */
int
getopt(int nargc, char * const nargv[], const char *ostr)
{
    static char *place = EMSG;          /* option letter processing */
    const char *oli;                    /* option letter list index */
    if (optreset || !*place) {          /* update scanning pointer */
        optreset = 0;
        if (optind >= nargc || *(place = nargv[optind]) != '-') {
            place = EMSG;
            return (-1);
        }
        if (place[1] && *++place == '-') {      /* found "--" */
            ++optind;
            place = EMSG;
            return (-1);
        }
    }                                   /* option letter okay? */
    if ((optopt = (int)*place++) == (int)':' ||
        !(oli = strchr(ostr, optopt))) {
        /*
         * if the user didn't specify '-' as an option,
         * assume it means -1.
         */
        if (optopt == (int)'-')
            return (-1);
        if (!*place)
            ++optind;
        if (opterr && *ostr != ':')     /* diagnostics go to stderr */
            (void)fprintf(stderr, "illegal option -- %c\n", optopt);
        return (BADCH);
    }
    if (*++oli != ':') {                /* don't need argument */
        optarg = NULL;
        if (!*place)
            ++optind;
    }
    else {                              /* need an argument */
        if (*place)                     /* no white space */
            optarg = place;
        else if (nargc <= ++optind) {   /* no arg */
            place = EMSG;
            if (*ostr == ':')
                return (BADARG);
            if (opterr)
                (void)fprintf(stderr, "option requires an argument -- %c\n", optopt);
            return (BADCH);
        }
        else                            /* white space */
            optarg = nargv[optind];
        place = EMSG;
        ++optind;
    }
    return (optopt);                    /* dump back option letter */
}
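Whichever implementation you end up with (the one above, a library port, or the system's own), the calling code is the same. Here is a sketch of a typical option loop, assuming the POSIX declaration from <unistd.h>; struct options, parse_options and the -v / -o FILE options are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* getopt(), optarg, optind */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical settings filled in from the command line. */
struct options {
    int verbose;            /* set by -v */
    const char *output;     /* set by -o FILE */
};

/* Parses -v and -o FILE. Returns the index of the first non-option
   argument, or -1 on an unknown option or missing argument. */
int parse_options(int argc, char *argv[], struct options *opts)
{
    int c;

    opts->verbose = 0;
    opts->output = NULL;
    while ((c = getopt(argc, argv, "vo:")) != -1) {
        switch (c) {
        case 'v':
            opts->verbose = 1;
            break;
        case 'o':
            opts->output = optarg;   /* points into argv */
            break;
        default:                     /* '?': getopt() already complained */
            return -1;
        }
    }
    return optind;
}
```

In the option string "vo:", a colon after a letter means that option takes an argument, which getopt() leaves in optarg. After the loop, optind indexes the first non-option argument (input files, for instance).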
You are correct: getopt() is POSIX, not Windows, so you would generally have to rewrite all command-line argument parsing code.
Fortunately, there is a project, XGetopt, that is meant for Windows/MFC classes.
http://www.codeproject.com/Articles/1940/XGetopt-A-Unix-compatible-getopt-for-MFC-and-Win32
If you can get this working in your project, it should save you a fair bit of coding and prevent you from having to rework all parsing.
Additionally, it comes with a nice GUI-enabled demo app that you should find helpful.
Good luck!
There is the possibility of using code from the MinGW runtime (by Todd C. Miller):
http://sourceforge.net/apps/trac/mingw-w64/browser/trunk/mingw-w64-crt/misc
I have created a small library with these files and CMake script (can generate a VS project):
https://github.com/alex85k/wingetopt
I did compile the getopt code under Windows.
I did this because I explicitly wanted to use its command-line parsing functionality in a Windows (command-line) app.
I successfully did this using VC2010.
As far as I remember, I ran into no significant issues doing so.
getopt.c getoptl.c
If you just want to use getopt in Visual C++ without other dependencies, I have ported getopt.c from the latest GNU libc 2.12, with all the new features. The only difference is that you have to use TCHAR instead of char, but this is very common on Windows.
Simply download the source, run make, and copy libgetopt.lib, getopt.h and getopt_int.h to your project.
You can also build it using the CMakeLists.txt in the root dir.
Download the source from GitHub.
You might try looking into glib-2.0 as an alternative. It would be a bit large if all you need is an option parser. The upside would be having access to all the other wonderful toys in glib.
Just to be honest, I haven't tried getting this to work (I stick mostly to Linux), so YMMV.
Getting glib to work in windows: HowTo
Oh, you might explore using mingw for the build environment, and visual studio for your IDE.
Glib for Win32: HowTo
Anywho, hope this helps.
From my reading of the documentation, the header file getopt.h is specific to the GNU C library as used with Linux (and Hurd). The getopt function itself has been standardised by POSIX, which says it should be declared, along with optind, optarg, etc., in unistd.h.
I can't try this on Visual Studio myself, but it would be worth checking whether unistd.h exists and declares this function, as Visual Studio does provide some other POSIX functions.
If not, then I'd definitely grab an implementation of getopt rather than rewrite the argument parsing to work without it. Getopt was written to make things easier for the programmer and more consistent for users of programs with command-line arguments. Do check the license, though.
From what I remember of getopt.h, all it does is provide a handy parser for processing argv from your main function:
int main(int argc, char * argv[])
{
}
Windows console programs still have a main function, so you can simply loop through your argv array and parse the parameters yourself, e.g.:
for ( int i = 1; i < argc; i++ )
{
    /* check i + 1 < argc so a trailing "-f" doesn't read past argv */
    if (!strcmp(argv[i], "-f") && i + 1 < argc)
        filename = argv[++i];
}
A getopt.h is available as a GitHub gist; I downloaded it and it works for me:
https://gist.github.com/ashelly/7776712
