Input using sscanf with regular expression - c

I want to take input of a particular part of a string like
"First (helloWorld): last"
From that string I want to extract only "helloWorld" using a regular expression. I am using
"%*[^(] (%s):"
But that does not serve my purpose. Please somebody help me to solve this problem.

The format specifiers in the scanf family of functions are not generally considered to be a species of regular expression.
However, you can do what you want with something like this.
#include <stdio.h>
int main() {
char str[256];
sscanf("First (helloWorld): last", "%*[^(](%[^)]%*[^\n]", str);
printf("%s\n", str);
return 0;
}
%*[^(] read and discard everything up to opening paren
( read and discard the opening paren
%[^)] read and store up to (but not including) the closing paren
%*[^\n] read and discard up to (but not including) the newline
The last format specifier is not necessary in the context of the above sscanf, but would be useful if reading from a stream and you want it positioned at the end of the current line for the next read. Note that the newline is still left in the stream, though.
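As a small sketch of the same format (the helper name first_paren and the 128-byte buffer are assumptions for illustration), it is worth checking what sscanf returns: suppressed conversions such as %*[^(] are not counted, so a successful parse returns exactly 1. This also exposes a limitation of the format: a %[ conversion must match at least one character, so a line that starts with "(" or contains "()" is reported as a failure.

```c
#include <stdio.h>

/* Hypothetical helper: copies the text between the first '(' and the
   following ')' into out, which must hold at least 128 bytes.
   Returns 1 on success, 0 otherwise. */
int first_paren(const char *line, char out[128])
{
    /* %*[^(]   skip everything before '(' (not counted in the result)
       (        match the '(' itself
       %127[^)] store up to 127 characters that precede ')' */
    return sscanf(line, "%*[^(](%127[^)]", out) == 1;
}
```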
Rather than using fscanf (or scanf) to read from a stream directly, it's almost always better to read a line with fgets and then extract the fields of interest with sscanf.
// Read lines, extracting the first parenthesized substring.
#include <stdio.h>
int main() {
char line[256], str[128];
while (fgets(line, sizeof line, stdin)) {
if (sscanf(line, "%*[^(](%127[^)]", str) == 1) // only print lines that matched
printf("|%s|\n", str);
}
return 0;
}
Sample run:
one (two) three
|two|
four (five) six
|five|
seven eight (nine) ten
|nine|

Sorry, no true regular expression parser in standard C.
The format strings of the scanf() family are not full-fledged regular expressions, but they can do the job. "%n" tells sscanf() to save the current scanning offset.
#include <stdio.h>
#include <stdlib.h>
char *foo(char *buf) {
#define NotOpenParen "%*[^(]"
#define NotCloseParen "%*[^)]"
int start;
int end = 0;
sscanf(buf, NotOpenParen "(%n" NotCloseParen ")%n", &start, &end);
if (end == 0) {
return NULL; // End never found
}
buf[end-1] = '\0';
return &buf[start];
}
// Usage example
int main(void) {
char buf[] = "First (helloWorld): last";
char *p = foo(buf);
printf("%s\n", p != NULL ? p : "(no match)");
return 0;
}
But this approach fails on "First (): last". More code would be needed.
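To see what %n reports in the approach above, here is a small sketch (the function name is hypothetical): the offset is written only if sscanf actually reaches the %n directive, and %n does not count toward sscanf's return value, so a sentinel initial value is a simple way to detect that it was never reached.

```c
#include <stdio.h>

/* Hypothetical helper: returns the offset just past the first '(' in s,
   or -1 if that point is never reached (no '(' at all, or s starts with
   '(' so that %*[^(] has nothing to match). */
int offset_after_open_paren(const char *s)
{
    int off = -1;                   /* sentinel: %n not reached */
    sscanf(s, "%*[^(](%n", &off);   /* %n stores characters consumed so far */
    return off;
}
```

For "First (helloWorld): last" the result is 7: the six characters of "First " plus the "(" itself.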
A pair of strchr() calls is better.
#include <string.h>
char *foo(char *buf) {
char *start = strchr(buf, '(');
if (start == NULL) {
return NULL; // start never found
}
char *end = strchr(start, ')');
if (end == NULL) {
return NULL; // end never found
}
*end = '\0';
return &start[1];
}
Otherwise, one needs a solution that is not part of the C standard, such as the POSIX regcomp()/regexec() functions.
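One such option on POSIX systems is <regex.h>. The sketch below (the helper name paren_regex is an assumption) compiles the extended regular expression \(([^)]*)\) and copies capture group 1. Unlike the sscanf formats, it also accepts empty parentheses.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: copies the text between the first '(' and the next
   ')' into out using POSIX extended regular expressions.
   Returns 1 on a match that fits in out, 0 otherwise. */
int paren_regex(const char *s, char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[2];   /* m[0]: whole match, m[1]: the capture group */
    int ok = 0;

    /* \( and \) are literal parentheses; ( ... ) is the capture group */
    if (regcomp(&re, "\\(([^)]*)\\)", REG_EXTENDED) != 0)
        return 0;
    if (regexec(&re, s, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len < outsz) {
            memcpy(out, s + m[1].rm_so, len);
            out[len] = '\0';
            ok = 1;
        }
    }
    regfree(&re);
    return ok;
}
```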

Related

Ignoring "=" character with fscanf

I am trying to read a file that contains lines in this format abc=1234. How can I make fscanf ignore the = and store str1="abc" and str2="1234"?
I tried this:
fscanf(fich1, "%[^=]=%[^=]" , palavra, num_char)
I'd recommend using fgets to read lines and then parse them with sscanf. But you can use the same principle for just fscanf if you want.
#include <stdio.h>
int main(void) {
char buf[100];
char str1[100];
char str2[100];
if(! fgets(buf, sizeof buf, stdin)) return 1;
if(sscanf(buf, "%[^=]=%s", str1, str2) != 2) return 1;
puts(str1);
puts(str2);
}
So what does %[^=]=%s do? First, %[^=] reads everything up to (but not including) the first = and stores it in str1. Then the literal = matches the = and discards it. Then %s reads a whitespace-delimited string into str2. This also shows the problem with your format string: the second %[^=] does not stop at the end of the line. It keeps reading, newline included, until the next = or the end of input, so for a file containing abc=1234 followed by def=5678, str2 would receive "1234\ndef" (the value, the newline, and the next key).
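Note that %[^=] and %s treat white space differently: %s skips leading white space and stops at the first white-space character, while %[^=] keeps white space as part of the string. If that's a concern, you need to account for it, for example with %[^=]=%[^\n].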
And in order to avoid buffer overflow, you also might want to do %99[^=]=%99[^\n].
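Putting those two suggestions together, a per-line parser could look like this sketch (parse_pair is a hypothetical name; the 100-byte buffers match the example above). Because the value is read with %99[^\n], it stops at the end of the line instead of running into the next key the way the second %[^=] does.

```c
#include <stdio.h>

/* Hypothetical helper: splits one "key=value" line into key and value.
   Both buffers must hold at least 100 bytes. Returns 1 on success. */
int parse_pair(const char *line, char key[100], char value[100])
{
    /* %99[^=]  : up to 99 characters before '='
       =        : the '=' itself, discarded
       %99[^\n] : up to 99 characters before the end of the line */
    return sscanf(line, "%99[^=]=%99[^\n]", key, value) == 2;
}
```

In a loop, each line returned by fgets would be passed to parse_pair in turn.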

C how to search string in a file?

I have a problem with my code. I'm trying to search for a string in a file, and I can read the file, but when I compare two strings, only the last word of the file compares equal to the string entered with scanf().
So imagine I wrote three words in my file, each on its own line:
test
test12
test123
If I enter test12 or test in scanf(), the comparison returns nonzero (!= 0). But if I enter test123, it works because it's the last word of the file, and I don't know why.
char word[26];
char singleLine[26];
FILE *file = fopen("bin/Release/myWords.txt", "a+");
scanf("%26s", word);
if (file != NULL) {
while (!feof(file)) {
fgets(singleLine, 26, file);
compare = strcmp(singleLine, word);
if (compare == 0) {
printf("\n%s\n",word);
}
}
fclose(file);
}
Your program only works in very special cases and has several problems:
scanf("%26s", word); may affect up to 27 bytes in the destination array, which is defined with a length of only 26.
furthermore, you should check the return value to avoid undefined behavior on invalid input.
fopen("bin/Release/myWords.txt", "a+"); opens the file in append mode: is this necessary?
while (!feof(file)) is always wrong, you should instead check the return value of fgets() that returns NULL at end of file.
compare = strcmp(singleLine, word); only compares for an exact match of the full line, which can only happen if the word has exactly 25 characters; otherwise the trailing newline in singleLine will cause the comparison to fail. Furthermore, lines longer than the buffer may cause unexpected results, as may a file that does not end with a newline.
the reason it matches the last word in the file is that you forgot to write a trailing newline after the last word, so the last fgets() fills the buffer with the exact word and no trailing newline.
if you want to search for matches inside the line, you should use a larger buffer and search for a match with strstr.
if you want an exact match, you should strip the trailing newline before the comparison.
Here is a modified version:
#include <stdio.h>
#include <string.h>
int main() {
char word[27];
char singleLine[256];
FILE *file = fopen("bin/Release/myWords.txt", "r");
if (scanf("%26s", word) != 1)
return 1;
if (file != NULL) {
while (fgets(singleLine, sizeof singleLine, file)) {
singleLine[strcspn(singleLine, "\n")] = '\0'; // strip the newline if any
int compare = strcmp(singleLine, word);
if (compare == 0) {
printf("\n%s\n", word);
}
}
fclose(file);
}
return 0;
}
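The last two points of the list can be combined into one small sketch (line_matches is a hypothetical helper, and its exact flag is an assumption of this example): strip the newline that fgets leaves behind, then compare with strcmp for an exact match or strstr for a match anywhere inside the line.

```c
#include <string.h>

/* Hypothetical helper: strips the trailing newline fgets leaves in line,
   then tests for an exact match (exact != 0, via strcmp) or for word
   appearing anywhere in the line (exact == 0, via strstr). */
int line_matches(char *line, const char *word, int exact)
{
    line[strcspn(line, "\n")] = '\0';   /* no-op if there is no newline */
    if (exact)
        return strcmp(line, word) == 0;
    return strstr(line, word) != NULL;
}
```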

Split string using more than one char as delimiter

Let's say I have a string "file1.h: file2.c,file3.cpp" and I want to split it into "file1.h" and "file2.c,file3.cpp", that is, using ": " (a colon followed by whitespace) as the delimiter. How can I do it?
I tried this code with no help:
int main(int argc, char *argv[]) {
char str[] = "file1.h: file2.c,file3.cpp";
char name[100];
char depends[100];
sscanf(str, "%s: %s", name, depends);
printf("Name: %s\n", name);
printf("Deps: %s\n", depends);
}
And the output I get is:
Name: file1.h:
Deps:
What you seem to need is strtok(). Read about it in the man page. Related quote from C11, chapter §7.24.5.8:
A sequence of calls to the strtok function breaks the string pointed to by s1 into a
sequence of tokens, each of which is delimited by a character from the string pointed to
by s2. [...]
In your case, you can use a delimiter like
char * delim = ": "; //combination of : and a space
to get the job done.
A few more things to mention:
the input to strtok() must be modifiable (which it is, in your case)
and strtok() actually destroys the input fed to it, so keep a copy around if you need the original later.
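A minimal sketch of that suggestion (split_name_deps is a hypothetical helper name). Keep in mind that strtok treats every character of the delimiter string as a separate delimiter, so ": " means "colon or space", not the two-character sequence; a dependency list that itself contained spaces would be split further.

```c
#include <string.h>

/* Hypothetical helper: splits "name: deps" in place with strtok.
   Returns 1 if both parts were found, 0 otherwise. */
int split_name_deps(char *str, char **name, char **deps)
{
    *name = strtok(str, ": ");    /* first token, ends at ':' or ' ' */
    *deps = strtok(NULL, ": ");   /* skips the ": " and takes the rest */
    return *name != NULL && *deps != NULL;
}
```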
This is an alternative way to do it. It uses strchr(), but assumes that the input string always has the format
name: item1,item2,item3,...,itemN
Here is the program
#include <ctype.h>
#include <string.h>
#include <stdio.h>
int
main(void)
{
const char *const string = "file1.h: file2.c,file3.cpp ";
const char *head;
const char *tail;
const char *next;
// This basically makes a pointer to the `:'
head = string;
// If there is no `:' this string does not follow
// the assumption that the format is
//
// name: item1,item2,item3,...,itemN
//
if ((tail = strchr(head, ':')) == NULL)
return -1;
// Save a pointer to the next character after the `:'
next = tail + 1;
// Strip leading spaces
while (isspace((unsigned char) *head) != 0)
++head;
// Strip trailing spaces
while (tail > head && isspace((unsigned char) *(tail - 1)) != 0)
--tail;
fputc('*', stdout);
// Simply print the characters between `head' and `tail'
// you could as well copy them, or whatever
fwrite(head, 1, tail - head, stdout);
fputc('*', stdout);
fputc('\n', stdout);
head = next;
while (head != NULL) {
tail = strchr(head, ',');
if (tail == NULL) {
// This means there are no more `,'
// so we now try to point to the end
// of the string
tail = strchr(head, '\0');
}
// This is basically the same algorithm
// just with a different delimiter which
// will presumably be the same from
// here
next = tail + 1;
// Strip leading spaces
while (isspace((unsigned char) *head) != 0)
++head;
// Strip trailing spaces (never moving tail before head)
while (tail > head && isspace((unsigned char) *(tail - 1)) != 0)
--tail;
// Here is where you can extract the string
// I print it surrounded by `*' to show that
// it's stripping white spaces
fputc('*', stdout);
fwrite(head, 1, tail - head, stdout);
fputc('*', stdout);
fputc('\n', stdout);
// Try to point to the next one
// or make head `NULL' if this is
// the end of the string
//
// Note that the original `tail' pointer
// that was pointing to the next `,' or
// the end of the string, has changed but
// we have saved its original value
// plus one, we now inspect what was
// there
if (*(next - 1) == '\0') {
head = NULL;
} else {
head = next;
}
}
fputc('\n', stderr);
return 0;
}
It's excessively commented to guide the reader.
As Sourav says, you really need to use strtok for tokenizing strings. But this doesn't explain why your existing code is not working.
The answer lies in the specification for sscanf and how it handles a '%s' in the format string.
From the man page:
s Matches a sequence of non-white-space characters;
So, the presence of a colon-space in your format string is largely irrelevant for matching the first '%s'. When sscanf sees the first %s, it simply consumes the input string until a whitespace character is encountered, giving you "file1.h:" as the value of name (note the inclusion of the colon).
Next it tries to deal with the colon-space sequence in your format string.
Again, from the man page
The format string consists of a sequence of directives which describe how to process the sequence of input characters.
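The colon is an ordinary character in the format, so it forms a directive that must match the next input character exactly. Because %s has already consumed the colon, the next input character is the space after it, the directive fails, and you get a matching failure: sscanf stops and returns 1, leaving depends unassigned.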
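The matching failure is visible in sscanf's return value. This sketch (both function names are hypothetical, and the 100-byte buffers follow the question) compares the question's format with one that uses a scanset, %99[^:], which stops before the colon so that the literal ':' directive can match; the space after the colon in the format then skips the whitespace in the input.

```c
#include <stdio.h>

/* Hypothetical: the question's format. %99s swallows "file1.h:", so the
   ':' directive then meets a space and fails; only one item is stored. */
int original_format(const char *str)
{
    char name[100], depends[100];
    return sscanf(str, "%99s: %99s", name, depends);
}

/* Hypothetical: a scanset stops before the ':', so the ':' directive
   matches, and the space after it in the format skips input whitespace. */
int scanset_format(const char *str, char name[100], char depends[100])
{
    return sscanf(str, "%99[^:]: %99s", name, depends);
}
```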
If, instead, your format string was simply "%s%s", then sscanf will get you almost exactly what you want.
#include <stdio.h>
int main(int argc, char *argv[]) {
char str[] = "file1.h: file2.c,file3.cpp";
char name[100];
char depends[100];
sscanf(str, "%s%s", name, depends);
printf("str: '%s'\n", str);
printf("Name: %s\n", name);
printf("Deps: %s\n", depends);
return 0;
}
Which gives this output:
str: 'file1.h: file2.c,file3.cpp'
Name: file1.h:
Deps: file2.c,file3.cpp
At this point, you can simply check that sscanf gave a return value of 2 (i.e. it found two values), and that the last character of name is a colon. Then just truncate name and you have your answer.
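Those checks could be written as follows; parse_with_colon is a hypothetical name, and the buffers are assumed to hold 100 bytes each, as in the question.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads two whitespace-separated words, checks that
   the first one ends in ':', and truncates it. Returns 1 on success. */
int parse_with_colon(const char *str, char name[100], char deps[100])
{
    if (sscanf(str, "%99s%99s", name, deps) != 2)
        return 0;                       /* did not find two values */
    size_t len = strlen(name);
    if (len == 0 || name[len - 1] != ':')
        return 0;                       /* first word has no trailing colon */
    name[len - 1] = '\0';               /* drop the colon */
    return 1;
}
```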
Of course, by this logic, you aren't going to be able to use sscanf to parse your depends variable into multiple strings ... which is why others are recommending using strtok, strpbrk etc because you are both parsing and tokenizing your input.
Well, I am pretty late, and I do not know many of C's built-in functions, so I started writing a solution of my own. You probably don't need it now, but here it is anyway; modify it as needed. If you find a bug, feel free to let me know.

Dynamically allocate user inputted string

I am trying to write a function that does the following things:
Start an input loop, printing '> ' each iteration.
Take whatever the user enters (unknown length) and read it into a character array, dynamically allocating the size of the array if necessary. The user-entered line will end at a newline character.
Add a null byte, '\0', to the end of the character array.
Loop terminates when the user enters a blank line: '\n'
This is what I've currently written:
void input_loop(){
char *str = NULL;
printf("> ");
while(printf("> ") && scanf("%a[^\n]%*c",&input) == 1){
/*Add null byte to the end of str*/
/*Do stuff to input, including traversing until the null byte is reached*/
free(str);
str = NULL;
}
free(str);
str = NULL;
}
Now, I'm not too sure how to go about adding the null byte to the end of the string. I was thinking something like this:
last_index = strlen(str);
str[last_index] = '\0';
But I'm not too sure if that would work though. I can't test if it would work because I'm encountering this error when I try to compile my code:
warning: ISO C does not support the 'a' scanf flag [-Wformat=]
So what can I do to make my code work?
EDIT: changing scanf("%a[^\n]%*c",&input) == 1 to scanf("%as[^\n]%*c",&input) == 1 gives me the same error.
First of all, scanf format strings are not regular expressions, so nothing close to what you want will work that way. As for the warning you get: in C99, %a is a floating-point conversion, while the allocating a flag your code relies on is an old GNU extension rather than ISO C, which is what GCC is warning about. (POSIX.1-2008 standardizes the m modifier, as in %m[^\n], for the same purpose.)
But then you have a bigger problem. In standard C, scanf expects you to pass it a previously allocated buffer to fill in with the input. It does not malloc the string for you, so initializing str to NULL and freeing it afterwards will not work with scanf.
The simplest thing you can do is to give up on arbitrary-length strings: create a large buffer and reject inputs that are longer than that.
You can then use the fgets function to populate your buffer. To check if it managed to read the full line, check if your string ends with a "\n".
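#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
char str[256+1];
while (1) {
printf("> ");
if (!fgets(str, sizeof str, stdin)) {
//error or end of file
break;
}
size_t len = strlen(str);
if (len + 1 == sizeof str && str[len - 1] != '\n') {
//buffer is full and no newline was read: user typed something too long
exit(1);
}
if (strcmp(str, "\n") == 0) {
//a blank line ends the loop, as the question asks
break;
}
printf("user typed %s", str);
}
return 0;
}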
Another alternative is to use a function outside the C standard library. For example, POSIX systems such as Linux provide the getline function, which reads a full line of input using malloc behind the scenes.
There's no error checking here, and don't forget to free the pointer when you're done with it. If you use this code to read enormous lines, you deserve all the pain it will bring you.
#include <stdio.h>
#include <stdlib.h>
char *readInfiniteString() {
int l = 256;
char *buf = malloc(l);
int p = 0;
int ch; // int, not char, so that EOF can be detected
ch = getchar();
while(ch != '\n' && ch != EOF) {
buf[p++] = ch;
if (p == l) {
l += 256;
buf = realloc(buf, l);
}
ch = getchar();
}
buf[p] = '\0';
return buf;
}
int main(int argc, char *argv[]) {
printf("> ");
char *buf = readInfiniteString();
printf("%s\n", buf);
free(buf);
}
If you are on a POSIX system such as Linux, you should have access to getline. It can be made to behave like fgets, but if you start with a null pointer and a zero length, it will take care of memory allocation for you.
You can use it in a loop like this:
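#define _POSIX_C_SOURCE 200809L // expose getline when compiling in strict ISO C mode
#include <stdlib.h>
#include <stdio.h>
#include <string.h> // for strcmp
int main(void)
{
char *line = NULL;
size_t nline = 0;
for (;;) {
ssize_t n; // getline returns ssize_t (-1 on EOF or error)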
printf("> ");
// read line, allocating as necessary
n = getline(&line, &nline, stdin);
if (n < 0) break;
// remove trailing newline
if (n && line[n - 1] == '\n') line[n - 1] = '\0';
// do stuff
printf("'%s'\n", line);
if (strcmp("quit", line) == 0) break;
}
free(line);
printf("\nBye\n");
return 0;
}
The passed pointer and the length value must be consistent, so that getline can reallocate memory as required. (That means that you shouldn't change nline or the pointer line in the loop.) If the line fits, the same buffer is used in each pass through the loop, so that you have to free the line string only once, when you're done reading.
Some have mentioned that scanf is probably unsuitable for this purpose. I wouldn't suggest using fgets, either. Though it is slightly more suitable, there are problems that seem difficult to avoid, at least at first. Few C programmers manage to use fgets right the first time without reading the fgets manual in full. The parts most people manage to neglect entirely are:
what happens when the line is too large, and
what happens when EOF or an error is encountered.
The fgets() function shall read bytes from stream into the array pointed to by s, until n-1 bytes are read, or a <newline> is read and transferred to s, or an end-of-file condition is encountered. The string is then terminated with a null byte.
Upon successful completion, fgets() shall return s. If the stream is at end-of-file, the end-of-file indicator for the stream shall be set and fgets() shall return a null pointer. If a read error occurs, the error indicator for the stream shall be set, fgets() shall return a null pointer...
I don't feel I need to stress the importance of checking the return value too much, so I won't mention it again. Suffice to say, if your program doesn't check the return value your program won't know when EOF or an error occurs; your program will probably be caught in an infinite loop.
When no '\n' is present, the remaining bytes of the line have not yet been read. Note that fgets always scans the line at least once, internally; when you add extra logic to check for a '\n', you're scanning the data a second time.
This allows you to realloc the storage and call fgets again if you want to dynamically resize the storage, or discard the remainder of the line (warning the user of the truncation is a good idea), perhaps using something like fscanf(file, "%*[^\n]");.
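A sketch of the "discard the remainder" option (read_line_truncating and its return codes are hypothetical). To avoid reporting a truncation when the line fit exactly, it first peeks at the next character with fgetc; only if that character is neither a newline nor EOF does it discard the rest with fscanf(f, "%*[^\n]") and then consume the newline.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf (newline removed).
   Returns 0 if the whole line fit, 1 if the rest of an overlong line
   was discarded, and -1 on EOF or a read error. */
int read_line_truncating(char *buf, int size, FILE *f)
{
    if (fgets(buf, size, f) == NULL)
        return -1;
    char *nl = strchr(buf, '\n');
    if (nl != NULL) {                 /* complete line */
        *nl = '\0';
        return 0;
    }
    int c = fgetc(f);                 /* int, so EOF is distinguishable */
    if (c == '\n' || c == EOF)        /* line fit exactly, or last line */
        return 0;
    fscanf(f, "%*[^\n]");             /* discard the rest of the line */
    fgetc(f);                         /* and its newline, if any */
    return 1;
}
```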
hugomg mentioned using multiplication in the dynamic resize code to avoid quadratic runtime problems. Along this line, it would be a good idea to avoid parsing the same data over and over each iteration (thus introducing further quadratic runtime problems). This can be achieved by storing the number of bytes you've read (and parsed) somewhere. For example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *get_dynamic_line(FILE *f) {
size_t bytes_read = 0, alloc_size = 0;
char *bytes = NULL, *temp;
do {
alloc_size = alloc_size * 2 + 16; /* grow geometrically; never stuck at a 1-byte buffer */
temp = realloc(bytes, alloc_size);
if (temp == NULL) {
free(bytes);
return NULL;
}
bytes = temp;
if (fgets(bytes + bytes_read, alloc_size - bytes_read, f) == NULL) /* Parsing data the first time */
break; /* EOF or error: keep what was read so far */
bytes_read += strcspn(bytes + bytes_read, "\n"); /* Parsing data the second time */
} while (bytes[bytes_read] != '\n');
bytes[bytes_read] = '\0';
return bytes;
}
Those who do manage to read the manual and come up with something correct (like this) may soon realise the complexity of an fgets solution is at least twice as poor as the same solution using fgetc. We can avoid parsing data the second time by using fgetc, so using fgetc might seem most appropriate. Alas most C programmers also manage to use fgetc incorrectly when neglecting the fgetc manual.
The most important detail is to realise that fgetc returns an int, not a char. It typically returns one of 256 distinct values, between 0 and UCHAR_MAX (inclusive), or EOF, so there are typically 257 distinct values that fgetc (or, consequently, getchar) may return. Trying to store those values in a char or unsigned char loses information, specifically the ability to tell EOF or an error apart from a valid byte. (Of course, this typical value of 257 changes if CHAR_BIT is greater than 8, and consequently UCHAR_MAX is greater than 255.)
#include <stdio.h>
#include <stdlib.h>
char *get_dynamic_line(FILE *f) {
size_t bytes_read = 0;
char *bytes = NULL;
do {
if ((bytes_read & (bytes_read + 1)) == 0) {
void *temp = realloc(bytes, bytes_read * 2 + 1);
if (temp == NULL) {
free(bytes);
return NULL;
}
bytes = temp;
}
int c = fgetc(f);
bytes[bytes_read] = c >= 0 && c != '\n'
? c
: '\0';
} while (bytes[bytes_read++]);
return bytes;
}

Split string with multiple delimiters using strtok in C

I have a problem splitting a string. The code below works, but only if the strings are separated by ' ' (spaces). I need to split them on any whitespace character. Is strtok() even necessary?
char input[1024];
char *string[3];
int i=0;
fgets(input,1024,stdin)!='\0') //get input
{
string[0]=strtok(input," "); //parce first string
while(string[i]!=NULL) //parce others
{
printf("string [%d]=%s\n",i,string[i]);
i++;
string[i]=strtok(NULL," ");
}
A simple example that shows how to use multiple delimiters and potential improvements in your code. See embedded comments for explanation.
Be warned about the general shortcomings of strtok() (from the manual):
These functions modify their first argument.
These functions cannot be used on constant strings.
The identity of the delimiting byte is lost.
The strtok() function uses a static buffer while parsing, so it's not thread
safe. Use strtok_r() if this matters to you.
#include <stdio.h>
#include <string.h>
int main(void)
{
char input[1024];
char *string[256]; // 1) 3 is dangerously small,256 can hold a while;-)
// You may want to dynamically allocate the pointers
// in a general, robust case.
char delimit[]=" \t\r\n\v\f"; // 2) POSIX whitespace characters
int i = 0, j = 0;
if(fgets(input, sizeof input, stdin)) // 3) fgets() returns NULL on error.
// 4) Better practice to use sizeof
// input rather hard-coding size
{
string[i]=strtok(input,delimit); // 5) Make use of i to be explicit
while(string[i]!=NULL && i < 255) // 6) don't run past the end of string[]
{
printf("string [%d]=%s\n",i,string[i]);
i++;
string[i]=strtok(NULL,delimit);
}
for (j=0;j<i;j++)
printf("%s\n", string[j]); // 7) index with j, not i
}
return 0;
}
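If modifying the input or relying on strtok's hidden state is a problem, one alternative (not what the code above uses) is to walk the string with strspn and strcspn, which work on a const string and keep no state between calls. A sketch, with print_tokens as a hypothetical name:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: finds the tokens of s separated by any character
   in delim without modifying s. Each token is printed to out (nothing is
   printed if out is NULL); the number of tokens is returned. */
size_t print_tokens(const char *s, const char *delim, FILE *out)
{
    size_t count = 0;
    for (;;) {
        s += strspn(s, delim);          /* skip a run of delimiters */
        if (*s == '\0')
            break;
        size_t len = strcspn(s, delim); /* length of the token at s */
        if (out != NULL)
            fprintf(out, "%.*s\n", (int)len, s);
        count++;
        s += len;
    }
    return count;
}
```

Called as print_tokens(input, " \t\r\n\v\f", stdout), it prints one token per line, like the strtok loop, but leaves input untouched.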
