Regular expressions with scanf in C - c

I'm trying to achieve the following without any success: removing the opening message " and the trailing " while leaving the content in between, and saving it into my variable, using sscanf regular expressions.
I wrote the following code:
sscanf(buffer, "message \"%[^\"]", message);
This works well when I have something like message "Hey there", but when I try the following string, I get only the white space between the first two quotation marks.
message " """ This is a Test """ "
The result for this should be """ This is a Test """
Is there a way to upgrade my expression so that it handles this extreme case of message? I looked it up both on Google and here, and couldn't find an elegant answer. I'm aware that it's possible using string manipulation with many lines of code, but I'm trying for something simpler here.
P.S. The trailing " is the end of the expression, and is a must by the program, after that comes nothing.
Thanks in advance for the feedback!

If you're fine with not using regex for the whole thing:
Original version (assumes the text contains no '$', since %[^$] reads up to the first '$' or the end of the string):
sscanf(buffer, "message \"%[^$]", message); // remove 'message "'
message[strlen(message) - 1] = '\0'; // remove trailing '"'
Safe, correct, and generic version:
char* buffer = ...;
const char* prefix = "message \"";
const char* suffix = "\"";
if (strstr(buffer, prefix) != buffer) {
    // error, doesn't start with `prefix`
}
buffer += strlen(prefix);
char* suffixStart = strrchr(buffer, suffix[0]);
if (!suffixStart || strcmp(suffixStart, suffix) != 0) {
    // error, doesn't end with `suffix`
}
*suffixStart = '\0'; // strip `suffix`
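For reference, the same prefix/suffix idea can be sketched as a small function that copies the result into a caller-supplied buffer instead of modifying the input. The name extract_message and its signature are made up for this illustration and are not part of the answer above; it assumes the line ends right after the closing quote, as the question states.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the original answer): copies the text between
 * the leading `message "` and the LAST double quote of `line` into `out`.
 * Returns 0 on success, -1 if the line does not match or `out` is too small. */
int extract_message(const char *line, char *out, size_t out_size)
{
    static const char prefix[] = "message \"";
    size_t prefix_len = sizeof prefix - 1;

    if (strncmp(line, prefix, prefix_len) != 0)
        return -1;                      /* does not start with the prefix */

    const char *start = line + prefix_len;
    const char *end = strrchr(start, '"');
    if (end == NULL || end[1] != '\0')
        return -1;                      /* no closing quote at the very end */

    size_t len = (size_t)(end - start);
    if (len >= out_size)
        return -1;                      /* would not fit with the terminator */

    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

Because the search uses the last quote (strrchr), any quotes inside the message are kept as part of the result.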

Related

strtok fail at finding new line character ("\n")

Here is the function:
int parse_headers(c_request *req, char *raw_headers) {
    char *command_line;
    char *raw_header;
    req->headers = NULL;
    command_line = strtok_r(raw_headers, "\\n", &raw_headers);
    printf("command line = [%s]\n", command_line);
    if (parse_command(req, command_line) < 0)
        return -1;
    while ((raw_header = strtok_r(raw_headers, "\\n", &raw_headers))) {
        printf("\nraw header = [%s]\n", raw_header);
        parse_header(req, raw_header);
    }
    return 0;
}
raw_headers equals:
POST www.google.fr HTTP/1.1\nUser-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)\nHost: www.tutorialspoint.com\nContent-Type: text/xml; charset=utf-8\nContent-Length: 38\nAccept-Language: en-us\nAccept-Encoding: gzip, deflate\nConnection: Keep-Alive\r\n\r\n<?xml version='1.0' encoding='utf-8'?
So strtok finds the first \n (command_line equals POST www.google.fr HTTP/1.1), but in the while loop, when I print raw_header, it prints raw header = [User-Age] instead of raw header = [User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)].
What can I do to fix that?
The delimiter "\\n" specified for strtok_r consists of two characters: '\\' and 'n'. There is an n after User-Age, so the data is cut there.
You should use "\n" to have strtok_r search for LF.
If you actually want to separate the data by the multi-character string "\\n", then strtok_r is not for that. You should do it manually, maybe using strstr().
Hmm... you are using a delimiter set formed by a backslash (\) and the character n, not a new line (a new line has to be written as \n, not \\n). You can check how it parses a string containing n's or backslashes, and you'll see that it breaks the string at either of these two characters.
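To see the difference between the two delimiter strings, here is a minimal sketch. The helper name split is made up for this example, and it uses plain strtok to stay within standard C; strtok_r treats its delimiter argument the same way, as a set of single characters.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits `text` in place on any character found in
 * `delims` and stores up to `max` token pointers in `out`.
 * Returns the number of tokens stored. */
size_t split(char *text, const char *delims, char **out, size_t max)
{
    size_t count = 0;

    /* Every character of `delims` is a separate delimiter:
     * "\n" is the single LF character, while "\\n" is the two
     * characters backslash and 'n'. */
    for (char *tok = strtok(text, delims);
         tok != NULL && count < max;
         tok = strtok(NULL, delims)) {
        out[count++] = tok;
    }
    return count;
}
```

Calling split on "User-Agent: Mozilla\nHost: example" with "\n" yields the two header lines, while "\\n" cuts the first token at the n of User-Agent.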

What does git_sysdir_find_in_dirlist() in libgit2 do?

I'm working on porting some logic from libgit2 into Go, but not a 1:1 port as Go works differently. I think this function scans a directory tree, but I'm not sure.
static int git_sysdir_find_in_dirlist(
    git_buf *path,
    const char *name,
    git_sysdir_t which,
    const char *label)
{
    // allocations
    size_t len;
    const char *scan, *next = NULL;
    const git_buf *syspath;
    // check the path to make sure it exists?
    GIT_ERROR_CHECK_ERROR(git_sysdir_get(&syspath, which));
    if (!syspath || !git_buf_len(syspath))
        goto done;
    // this is the part I don't understand
    for (scan = git_buf_cstr(syspath); scan; scan = next) {
        /* find unescaped separator or end of string */
        for (next = scan; *next; ++next) {
            if (*next == GIT_PATH_LIST_SEPARATOR &&
                (next <= scan || next[-1] != '\\'))
                break;
        }
        len = (size_t)(next - scan);
        next = (*next ? next + 1 : NULL);
        if (!len)
            continue;
        GIT_ERROR_CHECK_ERROR(git_buf_set(path, scan, len));
        if (name)
            GIT_ERROR_CHECK_ERROR(git_buf_joinpath(path, path->ptr, name));
        if (git_path_exists(path->ptr))
            return 0;
    }
done:
    git_buf_dispose(path);
    git_error_set(GIT_ERROR_OS, "the %s file '%s' doesn't exist", label, name);
    return GIT_ENOTFOUND;
}
It's the for loop that's confusing me. for (scan = git_buf_cstr(syspath); scan; scan = next) { ... } looks like it's iterating/scanning syspath, and then I get totally lost at for (scan = git_buf_cstr(syspath); scan; scan = next) { ... }.
What does this function specifically do?
This is not looking at a directory tree but rather at a delimited string containing a list of directories. For instance (though this clearly isn't aimed at this particular case), as the top level documentation says:
GIT_ALTERNATE_OBJECT_DIRECTORIES
Due to the immutable nature of Git objects, old objects can be
archived into shared, read-only directories. This variable
specifies a ":" separated (on Windows ";" separated) list of Git
object directories which can be used to search for Git objects. New
objects will not be written to these directories.
Entries that begin with " (double-quote) will be interpreted as
C-style quoted paths, removing leading and trailing double-quotes
and respecting backslash escapes. E.g., the value
"path-with-\"-and-:-in-it":vanilla-path has two paths:
path-with-"-and-:-in-it and vanilla-path.
The function is obviously scanning a character-separated path list (whatever that character is—probably colon or semicolon as above), checking for backslash prefixes, so that you can write /a:C\:/a to allow the thing to look into either /a or C:/a.
This function is tasked with locating the file name inside a "configuration level" (i.e. ~/.git/, /etc/git; see git_sysdir_t for a list of known locations). As those levels are stored as static ("read-only") C strings in which the directories are separated by GIT_PATH_LIST_SEPARATOR, and we can't modify them at runtime, we have to jump through hoops for what amounts to a foreach-string loop.
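Stripped of the libgit2 types, the scanning loop can be sketched as a small iterator. Everything named here (LIST_SEPARATOR, next_entry) is a stand-in invented for illustration, not libgit2 API; the separator is assumed to be ':'. Like the original loop, it leaves the escaping backslash in the entry and skips empty entries.

```c
#include <stddef.h>

/* Hypothetical stand-in for GIT_PATH_LIST_SEPARATOR. */
#define LIST_SEPARATOR ':'

/* Hypothetical iterator: finds the next non-empty entry starting at *cursor.
 * On success sets *entry and *len (the entry is NOT NUL-terminated),
 * advances *cursor and returns 1. Returns 0 when the list is exhausted. */
int next_entry(const char **cursor, const char **entry, size_t *len)
{
    const char *scan = *cursor;

    while (scan != NULL) {
        const char *next = scan;

        /* find an unescaped separator or the end of the string */
        while (*next != '\0' &&
               !(*next == LIST_SEPARATOR &&
                 (next == scan || next[-1] != '\\')))
            ++next;

        size_t n = (size_t)(next - scan);
        const char *after = (*next != '\0') ? next + 1 : NULL;

        if (n != 0) {
            *entry = scan;
            *len = n;
            *cursor = after;
            return 1;
        }
        scan = after;               /* skip empty entries such as "::" */
    }
    *cursor = NULL;
    return 0;
}
```

The real function does one more thing per entry: it joins the entry with name and returns as soon as that path exists on disk.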

Detect if a line matches a format in C

I have a file and I need to check if its lines are in the following format:
name: name1,name2,name3,name4 ...
(some string, followed by ":", then a single space and after that strings separated by ",").
I tried doing it with the following code:
int result = 0;
do
{
    result = sscanf(rest, "%[^:]: %s%s", p1, p2, p3);
    if (result == 3)
    {
        printf("invalid!");
        fclose(fpointer);
        return -1;
    }
} while (fgets(rest, LINE, fpointer) != NULL);
This works well for lines like: name: name1, name2 (with a space between name1, and name2).
But it fails with the following line:
name : name1,name2
I want to somehow tell sscanf not to ignore this white space before the ":".
Can someone see how?
Thanks for helping!
This works for me:
result = sscanf(rest,"%[^*:]: %[^,],%s", p1, p2, p3);
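Note that inside a scanset the * is just a literal character, so %[^*:] reads everything up to the first ':' (or '*'). A space before the ':' therefore ends up at the end of p1, where it can be checked or trimmed afterwards.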
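If the goal is a strict yes/no check of the format, one alternative to sscanf is to walk the line with strcspn. The sketch below is only an illustration: the function name line_is_valid is made up, and it assumes that neither the name nor the items may contain blanks and that exactly one space follows the colon.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical strict validator: returns 1 if `line` looks like
 * "name: item1,item2,...", otherwise 0. A single trailing '\n'
 * (as left by fgets) is accepted. */
int line_is_valid(const char *line)
{
    /* the name ends at the first ':' or blank */
    size_t name_len = strcspn(line, ": \t\r\n");
    if (name_len == 0 || line[name_len] != ':' || line[name_len + 1] != ' ')
        return 0;

    const char *p = line + name_len + 2;
    for (;;) {
        size_t item_len = strcspn(p, ", \t\r\n");
        if (item_len == 0)
            return 0;               /* empty item or stray blank */
        p += item_len;
        if (*p == ',') {
            ++p;                    /* another item must follow */
            continue;
        }
        if (*p == '\n')
            ++p;                    /* optional newline from fgets */
        return *p == '\0';
    }
}
```

With this check, name: name1,name2 is accepted, while name : name1,name2 and name: name1, name2 are rejected.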

Breaking a string in C with multiple spaces

Ok, so my code currently splits a single string like this: "hello world" into:
hello
world
But when I have multiple spaces in between, before or after within the string, my code doesn't behave. It takes that space and counts it as a word/number to be analyzed. For example, if I put in two spaces in between hello and world my code would produce:
hello
(a space character)
world
The space is actually counted as a word/token.
int counter = 0;
int index = strcur->current_index;
char *string = strcur->myString;
char token_buffer = string[index];
while(strcur->current_index <= strcur->end_index)
{
    counter = 0;
    token_buffer = string[counter+index];
    while(!is_delimiter(token_buffer) && (index+counter)<=strcur->end_index)//delimiters are: '\0','\n','\r',' '
    {
        counter++;
        token_buffer = string[index+counter];
    }
    char *output_token = malloc(counter+1);
    strncpy(output_token,string+index,counter);
    printf("%s \n", output_token);
    TKProcessing(output_token);
    //update information
    counter++;
    strcur->current_index += counter;
    index += counter;
}
I can see the problem area in my loop, but I'm a bit stumped as to how to fix this. Any help would be much appreciated.
From a coding standpoint, if you want to know how to do this without a library as an exercise: your inner loop stops at the first delimiter. On the next pass the current character is the second delimiter, so the inner while loop is never entered and an empty token is printed. You can skip the extra delimiters in the update step:
//update information
while(is_delimiter(token_buffer) && (index+counter)<=strcur->end_index)
{
    counter++;
    token_buffer = string[index+counter];
}
Use the standard C library function strtok() rather than redeveloping such a standard function.
Here's the related manual page.
You can use it as follows in your case:
#include <string.h>
char *token;
token = strtok (string, " \r\n");
// do something with your first token
while (token != NULL)
{
    // do something with subsequent tokens
    token = strtok (NULL, " \r\n");
}
As you can observe, each subsequent call to strtok with NULL as the first argument returns a char* pointing to the next token.
If you're working on a threaded program, you can use the strtok_r() function instead.
It takes an extra saveptr argument; otherwise the first call is the same as for strtok(), and subsequent calls pass NULL as the first argument:
#include <string.h>
char *token;
char *saveptr;
token = strtok_r(string, " \r\n", &saveptr);
// do something with your first token
while (token != NULL)
{
    // do something with subsequent tokens
    token = strtok_r(NULL, " \r\n", &saveptr);
}
Just put the token-processing logic into an if (counter > 0) {...}, so that malloc happens only when there was a real token, like this:
if(counter > 0){ // there is a real word, not just delimiters
    char *output_token = malloc(counter+1);
    strncpy(output_token,string+index,counter);
    output_token[counter] = '\0'; // strncpy does not add the terminator here
    printf("%s \n", output_token);
    TKProcessing(output_token);
}
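Another way to express "skip every run of delimiters, then take the token" without strtok is strspn followed by strcspn. This is only a sketch: next_token is a made-up name, the delimiter set is taken from the comment in the question, and the input string is left unmodified.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: skips any run of delimiters at *cursor, then returns a
 * pointer to the next token and stores its length in *len (the token is NOT
 * NUL-terminated). Advances *cursor past the token.
 * Returns NULL when no tokens remain. */
const char *next_token(const char **cursor, size_t *len)
{
    static const char delims[] = " \r\n";

    /* strspn counts leading characters that ARE delimiters */
    const char *start = *cursor + strspn(*cursor, delims);
    if (*start == '\0')
        return NULL;

    /* strcspn counts leading characters that are NOT delimiters */
    *len = strcspn(start, delims);
    *cursor = start + *len;
    return start;
}
```

Since leading, trailing and repeated spaces are all consumed by the strspn step, no empty tokens are ever produced.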

Creating a terminal menu with a challenge

What I want to do is create a terminal menu that takes various types of arguments and places them in an array, param. The code is below. Here are the problems I can't find a good solution for:
If I just type 'list' I get Not a valid command; I have to type “list “ (list and a space).
The menu choice new should work like this: new “My name is hello” should give param[0] = new and param[1] = My name is hello (so I can create a message with spaces).
How can I accomplish this?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>
int menu()
{
    printf(">");
    char line[LINE_MAX];
    int i = 0;
    char *param[4];
    while(fgets(line, LINE_MAX, stdin) != NULL) {
        param[i++] = strtok(line, " \n");
        if(param[0] != NULL) {
            char *argument;
            while((argument = strtok(NULL, "\n")) != NULL) {
                param[i++] = argument;
            }
        }
        if(strcmp(param[0], "new") == 0) {
            //new(param[1]);
            menu();
        } else if(strcmp(param[0], "list") == 0) {
            //list();
            menu();
        } else {
            printf("Not a valid command.\n\n");
            menu();
        }
    }
    return 0;
}
You're delimiting on " ".
fgets reads the ENTER.
So, when you type "listENTER" and tokenise at spaces you get one token, namely "listENTER". Later you compare with "list" and, of course, it doesn't match.
Try
strtok(line, " \n"); /* maybe include tabs too? */
PS. Why are you calling menu recursively? You already have a while in the function ...
Your problem is that param[i++] = strtok(line, " "); will only split on space, not on \n (newline). Try adding it to your string of delimiters.
Oh, and congratulations for some decent looking code that's clean and well formatted. A pleasant change.
I'm not sure if this causes your problem but these lines
/*new(param[1]);
/*list();
Start a comment that is never terminated.
If you want one line comments you can use:
// comment
(at least in C++ and from C99 on)
But comments starting with /* must be ended with a */ and cannot be nested:
/* comment */
/* also multi line
allowed */
Since you start a comment inside a comment, your compiler should have emitted a warning; actually this shouldn't compile at all.
The reason you need to type "list " is that your first strtok tokenizes until a space character, so you need to enter one in this case. Try allowing both '\n' and space as separators, i.e. replace the second parameter of strtok with " \n".
As for quotes, you need to re-combine parameters starting from the one beginning with a quote to the one ending with one by replacing the characters in between them with spaces. Or do away with strtok and parse by manually iterating through the characters in line.
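As a rough sketch of the manual approach, the line can be split into a command word and one optional argument, with the argument allowed to be wrapped in quotes. The name parse_menu_line is made up for this example, and it only handles plain ASCII double quotes ("), not the typographic quotes shown in the question.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits `line` in place into a command word (param[0])
 * and an optional argument (param[1]). If the argument starts with a double
 * quote, the text up to the last double quote is used, so it may contain
 * spaces. Returns the number of parameters stored: 0, 1 or 2. */
int parse_menu_line(char *line, char *param[2])
{
    line[strcspn(line, "\r\n")] = '\0';     /* drop the newline kept by fgets */

    char *p = line + strspn(line, " \t");   /* skip leading blanks */
    if (*p == '\0')
        return 0;                           /* empty line */

    param[0] = p;
    p += strcspn(p, " \t");                 /* end of the command word */
    if (*p == '\0')
        return 1;                           /* command only, e.g. "list" */
    *p++ = '\0';

    p += strspn(p, " \t");                  /* blanks before the argument */
    if (*p == '\0')
        return 1;

    if (*p == '"') {
        ++p;                                /* skip the opening quote */
        char *close = strrchr(p, '"');
        if (close != NULL)
            *close = '\0';                  /* cut at the closing quote */
    }
    param[1] = p;
    return 2;
}
```

With this, list followed by ENTER gives one parameter, and new "My name is hello" gives param[0] = new and param[1] = My name is hello.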
