Read from line with sscanf, including whitespaces, breaking on other character - c

I'm sorry for the sloppy title, but I didn't know how to phrase my question. I'm trying to read a .txt file in which every line holds the information needed to fill a struct. First I use fgets to read the line, and then I was going to use sscanf to read the individual parts. Here is where I'm stuck: sscanf normally splits fields on whitespace, but I need the whitespace to be kept. I know that sscanf can be told not to split on whitespace, but the tricky part is that I then need some other arbitrary character to separate the parts. For example, I have to break the line
Carl Sagan~Contact~scifi~1997
up into parts for Author, Name, Genre and year. As you can see, I need the space in Carl Sagan, but I need the function to split the string on the tilde character. Any help is appreciated.

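If your input is delimited by ~, or by any other specific character, use a scanset conversion:
sscanf(s, "%99[^~]", name);
%[^...] is a conversion that matches a run of characters other than the ones listed between ^ and the closing ]. The width (99 here, for a 100-byte name buffer) keeps it from overflowing the destination.
Here is a sample program for testing it: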
#include <stdio.h>
int main(void)
{
const char *s = "Carl Sagan~Contact~scifi~1997";
char name[100], contact[100], genre[100];
int yr;
/* each %99[^~] reads up to 99 non-'~' characters (spaces included), leaving room for '\0' */
if (sscanf(s, "%99[^~]~%99[^~]~%99[^~]~%d", name, contact, genre, &yr) == 4)
printf("%s\n%s\n%s\n%d\n", name, contact, genre, yr);
return 0;
}

You need strtok. Use ~ as your delimiter.
See the documentation: http://linux.die.net/man/3/strtok
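strtok has some drawbacks (it writes '\0' into the string it parses, keeps hidden state between calls so it is not reentrant, and treats a run of delimiters as a single one, so empty fields are skipped), but it sounds like it will work for you.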
EDIT:
After reading this, it sounds like you can use sscanf cleverly to achieve the same result, and it may actually be safer after all.

#include <stddef.h>
#include <string.h>
#include <stdio.h>
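/* Works like BSD strsep: returns the current field and advances *input past
   the delimiter. Unlike strtok, consecutive delimiters yield empty fields. */
char* mystrsep(char** input, const char* delim)
{
char* result = *input;
char* p;
/* find the first delimiter character in the remaining string */
p = (result != NULL) ? strpbrk(result, delim) : NULL;
if (p == NULL)
*input = NULL; /* last field: nothing left after it */
else
{
*p = '\0'; /* terminate the current field */
*input = p + 1; /* continue after the delimiter */
}
return result;
}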
int main()
{
char str[] = "Carl Sagan~Contact~scifi~1997";
const char delimiters[] = "~";
char* ptr;
char* token;
ptr = str;
token = mystrsep(&ptr, delimiters);
while(token)
{
printf("%s\n",token);
token = mystrsep(&ptr, delimiters);
}
return 0;
}
Output:
Carl Sagan
Contact
scifi
1997

Related

C string nested splitting

I'm a beginner at C and I'm stuck on a simple problem. Here it goes:
I have a string formatted like this: "first1:second1\nsecond2\nfirst3:second3" ... and so on.
As you can see from the example, the first field is optional ([firstx:]secondx).
I need to get a resulting string which contains only the second field, like this: "second1\nsecond2\nsecond3".
I did some research here on Stack Overflow (string splitting in C) and found that there are two main functions in C for string splitting: strtok (obsolete) and strsep.
I tried to write the code using both functions (plus strdup) without success. Most of the time I get some unpredictable result.
Better ideas?
Thanks in advance
EDIT:
This was my first try
int main(int argc, char** argv){
char * stri = "ciao:come\nva\nquialla:grande\n";
char * strcopy = strdup(stri); // since strsep and strtok both modify the input string
char * token;
while((token = strsep(&strcopy, "\n"))){
if(token[0] != '\0'){ // I don't want the last match of '\n'
char * sub_copy = strdup(token);
char * sub_token = strtok(sub_copy, ":");
sub_token = strtok(NULL, ":");
if(sub_token[0] != '\0'){
printf("%s\n", sub_token);
}
}
free(sub_copy);
}
free(strcopy);
}
Expected output: "come", "va", "grande"
Here's a solution with strcspn:
#include <stdio.h>
#include <string.h>
int main(void) {
const char *str = "ciao:come\nva\nquialla:grande\n";
const char *p = str;
while (*p) {
size_t n = strcspn(p, ":\n");
if (p[n] == ':') {
p += n + 1;
n = strcspn(p , "\n");
}
if (p[n] == '\n') {
n++;
}
fwrite(p, 1, n, stdout);
p += n;
}
return 0;
}
We compute the size of the initial segment not containing : or \n. If it's followed by a :, we skip over it and get the next segment that doesn't contain \n.
If it's followed by \n, we include the newline character in the segment. Then we just need to output the current segment and update p to continue processing the rest of the string in the same way.
We stop when *p is '\0', i.e. when the end of the string is reached.

How do I make this shell to parse the statement with quotes around them in C?

I am trying to make this shell parse. How do I make the program implement parsing in a way so that commands that are in quotes will be parsed based on the starting and ending quotes and will consider it as one token? During the second while loop where I am printing out the tokens I think I need to put some sort of if statement, but I am not too sure. Any feedback/suggestions are greatly appreciated.
#include <stdio.h> //printf
#include <unistd.h> //isatty
#include <string.h> //strlen,sizeof,strtok
int main(int argc, char **argv[]){
int MaxLength = 1024; //size of buffer
int inloop = 1; //loop runs forever while 1
char buffer[MaxLength]; //buffer
bzero(buffer,sizeof(buffer)); //zeros out the buffer
char *command; //character pointer of strings
char *token; //tokens
const char s[] = "-,+,|, ";
/* part 1 isatty */
if (isatty(0))
{
while(inloop ==1) // check if the standard input is from terminal
{
printf("$");
command = fgets(buffer,sizeof(buffer),stdin); //fgets(string of char pointer,size of,input from where
token = strtok(command,s);
while (token !=NULL){
printf( " %s\n",token);
token = strtok(NULL, s); //checks for elements
}
if(strcmp(command,"exit\n")==0)
inloop =0;
}
}
else
printf("the standard input is NOT from a terminal\n");
return 0;
}
For an arbitrary command-line syntax, strtok is not the best function. It works for simple cases, where the words are delimited by special characters or white space, but there will come a time where you want to split something like this ls>out into three tokens. strtok can't handle this, because it needs to place its terminating zeros somewhere.
Here's a quick and dirty custom command-line parser:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
int error(const char *msg)
{
printf("Error: %s\n", msg);
return -1;
}
int token(const char *begin, const char *end)
{
printf("'%.*s'\n", (int)(end - begin), begin); /* the .* precision must be an int */
return 1;
}
int parse(const char *cmd)
{
const char *p = cmd;
int count = 0;
for (;;) {
while (isspace((unsigned char)*p)) p++; /* ctype functions need unsigned char values */
if (*p == '\0') break;
if (*p == '"' || *p == '\'') {
int quote = *p++;
const char *begin = p;
while (*p && *p != quote) p++;
if (*p == '\0') return error("Unmatched quote");
count += token(begin, p);
p++;
continue;
}
if (strchr("<>()|", *p)) {
count += token(p, p + 1);
p++;
continue;
}
if (isalnum((unsigned char)*p)) {
const char *begin = p;
while (isalnum((unsigned char)*p)) p++;
count += token(begin, p);
continue;
}
return error("Illegal character");
}
return count;
}
int main(void)
{
char line[1024];
/* tokenize each input line until end of file */
while (fgets(line, sizeof line, stdin))
parse(line);
return 0;
}
This code understands words separated by white space, strings enclosed in single or double quotation marks, and single-character operators. It doesn't understand escaped quotation marks inside quotes, nor non-alphanumeric characters inside words, such as a dot.
The code is easy to follow, and you can extend it to recognize two-character operators such as >>, or comments.
If you want to escape quotation marks, you'll have to recognise the escape character in parse and unescape it and possible other escape sequences in token.
First, you've declared argv to be an array of pointers to... pointers. In fact, it is an array of pointers to chars. So:
int main(int argc, char **argv){
You seem to reach for [] by habit, which led to incorrect code here; the more common C/C++ idiom is pointer syntax, e.g.:
const char* s = "-+| ";
FWIW.
Also, note that fgets() returns NULL when it hits end of file (e.g., the user types CTRL-D on *nix or CTRL-Z on DOS/Windows). You probably don't want a segmentation fault when that happens.
Also, bzero() is not standard C (it is a legacy BSD function declared in <strings.h>, and POSIX.1-2008 removed it), which you probably don't care about in this context, and the C compiler will happily initialize an array to zeroes for you if you ask it to (possibly worth caring about; syntax demonstrated below).
Next, as soon as you allow quoted strings, the next language question that immediately arises is: "how do I quote a quote?". Then, you are immediately out of the territory that can be handled cleanly with strtok(). I'm not 100% sure how you want to break your string into tokens. Using strtok() in the way you do, I think the string "a|b" would produce two tokens, "a" and "b", making you overlook the "|". You're treating "|" and "-" and "+" like whitespace, to be ignored, which is not generally what a shell does. For example, given this command-line:
echo 'This isn''t so hard' | cp -n foo.h .. >foo.out
I would probably want to get the following list of tokens:
echo
'This isn''t so hard'
|
cp
-n
foo.h
..
>
foo.out
Usually, characters like '+' and '-' are not special for most shells' tokenizing process (unlike '|' and '&' and '<', etc. which are instructions to the shell that the spawned command never sees). They get passed onto the application that is then free to decide "'-' indicates this word is an option and not a filename" or whatever.
What follows is a version of your code that produces the output I described (which may or may not be exactly what you want) and allows either double or single-quoted arguments (trivial to extend to handle back-ticks too) that can contain quote marks of the same kind, etc.
#include <stdio.h> //printf, fgets, fflush
#include <unistd.h> //isatty
#include <string.h> //strchr, strcmp
#define MAXLENGTH 1024
int main(int argc, char **argv){
int inloop = 1; //loop runs forever while 1
char buffer[MAXLENGTH] = {'\0'}; //compiler inits entire array to NUL bytes
// bzero(buffer,sizeof(buffer)); //zeros out the buffer
char *command; //character pointer of strings
char *token; //tokens
char* rover;
const char* StopChars = "|&<> \n"; //'\n' ends the last token instead of being part of it
size_t toklen;
/* part 1 isatty */
if (isatty(0))
{
while(inloop ==1) // check if the standard input is from terminal
{
printf("$");
fflush(stdout); //show the prompt before fgets blocks
token = command = fgets(buffer,sizeof(buffer),stdin); //fgets(string of char pointer,size of,input from where
if(!command) //NULL at end of file (CTRL-D / CTRL-Z) or on error
break;
while(*token)
{
// skip leading whitespace, including the newline fgets keeps
while(*token == ' ' || *token == '\n')
++token;
if(*token == '\0') //nothing left; also stops strchr() below from matching the terminator
break;
rover = token;
// if possible quoted string
if(*rover == '\'' || *rover == '\"')
{
char Quote = *rover++;
while(*rover)
if(*rover != Quote)
++rover;
else if(rover[1] == Quote)
rover += 2;
else
{
++rover;
break;
}
}
// else if special-meaning character token
else if(strchr(StopChars, *rover))
++rover;
// else generic token
else
while(*rover)
if(strchr(StopChars, *rover))
break;
else
++rover;
toklen = (size_t)(rover-token);
if(toklen)
printf(" %*.*s\n", (int)toklen, (int)toklen, token); //* and .* take int arguments
token = rover;
}
if(strcmp(command,"exit\n")==0)
inloop =0;
}
}
else
printf("the standard input is NOT from a terminal\n");
return 0;
}
Regarding your specific request: commands that are in quotes will be parsed based on the starting and ending quotes.
You can use strtok() by tokenizing on the " character. Here's how:
char a[]={"\"this is a set\" this is not"};
char *buf;
buf = strtok(a, "\"");
In that code snippet, buf will point to "this is a set".
Note the backslash: it escapes the " so that the character can be used as a token delimiter inside a string literal.
Also, not your main issue, but you need to change this:
const char s[] = "-,+,|, "; //strtok will parse on -,+| and a " " (space)
To:
const char s[] = "-+| "; //strtok will parse on only -+| and a " " (space)
strtok() treats every character in the delimiter string as a delimiter, including ",".

read the characters between special characters in C

I'm new to C language and I need a help on String functions.
I have a string variable called mcname from which I would like to extract the characters between special characters.
For example:
*mcname="G2-99-77"
I expect the output to be 99 as this is between the - characters.
How can I do this in C please?
Walk the string with a pointer until you hit a special character.
Then copy the following characters into a separate array until you hit the next special character (and write a null character at the end of the copy when you reach the second special character).
You can do this by using strtok or sscanf
using sscanf:
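#include <stdio.h>
int main()
{
char str[64];
int out;
char mcname[] = "G2-99-77";
/* %63[^-] consumes the prefix without overflowing str; check both conversions succeeded */
if (sscanf(mcname, "%63[^-]-%d", str, &out) == 2)
printf("%d\n", out);
return 0;
}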
Using strtok:
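#include <stdio.h>
#include <stdlib.h> /* atoi */
#include <string.h>
int main()
{
char *str;
int out;
char mcname[] = "G2-99-77";
str = strtok(mcname, "-"); /* "G2" */
str = strtok(NULL, "-"); /* "99", or NULL if there is no second field */
if (str != NULL) {
out = atoi(str);
printf("%d\n", out);
}
return 0;
}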
sscanf() has great flexibility. Used correctly, code may readily parse a string.
Be sure to test the sscanf() return value.
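%2[A-Z0-9] means to scan up to 2 characters from the set 'A' to 'Z' and '0' to '9'. (Strictly, the meaning of '-' inside a scanset is implementation-defined in ISO C, but common C libraries treat it as a range.)
Use %2[^-] if the goal is up to 2 characters of anything other than '-'.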
#include <stdio.h>
int main(void)
{
const char *mcname = "G2-99-77";
char prefix[3];
char middle[3];
char suffix[3];
int cnt = sscanf(mcname, "%2[A-Z0-9]-%2[A-Z0-9]-%2[A-Z0-9]", prefix, middle,
suffix);
if (cnt != 3) {
puts("Parse Error"); /* puts appends its own newline */
}
else {
printf("Prefix:<%s> Middle:<%s> Suffix:<%s>\n", prefix, middle, suffix);
}
return 0;
}

Using Pointers and strtok()

I'm building a linked list and need your assistance please as I'm new to C.
I need to input a string that looks like this: (word)_#_(year)_#_(DEFINITION(UPPER CASE))
Ex: Enter a string
Input: invest_#_1945_#_TRADE
Basically I'm looking to build a function that searches the DEFINITIONs and gives me back the word it belongs to.
Enter a word to search in the dictionary
Input: TRADE
Output: Found "TRADE" in the word "invest"
So far I've managed to split the input with strtok(), but I'm not sure how to print the first word from there.
Here's what I could come up with:
char split(char words[99],char *p)
{
p=strtok(words, "_#_");
while (p!=NULL)
{
printf("%s\n",p);
p = strtok(NULL, "_#_");
}
return 0;
}
int main()
{
char hello[99];
char *s = NULL;
printf("Enter a string you want to split\n");
scanf("%s", hello);
split(hello,s);
return 0;
}
Any ideas on what should I do?
I reckon that your problem is how to extract the three bits of information from your formatted string.
The function strtok does not work as you think it does: The second argument is not a literal delimiting string, but a string that serves as a set of characters that are delimiters.
In your case, sscanf seems to be the better choice:
#include <stdlib.h>
#include <stdio.h>
int main()
{
const char *line = "invest_#_1945_#_TRADE";
char word[41]; /* %40[^_] stores up to 40 chars plus the terminating '\0' */
int year;
char def[41]; /* likewise for %40s */
int n;
n = sscanf(line, "%40[^_]_#_%d_#_%40s", word, &year, def);
if (n == 3) {
printf("word: %s\n", word);
printf("year: %d\n", year);
printf("def'n: %s\n", def);
} else {
printf("Unrecognized line.\n");
}
return 0;
}
The function sscanf examines a given string according to a given pattern. Roughly, that pattern consists of format specifiers that begin with a percent sign, of spaces which denote any amount of white-space characters (including none) and of other characters that have to be matched verbatim. The format specifiers yield a result, which has to be stored. Therefore, for each specifier, a result variable must be given after the format string.
In this case, there are several chunks:
%40[^_] reads up to 40 characters that are not the underscore into a char array. This is a special case of reading a string. Strings in sscanf are really words and may not contain white space. The underscore, however, would be part of a string, so in order not to eat up the underscore of the first delimiter, you have to use the notation [^(chars)], which means: Any sequence of chars that do not contain the given chars. (The caret does the negation here, [(chars)] would mean any sequence of the given chars.)
_#_ matches the first delimiter literally, i.e. only if the next chars are underscore hash mark, underscore.
%d reads a decimal number into an integer. Note that the address of the integer has to be given here with &.
_#_ matches the second delimiter.
%40s reads a string of up to 40 non-whitespace characters into a char array.
The function returns the number of matched results, which should be three if the line is valid. The function sscanf can be cumbersome, but is probably your best bet here for quick and dirty input.
#include <stdio.h>
#include <string.h>
char *strtokByWord_r(char *str, const char *word, char **store){
char *p, *ret;
if(str != NULL){
*store = str;
}
if(*store == NULL) return NULL;
p = strstr(ret=*store, word);
if(p){
*p='\0';
*store = p + strlen(word);
} else {
*store = NULL;
}
return ret;
}
char *strtokByWord(char *str, const char *word){
static char *store = NULL;
return strtokByWord_r(str, word, &store);
}
int main(){
char input[]="invest_#_1945_#_TRADE";
char *array[3] = {NULL}; /* slots not filled by the loop stay NULL */
char *p;
int i, size = sizeof(array)/sizeof(char*);
for(i=0, p=input;i<size;++i){
if(NULL!=(p=strtokByWord(p, "_#_"))){
array[i]=p;//strdup(p);
p=NULL;
} else {
array[i]=NULL;
break;
}
}
for(i = 0;i<size;++i)
printf("array[%d]=\"%s\"\n", i, array[i] ? array[i] : "(none)"); /* never pass NULL to %s */
/* result
array[0]="invest"
array[1]="1945"
array[2]="TRADE"
*/
return 0;
}

There has to be a better way to substr

Hi guys I'm currently using the code below and I'm pretty sure there's a better way to do it. What the code does is look if there's the delimiter (~~~~), puts everything before ~~~~ in cmd and everything after ~~~~ in param. If anyone could let me know how I should be doing this then it would be very appreciated! I'm not used to low-level languages so strings and pointers are still confusing to me!
Thanks!
char buffer[1024], *tempCharPointer, cmd[100], param[1024];
if(strstr(buffer, "~~~~"))
{
strcpy(cmd, buffer);
tempCharPointer = strstr(buffer, "~~~~");
index = (tempCharPointer-buffer) + 4;
strcpy(param, &tempCharPointer[4]);
memmove(&cmd[index-4], "", (index-4));
}
You can simplify your code as follows:
char cmd[1024], *tempCharPointer, *param = "";
// Fill in cmd from somewhere...
...
char *delim = strstr(cmd, "~~~~");
if(delim)
{
param = delim+4;
*delim = '\0';
}
You can simplify your code by overwriting the first character of the delimiter with \0, then letting the command be a pointer to the beginning of the string and param a pointer to the first character after the delimiter. That saves memory and all the copying and moving.
char buffer[1024], *tempCharPointer, cmd[100], param[1024];
tempCharPointer = strstr(buffer, "~~~~");
if (tempCharPointer){
*tempCharPointer = '\0';
tempCharPointer +=4;
//now buffer points to the first half, and tempCharPointer points to second half
//do with them what you will
}
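The strtok function in the C library (which extracts tokens from strings) can be useful here.
A small example follows; see man strtok for more info. Note that strtok_r (used below) is the reentrant POSIX variant. Also note that the second argument is a set of delimiter characters, not a delimiter string, so "~~~~" behaves exactly like "~": every single tilde splits the input.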
#define _POSIX_C_SOURCE 200809L /* strtok_r is POSIX, not ISO C */
#include <string.h>
#include <stdio.h>
int main(void)
{
char buffer[1024] = "~~~~foo~~~~bar~~~~baz";
char* saveptr = NULL;
/* the delimiter argument is a set of characters: "~~~~" is the same as "~" */
char* token = strtok_r(buffer, "~~~~", &saveptr);
while(token != NULL)
{
printf("TOKEN: %s\n", token);
token = strtok_r(NULL, "~~~~", &saveptr);
}
return 0;
}
