Using C sscanf to parse number from a directory path - c

I have the following string
/foo123/bar123/card45/foofoo/1.3/
And I want to parse the number that follows the word "card", which in the example above would be 45. Should I use sscanf for this and if so, how would I go about doing so?
Thanks

Should I use sscanf for [XYZ problem]
No.
But you can use strstr and strtol instead:
const char *s = "/foo123/bar123/card45/foofoo/1.3/";
const char *p = "card";
const char *t = strstr(s, p);
int i = -1; // a negative number indicates a parse failure, for example
if (t != NULL) {
t += strlen(p);
char *end;
i = strtol(t, &end, 10);
if (end == t || *end != '/') {
// no digits after "card", or the number is not followed by '/'
i = -1;
}
}

Using strstr() followed by sscanf() will do the job. Suppose your source string is in the character array source_string. Then use this:
char * ptr;
ptr = strstr(source_string,"card");
sscanf (ptr,"%*s %d",&number); //Sorry this is wrong!!
sscanf(ptr,"card%d",&number); //This is right!!
sscanf(ptr,"%*4s%d",&number); //This works too
printf("The card number is %d",number);
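strstr() gets you the address where "card" begins. Then you pass that address to sscanf() as the source. In the format string, the literal card (or %*4s, which reads four characters and discards them) consumes the word "card". After this, the %d reads the number following "card" and stores it in the integer variable number, which you then display using printf(). A bare %*s does not work here because it swallows everything up to the next whitespace, digits included.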
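Neither step above checks for failure: strstr() returns NULL when "card" is absent, and sscanf() returns the number of conversions it actually made. A minimal sketch that checks both, assuming a made-up helper name parse_card_number and -1 as the failure value:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the number after the first "card" in path,
 * or -1 if "card" is missing or is not followed by a number. */
int parse_card_number(const char *path)
{
    const char *ptr = strstr(path, "card");
    int number;

    if (ptr == NULL)
        return -1;
    /* sscanf returns the count of successful conversions (1 expected). */
    if (sscanf(ptr, "card%d", &number) != 1)
        return -1;
    return number;
}
```

Only the first occurrence of "card" is examined, so a path such as /discard/card45/ would be reported as a failure by this sketch.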

How about using a simple splitting function that cuts your path at each '/'?
Here is a function that I use to split a char array on a specific character. It saves each part into a vector of strings. (Note that this is C++, not C.)
vector<string> split(const char *str, char c)
{
vector<string> result;
while(1)
{
const char *begin = str;
while(*str != c && *str)
str++;
result.push_back(string(begin, str));
if(0 == *str++)
break;
}
return result;
}
So, you can call this function like this
vector<string> ParsedString;
ParsedString = split("Your/Random/path", '/');
Then you can check each element of ParsedString to see whether any of them contains the word you are looking for.
Or, if the word you are looking for is always at a fixed position from the front, you can pick just that element.
string InterestedTarget = ParsedString[2]; // third component, "path" in the call above
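The function above is C++; since the question is tagged C, here is a rough C equivalent of the same idea. It walks the path one component at a time with strcspn() and looks for a component that starts with a given prefix. The name find_component_number and the -1 failure value are made up for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: scan the '/'-separated components of path and
 * return the number following prefix in the first component that is
 * exactly prefix plus a number strtol() accepts; return -1 if none. */
long find_component_number(const char *path, const char *prefix)
{
    size_t plen = strlen(prefix);

    while (*path != '\0') {
        size_t n = strcspn(path, "/");   /* length of this component */

        if (n > plen && strncmp(path, prefix, plen) == 0) {
            char *end;
            long value = strtol(path + plen, &end, 10);

            /* Accept only if the number runs to the end of the component. */
            if (end == path + n)
                return value;
        }
        path += n;
        if (*path == '/')
            path++;                      /* skip the separator */
    }
    return -1;
}
```

Unlike the vector version, this makes no copies and does not depend on the component being at a fixed index.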

Related

Code does not appear to print concatenated strings correctly

I have some code here where, given a .txt file whose contents is
find replace pre
pre
cpre
I want to find every instance of "pre" and prepend "k" to it, i.e. the file should become "find replace kpre".
So I first set out to create a string that is the concatenation of k and pre
(assume k and pre are argv[1] and argv[3], respectively)
char appended[1024];
strcpy(appended, argv[1]);
strcat(appended, argv[3]);
printf("appended string is %s", appended); //prints kpre, which is good
char *replaced = replace(buf, argv[3], appended);
//*string is a line in the file
char* replace(char *string, char *find, char *replace) {
char *position;
char temp[1024];
int find_length = strlen(find);
int index = 0;
while ((position = strstr(string, find)) != NULL) {
strcpy(temp, string);
index = position - string;
string[index] = '\0';
strcat(string, replace); //add new word to the string
strcat(string, temp + index + find_length); //add the unsearched
//remainder of the string
}
return string;
}
.................
fputs(replaced, temp);
Checking on the console, appended = "kpre", which is correct, but when the code is run the file looks like
find replace kkkkkkkkkkkkkkkk.....kkkkkkk
kkkkkkkkk......kkkkk
ckkkkk....kkkkk
The k's go on for a while; I cannot see pre even when scrolling all the way to the right. I'm having difficulty figuring out why the code doesn't replace
the instance of 'pre' with 'kpre', even though the appended variable appears to be correct. I have a feeling it has to do with the fact that I set a size of 1024 characters for temp, but even then I'm not sure why k was copied so many times.
Here
while ((position = strstr(string, find)) != NULL) {
you are passing string to strstr(), which returns a pointer to the first occurrence of find in string. After you replace pre with kpre and call strstr() again, it finds the pre inside the kpre you just inserted, so every iteration adds another k. After enough iterations the string grows past the end of its buffer, which is undefined behavior.
Instead of always passing string to strstr(), pass a pointer into string and, after every replacement, advance that pointer past the replaced part. Alternatively, you can traverse the string character by character with a pointer instead of using strstr(), like this:
#define BUFSZ 1024
char* replace(char *string, const char *find, const char *replace) {
if ((string == NULL) || (find == NULL) || (replace == NULL)) {
printf ("Invalid argument..\n");
return NULL;
}
char temp[BUFSZ];
char *ptr = string;
size_t find_len = strlen(find);
size_t repl_len = strlen(replace);
while (ptr[0]) {
if (strncmp (ptr, find, find_len)) {
ptr++;
continue;
}
strcpy (temp, ptr + find_len); // No need to copy whole string to temp
snprintf (ptr, BUFSZ - (ptr - string), "%s%s", replace, temp);
ptr = ptr + repl_len;
}
return string;
}
Note that the code above is based on the example in your question and is only meant to show how you can achieve your goal without strstr(). It assumes that string lives in a buffer of BUFSZ bytes and that find is not empty. When writing real code, handle the other cases as well, such as replace being a very long string.
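The first suggestion, keeping strstr() but resuming the search after the text just inserted, might look like the sketch below. The name replace_all and the explicit cap parameter (the total size of the caller's buffer) are assumptions added here; the function gives up rather than overflow.

```c
#include <string.h>

/* Hypothetical helper: replace every occurrence of find in str, in place.
 * cap is the total size of the buffer holding str.
 * Returns str, or NULL if find is empty or the result would not fit
 * (replacements made before that point stay in place). */
char *replace_all(char *str, size_t cap, const char *find, const char *repl)
{
    size_t find_len = strlen(find);
    size_t repl_len = strlen(repl);
    char *pos = str;

    if (find_len == 0)
        return NULL;

    while ((pos = strstr(pos, find)) != NULL) {
        size_t tail_len = strlen(pos + find_len) + 1;  /* include the NUL */
        size_t prefix_len = (size_t)(pos - str);

        if (prefix_len + repl_len + tail_len > cap)
            return NULL;

        /* Shift the tail, then drop the replacement into the gap. */
        memmove(pos + repl_len, pos + find_len, tail_len);
        memcpy(pos, repl, repl_len);

        pos += repl_len;  /* resume after the replacement, not inside it */
    }
    return str;
}
```

Because the search restarts after the inserted text, replacing pre with kpre touches each original pre exactly once.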

C string nested splitting

I'm a beginner at C and I'm stuck on a simple problem. Here it goes:
I have a string formatted like this: "first1:second1\nsecond2\nfirst3:second3" ... and so on.
As you can see from the example, the first field is optional ([firstx:]secondx).
I need to get a resulting string which contains only the second field. Like this: "second1\nsecond2\nsecond3".
I did some research here on stack (string splitting in C) and I found that there are two main functions in C for string splitting: strtok (obsolete) and strsep.
I tried to write the code using both functions (plus strdup) without success. Most of the time I get some unpredictable result.
Better ideas?
Thanks in advance
EDIT:
This was my first try
int main(int argc, char** argv){
char * stri = "ciao:come\nva\nquialla:grande\n";
char * strcopy = strdup(stri); // since strsep and strtok both modify the input string
char * token;
while((token = strsep(&strcopy, "\n"))){
if(token[0] != '\0'){ // I don't want the last match of '\n'
char * sub_copy = strdup(token);
char * sub_token = strtok(sub_copy, ":");
sub_token = strtok(NULL, ":");
if(sub_token[0] != '\0'){
printf("%s\n", sub_token);
}
}
free(sub_copy);
}
free(strcopy);
}
Expected output: "come", "va", "grande"
Here's a solution with strcspn:
#include <stdio.h>
#include <string.h>
int main(void) {
const char *str = "ciao:come\nva\nquialla:grande\n";
const char *p = str;
while (*p) {
size_t n = strcspn(p, ":\n");
if (p[n] == ':') {
p += n + 1;
n = strcspn(p , "\n");
}
if (p[n] == '\n') {
n++;
}
fwrite(p, 1, n, stdout);
p += n;
}
return 0;
}
We compute the size of the initial segment not containing : or \n. If it's followed by a :, we skip over it and get the next segment that doesn't contain \n.
If it's followed by \n, we include the newline character in the segment. Then we just need to output the current segment and update p to continue processing the rest of the string in the same way.
We stop when *p is '\0', i.e. when the end of the string is reached.
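If the result is needed as a string rather than printed, the same loop can copy each segment into a caller-supplied buffer. A sketch, assuming a made-up function name second_fields and an output buffer at least as large as the input (the result is never longer than the input):

```c
#include <string.h>

/* Hypothetical helper: copy only the "second" fields of src into dst.
 * dst must have room for strlen(src) + 1 bytes. */
void second_fields(char *dst, const char *src)
{
    const char *p = src;

    while (*p) {
        size_t n = strcspn(p, ":\n");
        if (p[n] == ':') {          /* skip the optional "first:" part */
            p += n + 1;
            n = strcspn(p, "\n");
        }
        if (p[n] == '\n')
            n++;                    /* keep the newline separator */
        memcpy(dst, p, n);
        dst += n;
        p += n;
    }
    *dst = '\0';
}
```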

Using strncpy to remove part of a char*

I am trying to remove a certain part of my string using strncpy but I am facing some issues here.
This is what my two char* variables hold.
trimmed has, for example, "127.0.0.1/8|rubbish|rubbish2|", which starts with the
prefix of an address.
backportion contains "|rubbish|rubbish2|"
What I wanna do is to remove the backportion of the code from trimmed. So far I got this:
char* extractPrefix(char buf[1024]){
int count = 0;
const char *divider = "|";
char *c = buf;
char *trimmed;
char *backportionl;
while(*c){
if(strchr(divider,*c)){
count++;
if(count == 5){
++c;
trimmed = c;
//printf("Statement: %s\n",trimmed);
}
if(count == 6){
backportionl = c;
}
}
c++;
}
strncpy(trimmed,backportionl,sizeof(backportionl));
printf("Statement 2: %s\n", trimmed);
This nets me an error about backportionl being a char* instead of a char.
Is there any way I can fix this issue, or a better way to trim this char* to achieve my aim?
Here's one way that works for a list of dividers, similar to how strtok works the first time it's called:
char *extractPrefix(char *buf, const char *dividers)
{
size_t div_idx = strcspn(buf, dividers);
if (buf[div_idx] != 0)
buf[div_idx] = 0;
return buf;
}
If you don't want the original buffer modified, you can use strndup, assuming your platform supports the function (Windows doesn't; you'd need to code it yourself). Don't forget to free the pointer that is returned when you're done with it:
char *extractPrefix(const char *buf, const char *dividers)
{
size_t div_idx = strcspn(buf, dividers);
return strndup(buf, div_idx);
}
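Where strndup is unavailable, a stand-in is short to write. The sketch below uses the made-up name my_strndup so that it cannot clash with a library version:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical fallback for strndup: copy at most n characters of s
 * into freshly allocated memory and NUL-terminate the copy.
 * Returns NULL if allocation fails; the caller frees the result. */
char *my_strndup(const char *s, size_t n)
{
    size_t len = 0;
    char *copy;

    while (len < n && s[len] != '\0')   /* don't read past n or the NUL */
        len++;

    copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;

    memcpy(copy, s, len);
    copy[len] = '\0';
    return copy;
}
```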
Alternatively, you could just return the number of characters (or some value less than 0 if the number of characters in the prefix won't fit in an int):
int pfxlen(const char *buf, const char *dividers)
{
size_t div_idx = strcspn(buf, dividers);
if (div_idx > (size_t)INT_MAX)
return -1;
return (int)div_idx;
}
and use it like this:
int n;
const char *example = "127.0.0.1/8|rubbish|rubbish2|";
n = pfxlen(example, "|");
if (n >= 0)
printf("Prefix: %.*s\n", n, example);
else
fprintf(stderr, "prefix too long\n");
Obviously you have a number of options. It's really up to you which one you want to use.
Welp, this is stupid but i fixed my issue in basically one line. so here goes,
trimmed[strchr(trimmed,'|')-trimmed] = '\0';
printf("Statement 2: %s\n", trimmed);
So by getting the index of 'backportion' from the trimmed char* using strchr, i was effectively able to fix the issue.
Thanks internet, for not much.
Disclaimer: I'm not sure whether I correctly understood what you actually want to achieve. Some examples would probably be helpful.
I am trying to remove a certain part of my string [..]
I have no idea what you're trying in your code, but this is pretty easy to achieve with strstr, strlen and memmove:
First, find the position of the string you want to remove using strstr. Then copy what's behind that found string to the position where the found string starts.
char cut_out_first(char * input, char const * unwanted) {
assert(input); assert(unwanted);
char * start = strstr(input, unwanted);
if (start == NULL) {
return 0;
}
char * rest = start + strlen(unwanted);
memmove(start, rest, strlen(rest) + 1);
return 1;
}

Using Pointers and strtok()

I'm building a linked list and need your assistance please as I'm new to C.
I need to input a string that looks like this: (word)_#_(year)_#_(DEFINITION(UPPER CASE))
Ex: Enter a string
Input: invest_#_1945_#_TRADE
Basically I'm looking to build a function that scans the DEFINITION and gives me back the word it relates to.
Enter a word to search in the dictionary
Input: TRADE
Output: Found "TRADE" in the word "invest"
So far I managed to come up using the strtok() function but right now I'm not sure what to do about printing the first word then.
Here's what I could come up with:
char split(char words[99],char *p)
{
p=strtok(words, "_#_");
while (p!=NULL)
{
printf("%s\n",p);
p = strtok(NULL, "_#_");
}
return 0;
}
int main()
{
char hello[99];
char *s = NULL;
printf("Enter a string you want to split\n");
scanf("%s", hello);
split(hello,s);
return 0;
}
Any ideas on what should I do?
I reckon that your problem is how to extract the three bits of information from your formatted string.
The function strtok does not work as you think it does: The second argument is not a literal delimiting string, but a string that serves as a set of characters that are delimiters.
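To see the difference: because each character of the second argument is a delimiter on its own, a lone '_' or '#' inside a field also splits it. A small sketch (count_tokens is a made-up helper, not part of the question's code):

```c
#include <string.h>

/* Hypothetical helper: count the tokens strtok() finds in s.
 * s is modified, as with any strtok() use. */
int count_tokens(char *s, const char *delims)
{
    int count = 0;
    char *tok;

    for (tok = strtok(s, delims); tok != NULL; tok = strtok(NULL, delims))
        count++;
    return count;
}
```

With the delimiters "_#_", the input invest_#_1945_#_TRADE yields the expected three tokens, but my_word_#_1945_#_TRADE yields four, because the underscore inside my_word is treated as a delimiter too.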
In your case, sscanf seems to be the better choice:
#include <stdlib.h>
#include <stdio.h>
int main()
{
const char *line = "invest_#_1945_#_TRADE";
char word[41]; // 40 characters plus the terminating NUL
int year;
char def[41];
int n;
n = sscanf(line, "%40[^_]_#_%d_#_%40s", word, &year, def);
if (n == 3) {
printf("word: %s\n", word);
printf("year: %d\n", year);
printf("def'n: %s\n", def);
} else {
printf("Unrecognized line.\n");
}
return 0;
}
The function sscanf examines a given string according to a given pattern. Roughly, that pattern consists of format specifiers that begin with a percent sign, of spaces, which denote any amount of white-space characters (including none), and of other characters that have to be matched verbatim. The format specifiers yield a result, which has to be stored. Therefore, for each specifier, a result variable must be given after the format string.
In this case, there are several chunks:
%40[^_] reads up to 40 characters that are not the underscore into a char array. This is a special case of reading a string. Strings in sscanf are really words and may not contain white space. The underscore, however, would be part of a string, so in order not to eat up the underscore of the first delimiter, you have to use the notation [^(chars)], which means: Any sequence of chars that do not contain the given chars. (The caret does the negation here, [(chars)] would mean any sequence of the given chars.)
_#_ matches the first delimiter literally, i.e. only if the next chars are underscore, hash mark, underscore.
%d reads a decimal number into an integer. Note that the address of the integer has to be given here with &.
_#_ matches the second delimiter.
%40s reads a string of up to 40 non-whitespace characters into a char array.
The function returns the number of matched results, which should be three if the line is valid. The function sscanf can be cumbersome, but is probably your best bet here for quick and dirty input.
#include <stdio.h>
#include <string.h>
char *strtokByWord_r(char *str, const char *word, char **store){
char *p, *ret;
if(str != NULL){
*store = str;
}
if(*store == NULL) return NULL;
p = strstr(ret=*store, word);
if(p){
*p='\0';
*store = p + strlen(word);
} else {
*store = NULL;
}
return ret;
}
char *strtokByWord(char *str, const char *word){
static char *store = NULL;
return strtokByWord_r(str, word, &store);
}
int main(){
char input[]="invest_#_1945_#_TRADE";
char *array[3];
char *p;
int i, size = sizeof(array)/sizeof(char*);
for(i=0, p=input;i<size;++i){
if(NULL!=(p=strtokByWord(p, "_#_"))){
array[i]=p;//strdup(p);
p=NULL;
} else {
array[i]=NULL;
break;
}
}
for(i = 0;i<size;++i)
printf("array[%d]=\"%s\"\n", i, array[i]);
/* result
array[0]="invest"
array[1]="1945"
array[2]="TRADE"
*/
return 0;
}

Parsing text in C

I have a file like this:
...
words 13
more words 21
even more words 4
...
(General format is a string of non-digits, then a space, then any number of digits and a newline)
and I'd like to parse every line, putting the words into one field of the structure, and the number into the other. Right now I am using an ugly hack of reading the line while the chars are not numbers, then reading the rest. I believe there's a clearer way.
Edit: You can use pNum-buf to get the length of the alphabetical part of the string, and use strncpy() to copy that into another buffer. Be sure to add a '\0' to the end of the destination buffer. I would insert this code before the pNum++.
int len = pNum-buf;
strncpy(newBuf, buf, len); // copy the len characters before the last space
newBuf[len] = '\0';
You could read the entire line into a buffer and then use:
char *pNum;
if ((pNum = strrchr(buf, ' ')) != NULL) {
pNum++;
}
to get a pointer to the number field.
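Putting the pieces together, here is a sketch that splits one line in place and converts the number with strtol() so that bad input can be detected. The struct and function names are invented for the example, and the line is assumed to have had its trailing newline removed already.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: words points into the caller's line buffer. */
struct entry {
    const char *words;
    long number;
};

/* Split "some words 13" at its last space. Modifies line in place.
 * Returns 0 on success, -1 if there is no space or no valid number. */
int parse_entry(char *line, struct entry *out)
{
    char *end;
    char *sp = strrchr(line, ' ');

    if (sp == NULL)
        return -1;

    out->number = strtol(sp + 1, &end, 10);
    if (end == sp + 1 || *end != '\0')   /* no digits, or trailing junk */
        return -1;

    *sp = '\0';                          /* terminate the words part */
    out->words = line;
    return 0;
}
```

Since words points into the original buffer, copy it out before reading the next line into the same buffer.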
fscanf(file, "%s %d", word, &value);
This gets the values directly into a string and an integer, and copes with variations in whitespace and numerical formats, etc.
Edit
Ooops, I forgot that you had spaces between the words.
In that case, I'd do the following. (Note that it truncates the original text in 'line')
// Scan to find the last space in the line
char *p = line;
char *lastSpace = NULL;
while(*p != '\0')
{
if (*p == ' ')
lastSpace = p;
p++;
}
if (lastSpace == NULL)
return("parse error");
// Replace the last space in the line with a NUL
*lastSpace = '\0';
// Advance past the NUL to the first character of the number field
lastSpace++;
char *word = line; // the words now end at the NUL written above
int number = atoi(lastSpace);
You can solve this using stdlib functions, but the above is likely to be more efficient as you're only searching for the characters you are interested in.
Given the description, I think I'd use a variant of this (now tested) C99 code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>
struct word_number
{
char word[128];
long number;
};
int read_word_number(FILE *fp, struct word_number *wnp)
{
char buffer[140];
if (fgets(buffer, sizeof(buffer), fp) == 0)
return EOF;
size_t len = strlen(buffer);
if (buffer[len-1] != '\n') // Error if line too long to fit
return EOF;
buffer[--len] = '\0';
char *num = &buffer[len-1];
while (num > buffer && !isspace((unsigned char)*num))
num--;
if (num == buffer) // No space in input data
return EOF;
char *end;
wnp->number = strtol(num+1, &end, 0);
if (*end != '\0') // Invalid number as last word on line
return EOF;
*num = '\0';
if ((size_t)(num - buffer) >= sizeof(wnp->word)) // Non-number part too long
return EOF;
memcpy(wnp->word, buffer, num - buffer + 1); // +1 copies the terminating NUL
return(0);
}
int main(void)
{
struct word_number wn;
while (read_word_number(stdin, &wn) != EOF)
printf("Word <<%s>> Number %ld\n", wn.word, wn.number);
return(0);
}
You could improve the error reporting by returning different values for different problems.
You could make it work with dynamically allocated memory for the word portion of the lines.
You could make it work with longer lines than I allow.
You could scan backwards over digits instead of non-spaces - but this allows the user to write "abc 0x123" and the hex value is handled correctly.
You might prefer to ensure there are no digits in the word part; this code does not care.
You could try using strtok() to tokenize each line, and then check whether each token is a number or a word (a fairly trivial check once you have the token string - just look at the first character of the token).
Assuming that the number is immediately followed by '\n', you can read each line into a char buffer, use sscanf with "%d" to get the number (starting at the digits, since %d does not skip leading words), and then calculate how many characters that number occupies at the end of the text string.
Depending on how complex your strings become you may want to use the PCRE library. At least that way you can compile a perl'ish regular expression to split your lines. It may be overkill though.
Given the description, here's what I'd do: read each line as a single string using fgets() (making sure the target buffer is large enough), then split the line using strtok(). To determine if each token is a word or a number, I'd use strtol() to attempt the conversion and check the error condition. Example:
#include <stdlib.h>
#include <stdio.h>
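#include <string.h>
#define MAX_LINE_LENGTH 256 // assumed limits; adjust to your data
#define MAX_STR_SIZE 64
#define MAX_NUM_STRINGS 16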
/**
* Read the next line from the file, splitting the tokens into
* multiple strings and a single integer. Assumes input lines
* never exceed MAX_LINE_LENGTH and each individual string never
* exceeds MAX_STR_SIZE. Otherwise things get a little more
* interesting. Also assumes that the integer is the last
* thing on each line.
*/
int getNextLine(FILE *in, char (*strs)[MAX_STR_SIZE], int *numStrings, int *value)
{
char buffer[MAX_LINE_LENGTH];
int rval = 1;
if (fgets(buffer, sizeof buffer, in))
{
char *token = strtok(buffer, " ");
*numStrings = 0;
while (token)
{
char *chk;
*value = (int) strtol(token, &chk, 10);
if (*chk != 0 && *chk != '\n')
{
strcpy(strs[(*numStrings)++], token);
}
token = strtok(NULL, " ");
}
}
else
{
/**
* fgets() hit either EOF or error; either way return 0
*/
rval = 0;
}
return rval;
}
/**
* sample main
*/
int main(void)
{
FILE *input;
char strings[MAX_NUM_STRINGS][MAX_STR_SIZE];
int numStrings;
int value;
input = fopen("datafile.txt", "r");
if (input)
{
while (getNextLine(input, strings, &numStrings, &value))
{
/**
* Do something with strings and value here
*/
}
fclose(input);
}
return 0;
}
