Mystery about strtok() function [duplicate] - c

This question already has answers here:
How does strtok() split the string into tokens in C?
(16 answers)
Closed 6 years ago.
Sorry for what is probably a stupid question, but after reading a considerable number of examples I still don't understand how strtok() works.
Here is an example:
char s[] = "   1 2 3"; // 3 spaces before 1
int count = 0;
char* token = strtok(s, " ");
while (token != NULL) {
    count++;
    token = strtok(NULL, " ");
}
After execution, count equals 3. Why?
Please explain, with detailed steps, what happens inside each call to that function.

Because:
http://man7.org/linux/man-pages/man3/strtok.3.html
From the above description, it follows that a sequence of two or more
contiguous delimiter bytes in the parsed string is considered to be a
single delimiter, and that delimiter bytes at the start or end of the
string are ignored.
You could also have printed the consecutive tokens.
Output for me:
1
2
3

It is C, but the C++ reference has a nice example: http://www.cplusplus.com/reference/cstring/strtok/
Once initialized, char s[] is an array holding the characters "   1 2 3" (three spaces, then 1 2 3) followed by a terminating '\0'.
The first call to strtok, with a single space as the delimiter, skips the leading spaces and returns a pointer to the first token, 1. The space that follows the token is overwritten with '\0', so the returned pointer reads as just 1. Each further call, with NULL as the first argument, resumes after the previous token and returns a pointer to the next one.
Note: think of strtok() as a tokenizer function, iterating forward over the valid tokens. Printing out the current value of the token pointer (pch in the linked example, token in the question) may also help to understand it better, or use a debugger.

Related

How do I calculate the number of chars in a string (if not all the space is taken) [duplicate]

This question already has answers here:
Getting wrong string length
(3 answers)
Closed 4 years ago.
I have this piece of code:
char* input = malloc(sizeof(char)*100);
scanf("%s", input); //let's say that the number of chars in "%s" is 5
How do I calculate how many chars I typed in (5)? I tried by playing around with sizeof(), but couldn't find a solution.
Edit (better explanation): the input variable can host up to 100 chars, but let's say I type in the terminal 'abcde': then it hosts only 5 chars, the other 95 are not taken. I want to calculate that '5'.
You have to find the null terminator.
int i = 0;
while (input[i] != 0) {
    ++i;
}
// i marks the spot: it is the index of the terminator, which equals the number of chars typed
That said, strlen() does the same job and is usually the better choice: library implementations are typically optimized, for example by comparing a whole machine word (16, 32, or 64 bits) at a time instead of a single byte.

array indexing with signed number in C [duplicate]

This question already has answers here:
Are negative array indexes allowed in C?
(9 answers)
Closed 5 years ago.
I just want to know whether this is good programming style.
I know what is happening in this piece of code: it looks for the first occurrence of href and saves it in next_next, then looks for the first occurrence of "}" and saves it in end_marker.
My question is: is end_marker[-1] = '\0'; needed? Upon successful completion, strstr() returns a pointer to the located string, or a null pointer if the string is not found.
I know the end marker '\0' terminates a string, but I don't know whether it is good to index an array with a negative number.
Code:
char *end_marker;
char *next_next = strstr(links_ptr, "href");
if (next_next != NULL) {
    next_next += 7;
    end_marker = strstr(next_next, "}");
    end_marker[-1] = '\0'; // :)
}
EDIT: links_ptr contains this data
"links": [
{
"rel": "next",
"href": "https://www.randomstuff.com/blabla"
}
]
This usage of strstr assumes much about the input. Given input it doesn't expect, it can scan memory out of the string bounds, write to bad addresses, or try to dereference a null pointer.
If links_ptr is anything other than that fixed data (for example, user input or data downloaded from the internet), then it's a definite bug and security issue:
next_next += 7 assumes that strlen(next_next) >= 7. If the string is shorter, you'll be scanning memory that doesn't belong to the string until the first '\0' or '}' is found.
If that out-of-bounds scan finds '}', it will write '\0' to an unrelated address.
If '}' isn't found, end_marker will be NULL, and end_marker[-1] is undefined behavior that will most likely crash.
In C/C++, there's nothing evil in using a negative array index. In this way you are addressing the slot BEFORE the pointer represented by end_marker. However, you need to ensure that there's valid memory at this address.
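In this case it would be undefined behaviour whenever end_marker is NULL, so you should check it first:
if (end_marker != NULL)
{
    // Clears the last character of the string that starts at end_marker,
    // which is not the same character as end_marker[-1].
    end_marker[strlen(end_marker) - 1] = '\0';
}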
Using a negative index isn't good practice, and you shouldn't do it.
If you do, you have to be sure that the index still falls inside the array.

I have a filename ABCD_81018293.txt. I try sscanf to get ABCD as text and the digits as an int [duplicate]

This question already has answers here:
Crash or "segmentation fault" when data is copied/scanned/read to an uninitialized pointer
(5 answers)
Closed 6 years ago.
One of the many forms I have tried is
char *text, ebuf[32];
int *num;
strncpy (ebuf,tfp->d_name,strlen(tfp->d_name));
sscanf (ebuf, "%s_%i.txt", text, num);
I originally started with tfp->d_name in place of the ebuf hack, with the same result: segmentation faults. I understood from the man page that perhaps the %s should be %4c (the text string has 4 characters, but not always). Anyway, %4c or 4%c didn't make any difference.
Thanks to all who responded. Your clarifications - especially WhozCraig's "better man page" - put me on the right track.
The initialization issue was not my major problem, it was the syntax of the sscanf. For my program, once initialization was out of the way, the single line I really needed was:
sscanf (tfp->d_name, "%[^_]_%i", &text[0], &num);
Your pointers are not initialized, so they do not point to any valid memory.
You can allocate space for them using malloc, like:
char *text = malloc(max_len);
int *num = malloc(sizeof(int));
// ....
free(text);
free(num);
As you can see, memory allocated with malloc must be released with free.
Alternatively, you can use a plain int variable and an array to store your C string:
char text[max_len];
int num;
// ...
sscanf (ebuf, "%s_%i.txt", text, &num);
Take a look at this SO question to understand how to treat '_' as a string delimiter: %s reads up to the next whitespace, so it would consume the whole filename, underscore included.
So, as BLUEPIXY commented:
sscanf(ebuf, "%[^_]_%i.", text, &num);
One last thing: you can avoid the strncpy and call sscanf directly on tfp->d_name:
sscanf(tfp->d_name, "%[^_]_%i.", text, &num);
This also avoids the "explosion" of sscanf caused by the ebuf string not being null-terminated. If you keep the strncpy, use strlen(tfp->d_name)+1 as its third parameter so that the null terminator is copied to ebuf too (and make sure that count does not exceed sizeof ebuf).
As strlen Man says:
DESCRIPTION
The strlen() function calculates the length of the string pointed to
by s, excluding the terminating null byte ('\0').
emphasis mine

Separating a single string into two different strings in C [duplicate]

This question already has answers here:
Split string with delimiters in C
(25 answers)
Closed 9 years ago.
I take user input in char name[20] using fgets like this:
fgets(name,20,stdin);
The user enters two strings separated by white space like John Smith. What if I wanted to use John and Smith in two strings like char name[20] , char surname[20] or just compare John and Smith using strcmp?
I tried a lot, but I did not find any way to do this.
What are some ways to fix this kind of problem?
You need to learn the function char *strtok(char *restrict newstring, const char *restrict delimiters), which C uses to split a string into tokens separated by a set of delimiters.
Your input string John Smith is separated by a space (' ') character. You need to write code something like the following:
char *token;
token = strtok(name, " \n"); // first name
if (token != NULL)
    strcpy(fname, token);
token = strtok(NULL, " \n"); // second name; "\n" drops the newline kept by fgets
if (token != NULL)
    strcpy(lname, token);
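You will need to search for the blank within the string yourself; look up the strchr function for that. Then use strncpy to copy the two parts into two different strings. Note that strncpy does not add a terminator when it copies only part of a string, so terminate each part yourself.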
Use the strtok function to split strings.

strtok into character arrays not working as expected

I am quite new to C programming and am currently struggling with strtok. I want to split a string into two strings using the following code (the string is e.g. "Bat1:185", the delimiter is the colon):
char batName[13];
char batVoltage[3];
char *result = NULL;
result = strtok(pStringToSplit, pDelimiter);
strcpy(batName, result);
result = strtok(NULL, pDelimiter);
strcpy(batVoltage, result);
After the first strtok call batName contains the value ("Bat1") as expected, but after the second strtok batName is empty, batVoltage contains the correct value "185".
I know that this code is very weak, but currently I am just trying to understand the basics of strtok. I have already spent a lot of time looking for a solution to this, but could not find any.
Thanks a lot for any hints
Peter
The value 185 is terminated by a nul (\0) character and really takes 4 characters of space. Your nul is written outside the buffer, seemingly overwriting the first character in batName with a string terminator.
Strictly, overwriting a buffer is undefined behavior, so this code could really behave in any way, depending on the compiler.
You have a buffer overflow in your code. Remember that the destination string arrays need an extra character for the terminating '\0' character. That means that when you copy the second sub-string you write past the end of the batVoltage array and apparently into the batName array.
Increase the size of the batVoltage array by one and it should work fine.
As pointed out by Joachim, you have a buffer overflow for the voltage: it has three characters, 185, and the destination needs one extra byte for the NUL terminator (\0).
So,
#include <stdio.h>
#include <string.h>

int main(void)
{
    char batName[13];       // battery identifier: at most 12 characters, plus 1 for the NUL terminator
    char batVoltage[3 + 1]; // maximum number of voltage digits, plus 1 for the NUL terminator
    char *result = NULL;
    char pStringToSplit[] = "Bat1:185";
    const char *pDelimiter = ":";

    result = strtok(pStringToSplit, pDelimiter);
    strcpy(batName, result);
    result = strtok(NULL, pDelimiter);
    strcpy(batVoltage, result);
    printf("Battery :%s Voltage:%s\n", batName, batVoltage);
    return 0;
}
As Joachim Pileborg and Joachim Isaksson answered, you will hit a buffer overflow here. Going forward, once you fix this issue, and if your string looks like Bat1:185:Bat2:186:Bat3:187, you can use the position of each token to tell names from voltages while tokenizing the string.
After the first strtok call batName contains the value ("Bat1") as expected, but after the second strtok batName is empty, batVoltage contains the correct value "185".
Here you can easily check that every odd-numbered call returns Bat* and every even-numbered call returns 18*.
I hope this will also help you.

Resources