I have a project in which I need to sort multiple lines of text based on the second, third, etc word in each line, not the first word. For example,
this line is first
but this line is second
finally there is this line
and you choose to sort by the second word, it would turn into
this line is first
finally there is this line
but this line is second
(since line is before there is before this)
I have a pointer to a char array that contains each line. So far what I've done is use strtok() to split each line up to the second word, but that changes the entire string to just that word and stores it in my array. My code for the tokenize bit looks like this:
for (i = 0; i < numLines; i++) {
char* token = strtok(labels[i], " ");
token = strtok(NULL, " ");
labels[i] = token;
}
This would give me the second word in each line, since I called strtok twice. Then I sort those words (line, this, there). However, I need to put the string back together in its original form. I'm aware that strtok turns the delimiters into '\0', but I've yet to find a way to get the original string back.
I'm sure the answer lies in using pointers, but I'm confused what exactly I need to do next.
I should mention I'm reading in the lines from an input file as shown:
for (i = 0; i < numLines && fgets(buffer, sizeof(buffer), fp) != 0; i++) {
labels[i] = strdup(buffer);
Edit: my find_offset method
size_t find_offset(const char *s, int n) {
size_t len;
while (n > 0) {
len = strspn(s, " ");
s += len;
}
return len;
}
Edit 2: The relevant code used to sort
//Getting the line and offset
for (i = 0; i < numLines && fgets(buffer, sizeof(buffer), fp) != 0; i++) {
labels[i].line = strdup(buffer);
labels[i].offset = find_offset(labels[i].line, nth);
}
int n = sizeof(labels) / sizeof(labels[0]);
qsort(labels, n, sizeof(*labels), myCompare);
for (i = 0; i < numLines; i++)
printf("%d: %s", i, labels[i].line); //Print the sorted lines
int myCompare(const void* a, const void* b) { //Compare function
xline *xlineA = (xline *)a;
xline *xlineB = (xline *)b;
return strcmp(xlineA->line + xlineA->offset, xlineB->line + xlineB->offset);
}
Perhaps rather than mess with strtok(), use strspn(), strcspn() to parse the string for tokens. Then the original string can even be const.
#include <stdio.h>
#include <string.h>
int main(void) {
const char str[] = "this line is first";
const char *s = str;
while (*(s += strspn(s, " ")) != '\0') {
size_t len = strcspn(s, " ");
// Instead of printing, use the nth parsed token for key sorting
printf("<%.*s>\n", (int) len, s);
s += len;
}
}
Output
<this>
<line>
<is>
<first>
Or
Do not sort lines.
Sort structures
typedef struct {
char *line;
size_t offset;
} xline;
Pseudo code
int fcmp(a, b) {
return strcmp(a->line + a->offset, b->line + b->offset);
}
size_t find_offset_of_nth_word(const char *s, n) {
while (n > 0) {
use strspn(), strcspn() like above
}
}
main() {
int nth = ...;
xline labels[numLines];
for (i = 0; i < numLines && fgets(buffer, sizeof(buffer), fp) != 0; i++) {
labels[i].line = strdup(buffer);
labels[i].offset = find_offset_of_nth_word(labels[i].line, nth);
}
qsort(labels, i, sizeof *labels, fcmp);
}
Or
After reading each line, find the nth token with strspn(), strcspn() and then re-form the line from "aaa bbb ccc ddd \n" to "ccc ddd \naaa bbb ", sort, and later restore the original word order of each line.
In all cases, do not use strtok(): too much information is lost.
I need to put the string back together in its original form. I'm aware that strtok turns the delimiters into '\0', but I've yet to find a way to get the original string back.
Far better would be to avoid damaging the original strings in the first place if you want to keep them, and especially to avoid losing the pointers to them. Provided that it is safe to assume that there are at least three words in each line and that the second is separated from the first and third by exactly one space on each side, you could undo strtok()'s replacement of delimiters with string terminators. However, there is no safe or reliable way to recover the start of the overall string once you lose it.
I suggest creating an auxiliary array in which you record information about the second word of each sentence -- obtained without damaging the original sentences -- and then co-sorting the auxiliary array and sentence array. The information to be recorded in the aux array could be a copy of the second word of each sentence, its offset and length, or something similar.
I'm working on a function that reads a file line by line, removes anything that isn't hex from each line, and returns the stripped line. As I iterate through each line of the file I see the expected values, and the function only grabs the hex values, which is exactly what I want.
However, when I print strippedLine in main, I get some unexpected characters at the beginning, and the last byte of data is missing. (I'm very new to C and memory management.)
Here's code from my main:
char currentLine[100];
char *strippedLine = NULL;
FILE *file = fopen("filename.txt", "r");
// Works as expected getting each line from file.
while(fgets(currentLine, sizeof(currentLine), file) != NULL)
{
// Change from original question
//strippedLine = (char *)malloc(1 + strlen(currentLine));
strippedLine = malloc(1 + strlen(currentLine));
// Change from original question
//strcpy(strippedLine, StripNonHex(currentLine));
strippedLine = StripNonHex(currentLine);
printf("%s", strippedLine);
}
Here's the function that I'd like to return a char array with everything that isn't hex stripped out:
char *StripNonHex(char *line)
{
char *nl = NULL;
char *token = NULL;
int convs = 0;
unsigned ch = '\0';
int hexLine = 0;
char *strippedLine = (char *) malloc(sizeof(char) * 256);
int counter = 0;
// Remove new-line char
if (nl)
{
*nl = '\0';
}
// Split each line into space-delimited tokens
token = strtok(line, " ");
convs = sscanf(token, "%x", &ch);
// Works as expected seeing each space delimited hex value of file.
while(token)
{
convs = sscanf(token, "%x", &ch);
if (convs == 1 && strlen(token) == 2)
{
hexLine = 1;
strippedLine[counter] = token;
}
counter += strlen(token);
token = strtok(NULL, " ");
// Removed from original question
//counter++;
}
// Removed from original question
//strippedLine[counter + 1] = '\0';
return strippedLine;
}
Sorry about missing the input and output. Here it is below
Input
A5 12 00 24 00 01 22 00 3F 11
Output
≡¡║A5120024000122003F
This output is much closer thanks to the recommendations from @Barmar. It's just missing the last byte, and has some unexpected characters at the beginning.
I've updated my question to make more sense with what I have now.
Moving counter += strlen(token); inside of the if statement was my final issue. I was inadvertently moving my pointer whether I found what I was looking for or not.
You're not copying token into strippedLine correctly.
strippedLine[counter] = token;
converts the pointer in token to a char, which is an implementation-defined conversion, most likely just taking the low-order 8 bits of the address, and stores that converted value into strippedLine[counter]. The correct way is:
strcpy(&strippedLine[counter], token);
Then you need to increment counter by the length of the token:
counter += strlen(token);
If you want to scan a hex value into unsigned char, you need to use:
sscanf(token, "%hhx", &ch);
You don't need strippedLine[counter + 1] = '\0'; at the end, since strcpy() copies the null terminator.
Since StripNonHex() returns a newly allocated string, there's no need to use strcpy() on the result. Just assign it directly to strippedLine instead of allocating another string.
char *strippedLine = StripNonHex(currentLine);
Then you'll be able to use free(strippedLine) when you're done with the line.
StripNonHex() should use strlen() when allocating its strippedLine variable, rather than hard-coding the size 256.
char *strippedLine = malloc(strlen(line) + 1);
So I'm getting a file with strings, and I want to tokenize each string whenever I come to a whitespace/newline. I am able to get the tokens separated into delimited strings, but I'm not able to copy them into an array.
int lexer(FILE* file){
char line[50];
char* delim;
int i = 0;
int* intptr = &i;
while(fgets(line,sizeof(line),file)){
printf("%s\n", line);
if(is_empty(line) == 1)
continue;
delim = strtok(line," ");
if(delim == NULL)
printf("%s\n", "ERROR");
while(delim != NULL){
if(delim[0] == '\n'){
//rintf("%s\n", "olala");
break;
}
tokenArray[*intptr] = delim;
printf("Token IN array: %s\n", tokenArray[*intptr]);
*intptr = *intptr + 1;
delim = strtok(NULL, " ");
}
If I run this, I get the output:
Token IN array: 012
Token IN array: 23ddd
Token IN array: vs32
Token IN array: ,344
Token IN array: 0sdf
which is correct according to my text file, but when I try to reprint the array at a later time in the same function, outside the loops:
*intptr = *intptr + 1;
delim = strtok(NULL, " ");
}
}
printf("%s\n", tokenArray[3]);
fclose(file);
return 0;
I don't get any output. I tried writing all the contents of the array to a text file and got gibberish. I don't know what to do, please help.
First, your pointer to i is useless. Why not use i directly?
I'll assume that from now on.
Then, the real problem: you have to allocate a copy of each string that strtok returns, because strtok does not allocate the tokens for you. It returns pointers into line, and the next fgets() overwrites that buffer, so by the time you print the array the stored pointers refer to whatever the last read left there.
Something like this would help:
tokenArray[i] = strdup(delim);
(instead of tokenArray[*intptr] = delim;). Note that I have replaced the index with i; just do i++ afterwards.
BTW, I wouldn't recommend using strtok for anything other than quick hacks. This function keeps internal state, so if several parts of your program use it at the same time, they can conflict (I made that mistake a long time ago). Check the manual for strtok_r in that case (r for reentrant).
tokenArray[*intptr] = delim;
In this line, delim is a pointer into the line buffer, whose content changes on every iteration of the outer while loop (each fgets() call overwrites it). So the content that delim points to should be copied, and the copy stored in tokenArray[*intptr], that is:
tokenArray[*intptr] = strdup(delim);
I'm trying to split up a string (typed in by the user at run time) into words (separated by spaces), and put each word into a different slot into an array. So, for example, if I took the string "hello world", array[0] would contain "hello" and array[1] would contain "world". And the last slot (in this case array[2]) would contain NULL. Here's what I have so far, which doesn't seem to be working properly. Any help would be appreciated. (By the way, this is part of a program which will call execvp(argv[0],argv); )
char input[100];
char* argv[20];
char* token;
scanf("%s", input);
//get the first token
token = strtok(input, " ");
int i=0;
//walk through other tokens
while( token != NULL ) {
argv[i] = token;
i++;
token = strtok(NULL, " ");
}
argv[i] = NULL; //argv ends with NULL
You need to allocate memory for each argv[i] and copy the current token to argv[i]:
token = strtok(input, " ");
int i=0;
//walk through other tokens
while( token != NULL ) {
argv[i] = malloc(strlen(token) + 1);
strcpy(argv[i], token); /* also copies the terminating '\0' */
//argv[i] = token;
i++;
token = strtok(NULL, " ");
}
argv[i] = NULL; //argv ends with NULL
I have created an example of what I think you want. I have used one malloc(3) for the whole line of strings and another for the array of pointers you will get from the function.
Also, the second parameter of strtok(3) is passed through to give more flexibility (the shell normally uses the contents of the IFS environment variable to separate arguments, so you can use the same algorithm as the shell does). I think you should use " \n\t" at least. It has a main() test function, so it's complete for your purpose.
#include <assert.h> /* man assert(3) */
#include <stdlib.h> /* malloc lives here */
#include <string.h> /* strtok, strdup lives here */
#include <stdio.h> /* printf lives here */
char **split(const char *str, const char *delim)
{
char *aux;
char *p;
char **res;
char *argv[200]; /* place for 200 words. */
int n = 0, i;
aux = strdup(str); /* assign outside assert(): NDEBUG compiles assert() away */
assert(aux != NULL);
for (p = strtok(aux, delim); p; p = strtok(NULL, delim))
argv[n++] = p;
argv[n++] = NULL;
/* I'll put the strdup()ed string one place past the NULL,
* so you can free(3) it once finished */
argv[n++] = aux;
/* now, we need to copy the array, so we can use it outside
* this function. */
res = calloc(n, sizeof (char *));
assert(res != NULL);
for (i = 0; i < n; i++)
res[i] = argv[i];
return res;
} /* split */
int main()
{
char **argv =
split("Put each word of a string into array in C", " ");
int i;
for (i = 0; argv[i]; i++)
printf("[%s]", argv[i]);
puts(""); /* to end with a newline */
free(argv[i+1]);
free(argv);
} /* main */
The sample code just outputs:
$ pru
[Put][each][word][of][a][string][into][array][in][C]
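I think I just figured out my problem: scanf("%s", input) only reads the first word, up to the first space, while I want a string containing multiple words separated by spaces. I need to read the whole line instead, for example with fgets() (gets() would also read a whole line, but it cannot limit the input length and was removed from the language in C11).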
Please explain to me the working of strtok() function. The manual says it breaks the string into tokens. I am unable to understand from the manual what it actually does.
I added watches on str and *pch to check how it works. When the first while loop iteration ran, the contents of str were only "this". How was the output shown below printed on the screen?
/* strtok example */
#include <stdio.h>
#include <string.h>
int main ()
{
char str[] ="- This, a sample string.";
char * pch;
printf ("Splitting string \"%s\" into tokens:\n",str);
pch = strtok (str," ,.-");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, " ,.-");
}
return 0;
}
Output:
Splitting string "- This, a sample string." into tokens:
This
a
sample
string
The strtok runtime function works like this.
The first time you call strtok, you provide the string that you want to tokenize:
char s[] = "this is a string";
In the above string, space seems to be a good delimiter between words, so let's use that:
char* p = strtok(s, " ");
What happens now is that 's' is searched until the space character is found; the space is overwritten with '\0', the first token ('this') is returned, and p points to that token (string).
To get the next token and continue with the same string, NULL is passed as the first argument, since strtok maintains a static pointer to the string you passed previously:
p = strtok(NULL," ");
p now points to 'is',
and so on until no more spaces can be found; then the rest of the string is returned as the last token, 'string'.
More conveniently, you could write it like this instead to print out all tokens:
for (char *p = strtok(s," "); p != NULL; p = strtok(NULL, " "))
{
puts(p);
}
EDIT:
If you want to keep the values returned by strtok, copy each token to another buffer, e.g. with strdup(p): the returned pointers point into the original string, which strtok modifies between iterations and which you may later overwrite or free.
strtok() divides the string into tokens: it skips any leading delimiter characters, and the token then runs up to the next delimiter. In your case, the leading "-" and " " are skipped, so the first token starts at "T" and ends at ",", which gives "This" as output. Similarly, the rest of the string gets split into tokens at each run of delimiters, and the last token ends at ".".
strtok maintains a static, internal reference pointing to the next available token in the string; if you pass it a NULL pointer, it will work from that internal reference.
This is the reason strtok isn't re-entrant; as soon as you pass it a new pointer, that old internal reference gets clobbered.
strtok doesn't change the parameter itself (str). It stores that pointer (in a local static variable). It can then change what that parameter points to in subsequent calls without having the parameter passed back. (And it can advance that pointer it has kept however it needs to perform its operations.)
From the POSIX strtok page:
This function uses static storage to keep track of the current string position between calls.
There is a thread-safe variant (strtok_r) that doesn't do this type of magic.
strtok will tokenize a string i.e. convert it into a series of substrings.
It does that by searching for delimiters that separate these tokens (or substrings). And you specify the delimiters. In your case, you want ' ' or ',' or '.' or '-' to be the delimiter.
The programming model to extract these tokens is that you hand strtok your main string and the set of delimiters. Then you call it repeatedly, and each time strtok returns the next token it finds, until it reaches the end of the main string, when it returns a null pointer. Another rule is that you pass the string in only the first time, and NULL for the subsequent times. This is how you tell strtok whether you are starting a new tokenizing session with a new string or retrieving tokens from the previous session. Note that strtok remembers its state for the tokenizing session, and for this reason it is not reentrant or thread safe (use strtok_r instead if that matters). Another thing to know is that it actually modifies the original string: it writes '\0' over the delimiters that it finds.
One way to invoke strtok, succinctly, is as follows:
char str[] = "this, is the string - I want to parse";
char delim[] = " ,-";
char* token;
for (token = strtok(str, delim); token; token = strtok(NULL, delim))
{
printf("token=%s\n", token);
}
Result:
token=this
token=is
token=the
token=string
token=I
token=want
token=to
token=parse
The first time you call it, you provide the string to tokenize to strtok. And then, to get the following tokens, you just give NULL to that function, as long as it returns a non NULL pointer.
The strtok function records the string you first provided when you call it. (Which is really dangerous for multi-thread applications)
strtok modifies its input string. It places null characters ('\0') in it so that it will return bits of the original string as tokens. In fact strtok does not allocate memory. You may understand it better if you draw the string as a sequence of boxes.
To understand how strtok() works, one first needs to know what a static variable is: a variable that keeps its value between calls to the function that declares it.
The key to the operation of strtok() is preserving the location of the last separator between successive calls (that's why strtok() continues to parse the original string passed to it when it is invoked with a null pointer in subsequent calls).
Have a look at my own strtok() implementation, called zStrtok(), which has slightly different functionality from the one provided by strtok():
char *zStrtok(char *str, const char *delim) {
static char *static_str=0; /* var to store last address */
int index=0, strlength=0; /* integers for indexes */
int found = 0; /* check if delim is found */
/* delimiter cannot be NULL
* if no more char left, return NULL as well
*/
if (delim==0 || (str == 0 && static_str == 0))
return 0;
if (str == 0)
str = static_str;
/* get length of string */
while(str[strlength])
strlength++;
/* find the first occurrence of delim */
for (index=0;index<strlength;index++)
if (str[index]==delim[0]) {
found=1;
break;
}
/* if delim is not contained in str, return str */
if (!found) {
static_str = 0;
return str;
}
/* check for consecutive delimiters
*if first char is delim, return delim
*/
if (str[0]==delim[0]) {
static_str = (str + 1);
return (char *)delim;
}
/* terminate the string
* this assignment requires char[], so str has to
* be char[] rather than char *
*/
str[index] = '\0';
/* save the rest of the string */
if ((str + index + 1)!=0)
static_str = (str + index + 1);
else
static_str = 0;
return str;
}
And here is an example usage
Example Usage
char str[] = "A,B,,,C";
printf("1 %s\n",zStrtok(str,","));
printf("2 %s\n",zStrtok(NULL,","));
printf("3 %s\n",zStrtok(NULL,","));
printf("4 %s\n",zStrtok(NULL,","));
printf("5 %s\n",zStrtok(NULL,","));
printf("6 %s\n",zStrtok(NULL,","));
Example Output
1 A
2 B
3 ,
4 ,
5 C
6 (null)
The code is from a string processing library I maintain on Github, called zString. Have a look at the code, or even contribute :)
https://github.com/fnoyanisi/zString
This is how I implemented strtok. It's not that great, but after working on it for 2 hours I finally got it working. It does support multiple delimiters.
#include "stdafx.h"
#include <iostream>
using namespace std;
char* mystrtok(char str[],char filter[])
{
if(filter == NULL) {
return str;
}
static char *ptr = str;
static int flag = 0;
if(flag == 1) {
return NULL;
}
char* ptrReturn = ptr;
for(int j = 0; ptr != '\0'; j++) {
for(int i=0 ; filter[i] != '\0' ; i++) {
if(ptr[j] == '\0') {
flag = 1;
return ptrReturn;
}
if( ptr[j] == filter[i]) {
ptr[j] = '\0';
ptr+=j+1;
return ptrReturn;
}
}
}
return NULL;
}
int _tmain(int argc, _TCHAR* argv[])
{
char str[200] = "This,is my,string.test";
char *ppt = mystrtok(str,", .");
while(ppt != NULL ) {
cout<< ppt << endl;
ppt = mystrtok(NULL,", .");
}
return 0;
}
For those who are still having hard time understanding this strtok() function, take a look at this pythontutor example, it is a great tool to visualize your C (or C++, Python ...) code.
In case the link got broken, paste in:
#include <stdio.h>
#include <string.h>
int main()
{
char s[] = "Hello, my name is? Matthew! Hey.";
char* p;
for (char *p = strtok(s," ,?!."); p != NULL; p = strtok(NULL, " ,?!.")) {
puts(p);
}
return 0;
}
Credits go to Anders K.
Here is my implementation, which uses a hash table for the delimiters, which means it is O(n) instead of O(n^2):
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#define DICT_LEN 256
int *create_delim_dict(char *delim)
{
int *d = (int*)malloc(sizeof(int)*DICT_LEN);
memset((void*)d, 0, sizeof(int)*DICT_LEN);
int i;
for(i=0; i< strlen(delim); i++) {
d[(unsigned char)delim[i]] = 1; /* plain char may be negative */
}
return d;
}
char *my_strtok(char *str, char *delim)
{
static char *last, *to_free;
int *deli_dict = create_delim_dict(delim);
if(!deli_dict) {
/*this check if we allocate and fail the second time with entering this function */
if(to_free) {
free(to_free);
}
return NULL;
}
if(str) {
last = (char*)malloc(strlen(str)+1);
if(!last) {
free(deli_dict);
return NULL;
}
to_free = last;
strcpy(last, str);
}
while(deli_dict[(unsigned char)*last] && *last != '\0') {
last++;
}
str = last;
if(*last == '\0') {
free(deli_dict);
free(to_free);
deli_dict = NULL;
to_free = NULL;
return NULL;
}
while (*last != '\0' && !deli_dict[(unsigned char)*last]) {
last++;
}
*last = '\0';
last++;
free(deli_dict);
return str;
}
int main()
{
char * str = "- This, a sample string.";
char *del = " ,.-";
char *s = my_strtok(str, del);
while(s) {
printf("%s\n", s);
s = my_strtok(NULL, del);
}
return 0;
}
strtok() stores, in a static variable, a pointer to where it left off last time, so on the second call, when we pass NULL, strtok() gets the pointer from that static variable.
If you pass the same string again, it starts again from the beginning.
Moreover, strtok() is destructive, i.e. it makes changes to the original string, so make sure you always keep a copy of the original.
One more problem with strtok() is that, because it stores the address in a static variable, calling it from more than one thread at the same time corrupts that shared state. Use strtok_r() for this.
strtok replaces each delimiter it finds (the characters listed in the second argument) with a null character ('\0'), and a null character also marks the end of a string.
http://www.cplusplus.com/reference/clibrary/cstring/strtok/
You can scan the char array looking for the delimiter: if you find it, print a newline; otherwise print the char.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char *s;
s = malloc(1024 * sizeof(char));
scanf("%[^\n]", s);
s = realloc(s, strlen(s) + 1);
int len = strlen(s);
char delim =' ';
for(int i = 0; i < len; i++) {
if(s[i] == delim) {
printf("\n");
}
else {
printf("%c", s[i]);
}
}
free(s);
return 0;
}
So, this is a code snippet to help better understand this topic.
Printing Tokens
Task: Given a sentence, s, print each word of the sentence in a new line.
char *s;
s = malloc(1024 * sizeof(char));
scanf("%[^\n]", s);
s = realloc(s, strlen(s) + 1);
//logic to print the tokens of the sentence.
for (char *p = strtok(s," "); p != NULL; p = strtok(NULL, " "))
{
printf("%s\n",p);
}
Input: How is that
Result:
How
is
that
Explanation: here the strtok() function is called in a for loop to print the tokens on separate lines.
The function takes a string and a set of delimiters ('break-points') as parameters, and breaks the string into tokens at those delimiters. Each token is stored in p in turn and then printed.
strtok replaces each delimiter with a '\0' (null) character in the given string.
CODE
#include<iostream>
#include<cstring>
int main()
{
char s[]="30/4/2021";
std::cout<<(void*)s<<"\n"; // 0x70fdf0
char *p1=s; // point at s directly; a hard-coded address is not portable
std::cout<<p1<<"\n";
char *p2=strtok(s,"/");
std::cout<<(void*)p2<<"\n";
std::cout<<p2<<"\n";
char *p3=s; // the same address p1 started with
std::cout<<p3<<"\n";
std::cout<<p3<<"\n";
for(int i=0;i<=9;i++)
{
std::cout<<*p1;
p1++;
}
}
OUTPUT
0x70fdf0 // 1. address of string s
30/4/2021 // 2. print string s through ptr p1
0x70fdf0 // 3. this address is return by strtok to ptr p2
30 // 4. print string which pointed by p2
30 // 5. again assign address of string s to ptr p3 try to print string
30 4/2021 // 6. print characters of string s one by one using loop
Before tokenizing the string
I assigned the address of string s to a pointer (p1) and printed the string through it; the whole string is printed.
After tokenizing
strtok returns the address of string s to the pointer p2, but printing through p2 gives only "30", not the whole string. So strtok does not just return an address; it also places a '\0' character where the delimiter was.
Cross-check
1.
I again assign the address of string s to a pointer (p3) and print the string; it prints "30", because tokenizing overwrote the delimiter with '\0'.
2.
Printing string s character by character in the loop shows that the first delimiter was replaced by '\0', which appears as a blank space rather than '/'.