I have run into a problem with strcat. I grow the destination buffer with realloc, but strcat appears to overwrite the destination string instead of appending to it.
char *splitStr(char *line) {
char *str_;
str_ = (char *) malloc(1);
char *ptr = strtok(line,"\n");
int a;
while (ptr != NULL) {
if (ptr[0] != '$') {
printf("oncesi %s\n", str_);
a = strlen(ptr) + strlen(str_) + 1;
str_ = realloc(str_, a);
strcat(str_, ptr);
str_[a] = '\0';
printf("sontasi:%s\n", str_);
}
ptr = strtok(NULL, "\n");
}
printf("splitStr %d\n", strlen(str_));
printf("%s", str_);
return str_;
}
My input is:
*4
$3
200
$4
4814
$7
SUCCESS
$4
3204
So I want to split this input with strtok:
strtok(line, "\n");
and concatenate every line that does not start with the '$' character into a new string. However, this code gives the following output:
line: *4
oncesi
sontasi:*4
oncesi *4
200tasi:*4
200esi *4
4814asi:*4
4814si *4
SUCCESS:*4
SUCCESS*4
3204ESS:*4
splitStr 25
It seems to overwrite the string rather than append to it.
Do you have any idea why this could be happening?
the following proposed code:
cleanly compiles
performs the indicated functionality
is slightly reformatted for readability of the output
checks for errors from malloc() and realloc()
shows how to initialize the str[] array, which is the problem in the OP's posted code
uses %zu to print the result of strlen(), because strlen() returns a size_t, not an int
does not use trailing underscores on variable names
and now, the proposed code:
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
char *splitStr( char *line )
{
printf("original line: %s\n", line);
char *str = malloc(1);
if( !str )
{
perror( "malloc failed" );
exit( EXIT_FAILURE );
}
str[0] = '\0'; // critical statement
char *token = strtok(line,"\n");
while( token )
{
if( token[0] != '$')
{
char* temp = realloc( str, strlen( token ) + strlen( str ) + 1 );
if( ! temp )
{
perror( "realloc failed" );
free( str );
exit( EXIT_FAILURE );
}
str = temp; // update pointer
strcat(str, token);
printf( "concat result: %s\n", str );
}
token = strtok(NULL, "\n");
}
printf("splitStr %zu\n", strlen(str));
return str;
}
int main( void )
{
char firstStr[] = "$abcd\n$defg\nhijk\n";
char *firstNewStr = splitStr( firstStr );
printf( "returned: %s\n\n\n\n", firstNewStr );
free( firstNewStr );
char secondStr[] = "abcd\ndefg\nhijk\n";
char *secondNewStr = splitStr( secondStr );
printf( "returned: %s\n\n\n\n", secondNewStr );
free( secondNewStr );
}
a run of the proposed code results in:
original line: $abcd
$defg
hijk
concat result: hijk
splitStr 4
returned: hijk
original line: abcd
defg
hijk
concat result: abcd
concat result: abcddefg
concat result: abcddefghijk
splitStr 12
returned: abcddefghijk
Your input contains Windows/DOS end-of-line codings "\r\n".
Since strtok() just replaces '\n' with '\0', the '\r' stays in the string. On output it moves the cursor to the left and additional characters overwrite old characters, at least visually.
Your concatenated string should be OK, however. Count the characters, and don't forget to include a '\r' for each line: "*4\r200\r4814\rSUCCESS\r3204\r" are 25 characters as the output splitStr 25 shows.
Additional notes:
As others already said, str_ = (char *) malloc(1); does not initialize the space str_ points to. You need to do this yourself, for example with str_[0] = '\0';.
Don't use underscores that way.
You don't need to cast the result of malloc(). It returns a void*, which converts implicitly to char* (and to any other object pointer type).
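As a rough sketch of the carriage-return point above: if both '\r' and '\n' are passed to strtok() as delimiters, no stray '\r' ends up in the tokens. The function name join_lines is made up for this illustration and is not from the question; it also tracks the length itself instead of calling strcat().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: joins every line of `text` that does not start
 * with '$'. `text` is modified by strtok(). Returns a malloc'ed string
 * that the caller must free, or NULL if an allocation fails. */
char *join_lines(char *text)
{
    char *out = malloc(1);
    if (out == NULL)
        return NULL;
    out[0] = '\0';              /* start with a valid empty string */
    size_t len = 0;

    /* Treating '\r' and '\n' as delimiters handles "\n" and "\r\n" endings. */
    for (char *tok = strtok(text, "\r\n"); tok != NULL; tok = strtok(NULL, "\r\n")) {
        if (tok[0] == '$')
            continue;
        size_t tok_len = strlen(tok);
        char *tmp = realloc(out, len + tok_len + 1);
        if (tmp == NULL) {
            free(out);
            return NULL;
        }
        out = tmp;
        memcpy(out + len, tok, tok_len + 1);   /* copies the terminating '\0' too */
        len += tok_len;
    }
    return out;
}
```

With the input from the question (using "\r\n" line endings), this should yield the five data lines joined with no '\r' characters in between.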
Are the following steps all legal or may there be some undefined behavior?
#include <stdio.h>
#include <string.h>
#define MCAT(a) ((a)+strlen(a))
void test(char *ptr)
{
static int i = 0;
int n = 0;
while ( n++ < 10 )
sprintf( MCAT(ptr), " %d ", ++i );
return;
}
int main()
{
char buf[100];
buf[0] = '\0'; //<- needed for MCAT macro
test( buf );
printf( "buf: %s\n\r", buf );
// also.. using pointer
char *txt = buf;
test( txt );
printf( "txt: %s\n\r", txt );
// printf( "buf: %s\n\r", buf ); // same output, ok
}
In the "test" function, I have often seen strcpy with a temporary local variable, so I was wondering if you can also use sprintf to change the original string "buf".
It becomes undefined behavior once you write beyond the size of the buffer (here, 100 bytes), which is something that your test function doesn't check and cannot check, since it doesn't know the buffer size.
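One way to avoid that, sketched here under the assumption that the caller can pass the buffer size along, is to append with vsnprintf() instead of sprintf(). The name append_fmt is invented for this example.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: appends formatted text to the string already in
 * `buf`, never writing past buf[size - 1]. Assumes `buf` holds a valid
 * null-terminated string. Returns 0 on success, -1 if the output had to
 * be truncated or formatting failed. */
int append_fmt(char *buf, size_t size, const char *fmt, ...)
{
    size_t used = strlen(buf);
    if (used >= size)
        return -1;

    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf + used, size - used, fmt, ap);
    va_end(ap);

    /* vsnprintf returns the length it wanted to write, excluding '\0' */
    if (n < 0 || (size_t)n >= size - used)
        return -1;
    return 0;
}
```

The test function from the question could then take the size as a second parameter and call append_fmt(ptr, size, " %d ", ++i), stopping once it returns -1.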
Thanks everyone for the replies.
Neglecting the overflow factor for a moment I was thinking that the most common strategies I've seen for using sprintf to concatenate strings are:
#include <string.h>
#include <stdio.h>
char * fixcpy(char *str);
int main(int argc, char *argv[])
{
char buf[1000];
char *ptr = buf;
buf[0] = '\0';
// "ptr +=" is added only for next usage on case n.4
ptr += sprintf (buf + strlen(buf), "%s %d <- some data ...\n\r", "something", 1);
printf( buf );
printf( "----------------------------------\n\r");
// or ...
ptr += sprintf (&buf[strlen(buf)], "%s %d <- some data ...\n\r", "something", 2);
printf( buf );
printf( "----------------------------------\n\r");
// or...
ptr += sprintf(strchr(buf, '\0'), "%s %d <- some data ...\n\r", "something", 3);
printf( buf );
printf( "----------------------------------\n\r");
// or with pointers
ptr += sprintf (ptr, "%s %d <- some data ...\n\r", "something", 4);
printf( buf );
// But if you want the output as argument of input you can't do that:
char tmp[1000];
strcpy( tmp, buf ); // because I want save the source buf
printf( "---------ASPECTED ERROR-----------\n\r");
sprintf (tmp, "the output was:\n\r%s", tmp);
printf( tmp );
// But working on the other side?
printf( "---------------OK-----------------\n\r");
sprintf (buf, "the output was:\n\r%s", fixcpy(buf) );
printf( buf );
// and probably work also...
printf( "---------------OK?----------------\n\r");
sprintf (buf, "the output was (new):\n\r%s%s", fixcpy(buf), fixcpy(buf) );
printf( buf );
//
}
char *fixcpy( char *str )
{
static char buf[1000];
strcpy( buf, str );
return buf;
}
It looks pretty neat and clean to me, provided you know the size of the data you're working with.
I’m trying to read text from stdin line by line using fgets() and store the text in a variable “text”. However, when I use strtok() to split the words, it only works for a couple lines before terminating. What should I change to make it run through the entire text?
#define WORD_BUFFER_SIZE 50
#define TEXT_SIZE 200
int main(void) {
char stopWords[TEXT_SIZE][WORD_BUFFER_SIZE];
char word[WORD_BUFFER_SIZE];
int numberOfWords = 0;
while(scanf("%s", word) == 1){
if (strcmp(word, "====") == 0){
break;
}
strcpy(stopWords[numberOfWords], word);
numberOfWords++;
}
char *buffer = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);
char *text = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);
while(fgets(buffer, WORD_BUFFER_SIZE*TEXT_SIZE, stdin) != NULL){
strcat(text, buffer);
}
char *k;
k = strtok(text, " ");
while (k != NULL) {
printf("%s\n", k);
k = strtok(NULL, " ");
}
}
char *buffer = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);
char *text = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);
sizeof(WORD_BUFFER_SIZE) is a constant: it is the size of an int, because WORD_BUFFER_SIZE expands to the integer literal 50. You probably meant WORD_BUFFER_SIZE * TEXT_SIZE. Alternatively, you can find the file size and calculate exactly how much memory you need.
char *text = malloc(...)
strcat(text, buffer);
text is not initialized and doesn't have a null terminator. strcat has to find the end of the string already in text, so you must set text[0] = '\0' before the first strcat call (unlike strcpy, which never reads the destination).
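#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
// fseek/ftell only work when stdin is redirected from a regular file
if (fseek(stdin, 0, SEEK_END) != 0)
{ printf("not using a file!\n"); return 0; }
long filesize = ftell(stdin); // ftell returns a long, or -1 on error
rewind(stdin);
if (filesize <= 0)
{ printf("not using a file!\n"); return 0; }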
char word[1000] = { 0 };
//while (scanf("%s", word) != 1)
// if (strcmp(word, "====") == 0)
// break;
char* text = malloc(filesize + 1);
if (!text)
return 0;
text[0] = '\0';
while (fgets(word, sizeof(word), stdin) != NULL)
strcat(text, word);
char* k;
k = strtok(text, " ");
while (k != NULL)
{
printf("%s\n", k);
k = strtok(NULL, " ");
}
return 0;
}
According to the information you provided in the comments section, the input text is longer than 800 bytes.
However, in the line
char *text = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);
which, assuming sizeof(int) is 4 on your platform, is equivalent to
char *text = malloc(800);
you only allocated 800 bytes as storage for text. Therefore, you did not allocate sufficient space to store the entire input into text. Attempting to store more than 800 bytes will result in a buffer overflow, which invokes undefined behavior.
If you want to store the entire input into text, then you must ensure that it is large enough.
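To make the size calculation concrete, here is a small sketch comparing the expression from the question with what was presumably intended. The two function names are made up for illustration; the macros are copied from the question.

```c
#include <stddef.h>

#define WORD_BUFFER_SIZE 50
#define TEXT_SIZE 200

/* What the question computes: sizeof applied to the literal 50 gives
 * sizeof(int), so the result is sizeof(int) * 200. */
size_t size_as_written(void)
{
    return sizeof(WORD_BUFFER_SIZE) * TEXT_SIZE;
}

/* What was presumably intended: 50 * 200 bytes. */
size_t size_as_intended(void)
{
    return (size_t)WORD_BUFFER_SIZE * TEXT_SIZE;
}
```

On a platform with a 4-byte int, the first function returns 800 and the second returns 10000.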
However, this is probably not necessary. Depending on your requirements, it is probably sufficient to process one line at a time, like this:
while( fgets( buffer, sizeof buffer, stdin ) != NULL )
{
char *k = strtok( buffer, " " );
while ( k != NULL )
{
printf( "%s\n", k );
k = strtok( NULL, " " );
}
}
In that case, you do not need the array text. You only need the array buffer for storing the current contents of the line.
Since you did not provide any sample input, I cannot test the code above.
EDIT: Based on your comments to this answer, it seems that your main problem is how to read in all of the input from stdin and store it as a string, when you do not know the length of the input in advance.
One common solution is to allocate an initial buffer, and to double its size every time it gets full. You can use the function realloc for this:
#include <stdio.h>
#include <stdlib.h>
int main( void )
{
char *buffer;
size_t buffer_size = 1024;
size_t input_size = 0;
//allocate initial buffer
buffer = malloc( buffer_size );
if ( buffer == NULL )
{
fprintf( stderr, "allocation error!\n" );
exit( EXIT_FAILURE );
}
//continuously fill the buffer with input, and
//grow buffer as necessary
for (;;) //infinite loop, equivalent to while(1)
{
//we must leave room for the terminating null character
size_t to_read = buffer_size - input_size - 1;
size_t ret;
ret = fread( buffer + input_size, 1, to_read, stdin );
input_size += ret;
if ( ret != to_read )
{
//we have finished reading from input
break;
}
//buffer was filled entirely (except for the space
//reserved for the terminating null character), so
//we must grow the buffer
{
void *temp;
buffer_size *= 2;
temp = realloc( buffer, buffer_size );
if ( temp == NULL )
{
fprintf( stderr, "allocation error!\n" );
exit( EXIT_FAILURE );
}
buffer = temp;
}
}
//make sure that `fread` did not stop due to an
//error (it should only stop due to end-of-file)
if ( ferror(stdin) )
{
fprintf( stderr, "input error!\n" );
exit( EXIT_FAILURE );
}
//add terminating null character
buffer[input_size++] = '\0';
//shrink buffer to required size
{
void *temp;
temp = realloc( buffer, input_size );
if ( temp == NULL )
{
fprintf( stderr, "allocation error!\n" );
exit( EXIT_FAILURE );
}
buffer = temp;
}
//the entire contents is now stored in "buffer" as a
//string, and can be printed
printf( "contents of buffer:\n%s\n", buffer );
free( buffer );
}
The code above assumes that the input will be terminated by an end of file condition, which is probably the case if the input is piped from a file.
On second thought, instead of having one large string for the whole file, as you are doing in your code, you may rather want an array of char* to the individual strings, each representing a line, so that for example lines[0] will be the string of the first line and lines[1] will be the string of the second line. That way, you can easily use strstr to find the " ==== " delimiter and strchr on each individual line to find the individual words, and still have all the lines in memory for further processing.
I don't recommend that you use strtok in this case, because that function is destructive in the sense that it modifies the string by replacing the delimiters with null characters. If you require the strings for further processing, as you stated in the comments section, then this is probably not what you want. That is why I recommend that you use strchr instead.
If a reasonable maximum number of lines is known at compile-time, then the solution is rather easy:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_LINE_LENGTH 1024
#define MAX_LINES 1024
int main( void )
{
char *lines[MAX_LINES];
int num_lines = 0;
char buffer[MAX_LINE_LENGTH];
//read one line per loop iteration
while ( fgets( buffer, sizeof buffer, stdin ) != NULL )
{
int line_length = strlen( buffer );
//verify that entire line was read in
if ( buffer[line_length-1] != '\n' )
{
//treat end-of file as equivalent to newline character
if ( !feof( stdin ) )
{
fprintf( stderr, "input line exceeds maximum line length!\n" );
exit( EXIT_FAILURE );
}
}
else
{
//remove newline character from string
buffer[--line_length] = '\0';
}
//allocate memory for new string and add to array
lines[num_lines] = malloc( line_length + 1 );
//verify that "malloc" succeeded
if ( lines[num_lines] == NULL )
{
fprintf( stderr, "allocation error!\n" );
exit( EXIT_FAILURE );
}
//copy line to newly allocated buffer
strcpy( lines[num_lines], buffer );
//increment counter
num_lines++;
}
//All input lines have now been successfully read in, so
//we can now do something with them.
//handle one line per loop iteration
for ( int i = 0; i < num_lines; i++ )
{
char *p, *q;
//attempt to find the " ==== " marker
p = strstr( lines[i], " ==== " );
if ( p == NULL )
{
printf( "Warning: skipping line because unable to find \" ==== \".\n" );
continue;
}
//skip the " ==== " marker
p += 6;
//split tokens on remainder of line using "strchr"
while ( ( q = strchr( p, ' ') ) != NULL )
{
printf( "found token: %.*s\n", (int)(q-p), p );
p = q + 1;
}
//output last token
printf( "found token: %s\n", p );
}
//cleanup allocated memory
for ( int i = 0; i < num_lines; i++ )
{
free( lines[i] );
}
}
When running the program above with the following input
first line before deliminator ==== first line after deliminator
second line before deliminator ==== second line after deliminator
it has the following output:
found token: first
found token: line
found token: after
found token: deliminator
found token: second
found token: line
found token: after
found token: deliminator
If, however, there is no reasonable maximum number of lines known at compile-time, then the array lines will also have to be designed to grow in a similar way as buffer in the previous program. The same applies for the maximum line length.
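A minimal sketch of such a growing lines array, using the same doubling strategy as the earlier buffer example. The function name read_lines and the initial capacity are assumptions of this sketch. To keep it short, the line length is still fixed at 1024 bytes, so longer lines are split across several entries.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads all lines from `fp` into a malloc'ed array
 * of malloc'ed strings and stores the number of lines in *count.
 * Returns NULL if an allocation fails. The caller frees each line and
 * then the array. */
char **read_lines(FILE *fp, size_t *count)
{
    size_t capacity = 4, n = 0;
    char buffer[1024];
    char **lines = malloc(capacity * sizeof *lines);
    if (lines == NULL)
        return NULL;

    while (fgets(buffer, sizeof buffer, fp) != NULL) {
        buffer[strcspn(buffer, "\n")] = '\0';   /* strip newline if present */

        if (n == capacity) {                    /* array is full: double it */
            char **tmp = realloc(lines, 2 * capacity * sizeof *lines);
            if (tmp == NULL)
                goto fail;
            lines = tmp;
            capacity *= 2;
        }
        lines[n] = malloc(strlen(buffer) + 1);
        if (lines[n] == NULL)
            goto fail;
        strcpy(lines[n], buffer);
        n++;
    }
    *count = n;
    return lines;

fail:
    while (n > 0)
        free(lines[--n]);
    free(lines);
    return NULL;
}
```

Calling read_lines(stdin, &num_lines) would replace the first loop of the program above, and the processing and cleanup loops stay the same apart from num_lines becoming a size_t.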
I want to insert a string (char*) at a given position in another string. For example:
char str1[80] = "Hello world!";
char str2[] = "the ";
I want the result to be Hello the world! (insert str2 at index 6 of str1)
I have tried:
#include <stdio.h>
char str1[80] = "Hello world!";
char str2[] = "the ";
char tmp1[80];
char tmp2[80];
char *string_insert(char *scr, char *ins, int loc){
// Get the chars from location 0 -> loc
for(int i = 0; i <= loc; i++){
tmp1[i] = scr[i];
}
// Get the chars from loc -> end of the string
for(int i = 0; i < sizeof(scr) - loc; i++){
tmp2[i] = scr[i + loc];
}
// Insert the string ins
for(int i = 0; i < sizeof(ins); i++){
tmp1[i + loc] = ins[i];
}
// Add the rest of the original string
for(int i = 0; i < sizeof(scr) - loc; i++){
tmp1[loc + 1 + i] = tmp2[i];
}
return tmp1;
}
int main(){
printf("%s", string_insert(str1, str2, 6));
return 0;
}
But then I got Hello two. You can execute it online at onlinegdb.com
I also wonder if there is any function from string.h that can do this?
Thanks for any help!
There's no function in the standard library to insert a string into another.
You have to first create enough space for the characters you want to insert by
moving the original characters to the right:
Hello World\0
Hello WorlWorld\0 <- move characters to the right
Hello the World\0 <- rewrite (including ' ')
In your example you have enough space (you create a buffer of 80 characters and use only 12 of them), but you should be absolutely sure that this is the case.
However this is not what your code does. You copy those characters in another buffer and use that one as return value. In other words, your str1 is left unchanged. I don't think this is what you wanted, right?
Fixed code, with some comments to help with understanding:
#include <stdio.h>
#include <string.h>
char str1[] = "Hello world!"; // use `[]` to include trailing \0
char str2[] = "the ";
char tmp1[80]; // not very good idea to use global variables, but for algorithm learning purposes it is ok
char tmp2[80];
char *string_insert(char *src, char *str, int loc){
// function uses global variable tmp1
// if total length of string is > 80 - result is undefined
int ti = 0; // index in tmp variable
while (ti<loc) tmp1[ti++] = *src++; // copy first part from src
while (*str) tmp1[ti++] = *str++; // append str
while (*src) tmp1[ti++] = *src++; // append the rest of src
tmp1[ti] = 0; // don't forget trailing 0
return tmp1;
}
int main(){
printf("%s", string_insert(str1, str2, 6));
return 0;
}
There is no string insert function. There is a strcat() function which appends. Implementing your own however is simple enough using the string primitives that are available:
char* string_insert( char* scr, const char* ins, size_t loc )
{
size_t ins_len = strlen(ins) ;
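// source and destination overlap, so memmove is required (strcpy on overlapping regions is undefined behavior)
memmove( &scr[loc + ins_len], &scr[loc], strlen( &scr[loc] ) + 1 ) ;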
memcpy( &scr[loc], ins, ins_len ) ;
return scr ;
}
Here the end of the string is moved to make space for ins, then ins is copied into the gap.
Example usage:
int main()
{
char str1[80] = "Hello world!" ;
const char* str2 = "the " ;
printf( "%s\n", string_insert( str1, str2, 6 ) ) ;
return 0;
}
The insertion is done in place and directly modifies scr, so it needs no destination buffer. That would be the idiomatic behaviour given the interface that you have specified. The use of the global tmp1 and tmp2 arrays in your attempt is not good practice, and entirely unnecessary (as is always the case with globals). If you do want to modify a separate destination string (so scr is const), then you should pass that buffer as an argument:
char* string_insert( const char* scr, const char* ins, size_t loc, char* dest )
{
size_t ins_len = strlen(ins) ;
memmove( dest, scr, loc ) ;
strcpy( &dest[loc + ins_len], &scr[loc] ) ;
memcpy( &dest[loc], ins, ins_len ) ;
return dest ;
}
int main()
{
const char* str1 = "Hello world!";
const char* str2 = "the " ;
char str3[80] = "" ;
printf( "%s\n", string_insert(str1, str2, 6, str3 ) ) ;
return 0;
}
The use of const parameters allows string literals to be passed as arguments. So, for the first "in-place" version:
char str[80] = "Hello world!" ;
printf( "%s\n", string_insert( str, "the ", 6 ) ) ;
And for the "destination buffer" version:
char str[80] = "" ;
printf( "%s\n", string_insert( "Hello world!", "the ", 6, str ) ) ;
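Neither version above checks that the destination is large enough. A possible variant of the in-place version that takes the buffer capacity as an extra argument might look like this. The name string_insert_n and the cap parameter are inventions of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variant: inserts `ins` into `dst` at index `loc`, in place.
 * `cap` is the total size of the dst array in bytes.
 * Returns dst, or NULL (leaving dst untouched) if loc is past the end of
 * the string or the result would not fit. */
char *string_insert_n(char *dst, size_t cap, const char *ins, size_t loc)
{
    size_t dst_len = strlen(dst);
    size_t ins_len = strlen(ins);

    if (loc > dst_len || dst_len + ins_len + 1 > cap)
        return NULL;

    /* shift the tail (including '\0') to the right; the regions overlap */
    memmove(dst + loc + ins_len, dst + loc, dst_len - loc + 1);
    /* fill the gap with the inserted text */
    memcpy(dst + loc, ins, ins_len);
    return dst;
}
```

A call such as string_insert_n(str, sizeof str, "the ", 6) works when str is an array. With a pointer, sizeof would give the pointer size, so the real capacity has to be passed explicitly.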
I have this simple function that parses a line into tokens,
but I'm missing something.
int parse_line(char *line,char **words){
int wordc=0;
/* get the first token */
char *word = strtok(line, " ");
words[wordc]=(char*)malloc(256*sizeof(char));
strcpy(words[wordc++],word );
/* walk through other tokens */
while( word != NULL ) {
word = strtok(NULL, " ");
words[wordc]=(char*)malloc(256*sizeof(char));
strcpy(words[wordc++],word );
}
return wordc;
}
When I run it I get a segmentation fault!
I pass a char[256] line as the first argument and a char** words as the second, for which I have already allocated memory, like this:
char **words = (char **)malloc(256 * sizeof(char *));
main:
.
.
.
char buffer[256];
char **words = (char **)malloc(256 * sizeof(char *));
.
.
.
n = read(stdin, buffer, 255);
if (n < 0){
perror("ERROR");
break;
}
parse_line(buffer,words);
When the program executes parse_line, it exits with a segmentation fault.
I found where the segfault occurs. It's on this line:
strcpy(words[wordc++],word );
Specifically, it happens on the first strcpy, before it even reaches the while loop.
while( word != NULL ) {
word = strtok(NULL, " ");
words[wordc]=(char*)malloc(256*sizeof(char));
strcpy(words[wordc++],word );
}
At the end of the line, word will always be set to NULL (as expected) and so strcpy(words[wordc++],word ) will be undefined behavior (likely a crash).
You need to reorganize the loop so you never try to copy a NULL string.
@jxh suggests this solution, which fixes the issue of word being NULL in either of your strcpy calls.
/* get the first token */
char *word = strtok(line, " ");
while( word != NULL ) {
words[wordc]=(char*)malloc(256*sizeof(char));
strcpy(words[wordc++],word );
word = strtok(NULL, " ");
}
I'd do this (uses less memory)
/* get the first token */
char *word = strtok(line, " ");
while( word != NULL ) {
words[wordc++] = strdup(word);
word = strtok(NULL, " ");
}
the following proposed code:
cleanly compiles
performs the desired functionality
properly checks for errors
displays the results to the user
fails to pass all allocated memory to free() so has lots of memory leaks
and now the proposed code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
// avoid 'magic' numbers in code
#define MAX_WORDS 256
#define MAX_LINE_LEN 256
int parse_line( char *line, char **words )
{
int wordc=0;
/* get the first token */
char *token = strtok(line, " ");
while( wordc < MAX_WORDS && token )
{
words[wordc] = strdup( token );
if( ! words[wordc] )
{
perror( "strdup failed" );
exit( EXIT_FAILURE );
}
// implied else, strdup successful
wordc++;
// get next token
token = strtok(NULL, " ");
}
return wordc;
}
int main( void )
{
char buffer[ MAX_LINE_LEN ];
// fix another problem with OPs code
char **words = calloc( MAX_WORDS, sizeof( char* ) );
if( ! words )
{
perror( "calloc failed" );
exit( EXIT_FAILURE );
}
// implied else, calloc successful
// note: would be much better to use 'fgets()' rather than 'read()'
// read at most sizeof( buffer ) - 1 bytes so there is room for the NUL terminator
ssize_t n = read( 0, buffer, sizeof( buffer ) - 1 );
if (n <= 0)
{
perror("read failed");
exit( EXIT_FAILURE );
}
// implied else, read successful
// note: 'read()' does not NUL terminate the data
buffer[ n ] = '\0';
int count = parse_line( buffer, words );
for( int i = 0; i < count; i++ )
{
printf( "%s\n", words[i] );
}
}
here is a typical run of the program:
hello old friend <-- user entered line
hello
old
friend
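Since the proposed code admits that it never frees what it allocated, a cleanup step could be sketched as follows. The name free_words is made up for this example, and it assumes that every words[i] came from strdup() and that words itself came from calloc() or malloc().

```c
#include <stdlib.h>

/* Hypothetical cleanup helper: frees the first `count` strings stored in
 * `words`, then the array itself. Passing NULL is a no-op. */
void free_words(char **words, int count)
{
    if (words == NULL)
        return;
    for (int i = 0; i < count; i++)
        free(words[i]);     /* each word was allocated by strdup() */
    free(words);            /* the pointer array was allocated by calloc() */
}
```

Calling free_words( words, count ); after the printing loop in main() would release everything the program allocated.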
Your answers are right! But I got a segfault again, this time because of read.
I hadn't noticed that when I ran the program it didn't stop to wait for input at read.
Instead it went straight past it. I changed read to fgets and it worked,
together with your changes.
Can someone explain this to me? Why doesn't it stop at the read call?
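A possible explanation, assuming the call is still written as read(stdin, buffer, 255) as in the question: read() expects an integer file descriptor such as STDIN_FILENO, while stdin is a FILE *, so the call is handed an invalid descriptor. A rough sketch of the fgets-based alternative follows. The name read_line is invented here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from `fp` into `buf` (`size` bytes),
 * stripping the trailing newline or "\r\n" if present.
 * Returns 1 on success, 0 on end-of-file or error. */
int read_line(FILE *fp, char *buf, int size)
{
    if (fgets(buf, size, fp) == NULL)
        return 0;
    buf[strcspn(buf, "\r\n")] = '\0';   /* cut at the first '\r' or '\n' */
    return 1;
}
```

Unlike read(), fgets() always null-terminates what it stores, so the buffer can be passed to parse_line() directly after read_line(stdin, buffer, sizeof buffer) succeeds.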
This is my code:
#define LEN 40
#define STUDLIST "./students.txt"
int main()
{
FILE * studd;
char del[] = "" " '\n'";
char name[LEN], surname[LEN], str[LEN];
char *ret;
char *tokens[2] = {NULL};
char *pToken = str;
unsigned int i = 0;
/* open file */
if ( (studd = fopen(STUDLIST,"r") ) == NULL )
{
fprintf(stderr, "fopen\n");
exit(EXIT_FAILURE);
}
while((ret = fgets(str, LEN, studd)))
{
if(ret)
{
for( tokens[i] = strtok_r( str, del, &pToken ); ++i < 2;
tokens[i] = strtok_r( NULL, del, &pToken ) );
strcpy(name, tokens[0]);
strcpy(surname, tokens[1]);
printf( "name = %s\n", name );
printf( "surname = %s\n", surname );
}
fflush(studd);
}
fclose(studd);
return 0;
}
Here there is the file students.txt: http://pastebin.com/wNpmXYis
I don't understand why the output isn't what I expected.
I use a loop to read each line with fgets, so I have a string of the form [Name Surname], and I want to split it into two different strings ([name] and [surname]) using strtok_r. I tried it with a static string and it works well, but if I read many strings from the file the output is not correct, as you can see here:
http://pastebin.com/70uPMzPh
Where is my fault?
Why are you using a for loop?
...
while((ret = fgets(str, LEN, studd)))
{
if(ret)
{
tokens[0] = strtok_r( str, del, &pToken );
tokens[1] = strtok_r( NULL, del, &pToken );
strcpy(name, tokens[0]);
strcpy(surname, tokens[1]);
printf( "name = %s\n", name );
printf( "surname = %s\n", surname );
}
}
You start i at zero:
unsigned int i = 0;
And later you increment it:
++i < 2;
You never set i back to zero, and in fact you continue incrementing i for every new line in your file. With 14 names in your input file, I expect i to get to about 14 (or maybe 13 or 15, depending on the exact logic).
So this line:
tokens[i] = strtok_r(...);
ends up putting strtok results into tokens[2..15]. But only tokens[0] and tokens[1] are valid. Everything else is undefined behavior.
Answer: Be sure you reset i to zero when you read a new line of your file.
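One way to make that reset automatic is to keep the index local to a small helper that is called once per line. The sketch below assumes space and newline as delimiters, and the name split_tokens is made up. It also bounds the index, so a line with too many words cannot write past the end of tokens.

```c
#define _POSIX_C_SOURCE 200809L   /* for strtok_r */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits `line` in place into at most `max` tokens
 * separated by spaces or newlines. Returns the number of tokens stored. */
size_t split_tokens(char *line, char *tokens[], size_t max)
{
    char *save = NULL;
    size_t i = 0;   /* local, so it starts at zero for every line */

    for (char *tok = strtok_r(line, " \n", &save);
         tok != NULL && i < max;
         tok = strtok_r(NULL, " \n", &save))
        tokens[i++] = tok;

    return i;
}
```

In the fgets loop, the strcpy calls would then only run when split_tokens(str, tokens, 2) returns 2, which also guards against lines that contain a single word.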