Substring garbage in C


I want to make a program that splits a string into two strings if the string contains a '?'.
For example,
Test1?Test2
should become
Test1 Test2
I am trying to do it with dynamic allocation, but when I print the results I get some garbage, and I cannot figure out what I am doing wrong. Here is my code:
for(i=0;i<strlen(expression);i++){
    if(expression[i]=='?' || expression[i]=='*'){
        position=i; break;
    }
}
printf("position=%d\n",position);
if(position!=0){
    before = (char*) malloc(sizeof(char)*(position +1 ));
    strncpy(before,expression,position);
    before[strlen(before)]='\0';
}
if(position!=strlen(expression))
{
    after = (char*) malloc(sizeof(char)*( strlen(expression)- position+1));
    strncpy(after,expression+position+1,strlen(expression)-position+1);
    after[strlen(after)]='\0';
}
printf("before:%s,after:%s\n",before,after);

Since you cannot take the length of a string until it is null-terminated (strlen keeps reading until it finds a '\0'), you cannot do this:
before[strlen(before)]='\0';
You need to do this instead:
before[position]='\0';
The same goes for the after part; this is illegal too:
after[strlen(after)]='\0';
You need to compute the length of after upfront, and then terminate the string using the pre-computed length.

You can use strsep (a BSD/glibc extension, not standard C):
char* token;
while ((token = strsep(&string, "?")) != NULL)
{
    printf("%s\n", token);
}

Instead of doing it manually, you can also make good use of strtok().

for(i=0;i<strlen(expression);i++){
This is very inefficient: strlen(expression) gets recomputed on every iteration, so you get O(n²) complexity (unless the compiler is clever enough to prove that expression does not change and to hoist the computation of strlen(expression) out of the loop). It should be:
for (i=0; expression[i]; i++) {
(expression[i] is zero only when you reach the terminating null byte; for readability you could write the condition as expression[i] != '\0' if you prefer.)
And as dasblinkenlight answered, you should not call strlen on a buffer that is not yet null-terminated.
You should compute strlen(expression) once and for all, i.e. start with:
size_t exprlen = strlen(expression);
and use exprlen everywhere else instead of strlen(expression).
Also, use strdup(3). This:
after = (char*) malloc(sizeof(char)*( strlen(expression)- position+1));
strncpy(after,expression+position+1,strlen(expression)-position+1);
after[strlen(after)]='\0';
should be
after = strdup(expression+position+1);
And you forgot a very important thing: always test for failure of memory allocation (and of other system functions). So
before = (char*) malloc(sizeof(char)*(position +1));
should really be (since sizeof(char) is always 1):
before = malloc(position+1);
if (!before) { perror("malloc of before"); exit(EXIT_FAILURE); }
Also, if the string pointed to by expression is not used elsewhere (and is writable), you could simply overwrite the char at the found position with a null byte, so that it becomes a string terminator:
if (position>=0) expression[position] = '\0';
and set before=expression; after=expression+position+1;
Finally, learn more about strtok(3), strchr(3), strstr(3), strpbrk(3), strsep(3) and sscanf(3).
BTW, you should always enable all warnings and debugging info in the compiler (e.g. compile with gcc -Wall -g). Then use a debugger (like gdb) to step through the code and understand what is happening. If a memory leak detector like valgrind is available, use it as well.

Related

Invalid read of size 1 in while loop condition

I recently inherited code, written in C, without any documentation. I've been working on optimizing and fixing it, and I've come across this:
int LookBack(char * Start, int Length, char *Ignore)
{
    char LookBuffer[10];
    //while(Start[-1] && Length--) Start--; // Start[-1]. No idea what that is supposed to mean.
    while(Length > 0 && Start[0]){
        Start--;
        Length--;
    }
    strncpy(LookBuffer, Start, sizeof(LookBuffer));
    if(strcasestr(LookBuffer, Ignore)) {
        return(1);
    }
    return(0);
}
This function is used to determine whether a substring occurs a certain distance in front of the string Start. For example, take the string "The designation is API RP 5L1", with Start pointing to "API RP 5L1". If Ignore = "The" and Length = 10, the function will return 0.
My Question
Valgrind gives me an "Invalid read of size 1" error because, I believe, it is reading outside the allocated memory at while(Length > 0 && Start[0]). Is there any way I can check that Start[0] is in allocated memory without doing an invalid read?

For C functions that work with memory buffers, it is the caller's responsibility to pass valid pointers. There might be some platform-specific trick, but in standard C there's no way to check, and the same holds on many platforms (for example, just-freed memory is often indistinguishable from still-allocated memory).

The function is called LookBack, so it seems to be used in some string processing / tokenization routine similar to strtok(), which inserts a \0 at each split point.
while(Start[-1] && Length--) Start--;
This looks at the position before Start[0]: as long as it is not a \0 string terminator (and Length has not run out), step one back. Spelled out:
while( (*(Start-1) != '\0') && (0 != (Length--))) Start--;
So after the while loop, Start points to the first character of the string that contains it (the one right after the preceding \0), with no need to readjust it by +1 to get at the second part of the split string.
In your replacement, you don't advance the Start pointer afterwards: the loop stops with Start pointing at a \0, which ends a string, so your string functions will just see an empty string (strlen(Start) == 0).

Why doesn't strcpy work?

char sentence2[10];
strncpy(sentence2, second, sizeof(sentence2)); //shouldn't I specify the sizeof(source) instead of sizeof(destination)?
sentence2[10] = '\0'; //Is this okay since strncpy does not provide the null character.
puts(sentence2);
//////////////////////////////////////////////////////////////
char *pointer = first;
for(int i =0; i < 500; i++) //Why does it crash without this meaningless loop?!
{
    printf("%c", *pointer);
    if(*pointer == '\n')
        putchar('\n');
    pointer++;
}
So here's the problem. When I run the first part of this code, the program crashes.
However, when I add the for loop that just prints garbage values in memory locations, it does not crash but still won't strcpy properly.
Second, when using strncpy, shouldn't I specify the sizeof(source) instead of sizeof(destination), since I'm moving the bytes of the source?
Third, it makes sense to me to add the null terminating character after strncpy, since I've read that it doesn't add the null character on its own, but my Pelles C IDE warns that it's a possible out-of-bounds store.
Fourth, and most importantly, why doesn't the simple strcpy work?!
////////////////////////////////////////////////////////////////////////////////////
UPDATE:
#include <stdio.h>
#include <string.h>
void main3(void)
{
    puts("\n\n-----main3 reporting for duty!------\n");
    char *first = "Metal Gear";
    char *second = "Suikoden";
    printf("strcmp(first, first) = %d\n", strcmp(first, first)); //returns 0 when both strings are identical.
    printf("strcmp(first, second) = %d\n", strcmp(first, second)); //returns a negative when the first different char is less in first string. (M=77 S=83)
    printf("strcmp(second, first) = %d\n", strcmp(second, first)); //returns a positive when the first different char is greater in first string.(M=77 S=83)
    char sentence1[10];
    strcpy(sentence1, first);
    puts(sentence1);
    char sentence2[10];
    strncpy(sentence2, second, 10); //shouldn't I specify the sizeof(source) instead of sizeof(destination).
    sentence2[9] = '\0'; //Is this okay since strncpy does not provide the null character.
    puts(sentence2);
    char *pointer = first;
    for(int i =0; i < 500; i++) //Why does it crash without this nonsensical loop?!
    {
        printf("%c", *pointer);
        if(*pointer == '\n')
            putchar('\n');
        pointer++;
    }
}
This is how I teach myself to program. I write code and comment all I know about it, so that the next time I need to look something up, I just look at my own code in my files. In this one, I'm trying to learn the string library in C.

char *first = "Metal Gear";
char sentence1[10];
strcpy(sentence1, first);
This doesn't work because first has 11 characters: the ten in the string, plus the null terminator. So you would need char sentence1[11]; or more.
strncpy(sentence2, second, sizeof(sentence2));
//shouldn't I specify the sizeof(source) instead of sizeof(destination)?
No. The third argument to strncpy is supposed to be the size of the destination. The strncpy function will always write exactly that many bytes.
If you want to use strncpy, you must also add the null terminator yourself (and there must be enough space for it), unless you are sure that strlen(second) < sizeof sentence2.
Generally speaking, strncpy is almost never a good idea. If you want to put a null-terminated string into a buffer that might be too small, use snprintf.
This is how I teach myself to program.
Learning C by trial and error is not good. The problem is that if you write bad code, you may never know: it might appear to work and then fail later on. For example, whether your strcpy steps on any other variable's toes depends on what lies in memory after sentence1.
Learning from a book is by far and away the best idea. K&R 2 is a decent starting place if you don't have any other.
If you don't have a book, do look up online documentation for standard functions anyway. You could have learnt all this about strcpy and strncpy by reading their man pages, or their definitions in a C standard draft, etc.

Your problems start from here:
char sentence1[10];
strcpy(sentence1, first);
The number of characters in first, excluding the terminating null character, is 10. The space allocated for sentence1 has to be at least 11 for the program to behave in a predictable way. Since you have already written to memory you are not supposed to use, you cannot expect anything to behave correctly after that.
You can fix this problem by changing
char sentence1[10];
to
char sentence1[N]; // where N > 10.
But then you have to ask yourself: what are you trying to accomplish by allocating memory on the stack that's on the edge of being wrong? Are you trying to learn how things behave at the boundary between wrong and right? If the answer to that is yes, hopefully you learned from it. If not, I hope you learned how to allocate adequate memory.

This is an out-of-bounds array write; the valid indices are only 0-9:
sentence2[10] = '\0';
it should be
sentence2[9] = '\0';
Second, you're protecting the destination from buffer overflow, so specifying its size is appropriate.
EDIT:
Lastly, there is this amazingly bad piece of code, which really isn't worth mentioning, is relevant to neither strcpy() nor strncpy(), and yet seems to have earned me the disfavor of #nonsensicke, who seems to write very verbose and thoughtful posts. It has the following problems:
char *pointer = first;
for(int i =0; i < 500; i++)
{
    printf("%c", *pointer);
    if(*pointer == '\n')
        putchar('\n');
    pointer++;
}
Your use of int i=0 in the for loop is C99-specific. Depending on your compiler and compiler arguments, it can result in a compilation error.
for(int i =0; i < 500; i++)
better
int i = 0;
...
for(i=0;i<500;i++)
You neglect to check the return code of printf or indicate that you are deliberately ignoring it. I/O can fail after all...
printf("%c", *pointer);
better
int n = 0;
...
n = printf("%c", *pointer);
if(n!=1) { /* error! */ }
or
(void) printf("%c", *pointer);
Some folks will get on you for not using {} with your if statements:
if(*pointer == '\n') putchar('\n');
better
if(*pointer == '\n') {
putchar('\n');
}
But wait, there's more... you didn't check the return code of putchar()... dang.
better
int c = 0;
...
if(*pointer == '\n') {
    c = putchar('\n');
    if(c == EOF) { /* error */ }
}
And lastly, with this nasty little loop you're basically romping through memory like a Kiwi in a Tulip field, and you're lucky if you hit a newline: once pointer passes the terminating '\0' of "Metal Gear", every read is out of bounds. Depending on the OS (if you even have an OS), you might actually encounter some type of fault, e.g. reading outside your process space or outside addressable RAM. There's not enough info provided to say for sure, but it could happen.
My recommendation, beyond the absurdity of actually performing some type of detailed analysis on the rest of that code, would be to just remove it altogether.
Cheers!

strtok not going through all tokens

I'm trying to implement a shell as part of a school assignment, and I'm stuck on the file input/output redirection part.
More specifically, I've come up with a function which allows me to detect whether or not the command entered in specifies a '>' or '<' or even a '|'.
Ideally, if I enter ls -a > ls.txt, then the tokens ls -a and ls.txt should be returned.
My code doesn't do this, it only returns ls -a then stops.
My code is below:
/*commandLine is a char* taken in from the user, and is a null-terminated string */
int counter = 0;
parsedLine = strtok(commandLine, ">");
while (parsedLine != NULL)
{
    if (counter == 0)
    {
        strncpy(parsedCpy, parsedLine, strlen(parsedLine));
        parseCommand(parsedCpy, commands);
        counter++;
    }
    else
    {
        redirect->re_stdout = parsedLine;
    }
    parsedLine = strtok(NULL, ">");
}
I've tried it in another test file just to see if something else was wrong, but this test file (code below) returns the expected result (that is, ls -a and ls.txt):
char myString[] = "ls -a > ls.txt";
char* parsed;
parsed = strtok(myString, ">");
while (parsed != NULL)
{
    printf("%s\n", parsed);
    parsed = strtok(NULL, ">");
}
Is there something that I'm just not understanding? I don't really see where I'm going wrong, since the code itself is nearly the same in both cases.

Note that strncpy won't null-terminate the destination unless the null terminator is among the bytes being copied. See man strncpy. It says:
Warning: If there is no null byte among the first n bytes of src, the string placed in dest will not be null-terminated.
That could be messing something else up, depending on what parseCommand does.
In this case, you should just use strcpy. A strncpy doesn't really do anything for you if you're giving it the length of the source string, unless you're intentionally trying to avoid copying the null terminator. So you should use strcpy(parsedCpy, parsedLine); (assuming parsedCpy is large enough).

I cannot see how parsedLine is declared, but it needs to be handled explicitly and carefully, i.e. make sure the pointer to that value is not changed except by strtok(), and make sure that it remains null-terminated. One thing I do when using strtok() for multiple calls is to use an intermediate buffer to collect results, which helps keep the target buffer pure and unchanged except by strtok().
A small code snippet to illustrate:
char a[] = {"ls -a > ls.txt"};
char *buff;
char keep[80];
buff = strtok(a, ">");
strcpy(keep, buff);
buff = strtok(NULL, ">");
strcat(keep, buff);
This usage of strtok() is clean, i.e. it does not allow buff to be affected except by another call to strtok().
By comparison, this section of your code is a little scary, because I do not know what the strncpy() produces (that depends heavily on its third argument), and it can hand corrupted (unexpected) data to parseCommand:
if (counter == 0)
{
    strncpy(parsedCpy, parsedLine, strlen(parsedLine));
    parseCommand(parsedCpy, commands);
    counter++;
}
else
{
    redirect->re_stdout = parsedLine;
}
parsedLine = strtok(NULL, ">");
Along the lines of keeping the target buffer pure (even though it does not appear to be an issue here): strtok() is not thread safe, because it keeps its position in hidden static state. If a function using strtok() is used in a multithreaded process, the target buffer is subject to any number of calls, resulting in unexpected and perhaps even undefined behavior. In this case, strtok_r() is a better option.

C - Array of strings (2D array) and memory allocation gets me unwanted characters

I have a tiny problem with my assignment. The whole program is about tree data structures but I do not have problems with that.
My problem is about some basic stuff: reading strings from user input and then storing them in an array list.
char str[1000];
fgets(str, 1000, stdin);
int x = 0;
int y = 0;
int z = 0;
char **list;
list = (char**)malloc((x+1)*sizeof(char));
list[x] = (char*)malloc((y+1)*sizeof(char));
while(str[z] != '\n')
{
    list[x][y] = str[z];
    z++;
    if(str[z] == ',')
    {
        x++;
        y = 0;
        list = (char**)realloc(list, (x+1) * sizeof(char*));
        list[x] = (char*)malloc((y + 1)*sizeof(char));
        z++;
        if(str[z] == ' ') // Skips space after the comma
        {
            z++;
        }
    }
    else if(str[z] == '\n')
    {
        break;
    }
    else
    {
        y++;
        list[x] = (char*)realloc(list[x], (y+1)*sizeof(char));
    }
}
I pass this list array into another function.
As an example, inputs could be something like
Abcde, Fghijk, Lmnop, Qrstu
and I am trying to split each of these words into the array list.
Abcde
Fghijk
Lmnop
Qrstu
When I try to output the strings, I sometimes get weird extra characters, such as upside-down question marks and numbers.
printf("%s ", list[some_number]);
gets me
Fghijk¿
or
Fghijk\200
All of my program works as expected except for this minor problem which I am having trouble solving. Even with the same exact inputs the bugs may or may not appear. I am guessing it has to do with memory allocation?
Thanks for your help!

You need to put '\0' at the end of your new string.

Most C library functions, such as printf and strlen, process strings assuming that \0 marks the end. Without it, they keep reading memory out of bounds until they either cause a memory access violation or happen to hit a 0 byte somewhere; all the bytes in between are printed as whatever characters they map to (e.g. their extended ASCII equivalents), hence the strange output.
So allocate an extra byte for the \0 character and store it in the last byte.

Either initialize your buffers so they don't contain garbage, or, as tomato said, put a null character at the end of the new string.
C lacks many of the luxuries programmers now take for granted when it comes to memory management. You're on the right path with malloc, but that function only allocates memory; it doesn't clear it. As a result, your variables will have the correct amount of space (critical for avoiding memory leaks and overflow errors) but will be filled with garbage. This garbage could be anything, and in your case it's an upside-down question mark. Appropriate, don't you think?
I could be mistaken since I can't run the code myself without more information, but after your
char **list;
list = (char**)malloc((x+1)*sizeof(char));
list[x] = (char*)malloc((y+1)*sizeof(char));
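statements, you'll want to clear out the garbage, for example by allocating the string buffer with calloc (which zero-fills the memory) instead of malloc:
list[x] = (char*)calloc(y+1, sizeof(char));
and the like.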
Furthermore, you may care to use the strlen() function (contained in string.h) to figure out just how many blocks of memory you need to allocate.
Clearing out the spaces you use for variables is a good practice to get into with C. Good to see you learning it as well.

realloc misunderstanding

Can somebody explain to me why this block of code does not work? I looked through some of the questions around here but failed to find the answer, probably because of a (huge) lack of knowledge.
Thank you for any help.
char** sentence = malloc(min);
char* temp = malloc(min2);
int i = 0;
while(i<5)
{
    sentence = realloc(sentence, i+2);
    scanf("%s", temp);
    sentence[i] = malloc(strlen(temp));
    strcpy(sentence[i], temp);
    printf("%s\n", sentence[i]);
    i++;
}

You forgot to account for the fact that strings have null terminators.

sentence[i] = malloc(strlen(temp));
Should be:
sentence[i] = malloc(strlen(temp)+1);
You need enough space both for the length of the string (strlen) AND also for its null-terminator.

sentence = realloc(sentence, (i+1) * sizeof(*sentence));
would make more sense: you're trying to store i+1 char*s, not i+2 bytes.
BTW, you can just replace the malloc/strlen/strcpy with:
sentence[i] = strdup(temp);
(that takes care of the nul terminator for you).

sentence = realloc(sentence, i+2);
is a common anti-pattern. If realloc returns NULL, you've just leaked sentence. Instead you need to write
char **tmp = realloc(sentence, (i+1) * sizeof(*sentence));
if(tmp == NULL) {
    // out of memory - free sentence and bail out here; sentence is still valid
}
sentence = tmp;
To make life worse, you are:
using scanf, which is a common cause of security errors;
using strcpy, which is a common cause of security errors;
not checking the result of any of your mallocs to see if it returns NULL (if one does, you'll get a write-access violation);
not adding +1 to the strlen() before calling malloc, and hence getting a 1-byte heap overflow from the strcpy;
and using a while loop where a for loop would clearly be more appropriate.
Apart from those six security bugs, you're doing well though.
