Invalid read of size 1 in while loop condition - c

I recently inherited code, written in C, without any documentation. I've been working at optimizing and fixing it and I've come across this.
int LookBack(char * Start, int Length, char *Ignore)
{
char LookBuffer[10];
//while(Start[-1] && Length--) Start--; // Start[-1]. No idea what that is supposed to mean.
while(Length > 0 && Start[0]){
Start--;
Length--;
}
strncpy(LookBuffer, Start, sizeof(LookBuffer));
if(strcasestr(LookBuffer, Ignore)) {
return(1);
}
return(0);
}
This function is used to determine if a substring is a certain distance in front of the string Start. For example, take the string "The designation is API RP 5L1", with Start pointing to "API RP 5L1". If Ignore = "The" and Length = 10, the function will return 0.
My Question
Valgrind gives me the Invalid read of size 1 error because it is reading past the allocated memory at while(Length > 0 && Start[0]), or so I believe. Is there any way I can check that Start[0] is in allocated memory without doing an invalid read?

For C functions that work with memory buffers, it is the caller's responsibility to pass valid pointers. There might be some platform-specific trick, but standard C offers no way to check this, and neither do many platforms (for example, just-freed memory is often indistinguishable from memory that is still allocated).

The function is called LookBack, so it seems to be used in some string processing / tokenization process similar to strtok(), which inserts a \0 at each split point.
while(Start[-1] && Length--) Start--;
This looks at the character before Start[0]. If it is not a \0 string terminator (and Length has not run out), Start moves one position back. Written out more explicitly:
while( (*(Start-1) != '\0') && (0 != (Length--))) Start--;
So after the loop, Start points directly at the first character following the \0, i.e. the beginning of the second string part, without any +1 readjustment.
In your replacement, you forget to advance the Start pointer afterwards: it can end up pointing at the \0, which ends a string, so your string functions will just see an empty string (strlen(Start) == 0).
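A minimal sketch of a safer LookBack, assuming the caller can also pass a pointer to the beginning of the whole allocated string (the Buffer parameter below is hypothetical, not part of the original code). It keeps the original Start[-1] semantics but never steps before Buffer, always terminates LookBuffer (the original strncpy with sizeof(LookBuffer) can leave it unterminated), and swaps the GNU/BSD extension strcasestr for a small portable helper; contains_nocase is likewise an illustrative name.

```c
#include <ctype.h>
#include <string.h>

/* Illustrative helper: case-insensitive substring test in standard C
   (strcasestr is a GNU/BSD extension). */
static int contains_nocase(const char *hay, const char *needle)
{
    size_t n = strlen(needle);

    for (; *hay != '\0'; hay++) {
        size_t i = 0;
        while (i < n && hay[i] != '\0' &&
               tolower((unsigned char)hay[i]) == tolower((unsigned char)needle[i]))
            i++;
        if (i == n)
            return 1;
    }
    return n == 0;
}

/* Buffer: start of the whole allocated string (hypothetical parameter).
   Start must point somewhere inside that same string. */
int LookBack(const char *Buffer, const char *Start, int Length, const char *Ignore)
{
    char LookBuffer[10];

    /* Step back at most Length characters, but never before Buffer and
       never across a '\0' left by a tokenizer (the Start[-1] check).
       Start > Buffer is tested first, so Start[-1] is always in bounds. */
    while (Length > 0 && Start > Buffer && Start[-1] != '\0') {
        Start--;
        Length--;
    }

    /* Copy at most 9 bytes and always terminate the buffer. */
    strncpy(LookBuffer, Start, sizeof LookBuffer - 1);
    LookBuffer[sizeof LookBuffer - 1] = '\0';

    return contains_nocase(LookBuffer, Ignore);
}
```

For the question's example, LookBack(s, s + 19, 10, "The") returns 0, because the window only reaches back to "nation is".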

Related

Finding the Beginning of a string in C

To solve a question, I am looking for a way to stop a loop after it has reached the beginning of the string, assuming the loop starts from the end and decrements, is there an alternative way to do this without finding the length of the string first and decrementing till the number is zero?
Please keep in mind the only functions I can use are malloc, free and write.
This is not possible, because there is nothing special about a string's contents at the beginning. C strings have a "sentinel value" at their end - '\0' - but the first character, and the byte in memory before the first character, can have any value.
is there an alternative way to do this without finding the length of the string first and decrementing till the number is zero?
Apparently you already know where the end of the string is. I suppose you must have a pointer to the terminator character, since you think you do not know the string length.
If finding the length of the string is a viable option at all, however, then you must already know where the beginning is, too. And if you know where the beginning is and you know where the end is, then you already know the length: it is end - beginning. But you do not need to keep a separate counter to iterate backward from the end of a string to the beginning, supposing that you do know where both the end and the beginning are. You can simply use pointer comparisons instead. For example:
int count_a_backwards(const char *beginning, const char *end) {
int count = 0;
for (const char *c = end; c > beginning; ) {
if (*--c == 'a') count += 1;
}
return count;
}
If in fact you do not know where the beginning of the string is, however, then you cannot identify it at all, at least not in the general case. Perhaps you can recognize the beginning if you have some kind of prior knowledge about the string's contents, or about its alignment, or some such, but in general, the beginning of a string cannot be recognized.
Please keep in mind the only functions I can use are malloc, free and write.
If you are using the function malloc, it returns a pointer to the first byte of the allocated memory. So if the allocated array will contain a string, its beginning is known.
The task is to find the end of the string.
You can use either the standard C function strlen or write your own loop that will find the end of the stored string.
So if you have two pointers, one pointing to the beginning of a string and the other pointing to its end, traversing the string in reverse order is not hard.
Pay attention to that if you have a character array that contains a string like this
char s[] = "Hello";
then the expressions s, s + 1, s + 2 and so on point to the strings "Hello", "ello", "llo" and so on, respectively.
You could find the beginning of a string having a pointer to its end provided that the first element of the array contains a unique symbol that is a sentinel value. However in general this is a very rare case.
Here is a demonstrative program that shows how you can traverse a string in the reverse order without using standard C string functions except a function that places a string in a dynamically allocated array.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
enum { N = 12 };
char *s = malloc( N );
if ( s == NULL ) return 1;  // allocation failed
strcpy( s, "Hello World" );
puts( s );
char *p = s;
while ( *p ) ++p;
while ( p != s ) putchar( *--p );
putchar( '\n');
free( s );
return 0;
}
The program output is
Hello World
dlroW olleH
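The program above borrows strcpy, puts and putchar for convenience. Under the question's stated restriction (only malloc, free and write), the same two-pointer idea can be sketched as below; reversed_copy is a hypothetical name, and the code relies only on malloc and pointer comparisons, with no <string.h>.

```c
#include <stdlib.h>

/* Returns a newly malloc'd reversed copy of s, or NULL on failure.
   The caller must free() the result. */
char *reversed_copy(const char *s)
{
    const char *end = s;
    while (*end)              /* walk forward to the terminator */
        end++;

    char *out = malloc((size_t)(end - s) + 1);
    if (out == NULL)
        return NULL;

    char *o = out;
    while (end != s)          /* stop by comparing pointers, no counter */
        *o++ = *--end;
    *o = '\0';
    return out;
}
```

Since the reversed copy has the same length as the input, printing it is then a single write() call once that length (end - s) is known.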

Why doesn't strcpy work?

char sentence2[10];
strncpy(sentence2, second, sizeof(sentence2)); //shouldn't I specify the sizeof(source) instead of sizeof(destination)?
sentence2[10] = '\0'; //Is this okay since strncpy does not provide the null character.
puts(sentence2);
//////////////////////////////////////////////////////////////
char *pointer = first;
for(int i =0; i < 500; i++) //Why does it crash without this meaningless loop?!
{
printf("%c", *pointer);
if(*pointer == '\n')
putchar('\n');
pointer++;
}
So here's the problem. When I run the first part of this code, the program crashes.
However, when I add the for loop that just prints garbage values in memory locations, it does not crash but still won't strcpy properly.
Second, when using strncpy, shouldn't I specify the sizeof(source) instead of sizeof(destination), since I'm moving the bytes of the source?
Third, it makes sense to me to add the null terminating character after strncpy, since I've read that it doesn't add the null character on its own, but my Pelles C IDE warns that it's a possible out-of-bounds store.
Fourth and most importantly, why doesn't the simple strcpy work?!
////////////////////////////////////////////////////////////////////////////////////
UPDATE:
#include <stdio.h>
#include <string.h>
void main3(void)
{
puts("\n\n-----main3 reporting for duty!------\n");
char *first = "Metal Gear";
char *second = "Suikoden";
printf("strcmp(first, first) = %d\n", strcmp(first, first)); //returns 0 when both strings are identical.
printf("strcmp(first, second) = %d\n", strcmp(first, second)); //returns a negative when the first different char is less in first string. (M=77 S=83)
printf("strcmp(second, first) = %d\n", strcmp(second, first)); //returns a positive when the first different char is greater in first string.(M=77 S=83)
char sentence1[10];
strcpy(sentence1, first);
puts(sentence1);
char sentence2[10];
strncpy(sentence2, second, 10); //shouldn't I specify the sizeof(source) instead of sizeof(destination).
sentence2[9] = '\0'; //Is this okay since strncpy does not provide the null character.
puts(sentence2);
char *pointer = first;
for(int i =0; i < 500; i++) //Why does it crash without this nonsensical loop?!
{
printf("%c", *pointer);
if(*pointer == '\n')
putchar('\n');
pointer++;
}
}
This is how I teach myself to program. I write code and comment all I know about it so that the next time I need to look something up, I just look at my own code in my files. In this one, I'm trying to learn the string library in C.
char *first = "Metal Gear";
char sentence1[10];
strcpy(sentence1, first);
This doesn't work because first has 11 characters: the ten in the string, plus the null terminator. So you would need char sentence1[11]; or more.
strncpy(sentence2, second, sizeof(sentence2));
//shouldn't I specify the sizeof(source) instead of sizeof(destination)?
No. The third argument to strncpy is supposed to be the size of the destination. The strncpy function will always write exactly that many bytes.
If you want to use strncpy you must also put a null terminator on (and there must be enough space for that terminator), unless you are sure that strlen(second) < sizeof sentence2.
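A small sketch of that pattern, using a hypothetical helper name copy_truncating: pass one less than the destination size to strncpy and write the terminator yourself, so the result is always a valid (possibly truncated) string.

```c
#include <string.h>

/* Copies src into dst (dstsize bytes) and guarantees a '\0',
   truncating src if it does not fit. */
void copy_truncating(char *dst, size_t dstsize, const char *src)
{
    if (dstsize == 0)
        return;
    strncpy(dst, src, dstsize - 1);  /* leave room for the terminator */
    dst[dstsize - 1] = '\0';         /* strncpy may not have written one */
}
```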
Generally speaking, strncpy is almost never a good idea. If you want to put a null-terminated string into a buffer that might be too small, use snprintf.
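A sketch of the snprintf approach, again with a hypothetical helper name (copy_snprintf). snprintf always terminates when the size is non-zero and returns the length the full output would have had, so comparing that value against the buffer size tells you whether truncation happened.

```c
#include <stdio.h>

/* Copies src into dst via snprintf.
   Returns 1 if the whole string fit, 0 if it was truncated or failed. */
int copy_snprintf(char *dst, size_t dstsize, const char *src)
{
    int n = snprintf(dst, dstsize, "%s", src);
    /* n is the length of the untruncated result (or negative on error) */
    return n >= 0 && (size_t)n < dstsize;
}
```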
This is how I teach myself to program.
Learning C by trial and error is not good. The problem is that if you write bad code, you may never know: it might appear to work and then fail later on. For example, whether your strcpy steps on any other variable's toes depends on what lies in memory after sentence1.
Learning from a book is by far and away the best idea. K&R 2 is a decent starting place if you don't have any other.
If you don't have a book, do look up online documentation for standard functions anyway. You could have learnt all this about strcpy and strncpy by reading their man pages, or their definitions in a C standard draft, etc.
Your problems start from here:
char sentence1[10];
strcpy(sentence1, first);
The number of characters in first, excluding the terminating null character, is 10. The space allocated for sentence1 has to be at least 11 for the program to behave in a predictable way. Since you have already used memory that you are not supposed to use, expecting anything to behave after that is not right.
You can fix this problem by changing
char sentence1[10];
to
char sentence1[N]; // where N > 10.
But then you have to ask yourself: what are you trying to accomplish by allocating memory on the stack that's on the edge of being wrong? Are you trying to learn how things behave at the boundary between wrong and right? If the answer to the second question is yes, hopefully you learned from it. If not, I hope you learned how to allocate adequate memory.
This is an array bounds write error, because the valid indices are only 0-9:
sentence2[10] = '\0';
it should be
sentence2[9] = '\0';
Second, the size argument of strncpy protects the destination from buffer overflow, so specifying the destination's size is appropriate.
EDIT:
Lastly, this amazingly bad piece of code, which really isn't worth mentioning and is relevant to neither strcpy() nor strncpy(), yet seems to have earned me the disfavor of #nonsensicke (who seems to write very verbose and thoughtful posts), has the following problems:
char *pointer = first;
for(int i =0; i < 500; i++)
{
printf("%c", *pointer);
if(*pointer == '\n')
putchar('\n');
pointer++;
}
Your use of int i=0 in the for loop requires C99 or later. Depending on your compiler and compiler arguments, it can result in a compilation error.
for(int i =0; i < 500; i++)
better
int i = 0;
...
for(i=0;i<500;i++)
You neglect to check the return code of printf or indicate that you are deliberately ignoring it. I/O can fail after all...
printf("%c", *pointer);
better
int n = 0;
...
n = printf("%c", *pointer);
if(n!=1) { /* error! */ }
or
(void) printf("%c", *pointer);
Some folks will get on you for not using {} with your if statements
if(*pointer == '\n') putchar('\n');
better
if(*pointer == '\n') {
putchar('\n');
}
But wait, there's more... you didn't check the return code of putchar()... dang
better
int c = 0;
...
if(*pointer == '\n') {
c = putchar('\n');
if(c == EOF) { /* error */ }
}
And lastly, with this nasty little loop you're reading 500 bytes starting at a string literal that is only 11 bytes long, basically romping through memory like a Kiwi in a Tulip field, and you're lucky if you hit a newline. Depending on the OS (if you even have an OS), you might actually encounter some type of fault, e.g. reading outside your process space or outside addressable RAM. There's not enough info provided to say for sure, but it could happen.
My recommendation, beyond the absurdity of actually performing some type of detailed analysis on the rest of that code, would be to just remove it altogether.
Cheers!
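If the goal of that loop was simply to print a string character by character, a version bounded by the string's own terminator avoids reading past the literal. This is a sketch with a hypothetical name (print_chars); it drops the extra newline the original loop added and reports putchar failures through its return value.

```c
#include <stdio.h>

/* Prints s one character at a time, stopping at its '\0' terminator.
   Returns the number of characters written, or -1 if putchar fails. */
int print_chars(const char *s)
{
    const char *p;
    int count = 0;

    for (p = s; *p != '\0'; p++) {
        if (putchar((unsigned char)*p) == EOF)
            return -1;
        count++;
    }
    return count;
}
```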

Substring garbage in C

I want to make a program that cuts a string into 2 strings if the string contains a ?
For example,
Test1?Test2
should become
Test1 Test2
I am trying to do it dynamically but at the prints I get some garbage and I cannot figure out what I am doing wrong. Here is my code:
for(i=0;i<strlen(expression);i++){
if(expression[i]=='?' || expression[i]=='*'){
position=i; break;
}
}
printf("position=%d\n",position);
if(position!=0){
before = (char*) malloc(sizeof(char)*(position +1 ));
strncpy(before,expression,position);
before[strlen(before)]='\0';
}
if(position!=strlen(expression))
{
after = (char*) malloc(sizeof(char)*( strlen(expression)- position+1));
strncpy(after,expression+position+1,strlen(expression)-position+1);
after[strlen(after)]='\0';
}
printf("before:%s,after:%s\n",before,after);
Since you cannot take the length of a string until it is null-terminated, you cannot do this:
before[strlen(before)]='\0';
You need to do this instead:
before[position]='\0';
Same goes for the after part - this is illegal:
after[strlen(after)]='\0';
You need to compute the length of after upfront, and then terminate the string using the pre-computed length.
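A sketch of the fix described above, with a hypothetical function name split_expression: compute strlen(expression) once, find the separator with strcspn (swapped in for the question's hand-written loop), and terminate each piece at its precomputed length. Since the lengths are known, memcpy replaces strncpy.

```c
#include <stdlib.h>
#include <string.h>

/* Splits expr at the first '?' or '*' into two newly allocated strings.
   Returns 0 on success, -1 if there is no separator or malloc fails.
   On success the caller frees *before and *after. */
int split_expression(const char *expr, char **before, char **after)
{
    size_t len = strlen(expr);          /* computed once */
    size_t pos = strcspn(expr, "?*");   /* separator index, or len */
    if (pos == len)
        return -1;

    size_t after_len = len - pos - 1;   /* characters after the separator */
    *before = malloc(pos + 1);
    *after = malloc(after_len + 1);
    if (*before == NULL || *after == NULL) {
        free(*before);
        free(*after);
        *before = *after = NULL;
        return -1;
    }

    memcpy(*before, expr, pos);
    (*before)[pos] = '\0';              /* terminate using the known length */
    memcpy(*after, expr + pos + 1, after_len);
    (*after)[after_len] = '\0';
    return 0;
}
```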
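You can use strsep (a BSD extension, also available in glibc, but not part of standard C). It modifies the string and advances the pointer, so string must be a char * variable pointing to modifiable memory: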
char* token;
while ((token = strsep(&string, "?")) != NULL)
{
printf("%s\n", token);
}
Instead of doing it manually, you can also make good use of strtok(); see its documentation for the details.
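A minimal strtok sketch (print_tokens is a hypothetical name). Keep in mind that strtok writes '\0' into the string, so it needs a modifiable array rather than a string literal; it keeps hidden state between calls; and it treats runs of separators as one, so "a??b" yields two pieces.

```c
#include <stdio.h>
#include <string.h>

/* Prints each '?'-separated piece of s on its own line.
   strtok overwrites separators with '\0', so s must be modifiable.
   Returns the number of pieces found. */
int print_tokens(char *s)
{
    int count = 0;
    for (char *tok = strtok(s, "?"); tok != NULL; tok = strtok(NULL, "?")) {
        puts(tok);
        count++;
    }
    return count;
}
```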
for(i=0;i<strlen(expression);i++){
This is very inefficient. strlen(expression) gets computed on every iteration (so you get O(n²) complexity, unless the compiler is clever enough to prove that expression stays a constant string and then to move the computation of strlen(expression) before the loop). It should be
for (i=0; expression[i]; i++) {
(since expression[i] is zero only when you reached the terminating null byte; for readability reasons you could have coded that condition expression[i] != '\0' if you wanted to)
And as dasblinkenlight answered, you should not compute strlen of an uninitialized buffer.
You should compute once and for all the strlen(expression) i.e. start with:
size_t exprlen = strlen(expression);
and use exprlen everywhere else instead of strlen(expression).
Finally, use strdup(3):
after = (char*) malloc(sizeof(char)*( strlen(expression)- position+1));
strncpy(after,expression+position+1,strlen(expression)-position+1);
after[strlen(after)]='\0';
should be
after = strdup(expression+position+1);
and you forgot a very important thing: always test against failure of memory allocation (and of other system functions); so
before = (char*) malloc(sizeof(char)*(position +1));
should really be (since sizeof(char) is always 1):
before = malloc(position+1);
if (!before) { perror("malloc of before"); exit(EXIT_FAILURE); }
Also, if the string pointed by expression is not used elsewhere you could simply clear the char at the found position (so it becomes a string terminator) like
if (position>=0) expression[position] = '\0';
and set before=expression; after=expression+position+1;
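A sketch of that in-place approach, assuming the expression buffer is modifiable and outlives both pointers; split_in_place is a hypothetical name, and it uses strpbrk to honor both the '?' and '*' separators from the question.

```c
#include <string.h>

/* Splits expression in place at the first '?' or '*'.
   The separator is overwritten with '\0', so *before is the first part
   and *after starts just past it. Returns 0 on success, or -1 if there
   is no separator (expression is left untouched in that case). */
int split_in_place(char *expression, char **before, char **after)
{
    char *sep = strpbrk(expression, "?*");
    if (sep == NULL)
        return -1;
    *sep = '\0';
    *before = expression;
    *after = sep + 1;
    return 0;
}
```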
Finally, learn more about strtok(3), strchr(3), strstr(3), strpbrk(3), strsep(3) and sscanf(3).
BTW, you should always enable all warnings and debugging info in the compiler (e.g. compile with gcc -Wall -g). Then use the debugger (like gdb) to step by step and understand what is happening... If a memory leak detector like valgrind is available, use it.

C - Array of strings (2D array) and memory allocation gets me unwanted characters

I have a tiny problem with my assignment. The whole program is about tree data structures but I do not have problems with that.
My problem is about some basic stuff: reading strings from user input and then storing them in an array list.
char str[1000];
fgets(str, 1000, stdin);
int x = 0;
int y = 0;
int z = 0;
char **list;
list = (char**)malloc((x+1)*sizeof(char));
list[x] = (char*)malloc((y+1)*sizeof(char));
while(str[z] != '\n')
{
list[x][y] = str[z];
z++;
if(str[z] == ',')
{
x++;
y = 0;
list = (char**)realloc(list, (x+1) * sizeof(char*));
list[x] = (char*)malloc((y + 1)*sizeof(char));
z++;
if(str[z] == ' ') // Skips space after the comma
{
z++;
}
}
else if(str[z] == '\n')
{
break;
}
else
{
y++;
list[x] = (char*)realloc(list[x], (y+1)*sizeof(char));
}
}
I pass this list array into another function.
As an example, inputs could be something like
Abcde, Fghijk, Lmnop, Qrstu
and I am trying to split each of these words into the array list.
Abcde
Fghijk
Lmnop
Qrstu
When I try to output the strings I sometimes get weird, excessive characters such as upside down question marks and numbers.
printf("%s ", list[some_number]);
gets me
Fghijk¿
or
Fghijk\200
All of my program works as expected except for this minor problem which I am having trouble solving. Even with the same exact inputs the bugs may or may not appear. I am guessing it has to do with memory allocation?
Thanks for your help!
You need to put '\0' at the end of your new string.
Most C library functions, such as printf and strlen, process strings assuming \0 marks the end. Without it, they keep reading memory out of bounds, either causing a memory violation or eventually hitting some 0 byte and stopping; all the bytes in between are printed as whatever (extended ASCII) characters they happen to encode, hence the strange output.
So, allocate an extra byte for \0 character and assign it to the last byte.
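A sketch of the same parsing with both fixes applied, assuming the input line looks like the example above; split_list and free_list are hypothetical names. Each word gets its length + 1 bytes and an explicit '\0', and the outer array is sized with sizeof *list (that is, sizeof(char *)). The question's first malloc uses sizeof(char), which allocates only 1 byte there, smaller than a pointer on typical platforms.

```c
#include <stdlib.h>
#include <string.h>

/* Frees every word and then the array itself. */
static void free_list(char **list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        free(list[i]);
    free(list);
}

/* Splits a line such as "Abcde, Fghijk, Lmnop\n" at commas, skipping one
   space after each comma, into a malloc'd array of malloc'd strings.
   Stores the word count in *count; returns NULL if allocation fails. */
char **split_list(const char *str, size_t *count)
{
    char **list = NULL;
    size_t n = 0;

    for (;;) {
        size_t len = strcspn(str, ",\n");   /* length of the next word */

        /* sizeof *list is sizeof(char *), not sizeof(char) */
        char **grown = realloc(list, (n + 1) * sizeof *list);
        if (grown == NULL) {
            free_list(list, n);
            return NULL;
        }
        list = grown;

        char *word = malloc(len + 1);       /* + 1 for the '\0' */
        if (word == NULL) {
            free_list(list, n);
            return NULL;
        }
        memcpy(word, str, len);
        word[len] = '\0';                   /* the terminator that was missing */
        list[n++] = word;

        str += len;
        if (*str != ',')                    /* '\n' or end of string: done */
            break;
        str++;                              /* skip the comma */
        if (*str == ' ')
            str++;                          /* and one space after it */
    }

    *count = n;
    return list;
}
```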
Either initialize your variables to null, or as tomato said, put a null character at the end of the new string.
C lacks many of the luxuries programmers now take for granted when it comes to memory management. You're on the right path with malloc but that function only allocates memory... it doesn't clear it out. As a result, your variables will have the correct amount of space (critical for reducing memory leaks and overflow errors), but will be filled with garbage. This garbage could be anything, and in your case, it's an upside down question mark. Appropriate, don't you think?
I could be mistaken since I can't run the code myself without more information, but after your
char **list;
list = (char**)malloc((x+1)*sizeof(char));
list[x] = (char*)malloc((y+1)*sizeof(char));
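statements, you'll want to clear out the garbage, for example by allocating the buffers with calloc(), which zero-fills them, instead of malloc(). (Assigning list = NULL would not clear anything; it would only discard the pointer and leak the memory.)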
Furthermore, you may care to use the strlen() function (contained in string.h) to figure out just how many blocks of memory you need to allocate.
Clearing out the spaces you use for variables is a good practice to get into with C. Good to see you learning it as well.

Can this function be any safer ? Looking for tips and your thoughts !

This is somewhat of an odd question.
I wrote a C function. It's 'like' strchr / strrchr. It's supposed to look for a character in a C string, but going backwards, and return a pointer to it. As C strings are not "null initiated", it also takes a third parameter, 'count', indicating the number of chars it should look backwards.
/*
*s: Position from where to start looking for the desired character.
*c: Character to look for.
*count: Amount of tests to be done
*
* Returns NULL if c is not in (s-count,s)
* Returns a pointer to the occurrence of c in s.
*/
char* b_strchr(const char* s,int c,size_t count){
while (count-->0){
if (*s==c) return s;
s--;
}
return NULL;
}
I have done some testing on it, but do you see any flaws in it? Security issues or so? Any enhancements? Could it be improved?
And more important: Is this a bad idea?
Some usage.
char* string = "1234567890";
printf("c: %c\n",*b_strchr(string+9,'5',10));//prints 5
printf("c: %c\n",*b_strchr(string+6,'1',7));//prints 1
EDIT: New interface, some changes.
/*
* from: Pointer to character where to start going back.
* begin: Pointer to character where search will end.
*
* Returns NULL if c is not between [begin,from]
* Otherwise, returns pointer to c.
*/
char* b_strchr(const char* begin,int c,const char* from){
while (begin<=from){
if (*from==c) return from;
from--;
}
return NULL;
}
It's better with the edit, but the interface is still surprising. I'd put the begin parameter (the haystack being searched) as the first parameter, the c parameter (the needle being searched for) second, and the from parameter (start position of the search) third. That order seems to be idiomatic across a fairly large set of APIs.
The code has an esoteric interface - pass in a pointer to the last character of the string and the length of the string. That will lead to problems using it.
(Alternatively, the code has a bug - you should add count to s before the loop.)
If begin is from, the current code will always return begin, which is not what you want. The code after the loop can just be return NULL. And instead of begin != from in the loop condition, I would use begin < from; otherwise you will get pointer arithmetic overflow when someone mixes up the order of the parameters.
Edit: on second thought since you want [begin, from] inclusive it should be begin <= from
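One way to write the [begin, from] version that never forms a pointer before begin is to start one past from and pre-decrement, comparing pointers instead of counting. This is a sketch under the assumption that both pointers point into the same string; like strrchr, it returns a non-const char * by casting away const.

```c
#include <stddef.h>

/* Searches backwards from `from` down to `begin`, both inclusive, for c.
   Returns a pointer to the last occurrence in [begin, from], or NULL.
   Because p starts one past `from` and is decremented before use, it
   never points before `begin`; if the arguments are swapped (still
   within the same string), the loop does not run and NULL is returned. */
char *b_strchr(const char *begin, int c, const char *from)
{
    const char *p = from + 1;   /* one past `from` is a valid pointer */
    while (p > begin) {
        if (*--p == (char)c)
            return (char *)p;
    }
    return NULL;
}
```

With this version, b_strchr(string, '5', string + 9) returns string + 4 for the question's "1234567890".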
I wrote a C function. Its 'like' strchr / strrchr.
You've attempted to reinvent strrchr(), so it's not like strchr().
Do you see any flaws in it?
Yes. Several. :-(
Since b_strchr() can return NULL, you shouldn't put it directly into the printf() statement. Dereferencing NULL usually results in a segfault.
You may be better off with your favourite variation of ...
char *result;
result = b_strchr(string + 9, 'a', 10);
if (result == NULL)
{
printf("c: NULL\n");
}
else
{
printf("c: %c\n", *result);
}
Also, when
(count is greater than the number of characters from the beginning of the string up to and including s) and the character is not found,
you're going to get unpredictable results, because s is no longer pointing to a character in the string; it is pointing to memory before the beginning of the string. As an example, try
result = b_strchr(string + 9, 'a', 11);
if (result == NULL)
{
printf("c: NULL\n");
}
else
{
printf("c: %c\n", *result);
}
and see what happens.
Expand your test cases to include conditions outside of what you know will work successfully. Ask someone else to help you design test cases that will really test your code.
And more important: Is this a bad idea?
As a learning exercise, absolutely not.
However, in this case, for production code you'd be better off sticking to the standard strrchr().

Resources