Handling empty string input leading to segmentation fault - c

I'm encoding the contents of a message struct into a buffer.
int encode(const struct message *msg, unsigned char *buffer, size_t max_size)
{
if (buffer == NULL)
return -1;
unsigned char *buf_pos = buffer;
unsigned char *ep = buffer + max_size;
if (buf_pos + 1 <= ep) {
*buf_pos++ = SYNC_WORD_1;
} else {
return buf_pos - buffer;
}
.
.
.
}
When I call encode(&message, "", 1024); I encounter a segmentation fault as expected. My understanding is that the segfault is caused by an attempt to access memory not allocated to the program, since "" will contain just the null terminator and I'm passing it in place of a pointer.
The problem I'm having is when I try to handle this error. I haven't found a way to identify the invalid input that doesn't either cause a false-positive with valid inputs or another segfault.
So what's the correct way to weed out this kind of input?

This cannot be done.
You're basically asking "given a pointer, how can I ensure that there are n bytes of writable space there?" which is a question C doesn't help you with.
This is, at its root, because pointers are just addresses: no additional meta information of the kind you're after is associated with each pointer value.
You can check the pointer for being NULL, but that's basically the only pointer value you can be certain is invalid. Non-portably (on embedded targets especially) you can get clever and check if the pointer is in various known non-writable regions, but that's still very coarse.
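Since the callee cannot verify the buffer, the contract has to sit with the caller: pass a writable array together with its real size. Here is a minimal sketch of that idea. The name encode_sync and the value of SYNC_WORD_1 are assumptions for illustration, since the original constant and the rest of encode() are not shown.

```c
#include <stddef.h>

#define SYNC_WORD_1 0xA5u  /* hypothetical value; the original constant is not shown */

/* Writes one sync byte if there is room. Returns the number of bytes
   written, or -1 for a NULL buffer. The function has to trust that
   max_size honestly describes the writable space behind buffer. */
int encode_sync(unsigned char *buffer, size_t max_size)
{
    if (buffer == NULL)
        return -1;
    if (max_size < 1)
        return 0;
    buffer[0] = SYNC_WORD_1;
    return 1;
}
```

A caller that declares unsigned char out[1024]; and calls encode_sync(out, sizeof out) keeps the promise. A caller that passes "" with a size of 1024 breaks it, and no check inside the function can detect that portably.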

I guess you are not checking the size of the buffer when you copy into buf_pos.
When trying to access buf_pos + 1, you may be going into memory you don't have access to, causing a segmentation fault.
Did you try using valgrind on your executable?

When asking a question about a runtime problem, as this question does, post the actual input, the expected output, the actual output and, most importantly, code that compiles cleanly, is short, and still exhibits the problem.
The following code will handle a pointer to a string that only contains a NUL byte.
However, that is not the only problem. If the passed-in buffer pointer points to a char array in read-only memory, the posted code would still result in a segfault.
int encode(const struct message *msg, unsigned char *buffer, size_t max_size)
{
if (buffer == NULL)
return -1;
if( buffer[0] == '\0' ) /* same test as strlen(buffer) == 0, without passing an unsigned char * to strlen */
return -1;
unsigned char *buf_pos = buffer;
unsigned char *ep = buffer + max_size;
if (buf_pos + 1 <= ep)
{
*buf_pos++ = SYNC_WORD_1;
}
else
{
return buf_pos - buffer;
}
.
.
.
}
To be able to help you more, you need to post the scenarios under which this function will be called.

Related

How to put a char into a empty pointer of a string in pure C

I want to store a single char into a char array pointer, and that action is in a while loop, adding a new char every time. I strictly want the text to end up in a variable and not be printed, because I am going to compare the text. Here's my code:
#include <stdio.h>
#include <string.h>
int main()
{
char c;
char *string;
while((c=getchar())!= EOF) //gets the next char in stdin and checks if stdin is not EOF.
{
char temp[2]; // I was trying to convert c, a char to temp, a const char so that I can use strcat to concatenate them to string but printf returns nothing.
temp[0]=c; //assigns temp
temp[1]='\0'; //null end point
strcat(string,temp); //concatenates the strings
}
printf(string); //prints out the string.
return 0;
}
I am using GCC on Debian (a POSIX/UNIX operating system) and want to have Windows compatibility.
EDIT:
I notice some communication errors with what I actually intend to do, so I will explain: I want to create a system where I can input an unlimited amount of characters, have that input be stored in a variable, and have it read back to me from that variable. To get around using realloc and malloc, I made it get the next available char until EOF. Keep in mind that I am a beginner to C (though most of you have probably guessed that already) and haven't had a lot of experience with memory management.
If you want an unlimited amount of character input, you'll need to actively manage the size of your buffer, which is not as hard as it sounds:
First use malloc to allocate, say, 1000 bytes.
Read until this runs out.
Use realloc to grow the allocation to 2000 bytes.
Read until this runs out, and so on.
Like this:
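#include <stdio.h>
#include <stdlib.h>
int main(void){
    size_t buf_size=1000;
    char* buf=malloc(buf_size);
    int c; //int, not char, so that EOF can be told apart from real characters
    size_t n=0;
    if(buf==NULL)
        return 1;
    while((c=getchar())!= EOF)
    {
        buf[n++] = (char)c;
        if(n>=buf_size-1) //keep one byte free for the trailing 0
        {
            buf_size+=1000;
            char* tmp=realloc(buf, buf_size);
            if(tmp==NULL)
            {
                free(buf);
                return 1;
            }
            buf=tmp;
        }
    }
    buf[n] = '\0'; //add trailing 0 at the end, to make it a proper string
    //do stuff with buf;
    free(buf);
    return 0;
}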
You won't get around using malloc-oids if you want unlimited input.
You have undefined behavior.
You never set string to point anywhere, so you can't dereference that pointer.
You need something like:
char buf[1024] = "", *string = buf;
that initializes string to point to valid memory where you can write, and also sets that memory to an empty string so you can use strcat().
Note that looping strcat() like this is very inefficient, since it needs to find the end of the destination string on each call. It's better to just use pointers.
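As a rough sketch of the pointer approach: keep a write pointer at the current end of the string, so that each append is constant-time and bounded. The helper name append_char is an assumption for illustration and is not part of any library.

```c
#include <stddef.h>

/* Appends c at *pos and re-terminates the string. end points one past
   the last byte of the buffer. Returns 1 on success, or 0 when there is
   no room left for c plus the trailing '\0'. */
int append_char(char **pos, const char *end, char c)
{
    if (end - *pos < 2)   /* need room for c and the terminator */
        return 0;
    *(*pos)++ = c;
    **pos = '\0';
    return 1;
}
```

With char buf[1024] = "", *p = buf; the reading loop would call append_char(&p, buf + sizeof buf, c) for each character and stop when it returns 0.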
char *string;
You've declared an uninitialised variable with this statement. With some compilers, in debug this may be initialised to 0. In other compilers and a release build, you have no idea what this is pointing to in memory. You may find that when you build and run in release, your program will crash, but appears to be ok in debug. The actual behaviour is undefined.
You need to either create a variable on the stack by doing something like this
char string[100]; // assuming you're not going to receive more than 99 characters (100 including the null terminator)
Or, on the heap:
char *string = (char*)malloc(100);
In which case you'll need to free the character array when you're finished with it.
Assuming you don't know how many characters the user will type, I suggest you keep track in your loop, to ensure you don't try to concatenate beyond the memory you've allocated.
Alternatively, you could limit the number of characters that a user may enter.
const int MAX_CHARS = 100;
char string[MAX_CHARS + 1]; // +1 for Null terminator
int numChars = 0;
while((numChars < MAX_CHARS) && ((c=getchar())!= EOF))
{
...
++numChars;
}
As I wrote in comments, you cannot avoid malloc() / calloc() and probably realloc() for a problem such as you have described, where your program does not know until run time how much memory it will need, and must not have any predetermined limit. In addition to the memory management issues on which most of the discussion and answers have focused, however, your code has some additional issues, including:
getchar() returns type int, and to correctly handle all possible inputs you must not convert that int to char before testing against EOF. In fact, for maximum portability you need to take considerable care in converting to char, for if default char is signed, or if its representation has certain other allowed (but rare) properties, then the value returned by getchar() may exceed its maximum value, in which case direct conversion has implementation-defined behavior (the result is implementation-defined, or an implementation-defined signal is raised). (In truth, though, this issue is often ignored, usually to no ill effect in practice.)
Never pass a user-provided string to printf() as the format string. It will not do what you want for some inputs, and it can be exploited as a security vulnerability. If you want to just print a string verbatim then fputs(string, stdout) is a better choice, but you can also safely do printf("%s", string).
Here's a way to approach your problem that addresses all of these issues:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>
#define INITIAL_BUFFER_SIZE 1024
int main()
{
char *string = malloc(INITIAL_BUFFER_SIZE);
size_t cap = INITIAL_BUFFER_SIZE;
size_t next = 0;
int c;
if (!string) {
// allocation error
return 1;
}
while ((c = getchar()) != EOF) {
if (next + 1 >= cap) {
/* insufficient space for another character plus a terminator */
cap *= 2;
string = realloc(string, cap);
if (!string) {
/* memory reallocation failure */
/* memory was leaked, but it's ok because we're about to exit */
return 1;
}
}
#if (CHAR_MAX != UCHAR_MAX)
/* char is signed; ensure defined behavior for the upcoming conversion */
if (c > CHAR_MAX) {
c -= UCHAR_MAX + 1; /* wrap modulo UCHAR_MAX + 1, so that e.g. 255 maps to -1 */
#if ((CHAR_MAX != (UCHAR_MAX >> 1)) || (CHAR_MAX == (-1 * CHAR_MIN)))
/* char's representation has more padding bits than unsigned
char's, or it is represented as sign/magnitude or ones' complement */
if (c < CHAR_MIN) {
/* not representable as a char */
return 1;
}
#endif
}
#endif
string[next++] = (char) c;
}
string[next] = '\0';
fputs(string, stdout);
return 0;
}

Concatenation using strcat and realloc produce unexpected errors

I have encountered the so-called cryptic realloc invalid next size error. I am using gcc on Linux; my code is:
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
int main()
{
int i;
char *buf;
char loc[120];
buf = malloc(1);
int size;
for(i=0;i<1920;i++)
{
sprintf(loc,"{Fill_next_token = my_next_token%d; Fill_next_token_id = my_next_token_id = my_next_token_id%d}",i,i);
size = strlen(buf)+strlen(loc);
printf("----%d\n",size);
if(!realloc(buf,size))
exit(1);
strcat(buf,loc);
}
}
(This might be a duplicate question.) The solutions I found avoid strcat and use memcpy instead, but in my case I really want to concatenate the strings. The code above works fine for about 920 iterations, but with 1920 iterations realloc gives an invalid next size error. Please help me find an alternative way of concatenating; I hope this turns out to be a helpful question for lazy programmers like me.
Your code has several issues:
You are not accounting for null terminator when deciding on the new length - it should be size = strlen(buf)+strlen(loc)+1;
You are ignoring the result of realloc - you need to check it for zero, and then assign it back to buf
You do not initialize buf to an empty string - this would make the first call of strlen produce undefined behavior (i.e. you need to add *buf = '\0';)
Once you fix these mistakes, your code should run correctly:
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
int main() {
int i;
char *buf= malloc(1);
*buf='\0';
char loc[120];
for(i=0;i<1920;i++) {
sprintf(loc,"{Fill_next_token = my_next_token%d; Fill_next_token_id = my_next_token_id = my_next_token_id%d}",i,i);
int size = strlen(buf)+strlen(loc)+1;
printf("----%d\n",size);
char *tmp = realloc(buf,size);
if(!tmp) exit(1);
buf = tmp;
strcat(buf, loc);
}
}
buf is not a valid string so strcat() will fail since it expects a \0 terminated string.
If you want to realloc() buf then you should assign the return value of realloc() to buf which you are not doing.
char *temp = realloc(buf,size+1);
if(temp != NULL)
buf = temp;
Point 1. Always use the return value of realloc() to access the newly allocated memory.
Point 2. strcat() needs a null-terminated string. Check the first iteration case.
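The points from these answers can be folded into one small helper. This is only a sketch, and the name append_str is an assumption: it sizes the new block with room for the terminator, keeps the old pointer if realloc fails, and copies at the known end of the string instead of letting strcat search for it on every call.

```c
#include <stdlib.h>
#include <string.h>

/* Appends s to the heap string *buf, which must be NULL or a valid
   null-terminated string obtained from malloc/realloc.
   Returns 0 on success, or -1 if realloc fails (*buf is then unchanged). */
int append_str(char **buf, const char *s)
{
    size_t old_len = (*buf != NULL) ? strlen(*buf) : 0;
    size_t add_len = strlen(s);
    char *tmp = realloc(*buf, old_len + add_len + 1);  /* +1 for '\0' */
    if (tmp == NULL)
        return -1;
    memcpy(tmp + old_len, s, add_len + 1);  /* copies the terminator too */
    *buf = tmp;
    return 0;
}
```

Starting from char *buf = NULL; the loop body becomes if (append_str(&buf, loc) != 0) exit(1); and the result is still a plain concatenated string.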

Segmentation Fault

So I need to make a specific hashcode function that meets a specific algorithm. The algorithm isn't really important in the context of this question. I'm getting a seg fault and am not sure how to fix it. I debugged it in gdb and found out it's from accessing an invalid memory address.
here's my code:
int hash_code(const char* str){
int len = strlen(str);
char* dst;
if(len == 0 )
return 0;
else{
strncpy(dst, str, (len - 1));
return (hash_code(dst) * 65599) + str[len-1];
}
}
I'm pretty confident that it's from the dst, but I'm not sure how to work around it, to not get the seg fault. What would I use or initialize dst with to avoid this?
strncpy does not null-terminate its output if the buffer is too small. For this reason, many people consider it a poor choice of function in almost all circumstances.
Your code has another problem in that dst does not point anywhere, but you try to write characters through it. Where do you imagine those characters are going? Likely this causes your segfault, trying to write characters to a random memory location that you have not allocated.
Assuming you want to stick with the recursive approach: Instead of making a copy of the string every time, change your function to pass the length of the string. Then you don't need to allocate any memory, nor waste any time calling strlen:
unsigned int hash_code(const char *str, size_t len)
{
if ( len == 0 )
return 0;
return hash_code(str, len - 1) * 65599 + str[len - 1];
}
Note - to avoid integer overflow problems, use an unsigned type for the hash value.
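If recursion is not a requirement, the same value can be computed with a loop, which avoids one stack frame per character. A minimal sketch, with the name hash_code_iter chosen here for illustration; note that it casts each character to unsigned char, so it can differ from the recursive version for characters outside the ASCII range on platforms where char is signed.

```c
#include <stddef.h>

/* Iterative form of the recurrence h(n) = h(n-1) * 65599 + str[n-1],
   with h(0) = 0. Unsigned arithmetic wraps around instead of overflowing. */
unsigned int hash_code_iter(const char *str, size_t len)
{
    unsigned int h = 0;
    for (size_t i = 0; i < len; i++)
        h = h * 65599u + (unsigned char)str[i];
    return h;
}
```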

Custom getLine() function for c

I need a function/method that will take in a char array and set it to a string read from stdin. It needs to return the last character read as its return type, so I can determine if it reached the end of a line or the end of file marker.
here is what I have so far, and I kind of based it off of code from here
UPDATE: I changed it, but now it just crashes upon hitting enter after text. I know this way is inefficient, and char is not the best for EOF check, but for now I am just trying to get it to return the string. I need it to do it in this fashion and no other fashion. I need the string to be the exact length of the line, and to return a value that is either the newline or EOF int which I believe can still be used in a char value.
This program is in C not C++
char getLine(char **line);
int main(int argc, char *argv[])
{
char *line;
char returnVal = 0;
returnVal = getLine(&line);
printf("%s", line);
free(line);
system("pause");
return 0;
}
char getLine(char **line) {
unsigned int lengthAdder = 1, counter = 0, size = 0;
char charRead = 0;
*line = malloc(lengthAdder);
while((charRead = getc(stdin)) != EOF && charRead != '\n')
{
*line[counter++] = charRead;
*line = realloc(*line, counter);
}
*line[counter] = '\0';
return charRead;
}
Thank you for any help in advance!
You're assigning the result of malloc() to a local copy of line, so after the getLine() function returns it's not modified (albeit you think it is). What you have to do is either return it (as opposed to use an output parameter) or pass its address (pass it 'by reference'):
void getLine(char **line)
{
*line = malloc(length);
// etc.
}
and call it like this:
char *line;
getLine(&line);
Your key problem is that line pointer value does not propagate out of the getLine() function. The solution is to pass pointer to the line pointer to the function as a parameter instead - calling it like getLine(&line); while the function would be defined as taking parameter char **line. In the function, on all places where you now work with line, you would work with *line instead, i.e. dereferencing the pointer to a pointer and working with the value of the variable in main() where the pointer leads. Hope this is not too confusing. :-) Try to draw it on a piece of paper.
(A tricky part - you must change line[counter] to (*line)[counter] because you first need to dereference the pointer to the string, and only then to access a specific character in the string.)
There are a couple of other problems with your code:
You use char as the type for charRead. However, the EOF constant cannot be represented using char; you need to use int, both as the type of charRead and as the return value of getLine(), so that you can actually distinguish between a newline and end of file.
You forgot to return the last char read from your getLine() function. :-)
You are reallocating the buffer after each character addition. This is not terribly efficient and therefore is a rather ugly programming practice. It is not too difficult to use another variable to track the amount of space allocated and then (i) start with allocating a reasonable chunk of memory, e.g. 64 bytes, so that ideally you will never reallocate (ii) enlarge the allocation only if you need to based on comparing the counter and your allocation size tracker. Two reallocation strategies are common - either doubling the size of the allocation or increasing the allocation by a fixed step.
The way you use realloc is not correct. If it returns NULL then the memory block will be lost.
It is better to use realloc in this way:
char *tmp;
...
tmp = realloc(line, counter);
if(tmp == NULL) {
/* handle the error here; the old block at line is still allocated */
} else {
line = tmp;
}
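Putting the advice from these answers together, one possible shape for the function is sketched below. The name get_line, the FILE * parameter and the initial capacity of 64 are assumptions for illustration; the asker's version reads from stdin only.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads one line from fp into a freshly allocated string stored in *line.
   Returns '\n' or EOF, whichever ended the line. On allocation failure it
   returns EOF and leaves *line set to NULL. The caller frees *line. */
int get_line(FILE *fp, char **line)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    int c = EOF;   /* int, so EOF stays distinct from every character */

    *line = NULL;
    if (buf == NULL)
        return EOF;
    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len + 1 >= cap) {           /* keep one byte for '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return EOF;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    buf[len] = '\0';
    *line = buf;
    return c;
}
```

A call such as int last = get_line(stdin, &line); then tells the caller whether the line ended at a newline or at end of file. If the allocation must match the line length exactly, a final realloc down to len + 1 bytes could be added before returning.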

Implementing a string copy function in C

At a recent job interview, I was asked to implement my own string copy function. I managed to write code that I believe works to an extent. However, when I returned home to try the problem again, I realized that it was a lot more complex than I had thought. Here is the code I came up with:
#include <stdio.h>
#include <stdlib.h>
char * mycpy(char * d, char * s);
int main() {
int i;
char buffer[1];
mycpy(buffer, "hello world\n");
printf("%s", buffer);
return 0;
}
char * mycpy (char * destination, char * source) {
if (!destination || !source) return NULL;
char * tmp = destination;
while (*destination != NULL || *source != NULL) {
*destination = *source;
destination++;
source++;
}
return tmp;
}
I looked at some other examples online and found that since all strings in C are null-terminated, I should have read up to the null character and then appended a null character to the destination string before exiting.
However one thing I'm curious about is how memory is being handled. I noticed if I used the strcpy() library function, I could copy a string of 10 characters into a char array of size 1. How is this possible? Is the strcpy() function somehow allocating more memory to the destination?
A good interview question has several layers, on which the candidate can demonstrate different levels of understanding.
On the syntactic 'C language' layer, the following code is from the classic Kernighan and Ritchie book ('The C programming language'):
while( *dest++ = *src++ )
;
In an interview, you could indeed point out that the function isn't safe: most notably, nothing guarantees that the buffer at dest is large enough. Also, the buffers may overlap; if dest points into the middle of the src buffer, the terminator is overwritten before it is read and the loop never ends (until it eventually triggers a memory access fault).
As the other answers have said, you're overwriting the buffer, so for the sake of your test change it to:
char buffer[ 12 ];
For the job interview they were perhaps hoping for:
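char *mycpy( char *s, char *t )
{
char *start = s; /* remember the beginning of the destination */
while ( (*s++ = *t++) )
{
;
}
return start; /* like strcpy(), return the destination rather than its end */
}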
No, it's that strcpy() isn't safe and is overwriting the memory after it, I think. You're supposed to use strncpy() instead.
No, you're writing past the buffer and overwriting (in this case) the rest of your stack past buffer. This is very dangerous behavior.
In general, you should always create methods that supply limits. In most C libraries, these methods are denoted by an n in the method name.
C does not do any run-time bounds checking like other languages (C#, Java, etc.). That is why you can write past the end of the array. However, you won't be able to access that string in some cases, because you might be encroaching upon memory that doesn't belong to you, giving you a segmentation fault. K&R would be a good book to learn such concepts.
The strcpy() function forgoes memory management entirely, therefore all allocation needs to be done before the function is called, and freed afterward when necessary. If your source string has more characters than the destination buffer, strcpy() will just keep writing past the end of the buffer into unallocated space, or into space that's allocated for something else.
This can be very bad.
strncpy() works similarly to strcpy(), except that it allows you to pass an additional variable describing the size of the buffer, so the function will stop copying when it reaches this limit. This is safer, but still relies on the calling program to allocate and describe the buffer properly -- it can still go past the end of the buffer if you provide the wrong length, leading to the same problems.
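One more caveat when relying on strncpy(): it does not write a terminator when the source has as many characters as the limit, or more. A common pattern, sketched here with a hypothetical wrapper named copy_terminated, is to copy at most size minus one characters and then terminate by hand.

```c
#include <string.h>

/* Copies src into dst with strncpy and then forces termination, because
   strncpy leaves dst unterminated when src has dst_size - 1 or more
   characters. Long input is silently truncated. */
void copy_terminated(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return;
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}
```

The caller is still responsible for passing the true size of the destination, for example copy_terminated(buffer, sizeof buffer, "hello world\n").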
char * mycpy (char * destination, char * source) {
if (!destination || !source) return NULL;
char * tmp = destination;
while (*destination != NULL || *source != NULL) {
*destination = *source;
destination++;
source++;
}
return tmp;
}
In the above copy implementation, tmp and destination refer to the same data. It's better not to return any data and instead let destination be your out parameter. Can you rewrite it that way?
The version below works for me. I'm not sure if it is bad design though:
while(source[i] != '\0' && (i<= (MAXLINE-1)))
{
dest[i]=source[i];
++i;
}
In general it's always a good idea to have const modifier where it's possible, for example for the source parameter.
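Several of the suggestions above (a size limit, explicit termination, a const source) can be combined into one function. This is only a sketch under those assumptions; the name mycpy_bounded and its return convention are invented here and are not what the interviewer necessarily expected.

```c
#include <stddef.h>

/* Copies src into dst, writing at most dst_size bytes including the '\0'.
   Returns the number of characters copied, not counting the terminator.
   The result is always terminated when dst_size > 0; long input is truncated. */
size_t mycpy_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t i = 0;
    if (dst == NULL || src == NULL || dst_size == 0)
        return 0;
    while (i + 1 < dst_size && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    return i;
}
```

With char buffer[12]; the call mycpy_bounded(buffer, sizeof buffer, "hello world\n") stores the first 11 characters and a terminator instead of writing past the array.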

Resources