Segmentation Fault - c

So I need to write a hash code function that implements a specific algorithm. The algorithm isn't really important in the context of this question. I'm getting a seg fault and am not sure how to fix it. I debugged it in gdb and found out it's from accessing an invalid memory address.
Here's my code:
int hash_code(const char* str){
    int len = strlen(str);
    char* dst;
    if(len == 0 )
        return 0;
    else{
        strncpy(dst, str, (len - 1));
        return (hash_code(dst) * 65599) + str[len-1];
    }
}
I'm pretty confident that it's from the dst, but I'm not sure how to work around it, to not get the seg fault. What would I use or initialize dst with to avoid this?

strncpy does not null-terminate its output if the source string is at least as long as the count you pass. In your call you copy len - 1 characters of a len-character string, so dst is never terminated. For this reason, many people consider strncpy a poor choice of function in almost all circumstances.
Your code has another problem: dst does not point anywhere, but you try to write characters through it. Where do you imagine those characters are going? This is most likely the cause of your segfault: you are writing characters to a random memory location that you have not allocated.
Assuming you want to stick with the recursive approach: Instead of making a copy of the string every time, change your function to pass the length of the string. Then you don't need to allocate any memory, nor waste any time calling strlen:
unsigned int hash_code(const char *str, size_t len)
{
    if ( len == 0 )
        return 0;
    return hash_code(str, len - 1) * 65599 + str[len - 1];
}
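Call it as hash_code(s, strlen(s)). Note: to avoid integer overflow problems, use an unsigned type for the hash value. Signed overflow is undefined behavior in C, while unsigned arithmetic simply wraps around.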
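For reference, here is a minimal sketch putting the recursive version next to a small wrapper and an equivalent loop. The recursion computes h(n) = h(n-1) * 65599 + str[n-1], which is exactly what a left-to-right loop computes, so the loop gives the same result without O(n) recursion depth. The names hash_string and hash_iterative are hypothetical, not from the question.

```c
#include <stddef.h>
#include <string.h>

/* The recursive version from the answer: hash of the first len characters. */
unsigned int hash_code(const char *str, size_t len)
{
    if (len == 0)
        return 0;
    return hash_code(str, len - 1) * 65599 + str[len - 1];
}

/* Hypothetical convenience wrapper: calls strlen() only once. */
unsigned int hash_string(const char *str)
{
    return hash_code(str, strlen(str));
}

/* Hypothetical loop form of the same recurrence:
   h = h * 65599 + c for each character, left to right. */
unsigned int hash_iterative(const char *str)
{
    unsigned int h = 0;
    for (; *str != '\0'; str++)
        h = h * 65599 + *str;
    return h;
}
```

Because both versions use unsigned arithmetic, any overflow wraps the same way, so they agree on every input.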

Related

Why is this use of strcpy considered bad?

I've spotted the following piece of C code, marked as BAD (aka buffer overflow bad).
The problem is that I don't quite see why. The input string length is computed before the allocation, etc.
char *my_strdup(const char *s)
{
    size_t len = strlen(s) + 1;
    char *c = malloc(len);
    if (c) {
        strcpy(c, s); // BAD
    }
    return c;
}
Update from comments:
the 'BAD' marker is not precise. The code is not buggy; it is arguably inefficient, and somewhat risky (see below).
Why risky? The +1 after the strlen() call is required to allocate enough space on the heap to also hold the string terminator ('\0'), and it is easy to forget.
There is no bug in your sample function.
However, to make it obvious to future readers (both human and mechanical) that there is no bug, you should replace the strcpy call with a memcpy:
char *my_strdup(const char *s)
{
    size_t len = strlen(s) + 1;
    char *c = malloc(len);
    if (c) {
        memcpy(c, s, len);
    }
    return c;
}
Either way, len bytes are allocated and len bytes are copied, but with memcpy that fact stands out much more clearly to the reader.
There's no problem with this code.
While it's possible that strcpy can cause undefined behavior if the destination buffer isn't large enough to hold the string in question, the buffer is allocated to be the correct size. This means there is no risk of overrunning the buffer.
You may see some guides recommend using strncpy instead, which allows you to specify the maximum number of characters to copy, but this has its own problems. If the source string is too long, only the specified number of characters will be copied; however, this also means that the string isn't null-terminated, so the user has to terminate it manually. For example:
char src[] = "test data";
char dest[5];
strncpy(dest, src, sizeof dest); // dest holds "test " with no null terminator
dest[sizeof(dest) - 1] = 0; // manually null terminate, dest holds "test"
I tend towards the use of strcpy if I know the source string will fit, otherwise I'll use strncpy and manually null-terminate.
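If truncation is acceptable, the strncpy-then-terminate pattern can be wrapped in one place so it is never forgotten. This is a minimal sketch; the helper name copy_string is hypothetical. It uses strlen and memcpy; snprintf(dest, size, "%s", src) gives the same truncate-and-terminate behaviour.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dest (which holds size bytes),
   truncating if needed and always NUL-terminating when size > 0.
   Returns 1 if the whole string fit, 0 if it was truncated. */
int copy_string(char *dest, size_t size, const char *src)
{
    if (size == 0)
        return 0;
    size_t len = strlen(src);
    size_t n = len < size - 1 ? len : size - 1;  /* leave room for '\0' */
    memcpy(dest, src, n);
    dest[n] = '\0';
    return len < size;
}
```

With char dest[5], copy_string(dest, sizeof dest, "test data") leaves "test" in dest and returns 0, so the caller can tell that truncation happened.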
I cannot see any problem with the code when it comes to the use of strcpy.
But you should be aware that it requires s to be a valid C string. That is a reasonable requirement, but it should be specified.
If you want, you could put in a simple check for NULL, but I would say that it's ok to do without it. If you're about to make a copy of a "string" pointed to by a null pointer, then you probably should check either the argument or the result. But if you want, just add this as the first line:
if(!s) return NULL;
But as I said, it does not add much. It just makes it possible to change
if(!str) {
    // Handle error
} else {
    new_str = my_strdup(str);
}
to:
new_str = my_strdup(str);
if(!new_str) {
    // Handle error
}
Not really a huge gain.

Local variables and memory in c

I'm somewhat new to C and am wondering about certain things about memory allocation. My function is as follows:
size_t count_nwords(const char* str) {
    //char* copied_str = strdup(str); // because 'strtok()' alters the string it passes through
    char copied_str[strlen(str)];
    strcpy(copied_str, str);
    size_t count = 1;
    strtok(copied_str, " ");
    while(strtok(NULL, " ") != 0) {
        count++;
    }
    //free(copied_str);
    return count;
}
This function counts the number of words in a string (the delimiter is a space, i.e. " "). I do not want the string passed as an argument to be modified.
I have two questions:
Should the strdup() way (which is the commented part in the code) be preferred over the strcpy() one? My understanding is that strcpy() is sufficient and faster, but I am not certain.
Since no memory is allocated for the size_t value to be returned (it's a local variable), should it be done in order to ensure the function is robust? Or is using size_t nwords = count_nwords(copied_input); completely safe and will always properly get the returned value?
Thank you!
EDIT: I've accepted the only answer that concerned my questions precisely, but I advise reading the other answers as they provide good insights regarding errors I had made in my code.
Failure to account for the null character
// char copied_str[strlen(str)];
char copied_str[strlen(str) + 1];
strcpy(copied_str, str);
Wrong algorithm
Even with the above fix, the code returns 1 for count_nwords(" ") (and for an empty string), where 0 is expected.
Unnecessary copying of string
strtok() is not needed here, and neither is a copy of the string.
Alternative: walk the string.
size_t count_nwords(const char* str) {
    size_t count = 0;
    while (*str) {
        while (isspace((unsigned char) *str)) {
            str++;
        }
        if (*str) {
            count++;
            while (!isspace((unsigned char) *str) && *str) {
                str++;
            }
        }
    }
    return count;
}
Another option is the state-loop approach, where you loop over each character and keep track of the state of your count with a simple flag (you are either in a word reading characters, or you are reading spaces). The benefit is that only a single loop is involved. A short example would be:
size_t count_words (const char *str)
{
    size_t words = 0;
    int in_word = 0;
    while (*str) {
        if (isspace ((unsigned char)*str))
            in_word = 0;
        else {
            if (!in_word)
                words++;
            in_word = 1;
        }
        str++;
    }
    return words;
}
It is worth understanding all of these techniques. isspace requires including <ctype.h>.
Should the strdup() way (which is the commented part in the code) be preferred over the strcpy() one? My understanding is that strcpy() is sufficient and faster, but I am not certain.
Your solution is clean and works well, so don't bother. The only caveat is that you are using a VLA, which has been an optional feature since C11, so using strdup avoids that particular portability concern (though strdup itself came from POSIX and was only added to ISO C in C23). Regarding performance, since the standard does not specify how VLAs are implemented, performance may vary from compiler to compiler and platform to platform (gcc is known to put VLAs on the stack, but another compiler could use the heap). All we know is that strdup allocates on the heap. I doubt that a performance problem will come from such a choice.
Note: your allocation size is wrong; it should be at least strlen(str)+1.
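For comparison, here is a sketch of the heap-based variant with both fixes applied: the copy gets strlen(str) + 1 bytes, and the count starts at 0 so that an all-space string counts as zero words. The name count_nwords_heap and the (size_t)-1 failure value are assumptions for illustration; malloc plus memcpy is used in place of strdup so the sketch does not depend on POSIX or C23.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical heap-based word counter.
   Returns (size_t)-1 if the copy cannot be allocated. */
size_t count_nwords_heap(const char *str)
{
    size_t len = strlen(str) + 1;          /* + 1 for the '\0' */
    char *copy = malloc(len);
    if (copy == NULL)
        return (size_t)-1;
    memcpy(copy, str, len);                /* strtok() may modify the copy */

    size_t count = 0;                      /* start at 0, not 1 */
    for (char *tok = strtok(copy, " "); tok != NULL; tok = strtok(NULL, " "))
        count++;

    free(copy);                            /* a heap copy must be freed */
    return count;
}
```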
Since no memory is allocated for the size_t value to be returned (it's a local variable), should it be done in order to ensure the function is robust? Or is using size_t nwords = count_nwords(copied_input); completely safe and will always properly get the returned value?
Managing return values, and the memory they need, is the compiler's job. Small values such as a size_t are typically returned in a CPU register, while larger values may be passed through the stack (have some reading on "calling conventions" and "stack frames"). Either way, the value is copied out to the caller before the function's local variables cease to exist, so size_t nwords = count_nwords(copied_input); is completely safe.
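A small sketch of the distinction: returning a local by value is safe because the caller receives a copy, and this holds even for structs containing arrays. What is not safe is returning the address of a local. The function and struct names here are hypothetical.

```c
#include <stddef.h>

/* Returning a local by value: the caller gets a copy of total. */
size_t count_up_to(size_t n)
{
    size_t total = 0;               /* local variable */
    for (size_t i = 1; i <= n; i++)
        total += i;
    return total;                   /* the value is copied out */
}

/* The same holds for structs, even ones containing arrays. */
struct pair { int values[2]; };

struct pair make_pair(int a, int b)
{
    struct pair p = { { a, b } };   /* local struct */
    return p;                       /* the whole struct is copied */
}

/* NOT safe: returning the address of a local, e.g.
       int *bad(void) { int x = 1; return &x; }
   because x no longer exists once bad() returns. */
```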

Handling empty string input leading to segmentation fault

I'm encoding the contents of a message struct into a buffer.
int encode(const struct message *msg, unsigned char *buffer, size_t max_size)
{
    if (buffer == NULL)
        return -1;
    unsigned char *buf_pos = buffer;
    unsigned char *ep = buffer + max_size;
    if (buf_pos + 1 <= ep) {
        *buf_pos++ = SYNC_WORD_1;
    } else {
        return buf_pos - buffer;
    }
    ...
}
When I call encode(&message, "", 1024); I encounter a segmentation fault as expected. My understanding is that the segfault is caused by an attempt to access memory not allocated to the program, since "" will contain just the null terminator and I'm passing it in place of a pointer.
The problem I'm having is when I try to handle this error. I haven't found a way to identify the invalid input that doesn't either cause a false-positive with valid inputs or another segfault.
So what's the correct way to weed out this kind of input?
This cannot be done.
You're basically asking "given a pointer, how can I ensure that there are n bytes of writable space there?", which is a question C doesn't help you with.
This is, at its root, because pointers are just addresses, there is no additional meta information of the kind you're after associated with each pointer value.
You can check the pointer for being NULL, but that's basically the only pointer value you can be certain is invalid. Non-portably (on embedded targets especially) you can get clever and check if the pointer is in various known non-writable regions, but that's still very coarse.
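Since C cannot validate a raw pointer, the usual approach is to make the contract explicit: the caller must supply a writable buffer together with its true size, and the function checks only what it can (NULL and the claimed size). This is a cut-down sketch that writes only the sync word; the name encode_sync and the value 0xAA for SYNC_WORD_1 are hypothetical.

```c
#include <stddef.h>

#define SYNC_WORD_1 0xAA   /* hypothetical value for illustration */

/* Hypothetical, cut-down encoder.
   Contract: buffer must point to at least max_size writable bytes.
   Passing a string literal such as "" violates that contract and
   cannot be detected here. Returns bytes written, or -1 on error. */
int encode_sync(unsigned char *buffer, size_t max_size)
{
    if (buffer == NULL)
        return -1;
    if (max_size < 1)          /* not enough room for the sync word */
        return -1;
    buffer[0] = SYNC_WORD_1;
    return 1;
}
```

A correct call passes an array the caller owns, for example unsigned char buf[16]; encode_sync(buf, sizeof buf);, so the size argument is guaranteed to match the buffer.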
I guess you are not checking the size of the buffer before writing through buf_pos.
When writing at buf_pos and beyond, you may be touching memory you don't have access to, causing a segmentation fault.
Did you try using valgrind on your executable?
When asking a question about a runtime problem, as this question does, post the actual input, the expected output, the actual output and, most importantly, code that cleanly compiles, is short, and still exhibits the problem.
The following code will reject a pointer to a string that only contains a NUL byte.
However, that is not the only problem. If the passed-in buffer pointer points to a char array in read-only memory, the posted code would still result in a segfault. Note also that this check treats the output buffer as a string, so it will also reject a valid, writable buffer whose first byte happens to be 0.
int encode(const struct message *msg, unsigned char *buffer, size_t max_size)
{
    if (buffer == NULL)
        return -1;
    if( strlen((const char *)buffer) == 0 )   // strlen() expects a char pointer
        return -1;
    unsigned char *buf_pos = buffer;
    unsigned char *ep = buffer + max_size;
    if (buf_pos + 1 <= ep)
    {
        *buf_pos++ = SYNC_WORD_1;
    }
    else
    {
        return buf_pos - buffer;
    }
    ...
}
To be able to help you more, you need to post the scenarios under which this function will be called.

best practice for returning a variable length string in c

I have a string function that accepts a pointer to a source string and returns a pointer to a destination string. This function currently works, but I'm worried I'm not following the best practice regarding malloc, realloc, and free.
The thing that's different about my function is that the length of the destination string is not the same as the source string, so realloc() has to be called inside my function. I know from looking at the docs...
http://www.cplusplus.com/reference/cstdlib/realloc/
that the memory address might change after the realloc. This means I can't "pass by reference" like a C programmer might for other functions; I have to return the new pointer.
So the prototype for my function is:
//decode a uri encoded string
char *net_uri_to_text(char *);
I don't like the way I'm doing it because I have to free the pointer after running the function:
char * chr_output = net_uri_to_text("testing123%5a%5b%5cabc");
printf("%s\n", chr_output); //testing123Z[\abc
free(chr_output);
Which means that malloc() and realloc() are called inside my function and free() is called outside my function.
I have a background in high level languages, (perl, plpgsql, bash) so my instinct is proper encapsulation of such things, but that might not be the best practice in C.
The question: Is my way best practice, or is there a better way I should follow?
full example
Compiles and runs with two warnings on unused argc and argv arguments, you can safely ignore those two warnings.
example.c:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
char *net_uri_to_text(char *);
int main(int argc, char ** argv) {
    char * chr_input = "testing123%5a%5b%5cabc";
    char * chr_output = net_uri_to_text(chr_input);
    printf("%s\n", chr_output);
    free(chr_output);
    return 0;
}
//decodes uri-encoded string
//send pointer to source string
//return pointer to destination string
//WARNING!! YOU MUST USE free(chr_result) AFTER YOU'RE DONE WITH IT OR YOU WILL GET A MEMORY LEAK!
char *net_uri_to_text(char * chr_input) {
    //define variables
    int int_length = strlen(chr_input);
    int int_new_length = int_length;
    char * chr_output = malloc(int_length);
    char * chr_output_working = chr_output;
    char * chr_input_working = chr_input;
    int int_output_working = 0;
    unsigned int uint_hex_working;
    //while not a null byte
    while(*chr_input_working != '\0') {
        //if %
        if (*chr_input_working == *"%") {
            //then put correct char in
            sscanf(chr_input_working + 1, "%02x", &uint_hex_working);
            *chr_output_working = (char)uint_hex_working;
            //printf("special char:%c, %c, %d<\n", *chr_output_working, (char)uint_hex_working, uint_hex_working);
            //realloc
            chr_input_working++;
            chr_input_working++;
            int_new_length -= 2;
            chr_output = realloc(chr_output, int_new_length);
            //output working must be the new pointer plus how many chars we've done
            chr_output_working = chr_output + int_output_working;
        } else {
            //put char in
            *chr_output_working = *chr_input_working;
        }
        //increment pointers and number of chars in output working
        chr_input_working++;
        chr_output_working++;
        int_output_working++;
    }
    //last null byte
    *chr_output_working = '\0';
    return chr_output;
}
It's perfectly OK to return malloc'd buffers from functions in C, as long as you document the fact that they do. Lots of libraries do that, even though, until strdup and strndup were added in C23, no function in the standard library did.
If you can compute (a not too pessimistic upper bound on) the number of characters that need to be written to the buffer cheaply, you can offer a function that does that and let the user call it.
It's also possible, but much less convenient, to accept a buffer to be filled in; I've seen quite a few libraries that do that like so:
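#include <ctype.h>
#include <stdlib.h>
/*
 * Decodes uri-encoded string encoded into buf of length len (including NUL).
 * Returns the space needed, including the NUL. If that is greater than len,
 * the output was truncated; try again with a larger buffer.
 */
size_t net_uri_to_text(char const *encoded, char *buf, size_t len)
{
    size_t space_needed = 0;
    while (*encoded) {
        char c;
        if (encoded[0] == '%' && isxdigit((unsigned char)encoded[1])
                && isxdigit((unsigned char)encoded[2])) {
            char hex[3] = { encoded[1], encoded[2], '\0' };
            c = (char)strtol(hex, NULL, 16);
            encoded += 3;
        } else {
            c = *encoded++;  // ordinary character, or a malformed '%'
        }
        // only write to buf if it wouldn't overflow (keep room for the NUL);
        // increment space_needed regardless
        if (space_needed + 1 < len)
            buf[space_needed] = c;
        space_needed++;
    }
    if (len > 0)
        buf[space_needed < len ? space_needed : len - 1] = '\0';
    return space_needed + 1;
}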
Now the caller is responsible for the allocation, and would do something like
size_t len = SOME_VALUE_THAT_IS_USUALLY_LONG_ENOUGH;
char *result = xmalloc(len);
size_t needed = net_uri_to_text(input, result, len);
if (needed > len) {
    // try again with a buffer that is exactly large enough
    result = xrealloc(result, needed);
    net_uri_to_text(input, result, needed);
}
(Here, xmalloc and xrealloc are "safe" allocating functions that I made up to skip NULL checks.)
The thing is that C is low-level enough to force the programmer to get her memory management right. In particular, there's nothing wrong with returning a malloc()ated string. It's a common idiom to return malloc()ated objects and have the caller free() them.
And anyways, if you don't like this approach, you can always take a pointer to the string and modify it from inside the function (after the last use, it will still need to be free()d, though).
One thing, however, that I don't think is necessary is explicitly shrinking the string. If the new string is shorter than the old one, there's obviously enough room for it in the memory chunk of the old string, so you don't need to realloc().
(Apart from the fact that you forgot to allocate one extra byte for the terminating NUL character, of course...)
And, as always, you can just return a different pointer each time the function is called, and you don't even need to call realloc() at all.
If you accept one last piece of good advice: it's advisable to const-qualify your input strings, so the caller can ensure that you don't modify them. Using this approach, you can safely call the function on string literals, for example.
All in all, I'd rewrite your function like this:
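#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *unescape(const char *s)
{
    size_t l = strlen(s);
    char *p = malloc(l + 1), *r = p;
    if (p == NULL)
        return NULL;
    while (*s) {
        if (s[0] == '%' && isxdigit((unsigned char)s[1])
                && isxdigit((unsigned char)s[2])) {
            char buf[3] = { s[1], s[2], 0 };
            *p++ = (char)strtol(buf, NULL, 16); // yes, I prefer this over scanf()
            s += 3;
        } else {
            *p++ = *s++; // plain character, or a '%' without two hex digits
        }
    }
    *p = 0;
    return r;
}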
And call it as follows:
int main(void)
{
    const char *in = "testing123%5a%5b%5cabc";
    char *out = unescape(in);
    if (out == NULL)
        return 1;
    printf("%s\n", out);
    free(out);
    return 0;
}
It's perfectly OK to return newly-malloc-ed (and possibly internally realloced) values from functions, you just need to document that you are doing so (as you do here).
Other obvious items:
Instead of int int_length you might want to use size_t. This is "an unsigned type" (usually unsigned int or unsigned long) that is the appropriate type for lengths of strings and arguments to malloc.
You need to allocate n+1 bytes initially, where n is the length of the string, as strlen does not include the terminating 0 byte.
You should check for malloc failing (returning NULL). If your function will pass the failure on, document that in the function-description comment.
sscanf is pretty heavy-weight for converting the two hex digits. Not wrong, except that you're not checking whether the conversion succeeds (what if the input is malformed? You can of course decide that this is the caller's problem, but in general you might want to handle it). You can use isxdigit from <ctype.h> to check for hexadecimal digits, and/or strtoul to do the conversion.
Rather than doing one realloc for every % conversion, you might want to do a final "shrink realloc" if desirable. Note that if you allocate (say) 50 bytes for a string and find it requires only 49 including the final 0 byte, it may not be worth doing a realloc after all.
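A sketch of the single final shrink, assuming that a '%' not followed by two hex digits is copied through literally (that policy is an assumption, not something the question specifies). The decoded text is never longer than the input, so the input length plus one is a safe first allocation. The name uri_decode_shrink is hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical decoder: allocate the worst case, decode once,
   then shrink with one realloc at the end. Caller must free(). */
char *uri_decode_shrink(const char *in)
{
    char *out = malloc(strlen(in) + 1);   /* + 1 for the '\0' */
    if (out == NULL)
        return NULL;
    char *p = out;
    while (*in) {
        if (in[0] == '%' && isxdigit((unsigned char)in[1])
                && isxdigit((unsigned char)in[2])) {
            char hex[3] = { in[1], in[2], '\0' };
            *p++ = (char)strtoul(hex, NULL, 16);
            in += 3;
        } else {
            *p++ = *in++;                 /* copied through unchanged */
        }
    }
    *p = '\0';
    /* One final shrink; if realloc fails, the larger block is still valid. */
    char *shrunk = realloc(out, (size_t)(p - out) + 1);
    return shrunk != NULL ? shrunk : out;
}
```

Keeping the old pointer when realloc returns NULL matters: assigning the result straight back to out would leak the buffer on failure.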
I would approach the problem in a slightly different way. Personally, I would split your function in two. The first function to calculate the size you need to malloc. The second would write the output string to the given pointer (which has been allocated outside of the function). That saves several calls to realloc, and will keep the complexity the same. A possible function to find the size of the new string is:
size_t getNewSize(const char *string) {
    size_t size = 0, percent = 0;
    for (const char *i = string; *i != '\0'; i++, size++) {
        if (*i == '%')
            percent++;
    }
    // assumes every '%' starts a valid "%XX" escape; add 1 for the NUL when allocating
    return size - percent * 2;
}
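The second half of that split, writing into a buffer the caller allocated, might look like the sketch below. It is self-contained, so it repeats a length function under a different, hypothetical name (decoded_length), and it treats a '%' that is not followed by two hex digits as an ordinary character, which is an assumption. The names is_escape and decode_into are also hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: does s start with a valid "%XX" escape? */
static int is_escape(const char *s)
{
    return s[0] == '%' && isxdigit((unsigned char)s[1])
        && isxdigit((unsigned char)s[2]);
}

/* First pass: length of the decoded string, excluding the '\0'. */
size_t decoded_length(const char *s)
{
    size_t n = 0;
    while (*s) {
        s += is_escape(s) ? 3 : 1;
        n++;
    }
    return n;
}

/* Second pass: out must hold at least decoded_length(s) + 1 bytes. */
void decode_into(const char *s, char *out)
{
    while (*s) {
        if (is_escape(s)) {
            char hex[3] = { s[1], s[2], '\0' };
            *out++ = (char)strtoul(hex, NULL, 16);
            s += 3;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}
```

The caller then does one malloc(decoded_length(in) + 1), calls decode_into, and later frees the buffer itself, so allocation and free stay on the same side of the call.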
However, as mentioned in other answers there is no problem in returning a malloc'ed buffer as long as you document it!
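In addition to what was already mentioned in the other answers, if your function ever reallocates the string passed in, you should document that too: if it is called with a static string or a string allocated with alloca, that string must not be reallocated. (The posted function only reallocates its own output buffer, so this applies only if you change it to work on the input in place.)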
I think you are right to be concerned about splitting up mallocs and frees. As a rule, whatever makes it, owns it and should free it.
In this case, where the strings are relatively small, one good procedure is to make the string buffer larger than any string it will realistically need to hold. For example, URLs have a de facto limit of about 2000 characters, so if you malloc 10000 characters you can store practically any URL you will encounter.
Another trick is to store both the length and the capacity of the string at its front, so that *(int *)mystring == length of string and *(int *)(mystring + 4) == capacity of string (assuming a 4-byte int). The string itself then starts at mystring + 8. By doing this you can pass around a single pointer and always know how long the string is and how much memory capacity it has. You can write macros that generate these offsets automatically and make "pretty code".
The value of using buffers this way is you do not need to do a reallocation. The new value overwrites the old value and you update the length at the beginning of the string.
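A variation on the same idea, sketched with a struct header and a C99 flexible array member instead of hand-computed byte offsets, so the compiler takes care of the integer sizes and alignment. This is a swapped-in technique rather than the exact layout described above, and all names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-prefixed string: header followed by the chars. */
struct lstring {
    size_t len;   /* current length, excluding the '\0' */
    size_t cap;   /* usable characters, excluding the '\0' */
    char data[];  /* flexible array member holding the characters */
};

struct lstring *lstring_new(size_t cap)
{
    struct lstring *s = malloc(sizeof *s + cap + 1);  /* + 1 for '\0' */
    if (s == NULL)
        return NULL;
    s->len = 0;
    s->cap = cap;
    s->data[0] = '\0';
    return s;
}

/* Overwrites the contents in place, without reallocating.
   Returns 0 on success, -1 if text does not fit. */
int lstring_set(struct lstring *s, const char *text)
{
    size_t n = strlen(text);
    if (n > s->cap)
        return -1;
    memcpy(s->data, text, n + 1);   /* copies the '\0' too */
    s->len = n;
    return 0;
}
```

The whole object is released with a single free(s), and s->data can be handed to any function expecting an ordinary C string.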

Segmentation Fault in Simple Offset Encryption

Alright guys, this is my first post here. The most recent assignment in my compsci class has us coding a couple of functions to encode and decode strings based on a simple offset. So far in my encryption function I am trying to convert uppercase alphas in a string to their ASCII equivalent(an int), add the offset(and adjust if the ASCII value goes past 'Z'), cast that int back to a char(the new encrypted char) and put it into a new string. What I have here compiles fine, but it gives a Segmentation Fault (core dumped) error when I run it and input simple uppercase strings. Where am I going wrong here? (NOTE: there are some commented out bits from an attempt at solving the situation that created some odd errors in main)
#include <stdio.h>
#include <string.h>
#include <ctype.h>
//#include <stdlib.h>
char *encrypt(char *str, int offset){
    int counter;
    char medianstr[strlen(str)];
    char *returnstr;// = malloc(sizeof(char) * strlen(str));
    for(counter = 0; counter < strlen(str); counter++){
        if(isalpha(str[counter]) && isupper(str[counter])){//If the character at current index is an alpha and uppercase
            int charASCII = (int)str[counter];//Get ASCII value of character
            int newASCII;
            if(charASCII+offset <= 90 ){//If the offset won't put it outside of the uppercase range
                newASCII = charASCII + offset;//Just add the offset for the new value
                medianstr[counter] = (char)newASCII;
            }else{
                newASCII = 64 + ((charASCII + offset) - 90);//If the offset will put it outside the uppercase range, add the remaining starting at 64(right before A)
                medianstr[counter] = (char)newASCII;
            }
        }
    }
    strcpy(returnstr, medianstr);
    return returnstr;
}
/*
char *decrypt(char *str, int offset){
}
*/
int main(){
    char *inputstr;
    printf("Please enter the string to be encrypted:");
    scanf("%s", inputstr);
    char *encryptedstr;
    encryptedstr = encrypt(inputstr, 5);
    printf("%s", encryptedstr);
    //free(encryptedstr);
    return 0;
}
You use a bunch of pointers, but never allocate any memory to them. That will lead to segmentation faults.
Actually the strange thing is it seems you know you need to do this as you have the code in place, but you commented it out:
char *returnstr;// = malloc(sizeof(char) * strlen(str));
When you use a pointer, you need to "point" it at allocated memory. It can point either to dynamic memory that you request via malloc() or to memory you already own (such as an array that you declared). When you're done with dynamic memory you need to free() it, but again, you seem to know this, since you commented out a call to free.
One malloc() for inputstr and one for returnstr will get you past the segmentation faults, but returnstr and medianstr also each need one extra byte for the terminating '\0', and medianstr must actually be terminated before strcpy() can safely read it.
Without even looking further, the segmentation fault comes from your use of scanf().
The segmentation fault occurs in scanf() because it tries to write to *inputstr (the memory inputstr points at), which has not been allocated at this point.
To call scanf() you need to pass in a pointer to memory that has already been allocated.
So, naturally, to fix the segmentation fault you need to allocate memory for your char *inputstr.
To dynamically allocate 128 bytes (i.e., the pointer will point to the heap):
char *inputstr = malloc(128);
Or to allocate 128 bytes on the stack with an automatic array (which decays to a pointer when passed to scanf()):
char inputstr[128];
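Allocating the buffer fixes the crash, but a plain "%s" will still overflow a 128-byte buffer if the user types a longer word. A field width fixes that. This is a minimal sketch; the helper name read_word and the INPUT_SIZE macro are hypothetical.

```c
#include <stdio.h>

#define INPUT_SIZE 128   /* hypothetical size, matching the answer */

/* Reads one whitespace-delimited word into buf (INPUT_SIZE bytes).
   The width in "%127s" limits the conversion to 127 characters plus
   the '\0', so even a long word cannot overflow buf.
   Returns 1 on success, 0 on end of input or failure. */
int read_word(FILE *in, char buf[INPUT_SIZE])
{
    return fscanf(in, "%127s", buf) == 1;
}
```

In the question's main() this would be used as char inputstr[INPUT_SIZE]; if (!read_word(stdin, inputstr)) return 1;. The width has to be written as a literal in the format string, so it must be kept in sync with INPUT_SIZE by hand.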
There is a lot of complexity in the encrypt() function that isn't really necessary. Note that computing the length of the string on each iteration of the loop is a costly process in general. I noted in a comment:
What's with the 90 and 64? Why not use 'A' and 'Z'? And you've commented out the memory allocation for returnstr, so you're copying via an uninitialized pointer and then returning that? Not a recipe for happiness!
The other answers have also pointed out (accurately) that you've not initialized your pointer in main(), so you don't get a chance to dump core in encrypt() because you've already dumped core in main().
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
char *encrypt(char *str, int offset)
{
    size_t len = strlen(str) + 1;    // + 1 so the '\0' is copied as well
    char *returnstr = malloc(len);
    if (returnstr == 0)
        return 0;
    for (size_t i = 0; i < len; i++)
    {
        char c = str[i];
        if (isupper((unsigned char)c))
        {
            c += offset;             // assumes 0 <= offset <= 26
            if (c > 'Z')
                c = 'A' + (c - 'Z') - 1;
        }
        returnstr[i] = c;
    }
    return returnstr;
}
Long variable names are not always helpful; they make the code harder to read. Note that any character for which isupper() is true also satisfies isalpha(). The cast on the argument to isupper() prevents problems when the char type is signed and you have data where the unsigned char value is in the range 0x80..0xFF (the high bit is set). With the cast, the code will work correctly; without, you can get into trouble.
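The wrap-around above works for offsets from 0 to 26. A sketch using modular arithmetic handles any offset, including negative ones, which also gives the question's commented-out decrypt() for free. It compares against 'A' and 'Z' directly rather than calling isupper(), so it is not affected by the locale, and it assumes the letters are contiguous, as in ASCII. All names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical Caesar shift: shifts 'A'..'Z' by offset (any int),
   copies other characters unchanged. Returns a malloc'd string the
   caller must free(), or NULL if allocation fails. */
char *shift_upper(const char *str, int offset)
{
    size_t len = strlen(str) + 1;            /* include the '\0' */
    char *out = malloc(len);
    if (out == NULL)
        return NULL;
    int shift = ((offset % 26) + 26) % 26;   /* normalise to 0..25 */
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)str[i];
        if (c >= 'A' && c <= 'Z')
            out[i] = (char)('A' + (c - 'A' + shift) % 26);
        else
            out[i] = (char)c;
    }
    return out;
}

char *encrypt_str(const char *str, int offset) { return shift_upper(str, offset); }
char *decrypt_str(const char *str, int offset) { return shift_upper(str, -offset); }
```

For example, encrypt_str("HELLO, XYZ", 5) yields "MJQQT, CDE", and decrypt_str on that result with the same offset restores the original.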
