Garbage after string - c

I've written code to copy a string into another string but with a space between each character. When I run the code there is "garbage" after the string. However, if the for loop at the end is uncommented, there is no garbage after. Anyone know why this is happening?
#include<stdio.h>
#include<string.h>
#define MAX_SIZE 20
main ()
{
char name[MAX_SIZE+ 1];
char cpy[(MAX_SIZE * 2) + 1];
gets(name);
int i = 0;
while (name[i] != '\0' && i < MAX_SIZE)
{
cpy[(i * 2)] = name[i];
cpy[(i * 2) + 1] = ' ';
i++;
}
cpy[strlen(cpy)] = '\0';
printf("%s\n", cpy);
//for (i = 0; i < strlen(cpy); ++i) {
// printf("%c", cpy[i]);
//}
}

The line
cpy[strlen(cpy)] = '\0';
won't work, since cpy isn't null-terminated: strlen will read beyond the end of cpy until it either crashes or finds a zero byte in memory. You can fix this by changing that line to
cpy[i*2] = '\0';
If uncommenting the for loop at the end of your function appears to fix things, I can only guess that i gets reset to 0 before your printf call, meaning that printf finds a null terminator on the stack immediately after cpy. If this is what's happening, it's very much undefined behaviour, so it cannot be relied upon.
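For reference, the fix can be packaged as a small function that tracks its own write position. This is only a sketch: the name space_out and the dst_size parameter are my own and are not part of the original code.

```c
#include <stddef.h>

/* Copies src into dst with a space after every character.
 * dst_size is the total capacity of dst, including the terminator.
 * Returns the number of characters written, not counting the '\0'. */
size_t space_out(const char *src, char *dst, size_t dst_size)
{
    size_t i = 0; /* read index into src */
    size_t j = 0; /* write index into dst */

    if (dst_size == 0)
        return 0;

    /* Stop early if the next pair plus the terminator would not fit. */
    while (src[i] != '\0' && j + 2 < dst_size) {
        dst[j++] = src[i++];
        dst[j++] = ' ';
    }
    dst[j] = '\0'; /* terminate at the known write position, not strlen(dst) */
    return j;
}
```

Because the function always knows where it stopped writing, it never has to call strlen on a buffer that is not yet terminated.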

while (name[i] != '\0' && i < MAX_SIZE)
{
cpy[(i * 2)] = name[i];
cpy[(i * 2) + 1] = ' ';
i++;
}
cpy[(i * 2)] = 0x0;
You have to null terminate the string.

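Because you know that you are working with a string, it's a good idea to initialize your cpy array with null characters. An initializer like this zero-fills the whole array, so the string is terminated wherever you stop writing:
char cpy[(MAX_SIZE * 2) + 1] = "\0";
Otherwise, I agree with simonc's answer.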

For the sake of completeness:
char* pcpy = cpy;
for (char const* p = fgets(name,sizeof(name)/sizeof(*name),stdin); p && *p; ++p) {
*pcpy++ = *p;
*pcpy++ = ' ';
}
*pcpy = 0;
You should use fgets rather than gets, to prevent your stack from being corrupted by input that overruns the buffer. Second, you must manually terminate the string stored in the cpy array, since strlen simply counts characters until the first zero byte. Hence, if you haven't already terminated cpy, calling strlen(cpy) is undefined behaviour and may well crash your program.
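One thing to watch when switching from gets to fgets: fgets keeps the trailing newline in the buffer, so a copy loop like the one above would copy it as well. A minimal sketch of a line reader that strips it follows; the helper names strip_newline and read_line are made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Cuts s at the first newline, if there is one. Returns s. */
char *strip_newline(char *s)
{
    s[strcspn(s, "\n")] = '\0'; /* strcspn returns strlen(s) if no newline */
    return s;
}

/* Reads one line from stream into buf (capacity size) and removes the
 * newline that fgets keeps. Returns buf, or NULL on end of file or error. */
char *read_line(char *buf, int size, FILE *stream)
{
    if (fgets(buf, size, stream) == NULL)
        return NULL;
    return strip_newline(buf);
}
```

With this, the question's gets(name) could become read_line(name, sizeof name, stdin), checking the result for NULL before using name.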

Related

Trying to delete a specific character from a string in C?

I'm trying to delete a specific character (?) from the end of a string and return a pointer to a string, but it's not removing it at all at the moment. What am I doing wrong? Is there a better way to go about it?
char * word_copy = malloc(strlen(word)+1);
strcpy(word_copy, word);
int length = strlen(word_copy);
int i = 0;
int j = 0;
for (i = 0; word_copy[i] != '\0'; i++) {
if (word_copy[length - 1] == '?' && i == length - 1){
break;
}
}
for (int j = i; word_copy[j] != '\0'; j++) {
word_copy[j] = word_copy[j+1];
}
word = strdup(word_copy);
I'm immediately seeing a couple of problems.
The first for loop does no real work. It only breaks when i reaches the last index, so it could be replaced with a single if statement.
if (word_copy[length - 1] == '?') {
i = length - 1;
} else {
i = length;
}
The second for loop also acts as an if statement since it starts at the end of the string and can only ever run 0 or 1 times.
You could instead do something like this to remove the ?. This code returns a newly malloced string with the last character removed if it's a ?.
char *remove_question_mark(char *word) {
unsigned int length = strlen(word);
if (length == 0) {
return calloc(1, 1);
}
if (word[length - 1] == '?') {
char *word_copy = malloc(length);
// Copy up to '?' and put null terminator
memcpy(word_copy, word, length - 1);
word_copy[length - 1] = 0;
return word_copy;
}
char *word_copy = malloc(length + 1);
memcpy(word_copy, word, length + 1);
return word_copy;
}
Or, if you are feeling lazy, you could just overwrite the trailing ? with the null terminator in place. This leaves one allocated byte unused, but that is usually an acceptable loss. It should also be a fair bit faster, since it doesn't need to allocate any new memory or copy the string.
unsigned int length = strlen(word);
if (length > 0 && word[length - 1] == '?') {
word[length - 1] = 0;
}
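If you go the in-place route, it may be convenient to wrap it in a function that reports whether anything was removed. A sketch, with a function name of my own choosing; note that it must only be given writable memory, never a string literal.

```c
#include <string.h>

/* Removes one trailing '?' from word in place.
 * Returns 1 if a character was removed, 0 otherwise.
 * word must point to writable memory (not a string literal). */
int strip_trailing_question_mark(char *word)
{
    size_t length = strlen(word);

    if (length > 0 && word[length - 1] == '?') {
        word[length - 1] = '\0'; /* overwrite the '?', not the old terminator */
        return 1;
    }
    return 0;
}
```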

Odd behavior removing duplicate characters in a C string

I am using the following method in a program used for simple substitution-based encryption. This method is specifically used for removing duplicate characters in the encryption/decryption key.
The method is functional, as is the rest of the program, and it works for 99% of the keys I've tried. However, when I pass it the key "goodmorning" or any key consisting of the same letters in any order (e.g. "dggimnnooor"), it fails. Further, keys containing more characters than "goodmorning" work, as do keys with fewer characters.
I ran the executable through lldb with the same arguments and it works. I've cloned my repository on a machine running CentOS, and it works as is.
But I get no warnings or errors on compile.
//setting the key in main method
char * key;
key = removeDuplicates(argv[2]);
//return 1 if char in word
int targetFound(char * charArr, int num, char target){
int found = 0;
if(strchr(charArr,target))
found = 1;
return found;
}
//remove duplicate chars
char * removeDuplicates(char * word){
char * result;
int len = strlen(word);
result = malloc (len * sizeof(char));
if (result == NULL)
errorHandler(2);
char ch;
int i;
int j;
for( i = 0, j = 0; i < len; i++){
ch = word[i];
if(!targetFound(result, i, ch)){
result[j] = ch;
j++;
}
}
return result;
}
Per request: if "feather" was passed in to this function the resulting string would be "feathr".
As R Sahu already said, you are not terminating your string with a NUL character. Now I'm not going to explain why you need to do this, but you always need to terminate your strings with a NUL character, which is '\0'. If you want to know why, head over here for a good explanation. However this is not the only problem with your code.
The main problem is that the function strchr that you are calling to find out if your result already contains some character expects you to pass a NUL terminated string, but your variable is not NUL terminated, because you keep appending characters to it.
To solve your problem, I would suggest you to use a map instead. Map all the characters you already used and if they aren't in the map add them both to the map and the result. This is simpler (no need to call strchr or any other function), faster (no need to scan all the string every time), and most importantly correct.
Here's a simple solution:
char *removeDuplicates(char *word){
char *result, *map, ch;
int i, j;
map = calloc(256, 1);
if (map == NULL)
// Maybe you want some other number here?
errorHandler(2);
// Add one char for the NUL terminator:
result = malloc(strlen(word) + 1);
if (result == NULL)
errorHandler(2);
for(i = 0, j = 0; word[i] != '\0'; i++) {
ch = word[i];
// Check if you already saw this character:
// Cast through unsigned char so a negative char never becomes a huge index:
if(map[(unsigned char)ch] == 0) {
// If not, add it to the map:
map[(unsigned char)ch] = 1;
// And to your result string:
result[j] = ch;
j++;
}
}
// Correctly NUL terminate the new string:
result[j] = '\0';
// The map is no longer needed:
free(map);
return result;
}
Why does this work on other machines, but not on your machine?
You are a victim of undefined behavior. The C standard places no requirements on what happens when strchr is given an array that is not NUL terminated, so the outcome depends on the compiler, the C library, and whatever bytes malloc happens to return. In practice strchr keeps reading past the characters you have stored until it finds a '\0' somewhere in memory. If the uninitialized bytes after your data happen to be zero, the search stops where you expect and the result looks correct; if they hold leftover data, strchr may report a match for a character you never stored, and that character is then dropped from the key. This is both dangerous and incorrect, because the program reads memory it has not initialized or does not own. A correct-looking result on one machine is not something to take for granted, and you should always avoid undefined behavior.
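As a variation on the lookup-table idea, the table can live on the stack so there is nothing extra to allocate or free. This is a sketch rather than a drop-in replacement: it returns NULL on allocation failure instead of calling your errorHandler, and the name remove_duplicates is mine.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of word with repeated characters removed,
 * keeping the first occurrence of each. Returns NULL if malloc fails.
 * The caller owns the result and must free() it. */
char *remove_duplicates(const char *word)
{
    bool seen[UCHAR_MAX + 1] = { false };    /* one flag per possible byte value */
    char *result = malloc(strlen(word) + 1); /* +1 for the terminator */
    size_t j = 0;

    if (result == NULL)
        return NULL;

    for (size_t i = 0; word[i] != '\0'; i++) {
        unsigned char ch = (unsigned char)word[i]; /* never a negative index */
        if (!seen[ch]) {
            seen[ch] = true;
            result[j++] = word[i];
        }
    }
    result[j] = '\0';
    return result;
}
```

Since the function never reads from result, it does not matter what malloc left in that memory.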
I see a couple of problems in your code:
You are not terminating the output with the null character.
You are not allocating enough memory to hold the null character when there are no duplicate characters in the input.
As a consequence, your program has undefined behavior.
Change
result = malloc (len * sizeof(char));
to
result = malloc (len+1); // No need for sizeof(char)
Add the following before the function returns.
result[j] = '\0';
The other problem, the main one, is that you are using strchr on result, which is not a null-terminated string when you call targetFound. That also causes undefined behavior. You need to use:
char * removeDuplicates(char * word){
char * result;
int len = strlen(word);
result = malloc (len+1);
if (result == NULL)
{
errorHandler(2);
}
char ch;
int i;
int j;
// Make result an empty string.
result[0] = '\0';
for( i = 0, j = 0; i < len; i++){
ch = word[i];
if(!targetFound(result, i, ch)){
result[j] = ch;
j++;
// Null terminate again so that next call to targetFound()
// will work.
result[j] = '\0';
}
}
return result;
}
A second option is to not use strchr in targetFound. Use num instead and implement the equivalent functionality.
int targetFound(char * charArr, int num, char target)
{
for ( int i = 0; i < num; ++i )
{
if ( charArr[i] == target )
{
return 1;
}
}
return 0;
}
That will allow you to avoid assigning the null character to result so many times. You will need to null terminate result only at the end.
char * removeDuplicates(char * word){
char * result;
int len = strlen(word);
result = malloc (len+1);
if (result == NULL)
{
errorHandler(2);
}
char ch;
int i;
int j;
for( i = 0, j = 0; i < len; i++){
ch = word[i];
if(!targetFound(result, j, ch)){ // j, not i: only result[0..j-1] has been written
result[j] = ch;
j++;
}
}
result[j] = '\0';
return result;
}

Array of separators for strtok() function

I want to divide my text into words. Separator is any symbol except latin letters.
Here I have a loop that fills my separators array:
for(i = 0; i <= 127; i ++) {
if(!isalpha(i)) {
separators = (char*) realloc(separators, (length + 1) * sizeof(char));
separators[length] = i;
length ++;
}
}
Then I use it here:
char text[] = "hello world!";
char** words = NULL;
char* p = strtok(text, separators);
int cnt = 0;
while(p != NULL) {
words = (char**) realloc(words, (cnt + 1) * sizeof(char*));
words[cnt] = strdup(p);
cnt ++;
p = strtok(NULL, separators);
}
for(i = 0; i < cnt; i ++) {
printf(" - %d %s\n", i + 1, words[i]);
}
As a result I have:
 - 1 hello world!
If the separators array is replaced by " ", it works well.
What's the problem with the array?
The first value of i in your loop, 0, is not alpha; so a 0 will be stored as the very first byte in the separator array.
strtok() expects to receive the separator list as a string, and strings in C are terminated by a zero. So strtok() receives a sequence beginning with a terminator, and it thinks it is an empty list, with no separators at all.
You can start the loop from 1 to get rid of that interfering zero:
for (i = 1; i <= 127; i ++) {
if(!isalpha(i)) {
// + 2: one slot for this character, one for the terminator added after the loop
separators = (char*) realloc(separators, (length + 2) * sizeof(char));
separators[length] = i;
length ++;
}
}
// then you also need to terminate it, otherwise strtok() will continue reading
// past the end of the array, with unpredictable (but very likely undesirable) results.
separators[length] = 0x0;
You might also want to allocate the string only once instead (you waste some space, but save some time):
#define MAX_SEPARATORS 128
separators = (char*) malloc(MAX_SEPARATORS * sizeof(char));
for (i = 1; i < MAX_SEPARATORS; i++) {
if (!isalpha(i)) {
separators[length++] = i;
}
}
separators[length] = 0x0;
You have to remember that strtok wants the separators as a string, complete with a string terminator character ('\0'). Unfortunately you don't have that terminator in the separators "string", so strtok will read beyond what you have allocated, leading to undefined behavior.
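Putting both points together (skip code 0, and terminate the list), a self-contained sketch might look like the following. The function names are made up, the separator buffer is a fixed-size array rather than a realloc'd one, and it assumes the default "C" locale, where isalpha is true only for the 52 ASCII letters.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_SEPARATORS 128

/* Fills seps with every ASCII code from 1 to 127 that is not a letter,
 * then terminates it. seps must have room for MAX_SEPARATORS chars. */
void build_separators(char seps[MAX_SEPARATORS])
{
    size_t length = 0;

    for (int c = 1; c < MAX_SEPARATORS; c++) { /* start at 1: a 0 would end the string */
        if (!isalpha(c))
            seps[length++] = (char)c;
    }
    seps[length] = '\0';
}

/* Splits text in place and stores up to max_words pointers into words.
 * Returns the number of words stored. */
size_t split_words(char *text, const char *seps, char **words, size_t max_words)
{
    size_t count = 0;

    for (char *p = strtok(text, seps); p != NULL && count < max_words;
         p = strtok(NULL, seps))
        words[count++] = p;
    return count;
}
```

The pointers stored in words point into text itself, so strdup them if they need to outlive that buffer.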

Why do I keep getting extra characters at the end of my string?

I have the string, "helLo, wORld!" and I want my program to change it to "Hello, World!". My program works, the characters are changed correctly, but I keep getting extra characters after the exclamation mark. What could I be doing wrong?
void normalize_case(char str[], char result[])
{
if (islower(str[0]) == 1)
{
result[0] = toupper(str[0]);
}
for (int i = 1; str[i] != '\0'; i++)
{
if (isupper(str[i]) == 1)
{
result[i] = tolower(str[i]);
}
else if (islower(str[i]) == 1)
{
result[i] = str[i];
}
if (islower(str[i]) == 0 && isupper(str[i]) == 0)
{
result[i] = str[i];
}
if (str[i] == ' ')
{
result[i] = str[i];
}
if (str[i - 1] == ' ' && islower(str[i]) == 1)
{
result[i] = toupper(str[i]);
}
}
}
You are not null terminating result so when you print it out it will keep going until a null is found. If you move the declaration of i to before the for loop:
int i ;
for ( i = 1; str[i] != '\0'; i++)
you can add:
result[i] = '\0' ;
after the for loop, this is assuming result is large enough.
Extra random-ish characters at the end of a string usually means you've forgotten to null-terminate ('\0') your string. Your loop copies everything up to, but not including, the terminal null into the result.
Add result[i] = '\0'; after the loop before you return.
Normally, you treat the isxxxx() functions (macros) as returning a boolean condition, and you'd ensure that you only have one of the chain of conditions executed. You'd do that with more careful use of else clauses. Your code actually copies str[i] multiple times if it is a blank. In fact, I think you can compress your loop to:
int i;
for (i = 1; str[i] != '\0'; i++)
{
if (isupper(str[i]))
result[i] = tolower(str[i]);
else if (str[i - 1] == ' ' && islower(str[i]))
result[i] = toupper(str[i]);
else
result[i] = str[i];
}
result[i] = '\0';
If I put result[i] outside of the for loop, won't the compiler complain about i?
Yes, it will. In this context, you need i defined outside the loop control, because you need the value after the loop. See the amended code above.
You might also note that your pre-loop code quietly skips the first character of the string if it is not lower-case, leaving garbage as the first character of the result. You should really write:
result[0] = toupper(str[0]);
so that result[0] is always set.
You should add the statement result[i] = '\0'; after the loop, because in C a string must end with the special character '\0'. It is this terminator that tells functions such as printf where the string ends.
I took the liberty of simplifying your code as a lot of the checks you do are unnecessary. The others have already explained some basic points to keep in mind:
#include <stdio.h> /* for printf */
#include <ctype.h> /* for islower and the like */
void normalise_case(char str[], char result[])
{
result[0] = toupper(str[0]); /* capitalise at the start; always sets result[0] */
int i; /* older C standards (pre C99) won't like it if you don't pre-declare 'i' so I've put it here */
for (i = 1; str[i] != '\0'; i++)
{
result[i] = str[i]; /* I've noticed that you copy the string in each if case, so I've put it here at the top */
if (isupper(result[i]))
{
result[i] = tolower(result[i]);
}
if (result[i - 1] == ' ' && islower(result[i])) /* at the start of a word, capitalise! */
{
result[i] = toupper(result[i]);
}
}
result[i] = '\0'; /* this has already been explained */
}
int main()
{
char in[20] = "tESt tHIs StrinG";
char out[20] = ""; /* space to store the output */
normalise_case(in, out);
printf("%s\n", out); /* Prints 'Test This String' */
return 0;
}
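Pulling the advice in this thread together, here is one more possible version. It is a sketch with two deliberate differences from the code above: it starts the loop at index 0 so that an empty input is handled, and it passes unsigned char values to the ctype functions, which is what they expect. The at_word_start flag is my own addition.

```c
#include <ctype.h>
#include <stddef.h>

/* Writes str to result with the first character of each space-separated
 * word upper-cased and every other letter lower-cased.
 * result must have room for strlen(str) + 1 chars. */
void normalize_case(const char *str, char *result)
{
    size_t i;
    int at_word_start = 1; /* the first character starts a word */

    for (i = 0; str[i] != '\0'; i++) {
        unsigned char c = (unsigned char)str[i]; /* range the ctype functions expect */
        result[i] = (char)(at_word_start ? toupper(c) : tolower(c));
        at_word_start = (str[i] == ' ');
    }
    result[i] = '\0'; /* i is still in scope here, so terminate at the end */
}
```

toupper and tolower return non-letters unchanged, so punctuation and spaces need no special case.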

Replacing spaces with %20 in C

I am writing a fastcgi application for my site in C. Don't ask why, leave all that part.
Just help me with this problem- I want to replace spaces in the query string with %20.
Here's the code I'm using, but I don't see 20 in the output, only %. Where's the problem?
Code:
unsigned int i = 0;
/*
* Replace spaces with its hex %20
* It will be converted back to space in the actual processing
* They make the application segfault in strtok_r()
*/
char *qstr = NULL;
for(i = 0; i <= strlen(qry); i++) {
void *_tmp;
if(qry[i] == ' ') {
_tmp = realloc(qstr, (i + 2) * sizeof(char));
if(!_tmp) error("realloc() failed while allocting string memory (space)\n");
qstr = (char *) _tmp;
qstr[i] = '%'; qstr[i + 1] = '2'; qstr[i + 2] = '0';
} else {
_tmp = realloc(qstr, (i + 1) * sizeof(char));
if(!_tmp) error("realloc() failed while allocating string memory (not space)\n");
qstr = (char *) _tmp;
qstr[i] = qry[i];
}
}
In the code, qry is a char * that is passed in as a parameter to the function.
I tried with i + 3, 4, 5 in realloc() in the space replacer block, no success.
String-handling in C can be tricky. I'd suggest going through the string first, counting the spaces, and then allocating a new string of the appropriate size (original string size + (number of spaces * 2)). Then, loop through the original string, maintaining a pointer (or index) to the position in both the new string and the original one. (Why two pointers? Because every time you encounter a space, the pointer into the new string will get two characters ahead of the pointer into the old one.)
Here's some code that should do the trick:
int new_string_length = 0;
for (char *c = qry; *c != '\0'; c++) {
if (*c == ' ') new_string_length += 2;
new_string_length++;
}
char *qstr = malloc((new_string_length + 1) * sizeof qstr[0]);
char *c1, *c2;
for (c1 = qry, c2 = qstr; *c1 != '\0'; c1++) {
if (*c1 == ' ') {
c2[0] = '%';
c2[1] = '2';
c2[2] = '0';
c2 += 3;
}else{
*c2 = *c1;
c2++;
}
}
*c2 = '\0';
qstr[i] = '%'; qstr[i + 1] = '2'; qstr[i + 2] = '0';
That line writes three characters to your output buffer, so the next character you write needs to be written at qstr[i+3]. However, you only step i by 1, so the next character is written to qstr[i+1], overwriting the '2'.
You will need to keep separate indexes for stepping through qry & qstr.
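A sketch of what two separate indexes could look like. The function name encode_spaces is made up, and for simplicity it allocates for the worst case (every character a space) instead of growing the buffer with realloc.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of qry with each space replaced by "%20",
 * or NULL if allocation fails. The caller must free() the result. */
char *encode_spaces(const char *qry)
{
    size_t len = strlen(qry);
    /* Worst case: every character is a space and grows to three characters. */
    char *qstr = malloc(3 * len + 1);
    size_t i;     /* read index into qry */
    size_t j = 0; /* write index into qstr */

    if (qstr == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        if (qry[i] == ' ') {
            qstr[j++] = '%';
            qstr[j++] = '2';
            qstr[j++] = '0';
        } else {
            qstr[j++] = qry[i];
        }
    }
    qstr[j] = '\0';
    return qstr;
}
```

Here i advances by one per input character while j advances by one or three, which is exactly the drift the single-index version cannot represent.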
I agree with David.
It is advisable to do it in two-steps: in the first loop you just count the spaces:
int spaceCounter=0;
const int sourceLen = strlen(qry);
for(int i = 0; i < sourceLen; ++i)
if ( qry[i] == ' ')
++spaceCounter;
char* newString = (char*)malloc(sourceLen + 2*spaceCounter + 1);
//check for null!
char* out = newString; // write cursor; newString keeps pointing at the start
for(int i = 0; i < sourceLen; ++i)
if ( qry[i] == ' ')
{
*out++ = '%';
*out++ = '2';
*out++ = '0';
}
else
*out++ = qry[i];
*out = '\0';
Warning: code not tested.
You are indexing both strings with the same counter, i. You will need two counters, since the strings have different lengths.
Your else case assigns qstr[i] = qry[i]; once you have written a %20, you are off by at least 2 in the result string.
This is known as URL encoding. You can refer to this page to see a similar implementation: http://www.geekhideout.com/urlcode.shtml
char* toHexSpace(const char *s)
{
char *b=strcpy(malloc(3*strlen(s)+1),s),*p;
while( p=strchr(b,' ') )
{
memmove(p+3,p+1,strlen(p));
strncpy(p,"%20",3);
}
return b;
}
The caller must free the returned string.
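If you need more than spaces encoded, a general percent-encoder is not much longer. The sketch below leaves the RFC 3986 unreserved characters (letters, digits, '-', '.', '_', '~') alone and encodes every other byte. The name url_encode is my own, and it assumes the default "C" locale for isalnum. It is meant for a single key or value; applied to a whole query string it would also encode the '=' and '&' separators.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated percent-encoded copy of s, or NULL if
 * allocation fails. The caller must free() the result. */
char *url_encode(const char *s)
{
    static const char hex[] = "0123456789ABCDEF";
    char *out = malloc(3 * strlen(s) + 1); /* worst case: every byte becomes %XX */
    char *w = out;                         /* write cursor; out keeps the start */

    if (out == NULL)
        return NULL;

    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~') {
            *w++ = (char)c;
        } else {
            *w++ = '%';
            *w++ = hex[c >> 4];   /* high nibble */
            *w++ = hex[c & 0x0F]; /* low nibble */
        }
    }
    *w = '\0';
    return out;
}
```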

Resources