Say I have this code
char *string = "";
string += 'A';
string += 'B';
string += 'C';
printf("%s\n", string);
It just prints an empty line. Why does it do this and is there an easy way to concatenate single characters starting from an empty string if I don't know how long it'll be?
In statements like this one
string += 'A';
pointer arithmetic is used. The value of the internal representation of the character 'A' is added to the value of the pointer string, and as a result the pointer has an invalid value because it no longer points to an actual object.
You need to declare a character array, for example
char string[4] = "";
and then set the respective elements of the array to character constants, for example
int i = 0;
string[i++] = 'A';
string[i++] = 'B';
string[i++] = 'C';
string[i] = '\0';
printf("%s\n", string);
Also, you have a typo in this call:
printf("&s\n", string);
If a character array already contains a string like
char string[4] = "AB";
and you want to append a character to the end of the string, you can either write the character constant directly:
size_t n = strlen( string );
string[n] = 'C';
string[n + 1] = '\0';
or use a string literal with the standard C function strcat:
strcat( string, "C" );
In either case, the character array must have enough space to accommodate the new character.
string is just a pointer to the string literal "", so when you add a char with +=, you're actually just moving the pointer instead of concatenating to the string. In C, you can allocate a sufficiently large array and use strcat to add strings into it:
char string[100] = "";
strcat(string, "A");
strcat(string, "B");
strcat(string, "C");
printf("%s\n", string);
If you want to use chars, then you can convert the char to a string first.
string += 'A'; does not append a character to the string, it increments the char pointer string by the value of 'A' which on systems that use ASCII is 65 and makes string point well beyond the end of the "" string literal. Hence this code has undefined behavior.
printf("&s\n", string); should print &s and a newline.
Assuming you mistyped your code in the question, printf("%s\n", string); would have undefined behavior, and printing an empty line is possible, as well as a crash or any other nasty side-effect.
If you want to construct a string one character at a time, use this:
char buf[20];
char *string = buf;
*string++ = 'A';
*string++ = 'B';
*string++ = 'C';
*string = '\0'; // set the null terminator
printf("%s\n", buf);
Alternatively, you can use strcat with string literals:
char string[20] = "";
strcat(string, "A");
strcat(string, "B");
strcat(string, "C");
printf("%s\n", string);
It just prints an empty line.
It is unfortunate, but not surprising, that your code's undefined behavior manifests as printing an apparently empty string. It would have been more indicative of the nature of the problem if it had produced a segfault or some other memory-related violation, as would be entirely appropriate.
Why does it do this
Because you are performing arithmetic on the pointer, not modifying the thing to which it points. This expression ...
string += 'A';
... computes the pointer addition of string with the integer character constant 'A' (whose numeric value is system dependent, but is often the ASCII code for capital letter A), and stores the resulting pointer in string. This makes string point to something different than it previously did. It does not in any way modify the contents of the memory to which string pointed.
and is there an easy way to concatenate single characters starting from an empty string if I don't know how long it'll be?
If you have an upper bound on how long the data can be, then the easiest thing to do is declare a large-enough array to contain the data and initialize it to all-zero ...
char string[MAX_LEN + 1] = {0};
You can then add a single character by writing it to the next available index (which index you may either track, or compute as needed via strlen()):
unsigned next_index = 0;
string[next_index++] = 'A';
string[next_index++] = 'B';
string[next_index++] = 'C';
Note that this relies on the zero-initialization -- which is not automatic for local variables -- to ensure that the array contents at all times comprise a null-terminated string. Having done that, you can print the expected result:
printf("%s\n", string);
If you did not know in advance a reasonable upper bound on how long the string may be, or if the upper bound were exceedingly large, then you would need to rely on dynamic memory allocation and reallocation. This is a subject for you to defer until later.
Related
I was solving a challenge on CodeSignal in C. Even though the correct libraries were included, I couldn't use the strrev function in the IDE, so I looked up a similar solution and modified it to work. This is good. However, I don't understand the distinction between a literal string and an array. Reading all this online has left me a bit confused. If C stores all strings as an array with each character terminated by \0 (null terminated), how can there be any such thing as a literal string? Also, if it is the case that strings are stored as arrays, would inputString store the address of the array, or is it an array itself of all the individual characters stored?
Thanks in advance for any clarification provided!
Here is the original challenge, C:
Given the string, check if it is a palindrome.
#include <stdbool.h>
#include <string.h>

bool solution(char * inputString) {
// The input will be character array type, storing a single character each terminated by \0 at each index
// * inputString is a pointer that stores the memory address of inputString. The memory address points to the user inputted string
// bonus: inputString is an array object starting at index 0
// The solution function is set up as a Boolean type ("1" is TRUE and the default "0" is FALSE)
int begin = 0;
// The first element of the inputString array is at position 0, so is the 'counter'
int end = strlen(inputString) - 1;
// The last element is the length of the string minus 1 since the counter starts at 0 (not 1) by convention
while (end > begin) {
if (inputString[begin++] != inputString[end--]) {
return 0;
}
}
return 1;
}
A string is also an array of characters. I think what you don't understand is the difference between a char pointer and a string. Let me explain with an example:
Imagine I have the following:
char str[20]="helloword";
In most expressions, str decays to the address of the first character of the array, in this case the address of 'h'. Now try to printf the following:
printf("%c",str[0]);
You can see that it prints the character at that address, which is 'h'.
If I now declare a char pointer, it can point to whatever char address I want:
char *c_pointer = str+1;
Now print the element of c_pointer:
printf("%c",c_pointer[0]);
You can see that it prints 'e', because c_pointer points to the second character of the original string str.
In addition, what printf("%s", string) does is print every character from the starting address (string) up to, but not including, the first '\0'.
The linked question/answers in the comments pretty much cover this, but saying the same thing a slightly different way helps sometimes.
A string literal is a quoted string such as "hello" in your source code; it is often used to initialize a char pointer. It is considered read only: any attempt to modify it results in undefined behavior. I believe that most implementations put string literals in read-only memory. IMO, it's a shortcoming of C (fixed in C++) that a const char* type isn't required for assigning a string literal. Consider:
int main(void)
{
char* str = "hello";
}
str points to a string literal. If you try to modify it like this:
#include <string.h>
...
str[2] = 'f'; // BAD, undefined behavior
strcpy(str, "foo"); // BAD, undefined behavior
you've broken the rules. String literals are read only. In fact, you should get in the habit of assigning them to const char* pointers so the compiler can catch any attempt to do something stupid:
const char* str = "hello"; // now you should get some compiler help if you
// ever try to write to str
The string "hello" resides somewhere in memory, and str points to it:
str
|
|
+-------------------> "hello"
If you use a string literal to initialize an array, things are different:
int main(void)
{
char str2[] = "hello";
}
str2 is not read only, you are free to modify it as you want. Just take care not to exceed the buffer size:
#include <string.h>
...
str2[2] = 'f'; // this is OK
strcpy(str2, "foo"); // this is OK
strcpy(str2, "longer than hello"); // this is _not_ OK, we've overflowed the buffer
In memory, str2 is an array:
str2 = { 'h', 'e', 'l', 'l', 'o', '\0' }
and is present right there in automatic storage. It doesn't point to some string elsewhere in memory.
In most cases, str2 can be used as a char* because in C, in most contexts, an array decays to a pointer to its first element. So, you can pass str2 to a function with a char* parameter. One instance where this is not true is with sizeof:
sizeof(str) // this is the size of a pointer (typically 4 or 8 depending on your
// architecture). It _does not matter_ how long the string that
// str points to is
sizeof(str2) // this is 6, string length plus the NUL terminator.
I understand that in C, a string is an array of characters with a special '\0' character at the end of the array.
Say I have "Hello" stored in a char* named string and there is a '\0' at the end of the array.
When I call printf("%s\n", string);, it would print out "Hello".
My question is, what happens to '\0' when you call printf on a string?
The null character ('\0') at the end of a string is simply a sentinel value for C library functions to know where to stop processing a string pointer.
This is necessary for two reasons:
Arrays decay to pointers to their first element when passed to functions
It's entirely possible to have a string in an array of chars that doesn't use up the entire array.
For example, strlen, which determines the length of the string, might be implemented as:
size_t strlen(const char *s)
{
size_t len = 0;
while(*s++ != '\0') len++;
return len;
}
If you tried to emulate this behavior inline with a fixed-size array instead of a pointer, you would still need the null terminator to know the string length:
char str[100];
size_t len = 0;
strcpy(str, "Hello World");
for(; len < 100; len++)
if(str[len]=='\0') break;
// len now contains the string length
Note that explicitly comparing for inequality with '\0' is redundant; I just included it for ease of understanding.
I'm getting a seg. fault when I try to subtract 32 from a char type (trying to convert to lowercase without tolower() in C). I have done the prerequisite searching for relevant Q/A threads with no luck. I also tried 'a' - 'A' for the conversion value, '32', casting it as (char*) and anything else I could think of. For example:
char* s1 = "Bob";
if (*s1 >= 97 && *s1 <= 122) {
*s1 -= 32;
}
Any advice?
Edit:
After following the help below, I still get the error. (For this example, I am only trying to change the first letter of the name to lowercase.) Here is what I am trying:
char* s1 = "Bob";
printf("\n %s before", s1);
// below I call my string length function to get actual size instead of 100
char* temp = malloc(100);
temp = s1;
if (*temp >= 'A' && *temp <= 'Z'){
*temp -= 32;
}
printf("\n%s after", temp);
free(temp);
Also, why do I need to allocate memory for a string that already is in memory?
You can't alter literal strings like that - they are (usually) in read-only memory. You need to make a writable copy of the string literal (strdup is POSIX, and has been part of standard C only since C23):
char* name = "Bob";
char* s1 = strdup(name);
...
free(s1); // And you also need this to avoid a memory leak!
There are a number of problems with your code.
char* s1 = "Bob";
A string literal creates a read-only array of char; this array is static, meaning that it exists for the entire lifetime of your program. For historical reasons, it's not const, so the compiler won't necessarily warn you if you attempt to modify it, but you should carefully avoid doing so.
s1 points to the first character of that array. You may not modify *s1. For safety, you should declare the pointer as const:
const char *s1 = "Bob";
If you want a modifiable character array, you can create it like this:
char s1[] = "Bob";
Now let's look at the remaining code:
if (*s1 >= 97 && *s1 <= 122) {
*s1 -= 32;
}
97 and 122 are the numeric ASCII codes for 'a' and 'z'. 32 is the difference between a lower case letter and the corresponding upper case letter -- again, in ASCII.
The C language doesn't guarantee that characters are represented in ASCII, or in any of the character sets that are compatible with it. On an IBM mainframe, for example, characters are represented in EBCDIC, in which the codes for the letters are not contiguous (there are gaps), and the difference between corresponding lower case and upper case letters is 64, not 32.
EBCDIC systems are rare these days, but even so, portable code tends to be clearer than non-portable code, even aside from any practical issues of whether the code will work on all systems.
As I'm sure you know, the best way to do this is to use the tolower function:
*s1 = tolower((unsigned char)*s1);
Note the cast to unsigned char. The to*() and is*() functions declared in <ctype.h> are oddly behaved, for historical reasons. They don't work on char arguments; rather, they work on int arguments that are within the range of unsigned char. (They also accept EOF, which is typically -1). If plain char is signed, then passing a char value that happens to be negative causes undefined behavior. Yes, it's annoying.
But you say you don't want to use tolower. (Which is fine; learning to do things like this yourself is a good exercise.)
If you're willing to assume that upper case letters are contiguous, and that lower case letters are contiguous, then you can do something like this:
if (*s1 >= 'a' && *s1 <= 'z') {
*s1 -= 'a' - 'A';
}
That's still not portable to non-ASCII systems, but it's a lot easier to read if you don't happen to have the ASCII table memorized.
It also makes it a little more obvious that you've got the logic backwards. You say you want to convert to lower case, but your code converts from lower case to upper case.
Or you can use a lookup table that maps upper case letters to lower case letters:
char to_lower[UCHAR_MAX + 1] = { 0 }; /* needs <limits.h>; sets all elements to 0 */
to_lower['A'] = 'a';
to_lower['B'] = 'b';
/* ... */
to_lower['Z'] = 'z';
Or, if your compiler supports designated initializers (C99):
const char to_lower[UCHAR_MAX + 1] = {
['A'] = 'a',
['B'] = 'b',
/* ... */
};
I'll leave it to you to fill in the rest and write the code to use it.
And now you can see why the tolower and toupper functions exist -- so you don't have to deal with all this stuff (apart from the odd unsigned char casts you'll need).
UPDATE:
In response to the new parts of your question:
char* temp = malloc(100);
temp = s1;
That assignment temp = s1; doesn't copy the allocated string; it just copies the pointer. temp points to 100 bytes of allocated space, but then you make temp point to the (read-only) string literal, and you've lost any references to the allocated space, creating a memory leak.
You can't assign strings or arrays in C. To copy a string, use the strcpy() function:
char *temp = malloc(100);
if (temp == NULL) { /* Don't assume the allocation was successful! */
fprintf(stderr, "malloc failed\n");
exit(EXIT_FAILURE);
}
strcpy(temp, s1);
Also, why do I need to allocate memory for a string that already is in memory?
It's in memory, but it's memory that you're not allowed to modify. If you want to modify it, you need to copy it to a modifiable location. Or, as I suggested above, you can put it in read/write memory in the first place:
char s[] = "Bob";
That initialization copies the string into the array s.
Declare a char pointer, use malloc to allocate memory to store the whole string, copy the string into it, and then use a for loop to convert the whole string to lower case.
You need to
Allocate buffer
Copy string "Bob" to your buffer
Loop through the string, editing it as you go.
This fails because string literals are usually stored in read-only memory.
The easiest fix is to use the literal to initialize an array. The array will be modifiable (unless you explicitly make it const, so don't do that):
char s1[] = "Bob";
Also, it's very bad form to hardcode ASCII values; use the islower() and tolower() functions from <ctype.h> to make this code proper.
char *s1 = "Bob";
is creating a pointer to a string constant. That means the string "Bob" will be somewhere in the read-only part of the memory and you just have a pointer to it. You can use the string as read-only. You cannot make changes to it.
Example:
s1[0] = 'b';
Is asking for trouble.
To make changes to s1 you should allocate memory for it
s1 = malloc(10); // or however much you need
strcpy(s1, "Bob"); // copy the contents into the writable buffer
Now changes to s1 can be done easily.
Using GDB, I find I get a segmentation fault when I attempt this operation:
strcat(string, &currentChar);
Given that string is initialized as
char * string = "";
and currentChar is
char currentChar = 'B';
Why does this result in a segmentation fault?
If strcat can't be used for this, how else can I concat a char onto a string?
As others have responded, &currentChar is a pointer to char (char*), but it does not point to a null-terminated string, which is what strcat expects.
One way to use strcat to concatenate a char to a string is to create a minimal string and use it to turn the char into a string.
Example:
Make a simple string with only 1 character followed by the terminator '\0':
char cToStr[2];
cToStr[1] = '\0';
Applying to your question:
char string[10] = ""; // must be a writable array with room for the result, not char * string = ""
char currentChar = 'B';
cToStr will assume the string "B":
cToStr[0] = currentChar;
And strcat will work!
strcat ( string, cToStr );
Because &currentChar is not a string, it doesn't end with a \0 character. You should define B as char *currentChar = "B";. Also, according to http://www.cplusplus.com/reference/clibrary/cstring/strcat, string should have enough space to hold the result string (2 bytes in this case), but it is only 1 byte.
Or, if you want to use a char, you can do something like this (depending on your code):
char string[256] = "";
...
char currentChar = 'B';
size_t cur_len = strlen(string);
if(cur_len + 1 < sizeof string) { // room for the char plus '\0'
string[cur_len] = currentChar;
string[cur_len+1] = '\0';
}
else
printf("Not enough space\n");
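I think the simplest method (not efficient) would be sprintf. Note that sprintf(str, "%s%c", str, chr); is undefined behavior, because str is both the destination and a source. Write at the end of the string instead (str must have room for one more char):
sprintf(str + strlen(str), "%c", chr);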
strcat() takes two '\0'-terminated strings. When you pass the address of a character, the routine will look at the memory that follows the character, looking for the terminator.
Since you don't know what that memory even refers to, you should expect problems when your code accesses it.
In addition to that, your string argument does not have room to have any characters appended to it. Where is that memory written to? It will attempt to write past the end of the memory associated with this string.
Both of the strings must be null-terminated. A single char isn't null-terminated, so it's undefined where strcat will stop copying characters onto the end. Also, string must have enough space for the resulting string, including its terminator.
This works:
char string[10] = "";
char* currentChar = "B";
strcat(string, currentChar);
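We know that currentChar = 'B'.
So this can be done:
strcat(string, "B");
A string literal such as "B" already ends with a '\0', so there is no need to write "B\0" explicitly. If we know currentChar will be hardcoded as 'B', this would be a good approach.
It also removes the need for char currentChar = 'B';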
The first argument of strcat must have enough space to hold the resulting string. "" is a string literal: it occupies a single byte (the terminator), and it may not be modified anyway.
Make it an array with enough space:
char buf[1024] = ""; // starts as an empty, null-terminated string
strcat(buf, "B");
I want to initialize a string in C to an empty string.
I tried:
string[0] = "";
but it wrote
"warning: assignment makes integer from pointer without a cast"
How should I do it then?
In addition to Will Dean's version, the following are common for whole buffer initialization:
char s[10] = {'\0'};
or
char s[10];
memset(s, '\0', sizeof(s));
or
char s[10];
strncpy(s, "", sizeof(s));
You want to set the first character of the string to zero, like this:
char myString[10];
myString[0] = '\0';
(Or myString[0] = 0;)
Or, actually, on initialisation, you can do:
char myString[10] = "";
But that's not a general way to set a string to zero length once it's been defined.
Assuming your array called 'string' already exists, try
string[0] = '\0';
\0 is the explicit NUL terminator, required to mark the end of string.
Assigning string literals to char array is allowed only during declaration:
char string[] = "";
This declares string as a char array of size 1 and initializes it with \0.
Try this too:
char str1[] = "";
char str2[5] = "";
printf("%zu, %zu\n", sizeof(str1), sizeof(str2)); //prints 1, 5
calloc allocates the requested memory and returns a pointer to it. It also sets allocated memory to zero.
In case you are planning to use your string as empty string all the time:
char *string = NULL;
string = (char*)calloc(1, sizeof(char));
In case you are planning to store some value in your string later:
char *string = NULL;
int numberOfChars = 50; // you can use as many as you need
string = (char*)calloc(numberOfChars + 1, sizeof(char));
To achieve this you can use:
strcpy(string, "");
string[0] = "";
"warning: assignment makes integer from pointer without a cast
Ok, let's dive into the expression ...
0: an int: represents the number of chars (assuming string is (or decayed into) a char*) to advance from the beginning of the object string
string[0]: the char object located at the beginning of the object string
"": string literal: an object of type char[1]
=: assignment operator: tries to assign a value of type char[1] to an object of type char. char[1] (decayed to char*) and char are not assignment compatible, but the compiler trusts you (the programmer) and goes ahead with the assignment anyway by converting the char* (what char[1] decayed to) to an integer, and you get the warning as a bonus. You have a really nice compiler :-)
I think Amarghosh answered correctly. If you just want an empty string and don't need to store anything in it later, you can write:
// this creates an array of size 1 holding only the terminator
char str[]=""; // equivalent to {0}
But if you want to initialize a string with a fixed amount of storage, you can do:
// this is better if you know your string size.
char str[5]=""; // equivalent to {0, 0, 0, 0, 0}
It's a bit late, but I think your issue may be that you've created a zero-length array, rather than an array of length 1.
A string is a series of characters followed by a string terminator ('\0'). An empty string ("") consists of no characters followed by a single string terminator character, i.e. one character in total.
So I would try declaring it with room for that terminator:
char string[1] = "";
Note that strlen does not count the terminator as part of the string length.