so if I have
char a = 'a';
char b = 'b';
char* combination = a + b;
where the result should be
combination = "ab"
how would I do that? I'm 99% sure that the "+" operator wouldn't work here, but I'm not sure what I should do instead. I am new to C, so sorry for how trivial this question is!
edit** is it possible to do this without using an array? meaning without using brackets []
adding 2 chars together to form one char* ... how would I do that?
Since C99, a direct way to do this is with a compound literal. @haccks, @unwind
char a = 'a';
char b = 'b';
// v-------------------v--- compound literal
char* combination1 = (char[]) {a, b, '\0'};
printf("<%s>\n", combination1);
combination1 is valid until the end of the block.
I'm 99% sure that the "+" operator wouldn't work here,
OP is correct, + is not the right approach.
is it possible to this without using an array? meaning without using brackets []
This is a curious restrictive requirement.
Code could allocate memory and then assign.
char *concat2char(char a, char b) {
char *combination = malloc(3);
if (combination) {
char *s = combination;
*s++ = a;
*s++ = b;
*s = '\0';
}
return combination;
}
// Sample usage
char *combination2 = concat2char('a', 'b');
if (combination2) {
printf("<%s>\n", combination2);
free(combination2); // free memory when done
}
Using a 3-member struct looks like an option, yet portable code cannot rely on s3 being packed.
struct {
char a, b, n;
} s3 = {a, b, '\0' };
// Unreliable to form a string, do not use !!
char* combination3 = (char *) &s3;
printf("<%s>\n", combination3);
You need to set up a target array that's at least one element larger than the number of characters in the final string:
char str[3]; // 2 characters plus string terminator
str[0] = a;
str[1] = b;
str[2] = 0; // you'll sometimes see that written as str[2] = '\0'
Neither the + nor = operators are defined for string operations. You'll have to use the str* library functions (strcat, strcpy, strlen, etc.) for string operations.
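The str* functions work on whole strings rather than on single chars, so for two separate char values a formatted write is one alternative. The sketch below uses the standard snprintf with two %c conversions; the helper name join_two_chars and its return convention are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes the two chars followed by a terminator into dst.
   Returns 0 on success, -1 if dst cannot hold 2 chars plus the terminator. */
int join_two_chars(char *dst, size_t cap, char a, char b)
{
    assert(dst != NULL);
    if (cap < 3)
        return -1;
    /* snprintf writes at most cap bytes and terminates the output */
    snprintf(dst, cap, "%c%c", a, b);
    return 0;
}
```

A caller would pass a buffer of at least 3 chars, for example char buf[3]; join_two_chars(buf, sizeof buf, 'a', 'b');.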
EDIT
In C, a string is a sequence of character values terminated by a 0-valued byte - the string "hello" is represented by the character sequence {'h', 'e', 'l', 'l', 'o', 0} (sometimes you'll see '\0' instead of plain 0). Strings are stored in arrays of char. String literals like "hello" are stored in arrays of char such that they are visible over the lifetime of the program, but are not meant to be modified - attempting to update the contents of a string literal results in undefined behavior (code may crash, operation may just not succeed, code may behave as expected, etc.).
Except when it is the operand of the sizeof or unary & operators, or is a string literal used to initialize a character array in a declaration, an expression of type "N-element array of T" will be converted (will "decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element. So when we're dealing with strings, we're usually dealing with expressions of type char * - however, that does not mean that a char * always refers to a string. It can point to a single character that isn't part of a string, or it can point to a sequence of characters that doesn't have a 0 terminator.
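Since a char * does not by itself say whether a terminator exists, code that knows the size of the underlying buffer can check before treating it as a string. A minimal sketch, assuming the caller knows that size; has_terminator is a hypothetical name, memchr is the standard function from <string.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the first n bytes at p contain a 0
   terminator (so p can be read as a string within that bound), else 0. */
int has_terminator(const char *p, size_t n)
{
    assert(p != NULL);
    return memchr(p, '\0', n) != NULL;
}
```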
So, let's start with a string literal like "hello". It will be stored "somewhere" as the array:
+---+---+---+---+---+---+
|'h'|'e'|'l'|'l'|'o'| 0 |
+---+---+---+---+---+---+
When you write something like
char *str = "hello";
the address of the first element of the array that stores "hello" is written to pointer variable str:
+---+ +---+---+---+---+---+---+
str: | | ---> |'h'|'e'|'l'|'l'|'o'| 0 |
+---+ +---+---+---+---+---+---+
While you can read each str[i], you should not write to it (technically, the behavior is undefined). On the other hand, when you write something like:
char str[] = "hello";
str is created as an array, and the contents of the string literal are copied to the array:
+---+---+---+---+---+---+
|'h'|'e'|'l'|'l'|'o'| 0 |
+---+---+---+---+---+---+
+---+---+---+---+---+---+
str: |'h'|'e'|'l'|'l'|'o'| 0 |
+---+---+---+---+---+---+
The array is sized based on the size of the initializer, so it will be 6 elements wide (+1 for the string terminator). If the array is going to hold the result of a concatenation or print operation, then it will need to be large enough to hold the resulting string, plus the 0 terminator, and C arrays do not automatically grow as stuff is added to them. So, if you want to concatenate two 3-character strings together, then the target array must be at least 7 elements wide:
char result[7];
char *foo = "foo";
char *bar = "bar";
strcpy( result, foo ); // copies contents of foo to result
strcat( result, bar ); // appends the contents of bar to result
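Because strcpy and strcat do not know how large the target is, the size check has to be done by the caller. The following is only a sketch of one way to do that check up front; concat_checked and its return values are assumptions for illustration, not a library function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: stores a followed by b in dst.
   Returns 0 on success, -1 if the result plus terminator does not fit. */
int concat_checked(char *dst, size_t cap, const char *a, const char *b)
{
    assert(dst != NULL && a != NULL && b != NULL);
    size_t la = strlen(a);
    size_t lb = strlen(b);
    /* need la + lb + 1 <= cap, written so the arithmetic cannot wrap */
    if (la >= cap || lb >= cap - la)
        return -1;
    memcpy(dst, a, la);
    memcpy(dst + la, b, lb + 1);   /* +1 also copies b's terminator */
    return 0;
}
```

With char result[7], concatenating "foo" and "bar" succeeds; with a 6-element target the helper reports failure instead of writing past the end.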
Just define an array, and arrange the two characters into it, then end it with a final zero-character to form a proper C string:
const char a = 'a';
const char b = 'b';
const char my_string[] = { a, b, '\0' };
printf("The string is '%s'\n", my_string);
In response to comments (which I don't understand, but I'm here to help, heh), here's a twisted way of writing it that does away with the brackets.
uint32_t memory;
char *p = (char *) &memory;
*p++ = a;
*p++ = b;
*p-- = '\0';
printf("the string is '%s'\n", --p);
Note: this code is absurd, but so is the request to not use random parts of the language you're programming in. Do not use this code for anything except reading it here and thinking "wow, that would be soo much better if I could just use braces".
Related
I was solving a challenge on CodeSignal in C. Even though the correct libraries were included, I couldn't use the strrev function in the IDE, so I looked up a similar solution and modified it to work. This is good. However, I don't understand the distinction between a literal string and an array. Reading all this online has left me a bit confused. If C stores all strings as an array with each character terminated by \0 (null terminated), how can there be any such thing as a literal string? Also, if it is the case that strings are stored as an array, would *inputString store the address of the array, or is it an array itself of all the individual characters stored?
Thanks in advance for any clarification provided!
Here is the original challenge, C:
Given the string, check if it is a palindrome.
bool solution(char * inputString) {
// The input will be character array type, storing a single character each terminated by \0 at each index
// * inputString is a pointer that stores the memory address of inputString. The memory address points to the user inputted string
// bonus: inputString is an array object starting at index 0
// The solution function is set up as a Boolean type ("1" is TRUE and the default "0" is FALSE)
int begin = 0;
// The first element of the inputString array is at position 0, so is the 'counter'
int end = strlen(inputString) - 1;
// The last element is the length of the string minus 1 since the counter starts at 0 (not 1) by convention
while (end > begin) {
if (inputString[begin++] != inputString[end--]) {
return 0;
}
} return 1;
}
A string is also an array of symbols. I think that what you don't understand is the difference between a char pointer and a string. Let me explain in an example:
Imagine I have the following:
char str[20]="helloword";
In most expressions, str converts to the address of the first symbol of the array. In this case that is the address of h. Now try to printf the following:
printf("%c",str[0]);
You can see that it has printed the element at that address, which is 'h'.
If I now declare a char pointer, it can point to whatever char address I want:
char *c_pointer = str+1;
Now print the element of c_pointer:
printf("%c",c_pointer[0]);
You can see that it will print 'e', as it is the element at the second address of the original string str.
In addition, what printf("%s", string) does is print every element/symbol/char from the starting address (string) up to the address whose element is '\0'.
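The walk described above can be written out by hand. This is only a conceptual stand-in for what %s does, not how any real printf is implemented; emit_string is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: starts at the given address and writes chars to out
   until it reaches the 0 terminator. Returns the number of chars written. */
size_t emit_string(const char *s, FILE *out)
{
    assert(s != NULL && out != NULL);
    size_t n = 0;
    while (*s != '\0') {
        fputc(*s, out);
        s++;        /* move to the next address */
        n++;
    }
    return n;
}
```

Passing str writes the whole word, while passing str + 1 starts one address later and so skips the first character.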
The linked question/answers in the comments pretty much cover this, but saying the same thing a slightly different way helps sometimes.
A string literal is a quoted sequence of characters in the source code, such as "hello"; in the example below it is assigned to a char pointer. It is considered read only. That is, any attempt to modify it results in undefined behavior. I believe that most implementations put string literals in read-only memory. IMO, it's a shortcoming of C (fixed in C++) that a const char* type isn't required for assigning a string literal. Consider:
int main(void)
{
char* str = "hello";
}
str points to a string literal. If you try to modify it like:
#include <string.h>
...
str[2] = 'f'; // BAD, undefined behavior
strcpy(str, "foo"); // BAD, undefined behavior
you've broken the rules. String literals are read only. In fact, you should get in the habit of assigning them to const char* types so the compiler can warn you if you try to do something stupid:
const char* str = "hello"; // now you should get some compiler help if you
// ever try to write to str
In memory, the string "hello" resides somewhere in memory, and str points to it:
str
|
|
+-------------------> "hello"
If you assign a string to an array, things are different:
int main(void)
{
char str2[] = "hello";
}
str2 is not read only, you are free to modify it as you want. Just take care not to exceed the buffer size:
#include <string.h>
...
str2[2] = 'f'; // this is OK
strcpy(str2, "foo"); // this is OK
strcpy(str2, "longer than hello"); // this is _not_ OK, we've overflowed the buffer
In memory, str2 is an array
str2 = { 'h', 'e', 'l', 'l', 'o', '\0' }
and is present right there in automatic storage. It doesn't point to some string elsewhere in memory.
In most cases, str2 can be used as a char* because in C, in most contexts, an array will decay to a pointer to its first element. So, you can pass str2 to a function with a char* parameter. One instance where this is not true is with sizeof:
sizeof(str) // this is the size of a pointer (typically 4 or 8 depending on your
// architecture). It _does not matter_ how long the string that
// str points to is
sizeof(str2) // this is 6, string length plus the NUL terminator.
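The same decay explains why sizeof cannot recover an array's size inside a function that received it as an argument. A small sketch; size_seen_by_callee is a hypothetical name used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: inside the function only a pointer is visible,
   so sizeof reports the pointer size, not the caller's array size. */
size_t size_seen_by_callee(const char *s)
{
    assert(s != NULL);
    return sizeof s;
}
```

In the caller, sizeof str2 is still 6, but size_seen_by_callee(str2) yields sizeof(char *); this is why functions that need the buffer size take it as a separate parameter.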
I need to count the length of a string in C without using the string.h library, using char pointers. So far all the methods I've seen use the string.h library. Is it possible to do it without it?
Yes, it is possible. A string in C is null terminated, so you can loop through the string until you find the '\0' character, counting each character as you go.
When talking about strings in C, you're essentially talking about an array of char, terminated by the null character ('\0'). The length of a string can therefore be computed as:
size_t len(const char *s) {
// if no string was passed (null pointer), just return
if (s == NULL)
return 0;
size_t ln;
// iterate over string until you encounter the terminating char
for (ln=0;s[ln] != '\0'; ln++);
return ln;
}
As pointed out by @chux-ReinstateMonica: the correct type to return here is size_t, both to avoid potential overflow issues and to make this a drop-in replacement for the strlen() function in string.h. The length of a string can, after all, never be negative, so using the unsigned size_t makes more sense than the signed int.
Now this obviously assumes the input was a valid string. If I were to do something like:
int main( void ) {
char * random = malloc(1024); // allocate 1k chars, without initialising memory
printf("length is: %zu\n", len(random)); // %zu is the conversion specifier for size_t
return 0;
}
There's no guarantee the memory I allocated will even contain a terminating character, and therefore it's not impossible for the function to return a value greater than 1024. Same goes for something like:
char foo[3] = {'f', 'o', 'o'}; // full array, no terminating char
This string, in memory, could look something like this:
//the string - random memory after the array
| f | o | o | g | a | r | b | a | g | e | U | B | \0 |
Making it look like a string of 12 characters. If you then use the assumed length of 12 to write past the actual array, all bets are off (undefined behaviour).
This string should've been:
char foo[4] = {'f', 'o', 'o', '\0'}; // or char foo[4] = "foo";
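When the size of the underlying array is known, the scan can be bounded so that it never runs past the array even if the terminator is missing. A minimal sketch under that assumption; len_bounded is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: like len(), but never reads more than max chars.
   Returns max when no terminator was found within the bound. */
size_t len_bounded(const char *s, size_t max)
{
    assert(s != NULL);
    size_t n = 0;
    while (n < max && s[n] != '\0')
        n++;
    return n;
}
```

A result equal to max tells the caller that no terminator was seen, which is how the unterminated char foo[3] can be told apart from the corrected char foo[4].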
You can read an old answer I gave elsewhere for some more details on how to get the length of a string passed to a function here
Here is an example:
size_t pos(char* s)
{
size_t lenS; //len of string
for (lenS = 0; s[lenS]; lenS++);
return lenS;
}
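Since the question asks for char pointers specifically, the same loop can be written without any indexing. This is a sketch with the hypothetical name len_ptr; it assumes, like the versions above, that the input is a valid terminated string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name; same result as strlen(), written with pointers only. */
size_t len_ptr(const char *s)
{
    assert(s != NULL);
    const char *p = s;
    while (*p != '\0')
        p++;                    /* advance until the terminator */
    return (size_t)(p - s);     /* distance between end and start */
}
```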
As per my understanding array of strings can be initialized as shown below or using two dimensional array. Please tell is there any other possibility.
char *states[] = { "California", "Oregon", "Washington", "Texas" };
I have observed in U-boot source that environment variables are stored in one dimensional array as shown here:
uchar default_environment[] = {
#ifdef CONFIG_BOOTARGS
"bootargs=" CONFIG_BOOTARGS "\0"
#endif
#ifdef CONFIG_BOOTCOMMAND
"bootcmd=" CONFIG_BOOTCOMMAND "\0"
#endif
...
"\0"
};
Can you help me understand this?
A "string" is effectively nothing more than a sequence of chars terminated by a char with the value 0, usually accessed through a pointer to its first char (note that the sequence must be within a single object).
char a[] = {65, 66, 67, 0, 97, 98, 99, 0, 'f', 'o', 'o', 'b', 'a', 'r', 0, 0};
/* ^ ^ ^ ^ */
In the above array we have four elements with value 0 ... so you can see that as 4 strings
// string 1
printf("%s\n", a); // prints ABC on an ASCII computer
// string 2
printf("%s\n", a + 4); // prints abc on an ASCII computer
// string 3
printf("%s\n", a + 8); // prints foobar
// string 4
printf("%s\n", a + 14); // prints empty string
As per my understanding array of strings can be initialized as shown below or using two dimensional array. Please tell is there any other possibility.
I have observed in U-boot source that environment variables are stored in one dimensional array.
If you are assuming that default_environment is an array of strings, it is not. This has nothing to do with array-of-strings initialization as in your first example.
Try removing all the #ifdef and #endif lines; it then becomes clear that default_environment is simply a concatenation of individual strings, for instance "bootargs=" CONFIG_BOOTARGS "\0". Notice the \0 at the end: given that CONFIG_BOOTARGS is defined, any code that treats default_environment as a single string will stop reading at the end of that first entry.
uchar default_environment[] = {
#ifdef CONFIG_BOOTARGS
"bootargs=" CONFIG_BOOTARGS "\0"
#endif
#ifdef CONFIG_BOOTCOMMAND
"bootcmd=" CONFIG_BOOTCOMMAND "\0"
#endif
...
"\0"
};
They are not creating an array of strings there, such as your char *states[], it's a single string that is being created (as a char[]). The individual 'elements' inside the string are denoted by zero-termination.
To translate your example
char *states[] = { "California", "Oregon", "Washington", "Texas" };
to their notation would be
char states[] = { "California" "\0" "Oregon" "\0" "Washington" "\0" "Texas" "\0" "\0" };
which is the same as
char states[] = { "California\0Oregon\0Washington\0Texas\0\0" };
You can use these by getting a pointer to the start of each zero-terminated block and then the string functions, such as strlen will read until they see the next '\0' character.
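Stepping from one zero-terminated block to the next can be sketched as follows. The helper nth_string is hypothetical; it assumes the block ends with an empty string (two consecutive zero bytes), as in the notation above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the n-th string (0-based) in a block of
   zero-terminated strings that ends with an empty string, or NULL if the
   block holds fewer than n + 1 strings. */
const char *nth_string(const char *block, size_t n)
{
    assert(block != NULL);
    const char *p = block;
    while (*p != '\0') {        /* an empty string marks the end */
        if (n == 0)
            return p;
        p += strlen(p) + 1;     /* skip this string and its terminator */
        n--;
    }
    return NULL;
}
```

Each returned pointer can be handed directly to printf("%s") or strlen, since every entry carries its own terminator.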
As for the why of it, @M.M.'s comment gives some good indication.
If we simplify the question to what is the difference between initialising a char *foo versus a char foo[100] it might help. Check out the following code:
char buffer1[100] = "this is a test";
char *buffer2 = "this is a test";
int main(void)
{
printf("buffer1 = %s\n", buffer1);
buffer1[0] = 'T';
printf("buffer1 = %s\n", buffer1);
printf("buffer2 = %s\n", buffer2);
buffer2[0] = 'T'; // undefined behaviour: this write typically crashes
printf("buffer2 = %s\n", buffer2); // usually never reached
}
In the first case (buffer1) we initialise a character array with a string. The compiler will allocate 100 bytes for the array, and initialise the contents to "this is a test\0\0\0\0\0\0\0\0\0...". This array is modifiable like any other writable object (as a global it has static storage, not heap storage).
In the second case, we didn't allocate any memory for an array. All we asked the compiler to do is set aside enough memory for a pointer (4 or 8 bytes typically), then initialise that to point to a string stored somewhere else. Typically the compiler will either place the string in a separate read-only data section, or else just store it inline with the code. In either case this string must be treated as read-only. So on the line where I attempted to write to it (buffer2[0] = 'T'), it caused a seg-fault.
That is the difference between initialising an array and a pointer.
The common technique is to use an array of pointers (note: this is different from a 2D array!) as:
char *states[] = { "California", "Oregon", "Washington", "Texas" };
That way, states is an array of 4 char pointers, and you use it simply as printf("State %s", states[i]);
For completeness, a 2D array would be char states[4][11] (4 rows, each wide enough for the longest name plus its terminator), with all rows having the same length, which is pretty uncommon and harder to initialize.
But some special use cases or APIs(*) require (or return) a single char array where strings are (normally) terminated with \0, the array itself being terminated by an empty element, that is, two consecutive \0. This representation allows a single allocation block for the whole array, whereas the common array of pointers has the array of pointers in one place and the strings themselves in another place. By the way, it is easy to rebuild an array of pointers from that 1D character array with \0 as separators, and it is generally done to be able to easily use the strings.
The last interesting point of the uchar default_environment[] technique is that it is a nice serialization: you can directly save it to a file and load it back. And as I have already said, the common usage is then to build an array of pointers to easily access the individual strings.
(*) For example, the WinAPI functions GetPrivateProfileSection and WritePrivateProfileSection use such a representation to set or get a list of key=value strings in one single call.
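Rebuilding an array of pointers from such a 1D block might look like the sketch below. index_block is a hypothetical helper; it assumes the block ends with two consecutive \0 and that the caller supplies the output array and its capacity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fills out[] with pointers to each string in a
   double-zero-terminated block, storing at most max pointers.
   Returns the number of pointers stored. */
size_t index_block(const char *block, const char **out, size_t max)
{
    assert(block != NULL && out != NULL);
    size_t count = 0;
    for (const char *p = block; *p != '\0' && count < max; p += strlen(p) + 1)
        out[count++] = p;       /* points into block, nothing is copied */
    return count;
}
```

The pointers refer to the original block, so the block must stay alive for as long as the pointer array is in use.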
Yes, you can create and initialize an array of strings, or a hierarchy of arrays, using pointers. I describe it in detail in case someone needs it.
A single char
1. char states;
Pointer to array of chars.
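2. char *states = (char *) malloc(5 * sizeof(char));
The above statement gives you room for 5 chars, much like char states[5];, except that the memory is heap-allocated and must be released with free().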
Now its up to you if you initialize it with strcpy() like
strcpy(states, "abcd");
or use direct values like this.
states[0] = 'a';
states[1] = 'b';
states[2] = 'c';
states[3] = 'd';
states[4] = '\0';
Storing any other char at index 4 will compile and run, but then the array is not a valid C string; always end it with the null character '\0'.
Pointer to array of pointers or pointer to a matrix of chars
3. char ** states = (char **) malloc(5 * sizeof(char *));
It is an array of pointers i.e. each element is a pointer, which can point to or in other word hold a string etc. like
states[0] = malloc ( sizeof(char) * number );
states[1] = malloc ( sizeof(char) * number );
states[2] = malloc ( sizeof(char) * number );
states[3] = malloc ( sizeof(char) * number );
states[4] = malloc ( sizeof(char) * number );
The above statements give you storage comparable to char states[5][number];, except that each row is a separate heap allocation.
Again its up to you how you initialize these pointers to strings i.e.
strcpy( states[0] , "hello");
strcpy ( states[1], "World!!");
states[2][0] = 'a';
states[2][1] = 'b';
states[2][2] = 'c';
states[2][3] = 'd';
states[2][4] = '\0';
Pointer to matrix of pointers or pointer to 3D chars
char *** states = (char ***) malloc(5 * sizeof(char**));
and so on.
Actually each of these possibilities reaches somehow to pointers.
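The malloc calls above are shown without error checks or cleanup. As a sketch of the pointer-to-pointers case with both added, the hypothetical helpers copy_strings and free_strings below build a heap-allocated array of heap-allocated strings and release it again.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: frees n strings and the pointer array itself. */
void free_strings(char **v, size_t n)
{
    if (v == NULL)
        return;
    for (size_t i = 0; i < n; i++)
        free(v[i]);
    free(v);
}

/* Hypothetical helper: returns a heap-allocated array holding heap-allocated
   copies of src[0..n-1], or NULL if any allocation fails. */
char **copy_strings(const char *const *src, size_t n)
{
    assert(src != NULL);
    char **v = malloc((n ? n : 1) * sizeof *v);
    if (v == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        size_t size = strlen(src[i]) + 1;   /* +1 for the terminator */
        v[i] = malloc(size);
        if (v[i] == NULL) {
            free_strings(v, i);             /* release what was built so far */
            return NULL;
        }
        memcpy(v[i], src[i], size);
    }
    return v;
}
```

Each row is sized to its own string, and the copies are writable, unlike the string literals they were copied from.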
I've recently started to try learn the C programming language. In my first program (simple hello world thing) I came across the different ways to declare a string after I realised I couldn't just do variable_name = "string data":
char *variable_name = "data"
char variable_name[] = "data"
char variable_name[5] = "data"
What I don't understand is the difference between them. I know they are different and one of them specifically allocates an amount of memory to store the data in but that's about it, and I feel like I need to understand this inside out before moving onto more complex concepts in C.
Also, why does using *variable_name let me reassign the variable name to a new string but variable_name[number] or variable_name[] does not? Surely if I assign, say, 10 bytes to it (char variable_name[10] = "data") and try reassigning it to something that is 10 bytes or smaller it should work, so why doesn't it?
What are the empty brackets and the asterisk doing?
In this declaration
char *variable_name = "data";
a pointer is declared. This pointer points to the first character of the string literal "data". The compiler places the string literal in some region of memory and initializes the pointer with the address of the first character of the literal.
You may reassign the pointer. For example
char *variable_name = "data";
char c = 'A';
variable_name = &c;
However you may not change the string literal itself. An attempt to change a string literal results in undefined behaviour of the program.
In these declarations
char variable_name[] = "data";
char variable_name[5] = "data";
two arrays are declared, the elements of which are initialized with the characters of the string literals used as initializers. For example this declaration
char variable_name[] = "data";
is equivalent to the following
char variable_name[] = { 'd', 'a', 't', 'a', '\0' };
The array will have 5 elements. So this declaration is fully equivalent to the declaration
char variable_name[5] = "data";
There is a difference if you would specify some other size of the array. For example
char variable_name[7] = "data";
In this case the array would be initialized the following way
char variable_name[7] = { 'd', 'a', 't', 'a', '\0', '\0', '\0' };
That is all elements of the array that do not have explicit initializers are zero-initialized.
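The zero-fill rule can be checked directly by counting the zero-valued elements at the end of each array. The helper trailing_zeros is a hypothetical name used only for this check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the zero-valued chars at the end of an array
   of n elements. */
size_t trailing_zeros(const char *arr, size_t n)
{
    assert(arr != NULL);
    size_t count = 0;
    while (count < n && arr[n - 1 - count] == '\0')
        count++;
    return count;
}
```

For char variable_name[] = "data" the count is 1 (only the terminator), while for char variable_name[7] = "data" it is 3.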
Note that in C you may declare a character array using a string literal in the following way
char variable_name[4] = "data";
In that case the terminating zero of the string literal is not placed in the array, so the array does not contain a string.
In C++ such a declaration is invalid.
Of course you may change elements of the array (if it is not defined as a constant array) if you want.
Take into account that you may enclose a string literal used as an initializer in braces. For example
char variable_name[5] = { "data" };
In C99 you may also use so-called designated initializers. For example
char variable_name[] = { [4] = 'A', [5] = '\0' };
Here is a demonstrative program
#include <stdio.h>
#include <string.h>
int main(void)
{
char variable_name[] = { [4] = 'A', [5] = '\0' };
printf( "%zu\n", sizeof( variable_name ) );
printf( "%zu\n", strlen( variable_name ) );
return 0;
}
The program output is
6
0
When you apply the standard C function strlen, declared in the header <string.h>, it returns 0 because the elements of the array that precede the element with index 4 are zero-initialized.
Let's say I have a char pointer called string1 that points to the first character in the word "hahahaha". I want to create a char[] that contains the same string that string1 points to.
How come this does not work?
char string2[] = string1;
"How come this does not work?"
Because that's not how the C language was defined.
You can create a copy using strdup() [Note that strdup() is POSIX; it was not part of ISO C before C23]
Refs:
C string handling
strdup() - what does it do in C?
1) pointer string2 == pointer string1
change in value of either will change the other
From poster poida
char string1[] = "hahahahaha";
char* string2 = string1;
2) Make a Copy
char string1[] = "hahahahaha";
char string2[11]; /* allocate sufficient memory plus null character */
strcpy(string2, string1);
change in value of one of them will not change the other
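When the length of the source string is not known at compile time, the copy can go into heap memory instead of a fixed-size array. The sketch below is a hand-written equivalent of what strdup() does, using only malloc and memcpy; copy_string is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper with the same contract as POSIX strdup():
   returns a malloc'd copy of s, or NULL on allocation failure.
   The caller must free() the result. */
char *copy_string(const char *s)
{
    assert(s != NULL);
    size_t size = strlen(s) + 1;   /* +1 for the terminator */
    char *copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, s, size);
    return copy;
}
```

As with option 2, changing the copy leaves the original untouched.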
What you write like this:
char str[] = "hello";
... actually becomes this:
char str[] = {'h', 'e', 'l', 'l', 'o', '\0'};
Here the string literal acts as an initializer.
The initializer is responsible for filling the character array in the above scenario.
Conceptually, this is what happens behind the scenes:
char str[6];
str[0] = 'h';
str[1] = 'e';
str[2] = 'l';
str[3] = 'l';
str[4] = 'o';
str[5] = '\0';
C is a very low level language. Your statement:
char str[] = another_str;
doesn't make sense to C.
It is not possible to assign an entire array, to another in C. You have to copy letter by letter, either manually or using the strcpy() function.
In the above statement, the initializer does not know the length of the another_str array variable. If you hard code the string instead of putting another_str, then it will work.
Some other languages might allow to do such things... but you can't expect a manual car to switch gears automatically. You are in charge of it.
In C you have to reserve memory to hold a string.
This is done automatically when you initialize a char[] with a string literal.
On the other hand, when you write string2 = string1,
what you are actually doing is assigning the address held by one pointer-to-char object to another. If string2 is declared as char* (pointer-to-char), then the following initialization is valid:
char* string2 = "Hello.";
The variable string2 now holds the address of the first character of the constant array of char "Hello.".
It is also fine to write string2 = string1 when string2 is a char* and string1 is a char[].
However, a char[] has a fixed address in memory; the array itself is not assignable.
So, it is not allowed to write statements like this:
char string2[10];
string2 = (something...);
However, you are able to modify the individual characters of string2, because it is an array of characters:
string2[0] = 'x'; /* That's ok! */