char arrays length in C - c

In C, you have to declare the length of an array:
int myArray[100];
But when you're dealing with chars and strings, the length can be left blank:
char myString[] = "Hello, World!";
Does the compiler generate the length for you by looking at the string?

This is not unique to char. You could do this, for instance:
int myNumbers[] = { 5, 10, 15, 20, 42 };
This is equivalent to writing:
int myNumbers[5] = { 5, 10, 15, 20, 42 };
Initialising a char array from a string literal is a special case.

Yes, it's the length including the terminating '\0'.

Does the compiler generate the length for you by looking at the string?
Yes, that's exactly why it works. The compiler sees the constant value, and can fill in the length so you don't have to do it.

Yes, the compiler knows the length of the string and allocates the appropriate space.

It's the same deal if you did something like...
int x[] = {1,2,3};

The size of the string literal (not length as in strlen) is used to size the array being initialized.
You can initialize a char array with a string literal that has embedded null bytes. The resulting array has room for every byte of the literal, including those after the first (or second, ...) null, plus the terminating null.
char array[] = "foo\0bar\0baz\0quux";
/* sizeof array is 17
** array[3] is 0
** printf("%s\n", array + 4); prints bar
** array[11] is 0
** printf("%s\n", array + 12); prints quux
** array[16] == 0
*/

Related

Transferring a string of characters in quotes to an array of ascii2 bytes in C

How do I easily convert a string of characters in quotes to an array of ascii2 values?
For example:
char ST1[5] = "12345";
to
char DST1[5] = { 0x31, 0x32, 0x33, 0x34, 0x35 }; //string of bytes
One answer is casting.
An easy way to do it:
#include <stdio.h>
int main(void) {
int i;
char ST1[5]="12345";
for (i=0;i<5;i++)
printf("%d\n",(int)ST1[i]);
return 0;
}
Just as I printed the values, you can store them, do calculations with them, or anything else. And as I see it, you want those numbers in hexadecimal. For that, change the printf conversion from %d to %x (or %02X); see "printf() formatting for hex".
Practically nothing.
First, remember that a string literal such as "12345" has an extra null byte at the end, so it really occupies 6 bytes. Initializing an array of exactly 5 elements with it is legal C, but the terminator is silently dropped, so this initialization statement is more appropriate:
char ST1[6] = "12345";
Or just simply:
char ST1[] = "12345"; // where the size of "6" is auto inferred.
Which is equivalent to:
char DST1[] = {0x31,0x32,0x33,0x34,0x35,0x00};
And once you have an array of 6 elements, you can pass it around as if it had only 5:
So if you have some code that expects to operate on an array of 5 chars:
void foo(char DST1[])
{
for(int i = 0; i < 5; i++)
{
process(DST1[i]);
}
}
You can invoke it as follows:
char ST1[]="12345";
foo(ST1);
The only difference between the definitions of ST1 and DST1 is the silent assumption that the target character set is ASCII, at least for the digits.
Here is another alternative:
char CST1[5] = { '1', '2', '3', '4', '5' };
Note however that none of these 5-element char arrays is a proper C string, because they all lack a null terminator. If you wish to define an array that is a C string, you should use this syntax:
char SST1[] = "12345";
Notice the missing length between the []: the compiler determines the length of the array from the initializer and adds a null terminator, hence a length of 6.
C strings can be copied with strcpy; char arrays that do not have a null terminator should be copied with memcpy or memmove.

Create an array of strings without allocating each string

I am trying to figure out how to create an array of strings (considering I know the max length of each string).
char** strings = NULL;
strings = malloc (5*sizeof(char*));
Once I have done that, how can I fill the array without needing to allocate each string separately? Let's say I know the max length of a string is 20; how do I set it?
After the allocation of the string I wish to do the following:
strings[0] = "string";
strings[1] = "another string";
etc.
Thanks
You can declare an array of pointers to char and then assign string literals to those pointers
char *strings[5];
strings[0] = "string";
strings[1] = "another string";
/* ... */
But note that these strings will be immutable: modifying a string literal is undefined behavior.
You can also use an array of char arrays:
char strings[5][21]; // max string length is 20, plus 1 for the null terminator
strcpy(strings[0], "string");
strcpy(strings[1], "another string");
/* ... */
One advantage of the latter is that the strings will be mutable.
If you know the maximum size of each line, and if you know the maximum number of lines, you could simply define a two-dimensional array of chars, i.e. char arr[5][20+1]. Then you will have reserved space for up to 5 lines, each comprising up to 20 characters (+ null char).
And you could also define a type alias representing such a line (if you like):
#include <stdio.h>
#include <string.h>
#define MaxLineLength 20
typedef char Line[MaxLineLength+1];
int main() {
Line input = { 0 };
scanf("%20s", input);
Line a[5] = { 0 };
strcpy(a[0], input);
strcpy(a[1], "string1");
return 0;
}

unsure about initializing one dimensional char array with array of strings

As per my understanding, an array of strings can be initialized as shown below or by using a two-dimensional array. Please tell me whether there is any other possibility.
char *states[] = { "California", "Oregon", "Washington", "Texas" };
I have observed in U-boot source that environment variables are stored in one dimensional array as shown here:
uchar default_environment[] = {
#ifdef CONFIG_BOOTARGS
"bootargs=" CONFIG_BOOTARGS "\0"
#endif
#ifdef CONFIG_BOOTCOMMAND
"bootcmd=" CONFIG_BOOTCOMMAND "\0"
#endif
...
"\0"
};
Can you help me understand this?
A "string" is effectively nothing more than a sequence of chars terminated by a char with the value 0, usually referred to through a pointer to its first char (note that the sequence must be within a single object).
char a[] = {65, 66, 67, 0, 97, 98, 99, 0, 'f', 'o', 'o', 'b', 'a', 'r', 0, 0};
/* the zeros are at indices 3, 7, 14 and 15 */
In the above array we have four elements with value 0, so you can see it as 4 strings:
// string 1
printf("%s\n", a); // prints ABC on an ASCII computer
// string 2
printf("%s\n", a + 4); // prints abc on an ASCII computer
// string 3
printf("%s\n", a + 8); // prints foobar
// string 4
printf("%s\n", a + 14); // prints empty string
As per my understanding, an array of strings can be initialized as shown below or by using a two-dimensional array. Please tell me whether there is any other possibility.
I have observed in U-boot source that environment variables are stored in one dimensional array.
If you are under the impression that this default_environment is an array of strings, it is not. This has nothing to do with array-of-strings initialization as in your first example.
You can try removing all the #ifdef and #endif lines; then it becomes clear that default_environment is simply a concatenation of individual strings, for instance "bootargs=" CONFIG_BOOTARGS "\0". Notice the \0 at the end: it ensures that anything reading default_environment as a C string stops at the end of the first entry, given CONFIG_BOOTARGS is defined.
uchar default_environment[] = {
#ifdef CONFIG_BOOTARGS
"bootargs=" CONFIG_BOOTARGS "\0"
#endif
#ifdef CONFIG_BOOTCOMMAND
"bootcmd=" CONFIG_BOOTCOMMAND "\0"
#endif
...
"\0"
};
They are not creating an array of strings there, such as your char *states[], it's a single string that is being created (as a char[]). The individual 'elements' inside the string are denoted by zero-termination.
To translate your example
char *states[] = { "California", "Oregon", "Washington", "Texas" };
to their notation would be
char states[] = { "California" "\0" "Oregon" "\0" "Washington" "\0" "Texas" "\0" "\0" };
which is the same as
char states[] = { "California\0Oregon\0Washington\0Texas\0\0" };
You can use these by getting a pointer to the start of each zero-terminated block and then the string functions, such as strlen will read until they see the next '\0' character.
As for the why of it, @M.M.'s comment gives some good indication.
If we simplify the question to what is the difference between initialising a char *foo versus a char foo[100] it might help. Check out the following code:
#include <stdio.h>
char buffer1[100] = "this is a test";
char *buffer2 = "this is a test";
int main(void)
{
printf("buffer1 = %s\n", buffer1);
buffer1[0] = 'T';
printf("buffer1 = %s\n", buffer1);
printf("buffer2 = %s\n", buffer2);
buffer2[0] = 'T'; // undefined behaviour: this write typically crashes
printf("buffer2 = %s\n", buffer2);
}
In the first case (buffer1) we initialise a character array with a string. The compiler will allocate 100 bytes for the array, and initialise the contents to "this is a test\0\0\0\0\0\0\0\0\0...". This array has static storage duration and is modifiable like any other non-const variable.
In the second case, we didn't allocate any memory for an array. All we asked the compiler to do is set aside enough memory for a pointer (4 or 8 bytes typically), then initialise that to point to a string stored somewhere else. Typically the compiler will either place the string in a separate read-only data section, or else store it alongside the code. Either way, modifying a string literal is undefined behaviour, and on most systems the memory is read-only. So on the line where I attempted to write to it, it caused a seg-fault.
That is the difference between initialising an array and a pointer.
The common technique is to use an array of pointers (note: this is different from a 2D array!) as:
char *states[] = { "California", "Oregon", "Washington", "Texas" };
That way, states is an array of 4 char pointers, and you use it simply as printf("State %s", states[i]);
For completeness, a 2D array would be char states[4][11], with all rows having the same length, which is pretty uncommon, and harder to initialize.
But some special use cases or APIs(*) require (or return) a single char array where strings are (normally) terminated with \0, the array itself being terminated by an empty element, that is, two consecutive \0. This representation allows a single allocation block for the whole array, whereas the common array of pointers has the array of pointers in one place and the strings themselves in another place. By the way, it is easy to rebuild an array of pointers from that 1D character array with \0 as separators, and it is generally done to be able to easily use the strings.
The last interesting point of the uchar default_environment[] technique is that it is a nice serialization: you can directly save it to a file and load it back. And as I have already said, the common usage is then to build an array of pointers to access the individual strings easily.
(*) For example, the WinAPI functions GetPrivateProfileSection and WritePrivateProfileSection use such a representation to set or get a list of key=value strings in one single call.
Yes, you can create and initialize an array of strings, or a hierarchy of arrays, using pointers. I describe it in detail in case someone needs it.
A single char
1. char states;
Pointer to array of chars.
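2. char *states = (char *) malloc(5 * sizeof(char));
The above statement gives you room for the same 5 chars as char states[5]; although the memory is allocated dynamically and must later be released with free().
Now it's up to you whether you initialize it with strcpy() like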
strcpy(states, "abcd");
or use direct values like this.
states[0] = 'a';
states[1] = 'b';
states[2] = 'c';
states[3] = 'd';
states[4] = '\0';
Storing any other char at index 4 will also work, but the array is only a valid C string if it ends with the null character '\0'.
Pointer to array of pointers or pointer to a matrix of chars
3. char ** states = (char **) malloc(5 * sizeof(char *));
It is an array of pointers, i.e. each element is a pointer that can point to (in other words, hold) a string, like
states[0] = malloc ( sizeof(char) * number );
states[1] = malloc ( sizeof(char) * number );
states[2] = malloc ( sizeof(char) * number );
states[3] = malloc ( sizeof(char) * number );
states[4] = malloc ( sizeof(char) * number );
The above statements give you the same capacity as char states[5][number]; although each row is a separate allocation rather than part of one contiguous block.
Again, it's up to you how you initialize these strings, e.g.
strcpy( states[0] , "hello");
strcpy ( states[1], "World!!");
states[2][0] = 'a';
states[2][1] = 'b';
states[2][2] = 'c';
states[2][3] = 'd';
states[2][4] = '\0';
Pointer to matrix of pointers or pointer to 3D chars
char *** states = (char ***) malloc(5 * sizeof(char**));
and so on.
In the end, each of these possibilities comes down to pointers.

C -> sizeof string is always 8

#include "usefunc.h" //don't worry about this -> lib I wrote
int main()
{
int i;
string given[4000], longest = "a"; //declared new typdef. equivalent to 2D char array
given[0] = "a";
printf("Please enter words separated by RETs...\n");
for (i = 1; i < 4000 && !StringEqual(given[i-1], "end"); i++)
{
given[i] = GetLine();
/*
if (sizeof(given[i]) > sizeof(longest))
{
longest = given[i];
}
*/
printf("%lu\n", sizeof(given[i])); //this ALWAYS RETURNS EIGHT!!!
}
printf("%s", longest);
}
Why does it always return 8???
There is no string data type in C. Is this C++? Or is string a typedef?
Assuming string is a typedef for char *, what you probably want is strlen, not sizeof. The 8 that you are getting with sizeof is actually the size of the pointer (to the first character in the string).
It is treating it as a pointer; the size of a pointer is 8 bytes (64 bits) on your machine.
You say "don't worry about this -> lib i wrote" but this is the critical piece of information, as it defines string. Presumably string is char* and the size of that on your machine is 8. Thus, sizeof(given[i]) is 8 because given [i] is a string. Perhaps you want strlen rather than sizeof.
This is a common confusion between the array of characters itself and the pointer to where that array starts.
For instance, this char array initialized from a string literal:
char hello[14] = "Hello, World!";
is 14 bytes (13 for the message, and 1 for the null terminating character).
You can use sizeof to determine the size of such an array.
However, if we create a pointer to that string:
char* strptr = hello;
and attempt to find its size with sizeof, it will always return the size of a data pointer on your system.
So, in other words, when you try to get the size of the string from a string library, you're truly only getting the size of the pointer to the start of that string. What you need to use is the strlen() function, which returns the length of the string in characters, not counting the terminator:
sizeof(strptr); //usually 4 or 8 bytes
strlen(strptr); //going to be 13 characters
Hope this clears things up!

Initialize a string in C to empty string

I want to initialize string in C to empty string.
I tried:
string[0] = "";
but it wrote
"warning: assignment makes integer from pointer without a cast"
How should I do it then?
In addition to Will Dean's version, the following are common for whole buffer initialization:
char s[10] = {'\0'};
or
char s[10];
memset(s, '\0', sizeof(s));
or
char s[10];
strncpy(s, "", sizeof(s));
You want to set the first character of the string to zero, like this:
char myString[10];
myString[0] = '\0';
(Or myString[0] = 0;)
Or, actually, on initialisation, you can do:
char myString[10] = "";
But that's not a general way to set a string to zero length once it's been defined.
Assuming your array called 'string' already exists, try
string[0] = '\0';
\0 is the explicit NUL terminator, required to mark the end of string.
Initializing a char array from a string literal is allowed only in its declaration:
char string[] = "";
This declares string as a char array of size 1 and initializes it with \0.
Try this too:
char str1[] = "";
char str2[5] = "";
printf("%zu, %zu\n", sizeof(str1), sizeof(str2)); //prints 1, 5
calloc allocates the requested memory and returns a pointer to it. It also sets allocated memory to zero.
In case you are planning to use your string as empty string all the time:
char *string = NULL;
string = (char*)calloc(1, sizeof(char));
In case you are planning to store some value in your string later:
char *string = NULL;
int numberOfChars = 50; // you can use as many as you need
string = (char*)calloc(numberOfChars + 1, sizeof(char));
To achieve this you can use:
strcpy(string, "");
string[0] = "";
"warning: assignment makes integer from pointer without a cast"
Ok, let's dive into the expression ...
0 an int: represents the number of chars (assuming string is (or decayed into) a char*) to advance from the beginning of the object string
string[0]: the char object located at the beginning of the object string
"": string literal: an object of type char[1]
=: assignment operator: tries to assign a value of type char[1] to an object of type char. char[1] (decayed to char*) and char are not assignment compatible, but the compiler trusts you (the programmer) and goes ahead with the assignment anyway by casting the type char* (what char[1] decayed to) to an int --- and you get the warning as a bonus. You have a really nice compiler :-)
I think Amarghosh answered correctly. If you want to initialize an empty string (without knowing the size), the best way is:
//this creates an empty string in an array of exactly 1 byte.
char str[]=""; // equivalent to {0}
But if you want to initialize a string with a fixed amount of memory, you can do:
// this is better if you know your string size.
char str[5]=""; // equivalent to {0, 0, 0, 0, 0}
It's a bit late but I think your issue may be that you've created a zero-length array, rather than an array of length 1.
A string is a series of characters followed by a string terminator ('\0'). An empty string ("") consists of no characters followed by a single string terminator character - i.e. one character in total.
So I would try the following:
char string[1] = "";
Note that strlen does not reflect this: it does not count the terminator as part of the string length.
