In a C function, what's the difference between char and string?

I'm using a template from my teacher, and at the beginning of the code it says:
#include "lab8.h"
void main(void)
{
int response;
int count;
string words[MAX_COUNT];
Later on in the function, a whole lot of words get put into the words array. So I looked at that last line and got confused. I thought char was what declared strings? What does that last line even do? I also noticed that a couple of function parameter lists later on use "string words" instead of something mentioning char, which is what I expected.
EDIT:
typedef char string[MAX_LENGTH];
had been written in the .h file; I didn't see it.

C does not have a basic data type called string.
Check the lab8.h file carefully. Most likely, string is a typedef built on char; as the EDIT shows, here it is an array of char (typedef char string[MAX_LENGTH];).
Essentially, string words[MAX_COUNT]; defines an array of MAX_COUNT elements, each of type string.

C does not have a dedicated string data type. In C, a string is a sequence of character values followed by a zero-valued byte. Strings are stored as arrays of char, but not all arrays of char contain strings.
For example,
char word[] = { 'h', 'e', 'l', 'l', 'o', 0 };
stores the string "hello" in the array variable word. The array size is taken from the size of the initializer, which is 6 (5 characters plus the 0 terminator). The zero-valued byte serves as a sentinel value for string handling functions like strlen, strcpy, strcat, and for arguments to printf and scanf that use the %s and %[ conversion specifiers.
By contrast,
char arr[] = { 'h', 'e', 'l', 'l', 'o' };
stores a sequence of character values, but since there's no terminating 0-valued byte, this sequence is not considered a string, and you would not want to use it as an argument to any string-handling function (since there's no terminator, the function has no way of knowing where the string ends and will wind up attempting to access memory outside of the array, which can lead to anything from garbage output to a crash).
Without seeing the contents of lab8.h, I'm going to speculate that the string type is a typedef for an array of char, something like
#define MAX_STRING_LENGTH 20 // or some other value
typedef char string[MAX_STRING_LENGTH];
Thus, an array of string is an array of arrays of char; it would be equivalent to
char words[MAX_COUNT][MAX_STRING_LENGTH];
So each words[i] is a MAX_STRING_LENGTH-element array of char.

Related

I want to know how double quotes are used in C

In the code char * str = "hello";, I understand that "hello" allocates the word hello somewhere else in memory and then stores the address of the first element of that memory in the variable str.
But when I use the code char str[10] = "hello";, I understood that each character of hello is stored in an element of the array.
If so, then in the first case the code "hello" returns the address of the memory,
and in the second case the code "hello" returns the characters h e l l o \n.
I want to know why they are different and, if I'm wrong, what double quotes actually return.
C is a bit quirky. You have two distinct use cases here. But let's first start with what "hello" is.
Your "hello" in the program source code is a character string literal, that is, a character sequence enclosed in double quotes. When the compiler compiles this source code, it appends a zero byte to the sequence, so that standard library functions like strlen() can work on it. The resulting zero-terminated sequence is then used to "initialize an array of static storage duration and length just sufficient to contain the sequence" (n1570 ISO C draft, 6.4.5/6). That length is 6: the 5 characters h, e, l, l and o as well as the appended zero byte.
"Static storage duration" means that the array exists the entire time the program is running (as opposed to objects with automatic local storage duration, e.g. local variables, and those with dynamic storage duration, which are created via malloc() or calloc()).
You can store the address of that array, as in char *str = "hello";. This address points to valid memory during the whole lifetime of the program.
The second use case is a special syntax for initializing character arrays. It is just syntactic sugar for this common use case, and a deviation from the fact that you cannot normally initialize arrays with arrays.[1]
This time you don't define a pointer; you define a proper array of 10 chars and use the string literal to initialize it. You can always use the generic method of initializing a character array by listing the individual array elements, separated by commas, in curly braces (by the way, this generic method also works for the other kind of aggregate type, namely structs):
char str[10] = { 'h', 'e', 'l', 'l', 'o', '\0' };
This is entirely equivalent to
char str[10] = "hello";
Now your array has more elements (10) than the initializing array produced from the string literal (6); the standard stipulates that "subobjects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration". Objects with static storage duration, such as global and static variables, are initialized with zero, so str[6] through str[9] are zero as well, in addition to the terminator in str[5].
It is immediately obvious why Dennis Ritchie added the somewhat anti-paradigmatic initialization of character arrays via a string literal, probably after the second time he had to do it with the generic array initialization syntax. Designing your own language has its benefits.
[1] For example, static char src[] = "123"; char dest[] = src; doesn't compile. You have to give dest a size and copy the contents, e.g. with strcpy().
The initialization:
char * str = "hello";
in most C implementations places the string hello in a read-only data section of the executable. Exactly six bytes are written, the last one being the string terminator '\0'.
The char pointer str contains the address of the first character 'h', so that anyone accessing the string knows that the following bytes have to be read until the terminator character is found.
The other initialization
char str[10] = "hello"; // <-- string must be enclosed in double quotes
is very similar: str (which decays to a pointer to its first element in most expressions) starts with the first character of the string, and the following characters, including the string terminator, are written to the following memory locations.
But:
Even if only six bytes are explicitly initialized, ten bytes are allocated because that's the size of the array. In this case, the four trailing bytes will contain zeroes.
The data is not constant and can be changed, whereas in the previous example it can't: modifying a string literal is undefined behavior, and most C implementations place literals in a read-only data section.
You seem to be mixing up some things:
char str[10] = "hello';
This does not even compile: when you start with a double-quote, you should end with one:
char str[10] = "hello";
In memory, this has the following effect:
str[0] : h
str[1] : e
str[2] : l
str[3] : l
str[4] : o
str[5] : 0 (the terminating null character)
str[6] : 0
str[7] : 0
str[8] : 0
str[9] : 0
(Elements str[6] through str[9] have no explicit initializer, so they are implicitly set to zero.)
As a result, the array will not contain hello\n (with an end-of-line character), just hello\0 (followed by zero padding).
The double quotes just mark the beginning and the end of a string literal and return nothing by themselves.

Shouldn't it be impossible to point directly to text in C?

I am learning C and I came across the pointers.
Even though I learned more with this tutorial than from the textbook I still wonder about the char pointers.
If I program this
#include <stdio.h>
int main()
{
char *ptr_str;
ptr_str = "Hello World";
printf(ptr_str);
return 0;
}
The result is
Hello World
I don't understand why this compiles without an error, since the pointer ptr_str is pointing directly to the text and not to the first character of the text. I thought that only this would work
#include <stdio.h>
int main()
{
char *ptr_str;
char var_str[] = "Hello World";
ptr_str = var_str;
printf(ptr_str);
return 0;
}
So in the first example how was I pointing directly to the text?
Your code works because string literals are essentially static arrays.
ptr_str = "Hello World";
is treated by the compiler as if it were
static char __tmp_0[] = {'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', '\0' };
ptr_str = __tmp_0;
(except trying to modify the contents of a string literal has undefined behavior).
You can even apply sizeof to a string literal and you'll get the size of the array: sizeof "Hello" is 6, for example.
In the context of assignment to a char pointer the 'value' of a string literal is the address of its first character.
So
ptr_str = "Hello World";
sets ptr_str to the address of the 'H'.
Why won't the first one work? It will work as you have seen.
String literals are arrays. From the C11 standard (N1570 draft), §6.4.5p6:
The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence.
In the first case, the literal array decays into a pointer to its first element, so the decayed pointer points to 'H'. You assigned that pointer to ptr_str. printf expects a format string as its first argument; here ptr_str itself is the format string, and since it contains no conversion specifiers, printf prints every character until it reaches the \0. That's all that happened, and this is how you ended up pointing directly to the text. (For text you don't control, printf("%s", ptr_str) is the safer form.)
Note that the second case is quite different from the first: in the second case a copy is made, which can be modified (trying to modify the first one would be undefined behavior). We are basically initializing a char array with the contents of the string literal.

Integer warning for char pointer

Can someone help me understand why I would be getting "warning: cast to pointer from integer of different size" for the following two lines of code?
So I have a pointer to a string (char *string) and a double pointer (char **final) that needs to store the address of the last char in string. I thought the following lines of code would work, but I keep getting the warning. How do I fix it?
char last = *string;
*final = (char *)last;
(char *)last
last is of type char. Casting it to a pointer means the numeric code of the character stored in last will be interpreted as an address. So if last contains 'A', the value 65 will be interpreted as an address (assuming ASCII). The compiler is smart and warns you, since this is probably not the behavior you intend; the "different size" part of the message refers to char being smaller than a pointer.
If string is a pointer to the last character in the string, last is a copy of that character. Since it's just a copy of the value, it bears no relationship to the location in the original string. To save that pointer into what final points to, you should do:
*final = string;
To declare a variable you have to specify what type you want the variable to be, and then what you want to call the variable. If you want a variable of type "char", called "last", it can be achieved by the following syntax:
char last;
If you want a pointer to a variable of a certain data type, you add the asterisk symbol like so:
char *last;
Now you have a pointer that you can use to point at a place in memory that must contain a char. A "string" in C is nothing more than a series of chars stored consecutively in memory and ending with '\0'. You can use a char pointer to point at the first char in this series, and then pass that pointer as an argument to functions that work on strings (for example strcpy or strlen).
Now to your problem. Let's say you create a string like this:
char *str = "example";
what you have done is create a series of char's, namely
'e', 'x', 'a', 'm', 'p', 'l', 'e', '\0'
(where '\0' is the null character that marks the end of the string; functions working on strings need it to recognize where the string ends). The char pointer called "str" points at the first char, that is 'e'. Remember, the pointer holds the address of this char, and the rest of the chars are stored at the addresses following this first char.
To access a particular char in this string, you have to dereference the pointer "str". If you want the first char in the string, you do this:
char first = *str;
This will save the first char in a variable of type char called "first", that is in this case the letter 'e'. To get the second char you do this:
char second = *(str+1);
What you're actually doing is "reading" (dereferencing) the value at the address str + 1, that is, one char-sized step past where str points. In this example, this means that the char variable called "second" now contains (the ASCII value representing) the second letter in the string, that is 'x'.
If you want the size of a string you can use the function strlen. The syntax is this:
size_t length = strlen(str);
where "str" is our char pointer that is pointing at the first char in our string (that is 'e'). strlen returns the length of the string, not including the null character '\0' that marks the end of the string. That means in our example, length will equal 7, since there are 7 letters in the word "example". If you want to extract the last letter of this string, all you have to do is what we did before, but remember that indexing in C starts at 0. This means that if you have a string of length 7, the last element of this string is located at index 6. Thus, to get the last char of a string you do this:
char last = *(str+length-1);
or, if you have not saved the length in a variable, you can do it like this instead:
char last = *(str+strlen(str)-1);
If you want a pointer pointing to the last char of the string, you have to initialize a new char pointer and make it point to the place (memory address) where the last char of "str" is located. By the same logic as before, this is the memory address of the char at index 6 of our original string "str". So you create a new pointer and let it point to this memory address like this:
char *last = str+strlen(str)-1;
Remember that you need to include the header file string.h at the top of your file like so:
#include <string.h>

Unknown erroneous characters are being added to strings?

I am learning C. Some characters are being added automatically to my program. What am I doing wrong?
#include <stdio.h>
#include <string.h>
int main() {
char test1[2]="xx";
char test2[2]="xx";
printf("test is %s and %s.\n", test1, test2);
return 0;
}
Here is how I am running it on Fedora 20.
gcc -o problem problem.c
./problem
test is xx?}� and xx#.
I would expect the answer would be test is xx and xx.
The issue is that string literals such as "xx" have an extra character that is the nul-termination, \0, that is, it is composed of the characters 'x', 'x' and '\0'.
This is how functions that take char* and treat them as strings know the extent of the strings. Your arrays are simply one element too short, missing the nul-terminator. By passing char* that don't point to a nul-terminated string to a function that expects one, you are invoking undefined behaviour.
You can initialize them like this instead:
char test[] = "xx";
This will result in test having the correct length of 3. You can test that using the sizeof operator. Of course, you can also be explicit about the length:
char test[3] = "xx";
but this is more error-prone.
When you define a string in C like this
char A[] = "hello";
It gets initialized something like this
A = { 'h', 'e', 'l', 'l', 'o', '\0'}
That last null character is needed for it to be a string. So in your code
char test1[2]="xx";
You have made the test1 character array to be 2 characters long, leaving no space for the null character.
To correct your program, you can either omit the size of the character array, like
char test1[]="xx";
Or make the size one more than the number of characters you are filling in, like
char test1[3]="xx";
In your code char test1[2]="xx", char test1[2] creates a kind of "container" for two chars, but the actual string "xx" implicitly has three chars, x, x and 0, where 0 marks the end of the string. This 0 tells printf where to stop reading the string. In your case printf doesn't get this 0, since it doesn't fit into test1, so printf keeps reading until it happens to hit some zero byte in memory, printing everything it meets on the way (formally, this is undefined behavior).
You should change your declaration to the following:
char test1[3]="xx";

C string initializer doesn't include terminator?

I am a little confused by the following C code snippets:
printf("Peter string is %zu bytes\n", sizeof("Peter")); // Peter string is 6 bytes
This tells me that when C compiles a string in double quotes, it will automatically add an extra byte for the null terminator.
printf("Hello '%s'\n", "Peter");
The printf function knows when to stop reading the string "Peter" because it reaches the null terminator, so ...
char myString[2][9] = {"123456789", "123456789" };
printf("myString: %s\n", myString[0]);
Here, printf prints all 18 characters because there are no null terminators (and they wouldn't fit without taking out the 9's). Does C not add the null terminator in a variable definition?
Your array is [2][9]. Each row of [9] holds ['1', '2', ..., '8', '9']. Because you only gave it room for 9 chars in the second array dimension, and because you used all 9, there is no room for a '\0' character. Redefine your char array:
char string[2][10] = {"123456789", "123456789"};
And it should work.
Sure it does, you just aren't leaving enough room for the '\0' byte. Making it:
char string[2][10] = { "123456789", "123456789" };
Will work as you expect (will just print 9 characters).
If you tell C that an array is a given size, C cannot make the array any larger. It would be disobeying you if it did so! Remember that not every char array contains a null terminated string. Sometimes the array (as used) is truly an array of (individual) char. The compiler doesn't know what you are doing and cannot read your mind.
This is why C allows you to initialize a char array where the null terminator won't fit but everything else will. Try your example with a string one byte longer and the compiler will complain.
Note that your example will compile but will not do what you expect, as the contents are not (null terminated) strings. With GCC, running your example, I see the string I should, followed by garbage.
Alternatively, you can use:
const char *myString[2] = {"123456789", "123456789" };
This way, each pointer refers to a string literal whose array the compiler sizes correctly, including the null terminator.
C allows unterminated strings, C++ does not.
C allows character arrays to be initialized with string constants. It also allows a string constant initializer to contain exactly one more character than the array it initializes, i.e., the implicit terminating null character of the string may be ignored. For example:
char name1[] = "Harry"; // Array of 6 char
char name2[6] = "Harry"; // Array of 6 char
char name3[] = { 'H', 'a', 'r', 'r', 'y', '\0' };
// Same as 'name1' initialization
char name4[5] = "Harry"; // Array of 5 char, no null char
C++ also allows character arrays to be initialized with string constants, but always includes the terminating null character in the initialization. Thus the last initializer (name4) in the example above is invalid in C++.
Is there a reason why the compiler doesn't warn that there isn't enough room for the 0 byte? I get a warning if I try to add another '9' that won't fit, but it doesn't seem to care about dropping the 0 byte?
The '\0' byte isn't its problem. Most of the time, if you have this:
char code[9] = "123456789";
the next byte will be past the end of the variable. It may happen to be unused memory that contains 0, so the program often appears to work, but reading past the array is undefined behavior and you can't rely on it, even if it seems to work most of the time.
If you're using gcc, you might also want to use the -Wall flag, or one of the other (million) warning flags. This might help (not sure).

Resources