Arrays are not empty from the beginning - C

I'm a beginner in C .... I have a little code:
#include <stdio.h>
#include <string.h>
int main(){
char str1[100];
char str2[100];
char str3[100];
char str4[100];
puts(str1)
puts(str2);
puts(str3);
puts(str4);
return 0;
}
I got result
2
èý(
‘Q]wØ„ÃîþÿÿÿÀ"bwd&bw
I don't know why my arrays are not empty from the beginning. I have to set the first element to '\0' to clear the content of each array. Can anyone explain this to me? Thanks a lot.

In C, local variables are not initialized automatically if you don't assign values to them. Here your arrays are uninitialized, which means they may contain garbage after their creation.
Yes, you need to explicitly set it to be "empty" like:
char str[100];
str[0] = '\0';
// Now you have an empty string of zero length.
assert(strlen(str) == 0);
// But the size is still 100.
printf("%zu", sizeof(str)); // sizeof yields a size_t, so use %zu rather than %d
Alternatively, you can create an empty string(character array) during the initialization. It has the same size and length as the example above.
char str[100] = "";
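As a minimal sketch of the two approaches above (the function names are made up for illustration), both leave a string of length 0 in a buffer whose size is still 100:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Empties the buffer by writing the terminator after the declaration. */
size_t length_after_assignment(void) {
    char str[100];
    str[0] = '\0';
    return strlen(str);
}

/* Empties the buffer in its initializer instead. */
size_t length_after_initializer(void) {
    char str[100] = "";
    assert(sizeof str == 100); /* capacity is unchanged either way */
    return strlen(str);
}
```

Calling either function from main should return 0, since strlen stops at the first '\0'.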

As for why it doesn't automatically zero the string, it's because doing so would be costly, and C generally doesn't do costly things that you don't explicitly tell it to do. At a minimum, it would have to set the first element of every array to zero, and there are plenty of occasions where you wouldn't want to or need to initialize the array like this. If C always did this for you, then you'd always have that useless overhead that you couldn't get rid of.
As a general rule, C doesn't do anything in the background that you don't explicitly tell it to do, so when you ask for an array, it just gives you an array, and doesn't touch the contents unless you tell it to. It can create a little bit more work for the programmer, but with the benefit of more finely-grained control over exactly what the computer is doing.
Some people would consider that it's a good programming practice to always initialize your variables anyway, and to forget about this kind of tiny cost, and a lot of the time they'll have a good point, but C is deliberately a very flexible and low-level language, and it just doesn't force you to do things like this.

One is getting old when one says "In my days...". Nevertheless, "in my days", people were taught to declare variables first and initialise them directly afterwards.
In your case, you can do both together, and even more thoroughly, in one statement.
The solution of Eric Z is the correct one, and it is what I would use when working the C way. But to be complete, what age_pan describes is that Java inherently does the following:
#include <stdio.h>
int main(int argc, const char * argv[])
{
char str1[100] = { 0 };
char str2[100] = { 0 };
char str3[100] = { 0 };
char str4[100] = { 0 };
puts(str1);
puts(str2);
puts(str3);
puts(str4);
return 0;
}
The difference is that in the solution of Eric Z only the first character is set to 0, which means that you create a zero length zero terminated string. The Java method (shown in the code above) initialises every little byte to 0.
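To make the "every byte" claim concrete, here is a small sketch; all_zero and brace_zero_clears_everything are hypothetical helper names, not part of any library. It only inspects an array that was initialised with { 0 }, because reading the untouched bytes of an uninitialised array would itself be invalid.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the first n bytes of buf are all zero, 0 otherwise. */
int all_zero(const char *buf, size_t n) {
    assert(buf != NULL);
    for (size_t i = 0; i < n; i++) {
        if (buf[i] != 0) return 0;
    }
    return 1;
}

/* Checks that a { 0 } initializer clears the whole array. */
int brace_zero_clears_everything(void) {
    char str[100] = { 0 };
    return all_zero(str, sizeof str);
}
```

Elements without an explicit initializer are zeroed as well, which is why a single 0 in the braces is enough.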
There are pros and cons to the Java initialisation. It leads to sloppy programming (some call it easier programming) and it takes time if you don't need the initialisation. On the other hand, I know very few people who need the extra milliseconds that are lost by the initialisation.
Is it necessary to declare variables above the code, and to initialise them? Certainly not. Is it useful? It most certainly is. It avoids all kinds of errors that take a lot of time to debug.
By the way, you are missing a ; after puts(str1) :-)
Kind regards,
PB

I don't think you would have had any trouble just because the arrays don't start out empty. In C, local variables start with indeterminate values. Java is different: when you declare a field or create an array, the JVM initialises it to a default value.

Related

grammatical difficulties, unexpected output

Could you please tell me why running these two pieces of code gives me different output?
void UART_OutString(unsigned char buffer[]){
int i;
while(buffer[i]){
UART_OutChar(buffer[i]);
i++;
}
}
and
void UART_OutString(unsigned char buffer[]){
int i = 0;
while(buffer[i]){
UART_OutChar(buffer[i++]);
}
}
regards, Genadi
You didn't initialize the i variable in the first case, so it's an uninteresting typo bug that your compiler ought to warn you about...
That being said, we can apply the KISS principle and rewrite the whole code in the most readable way possible, a for loop, which by its nature makes it very hard to forget to initialize the loop iterator:
void UART_OutString(const unsigned char buf[]){
for(int i=0; buf[i]!='\0'; i++){
UART_OutChar(buf[i]);
}
}
As it turns out, the most readable way is very often the fastest way possible too.
(However, int might be inefficient on certain low-end systems, so if you are fine with only using strings with length 255 or less, uint8_t i would be a better choice. Embedded systems should never use int and always the stdint.h types.)
For what it's worth, I'd implement this as
void UART_OutChar(unsigned char c);
void UART_OutString(unsigned char buffer[]){
for(unsigned char *p = buffer; *p; p++) {
UART_OutChar(*p);
}
}
to avoid the separate counter variable at all.
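The pointer version can be tried without hardware. In the sketch below, UART_OutChar is a hypothetical stand-in that records characters in a buffer instead of driving a real UART, and uart_sends_exactly is a made-up helper for checking the result; neither comes from the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the real UART driver: it records each
   character in a buffer so the loop can be checked on a desktop machine. */
static unsigned char uart_log[64];
static size_t uart_log_len = 0;

void UART_OutChar(unsigned char c) {
    assert(uart_log_len < sizeof uart_log - 1); /* keep room for '\0' */
    uart_log[uart_log_len++] = c;
    uart_log[uart_log_len] = '\0';
}

/* Sends every character up to, but not including, the terminator. */
void UART_OutString(const unsigned char buffer[]) {
    for (const unsigned char *p = buffer; *p != '\0'; p++) {
        UART_OutChar(*p);
    }
}

/* Clears the log, sends text, and reports whether the log matches it. */
int uart_sends_exactly(const char *text) {
    uart_log_len = 0;
    uart_log[0] = '\0';
    UART_OutString((const unsigned char *)text);
    return strcmp((const char *)uart_log, text) == 0;
}
```

Because the loop variable is declared and initialised in the for statement itself, there is no place where it could be read before it has a value.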
It is always a good idea to initialize local variables, especially in C where you should assume that nothing is done for you (because that's usually the case). There is a reason why regulated languages would not allow you to do this.
I believe reading the uninitialised variable results in undefined behaviour (effectively, C doesn't know there isn't meant to be anything there and will just grab whatever is in that location), which means the result is completely unpredictable.
This can also cause all kinds of problems when you then index an array with it. C will not stop you from indexing an array out of bounds, so if the random i value C happens to grab is larger than the size of the array, buffer[i] is undefined behaviour too. This one can be particularly nasty: depending on what it ends up reading, it could cause an invalid memory read or a segmentation fault, or crash your program.
Therefore, unassigned i = random behaviour, and you then get more random behaviour from using that i value to index your array.
I believe this covers the reasons why this is a bad idea. In C it is particularly important to pay attention to things like this, because the compiler will often let you compile and run such code anyway.
Both initialising i and using the solution in @AKX's answer are good fixes, although I thought this would better answer your question of why the two versions behave differently. The answer is that the first version behaves completely randomly.

Dynamically allocating and copying an array

I sometimes see code like this:
char* copyStr(char* input) {
int inputLength;
char *answer;
inputLength = strlen(input);
answer = malloc(inputLength + 1);
answer = input;
return answer;
}
People often say this code doesn't work and that this pattern
answer = malloc(inputLength + 1);
answer = input;
makes no sense. Why is it so? To my eye, the code is OK. It allocates the right amount of memory for the answer, and then copies the input to the answer. And it seems to work in my tests, for example
int main()
{
printf ("%s\n", copyStr("Hello world!"));
}
does what I expect it to do. So what's wrong with it?
To put it simply. This code:
var = foo();
var = bar();
is 100% equivalent to this in all[1] situations:
foo();
var = bar();
Furthermore, if foo() has no side effects, it's 100% equivalent to just the last line:
// foo();
var = bar();
This goes for ANY function, including malloc. If we for a moment forget what malloc does and just focus on what has just been said, we can quickly realize what's written in the comments in this code:
answer = malloc(inputLength + 1);
// Here, the variable answer contains the return value from the call to malloc
answer = input;
// Here, it contains the value of input. The old value is overwritten, and
// is - unless you saved it in another variable - permanently lost.
What malloc does is really simple. It returns a pointer to a memory block, or a NULL pointer if the allocation failed.[2] That's it. What you are doing with a call like ptr = malloc(size) is absolutely nothing more fancy than storing that address in the pointer variable ptr. And pointer variables are in the same way no more fancy than other variables like int or float. An int stores an integer. A pointer stores a memory address. There's no magic here.
[1] It's 100% equivalent unless you're doing really fancy stuff like reading the variable var with an external program.
[2] malloc(0) can return a non-null pointer, but in practice it does not make a difference since it would be undefined behavior to dereference it, and allocating zero bytes is a pretty pointless (haha, point) operation.
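The equivalence can be checked with a small sketch. Here foo and bar are placeholder functions invented for the demonstration, and foo_calls is a counter that stands in for foo's side effect.

```c
#include <assert.h>

int foo_calls = 0; /* counts foo's side effect */

int foo(void) { foo_calls++; return 1; }
int bar(void) { return 2; }

/* var = foo(); var = bar(); keeps only bar's result. */
int assign_twice(void) {
    int var;
    var = foo();
    assert(var == 1); /* foo's result, about to be overwritten */
    var = bar();
    return var;
}

/* foo(); var = bar(); leaves var with the same value. */
int assign_once(void) {
    int var;
    foo();
    var = bar();
    return var;
}
```

Both functions return bar's value, and foo still runs once in each of them, so its side effect is preserved while its return value is lost.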
To answer this question, let's look at a somewhat simpler code fragment first.
int answer;
answer = 42;
answer = 0;
Even the most cursory of observers would notice that the first assignment
answer = 42;
is useless. It places the value of 42 into answer, only to be thrown away and replaced with 0 at the very next instant of time. So that line of code can be thrown away completely.
Let's verify this by looking at the optimised assembly code generated by a C compiler. As we can see, the line answer = 42; indeed has no effect on the resulting machine code.
Now compare this to the code in question
answer = malloc(inputLength + 1);
answer = input;
If reasoning by analogy is valid in this case, then we must conclude that the first assignment is useless and can be omitted. We place something (the result of malloc) in answer, only to be thrown away and replaced by something else a moment later.
Of course we cannot say whether it is applicable without further research, but we can confirm our suspicion by looking at the generated assembly again. And it is confirmed. The compiler does not even generate any calls to malloc and strlen! They are indeed useless.
So where does this intuition
It allocates the right amount of memory for the answer, and then copies the input to the answer
break down?
The problem lies in the eternal confusion between pointers and arrays.
One may often see claims that in C, arrays are pointers, or that pointers are arrays, or that arrays and pointers are interchangeable, or any number of variations thereof. These claims are all false and misleading. Pointers and arrays are completely different things. They often work together, but that's a far cry from being one and the same. Let's break down pointers and arrays in the code example.
input is a pointer variable
input (presumably) points into a string, which is an array of char
answer is another pointer variable
malloc(...) dynamically allocates a new array of char and returns a pointer that points into said array
answer = malloc(...) copies that pointer to answer, now answer points into the array allocated by malloc
answer = input copies another pointer (that we have already seen above) into answer
now answer and input point into the same string, and the result of malloc is forgotten and thrown away
So this explains why your code is doing what you expect it to do. Instead of having two identical copies of the string "Hello world!" you have just one string and two different pointers into it. Which might seem like that's just what the doctor ordered, but it breaks down as soon as we do something ever so slightly complicated. For example, code like this
char *lineArray[MAX_LINES];
char buffer[BUF_LEN];
int i = 0;
while (i < MAX_LINES && fgets(buffer, BUF_LEN, stdin)) {
lineArray[i++] = copyStr(buffer);
}
will end up with every element of lineArray pointing into the same string, instead of into a bunch of different lines taken from stdin.
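The aliasing can be reproduced without stdin. This sketch keeps the copyStr from the question (leak included) and replaces the fgets loop with two strcpy calls; lines_share_one_buffer is a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The copyStr from the question, unchanged in behaviour: the malloc
   result is overwritten, so the caller just gets `input` back. */
char *copyStr(char *input) {
    char *answer = malloc(strlen(input) + 1); /* leaked on the next line */
    answer = input;
    return answer;
}

/* Stores two different lines through the same buffer, as the fgets loop
   does. Returns 1 if both stored pointers alias the buffer. */
int lines_share_one_buffer(void) {
    char buffer[16];
    char *lineArray[2];

    strcpy(buffer, "first");
    lineArray[0] = copyStr(buffer);
    strcpy(buffer, "second");
    lineArray[1] = copyStr(buffer);

    assert(strcmp(lineArray[0], "second") == 0); /* "first" is gone */
    return lineArray[0] == buffer && lineArray[1] == buffer;
}
```

After the second strcpy, the first stored "line" reads "second" as well, because nothing was ever copied out of buffer.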
OK, so now we have established that answer = input copies a pointer. But we want to copy an array, which we have just allocated space for! How do we do that?
Since our arrays are presumably NUL-terminated character strings, we can use a standard library function designed for copying NUL-terminated character strings.
strcpy(answer, input);
For other arrays we can use memcpy. The main difference is that we have to pass down the array length.
memcpy(answer, input, inputLength + 1);
Both variants will work in our case, but the first one is preferred because it reaffirms that we are dealing with strings. Here's the fixed copyStr for completeness:
char* copyStr(char* input) {
int inputLength;
char *answer;
inputLength = strlen(input);
answer = malloc(inputLength + 1);
strcpy(answer, input);
return answer;
}
Incidentally, it works almost the same as the non-standard but widely available strdup function (strdup has a better signature and working error checks, which we have omitted here).
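For reference, one way the omitted error checks might look. This is a sketch, not the only reasonable design: copyStrChecked and copy_is_independent are invented names, and returning NULL for a NULL input is an assumption about the desired behaviour.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of input, or NULL if input is NULL or
   the allocation fails. The caller must free() the result. */
char *copyStrChecked(const char *input) {
    if (input == NULL) return NULL;
    size_t inputLength = strlen(input);
    char *answer = malloc(inputLength + 1);
    if (answer == NULL) return NULL;
    memcpy(answer, input, inputLength + 1); /* includes the terminator */
    return answer;
}

/* Returns 1 if the copy has the same text but its own storage. */
int copy_is_independent(const char *text) {
    assert(text != NULL);
    char *copy = copyStrChecked(text);
    if (copy == NULL) return 0;
    int ok = strcmp(copy, text) == 0 && copy != text;
    free(copy);
    return ok;
}
```

Taking a const char * lets callers pass string literals and read-only buffers without a cast, and size_t avoids the int/size_t mismatch with strlen.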

Differences between moments of char pointer assignment

Today I was told, that this code:
int main(){
char *a;
a = "foobar";
/* a used later to strcpy */
return 0;
}
Is bad, and may lead to problems and errors.
However, my code worked without any problems, and I don't understand, what is the difference between this, and
int main(){
char *a = "foobar";
/* a used later to strcpy */
return 0;
}
Which was described to me as the "correct" way.
Could someone describe, why these two codes are different?
And, if the first one may be problematic, show an example of this?
Functionally, they are the same.
In the former snippet, a is assigned the address of a string literal; in the latter, a is initialized with it.
In both cases, a points to a string literal (which can't be modified).
There's no reason to consider one as more correct than the other. I'd prefer the latter - but that's just my personal preference.
Both snippets are equally bad, because both end up with a non-const pointer pointing to data that must not be modified. If the non-const pointer is used to (try to) change the data, you get undefined behaviour: anything can happen, from the code appearing to work, to the modifying instruction being silently ignored, to the program crashing.
The correct way is to either use a pointer to const or to initialize a non-const array.
const char *a = "foobar";
or
char a[] = "foobar";
But beware: in the latter case you have a true array, not a pointer. If you really need pointer semantics, you could also do:
char _a[] = "foobar";
char *a = _a;
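A short sketch of the two safe forms, with made-up function names. The write through the pointer to const is left commented out because the compiler would reject it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A pointer to const can read the literal but not write through it. */
size_t literal_length(void) {
    const char *a = "foobar";
    /* a[0] = 'F';  would be rejected by the compiler */
    return strlen(a);
}

/* An array initialized from the literal is a private, writable copy. */
int array_copy_is_writable(void) {
    char a[] = "foobar";
    assert(sizeof a == 7); /* six characters plus the terminator */
    a[0] = 'F';
    return strcmp(a, "Foobar") == 0;
}
```

The array form costs a copy of the literal each time the function runs, which is the price of being allowed to modify it.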
There are some places that have coding standards, for instance to help with static code analysis by tools like Coverity.
A coding rule that I have seen in several places is that variables should always be initialized at the point of declaration, which simplifies the code and makes analysis easier.
Your second snippet hews more closely to that rule than the first, as it's impossible to insert new code where a could be used uninitialized.
That's a positive benefit when it comes to code maintenance.

String starting state in C

Sorry if this is a bit of a starter question but I am pretty new to C. I am using the GCC compiler. When I write a program with a string in it, if the string is beyond a certain length it appears to start with some contents. I am worried about just overwriting it as it could be being used by another program. Here is an example code that shows the issue:
#include <stdio.h>
// Using the GCC Compiler
// Why is there already something in MyString?
int main(void) {
char MyString[250];
printf("%s", MyString);
getch();
return 0;
}
How do I SAFELY avoid this issue? Thanks for your help.
Why is there already something in MyString?
MyString is not initialized and can contain anything.
To initialize to an empty string:
char MyString[250] = { 0 };
or as pointed out by unwind in his answer:
char MyString[250] = "";
which is more readable (and consistent with the following).
To initialize to a string:
char myString[250] = "some-string";
I am worried about just overwriting it as it could be being used by another program
Each running instance of your program will have its own myString.
For some reason many are recommending the array-style initialization of
char myString[50] = { 0 };
however, since this array is intended to be used as a string, I find it far clearer and more intuitive (and simpler syntactically) to use a string initializer:
char myString[50] = "";
This does exactly the same thing, but makes it quite a lot clearer that what you intend to initialize the array as is in fact an empty string.
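That the two initializers give the same result can be checked byte for byte. A minimal sketch, with initializers_match as a hypothetical name:

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if both initializer styles produce identical array contents. */
int initializers_match(void) {
    char a[50] = { 0 };
    char b[50] = "";
    assert(strlen(a) == 0 && strlen(b) == 0); /* both are empty strings */
    return memcmp(a, b, sizeof a) == 0;
}
```

In both forms the elements not covered by the initializer are set to zero, so the arrays compare equal over all 50 bytes.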
The situation you're seeing with "random" data is just what happens to be in the array, since you are not initializing it you simply get what happens to be there. This does not mean that the memory is being used by some other program at the same time, so you don't need to worry about that. You do need to worry about handing a pointer to an array of char that is not properly 0-terminated to any C function expecting a string, though.
Technically you are then invoking undefined behavior, which is something you should avoid. It can easily crash your program, since there's no telling how far away into memory you might end up. Operating systems are free to kill processes that try to access memory that they're not allowed to touch.
Properly initializing the array to an empty string avoids this issue.
The problem is that your string is not initialized.
A C string ends with '\0', so you should simply put something like
MyString[0] = '\0';
after your declaration. This way you make sure that functions like printf work the way you expect them to.
char MyString[250] = {0};
but in practice, if C++ is available, prefer
std::string
Since you have not initialized the char array to any value, it'll contain some garbage value. It's good programming practice to use something like:
char MyString[250] = "My Array"; // If you know the contents up front
char MyString[250] = ""; // If you don't intend to fill the char array during initialization

Explain the output of this C code?

I wrote this code today, just out of experimentation, and I'm trying to figure out the output.
/*
* This code in C attempts to exploit insufficient bounds checking
* to legitimate advantage.
*
* A dynamic structure with the accessibility of an array.
* Handy for small-time code, but largely unreliable.
*/
int array[1] = {0};
int index = 0;
put(), get();
main ( )
{
put(1); put(10), put(100);
printf("%6d %5d %5d\n", get(0), get(1), get(2));
}
put ( x )
int x;
{
array[index++] = x;
}
get ( index )
int index;
{
return array[index];
}
The output:
1 3 100
There is a problem there, in that you declare 'array' as an array of length 1 but you write 3 values to it. It should be at least 'array[3]'. Without that, you are writing to unallocated memory, so anything could happen.
The reason it outputs '3' there without the fix is that it is outputting the value of the global 'index' variable, which is the next int in memory (in your case - as I said anything could happen). Even though you do overwrite this with your put(10) call, the index value is used as the index in the assignment and then post-incremented, which sets it back to 2 - it then gets set to 3 at the end of the put(100) call and is subsequently output via printf.
It's undefined behavior, so the only real explanation is "It does some things on one machine and other things on other machines".
Also, what's with the K&R function syntax?
EDIT: The printf guess was wrong. As far as the syntax, read K&R 2nd Edition (the cover has a red ANSI stamp), which uses modern function syntax (among other useful updates).
To expand on what has been said, accessing out-of-bounds array members results in undefined behavior. Undefined behavior means that literally anything could happen. There is no way to exploit undefined behavior unless you're deep into esoteric platform-specific hacks. Don't do it.
If you do want a "dynamic array", you'll have to take care of it yourself. If your requirements are simple, you can just malloc and realloc a buffer. If your needs are more complicated, you might want to define a struct that keeps a separate buffer, a size, and a count, and write functions that operate on that struct. If you're just learning, try it both ways.
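As a rough sketch of the struct approach, assuming a doubling growth policy and an initial capacity of 4 (both arbitrary choices), it could look like this. All names here (IntArray, intarray_put, and so on) are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable array of int: a buffer, a count, and a capacity. */
struct IntArray {
    int *data;
    size_t count;    /* elements in use */
    size_t capacity; /* elements allocated */
};

void intarray_init(struct IntArray *a) {
    a->data = NULL;
    a->count = 0;
    a->capacity = 0;
}

/* Appends x, growing the buffer when full. Returns 0 on success, -1 if
   realloc fails (the existing contents are then left untouched). */
int intarray_put(struct IntArray *a, int x) {
    if (a->count == a->capacity) {
        size_t new_capacity = a->capacity ? a->capacity * 2 : 4;
        int *grown = realloc(a->data, new_capacity * sizeof *grown);
        if (grown == NULL) return -1;
        a->data = grown;
        a->capacity = new_capacity;
    }
    a->data[a->count++] = x;
    return 0;
}

/* Reads element i; the assert catches out-of-bounds indexes in debug builds. */
int intarray_get(const struct IntArray *a, size_t i) {
    assert(i < a->count);
    return a->data[i];
}

void intarray_free(struct IntArray *a) {
    free(a->data);
    intarray_init(a);
}

/* Mirrors the question: put 1, 10, 100, then sum get(0..2). */
int intarray_demo_sum(void) {
    struct IntArray a;
    intarray_init(&a);
    int sum = 0;
    if (intarray_put(&a, 1) == 0 && intarray_put(&a, 10) == 0 &&
        intarray_put(&a, 100) == 0) {
        sum = intarray_get(&a, 0) + intarray_get(&a, 1) + intarray_get(&a, 2);
    }
    intarray_free(&a);
    return sum;
}
```

Unlike the original put and get, every write stays inside memory the program actually owns, so the result no longer depends on what happens to sit next to the array.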
Finally, your function declaration syntax is valid, but archaic. That form is rarely seen, and virtually unheard of in new code. Declare put as:
int put(int x) {…}
And always declare main as:
int main(int argc, char **argv) {…}
The names of argc and argv aren't important, but the types are. (The standard also allows int main(void) for programs that ignore their arguments.) If you get the signature wrong, demons could fly out of your nose.

Resources