I'm dusting off of my C skills for an upcoming class and I came across this weird output with printf after building a string using getchar. Specifically, any string I try to output gets the same sequence of characters appended to each letter. foo becomes "f?8#{?o?8#{?o?8#{?" compiling with cc, and f¿:¿o¿:¿0¿:¿ with Apple LLVM 5.0 (Xcode). Here is the sample code that illustrates the issue:
char * input_buffer = malloc( sizeof( char ) );
char c;
while ( ( c = getchar() ) != '\n' ) {
strcat(input_buffer, &c);
}
// problem output
printf( "\n%s\n", input_buffer );
// foo -> f¿:¿o¿:¿0¿:¿
// weird side effect is the 4 is required to get a proper len
printf("\ncharacters: %lu\n", strlen( input_buffer ) / 4 );
I've searched everywhere but I'm not seeing this anywhere else, but then this seems like a bit of an edge case. Is this is some kind of an encoding issue that I am not taking into account?
You cannot call strcat(input_buffer, &c);.
Each of the arguments passed to strcat must be a valid null-terminated string of characters.
The chances of the next byte after &c being 0 are pretty slim.
The chances of the first byte pointed by input_buffer being 0 aren't very high either.
In other words, strcat reads "junk" until it encounters a 0 character, in both arguments.
Change:
while ( ( c = getchar() ) != '\n' ) {
strcat(input_buffer, &c);
}
To:
for (int i=0; 1; i++)
{
c = getchar();
if (c == '\r' || c == '\n')
{
input_buffer[i] = 0;
break;
}
input_buffer[i] = c;
}
You are allocating space to input_buffer for only one char.
strcat(input_buffer, &c); is wrong. You are concatenating character (it is not null terminated) with a string.
getchar returns int type but you declared c is of type char.
char * input_buffer = malloc( sizeof( char ) );
sizeof (char) is 1 by definition. This allocates space for a single character, and makes input_buffer point to it.
You're also not checking whether the allocation succeeded. malloc returns a null pointer on failure; you should always check for that.
And the allocated char object that input_buffer points to contains garbage.
char c;
while ( ( c = getchar() ) != '\n' ) {
strcat(input_buffer, &c);
}
getchar() returns an int, not a char. You can assign the result to a char object, but by doing so you lose the ability to detect and end-of-file or error condition. getchar() returns EOF when there are no more characters to be read; you should always check for that, and doing so requires storing the result in an int. (EOF is an integer value that's unequal to any valid character.)
strcat(input_buffer, &c);
input_buffer points to a single uninitialized char. You can treat it as an array consisting of a single char element. The first argument to strcat must already contain a valid null-terminated string, and it must have enough space to hold that string plus whatever you're appending to it.
c is a single char object, containing whatever character you just read with getchar(). The second argument tostrcatis achar*, so you've got the right type -- but thatchar*` must point to a valid null-terminated string.
strcat will first scan the array pointed to by input_buffer to find the terminating '\0' character so it knows where to start appending -- and it will probably scan into memory that's not part of any object you've declared or allocated, possibly crashing your program. If that doesn't blow up, it will then copy characters starting at c, and going past it into memory that you don't own. You have multiple forms of undefined behavior.
You don't need to use strcat to append a single character to a string; you can just assign it.
Here's a simple example:
char input_buffer[100];
int i = 0; /* index into input_buffer */
int c;
while ((c = getchar()) != '\n' && c != EOF) {
input_buffer[i] = c;
i ++;
}
input_buffer[i] = '\0'; /* ensure that it's properly null-terminated */
I allocated a fixed-size buffer rather than using malloc, just for simplicity.
Also for simplicity, I've omitted any check that the input doesn't go past the end of the input buffer. If it does, the program may crash if you're lucky; if you're not lucky, it may just appear to work while clobbering memory that doesn't belong to you. It will work ok if the input line isn't too long. In any real-world program, you'll want to check for this.
BTW, what's being done here is more easily done using fgets() -- but it's good to learn how things work on a slightly lower level.
Related
i got a string and a scanf that reads from input until it finds a *, which is the character i picked for the end of the text. After the * all the remaining cells get filled with random characters.
I know that a string after the \0 character if not filled completly until the last cell will fill all the remaining empty ones with \0, why is this not the case and how can i make it so that after the last letter given in input all the remaining cells are the same value?
char string1 [100];
scanf("%[^*]s", string1);
for (int i = 0; i < 100; ++i) {
printf("\n %d=%d",i,string1[i]);
}
if i try to input something like hello*, here's the output:
0=104
1=101
2=108
3=108
4=111
5=0
6=0
7=0
8=92
9=0
10=68
You have an uninitialized array:
char string1 [100];
that has indeterminate values. You could initialize the array like
char string1 [100] = { 0 };
or
char string1 [100] = "";
In this call
scanf("%[^*]s", string1);
you need to remove the trailing character s, because %[] and %s are distinct format specifiers. There is no %[]s format specifier. It should look like this:
scanf("%[^*]", string1);
The array contains a string terminated by the zero character '\0'.
So to output the string you should write for example
for ( int i = 0; string1[i] != '\0'; ++i) {
printf( "%c", string1[i] ); // or putchar( string1[i] );
putchar( '\n' );
or like
for ( int i = 0; string1[i] != '\0'; ++i) {
printf("\n %d=%c",i,string1[i]);
putchar( '\n' );
or just
puts( string1 );
As for your statement
printf("\n %d=%d",i,string1[i]);
then it outputs each character (including non-initialized characters) as integers due to using the conversion specifier d instead of c. That is the function outputs internal ASCII representations of characters.
I know that a string after the \0 character if not filled completly
until the last cell will fill all the remaining empty ones with \0
No, that's not true.
It couldn't be true: there is no length to a string. No where neither the compiler nor any function can even know what is the size of the string. Only you do. So, no, string don't autofill with '\0'
Keep in minds that there aren't any string types in C. Just pointer to chars (sometimes those pointers are constant pointers to an array, but still, they are just pointers. We know where they start, but there is no way (other than deciding it and being consistent while coding) to know where they stop.
Sure, most of the time, there is an obvious answer, that make obvious for any reader of the code what is the size of the allocated memory.
For example, when you code
char string1[20];
sprintf(string1, "hello");
it is quite obvious for a reader of that code that the allocated memory is 20 bytes. So you may think that the compiler should know, when sprinting in it of sscaning to it, that it should fill the unused part of the 20 bytes with 0. But, first of all, the compiler is not there anymore when you will sscanf or sprintf. That occurs at runtime, and compiler is at compilation time. At run time, there is not trace of that 20.
Plus, it can be more complicated than that
void fillString(char *p){
sprintf(p, "hello");
}
int main(){
char string1[20];
string1[0]='O';
string1[1]='t';
fillString(&(string1[2]));
}
How in this case does sprintf is supposed to know that it must fill 18 bytes with the string then '\0'?
And that is for normal usage. I haven't started yet with convoluted but legal usages. Such as using char buffer[1000]; as an array of 50 length-20 strings (buffer, buffer+20, buffer+40, ...) or things like
union {
char str[40];
struct {
char substr1[20];
char substr2[20];
} s;
}
So, no, strings are not filled up with '\0'. That is not the case. It is not the habit in C to have implicit thing happening under the hood. And that could not be the case, even if we wanted to.
Your "star-terminated string" behaves exactly as a "null-terminated string" does. Sometimes the rest of the allocated memory is full of 0, sometimes it is not. The scanf won't touch anything else that what is strictly needed. The rest of the allocated memory remains untouched. If that memory happened to be full of '\0' before the call to scanf, then it remains so. Otherwise not. Which leads me to my last remark: you seem to believe that it is scanf that fills the memory with non-null chars. It is not. Those chars were already there before. If you had the feeling that some other methods fill the rest of memory with '\0', that was just an impression (a natural one, since most of the time, newly allocated memory are 0. Not because a rule says so. But because that is the most frequent byte to be found in random area of memory. That is why uninitialized variables bugs are so painful: they occur only from times to times, because very often uninitialized variables are 0, just by chance, but still they are)
The easiest way to create a zeroed array is to use calloc.
Try replacing
char string1 [100];
with
char *string1=calloc(1,100);
i just saw this "while(something);" syntax. i googled this but did not found anything. how does this work? especially second while in the example code confuses me.
this code is a program to concatenate two strings using pointer.
#include <stdio.h>
#define MAX_SIZE 100 // Maximum string size
int main()
{
char str1[MAX_SIZE], str2[MAX_SIZE];
char * s1 = str1;
char * s2 = str2;
/* Input two strings from user */
printf("Enter first string: ");
gets(str1);
printf("Enter second string: ");
gets(str2);
/* !!!!!!!!!!!!!!!!! this is it!!!!!!!!!!!!!!!!!!!! Move till the end of str1 */
while(*(++s1));
/* !!!!!!!!!!!!!!!!! this is it!!!!!!!!!!!!!!!!!!!! Copy str2 to str1 */
while(*(s1++) = *(s2++));
printf("Concatenated string = %s", str1);
return 0;
}
The while loop is defined in C the following way
while ( expression ) statement
In this while loop
while(*(++s1));
the statement is a null statement. (The C Standard, 6.8.3 Expression and null statements)
3 A null statement (consisting of just a semicolon) performs no
operations.
So in the above while loop the expression is evaluated cyclically until it logically becomes false.
Pay attention to that this while loop has a bug.;)
Let's assume that the pointed string is empty "". In memory it is represented the following way
{ '\0' }
So initially s1 points to the terminating zero.
But before dereferencing it is incremented in the expression of the while loop
while(*(++s1));
^^^^
and after that points in the uninitialized part of the character array after the terminating zero '\0'. So the loop can invoke undefined behavior.
It would be more correctly to rewrite it like
while( *s1 != '\0' ) ++s1;
In this case after the loop the pointer s1 will point to the terminating zero '\0' of the source string.
This while loop where the statement is again a null statement
while(*(s1++) = *(s2++));
can be rewritten the following way
while( ( *s1++ = *s2++ ) != '\0' );
that is in essence the same as
while( ( *s1 = *s2 ) != '\0' )
{
++s1;
++s2;
}
(except that if the terminating zero was encountered and copied the pointers are not incremented)
That is the result of the assignment ( *s1 = *s2 ) is the assigned character that is checked whether it is equal already to the terminating zero character '\0'. And if so the loop stops and it means that the string pointed to by the pointer s2 is appended to the string pointed to by the pointer s1.
Pay attention to that the function gets is unsafe and is not supported by the C Standard. Instead you should use the function fgets as for example
#include <string.h>
#include <stdio.h>
//...
printf("Enter first string: ");
fgets(str1, sizeof( str1 ), stdin );
str1[ strcspn( str1, "\n" ) ] = '\0';
The last statement is used to remove the new line character '\n' that can be appended to the entered string by the function call.
Also you need to check in the program whether there is enough space in the array str1 and the string stored in the array str2 can be indeed appended to the string stored in the array str1.
while(*(++s1)); is an obfuscated and bugged way of writing while(*s1 != '\0') { s1++; }.
(It should have been while(*(s1++)); to behave as expected, but that too is wrong since it increments the pointer upon failure and won't work with an empty string.)
while(*(s1++) = *(s2++)); is an obfuscated (and likely inefficient) way of writing strcpy(s1,s2);.
The whole program is an obfuscated way of writing strcat(s1, s2);. You can replace both of these buggy while loops with that single function call.
Generally while(something); is bad practice, to the point where compilers might even warn for it, since it isn't clear if the semicolon ended up there on purpose or by a slip of the finger. Preferred style is either:
while(something)
; // aha this was surely not placed there by accident
or
while(something){}
or
while(something)
{}
++s1 advances (or increments) the pointer, before the while checks it value
The while loop will iterate through the string until it will reach the null terminator, since while(NULL) is equal to while(false) or while(0)
The loop
while(*(++s1));
doesn't need a body because everything is done inside the loop condition.
Therefore the loop body is an empty statement ;.
The loop consists the following steps:
++s1 increment pointer
*(...) dereference pointer, i.e. get the data where the pointer points to.
use the value as the condition (0 is false, everything else is true)
The loop can be rewritten as
do
{
++s1;
}
while(*s1); // or while(*s1 != '\0');
Similarly, the other loop
while(*(s1++) = *(s2++));
can be written as
do
{
char c;
*s1 = *s2;
c = *s1;
s1++;
s2++;
}
while(c != '\0')
Note that the original loop condition contains an assignment (=), not a comparison (==). The assigned value is used as the loop condition.
This is my code for two functions in C:
// Begin
void readTrain(Train_t *train){
printf("Name des Zugs:");
char name[STR];
getlinee(name, STR);
strcpy(train->name, name);
printf("Name des Drivers:");
char namedriver[STR];
getlinee(namedriver, STR);
strcpy(train->driver, namedriver);
}
void getlinee(char *str, long num){
char c;
int i = 0;
while(((c=getchar())!='\n') && (i<num)){
*str = c;
str++;
i++;
}
printf("i is %d\n", i);
*str = '\0';
fflush(stdin);
}
// End
So, with void getlinee(char *str, long num) function I want to get user input to first string char name[STR] and to second char namedriver[STR]. Maximal string size is STR (30 charachters) and if I have at the input more than 30 characters for first string ("Name des Zuges"), which will be stored in name[STR], after that I input second string, which will be stored in namedriver, and then printing FIRST string, I do not get the string from the user input (first 30 characters from input), but also the second string "attached" to this, I simply do not know why...otherwise it works good, if the limit of 30 characters is respected for the first string.
Here my output, when the input is larger than 30 characters for first string, problem is in the row 5 "Zugname", why I also have second string when I m printing just first one...:
Name des Zugs:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
i is 30
Name des Drivers:xxxxxxxx
i is 8
Zugname: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaxxxxxxxx
Drivername: xxxxxxxx
I think your issue is that your train->name is not properly terminated with '\0', as a consequence when you call printf("%s", train->name) the function keeps reading memory until it finds '\0'. In your case I guess your structure looks like:
struct Train_t {
//...
char name[STR];
char driver[STR];
//...
};
In getlinee() function, you write '\0' after the last character. In particular, if the input is more than 30 characters long, you copy the first 30 characters, then add '\0' at the 31-th character (name[30]). This is a first buffer overflow.
So where is this '\0' actually written? well, at name[30], even though your not supposed to write there. Then, if you have the structure above when you do strcpy(train->name, name); you will actually copy a 31-bytes long string: 30 chars into train->name, and the '\0' will overflow into train->driver[0]. This is the second buffer overflow.
After this, you override the train->driver buffer so the '\0' disappears and your data in memory basically looks like:
train->name = "aaa...aaa" // no '\0' at the end so printf won't stop reading here
train->driver = "xxx\0" // but there
You have an off-by-one error on your array sizes -- you have arrays of STR chars, and you read up to STR characters into them, but then you store a NUL terminator, requiring (up to) STR + 1 bytes total. So whenever you have a max size input, you run off the end of your array(s) and get undefined behavior.
Pass STR - 1 as the second argument to getlinee for the easiest fix.
Key issues
Size test in wrong order and off-by-one. ((c=getchar())!='\n') && (i<num) --> (i+1<num) && ((c=getchar())!='\n'). Else no room for the null character. Bad form to consume an excess character here.
getlinee() should be declared before first use. Tip: Enable all compiler warnings to save time.
Other
Use int c; not char c; to well distinguish the typical 257 different possible results from getchar().
fflush(stdin); is undefined behavior. Better code would consume excess characters in a line with other code.
void getlinee(char *str, long num) better with size_t num. size_t is the right size type for array sizing and indexing.
int i should be the same type as num.
Better code would also test for EOF.
while((i<num) && ((c=getchar())!='\n') && (c != EOF)){
A better design would return something from getlinee() to indicate success and identify troubles like end-of-file with nothing read, input error, too long a line and parameter trouble like str == NULL, num <= 0.
I believe you have a struct similar to this:
typedef struct train_s
{
//...
char name[STR];
char driver[STR];
//...
} Train_t;
When you attempt to write a '\0' to a string that is longer than STR (30 in this case), you actually write a '\0' to name[STR], which you don't have, since the last element of name with length STR has an index of STR-1 (29 in this case), so you are trying to write a '\0' outside your array.
And, since two strings in this struct are stored one after another, you are writing a '\0' to driver[0], which you immediately overwrite, hence when printing out name, printf doesn't find a '\0' until it reaches the end of driver, so it prints both.
Fixing this should be easy.
Just change:
while(((c=getchar())!='\n') && (i<num))
to:
while(((c=getchar())!='\n') && (i<num - 1))
Or, as I would do it, add 1 to array size:
char name[STR + 1];
char driver[STR + 1];
I am learning C and a I came across this function in my study materials. The function accepts a string pointer and a character and counts the number of characters that are in the string. For example for a string this is a string and a ch = 'i' the function would return 3 for 3 occurrences of the letter i.
The part I found confusing is in the while loop. I would have expected that to read something like while(buffer[j] != '\0') where the program would cycle through each element j until it reads a null value. I don't get how the while loop works using buffer in the while loop, and how the program is incremented character by character using buffer++ until the null value is reached. I tried to use debug, but it doesn't work for some reason. Thanks in advance.
int charcount(char *buffer, char ch)
{
int ccount = 0;
while(*buffer != '\0')
{
if(*buffer == ch)
ccount++;
buffer++;
}
return ccount;
}
buffer is a pointer to a set of chars, a string, or a memory buffer holding char data.
*buffer will dereference the value at buffer, as a char. This can be compared with the null character.
When you add to buffer - you are adding to the address, not the value it points to, buffer++ adds 1 to the address, pointing to the next char. This means that now *buffer results in the next character.
In the loop you are incrementing the pointer buffer until it points to the null character, at which point you know you scanned the whole string. Instead of buffer[j], which is equivalent to *(buffer+j), we are incrementing the pointer itself.
When you say buffer++ you increment the address stored in buffer by one.
Once you internalize how pointers work, this code is cleaner than the code that uses a separate index to scan the character string.
In C and C++, arrays are stored in sequence, and an array is stored according to its first address and length.
Therefore *buffer is actually the address of the first byte, and is synonymous with buffer[0]. Because of this, you can use buffer as an array, like this:
int charcount(char *buffer, char ch)
{
int ccount = 0;
int charno = 0;
while(buffer[charno] != '\0')
{
if(buffer[charno] == ch)
ccount++;
charno++;
}
return ccount;
}
Note that this works because strings are null terminated - if you don't have a null termination in the character array pointed to by *buffer it will continue reading forever; you lose the bit where c knows how long the array is. This is why you see so many c functions to which you pass a pointer and a length - the pointer tells it the [0] position of the array, and the size you specify tells it how far to keep reading.
Hope this helps.
I wrote a simple C program in Linux that reads a single character from a string. I get some error regarding string functions. This is my code:
#include <stdio.h>
#include <string.h>
void main () {
char arr[10], vv[10];
int i = 0, len;
printf("enter the staement\n");
scanf("%s", arr);
len = strlen(arr);
printf("String laength=%d\n", len);
while ((vv[i] = getchar(arr)) != '\n') {
printf("%d charcter\n");
i++;
}
}
I don't want to use getchar() directly on the input text like this:
arr[i] = getchar();
I want to use getchar() from a stored string like this:
getchar(string array);
But unfortunately I get an error. Can I use the getchar() function directly from a stored string array?
Read about getchar. The link clearly says that getchar is a function that gets a character (an unsigned char) from stdin. Also, it takes no arguments. This would mean that you cannot copy each character of an array to another array using getchar. Just copy it directly using
while( (vv[i] = arr[i]) != '\n')
But I don't think this loop will end as scanf does not include the newline character when scanning a string(%s). So,you got two options:
Use fgets to get input.
Use the following
while( (vv[i] = arr[i]) != '\0')
When you have string in C, it is actually an array of chars which is terminated by '\0'. You do not need any method to get chars from it. Simply get the char as if you were accessing an array.
while((vv[i] = arr[i])!='\n')
As you have you arr[10] it will cause issues when your input is larger than 10 characters including the '\0'. So it is be better to declare it with enough space!
vv is a single char. You may not write vv[i].
Also, are you sure you want \n and not \0 [null]? scanf() won't give you a string with \n in it.
EDIT:
It is still unclear what you want to achieve, but if you want to check the presence of valid characters in the arr or vv, you can
take the base address of the arr or vv into a char *p.
check if (*p++) and do something.
EDIT:
You may try out something like
char * ip = NULL;
char * op = NULL;
int i = 10; //same as array size.
ip = arr;
op = vv;
while( (*op++ = *ip++) && i--)
{
//do something
};