I want to fill an array of char one character at a time, so I am using the code below for testing. The final output of "str" is always just the first char entered instead of all of them. What's wrong?
void gime_char(char c)
{
static *str;
static i;
if(i == 0)
str = malloc(sizeof(*str) * 10);
if(c == 'X')
{
printf("full str:%s\n", str);
}
printf("c == %c\n", c);
str[i] = c;
printf("d == %d\n", i);
i++;
}
int main()
{
char c;
while(c != 'X')
{
c = getchar();
gime_char(c);
}
}
The type of static *str is static int *str, but a string consists of chars. So now your string does not end up being stored with the characters adjacent to one another as you would expect, because each element of str has the size of int (probably 4 bytes), not that of char.
This part of the code should be fixed by specifying the type as static char *str. Once you fix that, there will be another problem when printing str without terminating it with a NUL character ('\0').
You're missing a type in the definition of i, and the type in the definition of str is incomplete: you say “pointer” (with the * character) but you don't say to what. For historical reasons, if you omit a type, the compiler assumes you meant int; this is deprecated, and good compilers warn about this.
Since you effectively wrote int *str, the memory layout looks something like this after entering hello (this is machine-dependent, but this is a pretty typical case):
+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+----
| 'h' | 0 | 0 | 0 | 'e' | 0 | 0 | 0 | 'l' | 0 | ...
+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+----
^ ^ ^
| | |
str str+1 str+2
Each small cell is one byte (which corresponds to one character). Four cells make up one int. The line
str[i] = c;
writes one int into str. For example, when c is 'h', which has the numerical value 104, the number 104 is written into the int object str[0], which is represented as the four-byte sequence {104, 0, 0, 0}. This happens again for the next character, which is written in the next int-sized slot in the array, meaning 4 bytes further.
In the line
printf("full str:%s\n", str);
you print str as a string. In a string, the first zero byte marks the end of the string. So you see the string "h".
Your machine is little-endian. Exercise: what would you see on a big-endian machine?
The fix is to declare the types properly.
You should also initialize your variables. static variables are initialized to 0 anyway, but it's clearer if you do it explicitly. The variable c is not static, so it starts out containing whichever value was there before in memory; this could happen to be 'X', so you must initialize it explicitly.
Additionally, you need to make sure that the string is terminated by a zero byte before printing it. The static keyword ensures that the str variable is initialized to a null pointer, but the space that the pointer points to is allocated by malloc and contains whatever was there before.
void gime_char(char c)
{
    static char *str; /* <<<< */
    static int i; /* <<<< */
    if(i == 0)
        str = malloc(sizeof(*str) * 10);
    if(c == 'X')
    {
        str[i] = 0; /* <<<< */
        printf("full str:%s\n", str);
    }
    printf("c == %c\n", c);
    str[i] = c;
    printf("d == %d\n", i);
    i++;
}
int main()
{
    char c = 0; /* <<<< */
    while(c != 'X')
    {
        c = getchar();
        gime_char(c);
    }
    return 0; /* <<<< */
}
Advice:
Check that malloc didn't return NULL.
i has no declared type (it should be static int i).
The type of str is wrong (it should be static char *str).
Before printing with %s you have to add a '\0' at the end of your string.
I would say your problem comes from the fact that you declare static *str;.
You declare it without specifying the type! Compilers accept this with a warning that it implies int, so you end up with a static pointer to int, which is not meant to store C strings.
Later you allocate space for the buffer (I suppose) with str = malloc(sizeof(*str) * 10);. Because *str is an int here, this allocates space for 10 ints rather than 10 characters.
You want to work with characters.
static char *str;
if(i == 0)
str = malloc(sizeof(char) * 10);
Also, you should zero the buffer (or at least write a terminator after the last character), since malloc returns memory with garbage in it and a C string is expected to be NUL-terminated. Initializing i explicitly is good practice too, although static variables are initialized to zero anyway.
I got this example from CS50. I know that we need to check "s == NULL" in case there is not enough memory. But I am not sure why we need to check the string length of t before capitalizing.
#include <cs50.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
// Get a string
char *s = get_string("s: ");
if (s == NULL)
{
return 1;
}
// Allocate memory for another string
char *t = malloc(strlen(s) + 1);
if (t == NULL)
{
return 1;
}
// Copy string into memory
strcpy(t, s);
// Why do we need to check this condition?
if (strlen(t) > 0)
{
t[0] = toupper(t[0]);
}
// Print strings
printf("s: %s\n", s);
printf("t: %s\n", t);
// Free memory
free(t);
return 0;
}
Why do we need to use "if (strlen(t) > 0)" before capitalizing?
Conceptually, there is no character to uppercase when the string is empty.
Technically, it's not needed. The first character of an empty string is 0, and toupper(0) is 0.
Note that strlen(t) > 0 can also be written as t[0] != 0 or just t[0]. There's no need to actually calculate the length of the string to find out if it's an empty string.
Also, make sure to read chux's answer for a correction regarding signed char.
// Why do we need to check this condition?
There is no need for the check. A string of length 0 consists of only a null character and toupper('\0'); returns '\0'.
Advanced: There is a need for something else though.
char may act as a signed or unsigned char. If t[0] < 0 (perhaps because the user entered 'é'), then toupper(negative) is undefined behavior (UB). toupper() is only defined for EOF (some negative value) and for values in the unsigned char range.
A more valuable code change, though pedantic, would be to access the characters as if they were unsigned char, then call toupper().
// if (strlen(t) > 0) { t[0] = toupper(t[0]); }
t[0] = (char) toupper(((unsigned char*)t)[0]);
// or
t[0] = (char) toupper(*(unsigned char*)t);
For any string t, the valid indexes (of the actual characters in the string) will be 0 to strlen(t) - 1.
Using strlen(t) as index will be the index of the null-terminator (assuming that it's a "proper" null-terminated string).
If strlen(t) == 0 then t[0] will be the null-terminator. And doing toupper on the null-terminator makes no sense. This is what the check does, make sure that there is at least one actual character (beyond the null-terminator) in the string.
In other words: it checks that the string isn't empty.
Background:
I'm trying to create a program that takes a user name(assuming that input is clean), and prints out the initials of the name.
Objective:
Trying my hand out at C programming with CS50
Getting myself familiar with malloc & realloc
Code:
#include <cs50.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
string prompt(void);
char *getInitials(string input);
char *appendArray(char *output,char c,int count);
//Tracks # of initials
int counter = 0;
int main(void){
string input = prompt();
char *output = getInitials(input);
for(int i = 0; i < counter ; i++){
printf("%c",toupper(output[i]));
}
}
string prompt(void){
string input;
do{
printf("Please enter your name: ");
input = get_string();
}while(input == NULL);
return input;
}
char *getInitials(string input){
bool initials = true;
char *output;
output = malloc(sizeof(char) * counter);
for(int i = 0, n = strlen(input); i < n ; i++){
//32 -> ASCII code for spacebar
//9 -> ASCII code for tab
if(input[i] == 32 || input[i] == 9 ){
//Next char after spaces/tab will be initial
initials = true;
}else{//Not space/tab
if(initials == true){
counter++;
output = appendArray(output,input[i],counter);
initials = false;
}
}
// eprintf("Input[i] is : %c\n",input[i]);
// eprintf("Counter is : %i\n",counter);
// eprintf("i is : %i\n",i);
// eprintf("n is : %i\n",n);
}
return output;
}
char *appendArray(char *output,char c,int count){
// allocate an array of some initial (fairly small) size;
// read into this array, keeping track of how many elements you've read;
// once the array is full, reallocate it, doubling the size and preserving (i.e. copying) the contents;
// repeat until done.
//pointer to memory
char *data = malloc(0);
//Increase array size by 1
data = realloc(output,sizeof(char) * count);
//append the latest initial
strcat(data,&c);
printf("Value of c is :%c\n",c);
printf("Value of &c is :%s\n",&c);
for(int i = 0; i< count ; i++){
printf("Output: %c\n",data[i]);
}
return data;
}
Problem:
The output is not what I expected: a mysterious P appears in it.
E.g. when I enter the name Barack Obama, instead of getting the result BO, I get the result BP. The same happens for whatever name I choose to enter, with the last initial always being P.
Output:
Please enter your name: Barack Obama
Value of c is :B
Value of &c is :BP
Output: B
Value of c is :O
Value of &c is :OP
Output: B
Output: P
BP
What I've done:
I've traced the problem to the appendArray function, and more specifically to the value of &c (the address of c), though I have no idea what's causing the P to appear, what it means, or how I can get rid of it.
The P shows up no matter what I input.
Insights as to why it's happening and what I can do to solve it will be much appreciated.
Thanks!
Several issues, in decreasing order of importance...
First issue - c in appendArray is not a string - it is not a sequence of character values terminated by a 0. c is a single char object, storing a single char value.
When you try to print c as a string, as in
printf("Value of &c is :%s\n",&c);
printf writes out the sequence of character values starting at the address of c until it sees a 0-valued byte. For whatever reason, the byte immediately following c contains the value 80, which is the ASCII (or UTF-8) code for the character 'P'. The next byte contains a 0 (or there's a sequence of bytes containing non-printable characters, followed by a 0-valued byte).
Similarly, using &c as the argument to strcat is inappropriate, since c is not a string. Instead, you should do something like
data[count-1] = c;
Secondly, if you want to treat the data array as a string, you must make sure to size it at least 1 more than the number of initials and write a 0 to the element after the last initial:
data[count] = 0; // after all initials have been stored to data, which must hold count + 1 chars
Third,
char *data = malloc(0);
serves no purpose, the behavior is implementation-defined, and you immediately overwrite the result of malloc(0) with a call to realloc:
data = realloc(output,sizeof(char) * count);
So, get rid of the malloc(0) call altogether; either just initialize data to NULL, or initialize it with the realloc call:
char *data = realloc( output, sizeof(char) * count );
Fourth, avoid using "magic numbers" - numeric constants with meaning beyond their immediate, literal value. When you want to compare against character values, use character constants. IOW, change
if(input[i] == 32 || input[i] == 9 ){
to
if ( input[i] == ' ' || input[i] == '\t' )
That way you don't have to worry about whether the character encoding is ASCII, UTF-8, EBCDIC, or some other system. ' ' means space everywhere, '\t' means tab everywhere.
Finally...
I know part of your motivation for this exercise is to get familiar with malloc and realloc, but I want to caution you about some things:
realloc is potentially an expensive operation, it may move data to a new location, and it may fail. You really don't want to realloc a buffer a byte at a time. Instead, it's better to realloc in chunks. A typical strategy is to multiply the current buffer size by some factor > 1 (typically doubling):
char *tmp = realloc( data, current_size * 2 );
if ( tmp )
{
    current_size *= 2;
    data = tmp;
}
You should always check the result of a malloc, calloc, or realloc call to make sure it succeeded before attempting to access that memory.
Minor stylistic notes:
Avoid global variables where you can. There's no reason counter should be global, especially since you pass it as an argument to appendArray. Declare it local to main and pass it as an argument (by reference) to getInitials:
int main( void )
{
int counter = 0;
...
char *output = getInitials( input, &counter );
for(int i = 0; i < counter ; i++)
{
printf("%c",toupper(output[i]));
}
...
}
/**
* The "string" typedef is an abomination that *will* lead you astray,
* and I want to have words with whoever created the CS50 header.
*
* They're trying to abstract away the concept of a "string" in C, but
* they've done it in such a way that the abstraction is "leaky" -
* in order to use and access the input object correctly, you *need to know*
* the representation behind the typedef, which in this case is `char *`.
*
* Secondly, not every `char *` object points to the beginning of a
* *string*.
*
* Hiding pointer types behind typedefs is almost always bad juju.
*/
char *getInitials( const char *input, int *counter )
{
...
(*counter)++; // parens are necessary here
output = appendArray(output,input[i],*counter); // need leading * here
...
}
I understand that strings are terminated by a NUL '\0' byte in C.
However, what I can't figure out is why a 0 in a string literal acts differently than a 0 in an char array created on the stack. When checking for NUL terminators in a literal, the zeros in the middle of the array are not treated as such.
For example:
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
int main()
{
/* here, one would expect strlen to evaluate to 2 */
char *confusion = "11001";
size_t len = strlen(confusion);
printf("length = %zu\n", len); /* why is this == 5, as opposed to 2? */
/* why is the entire segment printed here, instead of the first two bytes?*/
char *p = confusion;
while (*p != '\0')
putchar(*p++);
putchar('\n');
/* this evaluates to true ... OK */
if ((char)0 == '\0')
printf("is null\n");
/* and if we do this ... */
char s[6];
s[0] = 1;
s[1] = 1;
s[2] = 0;
s[3] = 0;
s[4] = 1;
s[5] = '\0';
len = strlen(s); /* len == 2, as expected. */
printf("length = %zu\n", len);
return 0;
}
output:
length = 5
11001
is null
length = 2
Why does this occur?
The variable 'confusion' is a pointer to the first char of a string literal.
So the memory looks something like
[11001\0]
So when you print the variable 'confusion', it prints everything up to the first null character, which is written \0.
The zeroes in "11001" are not null characters; they are the digit character '0', because they appear inside double quotes.
However, in your char array assignments for the variable 's', you are assigning the integer value 0 to a char. The integer value 0 is the character code of the NUL character, so the array looks something like this in memory:
[1, 1, NUL, NUL, 1, NUL]
The character code 1 is a non-printing control character (the IBM PC character set draws it as a smiley face).
So strlen counts everything up to the first NUL, and the result is 2.
The trick here is understanding what really gets assigned to a character variable when an integer value is assigned to it.
Try this code:
#include <stdio.h>
int
main(void)
{
char c = 0;
printf( "%c\n", c ); //Prints the character with code 0 (NUL), which is not visible.
printf( "%d\n", c ); //Prints the decimal value.
return 0;
}
You can view an ASCII Table (e.g. http://www.asciitable.com/) to check the exact value of character '0' and null
'0' and 0 are not the same value. (The first one is 48, usually, although technically the precise value is implementation-defined and it is considered very bad style to write 48 to refer to the character '0'.)
If a '0' terminated a character string, you wouldn't be able to put zeros in strings, which would be a bit... limiting.
What would be the reason for out[0] = '\0'; on the main() function?
It does seem to be working without it.
Code
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define MAXTOKEN 100
enum { NAME, PARENS, BRACKETS };
int tokentype;
char token[MAXTOKEN]; /*last token string */
char name[MAXTOKEN]; /*identifier name */
char datatype[MAXTOKEN]; /*data type = char, int, etc. */
char out[1000];
void dcl(void);
void dirdcl(void);
int gettoken(void);
/*
Grammar:
dcl: optional * direct-dcl
direct-dcl: name
(dcl)
direct-dcl()
direct-dcl[optional size]
*/
int main() /* convert declaration to words */
{
while (gettoken() != EOF) { /* 1st token on line */
/* 1. gettoken() gets the datatype from the token */
strcpy(datatype, token);
/* 2. Init out to end of the line? */
/* out[0] = '\0'; */
/* parse rest of line */
dcl();
if (tokentype != '\n')
printf("syntax error\n");
printf("%s: %s %s\n", name, out, datatype);
}
return 0;
}
int gettoken(void) /* return next token */
{
int c, getch(void);
void ungetch(int);
char *p = token;
/* Skip blank spaces and tabs */
while ((c = getch()) == ' ' || c == '\t')
;
if (c == '(') {
if ((c = getch()) == ')') {
strcpy(token, "()");
return tokentype = PARENS;
} else {
ungetch(c);
return tokentype = '(';
}
} else if (c == '[') {
for (*p++ = c; (*p++ = getch()) != ']'; )
;
*p = '\0';
return tokentype = BRACKETS;
} else if (isalpha(c)) {
/* Reads the next character of input */
for (*p++ = c; isalnum(c = getch()); ) {
*p++ = c;
}
*p = '\0';
ungetch(c); /* Get back the space, tab */
return tokentype = NAME;
} else
return tokentype = c;
}
/* dcl: parse a declarator */
void dcl(void)
{
int ns;
for (ns = 0; gettoken() == '*'; ) /* count *'s */
ns++;
dirdcl();
while (ns-- > 0)
strcat(out, " pointer to");
}
/* dirdcl: parse a direct declarator */
void dirdcl(void)
{
int type;
if (tokentype == '(') {
dcl();
if (tokentype != ')')
printf("error: missing )\n");
}
else if (tokentype == NAME) /* variable name */ {
strcpy(name, token);
printf("token: %s\n", token);
}
else
printf("error: expected name or (dcl)\n");
while ((type = gettoken()) == PARENS || type == BRACKETS) {
if (type == PARENS)
strcat(out, " function returning");
else {
strcat(out, " array");
strcat(out, token);
strcat(out, " of");
}
}
}
You need out[0] to be zero in order for strcat to work.
While this line
out[0] = '\0';
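matters on later passes through the while loop (out would otherwise keep the text built for the previous declaration, and strcat would append to it), it is not required for the first line of input, because static arrays, such as out[], are initialized to all zeros.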
According to initialization rules of C99,
...
if it has arithmetic type, it is initialized to (positive or unsigned) zero.
if it is an aggregate, every member is initialized (recursively) according to these rules.
It resets the char array (i.e. the string) to the empty string, discarding whatever it held before.
It is like writing:
int i = 0;
before doing something like:
i += 1;
so that the increment is not applied to a junk value.
A '\0' at index 0 tells the string functions that the array holds an empty string, so strcat starts appending at index 0, overwriting any junk in the rest of the array.
If the program works without resetting the array, that is because out has static storage duration and C initializes it to zeros for you, but it is good practice to reset it.
In short: In this particular case it's not strictly necessary, but in many other cases that look suspiciously similar, it is, so most people do it as "good style". So why would it be necessary?
There is no such thing as "empty" memory. There is no such thing as a "length". Unless you explicitly keep track of it, or define your own.
Memory is just bytes, which are numbers from 0 to 255. Since 0 is just as valid a number as 255, there is no way to tell whether a byte is used or not. You can "add up" several bytes if you need larger numbers, but everything is built out of bytes, in the end. Text is simply mapped to a number. A couple decades ago it was decided which number represents which character. So if you see a byte with the value 32, it could be a 32. Or it could be the 32nd letter in the computer's alphabet (which is the space character).
When you receive a string and you don't know how much text you will be dealing with, what you usually do is you reserve a large block of bytes. This is what char out[1000]; above does. But how do you tell where the text ends? How much of the 1000 bytes you've already used?
Well, in the old days, some people would just declare another variable, say, int length; and keep track of how many bytes they've used so far. The designers of C went a different route. They decided to pick a very rare character and use that as a marker. They picked the character with the value 0 for that (That is not the character '0'. The character '0' actually is the 48th letter of a computer's alphabet).
So you can just look at all the bytes in your string from the start, and if a character is not 0, you know it is used. If you reach a 0 character, you know this is the end of your string. There are various advantages to either approach. An int typically uses 4 bytes, an additional 0-character only 1. On the other hand, if you use an int, a string can also contain a 0-character; it's just another character, nobody cares.
Whenever you write "foo" in C, what C actually does is reserve room for 4 bytes, for 'f', 'o', 'o' and for the 0 to indicate the end. When you write "" in C, what it does is reserve room for a single byte, the 0. So that you can tell that the string is empty.
So, what is memory filled with before you put something into it at startup? Well, in most cases, it is just garbage. Whatever was in that memory the last time it was used (after all, you have limited RAM, so when you quit one application on your computer, its memory can get re-used for the next app you launch after that). These will be random numbers, often outside of the range of common characters.
So, if you want strcat to see out as an empty string, you need to give it a block of memory that starts with this 0 value character. If you just leave memory like it is, there might be some random characters in it. Your buffer might contain "jbhasugaudq7e1723876123798dbkda0skno§§^^%$#-9H0HWDZmwus0/usr/local/bin" or whatever was in that memory before. If you now appended some text to it, it would think the stuff before the first 0 (which is just randomly in this place) was a valid string, and append it to that. It will only know that this string is supposed to be empty, if you put a 0 right at the start.
So why did I say it is "not strictly necessary"? Well, because in your case, out is a global variable, and global variables are special because they automatically get cleared to 0 when your application starts up (or assigned any value that you assign them when you declare them).
However, this is only true for variables with static storage duration (globals and variables declared static), not for ordinary local variables. So many programmers make it a habit to always initialize their blocks of bytes. That way, if someone later decides to change a global into a local variable, or copy-and-pastes the code to another spot to use with a local variable, they do not have to worry about forgetting to add this statement.
This is especially useful as random memory often contains 0 characters. So depending on what program you previously used, you might not notice you forgot the initial 0 because there happened to be one already in there. And only later, when one of your users runs this application, they get garbage at the start of their string.
Does that clarify things a bit?
I thought I had this solved, but apparently, I was incorrect. The question is... what did I miss?
Assignment description:
You are to create a C program which fills an integer array with integers and then you are to cast it as a string and print it out. The output of the string should be your first and last name with proper capitalization, spacing and punctuation. Your program should have structure similar to:
main()
{
int A[100];
char *S;
A[0]=XXXX;
A[1]=YYYY;
...
A[n]=0; -- because C strings are terminated with NULL
...
printf("My name is %s\n",S);
}
Response to my submission:
You still copied memory cells to other, which is not expected. You use different space for the integer array as the string which does not follow the requirements. Please follow the instructions carefully next time.
My submission
Note that the first time I submitted, I simply used malloc on S, and copied casted values from A to S. The response was that I could not use malloc or allocate new space. This requirement was not in the problem description above.
Below was my second and final submission, which is the submission being referred to in the submission response above.
#include <stdio.h>
/* Main Program*/
int main (int arga, char **argb){
int A[100];
char *S;
A[0] = 68;
A[1] = 117;
/** etc. etc. etc. **/
A[13] = 115;
A[14] = 0;
// Point a char pointer to the first integer
S = (char *) A;
// For generality, in C, [charSize == 1 <= intSize]
// This is the ratio of intSize over charSize
int ratio = sizeof(int);
// Copy the i'th (char sized) set of bytes into
// consecutive locations in memory.
int i = 0;
// Using the char pointer as our reference, each set of
// bits is then i*ratio positions away from the i'th
// consecutive position in which it belongs for a string.
while (S[i*ratio] != 0){
S[i] = S[i*ratio];
i++;
}
// a sentinel for the 'S string'
S[i] = 0;
printf("My name is %s\n", S);
return 0;
}// end main
It looks like you've got the core idea down: the space for one integer will hold many chars. I believe you just need to pack the integer array "by hand" instead of compacting it in a loop afterwards. Assuming a 4-byte integer on a little-endian machine, give this a shot.
#include <stdio.h>
int main()
{
int x[50];
x[0] = 'D' | 'u' << 8 | 's' << 16 | 't' << 24;
x[1] = 0;
char *s = (char*)x;
printf("Name: %s\n", s);
return 0;
}
It sounds like your professor wanted you to put sizeof(int) characters (4 on most machines) into each int, instead of having an array of n ints that each hold one character and that you later condense using the while loop. Per Hurkyl's comment, the solution to this assignment would be platform dependent, meaning that it will differ from machine to machine. I'm assuming your instructor had the class ssh into and use a specific machine?
In any case, assuming you're on a little endian machine, say you wanted to type out the string: "Hi Dad!". Then a snippet of the solution would look something like this:
// Precursor stuff
A[0] = 0x44206948; // Hi D
A[1] = 0x216461; // ad!
A[2] = 0; // Null terminated
char *S = (char *)A;
printf("My string: %s\n", S);
// Other stuff