Why do we need to check string length larger than 0? - c

I got this example from CS50. I know that we need to check "s == NULL" in case there is not enough memory. But I am not sure why we need to check the string length of t before capitalizing.
#include <cs50.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
// Get a string
char *s = get_string("s: ");
if (s == NULL)
{
return 1;
}
// Allocate memory for another string
char *t = malloc(strlen(s) + 1);
if (t == NULL)
{
return 1;
}
// Copy string into memory
strcpy(t, s);
// Why do we need to check this condition?
if (strlen(t) > 0)
{
t[0] = toupper(t[0]);
}
// Print strings
printf("s: %s\n", s);
printf("t: %s\n", t);
// Free memory
free(t);
return 0;
}
Why do we need to use "if (strlen(t) > 0)" before capitalizing?

Conceptually, there is no character to uppercase when the string is empty.
Technically, it's not needed. The first character of an empty string is 0, and toupper(0) is 0.
Note that strlen(t) > 0 can also be written as t[0] != 0 or just t[0]. There's no need to actually calculate the length of the string to find out if it's an empty string.
Also, make sure to read chux's answer for a correction regarding signed char.

// Why do we need to check this condition?
There is no need for the check. A string of length 0 consists of only a null character and toupper('\0'); returns '\0'.
Advanced: There is a need for something else though.
char may act as a signed or unsigned char. If t[0] < 0, (maybe due to entering 'é') then toupper(negative) is undefined behavior (UB). toupper() is only defined for EOF, (some negative) and values in the unsigned char range.
A more valuable code change, though pedantic, would be to access the characters as if they were unsigned char, then call toupper().
// if (strlen(t) > 0) { t[0] = toupper(t[0]); }
t[0] = (char) toupper(((unsigned char*)t)[0]);
// or
t[0] = (char) toupper(*(unsigned char*)t);

For any string t, the valid indexes (of the actual characters in the string) will be 0 to strlen(t) - 1.
Using strlen(t) as index will be the index of the null-terminator (assuming that it's a "proper" null-terminated string).
If strlen(t) == 0 then t[0] will be the null-terminator. And doing toupper on the null-terminator makes no sense. This is what the check does, make sure that there is at least one actual character (beyond the null-terminator) in the string.
In other words: It checks that the string isn't empty.

Related

How to read and concatenate strings repeatedly in C

The program needs to do the following:
initialise the combined string variable.
read a string (input1) of no more than 256 characters then:
Until the first character of the read string is '#'
allocate enough space for a new combined string variable to hold the current combined string variable and the new read-string (input1).
then copy the contents of the old combined variable to the new, bigger, combined variable string.
then concatenate the newly read-string (input1) to the end of the new combined string.
deallocate the old combined string.
read a string into input 1 again.
After the user has typed a string with '#' print out the new combined string.
test input: where I am #here
expected output: whereIam
actual output: Segmentation fault (core dumped)
Note that the spaces above separate the strings. Write two more test cases.
And that is my own code:
#include<stdio.h>
#include<stdlib.h> // for malloc
#include<string.h> // for string funs
int main(void) {
char input1[257];
printf("Please enter a string here: ");
scanf("%256s",input1);
int input1_index = 0;
while (input1[input1_index] != '#') {
input1_index++;
}
char *combined;
combined = malloc(sizeof(char)*(input1_index+1+1));
combined[0] = '\0';
strcpy(combined,input1);
printf("%s\n",combined);
return 0;
}
How to modify my code? What is the Segmentation fault (core dumped)?
Thank you all.
With scanf, the specifier %Ns consumes and discards all leading whitespace before reading non-whitespace characters - stopping when trailing whitespace is encountered or N non-whitespace characters are read.
With the input
where I am #here
this
char input1[257];
scanf("%256s", input1);
will result in the input1 buffer containing the string "where".
This while loop
while (input1[input1_index] != '#') {
input1_index++;
}
does not handle the specified "[read strings...] Until the first character of the read string is '#'". It simply searches the one string you have read for the '#' character, and can easily exceed the valid indices of the string if one does not exist (as in the case of "where"). This will lead to Undefined Behaviour, and is surely the cause of your SIGSEGV.
Your code does not follow the instructions detailed.
A loop is required to read two or more whitespace delimited strings using scanf and %s. This loop should end when scanf fails, or the first character of the read string is '#'.
Inside this loop you should get the string length of the read string (strlen) and add that to the existing string length. You should allocate a new buffer of this length plus one (malloc), and exit the loop if this allocation fails.
You should then copy the existing string to this new buffer (strcpy).
You should then concatenate the read string to this new buffer (strcat).
You should then deallocate the existing string buffer (free).
You should then copy the pointer value of the new string buffer to the existing string buffer variable (=).
Then repeat the loop.
In pseudocode, this roughly looks like:
input := static [257]
combined := allocate [1] as ""
size := 0
exit program if combined is null
print "Enter strings: "
while read input does not begin with '#' do
add length of input to size
temporary := allocate [size + 1]
stop loop if temporary is null
copy combined to temporary
concatenate input to temporary
deallocate combined
combined := temporary
end
print combined
deallocate combined
Others have already explained what you should do. But my additional suggestion is to take those detailed instructions and make them comments. Then for each one of them translate into C code:
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
// initialise the combined string variable.
char *combined = malloc(1);
combined[0] = 0;
while (true) {
// read a string (input1) of no more than 256 characters
char input1[257];
if (scanf("%256s", input1) != 1) {
break;
}
// then: Until the first character of the read string is '#'
if (input1[0] == '#') {
break;
}
// allocate enough space for a new combined string variable to hold the current combined string variable and the new read-string (input1).
char *tmp = malloc(strlen(combined) + strlen(input1) + 1);
// then copy the contents of the old combined variable to the new, bigger, combined variable string.
strcpy(tmp, combined);
// then concatenate the newly read-string (input1) to the end of the new combined string.
strcat(tmp, input1);
// deallocate the old combined string.
free(combined);
combined = tmp; // <-- My interpretation of what the aim is
// read a string into input 1 again. <-- I guess they mean to loop
}
// After the user has typed a string with '#' print out the new combined string.
printf("%s", combined);
free(combined);
return 0;
}
Notice how "initialize", means "make it a proper C string which can be extended later".
When you read anything remember to check if the read was successful.
The first character of input1 is just `input1[0]`. It's always safe to check it (if the read was successful), because it must be the first character or the string terminator. No risk of false detection.
Allocation requires to add the size of combined, input1, and the string terminator.
Copy and concatenation are available in <string.h>.
Notice that the instructions are missing a line stating that the new combined string should become the current combined string, but it's pretty obvious from the intent.
I strongly dislike the suggested (implied) implementation which calls for a sequence such as:
read
while (check) {
do_stuff
read
}
My preference goes to the non repeated read:
while (true) {
read
if (!check) break;
do_stuff
}
Finally, free your memory. Always. Don't be lazy!
Just for completeness, another option is to store the size of your combined string, to avoid calling strlen on something you already know. You could also leverage the calloc and realloc functions, which have been available for a long time:
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
size_t capacity = 1;
char *combined = calloc(capacity, 1);
while (true) {
char input1[257];
if (scanf("%256s", input1) != 1 || input1[0] == '#') {
break;
}
capacity += strlen(input1);
combined = realloc(combined, capacity);
strcat(combined, input1);
}
printf("%s", combined);
free(combined);
return 0;
}
Finally for the additional test strings:
these are not the droids you're looking for #dummy
a# b## c### d#### e##### #f###### u#######(explicit)
Useless addition
You could also limit the number of reallocations by using a size/capacity couple and doubling the capacity each time you need more. Also (not particularly useful here) checking the return value of memory allocation function is a must. My checking for realloc is needlessly complicated here, because memory is freed at program termination, but, nevertheless, free your memory. Always. Don't be lazy! 😁
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
size_t capacity = 257, size = 0;
char *combined = calloc(capacity, 1);
if (combined == NULL) {
exit(EXIT_FAILURE);
}
while (true) {
char input1[257];
if (scanf("%256s", input1) != 1 || input1[0] == '#') {
break;
}
size += strlen(input1);
if (size >= capacity) {
char *tmp = realloc(combined, capacity *= 2);
if (tmp == NULL) {
free(combined);
exit(EXIT_FAILURE);
}
combined = tmp;
}
strcat(combined, input1);
}
printf("%s", combined);
free(combined);
return 0;
}
Uncalled-for optimization
David C. Rankin pointed out another important thing to take care of when using library functions, that is not to assume they are O(1). Repeatedly calling strcat() is not the smartest move, since it's always scanning from the beginning. So here I replaced the strcat() with a lighter strcpy(), storing the length of the string, before adding the new one.
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
size_t capacity = 257, size = 0;
char *combined = calloc(capacity, 1);
if (combined == NULL) {
exit(EXIT_FAILURE);
}
while (true) {
char input1[257];
if (scanf("%256s", input1) != 1 || input1[0] == '#') {
break;
}
size_t oldsize = size;
size += strlen(input1);
if (size >= capacity) {
char *tmp = realloc(combined, capacity *= 2);
if (tmp == NULL) {
free(combined);
exit(EXIT_FAILURE);
}
combined = tmp;
}
strcpy(combined + oldsize, input1);
}
printf("%s", combined);
free(combined);
return 0;
}
In the below code, your idea that strcpy() will copy only up to the allocated memory is incorrect. strcpy() will copy beyond the allocated memory which is the reason it is unsafe and maybe the reason for the Segmentation fault (core dumped).
strcpy() in your code copies all characters from input1 to combined, but if combined is smaller than input1, the copy writes past the end of the allocation and the problem occurs.
Instead, you should use strncpy() to copy only n number of characters to avoid writing to memory beyond the allocated space.
You should also make sure that combined is ended with '\0' properly.
And combined[0]='\0' will not initialize all the allocated memory with '\0', this only assigned '\0' to the first byte. To initialize all the allocated memory to '\0' you need to use memset() after malloc()
combined = malloc(sizeof(char)*(input1_index+1+1));
combined[0] = '\0'; //use memset() here
strcpy(combined,input1); //Try using strncpy(combined, input1, input1_index) here

function returning a string of zeros in c

I am trying to create a function that returns a string of zeros as a char array
and to print this array to a text file, but when I return the string an additional character is returned.
This is the string the program wrote to the text file:
This is my function:
char *zeros_maker (int kj,int kj1)
{
char *zeros;
zeros = (char *) malloc(sizeof(char)*(kj-kj1));
int i;
for(i =0;i<kj-kj1;i++)
zeros[i]='0';
printf("%s\n",zeros);
return zeros;
}
This is the statement I used to print to the file:
fprintf(pFile,"%c%s%c &",34,zeros_maker(added_zeros,0),34);
Thanks in advance
'0' in C is the value of the encoding used for the digit zero. This is not allowed to have the value 0 by the C standard.
You need to add a NUL-terminator '\0' to the end of the char array, in order for the printf function to work correctly.
Else you run the risk of it running past the end of the char array, with undefined results.
Finally, don't forget to free the allocated memory at some point in your program.
Read about how strings in C are meant to be terminated.
Each string terminates with the null character '\0' (NUL, ASCII value 0, not to be confused with the character '0', which has ASCII value 48). It marks the end of the string.
zeros[kj-kj1]='\0'; // requires allocating kj-kj1+1 bytes
Also always check that you are not accessing an element out of bounds. Here that happens if kj1 > kj.
Instead of a for loop, you can use memset.
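char* zeros_maker(int kj,int kj1)
{
int len=kj-kj1;
char *zeros=malloc(len+1); // one extra byte for the terminator
if(zeros==NULL)
return NULL;
memset(zeros,'0',len); // fill with len '0' characters
zeros[len]=0; // terminate the string
printf("%s\n",zeros);
//fflush(stdout);
return zeros;
}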
Or if you are not a fan of C-style strings, and it's going to be ASCII only, the following could be used too. Just be careful what you are doing this way: the result is not null-terminated, so it must not be passed to %s or strlen.
char* zeros_maker_pascal_form(int kj,int kj1)
{
int len=kj-kj1;
char *zeros=malloc(sizeof(char)*(len));
memset(zeros,'0',len);
for(int a=0;a<len;a++){
printf("%c",zeros[a]);
}
printf("\n");
//fflush(stdout);
return zeros;
}
Your code has a few basic issues, the main one is that it fails to terminate the string (and include space for the terminator).
Here's a fixed and cleaned-up version:
char * zeros_maker(size_t length)
{
char *s = malloc(length + 1);
if(s != NULL)
{
memset(s, '0', length);
s[length] = '\0';
}
return s;
}
This has the following improvements over your code:
It simplifies the interface, just taking the number of zeroes that should be returned (the length of the returned string). Do the subtraction at the call site, where those two values make sense.
No cast of the return value from malloc(), and no scaling by sizeof (char) since I consider that pointless.
Check for NULL being returned by malloc() before using the memory.
Use memset() to set a range of bytes to a single value, that's a standard C function and much easier to know and verify than a custom loop.
Terminate the string, of course.
Call it like so:
char *zs = zeros_maker(kj - kj1);
puts(zs);
free(zs);
Remember to free() the string once you're done with it.

NUL character and static character arrays/string literals in C

I understand that strings are terminated by a NUL '\0' byte in C.
However, what I can't figure out is why a 0 in a string literal acts differently than a 0 in a char array created on the stack. When checking for NUL terminators in a literal, the zeros in the middle of the array are not treated as such.
For example:
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
int main()
{
/* here, one would expect strlen to evaluate to 2 */
char *confusion = "11001";
size_t len = strlen(confusion);
printf("length = %zu\n", len); /* why is this == 5, as opposed to 2? */
/* why is the entire segment printed here, instead of the first two bytes?*/
char *p = confusion;
while (*p != '\0')
putchar(*p++);
putchar('\n');
/* this evaluates to true ... OK */
if ((char)0 == '\0')
printf("is null\n");
/* and if we do this ... */
char s[6];
s[0] = 1;
s[1] = 1;
s[2] = 0;
s[3] = 0;
s[4] = 1;
s[5] = '\0';
len = strlen(s); /* len == 2, as expected. */
printf("length = %zu\n", len);
return 0;
}
output:
length = 5
11001
is null
length = 2
Why does this occur?
The variable 'confusion' is a pointer to char that points to a string literal.
So the memory looks something like
[11001\0]
So when you print the variable 'confusion', it will print everything until the first null character, which is represented by \0.
The zeroes in 11001 are not null characters; they are the digit character '0', since they appear inside double quotes.
However, in your char array assignment for variable 's', you are assigning the integer value 0 to a char. That value is the null character (NUL), not the digit '0'. So the character array looks something like this in memory:
[happyface, happyface, NUL, NUL, happyface, NUL]
The character with value 1 is displayed as a happy face in some terminal code pages (in ASCII it is the control character SOH).
So when you print, it will print everything up to the first NUL, and thus
the strlen is 2.
The trick here is understanding what really gets assigned to a character variable when an integer value is assigned to it.
Try this code:
#include <stdio.h>
int
main(void)
{
char c = 0;
printf( "%c\n", c ); //Prints the character with value 0, which is NUL.
printf( "%d\n", c ); //Prints the decimal value.
return 0;
}
You can view an ASCII Table (e.g. http://www.asciitable.com/) to check the exact value of character '0' and null
'0' and 0 are not the same value. (The first one is 48, usually, although technically the precise value is implementation-defined and it is considered very bad style to write 48 to refer to the character '0'.)
If a '0' terminated a character string, you wouldn't be able to put zeros in strings, which would be a bit... limiting.

My function goes over the length of string

I am trying to write a function that compares every letter of the alphabet to a string I enter and prints the letters I didn't use. But when I print those letters, it runs past the end and gives me random symbols. Here is a link to the function, how I call it, and the result: http://imgur.com/WJRZvqD,U6Z861j,PXCQa4V#0
Here is code: (http://pastebin.com/fCyzFVAF)
void getAvailableLetters(char lettersGuessed[], char availableLetters[])
{
char alphabet[]={'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'};
int LG,LG2,LA=0;
for (LG=0;LG<=strlen(alphabet)-1;LG++)
{
for(LG2=0;LG2<=strlen(lettersGuessed)-1;LG2++)
{
if (alphabet[LG]==lettersGuessed[LG2])
{
break;
}
else if(alphabet[LG]!=lettersGuessed[LG2] &&LG2==strlen(lettersGuessed)-1)
{
availableLetters[LA]=alphabet[LG];
LA++;
}
}
}
}
Here is program to call the function:
#include <stdio.h>
#include <string.h>
#include "hangman.c"
int main()
{
int i = 0;
char result[30];
char text[30];
scanf("%s", text);
while(i != strlen(text))
{
i++;
}
getAvailableLetters(text, result);
printf("%s\n", result);
printf ("%d", i);
printf ("\n");
}
Here is result when I typed in abcd: efghijklmnopqrstuvwxyzUw▒ˉ
If you want to print result as a string, you need to include a terminating null at the end of it (that's how printf knows when to stop).
With %s, printf stops printing when it reaches a null character '\0', because %s expects the string to be null-terminated. result is not null-terminated, and that's why you get random symbols at the end.
Just add availableLetters[LA] = '\0'; as the last line of the function getAvailableLetters.
http://pastebin.com/fCyzFVAF
Make sure your string is null-terminated (i.e. has a '\0' character at the end). That also implies ensuring the buffer that holds the string is large enough to contain the null terminator.
Sometimes you think you have a null-terminated string, but the data has overflowed the buffer and the null terminator was never stored. That is one reason (not applicable in this case) to always prefer the forms of functions that let you explicitly limit the length written into a buffer, for example snprintf() instead of sprintf(), so that an overflow cannot be turned into an exploit.
char alphabet[]={'a','b','c', ... ,'x','y','z'}; is not a string. It is simply an "array 26 of char".
In C, "A string is a contiguous sequence of characters terminated by and including the first null character. ...". C11 §7.1.1 1
strlen(alphabet) expects a string. Since code did not provide a string, the result is undefined.
To fix, ensure alphabet is a string.
char alphabet[]={'a','b','c', ... ,'x','y','z', 0};
// or
char alphabet[]={"abc...xyz"}; // compiler appends a \0
Now alphabet is "array 27 of char" and also a string.
2nd issue: for(LG2=0;LG2<=strlen(lettersGuessed)-1;LG2++) has 2 problems.
1) Each time through the loop, code recalculates the length of the string. Better to calculate the string length once since the string length does not change within the loop.
size_t len = strlen(lettersGuessed);
for (LG2 = 0; LG2 <= len - 1; LG2++)
2) strlen() returns the type size_t. This is some unsigned integer type. Should lettersGuessed have a length of 0 (it might have been ""), the string length - 1 is not -1, but some very large number as unsigned arithmetic "wraps around" and the loop may never stop. A simple solution follows. This solution would only fail if the length of the string exceeded INT_MAX.
int len = (int) strlen(lettersGuessed);
for (LG2 = 0; LG2 <= len - 1; LG2++)
A solution without this limitation would use size_t throughout.
size_t LG2;
size_t len = strlen(lettersGuessed);
for (LG2 = 0; LG2 < len; LG2++)

Checking equality of a string in C and printing an answer [duplicate]

This question already has answers here:
C Strings Comparison with Equal Sign
(5 answers)
Closed 9 years ago.
I've written some simple code as an SSCCE. I'm trying to check whether the string entered is equal to a string I've defined in a char pointer array, so it should point to the string and give me a result. I'm not getting any warnings or errors, but I'm just not getting any result (either "true" or "false").
Is there something else being scanned with the scanf? A termination symbol or something? I'm just not able to get it to print out either true or false.
code:
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#define LENGTH 20
//typedef char boolean;
int main(void)
{
const char *temp[1];
temp[0] = "true\0";
temp[1] = "false\0";
char var[LENGTH];
printf("Enter either true or false.\n");
scanf("%s", var);
if(var == temp[0]) //compare contents of array
{
printf("\ntrue\n");
}
else if(var == temp[1]) //compare contents of array
{
printf("\nfalse\n");
}
}
const char *temp[1];
This defines temp as an array that can store 1 char* element.
temp[0] = "true\0";
Assigns to the first element. This is okay.
temp[1] = "false\0";
Would assign to the second element, but temp can only store one. C doesn't check array boundaries for you.
Also note that you don't have to specify the terminating '\0' explicitly in string literals, so just "true" and "false" are sufficient.
if(var == temp[0])
This compares only the pointer values ("where the strings are stored"), not the contents. You need the strcmp() function (and read carefully, the returned value for equal strings might not be what you expect it to be).
Use strcmp for comparing strings:
#include <stdio.h>
#include <string.h>
int main(void)
{
const int LENGTH = 20;
char str[LENGTH];
printf("Type \"true\" or \"false\":\n");
if (scanf("%19s", str) != 1) {
printf("scanf failed.");
return -1;
}
if(strcmp(str, "true") == 0) {
printf("\"true\" has been typed.\n");
}
else if(strcmp(str, "false") == 0) {
printf("\"false\" has been typed.\n");
}
return 0;
}
and also note that:
string literals automatically contain null-terminating character ("true", not "true\0")
const int LENGTH is better than #define LENGTH since type safety comes with it (note that in C this makes str a variable-length array, so it needs C99 or later)
"%19s" ensures that no more than 19 characters (+ \0) will be stored in str
typedef char boolean; is not a good idea
unlikely, but still: scanf doesn't have to succeed
and there is no #include <iostream> in c :)
== checks for equality. Let's see what you're comparing.
The variable var is declared as a character array, so the expression var is really equivalent to &var[0] (the address of the first character in the var array).
Similarly, temp[0] is equivalent to &temp[0][0] (the address of the first character in the temp[0] array).
These addresses are obviously different (otherwise writing var would automatically write temp[0] as well), so == will always return 0 for your case.
strcmp, on the other hand, does not check for equality of its inputs, but for character-by-character equality of the arrays pointed to by its inputs (that is, it compares their members, not their addresses) and so you can use that for strings in C. It's worth noting that strcmp returns 0 (false) if the strings are equal.

Resources