C, counting the number of blank spaces

I'm writing a function that replaces blank spaces with '-'.
I ultimately want to return how many changes I made.
#include <stdio.h>
int replace(char c[])
{
int i, cnt;
cnt = 0;
for (i = 0; c[i] != EOF; i++)
if (c[i]==' ' || c[i] == '\t' || c[i] == '\n')
{
c[i] = '-';
++cnt;
}
return cnt;
}
main()
{
char cat[] = "The cat sat";
int n = replace(cat);
printf("%d\n", n);
}
The problem is, it correctly changes the string into "The-cat-sat" but for n, it returns the value 3, when it's supposed to return 2.
What have I done wrong?

#4386427 suggested this should be a separate answer. #wildplasser already provided the solution; this answer explains EOF and '\0'.
You would use EOF only when reading from a file or stream (EOF stands for End Of File). EOF is a macro that expands to a negative int value (commonly -1, but implementation-defined), which input functions such as getchar() return to signal that no more characters are available. It is never a character stored in your data, so it is better thought of as a condition than as a value. A C string, whether you access it through a char array or a char pointer, is terminated by a '\0' character, and the first '\0' marks its end, so that is what you test for to break out of the loop when iterating through it. As long as the string is properly terminated, this guarantees that you don't read memory beyond the array.

#include <stdio.h>

int replc(void);

int main(void){
    int nc = replc();   /* copy stdin to stdout, counting the replacements */
    printf("replaced: %d times\n", nc);
    return 0;
}

/* Read stdin until EOF and echo it, replacing each ' ' with '-'. */
int replc(void){
    int c;              /* int, not char, so that EOF can be detected */
    int nc = 0;
    while ((c = getchar()) != EOF) {
        if (c == ' ') {
            putchar('-');
            ++nc;
        } else {
            putchar(c);
        }
    }
    return nc;
}

A string ends with a 0 (zero) value, not with EOF. So the program in the question scans past the terminating '\0' until it happens to find a byte equal to EOF (typically -1) somewhere beyond the array, and any space, tab or newline byte it meets on the way is counted (and overwritten) too. By then you are already in undefined-behavior land. (If char is unsigned on your platform, c[i] can never compare equal to EOF at all.)
[stylistic] The function argument could be a character pointer (an array parameter is adjusted to a pointer anyway, so an array argument cannot exist in C).
[stylistic] A pointer version won't need the i variable.
[stylistic] The count can never be negative, so intuitively an unsigned counter is preferred (it could even be a size_t, just like the other string functions).
[stylistic] A switch(){} can avoid the (IMO) ugly || list, and it also makes it easier to add cases.
unsigned replace(char *cp){
unsigned cnt;
for(cnt = 0; *cp ; cp++) {
switch (*cp){
case ' ' : case '\t': case '\n':
*cp = '-';
cnt++;
default:
break;
}
}
return cnt;
}

Using EOF as the end condition of the for loop is the problem, because you are not checking for the end of a file or stream:
for (i = 0; c[i] != EOF; i++)
EOF itself is not a character, but a signal that there are no more characters available in the stream.
If you are trying to check for the end of the string, compare with the character constant '\0' (not the string literal "\0"):
for (i = 0; c[i] != '\0'; i++)

Related

Can anyone tell me what mgetline is doing in this code? I can't understand it

I could not understand what mgetline does in this code.
Can anyone help me?
int mgetline(char s[],int lim)
{
int c, i;
for(i = 0; i < lim - 1 && (c = getchar()) != EOF && c != '\n'; ++i)
s[i] = c;
if(c == '\n')
{
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
The function basically reads characters one by one from the standard input stream stdin until it reads a \n (newline), reaches the end of input (EOF), or the array limit of s, lim, is reached. The characters are stored in s (including the newline, if one was read), the string is terminated with '\0', and the length of what was stored is returned.
It's hard to answer with more detail since it's a little unclear what it is you don't understand, but I've tried to annotate the code to make it somewhat clearer.
This is the same code, only reformatted to fit my comments.
int mgetline(char s[], int lim) {
int c, i;
for(i = 0; // init-statement, start with `i` at zero
i < lim - 1 // condition, `i` must be less than `lim - 1`
&& // condition, logical AND
(c = getchar()) !=EOF // (function call, assignment) condition, `c` must not be EOF
&& // condition, logical AND
c != '\n'; // condition, `c` must not be `\n` (newline)
++i) // iteration_expression, increase i by one
{
s[i] = c; // store the value of `c` in `s[i]`
}
if(c == '\n') { // if a newline was the last character read
s[i] = c; // store it
++i; // and increase i by one
}
s[i] = '\0'; // store a null terminator last
return i; // return the length of the string stored in `s`
}
The condition part of the for loop contains three conditions that must all be true for the statement in for(...;...;...) statement to execute. I've made that statement into a code block to make it easier to see the scope. EOF is a special value that getchar() returns when there is no more input on stdin (end of file) or when a read error occurs.
Note: If you pass an array of one char (lim == 1), this function has undefined behavior. Any program reading an uninitialized variable like this has undefined behavior, and that's a bad thing. In this case, if lim == 1, the condition i < lim - 1 is false right away, so getchar() is never called, and c is still uninitialized when it is read after the loop.
Either initialize it:
int mgetline(char s[], int lim) {
int c = 0, i;
or bail out of the function:
int mgetline(char s[], int lim) {
if(lim < 2) {
if(lim == 1) s[0] = '\0';
return 0;
}
int c, i;

Getchar gets char only one time

Even though the while condition should be true, getchar() only reads one character. I tried my code with getchar() both in the while condition and in the body, but it doesn't work.
int main() {
char* s = malloc(sizeof(char)) /*= get_string("Write number: ")*/;
char a[MAXN];
int i = 0;
do {
a[i] = getchar();
*s++ = a[i];
i++;
} while (isdigit(a[i-1]) && a[i-1] != EOF && a[i-1] != '\n' && i< MAXN);
/*while (isdigit(*s++=getchar()))
i++;*/
*s = '\0';
s -= i;
long n = conversion(s);
printf("\n%lu\n", n);
}
As others have pointed out, there isn't much use for s because a can be passed to conversion. And, again, the malloc for s only allocates a single byte.
You're incrementing i before doing the loop tests, so you have to use i-1 there. Also, the loop ends with i being one too large.
Even for your original code, doing int chr = getchar(); a[i] = chr; and replacing a[i-1] with chr can simplify things a bit.
Better yet, by restructuring to use a for instead of a do/while loop, we can add some more commenting for each escape condition rather than a larger single condition expression.
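#include <ctype.h>
#include <stdio.h>

#define MAXN 1000

// conversion() is the function from the question (not shown here);
// it must be declared before main()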
int
main(void)
{
char a[MAXN + 1];
int i;
for (i = 0; i < MAXN; ++i) {
// get the next character
int chr = getchar();
// stop on EOF
if (chr == EOF)
break;
// stop on newline
if (chr == '\n')
break;
// stop on non-digit
if (! isdigit(chr))
break;
// add digit to the output array
a[i] = chr;
}
// add EOS terminator to string
a[i] = 0;
unsigned long n = conversion(a);
printf("\n%lu\n",n);
return 0;
}
Code does not allocate enough memory with malloc(sizeof(char)) as that is only 1 byte.
When code tries to save a 2nd char into s, bad things can happen: undefined behavior (UB).
In any case, the allocation is not needed.
Instead, use a reasonably sized fixed buffer and store the characters/digits there.
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

// The max digits in a `long` is about log10(LONG_MAX) + a few
// The number of [bits in a `long`]/3 is about log10(LONG_MAX)
#define LONG_DEC_SZ (CHAR_BIT*sizeof(long)/3 + 3)

// conversion() is the function from the question (not shown here)

int main(void) {
  char a[LONG_DEC_SZ * 2]; // let's go for 2x to allow some leading zeros
  size_t i = 0;
  int ch; // `getchar()` typically returns 257 different values, use `int`
  // As long as there is room (one slot is kept for '\0') and code is reading digits ...
  while (i + 1 < sizeof a && isdigit((ch = getchar())) ) {
    a[i++] = ch;
  }
  a[i] = '\0';
  long n = conversion(a);
  printf("\n%ld\n", n);
  return 0;
}
To Do: This code does not allow a leading sign character like '-' or '+'

Converting words from camelCase to snake_case in C

What I am trying to code: if I input camelcase, it should just print camelcase, but if the input contains any uppercase letter, for example camelCase, it should print camel_case.
Below is what I am working on. The problem is that if I input camelCase, it prints camel_ase.
Can someone please tell me the reason and how to fix it?
#include <stdio.h>
#include <ctype.h>
int main() {
char ch;
char input[100];
int i = 0;
while ((ch = getchar()) != EOF) {
input[i] = ch;
if (isupper(input[i])) {
input[i] = '_';
//input[i+1] = tolower(ch);
} else {
input[i] = ch;
}
printf("%c", input[i]);
i++;
}
}
First look at your code and think about what happens when someone enters a word longer than 100 characters -> undefined behavior. If you use a buffer for input, you always have to add checks so you don't overflow this buffer.
But then, as you directly print the characters, why do you need a buffer at all? It's completely unnecessary with the approach you show. Try this:
#include <stdio.h>
#include <ctype.h>
int main()
{
int ch;
int firstChar = 1; // needed to also accept PascalCase
while((ch = getchar())!= EOF)
{
if(isupper(ch))
{
if (!firstChar) putchar('_');
putchar(tolower(ch));
} else
{
putchar(ch);
}
firstChar = 0;
}
}
Side note: I changed the type of ch to int. getchar() returns an int, and an int is needed so that EOF can be told apart from a valid character. Likewise, isupper() and tolower() take an int whose value must be representable as an unsigned char or be EOF. As char is allowed to be signed, on a platform with signed char you would get undefined behavior calling these functions with a negative char. I know, this is a bit complicated. Another way around this issue is to always cast your char to unsigned char when calling a function that takes the value of an unsigned char as an int.
Since you use a buffer and it's useless right now, you might be interested in a solution that makes good use of one: read and write a whole line at a time. This is slightly more efficient than calling a function for every single character. Here's an example doing that:
#include <stdio.h>
static size_t toSnakeCase(char *out, size_t outSize, const char *in)
{
const char *inp = in;
size_t n = 0;
while (n < outSize - 1 && *inp)
{
if (*inp >= 'A' && *inp <= 'Z')
{
if (n > outSize - 3)
{
out[n++] = 0;
return n;
}
out[n++] = '_';
out[n++] = *inp + ('a' - 'A');
}
else
{
out[n++] = *inp;
}
++inp;
}
out[n++] = 0;
return n;
}
int main(void)
{
char inbuf[512];
char outbuf[1024]; // twice the length of the input is an upper bound
while (fgets(inbuf, 512, stdin))
{
toSnakeCase(outbuf, 1024, inbuf);
fputs(outbuf, stdout);
}
return 0;
}
This version also avoids isupper() and tolower(), but sacrifices portability. It only works if the character encoding has the letters A-Z in one contiguous sequence and every lowercase letter at the same distance ('a' - 'A') from its uppercase counterpart. For ASCII, these assumptions hold. Be aware that what is considered an (uppercase) letter can also depend on the locale; the program above only handles the letters A-Z of the English alphabet.
I don't know exactly how to code in C, but I think you should do something like this:
if(isupper(input[i]))
{
input[i] = tolower(ch);
printf("_");
} else
{
input[i] = ch;
}
There are two problems in your code:
You insert one character in each branch of the if, while one of the branches is supposed to insert two characters.
You print characters as you go, but the first branch is supposed to print both _ and ch.
You can fix this by incrementing i on each insertion with i++, and by printing the entire word at the end:
#include <ctype.h>
#include <stdio.h>

int main(void) {
    int ch; // <<== Has to be int, not char
    char input[100];
    size_t i = 0;
    // Keep room for '_', the letter and the terminating '\0'
    while (i + 2 < sizeof(input) && (ch = getchar()) != EOF) {
        if (isupper(ch)) {
            if (i != 0) {
                input[i++] = '_';
            }
            ch = tolower(ch);
        }
        input[i++] = ch;
    }
    input[i] = '\0'; // Null-terminate the string
    printf("%s\n", input);
    return 0;
}
There are multiple problems in your code:
ch is defined as a char: you cannot properly test for end of file if ch is not defined as an int. getchar() can return all values of type unsigned char plus the special value EOF, which is negative. Define ch as int.
You store the byte into the array input and use isupper(input[i]). isupper() is only defined for values returned by getchar(), not for potentially negative values of the char type if this type is signed on the target system. Use isupper(ch) or isupper((unsigned char)input[i]).
You do not check if i is small enough before storing bytes to input[i], causing a potential buffer overflow. Note that it is not necessary to store the characters into an array for your problem.
You should insert the '_' in the array and the character converted to lowercase. This is your principal problem.
Whether you want Main to be converted to _main, main or left as Main is a question of specification.
Here is a simpler version:
#include <ctype.h>
#include <stdio.h>
int main(void) {
int c;
while ((c = getchar()) != EOF) {
if (isupper(c)) {
putchar('_');
putchar(tolower(c));
} else {
putchar(c);
}
}
return 0;
}
To output the entered characters in the form you showed, there is no need to use an array. The program can look the following way:
#include <stdio.h>
#include <ctype.h>
int main( void )
{
int c;
while ((c = getchar()) != EOF && c != '\n')
{
if (isupper(c))
{
putchar('_');
c = tolower(c);
}
putchar(c);
}
putchar('\n');
return 0;
}
If you want the character array to contain a string, you should reserve one of its elements for the terminating zero.
In this case the program can look like this:
#include <stdio.h>
#include <ctype.h>
int main( void )
{
char input[100];
const size_t N = sizeof(input) / sizeof(*input);
int c;
size_t i = 0;
while ( i + 1 < N && (c = getchar()) != EOF && c != '\n')
{
if (isupper(c))
{
input[i++] = '_';
c = tolower(c);
}
if ( i + 1 != N ) input[i++] = c;
}
input[i] = '\0';
puts(input);
return 0;
}

Standard Input - Counting chars/words/lines

I've written some code for finding the number of characters, lines and words in standard input, but I have a few questions.
When I run the program, it doesn't take any input from me. Am I able to use shell redirection for this?
My word count only counts when getchar() returns an apostrophe (') or a space (' '). I want it to also count when the character is outside a range of values in the ASCII table, i.e. if the character returned by getchar() is not in a-z or A-Z and is not a ', wordcount += 1.
I was thinking about using a decimal value range to represent this, i.e.: getchar() != (65->90 || 97->122 || \' ) -> wordcount+1
https://en.wikipedia.org/wiki/ASCII for ref.
Would this be the best way of going about this? And if so, what is the best way to implement it?
#include <stdio.h>
int main() {
unsigned long int charcount;
unsigned long int wordcount;
unsigned long int linecount;
int c = getchar();
while (c != EOF) {
//characters
charcount += 1;
//words separated by characters outside range of a->z, A->Z and ' characters.
if (c == '\'' || c == ' ')
wordcount += 1;
//line separated by \n
if (c == '\n')
linecount += 1;
}
printf("%lu %lu %lu\n", charcount, wordcount, linecount);
}
Your code has multiple problems:
You do not initialize charcount, wordcount or linecount. Local variables with automatic storage must be initialized before they are used; otherwise you invoke undefined behavior.
You only read a single byte from standard input. You should keep reading until you get EOF.
Your method for detecting words is incorrect: it is questionable whether ' is a delimiter, but you seem to want to specifically consider it to be. The standard wc utility considers only white space to separate words. Furthermore, multiple separators should only count for 1.
Here is a corrected version with your semantics, namely words are composed of letters, everything else counting as separators:
#include <ctype.h>
#include <stdio.h>
int main(void) {
unsigned long int charcount = 0;
unsigned long int wordcount = 0;
unsigned long int linecount = 0;
int c, lastc = '\n';
int inseparator = 1;
while ((c = getchar()) != EOF) {
charcount += 1; // characters
if (isalpha(c)) {
wordcount += inseparator;
inseparator = 0;
} else {
inseparator = 1;
if (c == '\n')
linecount += 1;
}
lastc = c;
}
if (lastc != '\n')
linecount += 1; // count the last line if not terminated with \n
printf("%lu %lu %lu\n", charcount, wordcount, linecount);
}
You need:
while ((c = getchar()) != EOF)
as the head of your loop. As you have it, getchar() reads one character before the loop, and the while block then loops forever on that same character, with no further getchar() occurring!

Enclosing ++i in braces or not in one part of the code: K&R longest line example

int getline(char s[], int lim)
{
int c, i;
for(i=0; i<lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
s[i] = c;
if(c=='\n'){
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
This example is from the K&R book on C, section 1.9 on arrays. What I do not understand is why we have to enclose ++i in the braces of the if statement. Writing it outside should do the same work.
if(c=='\n')
s[i] = c;
++i;
s[i] = '\0';
return 0;
}
With ++i inside the braces the program works as intended, but in the second case (which in my opinion should do the same work, and this is why I edited that part) it doesn't. I ran it through a debugger and watched i, which in both cases was correctly calculated and returned. But the program still won't work without the braces around ++i: I don't get the output of my printf statement, and Ctrl+D just won't work in the terminal or in XTerm (through CodeBlocks). I can't figure out why. Any hint, please? Am I missing some logical step? Here is the complete code:
//Program that reads lines and prints the longest
/*----------------------------------------------------------------------------*/
#include <stdio.h>
#define MAXLINE 1000
int getline(char currentline[], int maxlinelenght);
void copy(char saveto[], char copyfrom[]);
/*----------------------------------------------------------------------------*/
int main(void)
{
int len, max;
char line[MAXLINE], longest[MAXLINE];
max = 0;
while( (len = getline(line, MAXLINE)) > 0 )
if(len > max){
max = len;
copy(longest, line);
}
if(max > 0)
printf("StrLength:%d\nString:%s", max, longest);
return 0;
}
/*----------------------------------------------------------------------------*/
int getline(char s[], int lim)
{
int c, i;
for(i=0; i<lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
s[i] = c;
if(c=='\n'){
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
/*----------------------------------------------------------------------------*/
void copy(char to[], char from[])
{
int i;
i = 0;
while( (to[i]=from[i]) != '\0')
++i;
}
/*----------------------------------------------------------------------------*/
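The line
if(c == '\n')
is equivalent to
if(c != EOF)
except when the loop stopped because the buffer was full (then c holds the last character stored).
Does that help explain why the braces are needed?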
There is logic behind it:
if(c=='\n'){
s[i] = c;
++i;
}
It means that only if you read a newline do you need to increment i once more, so that there is space left for the \0 character. If you put ++i outside the if block, i is always increased by 1, even when no newline was read. In that case i has already been incremented in the for loop, so there is already space for \0, and incrementing i again is wrong. You can print the value of i and see how it works.
The index specified by i is the location where the terminating null should be placed when there is no more input for the line. The location just before the index i contains the last valid character in the string.
Keep in mind that the loop that reads data from stdin can terminate for reasons other than reading a \n character.
If you had this construct:
if(c=='\n')
s[i] = c;
++i;
then if the last character read from stdin wasn't a newline you would increment the index by one without writing anything into the location specified by the pre-incremented value of i. You would be effectively adding an unspecified character to the result.
Worse(?), if the for loop terminated because of the i<lim-1 condition you would end up writing the terminating null character after the specified end of the array, resulting in undefined behavior (memory corruption).
The ++i is inside the if statement because we do not want to increment i if we are not placing the \n character in the current index; that would result in leaving an index in between the last character of the input and the \0 at the end of the character string.
The for loop can exit for three reasons, which fall into two cases:
1. The character limit was reached, or EOF was encountered.
2. A newline was encountered.
In the first case we only need to store the null character in s: i already points to the position just after the last valid character read, so there is no need to increment i.
In the second case i also points to the position just after the last valid character read, so we store the newline at that position and then increment i to make room for the null character.
That's why we need to increment i in the second case but not in the first:
if(c=='\n'){
s[i] = c;
++i;
}
