Standard Input - Counting chars/words/lines - c

I've written some code for finding the # of chars, lines and words in a standard input but I have a few questions.
On running the program - It doesn't grab any inputs from me. Am I able to use shell redirection for this?
My word count only increments when getchar() returns an apostrophe (') or a space (' '). I want it to also increment when the character falls outside a decimal value range on the ASCII table, i.e. if getchar() is not in the range a->z or A->Z and is not a ', then wordcount += 1.
I was thinking about using a decimal value range here to represent the range - ie: getchar() != (65->90 || 97->122 || \' ) -> wordcount+1
https://en.wikipedia.org/wiki/ASCII for ref.
Would this be the best way of going about answering this? and if so, what is the best way to implement the method?
#include <stdio.h>
int main() {
unsigned long int charcount;
unsigned long int wordcount;
unsigned long int linecount;
int c = getchar();
while (c != EOF) {
//characters
charcount += 1;
//words separated by characters outside range of a->z, A->Z and ' characters.
if (c == '\'' || c == ' ')
wordcount += 1;
//line separated by \n
if (c == '\n')
linecount += 1;
}
printf("%lu %lu %lu\n", charcount, wordcount, linecount);
}

Your code has multiple problems:
You do not initialize charcount, wordcount or linecount. Local variables with automatic storage must be initialized before they are used; otherwise reading them is undefined behavior.
You only read a single byte from standard input. You should keep reading until you get EOF.
Your method for detecting words is incorrect: it is questionable whether ' is a delimiter, but you seem to want to specifically consider it to be. The standard wc utility considers only white space to separate words. Furthermore, multiple separators should only count for 1.
Here is a corrected version with your semantics, namely words are composed of letters, everything else counting as separators:
#include <ctype.h>
#include <stdio.h>
int main(void) {
unsigned long int charcount = 0;
unsigned long int wordcount = 0;
unsigned long int linecount = 0;
int c, lastc = '\n';
int inseparator = 1;
while ((c = getchar()) != EOF) {
charcount += 1; // characters
if (isalpha(c)) {
wordcount += inseparator;
inseparator = 0;
} else {
inseparator = 1;
if (c == '\n')
linecount += 1;
}
lastc = c;
}
if (lastc != '\n')
linecount += 1; // count the last line if not terminated with \n
printf("%lu %lu %lu\n", charcount, wordcount, linecount);
}
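For comparison, the wc convention mentioned above (only white space separates words, and a run of separators counts once) can be sketched on an in-memory string. The name count_words_ws is made up for this illustration and is not part of the code above.

```c
#include <ctype.h>
#include <stddef.h>

/* Count words the way wc does: a word is a maximal run of
 * non-whitespace characters. count_words_ws is a hypothetical helper. */
size_t count_words_ws(const char *s)
{
    size_t words = 0;
    int in_word = 0;                       /* 1 while inside a word */

    for (; *s != '\0'; s++) {
        if (isspace((unsigned char)*s)) {
            in_word = 0;                   /* a separator ends the current word */
        } else if (!in_word) {
            in_word = 1;                   /* first character of a new word */
            words++;
        }
    }
    return words;
}
```

Under this rule an apostrophe is part of the word, so "it's" counts once. As for the redirection question: a program that reads with getchar() can take its input from a file with something like ./prog < input.txt, where prog stands for whatever the executable is called.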

You need:
while ((c = getchar()) != EOF)
as the head of your loop. As you have it, getchar() reads one character before the loop, and the while block then loops around with no further getchar() occurring!

Related

K&R C programming book exercise 1-18

I'm about halfway through solving the exercise, but I've run into something weird that I cannot figure out.
Here is the code snippet. I know it is only steps away from finished, but I think it's worth figuring out why the result looks like this!
#define MAXLINE 1000
int my_getline(char line[], int maxline);
int main(){
int len;
char line[MAXLINE];/* current input line */
int j;
while((len = my_getline(line, MAXLINE)) > 0 ){
for (j = 0 ; j <= len-1 && line[j] != ' ' && line[j] != '\t'; j++){
printf("%c", line[j]);
}
}
return 0;
}
int my_getline(char s[], int limit){
int c,i;
for (i = 0 ; i < limit -1 && (c = getchar()) != EOF && c != '\n'; i++)
s[i] = c;
if (c == '\n'){
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
It compiles successfully with cc: cc code.c. But the result is subtle!
It works for lines without \t and blanks:
hello
hello
but it does not work for the line in the picture:
I typed hel[blank][blank]lo[blank]\n:
Could anyone help me a bit? many thanks!
The problem is that you get stuck because you read a full line and then try to process it. It's better to process the input character by character (most of the K&R exercises work this way). Instead of printing spaces and tabs as you read them, save them in a buffer, and print them only when the next character you read is a non-blank one; then everything works fine. The same applies to newlines: keep track of the last non-blank character read (blanks before a newline have already been discarded) to see whether it was a newline. In that case, the newline you have just read is not printed, so in a run of two or more newlines only the first is printed. This is a complete sample program that does this:
#include <stdio.h>
#include <stdlib.h>
#define F(_f) __FILE__":%d:%s: "_f, __LINE__, __func__
int main()
{
char buffer[1000];
int bs = 0;
int last_char = '\n', c;
unsigned long
eliminated_spntabs = 0,
eliminated_nl = 0;
while((c = getchar()) != EOF) {
switch(c) {
case '\t': case ' ':
if (bs >= sizeof buffer) {
/* full buffer, cannot fit more blanks/tabs */
fprintf(stderr,
"we can only hold upto %d blanks/tabs in"
" sequence\n", (int)sizeof buffer);
exit(1);
}
/* add to buffer */
buffer[bs++] = c;
break;
default: /* normal char */
/* print intermediate spaces, if any */
if (bs > 0) {
printf("%.*s", bs, buffer);
bs = 0;
}
/* and the read char */
putchar(c);
/* we only update last_char on nonspaces and
* \n's. */
last_char = c;
break;
case '\n':
/* eliminate the accumulated spaces */
if (bs > 0) {
eliminated_spntabs += bs;
/* this trace to stderr to indicate the number of
* spaces/tabs that have been eliminated.
* Erase it when you are happy with the code. */
fprintf(stderr, "<<%d>>", bs);
bs = 0;
}
if (last_char != '\n') {
putchar('\n');
} else {
eliminated_nl++;
}
last_char = '\n';
break;
} /* switch */
} /* while */
fprintf(stderr,
F("Eliminated tabs: %lu\n"),
eliminated_spntabs);
fprintf(stderr,
F("Eliminated newl: %lu\n"),
eliminated_nl);
return 0;
}
The program prints (on stderr, so as not to interfere with the normal output) the number of eliminated tabs/spaces, surrounded by << and >>. At the end it also prints the total number of eliminated spaces/tabs and the number of eliminated empty lines. A line that contains only spaces is considered a blank line, and so it is eliminated. If you don't want lines that contain only spaces to be eliminated (the spaces themselves are removed anyway, since they are at the end of the line), just assign the spaces/tabs seen to the variable last_char.
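The same trimming can also be applied one line at a time, which is closer to the my_getline() approach in the question. This is only a minimal sketch, assuming the line is already in a NUL-terminated buffer; rtrim_line is a hypothetical name, not something from the program above.

```c
#include <stddef.h>
#include <string.h>

/* Remove trailing blanks and tabs from a line, keeping a final '\n'
 * if there was one. Returns the new length; a result of "" or "\n"
 * means the line held nothing but blanks. rtrim_line is hypothetical. */
size_t rtrim_line(char *line)
{
    size_t len = strlen(line);
    int had_newline = 0;

    if (len > 0 && line[len - 1] == '\n') {
        had_newline = 1;                   /* set the newline aside */
        len--;
    }
    while (len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t'))
        len--;                             /* drop trailing blanks and tabs */
    if (had_newline)
        line[len++] = '\n';                /* put the newline back */
    line[len] = '\0';
    return len;
}
```

A caller could skip any line for which the result is just "\n", which covers the "delete entirely blank lines" part of the exercise.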
In addition to the good answer by @LuisColorado, there are several ways you can look at your problem that may simplify things for you. Rather than using multiple conditionals to check for c == ' ' and c == '\t' and c == '\n', include ctype.h and use isspace() to determine whether the current character is whitespace. It is a much clearer way to go.
As for the return type: POSIX getline() uses ssize_t as a signed return type, which allows it to return -1 on error. While ssize_t is a bit of an awkward type, you can do the same with long (or int64_t for a guaranteed exact width).
I am a bit unclear on what you are trying to accomplish, but you appear to want to read a line of input and ignore whitespace. (While POSIX getline() and fgets() both include the trailing '\n' in the data they store, it may be more advantageous to read (consume) the '\n' but not include it in the buffer filled by my_getline(); that is up to you.) So from the example output provided above, it looks like you want both "hello" and "hel lo " to be read and stored as "hello".
If that is the case, then you can simplify your function as:
long my_getline (char *s, size_t limit)
{
int c = 0;
long n = 0;
while ((size_t)n + 1 < limit && (c = getchar()) != EOF && c != '\n') {
if (!isspace (c))
s[n++] = c;
}
s[n] = 0;
return n ? n : c == EOF ? -1 : 0;
}
The return statement is just the combination of two ternary clauses which will return the number of characters read, including 0 if the line was all whitespace, or it will return -1 if EOF is encountered before a character is read. (a ternary simply being a shorthand if ... else ... statement in the form test ? if_true : if_false)
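Spelled out as plain if statements, the decision made by that return statement might look like the sketch below. The name getline_result is hypothetical and exists only to make the three cases visible; n and c have the same meaning as in my_getline().

```c
#include <stdio.h>

/* Same decision as `return n ? n : c == EOF ? -1 : 0;`.
 * getline_result is a hypothetical name used only for illustration. */
long getline_result(long n, int c)
{
    if (n != 0)
        return n;        /* at least one character was stored */
    if (c == EOF)
        return -1;       /* nothing stored and the input is exhausted */
    return 0;            /* empty or all-whitespace line */
}
```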
Also note the choice made above for handling the '\n' was to read the '\n' but not include it in the buffer filled. You can change that by removing the && c != '\n' from the while() test and handling the newline inside the loop instead: store the '\n' and break before the isspace() check, since isspace('\n') is true and the newline would otherwise be discarded.
Putting together a short example, you would have:
#include <stdio.h>
#include <ctype.h>
#define MAXC 1024
long my_getline (char *s, size_t limit)
{
int c = 0;
long n = 0;
while ((size_t)n + 1 < limit && (c = getchar()) != EOF && c != '\n') {
if (!isspace (c))
s[n++] = c;
}
s[n] = 0;
return n ? n : c == EOF ? -1 : 0;
}
int main (void) {
char str[MAXC];
long nchr = 0;
fputs ("enter line: ", stdout);
if ((nchr = my_getline (str, MAXC)) != -1)
printf ("%s (%ld chars)\n", str, nchr);
else
puts ("EOF before any valid input");
}
Example Use/Output
With your two input examples, "hello" and "hel lo ", you would have:
$ ./bin/my_getline
enter line: hello
hello (5 chars)
Or with included whitespace:
$ ./bin/my_getline
enter line: hel lo
hello (5 chars)
Testing the error condition by pressing Ctrl + d (or Ctrl + z on windows):
$ ./bin/my_getline
enter line: EOF before any valid input
There are many ways to put these pieces together, this is just one possible solution. Look things over and let me know if you have further questions.

Getchar gets char only one time

Even when the while condition is true, getchar() only reads once. I tried calling getchar() both in the while condition and in the loop body, but neither works.
int main() {
char* s = malloc(sizeof(char)) /*= get_string("Write number: ")*/;
char a[MAXN];
int i = 0;
do {
a[i] = getchar();
*s++ = a[i];
i++;
} while (isdigit(a[i-1]) && a[i-1] != EOF && a[i-1] != '\n' && i< MAXN);
/*while (isdigit(*s++=getchar()))
i++;*/
*s = '\0';
s -= i;
long n = conversion(s);
printf("\n%lu\n", n);
}
As others have pointed out, there isn't much use for s because a can be passed to conversion. And, again, the malloc for s only allocates a single byte.
You're incrementing i before doing the loop tests, so you have to use i-1 there. Also, the loop ends with i being one too large.
Even for your original code, doing int chr = getchar(); a[i] = chr; and replacing a[i-1] with chr can simplify things a bit.
Better yet, by restructuring to use a for instead of a do/while loop, we can add some more commenting for each escape condition rather than a larger single condition expression.
#include <ctype.h>
#include <stdio.h>
#define MAXN 1000
int
main(void)
{
char a[MAXN + 1];
int i;
for (i = 0; i < MAXN; ++i) {
// get the next character
int chr = getchar();
// stop on EOF
if (chr == EOF)
break;
// stop on newline
if (chr == '\n')
break;
// stop on non-digit
if (! isdigit(chr))
break;
// add digit to the output array
a[i] = chr;
}
// add EOS terminator to string
a[i] = 0;
unsigned long n = conversion(a);
printf("\n%lu\n",n);
return 0;
}
Code does not allocate enough memory with malloc(sizeof(char)) as that is only 1 byte.
When code tries to save a 2nd char into s, bad things can happen: undefined behavior (UB).
In any case, the allocation is not needed.
Instead form a reasonable fixed sized buffer and store characters/digits there.
#include <ctype.h>
#include <limits.h>
#include <stdio.h>
// The max digits in a `long` is about log10(LONG_MAX) + a few
// The number of [bits in a `long`]/3 is a bit more than log10(LONG_MAX)
#define LONG_DEC_SZ (CHAR_BIT*sizeof(long)/3 + 3)
int main(void) {
char a[LONG_DEC_SZ * 2]; // lets go for 2x to allow some leading zeros
int i = 0;
int ch; // `getchar()` typically returns 257 different values, use `int`
// As long as there is room and code is reading digits ...
// Stop one short of the array size so the terminator always fits
while (i < (int)sizeof a - 1 && isdigit((ch = getchar())) ) {
a[i++] = ch;
}
a[i] = '\0';
long n = conversion(a);
printf("\n%ld\n", n);
}
To Do: This code does not allow a leading sign character like '-' or '+'
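One way to accept a sign, assuming conversion() simply parses decimal text, is to let the standard strtol() do the work, since it already handles an optional leading '+' or '-'. The wrapper name parse_long below is hypothetical and is not the conversion() from the question.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a decimal long with an optional sign.
 * Returns 1 on success and stores the value in *out; returns 0 if no
 * digits were found or the value does not fit in a long.
 * parse_long is a hypothetical wrapper around strtol(). */
int parse_long(const char *s, long *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(s, &end, 10);
    if (end == s)            /* no conversion was performed */
        return 0;
    if (errno == ERANGE)     /* overflow or underflow */
        return 0;
    *out = value;
    return 1;
}
```

Note that strtol() also skips leading white space and stops at the first character that is not part of the number, so trailing text such as a newline is simply left unparsed.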

C, counting the number of blankspaces

I'm writing a function that replaces blank spaces with '-' (<- this character).
I ultimately want to return how many changes I made.
#include <stdio.h>
int replace(char c[])
{
int i, cnt;
cnt = 0;
for (i = 0; c[i] != EOF; i++)
if (c[i]==' ' || c[i] == '\t' || c[i] == '\n')
{
c[i] = '-';
++cnt;
}
return cnt;
}
main()
{
char cat[] = "The cat sat";
int n = replace(cat);
printf("%d\n", n);
}
The problem is, it correctly changes the string into "The-cat-sat" but for n, it returns the value 3, when it's supposed to return 2.
What have I done wrong?
@4386427 suggested this should be another answer. @wildplasser already provided the solution; this answer explains EOF and '\0'.
You would use EOF only when reading from a file or stream (EOF -> End Of File). EOF denotes the end of the input, and its value is implementation-defined (a negative int). In fact, EOF is a condition rather than a character. A C string stored in a char array or reached through a char pointer, on the other hand, is always terminated by exactly one '\0' character, so that is what you test for to break out of the loop when iterating through it. This is a sure way to avoid accessing memory beyond the string.
#include <stdio.h>
int replc(void);
int main(void){
    int nc = replc();
    printf("replaced: %d times\n", nc);
    return 0;
}
/* copy stdin to stdout, replacing each space with '-';
   return the number of replacements */
int replc(void){
    int c, nc = 0;
    while ((c = getchar()) != EOF) {
        if (c == ' '){
            putchar('-');
            ++nc;
        } else putchar(c);
    }
    return nc;
}
A string ends with a 0 (zero) value, not an EOF (so the program in the question will scan the string beyond the terminating \0 until it happens to find a -1 somewhere beyond; but you are already in UB land there)
[stylistic] the function argument could be a character pointer (an array argument cannot exist in C)
[stylistic] a pointer version won't need the 'i' variable.
[stylistic] The count can never be negative: intuitively an unsigned counter is preferred. (it could even be a size_t, just like the other string functions)
[stylistic] a switch(){} can avoid the (IMO) ugly || list, it is also easier to add cases.
unsigned replace(char *cp){
unsigned cnt;
for(cnt = 0; *cp ; cp++) {
switch (*cp){
case ' ' : case '\t': case '\n':
*cp = '-';
cnt++;
default:
break;
}
}
return cnt;
}
Using EOF in the for loop's end condition is the problem, as you are not using it to check for the end of a file or stream.
for (i = 0; c[i] != EOF; i++)
EOF itself is not a character, but a signal that there are no more characters available in the stream.
To check for the end of the string, use
for (i = 0; c[i] != '\0'; i++)
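Applied to the function from the question, a loop that stops at the string terminator might look like the following sketch. replace_blanks is just the question's replace() under a different name, chosen here for illustration.

```c
#include <stddef.h>

/* The loop from the question, stopping at the string terminator '\0'
 * instead of EOF. replace_blanks is a renamed copy for illustration. */
int replace_blanks(char c[])
{
    int cnt = 0;

    for (size_t i = 0; c[i] != '\0'; i++) {
        if (c[i] == ' ' || c[i] == '\t' || c[i] == '\n') {
            c[i] = '-';      /* overwrite the blank in place */
            ++cnt;
        }
    }
    return cnt;
}
```

With the string "The cat sat" this replaces the two spaces and returns 2, because the scan no longer runs past the end of the array.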

Converting words from camelCase to snake_case in C

What I am trying to code is, if I input camelcase, it should just print out camelcase, but if there contains any uppercase, for example, if I input camelCase, it should print out camel_case.
The below is the one I am working on but the problem is, if I input, camelCase, it prints out camel_ase.
Can someone please tell me the reason and how to fix it?
#include <stdio.h>
#include <ctype.h>
int main() {
char ch;
char input[100];
int i = 0;
while ((ch = getchar()) != EOF) {
input[i] = ch;
if (isupper(input[i])) {
input[i] = '_';
//input[i+1] = tolower(ch);
} else {
input[i] = ch;
}
printf("%c", input[i]);
i++;
}
}
First look at your code and think about what happens when someone enters a word longer than 100 characters -> undefined behavior. If you use a buffer for input, you always have to add checks so you don't overflow this buffer.
But then, as you directly print the characters, why do you need a buffer at all? It's completely unnecessary with the approach you show. Try this:
#include <stdio.h>
#include <ctype.h>
int main()
{
int ch;
int firstChar = 1; // needed to also accept PascalCase
while((ch = getchar())!= EOF)
{
if(isupper(ch))
{
if (!firstChar) putchar('_');
putchar(tolower(ch));
} else
{
putchar(ch);
}
firstChar = 0;
}
}
Side note: I changed the type of ch to int. This is because getchar() returns an int, and putchar(), isupper() and tolower() take an int whose value must be that of an unsigned char, or EOF. As char is allowed to be signed, on a platform with signed char, you would get undefined behavior calling isupper() or tolower() with a negative char. I know, this is a bit complicated. Another way around this issue is to always cast your char to unsigned char when calling a function that takes the value of an unsigned char as an int.
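The cast mentioned in the side note can be sketched like this, for the case where the characters are already stored as plain char. count_upper is a hypothetical helper, not part of the program above.

```c
#include <ctype.h>
#include <stddef.h>

/* Count uppercase letters in a string stored as plain char.
 * count_upper is a hypothetical helper. */
size_t count_upper(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        /* The cast keeps the argument in the range of unsigned char,
         * even where char is signed and *s is negative. */
        if (isupper((unsigned char)*s))
            n++;
    }
    return n;
}
```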
Since you use a buffer that is useless right now, you might be interested in a solution that makes good use of one: read and write a whole line at a time. This is slightly more efficient than calling a function for every single character. Here's an example doing that:
#include <stdio.h>
static size_t toSnakeCase(char *out, size_t outSize, const char *in)
{
const char *inp = in;
size_t n = 0;
while (n < outSize - 1 && *inp)
{
if (*inp >= 'A' && *inp <= 'Z')
{
if (n > outSize - 3)
{
out[n++] = 0;
return n;
}
out[n++] = '_';
out[n++] = *inp + ('a' - 'A');
}
else
{
out[n++] = *inp;
}
++inp;
}
out[n++] = 0;
return n;
}
int main(void)
{
char inbuf[512];
char outbuf[1024]; // twice the length of the input is an upper bound
while (fgets(inbuf, 512, stdin))
{
toSnakeCase(outbuf, 1024, inbuf);
fputs(outbuf, stdout);
}
return 0;
}
This version also avoids isupper() and tolower(), but sacrifices portability. It only works if the character encoding has the letters in sequence and has the uppercase letters before the lowercase letters. For ASCII, these assumptions hold. Be aware that what is considered an (uppercase) letter could also depend on the locale. The program above only works for the letters A-Z as used in the English language.
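If portability matters more than avoiding the library calls, a variant built on isupper() and tolower() might look like the sketch below. The name to_snake_case and the decision to emit no underscore before a leading uppercase letter are choices made for this illustration only.

```c
#include <ctype.h>
#include <stddef.h>

/* Convert camelCase or PascalCase to snake_case using <ctype.h>.
 * Writes at most out_size - 1 characters plus a terminator and returns
 * the number of characters written, not counting the terminator.
 * A leading uppercase letter gets no underscore (a choice made here). */
size_t to_snake_case(char *out, size_t out_size, const char *in)
{
    size_t n = 0;

    if (out_size == 0)
        return 0;
    for (size_t i = 0; in[i] != '\0'; i++) {
        unsigned char ch = (unsigned char)in[i];

        if (isupper(ch)) {
            size_t need = (i > 0) ? 2 : 1;   /* '_' plus letter, or letter only */
            if (n + need >= out_size)
                break;                        /* not enough room left */
            if (i > 0)
                out[n++] = '_';
            out[n++] = (char)tolower(ch);
        } else {
            if (n + 1 >= out_size)
                break;
            out[n++] = in[i];
        }
    }
    out[n] = '\0';
    return n;
}
```

When the output buffer is too small, the sketch stops before a letter it cannot fully convert, so the result never ends in a dangling underscore.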
I don't know exactly how to code in C but I think you should do something like this.
if(isupper(input[i]))
{
input[i] = tolower(ch);
printf("_");
} else
{
input[i] = ch;
}
There are two problems in your code:
You insert one character in each branch of if, while one of them is supposed to insert two characters, and
You print characters as you go, but the first branch is supposed to print both _ and ch.
You can fix this by incrementing i on insertion with i++, and by printing the entire word at the end:
int ch; // <<== Has to be int, not char
char input[100];
int i = 0;
// One iteration can store two characters, and the terminator needs a slot too
while((i < sizeof(input)-2) && (ch = getchar())!= EOF) {
if(isupper(ch)) {
if (i != 0) {
input[i++] = '_';
}
ch = tolower(ch);
}
input[i++] = ch;
}
input[i] = '\0'; // Null-terminate the string
printf("%s\n", input);
There are multiple problems in your code:
ch is defined as a char: you cannot properly test for end of file if ch is not defined as an int. getchar() can return all values of type unsigned char plus the special value EOF, which is negative. Define ch as int.
You store the byte into the array input and use isupper(input[i]). isupper() is only defined for values returned by getchar(), not for potentially negative values of the char type if this type is signed on the target system. Use isupper(ch) or isupper((unsigned char)input[i]).
You do not check if i is small enough before storing bytes to input[i], causing a potential buffer overflow. Note that it is not necessary to store the characters into an array for your problem.
You should insert the '_' in the array and the character converted to lowercase. This is your principal problem.
Whether you want Main to be converted to _main, main or left as Main is a question of specification.
Here is a simpler version:
#include <ctype.h>
#include <stdio.h>
int main(void) {
int c;
while ((c = getchar()) != EOF) {
if (isupper(c)) {
putchar('_');
putchar(tolower(c));
} else {
putchar(c);
}
}
return 0;
}
To output the entered characters in the form you showed, there is no need to use an array. The program can look like this:
#include <stdio.h>
#include <ctype.h>
int main( void )
{
int c;
while ((c = getchar()) != EOF && c != '\n')
{
if (isupper(c))
{
putchar('_');
c = tolower(c);
}
putchar(c);
}
putchar('\n');
return 0;
}
If you want to use a character array and have it contain a string, you should reserve one of its elements for the terminating zero.
In this case the program can look like this:
#include <stdio.h>
#include <ctype.h>
int main( void )
{
char input[100];
const size_t N = sizeof(input) / sizeof(*input);
int c;
size_t i = 0;
while ( i + 1 < N && (c = getchar()) != EOF && c != '\n')
{
if (isupper(c))
{
input[i++] = '_';
c = tolower(c);
}
if ( i + 1 != N ) input[i++] = c;
}
input[i] = '\0';
puts(input);
return 0;
}

Is there a way to see what characters are in the types in ctype.h?

I am writing a C program that involves going through a .txt file and finding all the printable characters (or possibly graphical characters) that are not used in the file. I know that the header file ctype.h defines several character classes (e.g. digits, lowercase letters, uppercase letters, etc.) and provides functions to check whether or not a given character belongs to each of the classes, but I'm not sure whether it's possible to do the inverse (i.e. checking all the characters in a class for something). I need something that lists or defines all of the characters in each type, ideally an array or enumerated type.
Dunno if this is helpful, but I wrote a program to classify characters based on those found in a given file. It wouldn't be hard to fix it to go over the characters (bytes) in the range 0..255 unconditionally.
#include <stdio.h>
#include <ctype.h>
#include <limits.h>
static void classifier(FILE *fp, char *fn)
{
int c;
int map[UCHAR_MAX + 1];
size_t i;
printf("%s:\n", fn);
for (i = 0; i < UCHAR_MAX + 1; i++)
map[i] = 0;
printf("Code Char Space Upper Lower Alpha AlNum Digit XDig Graph Punct Print Cntrl\n");
while ((c = getc(fp)) != EOF)
{
map[c] = 1;
}
for (c = 0; c < UCHAR_MAX + 1; c++)
{
if (map[c] == 1)
{
int sp = isspace(c) ? 'X' : ' ';
int up = isupper(c) ? 'X' : ' ';
int lo = islower(c) ? 'X' : ' ';
int al = isalpha(c) ? 'X' : ' ';
int an = isalnum(c) ? 'X' : ' ';
int dg = isdigit(c) ? 'X' : ' ';
int xd = isxdigit(c) ? 'X' : ' ';
int gr = isgraph(c) ? 'X' : ' ';
int pu = ispunct(c) ? 'X' : ' ';
int pr = isprint(c) ? 'X' : ' ';
int ct = iscntrl(c) ? 'X' : ' ';
int ch = (pr == 'X') ? c : ' ';
printf("0x%02X %-4c %-6c%-6c%-6c%-6c%-6c%-6c%-6c%-6c%-6c%-6c%-6c\n",
c, ch, sp, up, lo, al, an, dg, xd, gr, pu, pr, ct);
}
}
}
The extra trick that my code pulled was using setlocale() to work in the current locale rather than the C locale:
#include <locale.h>
int main(int argc, char **argv)
{
setlocale(LC_ALL, "");
filter(argc, argv, 1, classifier);
return(0);
}
The filter() function processes the arguments from argv[1] (usually optind is passed instead of 1, but there is no conditional argument processing in this code) to argv[argc-1], reading the files (or reading standard input if there are no named files). It calls classifier() for each file it opens — and handles the opening, closing, etc.
There is no fixed character list in ctype.h that could help you. In fact isprint() depends on the locale.
Assuming that you're speaking of char and not wide chars, one way to solve your issue would be to initialize a table of 256 elements, each representing a char:
char mychars[256];
memset(mychars, 0, 256);
then open your file and read all the chars, and flag those that are present:
...
int c;
while ( (c=fgetc(fp)) != EOF) {
mychars[c] |= 1;
}
then later you can just iterate through the printable ones:
for (int i=0; i<256; i++) {
if (isprint(i) && !mychars[i])
printf ("%c not found\n", (char)i);
}
You can iterate through all values of the unsigned char type, from 0 to UCHAR_MAX and check every function from <ctype.h> to determine what the classes are.
For example, you can list all digits with this:
printf("digits: ");
for (int c = 0; c <= UCHAR_MAX; c++) {
if (isdigit(c))
putchar(c);
}
printf("\n");
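Since the question asks for an array of the characters in each class, the same loop can fill one. This is a sketch only; collect_class is a hypothetical helper that accepts any of the <ctype.h> classification functions as its predicate.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* Fill out[] with every unsigned char value accepted by pred (one of
 * the <ctype.h> is...() functions) and return how many were stored.
 * out must have room for UCHAR_MAX + 1 entries.
 * collect_class is a hypothetical helper. */
size_t collect_class(int (*pred)(int), unsigned char *out)
{
    size_t n = 0;

    for (int c = 0; c <= UCHAR_MAX; c++) {
        if (pred(c))
            out[n++] = (unsigned char)c;
    }
    return n;
}
```

A call such as collect_class(isdigit, buf) leaves the digits in buf. For locale-dependent classes such as isprint() or isalpha(), the contents depend on the locale in effect when the function is called.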
My suggestion:
Create an array of unsigned longs with 256 elements that can hold the number of times each char occurs in the file.
Read the contents of the file character by character and update the data in the array.
After processing all the characters of the file, walk through the elements of the array and print the necessary information.
int main()
{
unsigned long charOccurrences[256] = {0};
// open the file.
FILE* fin = fopen(....);
int c;
while ( (c = fgetc(fin)) != EOF )
{
// Increment the number of occurrences.
charOccurrences[c]++;
}
// Process the data.
for (int i = 0; i < 256; ++i )
{
if ( isprint(i) && charOccurrences[i] == 0 )
{
printf("%c was not found in the file.\n", i);
}
}
// Close the file
fclose(fin);
}
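The same bookkeeping can be separated from the file handling so that it returns the unused characters instead of printing them. The sketch below assumes the file contents are already in memory; unused_printable is a hypothetical name.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* Store in out[] every printable character that does not occur in
 * data[0..len-1] and return how many were stored.
 * out must have room for UCHAR_MAX + 1 entries.
 * unused_printable is a hypothetical helper. */
size_t unused_printable(const unsigned char *data, size_t len,
                        unsigned char *out)
{
    unsigned char seen[UCHAR_MAX + 1] = {0};
    size_t n = 0;

    for (size_t i = 0; i < len; i++)
        seen[data[i]] = 1;                 /* mark each byte that occurs */
    for (int c = 0; c <= UCHAR_MAX; c++) {
        if (isprint(c) && !seen[c])
            out[n++] = (unsigned char)c;   /* printable but never seen */
    }
    return n;
}
```

Swapping isprint() for isgraph() would restrict the result to graphical characters, which excludes the space.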

Resources