Character frequency histogram in C

I read this program, but I'm not able to understand it. Please explain what exactly is happening in the length[] array. How can it be used to store different types of characters, i.e. both digits and letters? Following is the code:
#include <stdio.h>
#define EOL '\n'
#define ARYLEN 256
main()
{
int c, i, x;
int length[ARYLEN];
for(x = 0; x < ARYLEN;x++)
length[x] = 0;
while( (c = getchar() ) != EOL)
{
length[c]++;
if (c == EOL)
break;
}
for(x = 0; x < ARYLEN; x++)
{
if( length[x] > 0){
printf("%c | ", x);
for(i = 1; i <= length[x]; ++i){
printf("*");
}
printf("\n");
}
}
}

The array doesn't store any characters (at least conceptually). It stores the number of times the program has encountered a character with the numerical value c in the array position of index c.
Basically, in the C programming language, a char is a data type that consists of at least 8 bits (exactly 8 on nearly all current systems). With 8 bits it can hold values in the range 0 to 255 for an unsigned char or -128 to 127 for a signed char.
The program then defines an array large enough to hold as many different values as it is possible to represent using a char, one array position for each unique value.
Then it counts the number of occurrences using the appropriate array position, length[c], as a counter for that specific value. As it loops over the array to print out the data, it can tell which character the data belongs to just by looking at the current index inside the loop, so the x in printf("%c | ", x); is the character while length[x] is the data we're after.

In your code the integer array length[] is not used to store characters. It is only used to store the count of each character being typed. The characters are read one by one into the variable c by while( (c = getchar() ) != EOL).
But the tricky part is length[c]++;. The count of each character is kept in length[] at the index equal to that character's code (its ASCII value on most systems).
For example, on a system using ASCII codes, length[65] contains the count of A, because 65 is the ASCII code for A.
length[66] contains the count of B, because 66 is the ASCII code for B.
length[97] contains the count of a, because 97 is the ASCII code for a.
length[48] contains the count of 0, because 48 is the ASCII code for 0.
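The indexing idea described in these answers can be sketched as a small helper. This is only an illustration: the name count_chars is made up here, and it counts the characters of a string instead of reading from getchar() so that it is easy to try out. The cast to unsigned char keeps the index non-negative even where plain char is signed.

```c
#include <limits.h>
#include <string.h>

/* Hypothetical helper: counts[v] ends up holding how many times the
 * character with numeric value v occurs in text. No characters are
 * stored, only counters indexed by character code. */
void count_chars(const char *text, int counts[UCHAR_MAX + 1])
{
    /* Start every counter at zero. */
    memset(counts, 0, (UCHAR_MAX + 1) * sizeof counts[0]);

    for (; *text != '\0'; ++text) {
        /* Convert to unsigned char so the index is never negative. */
        counts[(unsigned char)*text]++;
    }
}
```

After count_chars("ABA0", counts), counts['A'] is 2, counts['B'] is 1 and counts['0'] is 1, whatever numeric codes the system assigns to those characters.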

Related

Read text letter by letter without strings

What would be the best way to go about reading text from a user and then counting the letters in it one by one?
For example, the user enters
Hello World
The program would record in an array that
{0,0,0,1,1,0,0,1,0,0,0,3,0,0,2,0,0,1,0,0,0,0,1,0,0,0}
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
As the title says NO STRINGS!
In my attempts, I am trying to use the ascii table for a more efficient method instead of comparing each of the user inputs to every letter of the alphabet.
EDIT: How do I run a loop over all input characters without using a string?
You don't have to compare each user input to every letter of the alphabet.
All you need to do is create an array of size 26 for the 26 English letters (assuming you only use upper case characters). Initialize all the array elements to 0. Run a loop over all the input characters and subtract 65 (the ASCII value of 'A') from the ASCII value of each character; this gives you the location of that character in the array. Increment the value at that location by 1.
You can have array of 26 integers and increment the respective ASCII index.
Example:
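/* fragment: goes inside main() and needs <stdio.h> */
int counter[26] = {0};  /* the counts must start at zero */
char buffer[256];
if (fgets(buffer, sizeof buffer, stdin) != NULL)
{
    for (size_t i = 0; buffer[i] != '\0'; i++)
    {
        if (buffer[i] >= 'A' && buffer[i] <= 'Z')
            counter[buffer[i] - 'A']++;
        else if (buffer[i] >= 'a' && buffer[i] <= 'z')
            counter[buffer[i] - 'a']++;
    }
}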
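Since the question rules out strings, the same subtraction trick can be applied to one character at a time, with no buffer at all. The sketch below is an assumption-laden illustration: tally_letter is a made-up name, and like the answers above it assumes the letters of each case have consecutive codes (true for ASCII, not guaranteed by C).

```c
/* Hypothetical helper: record one input character in a 26-entry table.
 * Assumes 'A'..'Z' and 'a'..'z' are each contiguous, as in ASCII.
 * Non-letters are ignored. */
void tally_letter(int ch, int counts[26])
{
    if (ch >= 'A' && ch <= 'Z')
        counts[ch - 'A']++;       /* 'A' -> 0, 'B' -> 1, ... */
    else if (ch >= 'a' && ch <= 'z')
        counts[ch - 'a']++;       /* fold lowercase onto the same slots */
}
```

A caller would zero the table once and then feed it input directly, for example with a loop of the form while ((ch = getchar()) != EOF) tally_letter(ch, counts);. For the input Hello World this leaves 3 in the slot for L and 2 in the slot for O, matching the array shown in the question.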
The basic idea is to use a frequency table, which stores how often each character appears in the input. This happens in the first part of the below code.
In the second half, the code prints how often each of the interesting characters appears. This part does NOT assume that the letters appear in a single block in the character set. Therefore it also works on EBCDIC computers. It calculates the sum of the uppercase and lowercase frequencies and outputs that.
#include <stdio.h>
int main(void) {
    int freq[256] = {0}; // initializes the whole array to 0; only works with 0
    int ch;
    while ((ch = fgetc(stdin)) != EOF) {
        freq[ch]++;
    }
    const char *upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const char *lower = "abcdefghijklmnopqrstuvwxyz";
    for (int i = 0; upper[i] != '\0'; i++) {
        printf("character %c appears %5d times\n",
               upper[i],
               freq[(unsigned char)upper[i]] + freq[(unsigned char)lower[i]]);
    }
    return 0;
}

Do char's in C have pre-assigned zero indexed values?

Sorry if my title is a little misleading, I am still new to a lot of this but:
I recently worked on a small cipher project where the user can give the program an argument at the command line, but it must be alphabetical (e.g. ./file abc).
This argument will then be used in a formula to encipher a message of plain text you provide. I got the code to work, thanks to my friend's help, but I'm not 100% sure about a specific part of this formula.
#include <stdio.h>
#include <cs50.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <ctype.h>
int main (int argc, string argv[])
{ //Clarify that the argument count is not larger than 2
if (argc != 2)
{
printf("Please Submit a Valid Argument.\n");
return 1;
}
//Store the given argument (our key) inside a string var 'k' and check if it is alpha
string k = (argv[1]);
//Store how long the key is
int kLen = strlen(k);
//Tell the user we are checking their key
printf("Checking key validation...\n");
//Pause the program for 2 seconds
sleep(2);
//Check to make sure the key submitted is alphabetical
for (int h = 0, strlk = strlen(k); h < strlk; h++)
{
if isalpha(k[h])
{
printf("Character %c is valid\n", k[h]);
sleep(1);
}
else
{ //Telling the user the key is invalid and returning them to the console
printf("Key is not alphabetical, please try again!\n");
return 0;
}
}
//Store the users soon to be enciphered text in a string var 'pt'
string pt = get_string("Please enter the text to be enciphered: ");
//A prompt that the encrypted text will display on
printf("Printing encrypted text: ");
sleep(2);
//Encipher Function
for(int i = 0, j = 0, strl = strlen(pt); i < strl; i++)
{
//Get the letter 'key'
int lk = tolower(k[j % kLen]) - 'a';
//If the char is uppercase, run the V formula and increment j by 1
if isupper(pt[i])
{
printf("%c", 'A' + (pt[i] - 'A' + lk) % 26);
j++;
}
//If the char is lowercase, run the V formula and increment j by 1
else if islower(pt[i])
{
printf("%c", 'a' + (pt[i] - 'a' + lk) % 26);
j++;
}
//If the char is a symbol just print said symbol
else
{
printf("%c", pt[i]);
}
}
printf("\n");
printf("Closing Script...\n");
return 0;
}
The Encipher Function:
Uses 'A' as a char for the placeholder but does 'A' hold a zero indexed value automatically? (B = 1, C = 2, ...)
In C, character literals like 'A' are of type int, and represent whatever integer value encodes the character A on your system. On the 99.999...% of systems that use ASCII character encoding, that's the number 65. If you have an old IBM mainframe from the 1970s using EBCDIC, it might be something else. You'll notice that the code is subtracting 'A' to make 0-based values.
This does make the assumption that the letters A-Z occupy 26 consecutive codes. This is true of ASCII (A=65, B=66, etc.), but not of all codes, and not guaranteed by the language.
does 'A' hold a zero indexed value automatically? (B = 1, C = 2, ...)
No. Strictly conforming C code can not depend on any character encoding other than the numerals 0-9 being represented consecutively, even though the common ASCII character set does represent them consecutively.
The only guarantee regarding character sets is per 5.2.1 Character sets, paragraph 3 of the C standard:
... the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous...
Character sets such as EBCDIC don't represent letters consecutively.
char is a numeric type that happens to also often be used to represent visible characters (or special non-visible pseudo-characters). 'A' is a value (with actual type int) that can be converted to a char without overflow or underflow. That is, it's really some number, but you usually don't need to know what number, since you generally use a particular char value either as just a number or as just a character, not both.
But this program is using char values in both ways, so it somewhat does matter what the numeric values corresponding to visible characters are. One way it's very often done, but not always, is using the ASCII values which are numbered 0 to 127, or some other scheme which uses those values plus more values outside that range. So for example, if the computer uses one of those schemes, then 'A'==65, and 'A'+1==66, which is 'B'.
This program is assuming that all the lowercase Latin-alphabet letters have numeric values in consecutive order from 'a' to 'z', and all the uppercase Latin-alphabet letters have numeric values in consecutive order from 'A' to 'Z', without caring exactly what those values are. This is true of ASCII, so it will work on many kinds of machines. But there's no guarantee it will always be true!
C does guarantee the ten digit characters from '0' to '9' are in consecutive order, which means that if n is a digit number from zero to nine inclusive, then n + '0' is the character for displaying that digit, and if c is such a digit character, then c - '0' is the number from zero to nine it represents. But that's the only guarantee the C language makes about the values of characters.
For one counter-example, see EBCDIC, which is not in much use now, but was used on some older computers, and C supports it. Its alphabetic characters are arranged in clumps of consecutive letters, but not with all 26 letters of each case all together. So the program would give incorrect results running on such a computer.
Sequentiality is only one aspect of concern.
Proper use of isalpha(ch) is another, not quite implemented properly in OP's code.
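isalpha(ch) expects a ch in the range of unsigned char or EOF. With k[h], a char, that value could be negative. Ensure a non-negative value with a cast (and wrap the condition in parentheses, since if without them only compiles where isalpha happens to be a macro):
// if isalpha(k[h])
if (isalpha((unsigned char) k[h]))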
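The contiguity assumption discussed in these answers can be avoided by looking each letter up in an explicit alphabet string instead of subtracting 'A' or 'a'. The following is a minimal sketch, not the original program's method: shift_letter and letter_index are hypothetical names, and the shift is assumed to be the key letter's position (0 to 25), although the code also normalizes other values.

```c
#include <string.h>

static const char UPPER[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
static const char LOWER[] = "abcdefghijklmnopqrstuvwxyz";

/* Return the 0-based position of ch in alphabet, or -1 if it is absent.
 * The '\0' check matters because strchr would otherwise match the
 * string terminator. */
static int letter_index(char ch, const char *alphabet)
{
    const char *p = (ch != '\0') ? strchr(alphabet, ch) : NULL;
    return p ? (int)(p - alphabet) : -1;
}

/* Shift a letter by `shift` places within its own case, wrapping around.
 * Characters that are not in either alphabet are returned unchanged. */
char shift_letter(char ch, int shift)
{
    int idx;
    shift = ((shift % 26) + 26) % 26;   /* normalize to 0..25 */
    if ((idx = letter_index(ch, UPPER)) >= 0)
        return UPPER[(idx + shift) % 26];
    if ((idx = letter_index(ch, LOWER)) >= 0)
        return LOWER[(idx + shift) % 26];
    return ch;
}
```

With this, shift_letter('A', 1) gives 'B' and shift_letter('z', 1) wraps to 'a', without relying on the numeric codes of the letters. The lookup is slower than plain subtraction, which is the trade-off for not depending on ASCII.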

why we need "-'0'" to modify array?

This is code from The C Programming Language by Kernighan and Ritchie, section "Arrays":
#include <stdio.h>
/* count digits, white space, others */
main()
{
int c, i, nwhite, nother;
int ndigit[10];
nwhite = nother = 0;
for (i = 0; i < 10; ++i)
ndigit[i] = 0;
while ((c = getchar()) != EOF)
if (c >= '0' && c <= '9')
++ndigit[c-'0'];
else if (c == ' ' || c == '\n' || c == '\t')
++nwhite;
else
++nother;
printf("digits =");
for (i = 0; i < 10; ++i)
printf(" %d", ndigit[i]);
printf(", white space = %d, other = %d\n", nwhite, nother);
}
Why do we need -'0' in this line?
++ndigit[c-'0'];
If I change it to ++ndigit[c], the program doesn't work properly. Why can't we just write ++ndigit[c]?
I already read the explanation of the book, but I don't understand it.
This works only if '0', '1', ..., '9' have consecutive increasing values. Fortunately, this is true for all character sets. By definition, chars are just small integers, so char variables and constants are identical to ints in arithmetic expressions. This is natural and convenient; for example c-'0' is an integer expression with a value between 0 and 9 corresponding to the character '0' to '9' stored in c, and thus a valid subscript for the array ndigit
To understand why we need -'0', you first need to understand the ASCII table: http://www.asciitable.com/
Every character in C is represented by a number; in ASCII that is a number between 0 and 127 (up to 255 for extended 8-bit character sets).
For example, if you print the character '0' as its numeric value:
printf( "%d", '0' );
output: 48
You've declared an array of size 10, ndigit[ 10 ], where cell n holds the number of times the digit n was given as input.
So if you receive '0' as input you want to do ndigit[ 0 ]++, which means converting the character to the integer it stands for. You can do that by subtracting '0' (48 in ASCII).
That's why we use the line ++ndigit[c-'0'];
If c = '5', we get
++ndigit['5' - '0']
++ndigit[ 53 - 48 ]
++ndigit[ 5 ]
exactly as we wanted.
c = getchar() stores the character code that was read in c, which differs from the integer that the character stands for.
Quote from N1256 5.2.1 Character sets
... In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.
As this shows, the character codes for decimal digits are consecutive, so you can convert the character code of a decimal digit to the integer that the character stands for by subtracting '0', which is 0's character code, from the character code.
In conclusion, c-'0' yields the integer that the character in c stands for.
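The conversion described in these answers fits in a one-line helper. This is just a sketch with a made-up name, digit_value; it relies only on the guarantee quoted above that '0' to '9' have consecutive codes, so it does not depend on ASCII.

```c
/* Hypothetical helper: map a digit character to the number it stands for.
 * Returns 0..9 for '0'..'9' and -1 for any other character. */
int digit_value(int c)
{
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}
```

For example, digit_value('5') is 5, which is a valid subscript for ndigit, whereas using c itself would index element 53 on an ASCII system, far outside the ten-element array.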

Validation of infinite input char \ number

I need to get a valid number from the user, made of the digits 0-9 without duplicates.
The valid number can have any number of digits, from 1 to 10.
If the user types a space or any other kind of char, then the input is invalid.
My algorithm:
1) Create an array of char of size 10, then initialize all cells to '0'.
2) For every char read from the user, check whether the char is actually between 0-9.
2.1) If true: increment the corresponding cell by 1.
2.2) Else "error".
2.3) If I get to a cell that already has +1, it means this digit already exists, so "error".
Now a few questions about my idea:
1) Is there any better/easier algorithm to do that?
2) The user doesn't type char by char, which means I can get input of unbounded length, so where do I store everything?
The answer to 2) is: you don't store the characters at all, you process them one by one. You only need storage to remember which digits you have already seen. I'd do it like this:
#include <stdio.h>
#include <ctype.h>
int main(void)
{
char seen[10] = { 0 };
int c, loops;
for (loops = 0; (c = getchar()) != EOF && loops < 10; ++loops)
{
if (!isdigit(c)) {
printf ("Not a digit: %c\n", c);
break;
}
c -= '0';
if (seen[c]) {
printf ("Already seen: %d\n", c);
break;
}
seen[c] = 1;
}
return 0;
}
Try to modify this program as an exercise: reduce the storage requirements of the seen[] array. As written it uses one byte per digit. Make the program use only one bit per digit.
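One possible approach to that exercise, offered as a sketch and not as the only solution, is to keep the ten flags in the low bits of a single unsigned integer. The name mark_digit and its return convention are assumptions made for this illustration.

```c
#include <ctype.h>

/* Hypothetical helper: remember digit character c in the bit set *seen.
 * Bit n of *seen is 1 once digit n has been seen.
 * Returns 1 if the digit is new, 0 if it was already seen,
 * and -1 if c is not a digit.
 * c must be a value returned by getchar() (an unsigned char value or EOF). */
int mark_digit(unsigned *seen, int c)
{
    if (!isdigit(c))
        return -1;

    unsigned bit = 1u << (c - '0');   /* '0' -> bit 0, ..., '9' -> bit 9 */
    if (*seen & bit)
        return 0;                     /* duplicate */

    *seen |= bit;                     /* record it */
    return 1;
}
```

Starting from unsigned seen = 0, the first mark_digit(&seen, '3') returns 1 and sets bit 3, and a second call with '3' returns 0. Ten bits are enough for the whole table, so the 10-byte seen[] array shrinks to one integer.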

How does this code generate the map of India?

This code prints the map of India. How does it work?
#include <stdio.h>
main()
{
int a,b,c;
int count = 1;
for (b=c=10;a="- FIGURE?, UMKC,XYZHello Folks,\
TFy!QJu ROo TNn(ROo)SLq SLq ULo+\
UHs UJq TNn*RPn/QPbEWS_JSWQAIJO^\
NBELPeHBFHT}TnALVlBLOFAkHFOuFETp\
HCStHAUFAgcEAelclcn^r^r\\tZvYxXy\
T|S~Pn SPm SOn TNn ULo0ULo#ULo-W\
Hq!WFs XDt!" [b+++21]; )
for(; a-- > 64 ; )
putchar ( ++c=='Z' ? c = c/ 9:33^b&1);
return 0;
}
The long string is simply a binary sequence converted to ASCII. The first for statement makes b start out at 10, and the [b+++21] after the string yields 31. Treating the string as an array, offset 31 is the start of the "real" data in the string (the second line in the code sample you provided). The rest of the code simply loops through the bit sequence, converting the 1's and 0's to !'s and whitespace and printing one character at a time.
Less obfuscated version:
#include <stdio.h>
int main (void) {
int a=10, b=0, c=10;
char* bits ="TFy!QJu ROo TNn(ROo)SLq SLq ULo+UHs UJq TNn*RPn/QPbEWS_JSWQAIJO^NBELPeHBFHT}TnALVlBLOFAkHFOuFETpHCStHAUFAgcEAelclcn^r^r\\tZvYxXyT|S~Pn SPm SOn TNn ULo0ULo#ULo-WHq!WFs XDt!";
a = bits[b];
while (a != 0) {
a = bits[b];
b++;
while (a > 64) {
a--;
if (++c == 'Z') {
c /= 9;
putchar(c);
} else {
putchar(33 ^ (b & 0x01));
}
}
}
return 0;
}
The clever part is in the putchar statements. Take the first putchar. ASCII 'Z' is 90 in decimal, so 90 / 9 = 10, which is a newline character. In the second, decimal 33 is ASCII for '!'. Toggling the low-order bit of 33 gives you 32, which is ASCII for a space. This causes ! to be printed if b is even, and a blank space to be printed if b is odd. The rest of the code is simply there to walk the index b through the string, loading each character into a.
Basically, the string is a run-length encoding of the image: Alternating characters in the string say how many times to draw a space, and how many times to draw an exclamation mark consecutively. Here is an analysis of the different elements of this program:
The encoded string
The first 31 characters of this string are ignored. The rest contain instructions for drawing the image. The individual characters determine how many spaces or exclamation marks to draw consecutively.
Outer for loop
This loop goes over the characters in the string. Each iteration increases the value of b by one, and assigns the next character in the string to a.
Inner for loop
This loop draws individual characters, and a newline whenever it reaches the end of line. The number of characters drawn is a - 64. The value of c goes from 10 to 90, and resets to 10 when the end of line is reached.
The putchar
This can be rewritten as:
++c;
if (c==90) { //'Z' == 90
c = 10; //Note: 10 == '\n'
putchar('\n');
}
else {
if (b % 2 == 0)
putchar('!');
else
putchar(' ');
}
It draws the appropriate character, depending on whether b is even or odd, or a newline when needed.
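The run-length idea described above can be isolated from the obfuscation in a small decoder. This is a simplified sketch under stated assumptions: rle_expand is a made-up name, it writes into a buffer instead of calling putchar, and it leaves out the line wrapping that the original does with c. As in the original, each character contributes (code - 64) output characters, the first run is blanks, and runs alternate between blanks and exclamation marks.

```c
#include <stddef.h>

/* Hypothetical helper: expand a run-length string into out.
 * Each input character encodes a run of (code - 64) characters;
 * codes of 64 or less encode an empty run but still switch symbols.
 * Runs alternate between ' ' and '!', starting with ' '.
 * At most cap - 1 characters are written, followed by '\0'.
 * Returns the number of characters written (excluding the '\0'). */
size_t rle_expand(const char *runs, char *out, size_t cap)
{
    size_t n = 0;
    int draw_mark = 0;               /* 0: blanks, 1: exclamation marks */

    if (cap == 0)
        return 0;

    for (; *runs != '\0'; ++runs) {
        int len = *runs - 64;        /* in ASCII: 'A' -> 1, 'B' -> 2, ... */
        for (int i = 0; i < len && n + 1 < cap; ++i)
            out[n++] = draw_mark ? '!' : ' ';
        draw_mark = !draw_mark;      /* next run uses the other symbol */
    }
    out[n] = '\0';
    return n;
}
```

On an ASCII system, expanding "BC A" yields two blanks followed by four exclamation marks: B is a run of two blanks, C a run of three marks, the space character an empty blank run, and A one more mark.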
