C beginner: Need explanation of error messages from "ideone" - c

I am attempting to write a program that accepts grammatically incorrect text (under 990 characters in length) as input, corrects it, and then returns the corrected text as output. I attempted to run the program using the online compiler, "ideone", but it returned quite a few errors that I don't quite understand. I have posted my code, as well as a picture of the errors below. Can anybody explain to me what exactly the errors mean?
#include "stdio.h"
char capitalize(int i); //prototype for capitalize method
int main(void)
{
char userInput[1200]; //Array of chars to store user input. Initialized to 1200 to negate the possibility of added characters filling up the array.
int i; //Used as a counter for the for loop below.
int j; //Used as a counter for the second for loop within the first for loop below.
int numArrayElements;
printf("Enter your paragraphs: ");
scanf("%c", &userInput); //%c used since chars are expected as input(?)
numArrayElements = sizeof(userInput) / sizeof(userInput[0]); //stores the number of elements in the array into numArrayElements.
if (userInput[0] >= 97 && userInput[0] <= 122) //Checks the char in index 0 to see if its ascii value is equal to that of a lowercase letter. If it is, it is capitalized.
userInput[0] = capitalize(userInput[0]);
//code used to correct input should go here.
for (i = 1; i < numArrayElements; i++) //i is set to 1 here because index 0 is taken care of by the if statement above this loop
{
if (userInput[i] == 32) //checks to see if the char at index i has the ascii value of a space.
if (userInput[i + 1] == 32 && userInput[i - 1] != 46) //checks the char at index i + 1 to see if it has the ascii value of a space, as well as the char at index i - 1 to see if it is any char other than a period. The latter condition is there to prevent a period from being added if one is already present.
{
for (j = numArrayElements - 1; j > (i - 1); j--) //If the three conditions above are satisfied, all characters in the array at location i and onwards are shifted one index to the right. A period is then placed within index i.
userInput[j + 1] = userInput[j];
userInput[i] = 46; //places a period into index i.
numArrayElements++; //increments numArrayElements to reflect the addition of a period to the array.
if (userInput[i + 3] >= 97 && userInput[i + 3] <= 122) //additionally, the char at index i + 3 is examined to see if it is capitalized or not.
userInput[i + 3] = capitalize(userInput[i + 3]);
}
}
printf("%c\n", userInput); //%c used since chars are being displayed as output.
return 0;
}
char capitalize(char c)
{
return (c - 32); //subtracting 32 from a lowercase char should result in it gaining the ascii value of its capitalized form.
}

Your code has several problems, quite typical for a beginner. The answer to the question in your last comment lies in the way scanf() works: it takes everything between whitespace as a token, so it just ends after hey. I commented the code for the rest of the problems I found without being too nitpicky. The commenters below this post might do that if they feel like it.
#include "stdlib.h"
#include "stdio.h"
#include <string.h>
// Check for ASCII (spot-checks only).
// It will not work for encodings that are very close to ASCII but do not earn the
// idiomatic cigar for it but will fail for e.g.: EBCDIC
// (No check for '9' because non-consecutive digits are forbidden by the C-standard)
#if ('0' != 0x30) || ('a' != 0x61) || ('z' != 0x7a) || ('A' != 0x41) || ('Z' != 0x5a)
#error "Non-ASCII input encoding found, please change code below accordingly."
#endif
#define ARRAY_LENGTH 1200
// please put comments on top, not everyone has a 4k monitor
//prototype for capitalize method
char capitalize(char i);
int main(void)
{
//Array of chars to store user input.
// Initialized to 1200 to negate the possibility of
// added characters filling up the array.
// added one for the trailing NUL
char userInput[ARRAY_LENGTH + 1];
// No need to comment counters, some things can be considered obvious
// as are ints called "i", "j", "k" and so on.
int i, j;
int numArrayElements;
// for returns
int res;
printf("Enter your paragraphs: ");
// check returns. Always check returns!
// (there are exceptions if you know what you are doing
// or if failure is unlikely under normal circumstances (e.g.: printf()))
// scanf() will read everything that is not a newline up to 1200 characters
res = scanf("%1200[^\n]", userInput);
if (res != 1) {
fprintf(stderr, "Something went wrong with scanf() \n");
exit(EXIT_FAILURE);
}
// you have a string, so use strlen()
// numArrayElements = sizeof(userInput) / sizeof(userInput[0]);
// the return type of strlen() is size_t, hence the cast
numArrayElements = (int) strlen(userInput);
// Checks the char in index 0 to see if its ascii value is equal
// to that of a lowercase letter. If it is, it is capitalized.
// Do yourself a favor and use curly brackets even if you
// theoretically do not need them. The single exception being "else if"
// constructs where it looks more odd if you *do* place the curly bracket
// between "else" and "if"
// don't use the numerical value here, use the character itself
// Has the advantage that no comment is needed.
// But you still assume ASCII or at least an encoding where the characters
// are encoded in a consecutive, gap-less way
if (userInput[0] >= 'a' && userInput[0] <= 'z') {
userInput[0] = capitalize(userInput[0]);
}
// i is set to 1 here because index 0 is taken care of by the
// if statement above this loop
for (i = 1; i < numArrayElements; i++) {
// checks to see if the char at index i has the ascii value of a space.
if (userInput[i] == ' ') {
// checks the char at index i + 1 to see if it has the ascii
// value of a space, as well as the char at index i - 1 to see
// if it is any char other than a period. The latter condition
// is there to prevent a period from being added if one is already present.
if (userInput[i + 1] == ' ' && userInput[i - 1] != '.') {
// If the three conditions above are satisfied, all characters
// in the array at location i and onwards are shifted one index
// to the right. A period is then placed within index i.
// you need to include the NUL at the end, too
for (j = numArrayElements; j > (i - 1); j--) {
userInput[j + 1] = userInput[j];
}
//places a period into index i.
userInput[i] = '.';
// increments numArrayElements to reflect the addition
// of a period to the array.
// numArrayElements might be out of bounds afterwards, needs to be checked
numArrayElements++;
if (numArrayElements > ARRAY_LENGTH) {
fprintf(stderr, "numArrayElements %d out of bounds\n", numArrayElements);
exit(EXIT_FAILURE);
}
// additionally, the char at index i + 3 is examined to see
// if it is capitalized or not.
// The loop has the upper limit at numArrayElements
// i + 3 might be out of bounds, so check
if (i + 3 > ARRAY_LENGTH) {
fprintf(stderr, "(%d + 3) is out of bounds\n",i);
exit(EXIT_FAILURE);
}
if (userInput[i + 3] >= 'a' && userInput[i + 3] <= 'z') {
userInput[i + 3] = capitalize(userInput[i + 3]);
}
}
}
}
printf("%s\n", userInput);
return 0;
}
char capitalize(char c)
{
// subtracting 32 from a lowercase char should result
// in it gaining the ascii value of its capitalized form.
return (c - ' ');
}

Related

CS50 Week 2 Caesar Practice

My code seems to be working properly except at the point when it should print the final output. The problem is to input a string and output an encrypted version. The encryption works by adding an int, defined as the key, to the ASCII value of each character of the input string. My issue is that when the ciphertext is output there are only spaces and no letters or even numbers.
#include <cs50.h>
#include <stdio.h>
#include <ctype.h>
#include <math.h>
#include <string.h>
#include <stdlib.h>
int main(int argc, string argv[]) {
int key = atoi(argv[1]);
printf("%i\n", key);
if (argc != 2) {
printf("Usage: ./ceasar key\n");
} else {
string text = get_string("Plaintext: ");
for (int i = 0, len = strlen(text); i < len; i++) {
int cipher = text[i];
int ciphertext = cipher + key;
int ciphermod = ciphertext % 26;
printf("%c", ciphermod);
}
printf("\n");
}
}
You've got a few issues going on here. Please make sure to thoroughly read the assignment before turning to others for assistance.
The assignment requires you to:
Only encode alphabetic characters. Look to the function isalpha() for this.
Encode both uppercase and lowercase characters accurately. Note that, in ASCII, uppercase letters and lowercase letters are separate entities.
Meaning, you must have your code be able to handle both, as they are each handled differently.
Perhaps taking some time to sit and take in the ASCII table may be helpful to you, as it will help you understand what is really happening when you add the key.
Use the correct formula for encoding letters. The i'th ciphered letter ci corresponding to the i'th plaintext letter pi is defined as ci = (pi + k) % 26.
Your code is equivalent to this formula, but it does not account for wrapping, uppercase/lowercase letters, etc. The project specification doesn't just ask you to repeat the formula, it asks you to solve a problem using it. To do so, you must understand it. I explain more, subsequently.
I recommend:
Modifying the text in-place. Currently, you calculate the ciphered text and print it. If you add code for modifying the text where it sits, it'll make ignoring non-alphabetic characters easier.
Modify the formula.
Where 𝚨 is the ASCII character code for the beginning of either the uppercase or lowercase characters, the formula might shake out as follows:
ci = (pi - 𝚨 + k) % 26 + 𝚨
What this modified formula does is first take the ASCII code for pi and turn it into a number that represents which letter in the alphabet it is, ignoring case. Then, you can add the key (shift the cipher). Using % 26 on this result then makes sure that the result is between 0 and 25, always a valid letter index (as long as the key is not negative). Finally, we add back 𝚨 so that the character has a case again.
Here's the modified code with the solution broken down, step by step:
// ...
for (int i = 0, n = strlen(text); i < n; i++) {
if (!isalpha(text[i])) continue;
if (isupper(text[i])) {
// the letter's ASCII code on its own.
int charcode = text[i];
// the letter's index in the alphabet. A = 0, B = 1, etc.
// this is no longer a valid ASCII code.
int alphabet_index = charcode - 'A';
// the letter's index in the alphabet, shifted by the key.
// note, this may shift the letter past the end/beginning of the alphabet.
int shifted_alphabet_index = alphabet_index + key;
// the letter's index in the alphabet, shifted by the key, wrapped around.
// the modulo operator (%) returns the remainder of a division.
// in this instance, the result will always be between 0 and 25,
// meaning it will always be a valid index in the alphabet.
int shifted_index_within_alphabet = shifted_alphabet_index % 26;
// this is the final ASCII code of the letter, after it has been shifted.
// we achieve this by adding back the 'A' offset so that the letter is
// within the range of the correct case of letters.
int final_shifted_charcode = shifted_index_within_alphabet + 'A';
text[i] = final_shifted_charcode;
}
else { // islower
int charcode = text[i];
int alphabet_index = charcode - 'a';
int shifted_alphabet_index = alphabet_index + key;
int shifted_index_within_alphabet = shifted_alphabet_index % 26;
int final_shifted_charcode = shifted_index_within_alphabet + 'a';
text[i] = final_shifted_charcode;
}
}
printf("ciphertext: %s\n", text);
// ...
And here is the solution, simplified down:
// ...
for (int i = 0, n = strlen(text); i < n; i++) {
if (!isalpha(text[i])) // if not alphabetic, skip
continue; //
if (isupper(text[i])) // if uppercase
text[i] = (text[i] - 'A' + key) % 26 + 'A'; //
else // if lowercase
text[i] = (text[i] - 'a' + key) % 26 + 'a'; //
}
printf("ciphertext: %s\n", text);
// ...
Just as a side note, the statement if (!isalpha(text[i])) is acting like something called a guard clause. This is a useful concept to know. Using guard clauses allows you to have simpler, more readable code. Imagine if I had nested all of the code inside the for loop under the if (isalpha(text[i])) condition. It would be harder to read and understand, and difficult to match up the different bracket pairs.
Edit: I would also echo what chqrlie said. Do not use argv[n] until you have verified that argc >= (n + 1)
The formula to compute the ciphered characters is incorrect:
you should only encode letters
you should subtract the code for the first letter 'a' or 'A'
you should add the code for the first letter 'a' or 'A' to the encoded index.
Note also that you should not use argv[1] until you have checked that enough arguments have been passed.
Here is a modified version:
#include <cs50.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, string argv[]) {
if (argc != 2) {
printf("Usage: ./ceasar key\n");
} else {
int key = atoi(argv[1]);
printf("%i\n", key);
string text = get_string("Plaintext: ");
for (int i = 0, len = strlen(text); i < len; i++) {
int c = text[i];
if (c >= 'a' && c <= 'z') {
int cipher = c - 'a';
int ciphertext = cipher + key;
int ciphermod = ciphertext % 26;
c = 'a' + ciphermod;
} else
if (c >= 'A' && c <= 'Z') {
int cipher = c - 'A';
int ciphertext = cipher + key;
int ciphermod = ciphertext % 26;
c = 'A' + ciphermod;
}
printf("%c", c);
}
printf("\n");
}
return 0;
}

How do you count the frequency of which a word of n length occurs within a string

I have this code here that correctly formats the hard-coded sentence and finds the frequency of which a certain letter shows up in that string:
#include <stdio.h>
#include <string.h>
int main() {
char words[1000][100];
int x = 0, y;
char myString[10000] = "The quick Brown ? Fox ? jumps over the Lazy Dog and the !##! LAZY DOG is still sleeping";
printf("Original Text:\n");
printf("%s\n", myString);
// Function for uppercase letters to become lowercase and to remove special characters
for (x = 0; x <= strlen(myString); ++x) {
if (myString[x] >= 65 && myString[x] <= 90)
myString[x] = myString[x] + 32;
}
for (x = 0; myString[x] != '\0'; ++x) {
while (!(myString[x] >= 'a' && myString[x] <= 'z') &&
!(myString[x] >= 'A' && myString[x] <= 'Z') &&
!(myString[x] >= '0' && myString[x] <= '9') &&
!(myString[x] == '\0') && !(myString[x] == ' ')) {
for (y = x; myString[y] != '\0'; ++y) {
myString[y] = myString[y + 1];
}
myString[y] = '\0';
}
}
printf("\nModified Text: \n%s\n", myString);
// Part A
int counts[26] = { 0 };
int k;
size_t myString_length = strlen(myString);
for (k = 0; k < myString_length; k++) {
char c = myString[k];
if (!isalpha(c))
continue;
counts[(int)(c - 'a')]++;
}
printf("\nLetter\tCount\n------ -----\n");
for (k = 0; k < 26; ++k) {
printf("%c\t%d\n", k + 'a', counts[k]);
}
// Part B
int i = 0, count = 0, occurrences[10000] = { 0 };
while (myString[i] != '\0') {
char wordArray[100];
int j = 0;
while (myString[i] != ' ' && myString[i] != '\0') {
wordArray[j++] = myString[i++];
}
if (wordArray[j - 1] == ',' || wordArray[j - 1] == '.') {
wordArray[j - 1] = '\0';
}
wordArray[j] = '\0';
int status = -1;
for (j = 0; j < count; ++j) {
if (strcmp(words[j], wordArray) == 0) {
status = j;
break;
}
}
if (status != -1) {
occurrences[status] += 1;
} else {
occurrences[count] += 1;
strcpy(words[count++], wordArray);
}
++i;
}
printf("\nWord Length\tOccurrences\n----------- -----------\n");
for (i = 0; i < count; ++i) {
// print each word and its occurrences
printf("%s\t\t%d\n", words[i], occurrences[i]);
}
}
Part B is where I'm having a problem though, I want the code to be able to tell me the occurrence of which a word of a specific length shows up, such as this instance:
Word length Occurrences
1 0
2 1
Here, there are no instances where there is a word with one character, but there is one instance where there is a word with two characters. However, my code is outputting the number of times a specific word is given and not what I want above, like this:
Word Length Occurrences
----------- -----------
the 3
quick 1
brown 1
3
fox 1
jumps 1
over 1
lazy 2
dog 2
and 1
is 1
still 1
sleeping 1
How would I go about changing it so that it shows the output I want with just the word length and frequency?
Here are some remarks about your code:
the first loop recomputes the length of the string for each iteration: for (x = 0; x <= strlen(myString); ++x). Since you modify the string inside the loop, it is difficult for the compiler to ascertain that the string length does not change, so a classic optimisation may not work. Use the same test as for the next loop:
for (x = 0; myString[x] != '\0'; ++x)
the test for uppercase is not very readable because you hardcode the ASCII values of the letters A and Z, you should either write:
if (myString[x] >= 'A' && myString[x] <= 'Z')
myString[x] += 'a' - 'A';
or use macros from <ctype.h>:
unsigned char c = myString[x];
if (isupper(c))
myString[x] = tolower(c);
or equivalently and possibly more efficiently:
myString[x] = tolower((unsigned char)myString[x]);
in the second loop, you remove characters that are neither letters, digits nor spaces. You have a redundant nested while loop and a third nested loop to shift the rest of the array for each byte removed: since every removal shifts the whole tail of the string, this method has quadratic time complexity, O(N^2), which is very inefficient. You should instead use a two finger method that operates in linear time:
for (x = y = 0; myString[x] != '\0'; ++x) {
unsigned char c = myString[x];
if (isalnum(c) || c == ' ') {
myString[y++] = c;
}
}
myString[y] = '\0';
note that this loop removes all punctuation instead of replacing it with spaces: this might glue words together such as "a fine,good man" -> "a finegood man"
In the third loop, you use a char value c as an argument for isalpha(c). You should include <ctype.h> to use any function declared in this header file. Functions and macros from <ctype.h> are only defined for all values of the type unsigned char and the special negative value EOF. If type char is signed on your platform, isalpha(c) would have undefined behavior if the string has negative characters. In your particular case, you filtered characters that are not ASCII letters, digits or space, so this should not be a problem, yet it is a good habit to always use unsigned char for the character argument to isalpha() and equivalent functions.
Note also that this counting phase could have been combined into the previous loops.
to count the occurrences of words, the array occurrences should have the same number of elements as the words array, 1000. You do not check for boundaries so you have undefined behavior if there are more than 1000 different words and/or if any of these words has 100 characters or more.
in the next loop, you extract words from the string, incrementing i inside the nested loop body. You also increment i at the end of the outer loop, hence skipping the final null terminator. The test while (myString[i] != '\0') will test bytes beyond the end of the string, which is incorrect and potential undefined behavior.
to avoid counting empty words in this loop, you should skip sequences of spaces before copying the word if not at the end of the string.
According to the question, counting individual words is not what Part B is expected to do, you should instead count the frequency of word lengths. You can do this in the first loop by keeping track of the length of the current word and incrementing the array of word length frequencies when you find a separator.
Note that modifying the string is not necessary to count letter frequencies or word length occurrences.
Writing a separate function for each task is recommended.
Here is a modified version:
#include <ctype.h>
#include <stdio.h>
#define MAX_LENGTH 100
// Function to lowercase letters and remove special characters
void clean_string(char *str) {
int x, y;
printf("Original Text:\n");
printf("%s\n", str);
for (x = y = 0; str[x] != '\0'; x++) {
unsigned char c = str[x];
c = tolower(c);
if (isalnum(c) || c == ' ') {
str[y++] = c;
}
}
str[y] = '\0';
printf("\nModified Text:\n%s\n", str);
}
// Part A: count letter frequencies
void count_letters(const char *str) {
int letter_count['z' - 'a' + 1] = { 0 };
for (int i = 0; str[i] != '\0'; i++) {
unsigned char c = str[i];
if (c >= 'a' && c <= 'z') {
letter_count[c - 'a'] += 1;
} else
if (c >= 'A' && c <= 'Z') {
letter_count[c - 'A'] += 1;
}
}
printf("\nLetter\tCount"
"\n------\t-----\n");
for (int c = 'a'; c <= 'z'; c++) {
printf("%c\t%d\n", c, letter_count[c - 'a']);
}
}
// Part B: count word lengths frequencies
void count_word_lengths(const char *str) {
int length_count[MAX_LENGTH + 1] = { 0 };
for (int i = 0, len = 0;; i++) {
unsigned char c = str[i];
// counting words as sequences of letters or digits
if (isalnum(c)) {
len++;
} else {
if (len > 0 && len <= MAX_LENGTH) {
length_count[len] += 1;
}
// start a new word after any separator (or the final NUL)
len = 0;
}
if (c == '\0')
break;
}
printf("\nWord Length\tOccurrences"
"\n-----------\t-----------\n");
for (int len = 0; len <= MAX_LENGTH; len++) {
if (length_count[len]) {
printf("%-11d\t%d\n", len, length_count[len]);
}
}
}
int main() {
char myString[] = "The quick Brown ? Fox ? jumps over the Lazy Dog and the !##! LAZY DOG is still sleeping";
// Uncomment if modifying the string is required
//clean_string(myString);
count_letters(myString);
count_word_lengths(myString);
return 0;
}
Output:
Letter Count
------ -----
a 3
b 1
c 1
d 3
e 6
f 1
g 3
h 3
i 4
j 1
k 1
l 5
m 1
n 3
o 5
p 2
q 1
r 2
s 4
t 4
u 2
v 1
w 1
x 1
y 2
z 2
Word Length Occurrences
----------- -----------
2 1
3 7
4 3
5 4
8 1
Use strtok_r() and simplify counting.
Its sibling strtok() is not thread-safe. This is discussed in detail in Why is strtok() Considered Unsafe?
Also, strtok_r() chops input string by inserting \0 chars inside the string. If you want to keep a copy of original string, you have to make a copy of original string and pass it on to strtok_r().
There is also another catch. strtok_r() is not a part of C-Standard yet, but POSIX-2008 lists it. GNU glibc implements it, but to access this function we need to #define _POSIX_C_SOURCE before any includes in our source files.
There are also strdup() and strndup(), which duplicate an input string and allocate the memory for you. You have to free that memory when you're done using it. strndup() was added in POSIX-2008, so we declare 200809L in our sources to use it.
It's always better to use new standards to write fresh code. POSIX 200809L is recommended with at least C standard 2011.
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#define MAX_STR_LEN 1024
#define MAX_WORD_LEN 128
#define WORD_DELIMS " \n\t"
int is_word (const char* str, const size_t slen) {
int word = 0;
for (size_t ci = 0; ci < slen;)
if (isalnum ((unsigned char) str[ci++])) {
word = 1;
break;
}
return word;
}
void get_word_stat (const char* str, int word_stat[]) {
char *copy = strndup (str, MAX_STR_LEN); // limiting copy
if (!copy) { // copying failed
printf ("Error duplicating input string\n");
exit (1);
}
for (char *token, *rmdStr = copy; (token = strtok_r (NULL, WORD_DELIMS, &rmdStr)); /* empty */) {
size_t token_len = strlen (token);
if (token_len > (MAX_WORD_LEN - 1)) {
printf ("Error: Increase MAX_WORD_LEN(%d) to handle words of length %zu\n", MAX_WORD_LEN, token_len);
exit (2);
}
if (is_word (token, token_len))
++word_stat[token_len];
else
printf ("[%s] not a word\n", token);
}
free (copy);
}
int main () {
char str [MAX_STR_LEN] = "The quick Brown ? Fox ? jumps over the Lazy Dog and the !##! LAZY DOG is still sleeping";
printf ("Original Text: [%s]\n", str);
int word_stat[MAX_WORD_LEN] = {0};
get_word_stat (str, word_stat);
printf ("\nWordLength Occurrences\n");
for (int si = 1; si < MAX_WORD_LEN; ++si) {
if (word_stat[si])
printf ("%d\t\t%d\n", si, word_stat[si]);
}
return 0;
}
Whenever you are interested in the frequency that something occurs, you want to use a Frequency Array containing the number of elements necessary to handle the entire range of possible occurrence. You want to track the frequency of word-lengths, so you need an array that is sized to track the longest word. (longest word in the non-medical unabridged dictionary is 29-characters, longest medical word is 45-characters)
So here a simple array of integers with 30 elements will do, so that index 29 is valid (unless you want to consider medical words, then use 46). If you want to consider nonsense words, then size appropriately, e.g. "Supercalifragilisticexpialidocious", 34 characters. Choose the type based on a reasonably anticipated maximum number of occurrences. Using signed int limits the occurrences to INT_MAX (2147483647 with a 32-bit int). Using unsigned will double the limit, or you can use uint64_t for a full 64-bit range.
How it works
How do you use a simple array to track the occurrences of word lengths? Simple: declare an array of sufficient size and initialize all elements to zero. Now all you do is read a word, use, e.g. size_t len = strlen(word); to get the length, and then increment yourarray[len] += 1;.
Say the word has 10 characters: you will add one to yourarray[10]. So the array index corresponds to the word length. When you have taken the length of all words and incremented the corresponding array index, to get your results you just loop over your array and output the value (number of occurrences) at the index (word length). If you have had two words that were 10 characters each, then yourarray[10] will contain 2 (and so on for every other index that corresponds to a different word length).
Consideration When Choosing How to Separate Words
When selecting a method to split a string of space-separated words into individual words, you need to know whether your original string is mutable. For example, if you choose to separate words with strtok(), it will modify the original string. In your case, since your words are stored in an array of char, that is fine, but what if you had a string-literal like:
char *mystring = "The quick Brown ? Fox ? jumps over the Lazy Dog ";
In that case, passing mystring to strtok() would likely SEGFAULT when strtok() attempts to modify the region of read-only memory holding mystring (ignoring the non-standard treatment of string-literals by Microsoft).
You can of course make a copy of mystring, putting the string-literal in mutable memory, and then call strtok() on the copy. Or, you can use a method that does not modify mystring, like using sscanf() and an offset to parse the words, using alternating calls to strcspn() and strspn() to locate and skip whitespace, or simply using a start and end pointer to work down the string, bracketing words and copying characters between the pointers. Entirely up to you.
For example, using sscanf() with an offset to work down the string, updating the offset from the beginning with the number of characters consumed during each read you could do:
char *mystring = "The quick Brown ? Fox ? jumps over the Lazy Dog "
"and the !##! LAZY DOG is still sleeping",
*p = mystring, /* pointer to mystring to parse */
buf[MAXLEN] = ""; /* temporary buffer to hold each word */
int nchar = 0, /* characters consumed by sscanf */
offset = 0; /* offset from beginning of mystring */
/* loop over each word in mystring using sscanf and offset */
/* (%31s limits each read to 31 characters, assuming MAXLEN is 32, so buf cannot overflow) */
while (sscanf (p + offset, "%31s%n", buf, &nchar) == 1) {
size_t len = strlen (buf); /* length of word */
offset += nchar; /* update offset with nchar */
/* do other stuff here */
}
Testing if a Word is Alphanumeric
You can loop over each character calling the isalnum() macro from ctype.h on each character. Or, you can let strspn() do it for you given a list of characters that your words can contain. For example for digits and alpha-characters only, you can use a simple constant, and then call strspn() in your loop to determine if the word is made up only of the characters you will accept in a word, e.g.
#define ACCEPT "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
...
/* use strspn to test that word is valid (alphanum) or get next word */
if (strspn (buf, ACCEPT) != len) {
fprintf (stderr, " error: rejecting \"%s\"\n", buf); /* optional */
continue;
}
...
Neither way is more-right than the other, it's really a matter of convenience and readability. Using a library provided function also provides a bit of confidence that it is written in a manner that will allow the compiler to fully optimize the compiled code.
A Short Example
Putting the thoughts above together in a short example that will parse the words in mystring using sscanf() and then track the occurrences of all alphanum words (up to 31-characters, and outputting any word rejected) using a simple array of integers to hold the frequency of length, you could do:
#include <stdio.h>
#include <string.h>
#define MAXLEN 32 /* if you need a constant, #define one (or more) */
#define ACCEPT "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
int main (void) {
char *mystring = "The quick Brown ? Fox ? jumps over the Lazy Dog "
"and the !##! LAZY DOG is still sleeping",
*p = mystring, /* pointer to mystring to parse */
buf[MAXLEN] = ""; /* temporary buffer to hold each word */
int nchar = 0, /* characters consumed by sscanf */
offset = 0, /* offset from beginning of mystring */
lenfreq[MAXLEN] = {0}; /* frequency array for word length */
/* loop over each word in mystring using sscanf and offset */
/* (%31s limits each read to MAXLEN - 1 characters so buf cannot overflow) */
while (sscanf (p + offset, "%31s%n", buf, &nchar) == 1) {
size_t len = strlen (buf); /* length of word */
offset += nchar; /* update offset with nchar */
/* use strspn to test that word is valid (alphanum) or get next word */
if (strspn (buf, ACCEPT) != len) {
fprintf (stderr, " error: rejecting \"%s\"\n", buf); /* optional */
continue;
}
lenfreq[len] += 1; /* update frequency array of lengths */
}
/* output original string */
printf ("\nOriginal Text:\n\n%s\n\n", mystring);
/* output length frequency array */
puts ("word length Occurrences\n"
"----------- -----------");
for (size_t i = 0; i < MAXLEN; i++) {
if (lenfreq[i])
printf ("%2zu%14s%d\n", i, " ", lenfreq[i]);
}
}
Example Use/Output
Compiling and running the program would produce:
$ ./bin/wordlen-freq
error: rejecting "?"
error: rejecting "?"
error: rejecting "!##!"
Original Text:
The quick Brown ? Fox ? jumps over the Lazy Dog and the !##! LAZY DOG is still sleeping
word length Occurrences
----------- -----------
2 1
3 7
4 3
5 4
8 1
(note: you can output all lengths from 0 to 31 even if there were no occurrences by removing the print condition if (lenfreq[i]) -- up to you)
Look things over and let me know if you have questions.

Don't understand this unfamiliar syntax: arr1[ arr2[i] - 'a' ]++

I am looking at a program that finds the frequency of strings entered. Comparison is made based on a string's ASCII value against the ASCII value of lowercase 'a'. I have implemented it; it works, albeit, with a bug, but essentially, I am ignorant of a particular line of code;
for (int i = 0; i < strlen(arr2); i++)
{
// this line...
arr1[ arr2[i] - 'a' ]++;
}
arr1 is arr1[26] = {0},
that is, all the letters of the alphabet are assigned an index and the array is initialised to zero, while arr2[] as a function argument, receives the stdin.
How does the mysterious line of code work and what is it saying?
The full code:
#include <stdio.h>
#include <string.h>
#define ALEPH 26
void freq(char arr2[]);
int main ()
{
char * str;
printf("\nCharacter Frequency\n"
"--------------------\n");
// user input
printf("\nEnter a string of characters:\n");
fgets(str, ALEPH, stdin);
freq(str);
return 0;
}
// Function Definiton
void freq (char arr2[])
{
// array for ascii characters initialised to 0
int arr1[ALEPH] = {0};
// scan and cycle through the input array
for (int i = 0; i < strlen(arr2); i++)
{
arr1[ arr2[i] - 'a' ]++;
}
for (int j = 0; j < 26; j++)
{
if ( arr1[j] != 0 )
{
printf("\nCharacter: %c - Frequency: %d", 'a'+j, arr1[j]);
}
}
printf("\n");
}
arr1 is an array of 26 ints initialized to 0s. The indexes of its elements are 0..25.
arr2 is assumed to be a string of lowercase letters 'a'..'z' only. The characters are assumed to be using an encoding where lowercase letters are single-byte and sequential in value, such as ASCII (where a=97, ..., z=122). Anything else that does not match these assumptions will cause undefined behavior in this code.
The code loops through arr2 and, for each character, calculates an index by subtracting the numeric value of 'a' (i.e., ASCII 97) from the character's numeric value:
'a' - 'a' = 97 - 97 = 0
'b' - 'a' = 98 - 97 = 1
...
'z' - 'a' = 122 - 97 = 25
Then the code accesses the arr1 element at that index, and increments that element's value by 1.
You ask about the line:
arr1[ arr2[i] - 'a' ]++;
In this line:
arr1 is the array that will accumulate the histogram
arr2 is the input string which will contribute to the histogram
i is the index into input string.
This can be rewritten as:
ch = arr2[i];
histogram_slot = ch - 'a';
arr1[histogram_slot] = arr1[histogram_slot] + 1;
For each character in the input string, the character is fetched from the string and assigned to ch. ch is converted to an index into the histogram array by subtracting 'a'. In the third line, the array element at histogram_slot is increased by one. Slot 0 is incremented for 'a', 1 for 'b', 2 for 'c', ..., and 25 for 'z'.
A serious bug in this code is that it only works for lowercase letters. An uppercase letter, digit, punctuation mark, Unicode or extended-ASCII character, or any other character not between 'a' and 'z' inclusive will write to an unintended region of memory. This includes the '\n' that fgets() stores at the end of the line. At best, this will cause an unexpected crash. In the medium disaster, it will cause sporadic malfunctions that get through your testing. In the worst case, it creates a security hole allowing someone uncontrolled access to your stack, and thus the ability to take over execution of the thread.

While loops and arrays causing very odd behaviour...maybe a memory mixup

I'm tired of this tomfoolery occurring at runtime, although I'm sure we all are when our programs fail at runtime in the most obscure ways.
Getting to the point: the entire source code is a bit large to place here, but it is still under 200 lines, so it is linked here. Use it if you run the program, since the code I post below is just the functions where I think the error lies.
Context: This is a sort of shift cipher with 8 different shifts taken from an 8-digit PIN.
The issue is strange. Basically, the encrypt() function always works correctly; I've verified it by working through the algorithm on paper. For example, ABC is correctly encoded to 3c 45 46 -6f when the PIN is 12345678.
The strange issues are with the decrypt() function.
When the program is run for the first time, running decrypt() on a valid ciphertext-PIN pair always returns nothing except a \n (newline). When it is tried with a different valid PIN-ciphertext pair after a successful run of encrypt(), decrypt() just returns either the message that was just encrypted or some other seemingly random output derived from the previously encoded message.
Without further ado, the legendarily screwed up decrypt function which I have rebuilt thrice now -
void decrypt()
{
printf("\n");
int *digits = pin(); int d[8];
getchar();
for (int i=0;i<8;i++)
d[i] = *(digits + i); //puts each digit in a local array.
printf("\nEnter encoded message -\n\n");
getchar();
int j; char ch, msg[3002];
for(int i=0; i < 3000;i++)
{
scanf("%x",&j);
if(j==-111){
msg[i] = '\0'; //terminates string with \0
break;
}
else{
if(ctln(i)==1)
ch = j - d[2];
else if(fib(i)==1)
ch = j + d[4];
else if(luc(i)==1)
ch = j - d[0];
else if(pent(i)==1)
ch = j + d[6];
else if(hex(i)==1)
ch = j - d[3];
else if(prm(i)==1)
ch = j + d[7];
else {
if(i%2 == 0)
ch = j - d[1];
else
ch = j + d[5];
msg[i] = ch;
}
}
}
printf("\nDecrypted message -\n\n");
puts(msg);
}
For context, and to help find the culprits, do read the full code here. pin() returns a pointer to a static int array holding all 8 digits, and ctln(), fib(), luc(), pent(), hex(), and prm() check whether the position i of a character in the message belongs to the Catalan, Fibonacci, Lucas, pentagonal, hexagonal, or prime number series, respectively. More here.
Edit 1
I have already tried keeping different variable names, and some other things I can't fully recall. Also, because it is very relevant, below is the pin() function:
int *pin()
{
int num,q=0; static int pins[8];
printf("Enter 8-digit PIN : ");
scanf("%d", &num);
for(register int i = 10000000 ; i >= 1 ; i = (i/10)) // i is position of digit.
{
int d = ((num - (num % i)) / i); // d stores 'digit' ( divides quotient of (num % i) by i)
pins[q] = d; q++;
num = (num - ( d * i ));
}
return pins ; // pointer to static array storing digits of PIN
}
Edit 2
I had wrongly assigned pins[6] rather than pins[8] in the original code, I have corrected it but am still facing the same errors.
Edit 3
After correcting the mistake pointed out by MikeCAT, it now ignores the first character when deciphering.
Edit 4
The getchar() before scanf() was to blame; removing it fixes the last issue too. Thanks @MikeCAT!
In your decrypt() function, msg[i] = ch; is executed only if none of the functions ctln, fib, luc, pent, hex, prm returned 1.
As a result, some elements of the non-static local array msg are never assigned. Their values are indeterminate, and printing the array with puts() may invoke undefined behavior.
The part
msg[i] = ch;
}
should be
}
msg[i] = ch;
as it is done in encrypt() function.

Why is this array being initialized in an odd way?

I am reading K&R 2nd Edition and I am having trouble understanding exercise 1-13. The answer is this code
#include <stdio.h>
#define MAXHIST 15
#define MAXWORD 11
#define IN 1
#define OUT 0
main()
{
int c, i, nc, state;
int len;
int maxvalue;
int ovflow;
int wl[MAXWORD];
state = OUT;
nc = 0;
ovflow = 0;
for (i = 0; i < MAXWORD; i++)
wl[i] = 0;
while ((c = getchar()) != EOF)
{
if(c == ' ' || c == '\n' || c == '\t')
{
state = OUT;
if (nc > 0)
{
if (nc < MAXWORD)
++wl[nc];
else
++ovflow;
}
nc = 0;
}
else if (state == OUT)
{
state = IN;
nc = 1;
}
else
++nc;
}
maxvalue = 0;
for (i = 1; i < MAXWORD; ++i)
{
if(wl[i] > maxvalue)
maxvalue = wl[i];
}
for(i = 1; i < MAXWORD; ++i)
{
printf("%5d - %5d : ", i, wl[i]);
if(wl[i] > 0)
{
if((len = wl[i] * MAXHIST / maxvalue) <= 0)
len = 1;
}
else
len = 0;
while(len > 0)
{
putchar('*');
--len;
}
putchar('\n');
}
if (ovflow > 0)
printf("There are %d words >= %d\n", ovflow, MAXWORD);
return 0;
}
At the top, wl is being declared and initialized. What I don't understand is why is it looping through it and setting everything to zero if it just counts the length of words? It doesn't keep track of how many words there are, it just keeps track of the word length so why is everything set to 0?
I know this is unclear; it's just been stressing me out for the past 20 minutes and I don't know why.
The ith element of the array wl[] is the number of words of length i that have been found in an input file. The wl[] array needs to be zero-initialized first so that ++wl[nc]; does not cause undefined behavior by attempting to use an uninitialized variable, and so that array elements that represent word lengths that are not present reflect that no such word lengths were found.
Note that ++wl[nc] increments the value wl[nc] when a word of length nc is encountered. If the array were not initialized, the first time the code attempts to increment an array element, it would be attempting to increment an indeterminate value. This attempt would cause undefined behavior.
Further, array indices that represent counts of word lengths that are not found in the input should hold values of zero, but without the zero-initialization, these values would be indeterminate. Even attempting to print these indeterminate values would cause undefined behavior.
The moral: initialize variables to sensible values, or store values in them, before attempting to use them.
It would seem simpler and be more clear to use an array initializer to zero-initialize the wl[] array:
int wl[MAXWORD] = { 0 };
After this, there is no need for the loop that sets the array values to zero (unless the array is used again for another file). But the posted code is from The C Answer Book by Tondo and Gimpel. This book provides solutions to the exercises found in the second edition of K&R in the style of K&R, using only ideas that have been introduced in the book before each exercise. This exercise, 1.13, occurs in "Chapter 1 - A Tutorial Introduction". This is a brief tour of the language lacking many details to be found later in the book. At this point, assignment and arrays have been introduced, but array initializers have not (this has to wait until Chapter 4), and the K&R code that uses arrays has initialized arrays using loops thus far. Don't read too much into code style from the introductory chapter of a book that is 30+ years old.
Much has changed in C since K&R was published, e.g., main() is no longer a valid function signature for the main() function. Note that the function signature must be one of int main(void) or int main(int argc, char *argv[]) (or alternatively int main(int argc, char **argv)), with a caveat for implementation-defined signatures for main().
Everything is set to 0 because, if you don't initialize the array, its elements hold indeterminate (effectively random) values, and those values will make your counts wrong. Instead of looping over every position of the array, you could write int wl[MAXWORD] = {0}; in place of int wl[MAXWORD]; this puts 0 at every position in the array, so you don't have to write the loop.
I edited your code and put some comments in as I was working through it, to explain what's going on. I also changed some of your histogram calculations because they didn't seem to make sense to me.
Bottom line: It's using a primitive "state machine" to count up the characters in each group of characters that isn't white space. It stores this in wl[] such that wl[i] contains an integer that tells you how many groups of characters (sometimes called "tokens") have a word length of i. Because this is done by incrementing the appropriate element of wl[], each element must be initialized to zero. Failing to do so would lead to undefined behavior, but would probably result in nonsensical and absurdly large counts in each element of wl[].
Additionally, any token with a length that can't be reflected in wl[] will be tallied in the ovflow variable, so at the end there will be an accounting of every token.
#include <stdio.h>
#define MAXHIST 15
#define MAXWORD 11
#define IN 1
#define OUT 0
int main(void) {
int c, i, nc, state;
int len;
int maxvalue;
int ovflow;
int wl[MAXWORD];
// Initializations
state = OUT; //Start off not assuming we're IN a word
nc = 0; //Start off with a character count of 0 for current word
ovflow = 0; //Start off not assuming any words > MAXWORD length
// Start off with our counters of words at each length at zero
for (i = 0; i < MAXWORD; i++) {
wl[i] = 0;
}
// Main loop to count characters in each 'word'
// state keeps track of whether we are IN a word or OUTside of one
// For each character in the input stream...
// - If it's whitespace, set our state to being OUTside of a word
// and, if we have a character count in nc (meaning we've just left
// a word), increment the counter in the wl (word length) array.
// For example, if we've just counted five characters, increment
// wl[5], to reflect that we now know there is one more word with
// a length of five. If we've exceeded the maximum word length,
// then increment our overflow counter. Either way, since we're
// currently looking at a whitespace character, reset the character
// counter so that we can start counting characters with our next
// word.
// - If we encounter something other than whitespace, and we were
// until now OUTside of a word, change our state to being IN a word
// and start the character counter off at 1.
// - If we encounter something other than whitespace, and we are
// still in a word (not OUTside of a word), then just increment
// the character counter.
while ((c = getchar()) != EOF) {
if (c == ' ' || c == '\n' || c == '\t') {
state = OUT;
if (nc > 0) {
if (nc < MAXWORD) ++wl[nc];
else ++ovflow;
}
nc = 0;
} else if (state == OUT) {
state = IN;
nc = 1;
} else {
++nc;
}
}
// Find out which length has the most number of words in it by looping
// through the word length array.
maxvalue = 0;
for (i = 1; i < MAXWORD; ++i) {
if(wl[i] > maxvalue) maxvalue = wl[i];
}
// Print out our histogram
for (i = 1; i < MAXWORD; ++i) {
// Print the word length - then the number of words with that length
printf("%5d - %5d : ", i, wl[i]);
// This is confusing and unnecessary. It's integer division, with no
// negative numbers. What we want to have happen is that the length
// of the bar will be 0 if wl[i] is zero; that the bar will have length
// 1 if the bar is otherwise too small to represent; and that it will be
// expressed as some fraction of MAXHIST otherwise.
//if(wl[i] > 0)
// {
// if((len = wl[i] * MAXHIST / maxvalue) <= 0)
// len = 1;
// }
// else
// len = 0;
// Multiply MAXHIST (our histogram maximum length) times the relative
// fraction, i.e., we're using a histogram bar length of MAXHIST for
// our statistical mode, and interpolating everything else.
// maxvalue is 0 when the input contained no countable words, so guard
// the division instead of dividing by zero.
len = (maxvalue > 0) ? (int)(((double)wl[i] / maxvalue) * MAXHIST) : 0;
// Our one special case might be if maxvalue is huge, a word length
// with just one occurrence might be rounded down to zero. We can fix
// that manually instead of using a weird logic structure.
if ((len == 0) && (wl[i] > 0)) len = 1;
while (len > 0) {
putchar('*');
--len;
}
putchar('\n');
}
// If any words exceeded the maximum word length, say how many there were.
if (ovflow > 0) printf("There are %d words >= %d\n", ovflow, MAXWORD);
return 0;
}

Resources