Trying to make a small program that separates words within a big string and stores each word (of the big string) in a string (i.e pointer) in an array of strings (i.e pointers); forming a 2-dimensional string array.
The word separator is simply a whitespace (32 in ASCII); the big string is:
"Showbizzes Oxygenized Equalizing Liquidized Jaywalking"
Note:
the words are all 10 characters in length
the total length of the string is 54 characters (spaces included)
the total size of the buffer is 55 bytes ('\0' included)
One more thing, the last pointer in the array of pointers must hold a 0 (i.e 1 character: '\0') (this is completely arbitrary).
Here is the program, nothing special, but ...
#include <stdio.h>
#include <stdlib.h>
int main(void) {
// The string that we need to break down into individual words
char str[] = "Showbizzes Oxygenized Equalizing Liquidized Jaywalking";
// Allocate memory for 6 char pointers (i.e 6 strings) (5 of which will contain words)
// the last one will just hold 0 ('\0')
char **array; array = malloc(sizeof(char *) * 6);
// i: index for where we are in 'str'
// r: index for rows of array
// c: index for columns of array
int i, r, c;
// Allocate 10 + 1 bytes for each pointer in the array of pointers (i.e array of strings)
// +1 for the '\0' character
for (i = 0; i < 6; i++)
array[i] = malloc(sizeof(char)*11);
// Until we reach the end of the big string (i.e until str[i] == '\0');
for (i = 0, c = 0, r = 0; str[i]; i++) {
// Word seperator is a whitespace: ' ' (32 in ASCII)
if (str[i] == ' ') {
array[c][r] = '\0'; // cut/end the current word
r++; // go to next row (i.e pointer)
c = 0; // reset index of column/letter in word
}
// Copy character from 'str', increment index of column/letter in word
else { array[c][r] = str[i]; c++; }
}
// cut/end the last word (which is the current word)
array[c][r] = '\0';
// go to next row (i.e pointer)
r++;
// point it to 0 ('\0')
array[r] = 0;
// Print the array of strings in a grid - - - - - - - - - - - - - -
printf(" ---------------------------------------\n");
for (r = 0; r < 6; r++) {
printf("Word %i --> ", r);
for (c = 0; array[c][r]; c++)
printf("| %c ", array[c][r]);
printf("|");printf("\n");
printf(" ---------------------------------------");
printf("\n");
}
// - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
return 0;
}
.. there's something wrong and I don't understand how to fix it.
For some reason it copies into the first string (i.e pointer), in the array of strings (i.e pointers), the first 6 characters of the big string, then on the 7th it gives a segmentation fault. I've allocated 6 pointers each with 11 bytes.. at least thats what I think the code is doing, so really I have NO clue why this is happening...
Hope someone can help.
Replace all occurrences of array[c][r] with array[r][c].
The first dimension is the row (the word); the second is the column (the letter within that word).
Next time you can check this using a debugger:
Program received signal SIGSEGV, Segmentation fault.
0x00000000004007ea in main () at demo.c:37
37 array[c][r] = str[i];
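With the indices in the right order, the splitting loop could be sketched as follows. The helper names split_words and free_words and the max_words/max_len limits are assumptions made for illustration; they are not part of the original program. The sketch also ends the array with a NULL pointer instead of allocating a sixth buffer and then overwriting its pointer with 0, which leaks that buffer.

```c
#include <stdlib.h>

/* Hypothetical helper: free a NULL-terminated array of strings. */
void free_words(char **array)
{
    if (array == NULL)
        return;
    for (size_t r = 0; array[r] != NULL; r++)
        free(array[r]);
    free(array);
}

/* Hypothetical helper: split `str` on single spaces into at most `max_words`
 * words of at most `max_len` characters each. Returns a NULL-terminated array,
 * or NULL if an allocation fails or one of the limits is exceeded. */
char **split_words(const char *str, size_t max_words, size_t max_len)
{
    char **array = malloc((max_words + 1) * sizeof *array);
    if (array == NULL)
        return NULL;
    for (size_t k = 0; k <= max_words; k++)
        array[k] = NULL;

    size_t r = 0;               /* row: which word */
    size_t c = 0;               /* column: which letter of that word */
    int ok = 0;
    if (max_words > 0) {
        array[0] = malloc(max_len + 1);
        ok = (array[0] != NULL);
    }
    for (size_t i = 0; ok && str[i] != '\0'; i++) {
        if (str[i] == ' ') {
            array[r][c] = '\0'; /* end the current word: row first, column second */
            r++;
            c = 0;
            if (r == max_words) {
                ok = 0;         /* more words than expected */
            } else {
                array[r] = malloc(max_len + 1);
                ok = (array[r] != NULL);
            }
        } else if (c < max_len) {
            array[r][c++] = str[i];
        } else {
            ok = 0;             /* word longer than max_len */
        }
    }
    if (!ok) {
        free_words(array);
        return NULL;
    }
    array[r][c] = '\0';         /* end the last word; array[r + 1] is already NULL */
    return array;
}
```

Calling split_words(str, 5, 10) on the string from the question would give five words followed by a NULL pointer, which a printing loop can use as its stop condition instead of a hard-coded 6.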
I have this code here that correctly formats the hard-coded sentence and finds the frequency of which a certain letter shows up in that string:
#include <stdio.h>
#include <string.h>
int main() {
char words[1000][100];
int x = 0, y;
char myString[10000] = "The quick Brown ? Fox ? jumps over the Lazy Dog and the !##! LAZY DOG is still sleeping";
printf("Original Text:\n");
printf("%s\n", myString);
// Function for uppercase letters to become lowercase and to remove special characters
for (x = 0; x <= strlen(myString); ++x) {
if (myString[x] >= 65 && myString[x] <= 90)
myString[x] = myString[x] + 32;
}
for (x = 0; myString[x] != '\0'; ++x) {
while (!(myString[x] >= 'a' && myString[x] <= 'z') &&
!(myString[x] >= 'A' && myString[x] <= 'Z') &&
!(myString[x] >= '0' && myString[x] <= '9') &&
!(myString[x] == '\0') && !(myString[x] == ' ')) {
for (y = x; myString[y] != '\0'; ++y) {
myString[y] = myString[y + 1];
}
myString[y] = '\0';
}
}
printf("\nModified Text: \n%s\n", myString);
// Part A
int counts[26] = { 0 };
int k;
size_t myString_length = strlen(myString);
for (k = 0; k < myString_length; k++) {
char c = myString[k];
if (!isalpha(c))
continue;
counts[(int)(c - 'a')]++;
}
printf("\nLetter\tCount\n------ -----\n");
for (k = 0; k < 26; ++k) {
printf("%c\t%d\n", k + 'a', counts[k]);
}
// Part B
int i = 0, count = 0, occurrences[10000] = { 0 };
while (myString[i] != '\0') {
char wordArray[100];
int j = 0;
while (myString[i] != ' ' && myString[i] != '\0') {
wordArray[j++] = myString[i++];
}
if (wordArray[j - 1] == ',' || wordArray[j - 1] == '.') {
wordArray[j - 1] = '\0';
}
wordArray[j] = '\0';
int status = -1;
for (j = 0; j < count; ++j) {
if (strcmp(words[j], wordArray) == 0) {
status = j;
break;
}
}
if (status != -1) {
occurrences[status] += 1;
} else {
occurrences[count] += 1;
strcpy(words[count++], wordArray);
}
++i;
}
printf("\nWord Length\tOccurrences\n----------- -----------\n");
for (i = 0; i < count; ++i) {
// print each word and its occurrences
printf("%s\t\t%d\n", words[i], occurrences[i]);
}
}
Part B is where I'm having a problem though, I want the code to be able to tell me the occurrence of which a word of a specific length shows up, such as this instance:
Word length Occurrences
1 0
2 1
Here, there are no instances where there is a word with one character, but there is one instance where there is a word with two characters. However, my code is outputting the number of times a specific word is given and not what I want above, like this:
Word Length Occurrences
----------- -----------
the 3
quick 1
brown 1
3
fox 1
jumps 1
over 1
lazy 2
dog 2
and 1
is 1
still 1
sleeping 1
How would I go about changing it so that it shows the output I want with just the word length and frequency?
Here are some remarks about your code:
the first loop recomputes the length of the string for each iteration: for (x = 0; x <= strlen(myString); ++x). Since you modify the string inside the loop, it is difficult for the compiler to ascertain that the string length does not change, so a classic optimisation may not work. Use the same test as for the next loop:
for (x = 0; myString[x] != '\0'; ++x)
the test for uppercase is not very readable because you hardcode the ASCII values of the letters A and Z, you should either write:
if (myString[x] >= 'A' && myString[x] <= 'Z')
myString[x] += 'a' - 'A';
or use macros from <ctype.h>:
unsigned char c = myString[x];
if (isupper(c))
myString[x] = tolower(c);
or equivalently and possibly more efficiently:
myString[x] = tolower((unsigned char)myString[x]);
in the second loop, you remove characters that are neither letters, digits nor spaces. You have a redundant nested while loop and a third nested loop to shift the rest of the array for each byte removed: every removal costs a pass over the remainder of the string, so this method has quadratic time complexity, O(N^2), in the worst case, which is very inefficient for long strings. You should instead use a two finger method that operates in linear time:
for (x = y = 0; myString[x] != '\0'; ++x) {
unsigned char c = myString[x];
if (isalnum(c) || c == ' ') {
myString[y++] = c;
}
}
myString[y] = '\0';
note that this loop removes all punctuation instead of replacing it with spaces: this might glue words together such as "a fine,good man" -> "a finegood man"
In the third loop, you use a char value c as an argument for isalpha(c). You should include <ctype.h> to use any function declared in this header file. Functions and macros from <ctype.h> are only defined for all values of the type unsigned char and the special negative value EOF. If type char is signed on your platform, isalpha(c) would have undefined behavior if the string has negative characters. In your particular case, you filtered characters that are not ASCII letters, digits or space, so this should not be a problem, yet it is a good habit to always use unsigned char for the character argument to isalpha() and equivalent functions.
Note also that this counting phase could have been combined into the previous loops.
to count the occurrences of words, the array occurrences should have the same number of elements as the words array, 1000. You do not check for boundaries so you have undefined behavior if there are more than 1000 different words and/or if any of these words has 100 characters or more.
in the next loop, you extract words from the string, incrementing i inside the nested loop body. You also increment i at the end of the outer loop, hence skipping the final null terminator. The test while (myString[i] != '\0') will test bytes beyond the end of the string, which is incorrect and potential undefined behavior.
to avoid counting empty words in this loop, you should skip sequences of spaces before copying the word if not at the end of the string.
According to the question, counting individual words is not what Part B is expected to do, you should instead count the frequency of word lengths. You can do this in the first loop by keeping track of the length of the current word and incrementing the array of word length frequencies when you find a separator.
Note that modifying the string is not necessary to count letter frequencies or word length occurrences.
Writing a separate function for each task is recommended.
Here is a modified version:
#include <ctype.h>
#include <stdio.h>
#define MAX_LENGTH 100
// Function to lowercase letters and remove special characters
void clean_string(char *str) {
int x, y;
printf("Original Text:\n");
printf("%s\n", str);
for (x = y = 0; str[x] != '\0'; x++) {
unsigned char c = str[x];
c = tolower(c);
if (isalnum(c) || c == ' ') {
str[y++] = c;
}
}
str[y] = '\0';
printf("\nModified Text:\n%s\n", str);
}
// Part A: count letter frequencies
void count_letters(const char *str) {
int letter_count['z' - 'a' + 1] = { 0 };
for (int i = 0; str[i] != '\0'; i++) {
unsigned char c = str[i];
if (c >= 'a' && c <= 'z') {
letter_count[c - 'a'] += 1;
} else
if (c >= 'A' && c <= 'Z') {
letter_count[c - 'A'] += 1;
}
}
printf("\nLetter\tCount"
"\n------\t-----\n");
for (int c = 'a'; c <= 'z'; c++) {
printf("%c\t%d\n", c, letter_count[c - 'a']);
}
}
// Part B: count word lengths frequencies
void count_word_lengths(const char *str) {
int length_count[MAX_LENGTH + 1] = { 0 };
for (int i = 0, len = 0;; i++) {
unsigned char c = str[i];
// counting words as sequences of letters or digits
if (isalnum(c)) {
len++;
} else {
if (len > 0 && len <= MAX_LENGTH) {
length_count[len] += 1;
}
len = 0; // start a new word, even after an overlong one
}
if (c == '\0')
break;
}
printf("\nWord Length\tOccurrences"
"\n-----------\t-----------\n");
for (int len = 0; len <= MAX_LENGTH; len++) {
if (length_count[len]) {
printf("%-11d\t%d\n", len, length_count[len]);
}
}
}
int main() {
char myString[] = "The quick Brown ? Fox ? jumps over the Lazy Dog and the !##! LAZY DOG is still sleeping";
// Uncomment if modifying the string is required
//clean_string(myString);
count_letters(myString);
count_word_lengths(myString);
return 0;
}
Output:
Letter Count
------ -----
a 3
b 1
c 1
d 3
e 6
f 1
g 3
h 3
i 4
j 1
k 1
l 5
m 1
n 3
o 5
p 2
q 1
r 2
s 4
t 4
u 2
v 1
w 1
x 1
y 2
z 2
Word Length Occurrences
----------- -----------
2 1
3 7
4 3
5 4
8 1
Use strtok_r() and simplify counting.
Its sibling strtok() is not thread-safe. This is discussed in detail in Why is strtok() Considered Unsafe?
Also, strtok_r() chops input string by inserting \0 chars inside the string. If you want to keep a copy of original string, you have to make a copy of original string and pass it on to strtok_r().
There is also another catch. strtok_r() is not a part of C-Standard yet, but POSIX-2008 lists it. GNU glibc implements it, but to access this function we need to #define _POSIX_C_SOURCE before any includes in our source files.
There are also strdup() & strndup(), which duplicate an input string and allocate the memory for you. You have to free that memory when you're done using it. strndup() was added in POSIX-2008, so we define _POSIX_C_SOURCE as 200809L in our sources to use it.
It's always better to use new standards to write fresh code. POSIX 200809L is recommended with at least C standard 2011.
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#define MAX_STR_LEN 1024
#define MAX_WORD_LEN 128
#define WORD_DELIMS " \n\t"
int is_word (const char* str, const size_t slen) {
int word = 0;
for (size_t ci = 0; ci < slen;)
if (isalnum ((unsigned char) str[ci++])) {
word = 1;
break;
}
return word;
}
void get_word_stat (const char* str, int word_stat[]) {
char *copy = strndup (str, MAX_STR_LEN); // limiting copy
if (!copy) { // copying failed
printf ("Error duplicating input string\n");
exit (1);
}
for (char *rmdStr = NULL, *token = strtok_r (copy, WORD_DELIMS, &rmdStr); token; token = strtok_r (NULL, WORD_DELIMS, &rmdStr)) {
size_t token_len = strlen (token);
if (token_len > (MAX_WORD_LEN - 1)) {
printf ("Error: Increase MAX_WORD_LEN(%d) to handle words of length %zu\n", MAX_WORD_LEN, token_len);
exit (2);
}
if (is_word (token, token_len))
++word_stat[token_len];
else
printf ("[%s] not a word\n", token);
}
free (copy);
}
int main () {
char str [MAX_STR_LEN] = "The quick Brown ? Fox ? jumps over the Lazy Dog and the !##! LAZY DOG is still sleeping";
printf ("Original Text: [%s]\n", str);
int word_stat[MAX_WORD_LEN] = {0};
get_word_stat (str, word_stat);
printf ("\nWordLength Occurrences\n");
for (int si = 1; si < MAX_WORD_LEN; ++si) {
if (word_stat[si])
printf ("%d\t\t%d\n", si, word_stat[si]);
}
return 0;
}
Whenever you are interested in how often something occurs, you want to use a Frequency Array with enough elements to cover the entire range of possible values. You want to track the frequency of word-lengths, so you need an array that is sized for the longest word (the longest word in the non-medical unabridged dictionary is 29-characters, the longest medical word is 45-characters).
Because the word-length itself is used as the index, a simple array of integers with 30 elements (indexes 0 through 29) will do here (unless you want to consider medical words, then use 46). If you want to consider non-sense words, then size appropriately, e.g. "Supercalifragilisticexpialidocious", 34-characters. Choose the type based on a reasonably anticipated maximum number of occurrences. Using signed int limits the occurrences to INT_MAX (2147483647 with a 32-bit int). Using unsigned will double the limit, or you can use uint64_t for a full 64-bit range.
How it works
How do you use a simple array to track the occurrences of word lengths? Simple, declare an array of sufficient size and initialize all elements to zero. Now all you do is read a word, use, e.g. size_t len = strlen(word); to get the length and then increment yourarray[len] += 1;.
Say the word has 10-characters, you will add one to yourarray[10]. So the array index corresponds to the word-length. When you have taken the length of all words and incremented the corresponding array index, to get your results, you just loop over your array and output the value (number of occurrences) at the index (word-length). If you have had two words that were 10-characters each, then yourarray[10] will contain 2 (and so on and so forth for every other index that corresponds to a different word-length number of characters).
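As a minimal sketch of that idea, assuming the words have already been separated into an array of strings: the helper name tally_lengths and its parameters are hypothetical, chosen only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: for each of the `nwords` strings in `words`, add one to
 * freq[len], where len is the length of that word.
 * `freq` must have at least max_len + 1 elements, because the length itself is
 * the index. Words longer than max_len are not counted.
 * Returns the number of words that were too long to count. */
size_t tally_lengths(const char *const words[], size_t nwords,
                     unsigned freq[], size_t max_len)
{
    size_t skipped = 0;
    for (size_t i = 0; i < nwords; i++) {
        size_t len = strlen(words[i]);
        if (len <= max_len)
            freq[len] += 1;     /* index == word-length */
        else
            skipped++;          /* would index past the end of freq */
    }
    return skipped;
}
```

The bounds check matters: without it, a single unexpectedly long word would write outside the frequency array.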
Consideration When Choosing How to Separate Words
When selecting a method to split a string of space separated words into individual words, you need to know whether your original string is mutable. For example, if you choose to separate words with strtok(), it will modify the original string. In your case, since your words are stored in an array of char, that is fine, but what if you had a string-literal like:
char *mystring = "The quick Brown ? Fox ? jumps over the Lazy Dog ";
In that case, passing mystring to strtok() would SEGFAULT when strtok() attempts to modify the region of read-only memory holding mystring (ignoring the non-standard treatment of string-literals by Microsoft).
You can of course make a copy of mystring, putting the contents of the string-literal in mutable memory, and then call strtok() on the copy. Or, you can use a method that does not modify mystring (like using sscanf() and an offset to parse the words, or using alternating calls to strcspn() and strspn() to locate and skip whitespace, or simply using a start and end pointer to work down the string bracketing words and copying characters between the pointers). Entirely up to you.
For example, using sscanf() with an offset to work down the string, updating the offset from the beginning with the number of characters consumed during each read you could do:
char *mystring = "The quick Brown ? Fox ? jumps over the Lazy Dog "
"and the !##! LAZY DOG is still sleeping",
*p = mystring, /* pointer to mystring to parse */
buf[MAXLEN] = ""; /* temporary buffer to hold each word */
int nchar = 0, /* characters consumed by sscanf */
offset = 0; /* offset from beginning of mystring */
/* loop over each word in mystring using sscanf and offset */
while (sscanf (p + offset, "%s%n", buf, &nchar) == 1) {
size_t len = strlen (buf); /* length of word */
offset += nchar; /* update offset with nchar */
/* do other stuff here */
}
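The alternating strspn()/strcspn() approach mentioned above could be sketched like this. The helper name count_lengths and the DELIMS set are assumptions for illustration. Note that this sketch counts every whitespace-separated token, so punctuation-only tokens such as "?" are tallied too unless you filter them separately.

```c
#include <stddef.h>
#include <string.h>

#define DELIMS " \t\n"   /* assumed set of word separators */

/* Hypothetical helper: tally the length of every token in `s` without
 * modifying `s`, so it is safe to call on a string-literal.
 * `freq` must have at least max_len + 1 elements; tokens longer than max_len
 * are not tallied. Returns the total number of tokens seen. */
size_t count_lengths(const char *s, unsigned freq[], size_t max_len)
{
    size_t ntokens = 0;
    for (;;) {
        s += strspn(s, DELIMS);             /* skip separators before the token */
        if (*s == '\0')
            break;                          /* nothing left but separators */
        size_t len = strcspn(s, DELIMS);    /* characters up to next separator */
        if (len <= max_len)
            freq[len] += 1;
        ntokens++;
        s += len;                           /* advance past the token */
    }
    return ntokens;
}
```

Because only a pointer is advanced and nothing is written to the input, no copy of the string is needed.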
Testing if a Word is Alphanumeric
You can loop over each character calling the isalnum() macro from ctype.h on each character. Or, you can let strspn() do it for you given a list of characters that your words can contain. For example for digits and alpha-characters only, you can use a simple constant, and then call strspn() in your loop to determine if the word is made up only of the characters you will accept in a word, e.g.
#define ACCEPT "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
...
/* use strspn to test that word is valid (alphanum) or get next word */
if (strspn (buf, ACCEPT) != len) {
fprintf (stderr, " error: rejecting \"%s\"\n", buf); /* optional */
continue;
}
...
Neither way is more-right than the other, it's really a matter of convenience and readability. Using a library provided function also provides a bit of confidence that it is written in a manner that will allow the compiler to fully optimize the compiled code.
A Short Example
Putting the thoughts above together in a short example that will parse the words in mystring using sscanf() and then track the occurrences of all alphanum words (up to 31-characters, and outputting any word rejected) using a simple array of integers to hold the frequency of length, you could do:
#include <stdio.h>
#include <string.h>
#define MAXLEN 32 /* if you need a constant, #define one (or more) */
#define ACCEPT "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
int main (void) {
char *mystring = "The quick Brown ? Fox ? jumps over the Lazy Dog "
"and the !##! LAZY DOG is still sleeping",
*p = mystring, /* pointer to mystring to parse */
buf[MAXLEN] = ""; /* temporary buffer to hold each word */
int nchar = 0, /* characters consumed by sscanf */
offset = 0, /* offset from beginning of mystring */
lenfreq[MAXLEN] = {0}; /* frequency array for word length */
/* loop over each word in mystring using sscanf and offset */
while (sscanf (p + offset, "%s%n", buf, &nchar) == 1) {
size_t len = strlen (buf); /* length of word */
offset += nchar; /* update offset with nchar */
/* use strspn to test that word is valid (alphanum) or get next word */
if (strspn (buf, ACCEPT) != len) {
fprintf (stderr, " error: rejecting \"%s\"\n", buf); /* optional */
continue;
}
lenfreq[len] += 1; /* update frequency array of lengths */
}
/* output original string */
printf ("\nOriginal Text:\n\n%s\n\n", mystring);
/* output length frequency array */
puts ("word length Occurrences\n"
"----------- -----------");
for (size_t i = 0; i < MAXLEN; i++) {
if (lenfreq[i])
printf ("%2zu%14s%d\n", i, " ", lenfreq[i]);
}
}
Example Use/Output
Compiling and running the program would produce:
$ ./bin/wordlen-freq
error: rejecting "?"
error: rejecting "?"
error: rejecting "!##!"
Original Text:
The quick Brown ? Fox ? jumps over the Lazy Dog and the !##! LAZY DOG is still sleeping
word length Occurrences
----------- -----------
2 1
3 7
4 3
5 4
8 1
(note: you can output all lengths from 0 to 31 even if there were no occurrences by removing the print condition if (lenfreq[i]) -- up to you)
Look things over and let me know if you have questions.
Quite recently, at the university, we began to study strings in the C programming language, and as a homework, I was given the task of writing a program to remove extra words.
While writing a program, I faced an issue with iteration through a string that I could solve in a hacky way. However, I would like to deal with the problem with your help, since I cannot find the error myself.
The problem is that when I use the strlen(buffer) function as a for-loop condition, the code compiles easily and there are no errors at runtime, although when I use the __act_buffer_len variable, which is assigned a value of strlen(buffer) there will be a segmentation fault at runtime.
I tried many more ways to solve this problem, but the only one, which I already described, worked for me.
// deletes words with <= 2 letters
char* _delete_odd(const char* buffer, char delim)
{
int __act_buffer_len = strlen(buffer);
// for debugging purposes
printf("__actbuff: %d\n", __act_buffer_len);
printf("sizeof: %d\n", sizeof(buffer));
printf("strlen: %d\n", strlen(buffer));
char* _newbuff = malloc(__act_buffer_len + 1); // <- new buffer without words with less than 2 unique words
char* _tempbuff; // <- used to store current word
int beg_point = 0;
int curr_wlen = 0;
for (int i = 0; i < strlen(buffer); i++) // no errors at runtime, app runs well
// for (int i = 0; i < __act_buffer_len; i++) // <- segmentation fault when loop is reaching a space character
// for (int i = 0; buffer[i] != '\0'; i++) // <- also segmentation fault at the same spot
// for (size_t i = 0; i < strlen(buffer); i++) // <- even this gives a segmentation fault which is totally confusing for me
{
printf("strlen in loop %d\n", i);
if (buffer[i] == delim)
{
char* __cpy;
memcpy(__cpy, &buffer[beg_point], curr_wlen); // <- will copy a string starting from the beginning of the word til its end
// this may be commented for testing purposes
__uint32_t __letters = __get_letters(__cpy, curr_wlen); // <- will return number of unique letters in word
if (__letters > 2) // <- will remove all the words with less than 2 unique letters
{
strcat(_newbuff, __cpy);
strcat(_newbuff, " ");
}
beg_point = i + 1; // <- will point on the first letter of the word
curr_wlen = buffer[beg_point] == ' ' ? 0 : 1; // <- if the next symbol after space is another space, than word length should be 0
}
else curr_wlen++;
}
return _newbuff;
}
In short, the code above just finds delimiter character in string and counts the number of unique letters of the word before this delimiter.
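My mistake was using the __cpy variable without initializing it: it was a pointer that did not point to any allocated memory, so memcpy wrote through an indeterminate address.
Also, as #n.1.8e9-where's-my-sharem. stated, I shouldn't name variables with two leading underscores, since such identifiers are reserved for the implementation.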
The final code:
// deletes words with <= 2 letters
char* _delete_odd(const char* buffer, char delim)
{
size_t _act_buffer_len = strlen(buffer);
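char* _newbuff = malloc(_act_buffer_len + 1); // <- new buffer without words with less than 2 unique words; +1 for the terminating '\0'
_newbuff[0] = '\0'; // <- strcat needs an existing (empty) string to append to
int beg_point = 0;
int curr_wlen = 0;
for (size_t i = 0; i < _act_buffer_len; i++)
{
if (buffer[i] == delim)
{
char* _cpy = malloc(curr_wlen + 1); // <- +1 for the terminating '\0'
memcpy(_cpy, &buffer[beg_point], curr_wlen); // <- will copy a string starting from the beginning of the word til its end
_cpy[curr_wlen] = '\0'; // <- memcpy does not terminate the copy, and strcat needs a terminated string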
// this may be commented for testing purposes
__uint32_t _letters = _get_letters(_cpy, curr_wlen); // <- will return number of unique letters in word
if (_letters > 2) // <- will remove all the words with less than 2 unique letters
strcat(_newbuff, _cpy);
beg_point = i + 1; // <- will point on the first letter of the word
curr_wlen = buffer[beg_point] == ' ' ? 0 : 1; // <- if the next symbol after space is another space, than word length should be 0
free(_cpy);
}
else curr_wlen++;
}
return _newbuff;
}
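For comparison, here is a sketch of a variant that avoids the temporary per-word buffer and strcat altogether. It is not the code above: the function name keep_long_words is hypothetical, and because _get_letters is not shown, the sketch filters on plain word length rather than on the number of unique letters.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant: return a newly allocated string holding only the words
 * of `buffer` that have more than `min_len` characters, joined by `delim`.
 * A word is a run of characters other than `delim`.
 * Returns NULL if the allocation fails; the caller frees the result. */
char *keep_long_words(const char *buffer, char delim, size_t min_len)
{
    size_t total = strlen(buffer);
    char *out = malloc(total + 1);  /* the result is never longer than the input */
    if (out == NULL)
        return NULL;

    size_t out_len = 0;
    size_t start = 0;               /* index where the current word begins */
    for (size_t i = 0; i <= total; i++) {
        /* the terminating '\0' ends the last word just like a delimiter does */
        if (buffer[i] == delim || buffer[i] == '\0') {
            size_t wlen = i - start;
            if (wlen > min_len) {
                if (out_len > 0)
                    out[out_len++] = delim;
                memcpy(out + out_len, buffer + start, wlen);
                out_len += wlen;
            }
            start = i + 1;
        }
    }
    out[out_len] = '\0';
    return out;
}
```

Tracking the output length explicitly means every write goes to a known position, so nothing depends on the buffers already being null-terminated.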
Thanks for helping me
I am looking at a program that finds the frequency of strings entered. Comparison is made based on a string's ASCII value against the ASCII value of lowercase 'a'. I have implemented it; it works, albeit, with a bug, but essentially, I am ignorant of a particular line of code;
for (int i = 0; i < strlen(arr2); i++)
{
// this line...
arr1[ arr2[i] - 'a' ]++;
}
arr1 is arr1[26] = {0},
that is, all the letters of the alphabet are assigned an index and the array is initialised to zero, while arr2[] as a function argument, receives the stdin.
How does the mysterious line of code work and what is it saying?
The full code:
#include <stdio.h>
#include <string.h>
#define ALEPH 26
void freq(char arr2[]);
int main ()
{
char * str;
printf("\nCharacter Frequency\n"
"--------------------\n");
// user input
printf("\nEnter a string of characters:\n");
fgets(str, ALEPH, stdin);
freq(str);
return 0;
}
// Function Definiton
void freq (char arr2[])
{
// array for ascii characters initialised to 0
int arr1[ALEPH] = {0};
// scan and cycle through the input array
for (int i = 0; i < strlen(arr2); i++)
{
arr1[ arr2[i] - 'a' ]++;
}
for (int j = 0; j < 26; j++)
{
if ( arr1[j] != 0 )
{
printf("\nCharacter: %c - Frequency: %d", 'a'+j, arr1[j]);
}
}
printf("\n");
}
arr1 is an array of 26 ints initialized to 0s. The indexes of its elements are 0..25.
arr2 is assumed to be a string of lowercase letters 'a'..'z' only. The characters are assumed to be using an encoding where lowercase letters are single-byte and sequential in value, such as ASCII (where a=97, ..., z=122). Anything else that does not match these assumptions will cause undefined behavior in this code.
The code loops through arr2, and for each character, calculates an index by subtracting the numeric value of 'a' (ie, ASCII 97) from the character's numeric value:
'a' - 'a' = 97 - 97 = 0
'b' - 'a' = 98 - 97 = 1
...
'z' - 'a' = 122 - 97 = 25
Then the code accesses the arr1 element at that index, and increments that element's value by 1.
You ask about the line:
arr1[ arr2[i] - 'a' ]++;
In this line:
arr1 is the array that will accumulate the histogram
arr2 is the input string which will contribute to the histogram
i is the index into input string.
This can be rewritten as:
ch = arr2[i];
histogram_slot = ch - 'a';
arr1[histogram_slot ] = arr1[histogram_slot ] + 1;
For each character in the input string, the character is fetched from the string and assigned to "ch". "ch" is converted to the index in the histogram array by subtracting 'a'. In the third line, the histogram_slot is increased by one. histogram_slot 0 is incremented for 'a', 1 for 'b', 2 for 'c', ... , and 25 for 'z'.
A serious bug in this code is that it only works for the lower case letters. An upper case letter, digit, punctuation, Unicode, extended ASCII, or any character not between 'a' and 'z' inclusive will write in an unintended region of memory. At best, this will cause an unexpected crash. In the medium disaster, it will cause sporadic malfunction that gets through your testing. In the worst case, it creates a security hole allowing someone uncontrolled access to your stack, and thus the ability to take over execution of the thread.
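One way to guard against that, sketched under the same assumption of an encoding where 'a' to 'z' are contiguous (as in ASCII), is to check the range before indexing. The function name letter_histogram is hypothetical.

```c
#include <stddef.h>

#define ALPHABET 26

/* Hypothetical helper: count the lowercase letters 'a'..'z' of `s` into
 * counts[0..25] and ignore every other character, including the '\n' that
 * fgets() leaves in the buffer.
 * Returns the number of characters that were ignored. */
size_t letter_histogram(const char *s, int counts[ALPHABET])
{
    size_t ignored = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        char ch = s[i];
        if (ch >= 'a' && ch <= 'z')
            counts[ch - 'a']++;     /* index is guaranteed to be 0..25 here */
        else
            ignored++;              /* would have indexed outside counts */
    }
    return ignored;
}
```

Iterating until the terminating '\0' also avoids calling strlen() on every pass through the loop.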
I am attempting to write a program that accepts grammatically incorrect text (under 990 characters in length) as input, corrects it, and then returns the corrected text as output. I attempted to run the program using the online compiler, "ideone", but it returned quite a few errors that I don't quite understand. I have posted my code, as well as a picture of the errors below. Can anybody explain to me what exactly the errors mean?
#include "stdio.h"
char capitalize(int i); //prototype for capitalize method
int main(void)
{
char userInput[1200]; //Array of chars to store user input. Initialized to 1200 to negate the possibility of added characters filling up the array.
int i; //Used as a counter for the for loop below.
int j; //Used as a counter for the second for loop within the first for loop below.
int numArrayElements;
printf("Enter your paragraphs: ");
scanf("%c", &userInput); //%c used since chars are expected as input(?)
numArrayElements = sizeof(userInput) / sizeof(userInput[0]); //stores the number of elements in the array into numArrayElements.
if (userInput[0] >= 97 && userInput[0] <= 122) //Checks the char in index 0 to see if its ascii value is equal to that of a lowercase letter. If it is, it is capitalized.
userInput[0] = capitalize(userInput[0]);
//code used to correct input should go here.
for (i = 1; i < numArrayElements; i++) //i is set to 1 here because index 0 is taken care of by the if statement above this loop
{
if (userInput[i] == 32) //checks to see if the char at index i has the ascii value of a space.
if (userInput[i + 1] == 32 && userInput[i - 1] != 46) //checks the char at index i + 1 to see if it has the ascii value of a space, as well as the char at index i - 1 to see if it is any char other than a period. The latter condition is there to prevent a period from being added if one is already present.
{
for (j = numArrayElements - 1; j > (i - 1); j--) //If the three conditions above are satisfied, all characters in the array at location i and onwards are shifted one index to the right. A period is then placed within index i.
userInput[j + 1] = userInput[j];
userInput[i] = 46; //places a period into index i.
numArrayElements++; //increments numArrayElements to reflect the addition of a period to the array.
if (userInput[i + 3] >= 97 && userInput[i + 3] <= 122) //additionally, the char at index i + 3 is examined to see if it is capitalized or not.
userInput[i + 3] = capitalize(userInput[i + 3]);
}
}
printf("%c\n", userInput); //%c used since chars are being displayed as output.
return 0;
}
char capitalize(char c)
{
return (c - 32); //subtracting 32 from a lowercase char should result in it gaining the ascii value of its capitalized form.
}
Your code has several problems, quite typical for a beginner. The answer to the question in your last comment lies in the way scanf() works: it takes everything between whitespace characters as a token, so it just ends after hey. I commented the code for the rest of the problems I found without being too nitpicky. The comments below this post might do it if they feel so.
#include "stdlib.h"
#include "stdio.h"
#include <string.h>
// Check for ASCII (spot-checks only).
// It will not work for encodings that are very close to ASCII but do not earn the
// idiomatic cigar for it but will fail for e.g.: EBCDIC
// (No check for '9' because non-consecutive digits are forbidden by the C-standard)
#if ('0' != 0x30) || ('a' != 0x61) || ('z' != 0x7a) || ('A' != 0x41) || ('Z' != 0x5a)
#error "Non-ASCII input encoding found, please change code below accordingly."
#endif
#define ARRAY_LENGTH 1200
// please put comments on top, not everyone has a 4k monitor
//prototype for capitalize method
char capitalize(char i);
int main(void)
{
//Array of chars to store user input.
// Initialized to 1200 to negate the possibility of
// added characters filling up the array.
// added one for the trailing NUL
char userInput[ARRAY_LENGTH + 1];
// No need to comment counters, some things can be considered obvious
// as are ints called "i", "j", "k" and so on.
int i, j;
int numArrayElements;
// for returns
int res;
printf("Enter your paragraphs: ");
// check returns. Always check returns!
// (there are exceptions if you know what you are doing
// or if failure is unlikely under normal circumstances (e.g.: printf()))
// scanf() will read everything that is not a newline up to 1200 characters
res = scanf("%1200[^\n]", userInput);
if (res != 1) {
fprintf(stderr, "Something went wrong with scanf() \n");
exit(EXIT_FAILURE);
}
// you have a string, so use strlen()
// numArrayElements = sizeof(userInput) / sizeof(userInput[0]);
// the return type of strlen() is size_t, hence the cast
numArrayElements = (int) strlen(userInput);
// Checks the char in index 0 to see if its ascii value is equal
// to that of a lowercase letter. If it is, it is capitalized.
// Do yourself a favor and use curly brackets even if you
// theoretically do not need them. The single exception being "else if"
// constructs where it looks more odd if you *do* place the curly bracket
// between "else" and "if"
// don't use the numerical value here, use the character itself
// Has the advantage that no comment is needed.
// But you still assume ASCII or at least an encoding where the characters
// are encoded in a consecutive, gap-less way
if (userInput[0] >= 'a' && userInput[0] <= 'z') {
userInput[0] = capitalize(userInput[0]);
}
// i is set to 1 here because index 0 is taken care of by the
// if statement above this loop
for (i = 1; i < numArrayElements; i++) {
// checks to see if the char at index i has the ascii value of a space.
if (userInput[i] == ' ') {
// checks the char at index i + 1 to see if it has the ascii
// value of a space, as well as the char at index i - 1 to see
// if it is any char other than a period. The latter condition
// is there to prevent a period from being added if one is already present.
if (userInput[i + 1] == ' ' && userInput[i - 1] != '.') {
// If the three conditions above are satisfied, all characters
// in the array at location i and onwards are shifted one index
// to the right. A period is then placed within index i.
// you need to include the NUL at the end, too
for (j = numArrayElements; j > (i - 1); j--) {
userInput[j + 1] = userInput[j];
}
//places a period into index i.
userInput[i] = '.';
// increments numArrayElements to reflect the addition
// of a period to the array.
// numArrayElements might be out of bounds afterwards, needs to be checked
numArrayElements++;
if (numArrayElements > ARRAY_LENGTH) {
fprintf(stderr, "numArrayElements %d out of bounds\n", numArrayElements);
exit(EXIT_FAILURE);
}
// additionally, the char at index i + 3 is examined to see
// if it is capitalized or not.
// The loop has the upper limit at numArrayElements
// i + 3 might be out of bounds, so check
if (i + 3 > ARRAY_LENGTH) {
fprintf(stderr, "(%d + 3) is out of bounds\n",i);
exit(EXIT_FAILURE);
}
if (userInput[i + 3] >= 'a' && userInput[i + 3] <= 'z') {
userInput[i + 3] = capitalize(userInput[i + 3]);
}
}
}
}
printf("%s\n", userInput);
return 0;
}
char capitalize(char c)
{
// subtracting 32 from a lowercase char should result
// in it gaining the ascii value of its capitalized form.
return (c - ' ');
}
I have been working on some code that tokenizes a string from a line and then creates a temp array to copy the string into it (called copy[]) and it is filled with 0's initially (The end game is to split this copy array into temp arrays of length 4 and store them in a struct with a field char* Value). For some reason my temp arrays of size 4 end up having a size of 6.
char* string = strtok(NULL, "\"");
printf("%s", string);
int len = (int)strlen(string);
while(len%4 != 0) {
len++;
}
char copy[len];
for(int i = 0; i < len; i++){
copy[i] = '0';
}
printf("%s\n", copy);
int copyCount = 0;
int tmpCount = 0;
char temp[4];
while (copyCount < len) {
if(tmpCount == 4) {
tmpCount = 0;
}
while(tmpCount < 4) {
temp[tmpCount] = copy[copyCount];
tmpCount++;
copyCount++;
}
printf("%s %d\n", temp, (int)strlen(temp));
}
This yields:
This is the end
0000000000000000
This is the end0
This� 6
is � 6
the � 6
end0� 6
And should yield:
This is the end
0000000000000000
This is the end0
This 4
is 4
the 4
end0 4
I've been messing around with this for a while and can't seem to figure out why it's making temp have a length of 6 when I set it to 4. Also, I'm not sure where the random values are coming from. Thanks!
The reason is that your string temp is not null-terminated. C-style strings must end with a \0 character, and temp has room for exactly 4 chars, all of which hold data. Reading past the end of the array is undefined behavior; here, by luck, a \0 happens to sit in memory two bytes past the end of temp, so strlen counts 6 characters before it stops. This is also why printf prints garbage: it keeps printing until it finds a null terminator, and the bytes between the end of temp and that terminator are whatever happens to be in memory. To fix it, declare temp with 5 elements and set temp[4] = '\0' before using it as a string.
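A chunk copy that always terminates its output might look like the sketch below. The helper name copy_chunk and the CHUNK constant are assumptions for illustration. The same rule applies to copy in the question, which is also printed with %s and would need one extra element holding '\0'.

```c
#include <stddef.h>

#define CHUNK 4

/* Hypothetical helper: copy up to CHUNK characters of `src`, starting at
 * `offset`, into `dst` and terminate the result with '\0'.
 * `dst` must have room for CHUNK + 1 bytes, and `offset` must not be greater
 * than the length of `src`.
 * Returns the number of characters copied (fewer than CHUNK only at the end). */
size_t copy_chunk(char dst[CHUNK + 1], const char *src, size_t offset)
{
    size_t n = 0;
    while (n < CHUNK && src[offset + n] != '\0') {
        dst[n] = src[offset + n];
        n++;
    }
    dst[n] = '\0';  /* the extra byte is what makes dst a valid C string */
    return n;
}
```

With a char temp[CHUNK + 1] filled this way, strlen(temp) reports 4 for every full chunk and printf("%s") stops where it should.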