Counting character usage in a text file in C

Hi,
I need to count how often each alphabetical character occurs in a plain text file. This is what I have come up with. Basically, it runs through the text file and compares each character with the ASCII value of the character being searched for.
When I run it, all I see is the first printf() string, followed by a terminated-status error when I close the console.
I do have a text.txt file in the same folder as the .exe file, but I can't see anything.
I'm not sure whether only my syntax is bad or the semantics too.
Thx for help! :-)
#include <stdio.h>
#include <stdlib.h>
#define ASCIIstart 65
#define ASCIIend 90
void main(){
FILE *fopen(), *fp;
int c;
unsigned int sum;
fp = fopen("text.txt","r");
printf("Characters found in text: \n");
for (int i = ASCIIstart; i <= ASCIIend; i++){
sum = 0;
c = toupper(getc(fp));
while (c != EOF){
if (c == i){
sum = sum++;
}
c = toupper(getc(fp));
}
if (sum > 0){
printf("%c: %u\n",i,sum);
}
}
fclose(fp);
}

Instead of scanning the entire file once for each character, you could do
FILE *fp;
int c, sum[ASCIIend - ASCIIstart + 1]={0};
fp = fopen("file.txt","r");
if(fp==NULL)
{
perror("Error");
return 1;
}
int i;
while( (c = toupper(getc(fp)))!= EOF)
{
if(c>=ASCIIstart && c<=ASCIIend)
{
sum[c-ASCIIstart]++;
}
}
for(i=ASCIIstart; i<=ASCIIend; ++i)
{
printf("\n%c: %d", i, sum[i-ASCIIstart]);
}
You must check the return value of fopen() to ensure that the file was successfully opened.
There's an array sum which holds the number of occurrences of each character within the range given by the ASCIIstart and ASCIIend macros.
The size of the array is just the number of characters whose number of occurrences is to be counted.
sum[c-ASCIIstart] is used because the difference between the ASCII value (if the encoding is indeed ASCII) of c and ASCIIstart would give the index associated with c.
I don't know what you meant by FILE *fopen(), *fp; but fopen() is the name of a standard C function used to open files.
And by
FILE *fopen(), *fp;
you gave a prototype of a function fopen().
But in stdio.h, there's already a prototype for fopen() like
FILE *fopen(const char *path, const char *mode);
yet no errors (if that was the case) were shown, because in C standards before C23 an empty parameter list as in fopen() says nothing about the function's parameters, so your declaration is compatible with the real prototype. Have a look here.
Had the return type in your FILE *fopen(); declaration not been FILE *, or had you listed conflicting parameter types such as int, you would definitely have got an error.
And, void main() is not considered good practice. Use int main() instead. Look here.

You can use an array of counters, fill it in a single pass over the file, and print the counts at the end.
#include <stdio.h>
#include<ctype.h>
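int main(void){
FILE *fp;
int c;
fp = fopen("test.txt","r");
if (fp == NULL){
perror("test.txt"); /* the file could not be opened */
return 1;
}
printf("Characters found in text: \n");
int charArr[26]= {0}; /* int counters: a char would overflow quickly */
c = toupper(fgetc(fp));
while(c!=EOF) {
if (c >= 'A' && c <= 'Z') /* skip anything that is not a letter */
charArr[c-'A']=charArr[c-'A']+1;
c = toupper(fgetc(fp));
}
fclose(fp);
for(int i=0;i<26;i++){
printf("\nChar: %c | Count= %d ",i+65,charArr[i]);
}
return 0;
}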
Hope this helps!!

Because after the first pass of the for loop you are at the end of the file,
and your c = toupper(getc(fp)); returns EOF (usually -1) from then on.

For counting just one character, you are reading the whole file and repeating this for each and every character. Instead, you can do:
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#define ASCIIstart 65
#define ASCIIend 90
int main(){
FILE *fp;
int c, i;
int alphabets[26] = {0};
fp = fopen("text.txt","r");
if (fp == NULL){
fprintf (stderr, "Failed to open file\n");
return -1;
}
while ((c = toupper(fgetc(fp))) != EOF){
if (c >= ASCIIstart && c <= ASCIIend)
alphabets[c - ASCIIstart]++;
}
fclose(fp);
fprintf(stdout, "Characters found in text: \n");
for (i = 0; i < 26; i++)
fprintf (stdout, "%c: %d\n", i+ASCIIstart, alphabets[i]);
return 0;
}

TLDR
Working with your code, your loops are inside-out.
I'll answer in pseudo-code to keep the concepts straightforward.
Right now you are doing this:
FOR LETTER = 'A' TO 'Z':
WHILE FILE HAS CHARACTERS
GET NEXT CHARACTER
IF CHARACTER == LETTER
ADD TO COUNT FOR CHAR
END IF
END WHILE
END FOR
The problem is you are running through the file with character 'A' and then reaching the end of file so nothing gets done for 'B'...'Z'
If you swap the two loops, you get this:
WHILE FILE HAS CHARACTERS
GET NEXT CHARACTER
FOR LETTER = 'A' TO 'Z'
IF LETTER = UCASE(CHARACTER)
ADD TO COUNT FOR LETTER
END IF
END FOR
END WHILE
Obviously, doing 26 checks for every character read is too much, so a better approach is perhaps:
LET COUNTS = ARRAY(26)
WHILE FILE HAS CHARACTERS
GET NEXT CHARACTER
CHARACTER := UCASE(CHARACTER)
IF CHARACTER >= 'A' AND CHARACTER <= 'Z'
LET INDEX = CHARACTER - 'A'
COUNTS[INDEX]++
END IF
END WHILE
You can translate the pseudo code to C as an exercise.

Rewind the pointer to the beginning of the file at the end of your for loop?
This has been posted before: Resetting pointer to the start of file
P.S. - maybe use an array for your output values, e.g. int charactercount[UCHAR_MAX + 1] (UCHAR_MAX comes from <limits.h>), with one counter per possible character value, so that you don't have to parse the file repeatedly?
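A minimal sketch of the rewind idea, assuming a hypothetical helper name, count_letter, that is not part of the question's code. Each call scans the whole stream for one letter and then calls rewind() so that the next call starts from the beginning again. This is still one pass over the file per letter, so the single-pass array approach is cheaper.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: counts how often `letter` (given in upper case)
 * occurs in fp, ignoring case, then resets fp to the start of the file. */
unsigned long count_letter(FILE *fp, int letter)
{
    unsigned long count = 0;
    int c; /* int, so that EOF can be told apart from real bytes */

    while ((c = getc(fp)) != EOF) {
        if (toupper(c) == letter)
            count++;
    }
    rewind(fp); /* also clears the end-of-file indicator */
    return count;
}
```

In the question's program, the body of the for loop would then reduce to sum = count_letter(fp, i);.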

Related

How do I run C code on linux with input file from command line?

I'm trying to do some simple tasks in C and run them from the command line in Linux.
I'm having some problems with both C and running the code from the command line with a given filename given as a parameter. I've never written code in C before.
Remove the even numbers from a file. The file name is transferred to
the program as a parameter in the command line. The program changes
this file.
How do I do these?
read from a file and write the results over the same file
read numbers and not digits from the file (ex: I need to be able to read "22" as a single input, not two separate chars containing "2")
give the filename through a parameter in Linux. (ex: ./main.c file.txt)
my attempt at writing the c code:
#include <stdio.h>
int main ()
{
FILE *f = fopen ("arr.txt", "r");
char c = getc (f);
int count = 0;
int arr[20];
while (c != EOF)
{
if(c % 2 != 0){
arr[count] = c;
count = count + 1;
}
c = getc (f);
}
for (int i=0; i<count; i++){
putchar(arr[i]);
}
fclose (f);
getchar ();
return 0;
}
Here's a complete program which meets your requirements:
write the results over the same file - It keeps a read and write position in the file and copies characters towards the file beginning in case numbers have been removed; at the end, the now shorter file has to be truncated. (Note that with large files, it will be more efficient to write to a second file.)
read numbers and not digits from the file - It is not necessary to read whole numbers, it suffices to store the write start position of a number (this can be done at every non-digit) and the parity of the last digit.
give the filename through a parameter - If you define int main(int argc, char *argv[]), the first parameter is in argv[1] if argc is at least 2.
#include <stdio.h>
#include <ctype.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
if (argc < 2) return 1; // no argument given
FILE *f = fopen(argv[1], "rb+");
if (!f) return 1; // if fopen failed
// read, write and number position
long rpos = 0, wpos = 0, npos = 0;
int even = 0, c; // int to hold EOF
while (c = getc(f), c != EOF)
{
if (isdigit(c)) even = c%2 == 0;
else
{
if (even) wpos = npos, even = 0;
npos = wpos+1; // next may be number
}
fseek(f, wpos++, SEEK_SET);
putc(c, f);
fseek(f, ++rpos, SEEK_SET);
}
ftruncate(fileno(f), wpos); // shorten the file
}
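For the large-file case mentioned above, here is a minimal sketch of the second-file variant. It assumes the numbers are whitespace-separated decimal integers, and copy_odd_numbers is a hypothetical name chosen for this example. The caller would open the input file and a temporary output file, and afterwards replace the original with rename().

```c
#include <stdio.h>

/* Hypothetical helper: reads whitespace-separated integers from `in`
 * and writes only the odd ones to `out`, one per line.
 * Returns the number of values written. Reading stops at the first
 * token that is not an integer, or at end of file. */
int copy_odd_numbers(FILE *in, FILE *out)
{
    int value;
    int written = 0;

    while (fscanf(in, "%d", &value) == 1) {
        if (value % 2 != 0) { /* also true for negative odd values */
            fprintf(out, "%d\n", value);
            written++;
        }
    }
    return written;
}
```

Because fscanf() with %d consumes a whole number at a time, "22" is read as the single value 22 rather than as two digits.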
I'd do that like this (removing extra declarations => micro optimizations)
/**
* Check if file is available.
*/
if (f == NULL)
{
printf("File is not available \n");
}
else
{
/**
* Populate array with the values that are not even.
*/
while ((ch = fgetc(f)) != EOF)
if (ch % 2 != 0) push(ch, &arr);
/**
* Write those values to the file (n is the element count kept by push).
*/
for (int i = 0; i < (int)n; i++)
fprintf(f, "%c", arr[i]);
}
Push implementation:
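/* n counts the elements stored in *arr; *arr must start out as NULL */
size_t n = 0;
void push(int el, int **arr)
{
/* grow the array by one slot; realloc(NULL, ...) behaves like malloc */
int *arr_temp = realloc(*arr, sizeof(int) * (n + 1));
if (arr_temp == NULL)
{
return; /* allocation failed: leave the array unchanged */
}
*arr = arr_temp;
(*arr)[n++] = el; /* append, so elements keep their file order */
}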
In order to read and write the same file without closing and reopening it, open it in an update mode. Note that w+ (writing and reading) clears the file's content on open, so to keep the existing data use r+ instead.
So, change the line where you open the file to this:
FILE *f = fopen ("arr.txt", "r+");
You should look for ways of implementing dynamic arrays (pointers and memory management).
With this example you could simply go ahead and write yourself, inside the main loop, a temporary variable that stores a sequence of numbers, and stack those values
Something like this (pseudocode, have fun :)):
DELIMITER one of (',' | '|' | '.' | etc);
char[] temp;
if(ch not DELIMITER)
push ch on temp;
else
push temp to arr and clear its content;
Hope this was useful.
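The pseudocode above can be sketched in C without a dynamic array by accumulating digits into an int until a non-digit is seen. This is only an illustration: parse_numbers is a hypothetical name, it treats every non-digit as a delimiter, and it ignores signs and overflow.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: extracts unsigned decimal numbers from text,
 * storing at most max of them in out. Returns how many were stored. */
size_t parse_numbers(const char *text, int *out, size_t max)
{
    size_t count = 0;
    int value = 0;
    int in_number = 0; /* nonzero while inside a run of digits */

    for (;; text++) {
        unsigned char ch = (unsigned char)*text;
        if (isdigit(ch)) {
            value = value * 10 + (ch - '0'); /* "22" becomes 22 */
            in_number = 1;
        } else {
            /* a delimiter (or the end of the string) ends the number */
            if (in_number && count < max)
                out[count++] = value;
            value = 0;
            in_number = 0;
            if (ch == '\0')
                break;
        }
    }
    return count;
}
```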

Reading and writing into a file in c

I need to write some uppercase strings into a file, then display them on screen in lowercase. After that, I need to write the new (lowercase) text into the file. I wrote some code, but it doesn't work. When I run it, my file seems to be untouched and the conversion to lowercase doesn't work.
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
void main(void) {
int i;
char date;
char text[100];
FILE *file;
FILE *file1;
file = fopen("C:\\Users\\amzar\\Desktop\\PC\\Pregatire PC\\Pregatire PC\\file\\da.txt","r");
file1 = fopen("C:\\Users\\amzar\\Desktop\\PC\\Pregatire PC\\Pregatire PC\\file\\da.txt","w");
printf("\nSe citeste fisierul si se copiaza textul:\n ");
if(file) {
while ((date = getc(file)) != EOF) {
putchar(tolower(date));
for (i=0;i<27;i++) {
strcpy(text[i],date);
}
}
}
if (file1) {
for (i=0;i<27;i++)
fprintf(file1,"%c",text[i]);
}
}
There are several problems with your program.
First, getc() returns int, not char. This is necessary so that it can hold EOF, as this is not a valid char value. So you need to declare date as int.
When you fix this, you'll notice that the program ends immediately, because of the second problem. This is because you're using the same file for input and output. When you open the file in write mode, that empties the file, so there's nothing to read. You should wait until after you finish reading the file before you open it for output.
The third problem is this line:
strcpy(text[i],date);
The arguments to strcpy() must be strings, i.e. pointers to null-terminated arrays of char, but text[i] and date are char (single characters). Make sure you have compiler warnings enabled -- that line should have warned you about the incorrect argument types. To copy single characters, just use ordinary assignment:
text[i] = date;
But I'm not really sure what you intend with that loop that copies date into every text[i]. I suspect you want to copy each character you read into the next element of text, not into all of them.
Finally, when you were saving into text, you didn't save the lowercase version.
Here's a corrected program. I've also added a null terminator to text, and changed the second loop to check for that, instead of hard-coding the length 27.
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
int main(void) {
int i = 0;
int date;
char text[100];
FILE *file;
FILE *file1;
file = fopen("C:\\Users\\amzar\\Desktop\\PC\\Pregatire PC\\Pregatire PC\\file\\da.txt","r");
printf("\nSe citeste fisierul si se copiaza textul:\n ");
if(file) {
while (i < (int)sizeof(text) - 1 && (date = getc(file)) != EOF) {
putchar(tolower(date));
text[i++] = tolower(date);
}
text[i] = '\0';
fclose(file);
} else {
printf("Can't open input file\n");
exit(1);
}
file1 = fopen("C:\\Users\\amzar\\Desktop\\PC\\Pregatire PC\\Pregatire PC\\file\\da.txt","w");
if (file1) {
for (i=0;text[i] != '\0';i++)
fprintf(file1,"%c",text[i]);
fclose(file1);
} else {
printf("Can't open output file\n");
exit(1);
}
}
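To illustrate the first point (why the variable receiving getc() must be an int), here is a small sketch with a hypothetical helper, count_bytes. With a char, a 0xFF byte can compare equal to EOF on platforms where char is signed and end the loop early; with an int, every byte is counted.

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes remaining in fp. */
long count_bytes(FILE *fp)
{
    long count = 0;
    int c; /* int holds every unsigned char value plus the distinct EOF */

    while ((c = getc(fp)) != EOF)
        count++;
    return count;
}
```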

Getting segmentation fault(core dumped)

My program has to encrypt/decrypt the textfile but I'm getting segmentation fault(core dumped) when I do this:
./program 9999 input.txt output.txt
The program takes every character from the input file and converts it based on the passed key. It compiles fine in CodeBlocks and does not give any errors. Could somebody tell me what's wrong with the code? Thanks!
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
//Checks if the input arguments are less than 2 (starting from 0)
//A user should enter 3 arguments so as not to reach this method
int badProgram(const char *const program){
printf("The program is missing some of the files!");
return -1;
}
//Encrypts the passed inputFile and
//Produces its output to the passed outputFile
//Key is also passed as the seeding number
int encryptFile(FILE *input, FILE *output){
char c;
char p;
int r = 0;
char p1 = 0;
char c1 = 0;
while((p = fgetc(input)) != EOF){
r = rand() % 97;
//change all displayable characters [0...96]
if(p == 't'){
p1 = 0;
}
else if(p == '\n'){
p1 = 1;
}
else{
p1 = p - 30;
}
c1 = p1 ^ r;//bitwise xor
if(c1 == 0){
c = 't';
}
else if(c1 == 1){
c = '\n';
}
else{
c = c1 + 30;
}
//Write
fprintf(output, "%c", c);
}
}
int main(int argc, char *argv[])
{
//Check the number of the entered arguments
if(argc < 2){
return badProgram(argv[0]);
}
else{
FILE *input;
FILE *output;
//Seed a number into Random Generator
int key = *argv[0];
srand(key);
input = fopen(argv[1], "r");
output = fopen(argv[2], "w");
encryptFile(input, output);
}
return 0;
}
The **input.txt** looks like this:
Hello, World!
Bye!
Couple of things that are wrong with your code:
int key = *argv[0]; is most likely not doing what you think it does. What it actually does is the following:
assign an ASCII value of the first character of the [0] argument (program name) to an int variable
It is likely that what you intended to do there is:
int key = atoi(argv[1]); // this converts "9999" into an int 9999
input = fopen(argv[1], "r"); opens a file named (in your case) "9999" for reading and fails. You never check for the error, so this causes a crash the moment you try to use the input FILE pointer. The fix is to use argv[2].
Similarly you should be using argv[3] for the output file
Your encryptFile function must return a value as it is declared int (don't know why you want to return a value from it as you never use it)
Once you fix the above issues your program no longer crashes
Update
A bit of explanation for the above issues and general info:
The argv lists all the input parameters as strings (char*) where the first ([0]) argument is the executable name and is not your first argument "after" the program name
One should always check the results of file operations as they are quite likely to fail during "normal" program operation
C/C++ doesn't "automatically" convert a string into an int (or a double, for that matter) but provides a whole set of functions to deal with numbers' parsing. Some examples of those functions are: 'atoi', 'atol', 'atof'
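As a hedged sketch of the argument handling described above: atoi() cannot report errors, so this example uses strtol() instead. parse_key is a hypothetical name; main would call it on argv[1] after checking that argc is 4 (program name, key, input file, output file).

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts text to an int key.
 * Returns 0 on success and -1 if text is not a valid int. */
int parse_key(const char *text, int *key)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || *end != '\0')   /* no digits, or trailing junk */
        return -1;
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return -1;                     /* does not fit in an int */
    *key = (int)value;
    return 0;
}
```

With this check in place, passing a file name where the key is expected is reported as an error instead of silently seeding the generator with 0.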

How to Read/Write UTF8 text files in C?

I am trying to read UTF-8 text from a text file and then print some of it to another file. I am using Linux and the gcc compiler. This is the code I am using:
#include <stdio.h>
#include <stdlib.h>
int main(){
FILE *fin;
FILE *fout;
int character;
fin=fopen("in.txt", "r");
fout=fopen("out.txt","w");
while((character=fgetc(fin))!=EOF){
putchar(character); // It displays the right character (UTF8) in the terminal
fprintf(fout,"%c ",character); // It displays weird characters in the file
}
fclose(fin);
fclose(fout);
printf("\nFile has been created...\n");
return 0;
}
It works for English characters for now.
Instead of
fprintf(fout,"%c ",character);
use
fprintf(fout,"%c",character);
The second fprintf() does not contain a space after %c which is what was causing out.txt to display weird characters. The reason is that fgetc() is retrieving a single byte (the same thing as an ASCII character), not a UTF-8 character. Since UTF-8 is also ASCII compatible, it will write English characters to the file just fine.
putchar(character) output the bytes sequentially without the extra space between every byte so the original UTF-8 sequence remained intact. To see what I'm talking about, try
while((character=fgetc(fin))!=EOF){
putchar(character);
printf(" "); // This mimics what you are doing when you write to out.txt
fprintf(fout,"%c ",character);
}
If you want to write UTF-8 characters with the space between them to out.txt, you would need to handle the variable length encoding of a UTF-8 character.
#include <stdio.h>
#include <stdlib.h>
/* The first byte of a UTF-8 character
* indicates how many bytes are in
* the character, so only check that
*/
int numberOfBytesInChar(unsigned char val) {
if (val < 128) {
return 1;
} else if (val < 224) {
return 2;
} else if (val < 240) {
return 3;
} else {
return 4;
}
}
int main(){
FILE *fin;
FILE *fout;
int character;
fin = fopen("in.txt", "r");
fout = fopen("out.txt","w");
while( (character = fgetc(fin)) != EOF) {
for (int i = 0; i < numberOfBytesInChar((unsigned char)character) - 1; i++) {
putchar(character);
fprintf(fout, "%c", character);
character = fgetc(fin);
}
putchar(character);
printf(" ");
fprintf(fout, "%c ", character);
}
fclose(fin);
fclose(fout);
printf("\nFile has been created...\n");
return 0;
}
This code worked for me:
/* fgetwc example */
#include <stdio.h>
#include <wchar.h>
#include <stdlib.h>
#include <locale.h>
int main ()
{
setlocale(LC_ALL, "en_US.UTF-8");
FILE * fin;
FILE * fout;
wint_t wc;
fin=fopen ("in.txt","r");
fout=fopen("out.txt","w");
while((wc=fgetwc(fin))!=WEOF){
// work with: "wc"
}
fclose(fin);
fclose(fout);
printf("File has been created...\n");
return 0;
}
If you do not wish to use the wide options, experiment with the following:
Read and write bytes, not characters.
Also known as, use binary, not text.
fgetc returns the next byte from the file as an unsigned char converted to int, so store the result in an int rather than a char; otherwise bytes greater than 127 may be misread.
fputc likewise takes an int and writes it as an unsigned char, so pass it the int you got from fgetc rather than a char.
Also, in the open mode, try using binary, so try rb & wb rather than r & w
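A minimal sketch of the byte-oriented approach above, assuming a hypothetical helper named copy_bytes. Because whole bytes are copied unchanged, multi-byte UTF-8 sequences survive intact; the streams would normally be opened with "rb" and "wb".

```c
#include <stdio.h>

/* Hypothetical helper: copies every byte from in to out unchanged.
 * Returns the number of bytes copied, or -1 on a read or write error. */
long copy_bytes(FILE *in, FILE *out)
{
    long copied = 0;
    int c; /* int, so bytes above 127 are not confused with EOF */

    while ((c = fgetc(in)) != EOF) {
        if (fputc(c, out) == EOF)
            return -1;
        copied++;
    }
    return ferror(in) ? -1 : copied;
}
```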
The C-style solution is very insightful, but if you'd consider using C++ the task becomes much more high level and it does not require you to have so much knowledge about utf-8 encoding. Consider the following:
#include <iostream>
#include <fstream>
#include <locale>
using namespace std;
int main(){
wifstream input { "in.txt" };
wofstream output { "out.txt" };
// Look out - this part is not portable to windows
locale utf8 {"en_US.UTF-8"};
input.imbue(utf8);
output.imbue(utf8);
wcout.imbue(utf8);
wchar_t c;
while(input >> noskipws >> c) {
wcout << c;
output << c;
}
return 0;
}

reading text file, copying into array in C

The code is supposed to read a user-inputted text file name, copy every character into a multidimensional array, then display it with standard output. It compiles, but produces unintelligible text. Am I missing something?
for (i = 0; i < BIGGEST; i++) {
for (j = 0; j < BIGGESTL; j++) {
if ((c = fgetc(fp)) != EOF)
array[i][j] = c;
else array[i][j] = '\0';
}
}
fclose(fp);
return 0;
}
You stop filling the array when you encounter EOF, but you print the full array out no matter what.
If the data read from the file is smaller than the input array, you will read that data in and then print that data out, plus whatever random characters were in the memory locations that you do not overwrite with data from the file.
Since the requirement seems to be to print text data, you could insert a special marker in the array (e.g. '\0') to indicate the position where you encountered EOF, and stop displaying data when you reach that marker.
It would be better to read the file line by line.
For example:
int i = 0;
while(fgets(text[i],1000,fp))
{
i++;
}
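A slightly fuller sketch of the line-by-line idea, with a bound on the number of lines so that the array cannot overflow. MAX_LINES, LINE_LEN and read_lines are hypothetical names chosen for this example.

```c
#include <stdio.h>

#define MAX_LINES 100   /* assumed limits, adjust as needed */
#define LINE_LEN  1000

/* Hypothetical helper: reads up to MAX_LINES lines from fp into text.
 * Returns the number of rows filled; only those rows hold valid data.
 * A line longer than LINE_LEN - 1 characters spills into the next row. */
size_t read_lines(FILE *fp, char text[MAX_LINES][LINE_LEN])
{
    size_t count = 0;

    while (count < MAX_LINES && fgets(text[count], LINE_LEN, fp) != NULL)
        count++;
    return count;
}
```

When printing, loop only over the first count rows (for example with fputs(text[i], stdout)), so that rows which were never filled are not displayed.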
Although the question has been edited and only part of the code is left in it, I am posting more than the question currently requires.
The reason is that numerous improvements can be made to the originally posted full code.
In main() function:
You need to check that argc equals 2 and only then read argv[1]. Otherwise, if the program is executed without the command-line argument (the file name in this case), argv[1] is a null pointer, and using it as a file name typically results in a segmentation fault.
In read_and_show_the_file() function:
Stop reading the file when the end of file is reached or when the maximum number of characters has been read and stored in the character array.
The program below will help you visualize this:
#include <stdio.h>
/*Max number of characters to be read/write from file*/
#define MAX_CHAR_FOR_FILE_OPERATION 1000000
int read_and_show_the_file(char *filename)
{
FILE *fp;
static char text[MAX_CHAR_FOR_FILE_OPERATION]; /* static: too large for the stack on some systems */
int i, c;
fp = fopen(filename, "r");
if(fp == NULL)
{
printf("File Pointer is invalid\n");
return -1;
}
//Ensure array write starts from beginning
i = 0;
//Read file contents until either EOF is reached or the array is full, keeping one slot for the terminator
while( (i < MAX_CHAR_FOR_FILE_OPERATION - 1) && ((c = fgetc(fp)) != EOF) )
{
text[i++] = (char)c;
}
text[i] = '\0';
//Ensure array read starts from beginning
i = 0;
while((text[i] != '\0') && (i<MAX_CHAR_FOR_FILE_OPERATION) )
{
printf("%c",text[i++]);
}
fclose(fp);
return 0;
}
int main(int argc, char *argv[])
{
if(argc != 2)
{
printf("Execute the program along with file name to be read and printed. \n\
\rFormat : \"%s <file-name>\"\n",argv[0]);
return -1;
}
char *filename = argv[1];
if( (read_and_show_the_file(filename)) == 0)
{
printf("File Read and Print to stdout is successful\n");
}
return 0;
}
