Find instance of different character in a file - c

In this program, I want to print out how many times each different character occurs in a file. Each line of output will contain three values: the number of occurrences, the hex code of the character, and the character itself. Can someone help me with this? I am stuck!
Results of program should be something like this:
10 instances of character 0x4f (O)
10 instances of character 0x57 (W)
10 instances of character 0x59 (Y)
2 instances of character 0x61 (a)
18 instances of character 0x63 (c)
16 instances of character 0x64 (d)
//here is my program.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
const char FILE_NAME[] = "input.txt";
int main(argc, *argv[]) {
char temp;
char count[255];
FILE *in_file;
int ch;
fp = fopen(FILE_NAME, "r");
if (in_file == NULL) {
printf("Can not open %s \n", FILE_NAME);
exit(0);
}
while (!feof(fp)) {
ch = fgetc(fp);
if(strchr(count, ch)!= NULL)
{
}
}
printf("%d instance of character (%c)", count);
fclose(in_file);
return (0);
}

Here's what you want (based on your code, with many comments by me):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h> // you need this to use isupper() and islower()
const char FILE_NAME[] = "input.txt";
int main(int argc,char *argv[]) {
int temp; // int, not char: fgetc() returns an int so that EOF is distinct from every character
unsigned count[52] = {0}; // An array to store 52 kinds of chars
FILE *fp;
int i;
fp = fopen(FILE_NAME, "r");
if (fp == NULL) {
printf("Can not open %s \n", FILE_NAME);
exit(EXIT_FAILURE); // a nonzero status tells the caller that something went wrong
}
while((temp = fgetc(fp)) != EOF) { // use this to detect eof
if(isupper(temp))
count[26+(temp-'A')]++; // capital letters count stored in 26-51
if(islower(temp))
count[temp-'a']++; // lower letters count stored in 0-25
}
fclose(fp); // When you don't need it anymore, close it immediately.
for(i = 0; i < 26; i++)
if(count[i])
printf("%u instance of character 0x%x (%c)\n", count[i], 'a'+i, 'a'+i); // %u matches the unsigned counts
for(; i < 52; i++)
if(count[i])
printf("%u instance of character 0x%x (%c)\n", count[i], 'A'+i-26, 'A'+i-26);
return (0);
}

Your array count is not a string, so using strchr() on it is not a good idea. Also, it's of type char, so it has very limited range for larger files.
You should probably use something like unsigned long count[256]. Make sure to initialize the counts to 0 before starting.
Also, don't use feof(). Just loop calling fgetc() until the returned character (which, correctly, has type int) is EOF. Cast it to something positive before using it to index into count for the increment.

Related

How to use fscanf to read a text file including many words and store them into a string array by index

The wordlist.txt is including like:
able
army
bird
boring
sing
song
I want to use fscanf() to read this text file line by line and store each word in a string array, indexed by word, like this:
src = [able army bird boring sing song]
where src[0] = "able", src[1] = "army" and so on. But my code only outputs src[0] = "a", src[1] = "b"... Could someone help me figure out what's going wrong in my code?
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[])
{
FILE *fp = fopen("wordlist.txt", "r");
if (fp == NULL)
{
printf("%s", "File open error");
return 0;
}
char src[1000];
for (int i = 0; i < sizeof(src); i++)
{
fscanf(fp, "%[^EOF]", &src[i]);
}
fclose(fp);
printf("%c", src[0]);
getchar();
return 0;
}
Much appreciated!
For example like this.
#include <stdio.h>
#include <string.h>
#include <errno.h>
#define MAX_ARRAY_SIZE 1000
#define MAX_STRING_SIZE 100
int main(int argc, char *argv[]) {
FILE *fp = fopen("wordlist.txt", "r");
if (fp == NULL) {
printf("File open error\n");
return 1;
}
char arr[MAX_ARRAY_SIZE][MAX_STRING_SIZE];
int index = 0;
while (1) {
int ret = fscanf(fp, "%99s", arr[index]); /* width = MAX_STRING_SIZE - 1, so long words cannot overflow */
if (ret != 1) break; /* EOF or a failed conversion */
++index;
if (index == MAX_ARRAY_SIZE) break;
}
fclose(fp);
for (int i = 0; i < index; ++i) {
printf("%s\n", arr[i]);
}
getchar();
return 0;
}
Some notes:
If there is an error, it is better to return 1 and not 0, for 0 means successful execution.
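A single string is stored in a char array (or accessed through a char pointer). An array of strings needs a second dimension: a two-dimensional char array as above, or a pointer to pointers (char **). They are a bit tricky to get used to, but they are handy.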
Also, a check of the return value of the fscanf would be great.
For fixed size arrays, it is useful to define the sizes using #define so that it is easier to change later if you use it multiple times in the code.
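Your code reads into a single char array, so each src[i] holds one character rather than one word; that is why src[0] prints as "a". (As an aside, sizeof('a') is typically 4 because a character constant has type int in C.) Note also that "%[^EOF]" does not mean "until end of file": it is a scanset that matches every character except 'E', 'O' and 'F'. One approach is to read one character at a time and treat each space or newline as the end of a word, saving the characters before it as a separate word. If a single array is needed afterwards, the words can be concatenated with spaces in between.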

Using fscanf with a while loop to store a char and int

I am trying to use fscanf to read through a file of hex numbers in which each line is either a char followed by a number or just a number with no char. The fscanf appears to work for the first line of the file, but that's it.
FILE
E10
20
22
18
E10
210
12
CODE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
int main(int argc, char** argv) {
FILE * iFile;
char instr;
unsigned long long int address;
iFile = fopen("addresses.txt", "r");
if(iFile != NULL){
while (fscanf(iFile, "%c%x", &instr, &address) > 0){
printf("%c", instr); //This just works for the first line
}
}
fclose(iFile);
return 0;
}
In such cases, where you will be trying to parse the same line many times, it is better to read the line into memory, and then deal with the data in memory, rather than on disk.
char line[MAXLINE];
fgets(line, MAXLINE, iFile);
Then you have what I call the "sscanf ladder," which is a series of if-else if clauses, each of which attempts to parse line in different ways. The condition would examine the return value of sscanf, since the number of objects successfully read is the return value. So we use this number to distinguish among several different formats:
if (sscanf(line, "%c%llx", &instr, &address) == 2) /* %llx matches unsigned long long */
/* you have an instruction and an address */
else if (sscanf(line, "%llx", &address) == 1)
/* you have an address only */
Since, in your case, this is a loop condition, you will have to refactor this into a function of its own:
int readAtLeastAddress(const char *const line, char *instr, unsigned long long *address)
{
return sscanf(line, "%c%llx", instr, address) == 2 || sscanf(line, "%llx", address) == 1; /* address is already a pointer here */
}
Then you would rewrite the loop as such
while (fgets(line, MAXLINE, iFile) != NULL && readAtLeastAddress(line, &instr, &address)) { /* read a fresh line on every pass */
printf("%c", instr);
}

Program to count number of times a character appear in file(case insensitive)

I executed this code in an online editor, but I always get "File 'test.txt' has 0 instances of letter 'r'." What should I do? The expected output is "File 'test.txt' has 99 instances of letter 'r'."
#include<stdio.h>
#include<stdlib.h>
#include<ctype.h>
int main()
{
FILE *fptr;
int d=0;
char c;
char ch,ck;
char b[100];
printf("Enter the file name\n");
scanf("%19s",b);
fptr=fopen(b,"r");
printf("Enter the character to be counted\n");
scanf(" %c",&c);
c=toupper(c);
if(fptr==NULL)
{
exit(-1);
}
while((ck=fgetc(fptr))!=EOF)
{
ch=toupper(ck);
if(c==ch||c==ck)
++d;
}
fclose(fptr);
printf("File '%s' has %d instances of letter '%c'.",b,d,c);
return(0);
}
Problems:
ck should be an int, not a char, as @alk has pointed out, because fgetc returns an int, not a char.
As per the title, you want a case-insensitive comparison. Your code does not do that. The fix is that this:
if(c==ch||c==ck)
needs to be
if(c == ch || (tolower(c)) == ck) /* Compare upper with upper, lower with lower */
I tested it and it works fine. I can't detect your problem without the test file it fails on. As far as I can tell, it does what it is supposed to do: it counts the number of times the specified character appears (case-insensitively) in the specified file. Here's a prettified version of your code.
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* strlen */
#include <ctype.h>
#define MAX_FILENAME_LENGTH 100
int main(int argc, char **argv) {
/* Variables */
FILE *file = NULL;
int count = 0, file_char;
char target_char, filename[MAX_FILENAME_LENGTH];
/* Getting filename */
printf("Enter the file name: ");
fgets(filename, MAX_FILENAME_LENGTH, stdin);
/* Removing newline at the end of input */
size_t filename_len = strlen(filename);
if(filename_len > 0 && filename[filename_len - 1] == '\n') {
filename[filename_len - 1] = '\0'; }
/* Opening file */
file = fopen(filename, "r");
if(file == NULL) exit(EXIT_FAILURE);
/* Getting character to count */
printf("Enter the character to be counted: ");
scanf(" %c", &target_char);
target_char = (char)toupper((unsigned char)target_char); /* cast keeps toupper() defined for bytes above 127 */
/* Counting characters */
while((file_char = fgetc(file)) != EOF) {
file_char = toupper(file_char);
if(target_char == file_char) ++count; }
/* Reporting finds */
printf("File '%s' has %d instances of letter '%c'.",
filename, count, target_char);
/* Exiting */
fclose(file);
return EXIT_SUCCESS; }

Get the length of each line in file with C and write in output file

I am a biology student and I am trying to learn perl, python and C and also use the scripts in my work. So, I have a file as follows:
>sequence1
ATCGATCGATCG
>sequence2
AAAATTTT
>sequence3
CCCCGGGG
The output should list the name of each sequence together with the number of characters on its line, followed by the total number of sequences at the end of the file:
sequence1 12
sequence2 8
sequence3 8
Total number of sequences = 3
I could make the perl and python scripts work, this is the python script as an example:
#!/usr/bin/python
import sys
my_file = open(sys.argv[1]) #open the file
my_output = open(sys.argv[2], "w") #open output file
total_sequence_counts = 0
for line in my_file:
    if line.startswith(">"):
        sequence_name = line.rstrip('\n').replace(">","")
        total_sequence_counts += 1
        continue
    dna_length = len(line.rstrip('\n'))
    my_output.write(sequence_name + " " + str(dna_length) + '\n')
my_output.write("Total number of sequences = " + str(total_sequence_counts) + '\n')
Now, I want to write the same script in C, this is what I have achieved so far:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[])
{
input = FILE *fopen(const char *filename, "r");
output = FILE *fopen(const char *filename, "w");
double total_sequence_counts = 0;
char sequence_name[];
char line [4095]; // set a temporary line length
char buffer = (char *) malloc (sizeof(line) +1); // allocate some memory
while (fgets(line, sizeof(line), filename) != NULL) { // read until new line character is not found in line
buffer = realloc(*buffer, strlen(line) + strlen(buffer) + 1); // realloc buffer to adjust buffer size
if (buffer == NULL) { // print error message if memory allocation fails
printf("\n Memory error");
return 0;
}
if (line[0] == ">") {
sequence_name = strcpy(sequence_name, &line[1]);
total_sequence_counts += 1
}
else {
double length = strlen(line);
fprintf(output, "%s \t %ld", sequence_name, length);
}
fprintf(output, "%s \t %ld", "Total number of sequences = ", total_sequence_counts);
}
int fclose(FILE *input); // when you are done working with a file, you should close it using this function.
return 0;
int fclose(FILE *output);
return 0;
}
But this code is, of course, full of mistakes. My problem is that despite studying a lot, I still can't properly understand and use memory allocation and pointers, so I know I have mistakes especially in that part. It would be great if you could comment on my code and show how it can be turned into a program that actually works. By the way, in my actual data the length of each line is not known in advance, so I need to use malloc and realloc for that purpose.
For a simple program like this, where you look at short lines one at a time, you shouldn't worry about dynamic memory allocation. It is probably good enough to use local buffers of a reasonable size.
Another thing is that C isn't particularly suited for quick-and-dirty string processing. For example, there isn't a strstrip function in the standard library. You usually end up implementing such behaviour yourself.
An example implementation looks like this:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define MAXLEN 80 /* Maximum line length, including null terminator */
int main(int argc, char *argv[])
{
FILE *in;
FILE *out;
char line[MAXLEN]; /* Current line buffer */
char ref[MAXLEN] = ""; /* Sequence reference buffer */
int nseq = 0; /* Sequence counter */
if (argc != 3) {
fprintf(stderr, "Usage: %s infile outfile\n", argv[0]);
exit(1);
}
in = fopen(argv[1], "r");
if (in == NULL) {
fprintf(stderr, "Couldn't open %s.\n", argv[1]);
exit(1);
}
out = fopen(argv[2], "w");
if (out == NULL) {
fprintf(stderr, "Couldn't open %s for writing.\n", argv[2]);
exit(1);
}
while (fgets(line, sizeof(line), in)) {
int len = strlen(line);
/* Strip whitespace from end */
while (len > 0 && isspace((unsigned char)line[len - 1])) len--;
line[len] = '\0';
if (line[0] == '>') {
/* First char is '>': copy from second char in line */
strcpy(ref, line + 1);
} else {
/* Other lines are sequences */
fprintf(out, "%s: %d\n", ref, len);
nseq++;
}
}
fprintf(out, "Total number of sequences = %d\n", nseq);
fclose(in);
fclose(out);
return 0;
}
A lot of code is about enforcing arguments and opening and closing files. (You could cut out a lot of code if you used stdin and stdout with file redirections.)
The core is the big while loop. Things to note:
fgets returns NULL on error or when the end of file is reached.
The first lines determine the length of the line and then remove white-space from the end.
It is not enough to decrement length, at the end the stripped string must be terminated with the null character '\0'
When you check the first character in the line, you should check against a char, not a string. In C, single and double quotes are not interchangeable. ">" is a string literal of two characters, '>' and the terminating '\0'.
When dealing with countable entities like chars in a string, use integer types, not floating-point numbers. (I've used (signed) int here, but because there can't be a negative number of chars in a line, it might have been better to have used an unsigned type.)
The notation line + 1 is equivalent to &line[1].
The code I've shown doesn't check that there is always one reference per sequence. I'll leave this as an exercise to the reader.
For a beginner, this can be quite a lot to keep track of. For small text-processing tasks like yours, Python and Perl are definitely better suited.
Edit: The solution above won't work for long sequences; it is restricted to MAXLEN characters. But you don't need dynamic allocation if you only need the length, not the contents of the sequences.
Here's an updated version that doesn't read lines, but reads characters instead. In '>' context, it stores the reference. Otherwise it just keeps a count:
#include <stdlib.h>
#include <stdio.h>
#include <ctype.h> /* for isspace() */
#define MAXLEN 80 /* Maximum line length, including null terminator */
int main(int argc, char *argv[])
{
FILE *in;
FILE *out;
int nseq = 0; /* Sequence counter */
char ref[MAXLEN] = ""; /* Reference name (empty until the first '>' line) */
in = fopen(argv[1], "r");
out = fopen(argv[2], "w");
/* Snip: Argument and file checking as above */
while (1) {
int c = getc(in);
if (c == EOF) break;
if (c == '>') {
int n = 0;
c = fgetc(in);
while (c != EOF && c != '\n') {
if (n < sizeof(ref) - 1) ref[n++] = c;
c = fgetc(in);
}
ref[n] = '\0';
} else {
int len = 0;
int n = 0;
while (c != EOF && c != '\n') {
n++;
if (!isspace(c)) len = n;
c = fgetc(in);
}
fprintf(out, "%s: %d\n", ref, len);
nseq++;
}
}
fprintf(out, "Total number of sequences = %d\n", nseq);
fclose(in);
fclose(out);
return 0;
}
Notes:
fgetc reads a single byte from a file and returns this byte or EOF when the file has ended. In this implementation, that's the only reading function used.
Storing a reference string is implemented via fgetc here too. You could probably use fgets after skipping the initial angle bracket, too.
The counting just reads bytes without storing them. n is the total count, len is the count up to the last non-space. (Your lines probably consist only of ACGT without any trailing space, so you could skip the test for space and use n instead of len.)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[]){
FILE *my_file = fopen(argv[1], "r");
FILE *my_output = fopen(argv[2], "w");
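int total_sequence_counts = 0;
char *sequence_name = NULL;
int dna_length;
char *line = NULL;
size_t size = 0;
while(-1 != getline(&line, &size, my_file)){ /* getline() is POSIX, not standard C */
if(line[0] == '>'){
free(sequence_name); /* release the previous name, if any */
sequence_name = strdup(strtok(line, ">\n"));
total_sequence_counts +=1;
continue;
}
line[strcspn(line, "\n")] = '\0'; /* drop the newline; also safe for empty lines */
dna_length = (int)strlen(line);
fprintf(my_output, "%s %d\n", sequence_name ? sequence_name : "", dna_length);
}
free(sequence_name); /* freed once here, so several lines may share one name */
fprintf(my_output, "Total number of sequences = %d\n", total_sequence_counts);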
fclose(my_file);
fclose(my_output);
free(line);
return (0);
}

Not reading from stdin properly

I'm trying to mimic the behavior of the unix utility cat, but when I call a command of the form:
cat file1 - file2 - file3
My program will output file1 correctly, then read in from stdin, then when I press EOF, it will print file 2 then file 3, without reading from stdin for the second time.
Why might this be?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define ASCII_LENGTH 255
int printfile(FILE *source, int N);
int main(int argc, char *argv[])
{
int currentarg = 1; //the argument number currently being processed
FILE *input_file;
//if there are no arguments, dog reads from standard input
if(argc == 1 || currentarg == argc)
{
input_file = stdin;
printfile(input_file,0);
}
else
{
int i;
for(i = currentarg; i < argc; i++)
{
printf("%d %s\n",i,argv[i]);
//if file is a single dash, dog reads from standard input
if(strcmp(argv[i],"-") == 0)
{
input_file = stdin;
printfile(input_file,0);
fflush(stdin);
fclose(stdin);
clearerr(stdin);
}
else if ((input_file = fopen(argv[i], "r")) == NULL)
{
fprintf(stderr, "%s: %s: No such file or directory\n", argv[0], argv[i]);
return 1;
}
else
{
printfile(input_file,0);
fflush(input_file);
fclose(input_file);
clearerr(input_file);
}
}
}
return 0;
}
int printfile(FILE *source, int N)
{
//used to print characters of a file to the screen
//characters can be shifted by some number N (between 0 and 25 inclusive)
char c;
while((c = fgetc(source)) != EOF)
{
fputc((c+N)%ASCII_LENGTH,stdout);
}
printf("***** %c %d",c,c==EOF);
return 0;
}
For one thing, you can't expect to be able to read from stdin after you've closed it:
fclose(stdin);
fflush(stdin); is undefined behaviour, as is fflush on all files open only for input. That's sort of like flushing the toilet and expecting the waste to come out of the bowl, because fflush is only defined for files open for output! I would suggest something like for (int c = fgetc(stdin); c >= 0 && c != '\n'; c = fgetc(stdin)); if you wish to discard the remainder of a line.
Furthermore, fgetc returns int for a reason: the int holds either an unsigned char value or EOF. c should be an int, not a char. EOF isn't a character! It's a negative int value. This differentiates it from any possible character, because a successful call to fgetc only returns a non-negative value, never a negative one like EOF. fputc expects its input in the form of an unsigned char value, and char isn't required to be unsigned. Provided your fgetc call is successful and you store the return value in an int, that int is safe to pass on to fputc.

Resources