Segmentation Fault on my while loop - c

I am trying to count the number of lines and characters whatever they may be in a file that I specify from argv. But I get a segmentation fault when I hit the while loop for some reason. The program runs fine without the while loop, though it only goes through once.
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
if(argc != 2) {
return 0;
}
FILE *fp;
char c;
int lines = 0;
int chs = 0;
fp = fopen(argv[1], "r");
//Segmentation Fault happens here on the while loop
while((c = fgetc(fp)) != EOF) {
if(c == '\n') {
lines += 1;
}
else {
chs += 1;
}
}
printf("Charaters: %d\n", chs);
printf("lines: %d\n", lines);
if(fp){
fclose(fp);
}
return 0;
}

Your code needs to follow idiomatic C more closely.
You should validate the result of fopen immediately, instead of after you've already attempted to use fp. fopen returns NULL when it cannot open the file, and passing a NULL stream to fgetc is undefined behaviour; that is the most likely cause of your segmentation fault.
fgetc returns int, not char. This is because it needs to return out-of-band information about the status of the stream (i.e. EOF), which cannot be represented by char. You can safely cast the int value to char once you know the value is not EOF.
Your code treats \r as a regular character, although \r\n commonly represents a line break (not just a solitary \n). You might want to consider how you handle different character classes.
Your program does not handle multi-byte encodings: it counts bytes, so it is only correct for single-byte encodings such as ASCII. To count characters in other encodings you need encoding-aware code (for example a Unicode library): your program will count a multi-byte UTF-8 sequence as several characters instead of one, and would miscount UTF-16 files.
Better:
FILE* fp = fopen( argv[1], "r" );
if( !fp ) {
printf( "Could not open file \"%s\" for reading.\r\n", argv[1] );
return 1;
}
int lines = 0;
int chars = 0;
int nc;
while( ( nc = fgetc( fp ) ) != EOF ) {
char c = (char)nc;
if ( c == '\n' ) lines++;
else if( c != '\r' ) chars++;
}
printf( "Characters: %d\r\nLines: %d\r\n", chars, lines );
fclose( fp );
return 0;
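To illustrate the encoding point, here is a minimal sketch that keeps the counting in a function working on an already opened stream. The names text_counts and count_text are made up for this example, and it assumes the input is valid UTF-8: continuation bytes (bit pattern 10xxxxxx) are counted as bytes but not as new characters.

```c
#include <stdio.h>

/* Hypothetical result type for this sketch. */
struct text_counts {
    long lines;        /* number of '\n' bytes */
    long bytes;        /* bytes other than '\r' and '\n' */
    long code_points;  /* UTF-8 code points other than '\r' and '\n' */
};

/* Counts lines, bytes and UTF-8 code points in an already opened stream.
   Assumption: the input is valid UTF-8. */
struct text_counts count_text(FILE *fp)
{
    struct text_counts tc = {0, 0, 0};
    int c;  /* int, so that EOF stays distinguishable from every byte */

    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n') {
            tc.lines++;
        } else if (c != '\r') {
            tc.bytes++;
            if ((c & 0xC0) != 0x80)  /* not a UTF-8 continuation byte */
                tc.code_points++;
        }
    }
    return tc;
}
```

Call it only after checking that fopen did not return NULL. For plain ASCII input, bytes and code_points come out equal.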

Related

C return if characters are the same

So, I'm just learning how to code in C and I'm a bit stumped by what my teacher is asking. Basically, we need to create a program that reads characters from two separate files character by character and if the two characters are the exact same it is to be printed to a third file. Normally, I would use an array to do this but, we have specifically been told that we are not allowed to use an array. I sort of have it working in that it prints characters to the third file but it prints numbers and punctuation in addition to characters. Obviously, I'm missing something but I don't know what it is. Here is my code:
int main(int argc, char *argv[]) {
FILE *fp, *fp2, *ofp;
char a, b;
fp = fopen("input1a.txt", "r");
fp2 = fopen("input1b.txt", "r");
ofp = fopen("output.txt", "w");
while((a=getc(fp))!= -1){
b=getc(fp2);
if(isalpha(a) == isalpha(b)){
putc(a, ofp);
letters++;
}
}
return 0;
}
From what I have read, isalpha should check to see if the character is an alphabetical character but, in this instance is there something better for me to use? Thanks for any help you can give me!
This is the problem:
if(isalpha(a) == isalpha(b)) {
It should be:
if (isalpha(a) && isalpha(b) && a == b)
You are not comparing whether the characters are equal. Instead, you are comparing whether both are alphabetic or both are not alphabetic; in either case the character is printed to the file, which is why digits and punctuation show up in the output.
Also, get used to writing safe code which is also clean and readable, like the following (note that there are elegant methods to handle errors, and you can add error messages):
#include <ctype.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
    FILE *in[2];
    FILE *out;
    int chars[2]; // getc() returns `int' not `char'
    int counter;
    in[0] = fopen("input1a.txt", "r");
    if (in[0] == NULL)
        return -1;
    in[1] = fopen("input1b.txt", "r");
    if (in[1] == NULL) {
        fclose(in[0]);
        return -1;
    }
    out = fopen("output.txt", "w");
    if (out == NULL) {
        fclose(in[0]);
        fclose(in[1]);
        return -1;
    }
    counter = 0;
    while (((chars[0] = getc(in[0])) != EOF) && ((chars[1] = getc(in[1])) != EOF)) {
        if (chars[0] == chars[1] && isalpha(chars[0])) {
            putc(chars[0], out);
            counter++;
        }
    }
    // Release ALL resources, it's a good habit
    fclose(in[0]);
    fclose(in[1]);
    fclose(out);
    // Not required but a good habit too
    return 0;
}
Don't use magic numbers: EOF is usually -1, but the standard only guarantees that it is a negative int, so compare against EOF to keep the code readable and portable.
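Since the assignment forbids arrays, the same comparison can also be sketched with two plain int variables. The function name copy_matching_letters is hypothetical; it takes already opened streams so that the opening and error handling stay with the caller.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: writes to out every letter that appears at the same
   position in both a and b, and returns how many letters were written.
   Reading stops as soon as either input reaches end of file. */
long copy_matching_letters(FILE *a, FILE *b, FILE *out)
{
    long written = 0;
    int ca, cb;  /* int, because getc() returns int */

    while ((ca = getc(a)) != EOF && (cb = getc(b)) != EOF) {
        if (ca == cb && isalpha(ca)) {
            putc(ca, out);
            written++;
        }
    }
    return written;
}
```

Because the value comes straight from getc and is never stored in a char, it is already in the range that isalpha accepts.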

Get the length of each line in file with C and write in output file

I am a biology student and I am trying to learn perl, python and C and also use the scripts in my work. So, I have a file as follows:
>sequence1
ATCGATCGATCG
>sequence2
AAAATTTT
>sequence3
CCCCGGGG
The output should look like this: the name of each sequence followed by the number of characters in its line, with the total number of sequences printed at the end of the file.
sequence1 12
sequence2 8
sequence3 8
Total number of sequences = 3
I could make the perl and python scripts work, this is the python script as an example:
#!/usr/bin/python
import sys
my_file = open(sys.argv[1]) #open the file
my_output = open(sys.argv[2], "w") #open output file
total_sequence_counts = 0
for line in my_file:
    if line.startswith(">"):
        sequence_name = line.rstrip('\n').replace(">","")
        total_sequence_counts += 1
        continue
    dna_length = len(line.rstrip('\n'))
    my_output.write(sequence_name + " " + str(dna_length) + '\n')
my_output.write("Total number of sequences = " + str(total_sequence_counts) + '\n')
Now, I want to write the same script in C, this is what I have achieved so far:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[])
{
input = FILE *fopen(const char *filename, "r");
output = FILE *fopen(const char *filename, "w");
double total_sequence_counts = 0;
char sequence_name[];
char line [4095]; // set a temporary line length
char buffer = (char *) malloc (sizeof(line) +1); // allocate some memory
while (fgets(line, sizeof(line), filename) != NULL) { // read until new line character is not found in line
buffer = realloc(*buffer, strlen(line) + strlen(buffer) + 1); // realloc buffer to adjust buffer size
if (buffer == NULL) { // print error message if memory allocation fails
printf("\n Memory error");
return 0;
}
if (line[0] == ">") {
sequence_name = strcpy(sequence_name, &line[1]);
total_sequence_counts += 1
}
else {
double length = strlen(line);
fprintf(output, "%s \t %ld", sequence_name, length);
}
fprintf(output, "%s \t %ld", "Total number of sequences = ", total_sequence_counts);
}
int fclose(FILE *input); // when you are done working with a file, you should close it using this function.
return 0;
int fclose(FILE *output);
return 0;
}
But this code is, of course, full of mistakes. My problem is that despite studying a lot, I still can't properly understand and use memory allocation and pointers, so I know I especially have mistakes in that part. It would be great if you could comment on my code and show how it can turn into a program that actually works. By the way, in my actual data, the length of each line is not known in advance, so I need to use malloc and realloc for that purpose.
For a simple program like this, where you look at short lines one at a time, you shouldn't worry about dynamic memory allocation. It is probably good enough to use local buffers of a reasonable size.
Another thing is that C isn't particularly suited for quick-and-dirty string processing. For example, there isn't a strstrip function in the standard library. You usually end up implementing such behaviour yourself.
An example implementation looks like this:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define MAXLEN 80 /* Maximum line length, including null terminator */
int main(int argc, char *argv[])
{
FILE *in;
FILE *out;
char line[MAXLEN]; /* Current line buffer */
char ref[MAXLEN] = ""; /* Sequence reference buffer */
int nseq = 0; /* Sequence counter */
if (argc != 3) {
fprintf(stderr, "Usage: %s infile outfile\n", argv[0]);
exit(1);
}
in = fopen(argv[1], "r");
if (in == NULL) {
fprintf(stderr, "Couldn't open %s.\n", argv[1]);
exit(1);
}
out = fopen(argv[2], "w");
if (out == NULL) {
fprintf(stderr, "Couldn't open %s for writing.\n", argv[2]);
exit(1);
}
while (fgets(line, sizeof(line), in)) {
int len = strlen(line);
/* Strip whitespace from end */
while (len > 0 && isspace((unsigned char)line[len - 1])) len--;
line[len] = '\0';
if (line[0] == '>') {
/* First char is '>': copy from second char in line */
strcpy(ref, line + 1);
} else {
/* Other lines are sequences */
fprintf(out, "%s: %d\n", ref, len);
nseq++;
}
}
fprintf(out, "Total number of sequences. %d\n", nseq);
fclose(in);
fclose(out);
return 0;
}
A lot of code is about enforcing arguments and opening and closing files. (You could cut out a lot of code if you used stdin and stdout with file redirections.)
The core is the big while loop. Things to note:
fgets returns NULL on error or when the end of file is reached.
The first lines determine the length of the line and then remove white-space from the end.
It is not enough to decrement length, at the end the stripped string must be terminated with the null character '\0'
When you check the first character in the line, you should check against a char, not a string. In C, single and double quotes are not interchangeable. ">" is a string literal of two characters, '>' and the terminating '\0'.
When dealing with countable entities like chars in a string, use integer types, not floating-point numbers. (I've used (signed) int here, but because there can't be a negative number of chars in a line, it might have been better to have used an unsigned type.)
The notation line + 1 is equivalent to &line[1].
The code I've shown doesn't check that there is always one reference per sequence. I'll leave this as an exercise to the reader.
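The stripping step from the notes above can be pulled out into a small helper. This is only a sketch; the name rstrip is made up here and is not a standard library function.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: removes trailing white-space (including "\r\n")
   from s in place and returns the new length. */
size_t rstrip(char *s)
{
    size_t len = strlen(s);

    /* The cast keeps isspace() defined for bytes above 127. */
    while (len > 0 && isspace((unsigned char)s[len - 1]))
        len--;
    s[len] = '\0';  /* decrementing len alone is not enough */
    return len;
}
```

With it, the body of the loop reduces to a call such as len = rstrip(line) followed by the check of line[0].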
For a beginner, this can be quite a lot to keep track of. For small text-processing tasks like yours, Python and Perl are definitely better suited.
Edit: The solution above won't work for long sequences; it is restricted to lines that fit in the MAXLEN-byte buffer. But you don't need dynamic allocation if you only need the length, not the contents of the sequences.
Here's an updated version that doesn't read lines, but reads characters instead. After a '>', it stores the reference name. Otherwise it just keeps a count:
#include <stdlib.h>
#include <stdio.h>
#include <ctype.h> /* for isspace() */
#define MAXLEN 80 /* Maximum line length, including null terminator */
int main(int argc, char *argv[])
{
FILE *in;
FILE *out;
int nseq = 0; /* Sequence counter */
char ref[MAXLEN]; /* Reference name */
in = fopen(argv[1], "r");
out = fopen(argv[2], "w");
/* Snip: Argument and file checking as above */
while (1) {
int c = getc(in);
if (c == EOF) break;
if (c == '>') {
int n = 0;
c = fgetc(in);
while (c != EOF && c != '\n') {
if (n < sizeof(ref) - 1) ref[n++] = c;
c = fgetc(in);
}
ref[n] = '\0';
} else {
int len = 0;
int n = 0;
while (c != EOF && c != '\n') {
n++;
if (!isspace(c)) len = n;
c = fgetc(in);
}
fprintf(out, "%s: %d\n", ref, len);
nseq++;
}
}
fprintf(out, "Total number of sequences. %d\n", nseq);
fclose(in);
fclose(out);
return 0;
}
Notes:
fgetc reads a single byte from a file and returns this byte or EOF when the file has ended. In this implementation, that's the only reading function used.
Storing a reference string is implemented via fgetc here too. You could probably use fgets after skipping the initial angle bracket, too.
The counting just reads bytes without storing them. n is the total count, len is the count up to the last non-space. (Your lines probably consist only of ACGT without any trailing space, so you could skip the test for space and use n instead of len.)
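/* Alternative: let POSIX getline() allocate and grow the line buffer. */
#define _POSIX_C_SOURCE 200809L /* for getline() and strdup() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[]){
    if(argc != 3){
        fprintf(stderr, "Usage: %s infile outfile\n", argv[0]);
        return 1;
    }
    FILE *my_file = fopen(argv[1], "r");
    FILE *my_output = fopen(argv[2], "w");
    if(my_file == NULL || my_output == NULL){
        perror("fopen");
        return 1;
    }
    int total_sequence_counts = 0;
    char *sequence_name = NULL;
    char *line = NULL;
    size_t size = 0;
    while(-1 != getline(&line, &size, my_file)){
        line[strcspn(line, "\n")] = '\0'; /* strip the trailing newline */
        if(line[0] == '>'){
            free(sequence_name); /* release the previous name, if any */
            sequence_name = strdup(line + 1);
            total_sequence_counts += 1;
            continue;
        }
        fprintf(my_output, "%s %zu\n", sequence_name ? sequence_name : "", strlen(line));
    }
    fprintf(my_output, "Total number of sequences = %d\n", total_sequence_counts);
    fclose(my_file);
    fclose(my_output);
    free(sequence_name);
    free(line);
    return 0;
}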

hexadecimal to decimal conversion

The first piece of code prints each line in b.txt in a new line when it outputs it, and the second code is the conversion from hexadecimal to decimal. I am bad at writing big programs, so I split the task and write smaller programs instead. I am having trouble combining these two programs. Can anyone help ?
#include <stdio.h>
int main ( int argc, char **argv )
{
FILE *fp = fopen ( "b", "r");
char line[1024];
int ch = getc ( fp );
int index = 0;
while ( ch != EOF ) {
if ( ch != '\n'){
line[index++] = ch;
}else {
line[index] = '\0';
index = 0;
printf ( "%d\n", line );
}
ch = getc ( fp );
}
fclose ( fp );
return 0;
}
This is the second program
#include <stdio.h>
#include <stdlib.h>
int main()
{
unsigned int d;
FILE *fp;
FILE *ptr_file;
fp = fopen("normal_data","r"); // read mode
ptr_file =fopen("normal_decimal", "w");
while(fscanf(fp,"%x", &d) == 1)
{
fprintf(ptr_file, "%d /n", d);
}
while( ( d = fgetc(fp) ) != EOF )
fclose(fp);
return 0;
}
It is good programming practice to split your program into small, related fragments.
But instead of writing a separate main function for each one, try writing functions which accomplish specific tasks, declaring them in a header file and defining them in a matching .c file.
This will make it much easier to write, debug and re-use the code.
In the above case, converting hexadecimal to decimal is clearly something which may be used again and again.
So, just make a function int hex_to_dec(char* input); which takes a string such as "3b8c", converts it to a number, and returns the converted value.
You may also want to make a function void printFile(FILE* fp); which takes a pointer to a file and prints its data to stdout.
You can declare these and other functions you have written in a header file like myFunctions.h, and then include that header in whatever program needs to use them.
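As a rough sketch of that split, the two programs could be combined as follows. hex_to_dec follows the signature suggested above (with const added), while convert_hex_lines and the choice of -1 as the error value are assumptions made for this example. The conversion itself is delegated to the standard strtol.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Converts a hexadecimal string such as "3b8c" to its value.
   Returns -1 (an assumption of this sketch) if the string is empty,
   contains non-hex characters, is negative, or does not fit in an int. */
int hex_to_dec(const char *input)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(input, &end, 16);
    if (end == input || *end != '\0' || errno == ERANGE ||
        value < 0 || value > INT_MAX)
        return -1;
    return (int)value;
}

/* Hypothetical helper: reads one hex number per line from in and writes
   the decimal value of each valid line to out.
   Returns the number of lines converted. */
int convert_hex_lines(FILE *in, FILE *out)
{
    char line[1024];
    int converted = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        int value;

        line[strcspn(line, "\r\n")] = '\0';  /* strip the line ending */
        value = hex_to_dec(line);
        if (value >= 0) {
            fprintf(out, "%d\n", value);
            converted++;
        }
    }
    return converted;
}
```

A main function then only has to open the two files, check the results of fopen, call convert_hex_lines, and close the files again.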

Reading only first character of each line in file

I'm currently trying to read and process only the first character in each line of a ".c" file. So far I have come up with this code, but n is not even printed outside of the loop:
void FileProcess(char* FilePath)
{
char mystring [100];
FILE* pFile;
int upper = 0;
int lower = 0;
char c;
int n =0;
pFile = fopen (FilePath , "r");
do {
c = fgetc (pFile);
if (isupper(c)) n++;
} while (c != EOF);
printf("6");
printf(n);
fclose (pFile);
}
A few points:
You are not printing n correctly. You are feeding it to printf as the "formatting string". It is surprising that you get away with it - this would normally cause havoc.
You are reading one character at a time. If you want to print only the first character of each line, better read a line at a time, then print the first character. Use fgets to read entire line into a buffer (make sure your buffer is big enough).
Example (updated with input from @chux, and instrumented with some additional code to aid in debugging the "n=1" problem):
#include <ctype.h>
#include <stdio.h>
void FileProcess(char* FilePath)
{
    char myString[1000];
    FILE* pFile;
    int n = 0;
    pFile = fopen(FilePath, "r");
    if (pFile == NULL) {
        perror(FilePath);
        return;
    }
    printf("First non-space characters encountered:\n");
    while (fgets(myString, sizeof myString, pFile) != NULL) {
        int jj = 0;
        while (myString[jj] == ' ') jj++; // skip leading spaces
        char c = myString[jj];
        printf("%c", c);
        if (isupper((unsigned char)c)) {
            printf("*U*\n"); // print *U* to show character recognized as uppercase
            n++;
        }
        else {
            printf("*L*\n"); // print *L* to show character was recognized as not uppercase
        }
    }
    printf("\n");
    printf("n is %d\n", n);
    fclose(pFile);
}
NOTE: there are other, more robust methods of reading lines to make sure you have everything (my favorite is getline(), but it is a POSIX function and not available with all compilers). If you are sure your code lines are not very long, this will work (maybe make the buffer a bit bigger than 100 characters though).
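If long lines are a concern but getline() is not available, one possible workaround is to remember whether the previous fgets call ended at a newline, and to look at the first character only when a new line really starts. The sketch below does that; the function name count_upper_first is made up, and treating tabs as leading white-space is an assumption.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: counts the lines whose first non-blank character is
   an uppercase letter. Lines longer than the buffer arrive in several
   fgets() chunks; only the first chunk of each line is inspected. */
int count_upper_first(FILE *fp)
{
    char line[1000];
    int n = 0;
    int at_line_start = 1;

    while (fgets(line, sizeof line, fp) != NULL) {
        if (at_line_start) {
            size_t i = 0;
            while (line[i] == ' ' || line[i] == '\t')
                i++;  /* skip leading blanks */
            if (isupper((unsigned char)line[i]))
                n++;
        }
        /* The next chunk starts a new line only if this one ended in '\n'. */
        at_line_start = strchr(line, '\n') != NULL;
    }
    return n;
}
```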

Not reading from stdin properly

I'm trying to mimic the behavior of the unix utility cat, but when I call a command of the form:
cat file1 - file2 - file3
My program will output file1 correctly, then read in from stdin, then when I press EOF, it will print file 2 then file 3, without reading from stdin for the second time.
Why might this be?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define ASCII_LENGTH 255
int printfile(FILE *source, int N);
int main(int argc, char *argv[])
{
int currentarg = 1; //the argument number currently being processed
FILE *input_file;
//if there are no arguments, dog reads from standard input
if(argc == 1 || currentarg == argc)
{
input_file = stdin;
printfile(input_file,0);
}
else
{
int i;
for(i = currentarg; i < argc; i++)
{
printf("%d %s\n",i,argv[i]);
//if file is a single dash, dog reads from standard input
if(strcmp(argv[i],"-") == 0)
{
input_file = stdin;
printfile(input_file,0);
fflush(stdin);
fclose(stdin);
clearerr(stdin);
}
else if ((input_file = fopen(argv[i], "r")) == NULL)
{
fprintf(stderr, "%s: %s: No such file or directory\n", argv[0], argv[i]);
return 1;
}
else
{
printfile(input_file,0);
fflush(input_file);
fclose(input_file);
clearerr(input_file);
}
}
}
return 0;
}
int printfile(FILE *source, int N)
{
//used to print characters of a file to the screen
//characters can be shifted by some number N (between 0 and 25 inclusive)
char c;
while((c = fgetc(source)) != EOF)
{
fputc((c+N)%ASCII_LENGTH,stdout);
}
printf("***** %c %d",c,c==EOF);
return 0;
}
For one thing, you can't expect to be able to read from stdin after you've closed it:
fclose(stdin);
fflush(stdin); is undefined behaviour, as is fflush on all files open only for input. That's sort of like flushing the toilet and expecting the waste to come out of the bowl, because fflush is only defined for files open for output! I would suggest something like for (int c = fgetc(stdin); c >= 0 && c != '\n'; c = fgetc(stdin)); if you wish to discard the remainder of a line. To read from stdin a second time, drop the fflush(stdin) and fclose(stdin) calls and keep only clearerr(stdin), which resets the end-of-file indicator so that later reads are attempted again.
Furthermore, fgetc returns int for a reason: inside the int will be an unsigned char value or EOF. c should be an int, not a char. EOF isn't a character! It's a negative int value. This differentiates it from any possible character, because successful calls to fgetc only return a non-negative value, never a negative EOF. fputc expects its input in the form of an unsigned char value, and char isn't required to be unsigned. Provided your fgetc call is successful and you store the return value in an int, that int is safe to pass on to fputc.
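Put together, a copy loop along these lines might look as follows. This is a sketch rather than a drop-in replacement for printfile: the name copy_stream is made up, the shift by N is left out, and the stream is not closed so that stdin stays usable.

```c
#include <stdio.h>

/* Hypothetical helper: copies source to dest byte by byte and returns the
   number of bytes copied. The source is left open, and its end-of-file
   indicator is cleared so that stdin can be read again later. */
long copy_stream(FILE *source, FILE *dest)
{
    long copied = 0;
    int c;  /* int, so that the byte 0xFF is not mistaken for EOF */

    while ((c = fgetc(source)) != EOF) {
        if (fputc(c, dest) == EOF)
            break;  /* write error */
        copied++;
    }
    clearerr(source);
    return copied;
}
```

For a - argument the caller would pass stdin and never close it; for a named file the caller still calls fclose after the copy.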
