We were supposed to extract strings from a provided file. The output matches what is expected, but the program reports a segmentation fault at the end and I don't know why.
#include<stdio.h>
#include<string.h>
int main(int argc, char *argv[]){
char str[100];
char f;
int len = 0;
FILE *file;
file = fopen(argv[1],"r");
//read only here, so use "r"
if(file==NULL){
printf("The file doesn't exist.\n");
return 1;
}
while(feof(file)==0){
//if feof returns 0 it means it havent reaches the end yet
fread(&f,sizeof(f),1,file);//read in the file
//printabel character between 32 and 126
if(f>=32&&f<=126){
str[len] = f;
len++;
continue;//keep doing it(for ->while)
}
if(strlen(str)>3){
//a string is a run of at least 4
printf("The output is:%s\n",str);
len=0;//reset
memset(str, 0, sizeof(str));
//reset the str so it wont get too big(overflow)
}
}
//close the file and return
fclose(file);
return 0;
}
This is not true:
while(feof(file)==0){
//if feof returns 0 it means it havent reaches the end yet
It is also a very common mistake.
feof() returns 0 as long as you have not read past the end of the file. It is a subtle but important detail: your last read may have read up to the end of the file but not past it. At that point there is no data left to read, but feof() still returns 0.
This is why you must test the result of the read operation.
fread(&f,sizeof(f),1,file);
If this returns zero then you failed to read anything.
Which is why you should structure your loop to test the result of the read (not feof()).
while (fread(&f,sizeof(f),1,file) == 1)
{
// You have successfully read an object from the stream
}
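As a minimal sketch of that loop shape, the hypothetical helper below (the name count_bytes is not from the question) counts the bytes in a stream by testing the result of fread() directly, so the body never runs for a read that failed:

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes in a stream by testing the
   result of fread() itself instead of calling feof() first. */
long count_bytes(FILE *fp)
{
    char c;
    long n = 0;

    /* fread() returns the number of items read: 1 on success,
       0 at end of file or on error, so the loop body only runs
       for bytes that were really read. */
    while (fread(&c, sizeof c, 1, fp) == 1)
        n++;

    return n;
}
```

With a feof()-controlled loop the body would run one extra time after the last byte, which is where the stale value of the previous read gets processed again.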
Your code has some fundamental errors:
See Why is while ( !feof (file) ) always wrong?
You don't check whether fread returns 0, meaning that no more characters could be read, yet you continue with your algorithm.
str is not '\0'-terminated, so strlen(str)>3 yields undefined behaviour in the first iteration and will likely evaluate as true right away.
The printf would then also yield undefined behaviour for the same reason.
Don't use the ASCII codes directly. They are hard to read: you have to look up in the ASCII table what 32 and 126 are. Better use the character constants:
if(f>= ' ' && f <= '~'){
...
}
This is easier to read and you get the intention of the code immediately.
So the program can be rewritten like this:
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[]){
char str[100];
char f;
int len = 0;
FILE *file;
file = fopen(argv[1],"r");
//read only here, so use "r"
if(file==NULL){
printf("The file doesn't exist.\n");
return 1;
}
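memset(str, 0, sizeof str);
while(fread(&f, sizeof f, 1, file) == 1)
{
if(f >= ' ' && f <= '~')
{
// keep one element free for the terminating '\0';
// characters beyond that in an overlong run are dropped
if(len < (int)sizeof str - 1)
str[len++] = f;
continue;
}
if(len > 3)
{
printf("The output is: %s\n", str);
}
// reset after every run, including the short ones
len = 0;
memset(str, 0, sizeof str);
}
// print a run that ends exactly at the end of the file
if(len > 3)
{
printf("The output is: %s\n", str);
}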
fclose(file);
return 0;
}
Related
There is a very long "dict.txt" file.
The size of this file is about 2400273 bytes (calculated with fseek, SEEK_END).
This file has lots of lines like 'apple = 사과' (similar to a dictionary).
The main problem is that reading the file takes a very long time.
I couldn't find any solution to this problem on Google.
I guess the reason is associated with using fgets(), but I don't know exactly.
Please help me.
Here is my code, written in C:
#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
int line = 0;
char txt_str[50];
FILE* pFile;
pFile = fopen("dict_test.txt", "r");
if (pFile == NULL) {
printf("file doesn't exist or there is problem to open your file\n");
}
else {
do{
fgets(txt_str, 50, pFile);;
line++;
} while (txt_str != EOF);
}
printf("%d", line);
}
Output
I couldn't see the result because the program kept running.
Expected
the number of lines of this txt file
Major
OP's code fails to test the return value of fgets(). Code needs to check the return value of fgets() to know when to stop.
do{
fgets(txt_str, 50, pFile);; // fgets() return value not used.
Other
Line count should not get incremented when fgets() returns NULL.
Line count should not get incremented when fgets() reads a partial line (i.e. the line was 50 characters or longer). It is reasonable to use a buffer wider than 50.
Line count may exceed INT_MAX. There is always some upper bound, yet trivial to use a wider type.
Good practice to close the stream.
Another approach to count lines would use fread() to read chunks of memory and then look for start of lines. (Not shown)
Recommend to print a '\n' after the line count.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
    FILE* pFile = fopen("dict_test.txt", "r");
    if (pFile == NULL) {
        printf("File doesn't exist or there is problem to open your file.\n");
        return EXIT_FAILURE;
    }
    unsigned long long line = 0;
    char txt_str[4096];
    while (fgets(txt_str, sizeof txt_str, pFile)) {
        if (strlen(txt_str) == sizeof txt_str - 1) { // Buffer full?
            // The last character read sits just before the terminating '\0'.
            if (txt_str[sizeof txt_str - 2] != '\n') { // Last not \n?
                continue; // Partial line: do not count it yet.
            }
        }
        line++;
    }
    fclose(pFile);
    printf("%llu\n", line);
}
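The chunk-based approach mentioned in the list above (reading with fread() and scanning the bytes) might look like the sketch below. The helper name count_lines_chunked and the 4096-byte chunk size are assumptions for illustration, and it counts newline characters rather than line starts, adding one for a final line that has no trailing newline:

```c
#include <stdio.h>

/* Hypothetical helper: counts lines by scanning fixed-size chunks
   for '\n'. A final line without a trailing newline is counted too. */
unsigned long long count_lines_chunked(FILE *fp)
{
    char buf[4096];
    size_t n;
    unsigned long long lines = 0;
    int last = '\n';               /* last byte seen so far */

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        for (size_t i = 0; i < n; i++)
            if (buf[i] == '\n')
                lines++;
        last = buf[n - 1];
    }
    if (last != '\n')              /* unterminated final line */
        lines++;

    return lines;
}
```

Because no line ever has to fit in the buffer, the partial-line check needed with fgets() disappears.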
fgets returns NULL on EOF.
You never check the result of
fgets(txt_str, 50, pFile);
Comparing the array txt_str against EOF never detects the end of the file, so your program enters an endless loop.
Try something like this:
char* p_str;
do{
p_str = fgets(txt_str, 50, pFile);
if (p_str != NULL) line++; // count only lines that were actually read
} while (p_str != NULL);
I'm trying to read information printed by program A from program B. How can I pass data from A to B using read()?
code for A
#include <stdio.h>
int main(int argc, char **argv)
{
int i, j;
char instruc_list[11][3] = {"sa", "sb", "ss", "pa", "pb",
"ra", "rb", "rr", "rra", "rrb", "rrr"};
i = 0;
while (i < 11)
{
j = 0;
while (j < 3)
{
printf("%c", instruc_list[i][j]);
j++;
}
i++;
printf("\n");
}
return (0);
}
code for B
int main()
{
char buf[4];
while ((read(0,buf, 4)))
{
printf("%s", buf);
}
printf("\n");
return 0;
}
When I run these two programs, I get the following result.
Use the popen() and pclose() functions defined in stdio.h to pipe output between programs.
Here's an example program of how to print the output of the ls shell command in your program, taken from this link:
FILE *fp;
int status;
char path[PATH_MAX];
fp = popen("ls *", "r");
if (fp == NULL)
/* Handle error */;
while (fgets(path, PATH_MAX, fp) != NULL)
printf("%s", path);
status = pclose(fp);
if (status == -1) {
/* Error reported by pclose() */
...
} else {
/* Use macros described under wait() to inspect `status' in order
to determine success/failure of command executed by popen() */
...
}
For your case, you'd call popen("./A", "r");.
You can use popen() to read the output of program A from program B.
Compile the first program:
gcc a.c -o a
In the program B:
#include <stdio.h>
int main(void)
{
char buf[4];
FILE *fp;
fp = popen("./a", "r");
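if (fp == NULL) {
perror("popen");
return 1;
}
/* %3s reads at most 3 characters, leaving room for the '\0' in buf */
while (fscanf(fp, "%3s", buf) == 1) {
printf("%s\n", buf);
}
pclose(fp);
return 0;
}
Now compile and execute the program B:
gcc b.c -o b
me@linux:$ ./b
The output I got is:
sa
sb
ss
pa
pb
ra
rb
rr
rra
rrb
rrr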
In program A, you're not writing the null terminators for the 3-letter strings... and in program B, you're not adding a null char after the characters you read (and haven't initialised buf, so it might not contain one). That's why you're getting garbage between the 3-letter strings you read... printf() is continuing past the characters you read because it hasn't found a null yet.
Also note that read() can return -1 for error, which would still test as true for your while loop. You should at least check that read() returns greater than 0 (rather than just non-zero), if not put in more thorough error handling.
So with some changes to address these issues, program B might become:
#include <stdio.h>
#include <unistd.h> // for read()
int main(void)
{
    char buf[4];
    ssize_t ret; // ** for the return from read()
    while ((ret = read(0, buf, sizeof buf)) > 0) // ** read >0 (no error, and bytes read)
    {
        fwrite(buf, 1, (size_t)ret, stdout); // ** write the number of chars
                                             // you read to stdout
    }
    printf("\n");
    return 0;
}
As for program A, right now it writes 3 characters for both the 2-letter and the 3-letter strings -- which means it includes the null char for the 2-letter strings but not for the 3-letter strings. With the changes to program B above, you don't need to write the null characters at all... so you could change:
while (j < 3)
to:
while (j < 3 && instruc_list[i][j] != 0)
to stop when the null character is reached (though it's still inefficient to use a printf() call just to write a single char -- perhaps putchar(instruc_list[i][j]); would be better). Or, you could just replace that inner while loop with:
fputs(instruc_list[i], stdout);
...which would then write the string in instruc_list[i] up to but not including the null char, and also change instruc_list[11][3] to instruc_list[11][4] so that it has room for the null char from the 3-letter string literals in the initialiser list.
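Putting the fputs() suggestion together, the output loop of program A could be sketched as follows. The wrapper function write_instructions is hypothetical and exists only so the loop can be exercised on any stream; in program A itself the body would sit in main() and write to stdout:

```c
#include <stdio.h>

/* Hypothetical helper wrapping the loop of program A: writes each
   instruction followed by a newline and returns how many were written. */
int write_instructions(FILE *out)
{
    /* [4] leaves room for the '\0' of the 3-letter strings */
    static const char instruc_list[11][4] = {"sa", "sb", "ss", "pa", "pb",
                                             "ra", "rb", "rr", "rra", "rrb", "rrr"};
    int i;

    for (i = 0; i < 11; i++) {
        fputs(instruc_list[i], out);   /* stops at the '\0' */
        fputc('\n', out);
    }
    return i;
}
```

Calling write_instructions(stdout) then sends no null characters to the reader, so program B only ever sees the letters and the newlines.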
I am trying to open a text file named by the user, read it, and print its text 60 characters at a time. I think that to do this I need to store the text in an array, and if a line is over 60 characters it should continue on a new line. However, when I run the code below an error message shows up saying : C^#
#include <stdio.h>
#include <stdlib.h>
int main()
{
char arr[];
arr[count] = '\0';
char ch, file_name[25];
FILE *fp;
printf("Enter file name: \n");
gets(file_name);
fp = fopen(file_name,"r"); // reading the file
if( fp == NULL )
{
perror("This file does not exist\n"); //if file cannot be found print error message
exit(EXIT_FAILURE);
}
printf("The contents of %s file are :\n", file_name);
while( ( ch = fgetc(fp) ) != EOF ){
arr[count] = ch;
count++;
printf("%s", arr);}
fclose(fp);
return 0;
}
char arr[]; is invalid. You need to specify a size.
arr[count] = '\0'; : count is never declared or initialized.
gets(file_name); : gets is deprecated and dangerous. Use another function, such as scanf with a field width.
Try the following code :
#include <stdio.h>
#include <stdlib.h>
int main()
{
int ch , count = 0;
char file_name[25];
FILE *fp;
printf("Enter file name: \n");
scanf(" %24s",file_name);
fp = fopen(file_name,"r"); // reading the file
if( fp == NULL )
{
perror("This file does not exist\n"); //if file cannot be found print error message
exit(EXIT_FAILURE);
}
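fseek(fp, 0L, SEEK_END);
long sz = ftell(fp);
fseek(fp, 0L, SEEK_SET);
char arr[sz + 1]; // one extra element for the terminating '\0'
while( ( ch = fgetc(fp) ) != EOF )
{
if( count < sz )
{
arr[count] = ch;
count++;
}
}
arr[count] = '\0'; // terminate right after the last character stored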
printf("The contents of %s file are :\n", file_name);
printf("arr : %s\n",arr);
fclose(fp);
return 0;
}
fgetc reads one character at a time until EOF. Use fgets() instead:
char *fgets(char *s, int size, FILE *stream)
fgets() reads in at most one less than size characters from stream and
stores them into the buffer pointed to by s. Reading stops after an EOF
or a newline. If a newline is read, it is stored into the buffer. A
terminating null byte ('\0') is stored after the last character in the
buffer.
1) Your while loop is not properly delimited. In the absence of a { } block, the instruction arr[count] = ch; is the only repeated one.
I suppose it should include the increment of count too:
while( ( ch = fgetc(fp) ) != EOF )
{
arr[count] = ch;
count++;
....
}
among other things (testing the counter etc).
2) there's no imperative need to read and store in an array. It is perfectly possible to transfer each character as soon as it is read, and add a line break when needed (new line, limit of 60 exceeded).
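A minimal sketch of that second point, assuming a hypothetical helper wrap_stream (not part of the question's code): each character is written as soon as it is read, and a line break is inserted once the current output line already holds width characters.

```c
#include <stdio.h>

/* Hypothetical helper: copies `in` to `out`, inserting a line break
   whenever `width` characters have been written on the current line. */
void wrap_stream(FILE *in, FILE *out, int width)
{
    int c;
    int col = 0;                   /* characters on the current output line */

    while ((c = fgetc(in)) != EOF) {
        if (c == '\n') {
            fputc(c, out);         /* keep the file's own line breaks */
            col = 0;
            continue;
        }
        if (col == width) {        /* line is full: break before this char */
            fputc('\n', out);
            col = 0;
        }
        fputc(c, out);
        col++;
    }
}
```

For the question's case the call would be wrap_stream(fp, stdout, 60); no array is needed, so files of any size are handled.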
Three problems:
The variable count is not initialized, so its value is indeterminate and using it will lead to undefined behavior.
The call printf(arr) treats arr as a string but arr is not terminated which again leads to undefined behavior.
The increment of count is outside the loop.
To solve the two first problems you must first initialize count to zero, then you must terminate the string after the loop:
arr[count] = '\0';
However, your printf(arr) call is still very problematic, what if the user enters some printf formatting codes, what will happen then? That's why you should never call printf with a user-provided input string, instead simply do
printf("%s", arr);
You also have a very big problem if the contents of the file you read is longer than 59 characters, and then you will overflow the array.
I am a biology student and I am trying to learn perl, python and C and also use the scripts in my work. So, I have a file as follows:
>sequence1
ATCGATCGATCG
>sequence2
AAAATTTT
>sequence3
CCCCGGGG
The output should look like this, that is the name of each sequence and the count of characters in each line and printing the total number of sequences in the end of the file.
sequence1 12
sequence2 8
sequence3 8
Total number of sequences = 3
I could make the perl and python scripts work, this is the python script as an example:
#!/usr/bin/python
import sys
my_file = open(sys.argv[1]) #open the file
my_output = open(sys.argv[2], "w") #open output file
total_sequence_counts = 0
for line in my_file:
if line.startswith(">"):
sequence_name = line.rstrip('\n').replace(">","")
total_sequence_counts += 1
continue
dna_length = len(line.rstrip('\n'))
my_output.write(sequence_name + " " + str(dna_length) + '\n')
my_output.write("Total number of sequences = " + str(total_sequence_counts) + '\n')
Now, I want to write the same script in C, this is what I have achieved so far:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[])
{
input = FILE *fopen(const char *filename, "r");
output = FILE *fopen(const char *filename, "w");
double total_sequence_counts = 0;
char sequence_name[];
char line [4095]; // set a temporary line length
char buffer = (char *) malloc (sizeof(line) +1); // allocate some memory
while (fgets(line, sizeof(line), filename) != NULL) { // read until new line character is not found in line
buffer = realloc(*buffer, strlen(line) + strlen(buffer) + 1); // realloc buffer to adjust buffer size
if (buffer == NULL) { // print error message if memory allocation fails
printf("\n Memory error");
return 0;
}
if (line[0] == ">") {
sequence_name = strcpy(sequence_name, &line[1]);
total_sequence_counts += 1
}
else {
double length = strlen(line);
fprintf(output, "%s \t %ld", sequence_name, length);
}
fprintf(output, "%s \t %ld", "Total number of sequences = ", total_sequence_counts);
}
int fclose(FILE *input); // when you are done working with a file, you should close it using this function.
return 0;
int fclose(FILE *output);
return 0;
}
But this code, of course is full of mistakes, my problem is that despite studying a lot, I still can't properly understand and use the memory allocation and pointers so I know I especially have mistakes in that part. It would be great if you could comment on my code and see how it can turn into a script that actually work. By the way, in my actual data, the length of each line is not defined so I need to use malloc and realloc for that purpose.
For a simple program like this, where you look at short lines one at a time, you shouldn't worry about dynamic memory allocation. It is probably good enough to use local buffers of a reasonable size.
Another thing is that C isn't particularly suited for quick-and-dirty string processing. For example, there isn't a strstrip function in the standard library. You usually end up implementing such behaviour yourself.
An example implementation looks like this:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define MAXLEN 80 /* Maximum line length, including null terminator */
int main(int argc, char *argv[])
{
FILE *in;
FILE *out;
char line[MAXLEN]; /* Current line buffer */
char ref[MAXLEN] = ""; /* Sequence reference buffer */
int nseq = 0; /* Sequence counter */
if (argc != 3) {
fprintf(stderr, "Usage: %s infile outfile\n", argv[0]);
exit(1);
}
in = fopen(argv[1], "r");
if (in == NULL) {
fprintf(stderr, "Couldn't open %s.\n", argv[1]);
exit(1);
}
out = fopen(argv[2], "w");
if (out == NULL) {
fprintf(stderr, "Couldn't open %s for writing.\n", argv[2]);
exit(1);
}
while (fgets(line, sizeof(line), in)) {
int len = strlen(line);
/* Strip whitespace from end */
while (len > 0 && isspace((unsigned char)line[len - 1])) len--;
line[len] = '\0';
if (line[0] == '>') {
/* First char is '>': copy from second char in line */
strcpy(ref, line + 1);
} else {
/* Other lines are sequences */
fprintf(out, "%s: %d\n", ref, len);
nseq++;
}
}
fprintf(out, "Total number of sequences. %d\n", nseq);
fclose(in);
fclose(out);
return 0;
}
A lot of code is about enforcing arguments and opening and closing files. (You could cut out a lot of code if you used stdin and stdout with file redirections.)
The core is the big while loop. Things to note:
fgets returns NULL on error or when the end of file is reached.
The first lines determine the length of the line and then remove white-space from the end.
It is not enough to decrement length, at the end the stripped string must be terminated with the null character '\0'
When you check the first character in the line, you should check against a char, not a string. In C, single and double quotes are not interchangeable. ">" is a string literal of two characters, '>' and the terminating '\0'.
When dealing with countable entities like chars in a string, use integer types, not floating-point numbers. (I've used (signed) int here, but because there can't be a negative number of chars in a line, it might have been better to have used an unsigned type.)
The notation line + 1 is equivalent to &line[1].
The code I've shown doesn't check that there is always one reference per sequence. I'll leave this as an exercise to the reader.
For a beginner, this can be quite a lot to keep track of. For small text-processing tasks like yours, Python and Perl are definitely better suited.
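The trailing-whitespace stripping described in the notes above can also be factored into a small helper of the kind the standard library lacks. This is only a sketch; the name rstrip is made up for illustration:

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: removes trailing whitespace in place and
   returns the new length of the string. */
size_t rstrip(char *s)
{
    size_t len = strlen(s);

    /* The cast avoids undefined behaviour in isspace() when char
       is signed and the byte value is negative. */
    while (len > 0 && isspace((unsigned char)s[len - 1]))
        len--;
    s[len] = '\0';                 /* terminate the shortened string */

    return len;
}
```

Inside the reading loop, a call such as len = rstrip(line); would then replace the two stripping statements.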
Edit: The solution above won't work for long sequences; it is restricted to MAXLEN characters. But you don't need dynamic allocation if you only need the length, not the contents of the sequences.
Here's an updated version that doesn't read lines but reads characters instead. In '>' context, it stores the reference. Otherwise it just keeps a count:
#include <stdlib.h>
#include <stdio.h>
#include <ctype.h> /* for isspace() */
#define MAXLEN 80 /* Maximum line length, including null terminator */
int main(int argc, char *argv[])
{
FILE *in;
FILE *out;
int nseq = 0; /* Sequence counter */
char ref[MAXLEN] = ""; /* Reference name; empty until the first '>' line */
in = fopen(argv[1], "r");
out = fopen(argv[2], "w");
/* Snip: Argument and file checking as above */
while (1) {
int c = getc(in);
if (c == EOF) break;
if (c == '>') {
int n = 0;
c = fgetc(in);
while (c != EOF && c != '\n') {
if (n < sizeof(ref) - 1) ref[n++] = c;
c = fgetc(in);
}
ref[n] = '\0';
} else {
int len = 0;
int n = 0;
while (c != EOF && c != '\n') {
n++;
if (!isspace(c)) len = n;
c = fgetc(in);
}
fprintf(out, "%s: %d\n", ref, len);
nseq++;
}
}
fprintf(out, "Total number of sequences. %d\n", nseq);
fclose(in);
fclose(out);
return 0;
}
Notes:
fgetc reads a single byte from a file and returns this byte or EOF when the file has ended. In this implementation, that's the only reading function used.
Storing a reference string is implemented via fgetc here too. You could probably use fgets after skipping the initial angle bracket, too.
The counting just reads bytes without storing them. n is the total count, len is the count up to the last non-space. (Your lines probably consist only of ACGT without any trailing space, so you could skip the test for space and use n instead of len.)
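#define _POSIX_C_SOURCE 200809L /* for getline() and strdup() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[]){
    if(argc != 3){
        fprintf(stderr, "Usage: %s infile outfile\n", argv[0]);
        return 1;
    }
    FILE *my_file = fopen(argv[1], "r");
    FILE *my_output = fopen(argv[2], "w");
    if(my_file == NULL || my_output == NULL){
        perror("fopen");
        return 1;
    }
    int total_sequence_counts = 0;
    char *sequence_name = NULL;
    char *line = NULL;
    size_t size = 0;
    while(-1 != getline(&line, &size, my_file)){
        line[strcspn(line, "\n")] = '\0'; /* strip the trailing newline, if any */
        if(line[0] == '>'){
            free(sequence_name); /* release the previous name before replacing it */
            sequence_name = strdup(line + 1);
            total_sequence_counts += 1;
            continue;
        }
        /* the name may be missing if no '>' line came first or strdup failed */
        fprintf(my_output, "%s %zu\n", sequence_name ? sequence_name : "", strlen(line));
    }
    fprintf(my_output, "Total number of sequences = %d\n", total_sequence_counts);
    fclose(my_file);
    fclose(my_output);
    free(sequence_name);
    free(line);
    return (0);
}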
The code is supposed to read a user-inputted text file name, copy every character into a multidimensional array, then display it with standard output. It compiles, but produces unintelligible text. Am I missing something?
for (i = 0; i < BIGGEST; i++) {
for (j = 0; j < BIGGESTL; j++) {
if (fgetc(array, fp) ) != EOF)
array[i][j] = c;
else array[i][j] = '\0'
}
fclose(fp);
return 0;
}
You stop filling the array when you encounter EOF, but you print the full array out no matter what.
If the data read from the file is smaller than the input array, you will read that data in and then print that data out, plus whatever random characters were in the memory locations that you do not overwrite with data from the file.
Since the requirement seems to be to print text data, you could insert a special marker in the array (e.g. '\0') to indicate the position where you encountered EOF, and stop displaying data when you reach that marker.
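A minimal sketch of the marker idea follows. The array dimensions ROWS and COLS and both function names are hypothetical stand-ins for the question's BIGGEST, BIGGESTL and its loops:

```c
#include <stdio.h>

#define ROWS 4   /* hypothetical sizes, stand-ins for BIGGEST / BIGGESTL */
#define COLS 8

/* Fills the array character by character and stores '\0' at the
   position where EOF was met (or in the last cell if the array filled
   up first). Returns the number of characters stored. */
size_t fill_grid(FILE *fp, char grid[ROWS][COLS])
{
    size_t n = 0;
    int c;

    for (int i = 0; i < ROWS; i++) {
        for (int j = 0; j < COLS; j++) {
            int last = (i == ROWS - 1 && j == COLS - 1);
            if (last || (c = fgetc(fp)) == EOF) {
                grid[i][j] = '\0';     /* marker: nothing valid after this */
                return n;
            }
            grid[i][j] = (char)c;
            n++;
        }
    }
    return n;                          /* not reached: the last cell returns */
}

/* Prints cells up to, but not including, the marker. */
void print_grid(FILE *out, char grid[ROWS][COLS])
{
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++) {
            if (grid[i][j] == '\0')
                return;
            fputc(grid[i][j], out);
        }
}
```

Only the cells before the marker are ever read back, so the uninitialised remainder of the array is never printed.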
It is better to read the file line by line.
For example:
int i = 0;
while(fgets(text[i],1000,fp))
{
i++;
}
Although the question has been edited and only part of the code is left in it, I am posting more than the question currently requires.
The reason is that there can be numerous improvements to the originally posted full code.
In the main() function:
You need to check that argc equals 2 and only then use argv[1]. Otherwise, if the program is executed without the command-line argument (the file name in this case), argv[1] is a null pointer and using it can result in a segmentation fault.
In the read_and_show_the_file() function:
Stop reading the file when the end of file is reached or when the maximum number of characters has been read and stored in the character array.
Below Program will help you visualize:
#include <stdio.h>
/*Max number of characters to be read/write from file*/
#define MAX_CHAR_FOR_FILE_OPERATION 1000000
int read_and_show_the_file(char *filename)
{
FILE *fp;
static char text[MAX_CHAR_FOR_FILE_OPERATION]; //static: an array this large may not fit on the stack
int i;
fp = fopen(filename, "r");
if(fp == NULL)
{
printf("File Pointer is invalid\n");
return -1;
}
//Ensure array write starts from beginning
i = 0;
//Read the file until either EOF is reached or the array is full (one element is kept for '\0')
int c;
while( (i < MAX_CHAR_FOR_FILE_OPERATION - 1) && ((c = fgetc(fp)) != EOF) )
{
text[i++] = (char)c;
}
//Terminate the stored text, even if the file was empty
text[i] = '\0';
//Print the stored characters from the beginning
printf("%s", text);
fclose(fp);
return 0;
}
int main(int argc, char *argv[])
{
if(argc != 2)
{
printf("Execute the program along with file name to be read and printed. \n\
\rFormat : \"%s <file-name>\"\n",argv[0]);
return -1;
}
char *filename = argv[1];
if( (read_and_show_the_file(filename)) == 0)
{
printf("File Read and Print to stdout is successful\n");
}
return 0;
}