I'm currently working on this assignment and I'm stuck. The objective is to read a file and check whether the char values of a given String exist in each String from the file. I have to compare each String from the file to another String I pass in as an argument. As long as each char value of the argument is in the String from the file, it "matches".
Example (input and output):
./a.out file1 done
done is in bonehead
done is not in doggie
Example (file1):
bonehead
doggie
As you can see, the order in which it compares the characters does not matter, and the file has one word per line. I've put together a program that finds whether a char value is present in the other String, but that is only part of the problem. Any idea how to go about this?
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char **argv){
FILE *f = fopen(argv[1], "r");
char *line = NULL;
size_t len = 0;
ssize_t read;
char *word = argv[2];
if(argc != 3){
printf("./a.out <file> <word>\n");
exit(EXIT_SUCCESS);
}
if(f == NULL){
printf("file empty\n");
exit(EXIT_SUCCESS);
}
// confused what this loop does too
while((read = getline(&line, &len, f)) != -1){
char *c = line;
while(*c){
if(strchr(word, *c))
printf("can't spell \"%s\" without \"%s\"!\n", line, word);
else
printf("no \"%s\" in \"%s\".\n", word, line);
c++;
}
}
fclose(f);
exit(EXIT_SUCCESS);
}
Another approach is to count, for each line read from the file, how many of the unique characters in the search term appear in that line. If the count equals the number of unique characters in the search term, then every character of the search term is included in the line read from the file.
#include <stdio.h>
#include <string.h>
#define MAXC 256
int main (int argc, char **argv) {
if (argc < 3 ) { /* validate required arguments */
fprintf (stderr, "error: insufficient input, usage: %s file string\n",
argv[0]);
return 1;
}
FILE *fp = fopen (argv[1], "r");
char line[MAXC] = "";
char *s = argv[2]; /* string holding search string */
size_t slen = strlen(s), sum = 0, ulen;
char uniq[slen+1]; /* unique characters in s */
if (!fp) { /* validate file open */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
memset (uniq, 0, slen+1); /* zero the VLA */
/* fill uniq with unique characters from s */
for (; *s; s++) if (!strchr (uniq, *s)) uniq[sum++] = *s;
ulen = strlen (uniq);
s = argv[2]; /* reset s */
while (fgets (line, MAXC, fp)) { /* for each line in file */
if (strlen (line) - 1 < ulen) { /* short line, continue */
printf ("%s is not in %s", s, line);
continue;
}
char *up = uniq; /* ptr to uniq */
sum = 0; /* reset sum */
while (*up) if (strchr (line, *up++)) sum++; /* count chars */
if (sum < ulen) /* validate sum */
printf ("%s is not in %s", s, line);
else
printf ("%s is in %s", s, line);
}
fclose (fp); /* close file */
return 0;
}
Example Use/Output
$ ./bin/strallcinc dat/words.txt done
done is in bonehead
done is not in doggie
which would work equally well for duplicate characters in the search string. e.g.
$ ./bin/strallcinc dat/words.txt doneddd
doneddd is in bonehead
doneddd is not in doggie
You can decide if you would handle duplicate characters differently, but you should make some determination on how that contingency will be addressed.
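If you decide that duplicates should count (so that "doneddd" would need four d's in the line), one option is to compare per-character counts instead of unique characters. The following is only a sketch under that assumption, for single-byte characters; the name contains_with_counts is made up for illustration and is not part of the program above:

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if every character of word occurs in
 * line at least as many times as it occurs in word, 0 otherwise.
 * Assumes single-byte characters. */
int contains_with_counts (const char *word, const char *line)
{
    size_t need[UCHAR_MAX + 1] = {0};   /* occurrences required by word */
    size_t have[UCHAR_MAX + 1] = {0};   /* occurrences found in line */

    for (; *word; word++)
        need[(unsigned char)*word]++;
    for (; *line; line++)
        have[(unsigned char)*line]++;

    for (size_t i = 0; i <= UCHAR_MAX; i++)
        if (have[i] < need[i])          /* line is short of this char */
            return 0;
    return 1;
}
```

With this rule "done" still matches "bonehead", but "doneddd" no longer does, because "bonehead" contains only one d.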
Let me know if you have any questions.
confused what this loop does
The while (read ... line obviously reads in lines from your file, placing them in the line variable
c is a pointer to the start of the variable line, and this pointer is incremented by c++ so that each letter in the word from the file is accessed. The while loop terminates when c points to the null terminator, i.e. when *c is 0.
The if (strchr(word ... line is testing if the test word contains one of the letters from the word in the file.
This seems to be the reverse of what you are trying to do - finding if all the letters in the test word can be found in the word from the file.
The printf lines are not sensible because there is no either/or at the level of a single letter. You need one line that prints 'yes', all letters are present, and one line that prints 'no', at least one letter is not present.
The printf statements should be outside the comparison loop, so that you don't get multiple lines of output for each word. Add a flag to show if any letter does not exist in the word. Set flag to 1 at start, and only change it to 0 when a letter is not present, then use the flag to print one of the two outcome statements.
This code snippet may help
/* set flag to 'letters all present' */
int flag = 1;
/* set pointer c to start of test word */
c = word;
/* test word from file for each letter in test word */
while(*c) {
if(strchr(line, *c) == NULL) {
/* set flag to letter not present */
flag = 0;
break;
}
c++;
}
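Wrapped in a function, the flag loop above might look like the following sketch. The name all_letters_present is hypothetical; the logic is the same strchr test, just packaged so the two printf statements can sit outside the comparison loop:

```c
#include <string.h>

/* Hypothetical wrapper around the flag loop: returns 1 if every letter
 * of word occurs somewhere in line, 0 as soon as one is missing. */
int all_letters_present (const char *word, const char *line)
{
    int flag = 1;                       /* assume all letters present */

    for (const char *c = word; *c; c++) {
        if (strchr (line, *c) == NULL) {
            flag = 0;                   /* a letter is not present */
            break;
        }
    }
    return flag;
}
```

Inside the getline loop you would then call it once per line and print exactly one of the two messages depending on the return value.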
I have a file that contain few different sections. All sections have a start section and end section lines to distinguish between sections.
How can I read lines from section-2?
>start Section-1
Some words are here.
>end Section-1
>start Section-2
Other words are also here.
>end Section-2
With my current code, the whole file is printed (all sections, minus the lines separating the sections). I understand the issue is that my fgets loop reads the file until ">end Section-2", and I probably need another while loop to start reading from a specific start-section line. But I'm not sure how to change the code so it only outputs the words inside Section-2.
Expected output:
Other
words
are
also
here.
What I get now:
Some
words
are
here.
Other
words
are
also
here.
My code:
#define MAXSTR 1000
#define END ">end Section-2\n"
#define ENDWORD ">end"
#define STRWORD ">start"
#define SECTION "Section-2"
int main () {
FILE *file;
char lines[MAXSTR];
char delim[2] = " ";
char *words;
if ((file = fopen("sample.txt", "r")) == NULL) {
printf("File empty.\n");
return 0;
}
while (strcmp(fgets(lines, MAXSTR, file), END) != 0) {
words = strtok(lines, delim);
while (words != NULL && strcmp(words, STRWORD) != 0
&& strcmp(words, SECTION) != 0
&& strcmp(words, ENDWORD) != 0) {
printf("%s\n", words);
words = strtok(NULL, delim);
}
}
fclose(file);
return 0;
}
You are thinking along the correct lines. The key is to set a flag when you find the first "Section-X" to read and then while that flag is set, tokenize each line until the closing "Section-X" is found, at which time you exit your read-loop.
You can check for "Section-X" however you like, using the entire line, or just the "Section-X" identifier (which I chose below). To locate the "Section-X" text, just use strrchr() to find the last space in each line, and compare from the next character to the end of line for your section, e.g.
#include <stdio.h>
#include <string.h>
#define MAXC 1024
int main (int argc, char **argv) {
if (argc < 2) { /* validate 1 arg given for filename */
fprintf (stderr, "usage: %s file [\"Section-X\" (default: 2)]\n", argv[0]);
return 1;
}
const char *section = argc > 2 ? argv[2] : "Section-2", /* set section */
*delim = " ";
char line[MAXC];
int found = 0; /* found flag, 0-false, 1-true */
FILE *fp = fopen (argv[1], "r"); /* open file */
if (!fp) { /* validate file open for reading */
perror ("fopen-fp");
return 1;
}
while (fgets (line, MAXC, fp)) { /* read each line */
line[strcspn (line, "\n")] = 0; /* trim \n from end */
char *p = strrchr(line, ' '); /* pointer to last space */
if (p && strcmp (p + 1, section) == 0) { /* compare "Section-X" */
if (found++) /* check/set found flag */
break; /* break loop if 2nd "Section-X" */
continue;
}
if (found) { /* if found set, tokenize each line */
for (p = strtok (line, delim); p; p = strtok (NULL, delim))
puts (p);
}
}
fclose (fp); /* close file */
return 0;
}
Example Use/Output
With your input stored in the file dat/sections.txt and reading default "Section-2":
$ ./bin/read_sections dat/sections.txt
Other
words
are
also
here.
Reading "Section-1":
$ ./bin/read_sections dat/sections.txt "Section-1"
Some
words
are
here.
Look things over and let me know if you have questions.
I have a .txt file that contains data in this format:
xxxx: 0.9467,
yyyy: 0.9489,
zzzz: 0.78973,
hhhh: 0.8874,
yyyy: 0.64351,
xxxx: 0.8743,
and so on...
Let's say that my C program receives, as input, the string yyyy. The program should, simply, return all the instances of yyyy in the .txt file and the average of all their numerical values.
int main() {
FILE *filePTR;
char fileRow[100000];
if (fopen_s(&filePTR, "file.txt", "r") == 0) {
while (fgets(fileRow, sizeof fileRow, filePTR) != NULL) {
if (strstr(fileRow, "yyyy") != NULL) { // Input parameter
printf("%s", fileRow);
}
}
fclose(filePTR);
printf("\nEnd of the file.\n");
} else {
printf("ERROR! Impossible to read the file.");
}
return 0;
}
This is my code right now. I don't know how to:
Isolate the numerical values
actually convert them to double type
average them
I read something about the strtok function (just to start), but I would need some help...
You have started off on the right track and should be commended for using fgets() to read a complete line from the file on each iteration, but your choice of strstr does not ensure the prefix you are looking for is found at the beginning of the line.
Further, you want to avoid hardcoding your search string as well as the file to open. main() takes arguments through argc and argv that let you pass information into your program on startup. See: C11 Standard - §5.1.2.2.1 Program startup(p1). Using the parameters eliminates your need to hardcode values by letting you pass the filename to open and the prefix to search for as arguments to your program (which also eliminates the need to recompile your code simply to read from another filename or search for another string).
For example, instead of hardcoding values, you can use the parameters to main() to open any file and search for any prefix simply using something similar to:
#include <stdio.h>
#include <string.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
int main (int argc, char **argv) {
char buf[MAXC] = "", *str = NULL; /* buffer for line and ptr to search str */
size_t n = 0, len = 0; /* counter and search string length */
double sum = 0; /* sum of matching lines */
FILE *fp = NULL; /* file pointer */
if (argc < 3) { /* validate 2 arguments given - filename, search_string */
fprintf (stderr, "error: insufficient number of arguments\n"
"usage: %s filename search_string\n", argv[0]);
return 1;
}
if (!(fp = fopen (argv[1], "r"))) { /* open/validate file open for reading */
perror ("fopen-filename");
return 1;
}
str = argv[2]; /* set pointer to search string */
len = strlen (str); /* get length of search string */
...
At this point in your program, you have opened the file passed as the first argument and have validated that it is open for reading through the file-stream pointer fp. You have passed in the prefix to search for as the second argument, assigned it to the pointer str, obtained the length of the prefix, and stored it in len.
Next you want to read each line from your file into buf, but instead of attempting to match the prefix with strstr(), you can use strncmp() with len to compare the beginning of the line read from your file. If the prefix is found, you can then use sscanf to parse the double value from the file and add it to sum and increment the number of values stored in n, e.g.
while (fgets (buf, MAXC, fp)) { /* read each line into buf */
if (strncmp (buf, str, len) == 0) { /* if prefix matches */
double tmp; /* temporary double for parse */
/* parse with scanf, discarding prefix with assignment suppression */
if (sscanf (buf, "%*1023[^:]: %lf", &tmp) == 1) {
sum += tmp; /* add value to sum */
n++; /* increment count of values */
}
}
}
(note: above the assignment suppression operator for sscanf(), '*' allows you to read and discard the prefix and ':' without having to store the prefix in a second string)
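To see the assignment suppression in isolation, here is a small sketch. The helper name parse_value is hypothetical, and the field-width limit is dropped to keep the format string short:

```c
#include <stdio.h>

/* Hypothetical helper: parse the number after the first ':' in a line
 * such as "yyyy: 0.9489,". Returns 1 and stores the value on success,
 * 0 if no number could be converted. */
int parse_value (const char *line, double *out)
{
    /* %*[^:] reads and discards everything up to the ':' (nothing is
     * stored), the literal ':' must then match, and the space skips
     * any whitespace before %lf converts the number. */
    return sscanf (line, "%*[^:]: %lf", out) == 1;
}
```

Because the discarded field is not assigned, it does not count toward the return value of sscanf, which is why the check is against 1 rather than 2.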
All that remains is checking if values are contained in sum by checking your count n and if so, output the average for the prefix. Or, if n == 0 the prefix was not found in the file, e.g.:
if (n) /* if values found, output average */
printf ("prefix '%s' avg: %.4f\n", str, sum / n);
else /* output not found */
printf ("prefix '%s' -- not found in file.\n", str);
}
That is basically all you need. With it, you can read from any file you like and search for any prefix simply passing the filename and prefix as the first two arguments to your program. The complete example would be:
#include <stdio.h>
#include <string.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
int main (int argc, char **argv) {
char buf[MAXC] = "", *str = NULL; /* buffer for line and ptr to search str */
size_t n = 0, len = 0; /* counter and search string length */
double sum = 0; /* sum of matching lines */
FILE *fp = NULL; /* file pointer */
if (argc < 3) { /* validate 2 arguments given - filename, search_string */
fprintf (stderr, "error: insufficient number of arguments\n"
"usage: %s filename search_string\n", argv[0]);
return 1;
}
if (!(fp = fopen (argv[1], "r"))) { /* open/validate file open for reading */
perror ("fopen-filename");
return 1;
}
str = argv[2]; /* set pointer to search string */
len = strlen (str); /* get length of search string */
while (fgets (buf, MAXC, fp)) { /* read each line into buf */
if (strncmp (buf, str, len) == 0) { /* if prefix matches */
double tmp; /* temporary double for parse */
/* parse with scanf, discarding prefix with assignment suppression */
if (sscanf (buf, "%*1023[^:]: %lf", &tmp) == 1) {
sum += tmp; /* add value to sum */
n++; /* increment count of values */
}
}
}
if (n) /* if values found, output average */
printf ("prefix '%s' avg: %.4f\n", str, sum / n);
else /* output not found */
printf ("prefix '%s' -- not found in file.\n", str);
}
Example Use/Output
Using your data file stored in dat/prefixdouble.txt, you can search for each prefix in the file and obtain the average, e.g.
$ ./bin/prefixaverage dat/prefixdouble.txt hhhh
prefix 'hhhh' avg: 0.8874
$ ./bin/prefixaverage dat/prefixdouble.txt xxxx
prefix 'xxxx' avg: 0.9105
$ ./bin/prefixaverage dat/prefixdouble.txt yyyy
prefix 'yyyy' avg: 0.7962
$ ./bin/prefixaverage dat/prefixdouble.txt zzzz
prefix 'zzzz' avg: 0.7897
$ ./bin/prefixaverage dat/prefixdouble.txt foo
prefix 'foo' -- not found in file.
Much easier than having to recompile each time you want to search for another prefix. Look things over and let me know if you have further questions.
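Since the question also asked how to isolate the numerical value and convert it to double, here is an alternative sketch that uses strchr() and strtod() instead of sscanf(). This is a swapped-in technique, not what the program above does, and the name value_after_colon is hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: find the first ':' in line and convert the text
 * that follows it to double. Returns 1 on success, 0 if there is no
 * ':' or no number after it. */
int value_after_colon (const char *line, double *out)
{
    const char *colon = strchr (line, ':');
    char *end;

    if (!colon)                         /* no separator in this line */
        return 0;

    double v = strtod (colon + 1, &end);    /* skips leading whitespace */
    if (end == colon + 1)               /* nothing was converted */
        return 0;

    *out = v;
    return 1;
}
```

strtod() stops at the first character that cannot be part of the number, so the trailing comma in your data needs no special handling.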
This code works for single-word counts, and it differentiates between words with punctuation and between upper- and lower-case words. Is there an easy way to make this code work for pairs of words instead of single words? I need to print the number of occurrences of every pair of words in a text file.
Your help is much appreciated,
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
FILE* f = fopen (argv[1], "r");
char buffer[10000];
if (argc != 2)
{
fprintf(stderr, "Usage: %s file\n", argv[0]);
}
fclose(f);
snprintf(buffer, sizeof(buffer), "tr -cs '[:punct:][a-z][A-Z]' '[\\n*]' < %s |"
" sort | uniq -c | sort -n", argv[1]);
return(system(buffer));
}
Example input
The Cat Sat On The Mat
Output
(The Cat, The Sat, The On, The The, The Mat, Cat The, Cat Sat, Cat On, for 30 pairs)
It seems inconceivable that the purpose of your assignment determining the frequency of word-pairs in a file would be to have you wrap a piped-string of shell utilities in a system call. What does that possibly teach you about C? That a system function exists that allows shell access? Well, it does, and you can, lesson done, nothing learned.
It seems far more likely that the intent was for you to understand the use of structures to hold collections of related data in a single object, or at the minimum array or pointer indexing to check for pairs in adjacent words within a file. Of the 2 normal approaches, use of a struct, or index arithmetic, the use of a struct is far more beneficial. Something simple to hold a pair of words and the frequency that pair is seen is all you need. e.g.:
enum { MAXC = 32, MAXP = 100 };
typedef struct {
char w1[MAXC];
char w2[MAXC];
size_t freq;
} wordpair;
(note, the enum simply defines the constants MAXC (32) and MAXP (100) for maximum characters per-word, and maximum pairs to record. You could use two #define statements to the same end)
You can declare an array of the wordpair struct, where each element holds a pair of words w1 and w2 and the number of times that pair is seen in freq. The array of struct can be treated like any other array: sorted, etc.
To analyze the file, read the first two words into the first struct and save a pointer to the second word. Then read each remaining word in the file and check whether the pair formed by the saved word and the new word already exists. If so, simply update the number of times it has been seen; if not, add a new pair. In both cases the new word becomes the first word of the next pair, and you repeat.
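The find-or-add step described above can be sketched on its own as follows. The helper name record_pair is hypothetical, and the struct and constants are repeated only so the sketch compiles by itself:

```c
#include <stddef.h>
#include <string.h>

enum { MAXC = 32, MAXP = 100 };

typedef struct {
    char w1[MAXC];
    char w2[MAXC];
    size_t freq;
} wordpair;

/* Hypothetical helper: count the pair (a, b) in pairs[0..*n).
 * Returns 1 if the pair was counted, 0 if the array is full or a
 * word does not fit in MAXC characters. */
int record_pair (wordpair *pairs, size_t *n, const char *a, const char *b)
{
    for (size_t i = 0; i < *n; i++)     /* pair already recorded? */
        if (strcmp (pairs[i].w1, a) == 0 && strcmp (pairs[i].w2, b) == 0) {
            pairs[i].freq++;
            return 1;
        }

    if (*n == MAXP || strlen (a) >= MAXC || strlen (b) >= MAXC)
        return 0;

    strcpy (pairs[*n].w1, a);           /* add a new pair */
    strcpy (pairs[*n].w2, b);
    pairs[*n].freq = 1;
    (*n)++;
    return 1;
}
```

The caller keeps track of the previous word and passes it as a together with each newly read word as b.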
Below is a short example that will check the pair occurrence for the words in all filenames given as arguments on the command line (e.g. ./progname file1 file2 ...). If no file is given, the code will read from stdin by default.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
enum { MAXC = 32, MAXP = 100 };
typedef struct {
char w1[MAXC];
char w2[MAXC];
size_t freq;
} wordpair;
size_t get_pair_freq (wordpair *words, FILE *fp);
int compare (const void *a, const void *b);
int main (int argc, char **argv) {
/* initialize variables & open file or stdin for reading */
wordpair words[MAXP] = {{"", "", 0}};
size_t i, idx = 0;
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) {
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
/* read from file given, or from stdin (default) */
idx = get_pair_freq (words, fp);
/* read each remaining file given on command line */
for (i = 2; i < (size_t)argc; i++)
{ if (fp && fp != stdin) { fclose (fp); fp = NULL; }
/* open file for reading */
if (!(fp = fopen (argv[i], "r"))) {
fprintf (stderr, "error: file open failed '%s'.\n",
argv[i]);
continue;
}
/* check 'idx' against MAXP */
if ((idx += get_pair_freq (words, fp)) == MAXP)
break;
}
if (fp && fp != stdin) fclose (fp);
/* sort words alphabetically */
qsort (words, idx, sizeof *words, compare);
/* output the frequency of word pairs */
printf ("\nthe occurrence of words pairs are:\n\n");
for (i = 0; i < idx; i++) {
char pair[MAXC * 2] = "";
sprintf (pair, "%s:%s", words[i].w1, words[i].w2);
printf (" %-32s : %zu\n", pair, words[i].freq);
}
return 0;
}
size_t get_pair_freq (wordpair *pairs, FILE *fp)
{
char w1[MAXC] = "", w2[MAXC] = "";
/* field width 31 leaves room for the terminating nul in char[MAXC] */
char *fmt1 = " %31[^ ,.\t\n]%*c";
char *fmt2 = " %31[^ ,.\t\n]%*[^A-Za-z0-9]%31[^ ,.\t\n]%*c";
char *w1p;
int nw = 0;
size_t i, idx = 0;
/* read 1st 2 words into pair, update index 'idx' */
if (idx == 0) {
if ((nw = fscanf (fp, fmt2, w1, w2)) == 2) {
strcpy (pairs[idx].w1, w1);
strcpy (pairs[idx].w2, w2);
pairs[idx].freq++;
w1p = pairs[idx].w2; /* save pointer to w2 for next w1 */
idx++;
}
else {
if (!nw) fprintf (stderr, "error: file read error.\n");
return idx;
}
}
/* read each word in file into w2 */
while (fscanf (fp, fmt1, w2) == 1) {
/* check against all pairs in struct */
for (i = 0; i < idx; i++) {
/* check if pair already exists */
if (strcmp (pairs[i].w1, w1p) == 0 &&
strcmp (pairs[i].w2, w2) == 0) {
pairs[i].freq++; /* update frequency for pair */
w1p = pairs[i].w2; /* matched w2 becomes the next w1 */
goto skipdup; /* skip adding duplicate pair */
}
} /* add new pair, update pairs[*idx].freq */
strcpy (pairs[idx].w1, w1p);
strcpy (pairs[idx].w2, w2);
pairs[idx].freq++;
w1p = pairs[idx].w2;
idx++;
skipdup:
if (idx == MAXP) { /* check 'idx' against MAXP */
fprintf (stderr, "warning: MAXP words exceeded.\n");
break;
}
}
return idx;
}
/* qsort compare function */
int compare (const void *a, const void *b)
{
return (strcmp (((wordpair *)a)->w1, ((wordpair *)b)->w1));
}
Use/Output
Given your example of "Hi how are you are you.", it produces the desired results (in sorted order according to your LOCALE).
$ echo "Hi how are you are you." | ./bin/file_word_pairs
the occurrence of words pairs are:
Hi:how : 1
are:you : 2
how:are : 1
you:are : 1
(there is no requirement that you sort the results, but it makes lookup/confirmation a lot easier with longer files)
Without the qsort call, the pairs are listed in the order they were first seen:
$ echo "Hi how are you are you." | ./bin/file_word_pairs
the occurrence of words pairs are:
Hi:how : 1
how:are : 1
are:you : 2
you:are : 1
While you are free to attempt to use your system version, why not take the time to learn how to approach the problem in C. If you want to learn how to do it through a system call, take a Linux course, as doing it in that manner has very little to do with C.
Look it over, lookup the functions that are new to you in the man pages and then ask about anything you don't understand thereafter.
I am a biology student and I am trying to learn perl, python and C and also use the scripts in my work. So, I have a file as follows:
>sequence1
ATCGATCGATCG
>sequence2
AAAATTTT
>sequence3
CCCCGGGG
The output should look like this, that is the name of each sequence and the count of characters in each line and printing the total number of sequences in the end of the file.
sequence1 12
sequence2 8
sequence3 8
Total number of sequences = 3
I could make the perl and python scripts work, this is the python script as an example:
#!/usr/bin/python
import sys
my_file = open(sys.argv[1]) #open the file
my_output = open(sys.argv[2], "w") #open output file
total_sequence_counts = 0
for line in my_file:
if line.startswith(">"):
sequence_name = line.rstrip('\n').replace(">","")
total_sequence_counts += 1
continue
dna_length = len(line.rstrip('\n'))
my_output.write(sequence_name + " " + str(dna_length) + '\n')
my_output.write("Total number of sequences = " + str(total_sequence_counts) + '\n')
Now, I want to write the same script in C, this is what I have achieved so far:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[])
{
input = FILE *fopen(const char *filename, "r");
output = FILE *fopen(const char *filename, "w");
double total_sequence_counts = 0;
char sequence_name[];
char line [4095]; // set a temporary line length
char buffer = (char *) malloc (sizeof(line) +1); // allocate some memory
while (fgets(line, sizeof(line), filename) != NULL) { // read until new line character is not found in line
buffer = realloc(*buffer, strlen(line) + strlen(buffer) + 1); // realloc buffer to adjust buffer size
if (buffer == NULL) { // print error message if memory allocation fails
printf("\n Memory error");
return 0;
}
if (line[0] == ">") {
sequence_name = strcpy(sequence_name, &line[1]);
total_sequence_counts += 1
}
else {
double length = strlen(line);
fprintf(output, "%s \t %ld", sequence_name, length);
}
fprintf(output, "%s \t %ld", "Total number of sequences = ", total_sequence_counts);
}
int fclose(FILE *input); // when you are done working with a file, you should close it using this function.
return 0;
int fclose(FILE *output);
return 0;
}
But this code, of course is full of mistakes, my problem is that despite studying a lot, I still can't properly understand and use the memory allocation and pointers so I know I especially have mistakes in that part. It would be great if you could comment on my code and see how it can turn into a script that actually work. By the way, in my actual data, the length of each line is not defined so I need to use malloc and realloc for that purpose.
For a simple program like this, where you look at short lines one at a time, you shouldn't worry about dynamic memory allocation. It is probably good enough to use local buffers of a reasonable size.
Another thing is that C isn't particularly suited for quick-and-dirty string processing. For example, there isn't a strstrip function in the standard library. You usually end up implementing such behaviour yourself.
An example implementation looks like this:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define MAXLEN 80 /* Maximum line length, including null terminator */
int main(int argc, char *argv[])
{
FILE *in;
FILE *out;
char line[MAXLEN]; /* Current line buffer */
char ref[MAXLEN] = ""; /* Sequence reference buffer */
int nseq = 0; /* Sequence counter */
if (argc != 3) {
fprintf(stderr, "Usage: %s infile outfile\n", argv[0]);
exit(1);
}
in = fopen(argv[1], "r");
if (in == NULL) {
fprintf(stderr, "Couldn't open %s.\n", argv[1]);
exit(1);
}
out = fopen(argv[2], "w");
if (out == NULL) {
fprintf(stderr, "Couldn't open %s for writing.\n", argv[2]);
exit(1);
}
while (fgets(line, sizeof(line), in)) {
int len = strlen(line);
/* Strip whitespace from end */
while (len > 0 && isspace(line[len - 1])) len--;
line[len] = '\0';
if (line[0] == '>') {
/* First char is '>': copy from second char in line */
strcpy(ref, line + 1);
} else {
/* Other lines are sequences */
fprintf(out, "%s: %d\n", ref, len);
nseq++;
}
}
fprintf(out, "Total number of sequences. %d\n", nseq);
fclose(in);
fclose(out);
return 0;
}
A lot of code is about enforcing arguments and opening and closing files. (You could cut out a lot of code if you used stdin and stdout with file redirections.)
The core is the big while loop. Things to note:
fgets returns NULL on error or when the end of file is reached.
The first lines determine the length of the line and then remove white-space from the end.
It is not enough to decrement length, at the end the stripped string must be terminated with the null character '\0'
When you check the first character in the line, you should check against a char, not a string. In C, single and double quotes are not interchangeable. ">" is a string literal of two characters, '>' and the terminating '\0'.
When dealing with countable entities like chars in a string, use integer types, not floating-point numbers. (I've used (signed) int here, but because there can't be a negative number of chars in a line, it might have been better to have used an unsigned type.)
The notation line + 1 is equivalent to &line[1].
The code I've shown doesn't check that there is always one reference per sequence. I'll leave this as an exercise to the reader.
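The whitespace stripping from the notes above can also be factored into a small helper, since the standard library has no such function. This is just a sketch, and the name rstrip is made up for illustration:

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: remove trailing whitespace (including '\n')
 * from s in place and return the new length. */
size_t rstrip (char *s)
{
    size_t len = strlen (s);

    /* step back over trailing whitespace; the cast keeps isspace()
     * well defined for bytes outside the ASCII range */
    while (len > 0 && isspace ((unsigned char)s[len - 1]))
        len--;

    s[len] = '\0';                      /* terminate the stripped string */
    return len;
}
```

The return value is the stripped length, so the sequence length can be printed without a second call to strlen.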
For a beginner, this can be quite a lot to keep track of. For small text-processing tasks like yours, Python and Perl are definitely better suited.
Edit: The solution above won't work for long sequences; it is restricted to MAXLEN characters. But you don't need dynamic allocation if you only need the length, not the contents of the sequences.
Here's an updated version that doesn't read lines, but reads characters instead. In '>' context, it stores the reference. Otherwise it just keeps a count:
#include <stdlib.h>
#include <stdio.h>
#include <ctype.h> /* for isspace() */
#define MAXLEN 80 /* Maximum line length, including null terminator */
int main(int argc, char *argv[])
{
FILE *in;
FILE *out;
int nseq = 0; /* Sequence counter */
char ref[MAXLEN]; /* Reference name */
in = fopen(argv[1], "r");
out = fopen(argv[2], "w");
/* Snip: Argument and file checking as above */
while (1) {
int c = getc(in);
if (c == EOF) break;
if (c == '>') {
int n = 0;
c = fgetc(in);
while (c != EOF && c != '\n') {
if (n < sizeof(ref) - 1) ref[n++] = c;
c = fgetc(in);
}
ref[n] = '\0';
} else {
int len = 0;
int n = 0;
while (c != EOF && c != '\n') {
n++;
if (!isspace(c)) len = n;
c = fgetc(in);
}
fprintf(out, "%s: %d\n", ref, len);
nseq++;
}
}
fprintf(out, "Total number of sequences. %d\n", nseq);
fclose(in);
fclose(out);
return 0;
}
Notes:
fgetc reads a single byte from a file and returns this byte or EOF when the file has ended. In this implementation, that's the only reading function used.
Storing a reference string is implemented via fgetc here too. You could probably use fgets after skipping the initial angle bracket, too.
The counting just reads bytes without storing them. n is the total count, len is the count up to the last non-space. (Your lines probably consist only of ACGT without any trailing space, so you could skip the test for space and use n instead of len.)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[]){
FILE *my_file = fopen(argv[1], "r");
FILE *my_output = fopen(argv[2], "w");
int total_sequence_counts = 0;
char *sequence_name;
int dna_length;
char *line = NULL;
size_t size = 0;
while(-1 != getline(&line, &size, my_file)){
if(line[0] == '>'){
sequence_name = strdup(strtok(line, ">\n"));
total_sequence_counts +=1;
continue;
}
dna_length = strlen(strtok(line, "\n"));
fprintf(my_output, "%s %d\n", sequence_name, dna_length);
free(sequence_name);
}
fprintf(my_output, "Total number of sequences = %d\n", total_sequence_counts);
fclose(my_file);
fclose(my_output);
free(line);
return (0);
}
I'm trying to write a program to swap a character that I would specify on the command line (a command line argument) with a character in the input text file. The first command line argument is the character I want to change, the second argument is character that I want to replace the old character with, and the third argument is the input file.
When I do this, my program should generate an output file named: "translation.txt". I know that the problem with my program is in the "if" statements/the fprintf statements, but I'm not sure how to fix this. I was thinking of reading each character in the input file separately, and from there, I wanted to use "if" statements to determine whether or not to replace the character.
void replace_character(int arg_list, char *arguments[])
{
FILE *input, *output;
input = fopen(arguments[3], "r");
output = fopen("translation.txt", "w");
if (input == NULL)
{
perror("Error: file cannot be opened\n");
}
for (int i = 0; i != EOF; i++)
{
if (input[i] == arguments[1])
{
fprintf(output, "%c\n", arguments[2]);
}
else
{
fprintf(output, "%c\n", arguments[1]);
}
}
}
int main(int argc, char *argv[])
{
if (argc < 5)
{
perror("Error!\n");
}
replace_character(argc, argv);
}
Okay I think this can help:
#include <stdio.h>
int main(int argc, char** argv)
{
if (argc < 4) return -1; /* quit if argument list not there */
FILE* handle = fopen(argv[3], "r+"); /* open the file for reading and updating */
if (handle == NULL) return -1; /* if file not found quit */
int current_char = 0; /* int, not char, so EOF can be told apart from data */
char to_replace = argv[1][0]; /* get the character to be replaced */
char replacement = argv[2][0]; /* get the replacing character */
while ((current_char = fgetc(handle)) != EOF) /* while it's not the end-of-file */
{ /* read a character at a time */
if (current_char == to_replace) /* if we've found our character */
{
fseek(handle, ftell(handle) - 1, SEEK_SET); /* set the position of the stream
one character back, this is done by
getting the current position using
ftell, subtracting one from it and
using fseek to set a new position */
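fprintf(handle, "%c", replacement); /* write the new character at the new position */
fseek(handle, 0, SEEK_CUR); /* a positioning call is required before switching from writing back to reading */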
}
}
fclose(handle); /* it's important to close the file handle
when you're done with it so buffered writes
are flushed and the handle is released */
return 0;
}
Given an input file specified as the third argument, it seeks each occurrence of the character to replace and overwrites it with what is stored in replacement. Note that this edits the input file in place rather than writing translation.txt. Give it a try and let me know if it doesn't work. I run it like this:
./a.out l a input_trans.txt
My file has just the string 'Hello, World!'. After running this it's changed to 'Heaao, Worad!'.
Read up on ftell and fseek, as they're key here for what you need to do.
EDIT: Forgot to add an fclose statement that closes the file handle at the end of the program. Fixed!
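If you would rather keep the input untouched and write the result to a separate file such as translation.txt, as the question describes, a copy loop avoids ftell and fseek entirely. This is only a sketch of that alternative, not the approach above, and the name translate_stream is hypothetical:

```c
#include <stdio.h>

/* Hypothetical helper: copy in to out, replacing every byte equal to
 * from with to. Returns the number of replacements made. */
long translate_stream (FILE *in, FILE *out, int from, int to)
{
    long replaced = 0;
    int c;                              /* int, so EOF is distinguishable */

    while ((c = fgetc (in)) != EOF) {   /* read one character at a time */
        if (c == from) {
            c = to;
            replaced++;
        }
        fputc (c, out);                 /* write original or replacement */
    }
    return replaced;
}
```

You would open the third argument with "r" and translation.txt with "w", then call it with (unsigned char)argv[1][0] and (unsigned char)argv[2][0] as from and to.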