I have a file of DNA sequences and an associated IDs and I'm trying to save the even lines (the IDs) to one array and the odd lines (the sequences) to another. Then I want to compare all of the sequences with each other to find the unique sequences. For example is Seq A is AGTCGAT and Seq B is TCG, Seq B is not unique. I want to save the unique sequences and their IDs to an output file and id the sequences are not unique, only save the ID to the output file and print "Deleting sequence with ID: " to the console. I'm pretty much done but Im running into a few problems. I tried printing out the two separate arrays, sequences[] and headers[], but for some reason, they only contain two out of the 5 strings (the file has 5 IDs and 5 headers). And then the information isn't printing out to the screen the way it's supposed to.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(){
int total_seq = 20000;
char seq[900];
char** headers;
char** sequences;
int sequence_size = 0;
headers = malloc(total_seq * sizeof(char*));
sequences = malloc(total_seq * sizeof(char*));
int index;
for(index = 0; index < total_seq; index++){
headers[index] = malloc(900 * sizeof(char));
sequences[index] = malloc(900 * sizeof(char));
}
FILE *dna_file;
FILE *new_file;
dna_file = fopen("inabc.fasta", "r");
new_file = fopen("output.fasta", "w");
if (dna_file == NULL){
printf("Error");
return 0;
}
int i = 0;
int j = 0;
while(fgets(seq, sizeof seq, dna_file)){
if(i%2 == 0){
strcpy(headers[i/2], seq);
i++;
}
else{
strcpy(sequences[i/2], seq);
i++;
}
}
fclose(dna_file);
sequence_size = i/2;
char* result;
for(i=0; i < sequence_size; i++){
for(j=0; j < sequence_size; j++){
if(i==j){
continue;
}
result = strstr(sequences[j], sequences[i]);
if(result== NULL){
fprintf(new_file,"%s", headers[i]);
fprintf(new_file,"%s", sequences[i]);
}
else{
printf("Deleting sequence with id: %s \n", headers[i]);
printf(sequences[i]);
fprintf(new_file,"%s", headers[i]);
}
}
}
The sample file inabc.fasta is short but the actual file I'm working with is very long, which is why I've used malloc. Any help would be appreciated!
EDIT: The sample input file inabc.fasta:
cat inabc.fasta
> id1 header1
abcd
> id2 header2
deghj
> id3 header3
defghijkabcd
> id4 header4
abcd
> id5 header5
xcvbnnmlll
So for this sample, sequences 1 and 4 will not be saved to the output file
This:
while( fgets(seq, sizeof seq, dna_file) ) {
if( i % 2 == 0 ){
strcpy(headers[i], seq);
i++;
}
else {
strcpy(sequences[i-1], seq);
i++;
}
}
is going to skip every other element in your arrays:
When i == 0, it'll store in headers[0]
When i == 1, it'll store in sequences[0]
When i == 2, it'll store in headers[2]
When i == 3, it'll store in sequences[2]
and so on.
Then you do:
sequence_size = i/2;
so if you loop sequence_size times, you'll only make it half way through the piece of the array you've written to, and every other element you print will be uninitialized. This is why you're only printing half the elements (if you have 5 elements, then i / 2 == 2, and you'll only see 2), and why it "isn't printing out to the screen the way it's supposed to".
You'll be better off just using either two separate counters when you read in your input, and a separate variable to store whether you're on an odd or even line of input.
For instance:
int i = 0, j = 0, even = 1;
while( fgets(seq, sizeof seq, dna_file) ) {
if( even ){
strcpy(headers[i++], seq);
even = 0;
}
else {
strcpy(sequences[j++], seq);
even = 1;
}
}
Here it's better to have two variables, since if you read in an odd number of lines, your two arrays will contain different numbers of read elements.
In addition to the other comments, there are a few logic errors in your output routine you need to correct. Below, I have left your code in comments, so you can follow the changes and additions made.
There are several ways you can approach memory management a bit more efficiently, and provide a way to cleanly iterate over your data without tracking counters throughout your code. Specifically, when you allocate your array of pointers-to-pointer-to-char, use calloc instead of malloc so that your pointers are initialized to zero/NULL. This allows you to easily iterate over only those pointers that have been assigned.
There is no need to allocate 20000 900 char arrays (times 2) before reading your data. Allocate your pointers (or start with a smaller number of pointers say 256 and realloc as needed), then simply allocate for each element in headers and sequences as needed within your read loop. Further, instead of allocating 1800 chars (900 * 2) every time you add an element to headers and sequences, just allocate the memory required to hold the data. This can make a huge difference. For example, you allocate 20000 * 900 * 2 = 36000000 bytes (36M) before you start reading this small set of sample data. Even allocating all 20000 pointers, allocating memory as needed for this sample data, limits memory usage to 321,246 bytes (less that 1% of 36M)
The logic in your write loop will not work. You must move your write of the data outside of the inner loop. Otherwise you have no way of testing whether to Delete a duplicate entry. Further testing result does not provide a way to skip duplicates. result changes with every iteration of the inner loop. You need to both test and set a flag that will control whether or not to delete the duplicate once you leave the inner loop.
Finally, since you are allocating memory dynamically, you are responsible for tracking the memory allocated and freeing the memory when no longer needed. Allocating your array of pointers with calloc makes freeing the memory in use a snap.
Take a look at the changes and additions I'm made to your code. Understand the changes and let me know if you have any questions. Note: there are many checks omitted for sake of not cluttering the code. You should at minimum make sure you do not exceed the 20000 pointers allocated when run on a full dataset, and realloc as needed. You should also check that strdup succeeded (it's allocating memory), although you get some assurance comparing the headers and sequences index count. I'm sure there are many more that make sense. Good luck.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXSEQ 20000
#define SZSEQ 900
int main ()
{
int total_seq = MAXSEQ; /* initialize all variables */
char seq[SZSEQ] = {0};
char **headers = NULL; /* traditionally variables */
char **sequences = NULL; /* declared at beginning */
// char *result = NULL;
// int sequence_size = 0;
size_t len = 0;
int hidx = 0;
int sidx = 0;
// int idx = 0; /* (see alternative in fgets loop) */
int i = 0;
int j = 0;
int del = 0;
/* calloc initilizes to 0 & allows iteration on addresses */
headers = calloc (total_seq, sizeof (*headers));
sequences = calloc (total_seq, sizeof (*sequences));
/* allocate as needed if possible - see read loop */
// for (index = 0; index < total_seq; index++) {
// headers[index] = malloc (900 * sizeof (char));
// sequences[index] = malloc (900 * sizeof (char));
// }
FILE *dna_file = NULL;
FILE *new_file = NULL;
dna_file = fopen ("inabc.fasta", "r");
new_file = fopen ("output.fasta", "w+"); /* create if not existing "w+" */
if (!dna_file || !new_file) {
fprintf (stderr, "Error: file open failed.\n");
return 1; /* 1 indicates error condition */
}
while (fgets (seq, sizeof (seq), dna_file)) /* read dna_file & separate */
{
len = strlen (seq); /* strip newline from seq end */
if (seq[len-1] == '\n') /* it's never good to leave \n */
seq[--len] = 0; /* scattered through your data */
/* if header line has '>' as first char -- use it! */
if (*seq == '>')
headers[hidx++] = strdup (seq); /* strdup allocates */
else
sequences[sidx++] = strdup (seq);
/* alternative using counter if no '>' */
// if (idx % 2 == 0)
// headers[hidx++] = strdup (seq);
// else
// sequences[sidx++] = strdup (seq);
// idx++
}
fclose (dna_file);
if (hidx != sidx)
fprintf (stderr, "warning: hidx:sidx (%d:%d) differ.\n", hidx, sidx);
// sequence_size = (hidx>sidx) ? sidx : hidx; /* protect against unequal read */
//
// for (i = 0; i < sequence_size; i++) {
// for (j = 0; i < sequence_size; i++) {
// if (i == j) {
// continue;
// }
// result = strstr (sequences[j], sequences[i]);
// if (result == NULL) {
// fprintf (new_file, "%s", headers[i]);
// fprintf (new_file, "%s", sequences[i]);
// } else {
// printf ("Deleting sequence with id: %s \n", headers[i]);
// printf (sequences[i]);
// fprintf (new_file, "%s", headers[i]);
// }
// }
// }
/* by using calloc, all pointers except those assigned are NULL */
while (sequences[i]) /* testing while (sequences[i] != NULL) */
{
j = 0;
del = 0;
while (sequences[j])
{
if (i == j)
{
j++;
continue;
}
if (strstr (sequences[j], sequences[i])) /* set delete flag */
{
del = 1;
break;
}
j++;
}
if (del) {
printf ("Deleting id: '%s' with seq: '%s' \n", headers[i], sequences[i]);
// printf (sequences[i]);
fprintf (new_file, "%s\n", headers[i]);
} else {
fprintf (new_file, "%s\n", headers[i]);
fprintf (new_file, "%s\n", sequences[i]);
}
i++;
}
fclose (new_file);
/* free allocated memory - same simple iteration */
i = 0;
while (headers[i])
free (headers[i++]); /* free strings allocated by strdup */
if (headers) free (headers); /* free the array of pointers */
i = 0;
while (sequences[i])
free (sequences[i++]);
if (sequences) free (sequences);
return 0;
}
output:
$ ./bin/dnaio
Deleting id: '> id1 header1' with seq: 'abcd'
Deleting id: '> id4 header4' with seq: 'abcd'
output.fasta:
$ cat output.fasta
> id1 header1
> id2 header2
deghj
> id3 header3
defghijkabcd
> id4 header4
> id5 header5
xcvbnnmlll
memory allocation/free verification:
==21608== HEAP SUMMARY:
==21608== in use at exit: 0 bytes in 0 blocks
==21608== total heap usage: 14 allocs, 14 frees, 321,246 bytes allocated
==21608==
==21608== All heap blocks were freed -- no leaks are possible
Related
I need to read input from a file, then split the word in capitals from it's definition. My trouble being that I need multiple lines from the file to be in one variable to pass it to another function.
The file I want to read from looks like this
ACHROMATIC. An optical term applied to those telescopes in which
aberration of the rays of light, and the colours dependent thereon, are
partially corrected. (See APLANATIC.)
ACHRONICAL. An ancient term, signifying the rising of the heavenly
bodies at sunset, or setting at sunrise.
ACROSS THE TIDE. A ship riding across tide, with the wind in the
direction of the tide, would tend to leeward of her anchor; but with a
weather tide, or that running against the wind, if the tide be strong,
would tend to windward. A ship under sail should prefer the tack that
stems the tide, with the wind across the stream, when the anchor is
let go.
Right now my code splits the word from the rest, but I'm having difficulty getting the rest of the input into one variable.
while(fgets(line, sizeof(line), mFile) != NULL){
if (strlen(line) != 2){
if (isupper(line[0]) && isupper(line[1])){
word = strtok(line, ".");
temp = strtok(NULL, "\n");
len = strlen(temp);
for (i=0; i < len; i++){
*(defn+i) = *(temp+i);
}
printf("Word: %s\n", word);
}
else{
temp = strtok(line, "\n");
for (i=len; i < strlen(temp) + len; i++);
*(defn+i) = *(temp+i-len);
len = len + strlen(temp);
//printf(" %s\n", temp);
}
}
else{
len = 0;
printf("%s\n", defn);
index = 0;
}
}
like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <assert.h>
//another function
void func(char *word, char *defs){
printf("<%s>\n", word);
if(defs){
printf("%s", defs);
}
}
int main(void){
char buffer[4096], *curr = buffer;
size_t len, buf_size = sizeof buffer;
FILE *fp = fopen("dic.txt", "r");
while(fgets(curr, buf_size, fp)){
//check definition line
if(*curr == '\n' || !isupper(*curr)){
continue;//printf("invalid format\n");
}
len = strlen(curr);
curr += len;
buf_size -= len;
//read rest line
while(1){
curr = fgets(curr, buf_size, fp);
if(!curr || *curr == '\n'){//upto EOF or blank line
char *word, *defs;
char *p = strchr(buffer, '.');
if(p)
*p++ = 0;
word = buffer;
defs = p;
func(word, defs);
break;
}
len = strlen(curr);
curr += len;
buf_size -= len;
assert(buf_size >= 2 || (fprintf(stderr, "small buffer\n"), 0));
}
curr = buffer;
buf_size = sizeof buffer;
}
fclose(fp);
return 0;
}
It appears you need to first pull a string of uppercase letters from the beginning of the line, up to the first period, then concatenate the remainder of that line with subsequent lines until a blank line is found. Lather, rinse, repeat as needed.
While this task would be MUCH easier in Perl, if you need to do it in C, for one thing I recommend using the built-in string functions instead of constructing your own for-loops to copy the data. Perhaps something like the following:
while(fgets(line, sizeof(line), mFile) != NULL) {
if (strlen(line) > 2) {
if (isupper(line[0]) && isupper(line[1])) {
word = strtok(line, ".");
strcpy(defn,strtok(NULL, "\n"));
printf("Word: %s\n", word);
} else {
strcat(defn,strtok(line, "\n"));
}
} else {
printf("%s\n", defn);
defn[0] = 0;
}
}
When I put this in a properly structured C program, with appropriate include files, it works fine. I personally would have approached the problem differently, but hopefully this gets you going.
There are several areas that can be addressed. Given your example input and description, it appears your goal is to develop a function that will read and separate each word (or phrase) and associated definition, return a pointer to the collection of words/definitions, while also updating a pointer to the number of words and definitions read so that number is available back in the calling function (main here).
While your data suggests that the word and definition are both contained within a single line of text with the word (or phrase written in all upper-case), it is unclear whether you will have to address the case where the definition can span multiple lines (essentially causing you to potentially read multiple lines and combine them to form the complete definition.
Whenever you need to maintain relationships between multiple variables within a single object, then a struct is a good choice for the base data object. Using an array of struct allows you access to each word and its associated definition once all have been read into memory. Now your example has 3 words and definitions. (each separated by a '\n'). Creating an array of 3 struct to hold the data is trivial, but when reading data, like a dictionary, you rarely know exactly how many words you will have to read.
To handle this situation, a dynamic array of structs is a proper data structure. You essentially allocate space for some reasonable number of words/definitions, and then if you reach that limit, you simply realloc the array containing your data, update your limit to reflect the new size allocated, and continue on.
While you can use strtok to separate the word (or phrase) by looking for the first '.', that is a bit of an overkill. You will need to traverse over each char anyway to check if they are all caps anyway, you may as well just iterate until you find the '.' and use the number for that character index to store your word and set a pointer to the next char after the '.'. You will begin looking for the start of the definition from there (you basically want to skip any character that is not an [a-zA-Z]). Once you locate the beginning of the definition, you can simply get the length of the rest of the line, and copy that as the definition (or the first part of it if the definition is contained in multiple-separate lines).
After the file is read and the pointer returned and the pointer for the number of words updated, you can then use the array of structs back in main as you like. Once you are done using the information, you should free all the memory you have allocated.
Since the size of the maximum word or phrase is generally know, the struct used provides static storage for the word. Give the definitions can vary wildly in length and are much longer, the struct simply contains a pointer-to-char*. So you will have to allocate storage for each struct, and then allocates storage for each definition within each struct.
The following code does just that. It will take the filename to read as the first argument (or it will read from stdin by default if no filename is given). The code the output the words and definitions on single lines. The code is heavily commented to help you follow along and explain the logic e.g.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
enum {MAXW = 64, NDEF = 128};
typedef struct { /* struct holding words/definitions */
char word[MAXW],
*def; /* you must allocate space for def */
} defn;
defn *readdict (FILE *fp, size_t *n);
int main (int argc, char **argv) {
defn *defs = NULL;
size_t n = 0;
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
if (!(defs = readdict (fp, &n))) { /* read words/defs into defs */
fprintf (stderr, "readdict() error: no words read from file.\n");
return 1;
}
if (fp != stdin) fclose (fp); /* close file if not stdin */
for (size_t i = 0; i < n; i++) {
printf ("\nword: %s\n\ndefinition: %s\n", defs[i].word, defs[i].def);
free (defs[i].def); /* free allocated definitions */
}
free (defs); /* free array of structs */
return 0;
}
/** read word and associated definition from open file stream 'fp'
* into dynamic array of struct, updating pointer 'n' to contain
* the total number of defn structs filled.
*/
defn *readdict (FILE *fp, size_t *n)
{
defn *defs = NULL; /* pointer to array of structs */
char buf[BUFSIZ] = ""; /* buffer to hold each line read */
size_t max = NDEF, haveword = 0, offset = 0; /* allocated size & flags */
/* allocate, initialize & validate memory to hold 'max' structs */
if (!(defs = calloc (max, sizeof *defs))) {
fprintf (stderr, "error: virtual memory exhausted.\n");
return NULL;
}
while (fgets (buf, BUFSIZ, fp)) /* read each line of input */
{
if (*buf == '\n') { /* check for blank line */
if (haveword) (*n)++; /* if word/def already read, increment n */
haveword = 0; /* reset haveword flag */
if (*n == max) {
void *tmp = NULL; /* tmp ptr to realloc defs */
if (!(tmp = realloc (defs, sizeof *defs * (max + NDEF)))) {
fprintf (stderr, "error: memory exhaused, realloc defs.\n");
break;
}
defs = tmp; /* assign new block to defs */
memset (defs + max, 0, NDEF * sizeof *defs); /* zero new mem */
max += NDEF; /* update max with current allocation size */
}
continue; /* get next line */
}
if (haveword) { /* word already stored in defs[n].word */
void *tmp = NULL; /* tmp pointer to realloc */
size_t dlen = strlen (buf); /* get line/buf length */
if (buf[dlen - 1] == '\n') /* trim '\n' from end */
buf[--dlen] = 0; /* realloc & validate */
if (!(tmp = realloc (defs[*n].def, offset + dlen + 2))) {
fprintf (stderr,
"error: memory exhaused, realloc defs[%zu].def.\n", *n);
break;
}
defs[*n].def = tmp; /* assign new block, fill with definition */
sprintf (defs[*n].def + offset, offset ? " %s" : "%s", buf);
offset += dlen + 1; /* update offset for rest (if required) */
}
else { /* no current word being defined */
char *p = NULL;
size_t i;
for (i = 0; buf[i] && i < MAXW; i++) { /* check first MAXW chars */
if (buf[i] == '.') { /* if a '.' is found, end of word */
size_t dlen = 0;
if (i + 1 == MAXW) { /* check one char available for '\0' */
fprintf (stderr,
"error: 'word' exceeds MAXW, skipping.\n");
goto next;
}
strncpy (defs[*n].word, buf, i); /* copy i chars to .word */
haveword = 1; /* set haveword flag */
p = buf + i + 1; /* set p to next char in buf after '.' */
while (*p && (*p == ' ' || *p < 'A' || /* find def start */
('Z' < *p && *p < 'a') || 'z' < *p))
p++; /* increment p and check again */
if ((dlen = strlen (p))) { /* get definition length */
if (p[dlen - 1] == '\n') /* trim trailing '\n' */
p[--dlen] = 0;
if (!(defs[*n].def = malloc (dlen + 1))) { /* allocate */
fprintf (stderr,
"error: virtual memory exhausted.\n");
goto done; /* bail if allocation failed */
}
strcpy (defs[*n].def, p); /* copy definition to .def */
offset = dlen; /* set offset in .def buf to be */
} /* used if def continues on a */
break; /* new or separae line */
} /* check word is all upper-case or a ' ' */
else if (buf[i] != ' ' && (buf[i] < 'A' || 'Z' < buf[i]))
break;
}
}
next:;
}
done:;
if (haveword) (*n)++; /* account for last word/definition */
return defs; /* return pointer to array of struct */
}
Example Use/Output
$ ./bin/dict_read <dat/dict.txt
word: ACHROMATIC
definition: An optical term applied to those telescopes in which
aberration of the rays of light, and the colours dependent thereon,
are partially corrected. (See APLANATIC.)
word: ACHRONICAL
definition: An ancient term, signifying the rising of the heavenly
bodies at sunset, or setting at sunrise.
word: ACROSS THE TIDE
definition: A ship riding across tide, with the wind in the direction
of the tide, would tend to leeward of her anchor; but with a weather tide,
or that running against the wind, if the tide be strong, would tend to
windward. A ship under sail should prefer the tack that stems the tide,
with the wind across the stream, when the anchor is let go.
(line breaks were manually inserted to keep the results tidy here).
Memory Use/Error Check
You should also run any code that dynamically allocates memory though a memory use and error checking program like valgrind on linux. Just run the code though it and confirm you free all memory you allocate and that there are no memory errors, e.g.
$ valgrind ./bin/dict_read <dat/dict.txt
==31380== Memcheck, a memory error detector
==31380== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==31380== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==31380== Command: ./bin/dict_read
==31380==
word: ACHROMATIC
<snip output>
==31380==
==31380== HEAP SUMMARY:
==31380== in use at exit: 0 bytes in 0 blocks
==31380== total heap usage: 4 allocs, 4 frees, 9,811 bytes allocated
==31380==
==31380== All heap blocks were freed -- no leaks are possible
==31380==
==31380== For counts of detected and suppressed errors, rerun with: -v
==31380== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Look things over and let me know if you have further questions.
Expanding on my a previous exercise, I have a text file that is filled with one word per line.
hello
hi
hello
bonjour
bonjour
hello
As I read these words from the file I would like to compare them to an array of struct pointers (created from the text file). If the word does not exist within the array, the word should be stored into a struct pointer with a count of 1. If the word already exist in the array the count should increase by 1. I will write the outcome into a new file (that already exist).
hello = 3
hi = 1
bonjour = 2
this is my code
#include <stdio.h>
#include <stdlib.h>
struct wordfreq{
int count;
char *word;
};
int main(int argc, char * argv[]) {
struct wordfreq *words[1000] = {NULL};
int i, j, f = 0;
for(i=0; i <1000; i++)
words[i] = (struct wordfreq*)malloc(sizeof(struct wordfreq));
FILE *input = fopen(argv[1], "r");
FILE *output = fopen(argv[2], "w");
if(input == NULL){
printf("Error! Can't open file.\n");
exit(0);
}
char str[20];
i=0;
while(fscanf(input, "%s[^\n]", &str) ==1){
//fprintf(output, "%s:\n", str);
for(j=0; j<i; j++){
//fprintf(output, "\t%s == %s\n", str, words[j] -> word);
if(str == words[j]->word){
words[j] ->count ++;
f = 1;
}
}
if(f==0){
words[i]->word = str;
words[i]->count = 1;
}
//fprintf(output, "\t%s = %d\n", words[i]->word, words[i]->count);
i++;
}
for(j=0; j< i; j++)
fprintf(output, "%s = %d\n", words[j]->word, words[j]->count);
for(i=0; i<1000; i++){
free(words[i]);
}
return 0;
}
I used several fprintf statements to look at my values and I can see that while str is right, when I reach the line to compare str to the other array struct pointers (str == words[I]->word) during the transversal words[0] -> word is always the same as str and the rest of the words[i]->words are (null). I am still trying to completely understand mixing pointes and structs, with that said any thoughts, comments, complains?
You may be making things a bit harder than necessary, and you are certainly allocating 997 more structures than necessary in the case of your input file. There is no need to allocate all 1000 structs up front. (you are free to do so, it's just a memory management issue). The key is that you only need allocate a new struct each time a unique word is encountered. (in the case of your data file, 3-times). For all other cases, you are simply updating count to add the occurrence for a word you have already stored.
Also, if there is no compelling reason to use a struct, it is just as easy to use an array of pointers-to-char as your pointers to each word, and then a simple array of int [1000] as your count (or frequency) array. Your choice. In the case of two arrays, you only need to allocate for each unique word and never need a separate allocation for each struct.
Putting those pieces together, you could reduce your code (not including the file -- which can be handled by simple redirection) to the following:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
enum { MAXC = 128, MAXW = 1000 };
struct wordfreq{
int count;
char *word;
};
int main (void) {
struct wordfreq *words[MAXW] = {0};
char tmp[MAXC] = "";
int n = 0;
/* while < MAXW unique words, read each word in file */
while (n < MAXW && fscanf (stdin, " %s", tmp) == 1) {
int i;
for (i = 0; i < n; i++) /* check against exising words */
if (strcmp (words[i]->word, tmp) == 0) /* if exists, break */
break;
if (i < n) { /* if exists */
words[i]->count++; /* update frequency */
continue; /* get next word */
}
/* new word found, allocate struct and
* allocate storage for word (+ space for nul-byte)
*/
words[n] = malloc (sizeof *words[n]);
words[n]->word = malloc (strlen (tmp) + 1);
if (!words[n] || !words[n]->word) { /* validate ALL allocations */
fprintf (stderr, "error: memory exhausted, words[%d].\n", n);
break;
}
words[n]->count = 0; /* initialize count */
strcpy (words[n]->word, tmp); /* copy new word to words[n] */
words[n]->count++; /* update frequency to 1 */
n++; /* increment word count */
}
for (int i = 0; i < n; i++) { /* for each word */
printf ("%s = %d\n", words[i]->word, words[i]->count);
free (words[i]->word); /* free memory when no longer needed */
free (words[i]);
}
return 0;
}
Example Input File
$ cat dat/wordfile.txt
hello
hi
hello
bonjour
bonjour
hello
Example Use/Output
$ ./bin/filewordfreq <dat/wordfile.txt
hello = 3
hi = 1
bonjour = 2
As with any code that dynamically allocates memory, you will want to validate your use of the memory to insure you have not written beyond the bounds or based a conditional move or jump on an uninitialized value. In Linux, valgrind is the natural choice (there are similar programs for each OS). Just run you program through it, e.g.:
$ valgrind ./bin/filewordfreqstruct <dat/wordfile.txt
==2000== Memcheck, a memory error detector
==2000== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==2000== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==2000== Command: ./bin/filewordfreqstruct
==2000==
hello = 3
hi = 1
bonjour = 2
==2000==
==2000== HEAP SUMMARY:
==2000== in use at exit: 0 bytes in 0 blocks
==2000== total heap usage: 6 allocs, 6 frees, 65 bytes allocated
==2000==
==2000== All heap blocks were freed -- no leaks are possible
==2000==
==2000== For counts of detected and suppressed errors, rerun with: -v
==2000== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Verify that you free all memory you allocate and that there are no memory errors.
Look things over and let me know if you have any further questions.
Using 2-Arrays Instead of a struct
As mentioned above, sometimes using a storage array and a frequency array can simplify accomplishing the same thing. Whenever you are faced with needing the frequency of any "set", your first thought should be a frequency array. It is nothing more than an array of the same size as the number of items in your "set", (initialized to 0 at the beginning). The same approach applies, when you add (or find a duplicate of an existing) element in your storage array, you increment the corresponding element in your frequency array by 1. When you are done, your frequency array elements hold the frequency the corresponding elements in your storage array appear.
Here is an equivalent to the program above.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
enum { MAXC = 128, MAXW = 1000 };
int main (void) {
char *words[MAXW] = {NULL}, /* storage array of pointers to char* */
tmp[MAXC] = "";
int freq[MAXW] = {0}, n = 0; /* simple integer frequency array */
/* while < MAXW unique words, read each word in file */
while (n < MAXW && fscanf (stdin, " %s", tmp) == 1) {
int i;
for (i = 0; words[i]; i++) /* check against exising words */
if (strcmp (words[i], tmp) == 0) /* if exists, break */
break;
if (words[i]) { /* if exists */
freq[i]++; /* update frequency */
continue; /* get next word */
}
/* new word found, allocate storage (+ space for nul-byte) */
words[n] = malloc (strlen (tmp) + 1);
if (!words[n]) { /* validate ALL allocations */
fprintf (stderr, "error: memory exhausted, words[%d].\n", n);
break;
}
strcpy (words[n], tmp); /* copy new word to words[n] */
freq[n]++; /* update frequency to 1 */
n++; /* increment word count */
}
for (int i = 0; i < n; i++) { /* for each word */
printf ("%s = %d\n", words[i], freq[i]); /* output word + freq */
free (words[i]); /* free memory when no longer needed */
}
return 0;
}
Using this approach, you eliminate 1/2 of your memory allocations by using a statically declared frequency array for your count. Either way is fine, it is largely up to you.
This code reads a text file line by line. But I need to put those lines in an array but I wasn't able to do it. Now I am getting a array of numbers somehow. So how to read the file into a list. I tried using 2 dimensional list but this doesn't work as well.
I am new to C. I am mostly using Python but now I want to check if C is faster or not for a task.
#include <stdio.h>
#include <time.h>
#include <string.h>
void loadlist(char *ptext) {
char filename[] = "Z://list.txt";
char myline[200];
FILE * pfile;
pfile = fopen (filename, "r" );
char larray[100000];
int i = 0;
while (!feof(pfile)) {
fgets(myline,200,pfile);
larray[i]= myline;
//strcpy(larray[i],myline);
i++;
//printf(myline);
}
fclose(pfile);
printf("%s \n %d \n %d \n ","while doneqa",i,strlen(larray));
printf("First larray element is: %d \n",larray[0]);
/* for loop execution */
//for( i = 10; i < 20; i = i + 1 ){
// printf(larray[i]);
//}
}
int main ()
{
time_t stime, etime;
printf("Starting of the program...\n");
time(&stime);
char *ptext = "String";
loadlist(ptext);
time(&etime);
printf("time to load: %f \n", difftime(etime, stime));
return(0);
}
This code reads a text file line by line. But I need to put those lines in an array but I wasn't able to do it. Now I am getting an array of numbers somehow.
There are many ways to do this correctly. To begin with, first sort out what it is you actually need/want to store, then figure out where that information will come from and finally decide how you will provide storage for the information. In your case loadlist is apparently intended load a list of lines (up to 10000) so that they are accessible through your statically declared array of pointers. (you can also allocate the pointers dynamically, but if you know you won't need more than X of them, statically declaring them is fine (up to the point you cause StackOverflow...)
Once you read the line in loadlist, then you need to provide adequate storage to hold the line (plus the nul-terminating character). Otherwise, you are just counting the number of lines. In your case, since you declare an array of pointers, you cannot simply copy the line you read because each of the pointers in your array does not yet point to any allocated block of memory. (you can't assign the address of the buffer you read the line into with fgets (buffer, size, FILE*) because (1) it is local to your loadlist function and it will go away when the function stack frame is destroyed on function return; and (2) obviously it gets overwritten with each call to fgets anyway.
So what to do? That's pretty simple too, just allocate storage for each line as it is read using the strlen of each line as #iharob says (+1 for the nul-byte) and then malloc to allocate a block of memory that size. You can then simply copy the read buffer to the block of memory created and assign the pointer to your list (e.g. larray[x] in your code). Now the gnu extensions provide a strdup function that both allocates and copies, but understand that is not part of the C99 standard so you can run into portability issues. (also note you can use memcpy if overlapping regions of memory are a concern, but we will ignore that for now since you are reading lines from a file)
What are the rules for allocating memory? Well, you allocate with malloc, calloc or realloc and then you VALIDATE that your call to those functions succeeded before proceeding or you have just entered the realm of undefined behavior by writing to areas of memory that are NOT in fact allocated for your use. What does that look like? If you have your array of pointers p and you want to store a string from your read buffer buf of length len at index idx, you could simply do:
if ((p[idx] = malloc (len + 1))) /* allocate storage */
strcpy (p[idx], buf); /* copy buf to storage */
else
return NULL; /* handle error condition */
Now you are free to allocate before you test as follows, but it is convenient to make the assignment as part of the test. The long form would be:
p[idx] = malloc (len + 1); /* allocate storage */
if (p[idx] == NULL) /* validate/handle error condition */
return NULL;
strcpy (p[idx], buf); /* copy buf to storage */
How you want to do it is up to you.
Now you also need to protect against reading beyond the end of your pointer array. (you only have a fixed number since you declared the array statically). You can make that check part of your read loop very easily. If you have declared a constant for the number of pointers you have (e.g. PTRMAX), you can do:
int idx = 0; /* index */
while (fgets (buf, LNMAX, fp) && idx < PTRMAX) {
...
idx++;
}
By checking the index against the number of pointers available, you insure you cannot attempt to assign address to more pointers than you have.
There is also the unaddressed issue of handling the '\n' that will be contained at the end of your read buffer. Recall, fgets read up to and including the '\n'. You do not want newline characters dangling off the ends of the strings you store, so you simply overwrite the '\n' with a nul-terminating character (e.g. simply decimal 0 or the equivalent nul-character '\0' -- your choice). You can make that a simple test after your strlen call, e.g.
while (fgets (buf, LNMAX, fp) && idx < PTRMAX) {
size_t len = strlen (buf); /* get length */
if (buf[len-1] == '\n') /* check for trailing '\n' */
buf[--len] = 0; /* overwrite '\n' with nul-byte */
/* else { handle read of line longer than 200 chars }
*/
...
(note: that also brings up the issue of reading a line longer than the 200 characters you allocate for your read buffer. You check for whether a complete line has been read by checking whether fgets included the '\n' at the end, if it didn't, you know your next call to fgets will be reading again from the same line, unless EOF is encountered. In that case you would simply need to realloc your storage and append any additional characters to that same line -- that is left for future discussion)
If you put all the pieces together and choose a return type for loadlist that can indicate success/failure, you could do something similar to the following:
/** read up to PTRMAX lines from 'fp', allocate/save in 'p'.
* storage is allocated for each line read and pointer
* to allocated block is stored at 'p[x]'. (you should
* add handling of lines greater than LNMAX chars)
*/
char **loadlist (char **p, FILE *fp)
{
int idx = 0; /* index */
char buf[LNMAX] = ""; /* read buf */
while (fgets (buf, LNMAX, fp) && idx < PTRMAX) {
size_t len = strlen (buf); /* get length */
if (buf[len-1] == '\n') /* check for trailing '\n' */
buf[--len] = 0; /* overwrite '\n' with nul-byte */
/* else { handle read of line longer than 200 chars }
*/
if ((p[idx] = malloc (len + 1))) /* allocate storage */
strcpy (p[idx], buf); /* copy buf to storage */
else
return NULL; /* indicate error condition in return */
idx++;
}
return p; /* return pointer to list */
}
note: you could just as easily change the return type to int and return the number of lines read, or pass a pointer to int (or better yet size_t) as a parameter to make the number of lines stored available back in the calling function.
However, in this case, we have used the initialization of all pointers in your array of pointers to NULL, so back in the calling function we need only iterate over the pointer array until the first NULL is encountered in order to traverse our list of lines. Putting together a short example program that read/stores all lines (up to PTRMAX lines) from the filename given as the first argument to the program (or from stdin if no filename is given), you could do something similar to:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
enum { LNMAX = 200, PTRMAX = 10000 };
char **loadlist (char **p, FILE *fp);
int main (int argc, char **argv) {
time_t stime, etime;
char *list[PTRMAX] = { NULL }; /* array of ptrs initialized NULL */
size_t n = 0;
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
printf ("Starting of the program...\n");
time (&stime);
if (loadlist (list, fp)) { /* read lines from fp into list */
time (&etime);
printf("time to load: %f\n\n", difftime (etime, stime));
}
else {
fprintf (stderr, "error: loadlist failed.\n");
return 1;
}
if (fp != stdin) fclose (fp); /* close file if not stdin */
while (list[n]) { /* output stored lines and free allocated mem */
printf ("line[%5zu]: %s\n", n, list[n]);
free (list[n++]);
}
return(0);
}
/** read up to PTRMAX lines from 'fp', allocate/save in 'p'.
* storage is allocated for each line read and pointer
* to allocated block is stored at 'p[x]'. (you should
* add handling of lines greater than LNMAX chars)
*/
char **loadlist (char **p, FILE *fp)
{
int idx = 0; /* index */
char buf[LNMAX] = ""; /* read buf */
while (fgets (buf, LNMAX, fp) && idx < PTRMAX) {
size_t len = strlen (buf); /* get length */
if (buf[len-1] == '\n') /* check for trailing '\n' */
buf[--len] = 0; /* overwrite '\n' with nul-byte */
/* else { handle read of line longer than 200 chars }
*/
if ((p[idx] = malloc (len + 1))) /* allocate storage */
strcpy (p[idx], buf); /* copy buf to storage */
else
return NULL; /* indicate error condition in return */
idx++;
}
return p; /* return pointer to list */
}
Finally, in any code your write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
Use a memory error checking program to insure you haven't written beyond/outside your allocated block of memory, attempted to read or base a jump on an uninitialized value and finally to confirm that you have freed all the memory you have allocated.
For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
Look things over, let me know if you have any further questions.
It's natural that you see numbers because you are printing a single character using the "%d" specifier. In fact, strings in c are pretty much that, arrays of numbers, those numbers are the ascii values of the corresponding characters. If you instead use "%c" you will see the character that represents each of those numbers.
Your code also, calls strlen() on something that is intended as a array of strings, strlen() is used to compute the length of a single string, a string being an array of char items with a non-zero value, ended with a 0. Thus, strlen() is surely causing undefined behavior.
Also, if you want to store each string, you need to copy the data like you tried in the commented line with strcpy() because the array you are using for reading lines is overwritten over and over in each iteration.
Your compiler must be throwing all kinds of warnings, if it's not then it's your fault, you should let the compiler know that you want it to do some diagnostics to help you find common problems like assigning a pointer to a char.
You should fix multiple problems in your code, here is a code that fixes most of them
void
loadlist(const char *const filename) {
char line[100];
FILE *file;
// We can only read 100 lines, of
// max 99 characters each
char array[100][100];
int size;
size = 0;
file = fopen (filename, "r" );
if (file == NULL)
return;
while ((fgets(line, sizeof(line), file) != NULL) && (size < 100)) {
strcpy(array[size++], line);
}
fclose(file);
for (int i = 0 ; i < size ; ++i) {
printf("array[%d] = %s", i + 1, array[i]);
}
}
int
main(void)
{
time_t stime, etime;
printf("Starting of the program...\n");
time(&stime);
loadlist("Z:\\list.txt");
time(&etime);
printf("Time to load: %f\n", difftime(etime, stime));
return 0;
}
Just to prove how complicated it can be in c, check this out
#include <stdio.h>
#include <time.h>
#include <string.h>
#include <stdlib.h>
struct string_list {
char **items;
size_t size;
size_t count;
};
void
string_list_print(struct string_list *list)
{
// Simply iterate through the list and
// print every item
for (size_t i = 0 ; i < list->count ; ++i) {
fprintf(stdout, "item[%zu] = %s\n", i + 1, list->items[i]);
}
}
struct string_list *
string_list_create(size_t size)
{
struct string_list *list;
// Allocate space for the list object
list = malloc(sizeof *list);
if (list == NULL) // ALWAYS check this
return NULL;
// Allocate space for the items
// (starting with `size' items)
list->items = malloc(size * sizeof *list->items);
if (list->items != NULL) {
// Update the list size because the allocation
// succeeded
list->size = size;
} else {
// Be optimistic, maybe realloc will work next time
list->size = 0;
}
// Initialize the count to 0, because
// the list is initially empty
list->count = 0;
return list;
}
int
string_list_append(struct string_list *list, const char *const string)
{
// Check if there is room for the new item
if (list->count + 1 >= list->size) {
char **items;
// Resize the array, there is no more room
items = realloc(list->items, 2 * list->size * sizeof *list->items);
if (items == NULL)
return -1;
// Now update the list
list->items = items;
list->size += list->size;
}
// Copy the string into the array we simultaneously
// increase the `count' and copy the string
list->items[list->count++] = strdup(string);
return 0;
}
void
string_list_destroy(struct string_list *const list)
{
// `free()' does work with a `NULL' argument
// so perhaps as a principle we should too
if (list == NULL)
return;
// If the `list->items' was initialized, attempt
// to free every `strdup()'ed string
if (list->items != NULL) {
for (size_t i = 0 ; i < list->count ; ++i) {
free(list->items[i]);
}
free(list->items);
}
free(list);
}
struct string_list *
loadlist(const char *const filename) {
char line[100]; // A buffer for reading lines from the file
FILE *file;
struct string_list *list;
// Create a new list, initially it has
// room for 100 strings, but it grows
// automatically if needed
list = string_list_create(100);
if (list == NULL)
return NULL;
// Attempt to open the file
file = fopen (filename, "r");
// On failure, we now have the responsibility
// to cleanup the allocated space for the string
// list
if (file == NULL) {
string_list_destroy(list);
return NULL;
}
// Read lines from the file until there are no more
while (fgets(line, sizeof(line), file) != NULL) {
char *newline;
// Remove the trainling '\n'
newline = strchr(line, '\n');
if (newline != NULL)
*newline = '\0';
// Append the string to the list
string_list_append(list, line);
}
fclose(file);
return list;
}
int
main(void)
{
time_t stime, etime;
struct string_list *list;
printf("Starting of the program...\n");
time(&stime);
list = loadlist("Z:\\list.txt");
if (list != NULL) {
string_list_print(list);
string_list_destroy(list);
}
time(&etime);
printf("Time to load: %f\n", difftime(etime, stime));
return 0;
}
Now, this will work almost as the python code you say you wrote but it will certainly be faster, there is absolutely no doubt.
It is possible that an experimented python programmer can write a python program that runs faster than that of a non-experimented c programmer, learning c however is really good because you then understand how things work really, and you can then infer how a python feature is probably implemented, so understanding this can be very useful actually.
Although it's certainly way more complicated than doing the same in python, note that I wrote this in nearly 10min. So if you really know what you're doing and you really need it to be fast c is certainly an option, but you need to learn many concepts that are not clear to higher level languages programmers.
I'm going to be getting input in the following format and I wish to store it into an arraylist or maybe even a linked list (whichever one is easier to implement):
3,5;6,7;8,9;11,4;
I want to be able to put the two numbers before the ; into a structure and store these. For example, I want 3,5 to be grouped together, and 6,7 to be grouped together.
I'm unsure as to how to read the input and obtain each pair and store it. The input I'm going to get can be fairly large (up to 60-70mB).
I have tried to use strtok() and strtol() however I just can't seem to get the correct implementation.
Any help would be great
EDIT:
What I have tried to do up til now is use this piece of code to read the input:
char[1000] remainder;
int first, second;
fp = fopen("C:\\file.txt", "r"); // Error check this, probably.
while (fgets(&remainder, 1000, fp) != null) { // Get a line.
while (sscanf(remainder, "%d,%d;%s", first, second, remainder) != null) {
// place first and second into a struct or something
}
}
I fixed the syntax errors in the code, but when I try and compile, it crashes.
With the addition of a single line, the same answer provides a robust and flexible solution to your problem. Take the time to understand what it does. It is not complicated at all, it is simply basic C. In order for you to do your conversion from string to int, you have 2 choices provided by libc, atoi (no error checking) and strtol (with error checking). Your only other alternative is to code a conversion by hand, which given your comments in both versions of this question, isn't what you are looking for.
The following is a good solution. Take the time to learn what it does. Regardless whether you use fgets or getline, the approach to the problem is the same. Let me know what your questions are:
/* read unliminted number of int values into array from stdin
(semicolon or comma separated values, pair every 2 values)
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#define NMAX 256
int main () {
char *ln = NULL; /* NULL forces getline to allocate */
size_t n = 0; /* max chars to read (0 - no limit) */
ssize_t nchr = 0; /* number of chars actually read */
int *numbers = NULL; /* array to hold numbers */
size_t nmax = NMAX; /* check for reallocation */
size_t idx = 0; /* numbers array index */
if (!(numbers = calloc (NMAX, sizeof *numbers))) {
fprintf (stderr, "error: memory allocation failed.");
return 1;
}
/* read each line from stdin - dynamicallly allocated */
while ((nchr = getline (&ln, &n, stdin)) != -1)
{
char *p = ln; /* pointer for use with strtol */
char *ep = NULL;
errno = 0;
while (errno == 0)
{
/* parse/convert each number on stdin */
numbers[idx] = strtol (p, &ep, 10);
/* note: overflow/underflow checks omitted */
/* if valid conversion to number */
if (errno == 0 && p != ep)
{
idx++; /* increment index */
if (!ep) break; /* check for end of str */
}
/* skip delimiters/move pointer to next digit */
while (*ep && (*ep <= '0' || *ep >= '9')) ep++;
if (*ep)
p = ep;
else
break;
/* reallocate numbers if idx = nmax */
if (idx == nmax)
{
int *tmp = realloc (numbers, 2 * nmax * sizeof *numbers);
if (!tmp) {
fprintf (stderr, "Error: struct reallocation failure.\n");
exit (EXIT_FAILURE);
}
numbers = tmp;
memset (numbers + nmax, 0, nmax * sizeof *numbers);
nmax *= 2;
}
}
}
/* free mem allocated by getline */
if (ln) free (ln);
/* show values stored in array */
size_t i = 0;
for (i = 0; i < idx; i++)
if ( i % 2 == 1 ) /* pair ever 2 values */
printf (" numbers[%2zu] numbers[%2zu] %d, %d\n", i-1, i, numbers[i-1], numbers[i]);
/* free mem allocated to numbers */
if (numbers) free (numbers);
return 0;
}
Output
$ echo "3,5;6,7;8,9;11,4;;" | ./bin/parsestdin2
numbers[ 0] numbers[ 1] 3, 5
numbers[ 2] numbers[ 3] 6, 7
numbers[ 4] numbers[ 5] 8, 11
how can I read rows of ints from a txt file in C.
input.txt
3
5 2 3
1
2 1 3 4
the first 3 means there are 3 lines in all.
I write a c++ version using cin cout sstream
but I wonder how can I make it in C using fscanf or other c functions.
I need to handle each line of integers separately.
I know I can use getline to read the whole line into a buffer, then parse the string buffer to get the integers.
Can I just use fscanf?
What you ultimately want to do is free yourself from having to worry about the format of your inputfile. You want a routine that is flexible enough to read each row and parse the integers in each row and allocate memory accordingly. That greatly improves the flexibility of your routine and minimizes the amount of recoding required.
As you can tell from the comments there are many, many, many valid ways to approach this problem. The following is a quick hack at reading all integers in a file into an array, printing the array, and then cleaning up and freeing the memory allocated during the program. (note: the checks for reallocating are shown in comments, but omitted for brevity).
Note too that the storage for the array is allocated with calloc which allocates and sets the memory to 0. This frees you from the requirement of keeping a persistent row and column count. You can simply iterate over values in the array and stop when you encounter an uninitialized value. Take a look over the code and let me know if you have any questions:
#include <stdio.h>
#include <stdlib.h>
#define MROWS 100
#define MCOLS 20
int main (int argc, char **argv) {
if (argc < 2) {
fprintf (stderr, "error: insufficient input. usage: %s filename\n", argv[0]);
return 1;
}
FILE *fp = fopen (argv[1], "r");
if (!fp) {
fprintf (stderr, "error: file open failed for '%s'.\n", argv[1]);
return 1;
}
char *line = NULL; /* NULL forces getline to allocate */
size_t n = 0; /* max chars to read (0 - no limit) */
ssize_t nchr = 0; /* number of chars actually read */
int **array = NULL; /* array of ptrs to array of int */
size_t ridx = 0; /* row index value */
size_t cidx = 0; /* col index value */
char *endptr = NULL; /* endptr to use with strtol */
/* allocate MROWS (100) pointers to array of int */
if (!(array = calloc (MROWS, sizeof *array))) {
fprintf (stderr, "error: array allocation failed\n");
return 1;
}
/* read each line in file */
while ((nchr = getline (&line, &n, fp)) != -1)
{
/* strip newline or carriage return (not req'd) */
while (line[nchr-1] == '\r' || line[nchr-1] == '\n')
line[--nchr] = 0;
if (!nchr) /* if line is blank, skip */
continue;
/* allocate MCOLS (20) ints for array[ridx] */
if (!(array[ridx] = calloc (MCOLS, sizeof **array))) {
fprintf (stderr, "error: array[%zd] allocation failed\n", ridx);
return 1;
}
cidx = 0; /* reset cidx */
char *p = line; /* assign pointer to line */
/* parse each int in line into array */
while ((array[ridx][cidx] = (int)strtol (p, &endptr, 10)) && p != endptr)
{
/* checks for underflow/overflow omitted */
p = endptr; /* increment p */
cidx++; /* increment cidx */
/* test cidx = MCOLS & realloc here */
}
ridx++; /* increment ridx */
/* test for ridx = MROWS & realloc here */
}
/* free memory and close input file */
if (line) free (line);
if (fp) fclose (fp);
printf ("\nArray:\n\n number of rows with data: %zd\n\n", ridx);
/* reset ridx, output array values */
ridx = 0;
while (array[ridx])
{
cidx = 0;
while (array[ridx][cidx])
{
printf (" array[%zd][%zd] = %d\n", ridx, cidx, array[ridx][cidx]);
cidx++;
}
ridx++;
printf ("\n");
}
/* free allocated memory */
ridx = 0;
while (array[ridx])
{
free (array[ridx]);
ridx++;
}
if (array) free (array);
return 0;
}
input file
$ cat dat/intfile.txt
3
5 2 3
1
2 1 3 4
program output
$ ./bin/readintfile dat/intfile.txt
Array:
number of rows with data: 4
array[0][0] = 3
array[1][0] = 5
array[1][1] = 2
array[1][2] = 3
array[2][0] = 1
array[3][0] = 2
array[3][1] = 1
array[3][2] = 3
array[3][3] = 4
In C (not C++) you should combine fgets with sscanf function.
EDIT:
But as an answer for the question "Can I just use fscanf?"
try this example (where usage of fgetc allows using fscanf instead of fgets+sscanf):
int lnNum = 0;
int lnCnt = 0; // line counter
int ch; // single character
// read number of lines
fscanf(f, "%d", &lnNum);
if(lnNum < 1)
{
return 1; // wrong line number
}
// reading numbers line by line
do{
res = fscanf(f, "%d", &num);
// analyse res and process num
// ....
// check the next character
ch = fgetc(f);
if(ch == '\n')
{
lnCnt++; // one more line is finished
}
} while (lnNum > lnCnt && !feof(f) );
NOTE: This code will work when your file has only numbers separated by single '\n' or spaces, for case of letters or combinations as number \n (space before newline) it becomes unstable