I'm going to be getting input in the following format and I wish to store it into an arraylist or maybe even a linked list (whichever one is easier to implement):
3,5;6,7;8,9;11,4;
I want to be able to put the two numbers before the ; into a structure and store these. For example, I want 3,5 to be grouped together, and 6,7 to be grouped together.
I'm unsure as to how to read the input and obtain each pair and store it. The input I'm going to get can be fairly large (up to 60-70 MB).
I have tried to use strtok() and strtol(); however, I just can't seem to get the correct implementation.
Any help would be great
EDIT:
What I have tried to do up til now is use this piece of code to read the input:
char[1000] remainder;
int first, second;
fp = fopen("C:\\file.txt", "r"); // Error check this, probably.
while (fgets(&remainder, 1000, fp) != null) { // Get a line.
while (sscanf(remainder, "%d,%d;%s", first, second, remainder) != null) {
// place first and second into a struct or something
}
}
I fixed the syntax errors in the code, but when I try and compile, it crashes.
With the addition of a single line, the same answer provides a robust and flexible solution to your problem. Take the time to understand what it does. It is not complicated at all; it is simply basic C. To do your conversion from string to int, libc gives you 2 choices: atoi (no error checking) and strtol (with error checking). Your only other alternative is to code a conversion by hand, which, given your comments in both versions of this question, isn't what you are looking for.
The following is a good solution. Take the time to learn what it does. Regardless of whether you use fgets or getline, the approach to the problem is the same. Let me know what your questions are:
/* read unlimited number of int values into array from stdin
(semicolon or comma separated values, pair every 2 values)
*/
#define _POSIX_C_SOURCE 200809L /* expose getline()/ssize_t (POSIX) */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#define NMAX 256
int main (void) {
char *ln = NULL; /* NULL forces getline to allocate */
size_t n = 0; /* buffer size, updated by getline */
ssize_t nchr = 0; /* number of chars actually read */
int *numbers = NULL; /* array to hold numbers */
size_t nmax = NMAX; /* check for reallocation */
size_t idx = 0; /* numbers array index */
if (!(numbers = calloc (NMAX, sizeof *numbers))) {
fprintf (stderr, "error: memory allocation failed.\n");
return 1;
}
/* read each line from stdin - dynamically allocated */
while ((nchr = getline (&ln, &n, stdin)) != -1)
{
char *p = ln; /* pointer for use with strtol */
char *ep = NULL;
errno = 0;
while (errno == 0)
{
/* parse/convert each number in the line */
long val = strtol (p, &ep, 10);
/* note: range checks against INT_MIN/INT_MAX omitted */
/* if valid conversion to number */
if (errno == 0 && p != ep)
{
numbers[idx++] = (int)val; /* store value, increment index */
/* reallocate as soon as idx == nmax, so the next store is in bounds */
if (idx == nmax)
{
int *tmp = realloc (numbers, 2 * nmax * sizeof *numbers);
if (!tmp) {
fprintf (stderr, "Error: numbers reallocation failure.\n");
exit (EXIT_FAILURE);
}
numbers = tmp;
memset (numbers + nmax, 0, nmax * sizeof *numbers);
nmax *= 2;
}
}
/* skip delimiters/move pointer to next digit '0'-'9' */
while (*ep && (*ep < '0' || '9' < *ep)) ep++;
if (!*ep) /* end of line reached */
break;
p = ep;
}
}
/* free mem allocated by getline */
free (ln);
/* show values stored in array */
size_t i = 0;
for (i = 0; i < idx; i++)
if ( i % 2 == 1 ) /* pair every 2 values */
printf (" numbers[%2zu] numbers[%2zu] %d, %d\n", i-1, i, numbers[i-1], numbers[i]);
/* free mem allocated to numbers */
free (numbers);
return 0;
}
Output
$ echo "3,5;6,7;8,9;11,4;;" | ./bin/parsestdin2
numbers[ 0] numbers[ 1] 3, 5
numbers[ 2] numbers[ 3] 6, 7
numbers[ 4] numbers[ 5] 8, 9
numbers[ 6] numbers[ 7] 11, 4
Related
I have a text file with one line with numbers separated by space as in the following example:
1 -2 3.1 0xf 0xcc
After parsing the file with a C program, the results should be saved in a vector and its internal structure should be:
V[0]=1
V[1]=-2
V[2]=3.1
V[3]=15
V[4]=204
Basically I need to convert the numbers that start with 0x into decimal numbers.
I have tried storing all elements in a char vector and then transforming them into numbers, but without much success.
Any help with a piece of code in C will be greatly appreciated. Thanks
You could have a look at sscanf. Here's a bare-bones program. I am sure you can pick up from here:
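#include <stdio.h>
int main(void)
{
const char *hex = "0xF";
unsigned int i = 0; /* %x expects a pointer to unsigned int */
if (sscanf(hex, "%x", &i) == 1) /* %x accepts an optional 0x prefix */
printf("%u\n", i); /* prints 15 */
return 0;
}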
What you need is the strtol function for integer types. You can use its endptr argument to iterate through the string. For doubles you can use the atof function, but you first have to check whether the string contains a dot.
EDIT: As user3386109 mentioned, strtod is a better solution for doubles.
Assuming that you have the string in an array:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MAXN 10 /* capacity of each output array */
int main(void)
{
char numbers_str[] = "1 -2 3.1 0xf 0xcc";
int ints[MAXN];
double doubles[MAXN];
int ints_idx = 0, doubles_idx = 0;
const char delims[] = " ";
char *it = strtok(numbers_str, delims);
while(it != NULL)
{
char *dot_it = strchr(it, '.');
// When a dot is found then a double number is found
if(dot_it)
{
if(doubles_idx < MAXN) // protect array bounds
doubles[doubles_idx++] = strtod(it, NULL);
}
// When a dot is not found then we've got an integer
// (base 0 lets strtol recognize the 0x prefix)
else if(ints_idx < MAXN)
ints[ints_idx++] = (int)strtol(it, NULL, 0);
it = strtok(NULL, delims);
}
printf("Integers found: \n");
for(int i = 0; i < ints_idx; ++i)
printf("%d\n", ints[i]);
printf("Double numbers found: \n");
for(int i = 0; i < doubles_idx; ++i)
printf("%f\n", doubles[i]);
return 0;
}
The easiest way to handle reading the values from the line is to work your way down the line with strtod. strtod takes two pointers as parameters. The first points to where the conversion should begin (leading whitespace is skipped, and a leading + or - is accepted). The second, a pointer-to-pointer (endptr), will be set to the character following the last character used in the conversion. You start your search for the next number to convert from there (e.g. set p = ep; and repeat the process).
You can consult the man page for further details, but to validate a successful conversion, you check that the pointer is not equal to the end-pointer (meaning digits were converted) and that errno was not set (set errno = 0 before each call, since strtod only sets it on error and never clears it). If no digits were converted (meaning you had an invalid character), you simply want to scan forward in the line manually until your next +/- or 0-9 is found (or you hit the nul-terminating character).
You want to protect your array bounds and limit the number of values you try to store in your vector array by keeping a simple counter and exiting the loop when your array is full.
Here is a short example (NAN and INF checking omitted):
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <errno.h>
#define VECSZ 5 /* if you need a constant, define one (or more) */
int main (int argc, char **argv) {
int i, n = 0;
double v[VECSZ] = {0.0};
char buf[BUFSIZ] = "";
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
while (n < VECSZ && fgets (buf, BUFSIZ, fp)) { /* read each line */
char *p = buf, *ep; /* pointer, end-pointer for strtod */
while (n < VECSZ) { /* convert values in buf until VECSZ */
double tmp;
errno = 0; /* reset errno before each conversion */
tmp = strtod (p, &ep);
if (p != ep) { /* digits converted */
if (!errno) /* validate no range error */
v[n++] = tmp; /* add to vector */
p = ep; /* update pointer */
}
else { /* no digits converted */
while (isspace ((unsigned char)*p)) /* skip whitespace/'\n' */
p++;
if (!*p) /* end of line, get next line */
break;
fprintf (stderr, "error: no digits converted.\n");
p++; /* step past the invalid char */
/* scan forward to next valid '+', '-', '0'-'9' */
while (*p && *p != '-' && *p != '+' && (*p < '0' || '9' < *p))
p++;
if (!*p) /* no numbers remain in line */
break;
}
}
}
if (fp != stdin) fclose (fp); /* close file if not stdin */
for (i = 0; i < n; i++)
printf ("v[%d]=% g\n", i, v[i]);
return 0;
}
Example Input File
$ cat dat/strtod_vect.txt
1 -2 3.1 0xf 0xcc
Example Use/Output
$ ./bin/strtod_vect dat/strtod_vect.txt
v[0]= 1
v[1]=-2
v[2]= 3.1
v[3]= 15
v[4]= 204
Look things over and let me know if you have further questions. You can check strtod(3) - Linux man page for further details and error checking that can be done.
I need to read input from a file, then split the word in capitals from its definition. My trouble is that I need multiple lines from the file to be in one variable to pass it to another function.
The file I want to read from looks like this:
ACHROMATIC. An optical term applied to those telescopes in which
aberration of the rays of light, and the colours dependent thereon, are
partially corrected. (See APLANATIC.)
ACHRONICAL. An ancient term, signifying the rising of the heavenly
bodies at sunset, or setting at sunrise.
ACROSS THE TIDE. A ship riding across tide, with the wind in the
direction of the tide, would tend to leeward of her anchor; but with a
weather tide, or that running against the wind, if the tide be strong,
would tend to windward. A ship under sail should prefer the tack that
stems the tide, with the wind across the stream, when the anchor is
let go.
Right now my code splits the word from the rest, but I'm having difficulty getting the rest of the input into one variable.
while(fgets(line, sizeof(line), mFile) != NULL){
if (strlen(line) != 2){
if (isupper(line[0]) && isupper(line[1])){
word = strtok(line, ".");
temp = strtok(NULL, "\n");
len = strlen(temp);
for (i=0; i < len; i++){
*(defn+i) = *(temp+i);
}
printf("Word: %s\n", word);
}
else{
temp = strtok(line, "\n");
for (i=len; i < strlen(temp) + len; i++);
*(defn+i) = *(temp+i-len);
len = len + strlen(temp);
//printf(" %s\n", temp);
}
}
else{
len = 0;
printf("%s\n", defn);
index = 0;
}
}
like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <assert.h>
//another function
void func(char *word, char *defs){
printf("<%s>\n", word);
if(defs){
printf("%s", defs);
}
}
int main(void){
char buffer[4096], *curr = buffer;
size_t len, buf_size = sizeof buffer;
FILE *fp = fopen("dic.txt", "r");
if(!fp){//validate file open for reading
perror("dic.txt");
return 1;
}
while(fgets(curr, buf_size, fp)){
//check definition line
if(*curr == '\n' || !isupper(*curr)){
continue;//printf("invalid format\n");
}
len = strlen(curr);
curr += len;
buf_size -= len;
//read rest line
while(1){
curr = fgets(curr, buf_size, fp);
if(!curr || *curr == '\n'){//upto EOF or blank line
char *word, *defs;
char *p = strchr(buffer, '.');
if(p)
*p++ = 0;
word = buffer;
defs = p;
func(word, defs);
break;
}
len = strlen(curr);
curr += len;
buf_size -= len;
assert(buf_size >= 2 || (fprintf(stderr, "small buffer\n"), 0));
}
curr = buffer;
buf_size = sizeof buffer;
}
fclose(fp);
return 0;
}
It appears you need to first pull a string of uppercase letters from the beginning of the line, up to the first period, then concatenate the remainder of that line with subsequent lines until a blank line is found. Lather, rinse, repeat as needed.
While this task would be MUCH easier in Perl, if you need to do it in C, for one thing I recommend using the standard library string functions (strcpy, strcat) instead of constructing your own for-loops to copy the data. Perhaps something like the following:
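while(fgets(line, sizeof(line), mFile) != NULL) {
if (strlen(line) > 2) {
if (isupper((unsigned char)line[0]) && isupper((unsigned char)line[1])) {
char *rest;
word = strtok(line, ".");
rest = strtok(NULL, "\n"); /* NULL if nothing follows the '.' */
strcpy(defn, rest ? rest : ""); /* defn must hold a whole entry */
printf("Word: %s\n", word);
} else {
strcat(defn, " "); /* keep a space between joined lines */
strcat(defn, strtok(line, "\n"));
}
} else {
printf("%s\n", defn);
defn[0] = 0;
}
}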
When I put this in a properly structured C program, with appropriate include files, it works fine. I personally would have approached the problem differently, but hopefully this gets you going.
There are several areas that can be addressed. Given your example input and description, it appears your goal is to develop a function that will read and separate each word (or phrase) and associated definition, return a pointer to the collection of words/definitions, while also updating a pointer to the number of words and definitions read so that number is available back in the calling function (main here).
While your data suggests that the word and definition are both contained within a single line of text, with the word (or phrase) written in all upper-case, it is unclear whether you will have to address the case where the definition can span multiple lines (essentially requiring you to read multiple lines and combine them to form the complete definition).
Whenever you need to maintain relationships between multiple variables within a single object, a struct is a good choice for the base data object. Using an array of struct gives you access to each word and its associated definition once all have been read into memory. Your example has 3 words and definitions (each separated by a blank line). Creating an array of 3 struct to hold the data is trivial, but when reading data like a dictionary, you rarely know exactly how many words you will have to read.
To handle this situation, a dynamic array of structs is a proper data structure. You essentially allocate space for some reasonable number of words/definitions, and then if you reach that limit, you simply realloc the array containing your data, update your limit to reflect the new size allocated, and continue on.
While you can use strtok to separate the word (or phrase) by looking for the first '.', that is a bit of overkill. You will need to traverse each char anyway to check that they are all caps, so you may as well just iterate until you find the '.', use that character's index to store your word, and set a pointer to the next char after the '.'. You will begin looking for the start of the definition from there (you basically want to skip any character that is not in [a-zA-Z]). Once you locate the beginning of the definition, you can simply get the length of the rest of the line and copy that as the definition (or the first part of it if the definition continues on separate lines).
After the file is read and the pointer returned and the pointer for the number of words updated, you can then use the array of structs back in main as you like. Once you are done using the information, you should free all the memory you have allocated.
Since the maximum length of a word or phrase is generally known, the struct used provides static storage for the word. Given that the definitions can vary wildly in length and are much longer, the struct simply contains a pointer-to-char for each definition. So you will have to allocate storage for the structs, and then allocate storage for each definition within each struct.
The following code does just that. It takes the filename to read as the first argument (or reads from stdin by default if no filename is given). The code then outputs the words and definitions on single lines. The code is heavily commented to help you follow along and explain the logic, e.g.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
enum {MAXW = 64, NDEF = 128};
typedef struct { /* struct holding words/definitions */
char word[MAXW],
*def; /* you must allocate space for def */
} defn;
defn *readdict (FILE *fp, size_t *n);
int main (int argc, char **argv) {
defn *defs = NULL;
size_t n = 0;
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
if (!(defs = readdict (fp, &n))) { /* read words/defs into defs */
fprintf (stderr, "readdict() error: no words read from file.\n");
return 1;
}
if (fp != stdin) fclose (fp); /* close file if not stdin */
for (size_t i = 0; i < n; i++) {
printf ("\nword: %s\n\ndefinition: %s\n", defs[i].word, defs[i].def ? defs[i].def : ""); /* def is NULL if a word had no definition */
free (defs[i].def); /* free allocated definitions */
}
free (defs); /* free array of structs */
return 0;
}
/** read word and associated definition from open file stream 'fp'
* into dynamic array of struct, updating pointer 'n' to contain
* the total number of defn structs filled.
*/
defn *readdict (FILE *fp, size_t *n)
{
defn *defs = NULL; /* pointer to array of structs */
char buf[BUFSIZ] = ""; /* buffer to hold each line read */
size_t max = NDEF, haveword = 0, offset = 0; /* allocated size & flags */
/* allocate, initialize & validate memory to hold 'max' structs */
if (!(defs = calloc (max, sizeof *defs))) {
fprintf (stderr, "error: virtual memory exhausted.\n");
return NULL;
}
while (fgets (buf, BUFSIZ, fp)) /* read each line of input */
{
if (*buf == '\n') { /* check for blank line */
if (haveword) (*n)++; /* if word/def already read, increment n */
haveword = 0; /* reset haveword flag */
if (*n == max) {
void *tmp = NULL; /* tmp ptr to realloc defs */
if (!(tmp = realloc (defs, sizeof *defs * (max + NDEF)))) {
fprintf (stderr, "error: memory exhausted, realloc defs.\n");
break;
}
defs = tmp; /* assign new block to defs */
memset (defs + max, 0, NDEF * sizeof *defs); /* zero new mem */
max += NDEF; /* update max with current allocation size */
}
continue; /* get next line */
}
if (haveword) { /* word already stored in defs[n].word */
void *tmp = NULL; /* tmp pointer to realloc */
size_t dlen = strlen (buf); /* get line/buf length */
if (buf[dlen - 1] == '\n') /* trim '\n' from end */
buf[--dlen] = 0; /* realloc & validate */
if (!(tmp = realloc (defs[*n].def, offset + dlen + 2))) {
fprintf (stderr,
"error: memory exhausted, realloc defs[%zu].def.\n", *n);
break;
}
defs[*n].def = tmp; /* assign new block, fill with definition */
sprintf (defs[*n].def + offset, offset ? " %s" : "%s", buf);
offset += dlen + (offset ? 1 : 0); /* ' ' only written when offset > 0 */
}
else { /* no current word being defined */
char *p = NULL;
size_t i;
for (i = 0; buf[i] && i < MAXW; i++) { /* check first MAXW chars */
if (buf[i] == '.') { /* if a '.' is found, end of word */
size_t dlen = 0;
if (i + 1 == MAXW) { /* check one char available for '\0' */
fprintf (stderr,
"error: 'word' exceeds MAXW, skipping.\n");
goto next;
}
strncpy (defs[*n].word, buf, i); /* copy i chars to .word */
haveword = 1, offset = 0; /* set haveword flag, reset offset for new def */
p = buf + i + 1; /* set p to next char in buf after '.' */
while (*p && (*p == ' ' || *p < 'A' || /* find def start */
('Z' < *p && *p < 'a') || 'z' < *p))
p++; /* increment p and check again */
if ((dlen = strlen (p))) { /* get definition length */
if (p[dlen - 1] == '\n') /* trim trailing '\n' */
p[--dlen] = 0;
if (!(defs[*n].def = malloc (dlen + 1))) { /* allocate */
fprintf (stderr,
"error: virtual memory exhausted.\n");
goto done; /* bail if allocation failed */
}
strcpy (defs[*n].def, p); /* copy definition to .def */
offset = dlen; /* set offset in .def buf to be */
} /* used if def continues on a */
break; /* new or separate line */
} /* check word is all upper-case or a ' ' */
else if (buf[i] != ' ' && (buf[i] < 'A' || 'Z' < buf[i]))
break;
}
}
next:;
}
done:;
if (haveword) (*n)++; /* account for last word/definition */
return defs; /* return pointer to array of struct */
}
Example Use/Output
$ ./bin/dict_read <dat/dict.txt
word: ACHROMATIC
definition: An optical term applied to those telescopes in which
aberration of the rays of light, and the colours dependent thereon,
are partially corrected. (See APLANATIC.)
word: ACHRONICAL
definition: An ancient term, signifying the rising of the heavenly
bodies at sunset, or setting at sunrise.
word: ACROSS THE TIDE
definition: A ship riding across tide, with the wind in the direction
of the tide, would tend to leeward of her anchor; but with a weather tide,
or that running against the wind, if the tide be strong, would tend to
windward. A ship under sail should prefer the tack that stems the tide,
with the wind across the stream, when the anchor is let go.
(line breaks were manually inserted to keep the results tidy here).
Memory Use/Error Check
You should also run any code that dynamically allocates memory through a memory use and error checking program like valgrind on Linux. Just run the code through it and confirm you free all memory you allocate and that there are no memory errors, e.g.
$ valgrind ./bin/dict_read <dat/dict.txt
==31380== Memcheck, a memory error detector
==31380== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==31380== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==31380== Command: ./bin/dict_read
==31380==
word: ACHROMATIC
<snip output>
==31380==
==31380== HEAP SUMMARY:
==31380== in use at exit: 0 bytes in 0 blocks
==31380== total heap usage: 4 allocs, 4 frees, 9,811 bytes allocated
==31380==
==31380== All heap blocks were freed -- no leaks are possible
==31380==
==31380== For counts of detected and suppressed errors, rerun with: -v
==31380== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Look things over and let me know if you have further questions.
I am trying to save each line of a text file into an array.
The way I am doing it, which works fine so far, is this:
char *lines[40];
char line[50];
int i = 0 ;
char* eof ;
while( (eof = fgets(line, 50, in)) != NULL )
{
lines[i] = strdup(eof); /*Fills the array with line of the txt file one by one*/
i++;
}
My text file has 40 lines, which I am accessing with a for loop:
for( j = 0; j <= 39 ; j++)
{ /*Do something to each line*/}.
So far so good. My problem is that I define the size of the array lines
for a text file that has 40 lines. I tried to count the lines and then define the size, but I am getting a segmentation fault.
My approach:
int count=1 ; char c ;
for (c = getc(in); c != EOF; c = getc(in))
if (c == '\n') // Increment count if this character is newline
count = count + 1;
printf("\nNUMBER OF LINES = %d \n",count);
char* lines[count];
Any ideas ?
As an aside, I tested the exact code you show above to get the line count (by counting newline characters) on a file containing more than 1000 lines, with some lines 4000 chars long. The problem is not there.
The seg fault is therefore likely due to the way you are allocating memory for each line buffer. You may be attempting to write a long line to a short buffer. (maybe I missed it in your post, but could not find where you addressed line length?)
Two things useful when allocating memory for storing strings in a file are number of lines, and the maximum line length in the file. These can be used to create the array of char arrays.
You can get both line count and longest line by looping on fgets(...): (a variation on your theme, essentially letting fgets find the newlines)
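#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define LINE_BUF 4096 /* read buffer; longer lines are split by fgets (and counted more than once) */
char ** create2D(char **a, int lines, int longest);
void free2D(char **a, int lines);
int countLines(FILE *fp, int *longest)
{
int i=0;
int len = 0;
char line[LINE_BUF];
*longest = 0;
while(fgets(line, sizeof line, fp))
{
len = (int)strlen(line);
if(len > *longest) *longest = len;//record longest (includes the '\n')
i++;//track line count
}
return i;
}
int main(void)
{
int longest = 0, count = 0, i;
char **strArr = NULL;
FILE *fp = fopen("C:\\dev\\play\\text.txt", "r");
if(!fp)
{
perror("fopen");
return 1;
}
count = countLines(fp, &longest);
printf("%d\n", count);
// use count and longest to create memory
strArr = create2D(strArr, count, longest);
if(strArr)
{
rewind(fp);//countLines() left fp at EOF, so start over
for(i = 0; i < count && fgets(strArr[i], longest + 1, fp); i++)
;//read each line into its own row
//use strArr ...
//free strArr
free2D(strArr, count);
}
//......and so on
fclose(fp);
return 0;
}
char ** create2D(char **a, int lines, int longest)
{
int i;
a = malloc(lines*sizeof(char *));
if(!a) return NULL;
for(i=0;i<lines;i++)
{
a[i] = malloc(longest+1);//+1 for the nul terminator
if(!a[i])
{
free2D(a, i);//release the rows allocated so far
return NULL;
}
}
return a;
}
void free2D(char **a, int lines)
{
int i;
if(!a) return;
for(i=0;i<lines;i++)
free(a[i]);
free(a);
}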
There are many ways to approach this problem. Either declare a static 2D array of char (e.g. char lines[40][50] = {{""}};) or declare a pointer to array of type char [50], which is probably the easiest for dynamic allocation. With that approach you only need a single allocation. With constants MAXL = 40 and MAXC = 50, you simply need:
char (*lines)[MAXC] = NULL;
...
lines = malloc (MAXL * sizeof *lines);
Reading each line with fgets is a simple task of:
while (i < MAXL && fgets (lines[i], MAXC, fp)) {...
When you are done, all you need to do is free (lines); Putting the pieces together, you can do something like:
#include <stdio.h>
#include <stdlib.h>
enum { MAXL = 40, MAXC = 50 };
int main (int argc, char **argv) {
char (*lines)[MAXC] = NULL; /* pointer to array of type char [MAXC] */
int i, n = 0;
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
if (!(lines = malloc (MAXL * sizeof *lines))) { /* allocate MAXL arrays */
fprintf (stderr, "error: virtual memory exhausted 'lines'.\n");
return 1;
}
while (n < MAXL && fgets (lines[n], MAXC, fp)) { /* read each line */
char *p = lines[n]; /* assign pointer */
for (; *p && *p != '\n'; p++) {} /* find 1st '\n' */
*p = 0, n++; /* nul-terminate */
}
if (fp != stdin) fclose (fp); /* close file if not stdin */
/* print lines */
for (i = 0; i < n; i++) printf (" line[%2d] : '%s'\n", i + 1, lines[i]);
free (lines); /* free allocated memory */
return 0;
}
note: you will also want to check whether the whole line was read by fgets each time (say you had a line of more than 48 chars in the file, which will not fit in a char[50] together with its '\n' and the nul-terminating character). You do this by checking whether *p is '\n' before overwriting it with the nul-terminating character (e.g. if (*p != '\n') { int c; while ((c = fgetc (fp)) != '\n' && c != EOF) {} }). That ensures the next read with fgets will begin with the next line, instead of the remaining characters in the current line.
To include the check you could do something similar to the following (note: I changed the read loop counter from i to n to eliminate the need for assigning n = i; following the read loop).
while (n < MAXL && fgets (lines[n], MAXC, fp)) { /* read each line */
char *p = lines[n]; /* assign pointer */
for (; *p && *p != '\n'; p++) {} /* find 1st '\n' */
if (*p != '\n') { /* check line read */
int c; /* discard remainder of line with getchar */
while ((c = fgetc (fp)) != '\n' && c != EOF) {}
}
*p = 0, n++; /* nul-terminate */
}
It is up to you whether you discard or keep the remainder of lines that exceed the length of your array. However, it is a good idea to always check. (The lines of text in my example input below are limited to 17 chars, so there was no possibility of a long line, but you generally cannot guarantee the line length.)
Example Input
$ cat dat/40lines.txt
line of text - 1
line of text - 2
line of text - 3
line of text - 4
line of text - 5
line of text - 6
...
line of text - 38
line of text - 39
line of text - 40
Example Use/Output
$ ./bin/fgets_ptr2array <dat/40lines.txt
line[ 1] : 'line of text - 1'
line[ 2] : 'line of text - 2'
line[ 3] : 'line of text - 3'
line[ 4] : 'line of text - 4'
line[ 5] : 'line of text - 5'
line[ 6] : 'line of text - 6'
...
line[38] : 'line of text - 38'
line[39] : 'line of text - 39'
line[40] : 'line of text - 40'
Now include the length check in the code and add a long line to the input, e.g.:
$ cat dat/40lines+long.txt
line of text - 1
line of text - 2
line of text - 3 + 123456789 123456789 123456789 123456789 65->|
line of text - 4
...
Rerun the program and you can confirm you have now protected against long lines in the file mucking up your sequential read of lines from the file.
Dynamically Reallocating lines
If you have an unknown number of lines in your file and you reach your initial allocation of 40 in lines, then all you need do to keep reading additional lines is realloc storage for lines. For example:
int i, n = 0, maxl = MAXL;
...
while (fgets (lines[n], MAXC, fp)) { /* read each line */
char *p = lines[n]; /* assign pointer */
for (; *p && *p != '\n'; p++) {} /* find 1st '\n' */
*p = 0; /* nul-terminate */
if (++n == maxl) { /* if limit reached, realloc lines */
void *tmp = realloc (lines, 2 * maxl * sizeof *lines);
if (!tmp) { /* validate realloc succeeded */
fprintf (stderr, "error: realloc - virtual memory exhausted.\n");
break; /* on failure, exit with existing data */
}
lines = tmp; /* assign reallocated block to lines */
maxl *= 2; /* update maxl to reflect new size */
}
}
Now it doesn't matter how many lines are in your file; you will simply keep reallocating lines until your entire file is read, or you run out of memory. (note: currently the code reallocates twice the current memory for lines on each reallocation. You are free to add as much or as little as you like. For example, you could allocate maxl + 40 to simply allocate 40 more lines each time.)
Edit In Response To Comment Inquiry
If you do want to use a fixed increase in the number of lines rather than scaling by some factor, you must allocate storage for a fixed number of additional lines (the increase times sizeof *lines); you can't simply add 40 bytes, e.g.
if (++n == maxl) { /* if limit reached, realloc lines */
void *tmp = realloc (lines, (maxl + 40) * sizeof *lines);
if (!tmp) { /* validate realloc succeeded */
fprintf (stderr, "error: realloc - virtual memory exhausted.\n");
break; /* on failure, exit with existing data */
}
lines = tmp; /* assign reallocated block to lines */
maxl += 40; /* update maxl to reflect new size */
}
Recall, lines is a pointer-to-array of char[50], so for each additional line you want to allocate, you must allocate storage for 50-char (e.g. sizeof *lines), so the fixed increase by 40 lines will be realloc (lines, (maxl + 40) * sizeof *lines);, then you must accurately update your max-lines-allocated count (maxl) to reflect the increase of 40 lines, e.g. maxl += 40;.
Example Input
$ cat dat/80lines.txt
line of text - 1
line of text - 2
...
line of text - 79
line of text - 80
Example Use/Output
$ ./bin/fgets_ptr2array_realloc <dat/80lines.txt
line[ 1] : 'line of text - 1'
line[ 2] : 'line of text - 2'
...
line[79] : 'line of text - 79'
line[80] : 'line of text - 80'
Look it over and let me know if you have any questions.
How can I read rows of ints from a txt file in C?
input.txt
3
5 2 3
1
2 1 3 4
The first 3 means there are 3 lines in all.
I wrote a C++ version using cin, cout, and stringstream,
but I wonder how I can do it in C using fscanf or other C functions.
I need to handle each line of integers separately.
I know I can use getline to read the whole line into a buffer, then parse the string buffer to get the integers.
Can I just use fscanf?
What you ultimately want to do is free yourself from having to worry about the format of your input file. You want a routine that is flexible enough to read each row, parse the integers in each row, and allocate memory accordingly. That greatly improves the flexibility of your routine and minimizes the amount of recoding required.
As you can tell from the comments there are many, many, many valid ways to approach this problem. The following is a quick hack at reading all integers in a file into an array, printing the array, and then cleaning up and freeing the memory allocated during the program. (note: the checks for reallocating are shown in comments, but omitted for brevity).
Note too that the storage for the array is allocated with calloc, which allocates and sets the memory to 0. This frees you from the requirement of keeping a persistent row and column count: you can simply iterate over the values in the array and stop when you encounter a zero (unfilled) value. (The trade-off is that a 0 in the input would be mistaken for the end of its row, so this only works when 0 is never a valid data value.) Take a look over the code and let me know if you have any questions:
#include <stdio.h>
#include <stdlib.h>
#define MROWS 100
#define MCOLS 20
int main (int argc, char **argv) {
if (argc < 2) {
fprintf (stderr, "error: insufficient input. usage: %s filename\n", argv[0]);
return 1;
}
FILE *fp = fopen (argv[1], "r");
if (!fp) {
fprintf (stderr, "error: file open failed for '%s'.\n", argv[1]);
return 1;
}
char *line = NULL; /* NULL forces getline to allocate */
size_t n = 0; /* max chars to read (0 - no limit) */
ssize_t nchr = 0; /* number of chars actually read */
int **array = NULL; /* array of ptrs to array of int */
size_t ridx = 0; /* row index value */
size_t cidx = 0; /* col index value */
char *endptr = NULL; /* endptr to use with strtol */
/* allocate MROWS (100) pointers to array of int */
if (!(array = calloc (MROWS, sizeof *array))) {
fprintf (stderr, "error: array allocation failed\n");
return 1;
}
/* read each line in file (at most MROWS - 1 rows, so the last pointer stays NULL) */
while (ridx < MROWS - 1 && (nchr = getline (&line, &n, fp)) != -1)
{
/* strip newline or carriage return (not req'd) */
while (nchr > 0 && (line[nchr-1] == '\r' || line[nchr-1] == '\n'))
line[--nchr] = 0;
if (!nchr) /* if line is blank, skip */
continue;
/* allocate MCOLS (20) ints for array[ridx] */
if (!(array[ridx] = calloc (MCOLS, sizeof **array))) {
fprintf (stderr, "error: array[%zu] allocation failed\n", ridx);
return 1;
}
cidx = 0; /* reset cidx */
char *p = line; /* assign pointer to line */
/* parse each int in line into array (at most MCOLS - 1, so a 0 always ends the row) */
while (cidx < MCOLS - 1 && (array[ridx][cidx] = (int)strtol (p, &endptr, 10)) && p != endptr)
{
/* checks for underflow/overflow omitted */
p = endptr; /* increment p */
cidx++; /* increment cidx */
/* test cidx = MCOLS & realloc here */
}
ridx++; /* increment ridx */
/* test for ridx = MROWS & realloc here */
}
/* free memory and close input file */
if (line) free (line);
if (fp) fclose (fp);
printf ("\nArray:\n\n number of rows with data: %zu\n\n", ridx);
/* reset ridx, output array values */
ridx = 0;
while (array[ridx])
{
cidx = 0;
while (array[ridx][cidx])
{
printf (" array[%zu][%zu] = %d\n", ridx, cidx, array[ridx][cidx]);
cidx++;
}
ridx++;
printf ("\n");
}
/* free allocated memory */
ridx = 0;
while (array[ridx])
{
free (array[ridx]);
ridx++;
}
if (array) free (array);
return 0;
}
input file
$ cat dat/intfile.txt
3
5 2 3
1
2 1 3 4
program output
$ ./bin/readintfile dat/intfile.txt
Array:
number of rows with data: 4
array[0][0] = 3
array[1][0] = 5
array[1][1] = 2
array[1][2] = 3
array[2][0] = 1
array[3][0] = 2
array[3][1] = 1
array[3][2] = 3
array[3][3] = 4
In C (not C++), you should combine fgets with the sscanf function.
EDIT:
But to answer the question "Can I just use fscanf?", try this example. It uses fgetc to check the character after each number, which lets you use fscanf instead of fgets + sscanf:
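#include <stdio.h>
/* read the number of lines, then the integers on each line */
int read_lines (FILE *f)
{
int lnNum = 0;
int lnCnt = 0; // line counter
int ch; // single character
int res, num;
// read number of lines
if (fscanf(f, "%d", &lnNum) != 1 || lnNum < 1)
{
return 1; // missing or wrong line number
}
// reading numbers line by line
do{
res = fscanf(f, "%d", &num);
// analyse res: stop if no int could be read
if (res != 1)
break;
// process num (here: just print it with its line number)
printf("line %d: %d\n", lnCnt + 1, num);
// check the next character
ch = fgetc(f);
if(ch == '\n')
{
lnCnt++; // one more line is finished
}
} while (lnNum > lnCnt && !feof(f) );
return 0;
}
int main (void)
{
return read_lines(stdin);
}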
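NOTE: This code works when your file contains only numbers separated by single spaces or a single '\n'. If the file contains letters, or a space before a newline (number, space, \n), it becomes unreliable: once fgetc has consumed the space, the next fscanf silently skips the '\n', so that line is never counted.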
I have a file of DNA sequences and associated IDs, and I'm trying to save the even lines (the IDs) to one array and the odd lines (the sequences) to another. Then I want to compare all of the sequences with each other to find the unique sequences. For example, if Seq A is AGTCGAT and Seq B is TCG, Seq B is not unique. I want to save the unique sequences and their IDs to an output file. If a sequence is not unique, I want to save only its ID to the output file and print "Deleting sequence with ID: " to the console. I'm pretty much done, but I'm running into a few problems. I tried printing out the two separate arrays, sequences[] and headers[], but for some reason they only contain two of the 5 strings (the file has 5 IDs and 5 sequences). And the information isn't printing out to the screen the way it's supposed to.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(){
int total_seq = 20000;
char seq[900];
char** headers;
char** sequences;
int sequence_size = 0;
headers = malloc(total_seq * sizeof(char*));
sequences = malloc(total_seq * sizeof(char*));
int index;
for(index = 0; index < total_seq; index++){
headers[index] = malloc(900 * sizeof(char));
sequences[index] = malloc(900 * sizeof(char));
}
FILE *dna_file;
FILE *new_file;
dna_file = fopen("inabc.fasta", "r");
new_file = fopen("output.fasta", "w");
if (dna_file == NULL){
printf("Error");
return 0;
}
int i = 0;
int j = 0;
while(fgets(seq, sizeof seq, dna_file)){
if(i%2 == 0){
strcpy(headers[i/2], seq);
i++;
}
else{
strcpy(sequences[i/2], seq);
i++;
}
}
fclose(dna_file);
sequence_size = i/2;
char* result;
for(i=0; i < sequence_size; i++){
for(j=0; j < sequence_size; j++){
if(i==j){
continue;
}
result = strstr(sequences[j], sequences[i]);
if(result== NULL){
fprintf(new_file,"%s", headers[i]);
fprintf(new_file,"%s", sequences[i]);
}
else{
printf("Deleting sequence with id: %s \n", headers[i]);
printf(sequences[i]);
fprintf(new_file,"%s", headers[i]);
}
}
}
The sample file inabc.fasta is short, but the actual file I'm working with is very long, which is why I've used malloc. Any help would be appreciated!
EDIT: The sample input file inabc.fasta:
cat inabc.fasta
> id1 header1
abcd
> id2 header2
deghj
> id3 header3
defghijkabcd
> id4 header4
abcd
> id5 header5
xcvbnnmlll
So for this sample, sequences 1 and 4 will not be saved to the output file
This:
while( fgets(seq, sizeof seq, dna_file) ) {
if( i % 2 == 0 ){
strcpy(headers[i], seq);
i++;
}
else {
strcpy(sequences[i-1], seq);
i++;
}
}
is going to skip every other element in your arrays:
When i == 0, it'll store in headers[0]
When i == 1, it'll store in sequences[0]
When i == 2, it'll store in headers[2]
When i == 3, it'll store in sequences[2]
and so on.
Then you do:
sequence_size = i/2;
so if you loop sequence_size times, you only get halfway through the part of each array you've written to, and every other element you print is uninitialized. For example, with 5 headers and 5 sequences, i ends at 10 and sequence_size is 5, but only indices 0, 2 and 4 of each array are filled in that range. This is why you only see some of your strings, and why it "isn't printing out to the screen the way it's supposed to".
You'll be better off using two separate counters when you read your input, plus a separate variable that records whether you're on an odd or even line of input.
For instance:
int i = 0, j = 0, even = 1;
while( fgets(seq, sizeof seq, dna_file) ) {
if( even ){
strcpy(headers[i++], seq);
even = 0;
}
else {
strcpy(sequences[j++], seq);
even = 1;
}
}
Here it's better to have two variables, since if you read in an odd number of lines, your two arrays will contain different numbers of read elements.
In addition to the other comments, there are a few logic errors in your output routine you need to correct. Below, I have left your code in comments, so you can follow the changes and additions made.
There are several ways you can approach memory management a bit more efficiently and provide a way to cleanly iterate over your data without tracking counters throughout your code. Specifically, when you allocate your arrays of pointers-to-char, use calloc instead of malloc so that your pointers are initialized to zero/NULL. This lets you easily iterate over only those pointers that have been assigned.
There is no need to allocate 20000 arrays of 900 chars (times 2) before reading your data. Allocate your pointers (or start with a smaller number of pointers, say 256, and realloc as needed), then allocate each element of headers and sequences as needed within your read loop. Further, instead of allocating 1800 chars (900 * 2) every time you add an element to headers and sequences, allocate only the memory required to hold the data. This can make a huge difference. For example, you allocate 20000 * 900 * 2 = 36,000,000 bytes (36M) before you start reading this small set of sample data. Even allocating all 20000 pointers, allocating memory as needed for this sample data limits memory usage to 321,246 bytes (less than 1% of 36M).
The logic in your write loop will not work. You must move the write of the data outside of the inner loop; otherwise you have no way of deciding whether to delete a duplicate entry. Further, testing result does not give you a way to skip duplicates, because result changes with every iteration of the inner loop. Instead, test inside the inner loop and set a flag; once you leave the inner loop, that flag controls whether or not to delete the duplicate.
Finally, since you are allocating memory dynamically, you are responsible for tracking the memory allocated and freeing the memory when no longer needed. Allocating your array of pointers with calloc makes freeing the memory in use a snap.
Take a look at the changes and additions I've made to your code. Understand the changes and let me know if you have any questions. Note: many checks are omitted so as not to clutter the code. At a minimum, you should make sure you do not exceed the 20000 pointers allocated when run on a full dataset, and realloc as needed. You should also check that strdup succeeded (it allocates memory), although comparing the headers and sequences index counts gives you some assurance. I'm sure there are many more checks that make sense. Good luck.
#define _POSIX_C_SOURCE 200809L /* expose strdup */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXSEQ 20000
#define SZSEQ 900
int main ()
{
int total_seq = MAXSEQ; /* initialize all variables */
char seq[SZSEQ] = {0};
char **headers = NULL; /* traditionally variables */
char **sequences = NULL; /* declared at beginning */
// char *result = NULL;
// int sequence_size = 0;
size_t len = 0;
int hidx = 0;
int sidx = 0;
// int idx = 0; /* (see alternative in fgets loop) */
int i = 0;
int j = 0;
int del = 0;
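/* calloc initializes to 0 & allows iteration on addresses */
headers = calloc (total_seq, sizeof (*headers));
sequences = calloc (total_seq, sizeof (*sequences));
if (!headers || !sequences) {
fprintf (stderr, "Error: pointer allocation failed.\n");
return 1;
}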
/* allocate as needed if possible - see read loop */
// for (index = 0; index < total_seq; index++) {
// headers[index] = malloc (900 * sizeof (char));
// sequences[index] = malloc (900 * sizeof (char));
// }
FILE *dna_file = NULL;
FILE *new_file = NULL;
dna_file = fopen ("inabc.fasta", "r");
new_file = fopen ("output.fasta", "w"); /* "w" creates or truncates the file */
if (!dna_file || !new_file) {
fprintf (stderr, "Error: file open failed.\n");
return 1; /* 1 indicates error condition */
}
while (fgets (seq, sizeof (seq), dna_file)) /* read dna_file & separate */
{
len = strlen (seq); /* strip newline from seq end */
if (len && seq[len-1] == '\n') /* it's never good to leave \n */
seq[--len] = 0; /* scattered through your data */
/* if header line has '>' as first char -- use it! */
if (*seq == '>')
headers[hidx++] = strdup (seq); /* strdup allocates */
else
sequences[sidx++] = strdup (seq);
/* alternative using counter if no '>' */
// if (idx % 2 == 0)
// headers[hidx++] = strdup (seq);
// else
// sequences[sidx++] = strdup (seq);
// idx++
}
fclose (dna_file);
if (hidx != sidx)
fprintf (stderr, "warning: hidx:sidx (%d:%d) differ.\n", hidx, sidx);
// sequence_size = (hidx>sidx) ? sidx : hidx; /* protect against unequal read */
//
// for (i = 0; i < sequence_size; i++) {
// for (j = 0; j < sequence_size; j++) {
// if (i == j) {
// continue;
// }
// result = strstr (sequences[j], sequences[i]);
// if (result == NULL) {
// fprintf (new_file, "%s", headers[i]);
// fprintf (new_file, "%s", sequences[i]);
// } else {
// printf ("Deleting sequence with id: %s \n", headers[i]);
// printf (sequences[i]);
// fprintf (new_file, "%s", headers[i]);
// }
// }
// }
/* by using calloc, all pointers except those assigned are NULL */
while (sequences[i]) /* testing while (sequences[i] != NULL) */
{
j = 0;
del = 0;
while (sequences[j])
{
if (i == j)
{
j++;
continue;
}
if (strstr (sequences[j], sequences[i])) /* set delete flag */
{
del = 1;
break;
}
j++;
}
if (del) {
printf ("Deleting id: '%s' with seq: '%s' \n", headers[i], sequences[i]);
// printf (sequences[i]);
fprintf (new_file, "%s\n", headers[i]);
} else {
fprintf (new_file, "%s\n", headers[i]);
fprintf (new_file, "%s\n", sequences[i]);
}
i++;
}
fclose (new_file);
/* free allocated memory - same simple iteration */
i = 0;
while (headers[i])
free (headers[i++]); /* free strings allocated by strdup */
if (headers) free (headers); /* free the array of pointers */
i = 0;
while (sequences[i])
free (sequences[i++]);
if (sequences) free (sequences);
return 0;
}
output:
$ ./bin/dnaio
Deleting id: '> id1 header1' with seq: 'abcd'
Deleting id: '> id4 header4' with seq: 'abcd'
output.fasta:
$ cat output.fasta
> id1 header1
> id2 header2
deghj
> id3 header3
defghijkabcd
> id4 header4
> id5 header5
xcvbnnmlll
memory allocation/free verification:
==21608== HEAP SUMMARY:
==21608== in use at exit: 0 bytes in 0 blocks
==21608== total heap usage: 14 allocs, 14 frees, 321,246 bytes allocated
==21608==
==21608== All heap blocks were freed -- no leaks are possible