Reading comments into an array of dynamic size - c

So I have a series of comments in my markup file:
# comment1
# comment2
I want to read these into an array to be added to a comment array in my struct. I do not know the number of comment lines in advance.
I declare the comment array in my struct as follows:
char *comments; //comment array
Then I started reading the comments in, but what I've got isn't working:
int c;
//check for comments
c = getc(fd);
while(c == '#') {
while(getc(fd) != '\n') ;
c = getc(fd);
}
ungetc(c, fd);
//end comments?
Am I even close?
Thanks

First,
char *comments; //comment array
is one comment, not an array of comments.
You need realloc to create a growable array of strings:
char **comments = NULL;
size_t count = 10; // initial capacity, in elements
comments = realloc(comments, count * sizeof *comments);
When you have more than count comments, double the capacity:
count *= 2;
comments = realloc(comments, count * sizeof *comments); // classic doubling strategy
In real code, assign the result of realloc to a temporary and check it for NULL before overwriting comments.
To put a string into the array (assuming comment is a char* holding one comment):
comments[i] = strdup(comment);
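Putting those pieces together, a minimal sketch of reading the leading # lines into a doubling array could look like this. The names read_comments and copy_string, the 256-byte line buffer, and the initial capacity of 10 are assumptions for illustration; copy_string stands in for strdup so the sketch stays within standard C. Comment lines longer than the buffer are not handled.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: heap copy of a string (standard-C stand-in for strdup). */
static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Read consecutive lines that start with '#' from fp into a growing array.
 * On return *count holds the number of comments stored; the returned array
 * may be NULL when that is 0. The caller frees each entry and the array. */
char **read_comments(FILE *fp, size_t *count)
{
    char line[256];                 /* assumed maximum comment length */
    char **comments = NULL;
    size_t used = 0, cap = 0;
    int c;

    while ((c = getc(fp)) == '#') {
        if (fgets(line, sizeof line, fp) == NULL)
            line[0] = '\0';                     /* '#' followed by EOF */
        line[strcspn(line, "\r\n")] = '\0';     /* strip the line ending */

        if (used == cap) {                      /* classic doubling strategy */
            size_t new_cap = cap ? cap * 2 : 10;
            char **tmp = realloc(comments, new_cap * sizeof *tmp);
            if (tmp == NULL)
                break;
            comments = tmp;
            cap = new_cap;
        }
        comments[used] = copy_string(line);
        if (comments[used] == NULL)
            break;
        used++;
    }
    if (c != EOF)
        ungetc(c, fp);      /* put back the character that ended the loop */

    *count = used;
    return comments;
}
```

The stored text is everything after the #, so "# comment1" is kept as " comment1". The struct member would then be a char ** plus a count rather than a single char *.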

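You can use fgets() from <stdio.h> to read one line at a time.
int num_comments = 0;
char comment_tmp[82];
char comment_arr[150][82];
/* Test the result of fgets() itself; checking feof() first processes one line too many. */
while (num_comments < 150 && fgets(comment_tmp, sizeof comment_tmp, file_pointer) != NULL) {
    if (comment_tmp[0] != '#')
        break; /* the first non-comment line ends the comment block */
    strcpy(comment_arr[num_comments], comment_tmp);
    num_comments++;
}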
This has the limitation of only being able to store 150 comments. This can be overcome by 1) setting a higher number there, 2) using dynamic memory allocation (think malloc/free), or 3) organizing your comments into a more flexible data structure like a linked list.
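For option 3, a linked list could be sketched roughly as follows. The type comment_node and the functions collect_comments and free_comments are made-up names, and the 82-byte line length is carried over from the snippet above. Unlike that snippet, this version keeps every line in the file that starts with #, not only a leading block.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type: one stored comment line. */
struct comment_node {
    char text[82];
    struct comment_node *next;
};

/* Append every line that starts with '#' to a singly linked list.
 * Returns the head (NULL if nothing was stored); *count gets the length. */
struct comment_node *collect_comments(FILE *fp, size_t *count)
{
    struct comment_node *head = NULL, *tail = NULL;
    char line[82];

    *count = 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (line[0] != '#')
            continue;                       /* skip non-comment lines */
        struct comment_node *node = malloc(sizeof *node);
        if (node == NULL)
            break;                          /* out of memory: keep what we have */
        strcpy(node->text, line);           /* fits: both buffers are 82 bytes */
        node->next = NULL;
        if (tail != NULL)
            tail->next = node;
        else
            head = node;
        tail = node;
        (*count)++;
    }
    return head;
}

/* Release every node of the list. */
void free_comments(struct comment_node *head)
{
    while (head != NULL) {
        struct comment_node *next = head->next;
        free(head);
        head = next;
    }
}
```

There is no fixed upper limit on the number of comments here, at the cost of one allocation per line.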

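When you see that a line is a comment, store its text in the comment variable, then go to the next line and repeat. So the code:
int c = getc(fd); /* int, so EOF can be told apart from a character */
while (c == '#') {
    while ((c = getc(fd)) != '\n' && c != EOF) {
        *comment = (char)c; /* store the character that was just read */
        ++comment;
    }
    *comment = '\0';
    c = getc(fd); /* first character of the next line */
}
or use fscanf, which is easier:
fscanf(fd, "#%255[^\n]\n", comment); /* fd is the file; comment must hold at least 256 bytes */
Note that comment here is a single string, not an array of strings.
For an array of strings it would be:
#define COMMENT_LEN 256
#define MAX_COMMENTS 100
char comment[MAX_COMMENTS][COMMENT_LEN];
int i = 0;
/* The scan set reads up to the end of the line; the trailing "\n" skips the newline and any whitespace after it. */
while (i < MAX_COMMENTS && fscanf(fd, "#%255[^\n]\n", comment[i]) == 1) {
    ++i;
}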

Related

Counting lines in a file excluding the empty lines in C

We have a program that will take a file as input, and then count the lines in that file, but without counting the empty lines.
There is already a post on Stack Overflow with this question, but the answers there don't cover my case.
Let's take a simple example.
File:
I am John\n
I am 22 years old\n
I live in England\n
If the last '\n' didn't exist, then the counting would be easy. We actually already had a function that did this here:
/* Reads a file and returns the number of lines in this file. */
uint32_t countLines(FILE *file) {
uint32_t lines = 0;
int32_t c;
while (EOF != (c = fgetc(file))) {
if (c == '\n') {
++lines;
}
}
/* Reset the file pointer to the start of the file */
rewind(file);
return lines;
}
This function, when taking as input the file above, counted 4 lines. But I only want 3 lines.
I tried to fix this in many ways.
First I tried by doing fgets in every line and comparing that line with the string "\0". If a line was just "\0" with nothing else, then I thought that would solve the problem.
I also tried some other solutions but I can't really find any.
What I basically want is to check the last character in the file (excluding '\0') and see if it is '\n'. If it is, then subtract 1 from the number of lines previously counted (with the original function). I don't really know how to do this though. Are there any easier ways to do this?
I would appreciate any type of help.
Thanks.
You can actually very efficiently amend this issue by keeping track of just the last character as well.
This works because empty lines have the property that the previous character must have been an \n.
/* Reads a file and returns the number of lines in this file. */
uint32_t countLines(FILE *file) {
uint32_t lines = 0;
int32_t c;
int32_t last = '\n';
while (EOF != (c = fgetc(file))) {
if (c == '\n' && last != '\n') {
++lines;
}
last = c;
}
/* Reset the file pointer to the start of the file */
rewind(file);
return lines;
}
Here is a slightly better algorithm.
#include <stdio.h>
// Reads a file and returns the number of lines in it, ignoring empty lines
unsigned int countLines(FILE *file)
{
unsigned int lines = 0;
int c = '\0';
int pc = '\n';
while (c = fgetc(file), c != EOF)
{
if (c == '\n' && pc != '\n')
lines++;
pc = c;
}
if (pc != '\n')
lines++;
return lines;
}
Only the first newline in any sequence of newlines is counted, since all but the first newline indicate blank lines.
Note that if the file does not end with a '\n' newline character, any characters encountered (beyond the last newline) are considered a partial last line. This means that reading a file with no newlines at all returns 1.
Reading an empty file will return 0.
Reading a file ending with a single newline will return 1.
(I removed the rewind() since it is not necessary.)
Firstly, detect lines that only consist of whitespace. So let's create a function to do that.
bool stringIsOnlyWhitespace(const char * line) {
int i;
for (i=0; line[i] != '\0'; ++i)
if (!isspace((unsigned char)line[i])) /* cast: isspace() is undefined for negative char values */
return false;
return true;
}
Now that we have a test function, let's build a loop around it.
while (fgets(line, sizeof line, fp)) {
if (! (stringIsOnlyWhitespace(line)))
notemptyline++;
}
printf("\n The number of nonempty lines is: %d\n", notemptyline);
The source is Bill Lynch's answer; I changed it a little.
I think your approach using fgets() is totally fine. Try something like this:
char line[200];
while(fgets(line, sizeof line, file) != NULL) {
    if(strlen(line) > 1) { /* more than just the '\n', so the line is not empty */
        lines++;
    }
}
If you don't know about the length of the lines in your files, you may want to check if line actually contains a whole line.
Edit:
Of course this depends on how you define an empty line. If you treat a line containing only whitespace as empty, the above code will not work, because strlen() counts whitespace characters too.
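Both caveats (lines longer than the buffer, and whitespace-only lines) can be handled in one loop. This is only a sketch: the function name count_nonblank_lines and the 64-byte chunk size are assumptions, and "empty" is taken to mean "contains nothing but whitespace".

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Count lines that contain at least one non-whitespace character.
 * A physical line longer than the buffer arrives in several fgets()
 * chunks; it is counted once, when its first non-blank chunk is seen. */
unsigned long count_nonblank_lines(FILE *fp)
{
    char chunk[64];              /* deliberately small; any size works */
    unsigned long lines = 0;
    int counted = 0;             /* has the current line been counted yet? */

    while (fgets(chunk, sizeof chunk, fp) != NULL) {
        size_t len = strlen(chunk);
        if (!counted) {
            for (size_t i = 0; i < len; i++) {
                if (!isspace((unsigned char)chunk[i])) {
                    lines++;
                    counted = 1;
                    break;
                }
            }
        }
        if (len > 0 && chunk[len - 1] == '\n')
            counted = 0;         /* this chunk ended a physical line */
    }
    return lines;
}
```

A chunk that does not end in '\n' means the line continues in the next fgets() call, which is the "whole line" check mentioned above.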

How to store the even lines of a file to one array and the odd lines to another

I am given a file of DNA sequences and asked to compare all of the sequences with each other and delete the sequences that are not unique. The file I am working with is in FASTA format, so the odd lines are the headers and the even lines are the sequences that I want to compare. So I am trying to store the even lines in one array and the odd lines in another. I am very new to C, so I'm not sure where to begin. I figured out how to store the whole file in one array like this:
#include <stdio.h>
#include <string.h>
int main(void){
    int total_seq = 50;
    int i = 0;
    char seq[100];
    char line[total_seq][100];
    FILE *dna_file;
    dna_file = fopen("inabc.fasta", "r");
    if (dna_file==NULL){
        printf("Error");
        return 1;
    }
    while(i < total_seq && fgets(seq, sizeof seq, dna_file)){
        strcpy(line[i], seq);
        printf("%s", seq);
        i++;
    }
    fclose(dna_file);
    return 0;
}
I was thinking I would have to incorporate some sort of code that looked like this:
for (i = 0; i < rows; i++){
if (i % 2 == 0) header[i/2] = getline();
else seq[i/2] = getline();
but I'm not sure how to implement it.
Any help would be greatly appreciated!
To send the even lines of a file to one output and the odd lines to another,
read each char and swap output files when '\n' is encountered.
void Split(FILE *even, FILE* odd, FILE *source) {
int evenflag = 1;
int ch;
while ((ch = fgetc(source)) != EOF) {
if (evenflag) {
fputc(ch, even);
} else {
fputc(ch, odd);
}
if (ch == '\n') {
evenflag = !evenflag;
}
}
}
It is not clear if this post also requires code to do the unique filtering step.
Could you please give me an example of the data in the file?
Am I right in thinking it'd be something like:
Header
Sequence
Header
Sequence
And so on
Perhaps you could do something like this:
int main(){
int total_seq = 50;
char seq[100];
char line[total_seq][100];
FILE *dna_file;
dna_file = fopen("inabc.fasta", "r");
if (dna_file==NULL){
printf("Error");
}
// Put this in an else statement
int counter = 1;
while(fgets(seq, sizeof seq, dna_file)){
// If counter is odd
// Place next line read in headers array
// If counter is even
// Place next line read in sequence array
// Increment counter
}
// Now you have all the sequences & headers. Remove any duplicates
// Foreach number of elements in 'sequence' array - referenced by, e.g. 'j' where 'j' starts at 0
// Foreach number of elements in 'sequence' array - referenced by 'k' - Where 'k' Starts at 'j + 1'
// IF (sequence[j] != '~') So if its not our chosen escape character
// IF (sequence[j] == sequence[k]) (I think you'd have to use strcmp for this?)
// SET sequence[k] = '~';
// SET header[k] = '~';
// END IF
// END IF
// END FOR
// END FOR
// You'd then need an algorithm to run through the arrays. If a '~' is found, move the following non-tilde sequence down to its position, and so on.
// EDIT: In fact, it would probably be easier, when writing back to file, to just skip any entry where sequence[x] == '~' (where 'x' iterates through all)
// Finally write back to file
fclose(dna_file);
return 0;
}
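A compact way to try the outline above is to check for duplicates while reading, so no '~' marker is needed. This is a variation on the pseudocode, not the same algorithm. The function name read_unique_records and the limits are assumptions, and each sequence is assumed to fit on one line of fewer than 100 characters.

```c
#include <stdio.h>
#include <string.h>

#define MAX_RECORDS 50     /* assumed limits, as in the question */
#define MAX_LINE    100

/* Read header/sequence pairs from fp, keeping only the first record for
 * each distinct sequence. Returns the number of unique records stored. */
size_t read_unique_records(FILE *fp,
                           char headers[][MAX_LINE],
                           char seqs[][MAX_LINE])
{
    char header[MAX_LINE], seq[MAX_LINE];
    size_t n = 0;

    while (n < MAX_RECORDS &&
           fgets(header, sizeof header, fp) != NULL &&   /* odd line  */
           fgets(seq, sizeof seq, fp) != NULL) {         /* even line */
        header[strcspn(header, "\r\n")] = '\0';
        seq[strcspn(seq, "\r\n")] = '\0';

        int duplicate = 0;
        for (size_t k = 0; k < n; k++) {
            if (strcmp(seqs[k], seq) == 0) {   /* compare contents, not pointers */
                duplicate = 1;
                break;
            }
        }
        if (!duplicate) {
            strcpy(headers[n], header);
            strcpy(seqs[n], seq);
            n++;
        }
    }
    return n;
}
```

Writing the result back is then a loop over the n stored pairs. If the task means dropping every copy of a repeated sequence, including the first, a second pass would be needed.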
First: write a function that counts the number of newline (\n) characters in the file.
Then write a function that searches for the n-th newline
Last, write a function to go through and read from one '\n' to the next.
Alternately, you could just go online and read about string parsing.

How to convert my malloc + strcpy to strdup in C?

I am trying to save csv data in an array for use in other functions. I understand that strdup is good for this, but am unsure how to make it work for my situation. Any help is appreciated!
The data is stored in a struct:
typedef struct current{
char **data;
}CurrentData;
Function call:
int main(void){
int totalProducts = 0;
CurrentData *AllCurrentData = { '\0' };
FILE *current = fopen("C:\\User\\myfile.csv", "r");
if (current == NULL){
puts("current file data not found");
}
else{
totalProducts = getCurrentData(current, &AllCurrentData);
}
fclose(current);
return 0;
}
How I allocated memory;
int getCurrentData(FILE *current, CurrentData **AllCurrentData){
*AllCurrentData = malloc(totalProducts * sizeof(CurrentData));
/*allocate struct data memory*/
while ((next = fgetc(current)) != EOF){
if (next == '\n'){
(*AllCurrentData)[newLineCount].data = malloc(colCount * sizeof(char*));
newLineCount++;
}
}
newLineCount = 0;
rewind(current);
while ((next = fgetc(current)) != EOF && newLineCount <= totalProducts){
if (ch != '\0'){
buffer[i] = ch;
i++;
characterCount++;
}
if (ch == ',' && next != ' ' || ch == '\n' && ch != EOF){
if (i > 0){
buffer[i - 1] = '\0';
}
length = strlen(buffer);
/*(*AllCurrentData)[newLineCount].data[tabCount] = malloc(length + 1); /* originally was using strcpy */
strcpy((*AllCurrentData)[newLineCount].data[tabCount], buffer);
*/
(*AllCurrentData)[newLineCount].data[tabCount] = strdup(buffer); /* something like this? */
i = 0;
tabCount++;
for (j = 0; j < BUFFER_SIZE; j++){
buffer[j] = '\0';
}
}
You define a pointer AllCurrentData, but you should initialize it to NULL.
CurrentData* AllCurrentData = NULL;
In getCurrentData you use totalProducts which seems a bit
odd since it is a local variable in main(), either you have another
global variable with the same name or there is an error.
The **data inside the structure seems odd; instead, maybe you want
to parse the csv line and create proper members for the fields. You already
have an array of CurrentData, so it seems odd to have another array
inside the struct. I am just guessing, because you haven't explained
that part.
Since a csv file is line based use fgets() to read one line
from the file, then parse the string by using e.g. strtok or just by
checking the buffer after delimiters. Here strdup can come into play,
when you have taken out a token, do a strdup on it and store it in your
structure.
char line[255];
if ( fgets(line,sizeof(line),current) != NULL )
{
char* first = strtok( line, "," ); /* NULL if the line has no token */
char* token = first ? strdup(first) : NULL;
...
}
Instead of allocating a big buffer that may be enough (or not) use
realloc to increase your buffer as you read from the file.
That said there are faster ways to extract data from a csv-file e.g.
you can read in the whole file with fread, then look for delimiters
and set these to \0 and create an array of char pointers into the buffer.
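The in-place splitting step could be sketched like this. It assumes the caller has already loaded the file with fread into a writable buffer and appended a '\0'. The function name split_fields and the choice of ',' and '\n' as delimiters are assumptions, and quoted CSV fields are not handled.

```c
#include <stddef.h>

/* Split buf in place: every ',' or '\n' becomes '\0' and fields[] receives a
 * pointer to the start of each field. Returns the number of fields stored
 * (at most max_fields). buf must be NUL-terminated and writable. */
size_t split_fields(char *buf, char **fields, size_t max_fields)
{
    size_t n = 0;
    char *start = buf;

    for (char *p = buf; ; p++) {
        if (*p == ',' || *p == '\n' || *p == '\0') {
            int at_end = (*p == '\0');
            if (p != start || !at_end) {     /* skip only an empty tail */
                if (n == max_fields)
                    break;
                fields[n++] = start;
            }
            if (at_end)
                break;
            *p = '\0';                       /* terminate the field in place */
            start = p + 1;
        }
    }
    return n;
}
```

No per-field allocation or copying happens, which is where the speed comes from. The pointers stay valid only as long as the buffer does.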
Okay, I won't comment on other parts of your code, but you can use strdup to get rid of this line (*AllCurrentData)[newLineCount].data = malloc(colCount * sizeof(char*));, and this line (*AllCurrentData)[newLineCount].data[tabCount] = strdup(buffer); /* something like this? */
and replace them with this: (*AllCurrentData)[newLineCount].data = strdup(buffer);
Note that this only type-checks if data is declared as char * rather than char **, so it stores one string per row instead of one per column.
For the function to read in the array of strings I would start with the following approach. This has not been tested or even compiled however it is a starting place.
There are a number of issues not addressed by this sample. The temporary buffer size of 4K characters may or may not be sufficiently large for all lines in the file. There may be more lines of text in the file than elements in the array of pointers and there is no indication from the function that this has happened.
Improvements to this would be better error handling. Also it might be modified so that the array of pointers is allocated in the function with some large amount and then if there are more lines in the file than array elements, using the realloc() function to enlarge the array of pointers by some size. Perhaps also a check on the file size and using an average text line length would be appropriate to provide an initial size for the array of pointers.
// Read lines of text from a text file returning the number of lines.
// The caller will provide an array of char pointers which will be used
// to return the list of lines of text from the file.
int GetTextLines (FILE *hFile, char **pStringArrays, int nArrayLength)
{
int iBuffSize = 4096;
int iLineCount = 0;
char tempBuffer [4096];
while (iLineCount < nArrayLength && fgets (tempBuffer, iBuffSize, hFile)) { /* check capacity first so no line is read and then dropped */
pStringArrays[iLineCount] = malloc ((strlen(tempBuffer) + 1) * sizeof (char));
if (! pStringArrays[iLineCount])
break;
strcpy (pStringArrays[iLineCount], tempBuffer);
iLineCount++;
}
return iLineCount;
}

Same Array in Different Procedures

I'm really new to C, and currently I'm trying to read in from a file which contains a list of names, and import that into an array. The current array is of type char[][] since it will have more information than just the name, but essentially I want team[0][0] to be the first name I read in, team[1][0] to be the second, etc. I'm pretty sure the actual importing of the names is correct, but I'm having problems storing these arrays.
FILE *teamfile;
teamfile = fopen(file, "r");
char line[MAXLENGTH+1];
int i = 0;
while( fgets(line, sizeof line, teamfile) != NULL )
{
trim_line(line);
strcpy(&team[i][NAME],line);
i++;
}
fclose(teamfile);
Which is called from the main function as teams = teamlist(argv[1], team);
But when I try to refer to the array from elsewhere in my program eg printf(&team[0][0]) it outputs what seems to be all names in one block...
What am I doing wrong?
edit:
static void trim_line(char line[])
{
int i = 0;
// LOOP UNTIL WE REACH THE END OF line
while(line[i] != '\0')
{
// CHECK FOR CARRIAGE-RETURN OR NEWLINE
if( line[i] == '\r' || line[i] == '\n' )
{
line[i] = '\0'; // overwrite with nul-byte
break; // leave the loop early
}
i = i+1; // iterate through character array
}
}
thanks for the help so far! :D
If team is declared as char team[NUM_OF_TEAMS][LENGTH_OF_NAME],
then it should always be strcpy(team[i], line);
Hint: it is a char array, not a "string object" in C.
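To make the row-per-name layout concrete, here is a small sketch. The sizes and the function name load_team_names are assumptions, and strcspn() is used for trimming in place of the trim_line() from the question.

```c
#include <stdio.h>
#include <string.h>

#define NUM_OF_TEAMS   32    /* assumed capacity */
#define LENGTH_OF_NAME 64    /* assumed maximum name length, including '\0' */

/* Read one name per line from fp into team[0], team[1], ...
 * Returns the number of names stored. */
int load_team_names(FILE *fp, char team[][LENGTH_OF_NAME])
{
    char line[LENGTH_OF_NAME];
    int i = 0;

    while (i < NUM_OF_TEAMS && fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';   /* same job as trim_line() */
        strcpy(team[i], line);                /* team[i] is row i, a char[LENGTH_OF_NAME] */
        i++;
    }
    return i;
}
```

Each name then lives in its own row, so printf("%s\n", team[0]) prints only the first name.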

Scanning in more than one word in C

I am trying to make a program which needs to scan in more than one word, and I do not know how to do this with an unspecified length.
My first port of call was scanf, however this only scans in one word (I know you can do scanf("%d %s",temp,temporary);, but I do not know how many words it needs), so I looked around and found fgets. One issue with this is I cannot find how to make it move to the next code, eg
scanf("%99s",temp);
printf("\n%s",temp);
if (strcmp(temp,"edit") == 0) {
editloader();
}
would run editloader(), while:
fgets(temp,99,stdin);
while(fgets(temporary,sizeof(temporary),stdin))
{
sprintf(temp,"%s\n%s",temp,temporary);
}
if (strcmp(temp,"Hi There")==0) {
editloader();
}
will not move onto the strcmp() code, and will stick on the original loop. What should I do instead?
In each loop iteration I would scan one word with scanf() and then append it to the "main" string (with strcat() rather than strcpy(), so the earlier words are kept).
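One way to read that suggestion: take a single line with fgets(), then pull the words out of it one at a time with sscanf() and join them with single spaces. Because reading stops at the newline, the code after the loop (such as the strcmp() call) is reached. The function name read_words_line and the 256-byte limits are assumptions for this sketch.

```c
#include <stdio.h>
#include <string.h>

/* Read one line from fp and rebuild it in out (capacity cap bytes) with the
 * words separated by single spaces. Words that would not fit are dropped.
 * Returns the number of words stored, or -1 if no line could be read. */
int read_words_line(FILE *fp, char *out, size_t cap)
{
    char line[256], word[256];
    const char *p = line;
    size_t len = 0;
    int words = 0, used = 0;

    if (cap == 0 || fgets(line, sizeof line, fp) == NULL)
        return -1;
    out[0] = '\0';

    /* %n records how many characters sscanf consumed for this word. */
    while (sscanf(p, "%255s%n", word, &used) == 1) {
        size_t wlen = strlen(word);
        if (len + wlen + (words > 0 ? 1 : 0) >= cap)
            break;                              /* no room left in out */
        if (words > 0)
            out[len++] = ' ';
        memcpy(out + len, word, wlen + 1);      /* copies the '\0' too */
        len += wlen;
        words++;
        p += used;
    }
    return words;
}
```

With temp filled this way, strcmp(temp, "Hi There") == 0 matches no matter how many spaces were typed between the two words.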
Maybe you can use getline. I have used it in VC++; it is not part of the standard C library, but POSIX systems provide a getline() function in <stdio.h>.
check here http://www.daniweb.com/software-development/c/threads/253585
http://www.cplusplus.com/reference/iostream/istream/getline/
Hope you find what you are looking for
I use this to read from stdin and get the same format that you would get by passing as arguments... so that you can have spaces in words and quoted words within a string. If you want to read from a specific file, just fopen it and change the fgets line.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void getargcargvfromstdin(){
char s[255], **av = (char **)malloc(255 * sizeof(char *));
unsigned char i, pos, ac;
for(i = 0; i < 255; i++)
av[i] = (char *)malloc(255 * sizeof(char));
enum quotes_t{QUOTED=0,UNQUOTED}quotes=UNQUOTED;
while (fgets(s,255,stdin)){
i=0;pos=0;ac=0;
while (i<strlen(s)) {
/* '!'=33, 'ÿ'=-1, '¡'=-95 outside of these are non-printables */
if ( quotes && ((s[i] < 33) && (s[i] > -1) || (s[i] < -95))){
av[ac][pos] = '\0';
if (av[ac][0] != '\0') ac++;
pos = 0;
}else{
if (s[i]=='"'){ /* support quoted strings */
if (pos==0){
quotes=QUOTED;
}else{ /* support \" within strings */
if (s[i-1]=='\\'){
av[ac][pos-1] = '"';
}else{ /* end of quoted string */
quotes=UNQUOTED;
}
}
}else{ /* printable ascii characters */
av[ac][pos] = s[i];
pos++;
}
}
i++;
}
//your code here ac is the number of words and av is the array of words
}
}
If the input exceeds the buffer size, you can't read it in a single call.
You will have to loop and read it in pieces.
The maximum size you can scan with scanf() is set by the buffer you pass in, so give it a real buffer and a field width:
char name[100];
scanf("%99s", name);
Read this:
http://sekrit.de/webdocs/c/beginners-guide-away-from-scanf.html
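For input of truly unspecified length, the "multiple loops" idea can be taken down to the character level: read with getc() and grow a heap buffer as needed. This is a sketch with a made-up function name, read_long_line, and an arbitrary starting capacity of 16 bytes.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read one line of any length from fp into a heap buffer that grows as
 * needed. The newline is not stored. Returns NULL at EOF with nothing read
 * or on allocation failure; the caller frees the result. */
char *read_long_line(FILE *fp)
{
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;

    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len + 1 == cap) {                  /* keep one byte for '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Called as read_long_line(stdin), it returns the whole line including spaces, ready for strcmp() or further splitting.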
