parsing text file of unknown size to array in c

I'm a new C programmer and need some help. I'm trying to read a text file that contains numbers (long doubles) separated by ','. My problem is that I don't know the length of each line, and every solution I find online assumes a known size. This is part of an assignment, and I can't use scanf/fscanf. In the end I would like an array containing the numbers (without the ','). What is the best way to do this?
Thanks a lot!
edited:
FILE *inputFile;
inputFile = fopen("C:\\Users\\studies\\C\\Exe1\\input_example.txt", "r");
for (int c = 0; c < 7; c++) {
    fscanf(inputFile, "%Lf,", &params[c]);
}
Every other way I tried to read the file (fgets, getchar, etc.) didn't go well.

Divide and conquer! See if you're able to read the file correctly without storing anything. Just read and print what you read, so you can compare your output with the input file. If they match, you're on your way.
It's easier than you think to read the file. Use a char array as a temporary buffer for each number and read the file character by character into the buffer. If the input is a ',' then you have read a complete number. The same goes for the newline '\n'.
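// untested snippet
char buf[1024]; // Make it big
size_t i = 0;
int c;
long double d;
while ((c = fgetc(fp)) != EOF) {
    if (c == ',' || c == '\n') {
        buf[i] = '\0';
        d = strtold(buf, NULL); // strtold takes an end pointer as its second argument
        printf("%Lf%c", d, c); // debugging, sanity check; long double needs %Lf
        i = 0;
    }
    else if (i < sizeof buf - 1) // leave room for the terminator
        buf[i++] = (char)c;
}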
The snippet does not cover every corner case, such as a missing newline at the end of the file or Windows \r\n line endings. The string-to-double conversion also needs proper error checking. Still, the snippet should get you going.
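To end up with an array rather than printed output, the same character-by-character loop can append each converted value to a buffer that grows with realloc. The following is only a sketch under some assumptions: `read_numbers` and `push` are hypothetical names, tokens longer than the temporary buffer are silently truncated, and empty or non-numeric tokens are skipped.

```c
#include <stdio.h>
#include <stdlib.h>

/* Append v to a growable array; returns 0 on success, -1 if realloc fails. */
static int push(long double **arr, size_t *len, size_t *cap, long double v) {
    if (*len == *cap) {
        size_t ncap = *cap ? *cap * 2 : 16;
        long double *tmp = realloc(*arr, ncap * sizeof **arr);
        if (tmp == NULL)
            return -1;
        *arr = tmp;
        *cap = ncap;
    }
    (*arr)[(*len)++] = v;
    return 0;
}

/* Hypothetical helper: read every ','- or newline-separated number from fp.
   Stores the number of values in *count and returns a malloc'ed array that
   the caller must free. Returns NULL if nothing was read or memory ran out. */
long double *read_numbers(FILE *fp, size_t *count) {
    char buf[128];               /* one token; longer tokens are truncated */
    size_t i = 0, len = 0, cap = 0;
    long double *arr = NULL;
    int c;

    *count = 0;
    do {
        c = fgetc(fp);
        if (c == ',' || c == '\n' || c == EOF) {
            char *end;
            buf[i] = '\0';
            long double v = strtold(buf, &end);
            /* end == buf means no digits were converted: skip the token */
            if (end != buf && push(&arr, &len, &cap, v) != 0) {
                free(arr);
                return NULL;
            }
            i = 0;
        } else if (c != '\r' && i + 1 < sizeof buf) {
            buf[i++] = (char)c;  /* '\r' from Windows line endings is dropped */
        }
    } while (c != EOF);

    *count = len;
    return arr;
}
```

Treating EOF like a separator means a last number without a trailing newline is still stored. A caller would open the file with fopen, pass the stream to `read_numbers`, and free the returned array when done.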

Related

Is there a way to read a filestream until a period (.) is found. Then repeat?

I'm fairly new to C and not sure how I would do this. I've found similar questions, but nothing exactly like I want.
What I want to do is read a raw txt file "sentence by sentence" with the end of a sentence being considered a period (.) or a newline (\n). With no assumed maximum lengths for any data structures.
My first thought was getline(), but the version of C I'm required to use does not seem to have such a function. So I've tried to use fgets() and then pass the data to sscanf() with a scanset: sscanf(charLine, "%[^.]s", sentence);
The problem with this is that if there is more than one period (.), it stops at the first one and does not start again at that period (.) to collect the others.
I feel like I'm on the right track but just don't know how to expand on this.
while(fgets (charLine, size, readFile) == NULL)
{
sscanf(charLine, "%[^.]s", sentence);
// something here...
}

You can write a function that reads the stream until a . or a newline is found. David C. Rankin suggested that just scanning for a . might be too restrictive, because the embedded periods in www.google.com would act as sentence breaks. One can instead stop on . only if it is followed by white space:
#include <ctype.h>
#include <stdio.h>
/* alternative to fgets to stop at `.` and newline */
char *fgetsentence(char *dest, size_t size, FILE *fp) {
    size_t i = 0;
    while (i + 2 < size) {
        int c = getc(fp);
        if (c == EOF)
            break;
        dest[i++] = (char)c;
        if (c == '\n')
            break;
        if (c == '.') {
            int d = getc(fp);
            if (d == EOF)
                break;
            if (isspace(d)) {
                dest[i++] = (char)d;
                break;
            }
            ungetc(d, fp);
        }
    }
    if (i == 0)
        return NULL;
    dest[i] = '\0';
    return dest;
}
If you want to handle arbitrarily long sentences, you would take pointers to dest and size and reallocate the array if required.
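The reallocating variant might look roughly like the sketch below. The name `fgetsentence_dyn` and the growth policy (start at 64 bytes, then double) are assumptions made for illustration; the sentence-splitting rule is the same as in fgetsentence above.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical variant of fgetsentence: *dest and *size describe a heap
   buffer (initially NULL and 0) that is grown with realloc as needed.
   Returns *dest, or NULL at end of file or if realloc fails. */
char *fgetsentence_dyn(char **dest, size_t *size, FILE *fp) {
    size_t i = 0;
    for (;;) {
        /* each pass stores at most two characters plus the terminator */
        if (i + 3 > *size) {
            size_t nsize = *size < 32 ? 64 : *size * 2;
            char *tmp = realloc(*dest, nsize);
            if (tmp == NULL)
                return NULL;
            *dest = tmp;
            *size = nsize;
        }
        int c = getc(fp);
        if (c == EOF)
            break;
        (*dest)[i++] = (char)c;
        if (c == '\n')
            break;
        if (c == '.') {
            int d = getc(fp);
            if (d == EOF)
                break;
            if (isspace(d)) {
                (*dest)[i++] = (char)d;
                break;
            }
            ungetc(d, fp);   /* not a sentence end: put the character back */
        }
    }
    if (i == 0)
        return NULL;
    (*dest)[i] = '\0';
    return *dest;
}
```

The same buffer can be reused across calls, so the caller only frees it once after the last sentence has been read.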
Note that it would be very impractical to use fscanf(fp, "%[^.\n]", dest) because it is not possible to pass the maximum number of bytes to store into dest as an evaluated argument and one would need to special case empty lines and sentences.
Note too that stopping on ., even with the above restriction that it must be followed by white space, still causes false positives: sentences can contain embedded periods followed by white space that are not the end of the sentence. Example: Thanks to David C. Rankin for his comments on my answer.

Construct a C array from CSV data

So, I am trying to build a c program to be able to read star catalogues and store them in arrays for further analysis.
Star catalogues usually come in .csv format and have a LOT of data on them, also some slots are empty and not all of them have int, double or float type data.
I want this program to be able to read "any" star catalogue (or for that matter any .csv file).
My first approaches to writing such a program ran into the issue that arrays must have their sizes declared. I decided to work around this by creating line and column counter functions, to be called from the main function.
int* getfield(char* line){
    FILE *fp = fopen("./hipparcos.csv", "r");
    int ch;
    int lines=0;
    do{
        ch = fgetc(fp);
        if( ch== '\n'){
            lines++;
        }
    }while( ch != EOF );
    printf("number of lines in the file %d\n",lines);
    return lines;
}
This works well when called from the main function like this: getfield("\n"); so I get to see in the terminal how many lines it is reading (and therefore know it's counting them somehow).
What I need to know is how to store that quantity in a variable, so I can later declare the array size and store a line in every position, and maybe after that do a line split (separating each line into its columns).
Any insights into how to proceed or a more efficient approach is appreciated.

Your function returns a plain int value, so change the function header to
int getfield(char* line){
The return type should not be a pointer.
Also consider the possibility that there is no '\n' at the end of the last line of the file (in that case the result is 1 less than the number of rows).
EDIT:
If you just want to count the number of lines as the number of '\n' characters, the changed function is as follows:
int getCharCount(char chr){
    FILE *fp = fopen("./hipparcos.csv", "r");
    int ch;
    int lines = 0;
    if (fp == NULL)
        return -1; // could not open the file
    do{
        ch = fgetc(fp);
        if( ch == chr){
            lines++;
        }
    }while( ch != EOF );
    fclose(fp);
    return lines;
}
You can call this from main(), e.g.:
printf("number of lines in the file %d\n", getCharCount('\n')); // maybe +1 needed :-)
I understand that this is a draft of your program, so consider passing the file name as a parameter to your function. This makes your solution more flexible.
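A version that takes the file name as a parameter and also handles a last line without '\n' could look like this. It is a sketch, not the original poster's code: the name `count_lines` and the -1 error value are assumptions.

```c
#include <stdio.h>

/* Hypothetical helper: count the lines in the file at path.
   A final line that lacks a trailing '\n' is counted as well.
   Returns -1 if the file cannot be opened. */
long count_lines(const char *path) {
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    long lines = 0;
    int ch;
    int last = '\n';          /* so that an empty file yields 0 lines */
    while ((ch = fgetc(fp)) != EOF) {
        if (ch == '\n')
            lines++;
        last = ch;
    }
    if (last != '\n')
        lines++;              /* unterminated last line */

    fclose(fp);
    return lines;
}
```

The return value can be stored in a variable, for example `long n = count_lines("./hipparcos.csv");`, and then used to size a heap allocation such as `char **rows = malloc(n * sizeof *rows);` once `n` has been checked to be positive.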

How to move a file pointer to the next word in a text file?

I have a program that requires that I start from word N, hash the next N+M words (concatenated and done through another function so the original pointer is not moved), and then increment the FILE pointer that is pointing at N to the next word.
The only way I thought to do this was to increment the FILE pointer until a space is found, then increment further until we found the first character of the next word. This is necessary because the file I am reading may have multiple spaces between words which would not result in a matching string compared to a file that has the same word content but single spaces.
This method would then require ungetc() because we would have taken the first character of the next word from the stream.
Any ideas on a different implementation or am I pretty well restricted to this method?
while ( (c = fgetc(fileToHash)) != ' ' )
    ;
while( (c = fgetc(fileToHash)) == ' ')
    ;
ungetc(c, fileToHash);

Yes, if you insist on using the file pointer as your index, that's pretty much what you've got. A better solution would probably be to read part or all of the file into a buffer and manipulate your pointer into the buffer, unless you intend to do random-access overwriting of the file's contents -- which is generally completely impractical with text files.
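With the buffer approach, advancing to the next word becomes plain pointer arithmetic. The sketch below assumes the file contents have already been read into a NUL-terminated buffer (for example with fread); `next_word` is a hypothetical helper name.

```c
#include <ctype.h>

/* Hypothetical helper: p points somewhere inside a word in a NUL-terminated
   buffer. Returns a pointer to the first character of the next word, or to
   the terminating NUL if there is no further word. */
const char *next_word(const char *p) {
    while (*p != '\0' && !isspace((unsigned char)*p))
        p++;                  /* skip the rest of the current word */
    while (isspace((unsigned char)*p))
        p++;                  /* skip any run of spaces, tabs or newlines */
    return p;
}
```

Because nothing is consumed from a stream, there is no need for ungetc(), and the hashing function can read ahead from the same pointer without disturbing it.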

How about this.
void findWord(FILE *f, int n) {
    int c = 0;
    while (n-- > 0 && c != EOF) {
        do c = fgetc(f); while (c != EOF && !isalpha(c));
        while (c != EOF && isalpha(c)) c = fgetc(f);
    }
}

You can use fscanf to read words delimited by whitespace. This example reads each word from standard input and prints each of them on a new line (the field width keeps long words from overflowing buf):
char buf[128];
while (fscanf(stdin, "%127s", buf) == 1)
    puts(buf);

Trying to convert morse code to english. struggling

I'm trying to create a function to read Morse code from one file, convert it to English text, print the converted text to the terminal, and write it to an output file. Here's a rough start...
#define TOTAL_MORSE 91
#define MORSE_LEN 6
void
morse_to_english(FILE* inputFile, FILE* outputFile, char morseStrings[TOTAL_MORSE][MORSE_LEN])
{
    int i = 0, compare = 0;
    char convert[MORSE_LEN] = {'\0'}, *buffer = '\0';
    //read in a line of morse string from file
    // fgets(buffer, //then what?
    while(((convert[i] = fgetc(inputFile)) != ' ') && (i < (MORSE_LEN - 1)))
    {
        i++;
    }
    if (convert[i + 1] == ' ')
        convert[i + 1] = '\0';
    //compare read-in string w/morseStrings
    for (i = 48, compare = strcmp(convert, morseStrings[i]); //48 is '0'
         i < (TOTAL_MORSE - 1) && compare != 0;
         i++)
    {
        compare = strcmp(convert, morseStrings[i]);
    }
    printf("%c", (char)i);
}
I have initialized morseStrings to the morse code.
That's my function right now. It does not work, and I'm not really sure what approach to take.
My original algorithm plan was something like this:
1. Scan Morse code in from file, character by character, until a space is reached
1.1 save to a temporary buffer (convert)
2. loop while i < 91 && compare != 0
compare = strcmp(convert, morseString[i])
3. if (test ==0) print ("%c", i);
4. loop through this until eof
but.. I can't seem to think of a good way to test if the next char in the file is a space. So this has made it very difficult for me.
I got pretty frustrated and googled for ideas, and found a suggestion to use this algorithm
Read a line
Loop
-strchr() for a SPACE or EOL
-copy characters before the space to another string
-Use strcmp() and loop to find the letter
-Test the next character for SPACE.
-If so, output another space
-Skip to next morse character
Endloop
But this loop is kind of confusing. I would use fgets() (I think), but I don't know what to put in the length argument.
Anyways, I'm tired and frustrated. I would appreciate any help or insight for this problem. I can provide more code if necessary.

Your original plan looks fine. You're off by 1 when you check for the ' ' in the buffer, though. It's at convert[i], not convert[i + 1]. The i++ inside the loop doesn't happen when a space is detected.

I wouldn't use strchr(); it's too complicated.
Loop through the input file, reading one line at a time
tokenize the line with strtok
loop through the tokens and save (or better, append) the decoded letters to a buffer
close the loops and print
A bit of pseudocode for you:
while (there is a next line) {
    token = strtok(line, " ");
    while (token != NULL) {
        save decoded letter to buffer
        token = strtok(NULL, " ");
    }
}
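In C, the tokenizing loop might be sketched as follows. It assumes, as in the question, that morseStrings[ch] holds the Morse code for the character ch (with '0' at index 48) and that unused entries are empty strings. The name `decode_line` and the '?' placeholder for unknown codes are made up for illustration.

```c
#include <string.h>

#define TOTAL_MORSE 91
#define MORSE_LEN 6

/* Decode one line of space-separated Morse tokens into out (capacity outsize).
   Note that strtok modifies line, and that runs of spaces collapse into one
   separator, so word gaps are not preserved in this sketch. */
void decode_line(char *line, char morseStrings[TOTAL_MORSE][MORSE_LEN],
                 char *out, size_t outsize) {
    size_t n = 0;
    if (outsize == 0)
        return;
    for (char *tok = strtok(line, " \r\n");
         tok != NULL && n + 1 < outsize;
         tok = strtok(NULL, " \r\n")) {
        char found = '?';                 /* placeholder for unknown codes */
        for (int ch = '0'; ch < TOTAL_MORSE; ch++) {
            if (strcmp(tok, morseStrings[ch]) == 0) {
                found = (char)ch;
                break;
            }
        }
        out[n++] = found;
    }
    out[n] = '\0';
}
```

For the fgets() length argument, the size of the line buffer is the usual choice, for example `char line[256]; fgets(line, sizeof line, inputFile);`, after which `line` can be passed to `decode_line`.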

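If you are concerned about CPU time you can write a lookup table to find the values. Note that a C switch cannot match on strings (case labels must be integer constants, and '.-' is not a string), so instead of a switch the table can be an array of code/letter pairs that you search with strcmp():
static const struct { const char *code; char letter; } table[] = {
    { ".-", 'A' },
    { "-...", 'B' },
    { "-.-.", 'C' },
};
After you split the Morse code at the spaces, look up each . and - combination in the table to get the original character.
I hope this helps.
Best regards.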

K&R Chapter 1 - Exercise 22 solution, what do you think?

I'm learning C from K&R as a first language, and I wanted to ask whether you think this exercise is solved the right way. I'm aware that it's probably not as complete as you'd like, but I wanted opinions so I'd know I'm learning C right.
Thanks
/* Exercise 1-22. Write a program to "fold" long input lines into two or
* more shorter lines, after the last non-blank character that occurs
* before the n-th column of input. Make sure your program does something
* intelligent with very long lines, and if there are no blanks or tabs
* before the specified column.
*
* ~svr
*
* [NOTE: Unfinished, but functional in a generic capacity]
* Todo:
* Handling of spaceless lines
* Handling of lines consisting entirely of whitespace
*/
#include <stdio.h>
#define FOLD 25
#define MAX 200
#define NEWLINE '\n'
#define BLANK ' '
#define DELIM 5
#define TAB '\t'
int
main(void)
{
    int line = 0,
        space = 0,
        newls = 0,
        i = 0,
        c = 0,
        j = 0;
    char array[MAX] = {0};
    while((c = getchar()) != EOF) {
        ++line;
        if(c == NEWLINE)
            ++newls;
        if((FOLD - line) < DELIM) {
            if(c == BLANK) {
                if(newls > 0) {
                    c = BLANK;
                    newls = 0;
                }
                else
                    c = NEWLINE;
                line = 0;
            }
        }
        array[i++] = c;
    }
    for(line = 0; line < i; line++) {
        if(array[0] == NEWLINE)
            ;
        else
            printf("%c", array[line]);
    }
    return 0;
}

I'm sure you're on the right track, but here are some pointers for readability:
comment your code
name the variables properly, or at least describe them if you won't
be consistent: you use braces for some single-line ifs and not for others (IMHO, always use {} so it's more readable)
the if statement in the last for loop can be written better, like
if(array[0] != NEWLINE)
{
    printf("%c", array[line]);
}

That's no good IMHO.
First, it doesn't do what you were asked for. You were supposed to find the last blank after a nonblank before the output line boundary. Your program doesn't even remotely try to do it; it seems to strive for finding the first blank after (margin - 5) characters (where did the 5 come from? what if all the words had 9 letters?). However, it doesn't do that either, because of your manipulation of the newls variable. Also, this:
for(line = 0; line < i; line++) {
    if(array[0] == NEWLINE)
        ;
    else
        printf("%c", array[line]);
}
is probably wrong, because you check for a condition that never changes throughout the loop.
And, last but not least, storing the whole file in a fixed-size buffer is not good, because of two reasons:
the buffer is bound to overflow on large files
even if it would never overflow, people still wouldn't like you for storing eg. a gigabyte file in memory just to cut it into 25-character chunks
I think you should start again, rethink your algorithm (incl. corner cases), and only after that, start coding. I suggest you:
process the file line-by-line (meaning output lines)
store the line in a buffer big enough to hold the largest output line
search for the character you'll break at in the buffer
then print it (hint: you can terminate the string with '\0' and print with printf("%s", ...)), copy what you didn't print to the start of the buffer, proceed from that
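The steps above could be put together roughly as follows. This is one possible sketch, not the K&R reference solution: the name `fold`, the choice to count a tab as a single column, and the hard break for words longer than FOLD are all assumptions.

```c
#include <stdio.h>
#include <string.h>

#define FOLD 25   /* maximum output line width, as in the question */

/* Copy in to out so that no output line is longer than FOLD characters.
   Lines are broken after the last blank that fits; a word longer than
   FOLD is cut at the column. A tab is treated as one column here. */
void fold(FILE *in, FILE *out) {
    char buf[FOLD + 1];          /* one output line plus one look-ahead char */
    size_t len = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        if (c == '\n') {         /* input line ended: flush what we have */
            fwrite(buf, 1, len, out);
            putc('\n', out);
            len = 0;
            continue;
        }
        buf[len++] = (char)c;
        if (len <= FOLD)
            continue;

        /* buf holds FOLD + 1 characters: search backwards for a blank */
        size_t brk = len;
        while (brk > 0 && buf[brk - 1] != ' ' && buf[brk - 1] != '\t')
            brk--;

        size_t keep, rest;       /* chars to print, start of the remainder */
        if (brk == 0) {          /* no blank at all: hard break */
            keep = rest = FOLD;
        } else {
            rest = brk;          /* remainder starts after the blank */
            keep = brk - 1;
            while (keep > 0 && (buf[keep - 1] == ' ' || buf[keep - 1] == '\t'))
                keep--;          /* trim trailing blanks */
        }
        if (keep > 0) {
            fwrite(buf, 1, keep, out);
            putc('\n', out);
        }
        memmove(buf, buf + rest, len - rest);   /* keep what was not printed */
        len -= rest;
    }
    if (len > 0) {               /* last line without a trailing newline */
        fwrite(buf, 1, len, out);
        putc('\n', out);
    }
}
```

Calling `fold(stdin, stdout)` from main turns this into a filter. Only FOLD + 1 characters are ever held in memory, regardless of the input size.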

An obvious problem is that you statically allocate 'array' and never check the index limits while accessing it. Buffer overflow waiting to happen. In fact, you never reset the i variable within the first loop, so I'm kinda confused about how the program is supposed to work. It seems that you're storing the complete input in memory before printing it word-wrapped?
So, suggestions: merge the two loops together and print the output for each line that you have completed. Then you can re-use the array for the next line.
Oh, and better variable names and some comments. I have no idea what 'DELIM' is supposed to do.

It looks (without testing) like it could work, but it seems kind of complicated.
Here's some pseudocode for my first thought
const int MAXLINE = ??    // maximum line length parameter
int chrIdx = 0            // index of the current character being considered
int cand = -1             // "candidate index", set to a potential break position in linebuf
char linebuf[bufsiz]
int lineIdx = 0           // index into the output line
char buffer[bufsiz]       // a character buffer
read input into buffer
for ix = 0 to bufsiz - 1
do
    if buffer[ix] == ' ' then
        cand = lineIdx
    fi
    linebuf[lineIdx] = buffer[ix]
    lineIdx += 1
    if lineIdx >= MAXLINE then
        linebuf[cand] = '\0'    // end the string
        print linebuf
        do something to move remnants to front of line (memmove?) and reset lineIdx and cand
    fi
od
It's late and I just had a belt, so there may be flaws, but it shows the general idea: load a buffer, and copy the contents of the buffer to a line buffer, keeping track of the possible break points. When you get close to the end, use the breakpoint.
