So, I was trying to write a program that takes the first column of a CSV file and copies it to a string, but it's not going well. I've included the code and the CSV file below, and I'd really appreciate any help.
int main () {
FILE *fp;
fp = fopen("filePATH", "r");
char column[80];
int line_n = 0;
char ch;
while ((ch = fgetc(fp)) != EOF) {
fgets(column, sizeof column, fp);
for (int i = 0; i < sizeof column; ++i){
fscanf(fp, "%[^;]", column);
}
printf("%s \n", column);
}
fclose(fp);
return 0;
}
CSV file:
2;51.5;144.0;24.80
5;62.3;157.0;25.30
10;52.8;141.0;26.60
10;34.5;120.0;24.00
1;41.6;131.0;24.20
5;49.0;144.0;23.80
6;47.1;142.0;23.50
2;51.8;144.5;24.80
1;55.6;135.0;30.50
9;51.9;150.0;23.10
9;48.5;139.0;25.10
The output I have is:
5
10
10
1
5
6
2
1
9
9
48.5;139.0;25.10
So, I don't understand why the program shows me the first column but copies only the last line for the string column.
To check the string column, I used:
char copy[20];
strncpy(copy, column, 18);
printf("%s ", copy);
And the output is:
48.5;139.0;25.10
Almost certainly you want to restructure the loops, but here are some minimal changes to your code that might be instructive:
#include<stdio.h>
char sample_input[] =
"2;51.5;144.0;24.80\n"
"5;62.3;157.0;25.30\n"
"10;52.8;141.0;26.60\n"
"10;34.5;120.0;24.00\n"
"1;41.6;131.0;24.20\n"
"5;49.0;144.0;23.80\n"
"6;47.1;142.0;23.50\n"
"2;51.8;144.5;24.80\n"
"1;55.6;135.0;30.50\n"
"9;51.9;150.0;23.10\n"
"9;48.5;139.0;25.10\n"
;
int
main(void)
{
FILE *fp = fmemopen(sample_input, sizeof sample_input - 1, "r"); /* - 1 so the terminating NUL is not read as data */
char column[80];
int line_n = 0;
int ch; /* fgetc returns an int. */
while( (ch = fgetc(fp)) != EOF ){
/* put ch back so it can be read by scanf */
ungetc(ch, fp);
/* This fgets does not seem to serve any purpose. */
/* fgets(column, sizeof column, fp); */
ch = ';';
while( ch == ';' && fscanf(fp, "%79[^;\n]", column) == 1 ){
ch = fgetc(fp); /* Consume the ; or \n */
printf("%s%c", column, ch);
}
}
fclose(fp);
return 0;
}
It would probably be better to use the fgets to read each line of data and then use sscanf to parse the line, but I wanted to show that your current code can be made to (mostly) work with minimal changes. Note that the variable used to store the value returned by fgetc must be int rather than char in order to properly compare to EOF. Also, whenever you use %s or %[] in scanf, it is best to add a width modifier to prevent a buffer overflow.
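The fgets plus sscanf approach mentioned above could look roughly like the sketch below. The function names first_field and print_first_column are made up for illustration, and the sketch assumes that no line is longer than 255 characters and that the first field fits in 79 characters.

```c
#include <stdio.h>

/* Copy the first ';'-separated field of line into out (capacity 80).
   Returns 1 on success, 0 if the line is empty or starts with ';'.
   (first_field is a hypothetical helper name.) */
int first_field(const char *line, char out[80])
{
    /* The width 79 leaves room for the terminating NUL. */
    return sscanf(line, "%79[^;\n]", out) == 1;
}

/* Print the first column of every line read from fp.
   Returns the number of lines for which a first field was found. */
int print_first_column(FILE *fp)
{
    char line[256];   /* assumption: lines are shorter than this */
    char column[80];
    int count = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        if (first_field(line, column)) {
            printf("%s\n", column);
            ++count;
        }
    }
    return count;
}
```

Calling print_first_column(fp) on the file opened in the question should print one first-column value per line. Because each line is parsed separately, column holds the value from the line currently being processed rather than the leftover of the last line.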
I am trying to read in a file that contains numbers separated by commas and store them in an array without the commas.
For example: processes.txt contains
0,1,3
1,0,5
2,9,8
3,10,6
And an array called numbers should look like:
0 1 3 1 0 5 2 9 8 3 10 6
The code I had so far is:
FILE *fp1;
char c; //declaration of characters
fp1=fopen(argv[1],"r"); //opening the file
int list[300];
c=fgetc(fp1); //taking character from fp1 pointer or file
int i=0,number,num=0;
while(c!=EOF){ //iterate until end of file
if (isdigit(c)){ //if it is digit
sscanf(&c,"%d",&number); //changing character to number (c)
num=(num*10)+number;
}
else if (c==',' || c=='\n') { //if it is new line or ,then it will store the number in list
list[i]=num;
num=0;
i++;
}
c=fgetc(fp1);
}
But this is having problems if it is a double digit. Does anyone have a better solution? Thank you!
For the data shown with no space before the commas, you could simply use:
while (i < 300 && fscanf(fp1, "%d,", &num) == 1)
list[i++] = num;
This will read the comma after the number if there is one, silently ignoring when there isn't one. If there might be white space before the commas in the data, add a blank before the comma in the format string. The test on i prevents you writing outside the bounds of the list array. The ++ operator comes into its own here.
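As a rough, self-contained sketch of the same idea, the loop can be wrapped in a function. The name read_numbers and its parameters are assumptions made for this example, and the format string uses the variant with a blank before the comma so that white space before a comma is tolerated.

```c
#include <stdio.h>

/* Read integers separated by commas and/or white space from fp into list.
   Stores at most max values and returns how many were stored.
   (read_numbers is a hypothetical name, not from the question.) */
size_t read_numbers(FILE *fp, int list[], size_t max)
{
    size_t n = 0;
    int num;

    /* Test the bound first so that no number is read and then dropped. */
    while (n < max && fscanf(fp, "%d ,", &num) == 1)
        list[n++] = num;
    return n;
}
```

With the sample file from the question, a call such as read_numbers(fp1, list, 300) should fill list with the twelve values in file order.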
First, fgetc returns an int, so c needs to be an int.
Other than that, I would use a slightly different approach. I admit that it is slightly overcomplicated. However, this approach may be useful if you have several different types of fields that require different actions, as in a parser. For your specific problem, I recommend Jonathan Leffler's answer.
int c=fgetc(f);
while(c!=EOF && i<300) {
if(isdigit(c)) {
fseek(f, -1, SEEK_CUR);
if(fscanf(f, "%d", &list[i++]) != 1) {
// Handle error
}
}
c=fgetc(f);
}
Here I don't care about commas and newlines. I take ANYTHING other than a digit as a separator. What I do is basically this:
read next byte
if byte is digit:
back one byte in the file
read number, regardless of length
else continue
The added condition i<300 is there for safety: it keeps the code from writing past the end of list. If you really want to check that nothing other than commas and newlines separates the numbers (I did not get the impression that you found that important), you could easily add an else if (c == ... to handle the error.
Note that you should always check the return value for functions like sscanf, fscanf, scanf etc. Actually, you should also do that for fseek. In this situation it's not as important since this code is very unlikely to fail for that reason, so I left it out for readability. But in production code you SHOULD check it.
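For illustration, here is one way the checked version might look. This is only a sketch: it swaps fseek for ungetc to push the digit back (a different technique from the code above), and the function name read_digit_runs is invented for the example. As before, anything that is not a digit counts as a separator, so a minus sign is ignored.

```c
#include <ctype.h>
#include <stdio.h>

/* Read every run of digits in f as one integer, treating all other
   characters as separators. Stores at most max values in list and
   returns how many were stored. (read_digit_runs is a hypothetical name.) */
size_t read_digit_runs(FILE *f, int list[], size_t max)
{
    size_t n = 0;
    int c;  /* int, so that EOF can be told apart from real characters */

    while (n < max && (c = fgetc(f)) != EOF) {
        if (!isdigit(c))
            continue;                 /* separator: skip it */
        if (ungetc(c, f) == EOF)      /* push the digit back for fscanf */
            break;
        if (fscanf(f, "%d", &list[n]) != 1)
            break;                    /* conversion failed: stop reading */
        ++n;
    }
    return n;
}
```

One character of pushback is guaranteed by the C standard, which is all this loop needs.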
My solution is to read the whole line first and then parse it with strtok_r with comma as a delimiter. If you want portable code you should use strtok instead.
A naive implementation of readline would be something like this:
static char *readline(FILE *file)
{
char *line = malloc(sizeof(char));
if (line == NULL) {
return NULL;
}
int index = 0;
int c = fgetc(file);
if (c == EOF) {
free(line);
return NULL;
}
while (c != EOF && c != '\n') {
line[index++] = c;
char *l = realloc(line, (index + 1) * sizeof(char));
if (l == NULL) {
free(line);
return NULL;
}
line = l;
c = fgetc(file);
}
line[index] = '\0';
return line;
}
Then you just need to parse the whole line with strtok_r, so you would end with something like this:
int main(int argc, char **argv)
{
FILE *file = fopen(argv[1], "re");
int list[300];
if (file == NULL) {
return 1;
}
char *line;
int numc = 0;
while((line = readline(file)) != NULL) {
char *saveptr;
// Get the first token
char *tok = strtok_r(line, ",", &saveptr);
// Now start parsing the whole line
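while (tok != NULL && numc < 300) { // stop before overflowing list
// Convert the token to a long if possible
errno = 0; // strtol only sets errno on failure, so clear it first (needs <errno.h>)
long num = strtol(tok, NULL, 0);
if (errno != 0) {
// Handle out-of-range value; note that strtol may return 0 without setting errno when the token has no digits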
// ...
// ...
}
list[numc++] = (int) num;
// Get next token
tok = strtok_r(NULL, ",", &saveptr);
}
free(line);
}
fclose(file);
return 0;
}
And for printing the whole list just use a for loop:
for (int i = 0; i < numc; i++) {
printf("%d ", list[i]);
}
printf("\n");
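The error handling around strtol is easy to get wrong, so here is a small sketch of a stricter conversion. The helper name parse_int is an assumption for this example. It uses base 10 rather than base 0, so a token such as 010 is read as ten rather than as octal, and it rejects tokens with trailing characters.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Convert tok to an int. Returns 1 and stores the value in *out when tok
   is a complete decimal integer that fits in an int, 0 otherwise.
   (parse_int is a hypothetical helper name.) */
int parse_int(const char *tok, int *out)
{
    char *end;

    errno = 0;                       /* strtol never resets errno itself */
    long v = strtol(tok, &end, 10);

    if (end == tok)                  /* no digits were consumed */
        return 0;
    if (*end != '\0')                /* trailing characters after the number */
        return 0;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;                    /* does not fit in long or in int */

    *out = (int)v;
    return 1;
}
```

In the token loop, something like if (parse_int(tok, &list[numc])) numc++; could then replace the strtol call and the cast.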
I'm having some trouble reading in files.
I'm attempting to read in numbers and special characters from a file. I'm given a file that contains numbers and operands. The file is read in from the command line.
Sample file:
1 4 + 5*
2 - 4 + 8
7 -2 +1 +3
0
So the file is terminated with 0 and each new line is an expression. What my code is attempting to do is read in one expression at a time
1 4 + 5 *
Work with it, then move on to the next expression (until the file reads 0). It needs to keep the special characters as well (+, *, -, etc.).
This is my code so far. I don't know the length of each expression (it can vary) so I just statically declared it as a length of 50.
int main(int argc, char* argv[]){
FILE* input;
char expression[50];
int i;
char ch;
input = fopen(argv[1], "r");
if(input == NULL){
printf("File does not exist! Exiting program!");
return 0;
}
do{
ch= fgetc(input);
if(ch == 0)
break;
while((ch != '\n')){
//printf("test2\n");
expression[i] = ch;
i++;
ch = fgetc(input);
}
//plan to do something with each expression here
} while(ch != EOF);
for(i=0; i < (strlen(expression)); i++)
printf("%c ", expression[i]);
return 0;
}
All I'm trying to do here is just check to see if it will read in the expressions and print them out correctly. When I try to run it I get a segfault, I put in the test2 statement to test and it prints it out(infinite loop) until it seg faults.
Any help would be greatly appreciated.
For your kind of file, use a single loop (with ch declared as int so that it can be compared with EOF):
while ((ch = fgetc(input)) != EOF && ch != '0')
{
/* your main processing */
if (ch == '\n')
{
/* deal with end of expression */
}
}
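Having multiple while loops, each with its own fgetc(input) call, makes the control flow hard to follow. Note that the loop above stops at the first '0' character it meets, even one inside a number such as 10.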
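A line-based alternative avoids that ambiguity, since it can compare a whole line with "0". The sketch below is one possible shape for it. The name next_expression is invented for the example, and it assumes that the terminating line contains exactly the character 0 with no surrounding spaces and that each expression fits in the buffer passed in.

```c
#include <stdio.h>
#include <string.h>

/* Read the next expression (one line) from fp into buf.
   Returns 1 if an expression was read, 0 at end of input or when the
   terminating "0" line is reached. (next_expression is a hypothetical name.) */
int next_expression(FILE *fp, char *buf, size_t size)
{
    if (fgets(buf, (int)size, fp) == NULL)
        return 0;                       /* end of file or read error */
    buf[strcspn(buf, "\r\n")] = '\0';   /* strip the line ending, if any */
    return strcmp(buf, "0") != 0;       /* a lone 0 ends the input */
}
```

A caller would then loop with while (next_expression(input, expression, sizeof expression)) and process expression inside the loop body.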
I'm writing a program for school that has to read text from a file, capitalize everything, and remove the punctuation and spaces. The file "Congress.txt" contains
(Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging the freedom of speech, or of the press; or the right of the people peaceably to assemble, and to petition the government for a redress of grievances.)
It reads in correctly but what I have so far to remove the punctuation, spaces, and capitalize causes some major problems with junk characters. My code so far is:
void processFile(char line[]) {
FILE *fp;
int i = 0;
char c;
if (!(fp = fopen("congress.txt", "r"))) {
printf("File could not be opened for input.\n");
exit(1);
}
line[i] = '\0';
fseek(fp, 0, SEEK_END);
fseek(fp, 0, SEEK_SET);
for (i = 0; i < MAX; ++i) {
fscanf(fp, "%c", &line[i]);
if (line[i] == ' ')
i++;
else if (ispunct((unsigned char)line[i]))
i++;
else if (islower((unsigned char)line[i])) {
line[i] = toupper((unsigned char)line[i]);
i++;
}
printf("%c", line[i]);
fprintf(csis, "%c", line[i]);
}
fclose(fp);
}
I don't know if it's an issue, but I have MAX defined as 272 because that's the length of the text file, including punctuation and spaces.
The output I am getting is:
C╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠
╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠Press any key to continue . . .
The fundamental algorithm needs to be along the lines of:
while next character is not EOF
if it is alphabetic
save the upper case version of it in the string
null terminate the string
which translates into C as:
int c;
int i = 0;
while ((c = getc(fp)) != EOF)
{
if (isalpha(c))
line[i++] = toupper(c);
}
line[i] = '\0';
This code doesn't need the (unsigned char) cast with the functions from <ctype.h> because c is guaranteed to contain either EOF (in which case it doesn't get into the body of the loop) or the value of a character converted to unsigned char anyway. You only have to worry about the cast when you use char c (as in the code in the question) and try to write toupper(c) or isalpha(c). The problem is that plain char can be a signed type, so some characters, notoriously ÿ (y-umlaut, U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS), will appear as a negative value, and that breaks the requirements on the inputs to the <ctype.h> functions. This code will attempt to case-convert characters that are already upper-case, but that's probably cheaper than a second test.
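To make the cast rule concrete, here is a minimal sketch of the other situation, where the characters come from a char array rather than from getc. The function name upcase_in_place is invented for this example.

```c
#include <ctype.h>

/* Upper-case a string in place. Each element is a plain char, which may be
   negative, so it is converted to unsigned char before toupper sees it.
   (upcase_in_place is a hypothetical name.) */
void upcase_in_place(char *s)
{
    for (; *s != '\0'; ++s)
        *s = (char)toupper((unsigned char)*s);
}
```

Without the cast, a negative char value other than EOF would be outside the range that the <ctype.h> functions accept.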
What else you do in the way of printing, etc is up to you. The csis file stream is a global scope variable; that's a bit (tr)icky. You should probably terminate the output printing with a newline.
The code shown is vulnerable to buffer overflow. If the length of line is MAX, then you can modify the loop condition to:
while (i < MAX - 1 && (c = getc(fp)) != EOF)
If, as would be a better design, you change the function signature to:
void processFile(int size, char line[]) {
and assert that the size is strictly positive:
assert(size > 0);
and then the loop condition changes to:
while (i < size - 1 && (c = getc(fp)) != EOF)
Obviously, you change the call too:
char line[4096];
processFile(sizeof(line), line);
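Putting those pieces together, a bounded version might look like the following sketch. It departs from the question in a few labeled ways: the function is called upper_letters (a made-up name), it takes an already opened FILE * instead of opening congress.txt itself, it uses size_t for the size, and it returns the number of letters stored.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Store the upper-cased letters read from fp in line, which has room for
   size chars including the terminating NUL. Returns the number of letters
   stored. (upper_letters is a hypothetical name.) */
size_t upper_letters(FILE *fp, size_t size, char line[])
{
    size_t i = 0;
    int c;  /* int, so that EOF is distinguishable */

    assert(size > 0);
    /* Check the bound before reading so no character is read and lost. */
    while (i < size - 1 && (c = getc(fp)) != EOF) {
        if (isalpha(c))
            line[i++] = (char)toupper(c);
    }
    line[i] = '\0';
    return i;
}
```

The return value tells the caller how many characters are valid, so later loops do not need a hand-counted limit.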
In the posted code there is no intermediate processing,
so the following code ignores the line[] input parameter:
void processFile()
{
FILE *fp = NULL;
if (!(fp = fopen("congress.txt", "r")))
{
printf("File could not be opened for input.\n");
exit(1);
}
// implied else, fopen successful
int c; // must be int so EOF can be distinguished from every valid character
while( EOF != (c = fgetc(fp)) )
{
if( isalpha(c) ) // a...z or A...Z only; spaces and punctuation are dropped, as the assignment requires
{
// note toupper has no effect on upper case characters
printf("%c", toupper(c));
fprintf(csis, "%c", toupper(c));
}
}
printf( "\n" );
fclose(fp);
} // end function: processFile
Okay, so what I did was create a second character array. My first array reads in the entire file. The second array takes only the alphabetical characters from the first array and makes them uppercase. My corrected and completed function for that part of my homework is as follows:
void processFile(char line[], char newline[]) {
FILE *fp;
int i = 0;
int j = 0;
if (!(fp = fopen("congress.txt", "r"))) { //checks file open
printf("File could not be opened for input.\n");
exit(1);
}
line[i] = '\0';
fseek(fp, 0, SEEK_END); //idk what they do but they make it not crash
fseek(fp, 0, SEEK_SET);
for (i = 0; i < MAX; ++i) { //reads the file into the first array
fscanf(fp, "%c", &line[i]);
}
for (i = 0; i < MAX; ++i) {
if (isalpha((unsigned char)line[i])){ //if it's an alphabetical character
newline[j] = toupper((unsigned char)line[i]); //copy it into the new array, capitalized
j++;
}
}
newline[j] = '\0'; //terminate the new string (newline needs room for MAX + 1 chars)
fclose(fp);
}
Just keep in mind that the new array holds fewer characters than your defined MAX. To make it easy, I just counted the punctuation and spaces that are now missing (there were 50), so later "for" loops became:
for (i = 0; i < MAX - 50; ++i)
I have two .csv files and I need to read each whole file, but it has to be done field by field. I mean, CSV files hold data separated by commas, so I can't use fgets.
I need to read all the data but I don't know how to jump to the next line.
Here is what I've done so far:
int main()
{
FILE *arq_file;
arq_file = fopen("file.csv", "r");
if(arq_file == NULL){
printf("Not possible to read the file.");
exit(0);
}
while( !feof(arq_file) ){
fscanf(arq_file, "%i %lf", &myStruct[i+1].Field1, &myStruct[i+1].Field2);
}
fclose(arq_file);
return 0;
}
It gets stuck in an infinite loop because it never moves on to the next line.
How could I reach the line below the one I just read?
Update: File 01 Example
1,Alan,123,
2,Alan Harper,321
3,Jose Rendeks,32132
4,Maria da graça,822282
5,Charlie Harper,9999999999
File 02 Example
1,320,123
2,444,321
3,250,123,321
3,3,250,373,451
2,126,621
1,120,320
2,453,1230
3,12345,0432,1830
I think an example is better than giving you hints. This one combines fgets() and strtok(). Other functions, such as strchr(), would also work, but this way is easier, and since I just wanted to point you in the right direction, I did it like this:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int
main(void)
{
FILE *file;
char buffer[256];
char *pointer;
size_t line;
file = fopen("data.dat", "r");
if (file == NULL)
{
perror("fopen()");
return -1;
}
line = 0;
while ((pointer = fgets(buffer, sizeof(buffer), file)) != NULL)
{
size_t field;
char *token;
field = 0;
while ((token = strtok(pointer, ",\n")) != NULL) /* "\n" keeps the newline out of the last field */
{
printf("line %zu, field %zu -> %s\n", line, field, token);
field += 1;
pointer = NULL;
}
line += 1;
}
fclose(file);
return 0;
}
I think it's very clear how the code works and I hope you can understand.
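One thing to be aware of is that strtok() treats a run of adjacent delimiters as a single delimiter, so empty fields silently disappear. If empty fields matter, a split based on strchr() keeps them. The following is only a sketch: the name split_csv_line is made up, it modifies the line in place, and it does not handle quoted fields that contain commas.

```c
#include <stdio.h>
#include <string.h>

/* Split line in place on commas and store a pointer to each field in
   fields[]. Empty fields are kept. Returns the number of fields stored,
   which is at most max. (split_csv_line is a hypothetical name.) */
size_t split_csv_line(char *line, char *fields[], size_t max)
{
    size_t n = 0;
    char *p = line;

    line[strcspn(line, "\r\n")] = '\0';   /* drop the line ending */
    while (n < max) {
        fields[n++] = p;                  /* a field starts here */
        char *comma = strchr(p, ',');
        if (comma == NULL)
            break;                        /* that was the last field */
        *comma = '\0';                    /* terminate this field */
        p = comma + 1;
    }
    return n;
}
```

For the first line of File 01, which ends with a comma, this reports four fields, the last of them empty, whereas the strtok() loop reports three.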
If the same code has to handle both data files, then you're stuck with reading the fields into a string, and subsequently converting the string into a number.
It is not clear from your description whether you need to do something special at the end of line or not — but because only one of the data lines ends with a comma, you do have to allow for fields to be separated by a comma or a newline.
Frankly, you'd probably do OK with using getchar() or equivalent; it is simple.
char buffer[4096];
char *bufend = buffer + sizeof(buffer) - 1;
char *curfld = buffer;
int c;
while ((c = getc(arq_file)) != EOF)
{
if (curfld == bufend)
…process overlong field…
else if (c == ',' || c == '\n')
{
*curfld = '\0';
process(buffer);
curfld = buffer;
}
else
*curfld++ = c;
}
if (c == EOF && curfld != buffer)
{
*curfld = '\0';
process(buffer);
}
However, if you want to go with higher level functions, then you do want to use fgets() to read lines (unless you need to worry about deviant line endings, such as DOS vs Unix vs old-style Mac (CR-only) line endings). Or use POSIX getline() to read arbitrarily long lines. Then split the lines using strtok_r() or equivalent.
char *buffer = 0;
size_t buflen = 0;
while (getline(&buffer, &buflen, arq_file) != -1)
{
char *posn = buffer;
char *epos;
char *token;
while ((token = strtok_r(posn, ",\n", &epos)) != 0)
{
process(token);
posn = 0;
}
/* Do anything special for end of line */
}
free(buffer);
If you think you must use scanf(), then you need to use something like:
char buffer[4096];
char c;
while (fscanf(arq_file, "%4095[^,\n]%c", buffer, &c) == 2)
process(buffer);
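The %4095[^,\n] scan set reads up to 4095 characters that are neither comma nor newline into buffer, and then %c reads the next character into c (which must, therefore, be either a comma or a newline; at EOF there is no character to read, and that causes problems). If the last character in the file is neither comma nor newline, then you will skip the last field. An empty field (two adjacent separators, or a separator at the very start of a line) also ends the loop, because a scan set must match at least one character.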
Which is the fastest way to get the lines of an ASCII file?
Normally you read files in C using fgets. You can also use scanf("%[^\n]"), but quite a few people reading the code are likely to find that confusing and foreign.
Edit: on the other hand, if you really do just want to count lines, a slightly modified version of the scanf approach can work quite nicely:
while (EOF != (scanf("%*[^\n]"), scanf("%*c")))
++lines;
The advantage of this is that with the '*' in each conversion, scanf reads and matches the input, but does nothing with the result. That means we don't have to waste memory on a large buffer to hold the content of a line that we don't care about (and still take a chance of getting a line that's even larger than that, so our count ends up wrong unless we go to even more work to figure out whether the input we read ended with a newline).
Unfortunately, we do have to break up the scanf into two pieces like this. scanf stops scanning when a conversion fails, and if the input contains a blank line (two consecutive newlines) we expect the first conversion to fail. Even if that fails, however, we want the second conversion to happen, to read the next newline and move on to the next line. Therefore, we attempt the first conversion to "eat" the content of the line, and then do the %c conversion to read the newline (the part we really care about). We continue doing both until the second call to scanf returns EOF (which will normally be at the end of the file, though it can also happen in case of something like a read error).
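The same two-step loop can be sketched as a function over any stream by using fscanf instead of scanf. The name count_lines_scanf is an assumption for this example. Like the loop above, it counts newline characters, so a final line without a trailing newline is not counted.

```c
#include <stdio.h>

/* Count the newline characters in fp using two suppressed conversions.
   (count_lines_scanf is a hypothetical name.) */
unsigned long count_lines_scanf(FILE *fp)
{
    unsigned long lines = 0;

    /* First call: skip the content of the line (may fail on a blank line).
       Second call: consume the newline; it returns EOF at end of input. */
    while ((void)fscanf(fp, "%*[^\n]"), fscanf(fp, "%*c") != EOF)
        ++lines;
    return lines;
}
```

The cast to void only documents that the result of the first call is deliberately ignored.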
Edit2: Of course, there is another possibility that's (at least arguably) simpler and easier to understand:
int ch;
while (EOF != (ch=getchar()))
if (ch=='\n')
++lines;
The only part of this that some people find counterintuitive is that ch must be defined as an int, not a char for the code to work correctly.
Here's a solution based on fgetc() which will work for lines of any length and doesn't require you to allocate a buffer.
#include <stdio.h>
int main()
{
FILE *fp = stdin; /* or use fopen to open a file */
int c; /* Nb. int (not char) for the EOF */
unsigned long newline_count = 0;
/* count the newline characters */
while ( (c=fgetc(fp)) != EOF ) {
if ( c == '\n' )
newline_count++;
}
printf("%lu newline characters\n", newline_count);
return 0;
}
Maybe I'm missing something, but why not simply:
#include <stdio.h>
int main(void) {
int n = 0;
int c;
while ((c = getchar()) != EOF) {
if (c == '\n')
++n;
}
printf("%d\n", n);
}
If you want to count partial lines (i.e. [^\n]EOF):
#include <stdio.h>
int main(void) {
int n = 0;
int pc = EOF;
int c;
while ((c = getchar()) != EOF) {
if (c == '\n')
++n;
pc = c;
}
if (pc != EOF && pc != '\n')
++n;
printf("%d\n", n);
}
Come on, why compare every character? It is very slow: about 3 s for a 10 MB file.
The solution below is faster.
unsigned long count_lines_of_file(char *file_patch) {
FILE *fp = fopen(file_patch, "r");
unsigned long line_count = 0;
if(fp == NULL){
return 0;
}
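/* fgetline() is not a standard C function; it is assumed to be a helper that reads one line and returns non-zero on success */
while ( fgetline(fp) )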
line_count++;
fclose(fp);
return line_count;
}
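Since fgetline() is not part of standard C, here is a sketch of a similar line-at-a-time counter built on fgets(), which is. The name count_lines_fgets and the 4096-byte buffer are assumptions. A line longer than the buffer arrives in several chunks, and only the chunk that contains the newline is counted.

```c
#include <stdio.h>
#include <string.h>

/* Count newline-terminated lines in fp by reading with fgets.
   (count_lines_fgets is a hypothetical name.) */
unsigned long count_lines_fgets(FILE *fp)
{
    char buf[4096];
    unsigned long lines = 0;

    while (fgets(buf, sizeof buf, fp) != NULL) {
        /* Only a chunk that ends a line contains '\n'. */
        if (strchr(buf, '\n') != NULL)
            ++lines;
    }
    return lines;
}
```

Whether this is actually faster than a getc loop depends on the C library, so it is worth measuring on the real data before choosing.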
What about this?
#include <stdio.h>
#include <string.h>
#define BUFFER_SIZE 4096
int main(int argc, char** argv)
{
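int count;
size_t bytes; /* fread returns a size_t */
FILE* f;
char buffer[BUFFER_SIZE];
char* ptr;
if (argc != 2 || !(f = fopen(argv[1], "r")))
{
return -1;
}
count = 0;
/* Loop on the fread result rather than feof(), so an empty file is not treated as an error */
while ((bytes = fread(buffer, sizeof(char), BUFFER_SIZE, f)) > 0)
{
char* end = buffer + bytes;
/* memchr honors the byte count, so embedded '\0' bytes do not hide newlines */
for (ptr = memchr(buffer, '\n', bytes); ptr; ptr = memchr(ptr + 1, '\n', (size_t)(end - (ptr + 1))))
{
++count;
}
}
fclose(f);
printf("%d\n", count);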
return 0;
}