easy way to parse a text file? - c

I'm making a load balancer (a very simple one). It looks at how long the user has been idle, and the load on the system to determine if a process can run, and it goes through processes in a round-robin fashion.
All of the data needed to control the processes are stored in a text file.
The file might look like this:
PID=4390 IDLE=0.000000 BUSY=2.000000 USER=2.000000
PID=4397 IDLE=3.000000 BUSY=1.500000 USER=4.000000
PID=4405 IDLE=0.000000 BUSY=2.000000 USER=2.000000
PID=4412 IDLE=0.000000 BUSY=2.000000 USER=2.000000
PID=4420 IDLE=3.000000 BUSY=1.500000 USER=4.000000
This is a university assignment, however parsing the text file isn't supposed to be a big part of it, which means I can use whatever way is the quickest for me to implement.
Entries in this file will be added and removed as processes finish or are added under control.
Any ideas on how to parse this?
Thanks.

Here is a code that will parse your file, and also account for the fact that your file might be unavailable (that is, fopen might fail), or being written while you read it (that is, fscanf might fail). Note that infinite loop, which you might not want to use (that's more pseudo-code than actual code to be copy-pasted in your project, I didn't try to run it). Note also that it might be quite slow given the duration of the sleep there: you might want to use a more advanced approach, that's more sort of a hack.
int pid;
float idle, busy, user;
FILE* fid;
fpos_t pos;
int pos_init = 0;
while (1)
{
// try to open the file
if ((fid = fopen("myfile.txt","rw+")) == NULL)
{
sleep(1); // sleep for a little while, and try again
continue;
}
// reset position in file (if initialized)
if (pos_init)
fsetpos (pFile,&pos);
// read as many line as you can
while (!feof(fid))
{
if (fscanf(fid,"PID=%d IDLE=%f BUSY=%f USER=%f",&pid, &idle, &busy, &user))
{
// found a line that does match this pattern: try again later, the file might be currently written
break;
}
// add here your code processing data
fgetpos (pFile,&pos); // remember current position
pos_init = 1; // position has been initialized
}
fclose(fid);
}

As far as just parsing is concerned, something like this in a loop:
int pid;
float idle, busy, user;
if(fscanf(inputStream, "PID=%d IDLE=%f BUSY=%f USER=%f", %pid, &idle, &busy, &user)!=4)
{
/* handle the error */
}
But as #Blrfl pointed out, the big problem is to avoid mixups when your application is reading the file and the others are writing to it. To solve this problem you should use a lock or something like that; see e.g. the flock syscall.

Use fscanf in a loop. Here's a GNU C tutorial on using fscanf.
/* fscanf example */
#include <stdio.h>
typedef struct lbCfgData {
int pid;
double idle;
double busy;
double user;
} lbCfgData_t ;
int main ()
{
// PID=4390 IDLE=0.000000 BUSY=2.000000 USER=2.000000
lbCfgData_t cfgData[128];
FILE *f;
f = fopen ("myfile.txt","rw+");
for ( int i = 0;
i != 128 // Make sure we don't overflow the array
&& fscanf(f, "PID=%u IDLE=%f BUSY=%f USER=%f", &cfgData[i].pid,
&cfgData[i].idle, &cfgData[i].busy, cfgData[i].user ) != EOF;
i++
);
fclose (f);
return 0;
}

Related

Why the fread() function is not reaching the end-of-file and the loop continues infinitely, while reading structures from a file?

The code snippet given below is a part of my Customer Billing System Project which I'm writing in C. Here I'm taking input of the Customer Data in a Customer structure and saving the entire structure in "CustomerRecord.dat" file using the writefile() function as defined in my code. After successfully writing the Customer data in my file, I'm trying to traverse the file and assign Customer numbers to each record. For doing this I'm using the setCustomerNo() funtion as defined in my code.
The problem is the while loop inside the setCustomerNo() funtion is running infinitely and reading even after the end of file.
Please help me solving this issue.
I've tried replacing:
while(!feof(fp) { // LOOP BODY }
while(!feof(fp) && !ferror(fp)) { // LOOP BODY }
while(fread(&x, sizeof(Customer), 1, fp) || !feof(fp)) { // LOOP BODY }
But none of these above replacements worked for me.
#define RECORD CustomerRecord.dat
typedef struct
{
int customerNo;
unsigned int phoneNo;
char name[20];
char address[32];
float balance;
}Customer;
void writefile(Customer x)
{
FILE *fp;
fp = fopen("CustomerRecord.dat", "ab");
fwrite(&x, sizeof(Customer), 1, fp);
fclose(fp);
}
Customer setCustomerNo()
{
FILE *fp;
Customer x, y;
int k = 1;
fp = fopen(RECORD, "rb+");
while(!feof(fp) && !ferror(fp))
{
fread(&x, sizeof(Customer), 1, fp);
x.customerNo = k++;
y = x;
fseek(fp, -sizeof(Customer), SEEK_CUR);
fwrite(&x, sizeof(Customer), 1, fp);
}
fclose(fp);
return y;
}
The expected result is that the function setCustomerNo() will read all the structures stored in file from the beginning and update customerNo(s) starting from 1 and so on to the structures stored in the file. At the end return the last updated structure.
You definitely don't want your loop to test feof(fp) -- indeed, that is practically never right -- because feof(fp) will always be false at the end if the loop, so the loop will never end. Why? The fseek manpage spells it out:
A successful call to the fseek() function clears the end-of-file indicator for the stream.
Anyway, if you think about it, you want to terminate the loop exactly when fread reports that it got no data. You certainly don't want to process the non-existent data that fread didn't read. So testing whether there was an EOF after you handle the possibly unread data can't be right, even if feof(fp) actually returned a useful value.
A couple of other notes:
You're better off using ftell to recall the read cursor before the read, and then fseeking to the saved point. That avoids a bunch of corner cases, and is easily upgraded to use the more general fseeko/ftello functions, which work properly with large files. (Or you could just go directly to using fseeko and ftello.
Think about what you want to do if the last read is partial. Of course, that "can't happen" if you always create fixed-length records. But everything that "can't happen" eventually does, for one reason or another. Best to deal with it.

Why is the data write not reflected to the file using fprintf file stream

This is my program:
#include <stdio.h>
int main() {
FILE *logh;
logh = fopen("/home/user1/data.txt", "a+");
if (logh == NULL)
{
printf("error creating file \n");
return -1;
}
// write some data to the log handle and check if it gets written..
int result = fprintf(logh, "this is some test data \n");
if (result > 0)
printf("write successful \n");
else
printf("couldn't write the data to filesystem \n");
while (1) {
};
fclose(logh);
return 0;
}
When i run this program, i see that the file is getting created but it does not contain any data. what i understand i that there is data caching in memory before the data is actually written to the filesystem to avoid multiple IOs to increase performance. and I also know that i can call fsync/fdatasync inside the program to force a sync. but can i force the sync from outside without having to change the program?
I tried running sync command from Linux shell but it does not make the data to appear on the file. :(
Please help if anybody knows any alternative to do the same.
One useful information: I was researching some more on this and finally found this, to remove internal buffering altogether, the FILE mode can be set to _IONBF using int setvbuf(FILE *stream, char *buf, int mode, size_t size)
The IO functions usingFILE pointers cache the data to be written in an internal buffer within the program's memory until they decide to perform a system call to 'really' write it (which is for normal files usually when the size of the data cached reaches BUFSIZ).
Until then, there is no way to force writing from outside the progam.
The problem is that your program does not close the file because of your while statement. Remove these lines:
while (1) {
};
If the intent is to wait forever, then close the file with fclose before executing the while statement.

displaying contents of a file on monitor in C

I'm trying to recreate a program I saw in class.
The teacher made a file with 10 lines, he showed us that the file was indeed created, and then he displayed its contents.
My code doesn't work for some reason, it just prints what looks like a"=" a million times and then exits.
My code:
void main()
{
FILE* f1;
char c;
int i;
f1=fopen("Essay 4.txt","w");
for(i=0;i<10;i++)
fprintf(f1," This essay deserves a 100!\n");
do
{
c=getc(f1);
putchar(c);
}while(c!=EOF);
}
What is the problem? as far as I can see I did exactly what was in the example given.
The flow is as such:
You create a file (reset it to an empty file if it exists). That's what the "w" mode does.
Then you write stuff. Note that the file position is always considered to be at the very end, as writing moves the file position.
Now you try to read from the end. The very first thing you read would be an EOF already. Indeed, when I try your program on my Mac, I just get a single strange character just as one would expect from the fact that you're using a do { } while. I suggest you instead do something like: for (c = getc(f1); c != EOF; c = getc(f1)) { putchar(c) } or similar loop.
But also, your reading should fail anyway because the file mode is "w" (write only) instead of "w+".
So you need to do two things:
Use file mode "w+".
Reset the file position to the beginning of the file after writing to it: fseek(f1, 0, SEEK_SET);.

File IO does not appear to be reading correctly

Disclaimer: this is for an assignment. I am not asking for explicit code. Rather, I only ask for enough help that I may understand my problem and correct it myself.
I am attempting to recreate the Unix ar utility as per a homework assignment. The majority of this assignment deals with file IO in C, and other parts deal with system calls, etc..
In this instance, I intend to create a simple listing of all the files within the archive. I have not gotten far, as you may notice. The plan is relatively simple: read each file header from an archive file and print only the value held in ar_hdr.ar_name. The rest of the fields will be skipped over via fseek(), including the file data, until another file is reached, at which point the process begins again. If EOF is reached, the function simply terminates.
I have little experience with file IO, so I am already at a disadvantage with this assignment. I have done my best to research proper ways of achieving my goals, and I believe I have implemented them to the best of my ability. That said, there appears to be something wrong with my implementation. The data from the archive file does not seem to be read, or at least stored as a variable. Here's my code:
struct ar_hdr
{
char ar_name[16]; /* name */
char ar_date[12]; /* modification time */
char ar_uid[6]; /* user id */
char ar_gid[6]; /* group id */
char ar_mode[8]; /* octal file permissions */
char ar_size[10]; /* size in bytes */
};
void table()
{
FILE *stream;
char str[sizeof(struct ar_hdr)];
struct ar_hdr temp;
stream = fopen("archive.txt", "r");
if (stream == 0)
{
perror("error");
exit(0);
}
while (fgets(str, sizeof(str), stream) != NULL)
{
fscanf(stream, "%[^\t]", temp.ar_name);
printf("%s\n", temp.ar_name);
}
if (feof(stream))
{
// hit end of file
printf("End of file reached\n");
}
else
{
// other error interrupted the read
printf("Error: feed interrupted unexpectedly\n");
}
fclose(stream);
}
At this point, I only want to be able to read the data correctly. I will work on seeking the next file after that has been finished. I would like to reiterate my point, however, that I'm not asking for explicit code - I need to learn this stuff and having someone provide me with working code won't do that.
You've defined a char buffer named str to hold your data, but you are accessing it from a separate memory ar_hdr structure named temp. As well, you are reading binary data as a string which will break because of embedded nulls.
You need to read as binary data and either change temp to be a pointer to str or read directly into temp using something like:
ret=fread(&temp,sizeof(temp),1,stream);
(look at the doco for fread - my C is too rusty to be sure of that). Make sure you check and use the return value.

Asynchronous io in c using windows API: which method to use and why does my code execute synchronous?

I have a C application which generates a lot of output and for which speed is critical. The program is basically a loop over a large (8-12GB) binary input file which must be read sequentially. In each iteration the read bytes are processed and output is generated and written to multiple files, but never to multiple files at the same time. So if you are at the point where output is generated and there are 4 output files you write to either file 0 or 1 or 2, or 3. At the end of the iteration I now write the output using fwrite(), thereby waiting for the write operation to finish. The total number of output operations is large, up to 4 million per file, and output size of files ranges from 100mb to 3.5GB. The program runs on a basic multicore processor.
I want to write output in a separate thread and I know this can be done with
Asyncronous I/O
Creating threads
I/O completion ports
I have 2 type of questions, namely conceptual and code specific.
Conceptual Question
What would be the best approach. Note that the application should be portable to Linux, however, I don't see how that would be very important for my choice for 1-3, since I would write a wrapper around anything kernel/API specific. For me the most important criteria is speed. I have read that option 1 is not that likely to increase the performance of the program and that the kernel in any case creates new threads for the i/o operation, so then why not use option (2) immediately with the advantage that it seems easier to program (also since I did not succeed with option (1), see code issues below).
Note that I read https://stackoverflow.com/questions/3689759/how-can-i-run-a-specific-function-of-thread-asynchronously-in-c-c, but I dont see a motivation on what to use based on the nature of the application. So I hope somebody could provide me with some advice what would be best in my situation. Also from the book "Windows System Programming" by Johnson M. Hart, I know that the recommendation is using threads, mainly because of the simplicity. However, will it also be fastest?
Code Question
This question involves the attempts I made so far to make asynchronous I/O work. I understand that its a big piece of code so that its not that easy to look into. In any case I would really appreciate any attempt.
To decrease execution time I try to write the output by means of a new thread using WINAPI via CreateFile() with FILE_FLAGGED_OVERLAP with an overlapped structure. I have created a sample program in which I try to get this to work. However, I encountered 2 problems:
The file is only opened in overlapped mode when I delete an already existing file (I have tried using CreateFile in different modes (CREATE_ALWAYS, CREATE_NEW, OPEN_EXISTING), but this does not help).
Only the first WriteFile is executed asynchronously. The remainder of WriteFile commands is synchronous. For this problem I already consulted http://support.microsoft.com/kb/156932. It seems that the problem I have is related to the fact that "any write operation to a file that extends its length will be synchronous". I've already tried to solve this by increasing file size/valid data size (commented region in code). However, I still do not get it to work. I'm aware of the fact that it could be the case that to get most out of asynchronous io i should CreateFile with FILE_FLAG_NO_BUFFERING, however I cannot get this to work as well.
Please note that the program creates a file of about 120mb in the path of execution. Also note that print statements "not ok" are not desireable, I would like to see "can do work in background" appear on my screen... What goes wrong here?
#include <windows.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define ASYNC // remove this definition to run synchronously (i.e. using fwrite)
#ifdef ASYNC
struct _OVERLAPPED *pOverlapped;
HANDLE *pEventH;
HANDLE *pFile;
#else
FILE *pFile;
#endif
#define DIM_X 100
#define DIM_Y 150000
#define _PRINTERROR(msgs)\
{printf("file: %s, line: %d, %s",__FILE__,__LINE__,msgs);\
fflush(stdout);\
return 0;} \
#define _PRINTF(msgs)\
{printf(msgs);\
fflush(stdout);} \
#define _START_TIMER \
time_t time1,time2; \
clock_t clock1; \
time(&time1); \
printf("start time: %s",ctime(&time1)); \
fflush(stdout);
#define _END_TIMER\
time(&time2);\
clock1 = clock();\
printf("end time: %s",ctime(&time2));\
printf("elapsed processor time: %.2f\n",(((float)clock1)/CLOCKS_PER_SEC));\
fflush(stdout);
double aio_dat[DIM_Y] = {0};
double do_compute(double A,double B, int arr_len);
int main()
{
_START_TIMER;
const char *pName = "test1.bin";
DWORD dwBytesToWrite;
BOOL bErrorFlag = FALSE;
int j=0;
int i=0;
int fOverlapped=0;
#ifdef ASYNC
// create / open the file
pFile=CreateFile(pName,
GENERIC_WRITE, // open for writing
0, // share write access
NULL, // default security
CREATE_ALWAYS, // create new/overwrite existing
FILE_FLAG_OVERLAPPED, // | FILE_FLAG_NO_BUFFERING, // overlapped file
NULL); // no attr. template
// check whether file opening was ok
if(pFile==INVALID_HANDLE_VALUE){
printf("%x\n",GetLastError());
_PRINTERROR("file not opened properly\n");
}
// make the overlapped structure
pOverlapped = calloc(1,sizeof(struct _OVERLAPPED));
pOverlapped->Offset = 0;
pOverlapped->OffsetHigh = 0;
// put event handle in overlapped structure
if(!(pOverlapped->hEvent = CreateEvent(NULL,TRUE,FALSE,NULL))){
printf("%x\n",GetLastError());
_PRINTERROR("error in createevent\n");
}
#else
pFile = fopen(pName,"wb");
#endif
// create some output
for(j=0;j<DIM_Y;j++){
aio_dat[j] = do_compute(i, j, DIM_X);
}
// determine how many bytes should be written
dwBytesToWrite = (DWORD)sizeof(aio_dat);
for(i=0;i<DIM_X;i++){ // do this DIM_X times
#ifdef ASYNC
//if(i>0){
//SetFilePointer(pFile,dwBytesToWrite,NULL,FILE_CURRENT);
//if(!(SetEndOfFile(pFile))){
// printf("%i\n",pFile);
// _PRINTERROR("error in set end of file\n");
//}
//SetFilePointer(pFile,-dwBytesToWrite,NULL,FILE_CURRENT);
//}
// write the bytes
if(!(bErrorFlag = WriteFile(pFile,aio_dat,dwBytesToWrite,NULL,pOverlapped))){
// check whether io pending or some other error
if(GetLastError()!=ERROR_IO_PENDING){
printf("lasterror: %x\n",GetLastError());
_PRINTERROR("error while writing file\n");
}
else{
fOverlapped=1;
}
}
else{
// if you get here output got immediately written; bad!
fOverlapped=0;
}
if(fOverlapped){
// do background, this msgs is what I want to see
for(j=0;j<DIM_Y;j++){
aio_dat[j] = do_compute(i, j, DIM_X);
}
for(j=0;j<DIM_Y;j++){
aio_dat[j] = do_compute(i, j, DIM_X);
}
_PRINTF("can do work in background\n");
}
else{
// not overlapped, this message is bad
_PRINTF("not ok\n");
}
// wait to continue
if((WaitForSingleObject(pOverlapped->hEvent,INFINITE))!=WAIT_OBJECT_0){
_PRINTERROR("waiting did not succeed\n");
}
// reset event structure
if(!(ResetEvent(pOverlapped->hEvent))){
printf("%x\n",GetLastError());
_PRINTERROR("error in resetevent\n");
}
pOverlapped->Offset+=dwBytesToWrite;
#else
fwrite(aio_dat,sizeof(double),DIM_Y,pFile);
for(j=0;j<DIM_Y;j++){
aio_dat[j] = do_compute(i, j, DIM_X);
}
for(j=0;j<DIM_Y;j++){
aio_dat[j] = do_compute(i, j, DIM_X);
}
#endif
}
#ifdef ASYNC
CloseHandle(pFile);
free(pOverlapped);
#else
fclose(pFile);
#endif
_END_TIMER;
return 1;
}
double do_compute(double A,double B, int arr_len)
{
int i;
double res = 0;
double *xA = malloc(arr_len * sizeof(double));
double *xB = malloc(arr_len * sizeof(double));
if ( !xA || !xB )
abort();
for (i = 0; i < arr_len; i++) {
xA[i] = sin(A);
xB[i] = cos(B);
res = res + xA[i]*xA[i];
}
free(xA);
free(xB);
return res;
}
Useful links
http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/cref_cls/common/cppref_asynchioC_aio_read_write_eg.htm
http://www.ibm.com/developerworks/linux/library/l-async/?ca=dgr-lnxw02aUsingPOISIXAIOAPI
http://www.flounder.com/asynchexplorer.htm#Asynchronous%20I/O
I know this is a big question and I would like to thank everybody in advance who takes the trouble reading it and perhaps even respond!
You should be able to get this to work using the OVERLAPPED structure.
You're on the right track: the system is preventing you from writing asynchronously because every WriteFile extends the size of the file. However, you're doing the file size extension wrong. Simply calling SetFileSize will not actually reserve space in the MFT. Use the SetFileValidData function. This will allocate clusters for your file (note that they will contain whatever garbage the disk had there) and you should be able to execute WriteFile and your computation in parallel.
I would stay away from FILE_FLAG_NO_BUFFERING. You're after more performance with parallelism I presume? Don't prevent the cache from doing its job.
Another option that you did not consider is a memory mapped file. Those are available on Windows and Linux. There is a handy Boost abstraction that you could use.
With a memory mapped file, every thread in your process could write its output to the file on its own time, assuming that the record sizes are known and each thread has its own output area.
The operating system will take care of writing the mapped pages to disk when needed or when it gets around to it or when you close the file. Maybe when you close the file. Now that I think about it, some operating systems may require that you call msync to guarantee it.
I don't see why you would want to write asynchronously. Doing things in parallel does not make them faster in all cases. If you write two file at the same time to the same disk, it will almost always be a lot faster. If that is the case, just write them one after another.
If you have some fancy drive like SSD or a virtual RAM drive, parallel writing could be faster. You have to create an file with at full size and then do your parallel magic.
Asynchronous writing is nice, but is done by any OS anyway. The potential gain for you is that you can do other things than writing to disk like displaying a progress bar. This is where multi-threading can help you.
So imho you should use serial writing or parallel writing to multiple disks.
hth

Resources