C - Inaccurate write() position obtained with lseek() when using multiple threads - c

I'm having a problem getting the correct file position at which I'm writing when simultaneously writing to different parts of the same file using multiple threads.
I have one global file descriptor to the file. In my writing function, I
first lock a mutex, then do lseek(global_fd, 0, SEEK_CUR) to get the current file
position. I next write 31 zero bytes (31 is my entry size) using write(), in effect to reserve space for later. I then unlock the mutex.
Later in the function, I declare a local fd variable to the same file, and open
it. I now do an lseek on that local fd to get to the position I learned from
earlier, where my space is reserved. Finally, I write() 31 data bytes there for
the entry, and close the local fd.
The issue seems to be that rarely, an entry doesn't get written to the expected location (it's not mangled data - it seems that either it is swapped with a different entry, or two entries were written to the same location). There are multiple threads running that
"writing function" I described.
I since learned that pwrite() can be used to write to a specific offset, which would be more efficient, and eliminate the lseek(). However, I first want to find out: what is wrong with my original algorithm? Is there any type of buffering that could be causing the discrepancy between the expected write location, and where the data actually ends up getting stored in the file?
The relevant code snippet is below. The reason this is an issue is that in a second data file, I record the location where the entry I'm writing will be stored. If that location, based on the lseek() before the write, is not accurate, my data doesn't match up properly -- which is what happens on occasion (it's hard to reproduce - it happens in maybe 1 in 100k writes). Thanks!
db_entry_add(...)
{
char dbrecord[DB_ENTRY_SIZE];
int retval;
pthread_mutex_lock(&db_mutex);
/* determine the EOF index, at which we will add the log entry */
off_t ndb_offset = lseek(cfg.curr_fd, 0, SEEK_CUR);
if (ndb_offset == -1)
{
fprintf(stderr, "Unable to determine ndb offset: %s\n", strerror_s(errno, ebuf, sizeof(ebuf)));
pthread_mutex_unlock(&db_mutex);
return 0;
}
/* reserve entry-size bytes at the location, at which we will
later add the log entry */
memset(dbrecord, 0, sizeof(dbrecord));
/* note: db_write() is a write() loop */
if (db_write(cfg.curr_fd, (char *) &dbrecord, DB_ENTRY_SIZE) < 0)
{
fprintf(stderr, "db_entry_add2db - db_write failed!");
close(curr_fd);
pthread_mutex_unlock(&db_mutex);
return 0;
}
pthread_mutex_unlock(&db_mutex);
/* in another data file, we now record that the entry we're going to write
will be at the specified location. if it's not (which is the problem,
on rare occasion), our data will be inconsistent */
advertise_entry_location(ndb_offset);
...
/* open the data file */
int write_fd = open(path, O_CREAT|O_LARGEFILE|O_WRONLY, 0644);
if (write_fd < 0)
{
fprintf(stderr, "%s: Unable to open file %s: %s\n", __func__, cfg.curr_silo_db_path, strerror_s(errno, ebuf, sizeof(ebuf)));
return 0;
}
pthread_mutex_lock(&db_mutex);
/* seek to our reserved write location */
if (lseek(write_fd, ndb_offset, SEEK_SET) == -1)
{
fprintf(stderr, "%s: lseek failed: %s\n", __func__, strerror_s(errno, ebuf, sizeof(ebuf)));
close(write_fd);
return 0;
}
pthread_mutex_unlock(&db_mutex);
/* write the entry */
/* note: db_write_with_mutex is a write() loop wrapped with db_mutex lock and unlock */
if (db_write_with_mutex(write_fd, (char *) &dbrecord, DB_ENTRY_SIZE) < 0)
{
fprintf(stderr, "db_entry_add2db - db_write failed!");
close(write_fd);
return 0;
}
/* close the data file */
close(write_fd);
return 1;
}
One more note, for completeness. I have a similar but simpler routine that could also be causing the problem. This one uses buffered output (FILE*, fopen, fwrite), but performs an fflush() at the end of each write. It writes to a different file than the earlier routine, but could cause the same symptom.
pthread_mutex_lock(&data_mutex);
/* determine the offset at which the data will be written. this has to be accurate,
otherwise it could be causing the problem */
offset = ftell(current_fp);
fwrite(data);
fflush(current_fp);
pthread_mutex_unlock(&data_mutex);

There seem to be several places where things could go wrong. I would make the following changes: (1) be consistent and use the same I/O library as per bdonlan's suggestion, (2) make the lseek() and the writes an atomic action guarded by a mutex so that only a single thread at a time can do those actions of adding to both files. SEEK_CUR does a seek based on the current location of the file offset pointer so would you not want SEEK_END to seek to the end of the file in order to append there? Then if you are modifying a particular section of the file you would use SEEK_SET to reposition to the location you want to write to. And you would want to do this in a mutex guarded section so as to allow only a single thread to do the file positioning and file update.

If you're using your 'simpler routine' at the same time, this could indeed be a problem. If these are separate file descriptors, there's nothing to ensure that they're both pointing at the end of the file at all times (unless you use append mode, however I'm not sure what the semantics around ftell for append mode are). If they're the same fd (ie, you have a raw fd and a FILE * pointing to the same place), you might have problems with the standard library getting confused about where you are in a file, when you use write() to bypass it.

Related

Is it recommended method for computing the size of a file using fseek()?

In C, we can find the size of file using fseek() function. Like,
if (fseek(fp, 0L, SEEK_END) != 0)
{
// Handle repositioning error
}
So, I have a question, Is it recommended method for computing the size of a file using fseek() and ftell()?
If you're on Linux or some other UNIX like system, what you want is the stat function:
struct stat statbuf;
int rval;
rval = stat(path_to_file, &statbuf);
if (rval == -1) {
perror("stat failed");
} else {
printf("file size = %lld\n", (long long)statbuf.st_size;
}
On Windows under MSVC, you can use _stati64:
struct _stati64 statbuf;
int rval;
rval = _stati64(path_to_file, &statbuf);
if (rval == -1) {
perror("_stati64 failed");
} else {
printf("file size = %lld\n", (long long)statbuf.st_size;
}
Unlike using fseek, this method doesn't involve opening the file or seeking through it. It just reads the file metadata.
The fseek()/ftell() works sometimes.
if (fseek(fp, 0L, SEEK_END) != 0)
printf("Size: %ld\n", ftell(fp));
}
Problems.
If the file size exceeds about LONG_MAX, long int ftell(FILE *stream) response is problematic.
If the file is opened in text mode, the return value from ftell() may not correspond to the file length. "For a text stream, its file position indicator contains unspecified information," C11dr ยง7.21.9.4 2
If the file is opened in binary mode, fseek(fp, 0L, SEEK_END) is not well defined. "Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream (because of possible trailing null characters) or for any stream with state-dependent encoding that does not assuredly end in the initial shift state." C11dr footnote 268. #Evert This most often applies to earlier platforms than today, but it is still part of the spec.
If the file is a stream like a serial input or stdin, fseek(file, 0, SEEK_END) makes little sense.
The usual solution to finding file size is a non-portable platform specific one. Example good answer #dbush.
Note: If code attempts to allocate memory based on file size, the memory available can easily be exceeded by the file size.
Due to these issues, I do not recommend this approach.
Typically the problem should be re-worked to not need to find the file size, but to grow the data as more input is processed.
LL disclaimer: Note that C spec footnotes are informative and so not necessarily normative.
The best method in my opinion is fstat(): https://linux.die.net/man/2/fstat
Well, you can estimate the size of a file in several ways:
You can read(2) the file from the beginning to the end, and the number or chars read is the size of the file. This is a tedious way of getting the size of a file, as you have to read the whole file to get the size. But if the operating system doesn't allow to position the file pointer arbitrarily, then this is the only way to get the file size.
Or you can move the pointer at the end of file position. This is the lseek(2) you showed in the question, but be careful that you have to do the system call twice, as the value returned is the actual position before moving the pointer to the desired place.
Or you can use the stat(2) system call, that will tell you all the administrative information of the file, like the owner, group, permissions, size, number of blocks the file occupies in the disk, disk this file belongs to, number of directory entries pointing to it, etc. This allows you to get all this information with only one syscall.
Other methods you point (like the use of the ftell(3) stdio library call) will work also (with the same problem that it results in two system calls to set and retrieve/restore the file pointer) but have the problem of involving libraries that probably you are not using for anything else. It should be complicated to get a FILE * pointer (e.g. fdopen(3)) on a int file descriptor, just to be able to use the ftell(3) function on it (twice), and then fclose(3) it again.

how to read and write from .dat file in verifone

I want to read and write a text or .dat file in verifone to store data on it.
How can I make it ?
here is my code
int main()
{
char buf [255];
FILE *tst;
int dsply = open(DEV_CONSOLE , 0);
tst = fopen("test.txt","r+");
fputs("this text should write in file.",tst);
fgets(buf,30,tst);
write(dsply,buf,strlen(buf));
return 0;
}
Chapter 3 of the "Programmers Manual for Vx Solutions" ("23230_Verix_V_Operating_System_Programmers_Manual.pdf") is all about file management and contains all the functions I typically use when dealing with data files on the terminal. Go read through that and I think you'll find everything you need.
To get you started, you'll want to use open() together with the flags you want
O_RDONLY (read only)
O_WRONLY (write only)
O_RDWR (read and write)
O_APPEND (Opens with the file position pointer at the end of the file)
O_CREAT (create the file if it doesn't already exist),
O_TRUNC (truncate/delete previous contents if the file already exists),
O_EXCL (Returns error value if the file already exists)
On success, open will return a positive integer that is a handle which can be used for subsequent access to the file. On failure, it returns -1;
When the file is open, you can use read() and write() to manipulate the contents.
Be sure to call close() and pass in the return value from open when you are done with the file.
Your example above would look something like this:
int main()
{
char buf [255];
int tst;
int dsply = open(DEV_CONSOLE , 0);
//next we will open the file. We will want to read and write, so we use
// O_RDWR. If the files does not already exist, we want to create it, so
// we use O_CREAT. If the file *DOES* already exist, we want to truncate
// and start fresh, so we delete all previous contents with O_TRUNC
tst = open("test.txt", O_RDWR | O_CREAT | O_TRUNC);
// always check the return value.
if(tst < 0)
{
write(dsply, "ERROR!", 6);
return 0;
}
strcpy(buf, "this text should write in file.")
write(tst, buf, strlen(buf));
memset(buf, 0, sizeof(buf));
read(tst, buf, 30);
//be sure to close when you are done
close(tst);
write(dsply,buf,strlen(buf));
//you'll want to close your devices, as well
close(dsply);
return 0;
}
Your comments also ask about searching. For that, you'll also need to use lseek with one of the following which specifies where you are starting from:
SEEK_SET โ€” Beginning of file
SEEK_CUR โ€” Current seek pointer location
SEEK_END โ€” End of file
example
SomeDataStruct myData;
...
//assume "curPosition" is set to the beginning of the next data structure I want to read
lseek(file, curPosition, SEEK_SET);
result = read(file, (char*)&myData, sizeof(SomeDataStruct));
curPosition += sizeof(SomeDataStruct);
//now "curPosition" is ready to pull out the next data structure.
NOTE that the internal file pointer is already AT "curPosition", but doing it this way allows me to move forward and backward at will as I manipulate what is there. So, for example, if I wanted to move back to the previous data structure, I would simply set "curPosition" as follows:
curPosition -= 2 * sizeof(SomeDataStruct);
If I didn't want to keep track of "curPosition", I could also do the following which would also move the internal file pointer to the correct place:
lseek(file, - (2 * sizeof(SomeDataStruct)), SEEK_CUR);
You get to pick whichever method works best for you.

File IO does not appear to be reading correctly

Disclaimer: this is for an assignment. I am not asking for explicit code. Rather, I only ask for enough help that I may understand my problem and correct it myself.
I am attempting to recreate the Unix ar utility as per a homework assignment. The majority of this assignment deals with file IO in C, and other parts deal with system calls, etc..
In this instance, I intend to create a simple listing of all the files within the archive. I have not gotten far, as you may notice. The plan is relatively simple: read each file header from an archive file and print only the value held in ar_hdr.ar_name. The rest of the fields will be skipped over via fseek(), including the file data, until another file is reached, at which point the process begins again. If EOF is reached, the function simply terminates.
I have little experience with file IO, so I am already at a disadvantage with this assignment. I have done my best to research proper ways of achieving my goals, and I believe I have implemented them to the best of my ability. That said, there appears to be something wrong with my implementation. The data from the archive file does not seem to be read, or at least stored as a variable. Here's my code:
struct ar_hdr
{
char ar_name[16]; /* name */
char ar_date[12]; /* modification time */
char ar_uid[6]; /* user id */
char ar_gid[6]; /* group id */
char ar_mode[8]; /* octal file permissions */
char ar_size[10]; /* size in bytes */
};
void table()
{
FILE *stream;
char str[sizeof(struct ar_hdr)];
struct ar_hdr temp;
stream = fopen("archive.txt", "r");
if (stream == 0)
{
perror("error");
exit(0);
}
while (fgets(str, sizeof(str), stream) != NULL)
{
fscanf(stream, "%[^\t]", temp.ar_name);
printf("%s\n", temp.ar_name);
}
if (feof(stream))
{
// hit end of file
printf("End of file reached\n");
}
else
{
// other error interrupted the read
printf("Error: feed interrupted unexpectedly\n");
}
fclose(stream);
}
At this point, I only want to be able to read the data correctly. I will work on seeking the next file after that has been finished. I would like to reiterate my point, however, that I'm not asking for explicit code - I need to learn this stuff and having someone provide me with working code won't do that.
You've defined a char buffer named str to hold your data, but you are accessing it from a separate memory ar_hdr structure named temp. As well, you are reading binary data as a string which will break because of embedded nulls.
You need to read as binary data and either change temp to be a pointer to str or read directly into temp using something like:
ret=fread(&temp,sizeof(temp),1,stream);
(look at the doco for fread - my C is too rusty to be sure of that). Make sure you check and use the return value.

in append/update mode is a call to a file positioning function still required?

After opening a file in append update mode, is it necessary to execute a file positioning statement before each write to the file?
FILE *h;
int ch;
if ((h = fopen("data", "a+")) == NULL) exit(1);
if (fseek(h, 0 SEEK_SET)) exit(2);
ch = fgetc(h); /* read very first character */
if (ch == EOF) exit(3);
/* redundant? mandatory? */
fseek(h, 0, SEEK_END); /* call file positioning before output */
/* add 1st character to the end of file on a single line*/
fprintf(h, "%c\n", ch);
The C11 Standard says:
7.21.5.3/6 ... all subsequent writes to the file to be forced to the then current end-of-file ...
and
7.21.5.3/7 ... input shall not be directly followed by output without an
intervening call to a file positioning function ...
I take it the shall in 7.21.5.3/7 is stronger than the description in 7.21.5.3/6.
Probably not redundant in portable C. While the underlying file descriptor will always append (at least on Unix), the point of the fseek/fflush requirement is to get rid of the input buffer before writing to the output, so that the same buffer can be used for reading and writing. AFAIK you're not even required to seek to end of file, you can seek anywhere, as long as you seek.
The second description is stronger than the first, but that is to be expected. The first only states that all writes go to EOF, i.e. that there's no way to write anywhere else. The second establishes the rule that switching from reading to writing must be accompanied by a flush or seek, to ensure that read and write aspects of the buffer don't get mixed up.

Using MapViewOfFile, pointer eventually walks out of memory space

All,
I'm using MapViewOfFile to hold part of a file in memory. There is a stream that points to this file and writes to it, and then is rewound. I use the pointer to the beginning of the mapped file, and read until I get to the null char I write as the final character.
int fd;
yyout = tmpfile();
fd = fileno(yyout);
#ifdef WIN32
HANDLE fm;
HANDLE h = (HANDLE) _get_osfhandle (fd);
fm = CreateFileMapping(
h,
NULL,
PAGE_READWRITE|SEC_RESERVE,
0,
4096,
NULL);
if (fm == NULL) {
fprintf (stderr, "%s: Couldn't access memory space! %s\n", argv[0], strerror (GetLastError()));
exit(GetLastError());
}
bp = (char*)MapViewOfFile(
fm,
FILE_MAP_ALL_ACCESS,
0,
0,
0);
if (bp == NULL) {
fprintf (stderr, "%s: Couldn't fill memory space! %s\n", argv[0], strerror (GetLastError()));
exit(GetLastError());
}
Data is sent to the yyout stream, until flushData() is called. This writes a null to the stream, flushes, and then rewinds the stream. Then I start from the beginning of the mapped memory, and read chars until I get to the null.
void flushData(void) {
/* write out data in the stream and reset */
fprintf(yyout, "%c%c%c", 13, 10, '\0');
fflush(yyout);
rewind(yyout);
if (faqLine == 1) {
faqLine = 0; /* don't print faq's to the data file */
}
else {
char * ps = bp;
while (*ps != '\0') {
fprintf(outstream, "%c%c", *ps, blank);
ps++;
}
fflush(outfile);
}
fflush(yyout);
rewind(yyout);
}
After flushing, more data is written to the stream, which should be set to the start of the memory area. As near as I can determine with gdb, the stream is not getting rewound, and eventually fills up the allocated space.
Since the stream points to the underlying file, this does not cause a problem initially. But, when I attempt to walk the memory, I never find the null. This leads to a SIGSEV. If you want more details of why I need this, see here.
Why am I not reusing the memory space as expected?
I think this line from the MSDN documentation for CreateFileMapping might be the clue.
A mapped file and a file that is accessed by using the input and output (I/O) functions (ReadFile and WriteFile) are not necessarily coherent.
You're not apparently using Read/WriteFile, but the documentation should be understood in terms of mapped views versus explicit I/O calls. In any case, the C RTL is surely implemented using the Win32 API.
In short, this approach is problematic.
I don't know why changing the view/file size helps; perhaps it just shifts the undefined behaviour in a direction that happens to be beneficial.
Well, after working on this for a while, I have a working solution. I don't know why this succeeds, so if someone comes up with something better, I'll be happy to accept their answer instead.
fm = CreateFileMapping(
h,
NULL,
PAGE_READWRITE|SEC_RESERVE,
0,
16384,
NULL);
As you can see, the only change is to the size declared from 4096 to 16384. Why this works when the total chars input at a time is no more than 1200, I don't know. If someone could provide details on this, I would appreciate it.
When you're done with the map, simply un-map it.
UnmapViewOfFile(bp);

Resources