C, unix and overwriting a char with write(), open() and lseek()

I need to replace a character in a text file with '?'. It's not working as expected.
The file has contents 'abc' (without quotes) and I've got to use the Unix system calls lseek(), open() and write(). I can't use the standard C file I/O functions.
The plan is to eventually expand this into a more generalised "find and replace" utility.
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
int main(){
int file = open("data", O_RDWR); //open file with contents 'abc'
lseek(file,0,0); //positions at first char at beginning of file.
char buffer;
read(file,&buffer, sizeof(buffer));
printf("%c\n", buffer); // text file containing 'abc', it prints 'a'.
if (buffer == 'a'){
char copy = '?';
write(file,&copy,1); //text file containing 'abc' puts '?' where 'b' is.
}
close(file);
}
The file "data" contains abc. I want to replace a with ? to make it ?bc, but I'm getting a?c.
read() is reading the right char, but write() is writing to the next char.
Why is this?
Been searching google for hours.
Thanks

The answer is actually embedded in your own code, in a way.
The lseek call you do right after open is not required because when you first open a file the current seek offset is zero.
After each successful read or write operation, the seek offset moves forward by the number of bytes read/written. (If you add O_APPEND to your open the seek offset also moves just before each write, to the current-end-of-file, but that's not relevant at this point.)
Since you successfully read one byte, your seek offset moves from 0 to 1. If you want to put it back to 0, you must do that manually.
(You should also check that each operation actually succeeds, of course, but I assume you left that out for brevity here.)
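To make the moving offset visible, here is a minimal sketch, assuming a seekable file descriptor. The helper names current_offset and read_byte_and_rewind are made up for illustration; only lseek() and read() are real calls. lseek(fd, 0, SEEK_CUR) reports the offset without changing it.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns the current offset of fd without moving it,
   or (off_t)-1 on error. */
off_t current_offset(int fd)
{
    return lseek(fd, 0, SEEK_CUR);
}

/* Hypothetical helper: reads one byte, then seeks back to where the read
   started, so that a following write() lands on the byte just read.
   Returns 1 on success, 0 at end of file, -1 on error. */
int read_byte_and_rewind(int fd, char *out)
{
    off_t start = current_offset(fd);
    if (start == (off_t)-1)
        return -1;

    ssize_t n = read(fd, out, 1);   /* on success the offset is now start + 1 */
    if (n <= 0)
        return (int)n;              /* 0 = end of file, -1 = error */

    if (lseek(fd, start, SEEK_SET) == (off_t)-1)   /* put the offset back */
        return -1;
    return 1;
}
```

With the file from the question, calling read_byte_and_rewind() and then write() would overwrite 'a' rather than 'b'.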

Your call to read() moves the file pointer forward one byte - i.e. from 0 to 1. Since you're using the same file descriptor ("int file = ...") for reading and writing, the position is the same for reading and writing.
To write over the byte that was just read, you need to lseek() back one byte after
(buffer == 'a')
evaluates to true.

lseek is in the wrong place. Once 'a' has been found, write() puts '?' at the current offset, which is now 1 (so it overwrites 'b'). To fix this, you need to change the current position using lseek BEFORE you write.
if (buffer == 'a'){
char copy = '?';
lseek(file,0,SEEK_SET); //repositions at the first char at the beginning of the file.
write(file,&copy,1); //'?' now overwrites 'a', so 'abc' becomes '?bc'.
}
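Since the plan is a more general "find and replace", here is a rough sketch of the same idea applied to every occurrence of a character, stepping back one byte with SEEK_CUR instead of seeking to a fixed position. The function name replace_char_in_file is hypothetical, and reading one byte per call is slow; it is kept that way only to mirror the code in the question.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: overwrites every byte equal to `from` with `to`.
   Returns the number of bytes replaced, or -1 on error. */
long replace_char_in_file(const char *path, char from, char to)
{
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;

    long replaced = 0;
    char c;
    ssize_t n;
    while ((n = read(fd, &c, 1)) == 1) {
        if (c != from)
            continue;
        /* read() advanced the offset by one; step back onto the byte
           that was just read before overwriting it. */
        if (lseek(fd, -1, SEEK_CUR) == (off_t)-1 || write(fd, &to, 1) != 1) {
            close(fd);
            return -1;
        }
        replaced++;   /* write() moved the offset forward again */
    }
    if (n == -1) {    /* read error, as opposed to end of file */
        close(fd);
        return -1;
    }
    if (close(fd) == -1)
        return -1;
    return replaced;
}
```

For the file in the question, replace_char_in_file("data", 'a', '?') should leave ?bc in the file.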

Related

does the read call in linux add a newline at EOF?

why does read() on a file in linux add a newline character at EOF even if the file really does not have a newline character ?
my file data is :
1hello2hello3hello4hello5hello6hello7hello8hello9hello10hello11hello12hello13hello14hello15hello
my read() call on this file should hit EOF after reading the last 'o' in "15hello". I use the below :
while( (n = read(fd2, src, read_size-1)) != 0) // read_size = 21
{
//... some code
printf("%s",src);
//... some code
}
where fd2 is the file's descriptor. On the last iteration, n was 17 and I had src[16] = '\n'. So, does the read call in linux add a newline at EOF?
does the read call in linux add a newline at EOF?
No.
Your input file likely has a terminating newline in it - most well-formatted text files do, so multiple files can be concatenated without lines running together.
You could also be running into a stray newline character that was already in your buffer, because read() does not terminate the data read with a NUL character to create an actual C-style string. And I'd guess your code doesn't either, else you would have posted it. Which means your
printf("%s",src);
is quite likely undefined behavior.
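A small sketch of the missing termination step, assuming the buffer has room for one extra byte. The wrapper name read_as_string is invented here; read() itself never writes a terminator.

```c
#include <unistd.h>

/* Hypothetical wrapper: reads at most cap - 1 bytes into buf and always
   NUL-terminates it, so the result is safe to print with "%s".
   Returns the number of bytes read (0 at end of file) or -1 on error. */
ssize_t read_as_string(int fd, char *buf, size_t cap)
{
    if (cap == 0)
        return -1;

    ssize_t n = read(fd, buf, cap - 1);
    if (n < 0) {
        buf[0] = '\0';
        return -1;
    }
    buf[n] = '\0';   /* read() does not do this for you */
    return n;
}
```

An alternative that needs no terminator is to pass the length explicitly: printf("%.*s", (int)n, src).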
why does read() on a file in linux add a newline character at EOF even if the file really does not have a newline character ?
No, the read() system call doesn't add a newline at the end of the file.
You are probably seeing this because you created the text file with vi; by default vi adds a trailing newline when it saves a file.
You can check this on your system by creating a one-line text file with vi and then running wc -c on it: the byte count includes the newline.
Alternatively, if you know the file size (which the stat() system call can tell you), you can read the whole file in one go and avoid the while loop.
This
while( (n = read(fd2, src, read_size-1)) != 0) {
/* some code */
}
Change to
struct stat var;
stat(filename, &var); /* check the return value of stat(); var now holds the file info */
off_t size = var.st_size;
Now that you have the size of the file, allocate a dynamic or stack array of that size and read the data from the file.
char *ptr = malloc(size + 1);
Now read all the data at once:
read(fd,ptr,size); /* ptr holds the whole file only if read() returned size; it may return fewer bytes */
Finally, once the work is done, don't forget to release ptr by calling free(ptr).
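Put together, the steps above might look like the sketch below. The function name read_whole_file is made up; it uses fstat() on the open descriptor rather than stat() on the name, and it still loops because read() is allowed to return fewer bytes than requested. It does not retry on EINTR.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns a malloc'd, NUL-terminated copy of the file
   and stores its length in *len_out. Returns NULL on error.
   The caller must free() the result. */
char *read_whole_file(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size < 0) {
        close(fd);
        return NULL;
    }

    size_t size = (size_t)st.st_size;
    char *data = malloc(size + 1);       /* +1 for the terminator */
    if (data == NULL) {
        close(fd);
        return NULL;
    }

    size_t got = 0;
    while (got < size) {
        ssize_t n = read(fd, data + got, size - got);
        if (n == -1) {
            free(data);
            close(fd);
            return NULL;
        }
        if (n == 0)                      /* file is shorter than stat said */
            break;
        got += (size_t)n;
    }
    close(fd);

    data[got] = '\0';
    *len_out = got;
    return data;
}
```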

How to rewind a file descriptor using only system calls?

I have this sample code where I'm trying to implement for my operating systems assignment a program that copies the contents of an input file to an output file. I'm only allowed to use POSIX system calls, stdio is forbidden.
I've thought about storing the contents in a buffer, but for that my implementation must know the size of the file behind the descriptor. I googled a little and found
off_t fsize;
fsize = lseek (input, 0, SEEK_END);
But in this case my file descriptor (input) gets messed up and I can't rewind it to the start. I played around with the parameters but I can't figure a way to rewind it back to the first character in the file after using lseek. That's the only thing I need, having that I can loop byte by byte and copy all the contents of input to output.
My code is here; it's very short, in case any of you want to take a look:
https://github.com/lucas-sartm/OSAssignments/blob/master/copymachine.c
I figured it out by trial and error. All that was needed was to read the documentation and take a look at read() return values... This loop solved the issue.
while (read (input, &content, sizeof(content)) > 0){ //this will copy byte by byte until end of file!
write (output, &content, sizeof(content));
}
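For the rewinding part of the question, lseek() with SEEK_SET and offset 0 moves the descriptor back to the start. Below is a sketch, assuming a regular (seekable) file; the names size_then_rewind, copy_fd and copy_file are invented for the example. The copy loop uses a block buffer and handles short writes, which the byte-by-byte loop above does not check for.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns the size of a seekable fd and leaves the
   offset at the start of the file. Returns -1 on error. */
off_t size_then_rewind(int fd)
{
    off_t size = lseek(fd, 0, SEEK_END);       /* jump to the end */
    if (size == (off_t)-1)
        return -1;
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1)   /* rewind to byte 0 */
        return -1;
    return size;
}

/* Hypothetical helper: copies everything from `in` to `out`.
   Returns 0 on success, -1 on error. */
int copy_fd(int in, int out)
{
    char buf[4096];
    ssize_t n;
    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t done = 0;
        while (done < n) {                     /* write() may be partial */
            ssize_t w = write(out, buf + done, (size_t)(n - done));
            if (w == -1)
                return -1;
            done += w;
        }
    }
    return n == 0 ? 0 : -1;
}

/* Hypothetical helper: opens both paths and copies src to dst. */
int copy_file(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    if (in == -1)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out == -1) {
        close(in);
        return -1;
    }
    int rc = copy_fd(in, out);
    close(in);
    if (close(out) == -1)
        rc = -1;
    return rc;
}
```

Note that the size is not needed for the copy itself; read() returning 0 already signals end of file.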

How to duplicate an image file? [duplicate]

I am designing an image decoder, and as a first step I tried to just copy the file using C, i.e. open the file and write its contents to a new file. Below is the code that I used.
while((c=getc(fp))!=EOF)
fprintf(fp1,"%c",c);
where fp is the source file and fp1 is the destination file.
The program executes without any error, but the image file(".bmp") is not properly copied. I have observed that the size of the copied file is less and only 20% of the image is visible, all else is black. When I tried with simple text files, the copy was complete.
Do you know what the problem is?
Make sure that the type of the variable c is int, not char. In other words, post more code.
This is because the value of the EOF constant is typically -1, and if you read characters as char-sized values, every byte that is 0xff will look like the EOF constant. With the extra bits of an int, there is room to separate the two.
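As an illustration of the point about int, here is a minimal character-by-character copy that keeps the return value of getc() in an int. The function name copy_stream is hypothetical; the streams are assumed to have been opened in binary mode by the caller.

```c
#include <stdio.h>

/* Hypothetical helper: copies `in` to `out` one byte at a time.
   Returns the number of bytes copied, or -1 on error. */
long copy_stream(FILE *in, FILE *out)
{
    long copied = 0;
    int c;   /* int, not char: a 0xFF data byte must stay distinct from EOF */

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1;
        copied++;
    }
    return ferror(in) ? -1 : copied;
}
```

With char c, on platforms where char is signed the loop would stop at the first 0xFF byte, which matches the truncated copy described in the question.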
Did you open the files in binary mode? What are you passing to fopen?
It's one of the most "popular" C gotchas.
You should use fread and fwrite, copying a block at a time
FILE *fd1 = fopen("source.bmp", "rb"); // "b": binary mode, no newline translation
FILE *fd2 = fopen("destination.bmp", "wb");
if(!fd1 || !fd2) {
// handle open error
}
size_t l1;
unsigned char buffer[8192];
//Data to be read
while((l1 = fread(buffer, 1, sizeof buffer, fd1)) > 0) {
size_t l2 = fwrite(buffer, 1, l1, fd2);
if(l2 < l1) {
if(ferror(fd2)) {
// handle error
} else {
// Handle media full
}
}
}
fclose(fd1);
fclose(fd2);
It's substantially faster to read in bigger blocks. Open both files in binary mode ("rb" and "wb"): fread/fwrite copy the bytes as they are, but on a text-mode stream \n might still get transformed to \r\n in the output (on Windows and DOS) or \r (on old Macs).

in append/update mode is a call to a file positioning function still required?

After opening a file in append update mode, is it necessary to execute a file positioning statement before each write to the file?
FILE *h;
int ch;
if ((h = fopen("data", "a+")) == NULL) exit(1);
if (fseek(h, 0, SEEK_SET)) exit(2);
ch = fgetc(h); /* read very first character */
if (ch == EOF) exit(3);
/* redundant? mandatory? */
fseek(h, 0, SEEK_END); /* call file positioning before output */
/* add 1st character to the end of file on a single line*/
fprintf(h, "%c\n", ch);
The C11 Standard says:
7.21.5.3/6 ... all subsequent writes to the file to be forced to the then current end-of-file ...
and
7.21.5.3/7 ... input shall not be directly followed by output without an
intervening call to a file positioning function ...
I take it the shall in 7.21.5.3/7 is stronger than the description in 7.21.5.3/6.
Probably not redundant in portable C. While the underlying file descriptor will always append (at least on Unix), the point of the fseek/fflush requirement is to get rid of the input buffer before writing to the output, so that the same buffer can be used for reading and writing. AFAIK you're not even required to seek to end of file, you can seek anywhere, as long as you seek.
The second description is stronger than the first, but that is to be expected. The first only states that all writes go to EOF, i.e. that there's no way to write anywhere else. The second establishes the rule that switching from reading to writing must be accompanied by a flush or seek, to ensure that read and write aspects of the buffer don't get mixed up.
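Here is the snippet from the question turned into a self-contained function, as a sketch only. The name append_first_char is made up. It keeps the positioning call between the read and the write, which is what 7.21.5.3/7 asks for.

```c
#include <stdio.h>

/* Hypothetical helper: appends the first character of the file, followed by
   a newline, to the end of the same file.
   Returns 0 on success, -1 on error (including an empty file). */
int append_first_char(const char *path)
{
    FILE *h = fopen(path, "a+");
    if (h == NULL)
        return -1;

    if (fseek(h, 0, SEEK_SET) != 0) {
        fclose(h);
        return -1;
    }
    int ch = fgetc(h);                 /* read the very first character */
    if (ch == EOF) {
        fclose(h);
        return -1;
    }

    /* Positioning call between input and output. In append mode the write
       goes to the end of the file regardless of the target chosen here. */
    if (fseek(h, 0, SEEK_END) != 0) {
        fclose(h);
        return -1;
    }
    if (fprintf(h, "%c\n", ch) < 0) {
        fclose(h);
        return -1;
    }
    return fclose(h) == 0 ? 0 : -1;
}
```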

Is there any way to create dummy file descriptor in linux?

I have opened one file with following way:
fp = fopen("some.txt","r");
Now, the first few bytes of this file (let's say 40 bytes) are unnecessary junk, so I want to get rid of them. But I cannot delete that data from the file, modify the file, or create a duplicate of it without the junk.
So I want to create another, dummy FILE pointer that points to the file, such that when I pass this dummy pointer to another function that performs the following operation:
fseek ( dummy file pointer , 0 , SEEK_SET );
the file position is set to the 40th byte of my some.txt.
But the function accepts a file descriptor, so I need to pass a file descriptor that treats the file as if those first 40 bytes were never in it.
In short, the dummy descriptor should behave as if those 40 bytes were not in the file, and all positioning operations should be relative to the 40th byte, counting it as the 1st byte.
Easy.
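#include <stdio.h>
#include <wchar.h>
#define CHAR_8_BIT (0)
#define CHAR_16_BIT (1)
#define BIT_WIDTH (CHAR_8_BIT)
#define OFFSET (40)
FILE* fp = fopen("some.txt","r");
FILE* dummy = NULL;
#if (BIT_WIDTH == CHAR_8_BIT)
long skip = OFFSET*sizeof(char);
#else
long skip = OFFSET*sizeof(wchar_t);
#endif
/* fseek returns 0 on success, not a pointer, so test it and reuse fp */
if (fp != NULL && fseek (fp, skip, SEEK_SET) == 0)
dummy = fp;
The SEEK_SET macro indicates the beginning of the file, and depending on whether you are using 8-bit characters (ASCII) or wide characters (wchar_t, e.g. for Unicode) you will step 40 CHARACTERS forward from the beginning of the file. fseek returns an int status rather than a pointer, so dummy is simply fp, now positioned past the junk. A later fseek(dummy, 0, SEEK_SET) still returns to the real start of the file.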
Good luck!
These links will likely be helpful as well:
char vs wchar_t
http://www.cplusplus.com/reference/clibrary/cstdio/fseek/
If you want, you can just convert a file descriptor to a file pointer via the fdopen() call.
http://linux.die.net/man/3/fdopen
fseek ( dummy file pointer , 0 , SEEK_SET );
In short that dummy pointer should treat the file as there is no that 40 byte in that file and all position should be with respect to that 40th byte as counting as it is 1st byte.
You have conflicting requirements, you cannot do this with the C API.
SEEK_SET always refers to the absolute position in the file, which means if you want that command to work, you have to modify the file and remove the junk.
On Linux you could write a FUSE driver that would present the file as if it started at the 40th byte, but that's a lot of work. I only mention this because it's possible to solve the problem you've created, but it would be quite silly to actually do this.
The simplest thing of course would be just to abandon this emulating layer idea you're looking for, and write code that can handle that extra header junk.
If you want to remove the first 40 bytes of a file on the disk without creating another file, then you can copy the content from the 41st byte onwards into a buffer, then write it back 40 bytes earlier (i.e. starting at offset 0). Then use ftruncate (a POSIX function declared in unistd.h) to truncate the file to (filesize - 40) bytes.
I wrote a small program based on what I understood from your question.
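A rough sketch of that shift-and-truncate approach, assuming a regular file and POSIX pread()/pwrite(). The function name strip_prefix is invented. It moves the data in chunks rather than in one big buffer, and it is not crash-safe: an interruption halfway leaves the file partly shifted.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: removes the first `skip` bytes of the file in place.
   Returns 0 on success, -1 on error. */
int strip_prefix(const char *path, off_t skip)
{
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;

    off_t size = lseek(fd, 0, SEEK_END);
    if (size == (off_t)-1 || skip < 0 || skip > size) {
        close(fd);
        return -1;
    }

    char buf[4096];
    off_t src = skip;   /* where the wanted data currently starts */
    off_t dst = 0;      /* where it should end up */
    while (src < size) {
        ssize_t n = pread(fd, buf, sizeof buf, src);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        /* dst is always below src, so this never overwrites unread data */
        if (pwrite(fd, buf, (size_t)n, dst) != n) {
            close(fd);
            return -1;
        }
        src += n;
        dst += n;
    }

    if (ftruncate(fd, size - skip) == -1) {   /* drop the stale tail */
        close(fd);
        return -1;
    }
    return close(fd);
}
```

This modifies the file, so it only applies when the "cannot modify" constraint from the question is lifted.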
#include<stdio.h>
void readIt(FILE *afp)
{
char mystr[100];
while ( fgets (mystr , 100 , afp) != NULL )
puts (mystr);
}
int main()
{
FILE * dfp = NULL;
FILE * fp = fopen("h4.sql","r");
if(fp != NULL)
{
fseek(fp,10,SEEK_SET);
dfp = fp;
readIt(dfp);
fclose(fp);
}
}
readIt() reads the file starting from the 11th byte.
Is this what you are expecting or something else?
I haven't actually tried this, but I think you should be able to use mmap (with the MAP_SHARED option) to get your file mapped into your address space, and then fmemopen to get a FILE* that refers to all but the first 40 bytes of that buffer.
This gives you a FILE* (as you describe in the body of your question), but I believe not a file descriptor (as in the title and elsewhere in the question). The two are not the same, and AFAIK the FILE* created with fmemopen does not have an associated file descriptor.
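A sketch of the mmap plus fmemopen idea, assuming a POSIX.1-2008 system that provides fmemopen() (glibc does). The struct skipped_file and the functions open_skipping and close_skipping are hypothetical names. The whole file is mapped because mmap offsets must be page-aligned, so the mapping cannot start at byte 40 directly.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical bundle: the stream plus the mapping that backs it. */
struct skipped_file {
    FILE  *fp;        /* read-only stream that starts after the prefix */
    void  *map;       /* start of the mapping (whole file) */
    size_t map_len;   /* length of the mapping */
};

/* Hypothetical helper: opens `path` so that offset 0 of sf->fp is byte
   `skip` of the file. Returns 0 on success, -1 on error. */
int open_skipping(const char *path, size_t skip, struct skipped_file *sf)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size <= 0 ||
        (size_t)st.st_size <= skip) {
        close(fd);
        return -1;
    }

    sf->map_len = (size_t)st.st_size;
    sf->map = mmap(NULL, sf->map_len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid without fd */
    if (sf->map == MAP_FAILED)
        return -1;

    sf->fp = fmemopen((char *)sf->map + skip, sf->map_len - skip, "r");
    if (sf->fp == NULL) {
        munmap(sf->map, sf->map_len);
        return -1;
    }
    return 0;
}

/* Hypothetical helper: releases the stream and the mapping. */
void close_skipping(struct skipped_file *sf)
{
    fclose(sf->fp);
    munmap(sf->map, sf->map_len);
}
```

After open_skipping("some.txt", 40, &sf), fseek(sf.fp, 0, SEEK_SET) positions the stream at byte 40 of the file. As noted above, the stream has no usable file descriptor behind it, so this does not help code that needs an actual descriptor.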
