Read function from C returns 0 prematurely

I have this code to copy chunks of 1 KB from a source file to a destination file (in effect, creating a copy of the file):
test.cpp
#include<stdio.h>
#include<unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include<string.h>
int main() {
    int fd = open("file1.mp4", O_RDONLY);
    int fd2 = open("file2.mp4", O_WRONLY | O_CREAT | O_APPEND);
    int nr = 0;int n;
    char buff[1024];
    memset(buff, 0, 1024);
    while((n = read(fd, buff, 1024)) != 0) {
        write(fd2, buff, strlen(buff));
        nr = strlen(buff);
        memset(buff, 0, 1024);
    }
    printf("succes %d %d\n", nr,n);
    close(fd);
    close(fd2);
    return 0;
}
I tried to copy a 250 MB .mp4 file, but the result is only 77.4 MB. The return value of the last read(), n, is 0, so no error is reported (although there should be one, since the whole input file is not copied).
I think the .mp4 file contains an EOF byte that does not actually mark the end of the file.
What should I do to copy the entire .mp4 file? (I would like an answer that improves my code, not completely different code.)
Thanks for the help!

The problem is that your loop writes strlen(buff) bytes instead of n bytes.
strlen stops at the first \0 byte, so whenever a chunk contains a \0, everything after it in that chunk is silently dropped. (And when a full 1024-byte chunk contains no \0 at all, strlen reads past the end of the buffer.) Binary files such as .mp4 routinely contain \0 bytes, so use the return value of read() as the byte count for write().
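A minimal sketch of the corrected loop, kept close to the original code. The helper name copy_fd is hypothetical; the error handling and the inner loop for short writes are additions that the answer above does not discuss.

```c
#include <errno.h>
#include <unistd.h>

/* Copy everything from descriptor `in` to descriptor `out`.
 * Returns the total number of bytes copied, or -1 on error. */
long long copy_fd(int in, int out)
{
    char buff[1024];
    long long total = 0;
    ssize_t n;

    /* read() returns 0 only at end of file and -1 on error. */
    while ((n = read(in, buff, sizeof buff)) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;              /* interrupted, try again */
            return -1;
        }
        /* Write exactly the n bytes that were read, not strlen(buff).
         * write() may accept fewer bytes than asked, so loop. */
        ssize_t done = 0;
        while (done < n) {
            ssize_t w = write(out, buff + done, (size_t)(n - done));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            done += w;
        }
        total += n;
    }
    return total;
}
```

With this helper, the body of main() reduces to opening both files, calling copy_fd(fd, fd2), and closing them. Note also that open() with O_CREAT expects a third argument giving the file mode, for example open("file2.mp4", O_WRONLY | O_CREAT | O_TRUNC, 0644).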

Related

I made program that copies data from one file and pastes to another using (read,write) but i think its taking too long

I need to copy a 1 GB file to another file, and I am using this code with different buffer sizes (1 byte, 512 bytes and 1024 bytes). With a 512-byte buffer it took about 22 seconds, but with a 1-byte buffer the copy does not finish even after 44 minutes. Is that time expected, or is something wrong with my code?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <corecrt_io.h>
int main(int argc, char* argv[])
{
    char sourceName[20], destName[20], bufferStr[20];
    int f1, f2, fRead;
    int bufferSize = 0;
    char* buffer;
    /*printf("enter buffer size (in bytes): ");
    scanf("%d", &bufferSize);*/
    //bufferSize = argv[3];
    bufferSize = atoi(argv[3]);
    buffer = (char*)calloc(bufferSize, sizeof(char));
    /*printf("enter source name: ");
    scanf("%s", sourceName);*/
    strcpy(sourceName, argv[1]);
    f1 = open(sourceName, O_RDONLY);
    if (f1 == -1)
        printf("something's wrong with oppening source file!\n");
    else
        printf("file opened!\n");
    /*printf("enter destination name: ");
    scanf("%s", destName);*/
    strcpy(destName, argv[2]);
    f2 = open(destName, O_CREAT | O_WRONLY | O_TRUNC | O_APPEND);
    if (f2 == -1)
        printf("something's wrong with oppening destination file!\n");
    else
        printf("file2 opened!");
    fRead = read(f1, buffer, bufferSize);
    while (fRead != 0)
    {
        write(f2, buffer, bufferSize);
        fRead = read(f1, buffer, bufferSize);
    }
    return 0;
}
Yes, this is expected, because system calls are expensive operations, so the time is roughly proportional to the number of times you call read() and write(). If it takes 22 seconds to copy with 512-byte buffers, you should expect it to take about 22 * 512 seconds with 1-byte buffers. That's 187 minutes, or over 3 hours.
This is why stdio implements buffered output by default.

Unwanted characters when copying file using scatter/gather I/O (readv/writev)

I'm trying to build a program that copies the content of an existing file to a new file using readv() and writev().
Here is my code:
#include <stdio.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>
#include <string.h>
int main(int argc, char *argv[])
{
    int fs, fd;
    ssize_t bytes_read, bytes_written;
    char buf[3][50];
    int iovcnt;
    struct iovec iov[3];
    int i;
    fs = open(argv[1], O_RDONLY);
    if (fs == -1) {
        perror("open");
        return -1;
    }
    fd = open(argv[2], O_RDWR | O_CREAT | O_TRUNC, S_IRWXU);
    if (fd == -1) {
        perror("open");
        return 1;
    }
    for(i = 0; i < 3; i++) {
        iov[i].iov_base = buf[i];
        iov[i].iov_len = sizeof(buf[i]);
    }
    iovcnt = sizeof(iov) / sizeof(struct iovec);
    if ((bytes_read=readv(fs, iov, iovcnt)) != -1)
        if ((bytes_written=writev(fd, iov, iovcnt)) == -1)
            perror("error writev");
    printf("read: %ld bytes, write: %ld bytes\n", bytes_read, bytes_written);
    if (close (fs)) {
        perror("close fs");
        return 1;
    }
    if (close (fd)) {
        perror("close fd");
        return 1;
    }
    return 0;
}
Problem: Say I run the program with argv[1] set to a file called file1.txt and copy it to argv[2], say hello.txt.
This is the content of file1.txt:
Ini adalah line pertamaS
Ini adalah line kedua
Ini adalah line ketiga
When I ran the program, the newly created file specified in argv[2] was filled with unwanted characters such as \00.
Output after running the program:
Ini adalah line pertamaS
Ini adalah line kedua
Ini adalah line ketiga
\00\00\FF\B5\F0\00\00\00\00\00\C2\00\00\00\00\00\00\00W\D4\CF\FF\00\00V\D4\CF\FF\00\00\8D\C4|\8C\F8U\00\00\C8o\A6U\E5\00\00#\C4|\8C\F8U\00\00\00\00\00\00\00\00\00\00 \C1|\8C\F8U\00\00`\D5\CF\FF
I suspect the main cause of the problem is the ill-fitting size of the buf array. I've already searched the internet for a solution and found nothing. Can anyone point me towards a fix? I tried to make buf or iov_len variable-length, but I couldn't find the right way to do it. Thanks everyone!
readv() works purely with byte counts, driven by each .iov_len, and gives no special treatment to any content (such as a line feed). The readv() in the original posting is passed an array of 3 struct iovec, each with .iov_len set to 50. The input file holds only 70 bytes, so readv() returns 70, and afterwards the content of the local buf[3][50] is:
buf[0] : first 50 bytes from the input file
buf[1] : next 20 bytes from the input file, then 30 bytes of uninitialized/leftover stack data
buf[2] : another 50 bytes of uninitialized/leftover stack data
The writev() reuses the same struct iovec array with all 3 .iov_len values unchanged at 50, and so writes 150 bytes. The output file contains the first 70 bytes copied from the input file followed by 80 bytes of leftover stack data. If the local buf had been cleared before calling readv(), the output file would contain trailing NUL bytes instead.
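One possible fix, sketched under the assumption that the return value of readv() is used to shrink the iovec array before it is handed to writev(). The helper name trim_iov is hypothetical and is not part of any library.

```c
#include <stddef.h>
#include <sys/uio.h>

/* Shrink the iovec array so that it describes exactly `total` bytes.
 * Returns the number of entries that should be passed to writev(). */
int trim_iov(struct iovec *iov, int iovcnt, size_t total)
{
    int used = 0;

    for (int i = 0; i < iovcnt; i++) {
        if (total == 0) {
            iov[i].iov_len = 0;       /* nothing was read into this buffer */
            continue;
        }
        if (iov[i].iov_len > total)
            iov[i].iov_len = total;   /* last, partially filled buffer */
        total -= iov[i].iov_len;
        used = i + 1;
    }
    return used;
}

/* Intended use in the posted program:
 *
 *     bytes_read = readv(fs, iov, iovcnt);
 *     if (bytes_read > 0) {
 *         int n = trim_iov(iov, iovcnt, (size_t)bytes_read);
 *         bytes_written = writev(fd, iov, n);
 *     }
 */
```

For the 70-byte input above, this leaves the lengths at 50, 20 and 0 and returns 2, so writev() writes only the 70 bytes that were read. To copy files larger than 150 bytes, the lengths would need to be reset to 50 and the readv()/writev() pair repeated until readv() returns 0.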

File descriptor's offset is not moving when data is read using gzFile

I am trying to understand how zlib's gzip functions work, so I wrote a small program to simulate what I need.
What I need is:
I need to open a file and keep its file descriptor. Whenever I want to read, I pass the fd, open a gzFile on a dup()ed copy of it, and then close that gzFile, so that my original fd remains open for future reads.
I've gone through the zlib manual.
It says:
"If you want to keep fd open, use fd = dup(fd_keep); gz = gzdopen(fd, mode);. The duplicated descriptor should be saved to avoid a leak, since gzdopen does not close fd if it fails."
I am doing the same in the code below, where I read one character each time and close the duplicated fd so that I can use the original in future.
Here's My Code with gzFile that does not work:
#include <stdio.h>
#include <zlib.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
int ouFd1;
int inpFd1;
int main( int argc, char ** argv )
{
    // Open a file to write the data
    inpFd1 = open("temp.txt", O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    char* str = (char*)"Anil Prasad.";
    gzFile gzfile = gzdopen(inpFd1, "wb9h");
    int len = gzwrite(gzfile, &(str[0]), strlen(str));
    printf("written length: %d\n", len);
    gzclose(gzfile);
    // open a file to read the data.
    ouFd1 = open("temp.txt", O_RDONLY);
    char b[1];
    while (len > 0) {
        int ouFd1_dup = dup(ouFd1);
        gzFile gzFile_2 = gzdopen(ouFd1_dup, "rb");
        int r = gzread(gzFile_2, &(b[0]), 1);
        printf("Character : %c\n", b[0]);
        len--;
        gzclose(gzFile_2);
    }
    fsync(ouFd1);
    close(ouFd1);
}
The output of this is:
Character : A
Character : A
Character : A
Character : A
Character : A
Character : A
Character : A
Character : A
Character : A
Character : A
Character : A
Character : A
Can someone help me understand why the offset is not moving after I do a gzread()?
Or is it getting reset when I call gzclose(gzFile_2);?
I've tried moving the offset as well, like this:
while (len > 0) {
    int ouFd1_dup = dup(ouFd1);
    gzFile gzFile_2 = gzdopen(ouFd1_dup, "rb");
    int r = gzread(gzFile_2, &(b[0]), 1);
    gzseek(gzFile_2, 1, SEEK_CUR);
    printf("Character : %c\n", b[0]);
    len--;
    gzclose(gzFile_2);
}
But the result remains the same!
Can someone help me with this?
You are opening and closing a gzFile within the loop. Each gzFile has its own input buffer and decompression state, and gzclose() discards both, so a new gzFile cannot continue where the previous one stopped.
I don't think you need to duplicate the file descriptor for what you're doing. You could just use gzopen("temp.txt", "rb") and keep using the gzFile it returns.
You are also using a buffer of size 1. Since len already holds the uncompressed length, you could read everything into one buffer of that size.
I would do something like this:
// Create buffer (one extra byte keeps the string NUL-terminated)
char *buffer = new char[len+1];
memset(buffer, 0, len+1);
//Open file (gzFile is already a pointer type)
gzFile file = gzopen("temp.txt", "rb");
gzread(file, buffer, len);
printf("%s\n", buffer);
//Close file
gzclose(file);
//Delete buffer
delete [] buffer;
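On the question of whether the offset moves: descriptors created by dup() share a single file offset with the original, so a read through the duplicate advances the original as well, and closing the duplicate does not rewind it. A small sketch using plain read() instead of zlib (the helper name offset_after_dup_read is hypothetical):

```c
#include <fcntl.h>
#include <unistd.h>

/* Open `path`, read up to `nbytes` (at most 64) through a dup()ed
 * descriptor, close the duplicate, and return the offset of the
 * ORIGINAL descriptor. Returns -1 on error. */
off_t offset_after_dup_read(const char *path, size_t nbytes)
{
    char buf[64];
    off_t pos = -1;
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return -1;

    int copy = dup(fd);
    if (copy != -1) {
        if (nbytes > sizeof buf)
            nbytes = sizeof buf;
        /* Read through the duplicate, then ask the original where it is. */
        if (read(copy, buf, nbytes) >= 0)
            pos = lseek(fd, 0, SEEK_CUR);
        close(copy);
    }
    close(fd);
    return pos;
}
```

Reading 5 bytes through the duplicate leaves the original at offset 5. Because gzread() fills an internal buffer that is normally much larger than the 1 byte requested, a small file is probably consumed entirely by the first gzFile; later gzFiles would then start at end of file and leave b[0] unchanged. Printing the return value r of gzread() would confirm whether that is what happens here.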

write file by mmap, but when I use fread, the second time read error data

I use mmap and memcpy to write a file, and then fread to read the data back.
My code is below. The problem is that the first fread returns the 'a', but the second one does not.
I guess fread keeps something like a seek position, and writing the file with memcpy may change it.
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
int main()
{
    int fd = open("./aa", O_CREAT | O_RDWR | O_TRUNC, 0644);
    FILE* f = fopen("./aa", "r");
    if (ftruncate(fd, 1024) < 0) {
        printf("ftruncate error\n");
    }
    void* base;
    if ((base = mmap(NULL, 1024, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0)) == MAP_FAILED) {
        printf("mmap error\n");
    }
    char* file_ptr = (char *)base;
    char buffer[256];
    char scratch[256];
    buffer[0] = 'a';
    memcpy(file_ptr, buffer, 1);
    file_ptr += 1;
    size_t n = fread(scratch, 1, 1, f);
    printf("size n %zu\n", n); // this output size n 1
    printf("scratch %c\n", scratch[0]); // this output scratch a
    memcpy(file_ptr, buffer, 1);
    file_ptr += 1;
    n = fread(scratch, 1, 1, f);
    printf("size n %zu\n", n); // this output size n 1
    printf("scratch %c\n", scratch[0]); // but this output scratch
    return 0;
}
The output is :
size n 1
scratch a
size n 1
scratch
First of all, @wildplasser is right: your program may work, but if you go on mixing mmap and stdio you'll need to make sure that writes done via mmap get committed (use the msync() function) and that fread isn't returning stale buffered data (fseek()ing to the current position should do the trick).
Coming to your question: your program doesn't print "scratch", it prints "scratch \0" :)
Seriously, what you do is set the size of the "aa" file via ftruncate(), which is the same as filling the missing bytes up to 1024 with '\0'; you write an 'a', and read it; then you read another character, and you get one of the NULs. (The first fread most likely buffered a whole block of the file, so the second fread is served from that buffer and never sees the second 'a'.)
Try printing the ASCII code of scratch[0] and you'll see it's zero; if you're still not convinced, try adding something like
for(i = 0; i < 6; i++)
    file_ptr[i] = "QWERTY"[i];
right before the first memcpy and see what happens.
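As a rough illustration of keeping the two views in step, the sketch below writes bytes through the mapping, calls msync() after each write, and reads them back through a FILE* whose buffering has been switched off with setvbuf(). Disabling buffering is a swapped-in technique, not the fseek() approach mentioned above, and the function name write_mmap_read_stdio is hypothetical. It assumes a system such as Linux where a MAP_SHARED mapping and read() see the same file data.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Write `len` bytes of `text` one at a time through a shared mapping of
 * `path`, reading each one back through an unbuffered FILE* into `out`.
 * Returns the number of bytes read back, or -1 on setup failure. */
int write_mmap_read_stdio(const char *path, const char *text,
                          size_t len, char *out)
{
    const size_t map_size = 1024;
    int count = -1;
    char *base = MAP_FAILED;
    FILE *f = NULL;

    if (len > map_size)
        return -1;
    int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    f = fopen(path, "r");
    if (f == NULL || ftruncate(fd, (off_t)map_size) < 0)
        goto done;
    base = mmap(NULL, map_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED)
        goto done;

    /* No stdio buffer, so every fread() goes to the file itself. */
    setvbuf(f, NULL, _IONBF, 0);

    count = 0;
    for (size_t i = 0; i < len; i++) {
        base[i] = text[i];                  /* write via the mapping */
        msync(base, map_size, MS_SYNC);     /* commit the write */
        if (fread(&out[i], 1, 1, f) != 1)   /* read it back via stdio */
            break;
        count++;
    }
done:
    if (base != MAP_FAILED)
        munmap(base, map_size);
    if (f != NULL)
        fclose(f);
    close(fd);
    return count;
}
```

Unbuffered reads cost one system call per fread(), so this is only reasonable for small amounts of data; for anything larger it is usually simpler to read through the mapping as well.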

lseek() returning 0 when followed by new open()

I have the following bit of code (it's "example" code, so nothing fancy):
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>
int main()
{
    char buffer[9];
    int fp = open("test.txt", O_RDONLY);
    if (fp != -1) // If file opened successfully
    {
        off_t offset = lseek(fp, 2, SEEK_SET); // Seek from start of file
        ssize_t count = read(fp, buffer, strlen(buffer));
        if (count > 0) // No errors (-1) and at least one byte (not 0) was read
        {
            printf("Read test.txt %d characters from start: %s\n", offset, buffer);
        }
        close(fp);
    }
    int fp2 = open("test.txt", O_WRONLY);
    if (fp2 != -1)
    {
        off_t offset = lseek(fp2, 2, SEEK_CUR); // Seek from current position (0) - same result as above in this case
        ssize_t count = write(fp2, buffer, strlen(buffer));
        if (count == strlen(buffer)) // We successfully wrote all the bytes
        {
            printf("Wrote to test.txt %d characters from current (0): %s\n", offset, buffer);
        }
        close(fp2);
    }
}
This code does not return the first printout (reading) as it is, and the second printout reads: "Wrote test.txt 0 characters from current (0): " indicating that it did not seek anywhere in the file and that buffer is empty.
The odd thing is, if I comment out everything from fp2 = open("test.txt", O_WRONLY);, the first printout returns what I expected. As soon as I include the second open statement (even with nothing else) it won't write it. Does it somehow re-order the open statements or something else?
The line
ssize_t count = read(fp, buffer, strlen(buffer));
is wrong: you're taking the strlen of an uninitialized buffer. You likely want the size of the buffer, like so:
ssize_t count = read(fp, buffer, sizeof buffer);
You should also make sure buffer really contains a NUL-terminated string when you print it as one:
if (fp != -1) // If file opened successfully
{
    off_t offset = lseek(fp, 2, SEEK_SET); // Seek from start of file
    ssize_t count = read(fp, buffer, sizeof buffer - 1);
    if (count > 0) // No errors (-1) and at least one byte (not 0) was read
    {
        buffer[count] = 0;
Are you perfectly sure you are cleaning out the file every time you run?
As written, the first time you run this, you'll only see the second printout, and the second time you might see the first one.
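Putting the two points from the first answer together, a minimal sketch of the read side might look like this. The helper name read_at_offset is hypothetical; it reads at most cap - 1 bytes so that there is always room for the terminator.

```c
#include <fcntl.h>
#include <unistd.h>

/* Read up to cap-1 bytes from `path`, starting at byte `offset`, into
 * `buf`, and NUL-terminate the result.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_at_offset(const char *path, off_t offset, char *buf, size_t cap)
{
    ssize_t count = -1;

    if (cap == 0)
        return -1;
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    if (lseek(fd, offset, SEEK_SET) != (off_t)-1) {
        /* Size the read by the buffer capacity, never by strlen(). */
        count = read(fd, buf, cap - 1);
        if (count >= 0)
            buf[count] = '\0';        /* safe to print with %s now */
    }
    close(fd);
    return count;
}
```

Called as read_at_offset("test.txt", 2, buffer, sizeof buffer) with the 9-byte buffer from the question, it returns at most 8 bytes. When printing an off_t, it is safer to cast it, for example printf("%lld", (long long)offset), because %d expects an int.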
