Creating a file with a hole - c

I was trying to create a file with a hole using textbook code that I modified slightly. However, something must have gone wrong, since I don't see any difference between the two files in terms of size and disk blocks.
Code to create a file with a hole (Advanced Programming in the Unix Environment)
#include "apue.h"
#include <fcntl.h>
char buf1[] = "abcdefghij";
char buf2[] = "ABCDEFGHIJ";
int
main(void)
{
    int fd;
    if ((fd = creat("file.hole", FILE_MODE)) < 0)
        printf("creat error");
    if (write(fd, buf1, 10) != 10)
        printf("buf1 write error");
    /* offset now = 10 */
    if (lseek(fd, 16384, SEEK_SET) == -1)
        printf("lseek error");
    /* offset now = 16384 */
    if (write(fd, buf2, 10) != 10)
        printf("buf2 write error");
    /* offset now = 16394 */
    exit(0);
}
My code, creating a file basically full of abcdefghij's.
#include "apue.h"
#include <fcntl.h>
#include <unistd.h>
char buf1[] = "abcdefghij";
char buf2[] = "ABCDEFGHIJ";
int
main(void)
{
    int fd;
    if ((fd = creat("file.nohole", FILE_MODE)) < 0)
        printf("creat error");
    while (lseek(fd, 0, SEEK_CUR) < 16394)
    {
        if (write(fd, buf1, 10) != 10)
            printf("buf1 write error");
    }
    exit(0);
}
Printing both files gives the expected output. However, they occupy the same number of disk blocks and are nearly the same size.
{linux1:~/dir} ls -ls *hole
17 -rw-------+ 1 user se 16394 Sep 14 11:42 file.hole
17 -rw-------+ 1 user se 16400 Sep 14 11:33 file.nohole

You misunderstand what is meant by a "hole".
What it means is that you write some number of bytes, skip over some other number of bytes, then write more bytes. The bytes you don't explicitly write in between are set to 0.
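Logically, the file does not consist of two separate sections with nothing in between. Reading it simply returns bytes with the value 0 for the range that was skipped. Whether the filesystem saves disk blocks for that range depends on the filesystem and its block size.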
If you were to look at the contents of the first file, you would see "abcdefghij", followed by 16374 (16384 - 10) bytes containing the value 0 (not the character "0"), followed by "ABCDEFGHIJ".
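A quick way to check what is actually in the gap is to read it back. The following is a minimal sketch, not code from the book: the helper name count_gap_zeros and its path argument are made up for illustration, and plain open() with mode 0600 is used in place of apue.h's FILE_MODE. It writes the same two 10-byte strings around a seek to offset 16384, then counts the zero bytes read back between offsets 10 and 16384.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not from APUE): create `path` with 10 bytes, a gap
 * up to offset 16384, then 10 more bytes. Returns the number of zero bytes
 * read back from the gap, or -1 on error. On success *size gets st_size. */
long count_gap_zeros(const char *path, off_t *size)
{
    assert(path != NULL && size != NULL);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    long zeros = -1;
    struct stat st;
    if (write(fd, "abcdefghij", 10) == 10 &&
        lseek(fd, 16384, SEEK_SET) != (off_t)-1 &&
        write(fd, "ABCDEFGHIJ", 10) == 10 &&
        fstat(fd, &st) == 0 &&
        lseek(fd, 10, SEEK_SET) != (off_t)-1) {
        *size = st.st_size;
        zeros = 0;
        /* Read the gap [10, 16384) one byte at a time for clarity. */
        for (off_t off = 10; off < 16384; off++) {
            char c;
            if (read(fd, &c, 1) != 1) {
                zeros = -1;
                break;
            }
            if (c == 0)
                zeros++;
        }
    }
    close(fd);
    return zeros;
}
```

The st_blocks field filled in by the same fstat() call reports how much space is actually allocated. That number is filesystem-dependent, so it is the place to look if you want to know whether the gap was stored as a real hole.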

Related

write() writes null characters in a file instead of my string

When I run my program:
char stringnums[(NUMSIZE + 1) * NUMINROW + 1];
for(int i = 0; i < NUMINROW; i++)
    sprintf(stringnums, "%d %s", rand() % (NUMSIZE * 10), stringnums);
if (write(desc, stringnums, strlen(stringnums)) == -1)
    perror("write");
I can see some rubbish in the end of a file:
21 21 21 27 22 22 12 12 12 12 ... strange symbols...
Full code:
#include <fcntl.h>
#include <unistd.h>
#include <time.h>
#include <stdio.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#define NUMINROW 10
#define NUMSIZE 3
int main(){
    int desc;
    struct flock fl;
    char stringnums[(NUMSIZE + 1) * NUMINROW + 1];
    char* path = "randnums.txt";
    srand(time(NULL));
    desc = open(path, O_WRONLY);
    if (desc == -1)
        perror("open");
    if (fcntl(desc, F_GETLK, &fl) == -1)
        perror("fcntl_getlk");
    fl.l_type = F_WRLCK;
    if (fcntl(desc, F_SETLK, &fl) == -1)
        perror("fcntl_setlk");
    for(int i = 0; i < NUMINROW; i++)
        sprintf(stringnums, "%d %s", rand() % (NUMSIZE * 10), stringnums);
    if (write(desc, stringnums, strlen(stringnums)) == -1)
        perror("write");
    fl.l_type = F_UNLCK;
    if (fcntl(desc, F_SETLK, &fl) == -1)
        perror("fcntl_setlk");
    if (close(desc) == -1)
        perror("close");
    return 0;
}
I have tried both initializing stringnums as "\0" and passing sizeof() instead of strlen() to write(), but neither worked.
As @AndreasWenzel points out, you are attempting to concatenate a string on top of itself: the same buffer is both the destination and a source argument of sprintf(), which is undefined behavior. This is likely to break.
The receiving buffer must be large enough to hold what you want to put into it. Don't scrimp! Why not use 10 KB instead of trying to work with what you hope to be the minimum required?
To correctly concatenate into a large enough buffer:
char buf[ 10 * 1024 ], *at = buf;
for( int i = 0; i < 5; i++ )
    at += sprintf( at, "Something %d ", i ); /* sprintf returns the number of chars written */
The current length of the growing buffer can be quickly determined by at - buf.
You may also want to use snprintf() to calculate the length of each intended addition before appending it, to avoid writing beyond the end of the buffer.
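A bounded version of the same idea might look like the sketch below. The function name join_numbers and its parameters are made up for illustration. It relies on snprintf() returning the length the formatted text would have had, so a return value that does not fit in the remaining space means the item was truncated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: append each number followed by a space to buf,
 * never writing past cap bytes. Stops at the first item that does not
 * fit. Returns the length of the resulting string. */
size_t join_numbers(char *buf, size_t cap, const int *nums, size_t count)
{
    assert(buf != NULL && cap > 0);

    size_t used = 0;
    buf[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        /* snprintf returns the length it wanted to write, excluding the NUL. */
        int n = snprintf(buf + used, cap - used, "%d ", nums[i]);
        if (n < 0 || (size_t)n >= cap - used) {
            buf[used] = '\0';   /* drop the truncated item and stop */
            break;
        }
        used += (size_t)n;
    }
    return used;
}
```

The returned length can be passed straight to write(), so no separate strlen() call is needed.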

Unwanted characters when copying file using scatter/gather I/O (readv/writev)

I'm trying to build a program to copy existing content from an existing file to the new file using readv() and writev().
Here is my code:
#include <stdio.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>
#include <string.h>
int main(int argc, char *argv[])
{
    int fs, fd;
    ssize_t bytes_read, bytes_written;
    char buf[3][50];
    int iovcnt;
    struct iovec iov[3];
    int i;
    fs = open(argv[1], O_RDONLY);
    if (fs == -1) {
        perror("open");
        return -1;
    }
    fd = open(argv[2], O_RDWR | O_CREAT | O_TRUNC, S_IRWXU);
    if (fd == -1) {
        perror("open");
        return 1;
    }
    for(i = 0; i < 3; i++) {
        iov[i].iov_base = buf[i];
        iov[i].iov_len = sizeof(buf[i]);
    }
    iovcnt = sizeof(iov) / sizeof(struct iovec);
    if ((bytes_read=readv(fs, iov, iovcnt)) != -1)
        if ((bytes_written=writev(fd, iov, iovcnt)) == -1)
            perror("error writev");
    printf("read: %ld bytes, write: %ld bytes\n", bytes_read, bytes_written);
    if (close (fs)) {
        perror("close fs");
        return 1;
    }
    if (close (fd)) {
        perror("close fd");
        return 1;
    }
    return 0;
}
Problem: Say I run the program with argv[1] set to a file called file1.txt and copy it to argv[2], say hello.txt.
This is the content of file1.txt:
Ini adalah line pertamaS
Ini adalah line kedua
Ini adalah line ketiga
When I ran the program, the newly created file specified in argv[2] was filled with unwanted characters such as \00.
Output after running the program:
Ini adalah line pertamaS
Ini adalah line kedua
Ini adalah line ketiga
\00\00\FF\B5\F0\00\00\00\00\00\C2\00\00\00\00\00\00\00W\D4\CF\FF\00\00V\D4\CF\FF\00\00\8D\C4|\8C\F8U\00\00\C8o\A6U\E5\00\00#\C4|\8C\F8U\00\00\00\00\00\00\00\00\00\00 \C1|\8C\F8U\00\00`\D5\CF\FF
I suspect the main cause of the problem is the size of the buf array not fitting the input. I have already searched the internet for solutions and found nothing. Can anyone give me some enlightenment on how to fix this problem? I tried to make buf or iov_len variable-length but I couldn't find the right way to do it. Thanks everyone!
readv() works with byte counts driven by each .iov_len and gives no special treatment to any content (like a line feed). The readv() in the original posting is passed an array of 3 struct iovec, each with .iov_len set to 50. After a successful readv() on the 70-byte input file, the content of the local buf[3][50] would be:
buf[0] : first 50 bytes from the input file
buf[1] : next 20 bytes from the input file, then 30 bytes of uninitialized/leftover stack data
buf[2] : another 50 bytes of uninitialized/leftover stack data
The writev() reuses the same struct iovec array with all 3 .iov_len values unchanged from 50, and writes 150 bytes as requested. The output file therefore contains the first 70 bytes copied from the input file followed by 80 bytes of leftover stack data. If the local buf had been cleared before calling readv(), the output file would contain trailing NUL bytes instead.
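One possible fix is to shrink the iovec array to the number of bytes readv() actually returned before handing it to writev(). The helper below is a sketch under that assumption; the name trim_iov is made up for illustration and is not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: adjust iov so that it describes exactly nbytes,
 * the count returned by readv(). Returns the number of entries that
 * still carry data, suitable as the iovcnt argument of writev(). */
int trim_iov(struct iovec *iov, int iovcnt, size_t nbytes)
{
    assert(iov != NULL && iovcnt >= 0);

    int used = 0;
    for (int i = 0; i < iovcnt && nbytes > 0; i++) {
        if (iov[i].iov_len > nbytes)
            iov[i].iov_len = nbytes;    /* partially filled buffer */
        nbytes -= iov[i].iov_len;
        used = i + 1;
    }
    return used;
}
```

In the program above this would be used as writev(fd, iov, trim_iov(iov, iovcnt, (size_t)bytes_read)) once bytes_read is known to be greater than 0. If the copy is done in a loop, each .iov_len has to be reset to the full buffer size before the next readv().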

how to properly use posix_memalign

I am struggling to find out how to properly use pread and pwrite.
In this case, I am trying to read only 256 bytes using pread.
However, whenever I try to read fewer than 512 bytes, pread does not return anything.
I believe that this problem has to do with the SECTOR argument that I am passing to posix_memalign...
Is there something obvious that I need to be aware of?
#define BUF_SIZE 256
#define SECTOR 512
#define FILE_SIZE 1024 * 1024 * 1024 //1G
int main( int argc, char **argv ){
    int fd, nr;
    char fl_nm[]={"/dev/nvme0n1p1"};
    char* aligned_buf_w = NULL;
    char* aligned_buf_r = NULL;
    void* ad = NULL;
    if (posix_memalign(&ad, SECTOR, BUF_SIZE)) {
        perror("posix_memalign failed"); exit (EXIT_FAILURE);
    }
    aligned_buf_w = (char *)(ad);
    ad = NULL;
    if (posix_memalign(&ad, SECTOR, BUF_SIZE)) {
        perror("posix_memalign failed"); exit (EXIT_FAILURE);
    }
    aligned_buf_r = (char *)(ad);
    memset(aligned_buf_w, '*', BUF_SIZE * sizeof(char));
    printf("BEFORE READ BEGIN\n");
    printf("\t aligned_buf_w::%ld\n",strlen(aligned_buf_w));
    printf("\t aligned_buf_r::%ld\n",strlen(aligned_buf_r));
    printf("BEFORE READ END\n");
    fd = open(fl_nm, O_RDWR | O_DIRECT);
    nr = pwrite(fd, aligned_buf_w, BUF_SIZE, 0); /* keep the result so the check below tests it */
    //write error checking
    if(nr == -1){
        perror("[error in write 2]\n");
    }
    nr = pread(fd, aligned_buf_r, BUF_SIZE, 0);
    //read error checking
    if(nr == -1){
        perror("[error in read 2]\n");
    }
    printf("AFTER READ BEGIN\n");
    printf("\taligned_buf_r::%ld \n",strlen(aligned_buf_r));
    printf("AFTER READ END\n");
    //error checking for close process
    if(close(fd) == -1){
        perror("[error in close]\n");
    }else{
        printf("[succeeded in close]\n");
    }
    return 0;
}
Here is the output when I read and write 512 bytes
BEFORE READ BEGIN
aligned_buf_w::512
aligned_buf_r::0
BEFORE READ END
AFTER READ BEGIN
aligned_buf_r::512
AFTER READ END
[succeeded in close]
and here is the result when I try to read 256 bytes
BEFORE READ BEGIN
aligned_buf_w::256
aligned_buf_r::0
BEFORE READ END
[error in read 2]
: Invalid argument
AFTER READ BEGIN
aligned_buf_r::0
AFTER READ END
[succeeded in close]
While using O_DIRECT "the kernel will do DMA directly from/to the physical memory pointed by the userspace buffer passed as parameter" - https://www.ukuug.org/events/linux2001/papers/html/AArcangeli-o_direct.html - so you have to observe some restrictions - http://man7.org/linux/man-pages/man8/raw.8.html
All I/Os must be correctly aligned in memory and on disk: they must start at a sector
offset on disk, they must be an exact number of sectors long, and the
data buffer in virtual memory must also be aligned to a multiple of
the sector size. The sector size is 512 bytes for most devices.
With buffered I/O you do not need to care about that. The following sample illustrates the difference while reading from a HDD (/dev/sda9):
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#define SECTOR 512
int main( int argc, char **argv ){
    int fd, nr, BUF_SIZE;
    char fl_nm[]={"/dev/sda9"};
    char* buf = NULL;
    if (argc>1) {
        BUF_SIZE = atoi(argv[1]);
        // BUFFERED IO
        printf("Buffered IO -------\n");
        if ((buf = (char*)malloc(BUF_SIZE)) == NULL) perror("[malloc]");
        else {
            if ((fd = open(fl_nm, O_RDONLY)) == -1) perror("[open]");
            if((nr = pread(fd, buf, BUF_SIZE, 4096)) == -1) perror("[pread]");
            else
                printf("%i bytes read %.2x %.2x ...\n",nr,buf[0],buf[1]);
            free(buf);
            if(close(fd) == -1) perror("[close]");
        }
        // DIRECT IO
        printf("Direct IO ---------\n");
        /* posix_memalign() expects a void ** for its first argument */
        if (posix_memalign((void **)&buf, SECTOR, BUF_SIZE)) {
            perror("posix_memalign failed");
        }
        else {
            if ((fd = open(fl_nm, O_RDONLY | O_DIRECT)) == -1) perror("[open]");
            /* buffer size, buffer alignment and offset have to observe hardware restrictions */
            if((nr = pread(fd, buf, BUF_SIZE, 4096)) == -1) perror("[pread]");
            else
                printf("%i bytes read %.2x %.2x ...\n",nr,buf[0],buf[1]);
            free(buf);
            if(close(fd) == -1) perror("[close]");
        }
    }
    return 0;
}
You can verify the following behaviour:
$ sudo ./testodirect 512
Buffered IO -------
512 bytes read 01 04 ...
Direct IO ---------
512 bytes read 01 04 ...
$ sudo ./testodirect 4
Buffered IO -------
4 bytes read 01 04 ...
Direct IO ---------
[pread]: Invalid argument
By the way, O_DIRECT is not to everybody's taste: https://yarchive.net/comp/linux/o_direct.html
512 bytes (one sector) is the smallest unit you can read from most storage devices.
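If only 256 bytes are needed, one workaround is to round the transfer size up to a whole number of sectors, read that much, and then use only the first 256 bytes. The sketch below shows the rounding and the aligned allocation. The names round_up_to_sector and alloc_direct_buf are made up for illustration, and the sector size is passed in by the caller because 512 is an assumption that does not hold for every device.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for posix_memalign() */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: round n up to the next multiple of sector. */
size_t round_up_to_sector(size_t n, size_t sector)
{
    assert(sector > 0);
    return (n + sector - 1) / sector * sector;
}

/* Hypothetical helper: allocate a sector-aligned buffer that holds at
 * least `want` bytes and whose length is a whole number of sectors.
 * `sector` must be a power of two, as posix_memalign() requires.
 * Stores the real length in *got; returns NULL on failure.
 * The caller releases the buffer with free(). */
void *alloc_direct_buf(size_t want, size_t sector, size_t *got)
{
    assert(got != NULL);

    void *p = NULL;
    size_t len = round_up_to_sector(want, sector);
    if (len == 0)
        len = sector;               /* always transfer at least one sector */
    if (posix_memalign(&p, sector, len) != 0)
        return NULL;
    *got = len;
    return p;
}
```

The length stored in *got, not the 256 originally wanted, is what would be passed to pread() or pwrite() on a descriptor opened with O_DIRECT, and the file offset has to be a multiple of the sector size as well.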

unbuffered I/O: read int from file and write it on standard output

This program is meant to take as parameter a file, then read a string from standard input and write its length into the file, then read the content of the file (which is supposed to contain the lengths of the strings from the standard input) and write it in standard output:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#define MAX_BUFF 4096
int main(int argc, char **argv)
{
    if (argc != 2)
    {
        puts("you must specify a file!");
        return -1;
    }
    int nRead;
    char buffer[MAX_BUFF], tmp;
    int fd;
    puts("write \"end\" to stop:");
    fd = open(argv[1], O_RDWR | O_CREAT | O_APPEND, S_IRWXU);
    while ((nRead = read(STDIN_FILENO, buffer, MAX_BUFF)) > 0 && strncmp(buffer,"end", nRead-1) != 0 )
    {
        if ( write(fd, &nRead, 1) < 0 )
        {
            perror("write error.");
            return -1;
        }
    }
    puts("now i am gonna print the length of the strings:");
    lseek(fd, 0, SEEK_SET); //set the offset at start of the file
    while ((nRead = read(fd, buffer, 1)) > 0)
    {
        tmp = (char)buffer[0];
        write(STDOUT_FILENO, &tmp, 1);
    }
    close(fd);
    return 0;
}
this is the result:
write "end" to stop:
hello
world
i am a script
end
now i am gonna print the length of the strings:
I tried to convert the values written in the file into char before writing them to standard output, with no success.
How am I supposed to print the lengths on standard output using unbuffered I/O? Thank you for your replies.
EDIT: I changed the read from the file to this:
while((read(fd, &buffer, 1)) > 0)
{
    tmp = (int)*buffer;
    sprintf(buffer,"%d:", tmp);
    read(fd, &buffer[strlen(buffer)], tmp);
    write(STDOUT_FILENO, buffer, strlen(buffer));
}
but actually I have no control over the effective length of the string, so the output is this:
13:ciao atottti
4:wow
o atottti
5:fine
atottti
As you can see, the string length is correct because it counts the newline character too. Still, there is no control over the effective buffer size.
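For what it is worth, the first version prints nothing visible because each stored length is a raw byte value (6 for "hello" plus its newline), and writing that byte to a terminal sends a control character rather than the digit "6". A minimal sketch of one way around this, assuming the length is stored in a single byte as in the code above: format the value as decimal text first, then pass exactly that many bytes to write(). The function name write_length is made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write the decimal text of `len` followed by a
 * newline to fd using only write(). Returns the number of bytes
 * written, or -1 on error. */
ssize_t write_length(int fd, unsigned char len)
{
    char text[8];                   /* enough for "255\n" plus the NUL */
    int n = snprintf(text, sizeof text, "%u\n", (unsigned)len);
    assert(n > 0 && (size_t)n < sizeof text);
    /* Write exactly n bytes, so there is no dependence on strlen(). */
    return write(fd, text, (size_t)n);
}
```

In the read loop this would be called as write_length(STDOUT_FILENO, (unsigned char)buffer[0]) for each byte read from the file.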

APUE: Creating a file with a hole in it: Figure 3.2 pg 65

In the example from "Advanced Programming in the Unix Environment", the following sample program creates a file, then uses lseek to move the file offset further along, thus placing a "hole" in the file. The author says the space in between is filled with 0s. I wanted to see if those 0s would print out, so I modified the program slightly. However, I noticed that only the valid characters were written to the output.
My question is: how does the Unix/Linux filesystem manager know not to print the bytes in between?
#include "apue.h"
#include <fcntl.h>
#include <unistd.h>
char buf1[] = "abcdefghij";
char buf2[] = "ABCDEFGHIJ";
char buf3[10];
int
main(void)
{
    int fd;
    if ((fd = creat("file.hole", FILE_MODE)) < 0) {
        err_sys("creat error");
    }
    if (write(fd, buf1, 10) != 10) { /* offset is now = 10 */
        err_sys("buf1 write error");
    }
    if (lseek(fd, 16380, SEEK_SET) == -1) { /* offset now = 16380 */
        err_sys("lseek error");
    }
    if (write(fd, buf2, 10) != 10) { /* offset now = 16390 */
        err_sys("buf2 write error");
    }
    close(fd);
    if ((fd = open("file.hole", O_RDWR)) == -1) {
        err_sys("failed to re-open file");
    }
    ssize_t n;
    ssize_t m;
    while ((n = read(fd, buf3, 10)) > 0) {
        if ((m = write(STDOUT_FILENO, buf3, 10)) != 10) {
            err_sys("stdout write error");
        }
    }
    if (n == -1) {
        err_sys("buf3 read error");
    }
    exit(0);
}
The NUL byte (\000) has no visible representation on a terminal. It is written to standard output, but the terminal displays nothing for it. Not every byte value is a printable character. In the same way, \n is shown as a line break, not as a visible glyph.
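To actually see those bytes, they have to be converted into something printable before they reach the terminal. The sketch below does that with octal escapes. The function name escape_bytes is made up for illustration, and it is only one of several ways to do this; running od -c file.hole in a shell gives a similar view without any code.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copy len bytes from src into dst as text,
 * replacing every non-printable byte with an octal escape such as \000.
 * Stops early if dst (cap bytes) is full. Returns the string length. */
size_t escape_bytes(const unsigned char *src, size_t len, char *dst, size_t cap)
{
    assert(dst != NULL && cap > 0);

    size_t used = 0;
    dst[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        int n;
        if (isprint(src[i]))
            n = snprintf(dst + used, cap - used, "%c", src[i]);
        else
            n = snprintf(dst + used, cap - used, "\\%03o", (unsigned)src[i]);
        if (n < 0 || (size_t)n >= cap - used) {
            dst[used] = '\0';       /* drop the truncated item and stop */
            break;
        }
        used += (size_t)n;
    }
    return used;
}
```

Applied to each 10-byte chunk read from file.hole before the write() to standard output, the hole would show up as runs of \000 instead of nothing.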

Resources