How do you programmatically create a completely empty sparse file on linux? - c

If you run dd with this:
dd if=/dev/zero of=sparsefile bs=1 count=0 seek=1048576
You appear to get a completely unallocated sparse file (this is ext4)
smark@we:/sp$ ls -ls sparsefile
0 -rw-rw-r-- 1 smark smark 1048576 Nov 24 16:19 sparsefile
fibmap agrees:
smark@we:/sp$ sudo hdparm --fibmap sparsefile
sparsefile:
filesystem blocksize 4096, begins at LBA 2048; assuming 512 byte sectors.
byte_offset begin_LBA end_LBA sectors
Without having to dig through the source of dd, I'm trying to figure out how to do that in C.
I tried fseeking and fwriting zero bytes, but it did nothing.
Not sure what else to try, I figured somebody might know before I hunt down dd's innards.
EDIT: including my example...
FILE *f = fopen("/sp/sparse2", "wb");
fseek(f, 1048576, SEEK_CUR);
fwrite("x", 1, 0, f);
fclose(f);

When you write to a file using write or various library routines that ultimately call write, there's a file offset pointer associated with the file descriptor that determines where in the file the bytes will go. It's normally positioned at the end of the data that was processed by the most recent call to read or write. But you can use lseek to position the pointer anywhere within the file, and even beyond the current end of the file. When you write data at a point beyond the current EOF, the area that was skipped is conceptually filled with zeroes. Many systems will optimize things so that any whole filesystem blocks in that skipped area simply aren't allocated, producing a sparse file. Attempts to read such blocks will succeed, returning zeroes.
Writing block-sized areas full of zeroes to a file generally won't produce a sparse file, although it's possible for some filesystems to do this.
Another way to produce a sparse file, used by GNU dd, is to call ftruncate. The documentation says this:
The ftruncate() function causes the regular file referenced by fildes to have a size of length bytes.
If the file previously was larger than length, the extra data is discarded. If it was previously shorter than length, it is unspecified whether the file is changed or its size increased. If the file is extended, the extended area appears as if it were zero-filled.
Support for sparse files is filesystem-specific, although virtually all designed-for-UNIX local filesystems support them.
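The lseek-then-write behaviour described above can be sketched as follows. This is only an illustration: the function name make_holey_file and the 0644 mode are assumptions, not something from the question. Note that lseek alone does not change the file size; at least one byte has to be written past the old EOF, which is likely why the fseek plus zero-length fwrite in the question had no effect. The resulting file is therefore not completely unallocated (the final byte needs a block), so ftruncate is the better fit when no data at all should be allocated.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create `path` with a hole of `hole` bytes followed by one real byte.
 * Returns 0 on success, -1 on error. (Hypothetical helper.) */
int make_holey_file(const char *path, off_t hole)
{
    assert(path != NULL && hole >= 0);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    /* Move the offset past EOF; nothing is written or allocated yet. */
    if (lseek(fd, hole, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }

    /* Writing at least one byte is what actually extends the file. */
    if (write(fd, "x", 1) != 1) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Calling make_holey_file("sparse2", 1048576) should give a file whose size is 1048577 bytes; whether the skipped range is really left unallocated depends on the filesystem.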

This is complementary to the answer by @MarkPlotnick: a simple sample implementation of the feature you requested, using ftruncate():
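#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <stdlib.h>

int
main(void)
{
    int file;
    mode_t mode;

    /* rw-r--r-- */
    mode = S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH;
    file = open("sparsefile", O_WRONLY | O_CREAT, mode);
    if (file == -1)
        return EXIT_FAILURE;
    /* set the size to 1 MiB without writing any data */
    if (ftruncate(file, 0x100000) == -1) {
        close(file);
        return EXIT_FAILURE;
    }
    close(file);
    return EXIT_SUCCESS;
}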
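To check the result from C rather than with ls -ls, one option is to compare st_size with st_blocks from stat(). The sketch below is a rough illustration; the helper name file_usage is made up, and the 512-byte unit for st_blocks is an assumption that holds on Linux but is not guaranteed by POSIX.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Store the logical size and the allocated bytes of `path`.
 * Assumes st_blocks counts 512-byte units, as on Linux.
 * Returns 0 on success, -1 on error. (Hypothetical helper.) */
int file_usage(const char *path, off_t *size, off_t *allocated)
{
    struct stat st;

    assert(path != NULL && size != NULL && allocated != NULL);
    if (stat(path, &st) == -1) {
        perror("stat");
        return -1;
    }
    *size = st.st_size;
    *allocated = (off_t)st.st_blocks * 512;
    return 0;
}
```

For a file created as above, size should come back as 1048576, and on a filesystem that supports sparse files allocated is expected to be 0.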

Related

What is the correct approach to write multiple small pieces to a temp file in c, in multithreads?

I am simulating multithreaded file downloading. My strategy is that each thread receives small file pieces (each piece has a piece_length, a piece_size and a start_writing_pos).
Each thread then writes to the same buffer. How do I implement this? Do I have to worry about collisions?
//=================== follow up ============//
so I write a small demo as follows:
#include <stdio.h>
int main(){
char* tempfilePath = "./testing";
FILE *fp;
fp = fopen(tempfilePath,"w+");//w+: for reading and writing
fseek( fp, 9, SEEK_SET);//starting in 10-th bytes
fwrite("----------",sizeof(char), 10, fp);
fclose(fp);
}
Before running it, I set the content of "./testing" to "XXXXXXXXXXXXXXXXXXX"; after running it I get "^@^@^@^@^@^@^@^@^@----------". I wonder where the problem is...
Do what most torrent clients do. Create a file with the final size having an extension .part. Then allocate non-overlapping parts of the file to each thread, who shall have their own file-descriptors. Thus collisions are avoided. Rename to final name when finished.
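A minimal sketch of that .part approach, assuming POSIX; the function names create_part_file and finish_part_file are invented for illustration. It returns one descriptor, and each thread could equally open the .part file itself to get its own.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>      /* rename */
#include <sys/types.h>
#include <unistd.h>

/* Create (or reuse) `part_path` and set its size to `final_size`.
 * Returns an open descriptor, or -1 on error. (Hypothetical helper.) */
int create_part_file(const char *part_path, off_t final_size)
{
    assert(part_path != NULL && final_size >= 0);

    int fd = open(part_path, O_RDWR | O_CREAT, 0644);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, final_size) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Call once every piece has been written: close the descriptor and
 * give the file its final name. Returns 0 on success, -1 on error. */
int finish_part_file(int fd, const char *part_path, const char *final_path)
{
    if (close(fd) == -1)
        return -1;
    return rename(part_path, final_path);
}
```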
Unless you want to use a mutex, you can't use fwrite(). FILE *-based I/O using fopen(), fwrite(), and all related functions simply isn't reentrant: the FILE uses a SINGLE buffer, a SINGLE offset, etc.
You can't even use open() and lseek()/write() on a shared descriptor: multiple threads will interfere with each other, modifying the one offset an open file descriptor has.
Use open() to open the file, and use pwrite() to write data to exact offsets.
pwrite() man page:
pwrite() writes up to count bytes from the buffer starting at buf to
the file descriptor fd at offset offset. The file offset is not
changed.
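As a rough sketch of how a thread might write one piece with pwrite(): the helper name write_piece is hypothetical, and the loop is there because pwrite() is allowed to write fewer bytes than requested. Each thread would call it with its own buffer and start position; since no shared file offset is involved, non-overlapping pieces should not interfere with each other.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all `len` bytes of `buf` at absolute offset `pos`.
 * Returns 0 on success, -1 on error. (Hypothetical helper.) */
int write_piece(int fd, const void *buf, size_t len, off_t pos)
{
    const char *p = buf;

    assert(fd >= 0 && (buf != NULL || len == 0));
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, pos);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: retry */
            return -1;
        }
        p += n;                 /* advance past what was written */
        pos += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Pieces can arrive in any order: writing "world" at offset 5 and then "hello" at offset 0 leaves "helloworld" in the file, and the descriptor's own offset stays where it was.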

open a temporary C FILE* for input

I have a legacy function accepting a FILE* pointer in a library. The contents I would like to parse are actually in memory, not on disk.
So I came up with the following steps to work around this issue:
the data is in memory at this point
fopen a temporary file (using tmpnam or tmpfile) on disk for writing
fclose the file
fopen the same file again for reading - guaranteed to exist
change the buffer using setvbuf(buffer, size)
do the legacy FILE* stuff
close the file
remove the temporary file
the data can be discarded
On windows, it looks like this:
int bufferSize;
char buffer[bufferSize];
// set up the buffer here
// temporary file name
char tempName [L_tmpnam_s];
tmpnam_s(tempName, L_tmpnam_s);
// open/close/reopen
fopen_s(&fp, tempName,"wb");
fclose(fp);
freopen_s(&fp, tempName,"rb", fp);
// replace the internal buffer
setvbuf(fp, buffer, _IONBF, bufferSize);
fp->_ptr = buffer;
fp->_cnt = bufferSize;
// do the FILE* reading here
// close and remove tmp file
fclose(fp);
remove(tempName);
Works, but quite cumbersome. The main problems, aside from the backwardness of this approach, are:
the temporary name needs to be determined
the temporary file is actually written to disk
the temporary file needs to be removed afterwards
I'd like to keep things portable, so using Windows memory-mapped functions or boost's facilities is not an option. The problem is mainly that, while it is possible to convert a FILE* to an std::fstream, the reverse seems to be impossible, or at least not supported on C++99.
All suggestions welcome!
Update 1
Using a pipe/fdopen/setvbuf as suggested by Speed8ump and a bit of twiddling seems to work. It no longer creates files on disk, nor does it consume extra memory. One step closer, except that, for some reason, setvbuf is not working as expected. Manually fixing it up is possible, but of course not portable.
// create a pipe for reading, do not allocate memory
int pipefd[2];
_pipe(pipefd, 0, _O_RDONLY | _O_BINARY);
// open the read pipe for binary reading as a file
fp = _fdopen(pipefd[0], "rb");
// try to switch the buffer ptr and size to our buffer, (no buffering)
setvbuf(fp, buffer, _IONBF, bufferSize);
// for some reason, setvbuf does not set the correct ptr/sizes
fp->_ptr = buffer;
fp->_charbuf = fp->_bufsiz = fp->_cnt = bufferSize;
Update 2
Wow. So it seems that unless I dive into the MS-specific implementation CreateNamedPipe / CreateFileMapping, POSIX portability costs us an entire memcopy (of any size!), be it to file or into a pipe. Hopefully the compiler understands that this is just a temporary and optimizes this. Hopefully.
Still, we eliminated the silly device writing intermediate. Yay!
int pipefd[2];
_pipe(pipefd, bufferSize, _O_BINARY); // Windows CRT _pipe(): sets the internal buffer size (POSIX pipe() takes only the array)
FILE* in = fdopen(pipefd[0], "rb");
FILE* out = fdopen(pipefd[1], "wb");
// the actual copy
fwrite(buffer, 1, bufferSize, out);
fclose(out);
// fread(in), etc. (a pipe is not seekable, so fseek(in) would fail)
fclose(in);
You might try using a pipe and fdopen; that seems to be portable and in-memory, and you might still be able to do the setvbuf trick you are using.
Your setvbuf hack is a nice idea, but not portable. C11 (n1570):
7.21.5.6 The setvbuf function
Synopsis
#include <stdio.h>
int setvbuf(FILE * restrict stream,
char * restrict buf,
int mode, size_t size);
Description
[...] If buf is not a null pointer, the array it points to may be used instead of a buffer allocated by the setvbuf function [...] and the argument size specifies the size of the array; otherwise, size may determine the size of a buffer allocated by the setvbuf function. The contents of the array at any time are indeterminate.
There is neither a guarantee that the provided buffer is used at all, nor about what it contains at any point after the setvbuf call until the file is closed or setvbuf is called again (POSIX doesn't give more guarantees).
The easiest portable solution, I think, is to use tmpfile: fwrite the data into that file, fseek to the beginning, and pass the FILE pointer to the function. (I'm not sure whether temporary files are guaranteed to be seekable; on my Linux system they appear to be, and I'd expect the same elsewhere.) This still requires copying in memory, but I guess usually no writing of the data to the disk (POSIX, unfortunately, implicitly requires a real file to exist). A file obtained by tmpfile is deleted after closing.
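A minimal sketch of that tmpfile route, using only standard C; the function name file_from_memory is made up for illustration. The fseek between writing and reading also satisfies the rule that an update stream needs a positioning call when switching from output to input.

```c
#include <assert.h>
#include <stdio.h>

/* Copy `size` bytes from `data` into an anonymous temporary file and
 * return it positioned at the start, ready for reading.
 * Returns NULL on failure. (Hypothetical helper.) */
FILE *file_from_memory(const void *data, size_t size)
{
    assert(data != NULL || size == 0);

    FILE *fp = tmpfile();               /* binary update mode ("wb+") */
    if (fp == NULL)
        return NULL;
    if (fwrite(data, 1, size, fp) != size ||
        fseek(fp, 0L, SEEK_SET) != 0) { /* back to the start for reading */
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

The returned stream can be handed to the legacy function as-is; calling fclose() on it afterwards also removes the temporary file.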

Clearing file contents only using FILE * [duplicate]

I'm using C to write some data to a file. I want to erase the previous text written in the file in case it was longer than what I'm writing now.
I want to decrease the size of file or truncate until the end. How can I do this?
If you want to preserve the previous contents of the file up to some length (a length bigger than zero; other answers cover truncating to zero), then POSIX provides the truncate() and ftruncate() functions for the job.
#include <unistd.h>
int ftruncate(int fildes, off_t length);
int truncate(const char *path, off_t length);
The name indicates the primary purpose - shortening a file. But if the specified length is longer than the previous length, the file grows (zero padding) to the new size. Note that ftruncate() works on a file descriptor, not a FILE *; you could use:
if (ftruncate(fileno(fp), new_length) != 0) ...error handling...
However, you should be aware that mixing file stream (FILE *) and file descriptor (int) access to a single file is apt to lead to confusion — see the comments for some of the issues. This should be a last resort.
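If you do go down the fileno() route, a cautious sketch might look like this. The helper name truncate_stream is hypothetical; flushing first and repositioning afterwards are my own choices for avoiding the stream/descriptor confusion mentioned above, not something the answer prescribes.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Set the size of the file behind `fp` to `length` bytes.
 * Returns 0 on success, -1 on error. (Hypothetical helper.) */
int truncate_stream(FILE *fp, off_t length)
{
    assert(fp != NULL && length >= 0);

    /* Push buffered data to the descriptor first, so stdio does not
     * write stale bytes after the truncation. */
    if (fflush(fp) != 0)
        return -1;
    if (ftruncate(fileno(fp), length) == -1)
        return -1;
    /* Move the stream to the new end so later writes continue there. */
    return fseeko(fp, length, SEEK_SET) == 0 ? 0 : -1;
}
```

Repositioning to the new end suits the "write less than before" case in the question; a caller that wants to keep its current position would drop the final fseeko().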
It is likely, though, that for your purposes, truncate on open is all you need, and for that, the options given by others will be sufficient.
For Windows, there is a function SetEndOfFile() and a related function SetFileValidData() that can do a similar job, but using a different interface. Basically, you seek to where you want to set the end of file and then call the function.
There's also a function _chsize() as documented in the answer by sofr.
On Windows systems there is no <unistd.h> header, but you can still truncate a file by using
_chsize( fileno(f), size);
That's a function of your operating system. The standard POSIX way to do it is:
open("file", O_TRUNC | O_WRONLY);
If this is to run under some flavor of UNIX, these APIs should be available:
#include <unistd.h>
#include <sys/types.h>
int truncate(const char *path, off_t length);
int ftruncate(int fd, off_t length);
According to the "man truncate" on my Linux box, these are POSIX-conforming. Note that these calls will actually increase the size of the file (!) if you pass a length greater than the current length.
<edit>
Ah, you edited your post, you're using C. When you open the file, open it with the mode "w+" like so, and it will truncate it ready for writing:
FILE* f = fopen("C:\\gabehabe.txt", "w+");
fclose(f);
</edit>
To truncate a file in C++, you can simply create an ofstream object to the file, using ios_base::trunc as the file mode to truncate it, like so:
ofstream x("C:\\gabehabe.txt", ios_base::trunc);
If you want to truncate the entire file, opening the file up for writing does that for you. Otherwise, you have to open the file for reading, and read the parts of the file you want to keep into a temporary variable, and then output it to wherever you need to.
Truncate entire file:
FILE *file = fopen("filename.txt", "w"); //automatically clears the entire file for you.
Truncate part of the file:
FILE *inFile = fopen("filename.txt", "r");
//read in the data you want to keep
fclose(inFile);
FILE *outFile = fopen("filename.txt", "w");
//output back the data you want to keep into the file, or what you want to output.
fclose(outFile);
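Filled in, the read-then-rewrite approach could look roughly like the sketch below, which uses only standard C. The name keep_prefix is invented, and reading the kept part into one malloc'd buffer assumes it fits in memory.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Rewrite `path` so that only its first `keep` bytes remain.
 * Returns 0 on success, -1 on error. The file is briefly empty between
 * the two fopen() calls, so this is not crash-safe. (Hypothetical helper.) */
int keep_prefix(const char *path, size_t keep)
{
    assert(path != NULL);

    FILE *in = fopen(path, "rb");
    if (in == NULL)
        return -1;
    char *buf = malloc(keep ? keep : 1);
    if (buf == NULL) {
        fclose(in);
        return -1;
    }
    size_t got = fread(buf, 1, keep, in);   /* may be < keep for short files */
    fclose(in);

    FILE *out = fopen(path, "wb");          /* "w" truncates to zero length */
    if (out == NULL) {
        free(buf);
        return -1;
    }
    size_t put = fwrite(buf, 1, got, out);
    free(buf);
    if (fclose(out) != 0 || put != got)
        return -1;
    return 0;
}
```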

Is there any way to create dummy file descriptor in linux?

I have opened one file with following way:
fp = fopen("some.txt","r");
Now in this file, the first few bytes (say, 40 bytes) are unnecessary junk data, so I want to remove them. But I cannot delete that data from the file, modify the file, or create a duplicate of the file without the unnecessary data.
So I want to create another dummy FILE pointer which points to the file, such that when I pass this dummy pointer to any other function that does the following operation:
fseek ( dummy file pointer , 0 , SEEK_SET );
it sets the file position to the 40th byte of my some.txt.
But the function accepts a file descriptor, so I need to pass a file descriptor that treats the file as if those first 40 bytes were never in it.
In short, the dummy descriptor should treat the file as if those 40 bytes were not there, and all positioning operations should be relative to the 40th byte, counting it as the 1st byte.
Easy.
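#include <stdio.h>
#include <stddef.h> /* wchar_t */
#define CHAR_8_BIT (0)
#define CHAR_16_BIT (1)
#define BIT_WIDTH (CHAR_8_BIT)
#define OFFSET (40)
FILE* fp = fopen("some.txt","r");
FILE* dummy = NULL;
/* fseek() returns an int status (0 on success), not a FILE*; it moves fp's own position */
#if (BIT_WIDTH == CHAR_8_BIT)
if (fp != NULL && fseek (fp, (long)(OFFSET*sizeof(char)), SEEK_SET) == 0)
#else
if (fp != NULL && fseek (fp, (long)(OFFSET*sizeof(wchar_t)), SEEK_SET) == 0)
#endif
dummy = fp; /* same stream, now positioned past the junk */
The SEEK_SET macro indicates the beginning of the file, and depending on whether you are using 8-bit characters (ASCII) or wide characters (e.g. UNICODE) you will step 40 CHARACTERS forward from the beginning of the file. Note that fseek() returns a status code rather than a pointer, so dummy is simply set to fp once the seek succeeds; both names then refer to the same stream, and a later fseek(dummy, 0, SEEK_SET) still goes back to the real start of the file.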
Good luck!
These links will likely be helpful as well:
char vs wchar_t
http://www.cplusplus.com/reference/clibrary/cstdio/fseek/
If you want, you can just convert a file descriptor to a file pointer via the fdopen() call.
http://linux.die.net/man/3/fdopen
fseek ( dummy file pointer , 0 , SEEK_SET );
In short that dummy pointer should treat the file as there is no that 40 byte in that file and all position should be with respect to that 40th byte as counting as it is 1st byte.
You have conflicting requirements, you cannot do this with the C API.
SEEK_SET always refers to the absolute position in the file, which means if you want that command to work, you have to modify the file and remove the junk.
On Linux you could write a FUSE driver that would present the file as if it started from the 40th byte, but that's a lot of work. I only mention this because it makes it possible to solve the problem you've created, but it would be quite silly to actually do this.
The simplest thing of course would be just to abandon this emulating layer idea you're looking for, and write code that can handle that extra header junk.
If you want to remove the first 40 bytes of a file on the disk without creating another file, then you can copy the content from the 41st byte onwards into a buffer, then write it back 40 bytes earlier (starting at offset 0). Then use ftruncate (a POSIX function declared in unistd.h) to truncate the file to (filesize - 40) bytes.
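A sketch of that in-place shift, copying in fixed-size chunks instead of one large buffer. The name strip_prefix and the 4096-byte chunk size are assumptions made for illustration, and the descriptor is assumed to be open for both reading and writing.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Remove the first `n` bytes of the file open on `fd` (opened O_RDWR).
 * Returns 0 on success, -1 on error. Not crash-safe: an interruption
 * leaves the file partly shifted. (Hypothetical helper.) */
int strip_prefix(int fd, off_t n)
{
    char buf[4096];
    struct stat st;

    assert(fd >= 0 && n >= 0);
    if (fstat(fd, &st) == -1)
        return -1;
    if (n > st.st_size)
        n = st.st_size;

    /* Copy front to back, so data is read before it is overwritten. */
    for (off_t src = n; src < st.st_size; ) {
        ssize_t got = pread(fd, buf, sizeof buf, src);
        if (got <= 0)
            return -1;
        ssize_t off = 0;
        while (off < got) {
            ssize_t put = pwrite(fd, buf + off, (size_t)(got - off),
                                 src - n + off);
            if (put == -1)
                return -1;
            off += put;
        }
        src += got;
    }
    /* Drop the now-duplicated tail. */
    return ftruncate(fd, st.st_size - n);
}
```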
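I wrote a small program based on what I understood from your question.
#include <stdio.h>

/* Print everything from the stream's current position to EOF. */
void readIt(FILE *afp)
{
    char mystr[100];
    while (fgets(mystr, sizeof mystr, afp) != NULL)
        fputs(mystr, stdout); /* fgets keeps the newline; puts would add another */
}

int main(void)
{
    FILE *dfp = NULL;
    FILE *fp = fopen("h4.sql", "r");
    if (fp != NULL)
    {
        /* skip the first 10 bytes, then hand the same stream on */
        if (fseek(fp, 10, SEEK_SET) == 0)
        {
            dfp = fp;
            readIt(dfp);
        }
        fclose(fp);
    }
    return 0;
}
readIt() reads the file starting from the 11th byte.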
Is this what you are expecting or something else?
I haven't actually tried this, but I think you should be able to use mmap (with the MAP_SHARED option) to get your file mapped into your address space, and then fmemopen to get a FILE* that refers to all but the first 40 bytes of that buffer.
This gives you a FILE* (as you describe in the body of your question), but I believe not a file descriptor (as in the title and elsewhere in the question). The two are not the same, and AFAIK the FILE* created with fmemopen does not have an associated file descriptor.
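Since the answer above says it is untested, here is a rough sketch of what the mmap plus fmemopen idea might look like on a POSIX.1-2008 system. The struct skipped_view and both function names are invented for illustration, and the view is read-only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* A read-only stream over a file with its first `skip` bytes hidden.
 * (Hypothetical type.) */
struct skipped_view {
    FILE  *fp;   /* stream whose offset 0 is byte `skip` of the file */
    void  *map;  /* start of the mapping, needed for munmap() */
    size_t len;  /* length of the mapping */
};

/* Returns 0 on success, -1 on error (including a file of <= skip bytes). */
int open_skipped(const char *path, size_t skip, struct skipped_view *v)
{
    struct stat st;

    assert(path != NULL && v != NULL);
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    if (fstat(fd, &st) == -1 || st.st_size <= 0 ||
        (size_t)st.st_size <= skip) {
        close(fd);
        return -1;
    }
    v->len = (size_t)st.st_size;
    v->map = mmap(NULL, v->len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (v->map == MAP_FAILED)
        return -1;
    v->fp = fmemopen((char *)v->map + skip, v->len - skip, "r");
    if (v->fp == NULL) {
        munmap(v->map, v->len);
        return -1;
    }
    return 0;
}

/* Release the stream first, then the mapping it reads from. */
void close_skipped(struct skipped_view *v)
{
    fclose(v->fp);
    munmap(v->map, v->len);
}
```

With this, fseek(v.fp, 0, SEEK_SET) lands on what is byte 40 of some.txt when skip is 40. As noted above, there is still no file descriptor behind v.fp.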
