Is there a simple way to insert something at the beginning of a text file using file streams? The only way I can think of is to load the file into a buffer, write the text to prepend, and then write the buffer back. I want to know if it is possible to do it without such a buffer.
No, it's not possible. You'd have to rewrite the file to insert text at the beginning.
EDIT: You could avoid reading the whole file into memory if you used a temporary file, i.e.:
1. Write the value you want inserted at the beginning to the new (temporary) file
2. Read X bytes from the old file
3. Write those X bytes to the new file
4. Repeat steps 2 and 3 until you are done reading the old file
5. Copy the new file to the old file
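A minimal sketch of those steps (error handling trimmed; the file names, chunk size, and helper name are placeholders, and the final "copy the new file to the old file" step is done here with remove() and rename(), which avoids copying the data a second time):

#include <stdio.h>

/* Prepend `header` to `oldname` by way of a temporary file. A sketch only:
   real code should check every I/O call. */
int prepend_via_temp(const char *oldname, const char *tmpname, const char *header)
{
    FILE *oldf = fopen(oldname, "rb");
    FILE *newf = fopen(tmpname, "wb");
    char buf[4096];
    size_t n;

    if (oldf == NULL || newf == NULL)
        return -1;

    fputs(header, newf);                               /* step 1 */
    while ((n = fread(buf, 1, sizeof buf, oldf)) > 0)  /* steps 2-4 */
        fwrite(buf, 1, n, newf);

    fclose(oldf);
    fclose(newf);

    remove(oldname);                                   /* step 5, via rename */
    return rename(tmpname, oldname);
}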
There is no simple way, because the actual operation is not simple. When the file is stored on the disk, there are no spare bytes available before the beginning of the file, so you can't just put data there. There isn't an ideal generic solution to this; usually it means copying all of the rest of the data to make room.
Thus, C makes you decide how you want to solve that problem.
Just wanted to counter some of the more absolute claims in here:
There is no way to append data to the beginning of a file.
Incorrect: there are ways, given certain constraints.
When the file is stored on the disk, there are no empty available bytes before the beginning of the file, so you can't just put data there.
This may be the case when dealing at the abstraction level of files as byte streams. However, file systems most often store files as a sequence of blocks, and some file systems allow a bit more free access at that level.
Linux 4.1+ (XFS) and 4.2+ (XFS, ext4) allow you to insert holes into files using fallocate() with the FALLOC_FL_INSERT_RANGE mode, given certain offset/length constraints:
Typically, offset and len must be a multiple of the filesystem logical block size, which varies according to the filesystem type and configuration.
Examples on StackExchange sites can be found by web searching for 'fallocate prepend to file'.
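A hedged, Linux-specific sketch of that approach (the file name and the 4096-byte length are assumptions; the length must be a multiple of the filesystem's logical block size, and on older glibc you may need <linux/falloc.h> for the flag):

#define _GNU_SOURCE
#include <fcntl.h>      /* open, fallocate, FALLOC_FL_INSERT_RANGE */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.bin", O_RDWR);       /* placeholder file name */
    if (fd < 0) { perror("open"); return 1; }

    /* Insert a 4096-byte hole at offset 0; 4096 is assumed to be a
       multiple of the filesystem's logical block size. */
    if (fallocate(fd, FALLOC_FL_INSERT_RANGE, 0, 4096) != 0) {
        perror("fallocate");                 /* fails on filesystems without support */
        return 1;
    }

    /* The new data can now be written into the hole at offset 0,
       e.g. with pwrite(fd, buf, len, 0). */
    close(fd);
    return 0;
}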
There is no way to append data to the beginning of a file.
The questioner also says that the only way they thought of solving the problem was by reading the whole file into memory and writing it out again. Here are other methods.
Write a placeholder of zeros of a known length. You can rewind the file handle and write over this data, so long as you do not exceed the placeholder size.
A simple example is writing space for an unsigned int at the start that represents the count of lines that will follow; you will not be able to fill it in until you have reached the end, at which point you rewind the file handle and write the correct value.
Note: some C implementations on some platforms insist that you finally position the file handle at the end of the file before closing it for this to work correctly.
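A small sketch of the placeholder idea, using a fixed-width text field instead of a raw unsigned int (the file name and record format are only illustrative):

#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("counts.txt", "w+");   /* placeholder file name */
    unsigned int lines = 0;
    if (fp == NULL) { perror("fopen"); return 1; }

    /* 1. write a fixed-width placeholder for the count */
    fprintf(fp, "%010u\n", 0u);

    /* 2. write the real data, counting lines as we go */
    for (unsigned int i = 0; i < 5; i++) {
        fprintf(fp, "data line %u\n", i);
        lines++;
    }

    /* 3. rewind and overwrite the placeholder, staying within its width */
    rewind(fp);
    fprintf(fp, "%010u", lines);

    /* 4. move back to the end before closing (some platforms require it) */
    fseek(fp, 0, SEEK_END);
    fclose(fp);
    return 0;
}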
Write the new data to a new file, and then use file streams to append the old data to the new file. Delete the old file and rename the new file to the old file's name. Do not use copy; it is a waste of time.
All methods have trade offs of disk size versus memory and CPU usage. It all depends on your application requirements.
Not strictly a solution, but if you're adding a short string to a long file opened for update (e.g. "r+b"), you can make a looping buffer of the same length as the text you want to add and sort of roll the extra characters out:
// the buffer also holds the text to insert; the terminating null byte
// provides the one spare slot needed at the beginning
char insRay[] = "[inserted text]";
printf("inserting:%s size of buffer:%zu\n", insRay, sizeof(insRay));
// indices to read into and write out of the buffer
size_t iRead = sizeof(insRay) - 1;
size_t iWrite = 0;
// save the input, so we only read once
int in = '\0';
do {
    in = fgetc(fp);
    if (in != EOF) {
        // preserve the original file char
        insRay[iRead] = (char)in;
        // loop the read index
        iRead++;
        if (iRead == sizeof(insRay))
            iRead = 0;
        // step back one byte so the write replaces the char just read;
        // a positioning call is also required between a read and a write
        fseek(fp, -1L, SEEK_CUR);
        // insert or replace chars
        fputc(insRay[iWrite], fp);
        // loop the write index
        iWrite++;
        if (iWrite == sizeof(insRay))
            iWrite = 0;
        // a positioning call is likewise required between a write and the next read
        fseek(fp, 0L, SEEK_CUR);
    }
} while (in != EOF);
// append what is left in the buffer to the end of the file,
// minus the slot that was originally the null terminator
for (size_t i = 0; i < sizeof(insRay) - 1; i++) {
    fputc(insRay[iWrite], fp);
    iWrite++;
    if (iWrite == sizeof(insRay))
        iWrite = 0;
}
Related
I have a file that I'd like to iterate over without doing any processing on the current line. What I am looking for is the best way to jump to a given line of a text file. For example, storing the current line in a variable seems useless until I get to the predetermined line.
Example :
file.txt
foo
fooo
fo
here
Normally, in order to get to here, I would have done something like:
FILE* file = fopen("file.txt", "r");
if (file == NULL)
    perror("Error when opening file ");
char currentLine[100];
while (fgets(currentLine, 100, file))
{
    if (strstr(currentLine, "here") != NULL)
        return currentLine;
}
But fgets will have to read three lines uselessly, and currentLine will have to store foo, fooo and fo.
Is there a better way to do this, knowing that here is line 4? Something like a goto, but for files?
Since you do not know the length of every line, no, you will have to go through the previous lines.
If you knew the length of every line, you could probably play with how many bytes to move the file pointer. You could do that with fseek().
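For example, if every line were a fixed length (an assumption; the question does not guarantee it), jumping to a line becomes a single fseek:

#include <stdio.h>

/* Sketch: assumes every line is exactly LINE_LEN bytes plus a '\n'. */
#define LINE_LEN 20

int read_fixed_line(FILE *fp, long lineno, char *buf, size_t bufsize)
{
    long offset = (lineno - 1) * (LINE_LEN + 1);   /* +1 for the newline */
    if (fseek(fp, offset, SEEK_SET) != 0)
        return -1;
    return fgets(buf, (int)bufsize, fp) != NULL ? 0 : -1;
}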
You cannot directly access a given line of a textual file (unless all lines have the same size in bytes; with UTF-8 everywhere, a Unicode character can take a variable number of bytes, 1 to 4, and in most cases lines have various lengths, different from one line to the next). So you cannot use fseek, because you don't know the file offset in advance.
However (at least on Linux systems), lines are ending with \n (the newline character). So you could read byte by byte and count them:
int c = EOF;
int linecount = 1;
while ((c = fgetc(file)) != EOF) {
    if (c == '\n')
        linecount++;
}
You then don't need to store the entire line.
So you could reach line #45 this way (using while ((c = fgetc(file)) != EOF && linecount < 45) ...) and only then read entire lines with fgets or, better yet, getline(3) on POSIX systems (see this example). Notice that the implementation of fgets or of getline is likely to be built above fgetc, or at least to share some code with it. Remember that <stdio.h> is buffered I/O; see setvbuf(3) and related functions.
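Wrapped up as a helper, that skip loop might look like this (a sketch; it leaves the stream positioned at the start of the requested line, ready for fgets or getline):

#include <stdio.h>

/* Advance fp so the next read starts at line `target` (1-based).
   Returns 0 on success, -1 if the file has fewer lines. */
int skip_to_line(FILE *fp, int target)
{
    int c, linecount = 1;
    while (linecount < target && (c = fgetc(fp)) != EOF) {
        if (c == '\n')
            linecount++;
    }
    return linecount == target ? 0 : -1;
}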
Another way would be to read the file in two passes. A first pass stores the offset (using ftell(3)...) of every line start in some efficient data structure (a vector, a hash table, a tree...). A second pass uses that data structure to retrieve the offset of the line start, then uses fseek(3) with that offset.
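A sketch of that two-pass approach with a plain array of offsets (the array size is an arbitrary assumption; real code might grow it dynamically or use a fancier structure):

#include <stdio.h>

#define MAX_LINES 100000   /* assumption: enough lines for the file at hand */

long offsets[MAX_LINES];

/* Pass 1: record the offset of the start of each line. Returns the line count.
   Note: a trailing newline records one extra, empty "line". */
long index_lines(FILE *fp)
{
    long count = 0;
    int c;
    offsets[count++] = ftell(fp);           /* line 1 starts here */
    while ((c = fgetc(fp)) != EOF && count < MAX_LINES) {
        if (c == '\n')
            offsets[count++] = ftell(fp);   /* the next line starts after the '\n' */
    }
    return count;
}

/* Pass 2: jump straight to line `n` (1-based) and read it. */
int read_line(FILE *fp, long n, char *buf, int bufsize)
{
    if (fseek(fp, offsets[n - 1], SEEK_SET) != 0)
        return -1;
    return fgets(buf, bufsize, fp) != NULL ? 0 : -1;
}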
A third way, POSIX specific, would be to memory-map the file using mmap(2) into your virtual address space (this works well for not too huge files, e.g. of less than a few gigabytes). With care (you might need to mmap an extra ending page, to ensure the data is zero-byte terminated) you would then be able to use strchr(3) with '\n'.
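A rough sketch of the mmap route (POSIX-specific; memchr is used here instead of strchr so the mapping does not need to be zero-byte terminated):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("file.txt", O_RDONLY);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) != 0) { perror("open/fstat"); return 1; }

    char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }

    /* walk to the start of line 4 by skipping three newlines */
    const char *p = data, *end = data + st.st_size;
    for (int line = 1; line < 4 && p != NULL; line++) {
        p = memchr(p, '\n', end - p);
        if (p != NULL)
            p++;                        /* step past the '\n' to the next line */
    }

    if (p != NULL) {
        const char *nl = memchr(p, '\n', end - p);
        printf("line 4: %.*s\n", (int)((nl ? nl : end) - p), p);
    }

    munmap(data, st.st_size);
    close(fd);
    return 0;
}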
In some cases, you might consider parsing your textual file line by line (using appropriately fgets, or -on Linux- getline, or generating your parser with flex and bison) and storing each line in a relational database (such as PostGreSQL or sqlite).
PS. BTW, the notion of lines (and the end-of-line mark) vary from one OS to the next. On Linux the end-of-line is a \n character. On Windows lines are rumored to end with \r\n, etc...
A FILE * in C is a stream of chars. In a seekable file, you can address these chars using the file pointer with fseek(). But apart from that, there are no "special characters" in files, a newline is just another normal character.
So in short, no, you can't jump directly to a line of a text file, as long as you don't know the lengths of the lines in advance.
This model in C corresponds to the files provided by typical operating systems. If you think about it, to know the starting points of individual lines, your file system would have to store this information somewhere. This would mean treating text files specially.
What you can do however is just count the lines instead of pattern matching, something like this:
#include <stdio.h>

int main(void)
{
    char linebuf[1024];
    FILE *input = fopen("seekline.c", "r");
    int lineno = 0;
    char *line;

    while ((line = fgets(linebuf, 1024, input)))
    {
        ++lineno;
        if (lineno == 4)
        {
            fputs("4: ", stdout);
            fputs(line, stdout);
            break;
        }
    }
    fclose(input);
    return 0;
}
If you don't know the length of each line, you have to go through all of them. But if you know the line at which you want to stop, you can do this:
int count = 1;          /* current line number */
int lineNumber = 4;     /* the line you want to reach */
bool found = false;     /* needs <stdbool.h> */
char line[100];

while (!found && fgets(line, sizeof line, file) != NULL) /* read a line */
{
    if (count == lineNumber)
    {
        //you arrived at the line
        //in case of a return, first close the file with "fclose(file);"
        found = true;
    }
    else
    {
        count++;
    }
}
At least you can avoid so many calls to strstr
This code correctly reads a file line-by-line, stores each line in line[] and prints it.
int beargit_add(const char* filename) {
    FILE* findex = fopen(".beargit/.index", "r");
    char line[FILENAME_SIZE];

    while (fgets(line, sizeof(line), findex)) {
        strtok(line, "\n");
        fprintf(stdout, "%s\n", line);
    }

    fclose(findex);
    return 0;
}
However, I am baffled as to why using fgets() in the while loop actually reads the file line-by-line. I am brand new to C after having learned Python and Java.
Since each call to fgets() is independent, where is C remembering which line it is currently on each time you call it? I thought it might have to do with changing the value FILE* findex points to, but you are passing the pointer into fgets() by value, so it could not be modified.
Any help in understanding the magic of C is greatly appreciated!
It's not fgets that keeps track; findex does this for you.
findex is a FILE pointer. A FILE holds information about an open file, such as its file descriptor and file offset.
FILE is an encapsulation of the operating system's I/O. More about FILE:
Object type that identifies a stream and contains the information needed to control it, including a pointer to its buffer, its position indicator and all its state indicators.
The offset is what keeps track of where you are in the file: each time you read, the read starts from that offset. All of this work is done by FILE; it does it for you, and for fgets.
More info about offsets: offset (wiki)
I thought it might have to do with changing the value FILE* findex points to
it's not the value of the pointer itself that is changed. The pointer points to a structure (of type FILE) which contains information about the stream associated with that structure/pointer. Presumably, one of the members of that structure contains a "cursor" that points to the next byte to be read.
Or maybe it's just got a file descriptor int (like on many Unices) and I/O functions just call out to the kernel in order to obtain information about the file descriptor, including the current position.
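One way to see that bookkeeping in action is to print ftell() around the fgets() calls: the pointer value never changes, but the position stored inside the FILE object it points to does (a small standalone demo, not part of the original program):

#include <stdio.h>

int main(void)
{
    FILE *fp = fopen(".beargit/.index", "r");   /* same file as above, just for the demo */
    char line[256];
    if (fp == NULL) { perror("fopen"); return 1; }

    while (fgets(line, sizeof(line), fp)) {
        /* fp still points to the same FILE object, but the position
           stored inside that object has moved past the line just read */
        printf("now at offset %ld after reading: %s", ftell(fp), line);
    }
    fclose(fp);
    return 0;
}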
platform: windows
o/s: XP sp3
compiler: gcc v4.8.1
text editor: notepad
encoding: ansi
question: How can I retrieve the actual file size in text mode so that I can set a buffer size exactly?
char *filename = "functions.txt";
FILE *source = fopen(filename,"r");
struct stat properites;
stat(filename,&properties);
long size_stat = properties.st_size;
fseek(source,0,SEEK_END);
long size_ftell = ftell(source);
fseek(source,0,SEEK_SET);
char *pchar_source = malloc(sizeof(char)*size_stat);
long size_read = fread(pchar_source,sizeof(char),size_stat,source);
functions.txt
tokenize(String string, Character delimiter) String[]
{
}
output
file size-stat [70]
file size-ftell [70]
file size-fread [67]
For small files the difference is negligible; however, for larger files this means unnecessary memory allocation. Any suggestions?
One possible solution:
long fileSize = 0;
while (getc(source) != EOF)
{
    fileSize++;
}
However, this is very wasteful and time consuming for large files.
ftell gives you the correct size in bytes. As others noted, it is because you have three line endings encoded as \r\n. When you open in text mode on Windows, they get converted to \n, thus you read three chars less.
There are two options I see:
Use ftell as an estimate for the size of the buffer, but then, after the fread, use size_read in the rest of your code for the buffer size. You will just waste number-of-lines bytes of memory, which is not a big deal.
Open the file in binary mode rb. You will get a size of 70, but also fread will return 70 bytes. Then write your code with the understanding that line endings might be \r, \n, or \r\n.
From the above two I really recommend the 2nd option: it gives a more robust and portable program, and the notion of binary mode is less confusing than the platform dependent text-mode.
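A sketch of the second option, with the file from the question (binary mode, so the ftell size and the fread count agree):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *source = fopen("functions.txt", "rb");   /* binary mode: no CRLF translation */
    if (source == NULL) { perror("fopen"); return 1; }

    fseek(source, 0, SEEK_END);
    long size = ftell(source);              /* 70 in the example above */
    fseek(source, 0, SEEK_SET);

    char *buf = malloc(size);
    size_t n = fread(buf, 1, size, source); /* also 70: every byte, \r\n intact */
    printf("allocated %ld bytes, read %zu bytes\n", size, n);

    /* the caller must now treat \r, \n, or \r\n as a line ending */
    free(buf);
    fclose(source);
    return 0;
}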
If the "size" of the file is to be given in units that depend on the file's contents, then accurately determining that size requires scanning the entire file.
That is precisely the situation for any file opened in text mode on Windows (because physical "\r\n" is treated as a single logical unit). It is also the case if the file content is encoded in some way, and you want the count of decoded units. That's not as unlikely as it may sound, as it arises quite frequently with character encodings, such as (21-bit) Unicode characters encoded as a UTF-8 byte stream.
As far as creating a buffer to hold the whole file content,
If you have to worry about large files, then do everything possible to avoid creating such a buffer in the first place. Ideally, you would process the file in a streaming mode, so that you don't have to keep much of it in memory at any one time.
If you must create such a buffer, then consider a buffer consisting of a linked list of smallish blocks (say 4 - 32k), so that you can extend the buffer as needed without realloc() (e.g. as needed while reading the file).
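A bare-bones sketch of that chunked buffer (the block size and names are arbitrary choices, and error handling is minimal):

#include <stdio.h>
#include <stdlib.h>

#define CHUNK_SIZE (16 * 1024)   /* arbitrary: somewhere in the 4k - 32k range */

struct chunk {
    size_t used;                 /* bytes filled in data[] */
    struct chunk *next;
    char data[CHUNK_SIZE];
};

/* Read the whole stream into a list of chunks; no realloc, no large copy. */
struct chunk *slurp_chunks(FILE *fp)
{
    struct chunk *head = NULL, *tail = NULL;
    for (;;) {
        struct chunk *c = malloc(sizeof *c);
        if (c == NULL)
            break;                          /* sketch: a real version should clean up */
        c->used = fread(c->data, 1, CHUNK_SIZE, fp);
        c->next = NULL;
        if (c->used == 0) {                 /* EOF (or error) with nothing read */
            free(c);
            break;
        }
        if (tail == NULL)
            head = c;
        else
            tail->next = c;
        tail = c;
    }
    return head;
}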
Answer:
No, you can't.
The actual file size, and not an "estimate", is only available after a complete read. This is due to conversions (if any) of newlines and encoding types. For those wondering, here is the "proper" way of determining the actual file size:
char *filename = "sample.txt";
FILE *file_source = fopen(filename,"r"); // can be set to either "r" or "rb"
// use stat.st_size if you have the library <sys/stat.h>
struct stat stat_sourceFile;
stat(filename,&stat_sourceFile);
long long_fileSize_stat = stat_sourceFile.st_size; // estimate only
// use fseek,ftell,fseek if you dont have the lib <sys/stat.h>
fseek(file_source,0,SEEK_END);
long long_fileSize_ftell = ftell(file_source); // estimate only
fseek(file_source,0,SEEK_SET);
char *pchar_source = malloc(sizeof(char)*long_fileSize_stat);
long long_ACTUAL_FILE_SIZE = fread(pchar_source,sizeof(char),long_fileSize_stat,file_source);
realloc(pchar_source,long_ACTUAL_FILE_SIZE);
// now when we pass the pointer/array size to ANY function/method, you WONT
// get those funny characters not part of your file at the end of your
// printf statements. also, instead of using long_ACTUAL_FILE_SIZE as
// the bounds for iteration, you could use strlen(pchar_source)
Hope this helps others new to C and file buffering.
I have no experience with fscanf() and very little with functions for FILE. I have code that correctly determines if a client requested an existing file (using stat() and it also ensures it is not a directory). I will omit this part because it is working fine.
My goal is to send a string back to the client with an HTTP header (a string) and the correctly read data, which I would imagine has to become a string at some point to be concatenated with the header for sending back. I know that + is not valid C, but for simplicity I would like to send this: headerString+dataString.
The code below does seem to work for text files but not images. I was hoping that reading each character individually would solve the problem but it does not. When I point a browser (Firefox) at my server looking for an image it tells me "The image (the name of the image) cannot be displayed because it contains errors.".
This is the code that is supposed to read a file into httpData:
int i = 0;
FILE* file;
file = fopen(fullPath, "r");
if (file == NULL) errorMessageExit("Failed to open file");
while (!feof(file)) {
    fscanf(file, "%c", &httpData[i]);
    i++;
}
fclose(file);
printf("httpData = %s\n", httpData);
Edit: This is what I send:
char* httpResponse = malloc((strlen(httpHeader)+strlen(httpData)+1)*sizeof(char));
strcpy(httpResponse, httpHeader);
strcat(httpResponse, httpData);
printf("HTTP response = %s\n", httpResponse);
The data part produces ???? for the image but correct html for an html file.
Images contain binary data. Any of the 256 distinct 8-bit patterns may appear in the image including, in particular, the null byte, 0x00 or '\0'. On some systems (notably Windows), you need to distinguish between text files and binary files, using the letter b in the standard I/O fopen() call (works fine on Unix as well as Windows). Given that binary data can contain null bytes, you can't use strcpy() et al to copy chunks of data around since the str*() functions stop copying at the first null byte. Therefore, you have to use the mem*() functions which take a start position and a length, or an equivalent.
Applied to your code, printing the binary httpData with %s won't work properly; the %s will stop at the first null byte. Since you have used stat() to verify the existence of the file, you also have a size for the file. Assuming you don't have to deal with dynamically changing files, that means you can allocate httpData to be the correct size. You can also pass the size to the reading code. This also means that the reading code can use fread() and the writing code can use fwrite(), saving on character-by-character I/O.
Thus, we might have a function:
int readHTTPData(const char *filename, size_t size, char *httpData)
{
    FILE *fp = fopen(filename, "rb");
    size_t n;
    if (fp == 0)
        return E_FILEOPEN;
    n = fread(httpData, size, 1, fp);
    fclose(fp);
    if (n != 1)
        return E_SHORTREAD;
    fputs("httpData = ", stdout);
    fwrite(httpData, size, 1, stdout);
    putchar('\n');
    return 0;
}
The function returns 0 on success, and some predefined (negative?) error numbers on failure. Since memory allocation is done before the routine is called, it is pretty simple:
Open the file; report error if that fails.
Read the file in a single operation.
Close the file.
Report error if the read did not get all the data that was expected.
Report on the data that was read (debugging only — and printing binary data to standard output raw is not the best idea in the world, but it parallels what the code in the question does).
Report on success.
In the original code, there is a loop:
int i = 0;
...
while (!feof(file)) {
    fscanf(file, "%c", &httpData[i]);
    i++;
}
This loop has a lot of problems:
You should not use feof() to test whether there is more data to read. It reports whether an EOF indication has been given, not whether it will be given.
Consequently, when the last character has been read, the feof() reports 'false', but the fscanf() tries to read the next (non-existent) character, adds it to the buffer (probably as a letter such as ÿ, y-umlaut, 0xFF, U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS).
The code makes no check on how many characters have been read, so it has no protection against buffer overflow.
Using fscanf() to read a single character is a lot of overhead compared to getc().
Here's a more nearly correct version of the code, assuming that size is the number of bytes allocated to httpData.
int i = 0;
int c;

while ((c = getc(file)) != EOF && i < size)
    httpData[i++] = c;
You could check that you get EOF when you expect it. Note that the fread() code does the size checking inside the fread() function. Also, the way I wrote the arguments, it is an all-or-nothing proposition — either all size bytes are read or everything is treated as missing. If you want byte counts and are willing to tolerate or handle short reads, you can reverse the order of the size arguments. You could also check the return from fwrite() if you wanted to be sure it was all written, but people tend to be less careful about checking that output succeeded. (It is almost always crucial to check that you got the input you expected, though — don't skimp on input checking.)
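For instance, with the size arguments reversed, a short read shows up as a byte count instead of as 0 (a small variant, not a change to the function above):

size_t n = fread(httpData, 1, size, fp);   /* element size 1: n is the byte count */
if (n < size) {
    /* short read: decide here whether to tolerate it or treat it as an error */
}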
At some point, for plain text data, you need to think about CRLF vs NL line endings. Text files handle that automatically; binary files do not. If the data to be transferred is image/png or something similar, you probably don't need to worry about this. If you're on Unix and dealing with text/plain, you may have to worry about CRLF line endings (but I'm not an expert on this — I've not done low-level HTTP stuff recently (not in this millennium), so the rules may have changed).
Can you set any index of the array as the starting index, i.e., the place where the data read from the file gets stored? I was afraid the buffer might get corrupted in the process.
#include <stdio.h>

int main()
{
    FILE *f = fopen("C:\\dummy.txt", "rt");
    char lines[30];          // large enough array depending on file size
    fpos_t index = 0;

    while (fgets(&lines[index], 10, f))   // line limit is 10 characters
    {
        fgetpos(f, &index);
    }
    fclose(f);
}
You can, but since your code is trying to read the full contents of the file, you can do that much more directly with fread:
char lines[30];
// Will read as much of the file as can fit into lines:
fread(lines, sizeof(*lines), sizeof(lines) / sizeof(*lines), f);
That said, if you really wanted to read line by line and do it safely, you should change your fgets line to:
// As long as index < sizeof(lines), guaranteed not to overflow buffer
fgets(&lines[index], sizeof(lines) - index, f);
Not like this, no. There is a function called fseek that will take you to a different location in the file.
Your code will read the file into a different part of the buffer (rather than reading a different part of the file).
lines[index] is the index'th character of the array lines. Its address is not the index'th line.
If you want to skip to a particular line, say 5, then in order to read the 5th line, read 4 lines and do nothing with them, them read the next line and do something with it.
If you need to skip to a particular BYTE within a file, then what you want to use is fseek().
Also: be careful that the number of bytes you tell fgets to read (10) matches the size of the array you are putting the line into (30); that is not the case right now.
If you need to read a part of a line starting from a certain character within that line, you still need to read the whole line, then just choose to use a chunk of it starting someplace other than the beginning.
Both of these examples are like requesting a part of a document from a website or a library - they're not going to tear out a page for you, you get the whole document, and you have to flip to what you want.