How do I truncate a file to size N in OCaml?
I don't see a function in Pervasives. The closest thing is the Open_trunc flag, but I'm not really sure what it does.
If you're under Unix, you can use
val Unix.truncate : string -> int -> unit
It "runcates the named file to the given size".
But this function is not implemented in the Windows version of OCaml (or more precisely, it is not emulated).
If you're under Windows and want to emulate it, you might be interested in
val really_input : in_channel -> string -> int -> int -> unit
"really_input ic buf pos len reads len characters from channel ic, storing them in string buf, starting at character number pos. Raise End_of_file if the end of file is reached before len characters have been read. Raise Invalid_argument "really_input" if pos and len do not designate a valid substring of buf."
I think you are correct.
Open_trunc - Open the named file for writing, and return a new output channel on that file, positioned at the beginning of the file. The file is truncated to zero length if it already exists. It is created if it does not already exist. Raise Sys_error if the file could not be opened.
Refer to this link as well.
The OS capabilities of Pervasives correspond roughly to what you can do in standard C. There is no function to truncate a file to a specified length in standard C; all you can do is truncate a file to be empty when you open it (exposed through the Open_trunc flag). There is one in Unix/POSIX (truncate), so look in the Unix module, which does have a truncate function (or ftruncate for an open file, again following Unix/POSIX).
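For reference, these are the POSIX calls that Unix.truncate and Unix.ftruncate wrap; a minimal C sketch (the file name and sizes are just examples):

#include <stdio.h>     /* perror */
#include <fcntl.h>     /* open */
#include <unistd.h>    /* truncate, ftruncate, close */

int main(void)
{
    /* Truncate a file by name to 100 bytes. */
    if (truncate("data.bin", 100) != 0)
        perror("truncate");

    /* Or truncate an already-open file descriptor. */
    int fd = open("data.bin", O_WRONLY);
    if (fd >= 0) {
        if (ftruncate(fd, 50) != 0)
            perror("ftruncate");
        close(fd);
    }
    return 0;
}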
Related
What translation occurs when writing to a file that was opened in text mode that does not occur in binary mode? Specifically in MS Visual C.
unsigned char buffer[256];
for (int i = 0; i < 256; i++) buffer[i] = i;
int size = 1;
int count = 256;
Binary mode:
FILE *fp_binary = fopen(filename, "wb");
fwrite(buffer, size, count, fp_binary);
Versus text mode:
FILE *fp_text = fopen(filename, "wt");
fwrite(buffer, size, count, fp_text);
I believe that most platforms will ignore the "t" (text-mode) option when dealing with streams. On Windows, however, this is not the case. If you take a look at the description of the fopen() function on MSDN, you will see that specifying the "t" option has the following effects:
line feeds ('\n') will be translated to "\r\n" sequences on output
carriage return/line feed sequences will be translated to line feeds on input.
If the file is opened in append mode, the end of the file will be examined for a Ctrl-Z character (character 26) and that character removed, if possible. The presence of that character will also be interpreted as the end of file. This is an unfortunate holdover from the days of CP/M (something about the sins of the parents being visited upon their children up to the 3rd or 4th generation). Contrary to previously stated opinion, the Ctrl-Z character will not be appended.
In text mode, a newline "\n" may be converted to a carriage return + newline "\r\n"
Usually you'll want to open in binary mode. Trying to read binary data in text mode won't work: it will be corrupted. You can read text fine in binary mode, though - it just won't do the automatic translation of "\n" to "\r\n".
See fopen
Additionally, when you fopen a file with "rt", the input is terminated at a Ctrl-Z character.
Another difference arises when using fseek:
If the stream is open in binary mode, the new position is exactly offset bytes measured from the beginning of the file if origin is SEEK_SET, from the current file position if origin is SEEK_CUR, and from the end of the file if origin is SEEK_END. Some binary streams may not support SEEK_END.
If the stream is open in text mode, the only supported values for offset are zero (which works with any origin) and a value returned by an earlier call to std::ftell on a stream associated with the same file (which only works with an origin of SEEK_SET).
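In other words, to reposition a text stream portably you can only seek to offset zero or back to a value you previously got from ftell. A minimal C sketch of that pattern (test.txt is just an example name):

#include <stdio.h>

int main(void)
{
    FILE *f = fopen("test.txt", "r");        /* text mode */
    if (!f) return 1;

    int c;
    while ((c = getc(f)) != EOF && c != '\n')
        ;                                    /* skip the first line */

    long pos = ftell(f);                     /* remember this spot */
    /* ... read further ... */

    fseek(f, pos, SEEK_SET);                 /* OK: pos came from ftell */
    fseek(f, 0L, SEEK_SET);                  /* OK: offset zero */

    fclose(f);
    return 0;
}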
Even though this question was already answered and clearly explained, I think it would be interesting to show the main issue (translation between \n and \r\n) with a simple code example. Note that I'm not addressing the issue of the Ctrl-Z character at the end of the file.
#include <stdio.h>
#include <string.h>

int main() {
    FILE *f;
    char string[] = "A\nB";
    int len;

    len = strlen(string);
    printf("As you'd expect string has %d characters... ", len);  /* prints 3 */
    f = fopen("test.txt", "w");                                   /* text mode */
    fwrite(string, 1, len, f);                       /* on Windows "A\r\nB" is written */
    printf("but %ld bytes were written to file", ftell(f));       /* prints 4 on Windows, 3 on Linux */
    fclose(f);
    return 0;
}
If you execute the program on Windows, you will see the following message printed:
As you'd expect string has 3 characters... but 4 bytes were written to file
Of course you can also open the file with a text editor like Notepad++ and see the characters yourself.
The inverse conversion is performed on Windows when reading the file in text mode.
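To see that inverse translation, read the file from the example above back in text mode (a minimal sketch):

#include <stdio.h>

int main(void)
{
    char buf[16] = {0};
    FILE *f = fopen("test.txt", "r");        /* text mode */
    if (!f) return 1;

    size_t n = fread(buf, 1, sizeof buf - 1, f);
    /* On Windows the "\r\n" stored on disk comes back as a single '\n',
       so n is 3 ("A\nB") even though the file holds 4 bytes. */
    printf("read %zu characters\n", n);
    fclose(f);
    return 0;
}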
We had an interesting problem with opening files in text mode where the files had a mixture of line ending characters:
1\n\r
2\n\r
3\n
4\n\r
5\n\r
Our requirement was that we could store our current position in the file (we used fgetpos), close the file, and later reopen the file and seek to that position (we used fsetpos).
However, when a file had a mixture of line endings, this process failed to seek back to the same position. In our case (our tool parses C++), we were re-reading parts of the file we'd already seen.
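For reference, the save-and-restore pattern described above looks roughly like this (a minimal sketch; source.cpp is just an example name):

#include <stdio.h>

int main(void)
{
    fpos_t pos;
    FILE *f = fopen("source.cpp", "r");      /* text mode */
    if (!f) return 1;

    /* ... read up to some point ... */
    if (fgetpos(f, &pos) != 0) { fclose(f); return 1; }
    fclose(f);

    /* Later: reopen and resume from the saved position. */
    f = fopen("source.cpp", "r");
    if (!f) return 1;
    if (fsetpos(f, &pos) != 0) { fclose(f); return 1; }
    /* ... continue reading ... */
    fclose(f);
    return 0;
}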
Go with binary - then you can control exactly what is read and written from the file.
In 'w' mode, the file is opened in plain write mode and the default encoding is typically 'utf-8'.
In 'wb' mode, the file is opened in write-binary mode; you are responsible for writing any special characters yourself, and the encoding may be 'utf-16le' or anything else.
I made a minimal example for the PackCC parser generator.
Here, the parser has to recognize float or integer numbers.
I try to print the location of each detected number. For simplicity there is no
line/column count, just the value from "ftell".
%auxil "FILE*" # The type sent to "pcc_create" for access in "ftell".
test <- line+
/
_ EOL+
line <- num _ EOL
num <- [0-9]+'.'[0-9]+ {printf("Float at %li\n", ftell(auxil));}
/
[0-9]+ {printf("Integer at %li\n", ftell(auxil));}
_ <- [ \t]*
EOL <- '\n' / '\r\n' / '\r'
%%
int main()
{
    FILE* file = fopen("test.txt", "r");
    if(file == NULL) {
        // could not open the file.
        puts("File not found");
    }
    else {
        // redirect stdin to the file (not guaranteed by the C standard, but works with glibc).
        stdin = file;
        // parse.
        pcc_context_t *ctx = pcc_create(file);
        while(pcc_parse(ctx, NULL));
        pcc_destroy(ctx);
    }
    return 0;
}
The file to parse can be
2.0
42
The command can be
packcc test.peg && cc test.c && ./a.out
The problem is that the cursor value is always at the end of the file, whatever the number's position.
Positions can be retrieved with special variables.
In the example above, "ftell" must be replaced by "$0s" or "$0e".
$0s is the beginning of the matched pattern, $0e is the end of the matched pattern.
https://github.com/arithy/packcc/blob/master/README.md
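For example, the num rule from the question could become (a sketch based on the README above; the cast to unsigned long is only there to keep the printf format portable):

num <- [0-9]+'.'[0-9]+ {printf("Float at %lu\n", (unsigned long)$0s);}
     / [0-9]+ {printf("Integer at %lu\n", (unsigned long)$0s);}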
Without looking more closely at the generated code, it would seem that the parser insists on reading the entire text into memory before executing any of the actions. That seems unnecessary for this grammar, and it is certainly not the way a typical generated lexical scanner would work. It's particularly odd since it seems like the generated scanner uses getchar to read one byte at a time, which is not very efficient if you are planning to read the entire file.
To be fair, you wouldn't be able to use ftell in a flex-generated scanner either, unless you forced the scanner into interactive mode. (The original AT&T lex, which also reads one character at a time, would give you a reasonable value from ftell. But you're unlikely to find a scanner built with it anymore.)
Flex would give you the wrong answer because it deliberately reads its input in chunks the size of its buffer, usually 8k. That's a lot more efficient than character-at-a-time reading. But it doesn't work for interactive environments -- for example, where you are parsing directly from user input -- because you don't want to read beyond the end of the line the user typed.
You'll have to ask whoever maintains packcc what their intended approach for maintaining source position is. It's possible that they have something built in.
This function, called histogram, prints the length of each word as a row of '*' characters. How can I save the results into a text file? I tried, but the program does not save the results (no errors).
#include <stdio.h>
#include <string.h>

void histogram(FILE *myinput)
{
    FILE *ptr;
    printf("\nsaving results...\n");
    ptr = fopen("results1.txt", "wt");
    int j, n = 1, i = 0;
    size_t ln;
    char arr[100][10];
    while(n > 0)
    {
        n = fscanf(myinput, "%s", arr[i]);
        i++;
    }
    n = i;
    for(i = 0; i < n - 1; i++)
    {
        ln = strlen(arr[i]);
        fprintf(ptr, "%s \t", arr[i]);
        for(j = 0; j < ln; j++)
            fprintf(ptr, "*");
        fprintf(ptr, "\n");
    }
    fclose(myinput);
    fclose(ptr);
}
I see two ways to take care of this issue:
Open a file in the program and write to it.
If running from the command line, change the output location for standard out:
$> ./histogram > outfile.txt
Using '>' changes where standard out writes to. The issue with '>' is that it truncates the file and then writes to it. This means that if there was any data in that file before, it is gone; only the new data written by the program will be there.
If you need to keep the data in the file, you can change the standard out to append the file with '>>' as in the following example:
$> ./histogram >> outfile.txt
Also, there does not have to be a space between '>' and the file name. I just do that for preference. It could look like this:
$> ./histogram >outfile.txt
If writing to a file will be a one-time thing, changing standard out is probably the best way to go. If you are going to do it every time, then add it to the code.
You will need to open another FILE. You can do this in the function, or pass it in like you did with the file being read from.
Use 'fprintf' to write to the file:
int fprintf(FILE *restrict stream, const char *restrict format, ...);
Your program may have these lines added to write to a file:
FILE *myoutput = fopen("output.txt", "w"); // or "a" if you want to append
fprintf(myoutput, "%s \t", arr[i]);
Answer Complete
There may be some other issues as well that I will discuss now.
Your histogram function does not have a return type. C will assume 'int' automatically and then complain that you do not return a value from the function. From what you have provided, I would add 'void' before the function name.
void histogram(FILE *myinput) {
The size of arr's second dimension may be too small. It assumes that no token in the file you are reading exceeds 10 characters, including the null terminator [\0] at the end of the string. This means there can be at most 9 characters in a string; otherwise you are going to overflow the array and potentially corrupt your data.
Edit
The above was written before a change to the provided code that now includes a second file and fprintf statements.
I will point to the line that opens the output file:
ptr=fopen("results1.txt","wt");
I am wondering if you meant to put "w+", where the second character is a plus symbol. According to the man page there are six possibilities:
The argument mode points to a string beginning with one of the
following sequences (possibly followed by additional characters, as
described below):
r Open text file for reading. The stream is positioned at the
beginning of the file.
r+ Open for reading and writing. The stream is positioned at the
beginning of the file.
w Truncate file to zero length or create text file for writing.
The stream is positioned at the beginning of the file.
w+ Open for reading and writing. The file is created if it does
not exist, otherwise it is truncated. The stream is
positioned at the beginning of the file.
a Open for appending (writing at end of file). The file is
created if it does not exist. The stream is positioned at the
end of the file.
a+ Open for reading and appending (writing at end of file). The
file is created if it does not exist. The initial file
position for reading is at the beginning of the file, but
output is always appended to the end of the file.
As such, it appears you are attempting to open the file for reading and writing.
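Putting the pieces together, here is a minimal sketch of the function with the output file opened in plain "w" mode and a NULL check added (file names are the ones from the question; the 100-word and 31-character limits are my own assumptions):

#include <stdio.h>
#include <string.h>

void histogram(FILE *myinput)
{
    FILE *ptr = fopen("results1.txt", "w");   /* plain write mode */
    if (ptr == NULL) {
        perror("results1.txt");
        return;
    }

    char arr[100][32];
    int n = 0;

    /* Read at most 100 words of at most 31 characters each. */
    while (n < 100 && fscanf(myinput, "%31s", arr[n]) == 1)
        n++;

    /* One line per word: the word, a tab, then one '*' per character. */
    for (int i = 0; i < n; i++) {
        fprintf(ptr, "%s \t", arr[i]);
        for (size_t j = 0; j < strlen(arr[i]); j++)
            fprintf(ptr, "*");
        fprintf(ptr, "\n");
    }

    fclose(myinput);
    fclose(ptr);
}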
I have an IDL routine that reads a binary data file. However, on this occasion, I'm getting "READU: End of file encountered. Unit 2, File: data.dat".
Rather than destroying the binary file and re-creating it, is this problem surmountable? What IDL code could I use to read the binary file? The binary file was created by a C function.
Thanks in advance.
Based on the question, I'm assuming the binary file has a defined structure. You can probably use fstat() and eof() to get around this. For example:
openr, lun, 'file.bin', /get_lun
fs = fstat(lun)
len = fs.size / n_bytes_in_data_structure
for i = 0L, len - 1 do begin
  readu, lun, var
  ...
endfor
free_lun, lun          ; release the logical unit when done
If you don't know the size of your data structures or if you want to check that there's a sufficient number of bytes before a read, you can use fs.cur_ptr (after a call to fstat(), of course) or eof(lun).
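The same idea in C, for comparison (a minimal sketch; the record layout and file name are just assumptions):

#include <stdio.h>

struct record { double x; double y; };       /* hypothetical record layout */

int main(void)
{
    FILE *f = fopen("data.dat", "rb");
    if (!f) return 1;

    fseek(f, 0L, SEEK_END);
    long size = ftell(f);                     /* total bytes in the file */
    rewind(f);

    long count = size / (long)sizeof(struct record);
    struct record r;
    for (long i = 0; i < count; i++) {
        if (fread(&r, sizeof r, 1, f) != 1)
            break;                            /* stop if fewer bytes remain */
        /* ... use r ... */
    }
    fclose(f);
    return 0;
}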