Ada - reading large files

I'm building an HTTP server, mostly for learning and curiosity, and I've run into a problem I've never had in Ada before. If I try to read files that are too big using Direct_IO, I get a Storage_Error (stack overflow) exception. This almost never happens, but it is raised when I request a video file.
So I got the idea to read and send files in chunks of 1M characters at a time, but this leaves me with End_Error exceptions, since most files aren't going to be an exact multiple of 1M characters in length. I'm also not entirely sure I did it right anyway, since reading the whole file has always been sufficient before. Here is the procedure that I've written:
procedure Send_File(Channel : GNAT.Sockets.Stream_Access; Filepath : String) is
   File_Size : Natural := Natural(Ada.Directories.Size (Filepath));
   subtype Meg_String is String(1 .. 1048576);
   package Meg_String_IO is new Ada.Direct_IO(Meg_String);
   Meg : Meg_String;
   File : Meg_String_IO.File_Type;
   Count : Natural := 0;
begin
   loop
      Meg_String_IO.Open(File, Mode => Meg_String_IO.In_File, Name => Filepath);
      Meg_String_IO.Read(File, Item => Meg);
      Meg_String_IO.Close(File);
      String'Write(Channel, Meg);
      exit when Count >= File_Size;
      Count := Count + 1048576;
   end loop;
end Send_File;
I had the thought to declare two separate Direct_IO packages/string sizes, where one would be 1048576 in length while the other would be the file length mod 1048576 in length but I'm not sure how I would use the two readers sequentially.
Thanks to anyone who can help.

I'd use Stream_IO (ARM A.12.1), which allows you to read into a buffer and tells you how much data was actually read; see the second form of Read,
procedure Read (File : in File_Type;
Item : out Stream_Element_Array;
Last : out Stream_Element_Offset);
with semantics described in ARM 13.13.1 (8),
The Read operation transfers stream elements from the specified stream to fill the array Item. Elements are transferred until Item'Length elements have been transferred, or until the end of the stream is reached. If any elements are transferred, the index of the last stream element transferred is returned in Last. Otherwise, Item'First - 1 is returned in Last. Last is less than Item'Last only if the end of the stream is reached.
procedure Send_File (Channel  : GNAT.Sockets.Stream_Access;
                     Filepath : String) is
   File   : Ada.Streams.Stream_IO.File_Type;
   Buffer : Ada.Streams.Stream_Element_Array (1 .. 1024);
   Last   : Ada.Streams.Stream_Element_Offset;
   use type Ada.Streams.Stream_Element_Offset;
begin
   Ada.Streams.Stream_IO.Open (File,
                               Mode => Ada.Streams.Stream_IO.In_File,
                               Name => Filepath);
   loop
      --  Read the next Buffer-full from File. Last receives the index of
      --  the last byte that was read; if we've reached the end-of-file in
      --  this read, Last will be less than Buffer'Last.
      Ada.Streams.Stream_IO.Read (File, Item => Buffer, Last => Last);
      --  Write the data that was actually read. If File's size is a
      --  multiple of Buffer'Length, the last Read will read no bytes and
      --  will return a Last of 0 (Buffer'First - 1), so this will write
      --  Buffer (1 .. 0), i.e. no bytes.
      Ada.Streams.Write (Channel.all, Buffer (1 .. Last));
      --  The only reason for reading less than a Buffer-full is that
      --  end-of-file was reached.
      exit when Last < Buffer'Last;
   end loop;
   Ada.Streams.Stream_IO.Close (File);
end Send_File;
(note also that it’s best to open and close the file outside the loop!)

Related

java nio socketChannel.write() missing bytes

System.out.println(" # bBuffer = " + bBuffer.capacity());
headerBuffer.rewind();
socketChannel.write(headerBuffer);
int writen = socketChannel.write(bBuffer);
System.out.println(" # writen = " + writen);
bBuffer is an object of type ByteBuffer that came from FileChannel.map() (it's an image file). When I received this image file on the client, it wasn't a complete image: about half of it was missing. So I checked how many bytes were written by printing some statistics to the console. The output was:
# bBuffer = 319932
# writen = 131071
What happened to the rest of the bytes? It seems that (319932 - 131071) bytes are missing.
Sometimes writen is equal to bBuffer.capacity(), and this seems to be unrelated to the file size or buffer capacity.

Your code makes an invalid assumption. read() and write() aren't obliged to transfer more than one byte per call. You have to loop. But you're doing this wrong anyway. It should be:
while (headerBuffer.position() > 0)
{
    headerBuffer.flip();
    socketChannel.write(headerBuffer);
    headerBuffer.compact();
}
or
headerBuffer.flip();
while (headerBuffer.hasRemaining())
{
    socketChannel.write(headerBuffer);
}
headerBuffer.compact();
If the SocketChannel is in non-blocking mode, it gets considerably more complex, but you haven't mentioned that in your question.
You also need to check your receiving code for the same assumption.
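The same rule holds outside Java: a POSIX write() call may also accept fewer bytes than it was given. As a rough C analogue of the loop above (not the asker's Java code), the sketch below keeps writing until everything has been sent. The name write_all is hypothetical, and the sketch assumes a blocking file descriptor.

```c
#include <errno.h>
#include <unistd.h>

/* Keep calling write() until all len bytes have been accepted.
   Assumes fd is in blocking mode.
   Returns 0 on success, -1 on error (errno is set by write). */
int write_all(int fd, const void *data, size_t len)
{
    const char *p = data;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any byte was written: retry */
            return -1;
        }
        p += n;             /* skip the bytes that were accepted */
        len -= (size_t)n;
    }
    return 0;
}
```

A caller would use write_all(fd, buffer, size) wherever a single write(fd, buffer, size) was assumed to send everything.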

Create an array of values from different text files in C

I'm working in C on 64-bit Ubuntu 14.04.
I have a number of .txt files, each containing lines of floating point values (1 value per line). The lines represent parts of a complex sample, and they're stored as real(a1) \n imag(a1) \n real(a2) \n imag(a2), if that makes sense.
In a specific scenario there are 4 text files each containing 32768 samples (thus 65536 values), but I need to make the final version dynamic to accommodate up to 32 files (the maximum samples per file would not exceed 32768 though). I'll only be reading the first 19800 samples (depending on other things) though, since the entire signal is contained in those 39600 points (19800 samples).
A common abstraction is to represent the files / samples as a matrix, where columns represent return signals and rows represent the value of each signal at a sampling instant, up until the maximum duration.
What I'm trying to do is take the first sample from each return signal and move it into an array of double-precision floating point values to do some work on, move on to the second sample for each signal (which will overwrite the previous array) and do some work on them, and so forth, until the last row of samples have been processed.
Is there a way in which I can dynamically open files for each signal (depending on the number of pulses I'm using in that particular instance), read the first sample from each file into a buffer, and ship that off to be processed? On the next iteration, the file pointers would all be aligned to the second sample; those samples would then be moved into an array and shipped off again, until the desired number of samples (19800 in our hypothetical case) has been reached.
I can read samples just fine from the files using fscanf:
rx_length = 19800;
int x;
float buf;
double *range_samples = calloc(num_pulses, 2 * sizeof(range_samples));
for (i=0; i < 2 * rx_length; i++){
    x = fscanf(pulse_file, "%f", &buf);
    *(range_samples) = buf;
}
All that needs to happen (in my mind) is that I need to cycle both sample# and pulse# (in that order), so when finished with one pulse it would move on to the next set of samples for the next pulse, and so forth. What I don't know how to do is declare file pointers for all return signal files when the number of them can vary between calls (e.g. do the whole thing for 4 pulses, and on the next call it can be 16 or 64).
If there are any ideas / comments / suggestions I would love to hear them.
Thanks.

I would make the code you posted a function that takes an array of file names as an argument:
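void doPulse( const char **file_names, const int size )
{
    FILE *file = NULL;
    // declare your other variables
    for ( int i = 0; i < size; ++i )
    {
        file = fopen( file_names[i], "r" );  // fopen needs a mode; "r" opens for reading
        if ( file == NULL )
        {
            perror( file_names[i] );  // report the failure and skip this file
            continue;
        }
        // do the work on that file
        fclose( file );
        file = NULL;
    }
}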
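If the files have to be read in lockstep, as the question describes, one option is to keep an array of FILE pointers sized at run time. The following is a minimal sketch under that assumption; the names open_pulse_files, read_next_row and close_pulse_files are hypothetical, count is assumed to be greater than zero, and each file is assumed to hold one value per line in real, imaginary order.

```c
#include <stdio.h>
#include <stdlib.h>

/* Open count files for reading. Returns a malloc'd array of FILE pointers,
   or NULL if allocation or any fopen fails (files opened so far are closed). */
FILE **open_pulse_files(const char **file_names, int count)
{
    FILE **files = calloc((size_t)count, sizeof *files);
    if (files == NULL)
        return NULL;
    for (int i = 0; i < count; ++i) {
        files[i] = fopen(file_names[i], "r");
        if (files[i] == NULL) {
            while (--i >= 0)
                fclose(files[i]);
            free(files);
            return NULL;
        }
    }
    return files;
}

/* Read the next complex sample (real value, then imaginary value) from every
   file; re[i] and im[i] receive the values from files[i].
   Returns 0 on success, -1 if any file runs out of data or fails to parse. */
int read_next_row(FILE **files, int count, double *re, double *im)
{
    for (int i = 0; i < count; ++i) {
        if (fscanf(files[i], "%lf %lf", &re[i], &im[i]) != 2)
            return -1;
    }
    return 0;
}

/* Close every file and release the array itself. */
void close_pulse_files(FILE **files, int count)
{
    for (int i = 0; i < count; ++i)
        fclose(files[i]);
    free(files);
}
```

Calling read_next_row once per sampling instant leaves every file positioned at the following sample, so the caller can process one row, then call it again for the next.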

What you need is a generator. It would be reasonably easy in C++, but as you tagged C, I can imagine a function taking a custom struct (the state of the object) as a parameter. It could be something like (pseudocode):
typedef struct {
    char **files;        // list of file names to process
    int filesIndex;      // index of the file currently open
    FILE *currentFile;
} GtorState;
void gtorInit(GtorState *state, char **files) {
    // load the array of files into state, set the index to 0, and open the first file
}
int nextValue(GtorState *state, double *real, double *imag) {
    // read 2 values from currentFile and assign them to real and imag
    // if eof, close currentFile and open files[++filesIndex]
    // if real and imag were found return 0, else 1 if eof on the last file, 2 if error
}
Then your main program could contain:
GtorState state;
// initialize the list of files to process
gtorInit(&state, files);
double real, imag;
int cr;
while (0 == (cr = nextValue(&state, &real, &imag))) {
    // process (real, imag)
}
if (cr == 2) {
    // process (at least display) error
}
Alternatively, your main program could iterate over the values of the different files and call a processing function that keeps its own state, analogous to the generator above; at the end, the state of the processing function is used to get the results.

Tried a slightly different approach and it's working really well.
Instead of reading from the different files each time I want to do something, I read the entire contents of each file into a 2D array range_phase_data[sample_number][pulse_number], and then access different parts of the array depending on which range bin I'm currently working on.
Here's an excerpt:
#define REAL(z,i) ((z)[2*(i)])
#define IMAG(z,i) ((z)[2*(i)+1])
for (i=0; i<rx_length; i++){
    printf("\t[%s] Range bin %i. Samples %i to %i.\n", __FUNCTION__, i, 2*i, 2*i+1);
    for (j=0; j<num_pulses; j++){
        REAL(fft_buf, j) = range_phase_data[2*i][j];
        IMAG(fft_buf, j) = range_phase_data[2*i+1][j];
    }
    printf("\t[%s] Range bin %i done, ready to FFT.\n", __FUNCTION__, i);
    // do stuff with the data
}
This removes the need to dynamically allocate file pointers; instead, the files are opened one at a time and the data is written to the corresponding column in the matrix.
Cheers.
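The loading step is not shown in the excerpt. One way it could look, as a sketch only: the helper name load_column is hypothetical, and the matrix is assumed to be a flat row-major array of doubles rather than the poster's actual range_phase_data declaration.

```c
#include <stdio.h>

/* Read up to rows values (one per line) from the text file path into column
   col of a row-major matrix that has cols columns.
   Returns the number of values stored, or -1 if the file cannot be opened. */
int load_column(const char *path, double *matrix, int rows, int cols, int col)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    int r = 0;
    /* element (r, col) lives at index r * cols + col */
    while (r < rows && fscanf(fp, "%lf", &matrix[(size_t)r * cols + col]) == 1)
        ++r;
    fclose(fp);
    return r;
}
```

Calling load_column once per pulse file, with col set to the pulse number and rows set to twice the number of samples, fills the matrix one column at a time while only one file is open at any moment.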

read and gather variables in a text file

I asked this a while ago but was really vague and I also made some changes to my code.
I have a file that I call "stats.txt", which I open (in C, by the way) with:
fopen("stats.txt", "r+")
During the first run of my program, I ask the user to fill in the variables that are then written to the file:
fprintf(fp, "STR: %i(%i)\n", STR, smod);
fprintf(fp, "DEX: %i(%i)\n", DEX, dmod);
etc...
The file looks like this after the program's first run, with all the numbers corresponding to a variable in the program:
Level 1 Gnome Wizard:
STR: 8(-1)
DEX: 14(2)
CON: 14(2)
INT: 13(1)
WIS: 13(1)
CHR: 12(1)
APP: 11(0)
Fort save: 0
Reflex save: 0
Will save: 3
when the program closes and runs for a second time, I have an "IF" statement checking for and displaying text within the "stats.txt" file:
if (fgets(buf, 1000, fp) == NULL)
{
    printf("Please enter in your base stats (no modifiers):\n");
    enter_stats();
    printf("Please indicate your characters level:\n");
    printf("I am a level ");
    level = GetInt();
    Race_check();
    spec_check();
    printf("------Base saving throws (no modifiers)------\n");
    saving_throws();
}
else
{
    printf("%s",buf);
}
The problem I am having is that the program reads the file but does not seem to transfer any of the values into the variables, as shown here:
Level 1 Gnome Wizard:
-------------------------
STR: 0(-5)
DEX: 0(-5)
CON: 0(-5)
INT: 0(-5)
WIS: 0(-5)
CHR: 0(-5)
APP: 0(-5)
-----Saving Throws------
Fortitude: 0
Reflex: 0
Will: 0
Can anyone give me suggestions on how to read the variables as well?
Please and thank you.

Computers only understand numbers - they don't understand text. This means that you have to write code to convert the numbers (that represent individual characters) back into the values you want and store them somewhere.
For example, you might load the entire file into an "array of char", then search that "array of char" for the 4 numbers that represent STR:, then skip any whitespace (between the STR: and the 8(-1)), then convert the character 8 into the value 8 and store it somewhere, then check for a ( character, then convert the characters -1 into the value -1 and store it somewhere, then check for the ) character and the newline character \n.
A more likely approach is to arrange the code as a "for each line" loop, where the first characters of a line determine how to process the other characters. E.g. if the first character is - then ignore the line; else if the first 5 characters are Level, call a function that processes the remainder of the line (1 Gnome Wizard:); else if the first few characters are STR:, DEX:, CON:, etc., call a function to get both numbers (and check for the right brackets, etc.); else...
In addition to all this, you should have good error handling. As a rough guide, about half of the code should be checks and error messages (like if( buffer[i] != '(' ) { printf("ERROR: Expecting left bracket after number on line %u", lineNumber); return -1;}).
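As an illustration of the "function to get both numbers" step, the sketch below uses sscanf to do the character-to-value conversion and the bracket check in one call. The name parse_stat_line is hypothetical, and the line layout ("STR: 8(-1)") is taken from the file shown in the question.

```c
#include <stdio.h>
#include <string.h>

/* Parse one stat line such as "STR: 8(-1)".
   label is the expected prefix without the colon, e.g. "STR".
   Returns 0 and fills *score and *modifier on success, -1 otherwise
   (on failure the outputs may have been partially written). */
int parse_stat_line(const char *line, const char *label, int *score, int *modifier)
{
    char found[16];
    char close_bracket;
    /* %15[^:] reads the label up to the colon; the final %c must be ')' */
    if (sscanf(line, " %15[^:]: %d(%d%c", found, score, modifier, &close_bracket) != 4)
        return -1;
    if (close_bracket != ')')
        return -1;
    return strcmp(found, label) == 0 ? 0 : -1;
}
```

In a "for each line" loop this would be called on each buffer returned by fgets, for example parse_stat_line(buf, "STR", &STR, &smod), with an error message printed whenever it returns -1.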

Implementing the "more filter" in C

I am wondering how to implement the more filter, i.e. display text until it fills the screen and then advance one line at a time with "Enter" (just like in UNIX). I know ncurses would be useful, but I can't find the appropriate method.
Thanks!

Make a two-buffer implementation.
This can be an outline:
Read from the file into the 1st buffer.
offset = 0
while `offset != total_lines_in_file`
    [1] show `max_lines_in_a_page` lines starting from line number `offset`
    [2] if a buffer end is detected while showing the lines, load the other buffer, switch buffers, and print the remaining lines
    [3] if the scroll keypress is detected, set `offset = offset + x` (x = 1, 2, etc.)
It would also be worth having a look at the source code of an existing more implementation.
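For a first experiment, a much simpler variant than the two-buffer outline is possible: stream the file line by line, show one page, then show one more line per Enter. The sketch below does that without ncurses. The names show_lines and page are hypothetical, and it assumes the terminal is in its default line-buffered mode, so Enter arrives as a newline character.

```c
#include <stdio.h>

/* Copy up to max_lines lines from in to out. Returns the number of complete
   lines copied; fewer than max_lines means the end of input was reached. */
int show_lines(FILE *in, FILE *out, int max_lines)
{
    int shown = 0;
    int c;
    while (shown < max_lines && (c = fgetc(in)) != EOF) {
        fputc(c, out);
        if (c == '\n')
            ++shown;
    }
    return shown;
}

/* Show one page of page_lines lines, then one more line for each newline
   read from keys, until the input is exhausted or keys reaches end of file. */
void page(FILE *in, FILE *out, FILE *keys, int page_lines)
{
    if (show_lines(in, out, page_lines) < page_lines)
        return;                       /* the whole input fit on one page */
    int c;
    while ((c = fgetc(keys)) != EOF) {
        if (c == '\n' && show_lines(in, out, 1) == 0)
            break;                    /* nothing left to show */
    }
}
```

For a file opened as fp, a call such as page(fp, stdout, stdin, 24) would approximate the behaviour described in the question. Paging text that arrives on standard input would need a separate source of keypresses, which this sketch does not cover.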

What could cause a Labwindows/CVI C program to hate the number 2573?

Using Windows
So I'm reading from a binary file a list of unsigned int data values. The file contains a number of datasets listed sequentially. Here's the function to read a single dataset from a char* pointing to the start of it:
function read_dataset(char* stream, t_dataset *dataset){
    //...some init, including setting dataset->size;
    for(i=0;i<dataset->size;i++){
        dataset->samples[i] = *((unsigned int *) stream);
        stream += sizeof(unsigned int);
    }
    //...
}
read_dataset is called in a context like this:
//...
char buff[10000];
t_dataset* dataset = malloc( sizeof( *dataset) );
unsigned long offset = 0;
for(i=0;i<number_of_datasets; i++){
    fseek(fd_in, offset, SEEK_SET);
    if( (n = fread(buff, sizeof(char), sizeof(*dataset), fd_in)) != sizeof(*dataset) ){
        break;
    }
    read_dataset(buff, *dataset);
    // Do something with dataset here. It's screwed up before this, I checked.
    offset += profileSize;
}
//...
Everything goes swimmingly until my loop reads the number 2573. All of a sudden it starts spitting out random and huge numbers.
For example, what should be
...
1831
2229
2406
2637
2609
2573
2523
2247
...
becomes
...
1831
2229
2406
2637
2609
0xDB00000A
0xC7000009
0xB2000008
...
If you think those hex numbers look suspicious, you're right. Turns out the hex values for the values that were changed are really familiar:
2573 -> 0xA0D
2523 -> 0x9DB
2247 -> 0x8C7
So apparently this number 2573 causes my stream pointer to gain a byte. This remains until the next dataset is loaded and parsed, and god forbid it contain a number 2573. I have checked a number of spots where this happens, and each one I've checked began on 2573.
I admit I'm not so talented in the world of C. What could cause this is completely and entirely opaque to me.

You don't specify how you obtained the bytes in memory (pointed to by stream), nor what platform you're running on, but I wouldn't be surprised to find you're on Windows and used the C stdio library call fopen(filename, "r"). Try using fopen(filename, "rb") instead. On Windows (and MS-DOS), fopen() translates MS-DOS line endings "\r\n" (hex 0x0D 0x0A) in the file to Unix-style "\n", unless you append "b" to the file mode to indicate binary. The number 2573 is 0x0A0D, which is stored as the bytes 0x0D 0x0A, so text mode drops the 0x0D and every later value is shifted by one byte.
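A minimal sketch of the binary-mode read, assuming the samples are stored as raw unsigned int values in the machine's own byte order; the helper name read_samples and its parameters are hypothetical and not part of the asker's program.

```c
#include <stdio.h>

/* Read up to count unsigned int samples from the file path, starting at byte
   offset. Returns the number of samples actually read (0 on open failure). */
size_t read_samples(const char *path, long offset, unsigned int *samples, size_t count)
{
    FILE *fp = fopen(path, "rb");   /* "b": no \r\n translation on Windows */
    if (fp == NULL)
        return 0;
    size_t got = 0;
    if (fseek(fp, offset, SEEK_SET) == 0)
        got = fread(samples, sizeof samples[0], count, fp);
    fclose(fp);
    return got;
}
```

Reading straight into an unsigned int array also avoids casting a char pointer to unsigned int *, which the original loop relies on.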

A couple of unrelated points:
sizeof(*dataset) doesn't do what you think it does.
There is no need to use seek on every read.
I don't understand how you are calling a function that only takes one parameter but you are giving it two (or at least I don't understand why your compiler doesn't object).
