Delphi - writing a large dynamic array to disk using stream

In a Delphi program, I have a dynamic array with 4,000,000,001 cardinals. I'm trying to write (and later read) it to a drive. I used the following:
const Billion = 1000000000;
stream := tFileStream.Create( 'f:\data\BigList.data', fmCreate);
stream.WriteBuffer( Pointer( BigArray)^, (4 * billion + 1) * SizeOf( cardinal));
stream.free;
It bombed out with: ...raised exception class EWriteError with message 'Stream write error'.
The size of the file it wrote is only 3,042,089KB.
Am I doing something wrong? Is there a limit to the size that can be written at once (about 3GB)?

The Count parameter of WriteBuffer is a 32-bit integer, so you cannot pass the required value (here just over 16 GB) in that parameter. You will need to write the file with multiple separate calls to WriteBuffer, where each call passes a count that does not exceed this limit.
I suggest that you write it something like this:
var
  Count, Index, N: Int64;
....
Count := Length(BigArray);
Index := 0;
while Count > 0 do begin
  N := Min(Count, 8192); // Min is in the Math unit
  stream.WriteBuffer(BigArray[Index], N*SizeOf(BigArray[0]));
  inc(Index, N);
  dec(Count, N);
end;
An additional benefit is that you can readily display progress.

Related

Memory efficient computation of md5sum of a file in vlang

The following code reads a file into a byte array and computes the md5sum of that array. It works, but I would like to find a solution in V that needs less RAM.
Thanks for your comments!
import os
import crypto.md5
b := os.read_bytes("file.txt") or {panic(err)}
s := md5.sum(b).hex()
println(s)
I also tried the following, without success:
import os
import crypto.md5
import io
mut f := os.open_file("file.txt", "r")?
mut h := md5.new()
io.cp(mut f, mut h)?
s := h.sum().hex()
println(s) // does not return the correct md5sum
Alrighty. This is what you're looking for. It produces the same result as md5sum and is only slightly slower. block_size trades memory for speed: decreasing block_size lowers the memory footprint but makes the checksum take longer to compute, while increasing it has the opposite effect. I tested on a 2GB manjaro disc image and can confirm the memory usage is very low.
Note: It seems this does perform noticeably slower without the -prod flag. The V compiler makes special optimizations in order to run faster for the production build.
import crypto.md5
import io
import os

fn main() {
    println(hash_file('manjaro.img')?)
}

const block_size = 64 * 65535

fn hash_file(path string) ?string {
    mut file := os.open(path)?
    defer {
        file.close()
    }
    mut buf := []u8{len: block_size}
    mut r := io.new_buffered_reader(reader: file)
    mut digest := md5.new()
    for {
        x := r.read(mut buf) or { break }
        digest.write(buf[..x])?
    }
    return digest.checksum().hex()
}
To conclude what I've learned from the comments:
V is a programming language with typed arguments
md5.sum takes a byte array argument, not something that yields a sequence of bytes read from a file as you go.
There's no alternative to md5.sum
So, you will have to implement MD5 yourself. The standard library is open source, so maybe you can build upon that! Or you can just bind any of the existing (e.g. C) implementations of MD5 and feed in bytes as you read them, in chunks of 512 bits = 2⁶ bytes.
EDIT: I don't know V, so it's hard for me to judge, but it looks like Digest.write would be a method to consecutively push data through the MD5 calculation. Maybe that, together with a loop reading bytes from the file, is the solution?
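For reference, here is a minimal standalone C sketch of that bind-to-C approach, using OpenSSL's legacy MD5 API (an assumption: it is deprecated in OpenSSL 3 but still widely available, and any init/update/final style MD5 implementation works the same way):
#include <stdio.h>
#include <openssl/md5.h>   /* legacy API; build with: cc md5file.c -lcrypto */

int main(void) {
    unsigned char buf[64 * 1024];              /* read in 64 KiB chunks */
    unsigned char digest[MD5_DIGEST_LENGTH];
    MD5_CTX ctx;
    size_t n;

    FILE *f = fopen("file.txt", "rb");         /* file name from the question */
    if (!f) { perror("fopen"); return 1; }

    MD5_Init(&ctx);
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        MD5_Update(&ctx, buf, n);              /* feed bytes as they are read */
    fclose(f);
    MD5_Final(digest, &ctx);

    for (int i = 0; i < MD5_DIGEST_LENGTH; i++)
        printf("%02x", digest[i]);
    putchar('\n');
    return 0;
}
Only one block of block_size-like bytes is ever held in memory, regardless of the file size.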

Ada - reading large files

I'm working on building an HTTP server, mostly for learning/curiosity purposes, and I came across a problem I've never had in Ada before. If I try to read files that are too big using Direct_IO, I get a Storage_Error : Stack Overflow exception. This almost never happens, but when I request a video file, the exception is thrown.
So I got the idea to read and send files in chunks of 1M characters at a time, but this leaves me with End_Error exceptions, since most files aren't going to be an exact multiple of 1M characters in length. I'm also not entirely sure I did it right anyway, since reading the whole file has always been sufficient before. Here is the procedure that I've written:
procedure Send_File(Channel : GNAT.Sockets.Stream_Access; Filepath : String) is
   File_Size : Natural := Natural(Ada.Directories.Size (Filepath));
   subtype Meg_String is String(1 .. 1048576);
   package Meg_String_IO is new Ada.Direct_IO(Meg_String);
   Meg : Meg_String;
   File : Meg_String_IO.File_Type;
   Count : Natural := 0;
begin
   loop
      Meg_String_IO.Open(File, Mode => Meg_String_IO.In_File, Name => Filepath);
      Meg_String_IO.Read(File, Item => Meg);
      Meg_String_IO.Close(File);
      String'Write(Channel, Meg);
      exit when Count >= File_Size;
      Count := Count + 1048576;
   end loop;
end Send_File;
I had the thought to declare two separate Direct_IO packages/string sizes, where one would be 1048576 in length while the other would be the file length mod 1048576 in length, but I'm not sure how I would use the two readers sequentially.
Thanks to anyone who can help.
I'd use Stream_IO (ARM A.12.1), which allows you to read into a buffer and tells you how much data was actually read; see the second form of Read,
procedure Read (File : in File_Type;
                Item : out Stream_Element_Array;
                Last : out Stream_Element_Offset);
with semantics described in ARM 13.13.1 (8),
The Read operation transfers stream elements from the specified stream to fill the array Item. Elements are transferred until Item'Length elements have been transferred, or until the end of the stream is reached. If any elements are transferred, the index of the last stream element transferred is returned in Last. Otherwise, Item'First - 1 is returned in Last. Last is less than Item'Last only if the end of the stream is reached.
procedure Send_File (Channel  : GNAT.Sockets.Stream_Access;
                     Filepath : String) is
   File   : Ada.Streams.Stream_IO.File_Type;
   Buffer : Ada.Streams.Stream_Element_Array (1 .. 1024);
   Last   : Ada.Streams.Stream_Element_Offset;
   use type Ada.Streams.Stream_Element_Offset;
begin
   Ada.Streams.Stream_IO.Open (File,
                               Mode => Ada.Streams.Stream_IO.In_File,
                               Name => Filepath);
   loop
      --  Read the next Buffer-full from File. Last receives the index of
      --  the last byte that was read; if we've reached the end-of-file in
      --  this read, Last will be less than Buffer'Last.
      Ada.Streams.Stream_IO.Read (File, Item => Buffer, Last => Last);
      --  Write the data that was actually read. If File's size is a
      --  multiple of Buffer'Length, the last Read will read no bytes and
      --  will return a Last of 0 (Buffer'First - 1), so this will write
      --  Buffer (1 .. 0), i.e. no bytes.
      Ada.Streams.Write (Channel.all, Buffer (1 .. Last));
      --  The only reason for reading less than a Buffer-full is that
      --  end-of-file was reached.
      exit when Last < Buffer'Last;
   end loop;
   Ada.Streams.Stream_IO.Close (File);
end Send_File;
(Note also that it's best to open and close the file outside the loop!)

VHDL average of Array through for loop

I have an array of X integer values in VHDL, declared as a variable inside a process.
I would like to calculate the average of all values in a for loop.
If I write it out manually for 3 values, everything works fine (tested on hardware):
entity MyEntity is
   Port(
      Enable   : IN  STD_LOGIC;
      CLK      : IN  STD_LOGIC;
      SpeedOut : OUT INTEGER
   );
end MyEntity;

Average : process
   type SampleArray is array (2 downto 0) of INTEGER;
   variable SpeedSamples : SampleArray;
begin
   wait until rising_edge(CLK);
   if ENABLE = '1' then
      SpeedOut <= (SpeedSamples(0) + SpeedSamples(1) + SpeedSamples(2)) / 3;
   end if;
end process Average;
If I use a for loop to do the same, SpeedOut is constantly 0:
entity MyEntity is
   Port(
      Enable   : IN  STD_LOGIC;
      CLK      : IN  STD_LOGIC;
      SpeedOut : OUT INTEGER
   );
end MyEntity;

Average : process
   type SampleArray is array (2 downto 0) of INTEGER;
   variable SpeedSamples : SampleArray;
   variable tempVar : Integer;
begin
   wait until rising_edge(CLK);
   if ENABLE = '1' then
      for i in 0 to 2 loop
         tempVar := tempVar + SpeedSamples(i);
      end loop;
      SpeedOut <= tempVar / 3;
   end if;
end process Average;
I am aware this will need a lot of resources if the array gets bigger, but I think there is something fundamentally wrong with my code.
Is there a proven method of calculating a moving average in VHDL?
It's not that efficient to add up a large number of samples each clock period like that; an adder with n inputs will consume a lot of logic resource as n starts to increase.
My suggestion is to implement a memory buffer for the samples, which will have as many locations as you want samples in your rolling average. This will have one new sample written to it each clock cycle; you will also add this same sample to your total on the following clock edge.
Using dual-port memory, you can simultaneously read out the 'oldest' sample in the memory from the same location (provided you have the memory in read-before-write mode). Subtract this from your total, then perform the divide. I expect by far the most efficient divisor will be a power of two, so that your divide does not consume any logic resource. Other types of divider use relatively lots of logic.
So the design would boil down to a memory buffer, a 3-input adder, a counter for use as a pointer to the sample buffer, and a wire-shift divider. If performance was an issue, you could pipeline the add/subtract phases so that you only ever needed 2-input adders.
As for the actual coding question about creating a multi-input adder using a loop, on top of suggestions made in the comments, I would say it's really up to your synthesis tool as to whether it would be able to identify this as a multi-input adder. Have you looked in the synthesis report for any messages relating to this segment of code?
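Not VHDL, but as a quick sanity check of the arithmetic, here is a small C model of that rolling-sum scheme (the window size and sample values are made up; WINDOW is a power of two so the divide reduces to a shift in hardware):
#include <stdio.h>

#define WINDOW 8                      /* power of two, so dividing is just a shift */

static int samples[WINDOW];           /* models the dual-port memory buffer */
static int wr_ptr = 0;                /* models the write-pointer counter */
static long total = 0;                /* running sum of the window */

/* Push one sample into the window and return the updated rolling average:
   add the newest sample, subtract the oldest, divide by the window size. */
static int rolling_average(int new_sample)
{
    total += new_sample - samples[wr_ptr];  /* newest in, oldest out */
    samples[wr_ptr] = new_sample;           /* overwrite oldest with newest */
    wr_ptr = (wr_ptr + 1) % WINDOW;
    return (int)(total / WINDOW);           /* power-of-two divide: a wire shift */
}

int main(void)
{
    for (int i = 1; i <= 20; i++)
        printf("sample %2d -> rolling average %d\n", i, rolling_average(i));
    return 0;
}
Note that only two operands are ever added or subtracted per sample, which is exactly why this scales better than summing all the samples every clock cycle.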

Converting code from Matlab to C?

I need to write code that will:
Read a file that acts as a source "emitting" 8-bit symbols (letters, digits). Every time a group of n symbols occurs, the code needs to add +1 to an accumulator of size 2^(8n). Afterwards I can calculate the entropy. The input parameters are the file and "n", so a group is 8 bits * n (with n from 3 to 5).
Example:
if we read 10001110|11110000|11111111|00001011|10101010 from the file, then for n = 3 we put +1 in the accumulator at 100011101111000011111111, proceed with +1 at 111100001111111100001011, and so on...
The main problem is the speed of this process. I need to read files that are at most 50MB. The programming language can be anything from C, Matlab or Java.
Here is my code so far in Matlab, for n = 1. I need a good implementation of the accumulator, so it's not full size from the very beginning...
load B.mat; % just some small part of the file
file = dec2bin(B);
file = str2num(file);
[fileSize, ~] = size(file);

% Choose n
n = 1;

% Make accumulator
acu10 = linspace(0, 2^8^n - 1, 2^8^n);
acu = dec2bin(acu10);
acu = str2num(acu);
index = zeros(size(acu10))';

% Go through the file and add +1 in the accumulator
for i = 1:n:fileSize
    for j = 1:size(acu)
        if acu(j,:) == file(i)
            index(j) = index(j) + 1;
        end
    end
end
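Since C is one of the acceptable languages, here is a minimal C sketch of the sliding-window accumulator, following the overlapping-window example above (the input file name is a placeholder). For n = 3 a flat array of 2^24 counters fits in RAM (64 MB of uint32_t); for n = 4 or 5 the index space (2^32, 2^40) does not, so there you would replace the flat array with a hash table keyed by the window value, which also keeps the accumulator from being full size from the start:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <math.h>    /* build with: cc entropy.c -lm */

#define N 3                                   /* window size in bytes */

int main(void) {
    FILE *f = fopen("source.bin", "rb");      /* hypothetical input file */
    if (!f) { perror("fopen"); return 1; }

    /* Flat accumulator: 2^(8*N) counters. Fine for N = 3 (64 MB);
       for N = 4 or 5 use a hash table instead. */
    uint32_t *acc = calloc(1u << (8 * N), sizeof *acc);
    if (!acc) { fclose(f); return 1; }

    uint32_t window = 0;                      /* last N bytes of the stream */
    uint64_t total = 0;
    int c, filled = 0;

    while ((c = fgetc(f)) != EOF) {
        window = ((window << 8) | (unsigned)c) & ((1u << (8 * N)) - 1);
        if (++filled >= N) {                  /* window full: count it, slide by 1 byte */
            acc[window]++;
            total++;
        }
    }
    fclose(f);

    /* Entropy in bits per N-byte symbol: H = -sum p * log2(p). */
    double h = 0.0;
    for (uint32_t i = 0; i < (1u << (8 * N)); i++) {
        if (acc[i]) {
            double p = (double)acc[i] / (double)total;
            h -= p * log2(p);
        }
    }
    printf("H = %f bits/symbol\n", h);
    free(acc);
    return 0;
}
Reading byte-by-byte with fgetc is already fast enough for 50MB files; buffering with fread would speed it up further if needed.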

Create an array of values from different text files in C

I'm working in C on 64-bit Ubuntu 14.04.
I have a number of .txt files, each containing lines of floating point values (1 value per line). The lines represent parts of a complex sample, and they're stored as real(a1) \n imag(a1) \n real(a2) \n imag(a2), if that makes sense.
In a specific scenario there are 4 text files, each containing 32768 samples (thus 65536 values), but I need to make the final version dynamic enough to accommodate up to 32 files (the maximum samples per file would not exceed 32768, though). I'll only be reading the first 19800 samples (depending on other things), since the entire signal is contained in those 39600 values (19800 complex samples).
A common abstraction is to represent the files / samples as a matrix, where columns represent return signals and rows represent the value of each signal at a sampling instant, up until the maximum duration.
What I'm trying to do is take the first sample from each return signal and move it into an array of double-precision floating point values to do some work on, move on to the second sample for each signal (which will overwrite the previous array) and do some work on them, and so forth, until the last row of samples has been processed.
Is there a way in which I can dynamically open a file for each signal (depending on the number of pulses I'm using in that particular instance), read the first sample from each file into a buffer and ship that off to be processed? On the next iteration, the file pointers would all be aligned to the second sample; it would then move those into an array and ship it off again, until the desired number of samples (19800 in our hypothetical case) has been reached.
I can read samples just fine from the files using fscanf:
rx_length = 19800;
int x;
float buf;
double *range_samples = calloc(2 * rx_length, sizeof(*range_samples));
for (i = 0; i < 2 * rx_length; i++){
    x = fscanf(pulse_file, "%f", &buf);
    range_samples[i] = buf;
}
All that needs to happen (in my mind) is that I need to cycle through both sample # and pulse # (in that order), so when finished with one pulse it would move on to the next set of samples for the next pulse, and so forth. What I don't know how to do is declare file pointers for all the return signal files, when the number of them can vary between calls (e.g. do the whole thing for 4 pulses, and on the next call it could be 16 or 64).
If there are any ideas / comments / suggestions I would love to hear them.
Thanks.
I would make the code you posted a function that takes an array of file names as an argument:
void doPulse( const char **file_names, const int size )
{
    FILE *file = 0;
    // declare your other variables
    for ( int i = 0; i < size; ++i )
    {
        file = fopen( file_names[i], "r" );
        // make sure file is open
        // do the work on that file
        fclose( file );
        file = 0;
    }
}
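A possible calling sketch, addressing the "varying number of files between calls" part of the question (the pulseN.txt naming scheme is purely an assumption for illustration): build the list of names at run time, sized by however many pulses this call uses, and hand it to doPulse:
#include <stdio.h>
#include <stdlib.h>

void doPulse( const char **file_names, const int size );  /* from above */

void runPulses( int num_pulses )
{
    /* One name per return-signal file; num_pulses can differ per call. */
    char **names = malloc( num_pulses * sizeof *names );
    for ( int i = 0; i < num_pulses; ++i )
    {
        names[i] = malloc( 32 );
        snprintf( names[i], 32, "pulse%d.txt", i );  /* hypothetical naming */
    }

    doPulse( (const char **)names, num_pulses );

    for ( int i = 0; i < num_pulses; ++i )
        free( names[i] );
    free( names );
}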
What you need is a generator. It would be reasonably easy in C++, but as you tagged C, I can imagine a function taking a custom struct (the state of the object) as a parameter. It could be something like this (pseudocode):
struct GtorState {
    char **files;
    int filesIndex;
    FILE *currentFile;
};

void gtorInit(struct GtorState *state, char **files) {
    // load the array of files into state, set index to 0, and open the first file
}

int nextValue(struct GtorState *state, double *real, double *imag) {
    // read 2 values from currentFile and assign them to real and imag
    // if eof, close currentFile and open files[++state->filesIndex]
    // if real and imag were found return 0, else 1 if eof on last file, 2 if error
}
Then your main program could contain:
struct GtorState state;
// initialize the list of files to process
gtorInit(&state, files);

double real, imag;
int cr;
while (0 == (cr = nextValue(&state, &real, &imag))) {
    // process (real, imag)
}
if (cr == 2) {
    // process (at least display) the error
}
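To make the generator concrete, here is one way the pseudocode above could be fleshed out, assuming the file-name list is NULL-terminated and the values appear one per line as described in the question (a sketch only; error handling is minimal):
#include <stdio.h>
#include <stdlib.h>

struct GtorState {
    char **files;            /* NULL-terminated list of file names */
    int filesIndex;
    FILE *currentFile;
};

void gtorInit(struct GtorState *state, char **files) {
    state->files = files;
    state->filesIndex = 0;
    state->currentFile = files[0] ? fopen(files[0], "r") : NULL;
}

/* Returns 0 when (real, imag) were read, 1 on end of the last file,
   2 on a read/open error. Moves on to the next file at each EOF. */
int nextValue(struct GtorState *state, double *real, double *imag) {
    while (state->currentFile) {
        if (fscanf(state->currentFile, "%lf %lf", real, imag) == 2)
            return 0;
        if (!feof(state->currentFile))
            return 2;                        /* malformed data */
        fclose(state->currentFile);
        state->currentFile = NULL;
        if (state->files[++state->filesIndex]) {
            state->currentFile = fopen(state->files[state->filesIndex], "r");
            if (!state->currentFile)
                return 2;                    /* open failed */
        }
    }
    return 1;                                /* no more files */
}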
Alternatively, your main program could iterate over the values of the different files and call a processing function, with a state struct analogous to the generator above, that processes the values; at the end it uses the state of that processing function to get the results.
Tried a slightly different approach and it's working really well.
Instead of reading from the different files each time I want to do something, I read the entire contents of each file into a 2D array range_phase_data[sample_number][pulse_number], and then access different parts of the array depending on which range bin I'm currently working on.
Here's an excerpt:
#define REAL(z,i) ((z)[2*(i)])
#define IMAG(z,i) ((z)[2*(i)+1])

for (i = 0; i < rx_length; i++){
    printf("\t[%s] Range bin %i. Samples %i to %i.\n", __FUNCTION__, i, 2*i, 2*i+1);
    for (j = 0; j < num_pulses; j++){
        REAL(fft_buf, j) = range_phase_data[2*i][j];
        IMAG(fft_buf, j) = range_phase_data[2*i+1][j];
    }
    printf("\t[%s] Range bin %i done, ready to FFT.\n", __FUNCTION__, i);
    // do stuff with the data
}
This alleviates the need to dynamically allocate file pointers and instead just opens the files one at a time, writing the data to the corresponding column in the matrix.
Cheers.
