I've started learning the HDF5 API very recently. Suppose I'm going to read a very large vector (i.e. a one-dimensional array) which is stored as a dataset in an HDF5 file. Its size N_SIZE is so large that malloc(N_SIZE) fails, so it seems to me that I have to read it chunk by chunk. Should I use H5Dread_chunk() here?
Yes, you will have to read the dataset chunk by chunk if it does not fit in main memory. Also note that your dataset must be created with a chunked storage layout. You can then read one chunk at a time using hyperslabs (i.e. slices).
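For reference, here is a minimal sketch of that approach using the plain HDF5 C API (the file name 'my_file.h5', the dataset name 'my_dataset', the int element type and the slice size are assumptions, not something taken from your question):

/* Minimal sketch: read a large 1-D dataset slice by slice with the HDF5 C API.
   File name, dataset name, element type and slice size are assumptions. */
#include <hdf5.h>
#include <stdlib.h>

#define SLICE 1048576   /* number of elements read per iteration */

int main(void)
{
    hid_t file   = H5Fopen("my_file.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset   = H5Dopen2(file, "my_dataset", H5P_DEFAULT);
    hid_t fspace = H5Dget_space(dset);

    hsize_t total;
    H5Sget_simple_extent_dims(fspace, &total, NULL);   /* total number of elements */

    int *buffer = malloc(SLICE * sizeof(int));

    for (hsize_t offset = 0; offset < total; offset += SLICE)
    {
        hsize_t count = (total - offset < SLICE) ? (total - offset) : SLICE;

        /* select one slice (hyperslab) of the dataset in the file... */
        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, &offset, NULL, &count, NULL);

        /* ...and describe a matching buffer in memory */
        hid_t mspace = H5Screate_simple(1, &count, NULL);

        H5Dread(dset, H5T_NATIVE_INT, mspace, fspace, H5P_DEFAULT, buffer);

        /* process(buffer, count);   hypothetical function that consumes the slice */

        H5Sclose(mspace);
    }

    free(buffer);
    H5Sclose(fspace);
    H5Dclose(dset);
    H5Fclose(file);
    return 0;
}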
All of this can be greatly simplified with HDFql. HDFql is a high-level language that relieves you of the low-level details of handling HDF5 files.
As an example, you could do the following in C using HDFql:
#include <stdio.h>
#include "HDFql.h"

void process(int chunk[1024][1024]);   // hypothetical user function that consumes one chunk

int main(void)
{
    // declare variables
    char script[100];
    static int data[1024][1024];   // static to keep this 4 MB buffer off the stack
    int i;

    // create an HDF5 file named 'my_file.h5' and use (i.e. open) it
    hdfql_execute("CREATE AND USE FILE my_file.h5");

    // create a three-dimensional chunked dataset named 'my_dataset' (each chunk holds 1024 x 1024 ints)
    hdfql_execute("CREATE CHUNKED(1, 1024, 1024) DATASET my_dataset AS INT(100, 1024, 1024)");

    // register variable 'data' for subsequent usage
    hdfql_variable_register(data);

    // loop 100 times (i.e. the number of chunks that exist in dataset 'my_dataset')
    for(i = 0; i < 100; i++)
    {
        // prepare script to read one chunk at a time using a hyperslab
        sprintf(script, "SELECT FROM my_dataset(%d:::1) INTO MEMORY 0", i);

        // execute script
        hdfql_execute(script);

        // call hypothetical function 'process' passing variable 'data', which contains the chunk's data
        process(data);
    }

    return 0;
}
Additional examples on how to use HDFql in C can be found here.
I recently picked up Crystal after being a Rubyist for a while, and I can't seem to find anything about the File class. I want to open and read a file, but it gives me an error.
file = File.open("ditto.txt")
file = file.read
tequila#tequila-pc:~/code$ crystal fileopen.cr
Error in fileopen.cr:2: wrong number of arguments for 'File#read' (given 0, expected 1)
Overloads are:
- IO::Buffered#read(slice : Bytes)
- IO#read(slice : Bytes)
file = file.read
^~~~
You're probably looking for IO#gets_to_end, which reads the entire file as a String. But you might as well use File.read:
file_content = File.read("ditto.txt")
IO#read is a lower-level method that lets you read pieces of an IO into a byte slice (Bytes).
This is a MATLAB question: the problem is caused by interactions between MATLAB files and Python/NumPy. I am trying to write a 3-D array of type uint8 in MATLAB and then read it in Python using NumPy. This is the MATLAB code that creates the file:
voxels = zeros(30, 30, 30);
....
fileID1 = fopen(fullFileNameOut,'w','s');
fwrite(fileID1, voxels, 'uint8');
fclose(fileID1);
This is the Python code that tries to read the file:
filename = 'File3DArray.mat'
arr = scipy.io.loadmat(filename)['instance'].astype(np.uint8)
This is the error that I get when I run the Python code:
raise TypeError('Expecting miMATRIX type here, got %d' % mdtype)
This is the output of the Linux command 'file' on the 3D array file
that I created (I think this is the source of the problem. What is MMDF Mailbox?):
File3DArray.mat: MMDF mailbox
This is the output of the same Linux command 'file' on another 3D array file
that was created by someone else in MATLAB:
GoodFile.mat: Matlab v5 mat-file (little endian) version 0x0100
I want the files I create in MATLAB to be the same as GoodFile.mat (so that I can read them with the Python/Numpy code segment above). The output of the Linux 'file' command should be the same as the GoodFile output, I think.
What is the MATLAB code that does that?
To create a MAT-file, use the MATLAB save command:
voxels = zeros(30, 30, 30, 'uint8');
save(fullFileNameOut, 'voxels', '-v7')
You need to add '-v7' (or '-v6') as an argument to save to create a file in an older format, as SciPy doesn't recognize the '-v7.3' files created by default.
Let me explain in more detail. I'm trying to write a program that downloads a file from a remote FTP server, appends one line to the end of it, and then re-uploads it. The file operations work, and the text is appended to the file and re-uploaded, but when I download the file again, no text was appended. I've written a small test program to demonstrate this; here's the code at Pastebin:
http://pastebin.com/r07TkxEK
The program prints the following output on both the initial run and subsequent runs:
Remote URL: ftp://orangesquirrels.com
Got data.
Local data file size: 678 bytes.
Current position in file: 678
Uploading database file back to server...
Local data file size: 690 bytes.
Remote URL is ftp://orangesquirrels.com !
*** We read 690 bytes from file.
If the program works, the output from the subsequent run should be:
Remote URL: ftp://orangesquirrels.com
Got data.
Local data file size: 690 bytes.
Current position in file: 690
Uploading database file back to server...
Local data file size: 702 bytes.
Remote URL is ftp://orangesquirrels.com !
*** We read 702 bytes from file.
Because the data is written to the file and re-uploaded (I know this because the uploaded file is larger than the downloaded file), I assume the upload worked; my suspicion is that the problem lies in the download process and/or the curl_database_write function. I've been doing everything humanly possible to find out why this is happening, to no avail. If anyone knows why this isn't working, I'd love to hear it. I'm being paid to write this program, and I know I've got to find a solution soon...
You are not using "short_database" or "file_to_write" in your download() function, so you download /tmp/musiclist.txt from the FTP server instead of musiclist.txt.
You should check your defines and be clear about when you use a define, when you use a string variable, and when you use a parameter (a corrected sketch of download() follows the excerpt below):
#define DATABASE_FILE "/tmp/musiclist.txt"
#define REMOTE_DATABASE_FILE "musiclist.txt"
int main() {
remove(DATABASE_FILE);
download(DATABASE_FILE, "musiclist.txt", REMOTE_URL, "Testing 123");
// ^^^ shouldn't this be REMOTE_DATABASE_FILE?
...
}
void download(const char* file_to_write, const char* short_database, const char* addr, const char* msg) {
remove(DATABASE_FILE); // again?!?
struct FtpFile ftpfile={
DATABASE_FILE, /* name to store the file as if succesful */
// ^^^ which one? file_to_write or short_database
NULL
};
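As a hedged sketch only (your full program is on Pastebin, so the callback and struct names below simply mirror the usual libcurl FTP example and may differ from yours; the unrelated 'msg' parameter is omitted), a download() that consistently uses its parameters could look like this:

#include <curl/curl.h>
#include <stdio.h>

struct FtpFile {
    const char *filename;   /* local path to write the download to */
    FILE *stream;
};

static size_t curl_database_write(void *buffer, size_t size, size_t nmemb, void *userp)
{
    struct FtpFile *out = (struct FtpFile *)userp;
    if (!out->stream) {
        out->stream = fopen(out->filename, "wb");
        if (!out->stream)
            return 0;   /* returning 0 aborts the transfer */
    }
    return fwrite(buffer, size, nmemb, out->stream);
}

void download(const char *file_to_write, const char *short_database, const char *addr)
{
    char url[512];
    struct FtpFile ftpfile = { file_to_write, NULL };   /* use the parameter, not DATABASE_FILE */
    CURL *curl;

    /* fetch the remote file name, not the local /tmp path */
    snprintf(url, sizeof(url), "%s/%s", addr, short_database);

    curl = curl_easy_init();   /* call curl_global_init() once in main() beforehand */
    if (curl) {
        curl_easy_setopt(curl, CURLOPT_URL, url);
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, curl_database_write);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, &ftpfile);
        curl_easy_perform(curl);
        curl_easy_cleanup(curl);
    }
    if (ftpfile.stream)
        fclose(ftpfile.stream);
}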
In my application, when I want to import a file, I use TStringList.
But when someone exports data from Excel, the file encoding is UCS-2 Little Endian, and TStringList can't read the data.
Is there any way to validate this situation, identify the text encoding, and warn the user that the provided text is not compatible?
Just to be clear, the user should provide only plain text (letters and numbers); otherwise, I must show the warning.
A Unicode file without a BOM is fine. (TStringList can read it!)
An ANSI file too. (TStringList can read it!)
Even Unicode with a BOM would be fine if there were a way to remove it. (TStringList can read it, but the BOM bytes show up as "ï", "»" and "¿" characters at the start.)
I used the following function in Delphi 6 to detect Unicode BOMs.
const
//standard byte order marks (BOMs)
UTF8BOM: array [0..2] of AnsiChar = #$EF#$BB#$BF;
UTF16LittleEndianBOM: array [0..1] of AnsiChar = #$FF#$FE;
UTF16BigEndianBOM: array [0..1] of AnsiChar = #$FE#$FF;
UTF32LittleEndianBOM: array [0..3] of AnsiChar = #$FF#$FE#$00#$00;
UTF32BigEndianBOM: array [0..3] of AnsiChar = #$00#$00#$FE#$FF;
function FileHasUnicodeBOM(const FileName: string): Boolean;
var
Buffer: array [0..3] of AnsiChar;
Stream: TFileStream;
begin
Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite); // Allow other programs read access at the same time.
Try
FillChar(Buffer, SizeOf(Buffer), $AA);//fill with characters that we are not expecting then...
Stream.Read(Buffer, SizeOf(Buffer)); //...read up to SizeOf(Buffer) bytes - there may not be enough
//use Read rather than ReadBuffer so that no exception is raised if we can't fill Buffer
Finally
FreeAndNil(Stream);
End;
Result := CompareMem(@UTF8BOM, @Buffer, SizeOf(UTF8BOM)) or
CompareMem(@UTF16LittleEndianBOM, @Buffer, SizeOf(UTF16LittleEndianBOM)) or
CompareMem(@UTF16BigEndianBOM, @Buffer, SizeOf(UTF16BigEndianBOM)) or
CompareMem(@UTF32LittleEndianBOM, @Buffer, SizeOf(UTF32LittleEndianBOM)) or
CompareMem(@UTF32BigEndianBOM, @Buffer, SizeOf(UTF32BigEndianBOM));
end;
This will detect all the standard BOMs. You could use it to block such files if that's the behaviour you want.
You state that Delphi 6 TStringList can load 16 bit encoded files if they do not have a BOM. Whilst that may be the case, you will find that, for characters in the ASCII range, every other character is #0. Which I guess is not what you want.
If you want to detect that text is Unicode for files without BOMs then you could use IsTextUnicode. However, it may give false positives. This is a situation where I suspect it is better to ask for forgiveness than permission.
Now, if I were you I would not actually try to block Unicode files. I would read them. Use the TNT Unicode library. The class you want is called TWideStringList.
Hello fellow developers,
I'm trying to map an executable binary file on Windows and then to execute the mapped file.
So far, I managed the mapping using CreateFileMapping and MapViewOfFile.
These functions gave me a HANDLE to the mapped file and a pointer to the mapped data but I have no clue how to execute the mapped binary.
I think I should use the CreateProcess function, but what parameters should I give it?
const char *binaryPath = "C:/MyExecutable.exe";
// Get the binary size
std::fstream stream(binaryPath, std::ios::in | std::ios::binary);
stream.seekg(0, std::ios::end);
unsigned int size = stream.tellg();
stream.seekg(0, std::ios::beg); // rewind before reading the contents back
// Create a mapped file in the paging file system (writable so the data can be copied in)
HANDLE mappedFile = CreateFileMapping(INVALID_HANDLE_VALUE, NULL, PAGE_EXECUTE_READWRITE, 0, size, NULL);
// Put the executable data into the mapped file
void* mappedData = MapViewOfFile(mappedFile, FILE_MAP_WRITE | FILE_MAP_EXECUTE, 0, 0, size);
stream.read((char*)mappedData, size);
stream.close();
// What should I do now?
There is no native way to run a raw executable image that resides in memory. CreateProcess() is the official way to run an executable image, but the image must reside on the file system instead. The OS loads the image into memory and then patches it as needed (resolves DLL references, etc) so it will actually run correctly.
With that said, I have seen third-party code floating around that duplicates what the OS does when it loads an executable image into memory. But I've only ever seen that used with DLLs (so code does not have to use LoadLibrary/Ex() to use a DLL in memory), not with EXEs.
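If writing the image back to disk is acceptable, a minimal sketch of that workaround could look like the following (the function name and the 'imageData'/'imageSize' variables are hypothetical, and error handling is kept to a bare minimum):

#include <windows.h>
#include <stdio.h>

/* Sketch: persist an in-memory executable image to a temporary file and let
   CreateProcess() load it, since the OS loader only works from the file system. */
BOOL run_image_from_memory(const void *imageData, DWORD imageSize)
{
    char tempDir[MAX_PATH], tempFile[MAX_PATH];
    DWORD written;
    HANDLE file;
    STARTUPINFOA si = { sizeof(si) };
    PROCESS_INFORMATION pi;

    /* build a path such as ...\Temp\dropped_1234.exe */
    GetTempPathA(MAX_PATH, tempDir);
    sprintf(tempFile, "%sdropped_%lu.exe", tempDir, (unsigned long)GetCurrentProcessId());

    /* write the image bytes back to disk */
    file = CreateFileA(tempFile, GENERIC_WRITE, 0, NULL,
                       CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE)
        return FALSE;
    WriteFile(file, imageData, imageSize, &written, NULL);
    CloseHandle(file);

    /* the OS loader now does the patching (imports, relocations, ...) */
    if (!CreateProcessA(tempFile, NULL, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi))
        return FALSE;

    WaitForSingleObject(pi.hProcess, INFINITE);
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    DeleteFileA(tempFile);
    return TRUE;
}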