I have an f77 unformatted binary file.
I know that the file contains 2 floats and a long integer as well as data.
The size of the file is 536870940 bytes, which should include the 512^3 float data values together with the 2 floats and the long integer.
The 512^3 float data values make up 536870912 bytes, leaving a further 28 bytes.
My problem is that I need to work out where those 28 bytes begin and how to skip them so that I can access the data directly.
I prefer to use C to access the file.
Unfortunately, there is no standard for what unformatted means, but some conventions are more common than others.
In many Fortran implementations I have used, every write statement writes a header (often a 32-bit unsigned integer) giving how many bytes of data follow, then the data itself, then repeats the header value so the record can also be read from the rear.
From the values you have provided, it might be that you have something like this:
uint32 (record 1 header), probably 12
float32, float32, int32 (the three 'other values' you mentioned)
uint32 (record 1 header, same as the first value)
uint32 (record 2 header, probably 512^3 * 4 = 536870912)
float32 * 512^3
uint32 (record 2 header, same as before)
That accounts for 4 + 12 + 4 + 4 + 536870912 + 4 = 536870940 bytes, exactly your file size.
You might have to check endianness.
So I suggest you open the file in a hexdump program, and check whether bytes 0-3 are identical to bytes 16-19, and whether bytes 20-23 are repeated at the end of the data again.
If that is the case, I'd then check the endianness to see whether the values are little- or big-endian, and with a little luck you'll have your data.
Note: I assume that these three other values are metadata about the data, and therefore would be at the beginning of the file. If that's not the case, you might have them at the end.
Update:
In your comment, you write that your data begins with something like this:
0C 00 00 00 XX XX XX XX XX XX XX XX XX XX XX XX 0C 00 00 00
^- header-^                                     ^- header-^
E8 09 FF 1F (many, many values) E8 09 FF 1F
^- header-^ ^--- your data ---^ ^- header-^
Now, I'm not a C programmer, so I leave the exact code up to you. What you need to do is skip the first 24 bytes, then read the data as (probably little-endian) 4-byte floating-point values. The 4 bytes left at the end are the closing record marker, which you don't need any more.
Important note:
Fortran stores arrays column-major, C afaik stores them row-major. So keep in mind that the order of the indices will be reversed.
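Since you prefer C, here is a minimal sketch of that layout. It assumes little-endian data read on a little-endian host, and the file name data.dat is made up; treat it as a starting point, not a definitive reader:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define N (512UL * 512UL * 512UL)

int main(void)
{
    FILE *fp = fopen("data.dat", "rb");      /* hypothetical file name */
    if (!fp) { perror("fopen"); return 1; }

    uint32_t marker;
    unsigned char meta[12];                  /* float32, float32, int32 */

    /* record 1: 4-byte marker, 12 bytes of metadata, 4-byte marker */
    if (fread(&marker, 4, 1, fp) != 1 || marker != 12) return 1;
    if (fread(meta, 1, 12, fp) != 12) return 1;
    if (fread(&marker, 4, 1, fp) != 1 || marker != 12) return 1;

    /* record 2: 4-byte marker, then 512^3 float32 values */
    if (fread(&marker, 4, 1, fp) != 1) return 1;

    float *data = malloc(N * sizeof *data);  /* 512 MiB */
    if (!data) return 1;
    if (fread(data, sizeof *data, N, fp) != N) return 1;

    /* The trailing 4-byte marker can simply be ignored.
       Fortran is column-major: element a(i,j,k) (1-based)
       lives at data[(i-1) + (j-1)*512 + (k-1)*512*512]. */

    free(data);
    fclose(fp);
    return 0;
}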
I know how to read this in Python:
from scipy.io import FortranFile
ff = FortranFile('data.dat', 'r', '<u4')
# read the three values you are not interested in
threevals = ff.read_record('<u4')
# read the data
data = ff.read_record('<f4')
ff.close()
To give some context, I have an incoming stream of hex values that is getting written to a CSV file in the format shown below.
20 5a 20 5e 20 7b 20 b1 20 64 20 f8 ...
I cannot change the way the data flows in, but before it gets written to the CSV file I want it in the format below.
205a 205e 207b 20b1 2064 20f8 ...
As the data comes in, I need to process it and store it in the format shown above. One approach I tried was bit-shifting and OR-ing pairs of bytes into a variable, but all I have is a pointer into the buffer the data flows into. I have something like this:
uint8_t *curr_ptr;
uint8_t *init;
uint8_t *dec_buffer = (uint8_t *)calloc(4000, sizeof(uint8_t) * max_len);
init = dec_buffer;
curr_ptr = init + (count * max_len);
for (int j = 17; j <= 145; j += 1) {
    fprintf(f_write[file_count], "%02x ", *(curr_ptr + j));
    if (j > 0 && j % 145 == 0) {
        fprintf(f_write[file_count], "\n");
    }
}
Effectively you want to remove every other space. Why not something like this?
for (int j = 17; j <= 145; j += 1) {
    fprintf(f_write[file_count], j % 2 ? "%02x " : "%02x", *(curr_ptr + j));
}
Not sure if you should be printing spaces after the odd values of j or the even ones, but you can sort that out.
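For reference, here is the same idea as a self-contained program with a hypothetical sample of your stream, so the pairing logic can be tested in isolation:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* hypothetical sample of the incoming byte stream */
    const uint8_t buf[] = {0x20, 0x5a, 0x20, 0x5e, 0x20, 0x7b, 0x20, 0xb1};
    const size_t n = sizeof buf;

    /* print a space only after every second byte, pairing them up */
    for (size_t j = 0; j < n; j++)
        printf(j % 2 ? "%02x " : "%02x", buf[j]);
    printf("\n");    /* output: 205a 205e 207b 20b1 */

    return 0;
}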
I've been running some code under UBSan, and found an error which I've never seen before:
/usr/include/c++/7/bits/stl_algobase.h:324:8: runtime error: store to misaligned address 0x611000001383 for type 'struct complex', which requires 4 byte alignment
0x611000001383: note: pointer points here
66 46 40 02 00 00 00 00 00 00 00 00 04 01 18 00 08 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00
^
(g++-7.3.0, Ubuntu 18.04, flags -fsanitize=address -fsanitize=undefined)
What does this error mean? Is it truly an error (it is in the standard library, so it can't be too bad, right?), and should I care about it?
You are probably using a pointer cast that casts a block of raw memory to a complex*.
Example:
void* raw = getBuffer(); // Made up function which returns a buffer
auto size = *static_cast<uint16_t*>(raw); // Maybe your format says that you got a 2-byte size in front
auto* array = reinterpret_cast<complex*>(static_cast<char*>(raw) + sizeof(uint16_t)); // ... and complex numbers after
std::transform(array, array + size, ...); // Pass this into the STL
Boom! You got UB.
Why?
The behavior is undefined in the following circumstances: [...]
Conversion between two pointer types produces a result that is incorrectly aligned
[...]
If the resulting pointer is not correctly aligned [68] for the referenced type, the behavior is undefined.
See https://stackoverflow.com/a/46790815/1930508 (where I got these from)
What does it mean?
Every pointer must be aligned for the type it points to. For complex this means an alignment of 4. In short, the address in array (from above) must be evenly divisible by 4 (i.e. array % 4 == 0). Assuming that raw is aligned to 4 bytes, you can easily see that array cannot be, since (raw + 2) % 4 == 2 (because raw % 4 == 0).
If the size were a 4-byte value, then array would be aligned if (and only if) raw was aligned. Whether that is guaranteed depends on where raw comes from.
So yes, this is truly an error and may lead to a real bug, although not always (depending on the moon phase etc., as is always the case with UB; see the linked answer for details).
And no, it is NOT in the STL; it just happens to be detected there because UBSan watches memory dereferences. While the actual UB is the cast to complex*, it is only detected when reading through that pointer.
You can use export UBSAN_OPTIONS=print_stacktrace=1 prior to executing the program to get a stacktrace and find out where your wrong cast is.
Tip: You only need to check casts. Any struct/type allocated via new is always aligned (and every member inside), unless tricks like "packed structs" are used.
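If you do hit this, one common fix is to copy out of the unaligned buffer instead of casting the pointer, since memcpy has no alignment requirements. A minimal sketch in C, assuming (hypothetically) that each element is stored as two 4-byte floats:

#include <string.h>

/* Extract element `index` from a possibly unaligned byte buffer
   laid out as (real, imag) float pairs. */
static void read_complex(const unsigned char *buf, size_t index,
                         float *re, float *im)
{
    const unsigned char *p = buf + index * 2 * sizeof(float);
    memcpy(re, p, sizeof *re);                 /* alignment-safe */
    memcpy(im, p + sizeof(float), sizeof *im);
}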
Initially I thought this conversion would be a piece of cake, but since it didn't work when running on Windows I have to ask for advice. This is the situation:
Given are some bytes received over can e.g. like
5C FE 83 10 00 00 02 29
   ^^ ^^
I have to take bytes two and three, concatenate them as FE83, and interpret the result as a signed number, which in this case should be -381.
In Matlab this worked as:
for i = 1:length(sasBytes)
    pair23(i) = double( typecast( uint16( base2dec( strcat( hexBytes{i}{2}, hexBytes{i}{3} ), 16 ) ), 'int16' ) );
end
Now I'd like to have C code which does the same. I don't know where the code will run yet, so I'd rather use fixed-width integers like int32_t. Is there anything else I have to take care of when porting the code?
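A minimal C sketch of the same conversion, using the sample frame from above (the surrounding CAN reception code is assumed):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    /* sample frame from the question */
    const uint8_t frame[8] = {0x5C, 0xFE, 0x83, 0x10, 0x00, 0x00, 0x02, 0x29};

    /* concatenate bytes two and three (frame[1], frame[2]) big-endian;
       the explicit shift makes this independent of host byte order */
    uint16_t raw = (uint16_t)((frame[1] << 8) | frame[2]);

    /* reinterpret the bit pattern as signed; int16_t is required to be
       two's complement, so memcpy is a portable conversion */
    int16_t value;
    memcpy(&value, &raw, sizeof value);

    printf("%d\n", value);   /* prints -381 */
    return 0;
}

Using fixed-width types from <stdint.h>, as you planned, avoids the most common porting pitfall; the explicit shift above also sidesteps endianness differences between platforms.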
I've been reading the description of the xz file format ( http://tukaani.org/xz/xz-file-format.txt ), but when I look into an xz file with a binary editor, it doesn't seem to follow the structure defined in the description. What am I missing?
I compressed the description file (xz-file-format.txt) with xz cli utility in linux (xz version 4.999.9beta) and these are the first 32 bytes I get:
FD 37 7A 58 5A 00 00 04 E6 D6 B4 46 02 00 21 01 16 00 00 00 74 2F E5 A3 E0 A9 28 2A 99 5D 00 05
Overall structure of the file should be: stream - stream padding - stream - and so on. And in this case I think there should be only one stream since there is only one file compressed in the file. Structure of the stream is: stream header - block - block - ... - block - index - stream footer. And structure of the stream header is: header magic bytes - stream flags - crc code.
I can find the stream header from my file, but after the first sixteen bytes it doesn't seem to follow the description anymore.
The first six bytes above are clearly the magic bytes. The next two bytes are the stream flags. The stream flags indicate that CRC64 is being used, so the CRC code takes the next eight bytes. The seventeenth byte (I count from one) should then be the first byte of the first block.
The structure of a block is: block header - compressed data - block padding - check. The structure of the block header should be: block header size - block flags - compressed size - uncompressed size - list of filter flags - header padding - CRC. So the seventeenth byte should be the block header size (0x16 in my file). That's possible, but the eighteenth byte seems a bit weird. It should be the block flags bit field, and in my file it's null - so no flags set. Not even the number of filters, which according to the description should be 1-4.
Since bits 6 and 7 of the block flags are also zero, the compressed and uncompressed sizes should not be present in the file, and the next bytes should be the list of filter flags. The structure of the list is: filter ID - size of properties - filter properties. The nineteenth byte should then be the filter ID. In my file this is null, which is not any of the officially defined filter IDs. If it were a custom ID it would take nine bytes, but as I understand the encoding of sizes described in section 1.2 of the description it can't be one, since according to the description "All but the last byte of the multibyte representation have the highest (eighth) bit set", yet in my file the twentieth byte is also null.
So is there something I don't understand or is the file not following the description?
I asked the question a bit hastily and came up with a solution myself. Just in case someone is interested, I'll answer my own question.
I had misunderstood the meaning of the stream flags in stream header. They don't affect the CRC code in the header (which is always CRC32), just CRCs in the stream itself (as the name stream flags implies). This means that the CRC in the header is only four bytes long and thus bytes 13-24 form a valid block header.
In the block header, the block flags field is again a null byte, which I saw as a problem before. According to the description, the number of filters should be between 1 and 4, so I expected a decimal value of at least one. But since the number of filters is expressed with two bits, the maximum decimal value is 3, and the number of possible values (zero included) is of course four; thus zero means one filter.
Since the last two bits of the block flags are also zeros, no compressed size or uncompressed size fields are present in the block header. This means that bytes 15-17 are the filter flags for the first (and only) filter. Filter ID 0x21 is the ID of the LZMA2 filter, and the properties size 0x01 means one byte of properties. The dictionary size byte 0x16 then encodes a dictionary size of 8 MiB ((2 | (0x16 & 1)) << (0x16 / 2 + 11) = 2 << 22 bytes).
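To make the corrected layout concrete, here is a small C sketch that parses just the 12-byte stream header (verifying the CRC32 is left to whatever crc32() implementation you have at hand, e.g. zlib's):

#include <string.h>
#include <stdint.h>

/* xz stream header: 6 magic bytes, 2 stream-flag bytes, then a CRC32
   of the flag bytes (always CRC32, whatever check the flags select). */
static const uint8_t XZ_MAGIC[6] = {0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00};

int parse_stream_header(const uint8_t hdr[12], int *check_type)
{
    if (memcmp(hdr, XZ_MAGIC, 6) != 0)
        return -1;                /* not an xz stream */
    if (hdr[6] != 0x00)
        return -1;                /* first flag byte must be null */
    *check_type = hdr[7] & 0x0F;  /* 0x04 = CRC64 in the file above */
    /* hdr[8..11]: little-endian CRC32 over hdr[6..7] */
    return 0;
}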
I have a 256-bit value in Verilog:
reg [255:0] val;
I want to define a system task $foo that calls out to external C using the VPI, so I can call $foo like this:
$foo(val);
Now, in the C definition for the function 'foo', I cannot simply read the argument as an integer (PLI_INT32), because I have too many bits to fit in one of those. But, I can read the argument as a string, which is the same thing as an array of bytes. Here is what I wrote:
static int foo(char *userdata) {
    vpiHandle systfref, args_iter, argh;
    struct t_vpi_value argval;
    PLI_BYTE8 *value;

    systfref = vpi_handle(vpiSysTfCall, NULL);
    args_iter = vpi_iterate(vpiArgument, systfref);

    argval.format = vpiStringVal;
    argh = vpi_scan(args_iter);
    vpi_get_value(argh, &argval);
    value = argval.value.str;

    int i;
    for (i = 0; i < 32; i++) {
        vpi_printf("%.2x ", value[i]);
    }
    vpi_printf("\n");

    vpi_free_object(args_iter);
    return 0;
}
As you can see, this code reads the argument as a string and then prints out each character (aka byte) in the string. This works almost perfectly. However, the byte 00 always gets read as 20. For example, if I assign the Verilog reg as follows:
val = 256'h000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f;
And call it using $foo(val), then the C function prints this at simulation time:
VPI: 20 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
I have tested this with many different values and have found that the byte 00 always gets mapped to 20, no matter where or how many times it appears in val.
Also, note that if I read the value in as vpiHexStrVal and print the string, it looks fine.
So, two questions:
Is there a better way to read in my 256-bit value from the Verilog?
What's going on with the 20? Is this a bug? Am I missing something?
Note: I am using Aldec for simulation.
vpiStringVal is used when the value is expected to be ASCII text, in order to get the value as a pointer to a C string. This is useful if you want to use it with C functions that expect a C string, such as printf() with the %s format, fopen(), etc.
However, C strings cannot contain the null character (since null is used to terminate C strings), and also cannot represent x or z bits, so this is not a format that should be used if you need to distinguish any possible vector value. It looks like the simulator you are using formats the null character as a space (0x20); other simulators just skip it, but that doesn't help you either.
To distinguish any possible vector value, use either vpiVectorVal (the most compact representation) or vpiBinStrVal (a binary string with one 0/1/x/z character for each bit).
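For example, here is a hedged sketch of the vpiVectorVal route, reusing the boilerplate from the question (aval holds the 0/1 data, bval flags x/z bits, least-significant word first):

static PLI_INT32 foo_vec(PLI_BYTE8 *userdata)
{
    vpiHandle systfref = vpi_handle(vpiSysTfCall, NULL);
    vpiHandle args_iter = vpi_iterate(vpiArgument, systfref);
    vpiHandle argh = vpi_scan(args_iter);

    s_vpi_value argval;
    argval.format = vpiVectorVal;
    vpi_get_value(argh, &argval);

    /* one aval/bval pair per 32 bits; for reg [255:0] that is 8 pairs,
       with vector[0] holding bits 31:0 */
    int words = (vpi_get(vpiSize, argh) + 31) / 32;
    for (int i = 0; i < words; i++) {
        vpi_printf("word %d: aval=%08x bval=%08x\n", i,
                   (unsigned)argval.value.vector[i].aval,
                   (unsigned)argval.value.vector[i].bval);
    }

    vpi_free_object(args_iter);
    return 0;
}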