Processing bitmap header in C

I have two questions about the BITMAPFILEHEADER structure.
First, if we define our own version of that structure, it occupies 16 bytes because of data structure alignment, yet sizeof(BITMAPFILEHEADER) is 14 bytes. Why does that happen?
Second, as you already know, the bitmap header is little-endian, so to access a value properly you supposedly need to convert it to big-endian. However, if you look at this question, you will see that the accepted answer does no conversion at all. Could you explain how that can work?
Thank you for your help in advance.

A file can have any alignment, and the header of bitmap files happens to be 14 bytes (for more info: http://en.wikipedia.org/wiki/BMP_file_format). There is no rule that says everything must be aligned (an exception is SSE instructions, which expect their operands to be aligned). Aligned data can be accessed faster, so aligning your data is recommended, but it is not required, and file formats don't have to align their data either.
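For illustration, a minimal sketch of a struct that reproduces the 14-byte file layout, assuming a compiler that supports #pragma pack (the name MyBitmapFileHeader is made up; the real BITMAPFILEHEADER gets the same 14-byte size because Windows declares it with 2-byte packing):

#include <stdint.h>

#pragma pack(push, 1)            /* disable padding so the struct matches the file byte-for-byte */
typedef struct {
    uint16_t bfType;             /* "BM" */
    uint32_t bfSize;             /* total file size in bytes */
    uint16_t bfReserved1;
    uint16_t bfReserved2;
    uint32_t bfOffBits;          /* offset from start of file to the pixel data */
} MyBitmapFileHeader;            /* sizeof(MyBitmapFileHeader) == 14 */
#pragma pack(pop)

Without the pragma, the compiler inserts 2 bytes of padding after bfType so that bfSize lands on a 4-byte boundary, which is where the 16-byte figure comes from.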
You only need to convert the values if your host is big-endian. On a little-endian machine such as an x86 PC, the file's byte order already matches memory, which is why the accepted answer you linked does nothing. Conversely, if you want to create a new bitmap, you have to store the data in the format the BITMAPFILEHEADER struct expects, which is little-endian.
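One way to sidestep the issue entirely is to assemble each field from individual bytes, which works identically on little- and big-endian hosts. A minimal sketch (read_le32 is a hypothetical helper, not part of any bitmap API):

#include <stdint.h>

/* Build a 32-bit value from 4 little-endian bytes, independent of host endianness. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}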

Related

Is Minecraft missing zlib uncompressed size in its chunk/region data?

Info on minecraft's region files
Minecraft's region files are stored in three sections, the first two giving information about where the chunks are stored and about the chunks themselves. In the final section, each chunk is stored as a 4-byte big-endian length, a 1-byte compression type (almost always zlib, RFC 1950), and then the compressed data.
Here's more (probably better) information: https://minecraft.gamepedia.com/Region_file_format
The problem
I have a program that successfully loads chunk data. However, I'm not able to find how big the chunks will be when decompressed, so I just allocate the maximum amount of space they could take.
In the player data files, they do give the size that it takes when decompressed, and (I think) it uses the same type of compression.
The end of a player.dat file giving the size of the decompressed data (in little-endian):
This is the start of the chunk data, the first 4 bytes giving how many bytes are in the following compressed data:
Mystery data
However, if I look where the compressed data specifically "ends", there's still a lot of data after it. This data doesn't seem to have a use, but if I try to decompress any of it with the rest of the chunk, I get an error.
Highlighted chunk data, and unhighlighted mystery data:
Missing decompressed size (header?)
And there's no decompressed size (or header? I could be wrong here) given.
The final size of this example chunk is 32,562 bytes, and this number (or any close neighbour) is nowhere to be found within the chunk data or mystery data (checked in both big-endian and little-endian).
Decompressed data terminating at index 32562 (Visual Studio locals watch):
Final Questions
Is there something I'm missing? Is this compression actually different from the player data compression? What's the mystery data? And am I stuck loading in 1<<20 bytes every time I want to load a chunk from a region file?
Thank you for any answers or suggestions
Files used
Isolated chunk data: https://drive.google.com/file/d/1n3Ix8V8DAgR9v0rkUCXMUuW4LJjT1L8B/view?usp=sharing
Full region data: https://drive.google.com/file/d/15aVdyyKazySaw9ZpXATR4dyvhVtrL6ZW/view?usp=sharing
(Not linking player data for possible security reasons)
In the region data, the chunk data starts at index 1208320 (or 0x127000)
The format information you linked is quite helpful. Have you read it?
In there it says: "The remainder of the file consists of data for up to 1024 chunks, interspersed with unused space." Furthermore, "Minecraft always pads the last chunk's data to be a multiple-of-4096B in length" (Italics mine.) Everything is in multiples of 4K, so the end of every chunk is padded to the next 4K boundary.
So your "mystery" data is not a mystery at all, as it is entirely expected per the format documentation. That data is simply junk to be ignored.
Note that, per the documentation, the data "length" in the first four bytes of the chunk is actually one more than the number of bytes of compressed data in the chunk (which follows the five-byte header).
Also from the documentation, there is indeed no uncompressed size provided in the format.
zlib was designed for streaming data, where you don't know ahead of time how much there will be. You can use inflate() to decompress into whatever buffer size you like. If there's not enough room to finish, you can either do something with that data and then repeat into the same buffer, or you can grow the buffer with realloc() in C, or the equivalent for whatever language you're using. (Not noted in the question or tags.)
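A minimal sketch of that grow-as-you-go pattern in C (inflate_grow is my name for it; it assumes the whole compressed buffer is in memory and that its length fits in zlib's uInt):

#include <stdlib.h>
#include <string.h>
#include "zlib.h"

/* Decompress src[0..srclen-1] into a dynamically grown buffer.
   Returns the buffer (caller frees) and stores its length in *outlen,
   or returns NULL on error. */
unsigned char *inflate_grow(const unsigned char *src, size_t srclen, size_t *outlen)
{
    size_t size = 1 << 16;                 /* initial guess; doubled as needed */
    unsigned char *out = malloc(size);
    z_stream strm;
    int ret;

    if (out == NULL)
        return NULL;
    memset(&strm, 0, sizeof(strm));        /* zalloc/zfree/opaque = Z_NULL */
    if (inflateInit(&strm) != Z_OK) {
        free(out);
        return NULL;
    }
    strm.next_in = (unsigned char *)src;
    strm.avail_in = (uInt)srclen;
    strm.next_out = out;
    strm.avail_out = (uInt)size;
    while ((ret = inflate(&strm, Z_NO_FLUSH)) == Z_OK) {
        if (strm.avail_out == 0) {         /* output full -- double the buffer */
            unsigned char *tmp = realloc(out, size << 1);
            if (tmp == NULL) {
                ret = Z_MEM_ERROR;
                break;
            }
            out = tmp;
            strm.next_out = out + size;
            strm.avail_out = (uInt)size;
            size <<= 1;
        }
    }
    *outlen = strm.total_out;
    inflateEnd(&strm);
    if (ret != Z_STREAM_END) {             /* truncated or corrupt input */
        free(out);
        return NULL;
    }
    return out;
}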

Zlib decompress bytes with unknown compressed length in C

I am trying to write my own PNG reader without any external libraries. I need to use zlib to decompress the PNG's IDAT chunk. I have managed to do it in Python using zlib.decompress(), and I am trying to replicate it in C. I was reading over zlib's docs and found uncompress(), but it requires a destination length, which I would not know.
I could set the destination to be much larger than possible for the PNG, but this seems like a cop-out and would break my program if I had a really big picture. However, I have found a function, inflate(), which can be used multiple times. If I could do this, I could realloc() memory as needed with each call. Yet I don't understand the docs for it very well and have not found many examples for this type of thing. Could anyone provide some code or point me in the right direction?
You do know the destination length. Exactly. The PNG header information tells you how many rows, how many columns, and how many bytes per pixel. Multiply it all out, add a byte per row for the filtering, and you have your answer.
Allocate that amount of memory, and decompress into that.
Note that there can be multiple IDAT chunks, but combined they contain a single zlib stream.
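To make the size calculation concrete, a sketch under the assumption of a non-interlaced PNG with a bit depth of 8 or 16 (for smaller bit depths the row size is ceil(width * bits / 8) plus the filter byte; the helper names are mine):

#include <stdint.h>
#include <stdlib.h>
#include "zlib.h"

/* Each row of a non-interlaced PNG is 1 filter byte + width * bytes_per_pixel. */
static uLong idat_uncompressed_size(uint32_t width, uint32_t height,
                                    unsigned bytes_per_pixel)
{
    return (uLong)height * (1 + (uLong)width * bytes_per_pixel);
}

/* Usage: decompress the concatenated IDAT data in one call. */
static unsigned char *decompress_idat(const unsigned char *idat, uLong idatlen,
                                      uint32_t w, uint32_t h, unsigned bpp)
{
    uLongf destlen = idat_uncompressed_size(w, h, bpp);
    unsigned char *dest = malloc(destlen);
    if (dest != NULL && uncompress(dest, &destlen, idat, idatlen) != Z_OK) {
        free(dest);
        dest = NULL;
    }
    return dest;
}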

How do fread and fwrite distinguish between different data (types) in C?

I am working on a program in C (on Ubuntu, using bash) to manipulate binary data files. First of all, when I use fopen(filename, "w") it creates a file, but without any extension. However, when I open that file with vim it shows up in some binary form.
For this question: when I use fwrite(array, sizeof(some struct), # of structs, filePointer), it writes the structs into the file (in binary, in some way I don't understand). When I use fread(anotherArray, sizeof(same struct), same # of structs, anotherFilePointer), it somehow magically knows how to read each struct back into the array, just from its size and how many to read. What happens if I pass a count smaller than the number of structs actually in the file? How would fread know what to read correctly? How does it work, reading data just from the sizes, without knowing what type of data it is?
fwrite writes the bytes of the memory where the object is stored to the output stream and fread reads bytes from the input stream into the memory whose address it gets as an argument. No assumption is made regarding the types and representations of the C objects stored in this memory.
Hence a number of problems can occur:
- the representation of basic types can differ from one compiler to another, one machine to another, one OS to another, possibly even depending on compiler switches. Writing the bytes of the memory representation of basic types makes sense only if you know you will be reading the file back into byte-compatible structures.
- the mode used for opening the input and output files matters: as you mention, files must be opened in binary mode to avoid any translation between memory representation and file contents, such as what happens for text files on legacy systems. For example, text mode on MS-Windows causes 0A bytes to be converted to 0D 0A sequences on output and 0D bytes to be stripped on input, changing the contents wherever isolated 0D bytes appear in the original.
- if the C structure contains pointers, the bytes written to the output represent the values of those pointers, not what they point to. Reading these values back into memory is highly likely to create invalid pointers and very unlikely to make any sense.
- if the C structure has a flexible array member at the end, its contents are not included in the sizeof(T) bytes written by fwrite or read by fread.
- the C structure may contain padding between members, causing the output file to contain nondeterministic bytes, which might be a problem in some circumstances.
- if the C structure has arrays with only partially meaningful contents, such as char arrays containing C strings, beware that fwrite will write the bytes beyond the null terminator. Those bytes should not be meaningful, but they might contain sensitive information such as password fragments or other data. Carefully erasing such arrays may avoid this issue, but padding bytes cannot be erased reliably, so this solution is not perfect.
For all the above reasons and others, reading and writing binary data should be reserved for very specific cases where the programmer knows exactly what is happening. For other purposes, saving data as text files in human-readable form is much preferred.
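A minimal sketch of the round trip (the file name and struct are invented for illustration; this only works reliably when reader and writer are built with the same compiler, flags, and platform, for the reasons above):

#include <stdio.h>

struct record {
    int id;
    double value;
};

int main(void)
{
    struct record out[2] = { {1, 3.14}, {2, 2.71} }, in[2];
    size_t n;

    FILE *f = fopen("records.bin", "wb");       /* "b": binary mode, no translation */
    if (f == NULL)
        return 1;
    fwrite(out, sizeof(struct record), 2, f);   /* raw bytes, padding included */
    fclose(f);

    f = fopen("records.bin", "rb");
    if (f == NULL)
        return 1;
    n = fread(in, sizeof(struct record), 2, f); /* n = number of complete records read */
    fclose(f);
    return n == 2 ? 0 : 1;
}

Note that if you pass a count smaller than what was written, fread simply reads that many items and leaves the rest of the file unread; its return value tells you how many complete items it actually got.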
In question comments, from @David C. Rankin:
"Well, fread/fwrite read and write bytes (binary data - if you write out then read in the same number of bytes -- you get the same thing back). If you want to read and write text where you need to worry about line-breaks, etc.., fgets/fputs. or fprintf"
So I guess I can never know what I read in with fread unless I know what I wrote to it with fwrite?
"Right, look at the type for your buffer in fwrite(3) - Linux man page it is type void *. It's just a starting address for fwrite to use in writing however many bytes you told it to write. (obviously you know what it is writing) The same for fread -- it just reads bytes -- you have to know what you are reading (or at least the format of it). That's what binary I/O is about, it's all just bytes -- it's up to you, the Programmer, to know what you are writing and reading and how to unpack it. Otherwise, use formatted-I/O and lines, words, etc.."

Manipulating binary C struct data offline

I have a practical problem to solve. In an embedded system I'm working on, I have a pretty large structure holding system parameters, which includes int, float, and other structures. The structure can be saved to external storage.
I'm thinking of writing PC software to change certain values inside the binary structure without changing the overall layout of the file. The idea is that one could change one or two parameters and then load the file back into memory from external storage. So it's a special-purpose hex editor.
The way I figure it, if I can construct a table that contains:
- parameter name
- offset of the parameter in memory
- type
I should be able to change anything I want.
My problem is that it doesn't look easy to figure out the offset of each parameter programmatically. One could always printf their addresses manually, but I'd like to avoid that.
Anybody know a tool that can be of help?
EDIT
The system in question is 32-bit ARM-based, running in little-endian mode. The compiler isn't asked to pack the structure, so for my purposes it's the same as on an x86 PC.
The problem is the size of the structure: it contains no fewer than 1000 parameters in total, spread across multiple levels of nested structures, so it's not feasible (or rather, I'm not willing) to write out the code to dump all the offsets by hand. I guess the key problem is parsing the C headers and automatically generating the offsets, or code to dump them. Please suggest such a parsing tool.
Thanks
The way this is normally done is by just sharing the header file that defines the struct with the other (PC) application as well. You should then be able to basically load the bytes and cast them to that kind of struct and edit away.
Before doing this, you need to know about (and possibly handle):
- Type sizes. If the two platforms are very different (16-bit vs. 32-bit, etc.), you might have differently sized ints, floats, and so on.
- Field alignment. If the compiler or target differs, the packing rules for the fields won't necessarily be the same. You can generally force these to match with compiler-specific #pragmas.
- Endianness. This is something to be careful of in all cases. If your embedded device has a different endianness from the PC, you'll have to byte-swap each field when you load it on the PC.
If there are too many differences here to bother with, and you're building something quicker and dirtier, I don't know of tools to generate struct offsets automatically, but as another poster says, you could write some short C code that uses offsetof or equivalent (or, as you suggest, pointer arithmetic) to dump out a table of fields and offsets.
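A minimal sketch of that approach, using structs like those in the answer below (the DUMP macro is mine; nested member designators in offsetof work with the common compilers, though strictly portable code would add the substruct offsets instead):

#include <stdio.h>
#include <stddef.h>

struct SubStruct { int subSetting1; float subSetting2; };
struct Settings  { float setting1; int setting2; struct SubStruct setting3; };

/* Print "name offset size" for one field; run this once with the embedded
   toolchain to generate the table the PC-side editor needs. */
#define DUMP(type, field) \
    printf("%-24s %4zu %4zu\n", #field, offsetof(type, field), \
           sizeof(((type *)0)->field))

int main(void)
{
    DUMP(struct Settings, setting1);
    DUMP(struct Settings, setting2);
    DUMP(struct Settings, setting3.subSetting1);
    DUMP(struct Settings, setting3.subSetting2);
    return 0;
}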
I suspect you could parse the output of the pahole tool to get this. It reads the debugging information in an executable and prints out the type, name and offset of each struct member.
Given a fully-defined structure in your code, you can use the offsetof() macro to find out how the fields are laid out. This won't help if you're trying to programmatically determine the layout on the other machine, though.
Rather than writing your own tool, there are existing hex editors that'll take a struct definition and let you edit the values in a file. If you're using Windows, I believe that Hex Workshop can do this, for example.
To make sure I understand the question:
In your embedded system, you have a struct that contains system settings that looks something like this:
struct SubStruct {
    int subSetting1;
    float subSetting2;
};
/* ... some other substructure definitions ... */
struct Settings {
    float setting1;
    int setting2;
    struct SubStruct setting3;
    /* ... lots of other settings ... */
};
And your problem is you want to take a binary file containing said structure and modify the values on another machine?
Given that you know the sizes of the integral types and the packing used by the embedded system, you have two options. If the endianness of your embedded system and your PC differ, you will also have to deal with that, e.g. by byte-swapping fields (htons/htonl or equivalents) when writing them.
Option 1:
Declare the same structure in your PC program, with pragmas to force the same byte alignment as on the embedded system, and with any integral types whose sizes differ between the two systems "retyped" to explicit widths, e.g. using int16_t and int32_t if you have stdint.h, or your own typedefs if not.
Option 2:
You may be able to simply make a list of types, e.g. a list that looks like:
Name                   Type
setting1               float
setting2               int
subStruct1SubSetting1  int
subStruct1SubSetting2  float
And then calculate the locations based on the sizes and packing used by the embedded system, e.g. if the embedded system uses 4-byte alignment, 4-byte floats, and 2-byte ints, the above table would calculate out:
Name                   Type   Calculated Offset
setting1               float  0
setting2               int    4
subStruct1SubSetting1  int    8
subStruct1SubSetting2  float  12
or if the embedded system packs the structure to 1-byte alignment (e.g. if it's an 8-bit micro):
Name                   Type   Calculated Offset
setting1               float  0
setting2               int    4
subStruct1SubSetting1  int    6
subStruct1SubSetting2  float  8
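A minimal sketch of that offset calculation (the size/alignment table is written by hand to match the example above; a real tool would derive it from the headers):

#include <stdio.h>

/* Round off up to the next multiple of align. */
static unsigned align_up(unsigned off, unsigned align)
{
    return (off + align - 1) / align * align;
}

int main(void)
{
    /* 4-byte floats (align 4), 2-byte ints (align 2); the substruct itself
       is 4-aligned, so its first member carries align 4 here.  For 1-byte
       packing, set every align to 1. */
    struct { const char *name; unsigned size, align; } fields[] = {
        { "setting1",              4, 4 },
        { "setting2",              2, 2 },
        { "subStruct1SubSetting1", 2, 4 },
        { "subStruct1SubSetting2", 4, 4 },
    };
    unsigned off = 0;
    for (unsigned i = 0; i < sizeof(fields) / sizeof(fields[0]); i++) {
        off = align_up(off, fields[i].align);
        printf("%-24s %u\n", fields[i].name, off);
        off += fields[i].size;
    }
    return 0;
}

Run as-is it reproduces the 0/4/8/12 column above; with every alignment set to 1 it reproduces 0/4/6/8.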
I'm unsure what debugging capabilities you have on your system. Under Windows I would use a simple debugger script with text conversion on top of it (the debugger can be run in batch mode to generate type information).

reading 16-bit greyscale TIFF

I'm trying to read a 16-bit greyscale TIFF file (BitsPerSample=16) using a small C program to convert into an array of floating point numbers for further analysis. The pixel data are, according to the header information, in a single strip of 2048x2048 pixels. Encoding is little-endian.
With that header information, I was expecting to be able to read a single block of 2048x2048x2 bytes and interpret it as 2048x2048 2-byte integers. What I in fact get is a picture split into four quadrants of 1024x1024 pixels each, the lower two of which contain only zeros. Each of the top two quadrants looks like I expected the whole picture to look: http://users.aber.ac.uk/ruw/unlinked/15_inRT_0p457.png
If I read the same file into Gimp or ImageMagick, both tell me that they have to reduce it to 8-bit (which doesn't help me; I need the full range), but the pixels turn up in the right places: http://users.aber.ac.uk/ruw/unlinked/15_inRT_0p457_gimp.png
This would suggest that my idea about how the data are arranged within the one strip is wrong. On the other hand, the file must be correctly formatted in terms of the header information as otherwise Gimp wouldn't get it right. Where am I going wrong?
Output from tiffdump:
15_inRT_0p457.tiff:
Magic: 0x4949 Version: 0x2a
Directory 0: offset 8 (0x8) next 0 (0)
ImageWidth (256) LONG (4) 1<2048>
ImageLength (257) LONG (4) 1<2048>
BitsPerSample (258) SHORT (3) 1<16>
Compression (259) SHORT (3) 1<1>
Photometric (262) SHORT (3) 1<1>
StripOffsets (273) LONG (4) 1<4096>
Orientation (274) SHORT (3) 1<1>
RowsPerStrip (278) LONG (4) 1<2048>
StripByteCounts (279) LONG (4) 1<8388608>
XResolution (282) RATIONAL (5) 1<126.582>
YResolution (283) RATIONAL (5) 1<126.582>
ResolutionUnit (296) SHORT (3) 1<3>
34710 (0x8796) LONG (4) 1<0>
(Tag 34710 is camera information; to make sure this doesn't somehow make any difference, I've zeroed the whole range from the end of the image file directory to the start of data at 0x1000, and that in fact doesn't make any difference.)
I've found the problem - it was in my C program...
I had allocated memory for an array of longs and used fread() to read in the data:
#define PPR 2048
#define BPP 2

long *pix;
pix = malloc(PPR * PPR * sizeof(long));
fread(pix, BPP, PPR * PPR, in);
But the data come in 2-byte chunks (BPP=2) while sizeof(long)=4, so fread() packs the data densely inside the allocated memory rather than into long-sized parcels. Thus I end up with two rows packed together into one and the second half of the picture empty.
I've changed it to loop over the number of pixels and read two bytes each time and store them in the allocated memory instead:
for (m = 0; m < PPR * PPR; m++) {
    b1 = fgetc(in);
    b2 = fgetc(in);
    *(pix + m) = b1 + 256 * b2;   /* low byte first: the file is little-endian */
}
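An alternative sketch, assuming a little-endian host (true of x86), where the strip can be read directly into 16-bit elements with no swapping at all (read_strip is a hypothetical helper):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define PPR 2048                       /* pixels per row */

/* Read a PPR x PPR strip of 16-bit little-endian samples. */
static uint16_t *read_strip(FILE *in)
{
    size_t count = (size_t)PPR * PPR;
    uint16_t *pix = malloc(count * sizeof *pix);
    if (pix != NULL && fread(pix, sizeof *pix, count, in) != count) {
        free(pix);
        pix = NULL;
    }
    return pix;
}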
You understand that if StripOffsets is an array, it is an offset to an array of offsets, right? You might not be doing that dereference properly.
What's your platform? What are you trying to do? If you're willing to work in .NET on Windows, my company sells an image processing toolkit that includes a TIFF codec that works on pretty much anything you can throw at it and will return 16 bpp images. We also have many tools that operate natively on 16bpp images.
