I can't understand a few lines of code in the CS50 week 4 practice problem (bottomup) - C

I'm trying to understand the week 4 assignment (BOTTOMUP) but I'm totally lost. I understand that in order to "turn a photo upside down" you need to do the following basic steps:
Open the photo
Read its contents into a buffer
Write the buffer out to the newly created file
Close the old and new photos.
Everything is clear to me until:
// Read infile's BITMAPFILEHEADER
BITMAPFILEHEADER bf;
fread(&bf, sizeof(BITMAPFILEHEADER), 1, inptr);
Is &bf a buffer? Can someone explain this in layman's terms?
I need to understand the basics

BITMAPFILEHEADER bf; declares a variable named bf that is a BITMAPFILEHEADER struct.
&bf takes the address of that variable to pass to the fread function. In other words, it's a pointer to your BITMAPFILEHEADER variable. That variable is the "buffer" (a section of computer memory) that the fread function will fill with the data from the file.
This code so far simply reads the header part of the bitmap file in that structure. The code that follows will use that information to figure out the size, pixel format, and layout of the image data that follows.
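To make the "buffer" idea concrete, here is a minimal sketch. The struct demo_header and the function read_demo_header are hypothetical stand-ins invented for this example (they are not CS50's BITMAPFILEHEADER), and the data comes from a temporary file created with tmpfile so the example does not depend on any real bitmap.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical 4-byte header, used here only for illustration.
// Two uint16_t members leave no room for padding on common compilers.
struct demo_header
{
    uint16_t type;
    uint16_t version;
};

// Writes one header to a temporary file, then reads it back into *out.
// Returns 1 on success, 0 on failure.
int read_demo_header(struct demo_header *out)
{
    assert(out != NULL);

    FILE *fp = tmpfile();
    if (fp == NULL)
    {
        return 0;
    }

    struct demo_header original = {0x4D42, 1};
    if (fwrite(&original, sizeof(struct demo_header), 1, fp) != 1)
    {
        fclose(fp);
        return 0;
    }
    rewind(fp);

    // 'out' is an address, just like &bf in the question: fread copies
    // sizeof(struct demo_header) bytes from the file into the memory
    // that this address points to.
    size_t items = fread(out, sizeof(struct demo_header), 1, fp);
    fclose(fp);
    return items == 1;
}
```

A caller would declare struct demo_header h; and call read_demo_header(&h); afterwards h.type and h.version hold whatever bytes were in the file.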
Since your question appears to be about basic C or C++ programming language concepts, I recommend you tag these kinds of questions with the tag for the programming language you're using. Also take a moment to format your code for Stack Overflow (see How do I format my posts?)

Related

How do fread and fwrite distinguish between different data (types) in C?

I am working on a program in C (with Ubuntu and its bash) and using it to manipulate binary data files. First of all, when I use fopen(filename, "w") it creates a file but without any extension. However, when I use vim filename it opens it up in some binary form.
For this question, when I use fwrite(array, sizeof(some struct), # of structs, filePointer) it writes (which I am not sure how in binary) into the file. When I use fread(anotherArray, sizeof(same struct), same # of structs, anotherFilePointer) it somehow magically knows how to read each struct in binary form and puts it into the array just by knowing its size and how much to read. What happens if I put a decimal value less than the number of structs there are in the # of structs parameter? How would fread know what to read correctly? How does it work in reading data just by looking at the sizes and not knowing what type of data it is?
fwrite writes the bytes of the memory where the object is stored to the output stream and fread reads bytes from the input stream into the memory whose address it gets as an argument. No assumption is made regarding the types and representations of the C objects stored in this memory.
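As a rough sketch of what happens when you ask for fewer (or more) items than the file holds: fread simply copies up to size * count bytes and returns the number of complete items it managed to read. The struct point and the helper read_some_points below are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdio.h>

// Hypothetical record type, used only for illustration.
struct point
{
    int x;
    int y;
};

// Writes 'total' points to a temporary file, then asks fread for
// 'wanted' of them. 'dest' must have room for 'wanted' points.
// Returns the item count reported by fread, or 0 on error.
size_t read_some_points(struct point *dest, size_t wanted, size_t total)
{
    assert(dest != NULL && total <= 8);

    struct point src[8];
    for (size_t i = 0; i < total; i++)
    {
        src[i].x = (int) i;
        src[i].y = (int) (i * 10);
    }

    FILE *fp = tmpfile();
    if (fp == NULL)
    {
        return 0;
    }
    if (fwrite(src, sizeof(struct point), total, fp) != total)
    {
        fclose(fp);
        return 0;
    }
    rewind(fp);

    // fread knows nothing about 'struct point'. It copies bytes and
    // counts how many complete items of the given size it obtained.
    size_t got = fread(dest, sizeof(struct point), wanted, fp);
    fclose(fp);
    return got;
}
```

Asking for 2 of 3 stored points returns 2 and leaves the third unread in the file; asking for 5 of 3 returns 3, which is why the return value should always be checked.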
Hence a number of problems can occur:
the representation of basic types can differ from one compiler to another, one machine to another, one OS to another, possibly even depending on compiler switches. Writing the bytes of the memory representation of basic types makes sense only if you know you will be reading the file back into byte-compatible structures.
the mode for accessing the input and output files matters: as you mention, files must be opened in binary mode to avoid any translation between memory representation and file contents such as what happens for text files on legacy systems. For example text mode on MS-Windows causes 0A bytes to convert to 0D 0A sequences on output and 0D bytes to be stripped on input, resulting in different contents for isolated 0D bytes in the initial content.
if the C structure contains pointers, the bytes written to the output represent the value of these pointers, not what they point to. Reading these values back into memory is highly likely to create invalid pointers and very unlikely to make any sense.
if the C structure has a flexible array at the end, its contents are not included in the sizeof(T) bytes written by fwrite or read by fread.
the C structure may contain padding between members, causing the output file to contain indeterminate bytes, which might be a problem in some circumstances.
if the C structure has arrays with only partial meaningful contents, such as char arrays containing C strings, beware that fwrite will write the bytes beyond the null terminator, which should not be meaningful, but might be sensitive information such as password fragments or other meaningful data. Carefully erasing such arrays may avoid this issue, but padding bytes cannot be erased reliably, so this solution is not perfect.
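The last point is easy to demonstrate. The sketch below uses a hypothetical struct user and helper write_reused_record (both invented for this example) to show that bytes left over from an earlier, longer string end up in the file unless the array is erased first. The struct holds only a char array, so padding does not enter into it here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_SIZE 16

// Hypothetical record holding only a char array.
struct user
{
    char name[NAME_SIZE];
};

// Reuses a record that previously held old_text, stores new_text in it,
// writes the whole record with fwrite, and copies the NAME_SIZE bytes
// that landed in the file into file_bytes. If erase_first is nonzero,
// the array is zeroed before new_text is stored.
// Returns 1 on success, 0 on failure.
int write_reused_record(const char *old_text, const char *new_text,
                        int erase_first, unsigned char *file_bytes)
{
    assert(old_text != NULL && new_text != NULL && file_bytes != NULL);
    assert(strlen(old_text) < NAME_SIZE && strlen(new_text) < NAME_SIZE);

    struct user u;
    memset(&u, 0, sizeof u);
    strcpy(u.name, old_text);

    if (erase_first)
    {
        memset(u.name, 0, sizeof u.name);
    }
    strcpy(u.name, new_text); // overwrites only strlen(new_text) + 1 bytes

    FILE *fp = tmpfile();
    if (fp == NULL)
    {
        return 0;
    }
    if (fwrite(&u, sizeof u, 1, fp) != 1)
    {
        fclose(fp);
        return 0;
    }
    rewind(fp);

    size_t got = fread(file_bytes, 1, NAME_SIZE, fp);
    fclose(fp);
    return got == NAME_SIZE;
}
```

With old_text "secretpassword!" and new_text "bob", the file starts with "bob" and a null byte, but the bytes after it still spell out the tail of the old string unless erase_first is set.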
For all the above reasons and other ones, reading/writing binary data is to be reserved to very specific cases where the programmer knows exactly what is happening. For other purposes, saving as text files in human readable form is much preferred.
In the question comments, from @David C. Rankin:
"Well, fread/fwrite read and write bytes (binary data - if you write out then read in the same number of bytes -- you get the same thing back). If you want to read and write text where you need to worry about line-breaks, etc.., fgets/fputs. or fprintf"
So I guess I can never know what I read in with fread unless I know what I wrote to it with fwrite?
"Right, look at the type for your buffer in fwrite(3) - Linux man page it is type void *. It's just a starting address for fwrite to use in writing however many bytes you told it to write. (obviously you know what it is writing) The same for fread -- it just reads bytes -- you have to know what you are reading (or at least the format of it). That's what binary I/O is about, it's all just bytes -- it's up to you, the Programmer, to know what you are writing and reading and how to unpack it. Otherwise, use formatted-I/O and lines, words, etc.."

Reading data from a file to a struct in C

Let's say I used the fread function to read data from a file to a struct. How exactly is data read to the struct? Let's say my struct has the following:
int x;
char c;
Will the first 4 bytes read go into x and the next byte go into c?
And if I read in more bytes than the elements in my struct can hold, what's going to happen?
Will the first 4 bytes read go into x and the next byte go into c?
Yes, unless your compiler has extremely strange padding rules (e.g. every member must be 8 byte aligned). And assuming int is 4 bytes and char is 1 byte.
And if i read in more bytes than the elements in my struct can hold what's gonna happen?
That's undefined behavior, unless perhaps the over-long write is not more than sizeof(YourStruct) in which case you'll only be writing to the padding bytes (which on a lot of platforms will be 3 bytes after the char).
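If you want to see what your own compiler does with such a struct, offsetof and sizeof will tell you. This is a small sketch; struct pair, print_pair_layout and pair_trailing_padding are names made up for the example, and the numbers printed depend on the platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

// The struct from the question.
struct pair
{
    int x;
    char c;
};

// Prints where each member sits and how large the whole struct is.
void print_pair_layout(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "offset of x: %zu\n", offsetof(struct pair, x));
    fprintf(out, "offset of c: %zu\n", offsetof(struct pair, c));
    fprintf(out, "sizeof(struct pair): %zu\n", sizeof(struct pair));
}

// Number of padding bytes the compiler added after the last member.
size_t pair_trailing_padding(void)
{
    return sizeof(struct pair) - (offsetof(struct pair, c) + sizeof(char));
}
```

On a platform with a 4-byte int and 4-byte alignment you would typically see offsets 0 and 4 and a size of 8, i.e. the 3 trailing padding bytes mentioned above, but nothing guarantees those exact numbers.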
fread reads data byte-for-byte from a file (stream) into memory. Therefore, if what you're trying to read is a struct, the byte layout of the struct in the file must exactly match the layout your compiler has chosen for the struct in memory.
So the question of "How does fread read from a file?" really boils down to, "How does the compiler lay out structs in memory?"
And the answer to that question is, it's partly determined by the rules of the C language, and it's partly up to the compiler.
So if you want to read structures from a file, you have three choices:
1. Learn everything you can about the C rules for laying out structures in memory, and the choices compilers can make in interpreting these rules. Keep all these rules in mind as you design your structures and your data file formats. (This is not an impossible task. Many programmers take this approach to file i/o all the time.)
2. Don't worry about the layout too much. Define your structures, and write them out to files using fwrite. Then the files are automatically readable using fread -- at least, as long as the program doing the reading is running on the same kind of machine, and was compiled by the same compiler using the same settings. (This, too, is a popular strategy, and works much of the time.)
3. Don't use fread to read structures from a file. (And although it sounds defeatist, this is my own preferred approach.)
There's much, much more that could be said about this question. If you choose approach 1, as I've already said, you're going to have to learn everything you can about the C rules for laying out structures in memory, and the choices compilers can make in interpreting these rules. If you choose approach 3, you have to learn some decent techniques for doing so without using fwrite and fread. But I'm not going to launch into long explanations of either of those topics here. I'm sure someone else will post some links, or you could start with Chapter 17 of these C programming notes.
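For a taste of what the third choice can look like, here is one possible sketch, not the only way to do it: each member is written and read one byte at a time in a fixed (little-endian) order, so the file layout no longer depends on the compiler's padding or the machine's byte order. The struct record and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical record. In memory it may contain padding; in the file
// it always occupies exactly 4 + 2 = 6 bytes.
struct record
{
    uint32_t id;
    uint16_t flags;
};

// Writes the low 'nbytes' bytes of value, least significant byte first.
static int put_le(FILE *fp, uint32_t value, int nbytes)
{
    assert(fp != NULL && nbytes >= 1 && nbytes <= 4);
    for (int i = 0; i < nbytes; i++)
    {
        if (putc((int) ((value >> (8 * i)) & 0xFFu), fp) == EOF)
        {
            return 0;
        }
    }
    return 1;
}

// Reads 'nbytes' bytes, least significant byte first, into *value.
static int get_le(FILE *fp, int nbytes, uint32_t *value)
{
    assert(fp != NULL && value != NULL && nbytes >= 1 && nbytes <= 4);
    uint32_t result = 0;
    for (int i = 0; i < nbytes; i++)
    {
        int byte = getc(fp);
        if (byte == EOF)
        {
            return 0;
        }
        result |= (uint32_t) byte << (8 * i);
    }
    *value = result;
    return 1;
}

// Each function returns 1 on success, 0 on failure.
int write_record(FILE *fp, const struct record *r)
{
    return put_le(fp, r->id, 4) && put_le(fp, r->flags, 2);
}

int read_record(FILE *fp, struct record *r)
{
    uint32_t id, flags;
    if (!get_le(fp, 4, &id) || !get_le(fp, 2, &flags))
    {
        return 0;
    }
    r->id = id;
    r->flags = (uint16_t) flags;
    return 1;
}
```

The cost is more code per struct; the benefit is that the file format is defined by your code rather than by sizeof.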

C How fread reads different data blocks in a binary file?

I'm porting some C code to C#, but I know little about C, but I'm flexible and I can learn new programming languages. Anyway I wasn't able to figure out the exact behaviour from the code I'm porting.
I've read about fread() on the web.
fread(&(targetObj->data), sizeof(TestObj), 1, file);
Now, file is a big binary file with lots of data in it.
What I want to know is how I can do this in C#.
Let me explain:
I think that line of code does this:
TestObj is an unsigned short
reads 1 time a chunk of data of the size of TestObj(unsigned short)
reads it from file (which is pointer to a binary file on filesystem) into targetObj->data
What I don't understand is:
I have a big binary file; what does it actually read? Are there headers somewhere that define where an unsigned-short-sized chunk of data is written?
Where in the binary file does it take that object from? How can I know how to read it back from the binary file in C#? Maybe C knows where to pick that single unsigned short, but I don't in C#
For example if that binary file has saved in it 40 unsigned shorts the C code line above reads just the first one?
and if I do
fread(&(targetObj->data), sizeof(TestObj), 5, file);
it is expected that testObj->data is an array of 5 unsigned shorts?
And the code will read the first 5 unsigned shorts that it finds in the whole binary file?
I can't wrap my head around this, but I need to know how C recognizes that unsigned short in a big binary file whose content I don't know, and I can't think how I would tell C# to read the first C unsigned short from that file
fread just reads the specified number of bytes from the current file cursor position, and advances the file cursor (or "file pointer", but not to be confused with a C pointer).
So if sizeof(TestObj) is 2, it will read two bytes and place them into the location pointed to by &(targetObj->data), with no bounds checking, and regardless of any differences between your architecture's endianness and the file protocol's endianness. Note that this approach is not a platform-independent way of parsing files containing numbers in binary form, since the number might be stored differently on your machine, compared to how it is stored inside the file (by whoever designed the binary protocol you are trying to read).
In C#, you might achieve a similar thing by manually specifying struct packing and field placement, although the code will suffer from the same problems as your C code.
fread reads from the current position in the stream; see also ftell and fseek. The equivalent in C# would be Stream.Read.
From man fread
size_t fread(void *restrict ptr, size_t size, size_t nitems, FILE *restrict stream);
The function fread() reads nitems objects, each size bytes long, from the stream pointed to by stream, storing them at the location given by ptr.
sizeof(short) is resolved by compiler, as per https://stackoverflow.com/a/14171152/6204612
And C does not do any pretty conversions for you. What is read is precisely sizeof(short) bytes, and these bytes are put into targetObj->data. Whether that is correct or not is the implementer's responsibility. You need to manage offsets, collection sizes etc. on your own.
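The "current position" behaviour can be sketched as follows. The helpers make_short_file and read_shorts are hypothetical, and the file is a temporary one filled with 100, 101, 102, ... so that the example is self-contained.

```c
#include <assert.h>
#include <stdio.h>

// Fills a temporary file with 'count' unsigned shorts (100, 101, ...)
// and returns it rewound to the start, or NULL on failure.
// The caller closes the file.
FILE *make_short_file(size_t count)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
    {
        return NULL;
    }
    for (size_t i = 0; i < count; i++)
    {
        unsigned short v = (unsigned short) (100 + i);
        if (fwrite(&v, sizeof v, 1, fp) != 1)
        {
            fclose(fp);
            return NULL;
        }
    }
    rewind(fp);
    return fp;
}

// Reads up to n unsigned shorts starting at the current file position.
// There is no header lookup: the position alone decides which bytes
// are read, and each call continues where the previous one stopped.
size_t read_shorts(FILE *fp, unsigned short *dest, size_t n)
{
    assert(fp != NULL && dest != NULL);
    return fread(dest, sizeof(unsigned short), n, fp);
}
```

So with 40 values in the file, a first call with n = 1 yields only the first value, a following call with n = 5 yields the next five (dest must then be an array of at least 5 elements), and fseek can be used to jump to any other offset before reading.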

C - Save/Load Pointer Data to File

Firstly apologies if this question has been asked before or has a glaring obvious solution that I cannot see. I have found a similar question however I believe what I am asking goes a little further than what was previously asked.
I have a structure as follows:
typedef struct {
int id;
char *title;
char *body;
} journal_entry;
Q: How do I write and load the contents of a pointer to memory in C (not C++) without using fixed lengths?
Am I wrong in thinking that by writing title or body to file I would end up with junk data and not actually the information I had stored? I do not know the size that the title or body of a journal entry would be and the size may vary significantly from entry to entry.
My own reading suggests that I will need to dereference pointers and fwrite each part of the struct separately. But I'm uncertain how to keep track of the data and the structs without things becoming confused, particularly for larger files. Furthermore, if these are not the only items I intend to store in the file (for example I may wish to include small images later on), I'm uncertain how I would order the file structure for convenience.
The other (possibly perceived) problem is that I have used malloc to allocate memory for the string for the body / entry when loading the data how will I know how much memory to allocate for the string when I wish to load the entry again? Do I need to expand my struct to include int body_len and int title_len?
Guidance or suggestions would be very gratefully received.
(I am focusing on a Linux point of view, but it could be adapted to other systems)
Serialization
What you want to achieve is often called serialization (citing wikipedia) - or marshalling:
The serialization is the process of translating data structures or object state into a format that can be stored and reconstructed later in the same or another computer
Pointer I/O
It is in principle possible to read and write pointers, e.g. with the %p conversion specification for fprintf(3) & fscanf(3) (and you might directly write and read a pointer, which at the machine level is like some intptr_t integer). However, a given address (e.g. 0x1234F580 ...) is likely to be invalid or have a different meaning when read again by a different process (e.g. because of ASLR).
Serialization of aggregate data
You might use some textual format like JSON (and I actually recommend doing so) or other format like YAML (or perhaps invent your own, e.g. inspired by s-exprs). It is a well established habit to prefer textual formats (and Unix had that habit since before 1980) to binary ones (like XDR, ASN.1, ...). And many protocols (HTTP, SMTP, FTP, JSONRPC ....) are textual protocols.
Notice that on current systems, I/O is much slower than computation, so the relative cost of textual encoding & decoding is tiny w.r.t. network or disk IO (see table of Answers here)
The encoding of some aggregate data (e.g. a struct in C) is generally compositional, and by composing the encoding of elementary scalar data (numbers, strings, ....) you can encode some higher-level data type.
serialization libraries
Most formats (notably JSON) have several free software libraries to encode/decode them, e.g. Jansson, JsonCPP, etc..
Suggestion:
Use JSON and format your journal_entry perhaps into a JSON object like
{ "id": 1234,
"title": "Some Title Here",
"body": "Some body string goes here" }
Concretely, you'll use some JSON library and first convert your journal_entry into some JSON type (and vice versa), then use the library to encode/decode that JSON
databases
You could also consider a database approach (e.g. sqlite, etc...)
PS. Serialization of closures (or anything containing pointer to code) may be challenging. You'll need to define what exactly that means.
PPS. Some languages provide builtin support for serialization and marshalling. For example, Ocaml has a Marshal module, Python has pickle
You are correct that writing this structure to a file exactly as it sits in memory is not a good idea: the file would only contain the pointer values, and once the strings to which your pointers point are gone, there is no way to retrieve them. From the practical point of view, one way is to declare strings of finite length (if you know that your strings have a length limit):
typedef struct {
int id;
char title[MAX_TITLE_LENGTH];
char body[MAX_BODY_LENGTH];
} journal_entry;
If you need to allocate title and body with malloc, you can have a "header" element that stores the length of the whole structure. When you read your structure back from the file, you would use this element to figure out how many bytes you need to read.
I.e. to write:
FILE* fp = fopen(<your-file-name>,"wb");
size_t size = sizeof(id)+strlen(title)+1+strlen(body)+1;
fwrite(&size, sizeof(size), 1, fp);
fwrite(&id, sizeof(id), 1, fp);
fwrite(title, sizeof(char), strlen(title)+1, fp);
fwrite(body, sizeof(char), strlen(body)+1, fp);
fclose(fp);
To read (not particularly safe implementation, just to give the idea):
FILE* fp = fopen(<your-file-name>,"rb");
size_t size;
size_t read_bytes = 0;
journal_entry je; // the typedef names an anonymous struct, so there is no "struct journal_entry"
fread(&size, sizeof(size), 1, fp);
char* buf = malloc(size); // char* rather than void*, so the pointer arithmetic below is standard C
fread(buf, size, 1, fp);
fclose(fp);
je.id = *((int*)buf); // might break if you wrote your file on a machine with different endianness
read_bytes += sizeof(je.id);
je.title = buf+read_bytes; // points into buf, so buf must stay allocated while je is in use
read_bytes += strlen(je.title)+1;
je.body = buf+read_bytes;
// other way would be to malloc je.title and je.body and destroy the buf
In memory you can store strings as pointers to arrays. But in a file on disk you would typically store the data directly. One easy way to do it would be to store a uint32_t containing the size, then store the actual bytes of the string. You could also store null-terminated strings in the file, and simply scan for the null terminator when reading them. The first method makes it easier to preallocate the needed buffer space when reading, without needing to pass over the data twice.
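The first method (a uint32_t length followed by the bytes) might look like the sketch below. write_string and read_string are hypothetical helper names; note that the length is written in the host's byte order here, so this simple version is only portable between machines of the same endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Writes a uint32_t length followed by the string's bytes (without the
// null terminator). Returns 1 on success, 0 on failure.
int write_string(FILE *fp, const char *s)
{
    assert(fp != NULL && s != NULL);
    size_t n = strlen(s);
    if (n > UINT32_MAX)
    {
        return 0;
    }
    uint32_t len = (uint32_t) n;
    // Assumption: host byte order is acceptable for the length field.
    if (fwrite(&len, sizeof len, 1, fp) != 1)
    {
        return 0;
    }
    return fwrite(s, 1, len, fp) == len;
}

// Reads one length-prefixed string. Returns a malloc'd, null-terminated
// copy that the caller must free, or NULL on failure or end of file.
char *read_string(FILE *fp)
{
    assert(fp != NULL);
    uint32_t len;
    if (fread(&len, sizeof len, 1, fp) != 1)
    {
        return NULL;
    }
    // The length is known before the data is read, so the buffer can be
    // allocated at exactly the right size in a single pass.
    char *s = malloc((size_t) len + 1);
    if (s == NULL)
    {
        return NULL;
    }
    if (fread(s, 1, len, fp) != len)
    {
        free(s);
        return NULL;
    }
    s[len] = '\0';
    return s;
}
```

For the journal entry from the question, you would write id with fwrite, then call write_string for title and for body, and read them back in the same order.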

Processing bitmap header

I have two questions about the BITMAPFILEHEADER structure.
First, if we make our own version of that structure, the memory assigned to it would be 16 bytes because of data structure alignment, but BITMAPFILEHEADER is 14 bytes. Why does that happen?
Second, as you already know, the bitmap header is little-endian, so to access the values properly you need to convert them to big-endian. However, if you look at this question, you will see that the accepted answer does no conversion. Could you explain how that is possible?
Thank you for your help in advance.
A file can have any type of alignment, and the header of bitmap files happens to be 14 bytes (for more info: http://en.wikipedia.org/wiki/BMP_file_format). There is no rule that says that everything must be aligned (exceptions are SSE instructions which expect everything to be aligned). Aligned data can be accessed faster, so it is recommended that you align your data, but you don't have to. File formats don't have to align their data either.
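You need to convert to big-endian only if the machine reading the values is big-endian; on a little-endian machine such as x86 the fields can be used exactly as they are read, which is presumably why the answer you link to does no conversion. If you just want to create a new bitmap, you have to store the data in the same format as expected in the BITMAPFILEHEADER struct, which is little-endian.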
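As far as I know, real headers get the 14-byte in-memory size by asking the compiler not to pad the struct. Here is a sketch of that idea using #pragma pack, which is a compiler extension (supported by GCC, Clang and MSVC) and not standard C. The struct and member names are hypothetical; only the member sizes and order mirror the BMP file header. The helper u32_from_le shows a way to decode a little-endian field that works regardless of the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Naturally aligned version: the compiler usually inserts 2 padding
// bytes after 'type', giving 16 bytes on common platforms.
struct header_natural
{
    uint16_t type;
    uint32_t size;
    uint16_t reserved1;
    uint16_t reserved2;
    uint32_t off_bits;
};

// Packed version: no padding, so it matches the 14-byte file layout.
#pragma pack(push, 1)
struct header_packed
{
    uint16_t type;
    uint32_t size;
    uint16_t reserved1;
    uint16_t reserved2;
    uint32_t off_bits;
};
#pragma pack(pop)

// Decodes a little-endian 32-bit value from 4 raw bytes. Because it
// works on individual bytes, the result is the same on little-endian
// and big-endian hosts.
uint32_t u32_from_le(const unsigned char *b)
{
    assert(b != NULL);
    return (uint32_t) b[0]
         | ((uint32_t) b[1] << 8)
         | ((uint32_t) b[2] << 16)
         | ((uint32_t) b[3] << 24);
}
```

Comparing sizeof(struct header_natural) with sizeof(struct header_packed) on your own compiler shows the difference the question asks about.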

Resources