Reading a jpeg file byte by byte - c

For the class CS50, I have to read in JPEG files byte by byte from a memory card in order to look at the header information. The program compiles fine, but whenever I execute it, it returns a "Segmentation fault (core dumped)" message.
Edit: Okay, now I know why I have to use an unsigned char buffer instead of int*. Can someone tell me how to keep the output file in scope while writing to it in this particular code? Right now I am trying to write the data outside of an if() condition, and I don't think fwrite is actually writing to the "image" file I opened.
#include <stdio.h>
#include <string.h>
#include <math.h>

FILE *image = NULL;

int main(int argc, char *argv[])
{
    FILE *infile = fopen("card.raw", "r");
    if (infile == NULL)
    {
        printf("Could not open.\n");
        fclose(infile);
        return 1;
    }
    unsigned char storage[512];
    int number = 0;
    int b = floor((number) / 100);
    int c = floor(((number) - (b * 100)) / 10);
    int d = floor(((number) - (b * 100) - (c * 10)));
    int writing = 0;
    char string[5];
    char *extension = ".jpg";
    while (fread(&storage, sizeof(storage), 1, infile))
    {
        if (storage == NULL)
        {
            break;
        }
        if (storage[0] == 0xff && storage[1] == 0xd8 && storage[2] == 0xff)
        {
            if (storage[3] == 0xe0 || storage[3] == 0xe1)
            {
                if (image != NULL)
                {
                    fclose(image);
                }
                sprintf(string, "%d%d%d%s", b, c, d, extension);
                image = fopen(string, "w");
                number++;
                writing = 1;
                if (writing == 1 && storage != NULL)
                {
                    fwrite(storage, sizeof(storage), 1, image);
                }
            }
        }
        if (writing == 1 && storage != NULL)
        {
            fwrite(storage, sizeof(storage), 1, image);
        }
        if (storage == NULL)
        {
            fclose(image);
        }
    }
    fclose(image);
    fclose(infile);
    return 0;
}
Here is the problem set, in case my explanation is not clear.
recover
In anticipation of this problem set, I spent the past several days snapping photos of people I know, all of which were saved by my
digital camera as JPEGs on a 1GB CompactFlash (CF) card. (It’s
possible I actually spent the past several days on Facebook instead.)
Unfortunately, I’m not very good with computers, and I somehow deleted
them all! Thankfully, in the computer world, "deleted" tends not to
mean "deleted" so much as "forgotten." My computer insists that the CF
card is now blank, but I’m pretty sure it’s lying to me.
Write in ~/Dropbox/pset4/jpg/recover.c a program that recovers these photos.
Ummm.
Okay, here’s the thing. Even though JPEGs are more complicated than BMPs, JPEGs have "signatures," patterns of bytes that distinguish
them from other file formats. In fact, most JPEGs begin with one of
two sequences of bytes. Specifically, the first four bytes of most
JPEGs are either
0xff 0xd8 0xff 0xe0 or 0xff 0xd8 0xff 0xe1
from first byte to fourth byte, left to right. Odds are, if you find one of these patterns of bytes on a disk known to store photos
(e.g., my CF card), they demark the start of a JPEG. (To be sure, you
might encounter these patterns on some disk purely by chance, so data
recovery isn’t an exact science.)
Fortunately, digital cameras tend to store photographs contiguously on CF cards, whereby each photo is stored immediately
after the previously taken photo. Accordingly, the start of a JPEG
usually demarks the end of another. However, digital cameras generally
initialize CF cards with a FAT file system whose "block size" is 512
bytes (B). The implication is that these cameras only write to those
cards in units of 512 B. A photo that’s 1 MB (i.e.,
1,048,576 B) thus takes up 1048576 ÷ 512 = 2048 "blocks" on a CF card. But so does a photo that’s, say, one byte smaller (i.e.,
1,048,575 B)! The wasted space on disk is called "slack space."
Forensic investigators often look at slack space for remnants of
suspicious data.
The implication of all these details is that you, the investigator, can probably write a program that iterates over a copy
of my CF card, looking for JPEGs' signatures. Each time you find a
signature, you can open a new file for writing and start filling that
file with bytes from my CF card, closing that file only once you
encounter another signature. Moreover, rather than read my CF card’s
bytes one at a time, you can read 512 of them at a time into a buffer
for efficiency’s sake. Thanks to FAT, you can trust that JPEGs'
signatures will be "block-aligned." That is, you need only look for
those signatures in a block’s first four bytes.
Realize, of course, that JPEGs can span contiguous blocks. Otherwise, no JPEG could be larger than 512 B. But the last byte of a
JPEG might not fall at the very end of a block. Recall the possibility
of slack space. But not to worry. Because this CF card was brand-new
when I started snapping photos, odds are it’d been "zeroed" (i.e.,
filled with 0s) by the manufacturer, in which case any slack space
will be filled with 0s. It’s okay if those trailing 0s end up in the
JPEGs you recover; they should still be viewable.
Now, I only have one CF card, but there are a whole lot of you! And so I’ve gone ahead and created a "forensic image" of the card,
storing its contents, byte after byte, in a file called card.raw . So
that you don’t waste time iterating over millions of 0s unnecessarily,
I’ve only imaged the first few megabytes of the CF card. But you
should ultimately find that the image contains 16 JPEGs. As usual, you
can open the file programmatically with
fopen, as in the below.
FILE* file = fopen("card.raw", "r");
Notice, incidentally, that ~/Dropbox/pset4/jpg contains only recover.c, but it’s devoid of any code. (We leave it to you to decide
how to implement and compile recover!) For simplicity, you should
hard-code "card.raw" in your program; your program need not accept any
command-line arguments. When executed, though, your program should
recover every one of the JPEGs from card.raw, storing each as a
separate file in your current working directory. Your program should
number the files it
outputs by naming each ###.jpg, where ### is a three-digit decimal number from 000 on up. (Befriend sprintf.) You need not try to
recover the JPEGs' original names. To
check whether the JPEGs your program spit out are correct, simply double-click and take a look! If each photo appears intact,
your operation was likely a success!
Odds are, though, the JPEGs that the first draft of your code spits out won’t be correct. (If you open them up and don’t see
anything, they’re probably not correct!) Execute the command below to
delete all JPEGs in your current working directory.
rm *.jpg
If you’d rather not be prompted to confirm each deletion, execute the command below instead.
rm -f *.jpg
Just be careful with that -f switch, as it "forces" deletion without prompting you.

int* storage[512];
This declares an array of 512 pointers to int: space is reserved for the pointers themselves, but not for any ints to point at.
I suspect you just want
int storage[512];
After this, storage refers to actual storage for 512 ints (it decays to a pointer when you pass it around). Though I still think you don't want this: you need bytes, not ints. The nearest thing C has is unsigned char. So the final declaration is:
unsigned char storage[512];
Why? Because fread reads into consecutive bytes. If you read into ints, then you will read 4 bytes into each int (because an int typically occupies 4 bytes).
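To make that concrete, here is a minimal sketch of reading one 512-byte block into such a buffer (using the question's card.raw; a sketch for illustration, not the full solution):

#include <stdio.h>

int main(void)
{
    unsigned char storage[512];              /* one byte per element */
    FILE *infile = fopen("card.raw", "rb");  /* "b": raw bytes, no newline translation */
    if (infile == NULL)
        return 1;
    /* fread fills consecutive bytes, so storage[0] really is the
       first byte of the block, storage[1] the second, and so on */
    if (fread(storage, 1, sizeof(storage), infile) == sizeof(storage))
        printf("first byte: 0x%02x\n", storage[0]);
    fclose(infile);
    return 0;
}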

There are a number of problems in your program. The first is that you have not opened the file in binary mode.
The second is that you are doing unnecessary pointer arithmetic. Why not simply:
char buffer[BUFFERSIZE];
....
if (buffer[ii] == WHATEVER)
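Putting both fixes together with the block-aligned signature test from the problem set, a sketch of the whole recovery loop might look like the following. This is one possible shape, not the official solution. Note that %03d produces the required ###.jpg names directly, and that the question's char string[5] is too small anyway: "000.jpg" plus the terminating '\0' needs 8 bytes.

#include <stdio.h>

int main(void)
{
    FILE *infile = fopen("card.raw", "rb");   /* "b": binary mode */
    if (infile == NULL)
    {
        printf("Could not open.\n");
        return 1;
    }
    unsigned char block[512];
    FILE *image = NULL;
    int number = 0;
    char name[8];                             /* "000.jpg" + '\0' */
    while (fread(block, 1, sizeof(block), infile) == sizeof(block))
    {
        /* FAT keeps JPEG signatures block-aligned, so only the
           first four bytes of each block need to be checked */
        if (block[0] == 0xff && block[1] == 0xd8 && block[2] == 0xff &&
            (block[3] == 0xe0 || block[3] == 0xe1))
        {
            if (image != NULL)
                fclose(image);                /* a new signature ends the previous JPEG */
            sprintf(name, "%03d.jpg", number++);
            image = fopen(name, "wb");
        }
        if (image != NULL)                    /* skip blocks before the first JPEG */
            fwrite(block, 1, sizeof(block), image);
    }
    if (image != NULL)
        fclose(image);
    fclose(infile);
    return 0;
}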

Related

How to duplicate an image file? [duplicate]

I am designing an image decoder, and as a first step I tried to just copy the file using C, i.e. open the file and write its contents to a new file. Below is the code that I used.
while ((c = getc(fp)) != EOF)
    fprintf(fp1, "%c", c);
where fp is the source file and fp1 is the destination file.
The program executes without any error, but the image file (".bmp") is not properly copied. I have observed that the size of the copied file is smaller, and only 20% of the image is visible; all the rest is black. When I tried with simple text files, the copy was complete.
Do you know what the problem is?
Make sure that the type of the variable c is int, not char. In other words, post more code.
This is because the value of the EOF constant is typically -1, and if you read characters as char-sized values, every byte that is 0xff will look like the EOF constant. With the extra bits of an int, there is room to separate the two.
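In code, the fix is just the declared type (a small sketch; src and dst play the roles of the question's fp and fp1):

#include <stdio.h>

/* Copy src to dst byte by byte. c must be an int: getc() returns
   either a byte value (0..255) or EOF (typically -1), and stored
   in a char the byte 0xff would compare equal to EOF. */
static void copy_bytes(FILE *src, FILE *dst)
{
    int c;
    while ((c = getc(src)) != EOF)
        putc(c, dst);
}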
Did you open the files in binary mode? What are you passing to fopen?
It's one of the most "popular" C gotchas.
You should use fread and fwrite, a block at a time:
FILE *fd1 = fopen("source.bmp", "rb");   /* "b": binary mode */
FILE *fd2 = fopen("destination.bmp", "wb");
if (!fd1 || !fd2) {
    /* handle open error */
}

size_t l1;
unsigned char buffer[8192];

/* data to be read */
while ((l1 = fread(buffer, 1, sizeof(buffer), fd1)) > 0) {
    size_t l2 = fwrite(buffer, 1, l1, fd2);
    if (l2 < l1) {
        if (ferror(fd2)) {
            /* handle error */
        } else {
            /* handle media full */
        }
    }
}
fclose(fd1);
fclose(fd2);
It's substantially faster to read in bigger blocks, and opening the files in binary mode means the bytes pass through untouched, so there is no problem with \n getting transformed to \r\n in the output (on Windows and DOS) or \r (on old Macs).

Strange results with reading binary files in C

I'm working on 64-bit Xubuntu 14.04.
I have a fairly large C program and to test new features, I usually implement them in a separate program to iron out any bugs and whatnot before incorporating them into the main program.
I have a function that takes a const char* as argument, to indicate the path of a file (/dev/rx.bin in this case). It opens the file, reads a specific number of bytes into an array and then does some things before exporting the new data to a different file.
First off I allocate the array:
int16_t *samples = (int16_t *)calloc(rx_length, 2 * sizeof(samples[0]));
Note that rx_length is for example 100 samples (closer to 100 000 in the actual program), and it's calculated from the same constants.
Next I open the file and read from it:
uint32_t num_samples_read;
FILE *in_file = fopen(file, "rb");
if (in_file == NULL) {
    ferror(in_file);
    return 1;
}
num_samples_read = fread(samples, 2 * sizeof(samples[0]), rx_length, in_file);
Here's the kicker: the return value from fread is not the same between the test program and the main program, even though the code is identical. For example, when I should be reading 100 000 samples from a 400 kB file (100 000 samples, one int16_t for the real part and one int16_t for the imaginary part, which adds up to four bytes per sample), the value returned is 99328 in the main program. For the life of me I cannot figure out why.
I've tested the output of every single variable used in any calculation, and up until fread() everything is identical.
I should also note that the function is in a separate header in my main program, but I figured that since printing every constant / definition gives the expected result, that it's not there where I'm making a mistake.
If there's anything that I might have missed, any input would be greatly appreciated.
Regards.
Thank you chux for reminding me to close and answer.
Failing to close the file was the problem in my main program; it never occurred in the test environment because the input file was not being modified there.
Once the RX thread has completed its task, make a call to fclose():
rx_task_out:
    fclose(p->out_file);
    // close device
    // free sample buffer
    return NULL;
Previously, only an error status with creating the RX thread caused it to close the file.
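More generally, when fread() comes up short you can ask the stream why, via feof() and ferror(). A sketch along the lines of the question's variables (a hypothetical helper, not from the original program):

#include <stdint.h>
#include <stdio.h>

/* Read rx_length samples (2 * int16_t each, as in the question)
   and report why a read came up short. */
static size_t read_samples(int16_t *samples, size_t rx_length, FILE *in_file)
{
    size_t got = fread(samples, 2 * sizeof(samples[0]), rx_length, in_file);
    if (got < rx_length)
    {
        if (ferror(in_file))
            perror("fread");    /* a genuine I/O error */
        else                    /* plain EOF: the file is shorter than expected,
                                   e.g. because the writer never fclose()d it */
            fprintf(stderr, "short read: %zu of %zu samples\n", got, rx_length);
    }
    return got;
}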

Copy data from SD card to RAM on ARM

I need your help! I want to copy one file from the SD card to the memory of my ARM Cortex-A9 (to transfer it faster to the FPGA). But I don't know the start address of the file or its size. Is there any way to find this information? I have some experience with FPGAs, but not with microcontrollers and ARM. =(
Many thanks in advance! Djrem
You're going to need some file system layer, that can interpret the contents of the SD card properly. Such cards are typically never used as raw flash, but instead use a file system layer on top which gives you directories and files. This is of course necessary when moving the card between devices, to make it interoperable.
Once you have a file system driver, you're going to be able to basically open the file on the SD card for reading, and then sit in a loop reading blocks of some suitable size. For every block read in, you simply copy it to the desired address in RAM. Of course, you can read it directly to the proper address too, skipping the copy.
In pseudo-C, it would basically just be:
FILE *in;
if ((in = fopen("sd0:\\file.dat", "rb")) != NULL)
{
    unsigned char *target = (unsigned char *) 0xec008000; /* totally random */
    size_t got;
    /* size 1, count 1024: fread then returns the number of bytes read,
       so the pointer advances correctly even on a short final read */
    while ((got = fread(target, 1, 1024, in)) > 0)
    {
        target += got;
    }
    fclose(in);
}
Of course, you will probably not be using stdio, so the fopen(), fread() and fclose() functions will be something different depending on your file system driver.
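For instance, if your board happens to use the popular FatFs library (a common choice for SD cards on ARM parts), the same loop might look roughly like this. Treat it as a sketch: the mount name, the file name, and the low-level disk I/O glue all depend on your board support package.

#include "ff.h"   /* FatFs */

int load_file_to_ram(void)
{
    FATFS fs;
    FIL fil;
    unsigned char *target = (unsigned char *)0xec008000; /* same made-up address */
    UINT got;

    if (f_mount(&fs, "", 1) != FR_OK)        /* mount the card's default volume */
        return -1;
    if (f_open(&fil, "file.dat", FA_READ) != FR_OK)
        return -1;
    /* read straight into RAM, 1024 bytes at a time */
    while (f_read(&fil, target, 1024, &got) == FR_OK && got > 0)
        target += got;
    f_close(&fil);
    return 0;
}

FatFs also answers the size part of the question directly: once the file is open, f_size() reports its length in bytes.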

fread Only first 5 bytes of .PNG file

I've made a simple resource packer for packing the resources for my game into one file. Everything was going fine until I began writing the unpacker.
I noticed the .txt file - 26 bytes - that I had packed came out of the resource file fine, without any issues, all data preserved.
However when reading the .PNG file I had packed in the resource file, the first 5 bytes were intact while the rest was completely nullified.
I traced this down to the packing process, and I noticed that fread is only reading the first 5 bytes of the .PNG file, and I can't for the life of me figure out why. It even triggers EOF, indicating that the file is only 5 bytes long, when in fact it is a 787-byte PNG of a small polygon, 100px by 100px.
I even tested this problem by making a separate application to simply read this PNG file into a buffer, and I get the same result: only 5 bytes are read.
Here is the code of that small separate application:
#include <cstdio>

int main(int argc, char** argv)
{
    char buffer[1024] = { 0 };
    FILE* f = fopen("test.png", "r");
    fread(buffer, 1, sizeof(buffer), f);
    fclose(f); //<- I use a breakpoint here to verify the buffer contents
    return 0;
}
Can somebody please point out my stupid mistake?
Windows platform, I guess?
Use this:
FILE* f = fopen("test.png", "rb");
instead of this:
FILE* f = fopen("test.png", "r");
See MSDN for an explanation.
Extending the correct answer from SigTerm, here is some background of why you got the effect you did for opening a PNG file in text mode:
The PNG format explains its 8-byte file header as follows:
The first eight bytes of a PNG file always contain the following values:
(decimal) 137 80 78 71 13 10 26 10
(hexadecimal) 89 50 4e 47 0d 0a 1a 0a
(ASCII C notation) \211 P N G \r \n \032 \n
This signature both identifies the file as a PNG file and provides for immediate detection of common file-transfer problems. The first two bytes distinguish PNG files on systems that expect the first two bytes to identify the file type uniquely. The first byte is chosen as a non-ASCII value to reduce the probability that a text file may be misrecognized as a PNG file; also, it catches bad file transfers that clear bit 7. Bytes two through four name the format. The CR-LF sequence catches bad file transfers that alter newline sequences. The control-Z character stops file display under MS-DOS. The final line feed checks for the inverse of the CR-LF translation problem.
I believe that in text mode, the call to fread() was terminated when it read the sixth byte, which contains a Ctrl+Z character. Ctrl+Z was historically used in MS-DOS (and in CP/M before it) to indicate the end of a file, which was necessary because the file system stored the size of a file as a count of blocks, not a count of bytes.
By reading the file in text mode instead of binary mode, you triggered the protection against accidentally using the TYPE command to display a PNG file.
One thing you could do that would have helped diagnose this error is to use fread() slightly differently. You didn't test the return value from fread(). You should. Further, you should call it like this:
...
size_t nread;
...
nread = fread(buffer, 1, sizeof(buffer), f);
so that nread is a count of the bytes actually written to the buffer. For the PNG file in text mode, it would have told you on the first read that it only read 5 bytes. Since the file cannot be that small, you would have had a clue that something else was going on. The remaining bytes of the buffer were never modified by fread(), which would have been seen if you initialized the buffer to some other fill value.
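Combining both points, a checked, binary-mode read that verifies the signature quoted above might look like this (a sketch of the small test program, not the packer itself):

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* the 8-byte PNG signature quoted above */
    static const unsigned char png_sig[8] =
        { 0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a };
    unsigned char header[8];

    FILE *f = fopen("test.png", "rb");  /* "b": the 0x1a byte no longer ends the read */
    if (f == NULL)
        return 1;
    size_t nread = fread(header, 1, sizeof(header), f);
    fclose(f);

    if (nread != sizeof(header) || memcmp(header, png_sig, sizeof(png_sig)) != 0)
    {
        fprintf(stderr, "not a PNG (read %zu bytes)\n", nread);
        return 1;
    }
    printf("PNG signature OK\n");
    return 0;
}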

XOR on a very big file

I would like to XOR a very big file (~50 GB).
More precisely, I would like to do so by XORing each 32-byte block of a plaintext file (because of the lack of memory) with the key 3847611839, and create (block after block) a new cipher file.
Thank You for any help!!
This sounded like fun, and doesn't sound like a homework assignment.
I don't have a previously XOR-encrypted file to try it with, but if you convert a file back and forth, there's no diff. That much I tried, at least. Enjoy! :) This XORs every 4 bytes with 0xE555E5BF (3847611839 in decimal), which I presume is what you wanted.
Here's bloxor.c
// bloxor.c - by Peter Boström 2009, public domain, use as you see fit. :)
#include <stdio.h>

unsigned int xormask = 0xE555E5BF; // 3847611839 in decimal.

int main(int argc, char *argv[])
{
    printf("%x\n", xormask);
    if (argc < 3)
    {
        printf("usage: bloxor 'file' 'outfile'\n");
        return -1;
    }
    FILE *in = fopen(argv[1], "rb");
    if (in == NULL)
    {
        printf("Cannot open: %s", argv[1]);
        return -1;
    }
    FILE *out = fopen(argv[2], "wb");
    if (out == NULL)
    {
        fclose(in);
        printf("unable to open '%s' for writing.", argv[2]);
        return -1;
    }
    char buffer[1024]; // presuming 1024 is a good block size, I dunno...
    int count;
    while ((count = fread(buffer, 1, sizeof(buffer), in)) > 0)
    {
        int i;
        int end = count / 4;
        if (count % 4)   // round up so a short final block is masked too
            ++end;
        for (i = 0; i < end; ++i)
        {
            ((unsigned int *)buffer)[i] ^= xormask;
        }
        if (fwrite(buffer, 1, count, out) != (size_t)count)
        {
            fclose(in);
            fclose(out);
            printf("cannot write, disk full?\n");
            return -1;
        }
    }
    fclose(in);
    fclose(out);
    return 0;
}
As starblue mentioned in a comment, "Be aware that this is at best obfuscation, not encryption". And it's probably not even obfuscation.
One property of XOR is that (Y xor 0) == Y. What this means for your algorithm is that for anyplace in your very big file where there are runs of zeros (which seems pretty likely given the size of the file), your key will show up in the cipher file. Plain as day.
Another weakness of XOR-encrypted data is that if someone has both the plaintext and the ciphertext, XORing them together nets an output that contains the key, repeated over and over. If the person knows the two files are a plaintext/ciphertext pair, they've learned the key, which is bad if the key is used for more than one encryption. If the attacker isn't sure whether the plaintext and ciphertext are related, they have a pretty good idea after this, since the key shows up as a repeating pattern in the output. None of this is a problem with a one-time pad, because each bit of the key is used only once, so no one learns anything new from this attack.
A lot of people make the mistake of assuming that because a one-time pad is provably unbreakable, an XOR encryption might be OK "if done well", since the fundamental operation performed is the same. The difference is that a one-time pad uses each random bit of the key exactly once. So, among other things, if the plaintext has a run of zeros, nothing is learned about the key, unlike with a simple fixed-key XOR cipher.
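Both weaknesses fit in a few lines. A tiny demonstration using the key from the question:

#include <stdio.h>

int main(void)
{
    unsigned int key = 0xE555E5BF;      /* 3847611839 */
    unsigned int plain[2]  = { 0x00000000, 0x12345678 };
    unsigned int cipher[2];

    for (int i = 0; i < 2; ++i)
        cipher[i] = plain[i] ^ key;

    /* a zero plaintext word leaks the key directly... */
    printf("cipher of zeros: %08x\n", cipher[0]);            /* prints e555e5bf */
    /* ...and plaintext XOR ciphertext recovers it anywhere */
    printf("recovered key:   %08x\n", plain[1] ^ cipher[1]); /* prints e555e5bf */
    return 0;
}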
As Bruce Schneier said: "There are two kinds of cryptography in this world: cryptography that will stop your kid sister from reading your files, and cryptography that will stop major governments from reading your files."
An XOR cipher is barely kid sister proof - if even that.
You need to craft a solution around a streaming architecture: read the input file as a stream, modify it, and write the result to the output file.
This way, you don't have to read all the file at once.
If your question is how to do it without using extra space on the disk, I would just read in chunks that are multiples of 32 bytes (as big as you can manage), work on each chunk in memory, then write it back out to the same position. You should be able to use the ftell and fseek functions to do that (assuming your long type is large enough, of course); there is a sketch of this approach below.
It may be faster to memory-map the file if you can spare that much out of your address space (and your OS supports it) but I'd try the easiest solution first.
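Here is a sketch of that in-place variant, under the stated assumption that long can index the whole file; the byte order you choose for the 4-byte key is up to you:

#include <stdio.h>

/* XOR a file in place: read a chunk, seek back, overwrite it. */
int xor_in_place(const char *path, const unsigned char key[4])
{
    FILE *f = fopen(path, "rb+");        /* read and write, binary */
    if (f == NULL)
        return -1;
    unsigned char chunk[32 * 1024];      /* a multiple of 32 bytes */
    size_t n;
    long pos = 0;
    while ((n = fread(chunk, 1, sizeof(chunk), f)) > 0)
    {
        for (size_t i = 0; i < n; ++i)
            chunk[i] ^= key[i % 4];
        fseek(f, pos, SEEK_SET);         /* back to where the chunk began */
        fwrite(chunk, 1, n, f);
        pos += (long)n;
        fseek(f, pos, SEEK_SET);         /* reposition before the next read,
                                            as update mode requires */
    }
    fclose(f);
    return 0;
}

Since XOR is its own inverse, running xor_in_place twice with the same key restores the original file.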
Of course, if space isn't a problem, just read the chunks in and write them to a new file, something like the following (pseudo-code):
open infile
open outfile
while not end of infile:
    read chunk from file
    change chunk
    write chunk to outfile
close outfile
close infile
This sort of read/process/write is pretty basic stuff. If you have more complicated requirements, you should update your question with them.
