Segmentation fault in CS50 (2020) recovery program - c

I am trying to write a program that will recover deleted images from a file and write each of those images to its own separate file. I've been stuck on this problem for a few days, and have tried my best to solve it on my own, but I now realize I need some guidance. My code always compiles well, but every time I run my program I get a segmentation fault. Using valgrind shows me that I don't have any memory leaks.
I think I have pinpointed the issue, though I'm not sure how to resolve it. When I run my program through the debugger, it always stops at the code inside my last 'else' condition (where the comment says "If already found JPEG") , and gives me an error message about the segmentation fault.
I have tried opening and initializing my file pointer jpegn atop this line of code, to prevent jpegn from being NULL when this condition is run, but that did not work to fix the fault.
I am very new to programming (and this site) so any advice or suggestions would be helpful.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
typedef uint8_t BYTE;
int main(int argc, char *argv[])
{
if(argc!=2) // Checks if the user typed in exactly 1 command-line argument
{
printf("Usage: ./recover image\n");
return 1;
}
if(fopen(argv[1],"r") == NULL) // Checks if the image can be opened for reading
{
printf("This image cannot be opened for reading\n");
return 1;
}
FILE *forensic_image = fopen(argv[1],"r"); // Opens the image inputted and stores it in a new file
BYTE *buffer = malloc(512 * sizeof(BYTE)); // Dynamically creates an array capable of holding 512 bytes of data
if(malloc(512*sizeof(BYTE)) == NULL) // Checks if there is enough memory in the system
{
printf("System error\n");
return 1;
}
// Creates a counting variable, a string and two file pointers
int JPEG_num=0;
char *filename = NULL;
FILE *jpeg0 = NULL;
FILE *jpegn = NULL;
while(!feof(forensic_image)) // Repeat until end of image
{
fread(buffer, sizeof(BYTE), 512, forensic_image); // Read 512 bytes of data from the image into a buffer
// Check for the start of a new JPEG file
if(buffer[0] == 0xff & buffer[1] == 0xd8 & buffer[2] == 0xff & (buffer[3] & 0xf0) == 0xe0)
{
// If first JPEG
if(JPEG_num == 0)
{
sprintf(filename, "%03i.jpg", JPEG_num);
jpeg0 = fopen(filename, "w");
fwrite(buffer, sizeof(BYTE), 512, jpeg0);
}
else // If not first JPEG
{
fclose(jpeg0);
JPEG_num++;
sprintf(filename, "%03i.jpg", JPEG_num);
jpegn = fopen(filename, "w");
fwrite(buffer, sizeof(BYTE), 512, jpegn);
}
}
else // If already found JPEG
{
fwrite(buffer, sizeof(BYTE), 512, jpegn);
}
}
// Close remaining files and free dynamically allocated memory
fclose(jpegn);
free(buffer);
}

There are quite a few issues in your code. I am surprised that valgrind did not flag them.
First is this:
if(fopen(argv[1],"r") == NULL) // Checks if the image can be opened for reading
{
printf("This image cannot be opened for reading\n");
return 1;
}
FILE *forensic_image = fopen(argv[1],"r"); // Opens the image inputted and stores it in a new file
This is not fatal, but you open the same file twice and discard the first file pointer, so that first stream is never closed. The similar pattern below, however, is definitely a memory leak:
BYTE *buffer = malloc(512 * sizeof(BYTE)); // Dynamically creates an array capable of holding 512 bytes of data
if(malloc(512*sizeof(BYTE)) == NULL) // Checks if there is enough memory in the system
{
printf("System error\n");
return 1;
}
Here you allocate 512 bytes twice and keep only the first allocation in the pointer buffer; the second allocation is lost.
And then this:
char *filename = NULL;
// ...
sprintf(filename, "%03i.jpg", JPEG_num);
Here you are writing a string through a NULL pointer, into memory that was never allocated!
And also these lines:
else // If already found JPEG
{
fwrite(buffer, sizeof(BYTE), 512, jpegn);
}
How can you guarantee jpegn is a valid file pointer? Probably never, because I see in your code that JPEG_num will always be 0. The else branch marked // If not first JPEG is therefore dead code, and it is the only place where jpegn is opened.

When compiling, always enable the warnings, then fix those warnings.
gcc -ggdb3 -Wall -Wextra -Wconversion -pedantic -std=gnu11 -c "untitled1.c" -o "untitled1.o"
results in several warnings like:
untitled1.c:46:91: warning: suggest parentheses around comparison in operand of ‘&’ [-Wparentheses]
if(buffer[0] == 0xff & buffer[1] == 0xd8 & buffer[2] == 0xff & (buffer[3] & 0xf0) == 0xe0)
Note: a single & is a bitwise AND. You want the logical AND && everywhere in this statement except the last one, (buffer[3] & 0xf0), which is a genuine bit mask.
regarding;
FILE *forensic_image = fopen(argv[1],"r");
Always check (!=NULL) the returned value to assure the operation was successful. If not successful (==NULL) then call
perror( "fopen failed" );
to output to stderr both your error message and the text reason the system thinks the error occurred.
regarding:
while(!feof(forensic_image))
Please read: Why is "while( !feof(file) )" always wrong?
regarding:
FILE *forensic_image = fopen(argv[1],"r");
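The file was already opened in the prior code block, by the fopen() call inside the if condition, whose result was discarded. There is absolutely no reason to open it again AND it will create problems in the code, because the first stream can never be closed. Suggest replacing: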
if(fopen(argv[1],"r") == NULL)
{
printf("This image cannot be opened for reading\n");
return 1;
}
with:
FILE *forensic_image;
if( (forensic_image = fopen(argv[1],"r") ) == NULL)
{
perror( "fopen for input file failed" );
exit( EXIT_FAILURE );
}
regarding:
BYTE *buffer = malloc( 512 * sizeof(BYTE) );
and later:
free( buffer );
This is a waste of code and resources. The program only ever needs one fixed-size buffer, so a local array is simpler and needs no free(). Suggest:
#define RECORD_LEN 512
and
unsigned char buffer[ RECORD_LEN ];
regarding;
fread(buffer, sizeof(BYTE), 512, forensic_image);
The function fread() returns a size_t. You should assign the returned value to a size_t variable and check that value to make sure the operation was successful. In fact, that call should be in the while() condition.
regarding;
sprintf(filename, "%03i.jpg", JPEG_num);
This results in undefined behavior and can result in a seg fault event because the pointer filename is initialized to NULL. Suggest:
char filename[20];
to avoid that problem
regarding:
else // If not first JPEG
{
fclose(jpeg0);
If you are (for instance) working on the 3rd file, then jpeg0 has already been closed, and closing it a second time is undefined behavior. Suggest removing the declaration:
FILE *jpeg0;
and always using jpegn
regarding;
else // If already found JPEG
{
fwrite(buffer, sizeof(BYTE), 512, jpegn);
}
While the first output file is being written, jpegn is not set (only jpeg0 was opened), so this results in a crash. Again, ONLY use jpegn for all output file operations.
regarding:
fwrite(buffer, sizeof(BYTE), 512, jpegn);
this returns the number of items (each the size given by the second parameter) actually written, so this should be:
if( fwrite(buffer, sizeof(BYTE), 512, jpegn) != 512 ) { /* handle error */ }
The posted code contains some 'magic' numbers, like 512. 'Magic' numbers are bare numeric literals whose meaning the code does not explain. They make the code much more difficult to understand, debug, etc. Suggest using an enum statement or #define statement to give those 'magic' numbers meaningful names, then use those meaningful names throughout the code.
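Putting the suggestions above together, the recovery loop might look roughly like the sketch below. It is not the asker's final program: recover_jpegs, is_jpeg_start, BLOCK_SIZE and NAME_LEN are names made up for illustration, and it assumes the images sit back to back in 512-byte blocks, so that a file stays open until the next signature appears.

```c
#include <stdio.h>
#include <stdint.h>

#define BLOCK_SIZE 512   /* named constant instead of the magic number 512 */
#define NAME_LEN   16    /* room for any int formatted as NNN.jpg plus NUL */

/* Returns 1 if the block starts with a JPEG signature, 0 otherwise. */
static int is_jpeg_start(const uint8_t *block)
{
    return block[0] == 0xff && block[1] == 0xd8 && block[2] == 0xff &&
           (block[3] & 0xf0) == 0xe0;
}

/* Hypothetical helper: writes every JPEG found in card to 000.jpg, 001.jpg, ...
 * in the current directory. Returns the number of files created, -1 on error. */
int recover_jpegs(FILE *card)
{
    uint8_t buffer[BLOCK_SIZE];      /* plain array, no malloc/free needed */
    char filename[NAME_LEN];         /* real storage, not a NULL pointer */
    FILE *out = NULL;                /* the only output stream */
    int count = 0;

    /* fread() is the loop condition, so a short or failed read ends the loop */
    while (fread(buffer, 1, BLOCK_SIZE, card) == BLOCK_SIZE)
    {
        if (is_jpeg_start(buffer))
        {
            if (out != NULL)
            {
                fclose(out);         /* finish the previous image */
            }
            snprintf(filename, sizeof filename, "%03i.jpg", count);
            out = fopen(filename, "wb");
            if (out == NULL)
            {
                perror("fopen for output file failed");
                return -1;
            }
            count++;
        }
        /* Blocks seen before the first signature are skipped (out is NULL) */
        if (out != NULL && fwrite(buffer, 1, BLOCK_SIZE, out) != BLOCK_SIZE)
        {
            perror("fwrite failed");
            fclose(out);
            return -1;
        }
    }
    if (out != NULL)
    {
        fclose(out);
    }
    return count;
}
```

A caller would open argv[1] with fopen(), check the result, pass the stream to recover_jpegs(), and close it afterwards.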

Related

fgets statement reads first line and not sure how to modify because I have to return a pointer [duplicate]

I need to copy the contents of a text file to a dynamically-allocated character array.
My problem is getting the size of the contents of the file; Google reveals that I need to use fseek and ftell, but for that the file apparently needs to be opened in binary mode, and that gives only garbage.
EDIT: I tried opening in text mode, but I get weird numbers. Here's the code (I've omitted simple error checking for clarity):
long f_size;
char* code;
size_t code_s, result;
FILE* fp = fopen(argv[0], "r");
fseek(fp, 0, SEEK_END);
f_size = ftell(fp); /* This returns 29696, but file is 85 bytes */
fseek(fp, 0, SEEK_SET);
code_s = sizeof(char) * f_size;
code = malloc(code_s);
result = fread(code, 1, f_size, fp); /* This returns 1045, it should be the same as f_size */
The root of the problem is here:
FILE* fp = fopen(argv[0], "r");
argv[0] is your executable program, NOT the parameter. It certainly won't be a text file. Try argv[1], and see what happens then.
You cannot determine the size of a file in characters without reading the data, unless you're using a fixed-width encoding.
For example, a file in UTF-8 which is 8 bytes long could be anything from 2 to 8 characters in length.
That's not a limitation of the file APIs, it's a natural limitation of there not being a direct mapping from "size of binary data" to "number of characters."
If you have a fixed-width encoding then you can just divide the size of the file in bytes by the number of bytes per character. ASCII is the most obvious example of this, but if your file is encoded in UTF-16 and you happen to be on a system which treats UTF-16 code units as the "native" internal character type (which includes Java, .NET and Windows) then you can predict the number of "characters" to allocate as if UTF-16 were fixed width. (UTF-16 is variable width because Unicode characters above U+FFFF are encoded as a surrogate pair of two code units, but a lot of the time developers ignore this.)
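To make the bytes-versus-characters point concrete, here is a small sketch that counts code points in a UTF-8 string by skipping continuation bytes. The function name utf8_code_points is made up for this example, and it assumes the input is valid UTF-8.

```c
#include <stddef.h>

/* Counts Unicode code points in a NUL-terminated UTF-8 string.
 * Continuation bytes have the bit pattern 10xxxxxx and are not counted,
 * so only the first byte of each encoded character contributes.
 * Assumes the input is valid UTF-8. */
size_t utf8_code_points(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++)
    {
        if (((unsigned char) *s & 0xC0) != 0x80)
        {
            count++;
        }
    }
    return count;
}
```

Eight ASCII letters and two four-byte characters both occupy 8 bytes, yet this function reports 8 for the first and 2 for the second, which is why the byte size alone cannot give the character count.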
I'm pretty sure argv[0] won't be a text file.
Give this a try (haven't compiled this, but I've done this a bazillion times, so I'm pretty sure it's at least close):
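char* readFile(char* filename)
{
    FILE* file = fopen(filename,"r");
    if(file == NULL)
    {
        return NULL;
    }
    fseek(file, 0, SEEK_END);
    long int size = ftell(file);
    rewind(file);
    if(size < 0) /* ftell failed */
    {
        fclose(file);
        return NULL;
    }
    char* content = calloc(size + 1, 1); /* +1 keeps the result NUL-terminated */
    if(content != NULL)
    {
        fread(content,1,size,file); /* may read fewer than size bytes in text mode */
    }
    fclose(file); /* the stream must be closed before returning */
    return content;
}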
If you're developing for Linux (or other Unix-like operating systems), you can retrieve the file-size with stat before opening the file:
#include <stdio.h>
#include <sys/stat.h>
int main() {
struct stat file_stat;
if(stat("main.c", &file_stat) != 0) {
perror("could not stat");
return (1);
}
printf("%d\n", (int) file_stat.st_size);
return (0);
}
EDIT: Now that I see the code, I have to agree with the other posters:
The array that takes the arguments from the program-call is constructed this way:
[0] name of the program itself
[1] first argument given
[2] second argument given
[n] n-th argument given
You should also check argc before trying to use any element of the argv array other than argv[0]:
if (argc < 2) {
printf ("Usage: %s arg1", argv[0]);
return (1);
}
argv[0] is the path to the executable and thus argv[1] will be the first user-submitted input. Try changing that and adding some simple error checking, such as checking whether fp == NULL, and we might be able to help you further.
You can open the file, move the cursor to the end of the file, store the offset, and go back to the top of the file; the stored offset is the size of the file.
You can use fseek for text files as well.
fseek to end of file
ftell the offset
fseek back to the beginning
and you have size of the file
Kind of hard with no sample code, but fstat (or stat) will tell you how big the file is. You allocate the memory required, and slurp the file in.
Another approach is to read the file a piece at a time and extend your dynamic buffer as needed:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define PAGESIZE 128
int main(int argc, char **argv)
{
char *buf = NULL, *tmp = NULL;
size_t bufSiz = 0;
char inputBuf[PAGESIZE];
FILE *in;
if (argc < 2)
{
printf("Usage: %s filename\n", argv[0]);
return 0;
}
in = fopen(argv[1], "r");
if (in)
{
/**
* Read a page at a time until reaching the end of the file
*/
while (fgets(inputBuf, sizeof inputBuf, in) != NULL)
{
/**
* Extend the dynamic buffer by the length of the string
* in the input buffer
*/
tmp = realloc(buf, bufSiz + strlen(inputBuf) + 1);
if (tmp)
{
/**
* Add to the contents of the dynamic buffer
*/
buf = tmp;
buf[bufSiz] = 0;
strcat(buf, inputBuf);
bufSiz += strlen(inputBuf) + 1;
}
else
{
printf("Unable to extend dynamic buffer: releasing allocated memory\n");
free(buf);
buf = NULL;
break;
}
}
if (feof(in))
printf("Reached the end of input file %s\n", argv[1]);
else if (ferror(in))
printf("Error while reading input file %s\n", argv[1]);
if (buf)
{
printf("File contents:\n%s\n", buf);
printf("Read %lu characters from %s\n",
(unsigned long) strlen(buf), argv[1]);
}
free(buf);
fclose(in);
}
else
{
printf("Unable to open input file %s\n", argv[1]);
}
return 0;
}
There are drawbacks with this approach; for one thing, if there isn't enough memory to hold the file's contents, you won't know it immediately. Also, realloc() is relatively expensive to call, so you don't want to make your page sizes too small.
However, this avoids having to use fstat() or fseek()/ftell() to figure out how big the file is beforehand.

Can someone explain why fread() is not working?

After a couple of hours working on the recover exercise of CS50, I keep stumbling over a segmentation fault. After running the debugger, I discovered that the cause of the segmentation fault is that fread(memory, 512, 1, file) malfunctions: even after calling the function, the memory[] array stays empty, hence the segmentation fault.
I've tried to work with malloc(512) instead of an unsigned char array, but the error persists. Can someone explain why this is happening and how to solve it?
(PS. Sorry for my bad English)
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
int main(int argc, char *argv[])
{
// making sure the only user input is the name of the file
if (argc != 2)
{
printf("Usage: ./recover image\n");
return 1;
}
// open the file and check if it works
FILE *file = fopen("card.raw", "r");
if (file == NULL)
{
printf("Could not open card.raw.\n");
return 2;
}
int ending = 1000;
int count = 0;
char img = '\0';
FILE *picture = NULL;
unsigned char memory[512];
do
{
//creating buffer and reading the file into the buffer
fread(memory, 512, 1, file);
//checking if the block is a new jpg file
if (memory[0] == 0xff && memory[1] == 0xd8 && memory[2] == 0xff && (memory[3] & 0xf0) == 0xe0)
{
//if it's the first jpg file
if (count == 0)
{
sprintf(&img, "000.jpg");
picture = fopen(&img, "w");
fwrite(&memory, 512, 1, picture);
}
//closing previous jpg file and writing into a new one
else
{
fclose(picture);
img = '\0';
sprintf(&img, "%03i.jpg", count + 1);
picture = fopen(&img, "w");
fwrite(&memory, 512, 1, picture);
}
}
//continue writing into the file
else
{
picture = fopen(&img, "a");
fwrite(&memory, 512, 1, picture);
}
count++;
}
while(ending >= 512);
fclose(file);
fclose(picture);
return 0;
}
If you're using C or C++, you should be fully aware of the memory model being used. For example, declaring a local char variable reserves a single byte of storage on the stack, possibly padded to 4 bytes or more depending on memory alignment and architecture (16-bit? 32-bit? 64-bit?). Now guess what happens when you sprintf more characters than that into such a variable: the write overruns into whatever occupies the space after the img variable. So you must prepare a buffer large enough to hold all the characters needed for the filename, including the terminating NUL.
In C, if you make a mistake, there are several possibilities:
sometimes you get a segmentation fault right after you make the mistake
sometimes you get no error at all, but the data is silently corrupted
sometimes the error shows up long after the mistake was made
There are other problems with your code, which have been pointed out by Weather Vane and Jabberwocky in the comments above. I would like to add that reopening the 'img' file and discarding the previous file handle is not a good thing either (you already said "continue writing", so why reopen?). You might end up with a dangling file handle or needlessly create many file handles during the iteration. C will not help you check such things; it assumes you really know what you are doing. Even mixing types may cause neither a compile error nor an identifiable runtime error. It will just do one of the three things I listed above.
You might want to use another, more modern language with more memory-safety features, such as C#, Java or Python, in order to learn programming.
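As an illustration of "a buffer large enough", the filename could be built with snprintf into a char array instead of a single char. This is only a sketch; make_jpeg_name and NAME_LEN are hypothetical names, not part of the original program.

```c
#include <stdio.h>

#define NAME_LEN 8   /* "000.jpg" is 7 characters plus the terminating NUL */

/* Hypothetical helper: formats index as NNN.jpg into dst.
 * snprintf never writes more than dst_size bytes, so an oversized name is
 * truncated instead of overrunning neighbouring variables.
 * Returns 0 on success, -1 if the name did not fit. */
int make_jpeg_name(char *dst, size_t dst_size, int index)
{
    int needed = snprintf(dst, dst_size, "%03i.jpg", index);
    if (needed < 0 || (size_t) needed >= dst_size)
    {
        return -1;
    }
    return 0;
}
```

In the program above this would replace `char img` with `char img[NAME_LEN]`, and the array would be passed as `img` rather than `&img` to sprintf and fopen.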

Adapt code to copy/paste .zip and .tar.gzip files?

Introduction
I'm writing my own cp program. With the code I currently have I'm able to copy and paste files.
Code
char *buf;
int fd;
int ret;
struct stat sb;
FILE *stream;
/*opening and getting size of file to copy*/
fd = open(argv[1],O_RDONLY);
if(fd == -1)
{
perror("open");
return 1;
}
/*obtaining size of file*/
ret = fstat(fd,&sb);
if(ret)
{
perror("stat");
return 1;
}
/*opening a stream for reading/writing file*/
stream = fdopen(fd,"rb");
if(!stream)
{
perror("fdopen");
return 1;
}
/*allocating space for reading binary file*/
buf = malloc(sb.st_size);
/*reading data*/
if(!fread(buf,sb.st_size,1,stream))
{
perror("fread");
return 1;
}
/*writing file to a duplicate*/
fclose(stream);
stream = fopen("duplicate","wb");
if(!fwrite(buf,sb.st_size,1,stream))
{
perror("fwrite");
return 1;
}
fclose(stream);
close(fd);
free(buf);
return 0;
The problem
I'm unable to copy .zip files and .tar.gz files. If I alter the code so that the output name has the matching extension (for example 'duplicate.zip' when copying a zip file) and then copy a .zip file,
everything is copied; however, the new duplicate does not behave like a zip file. When I use cat on it, it outputs nothing, and I get this error when I attempt to unzip it anyway:
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
So how do I go about copying .zip files and also .tar.gz files? Any pointers will be helpful, thanks in advance.
You are using malloc() incorrectly. You want to allocate sb.st_size bytes.
malloc(sb.st_size * sizeof buf)
should be
malloc(sb.st_size)
The use of fread() is dubious and you are throwing away the result of fread(). Instead of
if(!fread(buf,sb.st_size,1,stream))
you should have
size_t num_bytes_read = fread (buf, 1, sb.st_size, stream);
if (num_bytes_read < sb.st_size)
You are using strlen() incorrectly. The content of buf is not guaranteed to be a string; and anyway you already know how many bytes you have in buf: sb.st_size. (Because if fread() returned a smaller number of bytes read you got angry and terminated the process.) So instead of
fwrite(buf,strlen(buf),1,stream)
you should have
fwrite (buf, 1, sb.st_size, stream)
In addition to AlexP's notes...
/*obtaining size of file*/
ret = fstat(fd,&sb);
if(ret)
{
perror("stat");
return 1;
}
// ...some code...
/*allocating space for reading binary file*/
buf = malloc(sb.st_size);
/*reading data*/
if(!fread(buf,sb.st_size,1,stream))
{
perror("fread");
return 1;
}
You have a race condition here. If the file size changes between your fstat call and malloc or fread you will read too much or too little of the file.
Fixing this leads us to the next issue, you're slurping the entire file into memory. While this might work for small files, it is extremely inefficient with your memory on large ones. For very large files it might be too large for a single malloc, and you're not checking if your malloc succeeds.
Instead, read and write the file a piece at a time. And read until there isn't any more to read.
uint8_t buffer[4096]; // 4K buffer of bytes (not an array of pointers)
size_t num_read;
while( (num_read = fread(buffer, sizeof(uint8_t), sizeof(buffer), in)) != 0 ) {
    if( fwrite( buffer, sizeof(uint8_t), num_read, out ) != num_read ) {
        perror("fwrite"); // short write: not everything that was read got written
        break;
    }
}
This avoids the race condition by not having to call fstat in the first place. And it avoids allocating a potentially enormous hunk of memory. Instead it can all be done on the stack.
I've used uint8_t to get a hunk of bytes. It's a standard fixed width integer type from stdint.h. You can also use unsigned char to read bytes, and that's probably what uint8_t really is, but uint8_t makes it explicit.
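For completeness, the chunked loop can be wrapped in a function that also tells end-of-file apart from a read error. This is a sketch under the assumption that both streams were opened in binary mode by the caller; copy_stream is a name chosen for illustration.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical helper: copies everything from in to out in 4K chunks.
 * Returns 0 on success, -1 on a read or write error. */
int copy_stream(FILE *in, FILE *out)
{
    uint8_t buffer[4096];
    size_t num_read;

    while ((num_read = fread(buffer, 1, sizeof buffer, in)) > 0)
    {
        /* Write exactly as many bytes as were read in this chunk */
        if (fwrite(buffer, 1, num_read, out) != num_read)
        {
            return -1;
        }
    }
    /* fread returned 0: either end of file or a read error */
    return ferror(in) ? -1 : 0;
}
```

Because the data is treated as raw bytes and never as a string, the same function works for .zip and .tar.gz files as well as for text.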

My recovered IMGs doesn't match the Original in recover CS50

The problem is to recover some JPGs from a .raw file.
When I run check50 I get "recovered image does not match".
:) recover.c exists.
:) recover.c compiles.
:) handles lack of forensic image
:( recovers 000.jpg correctly –
recovered image does not match
:( recovers middle images correctly –
recovered image does not match
:( recovers 015.jpg correctly –
015.jpg not found
I really tried hard to identify the problem, but every time I fail to see where it is. I hope someone can spot it and give me a piece of advice.
#include <stdio.h>
#include <stdint.h>
int main(int argc, char *argv[]){
if(argc != 2){
fprintf(stderr, "Usage: ./recover image");
return 1;
}
//open file
FILE *inptr = fopen(argv[1], "r");
if (inptr == NULL){
fprintf(stderr, "Could not open %s.\n", argv[1]);
return 2;
}
int foundjpg = 0;
char filename[10];
int x=1;
//repeat until end of the card
while(x == 1){
//buffer
unsigned char buf[512];
x = fread(buf, 512, 1, inptr);
//read into buffer
fread(buf, 512, 1, inptr);
FILE *jpg = fopen(filename, "w");
//start of a new jpg?
if(buf[0]== 0xff && buf[1] == 0xd8 && buf[2] == 0xff && (buf[3] & 0xf0) == 0xe0 ){
if(jpg != NULL){// yes i found before
fclose(jpg);
sprintf(filename, "%03i.jpg" ,foundjpg );
foundjpg++;
jpg = fopen(filename, "w");
}
else{
sprintf(filename, "%03i.jpg" ,foundjpg );
jpg = fopen(filename , "w");
foundjpg++;
}
}
//already found a jpg?
if(jpg != NULL && foundjpg > 0){
fwrite(buf, 1, 512, jpg);
}
}
fclose(inptr);
// success
return 0;
}
The order in which you do things is quite confused and leads to errors. For example:
filename isn't initialised when you use it for the first time.
You increase the counter foundjpg after you use it to create the filename, which in your program means that the second image is called 01.jpg. All image indices are off by one and the last one is missing.
When the id bytes do not identify a valid jpg, no new record is read and your loop never ends.
You should re-organise your code so that it does one thing after another in a natural way. The program might look like this:
Check command line arguments
Open the raw file
Main loop:
    Read fixed-size block. If it can't be read, exit the loop
    Check if first bytes identify a jpg and if so:
        Create file name
        Open jpg file for writing
        Write block and close jpg file
    Increment block counter
Close raw file
You must decide how you handle errors. Do you just skip erroneous blocks or do you abort the program?
It is also not clear whether all images are 512 bytes long, which seems improbable. Perhaps you must read the actual image size from the header and then copy the whole image.
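The "read a block, stop if it can't be read, then test the first bytes" part of the outline above might be sketched as follows. It only counts signature blocks and writes nothing, which can be a handy first check of how many images to expect. The names count_jpeg_starts and looks_like_jpeg are invented for this example. As far as I understand the CS50 exercise, each image continues over consecutive 512-byte blocks until the next signature, so a full solution would keep the output file open between blocks rather than closing it after one.

```c
#include <stdio.h>

#define BLOCK_SIZE 512

/* Returns 1 if the first four bytes look like a JPEG signature. */
static int looks_like_jpeg(const unsigned char *block)
{
    return block[0] == 0xff && block[1] == 0xd8 && block[2] == 0xff &&
           (block[3] & 0xf0) == 0xe0;
}

/* Hypothetical helper: reads raw one block at a time and counts the blocks
 * that start with a JPEG signature. The loop ends at the first block that
 * cannot be read in full, so exactly one fread() happens per iteration. */
int count_jpeg_starts(FILE *raw)
{
    unsigned char block[BLOCK_SIZE];
    int found = 0;

    while (fread(block, 1, BLOCK_SIZE, raw) == BLOCK_SIZE)
    {
        if (looks_like_jpeg(block))
        {
            found++;
        }
    }
    return found;
}
```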

C - Printing Bin. File In Weird Symbols

I created a function that is successfully reading the binary file but it is not printing as I wanted.
The function:
void print_register() {
FILE *fp;
fp = fopen("data.bin", "rb");
if (fp == NULL) {
error_message("Fail to open data.bin for reading");
exit(0);
}
reg buffer;
while (EOF != feof(fp)) {
fread(&buffer, sizeof(reg), 1, fp);
printf("%s %d %d\n", buffer.name, buffer.age, buffer.id);
}
fclose(fp);
}
Note: reg is a typedef for a struct:
typedef struct registers reg;
struct registers {
char name[30];
int age;
int id;
char end;
};
Function for writing the file:
void register_new() {
system("clear");
reg buffer;
FILE *fp;
fp = fopen("data.bin", "ab");
if (fp == NULL) {
error_message("Error opening file data.bin");
exit(0);
}
write_register(buffer);
fwrite(&buffer, sizeof(reg), 1, fp);
fclose(fp);
}
Here is a screenshot of what was printed, to be more helpful:
As you can see on image, after the "p" (command for printing) is where should be the name, age and id of the struct.
In register_new(), you have to send the address of buffer in order for write_register() to work properly (right now you're giving it a copy of buffer).
Replace:
write_register(buffer);
with:
write_register(&buffer);
Then correct write_register to take and work with an address instead of a structure.
This might help you understand what's going on: http://fresh2refresh.com/c-programming/c-passing-struct-to-function
Your reading loop is incorrect. Don't use feof(): it can only tell you that the end of file was reached after a read attempt has failed, and it might not return EOF anyway, since it is only specified as returning 0 or non-zero. Use this instead:
while (fread(&buffer, sizeof(reg), 1, fp) == 1) {
printf("%s %d %d\n", buffer.name, buffer.age, buffer.id);
}
fread returns the number of items successfully read. Here you request to read 1 item of size sizeof(reg), if the item was read successfully, fread will return 1, otherwise it will return 0 (in case of a read error or end of file reached).
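To see why testing feof() before reading misbehaves, the two loop shapes can be compared side by side. This is an illustrative sketch with made-up function names. For a file that holds exactly two blocks, the feof() version typically runs three iterations, because the end-of-file flag is only set by the third, failing read.

```c
#include <stdio.h>

#define BLOCK 512

/* Common but flawed shape: feof() is tested before the read that would
 * actually detect the end of the file. */
int iterations_with_feof(FILE *fp)
{
    unsigned char buf[BLOCK];
    int iterations = 0;
    while (!feof(fp))
    {
        size_t got = fread(buf, 1, BLOCK, fp);
        (void) got;          /* result ignored on purpose, to show the flaw */
        iterations++;        /* also counts the final read that returned nothing */
    }
    return iterations;
}

/* Recommended shape: the return value of fread() controls the loop. */
int iterations_with_fread(FILE *fp)
{
    unsigned char buf[BLOCK];
    int iterations = 0;
    while (fread(buf, 1, BLOCK, fp) == BLOCK)
    {
        iterations++;        /* only counts blocks that were really read */
    }
    return iterations;
}
```

In the first function the extra iteration would process stale data left in buf from the previous read, which is the usual source of a duplicated last record.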
Your screenshot shows a syntax error, which you seem to have fixed now. Remove that, it is not helping.
In your function register_new, you are writing an uninitialized structure reg to the file, no wonder it does not contain anything useful when you read it back from the file. And for what it is worth, opening this file in binary mode is the correct thing to do since it contains binary data, namely the int members of the structure.
The reg passed to fwrite is indeed uninitialized. write_register gets a copy of this uninitialized structure by value, and probably modifies this copy, but this does not affect the local structure in register_new. You should modify write_register() to take a pointer to the structure. Unlike C++, there is no passing by reference in C.
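A minimal sketch of the pointer-based version follows. The real write_register() was not posted, so fill_register and append_register below are hypothetical stand-ins; the struct is copied from the question.

```c
#include <stdio.h>
#include <string.h>

typedef struct registers reg;
struct registers {
    char name[30];
    int age;
    int id;
    char end;
};

/* Hypothetical stand-in for write_register(): it receives the ADDRESS of the
 * caller's structure, so the values stored here are visible to the caller. */
void fill_register(reg *r, const char *name, int age, int id)
{
    memset(r, 0, sizeof *r);                        /* no uninitialized bytes reach the file */
    snprintf(r->name, sizeof r->name, "%s", name);  /* truncates safely, always NUL-terminated */
    r->age = age;
    r->id = id;
}

/* Writes one record to fp. Returns 0 on success, -1 on failure. */
int append_register(FILE *fp, const reg *r)
{
    return fwrite(r, sizeof *r, 1, fp) == 1 ? 0 : -1;
}
```

In register_new() the call would then be `fill_register(&buffer, ...)` followed by `append_register(fp, &buffer)`, and the reading side can use the `while (fread(&buffer, sizeof(reg), 1, fp) == 1)` loop shown earlier.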
