C: theory on how to extract files from an archived file

In C I have created a program which can archive multiple files into an archive file via the command line.
e.g.
$echo 'file1/2' > file1/2.txt
$./archive file1.txt file2.txt archivedfile
$cat archivedfile
file1
file2
How do I create a process so that in my archivedfile I have:
header
file1
end
header
file2
end
They are all stored in the archive file one after another. I know that a header is probably needed (containing the filename, the size of the filename, and the start and end of the file) so that these files can be extracted back into their original form, but how would I go about doing this?
I am stuck on where and how to start.
Could someone please help me with the logic for extracting files back out of an archive file?

As has been mentioned before, start with the algorithm. You already have most of the details.
There are a few approaches you can take:
Random access archive.
Sequential access archive.
Random Access Archive
For this to work, the header needs to act as an index (like the card indexes at a library), indicating (a) where to find the start of each file and (b) the length of each file. The algorithm to write the archive file might look like:
Get a list of all the files from the command line.
Create a structure to hold the metadata about each file: name (255 char), size (64-bit int), date and time, and permissions.
For each file, get its stats.
Store the stats of each file within an array of structures.
Open the archive for writing.
Write the header structure.
For each file, append its content to the archive file.
Close the archive file.
(The header might have to include the number of files, too.)
Next, the algorithm for extracting files:
Get an archive file from the command line.
Get a file name to extract, also from the command line.
Create memory for a structure to read metadata about each file.
Read all the metadata from the archive file.
Search the list of metadata for the file name to extract.
Calculate the offset into the archive file for the start of the matching file.
Seek to the offset.
Read the file content and write it out to a new file.
Close the new file.
Close the archive.
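The two algorithms above can be sketched in C roughly as follows. This is only a minimal sketch under stated assumptions, not a definitive format: the Entry layout and the names copy_bytes, archive_write and archive_extract are made up for illustration, date/time and permissions are left out, and the index is written as a raw struct, so the archive is only readable on machines with the same byte order and struct layout.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical index entry: one per archived file. */
typedef struct {
    char     name[256];  /* NUL-terminated file name (255 chars max) */
    uint64_t size;       /* length of the file content in bytes */
    uint64_t offset;     /* where the content starts inside the archive */
} Entry;

/* Copy exactly n bytes from in to out; returns 0 on success, -1 on failure. */
static int copy_bytes(FILE *in, FILE *out, uint64_t n) {
    char buf[4096];
    while (n > 0) {
        size_t want = n < sizeof buf ? (size_t)n : sizeof buf;
        size_t got = fread(buf, 1, want, in);
        if (got == 0 || fwrite(buf, 1, got, out) != got) return -1;
        n -= got;
    }
    return 0;
}

/* Archive layout: file count, then the index, then each file's content. */
int archive_write(const char *archive, int count, char **names) {
    if (count <= 0) return -1;
    Entry *index = calloc((size_t)count, sizeof *index);
    if (!index) return -1;
    uint32_t n = (uint32_t)count;
    uint64_t offset = sizeof n + (uint64_t)n * sizeof *index;
    int rc = 0;

    /* Pass 1: collect name, size and content offset for every file. */
    for (int i = 0; i < count; i++) {
        FILE *in = fopen(names[i], "rb");
        long len = -1;
        if (in && fseek(in, 0, SEEK_END) == 0) len = ftell(in);
        if (in) fclose(in);
        if (len < 0) { rc = -1; break; }
        strncpy(index[i].name, names[i], sizeof index[i].name - 1);
        index[i].size = (uint64_t)len;
        index[i].offset = offset;
        offset += index[i].size;
    }

    /* Pass 2: write the header (count + index), then append the contents. */
    FILE *out = rc == 0 ? fopen(archive, "wb") : NULL;
    if (!out) { free(index); return -1; }
    if (fwrite(&n, sizeof n, 1, out) != 1 ||
        fwrite(index, sizeof *index, n, out) != n) rc = -1;
    for (int i = 0; i < count && rc == 0; i++) {
        FILE *in = fopen(names[i], "rb");
        if (!in) { rc = -1; break; }
        rc = copy_bytes(in, out, index[i].size);
        fclose(in);
    }
    if (fclose(out) != 0) rc = -1;
    free(index);
    return rc;
}

/* Find name in the index, seek to its offset, and copy it out to dest. */
int archive_extract(const char *archive, const char *name, const char *dest) {
    FILE *in = fopen(archive, "rb");
    if (!in) return -1;
    uint32_t n = 0;
    int rc = -1;
    if (fread(&n, sizeof n, 1, in) == 1) {
        Entry e;
        for (uint32_t i = 0; i < n; i++) {
            if (fread(&e, sizeof e, 1, in) != 1) break;
            e.name[sizeof e.name - 1] = '\0';
            if (strcmp(e.name, name) != 0) continue;
            FILE *out = fopen(dest, "wb");
            if (!out) break;
            if (fseek(in, (long)e.offset, SEEK_SET) == 0)
                rc = copy_bytes(in, out, e.size);
            if (fclose(out) != 0) rc = -1;
            break;
        }
    }
    fclose(in);
    return rc;
}
```

Storing the offset in each entry means the extractor never has to add up the sizes of the preceding files; it reads the index, seeks once, and copies.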
Sequential Access
This is easier. You can do it yourself: think through the steps.
About Programming
It is easy to get caught up in the details of how something should work. I suggest that you take a step back -- something your teacher should discuss in class -- and try to think about the problem at a level above coding, because:
the algorithm you create will be language independent;
fixing mistakes in an algorithm, before code is written, is trivial;
you will have a better understanding of what you need to do before coding;
it will take less time to implement the solution;
you can identify areas that can be implemented in parallel;
you will see any potential roadblocks ahead of time; and
you will be on your way to management positions in no time. ;-)

I would think that the header would need to have information needed to identify the file and how big it is within the archive - for example, file name, original directory, and size in either lines or bytes, depending on which is more useful in your context. You'd then need routines to create a header, add a file to an archive (create a header and append the file data), extract a file from an archive (follow the headers until the correct entry is found and copy the data from the archive to a separate file), and delete a file (start reading the archive, copying data for all entries except the one you want to delete to a new file, then delete the old archive and rename the new one to the old name).
Share and enjoy.

One approach is to imitate the ZIP format: http://en.wikipedia.org/wiki/ZIP_file_format
It uses a directory structure at the end of the file, which contains pointers to the offsets of the files in the archive. The big benefit of this structure is that you can find a given file without having to read the entire archive -- as long as you know the start of the directory and have the ability to randomly access the file.
An alternative is the TAR file format: http://en.wikipedia.org/wiki/Tar_file_format
This is designed for streaming media ("tape archive"), so each entry contains its own metadata. You have to scan the entire file for an entry, but the normal use case is to pack/unpack entire directory trees, so this isn't too bad a penalty.

Doing it in a streaming fashion, like tar, is probably the easiest implementation. First, write a magic number out so you can identify that this is your archive format. I'd then suggest using stat(2) (that's man syntax for the stat man page, section 2) to get the size of the file to be archived. Actually, look closely at the stat fields available to you, there may be some interesting information there you'd want to keep.
Write out the information you need in a tag=value fashion, one per line. For example:
FileName=file1.txt
FileSize=10
FileDir=./blah/blah
FilePerms=0700
End your header with two newlines so you know when to start pushing out FileSize bytes to disk. You don't need a beginning-of-header marker: because you know the file size to write out, you know where the next header starts.
I'm suggesting you use a text format for your header information because then you don't have to worry about byte ordering, etc. that you'd need to worry about if you write a raw binary struct out to disk.
When reading your archive, parse the header lines one by one and populate a local struct to hold that information. Then write out the file to disk, and set any file properties that need updating based on the header info you extracted.
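A rough sketch of writing and parsing such a text header might look like this. The FileHeader struct and the function names write_header and read_header are assumptions made for illustration, FileDir is left out for brevity, and the sketch assumes file names contain no newline characters.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of one header. */
typedef struct {
    char     name[256];
    long     size;     /* number of content bytes that follow the header */
    unsigned perms;    /* permission bits, stored in octal in the archive */
} FileHeader;

/* Write one tag=value header, terminated by an empty line. */
int write_header(FILE *out, const FileHeader *h) {
    return fprintf(out, "FileName=%s\nFileSize=%ld\nFilePerms=%04o\n\n",
                   h->name, h->size, h->perms) < 0 ? -1 : 0;
}

/* Read header lines up to the empty line.
   Returns 1 if a header was read, 0 at a clean end of the archive,
   and -1 if the header is malformed or cut short. */
int read_header(FILE *in, FileHeader *h) {
    char line[512];
    int seen = 0;
    memset(h, 0, sizeof *h);
    while (fgets(line, sizeof line, in)) {
        line[strcspn(line, "\n")] = '\0';           /* strip the newline */
        if (line[0] == '\0') return seen ? 1 : -1;  /* blank line ends header */
        char *eq = strchr(line, '=');
        if (!eq) return -1;
        *eq = '\0';
        const char *value = eq + 1;
        if (strcmp(line, "FileName") == 0)
            snprintf(h->name, sizeof h->name, "%s", value);
        else if (strcmp(line, "FileSize") == 0)
            h->size = strtol(value, NULL, 10);
        else if (strcmp(line, "FilePerms") == 0)
            h->perms = (unsigned)strtoul(value, NULL, 8);
        /* Unknown tags are skipped, so new tags can be added later. */
        seen = 1;
    }
    return seen ? -1 : 0;
}
```

After read_header returns 1, the caller copies exactly size bytes to the output file and then calls read_header again for the next entry.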
Hope that helps. Good luck.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct{
int size;
char name[20];
}Header;
void packfiles(char* archive_file, int numfiles, char** filenames){
FILE* fp = fopen(archive_file, "wb");
if(!fp){
perror("Error opening archive file");
exit(1);
}
for(int i = 0; i < numfiles; i++){
FILE* infile = fopen(filenames[i], "rb");
if(!infile){
perror("Error opening input file");
exit(1);
}
//Get file size
fseek(infile, 0, SEEK_END);
int fsize = ftell(infile);
rewind(infile);
//Create header
Header header;
header.size = fsize;
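strncpy(header.name, filenames[i], sizeof(header.name) - 1);
header.name[sizeof(header.name) - 1] = '\0'; //strncpy does not terminate a truncated name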
//Write header and file content to archive
fwrite(&header, sizeof(Header), 1, fp);
for(int j = 0; j < fsize; j++){
fputc(fgetc(infile), fp);
}
//Add padding if necessary
if(fsize % 4 != 0){
for(int j = 0; j < 4-(fsize % 4); j++){
fputc(0, fp);
}
}
fclose(infile);
}
fclose(fp);
}
void unpackfiles(char* archive_file){
FILE* fp = fopen(archive_file, "rb");
if(!fp){
perror("Error opening archive file");
exit(1);
}
while(1){
//Read header
Header header;
size_t nread = fread(&header, sizeof(Header), 1, fp);
if(nread != 1){
if(ferror(fp)){
perror("Error reading header");
exit(1);
}
//EOF
break;
}
header.name[sizeof(header.name) - 1] = '\0';
//Create output file
FILE* outfile = fopen(header.name, "wb");
if(!outfile){
perror("Error creating output file");
exit(1);
}
//Write file content to output file
for(int i = 0; i < header.size; i++){
fputc(fgetc(fp), outfile);
}
//Skip padding (only written when the size is not a multiple of 4)
if(header.size % 4 != 0){
fseek(fp, 4-(header.size % 4), SEEK_CUR);
}
fclose(outfile);
}
fclose(fp);
}
int main(int argc, char** argv){
if(argc < 3){
fprintf(stderr, "Usage: %s <archive_file> <file1> [<file2>...]\n", argv[0]);
exit(1);
}
packfiles(argv[1], argc-2, argv+2);
unpackfiles(argv[1]);
return 0;
}

Related

Replacing bytes at current offset in c

I'm currently developing a program that mimics a UNIX file system. I've prepared my disk as a 1 MB file with all the data blocks inside it. Now I'm implementing some simple commands like mkdir, ls, etc. To make those commands work, I need to read from a specific offset (no problem with that) and write the modified blocks back to a specific location.
Simply my goal is:
SIIIDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD (Current Disk)
I want to replace three blocks with AAA after the 16th byte, so that it looks like:
SIIIDDDDDDDDDDDDAAADDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD (Modified Disk)
I'm not going to provide all of my implementation here. I just want some ideas on how to implement this without buffering the whole 1 MB of data in my program. In short, I know the locations of my data blocks, so I want to replace just that part of the file, not the whole file. Can't I simply do this with the file stream functions?
Another example:
fseek(from_disk,superblock.i_node_bit_map_starting_addr , SEEK_SET); //seek to known offset.
read_bit_map(&from_disk); // I can read at specific location without problem
... manipulate bit map ...
fseek(to_disk,superblock.i_node_bit_map_starting_addr , SEEK_SET); //seek to known offset.
write_bit_map(&to_disk); //Write back the data.
//This destroys the current contents of the file. (Tried with the w+ and a modes.)
Note: It is not shown in the example, but I have two file pointers, one for reading and one for writing, and I'm aware that I need to close one before opening the other.

I think you are looking for the r+ mode (or possibly rb+). Here is a complete example; afterwards you can run grep -n hello data.txt to verify the result for yourself. You can build and run it with make prog && ./prog.
#include <stdio.h>
#include <unistd.h>
#include <string.h>
int main(int argc, char const *argv[])
{
FILE *file;
file = fopen("data.txt", "w+");
char dummy_data[] = "This is stackoverflow.com\n";
int dummy_data_length = strlen(dummy_data);
for (int i = 0; i < 1000; ++i)
fwrite(dummy_data, dummy_data_length, 1, file);
fclose(file);
file = fopen("data.txt", "r+");
fseek(file, 500, SEEK_CUR);
fwrite("hello", 5, 1, file);
fclose(file);
return 0;
}
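Applied to the disk-image case from the question, the same idea could be wrapped in a small helper. This is a sketch only; the name overwrite_at and its parameters are hypothetical, and it uses rb+ so that no newline translation happens on platforms that distinguish text from binary streams.

```c
#include <stdio.h>

/* Overwrite len bytes at the given offset and leave the rest of the file
   untouched. Returns 0 on success, -1 on failure. */
int overwrite_at(const char *path, long offset, const void *data, size_t len) {
    FILE *f = fopen(path, "rb+");  /* update mode: no truncation, no append */
    if (!f) return -1;
    int rc = 0;
    if (fseek(f, offset, SEEK_SET) != 0 || fwrite(data, 1, len, f) != len)
        rc = -1;
    if (fclose(f) != 0) rc = -1;
    return rc;
}
```

For the example in the question, a call such as overwrite_at("disk.bin", 16, "AAA", 3) would replace the three bytes starting at offset 16, where disk.bin stands in for whatever the disk file is called. The w and w+ modes truncate the file on open, and the a modes force every write to the end, which is why both appeared to destroy the data.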

When writing to a file, fputs() does not change line

I'm currently trying to create a database in C, using a .txt document as the place to store all the data. But I can't get fputs() to shift line, so everything that my program writes in this .txt document is only on one line.
int main(void){
char c[1000];
FILE *fptr;
if ((fptr=fopen("data.txt","r"))==NULL){
printf("Did not find file, creating new\n");
fptr = fopen("data.txt", "wb");
fputs("//This text file contain information regarding the program 'monies.c'.\n",fptr);
fputs("//This text file contain information regarding the program 'monies.c'.\n",fptr);
fputs("//Feel free to edit the file as you please.",fptr);
fputs("'\n'",fptr);
fputs("(Y) // Y/N - Yes or No, if you want to use this as a database",fptr);
fputs("sum = 2000 //how much money there is, feel free to edit this number as you please.",fptr);
fclose(fptr);
}
fscanf(fptr,"%[^\n]",c);
printf("Data from file:\n%s",c);
fclose(fptr);
return 0;
}
This is my testing document.
I feel like I've tried everything and then some, but can't get it to change line, help is much appreciated.
Btw. The output looks like this:

There are two issues in your program :
You should specify "w" rather than "wb" so that the file is read and written as text rather than binary. Although in some systems this makes no difference and b is ignored.
The part for file reading should be in an else, otherwise it executes after file creation with fptr not containing a valid value.
This is the code with those corrections. I do get a multiline data.txt with it.
int main(void){
char c[1000];
FILE *fptr;
if ((fptr=fopen("data.txt","r"))==NULL){
printf("Did not find file, creating new\n");
fptr = fopen("data.txt", "w");
fputs("//This text file contain information regarding the program 'monies.c'.\n",fptr);
fputs("//This text file contain information regarding the program 'monies.c'.\n",fptr);
fputs("//Feel free to edit the file as you please.",fptr);
fputs("'\n'",fptr);
fputs("(Y) // Y/N - Yes or No, if you want to use this as a database",fptr);
fputs("sum = 2000 //how much money there is, feel free to edit this number as you please.",fptr);
fclose(fptr);
}
else
{
fscanf(fptr,"%[^\n]",c);
printf("Data from file:\n%s",c);
fclose(fptr);
}
return 0;
}
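One more thing worth checking in either version: fputs(), unlike puts(), does not append a newline, so any string written without a trailing "\n" runs straight into the next one, and fputs("'\n'",fptr) writes a quote, a newline and another quote rather than just a line break. Likewise, fscanf with "%[^\n]" stops at the first newline, so it only ever returns the first line. A minimal sketch that writes one string per line and reads every line back with fgets() could look like this; the helper names write_lines and print_lines are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Write each string on its own line; fputs() adds no newline by itself. */
int write_lines(const char *path, const char *const lines[], size_t count) {
    FILE *f = fopen(path, "w");
    if (!f) return -1;
    for (size_t i = 0; i < count; i++) {
        if (fputs(lines[i], f) == EOF || fputc('\n', f) == EOF) {
            fclose(f);
            return -1;
        }
    }
    return fclose(f) == 0 ? 0 : -1;
}

/* Print the whole file and return the number of complete lines read,
   or -1 if the file cannot be opened. */
long print_lines(const char *path) {
    FILE *f = fopen(path, "r");
    if (!f) return -1;
    char buf[1000];
    long n = 0;
    while (fgets(buf, sizeof buf, f)) {   /* fgets keeps the '\n' it reads */
        fputs(buf, stdout);
        if (strchr(buf, '\n')) n++;
    }
    fclose(f);
    return n;
}
```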

Open image file as binary, store image as string of bytes, save the image - possible in plain C?

I would like to read an image, let's say picture.png, in C. I know I can open it in binary mode and then read it - it's pretty simple.
But I need something more: I would like to be able to read the image once, store it in my code, for example, in *.h file, as 'string of bytes', for example:
unsigned char image[] = "0x87 0x45 0x56 ... ";
and then, be able to just do:
delete physical file I read from disk,
save image into file - it will create my file once again,
EVEN if I removed image from disk (deleted physical file picture.png I read earlier) I will still be able to create an image on disk, simply by writing my image array into file using binary mode. Is that possible in pure C? If so, how can I do this?

There's even a special format for this task, called XPM, and a library to manipulate these files. Remember, though, that due to its nature it's only suitable for relatively small images. Still, it was used for years in the X Window System to provide icons. Back in the good old days, icons were 16x16 pixels and contained no more than 256 colors :)

Of course it's possible, but it's a bit unclear what you're after.
There are stand-alone programs that convert binary data to C source code, you don't need to implement that. But doing it that way of course means that the image becomes a static part of your program's executable.
If you want it to be more dynamic, like specifying the filename to your program when it's running, then the whole thing about converting to C source code becomes moot; your program is already compiled. C programs can't add to their own source at run-time.
UPDATE If all you want to do is load a file, hold it in memory and then write it back out, all in the same run of your program, that's pretty trivial.
You'd use fopen() to open the file, fseek() to go to the end, ftell() to read the size of the file. Then rewind() it to the start, malloc() a suitable buffer, fread() the file's contents into the buffer and fclose() the file. Later, fopen() a new output file, and fwrite() the buffer into that before using fclose() to close the file. Then you're done. You can do it again, as many times as you like. It can be an image, a program, a document or any other kind of file, it doesn't matter.
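The sequence of calls described above might be sketched like this. The function names load_file and save_file are hypothetical, and getting the size with fseek()/ftell() assumes a regular file that fits in a long.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a whole file into a malloc'ed buffer. On success, returns the buffer
   and stores its length in *size; the caller must free() it.
   Returns NULL on failure. */
unsigned char *load_file(const char *path, size_t *size) {
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    unsigned char *buf = NULL;
    long len;
    if (fseek(f, 0, SEEK_END) == 0 && (len = ftell(f)) >= 0) {
        rewind(f);
        buf = malloc(len > 0 ? (size_t)len : 1);  /* never request 0 bytes */
        if (buf && fread(buf, 1, (size_t)len, f) != (size_t)len) {
            free(buf);
            buf = NULL;
        }
        if (buf) *size = (size_t)len;
    }
    fclose(f);
    return buf;
}

/* Write a buffer out as a new file. Returns 0 on success, -1 on failure. */
int save_file(const char *path, const unsigned char *buf, size_t size) {
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    int rc = fwrite(buf, 1, size, f) == size ? 0 : -1;
    if (fclose(f) != 0) rc = -1;
    return rc;
}
```

Once load_file has returned, the original file can be deleted and recreated later with save_file, as long as the program is still running and still holds the buffer.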

pic2h.c :
#include <stdio.h>
int main(int argc, char *argv[]){
if(argc != 3){
fprintf(stderr, "Usage >pic2h image.png image.h\n");
return -1;
}
FILE *fi = fopen(argv[1], "rb");
FILE *fo = fopen(argv[2], "w");
int ch, count = 0;
fprintf(fo, "extern unsigned char image[];\n");
fprintf(fo, "unsigned char image[] =");
while(EOF!=(ch=fgetc(fi))){
if(count == 0)
fprintf(fo, "\n\"");
fprintf(fo, "\\x%02X", ch);
if(++count==24){
count = 0;
fprintf(fo, "\"");
}
}
if(count){
fprintf(fo, "\"");
}
fprintf(fo, ";\n");
fclose(fo);
fclose(fi);
return 0;
}
resave.c :
#include <stdio.h>
#include "image.h"
int main(int argc, char *argv[]){
if(argc != 2){
fprintf(stderr, "Usage >resave image.png\n");
return 0;
}
size_t size = sizeof(image)-1;
FILE *fo = fopen(argv[1], "wb");
fwrite(image, size, 1, fo);
fclose(fo);
return 0;
}

Trouble testing copy file function in C

Okay, so this probably has an easy solution, but after a bit of searching and testing I remain confused. :(
Here is a snippet of the code that I have written:
int main(int argc, char *argv[]){
int test;
test = copyTheFile("test.txt", "testdir");
if(test == 1)
printf("something went wrong");
if(test == 0)
printf("copydone");
return 0;
}
int copyTheFile(char *sourcePath, char *destinationPath){
FILE *fin = fopen(sourcePath, "r");
FILE *fout = fopen(destinationPath, "w");
if(fin != NULL && fout != NULL){
char buffer[10000];//change to real size using stat()
size_t read, write;
while((read = fread(buffer, 1, sizeof(buffer), fin)) > 0){
write = fwrite(buffer, 1, read, fout);
if(write != read)
return 1;
}//end of while
}// end of if
else{
printf("Something wrong getting the file\n");
return 0;}
if(fin != NULL)
fclose(fin);
if(fout != NULL)
fclose(fout);
return 0;
}
Some quick notes: I am very new to C, programming, and especially file I/O. I looked up the man pages of fopen, fread, and fwrite. After looking at some example code I came up with this. I was trying to just copy a simple text file, and then place it in the destination folder specified by destinationPath.
The folder I want to place the text file into is called testdir, and the file I want to copy is called test.txt.
The arguments I have attempted to use in the copyFile function are:
"test.txt" "testdir"
".../Desktop/project/test.txt" ".../Desktop/project/testdir"
"/Desktop/project/test.txt" "/Desktop/project/testdir"
I just get the print statement "Something wrong getting the file" with every attempt. I am thinking that it may be because 'testdir' is a folder not a file, but then how would I copy to a folder?
Sorry if this a really basic question, I am just having trouble so any advice would be awesome!
Also, if you wanted to be extra helpful, the "copyTheFile" function is supposed to copy the file regardless of format. So if it's a .jpg or something, it should still copy it. Let me know if any of you see a problem with it.
This is with ISO/POSIX/C89/C99 on Linux.

At the start, you'll want to include stdio.h to provide FILE and the I/O function declarations:
#include <stdio.h>
Aside from this, your program compiles and works properly for me. Unfortunately you can't copy to a directory without using stat() to detect if the destination is a directory, and if so, appending a file name before opening the file.
Some other minor suggestions:
A buffer with a power of two bytes such as 4096 is probably more efficient due to it lining up with filesystem and disk access patterns
Conventionally, C functions that return a status code use 0 for success and other values such as 1 for failure, so swapping your return values may be less confusing
When a standard library function such as fopen, fread or fwrite fails, it is a good idea to use perror(NULL); or perror("error prefix"); to report it, which may look something like:
$ ./a.out
...
error prefix: No such file or directory
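Putting these suggestions together, a version that also accepts a directory as the destination might look roughly like the sketch below. It assumes a POSIX system (stat() and S_ISDIR come from sys/stat.h), the name copy_file is made up for illustration, and it opens both files in binary mode so the same code works for a .jpg as well as a text file.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Copy src to dst. If dst is an existing directory, the file is copied
   into it under the source's base name. Returns 0 on success, 1 on failure. */
int copy_file(const char *src, const char *dst) {
    char path[4096];
    struct stat st;
    if (stat(dst, &st) == 0 && S_ISDIR(st.st_mode)) {
        const char *base = strrchr(src, '/');   /* strip leading directories */
        base = base ? base + 1 : src;
        int n = snprintf(path, sizeof path, "%s/%s", dst, base);
        if (n < 0 || (size_t)n >= sizeof path) return 1;
        dst = path;
    }

    FILE *fin = fopen(src, "rb");
    if (!fin) { perror(src); return 1; }
    FILE *fout = fopen(dst, "wb");
    if (!fout) { perror(dst); fclose(fin); return 1; }

    char buffer[4096];
    size_t got;
    int rc = 0;
    while ((got = fread(buffer, 1, sizeof buffer, fin)) > 0) {
        if (fwrite(buffer, 1, got, fout) != got) { rc = 1; break; }
    }
    if (ferror(fin)) rc = 1;
    fclose(fin);
    if (fclose(fout) != 0) rc = 1;
    return rc;
}
```

With this, copy_file("test.txt", "testdir") would write testdir/test.txt, provided testdir already exists.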

If you are trying to write a new file in a directory, you should give the full path of the file to be written. In your case:
"C:...\Desktop\project\testdir\testfile"

File processing in c?

I have been given a raw file that holds several jpg images. I have to go through the file, find each jpg image, and put each image in a separate file. So far I have code that can find where each image begins and ends. I have also written code that creates several file names I can use for the pictures. It is an array: char filename[] , that holds the names: image00.jpg - image29.jpg .
What I cannot figure out is how to open a file every time I find an image, and then close that file and open a new one for the next image. Do I need to use fwrite()? Also, each image is in blocks of 512 bytes, so I only have to check for a new image every 512 bytes once I find the first one. Do I need to add that into fwrite?
So, to summarize my questions, I don't understand how to use fwrite(), if that is what I should be using to write to these files.
Also, I do not know how to open the files using the names I have already created.
Thanks in advance for the help. Let me know if I need to post any other code.

Use fopen(rawfilename, "rb"); to open the raw file for reading, and fread to read from it.
Use fopen(outfilename, "wb"); to open the output file for writing, and fwrite to write to it.
As mentioned in my comment, you are assigning char *[] to char*, use char filename[] = "image00.jpg"; instead.
Don't forget to close each file with fclose() once you have finished reading or writing it.
Decide how many bytes to read each time by parsing the JPEG header. Use malloc to allocate the number of bytes you need to read, and remember to free every buffer you allocate.

Pretty much any book on C programming should cover the functions you need. As MByD pointed out, you'll want to use the functions fopen(), fwrite(), and fclose().
I imagine your code may include fragments that look something like
/* Warning: untested and probably out-of-order code */
...
const char *filename[] = {
"image00.jpg", "image01.jpg", "image02.jpg",
...
"image29.jpg" };
...
int index = 0;
const int blocksize = 512; /* bytes */
...
index++;
...
FILE * output_file = fopen( filename[index], "wb");
fwrite( output_data, 1, blocksize, output_file );
fclose(output_file);
...
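Put in order, the open/write/close cycle could look something like the sketch below. It rests on several assumptions that are not in the question: that each image starts on a 512-byte block boundary with the usual JPEG start bytes FF D8 FF, that the images are stored back to back so everything up to the next start belongs to the current image, and that the names are generated with snprintf instead of taken from a prepared array. The function names is_jpeg_start and carve_images are hypothetical.

```c
#include <stdio.h>

#define BLOCK_SIZE 512

/* A JPEG stream starts with the bytes FF D8 FF. */
static int is_jpeg_start(const unsigned char *block) {
    return block[0] == 0xFF && block[1] == 0xD8 && block[2] == 0xFF;
}

/* Split the raw file into image00.jpg, image01.jpg, ...
   Returns the number of images written, or -1 on error. */
int carve_images(const char *raw_path) {
    FILE *raw = fopen(raw_path, "rb");
    if (!raw) return -1;
    unsigned char block[BLOCK_SIZE];
    char name[32];
    FILE *out = NULL;
    int count = 0;
    size_t got;
    while ((got = fread(block, 1, BLOCK_SIZE, raw)) > 0) {
        if (got >= 3 && is_jpeg_start(block)) {
            /* A new image begins: close the previous file, open the next. */
            if (out) fclose(out);
            snprintf(name, sizeof name, "image%02d.jpg", count);
            out = fopen(name, "wb");
            if (!out) { fclose(raw); return -1; }
            count++;
        }
        /* Blocks before the first image are skipped because out is NULL. */
        if (out && fwrite(block, 1, got, out) != got) {
            fclose(out);
            fclose(raw);
            return -1;
        }
    }
    if (out) fclose(out);
    fclose(raw);
    return count;
}
```

fwrite() itself needs no knowledge of the 512-byte structure; the block size only decides how much is read, checked and written per loop iteration.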
