compression algorithm - c

I'm working on a compression algorithm that we have to implement in C. The program takes a file, removes the most significant bit of every character, and stores the compressed text in another file. I wrote a function called compress as shown below. I'm getting a seg fault while freeing out_buf. Any help would be greatly appreciated.

You close out_fd twice, so of course the second time it is an invalid file descriptor. But more than that, you need to review your use of sizeof() which is NOT the same as finding the buffer size of a dynamically-allocated buffer (sizeof returns the size of the pointer, not the buffer). You don't show the calling code, but using strcat() on a buffer passed-in is always worth a look too (is the buffer passed by the caller large enough for the result?).
Anyway, that should be enough to get you going again...
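To illustrate the sizeof point, here is a minimal sketch. It assumes a hypothetical struct sized_buf (not part of the asker's code) that stores the allocated size next to the pointer, because sizeof applied to the pointer only gives the size of the pointer itself:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper type: keeps the allocated size next to the pointer,
   because sizeof(b->data) is only the size of the pointer itself. */
struct sized_buf {
    unsigned char *data;
    size_t size;      /* number of bytes actually allocated */
};

/* Allocates size bytes; returns 0 on success, -1 on failure. */
int sized_buf_alloc(struct sized_buf *b, size_t size)
{
    b->data = malloc(size);
    b->size = (b->data != NULL) ? size : 0;
    return (b->data != NULL) ? 0 : -1;
}

/* Releases the buffer and resets the fields. */
void sized_buf_free(struct sized_buf *b)
{
    free(b->data);
    b->data = NULL;
    b->size = 0;
}
```

Code that needs the buffer length then uses b.size rather than sizeof.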

You're closing the same file descriptor twice:
close(out_fd);
if ( close(out_fd) == -1 )
oops("Error closing output file", "");
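Just remove the first close(out_fd).
The segmentation fault happens because you moved the out_buf pointer: free() must be given the exact pointer that malloc() returned, and after all the *out_buf++ steps out_buf no longer points to the start of the block.
If you want to put values inside this malloc'd area, use a second, temporary pointer and advance that one through the memory instead.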
Like this:
unsigned char *out_buf = malloc(5400000*7/8);
unsigned char *tmp_buf = out_buf;
Then replace every *out_buf++ with *tmp_buf++.
Also replace out_buf with tmp_buf inside the write call.
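Since the asker's compress function is not shown, the following is only a sketch of the "separate cursor" idea under stated assumptions. The function name pack7 and the bit layout (most significant bits first, zero padding at the end) are hypothetical choices. The start pointer out is never modified; only the local cursor advances, so the caller can still hand the original pointer to free():

```c
#include <stddef.h>

/* Packs n characters, 7 bits each, into out and returns the number of
   bytes produced. out must have room for (n * 7 + 7) / 8 bytes.
   Assumption: bits are packed MSB first and the last byte is zero padded. */
size_t pack7(const unsigned char *in, size_t n, unsigned char *out)
{
    unsigned char *cursor = out;   /* moving pointer; out keeps the start */
    unsigned int acc = 0;          /* bit accumulator */
    int nbits = 0;                 /* number of valid bits in acc */

    for (size_t i = 0; i < n; i++) {
        acc = (acc << 7) | (in[i] & 0x7Fu);   /* drop the top bit */
        nbits += 7;
        while (nbits >= 8) {
            nbits -= 8;
            *cursor++ = (unsigned char)((acc >> nbits) & 0xFFu);
        }
        acc &= (1u << nbits) - 1u;            /* keep only the leftover bits */
    }
    if (nbits > 0)                            /* flush the padded last byte */
        *cursor++ = (unsigned char)((acc << (8 - nbits)) & 0xFFu);

    return (size_t)(cursor - out);
}
```

In this sketch the whole result is written in one go, so a caller would pass the original buffer pointer and the returned length to write(), free that same original pointer, and close the descriptor exactly once.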

Related

How to read() from a file continuously into a variable

I am trying to read() a file whose exact size I don't know into a variable so that I can work on it later, so I am looping like this:
char buf[BUFSIZE];
char* contentsOfFile;
fd = open(file, O_RDONLY);
while ( (nbytes = read(fd, buf, sizeof(buf)) ) > 0) { // keep reading until the end of file or error
strcat(contentsOfFile, buf);
}
Of course, this explodes unless contentsOfFile is another char array, but I cannot do this as
I could have a bigger file than the number of bytes it could hold.
Is there any other library solution, or should I resort to malloc?
Use malloc. Find the size first (How do you determine the size of a file in C?) then malloc the appropriate number of bytes and do the read.
This is terrible code:
contentsOfFile is an uninitialized pointer, so dereferencing it invokes UB
read returns raw bytes and never adds any terminating null (unformatted io), but strcat expects null terminated strings.
Without more context, it is hard to tell you what is the correct way. Possible ways are:
use mmap to map the file content into memory. After that, you can process it transparently and the OS will load and unload pages from the file when required
load everything into memory using malloc and realloc to make sure to have enough allocated memory for next read
load everything into memory using one single malloc and one single read after finding the file size.
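The malloc and realloc option from the list above can be sketched as follows. The name read_all, the starting capacity, and the doubling policy are assumptions for illustration. The sketch reads directly into the growing buffer instead of using strcat, and adds the terminating '\0' itself because read() never does. It does not retry on EINTR.

```c
#include <stdlib.h>
#include <unistd.h>

/* Reads everything from fd into a heap buffer that grows as needed.
   Returns the buffer (null-terminated, caller frees) and stores the
   byte count in *out_len, or returns NULL on error. */
char *read_all(int fd, size_t *out_len)
{
    size_t cap = 4096;             /* assumed starting capacity */
    size_t len = 0;
    char *buf = malloc(cap + 1);   /* +1 for the final '\0' */
    if (buf == NULL)
        return NULL;

    for (;;) {
        ssize_t n = read(fd, buf + len, cap - len);
        if (n < 0) {               /* read error */
            free(buf);
            return NULL;
        }
        if (n == 0)                /* end of file */
            break;
        len += (size_t)n;
        if (len == cap) {          /* buffer full: double it */
            size_t new_cap = cap * 2;
            char *tmp = realloc(buf, new_cap + 1);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }
    }
    buf[len] = '\0';
    *out_len = len;
    return buf;
}
```

Because the length is returned separately, the data can contain zero bytes; the terminator is only a convenience for text files.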

addressSanitizer: heap-buffer-overflow on address

I am at the very beginning of learning C.
I am trying to write a function that opens a file, reads BUFFER_SIZE bytes, stores the content in an array, and then looks for the character '\n' (because I want to get each line of the input).
When I set BUFFER_SIZE very large, I can get the first line. When I set BUFFER_SIZE reasonably small (say, 42), so that a read stops before the end of the first line, it prints some weird symbols at the end, but I guess that is a bug in my own code.
However, when I set BUFFER_SIZE very small, say 10, and use -fsanitize=address to check for memory leaks, it throws a monster of an error:
==90673==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000000fb at pc 0x000108868a95 bp 0x7fff573979a0 sp 0x7fff57397998
READ of size 1 at 0x6020000000fb thread T0
If anyone could explain to me, in a general sense:
what is the -fsanitize=address flag?
what is heap-buffer-overflow?
what is address and thread? what is the flag to see the thread in colors on screen?
and why does it say 'READ of size 1 at address...'?
I would really appreciate it <3
what is the -fsanitize=address flag?
By default, a C compiler does not add bounds checks to memory accesses.
Sometimes, because of a bug, code reads or writes outside a buffer, and such errors are usually hard to detect. With this flag the compiler instruments the program with checks that report, at run time, any access that reaches outside an allocation.
what is heap-buffer-overflow?
Using an array (or pointer) to reach past the end of its heap allocation:
char* x = malloc(10);
char n=x[11]; //heap-buffer-overflow
(An underflow reaches before the start of the allocation:)
char* x = malloc(10);
char n=x[-11]; //heap-buffer-underflow
what is address and thread?
An address is a position in memory; a thread is the part of a process that runs a sequence of code.
and why it says 'read of size 1 at address.." ?
It means a single byte was read from the given address.
I think your problem is that you allocate BUFFER_SIZE bytes for the buffer and then read the same BUFFER_SIZE bytes into it, which leaves no room for a terminating '\0'. If you treat the data as a string, always allocate at least one more byte than you read and use it for the terminator.
Like this:
char* buff = malloc(BUFFER_SIZE+1); // note the +1
fread(buff,1,BUFFER_SIZE,fp);
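A slightly fuller sketch of the same idea, assuming a hypothetical helper named read_block. It uses the return value of fread to place the terminator, since fread may deliver fewer bytes than requested and never adds a '\0' itself:

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads up to max bytes from fp into a freshly allocated buffer and
   null-terminates it. Stores the byte count in *got. Caller frees. */
char *read_block(FILE *fp, size_t max, size_t *got)
{
    char *buf = malloc(max + 1);          /* +1 leaves room for '\0' */
    if (buf == NULL)
        return NULL;
    size_t n = fread(buf, 1, max, fp);    /* n may be smaller than max */
    buf[n] = '\0';                        /* terminate after the bytes actually read */
    *got = n;
    return buf;
}
```

Any loop that searches the buffer for '\n' should then stop at index *got (or at the terminator) rather than at a fixed BUFFER_SIZE.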
In simple words, it is an out-of-bounds access to memory in the heap area, that is, memory obtained with malloc in C (or with the new keyword in C++).
Explanation: you are trying to access an address outside the memory you allocated. To find all such errors, revisit all your conditions and check whether you are accessing something out of bounds.
In C++ programs the following lines are often added for fast input/output; note that they only speed up stream I/O and do not fix an out-of-bounds access:
// For FAST I/O
ios_base::sync_with_stdio(false);
cin.tie(NULL);

c programming "Access violation writing location 0x00000000."

I'm working on a program in which I need to get a string from a user and do some manipulations on it.
The problem is when my program ends I am getting "Access violation writing location 0x00000000." error message.
This is my code
{
//code
char *s;
s=gets();
//code
}
After some reading I realized that using gets() may cause some problems, so I changed char *s to s[20] just to check it out, and it worked fine without any errors at the end of the program.
The thing is that I don't know the string size in advance, thus, I'm not allowed (academic ex) to create string line as -> s[HugeNumber] like s[1000].
So I have no other choice but using gets() function.
Any way to solve my problem?
Thanks in advance
PS
Also tried using malloc as
char *temp;
char *s;
temp = gets();
s= (char*)malloc((strlen(temp) +1)* sizeof(char));
The error still pops up at the end.
As long as I have *something = gets(); my program will throw an error at the end.
It looks like you are expecting gets to allocate an appropriately-sized string and return a pointer to it but that is not how it works. gets needs to receive the buffer as a parameter so you would still need to declare the array with a huge number. In fact, I am surprised that you managed to get your code to compile since you are passing the wrong number of arguments to gets.
char s[1000];
if (gets(s) == NULL) {
// handle error
}
The return value of gets is the same pointer that you passed as a parameter to it. The only use of the return value is to check for errors, since gets will return NULL if it reached the end of file before reading any characters.
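A function that works more like what you want is getline, available in GNU libc and standardized in POSIX.1-2008:
char *s = NULL; // must start as NULL so that getline allocates the buffer
size_t n = 0;
if (getline(&s, &n, stdin) != -1) {
    printf("%s", s); // Use the string here
}
free(s); // Then free it when done.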
Alternatively, you could do something similar using malloc and realloc inside a loop. Malloc a small buffer to start out then use fgets to read into that buffer. If the whole line fits inside the buffer you are done. If it didn't then you realloc the buffer to something larger (multiply its size by a constant factor each time) and continue reading from where you stopped.
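The malloc, fgets and realloc loop described above could look roughly like this. The name read_line, the starting size of 16 and the doubling factor are assumptions for illustration. The sketch strips the trailing newline and does not handle lines containing embedded '\0' bytes or lines longer than INT_MAX:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads one line of any length from fp. Returns a heap string without the
   trailing newline (caller frees), or NULL at end of file or on failure. */
char *read_line(FILE *fp)
{
    size_t cap = 16;               /* assumed small starting size */
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    /* Each fgets call continues where the previous one stopped. */
    while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[len - 1] = '\0';   /* whole line read: drop the newline */
            return buf;
        }
        if (len + 1 == cap) {      /* buffer full, line not finished: grow */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (len == 0) {                /* nothing read at all: end of file */
        free(buf);
        return NULL;
    }
    return buf;                    /* last line without a trailing newline */
}
```

Doubling the capacity keeps the number of realloc calls logarithmic in the line length.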
Another approach is to give up on reading arbitrarily large lines. The simplest thing you can do in C is to set up a maximum limit for line length (say, 255 characters), use fgets to read up to that number of characters and then abort with an error if you are given a line that is longer than that. This way you stick to functions in the standard library and you keep your logic as simple as possible.
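A minimal sketch of the fixed-limit approach, assuming a limit of 255 characters and hypothetical names (MAX_LINE, read_bounded_line, enum line_status). The buffer has two extra bytes, one for the newline and one for the terminator, so a full buffer without a newline can only mean the line was too long:

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE 255   /* assumed maximum line length, excluding the newline */

enum line_status { LINE_OK, LINE_EOF, LINE_TOO_LONG };

/* Reads one line into buf, which must hold MAX_LINE + 2 bytes.
   The trailing newline, if present, is removed. */
enum line_status read_bounded_line(FILE *fp, char *buf)
{
    if (fgets(buf, MAX_LINE + 2, fp) == NULL)
        return LINE_EOF;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return LINE_OK;
    }
    if (len == MAX_LINE + 1)       /* buffer full and still no newline */
        return LINE_TOO_LONG;
    return LINE_OK;                /* last line of the file, no newline */
}
```

On LINE_TOO_LONG the caller would typically report an error and stop, as the paragraph above suggests.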
You have not allocated any memory for temp.
Also, there are three things in C you should avoid:
void main(): use int main() instead
fflush(stdin)
gets(): use fgets() instead

How to read a file using offsets in c

How can I read the contents of a file if I have to use the following parameters:
I have to read the file in parts by using "start-value" of the part and length of the part
The start-value and length of the parts will be read from another file
Overall, I am trying to compute the MD5 value of these parts (you can also call them chunks).
The start-value and length of the chunks have been computed and stored in a file.
I tried to use fread() as follows, but it does not give me logical results
char *chunk_buffer;
//chunk_buffer is a pointer to a memory block
while(cur_poly != NULL) {
//cur_poly is a structure which is used to store the start and length of chunks
chunk_buffer = (char*) malloc ((cur_poly->length)*8);
//here I am trying to allocate memory based on the size of each chunk
int x=fread (chunk_buffer,1, cur_poly->length, c_file);
//c_file is the file to be read according to the offsets
char hash[32];
hash=md5(chunk_buffer);
//md5() is a function which can generate the md5 hash values for the chunks
}
I see two potential issues.
What units does cur_poly->length represent? You are mallocing memory as if it is a count of 64-bit words, yet reading the file as if it is bytes. If the field represents length in bytes, then you are reading correctly, but allocating too much memory. However, if the field is length in 64-bit words, then you are allocating the right amount of memory, but only reading 1/8th the data.
The code seems to be ignoring offsets (or assuming that all chunks are contiguous). If you want to read from an arbitrary offset, do an fseek(fp, offset, SEEK_SET); before the fread.
If the chunks are supposed to be contiguous, there still may be padding at the ends to force them all to start on an even boundary. You would have to seek over the padding whenever the byte count was odd (.WAV does this, as an example)
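As a small sketch of the seek-then-read step, assuming offsets and lengths are in bytes and using a hypothetical helper name read_at:

```c
#include <stdio.h>

/* Reads length bytes starting at byte offset from the start of the file.
   Returns the number of bytes actually read, which is smaller than length
   if the file ends early, and 0 if the seek fails. */
size_t read_at(FILE *fp, long offset, unsigned char *buf, size_t length)
{
    if (fseek(fp, offset, SEEK_SET) != 0)
        return 0;
    return fread(buf, 1, length, fp);
}
```

The caller should compare the return value with the requested length before hashing the buffer.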
I want to note some more issues with that code. You might need to add some more details on these points.
If you want to read consecutive chunks from your file, you usually don't need to modify the file position. Just read a chunk, and then read the next one. If you need to read the chunks in random order, you need to use fseek. This way you set the start position of the next file operation to an offset (from the beginning or the end of the file, or relative to the current position).
You have a char pointer chunk_buffer, that you obviously use to store the data from your file temporarily. That is, it's only valid for the current loop iteration.
If this is the case I would suggest to do the malloc once before you enter the loop:
char * chunk_buffer = malloc (MAXIMUM_CHUNK_SIZE);
in the loop you may clear this buffer using memset or just overwrite the data. Also note that malloc()ed memory is not initialized with '\0' values (I don't know if this is one assumption you rely on ...).
I am not sure, why you actually allocate a buffer of size length*8 and just read length bytes to it. Probably
int x = fread (chunk_buffer, SIZE_OF_ITEM, THIS_CHUNK_SIZE, c_file);
would fit your needs closer, if your items are indeed larger than a byte.
It is unclear, what the md5() function actually does. What value does it return? A pointer to a buffer that is allocated dynamically? A pointer to a local array? Anyway, you assign the return value to a pointer to a local array of chars. You might not need to allocate 32 bytes for this, but just
char * hash = md5 (chunk_buffer);
Make sure that you keep the pointer to that array somewhere you find it when the loop takes the next iteration. An array that is created statically in local scope of that function can of course not be passed this way.
Your md5() function. How does it know, what the size of a chunk is? It is passed a pointer, but not the size of the valid data (as far as I see it). You might need to adapt this function to take the length of the input array as additional parameter.
What does the md5() function produce, a C-style string (alphanumeric digits, null-terminated) or an array of byte sized unsigned integers (uint8_t) ?
Make sure that you free() the memory you allocate dynamically. If you want to keep the malloc() inside the loop, make sure the loop always ends with
free (chunk_buffer);
For us to help you any further, you need to define
a) what are logical results for you and
b) what results do you get
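Several of the points above (allocate by byte length, pass the length to the hash function, free on every iteration) can be put together in one sketch. Everything named here is hypothetical: struct chunk stands in for cur_poly, digest_fn stands in for an md5 routine that takes a length, and sum_bytes is only a placeholder digest so the example is self-contained.

```c
#include <stdio.h>
#include <stdlib.h>

struct chunk {                 /* hypothetical stand-in for cur_poly */
    long start;                /* byte offset of the chunk */
    size_t length;             /* chunk length in bytes */
    struct chunk *next;
};

/* A digest callback receives the data AND its length. */
typedef void (*digest_fn)(const unsigned char *data, size_t len, void *ctx);

/* Placeholder digest (not MD5): adds up all bytes into *ctx. */
void sum_bytes(const unsigned char *data, size_t len, void *ctx)
{
    unsigned long *total = ctx;
    for (size_t i = 0; i < len; i++)
        *total += data[i];
}

/* Reads every chunk in the list and feeds it to digest.
   Returns 0 on success, -1 on allocation, seek or short-read failure. */
int for_each_chunk(FILE *fp, const struct chunk *c, digest_fn digest, void *ctx)
{
    for (; c != NULL; c = c->next) {
        unsigned char *buf = malloc(c->length ? c->length : 1);
        if (buf == NULL)
            return -1;
        if (fseek(fp, c->start, SEEK_SET) != 0 ||
            fread(buf, 1, c->length, fp) != c->length) {
            free(buf);         /* free on the error path too */
            return -1;
        }
        digest(buf, c->length, ctx);
        free(buf);             /* one free per malloc, every iteration */
    }
    return 0;
}
```

A real MD5 implementation would be plugged in through the callback, with ctx pointing at wherever the resulting hash should be stored.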

Passing String as argument- getting segfault in function

SOLVED See bottom of question for solution.
I'm having trouble with passing on a String argument to my function, and am getting a segmentation fault when the function is called. The program takes in a command line input and passes on the file provided to the function after validation.
My function code goes like this:
char *inputFile; //
inputFile= argv[2];
strcpy(inputFile, argv[2]);
compress(inputFile){
//file open and creation work bug-free
//compression action to be written
void compress(char inputFile){
//compression code here
}
When the function is called, a segfault is thrown, and the value of inputFile is 0x00000000, when prior to the function call, it had a memory location and value of the test file path.
Some of the variations I've tried, with matching function prototypes:
compress(char *inputFile)
compress (char inputFile[])
I also changed the variable.
Why is a variable with a valid memory address and value in the debugger suddenly erased when used as a parameter?
Edit 1:
Incorporating suggestions here, I removed the inputFile= argv[2] line, and the debugger shows the strcpy function working.
However, I've tried both compress(char *inputFile) per Edwin Buck and compress(argv[2]) per unwind, and both changes still result in Cannot access memory at address 0xBEB9C74C
The strange thing is my file validation function checkFile(char inputFile[]) works with the inputFile value, but when I pass that same parameter to the compress(char inputFile[]) function, I get the segfault.
Edit 2- SOLVED
You know something is going on when you stump your professor for 45 min. It turns out I had declared the file read buffer as a 5MB long array inside the compress() method, which in turn maxed out the stack frame. Changing the buffer declaration to a global variable did the trick, and the code executes.
Thanks for the help!
You shouldn't be writing into memory used to hold argv[2].
You don't seem to quite understand how strings are represented; you're copying both the pointer (with the assignment) and the actual characters (with the strcpy()).
You should just do compress(argv[2]); once you've verified that that argument is valid.
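A minimal sketch of that shape, with a hypothetical function count_file_bytes standing in for compress. It takes the path as a const char * (so argv[2] can be passed directly) and puts the 5 MB read buffer mentioned in the question's solution on the heap instead of in a local array, which avoids exhausting the stack:

```c
#include <stdio.h>
#include <stdlib.h>

#define READ_BUF_SIZE (5u * 1024u * 1024u)   /* 5 MB, as in the question */

/* Hypothetical stand-in for compress(): reads the whole file through a
   heap-allocated buffer and returns the number of bytes read, or -1. */
long count_file_bytes(const char *input_path)
{
    FILE *in = fopen(input_path, "rb");
    if (in == NULL)
        return -1;

    unsigned char *buf = malloc(READ_BUF_SIZE);   /* heap, not stack */
    if (buf == NULL) {
        fclose(in);
        return -1;
    }

    long total = 0;
    size_t n;
    while ((n = fread(buf, 1, READ_BUF_SIZE, in)) > 0)
        total += (long)n;          /* real code would process buf here */

    int failed = ferror(in);
    free(buf);
    fclose(in);
    return failed ? -1 : total;
}
```

From main, the call would simply be count_file_bytes(argv[2]) after checking that argc is large enough. A global or static buffer, as the asker ended up using, also keeps the array off the stack.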
First, to copy something from argv[2] to somewhere else, you need some memory for "that somewhere else". You could allocate the memory based on the size of argv[2] but for our simple example, a very large fixed size buffer will do.
char inputfile[2048];
It looks like you tried to do this by the assignment operator, which doesn't really do what you intended.
// this is not the way to what you seek, as it doesn't create any new memory for inputfile
char* inputfile = argv[2];
In passing the inputfile variable to a procedure, you want to pass much more than a single character, so void compress(char inputfile) is not an option. That leaves
compress(char *inputFile) // I prefer this one
compress (char inputFile[])
which both work and are equivalent: in a parameter list, char inputFile[] is adjusted to char *inputFile, so the function receives a pointer either way. In my experience the first form is preferred because it says so explicitly; an array passed as an argument is converted to a pointer to its first element, and there is no conversion in the other direction.
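If a private copy of argv[2] really is needed, the memory can be sized from the string itself instead of using a large fixed buffer. A minimal sketch, with the hypothetical name copy_string:

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of src (caller frees), or NULL if malloc fails. */
char *copy_string(const char *src)
{
    size_t len = strlen(src);
    char *dst = malloc(len + 1);       /* +1 for the terminating '\0' */
    if (dst != NULL)
        memcpy(dst, src, len + 1);     /* copies the terminator as well */
    return dst;
}
```

The copy can be modified freely without touching the memory that holds argv[2].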
You've not allocated any memory for the char * to use. All you've done with char *inputfile is allocated a pointer.

Resources