I implemented the following two functions for RLE compression of binary files.
char* RLEcompress(char* data, size_t origSize, size_t* compressedSize) {
char* ret = calloc(2 * origSize, 1);
size_t retIdx = 0, inIdx = 0;
size_t retSize = 0;
while (inIdx < origSize) {
size_t count = 1;
size_t contIdx = inIdx;
while (contIdx < origSize - 1 && data[inIdx] == data[++contIdx]) {
count++;
}
size_t tmpCount = count;
// break down counts with 2 or more digits into counts ≤ 9
while (tmpCount > 9) {
tmpCount -= 9;
ret[retIdx++] = data[inIdx];
ret[retIdx++] = data[inIdx];
ret[retIdx++] = '9';
retSize += 3;
}
ret[retIdx++] = data[inIdx];
retSize += 1;
if (tmpCount > 1) {
// repeat character (this tells the decompressor that the next digit
// is in fact the # of consecutive occurrences of this char)
ret[retIdx++] = data[inIdx];
// convert single-digit count to its ASCII digit character
ret[retIdx++] = '0' + tmpCount;
retSize += 2;
}
inIdx += count;
}
*compressedSize = retSize;
return ret;
}
char* RLEdecompress(char* data, size_t compressedSize, size_t uncompressedSize, size_t extraAllocation) {
char* ret = calloc(uncompressedSize + extraAllocation, 1);
size_t retIdx = 0, inIdx = 0;
while (inIdx < compressedSize) {
ret[retIdx++] = data[inIdx];
if (data[inIdx] == data[inIdx + 1]) { // next digit is the # of occurrences
size_t occ = ((data[inIdx + 2]) - '0');
for (size_t i = 1; i < occ && retIdx < compressedSize; i++) {
ret[retIdx++] = data[inIdx];
}
inIdx += 2;
}
inIdx += 1;
}
return ret;
}
They seem to work fine, i.e. diff doesn't produce any output when comparing the original files to the compressed-then-uncompressed versions.
However, every once in a while the files differ, indicating there is a bug somewhere. I haven't been able to find a pattern in the files that exhibit this, but I'll give you an example of what the difference looks like.
The lower one is the original.
As you can see, the byte 21 is repeated twice in the compressed-then-uncompressed version. I haven't been able to identify the issue. Unfortunately, the bug happens with very few files: so far I've only observed it with two PDF files, including the one shown above. I can't share them because they are copyrighted content, but I'm working on coming up with another file that fails so I can provide you with an example.
I have a feeling there is something "obvious" wrong with the code above and I'm just missing it. Help is greatly appreciated.
EDIT:
Here's a test program I'm using to read the offending file, compress it, then decompress it. I'm also saving the compressed data to disk as an intermediate step to have more debug data.
int main(int argc, char** argv) {
size_t compsz;
FILE* fp = fopen(argv[1], "r");
if (!fp) {
perror("fp");
return 1;
}
if (fseek(fp, 0L, SEEK_END) == -1) {
return -1;
}
// get file size
size_t filecontentLen = ftell(fp);
if (filecontentLen < 0) {
return -1;
}
rewind(fp);
char* filecontentBuf = calloc(filecontentLen, 1);
if (!filecontentBuf) {
fclose(fp);
errno = ENOMEM;
return -1;
}
// read original
if (fread(filecontentBuf, sizeof(char), filecontentLen, fp) <= 0) {
int errnosave = errno;
if (ferror(fp)) {
fclose(fp);
free(filecontentBuf);
errno = errnosave;
return -1;
}
}
// write compressed
char* compressed = RLEcompress(filecontentBuf, filecontentLen, &compsz);
FILE* fpcompWrite = fopen("compressed", "w+");
if (fwrite(compressed, compsz, 1, fpcompWrite) == -1) {
perror("fwrite");
}
fclose(fpcompWrite);
// read compressed
FILE* fpcompRead = fopen("compressed", "r");
if (!fpcompRead) {
perror("fpcompRead");
return 1;
}
char* compBuf = calloc(compsz * 2, 1);
fread(compBuf, compsz, 1, fpcompRead);
fclose(fpcompRead);
// decompress and write file
char* uncompBuf = RLEdecompress(compBuf, compsz, filecontentLen, 0);
FILE* funcomp = fopen("uncompressed", "w+");
fwrite(uncompBuf, filecontentLen, 1, funcomp);
fclose(funcomp);
}
I think the problem is that
for (size_t i = 1; i < occ && retIdx < compressedSize; i++) {
ret[retIdx++] = data[inIdx];
}
should be changed to
for (size_t i = 1; i < occ && retIdx < uncompressedSize; i++) {
ret[retIdx++] = data[inIdx];
}
in the decompression algorithm, since retIdx indexes the output buffer and is therefore bounded by uncompressedSize, not compressedSize. With the original condition, maybe in some rare cases it copies fewer bytes than it should.
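The suggested change can be sketched as a standalone decoder. This is only an illustration under assumptions: the name rle_decompress_checked is hypothetical, the input is assumed to follow the question's format (a doubled byte followed by one ASCII digit count), and the extra lookahead checks are additions that the original code does not have.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical checked decoder for the question's format:
 * a byte repeated twice is followed by an ASCII digit holding the run length. */
char *rle_decompress_checked(const char *data, size_t compressedSize,
                             size_t uncompressedSize)
{
    /* allocate at least one byte so a zero-length output is still a valid pointer */
    char *ret = calloc(uncompressedSize ? uncompressedSize : 1, 1);
    if (ret == NULL)
        return NULL;

    size_t retIdx = 0, inIdx = 0;
    while (inIdx < compressedSize && retIdx < uncompressedSize) {
        char c = data[inIdx];
        ret[retIdx++] = c;
        /* only look ahead when both the repeated byte and the digit are in range */
        if (inIdx + 2 < compressedSize && data[inIdx + 1] == c) {
            size_t occ = (size_t)(data[inIdx + 2] - '0');
            /* the output index is bounded by the output size, not the input size */
            for (size_t i = 1; i < occ && retIdx < uncompressedSize; i++)
                ret[retIdx++] = c;
            inIdx += 2;
        }
        inIdx += 1;
    }
    return ret;
}
```

With the three-byte input "aa9" and an uncompressedSize of 9, the loop bound on uncompressedSize lets all nine bytes be written, whereas a bound on compressedSize would stop after three.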
I did some research, but nothing I found really matched my problem.
I'm trying to code LZW compression for school, and I need a function to check whether an element is in my dictionary.
However, when I call this function, it tries to access the 64th element in my dictionary, but that element has disappeared! I checked it just before the function call and it was there. Worse, I can access this element in earlier calls of the function.
Could you help me, please?
The function:
int is_in_dictionnary(dico * p_pRoot, char * p_string){
int i = 0, j = 0;
char a[1024] = { 0 }, b[1024] = { 0 };
//strcpy(b, p_pRoot->m_dico[64].m_info);
for (i = 0; i < p_pRoot->m_index; i++){
printf("dico %s\n", p_pRoot->m_dico[i].m_info);
strcpy(a, p_string);
strcpy(b, p_pRoot->m_dico[i].m_info);
j = strcmp(a, b);
if (j == 0)
return i;
}
return -1;
}
The console output: here we can see that the function previously accessed the 64th element, "#", without any problem.
The error in Visual Studio:
Some people asked me to add the part of the code that isn't working:
void lzw_compress(dico *p_pRoot, char * path)
{
FILE *pFile = NULL, *pCompFile = NULL;
int len_c = 0, size_tamp = 0, i = 0, masked_tamp = 0, tamp_to_write = 0, index_tamp = 0, a;
unsigned char char_tamp = 0, cAndTamp[1024] = { 0 }, tampon[1024] = { 0 }, c = '\0', temp[2] = { 0 };
char test[128] = { 0 };
pFile = fopen(path, "r+");
if (!pFile)
{
printf("problem while opening file to compress");
return;
}
size_t len = strlen(path); //creation of the output file name : path + ".lzw"
unsigned char *compress_name = malloc(len + 4 + 1);
strcpy(compress_name, path);
compress_name[len] = '.';
compress_name[len + 1] = 'l';
compress_name[len + 2] = 'z';
compress_name[len + 3] = 'h';
compress_name[len + 4] = '\0';
pCompFile = fopen(compress_name, "w"); //creation of the output file
free(compress_name);
while (1)
{
if (feof(pFile))
break;
c = freadByte(pFile);
for (i = 0; i < 1024; i++)
cAndTamp[i] = 0;
temp[0] = c;
strcat(cAndTamp, tampon);
strcat(cAndTamp, temp);
strcpy(test, p_pRoot->m_dico[64].m_info);
a = 0;
if (is_in_dictionnary(p_pRoot, cAndTamp) > -1)
{
strcpy(tampon, cAndTamp);
a = 0;
}
else
{
if (is_in_dictionnary(p_pRoot, tampon) < 256) //write the character in the file
{
char_tamp = tampon[0];
fwrite(&char_tamp, sizeof(char), 1, pCompFile);
a = 0;
}
else
{
a = 0;
index_tamp = is_in_dictionnary(p_pRoot, tampon);
a = 0;
for (i = 0; i < p_pRoot->m_size; i++)
{
mask = 1 << i;
masked_tamp = index_tamp & mask;
tamp_to_write = masked_tamp >> i;
fwriteBit(tamp_to_write, pCompFile);
flush(pCompFile);
}
}
strcpy(test, p_pRoot->m_dico[64].m_info); //HERE IT'S OK
add_dictionnary(p_pRoot, cAndTamp, size_tamp + 1); //add the string tamp + read byte to the dictionary
strcpy(test, p_pRoot->m_dico[64].m_info); //HERE IT IS NOT OK
strcpy(tampon, temp);
}
strcpy(test, p_pRoot->m_dico[64].m_info);
size_tamp = is_in_dictionnary(p_pRoot, tampon);
}
if (tampon < 256) //write the character in the file
{
char_tamp = (char)tampon;
fwrite(&char_tamp, sizeof(char), 1, pCompFile);
}
else
{
index_tamp = is_in_dictionnary(p_pRoot, tampon);
for (i = 0; i < p_pRoot->m_size; i++)
{
mask = 1 << i;
masked_tamp = index_tamp & mask;
tamp_to_write = masked_tamp >> i;
fwriteBit(tamp_to_write, pCompFile);
flush(pCompFile);
}
}
fclose(pFile);
fclose(pCompFile);
}
The function where I think the problem is:
void add_dictionnary(dico * p_pRoot, char * p_string, int p_stringSize)
{
p_pRoot->m_index++;
if (p_pRoot->m_index == pow(2, p_pRoot->m_size))
realloc_dictionnary(p_pRoot);
p_pRoot->m_dico[p_pRoot->m_index].m_info = (char*)calloc(p_stringSize, sizeof(char));
strcpy(p_pRoot->m_dico[p_pRoot->m_index].m_info, p_string);
}
Thanks again, everyone!
I showed the program to my teacher again and he found the problem!
I never use malloc and rarely use realloc, and that is where the problem was:
void realloc_dictionnary(dico * p_pRoot)
{
int real = p_pRoot->m_size + 1;
int size = pow(2, real);
printf("index %d, previous pow %d, new power %d, size %d\n", p_pRoot->m_index, p_pRoot->m_size, real, size);
p_pRoot->m_dico = (code*) realloc(p_pRoot->m_dico, size);
p_pRoot->m_size = real;
}
Here size is a number of elements, but realloc expects a size in bytes.
So the correction is: size * sizeof(code)!
void realloc_dictionnary(dico * p_pRoot)
{
int real = p_pRoot->m_size + 1;
int size = pow(2, real);
printf("index %d, previous pow %d, new power %d, size %d\n", p_pRoot->m_index, p_pRoot->m_size, real, size);
p_pRoot->m_dico = (code*) realloc(p_pRoot->m_dico, size * sizeof(code));
p_pRoot->m_size = real;
}
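The same fix can be sketched a little more defensively. This is an illustration only: the struct definitions below are stand-ins (the question does not show the real code and dico types), and grow_dictionary is a hypothetical name. It multiplies the element count by the element size and keeps the old pointer if realloc fails.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in types: assumptions, since the question does not show the real ones. */
typedef struct { char *m_info; } code;
typedef struct { code *m_dico; int m_index; int m_size; } dico;

/* Doubles the capacity (2^m_size entries becomes 2^(m_size + 1)).
 * Returns 0 on success, -1 on failure; on failure the dictionary is unchanged. */
int grow_dictionary(dico *d)
{
    int new_bits = d->m_size + 1;
    if (new_bits < 0 || (size_t)new_bits >= sizeof(size_t) * 8)
        return -1; /* the shift below would overflow */

    size_t new_count = (size_t)1 << new_bits;
    /* realloc takes a byte count: number of elements times element size */
    code *tmp = realloc(d->m_dico, new_count * sizeof *tmp);
    if (tmp == NULL)
        return -1; /* d->m_dico is still valid and still owned by the caller */

    d->m_dico = tmp;
    d->m_size = new_bits;
    return 0;
}
```

Assigning the result to a temporary first avoids losing the only pointer to the old block when realloc returns NULL.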
First of all, I would like to apologize for such a small error, and also to say a big thank you for your great patience!
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define BS 12
void reverse(char * buffer, int size)
{
char tmp;
int i;
for(i = 0; i < size / 2; i++)
{
tmp = (char)buffer[i];
buffer[i] = buffer[size - i - 1];
buffer[size - i - 1] = tmp;
}
}
int compare_bin(char * buffer, char * buffer2, int size)
{
// because strncmp is only for string without \x00, so there must be a customized compare function
int i;
for(i = 0; i < size; i++)
{
if(buffer[i] != buffer2[i])
return 0;
}
return 1;
}
int main (const int argc, const char** argv)
{
if(argc != 3)
exit(-1);
int equal = 1;
char * buffer = malloc(BS), * buffer2 = malloc(BS);
FILE * f1, * f2;
f1 = fopen(argv[1], "r");
f2 = fopen(argv[2], "r");
fseek(f1, 0, SEEK_END);
fseek(f2, 0, SEEK_END);
long i = ftell(f1), j = ftell(f2);
if(i != j)
{
equal = 0;
goto endp;
}
fseek(f2, 0, SEEK_SET);
int need = 0;
int count;
int f2_pos = 0;
do
{
i = i - BS;
if(i < 0)
{
need = BS - abs((int)i);
i = 0;
}
else
need = BS;
fseek(f1, i, SEEK_SET);
count = fread(buffer, need, 1, f1);
reverse(buffer, count * need);
// fwrite(buffer, count * need, 1, f2);
fread(buffer2, need * need, 1, f2);
// printf("compare...\n");
// for(int i = 0; i < need * count; i++)
// {
// printf("%02hhX", buffer[i]);
// }
// printf("\n");
// for (int i = 0; i < need * count; i++)
// {
// printf("%02hhX", buffer2[i]);
// }
// printf("\n");
if(compare_bin(buffer, buffer2, need * count) == 0)
{
equal = 0;
break;
}
f2_pos += need * count;
fseek(f2, f2_pos, SEEK_SET);
if(i == 0)
break;
}while(i > 0);
fclose(f1);
fclose(f2);
free(buffer);
free(buffer2);
endp:
if(equal)
return 0;
else
{
printf("2 files not equal is reversed order\n");
return 1;
}
return 0;
}
So I wrote a program to compare file contents in reverse order. I have already accounted for \x00 bytes in binary files, so strncmp isn't used. But there is still a flaw. There is a test server that tests this program, but I don't have access to it. The program always fails on that server, so there must be some special cases that make it fail. Any ideas?
There are other ways around it, for instance calculating an MD5 hash. But I want to fix this.
For the very first iteration where you read data you have
fread(buffer2, need * need, 1, f2);
The problem is that in that case need is 12, which is the size of the memory allocated for buffer2, but you ask fread to read 12 * 12 bytes.
If the second file is large enough, fread will write out of bounds of that memory, leading to undefined behavior. If the file is not large enough, fread returns 0 because no complete element could be read, and the contents of the buffer are indeterminate.
Also note that the order of the two middle arguments to fread matters: the second is the size of one element and the third is the number of elements. If you simply swapped them, fread would write out of bounds of the buffer whether or not the file holds need * need bytes. You should really read byte-sized objects (the second argument should be 1 and the third should be the number of bytes you want), which of course means you need to change the order in the first call as well. The return value is then the number of bytes actually read.
In short, your two fread calls should be
count = fread(buffer, 1, need, f1);
fread(buffer2, 1, count, f2);
With this change count is already a byte count, so the later uses of need * count become simply count.
PS. Don't forget error checking.
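Putting those points together, the comparison loop could look roughly like the sketch below. It is not the original program: is_reversed_copy and CHUNK are hypothetical names, the streams are assumed to be seekable and opened in binary mode, and the reversed comparison is done by indexing instead of calling a separate reverse function.

```c
#include <stdio.h>

#define CHUNK 12 /* same block size as BS in the question */

/* Returns 1 if f2 holds the bytes of f1 in reverse order,
 * 0 if it does not, and -1 on an I/O error. */
int is_reversed_copy(FILE *f1, FILE *f2)
{
    unsigned char a[CHUNK], b[CHUNK];

    if (fseek(f1, 0, SEEK_END) != 0 || fseek(f2, 0, SEEK_END) != 0)
        return -1;
    long remaining = ftell(f1);
    long len2 = ftell(f2);
    if (remaining < 0 || len2 < 0)
        return -1;
    if (remaining != len2)
        return 0;
    if (fseek(f2, 0, SEEK_SET) != 0)
        return -1;

    while (remaining > 0) {
        /* the last block (nearest the start of f1) may be shorter than CHUNK */
        size_t need = remaining < CHUNK ? (size_t)remaining : CHUNK;
        remaining -= (long)need;

        if (fseek(f1, remaining, SEEK_SET) != 0)
            return -1;
        /* element size 1, so the return value is a byte count */
        if (fread(a, 1, need, f1) != need)
            return -1;
        if (fread(b, 1, need, f2) != need)
            return -1;

        /* a[i] must match b read backwards within this block */
        for (size_t i = 0; i < need; i++)
            if (a[i] != b[need - 1 - i])
                return 0;
    }
    return 1;
}
```

A caller would open both files with fopen(path, "rb"), check both results for NULL, and close them after the call.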
My code has a memory leak problem and I don't know where I went wrong. The code is below. I am trying to read from a CSV file and store particular columns.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main ()
{
FILE *result = fopen ("C:\\Users\\pa1rs\\Desktop\\local.csv", "w");
const char *text = "LOA,NAME,";
fprintf (result, "%s", text);
char *token;
char *endToken;
int lines = 0;
char ch; /* should check the result */
FILE *file = fopen ("C:\\Users\\pa1rs\\Desktop\\samplee.csv", "r");
char line[300];
if (file == NULL) {
perror ("Error opening the file");
} else {
while (!feof (file)) {
ch = fgetc (file);
if (ch == '\n') {
lines = lines + 1;
}
}
//printf(" no of lines existing in the file %d\n\n", lines);
}
fseek (file, 0, SEEK_SET);
while ((ch = fgetc (file)) != '\n') {
// we don't need the first line on sample.csv
// as it is just the description part
}
int s[lines - 1];
int j = 0;
char *N[lines - 1];
while (fgets (line, sizeof (line), file)) {
int i = 0;
token = line;
do {
endToken = strchr (token, ',');
if (endToken)
*endToken = '\0';
if (i == 3) {
s[j] = atoi (token);
}
if (i == 12) {
N[j] = (char *) malloc (strlen (token) * sizeof (char));
strcpy (N[j], token);
}
if (endToken)
token = endToken + 1;
i++;
} while (endToken);
j = j + 1;
}
//******************************************************unique loa
int count = 0;
int g = 0;
int h = 0;
int LOA[lines - 1];
int dd = 0;
for (dd = 0; dd < lines - 1; dd++) {
LOA[dd] = 0;
}
for (g = 0; g < lines - 1; g++) {
for (h = 0; h < count; h++) {
if (s[g] == LOA[h])
break;
}
if (h == count) {
LOA[count] = s[g];
count++;
}
}
int xw = 0;
for (xw = 0; xw < count; xw++) {
//printf("%d \t",LOA[xw]);
}
//printf("LOA Array Length is: %d \n",count);
//********************************************************
////FOR UNIQUE NAMES ARRAY
//printf("No of unique names are %d",county);
//FOR UNIQUE CAUSES ARRAY
char *sa[9] =
{ "Monticello", "Valparaiso", "Crown Point", "Plymouth", "Goshen",
"Gary", "Hammond", "Laporte", "Angola" };
int countz = 0;
int gz = 0;
int hz = 0;
char *LOAz[lines - 1];
int zero2 = 0;
for (zero2 = 0; zero2 < lines - 1; zero2++) {
LOAz[zero2] = NULL;
}
for (gz = 0; gz < lines - 1; gz++) {
for (hz = 0; hz < countz; hz++) {
if (strcmp (N[gz], LOAz[hz]) == 0)
break;
}
if (hz == countz) {
LOAz[countz] = (char *) malloc (strlen (N[gz]) * sizeof (char));
strcpy (LOAz[countz], N[gz]);
countz++;
}
}
int nz = 0;
for (nz = 0; nz < countz; nz++) {
fprintf (result, "%s,", LOAz[nz]);
}
fprintf (result, "\n");
// printf("%d",countz);
//*****************************
int i = 0;
int jjj = 0;
int xxx = 0;
int ggg = 0;
int k = 0;
int kount[count][countz];
for (xxx = 0; xxx < count; xxx++) {
for (ggg = 0; ggg < countz; ggg++) {
kount[xxx][ggg] = 0;
}
}
for (i = 0; i < count; i++) {
for (k = 0; k < countz; k++) {
for (jjj = 0; jjj < lines - 1; jjj++) {
if (LOA[i] == s[jjj]) {
if (strcmp (LOAz[k], N[jjj]) == 0) {
kount[i][k]++;
}
}
}
}
}
int ig = 0;
int ik = 0;
for (ig = 0; ig < count; ig++) {
fprintf (result, "%d,%s", LOA[ig], sa[ig]);
for (ik = 0; ik < countz; ik++) {
fprintf (result, ",%d", kount[ig][ik]);
}
fprintf (result, "\n");
}
int rrr = 0;
free (N);
for (rrr = 0; rrr < lines - 1; rrr++) {
free (LOAz[rrr]);
}
//*****************************
//fclose(result);
fclose (file);
return 0;
}
The number of lines I get here is 13761 and LOAz is declared with array size lines-1=13761, but there are only 49 unique values, so memory is allocated only for those and the remaining entries are unused. I think the problem starts there.
Please help! Thanks in advance.
One problem in your code is that you don't allocate enough memory for strings. For example, in these lines:
N[j] = (char*) malloc(strlen(token) * sizeof(char));
strcpy(N[j], token);
// ...
LOAz[countz] = (char*) malloc(strlen(N[gz]) * sizeof(char));
strcpy(LOAz[countz], N[gz]);
The problem is that strlen returns the number of characters in the string, not counting the terminating null character. To store the string you need one more byte for that terminator, so a buffer that holds a string s should be at least strlen(s) + 1 bytes.
Also, a better coding style is to avoid casting the return value of malloc.
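A small helper makes the strlen(s) + 1 rule hard to forget. This is a minimal sketch; copy_string is a hypothetical name and is not part of the question's code.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of s, or NULL if allocation fails.
 * The caller is responsible for calling free() on the result. */
char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1; /* + 1 for the terminating null character */
    char *p = malloc(n);      /* no cast needed in C */
    if (p != NULL)
        memcpy(p, s, n);      /* copies the terminator as well */
    return p;
}
```

In the question's loops this would replace each malloc/strcpy pair, for example N[j] = copy_string(token), followed by a NULL check.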
I wrote some code in C that reads a text file of numbers into memory and then creates an int array to store them.
The file has the following format:
9
9 5 6 2235 45558 6 5544 56565 2
The first number is the size of the array and the second line holds as many numbers as the first line says.
My problem is that the array can't hold more than ~30,000 numbers. How can I change the following code so that the array can hold up to 1,000,000 numbers? I know that I should use some kind of long integer, but I couldn't get it to work.
Here's the code:
#include <stdio.h>
#include <stdlib.h>
int is_end(char* input) {
return *input == 0;
}
int is_separator(char* input) {
return *input == '\n' || *input == ' ';
}
char* eat_separators(char* input) {
while (is_separator(input))
++input;
return input;
}
size_t count_lines(char* input) {
size_t rows = 1;
while (!is_end(input)) {
if (is_separator(input)) {
++rows;
input = eat_separators(input);
}
else {
++input;
}
}
return rows;
}
char** get_lines(char* input, size_t number_of_rows) {
char* from = input;
size_t length = 0;
size_t line = 0;
size_t i;
char** lines = (char**)malloc(number_of_rows * sizeof(char*));
do {
if (is_end(input) || is_separator(input)) {
lines[line] = (char*)malloc(length + 1);
for (i = 0; i < length; ++i)
lines[line][i] = *(from + i);
lines[line][length] = 0;
length = 0;
++line;
input = eat_separators(input);
from = input;
}
else {
++length;
++input;
}
} while (!is_end(input));
/*
lines[line] = (char*)malloc(length + 1);
for (i = 0; i < length; ++i)
lines[line][i] = *(from + i);
lines[line][length] = 0;
++line; */
return lines;
}
int main(int argc, char* argv[]) {
char** lines;
size_t size;
size_t number_of_rows;
int count;
int* children;
FILE *input, *output;
char *contents;
int fileSize = 0;
int i;
input = fopen("xxx.in", "r");
long int filepos = 0L;
fseek(input, 0L, SEEK_END);
fileSize = ftell(input);
fseek(input, 0L, SEEK_SET);
contents = (char*)malloc(fileSize + 1);
size = fread(contents, 1, fileSize, input);
contents[size] = 0;
fclose(input);
number_of_rows = count_lines(contents);
lines = get_lines(contents, number_of_rows);
if ((count = atoi(lines[0])) <= 0 || count > 1000000){
return 1;
}
children = (int*)malloc(count * sizeof(int));
for (i = 0; i < count; ++i) {
if ((children[i] = atoi(lines[i + 1])) <= 0 )
return(-1);
}
// a check to see if everything stored in the array
for(i = 0;i<count;i++)
{
printf(" %d : %d\n", i, children[i]);
}
free(children);
free(lines);
// This is the end! Oh my dear friend, the end!
return 0;
}
First, let me explain why you can only store about 30,000 numbers, since that answers your question.
Basically, you are converting characters to their ASCII values. Take the character x, whose ASCII value is 120. You are replacing the character x with 120; x takes 1 byte of storage, but 120 written out takes 3 bytes. So you have to allocate 3 times as much memory as the computed size, since each byte expands into 3 bytes.
Increase the memory allocation in your code by a factor of 3 and your problem should be solved.
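Separately from the explanation above, the file format in the question (a count on the first line, then that many numbers) can also be read without the intermediate array of strings. The sketch below is an alternative approach, not a diagnosis of the original limit; read_numbers is a hypothetical name, and long is used because it is guaranteed to be at least 32 bits wide.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads "<count> <n1> <n2> ..." from fp.
 * Returns a malloc'd array and stores its length in *out_count,
 * or returns NULL on a format or allocation error. */
long *read_numbers(FILE *fp, size_t *out_count)
{
    long count;
    /* same upper limit as the check in the question */
    if (fscanf(fp, "%ld", &count) != 1 || count <= 0 || count > 1000000)
        return NULL;

    long *values = malloc((size_t)count * sizeof *values);
    if (values == NULL)
        return NULL;

    for (long i = 0; i < count; i++) {
        /* %ld skips spaces and newlines before each number */
        if (fscanf(fp, "%ld", &values[i]) != 1) {
            free(values);
            return NULL;
        }
    }
    *out_count = (size_t)count;
    return values;
}
```

The caller opens the file, passes the FILE pointer in, and frees the returned array when done.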
If I have an options file along the lines of this:
size = 4
data = 1100010100110010
And I have a 2d size * size array that I want to populate the values in data into, what's the best way of doing it?
To clarify, for the example I have I'd want an array like this:
int[4][4] array = {{1,1,0,0}, {0,1,0,1}, {0,0,1,1}, {0,0,1,0}}. (Not real code but you get the idea).
Size can really be any number though.
I'm thinking I'd have to read in the size, malloc an array, then read in a string full of data, loop through each char in the data, convert it to an int and stick it in the appropriate index. But I really have no idea how to go about it, and I have been searching for a while with no luck.
Any help would be cool! :)
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Note: this uses new/delete, so it has to be compiled as C++.
// array and size are references so that the caller receives the results.
int process_file(int **&array, int &size, char const *file_name)
{
    size = 0;
    FILE *file = fopen(file_name, "r");
    if(file == NULL)
        return -1;//can't open file
    char line[1024]; //1024 just for example
    if(fgets(line, 1024, file) != 0)
    {
        if(strncmp(line, "size = ", 7) != 0)
        {
            fclose(file);
            return -2; //incorrect format
        }
        size = atoi(line + 7);
        if(size <= 0)
        {
            fclose(file);
            return -2; //incorrect format
        }
        array = new int * [size];
        for(int i = 0; i < size; ++i)
            array[i] = new int [size](); // () zero-initializes the row
    }
    else
    {
        fclose(file);
        return -2;//incorrect format
    }
    if(fgets(line, 1024, file) != 0)
    {
        if(strncmp(line, "data = ", 7) != 0)
        {
            fclose(file);
            for(int i = 0; i < size; ++i)
                delete [] array[i];
            delete [] array;
            return -2; //incorrect format
        }
        // stop at the end of the line or once size * size cells are filled
        for(int i = 7; line[i] != '\n' && line[i] != '\0' && i - 7 < size * size; ++i)
            array[(i - 7) / size][(i - 7) % size] = line[i] - '0';
    }
    else
    {
        fclose(file);
        for(int i = 0; i < size; ++i)
            delete [] array[i];
        delete [] array;
        return -2; //incorrect format
    }
    fclose(file);
    return 0;
}
Don't forget to delete the array (each row with delete [], then the array itself) before the program ends.
Loops.
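FILE *fp = fopen("waaa.txt", "r");
if(fp == NULL) { /* bleh */ return; }
int ch; // fgetc() returns an int so that EOF can be told apart from real characters
// assumes the file holds only the digits and that array is 4 x 4
for(int j = 0; j < 4; ++j) {
    for(int i = 0; i < 4; ++i) {
        if((ch = fgetc(fp)) == EOF) { fclose(fp); return; }
        array[j][i] = ch - '0'; // convert the digit character to its numeric value
    }
}
fclose(fp);
fgetc() reads one character at a time and returns EOF once the end of the file is reached.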
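Since the question mentions malloc, here is a plain C sketch of the same idea. It rests on assumptions: read_grid is a hypothetical name, the file is assumed to look exactly like the two-line example in the question, the upper limit of 1024 on size is arbitrary, and the grid is stored as a single row-major block instead of an array of row pointers.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parses "size = N" followed by "data = <N*N binary digits>".
 * Returns a malloc'd N*N grid in row-major order (cell r,c is grid[r * N + c])
 * and stores N in *out_size, or returns NULL on any error. */
int *read_grid(FILE *fp, int *out_size)
{
    int size;
    char key[8];

    if (fscanf(fp, " size = %d", &size) != 1 || size <= 0 || size > 1024)
        return NULL;
    /* read the next key and the '=' that follows it */
    if (fscanf(fp, " %7[a-z] =", key) != 1 || strcmp(key, "data") != 0)
        return NULL;

    size_t cells = (size_t)size * (size_t)size;
    int *grid = malloc(cells * sizeof *grid);
    if (grid == NULL)
        return NULL;

    for (size_t k = 0; k < cells; k++) {
        int ch;
        do {
            ch = fgetc(fp); /* spaces before a digit are skipped */
        } while (ch == ' ');
        if (ch != '0' && ch != '1') { /* also catches EOF and a short data line */
            free(grid);
            return NULL;
        }
        grid[k] = ch - '0'; /* digit character to integer value */
    }
    *out_size = size;
    return grid;
}
```

A single allocation means one free(grid) releases everything, at the cost of writing grid[r * size + c] instead of grid[r][c].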