Reading a big binary file (~2 MB) in C

I want to read a bin-file with a size under 2mb.
At the moment my code for reading the bin file looks like this:
#define MAX_BYTES_IN_FILE 500000 // ~ 2mb
#define ERROR_FILE 1
int get_byte_from_file(FILE *stream, unsigned char *dataarray) {
int counter = 0;
while ((dataarray[counter] = fgetc(stream)) != EOF) {
counter += 1;
}
return counter;
}
Main looks like this for the example use of the function.
int main(int argc, char **argv) {
FILE *datei;
unsigned int number_of_bytes;
unsigned char *dataarray;
dataarray = (unsigned char *)malloc(sizeof(unsigned char) * MAX_BYTES_IN_FILE);
datei = fopen(argv[1], "rb");
number_of_bytes = get_byte_from_file(datei, dataarray);
for (int i = 0; i < number_of_bytes; i++)
printf("%x ", dataarray[i]);
return 0;
}
Maybe I made a simple mistake, but I can't see it. The error is still: Segmentation fault (core dumped)

This line is sufficient to crash your program:
while ((dataarray[counter] = fgetc(stream)) != EOF) {
Let's go through it step by step:
fgetc(stream) reads a byte and returns its value or EOF. Because a byte can have any possible value, fgetc() returns a larger int, which can hold an EOF value that is distinct from any byte value that might be found in the file.
You assign this int value to an unsigned char. An EOF value will be truncated to this datatype.
The value of the assignment is of type unsigned char, and the converted EOF value is not equal to EOF anymore. Thus, the comparison always fails, and your program keeps fetching data until the buffer overruns and nasty things begin to happen.
You need to store the result of fgetc() in an int variable until you've checked that it is indeed not the EOF value.
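A minimal sketch of that fix. The name get_bytes_from_file and the capacity parameter are assumptions added for illustration (they are not in the question); the capacity check also stops the loop before the buffer overruns:

```c
#include <stdio.h>
#include <stddef.h>

/* Reads bytes from stream into dataarray until EOF or until capacity
 * bytes have been stored. Returns the number of bytes stored. */
size_t get_bytes_from_file(FILE *stream, unsigned char *dataarray, size_t capacity)
{
    size_t counter = 0;
    int c; /* int, so EOF stays distinguishable from the byte 0xFF */

    while (counter < capacity && (c = fgetc(stream)) != EOF) {
        dataarray[counter++] = (unsigned char)c; /* narrow only after the EOF check */
    }
    return counter;
}
```

In the question's main this would be called as get_bytes_from_file(datei, dataarray, MAX_BYTES_IN_FILE), after checking that fopen() and malloc() succeeded.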

Maybe something like this.
void *readfile(FILE *fi, long *filesize)
{
void *buff;
fseek(fi, 0, SEEK_END);
*filesize = ftell(fi);
fseek(fi, 0, SEEK_SET);
buff = malloc(*filesize);
if(buff)
{
fread(buff, 1, *filesize, fi);
}
return buff;
}
You need to add error checks; I left them out because this only shows the idea.
And your usage:
int main(int argc, char **argv) {
FILE *datei;
long number_of_bytes;
unsigned char *dataarray;
datei=fopen(argv[1],"rb");
dataarray = readfile(datei, &number_of_bytes);
for (int i=0;dataarray && i<number_of_bytes;i++)
printf("%hhx ",dataarray[i]);
return 0;
}
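For reference, here is one way the missing error checks might look. This is only a sketch: readfile_checked is a hypothetical name, and the choice to return NULL on every failure is an assumption, not something the answer specifies.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads the whole stream into a freshly allocated buffer.
 * On success returns the buffer and stores its length in *filesize;
 * on any failure returns NULL. The caller frees the buffer. */
void *readfile_checked(FILE *fi, long *filesize)
{
    void *buff;
    long size;

    if (fi == NULL || filesize == NULL)
        return NULL;
    if (fseek(fi, 0, SEEK_END) != 0)
        return NULL;
    size = ftell(fi);
    if (size < 0 || fseek(fi, 0, SEEK_SET) != 0)
        return NULL;

    /* malloc(0) may return NULL, so always ask for at least one byte */
    buff = malloc(size > 0 ? (size_t)size : 1);
    if (buff == NULL)
        return NULL;

    /* a short read is treated as a failure */
    if (fread(buff, 1, (size_t)size, fi) != (size_t)size) {
        free(buff);
        return NULL;
    }
    *filesize = size;
    return buff;
}
```

The caller should also check the result of fopen() before passing the stream in, and free() the buffer when done.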

The reason you get a segmentation fault is your allocation is incorrect: you allocate MAX_BYTES_IN_FILE bytes instead of unsigned int elements. As allocated, the array has only MAX_BYTES_IN_FILE / sizeof(unsigned int) elements, whereas the file is probably MAX_BYTES_IN_FILE * sizeof(unsigned int) bytes long.
You are reading bytes from the file (values between 0 and 255) but you use unsigned int elements. What is the logic? Does the file contain 32-bit values or individual bytes?
Once you can confirm that the file contents are exactly the same as the representation of the array in memory, you can use fread() to read the whole file in a single call.

Related

Sprintf function converting int to a single char instead of a string

I'm trying to convert the unsigned long integer converted_binary, which contains 10000000000 to a string, but sprintf converts it to a single character 1 instead.
I am able to know this through the vscode debugger.
I expect sprintf to convert the contents of converted_binary to a string, but it doesn't. I initially thought the problem was with the malloc, but that doesn't seem to be the case, as the problem persists even if I manually create a character array large enough.
I've also tried to replace the sprintf with printf to see if something is wrong with the converted_binary variable, but it prints out 10000000000 to stdout normally.
This is the code snippet:
int get_bit(unsigned long int n, unsigned int index)
{
unsigned long int converted_binary, arg_int_len, int_len = 0;
char *converted_string;
int bit;
/*convert n to binary*/
converted_binary = convert(n);
/*convert binary to string*/
arg_int_len = converted_binary;
do
{
arg_int_len = arg_int_len / 10;
++int_len;
}
while (arg_int_len != 0);
converted_string = malloc(sizeof(char *) * int_len);
if (converted_string == NULL)
return (-1);
sprintf(converted_string, "%lu", converted_binary);
/*Loop through string to binary at index*/
bit = (int)converted_string[index];
/*pass that into a variable*/
/*Return the variable*/
return bit;
}

confusion about printf() in C

I'm trying to hexdump a file with following code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#define SIZE 16
void pre_process(char buffer[],int len);
int main(int argc, char **argv){
if(argc == 2){
char *file = argv[1];
FILE *input = fopen(file,"r");
char buffer[SIZE];
char *tmp = malloc(4);
while(!feof(input)){
printf("%06X ",ftell(input)); /*print file pos*/
fread(buffer,1,SIZE,input); /*read 16 bytes with buffer*/
for (int i=0;i<SIZE;i += 4){ /*print each 4 bytes with hex in buffer*/
memcpy(tmp,buffer+i,4);
printf("%08X ",tmp);
}
printf("*");
pre_process(buffer,SIZE); /*print origin plain-text in buffer. subsitute unprint char with '*' */
printf("%s",buffer);
printf("*\n");
}
free(tmp);
fclose(input);
}
}
void pre_process(char buffer[],int len){
for (int i=0;i<len;i++){
if(isblank(buffer[i]) || !isprint(buffer[i]))
buffer[i] = '*';
}
}
When I run it on an excerpt from The Lord of the Rings, I get the output below:
[screenshot of the output omitted]
So why are the hex codes all the same? It looks like something is wrong with printf("%08X ",tmp);
Thanks for your help.
The answer lies here:
memcpy(tmp,buffer+i,4);
printf("%08X ",tmp);
memcpy, as you might already be aware, copies 4 bytes from buffer+i to the location tmp points to.
Even though this is done in a loop, tmp keeps holding the address of one specific location, which never changes. Only the contents at that address are updated with every memcpy() call, and printf("%08X ",tmp) prints the address, not the contents.
In a nutshell: the house stays where it is, so the address remains the same, but the occupants change as new people arrive and the older ones are wiped out!
Also, there is plenty to improve/fix here. I recommend starting by enabling warnings with your compiler's -Wall option.
tmp stores the address of a buffer; that address never changes. What you want to print is the contents of the buffer that tmp points to. In this case, tmp points to a buffer of 4 chars; if you write
printf( "%08X ", *tmp );
you’ll only print the value of the first element - since tmp has type char *, the expression *tmp has type char and is equivalent to writing tmp[0].
To treat what’s in those bytes as an unsigned int (which is what the %X conversion specifier expects), you need to cast the pointer to the correct type before dereferencing it:
printf( "%08X ", *(unsigned int *) tmp );
We first have to cast tmp from char * to unsigned int *, then dereference the result to get the unsigned int equivalent of those four bytes.
This assumes sizeof (unsigned int) == 4 on your system - to be safe, you should write your malloc call as
char *tmp = malloc( sizeof (unsigned int) );
and
for ( int i = 0; i < SIZE; i += sizeof (unsigned int) )
{
memcpy( tmp, buffer + i, sizeof (unsigned int) );
...
}
instead.
You should not use feof as your loop condition - it won’t return true until after you try to read past the end of the file, so your loop will execute once too often. You’ll want to look at the return value of fread to determine whether you’ve reached the end of the file.
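Putting these points together, a rough sketch of the dump loop could look like the following. The function name dump_stream and the choice to write to a second FILE * are assumptions made for illustration. It copies each group into an unsigned int variable with memcpy() rather than casting the pointer, and it stops when fread() returns 0. The printed word values depend on the machine's byte order.

```c
#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define SIZE 16

/* Writes a hex dump of in to out and returns the number of bytes dumped.
 * The loop ends when fread() returns 0 instead of testing feof() first. */
long dump_stream(FILE *in, FILE *out)
{
    unsigned char buffer[SIZE];
    long offset = 0;
    size_t n;

    while ((n = fread(buffer, 1, SIZE, in)) > 0) {
        fprintf(out, "%06lX ", (unsigned long)offset);

        /* print each complete group of sizeof(unsigned int) bytes;
           a trailing partial group is only shown in the text column */
        for (size_t i = 0; i + sizeof(unsigned int) <= n; i += sizeof(unsigned int)) {
            unsigned int word;
            memcpy(&word, buffer + i, sizeof word); /* copy the bytes, then print the value */
            fprintf(out, "%08X ", word);
        }

        fputc('*', out);
        for (size_t i = 0; i < n; i++) /* only the bytes actually read */
            fputc(isprint(buffer[i]) && !isblank(buffer[i]) ? buffer[i] : '*', out);
        fputs("*\n", out);

        offset += (long)n;
    }
    return offset;
}
```

From main this could be called as dump_stream(input, stdout), with the file opened in "rb" mode.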

unsigned characters and sprintf() C

I have this code:
int main(){
char buffer[1024];
char port = 1;
int length = 255;
char * record = "$TAG ,0 ,89 ,0, 1\n";
if(length < 0 || length > 255){
printf("Error - length out of range for an unsigned char.\n");
exit(1);
}
snprintf(buffer, 1024, "%c%c%s", port, (unsigned char) length, record);
int port_rc = buffer[0];
int length_rc = buffer[1];
printf("port_rc: %d\n",port_rc);
printf("length_rc: %d\n",length_rc);
return 0;
}
Output when I run it:
port_rc: 1
length_rc: -1
I think I am missing something here in terms of snprintf(), as I'm not seeing the 255 value when reading back the array it created. My guess is that snprintf() is promoting the variable 'length' to an int or something. Does anyone know how I can achieve this?
Thanks.
I don't think sprintf() is the right tool for storing 255 in the buffer. The buffer argument to sprintf() is a char array. Whether char is signed or unsigned by default is implementation-defined; see Is char signed or unsigned by default?. If 255 is greater than CHAR_MAX, a char element cannot represent 255: the byte 0xFF is stored, but reading it back as a char typically yields -1. If the implementation defaults to signed, then CHAR_MAX will probably be 127.
I suggest not using sprintf() to store numbers into the buffer. Declare it as:
unsigned char buffer[127];
Then you can do:
buffer[0] = port;
buffer[1] = length;
snprintf((char *)(buffer + 2), sizeof buffer - 2, "%s", record);
"Be careful, though," when judging such a "solution."
In my humble opinion, the root problem in your original post is that port_rc and length_rc are read back through a signed char; they should be read from unsigned char values. You do not want a value such as 0xFF to be erroneously "sign-extended" to become 0xFFFFFFFF == -1 ...
Your "solution" is quite different from the original because, as you see, it now stores into both buffer[0] and buffer[1] before then retrieving and examining those values!
WORKING SOLUTION:
int main(){
unsigned char buffer[1024];
char port = 1;
int length = 255;
char * record = "$TAG ,0 ,89 ,0, 1\n";
if(length < 0 || length > 255){
printf("Error - length out of range for an unsigned char.\n");
exit(1);
}
buffer[0] = port;
buffer[1] = length;
snprintf((char *)(buffer), sizeof buffer - 2, "%c%c%s", port,length,record);
int port_rc = buffer[0];
int length_rc = buffer[1];
char char_first = buffer[2];
printf("port_rc: %d\n",port_rc);
printf("length_rc: %d\n",length_rc);
printf("char_first: %c\n",char_first);
return 0;
}
RETURNS:
port_rc: 1
length_rc: 255
char_first: $
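The same effect can be shown in isolation. In this small sketch (stored_length is a hypothetical helper, not part of the question), snprintf() writes the bytes exactly as in the original code, and the only change is that the length byte is converted to unsigned char when it is read back:

```c
#include <stdio.h>

/* Formats port, length and record into buf the way the question does and
 * returns the length byte read back through an unsigned char. */
int stored_length(char *buf, size_t bufsize, char port, int length, const char *record)
{
    snprintf(buf, bufsize, "%c%c%s", port, length, record);
    /* buf[1] on its own may be a negative char; converting it to
       unsigned char recovers the 0..255 value that %c stored */
    return (unsigned char)buf[1];
}
```

With a plain char buffer, int length_rc = (unsigned char)buffer[1]; is therefore enough to get 255 back, whether or not char is signed on the platform.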

segment fault after parsing unsigned char to int pointer

I'm facing a problem when trying to read a binary file of fixed size. The code below causes a segmentation fault just before closing the file. What I want to achieve is to return an int pointer to the main function.
The file is a raw grayscale image with values from 0 to 255, which is why I'm using unsigned char.
How can I convert and assign from unsigned char to int * properly?
Any kind of help is welcome!
void readBinaryFile(char *filename, int *in){
FILE *file;
long length;
unsigned char *imagen;
int c;
file = fopen(filename, "rb");
fseek(file,0,SEEK_END);
length = ftell(file);
imagen = (unsigned char *) malloc(length);
fseek(file, 0, SEEK_SET);
fread(imagen, length, 1 , file);
int cont;
//c+4: file contains values from 0 to 255
for(c=0,cont=0;c<length;c=c+4,cont++){
in[cont] = (unsigned char) imagen[c];
}
for(cont=0;cont<length/4;cont++){
printf("%d",(int) in[cont]);
}
fclose(file);
free(imagen);
}
void main(int argc, char **argv){
int *in;
int fixed_size = 784;
in = (int *) malloc((fixed_size)*(fixed_size));
readBinaryFile("test.raw", in);
int c;
for(c=0;c<((fixed_size)*(fixed_size));c++){
printf("%d", (unsigned char) in[c]);
}
}
This
int *in;
declares in as a pointer to int.
This
in = (int *) malloc((fixed_size)*(fixed_size));
allocates 784*784 = 614656 bytes to in.
1. You need 784*784 ints.
2. An int needs sizeof (int) bytes, which may well be, and most often is, larger than 1.
From 1. and 2. above it can be deduced that 784*784 ints need more than 784*784 bytes.
So change
in = (int *) malloc((fixed_size)*(fixed_size));
to be
in = (int *) malloc((fixed_size)*(fixed_size) * sizeof (int));
or, even nicer, safer, and with less noise, do
in = malloc(fixed_size*fixed_size * sizeof *in);
because
in C there is no need to cast void * (which malloc() returns),
the parentheses around fixed_size are useless,
sizeof *in yields the same as sizeof (int), but would "survive" if you changed int *in to, for example, unsigned *in.
Also, while reading you need to make sure not to read more than 784*784 chars; otherwise the memory allocated for the ints will be overflowed, which in turn invokes the infamous Undefined Behaviour.
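As a rough sketch of how the allocation and the read limit fit together (read_bytes_as_ints is a hypothetical helper, and reading into a temporary byte buffer first is just one possible design):

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads up to count bytes from file and widens each one into an int.
 * Returns a malloc'd array with room for count ints (caller frees) or
 * NULL on allocation failure; *nread receives how many bytes were read.
 * Elements past *nread are left uninitialized. */
int *read_bytes_as_ints(FILE *file, size_t count, size_t *nread)
{
    size_t n = count ? count : 1;          /* avoid zero-size allocations */
    unsigned char *bytes = malloc(n);
    int *in = malloc(n * sizeof *in);      /* room for count ints, not count bytes */

    if (bytes == NULL || in == NULL) {
        free(bytes);
        free(in);
        return NULL;
    }

    *nread = fread(bytes, 1, count, file); /* never reads more than count bytes */
    for (size_t i = 0; i < *nread; i++)
        in[i] = bytes[i];                  /* each byte 0..255 becomes one int */

    free(bytes);
    return in;
}
```

For the image in the question, count would be fixed_size * fixed_size.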

Copying a file line by line into a char array with strncpy

I am trying to read a text file line by line and save each line into a char array.
From my printout in the loop I can tell it is counting the lines and the number of characters per line properly but I am having problems with strncpy. When I try to print the data array it only displays 2 strange characters. I have never worked with strncpy so I feel my issue may have something to do with null-termination.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char* argv[])
{
FILE *f = fopen("/home/tgarvin/yes", "rb");
fseek(f, 0, SEEK_END);
long pos = ftell(f);
fseek(f, 0, SEEK_SET);
char *bytes = malloc(pos); fread(bytes, pos, 1, f);
int i = 0;
int counter = 0;
char* data[counter];
int length;
int len=strlen(data);
int start = 0;
int end = 0;
for(; i<pos; i++)
{
if(*(bytes+i)=='\n'){
end = i;
length=end-start;
data[counter]=(char*)malloc(sizeof(char)*(length)+1);
strncpy(data[counter], bytes+start, length);
printf("%d\n", counter);
printf("%d\n", length);
start=end+1;
counter=counter+1;
}
}
printf("%s\n", data);
return 0;
}
Your "data[]" array is declared as an array of pointers to char with size 0. When you assign pointers to it, there is no space for them. This could cause no end of trouble.
The simplest fix would be to make a pass over the bytes array to determine the number of lines and then do something like "char **data = malloc(number_of_lines * sizeof(char *))". Then assignments to "data[counter]" will work.
You're right that strncpy() is a problem: it won't '\0'-terminate the string if it copies the maximum number of bytes. After the strncpy() add "data[counter][length] = '\0';"
The printf() at the end is wrong. To print all the lines use "for (i = 0; i < counter; i++) printf("%s\n", data[i]);"
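The copy-and-terminate step can be wrapped in a small helper. This is a sketch with a hypothetical name (copy_line); since the length is already known, it uses memcpy() instead of strncpy():

```c
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd, '\0'-terminated copy of the length bytes at src,
 * or NULL if allocation fails. The caller frees the result. */
char *copy_line(const char *src, size_t length)
{
    char *line = malloc(length + 1); /* +1 for the terminator */
    if (line == NULL)
        return NULL;
    memcpy(line, src, length);       /* length is known, so memcpy is enough */
    line[length] = '\0';             /* terminate explicitly */
    return line;
}
```

In the question's loop this would replace the malloc/strncpy pair: data[counter] = copy_line(bytes + start, length);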
Several instances of bad juju, the most pertinent one being:
int counter = 0;
char* data[counter];
You've just declared data as a variable-length array with zero elements. Despite their name, VLAs are not truly variable; you cannot change the length of the array after allocating it. So when you execute the lines
data[counter]=(char*)malloc(sizeof(char)*(length)+1);
strncpy(data[counter], bytes+start, length);
data[counter] is referring to memory you don't own, so you're invoking undefined behavior.
Since you don't know how many lines you're reading from the file beforehand, you need to create a structure that can be extended dynamically. Here's an example:
/**
* Initial allocation of data array (array of pointer to char)
*/
char **dataAlloc(size_t initialSize)
{
char **data= malloc(sizeof *data * initialSize);
return data;
}
/**
* Extend data array; each extension doubles the length
* of the array. If the extension succeeds, the function
* will return 1; if not, the function returns 0, and the
* values of data and length are unchanged.
*/
int dataExtend(char ***data, size_t *length)
{
int r = 0;
char **tmp = realloc(*data, sizeof *tmp * 2 * *length);
if (tmp)
{
*length= 2 * *length;
*data = tmp;
r = 1;
}
return r;
}
Then in your main program, you would declare data as
char **data;
with a separate variable to track the size:
size_t dataLength = SOME_INITIAL_SIZE_GREATER_THAN_0;
You would allocate the array as
data = dataAlloc(dataLength);
initially. Then in your loop, you would compare your counter against the current array size and extend the array when they compare equal, like so:
if (counter == dataLength)
{
if (!dataExtend(&data, &dataLength))
{
/* Could not extend data array; treat as a fatal error */
fprintf(stderr, "Could not extend data array; exiting\n");
exit(EXIT_FAILURE);
}
}
data[counter] = malloc(sizeof *data[counter] * length + 1);
if (data[counter])
{
strncpy(data[counter], bytes+start, length);
data[counter][length] = 0; // add the 0 terminator
}
else
{
/* malloc failed; treat as a fatal error */
fprintf(stderr, "Could not allocate memory for string; exiting\n");
exit(EXIT_FAILURE);
}
counter++;
You are trying to print data with the %s format specifier, but data is an array of pointers to char.
As for copying a string with a given size:
I would suggest using strlcpy() instead of strncpy() where it is available (it is a BSD extension, not part of standard C):
size_t strlcpy( char *dst, const char *src, size_t siz);
strncpy() won't terminate the string with a null character ('\0') if the source is at least as long as the limit;
strlcpy() solves this issue.
Strings copied by strlcpy() are always null-terminated (as long as siz is greater than 0).
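On platforms without strlcpy(), similar behaviour can be approximated in standard C. The function below is only a sketch with a hypothetical name (bounded_copy), not the BSD implementation. Like strlcpy(), it expects src to be '\0'-terminated, so it cannot be pointed directly at a line inside the unterminated bytes buffer from the question.

```c
#include <stddef.h>
#include <string.h>

/* Copies at most dstsize - 1 characters of src into dst and always
 * terminates dst when dstsize > 0. Returns strlen(src), so a result
 * >= dstsize means the copy was truncated. */
size_t bounded_copy(char *dst, const char *src, size_t dstsize)
{
    size_t srclen = strlen(src);
    if (dstsize > 0) {
        size_t n = srclen < dstsize - 1 ? srclen : dstsize - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}
```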
Allocate proper memory for the array data. In your case counter is 0 when data[counter] is declared, so the array has no elements, and accessing data[0], data[1], etc. is invalid and may cause a segmentation fault.
Declaring an array as data[counter] is bad practice here. Even if counter changes later in the program, the size of the array does not change with it.
Hence use a double char pointer (char **) as stated above.
You can use your existing loop to find the number of lines first.
The last printf is wrong. At best you would print just the first line with it.
Iterate over the array in a loop once you have fixed the issues above.
Change
int counter = 0;
char* data[counter];
...
int len=strlen(data);
...
for(; i<pos; i++)
...
strncpy(data[counter], bytes+start, length);
...
to
int counter = 0;
#define MAX_DATA_LINES 1024
char* data[MAX_DATA_LINES]; //1
...
for(; i<pos && counter < MAX_DATA_LINES ; i++) //2
...
strncpy(data[counter], bytes+start, length);
...
//1: provides valid storage for the pointers to the lines (data[0] to data[MAX_DATA_LINES - 1]). Without it you may hit a segmentation fault; if you do not, you are just lucky.
//2: ensures that if the file contains more than MAX_DATA_LINES lines you do not run into a segmentation fault, because data[MAX_DATA_LINES] and beyond are not valid storage.
I think this might be a quicker implementation, as you don't have to copy the contents of all the strings from the bytes array into a secondary array. You will of course lose your '\n' characters, though.
It also handles files that don't end with a newline character, and since pos is defined as long, the index used for bytes[] and the length are long as well.
#include <stdio.h>
#include <stdlib.h>
#define DEFAULT_LINE_ARRAY_DIM 100
int main(int argc, char* argv[])
{
FILE *f = fopen("test.c", "rb");
fseek(f, 0, SEEK_END);
long pos = ftell(f);
fseek(f, 0, SEEK_SET);
char *bytes = malloc(pos+1); /* include an extra byte incase file isn't '\n' terminated */
fread(bytes, pos, 1, f);
if (pos == 0 || bytes[pos-1]!='\n') /* pos == 0 guards against reading bytes[-1] on an empty file */
{
bytes[pos++] = '\n';
}
long i;
long length = 0;
int counter = 0;
size_t size=DEFAULT_LINE_ARRAY_DIM;
char** data=malloc(size*sizeof(char*));
data[0]=bytes;
for(i=0; i<pos; i++)
{
if (bytes[i]=='\n') {
bytes[i]='\0';
counter++;
if (counter>=size) {
size+=DEFAULT_LINE_ARRAY_DIM;
data=realloc(data,size*sizeof(char*));
if (data==NULL) {
fprintf(stderr,"Couldn't allocate enough memory!\n");
exit(1);
}
}
data[counter]=&bytes[i+1];
length = data[counter] - data[counter - 1] - 1;
printf("%d\n", counter);
printf("%ld\n", length);
}
}
for (i=0;i<counter;i++)
printf("%s\n", data[i]);
return 0;
}
