EDIT: OK, I hear you guys, I've isolated the part of my code that's giving me problems, compiled it and made sure that it still gave me the same results, here it goes:
Like before, the segfault appears after the first instance of the for loop on
strcpy(replace[j]->utf8, strtok(data, "\t")); Thanks again!
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <locale.h>
#define max_chars 45
#define max_UTF 5
#define max_ASCII 7
#define max_word_length 30
#define max_line_length 70
#define max_texto_line 5000
typedef struct {
char utf8[max_UTF];
char ascii_seq[max_ASCII];
int count;
} Replac;
void getTable(FILE *f, char inputfile[],Replac **replace){
char data[max_line_length];
int j;
f = fopen( inputfile, "r" );
if (f == NULL) {
fprintf(stderr, "Can't open input file %s!\n",inputfile);
exit(1);
}
fgets(data,sizeof data,f);
for(j=0 ; strcmp(data,"\n") ; fgets(data,sizeof data,f), j++){
if (feof(f)) {
break;
}
strcpy(replace[j]->utf8, strtok(data, "\t"));
strcpy(replace[j]->ascii_seq, strtok(NULL, "\n"));
}
fclose(f);
}
int main( int argc, char *argv[] ){
Replac *replace=malloc(max_chars * sizeof(Replac));
FILE *fpr,*f,*fpw;
int carprocess = 0;
setlocale(LC_ALL,"pt_PT.UTF-8");
setlocale(LC_COLLATE,"pt_PT.UTF-8");
getTable(f,argv[1],&replace);
}
The text file that I'm copying the characters from is formatted something like this
UTFCHAR \tab asciichar
ex
Á 'A
END EDIT
So I'm a beginner using C, and I've tried all I could think of. This seems like a pretty straightforward thing to do, but the fact that I'm having such trouble clearly shows I have some gap in my knowledge...
I won't bother you with the full code since it is working perfectly; it's just that I wanted to do things differently, and that's when the trouble started.
In short I'm doing a program that collects a set of chars of UTF8 type, and their ascii replacement, and stores them in a struct such as
typedef struct {
char utf8[max_UTF];
char ascii_seq[max_ASCII];
} Replac;
then in main I did the malloc like this
Replac *replace=malloc(max_chars * sizeof(Replac));
If my thought process is correct, this creates a block of available memory, and replace points to its starting address.
Then I made a function that scans a few UTF8 chars and their replacement and stores them in the struct, something like
void getTable(FILE *f, char inputfile[],Replac **replace)
Now, following the debugger, it seems that I'm creating a new variable replace of type Replac ** that lives at a completely different address, but stored at that address is the address of the original malloced block that I passed in through the parameter.
After that I do a
strcpy(replace[0]->utf8, something I got from the table);
Following the debugger and searching through the memory addresses, I see that the first time I do this, the first position of the malloced struct is indeed filled with the right data.
followed by
strcpy(replace[0]->ascii_seq, corresponding ascii sequence to the previous UTF8 char);
and that fills the next memory position in the memory block.
So I get something like while debugging on my variables watch
address replace = (Replac **) 0xbf8104fc that contains 0x0878a008
address *replace = (Replac *) 0x0878a008 that contains the whole struct
so inside the address 0x0878a008 I get the data of the utf8 char and then at the address 0x0878a00d I get the ascii seq.
The problem is on the next iteration of the loop, when it's time to
strcpy(replace[1]->utf8, something I got from the table);
I get a segmentation fault after that instruction.
So what do you guys think? Am I approaching things correctly, and I'm getting screwed over by syntax or something like that, or is it the base of my knowledge flawed?
Thanks, and a late happy holidays!
f = fopen( inputfile, "r" );
...
typedef struct
{
char utf8[max_UTF];
char ascii_seq[max_ASCII];
int count;
} Replac;
...
fgets(data,sizeof data,f);
You are mixing binary and text format.
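Depending on the compiler, sizeof(Replac) will typically be 16: 5 + 7 bytes for the two char arrays plus sizeof(int), which is 4 on most current platforms but is not guaranteed by the standard. The compiler may also insert padding so that the int member is properly aligned.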
If your data is stored as text, then it will be something like this:
ABCDE\tABCDEFG123456\n
Note that a 32-bit int printed in decimal takes anywhere from 1 to 10 digits (plus a sign if it is negative), so the size is not fixed. And there are (or there should be) newline \n characters.
So you don't want to read exactly 16 characters. You want to write and then read 3 lines for each record. Example:
ABCDE\n
ABCDEFG\n
123456\n
If you are reading in binary, then open the file in binary and use fwrite and fread. Example:
f = fopen( inputfile, "rb" );
Replac data;
fread(&data, sizeof(data), 1, f);
This all depends on how your file was created. If you are writing the file yourself, then show the code you used for writing the data.
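If the binary route is the one you want, a minimal sketch could look like the following. It assumes the file is written and read by the same program on the same machine (so struct layout and int size match). The helper names save_table and load_table are made up for illustration, and the macros are uppercase copies of the ones in the question.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_UTF 5
#define MAX_ASCII 7

typedef struct {
    char utf8[MAX_UTF];
    char ascii_seq[MAX_ASCII];
    int count;
} Replac;

/* Hypothetical helper: writes n records as raw bytes.
 * Returns the number of complete records written. */
size_t save_table(FILE *f, const Replac *table, size_t n)
{
    assert(f != NULL && table != NULL);
    return fwrite(table, sizeof *table, n, f);
}

/* Hypothetical helper: reads up to n records back.
 * Returns the number of complete records read. */
size_t load_table(FILE *f, Replac *table, size_t n)
{
    assert(f != NULL && table != NULL);
    return fread(table, sizeof *table, n, f);
}
```

The file should be opened with "wb" for save_table and "rb" for load_table. Comparing the return value with n tells you whether the whole table made it through.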
Also, ASCII is a subset of Unicode. A in ASCII has the exact same representation as A in UTF8.
strcpy(replace[j]->utf8, strtok(data, "\t"));
I get a segmentation fault after that instruction.
You just got the dereferencing order wrong. You first subscripted with [j] and then dereferenced with ->, as if we had an array of pointers to Replacs. But we rather have a pointer to (the first element of) an array of Replacs, hence we must dereference the pointer first and subscript thereafter, i. e. instead of
replace[j]->utf8
we have to write
(*replace)[j].utf8
or the equivalent
(*replace+j)->utf8
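To make the difference concrete, here is a small sketch with two made-up helpers (set_entry and set_entry_simple are not from the question). The first keeps the Replac ** parameter and dereferences before indexing. The second takes a plain Replac *, which is enough whenever the function does not need to change where the caller's pointer points.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_UTF 5
#define MAX_ASCII 7

typedef struct {
    char utf8[MAX_UTF];
    char ascii_seq[MAX_ASCII];
    int count;
} Replac;

/* Same parameter type as getTable in the question:
 * dereference the outer pointer first, then index the array. */
void set_entry(Replac **replace, size_t j, const char *utf8, const char *ascii)
{
    assert(strlen(utf8) < MAX_UTF && strlen(ascii) < MAX_ASCII);
    strcpy((*replace)[j].utf8, utf8);
    strcpy((*replace)[j].ascii_seq, ascii);
}

/* Simpler alternative: the pointer itself is never reassigned,
 * so it can be passed by value and indexed directly. */
void set_entry_simple(Replac *replace, size_t j, const char *utf8, const char *ascii)
{
    assert(strlen(utf8) < MAX_UTF && strlen(ascii) < MAX_ASCII);
    strcpy(replace[j].utf8, utf8);
    strcpy(replace[j].ascii_seq, ascii);
}
```

With Replac *table = malloc(...), the calls would be set_entry(&table, j, ...) and set_entry_simple(table, j, ...).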
We wrote a program that reads comma-separated integer values into an array and tries to process them with a parallel structure.
While doing so, we found that there seems to be a fixed limit on the maximum size of the dynamic array, which normally grows by doubling its size. Yet for a dataset with more than 5000 values, we can't double it anymore.
I am a bit confused right now, since technically we did everything the way other posts said we should (use realloc, use the heap instead of the stack).
Note that it works fine for any file with 5000 or fewer values.
We also tried working with realloc, but to the same result.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
// compile with gcc filename -lpthread -lm -Wall -Wextra -o test
int reader(int ** array, char * name) {
FILE *fp;
int data,row,col,count,inc;
int capacity=10;
char ch;
fp=fopen(name,"r");
row=col=count=0;
while(EOF!=(inc=fscanf(fp,"%d%c", &data, &ch)) && inc == 2){
if(capacity==count)
// this is the alternative with realloc we tried. Still the same issue.
//*array=malloc(sizeof(int)*(capacity*=2));
*array = realloc(*array, sizeof(int)*(capacity*=2));
(*array)[count++] = data;
//printf("%d ", data);
if(ch == '\n'){
break;
} else if(ch != ','){
fprintf(stderr, "format error of different separator(%c) of Row at %d \n", ch, row);
break;
}
}
// close file stream
fclose(fp);
//*array=malloc( sizeof(int)*count);
*array = realloc(*array, sizeof(int)*count);
return count;
}
int main(){
int cores = 8;
pthread_t p[cores];
int *array;
int i = 0;
array=malloc(sizeof(int)*10);
// read the file
int length = reader(&array, "data_2.txt");
// clean up and exit
free(array);
return 0;
}
EDIT: I included the realloc-command we tried and changed the values back to our original testing values (starting at 10). This didn't impact the result though, or rather still does not work. Thanks anyways for pointing out the errors! I also reduced the included code to the relevant part.
I can't really get my head around the fact that it should work this way, but doesn't, so it might just be a minor mistake we overlooked.
Thanks in advance.
New answer after question has been updated
The use of realloc is wrong. Always do realloc into a new pointer and check for NULL before overwriting the old pointer.
Like:
int* tmp = realloc(....);
if (!tmp)
{
// No more memory
// do error handling
....
}
*array = tmp;
Original answer (not fully valid after question has been updated)
You have some serious problems with the current code.
In main you have:
array=malloc(sizeof(int)*10); // This only allocates memory for 10 int
int length = reader(&array, "data_1.txt");
and in reader you have:
int capacity=5001;
So you assume that the array capacity is 5001 even though you only reserved memory for 10 to start with. So you end up writing outside the reserved array (i.e. undefined behavior).
A better approach could be to handle all allocation in the function (i.e. don't do any allocation in main). If you do that you shall initialize capacity to 0 and rewrite the way capacity grows.
Further, in reader you have:
if(capacity==count)
*array=malloc(sizeof(int)*(capacity*=2));
It is wrong to use malloc here, as you lose all data already in the array and leak memory as well. Use realloc instead.
Finally, you have:
*array=malloc( sizeof(int)*count);
Again this is wrong for the same reason as above. If you want to resize to the exact size (i.e. count), use realloc.
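Putting those points together, a reader that owns all of the allocation might be sketched like this. The name read_ints, the initial capacity of 16 and the return convention are my own assumptions, not something from the question. It also simplifies the separator handling: anything other than a comma after a number ends the row.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads comma-separated ints from fp into a heap array.
 * On success stores the array in *out (NULL if nothing was read) and returns
 * the count; on allocation failure frees everything and returns -1. */
long read_ints(FILE *fp, int **out)
{
    assert(fp != NULL && out != NULL);
    int *data = NULL;
    size_t count = 0, capacity = 0;
    int value;

    while (fscanf(fp, "%d", &value) == 1) {
        if (count == capacity) {
            size_t new_cap = capacity ? capacity * 2 : 16;
            /* realloc into a temporary so the old block is not lost on failure */
            int *tmp = realloc(data, new_cap * sizeof *tmp);
            if (tmp == NULL) {
                free(data);
                return -1;
            }
            data = tmp;
            capacity = new_cap;
        }
        data[count++] = value;

        int ch = fgetc(fp);   /* separator that follows the number */
        if (ch != ',')
            break;            /* newline, EOF or anything else ends the row */
    }
    *out = data;
    return (long)count;
}
```

main then only declares int *array = NULL;, calls read_ints(fp, &array) and frees the result. There is no separate malloc whose size has to agree with the capacity used inside the function.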
I'm fairly new to C programming, but fprintf() and printf() are behaving strangely and I'm confused as to why. I need some help understanding and diagnosing this issue.
fprintf() Deleting Element of Array
First off, I'm passing in a populated malloc allocated four element char** array into a simple function that will write to a file, everything in the array appears normal and all four elements contain the correct data. The function call in main() looks like this. My array in question is header.
Note: I had to cast this normal (char** array) as a constant in this function parameter, due to the function header parameter. Our professor gave us the header file and we cannot change anything in them.
pgmWrite((const char**) header, (const int**) matrix,
rowPixels, colPixels, outFile);
Next, stopping debugger just before it executes the fprintf() & printf() functions, screenshot showing the array is still populated with my 4 elements.
pgmWrite() - Showing array is still fine
Observe the 4th element of the array after execution of fprintf().
After fprintf() executes, element 3 memory is wiped out.
When run, printf() executes the printing of the array exactly what is shown in the debugger, ending at the 3rd element. Often printing nothing in that spot or in rare cases garbage characters. The behavior of printf() is exactly the same as how fprintf() is working as well.
I'm at a loss here guys, please help me understand what I'm doing wrong. I can only provide these two screenshots, based on me being a new member. I'll try to provide as much information as possible. Thank you. Here is a simplified version of my program. Keep in mind, the professor gave us the function declarations and told us we cannot change them. So, I have to work with what I have here. Also, since this is fileIO, you need to find a *.pgm file to test this.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define rowsInHeader 4
#define maxSizeHeadRow 200
int ** pgmRead( char **header, int *numRows, int *numCols, FILE *in ){
// INITIALIZING
char *headArr[rowsInHeader][maxSizeHeadRow];
char buffer[100];
int r = 0;
fpos_t pos;
// CREATE: Header
while (r < 4){
// IF: Row in pgm file header lists the dimensions of matrix
if (r == 2){
// CURSOR: Saving pointer location in file (see notes in header for method reference)
fgetpos(in, &pos);
// ASSIGN: Dereference column and row pointers from file
fscanf(in, "%d %d", numCols, numRows);
// CURSOR: Moving back to saved pointer location (see notes in header for method reference)
fsetpos(in, &pos);
}
// ASSIGN: Copying header row into array
fgets(buffer, maxSizeHeadRow, in);
strcpy((char*)headArr[r], buffer);
// POINTER: Reference pointer to headArr[]
header[r] = (char*)headArr[r];
// TRAVERSE: To next row in file
r++;
}
// NOTE: Placeholder for return type
return 0;
}
int pgmWrite( const char **header, const int **pixels, int numRows, int numCols, FILE *out ){
// INITIALIZING
int i = 0;
// WRITE: Header
for (i = 0; i < rowsInHeader; i++){
fprintf(out, "%s", header[i]);
printf("%s", header[i]);
}
return 0;
}
int main(int argc, char *argv[]){
char **header = (char**)malloc(rowsInHeader * sizeof(char));
FILE *inFile = fopen("smallFile.pgm", "r");
FILE *outFile = fopen("TestPicture.ascii.pgm", "w");;
int rowPixels = 0;
int colPixels = 0;
int **matrix = NULL;
// READ & WRITE
matrix = pgmRead(header, &rowPixels, &colPixels, inFile);
pgmWrite((const char**)header, (const int**)matrix, rowPixels, colPixels, outFile);
// FINALIZING
fclose(inFile);
free(header);
return 0;
}
You are not allocating your array correctly. This line:
char **header = (char**)malloc(rowsInHeader * sizeof(char));
makes header point to an uninitialized region of memory that is only 4 bytes in size.
Then inside your PGM function you write:
header[r] = (char*)headArr[r];
The code header[r] means to access the r'th pointer stored in the space pointed to by header. But since that space is only 4 bytes big, which is not enough room for four pointers, you're actually writing off into the wild blue yonder.
Also, (char *)headArr[r] is a mistake. If you did not use the cast, your compiler would have warned you about this mistake. You should avoid using casts in your code, especially using them to make warnings go away. You're saying to the compiler "Ssh, I know what I'm doing" when in fact you don't know what you are doing.
The entire approach with headArr is flawed from the start: even if you had actually written the right code to implement what you were trying, you'd be returning pointers into space which is deallocated when the function returns.
Basically the whole pgmRead function is a complete mess and it'd be easier to start from scratch. But this time, think carefully about when and where you are allocating memory, and what the types are of your expressions, and don't use casts. Let the pgmRead function do all the allocation.
(Unfortunately, based on your description it looks like you will have to use your casts to call the pgmWrite function since that has a mistake in its signature. const int ** should be const int * const *, and similarly for const char **. I'd recommend to actually change pgmWrite's signature accordingly, get your program working, and then once everything is good, then go back to the broken version that you are forced to use.)
Reading C FAQ - arrays and pointers might be useful too.
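As a rough sketch of what the allocation could look like while keeping a char **header parameter: the pointer array is sized with sizeof(char *), and every row gets its own heap buffer that stays valid after the reading function returns. The helper names alloc_header, read_header and free_header are invented for this example, the macros are uppercase stand-ins for the ones in the question, and the handling of the dimension row is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ROWS_IN_HEADER 4
#define MAX_HEADER_ROW 200

/* Allocates the pointer array: sizeof(char *) per slot, not sizeof(char).
 * calloc leaves every slot NULL, so free_header is safe at any point. */
char **alloc_header(void)
{
    return calloc(ROWS_IN_HEADER, sizeof(char *));
}

/* Reads ROWS_IN_HEADER lines, each copied into its own malloc'd buffer.
 * Returns 0 on success, -1 on a read or allocation failure. */
int read_header(char **header, FILE *in)
{
    assert(header != NULL && in != NULL);
    char line[MAX_HEADER_ROW];

    for (int r = 0; r < ROWS_IN_HEADER; r++) {
        if (fgets(line, sizeof line, in) == NULL)
            return -1;
        header[r] = malloc(strlen(line) + 1);
        if (header[r] == NULL)
            return -1;
        strcpy(header[r], line);
    }
    return 0;
}

/* Frees every row, then the pointer array itself. */
void free_header(char **header)
{
    if (header == NULL)
        return;
    for (int r = 0; r < ROWS_IN_HEADER; r++)
        free(header[r]);
    free(header);
}
```

Note that no cast is needed anywhere: header[r] is a char *, and so is the result of malloc once it is assigned.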
I tried to write and read from a file with pointers in structures. But when I read from file I see some garbage value. I am using GCC 4.7.2 on Linux. Need some help.
Read:
//read from a file
#include<stdio.h>
typedef struct
{
char* name;
char* phone;
}LISTING;
int main(void)
{
LISTING phoneList[14];
FILE * fp = NULL;
fp = fopen("/media/Study/PhoneDirectory.dat","rb");
if(fp == NULL)
printf("Error opening file!!!");
fseek(fp,0,SEEK_SET);
if(fread(&phoneList[1],sizeof(LISTING),1,fp)==1)
printf("%s %s",phoneList[1].name,phoneList[1].phone);
fclose(fp);
return 0;
}
And write:
//Write to file
#include<stdio.h>
typedef struct
{
char* name;
char* phone;
}LISTING;
int main(void)
{
LISTING phoneList[2];
FILE * fp = NULL;
fp = fopen("/media/Study/PhoneDirectory.dat","wb");
phoneList[1].name = "Santosh";
phoneList[1].phone = "9657681798";
if(fwrite(&phoneList[1],sizeof(LISTING),1,fp)==1)
printf("inserted");
fclose(fp);
return 0;
}
Pointers are only meaningful in the application process that they originate from. If you write them to a file, as you're doing here, the values you read back will be meaningless — they will most likely point to uninitialized memory, or to memory which is being used for something else entirely.
You will need to come up with another way of writing this data to a file.
The problem you have is equivocating between char* and char[]. You can certainly assign a string literal to a char*, but you need to understand what the contents of a LISTING structure contain, and how you want to serialize and deserialize data to a file.
It does not make sense to save pointers from one process and read them into another process, so you probably want to save the contents (what a pointer points at). You want to store two values, (name, phone) to the file. Since you likely want to store the literal name and literal phone, let us consider what the file might look like:
roast duck|212-333-4444
peking duck|411-511-61111
duck soup|314-222-3333
free duck|800-111-2222
...
You need functions to serialize and deserialize your data. Since your LISTING type is pointers, you will need to allocate appropriate space for those values, as you read them, and you need functions (methods) to read serialized data from a file and write serialized data to a file.
Reading (you will need to allocate enough space),
int
listing_read(FILE*fp, LISTING* listing)
{
char name_buffer[100];
char phone_buffer[100];
if(!fp) return(-1);
if(!listing) return(-2);
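//%s stops at whitespace and would swallow the '|', so read each field with a scanset
int res = fscanf(fp," %99[^|]|%99[^\n]",name_buffer,phone_buffer);
if( res != 2 ) {
//handle error here
return(-3);
}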
//careful here, you cannot free if you didn't malloc/strdup
if(listing->name) free(listing->name);
if(listing->phone) free(listing->phone);
listing->name = strdup(name_buffer);
listing->phone = strdup(phone_buffer);
return(0);
}
Writing (you will need to provide proper formatting),
int
listing_write(FILE*fp, LISTING* listing)
{
if(!fp) return(-1);
if(!listing) return(-2);
fprintf(fp,"%s|%s\n",listing->name,listing->phone);
return(0);
}
Here is how you need to modify your code,
//read from a file
#include<stdio.h>
#include<stdlib.h> //exit, free
#include<string.h> //strdup
typedef struct
{
char* name;
char* phone;
}LISTING;
int main(void)
{
LISTING phoneList[14] = {0}; //zeroed so listing_read never frees a garbage pointer
FILE* fp = NULL;
if( !(fp = fopen("/media/Study/PhoneDirectory.dat","rb")) ) {
printf("Error opening file!!!");
exit(1);
}
fseek(fp,0,SEEK_SET);
if( listing_read(fp,&phoneList[0]) >= 0 ) {
printf("%s %s",phoneList[0].name,phoneList[0].phone);
}
fclose(fp);
return 0;
}
And here is how writing the file would change,
//Write to file
#include<stdio.h>
#include<stdlib.h> //exit
typedef struct
{
char* name;
char* phone;
}LISTING;
int main(void)
{
LISTING phoneList[14];
FILE* fp = NULL;
if( !(fp = fopen("/media/Study/PhoneDirectory.dat","wb")) ) {
printf("error, cannot write file\n");
exit(1);
}
phoneList[0].name = "Santosh";
phoneList[0].phone = "9657681798";
if( listing_write(fp,&phoneList[0])>=0) {
printf("inserted");
}
fclose(fp);
return 0;
}
Note that in your writing program you assign the string literals "Santosh" and "9657681798" to the LISTING members name and phone. Though legal to do, you need a better understanding of what C does here. C takes the addresses of these C-string constants and assigns those addresses to the phoneList[1].name and phoneList[1].phone member pointers.
Consider that if you did this assignment,
phoneList[0].name = "Santosh";
phoneList[0].phone = "9657681798";
You have assigned the pointers to constant strings to your structure members.
But if you were to allocate space (for example, using strdup()),
phoneList[0].name = strdup("Santosh");
phoneList[0].phone = strdup("9657681798");
You have allocated space for the strings, giving these members independent storage, which is more likely what you want to do.
Note that I used phoneList[0] since C has zero-based arrays.
printf("%s %s",phoneList[1].name,phoneList[1].phone);
The above statement invokes undefined behaviour.
Since the pointers name and phone of the struct object phoneList[1] do not point to valid strings in this process (fread only restores the old address values), dereferencing them invokes UB. In your case they produce garbage values, but it could just as well have led to a crash.
To fit your case of reading the contents of the file and storing them in the struct objects, use the getline function to read them row by row (assuming the details are stored line by line), then dynamically allocate memory for the char pointers and copy the values read into it. But this approach requires a lot of manual memory management, which is error-prone.
I have a structure with the following definition:
typedef struct myStruct{
int a;
char* c;
int f;
} OBJECT;
I am able to populate this object and write it to a file. However I am not able to read the char* c value in it...while trying to read it, it gives me a segmentation fault error. Is there anything wrong with my code:
//writensave.c
#include "mystruct.h"
#include <stdio.h>
#include <string.h>
#define p(x) printf(x)
int main()
{
p("Creating file to write...\n");
FILE* file = fopen("struct.dat", "w");
if(file == NULL)
{
printf("Error opening file\n");
return -1;
}
p("creating structure\n");
OBJECT* myObj = (OBJECT*)malloc(sizeof(OBJECT));
myObj->a = 20;
myObj->f = 45;
myObj->c = (char*)calloc(30, sizeof(char));
strcpy(myObj->c,
"This is a test");
p("Writing object to file...\n");
fwrite(myObj, sizeof(OBJECT), 1, file);
p("Close file\n");
fclose(file);
p("End of program\n");
return 0;
}
Here is how I am trying to read it:
//readnprint.c
#include "mystruct.h"
#include <stdio.h>
#define p(x) printf(x)
int main()
{
FILE* file = fopen("struct.dat", "r");
char* buffer;
buffer = (char*) malloc(sizeof(OBJECT));
if(file == NULL)
{
p("Error opening file");
return -1;
}
fread((void *)buffer, sizeof(OBJECT), 1, file);
OBJECT* obj = (OBJECT*)buffer;
printf("obj->a = %d\nobj->f = %d \nobj->c = %s",
obj->a,
obj->f,
obj->c);
fclose(file);
return 0;
}
When you write your object, you're writing the pointer value to the file instead of the pointed-to information.
What you need to do is not just fwrite/fread your whole structure, but rather do it a field at a time. fwrite the a and the f as you're doing with the object, but then you need to do something special with the string. Try fwrite/fread of the length (not represented in your data structure, that's fine) and then fwrite/fread the character buffer. On read you'll need to allocate that, of course.
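A possible shape for that, as a sketch only: the functions object_write and object_read are hypothetical names, and the choice of a uint32_t length prefix is my assumption. The file written this way is only meant to be read back on the same kind of machine.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct myStruct {
    int a;
    char *c;
    int f;
} OBJECT;

/* Writes a, f, the string length, then the string bytes (no terminator).
 * Returns 0 on success, -1 on a short write. */
int object_write(FILE *fp, const OBJECT *obj)
{
    assert(fp != NULL && obj != NULL && obj->c != NULL);
    uint32_t len = (uint32_t)strlen(obj->c);
    if (fwrite(&obj->a, sizeof obj->a, 1, fp) != 1) return -1;
    if (fwrite(&obj->f, sizeof obj->f, 1, fp) != 1) return -1;
    if (fwrite(&len, sizeof len, 1, fp) != 1) return -1;
    if (len > 0 && fwrite(obj->c, 1, len, fp) != len) return -1;
    return 0;
}

/* Reads the same layout back. obj->c is malloc'd here and must be
 * freed by the caller. Returns 0 on success, -1 on failure. */
int object_read(FILE *fp, OBJECT *obj)
{
    assert(fp != NULL && obj != NULL);
    uint32_t len;
    if (fread(&obj->a, sizeof obj->a, 1, fp) != 1) return -1;
    if (fread(&obj->f, sizeof obj->f, 1, fp) != 1) return -1;
    if (fread(&len, sizeof len, 1, fp) != 1) return -1;
    obj->c = malloc((size_t)len + 1);
    if (obj->c == NULL) return -1;
    if (len > 0 && fread(obj->c, 1, len, fp) != len) {
        free(obj->c);
        obj->c = NULL;
        return -1;
    }
    obj->c[len] = '\0';   /* the terminator is not stored in the file */
    return 0;
}
```

Both programs should open the file in binary mode ("wb" and "rb") so that no newline translation happens on platforms that do it.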
Your first code sample seems to assume that the strings are going to be no larger than 30 characters. If this is the case, then the easiest fix is probably to re-define your structure like this:
typedef struct myStruct{
int a;
char c[30];
int f;
} OBJECT;
Otherwise, you're just storing a pointer to dynamically-allocated memory that will be destroyed when your program exits (so when you retrieve this pointer later, the address is worthless and most likely illegal to access).
You're saving a pointer to a char, not the string itself. When you try to reload the file you're running in a new process with a different address space and that pointer is no longer valid. You need to save the string by value instead.
I would like to add a note about a potential portability issue, which may or may not exist depending upon the planned use of the data file.
If the data file is to be shared between computers of different endian-ness, you will need to configure file-to-host and host-to-file converters for non-char types (int, short, long, long long, ...). Furthermore, it could be prudent to use the types from stdint.h (int16_t, int32_t, ...) instead to guarantee the size you want.
However, if the data file will not be moving around anywhere, then ignore these two points.
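For the case where it does matter, one common way to build such converters is to fix the byte order of the file and assemble values byte by byte, so the host's endianness never enters the picture. The sketch below assumes a little-endian file format; write_u32_le and read_u32_le are names made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Stores v as four bytes, least significant first, on any host.
 * Returns 0 on success, -1 on a short write. */
int write_u32_le(FILE *fp, uint32_t v)
{
    assert(fp != NULL);
    unsigned char b[4] = {
        (unsigned char)(v & 0xFF),
        (unsigned char)((v >> 8) & 0xFF),
        (unsigned char)((v >> 16) & 0xFF),
        (unsigned char)((v >> 24) & 0xFF)
    };
    return fwrite(b, 1, sizeof b, fp) == sizeof b ? 0 : -1;
}

/* Rebuilds the value from four little-endian bytes.
 * Returns 0 on success, -1 on a short read. */
int read_u32_le(FILE *fp, uint32_t *out)
{
    assert(fp != NULL && out != NULL);
    unsigned char b[4];
    if (fread(b, 1, sizeof b, fp) != sizeof b)
        return -1;
    *out = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    return 0;
}
```

An int field such as a or f would be cast to uint32_t before writing and back after reading, which also pins its size in the file to 4 bytes.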
The char * field of your structure is known as a variable length field. When you write this field, you will need a method for determining the length of the text. Two popular methods are:
1. Writing Size First
2. Writing terminal character
Writing Size First
In this method, the size of the text data is written first, followed immediately by the data.
Advantages: Text can load quicker by block reads.
Disadvantages: Two reads required, extra space required for the length data.
Example code fragment:
struct My_Struct
{
char * text_field;
};
void Write_Text_Field(struct My_Struct * p_struct, FILE * output)
{
size_t text_length = strlen(p_struct->text_field);
fprintf(output, "%zu\n", text_length); /* size_t needs %zu, not %d */
fprintf(output, "%s", p_struct->text_field);
return;
}
void Read_Text_Field(struct My_Struct * p_struct, FILE * input)
{
size_t text_length = 0;
char * p_text = NULL;
if (fscanf(input, "%zu", &text_length) != 1)
return;
fgetc(input); /* skip the newline written after the length */
p_text = (char *) malloc(text_length + 1); /* +1 for the terminator */
if (p_text)
{
size_t got = fread(p_text, 1, text_length, input);
p_text[got] = '\0';
p_struct->text_field = p_text; /* hand the buffer to the struct */
}
}
Writing terminal character
In this method the text data is written followed by a "terminal" character. Very similar to a C language string.
Advantages: Requires less space than Size First.
Disadvantages: Text must be read one byte at a time so terminal character is not missed.
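A sketch of this method, assuming '\0' is used as the terminal character (which only works if the text itself never contains one). The function names write_terminated and read_terminated and the starting buffer size of 32 are arbitrary choices for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Writes the text followed by its '\0' terminator.
 * Returns 0 on success, -1 on a short write. */
int write_terminated(FILE *out, const char *text)
{
    assert(out != NULL && text != NULL);
    size_t n = strlen(text) + 1;   /* include the terminator */
    return fwrite(text, 1, n, out) == n ? 0 : -1;
}

/* Reads bytes up to and including the next '\0'.
 * Returns a malloc'd string, or NULL if EOF arrives before a
 * terminator or if memory runs out. */
char *read_terminated(FILE *in)
{
    assert(in != NULL);
    size_t len = 0, cap = 0;
    char *buf = NULL;
    int ch;

    while ((ch = fgetc(in)) != EOF) {
        if (len == cap) {
            size_t new_cap = cap ? cap * 2 : 32;
            char *tmp = realloc(buf, new_cap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }
        buf[len++] = (char)ch;
        if (ch == '\0')
            return buf;   /* terminator found: the string is complete */
    }
    free(buf);            /* ran out of input without a terminator */
    return NULL;
}
```

The byte-at-a-time loop is the cost mentioned above: the reader cannot know how much to allocate or read until it has seen the terminator.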
Fixed size field
Instead of using a char* as a member, use a char [N], where N is the maximum size of the field.
Advantages: Fixed sized records can be read as blocks.
Makes random access in files easier.
Disadvantages: Waste of space if all the field space is not used.
Problems when the field size is too small.
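To illustrate the random-access advantage, here is a sketch in which record number i always starts at byte offset i * sizeof(Record). The Record type and the two helper names are invented for the example, and the file is assumed to be opened in binary update mode (for instance "rb+" or "wb+") and read back by the same program that wrote it.

```c
#include <assert.h>
#include <stdio.h>

#define TEXT_SIZE 30

typedef struct {
    int a;
    char c[TEXT_SIZE];   /* fixed-size field instead of char * */
    int f;
} Record;

/* Overwrites (or appends) the record at position index.
 * Returns 0 on success, -1 on failure. */
int record_write_at(FILE *fp, long index, const Record *rec)
{
    assert(fp != NULL && rec != NULL && index >= 0);
    if (fseek(fp, index * (long)sizeof *rec, SEEK_SET) != 0)
        return -1;
    return fwrite(rec, sizeof *rec, 1, fp) == 1 ? 0 : -1;
}

/* Reads the record at position index without touching the others.
 * Returns 0 on success, -1 on failure. */
int record_read_at(FILE *fp, long index, Record *rec)
{
    assert(fp != NULL && rec != NULL && index >= 0);
    if (fseek(fp, index * (long)sizeof *rec, SEEK_SET) != 0)
        return -1;
    return fread(rec, sizeof *rec, 1, fp) == 1 ? 0 : -1;
}
```

When filling c, something like snprintf(rec.c, sizeof rec.c, "%s", text) truncates over-long input instead of overflowing, which is the "field size is too small" problem showing up as lost data rather than a crash.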
When writing data structures to a file, you should consider using a database. There are small ones such as SQLite and bigger ones such as MySQL. Don't waste time writing and debugging permanent storage routines for your data when they have already been written and tested.