I'm writing a program that reads a text file made up of strings, one per line. Basically I do this:
...
char* name;
char* buffer = malloc(sizeof(char) * SIZE); //size is a defined constant in the header
while(fgets(buffer, SIZE, pf)){ //pf is the opened stream
name = malloc(sizeof(char) * SIZE);
strcpy(name, strtok(buffer, "\n"));
manipulate(name); //call an extern function
}
Function manipulate is declared in this manner:
void manipulate(void* ptr);
The problem is that this way two equal strings end up at different memory addresses, so the manipulate function treats them as two different elements.
How can I get them recognized as a single element?
Store the strings in a set, a data type which stores no repeated values and is fast to search. Basically it's a hash table where the key is the string and the value doesn't matter.
Writing your own hash table is a good exercise, but for production you're better off using an existing one, such as the one in GLib. It already has convenience methods for using a hash table as a set. While we're at it, we can use GLib's g_strchomp() and g_strdup().
#include <stdio.h>
#include <glib.h>
int main () {
// Initialize our set of strings.
GHashTable *set = g_hash_table_new(g_str_hash, g_str_equal);
// Allocate a line buffer on the stack.
char line[1024];
// Read lines from stdin.
while(fgets(line, sizeof(line), stdin)) {
// Strip trailing whitespace, including the newline.
g_strchomp(line);
// Look up the string in the set.
char *string = g_hash_table_lookup(set, line);
if( string == NULL ) {
// Haven't seen this string before.
// Copy it, using only the memory we need.
string = g_strdup(line);
// Add it to the set.
g_hash_table_add(set, string);
}
printf("%p - %s\n", string, string);
}
}
And here's a quick demonstration.
$ ./test
foo
0x60200000bd90 - foo
foo
0x60200000bd90 - foo
bar
0x60200000bd70 - bar
baz
0x60200000bd50 - baz
aldskflkajd
0x60200000bd30 - aldskflkajd
aldskflkajd
0x60200000bd30 - aldskflkajd
If you indeed have two strings then they necessarily have different addresses, regardless of whether their contents are the same. It sounds like you want to keep track of the strings you've already read, so as to avoid / merge duplicates. That starts with the "keeping track" part.
Evidently, then, you need some kind of data structure in which to record the strings you've already read. You have many choices for that, and they have different advantages and disadvantages. If the number of distinct strings you'll need to handle is relatively small then a simple array or linked list could suffice, but if it is large enough then a hash table will provide much better performance.
With that in hand, you check each newly-read string against the previously read ones and act accordingly.
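For a small number of distinct strings, the "keep track, then check" idea can be sketched with a plain linked list and no external library. This is only a minimal sketch: the names Seen, seen_intern and seen_free are made up for illustration and do not come from the question. Lookup is linear, so this only suits the small case described above.

```c
#include <stdlib.h>
#include <string.h>

/* One node per distinct string seen so far (hypothetical type). */
typedef struct Seen {
    char *text;
    struct Seen *next;
} Seen;

/* Return the single stored copy of s, adding it on first sight.
   Returns NULL if memory runs out. */
char *seen_intern(Seen **head, const char *s)
{
    for (Seen *n = *head; n != NULL; n = n->next) {
        if (strcmp(n->text, s) == 0)
            return n->text;            /* duplicate: reuse the existing address */
    }

    Seen *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->text = malloc(strlen(s) + 1);   /* only as much memory as the string needs */
    if (n->text == NULL) {
        free(n);
        return NULL;
    }
    strcpy(n->text, s);
    n->next = *head;
    *head = n;
    return n->text;
}

/* Release every node and the string it owns. */
void seen_free(Seen *head)
{
    while (head != NULL) {
        Seen *next = head->next;
        free(head->text);
        free(head);
        head = next;
    }
}
```

In the read loop, the pointer returned by seen_intern(&head, buffer) is what would be handed to manipulate(), so equal lines always arrive with the same address.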
So I'm supposed to do the sorting algorithm as a CS homework.
It should read arbitrary number of words each ending with '\n'. After it reads the '.', it should print the words in alphabetical order.
E.g.:
INPUT:
apple
dog
austria
Apple
OUTPUT:
Apple
apple
austria
dog
I want to store the words into a struct. I think that in order to work it for arbitrary number of words I should make the array of structs.
So far I've tried to create a typedef struct with only one member (string) and I planned to make the array of structs from that, into which I would then store each of the words.
As for the "randomness" of the number of words, I wanted to set the struct type in main after finding out how many words had been written and then store each word into each element of the struct array.
My problem is:
1. I don't know how to find out the number of words. The only thing I tried was writing a function that counts how many times '\n' occurred, though it didn't work very well.
As for the data structure, I've come up with a struct having only one string member:
typedef struct{
char string[MAX];
}sort;
Then, in the main function, I first read the number of words to come (not part of the actual assignment, only there to make the code work),
and once I have "len" I declare an array of type sort:
int main(){
/*code, scanf("%d", &len) and stuff*/
sort sort_t[len];
for(i = 0; i < len; i++){
scanf("%s", sort_t[i].string);
}
Question: Is such a thing "legal", and is this a good approach?
Q2: How do I get to know the number of words to store (for the array of structs) before I start storing them?
IMHO the idea of reserving the same maximal storage for each and every string is a bit wasteful. You are probably better off sticking to dynamic NUL-terminated strings as usually done in C code. This is what the C library supports best.
As for managing an unknown number of strings, you have a choice. Possibility 1 is to use a linked list as mentioned by Xavier. Probably the most elegant solution, but it could be time-consuming to debug, and ultimately you have to convert it to an array in order to use one of the common sort algorithms.
Possibility 2 is to use something akin to a C++ std::vector object. Say the task of allocating storage is delegated to some "bag" object. Code dealing with the "bag" has a monopoly on calling the realloc() function mentioned by Vlad. Your main function only calls bag_create() and bag_put(bag, string). This is less elegant but probably easier to get right.
As your focus is to be on your sorting algorithm, I would rather suggest using approach #2. You could use the code snippet below as a starting point.
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
typedef struct {
size_t capacity;
size_t usedSlotCount;
char** storage;
} StringBag;
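StringBag* bag_create(void)
{
size_t initialSize = 4; /* start small */
StringBag* myBag = malloc(sizeof(StringBag));
if (NULL == myBag) {
fprintf(stderr, "Out of memory while creating bag\n");
exit(1);
}
myBag->capacity = initialSize;
myBag->usedSlotCount = 0;
myBag->storage = malloc(initialSize*sizeof(char*));
if (NULL == myBag->storage) {
fprintf(stderr, "Out of memory while creating bag\n");
exit(1);
}
return myBag;
}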
void bag_put(StringBag* myBag, char* str)
{
if (myBag->capacity == myBag->usedSlotCount) {
/* have to grow storage */
size_t oldCapacity = myBag->capacity;
size_t newCapacity = 2 * oldCapacity;
myBag->storage = realloc(myBag->storage, newCapacity*sizeof(char*));
if (NULL == myBag->storage) {
fprintf(stderr, "Out of memory while reallocating\n");
exit(1);
}
fprintf(stderr, "Growing capacity to %lu\n", (unsigned long)newCapacity);
myBag->capacity = newCapacity;
}
/* save string to new allocated memory, as this */
/* allows the caller to always use the same static storage to house str */
char* str2 = malloc(1+strlen(str));
if (NULL == str2) {
fprintf(stderr, "Out of memory while copying string\n");
exit(1);
}
strcpy(str2, str);
myBag->storage[myBag->usedSlotCount] = str2;
myBag->usedSlotCount++;
}
static char inputLine[4096];
int main()
{
StringBag* myBag = bag_create();
/* read input data */
while(scanf("%4095s", inputLine) == 1) { /* width limit keeps a long word from overflowing inputLine */
if (0 == strcmp(".", inputLine))
break;
bag_put(myBag, inputLine);
}
/* TODO: sort myBag->storage and print the sorted array */
}
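For the TODO at the end, one way to check your own sorting algorithm is to compare its result against the standard qsort(). The sketch below is only that kind of reference, not a substitute for the homework. The names compare_words and sort_words are hypothetical, and the ordering rule (case-insensitive first, uppercase before lowercase on a tie) is a guess based on the sample output in the question.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* qsort hands the comparator pointers to the array elements,
   and the elements here are themselves char pointers. */
static int compare_words(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;
    size_t i = 0;

    /* First pass: compare while ignoring case. */
    for (; a[i] != '\0' && b[i] != '\0'; i++) {
        int ca = tolower((unsigned char)a[i]);
        int cb = tolower((unsigned char)b[i]);
        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    if (a[i] != '\0')
        return 1;                 /* a is longer, so it sorts after b */
    if (b[i] != '\0')
        return -1;

    /* Same letters: fall back to byte order, which puts
       uppercase first in ASCII ("Apple" before "apple"). */
    return strcmp(a, b);
}

/* Sort an array of count string pointers in place. */
void sort_words(char **words, size_t count)
{
    qsort(words, count, sizeof words[0], compare_words);
}
```

With the bag from the snippet above, the call would be sort_words(myBag->storage, myBag->usedSlotCount), followed by a loop that prints each entry.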
I am trying to read some text into a structure. One member of the structure is an array that should take part of the text.
For example my structure is:
struct animal
{
char animal_Type[11];
int age;
int numberOfLegs;
int walksPerDay;
char favoriteFood[];
};
I will then have input such as:
dog,2,4,2,biscuits,wet
cat,5,4,0,biscuits,wet,dry,whiskers
bird,1,2,0,birdseed,biscuits,bread,oats,worms,insects,crackers
I have a working solution that places all the values up to walks per day into the structure. However, I want to be able to place the food items into favoriteFood. I have a dynamic array for this, but I'm not sure how to read the remaining text into the favoriteFood array.
The code used is:
fp = fopen("animals.txt","r");
struct animal *animal = malloc(sizeof(struct animal)*3);
int i = 0;
if(fp != NULL) {
while(i < 3) {
fscanf(fp,"%s %d %d %d %s",
animal[i].animal_Type,
&animal[i].age,
&animal[i].numberOfLegs,
&animal[i].walksPerDay,
animal[i].favoriteFood); // need to be able to enter the string of food into here
i++;
}
}
How would I go about doing this?
First off, your struct doesn't match what you've said in the comments.
char favoriteFood[];
The above is an array of char, so couldn't possibly hold a list of favourite foods except if it were one string. And since the size of the array is unspecified, you'd not be able to fill it like you have been either. Instead what you actually want is
char **favoriteFood;
unsigned int favoriteFoodSize;
That will let you create an expanding list of strings to fit whatever data you need to accommodate.
As for reading it in, the best way would be to read the entire line in using fgets and then use something like strtok to break the line up by your separator character. First define a very large string to hold the entire line and a char * to hold each field.
char buffer[1024];
char *token;
And then the main loop would be something like this:
while(fgets(buffer,1024,fp)) {
token=strtok(buffer,",");
strcpy(beasts[i].animal_Type,token);
token=strtok(NULL,",");
beasts[i].age = atoi(token);
/* etc... */
}
You'd need to check whether token is ever NULL to cope with the possibility of short lines and handle it accordingly. Also make sure that the string copied into animal_Type isn't longer than 10 characters, or alternatively make it a char * so you can have a string of any size.
For the favoriteFood, you'll need to use realloc to increase the size of it to accommodate each new food added and keep going through the string until you run out of tokens.
token=strtok(NULL,",");
if(token) {
beasts[i].favoriteFood=malloc(sizeof(char *));
beasts[i].favoriteFood[0]=strdup(token); // Need to index using 0 as favoriteFoodSize won't have a value yet
beasts[i].favoriteFoodSize=1;
token=strtok(NULL,",");
while(token) {
beasts[i].favoriteFood=realloc(beasts[i].favoriteFood,(beasts[i].favoriteFoodSize+1)*sizeof(char *));
beasts[i].favoriteFood[beasts[i].favoriteFoodSize]=strdup(token);
beasts[i].favoriteFoodSize++;
token=strtok(NULL,",");
}
}
The last food will have a \n in it as fgets keeps it in the buffer it reads, so you could use that to tell if you've finished processing all the foods (you will also need to remove it from the last food). Or if you don't have it, you know the line was longer and you'll need to read more in. But that seems unlikely based on your sample data.
And since you're doing lots of memory allocation, you should ensure that you check the values returned to make sure you've not run out of memory.
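The allocate, copy and grow steps above can be folded into one helper that checks every allocation. This is a minimal sketch under a few assumptions: FoodList is a cut-down stand-in for the two favoriteFood members, the names food_list_add and food_list_free are hypothetical, and the copy is made with malloc and memcpy rather than strdup so that it does not depend on POSIX.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for the favoriteFood / favoriteFoodSize pair (hypothetical type). */
typedef struct {
    char **items;
    unsigned int count;
} FoodList;

/* Append a copy of food, without any trailing newline left by fgets.
   Returns 0 on success, -1 if memory runs out (the list is left unchanged). */
int food_list_add(FoodList *list, const char *food)
{
    size_t len = strlen(food);
    while (len > 0 && (food[len - 1] == '\n' || food[len - 1] == '\r'))
        len--;

    char *copy = malloc(len + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, food, len);
    copy[len] = '\0';

    /* Keep the old pointer until realloc is known to have succeeded. */
    char **grown = realloc(list->items, (list->count + 1) * sizeof(char *));
    if (grown == NULL) {
        free(copy);
        return -1;
    }
    grown[list->count] = copy;
    list->items = grown;
    list->count++;
    return 0;
}

/* Free every stored string and the array itself. */
void food_list_free(FoodList *list)
{
    for (unsigned int i = 0; i < list->count; i++)
        free(list->items[i]);
    free(list->items);
    list->items = NULL;
    list->count = 0;
}
```

Starting from a list initialized to { NULL, 0 }, each token returned by strtok for the food fields would be passed to food_list_add, and a return value of -1 tells the caller to stop and clean up.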
OK beginning C programmer here. What I'm attempting to do is create a function that populates a linked list from a text file. What I've done so far is to use fgets() and strtok() to iterate through the text file and I'm trying to load the tokenized strings into a function to populate the linked list. First off, when I use strtok, how do I capture the tokenized strings into char arrays or strings? So far, I've tried something like this:
char catID[ID_LEN+1];
char drinkType[1];
char itemName[MAX_NAME_LEN + 1];
while((fgets(line, sizeof(line), menufile)) != NULL) {
token = strtok(line, "|");
strcpy(data,strdup(token));
addCatNode(menu, catID);
printf("%s\n", catID);
i++;
while(token){
if(token)
{
strcpy(drinkType,strdup(token));
addNodeItem(&menu, drinkType);
strcpy(itemName,strdup(token));
addNodeItem(&menu, itemName);
token = strtok(NULL, "|");
}
}
}
but somehow I don't think that's the right approach. And of course when I try and load the data into the addNodeItem() function, whose prototype I've written like this:
void addNodeItem(BCSType* menu, char *nodeitem);
and try and add the item using this notation:
category->nodeitem
the compiler tells me that there is no member named 'nodeitem' in the struct. Of course there isn't, but I'm trying to load the name from the strtok() part, so how do I get the addNodeItem() function to recognize the name that I'm trying to pass into it? Very confused here.
The "category" struct in the linked list looks like this:
typedef struct category
{
char categoryID[ID_LEN + 1];
char categoryName[MAX_NAME_LEN + 1];
char drinkType; /* (H)ot or (C)old. */
char categoryDescription[MAX_DESC_LEN + 1];
CategoryTypePtr nextCategory;
ItemTypePtr headItem;
unsigned numItems;
} CategoryType;
There are several issues here, but for starters, strcpy(drinkType, strdup(token)) doesn't make a lot of sense.
token is a pointer to a portion of your input string, with the '|' separator replaced by a NUL character ('\0').
strdup allocates strlen(token) + 1 bytes of memory and copies the contents of token into it. So far so good. It returns the address of the new memory, which you don't store anywhere, so you can never free() it. This is a leak, and you could eventually run out of memory.
strcpy(drinkType,strdup(token)) copies that new memory into the memory pointed to by drinkType. That's only 1 character, which strcpy needs for the terminating '\0' alone, so any non-empty token overflows it. And there could be anything in the file you are loading. This is a bug waiting to happen.
And then it seems like the addNodeItem() function is missing something. You are passing a value that represents one of the possible values in the structure, with no way of specifying which one. You would have better luck creating a local copy of CategoryType, assigning all the information from the tokenizer, and then copying the whole thing into a new node.
outline of algorithm:
while lines:
CategoryType newCat
tokenize line
copy tokens into correct members of newCat, using `strncpy` to ensure no overruns.
add newCat to linked List.
For the last step, you need to duplicate newCat before adding it, because it will be overwritten when you process the next line. You can either pass a copy to addNodeItem, or have addNodeItem make the copy. (There are other options, but these are probably the most straightforward.)
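The outline could look roughly like this in C. Treat it as a sketch under stated assumptions: the line layout "ID|type|name|description" is a guess, since the question does not show the file; the three size constants are placeholders; the struct is a trimmed copy of CategoryType; and parse_category, add_category and copy_field are hypothetical names. Note that strncpy on its own does not add the terminating '\0' when the source is too long, so the helper adds it explicitly.

```c
#include <stdlib.h>
#include <string.h>

#define ID_LEN 5            /* assumed sizes: the question does not give them */
#define MAX_NAME_LEN 25
#define MAX_DESC_LEN 250

/* Trimmed version of the struct from the question. */
typedef struct category {
    char categoryID[ID_LEN + 1];
    char categoryName[MAX_NAME_LEN + 1];
    char drinkType;                         /* (H)ot or (C)old */
    char categoryDescription[MAX_DESC_LEN + 1];
    struct category *nextCategory;
} CategoryType;

/* Bounded copy that always terminates the destination. */
static void copy_field(char *dst, size_t dstSize, const char *src)
{
    strncpy(dst, src, dstSize - 1);
    dst[dstSize - 1] = '\0';
}

/* Tokenize line in place and fill *out.
   Returns 0 on success, -1 if a field is missing.
   strtok skips empty fields, so "a||b" would be read as two tokens. */
int parse_category(char *line, CategoryType *out)
{
    char *id   = strtok(line, "|");
    char *type = strtok(NULL, "|");
    char *name = strtok(NULL, "|");
    char *desc = strtok(NULL, "|\n");       /* also stops at fgets' newline */
    if (id == NULL || type == NULL || name == NULL || desc == NULL)
        return -1;

    memset(out, 0, sizeof *out);
    copy_field(out->categoryID, sizeof out->categoryID, id);
    out->drinkType = type[0];
    copy_field(out->categoryName, sizeof out->categoryName, name);
    copy_field(out->categoryDescription, sizeof out->categoryDescription, desc);
    return 0;
}

/* Copy *cat into a freshly allocated node and push it on the front of the list.
   Returns 0 on success, -1 if memory runs out. */
int add_category(CategoryType **head, const CategoryType *cat)
{
    CategoryType *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    *node = *cat;                           /* whole-struct copy */
    node->nextCategory = *head;
    *head = node;
    return 0;
}
```

Because add_category makes its own copy, the caller can reuse the same local CategoryType for every line read by fgets.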
I want to use libjudy to build some data structures to store information keyed on fixed length byte arrays, which means that I need to use the JudyHS structures. Based on my understanding of the code and the documentation, a key will only be able to access an element consisting of a single machine word, which would be fine since I want to save a pointer to a struct allocated on the heap; however, there is a problem in that there appears to be no way to iterate over the previously stored elements, and the macro used to deallocate the structure (JHSFA), calls free() on the memory used to store the word of data, but provides no mechanism to allow the calling code to deallocate memory that the word points to. I verified that JHSFA doesn't deallocate user supplied memory using valgrind and the following example code:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <Judy.h>
int
main(
const int argc,
const char *argv[]
)
{
Pvoid_t table = (PWord_t)NULL;
const size_t allocSize = sizeof("bar") + 1;
char *bar = calloc(1, allocSize);
strncpy(bar, "bar", allocSize);
uint64_t key = (uint64_t)UINT32_MAX + 1; /* cast first so the addition does not wrap to 0 */
PWord_t entry;
JHSI(entry, table, &key, sizeof(key));
*entry = (Word_t)bar;
entry = NULL;
JHSG(entry, table, &key, sizeof(key));
if (!strncmp(bar, (const char *)(*entry), allocSize)) {
printf("match\n");
}
else {
printf("no match\n");
}
Word_t result;
JHSFA(result, table);
}
Given that this is the case, can some other libjudy user out there point me to a way to avoid a memory leak if this data structure is the only place where I store the data?
JudyHS can't be iterated over. Therefore the typical solution of looping over the entries and freeing the values isn't directly possible. Your options are:
1) Use JudySL and restrict your keys to not having a NUL byte in them. If you need NUL bytes, then you might also consider converting the keys to an escaped format (i.e. one where the NUL byte is a multi-byte escape sequence).
2) Combine the JudyHS with another ADT that you can iterate over. This ranges in complexity depending on your use case. If you add and remove things from the JudyHS, then it's doubtful this can be done in a way that is more efficient than 1). If you only add entries, then a simple linked list, array of pointers, or JudyL will work.
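For the add-only case in option 2, the companion structure can be as small as a growable array of the value pointers. The sketch below is plain C and deliberately contains no Judy calls, so it compiles on its own. PtrList, ptr_list_push and ptr_list_free_all are hypothetical names. The assumed usage is that each pointer stored through JHSI is also pushed here, and ptr_list_free_all is called just before JHSFA.

```c
#include <stdlib.h>

/* Growable array of heap pointers that also live as values in the hash (hypothetical type). */
typedef struct {
    void **items;
    size_t count;
    size_t capacity;
} PtrList;

/* Remember p so it can be freed later.
   Returns 0 on success, -1 if memory runs out (the list is left unchanged). */
int ptr_list_push(PtrList *list, void *p)
{
    if (list->count == list->capacity) {
        size_t newCapacity = list->capacity ? list->capacity * 2 : 8;
        void **grown = realloc(list->items, newCapacity * sizeof(void *));
        if (grown == NULL)
            return -1;
        list->items = grown;
        list->capacity = newCapacity;
    }
    list->items[list->count++] = p;
    return 0;
}

/* Free every remembered pointer, then the array itself.
   Returns how many pointers were freed. */
size_t ptr_list_free_all(PtrList *list)
{
    size_t freed = list->count;
    for (size_t i = 0; i < list->count; i++)
        free(list->items[i]);
    free(list->items);
    list->items = NULL;
    list->count = 0;
    list->capacity = 0;
    return freed;
}
```

This only works while nothing is removed from the hash. Once entries can be deleted, the side array would need its own way to drop the matching pointer, which is where the extra complexity mentioned above comes from.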
I'm trying to split a char* into an array of char* in C.
I'm used to programming in object-oriented Java / PHP. I know several easy ways to do that in those languages, but in C... I'm totally lost. I often get segfaults for hours x)
I'm using TinyXML and getting info from XML File.
Here's the struct where we find the array.
const int MAX_GATES = 64;
typedef struct {
char *name;
char *firstname;
char *date;
char *id;
char *gates[MAX_GATES];
} UserInfos;
And here's where I fill this struct :
UserInfos * infos = (UserInfos*)malloc(1024);
infos->firstname = (char*)malloc(256);
infos->name = (char*)malloc(128);
infos->id = (char*)malloc(128);
infos->date = (char*)malloc(128);
sprintf(infos->firstname, "%s", card->FirstChild("firstname")->FirstChild()->Value());
sprintf(infos->name, "%s", card->FirstChild("name")->FirstChild()->Value());
sprintf(infos->date, "%s", card->FirstChild("date")->FirstChild()->Value());
sprintf(infos->id, "%s", card->FirstChild("filename")->FirstChild()->Value());
////////////////////////
// Gates
char * gates = (char*) card->FirstChild("gates")->FirstChild()->Value();
//////////////////////////
The only problem is on 'gates'.
The input from the XML looks like "gate1/gate2/gate3", or is sometimes just blank.
I want gate1 to be in infos->gates[0] ; etc.
I want to be able to list the gates array afterwards..
I always have a segfault when I try.
Btw, I don't really know how to initialize this array of pointers. I always initialize all gates[i] to NULL, but it seems that I get a segfault when I do
for(int i=0;i
Thanks for all.
It's OK when I've only pointers but when String(char*) / Arrays / Pointers are mixed.. I can't manage =P
I saw too that we can use something like
int *myArray = calloc(NbOfRows, NbOfRows*sizeof(int));
Why should we declare an array like that.. ? x)
Thanks!
The problem that people frequently have with XML is that they assume all the elements are available. That's not always safe. Thus this statement:
sprintf(infos->firstname, "%s", card->FirstChild("firstname")->FirstChild()->Value());
isn't safe, because you don't actually know whether all of those
functions return valid objects. You really need something
like the following (which is not optimized for speed: I don't
know the TinyXML type returned at each point, so I am
not storing the results once and am instead calling each function
multiple times):
if (card->FirstChild("firstname") &&
card->FirstChild("firstname")->FirstChild()) {
sprintf(infos->firstname, "%s", card->FirstChild("firstname")->FirstChild()->Value());
}
And then, to protect against buffer overflows from the data you should
really be doing:
if (card->FirstChild("firstname") &&
card->FirstChild("firstname")->FirstChild()) {
/* firstname is a char *, so sizeof would give the size of the pointer; use the 256 bytes that were allocated */
infos->firstname[256-1] = '\0';
snprintf(infos->firstname, 256, "%s", card->FirstChild("firstname")->FirstChild()->Value());
}
Don't you just love error handling?
As to your other question:
I saw too that we can use something like int *myArray =
calloc(NbOfRows, NbOfRows*sizeof(int)); Why should we declare an array
like that.. ? x)
calloc initializes the memory it returns to 0, unlike malloc.
If you look above at where I set the end of the buffer to '\0' (which is
actually 0), that's because malloc returns a buffer with indeterminate
(possibly non-zero) data in it. calloc sets the entire buffer
to all 0s first, which is generally safer.
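To make the difference concrete, here is a small sketch that combines the two points: a zero-filled buffer from calloc and a bounded copy with snprintf. The helper name copy_value and the constant FIRSTNAME_SIZE are hypothetical; 256 is simply the size the question allocates for firstname. The size is passed in explicitly because sizeof applied to a char * gives the size of the pointer, not of the buffer it points to.

```c
#include <stdio.h>
#include <stdlib.h>

#define FIRSTNAME_SIZE 256   /* assumed: same size the question allocates */

/* Allocate a zero-filled buffer of size bytes and copy value into it,
   truncating if it does not fit. A NULL value leaves the buffer as an
   empty string. Returns NULL if memory runs out; the caller frees the result. */
char *copy_value(const char *value, size_t size)
{
    char *buf = calloc(size, 1);          /* every byte starts out as '\0' */
    if (buf == NULL || value == NULL)
        return buf;
    snprintf(buf, size, "%s", value);     /* writes at most size - 1 characters plus '\0' */
    return buf;
}
```

A call such as copy_value(text, FIRSTNAME_SIZE), where text is whatever Value() returned after the NULL checks above, would then replace the separate malloc and sprintf steps.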