So I have the following project i need to work on for school. It involves a server/client communicating where the client sends requests to the server to get certain information. The server gets the req, parses it, and then sends a response based on the type of the request. For example:
GET /apple/fn/tom/ln/sawyer/a/25/id/1234 : This is a request to get the info for the following person who works at the Apple company (/apple):
fn (first name): Tom
ln (last name): Sawyer
a (age): 25
id (ID): 1234
Now the server should accept input, parse it, and return the information requested from its own database. implementing the server/client is not an issue. I need to know the best way to implement the algorithm to deal with input since not all requests would look like the one above. Other examples of requests:
GET /apple/i: This should return info about Apple company (i.e. address, phone number)
GET /apple/e: return number of employees in Apple company
GET /apple/e/id/1234: return info of employee in Apple company with the following id=1234 (which in our example would be Tom Sawyer) i.e. return first name, last name, age, address.
GET /apple/fn/tom/ln/sawyer/a/25/id/1234 : discussed above
SET /apple/fn/tom/ln/sawyer/a/25/id/5678 : update id of this employee to 5678
SET /apple/fn/tom/ln/sawyer/a/23 : update age of this employee to 23
...
I will have to implement different structs for req/response (i.e. a struct for req and another for response) as well as a different function for each of the different request/responses. BUT what is the best way to deal with parsing input && decide which function to send it to? I was told to look into using a parser-generator like Bison but to my understanding this would only help parse the input and break it up into pieces which is not that hard for me since I know I always have the "/" between fields so I can use the function:
strtok( input, "/" );
So main issue I have is how to decide where to send each request. So Assuming I have the following functions:
struct GetEmployeeInfoReq
{
char *fn;
char *ln;
int age;
};
struct GetEmployeeInfoResp
{
int house_num;
int street_num;
char *street_name;
char * postal_code;
int years_worked_here;
};
void GetEmployeeInfo( struct GetEmployeeInfoResp *resp, struct GetEmployeeInfoReq *req );
struct GetCompanyInfoReq
{
...
}
struct GetCompanyInfoResp
{
...
}
void GetCompanyInfo( struct GetCompanyInfoResp *resp, struct GetCompanyInfoReq *req );
Now I know that to call the first function I need the following request:
GET /apple/fn/tom/ln/sawyer/a/25/id/1234
and to call 2nd function I need the following:
GET /apple/i
My question is how to get this done? Off the top of my mind I'm thinking defining a variable for each possible field in the input req and using that so if this is my request:
GET /apple/e/id/1234
then I would have the following values defined and set to true:
bool is_apple = true;
bool is_employee = true;
bool is_id = true;
After I know that this request is:
GET /apple/e/id/<id_num>
AND NOT
GET /apple/e
so i can send it to the proper function.
Is this the correct approach as I'm lost on how to tackle this issue.
Thanks,
Get yourself a large piece of paper and make a diagram about the logic to get a grammar (not actually neede here, you could just parse it, but being a school assignment I assume that it is meant to be build up upon).
Some observations
every input starts with a /
every entry ends with a / except the last one
entries consist of characters and/or digits
empty entries gets you in trouble if you insist on using strtok (thanks to wildplasser, I missed that)
order matters?
Entries are
either key (one entry) or key/value (two consecutive entries).
In other words: the may or may not have an argument
Entries with a meaning
first entry is the company name (what do you do if that's all? Check assignment)
next entry is either
i print company info
e
without arguments: print #employees
with argument: print information about argument, argument must be correct
fn first name as a key, must have a value because of strtok, argument must be a string
ln last name as a key, must have a value because of strtok, argument must be a string
id id as a key, must have a value because of strtok, argument must be a string of digits (an integer so to say but it is still a string at that point) only
a age as a key, must have a value because of strtok, argument must be a string of digits (an integer so to say but it is still a string at that point) only
Examples:
/apple/e
/ start of input
apple company name. Must have an argument, so go on
e something with employers, my or may not have an argument so check.
EOI (end of input) that means that e has no arguments, so print the number of employees.
/apple/id/134
/ start of input
apple company name. Must have an argument, so go on
e something with employers, my or may not have an argument so check.
id is a key and must have a value, that means we need an argument, so check
1234 all digits which is correct value for id
EOI (end of input) no other things, so print the information you have about id-1234
/apple/fn/tom/ln/sawyer/a/25/id/1234
/ start of input
apple company name. Must have an argument, so go on
fn is a key, must have a value, check
tom the value for fn must be a string which it is, safe
ln is a key, must have a value, check
sawyer the value for ln must be a string which it is, safe
a is a key, must have a value, check
25 the value for ln must be a string of digits which it is, safe
id is a key, must have a value, check
1234 the value for ln must be a string of digits which it is, safe
EOI (end of input). The key for the database entry seems to be the name fn and ln, so read the entry tomsawyer and check if any of id or a is different and change accordingly. If nothing is different: check your assignment.
It might be good idea to build a struct with all of the informations and fill it while parsing (where I wrote "safe"). You need two main functions printIt(keys) and changeIt(key, value). Some things can be done immediately, like in my first example, some need further ado.
That's it, should be straightforward to implement.
EDIT a short example
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// ALL CHECKS OMMITTED!
#define DELIMITER '/'
// should resemble a row from the DB
typedef struct company {
char *comp_name;
unsigned int num_empl;
unsigned int empl_id;
unsigned int empl_age;
char *first_name;
char *last_name;
} company;
int main(int argc, char **argv)
{
company *entries;
char *input, *token;
size_t ilen;
if (argc < 2) {
fprintf(stderr, "Usage: %s stringtoparse \n", argv[0]);
exit(EXIT_FAILURE);
}
// work on copy
ilen = strlen(argv[1]);
input = malloc(ilen + 1);
strcpy(input, argv[1]);
entries = malloc(sizeof(company));
// skip first delimiter
if (*input == DELIMITER) {
input++;
ilen--;
}
token = strtok(input, "/");
if (token == NULL) {
fprintf(stderr, "Usage : %s stringtoparse \n", argv[0]);
exit(EXIT_FAILURE);
}
// first entry is the company name
entries->comp_name = malloc(strlen(token));
strcpy(entries->comp_name, token);
// mark empty entries as empty
entries->first_name = NULL;
entries->last_name = NULL;
entries->empl_age = -1;
entries->empl_id = -1;
// F(23)
entries->num_empl = 28657;
// only very small part of grammar implemented for simplicity
for (;;) {
token = strtok(NULL, "/");
if (token == NULL) {
break;
}
// "e" [ "/" "id" "/" number <<EOF>> ]
if (strcmp(token, "e") == 0) {
token = strtok(NULL, "/");
if (token == NULL) {
puts("Info about number of employees wanted\n");
// pure info, pull from DB (not impl.) and stop
break;
} else {
if (strcmp(token, "id") != 0) {
fprintf(stderr, "Only \"id\" allowed after \"e\" \n");
// free all heap memory here
exit(EXIT_FAILURE);
}
token = strtok(NULL, "/");
if (token == NULL) {
fprintf(stderr, "ERROR: \"id\" needs a number \n");
// free all heap memory here
exit(EXIT_FAILURE);
}
// does not check if it really is a number, use strtol() in prod.
entries->empl_id = atoi(token);
printf("Info about employee with id %d wanted\n", entries->empl_id);
// pure info, pull from DB (not impl.) and stop
break;
}
}
// "a" "/" number
else if (strcmp(token, "a") == 0) {
token = strtok(NULL, "/");
if (token == NULL) {
fprintf(stderr, "ERROR: \"a\" needs a number \n");
// free all heap memory here
exit(EXIT_FAILURE);
}
// does not check if it actually is a number, use strtol() in prod.
entries->empl_age = atoi(token);
printf("Age given: %d\n", entries->empl_age);
}
// "id" "/" number
else if (strcmp(token, "id") == 0) {
token = strtok(NULL, "/");
if (token == NULL) {
fprintf(stderr, "ERROR: \"id\" needs a number \n");
// free all heap memory here
exit(EXIT_FAILURE);
}
// does not check if it actually is a number, use strtol() in prod.
entries->empl_id = atoi(token);
printf("ID given: %d\n", entries->empl_id);
}
// "fn" "/" string
else if (strcmp(token, "fn") == 0) {
token = strtok(NULL, "/");
if (token == NULL) {
fprintf(stderr, "ERROR: \"fn\" needs a string \n");
// free all heap memory here
exit(EXIT_FAILURE);
}
entries->first_name = malloc(strlen(token));
strcpy(entries->first_name, token);
printf("first name given: %s\n", entries->first_name);
}
// "ln" "/" string
else if (strcmp(token, "ln") == 0) {
token = strtok(NULL, "/");
if (token == NULL) {
fprintf(stderr, "ERROR: \"ln\" needs a string \n");
// free all heap memory here
exit(EXIT_FAILURE);
}
entries->last_name = malloc(strlen(token));
strcpy(entries->last_name, token);
printf("last name given: %s\n", entries->last_name);
} else {
fprintf(stderr, "ERROR: Unknown token \"%s\" \n", token);
// free all heap memory here
exit(EXIT_FAILURE);
}
}
printf("\n\nEntries:\nCompany name: %s\nFirst name: %s\nLast name: %s\n",
entries->comp_name, entries->first_name, entries->last_name);
printf("Age: %d\nID: %d\nNumber of employees: %d\n",
entries->empl_age, entries->empl_id, entries->num_empl);
/*
* At this state you have information about what is given (in "entries")
* and what is wanted.
*
* Connect to the DB.
*
* If firstnamelastname is the DB-id and in the DB, you can check if
* the given ID is the same as the one in the DB and change if not.
*
* You can do the same for age.
*
* If firstnamelastname is not in the DB but ID is given check if the
* ID is in the DB, change firstname and/or lastname if necessary and
* congratulate on the wedding (many other reasons are possible, please
* check first or it might get really embarassing)
*/
// free all heap memory here
/* Disconnect from the DB */
exit(EXIT_SUCCESS);
}
Compiled with:
gcc -g3 -std=c11 -W -Wall -pedantic jjadams.c -o jjadams
try with
./jjadams "/Apple/fn/Tom/ln/Sawyer/a/10/id/3628800"
./jjadams "/Apple/e"
./jjadams "/Apple/e/id/1234"
./jjadams "/Apple/e/1234"
Related
Hey guys I'm attempting to read in workersinfo.txt and store it into a two-dimensional char array. The file is around 4,000,000 lines with around 100 characters per line. I want to store each file line on the array. Unfortunately, I get exit code 139(Not enough memory). I'm aware I have to use malloc() and free() but I've tried a couple of things and I haven't been able to make them work.Eventually I have to sort the array by ID number but I'm stuck on declaring the array.
The file looks something like this:
First Name, Last Name,Age, ID
Carlos,Lopez,,10568
Brad, Patterson,,20586
Zack, Morris,42,05689
This is my code so far:
#include <stdio.h>
#include <stdlib.h>
int main(void) {
FILE *ptr_file;
char workers[4000000][1000];
ptr_file =fopen("workersinfo.txt","r");
if (!ptr_file)
perror("Error");
int i = 0;
while (fgets(workers[i],1000, ptr_file)!=NULL){
i++;
}
int n;
for(n = 0; n < 4000000; n++)
{
printf("%s", workers[n]);
}
fclose(ptr_file);
return 0;
}
The Stack memory is limited. As you pointed out in your question, you MUST use malloc to allocate such a big (need I say HUGE) chunk of memory, as the stack cannot contain it.
you can use ulimit to review the limits of your system (usually including the stack size limit).
On my Mac, the limit is 8Mb. After running ulimit -a I get:
...
stack size (kbytes, -s) 8192
...
Or, test the limit using:
struct rlimit slim;
getrlimit(RLIMIT_STACK, &rlim);
rlim.rlim_cur // the stack limit
I truly recommend you process each database entry separately.
As mentioned in the comments, assigning the memory as static memory would, in most implementations, circumvent the stack.
Still, IMHO, allocating 400MB of memory (or 4GB, depending which part of your question I look at), is bad form unless totally required - especially for a single function.
Follow-up Q1: How to deal with each DB entry separately
I hope I'm not doing your homework or anything... but I doubt your homework would include an assignment to load 400Mb of data to the computer's memory... so... to answer the question in your comment:
The following sketch of single entry processing isn't perfect - it's limited to 1Kb of data per entry (which I thought to be more then enough for such simple data).
Also, I didn't allow for UTF-8 encoding or anything like that (I followed the assumption that English would be used).
As you can see from the code, we read each line separately and perform error checks to check that the data is valid.
To sort the file by ID, you might consider either running two lines at a time (this would be a slow sort) and sorting them, or creating a sorted node tree with the ID data and the position of the line in the file (get the position before reading the line). Once you sorted the binary tree, you can sort the data...
... The binary tree might get a bit big. did you look up sorting algorithms?
#include <stdio.h>
// assuming this is the file structure:
//
// First Name, Last Name,Age, ID
// Carlos,Lopez,,10568
// Brad, Patterson,,20586
// Zack, Morris,42,05689
//
// Then this might be your data structure per line:
struct DBEntry {
char* last_name; // a pointer to the last name
char* age; // a pointer to the name - could probably be an int
char* id; // a pointer to the ID
char first_name[1024]; // the actual buffer...
// I unified the first name and the buffer since the first name is first.
};
// each time you read only a single line, perform an error check for overflow
// and return the parsed data.
//
// return 1 on sucesss or 0 on failure.
int read_db_line(FILE* fp, struct DBEntry* line) {
if (!fgets(line->first_name, 1024, fp))
return 0;
// parse data and review for possible overflow.
// first, zero out data
int pos = 0;
line->age = NULL;
line->id = NULL;
line->last_name = NULL;
// read each byte, looking for the EOL marker and the ',' seperators
while (pos < 1024) {
if (line->first_name[pos] == ',') {
// we encountered a devider. we should handle it.
// if the ID feild's location is already known, we have an excess comma.
if (line->id) {
fprintf(stderr, "Parsing error, invalid data - too many fields.\n");
return 0;
}
// replace the comma with 0 (seperate the strings)
line->first_name[pos] = 0;
if (line->age)
line->id = line->first_name + pos + 1;
else if (line->last_name)
line->age = line->first_name + pos + 1;
else
line->last_name = line->first_name + pos + 1;
} else if (line->first_name[pos] == '\n') {
// we encountered a terminator. we should handle it.
if (line->id) {
// if we have the id string's possition (the start marker), this is a
// valid entry and we should process the data.
line->first_name[pos] = 0;
return 1;
} else {
// we reached an EOL without enough ',' seperators, this is an invalid
// line.
fprintf(stderr, "Parsing error, invalid data - not enough fields.\n");
return 0;
}
}
pos++;
}
// we ran through all the data but there was no EOL marker...
fprintf(stderr,
"Parsing error, invalid data (data overflow or data too large).\n");
return 0;
}
// the main program
int main(int argc, char const* argv[]) {
// open file
FILE* ptr_file;
ptr_file = fopen("workersinfo.txt", "r");
if (!ptr_file)
perror("File Error");
struct DBEntry line;
while (read_db_line(ptr_file, &line)) {
// do what you want with the data... print it?
printf(
"First name:\t%s\n"
"Last name:\t%s\n"
"Age:\t\t%s\n"
"ID:\t\t%s\n"
"--------\n",
line.first_name, line.last_name, line.age, line.id);
}
// close file
fclose(ptr_file);
return 0;
}
Followup Q2: Sorting array for 400MB-4GB of data
IMHO, 400MB is already touching on the issues related to big data. For example, implementing a bubble sort on your database should be agonizing as far as performance goes (unless it's a single time task, where performance might not matter).
Creating an Array of DBEntry objects will eventually get you a larger memory foot-print then the actual data..
This will not be the optimal way to sort large data.
The correct approach will depend on your sorting algorithm. Wikipedia has a decent primer on sorting algorythms.
Since we are handling a large amount of data, there are a few things to consider:
It would make sense to partition the work, so different threads/processes sort a different section of the data.
We will need to minimize IO to the hard drive (as it will slow the sorting significantly and prevent parallel processing on the same machine/disk).
One possible approach is to create a heap for a heap sort, but only storing a priority value and storing the original position in the file.
Another option would probably be to employ a divide and conquer algorithm, such as quicksort, again, only sorting a computed sort value and the entry's position in the original file.
Either way, writing a decent sorting method will be a complicated task, probably involving threading, forking, tempfiles or other techniques.
Here's a simplified demo code... it is far from optimized, but it demonstrates the idea of the binary sort-tree that holds the sorting value and the position of the data in the file.
Be aware that using this code will be both relatively slow (although not that slow) and memory intensive...
On the other hand, it will require about 24 bytes per entry. For 4 million entries, it's 96MB, somewhat better then 400Mb and definitely better then the 4GB.
#include <stdlib.h>
#include <stdio.h>
// assuming this is the file structure:
//
// First Name, Last Name,Age, ID
// Carlos,Lopez,,10568
// Brad, Patterson,,20586
// Zack, Morris,42,05689
//
// Then this might be your data structure per line:
struct DBEntry {
char* last_name; // a pointer to the last name
char* age; // a pointer to the name - could probably be an int
char* id; // a pointer to the ID
char first_name[1024]; // the actual buffer...
// I unified the first name and the buffer since the first name is first.
};
// this might be a sorting node for a sorted bin-tree:
struct SortNode {
struct SortNode* next; // a pointer to the next node
fpos_t position; // the DB entry's position in the file
long value; // The computed sorting value
}* top_sorting_node = NULL;
// this function will free all the memory used by the global Sorting tree
void clear_sort_heap(void) {
struct SortNode* node;
// as long as there is a first node...
while ((node = top_sorting_node)) {
// step forward.
top_sorting_node = top_sorting_node->next;
// free the original first node's memory
free(node);
}
}
// each time you read only a single line, perform an error check for overflow
// and return the parsed data.
//
// return 0 on sucesss or 1 on failure.
int read_db_line(FILE* fp, struct DBEntry* line) {
if (!fgets(line->first_name, 1024, fp))
return -1;
// parse data and review for possible overflow.
// first, zero out data
int pos = 0;
line->age = NULL;
line->id = NULL;
line->last_name = NULL;
// read each byte, looking for the EOL marker and the ',' seperators
while (pos < 1024) {
if (line->first_name[pos] == ',') {
// we encountered a devider. we should handle it.
// if the ID feild's location is already known, we have an excess comma.
if (line->id) {
fprintf(stderr, "Parsing error, invalid data - too many fields.\n");
clear_sort_heap();
exit(2);
}
// replace the comma with 0 (seperate the strings)
line->first_name[pos] = 0;
if (line->age)
line->id = line->first_name + pos + 1;
else if (line->last_name)
line->age = line->first_name + pos + 1;
else
line->last_name = line->first_name + pos + 1;
} else if (line->first_name[pos] == '\n') {
// we encountered a terminator. we should handle it.
if (line->id) {
// if we have the id string's possition (the start marker), this is a
// valid entry and we should process the data.
line->first_name[pos] = 0;
return 0;
} else {
// we reached an EOL without enough ',' seperators, this is an invalid
// line.
fprintf(stderr, "Parsing error, invalid data - not enough fields.\n");
clear_sort_heap();
exit(1);
}
}
pos++;
}
// we ran through all the data but there was no EOL marker...
fprintf(stderr,
"Parsing error, invalid data (data overflow or data too large).\n");
return 0;
}
// read and sort a single line from the database.
// return 0 if there was no data to sort. return 1 if data was read and sorted.
int sort_line(FILE* fp) {
// allocate the memory for the node - use calloc for zero-out data
struct SortNode* node = calloc(sizeof(*node), 1);
// store the position on file
fgetpos(fp, &node->position);
// use a stack allocated DBEntry for processing
struct DBEntry line;
// check that the read succeeded (read_db_line will return -1 on error)
if (read_db_line(fp, &line)) {
// free the node's memory
free(node);
// return no data (0)
return 0;
}
// compute sorting value - I'll assume all IDs are numbers up to long size.
sscanf(line.id, "%ld", &node->value);
// heap sort?
// This is a questionable sort algorythm... or a questionable implementation.
// Also, I'll be using pointers to pointers, so it might be a headache to read
// (it's a headache to write, too...) ;-)
struct SortNode** tmp = &top_sorting_node;
// move up the list until we encounter something we're smaller then us,
// OR untill the list is finished.
while (*tmp && (*tmp)->value <= node->value)
tmp = &((*tmp)->next);
// update the node's `next` value.
node->next = *tmp;
// inject the new node into the tree at the position we found
*tmp = node;
// return 1 (data was read and sorted)
return 1;
}
// writes the next line in the sorting
int write_line(FILE* to, FILE* from) {
struct SortNode* node = top_sorting_node;
if (!node) // are we done? top_sorting_node == NULL ?
return 0; // return 0 - no data to write
// step top_sorting_node forward
top_sorting_node = top_sorting_node->next;
// read data from one file to the other
fsetpos(from, &node->position);
char* buffer = NULL;
ssize_t length;
size_t buff_size = 0;
length = getline(&buffer, &buff_size, from);
if (length <= 0) {
perror("Line Copy Error - Couldn't read data");
return 0;
}
fwrite(buffer, 1, length, to);
free(buffer); // getline allocates memory that we're incharge of freeing.
return 1;
}
// the main program
int main(int argc, char const* argv[]) {
// open file
FILE *fp_read, *fp_write;
fp_read = fopen("workersinfo.txt", "r");
fp_write = fopen("sorted_workersinfo.txt", "w+");
if (!fp_read) {
perror("File Error");
goto cleanup;
}
if (!fp_write) {
perror("File Error");
goto cleanup;
}
printf("\nSorting");
while (sort_line(fp_read))
printf(".");
// write all sorted data to a new file
printf("\n\nWriting sorted data");
while (write_line(fp_write, fp_read))
printf(".");
// clean up - close files and make sure the sorting tree is cleared
cleanup:
printf("\n");
fclose(fp_read);
fclose(fp_write);
clear_sort_heap();
return 0;
}
I am writing a clone of find while learning C. When implementing the -ls option I've stumbled upon a problem that the getpwuid_r and getgrgid_r calls are really slow, the same applies to getpwuid and getgrgid. I need them to display the user/group names from ids provided by stat.h.
For example, listing the whole filesystem gets 3x slower:
# measurements were made 3 times and the fastest run was recorded
# with getgrgid_r
time ./myfind / -ls > list.txt
real 0m4.618s
user 0m1.848s
sys 0m2.744s
# getgrgid_r replaced with 'return "user";'
time ./myfind / -ls > list.txt
real 0m1.437s
user 0m0.572s
sys 0m0.832s
I wonder how GNU find maintains such a good speed. I've seen the sources, but they are not exactly easy to understand and to apply without special types, macros etc:
time find / -ls > list.txt
real 0m1.544s
user 0m0.884s
sys 0m0.648s
I thought about caching the uid - username and gid - groupname pairs in a data structure. Is it a good idea? How would you implement it?
You can find my complete code here.
UPDATE:
The solution was exactly what I was looking for:
time ./myfind / -ls > list.txt
real 0m1.480s
user 0m0.696s
sys 0m0.736s
Here is a version based on getgrgid (if you don't require thread safety):
char *do_get_group(struct stat attr) {
struct group *grp;
static unsigned int cache_gid = UINT_MAX;
static char *cache_gr_name = NULL;
/* skip getgrgid if we have the record in cache */
if (cache_gid == attr.st_gid) {
return cache_gr_name;
}
/* clear the cache */
cache_gid = UINT_MAX;
grp = getgrgid(attr.st_gid);
if (!grp) {
/*
* the group is not found or getgrgid failed,
* return the gid as a string then;
* an unsigned int needs 10 chars
*/
char group[11];
if (snprintf(group, 11, "%u", attr.st_gid) < 0) {
fprintf(stderr, "%s: snprintf(): %s\n", program, strerror(errno));
return "";
}
return group;
}
cache_gid = grp->gr_gid;
cache_gr_name = grp->gr_name;
return grp->gr_name;
}
getpwuid:
char *do_get_user(struct stat attr) {
struct passwd *pwd;
static unsigned int cache_uid = UINT_MAX;
static char *cache_pw_name = NULL;
/* skip getpwuid if we have the record in cache */
if (cache_uid == attr.st_uid) {
return cache_pw_name;
}
/* clear the cache */
cache_uid = UINT_MAX;
pwd = getpwuid(attr.st_uid);
if (!pwd) {
/*
* the user is not found or getpwuid failed,
* return the uid as a string then;
* an unsigned int needs 10 chars
*/
char user[11];
if (snprintf(user, 11, "%u", attr.st_uid) < 0) {
fprintf(stderr, "%s: snprintf(): %s\n", program, strerror(errno));
return "";
}
return user;
}
cache_uid = pwd->pw_uid;
cache_pw_name = pwd->pw_name;
return pwd->pw_name;
}
UPDATE 2:
Changed long to unsigned int.
UPDATE 3:
Added the cache clearing. It is absolutely necessary, because pwd->pw_name may point to a static area. getpwuid can overwrite its contents if it fails or simply when executed somewhere else in the program.
Also removed strdup. Since the output of getgrgid and getpwuid should not be freed, there is no need to require free for our wrapper functions.
The timings indeed indicate a strong suspicion on these functions.
Looking at your function do_get_group, there are some issues:
You use sysconf(_SC_GETPW_R_SIZE_MAX); for every call to do_get_group and do_get_user, definitely cache that, it will not change during the lifetime of your program, but you will not gain much.
You use attr.st_uid instead of attr.st_gid, which probably causes the lookup to fail for many files, possibly defeating the cacheing mechanism, if any. Fix this first, this is a bug!
You return values that should not be passed to free() by the caller, such as grp->gr_name and "". You should always allocate the string you return. The same issue is probably present in do_get_user().
Here is a replacement for do_get_group with a one shot cache. See if this improves the performance:
/*
* #brief returns the groupname or gid, if group not present on the system
*
* #param attr the entry attributes from lstat
*
* #returns the groupname if getgrgid() worked, otherwise gid, as a string
*/
char *do_get_group(struct stat attr) {
char *group;
struct group grp;
struct group *result;
static size_t length = 0;
static char *buffer = NULL;
static gid_t cache_gid = -1;
static char *cache_gr_name = NULL;
if (!length) {
/* only allocate the buffer once */
long sysconf_length = sysconf(_SC_GETPW_R_SIZE_MAX);
if (sysconf_length == -1) {
sysconf_length = 16384;
}
length = (size_t)sysconf_length;
buffer = calloc(length, 1);
}
if (!buffer) {
fprintf(stderr, "%s: malloc(): %s\n", program, strerror(errno));
return strdup("");
}
/* check the cache */
if (cache_gid == attr.st_gid) {
return strdup(cache_gr_name);
}
/* empty the cache */
cache_gid = -1;
free(cache_gr_name);
cache_gr_name = NULL;
if (getgrgid_r(attr.st_gid, &grp, buffer, length, &result) != 0) {
fprintf(stderr, "%s: getpwuid_r(): %s\n", program, strerror(errno));
return strdup("");
}
if (result) {
group = grp.gr_name;
} else {
group = buffer;
if (snprintf(group, length, "%ld", (long)attr.st_gid) < 0) {
fprintf(stderr, "%s: snprintf(): %s\n", program, strerror(errno));
return strdup("");
}
}
/* load the cache */
cache_gid = attr.st_gid;
cache_gr_name = strdup(group);
return strdup(group);
}
Whether the getpwuid and getgrgid calls are cached depends on how they are implemented and on how your system is configured. I recently wrote an implementation of ls and ran into a similar problem.
I found that on all modern systems I tested, the two functions are uncached unless you run the name service caching dæmon (nscd) in which case nscd makes sure that the cache stays up to date. It's easy to understand why this happens: Without an nscd, caching the information could lead to outdated output which is a violation of the specification.
I don't think you should rely on these functions caching the group and passwd databases because they often don't. I implemented custom caching code for this purpose. If you don't require to have up-to-date information in case the database contents change during program execution, this is perfectly fine to do.
You can find my implementation of such a cache here. I'm not going to publish it on Stack Overflow as I do not desire to publish the code under the MIT license.
This question has been asked multiple times, but I've done (from what I can tell) everything that's been mentioned here. Basically, I'm getting 1 character at a time from a TCP socket and I'm building a dynamically growing string with the one character. I can do looping prints and see that the string grows and grows, and then when it gets to 20 characters long, the program crashes.
while(FilterAmount != 0)
{
g_io_channel_read_chars (source,(gchar *) ScanLine,1,&BytesRead,&GlibError);
printf("Scanline: %s FilterAmount: %ld\n", ScanLine, FilterAmount);
//the filters are delimited by \n, munch these off, reset important variables, save the last filter which is complete
if(ScanLine[0] == FilterTerminator[0]) {
//if the Filter Name actually has a filter in it
if(FilterName != NULL){
FilterArray = FilterName; //save off the filter name
printf("This is the filter name just added: %s\n", FilterName);
FilterArray++; //increment the pointer to point to the next memory location.
FilterAmount--; //update how many filters we have left
FilterNameCount = 0; //reset how many characters each filter name is
free(FilterName);
free(FilterTmp);
}
}
else {
printf("else!\n");
//keep track of the string length of the filter
FilterNameCount++;
//allocate more memory in the string used to store the filter name + null terminating character
FilterTmp = (gchar*)realloc(FilterName, FilterNameCount*sizeof(char) + 1);
if(FilterTmp == NULL)
{
free(FilterName);
printf("Error reallocating memory for the filter name temporary variable!");
return 1;
}
FilterName = FilterTmp;
printf("filter name: %s\n", FilterName);
//concat the character to the end of the string where space was just made for it.
strcat(FilterName, ScanLine);
}
}
}
This section of code loops and loops whenever we have a non "\n" character in a buffer we're reading data into. The program crashes when allocating the 21st character's location every single time. Here are the pertinent declarations:
static gchar *FilterName = NULL, *FilterTmp = NULL;
static gchar ScanLine[9640];
I have an array of structs and they get saved into a file. Currently there are two lines in the file:
a a 1
b b 2
I am trying to read in the file and have the data saved to the struct:
typedef struct book{
char number[11];//10 numbers
char first[21]; //20 char first/last name
char last[21];
} info;
info info1[500]
into num = 0;
pRead = fopen("phone_book.dat", "r");
if ( pRead == NULL ){
printf("\nFile cannot be opened\n");
}
else{
while ( !feof(pRead) ) {
fscanf(pRead, "%s%s%s", info1[num].first, info1[num].last, info1[num].number);
printf{"%s%s%s",info1[num].first, info1[num].last, info1[num].number); //this prints statement works fine
num++;
}
}
//if I add a print statement after all that I get windows directory and junk code.
This makes me think that the items are not being saved into the struct. Any help would be great. Thanks!
EDIT: Okay so it does save it fine but when I pass it to my function it gives me garbage code.
When I call it:
sho(num, book);
My show function:
void sho (int nume, info* info2){
printf("\n\n\nfirst after passed= %s\n\n\n", info2[0].first); //i put 0 to see the first entry
}
I think you meant int num = 0;, instead of into.
printf{... is a syntax error, printf(... instead.
Check the result of fscanf, if it isn't 3 it hasn't read all 3 strings.
Don't use (f)scanf to read strings, at least not without specifying the maximum length:
fscanf(pRead, "%10s%20s%20s", ...);
But, better yet, use fgets instead:
fgets(info1[num].first, sizeof info1[num].first, pRead);
fgets(info1[num].last, sizeof info1[num].last, pRead);
fgets(info1[num].number, sizeof info1[num].number, pRead);
(and check the result of fgets, of course)
Make sure num doesn't go higher than 499, or you'll overflow info:
while(num < 500 && !feof(pRead)){.
1.-For better error handling, recommend using fgets(), using widths in your sscanf(), validating sscanf() results.
2.-OP usage of feof(pRead) is easy to misuse - suggest fgets().
char buffer[sizeof(info)*2];
while ((n < 500) && (fgets(buffer, sizeof buffer, pRead) != NULL)) {
char sentinel; // look for extra trailing non-whitespace.
if (sscanf(buffer, "%20s%20s%10s %c", info1[num].first,
info1[num].last, info1[num].number, &sentinel) != 3) {
// Handle_Error
printf("Error <%s>\n",buffer);
continue;
}
printf("%s %s %s\n", info1[num].first, info1[num].last, info1[num].number);
num++;
}
BTW: using %s does not work well should a space exists within a first name or within a last name.
I'm doing my final project for my algorithms course in C. For the project, we have to take an input text file that contains lines like:
P|A|0
or
E|0|1|2
The former indicates a vertex to be added to the graph we're using in the program, the 2nd token being the name of the vertex, and the last token being its index in the vertices[] array of the graph struct.
I've got a while loop going through this program line by line, it takes the first token to decide whether to make a vertex or an edge, and then proceeds accordingly.
When I finish the file traversal, I call my show_vertices function, which is just a for-loop that prints each name (g->vertices[i].name) sequentially.
The problem is that where the name should go in the output (%s), I keep getting the last "token1" I collected. In the case of the particular input file I'm using it happens to be the source node of the last edge in the list...which is odd because there are two other values passed through the strtok() function afterward. The line in the file looks like:
E|6|7|1
which creates an edge from indexes 6 to 7 with a weight of 1. The edge comes up fine. But when I call any printf with a %s, it comes up "6". Regardless.
This is the file traversal.
fgets(currLn, sizeof(currLn), infile);
maxv = atoi(currLn);
if(maxv = 0)
{
//file not formatted correctly, print error message
return;
}
t_graph *g = new_graph(maxv, TRUE);
while((fgets(currLn, sizeof(currLn), infile)) != NULL)
{
token1 = strtok(currLn, "|");
key = token1[0];
if(key == 'P' || key == 'p')
{
token1 = strtok(NULL, "|");
if(!add_vertex(g, token1))
{
//file integration fail, throw error!
return;
}
//***If I print the name here, it works fine and gives me the right name!****
continue;
}
if(key == 'E' || key == 'e')
{
token1 = strtok(NULL, "|");
token2 = strtok(NULL, "|");
token3 = strtok(NULL, "|");
src = atoi(token1);
dst = atoi(token2);
w = atoi(token3);
if(!add_edge(g, src, dst, w))
{
//file integration fail, throw error
return;
}
continue;
}
else
{
//epic error message because user doesn't know what they're doing.
return;
}
}
If I run show_vertices here, I get:
0. 6
1. 6
2. 6
etc...
You aren't copying the name. So you end up with a pointer (returned by strtok) to single static array in which you read each line. Since the name is always at offset 2, it that pointer will always be currLn+2. When you traverse and print, that will be the last name you read.
You need to strdup(token1) before passing it to (or in) add_vertex.
No there isn't enough information to be certain this is the answer. But I'll bet money this is it.