Generic SQLite Result Function? (C)

Generic SQLite Result Function? (C) - c

I'm programming server software in C, and I'll have around 8 functions that select data from an SQLite database and handle it in some way. I'm considering making one function to send any kind of SQLite SELECT command and return its results as a 2D array of pointers to integers and strings then having all the functions use that, but I'm worried that the performance cost might be so large that it's not worth the extra expandability. Reading ahead, is performance a valid concern here?
SQLite has no way of counting the rows of a result ahead of time, so I'd have to either run two queries or use a dynamically-allocated array. Not good! Here is what I have for my implementation of the latter option. I'm a newbie, so please tell me if there is anything slow, unclean, or (most importantly) unreliable about this before I commit to using it:
//emalloc and erealloc are just error-detecting malloc and realloc
ptrdiff_t** databaseSelect(char* command){ //returns null upon error
sqlite3_stmt *stmt;
int retval = -1; retval = sqlite3_prepare(db,command,sizeof(command),&stmt,NULL);
if(retval!=SQLITE_OK){
printf("Selecting data from database failed! Error: %s\n", sqlite3_errmsg(db));
return NULL;
}
int retRows = 10; //number of rows in returned array for now
ptrdiff_t** ret = emalloc(sizeof(ptrdiff_t*)*retRows); //make the array representing rows
int result = SQLITE_ROW;
int rowIndex = 0;
while (result != SQLITE_DONE && result != SQLITE_ERROR){
result = sqlite3_step (stmt);
if (result == SQLITE_ROW) { //begin row loop
if ((rowIndex+1)>retRows){ //must reallocate the outer (row) array
retRows*=2;
ret = erealloc(ret, sizeof(ptrdiff_t*)*retRows);
}
const int numCols = sqlite3_column_count(stmt);
ptrdiff_t* row = emalloc(sizeof(ptrdiff_t)*numCols); //make the array representing a row of columns
for (int colIndex = 0; colIndex<numCols; colIndex++){ //begin column loop
switch (sqlite3_column_type(stmt, colIndex)){
case SQLITE_INTEGER:{
const int col = sqlite3_column_int(stmt, colIndex);
memcpy(row[colIndex], &col, sizeof(col));
break;
}
case SQLITE_TEXT:{
const unsigned char* textResult = sqlite3_column_text(stmt, colIndex);
const size_t textSize = (strlen(textResult)+1)*sizeof(char); //edited from before
row[colIndex] = emalloc(textSize);
memcpy(row[colIndex], textResult, textSize);
break;
}
default:{
//perhaps warn about an error since it should either be integer or text
break;
}
}
} //end column loop
ret[rowIndex] = row; //add on the row to the array of rows
rowIndex++;
} //end row loop
}
sqlite3_free(stmt);
return ret;
}
… And then whatever calls this function has to loop through the 2D array and free everything once it's done with it.

For smallish results, performance does not matter.
Reallocating with a constant factor is one of the most efficient ways of creating a dynamc-sized array. Creating a linked would be faster, but not as easy to access later.
A cursor would be an object that allows access (only) to the current record, but this does not work if you need random access to all result records.
Please note that the most efficient way to determine a text value's length is sqlite3_column_bytes:
char *textResult = (char *)sqlite3_column_text(stmt, colIndex);
if (!textResult) textResult = ""; // handle NULL (if needed)
int textSize = sqlite3_column_bytes(stmt, colIndex) + 1;

Related

Can't transmate string through MariaDB connect/c Prepared Statement

I'm using "MariaDB Connector/C" for my homework, but I got a problem: I always get an empty string when I pass in a string parameter, the db table is:
MariaDB none#(none):test> SELECT * FROM t3
a
b
0
abc
1
bcd
2
af
3 rows in set
Time: 0.010s
MariaDB none#(none):test> DESC t3
Field
Type
Null
Key
Default
Extra
a
int(11)
NO
PRI
b
char(10)
YES
2 rows in set
Time: 0.011s
And the code I use to test:
#include <mysql/mysql.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
MYSQL *mysql;
mysql = mysql_init(NULL);
if (!mysql_real_connect(mysql,NULL , "none", "linux", "test", 0,"/tmp/mariadb.sock",0)){
printf( "Error connecting to database: %s",mysql_error(mysql));
} else
printf("Connected...\n");
if(mysql_real_query(mysql,"SET CHARACTER SET utf8",(unsigned int)sizeof("SET CHARACTER SET utf8"))){
printf("Failed to set Encode!\n");
}
char query_stmt_2[]="select * from t3 where b=?";
MYSQL_STMT *stmt2 = mysql_stmt_init(mysql);
if(mysql_stmt_prepare(stmt2, query_stmt_2, -1))
{
printf("STMT2 prepare failed.\n");
}
MYSQL_BIND instr_bind;
char instr[50]="abc";
my_bool in_is_null = 0;
my_bool in_error = 0;
instr_bind.buffer_type = MYSQL_TYPE_STRING;
instr_bind.buffer = &instr[0];
char in_ind = STMT_INDICATOR_NTS;
instr_bind.u.indicator = &in_ind;
unsigned long instr_len=sizeof(instr);
// instr_bind.length = &instr_len;
// instr_bind.buffer_length=instr_len;
instr_bind.is_null = &in_is_null;
instr_bind.error = &in_error;
MYSQL_BIND out_bind[2];
memset(out_bind, 0, sizeof(out_bind));
int out_int[2];
char outstr[50];
my_bool out_int_is_null[2]={0,0};
my_bool out_int_error[2]={0,0};
unsigned long out_int_length[2]={0,0};
out_bind[0].buffer = out_int+0;
out_bind[0].buffer_type = MYSQL_TYPE_LONG;
out_bind[0].is_null = out_int_is_null+0;
out_bind[0].error = out_int_error+0;
out_bind[0].length = out_int_length+0;
out_bind[1].buffer = outstr;
out_bind[1].buffer_type = MYSQL_TYPE_STRING;
out_bind[1].buffer_length = 50;
out_bind[1].is_null = out_int_is_null+1;
out_bind[1].error = out_int_error+1;
out_bind[1].length = out_int_length+1;
if(mysql_stmt_bind_param(stmt2, &instr_bind) ||
mysql_stmt_bind_result(stmt2, out_bind)){
printf("Bind error\n");
}
if(mysql_stmt_execute(stmt2))
{
printf("Exec error: %s",mysql_stmt_error(stmt2));
}
if(mysql_stmt_store_result(stmt2)){
printf("Store result error!\n");
printf("%s\n",mysql_stmt_error(stmt2));
}
while(!mysql_stmt_fetch(stmt2))
{
printf("%d\t%s\n", out_int[0], outstr);
}
mysql_stmt_close(stmt2);
end:
mysql_close(mysql);
}
I only got an empty result:
❯ ./Exec/test/stmt_test
Connected...
I have been in trouble with this for two days, and tomorrow is the deadline, I'm very anxious. Can you help? Thanks a lot!

1) General
Avoid "it was hard to write, so it should be hard to read" code
add variable declarations at the beginning of the function, not in the middle of code (Wdeclaration-after-statement)
don't use c++ comments in C
set character set with api function mysql_set_character_set()
write proper error handling, including mysql_error/mysql_stmt_error results and don't continue executing subsequent code after error.
always initialize MYSQL_BIND
2) input bind buffer
u.indicator is used for bulk operations and doesn't make sense here
bind.is_null is not required, since you specified a valid buffer address
buffer_length is not set (in comments)
3) Output bind buffer
Always bind output parameters after mysql_stmt_execute(), since mysql_stmt_prepare can't always determine the number of parameters, e.g. when calling a stored procedure: In this case mysql_stmt_bind_param will return an error.
binding an error indicator doesn't make much sense without setting MYSQL_REPORT_DATA_TRUNCATION (mysql_optionsv)
For some examples how to deal with prepared statements check the file ps.c of MariaDB Connector/C unit tests

how to charge a map in Csfml

I am making a game in CSFML for the purpose of a school exercise
In order to fit all the requirement I must a game who follows the rule of a finite runne suc as geometry dash. It does everything except a major feature: get a map from a file that will be like:
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXX2XXXXXXXXX2XXXXXXXXXXXXXXE
111111111111111111111111111111111111
X representing nothing (they will be a background that is displayed)
2 is spike
1 is the ground
E is the end, it will stop the program after displaying a victory screen
(each character will be replace by the texture who is assigned except X who represent empty space)
I had only access to few C functions ( write, free, malloc, rand, open, read,getline)
I was thinking about just reading the file and storing it as a char*, but the thing is I don't know how to make spikes appear on the screen one by one, when they must.

You need to choose a size for all of your blocks.
Each blocks (X, 2, 1, E) need to have the same size.
Example (with block 64*64px)
void display_map(char **map)
{
char *image = NULL;
int size_block = 64;
for (int i = 0; map[i] != NULL; i++) {
for (int j = 0; map[i][j] != '\0'; j++) {
switch (map[i][j]) {
case 'X':
image = "nothing";
break;
case '2':
image = "pike";
break;
// ....
}
display_at_position(i * size_block, j * size_block, image);
}
}
}

Strategy for cycling trough preexisting set of variables in c

I’m trying to program a HMI console to read a file from an USB pen drive and display its data on the screen. This is a csv file and the objective is to store the interpreted data to HMI console memory, which the HMI console later interprets. The macros on these consoles run in C (not C++).
I have no issue with both reading and interpreting the file, the issue that the existing function (not accessible to me, shown below) to write in the console memory only interprets char.
int WriteLocal( const char *type, int addr, int nRegs, void *buf , int flag );
Parameter: type is the string of "LW","LB" etc;
address is the Operation address ;
nRegs is the length of read or write ;
buf is the buffer which store the reading or writing data
flag is 0,then codetype is BIN,is 1 then codetype is BCD;
return value : 1 , Operation success
0 , Operation fail.
As my luck would have it I need to write integer values. What are available to me are the variables for each memory position. These are preexisting and are named individually such as:
int WR_LW200;
int WR_LW202;
int WR_LW204;
...
int WR_LW20n;
Ideally we could have a vector with all the names of the variables but unfortunately this is not possible. I could manually write every single variable but I need to do 300 of these…
must be a better way, right?

Just to give you a look on how it ended up looking:
int* arr[50][5] = { {&WR_LW200, &WR_LW400, &WR_LW600, &WR_LW800, &WR_LW1000},
{&WR_LW202, &WR_LW402, &WR_LW602, &WR_LW802, &WR_LW1002},
{&WR_LW204, &WR_LW404, &WR_LW604, &WR_LW804, &WR_LW1004},
{&WR_LW206, &WR_LW406, &WR_LW606, &WR_LW806, &WR_LW1006},
{&WR_LW208, &WR_LW408, &WR_LW608, &WR_LW808, &WR_LW1008},
{&WR_LW210, &WR_LW410, &WR_LW610, &WR_LW810, &WR_LW1010},
{&WR_LW212, &WR_LW412, &WR_LW612, &WR_LW812, &WR_LW1012},
{&WR_LW214, &WR_LW414, &WR_LW614, &WR_LW814, &WR_LW1014},
{&WR_LW216, &WR_LW416, &WR_LW616, &WR_LW816, &WR_LW1016},
{&WR_LW218, &WR_LW418, &WR_LW618, &WR_LW818, &WR_LW1018},
{&WR_LW220, &WR_LW420, &WR_LW620, &WR_LW820, &WR_LW1020},
{&WR_LW222, &WR_LW422, &WR_LW622, &WR_LW822, &WR_LW1022},
{&WR_LW224, &WR_LW424, &WR_LW624, &WR_LW824, &WR_LW1024},
{&WR_LW226, &WR_LW426, &WR_LW626, &WR_LW826, &WR_LW1026},
{&WR_LW228, &WR_LW428, &WR_LW628, &WR_LW828, &WR_LW1028},
{&WR_LW230, &WR_LW430, &WR_LW630, &WR_LW830, &WR_LW1030},
{&WR_LW232, &WR_LW432, &WR_LW632, &WR_LW832, &WR_LW1032},
{&WR_LW234, &WR_LW434, &WR_LW634, &WR_LW834, &WR_LW1034},
{&WR_LW236, &WR_LW436, &WR_LW636, &WR_LW836, &WR_LW1036},
{&WR_LW238, &WR_LW438, &WR_LW638, &WR_LW838, &WR_LW1038},
{&WR_LW240, &WR_LW440, &WR_LW640, &WR_LW840, &WR_LW1040},
{&WR_LW242, &WR_LW442, &WR_LW642, &WR_LW842, &WR_LW1042},
{&WR_LW244, &WR_LW444, &WR_LW644, &WR_LW844, &WR_LW1044},
{&WR_LW246, &WR_LW446, &WR_LW646, &WR_LW846, &WR_LW1046},
{&WR_LW248, &WR_LW448, &WR_LW648, &WR_LW848, &WR_LW1048},
{&WR_LW250, &WR_LW450, &WR_LW650, &WR_LW850, &WR_LW1050},
{&WR_LW252, &WR_LW452, &WR_LW652, &WR_LW852, &WR_LW1052},
{&WR_LW254, &WR_LW454, &WR_LW654, &WR_LW854, &WR_LW1054},
{&WR_LW256, &WR_LW456, &WR_LW656, &WR_LW856, &WR_LW1056},
{&WR_LW258, &WR_LW458, &WR_LW658, &WR_LW858, &WR_LW1058},
{&WR_LW260, &WR_LW460, &WR_LW660, &WR_LW860, &WR_LW1060},
{&WR_LW262, &WR_LW462, &WR_LW662, &WR_LW862, &WR_LW1062},
{&WR_LW264, &WR_LW464, &WR_LW664, &WR_LW864, &WR_LW1064},
{&WR_LW266, &WR_LW466, &WR_LW666, &WR_LW866, &WR_LW1066},
{&WR_LW268, &WR_LW468, &WR_LW668, &WR_LW868, &WR_LW1068},
{&WR_LW270, &WR_LW470, &WR_LW670, &WR_LW870, &WR_LW1070},
{&WR_LW272, &WR_LW472, &WR_LW672, &WR_LW872, &WR_LW1072},
{&WR_LW274, &WR_LW474, &WR_LW674, &WR_LW874, &WR_LW1074},
{&WR_LW276, &WR_LW476, &WR_LW676, &WR_LW876, &WR_LW1076},
{&WR_LW278, &WR_LW478, &WR_LW678, &WR_LW878, &WR_LW1078},
{&WR_LW280, &WR_LW480, &WR_LW680, &WR_LW880, &WR_LW1080},
{&WR_LW282, &WR_LW482, &WR_LW682, &WR_LW882, &WR_LW1082},
{&WR_LW284, &WR_LW484, &WR_LW684, &WR_LW884, &WR_LW1084},
{&WR_LW286, &WR_LW486, &WR_LW686, &WR_LW886, &WR_LW1086},
{&WR_LW288, &WR_LW488, &WR_LW688, &WR_LW888, &WR_LW1088},
{&WR_LW290, &WR_LW490, &WR_LW690, &WR_LW890, &WR_LW1090},
{&WR_LW292, &WR_LW492, &WR_LW692, &WR_LW892, &WR_LW1092},
{&WR_LW294, &WR_LW494, &WR_LW694, &WR_LW894, &WR_LW1094},
{&WR_LW296, &WR_LW496, &WR_LW696, &WR_LW896, &WR_LW1096},
{&WR_LW298, &WR_LW498, &WR_LW698, &WR_LW898, &WR_LW1098} };
Big right? I had consurns that this HMI would have issues with such an approach but it did the job. The code below runs trough a string that comes from the csv file. This code runs inside another while cycle to cycle trough the multi dimensional array.
it's a little crude but works.
while (i<=5)
{
memset(lineTemp, 0, sizeof lineTemp); // clear lineTemp array
while (lineFromFile[index] != delimiter)
{
if (lineFromFile[index] != delimiter && lineFromFile[index] != '\0') { lineTemp[j] = lineFromFile[index]; index++; j++; }
if (lineFromFile[index] == '\0') { i = 5; break; }
}
index++;
lineTemp[j] = '\0'; // NULL TERMINATION
j = 0;
if (i == -1) { WriteLocal("LW",temp,3,lineTemp,0); }
if (i >= 0 && i<=5) { *(arr[x][i]) = atoi(lineTemp); }
i++;
}
Thanks again for the tip.
Cheers

Two-dimensional char array too large exit code 139

Hey guys I'm attempting to read in workersinfo.txt and store it into a two-dimensional char array. The file is around 4,000,000 lines with around 100 characters per line. I want to store each file line on the array. Unfortunately, I get exit code 139(Not enough memory). I'm aware I have to use malloc() and free() but I've tried a couple of things and I haven't been able to make them work.Eventually I have to sort the array by ID number but I'm stuck on declaring the array.
The file looks something like this:
First Name, Last Name,Age, ID
Carlos,Lopez,,10568
Brad, Patterson,,20586
Zack, Morris,42,05689
This is my code so far:
#include <stdio.h>
#include <stdlib.h>
int main(void) {
FILE *ptr_file;
char workers[4000000][1000];
ptr_file =fopen("workersinfo.txt","r");
if (!ptr_file)
perror("Error");
int i = 0;
while (fgets(workers[i],1000, ptr_file)!=NULL){
i++;
}
int n;
for(n = 0; n < 4000000; n++)
{
printf("%s", workers[n]);
}
fclose(ptr_file);
return 0;
}

The Stack memory is limited. As you pointed out in your question, you MUST use malloc to allocate such a big (need I say HUGE) chunk of memory, as the stack cannot contain it.
you can use ulimit to review the limits of your system (usually including the stack size limit).
On my Mac, the limit is 8Mb. After running ulimit -a I get:
...
stack size (kbytes, -s) 8192
...
Or, test the limit using:
struct rlimit slim;
getrlimit(RLIMIT_STACK, &rlim);
rlim.rlim_cur // the stack limit
I truly recommend you process each database entry separately.
As mentioned in the comments, assigning the memory as static memory would, in most implementations, circumvent the stack.
Still, IMHO, allocating 400MB of memory (or 4GB, depending which part of your question I look at), is bad form unless totally required - especially for a single function.
Follow-up Q1: How to deal with each DB entry separately
I hope I'm not doing your homework or anything... but I doubt your homework would include an assignment to load 400Mb of data to the computer's memory... so... to answer the question in your comment:
The following sketch of single entry processing isn't perfect - it's limited to 1Kb of data per entry (which I thought to be more then enough for such simple data).
Also, I didn't allow for UTF-8 encoding or anything like that (I followed the assumption that English would be used).
As you can see from the code, we read each line separately and perform error checks to check that the data is valid.
To sort the file by ID, you might consider either running two lines at a time (this would be a slow sort) and sorting them, or creating a sorted node tree with the ID data and the position of the line in the file (get the position before reading the line). Once you sorted the binary tree, you can sort the data...
... The binary tree might get a bit big. did you look up sorting algorithms?
#include <stdio.h>
// assuming this is the file structure:
//
// First Name, Last Name,Age, ID
// Carlos,Lopez,,10568
// Brad, Patterson,,20586
// Zack, Morris,42,05689
//
// Then this might be your data structure per line:
struct DBEntry {
char* last_name; // a pointer to the last name
char* age; // a pointer to the name - could probably be an int
char* id; // a pointer to the ID
char first_name[1024]; // the actual buffer...
// I unified the first name and the buffer since the first name is first.
};
// each time you read only a single line, perform an error check for overflow
// and return the parsed data.
//
// return 1 on sucesss or 0 on failure.
int read_db_line(FILE* fp, struct DBEntry* line) {
if (!fgets(line->first_name, 1024, fp))
return 0;
// parse data and review for possible overflow.
// first, zero out data
int pos = 0;
line->age = NULL;
line->id = NULL;
line->last_name = NULL;
// read each byte, looking for the EOL marker and the ',' seperators
while (pos < 1024) {
if (line->first_name[pos] == ',') {
// we encountered a devider. we should handle it.
// if the ID feild's location is already known, we have an excess comma.
if (line->id) {
fprintf(stderr, "Parsing error, invalid data - too many fields.\n");
return 0;
}
// replace the comma with 0 (seperate the strings)
line->first_name[pos] = 0;
if (line->age)
line->id = line->first_name + pos + 1;
else if (line->last_name)
line->age = line->first_name + pos + 1;
else
line->last_name = line->first_name + pos + 1;
} else if (line->first_name[pos] == '\n') {
// we encountered a terminator. we should handle it.
if (line->id) {
// if we have the id string's possition (the start marker), this is a
// valid entry and we should process the data.
line->first_name[pos] = 0;
return 1;
} else {
// we reached an EOL without enough ',' seperators, this is an invalid
// line.
fprintf(stderr, "Parsing error, invalid data - not enough fields.\n");
return 0;
}
}
pos++;
}
// we ran through all the data but there was no EOL marker...
fprintf(stderr,
"Parsing error, invalid data (data overflow or data too large).\n");
return 0;
}
// the main program
int main(int argc, char const* argv[]) {
// open file
FILE* ptr_file;
ptr_file = fopen("workersinfo.txt", "r");
if (!ptr_file)
perror("File Error");
struct DBEntry line;
while (read_db_line(ptr_file, &line)) {
// do what you want with the data... print it?
printf(
"First name:\t%s\n"
"Last name:\t%s\n"
"Age:\t\t%s\n"
"ID:\t\t%s\n"
"--------\n",
line.first_name, line.last_name, line.age, line.id);
}
// close file
fclose(ptr_file);
return 0;
}
Followup Q2: Sorting array for 400MB-4GB of data
IMHO, 400MB is already touching on the issues related to big data. For example, implementing a bubble sort on your database should be agonizing as far as performance goes (unless it's a single time task, where performance might not matter).
Creating an Array of DBEntry objects will eventually get you a larger memory foot-print then the actual data..
This will not be the optimal way to sort large data.
The correct approach will depend on your sorting algorithm. Wikipedia has a decent primer on sorting algorythms.
Since we are handling a large amount of data, there are a few things to consider:
It would make sense to partition the work, so different threads/processes sort a different section of the data.
We will need to minimize IO to the hard drive (as it will slow the sorting significantly and prevent parallel processing on the same machine/disk).
One possible approach is to create a heap for a heap sort, but only storing a priority value and storing the original position in the file.
Another option would probably be to employ a divide and conquer algorithm, such as quicksort, again, only sorting a computed sort value and the entry's position in the original file.
Either way, writing a decent sorting method will be a complicated task, probably involving threading, forking, tempfiles or other techniques.
Here's a simplified demo code... it is far from optimized, but it demonstrates the idea of the binary sort-tree that holds the sorting value and the position of the data in the file.
Be aware that using this code will be both relatively slow (although not that slow) and memory intensive...
On the other hand, it will require about 24 bytes per entry. For 4 million entries, it's 96MB, somewhat better then 400Mb and definitely better then the 4GB.
#include <stdlib.h>
#include <stdio.h>
// assuming this is the file structure:
//
// First Name, Last Name,Age, ID
// Carlos,Lopez,,10568
// Brad, Patterson,,20586
// Zack, Morris,42,05689
//
// Then this might be your data structure per line:
struct DBEntry {
char* last_name; // a pointer to the last name
char* age; // a pointer to the name - could probably be an int
char* id; // a pointer to the ID
char first_name[1024]; // the actual buffer...
// I unified the first name and the buffer since the first name is first.
};
// this might be a sorting node for a sorted bin-tree:
struct SortNode {
struct SortNode* next; // a pointer to the next node
fpos_t position; // the DB entry's position in the file
long value; // The computed sorting value
}* top_sorting_node = NULL;
// this function will free all the memory used by the global Sorting tree
void clear_sort_heap(void) {
struct SortNode* node;
// as long as there is a first node...
while ((node = top_sorting_node)) {
// step forward.
top_sorting_node = top_sorting_node->next;
// free the original first node's memory
free(node);
}
}
// each time you read only a single line, perform an error check for overflow
// and return the parsed data.
//
// return 0 on sucesss or 1 on failure.
int read_db_line(FILE* fp, struct DBEntry* line) {
if (!fgets(line->first_name, 1024, fp))
return -1;
// parse data and review for possible overflow.
// first, zero out data
int pos = 0;
line->age = NULL;
line->id = NULL;
line->last_name = NULL;
// read each byte, looking for the EOL marker and the ',' seperators
while (pos < 1024) {
if (line->first_name[pos] == ',') {
// we encountered a devider. we should handle it.
// if the ID feild's location is already known, we have an excess comma.
if (line->id) {
fprintf(stderr, "Parsing error, invalid data - too many fields.\n");
clear_sort_heap();
exit(2);
}
// replace the comma with 0 (seperate the strings)
line->first_name[pos] = 0;
if (line->age)
line->id = line->first_name + pos + 1;
else if (line->last_name)
line->age = line->first_name + pos + 1;
else
line->last_name = line->first_name + pos + 1;
} else if (line->first_name[pos] == '\n') {
// we encountered a terminator. we should handle it.
if (line->id) {
// if we have the id string's possition (the start marker), this is a
// valid entry and we should process the data.
line->first_name[pos] = 0;
return 0;
} else {
// we reached an EOL without enough ',' seperators, this is an invalid
// line.
fprintf(stderr, "Parsing error, invalid data - not enough fields.\n");
clear_sort_heap();
exit(1);
}
}
pos++;
}
// we ran through all the data but there was no EOL marker...
fprintf(stderr,
"Parsing error, invalid data (data overflow or data too large).\n");
return 0;
}
// read and sort a single line from the database.
// return 0 if there was no data to sort. return 1 if data was read and sorted.
int sort_line(FILE* fp) {
// allocate the memory for the node - use calloc for zero-out data
struct SortNode* node = calloc(sizeof(*node), 1);
// store the position on file
fgetpos(fp, &node->position);
// use a stack allocated DBEntry for processing
struct DBEntry line;
// check that the read succeeded (read_db_line will return -1 on error)
if (read_db_line(fp, &line)) {
// free the node's memory
free(node);
// return no data (0)
return 0;
}
// compute sorting value - I'll assume all IDs are numbers up to long size.
sscanf(line.id, "%ld", &node->value);
// heap sort?
// This is a questionable sort algorythm... or a questionable implementation.
// Also, I'll be using pointers to pointers, so it might be a headache to read
// (it's a headache to write, too...) ;-)
struct SortNode** tmp = &top_sorting_node;
// move up the list until we encounter something we're smaller then us,
// OR untill the list is finished.
while (*tmp && (*tmp)->value <= node->value)
tmp = &((*tmp)->next);
// update the node's `next` value.
node->next = *tmp;
// inject the new node into the tree at the position we found
*tmp = node;
// return 1 (data was read and sorted)
return 1;
}
// writes the next line in the sorting
int write_line(FILE* to, FILE* from) {
struct SortNode* node = top_sorting_node;
if (!node) // are we done? top_sorting_node == NULL ?
return 0; // return 0 - no data to write
// step top_sorting_node forward
top_sorting_node = top_sorting_node->next;
// read data from one file to the other
fsetpos(from, &node->position);
char* buffer = NULL;
ssize_t length;
size_t buff_size = 0;
length = getline(&buffer, &buff_size, from);
if (length <= 0) {
perror("Line Copy Error - Couldn't read data");
return 0;
}
fwrite(buffer, 1, length, to);
free(buffer); // getline allocates memory that we're incharge of freeing.
return 1;
}
// the main program
int main(int argc, char const* argv[]) {
// open file
FILE *fp_read, *fp_write;
fp_read = fopen("workersinfo.txt", "r");
fp_write = fopen("sorted_workersinfo.txt", "w+");
if (!fp_read) {
perror("File Error");
goto cleanup;
}
if (!fp_write) {
perror("File Error");
goto cleanup;
}
printf("\nSorting");
while (sort_line(fp_read))
printf(".");
// write all sorted data to a new file
printf("\n\nWriting sorted data");
while (write_line(fp_write, fp_read))
printf(".");
// clean up - close files and make sure the sorting tree is cleared
cleanup:
printf("\n");
fclose(fp_read);
fclose(fp_write);
clear_sort_heap();
return 0;
}

LZW encoding for large file

I am building an LZW encoding algorithm, which uses dictionary and hashing so it can reach fast enough for working words already stored in a dictionary.
The algorithm gives proper results when ran on smaller files (cca few hundreds of symbols), but on the larger files (and especially in those files which contain of less different symbols - for example, it gives the worst performance when ran on a file which consists only of 1 symbol, 'y' let's say). The worst performance, in terms that it just crashes when dictionary is not even close to being full. However, when the large input file consists of more than 1 symbol, dictionary gets close to being full, approximately 90%, but again then it crashes.
Considering the structure of my algorithm, I am not quite sure what is causing it to crash in general, or crash so soon when large file of just 1 symbol is given.
It must be something about hashing (first time doing it, so it might have some bugs).
The hash function I am using can be found here, and from what I have tested it, it gives good results: oat_hash
LZW encoding algorithm is based on this link, with slight change, that it works until the dictionary is not full: LZW encoder
Let's get into code:
Note: oat_hash is changed so it returns value % CAPACITY, so every index is from DICTIONARY
// Globals
#define CAPACITY 100000
char *DICTIONARY[CAPACITY];
unsigned short CODES[CAPACITY]; // CODES and DICTIONARY are linked via index: word from dictionary on index i, has its code in CODES on index i
int position = 0;
int code_counter = 0;
void encode(FILE *input, FILE *output){
int succ1 = fseek(input, 0, SEEK_SET);
if(succ1 != 0) printf("Error: file not open!");
int succ2 = fseek(output, 0, SEEK_SET);
if(succ2 != 0) printf("Error: file not open!");
//1. Working word = next symbol from the input
char *working_word = malloc(2048*sizeof(char));
char new_symbol = getc(input);
working_word[0] = new_symbol;
working_word[1] = '\0';
//2. WHILE(there are more symbols on the input) DO
//3. NewSymbol = next symbol from the input
while((new_symbol = getc(input)) != EOF){
char *workingWord_and_newSymbol= NULL;
char newSymbol[2];
newSymbol[0] = new_symbol;
newSymbol[1] = '\0';
workingWord_and_newSymbol = working_word_and_new_symbol(working_word, newSymbol);
int index = oat_hash(workingWord_and_newSymbol, strlen(workingWord_and_newSymbol));
//4. IF(WorkingWord + NewSymbol) is already in the dictionary THEN
if(DICTIONARY[index] != NULL){
// 5. WorkingWord += NewSymbol
working_word = working_word_and_new_symbol(working_word, newSymbol);
}
//6. ELSE
else{
//7. OUTPUT: code for WorkingWord
int idx = oat_hash(working_word, strlen(working_word));
fprintf(output, "%u", CODES[idx]);
//8. Add (WorkingWord + NewSymbol) into a dictionary and assign it a new code
if(!dictionary_full()){
DICTIONARY[index] = workingWord_and_newSymbol;
CODES[index] = code_counter + 1;
code_counter += 1;
working_word = strdup(newSymbol);
}else break;
}
//10. END IF
}
//11. END WHILE
//12. OUTPUT: code for WorkingWord
int index = oat_hash(working_word, strlen(working_word));
fprintf(output, "%u", CODES[index]);
free(working_word);
}

int index = oat_hash(workingWord_and_newSymbol, strlen(workingWord_and_newSymbol));
And later
int idx = oat_hash(working_word, strlen(working_word));
fprintf(output, "%u", CODES[idx]);
//8. Add (WorkingWord + NewSymbol) into a dictionary and assign it a new code
if(!dictionary_full()){
DICTIONARY[index] = workingWord_and_newSymbol;
CODES[index] = code_counter + 1;
code_counter += 1;
working_word = strdup(newSymbol);
}else break;
idx and index are unbounded and you use them to access a bounded array. You're accessing memory out of range. Here's a suggestion, but it may skew the distribution. If your hash range is much larger than CAPACITY it shouldn't be a problem. But you also have another problem which was mentioned, collisions, you need to handle them. But that's a different problem.
int index = oat_hash(workingWord_and_newSymbol, strlen(workingWord_and_newSymbol)) % CAPACITY;
// and
int idx = oat_hash(working_word, strlen(working_word)) % CAPACITY;

LZW compression is certainly used to construct binary files and normally is capable of reading binary files.
The following code is problematic as it relies on new_symbol never being a \0.
newSymbol[0] = new_symbol; newSymbol[1] = '\0';
strlen(workingWord_and_newSymbol)
strdup(newSymbol)
Needs re-write to work with arrays of bytes rather than strings.
fopen() was not shown. Insure one is opening in binary. input = fopen(..., "rb");
#Wumpus Q. Wumbley is correct, use int newSymbol.
Minor:
new_symbol and newSymbol are confusing.
Consider:
// char *working_word = malloc(2048*sizeof(char));
#define WORKING_WORD_N (2048)
char *working_word = malloc(WORKING_WORD_N*sizeof(*working_word));
// or
char *working_word = malloc(WORKING_WORD_N);

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Generic SQLite Result Function? (C) - c

Related

Can't transmate string through MariaDB connect/c Prepared Statement

how to charge a map in Csfml

Strategy for cycling trough preexisting set of variables in c

Two-dimensional char array too large exit code 139

LZW encoding for large file

Categories

Resources