Optimization for large strings C/C++


I'm looking for a way to optimize my implementation. Basically this is a "reduce"-like function (as in the MapReduce framework). It takes a key and its values. The goal is to check whether the values are distinct and output the distinct ones as a string in the form of a list: value1;value2;value3;...valuen; where n can be very large (in the thousands).
void unique(char *key, int keybytes, char *multivalue, int nvalues,
int *valuebytes, KeyValue *kv, void *ptr) {
char * value = NULL;
char * elem[nvalues];
int i, j, cx;
char adj[3858905] = "";
The big problem is that I have to hard-code the length of adj[] for every input, and I don't know in advance how many values there will be. (This wastes a huge amount of memory.)
for (i = 0; i < nvalues; i++) {
if (i == 0) {
value = multivalue;
} else {
value = multivalue + valuebytes[i - 1];
multivalue = multivalue + valuebytes[i - 1];
}
elem[i] = value;
}
size_t elem_length = sizeof(elem)/sizeof(char *);
qsort(elem, elem_length, sizeof(char *), cstring_cmp);
cx = sprintf(adj, "%s;", elem[0]);
j = 0;
for (i = 1; i < nvalues; i++) {
bool matching = false;
if (!strcmp(elem[i], elem[j]))
matching = true;
j++;
if (!matching) //{;}
cx += snprintf(adj + cx, 3858905 - cx - 1, "%s;", elem[i]);
}
adj is an output string - list of values.
kv->add(key, keybytes, adj, strlen(adj) + 1); //this outputs key-value pairs.
}
I have to use C/C++ only though.

Try Huffman coding. It is an old and fairly complex technique, but I think it is efficient. I don't know whether there are newer or better algorithms for this.
http://www.cprogramming.com/tutorial/computersciencetheory/huffman.html
http://en.wikipedia.org/wiki/Huffman_coding

struct node {
int value;
struct node *next;
};
I suggest using a linked list to store all the values and then converting it into a string.
You can keep a count of the values stored in the linked list, use it to calculate the string length, and then allocate enough memory with malloc().
Later on, as more values are added to the list, you can resize the allocated memory with realloc() (calloc() only allocates a new zeroed block; it cannot resize an existing one).
I don't know if this is exactly what you wanted, but it looks feasible to me.
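For the original question, a linked list is not strictly needed: the values are already in an array, so the output size can be bounded by summing their lengths before allocating. The following is only a sketch under that assumption. The names join_unique and cmp_cstr are hypothetical (cmp_cstr stands in for the asker's cstring_cmp, which was not shown), and the function sorts the caller's array in place.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of char* (stand-in for cstring_cmp). */
static int cmp_cstr(const void *a, const void *b)
{
    return strcmp(*(char *const *)a, *(char *const *)b);
}

/* Sorts elem in place and returns a malloc'd "v1;v2;...;" string holding
 * each distinct value once. Returns NULL on allocation failure.
 * The caller frees the result. */
char *join_unique(char **elem, size_t n)
{
    size_t total = 1;   /* room for the terminating '\0' */
    size_t i;
    char *out, *p;

    qsort(elem, n, sizeof *elem, cmp_cstr);

    /* Upper bound: every value plus its ';'. Duplicates only shrink it. */
    for (i = 0; i < n; i++)
        total += strlen(elem[i]) + 1;

    out = malloc(total);
    if (!out)
        return NULL;

    p = out;
    for (i = 0; i < n; i++) {
        size_t len;
        if (i > 0 && strcmp(elem[i], elem[i - 1]) == 0)
            continue;   /* same as the previous sorted value: skip it */
        len = strlen(elem[i]);
        memcpy(p, elem[i], len);
        p += len;
        *p++ = ';';
    }
    *p = '\0';
    return out;
}
```

In the asker's function the result would presumably be handed to kv->add() together with strlen(result) + 1 and then freed; that call is taken from the question and is not verified here.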

Related

How can I correctly allocate memory for this MergeSort implementation in C (with the DS I am using)?

My goal here is to perform MergeSort on a dynamic array-like data structure I called a dictionary used to store strings and their relative weights. Sorry if the implementation is dumb, I'm a student and still learning.
Anyway, based on the segfaults I'm getting, I'm incorrectly allocating memory for my structs of type item to be copied over into the temporary lists I'm making. Not sure how to fix this. Code for mergesort and data structure setup is below, any help is appreciated.
/////// DICTIONARY METHODS ////////
typedef struct {
char *item;
int weight;
} item;
typedef struct {
item **wordlist;
//track size of dictionary
int size;
} dict;
//dict constructor
dict* Dict(int count){
//allocate space for dictionary
dict* D = malloc(sizeof(dict));
//allocate space for words
D->wordlist = malloc(sizeof(item*) * count);
//initial size
D->size = 0;
return D;
}
//word constructor
item* Item(char str[]){
//allocate memory for struct
item* W = malloc(sizeof(item));
//allocate memory for string
W->item = malloc(sizeof(char) * strlen(str));
W->weight = 0;
return W;
}
void merge(dict* D, int start, int middle, int stop){
//create ints to track lengths of left and right of array
int leftlen = middle - start + 1;
int rightlen = stop - middle;
//create new temporary dicts to store the two sides of the array
dict* L = Dict(leftlen);
dict* R = Dict(rightlen);
int i, j, k;
//copy elements start through middle into left dict- this gives a segfault
for (int i = 0; i < leftlen; i++){
L->wordlist[i] = malloc(sizeof(item*));
L->wordlist[i] = D->wordlist[start + i];
}
//copy elements middle through end into right dict- this gives a segfault
for (int j = 0; j < rightlen; j++){
R->wordlist[j] = malloc(sizeof(item*));
R->wordlist[j]= D->wordlist[middle + 1 + k];
}
i = 0;
j = 0;
k = leftlen;
while ((i < leftlen) && (j < rightlen)){
if (strcmp(L->wordlist[i]->item, R->wordlist[j]->item) <= 0) {
D->wordlist[k] = L->wordlist[i];
i++;
k++;
}
else{
D->wordlist[k] = R->wordlist[j];
j++;
k++;
}
}
while (i < leftlen){
D->wordlist[k] = L->wordlist[i];
i++;
k++;
}
while (j < rightlen){
D->wordlist[k] = L->wordlist[j];
j++;
k++;
}
}
void mergeSort(dict* D, int start, int stop){
if (start < stop) {
int middle = start + (stop - start) / 2;
mergeSort(D, start, middle);
mergeSort(D, middle + 1, stop);
merge(D, start, middle, stop);
}
}
I put print statements everywhere and narrowed it down to the mallocs in the section where I copy the dictionary to be sorted into 2 separate dictionaries. Also tried writing that malloc as malloc(sizeof(D->wordlist[start + i])). Is there something else I need to do to be able to copy the item struct into the wordlist of the new struct?
Again, I'm new to this, so cut me some slack :)

There are numerous errors in the code:
In merge() when copying elements to the R list, the wrong (and uninitialized) index variable k is being used instead of j. R->wordlist[j]= D->wordlist[middle + 1 + k]; should be R->wordlist[j]= D->wordlist[middle + 1 + j];.
In merge() before merging the L and R lists back to D, the index variable k for the D list is being initialized to the wrong value. k = leftlen; should be k = start;.
In merge() in the loop that should copy the remaining elements of the "right" list to D, the elements are being copied from the "left" list instead of the "right" list. D->wordlist[k] = L->wordlist[j]; should be D->wordlist[k] = R->wordlist[j];.
In Item(), the malloc() call is not reserving space for the null terminator at the end of the string. W->item = malloc(sizeof(char) * strlen(str)); should be W->item = malloc(sizeof(char) * (strlen(str) + 1)); (and since sizeof(char) is 1 by definition it can be simplified to W->item = malloc(strlen(str) + 1);).
Item() is not copying the string to the allocated memory. Add strcpy(W->item, str);.
There are memory leaks in merge():
L->wordlist[i] = malloc(sizeof(item*)); is not required and can be removed since L->wordlist[i] is changed on the very next line: L->wordlist[i] = D->wordlist[start + i];.
Similarly, R->wordlist[j] = malloc(sizeof(item*)); is not required and can be removed since R->wordlist[j] is changed on the very next line.
L and R memory is created but never destroyed. Add these lines to the end of merge() to free them:
free(L->wordlist);
free(L);
free(R->wordlist);
free(R);
None of the malloc() calls are checked for success.
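Putting the merge() fixes from the list above together might look like the sketch below. It is not the asker's final code: it swaps the temporary dict objects for plain arrays of item pointers, and it changes the return type from void to int so that allocation failure can be reported. Both choices are assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *item;
    int weight;
} item;

typedef struct {
    item **wordlist;
    int size;
} dict;

/* Merge the sorted ranges [start, middle] and [middle + 1, stop] of D.
 * Returns 0 on success, -1 if a temporary array could not be allocated. */
int merge(dict *D, int start, int middle, int stop)
{
    int leftlen = middle - start + 1;
    int rightlen = stop - middle;
    int i, j, k;
    item **L = malloc(sizeof *L * (size_t)leftlen);
    item **R = malloc(sizeof *R * (size_t)rightlen);

    if (!L || !R) {
        free(L);
        free(R);
        return -1;
    }

    /* Copy the pointers only; the items themselves are not duplicated. */
    for (i = 0; i < leftlen; i++)
        L[i] = D->wordlist[start + i];
    for (j = 0; j < rightlen; j++)
        R[j] = D->wordlist[middle + 1 + j];   /* index with j, not k */

    i = 0;
    j = 0;
    k = start;                                /* not leftlen */
    while (i < leftlen && j < rightlen) {
        if (strcmp(L[i]->item, R[j]->item) <= 0)
            D->wordlist[k++] = L[i++];
        else
            D->wordlist[k++] = R[j++];
    }
    while (i < leftlen)
        D->wordlist[k++] = L[i++];
    while (j < rightlen)
        D->wordlist[k++] = R[j++];            /* from R, not L */

    free(L);
    free(R);
    return 0;
}
```

When called from mergeSort() both ranges hold at least one element, so neither malloc() is asked for zero bytes.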

Allocate it all at once, before the merge sort even starts.
#include <stdlib.h>
#include <string.h>
// Weighted Word --------------------------------------------------------------
//
typedef struct {
char *word;
int weight;
} weighted_word;
// Create a weighted word
//
weighted_word* CreateWeightedWord(const char *str, int weight){
weighted_word* W = malloc(sizeof(weighted_word));
if (W){
W->word = malloc(strlen(str) + 1); // string length + nul terminator
if (W->word)
strcpy( W->word, str);
W->weight = weight;
}
return W;
}
// Free a weighted word
//
weighted_word *FreeWeightedWord(weighted_word *W){
if (W){
if (W->word)
free(W->word);
free(W);
}
return NULL;
}
// Dictionary (of Weighted Words) ---------------------------------------------
//
typedef struct {
weighted_word **wordlist; // this is a pointer to an array of (weighted_word *)s
int size; // current number of elements in use
int capacity; // maximum number of elements available to use
} dict;
// Create a dictionary with a fixed capacity
//
dict* CreateDict(int capacity){
dict* D = malloc(sizeof(dict));
if (D){
D->wordlist = malloc(sizeof(weighted_word*) * capacity);
D->size = 0;
D->capacity = capacity;
}
return D;
}
// Free a dictionary (and all weighted words)
//
dict *FreeDict(dict *D){
if (D){
for (int n = 0; n < D->size; n++)
FreeWeightedWord(D->wordlist[n]);
free(D->wordlist);
free(D);
}
return NULL;
}
// Add a new weighted word to the end of our dictionary
//
void DictAddWord(dict *D, const char *str, int weight){
if (!D) return;
if (D->size == D->capacity) return;
D->wordlist[D->size] = CreateWeightedWord(str, weight);
if (D->wordlist[D->size])
D->size += 1;
}
// Merge Sort the Dictionary --------------------------------------------------
// Merge two partitions of sorted words
// words • the partitioned weighted word list
// start • beginning of left partition
// middle • end of left partition, beginning of right partition
// stop • end of right partition
// buffer • temporary work buffer, at least as big as (stop-start)
//
void MergeWeightedWords(weighted_word **words, int start, int middle, int stop, weighted_word **buffer){
int Lstart = start; int Rstart = middle; // Left partition
int Lstop = middle; int Rstop = stop; // Right partition
int Bindex = 0; // temporary work buffer output index
// while (left partition has elements) AND (right partition has elements)
while ((Lstart < Lstop) && (Rstart < Rstop)){
if (strcmp( words[Rstart]->word, words[Lstart]->word ) < 0)
buffer[Bindex++] = words[Rstart++];
else
buffer[Bindex++] = words[Lstart++];
}
// if (left partition has any remaining elements)
while (Lstart < Lstop)
buffer[Bindex++] = words[Lstart++];
// We don't actually need this. Think about it. Why not?
// // if (right partition has any remaining elements)
// while (Rstart < Rstop)
// buffer[Bindex++] = words[Rstart++];
// Copy merged data from temporary buffer back into source word list
for (int n = 0; n < Bindex; n++)
words[start++] = buffer[n];
}
// Merge Sort an array of weighted words
// words • the array of (weighted_word*)s to sort
// start • index of first element to sort
// stop • index ONE PAST the last element to sort
// buffer • the temporary merge buffer, at least as big as (stop-start)
//
void MergeSortWeightedWords(weighted_word **words, int start, int stop, weighted_word **buffer){
if (start < stop-1){ // -1 because a singleton array is by definition sorted
int middle = start + (stop - start) / 2;
MergeSortWeightedWords(words, start, middle, buffer);
MergeSortWeightedWords(words, middle, stop, buffer);
MergeWeightedWords(words, start, middle, stop, buffer);
}
}
// Merge Sort a Dictionary
//
void MergeSortDict(dict *D){
if (D){
// We only need to allocate a single temporary work buffer, just once, right here.
dict * Temp = CreateDict(D->size);
if (Temp){
MergeSortWeightedWords(D->wordlist, 0, D->size, Temp->wordlist);
}
FreeDict(Temp);
}
}
// Main program ---------------------------------------------------------------
#include <stdio.h>
int main(int argc, char **argv){
// Command-line arguments --> dictionary
dict *a_dict = CreateDict(argc-1);
for (int n = 1; n < argc; n++)
DictAddWord(a_dict, argv[n], 0);
// Sort the dictionary
MergeSortDict(a_dict);
// Print the weighted words
for (int n = 0; n < a_dict->size; n++)
printf( "%d %s\n", a_dict->wordlist[n]->weight, a_dict->wordlist[n]->word );
// Clean up
FreeDict(a_dict);
}
Notes for you:
Be consistent. You were inconsistent with capitalization and * placement and, oddly, vertical spacing. (You are waaay better than most beginners, though.) I personally hate the Egyptian brace style, but to each his own.
I personally think there are far too many levels of malloc()s in this code too, but I will leave it at this one comment. It works as is.
Strings must be nul-terminated — that is, each string takes strlen() characters plus one for a '\0' character. There is a convenient library function that can copy a string for you too, called strdup(), which AFAIK exists on every system.
Always check that malloc() and friends succeed.
Don’t forget to free everything you allocate. Functions help.
“Item” was a terribly non-descript name, and it overlapped with the meaning of two different things in your code. I renamed them to separate things.
Your dictionary object should be expected to keep track of how many elements it can support. The above code simply refuses to add words after the capacity is filled, but you could easily make it realloc() a larger capacity if the need arises. The point is to prevent invalid array accesses by adding too many elements to a fixed-size array.
Printing the array could probably go in a function.
Notice how I set start as inclusive and stop as exclusive. This is a very C (and C++) way of looking at things, and it is a good one. It will help you with all kinds of algorithms.
Notice also how I split the Merge Sort up into two functions: one that takes a dictionary as argument, and a lower-level one that takes an array of the weighted words as argument that does all the work.
The higher-level function merge sorts a dictionary and allocates the entire temporary buffer the merge algorithm needs, just once.
The lower-level function merge sorts an array of (weighted_word*)s; it expects that temporary buffer to exist and doesn’t care (or know anything) about the dictionary object.
The merge algorithm likewise doesn't know much. It is simply given all the information it needs.
Right now the merge condition simply compares the weighted-word’s string value. But it doesn’t have to be so simple. For example, you could sort equal elements by weight. Create a function:
int CompareWeightedWords(const weighted_word *a, const weighted_word *b){
int rel = strcmp( a->word, b->word );
if (rel < 0) return -1;
if (rel > 0) return 1;
return a->weight < b->weight ? -1 : a->weight > b->weight;
}
And put it to use in the merge function:
if (CompareWeightedWords( words[Rstart], words[Lstart] ) < 0)
buffer[Bindex++] = words[Rstart++];
else
buffer[Bindex++] = words[Lstart++];
I don’t think I forgot anything.
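The notes mention that the dictionary could realloc() a larger capacity instead of refusing new words. A minimal sketch of that idea follows. DictAddWordGrowing is a hypothetical name, the doubling policy and the starting capacity of 4 are arbitrary choices, and the int return value is an addition so the caller can see failures.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *word;
    int weight;
} weighted_word;

typedef struct {
    weighted_word **wordlist;
    int size;       /* elements in use */
    int capacity;   /* elements allocated */
} dict;

/* Hypothetical growing variant of DictAddWord.
 * Returns 1 on success, 0 on failure (the dictionary is left unchanged). */
int DictAddWordGrowing(dict *D, const char *str, int weight)
{
    weighted_word *W;

    if (!D || !str)
        return 0;

    if (D->size == D->capacity) {
        int new_capacity = D->capacity > 0 ? D->capacity * 2 : 4;
        weighted_word **grown =
            realloc(D->wordlist, sizeof *grown * (size_t)new_capacity);
        if (!grown)
            return 0;   /* the old list is still valid and untouched */
        D->wordlist = grown;
        D->capacity = new_capacity;
    }

    W = malloc(sizeof *W);
    if (!W)
        return 0;
    W->word = malloc(strlen(str) + 1);   /* + 1 for the nul terminator */
    if (!W->word) {
        free(W);
        return 0;
    }
    strcpy(W->word, str);
    W->weight = weight;

    D->wordlist[D->size++] = W;
    return 1;
}
```

Assigning the realloc() result to a temporary first matters: writing it straight into D->wordlist would lose the only pointer to the old block if the call failed.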

How to initialize a field in a struct from another struct? C

So im new to C programming and my assignment is to write a function(Max_way) that prints the driver who had the total of longest trips.
im using these 2 structs:
#define LEN 8
typedef struct
{
unsigned ID;
char name[LEN];
}Driver;
typedef struct
{
unsigned T_id;
char T_origin[LEN];
char T_dest[LEN];
unsigned T_way;
}Trip;
and a function to determine the total trips of a certain driver:
int Driver_way(Trip trips[], int size, unsigned id)
{
int km=0;
for (int i = 0; i < size; i++)
{
if (id == trips[i].T_id)
{
km = km + trips[i].T_way;
}
}
return km;
}
but when im trying to print the details of a specific driver from an array of drivers, i receive the correct ID, the correct distance of km, but the driver's name is not copied properly and i get garbage string containing 1 character instead of 8.
i've also tried strcpy(max_driver.name,driver[i].name) with same result.
void Max_way(Trip trips[], int size_of_trips, Driver drivers[], int size_of_drivers)
{
int *km;
int max = 0;
Driver max_driver;
km = (int*)malloc(sizeof(int) * (sizeof(drivers) / sizeof(Driver)));
for (int i = 0; i < size_of_drivers; i++)
{
km[i] = Driver_way(trips, sizeof(trips), drivers[i].ID);
for (int j = 1; j < size_of_drivers; j++)
{
if (km[j] > km[j - 1])
{
max = km[j];
max_driver.ID = drivers[i].ID;
max_driver.name = drivers[i].name;
}
}
}
printf("The driver who drove the most is:\n%d\n%s\n%d km\n", max_driver.ID, max_driver.name, max);
}
any idea why this is happening?

Note that one cannot copy a string using a simple assignment operator; you must use strcpy (or similar) as follows:
if (km[j] > km[j - 1]) {
max = km[j];
max_driver.ID = drivers[i].ID;
strcpy(max_driver.name,drivers[i].name);
}
Also note that since you were using ==, this was not even an assignment, but a comparison. Changing = to == likely silenced a compile-time error (arrays cannot be assigned in C), but it did NOT give you what you want.
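Beyond the string copy, the question's Max_way() has other problems that would explain wrong results: sizeof(trips) and sizeof(drivers) inside the function give the size of a pointer, not the array length, and the inner loop compares km[] entries that have not been filled in yet. The sketch below is one possible rewrite, not the answerer's code. Max_way_index is a hypothetical helper, and it assumes names fit in LEN - 1 characters so that a terminator is present.

```c
#include <stdio.h>

#define LEN 8

typedef struct {
    unsigned ID;
    char name[LEN];
} Driver;

typedef struct {
    unsigned T_id;
    char T_origin[LEN];
    char T_dest[LEN];
    unsigned T_way;
} Trip;

/* Total distance of all trips whose T_id matches the driver id. */
unsigned Driver_way(const Trip trips[], int size, unsigned id)
{
    unsigned km = 0;
    for (int i = 0; i < size; i++)
        if (trips[i].T_id == id)
            km += trips[i].T_way;
    return km;
}

/* Returns the index of the driver with the largest total distance,
 * or -1 if there are no drivers. The total is stored in *max_km. */
int Max_way_index(const Trip trips[], int n_trips,
                  const Driver drivers[], int n_drivers, unsigned *max_km)
{
    int best = -1;
    unsigned best_km = 0;

    for (int i = 0; i < n_drivers; i++) {
        unsigned km = Driver_way(trips, n_trips, drivers[i].ID);
        if (best < 0 || km > best_km) {
            best = i;
            best_km = km;
        }
    }
    if (max_km)
        *max_km = best_km;
    return best;
}

void Max_way(const Trip trips[], int n_trips,
             const Driver drivers[], int n_drivers)
{
    unsigned max = 0;
    int best = Max_way_index(trips, n_trips, drivers, n_drivers, &max);

    if (best < 0)
        return;
    /* Assigning a whole struct copies the embedded name array too. */
    Driver max_driver = drivers[best];
    printf("The driver who drove the most is:\n%u\n%s\n%u km\n",
           max_driver.ID, max_driver.name, max);
}
```

No temporary km array is needed, since only the running maximum has to be remembered.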

Undefined behavior when deleting an element from dynamic array of structs

I have an n sized array of structs dynamically allocated, and each position of the array is an array too, with different sizes for each position (an array of arrays).
I created a function to delete a given array[index] but I'm facing some undefined behavior, for example:
If the array is of size 3, if I delete array[0],I can't access array[1]. This happens with other combinations of indexes too. The only way it works flawlessly is when I delete from end to start.
Here is the code I have:
Structures:
typedef struct point{
char id[5];
char type[5];
char color[10];
int x;
int y;
} Point;
typedef struct {
char lineID[5];
int nPoints;
Point *pt;
}railData;
typedef struct railway {
railData data;
}railway;
This is how the array was created:
headRail = (railway**)calloc(lineNum,sizeof(railway*));
And each Rail:
headRail[i] = (railway*)calloc(pointsNum,sizeof(railway));
These are the functions to delete a rail:
railway **delRail(railway **headRail, int j)
{
int nPts = 0;
if (!headRail)
{
puts(ERRORS[NULLPOINTER]);
return NULL;
}
// Number of rail points on jth rail
nPts = headRail[j]->data.nPoints;
// Free each rail point from jth rail
for (int i = 0; i < nPts; ++i)
{
free(headRail[j][i].data.pt);
}
// Free allocated memory for jth rail
free(headRail[j]);
return headRail;
}
And this is where I call the previous function:
railway **removeRail(railway **headRail)
{
char userID[20];
int index = 0;
// Quit if no rails
if (!headRail)
{
backToMenu("No rails available!");
return NULL;
}
// Get user input
getString("\nRail ID: ",userID,MINLEN,MAXLEN); // MINLEN = 2 MAXLEN = 4
// get index of the asked rail
getRailIndex(headRail,userID,&index);
if (index != NOTFOUND)
{
headRail = delRail(headRail, index);
// Update number of rails in the array (global var)
NUMOFRAILS--;
backToMenu("Rail deleted!\n");
}
else
backToMenu("Rail not found!");
return headRail;
}
So my question is how can I modify my code so that when position i is eliminated, all other indexes are shifted left and the last position, which would be empty, is discarded (something like realloc but for shrinking)
Is what I'm asking doable without changing the array's structure?

When removing element i, memmove the elements from i+1 through the end of the array down to position i, and then realloc the array with its size decremented by 1.
Note that arrays in C do not track their size in any way, so you need to pass the size by an external way.
Your data abstraction is strange. I would expect headRail[j][0].data.nPoints to store the number of points inside the headRail[j][0].data structure, yet you use it as the number of elements in row j, i.e. headRail[j][<this count>]. I would advise rewriting the abstraction: have one "object" for the railway and another for handling two-dimensional arrays of railways with dynamic sizes in all directions.
Like:
railway **delRail(railway **headRail, int j)
{
...
// this is strange, it's equal to
// nPts = headRail[j][0].data.nPoints;
// dunno if you mean that,
// or if [j][0].data.nPoints refers to the size of
// headRail[j][0].data.pt or to the size of the whole array
size_t nPts = headRail[j]->data.nPoints;
for (size_t i = 0; i < nPts; ++i) {
free(headRail[j][i].data.pt);
}
free(headRail[j]);
// note that arrays in C do not know how many elements they contain
// so you typically pass that along the arguments, like
// railway **delRail(railway **headRail, size_t railcount, int j);
size_t headRailCount = lineNum; // some external knowledge of the size
memmove(&headRail[j], &headRail[j + 1], (headRailCount - j - 1) * sizeof(*headRail));
void *pnt = realloc(headRail, (headRailCount - 1) * sizeof(*headRail));
if (pnt == NULL) return NULL; // that would be strange
headRail = pnt; // note that the previous headRail is no longer valid
--lineNum; // decrement that object where you store the size of the array
return headRail;
}
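The memmove-then-realloc step can be isolated into a small helper. The sketch below is an illustration only: remove_rail_slot is a hypothetical name, the railway struct is a one-field placeholder rather than the asker's type, and the element count is passed in explicitly instead of living in a global.

```c
#include <stdlib.h>
#include <string.h>

/* Placeholder type for the demonstration; not the asker's railway struct. */
typedef struct railway {
    int id;
} railway;

/* Removes slot idx from an array of *count pointers by shifting the tail
 * left one position and shrinking the allocation. The caller is expected
 * to have released whatever rails[idx] pointed to beforehand.
 * Returns the (possibly moved) array, or NULL once it becomes empty. */
railway **remove_rail_slot(railway **rails, size_t *count, size_t idx)
{
    railway **shrunk;

    if (!rails || !count || idx >= *count)
        return rails;

    /* memmove, not memcpy: source and destination overlap. */
    memmove(&rails[idx], &rails[idx + 1],
            (*count - idx - 1) * sizeof *rails);
    --*count;

    if (*count == 0) {
        free(rails);
        return NULL;
    }

    shrunk = realloc(rails, *count * sizeof *rails);
    /* If shrinking fails, the old block is still valid, so keep using it. */
    return shrunk ? shrunk : rails;
}
```

The caller must always continue with the returned pointer, because realloc() is allowed to move the block even when shrinking.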
What about some encapsulation and more structs instead of 2d array? 2d arrays are really a bit of pain for C, what about:
typedef struct {
// stores a single row of rail datas
struct railData_row_s {
// stores a pointer to an array of rail datas
railData *data;
// stores the count of how many datas of rails are stored here
size_t datacnt;
// stores a pointer to an array of rows of rail datas
} *raildatas;
// stores the size of the pointer of rows of rail datas
size_t raildatascnt;
} railway;
The count of mallocs will stay the same, but thinking about the data will get simpler, and each pointer that points to an array of data has its own size-tracking variable. An allocation might look like this:
railway *rail_new(size_t lineNum, size_t pointsNum) {
railway *r = calloc(1, sizeof(*r));
if (!r) { return NULL; }
// allocate the memory for rows of raildata
r->raildatascnt = lineNum;
r->raildatas = calloc(r->raildatascnt, sizeof(*r->raildatas));
if (!r->raildatas) { /* error handling */ free(r); abort(); }
// for each row of raildata
for (size_t i = 0; i < r->raildatascnt; ++i) {
struct railData_row_s * const row = &r->raildatas[i];
// allocate the memory for the column of raildata
// hah, looks similar to the above?
row->datacnt = pointsNum;
row->data = calloc(row->datacnt, sizeof(*row->data));
if (!row->data) { /* error handling */ abort(); }
}
return r;
}

Issue implementing dynamic array of structures

I am having an issue creating a dynamic array of structures. I have seen and tried to implement a few examples on here and other sites, the examples as well as how they allocate memory tend to differ, and I can't seem to get any of them to work for me. Any help would be greatly appreciated.
typedef struct node {
int index;
int xmin, xmax, ymin, ymax;
} partition;
partition* part1 = (partition *)malloc(sizeof(partition) * 50);
I can't even get this right. It gives me the following error:
error: initializer element is not constant
If anyone could explain how something like this should be implemented I would greatly appreciate it.
Also, once I have that part down, how would I add values into the elements of the structure? Would something like the below work?
part1[i]->index = x;

The compiler is complaining because you're doing:
partition* part1 = (partition *)malloc(sizeof(partition) * 50);
Do this instead:
partition* part1;
int
main(void)
{
part1 = (partition *)malloc(sizeof(partition) * 50);
...
}
Your version used an initializer on a global, which in C must be a constant value. By moving the malloc into a function, you are "initializing the value" with your code, but you aren't using an initializer as defined in the language.
Likewise, you could have had a global that was initialized:
int twenty_two = 22;
Here 22 is a constant and thus allowable.
UPDATE: Here's a somewhat lengthy example that will show most of the possible ways:
#define PARTMAX 50
partition static_partlist[PARTMAX];
partition *dynamic_partlist;
int grown_partmax;
partition *grown_partlist;
void
iterate_byindex_static_length(partition *partlist)
{
int idx;
for (idx = 0; idx < PARTMAX; ++idx)
do_something(&partlist[idx]);
}
void
iterate_byptr_static_length(partition *partlist)
{
partition *cur;
partition *end;
// these are all equivalent:
// end = partlist + PARTMAX;
// end = &partlist[PARTMAX];
end = partlist + PARTMAX;
for (cur = partlist; cur < end; ++cur)
do_something(cur);
}
void
iterate_byindex_dynamic_length(partition *partlist,int partmax)
{
int idx;
for (idx = 0; idx < partmax; ++idx)
do_something(&partlist[idx]);
}
void
iterate_byptr_dynamic_length(partition *partlist,int partmax)
{
partition *cur;
partition *end;
// these are all equivalent:
// end = partlist + partmax;
// end = &partlist[partmax];
end = partlist + partmax;
for (cur = partlist; cur < end; ++cur)
do_something(cur);
}
int
main(void)
{
partition *part;
int idx;
dynamic_partlist = malloc(sizeof(partition) * PARTMAX);
// these are all the same
iterate_byindex_static_length(dynamic_partlist);
iterate_byindex_static_length(dynamic_partlist + 0);
iterate_byindex_static_length(&dynamic_partlist[0]);
// as are these
iterate_byptr_static_length(static_partlist);
iterate_byptr_static_length(static_partlist + 0);
iterate_byptr_static_length(&static_partlist[0]);
// still the same ...
iterate_byindex_dynamic_length(dynamic_partlist,PARTMAX);
iterate_byindex_dynamic_length(dynamic_partlist + 0,PARTMAX);
iterate_byindex_dynamic_length(&dynamic_partlist[0],PARTMAX);
// yet again the same ...
iterate_byptr_dynamic_length(static_partlist,PARTMAX);
iterate_byptr_dynamic_length(static_partlist + 0,PARTMAX);
iterate_byptr_dynamic_length(&static_partlist[0],PARTMAX);
// let's grow an array dynamically and fill it ...
for (idx = 0; idx < 10; ++idx) {
// grow the list -- Note that realloc is smart enough to handle
// the fact that grown_partlist is NULL on the first time through
++grown_partmax;
grown_partlist = realloc(grown_partlist,
grown_partmax * sizeof(partition));
part = &grown_partlist[grown_partmax - 1];
// fill in part with whatever data ...
}
// once again, still the same
iterate_byindex_dynamic_length(grown_partlist,grown_partmax);
iterate_byindex_dynamic_length(grown_partlist + 0,grown_partmax);
iterate_byindex_dynamic_length(&grown_partlist[0],grown_partmax);
// sheesh, do things ever change? :-)
iterate_byptr_dynamic_length(grown_partlist,grown_partmax);
iterate_byptr_dynamic_length(grown_partlist + 0,grown_partmax);
iterate_byptr_dynamic_length(&grown_partlist[0],grown_partmax);
}
There are two basic ways to iterate through an array: by index and by pointer. It does not matter how the array was defined (e.g. global/static --> int myary[37]; or via malloc/realloc --> int *myptr = malloc(sizeof(int) * 37);). The "by index" and "by pointer" syntaxes are interchangeable. If you wanted the element at index 12 (the 13th element), the following are all equivalent:
myary[12]
*(myary + 12)
*(&myary[12])
myptr[12]
*(myptr + 12)
*(&myptr[12])
That's why all of the above will produce the same results.
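The question also asked whether part1[i]->index = x; would work. It would not compile: part1[i] is a struct, not a pointer, so the member is reached with a dot. A short sketch follows; make_partitions is a hypothetical helper name.

```c
#include <stdlib.h>

typedef struct node {
    int index;
    int xmin, xmax, ymin, ymax;
} partition;

/* Allocates count partitions, numbers them, and zeroes the bounds.
 * Returns NULL on allocation failure. The caller frees the result. */
partition *make_partitions(size_t count)
{
    partition *list = malloc(count * sizeof *list);

    if (!list)
        return NULL;

    for (size_t i = 0; i < count; i++) {
        list[i].index = (int)i;   /* list[i] is a struct: use '.' */
        (list + i)->xmin = 0;     /* (list + i) is a pointer: use '->' */
        list[i].xmax = 0;
        list[i].ymin = 0;
        list[i].ymax = 0;
    }
    return list;
}
```

Because the malloc() call now sits inside a function body, the "initializer element is not constant" error no longer applies.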

In-place run length decoding?

Given a run length encoded string, say "A3B1C2D1E1", decode the string in-place.
The answer for the encoded string is "AAABCCDE". Assume that the encoded array is large enough to accommodate the decoded string, i.e. you may assume that the array size = MAX(length(encoded string), length(decoded string)).
This does not seem trivial, since merely decoding A3 as 'AAA' will lead to over-writing 'B' of the original string.
Also, one cannot assume that the decoded string is always larger than the encoded string.
Eg: Encoded string - 'A1B1', Decoded string is 'AB'. Any thoughts?
And it will always be a letter-digit pair, i.e. you will not be asked to convert 0515 to 0000055555.

If we don't already know, we should scan through first, adding up the digits, in order to calculate the length of the decoded string.
It will always be a letter-digit pair, hence you can delete the 1s from the string without any confusion.
A3B1C2D1E1
becomes
A3BC2DE
Here is some code, in C++, to remove the 1s from the string (O(n) complexity).
// remove 1s
int i = 0; // read from here
int j = 0; // write to here
while(i < str.length()) {
assert(j <= i); // optional check
if(str[i] != '1') {
str[j] = str[i];
++ j;
}
++ i;
}
str.resize(j); // to discard the extra space now that we've got our shorter string
Now, this string is guaranteed to be shorter than, or the same length as, the final decoded string. We can't make that claim about the original string, but we can make it about this modified string.
(An optional, trivial, step now is to replace every 2 with the previous letter. A3BCCDE, but we don't need to do that).
Now we can start working from the end. We have already calculated the length of the decoded string, and hence we know exactly where the final character will be. We can simply copy the characters from the end of our short string to their final location.
During this copy process from right-to-left, if we come across a digit, we must make multiple copies of the letter that is just to the left of the digit. You might be worried that this might risk overwriting too much data. But we proved earlier that our encoded string, or any substring thereof, will never be longer than its corresponding decoded string; this means that there will always be enough space.
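The three passes described above (sum the counts, drop the 1s, expand from the right) could be sketched in C as follows. This is an illustration under stated assumptions, not the answerer's code: counts are single digits from 1 to 9, symbols are never digits, and the buffer has room for the longer of the two strings plus a terminator. The name rle_decode_inplace is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Decodes a letter-digit run-length string in place, e.g. "A3B1C2" to
 * "AAABCC". Returns the decoded length. */
size_t rle_decode_inplace(char *buf)
{
    size_t enc_len = strlen(buf);
    size_t dec_len = 0;
    size_t compact_len = 0;
    size_t i, read, write;

    /* Pass 1: the decoded length is the sum of the counts, which sit at
     * the odd positions of the encoded string. */
    for (i = 1; i < enc_len; i += 2)
        dec_len += (size_t)(buf[i] - '0');

    /* Pass 2: drop every '1' count, so "A3B1C2" becomes "A3BC2". The
     * result is never longer than the decoded string. */
    for (i = 0; i < enc_len; i++)
        if (buf[i] != '1')
            buf[compact_len++] = buf[i];

    /* Pass 3: expand from right to left into the final positions. */
    buf[dec_len] = '\0';
    read = compact_len;
    write = dec_len;
    while (read > 0) {
        char c = buf[--read];
        size_t count = 1;

        if (c >= '2' && c <= '9') {   /* a count: its letter is to the left */
            count = (size_t)(c - '0');
            c = buf[--read];
        }
        while (count-- > 0)
            buf[--write] = c;
    }
    return dec_len;
}
```

The letter is copied into a local variable before any writes for its run happen, which is what makes it safe when the write position catches up with the read position.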

The following solution is O(n) and in-place. The algorithm should not access memory it shouldn't, both read and write. I did some debugging, and it appears correct to the sample tests I fed it.
High level overview:
Determine the encoded length.
Determine the decoded length by reading all the numbers and summing them up.
End of buffer is MAX(decoded length, encoded length).
Decode the string by starting from the end of the string. Write from the end of the buffer.
Since the decoded length might be greater than the encoded length, the decoded string might not start at the start of the buffer. If needed, correct for this by shifting the string over to the start.
int isDigit (char c) {
return '0' <= c && c <= '9';
}
unsigned int toDigit (char c) {
return c - '0';
}
unsigned int intLen (char * str) {
unsigned int n = 0;
while (isDigit(*str++)) {
++n;
}
return n;
}
unsigned int forwardParseInt (char ** pStr) {
unsigned int n = 0;
char * pChar = *pStr;
while (isDigit(*pChar)) {
n = 10 * n + toDigit(*pChar);
++pChar;
}
*pStr = pChar;
return n;
}
unsigned int backwardParseInt (char ** pStr, char * beginStr) {
unsigned int len, n;
char * pChar = *pStr;
while (pChar != beginStr && isDigit(*pChar)) {
--pChar;
}
++pChar;
len = intLen(pChar);
n = forwardParseInt(&pChar);
*pStr = pChar - 1 - len;
return n;
}
unsigned int encodedSize (char * encoded) {
int encodedLen = 0;
while (*encoded++ != '\0') {
++encodedLen;
}
return encodedLen;
}
unsigned int decodedSize (char * encoded) {
int decodedLen = 0;
while (*encoded++ != '\0') {
decodedLen += forwardParseInt(&encoded);
}
return decodedLen;
}
void shift (char * str, int n) {
do {
str[n] = *str;
} while (*str++ != '\0');
}
unsigned int max (unsigned int x, unsigned int y) {
return x > y ? x : y;
}
void decode (char * encodedBegin) {
int shiftAmount;
unsigned int eSize = encodedSize(encodedBegin);
unsigned int dSize = decodedSize(encodedBegin);
int writeOverflowed = 0;
char * read = encodedBegin + eSize - 1;
char * write = encodedBegin + max(eSize, dSize);
*write-- = '\0';
while (read != encodedBegin) {
unsigned int i;
unsigned int n = backwardParseInt(&read, encodedBegin);
char c = *read;
for (i = 0; i < n; ++i) {
*write = c;
if (write != encodedBegin) {
write--;
}
else {
writeOverflowed = 1;
}
}
if (read != encodedBegin) {
read--;
}
}
if (!writeOverflowed) {
write++;
}
shiftAmount = encodedBegin - write;
if (write != encodedBegin) {
shift(write, shiftAmount);
}
return;
}
int main (int argc, char ** argv) {
//char buff[256] = { "!!!A33B1C2D1E1\0!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" };
char buff[256] = { "!!!A2B12C1\0!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" };
//char buff[256] = { "!!!A1B1C1\0!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" };
char * str = buff + 3;
//char buff[256] = { "A1B1" };
//char * str = buff;
decode(str);
return 0;
}

This is a very vague question, though it's not particularly difficult if you think about it. As you say, decoding A3 as AAA and just writing it in place will overwrite the chars B and 1, so why not just move those farther along the array first?
For instance, once you've read A3, you know that you need to make space for one extra character; if it was A4 you'd need two, and so on. To achieve this you'd find the end of the string in the array (do this upfront and store its index).
Then loop though, moving the characters to their new slots:
To start: A|3|B|1|C|2|||||||
Have a variable called end storing the index 5, i.e. the last, non-blank, entry.
You'd read in the first pair, using a variable called cursor to store your current position - so after reading in the A and the 3 it would be set to 1 (the slot with the 3).
Pseudocode for the move:
var n = array[cursor] - 2; // n = 1, the 3 from A3, and then minus 2 to allow for the pair.
for(i = end; i > cursor; i--)
{
array[i + n] = array[i];
}
This would leave you with (slot 2 still holds a stale copy of B, which is overwritten next):
A|3|B|B|1|C|2||||||
Now the A is there once already, so now you want to write n + 1 A's starting at the index stored in cursor:
for(i = cursor; i < cursor + n + 1; i++)
{
array[i] = array[cursor - 1];
}
// increment the cursor afterwards!
cursor += n + 1;
Giving:
A|A|A|B|1|C|2||||||
Then you're pointing at the start of the next pair of values, ready to go again. I realise there are some holes in this answer, though that is intentional as it's an interview question! For instance, in the edge cases you specified A1B1, you'll need a different loop to move subsequent characters backwards rather than forwards.

Another O(n^2) solution follows.
Given that there is no limit on the complexity of the answer, this simple solution seems to work perfectly.
while ( there is an expandable element ):
expand that element
adjust (shift) all of the elements on the right side of the expanded element
Where:
Free space size is the number of empty elements left in the array.
An expandable element is an element that:
expanded size - encoded size <= free space size
The point is that in the process of reaching from the run-length code to the expanded string, at each step, there is at least
one element that can be expanded (easy to prove).
