Don't understand why C program crashes, pointer array of strings - c

I'm having trouble with this for loop and I don't understand why it crashes. I'm trying to read an input list of 20 names in "first name last name" format and storing them as a string in "last name, first name". Duplicates should not be stored into the array pointer.
When I comment out the malloc and compare loop, apparently there is some issue with the address staying the same, so that *ary returns the same value as *walker. The filePtr works and the strcpy and strcat functions have no issues. Also, removing the first printf also causes the program to crash, even though removing it doesn't seem it should have any real effect besides output.
FILE *filePtr = fopen ("input.txt","r");
int size = 20;
char **ary;
char **walker;
char **end;
int strsize = 0;
char firstname[30] = {0};
char lastname[30] = {0};
char *fullname;
ary = calloc (size, sizeof(char *));
printf("%d\n",sizeof(pAry));
for ( walker = ary ; *walker < (*end = *ary + size) ; walker++)
{
fscanf(filePtr," %s",firstname);
fscanf(filePtr," %[^\n]",lastname);
strsize = strlen(firstname) + strlen(lastname) + 3;
fullname = malloc (strsize * sizeof(char));
strcpy(fullname,lastname);
strcat(fullname,", ");
strcat(fullname,firstname);
for ( compare = 0 ; compare < walker ; compare++)
{
if(strcmp(fullname,*(ary + compare)) != 0)
{
diff = 0;
}
}
if (diff)
{
strncpy(*walker,fullname,strsize);
printf("%s\n",*walker);
}
free(fullname);
}

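The outer loop has to visit every entry of ary, so its end condition must test whether walker has reached the end of the array. The original condition also dereferences end, which is never initialised, so writing to *end is undefined behaviour and a likely cause of the crash.
No dereferencing is needed here: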
for (walker = ary; walker < (end = ary + size); walker++)
The loop that tests for duplicates compares compare against an absolute pointer value, but initialising compare to 0 implies a relative (index) comparison, so this condition
compare < walker;
should be
compare < (walker - ary);
Subtracting two pointers yields a signed integer whose width depends on the compiler and/or system. To get around this uncertainty, the integer type ptrdiff_t (declared in <stddef.h>) was introduced as the type of a pointer difference.
So compare should be declared as:
ptrdiff_t compare;
strcmp() returns 0 if the strings to be compared are equal, so setting diff to 0 on inequality is wrong.
You might like to use the following statement to set diff:
diff = strcmp(fullname,*(ary + compare));
This sets diff to 0 (false) if the two strings are equal (not *diff*erent).
Also, the comparison should stop as soon as a duplicate has been found:
if (!diff)
{
break;
}
Finally diff needs to be (re-)initialised for every iteration.
Instead of
strncpy(*walker, fullname, strsize);
do
*walker = fullname;
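because fullname points to the freshly allocated memory. That pointer has to be stored, since fullname is overwritten in the next iteration. (The original strncpy() also wrote through *walker, which calloc() had set to NULL.)
The line freeing fullname at the end of the loop body
free(fullname);
must then be removed, otherwise the array would hold dangling pointers.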
Putting all this together you get:
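...
for (walker = ary; walker < (end = ary + size); walker++)
{
    ...
    int diff = 1; /* assume the name is new until a match is found */
    for (ptrdiff_t compare = 0; compare < (walker - ary); compare++)
    {
        if (ary[compare] == NULL)
        {
            continue; /* slot left empty by an earlier duplicate */
        }
        diff = strcmp(fullname, ary[compare]);
        if (!diff)
        {
            break;
        }
    }
    if (diff)
    {
        *walker = fullname;
        printf("%s\n", *walker);
    }
    else
    {
        free(fullname); /* duplicate: nothing else keeps this copy alive */
    }
}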
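The loop above still leaves an empty slot in ary for every duplicate. A minimal sketch of one alternative, assuming the names are stored densely and counted separately, is shown here. The helpers make_fullname and add_unique are hypothetical names introduced for illustration and are not part of the original program.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds "last, first" in freshly allocated memory.
   Returns NULL if malloc() fails. */
static char *make_fullname(const char *first, const char *last)
{
    size_t n = strlen(last) + 2 + strlen(first) + 1; /* ", " plus '\0' */
    char *s = malloc(n);
    if (s != NULL)
    {
        snprintf(s, n, "%s, %s", last, first);
    }
    return s;
}

/* Hypothetical helper: stores name in ary[*count] unless an equal string
   is already present or the array is full. Takes ownership of name.
   Returns 1 if the name was stored, 0 if it was freed instead. */
static int add_unique(char **ary, size_t capacity, size_t *count, char *name)
{
    for (size_t i = 0; i < *count; i++)
    {
        if (strcmp(ary[i], name) == 0)
        {
            free(name); /* duplicate */
            return 0;
        }
    }
    if (*count == capacity)
    {
        free(name); /* no room left */
        return 0;
    }
    ary[(*count)++] = name;
    return 1;
}
```

With this layout the first *count entries of ary are always valid strings, so the duplicate check never has to skip NULL slots.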

Related

Bubble sort on string array not doing anything in C with strcmp

I am trying to sort a string array in C (char**), the string array is an array of all the names of files in a directory.
Here are all the parts of the code, I expect the code to sort the array alphabetically, but it doesn't work.
Here is the code to get the files from the directory
typedef struct
{
char** names;
int32_t count;
} Files;
void swap(char* a, char* b)
{
char* temp = a;
a = b;
b = temp;
}
void bubbleSort(char** strs, uint32_t length)
{
uint32_t i = 0, j = 0;
for (i = 0; i < length; i++)
{
for (j = 0; j < length - i - 1; j++)
{
if (strcmp(strs[j], strs[j + 1]) < 0)
{
swap(strs[j], strs[j + 1]);
}
}
}
}
Files* getScannableFilesInDir(char* dirname)
{
Files* files = (Files*)malloc(sizeof(Files));
files->names = NULL; //clearing garbage values
files->count = 0; //clearing garbage values
uint32_t dirnameLength = strlen(dirname);
uint32_t count = 0;
uint32_t countIterations = 0;
DIR* d = opendir(dirname);
struct dirent* dir = NULL;
while ((dir = readdir(d)) != NULL)
{
if (files->names == NULL) //first file, if START_AMOUNT is set to 0 we want to allocate enough space for the first path, once we know there is at least 1 file
{
files->names = (char**)malloc(1 * sizeof(char*));
count++; // there is enough space allocated for 1 string
}
if (strcmp(dir->d_name, ".") && strcmp(dir->d_name, "..") && dir->d_type != DT_DIR)
{
++countIterations;
if (count < countIterations)
{
files->names = (char**)realloc(files->names, countIterations * sizeof(char*)); //adding 1 more space
}
files->names[countIterations - 1] = (char*)malloc(sizeof(char) * (strlen(dir->d_name) + dirnameLength) + 2); //-1 because we are incrementing at the start of the loop and not the end
//+ 2 at the end because a. the \0 at the end, b. 1 space for adding the slash
strcpy(files->names[countIterations - 1], dirname);
files->names[countIterations - 1][dirnameLength] = '/'; //changing the \0 to /
strcpy(files->names[countIterations - 1] + (dirnameLength + 1), dir->d_name); //adding the name after the /, now we have the full name
}
}
closedir(d);
files->count = countIterations;
bubbleSort(files->names, files->count);
return files;
}
I checked the rest of the code, I am not changing the value of the files->names at any other point of the code, just reading it.
Your swap, and the way it is invoked from bubbleSort, are both wrong. First let me educate you on a simple fundamental.
Arguments in C are pass-by-value. The only way to change caller-side data from an argument passed to a function as a parameter is to change the fundamentals of the parameter. You'll still pass by-value, you just need to make the "value" something you can use to access the caller's variable data. In C, you do that by declaring the formal parameter to be a pointer-to-type, pass the address of the thing you want changed caller-side, and within the function use the pointer via dereference to instigate the change.
void foo(int *a)
{
*a = 42; // stores 42 in whatever 'a' points to
}
Now, let's take a look at your swap:
void swap(char* a, char* b)
{
char* temp = a;
a = b;
b = temp;
}
Well, we have two pointers. But nowhere in this code do we actually dereference anything. All we're doing is swapping the values of the two pointers locally. The caller is completely unaffected by any of this.
What your swap is supposed to be doing is swapping two pointers. We have two char* caller side values we want to swap. The way to do that is exactly as we discussed before. Declare the formal parameters to be pointer-to-type (our type is char*, so our parameters are char**), then use dereferencing to modify the caller-side data.
void swap(char **ppA, char **ppB)
{
char *pp = *ppA;
*ppA = *ppB;
*ppB = pp;
}
That will swap two pointers, but only if the caller passes us the addresses of the two pointers. Note the vernacular here. I did not say "passes us the addresses in the two pointers" (i.e. their values), I said "passes us the addresses of the two pointers" (i.e. where the pointers themselves reside in memory). That means bubbleSort needs changing too:
void bubbleSort(char** strs, uint32_t length)
{
uint32_t i = 0, j = 0;
for (i = 0; i < length; i++)
{
for (j = 0; j < length - i - 1; j++)
{
if (strcmp(strs[j], strs[j + 1]) < 0)
{
swap(strs+j, strs+j+1); // <=== HERE
}
}
}
}
This passes the addresses of the pointers at strs[j] and strs[j+1] to the swap function.
That should solve your swap issue. Whether the algorithm is correct, or for that matter the rest of your code, I leave to you to digest.
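As a hedged follow-up on the algorithm itself: with the test strcmp(...) < 0 the loop moves larger strings to the front, so the result is in descending order, and length - i - 1 wraps around for an unsigned length of 0. The sketch below assumes ascending order is what was wanted. The names swap_strings and bubble_sort_strings are illustrative and do not come from the question.

```c
#include <stddef.h>
#include <string.h>

/* Swaps the two char* values that ppA and ppB point to. */
static void swap_strings(char **ppA, char **ppB)
{
    char *tmp = *ppA;
    *ppA = *ppB;
    *ppB = tmp;
}

/* Bubble sort into ascending strcmp() order.
   The loop bounds are written so that nothing underflows
   when length is 0 or 1. */
static void bubble_sort_strings(char **strs, size_t length)
{
    for (size_t i = 0; i + 1 < length; i++)
    {
        for (size_t j = 0; j + 1 < length - i; j++)
        {
            if (strcmp(strs[j], strs[j + 1]) > 0)
            {
                swap_strings(&strs[j], &strs[j + 1]);
            }
        }
    }
}
```

Only the pointers are exchanged; the character data of each file name stays where it was allocated.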

Problems freeing memory using free()

I have been trying to write a function that will insert commas into a binary number.
Below is my best attempt. It does work if I do NOT try to free() the memory.
If I try to free() the memory, I get an error.
I am puzzled. Please let me know what I am doing wrong.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int insertCommasIntoBinaryNumber(char* outstring, char* instring)
{
char *ptr, *optr;
int i, length, commas;
// move ptr to end of instring
for ( ptr = instring; *ptr; ptr++ );
//calculate offset with commas
length = ptr - instring;
commas = ( length - 1 ) / 8;
optr = outstring + length + commas;
//copy instring into outstring backwards inserting commas
*optr-- = *ptr--;
for ( i = 1; ptr >= instring; i++ )
{
*optr-- = *ptr--;
if ( ( i % 8 ) == 0 )
*optr-- = ',';
}
}
int main (void)
{
const int arrayDimension = 100;
char* instring = (char*) malloc(sizeof(char) * arrayDimension);
char* outstring = (char*) malloc(sizeof(char) * arrayDimension);
strncpy(instring, "111111110101010100001100", arrayDimension-1);
insertCommasIntoBinaryNumber(outstring, instring);
/* show the result */
printf ( "%s\n", outstring );
free(instring);
free(outstring);
}
Here is the output:
11111111,01010101,00001100
*** Error in `./a.out': free(): invalid next size (fast): 0x0000000000bc8010 ***
P.S. Many thanks for letting me know where the code was crashing on the 24th iteration. I soon realized that I was not calculating the number of commas needed correctly and not keeping track of the number of commas being inserted. After I did that, the code below appears to work fine now.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
int insertCommasIntoBinaryNumber(char* const outString, const char* const inString)
{
char const *iptr; // iptr will be a pointer to the
// constant inString char array.
char *optr;
int i, commaCount;
// move iptr to end of inString
for ( iptr = inString; *iptr; iptr++ );
// Calculate Number of Commas Needed
const int inStringLength = iptr - inString;
const double totalNumberOfCommasFP = (( inStringLength ) / 8.0) - 1.0;
const int totalNumberOfCommas = (int) ceil(totalNumberOfCommasFP);
// Set optr
optr = outString + inStringLength + totalNumberOfCommas;
//copy inString into outString backwards inserting commas
*optr-- = *iptr--;
commaCount = 0;
for ( i = 1; iptr >= inString; i++ )
{
*optr-- = *iptr--;
if ( ( ( i % 8 ) == 0 ) && (commaCount < totalNumberOfCommas) )
{
*optr-- = ',';
commaCount++;
}
}
}
int main (void)
{
const char testString[] = "111111110101010100001100";
const int inStringArrayDimension = strlen(testString) + 1;
char * inString = (char*) malloc(sizeof(char) * inStringArrayDimension);
strncpy(inString, testString, inStringArrayDimension);
const int inStringLength = (int) strlen(inString);
const double totalNumberOfCommasFP = (( inStringLength ) / 8.0) - 1.0;
const int totalNumberOfCommas = (int) ceil(totalNumberOfCommasFP);
const int outStringArrayDimension = inStringArrayDimension + totalNumberOfCommas;
char* outString = (char*) malloc(sizeof(char) * outStringArrayDimension);
insertCommasIntoBinaryNumber(outString, inString);
/* show the result */
printf ( "%s\n", outString );
free(inString);
free(outString);
exit (EXIT_SUCCESS);
}
And here is the output:
11111111,01010101,00001100
Since the string length is 24, for ( i = 1; ptr >= instring; i++ ) iterates 24 times. On the 24th iteration, optr points to the first character of outstring. Since (i % 8) == 0 is true, *optr-- = ','; is executed. That puts a comma before outstring, writing outside array bounds and corrupting your program’s memory.
I recommend that you do this
#include <assert.h>
//copy instring into outstring backwards inserting commas
assert (optr >= outstring);
*optr-- = *ptr--;
for ( i = 1; ptr >= instring; i++ )
{
assert (optr >= outstring);
*optr-- = *ptr--;
if ( ( i % 8 ) == 0 ) {
assert (optr >= outstring);
*optr-- = ',';
}
}
Then when the assertion goes off, debug it. You have almost certainly botched the space calculation, under-estimating how much space is needed to store the version of the datum with commas inserted.
Secondly, you're doing something that ISO C does not require to work: decrementing pointers below the start of an object. This is not the actual problem; even if you debug the malloc corruption, that issue is still there.
What I'm getting at is that this is not a correct idiom:
for (ptr = end_of_object; ptr >= start_of_object; ptr--)
{
// ... loop body in which ptr is dereferenced
}
Here is why. When the last iteration of the loop occurs, ptr == start_of_object holds. The body of the loop is executed, and then, unconditionally, the ptr-- decrement is executed. Even though we do not execute the loop body any more, and therefore do not dereference ptr, it is still incorrect to be decrementing it. It is undefined behavior according to ISO C.
The idiom works in machine languages.
One way to avoid it is to use integer indexing.
for (i = num_elements - 1; i >= 0; i--)
{
// work with array[i]
}
Here, i is assumed to be a signed integer type. At the top of the last iteration, i == 0 holds. Then, unconditionally, i is decremented to -1. Since array[i] is never accessed in this case, that is completely safe.
Lastly, a robust, production version of this type of code cannot just assume that you have all the space in the destination array. Your comma-inserting API needs to provide some means by which the caller can determine how much space is required. E.g.:
const char *str = "1110101101";
// determine how much space is needed for string with commas
size_t space = comma_insert_space_required(str);
// OK, allocate that much space
char *str_with_commas = malloc(space);
// now process the same string into that space
comma_insert(str_with_commas, str);
This will work whether you have a five character input or five thousand.
If you choose an approach involving artificial limits in there like 100 bytes (the idea being no valid inputs will ever occur that come close), you still need defenses against that, so that you don't access an object out of bounds. "Bad guys" trying to break your software will look for ways to sneak in the "never occur" inputs.
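The two functions named in that example, comma_insert_space_required and comma_insert, are not defined in the answer. The following is one possible sketch of them, assuming groups of 8 digits counted from the right as in the question. It walks the input forwards with an index, so no pointer is ever moved below the start of a buffer.

```c
#include <stddef.h>
#include <string.h>

#define GROUP 8 /* digits per group, as in the question (assumption) */

/* Bytes needed for the comma-separated form, including the '\0'. */
size_t comma_insert_space_required(const char *str)
{
    size_t len = strlen(str);
    size_t commas = (len == 0) ? 0 : (len - 1) / GROUP;
    return len + commas + 1;
}

/* Copies str into dst, inserting a comma between groups of GROUP
   digits counted from the right. dst must provide at least
   comma_insert_space_required(str) bytes. */
void comma_insert(char *dst, const char *str)
{
    size_t len = strlen(str);
    size_t out = 0;
    for (size_t i = 0; i < len; i++)
    {
        /* A comma goes before position i when a whole number of
           groups remains to its right, but never before the
           first digit. */
        if (i > 0 && (len - i) % GROUP == 0)
        {
            dst[out++] = ',';
        }
        dst[out++] = str[i];
    }
    dst[out] = '\0';
}
```

Because the caller sizes the buffer from the same input it then passes to comma_insert, the fixed limit of 100 bytes disappears.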

Using pointers in 2D arrays

I'm attempting to store arrays of integers that I read from a file (with a separate function) in a 2D array but I keep having issues with Segmentation fault. I know it's an issue with my pointers but I can't figure out exactly what I'm doing wrong.
Here is my function (takes an integer and compares it with an integer read from a file before storing it in my 2D array).
int **getStopTimes(int stop_id) {
int **result = malloc(sizeof(*result));
char const* const fileName = "stop_times_test.txt";
FILE* txt = fopen(fileName, "r");
char line[256];
int count = 0;
while (fgets(line, sizeof(line), txt) != NULL) {
int *formattedLine = getStopTimeData(line); //getStopTimeData returns a pointer to an array of ints, memory is allocated in the function
if (formattedLine[1] == stop_id) {
result[count] = formattedLine;
count++;
}
}
fclose(txt);
return result;
}
And my main:
int main(int argc, char *argv[]) {
int **niceRow = getStopTimes(21249);
for (int i=0; i<2; i++) { //Only looping 3 iterations for test purposes
printf("%d,%d,%d,%d\n",niceRow[i][0], niceRow[i][1], niceRow[i][2], niceRow[i][3]);
}
free(niceRow);
return 0;
}
getStopTimeData function thats being called (Pulls certain information from an array of chars and stores/returns them in an int array):
int *getStopTimeData(char line[]) {
int commas = 0;
int len = strlen(line);
int *stopTime = malloc(4 * sizeof(*stopTime)); //Block of memory for each integer
char trip_id[256]; //Temp array to build trip_id string
char stop_id[256]; //Temp array to build stop_id string
int arrival_time; //Temp array to build arrival_time string
int departure_time; //Temp array to build departure_time string
int counter;
for(int i = 0; i <len; i++) {
if(line[i] == ',') {
commas++;
counter = 0;
continue;
}
switch(commas) { //Build strings here and store them
case 0 :
trip_id[counter++] = line[i];
if(line[i+1] == ',') trip_id[counter] = '\0';
break;
case 1: //Convert to hours past midnight from 24hr time notation so it can be stored as int
if(line[i] == ':' && line[i+3] == ':') {
arrival_time = (line[i-2]-'0')*600 + (line[i-1]-'0')*60 + (line[i+1]-'0')*10 + (line[i+2]-'0');
}
break;
case 2 :
if(line[i] == ':' && line[i+3] == ':') {
departure_time = (line[i-2]-'0')*600 + (line[i-1]-'0')*60 + (line[i+1]-'0')*10 + (line[i+2]-'0');
}
break;
case 3 :
stop_id[counter++] = line[i];
if(line[i+1] == ',') stop_id[counter] = '\0';
break;
}
}
//Assign and convert to ints
stopTime[0] = atoi(trip_id);
stopTime[1] = atoi(stop_id);
stopTime[2] = arrival_time;
stopTime[3] = departure_time;
return stopTime;
}
This line:
int **result = malloc(sizeof(*result));
allocates just memory for one single pointer. (*result is of type int *, so it's a pointer to data -- the sizeof operator will tell you the size of a pointer to data ... e.g. 4 on a 32bit architecture)
What you want to do is not entirely clear to me without seeing the code for getStopTimeData() ... but you definitely need more memory. If this function indeed returns a pointer to some ints, and it handles allocation correctly, you probably want something along the lines of this:
int result_elements = 32;
int **result = malloc(sizeof(int *) * result_elements);
int count = 0;
[...]
if (formattedLine[1] == stop_id) {
if (count == result_elements)
{
result_elements *= 2;
result = realloc(result, result_elements * sizeof(int *)); // size is given in bytes, not elements
}
result[count] = formattedLine;
count++;
}
Add proper error checking, malloc and realloc could return (void *)0 (aka null) on out of memory condition.
Also, the 32 for the initial allocation size is just a wild guess ... adapt it to your needs (so it doesn't waste a lot of memory, but will be enough for most use cases)
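The error checking mentioned above could look roughly like the sketch below. RowList and rowlist_push are hypothetical names introduced here. The point is that the result of realloc() goes into a temporary first, so the old block is not lost when the call fails.

```c
#include <stdlib.h>

/* Hypothetical container for the rows collected so far. */
typedef struct
{
    int **rows;      /* array of pointers to int arrays */
    size_t count;    /* slots in use */
    size_t capacity; /* slots allocated */
} RowList;

/* Appends row, doubling the capacity when the array is full.
   Returns 0 on success, or -1 if realloc() fails, in which case
   the list is left unchanged. */
static int rowlist_push(RowList *list, int *row)
{
    if (list->count == list->capacity)
    {
        size_t new_cap = list->capacity ? list->capacity * 2 : 32;
        int **tmp = realloc(list->rows, new_cap * sizeof *tmp);
        if (tmp == NULL)
        {
            return -1;
        }
        list->rows = tmp;
        list->capacity = new_cap;
    }
    list->rows[list->count++] = row;
    return 0;
}
```

Returning such a struct from getStopTimes would also tell main how many rows were found, which the fixed loop bound in the question cannot know.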
The answer above is good.
One more piece of advice: instead of a 2D array, consider a single flat array that stores all your data, which guarantees contiguous memory.
You can then access the 1D array as if it were a 2D array with a simple trick.
Assume each row of your 2D array holds line_size elements.
To access it like a matrix, compute the index into the 1D array for the given x, y values:
index = x + y * line_size;
In the opposite direction,
if you know the index and want the x and y that correspond to it:
y = index / line_size;
x = index % line_size;
Of course, this trick only works if you already know your line size.
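The two mappings can be wrapped in small helpers. This is only a sketch, with flat_index and flat_coords as illustrative names; for the question's data, line_size would be 4, the number of ints per stop time.

```c
#include <stddef.h>

/* Maps column x and row y to an index into the flat array. */
static size_t flat_index(size_t x, size_t y, size_t line_size)
{
    return x + y * line_size;
}

/* Recovers the column and row from a flat index. */
static void flat_coords(size_t index, size_t line_size, size_t *x, size_t *y)
{
    *y = index / line_size;
    *x = index % line_size;
}
```

For example, with 4 values per row, column 2 of row 3 lives at flat index 14.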

Making two arrays the same length. C(89)

The two arrays passed in are constants so I made two new arrays.
The first array stores a group of chars and the second array stores a second group of chars. So far I assume that the first group is bigger than the second ex. (a,b,c,d > x,y).
What the program hopes to accomplish is to make two new arrays that contain the same letters, but with the shorter array (in this case arr2, i.e. newarr2) having its last char repeated until it matches the length of the first array.
examples of correct solutions.
(a,b,c,d < x,y) --> equate_arr --> (a,b,c,d = x,y,y,y)
void equate_arr(char arg2[], char arg1[]){
size_t i = 0;
size_t len1 = strlen(arg1);
size_t len2 = strlen(arg2);
char newarr2[512];
char newarr1[512];
while(i < (strlen2 - 1))
{
newarr2[i] = arg2[i];
i++;
}
i = 0;
while(i < (strlen1 - 1))
{
newarr1[i] = arg1[i];
i++;
}
i = 0;
while(strlen(newarr2) < strlen(newarr1))
{
newarr2[strlen(newarr2)] = newarr2[strlen(newarr2)-1]
}
}
Currently I have no idea what is happening because once I fiddle with this function in my code the program does not seem to run anymore. Sorry about asking about this project I'm working on so much but I really do need some assistance.
I can put the whole program in here if needed.
Revised
void tr_non_eq(char arg1[], char arg2[], int len1, int len2)
{
int i = 0;
char* arr2;
arr2 = (char*)calloc(len1+1,sizeof(char));
while(i < len2)
{
arr2[i] = arg2[i];
i++;
}
while(len2 < len1)
{
arr2[len2] = arg2[len2-1];
len2++;
}
tr_str(arg1, arr2);
}
Right now with inputs (a,b,c,d,e,f) and (x,y) and a string "cabbage" to translate the program prints out "yxyyx" and with string "abcdef" it prints out "xyy" which shows promise. I am not too sure why the arr2 array does not get filled with "y" chars as intended.
As de-duplicator says, as your code stands it effectively achieves nothing. More importantly, what it tries to do is fraught with peril.
The fact that you use strlen to determine the length of your arguments is a clear indicator that equate_arr does not expect to receive two arrays of char. Instead, it wants two NUL-terminated C-style strings. So the declaration should be more like:
void equate_arr(const char *arg2, const char *arg1)
This makes the contract a little clearer.
But note the return type: void. This says your function will not return any values to the caller. So, how did you plan to return the modified arrays?
The next big peril lies in these lines:
char newarr2[512];
char newarr1[512];
What happens if this function is called with a string which is larger than 511 characters (plus the NUL)? The phrase "buffer overrun" should be jumping out at you here.
What you need is to malloc buffers large enough to hold a duplicate of the longest string passed in. But that raises the question of how you will hand the new arrays back to the caller (remember that void return type?).
There are numerous other problems here, largely down to not having a clear definition of the contract this function is meant to meet.
One more for now while I look more closely
while(strlen(newarr2) < strlen(newarr1))
{
newarr2[strlen(newarr2)] = newarr2[strlen(newarr2)-1]
}
The very first pass through this loop overwrites the terminating NUL in newarr2, which means the next call to strlen is off into undefined behavior as it is completely at the mercy of whatever junk is sitting in your stack.
If you are unclear on C-style strings, take a look at my answer to this question which goes into great detail about them.
The following is whiteboard-code (i.e. not compiled, not tested) which would sort of do what you are wanting to achieve. It's purely for reference
// Pad a string so that it is the same length as another. Padding is done
// by replicating the final character.
//
// @param padThis: A C-style string in a non-constant buffer.
// @param bufLength: The size of the buffer containing padThis
// @param toMatchThis: A (possibly) const C-style string to act
// as a template for length
//
// Pre-conditions:
// - Both padThis and toMatchThis reference NUL-terminated sequences
// of chars
// - strlen(padThis) < bufLength. Violating this will exit the program.
// - strlen(toMatchThis) < bufLength. If not, padThis will be padded
// to bufLength characters.
//
// Post-conditions:
// - The string referenced by toMatchThis is unchanged
// - The original string at padThis has been padded if necessary to
// min(bufLength, strlen(toMatchThis))
void padString(char * padThis, size_t bufLength, const char * toMatchThis)
{
size_t targetLength = strlen(toMatchThis);
size_t originalLength = strlen(padThis);
if (originalLength >= bufLength)
{
fprintf(stderr, "padString called with an original which is longer than the buffer!\n");
exit(EXIT_FAILURE);
}
if (targetLength >= bufLength)
targetLength = bufLength -1; // Just pad until buffer full
if (originalLength == 0 || targetLength <= originalLength)
return; // Nothing to do (an empty string has no final character to replicate)
// At this point, we know that some padding needs to occur, and
// that the buffer is large enough (assuming the caller is not
// lying to us).
char padChar = padThis[originalLength-1];
size_t index = originalLength;
while (index < targetLength)
padThis[index++] = padChar;
padThis[index] = '\0';
}
Since you declared
char newarr2[512];
char newarr1[512];
with size 512 and never stored a terminating NUL character in them, strlen() on newarr1 and newarr2 returns a garbage length, because it keeps reading until it happens to find a zero byte.
while(strlen(newarr2) < strlen(newarr1))
{
newarr2[strlen(newarr2)] = newarr2[strlen(newarr2)-1]
}
This while loop will therefore not work properly.
for ( i = len2; i < len1; ++i )
newarr2[i] = newarr2[len2-1];
If len2 is always less than len1, you can use the loop above instead.
If you do not know which array will be longer, allocate both buffers at the larger length:
size_t len1 = strlen(arg1);
size_t len2 = strlen(arg2);
char* newarr1;
char* newarr2;
size_t i; /* same type as len1 and len2, avoids signed/unsigned comparison */
if ( len1 >= len2 )
{
newarr1 = (char*)calloc(len1+1,sizeof(char));
newarr2 = (char*)calloc(len1+1,sizeof(char));
}
else
{
newarr1 = (char*)calloc(len2+1,sizeof(char));
newarr2 = (char*)calloc(len2+1,sizeof(char));
}
for ( i = 0; i < len1; ++i)
newarr1[i] = arg1[i];
for ( i = 0; i < len2; ++i)
newarr2[i] = arg2[i];
if( len1 >= len2 )
{
for ( i = len2; i < len1; ++i )
newarr2[i] = newarr2[len2-1];
}
else
{
for ( i = len1; i < len2; ++i )
newarr1[i] = newarr1[len1-1];
}
Free the memory later, once you are done with both buffers.
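One way to hand the padded array back to the caller, which the void function in the question cannot do, is to return the allocated buffer. The sketch below assumes that design; pad_copy is a hypothetical name, and it is written in C89 style to match the question.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc()ed copy of src, padded to
   target_len characters by repeating its last character. If src is
   already at least that long, a plain copy is returned. Returns NULL
   if src is empty (there is no character to repeat) or if malloc()
   fails. The caller must free() the result. */
static char *pad_copy(const char *src, size_t target_len)
{
    size_t len = strlen(src);
    char *out;

    if (len == 0)
    {
        return NULL;
    }
    if (target_len < len)
    {
        target_len = len;
    }
    out = malloc(target_len + 1);
    if (out == NULL)
    {
        return NULL;
    }
    memcpy(out, src, len);                              /* original text */
    memset(out + len, src[len - 1], target_len - len);  /* padding */
    out[target_len] = '\0';
    return out;
}
```

A caller would compute the longer of the two lengths once and call pad_copy on both strings with that value, for instance pad_copy("xy", 4) for the example in the question.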

In-place run length decoding?

Given a run length encoded string, say "A3B1C2D1E1", decode the string in-place.
The answer for the encoded string is "AAABCCDE". Assume that the encoded array is large enough to accommodate the decoded string, i.e. you may assume that the array size = MAX[length(encodedstring), length(decodedstring)].
This does not seem trivial, since merely decoding A3 as 'AAA' will lead to over-writing 'B' of the original string.
Also, one cannot assume that the decoded string is always larger than the encoded string.
Eg: Encoded string - 'A1B1', Decoded string is 'AB'. Any thoughts?
And it will always be a letter-digit pair, i.e. you will not be asked to convert 0515 to 0000055555.
If we don't already know, we should scan through first, adding up the digits, in order to calculate the length of the decoded string.
It will always be a letter-digit pair, hence (assuming every count is a single digit) you can delete the 1s from the string without any confusion.
A3B1C2D1E1
becomes
A3BC2DE
Here is some code, in C++, to remove the 1s from the string (O(n) complexity).
// remove 1s
size_t i = 0; // read from here
size_t j = 0; // write to here
while(i < str.length()) {
assert(j <= i); // optional check
if(str[i] != '1') {
str[j] = str[i];
++ j;
}
++ i;
}
str.resize(j); // to discard the extra space now that we've got our shorter string
Now, this string is guaranteed to be shorter than, or the same length as, the final decoded string. We can't make that claim about the original string, but we can make it about this modified string.
(An optional, trivial, step now is to replace every 2 with the previous letter. A3BCCDE, but we don't need to do that).
Now we can start working from the end. We have already calculated the length of the decoded string, and hence we know exactly where the final character will be. We can simply copy the characters from the end of our short string to their final location.
During this copy process from right-to-left, if we come across a digit, we must make multiple copies of the letter that is just to the left of the digit. You might be worried that this might risk overwriting too much data. But we proved earlier that our encoded string, or any substring thereof, will never be longer than its corresponding decoded string; this means that there will always be enough space.
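The whole procedure described above can be sketched in C as follows. This is an illustration under stated assumptions, not a tested reference solution: every count is a single digit from 1 to 9, the letters themselves are not digits, and the buffer has room for the decoded string plus its terminator. The name rle_decode_in_place is made up for this example.

```c
#include <ctype.h>
#include <stddef.h>

/* Decodes letter-digit pairs such as "A3B1C2" in place.
   Assumes well-formed input with single-digit counts 1..9 and a
   buffer large enough for the decoded string plus '\0'. */
static void rle_decode_in_place(char *buf)
{
    size_t i, compact_len = 0, decoded_len = 0;
    size_t r, w;

    /* Pass 1: sum the counts and drop every count of 1. */
    for (i = 0; buf[i] != '\0'; i++)
    {
        if (isdigit((unsigned char)buf[i]))
        {
            decoded_len += (size_t)(buf[i] - '0');
            if (buf[i] == '1')
            {
                continue; /* a lone letter already means "once" */
            }
        }
        buf[compact_len++] = buf[i];
    }

    /* Pass 2: expand from right to left into the final positions.
       r counts unread compact chars, w counts unwritten output chars. */
    buf[decoded_len] = '\0';
    r = compact_len;
    w = decoded_len;
    while (r > 0)
    {
        size_t count = 1;
        char letter;
        if (isdigit((unsigned char)buf[r - 1]))
        {
            count = (size_t)(buf[r - 1] - '0');
            r--;
            if (r == 0)
            {
                break; /* malformed input: digit without a letter */
            }
        }
        letter = buf[--r]; /* saved before its slot can be overwritten */
        while (count-- > 0)
        {
            buf[--w] = letter;
        }
    }
}
```

The letter is copied into a local variable before any of its copies are written, because the leftmost copy may land on the slot the letter was read from.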
The following solution is O(n) and in-place. The algorithm should not access memory it shouldn't, both read and write. I did some debugging, and it appears correct to the sample tests I fed it.
High level overview:
Determine the encoded length.
Determine the decoded length by reading all the numbers and summing them up.
End of buffer is MAX(decoded length, encoded length).
Decode the string by starting from the end of the string. Write from the end of the buffer.
Since the decoded length might be greater than the encoded length, the decoded string might not start at the start of the buffer. If needed, correct for this by shifting the string over to the start.
int isDigit (char c) {
return '0' <= c && c <= '9';
}
unsigned int toDigit (char c) {
return c - '0';
}
unsigned int intLen (char * str) {
unsigned int n = 0;
while (isDigit(*str++)) {
++n;
}
return n;
}
unsigned int forwardParseInt (char ** pStr) {
unsigned int n = 0;
char * pChar = *pStr;
while (isDigit(*pChar)) {
n = 10 * n + toDigit(*pChar);
++pChar;
}
*pStr = pChar;
return n;
}
unsigned int backwardParseInt (char ** pStr, char * beginStr) {
unsigned int len, n;
char * pChar = *pStr;
while (pChar != beginStr && isDigit(*pChar)) {
--pChar;
}
++pChar;
len = intLen(pChar);
n = forwardParseInt(&pChar);
*pStr = pChar - 1 - len;
return n;
}
unsigned int encodedSize (char * encoded) {
int encodedLen = 0;
while (*encoded++ != '\0') {
++encodedLen;
}
return encodedLen;
}
unsigned int decodedSize (char * encoded) {
int decodedLen = 0;
while (*encoded++ != '\0') {
decodedLen += forwardParseInt(&encoded);
}
return decodedLen;
}
void shift (char * str, int n) {
do {
str[n] = *str;
} while (*str++ != '\0');
}
unsigned int max (unsigned int x, unsigned int y) {
return x > y ? x : y;
}
void decode (char * encodedBegin) {
int shiftAmount;
unsigned int eSize = encodedSize(encodedBegin);
unsigned int dSize = decodedSize(encodedBegin);
int writeOverflowed = 0;
char * read = encodedBegin + eSize - 1;
char * write = encodedBegin + max(eSize, dSize);
*write-- = '\0';
while (read != encodedBegin) {
unsigned int i;
unsigned int n = backwardParseInt(&read, encodedBegin);
char c = *read;
for (i = 0; i < n; ++i) {
*write = c;
if (write != encodedBegin) {
write--;
}
else {
writeOverflowed = 1;
}
}
if (read != encodedBegin) {
read--;
}
}
if (!writeOverflowed) {
write++;
}
shiftAmount = encodedBegin - write;
if (write != encodedBegin) {
shift(write, shiftAmount);
}
return;
}
int main (int argc, char ** argv) {
//char buff[256] = { "!!!A33B1C2D1E1\0!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" };
char buff[256] = { "!!!A2B12C1\0!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" };
//char buff[256] = { "!!!A1B1C1\0!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" };
char * str = buff + 3;
//char buff[256] = { "A1B1" };
//char * str = buff;
decode(str);
return 0;
}
This is a very vague question, though it's not particularly difficult if you think about it. As you say, decoding A3 as AAA and just writing it in place will overwrite the chars B and 1, so why not just move those farther along the array first?
For instance, once you've read A3, you know that you need to make space for one extra character, if it was A4 you'd need two, and so on. To achieve this you'd find the end of the string in the array (do this upfront and store its index).
Then loop through, moving the characters to their new slots:
To start: A|3|B|1|C|2|||||||
Have a variable called end storing the index 5, i.e. the last, non-blank, entry.
You'd read in the first pair, using a variable called cursor to store your current position - so after reading in the A and the 3 it would be set to 1 (the slot with the 3).
Pseudocode for the move:
var n = array[cursor] - 2; // n = 1, the 3 from A3, and then minus 2 to allow for the pair.
for(i = end; i > cursor; i--) // walk backwards so nothing is overwritten before it is moved
{
array[i + n] = array[i];
}
This would leave you with (the slot at index 2 still holds a stale B, which the next step overwrites):
A|3|B|B|1|C|2||||||
Now the A is there once already, so now you want to write n + 1 A's starting at the index stored in cursor:
for(i = cursor; i < cursor + n + 1; i++)
{
array[i] = array[cursor - 1];
}
// increment the cursor afterwards!
cursor += n + 1;
Giving:
A|A|A|B|1|C|2||||||
Then you're pointing at the start of the next pair of values, ready to go again. I realise there are some holes in this answer, though that is intentional as it's an interview question! For instance, in the edge cases you specified A1B1, you'll need a different loop to move subsequent characters backwards rather than forwards.
Another O(n^2) solution follows.
Given that there is no limit on the complexity of the answer, this simple solution seems to work perfectly.
while ( there is an expandable element ):
expand that element
adjust (shift) all of the elements on the right side of the expanded element
Where:
Free space size is the number of empty elements left in the array.
An expandable element is an element that:
expanded size - encoded size <= free space size
The point is that in the process of reaching from the run-length code to the expanded string, at each step, there is at least
one element that can be expanded (easy to prove).

Resources