Finding the Occurrence of a Pattern in a Data Block using C

I am trying to find the occurrences of a set of numbers in a data block without using arrays. Right now my issue is that every time it finds the first number in the pattern, it assumes the whole pattern matches. I am having trouble going through the whole pattern, using patternLength, in the data block before assuming it matches. For example, if the pattern I want to find over and over again is 12 14 3C 48, every time it sees the number 12 it reports the whole pattern.
uint32_t findOccurrencesOfPattern(uint32_t *const pOffsets,
const uint8_t *const blockAddress,
uint32_t blockLength,
const uint8_t *const pPattern,
uint8_t patternLength) {
uint32_t bytesRead = 0;
int count = 0;
int length = 0;
// const char* c = ((const char *)&pPattern);
while (bytesRead < blockLength) {
if (*(blockAddress + bytesRead) == *pPattern) {
*(pOffsets + count) = bytesRead;
count++;
}
patternLength--;
bytesRead++;
}
return count;
}

You will need to turn the if() statement (which presently checks only the first byte) into a for() loop that iterates over patternLength bytes and records a match only if all of them agree.
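A minimal sketch of that change, keeping the asker's signature and using pointer arithmetic only inside the function. The inner loop, the `start` and `matched` names, and the early return for an empty or oversized pattern are assumptions added for illustration, not part of the original code.

```c
#include <stdint.h>

/* Records the offset of every full match of pPattern in the block and
   returns the number of matches. Assumes pOffsets has room for them all. */
uint32_t findOccurrencesOfPattern(uint32_t *const pOffsets,
                                  const uint8_t *const blockAddress,
                                  uint32_t blockLength,
                                  const uint8_t *const pPattern,
                                  uint8_t patternLength) {
    uint32_t count = 0;

    /* Nothing can match an empty pattern or one longer than the block. */
    if (patternLength == 0 || patternLength > blockLength) {
        return 0;
    }

    /* Stop at the last position where a whole pattern still fits. */
    for (uint32_t start = 0; start <= blockLength - patternLength; start++) {
        uint8_t matched = 0;

        /* Compare byte by byte until a mismatch or the pattern's end. */
        while (matched < patternLength &&
               *(blockAddress + start + matched) == *(pPattern + matched)) {
            matched++;
        }

        /* Only a complete run of patternLength bytes counts as a match. */
        if (matched == patternLength) {
            *(pOffsets + count) = start;
            count++;
        }
    }
    return count;
}
```

Called on the block 12 00 12 14 3C 48 12 14 3C 48 12 14 with the pattern 12 14 3C 48, this should report two matches, at offsets 2 and 6, and ignore the lone 12 at offset 0 and the truncated pattern at the end.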

Related

Disassembler logic for custom opcodes in C

So I'm building a disassembler that will convert a file containing hexadecimal data into assembly language.
So from this format I could convert the hexadecimal data in the file into decimal using uint8_t and store it in an array. Then I decided to bit-shift the last number in the array to get the number of instructions in the last function; essentially I'm parsing backwards, since I don't know how much padding there is at the beginning, and the number of ops in a function is given at the end of the function. But then I realised that the operations vary in bit size and don't fall on 8- or 16-bit boundaries. So I was stuck, since my array, using the example at the top, was essentially this:
uint8_t hex[] = {0x00, 0x03, 0x02, 0x01, 0x42, 0x82, 0x86, 0x04, 0x10, 0x45};
So can anyone help me with the logic in parsing? This is my first time posting so I'm sorry if I'm missing anything and will provide more information or delete if needed
Instead of shifting and masking (which I think would be really complicated) what if you convert the uint8_t array into an array of bits - it uses a lot more memory but you can access individual bits much easier.
Here is a sample program that does this:
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
uint8_t getBits(uint8_t *bits, uint8_t size, uint32_t *index)
{
uint8_t value = 0;
*index -= size; // decrement index to the starting point
for(uint32_t i=0; i<size; i++)
value = (value<<1) | bits[*index+i];
return value;
}
int main()
{
// sample program
uint8_t array[] = {0x00,0x03,0x02,0x01,0x42,0x82,0x86,0x04,0x10,0x45};
// program with zero padding
// uint8_t array[] = {0xE8,0x39,0x06,0xA0,0xC4,0x16,0x82,0x90,0x4A,0x08,0x41};
uint32_t array_size = sizeof(array)/sizeof(*array); // 10 bytes
uint32_t bits_size = 8*array_size; // 80 bytes
uint8_t* bits = malloc(bits_size);
for(uint32_t a=0;a<array_size;a++)
for(uint32_t b=0;b<8;b++)
bits[a*8+b] = (array[a] >> (7-b)) & 1;
puts("Binary program file:");
for(uint32_t i=0;i<bits_size;i++)
printf("%s%d",(i%8?"":" "),bits[i]);
puts("");
enum { MOV, CAL, RET, REF, ADD, PRINT, NOT, EQU};
uint8_t params[] = { 2, 1, 0, 2, 2, 1, 1, 1};
const char *opcodes[] = {"MOV","CAL","RET","REF","ADD","PRINT","NOT","EQU"};
enum { VAL, REG, STK, PTR};
uint8_t value_size[] = { 8, 3, 5, 5};
const char *types[] = {"VAL","REG","STK","PTR"};
uint32_t index = bits_size; // start at end
// minimum program size is function(3) + opcode(3) + size(5)
// if there are less than that number of bits then it must be padding
while(index>10)
{
uint8_t size = getBits(bits,5,&index);
printf("\nsize=%d\n",size);
if (size > 0)
{
for(int o=0; o<size; o++)
{
uint8_t opcode = getBits(bits,3,&index);
printf("opcode=%s",opcodes[opcode]);
for(int p=0; p<params[opcode]; p++)
{
printf("%c ",p?',':':');
uint8_t type = getBits(bits,2,&index);
printf("type=%s ",types[type]);
uint8_t value = getBits(bits,value_size[type],&index);
printf("value=%d",value);
}
puts("");
}
uint8_t function = getBits(bits,3,&index);
printf("function=%d\n",function);
}
}
return 0;
}
Try it at https://onlinegdb.com/S1qVStz8d
How getBits() works:
You make an array of individual digits from the original value, and then you take bits from it one at a time to make a new value - getBits() is the function I have written for that.
To understand how it works imagine how it works in base 10: 321 is put into the array {3,2,1} and you could turn it back into a value with:
value = 0;
value = value*10 + digits[0];
value = value*10 + digits[1];
value = value*10 + digits[2];
Which gives (((0)*10+3)*10+2)*10+1 which is 321
If 5 (binary 101) is put into the array {1,0,1}, you could turn it back into a value with:
value = 0;
value = value*2 + bits[0];
value = value*2 + bits[1];
value = value*2 + bits[2];
Which gives (((0)*2+1)*2+0)*2+1 which is 5 (binary 101)
And that does work. And a decent compiler would optimize the *2 into <<1 and the + into |, but you could do it yourself (which is what I did):
value = 0;
value = (value<<1) | bits[0];
value = (value<<1) | bits[1];
value = (value<<1) | bits[2];
Which produces that same binary 00000101
It's just a readability thing - with decimal you expect to see value*10+x but with binary you expect to see bit operations like shift/or instead of math operations like multiply/add.
Then, if you use a loop with a size and an index that points to the end of the array, you get:
uint8_t value = 0;
index -= size; // decrement index to the starting point
for(uint32_t i=0; i<size; i++)
value = (value<<1) | bits[index+i];
But, of course, if it is a function then index needs to be a pointer and you need to dereference it everywhere:
uint8_t getBits(uint8_t *bits, uint8_t size, uint32_t *index)
{
uint8_t value = 0;
*index -= size; // decrement index to the starting point
for(uint32_t i=0; i<size; i++)
value = (value<<1) | bits[*index+i];
return value;
}
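For comparison, here is a hedged sketch of the shift-and-mask route that the answer set aside: it reads the same fields straight from the packed bytes, without first expanding them into one byte per bit. The name getBitsPacked is hypothetical; the bit order (most significant bit first) and the backwards-moving index are assumed to match getBits() above.

```c
#include <stdint.h>

/* Hypothetical variant of getBits(): reads `size` bits (size <= 8) that end
   just before bit position *index, directly from the packed byte array. */
uint8_t getBitsPacked(const uint8_t *bytes, uint8_t size, uint32_t *index)
{
    uint8_t value = 0;
    *index -= size;                       // move back to the first bit of the field
    for (uint32_t i = 0; i < size; i++) {
        uint32_t pos = *index + i;        // absolute bit position, 0 = MSB of bytes[0]
        uint8_t bit = (bytes[pos / 8] >> (7 - pos % 8)) & 1;
        value = (uint8_t)((value << 1) | bit);
    }
    return value;
}
```

With the sample array and index starting at 80, the first call with size 5 should return 5 (the low five bits of 0x45) and leave index at 75. This trades the 8x memory cost of the bits array for one extra division and shift per bit.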

What does the last `for` loop in this Radix Sort code do?

I was reading the book Learn C The Hard Way by Zed A. Shaw and I was looking over his implementation of the radix sort algorithm.
This is his code:
#define ByteOf(x, y) (((u_int8_t *)x)[y])
static inline void radix_sort(short offset, uint64_t max,
uint64_t * source, uint64_t * dest)
{
uint64_t count[256] = { 0 };
uint64_t *cp = NULL;
uint64_t *sp = NULL;
uint64_t *end = NULL;
uint64_t s = 0;
uint64_t c = 0;
// Count occurences of every byte value
for (sp = source, end = source + max; sp < end; sp++) {
count[ByteOf(sp, offset)]++;
}
// transform count into index by summing
// elements and storing them into same array.
for (s = 0, cp = count, end = count + 256; cp < end; cp++) {
c = *cp;
*cp = s;
s += c;
}
// fill dest with right values in the right place
for (sp = source, end = source + max; sp < end; sp++) {
cp = count + ByteOf(sp, offset);
printf("dest[%d] = %d\n", *cp, *sp);
dest[*cp] = *sp;
++(*cp);
}
}
The above is just a helper function. His actual radix sort is done here:
void RadixMap_sort(RadixMap * map)
{
uint64_t *source = &map->contents[0].raw;
uint64_t *temp = &map->temp[0].raw;
radix_sort(0, map->end, source, temp);
radix_sort(1, map->end, temp, source);
radix_sort(2, map->end, source, temp);
radix_sort(3, map->end, temp, source);
}
Here's the structures he's defined:
typedef union RMElement {
uint64_t raw;
struct {
uint32_t key;
uint32_t value;
} data;
} RMElement;
typedef struct RadixMap {
size_t max;
size_t end;
uint32_t counter;
RMElement *contents;
RMElement *temp;
} RadixMap;
I can understand the first 2 for loops in the inline function radix_sort. As far as I understand, the first one simply counts the byte values and the second loop basically makes a cumulative frequency table, where each entry is the sum of the previous entries.
I still can't wrap my head around the ByteOf(x, y) macro and the third for loop. I've tried reading the Wikipedia page for Radix-sort and I read another article that used a C++ implementation. However, the code written in each of these articles doesn't match the code that he's written.
I understand how Radix Sort works in principle. Basically, we group it according to each digit, rearranging the groupings for every new digit we encounter. For example, to sort the array [223, 912, 275, 100, 633, 120, 380], you first group them by the ones digit so you get [380, 100, 120], [912], [633, 223], [275]. Then you do the same thing with the tens and hundreds place until you've run out of digits.
Any help explaining his code would be appreciated.
Thanks.
ByteOf(x, y) is the same as:
#define ByteOf(x, y) ((*(x) >> ((y)*8)) & 0xff)
That is, it isolates the value of byte number y within the value that x points to (on a little-endian machine, where byte 0 is the least significant one).
The second loop is a sort of allocator. If the first six count[] entries were 1,2,4,0,16,25 after the first loop, they would be 0,1,3,7,7,23 after the second. This directs the third loop (over source[]) to lay out the destination as:
ByteOf  index  number of values
0       0      1
1       1      2
2       3      4
3       7      0   -- there are none.
4       7      16
5       23     25
I find it a bit clearer to rewrite the third loop as:
for (i = 0; i < max; i++) {
dest[count[ByteOf((source+i), offset)]++] = source[i];
}
I think it shows the relationship more clearly: the i-th source element is copied to an index in dest. That index is the start of the partition (count[]) previously computed for this digit. Since there is now a number at this location, we increment the start of this partition to prevent overwriting it.
Note that the brackets around (source+i) are necessary to get the right address for the cast in ByteOf.
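To see the three loops work together, here is a small self-contained sketch of one pass. The names radix_pass and byte_of are hypothetical stand-ins for the book's radix_sort and ByteOf, and byte_of uses a shift instead of a pointer cast so that it does not depend on the machine's byte order.

```c
#include <stdint.h>
#include <stddef.h>

/* Byte number `offset` of v, counting from the least significant byte. */
static unsigned byte_of(uint64_t v, unsigned offset)
{
    return (unsigned)((v >> (offset * 8)) & 0xff);
}

/* One stable counting-sort pass on byte `offset`: source[0..n) -> dest[0..n). */
void radix_pass(unsigned offset, size_t n, const uint64_t *source, uint64_t *dest)
{
    size_t count[256] = { 0 };
    size_t start = 0;

    /* Loop 1: count how many values have each byte value. */
    for (size_t i = 0; i < n; i++)
        count[byte_of(source[i], offset)]++;

    /* Loop 2: replace each count with the start index of its bucket in dest. */
    for (size_t b = 0; b < 256; b++) {
        size_t c = count[b];
        count[b] = start;
        start += c;
    }

    /* Loop 3: place each value in the next free slot of its bucket. */
    for (size_t i = 0; i < n; i++)
        dest[count[byte_of(source[i], offset)]++] = source[i];
}
```

Running pass 0 and then pass 1 over {0x0203, 0x0101, 0x0302, 0x0001}, swapping source and dest between passes as RadixMap_sort does, should leave the values in ascending order. Because loop 3 walks source front to back, values with equal bytes keep their relative order, which is what lets later passes build on earlier ones.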

Figure out why my RC4 implementation doesn't produce the correct result

OK, I am new to C. I have programmed in C# for around 10 years, so I am still getting used to the language. I have been making good progress, but I am still having a few hiccups. Currently I am trying to write an implementation of the RC4 used on the Xbox 360 to encrypt KeyVault/Account data.
However, I have run into a snag: the code works, but it outputs incorrect data. I have provided the original C# code I am working from, which I know works, and the relevant snippet from my C project. Any help or pointers will be much appreciated :)
Original C# Code :
public struct RC4Session
{
public byte[] Key;
public int SBoxLen;
public byte[] SBox;
public int I;
public int J;
}
public static RC4Session RC4CreateSession(byte[] key)
{
RC4Session session = new RC4Session
{
Key = key,
I = 0,
J = 0,
SBoxLen = 0x100,
SBox = new byte[0x100]
};
for (int i = 0; i < session.SBoxLen; i++)
{
session.SBox[i] = (byte)i;
}
int index = 0;
for (int j = 0; j < session.SBoxLen; j++)
{
index = ((index + session.SBox[j]) + key[j % key.Length]) % session.SBoxLen;
byte num4 = session.SBox[index];
session.SBox[index] = session.SBox[j];
session.SBox[j] = num4;
}
return session;
}
public static void RC4Encrypt(ref RC4Session session, byte[] data, int index, int count)
{
int num = index;
do
{
session.I = (session.I + 1) % 0x100;
session.J = (session.J + session.SBox[session.I]) % 0x100;
byte num2 = session.SBox[session.I];
session.SBox[session.I] = session.SBox[session.J];
session.SBox[session.J] = num2;
byte num3 = data[num];
byte num4 = session.SBox[(session.SBox[session.I] + session.SBox[session.J]) % 0x100];
data[num] = (byte)(num3 ^ num4);
num++;
}
while (num != (index + count));
}
Now Here is my own c version :
typedef struct rc4_state {
int s_box_len;
uint8_t* sbox;
int i;
int j;
} rc4_state_t;
unsigned char* HMAC_SHA1(const char* cpukey, const unsigned char* hmac_key) {
unsigned char* digest = malloc(20);
digest = HMAC(EVP_sha1(), cpukey, 16, hmac_key, 16, NULL, NULL);
return digest;
}
void rc4_init(rc4_state_t* state, const uint8_t *key, int keylen)
{
state->i = 0;
state->j = 0;
state->s_box_len = 0x100;
state->sbox = malloc(0x100);
// Init sbox.
int i = 0, index = 0, j = 0;
uint8_t buf;
while(i < state->s_box_len) {
state->sbox[i] = (uint8_t)i;
i++;
}
while(j < state->s_box_len) {
index = ((index + state->sbox[j]) + key[j % keylen]) % state->s_box_len;
buf = state->sbox[index];
state->sbox[index] = (uint8_t)state->sbox[j];
state->sbox[j] = (uint8_t)buf;
j++;
}
}
void rc4_crypt(rc4_state_t* state, const uint8_t *inbuf, uint8_t **outbuf, int buflen)
{
int idx = 0;
uint8_t num, num2, num3;
*outbuf = malloc(buflen);
if (*outbuf) { // do not forget to test for failed allocation
while(idx != buflen) {
state->i = (int)(state->i + 1) % 0x100;
state->j = (int)(state->j + state->sbox[state->i]) % 0x100;
num = (uint8_t)state->sbox[state->i];
state->sbox[state->i] = (uint8_t)state->sbox[state->j];
state->sbox[state->j] = (uint8_t)num;
num2 = (uint8_t)inbuf[idx];
num3 = (uint8_t)state->sbox[(state->sbox[state->i] + (uint8_t)state->sbox[state->j]) % 0x100];
(*outbuf)[idx] = (uint8_t)(num2 ^ num3);
printf("%02X", (*outbuf)[idx]);
idx++;
}
}
printf("\n");
}
Usage (c#) :
byte[] cpukey = new byte[16]
{
...
};
byte[] hmac_key = new byte[16]
{
...
};
byte[] buf = new System.Security.Cryptography.HMACSHA1(cpukey).ComputeHash(hmac_key);
MessageBox.Show(BitConverter.ToString(buf).Replace("-", ""), "");
Usage(c):
const char cpu_key[16] = { 0xXX, 0xXX, 0xXX };
const unsigned char hmac_key[16] = { ... };
unsigned char* buf = HMAC_SHA1(cpu_key, hmac_key);
uint8_t buf2[20];
uint8_t buf3[8] = { 0x1E, 0xF7, 0x94, 0x48, 0x22, 0x26, 0x89, 0x8E }; // Encrypted Xbox 360 data
uint8_t* buf4;
// Allocated 8 bytes out.
buf4 = malloc(8);
int num = 0;
while(num < 20) {
buf2[num] = (uint8_t)buf[num]; // convert const char
num++;
}
rc4_state_t* rc4 = malloc(sizeof(rc4_state_t));
rc4_init(rc4, buf2, 20);
rc4_crypt(rc4, buf3, &buf4, 8);
Now, I have the HMAC-SHA1 figured out: I'm using OpenSSL for that, and I have confirmed I am getting the correct HMAC/decryption key. It's just the RC4 that isn't working. I'm trying to decrypt part of the KeyVault that should equal "Xbox 360" (hex "58626F7820333630").
The output is currently "0000008108020000". I do not get any errors during compilation. Again, any help would be great ^.^
Thanks to John's help I was able to fix it. It was an error in the C# version. Thanks, John!
As I remarked in comments, your main problem appeared to involve how the output buffer is managed. You have since revised the question to fix that, but I describe it anyway here, along with some other alternatives for fixing it. The remaining problem is discussed at the end.
Function rc4_crypt() allocates an output buffer for itself, but it has no mechanism to communicate a pointer to the allocated space back to its caller. Your revised usage furthermore exhibits some inconsistency with rc4_crypt() with respect to how the output buffer is expected to be managed.
There are three main ways to approach the problem.
1. Function rc4_crypt() presently returns nothing, so you could let it continue to allocate the buffer itself, and modify it to return a pointer to the allocated output buffer.
2. You could modify the type of the outbuf parameter to uint8_t ** to enable rc4_crypt() to set the caller's pointer value indirectly.
3. You could rely on the caller to manage the output buffer, and make rc4_crypt() just write the output via the pointer passed to it.
The only one of those that might be tricky for you is #2; it would look something like this:
void rc4_crypt(rc4_state_t* state, const uint8_t *inbuf, uint8_t **outbuf, int buflen) {
*outbuf = malloc(buflen);
if (*outbuf) { // do not forget to test for failed allocation
// ...
(*outbuf)[idx] = (uint8_t)(num2 ^ num3);
// ...
}
}
And you would use it like this:
rc4_crypt(rc4, buf3, &buf4, 8);
... without otherwise allocating any memory for buf4.
The caller in any case has the responsibility for freeing the output buffer when it is no longer needed. This is clearer when it performs the allocation itself; you should document that requirement if rc4_crypt() is going to be responsible for the allocation.
The remaining problem appears to be strictly an output problem. You are apparently relying on print statements in rc4_crypt() to report on the encrypted data. I have no problem whatever with debugging via print statements, but you do need to be careful to print the data you actually want to examine. In this case you do not. You update the joint buffer index idx at the end of the encryption loop before printing a byte from the output buffer. As a result, at each iteration you print not the encrypted byte value you've just computed, but rather an indeterminate value that happens to be in the next position of the output buffer.
Move the idx++ to the very end of the loop to fix this problem, or change it from a while loop to a for loop and increment idx in the third term of the loop control statement. In fact, I strongly recommend for loops over while loops where the former are a good fit to the structure of the code (as here); I daresay you would not have made this mistake if your loop had been structured that way.
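As an illustration of the caller-managed-buffer approach combined with for loops, here is a minimal sketch of standard RC4. The names rc4_ctx, rc4_ctx_init and rc4_ctx_crypt are hypothetical and are not the asker's rc4_state_t API; the indices are kept in uint8_t so that the modulo-256 wraparound happens implicitly.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint8_t sbox[256];
    uint8_t i;
    uint8_t j;
} rc4_ctx;   /* hypothetical type, state lives inside the struct, no malloc */

/* Key scheduling: fill the S-box, then shuffle it with the key bytes. */
void rc4_ctx_init(rc4_ctx *ctx, const uint8_t *key, size_t keylen)
{
    for (int n = 0; n < 256; n++)
        ctx->sbox[n] = (uint8_t)n;

    uint8_t index = 0;
    for (int n = 0; n < 256; n++) {
        index = (uint8_t)(index + ctx->sbox[n] + key[n % keylen]);
        uint8_t tmp = ctx->sbox[index];
        ctx->sbox[index] = ctx->sbox[n];
        ctx->sbox[n] = tmp;
    }
    ctx->i = 0;
    ctx->j = 0;
}

/* Encrypts or decrypts len bytes from in to out; the caller owns both buffers. */
void rc4_ctx_crypt(rc4_ctx *ctx, const uint8_t *in, uint8_t *out, size_t len)
{
    for (size_t idx = 0; idx < len; idx++) {
        ctx->i = (uint8_t)(ctx->i + 1);
        ctx->j = (uint8_t)(ctx->j + ctx->sbox[ctx->i]);

        uint8_t tmp = ctx->sbox[ctx->i];
        ctx->sbox[ctx->i] = ctx->sbox[ctx->j];
        ctx->sbox[ctx->j] = tmp;

        uint8_t k = ctx->sbox[(uint8_t)(ctx->sbox[ctx->i] + ctx->sbox[ctx->j])];
        out[idx] = in[idx] ^ k;   /* idx only changes in the loop header */
    }
}
```

A quick sanity check is the widely published test vector: the key "Key" applied to the text "Plaintext" should give BBF316E8D940AF0AD3. Since the caller supplies out, there is nothing to free and no question of who owns the buffer.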

How to use strncpy with a for-loop in C?

I am writing a program which will take every 3 numbers in a file and convert them to their ASCII symbol. So I thought I could read the numbers into a character array, and then make every 3 elements 1 element in a second array, convert them to int and then print these as char.
I am stuck on taking every 3 elements, however. This is my code snippet for this part:
char arry[] = "073102109109112"; // example string read from a file
char arryNew[16] = {0};
for(int i = 0; i <= sizeof(arryNew); i++){
strncpy(arryNew, arry, 3);
arryNew[i+3]='\0';
puts(arryNew);
}
What this code gives me is the first 3 numbers, fifteen times. I've tried incrementing i by 3, which gives me the first 3 numbers 5 times. How do I write a for-loop with strncpy so that after copying n chars, it moves to the next n chars?
You always pass the pointer to the beginning of the array, so of course you will always get the same result. You must include the loop counter to get at the next block:
strncpy(arryNew, &arry[i*3], 3);
Here you have a problem:
arryNew[i+3]='\0';
First of all, you don't need to set the null byte every time, because this will not change anyway. Additionally you will corrupt memory, because you use i+3 as the index, so once i reaches 13 it will write beyond the array boundary.
Your arryNew must be longer, because your original array is 16 characters and your target array is too. If you intend to store several 3-char strings in it, you need 5*4 characters for your target, because each string also needs its terminating 0 byte.
And of course, you must use the index here as well. The way it is written now, it will write beyond the array boundary once i reaches 13.
So what you seem to want to do (not sure from your description) is:
char arry[] = "073102109109112"; // example string read from a file
char arryNew[20] = {0};
for(size_t i = 0; i < (sizeof(arry) - 1) / 3; i++) // 5 groups of 3 digits
{
strncpy(&arryNew[i*4], &arry[i*3], 3);
puts(&arryNew[i*4]);
}
Or if you just want to have the individual strings printed then you can just do:
char arry[] = "073102109109112"; // example string read from a file
char arryNew[4] = {0};
for(size_t i = 0; i < (sizeof(arry) - 1) / 3; i++) // 5 groups of 3 digits
{
strncpy(arryNew, &arry[i*3], 3);
puts(arryNew);
}
Making things a bit simpler: your target string doesn't change.
char arry[] = "073102109109112"; // example string read from a file
char target[4] = {0};
for(size_t i = 0; i + 3 <= strlen(arry); i+=3) // include the last group of 3
{
strncpy(target, arry + i, 3);
puts(target);
}
Decoding:
start at the beginning of arry
copy 3 characters to target
(note the fourth element of target is \0)
print out the contents of target
increment i by 3
repeat until you fall off the end of the string.
Some problems.
// Need to change 3 chars, as text, into an integer.
arryNew[i] = (char) strtol(buf, &endptr, 10);
// char arryNew[16] = {0};
// Overly large.
arryNew[6]
// for(int i = 0; i <= sizeof(arryNew); i++){
// Indexing too far. Should be `i <= (sizeof(arryNew) - 2)` or ...
for (i=0; i<arryNewLen; i++) {
// strncpy(arryNew, arry, 3);
// strncpy() can be used, but we know the length of source and destination,
// simpler to use memcpy()
// strncpy(buf, a, sizeof buf - 1);
memcpy(buf, arry, N);
// arryNew[i+3]='\0';
// Toward the loop's end, code is writing outside arryNew.
// Let's append the `\0` after the for() loop.
// int i
size_t i; // Better to use size_t (or ssize_t) for array index.
Suggestion:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
char Source[] = "073102109109112"; // example string read from a file
const int TIW = 3; // textual integer width
// Avoid sprinkling bare constants about code. Define in 1 place instead.
const char *arry = Source;
size_t arryLen = strlen(arry);
if (arryLen%TIW != 0) return -1; // is it a strange sized arry?
size_t arryNewLen = arryLen/TIW;
char arryNew[arryNewLen + 1];
size_t i;
for (i=0; i<arryNewLen; i++) {
char buf[TIW + 1];
// strncpy(buf, a, sizeof buf - 1);
memcpy(buf, arry, TIW);
buf[TIW] = '\0';
char *endptr; // Useful should OP want to do error checking
// TBD: test if result is 0 to 255
arryNew[i] = (char) strtol(buf, &endptr, 10);
arry += TIW;
}
arryNew[i] = '\0';
puts(arryNew); // prints Ifmmp
return 0;
}
You could use this code to complete your task, i.e. to convert the given char array into the characters for its ASCII values.
char arry[] = "073102109109112";
char arryNew[16] = {0};
int i,j=0;
for(i = 0; i + 3 < sizeof(arry); i+=3)
{
// subtract '0' to turn each digit character into its numeric value
arryNew[j]=(arry[i]-'0')*100+(arry[i+1]-'0')*10+(arry[i+2]-'0');
j++;
arryNew[j]='\0';
puts(arryNew);
}

In-place run length decoding?

Given a run length encoded string, say "A3B1C2D1E1", decode the string in-place.
The answer for the encoded string is "AAABCCDE". Assume that the encoded array is large enough to accommodate the decoded string, i.e. you may assume that the array size = MAX[length(encodedstring), length(decodedstring)].
This does not seem trivial, since merely decoding A3 as 'AAA' will lead to over-writing 'B' of the original string.
Also, one cannot assume that the decoded string is always larger than the encoded string.
Eg: Encoded string - 'A1B1', Decoded string is 'AB'. Any thoughts?
And it will always be a letter-digit pair, i.e. you will not be asked to convert 0515 to 0000055555.
If we don't already know, we should scan through first, adding up the digits, in order to calculate the length of the decoded string.
It will always be a letter-digit pair, hence you can delete the 1s from the string without any confusion.
A3B1C2D1E1
becomes
A3BC2DE
Here is some code, in C++, to remove the 1s from the string (O(n) complexity).
// remove 1s
int i = 0; // read from here
int j = 0; // write to here
while(i < str.length()) {
assert(j <= i); // optional check
if(str[i] != '1') {
str[j] = str[i];
++ j;
}
++ i;
}
str.resize(j); // to discard the extra space now that we've got our shorter string
Now, this string is guaranteed to be shorter than, or the same length as, the final decoded string. We can't make that claim about the original string, but we can make it about this modified string.
(An optional, trivial, step now is to replace every 2 with the previous letter. A3BCCDE, but we don't need to do that).
Now we can start working from the end. We have already calculated the length of the decoded string, and hence we know exactly where the final character will be. We can simply copy the characters from the end of our short string to their final location.
During this copy process from right-to-left, if we come across a digit, we must make multiple copies of the letter that is just to the left of the digit. You might be worried that this might risk overwriting too much data. But we proved earlier that our encoded string, or any substring thereof, will never be longer than its corresponding decoded string; this means that there will always be enough space.
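The two steps above (drop the 1s, then expand from right to left) can be sketched in C as follows. This is an illustration under stated assumptions, not the answerer's code: rle_decode_in_place is a hypothetical name, counts are single digits from 1 to 9, the letters themselves are never digits, and the buffer has room for the longer of the two strings plus the terminator.

```c
#include <stddef.h>

/* Decodes a run-length string such as "A3B1C2D1E1" in place. */
void rle_decode_in_place(char *str)
{
    size_t decoded_len = 0;
    size_t j = 0;                        /* write position while compacting */

    /* Pass 1: sum the counts and drop every count of 1 ("B1" -> "B"). */
    for (size_t i = 0; str[i] != '\0' && str[i + 1] != '\0'; i += 2) {
        char letter = str[i];
        char digit = str[i + 1];
        decoded_len += (size_t)(digit - '0');
        str[j++] = letter;
        if (digit != '1')
            str[j++] = digit;
    }

    /* Pass 2: expand from right to left into the final positions. */
    size_t write = decoded_len;          /* one past the last decoded char */
    size_t read = j;                     /* one past the last compacted char */
    str[write] = '\0';
    while (read > 0) {
        size_t n = 1;
        char c = str[--read];
        if (c >= '0' && c <= '9') {      /* a count: its letter sits just before it */
            n = (size_t)(c - '0');
            c = str[--read];
        }
        while (n-- > 0)
            str[--write] = c;            /* never catches up with unread input */
    }
}
```

For "A3B1C2D1E1" the compacted form is "A3BC2DE" and the result should be "AAABCCDE"; for "A1B1" the result should be "AB". The write position cannot overtake the read position because every prefix of the compacted string is no longer than its decoded form.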
The following solution is O(n) and in-place. The algorithm should never read or write memory outside the buffer. I did some debugging, and it appears correct on the sample tests I fed it.
High level overview:
Determine the encoded length.
Determine the decoded length by reading all the numbers and summing them up.
End of buffer is MAX(decoded length, encoded length).
Decode the string by starting from the end of the string. Write from the end of the buffer.
Since the decoded length might be smaller than the encoded length, the decoded string might not start at the start of the buffer. If needed, correct for this by shifting the string over to the start.
int isDigit (char c) {
return '0' <= c && c <= '9';
}
unsigned int toDigit (char c) {
return c - '0';
}
unsigned int intLen (char * str) {
unsigned int n = 0;
while (isDigit(*str++)) {
++n;
}
return n;
}
unsigned int forwardParseInt (char ** pStr) {
unsigned int n = 0;
char * pChar = *pStr;
while (isDigit(*pChar)) {
n = 10 * n + toDigit(*pChar);
++pChar;
}
*pStr = pChar;
return n;
}
unsigned int backwardParseInt (char ** pStr, char * beginStr) {
unsigned int len, n;
char * pChar = *pStr;
while (pChar != beginStr && isDigit(*pChar)) {
--pChar;
}
++pChar;
len = intLen(pChar);
n = forwardParseInt(&pChar);
*pStr = pChar - 1 - len;
return n;
}
unsigned int encodedSize (char * encoded) {
int encodedLen = 0;
while (*encoded++ != '\0') {
++encodedLen;
}
return encodedLen;
}
unsigned int decodedSize (char * encoded) {
int decodedLen = 0;
while (*encoded++ != '\0') {
decodedLen += forwardParseInt(&encoded);
}
return decodedLen;
}
void shift (char * str, int n) {
do {
str[n] = *str;
} while (*str++ != '\0');
}
unsigned int max (unsigned int x, unsigned int y) {
return x > y ? x : y;
}
void decode (char * encodedBegin) {
int shiftAmount;
unsigned int eSize = encodedSize(encodedBegin);
unsigned int dSize = decodedSize(encodedBegin);
int writeOverflowed = 0;
char * read = encodedBegin + eSize - 1;
char * write = encodedBegin + max(eSize, dSize);
*write-- = '\0';
while (read != encodedBegin) {
unsigned int i;
unsigned int n = backwardParseInt(&read, encodedBegin);
char c = *read;
for (i = 0; i < n; ++i) {
*write = c;
if (write != encodedBegin) {
write--;
}
else {
writeOverflowed = 1;
}
}
if (read != encodedBegin) {
read--;
}
}
if (!writeOverflowed) {
write++;
}
shiftAmount = encodedBegin - write;
if (write != encodedBegin) {
shift(write, shiftAmount);
}
return;
}
int main (int argc, char ** argv) {
//char buff[256] = { "!!!A33B1C2D1E1\0!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" };
char buff[256] = { "!!!A2B12C1\0!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" };
//char buff[256] = { "!!!A1B1C1\0!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" };
char * str = buff + 3;
//char buff[256] = { "A1B1" };
//char * str = buff;
decode(str);
return 0;
}
This is a very vague question, though it's not particularly difficult if you think about it. As you say, decoding A3 as AAA and just writing it in place will overwrite the chars B and 1, so why not just move those farther along the array first?
For instance, once you've read A3, you know that you need to make space for one extra character; if it was A4 you'd need two, and so on. To achieve this you'd find the end of the string in the array (do this upfront and store its index).
Then loop through, moving the characters to their new slots:
To start: A|3|B|1|C|2|||||||
Have a variable called end storing the index 5, i.e. the last, non-blank, entry.
You'd read in the first pair, using a variable called cursor to store your current position - so after reading in the A and the 3 it would be set to 1 (the slot with the 3).
Pseudocode for the move:
var n = array[cursor] - 2; // n = 1, the 3 from A3, and then minus 2 to allow for the pair.
for(i = end; i > cursor; i--)
{
array[i + n] = array[i];
}
This would leave you with (the old B in slot 2 is stale and gets overwritten next):
A|3|B|B|1|C|2||||||
Now the A is there once already, so now you want to write n + 1 A's starting at the index stored in cursor:
for(i = cursor; i < cursor + n + 1; i++)
{
array[i] = array[cursor - 1];
}
// increment the cursor afterwards!
cursor += n + 1;
Giving:
A|A|A|B|1|C|2||||||
Then you're pointing at the start of the next pair of values, ready to go again. I realise there are some holes in this answer, though that is intentional as it's an interview question! For instance, in the edge cases you specified A1B1, you'll need a different loop to move subsequent characters backwards rather than forwards.
Another O(n^2) solution follows.
Given that there is no limit on the complexity of the answer, this simple solution seems to work perfectly.
while ( there is an expandable element ):
expand that element
adjust (shift) all of the elements on the right side of the expanded element
Where:
Free space size is the number of empty elements left in the array.
An expandable element is an element that:
expanded size - encoded size <= free space size
The point is that at each step on the way from the run-length code to the expanded string, there is at least one element that can be expanded (easy to prove).

Resources