pointers and string parsing in c - c

I was wondering if somebody could explain me how pointers and string parsing works. I know that I can do something like the following in a loop but I still don't follow very well how it works.
for (a = str; * a; a++) ...
For instance, I'm trying to get the last integer from the string. if I have a string as const char *str = "some string here 100 2000";
Using the method above, how could I parse it and get the last integer of the string (2000), knowing that the last integer (2000) may vary.
Thanks

for (a = str; * a; a++) ...
This works by starting a pointer a at the beginning of the string, until dereferencing a is implicitly converted to false, incrementing a at each step.
Basically, you'll walk the array until you get to the NUL terminator that's at the end of your string (\0) because the NUL terminator implicitly converts to false - other characters do not.
Using the method above, how could I parse it and get the last integer of the string (2000), knowing that the last integer (2000) may vary.
You're going to want to look for the last space before the \0, then you're going to want to call a function to convert the remaining characters to an integer. See strtol.
Consider this approach:
find the end of the string (using that loop)
search backwards for a space.
use that to call strtol.
-
for (a = str; *a; a++); // Find the end.
while (*a != ' ') a--; // Move back to the space.
a++; // Move one past the space.
int result = strtol(a, NULL, 10);
Or alternatively, just keep track of the start of the last token:
const char* start = str;
for (a = str; *a; a++) { // Until you hit the end of the string.
if (*a == ' ') start = a; // New token, reassign start.
}
int result = strtol(start, NULL, 10);
This version has the benefit of not requiring a space in the string.

You just need to implement a simple state machine with two states, e.g
#include <ctype.h>
int num = 0; // the final int value will be contained here
int state = 0; // state == 0 == not parsing int, state == 1 == parsing int
for (i = 0; i < strlen(s); ++i)
{
if (state == 0) // if currently in state 0, i.e. not parsing int
{
if (isdigit(s[i])) // if we just found the first digit character of an int
{
num = s[i] - '0'; // discard any old int value and start accumulating new value
state = 1; // we are now in state 1
}
// otherwise do nothing and remain in state 0
}
else // currently in state 1, i.e. parsing int
{
if (isdigit(s[i])) // if this is another digit character
{
num = num * 10 + s[i] - '0'; // continue accumulating int
// remain in state 1...
}
else // no longer parsing int
{
state = 0; // return to state 0
}
}
}

I know this has been answered already but all the answers thus far are recreating code that is available in the Standard C Library. Here is what I would use by taking advantage of strrchr()
#include <string.h>
#include <stdio.h>
int main(void)
{
const char* input = "some string here 100 2000";
char* p;
long l = 0;
if(p = strrchr(input, ' '))
l = strtol(p+1, NULL, 10);
printf("%ld\n", l);
return 0;
}
Output
2000

for (a = str; * a; a++)...
is equivalent to
a=str;
while(*a!='\0') //'\0' is NUL, don't confuse it with NULL which is a macro
{
....
a++;
}

The loop you've presented just goes through all characters (string is a pointer to the array of 1-byte chars that ends with 0). For parsing you should use sscanf or better C++'s string and string stream.

Related

Optimizing a do while loop/for loop in C

I was doing an exercise from LeetCode in which consisted in deleting any adjacent elements from a string, until there are only unique characters adjacent to each other. With some help I could make a code that can solve most testcases, but the string length can be up to 10^5, and in a testcase it exceeds the time limit, so I'm in need in some tips on how can I optimize it.
My code:
char res[100000]; //up to 10^5
char * removeDuplicates(char * s){
//int that verifies if any char from the string can be deleted
int ver = 0;
//do while loop that reiterates to eliminate the duplicates
do {
int lenght = strlen(s);
int j = 0;
ver = 0;
//for loop that if there are duplicates adds one to ver and deletes the duplicate
for (int i = 0; i < lenght ; i++){
if (s[i] == s[i + 1]){
i++;
j--;
ver++;
}
else {
res[j] = s[i];
}
j++;
}
//copying the res string into the s to redo the loop if necessary
strcpy(s,res);
//clar the res string
memset(res, '\0', sizeof res);
} while (ver > 0);
return s;
}
The code can't pass a speed test that has a string that has around the limit (10^5) length, I won't put it here because it's a really big text, but if you want to check it, it is the 104 testcase from the LeetCode Daily Problem
If it was me doing something like that, I would basically do it like a simple naive string copy, but keep track of the last character copied and if the next character to copy is the same as the last then skip it.
Perhaps something like this:
char result[1000]; // Assumes no input string will be longer than this
unsigned source_index; // Index into the source string
unsigned dest_index; // Index into the destination (result) string
// Always copy the first character
result[0] = source_string[0];
// Start with 1 for source index, since we already copies the first character
for (source_index = 1, dest_index = 0; source_string[source_index] != '\0'; ++source_index)
{
if (source_string[source_index] != result[dest_index])
{
// Next character is not equal to last character copied
// That means we can copy this character
result[++dest_index] = source_string[source_index];
}
// Else: Current source character was equal to last copied character
}
// Terminate the destination string
result[dest_index + 1] = '\0';

Design a Character Searching Function, While Forced to Use strchr

Background Information
I was recently approached by a friend who was given a homework problem to develop a searching algorithm. Before anyone asks, I did think of a solution! However, my solution is not what the teacher is asking for...
Anyway, this is an introductory C programming course where the students have been asked to write a search function called ch_search that is supposed to search an array of characters to determine how many times a specific character occurs. The constraints are what I don't understand...
Constraints:
The arguments are: array to search, character to search for, and length of the array being searched.
The function must use a for-loop.
The algorithm must use the strchr function.
Okay, so the first two constraints I can understand... but the 3rd constraint is what really gets me... I was initially thinking that we could just use a for-loop to iterate through the string from the beginning to the end, simply counting each instance of the character. When the student originally described the problem to me, I came up with (although incorrect) the solution:
Proposed Solution
int ch_search(char array_to_search[], char char_to_search_for, int array_size)
{
int count = 0;
for (int i = 0; i < array_size; i++)
{
// count each character instance
if (array_to_search[i] == char_to_search_for)
{
// keep incrementing the count
count++;
}
}
return count;
}
Then I was told that I had to specifically use the character position function (and apparently it has to be strchr and not strrchr so we can't start at the end I guess?)... I just don't see how that wouldn't be overcomplicating this. I don't see how that would help at all, especially counting from the beginning... Even strrchr might make a little more sense to me. Thoughts?
It's true that having the length of the array and having to use a for loop,
the most natural thing to do would be to iterate over every characters of the
source array. But you can also loop over the result of strchr like this:
int ch_search(char haystack[], char needle, int size)
{
int count = 0;
char *found;
for(; (found = strchr(haystack, needle)) != NULL; haystack = found + 1)
count++;
return count;
}
In this case you don't need the size of the array but the assignment doesn't say
that you have to use it. Obviously this solution requires the source to be '\0'-terminated.
I think the teacher wanted you to use strchr to navigate to the next occurrence of the char_to_search_for within a string:
int ch_search(char array_to_search[], char char_to_search_for, int array_size) {
int count = 0;
for (char *ptr = array_to_search ; ptr != &array_to_search[array_size] ; ptr++) {
ptr = strchr(ptr, char_to_search_for);
if (!ptr) {
break; // Character is not found
}
count++;
}
return count;
}
Note that array_to_search must be null-terminated in order to be used together with strchr solution above.
This sounds like your friend was given a trick question. The function gets an array of chars and the length of that array but is required to use strchr() even though that function only works on '\0' terminated strings (and there was not given any guaranty that the array is '\0' terminated).
You might thing that it would be fine to use strchr() on the array anyway and then compare the returned pointer to the given length of the array to check if it went past the end of the array. But there are two problems with that:
If strchr() searches past the end of the array, then you already have Undefined Behavior before getting to the check. The program might have crashed before returning from strchr(), the returned pointer might be some total garbage or you might get a pointer to an address a bit further in memory than the end of the array.
Even if the returned pointer is just to an address a bit further in memory than the end of the array, then there is the problem that comparing two pointers (or subtracting them to find the distance between the pointed addresses) is Undefined Behavior unless they're both pointing to parts of the same memory object (or one position past the end of the object). In this instance it means that checking if the returned pointer is within the bounds of the array is only defined behavior if the returned pointer is within the bounds of the array (or one past the end) making the check a bit useless.
The only solution to that is to make sure that strchr() is working with a '\0' terminated string. For example:
int ch_search(char array_to_search[], char char_to_search_for, int array_size)
{
char *buffer = malloc(array_size + 1);
// Add test here to check if malloc was succesful
strncpy(buffer, array_to_search, array_size);
buffer[array_size] = '\0';
int count = 0;
for (char *i = buffer; (i = strchr(i, char_to_search_for)) != NULL; i++) {
count++;
}
free(buffer);
return count;
}
strchr is a very convenient function to search for a char in a string.
Find and read more about strchr. This is my favorite function ever!
The C library function char *strchr(const char *str, int c) searches for the first occurrence of the character c (an unsigned char) in the string pointed to by the argument str.
Declaration
Following is the declaration for strchr() function.
char *strchr(const char *str, int c)
Parameters
str − This is the C string to be scanned.
c − This is the character to be searched in str.
Return value
Function returns a pointer to the first occurrence of the character c in the string str, or NULL if the character is not found.
Constraints:
1) The arguments are: array to search, character to search for, and
length of the array being searched.
This constrain gives the length of the array to be searched. The given array has to contain '\0' at some point. However the length of search search can be shorter and specified by the search_length.
Following compact solution takes this under account.
int ch_search(char array_to_search[], char char_to_search_for, int search_length)
{
int count = 0;
for(char *p = array_to_search; ;p++)
{
p = strchr(p, char_to_search_for);
if( p != NULL && (p - array_to_search < search_length) )
count++;
else
break;
}
return count;
}
Or equivalent ch_search2:
#include<stdio.h>
#include<string.h>
int ch_search(char array_to_search[], char char_to_search_for, int search_length)
{
int count = 0;
for(char *p = array_to_search; ;p++)
{
p = strchr(p, char_to_search_for);
if( p != NULL && (p - array_to_search < search_length) )
count++;
else
break;
}
return count;
}
// Your original function:
int ch_search1(char array_to_search[], char char_to_search_for, int array_size)
{
int count = 0;
for (int i = 0; i < array_size; i++){
// count each character instance
if (array_to_search[i] == char_to_search_for){
count++; // keep incrementing the count
}
}
return count;
}
int ch_search2(char array_to_search[], char char_to_search_for, int array_size)
{
int count = 0;
char *p = array_to_search;
for(;;)
{
p = strchr(p, char_to_search_for);
if( p != NULL )
{
if (p - array_to_search >= array_size) // we reached beyond
{
break;
}
else
{
count++;
p++;
}
}
else
break; // char not found
}
return count;
}
int main(void)
{
// the arr has to contain '\0' terminator but we can search within the specified length.
char arr[]={'1','1','2','2','1','1','3','3','3','1','4','4', '1','1','!','1','\0','1'};
char arr1[] = "zdxbab";
printf("count %d count %d \n",ch_search(arr , '1', 12),ch_search2(arr , '1', 12));
printf("count %d count %d \n",ch_search(arr1,'b',strlen(arr1)),ch_search2(arr1,'b',strlen(arr1)));
return 0;
}
Output:
count 5 count 5
count 2 count 2

Returning the length of a char array in C

I am new to programming in C and am trying to write a simple function that will normalize a char array. At the end i want to return the length of the new char array. I am coming from java so I apologize if I'm making mistakes that seem simple. I have the following code:
/* The normalize procedure normalizes a character array of size len
according to the following rules:
1) turn all upper case letters into lower case ones
2) turn any white-space character into a space character and,
shrink any n>1 consecutive whitespace characters to exactly 1 whitespace
When the procedure returns, the character array buf contains the newly
normalized string and the return value is the new length of the normalized string.
*/
int
normalize(unsigned char *buf, /* The character array contains the string to be normalized*/
int len /* the size of the original character array */)
{
/* use a for loop to cycle through each character and the built in c functions to analyze it */
int i;
if(isspace(buf[0])){
buf[0] = "";
}
if(isspace(buf[len-1])){
buf[len-1] = "";
}
for(i = 0;i < len;i++){
if(isupper(buf[i])) {
buf[i]=tolower(buf[i]);
}
if(isspace(buf[i])) {
buf[i]=" ";
}
if(isspace(buf[i]) && isspace(buf[i+1])){
buf[i]="";
}
}
return strlen(*buf);
}
How can I return the length of the char array at the end? Also does my procedure properly do what I want it to?
EDIT: I have made some corrections to my program based on the comments. Is it correct now?
/* The normalize procedure normalizes a character array of size len
according to the following rules:
1) turn all upper case letters into lower case ones
2) turn any white-space character into a space character and,
shrink any n>1 consecutive whitespace characters to exactly 1 whitespace
When the procedure returns, the character array buf contains the newly
normalized string and the return value is the new length of the normalized string.
*/
int
normalize(unsigned char *buf, /* The character array contains the string to be normalized*/
int len /* the size of the original character array */)
{
/* use a for loop to cycle through each character and the built in c funstions to analyze it */
int i = 0;
int j = 0;
if(isspace(buf[0])){
//buf[0] = "";
i++;
}
if(isspace(buf[len-1])){
//buf[len-1] = "";
i++;
}
for(i;i < len;i++){
if(isupper(buf[i])) {
buf[j]=tolower(buf[i]);
j++;
}
if(isspace(buf[i])) {
buf[j]=' ';
j++;
}
if(isspace(buf[i]) && isspace(buf[i+1])){
//buf[i]="";
i++;
}
}
return strlen(buf);
}
The canonical way of doing something like this is to use two indices, one for reading, and one for writing. Like this:
int normalizeString(char* buf, int len) {
int readPosition, writePosition;
bool hadWhitespace = false;
for(readPosition = writePosition = 0; readPosition < len; readPosition++) {
if(isspace(buf[readPosition]) {
if(!hadWhitespace) buf[writePosition++] = ' ';
hadWhitespace = true;
} else if(...) {
...
}
}
return writePosition;
}
Warning: This handles the string according to the given length only. While using a buffer + length has the advantage of being able to handle any data, this is not the way C strings work. C-strings are terminated by a null byte at their end, and it is your job to ensure that the null byte is at the right position. The code you gave does not handle the null byte, nor does the buffer + length version I gave above. A correct C implementation of such a normalization function would look like this:
int normalizeString(char* string) { //No length is passed, it is implicit in the null byte.
char* in = string, *out = string;
bool hadWhitespace = false;
for(; *in; in++) { //loop until the zero byte is encountered
if(isspace(*in) {
if(!hadWhitespace) *out++ = ' ';
hadWhitespace = true;
} else if(...) {
...
}
}
*out = 0; //add a new zero byte
return out - string; //use pointer arithmetic to retrieve the new length
}
In this code I replaced the indices by pointers simply because it was convenient to do so. This is simply a matter of style preference, I could have written the same thing with explicit indices. (And my style preference is not for pointer iterations, but for concise code.)
if(isspace(buf[i])) {
buf[i]=" ";
}
This should be buf[i] = ' ', not buf[i] = " ". You can't assign a string to a character.
if(isspace(buf[i]) && isspace(buf[i+1])){
buf[i]="";
}
This has two problems. One is that you're not checking whether i < len - 1, so buf[i + 1] could be off the end of the string. The other is that buf[i] = "" won't do what you want at all. To remove a character from a string, you need to use memmove to move the remaining contents of the string to the left.
return strlen(*buf);
This would be return strlen(buf). *buf is a character, not a string.
The notations like:
buf[i]=" ";
buf[i]="";
do not do what you think/expect. You will probably need to create two indexes to step through the array — one for the current read position and one for the current write position, initially both zero. When you want to delete a character, you don't increment the write position.
Warning: untested code.
int i, j;
for (i = 0, j = 0; i < len; i++)
{
if (isupper(buf[i]))
buf[j++] = tolower(buf[i]);
else if (isspace(buf[i])
{
buf[j++] = ' ';
while (i+1 < len && isspace(buf[i+1]))
i++;
}
else
buf[j++] = buf[i];
}
buf[j] = '\0'; // Null terminate
You replace the arbitrary white space with a plain space using:
buf[i] = ' ';
You return:
return strlen(buf);
or, with the code above:
return j;
Several mistakes in your code:
You cannot assign buf[i] with a string, such as "" or " ", because the type of buf[i] is char and the type of a string is char*.
You are reading from buf and writing into buf using index i. This poses a problem, as you want to eliminate consecutive white-spaces. So you should use one index for reading and another index for writing.
In C/C++, a native string is an array of characters that ends with 0. So in essence, you can simply iterate buf until you read 0 (you don't need to use the len variable at all). In addition, since you are "truncating" the input string, you should set the new last character to 0.
Here is one optional solution for the problem at hand:
int normalize(char* buf)
{
char c;
int i = 0;
int j = 0;
while (buf[i] != 0)
{
c = buf[i++];
if (isspace(c))
{
j++;
while (isspace(c))
c = buf[i++];
}
if (isupper(c))
buf[j] = tolower(c);
j++;
}
buf[j] = 0;
return j;
}
you should write:
return strlen(buf)
instead of:
return strlen(*buf)
The reason:
buf is of type char* - it's an address of a char somewhere in the memory (the one in the beginning of the string). The string is null terminated (or at least should be), and therefore the function strlen knows when to stop counting chars.
*buf will de-reference the pointer, resulting on a char - not what strlen expects.
Not much different then others but assumes this is an array of unsigned char and not a C string.
tolower() does not itself need the isupper() test.
int normalize(unsigned char *buf, int len) {
int i = 0;
int j = 0;
int previous_is_space = 0;
while (i < len) {
if (isspace(buf[i])) {
if (!previous_is_space) {
buf[j++] = ' ';
}
previous_is_space = 1;
} else {
buf[j++] = tolower(buf[i]);
previous_is_space = 0;
}
i++;
}
return j;
}
#OP:
Per the posted code it implies leading and trailing spaces should either be shrunk to 1 char or eliminate all leading and trailing spaces.
The above answer simple shrinks leading and trailing spaces to 1 ' '.
To eliminate trailing and leading spaces:
int i = 0;
int j = 0;
while (len > 0 && isspace(buf[len-1])) len--;
while (i < len && isspace(buf[i])) i++;
int previous_is_space = 0;
while (i < len) { ...

Pointers to string C

trying to write function that returns 1 if every letter in “word” appears in “s”.
for example:

containsLetters1("this_is_a_long_string","gas") returns 1
containsLetters1("this_is_a_longstring","gaz") returns 0
containsLetters1("hello","p") returns 0
Can't understand why its not right:
#include <stdio.h>
#include <string.h>
#define MAX_STRING 100
int containsLetters1(char *s, char *word)
{
int j,i, flag;
long len;
len=strlen(word);
for (i=0; i<=len; i++) {
flag=0;
for (j=0; j<MAX_STRING; j++) {
if (word==s) {
flag=1;
word++;
s++;
break;
}
s++;
}
if (flag==0) {
break;
}
}
return flag;
}
int main() {
char string1[MAX_STRING] , string2[MAX_STRING] ;
printf("Enter 2 strings for containsLetters1\n");
scanf ("%s %s", string1, string2);
printf("Return value from containsLetters1 is: %d\n",containsLetters1(string1,string2));
return 0;
Try these:
for (i=0; i < len; i++)... (use < instead of <=, since otherwise you would take one additional character);
if (word==s) should be if (*word==*s) (you compare characters stored at the pointed locations, not pointers);
Pointer s advances, but it should get back to the start of the word s, after reaching its end, i.e. s -= len after the for (j=...);
s++ after word++ is not needed, you advance the pointer by the same amount, whether or not you found a match;
flag should be initialized with 1 when declared.
Ah, that should be if(*word == *s) you need to use the indirection operator. Also as hackss said, the flag = 0; must be outside the first for() loop.
Unrelated but probably replace scanf with fgets or use scanf with length specifier For example
scanf("%99s",string1)
Things I can see wrong at first glance:
Your loop goes over MAX_STRING, it only needs to go over the length of s.
Your iteration should cover only the length of the string, but indexes start at 0 and not 1. for (i=0; i<=len; i++) is not correct.
You should also compare the contents of the pointer and not the pointers themselves. if(*word == *s)
The pointer advance logic is incorrect. Maybe treating the pointer as an array could simplify your logic.
Another unrelated point: A different algorithm is to hash the characters of string1 to a map, then check each character of the string2 and see if it is present in the map. If all characters are present then return 1 and when you encounter the first one that is not present then return 0. If you are only limited to using ASCII characters a hashing function is very easy. The longer your ASCII strings are the better the performance of the second approach.
Here is a one-liner solution, in keeping with Henry Spencer's Commandment 7 for C Programmers.
#include <string.h>
/*
* Does l contain every character that appears in r?
*
* Note degenerate cases: true if r is an empty string, even if l is empty.
*/
int contains(const char *l, const char *r)
{
return strspn(r, l) == strlen(r);
}
However, the problem statement is not about characters, but about letters. To solve the problem as literally given in the question, we must remove non-letters from the right string. For instance if r is the word error-prone, and l does not contain a hyphen, then the function returns 0, even if l contains every letter in r.
If we are allowed to modify the string r in place, then what we can do is replace every non-letter in the string with one of the letters that it does contain. (If it contains no letters, then we can just turn it into an empty string.)
void nuke_non_letters(char *r)
{
static const char *alpha =
"abcdefghijklmnopqrstuvwxyz"
"ABCDEFGHIJKLMNOPQRSTUVWXYZ";
while (*r) {
size_t letter_span = strspn(r, alpha);
size_t non_letter_span = strcspn(r + letter_span, alpha);
char replace = (letter_span != 0) ? *r : 0;
memset(r + letter_span, replace, non_letter_span);
r += letter_span + non_letter_span;
}
}
This also brings up another flaw: letters can be upper and lower case. If the right string is A, and the left one contains only a lower-case a, then we have failure.
One way to fix it is to filter the characters of both strings through tolower or toupper.
A third problem is that a letter is more than just the 26 letters of the English alphabet. A modern program should work with wide characters and recognize all Unicode letters as such so that it works in any language.
By the time we deal with all that, we may well surpass the length of some of the other answers.
Extending the idea in Rajiv's answer, you might build the character map incrementally, as in containsLetters2() below.
The containsLetters1() function is a simple brute force implementation using the standard string functions. If there are N characters in the string (haystack) and M in the word (needle), it has a worst-case performance of O(N*M) when the characters of the word being looked for only appear at the very end of the searched string. The strchr(needle, needle[i]) >= &needle[i] test is an optimization if there are likely to be repeated characters in the needle; if there won't be any repeats, it is a pessimization (but it can be removed and the code still works fine).
The containsLetters2() function searches through the string (haystack) at most once and searches through the word (needle) at most once, for a worst case performance of O(N+M).
#include <assert.h>
#include <stdio.h>
#include <string.h>
static int containsLetters1(char const *haystack, char const *needle)
{
for (int i = 0; needle[i] != '\0'; i++)
{
if (strchr(needle, needle[i]) >= &needle[i] &&
strchr(haystack, needle[i]) == 0)
return 0;
}
return 1;
}
static int containsLetters2(char const *haystack, char const *needle)
{
char map[256] = { 0 };
size_t j = 0;
for (int i = 0; needle[i] != '\0'; i++)
{
unsigned char c_needle = needle[i];
if (map[c_needle] == 0)
{
/* We don't know whether needle[i] is in the haystack yet */
unsigned char c_stack;
do
{
c_stack = haystack[j++];
if (c_stack == 0)
return 0;
map[c_stack] = 1;
} while (c_stack != c_needle);
}
}
return 1;
}
int main(void)
{
assert(containsLetters1("this_is_a_long_string","gagahats") == 1);
assert(containsLetters1("this_is_a_longstring","gaz") == 0);
assert(containsLetters1("hello","p") == 0);
assert(containsLetters2("this_is_a_long_string","gagahats") == 1);
assert(containsLetters2("this_is_a_longstring","gaz") == 0);
assert(containsLetters2("hello","p") == 0);
}
Since you can see the entire scope of the testing, this is not anything like thoroughly tested, but I believe it should work fine, regardless of how many repeats there are in the needle.

In-place run length decoding?

Given a run length encoded string, say "A3B1C2D1E1", decode the string in-place.
The answer for the encoded string is "AAABCCDE". Assume that the encoded array is large enough to accommodate the decoded string, i.e. you may assume that the array size = MAX[length(encodedstirng),length(decodedstring)].
This does not seem trivial, since merely decoding A3 as 'AAA' will lead to over-writing 'B' of the original string.
Also, one cannot assume that the decoded string is always larger than the encoded string.
Eg: Encoded string - 'A1B1', Decoded string is 'AB'. Any thoughts?
And it will always be a letter-digit pair, i.e. you will not be asked to converted 0515 to 0000055555
If we don't already know, we should scan through first, adding up the digits, in order to calculate the length of the decoded string.
It will always be a letter-digit pair, hence you can delete the 1s from the string without any confusion.
A3B1C2D1E1
becomes
A3BC2DE
Here is some code, in C++, to remove the 1s from the string (O(n) complexity).
// remove 1s
int i = 0; // read from here
int j = 0; // write to here
while(i < str.length) {
assert(j <= i); // optional check
if(str[i] != '1') {
str[j] = str[i];
++ j;
}
++ i;
}
str.resize(j); // to discard the extra space now that we've got our shorter string
Now, this string is guaranteed to be shorter than, or the same length as, the final decoded string. We can't make that claim about the original string, but we can make it about this modified string.
(An optional, trivial, step now is to replace every 2 with the previous letter. A3BCCDE, but we don't need to do that).
Now we can start working from the end. We have already calculated the length of the decoded string, and hence we know exactly where the final character will be. We can simply copy the characters from the end of our short string to their final location.
During this copy process from right-to-left, if we come across a digit, we must make multiple copies of the letter that is just to the left of the digit. You might be worried that this might risk overwriting too much data. But we proved earlier that our encoded string, or any substring thereof, will never be longer than its corresponding decoded string; this means that there will always be enough space.
The following solution is O(n) and in-place. The algorithm should not access memory it shouldn't, both read and write. I did some debugging, and it appears correct to the sample tests I fed it.
High level overview:
Determine the encoded length.
Determine the decoded length by reading all the numbers and summing them up.
End of buffer is MAX(decoded length, encoded length).
Decode the string by starting from the end of the string. Write from the end of the buffer.
Since the decoded length might be greater than the encoded length, the decoded string might not start at the start of the buffer. If needed, correct for this by shifting the string over to the start.
int isDigit (char c) {
return '0' <= c && c <= '9';
}
unsigned int toDigit (char c) {
return c - '0';
}
unsigned int intLen (char * str) {
unsigned int n = 0;
while (isDigit(*str++)) {
++n;
}
return n;
}
unsigned int forwardParseInt (char ** pStr) {
unsigned int n = 0;
char * pChar = *pStr;
while (isDigit(*pChar)) {
n = 10 * n + toDigit(*pChar);
++pChar;
}
*pStr = pChar;
return n;
}
unsigned int backwardParseInt (char ** pStr, char * beginStr) {
unsigned int len, n;
char * pChar = *pStr;
while (pChar != beginStr && isDigit(*pChar)) {
--pChar;
}
++pChar;
len = intLen(pChar);
n = forwardParseInt(&pChar);
*pStr = pChar - 1 - len;
return n;
}
unsigned int encodedSize (char * encoded) {
int encodedLen = 0;
while (*encoded++ != '\0') {
++encodedLen;
}
return encodedLen;
}
unsigned int decodedSize (char * encoded) {
int decodedLen = 0;
while (*encoded++ != '\0') {
decodedLen += forwardParseInt(&encoded);
}
return decodedLen;
}
void shift (char * str, int n) {
do {
str[n] = *str;
} while (*str++ != '\0');
}
unsigned int max (unsigned int x, unsigned int y) {
return x > y ? x : y;
}
void decode (char * encodedBegin) {
int shiftAmount;
unsigned int eSize = encodedSize(encodedBegin);
unsigned int dSize = decodedSize(encodedBegin);
int writeOverflowed = 0;
char * read = encodedBegin + eSize - 1;
char * write = encodedBegin + max(eSize, dSize);
*write-- = '\0';
while (read != encodedBegin) {
unsigned int i;
unsigned int n = backwardParseInt(&read, encodedBegin);
char c = *read;
for (i = 0; i < n; ++i) {
*write = c;
if (write != encodedBegin) {
write--;
}
else {
writeOverflowed = 1;
}
}
if (read != encodedBegin) {
read--;
}
}
if (!writeOverflowed) {
write++;
}
shiftAmount = encodedBegin - write;
if (write != encodedBegin) {
shift(write, shiftAmount);
}
return;
}
int main (int argc, char ** argv) {
//char buff[256] = { "!!!A33B1C2D1E1\0!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" };
char buff[256] = { "!!!A2B12C1\0!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" };
//char buff[256] = { "!!!A1B1C1\0!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" };
char * str = buff + 3;
//char buff[256] = { "A1B1" };
//char * str = buff;
decode(str);
return 0;
}
This is a very vague question, though it's not particularly difficult if you think about it. As you say, decoding A3 as AAA and just writing it in place will overwrite the chars B and 1, so why not just move those farther along the array first?
For instance, once you've read A3, you know that you need to make space for one extra character, if it was A4 you'd need two, and so on. To achieve this you'd find the end of the string in the array (do this upfront and store it's index).
Then loop though, moving the characters to their new slots:
To start: A|3|B|1|C|2|||||||
Have a variable called end storing the index 5, i.e. the last, non-blank, entry.
You'd read in the first pair, using a variable called cursor to store your current position - so after reading in the A and the 3 it would be set to 1 (the slot with the 3).
Pseudocode for the move:
var n = array[cursor] - 2; // n = 1, the 3 from A3, and then minus 2 to allow for the pair.
for(i = end; i > cursor; i++)
{
array[i + n] = array[i];
}
This would leave you with:
A|3|A|3|B|1|C|2|||||
Now the A is there once already, so now you want to write n + 1 A's starting at the index stored in cursor:
for(i = cursor; i < cursor + n + 1; i++)
{
array[i] = array[cursor - 1];
}
// increment the cursor afterwards!
cursor += n + 1;
Giving:
A|A|A|A|B|1|C|2|||||
Then you're pointing at the start of the next pair of values, ready to go again. I realise there are some holes in this answer, though that is intentional as it's an interview question! For instance, in the edge cases you specified A1B1, you'll need a different loop to move subsequent characters backwards rather than forwards.
Another O(n^2) solution follows.
Given that there is no limit on the complexity of the answer, this simple solution seems to work perfectly.
while ( there is an expandable element ):
expand that element
adjust (shift) all of the elements on the right side of the expanded element
Where:
Free space size is the number of empty elements left in the array.
An expandable element is an element that:
expanded size - encoded size <= free space size
The point is that in the process of reaching from the run-length code to the expanded string, at each step, there is at least
one element that can be expanded (easy to prove).

Resources