reduce the size of a string - c

(disclaimer: this is not a complete exercise because I have to finish it, but error occurred in this part of code)
I did this exercise to practice memory allocation.
create a function that takes an url (a C string) and returns the name of the website (with "www." and with the extension).
for example, given wikipedia's link, "http://www.wikipedia.org/", it has to return only "www.wikipedia.org" in another string (dynamically allocated in the heap).
this is what I did so far:
do a for-loop, and when "i" is greater than 6, then start copying each character in another string until "/" is reached.
I need to allocate the other string, and then reallocate that.
here's my attempt so far:
char *read_website(const char *url) {
char *str = malloc(sizeof(char));
if (str == NULL) {
exit(1);
}
for (unsigned int i = 0; url[i] != "/" && i > 6; ++i) {
if (i <= 6) {
continue;
}
char* s = realloc(str, sizeof(char) + 1);
if (s == NULL) {
exit(1);
}
*str = *s;
}
return str;
}
int main(void) {
char s[] = "http://www.wikipedia.org/";
char *str = read_website(s);
return 0;
}
(1) by debugging line-by-line, I've noticed that the program ends once for-loop is reached.
(2) another thing: I've chosen to create another pointer when I've used realloc, because I have to check if there's memory leak. Is it a good practice? Or should I've done something else?

There are multiple problems in your code:
url[i] != "/" is incorrect, it is a type mismatch. You should compare the character url[i] with a character constant '/', not a string literal "/".
char *s = realloc(str, sizeof(char) + 1); reallocates only to size 2, not the current length plus 1.
you do not increase the pointers, neither do you use the index variable.
instead of using malloc and realloc, you should first compute the length of the server name and allocate the array with the correct size directly.
Here is a modified version:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *read_website(const char *url) {
// skip the protocol part
if (!strncmp(url, "http://", 7))
url += 7;
else if (!strncmp(url, "https://", 8))
url += 8;
// compute the length of the domain name, stop at ':' or '/'
size_t n = strcspn(url, "/:");
// return an allocated copy of the substring
return strndup(url, n);
}
int main(void) {
char s[] = "http://www.wikipedia.org/";
char *str = read_website(s);
printf("%s -> %s\n", s, str);
free(str);
return 0;
}
strndup() is a POSIX function available on many systems and that will be part of the next version of the C Standard. If it is not available on your target, here is a simple implementation:
char *strndup(const char *s, size_t n) {
char *p;
size_t i;
for (i = 0; i < n && s[i]; i++)
continue;
p = malloc(i + 1);
if (p) {
memcpy(p, s, i);
p[i] = '\0';
}
return p;
}

The assignment doesn't say the returned string must be of minimal size, and the amount of memory used for URLs is minimal.
Building on chqrlie's solution, I'd start by finding the beginning of the domain name (skipping the protocol portion), duplicate the rest of the string, and then truncate the result. Roughly:
char *prot[] = { "http://", "https://" };
for( int i=0; i < 2; i++ ) {
if( 0 == strncmp(url, http, strlen(prot)) )
s += strlen(prot);
break;
}
}
char *output = strdup(s);
if( output ) {
size_t n = strcspn(output, "/:");
output[n] = '\0';
}
return output;
The returned pointer can still be freed by the caller, so the total "wasted" space is limited to the trailing part of the truncated URL.

Related

Free, invalid pointer

I have a program, that splits strings based on the delimiter. I have also, 2 other functions, one that prints the returned array and another that frees the array.
My program prints the array and returns an error when the free array method is called. Below is the full code.
#include "stringsplit.h"
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <assert.h>
/* Split string by another string, return split parts + NULL in array.
*
* Parameters:
* str: the string to split
* split: the string to split str with
*
* Returns:
* A dynamically reserved array of dynamically reserved string parts.
*
* For example called with "Test string split" and " ",
* returns ["Test", "string", "split", NULL].
* Or called with "Another - test" and " - ",
* returns ["Another", "test", NULL].
*/
unsigned long int getNofTokens(const char *string) {
char *stringCopy;
unsigned long int stringLength;
unsigned long int count = 0;
stringLength = (unsigned)strlen(string);
stringCopy = malloc((stringLength + 1) * sizeof(char));
strcpy(stringCopy, string);
if (strtok(stringCopy, " \t") != NULL) {
count++;
while (strtok(NULL, " \t") != NULL)
count++;
}
free(stringCopy);
return count;
}
char **split_string(const char *str, const char *split) {
unsigned long int count = getNofTokens(str);
char **result;
result = malloc(sizeof(char *) * count + 1);
char *tmp = malloc(sizeof(char) * strlen(str));
strcpy(tmp, str);
char *token = strtok(tmp, split);
int idx = 0;
while (token != NULL) {
result[idx++] = token;
token = strtok(NULL, split);
}
return result;
}
void print_split_string(char **split_string) {
for (int i = 0; split_string[i] != NULL; i++) {
printf("%s\n", split_string[i]);
}
}
void free_split_string(char **split_string) {
for (int i = 0; split_string[i] != NULL; i++) {
char *currentPointer = split_string[i];
free(currentPointer);
}
free(split_string);
}
Also, do I need to explicitly add \0 at the end of the array or does strtok add it automatically?
There are some problems in your code:
[Major] the function getNofTokens() does not take the separator string as an argument, it counts the number of words separated by blanks, potentially returning an inconsistent count to its caller.
[Major] the size allocated in result = malloc(sizeof(char *) * count + 1); is incorrect: it should be:
result = malloc(sizeof(char *) * (count + 1));
Storing the trailing NULL pointer will write beyond the end of the allocated space.
[Major] storing the said NULL terminator at the end of the array is indeed necessary, as the block of memory returned by malloc() is uninitialized.
[Major] the copy of the string allocated and parsed by split_string cannot be safely freed because the pointer tmp is not saved anywhere. The pointer to the first token will be different from tmp in 2 cases: if the string contains only delimiters (no token found) or if the string starts with a delimiter (the initial delimiters will be skipped). In order to simplify the code and make it reliable, each token could be duplicated and tmp should be freed. In fact your free_split_string() function relies on this behavior. With the current implementation, the behavior is undefined.
[Minor] you use unsigned long and int inconsistently for strings lengths and array index variables. For consistency, you should use size_t for both.
[Remark] you should allocate string copies with strdup(). If this POSIX standard function is not available on your system, write a simple implementation.
[Major] you never test for memory allocation failure. This is OK for testing purposes and throw away code, but such potential failures should always be accounted for in production code.
[Remark] strtok() is a tricky function to use: it modifies the source string and keeps a hidden static state that makes it non-reentrant. You should avoid using this function although in this particular case it performs correctly, but if the caller of split_string or getNofTokens relied on this hidden state being preserved, it would get unexpected behavior.
Here is a modified version:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "stringsplit.h"
/* Split string by another string, return split parts + NULL in array.
*
* Parameters:
* str: the string to split
* split: the string to split str with
*
* Returns:
* A dynamically reserved array of dynamically reserved string parts.
*
* For example called with "Test string split" and " ",
* returns ["Test", "string", "split", NULL].
* Or called with "Another - test" and " - ",
* returns ["Another", "test", NULL].
*/
size_t getNofTokens(const char *string, const char *split) {
char *tmp = strdup(string);
size_t count = 0;
if (strtok(tmp, split) != NULL) {
count++;
while (strtok(NULL, split) != NULL)
count++;
}
free(tmp);
return count;
}
char **split_string(const char *str, const char *split) {
size_t count = getNofTokens(str, split);
char **result = malloc(sizeof(*result) * (count + 1));
char *tmp = strdup(str);
char *token = strtok(tmp, split);
size_t idx = 0;
while (token != NULL && idx < count) {
result[idx++] = strdup(token);
token = strtok(NULL, split);
}
result[idx] = NULL;
free(tmp);
return result;
}
void print_split_string(char **split_string) {
for (size_t i = 0; split_string[i] != NULL; i++) {
printf("%s\n", split_string[i]);
}
}
void free_split_string(char **split_string) {
for (size_t i = 0; split_string[i] != NULL; i++) {
free(split_string[i]);
}
free(split_string);
}
Here is an alternative without strtok() and without intermediary allocations:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "stringsplit.h"
size_t getNofTokens(const char *str, const char *split) {
size_t count = 0;
size_t pos = 0, len;
for (pos = 0;; pos += len) {
pos += strspn(str + pos, split); // skip delimiters
len = strcspn(str + pos, split); // parse token
if (len == '\0')
break;
count++;
}
return count;
}
char **split_string(const char *str, const char *split) {
size_t count = getNofTokens(str, split);
char **result = malloc(sizeof(*result) * (count + 1));
size_t pos, len, idx;
for (pos = 0, idx = 0; idx < count; pos += len, idx++) {
pos += strspn(str + pos, split); // skip delimiters
len = strcspn(str + pos, split); // parse token
if (len == '\0')
break;
result[idx] = strndup(str + pos, len);
}
result[idx] = NULL;
return result;
}
void print_split_string(char **split_string) {
for (size_t i = 0; split_string[i] != NULL; i++) {
printf("%s\n", split_string[i]);
}
}
void free_split_string(char **split_string) {
for (size_t i = 0; split_string[i] != NULL; i++) {
free(split_string[i]);
}
free(split_string);
}
EDIT After re-reading the specification in your comment, there seems to be some potential confusion as to the semantics of the split argument:
if split is a set of delimiters, the above code does the job. And the examples will be split as expected.
if split is an actual string to match explicitly, the above code only works by coincidence on the examples given in the comment.
To implement the latter semantics, you should use strstr() to search for the split substring in both getNofTokens and split_string.
Here is an example:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "stringsplit.h"
/* Split string by another string, return split parts + NULL in array.
*
* Parameters:
* str: the string to split
* split: the string to split str with
*
* Returns:
* A dynamically reserved array of dynamically reserved string parts.
*
* For example called with "Test string split" and " ",
* returns ["Test", "string", "split", NULL].
* Or called with "Another - test" and " - ",
* returns ["Another", "test", NULL].
*/
size_t getNofTokens(const char *str, const char *split) {
const char *p;
size_t count = 1;
size_t len = strlen(split);
if (len == 0)
return strlen(str);
for (p = str; (p = strstr(p, split)) != NULL; p += len)
count++;
return count;
}
char **split_string(const char *str, const char *split) {
size_t count = getNofTokens(str, split);
char **result = malloc(sizeof(*result) * (count + 1));
size_t len = strlen(split);
size_t idx;
const char *p = str;
for (idx = 0; idx < count; idx++) {
const char *q = strstr(p, split);
if (q == NULL) {
q = p + strlen(p);
} else
if (q == p && *q != '\0') {
q++;
}
result[idx] = strndup(p, q - p);
p = q + len;
}
result[idx] = NULL;
return result;
}
void print_split_string(char **split_string) {
for (size_t i = 0; split_string[i] != NULL; i++) {
printf("%s\n", split_string[i]);
}
}
void free_split_string(char **split_string) {
for (size_t i = 0; split_string[i] != NULL; i++) {
free(split_string[i]);
}
free(split_string);
}
When debugging, take note of values that you got from malloc, strdup, etc. Let's call these values "the active set". It's just a name, so that we can refer to them. You get a pointer from those functions, you mentally add it to the active set. When you call free, you can only pass values from the active set, and after free returns, you mentally remove them from the set. Any other use of free is invalid and a bug.
You can easily find this out by putting breakpoints after all memory allocations, so that you can write down the pointer values, and then breakpoints on all frees, so that you can see if one of those pointer values got passed to free - since, again, to do otherwise is to misuse free.
This can be done also using "printf" debugging. Like this:
char *buf = malloc(...); // or strdup, or ...
fprintf(stderr, "+++ Alloc %8p\n", buf);
And then whenever you have free, do it again:
fprintf(stderr, "--- Free %8p\n", ptr);
free(ptr);
In the output of the program, you must be able to match every +++ with ---. If you see any --- with a value that wasn't earlier listed with a +++, there's your problem: that's the buggy invocation of free :)
I suggest using fprintf(stderr, ... instead of printf(..., since the former is typically unbuffered, so if your program crashes, you won't miss any output. printf is buffered on some architectures (and not buffered on others - so much for consistency).

Why does my string_split implementation not work?

My str_split function returns (or at least I think it does) a char** - so a list of strings essentially. It takes a string parameter, a char delimiter to split the string on, and a pointer to an int to place the number of strings detected.
The way I did it, which may be highly inefficient, is to make a buffer of x length (x = length of string), then copy element of string until we reach delimiter, or '\0' character. Then it copies the buffer to the char**, which is what we are returning (and has been malloced earlier, and can be freed from main()), then clears the buffer and repeats.
Although the algorithm may be iffy, the logic is definitely sound as my debug code (the _D) shows it's being copied correctly. The part I'm stuck on is when I make a char** in main, set it equal to my function. It doesn't return null, crash the program, or throw any errors, but it doesn't quite seem to work either. I'm assuming this is what is meant be the term Undefined Behavior.
Anyhow, after a lot of thinking (I'm new to all this) I tried something else, which you will see in the code, currently commented out. When I use malloc to copy the buffer to a new string, and pass that copy to aforementioned char**, it seems to work perfectly. HOWEVER, this creates an obvious memory leak as I can't free it later... so I'm lost.
When I did some research I found this post, which follows the idea of my code almost exactly and works, meaning there isn't an inherent problem with the format (return value, parameters, etc) of my str_split function. YET his only has 1 malloc, for the char**, and works just fine.
Below is my code. I've been trying to figure this out and it's scrambling my brain, so I'd really appreciate help!! Sorry in advance for the 'i', 'b', 'c' it's a bit convoluted I know.
Edit: should mention that with the following code,
ret[c] = buffer;
printf("Content of ret[%i] = \"%s\" \n", c, ret[c]);
it does indeed print correctly. It's only when I call the function from main that it gets weird. I'm guessing it's because it's out of scope ?
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#define DEBUG
#ifdef DEBUG
#define _D if (1)
#else
#define _D if (0)
#endif
char **str_split(char[], char, int*);
int count_char(char[], char);
int main(void) {
int num_strings = 0;
char **result = str_split("Helo_World_poopy_pants", '_', &num_strings);
if (result == NULL) {
printf("result is NULL\n");
return 0;
}
if (num_strings > 0) {
for (int i = 0; i < num_strings; i++) {
printf("\"%s\" \n", result[i]);
}
}
free(result);
return 0;
}
char **str_split(char string[], char delim, int *num_strings) {
int num_delim = count_char(string, delim);
*num_strings = num_delim + 1;
if (*num_strings < 2) {
return NULL;
}
//return value
char **ret = malloc((*num_strings) * sizeof(char*));
if (ret == NULL) {
_D printf("ret is null.\n");
return NULL;
}
int slen = strlen(string);
char buffer[slen];
/* b is the buffer index, c is the index for **ret */
int b = 0, c = 0;
for (int i = 0; i < slen + 1; i++) {
char cur = string[i];
if (cur == delim || cur == '\0') {
_D printf("Copying content of buffer to ret[%i]\n", c);
//char *tmp = malloc(sizeof(char) * slen + 1);
//strcpy(tmp, buffer);
//ret[c] = tmp;
ret[c] = buffer;
_D printf("Content of ret[%i] = \"%s\" \n", c, ret[c]);
//free(tmp);
c++;
b = 0;
continue;
}
//otherwise
_D printf("{%i} Copying char[%c] to index [%i] of buffer\n", c, cur, b);
buffer[b] = cur;
buffer[b+1] = '\0'; /* extend the null char */
b++;
_D printf("Buffer is now equal to: \"%s\"\n", buffer);
}
return ret;
}
int count_char(char base[], char c) {
int count = 0;
int i = 0;
while (base[i] != '\0') {
if (base[i++] == c) {
count++;
}
}
_D printf("Found %i occurence(s) of '%c'\n", count, c);
return count;
}
You are storing pointers to a buffer that exists on the stack. Using those pointers after returning from the function results in undefined behavior.
To get around this requires one of the following:
Allow the function to modify the input string (i.e. replace delimiters with null-terminator characters) and return pointers into it. The caller must be aware that this can happen. Note that supplying a string literal as you are doing here is illegal in C, so you would instead need to do:
char my_string[] = "Helo_World_poopy_pants";
char **result = str_split(my_string, '_', &num_strings);
In this case, the function should also make it clear that a string literal is not acceptable input, and define its first parameter as const char* string (instead of char string[]).
Allow the function to make a copy of the string and then modify the copy. You have expressed concerns about leaking this memory, but that concern is mostly to do with your program's design rather than a necessity.
It's perfectly valid to duplicate each string individually and then clean them all up later. The main issue is that it's inconvenient, and also slightly pointless.
Let's address the second point. You have several options, but if you insist that the result be easily cleaned-up with a call to free, then try this strategy:
When you allocate the pointer array, also make it large enough to hold a copy of the string:
// Allocate storage for `num_strings` pointers, plus a copy of the original string,
// then copy the string into memory immediately following the pointer storage.
char **ret = malloc((*num_strings) * sizeof(char*) + strlen(string) + 1);
char *buffer = (char*)&ret[*num_strings];
strcpy(buffer, string);
Now, do all your string operations on buffer. For example:
// Extract all delimited substrings. Here, buffer will always point at the
// current substring, and p will search for the delimiter. Once found,
// the substring is terminated, its pointer appended to the substring array,
// and then buffer is pointed at the next substring, if any.
int c = 0;
for(char *p = buffer; *buffer; ++p)
{
if (*p == delim || !*p) {
char *next = p;
if (*p) {
*p = '\0';
++next;
}
ret[c++] = buffer;
buffer = next;
}
}
When you need to clean up, it's just a single call to free, because everything was stored together.
The string pointers you store into the res with ret[c] = buffer; array point to an automatic array that goes out of scope when the function returns. The code subsequently has undefined behavior. You should allocate these strings with strdup().
Note also that it might not be appropriate to return NULL when the string does not contain a separator. Why not return an array with a single string?
Here is a simpler implementation:
#include <stdlib.h>
char **str_split(const char *string, char delim, int *num_strings) {
int i, n, from, to;
char **res;
for (n = 1, i = 0; string[i]; i++)
n += (string[i] == delim);
*num_strings = 0;
res = malloc(sizeof(*res) * n);
if (res == NULL)
return NULL;
for (i = from = to = 0;; from = to + 1) {
for (to = from; string[to] != delim && string[to] != '\0'; to++)
continue;
res[i] = malloc(to - from + 1);
if (res[i] == NULL) {
/* allocation failure: free memory allocated so far */
while (i > 0)
free(res[--i]);
free(res);
return NULL;
}
memcpy(res[i], string + from, to - from);
res[i][to - from] = '\0';
i++;
if (string[to] == '\0')
break;
}
*num_strings = n;
return res;
}

Can't modify an array within a loop (C)

I am currently developing a small program requires a function to return a string (character array), and two parameters which are (phrase, c). The 'phrase' is a string input and 'c' is the character which will be removed from the phrase. The left-over spaces will also be removed.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
//This method has two parameters: (str, c)
//It will remove all occurences of var 'c'
//inside of 'str'
char * rmchr(char * str, char *c) {
//Declare result array
char *strVal = (char *) malloc(sizeof(char) * strlen(str));
//Iterate through each character
for (int i = 0; i < strlen(str); i++) {
*(strVal+i) = str[i];
//Check if char matches 'c'
if (strVal[i] != *c){
//Assign filtered value to new array
*(strVal+i) = str[i];
printf("%c", strVal[i]);
}
}
return strVal;
}
int main()
{
char * result = rmchr("This is a great message to test with! It includes a lot of examples!","i");
return 1;
}
Inside of the 'rmchr' function (if-statement), the array prints out exactly how I'd like to return it:
Ths s a great message to test wth! It ncludes a lot of examples!
The problem is that my return variable, 'strVal' isn't being modified outside of the if-statement. How can I modify the array permanently so my ideal output will be returned inside of 'result' (inside of main).
I see a few points to address. Primarily, this code directly copies the input string verbatim as it stands. The same *(strVal+i) = str[i]; assignment takes place in two locations in the code which disregards the comparison against *c. Without some secondary index variable j, it becomes difficult to keep track of the end of the receiving string.
Additional notes:
There is no free for your malloc; this creates a memory leak.
You return exit code 1 which indicates abnormal program termination. return 0 to indicate a normal exit.
Don't cast the pointer malloc returns; this can hide errors.
Validate malloc success and exit if it failed.
strlen() is a linear time operation that iterates through the entire parameter string on each call. Call it once and store the result in a variable to save cycles.
This code does not handle removal of extra spaces as required.
See the below sample for a possible implementation that addresses some of the above points:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
char *rmchr(char *str, char *c) {
int i = 0;
int j = 0;
int len = strlen(str);
char *result = malloc(sizeof(*result) * (len + 1));
if (result == NULL) {
fprintf(stderr, "out of memory\n");
exit(1);
}
while (i < len) {
if (str[i] != *c) {
result[j++] = str[i++];
}
else {
for (i++; i < len && str[i] == ' '; i++);
}
}
result[j] = '\0';
return result;
}
int main() {
char *result = rmchr("This is a great message to test with! It includes a lot of examples!", "i");
for (int i = 0; i < strlen(result); i++) {
printf("%c", result[i]);
}
free(result);
return 0;
}
Output:
Ths s a great message to test wth! It ncludes a lot of examples!

garbage output after whitespace removing

i've this code:
int i =0;
char * str = "ar bitrary whitespace";
int whitespace=0,index;
for(index = 0;index < strlen(str);index++)
{
if(isspace(str[index]) != 0)
{
whitespace++;
}
}
char * tmp = (char *)calloc(strlen(str)-whitespace +1,sizeof(char));
memset(tmp,'\0',strlen(tmp)+1);
while(i < strlen(str))
{
if(isspace(str[i]) != 0)
{
i++;
continue;
}else if(isspace(str[i]) == 0)
{
strcat(tmp,&str[i]);
i++;
}
}
printf("\nnew string is: %s \n",tmp);
the problem is that the output is a string without the whitespace removed + some garbage character.
I've used memset to null terminate tmp,is there the problem?
The length of the source string could be calculated before this loop
for(index = 0;index < strlen(str);index++)
Otherwise if the code will not be optimized the function strlen will be called for each iteration of the loop. In fact using of the function is redundant for such a task.
This statement
memset(tmp,'\0',strlen(tmp)+1);
does not make sense because the call of calloc already initialized the memory with zeroes.
This statement
strcat(tmp,&str[i]);
also copies blanks from the source string after the position i. So it can write beyond the memory allocated for the array pointed to by the pointer tmp.
You can write a separate function that can look as it is shown in this demonstrative program
#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>
char * remove_blanks( const char *s )
{
size_t n = 0;
const char *p = s;
do
{
if ( !isspace( ( unsigned char )*p ) ) ++n;
} while ( *p++ );
char *t = malloc( n );
if ( t )
{
char *q = t;
p = s;
do
{
if ( !isspace( ( unsigned char )*p ) ) *q++ = *p;
} while ( *p++ );
}
return t;
}
int main(void)
{
char * str = "ar bitrary whitespace";
printf( "\"%s\"\n", str );
char *t = remove_blanks( str );
printf( "\"%s\"\n", t );
free( t );
}
The program output is
"ar bitrary whitespace"
"arbitrarywhitespace"
this is your problem
memset(tmp,'\0',strlen(tmp)+1);
strlen(tmp) works by looking for '\0' in tmp, you have a chicken and egg situation here.
You should not be doing a memset any way, just tack on a '\0' when you fnish copying
And dont use strcat, instead maintain a pointer to tmp and just do *p = str[i] then increment p
I will not read your question, you overwrite the '\0' terminator for sure.
Now that I read your question, it looks like you need to understand strings and arrays better,
Don't ever write while (i < strlen(str))
Don't use strcat() for adding a single character, you apparently did overwrite the '\0' there. Furthermore, don't ever use strcat() for concatenating more than to pieces of a string.
Also notable,
You memset() after calloc() which already initialized to 0. That means that you are enforcing something that is not necessary, and trying it twice as if it failed the first time which I can guarantee it didn't.
In fact, since you have used calloc() and all bytes pointed to by tmp are 0 then strlen(tmp) will return 0, thus your memset() is equivalent to
tmp[0] = '\0';
and you REALLY don't need initialize tmp except when you finally copy the actual bytes from str.
I always advice against calloc() for strings, because
You don't really need to initialize something twice.
You should be sure your code does take the terminating '\0' into account and not simply assume that it's there because you calloc()ed. That is a bug that you just hide with calloc() but it shows up at some point.
Try this and see if you can understand the reasons for my changes
#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>
int main(void)
{
int whitespace;
int length;
char *str = "ar bitrary whitespace";
char *tmp;
whitespace = 0;
for (length = 0; str[length] != '\0'; ++length) {
if (isspace(str[length]) != 0) {
whitespace++;
}
}
tmp = malloc(length - whitespace + 1);
if (tmp == NULL)
return -1;
for (int i = 0, j = 0; str[i] != '\0'; ++i) {
if (isspace(str[i]) != 0)
continue;
tmp[j++] = str[i];
}
tmp[length - whitespace] = '\0';
printf("new string is: %s\n",tmp);
free(tmp);
return 0;
}

C - Replacing char with string

I am writing a program that encodes text such that it can be put into a URL. I have the user inputting a string and if it contains special characters (#, %, &, ?, etc.) to replace them with their corresponding character codes (%23, %25, %26, %3F, etc.). The problem is that the special characters are only of length 1 and the codes are of length 3. The codes end up replacing characters after the special one. This is the code I am using to do the replacement.
char *p = enteredCharStr;
while ((p = strstr(p, specialCharArr[x])) != NULL )
{
char *substr;
substr = strstr(enteredCharStr, specialChar[x]);
strncpy(substr, charCodesArr[x], 3);
p++;
}
Example output from using my program with input: "this=this&that"
this%3Dis%26at
I would like the output to be:
this%3Dthis%26that
Any idea on how to implement what I am trying to do in C (no libraries)?
One way to approach this problem would be to allocate a second string that is three times as large as enteredCharStr and copy the characters over one by one and when you see special character write the replaement instead. You want it to be three times as large since in the worst case you need to replace nearly all the characters.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int isspecial(int c){
static char table[] = "#%&?=<>"; //add etc..
return strchr(table, c) ? 1 : 0;
}
char *encode(const char *s){
size_t capa = 1024;
char *buff=malloc(capa*sizeof(char));
size_t size = 0;
for(;*s;++s){
if(size + 3 > capa){
capa += 32;
buff = realloc(buff, capa*sizeof(char));
}
if(isspecial(*s)){
size+=sprintf(buff+size, "%%%02x", *s);
} else {
size+=sprintf(buff+size, "%c", *s);
}
}
if(size == capa){
buff=realloc(buff, (size+1)*sizeof(char));
}
buff[size++]='\0';
return realloc(buff, size*sizeof(char));
}
int main(void){
char *enteredCharStr = "this=this&that";
char *p = encode(enteredCharStr);
printf("%s\n", p);
free(p);
return 0;
}
You need to make a new string. Here's an example:
char *str = "abc$ddd";
char *p = str;
char *buf = malloc(strlen(str)+1);
char *pbuf = buf;
while(*p) {
if(*p != '$') *pbuf ++ = *p;
p++;
}
It will copy from str to buf all non-$,byte per byte.
Note that in your case,you need to perform the right computation of size of new string.
A C 'string' is a fixed-size array of characters, and therefore there is no built-in notion of insertion. You're effectively asking how to insert n characters into the middle of an array.
One strategy come to mind:
To insert a string of length x at position i of an array of length n:
Resize the array to size n+x (using something like realloc).
Shuffle every character beyond position i to position i+x.
Write your string into the x positions now freed by this shuffle operation.
Alternatively, allocate a new array that is big enough to hold your target string (i.e., with all the substitutions applied), and then write your result into that by copying from the target array until you encounter a character you'd like to replace, then copy from the replacement string, then continue reading from the original source array.
I'm copying characters over one by one, and if I see a special character, (In this code only "#")
I copy in 3 characters, incrementing the index into the output buffer by 3.
You can also do something smarter to guess the buffer size, and perhaps loop over the entire operation, doubling the size of the buffer each time it overruns.
#include<stdio.h>
#include<stdlib.h>
int main(int argc, char* argv[]){
if (argc != 2) {
exit(1);
}
char* input = argv[1];
int bufferSize = 128;
char* output = malloc(bufferSize);
int outIndex = 0;
int inIndex = 0;
while(input[inIndex] != '\0'){
switch (input[inIndex])
{
case '#':ยท
if(outIndex + 4 > bufferSize){
// Overflow, retry or something.
exit(2);
}
output[outIndex] = '%';
output[outIndex+1] = '2';
output[outIndex+2] = '3';
outIndex = outIndex + 3;
inIndex = inIndex + 1;
break;
// Other cases
default:
if(outIndex + 2 > bufferSize){
exit(2);
}
output[outIndex] = input[inIndex];
outIndex = outIndex + 1;
inIndex = inIndex + 1;
break;
}
}
output[outIndex] = '\0';
printf("%s\n", output);
return 0;
}

Resources