Storing user inputs without declaring an arbitrarily large array. - c

I am taking character input and storing it without declaring an arbitrarily large array. The problem is that the code does not print the stored values (although it correctly prints the number of elements that I enter). The working principle is: in the first iteration of the for loop, "b" is created and "c" is copied to it ("c" contains something arbitrary at this point); then the user overwrites whatever is in "b", and the updated "b" is copied to "c". In the second and following iterations, "c" is basically the old "b", and "b" is updated by copying "c" to it and entering the new element at the end.
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
int main()
{
char e = 'a';
char *b,*c = &e;
printf("start entering the characters and enter Z to terminate:\n");
char d;
int i,m;
for(i=0;(d=getchar()) != 'Z';i++)
{
b=malloc(sizeof(char)*(i+1));
strcpy(b,c);
scanf("%c",b+i);
c=malloc(sizeof(char)*(i+1));
strcpy(c,b);
}
printf("-----------------------------------------------------------------\n");
int q=strlen(b);
printf("%d\n",q);
//printf("%s\n",b);
for(m=0;m<q;m++)
printf("%c",b[m]);
return 0;
}

I'm not sure why the question code uses both 'getchar()' as well as 'scanf()'; perhaps I am missing something?
And, as mentioned by 'BLUEPIXY', realloc() is better.
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
int main()
{
char *b = NULL; /* This will be the reference to the self-growing array. */
size_t bLen = 0; /* This is the length of the string in 'b' (not counting the string termination character.) */
for(;;)
{
char *x; /* Used to safely grow the array 'b' larger. */
int d; /* Use an 'int'. 'getchar()' returns an 'int', (not a 'char'). */
/* Get a character from stdin. */
d=getchar();
if('Z' == d || EOF == d) /* Stop at 'Z' or at end of input. */
break;
/* Safely grow the array 'b' large enough to hold one more character plus the string terminator. */
x=realloc(b, (bLen+2) * sizeof(*b));
if(NULL == x)
{
fprintf(stderr, "realloc() failed.\n");
exit(1);
}
b=x;
/* Store the character in the array and re-terminate the string. */
b[bLen++] = d;
b[bLen] = '\0';
}
printf("-----------------------------------------------------------------\n");
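printf("%s\n", b ? b : ""); /* 'b' is still NULL if nothing was stored. */
free(b); /* free(NULL) is a no-op, so no check is needed. */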
return 0;
}

Some impressions:
Giving all of your variables single-character names makes it a lot harder than it should be for someone reading this code to figure out what you're trying to do.
When you're done with a block of memory that you have allocated using malloc(), you need to release it using free().
strcpy() is for use with null-terminated strings, so I don't think it's going to do what you expect here. Consider memcpy() instead. Your use of strlen() has the same problem. You shouldn't need to replace that with anything, since i should already give you the character count.
Is your scanf() statement trying to copy a character into your buffer? Just use a simple assignment, like b[i] = d.
If Z is the user's first keypress, b will never be initialized, and Bad Things will happen when you try to access it in the code following the loop.
Reallocating memory during every iteration of the loop is very inefficient. Instead, consider allocating a small amount of space at the outset—say, 20 characters' worth. Then in the body of your loop, if your buffer has enough space to accommodate a new character, all you need to do is copy that one character. If your buffer lacks space, then reallocate a larger block of memory, but not just for 1 extra character; reserve space for another 20 (or whatever).
Each reallocation should only require one call to malloc() (or, even better, realloc() as suggested by BLUEPIXY). I'm not sure why you're using two.
Null-terminate the input at the end so you can treat it like a string when you want to display it.
I'm happy to help if you have specific questions about any of this.
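The chunked growth suggested above could look roughly like the sketch below. The function name read_until, the CHUNK constant, and the choice to read from a FILE * are assumptions made for illustration; the step of 20 is simply the figure mentioned in the list.

```c
#include <stdio.h>
#include <stdlib.h>

#define CHUNK 20 /* growth step; 20 is just the figure suggested above */

/* Reads characters from 'in' until 'stop' or end of input.
 * Returns a null-terminated, malloc()ed string (caller frees),
 * or NULL if memory runs out. 'read_until' is a hypothetical name. */
char *read_until(FILE *in, int stop)
{
    size_t capacity = CHUNK;
    size_t length = 0;
    char *buffer = malloc(capacity);
    int ch; /* int, so that EOF can be told apart from real characters */

    if (buffer == NULL)
        return NULL;

    while ((ch = fgetc(in)) != EOF && ch != stop) {
        /* Grow only when the next character plus '\0' would not fit. */
        if (length + 1 >= capacity) {
            char *bigger = realloc(buffer, capacity + CHUNK);
            if (bigger == NULL) {
                free(buffer);
                return NULL;
            }
            buffer = bigger;
            capacity += CHUNK;
        }
        buffer[length++] = (char)ch;
    }
    buffer[length] = '\0';
    return buffer;
}
```

A caller would use it as char *text = read_until(stdin, 'Z'); and free(text) when done. Because the buffer exists even when 'Z' is the first keypress, the result is always a valid (possibly empty) string.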

Related

Using getchar() to store address of string literal in char*pointer

The goal is to store the address of a string literal in a char*, which is a member of struct id. I thought about using an array.
The problem with an array is that, if I set the maximum number of characters to 7, the user might enter fewer than 7, so memory will be wasted.
The advantage of using getchar() is that I can set the maximum number of characters to 7, but if the user enters fewer, that's OK too.
typedef struct id
{
int age;
char* name;
}id;
id Mary;
char L;
int c =0;
printf("Enter your age: ");
scanf("%d",&Mary.age);
printf(" Enter your name: ");
if( (L=getchar()) != '\n' )
{
// storing string literal in char*
}
printf("%s", Mary.name);
This is a common problem: "How do I read an input string of unknown length?" Daniel Kleinstein has mentioned several general solutions in his answer. I'll give a more implementation-based answer here.
Firstly, your program does not try to store a string literal, but a string read from an input stream (e.g. stdin).
Secondly, it is not possible to store a string "in a char*". The string is stored in memory pointed to by a char*. This memory needs to be allocated first.
The following code comes closest to what you want to do. It reads one character at a time and increases the size of the allocated memory by 1 byte every time.
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
typedef struct
{
int age;
char *name;
} Id;
int main(void)
{
Id mary;
printf(" Enter your name: ");
size_t nameSize = 0U;
mary.name = NULL;
while (true)
{
mary.name = (char*) realloc(mary.name, ++nameSize); // cast is optional
if (mary.name == NULL)
{
printf("Memory allocation error\n");
exit(EXIT_FAILURE);
}
int ch = getchar(); // Note the `int` type, necessary to detect EOF
if (ch == '\n' || ch == EOF)
{
mary.name[nameSize - 1] = '\0';
break;
}
mary.name[nameSize - 1] = (char) ch;
}
printf("%s\n", mary.name);
free(mary.name);
}
This does not waste a single byte of memory; however, the frequent memory reallocations will make this code slow. A good compromise is to read one fixed-length string at a time, instead of one character at a time.
To do this in practice: create a buffer on the stack of fixed length (e.g. 64 characters), read into that buffer using fgets, and copy that contents to mary.name. If the string didn't fit the buffer, repeatedly call fgets again, realloc mary.name and append the contents of the buffer to mary.name until you find a newline character.
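A minimal sketch of the fgets approach just described, assuming a 64-byte stack buffer and a hypothetical function name read_line that takes the stream as a parameter:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads one line of any length from 'in' in fixed-size chunks.
 * Returns a malloc()ed string without the trailing newline (caller frees),
 * or NULL on allocation failure or if no input was available. */
char *read_line(FILE *in)
{
    char chunk[64];
    char *line = NULL;
    size_t length = 0;

    while (fgets(chunk, sizeof chunk, in) != NULL) {
        size_t chunk_len = strlen(chunk);
        char *bigger = realloc(line, length + chunk_len + 1);
        if (bigger == NULL) {
            free(line);
            return NULL;
        }
        line = bigger;
        memcpy(line + length, chunk, chunk_len + 1); /* copies the '\0' too */
        length += chunk_len;

        if (length > 0 && line[length - 1] == '\n') {
            line[length - 1] = '\0'; /* full line read: drop the newline */
            break;
        }
    }
    return line;
}
```

In the program above this would be used as mary.name = read_line(stdin); followed by a NULL check. A name shorter than 63 characters then costs a single realloc call.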
Another, simpler solution is to set a maximum length for the string, allocate memory for that length, read a string of maximally that length, and finally reallocate the memory to the (possibly smaller) actual size of the string.
There isn't a magic bullet solution for this sort of problem.
Your options are:
Use an array with a maximum length - but as you mentioned this can be wasteful if the user inputs a shorter length. Nevertheless this is usually the solution you'll find in real code - in practice, if memory isn't a huge concern, this is faster and simpler than trying to deal with other dynamic solutions that involve memory allocations.
Ask the user for the length of their name before they input it - then you can dynamically allocate an appropriately sized buffer using either char* name = malloc(input_length);, or char name[input_length]; in C99+. You could also do something like a flexible array member:
struct name {
size_t length;
char buffer[];
};
struct name* username = malloc(sizeof(*username) + username_length);
If you don't want to ask the user for the length of the username, you can do a chain of realloc calls after each new getchar, which will resize a dynamically allocated array - but this is a terrible idea which you shouldn't even consider unless you're stressed over every byte of memory consumed in your program.
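For the flexible array member option, a slightly fuller sketch may help. It assumes the text is already available so its length can be measured; name_create is a hypothetical helper, not something from the answer.

```c
#include <stdlib.h>
#include <string.h>

struct name {
    size_t length;  /* number of characters, not counting '\0' */
    char buffer[];  /* flexible array member (C99) */
};

/* Allocates the header and the characters in a single block.
 * 'name_create' is a hypothetical helper; release with free(). */
struct name *name_create(const char *text)
{
    size_t length = strlen(text);
    struct name *n = malloc(sizeof *n + length + 1); /* +1 for '\0' */

    if (n == NULL)
        return NULL;
    n->length = length;
    memcpy(n->buffer, text, length + 1);
    return n;
}
```

One allocation holds both the length and the characters, so a single free() releases everything.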

I don't know what's going wrong with this code?

I am writing a simple program which accepts a string of any length from the user and just displays it. But my code is not working correctly: it accepts the string but does not print it correctly.
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
main()
{
int i,len;
static int n=5;
char a[20];
char **s;
s=malloc(5*sizeof(char));
char *p;
for(i=0;i<n;i++)
{
scanf("%s",a);
if(*a=='1') /*to exit from loop*/
{
break;
}
len=strlen(a);
p=malloc((len+1)*sizeof(char));
strcpy(p,a);
s[i]=p;
if(i==n-1)
{
s=realloc(s,(5+i*5)*sizeof(char));
n=5+i;
}
}
for(i=0;i<n-1;i++)
{
printf("%s ",s[i]);
}
free(p);
p=NULL;
return 0;
}
There are multiple issues, but at first look, the most prominent one is,
s=malloc(5*sizeof(char));
is wrong. s is of type char **, so you'd need to allocate memory worth of char * there. In other words, you expect s to point to a char * element, so, you need to allocate memory accordingly.
To avoid this sort of mistake, never rely on hardcoded data types; rather, use the form
s = malloc( 5 * sizeof *s); // same as s=malloc( 5 * sizeof (*s))
where the size is essentially determined from the type of the variable. This has two advantages:
You avoid mistakes like the one above.
The code becomes more resilient: you don't need to change the malloc() statement if you choose to change the data type.
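As a sketch of how the same idiom carries over to realloc(), here is one way to grow an array of char * in steps of 5, which is roughly what the question's loop attempts. The struct word_list type and word_list_append function are hypothetical names introduced for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* A growable array of strings; start with { NULL, 0, 0 }. */
struct word_list {
    char **items;
    size_t count;
    size_t capacity;
};

/* Appends a copy of 'word', growing the pointer array in steps of 5.
 * Returns 0 on success, -1 on allocation failure. */
int word_list_append(struct word_list *list, const char *word)
{
    if (list->count == list->capacity) {
        size_t new_capacity = list->capacity + 5;
        /* sizeof *list->items is the size of one char *, not of one char */
        char **bigger = realloc(list->items,
                                new_capacity * sizeof *list->items);
        if (bigger == NULL)
            return -1;
        list->items = bigger;
        list->capacity = new_capacity;
    }

    size_t len = strlen(word);
    char *copy = malloc((len + 1) * sizeof *copy);
    if (copy == NULL)
        return -1;
    memcpy(copy, word, len + 1);
    list->items[list->count++] = copy;
    return 0;
}
```

Every stored string, and finally the items array itself, still has to be passed to free() by the caller.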
That said, scanf("%s",a); is also potentially dangerous and can cause a buffer overflow when the input is longer than expected. You should always limit the input scanning length, using the maximum field width, like
scanf("%19s",a); // a is array of dimension 20, one for terminating null
That said, to advise about the logic: when you don't know or don't dictate the length of the input string beforehand, you cannot scan the input into a fixed-size array. The basic way of getting this done would be
1. Allocate a moderate-length buffer dynamically, using an allocator function like malloc().
2. Keep reading the input stream one character at a time, using fgetc() or similar.
3. If the read is complete (for example, EOF is returned), you've read the complete input.
4. If the allocated memory has run out, re-allocate the original buffer and continue from step 2.
and don't forget to free() the memory.
Otherwise, you may use fgets() to read the input in chunks and keep reallocating as mentioned above.

Using Malloc for i endless C -String

I was wondering: is it possible to create one endless array which can store endlessly long strings?
What I mean exactly is that I want to create a function which gets i strings of length n. I want to input infinitely many strings into the program, each of which can be infinitely many characters long!
void endless(int i){
//store user input on char array i times
}
To achieve that I need malloc, which I would normally use like this:
string = malloc(sizeof(char));
But how would that work for, let's say, 5 or 10 arrays, or even an endless stream of arrays? Or is this not possible?
Edit:
I do know memory is not endless; what I mean is, if it were infinite, how would you try to achieve it? Or maybe just allocate memory until all memory is used?
Edit 2:
So I played around a little and this came out:
void endless (char* array[], int numbersOfArrays){
int j;
//allocate memory
for (j = 0; j < numbersOfArrays; j++){
array[j] = (char *) malloc(1024*1024*1024);
}
//scan strings
for (j = 0; j < numbersOfArrays; j++){
scanf("%s",array[j]);
array[j] = realloc(array[j],strlen(array[j]+1));
}
//print stringd
for (j = 0; j < numbersOfArrays; j++){
printf("%s\n",array[j]);
}
}
However, this isn't working. Maybe I got the realloc part terribly wrong?
The memory is not infinite, thus you cannot.
I mean the physical memory in a computer has its limits.
malloc() will fail and allocate no memory when your program requests too much memory:
If the function failed to allocate the requested block of memory, a null pointer is returned.
Assuming that memory is infinite, then I would create an SxN 2D array, where S is the number of strings and N the longest length of the strings you got, but obviously there are many ways to do this! ;)
Another way would be to have a simple linked list (I have one in List (C) if you need one), where every node would have a char pointer and that pointer would eventually host a string.
You can define a maximum length that you assume none of your strings will exceed. Otherwise, you could allocate a huge 1D char array to hold the new string, use strlen() to find the actual length of the string, and then dynamically allocate an array of exactly the size that is needed: that length + 1 for the null terminator.
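The scratch-buffer idea can be sketched as follows. The 1023-character bound, the MAX_WORD constant, and the function name read_word_exact are assumptions for illustration. Note that the size is strlen(s) + 1; the question's strlen(array[j]+1) instead measures the string starting at its second character, so it comes out two bytes short of what is needed.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORD 1023 /* assumed upper bound on one input word */

/* Reads one whitespace-delimited word from 'in' into a scratch buffer,
 * then returns a copy that occupies exactly strlen + 1 bytes.
 * Returns NULL at end of input or if malloc() fails. */
char *read_word_exact(FILE *in)
{
    char scratch[MAX_WORD + 1];

    /* The field width must match MAX_WORD so scanf cannot overflow. */
    if (fscanf(in, "%1023s", scratch) != 1)
        return NULL;

    size_t size = strlen(scratch) + 1; /* strlen(s) + 1, not strlen(s + 1) */
    char *exact = malloc(size);
    if (exact != NULL)
        memcpy(exact, scratch, size);
    return exact;
}
```

This keeps the large buffer on the stack for the duration of one call only, and each stored string takes just the space it needs.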
Here is a toy example program that asks the user to enter some strings. Memory is allocated for the strings in the get_string() function, then pointers to the strings are added to an array in the add_string() function, which also allocates memory for array storage. You can add as many strings of arbitrary length as you want, until your computer runs out of memory, at which point you will probably segfault because there are no checks on whether the memory allocations are successful. But that would take an awful lot of typing.
I guess the important point here is that there are two allocation steps: one for the strings and one for the array that stores the pointers to the strings. If you add a string literal to the storage array, you don't need to allocate for it. But if you add a string that is unknown at compile time (like user input), then you have to dynamically allocate memory for it.
Edit:
If anyone tried to run the original code listed below, they might have encountered some bizarre behavior for long strings. Specifically, they could be truncated and terminated with a mystery character. This was a result of the fact that the original code did not handle the input of an empty line properly. I did test it for a very long string, and it seemed to work. I think that I just got "lucky." Also, there was a tiny (1 byte) memory leak. It turned out that I forgot to free the memory pointed to from newstring, which held a single '\0' character upon exit. Thanks, Valgrind!
This all could have been avoided from the start if I had passed a NULL back from the get_string() function instead of an empty string to indicate an empty line of input. Lesson learned? The source code below has been fixed, NULL now indicates an empty line of input, and all is well.
#include <stdio.h>
#include <stdlib.h>
char * get_string(void);
char ** add_string(char *str, char **arr, int num_strings);
int main(void)
{
char *newstring;
char **string_storage;
int i, num = 0;
string_storage = NULL;
puts("Enter some strings (empty line to quit):");
while ((newstring = get_string()) != NULL) {
string_storage = add_string(newstring, string_storage, num);
++num;
}
puts("You entered:");
for (i = 0; i < num; i++)
puts(string_storage[i]);
/* Free allocated memory */
for (i = 0; i < num; i++)
free(string_storage[i]);
free(string_storage);
return 0;
}
char * get_string(void)
{
int ch; /* int, not char, so that EOF can be detected */
int num = 0;
char *newstring;
newstring = NULL;
while ((ch = getchar()) != '\n' && ch != EOF) {
++num;
newstring = realloc(newstring, (num + 1) * sizeof(char));
newstring[num - 1] = ch;
}
if (num > 0)
newstring[num] = '\0';
return newstring;
}
char ** add_string(char *str, char **arr, int num_strings)
{
++num_strings;
arr = realloc(arr, num_strings * (sizeof(char *)));
arr[num_strings - 1] = str;
return arr;
}
I was wondering is it possible to create one endless array which can store endlessly long strings?
The memory can't be infinite. So, the answer is NO. Even if you have very large memory, you will need a processor that can address that huge memory space. There is a limit on the amount of dynamic memory that can be allocated by malloc and on the amount of static memory (allocated at compile time). The malloc function call will return NULL if the heap has no suitable block of the requested size.
Assuming that the memory space available to you is very large relative to the space required by your input strings, so that you never run out of memory, you can store your input strings using a 2-dimensional array.
C does not really have multi-dimensional arrays, but there are several ways to simulate them. You can use a (dynamically allocated) array of pointers to (dynamically allocated) arrays. This is used mostly when the array bounds are not known until runtime. OR
You can also allocate a global two dimensional array of sufficient length and width. The static allocation for storing random size input strings is not a good idea. Most of the memory space will be unused.
Also, C programming language doesn't have string data type. You can simulate a string using a null terminated array of characters. So, to dynamically allocate a character array in C, we should use malloc like shown below:
char *cstr = malloc((MAX_CHARACTERS + 1)*sizeof(char));
Here, MAX_CHARACTERS represents the maximum number of characters that can be stored in your cstr array. The +1 is added to allocate a space for null character if MAX_CHARACTERS are stored in your string.

How to put a char into a empty pointer of a string in pure C

I want to store a single char into a char array pointer and that action is in a while loop, adding in a new char every time. I strictly want to be into a variable and not printed because I am going to compare the text. Here's my code:
#include <stdio.h>
#include <string.h>
int main()
{
char c;
char *string;
while((c=getchar())!= EOF) //gets the next char in stdin and checks if stdin is not EOF.
{
char temp[2]; // I was trying to convert c, a char, to temp, a const char, so that I can use strcat to concatenate them to string, but printf returns nothing.
temp[0]=c; //assigns temp
temp[1]='\0'; //null end point
strcat(string,temp); //concatenates the strings
}
printf(string); //prints out the string.
return 0;
}
I am using GCC on Debian (POSIX/UNIX operating system) and want to have Windows compatibility.
EDIT:
I notice some communication errors about what I actually intend to do, so I will explain: I want to create a system where I can input an unlimited number of characters, have that input stored in a variable, and have it read back to me from the variable. To get around using realloc and malloc, I made it get the next available char until EOF. Keep in mind that I am a beginner in C (though most of you have probably guessed that already) and haven't had a lot of experience with memory management.
If you want unlimited amount of character input, you'll need to actively manage the size of your buffer. Which is not as hard as it sounds.
first use malloc to allocate, say, 1000 bytes.
read until this runs out.
use realloc to allocate 2000
read until this runs out.
like this:
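#include <stdio.h>
#include <stdlib.h>
int main(){
    int buf_size=1000;
    char* buf=malloc(buf_size);
    int c; //int, not char, so that EOF can be detected
    int n=0;
    if(buf==NULL)
        return 1;
    while((c=getchar())!= EOF)
    {
        buf[n++] = c;
        if(n>=buf_size-1) //keep one byte free for the trailing 0
        {
            buf_size+=1000;
            char* tmp=realloc(buf, buf_size);
            if(tmp==NULL)
            {
                free(buf);
                return 1;
            }
            buf=tmp;
        }
    }
    buf[n] = '\0'; //add trailing 0 at the end, to make it a proper string
    //do stuff with buf;
    free(buf);
    return 0;
}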
You won't get around using malloc-oids if you want unlimited input.
You have undefined behavior.
You never set string to point anywhere, so you can't dereference that pointer.
You need something like:
char buf[1024] = "", *string = buf;
that initializes string to point to valid memory where you can write, and also sets that memory to an empty string so you can use strcat().
Note that looping strcat() like this is very inefficient, since it needs to find the end of the destination string on each call. It's better to just use pointers.
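A small sketch of the pointer-based alternative to repeated strcat(), assuming a fixed-size caller-supplied buffer as in the snippet above. The function name read_into and the FILE * parameter are illustrative choices, not part of the answer.

```c
#include <stdio.h>

/* Reads at most 'size - 1' characters from 'in' into 'buf' ('size' must be
 * at least 1), keeping a pointer to the next free position instead of
 * calling strcat() for every character. Returns the number stored. */
size_t read_into(FILE *in, char *buf, size_t size)
{
    char *end = buf;              /* next free position */
    char *limit = buf + size - 1; /* leave room for '\0' */
    int c;

    while (end < limit && (c = fgetc(in)) != EOF)
        *end++ = (char)c;
    *end = '\0';
    return (size_t)(end - buf);
}
```

Each character is written in constant time, whereas strcat() has to rescan the destination from the start on every call.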
char *string;
You've declared an uninitialised variable with this statement. With some compilers, in debug this may be initialised to 0. In other compilers and a release build, you have no idea what this is pointing to in memory. You may find that when you build and run in release, your program will crash, but appears to be ok in debug. The actual behaviour is undefined.
You need to either create a variable on the stack by doing something like this
char string[100]; // assuming you're not going to receive more than 99 characters (100 including the NULL terminator)
Or, on the heap: -
char *string = malloc(100);
In which case you'll need to free the character array when you're finished with it.
Assuming you don't know how many characters the user will type, I suggest you keep track in your loop, to ensure you don't try to concatenate beyond the memory you've allocated.
Alternatively, you could limit the number of characters that a user may enter.
const int MAX_CHARS = 100;
char string[MAX_CHARS + 1]; // +1 for Null terminator
int numChars = 0;
while((numChars < MAX_CHARS) && ((c=getchar()) != EOF))
{
...
++numChars;
}
As I wrote in comments, you cannot avoid malloc() / calloc() and probably realloc() for a problem such as you have described, where your program does not know until run time how much memory it will need, and must not have any predetermined limit. In addition to the memory management issues on which most of the discussion and answers have focused, however, your code has some additional issues, including:
getchar() returns type int, and to correctly handle all possible inputs you must not convert that int to char before testing against EOF. In fact, for maximum portability you need to take considerable care in converting to char, for if default char is signed, or if its representation has certain other allowed (but rare) properties, then the value returned by getchar() may exceed its maximum value, in which case direct conversion yields an implementation-defined result (or raises an implementation-defined signal). (In truth, though, this issue is often ignored, usually to no ill effect in practice.)
Never pass a user-provided string to printf() as the format string. It will not do what you want for some inputs, and it can be exploited as a security vulnerability. If you want to just print a string verbatim then fputs(string, stdout) is a better choice, but you can also safely do printf("%s", string).
Here's a way to approach your problem that addresses all of these issues:
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#define INITIAL_BUFFER_SIZE 1024
int main()
{
char *string = malloc(INITIAL_BUFFER_SIZE);
size_t cap = INITIAL_BUFFER_SIZE;
size_t next = 0;
int c;
if (!string) {
// allocation error
return 1;
}
while ((c = getchar()) != EOF) {
if (next + 1 >= cap) {
/* insufficient space for another character plus a terminator */
cap *= 2;
string = realloc(string, cap);
if (!string) {
/* memory reallocation failure */
/* memory was leaked, but it's ok because we're about to exit */
return 1;
}
}
#if (CHAR_MAX != UCHAR_MAX)
/* char is signed; ensure defined behavior for the upcoming conversion */
if (c > CHAR_MAX) {
c -= UCHAR_MAX + 1; /* map UCHAR_MAX to -1, as two's complement would */
#if ((CHAR_MAX != (UCHAR_MAX >> 1)) || (CHAR_MAX == (-1 * CHAR_MIN)))
/* char's representation has more padding bits than unsigned
char's, or it is represented as sign/magnitude or ones' complement */
if (c < CHAR_MIN) {
/* not representable as a char */
return 1;
}
#endif
}
#endif
string[next++] = (char) c;
}
string[next] = '\0';
fputs(string, stdout);
return 0;
}

best practice for returning a variable length string in c

I have a string function that accepts a pointer to a source string and returns a pointer to a destination string. This function currently works, but I'm worried I'm not following the best practice regarding malloc, realloc, and free.
The thing that's different about my function is that the length of the destination string is not the same as the source string, so realloc() has to be called inside my function. I know from looking at the docs...
http://www.cplusplus.com/reference/cstdlib/realloc/
that the memory address might change after the realloc. This means I can't "pass by reference" like a C programmer might for other functions; I have to return the new pointer.
So the prototype for my function is:
//decode a uri encoded string
char *net_uri_to_text(char *);
I don't like the way I'm doing it because I have to free the pointer after running the function:
char * chr_output = net_uri_to_text("testing123%5a%5b%5cabc");
printf("%s\n", chr_output); //testing123Z[\abc
free(chr_output);
Which means that malloc() and realloc() are called inside my function and free() is called outside my function.
I have a background in high level languages, (perl, plpgsql, bash) so my instinct is proper encapsulation of such things, but that might not be the best practice in C.
The question: Is my way best practice, or is there a better way I should follow?
full example
Compiles and runs with two warnings on unused argc and argv arguments, you can safely ignore those two warnings.
example.c:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
char *net_uri_to_text(char *);
int main(int argc, char ** argv) {
char * chr_input = "testing123%5a%5b%5cabc";
char * chr_output = net_uri_to_text(chr_input);
printf("%s\n", chr_output);
free(chr_output);
return 0;
}
//decodes uri-encoded string
//send pointer to source string
//return pointer to destination string
//WARNING!! YOU MUST USE free(chr_result) AFTER YOU'RE DONE WITH IT OR YOU WILL GET A MEMORY LEAK!
char *net_uri_to_text(char * chr_input) {
//define variables
int int_length = strlen(chr_input);
int int_new_length = int_length;
char * chr_output = malloc(int_length);
char * chr_output_working = chr_output;
char * chr_input_working = chr_input;
int int_output_working = 0;
unsigned int uint_hex_working;
//while not a null byte
while(*chr_input_working != '\0') {
//if %
if (*chr_input_working == *"%") {
//then put correct char in
sscanf(chr_input_working + 1, "%02x", &uint_hex_working);
*chr_output_working = (char)uint_hex_working;
//printf("special char:%c, %c, %d<\n", *chr_output_working, (char)uint_hex_working, uint_hex_working);
//realloc
chr_input_working++;
chr_input_working++;
int_new_length -= 2;
chr_output = realloc(chr_output, int_new_length);
//output working must be the new pointer plys how many chars we've done
chr_output_working = chr_output + int_output_working;
} else {
//put char in
*chr_output_working = *chr_input_working;
}
//increment pointers and number of chars in output working
chr_input_working++;
chr_output_working++;
int_output_working++;
}
//last null byte
*chr_output_working = '\0';
return chr_output;
}
It's perfectly ok to return malloc'd buffers from functions in C, as long as you document the fact that they do. Lots of libraries do that, even though no function in the standard library does.
If you can compute (a not too pessimistic upper bound on) the number of characters that need to be written to the buffer cheaply, you can offer a function that does that and let the user call it.
It's also possible, but much less convenient, to accept a buffer to be filled in; I've seen quite a few libraries that do that like so:
/*
* Decodes the uri-encoded string encoded into buf of length len (including NUL).
* Returns the number of characters needed. If that number is greater than len,
* the output did not fit and you should try again with a larger buffer.
*/
size_t net_uri_to_text(char const *encoded, char *buf, size_t len)
{
size_t space_needed = 0;
while (decoding_needs_to_be_done()) {
// decode characters, but only write them to buf
// if it wouldn't overflow;
// increment space_needed regardless
}
return space_needed;
}
Now the caller is responsible for the allocation, and would do something like
size_t len = SOME_VALUE_THAT_IS_USUALLY_LONG_ENOUGH;
char *result = xmalloc(len);
len = net_uri_to_text(input, result, len);
if (len > SOME_VALUE_THAT_IS_USUALLY_LONG_ENOUGH) {
// try again with a buffer of the size that was reported
result = xrealloc(result, len);
len = net_uri_to_text(input, result, len);
}
(Here, xmalloc and xrealloc are "safe" allocating functions that I made up to skip NULL checks.)
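To make the caller-supplied-buffer interface concrete, here is one possible sketch in the style of snprintf. The name uri_decode_into is hypothetical, the function truncates rather than writing nothing when the buffer is too small, and it assumes that any '%' followed by two characters is a valid hex escape.

```c
#include <stddef.h>
#include <stdlib.h>

/* Decodes 'encoded' into 'buf' (capacity 'len' bytes, including the NUL).
 * Returns the number of bytes required, including the NUL. Output that
 * does not fit is dropped, so a return value greater than 'len' means the
 * result was truncated. Assumes every "%xx" escape holds two hex digits. */
size_t uri_decode_into(const char *encoded, char *buf, size_t len)
{
    size_t needed = 0;

    while (*encoded != '\0') {
        char ch = *encoded++;
        if (ch == '%' && encoded[0] != '\0' && encoded[1] != '\0') {
            char hex[3] = { encoded[0], encoded[1], '\0' };
            ch = (char)strtoul(hex, NULL, 16);
            encoded += 2;
        }
        if (needed + 1 < len) /* keep one byte for the NUL */
            buf[needed] = ch;
        needed++;
    }
    if (len > 0)
        buf[needed < len ? needed : len - 1] = '\0';
    return needed + 1;
}
```

Calling it with a zero length, as in uri_decode_into(input, NULL, 0), only measures the required size, so the caller can allocate exactly once and call again.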
The thing is that C is low-level enough to force the programmer to get her memory management right. In particular, there's nothing wrong with returning a malloc()ated string. It's a common idiom to return malloc()ated objects and have the caller free() them.
And anyways, if you don't like this approach, you can always take a pointer to the string and modify it from inside the function (after the last use, it will still need to be free()d, though).
One thing, however, that I don't think is necessary is explicitly shrinking the string. If the new string is shorter than the old one, there's obviously enough room for it in the memory chunk of the old string, so you don't need to realloc().
(Apart from the fact that you forgot to allocate one extra byte for the terminating NUL character, of course...)
And, as always, you can just return a different pointer each time the function is called, and you don't even need to call realloc() at all.
If you accept one last piece of good advice: it's advisable to const-qualify your input strings, so the caller can ensure that you don't modify them. Using this approach, you can safely call the function on string literals, for example.
All in all, I'd rewrite your function like this:
char *unescape(const char *s)
{
size_t l = strlen(s);
char *p = malloc(l + 1), *r = p;
while (*s) {
if (*s == '%' && s[1] && s[2]) { // a '%' must be followed by two more characters
char buf[3] = { s[1], s[2], 0 };
*p++ = strtol(buf, NULL, 16); // yes, I prefer this over scanf()
s += 3;
} else {
*p++ = *s++;
}
}
*p = 0;
return r;
}
And call it as follows:
int main()
{
const char *in = "testing123%5a%5b%5cabc";
char *out = unescape(in);
printf("%s\n", out);
free(out);
return 0;
}
It's perfectly OK to return newly-malloc-ed (and possibly internally realloced) values from functions, you just need to document that you are doing so (as you do here).
Other obvious items:
Instead of int int_length you might want to use size_t. This is "an unsigned type" (usually unsigned int or unsigned long) that is the appropriate type for lengths of strings and arguments to malloc.
You need to allocate n+1 bytes initially, where n is the length of the string, as strlen does not include the terminating 0 byte.
You should check for malloc failing (returning NULL). If your function will pass the failure on, document that in the function-description comment.
sscanf is pretty heavy-weight for converting the two hex bytes. Not wrong, except that you're not checking whether the conversion succeeds (what if the input is malformed? you can of course decide that this is the caller's problem but in general you might want to handle that). You can use isxdigit from <ctype.h> to check for hexadecimal digits, and/or strtoul to do the conversion.
Rather than doing one realloc for every % conversion, you might want to do a final "shrink realloc" if desirable. Note that if you allocate (say) 50 bytes for a string and find it requires only 49 including the final 0 byte, it may not be worth doing a realloc after all.
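Putting several of these points together (size_t, n+1 bytes, a malloc check, and isxdigit plus strtoul instead of sscanf), a decoder might be sketched as below. The name uri_decode and the decision to return NULL on malformed input are assumptions; other error conventions would work equally well.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc()ed, decoded copy of 'encoded' (caller frees).
 * Returns NULL if memory runs out or a '%' is not followed by two
 * hexadecimal digits. 'uri_decode' is a hypothetical name. */
char *uri_decode(const char *encoded)
{
    size_t length = strlen(encoded);
    char *decoded = malloc(length + 1); /* decoding never lengthens the text */
    char *out = decoded;

    if (decoded == NULL)
        return NULL;

    while (*encoded != '\0') {
        if (*encoded == '%') {
            /* The second test is skipped when the first digit is missing,
             * so this never reads past the terminating NUL. */
            if (!isxdigit((unsigned char)encoded[1]) ||
                !isxdigit((unsigned char)encoded[2])) {
                free(decoded);
                return NULL; /* malformed escape */
            }
            char hex[3] = { encoded[1], encoded[2], '\0' };
            *out++ = (char)strtoul(hex, NULL, 16);
            encoded += 3;
        } else {
            *out++ = *encoded++;
        }
    }
    *out = '\0';
    return decoded;
}
```

The buffer is sized once from the input length, so no realloc is needed inside the loop; a final shrinking realloc could be added if the saved bytes matter.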
I would approach the problem in a slightly different way. Personally, I would split your function in two. The first function to calculate the size you need to malloc. The second would write the output string to the given pointer (which has been allocated outside of the function). That saves several calls to realloc, and will keep the complexity the same. A possible function to find the size of the new string is:
int getNewSize (char *string) {
char *i = string;
int size = 0, percent = 0;
for (; *i != '\0'; i++, size++) {
if (*i == '%')
percent++;
}
return size - percent * 2;
}
However, as mentioned in other answers there is no problem in returning a malloc'ed buffer as long as you document it!
In addition to what was already mentioned in the other posts, you should also document the fact that the string is reallocated. If your code is called with a static string or a string allocated with alloca, you may not reallocate it.
I think you are right to be concerned about splitting up mallocs and frees. As a rule, whatever makes it, owns it and should free it.
In this case, where the strings are relatively small, one good procedure is to make the string buffer larger than any possible string it could contain. For example, URLs have a de facto limit of about 2000 characters, so if you malloc 10000 characters you can store any possible URL.
Another trick is to store both the length and capacity of the string at its front, so that *(int *)mystring == length of string and *(int *)(mystring + 4) == capacity of string (assuming a 4-byte int). Thus, the string itself only starts at the 8th position, mystring + 8. By doing this you can pass around a single pointer to a string and always know how long it is and how much memory capacity the string has. You can make macros that automatically generate these offsets and make "pretty code".
The value of using buffers this way is you do not need to do a reallocation. The new value overwrites the old value and you update the length at the beginning of the string.
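A sketch of this length-and-capacity prefix, written with a header struct rather than raw integer offsets so that alignment and field sizes are left to the compiler. It is a variation on the idea: the pointer handed around points at the characters, with the header stored just before it. All of the pstr_ names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Header stored directly in front of the characters. */
struct str_header {
    size_t length;   /* characters in use, not counting '\0' */
    size_t capacity; /* bytes available for characters, including '\0' */
};

static struct str_header *pstr_header(char *text)
{
    return (struct str_header *)text - 1;
}

/* Allocates header + 'capacity' bytes and returns a pointer to the
 * character data; returns NULL if capacity is 0 or malloc() fails. */
char *pstr_new(size_t capacity)
{
    struct str_header *h;

    if (capacity == 0)
        return NULL;
    h = malloc(sizeof *h + capacity);
    if (h == NULL)
        return NULL;
    h->length = 0;
    h->capacity = capacity;
    char *text = (char *)(h + 1);
    text[0] = '\0';
    return text;
}

size_t pstr_length(char *text)   { return pstr_header(text)->length; }
size_t pstr_capacity(char *text) { return pstr_header(text)->capacity; }

/* Overwrites the contents in place; returns -1 if 'value' does not fit. */
int pstr_set(char *text, const char *value)
{
    size_t n = strlen(value);

    if (n + 1 > pstr_capacity(text))
        return -1;
    memcpy(text, value, n + 1);
    pstr_header(text)->length = n;
    return 0;
}

void pstr_free(char *text) { free(pstr_header(text)); }
```

Because the returned pointer addresses ordinary null-terminated characters, it can still be passed to functions such as printf("%s", ...), but it must be released with pstr_free() rather than free().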
