I need a function/method that will take in a char array and set it to a string read from stdin. It needs to return the last character read as its return type, so I can determine if it reached the end of a line or the end of file marker.
Here is what I have so far; I loosely based it on code from here.
UPDATE: I changed it, but now it just crashes upon hitting enter after text. I know this way is inefficient, and char is not the best for EOF check, but for now I am just trying to get it to return the string. I need it to do it in this fashion and no other fashion. I need the string to be the exact length of the line, and to return a value that is either the newline or EOF int which I believe can still be used in a char value.
This program is in C, not C++.
char getLine(char **line);
int main(int argc, char *argv[])
{
char *line;
char returnVal = 0;
returnVal = getLine(&line);
printf("%s", line);
free(line);
system("pause");
return 0;
}
char getLine(char **line) {
unsigned int lengthAdder = 1, counter = 0, size = 0;
char charRead = 0;
*line = malloc(lengthAdder);
while((charRead = getc(stdin)) != EOF && charRead != '\n')
{
*line[counter++] = charRead;
*line = realloc(*line, counter);
}
*line[counter] = '\0';
return charRead;
}
Thank you for any help in advance!
You're assigning the result of malloc() to a local copy of line, so the caller's pointer is still unmodified after getLine() returns (even though you expect otherwise). You have to either return the pointer (instead of using an output parameter) or pass its address (pass it 'by reference'):
void getLine(char **line)
{
*line = malloc(length);
// etc.
}
and call it like this:
char *line;
getLine(&line);
Your key problem is that the value of the line pointer does not propagate out of the getLine() function. The solution is to pass a pointer to the line pointer as the parameter instead: call it as getLine(&line); and define the function as taking char **line. Inside the function, everywhere you currently work with line, you work with *line instead, i.e. you dereference the pointer-to-pointer and operate on the variable in main() that it points to. Hope this is not too confusing. :-) Try to draw it on a piece of paper.
(A tricky part - you must change line[counter] to (*line)[counter] because you first need to dereference the pointer to the string, and only then to access a specific character in the string.)
There are a couple of other problems with your code:
You use char as the type for charRead. However, the EOF constant cannot be reliably represented in a char; you need to use int, both as the type of charRead and as the return type of getLine(), so that you can actually distinguish between a newline and end of file.
You forgot to return the last char read from your getLine() function. :-)
You are reallocating the buffer after each character addition. This is not terribly efficient and therefore is a rather ugly programming practice. It is not too difficult to use another variable to track the amount of space allocated and then (i) start with allocating a reasonable chunk of memory, e.g. 64 bytes, so that ideally you will never reallocate (ii) enlarge the allocation only if you need to based on comparing the counter and your allocation size tracker. Two reallocation strategies are common - either doubling the size of the allocation or increasing the allocation by a fixed step.
The way you use realloc is not correct. If it returns NULL, the only pointer to the original memory block is overwritten and the block is leaked.
It is better to use realloc in this way:
char *tmp;
...
tmp = realloc(line, counter);
if (tmp == NULL) {
    /* handle the error here; line still points to the old, valid block */
} else {
    line = tmp;
}
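Putting the points from the answers above together, here is a minimal sketch of how such a function could look. It is only an illustration: the name get_line, the FILE * parameter (so it can read from stdin or any other stream), and the initial capacity of 64 are my assumptions, not part of the original code.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Reads one line from `in` into a freshly allocated string stored in *line.
 * Returns the last value read ('\n' or EOF). If allocation fails, returns
 * EOF and sets *line to NULL. Name and signature are illustrative only.
 */
int get_line(FILE *in, char **line)
{
    size_t cap = 64;            /* initial chunk, as suggested above */
    size_t len = 0;
    int c;                      /* int, so EOF is distinguishable */
    char *buf = malloc(cap);
    char *tmp;

    *line = NULL;
    if (buf == NULL)
        return EOF;

    while ((c = getc(in)) != EOF && c != '\n') {
        if (len + 1 >= cap) {   /* keep room for the terminator */
            tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);      /* the old block is still ours to free */
                return EOF;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    buf[len] = '\0';

    /* Shrink to the exact length, since the question asks for that;
       if shrinking fails, the larger block is still valid. */
    tmp = realloc(buf, len + 1);
    if (tmp != NULL)
        buf = tmp;

    *line = buf;
    return c;
}
```

The caller would use it as `int last = get_line(stdin, &line);`, check line for NULL, and free(line) when done. Doubling the capacity keeps the number of realloc calls logarithmic in the line length.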
I want to store a single char into a char array pointer, and that action is in a while loop, adding a new char every time. I strictly want the text to end up in a variable and not printed, because I am going to compare it. Here's my code:
#include <stdio.h>
#include <string.h>
int main()
{
char c;
char *string;
while((c=getchar())!= EOF) //gets the next char in stdin and checks if stdin is not EOF.
{
char temp[2]; // I was trying to convert c, a char to temp, a const char so that I can use strcat to concernate them to string but printf returns nothing.
temp[0]=c; //assigns temp
temp[1]='\0'; //null end point
strcat(string,temp); //concernates the strings
}
printf(string); //prints out the string.
return 0;
}
I am using GCC on Debian (a POSIX/UNIX operating system) and want to have Windows compatibility.
EDIT:
I noticed some miscommunication about what I actually intend to do, so I will explain: I want to create a system where I can input an unlimited number of characters, have that input stored in a variable, and have it read back to me from that variable. To get around using realloc and malloc, I made it read the next available char until EOF. Keep in mind that I am a beginner to C (though most of you have probably guessed that already) and haven't had a lot of experience with memory management.
If you want unlimited amount of character input, you'll need to actively manage the size of your buffer. Which is not as hard as it sounds.
first use malloc to allocate, say, 1000 bytes.
read until this runs out.
use realloc to allocate 2000
read until this runs out.
like this:
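#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    size_t buf_size = 1000;
    char *buf = malloc(buf_size);
    int c;                          /* int, so EOF can be told apart from data */
    size_t n = 0;
    if (buf == NULL)
        return 1;
    while ((c = getchar()) != EOF) {
        buf[n++] = (char)c;
        if (n >= buf_size - 1) {    /* keep one byte for the terminator */
            char *tmp = realloc(buf, buf_size + 1000);
            if (tmp == NULL) {
                free(buf);
                return 1;
            }
            buf = tmp;
            buf_size += 1000;
        }
    }
    buf[n] = '\0'; //add trailing 0 at the end, to make it a proper string
    //do stuff with buf;
    free(buf);
    return 0;
}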
You won't get around using malloc-oids if you want unlimited input.
You have undefined behavior.
You never set string to point anywhere, so you can't dereference that pointer.
You need something like:
char buf[1024] = "", *string = buf;
that initializes string to point to valid memory where you can write, and also sets that memory to an empty string so you can use strcat().
Note that looping strcat() like this is very inefficient, since it needs to find the end of the destination string on each call. It's better to just use pointers.
char *string;
You've declared an uninitialised variable with this statement. With some compilers, in debug this may be initialised to 0. In other compilers and a release build, you have no idea what this is pointing to in memory. You may find that when you build and run in release, your program will crash, but appears to be ok in debug. The actual behaviour is undefined.
You need to either create a variable on the stack by doing something like this
char string[100]; // assuming you're not going to receive more than 99 characters (100 including the NULL terminator)
Or, on the heap: -
char *string = malloc(100);
In which case you'll need to free the character array when you're finished with it.
Assuming you don't know how many characters the user will type, I suggest you keep track in your loop, to ensure you don't try to concatenate beyond the memory you've allocated.
Alternatively, you could limit the number of characters that a user may enter.
const int MAX_CHARS = 100;
char string[MAX_CHARS + 1]; // +1 for null terminator
int numChars = 0;
int c; // int, not char, so EOF can be detected
while ((numChars < MAX_CHARS) && ((c = getchar()) != EOF))
{
    ...
    ++numChars;
}
As I wrote in comments, you cannot avoid malloc() / calloc() and probably realloc() for a problem such as you have described, where your program does not know until run time how much memory it will need, and must not have any predetermined limit. In addition to the memory management issues on which most of the discussion and answers have focused, however, your code has some additional issues, including:
getchar() returns type int, and to correctly handle all possible inputs you must not convert that int to char before testing against EOF. In fact, for maximum portability you need to take considerable care in converting to char: if plain char is signed, or if its representation has certain other allowed (but rare) properties, then the value returned by getchar() may exceed CHAR_MAX, in which case the result of a direct conversion is implementation-defined (or an implementation-defined signal is raised). (In truth, though, this issue is often ignored, usually to no ill effect in practice.)
Never pass a user-provided string to printf() as the format string. It will not do what you want for some inputs, and it can be exploited as a security vulnerability. If you want to just print a string verbatim then fputs(string, stdout) is a better choice, but you can also safely do printf("%s", string).
Here's a way to approach your problem that addresses all of these issues:
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#define INITIAL_BUFFER_SIZE 1024
int main(void)
{
    char *string = malloc(INITIAL_BUFFER_SIZE);
    size_t cap = INITIAL_BUFFER_SIZE;
    size_t next = 0;
    int c;
    if (!string) {
        // allocation error
        return 1;
    }
    while ((c = getchar()) != EOF) {
        if (next + 1 >= cap) {
            /* insufficient space for another character plus a terminator */
            char *tmp = realloc(string, cap * 2);
            if (!tmp) {
                /* memory reallocation failure; release the old block */
                free(string);
                return 1;
            }
            string = tmp;
            cap *= 2;
        }
#if (CHAR_MAX != UCHAR_MAX)
        /* char is signed; ensure a well-defined result for the upcoming conversion */
        if (c > CHAR_MAX) {
            c -= UCHAR_MAX + 1;
#if ((CHAR_MAX != (UCHAR_MAX >> 1)) || (CHAR_MAX == (-1 * CHAR_MIN)))
            /* char's representation has more padding bits than unsigned
               char's, or it is represented as sign/magnitude or ones' complement */
            if (c < CHAR_MIN) {
                /* not representable as a char */
                free(string);
                return 1;
            }
#endif
        }
#endif
        string[next++] = (char) c;
    }
    string[next] = '\0';
    fputs(string, stdout);
    free(string);
    return 0;
}
I created a function designed to get user input. It requires that memory be allocated to the variable holding the user input; however, that variable is returned at the end of the function. What is the proper method to free the allocated memory/return the value of the variable?
Here is the code:
char *input = malloc(MAX_SIZE*sizeof(char*));
int i = 0;
char c;
while((c = getchar()) != '\n' && c != EOF) {
input[i++] = c;
}
return input;
Should I return the address of input and free it after it is used?
Curious as to the most proper method to free the input variable.
It's quite simple, as long as you pass to free() the same pointer returned by malloc() it's fine.
For example
#include <stdio.h>
#include <stdlib.h>
char *readInput(size_t size)
{
    char *input;
    int chr;
    size_t i = 0;
    input = malloc(size + 1);
    if (input == NULL)
        return NULL;
    while ((i < size) && ((chr = getchar()) != '\n') && (chr != EOF))
        input[i++] = (char)chr;
    input[i] = '\0'; /* null-terminate right after the last character read, so it can be a string */
    return input;
}
int main(void)
{
    char *input;
    input = readInput(100);
    if (input == NULL)
        return -1;
    printf("input: %s\n", input);
    /* now you can free it */
    free(input);
    return 0;
}
What you should never do is something like
free(input + n);
because input + n is not the pointer returned by malloc().
But your code has other issues you should take care of:
You are allocating space for MAX_SIZE chars, so you should multiply by sizeof(char), which is 1, instead of sizeof(char *), which allocates enough space for MAX_SIZE pointers. You could also make MAX_SIZE a function parameter. Alternatively, since you are allocating a fixed-size buffer anyway, you could define an array in main() with size MAX_SIZE, like char input[MAX_SIZE], and pass it to readInput() as a parameter, thus avoiding malloc() and free().
You are allocating that much space but you don't prevent overflow in your while loop; you should verify that i < MAX_SIZE.
You could write a function with return type char*, return input, and ask the user to call free once they're done with the data.
You could also ask the user to pass in a properly sized buffer themselves, together with a buffer size limit, and return how many characters were written to the buffer.
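A minimal sketch of that second option (caller-supplied buffer plus size limit) might look like the following. The function name read_line_into and the FILE * parameter are assumptions made for illustration; the question's code reads from stdin directly.

```c
#include <stdio.h>

/*
 * Reads one line from `in` into buf, whose capacity is `size` bytes
 * including the terminating '\0'. Stops at a newline (consumed but not
 * stored), at EOF, or when the buffer is full; in the last case the rest
 * of the line stays in the stream. Returns the number of characters
 * written, not counting the '\0'.
 */
size_t read_line_into(FILE *in, char *buf, size_t size)
{
    size_t n = 0;
    int c;                      /* int, so EOF can be detected */

    if (size == 0)
        return 0;               /* no room even for the terminator */
    while (n + 1 < size && (c = getc(in)) != EOF && c != '\n')
        buf[n++] = (char)c;
    buf[n] = '\0';
    return n;
}
```

With this shape the caller can write `char input[MAX_SIZE]; read_line_into(stdin, input, sizeof input);` and there is nothing to free afterwards.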
This is a classic C case. A function mallocs memory for its result, and the caller must free the returned value. You are now walking onto the thin ice of C memory leaks, for two reasons.
First: there is no way for you to communicate the free requirement in an enforceable way (i.e. the compiler or runtime can't help you; contrast this with specifying what the argument types are). You just have to document it somewhere and hope that the caller has read your docs.
Second: even if the caller knows to free the result, they might make a mistake, such as an error path that doesn't free the memory. This doesn't cause an immediate error and things seem to work, but after running for 3 weeks your app crashes because it has run out of memory.
This is why so many 'modern' languages focus on this topic: C++ smart pointers, garbage collection in Java and C#, etc.
I am trying to read an unknown length line from stdin using the C language.
I have seen this when looking on the net:
char** str;
gets(&str);
But it seems to cause me some problems and I don't really understand how it is possible to do it this way.
Can you explain to me why this example works or doesn't work,
and what will be the correct way to implement it (with malloc?)
You don't want a pointer to pointer to char, use an array of chars
char str[128];
or a pointer to char
char *str;
if you choose a pointer you need to reserve space using malloc
str = malloc(128);
Then you can use fgets
fgets(str, 128, stdin);
and remove the trailing newline
char *ptr = strchr(str, '\n');
if (ptr != NULL) *ptr = '\0';
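To read an arbitrarily long line, you can use getline (originally a GNU libc extension, standardized in POSIX.1-2008):
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
char *foo(FILE * f)
{
    size_t n = 0;       /* getline takes a size_t *, not an int * */
    ssize_t result;
    char *buf = NULL;   /* must be NULL (or malloc'd) so getline allocates for us */
    result = getline(&buf, &n, f);
    if (result < 0) {
        free(buf);      /* getline may have allocated even on failure */
        return NULL;
    }
    return buf;
}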
or your own implementation using fgets and realloc (named read_line here so that it does not clash with the POSIX getline() declared in <stdio.h>):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *read_line(FILE * f)
{
    size_t size = 0;
    size_t len = 0;
    char *buf = NULL;
    do {
        char *tmp;
        size += BUFSIZ; /* BUFSIZ is defined as "the optimal read size for this platform" */
        tmp = realloc(buf, size); /* realloc(NULL,n) is the same as malloc(n) */
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        /* Actually do the read. Note that fgets puts a terminal '\0' on the
           end of the string, so we start at buf + len to overwrite it */
        if (fgets(buf + len, (int)(size - len), f) == NULL) {
            if (len == 0) { /* nothing was read at all: EOF or error */
                free(buf);
                return NULL;
            }
            break; /* EOF after a partial line: return what we have */
        }
        len += strlen(buf + len);
    } while (buf[len - 1] != '\n');
    return buf;
}
Call it using
char *str = read_line(stdin);
if (str == NULL) {
    perror("read_line");
    exit(EXIT_FAILURE);
}
...
free(str);
Firstly, gets() provides no way of preventing a buffer overrun. That makes it so dangerous that it was removed from the C standard in C11. It should not be used. However, the usual usage is something like
char buffer[20];
gets(buffer); /* pray that user enters no more than 19 characters in a line */
Your usage passes gets() a pointer to a pointer to a pointer to char (the address of a char **). That is not what gets() expects, so a conforming compiler must at least issue a diagnostic, and the call cannot work even if it builds.
That element of prayer reflected in the comment is why gets() is so dangerous. If the user enters 20 (or more) characters, gets() will happily write data past the end of buffer. There is no way a programmer can prevent that in code (short of accessing hardware to electrocute the user who enters too much data, which is outside the realm of standard C).
To answer your question, however, the only ways involve allocating a buffer of some size, reading data in some controlled way until that size is reached, reallocating if needed to get a greater size, and continuing until a newline (or end-of-file, or some other error condition on input) is encountered.
malloc() may be used for the initial allocation. malloc() or realloc() may be used for the reallocation (if needed). Bear in mind that a buffer allocated this way must be released (using free()) when the data is no longer needed - otherwise the result is a memory leak.
Use the getline() function (POSIX, not standard C). It returns the length of the line and stores a pointer to the contents of the line in an allocated memory area. (Be sure to pass the line pointer to free() when done with it.)
"Reading an unknown length line from stdin in c with fgets"
Late response - A Windows approach:
The OP does not specify Linux or Windows, but the viable answers posted in response for this question all seem to have the getline() function in common, which is POSIX only. Functions such as getline() and popen() are very useful and powerful but sadly are not included in Windows environments.
Consequently, implementing such a task in a Windows environment requires a different approach. The link here describes a method that can read input from stdin and has been tested up to 1.8 gigabytes on the system it was developed on. (Also described in the link.) The simple code snippet below was tested using the following command line to read large quantities on stdin:
cd c:\dev && dir /s // approximately 1.8Mbyte buffer is returned on my system
Simple example:
#include "cmd_rsp.h"
int main(void)
{
char *buf = {0};
buf = calloc(100, 1);//initialize buffer to some small value
if(!buf)return 0;
cmd_rsp("dir /s", &buf, 100);//recursive directory search on Windows system
printf("%s", buf);
free(buf);
return 0;
}
cmd_rsp() is fully described in the links above, but it is essentially a Windows implementation that includes popen() and getline() like capabilities, packaged up into this very simple function.
If you want to input a string of unknown length, try the following code.
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    char *m = NULL;
    printf("please input a string\n");
    /* %ms makes scanf allocate the buffer; check scanf's return value */
    if (scanf("%ms", &m) != 1)
        fprintf(stderr, "Could not read the string (input or allocation failure)\n");
    else
    {
        printf("this is the string %s\n", m);
        /* ... any other use of m */
        free(m);
    }
    return 0;
}
Note that %ms is specified by POSIX.1-2008 and %as is an older GNU extension; neither is part of standard C.
In the code below, I hope you can see that I have a char* variable and that I want to read in a string from a file. I then want to pass this string back from the function. I'm rather confused by pointers so I'm not too sure what I'm supposed to do really.
The purpose of this is to then pass the array to another function to be searched for a name.
Unfortunately the program crashes as a result and I've no idea why.
char* ObtainName(FILE *fp)
{
char* temp;
int i = 0;
temp = fgetc(fp);
while(temp != '\n')
{
temp = fgetc(fp);
i++;
}
printf("%s", temp);
return temp;
}
Any help would be vastly appreciated.
fgetc returns an int, not a char*. This int is a character from the stream, or EOF if you reach the end of the file.
You're implicitly converting the int to a char*, i.e., interpreting it as an address (turn your warnings on). When you call printf, it reads from that address and continues to read one character at a time looking for the null terminator that ends the string, but that address is almost certainly invalid. This is undefined behavior.
I've taken some liberties with what you wanted to accomplish. Rather than deal with pointers, you can just use a fixed-size array as long as you can set a maximum length. I've also included several checks so that you don't run off the end of the buffer or the end of the file. It is also important to make sure that you have a null terminator '\0' at the end of the string.
#define MAX_LEN 100
char* ObtainName(FILE *fp)
{
    static char temp[MAX_LEN];
    int i = 0;
    int c; /* int, so that EOF can be distinguished from a character */
    while (i < MAX_LEN-1)
    {
        c = fgetc(fp);
        if (c == EOF || c == '\n')
        {
            break;
        }
        temp[i] = (char)c;
        i++;
    }
    temp[i] = '\0';
    printf("%s", temp);
    return temp;
}
So, there are several problems here:
You're not setting aside any storage for the string contents;
You're not storing the string contents correctly;
You're attempting to read memory that doesn't belong to you;
The way you're attempting to return the string is going to give you heartburn.
1. You're not setting aside storage for the string contents
The line
char *temp;
declares temp as a pointer to char; its value will be the address of a single character value. Since it's declared at local scope without the static keyword, its initial value will be indeterminate, and that value may not correspond to a valid memory address.
It does not set aside any storage for the string contents read from fp; that would have to be done as a separate step, which I'll get to below.
2. You're not storing the string contents correctly
The line
temp = fgetc(fp);
reads the next character from fp and assigns it to temp. First of all, this means you're only storing the last character read from the stream, not the whole string. Secondly, and more importantly, you're assigning the result of fgetc() (which returns a value of type int) to an object of type char * (which is treated as an address). You're basically saying "I want to treat the letter 'a' as an address into memory." This brings us to...
3. You're attempting to read memory that doesn't belong to you
In the line
printf("%s", temp);
you're attempting to print out the string beginning at the address stored in temp. Since the last thing you wrote to temp was most likely a character whose value is < 127, you're telling printf to start at a very low and most likely not accessible address, hence the crash.
4. The way you're attempting to return the string is going to give you heartburn
Since you've defined the function to return a char *, you're going to need to do one of the following:
Allocate memory dynamically to store the string contents, and then pass the responsibility of freeing that memory on to the function calling this one;
Declare an array with the static keyword so that the array doesn't "go away" after the function exits; however, this approach has serious drawbacks;
Change the function definition;
Allocate memory dynamically
You could use dynamic memory allocation routines to set aside a region of storage for the string contents, like so:
char *temp = malloc( MAX_STRING_LENGTH * sizeof *temp );
or
char *temp = calloc( MAX_STRING_LENGTH, sizeof *temp );
and then return temp as you've written.
Both malloc and calloc set aside the number of bytes you specify; calloc will initialize all those bytes to 0, which takes a little more time, but can save your bacon, especially when dealing with text.
The problem is that somebody has to deallocate this memory when it's no longer needed; since you return the pointer, whoever calls this function now has the responsibility to call free() when it's done with that string, something like:
void Caller( FILE *fp )
{
    ...
    char *name = ObtainName( fp );
    ...
    free( name );
    ...
}
This spreads the responsibility for memory management around the program, increasing the chances that somebody will forget to release that memory, leading to memory leaks. Ideally, you'd like to have the same function that allocates the memory free it.
Use a static array
You could declare temp as an array of char and use the static keyword:
static char temp[MAX_STRING_SIZE];
This will set aside MAX_STRING_SIZE characters in the array when the program starts up, and it will be preserved between calls to ObtainName. No need to call free when you're done.
The problem with this approach is that by creating a static buffer, the code is not re-entrant; if ObtainName called another function which in turn called ObtainName again, that new call will clobber whatever was in the buffer before.
Why not just declare temp as
char temp[MAX_STRING_SIZE];
without the static keyword? The problem is that when ObtainName exits, the temp array ceases to exist (or rather, the memory it was using is available for someone else to use). That pointer you return is no longer valid, and the contents of the array may be overwritten before you can access it again.
Change the function definition
Ideally, you'd like for ObtainName to not have to worry about the memory it has to write to. The best way to achieve that is for the caller to pass target buffer as a parameter, along with the buffer's size:
int ObtainName( FILE *fp, char *buffer, size_t bufferSize )
{
...
}
This way, ObtainName writes data into the location that the caller specifies (useful if you want to obtain multiple names for different purposes). The function will return an integer value, which can be a simple success or failure, or an error code indicating why the function failed, etc.
Note that if you're reading text, you don't have to read character by character; you can use functions like fgets() or fscanf() to read an entire string at a time.
Use fscanf if you want to read whitespace-delimited strings (i.e., if the input file contains "This is a test", fscanf( fp, "%s", temp); will only read "This"). If you want to read an entire line (delimited by a newline character), use fgets().
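To make the difference concrete, here is a small sketch with two hypothetical helpers, read_word and read_whole_line (the names and the 31-character word limit are assumptions for illustration only). Given the input "This is a test", the first yields "This" and the second yields the whole line.

```c
#include <stdio.h>
#include <string.h>

#define WORD_MAX 31   /* hypothetical limit for this sketch */

/* Reads one whitespace-delimited token; out must hold WORD_MAX + 1 bytes.
   Returns 1 if a token was read, 0 otherwise. */
int read_word(FILE *fp, char *out)
{
    return fscanf(fp, "%31s", out) == 1;   /* width 31 == WORD_MAX */
}

/* Reads one whole line into out (capacity size) and strips the newline.
   Returns 1 if a line was read, 0 on EOF or error. */
int read_whole_line(FILE *fp, char *out, int size)
{
    if (fgets(out, size, fp) == NULL)
        return 0;
    out[strcspn(out, "\n")] = '\0';   /* cut at the newline, if present */
    return 1;
}
```

Note that both helpers bound the amount written: fscanf through the explicit field width, fgets through its size argument.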
Assuming you want to read an individual string at a time, you'd use something like the following (assumes C99):
#define FMT_SIZE 20
...
int ObtainName( FILE *fp, char *buffer, size_t bufsize )
{
int result = 1; // assume success
int scanfResult = 0;
char fmt[FMT_SIZE];
sprintf( fmt, "%%%zus", bufsize - 1 );
scanfResult = fscanf( fp, fmt, buffer );
if ( scanfResult == EOF )
{
// hit end-of-file before reading any text
result = 0;
}
else if ( scanfResult == 0 )
{
// did not read anything from input stream
result = 0;
}
else
{
result = 1;
}
return result;
}
So what's this noise
char fmt[FMT_SIZE];
sprintf( fmt, "%%%zus", bufsize - 1 );
about? There is a very nasty security hole in fscanf() when you use the %s or %[ conversion specifiers without a maximum length specifier. The %s conversion specifier tells fscanf to read characters until it sees a whitespace character; if there are more non-whitespace characters in the stream than the buffer is sized to hold, fscanf will store those extra characters past the end of the buffer, clobbering whatever memory is following it. This is a common malware exploit. So we want to specify a maximum length for the input; for example, %20s says to read no more than 20 characters from the stream and store them to the buffer.
Unfortunately, since the buffer length is passed in as an argument, we can't hard-code something like %20s, and fscanf doesn't give us a way to specify the length as an argument the way fprintf does. So we have to build a separate format string, which we store in fmt. The width is bufsize - 1 so that there is room for the terminating '\0': if the input buffer length is 10, the format string will be %9s; if it is 1000, the format string will be %999s.
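As a sketch of just the format-building step, the helper below (make_scan_format is a made-up name, not part of the answer's code) uses snprintf so that the format buffer itself cannot overflow, and rejects buffers too small to hold any text.

```c
#include <stdio.h>

/*
 * Writes "%<bufsize-1>s" into fmt, which has room for fmtsize bytes.
 * Returns 1 on success, 0 if bufsize cannot hold at least one character
 * plus the '\0', or if fmt is too small for the resulting format string.
 */
int make_scan_format(char *fmt, size_t fmtsize, size_t bufsize)
{
    int n;

    if (bufsize < 2)
        return 0;
    /* "%%" emits a literal '%', "%zu" the width, then a literal 's' */
    n = snprintf(fmt, fmtsize, "%%%zus", bufsize - 1);
    return n > 0 && (size_t)n < fmtsize;
}
```

For a 10-byte destination this produces "%9s", so scanning "abcdefghijklmnop" stores only "abcdefghi" followed by the terminator.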
The following code expands on that in your question, and returns the string in allocated storage:
char* ObtainName(FILE *fp)
{
    int temp;
    size_t len = 0; /* characters stored so far, not counting the '\0' */
    char *string = malloc(1);
    if(NULL == string)
    {
        fprintf(stderr, "malloc() failed\n");
        return NULL;
    }
    *string = '\0';
    while((temp = fgetc(fp)) != EOF && temp != '\n')
    {
        char *newMem;
        newMem=realloc(string, len + 2); /* one new character plus the '\0' */
        if(NULL==newMem)
        {
            fprintf(stderr, "realloc() failed.\n");
            break; /* return what was read so far */
        }
        string=newMem;
        string[len++] = (char)temp;
        string[len] = '\0';
    }
    printf("%s", string);
    return(string);
}
Take care to 'free()' the string returned by this function, or a memory leak will occur.
I have a string function that accepts a pointer to a source string and returns a pointer to a destination string. This function currently works, but I'm worried I'm not following best practice regarding malloc, realloc, and free.
The thing that's different about my function is that the length of the destination string is not the same as the source string, so realloc() has to be called inside my function. I know from looking at the docs...
http://www.cplusplus.com/reference/cstdlib/realloc/
that the memory address might change after the realloc. This means I can't "pass by reference" like a C programmer might for other functions; I have to return the new pointer.
So the prototype for my function is:
//decode a uri encoded string
char *net_uri_to_text(char *);
I don't like the way I'm doing it because I have to free the pointer after running the function:
char * chr_output = net_uri_to_text("testing123%5a%5b%5cabc");
printf("%s\n", chr_output); //testing123Z[\abc
free(chr_output);
Which means that malloc() and realloc() are called inside my function and free() is called outside my function.
I have a background in high level languages, (perl, plpgsql, bash) so my instinct is proper encapsulation of such things, but that might not be the best practice in C.
The question: Is my way best practice, or is there a better way I should follow?
full example
Compiles and runs with two warnings on unused argc and argv arguments, you can safely ignore those two warnings.
example.c:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
char *net_uri_to_text(char *);
int main(int argc, char ** argv) {
char * chr_input = "testing123%5a%5b%5cabc";
char * chr_output = net_uri_to_text(chr_input);
printf("%s\n", chr_output);
free(chr_output);
return 0;
}
//decodes uri-encoded string
//send pointer to source string
//return pointer to destination string
//WARNING!! YOU MUST USE free(chr_result) AFTER YOU'RE DONE WITH IT OR YOU WILL GET A MEMORY LEAK!
char *net_uri_to_text(char * chr_input) {
//define variables
int int_length = strlen(chr_input);
int int_new_length = int_length;
char * chr_output = malloc(int_length);
char * chr_output_working = chr_output;
char * chr_input_working = chr_input;
int int_output_working = 0;
unsigned int uint_hex_working;
//while not a null byte
while(*chr_input_working != '\0') {
//if %
if (*chr_input_working == *"%") {
//then put correct char in
sscanf(chr_input_working + 1, "%02x", &uint_hex_working);
*chr_output_working = (char)uint_hex_working;
//printf("special char:%c, %c, %d<\n", *chr_output_working, (char)uint_hex_working, uint_hex_working);
//realloc
chr_input_working++;
chr_input_working++;
int_new_length -= 2;
chr_output = realloc(chr_output, int_new_length);
//output working must be the new pointer plys how many chars we've done
chr_output_working = chr_output + int_output_working;
} else {
//put char in
*chr_output_working = *chr_input_working;
}
//increment pointers and number of chars in output working
chr_input_working++;
chr_output_working++;
int_output_working++;
}
//last null byte
*chr_output_working = '\0';
return chr_output;
}
It's perfectly ok to return malloc'd buffers from functions in C, as long as you document the fact that they do. Lots of libraries do that, even though no function in the standard library does.
If you can compute (a not too pessimistic upper bound on) the number of characters that need to be written to the buffer cheaply, you can offer a function that does that and let the user call it.
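For URI decoding in particular, a cheap bound exists because decoding never makes the text longer: each "%XX" escape shrinks to one byte and everything else is copied as-is. A sketch of such a helper (the name net_uri_decoded_bound is hypothetical):

```c
#include <stddef.h>
#include <string.h>

/*
 * Upper bound on the number of bytes needed to hold the decoded form of
 * `encoded`, including the terminating '\0'. It is exact when the input
 * contains no escapes and over-estimates by 2 bytes per "%XX" escape.
 */
size_t net_uri_decoded_bound(const char *encoded)
{
    return strlen(encoded) + 1;
}
```

A caller could then allocate `malloc(net_uri_decoded_bound(input))` once and pass that buffer to the decoder, with no reallocation needed.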
It's also possible, but much less convenient, to accept a buffer to be filled in; I've seen quite a few libraries that do that like so:
/*
 * Decodes the uri-encoded string encoded into buf of length len (including NUL).
 * Returns the number of characters needed for the full result. If that number
 * is greater than len, the output was not completely written and you should
 * try again with a larger buffer.
 */
size_t net_uri_to_text(char const *encoded, char *buf, size_t len)
{
    size_t space_needed = 0;
    while (decoding_needs_to_be_done()) {
        // decode characters, but only write them to buf
        // if it wouldn't overflow;
        // increment space_needed regardless
    }
    return space_needed;
}
Now the caller is responsible for the allocation, and would do something like
size_t len = SOME_VALUE_THAT_IS_USUALLY_LONG_ENOUGH;
char *result = xmalloc(len);
len = net_uri_to_text(input, result, len);
if (len > SOME_VALUE_THAT_IS_USUALLY_LONG_ENOUGH) {
    // try again with a buffer of the reported size
    result = xrealloc(result, len);
    net_uri_to_text(input, result, len);
}
(Here, xmalloc and xrealloc are "safe" allocating functions that I made up to skip NULL checks.)
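For reference, one way the buffer-filling decoder could be fleshed out is sketched below. This is my own illustration under assumptions, not the answer's code: the name uri_decode_into is made up, the return value counts the terminating '\0', output is truncated (but still terminated) when the buffer is too small, and malformed escapes are copied through unchanged.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Decodes `encoded` into buf (capacity len bytes, including the '\0').
 * Returns the number of bytes the full result requires, including the
 * '\0'. If the return value is greater than len, the output was truncated.
 */
size_t uri_decode_into(const char *encoded, char *buf, size_t len)
{
    size_t needed = 0;

    while (*encoded != '\0') {
        char ch;

        if (encoded[0] == '%' && isxdigit((unsigned char)encoded[1])
                              && isxdigit((unsigned char)encoded[2])) {
            char hex[3] = { encoded[1], encoded[2], '\0' };
            ch = (char)strtoul(hex, NULL, 16);
            encoded += 3;               /* "%XX" becomes one byte */
        } else {
            ch = *encoded++;            /* ordinary or malformed: copy as-is */
        }
        if (needed + 1 < len)           /* always leave room for the '\0' */
            buf[needed] = ch;
        needed++;
    }
    if (len > 0)
        buf[needed < len ? needed : len - 1] = '\0';
    return needed + 1;
}
```

Calling it with a NULL buffer and len 0 writes nothing and only reports the required size, which gives the caller a way to size the allocation before decoding for real.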
The thing is that C is low-level enough to force the programmer to get her memory management right. In particular, there's nothing wrong with returning a malloc()ated string. It's a common idiom to return malloc()ated objects and have the caller free() them.
And anyways, if you don't like this approach, you can always take a pointer to the string and modify it from inside the function (after the last use, it will still need to be free()d, though).
One thing, however, that I don't think is necessary is explicitly shrinking the string. If the new string is shorter than the old one, there's obviously enough room for it in the memory chunk of the old string, so you don't need to realloc().
(Apart from the fact that you forgot to allocate one extra byte for the terminating NUL character, of course...)
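To illustrate the point about not needing realloc(): because a decoded string is never longer than its encoded form, the decoding can be done in place. This is only a sketch; unescape_in_place is a name I made up, and copying a malformed '%' through unchanged is my own assumption about what should happen.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Hypothetical helper: decodes %XX escapes in `s` in place.
 * The result is never longer than the input, so no reallocation is needed.
 * `s` must be writable (not a string literal). Returns the new length.
 */
size_t unescape_in_place(char *s)
{
    char *out = s;
    const char *in = s;
    while (*in) {
        if (*in == '%' && isxdigit((unsigned char)in[1])
                       && isxdigit((unsigned char)in[2])) {
            char hex[3] = { in[1], in[2], '\0' };
            *out++ = (char)strtoul(hex, NULL, 16);
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
    return (size_t)(out - s);
}
```

The write position never overtakes the read position, so the loop cannot clobber input it has not read yet.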
And, as always, you can just return a different pointer each time the function is called, and you don't even need to call realloc() at all.
If you accept one last piece of good advice: it's advisable to const-qualify your input strings, so the caller can ensure that you don't modify them. Using this approach, you can safely call the function on string literals, for example.
All in all, I'd rewrite your function like this:
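#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly malloc()ed string that the caller must free(), or NULL if malloc() fails. */
char *unescape(const char *s)
{
size_t l = strlen(s);
char *p = malloc(l + 1), *r = p;
if (p == NULL)
return NULL;
while (*s) {
// only decode a '%' that is followed by two hex digits, so we never read past the end
if (*s == '%' && isxdigit((unsigned char)s[1]) && isxdigit((unsigned char)s[2])) {
char buf[3] = { s[1], s[2], 0 };
*p++ = (char)strtol(buf, NULL, 16); // yes, I prefer this over scanf()
s += 3;
} else {
*p++ = *s++;
}
}
*p = 0;
return r;
}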
And call it as follows:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
const char *in = "testing123%5a%5b%5cabc";
char *out = unescape(in);
if (out == NULL)
return 1;
printf("%s\n", out); // prints testing123Z[\abc
free(out);
return 0;
}
It's perfectly OK to return newly malloc'ed (and possibly internally realloc'ed) values from functions; you just need to document that you are doing so (as you do here).
Other obvious items:
Instead of int int_length you might want to use size_t. This is "an unsigned type" (usually unsigned int or unsigned long) that is the appropriate type for lengths of strings and arguments to malloc.
You need to allocate n+1 bytes initially, where n is the length of the string, as strlen does not include the terminating 0 byte.
You should check for malloc failing (returning NULL). If your function will pass the failure on, document that in the function-description comment.
sscanf is pretty heavy-weight for converting the two hex bytes. Not wrong, except that you're not checking whether the conversion succeeds (what if the input is malformed? you can of course decide that this is the caller's problem but in general you might want to handle that). You can use isxdigit from <ctype.h> to check for hexadecimal digits, and/or strtoul to do the conversion.
Rather than doing one realloc for every % conversion, you might want to do a final "shrink realloc" if desirable. Note that if you allocate (say) 50 bytes for a string and find it requires only 49 including the final 0 byte, it may not be worth doing a realloc after all.
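Putting those items together, a version might look roughly like the sketch below. The name unescape_checked is invented for this example, and treating a malformed escape as a failure (returning NULL) is just one possible policy, not the only reasonable one.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical name. Returns a malloc()ed decoded copy of `s` that the
 * caller must free(), or NULL if allocation fails or a '%' is not
 * followed by two hex digits.
 */
char *unescape_checked(const char *s)
{
    size_t n = strlen(s);
    char *out = malloc(n + 1);      /* n + 1: room for the terminating 0 byte */
    if (out == NULL)
        return NULL;
    size_t used = 0;
    while (*s) {
        if (*s == '%') {
            if (!isxdigit((unsigned char)s[1]) || !isxdigit((unsigned char)s[2])) {
                free(out);          /* malformed escape: report failure */
                return NULL;
            }
            char hex[3] = { s[1], s[2], '\0' };
            out[used++] = (char)strtoul(hex, NULL, 16);
            s += 3;
        } else {
            out[used++] = *s++;
        }
    }
    out[used] = '\0';
    /* One optional shrink at the end instead of a realloc per escape. */
    char *shrunk = realloc(out, used + 1);
    return shrunk != NULL ? shrunk : out;
}
```

If the shrinking realloc fails, the original block is still valid, so the function falls back to returning it.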
I would approach the problem in a slightly different way. Personally, I would split your function in two. The first function would calculate the size you need to malloc. The second would write the output string to the given pointer (which has been allocated outside of the function). That saves several calls to realloc and keeps the complexity the same. A possible function to find the size of the new string is:
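// Assumes every '%' starts a well-formed %XX escape.
// Returns the decoded length without the terminating NUL, so allocate one byte more.
size_t getNewSize (const char *string) {
const char *i;
size_t size = 0, percent = 0;
for (i = string; *i != '\0'; i++, size++) {
if (*i == '%')
percent++;
}
// each three-character %XX escape shrinks to a single character
return size - percent * 2;
}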
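The two-function split could then be used roughly as follows. This is a self-contained sketch with names I made up (is_escape, decoded_size, decode_to); unlike the function above, decoded_size already includes the terminating NUL and only counts a '%' as an escape when two hex digits follow it.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: true if `s` points at a well-formed %XX escape. */
static int is_escape(const char *s)
{
    return s[0] == '%' && isxdigit((unsigned char)s[1])
                       && isxdigit((unsigned char)s[2]);
}

/* First function: bytes needed for the decoded string, including the NUL. */
size_t decoded_size(const char *s)
{
    size_t size = 0;
    while (*s) {
        s += is_escape(s) ? 3 : 1;
        size++;
    }
    return size + 1;
}

/* Second function: writes the decoded form of `s` into `out`,
 * which the caller must have sized with decoded_size(s). */
void decode_to(const char *s, char *out)
{
    while (*s) {
        if (is_escape(s)) {
            char hex[3] = { s[1], s[2], '\0' };
            *out++ = (char)strtoul(hex, NULL, 16);
            s += 3;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}
```

The caller does char *out = malloc(decoded_size(in)); checks it for NULL, calls decode_to(in, out); and later free(out); so the malloc and the free sit side by side in the same piece of code.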
However, as mentioned in other answers there is no problem in returning a malloc'ed buffer as long as you document it!
In addition to what was already mentioned in the other posts, you should also document the fact that the string is reallocated. If your code is called with a static string or a string allocated with alloca, it must not be reallocated.
I think you are right to be concerned about splitting up mallocs and frees. As a rule, whatever makes it, owns it and should free it.
In this case, where the strings are relatively small, one good procedure is to make the string buffer larger than any possible string it could contain. For example, URLs have a de facto limit of about 2000 characters, so if you malloc 10000 characters you can store any possible URL.
Another trick is to store both the length and capacity of the string at its front, so that *(int *)mystring is the length of the string and *(int *)(mystring + 4) is its capacity (assuming a 4-byte int and a suitably aligned buffer). The characters themselves then start at offset 8, that is, at mystring + 8. By doing this you can pass around a single pointer to a string and always know how long it is and how much memory capacity the string has. You can make macros that automatically generate these offsets and make "pretty code".
The value of using buffers this way is you do not need to do a reallocation. The new value overwrites the old value and you update the length at the beginning of the string.
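Here is a rough sketch of that length-plus-capacity idea. Instead of raw byte offsets and macros it uses a header struct with a C99 flexible array member, which is my own substitution to avoid alignment and sizeof(int) assumptions; the names lstring, lstring_new and lstring_set are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: length and capacity stored in front of the characters. */
struct lstring {
    size_t length;      /* characters in use, excluding the NUL */
    size_t capacity;    /* bytes available in data[], including the NUL */
    char data[];        /* C99 flexible array member */
};

/* Allocates an empty string with room for `capacity` bytes; caller free()s it. */
struct lstring *lstring_new(size_t capacity)
{
    if (capacity == 0)
        return NULL;
    struct lstring *s = malloc(sizeof *s + capacity);
    if (s == NULL)
        return NULL;
    s->length = 0;
    s->capacity = capacity;
    s->data[0] = '\0';
    return s;
}

/* Overwrites the old value in place; returns 0 on success, -1 if it does not fit. */
int lstring_set(struct lstring *s, const char *text)
{
    size_t n = strlen(text);
    if (n + 1 > s->capacity)
        return -1;
    memcpy(s->data, text, n + 1);   /* copies the NUL as well */
    s->length = n;
    return 0;
}
```

A single free() on the pointer returned by lstring_new releases the header and the characters together, and a value that does not fit is rejected rather than silently truncated.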