Any simple way to read a string of variable length in C?

I tried reading using:
char *input1, *input2;
scanf("%s[^\n]", input1);
scanf("%s[^\n]", input2);
I am obviously doing something wrong because the second string is read as null. I know using scanf() is not recommended but I couldn't find any other simple way to do the same.

The statement:
char *input1, *input2;
allocates memory for two pointers to char. Note that this only allocates memory for the pointers themselves (which are uninitialised and don't point to anything meaningful), not for what they point to.
The call to scanf() then tries to write to memory out of bounds, and results in undefined behaviour.
You could instead declare character arrays of fixed size with automatic storage duration:
char input1[SIZE];
This will allocate memory for the array, and the call to scanf() will be valid.
Alternatively, you could allocate memory dynamically for the pointers with one of the memory allocation functions:
char *input1 = malloc (size);
This declares a pointer to char whose contents are indeterminate, but are immediately overwritten with a pointer to a chunk of memory of size size. Note that the call to malloc() may have failed. It returns NULL as an error code, so check for it.
But scanf() should not be used as a user-input interface. It does not guard against buffer overflows, and will leave a newline in the input buffer (which leads to more problems down the road).
Consider using fgets instead. It will null-terminate the buffer and read at most size - 1 characters.
The calls to scanf() can be replaced with:
fgets (buf, sizeof buf, stdin);
You can then parse the string with sscanf, strtol, et cetera.
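As a rough illustration of that fgets-then-parse approach, here is a minimal sketch. The helper name read_long and the 64-byte buffer are assumptions made for this example, not something from the question; a line longer than the buffer would leave its remainder in the stream.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line from `in` and parses it as a long.
 * Returns 1 on success, 0 on EOF, a read error, or invalid input. */
int read_long(FILE *in, long *out)
{
    char buf[64];   /* arbitrary size chosen for this sketch */
    char *end;

    if (fgets(buf, sizeof buf, in) == NULL)
        return 0;                       /* EOF or read error */

    errno = 0;
    long value = strtol(buf, &end, 10);
    if (end == buf || errno == ERANGE)
        return 0;                       /* no digits, or out of range */

    /* Allow only trailing whitespace (such as the newline) after the number. */
    while (*end == ' ' || *end == '\t' || *end == '\r' || *end == '\n')
        end++;
    if (*end != '\0')
        return 0;                       /* trailing junk such as "12x" */

    *out = value;
    return 1;
}
```

Calling read_long(stdin, &value) in a loop lets you re-prompt on bad input, which is awkward to do with scanf() alone.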
Note that fgets() will retain the trailing newline if there was space. You could use this one-liner to remove it:
buf[strcspn(buf, "\r\n")] = '\0';
This takes care of the carriage return as well, if any.
Or if you wish to continue using scanf() (which I advise against), use a field width to limit input and check scanf()'s return value:
scanf ("%1023s", input1); /* Am using 1023 as a place holder */
That being said, if you wish to read a line of variable length, you need to allocate memory dynamically with malloc(), and then resize it with realloc() as necessary.
On POSIX-compliant systems, you could use getline() to read strings of arbitrary length, but note that it's vulnerable to a DoS (denial-of-service) attack, because it keeps growing the buffer for as long as the input line continues.

You can use the m modifier in the format specifier. Note that it is not standard C but rather a POSIX extension.
char *a, *b;
scanf("%m[^\n] %m[^\n]", &a, &b);
// use a and b
printf("*%s*\n*%s*\n", a, b);
free(a);
free(b);

There are 2 simple ways to read variable length strings from the input stream:
using fgets() with an array large enough for the maximum length:
char input1[200];
if (fgets(input1, sizeof input1, stdin)) {
/* string was read. strip the newline if present */
input1[strcspn(input1, "\n")] = '\0';
...
} else {
/* nothing was read: premature end of file? */
...
}
on POSIX compliant systems, you can use getline() to read strings of arbitrary length into arrays allocated with malloc():
char *input1 = NULL;
size_t input1_size = 0;
ssize_t input1_length = getline(&input1, &input1_size, stdin);
if (input1_length >= 0) {
/* string was read. length is input1_length */
if (input1_length > 0 && input1[input1_length - 1] == '\n') {
/* remove the newline if present */
input1[--input1_length] = '\0';
}
...
} else {
/* nothing was read: premature end of file? */
...
}
Using scanf is not recommended because it is difficult to use correctly and reading input with "%s" or "%[^\n]" without a specified maximum length is risky as any sufficiently long input will cause a buffer overflow and undefined behavior. Passing uninitialized pointers to scanf as you do in the posted code has undefined behavior.

Any simple way to read a string of variable length in C?
Unfortunately the answer is NO
The input functions (e.g. scanf, fgets, etc.) specified by the C standard all require the caller to provide the input buffer. Once the input buffer is full, the functions will (when used correctly) return. So if the input is longer than the provided buffer, the functions will only read part of the input, and the caller must add code to check for partial input and make additional function calls as needed.
POSIX systems have the getline and getdelim functions that can do it. So if you can accept limiting your code to POSIX-compliant systems, that's what you want to use.
If you need portable, standard compliant code, you need to write your own function. For that you need to look into functions like realloc, fgets, strcpy, memcpy, etc. It's not a simple task but it's not "rocket science" either. It's been done many, many times before... and if you search the net, it's very likely you can find an open source implementation that you can just copy (make sure to follow the rules for doing that).
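For what it's worth, one possible shape of such a function is sketched below, using only fgets, realloc and memcpy. The name read_line and the 128-byte chunk size are assumptions for this example; treat it as a starting point, not a hardened implementation (there is no upper limit on line length, for instance).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (the name is not from any standard): reads one line of
 * any length from `in` using only standard C. Returns a malloc'd string
 * without the trailing newline, or NULL on EOF with nothing read or on
 * allocation failure. The caller must free() the result. */
char *read_line(FILE *in)
{
    char chunk[128];            /* arbitrary chunk size */
    char *line = NULL;
    size_t len = 0;

    while (fgets(chunk, sizeof chunk, in) != NULL) {
        size_t n = strlen(chunk);
        char *tmp = realloc(line, len + n + 1);
        if (tmp == NULL) {
            free(line);
            return NULL;
        }
        line = tmp;
        memcpy(line + len, chunk, n + 1);   /* copies the terminator too */
        len += n;

        if (len > 0 && line[len - 1] == '\n') {
            line[len - 1] = '\0';           /* strip the newline */
            return line;
        }
    }
    return line;    /* last line without newline, or NULL if nothing was read */
}
```

Typical use would be char *s = read_line(stdin); followed by a NULL check and, eventually, free(s).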

Related

getline how to limit amount of input as you can with fgets

GNU manual
This quote is from the GNU manual
Warning: If the input data has a null character, you can’t tell. So
don’t use fgets unless you know the data cannot contain a null. Don’t
use it to read files edited by the user because, if the user inserts a
null character, you should either handle it properly or print a clear
error message. We recommend using getline instead of fgets.
As I usually do, I spent time searching before asking a question, and I did find a similar question on Stack Overflow from five years ago:
Why is the fgets function deprecated?
Although GNU recommends getline over fgets, I noticed that getline in stdio.h takes any size line. It calls realloc as needed. If I try to set the size to 10 char:
#include <stdio.h>
#include <stdlib.h>
int main()
{
char *buffer;
size_t bufsize = 10;
size_t characters;
buffer = (char *)malloc(bufsize * sizeof(char));
if( buffer == NULL)
{
perror("Unable to allocate buffer");
exit(1);
}
printf("Type something: ");
characters = getline(&buffer,&bufsize,stdin);
printf("%zu characters were read.\n",characters);
printf("You typed: '%s'\n",buffer);
return(0);
}
In the code above, type any size string, over 10 char, and getline will read it and give you the right output.
There is no need to even malloc, as I did in the code above — getline does it for you. I'm setting the buffer to size 0, and getline will malloc and realloc for me as needed.
#include <stdio.h>
#include <stdlib.h>
int main()
{
char *buffer = NULL; /* getline requires NULL (or a malloc'd pointer) here */
size_t bufsize = 0;
size_t characters;
printf("Type something: ");
characters = getline(&buffer,&bufsize,stdin);
printf("%zu characters were read.\n",characters);
printf("You typed: '%s'\n",buffer);
return(0);
}
If you run this code, again you can enter any size string, and it works. Even though I set the buffer size to 0.
I've been looking at safe coding practices from CERT guidelines www.securecoding.cert.org
I was thinking of switching from fgets to getline, but the issue I am having, is I cannot figure out how to limit the input in getline. I think a malicious attacker can use a loop to send an unlimited amount of data, and use up all the ram available in the heap?
Is there a way of limiting the input size that getline uses or does getline have some limit within the function?
Using fgets is not necessarily problematic. All the GNU manual tells you is that if there's a '\0' byte in the file, there will be one in your buffer too. You won't be able to tell whether a null terminator in your buffer marks the actual end of the data or is just a null from the file. This means you can read a 100-char file into a 200-char buffer and end up with a 50-char C string.
The stdio.h getline in fact doesn't appear to have any sane length limitation, so fread might be a viable alternative.
Unlike C getline and C++ std::getline(), C++ std::istream::getline() is limited to count characters.
The GNU manual is just bad. Limiting the input length is usually the right thing to do, especially if input is untrusted, and fgets does this correctly. getline cannot be used safely in such a context.
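To make the "limit the input" idea concrete, here is a small sketch of a bounded line reader built on fgets. The name read_line_bounded and its return codes are made up for this example. Overlong lines are truncated and the excess is thrown away, so an attacker cannot make the program allocate more than the fixed buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads at most cap - 1 characters of one line into buf.
 * Returns  1 if a complete line fit,
 *          0 if the line was too long (the excess is read and discarded),
 *         -1 on EOF or read error with nothing read.
 * cap must be at least 2 and must fit in an int. */
int read_line_bounded(FILE *in, char *buf, size_t cap)
{
    if (fgets(buf, (int)cap, in) == NULL)
        return -1;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';    /* complete line: strip the newline */
        return 1;
    }

    /* No newline stored: either the line was truncated or input ended. */
    int c = getc(in);
    if (c == EOF || c == '\n')
        return 1;               /* the line fit exactly, or it was the last one */

    while (c != EOF && c != '\n')
        c = getc(in);           /* discard the rest of the overlong line */
    return 0;
}
```

The caller can then decide whether a return value of 0 is an error or whether a truncated line is acceptable.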

Reading a char array using %s i.e string specifier

char *ptr=(char*)calloc(n,sizeof(int));
Using the above, we can allocate memory for a char array. But is reading it character-by-character mandatory? How can it be read and accessed using %s, i.e. the string format specifier?
Reading character by character is not mandatory and using exactly %s is susceptible to buffer overruns. Specifying the maximum number of characters to read, one less than the number of bytes in the buffer being populated, prevents the buffer overrun. For example "%10s" reads a maximum of ten characters then assigns the null terminating character so the target buffer requires at least 11 bytes.
However, since the code suggests that n is unknown at compile time, a dynamic width cannot be passed to %s directly (in scanf, * means "suppress assignment", not "take the width from an argument"). But it would be possible to construct the format specifier (the format specifier is not required to be a string literal):
char fmt[32];
sprintf(fmt, "%%%ds", n - 1); /* If 'n == 10' then 'fmt == %9s' */
if (1 == scanf(fmt, ptr))
{
printf("[%s]\n", ptr);
}
An alternative would be fgets():
if (fgets(ptr, n, stdin))
{
}
but the behaviour is slightly different:
fgets() does not use whitespace to terminate input.
fgets() will store the newline character if it encounters it.
Casting the return value of calloc() (or malloc() or realloc()) is unnecessary (see Do I cast the result of malloc?), and the posted code is confusing because it allocates space for int[n] but is intended to be a character array. Instead:
char* ptr = calloc(n, 1); /* 1 == sizeof(char) */
Also, if a null terminated string is being read into ptr the initialisation provided by calloc() is superfluous so a malloc() only would suffice:
char* ptr = malloc(n);
And remember to free() whatever you malloc()d, calloc()d or realloc()d.
Yes, you can read such array using %s but make sure you have allocated enough memory for what you try to read(don't forget the terminating zero character!).
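Putting the pieces of this thread together, a bounded %s read for a buffer whose size is only known at run time might look like the sketch below. The function name read_token is hypothetical, and it uses fscanf on a FILE * so that it works for stdin as well as files.

```c
#include <stdio.h>

/* Hypothetical helper: reads one whitespace-delimited token of at most
 * n - 1 characters into buf (which must hold n bytes, n >= 2).
 * Returns 1 on success, 0 otherwise. */
int read_token(FILE *in, char *buf, size_t n)
{
    char fmt[32];

    if (n < 2)
        return 0;

    /* Builds e.g. "%9s" when n == 10, leaving room for the terminator. */
    int written = snprintf(fmt, sizeof fmt, "%%%zus", n - 1);
    if (written < 0 || (size_t)written >= sizeof fmt)
        return 0;

    return fscanf(in, fmt, buf) == 1;
}
```

A token longer than n - 1 characters is split: the remainder stays in the stream and is returned by the next call.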

string input and output in C

I have this snippet of the code:
char* receiveInput(){
char *s;
scanf("%s",s);
return s;
}
int main()
{
char *str = receiveInput();
int length = strlen(str);
printf("Your string is %s, length is %d\n", str, length);
return 0;
}
I receive this output:
Your string is hellàÿ", length is 11
my input was:
helloworld!
can somebody explain why, and why this style of the coding is bad, thanks in advance
Several questions have addressed what you've done wrong and how to fix it, but you also said (emphasis mine):
can somebody explain why, and why this style of the coding is bad
I think scanf is a terrible way to read input. It's inconsistent with printf, makes it easy to forget to check for errors, makes it hard to recover from errors, and is incompatible with ordinary (and easier to do correctly) read operations (like fgets and company).
First, note that the "%s" format will read only until it sees whitespace. Why whitespace? Why does "%s" print out an entire string, but reads in strings in such a limited capacity?
If you'd like to read in an entire line, as you may often be wont to do, scanf provides... with "%[^\n]". What? What is that? When did this become Perl?
But the real problem is that neither of those is safe. They both freely overflow with no bounds checking. Want bounds checking? Okay, you got it: "%10s" (and "%10[^\n]" is starting to look even worse). That will read at most 10 characters and add a terminating nul character automatically, so the array needs at least 11 bytes. So that's good... for when our array size never needs to change.
What if we want to pass the size of our array as an argument to scanf? printf can do this:
char string[] = "Hello, world!";
printf("%.*s\n", (int)sizeof string, string); // prints whole message; the precision argument must be an int
printf("%.*s\n", 6, string); // prints just "Hello,"
Want to do the same thing with scanf? Here's how:
static char tmp[/*bit twiddling to get the log10 of SIZE_MAX plus a few*/];
// if we did the math right we shouldn't need to use snprintf
snprintf(tmp, sizeof tmp, "%%%zus", bufsize - 1); // assumes bufsize is a size_t; width is one less than the buffer size
scanf(tmp, buffer);
That's right - scanf doesn't support the "%.*s" variable precision printf does, so to do dynamic bounds checking with scanf we have to construct our own format string in a temporary buffer. This is all kinds of bad, and even though it's actually safe here it will look like a really bad idea to anyone just dropping in.
Meanwhile, let's look at another world. Let's look at the world of fgets. Here's how we read in a line of data with fgets:
fgets(buffer, bufsize, stdin);
Infinitely less headache, no wasted processor time converting an integer precision into a string that will only be reparsed by the library back into an integer, and all the relevant elements are sitting there on one line for us to see how they work together.
Granted, this may not read an entire line. It will only read an entire line if the line is shorter than bufsize - 1 characters. Here's how we can read an entire line:
char *readline(FILE *file)
{
size_t size = 80; // start off small
size_t curr = 0;
char *buffer = malloc(size);
while(fgets(buffer + curr, size - curr, file))
{
if(strchr(buffer + curr, '\n')) return buffer; // success
curr = size - 1;
size *= 2;
char *tmp = realloc(buffer, size);
if(tmp == NULL) /* handle error */;
buffer = tmp;
}
/* handle error */;
}
The curr variable is an optimization to prevent us from rechecking data we've already read, and is unnecessary (although useful as we read more data). We could even use the return value of strchr to strip off the ending "\n" character if you preferred.
Notice also that size_t size = 80; as a starting place is completely arbitrary. We could use 81, or 79, or 100, or add it as a user-supplied argument to the function. We could even add an int (*inc)(int) argument, and change size *= 2; to size = inc(size);, allowing the user to control how fast the array grows. These can be useful for efficiency, when reallocations get costly and boatloads of lines of data need to be read and processed.
We could write the same with scanf, but think of how many times we'd have to rewrite the format string. We could limit it to a constant increment, instead of the doubling (easily) implemented above, and never have to adjust the format string; we could give in and just store the number, do the math as above, and use snprintf to convert it to a format string every time we reallocate so that scanf can convert it back to the same number; we could limit our growth and starting position in such a way that we can manually adjust the format string (say, just increment the digits), but this could get hairy after a while and may require recursion (!) to work cleanly.
Furthermore, it's hard to mix reading with scanf with reading with other functions. Why? Say you want to read an integer from a line, then read a string from the next line. You try this:
int i;
char buf[BUFSIZE];
scanf("%i", &i);
fgets(buf, BUFSIZE, stdin);
That will read the "2" but then fgets will read an empty line because scanf didn't read the newline! Okay, take two:
...
scanf("%i\n", &i);
...
You think this eats up the newline, and it does - but it also eats up leading whitespace on the next line, because scanf can't tell the difference between newlines and other forms of whitespace. (Also, turns out you're writing a Python parser, and leading whitespace in lines is important.) To make this work, you have to call getchar or something to read in the newline and throw it away:
...
scanf("%i", &i);
getchar();
...
Isn't that silly? What happens if you use scanf in a function, but don't call getchar because you don't know whether the next read is going to be scanf or something saner (or whether or not the next character is even going to be a newline)? Suddenly the best way to handle the situation seems to be to pick one or the other: do we use scanf exclusively and never have access to fgets-style full-control input, or do we use fgets exclusively and make it harder to perform complex parsing?
Actually, the answer is we don't. We use fgets (or non-scanf functions) exclusively, and when we need scanf-like functionality, we just call sscanf on the strings! We don't need to have scanf mucking up our filestreams unnecessarily! We can have all the precise control over our input we want and still get all the functionality of scanf formatting. And even if we couldn't, many scanf format options have near-direct corresponding functions in the standard library, like the infinitely more flexible strtol and strtod functions (and friends). Plus, i = strtoumax(str, NULL, 10) for C99 sized integer types is a lot cleaner looking than scanf("%" SCNuMAX, &i);, and a lot safer (we can use that strtoumax line unchanged for smaller types and let the implicit conversion handle the extra bits, but with scanf we have to make a temporary uintmax_t to read into).
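Here is a small sketch of the fgets / sscanf combination applied to the "integer on one line, string on the next" case. The function name read_int_then_line and the 64-byte scratch buffer are assumptions for this example only.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads an integer from one line and then the whole
 * following line (without its newline) into text. Returns 1 on success. */
int read_int_then_line(FILE *in, int *number, char *text, size_t text_size)
{
    char line[64];      /* arbitrary size for the numeric line */

    if (fgets(line, sizeof line, in) == NULL)
        return 0;
    if (sscanf(line, "%d", number) != 1)
        return 0;       /* the line did not start with an integer */

    if (fgets(text, (int)text_size, in) == NULL)
        return 0;
    text[strcspn(text, "\n")] = '\0';   /* strip the newline if present */
    return 1;
}
```

Because each fgets call consumes exactly one line (as long as the line fits in the buffer), the newline after the number never leaks into the next read, and leading whitespace on the second line is preserved.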
The moral of this story: avoid scanf. If you need the formatting it provides, and don't want to (or can't) do it (more efficiently) yourself, use fgets / sscanf.
scanf doesn't allocate memory for you.
You need to allocate memory for the variable passed to scanf.
You could do like this:
char* receiveInput(){
char *s = (char*) malloc( 100 );
scanf("%s",s);
return s;
}
But warning:
the function that calls receiveInput will take the ownership of the returned memory: you'll have to free(str) after you print it in main. (Giving the ownership away in this way is usually not considered a good practice).
An easy fix is getting the allocated memory as a parameter.
if the input string is longer than 99 characters (in my case), your program will suffer from a buffer overflow (which is what is already happening).
An easy fix is to pass to scanf the length of your buffer:
scanf("%99s",s);
A fixed code could be like this:
// s must be of at least 100 chars!!!
char* receiveInput( char *s ){
scanf("%99s",s);
return s;
}
int main()
{
char str[100];
receiveInput( str );
int length = strlen(str);
printf("Your string is %s, length is %d\n", str, length);
return 0;
}
You have to first allocate memory to your s object in your receiveInput() method. Such as:
s = (char *)calloc(50, sizeof(char));

Cannot create a program which will invert string

I am using Linux.
I am trying to write a program in c that will print a string backward.
Here is my code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main (){
char string[100];
printf ("Enter string:\n");
gets (string);
int length = strlen (string)-1;
for (length = length; length>=0; length--){
puts (string[length]);
}
}
And here is the error:
a.c:10: warning: passing argument 1 of ‘puts’ makes pointer from integer without a cast
/usr/include/stdio.h:668: note: expected ‘const char *’ but argument is of type ‘char’
/tmp/cc5rpeG7.o: In function `main':
a.c:(.text+0x29): warning: the `gets' function is dangerous and should not be used.
What should I do?
Forget that the function gets() exists - it is lethal. Use fgets() instead (but note that it does not remove the newline at the end of the line).
You want to put a single character at a time: use putchar() to write it to stdout. Don't forget to add a newline to the output after the loop.
Also, for (length = length; length >= 0; length--) is not idiomatic C. Use one of:
for ( ; length >= 0; length--)
for (length = strlen(string) - 1; length >= 0; length--)
for (int length = strlen(string) - 1; length >= 0; length--)
The last alternative uses a feature added to C99 (which was available in C++ long before).
Also, we could debate whether length is the appropriate name for the variable. It would be better renamed as i or pos or something similar because, although it is initialized to the length of the input, it is actually used as an array index, not as the length of anything.
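As an aside, if you switch the index to size_t (the type strlen() actually returns), the i >= 0 test no longer works because an unsigned value is never negative. One common way around that is sketched below; print_backward is a name invented for this example, and it writes to a FILE * so it can be pointed at stdout.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes s backward to out, followed by a newline. */
void print_backward(const char *s, FILE *out)
{
    /* i counts down from the length; the test runs before the decrement,
     * so the loop body sees indices strlen(s) - 1 down to 0. */
    for (size_t i = strlen(s); i-- > 0; )
        fputc(s[i], out);
    fputc('\n', out);
}
```

This also handles the empty string without any special case.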
Subjective: Don't put a space between the name of a function and its parameter list. The founding fathers of C don't do that - neither should you.
Why is gets() lethal?
The first Internet worm - the Morris worm from 1988 - exploited the fingerd program that used gets() instead of fgets(). Since then, numerous programs have been crashed because they used gets() and not fgets() or another alternative.
The fundamental problem is that gets() does not know how much space is available to store the data it reads. This leads to 'buffer overflows', a term which can be searched for in your favourite search engine that will return an enormous number of entries.
If someone types 150 characters of input to the example program, then gets() will store 150 characters in the array which has length 100. This never leads to happiness - it usually leads to a core dump, but with carefully chosen inputs - often generated by a Perl or Python script - you can probably get the program to execute arbitrary other code. This really matters if the program will ever be run by a user with 'elevated privileges'.
Incidentally, gets() is likely to be removed from the Standard C library in the next release (C1x - see n1494 from WG14). It won't vanish from actual C libraries for a long time yet (20 years?), but it should be replaced with this implementation (or something similar):
#undef NDEBUG
#include <assert.h>
char *gets(char *buffer)
{
assert("Probability of using gets() safely" == 0);
}
One other minor detail, discussed in part under the comments to the main question.
The code shown is clearly for C99; the declaration of length part way through the function is invalid in C89. Given that, it is 'OK' for the main() function not to explicitly return a value, because the C99 standard follows the lead of the C++ standard and allows you to omit the return from main() and the effect is the same as return(0); or return 0; at the end.
As such, the program in this question cannot strictly be faulted for not having a return at the end. However, I regard that as one of the more peculiar standardizing decisions, and would much prefer it if the standards had left that provision out - or done something more radical like allowing the ubiquitous but erroneous void main() observing that when control returns from that, the result is that a success status is returned to the environment. It isn't worth fighting to get that aspect of the standard changed - sadly - but as a personal style decision, I don't take advantage of the licence granted to omit the final return from main(). If the code has to work with C89 compilers, it should have the explicit return 0; at the end (but then the declaration of length has to be fixed too).
You can also use recursion to do it. I think it looks nicer than when using a loop.
Just call the function with your string, and before printing the char in the function, call the function again with the same string, minus the first char.
This will print out your string in reversed order.
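A minimal sketch of that recursive idea could look like this (print_reversed is a made-up name; note that the recursion depth equals the string length, so this is only sensible for short strings):

```c
#include <stdio.h>

/* Hypothetical helper: writes s to out in reverse order using recursion. */
void print_reversed(const char *s, FILE *out)
{
    if (*s == '\0')
        return;                     /* end of string: stop recursing */
    print_reversed(s + 1, out);     /* first print the rest, reversed */
    fputc(*s, out);                 /* then print the first character */
}
```

Calling print_reversed(string, stdout) followed by putchar('\n') would print the reversed input on its own line.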
First:
NEVER NEVER NEVER NEVER NEVER use gets(); it will introduce a point of failure in your code. There's no way to tell gets() how big the target buffer is, so if you pass a buffer sized to hold 10 characters and there's 100 characters in the input stream, gets() will happily store those extra 90 characters in the memory beyond the end of your buffer, potentially clobbering something important. Buffer overruns are an easy malware exploit; the Morris worm specifically exploited a gets() call in fingerd.
Use fgets() instead; it allows you to specify the maximum number of characters to read from the input stream. However, unlike gets(), fgets() will save the terminating newline character to the buffer if there's room for it, so you have to account for that:
char string[100];
char *newline;
printf("Enter a string: ");
fflush(stdout);
fgets(string, sizeof string, stdin);
newline = strchr(string, '\n'); // search for the newline character
if (newline) // if it's present
*newline = 0; // set it to zero
Now that's out of the way...
Your error is coming from the fact that puts() expects an argument of type char *, but you're passing an argument of type char, hence the "pointer from integer without cast" message (char is an integral type). To write a single character to stdout, use putchar() or fputc().
You should use putchar instead of puts
So this loop:
for (length = length; length>=0; length--){
puts (string[length]);
}
Will be:
for (length = length; length>=0; length--){
putchar (string[length]);
}
putchar will take a single char as a parameter and print it to stdout, which is what you want. puts, on the other hand, will print the whole string to stdout. So when you pass a single char to a function that expects a whole string (a char array holding a null-terminated string), the compiler gets confused.
Use putc or putchar, as puts is specified to take a char* and you are feeding it a char.

snprintf vs. strcpy (etc.) in C

For doing string concatenation, I've been doing basic strcpy, strncpy of char* buffers. Then I learned about the snprintf and friends.
Should I stick with my strcpy, strcpy + \0 termination? Or should I just use snprintf in the future?
For most purposes I doubt the difference between using strncpy and snprintf is measurable.
If there's any formatting involved I tend to stick to only snprintf rather than mixing in strncpy as well.
I find this helps code clarity, and means you can use the following idiom to keep track of where you are in the buffer (thus avoiding creating a Shlemiel the Painter algorithm):
char sBuffer[iBufferSize];
char* pCursor = sBuffer;
pCursor += snprintf(pCursor, sizeof(sBuffer) - (pCursor - sBuffer), "some stuff\n");
for(int i = 0; i < 10; i++)
{
pCursor += snprintf(pCursor, sizeof(sBuffer) - (pCursor - sBuffer), " iter %d\n", i);
}
pCursor += snprintf(pCursor, sizeof(sBuffer) - (pCursor - sBuffer), "into a string\n");
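One caveat with the cursor idiom: snprintf returns the length the output would have had, not the number of bytes actually stored, so after a truncation the cursor can end up past the end of the buffer. A hedged sketch of a wrapper that guards against this follows; append_fmt is a hypothetical name, not a library function.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: appends formatted text at offset *used in a buffer of
 * `size` bytes. Returns 1 if everything fit, 0 if the output was truncated
 * or an encoding error occurred. *used never exceeds size - 1. */
int append_fmt(char *buf, size_t size, size_t *used, const char *fmt, ...)
{
    if (size == 0 || *used >= size)
        return 0;

    va_list args;
    va_start(args, fmt);
    int n = vsnprintf(buf + *used, size - *used, fmt, args);
    va_end(args);

    if (n < 0)
        return 0;                       /* encoding error */
    if ((size_t)n >= size - *used) {
        *used = size - 1;               /* truncated: buffer is now full */
        return 0;
    }
    *used += (size_t)n;
    return 1;
}
```

Usage mirrors the idiom above: keep a size_t used = 0; next to the buffer and call append_fmt(sBuffer, sizeof sBuffer, &used, " iter %d\n", i) in the loop, stopping when it returns 0.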
snprintf is more robust if you want to format your string. If you only want to concatenate, use strncpy (don't use strcpy) since it's more efficient.
As others did point out already: Do not use strncpy.
strncpy will not zero terminate in case of truncation.
strncpy will zero-pad the whole buffer if string is shorter than buffer. If buffer is large, this may be a performance drain.
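snprintf will zero-terminate (as C99 and POSIX require, provided the size is nonzero). On Windows, older Microsoft toolchains (before Visual Studio 2015) only offered _snprintf, which does not zero-terminate on truncation, so take that into account.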
Note: when using snprintf, use this form:
snprintf(buffer, sizeof(buffer), "%s", string);
instead of
snprintf(buffer, sizeof(buffer), string);
The latter is insecure and - if string depends on user input - can lead to stack smashes, etc.
sprintf has an extremely useful return value that allows for efficient appending.
Here's the idiom:
char buffer[HUGE] = {0};
char *end_of_string = &buffer[0];
end_of_string += sprintf( /* whatever */ );
end_of_string += sprintf( /* whatever */ );
end_of_string += sprintf( /* whatever */ );
You get the idea. This works because sprintf returns the number of characters it wrote to the buffer, so advancing your buffer by that many positions will leave you pointing to the '\0' at the end of what's been written so far. So when you hand the updated position to the next sprintf, it can start writing new characters right there.
Contrast with strcpy, whose return value is required to be useless. It hands you back the same argument you passed it. So appending with strcpy implies traversing the entire first string looking for the end of it. And then appending again with another strcpy call implies traversing the entire first string, followed by the 2nd string that now lives after it, looking for the '\0'. A third strcpy will re-traverse the strings that have already been written yet again. And so forth.
So for many small appends to a very large buffer, the total work with strcpy approaches O(n^2), where n is the number of appends. Which is terrible.
Plus, as others mentioned, they do different things. sprintf can be used to format numbers, pointer values, etc, into your buffer.
I think there is another difference between strncpy and snprintf.
Think about this:
const int N=1000000;
char arr[N];
strncpy(arr, "abce", N);
Usually, strncpy will set the rest of the destination buffer to '\0'. This will cost lots of CPU time. While when you call snprintf,
snprintf(arr, N, "%s", "abce");
it will leave the rest of the buffer unchanged.
I don't know why strncpy will do that, but in this case, I will choose snprintf instead of strncpy.
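The padding difference can be observed with a small experiment like the one below. The function surviving_bytes is made up for this sketch: it fills a 32-byte buffer with 'x', performs the copy, and counts how many 'x' bytes are left. The strncpy padding is required by the C standard; the snprintf result reflects how common implementations behave, since snprintf has no reason to write past the terminator.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical experiment: returns how many marker bytes ('x') survive in a
 * 32-byte buffer after copying src into it with strncpy or snprintf. */
size_t surviving_bytes(int use_snprintf, const char *src)
{
    char buf[32];
    size_t count = 0;

    memset(buf, 'x', sizeof buf);           /* mark every byte */
    if (use_snprintf)
        snprintf(buf, sizeof buf, "%s", src);
    else
        strncpy(buf, src, sizeof buf);      /* pads the remainder with '\0' */

    for (size_t i = 0; i < sizeof buf; i++)
        if (buf[i] == 'x')
            count++;
    return count;
}
```

With a short source string, the strncpy variant leaves no marker bytes at all, while the snprintf variant only overwrites the string and its terminator.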
All *printf functions parse the format string and expand the corresponding arguments, so they are slower than a simple strcpy/strncpy, which only copies bytes from one memory location to another.
My rule of thumb is:
Use snprintf whenever formatting is needed.
Stick to strncpy/memcpy when you only need to copy a block of linear memory.
You can use strcpy whenever you know exactly the size of the buffers you're copying. Don't use it if you don't have full control over the buffer sizes.
strcpy, strncpy, etc. only copy strings from one memory location to another. But with snprintf, you can do more, like formatting the string, copying integers into the buffer, etc.
It purely depends on your requirement which one to use. If as per your logic, strcpy & strncpy is already working for you, there is no need to jump to snprintf.
Also, remember to use strncpy for better safety as suggested by others.
The difference between strncpy and snprintf is that strncpy basically leaves the responsibility for terminating the string with '\0' to you. It terminates dst with '\0' only if src is shorter than n.
Typical examples are:
strncpy(dst, src, n);
// if src is n characters or longer, dst will not contain a
// null-terminated string at this point
dst[n - 1] = '\0';
snprintf(dst, n, "%s", src); // dst will 100% contain null terminated string
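Building on the snprintf line above, the return value can also tell you whether truncation happened, which strncpy cannot. A minimal sketch, with copy_string as an invented name:

```c
#include <stdio.h>

/* Hypothetical helper: copies src into dst (capacity n bytes, n > 0) and
 * always null-terminates. Returns 1 if src fit, 0 if it was truncated. */
int copy_string(char *dst, size_t n, const char *src)
{
    /* snprintf returns the length src would need, excluding the terminator. */
    int needed = snprintf(dst, n, "%s", src);
    return needed >= 0 && (size_t)needed < n;
}
```

A caller can then treat a return value of 0 as an error instead of silently working with a shortened string.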

Resources