Reading a char array using %s i.e string specifier - c

char *ptr=(char*)calloc(n,sizeof(int));
Using the above, we can allocate memory for a char array. But is reading it character by character mandatory? How can I read and access it using %s, i.e. the string format specifier?

Reading character by character is not mandatory and using exactly %s is susceptible to buffer overruns. Specifying the maximum number of characters to read, one less than the number of bytes in the buffer being populated, prevents the buffer overrun. For example "%10s" reads a maximum of ten characters then assigns the null terminating character so the target buffer requires at least 11 bytes.
However, since the code suggests that n is unknown at compile time, the width cannot be written directly into a "%s" literal. It is possible, though, to construct the format string at run time (the format is not required to be a string literal):
char fmt[32];
sprintf(fmt, "%%%ds", n - 1); /* If 'n == 10' then 'fmt == %9s' */
if (1 == scanf(fmt, ptr))
{
printf("[%s]\n", ptr);
}
An alternative would be fgets():
if (fgets(ptr, n, stdin))
{
}
but the behaviour is slightly different:
fgets() does not use whitespace to terminate input.
fgets() will store the newline character if it encounters it.
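The difference shows up when both are fed the same input. This is only a small sketch: demo_scanf_word and demo_fgets_line are hypothetical helper names, and they take a FILE * so that they can be tried on something other than stdin.

```c
#include <stdio.h>

/* Reads one whitespace-delimited word (at most 15 characters) with %s.
   Returns 1 on success, like fscanf. buf must hold at least 16 bytes. */
int demo_scanf_word(FILE *in, char buf[16])
{
    return fscanf(in, "%15s", buf);
}

/* Reads up to one line with fgets; the newline, if seen, is stored.
   Returns buf on success or NULL at end of file or on error. */
char *demo_fgets_line(FILE *in, char *buf, int size)
{
    return fgets(buf, size, in);
}
```

Given the input line "hello world", the first helper stops at the space and stores "hello", while the second stores the whole line including the trailing newline.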
Casting the return value of calloc() (or malloc() or realloc()) is unnecessary (see Do I cast the result of malloc?), and the posted code is confusing as it allocates space for int[n] but is intended to be a character array. Instead:
char* ptr = calloc(n, 1); /* 1 == sizeof(char) */
Also, if a null-terminated string is being read into ptr, the initialisation provided by calloc() is superfluous, so a plain malloc() would suffice:
char* ptr = malloc(n); /* malloc() takes a single size argument */
And remember to free() whatever you malloc()d, calloc()d or realloc()d.
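Putting the pieces together, a minimal sketch of the whole allocate, read, free cycle might look like this. The name read_word_alloc is hypothetical, and the function assumes the caller wants a single whitespace-delimited word of at most n - 1 characters.

```c
#include <stdio.h>
#include <stdlib.h>

/* Allocates n bytes and reads one word of at most n - 1 characters into it.
   Returns the buffer (the caller must free it) or NULL on failure. */
char *read_word_alloc(FILE *in, size_t n)
{
    char fmt[32];
    char *ptr;

    if (n < 2)
        return NULL;              /* no room for a character plus '\0' */
    ptr = malloc(n);
    if (ptr == NULL)
        return NULL;              /* allocation failed */
    snprintf(fmt, sizeof fmt, "%%%zus", n - 1);
    if (fscanf(in, fmt, ptr) != 1) {
        free(ptr);                /* nothing read: do not leak the buffer */
        return NULL;
    }
    return ptr;
}
```

A call such as read_word_alloc(stdin, 10) builds the format "%9s" internally, so the word can never overrun the 10 bytes that were allocated.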

Yes, you can read such an array using %s, but make sure you have allocated enough memory for what you try to read (don't forget the terminating zero character!).

Related

Any simple way to read a string of variable length in C?

I tried reading using:
char *input1, *input2;
scanf("%s[^\n]", input1);
scanf("%s[^\n]", input2);
I am obviously doing something wrong because the second string is read as null. I know using scanf() is not recommended but I couldn't find any other simple way to do the same.
The statement:
char *input1, *input2;
allocates memory for two pointers to char. Note that this only allocates memory for the pointers themselves, which are uninitialised and aren't pointing to anything meaningful, not for what they point to.
The call to scanf() then tries to write to memory out of bounds, and results in undefined behaviour.
You could instead declare character arrays of fixed size with automatic storage duration:
char input1[SIZE];
This will allocate memory for the array, and the call to scanf() will be valid.
Alternatively, you could allocate memory dynamically for the pointers with one of the memory allocation functions:
char *input1 = malloc (size);
This declares a pointer to char whose contents are indeterminate, but are immediately overwritten with a pointer to a chunk of memory of size size. Note that the call to malloc() may have failed. It returns NULL as an error code, so check for it.
But scanf() should not be used as a user-input interface. It does not guard against buffer overflows, and will leave a newline in the input buffer (which leads to more problems down the road).
Consider using fgets instead. It will null-terminate the buffer and read at most size - 1 characters.
The calls to scanf() can be replaced with:
fgets (buf, sizeof buf, stdin);
You can then parse the string with sscanf, strtol, et cetera.
Note that fgets() will retain the trailing newline if there was space. You could use this one-liner to remove it:
buf [strcspn (buf, "\n\r")] = '\0';
This takes care of the carriage return as well, if any.
Or if you wish to continue using scanf() (which I advise against), use a field width to limit input and check scanf()'s return value:
scanf ("%1023s", input1); /* Am using 1023 as a place holder */
That being said, if you wish to read a line of variable length, you need to allocate memory dynamically with malloc(), and then resize it with realloc() as necessary.
On POSIX-compliant systems, you could use getline() to read strings of arbitrary length, but note that it's vulnerable to a DoS attack: it keeps allocating memory for as long as the line goes on.
You can use the m modifier in the format specifier. Note that it is not standard C but rather a standard POSIX extension.
char *a, *b;
scanf("%m[^\n] %m[^\n]", &a, &b);
// use a and b
printf("*%s*\n*%s*\n", a, b);
free(a);
free(b);
There are 2 simple ways to read variable length strings from the input stream:
using fgets() with an array large enough for the maximum length:
char input1[200];
if (fgets(input1, sizeof input1, stdin)) {
/* string was read. strip the newline if present */
input1[strcspn(input1, "\n")] = '\0';
...
} else {
/* nothing was read: premature end of file? */
...
}
on POSIX compliant systems, you can use getline() to read strings of arbitrary length into arrays allocated with malloc():
char *input1 = NULL;
size_t input1_size = 0;
ssize_t input1_length = getline(&input1, &input1_size, stdin);
if (input1_length >= 0) {
/* string was read. length is input1_length */
if (input1_length > 0 && input1[input1_length - 1] == '\n') {
/* remove the newline if present */
input1[--input1_length] = '\0';
}
...
} else {
/* nothing was read: premature end of file? */
...
}
Using scanf is not recommended because it is difficult to use correctly and reading input with "%s" or "%[^\n]" without a specified maximum length is risky as any sufficiently long input will cause a buffer overflow and undefined behavior. Passing uninitialized pointers to scanf as you do in the posted code has undefined behavior.
Any simple way to read a string of variable length in C?
Unfortunately the answer is NO
The input functions (e.g. scanf, fgets, etc.) specified by the C standard all require the caller to provide the input buffer. Once the input buffer is full, the functions will (when used correctly) return. So if the input is longer than the size of the provided buffer, the functions will only read partial input. So the caller must add code to check for partial input and do additional function calls as needed.
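For fgets, that check usually comes down to looking for the newline. A rough sketch, with read_line_checked as a made-up name and the assumption that an over-long line should simply be truncated and the excess thrown away:

```c
#include <stdio.h>
#include <string.h>

/* Reads one line into buf (size should be at least 2).
   Returns 1 if the whole line fit, 0 if it was too long (the excess is
   discarded), and -1 at end of file or on a read error. */
int read_line_checked(FILE *in, char *buf, int size)
{
    size_t len;
    int c;

    if (fgets(buf, size, in) == NULL)
        return -1;
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';      /* complete line: strip the newline */
        return 1;
    }
    /* No newline stored: look at what comes next in the stream. */
    c = getc(in);
    if (c == EOF || c == '\n')
        return 1;                 /* the text itself fit in the buffer */
    while ((c = getc(in)) != '\n' && c != EOF)
        ;                         /* drain the rest of the long line */
    return 0;
}
```

Whether to discard the excess, report an error, or grow the buffer is a design choice; this version just makes the partial read visible to the caller.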
POSIX systems have the getline and getdelim functions, which can do it. So if you can accept limiting your code to POSIX-compliant systems, that's what you want to use.
If you need portable, standard compliant code, you need to write your own function. For that you need to look into functions like realloc, fgets, strcpy, memcpy, etc. It's not a simple task but it's not "rocket science" either. It's been done many, many times before... and if you search the net, it's very likely you can find an open source implementation that you can just copy (make sure to follow the rules for doing that).
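To give an idea of the shape of such a function, here is one possible sketch built only on fgets and realloc. The name read_line_dyn and the starting capacity of 64 are arbitrary choices of mine, and the cast to int assumes lines stay well below INT_MAX bytes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads one line of any length from in. Returns a malloc'd string without
   the trailing newline (the caller frees it), or NULL at end of file or
   on allocation failure. */
char *read_line_dyn(FILE *in)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;
    while (fgets(buf + len, (int)(cap - len), in) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[len - 1] = '\0';  /* complete line: drop the newline */
            return buf;
        }
        if (len + 1 == cap) {     /* buffer full, no newline yet: grow */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (len == 0) {               /* nothing read before end of file */
        free(buf);
        return NULL;
    }
    return buf;                   /* last line had no newline */
}
```

The caller loops with something like while ((line = read_line_dyn(stdin)) != NULL) and frees each line after use.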

When will scanf append '\0' to user input?

I understand that scanf("%s",arr); will automatically append '\0' to the user input. But I thought that it was limited only to this. However, even scanf("%[^\n]",arr); appends a '\0' (I know this scans all characters until a newline is reached). In this case, will '\0' be appended after '\n' or before '\n'? Also how do we figure out when and when not '\0' will be appended? How do we scan many characters to just form a character array and not a string?
Also how do we figure out when and when not '\0' will be appended?
As scanf is a standard function, we figure that out from the C standard. From C11 draft 7.21.6.2p12:
s
... the corresponding argument shall be a pointer to the initial element of a character array large enough to accept the sequence and a terminating null character, which will be added automatically. ...
[
... the corresponding argument shall be a pointer to the initial element of a character array large enough to accept the sequence and a terminating null character, which will be added automatically. ...
There is also the cppreference site, which is nicer looking and commonly used as a quick reference.
I strongly suggest you never, ever use %s and %[ without a number specifying the "maximum field width" (unless you know you can use it). Always use %<number>s and %<number>[ to limit the number of characters read and prevent overflow. So always:
char arr[20];
scanf("%19s", arr);
scanf is a very unsafe function in that respect; it's easy to cause a buffer overflow when reading strings.
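As for where the '\0' ends up with %[^\n]: the newline is never stored at all, it stays unread in the input, and the terminator goes right after the last character that was stored. A small sketch using sscanf and %n to make that visible (scan_line_part is a hypothetical name):

```c
#include <stdio.h>

/* Scans up to 19 non-newline characters from src into arr (20 bytes).
   Stores in *consumed how many characters of src were used.
   Returns the sscanf result (1 on success; %n is not counted). */
int scan_line_part(const char *src, char arr[20], int *consumed)
{
    *consumed = 0;
    return sscanf(src, "%19[^\n]%n", arr, consumed);
}
```

For the input "abc\ndef" this stores "abc" followed by '\0' and reports 3 characters consumed, so the '\n' is still waiting to be read.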
How do we scan many characters to just form a character array and not a string?
I think you want:
size_t fread(void * restrict ptr,
size_t size, size_t nmemb,
FILE * restrict stream);
The fread function reads, into the array pointed to by ptr, up to nmemb elements whose size is specified by size, from the stream pointed to by stream.
It will read up to nmemb elements of size size into the memory pointed to by ptr. So just:
char character_array[20];
size_t number_of_characters_read = fread(character_array, 1, 20, stdin); /* size 1, count 20: the return value is the number of characters read */
It will not append the terminating null byte to the character_array.
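Since there is no terminator, the count returned by fread has to travel with the array, and functions that expect a string (printf with plain %s, strlen, and so on) must not be used on it. A sketch with two made-up helpers:

```c
#include <stdio.h>

/* Reads up to count raw characters into dst without adding a '\0'.
   Returns how many characters were actually stored. */
size_t read_chars(FILE *in, char *dst, size_t count)
{
    return fread(dst, 1, count, in);
}

/* Prints exactly count characters of a non-terminated array. */
void print_chars(const char *src, size_t count)
{
    fwrite(src, 1, count, stdout);
}
```

Note that fread does not stop at a newline or any other whitespace; it just copies bytes until the count is reached or the input ends.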

memory allocation error with scanf

I have this simple code, and can't figure out how to allocate memory for scanf
char* string= (char*) malloc (sizeof(char));
printf("insert string: \n");
scanf("%s", string);
free(string);
doesn't matter how many chars my string is, it's an error. I want to use malloc for the char*, any way to set memory for scanf.
You are requesting just 1 byte. You need to allocate more than that if you want to store more than just the 0 terminator in the string:
char* string= malloc (256); //256 bytes, arbitrary value...
I removed:
sizeof(char) because it's always guaranteed to be 1.
(char*) because casting the return value of malloc() is needless.
I would also recommend using fgets() instead of scanf() to prevent overflowing the buffer. The same could be done with scanf() by specifying the length in the format string. But I personally prefer using fgets() and parsing using sscanf() if necessary.
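The fgets() plus sscanf() combination could look roughly like this. It is only a sketch: read_name_age is a hypothetical name, and the "name age" line layout and the buffer sizes are assumptions made for the example.

```c
#include <stdio.h>

/* Reads one line with fgets and parses "name age" from it with sscanf.
   name must hold at least 32 bytes. Returns 1 on success, 0 on a
   malformed line, and -1 at end of file. */
int read_name_age(FILE *in, char name[32], int *age)
{
    char line[128];

    if (fgets(line, sizeof line, in) == NULL)
        return -1;
    if (sscanf(line, "%31s %d", name, age) != 2)
        return 0;
    return 1;
}
```

The advantage over calling scanf() directly is that a malformed line is consumed as a whole, so the next call starts cleanly on the following line.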
Try
char buf[256];
scanf("%255s", buf); /* the field width keeps the read inside the 256-byte buffer */
Delete your malloc.
You're allocating a tiny piece of memory for what could be a string of variable size. In most expressions an array decays to a pointer to its first element, so passing buf gives scanf() the same kind of argument that string did. You'll need a buffer to capture the initial read, and then you can allocate the proper amount of memory afterwards.
Afterwards you can do string = malloc(strlen(buf) + 1) and copy buf into it with strcpy(),
so that the allocation matches the length of the string.
As @Blue Moon correctly pointed out, you are allocating memory to store just one character, and the only string that fits in a single character is the empty string. In C, all strings are null terminated, so the 1 byte of memory you have allocated can accommodate only the null character, i.e. '\0'.
In order to store more characters in your string, say a hundred characters, you can do the following:
#define MAX_STRING_SIZE 102
char* string= (char*) malloc (sizeof(char) * MAX_STRING_SIZE);
memset(string, '\0', MAX_STRING_SIZE);
printf("Insert string: \n");
scanf("%101s", string);
if (strlen(string) == (MAX_STRING_SIZE - 1))
{
printf("The user entered a string larger than 100 characters!");
free(string);
/****Error handling***/
}
/*****Do something with string****/
free(string);
The above snippet is a nice little trick to find out whether the user entered a valid number of characters. If you wanted the user to enter a maximum of 100 characters and your program always ran in an ideal scenario (where the user listens to you all the time and never enters more than 100 characters), you would need just one extra byte to store the null character. Hence, you would need space for 101 characters. But you can add space for another character and allocate 102 bytes instead, to detect the cases where the user enters a string longer than 100 characters.
How this works: Say the user enters 100+ characters. The scanf function can read a maximum of 101 characters. So, we can use the strlen function to check if the number of characters input by the user are equal to 101 or not, which will help us to validate the input string.

Specifying the maximum string length to scanf dynamically in C (like "%*s" in printf)

I can specify the maximum amount of characters for scanf to read to a buffer using this technique:
char buffer[64];
/* Read one line of text to buffer. */
scanf("%63[^\n]", buffer);
But what if we do not know the buffer length when we write the code? What if it is the parameter of a function?
void function(FILE *file, size_t n, char buffer[n])
{
/* ... */
fscanf(file, "%[^\n]", buffer); /* WHAT NOW? */
}
This code is vulnerable to buffer overflows as fscanf does not know how big the buffer is.
I remember seeing this before and started to think that it was the solution to the problem:
fscanf(file, "%*[^\n]", n, buffer);
My first thought was that the * in "%*[^\n]" meant that the maximum string size is passed as an argument (in this case n). This is the meaning of the * in printf.
When I checked the documentation for scanf I found out that it means that scanf should discard the result of [^\n].
This left me somewhat disappointed as I think that it would be a very useful feature to be able to pass the buffer size dynamically for scanf.
Is there any way I can pass the buffer size to scanf dynamically?
Basic answer
There isn't an analog to the printf() format specifier * in scanf().
In The Practice of Programming, Kernighan and Pike recommend using snprintf() to create the format string:
size_t sz = 64; /* size of buffer in bytes */
char format[32];
snprintf(format, sizeof(format), "%%%zus", sz - 1); /* the width excludes the terminating null */
if (scanf(format, buffer) != 1) { …oops… }
Extra information
Upgrading the example to a complete function:
int read_name(FILE *fp, char *buffer, size_t bufsiz)
{
char format[16];
snprintf(format, sizeof(format), "%%%zus", bufsiz - 1);
return fscanf(fp, format, buffer);
}
This emphasizes that the size in the format specification is one less than the size of the buffer (it is the number of non-null characters that can be stored without counting the terminating null). Note that this is in contrast to fgets() where the size (an int, incidentally; not a size_t) is the size of the buffer, not one less. There are multiple ways of improving the function, but it shows the point. (You can replace the s in the format with [^\n] if that's what you want.)
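As an illustration of the [^\n] variant, here is a sketch that also guards against a buffer too small to hold anything. The name read_line_part is hypothetical; the format array is 32 bytes so that even a very large size_t value fits.

```c
#include <stdio.h>

/* Reads up to bufsiz - 1 characters of the current line into buffer,
   leaving the newline unread. Returns the fscanf result, or 0 when the
   buffer is too small to hold any character. */
int read_line_part(FILE *fp, char *buffer, size_t bufsiz)
{
    char format[32];

    if (bufsiz < 2)
        return 0;
    /* For bufsiz == 64 this builds the format "%63[^\n]". */
    snprintf(format, sizeof format, "%%%zu[^\n]", bufsiz - 1);
    return fscanf(fp, format, buffer);
}
```

One thing to keep in mind is that %[^\n] fails to match on an empty line, so the function returns 0 there and the caller still has to consume the newline.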
Also, as Tim Čas noted in the comments, if you want (the rest of) a line of input, you're usually better off using fgets() to read the line, but remember that it includes the newline in its output (whereas %63[^\n] leaves the newline to be read by the next I/O operation). For more general scanning (for example, 2 or 3 strings), this technique may be better — especially if used with fgets() or getline() and then sscanf() to parse the input.
Also, the TR 24731-1 'safe' functions, implemented by Microsoft (more or less) and standardized in Annex K of ISO/IEC 9899-2011 (the C11 standard), require a length explicitly:
if (scanf_s("%[^\n]", buffer, sizeof(buffer)) != 1)
...oops...
This avoids buffer overflows, but probably generates an error if the input is too long. The size could/should be specified in the format string as before:
if (scanf_s("%63[^\n]", buffer, sizeof(buffer)) != 1)
...oops...
if (scanf_s(format, buffer, sizeof(buffer)) != 1)
...oops...
Note that the warning (from some compilers under some sets of flags) about 'non-constant format string' has to be ignored or suppressed for code using the generated format string.
There is indeed no variable width specifier in the scanf family of functions. Alternatives include creating the format string dynamically (though this seems a bit silly if the width is a compile-time constant) or simply accepting the magic number. One possibility is to use preprocessor macros for specifying both the buffer and format string width:
#define STR_VALUE(x) STR(x)
#define STR(x) #x
#define MAX_LEN 63
char buffer[MAX_LEN + 1];
fscanf(file, "%" STR_VALUE(MAX_LEN) "[^\n]", buffer);
Another option is to #define the length of the string:
#define STRING_MAX_LENGTH "%10s"
or
#define DOUBLE_LENGTH "%5lf"
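Used in context, that could look like the sketch below. WORD_MAX_LEN, WORD_FORMAT and scan_word are names invented for the example; the drawback of this approach is that the number inside the format macro has to be kept in sync with the buffer size by hand.

```c
#include <stdio.h>

#define WORD_MAX_LEN 10           /* longest word we accept */
#define WORD_FORMAT  "%10s"       /* must match WORD_MAX_LEN by hand */

/* Scans one word of at most WORD_MAX_LEN characters from src.
   Returns the sscanf result (1 on success). */
int scan_word(const char *src, char word[WORD_MAX_LEN + 1])
{
    return sscanf(src, WORD_FORMAT, word);
}
```

With a longer input, only the first 10 characters are stored and the rest is left for the next conversion.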

How to use scanf \ fscanf to read a line and parse into variables?

I'm trying to read a text file built with the following format in every line:
char*,char*,int
i.e.:
aaaaa,dfdsd,23
bbbasdaa,ddd,100
I want to use fscanf to read a line from the file and automatically parse the line into the variables string1, string2, intA.
What's the correct way of doing it ?
Thanks
Assuming you have:
char string1[20];
char string2[20];
int intA;
you could do:
fscanf(file, "%19[^,],%19[^,],%d\n", string1, string2, &intA);
%[^,] reads a string of non-comma characters and stops at the first comma. 19 is the maximum number of characters to read (assuming a buffer size of 20) so that you don't have buffer overflows.
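It is worth checking the return value so that a malformed line does not go unnoticed. A minimal sketch, with read_record as a hypothetical wrapper; the leading space in the format is my addition, so that the newline left over from the previous record is skipped before the first field:

```c
#include <stdio.h>

/* Parses one "text,text,number" record from file.
   Returns 1 if all three fields were read, 0 otherwise. */
int read_record(FILE *file, char string1[20], char string2[20], int *intA)
{
    return fscanf(file, " %19[^,],%19[^,],%d", string1, string2, intA) == 3;
}
```

A loop such as while (read_record(file, string1, string2, &intA)) then processes the file record by record and stops at the first line that does not fit the pattern or at end of file.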
If you really cannot make any safe assumption about the length of a line, you should use getline(). This function takes three arguments: a pointer to a string buffer (char **), a pointer to a size_t holding the size of that buffer, and a file pointer, and returns the length of the line read (or -1 on failure). getline() dynamically allocates space for the string (using malloc / realloc) and thus you do not need to know the length of the line and there are no buffer overruns. Of course, it is not as handy as fscanf, because you have to split the line manually.
Example:
char *line=NULL;      /* getline() allocates the buffer for us */
size_t n=0;
ssize_t len;
FILE *f=fopen("...","r");
if((len=getline(&line,&n,f))>0)
{
...
}
free(line);
fclose(f);
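The manual splitting could be done along these lines. This is only a sketch for the "text,text,number" layout from the question; read_split_record is a made-up name, and the two strings are returned as pointers into the line buffer rather than copied.

```c
#define _POSIX_C_SOURCE 200809L   /* for getline() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads one "text,text,number" line with getline() and splits it in place.
   On success returns 1 and sets *line_out to the malloc'd line (the caller
   frees it), with *s1 and *s2 pointing into it. Returns 0 otherwise. */
int read_split_record(FILE *f, char **line_out, char **s1, char **s2, long *num)
{
    char *line = NULL, *comma1, *comma2;
    size_t n = 0;

    if (getline(&line, &n, f) <= 0) {
        free(line);               /* getline may allocate even on failure */
        return 0;
    }
    comma1 = strchr(line, ',');
    comma2 = comma1 ? strchr(comma1 + 1, ',') : NULL;
    if (comma2 == NULL) {
        free(line);
        return 0;                 /* fewer than three fields */
    }
    *comma1 = '\0';               /* terminate the first field */
    *comma2 = '\0';               /* terminate the second field */
    *s1 = line;
    *s2 = comma1 + 1;
    *num = strtol(comma2 + 1, NULL, 10);
    *line_out = line;
    return 1;
}
```

Because s1 and s2 point into the line, they stay valid only until the caller frees it.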

Resources