C - can sscanf read from the same string it's writing to?

I have the following code
int myInt;
sscanf(str, "%d=%s", &myInt, str);
Will this be valid? Is there a better way to do this if I have it in a loop?

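My guess is that this will usually work, because it seems the write position never overtakes the read position, and that would seem to give deterministic, as-specified results.
But I still wouldn't do it. Library functions typically have restrict-qualified parameters in order to allow for optimizations and prefetch. Since C99, sscanf is declared as int sscanf(const char * restrict s, const char * restrict format, ...), so modifying the string s points to through another pointer (here, the %s argument) during the call is undefined behavior.
Don't tempt the compiler.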

You can use the %n conversion specification, which stores the number of characters read so far. It requires an int * argument (ref. CStdLib.html#fscanf).
Worth noting: the standard says this about %n:
No input is consumed.
The corresponding argument shall be a pointer to a signed integer, into which is written the number of characters read so far.
It does not increment the assignment count returned on completion.
If the conversion specification includes an assignment-suppressing character or a field width, the behavior is undefined.
Ref. ISO/IEC 9899:1999 §7.19.6.2
As a proof of concept:
#include <stdio.h>

int main(void)
{
    int n;
    const char *str = "12345=blabla 1313=blah "
                      "333=hello 343=goodbyeeeeeeeeeeeeeeeeeeeeeeeeeeee";
    char buf[15];
    int lt;
    /*                      +----- limit read to buffer length - 1
                            |  +--- store read length here
                            |  |  */
    while (sscanf(str, "%d=%14s%n", &n, buf, &lt) == 2) {
        fprintf(stdout,
                ":: '%s' :: \n"
                "Num: %d\n"
                "Str: %s\n"
                "Tot: %d bytes\n\n",
                str,
                n, buf,
                lt);
        str += lt;
    }
    return 0;
}
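This should give something like the output below. The overlong word is cut to 14 characters, and the loop then stops because the leftover "eee..." is not a number: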
:: '12345=blabla 1313=blah 333=hello 343=goodbyeeeeeeeeeeeeeeeeeeeeeeeeeeee' ::
Num: 12345
Str: blabla
Tot: 12 bytes
:: ' 1313=blah 333=hello 343=goodbyeeeeeeeeeeeeeeeeeeeeeeeeeeee' ::
Num: 1313
Str: blah
Tot: 10 bytes
:: ' 333=hello 343=goodbyeeeeeeeeeeeeeeeeeeeeeeeeeeee' ::
Num: 333
Str: hello
Tot: 10 bytes
:: ' 343=goodbyeeeeeeeeeeeeeeeeeeeeeeeeeeee' ::
Num: 343
Str: goodbyeeeeeeee
Tot: 19 bytes
There are many ways to handle input longer than the buffer. For example, check whether the end of str has been reached and, if not, realloc the buffer. Or start out with a buffer as long as str, etc.
Note that numbers > INT_MAX or < INT_MIN cause undefined behavior with the "%d" specification. They will (normally?) be clamped to INT_MAX or INT_MIN respectively, but that is not guaranteed.
E.g.:
"1234533333333333333=blabla", read by "%d=%s%n" =>
Num: 2147483647
Str: blabla
Tot: 26 bytes consumed
One way to tackle this is to use strtol() and friends. If the number is out of range, they are defined to return the maximum (or minimum) value for the type and set errno to ERANGE (ref. CStdLib.html#strtol).
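Here is a sketch of the strtol() route. The helper name parse_pair and its return codes are made up for this illustration; they are not from any library. The function parses one "number=word" pair and reports an out-of-range number instead of invoking undefined behavior. It advances the input pointer much like the %n loop above does.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse one "number=word" pair from *src.
 * On success: stores the number in *num, copies at most bufsize - 1 chars
 * of the word into buf, advances *src past the pair and returns 1.
 * Returns 0 on malformed input, -1 if the number does not fit in an int. */
static int parse_pair(const char **src, int *num, char *buf, size_t bufsize)
{
    const char *p = *src;
    char *end;

    errno = 0;
    long val = strtol(p, &end, 10);         /* skips leading white space */
    if (end == p || *end != '=')
        return 0;                           /* no digits, or no '=' after them */
    if (errno == ERANGE || val > INT_MAX || val < INT_MIN)
        return -1;                          /* strtol clamped and set ERANGE, or too big for int */

    p = end + 1;                            /* skip the '=' */
    size_t len = strcspn(p, " \t\n");       /* the word ends at white space */
    if (len == 0 || bufsize == 0)
        return 0;
    size_t copy = len < bufsize - 1 ? len : bufsize - 1;
    memcpy(buf, p, copy);
    buf[copy] = '\0';

    *num = (int)val;
    *src = p + len;                         /* continue after the whole word */
    return 1;
}
```

Unlike the sscanf loop, this truncates an overlong word to fit buf and skips the rest of it, so the loop can continue. Whether that is what you want depends on your input.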

Related

In C, what would happen if I put 'successive wchar_t characters' into a wchar_t variable?

#include <stdio.h>
#include <wchar.h>
wchar_t wc = L' 459';
printf("%d", wc); //result : 32
I know the 'space' is 'decimal 32' in ASCII code table.
What I don't understand is this: as far as I know, if a variable doesn't have enough space to store a value, it gets the 'last digits' of the original value.
For example, if I put the binary value '1100 1001 0011 0110' into a single-byte variable, it becomes '0011 0110', which is 'the last byte' of the original binary value.
But the code above shows 'the first byte' of the original value.
I'd like to know what happens at the memory level when I execute the code above.
unsigned long long x = 0x0041004200430044ULL;
printf("%016llx\n", x);  //prints 0041004200430044
wchar_t wc;
wc = x;
printf("%04X\n", wc);    //prints 0044 as you expect (2-byte wchar_t on Windows)
wc = L'\x0041\x0042\x0043\x0044';
printf("%04X\n", wc);    //prints 0041, uses the first character
If you assign an integer value that's too large, it is converted to the unsigned 16-bit wchar_t by reducing it modulo 2^16. Only the low-order 2 bytes, 0x0044, are kept.
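The sketch below shows the "keep the low-order bytes" rule without depending on the size of wchar_t (2 bytes on Windows, typically 4 on Linux). It uses uint16_t and uint8_t from <stdint.h> as stand-ins for a 16-bit wchar_t and a single-byte variable. The function names are only for illustration.

```c
#include <stdint.h>

/* Conversion to an unsigned type is defined as reduction modulo 2^N,
 * which keeps only the N low-order bits. uint16_t stands in for the
 * 16-bit wchar_t used on Windows; uint8_t for a single-byte variable. */
static uint16_t to_u16(unsigned long long x)
{
    return (uint16_t)x;   /* same as x % 65536, or x & 0xFFFF */
}

static uint8_t to_u8(unsigned long long x)
{
    return (uint8_t)x;    /* same as x % 256, or x & 0xFF */
}
```

to_u16(0x0041004200430044ULL) yields 0x0044. to_u8(0xC936), which is the binary 1100 1001 0011 0110 from the question, yields 0x36. Both keep the last bytes, as the question expects. The multi-character literal case is different because it never involves such a conversion.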
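If you try to put several characters into one wide character constant, the compiler (MSVC here) takes the first one, 0x0041, which fits. The value of a wide character constant containing more than one character is implementation-defined. L'x' is meant to be a single wide character.
VS2019 will issue a warning for wchar_t wc = L' 459', unless the warning level is set below 3, which is not recommended. Use warning level 3 or higher.
In C++, wchar_t is a distinct built-in type, not a typedef for unsigned short. In C it is a typedef from <stddef.h>. It is 2 bytes on Windows, the same size as unsigned short, and typically 4 bytes on Linux.
Note that 'abcd' is a multi-character constant of type int (4 bytes on common platforms). With the L prefix each character would need 2 bytes on Windows, so the four characters of L'abcd' would need 8 bytes. A wide character constant is only a single wchar_t, though, so only one of them can be kept.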
To see what is inside wc, let's look at the Unicode character L'X'. Its UTF-16 encoding is 0x0058, and UTF-16 values below 128 are the same as ASCII.
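#include <stdio.h>
#include <string.h>
#include <wchar.h>
int main(void)
{
    wchar_t wc = L'X';
    printf("%lc\n", (wint_t)wc);      /* print it as a character */
    unsigned char buf[sizeof wc];
    memcpy(buf, &wc, sizeof wc);      /* copy the raw bytes of wc */
    for (size_t i = 0; i < sizeof wc; i++)
        printf("%02X ", buf[i]);      /* dump them in memory order */
    printf("\n");
    return 0;
}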
On Windows the output will be 58 00. With a 4-byte wchar_t on Linux, it will be 58 00 00 00. It is not 00 58 because Windows runs on little-endian hardware, which stores the least significant byte first.
Another oddity is that UTF-16 uses 4 bytes (a surrogate pair) for code points above U+FFFF. So on Windows you will get a warning for this line:
wchar_t wc = L'😀';
Instead, use a string:
const wchar_t *wstr = L"😀";
::MessageBoxW(0, wstr, 0, 0); //console may not display this correctly
On Windows this string is 6 bytes: 2 UTF-16 code units plus the null terminator, at 2 bytes each.
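The small sketch below shows why 😀 needs two UTF-16 code units. Code points above U+FFFF are split into a high and a low surrogate. The function name utf16_encode is only for this illustration, and it assumes a valid code point.

```c
#include <stdint.h>

/* Encode one Unicode code point as UTF-16. Returns the number of 16-bit
 * code units written to out (1 or 2). Assumes cp is a valid scalar value:
 * at most 0x10FFFF and not itself a surrogate. */
static int utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                       /* BMP: one code unit */
        return 1;
    }
    cp -= 0x10000;                                   /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));        /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));      /* low surrogate: bottom 10 bits */
    return 2;
}
```

For L'X' (U+0058) it produces the single unit 0x0058. For 😀 (U+1F600) it produces 0xD83D 0xDE00, which is 4 bytes plus 2 more for the terminator in L"😀".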

How much space to allocate for printing long int value in string?

I want to store a long value (LONG_MAX in my test program) in a dynamically allocated string, but I'm confused about how much memory I need to allocate for the number to be displayed in the string.
My first attempt:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>
int main(void)
{
char *format = "Room %lu somedata\n";
char *description = malloc(sizeof(char) * strlen(format) + 1);
sprintf(description, format, LONG_MAX);
puts(description);
return 0;
}
Compiled with
gcc test.c
And then running it (and piping it into hexdump):
./a.out | hd
Returns
00000000 52 6f 6f 6d 20 39 32 32 33 33 37 32 30 33 36 38 |Room 92233720368|
00000010 35 34 37 37 35 38 30 37 20 62 6c 61 62 6c 61 0a |54775807 blabla.|
00000020 0a |.|
00000021
Looking at the output, it seems my allocation of sizeof(char) * strlen(format) + 1 is wrong (too little memory allocated) and it only works by accident?
What is the correct amount to allocate then?
My next idea was (pseudo-code):
sizeof(char) * strlen(format) + strlen(LONG_MAX) + 1
This seems too complicated and pretty non-idiomatic. Or am I doing something totally wrong?
You are doing it totally wrong. LONG_MAX is an integer constant, so you can't call strlen() on it. And it's not the number that gives the longest result; LONG_MIN is, because it prints a minus sign as well.
A nice method is to write a function
char *mallocprintf(const char *format, ...)
which takes the same arguments as printf and returns a string allocated with malloc with exactly the right length. Here is how to do this. First, figure out what a va_list is and how to use it. Then figure out how to use vsnprintf to find out how long the result of printf would be without actually printing. Then call malloc, and call vsnprintf again to produce the string.
This has the big advantage that it also works when you print strings using %s, or strings using %s with some large field width. Guess how many characters %999999d prints.
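Here is a minimal sketch of that recipe, assuming C99 for vsnprintf and va_copy. mallocprintf is the hypothetical name suggested above, not a standard function.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* mallocprintf: like printf, but returns a malloc'ed string of exactly the
 * required size. The caller must free() it. Returns NULL on error. */
static char *mallocprintf(const char *format, ...)
{
    va_list ap, ap2;
    va_start(ap, format);
    va_copy(ap2, ap);                          /* a va_list can only be traversed once */
    int len = vsnprintf(NULL, 0, format, ap);  /* length, excluding the null */
    va_end(ap);

    char *s = NULL;
    if (len >= 0 && (s = malloc((size_t)len + 1)) != NULL)
        vsnprintf(s, (size_t)len + 1, format, ap2);  /* second pass writes it */
    va_end(ap2);
    return s;
}
```

For example, mallocprintf("Room %ld somedata\n", LONG_MAX) returns a buffer of exactly the right size. The va_copy is needed because the first vsnprintf call consumes the argument list.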
You can use snprintf() to figure out the length without worrying about the size of LONG_MAX.
When you call snprintf with a NULL buffer and a size of 0, it returns the number of characters (excluding the terminating null) that would have been written. Then you know exactly how many bytes are required.
const char *format = "Room %ld somedata\n";   // %ld matches a long argument
// Returns the number of characters that would have been written, excluding the null
int len = snprintf(NULL, 0, format, LONG_MAX);
char *description = (len < 0) ? NULL : malloc(len + 1);
if (!description)
{
    /* error handling */
}
snprintf(description, len + 1, format, LONG_MAX);
Convert the predefined numeric constant to a string using macro expansion, as explained in "convert digital to string in macro":
#define STRINGIZER_(exp) #exp
#define STRINGIZER(exp) STRINGIZER_(exp)
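(code courtesy of Whozcraig). Then you can use
size_t max_digit = strlen(STRINGIZER(LONG_MAX)) + 1;
or
size_t max_digit = strlen(STRINGIZER(LONG_MIN));
for signed values, and
size_t max_digit = strlen(STRINGIZER(ULONG_MAX));
for unsigned values.
The value of LONG_MAX is a compile-time value, not a run-time one, so this writes the constant for your compiler into the executable. Be aware, though, that the stringized text is the macro's replacement list as spelled in <limits.h>. That may be a hex literal or an expression such as (-LONG_MAX - 1L) rather than plain decimal digits, so check what it produces on your implementation.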
To allocate enough room, consider the worst case:
// Over approximate log10(pow(2,bit_width))
#define MAX_STR_INT(type) (sizeof(type)*CHAR_BIT/3 + 3)
char *format = "Room %lu somedata\n";
size_t n = strlen(format) + MAX_STR_INT(unsigned long) + 1;
char *description = malloc(n);
sprintf(description, format, LONG_MAX);
Pedantic code would allow for potential other locales and use snprintf():
snprintf(description, n, format, LONG_MAX);
Yet in the end, I recommend a 2x buffer:
char *description = malloc(n*2);
sprintf(description, format, LONG_MAX);
Note: printing a long such as LONG_MAX with the specifier "%lu", which is meant for unsigned long, is undefined behavior. I suggest ULONG_MAX:
sprintf(description, format, ULONG_MAX);
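The reasoning behind MAX_STR_INT is that a decimal digit carries log2(10) ≈ 3.32 bits. Dividing the bit count by 3 therefore over-approximates the number of digits, and the + 3 leaves room for rounding, a minus sign and the null terminator. A quick way to check the bound on a given platform is to compare it with the lengths that snprintf(NULL, 0, ...) reports for the widest values. This is a sketch, and the helper name max_str_int_is_enough is made up.

```c
#include <limits.h>
#include <stdio.h>

/* Same macro as above: over-approximates the decimal length of a value of
 * the given integer type, including a minus sign and the null terminator. */
#define MAX_STR_INT(type) (sizeof(type)*CHAR_BIT/3 + 3)

/* Returns 1 if the widest values of long, unsigned long and long long
 * (plus a null terminator) fit in MAX_STR_INT of their type. */
static int max_str_int_is_enough(void)
{
    /* snprintf(NULL, 0, ...) returns the length without the null */
    int l   = snprintf(NULL, 0, "%ld", LONG_MIN);
    int ul  = snprintf(NULL, 0, "%lu", ULONG_MAX);
    int ll  = snprintf(NULL, 0, "%lld", LLONG_MIN);
    return l >= 0 && ul >= 0 && ll >= 0
        && (size_t)l + 1  <= MAX_STR_INT(long)
        && (size_t)ul + 1 <= MAX_STR_INT(unsigned long)
        && (size_t)ll + 1 <= MAX_STR_INT(long long);
}
```

On a platform with 64-bit long, for example, the macro gives 24 while the worst case needs 21 bytes.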
With credit to the answer by @Jongware, I believe the ultimate way to do this is the following:
#define STRINGIZER_(exp) #exp
#define STRINGIZER(exp) STRINGIZER_(exp)
const size_t LENGTH = sizeof(STRINGIZER(LONG_MAX)) - 1;
Stringizing produces a string literal, which includes a null terminator; hence the - 1.
And note that since everything is a compile-time constant, you could just as well declare the string as
const char *format = "Room " STRINGIZER(LONG_MAX) " somedata\n";
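You cannot use the length of the format string. Assuming a 32-bit long, you need to count:
LONG_MAX = 2147483647 = 10 characters
"Room  somedata\n" = 15 characters
Add the null = 26 characters
so using
malloc(26)
should suffice. With a 64-bit long, as in your output, LONG_MAX has 19 digits, so you need 35.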
You have to allocate a number of chars equal to the number of digits of LONG_MAX, which is 2147483647 for a 32-bit long. So you have to allocate 10 chars more.
In your format string you have
Room = 4 chars
somedata\n = 9
spaces = 2
null termination = 1
Then you have to malloc 26 chars.
If you want to determine at run time how many digits your number has, you can count them by repeatedly dividing by 10. Use a do-while so that 0 counts as one digit:
do
{
    n /= 10; /* n = n / 10 */
    ++count;
} while (n != 0);
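Here is a self-contained version of that digit-by-digit idea, as a sketch. The function name decimal_length is hypothetical. It divides a negative value directly instead of negating it, because C99 division truncates toward zero. That way LONG_MIN cannot overflow.

```c
/* Number of characters needed to print n with "%ld", excluding the
 * terminating null: one per digit, plus one for a minus sign. */
static int decimal_length(long n)
{
    int count = (n < 0) ? 1 : 0;   /* room for '-' */
    do {
        n /= 10;                   /* truncates toward zero, also for n < 0 */
        ++count;
    } while (n != 0);
    return count;
}
```

The allocation then becomes the length of the fixed text plus decimal_length(value) plus 1 for the null terminator.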
Another way is to temporarily store the sprintf result in a local buffer and then malloc strlen(tempStr)+1 chars.
Usually this is done by formatting into a "known" large-enough buffer on the stack and then dynamically allocating whatever is needed to fit the formatted string, i.e.:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>
int main(void)
{
    char buffer[1024];
    snprintf(buffer, sizeof buffer, "Room %ld somedata\n", LONG_MAX);
    char *description = malloc(strlen(buffer) + 1);
    if (description == NULL)
        return 1;
    strcpy(description, buffer);
    puts(description);
    free(description);
    return 0;
}
Here, using strlen(format) for the allocation is problematic. It counts the characters "%lu" literally, not the width of the value that %lu can print.
You should consider the maximum possible value for unsigned long,
ULONG_MAX 4294967295
which is 10 characters for a 32-bit unsigned long. A 64-bit one, 18446744073709551615, is 20.
So you have to allocate space for
the literal text of the string, plus
10 chars (at most) for the decimal representation of the %lu value, plus
1 char to represent a - sign, in case the value is negative, plus
1 null terminator.
Well, if long is 32 bits on your machine, then LONG_MAX should be 2147483647, which is 10 characters long. You need to account for that, the rest of your string, and the null character.
Keep in mind that long is a signed type with a maximum value of LONG_MAX, and you are using %lu, which is meant to print an unsigned long value. If you may pass a negative value, add an extra character for the minus sign. Otherwise, you might use ULONG_MAX to make your limits clearer.
If you are unsure which architecture you are running on, you might use something like:
// digits + minus sign + null terminator for a signed 32- or 64-bit long
#define NUM_CHARS ((sizeof(long) == 8) ? 21 : 12)
Or play it safe and simply use 21. :)
Use the following code to calculate the number of characters needed to hold the decimal representation of any positive integer:
#include <math.h>
...
size_t characters_needed_decimal(long long unsigned int llu)
{
    size_t s = 1;
    if (0 != llu)
    {
        /* floating-point rounding may make this off by one near powers of 10 */
        s += (size_t)log10((double)llu);
    }
    return s;
}
Remember to add 1 when storing the number in a C string, since C strings are null-terminated.
Use it like this:
#include <limits.h>
#include <stdlib.h>
#include <stdio.h>
size_t characters_needed_decimal(long long unsigned int);
int main(void)
{
size_t s = characters_needed_decimal(LONG_MAX);
++s; /* 1+ as we want a sign */
char * p = malloc(s + 1); /* add one for the 0-termination */
if (NULL == p)
{
perror("malloc() failed");
exit(EXIT_FAILURE);
}
sprintf(p, "%ld", LONG_MAX);
printf("LONG_MAX = %s\n", p);
sprintf(p, "%ld", LONG_MIN);
printf("LONG_MIN = %s\n", p);
free(p);
return EXIT_SUCCESS;
}
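Safest:
Rather than predicting the allocation needed, use asprintf(). This function allocates memory as needed.
char *description = NULL;
if (asprintf(&description, "Room %ld somedata\n", LONG_MAX) == -1)
    description = NULL; /* allocation failed; the pointer's contents are undefined */
asprintf() is not standard C, but it is common on *nix systems. With glibc, define _GNU_SOURCE before including <stdio.h>. Its source code is available to accommodate other systems.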

Unable to understand scanf() behaviour in the following code

Input -
COMETQ
HVNGAT
Here's the code -
#include <stdio.h>
#include <stdlib.h>
#define MAXLEN 6
main(void)
{
char comet[MAXLEN], group[MAXLEN];
unsigned long int result[2] = { 1,1 };
short int i, j;
scanf("%s",comet);
scanf("%s",group);
printf("\nComet's Name: %s\nGroup's Name: %s",comet,group);
printf("\nComet's No.: %ld\nGroup's No.: %ld",result[0],result[1]);
i = j = 0;
while(comet[i]!='\0' && i<MAXLEN){
result[0] *= (comet[i] - 'A' + 1);
i++;
}
while(group[j]!='\0' && j<MAXLEN){
result[1] *= (comet[j] - 'A' + 1);
j++;
}
printf("\nComet's No.: %ld\nGroup's No.: %ld",result[0],result[1]);
printf("\nComet's No. Mod 47: %ld\nGroup's No. Mod 47: %ld",result[0]%47,result[1]%47);
if(result[0]%47 == result[1]%47)
printf("\nGO");
else
printf("\nSTAY");
exit(0);
}
Now, as far as I know, scanf() reads a string until whitespace is detected. But here, the output is:
Comet's Name: COMETQHVNGAT
Group's Name: HVNGAT
Comet's No.: 1
Group's No.: 1
Comet's No.: -534663680
Group's No.: 994500
Comet's No. Mod 47: 43
Group's No. Mod 47: 27
STAY
But, shouldn't it be like this?
comet = "COMETQ" & Group = "HVNGAT"
I don't understand why this isn't happening.
In addition, when comet is only 6 bytes, how can it store COMETQHVNGAT?
If you want to use a char array as a string, you need to null-terminate it.
In your code, the array length is specified as 6, whereas your inputs are themselves 6 characters long. That leaves no room to store the terminating null character, which is why comet and group produce weird output when passed to printf().
Because strings are null-terminated, the memory accesses in your case go beyond the allocated size of comet and group, producing undefined behaviour.
Safer alternative:
Use fgets() to read the input, limited by the size of the buffer. Then get rid of the trailing \n character, and you're good to go. Check the man page for details.
Also, you need to change the format specifier %ld to %lu for unsigned long int; %ld is for signed long int. Separately, note that the second loop multiplies by comet[j], where group[j] was presumably intended.
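Here is a sketch of the fgets() approach; read_line is a hypothetical helper name. Note that fgets() also stores the '\n'. To read a 6-letter name in one call, the buffer therefore needs room for 6 letters + '\n' + '\0', i.e. MAXLEN of at least 8. With a smaller buffer, the newline is left behind for the next call.

```c
#include <stdio.h>
#include <string.h>

/* Read one line from fp into buf (at most size - 1 chars) and strip the
 * trailing newline. Returns buf, or NULL at end of file or on a read error.
 * Sketch only: a line longer than the buffer is split across calls. */
static char *read_line(char *buf, int size, FILE *fp)
{
    if (fgets(buf, size, fp) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';   /* remove the '\n' if present */
    return buf;
}
```

In the question's program this would be read_line(comet, sizeof comet, stdin) followed by read_line(group, sizeof group, stdin), checking both for NULL.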
It's because of the buffer size #define MAXLEN 6. It causes undefined behavior, since there is no space for the terminating character \0.
COMETQ\0
0123456
Define MAXLEN as 7 (or larger).
When the arrays are declared, their memory is (here) allocated contiguously, with group directly after comet. comet has no room for its null character, so its terminator is written into the first byte of group. It is then overwritten when group is read, which is why printing comet prints both strings.
If the input is shorter than MAXLEN, the output is correct. Make sure MAXLEN is large enough (and limit the read to it), because users can enter names of any length.
Check:
If you print the addresses of the first characters of both arrays, you will see that they differ by 6 bytes.

sscanf function usage in c

I'm trying to parse a string in xxxxxx(xxxxx) format using sscanf as follows:
sscanf(command, "%s(%s)", part1, part2)
but it seems like sscanf does not support this format, and as a result part1 actually contains the whole string.
If anyone has experience with this, please share...
Thank you
Converting your code into a program:
#include <stdio.h>
int main(void)
{
char part1[32];
char part2[32];
char command[32] = "xxxxx(yyyy)";
int n;
if ((n = sscanf(command, "%s(%s)", part1, part2)) != 2)
printf("Problem! n = %d\n", n);
else
printf("Part1 = <<%s>>; Part2 = <<%s>>\n", part1, part2);
return 0;
}
When run, it produces 'Problem! n = 1'.
This is because the first %s conversion specifier skips leading white space and then scans for 'non white-space' characters up to the next white space character (or, in this case, end of string).
You would need to use (negated) character classes or scansets to get the result you want:
#include <stdio.h>
int main(void)
{
char part1[32];
char part2[32];
char command[32] = "xxxxx(yyyy)";
int n;
if ((n = sscanf(command, "%31[^(](%31[^)])", part1, part2)) != 2)
printf("Problem! n = %d\n", n);
else
printf("Part1 = <<%s>>; Part2 = <<%s>>\n", part1, part2);
return 0;
}
This produces:
Part1 = <<xxxxx>>; Part2 = <<yyyy>>
Note the 31s in the format; they prevent overflows.
I'm wondering how %31 works. Does it work like %s and prevent overflow, or does it just prevent overflow?
With the given data, these two lines are equivalent and both safe enough:
if ((n = sscanf(command, "%31[^(](%31[^)])", part1, part2)) != 2)
if ((n = sscanf(command, "%[^(](%[^)])", part1, part2)) != 2)
The %[...] notation is a conversion specification; so is %31[...].
The C standard says:
Each conversion specification is introduced by the character %.
After the %, the following appear in sequence:
An optional assignment-suppressing character *.
An optional decimal integer greater than zero that specifies the maximum field width (in characters).
An optional length modifier that specifies the size of the receiving object.
A conversion specifier character that specifies the type of conversion to be applied.
The 31 is an example of the (optional) maximum field width. The [...] part is a scanset, which could perhaps be regarded as a special case of the s conversion specifier. The %s conversion specifier is approximately equivalent to %[^ \t\n].
The 31 is one less than the length of the string; the null at the end is not counted in that length. Since part1 and part2 are each an array of 32 char, the %31[^(] or %31[^)] conversion specifiers prevent buffer overflows. If the first string of characters was more than 31 characters before the (, you'd get a return value of 1 because of a mismatch on the literal open parenthesis. Similarly, the second string would be limited to 31 characters, but you'd not easily be able to tell whether the ) was in the correct place or not.
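If you do need to know whether the ) was matched, one option is to put a %n after the literal ). This is a sketch, and split_call is a made-up name. sscanf only processes the %n if it gets that far, so initialising its target to -1 tells you afterwards whether the closing parenthesis was found:

```c
#include <stdio.h>

/* Split "name(arg)" into part1/part2 (each char[32]). Returns 1 only if
 * both parts were read AND the closing ')' was matched. The %n comes after
 * the literal ')', so end stays -1 unless sscanf got that far. */
static int split_call(const char *command, char part1[32], char part2[32])
{
    int end = -1;
    int n = sscanf(command, "%31[^(](%31[^)])%n", part1, part2, &end);
    return n == 2 && end != -1;
}
```

The return value of sscanf is still 2 for "abc(def" or for a second part longer than 31 characters, because %n does not count as an assignment. The end check is what catches those cases.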
If you know exactly how long the parts of your "command" are, then the simplest option is:
sscanf(command, "%6s(%5s)", part1, part2);
This assumes that 'part1' is always 6 characters long and 'part2' is always 5 characters long (as in your example). The buffers then need room for at least 7 and 6 chars respectively.
Try this instead:
#include <stdio.h>
int main(void)
{
char str1[20];
char str2[20];
sscanf("Hello(World!)", "%[^(](%[^)])", str1, str2);
printf("str1=\"%s\", str2=\"%s\"\n", str1, str2);
return 0;
}
Output (ideone):
str1="Hello", str2="World!"

Limit Output in C

In C, I would like to limit the string to the first 8 characters. For example, I have:
char out = printf("%c", str);
How can I make it so it only returns the first 8 characters?
You can limit the length by setting the precision in the format specifier:
printf("%.8s", str);
This will print up to eight characters from the null-terminated string pointed-to by str. If the length of str is less than eight characters, then it will print the entire string.
Note that the format specifier for a null-terminated string is %s, not %c (%c is to print a single char), and that printf returns an int (the total number of characters printed), not a char.
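The precision can also come from an argument if you write * instead of a number, which is handy when the limit is not fixed. Here is a small sketch; print_prefix is a hypothetical name:

```c
#include <stdio.h>

/* Print at most max characters of str, followed by a newline. With "%.*s"
 * the precision is taken from the int argument that precedes the string.
 * Returns what printf returns: the number of characters printed (an int). */
static int print_prefix(const char *str, int max)
{
    return printf("%.*s\n", max, str);
}
```

print_prefix("123456789", 8) prints 12345678 and returns 9 (eight characters plus the newline). A shorter string such as "abc" is printed in full.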
No.
A plain field width does not truncate. Tabular printing with "%8s" pads the output to at least 8 characters, as in the example given, and a longer string is printed in full (ISO C99). Only a precision, as in "%.8s", truncates. If this is a Windows-only thing, okay, MS ignores the world on lots of things. See:
#include <stdio.h>
int main(void)
{
char tmp[]="123456789";
printf("1 %1s\n", tmp);
printf("2 %2s\n", tmp);
printf("4 %4s\n", tmp);
printf("8 %8s\n", tmp);
printf("16 %16s\n", tmp);
printf("32 %32s\n", tmp);
return 0;
}
output from gcc 3.4.2 on Solaris 5.9:
> ./a.out
1 123456789
2 123456789
4 123456789
8 123456789
16        123456789
32                        123456789
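sprintf() with a precision such as "%.8s" can copy and truncate a string into another buffer, which can then be sent to printf. Or, if you don't care about modifying the source string:
#include <string.h>
/* not named trunc: that name is taken by trunc() in <math.h> */
char *truncate_string(char *src, size_t len)
{
    if (strlen(src) > len)   /* never write past the existing terminator */
        src[len] = '\0';
    return src;
}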
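For the copying option, snprintf() is a safer choice than sprintf(), since it also respects the size of the destination buffer. Here is a sketch of a non-destructive truncating copy; truncated_copy is a hypothetical name:

```c
#include <stdio.h>

/* Copy at most len characters of src into dst and null-terminate it.
 * dstsize must be at least len + 1 to get the whole prefix; snprintf never
 * writes more than dstsize bytes. src itself is left untouched. */
static char *truncated_copy(char *dst, size_t dstsize, const char *src, int len)
{
    snprintf(dst, dstsize, "%.*s", len, src);
    return dst;
}
```

With char out[9], truncated_copy(out, sizeof out, "123456789", 8) leaves "12345678" in out, and the result can be passed to printf("%s", out).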
References: INTERNATIONAL STANDARD ©ISO/IEC ISO/IEC 9899:TC2, WG14/N1124 Committee Draft — May 6, 2005
