sscanf() not properly parsing the input line - c

I'm writing a method to parse a string in a specific format, "55555;fhihehj;"
I have used sscanf in the past to do something similar, so I thought why not.
Here is my current code.
char toBreak[] = "55555;fjfjfhhj;";
char* strNum = malloc(256); //256 * sizeof(char) = 256
char* name = malloc(256);
if (sscanf(toBreak, "%[^;];%[^;];", strNum, name)!=2)
return -1;
printf("%s, %s\n", strNum, name);
For some reason, it isn't parsing the string correctly and I am not sure why.

Taking your compilable code and making it into an SSCCE (Short, Self-Contained, Correct Example) gives us:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
char toBreak[] = "55555;fjfjfhhj;";
char *strNum = malloc(256);
char *name = malloc(256);
if (sscanf(toBreak, "%255[^;];%255[^;];", strNum, name) != 2)
return -1;
printf("%s, %s\n", strNum, name);
return 0;
}
I added the 255 to prevent buffer overflows in the general case where the input is not a constant in the program. Here, of course, the 256 byte allocations are very much larger than necessary, and overflow is not a problem. Note the 'off-by-one' on the length; the length in the format does not include the null byte, but there must be space for the null byte.
When compiled and run on Mac OS X 10.9 Mavericks with GCC 4.8.2, that gives:
55555, fjfjfhhj
What platform are you running on — o/s and compiler? What do you get with the code I show?
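To illustrate the width limit described in this answer, here is a minimal sketch. The helper name parse_pair and the 16-byte buffers are assumptions made for the example, not part of the original code. Each width is one less than the buffer size, so sscanf stops before it overflows and still has room for the null byte.

```c
#include <stdio.h>

/* Hypothetical helper: splits "num;name;" into two caller-supplied
 * 16-byte buffers. The width 15 leaves room for the terminating null byte.
 * Returns 0 on success, -1 if a field is missing, empty, or too long. */
int parse_pair(const char *line, char num[16], char name[16])
{
    char end = '\0';

    /* The trailing %c captures the character after the second field, so a
     * field that was cut short by the width limit is detected as an error. */
    if (sscanf(line, "%15[^;];%15[^;]%c", num, name, &end) != 3 || end != ';')
        return -1;
    return 0;
}
```

A field longer than 15 characters makes the function return -1 instead of silently truncating, which is usually the safer behaviour for untrusted input.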

Related

malloc causes <Access violation writing location>

I am trying to read a file line by line and get each line as a char * pointing to a dynamically allocated string. The code I am using used to work, and without my changing it (or noticing a change) it has stopped working: accessing the read information results in an error. Here is an MRE of my code for getting one line:
#include <stdio.h>
#include <string.h>
#define MAX_STR_SIZE 10000
int main(void)
{
char* filePath; // is set by other working part of program to a real readable file address.
while (fgetc(filePath) != EOF) // an extra character is in the data files to account for this check.
{
char tempStr[MAX_STR_SIZE] = { 0 };
char* str = NULL;
fgets(tempStr, MAX_STR_SIZE, filePath);
tempStr[strcspn(tempStr, "\n")] = 0;
str = malloc(sizeof(char) * (strlen(tempStr) + 1)); // does not work
strcpy(str, tempStr);
}
}
The error:
Exception thrown at 0x00007ff95448d215 in GifProject.exe: Access violation writing location 0xFFFFFFFFEA1854F0.
It is difficult to diagnose your problem without a complete compilable program that exhibits it, but from the code fragment and the debugging information in the image, it seems you do not include <stdlib.h>. Without a declaration, the compiler assumes that malloc() returns int, which leads to undefined behavior when you store the return value into the pointer str: the compiler generates code that converts the return value from int to char *, sign-extending from 32 bits to 64 bits and producing a meaningless pointer.
Note that you should also test the return value of fgets to properly handle end of file, and you should test for potential malloc failure before calling strcpy or better: use strdup that allocates and copies a string in a single call.
Here is a modified version:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_STR_SIZE 4096
char *readline(FILE *file) {
char tempStr[MAX_STR_SIZE];
if (!fgets(tempStr, sizeof tempStr, file)) {
/* end of file: return a null pointer */
return NULL;
}
/* strip the trailing newline if any */
tempStr[strcspn(tempStr, "\n")] = '\0';
/* allocate a copy of the string and return it */
return strdup(tempStr);
}
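As a rough sketch of the other option mentioned in this answer (checking malloc before copying instead of calling strdup), the function might look like the following. The names readline_malloc and count_lines are hypothetical and exist only for this example.

```c
#include <stdio.h>
#include <stdlib.h>   /* declares malloc and free; leaving this out was the original bug */
#include <string.h>

#define MAX_STR_SIZE 4096

/* Hypothetical variant of readline() that calls malloc directly.
 * Returns a heap copy of the next line without its newline,
 * or NULL at end of file or if the allocation fails. */
char *readline_malloc(FILE *file)
{
    char tempStr[MAX_STR_SIZE];

    if (!fgets(tempStr, sizeof tempStr, file))
        return NULL;
    tempStr[strcspn(tempStr, "\n")] = '\0';

    size_t len = strlen(tempStr);
    char *str = malloc(len + 1);
    if (str == NULL)               /* check before copying */
        return NULL;
    memcpy(str, tempStr, len + 1); /* copies the terminating null byte too */
    return str;
}

/* Reads every remaining line of the stream, frees each copy,
 * and returns how many lines were read. */
size_t count_lines(FILE *file)
{
    size_t count = 0;
    char *line;

    while ((line = readline_malloc(file)) != NULL) {
        count++;
        free(line);
    }
    return count;
}
```

The loop in count_lines shows the calling pattern: the NULL return ends the loop, and every non-NULL result is released with free.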

How to concatenate char pointers using strcat in c? [duplicate]

I'm learning pointers in C, using Linux. I'm trying to use the strcat function, but it doesn't work and I don't understand why.
I'm passing a username to main as an argument because I need to put the number 1 in front of this username. For example, if I get username123 as the argument, I need to convert it to 1username123.
I got this code:
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[]){
const char *userTemp;
char *finalUser;
userTemp = argv[1]; //I got the argument passed from terminal
finalUser = "1";
strcat(finalUser, userTemp); //To concatenate userTemp to finalUser
printf("User: %s\n",finalUser);
return 0;
}
The code compiles, but I get a segmentation fault and don't know why. Can you please point me in the right direction?
It is undefined behaviour in C to attempt to modify a string literal (like "1"). Often, these are stored in non-modifiable memory to allow for certain optimisations.
Let's leave aside for the moment the fact that your entire program can be replaced with:
#include <stdio.h>
int main(int argc, char *argv[]){
printf("User: 1%s\n", (argc > 1) ? argv[1] : "");
return 0;
}
The way you ensure you have enough space is to create a buffer big enough to hold whatever you want to do. For example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[]){
// Check args provided.
if (argc < 2) {
puts("User: 1");
return 0;
}
// Allocate enough memory ('1' + arg + '\0') and check it worked.
char *buff = malloc(strlen(argv[1]) + 2);
if (buff == NULL) {
fprintf(stderr, "No memory\n");
return 1;
}
// Place data into memory and print.
strcpy(buff, "1");
strcat(buff, argv[1]);
printf("User: %s\n", buff);
// Free memory and return.
free(buff);
return 0;
}
What you shouldn't do is to allocate a fixed size buffer and blindly copy in the data provided by a user. That's how the vast majority of security problems occur, by people overwriting buffers with unexpected data.
I'm trying to use the strcat function, but it doesn't work and I don't understand why.
For starters, you really shouldn't use strcat(). Use strlcat() instead where it is available (it is a BSD extension, not part of standard C). The "l" versions of this and other functions take an extra parameter that lets you tell the function how large the destination buffer is, so that the function can avoid writing past the end of the buffer. strcat() doesn't have that parameter, so it relies on you to make sure the buffer is large enough to contain both strings. This is a common source of security problems in C code. The "l" version also makes sure that the resulting string is null-terminated.
The code compiles, but I get a segmentation fault and don't know why.
Here's the prototype for the function: char *strcat( char *dest, const char *src );
Now, you're calling that essentially like this: strcat("1", someString);. That is, you're trying to append someString to "1", which is a string constant. There's no extra room in "1" for whatever string is in someString, and because you're using a function that will happily write past the end of the destination buffer, your code is effectively writing over whatever happens to be in memory next to that string constant.
To fix the problem, you should:
Switch to strlcat().
Use malloc() or some other means to allocate a destination buffer large enough to hold both strings.
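Because strlcat() is not available on every platform, the sketch below swaps in snprintf, which is standard C and also takes the destination size. This is a different technique from the one named in the list, shown only as a portable way to get the same bounds checking. The helper name prefix_user is an assumption made for the example.

```c
#include <stdio.h>

/* Hypothetical helper: writes "1" followed by user into dst, never writing
 * more than dstsize bytes. Returns 0 if the whole result fit, or -1 if it
 * was truncated or formatting failed. */
int prefix_user(char *dst, size_t dstsize, const char *user)
{
    int n = snprintf(dst, dstsize, "1%s", user);

    /* snprintf returns the length the full result would have had,
     * so a value >= dstsize means the output was cut short. */
    if (n < 0 || (size_t)n >= dstsize)
        return -1;
    return 0;
}
```

With a 16-byte buffer and the argument username123, the buffer ends up holding 1username123. With a 4-byte buffer the function returns -1 and the buffer holds only the null-terminated prefix that fit.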
Unlike in other languages there is no real string type in C.
You want this:
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[]){
const char *userTemp;
char finalUser[100]; // finalUser can contain at most 99 characters
userTemp = argv[1]; //I got the argument passed from terminal
strcpy(finalUser, "1"); // copy "1" into the finalUser buffer
strcat(finalUser, userTemp); //To concatenate userTemp to finalUser
printf("User: %s\n",finalUser);
return 0;
}
or even simpler:
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[]){
char finalUser[100]; // finalUser can contain at most 99 characters
strcpy(finalUser, "1"); // copy "1" into the finalUser buffer
strcat(finalUser, argv[1]); //To concatenate argv[1] to finalUser
printf("User: %s\n",finalUser);
return 0;
}
Disclaimer: for the sake of brevity this code contains a fixed size buffer and no check for buffer overflow is done here.
The chapter dealing with strings in your C text book should cover this.
BTW you also should check if the program is invoked with an argument:
int main(int argc, char *argv[]){
if (argc != 2)
{
printf("you need to provide a command line argument\n");
return 1;
}
...
You're missing some fundamentals about C.
finalUser = "1";
This is created in "read-only" memory. You cannot mutate this. The first argument of strcat requires memory allocated for mutation, e.g.
char finalUser[32];
finalUser[0] = '1';
finalUser[1] = '\0'; /* strcat needs a null-terminated destination */

Reading an unknown length line from stdin in c with fgets

I am trying to read an unknown length line from stdin using the C language.
I have seen this when looking on the net:
char** str;
gets(&str);
But it seems to cause me some problems and I don't really understand how it is possible to do it this way.
Can you explain to me why this example works or doesn't work,
and what the correct way to implement it would be (with malloc?)
You don't want a pointer to pointer to char, use an array of chars
char str[128];
or a pointer to char
char *str;
if you choose a pointer you need to reserve space using malloc
str = malloc(128);
Then you can use fgets
fgets(str, 128, stdin);
and remove the trailing newline
char *ptr = strchr(str, '\n');
if (ptr != NULL) *ptr = '\0';
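To read an arbitrarily long line, you can use getline (originally a GNU libc extension, standardized in POSIX.1-2008):
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
char *foo(FILE * f)
{
size_t n = 0;
ssize_t result;
char *buf = NULL; /* must start as NULL so that getline allocates the buffer */
result = getline(&buf, &n, f);
if (result < 0) {
free(buf); /* getline may have allocated a buffer even on failure */
return NULL;
}
return buf;
}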
or your own implementation using fgets and realloc:
/* Named read_line because getline would clash with the POSIX function declared in <stdio.h> */
char *read_line(FILE * f)
{
size_t size = 0;
size_t len = 0;
char *buf = NULL;
do {
size += BUFSIZ; /* BUFSIZ is defined as "the optimal read size for this platform" */
char *tmp = realloc(buf, size); /* realloc(NULL,n) is the same as malloc(n) */
if (tmp == NULL) {
free(buf);
return NULL;
}
buf = tmp;
/* Actually do the read. fgets puts a terminating '\0' at buf[len],
so the next chunk is written starting there to overwrite it */
if (fgets(buf + len, BUFSIZ, f) == NULL) {
if (len == 0) { /* nothing read at all: end of file or error */
free(buf);
return NULL;
}
break; /* end of file after a partial line */
}
len += strlen(buf + len);
} while (buf[len - 1] != '\n');
return buf;
}
Call it using
char *str = read_line(stdin);
if (str == NULL) {
perror("read_line");
exit(EXIT_FAILURE);
}
...
free(str);
Firstly, gets() provides no way of preventing a buffer overrun. That makes it so dangerous it has been removed from the latest C standard. It should not be used. However, the usual usage is something like
char buffer[20];
gets(buffer); /* pray that user enters no more than 19 characters in a line */
Your usage is passing gets() a pointer to a pointer to a pointer to char. That is not what gets() expects, so your code would not even compile.
That element of prayer reflected in the comment is why gets() is so dangerous. If the user enters 20 (or more) characters, gets() will happily write data past the end of buffer. There is no way a programmer can prevent that in code (short of accessing hardware to electrocute the user who enters too much data, which is outside the realm of standard C).
To answer your question, however, the only ways involve allocating a buffer of some size, reading data in some controlled way until that size is reached, reallocating if needed to get a greater size, and continuing until a newline (or end-of-file, or some other error condition on input) is encountered.
malloc() may be used for the initial allocation. malloc() or realloc() may be used for the reallocation (if needed). Bear in mind that a buffer allocated this way must be released (using free()) when the data is no longer needed - otherwise the result is a memory leak.
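The approach described above (allocate, read in a controlled way, grow when full, stop at newline or end of file) can be sketched as follows. This is a minimal illustration and not a tuned implementation. The name read_any_line and the starting capacity of 16 are assumptions made for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line of any length from f.
 * Returns a malloc'd string without the newline, or NULL if end of file
 * is reached before any character is read, or if allocation fails.
 * The caller must free the result. */
char *read_any_line(FILE *f)
{
    size_t cap = 16;   /* assumed starting capacity */
    size_t len = 0;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;

    while ((c = fgetc(f)) != EOF && c != '\n') {
        if (len + 1 == cap) {           /* keep one byte free for '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);              /* avoid leaking the old buffer */
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }

    if (c == EOF && len == 0) {         /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Doubling the capacity keeps the number of realloc calls logarithmic in the line length. Assigning the realloc result to a temporary pointer first is what makes it possible to free the old buffer when growing fails.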
Use the getline() function. It returns the length of the line and stores a pointer to the contents of the line in an allocated memory area. (Be sure to pass the line pointer to free() when you are done with it.)
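A minimal sketch of that usage, assuming a POSIX.1-2008 system where getline() is declared in <stdio.h>. The helper name longest_line is hypothetical and only serves to show the calling pattern.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* request POSIX.1-2008 unless the build already set a level */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>           /* ssize_t */

/* Hypothetical helper: returns the length of the longest line in f,
 * not counting the trailing newline. */
size_t longest_line(FILE *f)
{
    char *line = NULL;   /* getline allocates and grows this buffer */
    size_t cap = 0;      /* current capacity, maintained by getline */
    size_t longest = 0;
    ssize_t n;

    while ((n = getline(&line, &cap, f)) != -1) {
        size_t len = (size_t)n;
        if (len > 0 && line[len - 1] == '\n')
            len--;                       /* ignore the newline itself */
        if (len > longest)
            longest = len;
    }
    free(line);          /* a single free covers the reused buffer */
    return longest;
}
```

The same buffer is reused across iterations, so it is freed once after the loop and not once per line.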
"Reading an unknown length line from stdin in c with fgets"
Late response - A Windows approach:
The OP does not specify Linux or Windows, but the viable answers posted in response for this question all seem to have the getline() function in common, which is POSIX only. Functions such as getline() and popen() are very useful and powerful but sadly are not included in Windows environments.
Consequently, implementing such a task in a Windows environment requires a different approach. The link here describes a method that can read input from stdin and has been tested up to 1.8 gigabytes on the system it was developed on. (Also described in the link.) The simple code snippet below was tested using the following command line to read large quantities on stdin:
cd c:\dev && dir /s // approximately 1.8Mbyte buffer is returned on my system
Simple example:
#include <stdio.h>
#include <stdlib.h>
#include "cmd_rsp.h"
int main(void)
{
char *buf = {0};
buf = calloc(100, 1);//initialize buffer to some small value
if(!buf)return 0;
cmd_rsp("dir /s", &buf, 100);//recursive directory search on Windows system
printf("%s", buf);
free(buf);
return 0;
}
cmd_rsp() is fully described in the links above, but it is essentially a Windows implementation that includes popen() and getline() like capabilities, packaged up into this very simple function.
If you want to read a string of unknown length, try the following code.
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
char *m = NULL;
printf("please input a string\n");
/* %ms makes scanf allocate a buffer large enough for the word it reads */
if (scanf("%ms", &m) != 1)
fprintf(stderr, "Could not read a string (end of input or out of memory)\n");
else
{
printf("this is the string %s\n",m);
/* ... any other use of m */
free(m);
}
return 0;
}
Note that %ms is specified by POSIX.1-2008 and %as is the older GNU extension; neither is part of standard C.

Concatenating char* in C fails due to segmentation fault

I'm trying to create two paths in order to copy a file from one folder to another.
I get a segmentation fault the second time I try to concatenate args[1].
I tried copying the element to another char array with strcpy, but it didn't help, and neither did many other things I tried.
I guess something in those string functions is messing with my char array and doesn't let me do the concatenation twice.
The path should be of the form
"Server/File#"
or "Client#/File#"
where # is the argument from args.
I looked all over and saw some similar things, but not exactly this.
Please help.
All the needed includes are in there.
void copy_file(char *args[]){
char dst_path[100],src_path[100];
memset(dst_path,0,100);
memset(src_path,0,100);
strcpy(dst_path,"Client");
strcat(dst_path,args[0]);
strcat(dst_path,"/File");
strcat(dst_path,args[1]);
strcpy(src_path,"Server/File");
strcat(src_path,args[1]);
}
This code is liable to segfault because there is no bounds checking and the destination buffers can easily overflow.
Also, you do not check the number of elements in args[] array. There may be fewer arguments than you expect, probably args[1] is NULL.
To fix:
Check the number of elements in args[] array.
Calculate the required buffer size for your final string, allocate a buffer of that size and then format the string. Alternatively, use snprintf to format the string in one call. snprintf does bound checks for you, so that you do not overflow your destination buffer, e.g:
char dst_path[16384];
int n = snprintf(dst_path, sizeof dst_path, "Client%s/File%s", args[0], args[1]);
if (n < 0 || (size_t)n >= sizeof dst_path) {
// formatting failed or dst_path is not large enough
return;
}
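The first alternative in this answer (calculate the required size, allocate, then format) can be sketched with two snprintf calls. The helper name make_dst_path is an assumption made for the example. The path layout follows the "Client#/File#" form given in the question.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: builds "Client<client>/File<file>" in a buffer of
 * exactly the right size. Returns NULL on failure. Caller frees the result. */
char *make_dst_path(const char *client, const char *file)
{
    /* With a NULL buffer and size 0, snprintf writes nothing and only
     * reports how many characters the result needs. */
    int n = snprintf(NULL, 0, "Client%s/File%s", client, file);
    if (n < 0)
        return NULL;

    char *path = malloc((size_t)n + 1);   /* +1 for the terminating null */
    if (path == NULL)
        return NULL;

    snprintf(path, (size_t)n + 1, "Client%s/File%s", client, file);
    return path;
}
```

This avoids choosing a fixed buffer size at all, at the cost of formatting the string twice.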
Hopefully you can gather from this where you have mis-stepped. Tested with GCC 4.8.3. The long and short of it is that you are overflowing your buffer.
/* gcc -g -Wall -Wextra main.c */
#include <assert.h>
#include <string.h>
#define BUFSIZE 30
void copy_file(char* args[]) {
char dst_path[BUFSIZE];
char src_path[BUFSIZE];
int i;
for (i = 0; i < BUFSIZE; i++) { //initializing - tried without it too.
dst_path[i] = 0;
src_path[i] = 0; }
assert(strlen(dst_path) + strlen("Client") < BUFSIZE);
strcpy(dst_path, "Client");
assert(strlen(dst_path) + strlen(args[0]) < BUFSIZE);
strcat(dst_path, args[0]);
assert(strlen(dst_path) + strlen("/File") < BUFSIZE);
strcat(dst_path, "/File");
assert(strlen(dst_path) + strlen(args[1]) < BUFSIZE);
strcat(dst_path, args[1]);
assert(strlen(src_path) + strlen("Server/File") < BUFSIZE);
strcpy(src_path, "Server/File");
assert(strlen(src_path) + strlen(args[1]) < BUFSIZE);
strcat(src_path, args[1]); }
int main(int argc, char* argv[]) {
copy_file(&argv[1]);
return 0; }
One of the main reasons you are having buffer overflows is most likely your use of strcpy and strcat. Neither takes a maximum length, so they copy until they find the terminating null character '\0' in the source; if the source is longer than the space left in the destination, or is not terminated at all, they write past the end of the buffer. You can use strncpy to cap the number of bytes copied, but note that strncpy does not add a terminating null character when the source does not fit, so it is good practice to always set the last character of your buffer to '\0' after writing to it. After that, strlen can safely be used to get the length of the string.
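A minimal sketch of that strncpy pattern, with the explicit termination it relies on. The helper name copy_bounded is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: copies src into dst, truncating if necessary.
 * dstsize is the full size of dst and must be greater than 0.
 * Returns 1 if the whole string fit, 0 if it was truncated. */
int copy_bounded(char *dst, size_t dstsize, const char *src)
{
    /* Copy at most dstsize - 1 bytes so the last byte stays free. */
    strncpy(dst, src, dstsize - 1);

    /* strncpy does not terminate the result when src is too long,
     * so the terminator is always written explicitly. */
    dst[dstsize - 1] = '\0';

    return strlen(src) < dstsize;
}
```

Truncation is reported to the caller because a silently shortened path would name a different file.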

What size should I allow for strerror_r?

The OpenGroup POSIX.1-2001 defines strerror_r, as does The Linux Standard Base Core Specification 3.1. But I can find no reference to the maximum size that could be reasonably expected for an error message. I expected some define somewhere that I could put in my code but there is none that I can find.
The code must be thread safe. Which is why strerror_r is used and not strerror.
Does anyone know the symbol I can use? Or should I create my own?
Example
int result = gethostname(p_buffy, size_buffy);
int errsv = errno;
if (result < 0)
{
char buf[256];
char const * str = strerror_r(errsv, buf, 256);
syslog(LOG_ERR,
"gethostname failed; errno=%d(%s), buf='%s'",
errsv,
str,
p_buffy);
return errsv;
}
From the documents:
The Open Group Base Specifications Issue 6:
ERRORS
The strerror_r() function may fail if:
[ERANGE] Insufficient storage was supplied via strerrbuf and buflen to
contain the generated message string.
From the source:
glibc-2.7/glibc-2.7/string/strerror.c:41:
char *
strerror (errnum)
int errnum;
{
...
buf = malloc (1024);
Having a sufficiently large static limit is probably good enough for all situations.
If you really need to get the entire error message, you can use the GNU version of strerror_r, or you can use the standard version
and poll it with successively larger buffers until you get what you need. For example,
you may use something like the code below.
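#define _POSIX_C_SOURCE 200112L /* select the standard (XSI) strerror_r, which returns int */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* Call strerror_r and get the full error message. Allocate memory for the
* entire string with malloc. Return string. Caller must free string.
* If malloc fails, return NULL.
*/
char *all_strerror(int n)
{
char *s;
size_t size;
int rc;
size = 1024;
s = malloc(size);
if (s == NULL)
return NULL;
/* The standard strerror_r returns the error number on failure; glibc
* before 2.13 returned -1 and set errno instead, so accept both. */
while ((rc = strerror_r(n, s, size)) == ERANGE || (rc == -1 && errno == ERANGE)) {
size *= 2;
char *tmp = realloc(s, size);
if (tmp == NULL) {
free(s);
return NULL;
}
s = tmp;
}
return s;
}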
int main(int argc, char **argv)
{
for (int i = 1; i < argc; ++i) {
int n = atoi(argv[i]);
char *s = all_strerror(n);
printf("[%d]: %s\n", n, s);
free(s);
}
return 0;
}
I wouldn't worry about it - a buffer size of 256 is far more than sufficient, and 1024 is overkill. You could use strerror() instead of strerror_r(), and then optionally strdup() the result if you need to store the error string. This isn't thread-safe, though. If you really need to use strerror_r() instead of strerror() for thread safety, just use a size of 256. In glibc-2.7, the longest error message string is 49 characters, or 50 bytes with the terminating null ("Invalid or incomplete multibyte or wide character"). I wouldn't expect future error messages to be significantly longer (in the worst case, a few bytes longer).
This program:
#include <stdio.h>
#include <errno.h>
#include <string.h>
int main(){
const int limit = 5;
int unknowns = 0;
int maxlen = 0;
int i=0; char* s = strerror(i);
while(1){
if (maxlen<strlen(s)) maxlen = strlen(s);
if (/*BEGINS WITH "Unknown "*/ 0==strncmp("Unknown ", s , sizeof("Unknown ")-1) )
unknowns++;
printf("%.3d\t%s\n", i, s);
i++; s=strerror(i);
if ( limit == unknowns ) break;
}
printf("Max: %d\n", maxlen);
return 0;
}
lists and prints all the errors on the system and keeps track of the maximum length. By the looks of it, the length does not exceed 49 characters (pure strlen's without the final \0) so with some leeway, 64–100 should be more than enough.
I got curious if the whole buffer size negotiation couldn't simply be avoided by returning structs and whether there was a fundamental reason for not returning structs. So I benchmarked:
#define _POSIX_C_SOURCE 200112L //or else the GNU version of strerror_r gets used
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
typedef struct { char data[64]; } error_str_t;
error_str_t strerror_reent(int errn) __attribute__((const));
error_str_t strerror_reent(int errn){
error_str_t ret;
strerror_r(errn, ret.data, sizeof(ret));
return ret;
}
int main(int argc, char** argv){
int reps = atoi(argv[1]);
char buf[64];
volatile int errn = 1;
for(int i=0; i<reps; i++){
#ifdef VAL
error_str_t err = strerror_reent(errn);
#else
strerror_r(errn, buf, 64);
#endif
}
return 0;
}
and the performance difference between the two at -O2 is minimal:
gcc -O2 : The VAL version is slower by about 5%
g++ -O2 -x c++ : The VAL version is faster by about 1% than the standard version compiled as C++ and by about 4% faster than the standard version compiled as C (surprisingly, even the slower C++ version beats the faster C version by about 3%).
In any case, I think it's extremely weird that strerror is even allowed to be thread unsafe. Those returned strings should be pointers to string literals. (Please enlighten me, but I can't think of a case where they should be synthesized at runtime). And string literals are by definition read only and access to read only data is always thread safe.
Nobody has provided a definitive answer yet, so I looked into this further. There is a better function for the job, perror(3): you will probably want to display this error somewhere, and perror is what I'd recommend unless your requirements really prevent it.
That's not a full answer, but the reason to use it is that it uses a properly sized buffer suitable for any locale. It internally uses strerror_r(3). Both functions conform to the POSIX standard and are widely available, so in my eyes they are an authoritative source of truth in this matter.
excerpt from glibc implementation:
static void
perror_internal (FILE *fp, const char *s, int errnum)
{
char buf[1024];
const char *colon;
const char *errstring;
if (s == NULL || *s == '\0')
s = colon = "";
else
colon = ": ";
errstring = __strerror_r (errnum, buf, sizeof buf);
(void) __fxprintf (fp, "%s%s%s\n", s, colon, errstring);
}
From this I can infer that, at this moment in time and given the stability of such things, in the foreseeable future you will never go wrong with a buffer size of 1024 chars.

Resources