using getline and strtok together in a loop - c

I am new to C, and trying to implement whoami, as an exercise to myself. I have following code:
#define _POSIX_SOURCE
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h> // strtok
int str_to_int(const char *str)
{
int acc = 0;
int i;
for (i = 0; str[i] != '\0'; ++i) {
acc = (10 * acc) + (str[i] - 48); // 48 -> 0 in ascii
}
return acc;
}
int main()
{
FILE *passwd;
char *line = NULL;
size_t line_size;
passwd = fopen("/etc/passwd","r");
uid_t uid = getuid();
while (getline(&line, &line_size,passwd) != -1) {
char *name = strtok(line,":");
strtok(line,":"); // passwd
char *user_id = strtok(line,":");
if (str_to_int(user_id) == uid) {
printf("%s\n",name);
break;
}
}
fclose(passwd);
return 0;
}
Do I need to save line pointer inside of the while loop. Because I think strtok modifies it somehow, but I am not sure if I need to copy the line, or starting address of the line before I use it with strtok.

strtok is a horrid function. I don't know what documentation you read (if any?) but it both modifies the buffer it is passed and retains an internal pointer into the buffer; you should only pass the buffer the first time you use it on a given line, and pass NULL subsequently so it knows to pick up where it left off instead of starting at the beginning again (which won't actually work quite right because it stomped on the buffer...).
Better, find some other way to parse and stay far away from strtok.

It might be safer to use strtok_r, which is the better choice in a multi-threaded situation. That may not apply in this case, but it is sometimes better just to assume that at some point any snippet you write might end up in a multi-threaded app. The following is the OP's code modified to use strtok_r.
char *pos;
char *name = strtok_r(line,":",&pos);
strtok_r(NULL,":",&pos); // passwd
char *user_id = strtok_r(NULL,":",&pos);
And, yes, strtok (and strtok_r) do modify the given input buffer (first parameter). But it can be safe if used properly. Since strtok returns a pointer to a buffer inside the given string, you need to be careful how you use it. In your case, when it breaks out of the loop, name and user_id will point to a value inside the line buffer.
And you maybe should read the man pages for getline. The way you are using it, it returns an allocated buffer that your application is responsible for freeing. That might be what you are aiming for, but I mention it because I don't see a free call for it in the posted code.

I totally agree with geekosaur (and Mark). Paraphrasing his comment, you can modify the above code as following:
while (getline(&line, &line_size, passwd) != -1) {
char *name = strtok(line,":");
strtok(NULL,":"); // passwd
char *user_id = strtok(NULL,":");
if (str_to_int(user_id) == uid) {
printf("%s\n",name);
break;
}
}
You should pass NULL for the strtok invocations other than the first one.
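Putting the answers above together, the per-line parsing could be sketched roughly as follows. The helper name parse_passwd_line is hypothetical, strtol stands in for the hand-written str_to_int, and the sketch assumes the password field is never empty (strtok-style functions collapse adjacent delimiters, so an empty field would shift the later ones).

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split one /etc/passwd-style line in place.
 * On success returns 0, points *name into line and stores the numeric id in *uid.
 * Returns -1 if the line has fewer than three fields or the id is not a number. */
int parse_passwd_line(char *line, char **name, long *uid)
{
    char *pos;                                  /* strtok_r's saved position */
    char *user = strtok_r(line, ":", &pos);     /* first call: pass the buffer */
    char *pass = strtok_r(NULL, ":", &pos);     /* later calls: pass NULL */
    char *id   = strtok_r(NULL, ":", &pos);
    char *end;

    if (user == NULL || pass == NULL || id == NULL)
        return -1;

    *uid = strtol(id, &end, 10);
    if (end == id || *end != '\0')              /* reject empty or non-numeric ids */
        return -1;

    *name = user;
    return 0;
}
```

In the while loop you would call this on line after each getline, compare the result with getuid(), and print the name before the next getline call overwrites the buffer. After the loop, free(line) releases the buffer that getline allocated.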

Related

converting unquoted slash to newline without strtok or further memory allocation

I'm trying to figure out why this doesn't work.
I'd like to take data from a file using the 'getline()' function and convert the string so that the slashes ('/') that are not in quotes are replaced with new line characters. I'd like to avoid copying the string to another if possible.
I tried my program below, with two attempts to process the same data. The first attempt wasn't quite right. I expected to see the following in both cases:
ABC
DEF'/'GH
But
printf("%s",newline);
only returns this:
ABC
DEF'/'
and:
printf("%s",newline2);
returns a segmentation fault.
Because the getline() function returns the string as a char array with memory pre-allocated to it, I feel a ridiculous solution would be:
char lines[5000000];
strcpy(lines,datafromgetline);
char* newline=parsemulti(lines,10); //prints data almost correctly
printf("%s",newline);
But could I somehow do this where I don't have to allocate local stack space or memory? Can I somehow modify the incoming data directly without a segmentation fault?
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
// replaces all occurrences of / not within single quotes with a new line character
char* parsemulti(char* input,int inputlen){
char* fms=strchr(input,'/');
char output[100000]; //allocate tons of space
if (!fms){
return input;
}else{
int exempt=0,sz=inputlen;
char aline[5000];
char*inputptr=input,*lineptr=aline;
memset(aline,0,5000);
while(--sz >= 0){
if (*inputptr=='\''){exempt=1-exempt;} //toggle exempt when ' is found
if (*inputptr=='/' && exempt==0){
*lineptr='\0';
strcat(output,aline);
lineptr=aline;
strcat(output,"\r\n");
}else{
*lineptr=*inputptr;lineptr++;
}
inputptr++;
}
if (exempt==1){printf("\nWARNING: Unclosed quotes\n");}
*lineptr='\0';
strcat(output,aline);
strcat(output,"\r\n");
}
strcpy(input,output);
return input;
}
int main(){
char lines[5000];
strcpy(lines,"ABC/DEF'/'GH");
char* newline=parsemulti(lines,10); //prints data almost correctly
printf("%s",newline);
char* lines2="ABC/DEF'/'GH";
char* newline2=parsemulti(lines2,10); //returns segmentation fault
printf("%s",newline2);
return 0;
}
Two lines
char lines[5000];
strcpy(lines, "ABC/DEF'/'GH");
will
allocate memory for 5000 objects of type char on the stack
copy the string literal's contents into the memory named by "lines", which you can modify
On the other hand,
char *lines2 = "ABC/DEF'/'GH";
defines a pointer to a string literal, which is usually located in read-only memory.
Read only, as in do not modify me :)
You tagged this C, so I assume you are talking about the getline() function, which is not part of the C standard but is specified by POSIX.1-2008 and provided by the GNU C Library. It manages memory on its own: it can and will allocate memory unless you preallocate it, and because it uses only heap memory, it reallocates the buffer if the preallocated size is too small. That is why you can't give it the address of a char array on the stack instead.
To actually find and replace a character in a string, I'd say you should not reinvent the wheel; use the library string functions.
char *line = NULL;
char *needle;
ssize_t line_size;
size_t size = 0;
line_size = getline(&line, &size, stdin);
while (line_size != -1) {
needle = strchr(line, '/');
while (needle) {
/* leave a slash alone only when it sits directly between two single quotes */
if (!(needle != line && *(needle - 1) == '\'' && *(needle + 1) == '\''))
*needle = '\n';
needle = strchr(needle + 1, '/');
}
printf("%s", line);
line_size = getline(&line, &size, stdin);
}
free(line); /* getline allocated the buffer, so release it when done */
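The strchr loop above only recognizes a slash that sits directly between two quotes. If quoted sections can be longer, one possible alternative is to track the quote state while walking the string, as the question's own exempt flag does. The following is a minimal sketch under that assumption; the name slashes_to_newlines is hypothetical. Because a newline takes the same space as a slash, the string can be edited in place, provided it lives in writable memory (an array or the buffer returned by getline, not a string literal).

```c
/* Hypothetical helper: replace every '/' outside single quotes with '\n',
 * modifying s in place. Returns 1 if a quote was left open, 0 otherwise. */
int slashes_to_newlines(char *s)
{
    int in_quotes = 0;

    for (; *s != '\0'; ++s) {
        if (*s == '\'')
            in_quotes = !in_quotes;   /* toggle on every single quote */
        else if (*s == '/' && !in_quotes)
            *s = '\n';                /* same length, so no extra buffer is needed */
    }
    return in_quotes;
}
```

The return value lets the caller print the "unclosed quotes" warning itself instead of doing it inside the parser.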

Reading an unknown length line from stdin in c with fgets

I am trying to read an unknown length line from stdin using the C language.
I have seen this when looking on the net:
char** str;
gets(&str);
But it seems to cause me some problems and I don't really understand how it is possible to do it this way.
Can you explain to me why this example works/doesn't work,
and what would be the correct way to implement it (with malloc?)
You don't want a pointer to pointer to char, use an array of chars
char str[128];
or a pointer to char
char *str;
if you choose a pointer you need to reserve space using malloc
str = malloc(128);
Then you can use fgets
fgets(str, 128, stdin);
and remove the trailing newline
char *ptr = strchr(str, '\n');
if (ptr != NULL) *ptr = '\0';
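To read an arbitrarily long line, you can use getline (originally a GNU libc extension, standardized in POSIX.1-2008):
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
char *foo(FILE * f)
{
size_t n = 0; /* getline takes a size_t *, not an int * */
ssize_t result;
char *buf = NULL; /* must start as NULL so that getline allocates the buffer */
result = getline(&buf, &n, f);
if (result < 0) {
free(buf); /* getline may allocate even when it fails */
return NULL;
}
return buf;
}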
or your own implementation using fgets and realloc:
char *read_line(FILE * f) /* not named getline, to avoid clashing with the POSIX function */
{
size_t size = 0;
size_t len = 0;
char *buf = NULL;
do {
size += BUFSIZ; /* BUFSIZ is defined as "the optimal read size for this platform" */
char *tmp = realloc(buf, size); /* realloc(NULL,n) is the same as malloc(n) */
if (tmp == NULL) {
free(buf); /* the old block is still allocated when realloc fails */
return NULL;
}
buf = tmp;
/* Actually do the read. Note that fgets puts a terminal '\0' at buf[len],
so the next chunk starts there and overwrites it */
if (fgets(buf + len, BUFSIZ, f) == NULL) {
if (len == 0) { /* nothing was read at all: end of file or error */
free(buf);
return NULL;
}
break; /* end of file after a partial line: return what we have */
}
len += strlen(buf + len);
} while (buf[len - 1] != '\n');
return buf;
}
Call it using
char *str = read_line(stdin);
if (str == NULL) {
perror("read_line");
exit(EXIT_FAILURE);
}
...
free(str);
Firstly, gets() provides no way of preventing a buffer overrun. That makes it so dangerous it has been removed from the latest C standard. It should not be used. However, the usual usage is something like
char buffer[20];
gets(buffer); /* pray that user enters no more than 19 characters in a line */
Your usage is passing gets() a pointer to a pointer to a pointer to char. That is not what gets() expects, so your code would not even compile.
That element of prayer reflected in the comment is why gets() is so dangerous. If the user enters 20 (or more) characters, gets() will happily write data past the end of buffer. There is no way a programmer can prevent that in code (short of accessing hardware to electrocute the user who enters too much data, which is outside the realm of standard C).
To answer your question, however, the only ways involve allocating a buffer of some size, reading data in some controlled way until that size is reached, reallocating if needed to get a greater size, and continuing until a newline (or end-of-file, or some other error condition on input) is encountered.
malloc() may be used for the initial allocation. malloc() or realloc() may be used for the reallocation (if needed). Bear in mind that a buffer allocated this way must be released (using free()) when the data is no longer needed - otherwise the result is a memory leak.
use the getline() function, this will return the length of the line, and a pointer to the contents of the line in an allocated memory area. (be sure to pass the line pointer to free() when done with it )
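As a rough illustration of that suggestion, the sketch below wraps getline() and strips the trailing newline. The wrapper name read_line_stripped is hypothetical, and the code assumes a POSIX.1-2008 system where getline() is available.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: read one line from f with getline() and strip the newline.
 * Returns a malloc'd string the caller must free(), or NULL at end of file or on error. */
char *read_line_stripped(FILE *f)
{
    char *buf = NULL;   /* NULL and 0 ask getline to allocate the buffer */
    size_t cap = 0;
    ssize_t len = getline(&buf, &cap, f);

    if (len < 0) {
        free(buf);      /* getline may have allocated even though it failed */
        return NULL;
    }
    if (len > 0 && buf[len - 1] == '\n')
        buf[len - 1] = '\0';
    return buf;
}
```

A caller would loop on read_line_stripped(stdin) until it returns NULL, freeing each returned line when done with it.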
"Reading an unknown length line from stdin in c with fgets"
Late response - A Windows approach:
The OP does not specify Linux or Windows, but the viable answers posted in response for this question all seem to have the getline() function in common, which is POSIX only. Functions such as getline() and popen() are very useful and powerful but sadly are not included in Windows environments.
Consequently, implementing such a task in a Windows environment requires a different approach. The link here describes a method that can read input from stdin and has been tested up to 1.8 gigabytes on the system it was developed on. (Also described in the link.) The simple code snippet below was tested using the following command line to read large quantities on stdin:
cd c:\dev && dir /s // approximately 1.8Mbyte buffer is returned on my system
Simple example:
#include "cmd_rsp.h"
int main(void)
{
char *buf = {0};
buf = calloc(100, 1);//initialize buffer to some small value
if(!buf)return 0;
cmd_rsp("dir /s", &buf, 100);//recursive directory search on Windows system
printf("%s", buf);
free(buf);
return 0;
}
cmd_rsp() is fully described in the links above, but it is essentially a Windows implementation that includes popen() and getline() like capabilities, packaged up into this very simple function.
If you want to read a string of unknown length, try the following code.
#include <stdio.h>
#include <stdlib.h>
int main()
{
char *m = NULL;
printf("please input a string\n");
if (scanf("%ms",&m) != 1 || m == NULL)
fprintf(stderr, "Could not read the string (out of memory or end of input)\n");
else
{
printf("this is the string %s\n",m);
/* ... any other use of m */
free(m);
}
return 0;
}
Note that %ms reads a single whitespace-delimited word, not a whole line. The m modifier is specified by POSIX.1-2008 and is not part of standard C; %as is an older GNU extension.

How to make backslash character not to escape

I don't know the title correctly addresses my problem or not. So, I will just go with it.
Here is the problem: I have to take a char array holding a file path (in Windows) with lots of backslashes in it, e.g. "C:\myfile.txt", and return an unsigned char array holding the C-style file path, e.g. "C:\\myfile.txt".
I tried to write a function.
unsigned char* parse_file_path(char *path);
{
unsigned char p[60];
int i,j;
int len = strlen(path);
for(i=0,j=0; i<len; i++, j++)
{
char ch = path[i];
if(ch==27)
{
p[j++]='\\';
p[j]='\\';
}
else
p[j] = path[i];
}
p[j]='\0';
return p;
}
The weird thing (for me) I am encountering is that here path contains only one backslash '\'. In order to get one backslash, I have to put '\\' in path. This is not possible, because path cannot contain '\\'. When I call it like this parse_file_path("t\es\t \it"), it returns
t←s it. But parse_file_path("t\\es\\t \\it") returns t\es\t \it.
How can I accomplish my task? Thanks in advance.
If I can just mention another problem with your code.
You are returning a pointer to a local array (your unsigned char p[60]). The array stops existing when the function returns, so using the returned pointer is undefined behavior. Consider declaring a char* p, assigning memory to it dynamically with malloc, and then returning p as you do. E.g. something like:
char* p = malloc(60);
A common practice is to use sizeof when allocating memory with malloc but here I've passed 60 directly as the C standard guarantees that a char will be 1 byte on all platforms.
But you have to free the memory assigned with malloc.
Or alternatively, you can change the function to take a buffer as an input argument that it then writes to. That way you can pass a normal array where you would call this function.
Regarding your slashes issue, here:
p[j++]='\\';
p[j]='\\';
Position j in p will be changed to \\, then j will be incremented and at the very next line you do the same for the succeeding char position. Are you sure you want the two assignments?
By the way if you are inputting the path from the command line, the escaping will be taken care of for you. E.g. consider the following code:
#include <stdio.h>
#include <string.h> /* for strlen */
#include <stdlib.h> /* for exit */
int main()
{
char path[60];
if (fgets(path, sizeof path, stdin) == NULL) /* read at most 59 characters plus the terminator from standard input */
exit(1); /* nothing could be read */
path[strcspn(path, "\n")] = '\0'; /* replace the newline character, if there is one, with a null terminator */
FILE* handle = fopen(path, "r");
if (!handle)
{
printf("There was a problem opening the file\n");
exit(1); /* file doesn't exist, let's quit with a status code of 1 */
}
printf("Should be good!\n");
/* work with the file */
fclose(handle);
return 0; /* all cool */
}
And then you run it and input something like:
C:\cygwin\home\myaccount\main.c
It should print 'Should be good!' (provided the file does exist, you can also test with 'C:\').
At least on Windows 7 with cygwin this is what I get. No need for any escapes as this is handled for you.
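If the doubled backslashes really are needed, for example when generating C source text, a minimal sketch could write into a caller-supplied buffer, as suggested above. The function name escape_backslashes and its return convention are assumptions made for illustration. Note that the backslash has character code 92 and is written '\\' in C source, whereas 27 is the ESC control character.

```c
#include <stddef.h>

/* Hypothetical helper: copy src into dst, writing every backslash twice.
 * dst_size is the capacity of dst in bytes. Returns 0 on success,
 * or -1 if dst is too small (dst is then left as an empty string). */
int escape_backslashes(const char *src, char *dst, size_t dst_size)
{
    size_t j = 0;

    if (dst_size == 0)
        return -1;

    for (size_t i = 0; src[i] != '\0'; ++i) {
        size_t need = (src[i] == '\\') ? 2 : 1;
        if (j + need >= dst_size) {     /* keep one byte for the terminator */
            dst[0] = '\0';
            return -1;
        }
        if (src[i] == '\\')
            dst[j++] = '\\';            /* the extra backslash */
        dst[j++] = src[i];
    }
    dst[j] = '\0';
    return 0;
}
```

A string literal such as "t\es\t" is already processed by the compiler before the program runs, so a function like this only ever sees the characters that survived that step. Paths read at run time, from stdin or argv, contain plain single backslashes and need no escaping before being passed to fopen.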

Custom getLine() function for c

I need a function/method that will take in a char array and set it to a string read from stdin. It needs to return the last character read as its return type, so I can determine if it reached the end of a line or the end of file marker.
here is what I have so far, and I kind of based it off of code from here
UPDATE: I changed it, but now it just crashes upon hitting enter after text. I know this way is inefficient, and char is not the best for EOF check, but for now I am just trying to get it to return the string. I need it to do it in this fashion and no other fashion. I need the string to be the exact length of the line, and to return a value that is either the newline or EOF int which I believe can still be used in a char value.
This program is in C not C++
char getLine(char **line);
int main(int argc, char *argv[])
{
char *line;
char returnVal = 0;
returnVal = getLine(&line);
printf("%s", line);
free(line);
system("pause");
return 0;
}
char getLine(char **line) {
unsigned int lengthAdder = 1, counter = 0, size = 0;
char charRead = 0;
*line = malloc(lengthAdder);
while((charRead = getc(stdin)) != EOF && charRead != '\n')
{
*line[counter++] = charRead;
*line = realloc(*line, counter);
}
*line[counter] = '\0';
return charRead;
}
Thank you for any help in advance!
You're assigning the result of malloc() to a local copy of line, so after the getLine() function returns it's not modified (albeit you think it is). What you have to do is either return it (as opposed to use an output parameter) or pass its address (pass it 'by reference'):
void getLine(char **line)
{
*line = malloc(length);
// etc.
}
and call it like this:
char *line;
getLine(&line);
Your key problem is that line pointer value does not propagate out of the getLine() function. The solution is to pass pointer to the line pointer to the function as a parameter instead - calling it like getLine(&line); while the function would be defined as taking parameter char **line. In the function, on all places where you now work with line, you would work with *line instead, i.e. dereferencing the pointer to a pointer and working with the value of the variable in main() where the pointer leads. Hope this is not too confusing. :-) Try to draw it on a piece of paper.
(A tricky part - you must change line[counter] to (*line)[counter] because you first need to dereference the pointer to the string, and only then to access a specific character in the string.)
There are a couple of other problems with your code:
You use char as the type for charRead. However, the EOF constant cannot be reliably represented using char; you need to use int, both as the type of charRead and as the return type of getLine(), so that you can actually distinguish between a newline and end of file.
You forgot to return the last char read from your getLine() function. :-)
You are reallocating the buffer after each character addition. This is not terribly efficient and therefore is a rather ugly programming practice. It is not too difficult to use another variable to track the amount of space allocated and then (i) start with allocating a reasonable chunk of memory, e.g. 64 bytes, so that ideally you will never reallocate (ii) enlarge the allocation only if you need to based on comparing the counter and your allocation size tracker. Two reallocation strategies are common - either doubling the size of the allocation or increasing the allocation by a fixed step.
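The way you use realloc is not correct. If it returns NULL, the only pointer to the original memory block is overwritten and the block is leaked.
It is better to use realloc in this way:
char *tmp;
...
tmp = realloc(line, counter);
if (tmp == NULL) {
/* handle the error; line still points to the old block, which must still be freed */
} else {
line = tmp;
}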
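Pulling these points together (int for the character, (*line)-style access through a local pointer, doubling the allocation, and realloc through a temporary), a possible version might look like the sketch below. The name get_line and the extra FILE * parameter are assumptions added here so that the function can be tried on any stream; pass stdin to get the behavior described in the question.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical variant of getLine(): reads one line from in into a malloc'd
 * buffer stored in *line. Returns '\n' or EOF (whichever ended the line),
 * or EOF with *line set to NULL if memory runs out. */
int get_line(FILE *in, char **line)
{
    size_t cap = 64, len = 0;           /* start with a reasonable chunk */
    char *buf = malloc(cap);
    int c;                              /* int, so EOF stays distinguishable */

    *line = NULL;
    if (buf == NULL)
        return EOF;

    while ((c = getc(in)) != EOF && c != '\n') {
        if (len + 1 == cap) {           /* keep room for the terminator */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);              /* realloc failed: the old block is still ours */
                return EOF;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    buf[len] = '\0';

    char *exact = realloc(buf, len + 1); /* optional: shrink to the exact length */
    *line = exact ? exact : buf;
    return c;
}
```

The final realloc is only there because the question asks for a buffer of exactly the line's length; without that requirement it could be dropped.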

Simple C string manipulation

I am trying to do some very basic string processing in C (e.g. given a filename, chop off the file extension, manipulate the filename and then add the extension back on). I'm rather rusty on C and am getting segmentation faults.
char* fname;
char* fname_base;
char* outdir;
char* new_fname;
.....
fname = argv[1];
outdir = argv[2];
fname_len = strlen(fname);
strncpy(fname_base, fname, (fname_len-4)); // weird characters at the end of the truncation?
strcpy(new_fname, outdir); // getting a segmentation on this I think
strcat(new_fname, "/");
strcat(new_fname, fname_base);
strcat(new_fname, "_test");
strcat(new_fname, ".jpg");
printf("string=%s",new_fname);
Any suggestions or pointers welcome.
Many thanks and apologies for such a basic question
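You need to allocate memory for new_fname and fname_base. Here is how you would do it for new_fname, leaving room for everything that is later concatenated onto it:
new_fname = malloc(strlen(outdir) + 1 + strlen(fname) + strlen("_test.jpg") + 1);
The first +1 is for the "/" separator, strlen(fname) is an upper bound on the length of fname_base, and the final +1 is for the terminating null character '\0'.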
In addition to what others are indicating, I would be careful with
strncpy(fname_base, fname, (fname_len-4));
You are assuming you want to chop off the last 4 characters (.???). If there is no file extension or it is not 3 characters, this will not do what you want. The following should give you an idea of what might be needed (I assume that the last '.' indicates the file extension). Note that my 'C' is very rusty (warning!)
char *s;
s = (char *) strrchr (fname, '.');
if (s == 0)
{
strcpy (fname_base, fname);
}
else
{
strncpy (fname_base, fname, strlen(fname)-strlen(s));
fname_base[strlen(fname)-strlen(s)] = 0;
}
You have to malloc fname_base and new_fname, I believe.
ie:
fname_base = (char *)(malloc(sizeof(char)*(fname_len+1)));
fname_base[fname_len] = 0; //to stick in the null termination
and similarly for new_fname and outdir
You're using uninitialized pointers as targets for strcpy-like functions: fname_base and new_fname: you need to allocate memory areas to work on, or declare them as char array e.g.
char fname_base[FILENAME_MAX];
char new_fname[FILENAME_MAX];
you could combine the malloc that has been suggested, with the string manipulations in one statement
if ( asprintf(&new_fname,"%s/%s_test.jpg",outdir,fname_base) >= 0 )
// success, else failed
then at some point, free(new_fname) to release the memory.
(note this is a GNU extension which is also available in *BSD)
Cleaner code:
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
const char *extra = "_test.jpg";
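int main(int argc, char** argv)
{
if (argc < 3) {
fprintf(stderr, "usage: %s fname outdir\n", argv[0]);
return 1;
}
char *fname = strdup(argv[1]); /* duplicate, we need to truncate the dot */
char *outdir = argv[2];
char *dotpos;
/* ... */
if (!fname)
return 1;
dotpos = strrchr(fname, '.'); /* the last dot starts the extension */
if(dotpos)
*dotpos = '\0'; /* truncate at the dot */
size_t new_size = strlen(fname) + strlen(extra) + 1; /* +1 for the terminating '\0' */
char *new_fname = malloc(new_size);
if (!new_fname) {
free(fname);
return 1;
}
snprintf(new_fname, new_size, "%s%s", fname, extra);
printf("%s\n", new_fname);
free(new_fname);
free(fname);
return 0;
}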
In the following code I do not call malloc.
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
/* Change this to '\\' if you are doing this on MS-windows or something like it. */
#define DIR_SYM '/'
#define EXT_SYM '.'
#define NEW_EXT "jpg"
int main(int argc, char * argv[] ) {
char * fname;
char * outdir;
if (argc < 3) {
fprintf(stderr, "I want more command line arguments\n");
return 1;
}
fname = argv[1];
outdir = argv[2];
char * fname_base_begin = strrchr(fname, DIR_SYM); /* last occurrence of DIR_SYM */
if (fname_base_begin) {
fname_base_begin++; // Skip past the directory symbol itself.
} else {
fname_base_begin = fname; // No directory symbol means that there's nothing
// to chop off of the front.
}
char * fname_base_end = strrchr(fname_base_begin, EXT_SYM);
/* NOTE: No need to search for EXT_SYM in part of the fname that we have cut off
* the front and then have to deal with finding the last EXT_SYM before the last
* DIR_SYM */
if (!fname_base_end) {
fprintf(stderr, "I don't know what you want to do when there is no extension\n");
return 1;
}
*fname_base_end = '\0'; /* Makes this an end of string instead of EXT_SYM */
/* NOTE: In this code I actually changed the string passed in with the previous
* line. This is often not what you want to do, but in this case it should be ok.
*/
// This line should get you the results I think you were trying for in your example
printf("string=%s%c%s_test%c%s\n", outdir, DIR_SYM, fname_base_begin, EXT_SYM, NEW_EXT);
// This line should just append _test before the extension, but leave the extension
// as it was before.
printf("string=%s%c%s_test%c%s\n", outdir, DIR_SYM, fname_base_begin, EXT_SYM, fname_base_end+1);
return 0;
}
I was able to get away with not allocating memory to build the string in because I let printf actually worry about building it, and took advantage of knowing that the original fname string would not be needed in the future.
I could have allocated the space for the string by calculating how long it would need to be based on the parts and then used sprintf to form the string for me.
Also, if you don't want to alter the contents of the fname string you could also have used:
printf("string=%s%c%.*s_test%c%s\n", outdir, DIR_SYM, (int)(fname_base_end - fname_base_begin), fname_base_begin, EXT_SYM, fname_base_end+1);
To make printf only use part of the string.
The basis of any C string manipulation is that you must write into (and read from, unless... ...) memory you "own". Declaring something as a pointer (type *x) reserves space for the pointer, not for the pointee, whose size of course can't be known by magic, and so you have to malloc (or similar) or provide a local buffer with things like char buf[size].
And you should always be aware of buffer overflow.
As suggested, using sprintf (with a correctly allocated destination buffer) or something similar could be a good idea. Anyway, if you want to keep your current strcat approach, remember that to concatenate strings, strcat always has to "walk" through the current string from its beginning. So if you don't need (oops!) buffer overflow checks of any kind, appending chars "by hand" is a bit faster: when you have finished appending a string, you know where the new end is, and the next append can start from there.
But strcat doesn't tell you the address of the last char appended, and using strlen would nullify the effort. So a possible solution could be
size_t l = strlen(new_fname);
new_fname[l++] = '/';
for(i = 0; fname_base[i] != 0; i++, l++) new_fname[l] = fname_base[i];
for(i = 0; testjpgstring[i] != 0; i++, l++) new_fname[l] = testjpgstring[i];
new_fname[l] = 0; // terminate the string...
and you can continue using l... (testjpgstring = "_test.jpg")
However, if your program is full of string manipulations, I suggest using a library for strings (out of laziness I often use glib).
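For the original question, the whole computation can also be folded into one small function that sizes the buffer first and then lets snprintf build the result. This is only a sketch: the name build_output_name is hypothetical, the "_test.jpg" suffix comes from the question, and fname is assumed to be a bare file name with no directory part.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build "<outdir>/<base>_test.jpg", where base is fname
 * with its last extension (if any) removed. Returns a malloc'd string the
 * caller must free(), or NULL if allocation fails. */
char *build_output_name(const char *fname, const char *outdir)
{
    const char *suffix = "_test.jpg";
    const char *dot = strrchr(fname, '.');
    size_t base_len = dot ? (size_t)(dot - fname) : strlen(fname);
    /* outdir + '/' + base + suffix + terminating '\0' */
    size_t size = strlen(outdir) + 1 + base_len + strlen(suffix) + 1;
    char *out = malloc(size);

    if (out != NULL)
        snprintf(out, size, "%s/%.*s%s", outdir, (int)base_len, fname, suffix);
    return out;
}
```

Here the %.*s precision limits how much of fname is copied, so neither fname nor outdir is modified and no intermediate fname_base buffer is needed.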

Resources