Iterating through lines without passing an offset - c

I can use the following function to iterate through some text and grab it line by line:
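#include <string.h>

int nextline(const char *text, unsigned int *start_at, char *buffer) {
/*
Copies the line that starts at text[*start_at] into buffer (without the
trailing '\n') and NUL-terminates it. Returns the length of that line, or -1
when there are no more lines. *start_at is advanced past the line and its
'\n', so the same variable can be passed to the next call.
buffer must be large enough for the longest line plus the terminator.
*/
    if (*start_at >= strlen(text)) return -1;    /* nothing left to read */

    const char *line = text + *start_at;
    int i;
    for (i = 0; line[i] != '\0' && line[i] != '\n'; i++)
        buffer[i] = line[i];
    buffer[i] = '\0';                 /* terminate every line, including the last one */

    *start_at += i + (line[i] == '\n');          /* skip the newline, if any */
    return i;
}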
However, this function requires passing in an offset, for example:
char *longtext = "This is what I went to do\nWhen I came over\nto the place and thought that\nhere we go again";
char buffer[60];
unsigned int start_at = 0;
int line_length;   /* int, so the -1 sentinel compares without a signed/unsigned mix */
for (int i = 1; (line_length = nextline(longtext, &start_at, buffer)) != -1; i++)
    printf("Line %2d. %s\n", i, buffer);
How would I write an equivalent function where it "remembers" where the cursor is and I don't need to keep passing it back into the function?

If you don't want to pass an offset, you could implement this by using a static variable inside of the function to keep track of the current offset. Doing so has disadvantages, however.
Suppose you were to use a static variable for the offset. You then process a complete string. Now what happens when you want to process another string? You need to somehow tell the function to "start over". Also, suppose you wanted to process two separate strings alternately, i.e. you call the function first on one string, then the other, then back to the first. The internal state wouldn't be able to manage that.
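To make the trade-off concrete, here is a minimal sketch of the static-variable version. The name nextline_static and its "pass NULL to continue" convention are assumptions made for illustration (borrowed from how strtok behaves), not something from the question. Note how restarting requires passing a new string, and how scanning a second string silently discards the position in the first.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stateful variant of nextline() that keeps its position in a
   function-level static variable instead of taking start_at.
   - Pass a string to start (or restart) scanning it; pass NULL to continue.
   - Copies the next line, without its '\n', into buffer and returns its
     length, or returns -1 when no lines are left.
   Because the position is hidden, only one string can be scanned at a time,
   and the function is not thread-safe. */
int nextline_static(const char *text, char *buffer) {
    static const char *cursor = NULL;      /* survives between calls */

    if (text != NULL) cursor = text;       /* caller asked to start over */
    if (cursor == NULL || *cursor == '\0') return -1;

    size_t len = strcspn(cursor, "\n");    /* characters before '\n' or the end */
    memcpy(buffer, cursor, len);
    buffer[len] = '\0';
    cursor += len;
    if (*cursor == '\n') cursor++;         /* step over the newline */
    return (int)len;
}
```

Calling nextline_static(other_text, buf) in the middle of a scan throws away the old position; there is no way to come back to the first string, which is exactly the alternating-strings problem described above.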
The best way to manage these types of issues is to do exactly what you're doing now: passing the address of a variable to keep track of the state. That way it's up to the calling function to keep track of the current state while your nextline function is stateless.
There are a number of older functions in the standard library that use internal state and were superseded by newer functions that don't. One notable example is strtok, which uses internal state to tokenize a string. POSIX later added strtok_r, which takes an additional parameter that holds the state.
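As a sketch of why the extra state parameter matters, the function below walks two strings alternately with strtok_r, each with its own save pointer, which plain strtok cannot do. The function name count_lines_interleaved is hypothetical. Two caveats that apply to strtok_r itself: it modifies its input, so the strings must be writable arrays rather than string literals, and it treats consecutive delimiters as one, so empty lines are skipped.

```c
#define _POSIX_C_SOURCE 200809L  /* expose strtok_r on POSIX systems */
#include <string.h>

/* Counts the lines of a and b while alternating between them. The state of
   each scan lives in caller-owned variables (save_a, save_b), so the two
   scans do not interfere. Returns the total number of lines. */
int count_lines_interleaved(char *a, char *b, int *lines_a, int *lines_b) {
    char *save_a = NULL, *save_b = NULL;
    char *tok_a = strtok_r(a, "\n", &save_a);
    char *tok_b = strtok_r(b, "\n", &save_b);

    *lines_a = 0;
    *lines_b = 0;
    while (tok_a != NULL || tok_b != NULL) {
        if (tok_a != NULL) {
            (*lines_a)++;
            tok_a = strtok_r(NULL, "\n", &save_a);   /* continue scan of a */
        }
        if (tok_b != NULL) {
            (*lines_b)++;
            tok_b = strtok_r(NULL, "\n", &save_b);   /* continue scan of b */
        }
    }
    return *lines_a + *lines_b;
}
```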
So keep your function the way it is. It's generally considered better design to not depend on internal state.

Related

Why are arguments used to pass in an input value from outside a function, rather than asking for and storing the input in the function itself?

Why are arguments used to pass in input from the user most of the time, instead of calling something like scanf in the function itself?
It would just make the code cleaner.
But most of the time I see people write a prompt, ask for input, store it in a variable, and pass it in as an argument from outside the function, even when they don't reuse the value later in the code.
Is it just a habit of people who write complex code, so that most of the time it's a better idea to save the input in a separate variable and pass it in, rather than just using scanf in the body of the function?
The other option, putting everything in the function, always seems easier to read: you don't need arguments, which add another layer of unnecessary complexity, because the whole subproblem is contained in one abstraction.
It is to give a better overview and to make the code reusable, in the same way that most functions do not allocate their own memory but ask for a buffer instead.
Let us look at an example:
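#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* read(), STDIN_FILENO (POSIX) */

#define MAX_INPUT_SIZE 1024   /* example value */

/* Writes the first len characters of input to output in reverse order
   and NUL-terminates output, which must hold len + 1 bytes. */
void reverse(const char *input, char *output, size_t len) {
    for (size_t i = 0; i < len; i++) {
        output[len - 1 - i] = input[i];
    }
    output[len] = '\0';
}

int main(void) {
    char *my_input = malloc(MAX_INPUT_SIZE + 1);
    if (my_input == NULL) return 1;
    ssize_t len = read(STDIN_FILENO, my_input, MAX_INPUT_SIZE);  /* fd 0 = stdin */
    if (len < 0) exit(1);
    char *my_output = malloc((size_t)len + 1);
    if (my_output == NULL) exit(1);
    reverse(my_input, my_output, (size_t)len);
    free(my_input);
    printf("%s", my_output);
    free(my_output);
    return 0;
}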
If we look at the main function, the program's control flow is very clear. We can see where memory is allocated and where it is freed. But most importantly, we can change the input method without having to touch the reverse() function. If reverse() read its own input and were a very complicated function, changing the input method would be much harder.
Now compare it with a version in which reverse() reads its own input:
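To illustrate that point, here is a sketch of two other input methods built on an unchanged reverse() (repeated here, with a size_t length, so the sketch compiles on its own). The helpers reverse_string and reverse_line_from are hypothetical names; the second one reads from any FILE * with fgets(), so it works equally well for stdin, a regular file, or a pipe.

```c
#include <stdio.h>
#include <string.h>

/* reverse() as in the example: output must hold len + 1 bytes. */
void reverse(const char *input, char *output, size_t len) {
    for (size_t i = 0; i < len; i++)
        output[len - 1 - i] = input[i];
    output[len] = '\0';
}

/* Input method 1: a string that is already in memory.
   out needs strlen(s) + 1 bytes. */
void reverse_string(const char *s, char *out) {
    reverse(s, out, strlen(s));
}

/* Input method 2: one line from any stream, reversed without its '\n'.
   Returns 0 on success, -1 at end of input or if out is too small.
   Lines longer than the local buffer are read in pieces by fgets(). */
int reverse_line_from(FILE *in, char *out, size_t cap) {
    char line[256];
    if (fgets(line, sizeof line, in) == NULL) return -1;
    size_t len = strcspn(line, "\n");       /* length without the newline */
    if (len + 1 > cap) return -1;
    reverse(line, out, len);
    return 0;
}
```

Neither helper required any change to reverse(); only the callers know where the data comes from.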
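#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* read(), STDIN_FILENO (POSIX) */

#define MAX_INPUT_SIZE 1024   /* example value */

/* Reads from stdin, reverses the input, and returns a malloc'd string
   that the caller must free(). */
char *reverse(void) {
    char *input = malloc(MAX_INPUT_SIZE + 1);
    if (input == NULL) exit(1);
    ssize_t len = read(STDIN_FILENO, input, MAX_INPUT_SIZE);
    if (len < 0) exit(1);
    char *output = malloc((size_t)len + 1);
    if (output == NULL) exit(1);
    for (ssize_t i = 0; i < len; i++) {
        output[len - 1 - i] = input[i];
    }
    output[len] = '\0';
    free(input);
    return output;
}

int main(void) {
    char *my_output = reverse();
    printf("%s", my_output);
    free(my_output);
    return 0;
}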
As you can see, the calling code is a lot shorter. But the big drawback is that we can't see at a glance what reverse() does. You would now need to write a long comment (at least for a more complex function) explaining what it does. We also cannot customize the memory allocation, the input function, or the error handling without changing reverse(). And if we used reverse() in hundreds of places in our code, with slight differences in how each place reads its input, we would have to write a new function for every one of them.

Abort trap: 6 error with arrays in c

The following code compiled fine yesterday for a while, started giving the abort trap: 6 error at one point, then worked fine again for a while, and again started giving the same error. All the answers I've looked up deal with strings of some fixed specified length. I'm not very experienced in programming so any help as to why this is happening is appreciated. (The code is for computing the Zeckendorf representation.)
If I simply use printf to print the digits one by one instead of using strings the code works fine.
#include <string.h>
// helper function to compute the largest fibonacci number <= n
// this works fine
void maxfib(int n, int *index, int *fib) {
int fib1 = 0;
int fib2 = 1;
int new = fib1 + fib2;
*index = 2;
while (new <= n) {
fib1 = fib2;
fib2 = new;
new = fib1 + fib2;
(*index)++;
if (new == n) {
*fib = new;
}
}
*fib = fib2;
(*index)--;
}
char *zeckendorf(int n) {
int index;
int newindex;
int fib;
char *ans = ""; // I'm guessing the error is coming from here
while (n > 0) {
maxfib(n, &index, &fib);
n -= fib;
maxfib(n, &newindex, &fib);
strcat(ans, "1");
for (int j = index - 1; j > newindex; j--) {
strcat(ans, "0");
}
}
return ans;
}
Your guess is quite correct:
char *ans = ""; // I'm guessing the error is coming from here
That makes ans point to a read-only array of one character, whose only element is the string terminator. Trying to append to this will write out of bounds and give you undefined behavior.
One solution is to dynamically allocate memory for the string, and if you don't know the size beforehand then you need to reallocate to increase the size. If you do this, don't forget to add space for the string terminator, and to free the memory once you're done with it.
Basically, you have two approaches when you want to return a string from a function in C:
1. The caller allocates a buffer (either statically or dynamically) and passes it to the callee as a pointer and a size. The callee writes the data to the buffer. If it fits, the callee returns success; if it does not fit, it returns an error. You may decide that in that case the buffer is either left untouched or contains as much of the data as fits. Choose whatever suits you better, but document it properly for future users (including your future self).
2. The callee allocates a buffer dynamically, fills it, and returns a pointer to it. The caller must free the memory to avoid a memory leak.
In your case, the zeckendorf() function can determine how much memory is needed for the string. The index of the largest Fibonacci number not greater than the parameter determines the length of the result. Add 1 for the terminating zero and you know how much memory you need to allocate.
So, if you choose the first approach, you need to pass two additional parameters to zeckendorf(), char *buffer and int size, and write to buffer instead of ans. You also need some marker to know whether it's the first iteration of the while() loop. If it is, after maxfib(n, &index, &fib); check the condition index+1<=size. If the condition is true, you can proceed with your function. If not, you can return an error immediately.
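Here is a minimal sketch of the first approach. zeckendorf_buf is a hypothetical name, and maxfib() is adapted from the question (the unused "new == n" branch is removed and the variable is renamed). Instead of a first-iteration marker, it calls maxfib() once before the loop, and it uses the exact length (index - 1 digits plus the terminator), so the check is index <= size. Digits are written directly with memset instead of strcat, which avoids rescanning the string on every append.

```c
#include <stddef.h>
#include <string.h>

/* maxfib() from the question: stores the largest Fibonacci number <= n in
   *fib and its index (F(2) = 1, F(3) = 2, ...) in *index. */
static void maxfib(int n, int *index, int *fib) {
    int fib1 = 0, fib2 = 1, next = 1;
    *index = 2;
    while (next <= n) {
        fib1 = fib2;
        fib2 = next;
        next = fib1 + fib2;
        (*index)++;
    }
    *fib = fib2;
    (*index)--;
}

/* First approach: the caller supplies buffer and size. Returns 0 on success;
   returns -1, leaving buffer untouched, if n <= 0 or the result won't fit. */
int zeckendorf_buf(int n, char *buffer, size_t size) {
    int index, newindex, fib;
    size_t pos = 0;

    if (n <= 0) return -1;
    maxfib(n, &index, &fib);
    if ((size_t)index > size) return -1;   /* index - 1 digits + '\0' */
    while (n > 0) {
        maxfib(n, &index, &fib);
        n -= fib;
        maxfib(n, &newindex, &fib);
        buffer[pos++] = '1';
        size_t zeros = (size_t)(index - 1 - newindex);  /* skipped positions */
        memset(buffer + pos, '0', zeros);
        pos += zeros;
    }
    buffer[pos] = '\0';
    return 0;
}
```

For example, 10 = 8 + 2 gives "10010", and a buffer of 5 bytes is rejected for 10 because the result needs 6.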
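For the second approach, initialize ans as:
char *ans = NULL;
and after maxfib(n, &index, &fib); add:
if (ans == NULL) {
    ans = malloc(index + 1);
    if (ans == NULL) return NULL;   /* allocation failed */
    ans[0] = '\0';                  /* strcat() needs a valid (empty) string to append to */
}
and continue as you did. Return ans from the function. Remember to call free() in the caller when the result is no longer needed, to avoid a memory leak.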
In both cases, remember to write the terminating \0 to the buffer.
There is also a third approach. You can declare ans as:
static char ans[20];
inside zeckendorf(). The function then behaves as in the first approach, except that the buffer and its size are hardcoded. I recommend using #define BUFSIZE 20, declaring the variable as static char ans[BUFSIZE];, and using BUFSIZE when checking the available size. Also reset the buffer with ans[0] = '\0'; at the start of every call; otherwise strcat() appends to the previous result. Please be aware that this works only in a single-threaded environment, and that every call to zeckendorf() overwrites the previous result. Consider the following code:
char *a,*b;
a=zeckendorf(10);
b=zeckendorf(15);
printf("%s\n",a);
printf("%s\n",b);
The zeckendorf() function always returns the same pointer, so a and b would point to the same buffer, which holds the string for 15. So you either need to copy the result somewhere else, or process each result before the next call:
a=zeckendorf(10);
printf("%s\n",a);
b=zeckendorf(15);
printf("%s\n",b);
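The same pitfall can be seen in isolation with this small sketch. int_to_str is a hypothetical function using the third approach: every call returns the address of one static buffer, so the two pointers compare equal and both show the last value written.

```c
#include <stdio.h>

/* Hypothetical "third approach" function: the result lives in a static
   buffer, so every call overwrites what the previous call returned.
   16 bytes are enough for any 32-bit int; snprintf() truncates safely
   if int is wider. Not thread-safe. */
const char *int_to_str(int n) {
    static char buf[16];
    snprintf(buf, sizeof buf, "%d", n);
    return buf;
}
```

After a = int_to_str(10); b = int_to_str(15);, a == b and both print "15". To keep a result, copy it into your own array first, e.g. strcpy(saved, int_to_str(10)); with char saved[16];.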
As a rule of thumb, most standard C library functions on Linux use either the first or the third approach; a few, such as POSIX strdup(), use the second and return memory that the caller must free.

Passing a non-empty string to snprintf causes an unrelated char* array to change addresses

I'm working on the exercises in K&R's book, and I've run into a weird bug while trying to extend exercise 4-6 to allow for variables with string names. Truthfully, I've actually managed to fix the bug (pretty simple, explained below), but I'd like to know why the error was occurring in the first place.
For those unfamiliar with the problem, you're basically asked to create a command-line calculator (using Polish notation) that can store and recall variables with character names.
Here's the relevant code where the issue occurs:
#define MAXOPLEN 1000
int varCount = 1;
char **keys;
char **values;
// changing the declaration to:
// char strOps[][STROPCOUNT] = { ... };
// fixed the issue
char *strOps[STROPCOUNT] = { "dupe", "swap", "del", "print",
"clr", "sin", "cos", "tan",
"exp", "pow", "ln", "log",
"mem", "re"};
main() {
keys = malloc(varCount * sizeof(char[MAXOPLEN]));
keys[0] = "ans";
values = malloc(varCount * sizeof(char[MAXOPLEN]));
values[0] = "0.0";
... // Other stuff related to the program
}
// flag is unrelated to the problem I'm asking about. It just checks to see
// if the variable name used to store value n is 'ans', which is where
// the last returned value is stored automatically
void memorize(char s[], double n, bool flag) {
... // small conditional block for flag
for (i = 0; i < varCount; i++) {
if (equals(keys[i], s)) {
found = True;
// Next line is where the program actually breaks
snprintf(values[i], MAXOPLEN, "%f", n);
break;
}
}
if (!found) {
i = varCount;
varCount++;
keys = realloc(keys, varCount * sizeof(char*));
keys[i] = malloc(sizeof(char[MAXOPLEN]));
keys[i] = s;
values = realloc(values, varCount * sizeof(char*));
values[i] = malloc(sizeof(char[MAXOPLEN]));
snprintf(values[i], MAXOPLEN, "%f", n);
}
}
After compiling and running, the first time you enter in an equation to calculate, everything seems to run smoothly. However, while debugging, I found out that the first three char* in strOps were oddly made to point to different addresses. When trying to save the return value of the equation to "ans", it enters the for-loop in memorize() that tries to see if string s had been used as a key name already. It correctly finds keys[0] to point to a string matching s's value ("ans"), then attempts to convert double n to a string and save it in values[0].
While inside the snprintf() function, the first three char* in strOps are made to point elsewhere inside this method in corecrt_stdio_config.h:
_Check_return_ _Ret_notnull_
__declspec(noinline) __inline unsigned __int64* __CRTDECL __local_stdio_printf_options(void)
{
// Error occurs after this next line:
static unsigned __int64 _OptionsStorage;
return &_OptionsStorage;
}
As commented in the code above, making strOps a 2D array of characters (rather than an array of char pointers) fixed the issue. This makes sense because arrays of characters can't have the values of individual characters changed, but what I don't understand is why that method in corecrt_stdio_config.h was changing the values of those three pointers in the first place.
Thanks!
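Your initializations are incorrect and are causing the change:
keys[0] = "ans";
values[0] = "0.0";
Both "ans" and "0.0" are string literals. These assignments make keys[0] and values[0] point to the literals themselves, which are only 4 bytes long and must not be modified. The later snprintf(values[i], MAXOPLEN, "%f", n) writes through values[0] anyway (even "0.000000" needs 9 bytes), which is undefined behavior; clobbering memory that happens to lie next to the literal is a likely explanation for the changed strOps pointers. Allocate a writable buffer for each element and copy the string into it with strcpy (from <string.h>):
keys = malloc(varCount * sizeof *keys);      /* an array of pointers */
keys[0] = malloc(MAXOPLEN);
strcpy(keys[0], "ans");
values = malloc(varCount * sizeof *values);
values[0] = malloc(MAXOPLEN);
strcpy(values[0], "0.0");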
Your other option is to copy one character at a time:
size_t i;
const char *p = "ans";
keys[0] = malloc(MAXOPLEN);
for (i = 0; p[i] != '\0'; i++)
    keys[0][i] = p[i];   /* copy into the allocated buffer */
keys[0][i] = '\0';       /* nul-terminate the copy, not the literal */
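note: these are examples of your errors; you do the same thing elsewhere in your code. For instance, in memorize() the pair keys[i] = malloc(sizeof(char[MAXOPLEN])); keys[i] = s; throws away the buffer you just allocated and makes the key alias s. Use strcpy(keys[i], s); instead of the second assignment.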
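Putting the fixes together, here is a sketch of a simplified memorize() in which every key and value is its own writable MAXOPLEN-byte buffer. The names keys, values, varCount and MAXOPLEN come from the question; the flag parameter is dropped, strcmp() replaces the question's equals(), and starting from an empty table is an assumption made to keep the sketch short.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXOPLEN 1000

static char **keys = NULL;    /* keys[i] and values[i] are malloc'd buffers */
static char **values = NULL;
static int varCount = 0;

/* Stores n under name s, adding a new entry if needed.
   Returns 0 on success, -1 if an allocation fails. */
int memorize(const char *s, double n) {
    for (int i = 0; i < varCount; i++) {
        if (strcmp(keys[i], s) == 0) {
            snprintf(values[i], MAXOPLEN, "%f", n);   /* writable buffer */
            return 0;
        }
    }
    char **k = realloc(keys, (varCount + 1) * sizeof *k);   /* pointer array */
    if (k == NULL) return -1;
    keys = k;
    char **v = realloc(values, (varCount + 1) * sizeof *v);
    if (v == NULL) return -1;
    values = v;
    keys[varCount] = malloc(MAXOPLEN);
    values[varCount] = malloc(MAXOPLEN);
    if (keys[varCount] == NULL || values[varCount] == NULL) {
        free(keys[varCount]);
        free(values[varCount]);
        return -1;
    }
    snprintf(keys[varCount], MAXOPLEN, "%s", s);   /* copy the name, don't alias s */
    snprintf(values[varCount], MAXOPLEN, "%f", n);
    varCount++;
    return 0;
}
```

The pointer arrays are grown with sizeof *k (the size of a char *), while each string buffer gets MAXOPLEN bytes; mixing those two sizes up is what the original malloc(varCount * sizeof(char[MAXOPLEN])) did.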

Writing a get next token function

Given a C-string: how would I be able to write a function that will get the next token in the string, and a function that will peek the next token and return that without using global variables?
What I'm trying to do is have a static variable that will hold the string, and when called, it would just increment a pointer, and it will reset that static variable throwing out the token that has been retrieved. The problem is: how would I be able to differentiate between the first call (when it will actually store the string) and the other calls, when I am just retrieving it?
Any thoughts on this?
EDIT:
Here's what I have now that "works", but I want to make sure that it should actually work and it's not just a coincidence of a pointer being null:
char next_token(char *line) {
static char *p;
if (p == NULL)
p = line;
else {
char next_token = p[0];
p++;
return next_token;
}
}
The code in your edit is wrong. You are handling the NULL case incorrectly.
I initially answered in terms of emulating strtok which seemed to be what you wanted, but you have clarified that you want single characters.
The if-condition should be:
if (line != NULL) p = line;
You would presumably also remove the else, so that the token-returning code runs on every call, unless you don't want a result on the first call (even then, you must still return a value).
You call it like this:
char token = next_token(line);
while (token != '\0') {
    // use token, etc.
    token = next_token(NULL);
}
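For completeness, here is a sketch of the whole function with that fix applied, assuming single-character tokens and '\0' as the "no more tokens" value (both assumptions based on the question's edit). Unlike the question's version, every path returns a value, and the pointer stops at the terminator instead of running past it.

```c
#include <stddef.h>

/* strtok-style interface: pass the string on the first call and NULL on
   later calls. Returns the next character, or '\0' when the string is
   exhausted. The position lives in a function-level static, so only one
   string can be walked at a time and the function is not thread-safe. */
char next_token(const char *line) {
    static const char *p = NULL;

    if (line != NULL) p = line;                /* start (or restart) on a new string */
    if (p == NULL || *p == '\0') return '\0';  /* nothing (left) to read */
    return *p++;                               /* current character, then advance */
}
```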
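#include <stdbool.h>
#include <stddef.h>
typedef struct {
    const char* raw;   // the string being parsed
    size_t pos;        // whatever you need to keep track of, e.g. the next index
} parser_t;
void parser_init(parser_t* p, const char* s)
{
    p->raw = s;        // init your parser
    p->pos = 0;
}
bool parser_get_token(parser_t* p, char* token)
{
    // return the token in "token", or false at the end (or an enum of various errors)
    if (p->raw[p->pos] == '\0') return false;
    *token = p->raw[p->pos++];
    return true;
}
bool parser_peek_token(parser_t* p, char* token)
{
    // same deal, but don't update where you are...
    if (p->raw[p->pos] == '\0') return false;
    *token = p->raw[p->pos];
    return true;
}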
You have a couple of choices. One would be to use an interface roughly like strtok does, where passing a non-null pointer initializes the static variable, and passing a null pointer retrieves a token. This, however, is fairly ugly, clumsy, error-prone, and problematic in the presence of multithreading.
Another possibility would be to use a file-level static variable with separate functions (both in that file) to initialize the static variable and to retrieve the next token from the string. This is marginally cleaner, but still has most of the same problems.
A third would be to make it act (for one example) like a file -- the user calls parse_open (for example), passing in the string to parse. You return an opaque handle to them. They then pass that back to (say) get_token each time they want another token.
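A sketch of that file-like interface follows. parse_open() and get_token() are the example names from the text above; parse_close() and peek_token() are hypothetical additions, and tokens are single characters as in the question. In a real project the struct definition would live only in the .c file, with the header exposing just typedef struct parse_state parse_state;, which is what makes the handle opaque.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct parse_state {
    const char *text;   /* string being parsed (not copied) */
    size_t pos;         /* index of the next unread character */
} parse_state;

/* Returns a new handle for text, or NULL if allocation fails. */
parse_state *parse_open(const char *text) {
    parse_state *ps = malloc(sizeof *ps);
    if (ps != NULL) {
        ps->text = text;
        ps->pos = 0;
    }
    return ps;
}

/* Stores the next character in *token and advances; false at the end. */
bool get_token(parse_state *ps, char *token) {
    if (ps->text[ps->pos] == '\0') return false;
    *token = ps->text[ps->pos++];
    return true;
}

/* Like get_token(), but leaves the position unchanged. */
bool peek_token(const parse_state *ps, char *token) {
    if (ps->text[ps->pos] == '\0') return false;
    *token = ps->text[ps->pos];
    return true;
}

void parse_close(parse_state *ps) {
    free(ps);
}
```

Because each handle carries its own position, two strings can be parsed alternately, and separate threads can each use their own handle.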
Basically, there are three ways for a function to pass information back to its caller:
via a global variable
via the return value
via a pointer argument
And similarly, there are three ways for the function to maintain state between calls:
via a global or (function-)static variable
by supplying it as a function parameter and returning it after every call
via a pointer argument.
A nice coding convention for a lexer/tokeniser is to use the return value to communicate the number of characters consumed (and maybe use an extra pointer variable to pass the parser state to and from calls).
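Here is a small self-contained sketch of that convention; it is not wakkerbot's tokenize(). scan_token is a hypothetical name, and the state it carries (whether we are inside a double-quoted section, where spaces don't end a token) is an illustrative choice. The caller advances its own position by the return value, just as make_words() does.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns the length of the token starting at s (0 at end of string).
   A token is either a run of whitespace or a run of non-space characters;
   inside double quotes, spaces belong to the token. *in_quotes carries
   the quoting state from one call to the next. */
size_t scan_token(const char *s, int *in_quotes) {
    size_t n = 0;

    if (s[0] == '\0') return 0;
    if (!*in_quotes && isspace((unsigned char)s[0])) {
        while (s[n] != '\0' && isspace((unsigned char)s[n])) n++;
        return n;                               /* whitespace run */
    }
    while (s[n] != '\0') {
        if (s[n] == '"') *in_quotes = !*in_quotes;
        else if (!*in_quotes && isspace((unsigned char)s[n])) break;
        n++;
    }
    return n;                                   /* word or quoted section */
}
```

For the input say "hi there" now, successive calls return 3, 1, 10, 1, 3 and finally 0, so the quoted part comes back as one 10-character token.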
This is wakkerbot's parser:
STATIC size_t tokenize(char *string, int *sp);
Usage:
STATIC void make_words(char * src, struct sentence * target)
{
size_t len, pos, chunk;
STRING word ;
int state = 0; /* FIXME: this could be made static to allow for multi-line strings */
target->size = 0;
len = strlen(src);
if (!len) return;
for(pos=0; pos < len ; ) {
chunk = tokenize(src+pos, &state);
if (!chunk) { /* maybe we should reset state here ... */ pos++; }
if (chunk > STRLEN_MAX) {
warn( "Make_words", "Truncated too long string(%u) at %s\n", (unsigned) chunk, src+pos);
chunk = STRLEN_MAX;
}
word.length = chunk;
word.word = src+pos;
if (word_is_usable(word)) add_word_to_sentence(target, word);
if (pos+chunk >= len) break;
pos += chunk;
}
...
}

C: Passing a variable to a function gives wrong return

I am using some C with an embedded device and currently testing some code to read file details from an SD card. I am using a proprietary API but I will try to remove that wherever possible.
Rather than explaining, I will try to let my code speak for itself:
char* getImage() {
int numFiles = //number of files on SD card
for(int i=0; i<numFiles;i++) {
// lists the first file name in root of SD
char *temp = SD.ls(i, 1, NAMES);
if(strstr(temp, ".jpg") && !strstr(temp, "_")) {
return temp;
}
}
return NULL;
}
void loop()
{
// list SD contents
USB.println(SD.ls());
const char * image = getImage();
if(image != NULL) {
USB.println("Found an image!");
USB.println(image);
int byte_start = 0;
USB.print("Image Size: ");
USB.println(SD.getFileSize(image));
USB.println(SD.getFileSize("img.jpg"));
}
}
The two lines at the bottom are the troublesome ones. If I pass a literal string then I get the file size perfectly. However, if I pass the string (as represented by the image variable) then I am given a glorious -1. Any ideas why?
For clarity, the print out of image does display the correct file name.
EDIT: I know it is frowned upon in C to return a char * and that it is better to modify a variable passed to the function. I have used this approach as well; an example of the code is below, with the same result:
char * image = NULL;
getSDImage(&image, sizeof(image));
void getSDImage(char ** a, int length) {
int numFiles = SD.numFiles();
for(int i=0; i<numFiles;i++) {
char *temp = SD.ls(i, 1, NAMES);
if(strstr(temp, ".jpg") && !strstr(temp, "_")) {
*a = (char*)malloc(sizeof(char) * strlen(temp));
strcpy(*a, temp);
}
}
}
EDIT 2: The link to the entry is here: SD.ls and the link for the file size function: SD.getFileSize
From the return, it seems like the issue is with the file size function as the return is -1 (not 0) and because a result is returned when listing the root of the SD.
Thanks!
UPDATE: I have added a check for a null terminated string (it appears that this was an issue) and this has been addressed in the getSDImage function, with the following:
void getSDImage(char ** a, int length) {
int numFiles = SD.numFiles();
for(int i=0; i<numFiles;i++) {
char *temp = SD.ls(i, 1, NAMES);
if(strstr(temp, ".jpg") && !strstr(temp, "_")) {
*a = (char*)malloc(sizeof(char) * strlen(temp));
strncpy(*a, temp, strlen(temp)-1);
*a[strlen(*a)-1] = '\0';
}
}
}
This seems to work and my results to standard output are fine, the size is now not shown as the error-indicating -1 but rather -16760. I thought I should post the update in case anyone has any ideas but my assumption is that this is something to do with the filename string.
There are several things that could be wrong with your code:
1) You might be passing "invisible" characters such as whitespaces. Please make sure that the string you are passing is exactly the same, i.e. print character by character including null termination and see if they are the same.
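One way to do that comparison is to dump each byte as hex, including the terminator, so that a stray '\r', '\n' or space becomes visible. hexdump_to is a hypothetical helper written in portable C; on the device you would pass its output to something like USB.println().

```c
#include <stddef.h>
#include <string.h>

/* Writes the bytes of s, including the terminating '\0', into out as
   two-digit hex values separated by spaces (e.g. "61 62 0d 00").
   Returns the number of bytes dumped, or -1 if out is too small. */
int hexdump_to(char *out, size_t cap, const char *s) {
    static const char hex[] = "0123456789abcdef";
    size_t n = strlen(s) + 1;                 /* include the terminator */

    if (cap < 3 * n) return -1;               /* 2 digits + separator per byte */
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)s[i];
        out[3 * i] = hex[c >> 4];
        out[3 * i + 1] = hex[c & 0x0F];
        out[3 * i + 2] = ' ';
    }
    out[3 * n - 1] = '\0';                    /* replace the last separator */
    return (int)n;
}
```

Dumping both image and "img.jpg" this way and comparing the two lines shows immediately whether the names differ by an invisible character.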
2) The value that is returned by the API and later passed to another API call may not be what you expect. If possible, I would advise you to look at the API source code. If you can compile the API itself, it should be easy to find the problem (check what getFileSize() receives as its parameter). Based on the API documentation you have sent, check the value stored in buffer[DOS_BUFFER_SIZE] after you get -1 from getFileSize().
EDIT (after looking at the API source code):
On line 00657 (func find_file_in_dir) you have:
if(strcmp(dir_entry->long_name, name) == 0)
This seems to be the only reason why you would get a different reply when using a string literal as opposed to getting the name from your function. So it is very likely that you are not passing the same values (i.e. you are either passing invisible characters or you are missing the string terminator).
As a final note: check the content of buffer[DOS_BUFFER_SIZE] before each call to the SD API.
I hope this helps.
Kind regards,
Bo
This:
if(strstr(temp, ".jpg") && !strstr(temp, "_")) {
*a = (char*)malloc(sizeof(char) * strlen(temp));
strcpy(*a, temp);
}
is broken, it's not allocating room for the terminator and is causing a buffer overflow.
You should use:
*a = malloc(strlen(temp) + 1);
There's no need to cast the return value of malloc() in C, and sizeof (char) is always 1.
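Combining this fix with the first answer's suspicion about invisible characters, a copy helper could allocate strlen + 1 bytes and also drop trailing whitespace. dup_trimmed is a hypothetical name, and stripping is only useful if SD.ls() really appends characters such as '\r' or '\n' to each name, which is an assumption here, not something the API documentation quoted above confirms.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd copy of s without trailing whitespace, or NULL if
   allocation fails. The caller must free() the result. */
char *dup_trimmed(const char *s) {
    size_t len = strlen(s);
    while (len > 0 && isspace((unsigned char)s[len - 1]))
        len--;                                 /* drop '\r', '\n', spaces, tabs */
    char *copy = malloc(len + 1);              /* + 1 for the terminator */
    if (copy == NULL) return NULL;
    memcpy(copy, s, len);
    copy[len] = '\0';
    return copy;
}
```

Inside getSDImage() this would replace the malloc()/strcpy() pair with *a = dup_trimmed(temp);, and the update's strncpy() call, which leaves its result unterminated, would no longer be needed.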

Resources