I am attempting to write a very basic lexer in C and have the following code, which is supposed to do something like the following:
Input: "12 142 123"
Output:
NUMBER -- 12
NUMBER -- 14
NUMBER -- 123
However, I am having an issue where if I do not include an initial printf("") statement before looping over the input, then I will get an output like this:
Output:
NUMBER --
NUMBER -- 14
NUMBER -- 123
where the first number is simply blank. I am really confused as to why this is happening and would really appreciate some help with this!
I have the following code (with a number of irrelevant functions omitted)
#define MAX_LEN 400
char* input;
char* ptr;
char curr_type;
char curr;
enum token_type {
END,
NUMBER,
UNEXPECTED
};
typedef struct {
enum token_type type;
char* str;
} Token;
void print_tok(Token t) {
printf("%s -- %s\n", token_types[t.type], t.str);
}
char get(void) {
return *ptr++;
}
char peek(void) {
return *ptr;
}
Token number(void) {
char arr[MAX_LEN];
arr[0] = peek();
get();
int i = 1;
while (is_digit(peek())) {
arr[i] = get();
++i;
}
arr[++i] = '\0';
Token ret = {NUMBER, (char*)arr};
return ret;
}
Token unexpected(void) {
// omitted
}
Token next(void) {
while (is_space(peek())) get();
char c = peek();
switch (peek()) {
case '0':
// omitted
case '9':
return number();
default:
return unexpected();
}
}
int main(int argc, char **argv) {
printf(""); // works fine with this line
input = argv[1];
ptr = input;
Token tokens[MAX_LEN];
Token t;
int i = 0;
do {
t = next();
print_tok(t);
tokens[i++] = t;
} while (t.type != END && t.type != UNEXPECTED);
return 0;
}
In number, arr is a local variable. The local variable is destroyed when its function ends and its content is then unpredictable. Nonetheless, your program then prints its value by using a pointer in the Token struct.
The value that is printed is unpredictable. The extra printf("") call may change how the compiler lays out or reuses the stack, so that the old contents of arr happen not to be overwritten before they are printed. You cannot rely on it.
You have several other options to allocate memory per token:
Change str in token so it's an array of chars instead of a pointer. Then each token has its own space to store the string.
Allocate the string with malloc. Then it stays allocated until you free it.
Create the array in main so it's valid for both next and print_tok. You'd have to give next a pointer to the array, so it knows where it should store the string. This would only store one token's string at a time.
Basically any other way of creating an array other than making it a local variable in number.
Make the pointer point to where the token is in the original string. Add another variable in Token which stores how long the token is.
I think the first option is easiest and the last option uses the least memory, but I included some other options for completeness.
Related
I'm using an array of strings in C to hold arguments given to a custom shell. I initialize the array of buffers using:
char *args[MAX_CHAR];
Once I parse the arguments, I send them to the following function to determine the type of IO redirection if there are any (this is just the first of 3 functions to check for redirection and it only checks for STDIN redirection).
int parseInputFile(char **args, char *inputFilePath) {
char *inputSymbol = "<";
int isFound = 0;
for (int i = 0; i < MAX_ARG; i++) {
if (strlen(args[i]) == 0) {
isFound = 0;
break;
}
if ((strcmp(args[i], inputSymbol)) == 0) {
strcpy(inputFilePath, args[i+1]);
isFound = 1;
break;
}
}
return isFound;
}
Once I compile and run the shell, it crashes with a SIGSEGV. Using GDB I determined that the shell is crashing on the following line:
if (strlen(args[i]) == 0) {
This is because the address of args[i] (the first empty string after the parsed commands) is inaccessible. Here is the error from GDB and all relevant variables:
(gdb) next
359 if (strlen(args[i]) == 0) {
(gdb) p args[0]
$1 = 0x7fffffffe570 "echo"
(gdb) p args[1]
$2 = 0x7fffffffe575 "test"
(gdb) p args[2]
$3 = 0x0
(gdb) p i
$4 = 2
(gdb) next
Program received signal SIGSEGV, Segmentation fault.
parseInputFile (args=0x7fffffffd570, inputFilePath=0x7fffffffd240 "") at shell.c:359
359 if (strlen(args[i]) == 0) {
I believe that the p args[2] returning $3 = 0x0 means that because the index has yet to be written to, it is mapped to address 0x0 which is out of the bounds of execution. Although I can't figure out why this is because it was declared as a buffer. Any suggestions on how to solve this problem?
EDIT: Per Kaylum's comment, here is a minimal reproducible example
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
#include<unistd.h>
#include<sys/types.h>
#include<sys/wait.h>
#include <sys/stat.h>
#include<readline/readline.h>
#include<readline/history.h>
#include <fcntl.h>
// Defined values
#define MAX_CHAR 256
#define MAX_ARG 64
#define clear() printf("\033[H\033[J") // Clear window
#define DEFAULT_PROMPT_SUFFIX "> "
char PROMPT[MAX_CHAR], SPATH[1024];
int parseInputFile(char **args, char *inputFilePath) {
char *inputSymbol = "<";
int isFound = 0;
for (int i = 0; i < MAX_ARG; i++) {
if (strlen(args[i]) == 0) {
isFound = 0;
break;
}
if ((strcmp(args[i], inputSymbol)) == 0) {
strcpy(inputFilePath, args[i+1]);
isFound = 1;
break;
}
}
return isFound;
}
int ioRedirectHandler(char **args) {
char inputFilePath[MAX_CHAR] = "";
// Check if any redirects exist
if (parseInputFile(args, inputFilePath)) {
return 1;
} else {
return 0;
}
}
void parseArgs(char *cmd, char **cmdArgs) {
int na;
// Separate each argument of a command to a separate string
for (na = 0; na < MAX_ARG; na++) {
cmdArgs[na] = strsep(&cmd, " ");
if (cmdArgs[na] == NULL) {
break;
}
if (strlen(cmdArgs[na]) == 0) {
na--;
}
}
}
int processInput(char* input, char **args, char **pipedArgs) {
// Parse the single command and args
parseArgs(input, args);
return 0;
}
int getInput(char *input) {
char *buf, loc_prompt[MAX_CHAR] = "\n";
strcat(loc_prompt, PROMPT);
buf = readline(loc_prompt);
if (strlen(buf) != 0) {
add_history(buf);
strcpy(input, buf);
return 0;
} else {
return 1;
}
}
void init() {
char *uname;
clear();
uname = getenv("USER");
printf("\n\n \t\tWelcome to Student Shell, %s! \n\n", uname);
// Initialize the prompt
snprintf(PROMPT, MAX_CHAR, "%s%s", uname, DEFAULT_PROMPT_SUFFIX);
}
int main() {
char input[MAX_CHAR];
char *args[MAX_CHAR], *pipedArgs[MAX_CHAR];
int isPiped = 0, isIORedir = 0;
init();
while(1) {
// Get the user input
if (getInput(input)) {
continue;
}
isPiped = processInput(input, args, pipedArgs);
isIORedir = ioRedirectHandler(args);
}
return 0;
}
Note: If I forgot to include any important information, please let me know and I can get it updated.
When you write
char *args[MAX_CHAR];
you allocate room for MAX_CHAR pointers to char. You do not initialise the array. If it were a global variable, all the pointers would be initialised to NULL, but you declare it inside a function, so the elements can point anywhere. You should not dereference them before you have set them to point at something you are allowed to access.
You do set them, though, in parseArgs(), where you do this:
cmdArgs[na] = strsep(&cmd, " ");
There are two potential issues here, but let's deal with the one you hit first. When strsep() is through the tokens you are splitting, it returns NULL. You test for that to get out of parseArgs() so you already know this. However, where your program crashes you seem to have forgotten this again. You call strlen() on a NULL pointer, and that is a no-no.
There is a difference between NULL and the empty string. An empty string is a pointer to a buffer that has the zero char first; the string "" is a pointer to a location that holds the character '\0'. The NULL pointer is a special value for pointers, often address zero, that means that the pointer doesn't point anywhere. Obviously, the NULL pointer cannot point to an empty string. You need to check if an argument is NULL, not if it is the empty string.
If you want to check both for NULL and the empty string, you could do something like
if (!args[i] || strlen(args[i]) == 0) {
If args[i] is NULL then !args[i] is true, so you will enter the if body if you have NULL or if you have a pointer to an empty string.
(You could also check the empty string with !(*args[i]); *args[i] is the first character that args[i] points at. So *args[i] is zero if you have the empty string; zero is interpreted as false, so !(*args[i]) is true if and only if args[i] is the empty string. Not that this is more readable, but it shows again the difference between empty strings and NULL).
I mentioned another issue with the parsed arguments. Whether it is a problem or not depends on the application. When you parse a string with strsep(), you get pointers into the parsed string. You have to be careful not to free that string (it is input in your main() function) or to modify it after you have parsed it. If you change the string, you have changed what all the parsed strings look at. You do not do this in your program, so it isn't a problem here, but it is worth keeping in mind. If you want your parsed arguments to survive after the next command is parsed, you need to copy them; as the code is now, parsing the next command overwrites them.
In main
char input[MAX_CHAR];
char *args[MAX_CHAR], *pipedArgs[MAX_CHAR];
are all uninitialized. They contain indeterminate values. This could be a potential source of bugs, but is not the reason here, as
getInput modifies the contents of input to be a valid string before any reads occur.
pipedArgs is unused, so raises no issues (yet).
args is modified by parseArgs to (possibly!) contain a NULL sentinel value, without any indeterminate pointers being read first.
Firstly, in parseArgs it is possible to completely fill args without setting the NULL sentinel value that other parts of the program should rely on.
Looking deeper, in parseInputFile the following
if (strlen(args[i]) == 0)
contradicts the limits imposed by parseArgs that disallows empty strings in the array. More importantly, args[i] may be the sentinel NULL value, and strlen expects a non-NULL pointer to a valid string.
This termination condition should simply check if args[i] is NULL.
With
strcpy(inputFilePath, args[i+1]);
args[i+1] might also be the NULL sentinel value, and strcpy also expects non-NULL pointers to valid strings. You can see this in action when inputSymbol is a match for the final token in the array.
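args[i+1] may also evaluate as args[MAX_ARG], one past the last element that parseArgs can set. Because args is declared with MAX_CHAR elements this index is still inside the array, but the pointer stored there was never initialised.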
Additionally, inputFilePath has a string length limit of MAX_CHAR - 1, and args[i+1] is (possibly!) a dynamically allocated string whose length might exceed this.
Some edge cases found in getInput:
Both arguments to
strcat(loc_prompt, PROMPT);
are of the size MAX_CHAR. loc_prompt starts with a length of 1, so if PROMPT has the length MAX_CHAR - 1, the resulting string will have the length MAX_CHAR. This would leave no room for the NUL terminating byte.
readline can return NULL in some situations, so
buf = readline(loc_prompt);
if (strlen(buf) != 0) {
can again pass the NULL pointer to strlen.
A similar issue as before, on success readline returns a string of dynamic length, and
strcpy(input, buf);
can cause a buffer overflow by attempting to copy a string greater in length than MAX_CHAR - 1.
buf is a pointer to data allocated by malloc. It's unclear what add_history does, but this pointer must eventually be passed to free.
Some considerations.
Firstly, it is a good habit to initialize your data, even if it might not matter.
Secondly, using constants (#define MAX_CHAR 256) might help to reduce magic numbers, but they can lead you to design your program too rigidly if used in the same way.
Consider building your functions to accept a limit as an argument, and return a length. This allows you to more strictly track the sizes of your data, and prevents you from always designing around the maximum potential case.
A slightly contrived example of designing like this. We can see that find does not have to concern itself with possibly checking MAX_ARGS elements, as it is told precisely how long the list of valid elements is.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_ARGS 100
char *get_input(char *dest, size_t sz, const char *display) {
char *res;
if (display)
printf("%s", display);
if ((res = fgets(dest, sz, stdin)))
dest[strcspn(dest, "\n")] = '\0';
return res;
}
size_t find(char **list, size_t length, const char *str) {
for (size_t i = 0; i < length; i++)
if (strcmp(list[i], str) == 0)
return i;
return length;
}
size_t split(char **list, size_t limit, char *source, const char *delim) {
size_t length = 0;
char *token;
while (length < limit && (token = strsep(&source, delim)))
if (*token)
list[length++] = token;
return length;
}
int main(void) {
char input[512] = { 0 };
char *args[MAX_ARGS] = { 0 };
puts("Welcome to the shell.");
while (1) {
if (get_input(input, sizeof input, "$ ")) {
size_t argl = split(args, MAX_ARGS, input, " ");
size_t redirection = find(args, argl, "<");
puts("Command parts:");
for (size_t i = 0; i < redirection; i++)
printf("%zu: %s\n", i, args[i]);
puts("Input files:");
if (redirection == argl)
puts("[[NONE]]");
else for (size_t i = redirection + 1; i < argl; i++)
printf("%zu: %s\n", i, args[i]);
} else {
break; // EOF or read error: leave the loop instead of spinning forever
}
}
}
I started learning programming with Python a couple of months ago, and I decided to learn C because I am interested in lower-level languages that let me interact more closely with the computer's hardware. I'm trying to get some input from a user in C, and I am writing my own little input parser. I'm having some trouble here:
#include <stdio.h>
char prompt()
{
char resp[]; // Create a variable for the users response
for (int i = 0; i < 1000; ++i)
{
char letter = getchar(); // Get character
if (letter == '\n') // If the user hits enter break the loop
{
break;
}
else // Otherwise append the character to the response array
{
resp[i] = letter;
}
}
return resp[]; // Return the response array
}
int main() {
return 0;
}
I'm receiving errors with this code. The errors specifically say:
error: definition of variable with array type needs an explicit size or an initializer
char resp[];
I take it that I must define a set value for an array or assign it something right away. I don't understand how I can grow the character array as the user types the input if arrays in C must have a defined value. I am thinking that using pointers or memory management might work, but I haven't learned a lot about these things yet so if you do have a solution involving pointers, it would be of great help to me if you could briefly explain what the code is doing. In the meantime I will try and find a solution.
You probably want to allocate memory dynamically. Here's how you would do it:
#include <stdio.h>
#include <stdlib.h>
char* prompt()
{
char* resp = NULL; // Buffer for the user's response; realloc(NULL, n) behaves like malloc(n)
int i;
for (i = 0; i < 1000; ++i)
{
char* tmp = realloc(resp, i + 2); // Room for one more char plus the terminating '\0'
if (tmp == NULL) // Allocation failed: the old buffer is still valid, so release it
{
free(resp);
return NULL;
}
resp = tmp;
int letter = getchar(); // int rather than char, so EOF can be detected
if (letter == '\n' || letter == EOF) // Stop when the user hits enter or input ends
{
break;
}
resp[i] = (char)letter; // Otherwise append the character to the response
}
resp[i] = '\0'; // Terminate the string so printf("%s") knows where it ends
return resp; // The caller now owns this memory
}
int main() {
char* answer = prompt();
if (answer != NULL)
printf("answer is %s\n", answer);
return 0;
}
This will work, but it is NOT recommended as written because the allocated memory is never passed to free().
EDIT:
How do you fix that? When you no longer need the string, call free() on the pointer that prompt() returned (free(answer); in the main() above). You could also avoid the separate function and do everything in main(), in which case you simply call free(resp); when you are done.
There isn't a dynamic runtime in C. You need to explicitly allocate resources because your code essentially translates directly into machine instructions.
Here's an example.
#include <stddef.h>
#include <stdlib.h>
#include <stdio.h>
int prompt(const size_t bufsize, char response[bufsize]) {
if (fgets(response, bufsize, stdin) == NULL)
return -1; // error
return 0;
}
int main() {
const size_t bufsize = 1024;
char* response = malloc(bufsize); // Create a variable for the users response
if (!response)
return -1; // error
if (prompt(bufsize, response))
return -1; // error
printf("%s", response);
free(response);
return 0;
}
I'm pretty new to C and was wondering if I could get some help! I've been working on this bug for +15 hours.
So, this program is a tokenizer.
Basically, the program is supposed to take a string, or "token stream," and break it up into "tokens." A "token" is a string of either a word, hexadecimal int, octal int, decimal int, floating point int, or symbol.
The code I'm posting is only the code where things go wrong, the other portion of my program is what creates the token.
The gist of how the below code works is this: It takes a "token stream", and then finds the next token from that stream. Once that is completed, it will create a substring of the "token stream" minus the new token, and return that as the new "token stream."
Essentially, when the string "0x4356/*abdc 0777 */[]87656879jlhg kl(/j jlkh 'no thank you' /" is passed through, the program will do everything properly except when "jlhg kl(/j jlkh 'no thank you' /" passes. Once that passes through my program, a "jlhg" token is created BUT then it is added to the end of the token stream again. So, the new token stream to be broken down becomes " kl(/j jlkh 'no thank you' / jlhg" where jlhg is added on at the end, where it wasn't there before. It does this same weird thing once more, right afterwards, but with "kl" instead.
It only does this under extremely weird conditions, so I'm not sure of the cause. I put print statements throughout my program and things flow normally, except that seemingly out of nowhere the program will just add those at the end. This is why I feel like it might be a memory problem, but I have absolutely no clue where to go from here.
Any help would be GREATLY appreciated!!!!
EDIT: If you pass the string "array[xyz ] += pi 3.14159e-10 A12B" output should be:
word "array"
left brace "["
word "xyz"
right brace "]"
plusequals "+="
word "pi"
float "3.14159e-10"
word "A12B"
My TokenizerT is this:
struct TokenizerT_
{
char *tokenType;
char *token;
};
typedef struct TokenizerT_ TokenizerT;
Relevant code:
/*
* TKNewStream takes two TokenizerT objects.
* It will locate the index of the end of the last token,
* and create a substring with the new string to be tokenized.
* #tokenStream: old token stream
* #newToken: new token created from old token stream
*
*/
char *TKGetNextStream(char *tokenStream, char *newToken)
{
int i,
index = 0,
count = 0;
char last = newToken[strlen(newToken)-1];
for(i = 0; i < strlen(newToken); i++)
{
if(newToken[i] == last)
{
count++;
}
}
for(i = 0; i < strlen(tokenStream); i++)
{
if(tokenStream[i] == last && count == 1)
{
index = i + 1;
break;
}
else if(tokenStream[i] == last)
{
count--;
}
}
char *ret = malloc(sizeof(char)*(strlen(tokenStream) - index));
for(i = 0; i < strlen(tokenStream) - index; i++)
{
ret[i] = tokenStream[i+index];
}
return ret;
}
/*
* This is my main
*/
int main(int argc, char **argv)
{
char *string = "0x4356/*abdc 0777 */[]87656879jlhg kl(/j jlkh 'no thank you' /";
TokenizerT *newToken = malloc(sizeof(struct TokenizerT_)),
*tokenStream = malloc(sizeof(struct TokenizerT_));
tokenStream->token = string;
while(newToken != NULL)
{
newToken = TKCreate(TKGetNextToken(tokenStream));
if(newToken != NULL)
{
tokenStream->token = TKGetNextStream(tokenStream->token,
newToken->token);
printf("%s \"%s\"\n",
newToken->tokenType,
newToken->token);
}
}
TKDestroy(newToken);
return 0;
}
The string created in ret isn't properly null terminated. So all the functions dealing with strings will assume it goes on until the next random zero byte that happens to be found after the allocated memory.
To fix this allocate one more byte of space for ret and set that to zero, or use an existing function like strdup() to copy the string:
ret = strdup(tokenStream + index);
The following program stores every word and then prints them with a number of occurrences.
Global typedef declaration:
typedef struct {
char * word;
int occ;
}
words;
words *data=NULL;
I have a problem with the search function. I've created a function returning int that looks like this: (max is the constantly updated size of an array of structures, that's why I call search function after EOF is reached.)
int search(char *word,int max)
{
int i;
for(i=0; i<max; i++)
{
if(!strcmp(data[i].word,word)) return i;
}
return -1;
}
But I noticed I'm supposed to write a search function having that prototype:
struct abc *find(char *word)
So I've created the following code:
struct words *findword(char *word)
{
struct words *ptr;
for (ptr = data; ptr != NULL; ptr++) { /* IS THE STOP CONDITION OK? */
if (strcmp(word, ptr->word) == 0)
return ptr;
}
return NULL;
}
And I receive many errors during compilation:
reverse.c: In function ‘findword’:
reverse.c:73: warning: assignment from incompatible pointer type
reverse.c:73: error: increment of pointer to unknown structure
reverse.c:73: error: arithmetic on pointer to an incomplete type
reverse.c:74: error: dereferencing pointer to incomplete type
reverse.c: In function ‘main’:
reverse.c:171: error: ‘which’ undeclared (first use in this function)
reverse.c:171: error: (Each undeclared identifier is reported only once
reverse.c:171: error: for each function it appears in.)
make: *** [reverse.o] Error 1
which is an int variable assigned to the return of my firstly written search function.
The error with which is easily fixed, but I don't know how to replace that (solution working with my base search function):
data[which].occ++;
How to fix it so that it'll work with my new approach to search?
EDIT
main() added:
int main(int argc, char **argv)
{
char *word;
words *temp;
int c,i,num;
/*int which;*/
FILE *infile;
if(argc!=2) {}
if((infile=fopen(argv[1],"r"))==NULL) {}
num=0;
while(1)
{
c=fgetc(infile);
if(c==EOF) break;
if(!isalpha(c)) continue;
else ungetc(c,infile);
word=getword(infile);
word=convert(word);
/*which=search(word,num);*/
if(findword(word))
{
if(!(temp=realloc(data,sizeof(words)*(num+1))))
{}
else
data=temp;
data[num].word=strdup(word);
data[num].occ=1;
num++;
}
else
data[which].occ++;
free(word);
}
sort(num-1);
for(i=0;i<num;i++)
{}
free(data);
if(fclose(infile))
{}
return 0;
}
I've left {} for the irrelevant pieces of code eg. error handling.
EDIT2
The things I'm asking for above, are fixed. However, I get a seg fault now.
I'll give a link to the whole code, I don't want to put it in an edited post since it'd create a big mess. Seg fault is caused by lines 73 and 152 (strcmp is not working somehow). Hope that full code will be easier to understand.
FULL CODE
The problems are with your findword function. Let's go through all the lines:
struct words *ptr;
This is not what you meant to do. Your typedef gives the name words to an unnamed struct, so there is no type called struct words; writing struct words declares a new, incomplete type. This is why you're getting the error: reverse.c:73: error: increment of pointer to unknown structure. What you want is just:
words *ptr;
Next, the loop:
for(ptr=data; //This is fine, you're assigning your local ptr to the global data. I assume that's something valid
ptr != NULL; //That could OK too... we can loop while ptr is not NULL
ptr++) //This line makes no sense...
You may want to look up how for loops work again; the point is that you're incrementing something until it hits a condition. ptr++ moves the pointer to the next element, and nothing ever makes it NULL, so once you step past the last element you'll no longer be pointing at one of your structures.
I need to see your main() function to understand what you're trying to accomplish, but based on the prototype you have to follow, I think the easiest solution would be something like:
int main(void)
{
// init your vars
bool more_words_to_add = true;
words *ptr = NULL;
int i;
// populate your array of words
while(more_words_to_add) {
for(i = 0; i<max; i++) {
if(ptr = findword("word")) //if we find the word
ptr->occ++; //increment the number of times we found it
else {
//I don't know what you want to do here, it's not clear what your goal is.
//Add the new word to your array of words and set occ to 1,
//then increment max because there's one more word in your array?
}
}
//get the next word to find, or else set more_words_to_add = false to break
}
}
If this type of solution is what you're looking to do, then you can adjust your findword function to be very simple:
words *findword(char *word)
{
words *ptr = data;
if (strcmp(word, ptr->word) == 0)
return ptr;
return NULL;
}
EDIT: For your new error I suspect the problem is with your memory allocation, see this short example of using your structure:
words *findword(char *word)
{
words *ptr = data;
if(strcmp(word, ptr->word) == 0)
return ptr;
return NULL;
}
int main(){
words *ptr;
data = realloc(data, sizeof(words));
data->word = "hello"; //DO NOT SKIP THESE LINES
data->occ = 0; //DO NOT SKIP THESE LINES
if(ptr = findword("hello")) {
ptr->occ++;
printf("I found %d %s's\n",ptr->occ, ptr->word);
}
}
mike#linux-4puc:~> ./a.out
I found 1 hello's
You can see here that you need to alloc some memory for the global structure then you can store data in it and pass pointers to it.
EDIT 2:
Your main() code does this:
if((ptr = findword(word)))
{
//do stuff
}
else
ptr->occ++;
That's not going to work because if findword() fails it returns NULL, so in the if check ptr is set to NULL, and then in the else you're trying to dereference NULL. If (and keep in mind I'm not really reading your logic, so this is up to you) you really want to increment ptr->occ when a word is not found, then you want this instead:
if(findword(word))
{
ptr = findword(word);
//do stuff
}
else
ptr->occ++; //increments the current ptr's occ, no new ptr was assigned.
for (ptr = data; ptr != NULL; ptr++) {
/* IS THE STOP CONDITION OK? */
No. Your pointer just keeps getting incremented; nothing in that loop ever sets it to NULL, so it runs off the end of the array, which is undefined behavior. You could instead look at what it points to and see if that is NULL, IF you preset the data area to 0's and always keep at least one zeroed element after the last word:
#define NUM_WORDS 100
data = calloc(NUM_WORDS,sizeof(words));
Or
#define NUM_WORDS 100
int bytes = NUM_WORDS * sizeof(words);
data = malloc(bytes);
memset(data,0,bytes);
....
for (ptr = data; ptr->word != NULL; ptr++) {
If you don't want to preset the data area to 0 then you will have to pass the current amount of structs currently held in the data area to your function in order to know how much to loop.
There's no such thing as struct words in your program; there's an unnamed struct type, and a typedef words to that type. Either use struct words or words consistently.
You'll then need to replace
data[which].occ++;
with
result->occ++;
where result is the return value from your new search function.
I'm trying to make a quick function that gets a word/argument in a string by its number:
char* arg(char* S, int Num) {
char* Return = "";
int Spaces = 0;
int i = 0;
for (i; i<strlen(S); i++) {
if (S[i] == ' ') {
Spaces++;
}
else if (Spaces == Num) {
//Want to append S[i] to Return here.
}
else if (Spaces > Num) {
return Return;
}
}
printf("%s-\n", Return);
return Return;
}
I can't find a way to put the characters into Return. I have found lots of posts that suggest strcat() or tricks with pointers, but every one segfaults. I've also seen people saying that malloc() should be used, but I'm not sure how I'd use it in a loop like this.
I will not claim to understand what it is that you're trying to do, but your code has two problems:
You're assigning a read-only string to Return; that string will be in your binary's data section, which is read-only, and if you try to modify it you will get a segfault.
Your for loop is O(n^2), because strlen() is O(n) and is called on every iteration.
There are several different ways of solving the "how to return a string" problem. You can, for example:
Use malloc() / calloc() to allocate a new string, as has been suggested
Use asprintf(), which is similar but gives you formatting if you need
Pass an output string (and its maximum size) as a parameter to the function
The first two require the calling function to free() the returned value. The third allows the caller to decide how to allocate the string (stack or heap), but requires some sort of contract about the minimum size needed for the output string.
In your code, Return points at the string literal "", which must not be modified, so appending to it (with strcat(), for instance) is undefined behavior. It might appear to work, but you should never rely on it.
Typically in C, you'd want to pass the "return" string as an argument instead, so that you don't have to free it all the time. Both require a local variable on the caller's side, but malloc'ing it will require an additional call to free the allocated memory and is also more expensive than simply passing a pointer to a local variable.
As for appending to the string, just use array notation (keep track of the current char/index) and don't forget to add a null character at the end.
Example:
int arg(char* ptr, char* S, int Num) {
int i, Spaces = 0, cur = 0;
for (i=0; i<strlen(S); i++) {
if (S[i] == ' ') {
Spaces++;
}
else if (Spaces == Num) {
ptr[cur++] = S[i]; // append char
}
else if (Spaces > Num) {
ptr[cur] = '\0'; // insert null char
return 0; // returns 0 on success
}
}
ptr[cur] = '\0'; // insert null char
return (cur > 0 ? 0 : -1); // returns 0 on success, -1 on error
}
Then invoke it like so:
char myArg[50];
if (arg(myArg, "this is an example", 3) == 0) {
printf("arg is %s\n", myArg);
} else {
// arg not found
}
Just make sure you don't overflow ptr (e.g.: by passing its size and adding a check in the function).
There are a number of ways you could improve your code, but let's just start by making it meet the standard. ;-)
P.S.: Don't malloc unless you need to. And in this case you don't.
char * Return; //by the way horrible name for a variable.
Return = malloc(<some size>);
......
......
*(Return + index) = *(S+i);
You can't assign anything to a string literal such as "".
You may want to use your loop to determine the offset of the start of the word you're looking for in your string. Then find its length by continuing through the string until you encounter the end or another space. Then, you can malloc an array of chars with size equal to the length of the word + 1 (for the null terminator). Finally, copy the substring into this new buffer and return it.
Also, as mentioned above, you may want to move the strlen call out of the loop condition. A compiler may be able to hoist it, but otherwise it is a linear scan for every character in the array, making the loop O(n^2).
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *arg(const char *S, unsigned int Num) {
char *Return = "";
const char *top, *p;
unsigned int Spaces = 0;
int i = 0;
Return=(char*)malloc(sizeof(char));
*Return = '\0';
if(S == NULL || *S=='\0') return Return;
p=top=S;
while(Spaces != Num){
if(NULL!=(p=strchr(top, ' '))){
++Spaces;
top=++p;
} else {
break;
}
}
if(Spaces < Num) return Return;
if(NULL!=(p=strchr(top, ' '))){
int len = p - top;
Return=(char*)realloc(Return, sizeof(char)*(len+1));
strncpy(Return, top, len);
Return[len]='\0';
} else {
free(Return);
Return=strdup(top);
}
//printf("%s-\n", Return);
return Return;
}
int main(){
char *word;
word=arg("make a quick function", 2);//quick
printf("\"%s\"\n", word);
free(word);
return 0;
}