String tokenizer without using strtok() - c

I'm in the process of writing a string tokenizer without using strtok(). This is mainly for my own betterment and for a greater understanding of pointers. I think I almost have it, but I've been receiving the following errors:
myToc.c:25 warning: assignment makes integer from pointer without a cast
myToc.c:35 (same as above)
myToc.c:44 error: invalid type argument of 'unary *' (have 'int')
What I'm doing is looping through the string sent to the method, finding each delimiter, and replacing it with '\0'. The "ptr" array is supposed to hold pointers to the separated substrings. This is what I have so far.
#include <string.h>
void myToc(char * str){
int spcCount = 0;
int ptrIndex = 0;
int n = strlen(str);
for(int i = 0; i < n; i++){
if(i != 0 && str[i] == ' ' && str[i-1] != ' '){
spcCount++;
}
}
//Pointer array; +1 for \0 character, +1 for one word more than number of spaces
int *ptr = (int *) calloc(spcCount+2, sizeof(char));
ptr[spcCount+1] = '\0';
//Used to differentiate separating spaces from unnecessary ones
char temp;
for(int j = 0; j < n; j++){
if(j == 0){
/*Line 25*/ ptr[ptrIndex] = &str[j];
temp = str[j];
ptrIndex++;
}
else{
if(str[j] == ' '){
temp = str[j];
str[j] = '\0';
}
else if(str[j] != ' ' && str[j] != '\0' && temp == ' '){
/*Line 35*/ ptr[ptrIndex] = &str[j];
temp = str[j];
ptrIndex++;
}
}
}
int k = 0;
while(ptr[k] != '\0'){
/*Line 44*/ printf("%s \n", *ptr[k]);
k++;
}
}
I can see where the errors are occurring but I'm not sure how to correct them. What should I do? Am I allocating memory correctly or is it just an issue with how I'm specifying the addresses?

Your pointer array is wrong. It looks like you want:
char **ptr = calloc(spcCount+2, sizeof(char*));
Also, if I am reading your code correctly, there is no need for the null byte as this array is not a string.
In addition, you'll need to fix:
while(ptr[k] != '\0'){
/*Line 44*/ printf("%s \n", *ptr[k]);
k++;
}
The dereference is not required and if you remove the null ptr, this should work:
for ( k = 0; k < ptrIndex; k++ ){
/*Line 44*/ printf("%s \n", ptr[k]);
}

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void myToc(char * str){
int spcCount = 0;
int ptrIndex = 0;
int n = strlen(str);
for(int i = 0; i < n; i++){
if(i != 0 && str[i] == ' ' && str[i-1] != ' '){
spcCount++;
}
}
char **ptr = calloc(spcCount+2, sizeof(char*));
//ptr[spcCount+1] = '\0';//0 initialized by calloc
char temp = ' ';//can simplify the code
for(int j = 0; j < n; j++){
if(str[j] == ' '){
temp = str[j];
str[j] = '\0';
} else if(str[j] != '\0' && temp == ' '){//can omit `str[j] != ' ' &&`
ptr[ptrIndex++] = &str[j];
temp = str[j];
}
}
int k = 0;
while(ptr[k] != NULL){//better use NULL
printf("%s \n", ptr[k++]);
}
free(ptr);
}
int main(){
char test1[] = "a b c";
myToc(test1);
char test2[] = "hello world";
myToc(test2);
return 0;
}

Update: I tried this at http://www.compileonline.com/compile_c99_online.php with the fixes for lines 25, 35, and 44, and with a main function that called myToc() twice. I initially encountered segfaults when trying to write null characters to str[], but that was only because the strings I was passing were (apparently non-modifiable) literals. The code below worked as desired when I allocated a text buffer and wrote the strings there before passing them in. This version also could be modified to return the array of pointers, which then would point to the tokens.
(The code below also works even when the string parameter is non-modifiable, as long as myToc() makes a local copy of the string; but that would not have the desired effect if the purpose of the function is to return the list of tokens rather than just print them.)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void myToc(char * str){
int spcCount = 0;
int ptrIndex = 0;
int n = strlen(str);
for(int i = 0; i < n; i++){
if(i != 0 && str[i] == ' ' && str[i-1] != ' '){
spcCount++;
}
}
//Pointer array; +1 for one word more than number of spaces
char** ptr = (char**) calloc(spcCount+2, sizeof(char*));
//Used to differentiate separating spaces from unnecessary ones
char temp;
for(int j = 0; j < n; j++){
if(j == 0){
ptr[ptrIndex] = &str[j];
temp = str[j];
ptrIndex++;
}
else{
if(str[j] == ' '){
temp = str[j];
str[j] = '\0';
}
else if(str[j] != ' ' && str[j] != '\0' && temp == ' '){
ptr[ptrIndex] = &str[j];
temp = str[j];
ptrIndex++;
}
}
}
for (int k = 0; k < ptrIndex; ++k){
printf("%s \n", ptr[k]);
}
free(ptr); //release the pointer array; the tokens themselves live in str
}
int main (int n, char** v)
{
char text[256];
strcpy(text, "a b c");
myToc(text);
printf("-----\n");
strcpy(text, "hello world");
myToc(text);
}
I would prefer simpler code, however. Basically you want a pointer to the first non-blank character in str[], then a pointer to each non-blank (other than the first) that is preceded by a blank. Your first loop almost gets this idea except it is looking for blanks preceded by non-blanks. (Also you could start that loop at i = 1 and avoid having to test i != 0 on each iteration.)
I might just allocate an array of char* of size sizeof(char*) * (n + 1)/2 to hold the pointers rather than looping over the string twice (that is, I'd omit the first loop, which is just to figure out the size of the array). In any case, if str[0] is non-blank I would write its address to the array; then looping for (int j = 1; j < n; ++j), write the address of str[j] to the array if str[j] is non-blank and str[j - 1] is blank--basically what you are doing, but with fewer ifs and fewer auxiliary variables.
Less code means less opportunity to introduce a bug, as long as the code is clean and makes sense.
Previous remarks:
int *ptr = declares an array of int. For an array of pointers to char, you want
char** ptr = (char**) calloc(spcCount+2, sizeof(char*));
The comment prior to that line also seems to indicate some confusion. There is no terminating null in your array of pointers, and you don't need to allocate space for one, so possibly spcCount+2 could be spcCount + 1.
This also is suspect:
while(ptr[k] != '\0')
It looks like it would work, given the way you used calloc (you do need spcCount+2 to make this work), but I would feel more secure writing something like this:
for (k = 0; k < ptrIndex; ++k)
I do not think that is what caused the segfault; it just makes me a little uneasy to compare a pointer (ptr[k]) with \0 (which you would normally compare against a char).

Related

Same first and last letter in array C

My assignment is to write a C program, in which user inputs a sentence, and the program checks if there are words that begin and end with the same letter (for example eye, roar, sos etc.)
I wrote this and I am not sure it will work
I've also got an error: uninitialized local variable 'p2' used. Could someone tell me where I should initialize it?
Maybe someone can add some tips for improving?
#include <stdio.h>
#include<stdlib.h>
#include <string.h>
#define MAX 100
int main()
{
system("chcp 1251");
char arr[MAX];
char* p1, *p2;
int sym, i;
printf("\n\t Enter your sentence(use .(dot) or Enter to finish input):");
while ((sym = getchar()) != '.' && sym != '\n' && i < MAX)
{
arr[i++] = sym; //symbols to array
arr[i] = '0';
}
p1 = arr; //pointer for 1st symbol
for (i = 0; i < MAX; i++)
{
if (arr[i] == ',' || arr[i] == ' ') //finding a word
{
*p2 = arr[i--];
if (*p1 = *p2) //if first and last letters are equal
{
printf("%s", arr);
}
if (arr[i++] == ',' || arr[i++] == ' ')
{
*p1 = arr[i + 2];
}
*p1 = arr[i++];
}
}
return 0;
}
The variable i should be initialized, i = 0; otherwise i starts from some random value. You can just use fgets to read input, then use strcspn to remove the last \n character.
*p2 = arr[i--];
p2 is declared as a pointer and again is not initialized. It's pointing to some random address in memory and should not be dereferenced yet.
If you are not experienced enough with pointers, you can just use the array style indexing.
for (i = 0; i < MAX; i++)...
MAX is the maximum number of characters that can be stored in arr, but the string's actual length can be shorter; it is given by strlen. Example:
char arr[MAX];
printf("input:");
fgets(arr, sizeof(arr), stdin);
arr[strcspn(arr, "\n")] = 0;
size_t len = strlen(arr);
if(len == 0) return 0;
if (arr[0] == arr[len - 1])
printf("first and last match");
else
printf("first and last don't match");
If there is more than one word in arr, you can use strtok to break arr into words and check each word individually.

Why am I getting a Seg-Fa

So I apologize for the general question. I haven't been able to find anything that speaks to my specific case. If there is something out there and I missed it, I'm sorry.
I am writing a function that reverses a string. It's for a project that comes with some pretty specific guidelines. I'm not allowed to use any functions such as malloc, printf etc and my function needs to return the string that is passed in as an argument. The function needs to be prototyped as follows:
char *ft_strrev(char *str);
This is my function:
char *ft_strrev(char *str)
{
int i;
int j;
char c;
i = 0;
j = ;
c = '0';
while(str[j] != '\0')
j++;
while(i != j)
{
c = str[i];
str[i] = str[j];
str[j] = c;
i++;
j--;
}
}
When I call this in a main and test it with putstr https://github.com/kigiri/userpref/blob/master/ft_42/ft_putstr.c it compiles fine, but at runtime I get a seg fault.
What am I doing wrong?
There are two problems with your code (apart from the j = ; line, which does not compile as written).
After the first while loop, j indexes the '\0' after the end of the string rather than the last character of the string.
The condition of the second loop does not handle the case where j - i is odd initially. For example, if i is 0 and j is 1 initially, then after the first iteration i will be 1 and j will be 0, so the condition i != j will still be true and the indices run past each other.
Here is fixed code:
char *ft_strrev (char *str)
{
int i = 0, j = 0;
while (str [j] != '\0') j++;
while (i < --j) {
char t = str [i];
str [i++] = str [j];
str [j] = t;
}
return str;
}

C - Split a string at the whitespaces

I need to split a string where there are spaces (ex string: Hello this is an example string.) into an array of words. I'm not sure what I'm missing here, and I'm also curious as to what the best way to test this function is. The only library function allowed is malloc.
Any help is appreciated!
#include <stdlib.h>
char **ft_split(char *str) {
int wordlength;
int wordcount;
char **wordbank;
int i;
int current;
current = 0;
wordlength = 0;
//while sentence
while (str[wordlength] != '\0') {
//go till letters
while (str[current] == ' ')
current++;
//go till spaces
wordlength = 0;
while (str[wordlength] != ' ' && str[wordlength] != '\0')
wordlength++;
//make memory for word
wordbank[wordcount] = malloc(sizeof(char) * (wordlength - current + 1));
i = 0;
//fill wordbank current
while (i < wordlength - current) {
wordbank[wordcount][i] = str[current];
i++;
current++;
}
//end word with '\0'
wordbank[wordcount][i] = '\0';
wordcount++;
}
return wordbank;
}
There are multiple problems in your code:
You do not allocate an array for wordbank to point to, dereferencing an uninitialized pointer has undefined behavior.
Your approach to scanning the string is broken: you reset wordlength inside the loop so you keep re-scanning from the beginning of the string.
You should allocate an extra entry in the array for a trailing null pointer to indicate the end of the array to the caller.
Here is a modified version:
#include <stdlib.h>
#include <string.h> // for memcpy
char **ft_split(const char *str) {
size_t i, j, k, wordcount;
char **wordbank;
char *p; // points to the copy of the current word
// count the number of words:
wordcount = 0;
for (i = 0; str[i]; i++) {
if (str[i] != ' ' && (i == 0 || str[i - 1] == ' ')) {
wordcount++;
}
}
// allocate the word array
wordbank = malloc((wordcount + 1) * sizeof(*wordbank));
if (wordbank) {
for (i = k = 0;;) {
// skip spaces
while (str[i] == ' ')
i++;
// check for end of string
if (str[i] == '\0')
break;
// scan for end of word
for (j = i++; str[i] != '\0' && str[i] != ' '; i++)
continue;
// allocate space for word copy
wordbank[k] = p = malloc(i - j + 1);
if (p == NULL) {
// allocation failed: free and return NULL
while (k-- > 0) {
free(wordbank[k]);
}
free(wordbank);
return NULL;
}
// copy string contents
memcpy(p, str + j, i - j);
p[i - j] = '\0';
// move on to the next slot in the word array
k++;
}
// set a null pointer at the end of the array
wordbank[k] = NULL;
}
return wordbank;
}
You need to malloc() wordbank too. You can count the number of words, and then
wordbank = malloc((count + 1) * sizeof(*wordbank));
if (wordbank == NULL)
return NULL;
Note: sizeof(char) is 1 by definition. And sizeof *pointer is always what you want.

Copying a string into a new array

I'm trying to read a string in an array, and if a character is not any of the excluded characters int a = ('a'||'e'||'i'||'o'||'u'||'y'||'w'||'h'); it should copy the character into a new array, then print it.
The code reads as:
void letter_remover (char b[])
{
int i;
char c[MAX];
int a = ('a'||'e'||'i'||'o'||'u'||'y'||'w'||'h');
for (i = 0; b[i] != '\0'; i++)
{
if (b[i] != a)
{
c[i] = b[i];
}
i++;
}
c[i] = '\0';
printf("New string without forbidden characters: %s\n", c);
}
However it only prints New string without forbidden characters: h, if the inputted array is, for example hello. I'd like the output of this to be ll (with h, e and o removed).
Use this:
if (b[i] != 'a' && b[i] != 'e' && b[i] != 'i' && b[i] != 'o' && b[i] != 'u' && b[i] != 'y' && b[i] != 'w' && b[i] != 'h')
The boolean OR operator just returns 0 or 1, it doesn't create an object that automatically tests against all the parameters to the operator.
You could also use the strchr() function to search for a character in a string.
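char a[] = "aeiouywh";
int j = 0; // next free slot in c, so the output has no gaps
for (i = 0; b[i] != '\0'; i++)
{
if (!strchr(a, b[i])) // b[i] is not a forbidden character
{
c[j++] = b[i];
}
// no extra i++ here: the for statement already advances i
}
c[j] = '\0';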
int a = ('a'||'e'||'i'||'o'||'u'||'y'||'w'||'h');
...has an entirely different meaning than you expect. When you Boolean-OR together all those characters, a becomes 1. Since b[] contains no character value 1, no characters will be excluded. Also, your c[] is going to have empty slots if you had tested correctly.
You can use strcspn() to test if your string contains your forbidden characters. For example...
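// snip
size_t i = 0, j = 0;
const char *a = "aeiouywh";
while (b[i])
{
// length of the run of allowed characters starting at b[i]
size_t idx = strcspn(&b[i], a);
if (idx > 0)
strncpy(&c[j], &b[i], idx);
j += idx;
i += idx;
if (b[i]) // stopped at a forbidden character, not at the end: skip it
i++;
}
c[j] = '\0';
// etc...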
Also, you must be sure c[] is large enough to contain all the characters that might be copied.

How do I allocate memory to my char pointer?

My assignment is to allow the user to enter any input and print the occurrences of letters and words, we also have to print out how many one letter, two, three, etc.. letter words are in the string. I have gotten the letter part of my code to work and have revised my word function several times, but still can't get the word finding function to even begin to work. The compiler says the char pointer word is undeclared when it clearly is. Do I have to allocate memory to it and the array of characters?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void findLetters(char *ptr);
void findWords(char *point);
int main()
{
char textStream[100]; //up to 98 characters and '\n\ and '\0'
printf("enter some text\n");
if (fgets(textStream, sizeof (textStream), stdin)) //input up to 99 characters
{
findLetters(textStream);
findWords(textStream);
}
else
{
printf("fgets failed\n");
}
return 0;
}
void findLetters(char *ptr) //find occurences of all letters
{
int upLetters[26];
int loLetters[26];
int i;
int index;
for (i = 0; i < 26; i++) // set array to all zero
{
upLetters[i] = 0;
loLetters[i] = 0;
}
i = 0;
while (ptr[i] != '\0') // loop until prt[i] is '\0'
{
if (ptr[i] >= 'A' && ptr[i] <= 'Z') //stores occurrences of uppercase letters
{
index = ptr[i] - 'A';// subtract 'A' to get index 0-25
upLetters[index]++;//add one
}
if (ptr[i] >= 'a' && ptr[i] <= 'z') //stores occurrences of lowercase letters
{
index = ptr[i] - 'a';//subtract 'a' to get index 0-25
loLetters[index]++;//add one
}
i++;//next character in ptr
}
printf("Number of Occurrences of Uppercase letters\n\n");
for (i = 0; i < 26; i++)//loop through 0 to 25
{
if (upLetters[i] > 0)
{
printf("%c : \t%d\n", (char)(i + 'A'), upLetters[i]);
// add 'A' to go from an index back to a character
}
}
printf("\n");
printf("Number of Occurrences of Lowercase letters\n\n");
for (i = 0; i < 26; i++)
{
if (loLetters[i] > 0)
{
printf("%c : \t%d\n", (char)(i + 'a'), loLetters[i]);
// add 'a' to go back from an index to a character
}
}
printf("\n");
}
void findWords(char *point)
{
int i = 0;
int k = 0;
int count = 0;
int j = 0;
int space = 0;
int c = 0;
char *word[50];
char word1[50][100];
char* delim = "{ } . , ( ) ";
for (i = 0; i< sizeof(point); i++) //counts # of spaces between words
{
if ((point[i] == ' ') || (point[i] == ',') || (point[i] == '.'))
{
space++;
}
}
char *words = strtok(point, delim);
for(;k <= space; k++)
{
word[k] = malloc((words+1) * sizeof(*words));
}
while (words != NULL)
{
printf("%s\n",words);
strcpy(words, word[j++]);
words = strtok(NULL, delim);
}
free(words);
}
This is because (words+1) is a pointer, so malloc((words+1) * sizeof(*words)) tries to multiply a pointer by a size, which C does not allow. Change line 100 to:
word[k] = malloc(strlen(words)+1);
This will solve your compilation problem, but you still have other problems.
You've got a couple of problems in function findWords:
Here,
for (i = 0; i< sizeof(point); i++)
sizeof(point) is the same as sizeof(char*), as point is a char* in the function findWords. This is not what you want. Use
for (i = 0; i < strlen(point); i++)
instead. But this might be slow as strlen will be called in every iteration. So I suggest
int len = strlen(point);
for (i = 0; i < len; i++)
The same problem lies here too:
word[k] = malloc((words+1) * sizeof(*words));
It doesn't make sense to compute a size from (words+1), which is a pointer. I think you want
word[k] = malloc( strlen(words) + 1 ); //+1 for the NUL-terminator
You got the arguments all mixed up:
strcpy(words, word[j++]);
You actually wanted
strcpy(word[j++], words);
which copies the contents of words to word[j++].
Here:
free(words);
words was never allocated memory. Since you free a pointer that has not been returned by malloc/calloc/realloc, the code exhibits Undefined Behavior. So, remove that.
You allocated memory for each element of word. So free it using
for(k = 0; k <= space; k++)
{
free(word[k]);
}
The expression (words+1) is pointer arithmetic, not a length, so the size calculation is wrong. To make the compilation problem go away, change line 100 to:
word[k] = malloc( 1 + strlen(words));

Resources