strtok, making the delimiter a range of ASCII values [duplicate]

This question already has answers here:
How do I use strtok with every single nonalpha character as a delimeter? (C)
(4 answers)
Closed 7 years ago.
char *p_word;
p_word = strtok (p_input, " ,.-:\n1234567890");
while (p_word != NULL)
{
printf ("%s\n", p_word);
p_word = strtok (NULL, " ,.-:\n1234567890");
}
I'm reading in a text file and want to process it one word at a time, ignoring any characters that aren't part of the alphabet.
Instead of typing every single undesired character into the delimiter string (e.g. " ,.-:\n1234567890"), is there a way to specify a range of ASCII decimal values I don't want, i.e. 0-64, or otherwise everything that is NOT an alphabetic character?
Thanks
EDIT: I'm not allowed to use material that hasn't been taught, so I don't think I can use functions from "ctype.h".

If you must use strtok, you can build a delimiter string that contains every character except the letters of the alphabet, like this (assumes the ASCII character set):
char *p_word;
char delims[128];
int dindex;
int i;
dindex = 0;
for (i = 1; i < 'A'; i++)
delims[dindex++] = i;
for (i = 'Z' + 1; i < 'a'; i++)
delims[dindex++] = i;
for (i = 'z' + 1; i < 128; i++)
delims[dindex++] = i;
delims[dindex] = '\0';
p_word = strtok (p_input, delims);
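The fragment above can be wrapped into a small helper so that the delimiter set is built in one place and then reused for every strtok call. The following is only a minimal sketch, assuming an ASCII execution character set; the names build_nonalpha_delims and print_words are hypothetical and not part of the original answer. Bytes above 127 are not added to the delimiter set, so they would be treated as part of a word.

```c
#include <stdio.h>
#include <string.h>

/* Fill delims with every ASCII code from 1 to 127 that is not a letter.
   That is 75 delimiters plus the terminator, so 128 bytes are enough. */
void build_nonalpha_delims(char delims[128])
{
    int n = 0;
    for (int c = 1; c < 128; c++) {
        int is_letter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
        if (!is_letter)
            delims[n++] = (char)c;
    }
    delims[n] = '\0';
}

/* Print each alphabetic word of text on its own line and return the count.
   text is modified in place, as with any use of strtok. */
int print_words(char *text)
{
    char delims[128];
    int count = 0;

    build_nonalpha_delims(delims);
    for (char *w = strtok(text, delims); w != NULL; w = strtok(NULL, delims)) {
        printf("%s\n", w);
        count++;
    }
    return count;
}
```

Calling print_words on a writable buffer such as char text[] = "Hello, world 42 foo-bar\n"; should print four words. No function from ctype.h is needed, which matches the restriction mentioned in the question.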

You can write your own strtok-like function that accepts a predicate as the second parameter.
In real code you should give the function some other name, because strtok is already declared in <string.h>.
Here is a demonstration program. I have written a simplified predicate that returns non-zero for every character that is not an ASCII letter, that is, for the characters to skip. You may use your own predicate.
#include <stdio.h>
char * strtok( char *s, int cmp( char ) )
{
static char *p;
if ( s ) p = s;
if ( p )
{
while ( *p && cmp( *p ) ) ++p;
}
if ( !p || !*p ) return NULL;
char *t = p++;
while ( *p && !cmp( *p ) ) ++p;
if ( *p ) *p++ = '\0';
return t;
}
int cmp( char c )
{
c |= 0x20;
return c < 'a' || c > 'z';
}
int main( void )
{
char s[] = " ABC123abc<>XYZ!##xyz";
char *p = strtok( s, cmp );
while ( p )
{
puts( p );
p = strtok( NULL, cmp );
}
}
The program output is
ABC
abc
XYZ
xyz
Using the predicate you can specify in it any rules for skipped characters.
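A variation on the same idea is sketched below, assuming you are free to choose the function name and signature. The hypothetical tok_pred keeps its position in a caller-supplied pointer instead of a static variable, so it does not clash with the standard strtok and can tokenize several strings at once. The predicate not_letter is likewise only an illustration and assumes ASCII.

```c
#include <stddef.h>

/* Predicate-driven tokenizer. Pass the string on the first call and NULL
   afterwards; *save remembers where the next call should continue. */
char *tok_pred(char *s, int (*is_delim)(char), char **save)
{
    char *p = s ? s : *save;

    if (p == NULL)
        return NULL;
    while (*p && is_delim(*p))      /* skip leading delimiters */
        ++p;
    if (*p == '\0') {               /* nothing left to return */
        *save = NULL;
        return NULL;
    }
    char *start = p;
    while (*p && !is_delim(*p))     /* walk to the end of the token */
        ++p;
    if (*p)
        *p++ = '\0';                /* terminate the token in place */
    *save = p;
    return start;
}

/* Delimiter predicate: true for anything that is not an ASCII letter. */
int not_letter(char c)
{
    return !((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'));
}
```

With char s[] = " ABC123abc<>XYZ!##xyz"; and char *save;, the calls tok_pred(s, not_letter, &save) followed by tok_pred(NULL, not_letter, &save) should yield the same four tokens as the program above.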

Related

Recursive Function return ordered vector

I have a problem with the return value of this function: it should return the vector with the typed sequence, but it's returning an ordered sequence. Could you tell me where I'm going wrong?
Create a recursive function that takes a string and its length as input. This function should return how many times the substring “hi” appears in the string.
Input
The first line of the input contains the number n of test cases. Then n lines follow, each containing a string of at most 5000 characters.
Output
The output consists of n lines, each containing an integer that indicates the number of times the string “hi” occurs in the input string. All strings are written in lowercase letters.
Example
Input
4
hipotenuse hipothermia hilux hifi
hi
hihihi
xavante hey
Output
4
1
3
0
My code:
#include <stdio.h>
#include <string.h>
#define MAX 5000
int ocorrencias(const char *palavra, size_t len) {
char *st = "hi";
return len ?
(*st == *palavra) + ocorrencias(palavra + 1, len - 1) :
0;
}
int main() {
int n,i;
scanf("%d",&n);
char palavra[MAX];
int re[n];
for(i=0; i<=n; i++){
fgets(palavra, MAX, stdin);
re[i] = ocorrencias(palavra, strlen(palavra));
}
for(i=0; i<=n; i++){
printf("%d\n", re[i]);
}
}

For starters the for loops like this
for(i=0; i<=n; i++){
^^^^^
are wrong.
You need to write
for(i=0; i < n; i++){
^^^^^^
The function ocorrencias should be declared either like
size_t ocorrencias( const char *palavra );
or like
size_t ocorrencias( const char *palavra, const char *st );
instead of
int ocorrencias(const char *palavra, size_t len) ;
because it deals with strings and the function is able to determine lengths of passed strings.
It can be defined for example the following way
size_t ocorrencias( const char *palavra )
{
const char *st = "hi";
const char *p = strstr( palavra, st );
return p == NULL ? 0 : 1 + ocorrencias( p + strlen( st ) );
}
So in main you should write
size_t re[n];
//...
for ( i = 0; i < n; i++ )
{
printf( "%zu\n", re[i] );
}
Otherwise if you want to count substrings either "hi" or "hy" (according to your comment to the question) then the function can look the following way
size_t ocorrencias( const char *palavra )
{
const char *p = strchr( palavra, 'h' );
return p == NULL ? 0 : ( p[1] == 'i' || p[1] == 'y' ) + ocorrencias( p + 1 );
}
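If the assignment requires the recursive function to receive the length as well, the same count can be obtained without strstr. This is a minimal sketch under that assumption; the name count_hi is hypothetical. It compares the two characters at the front of the string and recurses on the remainder, so the recursion depth is at most the string length (5000 in the task).

```c
#include <stddef.h>

/* Count occurrences of "hi" in the first len characters of s, recursively. */
size_t count_hi(const char *s, size_t len)
{
    if (len < 2)                    /* too short to contain "hi" */
        return 0;
    if (s[0] == 'h' && s[1] == 'i') /* match: skip both characters */
        return 1 + count_hi(s + 2, len - 2);
    return count_hi(s + 1, len - 1);
}
```

In main it would be called as count_hi(palavra, strlen(palavra)). A trailing newline left by fgets does not affect the count.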

how to extract all tokens include null value from a string in C

The code below uses the strtok function to extract tokens from a string, with ',' as the delimiter.
For example:
#include<stdio.h>
#include <string.h>
int main() {
char string[200] = "$GNGNS,103600.01,5114.51176,N,00012.29380,W,45.6,,,V*00";
// Extract the first token
char * token = strtok(string, ",");
// loop through the string to extract all other tokens
while( token != NULL ) {
printf( " %s\n", token ); //printing each token
token = strtok(NULL, ",");
}
return 0;
}
The output is
$GNGNS
103600.01
5114.51176
N
00012.29380
W
45.6
V*00
Here the empty values between the consecutive commas (,,,) after 45.6 are ignored. How can I print all tokens, even those that are empty? Like this:
$GNGNS
103600.01
5114.51176
N
00012.29380
W
45.6


V*00

The function strtok does not consider an empty token to be a valid token. However, you can easily solve the problem using strchr instead of strtok:
#include <stdio.h>
#include <string.h>
int main()
{
const char *const string =
"$GNGNS,103600.01,5114.51176,N,00012.29380,W,45.6,,,V*00";
const char *p = string, *q;
//continue looping until there are no more ',' characters
while ( ( q = strchr( p, ',' ) ) != NULL )
{
//print token
printf( "%.*s\n", (int)(q-p), p );
//make p point one character past the ',', which is the
//first character of the next token, unless the next
//token is empty
p = q + 1;
}
//print last token
printf( "%s\n", p );
}
Output of program (the two empty tokens appear as empty lines):
$GNGNS
103600.01
5114.51176
N
00012.29380
W
45.6


V*00
This solution also has the advantage that the string is allowed to be read-only. Therefore, I was able to define the string as a string literal instead of an array (I didn't have to, though). This would not have been possible with strtok, because that function is destructive in the sense that it overwrites the delimiters with null terminating characters, which requires the string to be writable.
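The same strchr technique can be packaged as a reusable iterator, which is convenient when each field has to be processed rather than just printed. The sketch below is one possible shape, not the answer's own code; next_field and print_fields are hypothetical names. Like the program above, it never writes to the input string.

```c
#include <stdio.h>
#include <string.h>

/* Return the start of the next field and store its length in *len.
   *cursor is advanced past the delimiter, or set to NULL after the last
   field. Returns NULL when no fields remain. */
const char *next_field(const char **cursor, char delim, size_t *len)
{
    const char *start = *cursor;
    if (start == NULL)
        return NULL;

    const char *end = strchr(start, delim);
    if (end != NULL) {
        *len = (size_t)(end - start);
        *cursor = end + 1;          /* next field starts after the delimiter */
    } else {
        *len = strlen(start);
        *cursor = NULL;             /* this was the last field */
    }
    return start;
}

/* Print every field, including empty ones, and return how many there were. */
int print_fields(const char *line, char delim)
{
    const char *cursor = line;
    const char *field;
    size_t len;
    int count = 0;

    while ((field = next_field(&cursor, delim, &len)) != NULL) {
        printf("[%.*s]\n", (int)len, field);
        count++;
    }
    return count;
}
```

The brackets in the printf format are only there to make empty fields visible in the output.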

I would write something simple instead of strtok.
char **split(char **argv, int *argc, char *string, const char delimiter, int allowempty)
{
*argc = 0;
do
{
if(*string && (*string != delimiter || allowempty))
{
argv[(*argc)++] = string;
}
while(*string && *string != delimiter) string++;
if(*string) *string++ = 0;
if(!allowempty)
while(*string && *string == delimiter) string++;
}while(*string);
return argv;
}
int main() {
char string[200] = "$GNGNS,103600.01,5114.51176,N,00012.29380,W,45.6,,,V*00";
char *args[20];
int nargs;
split(args, &nargs, string, ',', 1);
for(int x = 0; x < nargs; x++)
{
printf("ARG[%2d] = `%s`\n", x, args[x]);
}
return 0;
}
https://godbolt.org/z/vKqP4hvG4
Output:
ARG[ 0] = `$GNGNS`
ARG[ 1] = `103600.01`
ARG[ 2] = `5114.51176`
ARG[ 3] = `N`
ARG[ 4] = `00012.29380`
ARG[ 5] = `W`
ARG[ 6] = `45.6`
ARG[ 7] = ``
ARG[ 8] = ``
ARG[ 9] = `V*00`

strcat adds junk to the string

I'm trying to reverse a sentence, without changing the order of words,
For example: "Hello World" => "olleH dlroW"
Here is my code:
#include <stdio.h>
#include <string.h>
char * reverseWords(const char *text);
char * reverseWord(char *word);
int main () {
char *text = "Hello World";
char *result = reverseWords(text);
char *expected_result = "olleH dlroW";
printf("%s == %s\n", result, expected_result);
printf("%d\n", strcmp(result, expected_result));
return 0;
}
char *
reverseWords (const char *text) {
// This function takes a string and reverses it words.
int i, j;
size_t len = strlen(text);
size_t text_size = len * sizeof(char);
// output containst the output or the result
char *output;
// temp_word is a temporary variable,
// it contains each word and it will be
// empty after each space.
char *temp_word;
// temp_char is a temporary variable,
// it contains the current character
// within the for loop below.
char temp_char;
// allocating memory for output.
output = (char *) malloc (text_size + 1);
for(i = 0; i < len; i++) {
// if the text[i] is space, just append it
if (text[i] == ' ') {
output[i] = ' ';
}
// if the text[i] is NULL, just get out of the loop
if (text[i] == '\0') {
break;
}
// allocate memory for the temp_word
temp_word = (char *) malloc (text_size + 1);
// set j to 0, so we can iterate only on the word
j = 0;
// while text[i + j] is not space or NULL, continue the loop
while((text[i + j] != ' ') && (text[i + j] != '\0')) {
// assign and cast test[i+j] to temp_char as a character,
// (it reads it as string by default)
temp_char = (char) text[i+j];
// concat temp_char to the temp_word
strcat(temp_word, &temp_char); // <= PROBLEM
// add one to j
j++;
}
// after the loop, concat the reversed version
// of the word to the output
strcat(output, reverseWord(temp_word));
// if text[i+j] is space, concat space to the output
if (text[i+j] == ' ')
strcat(output, " ");
// free the memory allocated for the temp_word
free(temp_word);
// add j to i, so u can skip
// the character that already read.
i += j;
}
return output;
}
char *
reverseWord (char *word) {
int i, j;
size_t len = strlen(word);
char *output;
output = (char *) malloc (len + 1);
j = 0;
for(i = (len - 1); i >= 0; i--) {
output[j++] = word[i];
}
return output;
}
The problem is the line I marked with <= PROBLEM. On the first word, which in this case is "Hello", it does everything just fine.
On the second word, which in this case is "World", it adds junk characters to temp_word.
I checked it with gdb: temp_char doesn't contain the junk, but when strcat runs, the latest character appended to temp_word is something like W\006.
It appends \006 to all of the characters within the second word.
The output that I see on the terminal is fine, but printing out strcmp and comparing the result with expected_result returns -94.
What can be the problem?
What's the \006 character?
Why does strcat add it?
How can I prevent this behavior?

strcat() expects addresses of the 1st character of "C"-strings, which in fact are char-arrays with at least one element being equal to '\0'.
Neither the memory temp_word points to nor the memory &temp_char points to meet such requirements.
Due to this the infamous undefined behaviour is invoked, anything can happen from then on.
A possible fix would be to change
temp_word = (char *) malloc (text_size + 1);
to become
temp_word = malloc (text_size + 1); /* Not the issue but the cast is
just useless in C. */
temp_word[0] = '\0';
and this
strcat(temp_word, &temp_char);
to become
strcat(temp_word, (char[2]){temp_char});
There might be other issues with the rest of the code.
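An alternative to the compound literal is to append the character by hand, which also makes it easy to respect the buffer size. The following is a minimal sketch, assuming the destination already holds a valid (possibly empty) string; append_char is a hypothetical helper, not a standard function.

```c
#include <string.h>

/* Append one character to the string in dst, whose buffer holds cap bytes.
   Returns 1 on success, 0 if there is no room for the character and the
   terminator. */
int append_char(char *dst, size_t cap, char c)
{
    size_t len = strlen(dst);       /* dst must already be a valid string */
    if (len + 1 >= cap)
        return 0;
    dst[len] = c;
    dst[len + 1] = '\0';            /* keep the result terminated */
    return 1;
}
```

In the question's loop this would replace the strcat call, for example append_char(temp_word, text_size + 1, temp_char), once temp_word[0] has been set to '\0'.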

The root cause of the junk characters is that you pass the wrong input as the 2nd argument of strcat. See the explanation below.
At the beginning of your function you declare:
int i, j;
size_t len = strlen(text);
size_t text_size = len * sizeof(char);
// output containst the output or the result
char *output;
// temp_word is a temporary variable,
// it contains each word and it will be
// empty after each space.
char *temp_word;
// temp_char is a temporary variable,
// it contains the current character
// within the for loop below.
char temp_char;
You can print the variables' addresses on the stack; they will look something like this:
printf("&temp_char=%p,&temp_word=%p,&output=%p,&text_size=%p\n", &temp_char, &temp_word,&output,&text_size);
result:
&temp_char=0x7ffeea172a9f,&temp_word=0x7ffeea172aa0,&output=0x7ffeea172aa8,&text_size=0x7ffeea172ab0
As you can see, &temp_char (0x7ffeea172a9f) is at the bottom of the stack, &temp_word (0x7ffeea172aa0) follows 1 byte later, &output (0x7ffeea172aa8) 8 bytes after that, and so on (I used a 64-bit OS, so a pointer takes 8 bytes).
// concat temp_char to the temp_word
strcat(temp_word, &temp_char); // <= PROBLEM
Refer to the strcat description here: http://www.cplusplus.com/reference/cstring/strcat/
The second argument of strcat is &temp_char = 0x7ffeea172a9f. strcat treats &temp_char (0x7ffeea172a9f) as the starting point of the source string, so instead of appending only one char as you expect, it appends to temp_word all the bytes starting from &temp_char (0x7ffeea172a9f) until it meets a terminating null character.

The function strcat deals with strings.
In this code snippet
// assign and cast test[i+j] to temp_char as a character,
// (it reads it as string by default)
temp_char = (char) text[i+j];
// concat temp_char to the temp_word
strcat(temp_word, &temp_char); // <= PROBLEM
neither the pointer temp_word nor the pointer &temp_char points to a string.
Moreover, the array output is not terminated with a zero character, for example when the source string consists only of blanks.
In any case your approach is too complicated and contains redundant code, for example the condition in the for loop and the condition in the if statement, which duplicate each other.
for(i = 0; i < len; i++) {
//…
// if the text[i] is NULL, just get out of the loop
if (text[i] == '\0') {
break;
}
The function can be written simpler as it is shown in the demonstrative program below.
#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
char * reverse_words( const char *s )
{
char *result = malloc( strlen( s ) + 1 );
if ( result != NULL )
{
char *p = result;
while ( *s != '\0' )
{
while ( isblank( ( unsigned char )*s ) )
{
*p++ = *s++;
}
const char *q = s;
while ( !isblank( ( unsigned char )*q ) && *q != '\0' ) ++q;
for ( const char *tmp = q; tmp != s; )
{
*p++ = *--tmp;
}
s = q;
}
*p = '\0';
}
return result;
}
int main(void)
{
const char *s = "Hello World";
char *result = reverse_words( s );
puts( s );
puts( result );
free( result );
return 0;
}
The program output is
Hello World
olleH dlroW

My program which creates a abbreviation of an char array, does not print anything. Where is my mistake?

I am supposed to write a program that fills an array with the abbreviation of a constant char array. While my program does not report any errors, it also does not print any characters at my printf calls. Because of that I assume that my program does not work properly and isn't filling my array with any characters.
void abbrev(const char s[], char a[], size_t size) {
int i = 0;
while (*s != '\0') {
printf('%c', *s);
if (*s != ' ' && *s - 1 == ' ') {
a[i] = *s;
i++;
printf('%c', a[i]);
}
s++;
}
}
void main() {
char jordan1[60] = " Electronic Frontier Foundation ";
char a[5];
size_t size = 5;
abbrev(jordan1, a, size);
system("PAUSE");
}
The actual result is nothing. At least I assume so, since my console isn't showing anything. The result should be "EFF", and size_t size is supposed to limit my char array a in case the abbreviation is too long. So it should only store letters until my array is full and then the '\0', but I have not implemented that yet, since my program is apparently not filling the array at all.

#include <stdio.h>
#include <ctype.h>
/* in: the string to abbreviate
out: output abbreviation. Function assumes there's enough room */
void abbrev(const char in[], char out[])
{
const char *p;
int zbPosOut = 0; /* current zero-based position within the `out` array */
for (p = in; *p; ++p) { /* iterate through `in` until we see a zero terminator */
/* if the letter is uppercase
OR if (the letter is alphabetic AND we are not at the zero
position AND the previous char. is a space character) OR if the
letter is lowercase and it is the first char. of the array... */
if (isupper(*p) || (isalpha(*p) && (p - in) > 0 && isspace(p[-1]))
|| (islower(*p) && p == in)) {
out[zbPosOut++] = *p; /* ... then the letter is the start letter
of a word, so add it to our `out` array, and
increment the current `zbPosOut` */
}
}
out[zbPosOut] = 0; /* null-terminate the out array */
}
This code says a lot in few lines. Let's take a look:
isupper(*p) || (isalpha(*p) && (p - in) > 0 && isspace(p[-1]))
|| (islower(*p) && p == in)
If the current character (*p) is an uppercase character OR if it is alphabetic (isalpha(*p)) and the previous character p[-1] is a space, then we may consider *p to be the first character of a word, and it should be added to our out array. We include the test (p - in) > 0 because if p == in, then we are at the zero position of the array and therefore p[-1] is undefined.
The order in this expression matters a lot. If we were to put (p - in) > 0 after the isspace(p[-1]) test, then we would not be taking advantage of the laziness of the && operator: as soon as it encounters a false operand, the following operand is not evaluated. This is important because if p - in == 0, then we do not want to evaluate the isspace(p[-1]) expression. The order in which we have written the tests makes sure that isspace(p[-1]) is evaluated after making sure we are not at the zero position.
The final expression (islower(*p) && p == in) handles the case where the first letter is lowercase.
out[zbPosOut++] = *p;
We append the character *p to the out array. The current position in out is tracked by the zbPosOut variable, which is incremented afterwards (which is why we use postfix ++ rather than prefix).
Code to test the operation of abbrev:
int main()
{
char jordan1[] = " electronic frontier foundation ";
char out[16];
abbrev(jordan1, out);
puts(out);
return 0;
}
It gives eff as the output. For it to look like an acronym, we can change the code to append the letter *p to out to:
out[zbPosOut++] = toupper(*p);
which capitalizes each letter added to the out array (if *p is already uppercase, toupper just returns *p).
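The question also asked for a size limit on the output array, which the function above leaves to the caller. One way to add it is sketched below; abbrev_bounded is a hypothetical name, and the sketch deliberately uses a simpler rule than the answer: a letter starts a word only if it follows whitespace or the start of the string, so an uppercase letter in the middle of a word is not picked up. Characters are cast to unsigned char before being passed to the ctype.h functions, which avoids undefined behaviour for negative char values.

```c
#include <ctype.h>
#include <stddef.h>

/* Write the first letter of each word of in to out, uppercased, never
   writing more than size bytes including the terminator. */
void abbrev_bounded(const char *in, char *out, size_t size)
{
    size_t n = 0;
    int prev_is_space = 1;          /* treat the start as following a space */

    if (size == 0)
        return;                     /* no room even for the terminator */
    for (const char *p = in; *p != '\0' && n + 1 < size; ++p) {
        unsigned char c = (unsigned char)*p;
        if (isalpha(c) && prev_is_space)
            out[n++] = (char)toupper(c);
        prev_is_space = isspace(c) != 0;
    }
    out[n] = '\0';
}
```

With the question's char a[5], the call abbrev_bounded(jordan1, a, sizeof a) should store "EFF"; with a three-byte buffer the result is cut to "EF".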

void print_without_duplicate_leading_trailing_spaces(const char *str)
{
while(*str == ' ' && *str) str++;
while(*str)
{
if(*str != ' ' || (*str == ' ' && *(str + 1) != ' ' && *str))
{
putchar(*str);
}
str++;
}
}

What you want to do could be simplified with a for() loop.
#include <stdio.h>
#include <string.h>
void abbrev(const char s[], char a[], size_t size) {
int pos = 0;
// Loop for every character in 's'.
for (int i = 0; i < strlen(s); i++)
// If the character just before was a space, and this character is not a
// space, and we are still in the size bounds (we subtract 1 for the
// terminator), then copy and append.
if (s[i] != ' ' && s[i - 1] == ' ' && pos < size - 1)
a[pos++] = s[i];
a[pos] = '\0'; // Terminate the abbreviation before printing it.
printf("%s\n", a); // Print.
}
void main() {
char jordan1[] = " Electronic Frontier Foundation ";
char a[5];
size_t size = 5;
abbrev(jordan1, a, size);
}
However, I don't think this is the best way to achieve what you are trying to do. Firstly, the character s[0] can never be selected, because the check requires a space before it. Which brings me to the second reason: on the first index you will be checking s[-1], which is outside the array and therefore not a good idea. If I were implementing this function I would do this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void abbrev(const char s[], char a[], size_t size) {
char *str = strdup(s); // Make local copy, because strtok modifies its input.
size_t i = 0;
if (str == NULL || size == 0) { // No copy, or no room for the terminator.
free(str);
return;
}
// Break it up into words, then grab the first character of each word.
for (char *w = strtok(str, " "); w != NULL; w = strtok(NULL, " ")) {
if (i < size - 1)
a[i++] = w[0];
}
a[i] = '\0'; // Terminate the abbreviation.
free(str); // Release our local copy.
printf("%s\n", a);
}
int main() {
char jordan1[] = "Electronic Frontier Foundation ";
char a[5];
size_t size = 5;
abbrev(jordan1, a, size);
return 0;
}

Convert char* string to upper

Following code is supposed to return the upper case string of the source.
It works but does not convert the string. Could not figure out what was wrong.
char *StrUpper (char *s) {
int i = 0;
char *t = &s [i];
while (*t) {
if ((*t > 0x5a) && (*t < 0x7b)) t = (t - 32);
t = &s [i++];
}
return (s);
}
int main () {
printf ("%s\n", StrUpper ("lower case string"));
return (0);
}

A string literal is an array of characters that must not be modified. In other words, string literals are effectively read only.
Trying to modify a string literal leads to undefined behavior. And if your program has undefined behavior, nothing about its behavior can be trusted at all.
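The usual way around this is to copy the literal into an array and modify the copy. The sketch below illustrates that call pattern; upper_in_place and demo are hypothetical names, and the conversion assumes ASCII letters.

```c
#include <stdio.h>

/* Uppercase ASCII letters in place; the caller must pass writable memory. */
char *upper_in_place(char *s)
{
    for (char *p = s; *p != '\0'; ++p)
        if (*p >= 'a' && *p <= 'z')
            *p = (char)(*p - ('a' - 'A'));
    return s;
}

/* Safe call pattern: the array is a writable copy of the literal. */
void demo(void)
{
    char text[] = "lower case string";
    printf("%s\n", upper_in_place(text));
    /* upper_in_place("lower case string") would be undefined behaviour,
       because it would write into the literal itself. */
}
```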

You have a few problems in your code:
You forgot to dereference the string in the assignment:
t = (t - 32);
should be
*t = (*t - 32);
You don't check the correct range:
if ((*t > 0x5a) && (*t < 0x7b))
should be
if ((*t > 0x60) && (*t < 0x7b))
or, even better
if ((*t >= 'a') && (*t <= 'z'))
You go over the first character twice:
t = &s [i++];
should be
t = &s [++i];
or simply
t++;

First of all these two statements
if ((*t > 0x5a) && (*t < 0x7b)) t = (t - 32);
t = &s [i++];
make no sense.
This expression statement
t = (t - 32);
changes the pointer itself. And this statement
t = &s [i++];
does not advance the pointer on the first iteration, because i++ yields the old value of i.
Moreover, it is not clear what the magic numbers 0x5a, 0x7b and 32 mean.
The function could be written like
char * StrUpper ( char *s )
{
for ( char *p = s; *p; ++p )
{
if ( *p >= 'a' && *p <= 'z' ) *p -= 'a' - 'A';
}
return s;
}
Here is an example of using the function
#include <iostream>
char * StrUpper ( char *s )
{
for ( char *p = s; *p; ++p )
{
if ( *p >= 'a' && *p <= 'z' ) *p -= 'a' - 'A';
}
return s;
}
int main()
{
char s[] = "hello";
std::cout << s << std::endl;
std::cout << StrUpper( s ) << std::endl;
return 0;
}
The output is
hello
HELLO
Take into account that some coding systems do not guarantee that all lower case letters follow each other without intermediate codes. So in general this approach is not correct.
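Where the standard library is allowed, toupper from <ctype.h> avoids that assumption, since it converts according to the current locale instead of relying on contiguous letter codes. A minimal sketch follows; str_upper_portable is a hypothetical name. The cast to unsigned char matters because passing a negative char value to toupper is undefined.

```c
#include <ctype.h>

/* Uppercase a writable string in place using the library's own mapping. */
char *str_upper_portable(char *s)
{
    for (char *p = s; *p != '\0'; ++p)
        *p = (char)toupper((unsigned char)*p);
    return s;
}
```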

You might also want to change
t = &s[i++];
to
t = &s[++i];
because i++ first yields the current value of i as the index and only then performs the increment, so you end up processing the same index again the first time.
