Merge two specific columns from a line in C language - c

I want to merge two specific columns from a line in C. The line looks like "hello world hello world": some words separated by white space. Below is my code. In this function, c1 and c2 are the column numbers, and the array key receives the merged string. But it doesn't run correctly.
char *LinetoKey(char *line, int c1, int c2, char key[COLSIZE]){
char *col2 = (char *)malloc(sizeof(char));
while (*line != '\0' && isspace(*line) )
line++;
while(*line != '\0' && c1 != 0){
if(isspace(*line)){
while(*line != '\0' && isspace(*line))
line++;
c1--;
c2--;
}else
line++;
}
while (*line != '\0' && *line != '\n' && (isspace(*line)==0))
*key++ = *line++;
*key = '\0';
while(*line != '\0' && c2 != 0){
if(isspace(*line)){
while(*line != '\0' && isspace(*line))
line++;
c2--;
}else
line++;
}
while (*line != '\0' && *line != '\n' && isspace(*line)==0)
*col2++ = *line++;
*col2 = '\0';
strcat(key,col2);
return key;
}

Here's a possible solution using strtok(). It can handle an arbitrary number of columns (increase the size of buf if necessary), and will still work if the order of the columns is reversed (i.e. c1 > c2). The function returns 1 on success (tokens successfully merged), 0 otherwise.
Note that strtok() modifies its argument - so I have copied input to a temporary buffer, char buf[64].
#include <stdio.h>
#include <string.h>
/*
* Merge space-separated 'tokens' in a string.
* Columns are zero-indexed.
*
* Return: 1 on success, 0 on failure
*/
int merge_cols(char *input, int c1, int c2, char *dest) {
char buf[64];
int col = 0;
char *tok = NULL, *first = NULL, *second = NULL, *tmp = NULL;
if (c1 == c2) {
fprintf(stderr, "Columns cannot be the same!\n");
return 0;
}
if (strlen(input) > sizeof(buf) - 1) return 0;
/*
* strtok() is _destructive_, so copy the input to
* a buffer.
*/
strcpy(buf, input);
tok = strtok(buf, " ");
while (tok) {
if (col == c1 || col == c2) {
if (!first)
first = tok;
else if (first && !second)
second = tok;
}
if (first && second) break;
tok = strtok(NULL, " ");
col++;
}
// In case order of columns is swapped ...
if (c1 > c2) {
tmp = second;
second = first;
first = tmp;
}
// Only write to dest when both columns were found
if (!first || !second) return 0;
strcpy(dest, first);
strcat(dest, second);
return 1;
}
Sample usage:
char *input = "one two three four five six seven eight";
char dest[128];
// The columns can be reversed ...
int merged = merge_cols(input, 7, 1, dest);
if (merged)
puts(dest);
Note also that it is very easy to use different delimiters when using strtok() - so if you wanted to use comma- or tab-separated input instead of spaces, you just change the second argument when calling it.
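To illustrate the point about delimiters, here is a minimal sketch that takes the delimiter set as a parameter. The helper name nth_token, its signature and the 128-byte buffer are assumptions made for this example, not part of the answer above. Keep in mind that strtok() treats a run of consecutive delimiters as one, so empty fields (as in "a,,b") are skipped.

```c
#include <string.h>

/* Hypothetical helper: copy the n-th (zero-based) token of `input`,
 * split on any character in `delims`, into `out` (capacity `outsz`).
 * Returns 1 on success, 0 if the token is missing or does not fit. */
int nth_token(const char *input, const char *delims, int n,
              char *out, size_t outsz)
{
    char buf[128];                 /* assumed maximum line length */
    char *tok;

    if (strlen(input) >= sizeof buf)
        return 0;
    strcpy(buf, input);            /* strtok() writes into its argument */

    for (tok = strtok(buf, delims); tok != NULL; tok = strtok(NULL, delims)) {
        if (n-- == 0) {
            if (strlen(tok) >= outsz)
                return 0;
            strcpy(out, tok);
            return 1;
        }
    }
    return 0;                      /* fewer than n + 1 tokens */
}
```

Calling nth_token(line, ",\t", 2, out, sizeof out) would fetch the third field of a comma- or tab-separated line.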

It is not clear what you're trying to do. If what David Collins suggested is what you're looking for - word concatenation by word index - here's a starting point (demo):
Your function must minimize the number of string traversals. To help this, the code below is using char** instead of char* (sort of "char streams").
The function must be able to count the number of characters the result will have, prior to actual concatenation, in order to be able to allocate the destination string on free store. If catwords is called with null destination, it only counts the length of the result string.
Regarding the actual implementation, you will have to traverse the string word by word and decide whether to copy or skip the word. See the code below for the following functions:
nextword - skips white-spaces until it finds a non-white-space character.
copyword - copies the current word if destination is valid or skips it if not. It returns the number of copied/skipped characters.
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
void nextword( const char** ps )
{
while ( **ps && isspace( **ps ) )
++*ps;
}
int copyword( char** const pd, const char** ps )
{
// remember the starting point
const char* b = *ps;
// actual copy
if ( pd && *pd ) while ( **ps && !isspace( **ps ) )
*( *pd )++ = *( *ps )++;
// skip the word (no destination)
else while ( **ps && !isspace( **ps ) )
( *ps )++;
// return the length
return *ps - b;
}
int catwords( char* d, const char* s, const int* c )
{
int len = 0;
int iw = 0;
int ic = 0;
const char** ps = &s;
char** pd = &d;
for ( nextword( ps ); **ps && c[ ic ] > -1; nextword( ps ), ++iw )
if ( iw == c[ ic ] )
{
len += copyword( pd, ps );
++ic;
}
else
{
copyword( 0, ps ); // just skip the current word
}
if ( d )
**pd = '\0';
return len;
}
int main()
{
// static buffer test
{
char d[ 1024 ];
int t[] = { 0, 3, -1 };
catwords( d, "Hello world. Hello world!", t );
puts( d );
}
// dynamic buffer test
{
const char* s = "The greatness of a man is not in how much wealth he acquires, but in his integrity and his ability to affect those around him positively.";
int t[] = { 1, 5, 16, -1 };
int dstcharcount = catwords( 0, s, t ) + 1;
char* d = (char*)malloc( dstcharcount * sizeof( char ) );
catwords( d, s, t );
puts( d );
free( d );
}
return 0;
}

Related

How to count how many words are in a string?

I want to know how to count how many words are in a string.
I use strstr to search for the word, but it only finds the first match,
like this:
char buff = "This is a real-life, or this is just fantasy";
char op = "is";
if (strstr(buff,op)){
count++;
}
printf("%d",count);
The output is 1, but there are two "is" in the sentence. Please tell me how to fix this.
For starters you have to write the declarations at least like
char buff[] = "This is a real-life, or this is just fantasy";
const char *op = "is";
Also if you need to count words you have to check whether words are separated by white spaces.
You can do the task the following way
#include <string.h>
#include <stdio.h>
#include <ctype.h>
//...
size_t n = strlen( op );
for ( const char *p = buff; ( p = strstr( p, op ) ) != NULL; p += n )
{
if ( p == buff || isblank( ( unsigned char )p[-1] ) )
{
if ( p[n] == '\0' || isblank( ( unsigned char )p[n] ) )
{
count++;
}
}
}
printf("%d",count);
Here is a demonstration program.
#include <stdio.h>
#include <string.h>
#include <ctype.h>
int main(void)
{
char buff[] = "This is a real-life, or this is just fantasy";
const char *op = "is";
size_t n = strlen( op );
size_t count = 0;
for ( const char *p = buff; ( p = strstr( p, op ) ) != NULL; p += n )
{
if ( p == buff || isblank( ( unsigned char )p[-1] ) )
{
if ( p[n] == '\0' || isblank( ( unsigned char )p[n] ) )
{
count++;
}
}
}
printf( "The word \"%s\" is encountered %zu time(s).\n", op, count );
return 0;
}
The program output is
The word "is" is encountered 2 time(s).
Parse the string, in a loop.
As OP has "but there are two "is" in the sentence", it is not enough just to look for "is", as that substring occurs 4 times, two of them inside "This" and "this". Code needs to parse the string for the idea of a "word".
Case sensitivity is also a concern.
char buff[] = "This is a real-life, or this is just fantasy";
const char *op = "is";
const char *p = buff;
const char *candidate;
while ((candidate = strstr(p, op)) != NULL) {
// Add code to test if candidate is a stand-alone word
// Test if candidate is beginning of buff or prior character is a white-space.
// Test if candidate is end of buff or next character is a white-space/punctuation.
p = candidate + strlen(op); // advance past this match
}
For me, I would not use strstr(), but look for "words" with isalpha().
// Concept code
size_t n = strlen(op);
while (*p) {
if (isalpha(*p)) { // Start of word
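// some limited case insensitive compare (strnicmp() is non-standard;
// POSIX systems provide strncasecmp() in <strings.h>)
if (strnicmp(p, op, n) == 0 && !isalpha((unsigned char)p[n])) {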
count++;
}
while (isalpha(*p)) p++; // Find end of word
} else {
p++;
}
}
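The concept code above can be turned into a self-contained function. This is only a sketch under two assumptions: a "word" is a maximal run of alphabetic characters, and the search word itself is purely alphabetic. The name count_word is made up for the example, and the comparison is done with tolower() so that no non-standard function is needed.

```c
#include <ctype.h>
#include <stddef.h>

/* Count stand-alone occurrences of `word` in `text`, ignoring case.
 * A "word" is taken to be a maximal run of alphabetic characters. */
size_t count_word(const char *text, const char *word)
{
    size_t count = 0;
    const unsigned char *p = (const unsigned char *)text;

    while (*p) {
        if (isalpha(*p)) {                       /* start of a word */
            const unsigned char *w = (const unsigned char *)word;
            /* advance while both sides agree, case-insensitively */
            while (*w && isalpha(*p) && tolower(*p) == tolower(*w)) {
                p++;
                w++;
            }
            /* a hit only if the pattern is used up and the text word ends here */
            if (*w == '\0' && !isalpha(*p))
                count++;
            while (isalpha(*p))                  /* skip the rest of this word */
                p++;
        } else {
            p++;
        }
    }
    return count;
}
```

With the sentence from the question, count_word(buff, "is") counts the two stand-alone words and ignores the "is" inside "This" and "this".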

char* gets corrupted on the final iteration of a loop

So, I'm trying to build a string_split function to split a c-style string based on a delimiter.
Here is the code for the function:
char** string_split(char* input, char delim)
{
char** split_strings = malloc(sizeof(char*));
char* charPtr;
size_t split_idx = 0;
int extend = 0;
for(charPtr = input; *charPtr != '\0'; ++charPtr)
{
if(*charPtr == delim || *(charPtr+1) == '\0')
{
if(*(charPtr+1) == '\0') extend = 1; //extend the range by one for the null byte at the end
char* string_element = calloc(1, sizeof(char));
for(size_t i = 0; input != charPtr+extend; ++input, ++i)
{
if(string_element[i] == '\0')
{
//allocate another char and add a null byte to the end
string_element = realloc(string_element, sizeof(char) * (sizeof(string_element)/sizeof(char) + 1));
string_element[i+1] = '\0';
}
string_element[i] = *input;
}
printf("string elem: %s\n", string_element);
split_strings[split_idx++] = string_element;
//allocate another c-string if we're not at the end of the input
split_strings = realloc(split_strings, sizeof(char*) *(sizeof(split_strings)/sizeof(char*) + 1));
//skip over the delimiter
input++;
extend = 0;
}
}
free(charPtr);
free(input);
return split_strings;
}
Essentially, the way it works is that there are two char*, input and charPtr. charPtr counts up from the start of the input string to the next instance of the delimiter, then input counts from the previous instance of the delimiter (or the start of the input string) and copies each char into a new char*. Once the string is built it is added to a char** array.
There are also some twiddly bits for skipping over delimiters and dealing with the end points of the input string. The function is used like so:
int main()
{
char* str = "mon,tue,wed,thur,fri";
char delim = ',';
char** split = string_split(str, delim);
return 1;
}
Anyway, it works for the most part, except the first char* in the returned char** array is corrupted, and is just occupied by random junk.
For example printing the elements of split from main yields:
split: α↨▓
split: tue
split: wed
split: thur
split: fri
What's odd is that the contents of split_strings[0] (split_strings being the array of char* that returns the desired tokens) is mon, as it should be for this example, right up until the final iteration of the main for-loop in the string_split function. Specifically, it's the line:
split_strings[split_idx++] = string_element;
which turns its contents from mon to junk. Any help is appreciated, thanks.
Your function is incorrect at least because it tries to free the passed string
char** string_split(char* input, char delim)
{
//...
free(charPtr);
free(input);
return split_strings;
}
that in the case of your program is a string literal
char* str = "mon,tue,wed,thur,fri";
char delim = ',';
char** split = string_split(str, delim);
You may not free a string literal.
And the first parameter shall have the qualifier const.
There are numerous other errors in your function.
For example the expression sizeof(string_element)/sizeof(char) used in this statement
string_element = realloc(string_element, sizeof(char) * (sizeof(string_element)/sizeof(char) + 1));
does not yield the number of characters previously allocated to the array pointed to by string_element: sizeof applied to a pointer gives the size of the pointer itself. And there is little sense in reallocating the array for each new character.
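As a small illustration of that point, the sketch below tracks the length and capacity explicitly instead of asking sizeof, and grows the buffer geometrically rather than one character at a time. The helper name append_char, the starting capacity of 16 and the doubling policy are assumptions chosen for the example.

```c
#include <stdlib.h>

/* Hypothetical helper: append `c` to a heap string, growing it geometrically.
 * `*buf` may start as NULL with `*len` and `*cap` both 0.
 * Returns 1 on success, 0 if realloc() fails (the old buffer stays valid). */
int append_char(char **buf, size_t *len, size_t *cap, char c)
{
    if (*len + 2 > *cap) {                      /* need room for c and '\0' */
        size_t new_cap = *cap ? *cap * 2 : 16;
        char *tmp = realloc(*buf, new_cap);     /* keep *buf if this fails */
        if (tmp == NULL)
            return 0;
        *buf = tmp;
        *cap = new_cap;
    }
    (*buf)[(*len)++] = c;
    (*buf)[*len] = '\0';
    return 1;
}
```

The caller owns the buffer and releases it with free() when done. The size of the allocation is known only because cap is stored alongside the pointer.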
The function can look, for example, the way it is shown in the demonstration program below.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
char ** string_split( const char *s, char delim )
{
size_t n = 1;
char **a = calloc( n, sizeof( char * ) );
while ( *s )
{
const char *p = strchr( s, delim );
if ( p == NULL ) p = s + strlen( s );
if ( p != s )
{
char *t = malloc( p - s + 1 );
if ( t != NULL )
{
memcpy( t, s, p - s );
t[p-s] = '\0';
}
char **tmp = realloc( a, ( n + 1 ) * sizeof( char * ) );
if ( tmp == NULL )
{
free( t );
break;
}
a = tmp;
a[n-1] = t;
a[n] = NULL;
++n;
}
s = p + ( *p != '\0' );
}
return a;
}
int main(void)
{
char* str = "mon,tue,wed,thur,fri";
char delim = ',';
char **split = string_split( str, delim );
for ( char **p = split; *p != NULL; ++p )
{
puts( *p );
}
for ( char **p = split; *p != NULL; ++p ) free( *p );
free( split );
return 0;
}
The program output is
mon
tue
wed
thur
fri
The final function, for anyone wondering. It should be pretty foolproof.
char** string_split(char* input, char delim, bool skip_delim)
{
assert(*input != '\0');
char** split_strings = malloc(sizeof(char*));
char* charPtr;
size_t split_idx = 0;
size_t num_allocated_strings = 1;
size_t extend = 0;
for(charPtr = input; *charPtr != '\0'; ++charPtr)
{
if(*charPtr == delim || *(charPtr+1) == '\0')
{
if(*(charPtr+1) == '\0') extend = 1; //extend the range by one for the null byte at the end
char* string_element = calloc(1, sizeof(char));
for(size_t i = 0; input != charPtr+extend; ++input, ++i)
{
if(string_element[i] == '\0')
{
//allocate another char and add a null byte to the end
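char* temp = realloc(string_element, i + 2); //i chars so far, plus this one, plus the null byte
if(temp == NULL)
{
free(string_element);
free(split_strings);
return NULL; //a bare break would go on to use the freed pointers (earlier elements leak here)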
}
string_element = temp;
string_element[i+1] = '\0';
}
string_element[i] = *input;
}
split_strings[split_idx++] = string_element;
num_allocated_strings++;
//allocate another c-string
char** temp = realloc(split_strings, sizeof(char*) * num_allocated_strings);
if(temp == NULL)
{
free(string_element);
free(split_strings);
return NULL; //a bare break would go on to write through the freed array
}
split_strings = temp;
//skip over the delimiter if required
if(skip_delim) input++;
extend = 0;
}
}
split_strings[num_allocated_strings-1] = NULL;
return split_strings;
}

Scanning group of integers in the form of (x,y) in c

I am trying to read input from standard input in the form (a,b) (c,d) (e, f) (g,h), and to stop reading when an empty line is entered. I need these inputs tuple by tuple: first (a,b), with which I perform some computation such as inserting it into a binary tree, then (c,d), then (e, f), and so on, in the following way:
insert_in_tree(&tree->root,&tree->root,a, b);
I know how to accept integers until an empty line is entered, which I do in the following way:
AVLTree *CreateAVLTree(const char *filename)
{
// put your code here
AVLTree *tree = newAVLTree();
int key, value;
if(strcmp(filename, "stdin") == 0){
char str[1024]={};
printf("Enter your values");
while( fgets(str, 1024, stdin) && strlen(str) && str[0] != '\n' ){
printf("string %s", str);
sscanf(str, "%d, %d", &key, &value);
//int key = atoi(str);
printf("This is key you entered %d\n", key);
printf("This is value you entered %d\n", value);
}
}else{
FILE* file = fopen(filename, "r"); // open a file
if(file == NULL) {
return NULL; // error checking
}
while (fscanf (file, " (%d,%d)", &key, &value) == 2) // check for number of conversions
// space added here ^
{
insert_in_tree_q5(&tree->root,&tree->root, key, value);
//printf("%d,%d\n", key, value);
}
fclose(file);
//node = tree->root;
}
return tree;
}
but I am not sure how to use this to solve the problem stated above.
I'm not a fan of using scanf() et al. to parse data, as a simple scanf("%d,%d") tends to be error-prone with differing user input.
My general approach when dealing with known formatting characters (like (, ,, )), is to find them first with strchr(), validate they're somewhat sensible, and only then try to extract the value.
In the code below, I locate the parentheses and comma, then copy out the possibly numeric data in between, before handing it off to strtol() for converting the integer string to a numeric representation.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MAX_NUMBER_LEN 32
/*
** Given a string which contains (somewhere) a pair
** of numbers in the form "... (x, y) ...", parse the pair
** into val1 and val2 respectively.
**
** Returns the point at which the input ended successfully
** or NULL on error
*/
const char *parseTuple(const char *input, int *val1, int *val2)
{
char *result = NULL;
char val1_str[ MAX_NUMBER_LEN+1 ] = { '\0' };
char val2_str[ MAX_NUMBER_LEN+1 ] = { '\0' };
// Find the first '('
char *left_paren = strchr( input, '(' );
char *right_paren = strchr( input, ')' );
char *comma = strchr( input, ',' );
// validate the things we found exist, and are in valid positions
if ( left_paren != NULL && right_paren != NULL && comma != NULL && // needed parts exist
left_paren < comma && comma < right_paren ) // in the correct order
{
// val1 source string exists between left_paren+1 and comma-1
int val1_len = comma-1 - left_paren+1 - 1;
if ( val1_len > 0 && val1_len < MAX_NUMBER_LEN )
{
strncpy( val1_str, left_paren+1, val1_len );
val1_str[ val1_len ] = '\0';
}
// val2 source string exists between comma+1 and right_paren-1
int val2_len = right_paren-1 - comma+1 - 1;
if ( val2_len > 0 && val2_len < MAX_NUMBER_LEN )
{
strncpy( val2_str, comma+1, val2_len );
val2_str[ val2_len ] = '\0';
}
// If we extracted some reasonable numbers, try to parse them
if ( val1_str[0] != '\0' && val2_str[0] != '\0' )
{
*val1 = strtol( val1_str, NULL, 10 );
*val2 = strtol( val2_str, NULL, 10 );
// TODO handle errno when string is not a number
// if errno did not indicate a strtol() failure
result = right_paren+1; // point to the next input location, so we can call again
}
}
return result;
}
int main(int argc, char **argv)
{
const char *input;
int val1;
int val2;
for (int i=1; i<argc; i++)
{
input = argv[i];
do
{
printf( "From input of: [%s]\n" , input );
input = parseTuple( input, &val1, &val2 );
if ( input != NULL )
printf( " Parsed out: (%3d,%3d)\n", val1, val2 );
} while ( input != NULL && strlen( input ) );
}
return 0;
}
Giving the test-run:
$ ./parse_tuple '(-3, 2)' '(1,1)(11111111111111111111111111111111111111111111111111111111111111111111,0) () (,)' '(' '()' ')' '(,)' '(-12,)' '(123,456)'
From input of: [(-3, 2)]
Parsed out: ( -3, 2)
From input of: [(1,1)(11111111111111111111111111111111111111111111111111111111111111111111,0) () (,)]
Parsed out: ( 1, 1)
From input of: [(11111111111111111111111111111111111111111111111111111111111111111111,0) () (,)]
From input of: [(]
From input of: [()]
From input of: [)]
From input of: [(,)]
From input of: [(-12,)]
From input of: [(123,456)]
Parsed out: (123,456)
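The TODO left in parseTuple() (checking whether strtol() actually succeeded) could be filled in along these lines. This is a sketch, not the answer's code: the helper name parse_int and its return convention are assumptions. It uses the end pointer to detect "no digits" and trailing garbage, and errno to detect overflow.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert `s` to int with full error checking.
 * Returns 1 and stores the value in *out on success, 0 otherwise. */
int parse_int(const char *s, int *out)
{
    char *end;
    long v;

    errno = 0;                          /* strtol() only sets errno on error */
    v = strtol(s, &end, 10);

    if (end == s)                       /* no digits were consumed */
        return 0;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;                       /* value does not fit */
    while (*end == ' ' || *end == '\t') /* tolerate trailing blanks */
        end++;
    if (*end != '\0')                   /* trailing garbage such as "12abc" */
        return 0;

    *out = (int)v;
    return 1;
}
```

In parseTuple() the two strtol() calls could then become parse_int(val1_str, val1) && parse_int(val2_str, val2), with result set only when both succeed.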

How to delete Text in C?

Here is basically the problem. I am given a huge file. A text, that has a lot of blank spaces. I must write a program that removes the blank spaces, creates lines of exactly 80 characters long without splitting any word, and it will align the text to left and right simultaneously (justify text); The text is justified by placing additional spaces between words so that the line will end with a word and start with word, being exactly 80 chars long.
Yes, this is homework, but I am allowed to get any kind of online help. My code so far is able to do everything but align (justify) the text:
Code:
#include <stdio.h>
#include "catalin.h"
int main()
{
char text[145000], blank[1450000],c;
FILE *input, *output;
int n,f=80,i=0,j,l;
input = fopen("asimov.in", "r");
while ((c=fgetc(input))!=EOF){
if (c=='\n') c=' ';
text[i]=c;
i++;
}
fclose(input);
blankremove(text,blank);
wrap(blank,f);
l=lenght(blank);
output = fopen("out.out", "w");
fprintf(output,blank);
}
int blankremove(char text[], char blank[])
{
int c = 0, d = 0;
while (text[c] != '\0') {
if (text[c] == ' ') {
int temp = c + 1;
if (text[temp] != '\0') {
while (text[temp] == ' ' && text[temp] != '\0') {
if (text[temp] == ' ') {
c++;
}
temp++;
}
}
}
blank[d] = text[c];
c++;
d++;
}
blank[d] = '\0';
}
void wrap(char s[], const int wrapline)
{
int i, k, wraploc, lastwrap;
lastwrap = 0;
wraploc = 0; //catalin
for (i = 0; s[i] != '\0'; ++i, ++wraploc) {
if (wraploc >= wrapline) {
for (k = i; k > 0; --k) {
// posibil are overflow
if (k - lastwrap <= wrapline && s[k] == ' ') {
s[k] = '\n';
lastwrap = k+1;
break;
}
}
wraploc = i-lastwrap;
}
}
for (i = 0; i < wrapline; ++i) printf(" ");
printf("|\n");
}
All I need is some help on creating a function that justifies the text. "justified—text is aligned along the left margin, and letter- and word-spacing is adjusted so that the text falls flush with both margins, also known as fully justified or full justification;" The spaces created when doing justification should be placed uniformly. No libraries should be used other than the default.
Ignoring the many bugs in your existing code, you need to think about what you're trying to achieve.
Think about a simpler example to start with. Say your source text is "Hello world" and you're justifying it to a width of 15. "Hello world" is 11 characters long, which is 4 fewer than we need. There is 1 space in the string, so you know you need to make that space become 5 spaces so that it becomes "Hello     world".
Next example: "I like bees!" - that is 12 characters, but it has 2 spaces and you need an extra 3 spaces in there. One of those spaces has to become 2 spaces and the other 3 spaces to fill out the 15 characters.
So your code first needs to count how many spaces are in the line you're currently working with. You can do that while you're working out where to wrap the line, and if you also track where the last space is, you don't need to backtrack to find it again.
Second, it needs to know how many extra characters are required to pad the line out.
And finally, it needs to find the spaces within the line and add the extra spaces evenly among them. You'd be better off working with a new string at this point because, while it's possible to insert spaces into s, it's complicated and more likely to introduce bugs.
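The three steps above can be sketched as follows. This is a minimal illustration, assuming the line has already been wrapped, its words are separated by single spaces, and it has no leading or trailing space. The name justify_line is made up, and giving the leftover spaces to the leftmost gaps is just one possible policy.

```c
#include <string.h>

/* Hypothetical helper: justify `line` to exactly `width` characters,
 * writing the result to `out` (which must hold at least width + 1 bytes).
 * Returns 1 on success, 0 if the line is too long or has no gap to widen. */
int justify_line(const char *line, int width, char *out)
{
    int len = (int)strlen(line);
    int gaps = 0;

    for (const char *p = line; *p; p++)   /* step 1: count the spaces */
        if (*p == ' ')
            gaps++;
    if (len > width || gaps == 0)
        return 0;

    int extra = width - len;              /* step 2: padding still needed */
    int each = extra / gaps;              /* every gap gets this many more */
    int leftover = extra % gaps;          /* the first gaps get one more */

    for (const char *p = line; *p; p++) { /* step 3: copy and widen gaps */
        *out++ = *p;
        if (*p == ' ') {
            int add = each;
            if (leftover > 0) {
                add++;
                leftover--;
            }
            while (add-- > 0)
                *out++ = ' ';
        }
    }
    *out = '\0';
    return 1;
}
```

For the two examples above and a width of 15, this turns the single space of "Hello world" into 5 spaces, and the two spaces of "I like bees!" into 3 and 2 spaces respectively.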
Using fscanf will read words and exclude whitespace.
Then add words while the length of the line is less than 80.
Add extra spaces to right justify the line.
#include <stdio.h>
int len ( char *str);
char *cat ( char *to, char *from);
char *cpy ( char *to, char *from);
char *lastchr ( char *str, int ch);
char *justify ( char *str, int wordcount, int width);
int main( void) {
char word[100] = "";
char line[100] = "";
char filenamein[] = "asimov.in";
char filenameout[] = "out.out";
int length = 0;
int wordcount = 0;
int pending = 0;
FILE *pfin = NULL;
FILE *pfout = NULL;
if ( NULL == ( pfin = fopen ( filenamein, "r"))) {
perror ( filenamein);
return 0;
}
if ( NULL == ( pfout = fopen ( filenameout, "w"))) {
fclose ( pfin);
perror ( filenameout);
return 0;
}
while ( 1 == fscanf ( pfin, "%99s", word)) {//read a word from file. will exclude whitespace
length = len ( word);
if ( 80 > len ( line) + length) {//add to line if it will fit
cat ( line, word);
cat ( line, " ");
wordcount++;//needed in case more than one extra space per word
pending = 1;
}
else {//adding last word would be more than 80
justify ( line, wordcount, 80);
fprintf ( pfout, "%s\n", line);
cpy ( line, word);//copy pending word to line
cat ( line, " ");//add a space
wordcount = 1;//reset wordcount
pending = 0;
}
}
if ( pending) {
justify ( line, wordcount, 80);
fprintf ( pfout, "%s\n", line);
}
fclose ( pfin);
fclose ( pfout);
return 0;
}
int len ( char *str) {
int length = 0;
while ( *str) {//not at terminating zero
length++;
str++;
}
return length;
}
char *cat ( char *to, char *from) {
char *start = to;
while ( *to) {//not at terminating zero
to++;
}
while ( *from) {
*to = *from;//assign from to to
to++;
from++;
}
*to = 0;//terminate
return start;
}
char *cpy ( char *to, char *from) {
*to = 0;//set first character of to as terminating zero
cat ( to, from);
return to;
}
char *lastchr ( char *str, int ch) {
char *found = NULL;
while ( *str) {//not at terminating zero
if ( ch == *str) {
found = str;//set pointer
}
str++;//keep searching
}
return found;//return NULL or last found match
}
char *justify ( char *str, int wordcount, int width) {
int length = 0;
int addspaces = 0;
int extraspace = 0;
char *space = lastchr ( str, ' ');//find the last space
*space = 0;//set it to terminate the line
space--;//deduct one
length = len ( str);
addspaces = width - length;//difference is number of spaces needed
extraspace = addspaces / wordcount;//may need more than one extra space
char *end = space + addspaces;
while ( addspaces) {
*end = *space;//shift characters toward end
if ( ' ' == *space) {//found a space
for ( int each = 0; each <= extraspace; ++each) {//will add at least one space
end--;
*end = ' ';
addspaces--;
if ( ! addspaces) {
break;//do not need to add more spaces
}
}
}
end--;
space--;
if ( space <= str) {//reached the start of the line
break;
}
}
return str;
}
EDIT:
#include <stdio.h>
#define WIDTH 80
#define SIZE ( WIDTH + 20)
int len ( char *str);
char *cat ( char *to, char *from);
char *cpy ( char *to, char *from);
char *lastchr ( char *str, int ch);
char *justify ( char *str, int wordcount, int width);
int scanword ( FILE *pfread, int size, char *word);
int main( void) {
char word[SIZE] = "";
char line[SIZE] = "";
char filenamein[] = "asimov.in";
char filenameout[] = "out.out";
int length = 0;
int wordcount = 0;
int pending = 0;
//int paragraph = 1;
FILE *pfin = NULL;
FILE *pfout = NULL;
if ( NULL == ( pfin = fopen ( filenamein, "r"))) {
perror ( filenamein);
return 0;
}
if ( NULL == ( pfout = fopen ( filenameout, "w"))) {
fclose ( pfin);
perror ( filenameout);
return 0;
}
while ( 1 == scanword ( pfin, WIDTH, word)) {//read a word from file
length = len ( word);
if ( '\n' != word[0] && WIDTH > len ( line) + length) {//add to line if it will fit
if ( 0 != word[0]) {
cat ( line, word);
cat ( line, " ");
wordcount++;//needed in case more than one extra space per word
pending = 1;//a line is pending
}
}
else {//paragraph or adding last word would be more than 80
if ( len ( line)) {//line has content
justify ( line, wordcount, WIDTH);
fprintf ( pfout, "%s\n", line);
//paragraph = 1;//could have a blank line
}
if ( /*paragraph &&*/ '\n' == word[0]) {
fprintf ( pfout, "\n");//print a blank line for paragraph
//paragraph = 0;//only allow one blank line
}
line[0] = 0;
wordcount = 0;//reset wordcount
if ( 0 != word[0] && '\n' != word[0]) {//word is not empty and is not newline
cpy ( line, word);//copy pending word to line
cat ( line, " ");//add a space
wordcount = 1;//reset wordcount
}
pending = 0;//nothing pending
}
}
if ( pending) {//print pending line
if ( len ( line)) {//line has content
justify ( line, wordcount, WIDTH);
fprintf ( pfout, "%s\n", line);
}
}
fclose ( pfin);
fclose ( pfout);
return 0;
}
int scanword ( FILE *pfread, int size, char *word) {
static int nl = 0;//static to retain value between function calls
int ch = 0;
int max = size - 1;//max characters that can fit in word and leave one to terminate
*word = 0;//first character. zero terminate. empty line
while ( max && ( ch = fgetc ( pfread))) {//read a character until max is zero
if ( EOF == ch) {//end of file
if ( max == size - 1) {
return 0;//no other characters read
}
return 1;//process the other characters that were read
}
if ( '\n' == ch) {//read a newline
if ( '\n' == nl) {//consecutive newlines
*word = nl;
word++;
*word = 0;
//nl = 0;//reset since just had two consecutive newlines
return 1;
}
nl = ch;//set for first single newline
return 1;
}
nl = 0;//reset to zero as prior character was not newline
if ( ' ' == ch || '\t' == ch) {//read space or tab
if ( max == size - 1) {//no characters in word so far
continue;//consume leading space and tab
}
return 1;//process the word read
}
*word = ch;//assign character to word
word++;//increment pointer to next character
*word = 0;//zero terminate
max--;//deduct. one less character can be read into word
}
return 0;
}
int len ( char *str) {
int length = 0;
while ( *str) {//character pointed to is not terminating zero
length++;
str++;//increment pointer to point to next character
}
return length;
}
char *cat ( char *to, char *from) {
char *iterate = to;
while ( *iterate) {//character pointed to is not terminating zero
iterate++;//increment pointer to point to next character
}
while ( *from) {//character pointed to is not terminating zero
*iterate = *from;//assign from to iterate
iterate++;//increment pointer to point to next character
from++;
}
*iterate = 0;//terminate
return to;
}
char *cpy ( char *to, char *from) {
*to = 0;//set first character of to as terminating zero
cat ( to, from);
return to;
}
char *lastchr ( char *str, int ch) {
char *found = NULL;
while ( *str) {//character pointed to is not terminating zero
if ( ch == *str) {//character pointed to matches ch
found = str;//assign pointer str to found
}
str++;//increment pointer to point to next character. keep searching
}
return found;//return NULL or pointer to last found match
}
char *justify ( char *str, int wordcount, int width) {
int length = 0;
int addspaces = 0;
int extraspace = 0;
char *space = lastchr ( str, ' ');//find the last space
*space = 0;//set it to terminate the line
space--;//deduct one
length = len ( str);
addspaces = width - length;//difference is number of spaces needed
extraspace = addspaces;//may need more than one extra space
if ( wordcount > 2) {
extraspace = addspaces / ( wordcount - 1);//may need more than one extra space
}
char *end = space + addspaces;//set pointer end to point beyond where space points
while ( addspaces) {//stop when addspaces is zero
*end = *space;//assign character pointed to by space to the location pointed to by end
if ( ' ' == *space) {//found a space
for ( int each = 0; each <= extraspace; ++each) {//will add at least one space
end--;
*end = ' ';
addspaces--;
if ( ! addspaces) {
break;//do not need to add more spaces
}
}
}
end--;
space--;
if ( space <= str) {//reached the start of the line
break;
}
}
return str;
}

Parsing a complex string in C

I have a string like this:
"00:00:00 000~00:02:00 0000|~00:01:00 0000;00:01:00 0000~"
I want to get each of the items, like "00:00:00 000".
My idea is to first split the string by ";", then split by "|", and finally split by "~".
But the problem is that I can't get an item if it is empty, such as the part after "~" in "00:01:00 0000~". I want to detect it, set it to a default value, and store it somewhere else, but the code doesn't work. What is the problem?
Here is my code:
int main(int argc, char *argv[])
{
char *str1, *str2, *str3, *str4, *token, *subtoken, *subt1, *subt2;
char *saveptr1, *saveptr2, *saveptr3;
int j;
for (j = 1, str1 = argv[1]; ; j++, str1 = NULL) {
token = strtok_r(str1, ";", &saveptr1);
if (token == NULL)
break;
printf("%d: %s\n", j, token);
int flag1 = 1;
for (str2 = token; ; str2 = NULL) {
subtoken = strtok_r(str2, "|", &saveptr2);
if (subtoken == NULL)
break;
printf(" %d: --> %s\n", flag1++, subtoken);
int flag2 = 1;
for(str3 = subtoken; ; str3 = NULL) {
subt1 = strtok_r(str3, "~", &saveptr3);
if(subt1 == NULL) {
break;
}
printf(" %d: --> %s\n",flag2++, subt1);
}
}
}
exit(EXIT_SUCCESS);
} /* main */
You can simplify your algorithm if you first make all delimiters uniform. First replace all occurrences of ; and | with ~, then the parsing will be easier. You can do this externally via sed or vim, or programmatically in your C code. Then you should be able to handle the 'NULL' problem easily. (Personally, I prefer not to use strtok as it modifies the original string.)
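A rough sketch of that idea in C follows. The function name for_each_field and the callback parameter are assumptions made for the example. It walks the string with strcspn() instead of strtok_r(), because the strtok family treats consecutive delimiters as one and so never reports the empty fields the question is about. Note that flattening the delimiters also discards the information about which level (";", "|" or "~") separated two items.

```c
#include <string.h>

/* Hypothetical sketch: rewrite ';' and '|' to '~' in place, then visit
 * every '~'-separated field, including empty ones.
 * `visit` (may be NULL) receives a pointer to the field and its length;
 * the field is NOT null-terminated. Returns the number of fields seen. */
int for_each_field(char *s, void (*visit)(const char *field, size_t len))
{
    int count = 0;
    char *p;

    for (p = s; *p; p++)                 /* step 1: make delimiters uniform */
        if (*p == ';' || *p == '|')
            *p = '~';

    for (p = s; ; ) {                    /* step 2: walk the fields */
        size_t len = strcspn(p, "~");    /* length up to next '~' or end */
        if (visit)
            visit(p, len);               /* len == 0 means an empty field */
        count++;
        if (p[len] == '\0')
            break;
        p += len + 1;                    /* step over the delimiter */
    }
    return count;
}
```

A callback that sees len == 0 is the place to substitute the default value.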
It is indeed easier to just write a custom parser in this case.
The version below allocates new strings. If allocating new memory is not desired, change the add_string function to instead just point to start and set start[len] to 0 (which requires a writable, non-const input).
static int add_string( char **into, const char *start, int len )
{
if( len<1 ) return 0;
if( (*into = strndup( start, len )) )
return 1;
return 0;
}
static int is_delimeter( char x )
{
static const char delimeters[] = { 0, '~', ',', '|',';' };
int i;
for( i=0; i<sizeof(delimeters); i++ )
if( x == delimeters[i] )
return 1;
return 0;
}
static char **split( const char *data )
{
/* at most (len+1)/2 non-empty tokens, plus one slot for the NULL terminator */
char **res = malloc(sizeof(char *)*(strlen(data)/2+2));
char **cur = res;
int last_delimeter = 0, i = 0;
do {
if( is_delimeter( data[i] ) )
{
if( add_string( cur, data+last_delimeter,i-last_delimeter) )
cur++;
last_delimeter = i+1;
}
} while( data[i++] );
*cur = NULL;
return res;
}
An example usage of the method:
int main()
{
const char test[] = "00:00:00 000~00:02:00 0000|~00:01:00 0000;00:01:00 0000~";
char **split_test = split( test );
int i = 0;
while( split_test[i] )
{
fprintf( stderr, "%2d: %s\n", i, split_test[i] );
free( split_test[i] );
i++;
}
free( split_test );
return 0;
}
Instead of splitting the string, it might be more suitable to come up with a simple finite state machine that parses the string. Fortunately, your tokens seem to have an upper limit on their length, which makes things a lot easier:
Iterate over the string and distinguish four different states:
current character is not a delimiter, but previous character was (start of token)
current character is a delimiter and previous character wasn't (end of token)
current and previous character are both not delimiters (store them in temporary buffer)
current and previous character are both delimiters (ignore them, read next character)
It should be possible to come up with a very short (10 lines?) and concise piece of code that parses the string as specified.
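One way the four states could look in code is sketched below. The names fsm_split and MAX_TOKEN, the limit of 32 characters per token and the delimiter set are assumptions for illustration. Like the description above, it ignores runs of delimiters, so it does not report empty fields, and it silently truncates tokens longer than MAX_TOKEN.

```c
#define MAX_TOKEN 32   /* assumed upper bound on token length */

static int is_delim(char c)
{
    return c == '~' || c == '|' || c == ';';
}

/* Hypothetical sketch: copy up to `max_tokens` tokens from `s` into
 * `tokens`. Returns the number of tokens stored. */
int fsm_split(const char *s, char tokens[][MAX_TOKEN + 1], int max_tokens)
{
    int count = 0;      /* completed tokens */
    int len = 0;        /* characters buffered for the current token */
    int in_token = 0;   /* was the previous character part of a token? */

    for (;; s++) {
        int delim = (*s == '\0') || is_delim(*s);
        if (!delim) {
            if (!in_token) {                 /* state: start of token */
                if (count == max_tokens)
                    break;
                len = 0;
                in_token = 1;
            }
            if (len < MAX_TOKEN)             /* state: inside token */
                tokens[count][len++] = *s;
        } else if (in_token) {               /* state: end of token */
            tokens[count++][len] = '\0';
            in_token = 0;
        }                                    /* state: delimiter run, skip */
        if (*s == '\0')
            break;
    }
    return count;
}
```

The end of the string is treated like a delimiter so that a final token without a trailing "~" is still closed.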
