How to restore string after using strtok() - c

I have a project in which I need to sort multiple lines of text based on the second, third, etc word in each line, not the first word. For example,
this line is first
but this line is second
finally there is this line
and you choose to sort by the second word, it would turn into
this line is first
finally there is this line
but this line is second
(since line is before there is before this)
I have a pointer to a char array that contains each line. So far what I've done is use strtok() to split each line up to the second word, but that changes the entire string to just that word and stores it in my array. My code for the tokenize bit looks like this:
for (i = 0; i < numLines; i++) {
char* token = strtok(labels[i], " ");
token = strtok(NULL, " ");
labels[i] = token;
}
This would give me the second word in each line, since I called strtok twice. Then I sort those words (line, this, there). However, I need to put the string back together in its original form. I'm aware that strtok turns the delimiters into '\0', but I've yet to find a way to get the original string back.
I'm sure the answer lies in using pointers, but I'm confused what exactly I need to do next.
I should mention I'm reading in the lines from an input file as shown:
for (i = 0; i < numLines && fgets(buffer, sizeof(buffer), fp) != 0; i++) {
labels[i] = strdup(buffer);
}
Edit: my find_offset method
size_t find_offset(const char *s, int n) {
size_t len;
while (n > 0) {
len = strspn(s, " ");
s += len;
}
return len;
}
Edit 2: The relevant code used to sort
//Getting the line and offset
for (i = 0; i < numLines && fgets(buffer, sizeof(buffer), fp) != 0; i++) {
labels[i].line = strdup(buffer);
labels[i].offset = find_offset(labels[i].line, nth);
}
int n = sizeof(labels) / sizeof(labels[0]);
qsort(labels, n, sizeof(*labels), myCompare);
for (i = 0; i < numLines; i++)
printf("%d: %s", i, labels[i].line); //Print the sorted lines
int myCompare(const void* a, const void* b) { //Compare function
xline *xlineA = (xline *)a;
xline *xlineB = (xline *)b;
return strcmp(xlineA->line + xlineA->offset, xlineB->line + xlineB->offset);
}

Perhaps rather than mess with strtok(), use strspn(), strcspn() to parse the string for tokens. Then the original string can even be const.
#include <stdio.h>
#include <string.h>
int main(void) {
const char str[] = "this line is first";
const char *s = str;
while (*(s += strspn(s, " ")) != '\0') {
size_t len = strcspn(s, " ");
// Instead of printing, use the nth parsed token for key sorting
printf("<%.*s>\n", (int) len, s);
s += len;
}
}
Output
<this>
<line>
<is>
<first>
Or
Do not sort lines.
Sort structures
typedef struct {
char *line;
size_t offset;
} xline;
Pseudo code
int fcmp(a, b) {
return strcmp(a->line + a->offset, b->line + b->offset);
}
size_t find_offset_of_nth_word(const char *s, n) {
while (n > 0) {
use strspn(), strcspn() like above
}
}
main() {
int nth = ...;
xline labels[numLines];
for (i = 0; i < numLines && fgets(buffer, sizeof(buffer), fp) != 0; i++) {
labels[i].line = strdup(buffer);
labels[i].offset = find_offset_of_nth_word(labels[i].line, nth);
}
qsort(labels, i, sizeof *labels, fcmp);
}
Or
After reading each line, find the nth token with strspn(), strcspn() and then reform the line from "aaa bbb ccc ddd \n" to "ccc ddd \naaa bbb ", sort, and then later re-order the line.
In all cases, do not use strtok() - too much information lost.

I need to put the string back together in its original form. I'm aware that strtok turns the delimiters into '\0', but I've yet to find a way to get the original string back.
Far better would be to avoid damaging the original strings in the first place if you want to keep them, and especially to avoid losing the pointers to them. Provided that it is safe to assume that there are at least three words in each line and that the second is separated from the first and third by exactly one space on each side, you could undo strtok()'s replacement of delimiters with string terminators. However, there is no safe or reliable way to recover the start of the overall string once you lose it.
I suggest creating an auxiliary array in which you record information about the second word of each sentence -- obtained without damaging the original sentences -- and then co-sorting the auxiliary array and sentence array. The information to be recorded in the aux array could be a copy of the second word of the sentence, their offsets and lengths, or something similar.

Related

C: print longest string in a dynamic allocated array of chars

I've made a dynamic allocated array of chars and read some lines of text from a .txt file. How can I find the longest "string" in the array?
The .txt file looks like this:
usr
user
username
somerandomtext
Here's my code that loads the array and prints it:
char c = fgetc(rezultati);
printf("\n");
int x = 0;
while (c != EOF){
pogg[x++] = c;
c = fgetc(rezultati);
}
pogg[x] = '\0';
printf("%s\n\n", pogg);
I have tried using qsort with a custom comparator function but my output is just lines of ^2 (squared).
Read the file line by line using fgets(), and get the length of the line. Save the longest line in another variable.
You don't need an array of all the lines for this.
#define MAXLEN 200
char buffer[MAXLEN], longest[MAXLEN] = "";
size_t maxlength = 0;
while (fgets(buffer, MAXLEN, stdin)) {
size_t len = strlen(buffer);
if (len > maxlength) {
maxlength = len; /* remember the new record length */
strcpy(longest, buffer);
}
}
printf("Longest line = %s\n", longest);
I think you need to use the function strlen. Compute the length of each string and keep track of the largest one seen so far:
maxlen = strlen(lines[0]);
maxindex = 0;
for (i = 1; i < count; i++) {
len = strlen(lines[i]);
if (len > maxlen) {
maxlen = len;
maxindex = i;
}
}
This is a bit pseudocode-ish (lines and count stand for your array of strings and its size), but I think it will give you some idea.

How to save a string token, save its content to an array, then use those contents for further comparison

/*I am unsure if my code for saving the tokens in an array is accurate.
This is because whenever I run my program, the code that compares
token[0] with my variable neither gives any output nor performs the assigned function.
Hence I am sure there is something inaccurate about my coding.*/
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main()
{
//variable declarations
const char *array[] = {"ax","bo","cf"};
char delim[]=" \n";
char* myline;
size_t max = 500;
char* token1;
char* token2[max];
int n = 0;
while(1) //loop always
{
printf("Enter an argument\n"); //asks for an input
getline (&myline, &max, stdin); //read the input/line
//for loop -- splits up the line into tokens
for(token1 = strtok(myline, " "); token1 != NULL; token1 = strtok(NULL, delim))
{
token2[n] = malloc(strlen(token1)+1); //allocate some space/memory to token2[n]
//save the token in an array by copying from token1 to token2
strcpy(token2[n],token1);
int m;
for(m = 0; m<sizeof(array);m++) //loop through the array elements for comparison
{
//compare array at index m with token at index 0 -- compare only first token with a specific variable
if(strcmp(token2[0], array[m]) == 0)
{
printf("equal");
}
}
}
free(token2[n]); //deallocate assigned memory
}
return(0);
}
I think you should try a vector of strings, like
vector < string > str = { "ax","bo","cf" };
There seem to be a few issues in your current code:
for(m = 0; m<strlen;m++) is not correct. strlen() is a <string.h> function used to obtain the length of a C string. Since you want array[i], you need to include the size of array in the guard. To find the size of the array, you can use sizeof(array)/sizeof(array[0]). It would be good to include this in a macro:
#define ARRAYSIZE(x) (sizeof x/sizeof x[0])
Then your loop can be:
size_t m;
for(m = 0; m<ARRAYSIZE(array); m++)
You need to check the return value of malloc(), as it can return NULL when the allocation fails. Here is a way to check this:
token2[n] = malloc(strlen(token1)+1);
if (token2[n] == NULL) {
/* handle error */
}
It is possible to skip the malloc()/strcpy() step by simply using strdup.
getline() returns -1 on failure to read a line, so it's good to check this. It also leaves the \n character at the end of the buffer, so you need to remove it. Otherwise, strcmp will never find equal strings, as you will be comparing strcmp("string\n", "string"). You need to find the \n character in your buffer and replace it with a \0 null terminator.
You can achieve this like:
size_t slen = strlen(myline);
if (slen > 0 && myline[slen-1] == '\n') {
myline[slen-1] = '\0';
}
You also need to free() all of the char* pointers in token2[].
Since you are using the same delimiter for strtok(), it's better to make it const. So use const char *delim = " \n"; instead.
A lot of the fixes I suggested in the comments, so I didn't post them here, as you seem to have updated your code with those suggestions.

Printing a string due to a new line

Is there any efficient way (in terms of performance) to print some arbitrary string, but only up to the first new line character in it (excluding the new line character)?
Example:
char *string = "Hello\nWorld\n";
printf(foo(string + 6));
Output:
World
If you are concerned about performance this might help (untested code):
void MyPrint(const char *str)
{
size_t len = strlen(str);
char *temp = alloca(len + 1); /* one extra byte for the terminator */
size_t i;
for (i = 0; i < len; i++)
{
char ch = str[i];
if (ch == '\n')
break;
temp[i] = ch;
}
temp[i] = 0;
puts(temp);
}
strlen is fast, alloca is fast, copying the string up to the first \n is fast, and puts is faster than printf, but it is most likely far slower than the three operations mentioned before put together.
size_t writetodelim(char const *in, int delim)
{
char *end = strchr(in, delim);
if (!end)
return 0;
return fwrite(in, 1, end - in, stdout);
}
This can be generalized somewhat (pass the FILE* to the function), but it's already flexible enough to terminate the output on any chosen delimiter, including '\n'.
Warning: Do not use printf without format specifier to print a variable string (or from a variable pointer). Use puts instead or "%s", string.
C strings are terminated by '\0' (NUL), not by newline. So, the functions print until the NUL terminator.
You can, however, use your own loop with putchar. Whether that carries any performance penalty remains to be tested. Normally printf does much the same in the library and might be even slower, as it has to handle additional constraints, so your own loop might very well be faster.
for ( char *sp = string + 6 ; *sp != '\0'; sp++ ) {
if ( *sp == '\n' ) break; // newline will not be printed
putchar(*sp);
}
(Move the if-line to the end of the loop if you want newline to be printed.)
An alternative would be to limit the length of the string to print, but that would require finding the next newline before calling printf.
I don't know if it is fast enough, but there is a way to build a string containing the source string up to a new line character only involving one standard function.
char *string = "Hello\nWorld\nI love C"; // Example of your string
static char newstr[256]; // Large enough for the result; static storage starts out zero-filled
sscanf(string + 6, "%255[^\n]", newstr); // copy everything up to the first new line (a plain %s would stop at any whitespace)
printf("%s\n", newstr); // print the string
I guess there is no more efficient way than simply looping over your string until you find the first \n in it. As Olaf mentioned, a string in C ends with a terminating \0, so if you want to use printf to print the string you need to make sure it contains the terminating \0, or you could use putchar to print the string character by character.
If you want to provide a function creating a string up to the first found new line you could do something like that:
#include <stdio.h>
#include <string.h>
#define MAX 256
void foo(const char* string, char *ret)
{
size_t len = (strlen(string) < MAX) ? strlen(string) : MAX - 1;
size_t i = 0;
for (i = 0; i < len; i++)
{
if (string[i] == '\n') break;
ret[i] = string[i];
}
ret[i] = '\0';
}
int main()
{
const char* string = "Hello\nWorld\n";
char ret[MAX];
foo(string, ret);
printf("%s\n", ret);
foo(string+6, ret);
printf("%s\n", ret);
}
This will print
Hello
World
Another fast way (if the new line character is truly unwanted, the string is writable, and it is known to contain a new line)
Simply:
*strchr(string, '\n') = '\0';

How do I split a string by character position in c

I'm using C to read in an external text file. The input is not great and looks like this:
0PAUL 22 ACACIA AVENUE 02/07/1986RN666
As you can see I have no obvious delimiter, and sometimes the values have no space between them. However, I do know how long in characters each value should be when split, which is as follows:
id = 1
name = 20
house number = 5
street name = 40
date of birth = 10
reference = 5
I've set up a structure I want to hold this information in, and have tried using fscanf to read in the file.
However, I find that something along the lines of the following just isn't doing what I need:
fscanf(file_in, "%1d, %20s", person.id[i], person.name[i]);
(The actual line I use attempts to grab all input but you should see where I'm going...)
The long term intention is to reformat the input file into another output file which would be made a little easier on the eye.
I appreciate I'm probably going about this all the wrong way, but I would hugely appreciate it if somebody could set me on the right path. If you're able to take it easy on me in regard to an obvious lack of understanding, I'd appreciate that also.
Thanks for reading
Use fgets to read one line at a time, then extract each field from the input line. Warning: no range checks are performed on the buffers, so take care to size them appropriately.
For example, something like this (I haven't compiled it, so it may contain errors):
void copy_substr(const char * pBuffer, int content_size, int start_idx, int end_idx, char * pOutBuffer)
{
end_idx = end_idx > content_size ? content_size : end_idx;
int j = 0;
for (int i = start_idx; i < end_idx; i++)
pOutBuffer[j++] = pBuffer[i];
pOutBuffer[j] = 0;
return;
}
void test_solution()
{
char buffer_char[200];
fgets(buffer_char,sizeof(buffer_char),stdin); // use your own FILE handle instead of stdin
int len = strlen(buffer_char);
char temp_buffer[100];
// Reading first field: str[0..1), so only the char 0 (len=1)
int field_size = 1;
int field_start_ofs = 0;
copy_substr(buffer_char, len, field_start_ofs, field_start_ofs + field_size, temp_buffer);
}
sscanf is a good way to do it: read the whole line into a buffer, then call sscanf several times with the right offsets.
For example:
char buffer[100];
fgets(buffer, sizeof buffer, file_in);
sscanf(buffer, "%1d", &person.id[i]);
sscanf(buffer+1, "%20s", person.name[i]);
sscanf(buffer+1+20, "%5d", &person.street_number[i]);
and so on.
I feel like it is the easiest way to do it.
Please also consider using an array of your struct instead of a struct of arrays, it just feels wrong to have person.id[i] and not person[i].id
If you have fixed column widths, you can use pointer arithmetic to access substrings of your string str. If you have a starting index begin,
printf("%s", str + begin) ;
will print the substring beginning at begin and up to the end. If you want to print a string of a certain length, you can use printf's precision specifier .*, which takes a maximum length as additional argument:
printf("%.*s", length, str + begin) ;
If you want to copy the string to a temporary buffer, you could use strncpy, but it only null-terminates the result when the source is shorter than the count you pass, so you usually have to add the terminator yourself. You could also use snprintf according to the above pattern:
char buf[length + 1];
snprintf(buf, sizeof(buf), "%.*s", length, str + begin) ;
This will include leading and trailing spaces, which is probably not what you want. You could write a function to strip the unwanted whitespace; there should be plenty of examples here on SO.
You could also strip the whitespace when copying the substring. The example code below does this with the isspace function/macro from <ctype.h>:
#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>
int extract(char *buf, const char *str, int len)
{
const char *end = str + len;
int tail = -1;
int i = 0;
// skip leading white space;
while (str < end && *str && isspace((unsigned char)*str)) str++;
// copy string
while (str < end && *str) {
if (!isspace((unsigned char)*str)) tail = i + 1;
buf[i++] = *str++;
}
if (tail < 0) tail= i;
buf[tail] = '\0';
return tail;
}
int main()
{
char str[][80] = {
"0PAUL               22   ACACIA AVENUE                     02/07/1986RN666",
"1BOB                1    POLK ST                           01/04/1988RN802",
"2ALICE              99   WEST HIGHLAND CAUSEWAY            28/06/1982RN774"
};
int i;
for (i = 0; i < 3; i++) {
char *p = str[i];
char id[2];
char name[20];
char number[6];
char street[35];
char bday[11];
char ref[11];
extract(id, p + 0, 1);
extract(name, p + 1, 19);
extract(number, p + 20, 5);
extract(street, p + 25, 34);
extract(bday, p + 59, 10);
extract(ref, p + 69, 10);
printf("<person id='%s'>\n", id);
printf(" <name>%s</name>\n", name);
printf(" <house>%s</house>\n", number);
printf(" <street>%s</street>\n", street);
printf(" <birthday>%s</birthday>\n", bday);
printf(" <reference>%s</reference>\n", ref);
printf("</person>\n\n");
}
return 0;
}
There's a danger here, however: When you access a string at a certain position str + pos you should make sure that you don't go beyond the actual string length. For example, your string may be terminated after the name. When you access the birthday, you access valid memory, but it might contain garbage.
You can avoid this problem by padding the full string with spaces.

How do I set the number of characters output in a fprintf '%s' format using a variable?

I need to write a variable number of characters to a file. For example, let's say I want to print 3 characters. "TO" would print "TO" to a file. "LongString of Characters" would print "Lon" to a file.
How can I do this? (The number of characters is defined in another variable.) I know that fprintf(file,"%10s",string) is possible, but there the 10 is hard-coded.
This one corresponds to your example:
fprintf(file, "%*s", 10, string);
but you mentioned a maximum as well, to also limit the number:
fprintf(file, "%*.*s", 10, 10, string);
I believe you need "%.*s", and you'll need to pass the length as an int before the string.
As an alternative, why not try this:
void print_limit(char *string, size_t num)
{
char c = string[num];
string[num] = 0;
fputs(string, file);
string[num] = c;
}
Temporarily truncates the string to the length you want and then restores it. Sure, it's not strictly necessary, but it works, and it's quite easy to understand.
As an alternative, you may create a C string from your buffer and prepare it for printf, as I do in this code:
void print(char *val, size_t size) {
char *p = (char *)malloc(size+1); // +1 to zero char
char current;
for (size_t i = 0; i < size; ++i) {
current = val[i];
if (current != '\0') {
p[i] = current;
} else {
p[i] = '.'; // keep in mind that \0 was replaced by a dot
}
}
p[size] = '\0'; // terminate once, which also covers size == 0
printf("%s", p);
free(p);
}
But this solution is the wrong way to go about it. You should use fprintf with the format "%.*s".

Resources