How do I split a string by character position in C

I'm using C to read in an external text file. The input is not great and looks like this:
0PAUL 22 ACACIA AVENUE 02/07/1986RN666
As you can see, I have no obvious delimiter, and sometimes the values have no space between them. However, I do know how many characters long each value should be when split, which is as follows:
id = 1
name = 20
house number = 5
street name = 40
date of birth = 10
reference = 5
I've set up a structure I want to hold this information in, and have tried using fscanf to read in the file.
However, I find that something along the lines of the following just isn't doing what I need:
fscanf(file_in, "%1d, %20s", person.id[i], person.name[i]);
(The actual line I use attempts to grab all input but you should see where I'm going...)
The long term intention is to reformat the input file into another output file which would be made a little easier on the eye.
I appreciate I'm probably going about this all the wrong way, but I would hugely appreciate it if somebody could set me on the right path. If you're able to take it easy on me in regard to an obvious lack of understanding, I'd appreciate that also.
Thanks for reading

Use fgets to read one line at a time, then extract each field from the input line. Warning: no range checks are performed on the buffers, so take care to size them appropriately.
For example, something like this (I haven't compiled it, so it may contain errors):
#include <stdio.h>
#include <string.h>
void copy_substr(const char * pBuffer, int content_size, int start_idx, int end_idx, char * pOutBuffer)
{
end_idx = end_idx > content_size ? content_size : end_idx;
int j = 0;
for (int i = start_idx; i < end_idx; i++)
pOutBuffer[j++] = pBuffer[i];
pOutBuffer[j] = 0;
return;
}
void test_solution()
{
char buffer_char[200];
fgets(buffer_char,sizeof(buffer_char),stdin); // use your own FILE handle instead of stdin
int len = strlen(buffer_char);
char temp_buffer[100];
// Reading first field: str[0..1), so only the char 0 (len=1)
int field_size = 1;
int field_start_ofs = 0;
copy_substr(buffer_char, len, field_start_ofs, field_start_ofs + field_size, temp_buffer);
}
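The same idea can be driven by a table of widths instead of hand-computed offsets. The following is only a sketch: the widths are the ones listed in the question, while split_fixed, FIELD_WIDTHS, FIELD_COUNT and FIELD_MAX are names made up for this illustration. Unlike the snippet above, it never reads past the end of a short line.

```c
#include <stddef.h>
#include <string.h>

/* Field widths from the question: id, name, house number,
   street name, date of birth, reference. */
static const size_t FIELD_WIDTHS[] = { 1, 20, 5, 40, 10, 5 };
enum { FIELD_COUNT = 6, FIELD_MAX = 40 };

/* Copy each fixed-width column of line into fields[k] as a
   NUL-terminated string. Columns that lie beyond the end of the
   line come back empty. Returns the number of non-empty fields. */
size_t split_fixed(const char *line, char fields[FIELD_COUNT][FIELD_MAX + 1])
{
    size_t len = strcspn(line, "\r\n");   /* ignore the line ending fgets keeps */
    size_t pos = 0;
    size_t found = 0;

    for (size_t k = 0; k < FIELD_COUNT; k++) {
        size_t width = FIELD_WIDTHS[k];
        size_t avail = pos < len ? len - pos : 0;  /* characters left in the line */
        size_t n = width < avail ? width : avail;  /* never read past the end */

        if (n > 0) {
            memcpy(fields[k], line + pos, n);
            found++;
        }
        fields[k][n] = '\0';
        pos += width;
    }
    return found;
}
```

Padding is kept in the copied fields, so a 20-column name still comes back with its trailing spaces; trim afterwards if you need to.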

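scanf is a good way to do it; you just need to read the line into a buffer and call sscanf multiple times with the right offsets.
For example:
char buffer[100];
fgets(buffer, sizeof buffer, file_in); /* fscanf with "%s" would stop at the first space */
sscanf(buffer, "%1d", &person.id[i]); /* %d needs the address of the int */
sscanf(buffer+1, "%20s", person.name[i]); /* note: %s itself stops at a space within the field */
sscanf(buffer+1+20, "%5d", &person.street_number[i]);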
and so on.
I feel like it is the easiest way to do it.
Please also consider using an array of your struct instead of a struct of arrays, it just feels wrong to have person.id[i] and not person[i].id
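If the fields can contain spaces (a street name usually does), a scanset with a width may be a better fit than %s, because %20[^\n] takes up to 20 characters including spaces and does not skip leading whitespace. Here is a rough sketch; struct person_record and parse_record are hypothetical names, and the sizes assume the widths given in the question.

```c
#include <stdio.h>

/* Hypothetical record type; each array is the field width plus one
   byte for the terminator. */
struct person_record {
    char id[1 + 1];
    char name[20 + 1];
    char house_number[5 + 1];
    char street[40 + 1];
    char date_of_birth[10 + 1];
    char reference[5 + 1];
};

/* Split one line into its fixed-width fields with a single sscanf call.
   Returns the number of fields converted (6 for a complete line). */
int parse_record(const char *line, struct person_record *out)
{
    return sscanf(line, "%1[^\n]%20[^\n]%5[^\n]%40[^\n]%10[^\n]%5[^\n]",
                  out->id, out->name, out->house_number,
                  out->street, out->date_of_birth, out->reference);
}
```

A return value smaller than 6 tells you the line was shorter than expected, which is worth checking before using the later fields.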

If you have fixed column widths, you can use pointer arithmetic to access substrings of your string str. If you have a starting index begin,
printf("%s", str + begin) ;
will print the substring beginning at begin and up to the end. If you want to print a string of a certain length, you can use printf's precision specifier .*, which takes a maximum length as additional argument:
printf("%.*s", length, str + begin) ;
If you want to copy the string to a temporary buffer, you could use strncpy, but it only produces a null-terminated string when the source is shorter than the count you pass, so you would have to add the terminator yourself. You could also use snprintf according to the above pattern:
char buf[length + 1];
snprintf(buf, sizeof(buf), "%.*s", length, str + begin) ;
This will also copy leading and trailing spaces, which is probably not what you want. You could write a function to strip the unwanted whitespace; there should be plenty of examples here on SO.
You could also strip the whitespace when copying the substring. The example code below does this with the isspace function/macro from <ctype.h>:
#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>
int extract(char *buf, const char *str, int len)
{
const char *end = str + len;
int tail = -1;
int i = 0;
// skip leading white space; the casts keep isspace well-defined for negative char values
while (str < end && *str && isspace((unsigned char)*str)) str++;
// copy string
while (str < end && *str) {
if (!isspace((unsigned char)*str)) tail = i + 1;
buf[i++] = *str++;
}
if (tail < 0) tail= i;
buf[tail] = '\0';
return tail;
}
int main()
{
char str[][80] = {
"0PAUL 22 ACACIA AVENUE 02/07/1986RN666",
"1BOB 1 POLK ST 01/04/1988RN802",
"2ALICE 99 WEST HIGHLAND CAUSEWAY 28/06/1982RN774"
};
int i;
for (i = 0; i < 3; i++) {
char *p = str[i];
char id[2];
char name[20];
char number[6];
char street[35];
char bday[11];
char ref[11];
extract(id, p + 0, 1);
extract(name, p + 1, 19);
extract(number, p + 20, 5);
extract(street, p + 25, 34);
extract(bday, p + 59, 10);
extract(ref, p + 69, 10);
printf("<person id='%s'>\n", id);
printf(" <name>%s</name>\n", name);
printf(" <house>%s</house>\n", number);
printf(" <street>%s</street>\n", street);
printf(" <birthday>%s</birthday>\n", bday);
printf(" <reference>%s</reference>\n", ref);
printf("</person>\n\n");
}
return 0;
}
There's a danger here, however: when you access a string at a certain position str + pos, you should make sure that you don't go beyond the actual string length. For example, your string may be terminated after the name. When you access the birthday, you access valid memory, but it might contain garbage.
You can avoid this problem by padding the full string with spaces.
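One way to do that padding, as a sketch: strip the line ending that fgets leaves behind and fill the rest of the record with spaces. pad_line is a made-up helper name, and the caller is assumed to know the full record width.

```c
#include <string.h>

/* Strip the line ending and pad line with spaces up to width characters.
   size is the capacity of the buffer. Returns 0 on success, or -1 if the
   buffer cannot hold width characters plus the terminator. */
int pad_line(char *line, size_t size, size_t width)
{
    size_t len = strcspn(line, "\r\n");   /* length without the line ending */

    if (width + 1 > size)
        return -1;
    if (len < width)
        memset(line + len, ' ', width - len);
    line[len < width ? width : len] = '\0';
    return 0;
}
```

After pad_line(buf, sizeof buf, 81) succeeds, every offset up to column 81 is safe to read, and a missing field simply comes out as spaces.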

Related

How to restore string after using strtok()

I have a project in which I need to sort multiple lines of text based on the second, third, etc word in each line, not the first word. For example,
this line is first
but this line is second
finally there is this line
and you choose to sort by the second word, it would turn into
this line is first
finally there is this line
but this line is second
(since line is before there is before this)
I have a pointer to a char array that contains each line. So far what I've done is use strtok() to split each line up to the second word, but that changes the entire string to just that word and stores it in my array. My code for the tokenize bit looks like this:
for (i = 0; i < numLines; i++) {
char* token = strtok(labels[i], " ");
token = strtok(NULL, " ");
labels[i] = token;
}
This would give me the second word in each line, since I called strtok twice. Then I sort those words. (line, this, there) However, I need to put the string back together in its original form. I'm aware that strtok turns the tokens into '\0', but I've yet to find a way to get the original string back.
I'm sure the answer lies in using pointers, but I'm confused what exactly I need to do next.
I should mention I'm reading in the lines from an input file as shown:
for (i = 0; i < numLines && fgets(buffer, sizeof(buffer), fp) != 0; i++) {
labels[i] = strdup(buffer);
Edit: my find_offset method
size_t find_offset(const char *s, int n) {
size_t len;
while (n > 0) {
len = strspn(s, " ");
s += len;
}
return len;
}
Edit 2: The relevant code used to sort
//Getting the line and offset
for (i = 0; i < numLines && fgets(buffer, sizeof(buffer), fp) != 0; i++) {
labels[i].line = strdup(buffer);
labels[i].offset = find_offset(labels[i].line, nth);
}
int n = sizeof(labels) / sizeof(labels[0]);
qsort(labels, n, sizeof(*labels), myCompare);
for (i = 0; i < numLines; i++)
printf("%d: %s", i, labels[i].line); //Print the sorted lines
int myCompare(const void* a, const void* b) { //Compare function
xline *xlineA = (xline *)a;
xline *xlineB = (xline *)b;
return strcmp(xlineA->line + xlineA->offset, xlineB->line + xlineB->offset);
}
Perhaps rather than mess with strtok(), use strspn(), strcspn() to parse the string for tokens. Then the original string can even be const.
#include <stdio.h>
#include <string.h>
int main(void) {
const char str[] = "this line is first";
const char *s = str;
while (*(s += strspn(s, " ")) != '\0') {
size_t len = strcspn(s, " ");
// Instead of printing, use the nth parsed token for key sorting
printf("<%.*s>\n", (int) len, s);
s += len;
}
}
Output
<this>
<line>
<is>
<first>
Or
Do not sort lines.
Sort structures
typedef struct {
char *line;
size_t offset;
} xline;
Pseudo code
int fcmp(a, b) {
return strcmp(a->line + a->offset, b->line + b->offset);
}
size_t find_offset_of_nth_word(const char *s, n) {
while (n > 0) {
use strspn(), strcspn() like above
}
}
main() {
int nth = ...;
xline labels[numLines];
for (i = 0; i < numLines && fgets(buffer, sizeof(buffer), fp) != 0; i++) {
labels[i].line = strdup(buffer);
labels[i].offset = find_offset_of_nth_word(labels[i].line, nth);
}
qsort(labels, i, sizeof *labels, fcmp);
}
Or
After reading each line, find the nth token with strspn(), strcspn() and then re-form the line from "aaa bbb ccc ddd \n" to "ccc ddd \naaa bbb ", sort, and later restore the original word order of each line.
In all cases, do not use strtok() - too much information is lost.
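For what it's worth, the "sort structures" variant could be filled in roughly like this. It is a sketch under a few assumptions: words are separated by spaces only, n counts from 1, and compare_xline and sort_by_nth_word are names invented here.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *line;   /* the untouched original line */
    size_t offset;      /* where the sort key starts inside line */
} xline;

/* Offset of the n-th (1-based) space-separated word in s. If s has
   fewer than n words, the offset of the terminating '\0' is returned. */
size_t find_offset_of_nth_word(const char *s, int n)
{
    const char *p = s;

    p += strspn(p, " ");          /* skip leading spaces */
    while (--n > 0) {
        p += strcspn(p, " ");     /* skip the current word */
        p += strspn(p, " ");      /* skip the spaces after it */
    }
    return (size_t)(p - s);
}

/* qsort callback: compare two lines starting at their key offsets. */
static int compare_xline(const void *a, const void *b)
{
    const xline *xa = a;
    const xline *xb = b;
    return strcmp(xa->line + xa->offset, xb->line + xb->offset);
}

/* Sort count lines by everything from their n-th word onwards. */
void sort_by_nth_word(xline *lines, size_t count, int n)
{
    for (size_t i = 0; i < count; i++)
        lines[i].offset = find_offset_of_nth_word(lines[i].line, n);
    qsort(lines, count, sizeof *lines, compare_xline);
}
```

Because only the offsets are computed, the lines themselves are never modified, so there is nothing to restore afterwards. Pass the number of lines actually read as count, not sizeof(labels) / sizeof(labels[0]).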
I need to put the string back together in its original form. I'm aware that strtok turns the tokens into '\0', but I've yet to find a way to get the original string back.
Far better would be to avoid damaging the original strings in the first place if you want to keep them, and especially to avoid losing the pointers to them. Provided that it is safe to assume that there are at least three words in each line and that the second is separated from the first and third by exactly one space on each side, you could undo strtok()'s replacement of delimiters with string terminators. However, there is no safe or reliable way to recover the start of the overall string once you lose it.
I suggest creating an auxiliary array in which you record information about the second word of each sentence -- obtained without damaging the original sentences -- and then co-sorting the auxiliary array and sentence array. The information to be recorded in the aux array could be a copy of the second word of the sentence, their offsets and lengths, or something similar.

Custom STRCAT is overwhelmed by too many arguments

I am trying to code a custom strcat that separates arguments with \n except for the last one and terminates the string with \0.
It's working fine as is up to 5 arguments, but if I try passing a sixth one I get a strange line in response :
MacBook-Pro-de-Domingo% ./test ok ok ok ok ok
ok
ok
ok
ok
ok
MacBook-Pro-de-Domingo% ./test ok ok ok ok ok ok
ok
ok
ok
ok
ok
P/Users/domingodelmasok
Here is my custom strcat code:
char cat(char *dest, char *src, int current, int argc_nb)
{
int i = 0;
int j = 0;
while(dest[i])
i++;
while(src[j])
{
dest[i + j] = src[j];
j++;
}
if(current < argc_nb - 1)
dest[i + j] = '\n';
else
dest[i + j] = '\0';
return(*dest);
}
UPDATE Complete calling function:
char *concator(int argc, char **argv)
{
int i;
int j;
int size = 0;
char *str;
i = 1;
while(i < argc)
{
j = 0;
while(argv[i][j])
{
size++;
j++;
}
i++;
}
str = (char*)malloc(sizeof(*str) * (size + 1));
i = 1;
while(i < argc)
{
cat(str, argv[i], i, argc);
i++;
}
free(str);
return(str);
}
What's wrong here?
Thanks!
Edit: Fixed blunder.
There are quite a few issues with the code:
sizeof (char) == 1 by the C standard.
cat() requires the destination to be a string (terminated by a \0), but does not append it itself (except for current >= argc_nb - 1). This is a bug.
free(str); return str; is a use-after-free bug. If you call free(str), the contents at str are irrevocably lost, inaccessible. The free(str) should simply be removed; it is not appropriate here.
Arrays in C are indexed at 0. However, the concator() function skips the first string pointer (because argv[0] contains the name used to execute the program). This is wrong, and will eventually trip someone. Instead, have concator() add all strings in the array, but call it using concator(argc - 1, argv + 1);.
There might be even more, but at this point, I believe a rewrite from scratch, using a much more appropriate approach, is in order.
Consider the following join() function:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
char *join(const size_t parts, const char *part[],
const char *separator, const char *suffix)
{
const size_t separator_len = (separator) ? strlen(separator) : 0;
const size_t suffix_len = (suffix) ? strlen(suffix) : 0;
size_t total_len = 0;
size_t p;
char *dst, *end;
/* Calculate sum of part lengths */
for (p = 0; p < parts; p++)
if (part[p])
total_len += strlen(part[p]);
/* Add separator lengths */
if (parts > 1)
total_len += (parts - 1) * separator_len;
/* Add suffix length */
total_len += suffix_len;
/* Allocate enough memory, plus end-of-string '\0' */
dst = malloc(total_len + 1);
if (!dst)
return NULL;
/* Keep a pointer to the current end of the result string */
end = dst;
/* Append each part */
for (p = 0; p < parts; p++) {
/* Insert separator */
if (p > 0 && separator_len > 0) {
memcpy(end, separator, separator_len);
end += separator_len;
}
/* Insert part */
if (part[p]) {
const size_t len = strlen(part[p]);
if (len > 0) {
memcpy(end, part[p], len);
end += len;
}
}
}
/* Append suffix */
if (suffix_len > 0) {
memcpy(end, suffix, suffix_len);
end += suffix_len;
}
/* Terminate string. */
*end = '\0';
/* All done. */
return dst;
}
The logic is simple. First, we find out the length of each component. Note that separator is only added between parts (so occurs parts-1 times), and suffix at the very end.
(The (string) ? strlen(string) : 0 idiom just means "if string is non-NULL, strlen(string), otherwise 0". We do that, because we allow NULL separator and suffix, but strlen(NULL) is Undefined Behaviour.)
Next, we allocate enough memory for the result, including the end-of-string NUL char, \0, that was not included in the lengths.
To append each part, we keep the result pointer intact, and instead use a temporary end pointer. (It is the end of the string thus far.) We use a loop, where we copy the next part to the end. Before the second and subsequent parts, we copy the separator before the part.
Next, we copy the suffix, and finally the end-of-string '\0'. (It is important to return a pointer to the beginning of the string, rather than end, of course; and that is why we kept dst to point to the new resulting string, and end at the point we appended each substring.)
You could use it from the command line using for example the following main():
int main(int argc, char *argv[])
{
char *result;
if (argc < 4) {
fprintf(stderr, "\n");
fprintf(stderr, "Usage: %s SEPARATOR SUFFIX PART [ PART ... ]\n", argv[0]);
fprintf(stderr, "\n");
return EXIT_FAILURE;
}
result = join(argc - 3, (const char **)(argv + 3), argv[1], argv[2]);
if (!result) {
fprintf(stderr, "Failed.\n");
return EXIT_FAILURE;
}
fputs(result, stdout);
return EXIT_SUCCESS;
}
If you compile the above to e.g. example (I use gcc -Wall -O2 example.c -o example), then running
./example ', ' $'!\n' Hello world
in a Bash shell outputs
Hello, world!
(with a newline at end). Running
./example ' and ' $'.\n' a b c d e f g
outputs
a and b and c and d and e and f and g.
(again with a newline at end). The $'...' is just a Bash idiom to specify special characters in strings; $'!\n' is the same in Bash as "!\n" is in C, and $'.\n' is the Bash equivalent of ".\n" in C.
(Removing the automatic newline between parts, and allowing a string rather than just one char to be used as a separator and suffix, was a deliberate choice for two reasons. The main one is to stop anyone from just copy-pasting this as an answer to some exercise. The secondary one is to show that while it might sound more complicated than just using single characters for them, it is actually very little additional code; and if you consider the practical use cases, allowing a string to be used as the separator opens up a lot of options.)
The example code above is only very lightly tested, and might contain bugs. If you find any, or disagree with anything I've written above, do let me know in a comment so I can review, and fix as necessary.

Appending a char to a char*

I'm writing a program and I need to append a char to a char*. I have a char** that represents lines of text. I'm trying to take a single char at a time from char**, and add it to a char*.
So basically, I tried strcat(char*, char[i][j]), but strcat rejects char[i][j] since it's not a pointer. I think I need to use sprintf(char*, "%c", char[i][j]), but I'm having trouble understanding how to append with sprintf. I don't want it to overwrite what's already in my char*.
Any tips?
You almost got it!
strncat (char *, & char[i][j], 1);
Calling strncat like that will copy exactly one char from your char** at position [i][j]. Just remember that it will also append the null character after that.
Given that you know that buf is defined as
char buf [MAXSIZE];
Let us assume that it has been initialized to an initial string and you want to add mychar (a single character) to it:
i = strlen(buf);
if (i < MAXSIZE-1)
{
buf[i] = mychar; // Make the new character the last element of buf
buf[i+1] = '\0'; // end the new string with the null character
}
This will append the new character and push the ending null character to ensure that it is a valid string. This is what strcat does when the appended entry is also a string or what strncat does when the count is 1. Note that the new strlen(buf) will now be i+1
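Wrapped up as a function, the same steps might look like the sketch below. append_char is just an illustrative name, and the capacity is passed in explicitly so it also works when buf is a pointer rather than an array.

```c
#include <stddef.h>
#include <string.h>

/* Append c to the string in buf, whose total capacity is size bytes.
   Returns 0 on success, or -1 if there is no room left. */
int append_char(char *buf, size_t size, char c)
{
    size_t len = strlen(buf);

    if (len + 1 >= size)
        return -1;            /* need room for c and the terminator */
    buf[len] = c;             /* overwrite the old terminator */
    buf[len + 1] = '\0';      /* and put a new one after it */
    return 0;
}
```

For the case in the question that would be a call like append_char(dest, dest_size, lines[i][j]) inside the loops over i and j, where dest_size is however many bytes dest really has.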
Assuming buf is a 0 terminated c-string, you can use snprintf():
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
size_t b_len;
char buf[256];
char ch = '!';
strcpy(buf, "Hello world");
b_len = strlen(buf);
snprintf(buf + b_len, sizeof buf - b_len, "%c", ch);
printf("%s", buf);
char *p = malloc(256); /* check if malloc failed */
strcpy(p, "Hello world");
size_t len = strlen(p);
snprintf(p + len, 256 - len, "%c", ch);
printf("%s", p);
return 0;
}
If buf happens to be a pointer (such as a malloc'ed pointer), then you can't use sizeof to get the buffer size (that only works on an array; on a pointer it yields the size of the pointer itself).
I have a very hard time understanding your question. Instead of posting an example of what you want to achieve, you explained something that seems to be based on your way of solving a problem with very little training and/or experience in the C language. So it's hard to write a good answer, but I will try to show you a way of doing something similar to what you apparently want to do.
The thing is, I highly doubt that strcat() is appropriate for this, or for anything except concatenating two strings. More than two require something better than strcat(). And for a single char it's definitely less appropriate.
The following code appends all the characters in the array of string literals into a single string:
#include <stdio.h>
int
main(int argc, char **argv)
{
char string[100];
char lines[3][14] = {
"First line...",
"Another line.",
"One more line"
};
size_t i;
i = 0;
for (size_t j = 0 ; ((j < sizeof(lines) / sizeof(lines[0])) && (i < sizeof(string) - 1)) ; ++j)
{
for (size_t k = 0 ; ((k < 13) && (i < sizeof(string) - 1)) ; ++k)
string[i++] = lines[j][k];
}
string[i] = '\0';
puts(string);
return 0;
}

Printing a string due to a new line

Is there any efficient (in terms of performance) way of printing some arbitrary string, but only until the first newline character in it (excluding the newline character)?
Example:
char *string = "Hello\nWorld\n";
printf(foo(string + 6));
Output:
World
If you are concerned about performance this might help (untested code):
void MyPrint(const char *str)
{
int len = strlen(str) + 1;
char *temp = alloca(len);
int i;
for (i = 0; i < len; i++)
{
char ch = str[i];
if (ch == '\n')
break;
temp[i] = ch;
}
temp[i] = 0;
puts(temp);
}
strlen is fast, alloca is fast, copying the string up to the first \n is fast, puts is faster than printf but it is most likely far slower than all three operations mentioned before together.
size_t writetodelim(char const *in, int delim)
{
char *end = strchr(in, delim);
if (!end)
return 0;
return fwrite(in, 1, end - in, stdout);
}
This can be generalized somewhat (pass the FILE* to the function), but it's already flexible enough to terminate the output on any chosen delimiter, including '\n'.
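The FILE* generalization mentioned above might look like this. It is a sketch, with fwritetodelim as a made-up name; it also makes one deliberate change, namely that a string without the delimiter is written in full instead of not at all.

```c
#include <stdio.h>
#include <string.h>

/* Write the part of in that precedes the first delim to out. If delim
   does not occur, the whole string is written (an assumption made for
   this sketch). Returns the number of bytes written. */
size_t fwritetodelim(FILE *out, const char *in, int delim)
{
    const char *end = strchr(in, delim);
    size_t len = end ? (size_t)(end - in) : strlen(in);

    return fwrite(in, 1, len, out);
}
```

For the example in the question the call would be fwritetodelim(stdout, string + 6, '\n').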
Warning: Do not use printf without format specifier to print a variable string (or from a variable pointer). Use puts instead or "%s", string.
C strings are terminated by '\0' (NUL), not by newline. So, the functions print until the NUL terminator.
You can, however, use your own loop with putchar. Whether that carries any performance penalty would have to be tested. Normally printf does much the same in the library and might be even slower, as it has to handle additional constraints, so your own loop might very well be faster.
for ( char *sp = string + 6 ; *sp != '\0'; sp++ ) {
if ( *sp == '\n' ) break; // newline will not be printed
putchar(*sp);
}
(Move the if-line to the end of the loop if you want newline to be printed.)
An alternative would be to limit the length of the string to print, but that would require finding the next newline before calling printf.
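That length-limited variant could be sketched with strcspn, which finds the distance to the next newline (or to the end of the string if there is none), and printf's precision. print_first_line is an illustrative name, and the line is assumed to be shorter than INT_MAX characters.

```c
#include <stdio.h>
#include <string.h>

/* Print s up to, but not including, its first newline.
   Returns whatever printf returns. */
int print_first_line(const char *s)
{
    size_t len = strcspn(s, "\n");    /* characters before the first '\n' */

    return printf("%.*s", (int)len, s);
}
```

No copy of the string is made and the original is left untouched.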
I don't know if it is fast enough, but there is a way to build a string containing the source string up to a new line character only involving one standard function.
char *string = "Hello\nWorld\nI love C"; // Example of your string
static char newstr [256]; // String large enough to contain the result; static storage is zero-initialized
sscanf(string + 6, "%255[^\n]", newstr); // copies up to, but not including, the next newline ("%s" would stop at any whitespace)
printf("%s", newstr); // printing the string
I guess there is no more efficient way than simply looping over your string until you find the first \n in it. As Olaf mentioned, a string in C ends with a terminating \0, so if you want to use printf to print the string you need to make sure it contains the terminating \0, or you could use putchar to print the string character by character.
If you want to provide a function creating a string up to the first found new line you could do something like that:
#include <stdio.h>
#include <string.h>
#define MAX 256
void foo(const char* string, char *ret)
{
int len = (strlen(string) < MAX) ? (int) strlen(string) : MAX - 1; // leave room for the terminator
int i = 0;
for (i = 0; i < len; i++)
{
if (string[i] == '\n') break;
ret[i] = string[i];
}
ret[i] = '\0'; // terminate right after the last copied character
}
int main()
{
const char* string = "Hello\nWorld\n";
char ret[MAX];
foo(string, ret);
printf("%s\n", ret);
foo(string+6, ret);
printf("%s\n", ret);
}
This will print
Hello
World
Another fast way (if the new line character is truly unwanted)
Simply:
*strchr(string, '\n') = '\0';
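Note that the one-liner above dereferences NULL if the string contains no newline, and it must not be applied to a string literal. A slightly safer sketch of the same in-place idea uses strcspn, which returns the length of the whole string when no newline is found; chomp_at_newline is a hypothetical helper name.

```c
#include <string.h>

/* Cut s at its first newline, in place. s must be writable (an array
   or allocated memory, not a string literal). Returns s. */
char *chomp_at_newline(char *s)
{
    s[strcspn(s, "\n")] = '\0';   /* no newline: rewrites the existing terminator */
    return s;
}
```

After that, puts(s) or printf("%s", s) prints only the first line.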

String manipulations

Is there a proper way to copy just the part of a string after a certain point?
Party City 1422 Evergreen Street
I use strpbrk() to copy the name out. I could always just tokenize it by white space, but is there a string function or technique with which I can copy out a specific section of a string other than the beginning, such as just [1422 Evergreen Street], or delete the first portion of the string?
If you want to specify it by starting position and length, you can always use strncpy and a bit of pointer arithmetic.
EDIT: When you know the starting string you can use
char *pos = strstr(src, "1422");
strcpy(dst, pos);
If you know the first and last characters' indexes of the substring you want to pick, you should do this with strncpy. See the following snippet to copy substringLength characters from the inputStr string at the given startIndex.
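char * inputStr; /* points to the source string */
char * outputStr; /* points to a buffer of at least substringLength + 1 bytes */
strncpy(outputStr, inputStr + startIndex, substringLength);
outputStr[substringLength] = '\0'; /* strncpy does not add the terminator when it copies a full substringLength characters */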
If you want to split at the location of a particular string, you can do something like this:
#include <stdio.h>
#include <string.h>
#define MAX_STRING 1024
int main() {
char myleftBuffer[MAX_STRING]="";
char myrightBuffer[MAX_STRING]="";
char mystring[]="Party City 1422 Evergreen Street";
char *start = strstr(mystring, "1422");
if(start) {
strcpy(myrightBuffer, start);
strncpy(myleftBuffer, mystring, (start - mystring));
}
printf("%s -> %s\n", myleftBuffer, myrightBuffer);
return 0;
}
Which outputs:
Party City -> 1422 Evergreen Street
Actually, strncpy is not a particularly good choice for the task at hand. It always pads your value out to occupy the entire destination, which is generally pretty wasteful (it was originally designed for putting file names into the Unix file system; it's good for that, but not really much else).
I think I'd use sscanf. Assuming we always want to copy from the first digit to the end of the string, you could do something like this:
char street_name[256];
sscanf(input_buffer, "%*[^0-9]%255[^\n]", street_name);
FWIW, the %*[^0-9] part skips over characters until it reaches something in the range 0-9 (yes, I know it looks like a regex, but scanf and company support it too). The * in it means to scan but not assign what it finds. The %255[^\n] means to read and assign until the next newline in the input, or up to 255 characters, whichever comes first.
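One caveat with that format: a scanset has to match at least one character, so %*[^0-9] fails outright when the input already starts with a digit. A small wrapper can cover that case. This is only a sketch; extract_from_first_digit is a name invented here, and out is assumed to hold at least 256 bytes.

```c
#include <stdio.h>

/* Copy everything from the first digit of input up to the end of the
   line into out (at least 256 bytes). Returns 1 on success, or 0 if
   the input contains no digit. */
int extract_from_first_digit(const char *input, char *out)
{
    out[0] = '\0';
    /* %*[^0-9] needs at least one non-digit to skip, so handle input
       that begins with a digit separately. */
    if (input[0] >= '0' && input[0] <= '9')
        return sscanf(input, "%255[^\n]", out) == 1;
    return sscanf(input, "%*[^0-9]%255[^\n]", out) == 1;
}
```

With the address from the question, out ends up holding "1422 Evergreen Street".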
int split_at(const char *in, const char *match, char *buf, size_t len)
{
char *pos;
if( (pos = strstr(in, match)) == NULL )
return -1; // No match
else if( pos == in )
return 0; // match is empty or starts at the very beginning of in
if( strlcpy(buf, pos, len) >= len )
fprintf(stderr, "WARNING: match truncated: %s", buf);
return 1;
}
Probably impossible in the general case, and you would do better to get the input in separate fields, but if that's not an option, something like the following should work:
size_t street_extract(char* ret,size_t retsz,char* addr)
{
size_t i,nwrote;
for(i=0; addr[i] ;i++)
{
if(addr[i]!=' ') continue; /* only check at start of word */
i++;
if('0' <= addr[i] && addr[i] <= '9') break; /* found street number */
}
if(!addr[i]) return -1; /* not found */
for(nwrote=0; addr[i+nwrote] && nwrote < retsz-1 ;nwrote++)
{
ret[nwrote] = addr[i+nwrote];
}
ret[nwrote] = 0;
while(addr[i+nwrote]) nwrote++;
return nwrote; /* result is nwrote characters in length */
}
modify and error-check as needed.
