Custom STRCAT is overwhelmed by too many arguments - c

I am trying to code a custom strcat that separates arguments with \n except for the last one and terminates the string with \0.
It's working fine as is up to 5 arguments, but if I try passing a sixth one I get a strange line in response :
MacBook-Pro-de-Domingo% ./test ok ok ok ok ok
ok
ok
ok
ok
ok
MacBook-Pro-de-Domingo% ./test ok ok ok ok ok ok
ok
ok
ok
ok
ok
P/Users/domingodelmasok
Here is my custom strcat code:
char cat(char *dest, char *src, int current, int argc_nb)
{
int i = 0;
int j = 0;
while(dest[i])
i++;
while(src[j])
{
dest[i + j] = src[j];
j++;
}
if(current < argc_nb - 1)
dest[i + j] = '\n';
else
dest[i + j] = '\0';
return(*dest);
}
UPDATE Complete calling function:
char *concator(int argc, char **argv)
{
int i;
int j;
int size = 0;
char *str;
i = 1;
while(i < argc)
{
j = 0;
while(argv[i][j])
{
size++;
j++;
}
i++;
}
str = (char*)malloc(sizeof(*str) * (size + 1));
i = 1;
while(i < argc)
{
cat(str, argv[i], i, argc);
i++;
}
free(str);
return(str);
}
What's wrong here?
Thanks!
Edit: Fixed blunder.

There are quite a few issues with the code:
sizeof (char) == 1 by the C standard.
cat() requires the destination to be a string (terminated by a \0), but does not append it itself (except for current >= argc_nb - 1). This is a bug.
free(str); return str; is a use-after-free bug. Once you call free(str), the contents at str are irrevocably lost and must not be accessed. The free(str) should simply be removed; it is not appropriate here, because the caller is the one who should free the returned string when done with it.
Arrays in C are indexed at 0. However, the concator() function skips the first string pointer (because argv[0] contains the name used to execute the program). This is wrong, and will eventually trip someone. Instead, have concator() add all strings in the array, but call it using concator(argc - 1, argv + 1);.
There might be even more, but at this point, I believe a rewrite from scratch, using a much more appropriate approach, is in order.
Consider the following join() function:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
char *join(const size_t parts, const char *part[],
const char *separator, const char *suffix)
{
const size_t separator_len = (separator) ? strlen(separator) : 0;
const size_t suffix_len = (suffix) ? strlen(suffix) : 0;
size_t total_len = 0;
size_t p;
char *dst, *end;
/* Calculate sum of part lengths */
for (p = 0; p < parts; p++)
if (part[p])
total_len += strlen(part[p]);
/* Add separator lengths */
if (parts > 1)
total_len += (parts - 1) * separator_len;
/* Add suffix length */
total_len += suffix_len;
/* Allocate enough memory, plus end-of-string '\0' */
dst = malloc(total_len + 1);
if (!dst)
return NULL;
/* Keep a pointer to the current end of the result string */
end = dst;
/* Append each part */
for (p = 0; p < parts; p++) {
/* Insert separator */
if (p > 0 && separator_len > 0) {
memcpy(end, separator, separator_len);
end += separator_len;
}
/* Insert part */
if (part[p]) {
const size_t len = strlen(part[p]);
if (len > 0) {
memcpy(end, part[p], len);
end += len;
}
}
}
/* Append suffix */
if (suffix_len > 0) {
memcpy(end, suffix, suffix_len);
end += suffix_len;
}
/* Terminate string. */
*end = '\0';
/* All done. */
return dst;
}
The logic is simple. First, we find out the length of each component. Note that separator is only added between parts (so occurs parts-1 times), and suffix at the very end.
(The (string) ? strlen(string) : 0 idiom just means "if string is non-NULL, strlen(string), otherwise 0". We do that because we allow NULL separator and suffix, but strlen(NULL) is Undefined Behaviour.)
Next, we allocate enough memory for the result, including the end-of-string NUL char, \0, that was not included in the lengths.
To append each part, we keep the result pointer intact, and instead use a temporary end pointer. (It is the end of the string thus far.) We use a loop, where we copy the next part to the end. Before the second and subsequent parts, we copy the separator before the part.
Next, we copy the suffix, and finally the end-of-string '\0'. (It is important to return a pointer to the beginning of the string, rather than end, of course; and that is why we kept dst to point to the new resulting string, and end at the point we appended each substring.)
You could use it from the command line using for example the following main():
int main(int argc, char *argv[])
{
char *result;
if (argc < 4) {
fprintf(stderr, "\n");
fprintf(stderr, "Usage: %s SEPARATOR SUFFIX PART [ PART ... ]\n", argv[0]);
fprintf(stderr, "\n");
return EXIT_FAILURE;
}
result = join(argc - 3, (const char **)(argv + 3), argv[1], argv[2]);
if (!result) {
fprintf(stderr, "Failed.\n");
return EXIT_FAILURE;
}
fputs(result, stdout);
return EXIT_SUCCESS;
}
If you compile the above to e.g. example (I use gcc -Wall -O2 example.c -o example), then running
./example ', ' $'!\n' Hello world
in a Bash shell outputs
Hello, world!
(with a newline at end). Running
./example ' and ' $'.\n' a b c d e f g
outputs
a and b and c and d and e and f and g
(again with a newline at end). The $'...' is just a Bash idiom to specify special characters in strings; $'!\n' is the same in Bash as "!\n" is in C, and $'.\n' is the Bash equivalent of ".\n" in C.
(Removing the automatic newline between parts, and allowing a string rather than just one char to be used as a separator and suffix, was a deliberate choice for two reasons. The main one is to stop anyone from just copy-pasting this as an answer to some exercise. The secondary one is to show that while it might sound more complicated than just using single characters for them, it is actually very little additional code; and if you consider the practical use cases, allowing a string to be used as the separator opens up a lot of options.)
The example code above is only very lightly tested, and might contain bugs. If you find any, or disagree with anything I've written above, do let me know in a comment so I can review, and fix as necessary.

Related

Writing a C program that removes every occurrence of a char except the last one

I'm trying to write a C program that removes all occurrences of repeating chars in a string except the last occurrence. For example, if I had the string
char word[]="Hihxiivaeiavigru";
output should be:
printf("%s",word);
hxeavigru
What I have so far:
#include <stdio.h>
#include <string.h>
int main()
{
char word[]="Hihxiiveiaigru";
for (int i=0;i<strlen(word);i++){
if (word[i+1]==word[i]);
memmove(&word[i], &word[i + 1], strlen(word) - i);
}
printf("%s",word);
return 0;
}
I am not sure what I am doing wrong.
With short strings, any algorithm will do. OP's attempt is O(n*n), as are the other working answers, including the one from @David C. Rankin that identified OP's shortcomings.
But what if the string was thousands or millions of characters long?
Consider the following algorithm (@paulsm4):
Form a `bool` array used[CHAR_MAX - CHAR_MIN + 1] and set each false.
i,unique = n - 1;
From the end of the string (n-1 to 0) to the front:
if (character never seen yet) { // used[] look-up
array[unique] = array[i];
unique--;
}
Mark used[array[i]] as true (index from CHAR_MIN)
i--;
Shift the string "to the left" (unique - i) places
Solution is O(n)
Coding goal is too fun to just post a fully coded answer.
I would first write a function to determine if a char ch at a given position p is the last occurrence of ch in a given char *. Like,
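bool isLast(char *word, char ch, int p) {
int c = tolower((unsigned char)ch); /* compare case-insensitively */
p++; /* start looking just after position p */
while (word[p] != '\0') {
if (tolower((unsigned char)word[p]) == c) {
return false; /* ch occurs again later */
}
p++;
}
return true;
}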
Then you can use that to iteratively emit your desired characters like
int main() {
char *word = "Hihxiivaeiavigru";
for (int i = 0; word[i] != '\0'; i++) {
if (isLast(word, word[i], i)) {
putchar(word[i]);
}
}
putchar('\n');
}
And (for completeness) I used
#include <stdio.h>
#include <ctype.h>
#include <stdbool.h>
Outputs (as requested)
hxeavigru
Additional areas where you are currently hurting yourself.
Your for loop must NOT increment the index, e.g. for (int i=0; word[i];). This is because when you memmove() by 1, you have just incremented the indexes. That also means the value to save for last is now i - 1.
There should only be one call to strlen() in the program. You can simply subtract one from the length each time memmove() is called.
Only increment your loop counter variable when memmove() is not called.
Additionally, avoid hardcoding strings. You shouldn't have to recompile your code just to test the results of "Hihxiivaeiaigrui" instead of "Hihxiivaeiaigru". You shouldn't have to recompile just to remove all but the last 'a' instead of the 'i'. Either pass the string and character to find as arguments to your program (that's what int argc, char **argv are for), or prompt the user for input.
Putting it all together you could do (presuming word is 1023 characters or less):
#include <stdio.h>
#include <string.h>
#define MAXC 1024
int main (int argc, char **argv) {
char word[MAXC]; /* storage for word */
strcpy (word, argc > 1 ? argv[1] : "Hihxiivaeiaigru"); /* copy to word */
int find = argc > 2 ? *argv[2] : 'i', /* character to find */
last = -1; /* last index where find found */
size_t len = strlen (word); /* only compute strlen once */
printf ("%s (removing all but last %c)\n", word, find);
for (int i=0; word[i];) { /* loop over each char -- do NOT increment */
if (word[i] == find) { /* is this my character to find? */
if (last != -1) { /* if last is set */
/* overwrite last with rest of word */
memmove (&word[last], &word[last + 1], (int)len - last);
last = i - 1; /* last now i - 1 (we just moved it) */
len = len - 1;
}
else { /* last not set */
last = i; /* set it */
i++; /* increment loop counter */
}
}
else /* all other chars */
i++; /* just increment loop counter */
}
puts (word); /* output result -- no need for printf (no conversions) */
}
Example Use/Output
$ ./bin/rm_all_but_last_occurrence
Hihxiivaeiaigru (removing all but last i)
Hhxvaeaigru
What if you want to use "Hihxiivaeiaigrui"? Just pass it as the 1st argument:
$ ./bin/rm_all_but_last_occurrence Hihxiivaeiaigrui
Hihxiivaeiaigrui (removing all but last i)
Hhxvaeagrui
What if you want to use "Hihxiivaeiaigrui" and remove duplicate 'a' characters? Just pass the string to search as the 1st argument and the character to find as the second:
$ ./bin/rm_all_but_last_occurrence Hihxiivaeiaigrui a
Hihxiivaeiaigrui (removing all but last a)
Hihxiiveiaigrui
Nothing removed if only one of the characters:
$ ./bin/rm_all_but_last_occurrence Hihxiivaeiaigrui H
Hihxiivaeiaigrui (removing all but last H)
Hihxiivaeiaigrui
Let me know if you have further questions.
I'm trying to write a C program that removes all occurrences of repeating chars in a string except the last occurrence.
Process the string (or word) from the last character towards the first. Seen this way, the problem becomes removing every occurrence of a character except the first one encountered. Because the string is processed from the end, the characters that remain after removing duplicates are collected at the end of the buffer, so if any duplicates were found they must be moved back to the start of the string once the whole string has been processed. The complexity of this algorithm is O(n).
Implementation:
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define INDX(x) (tolower((unsigned char)(x)) - 'a')
void remove_dups_except_last (char str[]) {
int map[26] = {0}; /* to keep track of a character processed */
size_t len = strlen (str);
char *p = str + len; /* pointer pointing to null character of input string */
size_t i = 0;
for (i = len; i != 0; --i) {
if (map[INDX(str[i - 1])] == 0) {
map[INDX(str[i - 1])] = 1;
*--p = str[i - 1];
}
}
/* if there were duplicates characters then only copy
*/
if (p != str) {
for (i = 0; *p; ++i) {
str[i] = *p++;
}
str[i] = '\0';
}
}
int main(int argc, char* argv[])
{
if (argc != 2) {
printf ("Invalid number of arguments\n");
return -1;
}
char str[1024] = {0};
/* Assumption: the input string/word will contain characters A-Z and a-z
* only and size of input will not be more than 1023.
*
* Leaving it up to you to check the valid characters in input string/word
*/
strcpy (str, argv[1]);
printf ("Original string : %s\n", str);
remove_dups_except_last (str);
printf ("Removed duplicated characters except the last one, modified string : %s\n", str);
return 0;
}
Testcases output:
# ./a.out Hihxiivaeiavigru
Original string : Hihxiivaeiavigru
Removed duplicated characters except the last one, modified string : hxeavigru
# ./a.out aa
Original string : aa
Removed duplicated characters except the last one, modified string : a
# ./a.out a
Original string : a
Removed duplicated characters except the last one, modified string : a
# ./a.out TtYyuU
Original string : TtYyuU
Removed duplicated characters except the last one, modified string : tyU
You can iterate over each character of your string and copy it to a new string unless it is an 'i' that is not the last occurrence of 'i'.
#include <stdio.h>
#include <string.h>
int main() {
char word[]="Hihxiiveiaigru";
char newword[10000];
char* ptr = strrchr(word, 'i');
int index=0;
int index2=0;
while (index < strlen(word)) {
if (word[index]!='i' || index ==(ptr - word)) {
newword[index2]=word[index];
index2++;
}
index++;
}
newword[index2] = '\0'; /* terminate the copy before printing it */
printf("%s",newword);
return 0;
}

How to copy a sentence from a longer string into a new array while including period?

I want to save part of a string into a new char array while including the period. For example, the string is:
My name is John. I have 1 dog.
I want to copy each char up to and including the first period, so the new char array will contain:
My name is John.
The code I have written below copies only "My name is John" but omits the period.
ptrBeg and ptrEnd point to the char at the beginning and end, respectively, of the portion I want to copy. My intention was to copy ptrBeg into array newBuf through a pointer to newBuf and then increment both ptrBeg and the pointer to the array until ptrBeg and ptrEnd point to the same char, which should always be a period.
At this point, the text of the string should be copied, so I increment the pointer to char array once more and copy the period to the new space using
++ptrnewBuf;
*ptrnewBuf = *ptrEnd;
Finally, I print the contents of newBuf.
Here's the total code:
int main()
{
char buf[] = "My name is John. I have 1 dog.";
char * ptrBuf;
char * ptrBeg;
char * ptrEnd;
ptrBeg = buf;
ptrBuf = ptrBeg;
while (*ptrBuf != '.'){
ptrBuf++;
}
ptrEnd = ptrBuf;
char newBuf[100];
char * ptrnewBuf = newBuf;
while(*ptrBeg != *ptrEnd){
*ptrnewBuf = *ptrBeg;
ptrnewBuf++;
ptrBeg++;
}
++ptrnewBuf;
*ptrnewBuf = *ptrEnd;
printf("%s", newBuf);
}
How would I modify this code to include a period?
You are on the right track, but you may be making things a bit more complicated than needed and overlooking a few critical checks. The key to iterating by pointers or using pointer arithmetic is to always validate and protect your array or memory bounds during each iteration or arithmetic operation.
Another tip is to always map out your pointer positions on a piece of paper before coding everything up, so you have a clear picture of what your iteration limits are and what adjustments are needed. (You don't have to use full long strings and many boxes, just a representation of what needs to be done with a handful of characters.) In your case, where you wish to copy the substring up through the first '.', something simple like the following will do, e.g.
+---+---+---+---+---+---+
| A | . |   | B | . |\0 |
+---+---+---+---+---+---+
  ^   ^
  |   pointer (when *p == '.')
 buf
So to copy "A." from buf to a new buffer you can't simply iterate while (*p != '.') or you will not copy '.'. By drawing it out, you can clearly see you need to also copy the character when *p == '.', e.g.
+---+---+---+---+---+---+
| A | . |   | B | . |\0 |
+---+---+---+---+---+---+
  ^   ^
  |   |-->| pointer (p + 1)
 buf
Now regardless of the actual length of the string before '.', you now know you need p + 1 as the final address to include the last character in the copy.
You also know how many characters your new buffer can store. Say the size of new is MAXC characters (maximum number of characters). So you can store a string of at most MAXC-1 characters (plus the nul-character). When you are filling new you need to always validate you are within MAXC-1 characters.
You also need to ensure your new string is nul-terminated (or it isn't a string, it's simply an array of characters). One effective way to ensure nul-termination is by initializing all characters in new to 0 when it is declared, e.g.
char new[MAXC] = "";
which initializes the 1st character to 0 (e.g. '\0' empty-string) and all remaining characters 0 by default. Now if you fill no more than MAXC-1 characters, you are guaranteed the array will be a nul-terminated string.
Putting it all together, you could do something like the following:
#include <stdio.h>
#define MAXC 128 /* if you need a constant, #define one (or more) */
int main (void) {
char buf[] = "My name is John. I have 1 dog.",
*p = buf, /* pointer to buf */
new[MAXC] = "", /* buffer for substring */
*n = new; /* pointer to new */
size_t ndx = 0; /* index for new */
/* loop copying each char until new full, '.' copied, or end of buf */
for (; ndx + 1 < MAXC && *p; p++, n++, ndx++) {
*n = *p; /* copy char from buf to new */
if (*n == '.') /* if char was '.' break */
break;
}
printf ("buf: %s\nnew: %s\n", buf, new);
return 0;
}
(note: ndx is incremented as part of the for loop to track the number of characters copied with the pointers)
Example Use/Output
$ ./bin/str_cpy_substr
buf: My name is John. I have 1 dog.
new: My name is John.
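If you do not have the luxury of initializing the string to ensure nul-termination, you can always affirmatively nul-terminate after your copy is done. For example, when the loop exits through the break (so n still points at the '.' just copied), you could add the following after the for loop to ensure an array of unknown initialization is properly terminated. (If the loop ended because new was full or buf ran out, n has already been advanced past the last copied character, so use *n = 0; instead.)
*++n = 0; /* nul-terminate (if not already done by initialization) and
* note ++n applied before * due to C operator precedence.
*/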
Look things over and let me know if you have further questions.
Just breaking it out into a helper function that "extracts" the first sentence from a line. Just copies the characters over one at a time until either an end of string condition is hit on the source, the period is found, or a max length of the destination buffer is encountered.
void ExtractFirstSentence(const char* line, char* dst, int size)
{
int count = 0;
char c ='\0';
if ((line == NULL) || (dst == NULL) || (size <= 0))
{
return;
}
while ((*line) && ((count+1) < size) && (c != '.'))
{
c = *line++;
*dst++ = c;
count++;
}
*dst = '\0';
}
int main()
{
char buf[] = "My name is John. I have 1 dog.";
char newBuf[100];
ExtractFirstSentence(buf, newBuf, 100);
printf("%s", newBuf);
}
If you want something a bit easier without dealing with all those pointers, try:
int main()
{
char buf[] = "My name is John. I have 1 dog.";
int i = 0;
int j = 0;
while(buf[i] != '.' && buf[i] != '\0') {
i++;
}
char newbuf[i + 2]; /* i + 1 copied chars plus the terminating '\0' */
while (j <= i) {
newbuf[j] = buf[j];
j++;
}
newbuf[j] = '\0';
printf("%s\n",newbuf);
return 0;
}
The i + 2 when declaring newbuf is needed because the loop copies i + 1 characters (indices 0 through i, so the period is included), and newbuf[j] = '\0' then writes the terminator at index i + 1. With only i + 1 elements that last write would land one past the end of the array.
You can use strtok() to split a string. Just type man strtok and you will see:
Program source
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int
main(int argc, char *argv[])
{
char *str1, *str2, *token, *subtoken;
char *saveptr1, *saveptr2;
int j;
if (argc != 4) {
fprintf(stderr, "Usage: %s string delim subdelim\n",
argv[0]);
exit(EXIT_FAILURE);
}
for (j = 1, str1 = argv[1]; ; j++, str1 = NULL) {
token = strtok_r(str1, argv[2], &saveptr1);
if (token == NULL)
break;
printf("%d: %s\n", j, token);
for (str2 = token; ; str2 = NULL) {
subtoken = strtok_r(str2, argv[3], &saveptr2);
if (subtoken == NULL)
break;
printf(" --> %s\n", subtoken);
}
}
exit(EXIT_SUCCESS);
}
An example of the output produced by this program is the following:
$ ./a.out 'a/bbb///cc;xxx:yyy:' ':;' '/'
1: a/bbb///cc
--> a
--> bbb
--> cc
2: xxx
--> xxx
3: yyy
--> yyy

How do I split a string by character position in c

I'm using C to read in an external text file. The input is not great and would look like;
0PAUL 22 ACACIA AVENUE 02/07/1986RN666
As you can see I have no obvious delimiter, and sometimes the values have no space between them. However I do know how long in character length each value should be when split. Which is as follows,
id = 1
name = 20
house number = 5
street name = 40
date of birth = 10
reference = 5
I've set up a structure I want to hold this information in, and have tried using fscanf to read in the file.
However I find something along the lines of just isn't doing what I need,
fscanf(file_in, "%1d, %20s", person.id[i], person.name[i]);
(The actual line I use attempts to grab all input but you should see where I'm going...)
The long term intention is to reformat the input file into another output file which would be made a little easier on the eye.
I appreciate I'm probably going about this all the wrong way, but I would hugely appreciate it if somebody could set me on the right path. If you're able to take it easy on me in regard to an obvious lack of understanding, I'd appreciate that also.
Thanks for reading
Use fgets to read one line at a time, then extract each field from the input line. Warning: no range checks are performed on the buffers, so take care to size them appropriately.
For example, something like this (I haven't compiled it, so it may contain errors):
void copy_substr(const char * pBuffer, int content_size, int start_idx, int end_idx, char * pOutBuffer)
{
end_idx = end_idx > content_size ? content_size : end_idx;
int j = 0;
for (int i = start_idx; i < end_idx; i++)
pOutBuffer[j++] = pBuffer[i];
pOutBuffer[j] = 0;
return;
}
void test_solution()
{
char buffer_char[200];
fgets(buffer_char,sizeof(buffer_char),stdin); // use your own FILE handle instead of stdin
int len = strlen(buffer_char);
char temp_buffer[100];
// Reading first field: str[0..1), so only the char 0 (len=1)
int field_size = 1;
int field_start_ofs = 0;
copy_substr(buffer_char, len, field_start_ofs, field_start_ofs + field_size, temp_buffer);
}
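sscanf is a good way to do it: read the whole line into a buffer with fgets, then call sscanf several times, each at the right offset.
For example:
char buffer[100];
fgets(buffer, sizeof buffer, file_in); /* "%s" would stop at the first space */
sscanf(buffer, "%1d", &person.id[i]); /* the & assumes id[i] is an int */
sscanf(buffer+1, "%20s", person.name[i]);
sscanf(buffer+1+20, "%5d", &person.street_number[i]); /* likewise assumed to be an int */
and so on. Note that %s stops at whitespace, so a multi-word field such as the street name needs %40c (plus manual termination) instead.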
I feel like it is the easiest way to do it.
Please also consider using an array of your struct instead of a struct of arrays, it just feels wrong to have person.id[i] and not person[i].id
If you have fixed column widths, you can use pointer arithmetic to access substrings of your string str. if you have a starting index begin,
printf("%s", str + begin) ;
will print the substring beginning at begin and up to the end. If you want to print a string of a certain length, you can use printf's precision specifier .*, which takes a maximum length as additional argument:
printf("%.*s", length, str + begin) ;
If you want to copy the string to a temporary buffer, you could use strncpy, but note that it does not append a terminating null when the source still has at least length characters left, so you must terminate the buffer yourself. You could also use snprintf according to the above pattern, which always terminates:
char buf[length + 1];
snprintf(buf, sizeof(buf), "%.*s", length, str + begin) ;
This will extract leading and trailing spaces, which is probably not what you want. You could write a function to strip the unwanted whitespace; there should be plenty of examples here on SO.
You could also strip the whitespace when copying the substring. The example code below does this with the isspace function/macro from <ctype.h>:
#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>
int extract(char *buf, const char *str, int len)
{
const char *end = str + len;
int tail = -1;
int i = 0;
// skip leading white space;
while (str < end && *str && isspace((unsigned char)*str)) str++;
// copy string
while (str < end && *str) {
if (!isspace((unsigned char)*str)) tail = i + 1;
buf[i++] = *str++;
}
if (tail < 0) tail= i;
buf[tail] = '\0';
return tail;
}
int main()
{
char str[][80] = {
"0PAUL 22 ACACIA AVENUE 02/07/1986RN666",
"1BOB 1 POLK ST 01/04/1988RN802",
"2ALICE 99 WEST HIGHLAND CAUSEWAY 28/06/1982RN774"
};
int i;
for (i = 0; i < 3; i++) {
char *p = str[i];
char id[2];
char name[20];
char number[6];
char street[35];
char bday[11];
char ref[11];
extract(id, p + 0, 1);
extract(name, p + 1, 19);
extract(number, p + 20, 5);
extract(street, p + 25, 34);
extract(bday, p + 59, 10);
extract(ref, p + 69, 10);
printf("<person id='%s'>\n", id);
printf(" <name>%s</name>\n", name);
printf(" <house>%s</house>\n", number);
printf(" <street>%s</street>\n", street);
printf(" <birthday>%s</birthday>\n", bday);
printf(" <reference>%s</reference>\n", ref);
printf("</person>\n\n");
}
return 0;
}
There's a danger here, however: When you access a string at a certain position str + pos you should make sure that you don't go beyond the actual string length. For example, your string may be terminated after the name. When you access the birthday, you access valid memory, but it might contain garbage.
You can avoid this problem by padding the full string with spaces.

Ignoring spaces in a string unless it's in quotes

char *args[32];
char **next = args;
char *temp = NULL;
char *quotes = NULL;
temp = strtok(line, " \n&");
while (temp != NULL) {
if (strncmp(temp, "\"", 1) == 0) {
//int i = strlen(temp);
printf("first if");
quotes = strtok(temp, "\"");
} else if (strncmp(temp, "\"", 1) != 0) {
*next++ = temp;
temp = strtok(NULL, " \n&");
}
}
I'm having trouble with trying to understand with how to still keep spaces if a part of the string is surrounded with quotes. For example, if I want execvp() to execute this: diff "space name.txt" sample.txt, it should save diff at args[0], space name.txt at args[1] and sample.txt at args[2].
I'm not really sure on how to implement this, I've tried a few different ways of logic with if statements, but I'm not quite there. At the moment I am trying to do something simple like: ls "folder", however, it gets stuck in the while loop of printing out my printf() statement.
I know this isn't worded as a question - it's more explaining what I'm trying to achieve and where I'm up to so far, but I'm having trouble and would really appreciate some hints of how the logic should be.
Instead of using strtok process the string char by char. If you see a ", set a flag. If flag is already set - unset it instead. If you see a space - check the flag and either switch to next arg, or add space to current. Any other char - add to current. Zero byte - done processing.
With some extra effort you'll be able to handle even stuff like diff "file \"one\"" file\ two (you should get diff, file "one" and file two as results)
I'm confused even to understand what you try to do. Are you trying to tokenize the input string into space separated tokens?
Just separate the input string on spaces and when you encounter a double quote char you need a second inner loop which handles quoted strings.
There is more to quoted strings than to search for the closing quote. You need to handle backslashes, for example backslashed escaped quotes and also backslash escaped backslashes.
Just consider the following:
diff "space name \" with quotes.txt\\" foo
Which refers to a (trashy) filename space name " with quotes.txt\. Use this as a test case, then you know when you are done with the basics. Note that shell command line splitting is a lot more crazy than that.
Here is my idea:
1. Make two pointers A and B, initially pointing at first char of the string.
2. Iterate through the string with pointer A, copying every char into an array as long as it's not a space.
3. Once you have reached a ", take the pointer B starting from the position A+1 and go forward until you reach the next ", copying everything including space.
4. Now repeat from step 2, starting from the char B+1.
5. Repeat as long as you haven't reached \0.
Note: You'll have to consider what to do if there are nested quotes though.
You can also use a flag (int 1 || 0) and a pointer to denote if you're in a quote or not, following 2 separate rules based on the flag.
Write three functions. All of these should return the number of bytes they process. Firstly, the one that handles quoted arguments.
size_t handle_quoted_argument(char *str, char **destination) {
assert(*str == '\"');
/* discard the opening quote */
*destination = str + 1;
/* find the closing quote (or a '\0' indicating the end of the string) */
size_t length = strcspn(str + 1, "\"") + 1;
assert(str[length] == '\"'); /* NOTE: You really should handle mismatching quotes properly, here */
/* discard the closing quote */
str[length] = '\0';
return length + 1;
}
... then a function to handle the unquoted arguments:
size_t handle_unquoted_argument(char *str, char **destination) {
size_t length = strcspn(str, " \n");
char c = str[length];
*destination = str;
str[length] = '\0';
return c == ' ' ? length + 1 : length;
}
... then a function to handle (possibly repetitive) whitespace:
size_t handle_whitespace(char *str) {
/* This will count consecutive whitespace characters, eg. tabs, newlines, spaces... */
/* strspn() is used instead of a sscanf() call inside assert(), because anything
inside assert() disappears when NDEBUG is defined */
return strspn(str, " \t\n\v\f\r");
}
Combining these three should be simple:
size_t n = 0, argv = 0;
while (line[n] != '\0') {
n += handle_whitespace(line + n);
if (line[n] == '\0') /* only trailing whitespace was left */
break;
n += line[n] == '\"' ? handle_quoted_argument(line + n, args + argv++)
: handle_unquoted_argument(line + n, args + argv++);
}
By breaking this up into four separate algorithms, can you see how much simpler this task becomes?
So here is where I read in the line:
while((qtemp = fgets(line, size, stdin)) != NULL ) {
if (strcmp(line, "exit\n") == 0) {
exit(EXIT_SUCCESS);
}
spaceorquotes(qtemp);
}
Then I go to this: (I haven't added my initializers, you get the idea though)
length = strlen(qtemp);
for(i = 0; i < length; i++) {
position = strcspn(qtemp, " \"\n");
while (strncmp(qtemp, " ", 1) == 0) {
memmove(qtemp, qtemp+1, length-1);
position = strcspn(qtemp, " \"\n");
} /*this while loop is for handling multiple spaces*/
if (strncmp(qtemp, "\"", 1) == 0) { /*this is for handling quotes */
memmove(qtemp, qtemp+1, length-1);
position = strcspn(qtemp, "\"");
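stemp = malloc(position + 1); /* +1 for the terminating '\0' */
stemp[0] = '\0'; /* strncat needs an initialized (empty) destination */
strncat(stemp, qtemp, position);
args[i] = stemp;
} else { /*otherwise handle it as a (single) space*/
stemp = malloc(position + 1); /* +1 for the terminating '\0' */
stemp[0] = '\0';
strncat(stemp, qtemp, position);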
args[i] = stemp;
}
//printf("args: %s\n", args[i]);
length = strlen(qtemp);
memmove(qtemp, qtemp+position+1, length-position);
}
args[i-1] = NULL; /*the last position seemed to be a space, so I overwrote it with a null to terminate */
if (execvp(args[0], args) == -1) {
perror("execvp");
exit(EXIT_FAILURE);
}
I found that using strcspn helped, as modifiable lvalue suggested.

How to do a split of a string

Hi, I would like to know how to split a string in C without #include
Multiple ways of doing that, which I'll just explain and not write for you as this can only be a homework (or self-enhancement exercise, so the intent is the same).
Either you split the string into multiple strings that you re-allocate into a multi-dimensional array,
or you simply cut the string on separators and add terminal '\0' where appropriate and just copy the starting address of each sub-string to an array of pointers.
The approach for the splitting is similar in both cases, but in the second one you don't need to allocate any memory (but modify the original string), while in the first one you create safe copies of each sub-string.
You were not specific on the splitting, so I don't know if you wanted to cut on substrings, a single character, or a list of potential separators, etc...
Good luck.
find the point you would like to split it
make two buffers large enough to contain data
strcpy() or do it manually (see example)
in this code I assume you have a string str[] and would like to split it at the first comma:
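int count; // declared outside the loop because it is used after it
for(count = 0; str[count] != '\0'; count++) {
if(str[count] == ',')
break;
}
if(str[count] == '\0')
return 0; // no comma found
char *s1 = malloc(strlen(str) - count); // room for the part after the comma, plus '\0'
strcpy(s1, (str+count+1)); // get part after
char *s2 = malloc(count + 1); // room for the part before the comma, plus '\0'
for(int count1 = 0; count1 < count; count1++)
s2[count1] = str[count1]; // get part before
s2[count] = '\0'; // terminate the part before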
got it? ;)
Assuming I have complete control of the function prototype, I'd do this (make this a single source file (no #includes) and compile, then link with the rest of the project)
If #include <stddef.h> is part of the "without #include" thing (but it shouldn't), then instead of size_t, use unsigned long in the code below
#include <stddef.h>
/* split of a string in c without #include */
/*
** `predst` destination for the prefix (before the split character)
** `postdst` destination for the postfix (after the split character)
** `src` original string to be split
** `ch` the character to split at
** returns the length of `predst`
**
** it is UB if
** src does not contain ch
** predst or postdst has no space for the result
*/
size_t split(char *predst, char *postdst, const char *src, char ch) {
size_t retval = 0;
while (*src != ch) {
*predst++ = *src++;
retval++;
}
*predst = 0;
src++; /* skip over ch */
while ((*postdst++ = *src++) != 0) /* void */;
return retval;
}
Example usage
char a[10], b[42];
size_t n;
n = split(b, a, "forty two", ' ');
/* n is 5; b has "forty"; a has "two" */
