Expand Tabs to Spaces in C?

I need to expand tabs in an input line so that they become spaces (with a width of 8 columns). I tried adapting some previous code of mine that replaced the last space in every line longer than 10 characters with a '\n' to start a new line. Is there a way in C to make tabs 8 spaces in order to expand them? I'm sure it is simple; I just can't seem to get it.
Here's my code:
int v = 0;
int w = 0;
int tab;
extern char line[];
while (v < length) {
if(line[v] == '\t')
tab = v;
if (w == MAXCHARS) {
// THIS IS WHERE I GET STUCK
line[tab] = ' ';
// set w to 0, so loop starts over
w = 0;
}
++v;
++w;
}

This isn't really a question about the C language; it's a question about finding the right algorithm -- you could use that algorithm in any language.
Anyhow, you can't do this at all without reallocating line[] to point at a larger buffer (unless it's a large fixed length, in which case you need to be worried about overflows); as you're expanding the tabs, you need more memory to store the new, larger lines, so character replacement such as you're trying to do simply won't work.
My suggestion: Rather than trying to operate in place (or trying to operate in memory, even) I would suggest writing this as a filter -- reading from stdin and writing to stdout one character at a time; that way you don't need to worry about memory allocation or deallocation or the changing length of line[].
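A minimal sketch of that filter approach, assuming 8-column tab stops that restart after each newline and treating every byte as one column (multibyte characters are not handled). The function name expand_stream() is hypothetical:

```c
#include <stdio.h>

/* Copy `in` to `out`, replacing each tab with enough spaces to reach the
 * next multiple of `tab_width` columns (assumed > 0). The column counter
 * restarts after every newline, so each line is expanded independently. */
void expand_stream(FILE *in, FILE *out, int tab_width)
{
    int c;
    int column = 0;   /* 0-based column of the next character on this line */

    while ((c = getc(in)) != EOF) {
        if (c == '\t') {
            /* Emit at least one space, then pad up to the next tab stop. */
            do {
                putc(' ', out);
                column++;
            } while (column % tab_width != 0);
        } else {
            putc(c, out);
            column = (c == '\n') ? 0 : column + 1;
        }
    }
}
```

A main() that just calls expand_stream(stdin, stdout, 8) then works as a command-line filter, and no buffer ever has to grow.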
If the context this code is being used in requires it to operate in memory, consider implementing an API similar to realloc(), wherein you return a new pointer; if you don't need to change the length of the string being handled you can simply keep the original region of memory, but if you do need to resize it, the option is available.
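As an illustration of that realloc()-like interface, here is a sketch. The name expand_tabs_realloc() and its contract (the argument must come from malloc(); it is freed on success and left untouched on failure) are assumptions modeled on realloc(), not an existing library function:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical realloc()-style helper. On success it returns a string with
 * tabs expanded to `tab_width` stops (assumed > 0) and `s` must no longer be
 * used; on allocation failure it returns NULL and `s` is left untouched.
 * `s` must have come from malloc(). Columns restart after each '\n'. */
char *expand_tabs_realloc(char *s, size_t tab_width)
{
    size_t len = 0, column = 0, tabs = 0;

    /* First pass: measure the expanded length. */
    for (const char *p = s; *p != '\0'; p++) {
        if (*p == '\t') {
            size_t pad = tab_width - column % tab_width;
            len += pad;
            column += pad;
            tabs++;
        } else {
            len++;
            column = (*p == '\n') ? 0 : column + 1;
        }
    }
    if (tabs == 0)
        return s;               /* nothing to expand: keep the original block */

    char *r = malloc(len + 1);
    if (r == NULL)
        return NULL;            /* like realloc(), leave `s` valid */

    /* Second pass: copy, expanding each tab. */
    char *d = r;
    column = 0;
    for (const char *p = s; *p != '\0'; p++) {
        if (*p == '\t') {
            size_t pad = tab_width - column % tab_width;
            memset(d, ' ', pad);
            d += pad;
            column += pad;
        } else {
            *d++ = *p;
            column = (*p == '\n') ? 0 : column + 1;
        }
    }
    *d = '\0';
    free(s);
    return r;
}
```

Call it the same way you would call realloc(): char *t = expand_tabs_realloc(s, 8); if (t != NULL) s = t; so that s stays valid if the allocation fails.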

You need a separate buffer to write the output to, since it will in general be longer than the input:
void detab(char* in, char* out, size_t max_len) {
size_t i = 0;
while (*in && i < max_len - 1) {
if (*in == '\t') {
in++;
out[i++] = ' ';
while (i % 8 && i < max_len - 1) {
out[i++] = ' ';
}
} else {
out[i++] = *in++;
}
}
out[i] = 0;
}
You must preallocate enough space for out (which in the worst case could be 8 * strlen(in) + 1), and out cannot be the same as in.
EDIT: As suggested by Jonathan Leffler, the max_len parameter now makes sure we avoid buffer overflows. The resulting string will always be null-terminated, even if it is cut short to avoid such an overflow. (I also renamed the function, and changed int to size_t for added correctness :).)
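If the worst case is too wasteful, the exact size can be measured first. This sketch (detab_size() is a hypothetical helper, not part of the answer above) mirrors the loop in detab(), so it assumes the same 8-column stops counted from the start of the whole string:

```c
#include <stddef.h>

/* Return the number of bytes, including the terminating '\0', that detab()
 * writes for `in`, so the caller can allocate exactly that much instead of
 * 8 * strlen(in) + 1. Like detab(), the column count does not restart
 * after '\n'. */
size_t detab_size(const char *in)
{
    size_t i = 0;                  /* mirrors detab()'s output index */
    for (; *in != '\0'; in++) {
        if (*in == '\t')
            i += 8 - i % 8;        /* one space plus padding to the next stop */
        else
            i++;
    }
    return i + 1;                  /* room for the terminator */
}
```

For example, size_t n = detab_size(in); char *out = malloc(n); if (out) detab(in, out, n); gives a buffer with no slack, so the output is never cut short.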

I would probably do something like this:
Iterate through the string once, only counting the tabs (and the string length if you don't already know that).
Allocate original_size + 7 * number_of_tabs bytes of memory (where original_size counts the null byte).
Iterate through the string another time, copying every non-tab byte to the new memory and inserting 8 spaces for every tab.
If you want to do the replacement in place instead of creating a new string, you'll have to make sure that the passed-in pointer points to a location with enough memory to store the new string (which will be longer than the original, because each tab's 8 spaces take 7 more bytes than the tab itself).
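The three steps above can be sketched as follows. Note that, as described, this replaces each tab with exactly 8 spaces rather than padding to the next tab stop; tabs_to_8_spaces() is a hypothetical name:

```c
#include <stdlib.h>
#include <string.h>

/* Replace every tab in `s` with exactly 8 spaces (no tab-stop alignment).
 * Returns a malloc()ed string, or NULL if allocation fails. */
char *tabs_to_8_spaces(const char *s)
{
    size_t tabs = 0;
    size_t original_size = strlen(s) + 1;     /* counts the '\0' */

    /* Pass 1: count the tabs. */
    for (const char *p = s; *p != '\0'; p++)
        if (*p == '\t')
            tabs++;

    /* Each tab grows by 7 bytes: 8 spaces replace 1 tab. */
    char *r = malloc(original_size + 7 * tabs);
    if (r == NULL)
        return NULL;

    /* Pass 2: copy, inserting 8 spaces per tab. */
    char *d = r;
    for (const char *p = s; *p != '\0'; p++) {
        if (*p == '\t') {
            memset(d, ' ', 8);
            d += 8;
        } else {
            *d++ = *p;
        }
    }
    *d = '\0';
    return r;
}
```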

Here's a reentrant, recursive version which automatically allocates a buffer of correct size:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
struct state
{
char *dest;
const char *src;
size_t tab_size;
size_t size;
_Bool expand;
};
static void recexp(struct state *state, size_t di, size_t si)
{
size_t start = si;
size_t pos = si;
for(; state->src[pos]; ++pos)
{
if(state->src[pos] == '\n') start = pos + 1;
else if(state->src[pos] == '\t')
{
size_t str_len = pos - si;
size_t tab_len = state->tab_size - (pos - start) % state->tab_size;
recexp(state, di + str_len + tab_len, pos + 1);
if(state->dest)
{
memcpy(state->dest + di, state->src + si, str_len);
memset(state->dest + di + str_len, ' ', tab_len);
}
return;
}
}
state->size = di + pos - si + 1;
if(state->expand && !state->dest) state->dest = malloc(state->size);
if(state->dest)
{
memcpy(state->dest + di, state->src + si, pos - si);
state->dest[state->size - 1] = 0;
}
}
size_t expand_tabs(char **dest, const char *src, size_t tab_size)
{
struct state state = { dest ? *dest : NULL, src, tab_size, 0, dest };
recexp(&state, 0, 0);
if(dest) *dest = state.dest;
return state.size;
}
int main(void)
{
char *expansion = NULL; // must be `NULL` for automatic allocation
size_t size = expand_tabs(&expansion,
"spam\teggs\tfoo\tbar\nfoobar\tquux", 4);
printf("expanded size: %lu\n", (unsigned long)size);
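if (!expansion) // malloc() failed inside expand_tabs()
return 1;
puts(expansion);
free(expansion); // allocated by expand_tabs(), so the caller frees it
return 0;
}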
If expand_tabs() is called with dest == NULL, the function will return the size of the expanded string, but no expansion is actually done; if dest != NULL but *dest == NULL, a buffer of correct size will be allocated and must be deallocated by the programmer; if dest != NULL and *dest != NULL, the expanded string will be put into *dest, so make sure the supplied buffer is large enough.

Untested, but something like this should work:
int v = 0;
int tab;
extern char line[];
while (v < length){
if (line[v] == '\t') {
tab = TAB_WIDTH - (v % TAB_WIDTH); /* 1..TAB_WIDTH spaces to reach the next tab stop */
/* I'm assuming MAXCHARS is the size of your array. You either need
* to bail, or resize the array if expanding the tab would make
* the string too long. */
assert((length + tab) < MAXCHARS);
if (tab != 1) {
memmove(line + v + tab - 1, line + v, length - v + 1);
}
memset(line + v, ' ', tab);
length += tab - 1;
v += tab;
} else {
++v;
}
}
Note that this is O(n*m) where n is the line size and m is the number of tabs. That probably isn't an issue in practice.

There are myriad ways to convert tabs in a string into 1-8 spaces. There are inefficient ways to do the expansion in situ, but the easiest way to handle it is to have a function that takes the input string and a separate output buffer that is big enough for the expanded string. If the input is 6 tabs plus an X and a newline (8 characters + terminating null), the output would be 48 blanks, X, and a newline (50 characters + terminating null), so you might need a much bigger output buffer than input buffer.
#include <stddef.h>
#include <assert.h>
static int detab(const char *str, char *buffer, size_t buflen)
{
char *end = buffer + buflen;
char *dst = buffer;
const char *src = str;
char c;
assert(buflen > 0);
while ((c = *src++) != '\0' && dst < end)
{
if (c != '\t')
*dst++ = c;
else
{
do
{
*dst++ = ' ';
} while (dst < end && (dst - buffer) % 8 != 0);
}
}
if (dst < end)
{
*dst = '\0';
return(dst - buffer);
}
else
return -1;
}
#ifdef TEST
#include <stdio.h>
#include <string.h>
#ifndef TEST_INPUT_BUFFERSIZE
#define TEST_INPUT_BUFFERSIZE 4096
#endif /* TEST_INPUT_BUFFERSIZE */
#ifndef TEST_OUTPUT_BUFFERSIZE
#define TEST_OUTPUT_BUFFERSIZE (8 * TEST_INPUT_BUFFERSIZE)
#endif /* TEST_OUTPUT_BUFFERSIZE */
int main(void)
{
char ibuff[TEST_INPUT_BUFFERSIZE];
char obuff[TEST_OUTPUT_BUFFERSIZE];
while (fgets(ibuff, sizeof(ibuff), stdin) != 0)
{
if (detab(ibuff, obuff, sizeof(obuff)) >= 0)
fputs(obuff, stdout);
else
fprintf(stderr, "Failed to detab input line: <<%.*s>>\n",
(int)(strlen(ibuff) - 1), ibuff);
}
return(0);
}
#endif /* TEST */
The biggest trouble with this test is that it is hard to demonstrate that it handles overflows in the output buffer properly. That's why there are the two '#define' sequences for the buffer sizes - with very large defaults for real work and independently configurable buffer sizes for stress testing. If the source file is dt.c, use a compilation like this:
make CFLAGS="-DTEST -DTEST_INPUT_BUFFERSIZE=32 -DTEST_OUTPUT_BUFFERSIZE=32" dt
If the 'detab()' function is to be used outside this file, you'd create a header to contain its declaration, and you'd include that header in this code, and the function would not be static, of course.

Here is one that will malloc(3) a bigger buffer of exactly the right size and return the expanded string. It does no division or modulus ops. It even comes with a test driver. Safe with -Wall -Wno-parentheses if using gcc.
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
static char *expand_tabs(const char *s) {
int i, j, extra_space;
char *r, *result = NULL;
for(i = 0; i < 2; ++i) {
for(j = extra_space = 0; s[j]; ++j) {
if (s[j] == '\t') {
int es0 = 8 - (j + extra_space & 7);
if (result != NULL) {
memset(r, ' ', es0); /* es0 spaces up to the next tab stop */
r += es0;
}
extra_space += es0 - 1;
} else if (result != NULL)
*r++ = s[j];
}
if (result == NULL)
if ((r = result = malloc(j + extra_space + 1)) == NULL)
return NULL;
}
*r = 0;
return result;
}
#include <stdio.h>
int main(int ac, char **av) {
char space[1000];
while (fgets(space, sizeof space, stdin) != NULL) {
char *s = expand_tabs(space);
if (s == NULL) /* malloc() failed */
return 1;
fputs(s, stdout);
free(s);
}
return 0;
}

Related

reduce the size of a string

(Disclaimer: this is not the complete exercise, since I still have to finish it, but the error occurs in this part of the code.)
I did this exercise to practice memory allocation.
Create a function that takes a URL (a C string) and returns the name of the website (with "www." and with the extension).
For example, given Wikipedia's link, "http://www.wikipedia.org/", it has to return only "www.wikipedia.org" in another string (dynamically allocated on the heap).
This is what I did so far:
Use a for loop, and once i is greater than 6, start copying each character into another string until "/" is reached.
I need to allocate the other string, and then reallocate it.
Here's my attempt so far:
char *read_website(const char *url) {
char *str = malloc(sizeof(char));
if (str == NULL) {
exit(1);
}
for (unsigned int i = 0; url[i] != "/" && i > 6; ++i) {
if (i <= 6) {
continue;
}
char* s = realloc(str, sizeof(char) + 1);
if (s == NULL) {
exit(1);
}
*str = *s;
}
return str;
}
int main(void) {
char s[] = "http://www.wikipedia.org/";
char *str = read_website(s);
return 0;
}
(1) By debugging line by line, I've noticed that the program ends as soon as the for loop is reached.
(2) Another thing: I chose to create another pointer when using realloc, because I have to check for a memory leak. Is that good practice, or should I have done something else?
There are multiple problems in your code:
url[i] != "/" is incorrect, it is a type mismatch. You should compare the character url[i] with a character constant '/', not a string literal "/".
char *s = realloc(str, sizeof(char) + 1); reallocates only to size 2, not the current length plus 1.
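you never copy any characters: the loop neither advances a pointer nor uses the index variable to store url[i] in the new string. In fact the loop body never runs at all, because the condition i > 6 is false when i is 0, which is why the program seems to end at the for loop.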
instead of using malloc and realloc, you should first compute the length of the server name and allocate the array with the correct size directly.
Here is a modified version:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *read_website(const char *url) {
// skip the protocol part
if (!strncmp(url, "http://", 7))
url += 7;
else if (!strncmp(url, "https://", 8))
url += 8;
// compute the length of the domain name, stop at ':' or '/'
size_t n = strcspn(url, "/:");
// return an allocated copy of the substring
return strndup(url, n);
}
int main(void) {
char s[] = "http://www.wikipedia.org/";
char *str = read_website(s);
printf("%s -> %s\n", s, str);
free(str);
return 0;
}
strndup() is a POSIX function available on many systems, and C23 added it to the C Standard. If it is not available on your target, here is a simple implementation:
char *strndup(const char *s, size_t n) {
char *p;
size_t i;
for (i = 0; i < n && s[i]; i++)
continue;
p = malloc(i + 1);
if (p) {
memcpy(p, s, i);
p[i] = '\0';
}
return p;
}
The assignment doesn't say the returned string must be of minimal size, and the amount of memory wasted on a URL is small.
Building on chqrlie's solution, I'd start by finding the beginning of the domain name (skipping the protocol portion), duplicate the rest of the string, and then truncate the result. Roughly:
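const char *prot[] = { "http://", "https://" };
const char *s = url;
for (int i = 0; i < 2; i++) {
    size_t len = strlen(prot[i]);
    if (0 == strncmp(url, prot[i], len)) {
        s += len;   /* skip the protocol */
        break;
    }
}
char *output = strdup(s);   /* POSIX, and standard since C23 */
if (output) {
    size_t n = strcspn(output, "/:");
    output[n] = '\0';   /* truncate at the end of the host name */
}
return output;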
The returned pointer can still be freed by the caller, so the total "wasted" space is limited to the trailing part of the truncated URL.
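Putting the fragment together, a complete version might look like the sketch below. It assumes only http:// and https:// prefixes matter, and it duplicates the string with malloc() and memcpy() instead of strdup(), which is POSIX (and only standard since C23):

```c
#include <stdlib.h>
#include <string.h>

/* Return a malloc()ed copy of the host part of `url`: skip an optional
 * "http://" or "https://" prefix, duplicate the rest, then cut it at the
 * first '/' or ':'. The caller must free() the result. */
char *read_website(const char *url)
{
    static const char *const prot[] = { "http://", "https://" };
    const char *s = url;

    for (size_t i = 0; i < sizeof prot / sizeof prot[0]; i++) {
        size_t plen = strlen(prot[i]);
        if (strncmp(url, prot[i], plen) == 0) {
            s += plen;
            break;
        }
    }

    /* Duplicate the remainder (equivalent to strdup(s)). */
    size_t len = strlen(s);
    char *output = malloc(len + 1);
    if (output == NULL)
        return NULL;
    memcpy(output, s, len + 1);

    /* Truncate at the end of the host name. */
    output[strcspn(output, "/:")] = '\0';
    return output;
}
```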

Why does my string_split implementation not work?

My str_split function returns (or at least I think it does) a char** - so a list of strings essentially. It takes a string parameter, a char delimiter to split the string on, and a pointer to an int to place the number of strings detected.
The way I did it, which may be highly inefficient, is to make a buffer of x length (x = length of string), then copy each character of the string into it until we reach the delimiter or the '\0' character. Then it copies the buffer to the char**, which is what we are returning (and has been malloced earlier, and can be freed from main()), then clears the buffer and repeats.
Although the algorithm may be iffy, the logic is definitely sound, as my debug code (the _D) shows it's being copied correctly. The part I'm stuck on is when I make a char** in main and set it equal to my function's return value. It doesn't return null, crash the program, or throw any errors, but it doesn't quite seem to work either. I'm assuming this is what is meant by the term Undefined Behavior.
Anyhow, after a lot of thinking (I'm new to all this) I tried something else, which you will see in the code, currently commented out. When I use malloc to copy the buffer to a new string, and pass that copy to aforementioned char**, it seems to work perfectly. HOWEVER, this creates an obvious memory leak as I can't free it later... so I'm lost.
When I did some research I found this post, which follows the idea of my code almost exactly and works, meaning there isn't an inherent problem with the format (return value, parameters, etc) of my str_split function. YET his only has 1 malloc, for the char**, and works just fine.
Below is my code. I've been trying to figure this out and it's scrambling my brain, so I'd really appreciate help!! Sorry in advance for the 'i', 'b', 'c' it's a bit convoluted I know.
Edit: should mention that with the following code,
ret[c] = buffer;
printf("Content of ret[%i] = \"%s\" \n", c, ret[c]);
it does indeed print correctly. It's only when I call the function from main that it gets weird. I'm guessing it's because it's out of scope?
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#define DEBUG
#ifdef DEBUG
#define _D if (1)
#else
#define _D if (0)
#endif
char **str_split(char[], char, int*);
int count_char(char[], char);
int main(void) {
int num_strings = 0;
char **result = str_split("Helo_World_poopy_pants", '_', &num_strings);
if (result == NULL) {
printf("result is NULL\n");
return 0;
}
if (num_strings > 0) {
for (int i = 0; i < num_strings; i++) {
printf("\"%s\" \n", result[i]);
}
}
free(result);
return 0;
}
char **str_split(char string[], char delim, int *num_strings) {
int num_delim = count_char(string, delim);
*num_strings = num_delim + 1;
if (*num_strings < 2) {
return NULL;
}
//return value
char **ret = malloc((*num_strings) * sizeof(char*));
if (ret == NULL) {
_D printf("ret is null.\n");
return NULL;
}
int slen = strlen(string);
char buffer[slen];
/* b is the buffer index, c is the index for **ret */
int b = 0, c = 0;
for (int i = 0; i < slen + 1; i++) {
char cur = string[i];
if (cur == delim || cur == '\0') {
_D printf("Copying content of buffer to ret[%i]\n", c);
//char *tmp = malloc(sizeof(char) * slen + 1);
//strcpy(tmp, buffer);
//ret[c] = tmp;
ret[c] = buffer;
_D printf("Content of ret[%i] = \"%s\" \n", c, ret[c]);
//free(tmp);
c++;
b = 0;
continue;
}
//otherwise
_D printf("{%i} Copying char[%c] to index [%i] of buffer\n", c, cur, b);
buffer[b] = cur;
buffer[b+1] = '\0'; /* extend the null char */
b++;
_D printf("Buffer is now equal to: \"%s\"\n", buffer);
}
return ret;
}
int count_char(char base[], char c) {
int count = 0;
int i = 0;
while (base[i] != '\0') {
if (base[i++] == c) {
count++;
}
}
_D printf("Found %i occurence(s) of '%c'\n", count, c);
return count;
}
You are storing pointers to a buffer that exists on the stack. Using those pointers after returning from the function results in undefined behavior.
To get around this requires one of the following:
Allow the function to modify the input string (i.e. replace delimiters with null terminators) and return pointers into it. The caller must be aware that this can happen. Note that modifying a string literal is undefined behavior in C, so you cannot pass a literal as you are doing here; you would instead need to do:
char my_string[] = "Helo_World_poopy_pants";
char **result = str_split(my_string, '_', &num_strings);
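In this case, the function should also make it clear that a string literal is not acceptable input, which means keeping its first parameter as a modifiable char * (char string[] is equivalent); only a version that works on its own copy, as in the next option, can declare it as const char *string.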
Allow the function to make a copy of the string and then modify the copy. You have expressed concerns about leaking this memory, but that concern is mostly to do with your program's design rather than a necessity.
It's perfectly valid to duplicate each string individually and then clean them all up later. The main issue is that it's inconvenient, and also slightly pointless.
Let's address the second point. You have several options, but if you insist that the result be easily cleaned-up with a call to free, then try this strategy:
When you allocate the pointer array, also make it large enough to hold a copy of the string:
// Allocate storage for `num_strings` pointers, plus a copy of the original string,
// then copy the string into memory immediately following the pointer storage.
char **ret = malloc((*num_strings) * sizeof(char*) + strlen(string) + 1);
char *buffer = (char*)&ret[*num_strings];
strcpy(buffer, string);
Now, do all your string operations on buffer. For example:
// Extract all delimited substrings. Here, buffer will always point at the
// current substring, and p will search for the delimiter. Once found,
// the substring is terminated, its pointer appended to the substring array,
// and then buffer is pointed at the next substring, if any.
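int c = 0;
// Stop after the last counted field, so a trailing delimiter still yields
// a final empty string and every ret[] entry gets filled in.
for(char *p = buffer; c < *num_strings; ++p)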
{
if (*p == delim || !*p) {
char *next = p;
if (*p) {
*p = '\0';
++next;
}
ret[c++] = buffer;
buffer = next;
}
}
When you need to clean up, it's just a single call to free, because everything was stored together.
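Here is one way to assemble those pieces into a complete function. It is a sketch, assuming empty fields should be kept as "" and that a string with no delimiter should yield a single field (unlike the original, which returns NULL in that case):

```c
#include <stdlib.h>
#include <string.h>

/* Split `string` on `delim` using one allocation: the pointer array is
 * followed by a private copy of the text, and the returned pointers point
 * into that copy. Free the whole result with a single free(). */
char **str_split(const char *string, char delim, int *num_strings)
{
    int n = 1;
    for (const char *p = string; *p != '\0'; p++)
        if (*p == delim)
            n++;

    char **ret = malloc(n * sizeof(char *) + strlen(string) + 1);
    if (ret == NULL)
        return NULL;

    /* The copy of the string lives right after the n pointers. */
    char *buffer = (char *)&ret[n];
    strcpy(buffer, string);

    int c = 0;
    for (char *p = buffer; c < n; ++p) {
        if (*p == delim || *p == '\0') {
            char *next = p + 1;      /* start of the following field, if any */
            *p = '\0';               /* terminate the current field */
            ret[c++] = buffer;
            buffer = next;
        }
    }
    *num_strings = n;
    return ret;
}
```

The caller then needs only free(result), however many fields there were.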
The string pointers you store into the ret array with ret[c] = buffer; point to an automatic array that goes out of scope when the function returns, so the code subsequently has undefined behavior. You should allocate these strings with strdup() instead.
Note also that it might not be appropriate to return NULL when the string does not contain a separator. Why not return an array with a single string?
Here is a simpler implementation:
#include <stdlib.h>
#include <string.h> /* memcpy() */
char **str_split(const char *string, char delim, int *num_strings) {
int i, n, from, to;
char **res;
for (n = 1, i = 0; string[i]; i++)
n += (string[i] == delim);
*num_strings = 0;
res = malloc(sizeof(*res) * n);
if (res == NULL)
return NULL;
for (i = from = to = 0;; from = to + 1) {
for (to = from; string[to] != delim && string[to] != '\0'; to++)
continue;
res[i] = malloc(to - from + 1);
if (res[i] == NULL) {
/* allocation failure: free memory allocated so far */
while (i > 0)
free(res[--i]);
free(res);
return NULL;
}
memcpy(res[i], string + from, to - from);
res[i][to - from] = '\0';
i++;
if (string[to] == '\0')
break;
}
*num_strings = n;
return res;
}

Is there a fast way to interpose a character between strings?

I wrote this function that will generate a single string out of a file list.
(e.g. if I have a folder with FileA.txt, FileB.png and FileC I'll get as output this string: FileA.txtFileB.pngFileC). Now I want to add a / character between each filename. (e.g. FileA.txt/FileB.png/FileC/) Is there a way to do it in "one blow" without having to repeat the same operation twice?
In other words, is there a way to do something like:
original_string = append2(original_string, new_string, '/');
instead of having to do
append(original_string, new_string);
append(original_string, "/");
?
Here's the function I wrote as reference:
/**
* @brief Concatenate all file names in a file list (putting a '/' between each of them)
* @param file_list The file list to serialize.
* @return A string containing all files in the file list.
*/
char *file_list_tostring(struct file_list *file_list) {
char *final_string = NULL;
size_t final_len = 0;
struct file_node *list_iter = file_list->first;
while (list_iter != NULL) {
char *tmp = list_iter->filename;
size_t tmp_len = strlen(tmp);
char *s = realloc(final_string, final_len + tmp_len + 1); // +1 for '\0'
if (s == NULL) {
perror("realloc");
exit(EXIT_FAILURE);
}
final_string = s;
memcpy(final_string + final_len, tmp, tmp_len + 1);
final_len += tmp_len;
list_iter = list_iter->next;
}
return final_string;
}
Maybe there is a simple way to interpose a single character between two strings?
Note: I know there's nothing wrong in repeating the same operation twice, I'm asking this question to know if there is a better way of doing so!
Yes, you can use sprintf():
#include <stdio.h>
int main()
{
char var1[] = "FileA.txt";
char var2[] = "FileB.png";
char var3[] = "FileC";
char result[30];
sprintf(result, "%s/%s/%s", var1, var2,var3);
printf("result: %s\n", result);
return 0;
}
And the result is like this:
result: FileA.txt/FileB.png/FileC
If needed, result can instead be a pointer to memory allocated to fit your data.
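A sketch of that variant, assuming C99 or later: snprintf() with a null buffer and a size of 0 returns the length the formatted string would have, which tells you exactly how much to allocate. join3() is a hypothetical name for this example:

```c
#include <stdio.h>
#include <stdlib.h>

/* Join three names with '/' into a malloc()ed string that is exactly
 * large enough. Returns NULL on an encoding error or allocation failure. */
char *join3(const char *a, const char *b, const char *c)
{
    /* First call only measures: nothing is written. */
    int len = snprintf(NULL, 0, "%s/%s/%s", a, b, c);
    if (len < 0)
        return NULL;

    char *result = malloc((size_t)len + 1);
    if (result == NULL)
        return NULL;

    /* Second call fills the buffer, including the terminating '\0'. */
    snprintf(result, (size_t)len + 1, "%s/%s/%s", a, b, c);
    return result;
}
```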
As Michael Burr mentioned in a comment to the question, it is best to walk the list/array twice. On the first pass, calculate the total length of the string needed. Next, allocate the memory needed for the entire string. On the second pass, copy the contents. Do not forget to account for, and append, the string-terminating nul byte (\0).
Consider the following example functions dupcat() and dupcats():
#include <stdlib.h>
#include <string.h>
#include <stdarg.h>
#include <stdio.h>
char *dupcat(const size_t count, const char *parts[])
{
size_t i, len = 0;
char *dst, *end;
/* Calculate total length of parts. Skip NULL parts. */
for (i = 0; i < count; i++)
len += (parts[i]) ? strlen(parts[i]) : 0;
/* Add room for '\0'.
We add an extra 8 to 15 '\0's, just because
it is sometimes useful, and we do a dynamic
allocation anyway. */
len = (len | 7) + 9;
/* Allocate memory. */
dst = malloc(len);
if (!dst) {
fprintf(stderr, "dupcat(): Out of memory; tried to allocate %zu bytes.\n", len);
exit(EXIT_FAILURE);
}
/* Copy parts. */
end = dst;
for (i = 0; i < count; i++) {
const char *src = parts[i];
/* We could use strlen() and memcpy(),
but a loop like this will work just as well. */
if (src)
while (*src)
*(end++) = *(src++);
}
/* Sanity check time! */
if (end >= dst + len) {
fprintf(stderr, "dupcat(): Arguments were modified during duplication; buffer overrun!\n");
free(dst); /* We can omit this free(), but only in case of exit(). */
exit(EXIT_FAILURE);
}
/* Terminate string (and clear padding). */
memset(end, '\0', (size_t)(dst + len - end));
/* Done! */
return dst;
}
char *dupcats(const size_t count, ...)
{
size_t i, len = 0;
char *dst, *end;
va_list args;
/* Calculate total length of 'count' source strings. */
va_start(args, count);
for (i = 0; i < count; i++) {
const char *src = va_arg(args, const char *);
if (src)
len += strlen(src);
}
va_end(args);
/* Add room for end-of-string '\0'.
Because it is often useful to know you have
at least one extra '\0' at the end of the string,
and we do a dynamic allocation anyway,
we pad the string with 9 to 16 '\0',
aligning 'len' to a multiple of 8. */
len = (len | 7) + 9;
/* Allocate memory for the string. */
dst = malloc(len);
if (!dst) {
fprintf(stderr, "dupcats(): Out of memory; tried to allocate %zu bytes.\n", len);
exit(EXIT_FAILURE);
}
/* Copy the source strings. */
end = dst;
va_start(args, count);
for (i = 0; i < count; i++) {
const char *src = va_arg(args, const char *);
/* We could use strlen() and memcpy() here;
however, this loop is easier to follow. */
if (src)
while (*src)
*(end++) = *(src++);
}
va_end(args);
/* Sanity check. */
if (end >= dst + len) {
fprintf(stderr, "dupcats(): Arguments were modified during duplication; buffer overrun!\n");
free(dst); /* We can omit this free(), but only in case of exit(). */
exit(EXIT_FAILURE);
}
/* Add end-of-string '\0' (filling the padding). */
memset(end, '\0', dst + len - end);
/* Done. */
return dst;
}
int main(int argc, char *argv[])
{
char *result;
result = dupcat(argc - 1, (const char **)(argv + 1));
printf("Arguments concatenated: '%s'.\n", result);
free(result);
result = dupcats(5, "foo", "/", "bar", "/", "baz");
printf("Concatenating 'foo', '/', 'bar', '/', and 'baz': '%s'.\n", result);
free(result);
return EXIT_SUCCESS;
}
Neither dupcat() nor dupcats() will ever return NULL: they will print an error message to standard error and exit, if an error occurs.
dupcat() takes an array of strings, and returns a dynamically allocated concatenated copy with at least eight bytes of nul padding.
dupcats() takes a variable number of pointers, and returns a dynamically allocated concatenated copy with at least eight bytes of nul padding.
Both functions treat NULL pointers as if they were empty strings. For both functions, the first parameter is the number of strings to concatenate.
(Since OP did not show the definitions of struct file_list or struct file_node, I did not bother to write a list-based version. However, it should be trivial to adapt from one of the two versions shown.)
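For reference, a list-based sketch of the same two-pass idea. The struct definitions are assumptions reconstructed only from the members the question's code uses (first, filename, next); adjust them to the real ones:

```c
#include <stdlib.h>
#include <string.h>

/* Assumed layout: the question does not show these definitions. */
struct file_node {
    char *filename;
    struct file_node *next;
};

struct file_list {
    struct file_node *first;
};

/* Join all file names with '/' between them ("FileA.txt/FileB.png/FileC")
 * in two passes: measure, allocate once, then copy. Returns NULL only if
 * malloc() fails; an empty list gives "". */
char *file_list_tostring(const struct file_list *file_list)
{
    size_t total = 0;
    const struct file_node *n;

    /* Pass 1: name lengths plus one separator per name after the first. */
    for (n = file_list->first; n != NULL; n = n->next)
        total += strlen(n->filename) + (n != file_list->first);

    char *result = malloc(total + 1);
    if (result == NULL)
        return NULL;

    /* Pass 2: copy names, writing '/' before every name but the first. */
    char *end = result;
    for (n = file_list->first; n != NULL; n = n->next) {
        size_t len = strlen(n->filename);
        if (n != file_list->first)
            *end++ = '/';
        memcpy(end, n->filename, len);
        end += len;
    }
    *end = '\0';
    return result;
}
```

Unlike the question's version, this returns NULL on allocation failure instead of calling exit(), and it performs a single allocation instead of one realloc() per node.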
In some cases, a variant that constructs a valid path from a fixed base part, with one or more relative file or directory names concatenated, and POSIXy ./ removed and ../ backtracked (but not out of base subtree), is very useful.
If carefully written, it allows the program to accept untrusted paths, relative to a specific subtree. (The combined paths are confined to that subtree, but symlinks and hardlinks can still be used to escape the subtree.)
One possible implementation is as follows:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>
char *dynamic_path(const char *const subtree,
const size_t parts,
const char *part[])
{
const size_t subtree_len = (subtree) ? strlen(subtree) : 0;
size_t parts_len = 0;
size_t total_len, i;
char *path, *mark, *curr;
/* Calculate the length of each individual part.
Include room for a leading slash.
*/
for (i = 0; i < parts; i++)
parts_len += (part[i]) ? 1 + strlen(part[i]) : 0;
/* Add room for the string-terminating '\0'.
We're paranoid, and add a bit more padding. */
total_len = ((subtree_len + parts_len) | 7) + 9;
/* Allocate memory for the combined path. */
path = malloc(total_len);
if (!path) {
errno = ENOMEM;
return NULL;
}
/* If the user specified a subtree, we use it as the fixed prefix. */
if (subtree_len > 0) {
memcpy(path, subtree, subtree_len);
mark = path + subtree_len;
/* Omit a trailing /. We enforce it below anyway. */
if (parts > 0 && subtree_len > 1 && mark[-1] == '/')
--mark;
} else
mark = path;
/* Append the additional path parts. */
curr = mark;
for (i = 0; i < parts; i++) {
const size_t len = (part[i]) ? strlen(part[i]) : 0;
if (len > 0) {
/* Each path part is a separate file/directory name,
so there is an (implicit) slash before each one. */
if (part[i][0] != '/')
*(curr++) = '/';
memcpy(curr, part[i], len);
curr += len;
}
}
/* Sanity check. */
if (curr >= path + total_len) {
/* Buffer overrun occurred. */
fprintf(stderr, "Buffer overrun in dynamic_path()!\n");
free(path); /* Can be omitted if we exit(). */
exit(EXIT_FAILURE);
}
/* Terminate string (and clear padding). */
memset(curr, '\0', (size_t)(path + total_len - curr));
/* Cleanup pass.
Convert "/foo/../" to "/", but do not backtrack over mark.
Combine consecutive slashes and /./ to a single slash.
*/
{
char *src = mark;
char *dst = mark;
while (*src)
if (src[0] == '/' && src[1] == '.' && src[2] == '.' && (!src[3] || src[3] == '/')) {
src += 3; /* Skip over /.. */
/* Backtrack, but do not underrun mark. */
if (dst > mark) {
dst--;
while (dst > mark && *dst != '/')
dst--;
}
/* Never consume the mark slash. */
if (dst == mark)
dst++;
} else
if (src[0] == '/' && src[1] == '.' && (!src[2] || src[2] == '/')) {
src += 2; /* Skip over /. */
if (dst == mark || dst[-1] != '/')
*(dst++) = '/';
} else
if (src[0] == '/') {
src++;
if (dst == mark || dst[-1] != '/')
*(dst++) = '/';
} else
*(dst++) = *(src++);
/* Clear removed part. */
if (dst < src)
memset(dst, '\0', (size_t)(src - dst));
}
return path;
}
int main(int argc, char *argv[])
{
char *path;
if (argc < 2) {
fprintf(stderr, "\nUsage: %s PREFIX [ PATHNAME ... ]\n\n", argv[0]);
return EXIT_FAILURE;
}
path = dynamic_path(argv[1], argc - 2, (const char **)(argv + 2));
if (!path) {
fprintf(stderr, "dynamic_path(): %s.\n", strerror(errno));
return EXIT_FAILURE;
}
printf("%s\n", path);
free(path);
return EXIT_SUCCESS;
}
Note that I wrote the above version from scratch (and dedicate it to the public domain (CC0)), so you should test it thoroughly before relying on it in production. (My intent is for it to be a useful example or basis that will help you write your own implementation tailored to your needs.)
If you do find any bugs or issues in it, let me know in a comment, so I can verify and fix.

Split function in C runtime error

I get a runtime error when running a C program.
Here is the C source (the parsing.h header code is a little lower):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "parsing.h"
int main()
{
printf("Enter text seperated by single spaces :\n");
char *a = malloc(sizeof(char)*10);
gets(a);
char **aa = Split(a, ' ');
int k = SplitLen(a, ' ');
int i = 0;
for(;i<k;i++)
{
printf("%s\n", aa[i]);
}
free(a);
free(aa);
return 0;
}
and the parsing.h file:
#include <string.h>
#include <stdlib.h>
#include <malloc.h>
#include <assert.h>
char** Split(char* a_str, const char a_delim)
{
char** result = 0;
int count = 0;
char* tmp = a_str;
char* last_comma = 0;
/* Count how many elements will be extracted. */
while (*tmp)
{
if (a_delim == *tmp)
{
count++;
last_comma = tmp;
}
tmp++;
}
/* Add space for trailing token. */
count += last_comma < (a_str + strlen(a_str) - 1);
/* Add space for terminating null string so caller
knows where the list of returned strings ends. */
count++;
result = malloc(sizeof(char*) * count);
if (result)
{
size_t idx = 0;
char* token = strtok(a_str, ",");
while (token)
{
assert(idx < count);
*(result + idx++) = strdup(token);
token = strtok(0, ",");
}
assert(idx == count - 1);
*(result + idx) = 0;
}
return result;
}
int SplitLen(char *src, char sep)
{
int result = 0;
int i;
for(i = 0; i<strlen(src); i++)
{
if(src[i] == sep)
{
result += 1;
}
}
return result;
}
I'm sure most of the code is unneeded, but I posted the whole lot in case it is relevant. Here is the runtime error:
a.out: parsing.h:69: Split: Assertion `idx == count - 1' failed.
Aborted
Thanks in advance. For the record, I didn't write all of it myself; I took some pieces from elsewhere, but most of it is my own code.
The purpose of the assert macro is that it will stop your program if the condition passed as its argument is false. What this tells you is that when you ran your program, idx != count - 1 at line 69. I didn't take the time to check what impact that has on the execution of your program, but apparently idx was intended to equal count - 1 there.
Does that help?
There are many problems. I'm ignoring the code split into two files; I'm treating it as a single file (see comments to question).
Do not use gets(). Never use gets(). Do not ever use gets(). I said it three times; it must be true. Note that gets() is no longer a Standard C function (it was removed from the C11 standard — ISO/IEC 9899:2011) because it cannot be used safely. Use fgets() or another safe function instead.
You don't need to use dynamic memory allocation for a string of 10 characters; use a local variable (it is simpler).
You need a bigger string — think about 4096.
You don't check whether you got any data; always check input function calls.
You don't free all the substrings at the end of main(), thus leaking memory.
One major problem is that Split() slices and dices the input string, so SplitLen() cannot give you the same answer that Split() does for the number of fields. The strtok() function is destructive: it overwrites each delimiter it stops at with '\0'. It also treats multiple adjacent delimiters as a single delimiter. Your code doesn't account for either difference.
Another major problem is that you count fields based on the delimiter passed into the Split() function, but you call strtok(..., ",") to actually split on commas. That is consistent with the comments and variable names (such as last_comma), but it is not what you asked for, and it misled you. This is why your assertion fired.
You don't need to include <malloc.h> unless you are using the extra facilities it provides. You aren't, so you should not include it; <stdlib.h> declares malloc() and free() perfectly well.
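The two strtok() properties described above (it writes '\0' into your buffer, and it merges adjacent delimiters) are easy to see in isolation. This is a minimal sketch; the helper name count_tokens is made up for illustration and is not part of the code in the question.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the tokens strtok() finds in buf.
   Note that this MODIFIES buf: every delimiter that ends a token
   is overwritten with '\0'. */
size_t count_tokens(char *buf, const char *delims)
{
    size_t n = 0;
    for (char *tok = strtok(buf, delims); tok != NULL; tok = strtok(NULL, delims))
        n++;
    return n;
}
```

With the buffer "a  b c" (two spaces after the a), a character count like SplitLen() sees three spaces, but count_tokens(buf, " ") returns 3 tokens, not 4, and afterwards strlen(buf) is 1 because the first space has become '\0'.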
This code works for me; I've annotated most of the places I made changes.
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
static int altSplitLen(char **array);
static char **Split(char *a_str, const char a_delim);
static int SplitLen(char *src, char sep);
int main(void)
{
    printf("Enter text separated by single spaces:\n");
    char a[4096]; // Simpler
    if (fgets(a, sizeof(a), stdin) != 0) // Error checked!
    {
        char **aa = Split(a, ' ');
        if (aa == 0) // Allocation failed
            return 1;
        int k = SplitLen(a, ' ');
        printf("SplitLen() says %d; altSplitLen() says %d\n", k, altSplitLen(aa));
        for (int i = 0; i < k; i++)
        {
            printf("%s\n", aa[i]);
        }
        /* Workaround for broken SplitLen() */
        {
            puts("Loop to null pointer:");
            char **data = aa;
            while (*data != 0)
                printf("[%s]\n", *data++);
        }
        {
            // Fix for major leak!
            char **data = aa;
            while (*data != 0)
                free(*data++);
        }
        free(aa); // Free the array of pointers too
    }
    return 0;
}
char **Split(char *a_str, const char a_delim)
{
    char **result = 0;
    size_t count = 0;
    char *tmp = a_str;
    char *last_comma = 0;
    /* Count how many elements will be extracted. */
    while (*tmp)
    {
        if (a_delim == *tmp)
        {
            count++;
            last_comma = tmp;
        }
        tmp++;
    }
    /* Add space for trailing token. */
    count += last_comma < (a_str + strlen(a_str) - 1);
    /* Add space for terminating null string so caller
       knows where the list of returned strings ends. */
    count++;
    result = malloc(sizeof(char *) * count);
    if (result)
    {
        char delim[2] = { a_delim, '\0' }; // Fix for inconsistent splitting
        size_t idx = 0;
        char *token = strtok(a_str, delim);
        while (token)
        {
            assert(idx < count);
            *(result + idx++) = strdup(token);
            token = strtok(0, delim);
        }
        assert(idx == count - 1);
        *(result + idx) = 0;
    }
    return result;
}
int SplitLen(char *src, char sep)
{
    int result = 0;
    for (size_t i = 0; i < strlen(src); i++)
    {
        if (src[i] == sep)
        {
            result += 1;
        }
    }
    return result;
}
static int altSplitLen(char **array)
{
    int i = 0;
    while (*array++ != 0)
        i++;
    return i;
}
Sample run:
$ parsing
Enter text separated by single spaces:
a b c d e f gg hhh iii jjjj exculpatory evidence
SplitLen() says 0; altSplitLen() says 12
Loop to null pointer:
[a]
[b]
[c]
[d]
[e]
[f]
[gg]
[hhh]
[iii]
[jjjj]
[exculpatory]
[evidence
]
$
Note that fgets() keeps the newline and gets() does not, so the newline was included in the output as part of the last token. Note also how the square brackets in the printf() format show the limits of each string; that is enormously helpful on many occasions.
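If you don't want that newline in the last token, one common idiom is to strip it with strcspn() right after fgets(), before splitting. This is a small sketch; the wrapper name read_line is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper: reads one line with fgets() and removes the
   trailing newline that fgets() keeps. Returns buf, or NULL on end of
   file or read error. */
char *read_line(char *buf, size_t size, FILE *fp)
{
    if (fgets(buf, (int)size, fp) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';   /* no-op if there is no newline */
    return buf;
}
```

In main() you would call read_line(a, sizeof(a), stdin) instead of fgets(), and the last token would then print as [evidence] on one line.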

C - Replacing char with string

I am writing a program that encodes text so that it can be put into a URL. The user inputs a string, and if it contains special characters (#, %, &, ?, etc.) I want to replace them with their corresponding character codes (%23, %25, %26, %3F, etc.). The problem is that the special characters are only 1 character long and the codes are 3 characters long, so the codes end up overwriting the characters after the special one. This is the code I am using to do the replacement:
char *p = enteredCharStr;
while ((p = strstr(p, specialCharArr[x])) != NULL )
{
    char *substr;
    substr = strstr(enteredCharStr, specialChar[x]);
    strncpy(substr, charCodesArr[x], 3);
    p++;
}
Example output from using my program with input: "this=this&that"
this%3Dis%26at
I would like the output to be:
this%3Dthis%26that
Any idea on how to implement what I am trying to do in C (no libraries)?
One way to approach this problem would be to allocate a second string that is three times as large as enteredCharStr (plus one byte for the terminating '\0'), and copy the characters over one by one; when you see a special character, write the replacement instead. You want it to be three times as large since, in the worst case, every character needs to be replaced.
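A minimal sketch of that three-times-larger buffer, assuming the set of special characters is "#%&?=" (extend it as needed). The name url_encode is hypothetical. It still relies on malloc(), strlen() and strchr() from the standard library, since some allocation is unavoidable when the output is longer than the input.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: replaces each character in `special` with "%XX".
   The caller must free() the result. Returns NULL if malloc() fails. */
char *url_encode(const char *in)
{
    static const char special[] = "#%&?=";        /* assumed set */
    static const char hex[] = "0123456789ABCDEF";
    size_t len = strlen(in);
    /* Worst case: every character becomes 3 bytes, plus 1 for '\0'. */
    char *out = malloc(3 * len + 1);
    char *o = out;

    if (out == NULL)
        return NULL;
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        if (strchr(special, c) != NULL) {
            *o++ = '%';
            *o++ = hex[c >> 4];     /* high nibble */
            *o++ = hex[c & 0x0F];   /* low nibble */
        } else {
            *o++ = (char)c;         /* ordinary character: copy as-is */
        }
    }
    *o = '\0';
    return out;
}
```

For the input "this=this&that" this produces "this%3Dthis%26that". The trade-off is that up to two thirds of the buffer may go unused; counting the special characters first avoids that.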
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
static int isspecial(int c){
    static const char table[] = "#%&?=<>"; // add more characters as needed
    return c != '\0' && strchr(table, c) ? 1 : 0;
}
char *encode(const char *s){
    size_t capa = 1024;
    char *buff = malloc(capa);
    size_t size = 0;
    if (buff == NULL)
        return NULL;
    for (; *s; ++s){
        // Room for up to 3 output characters plus sprintf()'s '\0'
        if (size + 4 > capa){
            char *tmp;
            capa += 32;
            tmp = realloc(buff, capa);
            if (tmp == NULL){
                free(buff);
                return NULL;
            }
            buff = tmp;
        }
        if (isspecial((unsigned char)*s)){
            // Uppercase hex digits, as RFC 3986 recommends
            size += sprintf(buff + size, "%%%02X", (unsigned char)*s);
        } else {
            buff[size++] = *s;
        }
    }
    buff[size++] = '\0';
    // Shrink to fit; keep the larger block if realloc() fails
    {
        char *tmp = realloc(buff, size);
        return tmp ? tmp : buff;
    }
}
int main(void){
    const char *enteredCharStr = "this=this&that";
    char *p = encode(enteredCharStr);
    if (p == NULL)
        return 1;
    printf("%s\n", p);
    free(p);
    return 0;
}
You need to make a new string. Here's an example:
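const char *str = "abc$ddd";
const char *p = str;
char *buf = malloc(strlen(str) + 1); // needs <stdlib.h> and <string.h>
if (buf != NULL) {
    char *pbuf = buf;
    while (*p) {
        if (*p != '$')
            *pbuf++ = *p; // copy every byte except '$'
        p++;
    }
    *pbuf = '\0';         // terminate the new string
}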
It copies every byte except '$' from str to buf, one byte at a time.
Note that in your case you need to compute the size of the new string correctly, since each replacement makes the result two characters longer.
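One way to get that size exactly is a counting pass before allocating. This is a sketch under the assumption that every special character becomes a 3-byte "%XX" code; the name encoded_size is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bytes needed (including the '\0') to hold `in`
   after each character listed in `special` is replaced by "%XX". */
size_t encoded_size(const char *in, const char *special)
{
    size_t size = 1;                              /* terminating '\0' */
    for (; *in != '\0'; in++)
        size += strchr(special, *in) != NULL ? 3 : 1;
    return size;
}
```

For "this=this&that" with the set "#%&?=", this gives 19: 14 input characters, 2 extra for each of the two replacements, and 1 for the terminator. You would then malloc() exactly that many bytes and fill the buffer in a second pass.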
A C 'string' is a null-terminated sequence of characters stored in a fixed-size array, so there is no built-in notion of insertion. You're effectively asking how to insert n characters into the middle of an array.
One strategy comes to mind:
To insert a string of length x at position i of an array of length n:
Resize the array to size n+x (using something like realloc).
Shift every character from position i onward x positions to the right, to positions i+x and beyond (memmove() handles the overlapping copy).
Write your string into the x positions now freed by this shuffle operation.
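The three steps above might look like this for a heap-allocated, null-terminated string. This is a sketch, not a drop-in solution; the name insert_at is hypothetical, and it assumes pos is no larger than strlen(s).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: inserts `ins` into the malloc()ed string `s`
   at index `pos`. Returns the (possibly moved) string, or NULL if
   realloc() fails, in which case `s` is still valid and unchanged. */
char *insert_at(char *s, size_t pos, const char *ins)
{
    size_t n = strlen(s);
    size_t x = strlen(ins);
    /* Step 1: grow to hold n + x characters plus the '\0'. */
    char *grown = realloc(s, n + x + 1);
    if (grown == NULL)
        return NULL;
    /* Step 2: shift the tail (including '\0') right by x; the regions
       overlap, so memmove() is required rather than memcpy(). */
    memmove(grown + pos + x, grown + pos, n - pos + 1);
    /* Step 3: write the inserted text into the gap. */
    memcpy(grown + pos, ins, x);
    return grown;
}
```

For the URL case you could overwrite the special character itself with '%' and insert the two hex digits right after it. Each insertion moves the whole tail of the string, so many insertions into a long string get slow; building a new string in one pass avoids that.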
Alternatively, allocate a new array that is big enough to hold your target string (i.e., with all the substitutions applied), then build the result in it: copy from the source array until you encounter a character you'd like to replace, copy the replacement string, and then continue reading from the source array.
I'm copying characters over one by one, and if I see a special character (in this code, only "#"), I copy in 3 characters, incrementing the index into the output buffer by 3.
You can also do something smarter to guess the buffer size, and perhaps loop over the entire operation, doubling the size of the buffer each time it overruns.
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char* argv[]){
    if (argc != 2) {
        exit(1);
    }
    char* input = argv[1];
    int bufferSize = 128;
    char* output = malloc(bufferSize);
    if (output == NULL) {
        exit(1);
    }
    int outIndex = 0;
    int inIndex = 0;
    while(input[inIndex] != '\0'){
        switch (input[inIndex])
        {
        case '#':
            // 3 characters for "%23" plus room for the final '\0'
            if(outIndex + 4 > bufferSize){
                // Overflow, retry or something.
                exit(2);
            }
            output[outIndex] = '%';
            output[outIndex+1] = '2';
            output[outIndex+2] = '3';
            outIndex = outIndex + 3;
            inIndex = inIndex + 1;
            break;
        // Other cases
        default:
            if(outIndex + 2 > bufferSize){
                exit(2);
            }
            output[outIndex] = input[inIndex];
            outIndex = outIndex + 1;
            inIndex = inIndex + 1;
            break;
        }
    }
    output[outIndex] = '\0';
    printf("%s\n", output);
    free(output);
    return 0;
}
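The "retry or something" branch could use the doubling idea mentioned above instead of exiting. A rough sketch, still handling only '#'; the names try_encode and encode_with_retry are hypothetical, and the starting size of 16 is arbitrary.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: encodes '#' as "%23" into out[0..cap-1].
   Returns the encoded length, or -1 if out is too small. */
static long try_encode(const char *in, char *out, size_t cap)
{
    size_t o = 0;
    for (; *in != '\0'; in++) {
        if (*in == '#') {
            if (o + 3 >= cap)          /* 3 bytes plus room for '\0' */
                return -1;
            memcpy(out + o, "%23", 3);
            o += 3;
        } else {
            if (o + 1 >= cap)
                return -1;
            out[o++] = *in;
        }
    }
    if (o >= cap)                      /* e.g. cap == 0 */
        return -1;
    out[o] = '\0';
    return (long)o;
}

/* Start small and double the buffer each time an attempt overruns.
   The caller must free() the result. */
char *encode_with_retry(const char *in)
{
    size_t cap = 16;
    for (;;) {
        char *out = malloc(cap);
        if (out == NULL)
            return NULL;
        if (try_encode(in, out, cap) >= 0)
            return out;
        free(out);
        if (cap > (size_t)-1 / 2)      /* doubling would overflow */
            return NULL;
        cap *= 2;
    }
}
```

Doubling keeps the number of retries logarithmic in the output length, at the cost of redoing the encoding on each retry.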

Resources