Is there a fast way to interpose a character between strings? - c

I wrote this function that will generate a single string out of a file list.
(e.g. if I have a folder with FileA.txt, FileB.png and FileC I'll get as output this string: FileA.txtFileB.pngFileC). Now I want to add a / character between each filename. (e.g. FileA.txt/FileB.png/FileC/) Is there a way to do it in "one blow" without having to repeat the same operation twice?
In other words, is there a way to do something like:
original_string = append2(original_string, new_string, '/');
instead of having to do
append(original_string, new_string);
append(original_string, "/");
?
Here's the function I wrote as reference:
/**
* @brief Concatenate all file names in a file list (putting a '/' between each of them)
* @param file_list The file list to serialize.
* @return A string containing all files in the file list.
*/
char *file_list_tostring(struct file_list *file_list) {
char *final_string = NULL;
size_t final_len = 0;
struct file_node *list_iter = file_list->first;
while (list_iter != NULL) {
char *tmp = list_iter->filename;
size_t tmp_len = strlen(tmp);
char *s = realloc(final_string, final_len + tmp_len + 1); // +1 for '\0'
if (s == NULL) {
perror("realloc");
exit(EXIT_FAILURE);
}
final_string = s;
memcpy(final_string + final_len, tmp, tmp_len + 1);
final_len += tmp_len;
list_iter = list_iter->next;
}
return final_string;
}
Maybe there is a simple way to interpose a single character between two strings?
Note: I know there's nothing wrong in repeating the same operation twice, I'm asking this question to know if there is a better way of doing so!

Yes, you can do sprintf:
#include <stdio.h>
int main()
{
char var1[] = "FileA.txt";
char var2[] = "FileB.png";
char var3[] = "FileC";
char result[30];
sprintf(result, "%s/%s/%s", var1, var2,var3);
printf("result: %s\n", result);
return 0;
}
And the result is like this:
result: FileA.txt/FileB.png/FileC
If needed, result can instead be a pointer to memory that you allocate to fit the combined length.
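A minimal sketch of that allocated variant, assuming a C99 snprintf(): calling it with a NULL buffer and size 0 only reports the length the output would need, so the buffer can be sized exactly. The helper name join3 is made up for this illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: join three names with '/' into a heap buffer.
   Returns NULL on failure; the caller frees the result. */
char *join3(const char *a, const char *b, const char *c)
{
    /* With a NULL buffer and size 0, snprintf() only reports the length
       (excluding the terminating '\0') that the output would need. */
    int need = snprintf(NULL, 0, "%s/%s/%s", a, b, c);
    if (need < 0)
        return NULL;

    char *out = malloc((size_t)need + 1);
    if (out == NULL)
        return NULL;

    /* Second call does the real formatting into the right-sized buffer. */
    snprintf(out, (size_t)need + 1, "%s/%s/%s", a, b, c);
    return out;
}
```

Called as join3("FileA.txt", "FileB.png", "FileC"), this should yield the same text as the fixed-size example, without having to guess the array size.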

As Michael Burr mentioned in a comment to the question, it is best to walk the list/array twice. On the first pass, calculate the total length of the string needed. Next, allocate the memory needed for the entire string. On the second pass, copy the contents. Do not forget to account for, and append, the string-terminating nul byte (\0).
Consider the following example functions dupcat() and dupcats():
#include <stdlib.h>
#include <string.h>
#include <stdarg.h>
#include <stdio.h>
char *dupcat(const size_t count, const char *parts[])
{
size_t i, len = 0;
char *dst, *end;
/* Calculate total length of parts. Skip NULL parts. */
for (i = 0; i < count; i++)
len += (parts[i]) ? strlen(parts[i]) : 0;
/* Add room for '\0'.
We add an extra 9 to 16 '\0's, just because
it is sometimes useful, and we do a dynamic
allocation anyway. */
len = (len | 7) + 9;
/* Allocate memory. */
dst = malloc(len);
if (!dst) {
fprintf(stderr, "dupcat(): Out of memory; tried to allocate %zu bytes.\n", len);
exit(EXIT_FAILURE);
}
/* Copy parts. */
end = dst;
for (i = 0; i < count; i++) {
const char *src = parts[i];
/* We could use strlen() and memcpy(),
but a loop like this will work just as well. */
if (src)
while (*src)
*(end++) = *(src++);
}
/* Sanity check time! */
if (end >= dst + len) {
fprintf(stderr, "dupcat(): Arguments were modified during duplication; buffer overrun!\n");
free(dst); /* We can omit this free(), but only in case of exit(). */
exit(EXIT_FAILURE);
}
/* Terminate string (and clear padding). */
memset(end, '\0', (size_t)(dst + len - end));
/* Done! */
return dst;
}
char *dupcats(const size_t count, ...)
{
size_t i, len = 0;
char *dst, *end;
va_list args;
/* Calculate total length of 'count' source strings. */
va_start(args, count);
for (i = 0; i < count; i++) {
const char *src = va_arg(args, const char *);
if (src)
len += strlen(src);
}
va_end(args);
/* Add room for end-of-string '\0'.
Because it is often useful to know you have
at least one extra '\0' at the end of the string,
and we do a dynamic allocation anyway,
we pad the string with 9 to 16 '\0',
aligning 'len' to a multiple of 8. */
len = (len | 7) + 9;
/* Allocate memory for the string. */
dst = malloc(len);
if (!dst) {
fprintf(stderr, "dupcats(): Out of memory; tried to allocate %zu bytes.\n", len);
exit(EXIT_FAILURE);
}
/* Copy the source strings. */
end = dst;
va_start(args, count);
for (i = 0; i < count; i++) {
const char *src = va_arg(args, const char *);
/* We could use strlen() and memcpy() here;
however, this loop is easier to follow. */
if (src)
while (*src)
*(end++) = *(src++);
}
va_end(args);
/* Sanity check. */
if (end >= dst + len) {
fprintf(stderr, "dupcats(): Arguments were modified during duplication; buffer overrun!\n");
free(dst); /* We can omit this free(), but only in case of exit(). */
exit(EXIT_FAILURE);
}
/* Add end-of-string '\0' (filling the padding). */
memset(end, '\0', dst + len - end);
/* Done. */
return dst;
}
int main(int argc, char *argv[])
{
char *result;
result = dupcat(argc - 1, (const char **)(argv + 1));
printf("Arguments concatenated: '%s'.\n", result);
free(result);
result = dupcats(5, "foo", "/", "bar", "/", "baz");
printf("Concatenating 'foo', '/', 'bar', '/', and 'baz': '%s'.\n", result);
free(result);
return EXIT_SUCCESS;
}
Neither dupcat() nor dupcats() will ever return NULL: they will print an error message to standard error and exit, if an error occurs.
dupcat() takes an array of strings, and returns a dynamically allocated concatenated copy with at least eight bytes of nul padding.
dupcats() takes a variable number of pointers, and returns a dynamically allocated concatenated copy with at least eight bytes of nul padding.
Both functions treat NULL pointers as if they were empty strings. For both functions, the first parameter is the number of strings to concatenate.
(Since OP did not show the definitions of struct file_list or struct file_node, I did not bother to write a list-based version. However, it should be trivial to adapt from one of the two versions shown.)
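For illustration, a list-based two-pass version might look roughly like the sketch below. The struct definitions are assumptions, since the question does not show them, and file_list_join is a hypothetical name. Unlike dupcat(), it returns NULL on allocation failure instead of exiting, and it adds no extra padding.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed definitions: the question does not show these structs. */
struct file_node {
    char *filename;
    struct file_node *next;
};

struct file_list {
    struct file_node *first;
};

/* Two passes: measure, allocate once, then copy each name followed by sep.
   Returns NULL if allocation fails; the caller frees the result. */
char *file_list_join(const struct file_list *list, char sep)
{
    size_t len = 0;
    const struct file_node *it;

    /* First pass: total length, one separator per name. */
    for (it = list->first; it != NULL; it = it->next)
        len += strlen(it->filename) + 1;

    char *out = malloc(len + 1);           /* +1 for '\0' */
    if (out == NULL)
        return NULL;

    /* Second pass: copy each name and append the separator in one step. */
    char *end = out;
    for (it = list->first; it != NULL; it = it->next) {
        size_t n = strlen(it->filename);
        memcpy(end, it->filename, n);
        end += n;
        *end++ = sep;
    }
    *end = '\0';
    return out;
}
```

Because the separator is written in the same loop iteration as the name, there is no second append call per file, which is what the question asked for.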
In some cases, a variant that constructs a valid path from a fixed base part, with one or more relative file or directory names concatenated, and POSIXy ./ removed and ../ backtracked (but not out of base subtree), is very useful.
If carefully written, it allows the program to accept untrusted paths, relative to a specific subtree. (The combined paths are confined to that subtree, but symlinks and hardlinks can still be used to escape the subtree.)
One possible implementation is as follows:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>
char *dynamic_path(const char *const subtree,
const size_t parts,
const char *part[])
{
const size_t subtree_len = (subtree) ? strlen(subtree) : 0;
size_t parts_len = 0;
size_t total_len, i;
char *path, *mark, *curr;
/* Calculate the length of each individual part.
Include room for a leading slash.
*/
for (i = 0; i < parts; i++)
parts_len += (part[i]) ? 1 + strlen(part[i]) : 0;
/* Add room for the string-terminating '\0'.
We're paranoid, and add a bit more padding. */
total_len = ((subtree_len + parts_len) | 7) + 9;
/* Allocate memory for the combined path. */
path = malloc(total_len);
if (!path) {
errno = ENOMEM;
return NULL;
}
/* If the user specified a subtree, we use it as the fixed prefix. */
if (subtree_len > 0) {
memcpy(path, subtree, subtree_len);
mark = path + subtree_len;
/* Omit a trailing /. We enforce it below anyway. */
if (parts > 0 && subtree_len > 1 && mark[-1] == '/')
--mark;
} else
mark = path;
/* Append the additional path parts. */
curr = mark;
for (i = 0; i < parts; i++) {
const size_t len = (part[i]) ? strlen(part[i]) : 0;
if (len > 0) {
/* Each path part is a separate file/directory name,
so there is an (implicit) slash before each one. */
if (part[i][0] != '/')
*(curr++) = '/';
memcpy(curr, part[i], len);
curr += len;
}
}
/* Sanity check. */
if (curr >= path + total_len) {
/* Buffer overrun occurred. */
fprintf(stderr, "Buffer overrun in dynamic_path()!\n");
free(path); /* Can be omitted if we exit(). */
exit(EXIT_FAILURE);
}
/* Terminate string (and clear padding). */
memset(curr, '\0', (size_t)(path + total_len - curr));
/* Cleanup pass.
Convert "/foo/../" to "/", but do not backtrack over mark.
Combine consecutive slashes and /./ to a single slash.
*/
{
char *src = mark;
char *dst = mark;
while (*src)
if (src[0] == '/' && src[1] == '.' && src[2] == '.' && (!src[3] || src[3] == '/')) {
src += 3; /* Skip over /.. */
/* Backtrack, but do not underrun mark. */
if (dst > mark) {
dst--;
while (dst > mark && *dst != '/')
dst--;
}
/* Never consume the mark slash. */
if (dst == mark)
dst++;
} else
if (src[0] == '/' && src[1] == '.' && (!src[2] || src[2] == '/')) {
src += 2; /* Skip over /. */
if (dst == mark || dst[-1] != '/')
*(dst++) = '/';
} else
if (src[0] == '/') {
src++;
if (dst == mark || dst[-1] != '/')
*(dst++) = '/';
} else
*(dst++) = *(src++);
/* Clear removed part. */
if (dst < src)
memset(dst, '\0', (size_t)(src - dst));
}
return path;
}
int main(int argc, char *argv[])
{
char *path;
if (argc < 2) {
fprintf(stderr, "\nUsage: %s PREFIX [ PATHNAME ... ]\n\n", argv[0]);
return EXIT_FAILURE;
}
path = dynamic_path(argv[1], argc - 2, (const char **)(argv + 2));
if (!path) {
fprintf(stderr, "dynamic_path(): %s.\n", strerror(errno));
return EXIT_FAILURE;
}
printf("%s\n", path);
free(path);
return EXIT_SUCCESS;
}
Note that I wrote the above version from scratch (and dedicate it to the public domain (CC0)), so you should thoroughly test it before relying on it in production. (My intent is for it to be a useful example or basis that will help you write your own implementation tailored to your needs.)
If you do find any bugs or issues in it, let me know in a comment, so I can verify and fix.

Related

How can i add a character after every word in a string?

So what I have is a string (str) that I get from fgets(str, x, stdin);.
If I write, for example, "Hello World", I want to be able to add a character after each word in the string.
To get "Hello? World?" as an example. I think I've made it a lot harder for myself by trying to solve it this way:
add(char *s, char o, char c){
int i, j = 0;
for (i = 0; s[i] != '\0'; i++) {
if (s[i] != o) {
s[j] = s[i];
}
else {
s[j] = c;
}
j++;
}
}
add(str, ' ','?');
printf("\n%s", str);
This prints "Hello?World" without the space. The only way I can see this working is to move everything after the first "?" one place to the right, so that a space sits where the "W" was, and then add a "?" at the end. But for much longer strings I can't see myself doing that.
You can't safely extend a string with more characters without ensuring the buffer that holds the string is big enough. So let's devise a solution that counts how many additional characters are needed, allocates a buffer big enough to hold a string of that length, then does the copy loop and returns the new string to the caller.
#include <stdlib.h>
char* add(const char* s, char o, char c)
{
    size_t len = 0; // counted in the loop below
    const char* str = s;
    char* result = NULL;
    char* newstring = NULL;
    // count how many characters are needed for the new string
    while (*str)
    {
        len += (*str == o) ? 2 : 1;
        str++;
    }
    // allocate a result buffer big enough to hold the new string
    result = malloc(len + 1); // +1 for null char
    if (result == NULL)
    {
        return NULL; // out of memory; the caller must check for this
    }
    // now copy the string and insert the "c" parameter whenever "o" is seen
    newstring = result;
    str = s;
    while (*str)
    {
        *newstring++ = *str;
        if (*str == o)
        {
            *newstring++ = c;
        }
        str++;
    }
    *newstring = '\0';
    return result;
}
Then your code to invoke is as follows:
char* newstring = add(str, ' ', '?');
printf("\n%s", newstring);
free(newstring);
#include <stdio.h>
#include <string.h>
int main(void) {
char text[] = "Hello World";
for(char* word = strtok(text, " .,?!"); word; word = strtok(NULL, " .,?!"))
printf("%s? ", word);
return 0;
}
Example Output
Hello? World?
Knowing the amount of storage available when you reach a position where the new character will be inserted, you can check whether the new character will fit in the available storage, move from the current character through end-of-string to the right by one and insert the new character, e.g.
#include <stdio.h>
#include <string.h>
#define MAXC 1024
char *add (char *s, const char find, const char replace)
{
char *p = s; /* pointer to string */
while (*p) { /* for each char */
if (*p == find) {
size_t remain = strlen (p); /* get remaining length */
if ((p - s + remain < MAXC - 1)) { /* if space remains for char */
memmove (p + 1, p, remain + 1); /* move chars to right by 1 */
*p++ = replace; /* replace char, advance ptr */
}
else { /* warn if string full */
fputs ("error: replacement will exceed storage.\n", stderr);
break;
}
}
p++; /* advance to next char */
}
return s; /* return pointer to beginning of string */
}
...
(Note: the string must be mutable, not a string literal, and must have additional storage for the inserted character. If you need to pass a string literal, or you have no additional storage in the current string, make a copy as shown by @Selbie in his answer.)
Putting together a short example with a 1024-char buffer for storage, you can do something like:
#include <stdio.h>
#include <string.h>
#define MAXC 1024
char *add (char *s, const char find, const char replace)
{
char *p = s; /* pointer to string */
while (*p) { /* for each char */
if (*p == find) {
size_t remain = strlen (p); /* get remaining length */
if ((p - s + remain < MAXC - 1)) { /* if space remains for char */
memmove (p + 1, p, remain + 1); /* move chars to right by 1 */
*p++ = replace; /* replace char, advance ptr */
}
else { /* warn if string full */
fputs ("error: replacement will exceed storage.\n", stderr);
break;
}
}
p++; /* advance to next char */
}
return s; /* return pointer to beginning of string */
}
int main (void) {
char buf[MAXC];
if (!fgets (buf, MAXC, stdin))
return 1;
buf[strcspn(buf, "\n")] = 0;
puts (add (buf, ' ', '?'));
}
Example Use/Output
$ ./bin/str_replace_c
Hello World?
Hello? World?
Look things over and let me know if you have questions.
Just for fun, here's my implementation. It modifies the string in-place and in O(n) time. It assumes that the char-buffer is large enough to hold the additional characters, so it's up to the calling code to ensure that.
#include <stdio.h>
void add(char *s, char o, char c)
{
int num_words = 0;
char * p = s;
while(*p) if (*p++ == o) num_words++;
char * readFrom = p;
char * writeTo = p+num_words;
char * nulByte = writeTo;
// Insert c-chars, iterating backwards to avoid overwriting chars we have yet to read
while(readFrom >= s)
{
*writeTo = *readFrom;
if (*writeTo == o)
{
--writeTo;
*writeTo = c;
}
writeTo--;
readFrom--;
}
// If our string doesn't end in a 'c' char, append one
if ((nulByte > s)&&(*(nulByte-1) != c))
{
*nulByte++ = c;
*nulByte = '\0';
}
}
int main(int argc, char ** argv)
{
char test_string[1000] = "Hello World";
add(test_string, ' ','?');
printf("%s\n", test_string);
return 0;
}
The program's output is:
$ ./a.out
Hello? World?

C printf changing contents of variable inside function

So I am writing a little function to parse paths, it looks like this:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int parse_path() {
char *pathname = "this/is/a/path/hello";
int char_index = 0;
char current_char = pathname[char_index];
char *buffer = malloc(2 * sizeof(char));
char *current_char_str = malloc(2 * sizeof(char));
while (current_char != '\0' && (int)current_char != 11) {
if (char_index == 0 && current_char == '/') {
char_index++; current_char = pathname[char_index];
continue;
}
while (current_char != '/' && current_char != '\0') {
current_char_str[0] = current_char;
current_char_str[1] = '\0';
buffer = (char *)realloc(buffer, (strlen(buffer) + 2) * sizeof(char));
strcat(buffer, current_char_str);
char_index++; current_char = pathname[char_index];
}
if (strlen(buffer)) {
printf("buffer(%s)\n", buffer);
current_char_str[0] = '\0';
buffer[0] = '\0';
}
char_index++; current_char = pathname[char_index];
}
};
int main(int argc, char *argv[]) {
parse_path();
printf("hello\n");
return 0;
}
Now, there is undefined behavior in my code, it looks like the printf call inside the main method is changing the buffer variable... as you can see, the output of this program is:
buffer(this)
buffer(is)
buffer(a)
buffer(path)
buffer(hello)
buffer(buffer(%s)
)
buffer(hello)
hello
I have looked at other posts where the same sort of problem is mentioned and people have told me to use a static char array etc. but that does not seem to help.
Any suggestions?
For some reason, at one time in this program the "hello" string from printf is present in my buffer variable.
Sebastian, if you are still having problems after @PaulOgilvie's answer, then it is most likely due to not understanding his answer. Your problem is due to buffer being allocated but not initialized. When you call malloc, it allocates a block of at least the size requested, and returns a pointer to the beginning address for the new block -- but does nothing with the contents of the new block -- meaning the block is full of indeterminate values that just happened to be in the range of addresses for the new block.
So when you call strcat(buffer, current_char_str); the first time and there is nothing but random garbage in buffer and no nul-terminating character -- you do invoke Undefined Behavior. (there is no end-of-string in buffer to be found)
To fix the error, you simply need to make buffer an empty-string after it is allocated by setting the first character to the nul-terminating character, or use calloc instead to allocate the block which will ensure all bytes are set to zero.
For example:
int parse_path (const char *pathname)
{
int char_index = 0, ccs_index = 0;
char current_char = pathname[char_index];
char *buffer = NULL;
char *current_char_str = NULL;
if (!(buffer = malloc (2))) {
perror ("malloc-buffer");
return 0;
}
*buffer = 0; /* make buffer empty-string, or use calloc */
...
Also do not hardcode paths or numbers (that includes the 0 and 2, but we will let those slide for now). Hardcoding "this/is/a/path/hello" within parse_path() makes it a rather un-useful function. Instead, make your pathname variable a parameter so the function can take any path you want to send to it...
While the whole idea of realloc'ing 2-characters at a time is rather inefficient, you always need to realloc with a temporary pointer rather than the pointer itself. Why? realloc can and does fail and when it does, it returns NULL. If you are using the pointer itself, you will overwrite your current pointer address with NULL in the event of failure, losing the address to your existing block of memory forever creating a memory leak. Instead,
void *tmp = realloc (buffer, strlen(buffer) + 2);
if (!tmp) {
perror ("realloc-tmp");
goto alldone; /* use goto to break nested loops */
}
buffer = tmp; /* assign only after realloc has succeeded */
...
}
alldone:;
/* return something meaningful, your function is type 'int' */
}
A short example incorporating the fixes and temporary pointer would be:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int parse_path (const char *pathname)
{
int char_index = 0, ccs_index = 0;
char current_char = pathname[char_index];
char *buffer = NULL;
char *current_char_str = NULL;
if (!(buffer = malloc (2))) {
perror ("malloc-buffer");
return 0;
}
*buffer = 0; /* make buffer empty-string, or use calloc */
if (!(current_char_str = malloc (2))) {
perror ("malloc-current_char_str");
return 0;
}
while (current_char != '\0' && (int) current_char != 11) {
if (char_index == 0 && current_char == '/') {
char_index++;
current_char = pathname[char_index];
continue;
}
while (current_char != '/' && current_char != '\0') {
current_char_str[0] = current_char;
current_char_str[1] = '\0';
void *tmp = realloc (buffer, strlen(buffer) + 2);
if (!tmp) {
perror ("realloc-tmp");
goto alldone;
}
buffer = tmp; /* realloc may have moved the block; use the new address */
strcat(buffer, current_char_str);
char_index++;
current_char = pathname[char_index];
}
if (strlen(buffer)) {
printf("buffer(%s)\n", buffer);
current_char_str[0] = '\0';
buffer[0] = '\0';
}
if (current_char != '\0') {
char_index++;
current_char = pathname[char_index];
}
}
alldone:;
return ccs_index;
}
int main(int argc, char* argv[]) {
parse_path ("this/is/a/path/hello");
printf ("hello\n");
return 0;
}
(note: your logic is quite tortured above and you could just use a fixed buffer of PATH_MAX size (include limits.h) and dispense with allocating. Otherwise, you should allocate some anticipated number of characters for buffer to begin with, like strlen (pathname) + 1, which would ensure sufficient space for each path component and its terminator without reallocating. I'd rather over-allocate by 1000-characters than screw up indexing worrying about reallocating 2-characters at a time...)
Example Use/Output
> bin\parsepath.exe
buffer(this)
buffer(is)
buffer(a)
buffer(path)
buffer(hello)
hello
A More Straight-Forward Approach Without Allocation
Simply using a buffer of PATH_MAX size or an allocated buffer of at least strlen (pathname) + 1 bytes will allow you to simply step down your string without any reallocations, e.g.
#include <stdio.h>
#include <limits.h> /* for PATH_MAX - but VS doesn't provide it, so we check */
#ifndef PATH_MAX
#define PATH_MAX 2048
#endif
void parse_path (const char *pathname)
{
const char *p = pathname;
char buffer[PATH_MAX], *b = buffer;
while (*p) {
if (*p == '/') {
if (p != pathname) {
*b = 0;
printf ("buffer (%s)\n", buffer);
b = buffer;
}
}
else
*b++ = *p;
p++;
}
if (b != buffer) {
*b = 0;
printf ("buffer (%s)\n", buffer);
}
}
int main (int argc, char* argv[]) {
char *path = argc > 1 ? argv[1] : "this/is/a/path/hello";
parse_path (path);
printf ("hello\n");
return 0;
}
Example Use/Output
> parsepath2.exe
buffer (this)
buffer (is)
buffer (a)
buffer (path)
buffer (hello)
hello
Or
> parsepath2.exe another/path/that/ends/in/a/filename
buffer (another)
buffer (path)
buffer (that)
buffer (ends)
buffer (in)
buffer (a)
buffer (filename)
hello
Now you can pass any path you would like to parse as an argument to your program and it will be parsed without having to change or recompile anything. Look things over and let me know if you have questions.
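For the allocated alternative mentioned above, a rough sketch could look like this. The function name parse_path_alloc and its return convention are assumptions made for illustration. It performs a single allocation sized from the input, so there is no realloc and no fixed PATH_MAX limit.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant: print each component, return how many were printed,
   or -1 if the single allocation fails. */
int parse_path_alloc(const char *pathname)
{
    /* No component can be longer than the whole path, so one block of
       strlen(pathname) + 1 bytes is always enough. */
    char *buffer = malloc(strlen(pathname) + 1);
    if (buffer == NULL)
        return -1;

    int count = 0;
    char *b = buffer;
    for (const char *p = pathname; ; p++) {
        if (*p == '/' || *p == '\0') {
            if (b != buffer) {             /* skip empty components */
                *b = '\0';
                printf("buffer (%s)\n", buffer);
                b = buffer;
                count++;
            }
            if (*p == '\0')
                break;
        } else {
            *b++ = *p;
        }
    }
    free(buffer);
    return count;
}
```

Treating the end of the string like a final separator lets one branch flush both inner and last components.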
You strcat something to buffer but buffer has never been initialized. strcat will first search for the first null character and then copy the string to concatenate there. You are now probably overwriting memory that is not yours.
Before the outer while loop do:
*buffer= '\0';
There are 2 main problems in your code:
the arrays allocated by malloc() are not initialized, so you have undefined behavior when you call strlen(buffer) before setting a null terminator inside the array buffer points to. The program could just crash, but in your case whatever contents is present in the memory block and after it is retained up to the first null byte.
just before the end of the outer loop, you should only take the next character from the path if the current character is a '/'. In your case, you skip the null terminator and the program has undefined behavior as you read beyond the end of the string constant. Indeed, the parse continues through another string constant "buffer(%s)\n" and through yet another one "hello". The string constants seem to be adjacent without padding on your system, which is just a coincidence.
Here is a corrected version:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
void parse_path(const char *pathname) {
int char_index = 0;
char current_char = pathname[char_index];
char *buffer = calloc(1, 1);
char *current_char_str = calloc(2, 1); /* two bytes: one character plus '\0' */
while (current_char != '\0' && current_char != 11) {
if (char_index == 0 && current_char == '/') {
char_index++; current_char = pathname[char_index];
continue;
}
while (current_char != '/' && current_char != '\0') {
current_char_str[0] = current_char;
current_char_str[1] = '\0';
buffer = (char *)realloc(buffer, strlen(buffer) + 2);
strcat(buffer, current_char_str);
char_index++; current_char = pathname[char_index];
}
if (strlen(buffer)) {
printf("buffer(%s)\n", buffer);
current_char_str[0] = '\0';
buffer[0] = '\0';
}
if (current_char == '/') {
char_index++; current_char = pathname[char_index];
}
}
}
int main(int argc, char *argv[]) {
parse_path("this/is/a/path/hello");
printf("hello\n");
return 0;
}
Output:
buffer(this)
buffer(is)
buffer(a)
buffer(path)
buffer(hello)
hello
Note however some remaining problems:
allocation failure is not tested, resulting in undefined behavior,
allocated blocks are not freed, resulting in memory leaks,
it is unclear why you test current_char != 11: did you mean to stop at TAB or newline? (In ASCII, 11 is the vertical tab '\v'; TAB is 9 and newline is 10.)
Here is a much simpler version with the same behavior:
#include <stdio.h>
#include <string.h>
void parse_path(const char *pathname) {
int i, n;
for (i = 0; pathname[i] != '\0'; i += n) {
if (pathname[i] == '/') {
n = 1; /* skip path separators and empty items */
} else {
n = strcspn(pathname + i, "/"); /* get the length of the path item */
printf("buffer(%.*s)\n", n, pathname + i);
}
}
}
int main(int argc, char *argv[]) {
parse_path("this/is/a/path/hello");
printf("hello\n");
return 0;
}

Extract the file name and its extension in C

So we have a path string /home/user/music/thomas.mp3.
What is an easy way to extract the file name (without extension, "thomas") and its extension ("mp3") from this string? One function for the file name and one for the extension. We have only GNU libc at hand.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_FILENAME_SIZE 256
char *filename(char *str) {
char *result;
char *last;
if ((last = strrchr(str, '.')) != NULL ) {
if ((*last == '.') && (last == str))
return str;
else {
result = (char*) malloc(MAX_FILENAME_SIZE);
snprintf(result, MAX_FILENAME_SIZE, "%.*s", (int)(last - str), str); /* sizeof result would be the pointer size */
return result;
}
} else {
return str;
}
}
char *extname(char *str) {
char *result;
char *last;
if ((last = strrchr(str, '.')) != NULL) {
if ((*last == '.') && (last == str))
return "";
else {
result = (char*) malloc(MAX_FILENAME_SIZE);
snprintf(result, MAX_FILENAME_SIZE, "%s", last + 1); /* sizeof result would be the pointer size */
return result;
}
} else {
return ""; // Empty/NULL string
}
}
Use basename to get the filename and then you can use something like this to get the extension.
const char *get_filename_ext(const char *filename) {
const char *dot = strrchr(filename, '.');
if(!dot || dot == filename) return "";
return dot + 1;
}
Edit:
Try something like.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libgen.h>
static void printFileInfo(char *path) {
char *bname;
char *path2 = strdup(path);
bname = basename(path2);
printf("%s.%s\n",bname, get_filename_ext(bname));
free(path2);
}
Regarding your actual code (all the other answers so far say to scrap that and do something else, which is good advice, however I am addressing your code as it contains blunders that it'd be good to learn about in advance of next time you try to write something).
Firstly:
strncpy(str, result, (size_t) (last-str) + 1);
is not good. You have dest and src around the wrong way; and further this function does not null-terminate the output (unless the input is short enough, which it isn't). Generally speaking strncpy is almost never a good solution to a problem; either strcpy if you know the length, or snprintf.
Simpler and less error-prone would be:
snprintf(result, sizeof result, "%.*s", (int)(last - str), str);
Similarly in the other function,
snprintf(result, sizeof result, "%s", last + 1);
The snprintf function never overflows buffer and always produces a null-terminated string, so long as you get the buffer length right!
Now, even if you fixed those then you have another fundamental problem in that you are returning a pointer to a buffer that is destroyed when the function returns. You could fix ext by just returning last + 1, since that is null-terminated anyway. But for filename you have the usual set of options:
return a pointer and a length, and treat it as a length-counted string, not a null-terminated one
return pointer to mallocated memory
return pointer to static buffer
expect the caller to pass in a buffer and a buffer length, which you just write into
Finally, returning NULL on failure is probably a bad idea; if there is no . then return the whole string for filename, and an empty string for ext. Then the calling code does not have to contort itself with checks for NULL.
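As a sketch of the last option in the list (the caller supplies the buffer), something along these lines might work. The name split_ext, its signature, and the choice to strip the directory using '/' as the only separator are assumptions made for this illustration. Following the advice above, it never returns NULL: with no extension it yields the whole name and an empty string.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the name without its extension into the
   caller's buffer and return a pointer to the extension inside path
   ("" when there is none). Nothing is allocated. */
const char *split_ext(const char *path, char *name, size_t name_size)
{
    const char *slash = strrchr(path, '/');
    const char *base = slash ? slash + 1 : path;
    const char *dot = strrchr(base, '.');

    /* No dot, or a leading dot as in ".bashrc": treat as "no extension". */
    if (dot == NULL || dot == base) {
        snprintf(name, name_size, "%s", base);
        return "";
    }
    /* "%.*s" copies at most (dot - base) characters; snprintf truncates
       to name_size and always null-terminates when name_size > 0. */
    snprintf(name, name_size, "%.*s", (int)(dot - base), base);
    return dot + 1;
}
```

The returned extension points into the caller's original string, so it stays valid only as long as that string does.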
Here is a routine I use for that problem:
Separates original string into separate strings of path, file_name and extension.
Will work for Windows and Linux, relative or absolute style paths. Will handle directory names with embedded ".". Will handle file names without extensions.
/////////////////////////////////////////////////////////
//
// Example:
// Given path == "C:\\dir1\\dir2\\dir3\\file.exe"
// will return path_ as "C:\\dir1\\dir2\\dir3"
// Will return base_ as "file"
// Will return ext_ as "exe"
//
/////////////////////////////////////////////////////////
void GetFileParts(char *path, char *path_, char *base_, char *ext_)
{
char *base;
char *ext;
char nameKeep[MAX_PATHNAME_LEN];
char pathKeep[MAX_PATHNAME_LEN];
char pathKeep2[MAX_PATHNAME_LEN]; //preserve original input string
char File_Ext[40];
char baseK[40];
int lenFullPath, lenExt_, lenBase_;
char *sDelim={0};
int iDelim=0;
int rel=0, i;
if(path)
{ //determine type of path string (C:\\, \\, /, ./, .\\)
if( (strlen(path) > 1) &&
(
((path[1] == ':' ) &&
(path[2] == '\\'))||
(path[0] == '\\') ||
(path[0] == '/' ) ||
((path[0] == '.' ) &&
(path[1] == '/' ))||
((path[0] == '.' ) &&
(path[1] == '\\'))
)
)
{
sDelim = calloc(5, sizeof(char));
/* // */if(path[0] == '\\') iDelim = '\\', strcpy(sDelim, "\\");
/* c:\\ */if(path[1] == ':' ) iDelim = '\\', strcpy(sDelim, "\\"); // also satisfies path[2] == '\\'
/* / */if(path[0] == '/' ) iDelim = '/' , strcpy(sDelim, "/" );
/* ./ */if((path[0] == '.')&&(path[1] == '/')) iDelim = '/' , strcpy(sDelim, "/" );
/* .\\ */if((path[0] == '.')&&(path[1] == '\\')) iDelim = '\\' , strcpy(sDelim, "\\" );
/* \\\\ */if((path[0] == '\\')&&(path[1] == '\\')) iDelim = '\\', strcpy(sDelim, "\\");
if(path[0]=='.')
{
rel = 1;
path[0]='*';
}
if(!strstr(path, ".")) // if no filename, set path to have trailing delim,
{ //set others to "" and return
lenFullPath = strlen(path);
if(path[lenFullPath-1] != iDelim)
{
strcat(path, sDelim);
path_[0]=0;
base_[0]=0;
ext_[0]=0;
}
}
else
{
nameKeep[0]=0; //works with C:\\dir1\file.txt
pathKeep[0]=0;
pathKeep2[0]=0; //preserves *path
File_Ext[0]=0;
baseK[0]=0;
//Get lenth of full path
lenFullPath = strlen(path);
strcpy(nameKeep, path);
strcpy(pathKeep, path);
strcpy(pathKeep2, path);
strcpy(path_, path); //capture path
//Get length of extension:
for(i=lenFullPath-1;i>=0;i--)
{
if(pathKeep[i]=='.') break;
}
lenExt_ = (lenFullPath - i) -1;
base = strtok(path, sDelim);
while(base)
{
strcpy(File_Ext, base);
base = strtok(NULL, sDelim);
}
strcpy(baseK, File_Ext);
lenBase_ = strlen(baseK) - lenExt_;
baseK[lenBase_-1]=0;
strcpy(base_, baseK);
path_[lenFullPath -lenExt_ -lenBase_ -1] = 0;
ext = strtok(File_Ext, ".");
ext = strtok(NULL, ".");
if(ext) strcpy(ext_, ext);
else strcpy(ext_, "");
}
memset(path, 0, lenFullPath);
strcpy(path, pathKeep2);
if(rel)path_[0]='.';//replace first "." for relative path
free(sDelim);
}
}
}
Here is an old-school algorithm that will do the trick.
char path[100] = "/home/user/music/thomas.mp3";
int offset_extension, offset_name;
int len = strlen(path);
int i;
for (i = len; i >= 0; i--) {
if (path[i] == '.')
break;
if (path[i] == '/') {
i = len;
break;
}
}
if (i == -1) {
fprintf(stderr,"Invalid path");
exit(EXIT_FAILURE);
}
offset_extension = i;
for (; i >= 0; i--)
if (path[i] == '/')
break;
if (i == -1) {
fprintf(stderr,"Invalid path");
exit(EXIT_FAILURE);
}
offset_name = i;
char *extension, name[100];
extension = &path[offset_extension+1];
memcpy(name, &path[offset_name+1], offset_extension - offset_name - 1);
name[offset_extension - offset_name - 1] = '\0'; /* memcpy does not terminate the string */
Then you have both pieces of information in the variables name and extension:
printf("%s %s", name, extension);
This will print:
thomas mp3
I know this is old. But I tend to use strtok for things like this.
/* strtok example */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MAX_TOKENS 20 /* Some reasonable values */
#define MAX_STRING 128 /* Easy enough to make dynamic with mallocs */
int main ()
{
char str[] ="/home/user/music/thomas.mp3";
char sep[] = "./";
char collect[MAX_TOKENS][MAX_STRING];
/* Zeroing the array guarantees every token ends up null-terminated,
   because strncpy below never copies more than MAX_STRING - 1 bytes. */
memset(collect, 0, sizeof collect);
char * pch = strtok (str, sep);
int ccount = 0;
/* collect all separated text, without overrunning the array */
while(pch != NULL && ccount < MAX_TOKENS) {
strncpy( collect[ccount++], pch, MAX_STRING - 1);
pch = strtok (NULL, sep);
}
/* output tokens. */
for(int i=0; i<ccount; ++i)
printf ("Token: %s\n", collect[i]);
return 0;
}
This is a rough example, but it makes it easy to deal with the tokens afterwards: the last token is the extension, the second-to-last is the base name, and so on.
I also find it useful for rebuilding paths for different platforms - replace / with \.

How to remove the extension from a file name?

I want to drop the last three characters from a file name and keep the rest.
I have this code:
char* remove(char* mystr) {
char tmp[] = {0};
unsigned int x;
for (x = 0; x < (strlen(mystr) - 3); x++)
tmp[x] = mystr[x];
return tmp;
}
Try:
char *remove(char* myStr) {
char *retStr;
char *lastExt;
if (myStr == NULL) return NULL;
if ((retStr = malloc (strlen (myStr) + 1)) == NULL) return NULL;
strcpy (retStr, myStr);
lastExt = strrchr (retStr, '.');
if (lastExt != NULL)
*lastExt = '\0';
return retStr;
}
You'll have to free the returned string yourself. It simply finds the last . in the string and replaces it with a null terminator character. It will handle errors (passing NULL or running out of memory) by returning NULL.
It won't work with things like /this.path/is_bad, since it will find the . in the non-file portion, but you could handle this by also doing a strrchr of / (or whatever your path separator is) and ensuring its position is NULL or before the . position.
A more general purpose solution to this problem could be:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// remove_ext: removes the "extension" from a file spec.
// myStr is the string to process.
// extSep is the extension separator.
// pathSep is the path separator (0 means to ignore).
// Returns an allocated string identical to the original but
// with the extension removed. It must be freed when you're
// finished with it.
// If you pass in NULL or the new string can't be allocated,
// it returns NULL.
char *remove_ext (char* myStr, char extSep, char pathSep) {
char *retStr, *lastExt, *lastPath;
// Error checks and allocate string.
if (myStr == NULL) return NULL;
if ((retStr = malloc (strlen (myStr) + 1)) == NULL) return NULL;
// Make a copy and find the relevant characters.
strcpy (retStr, myStr);
lastExt = strrchr (retStr, extSep);
lastPath = (pathSep == 0) ? NULL : strrchr (retStr, pathSep);
// If it has an extension separator.
if (lastExt != NULL) {
// and it's to the right of the path separator.
if (lastPath != NULL) {
if (lastPath < lastExt) {
// then remove it.
*lastExt = '\0';
}
} else {
// Has extension separator with no path separator.
*lastExt = '\0';
}
}
// Return the modified string.
return retStr;
}
int main (int c, char *v[]) {
char *s;
printf ("[%s]\n", (s = remove_ext ("hello", '.', '/'))); free (s);
printf ("[%s]\n", (s = remove_ext ("hello.", '.', '/'))); free (s);
printf ("[%s]\n", (s = remove_ext ("hello.txt", '.', '/'))); free (s);
printf ("[%s]\n", (s = remove_ext ("hello.txt.txt", '.', '/'))); free (s);
printf ("[%s]\n", (s = remove_ext ("/no.dot/in_path", '.', '/'))); free (s);
printf ("[%s]\n", (s = remove_ext ("/has.dot/in.path", '.', '/'))); free (s);
printf ("[%s]\n", (s = remove_ext ("/no.dot/in_path", '.', 0))); free (s);
return 0;
}
and this produces:
[hello]
[hello]
[hello]
[hello.txt]
[/no.dot/in_path]
[/has.dot/in]
[/no]
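Use rindex to locate the last "." character (rindex is a legacy BSD function; the standard C equivalent is strrchr, which takes the same arguments). If the string is writable, you can replace the dot with the string terminator char ('\0') and you're done.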
char * rindex(const char *s, int c);
DESCRIPTION
The rindex() function locates the last character matching c (converted to a char) in the null-terminated string s.
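A minimal sketch of that in-place approach, written with the standard strrchr instead of rindex. The function name strip_ext_inplace is made up for illustration, and the buffer passed in is assumed to be writable (not a string literal):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: cut the string at its last '.', in place.
   The buffer must be writable; nothing changes if there is no dot. */
static void strip_ext_inplace(char *s)
{
    assert(s != NULL);
    char *dot = strrchr(s, '.');   /* last '.', or NULL if none */
    if (dot != NULL)
        *dot = '\0';
}
```

Called on a writable array such as char name[] = "archive.tar.gz", this leaves "archive.tar" in name. Like the simple version above, it does not check whether the dot belongs to a directory component.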
If you literally just want to remove the last three characters, because you somehow know that your filename has an extension exactly three chars long (and you want to keep the dot):
char *remove_three(const char *filename) {
size_t len = strlen(filename);
if (len < 3) return NULL; /* too short to hold a 3-char extension */
char *newfilename = malloc(len-2);
if (!newfilename) return NULL; /* allocation failed */
memcpy(newfilename, filename, len-3);
newfilename[len - 3] = 0;
return newfilename;
}
Or let the caller provide the destination buffer (which they must ensure is long enough):
char *remove_three(char *dst, const char *filename) {
size_t len = strlen(filename);
memcpy(dst, filename, len-3);
dst[len - 3] = 0;
return dst;
}
If you want to generically remove a file extension, that's harder, and should normally use whatever filename-handling routines your platform provides (basename on POSIX, _wsplitpath_s on Windows) if there's any chance that you're dealing with a path rather than just the final part of the filename:
/* warning: may modify filename. To avoid this, take a copy first
dst may need to be longer than filename, for example currently
"file.txt" -> "./file.txt". For this reason it would be safer to
pass in a length with dst, and/or allow dst to be NULL in which
case return the length required */
void remove_extn(char *dst, char *filename) {
strcpy(dst, dirname(filename));
size_t len = strlen(dst);
dst[len] = '/';
dst += len+1;
strcpy(dst, basename(filename));
char *dot = strrchr(dst, '.');
/* retain the '.' To remove it do dot[0] = 0 */
if (dot) dot[1] = 0;
}
Come to think of it, you might want to pass dst+1 rather than dst to strrchr, since a filename starting with a dot maybe shouldn't be truncated to just ".". Depends what it's for.
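The dotfile point can be sketched on its own, assuming the input is already a bare file name with no directory part. The helper name strip_ext_keep_dotfile is hypothetical, and unlike remove_extn it removes the dot together with the extension:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper for a bare file name (no directory part).
   The search starts at name + 1, so a leading dot is never treated
   as the extension separator and ".bashrc" is left alone. */
static void strip_ext_keep_dotfile(char *name)
{
    assert(name != NULL);
    if (name[0] == '\0')
        return;                          /* empty name: nothing to do */
    char *dot = strrchr(name + 1, '.');  /* skip the first character */
    if (dot != NULL)
        *dot = '\0';
}
```

Whether a hidden file such as ".config.bak" should become ".config" (as it does here) depends on what the result is used for.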
I would try the following algorithm:
last_dot = -1
for each char in str:
if char = '.':
last_dot = index(char)
if last_dot != -1:
str[last_dot] = '\0'
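One possible C rendering of the pseudocode above, as a sketch only. The name cut_at_last_dot is an assumption, and the string is modified in place, so it must be writable:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name: scan forward once, remember the index of the
   last '.', then terminate the string there. */
static void cut_at_last_dot(char *str)
{
    assert(str != NULL);
    ptrdiff_t last_dot = -1;             /* -1 means "no dot seen" */
    for (size_t i = 0; str[i] != '\0'; i++) {
        if (str[i] == '.')
            last_dot = (ptrdiff_t)i;
    }
    if (last_dot != -1)
        str[last_dot] = '\0';
}
```

This does the same job as a single strrchr call; the explicit loop just mirrors the pseudocode step by step.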
Just replace the dot with a null character ('\0'). If you know that your extension is always 3 characters long you can just do:
char file[] = "test.png";
file[strlen(file) - 4] = 0;
puts(file);
This will output "test". Also, you shouldn't return a pointer to a local variable. The compiler will also warn you about this.
To get paxdiablo's second more general purpose solution to work in a C++ compiler I changed this line:
if ((retStr = malloc (strlen (myStr) + 1)) == NULL) return NULL;
to:
if ((retStr = static_cast<char*>(malloc (strlen (myStr) + 1))) == NULL) return NULL;
Hope this helps someone.
This should do the job:
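char* remove(char* oldstr) {
int oldlen = 0;
while (oldstr[oldlen] != '\0') { // measure the string
++oldlen;
}
int newlen = oldlen - 1;
while (newlen > 0 && oldstr[newlen] != '.') { // walk back to the last dot
--newlen;
}
if (newlen <= 0) { // no dot found (or only a leading one): keep the whole name
newlen = oldlen;
}
char* newstr = new char[newlen + 1]; // +1 for the terminator
for (int i = 0; i < newlen; ++i) {
newstr[i] = oldstr[i];
}
newstr[newlen] = '\0';
return newstr; // caller must delete[] the result
}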
Get location and just copy up to that location into a new char *.
i = 0;
n = 0;
while(argv[1][i] != '\0') { // get length of filename
i++; }
n = i; // default: no extension, keep the whole name
for(; i > -1; i--) { // look for extension working backwards
if(argv[1][i] == '.') {
n = i; // index of the extension's dot
break; } }
memcpy(new_filename, argv[1], n);
new_filename[n] = '\0'; // memcpy does not add the terminator
This is a simple way to change the extension:
....
char outputname[255] = "";
sscanf(inputname, "%250[^.]", outputname); // foo.bar => foo (width leaves room for ".txt")
strcat(outputname, ".txt"); // foo => foo.txt (sprintf from a buffer into itself is undefined)
....
With configurable minimum file length and configurable maximum extension length. Returns index where extension was changed to null character, or -1 if no extension was found.
int32_t strip_extension(char *in_str)
{
static const size_t name_min_len = 1;
static const size_t max_ext_len = 4;
size_t len = strlen(in_str); /* sizeof(in_str) would only give the pointer size */
/* Check chars starting at end of string to find last '.' */
for (size_t i = len; i-- > name_min_len; )
{
if (len - i - 1 > max_ext_len) break; /* extension would be too long */
if (in_str[i] == '.')
{
in_str[i] = '\0';
return (int32_t)i;
}
}
return -1;
}
I use this code:
void remove_extension(char* s) {
char* dot = 0;
while (*s) {
if (*s == '.') dot = s; // last dot
else if (*s == '/' || *s == '\\') dot = 0; // ignore dots before path separators
s++;
}
if (dot) *dot = '\0';
}
It handles the Windows path convention correctly (both / and \ can be path separators).

Expand Tabs to Spaces in C?

I need to expand tabs in an input line, so that they are spaces (with a width of 8 columns). I tried adapting some earlier code of mine that replaced the last space in every line longer than 10 characters with a '\n' to start a new line. Is there a way in C to expand tabs to 8 spaces? I am sure it is simple, I just can't seem to get it.
Here's my code:
int v = 0;
int w = 0;
int tab;
extern char line[];
while (v < length) {
if(line[v] == '\t')
tab = v;
if (w == MAXCHARS) {
// THIS IS WHERE I GET STUCK
line[tab] = ' ';
// set w to 0, so loop starts over
w = 0;
}
++v;
++w;
}
This isn't really a question about the C language; it's a question about finding the right algorithm -- you could use that algorithm in any language.
Anyhow, you can't do this at all without reallocating line[] to point at a larger buffer (unless it's a large fixed length, in which case you need to be worried about overflows); as you're expanding the tabs, you need more memory to store the new, larger lines, so character replacement such as you're trying to do simply won't work.
My suggestion: Rather than trying to operate in place (or trying to operate in memory, even) I would suggest writing this as a filter -- reading from stdin and writing to stdout one character at a time; that way you don't need to worry about memory allocation or deallocation or the changing length of line[].
If the context this code is being used in requires it to operate in memory, consider implementing an API similar to realloc(), wherein you return a new pointer; if you don't need to change the length of the string being handled you can simply keep the original region of memory, but if you do need to resize it, the option is available.
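A minimal sketch of the filter idea, assuming 8-column tab stops as in the question. The names expand_tabs_stream and TAB_STOP are made up for illustration; the only state kept is the current output column:

```c
#include <assert.h>
#include <stdio.h>

#define TAB_STOP 8   /* assumed tab width, taken from the question */

/* Hypothetical filter: copy `in` to `out` one character at a time,
   replacing each tab with spaces up to the next multiple of TAB_STOP.
   No line buffer is needed, so line length is unlimited. */
static void expand_tabs_stream(FILE *in, FILE *out)
{
    assert(in != NULL && out != NULL);
    int c;
    size_t col = 0;                      /* column of the next output char */
    while ((c = getc(in)) != EOF) {
        if (c == '\t') {
            do {                         /* always at least one space */
                putc(' ', out);
                col++;
            } while (col % TAB_STOP != 0);
        } else {
            putc(c, out);
            col = (c == '\n') ? 0 : col + 1;
        }
    }
}
```

A complete program would just call expand_tabs_stream(stdin, stdout) from main. Because nothing is stored, there is no buffer to overflow or reallocate.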
You need a separate buffer to write the output to, since it will in general be longer than the input:
void detab(char* in, char* out, size_t max_len) {
size_t i = 0;
while (*in && i < max_len - 1) {
if (*in == '\t') {
in++;
out[i++] = ' ';
while (i % 8 && i < max_len - 1) {
out[i++] = ' ';
}
} else {
out[i++] = *in++;
}
}
out[i] = 0;
}
You must preallocate enough space for out (which in the worst case could be 8 * strlen(in) + 1), and out cannot be the same as in.
EDIT: As suggested by Jonathan Leffler, the max_len parameter now makes sure we avoid buffer overflows. The resulting string will always be null-terminated, even if it is cut short to avoid such an overflow. (I also renamed the function, and changed int to size_t for added correctness :).)
I would probably do something like this:
Iterate through the string once, only counting the tabs (and the string length if you don't already know that).
Allocate original_size + 7 * number_of_tabs bytes of memory (where original_size counts the null byte).
Iterate through the string another time, copying every non-tab byte to the new memory and inserting 8 spaces for every tab.
If you want to do the replacement in place instead of creating a new string, you'll have to make sure that the passed-in pointer points to a buffer large enough to hold the new string, which is longer than the original because each tab becomes 8 spaces (7 bytes more per tab).
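The count-then-allocate steps can be sketched roughly as follows. The function name tabs_to_8_spaces is hypothetical, and, as the steps describe, every tab becomes exactly 8 spaces rather than padding to the next tab stop:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: replace every tab with exactly 8 spaces.
   Returns a malloc'd string the caller must free, or NULL on failure. */
static char *tabs_to_8_spaces(const char *src)
{
    assert(src != NULL);
    size_t len = 0, tabs = 0;
    for (; src[len] != '\0'; len++)          /* pass 1: length and tab count */
        if (src[len] == '\t')
            tabs++;

    char *dst = malloc(len + 1 + 7 * tabs);  /* original size + 7 per tab */
    if (dst == NULL)
        return NULL;

    char *p = dst;
    for (size_t i = 0; i < len; i++) {       /* pass 2: copy and expand */
        if (src[i] == '\t') {
            memset(p, ' ', 8);
            p += 8;
        } else {
            *p++ = src[i];
        }
    }
    *p = '\0';
    return dst;
}
```

Because the size is computed up front, a single allocation is enough and no realloc is needed during the copy.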
Here's a reentrant, recursive version which automatically allocates a buffer of correct size:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
struct state
{
char *dest;
const char *src;
size_t tab_size;
size_t size;
_Bool expand;
};
static void recexp(struct state *state, size_t di, size_t si)
{
size_t start = si;
size_t pos = si;
for(; state->src[pos]; ++pos)
{
if(state->src[pos] == '\n') start = pos + 1;
else if(state->src[pos] == '\t')
{
size_t str_len = pos - si;
size_t tab_len = state->tab_size - (pos - start) % state->tab_size;
recexp(state, di + str_len + tab_len, pos + 1);
if(state->dest)
{
memcpy(state->dest + di, state->src + si, str_len);
memset(state->dest + di + str_len, ' ', tab_len);
}
return;
}
}
state->size = di + pos - si + 1;
if(state->expand && !state->dest) state->dest = malloc(state->size);
if(state->dest)
{
memcpy(state->dest + di, state->src + si, pos - si);
state->dest[state->size - 1] = 0;
}
}
size_t expand_tabs(char **dest, const char *src, size_t tab_size)
{
struct state state = { dest ? *dest : NULL, src, tab_size, 0, dest };
recexp(&state, 0, 0);
if(dest) *dest = state.dest;
return state.size;
}
int main(void)
{
char *expansion = NULL; // must be `NULL` for automatic allocation
size_t size = expand_tabs(&expansion,
"spam\teggs\tfoo\tbar\nfoobar\tquux", 4);
printf("expanded size: %lu\n", (unsigned long)size);
puts(expansion);
}
If expand_tabs() is called with dest == NULL, the function will return the size of the expanded string, but no expansion is actually done; if dest != NULL but *dest == NULL, a buffer of correct size will be allocated and must be deallocated by the programmer; if dest != NULL and *dest != NULL, the expanded string will be put into *dest, so make sure the supplied buffer is large enough.
Untested, but something like this should work:
int v = 0;
int tab;
extern char line[];
while (v < length){
if (line[v] == '\t') {
tab = TAB_WIDTH - (v % TAB_WIDTH); /* spaces needed to reach the next tab stop (1..TAB_WIDTH); C's || would only yield 0 or 1 */
/* I'm assuming MAXCHARS is the size of your array. You either need
* to bail, or resize the array if the expanding the tab would make
* the string too long. */
assert((length + tab) < MAXCHARS);
if (tab != 1) {
memmove(line + v + tab - 1, line + v, length - v + 1);
}
memset(line + v, ' ', tab);
length += tab - 1;
v += tab;
} else {
++v;
}
}
Note that this is O(n*m) where n is the line size and m is the number of tabs. That probably isn't an issue in practice.
There are a myriad ways to convert tabs in a string into 1-8 spaces. There are inefficient ways to do the expansion in-situ, but the easiest way to handle it is to have a function that takes the input string and a separate output buffer that is big enough for an expanded string. If the input is 6 tabs plus an X and a newline (8 characters + terminating null), the output would be 48 blanks, X, and a newline (50 characters + terminating null) - so you might need a much bigger output buffer than input buffer.
#include <stddef.h>
#include <assert.h>
static int detab(const char *str, char *buffer, size_t buflen)
{
char *end = buffer + buflen;
char *dst = buffer;
const char *src = str;
char c;
assert(buflen > 0);
while ((c = *src++) != '\0' && dst < end)
{
if (c != '\t')
*dst++ = c;
else
{
do
{
*dst++ = ' ';
} while (dst < end && (dst - buffer) % 8 != 0);
}
}
if (dst < end)
{
*dst = '\0';
return(dst - buffer);
}
else
return -1;
}
#ifdef TEST
#include <stdio.h>
#include <string.h>
#ifndef TEST_INPUT_BUFFERSIZE
#define TEST_INPUT_BUFFERSIZE 4096
#endif /* TEST_INPUT_BUFFERSIZE */
#ifndef TEST_OUTPUT_BUFFERSIZE
#define TEST_OUTPUT_BUFFERSIZE (8 * TEST_INPUT_BUFFERSIZE)
#endif /* TEST_OUTPUT_BUFFERSIZE */
int main(void)
{
char ibuff[TEST_INPUT_BUFFERSIZE];
char obuff[TEST_OUTPUT_BUFFERSIZE];
while (fgets(ibuff, sizeof(ibuff), stdin) != 0)
{
if (detab(ibuff, obuff, sizeof(obuff)) >= 0)
fputs(obuff, stdout);
else
fprintf(stderr, "Failed to detab input line: <<%.*s>>\n",
(int)(strlen(ibuff) - 1), ibuff);
}
return(0);
}
#endif /* TEST */
The biggest trouble with this test is that it is hard to demonstrate that it handles overflows in the output buffer properly. That's why there are the two '#define' sequences for the buffer sizes - with very large defaults for real work and independently configurable buffer sizes for stress testing. If the source file is dt.c, use a compilation like this:
make CFLAGS="-DTEST -DTEST_INPUT_BUFFERSIZE=32 -DTEST_OUTPUT_BUFFERSIZE=32" dt
If the 'detab()' function is to be used outside this file, you'd create a header to contain its declaration, and you'd include that header in this code, and the function would not be static, of course.
Here is one that will malloc(3) a bigger buffer of exactly the right size and return the expanded string. It does no division or modulus ops. It even comes with a test driver. Safe with -Wall -Wno-parentheses if using gcc.
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
static char *expand_tabs(const char *s) {
int i, j, extra_space;
char *r, *result = NULL;
for(i = 0; i < 2; ++i) {
for(j = extra_space = 0; s[j]; ++j) {
if (s[j] == '\t') {
int es0 = 8 - (j + extra_space & 7);
if (result != NULL) {
memset(r, ' ', es0); /* write es0 spaces (1..8) */
r += es0;
}
extra_space += es0 - 1;
} else if (result != NULL)
*r++ = s[j];
}
if (result == NULL)
if ((r = result = malloc(j + extra_space + 1)) == NULL)
return NULL;
}
*r = 0;
return result;
}
#include <stdio.h>
int main(int ac, char **av) {
char space[1000];
while (fgets(space, sizeof space, stdin) != NULL) {
char *s = expand_tabs(space);
fputs(s, stdout);
free(s);
}
return 0;
}

Resources