I implemented a read_line function:
#include<stdlib.h>
#include<stdio.h>
#include<stdbool.h>
char* read_line(){
const char UNIX_LINEBREAK = '\n';
const char WINDOWS_LINEBREAK = '\r';
const char C_STRING_TERMINATOR = '\0';
char extra_linebreak;
char current_letter;
char* line = NULL;
int position = 0;
bool reading_line = true;
while(reading_line){
scanf("%c", &current_letter);
if(current_letter == UNIX_LINEBREAK || current_letter == EOF){
reading_line = false;
}
else if(current_letter == WINDOWS_LINEBREAK) {
reading_line = false;
extra_linebreak = (char)getchar();
}
else {
line = (char*) realloc(line, sizeof(char) * (position + 1));
line[position] = current_letter;
position ++;
}
}
line = (char*) realloc(line, sizeof(char) * (position + 1));
line[position] = C_STRING_TERMINATOR;
return line;
}
Which I'm using for reading strings in the format:
operation number number
for example:
sum 13 13
However I'm implementing operations with numbers that may (and will) overflow the max int size. For example:
sum 23879238932898239832983298329839229383928329 239823983298392893289238932883290312803291832109230189
Which forces me to read them in a string format, parse them and finally work with them through a linked list (there may be better approaches, but that's not the point yet). For now, I'm trying to use auxiliary buffers (operation, first_number_buffer and second_number_buffer) with sscanf to split the line read with read_line into three substrings.
#include <stdio.h>
#include <readline.h>
int main (){
char* line = read_line();
char operation[4];
char* first_number_buffer;
char* second_number_buffer;
sscanf(line, "%s %s %s", operation, first_number_buffer, second_number_buffer);
printf("%s\n%s\n%s\n", line,first_number_buffer,second_number_buffer);
}
The code above doesn't work, since I'm not allocating first_number_buffer and second_number_buffer yet. I would like to know if there's an efficient way of using sscanf in that situation. I didn't manage to find good results on Google, since results for scanf crowd out those for sscanf.
The problem seems to be this: usually, to dynamically allocate a string, one uses realloc to grow its size one character at a time. However, sscanf tries to "throw" all the content of the parsed substring at once. Since the strings have unpredictable sizes, I cannot simply make them static, like I did with operation.
Yes, I could use a big static buffer, but this seems to be an important task, and since I'm an undergrad I would like to know the proper way to do that. Thanks in advance!
I believe I managed to do what I initially intended.
It's hard to use sscanf for the task, since one would have to allocate the memory that sscanf writes into beforehand, and in the context of the question the required size is unknown.
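That said, the size is at least bounded: no token can be longer than the line it came from. A minimal sketch of sscanf used that way follows. This is not what I ended up using, and the helper name split_three is made up for illustration; it simply gives each token a buffer as large as the whole line.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the question): splits "op a b" into three
   heap-allocated strings. Returns 0 on success, -1 on failure.
   On success the caller must free *op, *a and *b. */
int split_three(const char *line, char **op, char **a, char **b)
{
    size_t cap = strlen(line) + 1;   /* no token can be longer than the line */

    *op = malloc(cap);
    *a = malloc(cap);
    *b = malloc(cap);
    if (*op && *a && *b && sscanf(line, "%s %s %s", *op, *a, *b) == 3)
        return 0;

    /* Allocation failed or fewer than three tokens were found. */
    free(*op);
    free(*a);
    free(*b);
    *op = *a = *b = NULL;
    return -1;
}
```

This wastes memory (three buffers of line length each), which is the price of letting sscanf write everything at once.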
However, @Cheatah suggested using strtok, which worked fine. Here's the final version of the code, using it:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include<stdbool.h>
char* read_line(){
const char UNIX_LINEBREAK = '\n';
const char WINDOWS_LINEBREAK = '\r';
const char C_STRING_TERMINATOR = '\0';
char extra_linebreak;
char current_letter;
char* line = NULL;
int position = 0;
bool reading_line = true;
while(reading_line){
scanf("%c", &current_letter);
if(current_letter == UNIX_LINEBREAK || current_letter == EOF){
reading_line = false;
}
else if(current_letter == WINDOWS_LINEBREAK) {
reading_line = false;
extra_linebreak = (char)getchar();
}
else {
line = (char*) realloc(line, sizeof(char) * (position + 1));
line[position] = current_letter;
position ++;
}
}
line = (char*) realloc(line, sizeof(char) * (position + 1));
line[position] = C_STRING_TERMINATOR;
return line;
}
int main (){
char* line = read_line();
char* operation;
char* first_number_buffer;
char* second_number_buffer;
char *line_split = strtok(line, " ");
operation = (char *) malloc((strlen(line_split) + 1) * sizeof(char)); /* +1 for the terminating '\0' */
strcpy(operation, line_split);
line_split = strtok(NULL, " ");
first_number_buffer = (char *) malloc((strlen(line_split) + 1) * sizeof(char)); /* +1 for the terminating '\0' */
strcpy(first_number_buffer, line_split);
line_split = strtok(NULL, " ");
second_number_buffer = (char *) malloc((strlen(line_split) + 1) * sizeof(char)); /* +1 for the terminating '\0' */
strcpy(second_number_buffer, line_split);
printf("%s\n%s\n%s\n", line,first_number_buffer,second_number_buffer);
}
input:
sum 23879238932898239832983298329839229383928329 239823983298392893289238932883290312803291832109230189
output:
sum
23879238932898239832983298329839229383928329
239823983298392893289238932883290312803291832109230189
The code can be improved in many ways. Some people pointed out that EOF is not properly checked within read_line(), and main can definitely be refactored into smaller functions.
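On the EOF point, here is a rough sketch of how the check could look. It is only an illustration under a few assumptions: it reads from a FILE * instead of calling scanf, the name read_line_from is made up, and it returns NULL when the input ends before any character is read. The key detail is that the character is stored in an int, so EOF can be told apart from every valid character.

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch of an EOF-aware line reader (hypothetical name). Returns a malloc'd
   string without the line terminator, or NULL on allocation failure or when
   EOF is reached before any character is read. */
char *read_line_from(FILE *in)
{
    char *line = NULL;
    size_t len = 0;
    int c;   /* int, not char: EOF must stay distinguishable */

    while ((c = fgetc(in)) != EOF && c != '\n') {
        if (c == '\r')
            continue;   /* drop the CR of a Windows CRLF pair */
        char *grown = realloc(line, len + 2);   /* +1 for c, +1 for '\0' */
        if (grown == NULL) {
            free(line);
            return NULL;
        }
        line = grown;
        line[len++] = (char)c;
    }
    if (c == EOF && len == 0)
        return NULL;    /* nothing left to read */
    if (line == NULL) { /* empty line: return "" rather than NULL */
        line = malloc(1);
        if (line == NULL)
            return NULL;
    }
    line[len] = '\0';
    return line;
}
```

Calling read_line_from(stdin) would give the same behaviour as read_line() for ordinary input, while also terminating cleanly when the input is closed.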
However, the idea of using strtok() as a substitute for sscanf() works for this case, even with an indefinite number of tokens. See an example of a strtok inside a while: https://www.cplusplus.com/reference/cstring/strtok/
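For reference, a small sketch of that strtok-in-a-while pattern, written for this thread rather than copied from the linked page. The helper name split_tokens and the max_tokens limit are assumptions made for illustration. The stored pointers refer to memory inside the original line, so no copies are made.

```c
#include <string.h>

/* Hypothetical helper: splits `line` in place on spaces and stores up to
   `max_tokens` pointers in `tokens`. Returns the number of tokens stored.
   strtok overwrites separators in `line` with '\0', so `line` must be
   writable (not a string literal). */
size_t split_tokens(char *line, char **tokens, size_t max_tokens)
{
    size_t count = 0;
    char *tok = strtok(line, " ");

    while (tok != NULL && count < max_tokens) {
        tokens[count++] = tok;
        tok = strtok(NULL, " ");   /* NULL means: continue with the same string */
    }
    return count;
}
```

Keep in mind that strtok keeps hidden state between calls, so two such loops cannot be interleaved.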
When running the following C file, copying the characters returned by fgetc() into my tmp buffer results in unknown characters being copied over. The characters received from fgetc() are the expected ones, but after assigning them through my tmp pointer, unknown characters show up in the result.
I've tried looking for the reason online, but haven't had any luck. From what I have read it could be something to do with UTF-8 and ASCII issues. However, I'm not sure about the fix. I'm a relatively new C programmer and still new to memory management.
Output:
TMP: Hello, DATA!�
TEXT: Hello, DATA!�
game.c:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <allegro5/allegro5.h>
#include <allegro5/allegro_font.h>
const int WIN_WIDTH = 1366;
const int WIN_HEIGHT = 768;
char *readFile(const char *fileName) {
FILE *file;
file = fopen(fileName, "r");
if (file == NULL) {
printf("File could not be opened for reading.\n");
}
size_t tmpSize = 1;
char *tmp = (char *)malloc(tmpSize);
if (tmp == NULL) {
printf("malloc() could not be called on tmp.\n");
}
for (int c = fgetc(file); c != EOF; c = fgetc(file)) {
if (c != NULL) {
if (tmpSize > 1)
tmp = (char *)realloc(tmp, tmpSize);
tmp[tmpSize - 1] = (char *)c;
tmpSize++;
}
}
tmp[tmpSize] = 0;
fclose(file);
printf("TMP: %s\n", tmp);
return tmp;
}
int main(int argc, char **argv) {
al_init();
al_install_keyboard();
ALLEGRO_TIMER* timer = al_create_timer (1.0 / 30.0);
ALLEGRO_EVENT_QUEUE *queue = al_create_event_queue();
ALLEGRO_DISPLAY* display = al_create_display(WIN_WIDTH, WIN_HEIGHT);
ALLEGRO_FONT* font = al_create_builtin_font();
al_register_event_source(queue, al_get_keyboard_event_source());
al_register_event_source(queue, al_get_display_event_source(display));
al_register_event_source(queue, al_get_timer_event_source(timer));
int redraw = 1;
ALLEGRO_EVENT event;
al_start_timer(timer);
char *text = readFile("game.DATA");
printf("TEXT: %s\n", text);
while (1) {
al_wait_for_event(queue, &event);
if (event.type == ALLEGRO_EVENT_TIMER)
redraw = 1;
else if ((event.type == ALLEGRO_EVENT_KEY_DOWN) || (event.type == ALLEGRO_EVENT_DISPLAY_CLOSE))
break;
if (redraw && al_is_event_queue_empty(queue)) {
al_clear_to_color(al_map_rgb(0, 0, 0));
al_draw_text(font, al_map_rgb(255, 255, 255), 0, 0, 0, text);
al_flip_display();
redraw = false;
}
}
free(text);
al_destroy_font(font);
al_destroy_display(display);
al_destroy_timer(timer);
al_destroy_event_queue(queue);
return 0;
}
game.DATA file:
Hello, DATA!
What I use to run the program:
gcc game.c -o game $(pkg-config allegro-5 allegro_font-5 --libs --cflags)
--EDIT--
I tried taking the file reading code and running it in a new C file. For some reason it works there, but not in game.c alongside the Allegro code.
test.c:
#include <stdlib.h>
#include <stdio.h>
char *readFile(const char *fileName) {
FILE *file;
file = fopen(fileName, "r");
if (file == NULL) {
printf("File could not be opened for reading.\n");
}
size_t tmpSize = 1;
char *tmp = (char *)malloc(tmpSize);
if (tmp == NULL) {
printf("malloc() could not be called on tmp.\n");
}
for (int c = fgetc(file); c != EOF; c = fgetc(file)) {
if (c != NULL) {
if (tmpSize > 1)
tmp = (char *)realloc(tmp, tmpSize);
tmp[tmpSize - 1] = (char *)c;
tmpSize++;
}
}
tmp[tmpSize] = 0;
fclose(file);
printf("TMP: %s\n", tmp);
return tmp;
}
void main() {
char *text = readFile("game.DATA");
printf("TEXT: %s\n", text);
free(text);
return 0;
}
Produces the correct output always:
TMP: Hello, DATA!
TEXT: Hello, DATA!
When you write a loop that updates various things each time through, like you do with tmpSize in your loop here, it's important to have a handle on what the theoretical computer science types call your "loop invariants". That is, what is it that's true each time through the loop? It's important not only to maintain your loop invariants properly, but also to pick your loop invariants so that they're easy to maintain, and easy for a later reader to understand and to verify.
Since tmpSize starts out as 1, I'm guessing your loop invariant is trying to be, "tmpSize is always one more than the size of the string I've read so far". A reason for picking that slightly-strange loop invariant is, of course, that you'll need that extra byte for the terminating \0. The other clue is that you're setting tmp[tmpSize-1] = c;.
But here's the first problem. When we exit the loop, and if tmpSize is still one more than the size of the string you've read so far, let's see what happens. Suppose we read three characters. So tmpSize should be 4. So we'll set tmp[4] = 0;. But wait! Remember, arrays in C are 0-based. So the three characters we read are in tmp[0], tmp[1], and tmp[2], and we want the terminating \0 character to go into tmp[3], not tmp[4]. Something is wrong.
But actually, it's worse than that. I wasn't at all sure I understood the loop invariant, so I cheated, and inserted a few debugging printouts. Right before the realloc call, I added
printf("realloc %zu\n", tmpSize);
and at the end, right before the tmp[tmpSize] = 0; line, I added
printf("final %zu\n", tmpSize);
The last few lines it printed (while reading a game.DATA file containing "Hello, DATA!" just like yours) were:
...
realloc 10
realloc 11
realloc 12
final 13
But this is off by two! If the last reallocation gave the array a size of 12, the valid indices are from 0 to 11. But somehow we end up writing the \0 into cell 13.
It took me a while to figure it out, but the second problem is that you do the reallocation at the top of the loop, before you've incremented tmpSize.
To me, the loop invariant of "one more than the size of the string read so far" is just too hard to think about. I very much prefer to use a loop invariant where the "size" variable keeps track of the number of characters I have read, not +1 or -1 off of that. Let's see how that loop might look. (I've also cleaned up a few other things.)
size_t tmpSize = 0;
char *tmp = malloc(tmpSize+1);
if (tmp == NULL) {
printf("malloc() failed.\n");
exit(1);
}
for (int c = getc(file); c != EOF; c = getc(file)) {
printf("realloc %zu\n", tmpSize+1+1);
tmp = realloc(tmp, tmpSize+1+1); /* +1 for c, +1 for \0 */
if (tmp == NULL) {
printf("realloc() failed.\n");
exit(1);
}
tmp[tmpSize] = c;
tmpSize++;
}
printf("final %zu\n", tmpSize);
tmp[tmpSize] = '\0';
There's still something fishy here -- I said I didn't like "fudge factors" like +1, and here I've got two -- but at least now the debugging printouts go
...
realloc 11
realloc 12
realloc 13
final 12
so it looks like I'm not overrunning the allocated memory any more.
To make this even better, I want to take a slightly different approach. You're not supposed to worry about efficiency at first, but I can tell you that a loop that calls realloc to make the buffer bigger by 1, each time it reads a character, can end up being really inefficient. So let's make a few more changes:
size_t nchAllocated = 0;
size_t nchRead = 0;
char *tmp = NULL;
for (int c = getc(file); c != EOF; c = getc(file)) {
if(nchAllocated <= nchRead) {
nchAllocated += 10;
printf("realloc %zu\n", nchAllocated);
tmp = realloc(tmp, nchAllocated);
if (tmp == NULL) {
printf("realloc() failed.\n");
exit(1);
}
}
tmp[nchRead++] = c;
}
printf("final %zu\n", nchRead);
tmp[nchRead] = '\0';
Now there are two separate variables: nchAllocated keeps track of exactly how many characters I've allocated, and nchRead keeps track of exactly how many characters I've read. And although I've doubled the number of "counter" variables, in doing so I've simplified a lot of other things, so I think it's a net improvement.
First of all, notice that there are no +1 fudge factors any more, at all.
Second, this loop doesn't call realloc every time -- instead it allocates characters 10 at a time. And because there are separate variables for the number of characters allocated versus read, it can keep track of the fact that it may have allocated more characters than it has read so far. For this code, the debugging printouts are:
realloc 10
realloc 20
final 12
Another little improvement is that we don't have to "preallocate" the array -- there's no initial malloc call. One of our loop invariants is that nchAllocated is the number of characters allocated, and we start this out as 0, and if there are no characters allocated, then it's okay that tmp starts out as NULL. This relies on the fact that when you call realloc for the first time, with tmp equal to NULL, realloc is fine with that, and essentially acts like malloc.
But there's one question you might be asking: If I got rid of all my fudge factors, where do we arrange to allocate one extra byte to hold the terminating \0 character? It's there, but it's subtle: it's lurking in the test
if(nchAllocated <= nchRead)
The very first time through the loop, nchAllocated will be 0, and nchRead will be 0, but this test will be true, so we'll allocate our first chunk of 10 characters, and we're off and running. (If we didn't care about the \0 character, the test nchAllocated < nchRead would have sufficed.)
...But, actually, I've made a mistake! There's a subtle bug here!
What if the file being read is empty? tmp will start out NULL, and we'll never make any trips through the loop, so tmp will remain NULL, so when we assign tmp[nchRead] = 0 it'll blow up.
And actually, it's worse than that. If you trace through the logic very carefully, you'll find that any time the file size is an exact multiple of 10, not enough space gets allocated for the \0, after all.
And this indicates a significant drawback of the "allocate characters 10 at a time" scheme. The code is now harder to test, because the control flow is different for files whose size is a multiple of 10. If you never happen to test that case, you won't realize that this program has a bug in it.
The way I usually fix this is to notice that the \0 byte I have to add to terminate the string is sort of balanced by the EOF character I read that indicated the end of the file. Maybe, when I read the EOF, I can use it to remind me to allocate space for the \0. That's actually easy enough to do, and it looks like this:
int c;
while(1) {
c = getc(file);
if(nchAllocated <= nchRead) {
nchAllocated += 10;
printf("realloc %zu\n", nchAllocated);
tmp = realloc(tmp, nchAllocated);
if (tmp == NULL) {
printf("realloc() failed.\n");
exit(1);
}
}
if(c == EOF)
break;
tmp[nchRead++] = c;
}
printf("final %zu\n", nchRead);
tmp[nchRead] = '\0';
The trick here is that we don't test for EOF until after we've checked that there's enough space in the buffer, and called realloc if necessary. It's as if we allocate space in the buffer for the EOF -- except then we use that space for the \0 instead. This is what I meant by "use it to remind me to allocate space for the \0".
Now, I have to admit that there's still a drawback here, in that the loop is now somewhat unconventional. A loop that has while(1) at the top looks like an infinite loop. This one has
if(c == EOF) break;
down in the middle of it, so it is literally a "break in the middle" loop. (This is by contrast to conventional for and while loops, which are "break at the top", or a do/while loop, which is "break at the bottom".) Personally, I find this to be a useful idiom, and I use it all the time. But some programmers, and perhaps your instructor, would frown on it, because it's "weird", it's "different", it's "unconventional". And to some extent they're right: unconventional programming is somewhat dangerous programming, and is bad if later maintenance programmers can't understand it because they don't recognize or don't understand the idioms in it. (It's sort of the programming equivalent of the English word "ain't", or a split infinitive.)
Finally, if you're still with me, I have one more point to make. (And if you are still with me, thank you. I realize this answer has gotten very long, but I hope you're learning something.)
Earlier I said that "a loop that calls realloc to make the buffer bigger by 1, each time it reads a character, can end up being really inefficient." It turns out that a loop that makes the buffer bigger by 10 isn't much better, and can still be significantly inefficient. You can do a little better by incrementing it by 50 or 100, but if you're dealing with input that might be really big (thousands of characters or more), you're usually better off increasing the buffer size by leaps and bounds, perhaps by multiplying it by some factor, rather than adding. So here's the final version of that part of the loop:
if(nchAllocated <= nchRead) {
if(nchAllocated == 0) nchAllocated = 10;
else nchAllocated *= 2;
printf("realloc %zu\n", nchAllocated);
tmp = realloc(tmp, nchAllocated);
And even this improvement -- multiplying by 2, rather than adding something -- comes with a cost: we need an extra test, to special-case the first trip through the loop, because nchAllocated started out as 0, and 0 × 2 = 0.
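Putting the pieces from this answer together, the whole thing might look like the sketch below. Treat it as an illustration rather than a drop-in replacement: the function name read_all is made up, the debugging printouts are gone, and instead of calling exit(1) it returns NULL when realloc fails.

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch (hypothetical name): reads everything from `file` into a malloc'd,
   '\0'-terminated buffer, doubling the allocation whenever it fills up.
   Returns NULL if an allocation fails. */
char *read_all(FILE *file)
{
    size_t nchAllocated = 0;   /* characters allocated */
    size_t nchRead = 0;        /* characters read so far */
    char *tmp = NULL;

    while (1) {
        int c = getc(file);

        /* Make room before testing for EOF, so that the slot "reserved"
           for EOF can hold the terminating '\0' instead. */
        if (nchAllocated <= nchRead) {
            size_t newSize = (nchAllocated == 0) ? 10 : nchAllocated * 2;
            char *newTmp = realloc(tmp, newSize);
            if (newTmp == NULL) {
                free(tmp);
                return NULL;
            }
            tmp = newTmp;
            nchAllocated = newSize;
        }
        if (c == EOF)
            break;
        tmp[nchRead++] = (char)c;
    }
    tmp[nchRead] = '\0';
    return tmp;
}
```

Inputs of length 0, 10 and 20 are the interesting ones to test, since those are the sizes at which the buffer is exactly full when EOF arrives.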
Your reallocation scheme is incorrect: the array is always too short by one byte and the null terminator is written one position past the end of the string, instead of at the end of the string. This causes an extra byte to be printed, with whatever value happens to be in memory in the block returned by realloc(), which is uninitialized.
It is less confusing to use tmpLen as the length of the string read so far and allocate 2 extra bytes for the newly read character and the null terminator.
Furthermore, the test c != NULL makes no sense: c is a byte value and NULL is a pointer. Similarly, tmp[tmpSize - 1] = (char *)c; is incorrect: you should just write
tmp[tmpSize - 1] = c;
Here is a corrected version:
char *readFile(const char *fileName) {
FILE *file = fopen(fileName, "r");
if (file == NULL) {
printf("File could not be opened for reading.\n");
return NULL;
}
size_t tmpLen = 0;
char *tmp = (char *)malloc(tmpLen + 1);
if (tmp == NULL) {
printf("malloc() could not be called on tmp.\n");
fclose(file);
return NULL;
}
int c;
while ((c = fgetc(file)) != EOF) {
char *new_tmp = (char *)realloc(tmp, tmpLen + 2);
if (new_tmp == NULL) {
printf("realloc() failure for %zu bytes.\n", tmpLen + 2);
free(tmp);
fclose(file);
return NULL;
}
tmp = new_tmp;
tmp[tmpLen++] = c;
}
tmp[tmpLen] = '\0';
fclose(file);
printf("TMP: %s\n", tmp);
return tmp;
}
It is usually better to reallocate in chunks or with a geometric size increment. Here is a simple implementation:
char *readFile(const char *fileName) {
FILE *file = fopen(fileName, "r");
if (file == NULL) {
printf("File could not be opened for reading.\n");
return NULL;
}
size_t tmpLen = 0;
size_t tmpSize = 16;
char *tmp = (char *)malloc(tmpSize);
char *newTmp;
if (tmp == NULL) {
printf("malloc() could not be called on tmp.\n");
fclose(file);
return NULL;
}
int c;
while ((c = fgetc(file)) != EOF) {
if (tmpSize - tmpLen < 2) {
size_t newSize = tmpSize + tmpSize / 2;
newTmp = (char *)realloc(tmp, newSize);
if (newTmp == NULL) {
printf("realloc() failure for %zu bytes.\n", newSize);
free(tmp);
fclose(file);
return NULL;
}
tmpSize = newSize;
tmp = newTmp;
}
tmp[tmpLen++] = c;
}
tmp[tmpLen] = '\0';
fclose(file);
printf("TMP: %s\n", tmp);
// try to shrink allocated block to the minimum size
// if realloc() fails, return the current block
// it seems impossible for this reallocation to fail
// but the C Standard allows it.
newTmp = (char *)realloc(tmp, tmpLen + 1);
return newTmp ? newTmp : tmp;
}
What is the easiest and most efficient way to remove spaces from a string in C?
Easiest and most efficient don't usually go together…
Here's a possible solution for in-place removal:
void remove_spaces(char* s) {
char* d = s;
do {
while (*d == ' ') {
++d;
}
} while (*s++ = *d++);
}
Here's a very compact, but entirely correct version:
do while(isspace(*s)) s++; while(*d++ = *s++);
And here, just for my amusement, are code-golfed versions that aren't entirely correct, and get commenters upset.
If you can risk some undefined behavior, and never have empty strings, you can get rid of the body:
while(*(d+=!isspace(*s++)) = *s);
Heck, if by space you mean just space character:
while(*(d+=*s++!=' ')=*s);
Don't use that in production :)
As we can see from the answers posted, this is surprisingly not a trivial task. When faced with a task like this, it would seem that many programmers choose to throw common sense out the window, in order to produce the most obscure snippet they possibly can come up with.
Things to consider:
You will want to make a copy of the string, with spaces removed. Modifying the passed string is bad practice, it may be a string literal. Also, there are sometimes benefits of treating strings as immutable objects.
You cannot assume that the source string is not empty. It may contain nothing but a single null termination character.
The destination buffer can contain any uninitialized garbage when the function is called. Checking it for null termination doesn't make any sense.
Source code documentation should state that the destination buffer needs to be large enough to contain the trimmed string. Easiest way to do so is to make it as large as the untrimmed string.
The destination buffer needs to hold a null terminated string with no spaces when the function is done.
Consider if you wish to remove all white space characters or just spaces ' '.
C programming isn't a competition over who can squeeze in as many operators on a single line as possible. It is rather the opposite, a good C program contains readable code (always the single-most important quality) without sacrificing program efficiency (somewhat important).
For this reason, you get no bonus points for hiding the insertion of null termination of the destination string, by letting it be part of the copying code. Instead, make the null termination insertion explicit, to show that you haven't just managed to get it right by accident.
What I would do:
void remove_spaces (char* restrict str_trimmed, const char* restrict str_untrimmed)
{
while (*str_untrimmed != '\0')
{
if(!isspace(*str_untrimmed))
{
*str_trimmed = *str_untrimmed;
str_trimmed++;
}
str_untrimmed++;
}
*str_trimmed = '\0';
}
In this code, the source string "str_untrimmed" is left untouched, which is guaranteed by using proper const correctness. It does not crash if the source string contains nothing but a null termination. It always null terminates the destination string.
Memory allocation is left to the caller. The algorithm should only focus on doing its intended work. It removes all white spaces.
There are no subtle tricks in the code. It does not try to squeeze in as many operators as possible on a single line. It will make a very poor candidate for the IOCCC. Yet it will yield pretty much the same machine code as the more obscure one-liner versions.
When copying something, you can however optimize a bit by declaring both pointers as restrict, which is a contract between the programmer and the compiler, where the programmer guarantees that the destination and source are not the same address. This allows more efficient optimization, since the compiler can then copy straight from source to destination without temporary memory in between.
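To show how a caller might honour the buffer-size requirement from the list above, here is a small sketch. The function is repeated so that the snippet compiles on its own. Two things are additions of mine and not part of the code above: the cast to unsigned char (isspace() has undefined behavior for negative char values) and the wrapper remove_spaces_checked, whose name and return convention are made up for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Same algorithm as above; the unsigned char cast is an addition. */
void remove_spaces(char *restrict str_trimmed, const char *restrict str_untrimmed)
{
    while (*str_untrimmed != '\0') {
        if (!isspace((unsigned char)*str_untrimmed)) {
            *str_trimmed = *str_untrimmed;
            str_trimmed++;
        }
        str_untrimmed++;
    }
    *str_trimmed = '\0';
}

/* Hypothetical wrapper: refuses to run unless the destination is at least
   as large as the untrimmed string plus its terminator.
   Returns 0 on success, -1 otherwise. */
int remove_spaces_checked(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL || dst_size < strlen(src) + 1)
        return -1;
    remove_spaces(dst, src);
    return 0;
}
```

The size check is conservative on purpose: the trimmed result may be shorter, but requiring room for the whole source string means the check never has to scan for spaces first.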
In C, you can replace some strings in-place, for example a string returned by strdup():
char *str = strdup(" a b c ");
char *write = str, *read = str;
do {
if (*read != ' ')
*write++ = *read;
} while (*read++);
printf("%s\n", str);
Other strings are read-only, for example those declared in-code. You'd have to copy those to a newly allocated area of memory and fill the copy by skipping the spaces:
char *oldstr = " a b c ";
char *newstr = malloc(strlen(oldstr)+1);
char *np = newstr, *op = oldstr;
do {
if (*op != ' ')
*np++ = *op;
} while (*op++);
printf("%s\n", newstr);
You can see why people invented other languages ;)
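#include <ctype.h>
char * remove_spaces(char * source, char * target)
{
while(*source)
{
if (!isspace((unsigned char)*source))
*target++ = *source;
source++;
}
*target = '\0'; /* terminate the result */
return target; /* points at the terminator of the result */
}
Notes: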
This doesn't handle Unicode.
If you are still interested, this function removes spaces from the beginning of the string; I have it working in my code:
void removeSpaces(char *str1)
{
char *str2;
str2=str1;
while (*str2==' ') str2++;
if (str2!=str1) memmove(str1,str2,strlen(str2)+1);
}
#include<stdio.h>
#include<string.h>
int main(void)
{
int n;
char str[]=" Nar ayan singh ";
printf("sizeof str:%zu\n",strlen(str));
/* strip leading spaces; memmove is required because the regions overlap */
while(str[0]==' ')
{
memmove(str,str+1,strlen(str)); /* strlen(str) bytes includes the terminator */
}
printf("sizeof str:%zu\n",strlen(str));
/* strip trailing spaces without running off the front of the array */
n=(int)strlen(str);
while(n>0 && str[n-1]==' ')
n--;
str[n]='\0';
printf("str:%s ",str);
printf("sizeof str:%zu\n",strlen(str));
return 0;
}
The easiest and most efficient way to remove spaces from a string is to simply remove the spaces from the string literal. For example, use your editor to 'find and replace' "hello world" with "helloworld", and presto!
Okay, I know that's not what you meant. Not all strings come from string literals, right? Supposing this string you want spaces removed from doesn't come from a string literal, we need to consider the source and destination of your string... We need to consider your entire algorithm, what actual problem you're trying to solve, in order to suggest the simplest and most optimal methods.
Perhaps your string comes from a file (e.g. stdin) and is bound to be written to another file (e.g. stdout). If that's the case, I would question why it ever needs to become a string in the first place. Just treat it as though it's a stream of characters, discarding the spaces as you come across them...
#include <stdio.h>
int main(void) {
for (;;) {
int c = getchar();
if (c == EOF) { break; }
if (c == ' ') { continue; }
putchar(c);
}
}
By eliminating the need for storage of a string, not only does the entire program become much, much shorter, but theoretically also much more efficient.
/* Function to remove all spaces from a given string.
https://www.geeksforgeeks.org/remove-spaces-from-a-given-string/
*/
void remove_spaces(char *str)
{
int count = 0;
for (int i = 0; str[i]; i++)
if (str[i] != ' ')
str[count++] = str[i];
str[count] = '\0';
}
Code taken from zString library
/* search for character 's' */
int zstring_search_chr(const char *token,char s){
if (!token || s=='\0')
return 0;
for (;*token; token++)
if (*token == s)
return 1;
return 0;
}
char *zstring_remove_chr(char *str,const char *bad) {
char *src = str , *dst = str;
/* validate input */
if (!(str && bad))
return NULL;
while(*src)
if(zstring_search_chr(bad,*src))
src++;
else
*dst++ = *src++; /* assign first, then increment */
*dst='\0';
return str;
}
Code example
Example Usage
char s[]="this is a trial string to test the function.";
char *d=" .";
printf("%s\n",zstring_remove_chr(s,d));
Example Output
thisisatrialstringtotestthefunction
Have a look at the zString code, you may find it useful
https://github.com/fnoyanisi/zString
That's the easiest I could think of (TESTED) and it works!!
char message[50];
size_t i, j;
fgets(message, 50, stdin);
for( i = 0, j = 0; i < strlen(message); i++){
message[i-j] = message[i];
if(message[i] == ' ')
j++;
}
message[i-j] = '\0'; /* terminate at the end of the compacted string */
Here is the simplest thing I could think of. Note that this program uses the first command-line argument (argv[1]) as the line to delete spaces from.
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
/*The function itself with debug printing to help you trace through it.*/
char* trim(const char* str)
{
size_t len = strlen(str);
/* strlen, not sizeof: sizeof(str) is only the size of a pointer.
calloc zero-fills, so res is a valid string at every debug print. */
char* res = calloc(len + 1, 1);
if (res == NULL)
return NULL;
size_t index = 0;
for (size_t i = 0; i < len; i++) {
if (str[i] != ' ')
{
res[index] = str[i];
index++;
}
printf("End of iteration %zu\n", i);
printf("Here is the initial line: %s\n", str);
printf("Here is the resulting line: %s\n", res);
printf("\n");
}
return res;
}
int main(int argc, char* argv[])
{
//trim function test
if (argc < 2) {
fprintf(stderr, "usage: %s \"line with spaces\"\n", argv[0]);
return 1;
}
const char* line = argv[1];
printf("Here is the line: %s\n", line);
char* res = trim(line);
if (res == NULL)
return 1;
printf("\nAnd here is the formatted line: %s\n", res);
free(res);
return 0;
}
This is implemented on a microcontroller and it works. It should avoid all problems; it is not a smart way of doing it, but it will work :)
void REMOVE_SYMBOL(char* string, uint8_t symbol)
{
uint32_t size = LENGHT(string); // simple string length function, made my own, since original does not work with string of size 1
uint32_t i = 0;
uint32_t k = 0;
uint32_t loop_protection = size*size; // never goes into loop that is unbrakable
while(i<size)
{
if(string[i]==symbol)
{
k = i;
while(k<size)
{
string[k]=string[k+1];
k++;
}
}
if(string[i]!=symbol)
{
i++;
}
loop_protection--;
if(loop_protection==0)
{
i = size;
break;
}
}
}
While this is not as concise as the other answers, it is very straightforward to understand for someone new to C, adapted from the Calculix source code.
char* remove_spaces(char * buff, int len)
{
int i=-1,k=0;
while(1){
i++;
if((i==len)||(buff[i]=='\0')||(buff[i]=='\n')||(buff[i]=='\r')) break; /* check the length first, so buff[len] is never read */
if((buff[i]==' ')||(buff[i]=='\t')) continue;
buff[k]=buff[i];
k++;
}
buff[k]='\0';
return buff;
}
I assume the C string is in a fixed memory, so if you replace spaces you have to shift all characters.
The easiest seems to be to create new string and iterate over the original one and copy only non space characters.
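A minimal sketch of that idea, assuming only the plain space character ' ' should be dropped. The function name copy_without_spaces is made up for illustration, and the caller is responsible for freeing the result.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly malloc'd copy of `src` with every
   ' ' removed, or NULL if the allocation fails. */
char *copy_without_spaces(const char *src)
{
    char *dst = malloc(strlen(src) + 1);   /* the result is never longer than src */
    if (dst == NULL)
        return NULL;

    char *out = dst;
    for (; *src != '\0'; src++) {
        if (*src != ' ')
            *out++ = *src;   /* copy only non-space characters */
    }
    *out = '\0';
    return dst;
}
```

Because the original is only read, this also works when the input is a string literal.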
I came across a variation of this question where you need to collapse runs of multiple spaces into a single space that "represents" them.
This is my solution:
char str[] = "Put Your string Here.....";
int copyFrom = 0, copyTo = 0;
printf("Start String %s\n", str);
while (str[copyTo] != 0) {
if (str[copyFrom] == ' ') {
str[copyTo] = str[copyFrom];
copyFrom++;
copyTo++;
while ((str[copyFrom] == ' ') && (str[copyFrom] !='\0')) {
copyFrom++;
}
}
str[copyTo] = str[copyFrom];
if (str[copyTo] != '\0') {
copyFrom++;
copyTo++;
}
}
printf("Final String %s\n", str);
Hope it helps :-)
I'm very new to C and I'm still learning the basics. I'm creating an application that reads in a text file and breaks down the words individually. My intention is to count the number of times each word occurs.
Anyway, the last do-while loop in the code below executes fine, and then crashes. This loop prints the memory address of each word (pointer) and then prints the word. It accomplishes this fine, and then crashes on the last iteration. My intention is to push this memory address onto a singly linked list, once it has stopped crashing.
Also, just a quick mention regarding the array sizes below: I haven't yet figured out how to set the correct size needed to hold the word character array etc., because you must define the size before the array is filled, and I don't know how to do this. That is why I've set them to 1024.
#include<stdio.h>
#include<string.h>
int main (int argc, char **argv) {
FILE * pFile;
int c;
int n = 0;
char *wp;
char wordArray[1024];
char delims[] = " "; // delims spaces in the word array.
char *result = NULL;
result = strtok(wordArray, delims);
char holder[1024];
pFile=fopen (argv[1],"r");
if (pFile == NULL) perror ("Error opening file");
else {
do {
c = fgetc (pFile);
wordArray[n] = c;
n++;
} while (c != EOF);
n = 0;
fclose (pFile);
do {
result = strtok(NULL, delims);
holder[n] = *result; // holder stores the value of 'result', which should be a word.
wp = &holder[n]; // wp points to the address of 'holder' which holds the 'result'.
n++;
printf("Pointer value = %d\n", wp); // Prints the address of holder.
printf("Result is \"%s\"\n", result); // Prints the 'result' which is a word from the array.
//sl_push_front(&wp); // Push address onto stack.
} while (result != NULL);
}
return 0;
}
Please ignore the bad program structure, as I mentioned, I'm new to this!
Thanks
As others have pointed out, your second loop attempts to dereference result before you check for it being NULL. Restructure your code as follows:
result = strtok( wordArray, delims ); // do this *after* you have read data into
// wordArray
while( result != NULL )
{
holder[n] = *result;
...
result = strtok( NULL, delims );
}
Although...
You're attempting to read the entire contents of the file into memory before breaking it up into words; that's not going to work for files bigger than the size of your buffer (currently 1K). If I may make a suggestion, change your code such that you're reading individual words as you go. Here's an example that breaks the input stream up into words delimited by whitespace (blanks, newlines, tabs, etc.) and punctuation (period, comma, etc.):
#include <stdio.h>
#include <ctype.h>
int main(int argc, char **argv)
{
char buffer[1024];
int c;
size_t n = 0;
FILE *input = stdin;
if( argc > 1 )
{
input = fopen( argv[1], "r");
if (!input)
input = stdin;
}
while(( c = fgetc(input)) != EOF )
{
if (isspace(c) || ispunct(c))
{
if (n > 0)
{
buffer[n] = 0;
printf("read word %s\n", buffer);
n = 0;
}
}
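else if (n < sizeof buffer - 1) // leave room for the terminator; extra characters of an over-long word are dropped
{
buffer[n++] = (char)c;
}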
}
if (n > 0)
{
buffer[n] = 0;
printf("read word %s\n", buffer);
}
fclose(input);
return 0;
}
No warranties express or implied (having pounded this out before 7:00 a.m.). But it should give you a flavor of how to parse a file as you go. If nothing else, it avoids using strtok, which is not the greatest of tools for parsing input. You should be able to adapt this general structure to your code. For best results, you should abstract that out into its own function:
int getNextWord(FILE *stream, char *buf, size_t bufsize)
{
    int c;
    size_t n = 0;
    if (bufsize == 0)
        return 0;
    while ((c = fgetc(stream)) != EOF)
    {
        if (isspace(c) || ispunct(c))
        {
            if (n > 0)
                break;              // a delimiter after at least one character ends the word
        }
        else if (n < bufsize - 1)   // leave room for the terminator; extra characters are dropped
        {
            buf[n++] = (char)c;
        }
    }
    buf[n] = 0;
    return n > 0;                   // 1 if a word was read, 0 at end of input
}
and you would call it like
void foo(void)
{
char word[SOME_SIZE];
...
while (getNextWord(inFile, word, sizeof word))
{
do_something_with(word);
}
...
}
In your do...while loop you expect that result can be NULL (that is the loop's exit condition). So how do you expect this line:
holder[n] = *result;
to work when it is? It dereferences result before the NULL check, which is most likely why your program crashes.
Change the do-while loop to a while loop.
Use
while (condition)
{
}
instead of
do {
}while(condition)
It is crashing because you are trying to dereference a NULL pointer (result) inside the do-while loop.
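To make that concrete, here is a minimal sketch of the while-loop form. The helper name print_tokens is hypothetical; strtok itself is the standard function, and it modifies the buffer it is given, so text must be a writable array.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Print each token of text and return how many were found.
   print_tokens is a hypothetical helper name. */
size_t print_tokens(char *text, const char *delims)
{
    assert(text != NULL && delims != NULL);
    size_t count = 0;
    /* The first call passes the buffer; every later call passes NULL. */
    char *result = strtok(text, delims);
    while (result != NULL) {               /* checked before result is ever used */
        printf("Result is \"%s\"\n", result);
        count++;
        result = strtok(NULL, delims);
    }
    return count;
}
```

Because the test sits at the top of the loop, the final NULL returned by strtok ends the loop instead of being dereferenced.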
I work mostly with Objective-C and was just looking at your question for fun, but I may have a solution.
Before setting n=0; after your first do-while loop, create another variable called totalWords and set it equal to n. totalWords can be declared anywhere in the function (except inside one of the do-while loops), but it can be defined at the top of the else block since its lifetime is short:
totalWords = n;
then you can set n back to zero:
n = 0;
Your conditional for the final do-while loop should then say:
...
} while (n <= ++totalWords);
The logic behind the application will thus say: count the words in the file (there are n words, which is the totalWords in the file). When the program prints the results to the console, it will run the second do-while loop, which will run until n is one result past the value of totalWords (this ensures that you print the final word).
Alternately, it is better practice and clearer for other programmers to use a loop and a half:
do {
result = strtok(NULL, delims);
holder[n] = *result;
wp = &holder[n];
printf("Pointer value = %d\n", wp);
printf("Result is \"%s\"\n", result);
//sl_push_front(&wp); // Push address onto stack.
if (n == totalWords) break; // This forces the program to exit the do-while after we have printed the last word
n++; // We only need to increment if we have not reached the last word
// if our logic is bad, we will enter an infinite loop, which will tell us while testing that our logic is bad.
} while (true);
What is the easiest and most efficient way to remove spaces from a string in C?
Easiest and most efficient don't usually go together…
Here's a possible solution for in-place removal:
void remove_spaces(char* s) {
char* d = s;
do {
while (*d == ' ') {
++d;
}
} while (*s++ = *d++);
}
Here's a very compact, but entirely correct version:
do while(isspace(*s)) s++; while(*d++ = *s++);
And here, just for my amusement, are code-golfed versions that aren't entirely correct, and get commenters upset.
If you can risk some undefined behavior, and never have empty strings, you can get rid of the body:
while(*(d+=!isspace(*s++)) = *s);
Heck, if by space you mean just space character:
while(*(d+=*s++!=' ')=*s);
Don't use that in production :)
As we can see from the answers posted, this is surprisingly not a trivial task. When faced with a task like this, it would seem that many programmers choose to throw common sense out the window, in order to produce the most obscure snippet they possibly can come up with.
Things to consider:
You will want to make a copy of the string, with spaces removed. Modifying the passed string is bad practice, it may be a string literal. Also, there are sometimes benefits of treating strings as immutable objects.
You cannot assume that the source string is not empty. It may contain nothing but a single null termination character.
The destination buffer can contain any uninitialized garbage when the function is called. Checking it for null termination doesn't make any sense.
Source code documentation should state that the destination buffer needs to be large enough to contain the trimmed string. Easiest way to do so is to make it as large as the untrimmed string.
The destination buffer needs to hold a null terminated string with no spaces when the function is done.
Consider if you wish to remove all white space characters or just spaces ' '.
C programming isn't a competition over who can squeeze in as many operators on a single line as possible. It is rather the opposite, a good C program contains readable code (always the single-most important quality) without sacrificing program efficiency (somewhat important).
For this reason, you get no bonus points for hiding the insertion of null termination of the destination string, by letting it be part of the copying code. Instead, make the null termination insertion explicit, to show that you haven't just managed to get it right by accident.
What I would do:
void remove_spaces (char* restrict str_trimmed, const char* restrict str_untrimmed)
{
while (*str_untrimmed != '\0')
{
if(!isspace((unsigned char)*str_untrimmed)) /* needs <ctype.h>; the cast avoids undefined behavior for negative char values */
{
*str_trimmed = *str_untrimmed;
str_trimmed++;
}
str_untrimmed++;
}
*str_trimmed = '\0';
}
In this code, the source string "str_untrimmed" is left untouched, which is guaranteed by using proper const correctness. It does not crash if the source string contains nothing but a null termination. It always null terminates the destination string.
Memory allocation is left to the caller. The algorithm should only focus on doing its intended work. It removes all white spaces.
There are no subtle tricks in the code. It does not try to squeeze in as many operators as possible on a single line. It will make a very poor candidate for the IOCCC. Yet it will yield pretty much the same machine code as the more obscure one-liner versions.
When copying something, you can however optimize a bit by declaring both pointers as restrict, which is a contract between the programmer and the compiler, where the programmer guarantees that the destination and source buffers do not overlap. This allows more efficient optimization, since the compiler can then copy straight from source to destination without temporary memory in between.
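As a sketch of what "memory allocation is left to the caller" can look like in practice, a thin wrapper can allocate the worst-case size (the length of the untrimmed string plus the terminator) and then call the copying function. remove_spaces_alloc is a hypothetical name of my own; the inner function repeats the one above so the snippet compiles on its own.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Same contract as the function above: dst must have room for
   strlen(src) + 1 bytes and must not overlap src. */
static void remove_spaces(char *restrict dst, const char *restrict src)
{
    while (*src != '\0') {
        if (!isspace((unsigned char)*src))
            *dst++ = *src;
        src++;
    }
    *dst = '\0';
}

/* Hypothetical convenience wrapper: allocates the worst-case size and
   returns a new string the caller must free(), or NULL on failure. */
char *remove_spaces_alloc(const char *src)
{
    assert(src != NULL);
    char *dst = malloc(strlen(src) + 1);
    if (dst != NULL)
        remove_spaces(dst, src);
    return dst;
}
```

The source may safely be a string literal here, since only the freshly allocated copy is written to.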
In C, you can modify some strings in place, for example a string returned by strdup():
char *str = strdup(" a b c ");
char *write = str, *read = str;
do {
if (*read != ' ')
*write++ = *read;
} while (*read++);
printf("%s\n", str);
Other strings are read-only, for example string literals. You'd have to copy those to a newly allocated area of memory and fill the copy while skipping the spaces:
char *oldstr = " a b c ";
char *newstr = malloc(strlen(oldstr)+1);
char *np = newstr, *op = oldstr;
do {
if (*op != ' ')
*np++ = *op;
} while (*op++);
printf("%s\n", newstr);
You can see why people invented other languages ;)
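#include <ctype.h>
/* Copies source to target without whitespace. target must have room for
   strlen(source) + 1 bytes. Returns the start of target. */
char * remove_spaces(const char * source, char * target)
{
    char * out = target;
    for (; *source; source++)   /* test the current character before advancing */
    {
        if (!isspace((unsigned char)*source))
            *out++ = *source;
    }
    *out = '\0';                /* the result must be null-terminated */
    return target;
}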
Notes:
This doesn't handle Unicode.
If you are still interested, this function removes spaces from the beginning of the string; I just got it working in my code:
void removeSpaces(char *str1)
{
char *str2;
str2=str1;
while (*str2==' ') str2++;
if (str2!=str1) memmove(str1,str2,strlen(str2)+1);
}
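If trailing spaces need to go as well, the same memmove idea extends naturally. The following is only a sketch under that assumption; trim_spaces is a hypothetical name, and like the function above it treats only ' ' as a space.

```c
#include <assert.h>
#include <string.h>

/* Remove leading and trailing ' ' characters from s, in place.
   trim_spaces is a hypothetical name. */
void trim_spaces(char *s)
{
    assert(s != NULL);
    char *start = s;
    while (*start == ' ')                 /* find the first non-space */
        start++;
    size_t len = strlen(start);
    while (len > 0 && start[len - 1] == ' ')  /* drop trailing spaces */
        len--;
    /* memmove, not memcpy: source and destination overlap. */
    memmove(s, start, len);
    s[len] = '\0';
}
```

Spaces in the middle of the string are left untouched.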
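#include<stdio.h>
#include<string.h>
int main(void)
{
    size_t n;
    char str[]=" Nar ayan singh ";
    printf("length of str:%zu\n",strlen(str));
    /* strip leading spaces; memmove is needed because the regions overlap,
       and strlen(str) bytes from str+1 already include the terminator */
    while(str[0]==' ')
    {
        memmove(str,str+1,strlen(str));
    }
    printf("length of str:%zu\n",strlen(str));
    /* strip trailing spaces without ever indexing before the start */
    n=strlen(str);
    while(n>0 && str[n-1]==' ')
        n--;
    str[n]='\0';
    printf("str:%s ",str);
    printf("length of str:%zu\n",strlen(str));
    return 0;
}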
The easiest and most efficient way to remove spaces from a string is to simply remove the spaces from the string literal. For example, use your editor to 'find and replace' "hello world" with "helloworld", and presto!
Okay, I know that's not what you meant. Not all strings come from string literals, right? Supposing this string you want spaces removed from doesn't come from a string literal, we need to consider the source and destination of your string... We need to consider your entire algorithm, what actual problem you're trying to solve, in order to suggest the simplest and most optimal methods.
Perhaps your string comes from a file (e.g. stdin) and is bound to be written to another file (e.g. stdout). If that's the case, I would question why it ever needs to become a string in the first place. Just treat it as though it's a stream of characters, discarding the spaces as you come across them...
#include <stdio.h>
int main(void) {
for (;;) {
int c = getchar();
if (c == EOF) { break; }
if (c == ' ') { continue; }
putchar(c);
}
}
By eliminating the need for storage of a string, not only does the entire program become much, much shorter, but theoretically also much more efficient.
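The same filter can be written against arbitrary streams rather than stdin and stdout, which makes it easier to reuse and test. This is a sketch only; copy_without_spaces and its return value (the number of bytes written) are my own assumptions.

```c
#include <assert.h>
#include <stdio.h>

/* Copy in to out, dropping every ' ' character; returns bytes written.
   copy_without_spaces is a hypothetical name. */
long copy_without_spaces(FILE *in, FILE *out)
{
    assert(in != NULL && out != NULL);
    long written = 0;
    int c;                        /* int, not char, so EOF is representable */
    while ((c = fgetc(in)) != EOF) {
        if (c == ' ')
            continue;             /* discard spaces as they arrive */
        if (fputc(c, out) == EOF)
            break;                /* stop on a write error */
        written++;
    }
    return written;
}
```

Calling copy_without_spaces(stdin, stdout) would behave like the program above.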
/* Function to remove all spaces from a given string.
https://www.geeksforgeeks.org/remove-spaces-from-a-given-string/
*/
void remove_spaces(char *str)
{
int count = 0;
for (int i = 0; str[i]; i++)
if (str[i] != ' ')
str[count++] = str[i];
str[count] = '\0';
}
Code taken from zString library
/* search for character 's' */
int zstring_search_chr(const char *token,char s){
if (!token || s=='\0')
return 0;
for (;*token; token++)
if (*token == s)
return 1;
return 0;
}
char *zstring_remove_chr(char *str,const char *bad) {
char *src = str , *dst = str;
/* validate input */
if (!(str && bad))
return NULL;
while(*src)
if(zstring_search_chr(bad,*src))
src++;
else
*dst++ = *src++; /* assign first, then increment */
*dst='\0';
return str;
}
Example Usage
char s[]="this is a trial string to test the function.";
char *d=" .";
printf("%s\n",zstring_remove_chr(s,d));
Example Output
thisisatrialstringtotestthefunction
Have a look at the zString code, you may find it useful
https://github.com/fnoyanisi/zString
That's the easiest I could think of (TESTED) and it works!!
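char message[50];
size_t i, j;
if (fgets(message, 50, stdin) == NULL)
    message[0] = '\0';
for( i = 0, j = 0; message[i] != '\0'; i++){
    message[i-j] = message[i];
    if(message[i] == ' ')
        j++;
}
message[i-j] = '\0'; // terminate after the last kept character, not at the old end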
Here is the simplest thing I could think of. Note that this program uses the first command-line argument (argv[1]) as the line to delete spaces from.
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
/*The function itself with debug printing to help you trace through it.*/
char* trim(const char* str)
{
    size_t len = strlen(str);
    /* sizeof(str) is only the size of a pointer; allocate by string length */
    char* res = malloc(len + 1);
    if (res == NULL)
        return NULL;
    size_t index = 0;
    for (size_t i = 0; i < len; i++) {
        if (str[i] != ' ')
        {
            res[index] = str[i];
            index++;
        }
        res[index] = '\0'; /* keep res printable at every step */
        printf("End of iteration %zu\n", i);
        printf("Here is the initial line: %s\n", str);
        printf("Here is the resulting line: %s\n", res);
        printf("\n");
    }
    res[index] = '\0'; /* also covers the empty-string case */
    return res;
}
int main(int argc, char* argv[])
{
    //trim function test
    if (argc < 2) {
        fprintf(stderr, "usage: %s \"line with spaces\"\n", argv[0]);
        return 1;
    }
    const char* line = argv[1];
    printf("Here is the line: %s\n", line);
    char* res = trim(line);
    if (res == NULL)
        return 1;
    printf("\nAnd here is the formatted line: %s\n", res);
    free(res);
    return 0;
}
This is implemented on a microcontroller and it works. It is not a smart way of doing it, but it should avoid the usual pitfalls :)
void REMOVE_SYMBOL(char* string, uint8_t symbol)
{
uint32_t size = LENGHT(string); // simple string length function, made my own, since original does not work with string of size 1
uint32_t i = 0;
uint32_t k = 0;
uint32_t loop_protection = size*size; // upper bound on iterations, so the loop can never run forever
while(i<size)
{
if(string[i]==symbol)
{
k = i;
while(k<size)
{
string[k]=string[k+1];
k++;
}
}
if(string[i]!=symbol)
{
i++;
}
loop_protection--;
if(loop_protection==0)
{
i = size;
break;
}
}
}
While this is not as concise as the other answers, it is very straightforward to understand for someone new to C, adapted from the Calculix source code.
char* remove_spaces(char * buff, int len)
{
int i=-1,k=0;
while(1){
i++;
if((buff[i]=='\0')||(buff[i]=='\n')||(buff[i]=='\r')||(i==len)) break;
if((buff[i]==' ')||(buff[i]=='\t')) continue;
buff[k]=buff[i];
k++;
}
buff[k]='\0';
return buff;
}