Remove redundant whitespace from string (in-place) - c

Ok so i posted earlier about trying to (without any prebuilt functions) remove additional spaces so
"this is <insert many spaces!> a test" would return
"this is a test"
Remove spaces from a string, but not at the beginning or end
As it was homework i asked for no full solutions and some kind people provided me with the following
"If you need to do it in place, then create two pointers. One pointing to the character being read and one to the character being copied. When you meet an extra space, then adapt the 'write' pointer to point to the next non space character. Copy to the read position the character pointed by the write character. Then advance the read and write pointers to the character after the character being copied."
The problem is i now want to fully smash my computer in to pieces as i am sooooo irritated by it. I didnt realise at the time i couldnt utilise the char array so was using array indexes to do it, i thought i could suss how to get it to work but now i am just using pointers i am finding it very hard. I really need some help, not full solutions. So far this is what i am doing;
1)create a pointer called write, so i no where to write to
2)create a pointer called read so i no where to read from
(both of these pointers will now point to the first element of the char array)
while (read != '\0')
if read == a space
add one to read
if read equals a space now
add one to write
while read != a space {
set write to = read
}
add one to both read and write
else
add one to read and write
write = read

Try just doing this yourself character by character on a piece of paper and work out what it is you are doing first, then translate that into code.
If you are still having trouble, try doing something simpler first, for example just copying a string character for character without worrying about the "remove duplicate spaces" part of it - just to make sure that you haven't made a silly mistake elsewhere.

The advice of trying to do it with pen and paper first is good. And it really doesn't matter if you do it with pointers or array indexing; you can either use a reader and a writer pointer, or a reader and a writer index.
Think about when you want to move the indices forward. You always move the write index forward after you write a character. You move the read index forward when you've read a character.
Perhaps you could start with some code that just moves over the string, but actually doesn't change it. And then you add the logic that skips additional spaces.

char p[] = "this is a test";
char *readptr = &p[0];
char *writeptr = &p[0];
int inspaces = 0;
while(*readptr) {
if(isspace(*readptr)) {
inspaces ++;
} else {
inspaces = 0;
}
if(inspaces <= 1) {
*writeptr = *readptr;
writeptr++;
}
readptr++;
}
*writeptr = 0;

Here's a solution that only has a single loop (no inner loop to skip spaces) and no state data:
dm9pZCBTdHJpcFNwYWNlcyAoY2hhciAqdGV4dCkNCnsNCiAgY2hhciAqc3JjID0gdGV4dCwgKmRl
c3QgPSB0ZXh0Ow0KICB3aGlsZSAoKnNyYykNCiAgew0KICAgICpkZXN0ID0gKihzcmMrKyk7DQog
ICAgaWYgKCpzcmMgIT0gJyAnIHx8ICpkZXN0ICE9ICcgJykNCiAgICB7DQogICAgICArK2Rlc3Q7
DQogICAgfQ0KICB9DQogICpkZXN0ID0gMDsNCn0NCg==
The above is Base64 encoded as this is a homework question. To decode, copy the above block and paste into this website to decode it (you'll need to check the "Decode" radio button).

assuming you're trying to write a function like:
void removeSpaces(char * str){
/* ... stuff that changes the contents of str[] */
}
You want to scan the string for consecutive spaces, so that your write pointer is always trailing your read pointer. Advance your read pointer to a place where you are pointing at a space, but the next character is not a space. If your read and write pointers are not the same, then your write pointer ought to be pointing at the beginning of a sequence of spaces. The difference between your write and read pointer (i.e. read_pointer - write_pointer) will tell you the number of consecutive spaces that need to be overwritten to close the gap. When there is a difference of greater than zero, (prefix) advance both pointers along by that many positions, copying characters as you go. When you're read pointer is at the end of the string ('\0'), you should be done.

Can you use regular expressions? If so, it might be really easy. Use a regex to replace \s{2,}? with a single space. The \s means any white space (tabs, spaces, carriage feeds...); {2,} means 2 or more; ? means non-greedy. (Disclaimer: my regex might not be the best one you could write, since I'm no regex pro. Also, it's .net syntax, so the regex library for C might have slightly different syntax.)

Why don't you do something like this:
make a second char** that is the same length as the first. As you run through the first array with your pointers keep an extra pointer on the last space you've seen. If the last space you saw was the previous element then you don't copy that char to the second array.
But I would start with something that runs through the char** by each character and prints out each char. If you can make that happen then you can work on actually copying them into a second char**.

Related

Trying to read text file data into an array of structs?

Here is what I want to do:
Read my 2 text files into an "array of structs" (this is how it is worded on my assignment).
Dynamically create sufficient memory for each entry read from the file
Here is one of the structs I am working with:
typedef struct {
int eventid;
char eventdate[20];
char venuename[20];
char country[20];
int rockid;
} Venue;
Within my main function I have the array setup to receive the text as:
Venue *(places[20]);
Now comes the more complex part. I need to open the file for reading (I got this to work perfectly) and then dynamically allocate the memory for each entry. I know I need to use malloc to do this but I have never used it before and am sort of at a loss. Here is what I have so far:
void load_data(void)
{
char buffer[20]; //stating that each line can't be longer than 20 chars
int i = 0,len; //declaring 2 int variables
FILE * venuePtr=fopen("venueinfo.txt", "r");
if (venuePtr != NULL)
printf("\n**Venue info file has been opened!**\n");
else{
printf("\nPlease create a file named venueinfo.txt and restart!\n");
} //so far so good...
while (!feof(venuePtr)){ //while we have not found the eof key...
fscanf(venuePtr,"%s",buffer); //we scan each line of text
len = strlen(buffer); //find the length (len) of the string
places[i]=(char*)malloc(len+1); //allocate memory space for the word here
strcpy(places[i],buffer); //copy a word into our array
++i; //finally we move on to the next element in the array
} //end while
The problem resides in the while loop and I have been working this for 2 days straight. I have 5 members in my struct and am thinking that strcpy may not work. This is only part of the problem though I am sure. I just can't wrap my head around reading in everything. The file itself is a super simple txt file and looks like this:
1 Jan10 Citadel Belgium 8
4 May05 Sunrise Belize 6
3 Jun17 Footloose Brazil 4
you are trying to copy string into the venue structure, how do you expect that to work ?
strcpy(venue[i],buffer);
please give an example of your file, you probably need to parse each element and write it to structure members
Stuff you can skip but should eventually do:
use function(void), not just empty() for functions. visual studio, which you are using allows it, but it's bad format.
declaring global variables is likewise bad form. You want to declare them in main and pass them in.
finally, you want to return. return 0 from a successful run of main. If you have a void function, still, return; before the final closing '}' character of the function.
oh and fscan_s isn't portable, it's a Microsoft function.
stuff you actually asked about:
Now, onto your problem. Don't allocate memory and assign it. You have statically allocated memory for your structures, and for your strings, by making them arrays with a given number of characters. If you want to statically allocate memory, you need to use a pointer.
if you scan the first number, the rock id into id, you would assign the first venue's rock id as,
venue[0].rockid = id;
for arrays, you do have to string copy. You already allocated memory for them, so you just have to use strcpy.
but you can't just copy strings into the structure and have it all end up in the right place. You need to get each part and add it seperately
That means you either need to read in each element separately, like "%d%s" to read an int then a string, or whatever, or you need to split up your string after reading in the whole thing. NOTE
That %s won't read the whole line!!! It will stop at the first white-space character (new line, tab, even a space) so if you %s "hi there" you get "hi". You may want to use %[^\n] which while characters don't match \n.
My suggestion is to use fscanf with multiple items, but if you need to split the string up, you want to use sscanf, which lets you scan a string again.
Finally, you don't need to test for feof, and that's actually problematic. It's far better to use while (fscanf(parameters go in here) > 0) since EOF is generally -1 and 0 means no items were scanned. Either way, you are done reading.
I suggest you start small. It looks like you are trying to jump ahead without understanding fundamentals, and C is kinda brutal about needing to know those.
Good luck.
P.S. It's very possible I made a small mistake here just now, writing this, because I'm not a C expert, but I'm sure somebody will come along and help me. Finding mistakes is how we learn, so don't get discouraged. :)

Why does this example use null padding in string comparisons? “Programming Pearls”: Strings of Pearls

In "Programming Pearls": Strings of Pearls, section 15.3 (Generating Text), the author introduces how to generate random text from an input document. In the source code, there are some things that I don't understand.
for (i = 0; i < k; i++)
word[nword][i] = 0;
The author explains: "After reading the input, we append k null characters(so the comparison function doesn't run off the end)." This explanation really confuses me, since it still works well after commenting these two lines. Why is this necessary?
Doing that reduces the number of weird cases you have to deal with when doing character-by-character comparisons.
alphabet
alpha___
If you stepped through this one letter at a time, and the null padding at the end of alpha wasn't there, you'd try to examine the next element... and run right off the end of the array.
The null padding basically ensures that when there's a character in one word, there's a corresponding character in the other. And since the null character has a value of 0, the shorter word always going to be considered as 'less than' the longer one!
As to why it seems to work without those lines, there's two associated reasons I can think of:
This was written in C. C does not guard its array boundaries; you can read whatever junk data is beyond the space that was allocated for it, and you'd never hear a thing.
Your input document is made such that you never compare two strings where one is a prefix of the other (like alpha is to alphabet).
As already explained in another answer, the purpose is to null terminate the string.
But I read the posted link and that loop doesn't make sense. If one looks at the comparison function used, there is no reason why the whole string must be filled with zeroes in this case. A plain word[nword][0] = 0; without the for loop would have worked just as fine. Or preferably:
word[nword][0] = '\0';
Filling the whole string with zeroes will add quite some overhead execution time.

Pointer mystery/noobish issue

I am originally a Java programmer who is now struggling with C and specifically C's pointers.
The idea on my mind is to receive a string, from the user, on a command line, into a character pointer. I then want to access its individual elements. The idea is later to devise a function that will reverse the elements' order. (I want to work with anagrams in texts.)
My code is
#include <stdio.h>
char *string;
int main(void)
{
printf("Enter a string: ");
scanf("%s\n",string);
putchar(*string);
int i;
for (i=0; i<3;i++)
{
string--;
}
putchar(*string);
}
(Sorry, Code marking doesn't work).
What I am trying to do is to have a first shot at accessing individual elements. If the string is "Santillana" and the pointer is set at the very beginning (after scanf()), the content *string ought to be an S. If unbeknownst to me the pointer should happen to be set at the '\0' after scanf(), backing up a few steps (string-- repeated) ought to produce something in the way of a character with *string. Both these putchar()'s, though, produce a Segmentation fault.
I am doing something fundamentally wrong and something fundamental has escaped me. I would be eternally grateful for any advice about my shortcomings, most of all of any tips of books/resources where these particular problems are illuminated. Two thick C books and the reference manual have proved useless as far as this.
You haven't allocated space for the string. You'll need something like:
char string[1024];
You also should not be decrementing the variable string. If it is an array, you can't do that.
You could simply do:
putchar(string[i]);
Or you can use a pointer (to the proposed array):
char *str = string;
for (i = 0; i < 3; i++)
str++;
putchar(*str);
But you could shorten that loop to:
str += 3;
or simply write:
putchar(*(str+3));
Etc.
You should check that scanf() is successful. You should limit the size of the input string to avoid buffer (stack) overflows:
if (scanf("%1023s", string) != 1)
...something went wrong — probably EOF without any data...
Note that %s skips leading white space, and then reads characters up to the next white space (a simple definition of 'word'). Adding the newline to the format string makes little difference. You could consider "%1023[^\n]\n" instead; that looks for up to 1023 non-newlines followed by a newline.
You should start off avoiding global variables. Sometimes, they're necessary, but not in this example.
On a side note, using scanf(3) is bad practice. You may want to look into fgets(3) or similar functions that avoid common pitfalls that are associated with scanf(3).

Array fill in C

I have this problem with a lot of arrays in my program, and I can't understand why. I think I miss something on array theory.
"Someone" adds at the end of my arrays some sort of char characters such as ?^)(&%. For example if I have an array of lenght 5 with "hello", so it's full, sometimes it prints hello?()/&%%. I can undesrtand it can occur if it's of 10 elements and i use only 5, so maybe the other 5 elements get some random values, but if it's full, where the hell gets those strange values?
I partially solve it by manaully adding at the end the character '\0'.
For example this problem occurs, sometimes, when I try to fill an array from another array (i read a line form a test file with fgets, then I have to extract single words):
...
for(x=0;fgets(c,500,fileb);x++) { // read old local file
int l=strlen(c);
i=0;
for (k=0;k<(l-34);k++) {
if(c[k+33]!='\n') {
userDatabaseLocalPath[k]=c[k+33];
}
}
Thanks
Strings in C are terminated by a character with the value 0, often referred to as a character literal, i.e. '\0'.
A character array of size 5 can not hold the string hello, since the terminator doesn't fit. Functions expecting a terminator will be confused.
To declare an array holding a string, the best syntax to use is:
char greeting[] = "hello";
This way, you don't need to specify the length (count the characters), since the compiler does that for you. And you also don't need to include the terminator, it's added automatically so the above will create this, in memory:
+-+-+-+-+-+--+
greeting: |h|e|l|l|o|\0|
+-+-+-+-+-+--+
You say that you have problems "filling an array from another longer array", this sounds like an operation most referred to as string copying. Since strings are just arrays with terminators, you can't blindly copy a longer string over a shorter, unless you know that there is extra space.
Given the above, this code would invoke undefined behavior:
strcpy(greeting, "hi there!");
since the string being copied into the greeting array is longer than what the array has space for.
This is typically avoided by using "known to be large enough" buffers, or adding checks that manually keep track of the space used. There is a function called strncpy() which sort of does this, but I would not recommend using it since its exact semantics are fairly odd.
You are facing the issue of boundary limit for the array.. if the array is of size 5 , then its not necessary that the sixth location which will be \0 be safe.. As it is not memory reserved/assigned for your array.. If after sometime some other application accesses this memory and writes to it. you will lose the \0 resulting in the string helloI accessed this space being read. which is what you are getting.

Pointers in c (how to point to the first char in a string with a pointer pointing somewhere else in the same string)

If I have a pointer that is pointing somewhere in a string, let's say it is pointing at the third letter (we do not know the letter position, basically we don't know it is the third letter), and we want it to point back to the first letter so we can make the string to be NULL how do we do that?
For example:
if we have ascii as a pointer
ascii is pointing now somewhere in the string, and i want it to point at the first char of the string how do i do that?
(Note:
I tried saying
int len = strlen(ascii);
ascii -= len;
ascii = '0';
but it is not working, it changes wherever the pointer is to 0 but not the first char to 0)
You cannot. C and C++ go by the "no hidden costs" rule, which means, among other things, noone else is going to secretly store pointers to beginnings of your strings for you. Another common thing is array sizes; you have to store them yourself as well.
First of all, in your code, if ascii is a pointer to char, that should be
*ascii = '\0';
not what you wrote. Your code sets the pointer itself to the character '0'. Which means it's pointing to a bad place!
Second of all, strlen returns the length of the string you are pointing to. Imagine the string is 10 characters long, and you are pointing at the third character. strlen will return 8 (since the first two characters have been removed from the calculation). Subtracting this from where you are pointing will point you to 6 characters before the start of the string. Draw a picture to help see this.
IMHO, without having some other information, it is not possible to achieve what you are wanting to do.
From what I said above, you should be able to work out what you need to do as long as you keep the original length of the string for example.
No. You, however, can make it point at the last character of the string, which is right before '\0' in every string (it's called zero-terminated string).
What you could do is instead of a char* you could use a pointer to a struct which contains the information you need and the string.
It's not guaranteed to work, but you might get away with backing up until you find a null character, and then moving forward one.
while(*ascii) ascii--;
ascii++;

Resources