Normalizing a char array and removing extra characters (truncating) - c

I have the following code that I use to normalize a char array. At the end of the process, the normalized file has some of the old output leftover at the end. This is do to i reaching the end of the array before j. This makes sense but how do I remove the extra characters? I am coming from java so I apologize if I'm making mistakes that seem simple. I have the following code:
/* The normalize procedure normalizes a character array of size len
according to the following rules:
1) turn all upper case letters into lower case ones
2) turn any white-space character into a space character and,
shrink any n>1 consecutive whitespace characters to exactly 1 whitespace
When the procedure returns, the character array buf contains the newly
normalized string and the return value is the new length of the normalized string.
hint: you may want to use C library function isupper, isspace, tolower
do "man isupper"
*/
int
normalize(unsigned char *buf, /* The character array contains the string to be normalized*/
int len /* the size of the original character array */)
{
/* use a for loop to cycle through each character and the built in c funstions to analyze it */
int i = 0;
int j = 0;
int k = len;
if(isspace(buf[0])){
i++;
k--;
}
if(isspace(buf[len-1])){
i++;
k--;
}
for(i;i < len;i++){
if(islower(buf[i])) {
buf[j]=buf[i];
j++;
}
if(isupper(buf[i])) {
buf[j]=tolower(buf[i]);
j++;
}
if(isspace(buf[i]) && !isspace(buf[j-1])) {
buf[j]=' ';
j++;
}
if(isspace(buf[i]) && isspace(buf[i+1])){
i++;
k--;
}
}
return k;
}
Here is some sample output:
halb mwqcnfuokuqhuhy ja mdqu nzskzkdkywqsfbs zwb lyvli HALB
MwQcnfuOKuQhuhy Ja mDQU nZSkZkDkYWqsfBS ZWb lyVLi
As you can see the end part is repeating. Both the new normalized data and old remaining un-normalized data is present in the result. How can I fix this?

add a null terminator
k[newLength]='\0';
return k;

to fix like this
int normalize(unsigned char *buf, int len) {
int i, j;
for(j=i=0 ;i < len; ++i){
if(isupper(buf[i])) {
buf[j++]=tolower(buf[i]);
continue ;
}
if(isspace(buf[i])){
if(!j || j && buf[j-1] != ' ')
buf[j++]=' ';
continue ;
}
buf[j++] = buf[i];
}
buf[j] = '\0';
return j;
}

or? add a null terminator
k[newLength] = NULL;
return k;

Related

How to Replace Leading or Trailing Blank Characters with "X"

Looking for a more efficient way to replace leading and trailing empty spaces (' ') and appending an 'X' to the front for each empty space.. It seems to work ok for trailing spaces but I'd like to know if there's a better / simpler way of going about this that I am missing.
Example:
Passed in string: '12345 '
Desired result 'XXXXX12345'
Removed 5 empty spaces and append 5 'X's to front.
Example:
Passed in string: ' 12345'
Desired result 'XX12345'
Remove 2 empty spaces and append 2 'X's to front.
void fixStr(char* str)
{
int i = 0;
int length = strlen(str);
char strCopy[10];
strcpy(strCpy, str);
for(i = 0; i < length; i++)
{
if(strCopy[i] == ' ')
{
strCopy[i] = '\0';
str[i] = '\0';
break;
}
}
for(i = 0; i < length - i + 2; i++)
{
str[i] = 'X';
str[i + 1] = '\0';
}
strcat(str, strCopy);
}
One way to achieve this is to find out the leading non-space position & trailing non-space position of the string, and then move the content in-between (leading nonspace, trailing nonspace) this to end of the string, then set all the empty space at the beginning to 'x'
This way you can get the expected output (function below)
void fixStr(char* str)
{
int i = 0;
int length = strlen(str);
int leadindex = length;
int tailindex = 0;
// First find the leading nonspace position
for(i = 0; i < length; i++)
{
if(str[i] != ' ')
{
leadindex = i;
break;
}
}
// if not found nonspace then no change
if( leadindex == length )
{
// all spaces, so no change required;
return;
}
// Find the trailing nonspace position
for(i = length - 1; i >= 0 ; i--)
{
if(str[i] != ' ')
{
tailindex = i;
break;
}
}
// move the buffer (in place) to exclude trailing spaces
memmove(str + (length - tailindex -1),str,(tailindex +1) );
// set the 'x' to all empty spaces at leading ( you may use for loop to set this)
memset(str, 'X', length - (tailindex - leadindex + 1) );
}
To solve a problem the engineer's way:
Define the needs.
Know your tools.
Use the tools as simple as possible, as accurate as necessary to make up a solution.
In your case:
Needs:
find the number of trailing spaces
move content of string to the end
set beginning to 'X's
Tools:
to measure, iterate, compare and count
to move a block of memory
to initialise a block of memory
Example for a solution:
#include <string.h> /* for strlen(), memmove () and memset() */
void fix_str(char * s)
{
if ((NULL != s) && ('\0' != *s)) /* Ignore NULL and empty string! */
{
/* Store length and initialise counter: */
size_t l = strlen(s), i = l;
/* Count space(s): */
for (; (0 != i) && (' ' == s[i-1]); --i); /* This for loop does not need a "body". */
/* Calculate the complement: */
size_t c = l - i;
/* Move content to the end overwriting any trailing space(s) counted before hand: */
memmove(s+c, s, i); /* Note that using memmove() instead of memmcpy() is essential
here as the source and destination memory overlap! */
/* Initialise the new "free" characters at the beginning to 'X's:*/
memset(s, 'X', c);
}
}
I didn't fix your code but you could use sprintf in combination with isspace, something along the lines of this. Also, remember to make a space for the '\0 at the end of your string. Use this idea and it should help you:
#include <ctype.h>
#include <stdio.h>
int main()
{
char buf[11];
char *s = "Hello";
int i;
sprintf(buf, "%10s", s); /* right justifies in a column of 10 in buf */
for(i = 0; i < 10; i++) {
if(isspace(buf[i])) /* replace the spaces with an x (or whatever) */
buf[i] = 'x';
}
printf("%s\n", buf);
return 0;
}

Rearranging string letters

I was doing a program to copy all string words other than its first 2 words and putting a x at the end of it.
However i cant put x at its end. Please help!!!!
Below is my code.
#include<stdio.h>
#include<string.h>
int main()
{
char a[25], b[25];
int i, j, count = 0, l, k;
scanf("%[^\n]s", a);
i = strlen(a);
if (i > 20)
printf("Given Sentence is too long.");
else
{/* checking for first 2 words and counting 2 spaces*/
for (j = 0; j < i; j++)
{
if (a[j] == ' ')
count = count + 1;
if (count == 2)
{
k = j;
break;
}
}
/* copying remaining string into new one*/
for (j = 0; j < i - k; j++)
{
b[j] = a[j + k];
}
b[j + 1] = 'x';
printf("%s", b);
}
}
you are removing first two index. But you wrote k=j and if you check the current value j there it's 1. so you are updating wrongly k because you removed 2 indexes. So k value should be 2. So checked the below code
/* copying remaining string into new one*/
for (j = 0; j < i - 2; j++)
{
b[j] = a[j + 2];
}
b[j + 1] = 'x';
printf("%s", b);
Your index is off by one. After your second loop, the condition j < i-k was false, so j now is i-k. Therefore, the character after the end of what you copied is b[j], not b[j+1]. The correct line would therefore be b[j] = 'x';.
Just changing this would leave you with something that is not a string. A string is defined as a sequence of char, ending with a '\0' char. So you have to add b[j+1] = 0; as well.
After these changes, your code does what you intended, but still has undefined behavior.
One problem is that your scanf() will happily overflow your buffer -- use a field width here: scanf("%24[^\n]", a);. And by the way, the s at the and doesn't make any sense, you use either the s conversion or the [] conversion.
A somewhat sensible implementation would use functions suited for the job, like e.g. this:
#include<stdio.h>
#include<string.h>
int main(void)
{
// memory is *cheap* nowadays, these buffers are still somewhat tiny:
char a[256];
char b[256];
// read a line
if (!fgets(a, 256, stdin)) return 1;
// and strip the newline character if present
a[strcspn(a, "\n")] = 0;
// find first space
char *space = strchr(a, ' ');
// find second space
if (space) space = strchr(space+1, ' ');
if (space)
{
// have two spaces, copy the rest
strcpy(b, space+1);
// and append 'x':
strcat(b, "x");
}
else
{
// empty string:
b[0] = 0;
}
printf("%s",b);
return 0;
}
For functions you don't know, google for man <function>.
In C strings are array of chars as you know and the way C knows it is end of the string is '\0' character. In your example you are missing at the last few lines
/* copying remaining string into new one*/
for(j=0;j<i-k;j++)
{
b[j]=a[j+k];
}
b[j+1]='x';
printf("%s",b);
after the loop ends j is already increased 1 before it quits the loop.
So if your string before x is "test", it is like
't', 'e', 's', 't','\0' in char array, and since your j is increased more than it should have, it gets to the point just right of '\0', but characters after '\0' doesnt matter, because it is the end, so your x will not be added. Simple change to
b[j]='x';

Parsing character array to words held in pointer array (C-programming)

I am trying to separate each word from a character array and put them into a pointer array, one word for each slot. Also, I am supposed to use isspace() to detect blanks. But if there is a better way, I am all ears. At the end of the code I want to print out the content of the parameter array.
Let's say the line is: "this is a sentence". What happens is that it prints out "sentence" (the last word in the line, and usually followed by some random character) 4 times (the number of words). Then I get "Segmentation fault (core dumped)".
Where am I going wrong?
int split_line(char line[120])
{
char *param[21]; // Here I want to put one word for each slot
char buffer[120]; // Word buffer
int i; // For characters in line
int j = 0; // For param words
int k = 0; // For buffer chars
for(i = 0; i < 120; i++)
{
if(line[i] == '\0')
break;
else if(!isspace(line[i]))
{
buffer[k] = line[i];
k++;
}
else if(isspace(line[i]))
{
buffer[k+1] = '\0';
param[j] = buffer; // Puts word into pointer array
j++;
k = 0;
}
else if(j == 21)
{
param[j] = NULL;
break;
}
}
i = 0;
while(param[i] != NULL)
{
printf("%s\n", param[i]);
i++;
}
return 0;
}
There are many little problems in this code :
param[j] = buffer; k = 0; : you rewrite at the beginning of buffer erasing previous words
if(!isspace(line[i])) ... else if(isspace(line[i])) ... else ... : isspace(line[i]) is either true of false, and you always use the 2 first choices and never the third.
if (line[i] == '\0') : you forget to terminate current word by a '\0'
if there are multiple white spaces, you currently (try to) add empty words in param
Here is a working version :
int split_line(char line[120])
{
char *param[21]; // Here I want to put one word for each slot
char buffer[120]; // Word buffer
int i; // For characters in line
int j = 0; // For param words
int k = 0; // For buffer chars
int inspace = 0;
param[j] = buffer;
for(i = 0; i < 120; i++) {
if(line[i] == '\0') {
param[j++][k] = '\0';
param[j] = NULL;
break;
}
else if(!isspace(line[i])) {
inspace = 0;
param[j][k++] = line[i];
}
else if (! inspace) {
inspace = 1;
param[j++][k] = '\0';
param[j] = &(param[j-1][k+1]);
k = 0;
if(j == 21) {
param[j] = NULL;
break;
}
}
}
i = 0;
while(param[i] != NULL)
{
printf("%s\n", param[i]);
i++;
}
return 0;
}
I only fixed the errors. I leave for you as an exercise the following improvements :
the split_line routine should not print itself but rather return an array of words - beware you cannot return an automatic array, but it would be another question
you should not have magic constants in you code (120), you should at least have a #define and use symbolic constants, or better accept a line of any size - here again it is not simple because you will have to malloc and free at appropriate places, and again would be a different question
Anyway good luck in learning that good old C :-)
This line does not seems right to me
param[j] = buffer;
because you keep assigning the same value buffer to different param[j] s .
I would suggest you copy all the char s from line[120] to buffer[120], then point param[j] to location of buffer + Next_Word_Postition.
You may want to look at strtok in string.h. It sounds like this is what you are looking for, as it will separate words/tokens based on the delimiter you choose. To separate by spaces, simply use:
dest = strtok(src, " ");
Where src is the source string and dest is the destination for the first token on the source string. Looping through until dest == NULL will give you all of the separated words, and all you have to do is change dest each time based on your pointer array. It is also nice to note that passing NULL for the src argument will continue parsing from where strtok left off, so after an initial strtok outside of your loop, just use src = NULL inside. I hope that helps. Good luck!

Returning the length of a char array in C

I am new to programming in C and am trying to write a simple function that will normalize a char array. At the end i want to return the length of the new char array. I am coming from java so I apologize if I'm making mistakes that seem simple. I have the following code:
/* The normalize procedure normalizes a character array of size len
according to the following rules:
1) turn all upper case letters into lower case ones
2) turn any white-space character into a space character and,
shrink any n>1 consecutive whitespace characters to exactly 1 whitespace
When the procedure returns, the character array buf contains the newly
normalized string and the return value is the new length of the normalized string.
*/
int
normalize(unsigned char *buf, /* The character array contains the string to be normalized*/
int len /* the size of the original character array */)
{
/* use a for loop to cycle through each character and the built in c functions to analyze it */
int i;
if(isspace(buf[0])){
buf[0] = "";
}
if(isspace(buf[len-1])){
buf[len-1] = "";
}
for(i = 0;i < len;i++){
if(isupper(buf[i])) {
buf[i]=tolower(buf[i]);
}
if(isspace(buf[i])) {
buf[i]=" ";
}
if(isspace(buf[i]) && isspace(buf[i+1])){
buf[i]="";
}
}
return strlen(*buf);
}
How can I return the length of the char array at the end? Also does my procedure properly do what I want it to?
EDIT: I have made some corrections to my program based on the comments. Is it correct now?
/* The normalize procedure normalizes a character array of size len
according to the following rules:
1) turn all upper case letters into lower case ones
2) turn any white-space character into a space character and,
shrink any n>1 consecutive whitespace characters to exactly 1 whitespace
When the procedure returns, the character array buf contains the newly
normalized string and the return value is the new length of the normalized string.
*/
int
normalize(unsigned char *buf, /* The character array contains the string to be normalized*/
int len /* the size of the original character array */)
{
/* use a for loop to cycle through each character and the built in c funstions to analyze it */
int i = 0;
int j = 0;
if(isspace(buf[0])){
//buf[0] = "";
i++;
}
if(isspace(buf[len-1])){
//buf[len-1] = "";
i++;
}
for(i;i < len;i++){
if(isupper(buf[i])) {
buf[j]=tolower(buf[i]);
j++;
}
if(isspace(buf[i])) {
buf[j]=' ';
j++;
}
if(isspace(buf[i]) && isspace(buf[i+1])){
//buf[i]="";
i++;
}
}
return strlen(buf);
}
The canonical way of doing something like this is to use two indices, one for reading, and one for writing. Like this:
int normalizeString(char* buf, int len) {
int readPosition, writePosition;
bool hadWhitespace = false;
for(readPosition = writePosition = 0; readPosition < len; readPosition++) {
if(isspace(buf[readPosition]) {
if(!hadWhitespace) buf[writePosition++] = ' ';
hadWhitespace = true;
} else if(...) {
...
}
}
return writePosition;
}
Warning: This handles the string according to the given length only. While using a buffer + length has the advantage of being able to handle any data, this is not the way C strings work. C-strings are terminated by a null byte at their end, and it is your job to ensure that the null byte is at the right position. The code you gave does not handle the null byte, nor does the buffer + length version I gave above. A correct C implementation of such a normalization function would look like this:
int normalizeString(char* string) { //No length is passed, it is implicit in the null byte.
char* in = string, *out = string;
bool hadWhitespace = false;
for(; *in; in++) { //loop until the zero byte is encountered
if(isspace(*in) {
if(!hadWhitespace) *out++ = ' ';
hadWhitespace = true;
} else if(...) {
...
}
}
*out = 0; //add a new zero byte
return out - string; //use pointer arithmetic to retrieve the new length
}
In this code I replaced the indices by pointers simply because it was convenient to do so. This is simply a matter of style preference, I could have written the same thing with explicit indices. (And my style preference is not for pointer iterations, but for concise code.)
if(isspace(buf[i])) {
buf[i]=" ";
}
This should be buf[i] = ' ', not buf[i] = " ". You can't assign a string to a character.
if(isspace(buf[i]) && isspace(buf[i+1])){
buf[i]="";
}
This has two problems. One is that you're not checking whether i < len - 1, so buf[i + 1] could be off the end of the string. The other is that buf[i] = "" won't do what you want at all. To remove a character from a string, you need to use memmove to move the remaining contents of the string to the left.
return strlen(*buf);
This would be return strlen(buf). *buf is a character, not a string.
The notations like:
buf[i]=" ";
buf[i]="";
do not do what you think/expect. You will probably need to create two indexes to step through the array — one for the current read position and one for the current write position, initially both zero. When you want to delete a character, you don't increment the write position.
Warning: untested code.
int i, j;
for (i = 0, j = 0; i < len; i++)
{
if (isupper(buf[i]))
buf[j++] = tolower(buf[i]);
else if (isspace(buf[i])
{
buf[j++] = ' ';
while (i+1 < len && isspace(buf[i+1]))
i++;
}
else
buf[j++] = buf[i];
}
buf[j] = '\0'; // Null terminate
You replace the arbitrary white space with a plain space using:
buf[i] = ' ';
You return:
return strlen(buf);
or, with the code above:
return j;
Several mistakes in your code:
You cannot assign buf[i] with a string, such as "" or " ", because the type of buf[i] is char and the type of a string is char*.
You are reading from buf and writing into buf using index i. This poses a problem, as you want to eliminate consecutive white-spaces. So you should use one index for reading and another index for writing.
In C/C++, a native string is an array of characters that ends with 0. So in essence, you can simply iterate buf until you read 0 (you don't need to use the len variable at all). In addition, since you are "truncating" the input string, you should set the new last character to 0.
Here is one optional solution for the problem at hand:
int normalize(char* buf)
{
char c;
int i = 0;
int j = 0;
while (buf[i] != 0)
{
c = buf[i++];
if (isspace(c))
{
j++;
while (isspace(c))
c = buf[i++];
}
if (isupper(c))
buf[j] = tolower(c);
j++;
}
buf[j] = 0;
return j;
}
you should write:
return strlen(buf)
instead of:
return strlen(*buf)
The reason:
buf is of type char* - it's an address of a char somewhere in the memory (the one in the beginning of the string). The string is null terminated (or at least should be), and therefore the function strlen knows when to stop counting chars.
*buf will de-reference the pointer, resulting on a char - not what strlen expects.
Not much different then others but assumes this is an array of unsigned char and not a C string.
tolower() does not itself need the isupper() test.
int normalize(unsigned char *buf, int len) {
int i = 0;
int j = 0;
int previous_is_space = 0;
while (i < len) {
if (isspace(buf[i])) {
if (!previous_is_space) {
buf[j++] = ' ';
}
previous_is_space = 1;
} else {
buf[j++] = tolower(buf[i]);
previous_is_space = 0;
}
i++;
}
return j;
}
#OP:
Per the posted code it implies leading and trailing spaces should either be shrunk to 1 char or eliminate all leading and trailing spaces.
The above answer simple shrinks leading and trailing spaces to 1 ' '.
To eliminate trailing and leading spaces:
int i = 0;
int j = 0;
while (len > 0 && isspace(buf[len-1])) len--;
while (i < len && isspace(buf[i])) i++;
int previous_is_space = 0;
while (i < len) { ...

Using histogram to find the most common letter in an array

This is what I came up with, but I always get a Run-Time Check Failure #2 - Stack around the variable 'h' was corrupted.
int mostCommonLetter(char s[]) {
int i=0, h[26],k=0, max=0, number=0;
while ( k < 26){
h[k] = '0';
k++;
}
while(s[i] != '\0'){
h[whichLetter(s[i])] = h[whichLetter(s[i])]+1;
i++;
}
h[26] = '\0';
for(i=0;h[i]!='\0';i++){
if(h[i] > max)
number=i;
}
return number;
}
You cannot do h[26] = '\0'; - h has 26 elements indexed 0..25. As you know the length of h you don't need to 0-terminate it, simply do for (i=0; i < 26; ++i)
Also, are you certain whichLetter always returns a value in the 0..25 range? What does it do if it e.g. encounters a space?
This writes past the end of the array:
h[26] = '\0';
Make the for loop depend on the length rather than the last character:
for(i=0;i<26;i++){
if(h[i] > max)
number=i;
}

Resources