Removing multi-char constants in C

Here's some code I found in a very old C library that's trying to eat whitespace from a file...
while(
(line_buf[++line_idx] != ' ') &&
(line_buf[ line_idx] != ' ') &&
(line_buf[ line_idx] != ',') &&
(line_buf[ line_idx] != '\0') )
{
This great thread explains what the problem is, but most of the answers are "just ignore it" or "you should never do this". What I don't see, however, is the canonical solution. Can anyone offer a way to code this test using the "proper way"?
UPDATE: to clarify, the question is "what is the proper way to test for the presence of a string of one or more characters at a given index in another string". Forgive me if I am using the wrong terminology.

Original question
There is no canonical or correct way. Multi-character constants have always been implementation defined. Look up the documentation for the compiler used when the code was written and figure out what was meant.
Updated question
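You can match multiple characters using strchr(). Because strchr() also finds the terminating '\0' of the set string, one call covers the space, comma and '\0' tests. Negate it so the loop keeps going while the character is none of them:
while (!strchr(" ,", line_buf[++line_idx]))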
{
Again, this does not account for that multi-char constant. You should figure out why that was there before simply removing it.
Also, strchr() does not handle Unicode. If you are dealing with a UTF-8 stream, for example, you will need a function capable of handling it.
Finally, if you are concerned about speed, profile. The compiler might get you better results using the three (or four) individual test expressions in the ‘while’ condition.
In other words, the multiple tests might be the best solution!
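For comparison, here is a minimal sketch of both forms of the scan. The function names are hypothetical. Unlike the original loop, which pre-increments line_idx before its first test, these functions start testing at start itself. The strchr() version needs no separate '\0' test because strchr(" ,", '\0') returns a pointer to the set's own terminator.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: index of the first space, comma or '\0' at or after start.
   strchr() counts the terminating '\0' of " ," as part of the set. */
size_t find_delim(const char *buf, size_t start)
{
    size_t i = start;
    while (strchr(" ,", buf[i]) == NULL)
        i++;
    return i;
}

/* The same scan written with explicit comparisons. */
size_t find_delim_explicit(const char *buf, size_t start)
{
    size_t i = start;
    while (buf[i] != ' ' && buf[i] != ',' && buf[i] != '\0')
        i++;
    return i;
}
```

For "alpha,beta gamma", both return 5 from index 0 and 10 from index 6. Which one is faster is exactly the kind of question to settle by profiling.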
Beyond that, I smell some uncouth indexing: how line_idx is updated depends on the surrounding code to drive the loop correctly. Make sure that you don’t introduce an off-by-one error when you change it.
Good luck!

UPDATE: to clarify, the question is "what is the proper way to test for the presence of a string of one or more characters at a given index in another string". Forgive me if I am using the wrong terminology.
Well, there are a number of ways, but the standard way is using strspn which has the prototype:
size_t strspn(const char *s, const char *accept);
and it cleverly:
calculates the length (in bytes) of the initial segment of s which consists entirely of bytes in accept.
This allows you to test for "the presence of a string of one or more characters at a given index in another string", and it tells you how many consecutive characters, starting at that index, are drawn from that set.
For example, say you had another string, char s[] = "somestring", and wanted to know whether it contained the letters r, s, t (held in char *accept = "rst") beginning at the 5th character (index 4). You could test:
size_t n;
if ((n = strspn (&s[4], accept)) > 0)
printf ("matched %zu chars from '%s' at beginning of '%s'\n",
n, accept, &s[4]);
To compare in order, you can use strncmp(&s[4], accept, strlen(accept)) == 0. You can also simply use nested loops to iterate over s with the characters in accept.
All of the ways are "proper", so long as they do not invoke Undefined Behavior (and are reasonably efficient).

Related

Integer to pointer type conversion

int count_words(string word)
{
string spaces = "";
int total_words = 1;
int i, j = 0 ;
for (i = 0; i < strlen(word); i++)
{
strcpy(spaces, word[i]);
if (strcmp(spaces, " ") == 0)
{
total_words = total_words + 1;
}
}
return total_words;
}
I am trying to make a function in C that gets the total number of words, and my strategy is to count the spaces in the input string. However, I get an error at strcpy about an integer-to-pointer conversion, and I can't seem to compare the two strings without getting it. Can someone explain how the error is happening and how I would go about fixing it? The IDE also suggests adding an ampersand before word[i], but then the program crashes with a segmentation fault.
You need to learn a little more about the distinction between characters and strings in C.
When you say
strcpy(spaces, word[i]);
it looks like you're trying to copy one character to a string, so that in the next line you can do
if (strcmp(spaces, " ") == 0)
to compare the string against a string consisting of one space character.
Now, it's true, if you're trying to compare two strings, you do have to call strcmp. Something like
if (spaces == " ") /* WRONG */
definitely won't cut it.
In this case, though, you don't need to compare strings. You're inspecting your input a character at a time, so you can get away with a simple character comparison instead. Get rid of the spaces string and the call to strcpy, and just do
if (word[i] == ' ')
Notice that I'm comparing against the character ' ', not the string " ". Using == to compare single characters like this is perfectly fine.
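Applied to the question's function, that change gives something like the sketch below. It keeps the asker's strategy (one more than the number of spaces), so it assumes words are separated by exactly one space and the input is non-empty. It also swaps CS50's string typedef for const char * so it compiles without cs50.h, and loops until '\0' instead of calling strlen() on every iteration.

```c
#include <stddef.h>

/* Count words as (number of spaces + 1), using a plain character comparison.
   No temporary string, strcpy() or strcmp() is needed. */
int count_words(const char *word)
{
    int total_words = 1;
    for (size_t i = 0; word[i] != '\0'; i++)
    {
        if (word[i] == ' ')   /* compare against the character ' ', not the string " " */
            total_words++;
    }
    return total_words;
}
```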
Sometimes, you do have to construct a string out of individual characters, "by hand", but when you do, it's a little more elaborate. It would look like this:
char spaces[2];
spaces[0] = word[i];
spaces[1] = '\0';
if (strcmp(spaces, " ") == 0)
...
This would work, and you might want to try it to be sure, but it's overkill, and there's no reason to write it that way, except perhaps as a learning exercise.
Why didn't your code
strcpy(spaces, word[i]);
work? What did the error about "integer to pointer conversion" mean? Actually there are several things wrong here.
It's not clear what the actual type of the string spaces is (more on this later), but it has space for at most 0 characters, so you're not going to be able to copy a 1-character string into it.
It's also not clear that spaces is even writable. It might be a constant, meaning that you can't legally copy any characters into it.
Finally, strcpy copies one string to another. In C, although strings are arrays, they're usually referred to as pointers. So strcpy accepts two pointers, one to the source and one to the destination string. But you passed it word[i], which is a single character, not a string. Let's say the character was A. If you hadn't gotten the error, and if strcpy had tried to do its job, it would have treated A as a pointer, and it would have tried to copy a string from address 65 in memory (because the ASCII value of the character A is 65).
This example shows that working with strings is a little bit tricky in C. Strings are represented as arrays, but arrays are second-class citizens in C. You can't pass arrays around, but arrays are usually referred to by simple pointers to their first element, which you can pass around. This turns out to be very convenient and efficient once you understand it, but it's confusing at first, and takes some getting used to.
It might be nice if C did have a built-in, first-class string type, but it does not. Since it does not, C programmers must always keep in mind the distinction between arrays and characters when working with strings. That "convenient" string typedef they give you in CS50 turns out to be a bad idea, because it's not actually convenient at all -- it merely hides an important distinction, and ends up making things even more confusing.

Identifying prefix in the same string as a suffix

Eg-
maabcma is valid because it contains ma as a proper prefix as well as a proper suffix.
panaba is not.
How do I find out if a word is valid or not as above in C language?
I'm not very good at string operations. So, please help me out with a pseudocode.
Thanks in advance.
I'm completely lost. T=number of test cases.
EDIT: New code. My best code so far-
#include<stdio.h>
#include<string.h>
void main()
{
int i,T,flag=0;
int j,k,len=0;
char W[10],X[10];
scanf("%d",&T);
for(i=0;i<T;i++)
{
scanf("%s",W);
for(len=0;W[len]!='\0';len++)
X[len]=W[len];
X[len]='\0';
for(j=len-1;j>=0;j--)
for(k=0;k<len;k++)
{
if(X[k]!=W[j])
flag=0;
else if((j-k)==(len-1))
flag==1;
}
if (flag == 1)
printf("NICE\n");
else
printf("NOT\n");
}
}
Still not getting the proper results. Where am I going wrong?
The thing is, you only set the value of flag when a match exists; you must also reset it to 0 when there is no match. For example, if I have:
pammbap
my prefix is pam and suffix is bap.
According to the final for loop,
p and a match so flag is set to 1.
but when it comes to b and m it does not become zero. Hence, it returns true.
First, void is not a valid return type for main, unless you are developing for Plan 9.
Second, you should get into the habit of checking the return value of scanf() and all input functions in general. You can't rely on the value of T if the user does not input a number, because T is uninitialised. On that same note, you shouldn't use scanf with an unbounded %s scan operation. If the user enters 20 characters, this isn't going to fit into the ten character buffer that you have. An alternative approach is to use fgets to get a whole line of text at once, or, to use a bounded scan operation. If your array fits 10 characters (including the null terminator) then you can use scanf("%9s", W).
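To show the bounded conversion and the return-value check without needing a terminal, the sketch below applies the same "%9s" format through sscanf(), which parses a string instead of stdin; scanf("%9s", word) behaves the same way on standard input. The names read_word_bounded and WORD_CAP are hypothetical, and the width 9 has to be kept in sync with the buffer size by hand.

```c
#include <stdio.h>

#define WORD_CAP 10   /* hypothetical buffer size: 9 characters + '\0', as in the question */

/* Read one whitespace-delimited word of at most WORD_CAP - 1 characters.
   Returns 1 on success, 0 if nothing could be read (sscanf returned 0 or EOF). */
int read_word_bounded(const char *input, char word[WORD_CAP])
{
    /* %9s stops after 9 characters, leaving room for the terminator */
    return sscanf(input, "%9s", word) == 1;
}
```

A 16-character input such as "abcdefghijklmnop" yields "abcdefghi" instead of overflowing the buffer, and an all-whitespace input makes the function return 0.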
Third, single-character variable names are often very hard to understand. Instead of W, use word, instead of T, use testCount or something similar. This means that someone looking at your code for the first time can more easily work out what each variable is used for.
Most importantly, think about the process in your head, and maybe jot it down on paper. How would you solve this problem yourself? As an example, starting with n = 1,
Take the first n characters from the string.
Compare it to the last n characters from the string
Do they match?
If yes, print out the first n characters as the suffix and stop processing.
If no, increment n and try again. Try until n is in the middle of the string.
There are a few other things to think about as well: do you want the biggest match? For example, in the input string ababcdabab, the prefix ab is also the suffix, but the same can be said about abab. In this case, you don't want to stop processing; you want to keep going even after you find a match, so you should store the length of the largest prefix that is also a suffix.
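The steps above, including the "keep the largest match" refinement, can be sketched as a single function. The name longest_prefix_suffix is hypothetical. Following the steps, n only goes up to the middle of the string, so the prefix and suffix never overlap. A word is "valid" in the question's sense when the result is nonzero.

```c
#include <stddef.h>
#include <string.h>

/* Length of the longest prefix of w that is also a suffix of w, trying
   n = 1 .. len/2. Returns 0 if no prefix matches the corresponding suffix. */
size_t longest_prefix_suffix(const char *w)
{
    size_t len = strlen(w);
    size_t best = 0;
    for (size_t n = 1; n <= len / 2; n++)
    {
        /* compare the first n characters with the last n characters */
        if (strncmp(w, w + len - n, n) == 0)
            best = n;   /* keep going: a longer match may still exist */
    }
    return best;
}
```

This returns 2 for maabcma ("ma"), 0 for panaba, and 4 for ababcdabab ("abab").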
Second-most-importantly, running into hurdles like this is incredibly common when learning C, so don't let this put a dampener on your enthusiasm, just keep trying!

Pointer mystery/noobish issue

I am originally a Java programmer who is now struggling with C and specifically C's pointers.
The idea on my mind is to receive a string, from the user, on a command line, into a character pointer. I then want to access its individual elements. The idea is later to devise a function that will reverse the elements' order. (I want to work with anagrams in texts.)
My code is
#include <stdio.h>
char *string;
int main(void)
{
printf("Enter a string: ");
scanf("%s\n",string);
putchar(*string);
int i;
for (i=0; i<3;i++)
{
string--;
}
putchar(*string);
}
What I am trying to do is to have a first shot at accessing individual elements. If the string is "Santillana" and the pointer is set at the very beginning (after scanf()), the content *string ought to be an S. If unbeknownst to me the pointer should happen to be set at the '\0' after scanf(), backing up a few steps (string-- repeated) ought to produce something in the way of a character with *string. Both these putchar()'s, though, produce a Segmentation fault.
I am doing something fundamentally wrong and something fundamental has escaped me. I would be eternally grateful for any advice about my shortcomings, and most of all for any tips on books/resources where these particular problems are explained. Two thick C books and the reference manual have proved useless on this point.
You haven't allocated space for the string. You'll need something like:
char string[1024];
You also should not be decrementing the variable string. If it is an array, you can't do that.
You could simply do:
putchar(string[i]);
Or you can use a pointer (to the proposed array):
char *str = string;
for (i = 0; i < 3; i++)
str++;
putchar(*str);
But you could shorten that loop to:
str += 3;
or simply write:
putchar(*(str+3));
Etc.
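Here is a small sketch of the pointer approach as a function. The name char_at is hypothetical, and k is assumed to be no greater than strlen(s). The function walks a separate pointer and never modifies the array name itself, and all three forms from the answer reach the same element.

```c
#include <stddef.h>

/* Return the character at offset k in s by advancing a copy of the pointer. */
char char_at(const char *s, size_t k)
{
    const char *p = s;   /* copy of the start address; s itself is untouched */
    p += k;              /* same effect as k repetitions of p++ */
    return *p;           /* same value as s[k] and *(s + k) */
}
```

For "Santillana", char_at(s, 0) is 'S' and char_at(s, 3) is 't', matching s[3] and *(s + 3).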
You should check that scanf() is successful. You should limit the size of the input string to avoid buffer (stack) overflows:
if (scanf("%1023s", string) != 1)
...something went wrong — probably EOF without any data...
Note that %s skips leading white space, and then reads characters up to the next white space (a simple definition of 'word'). Adding the newline to the format string makes little difference. You could consider "%1023[^\n]\n" instead; that looks for up to 1023 non-newlines followed by a newline.
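The difference between the two formats can be checked without a terminal by running them through sscanf(), which uses the same conversion rules as scanf(). The names scan_word, scan_line and LINE_CAP are hypothetical. One difference worth knowing: unlike %s, the %[ conversion does not skip leading white space.

```c
#include <stdio.h>

#define LINE_CAP 1024   /* hypothetical: matches the char string[1024] suggested above */

/* Read one word: skips leading white space, stops at the next white space. */
int scan_word(const char *input, char out[LINE_CAP])
{
    return sscanf(input, "%1023s", out) == 1;
}

/* Read up to 1023 characters that are not newlines (spaces included). */
int scan_line(const char *input, char out[LINE_CAP])
{
    return sscanf(input, "%1023[^\n]", out) == 1;
}
```

Given "Santillana de Mar\n", scan_word() stores "Santillana" while scan_line() stores "Santillana de Mar".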
You should start off avoiding global variables. Sometimes, they're necessary, but not in this example.
On a side note, using scanf(3) is bad practice. You may want to look into fgets(3) or similar functions that avoid common pitfalls that are associated with scanf(3).

Tool functions for chars

I want to handle some char variables and would like to get a list of some functions that can do these tasks when it comes to handling chars.
Getting first characters of a char (var_name[1] doesn't seem to work)
Getting last characters of a char
Checking for char1 matches with char2 (e.g. if "unicorn" matches words with "bicycle")
I am pretty sure some of these methods exist in libraries such as stdio.h or so but Google isn't my friend.
EDIT: My 3rd question means not a direct match with strcmp but a single-character match (e.g. "hey" and "hello" have e as a common letter).
Use var_name[0] to get first character (array indexes run from 0 to N - 1, where N is the number of elements in the array).
Use var_name[strlen(var_name) - 1] to get the last character.
Use strcmp() to compare two char strings.
EDIT:
To search for character in a string you can use strchr():
if (strchr("hello", 'e') && strchr("hey", 'e'))
{
}
There is also strpbrk() function that would indicate if two strings have any common characters:
if (strpbrk("hello", "hey"))
{
}
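The three tasks from the question can be collected into tiny helpers. This is a sketch and the function names are hypothetical. last_char() assumes a non-empty string, since strlen("") - 1 would wrap around to a huge index.

```c
#include <stddef.h>
#include <string.h>

/* First character; indexing starts at 0. */
char first_char(const char *s)
{
    return s[0];
}

/* Last character before the terminator; s must be non-empty. */
char last_char(const char *s)
{
    return s[strlen(s) - 1];
}

/* Nonzero if a and b share at least one character.
   strpbrk() returns a pointer to the first character of a found in b, or NULL. */
int have_common_char(const char *a, const char *b)
{
    return strpbrk(a, b) != NULL;
}
```

For the question's examples, have_common_char("unicorn", "bicycle") is true (both contain 'i' and 'c'), as is have_common_char("hey", "hello").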
Assuming you mean a char[], and not a char which is a single character.
C uses 0-based indexing, var_name[0] gives you the first char.
strlen() gives you the length of the string, which together with my answer to 1. means
char lastchar = var_name[strlen(var_name)-1]; http://www.cplusplus.com/reference/clibrary/cstring/strlen/
strcmp(var_name1, var_name2) == 0. http://www.cplusplus.com/reference/clibrary/cstring/strcmp/
I am pretty sure some of these methods exist in libraries such as stdio.h or so but Google isn't my friend.
The string functions in the C standard library (libc) are declared in the header file <string.h>. If you're on a unix-ish machine, try typing man 3 string at a command line. You can then use the man program again to get more information about specific functions, e.g. man 3 strlen. (The '3' just tells man to look in "section 3", which describes the C standard library functions.)
What you're looking for is the string functions in the C runtime library. These are defined in string.h, not stdio.h.
But your list of problems is simple:
var_name[0] works perfectly well for accessing the first char in an array. var_name[1] gives you the second char, not the first, because arrays in C are zero-based.
The last char in an array is:
char c;
c = var_name[strlen(var_name)-1];
Testing for equality is simple:
if (var_name[0] == var_name[1])
; // they match
C and C++ strings are zero indexed. The memory you need to hold a particular length string has to be at least the string length and one character for the string terminator \0. So, the first character is array[0].
As @Carey Gregory said, the basic string-handling functions are in string.h. But these are only primitives for handling strings. C is a low-level enough language that you have the opportunity to build your own string-handling library on top of the functions in string.h.
One example might be that you want to pass a string pointer to a function along with the length of the buffer holding that same string, not just the string length itself.
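As a sketch of that idea, here is a hypothetical append_bounded() helper that receives the size of the whole destination buffer rather than just the string length. It refuses to append, leaving dst unchanged, when the result plus its terminator would not fit. This is an illustration, not a function from any library.

```c
#include <stddef.h>
#include <string.h>

/* Append src to the string in dst, where dst_size is the size of the whole buffer.
   Returns 0 on success, or -1 (dst unchanged) if the result would not fit. */
int append_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t used = strlen(dst);
    size_t extra = strlen(src);
    /* need used + extra + 1 <= dst_size, written so it cannot wrap around */
    if (used >= dst_size || extra >= dst_size - used)
        return -1;
    memcpy(dst + used, src, extra + 1);   /* +1 also copies the '\0' */
    return 0;
}
```

With char buf[8] = "abc", appending "defg" succeeds (7 characters plus '\0' fill the buffer exactly), and a further append of "h" is rejected.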

Different ways to calculate string length

A comment on one of my answers has left me a little puzzled. When trying to compute how much memory is needed to concat two strings to a new block of memory, it was said that using snprintf was preferred over strlen, as shown below:
size_t length = snprintf(0, 0, "%s%s", str1, str2);
// preferred over:
size_t length = strlen(str1) + strlen(str2);
Can I get some reasoning behind this? What is the advantage, if any, and would one ever see one result differ from the other?
I was the one who said it, and I left out the +1 in my comment which was written quickly and carelessly, so let me explain. My point was merely that you should use the pattern of using the same method to compute the length that will eventually be used to fill the string, rather than using two different methods that could potentially differ in subtle ways.
For example, if you had three strings rather than two, and two or more of them overlapped, it would be possible that strlen(str1)+strlen(str2)+strlen(str3)+1 exceeds SIZE_MAX and wraps past zero, resulting in under-allocation and truncation of the output (if snprintf is used) or extremely dangerous memory corruption (if strcpy and strcat are used).
snprintf will return -1 with errno=EOVERFLOW when the resulting string would be longer than INT_MAX, so you're protected. You do need to check the return value before using it though, and add one for the null terminator.
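Here is a minimal sketch of the pattern described above: snprintf() both measures and fills, the return value is checked, and 1 is added for the null terminator. The name concat_alloc is hypothetical, and the caller is responsible for calling free() on the result.

```c
#include <stdio.h>
#include <stdlib.h>

/* Concatenate a and b into a newly malloc'd string.
   Returns NULL on an encoding error or allocation failure. */
char *concat_alloc(const char *a, const char *b)
{
    int length = snprintf(NULL, 0, "%s%s", a, b);   /* characters needed, excluding '\0' */
    if (length < 0)
        return NULL;
    char *out = malloc((size_t)length + 1);         /* +1 for the terminator */
    if (out == NULL)
        return NULL;
    snprintf(out, (size_t)length + 1, "%s%s", a, b);
    return out;
}
```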
If you only need to determine how big would be the concatenation of the two strings, I don't see any particular reason to prefer snprintf, since the minimum operations to determine the total length of the two strings is what the two strlen calls do. snprintf will almost surely be slower, because it has to check the parameters and parse the format string besides just walking the two strings counting the characters.
... but... it may be an intelligent move to use snprintf if you are in a scenario where you want to concatenate two strings, and have a static, not too big buffer to handle normal cases, but you can fallback to a dynamically allocated buffer in case of big strings, e.g.:
/* static buffer "big enough" for most cases */
char buffer[256];
/* pointer used in the part where work on the string is actually done */
char * outputStr=buffer;
/* try to concatenate, get the length of the resulting string */
int length = snprintf(buffer, sizeof(buffer), "%s%s", str1, str2);
if(length<0)
{
/* error, panic and death */
}
else if((size_t)length>=sizeof(buffer))
{
/* buffer wasn't enough, allocate dynamically */
outputStr=malloc(length+1);
if(outputStr==NULL)
{
/* allocation error, death and panic */
}
if(snprintf(outputStr, length+1, "%s%s", str1, str2)<0) /* length+1: room for the '\0' too */
{
/* error, the world is doomed */
}
}
/* here do whatever you want with outputStr */
if(outputStr!=buffer)
free(outputStr);
One advantage would be that the input strings are only scanned once (inside the snprintf()) instead of twice for the strlen/strcpy solution.
Actually, on rereading this question and the comment on your previous answer, I don't see the point of using snprintf() just to calculate the concatenated string length. If you're actually doing the concatenation, my paragraph above applies.
You need to add 1 to the strlen() example. Remember you need to allocate space for nul terminating byte.
So snprintf( ) gives me the size a string would have been. That means I can malloc( ) space for that guy. Hugely useful.
I wanted (but did not find until now) this function of snprintf( ) because I format tons of strings for output later; but I wanted not to have to assign static bufs for the outputs because it's hard to predict how long the outputs will be. So I ended up with a lot of 4096-long char arrays :-(
But now -- using this newly-discovered (to me) snprintf( ) char-counting function, I can malloc( ) output bufs AND sleep at night, both.
Thanks again and apologies to the OP and to Matteo.
EDIT: random, mistaken nonsense removed. Did I say that?
EDIT: Matteo in his comment below is absolutely right and I was absolutely wrong.
From C99:
2 The snprintf function is equivalent to fprintf, except that the output is written into an array (specified by argument s) rather than to a stream. If n is zero, nothing is written, and s may be a null pointer. Otherwise, output characters beyond the n-1st are discarded rather than being written to the array, and a null character is written at the end of the characters actually written into the array. If copying takes place between objects that overlap, the behavior is undefined.
Returns
3 The snprintf function returns the number of characters that would have been written had n been sufficiently large, not counting the terminating null character, or a negative value if an encoding error occurred. Thus, the null-terminated output has been completely written if and only if the returned value is nonnegative and less than n.
Thank you, Matteo, and I apologize to the OP.
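The rule in the last sentence of the quote can be checked directly. In this sketch (format_fits is a hypothetical name), the buffer receives whatever fits, needed receives the full length the output would have had, and the function reports completeness as "nonnegative and less than n".

```c
#include <stddef.h>
#include <stdio.h>

/* Format s into buf (size n); report whether the output was complete,
   using the C99 rule: complete iff the return value is >= 0 and < n. */
int format_fits(char *buf, size_t n, const char *s, int *needed)
{
    int ret = snprintf(buf, n, "%s", s);
    *needed = ret;   /* length of the full output, excluding '\0' (or negative on error) */
    return ret >= 0 && (size_t)ret < n;
}
```

With a 4-byte buffer, "hello" is truncated to "hel", needed is 5, and the function returns 0; "hi" fits, so needed is 2 and it returns 1.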
This is great news because it gives a positive answer to a question I'd asked here only three weeks ago. I can't explain why I didn't read all of the answers, which gave me what I wanted. Awesome!
The "advantage" that I can see here is that strlen(NULL) might cause a segmentation fault, while (at least glibc's) snprintf() handles NULL parameters without failing.
Hence, with glibc-snprintf() you don't need to check whether one of the strings is NULL, although length might be slightly larger than needed, because (at least on my system) printf("%s", NULL); prints "(null)" instead of nothing.
I wouldn't recommend using snprintf() instead of strlen() though. It's just not obvious. A much better solution is a wrapper for strlen() which returns 0 when the argument is NULL:
#include <string.h>
size_t my_strlen(const char *str)
{
return str ? strlen(str) : 0;
}

Resources