Input Until Non-Word Character in C, Like Java's \W - c

I want to input words from a file which are delimited by anything that isn't a letter. Is there an easy way to do this in C similar to using the \W predefined character classes in Java?
How should I approach this?
Thanks.

Character classes in general and \W specifically really aren't related to Java at all. They're just part of regular expression support, which is available for many languages, including GNU C. See GNU C library Regular Expressions.

You can make use of the is* family of functions declared in <ctype.h>.
For instance, isalpha() will tell you if something is an alphabetic character. isspace() will tell you if something is whitespace.
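As a rough sketch of that approach, the function below pulls the next run of letters out of a stream with fgetc() and isalpha(). The name read_word and its return convention are made up for illustration; they are not part of any library.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: reads the next run of letters from fp into buf.
   Returns 1 if a word was stored, 0 at end of input. */
int read_word(FILE *fp, char *buf, size_t size)
{
    int c;
    size_t n = 0;

    if (size == 0)
        return 0;

    /* Skip delimiters: anything that is not a letter. */
    while ((c = fgetc(fp)) != EOF && !isalpha((unsigned char)c))
        ;
    if (c == EOF)
        return 0;

    /* Collect letters; extra letters of an over-long word are dropped. */
    do {
        if (n + 1 < size)
            buf[n++] = (char)c;
    } while ((c = fgetc(fp)) != EOF && isalpha((unsigned char)c));

    buf[n] = '\0';
    return 1;
}
```

Calling it in a loop, while (read_word(file, word, sizeof word)), visits every word in the file. Note that isalpha() depends on the current locale, so "letter" here means whatever the locale says it means.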

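You can use fscanf, provided you bound each read with a field width so that a long word cannot overflow the buffer:
char temp_str[1000];
char temp_w[1000];
fscanf(file, "%*[^a-zA-Z]"); /* skip any delimiters before the first word */
while (fscanf(file, "%999[a-zA-Z]%999[^a-zA-Z]", temp_str, temp_w) >= 1) {
    printf("STR: %s\n", temp_str);
}
Edit: this should do the trick. temp_str will hold each sequence of letters and temp_w will hold the delimiting characters that follow it. The loop tests for >= 1 rather than != EOF because fscanf returns 0, without consuming anything, when the next character is not a letter; a != EOF test would then loop forever.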

Related

removing multi-char constants in C

Here's some code I found in a very old C library that's trying to eat whitespace from a file...
while(
(line_buf[++line_idx] != ' ') &&
(line_buf[ line_idx] != ' ') &&
(line_buf[ line_idx] != ',') &&
(line_buf[ line_idx] != '\0') )
{
This great thread explains what the problem is, but most of the answers are "just ignore it" or "you should never do this". What I don't see, however, is the canonical solution. Can anyone offer a way to code this test using the "proper way"?
UPDATE: to clarify, the question is "what is the proper way to test for the presence of a string of one or more characters at a given index in another string". Forgive me if I am using the wrong terminology.
Original question
There is no canonical or correct way. Multi-character constants have always been implementation defined. Look up the documentation for the compiler used when the code was written and figure out what was meant.
Updated question
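You can match multiple characters using strchr(). Because strchr() also matches the terminating '\0' of the set string, negating its result reproduces the space, comma and '\0' tests of the original loop:
while (!strchr( " ,", line_buf[++line_idx] ))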
{
Again, this does not account for that multi-char constant. You should figure out why that was there before simply removing it.
Also, strchr() does not handle Unicode. If you are dealing with a UTF-8 stream, for example, you will need a function capable of handling it.
Finally, if you are concerned about speed, profile. The compiler might get you better results using the three (or four) individual test expressions in the ‘while’ condition.
In other words, the multiple tests might be the best solution!
Beyond that, I smell some uncouth indexing: the way that line_idx is updated depends on the surrounding code to actuate the loop properly. Make sure that you don’t create an off-by-one error when you update stuff.
Good luck!
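For what it's worth, here is a small self-contained sketch of the strchr() version of the loop. The helper name next_delim is hypothetical, and the multi-char constant from the old code is deliberately left out because its intended meaning is unknown. The result of strchr() is negated so that scanning continues while the character is not a delimiter.

```c
#include <string.h>

/* Hypothetical helper: starting just after index idx, returns the index of
   the next ' ', ',' or terminating '\0' in buf. Assumes buf[idx] is a
   character before the terminator, so that buf[idx + 1] is readable. */
size_t next_delim(const char *buf, size_t idx)
{
    /* strchr() also finds the '\0' that ends " ,", so the loop
       stops at the end of buf as well. */
    while (!strchr(" ,", buf[++idx]))
        ;
    return idx;
}
```

Like the original, this pre-increments the index, so the character at idx itself is never tested.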
UPDATE: to clarify, the question is "what is the proper way to test
for the presence of a string of one or more characters at a given
index in another string". Forgive me if I am using the wrong
terminology.
Well, there are a number of ways, but the standard way is using strspn which has the prototype:
size_t strspn(const char *s, const char *accept);
and it cleverly:
calculates the length (in bytes) of the initial segment of s
which consists entirely of bytes in accept.
This allows you to test for the "the presence of a string of one or more characters at a given index in another string" and tells you how many of the characters from that string were sequentially matched.
For example, if you had another string, say char *s = "somestring";, and wanted to know whether it contained the letters r, s, t (say, in char *accept = "rst";) beginning at the 5th character, you could test:
size_t n;
if ((n = strspn (&s[4], accept)) > 0)
printf ("matched %zu chars from '%s' at beginning of '%s'\n",
n, accept, &s[4]);
To compare in order, you can test strncmp (&s[4], accept, strlen (accept)) == 0. You can also simply use nested loops to iterate over s with the characters in accept.
All of the ways are "proper", so long as they do not invoke Undefined Behavior (and are reasonably efficient).
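The two library-based options can be wrapped up as follows. This is only a sketch: span_at and matches_at are illustrative names, and both assume pos is no greater than strlen(s).

```c
#include <string.h>

/* Returns how many leading bytes of s + pos belong to the set accept. */
size_t span_at(const char *s, size_t pos, const char *accept)
{
    return strspn(s + pos, accept);
}

/* Returns 1 if the exact string needle occurs in s starting at pos. */
int matches_at(const char *s, size_t pos, const char *needle)
{
    return strncmp(s + pos, needle, strlen(needle)) == 0;
}
```

With s = "somestring", span_at(s, 4, "rst") is 3 because "str" is made up entirely of characters from the set, while matches_at(s, 4, "rst") is 0 because the characters do not appear in that order.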

C: Parsing through an input file in order to check formatting

My goal here is to read through a text file that has to follow these formatting regulations:
No spaces/tabs between characters
Characters must be either non-negative integer or new line character (no letters/symbols)
I can only use functions given in stdlib.h and stdio.h
I am thinking of reading through the file character by character using the fgetc() function, but I can't think of a way to test whether or not the character is a new line character (isn't a new line char \n, which would be two chars together, which would ruin the idea of going char by char?).
Following this train of thought I was thinking that using getline(), which would negate the necessity of checking if a char is a new line character, would be easier (am I right in thinking this or would this not negate such a requirement?). Yet if I were to do this what would be the easiest way to traverse through the char string that this would produce in order to still check each individual character?
Also, if someone could think of an easier route as to checking for the format of a file using the given libraries that would be much appreciated.
If you use getline it will retain the newline character so you'll still have to check for it.
If you can use any of the standard C library then you can use isdigit(...) from <ctype.h>.
It returns non-zero if the input character is a digit.
If you use the getline function, the input is written into the buffer you pass as the first argument. getline appends a null terminator to this buffer, so you can walk through it like so:
for(char* s = buffer; *s; s++)
/* test *s */
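If only <stdio.h> is allowed, the check can also be done character by character. In source code '\n' is a single character (the backslash is just an escape), so comparing the value returned by fgetc() against it works directly. The function name below is made up for this sketch.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if every character in fp is a decimal
   digit or a newline, 0 otherwise. Reads fp to the end (or to the first
   offending character). */
int is_valid_format(FILE *fp)
{
    int c;

    while ((c = fgetc(fp)) != EOF) {
        /* '0'..'9' are guaranteed to be contiguous in C. */
        if (c != '\n' && (c < '0' || c > '9'))
            return 0;
    }
    return 1;
}
```

Spaces, tabs, letters, and a minus sign all fail the test, which covers the "no spaces" and "non-negative integer" rules in one pass.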

C define string as char

Is it possible to define a string as char in C like this? I think C calls it multi character constant.
#define OK '_/'
I want C to treat '_/' as a char from now on, not a string, so this:
printf("%c", OK);
prints _/ and not /
While it is technically valid C to define OK as '_/', the value of a multi-character character constant is implementation defined, so this is probably not something you want to do.
There is no way you will be able to print more than one character without resorting to strings.
Multi-character constants have type int and their value is implementation-defined. Using them as ordinary characters is not a good idea: although the compiler accepts them wherever a character value is expected, there is no guarantee that they will be compiled as you intend (in your example, only the last character of the constant is printed).
Here is an explanation of the topic:
Multiple characters in a character constant
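A minimal sketch of the string-based alternative, assuming it is acceptable to print with %s instead of %c (print_ok is a hypothetical name):

```c
#include <stdio.h>

#define OK "_/"   /* a string literal, not a multi-character constant */

/* Hypothetical helper: prints the marker and returns the number of
   characters written, as reported by printf(). */
int print_ok(void)
{
    return printf("%s", OK);
}
```

This prints both characters on every implementation, at the cost of OK no longer being usable where a single char is required.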

How to restrict input to the defined size of an array through scanf?

How to limit the number of character input accepted to the array size-1 with scanf?
Like for char a[10], to only take 9 characters.
Is it possible or not. And in which header file is snprintf defined? Is it compiler specific? Let me know the header file for it.
How to limit the number of character input accepted to the array
size-1 with scanf? Like for char a[10], to only take 9 characters. Is
it possible or not.
It's possible to specify a maximum field width, but it's probably not what you want. I'm going to derive my example from the %s format directive. Just place the width, in decimal digits, between the % and the s, for example "%9s" instructs scanf to read and copy no more than 9 characters. Keep in mind that %s instructs scanf to read a word of input, not necessarily a line. It'll stop copying input into your array as soon as it comes across a whitespace character.
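char a[10];
/* Do not wrap scanf in assert(): with NDEBUG defined the call would vanish. */
if (scanf("%9s", a) != 1) {
    /* TODO: Handle scanf errors (input failure or end of file) */
}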
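To see what the width does with over-long input, here is a sketch that uses fscanf() on an arbitrary stream so it can be tried on a file as well as stdin. The wrapper name read_token is an assumption of this example.

```c
#include <stdio.h>

/* Hypothetical helper: reads one whitespace-delimited token of at most
   9 characters from fp into a. Returns 1 on success, 0 on EOF or error. */
int read_token(FILE *fp, char a[10])
{
    /* The width 9 leaves room for the terminating '\0' in a[10]. */
    return fscanf(fp, "%9s", a) == 1;
}
```

Given the input "abcdefghijklmnop xyz", the first call stores "abcdefghi" and the rest of the word stays in the stream, so the second call stores "jklmnop". That leftover is the reason a field width alone is often not what you want.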
And in which header file is snprintf defined?
<stdio.h>.
Is it compiler specific?
C99 distinguishes two kinds of implementation: hosted and freestanding. Freestanding implementations aren't required to provide <stdio.h>, so technically snprintf isn't required to exist in those environments. You are most likely using a hosted implementation, however, and hosted implementations must provide snprintf in <stdio.h>.

Tool functions for chars

I want to handle some char variables and would like to get a list of some functions that can do these tasks when it comes to handling chars.
Getting first characters of a char (var_name[1] doesnt seem to work)
Getting last characters of a char
Checking for char1 matches with char2 ( eg if "unicorn" matches words with "bicycle"
I am pretty sure some of these methods exist in libraries such as stdio.h or so but google isnt my friend.
EDIT:My 3rd question means not direct match with strcmp but single character match(eg if "hey" and "hello") have e as common letter.
Use var_name[0] to get first character (array indexes run from 0 to N - 1, where N is the number of elements in the array).
Use var_name[strlen(var_name) - 1] to get the last character.
Use strcmp() to compare two char strings.
EDIT:
To search for character in a string you can use strchr():
if (strchr("hello", 'e') && strchr("hey", 'e'))
{
}
There is also strpbrk() function that would indicate if two strings have any common characters:
if (strpbrk("hello", "hey"))
{
}
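Put together as small helpers, this might look like the sketch below. The function names are invented for illustration, and last_char adds a guard for the empty string, where strlen(s) - 1 would otherwise wrap around.

```c
#include <string.h>

/* Returns the first character of s ('\0' if s is empty). */
char first_char(const char *s)
{
    return s[0];
}

/* Returns the last character of s, or '\0' if s is empty. */
char last_char(const char *s)
{
    size_t len = strlen(s);
    return len ? s[len - 1] : '\0';
}

/* Returns 1 if a and b have at least one character in common. */
int share_char(const char *a, const char *b)
{
    return strpbrk(a, b) != NULL;
}
```

For example, share_char("hey", "hello") is 1 because both contain 'h' and 'e'.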
Assuming you mean a char[], and not a char which is a single character:
1. C uses 0-based indexing, so var_name[0] gives you the first char.
2. strlen() gives you the length of the string, which together with my answer to 1. means
char lastchar = var_name[strlen(var_name)-1]; http://www.cplusplus.com/reference/clibrary/cstring/strlen/
3. strcmp(var_name1, var_name2) == 0. http://www.cplusplus.com/reference/clibrary/cstring/strcmp/
I am pretty sure some of these methods exist in libraries such as
stdio.h or so but google isnt my friend.
The string functions in the C standard library (libc) are declared in the header file <string.h>. If you're on a unix-ish machine, try typing man 3 string at a command line. You can then use the man program again to get more information about specific functions, e.g. man 3 strlen. (The '3' just tells man to look in "section 3", which describes the C standard library functions.)
What you're looking for is the string functions in the C runtime library. These are defined in string.h, not stdio.h.
But your list of problems is simple:
var_name[0] works perfectly well for accessing the first char in an array. var_name[1] doesn't work because arrays in C are zero-based.
The last char in an array is:
char c;
c = var_name[strlen(var_name)-1];
Testing for equality is simple:
if (var_name[0] == var_name[1])
; // they match
C and C++ strings are zero indexed. The memory you need to hold a particular length string has to be at least the string length and one character for the string terminator \0. So, the first character is array[0].
As @Carey Gregory said, the basic string handling functions are in string.h. But these are only primitives for handling strings. C is a low-level enough language that you have an opportunity to build up your own string handling library based on the functions in string.h.
One example might be that you want to pass a string pointer to a function and also the length of the buffer holding that same string, not just the string length itself.
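A sketch of what such a function could look like, with the buffer capacity passed alongside the pointer. The name copy_bounded and its return convention are assumptions made for this example.

```c
#include <string.h>

/* Hypothetical helper: copies src into dst, which has room for dst_size
   bytes including the terminator. Returns 1 if src fit entirely,
   0 if it had to be truncated (or dst_size was 0). */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);

    if (dst_size == 0)
        return 0;
    if (len >= dst_size) {
        /* Keep as much as fits and always terminate. */
        memcpy(dst, src, dst_size - 1);
        dst[dst_size - 1] = '\0';
        return 0;
    }
    memcpy(dst, src, len + 1);   /* len + 1 copies the '\0' too */
    return 1;
}
```

Knowing the capacity lets the function guarantee termination and report truncation, which strlen() on the destination alone cannot do.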

Resources