I've recently been learning about conversion specifiers, but I am struggling with one of the more complex ones: the bracket specifier (%[set]).
To my understanding, %[set] consumes and assigns any string made up of the characters in set (the scanset), and %[^set] has the opposite effect, consuming and assigning any string that does not contain the characters in the scanset.
That's my understanding, albeit roughly explained. I was trying to use this specifier with sscanf to remove a specified character from a string:
sscanf(str_1, "%[^#]", str_2);
Suppose that str_1 contains "OH#989". My intention is to store this string in str_2, but removing the hash character in the process. However, sscanf stops reading at the hash character, storing only "OH" when I am intending to store "OH989".
Am I using the correct method in the wrong way, or am I using the wrong method altogether? How can I correctly remove/extract a specified character from a string using sscanf? I know this is possible to achieve with other functions and operators, but ideally I am hoping to use sscanf.
The scanset matches a sequence of one or more characters that are all in the set between the brackets (or, with a leading ^, all not in it). Scanning stops at the first character that breaks that rule, which in your case is the '#'. To get the two parts of your string, you'd need to use something like:
sscanf(str_1, "%[^#]#%[^#]", str_2, str_3);
We can negotiate on the second conversion specification; it might be that %s is sufficient, or some other scanset is appropriate. But this would give you the 'before #' and 'after #' strings that could then be concatenated to give the desired result string.
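A rough sketch of that "scan both parts, then concatenate" idea follows. The helper name remove_first_hash, the 63-character field widths and the buffer sizes are assumptions made for illustration; they are not part of the question.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies src into dst without its first '#'.
 * Assumes each piece fits in 63 characters and dst holds at least 127 bytes. */
void remove_first_hash(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);

    char before[64] = "";
    char after[64] = "";

    /* Returns 2 when both pieces are found, 1 when only the leading piece
     * matched, and 0 or EOF when src starts with '#' or is empty. */
    int n = sscanf(src, "%63[^#]#%63[^#]", before, after);

    if (n < 1 && src[0] == '#') {
        /* A scanset cannot match an empty piece, so scan past the '#'. */
        sscanf(src + 1, "%63[^#]", after);
    }
    strcpy(dst, before);
    strcat(dst, after);
}
```

With "OH#989" as input this leaves "OH989" in dst. Like the format string above, it only looks at the first '#'; anything after a second '#' is dropped.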
I guess, if you really want to use sscanf for the purpose of removing a single target character, you could do this:
char str_2[strlen(str_1) + 1];
if (sscanf(str_1, "%[^#]", str_2) == 1) {
    size_t len = strlen(str_2);
    /* must verify if a '#' was found at all */
    if (str_1[len] != '\0') {
        strcpy(str_2 + len, str_1 + len + 1);
    }
} else {
    /* '#' is the first character, or str_1 is empty */
    strcpy(str_2, str_1 + (str_1[0] == '#'));
}
As you can see, sscanf is not the right tool for the job: it has many quirks and shortcomings. A simple loop is more efficient and less error prone. You could also parse str_1 into 2 separate strings with sscanf(str_1, "%[^#]#%[\001-\377]", str_2, str_3); and deal with the possible return values (note that an empty string gives EOF rather than 0):
char str_2[strlen(str_1) + 1];
char str_3[strlen(str_1) + 1];
switch (sscanf(str_1, "%[^#]#%[\001-\377]", str_2, str_3)) {
  case EOF:   /* empty string */
  case 0:     /* initial '#' */
    strcpy(str_2, str_1 + (str_1[0] == '#'));
    break;
  case 1:     /* no '#' or no trailing part */
    break;
  case 2:     /* general case */
    strcat(str_2, str_3);
    break;
}
/* str_2 holds the result */
Removing a target character from a string using sscanf
sscanf() is not the best tool for this task; see the simple loop further below.
// Not elegant code
// Width limits omitted for brevity.
str_2[0] = '\0';
char *p = str_2;
// Test for the end of the string
while (*str_1) {
    int n;  // record stopping offset
    int cnt = sscanf(str_1, "%[^#]%n", p, &n);
    if (cnt == 0) {  // first character is a #
        str_1++;     // advance to next
    } else {
        str_1 += n;  // advance n characters
        p += n;
    }
}
Simple loop:
Remove the needles from a haystack and save the hay in a bail.
char needle = '#';
assert(needle);
do {
    while (*haystack == needle) haystack++;
} while (*bail++ = *haystack++);
With this second method, the code could use haystack = bail = str_1 to remove the character in place.
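For reference, the same loop can be wrapped in a small function that edits the string in place. This is only a sketch; the name remove_char is an assumption, and the variable names are kept from the loop above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: removes every occurrence of needle from s, in place. */
void remove_char(char *s, char needle)
{
    assert(s != NULL);
    assert(needle != '\0');    /* the terminator itself cannot be removed */

    const char *haystack = s;  /* read position */
    char *bail = s;            /* write position, never ahead of haystack */

    do {
        while (*haystack == needle) {
            haystack++;        /* skip needles */
        }
    } while ((*bail++ = *haystack++) != '\0');  /* copy up to and including '\0' */
}
```

Calling remove_char on a writable array holding "OH#989" with '#' as the needle leaves "OH989" in it. The string must be modifiable, so a string literal cannot be passed directly.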
Related
I have just started learning C after coding for some while in Java and Python.
I was wondering how I could "validate" a string input (check whether it meets certain criteria) and I stumbled upon the sscanf() function.
I had the impression that it acts somewhat like regular expressions, but I couldn't quite work out how to build more complex queries with it.
For example, let's say I have the following string:
char str[]={"Santa-monica 123"};
I want to use sscanf() to check if the string has only letters, numbers and dashes in it.
Could someone please elaborate?
The fact that sscanf allows something that looks a bit like a character class by no means implies that it is anything at all like a regular expression library. In fact, POSIX doesn't even require the scanf functions to accept character ranges inside character classes, although I suspect that it will work fine on any implementation you will run into.
But the scanning problem you have does not require regular expressions, either. All you need is a repeated character class match, and sscanf can certainly do that:
#include <stdbool.h>
bool check_string(const char* s) {
    int n = 0;
    sscanf(s, "%*[-a-zA-Z0-9]%n", &n);
    return s[n] == 0;
}
The idea behind that scanf format is that the first conversion will match and discard the longest initial sequence consisting of valid characters. (It might fail if the first character is invalid, in which case n keeps its initial value of 0. Thanks to @chux for pointing that out.) If it succeeds, it will then set n to the current scan point, which is the offset of the next character. If the next character is a NUL, then all the characters were good. (This version returns OK for the empty string, since it contains no illegal characters. If you want the empty string to fail, change the return condition to return n && s[n] == 0;)
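If relying on ranges inside the scanset is a concern, the same check can be sketched without sscanf at all, using strspn from string.h with every allowed character listed explicitly. This swaps in a different standard function; the name check_string_strspn is an assumption for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical alternative to the sscanf version: every allowed character is
 * listed explicitly, so no scanset range support is needed. */
bool check_string_strspn(const char *s)
{
    assert(s != NULL);

    static const char allowed[] =
        "-0123456789"
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    /* strspn returns the length of the leading run made only of allowed
     * characters; the string is valid if that run reaches the terminator. */
    return s[strspn(s, allowed)] == '\0';
}
```

Like the sscanf version, this accepts the empty string and rejects "Santa-monica 123" because of the space.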
You could also do this with the standard regex library (or any more sophisticated library, if you prefer, but the POSIX library is usually available without additional work). This requires a little bit more code in order to compile the regular expression. For efficiency, the following attempts to compile the regex only once, but for simplicity I left out the synchronization to avoid data races during initialization, so don't use this in a multithreaded application.
#include <regex.h>
#include <stdbool.h>
bool check_string(const char* s) {
    static regex_t* re_ptr = NULL;
    static regex_t re;
    if (!re_ptr) regcomp((re_ptr = &re), "^[[:alnum:]-]*$", REG_EXTENDED);
    return regexec(re_ptr, s, 0, NULL, 0) == 0;
}
I want to use sscanf() to check if the string has only letters, numbers and dashes in it.
A variation of @rici's good answer.
Create a scanset for letters, numbers and dashes.
//v          The * indicates to scan, but not save the result.
//  v        Dash (or minus sign), best to list first.
"%*[-0-9A-Za-z]"
//      ^^^^^^ Letters a-z, both cases
//   ^^^       Digits
Use "%n" to detect how far the scan went.
Now we can determine whether:
Scanning stopped due to a null character (the whole string is valid)
Scanning stopped due to an invalid character
int n = 0;
sscanf(str, "%*[-0-9A-Za-z]%n", &n);
bool success = (str[n] == '\0');
sscanf does not have this functionality; the argument you are referring to is a format specifier and is not meant for validation. See here: https://www.tutorialspoint.com/c_standard_library/c_function_sscanf.htm
As also mentioned, sscanf is for a different job. You can loop over the string using isalpha and isdigit to check whether each character is a letter or a digit.
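char str[]={"Santa-monica 123"};
for (int i = 0; str[i] != '\0'; i++)
{
    /* cast to unsigned char: the <ctype.h> functions need a non-negative value */
    if ((!isalpha((unsigned char)str[i])) && (!isdigit((unsigned char)str[i])) && (str[i] != '-'))
        printf("wrong character %c", str[i]);//this will be printed for spaces too
}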
I want to ... check if the string has only letters, numbers and dashes in it.
In C that's traditionally done with isalnum(3) and friends.
bool valid( const char str[] ) {
    /* stop at the terminator instead of calling strlen() on every iteration */
    for( const char *p = str; *p != '\0'; p++ ) {
        if( ! (isalnum((unsigned char)*p) || *p == '-') )
            return false;
    }
    return true;
}
You can also use your friendly neighborhood regex(3), but you'll find that requires a surprising amount of code for a simple scan.
After retrieving the value with sscanf(), you may use a regular expression to validate it.
Please see Regular Expression in C
I have an array of characters that I fill using gets().
char inname[30];
gets(inname);
How can I add another character to this array without knowing the length of the string in C? (By the string I mean the part that holds actual letters, not the unused memory after it.)
Note: my buffer is long enough for what I want to ask the user (a filename; probably not many people have names longer than 29 characters).
Note that gets is prone to buffer overflow and should be avoided.
Reading a line of input:
char inname[30];
if (fgets(inname, sizeof(inname), stdin) == NULL) {
    inname[0] = '\0';   // no input: treat as an empty string
}
size_t len = strlen(inname);
// Remove trailing newline
if (len > 0 && inname[len-1] == '\n') {
    len--;
    inname[len] = '\0';
}
Appending to the string:
const char *string_to_append = ".";
if (len + strlen(string_to_append) + 1 <= sizeof(inname)) {
    // There is enough room to append the string
    strcat(inname, string_to_append);
}
Optional way to append a single character to the string:
if (len < sizeof(inname) - 1) {
    // There is room to add another character and the terminator
    inname[len++] = '.'; // Add a '.' character to the string.
    inname[len] = '\0';  // Don't forget to nul-terminate
}
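The single-character case can also be sketched as a reusable function that takes the buffer capacity explicitly. The name append_char and its bool return convention are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends c to the string in buf, whose total capacity
 * is bufsize bytes. Returns false, leaving buf unchanged, if there is no room. */
bool append_char(char *buf, size_t bufsize, char c)
{
    assert(buf != NULL);

    size_t len = strlen(buf);

    /* Need space for the existing text, the new character and the '\0'. */
    if (len + 2 > bufsize) {
        return false;
    }
    buf[len] = c;
    buf[len + 1] = '\0';
    return true;
}
```

A call such as append_char(inname, sizeof inname, '.') would then replace the manual length bookkeeping, with the return value telling the caller whether the character fitted.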
As you have asked in comment, to determine the string length you can directly use
strlen(inname);
OR
you can loop through string in a for loop until \0 is found.
Now, after getting the length of the previous string, you can append a new string as
strcat(&inname[prevLength],"NEW STRING");
EDIT:
To find the null character you can write a for loop like this:
int i;
for (i = 0; inname[i] != 0; i++)
{
    //do nothing
}
Now you can use i directly to store a character at the end of the string, like:
inname[i] = your_char;
After this, increment i and store the null character (0) at that position.
P.S.
Every string in C ends with a null terminator. The ASCII null character '\0' is equal to 0 in decimal.
You know that the final character of a C string is '\0', e.g. the array:
char foo[10]={"Hello"};
is equivalent to this array:
['H'] ['e'] ['l'] ['l'] ['o'] ['\0'] (the remaining four elements are also '\0')
Thus you can iterate over the array until you find the '\0' character, put the character you want in its place, and write a new '\0' right after it.
Alternatively you can use the strcat function from string.h.
Short answer is you can't.
In C you must know the length of the string to append chars to it. The same applies in other languages, but there it happens behind the scenes; internally the same work must be done.
C strings are defined as sequences of bytes terminated by a special byte, the nul character, which has ASCII code 0 and is written '\0' in C.
You must find this terminator, put the new character in its place, and then write the terminator after the appended character. To illustrate this, suppose you have
char hello[10] = "Hello";
then you want to append a '!' after the 'o' so you can just do this
size_t length;
length = strlen(hello);
/* move the '\0' one position after its current position */
hello[length + 1] = hello[length];
hello[length] = '!';
now the string is "Hello!".
Of course, you should take care that hello is large enough to hold one extra character. That is also not automatic in C, which is one of the things I love about working with it because it gives you maximum flexibility.
You can of course use some available functions to achieve this without worrying about moving the '\0' for example, with
strcat(hello, "!");
you will achieve the same.
Both strlen() and strcat() are declared in the string.h header.
I have a bunch of strings that I need to verify if these have all spaces.
I can do strlen(trim(strct.data)) > 0.
But the data is not null terminated, although its length is known.
i.e. if strct.len is 5 then I need to verify if strct.data has spaces for 5 chars. 6th char is not guaranteed to be null. I have an array of strct each of which can have different length of data to be validated for spaces.
I tried strnlen(trim(strct.data)) and later realized it didn't fix anything as the trim already removed all spaces.
Any ideas other than the obvious looping over each character of strct.data (my last option if there is no other go)?
Note: trim is a user-defined function I use to remove leading and trailing spaces. It also doesn't stop until it reaches the NUL. I am looking for a way to handle both.
How to ensure the string is full of spaces for a given length?
step 1:
char buf[MAX_SIZE];
sprintf(buf,"%*s",MAX_SIZE-1,""); //fill buffer with spaces
step 2:
Now use strncmp() to compare the first strct.len characters of the strct.data character array with buf. This works as long as strct.len is less than MAX_SIZE.
if(strncmp(strct.data ,buf ,strct.len) ==0)
{
//all are spaces
}
You do not need to repeat step 1.
As jxh suggested, you can also use memset() instead of sprintf():
memset(buf, ' ', sizeof buf); //fill buf with all spaces
You need to do this only once; it does not have to be repeated for later checks.
You can also use a VLA,
declaring char buf[strct.len];
but then you need to call memset each time.
Probably the best is doing the looping yourself:
for(int i=0; i<strct.len; ++i) {
    if(strct.data[i] != ' ') {
        return false;
    }
}
return true;
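As a self-contained sketch, the loop can live in a function that takes the data pointer and length separately, so it never reads past the known length. The name is_all_spaces and the choice to treat a zero-length range as "all spaces" are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if the first len bytes of data are all spaces.
 * data does not need to be null terminated. An empty range counts as true. */
bool is_all_spaces(const char *data, size_t len)
{
    assert(data != NULL || len == 0);

    for (size_t i = 0; i < len; ++i) {
        if (data[i] != ' ') {
            return false;
        }
    }
    return true;
}
```

It would be called as is_all_spaces(strct.data, strct.len) for each element of the array of structures.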
As the character array is not null terminated, it is not a string.
But let's not quibble on that point and make a fast routine for large arrays.
int IsCharArrayAllSpace(const char *p, size_t Length) {
    if (Length < 1) return 1; // TBD: decide how to handle the zero-length case
    // The first byte is a space and every byte equals the one after it
    return (p[0] == ' ') && (memcmp(p, &p[1], Length-1) == 0);
}
I'm having this kind of input data.
<html>......
<!-- OK -->
I only want to extract the data before the comment sign <!--.
This is my code:
char *parse_data(char *input) {
char *parsed_data = malloc(strlen(input) * sizeof(char));
sscanf(input, "%s<!--%*s", parsed_data);
return parsed_data;
}
However, it doesn't seem to return the expected result, and I can't figure out why.
Could anyone explain the proper way to extract this kind of data, and the behavior of sscanf()?
Thank you!
The "%s" format specifier will not treat "<!--" as a single delimiter, or any of the individual characters as a delimiter (which would not be the correct behaviour anyway). Only whitespace is considered a delimiter. Scan sets are available in sscanf() but they take a collection of individual characters rather than a sequence of characters representing a single delimiter. This means that everything in input before the first whitespace character will be assigned to parsed_data.
You could use strstr() instead:
const char* comment_start = strstr(input, "<!--");
char* result = 0;
if (comment_start)
{
    result = malloc(comment_start - input + 1);
    memcpy(result, input, comment_start - input);
    result[comment_start - input] = 0;
}
Note that sizeof(char) is guaranteed to be 1 so can be omitted as part of the malloc() argument calculation.
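Put together as a function, the strstr() approach might look like the sketch below. The name copy_before_comment and the decision to return NULL when no "<!--" is present are assumptions; the original parse_data could just as well return a copy of the whole input in that case.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of the text that
 * precedes the first "<!--" in input, or NULL if no comment marker is found
 * or allocation fails. The caller frees the result. */
char *copy_before_comment(const char *input)
{
    assert(input != NULL);

    const char *comment_start = strstr(input, "<!--");
    if (comment_start == NULL) {
        return NULL;
    }

    size_t len = (size_t)(comment_start - input);
    char *result = malloc(len + 1);   /* +1 for the terminator */
    if (result == NULL) {
        return NULL;
    }
    memcpy(result, input, len);
    result[len] = '\0';
    return result;
}
```

Unlike "%s", this keeps any whitespace that appears before the comment marker.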
Consider a char array like this:
43 234 32 32
I want the last value that is 32 in integer form.
The string size/length is not known. In the above example there are 4 numbers, but the size will vary.
How can this be done?
I have copied these values from a file into a char array. Now I want the last number in an integer variable.
While copying, keep a count of the number of characters copied. Then do this:
int count = 0;
char c;
while(c = readCharFromFile()) {
array[count++] = c;
}
int last = array[count - 1];
There are many ways to solve this.
Convert every token (space delimited string) into a number and when the tokens run out return the last value converted.
Scan the line for tokens until you get to the end and then convert the last token into a number.
Start at the end of the line. Skip spaces and store digits until the first space is encountered and then convert the result to a number.
Split the string into an array of strings and convert the last one into a number.
I could go on and on but you get the idea I hope.
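The first idea in that list (convert every token and keep the last value) could be sketched with strtol, which skips leading whitespace on its own. The name last_number and the bool/out-parameter interface are assumptions, and overflow checking is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: converts each whitespace-separated number in line and
 * keeps the last one. Returns false if no number could be converted. */
bool last_number(const char *line, long *out)
{
    assert(line != NULL && out != NULL);

    bool found = false;
    const char *p = line;

    for (;;) {
        char *end;
        long value = strtol(p, &end, 10);  /* skips leading whitespace itself */
        if (end == p) {
            break;            /* nothing converted: end of line or bad text */
        }
        *out = value;
        found = true;
        p = end;              /* continue after the number just read */
    }
    return found;
}
```

For the example line "43 234 32 32" this stores 32 in *out and returns true. Conversion stops at the first token that is not a number.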
int getLastInt(char *data)
{
    size_t i = strlen(data);
    if(!i--) return -1; // failure
    for(;i;--i)
    {
        if(data[i] == ' ')
        {
            return atoi(&data[i+1]);
        }
    }
    return -1; // failure
}
Should work as long as the data has a space + actual text.
You could also skip the strlen and just loop forward, which could be faster depending on your system's strlen.
If there is no trailing whitespace on the line:
int last_int(const char *s)
{
    const char *ptr = strrchr(s, ' ');
    if (ptr == NULL) {
        ptr = s;
    } else {
        ptr++;
    }
    return atoi(ptr);
}
If there can be trailing whitespace, then you'll need to do something like what ProdigySim suggested, but with more states, to walk backwards past the trailing whitespace (if any), then past the number, then call atoi(). No matter what you do, you'll need to watch out for boundary conditions and edge cases and decide how you want to handle them.
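One possible way to handle trailing whitespace is sketched below: step back over the whitespace, then over the last token, then convert. The name last_int_trimmed and the choice to return 0 for an empty or all-whitespace line are assumptions; pick whatever edge-case behavior suits you.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the last whitespace-separated number in s,
 * or 0 if s is empty or holds only whitespace. Trailing whitespace is allowed. */
int last_int_trimmed(const char *s)
{
    assert(s != NULL);

    const char *end = s + strlen(s);

    /* Step back over trailing whitespace. */
    while (end > s && isspace((unsigned char)end[-1])) {
        end--;
    }
    /* Step back over the last token. */
    const char *start = end;
    while (start > s && !isspace((unsigned char)start[-1])) {
        start--;
    }
    /* atoi stops at the first non-digit, so the trailing whitespace is harmless. */
    return atoi(start);
}
```

Because atoi cannot report errors, a line whose last token is not a number also yields 0 here; strtol would be needed to tell those cases apart.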
I guess you want to use a combination of strtok_r and atoi
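A minimal sketch of that combination follows. strtok_r is a POSIX function rather than ISO C, so a feature-test macro is defined first; the name last_token_as_int and the -1 "no tokens" return value are assumptions, and the input buffer is modified by the tokenizer.

```c
#define _POSIX_C_SOURCE 200809L  /* for strtok_r, which is POSIX rather than ISO C */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the last whitespace-separated number in line,
 * or -1 if there are no tokens. line is modified by strtok_r. */
int last_token_as_int(char *line)
{
    assert(line != NULL);

    char *save = NULL;
    int last = -1;

    for (char *tok = strtok_r(line, " \t\n", &save);
         tok != NULL;
         tok = strtok_r(NULL, " \t\n", &save)) {
        last = atoi(tok);     /* overwrite until the final token remains */
    }
    return last;
}
```

Since the line is modified, pass a writable copy if the original text is still needed. A real -1 in the data cannot be told apart from the "no tokens" result with this interface.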