What is the simplest way to detect a substring in a specific format?
For example, consider the string in C
"[random characters/symbols] a-b-c [random characters/symbols]"
Is there a function in C that allows me to detect the substring in the format "%s-%s-%s"?
Try starting at successive positions within the string until a match succeeds.
"%*[^- ]" looks for a substring that contains neither a '-' nor a space.
"%n" records the offset reached by the scan.
#include<stdio.h>
int main(void) {
char *s = "[random characters/symbols] a-b-c [random characters/symbols]";
while (*s) {
int n = 0;
sscanf(s, "%*[^- ]-%*[^- ]-%*[^- ]%n", &n);
if (n) {
printf("Success '%.*s'\n", n, s);
break;
}
s++;
}
return 0;
}
Output
Success 'a-b-c'
Use strstr(), or a length-bounded variant such as strnstr() if your platform has one, to detect a literal string (no pattern matching). A length-bounded variant is safer because you can specify a maximum length to protect against a string with a missing null terminator, but it is not part of standard C, so not all implementations have it. If you use strstr(), make sure the string is null-terminated.
You can use regcomp() and regexec() to run a regular expression search on the string.
See regex in C language using functions regcomp and regexec toggles between first and second match
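As a rough sketch of the regcomp()/regexec() route for the "a-b-c" example, the helper below looks for three runs of characters that are neither '-' nor space, joined by dashes. The name find_triplet and its return convention are made up for illustration, and it assumes a POSIX system that provides <regex.h>.

```c
#include <regex.h>

/* Hypothetical helper: finds the first "x-y-z" style substring in s.
 * Returns 1 and stores the match offset and length on success, 0 otherwise. */
int find_triplet(const char *s, int *start, int *len) {
    regex_t re;
    regmatch_t m[1];   /* m[0] describes the whole match */

    /* Three runs of characters that are neither '-' nor space, joined by '-'. */
    if (regcomp(&re, "[^- ]+-[^- ]+-[^- ]+", REG_EXTENDED) != 0)
        return 0;

    int found = (regexec(&re, s, 1, m, 0) == 0);
    if (found) {
        *start = (int)m[0].rm_so;
        *len = (int)(m[0].rm_eo - m[0].rm_so);
    }
    regfree(&re);      /* release the compiled pattern */
    return found;
}
```

The caller can then print the match with printf("%.*s\n", len, s + start). Compiling the pattern on every call keeps the sketch short; code that searches many strings would normally compile it once.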
Related
I have just started learning C after coding for some while in Java and Python.
I was wondering how I could "validate" a string input (if it stands in a certain criteria) and I stumbled upon the sscanf() function.
I had the impression that it acts kind of similarly to regular expressions, however I didn't quite manage to tell how I can create rather complex queries with it.
For example, let's say I have the following string:
char str[] = {"Santa-monica 123"};
I want to use sscanf() to check if the string has only letters, numbers and dashes in it.
Could someone please elaborate?
The fact that sscanf allows something that looks a bit like a character class by no means implies that it is anything at all like a regular expression library. In fact, Posix doesn't even require the scanf functions to accept character ranges inside character classes, although I suspect that it will work fine on any implementation you will run into.
But the scanning problem you have does not require regular expressions, either. All you need is a repeated character class match, and sscanf can certainly do that:
#include <stdbool.h>
#include <stdio.h>
bool check_string(const char* s) {
int n = 0;
sscanf(s, "%*[-a-zA-Z0-9]%n", &n);
return s[n] == 0;
}
The idea behind that scanf format is that the first conversion will match and discard the longest initial sequence consisting of valid characters. (It might fail if the first character is invalid. Thanks to @chux for pointing that out.) If it succeeds, it will then set n to the current scan point, which is the offset of the next character. If the next character is a NUL, then all the characters were good. (This version returns OK for the empty string, since it contains no illegal characters. If you want the empty string to fail, change the return condition to return n && s[n] == 0;)
You could also do this with the standard regex library (or any more sophisticated library, if you prefer, but the Posix library is usually available without additional work). This requires a little bit more code in order to compile the regular expression. For efficiency, the following attempts to compile the regex only once, but for simplicity I left out the synchronization to avoid data races during initialization, so don't use this in a multithreaded application.
#include <regex.h>
#include <stdbool.h>
bool check_string(const char* s) {
static regex_t* re_ptr = NULL;
static regex_t re;
if (!re_ptr) regcomp((re_ptr = &re), "^[[:alnum:]-]*$", REG_EXTENDED);
return regexec(re_ptr, s, 0, NULL, 0) == 0;
}
I want to use sscanf() to check if the string has only letters, numbers and dashes in it.
A variation on @rici's good answer.
Create a scanset for letters, numbers and dashes.
//v The * indicates to scan, but not save the result.
//  v Dash (or minus sign), best to list first.
"%*[-0-9A-Za-z]"
//      ^^^^^^ Letters a-z, both cases
//   ^^^ Digits
Use "%n" to detect how far the scan went.
Now we can determine whether:
Scanning stopped due to a null character (the whole string is valid)
Scanning stopped due to an invalid character
int n = 0;
sscanf(str, "%*[-0-9A-Za-z]%n", &n);
bool success = (str[n] == '\0');
sscanf does not have this functionality. The argument you are referring to is a format specifier and is not used for validation. See here: https://www.tutorialspoint.com/c_standard_library/c_function_sscanf.htm
As also mentioned, sscanf is for a different job; for more information see this link. You can loop over the string and use isalpha and isdigit to check whether each character is a letter or a digit.
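#include <ctype.h>
#include <stdio.h>
char str[] = {"Santa-monica 123"};
for (int i = 0; str[i] != '\0'; i++)
{
    unsigned char c = (unsigned char)str[i]; /* <ctype.h> functions expect an unsigned char value */
    if (!isalpha(c) && !isdigit(c) && c != '-')
        printf("wrong character %c\n", str[i]); /* this will be printed for spaces too */
}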
I want to ... check if the string has only letters, numbers and dashes in it.
In C that's traditionally done with isalnum(3) and friends.
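#include <ctype.h>
#include <stdbool.h>
bool valid( const char str[] ) {
    /* Stop at the terminator instead of calling strlen() on every iteration. */
    for( const char *p = str; *p != '\0'; p++ ) {
        if( ! (isalnum((unsigned char)*p) || *p == '-') )
            return false;
    }
    return true;
}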
You can also use your friendly neighborhood regex(3), but you'll find that requires a surprising amount of code for a simple scan.
After retrieving the value with sscanf(), you can use a regular expression to validate it.
Please see Regular Expression in C
I am trying to extract the number 4 and 3 from the string /ab/cd__my__sep__4__some__sep__3. I am trying with regex but not sure how would I do this. I wrote the following code, but it just prints out __my__sep__4__some__sep__3
#include <stdio.h>
#include <regex.h>
#include <string.h>
#include <stdlib.h>
int main() {
char* s = "/ab/cd__my__sep__4__some__sep__3";
regex_t regex;
int reti = regcomp(&regex,"__my__sep__([0-9]+)",REG_EXTENDED);
if(reti!=0) {
exit(-1);
}else {
regmatch_t match[2];
reti = regexec(&regex, s, 2, match, 0);
if(reti == 0) {
char *v = &s[match[1].rm_so];
ssize_t fl;
sscanf(v, "%zu", &fl);
printf("%s",v);
}else {
printf("else");
}
}
}
How could I extract the numbers 4 and 3 ?
match[0] refers to the part of the text matched by the entire pattern. match[1] is the match corresponding to the first capture (parenthesized subpattern).
Note that &s[match[1].rm_so] gives you a pointer to the start of the capture, but if you print the string at that point, you will get the part of the string starting at the beginning of the capture. In this case, that doesn't really matter. Since you're using sscanf to extract the integer value of the captured text, the fact that the substring isn't terminated immediately doesn't matter; it's not going to be followed by a digit, and sscanf will stop at the first non-digit.
But in the general case, it's possible that it will not be so easy to identify the end of the matched capture, and you can use one of these techniques:
If you want to print the capture, you can use a computed string width format: (See Note 1.)
printf("%.*s\n", (int)(match[1].rm_eo - match[1].rm_so), &s[match[1].rm_so]);
If you have strndup, you can easily create a dynamically-allocated copy of the capture: (See Note 2.)
char* capture = strndup(&s[match[1].rm_so], match[1].rm_eo - match[1].rm_so);
As a quick-and-dirty hack, it is also possible to just insert a NUL terminator (assuming that the searched string is not immutable, which means that it cannot be a string literal). You'll probably want to save the old value of the following character so that you can restore the string to its original state:
char* capture = &s[match[1].rm_so];
char* rest = &s[match[1].rm_eo];
char saved_char = *rest;
*rest = 0;
/* capture now points to a NUL-terminated string. */
/* ... */
/* restore s */
*rest = saved_char;
None of the above is really necessary in the context of the original question, since the sscanf as written will work perfectly if you change the start of the string to scan from match[0] to match[1].
Notes:
1. In the general case, you should test to make sure that a capture was actually found before trying to use its offset. The rm_so member will be -1 if the capture was not found during the regex search. That doesn't necessarily mean that the search failed, because the capture could be part of an alternative not used in the match.
2. Don't forget to free the copy when you no longer need it. If you don't have strndup, it's pretty easy to implement. But watch out for the corner cases.
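To pull out both numbers in one regexec() call, one option is a pattern with two captures. The sketch below assumes the input always has the shape "...__my__sep__<a>__some__sep__<b>"; the name extract_two and the choice of strtol() over sscanf() are illustrative, not part of the original code.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: extracts the two numbers from a string shaped like
 * "...__my__sep__<a>__some__sep__<b>". Returns 1 on success, 0 otherwise. */
int extract_two(const char *s, long *a, long *b) {
    regex_t re;
    regmatch_t m[3];   /* m[0] is the whole match, m[1] and m[2] the captures */

    if (regcomp(&re, "__my__sep__([0-9]+)__some__sep__([0-9]+)", REG_EXTENDED) != 0)
        return 0;

    /* Check rm_so before using a capture, as described in Note 1. */
    int ok = (regexec(&re, s, 3, m, 0) == 0)
             && m[1].rm_so != -1 && m[2].rm_so != -1;
    if (ok) {
        /* strtol stops at the first non-digit, so no copy or terminator is needed. */
        *a = strtol(s + m[1].rm_so, NULL, 10);
        *b = strtol(s + m[2].rm_so, NULL, 10);
    }
    regfree(&re);
    return ok;
}
```

For the string in the question, this should store 4 in *a and 3 in *b.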
Since you are using sscanf(), there is no need to use a regex. You can parse the two numbers from your string using sscanf() alone using the format string: "%*[^0-9]%d%*[^0-9]%d" where "%*[^0-9]" uses the assignment suppression '*' to read and discard all non-digit characters and then uses "%d" to extract the integer value. The full format-string just repeats those two patterns twice.
A short example using your input could be:
#include <stdio.h>
int main (void) {
char *s = "/ab/cd__my__sep__4__some__sep__3";
int a, b;
if (sscanf (s, "%*[^0-9]%d%*[^0-9]%d", &a, &b) == 2)
printf ("a: %d\nb: %d\n", a, b);
else {
fputs ("error: parse of integers failed.\n", stderr);
return 1;
}
}
Example Use/Output
$ ./bin/parse2ints
a: 4
b: 3
If you find yourself attempting to parse something that sscanf() cannot handle, then a regex is appropriate. Here, sscanf() is more than capable of handling your needs alone.
Create a regex format that only holds [0-9]. Then create a separate boolean function checking whether a character belongs or not to your regex. Then apply the function to your string. If true, add the character to the string you want to output
I've recently been learning about different conversion specifiers, but I am struggling to use one of the more complex conversion specifiers. The one in question being the bracket specifier (%[set]).
To my understanding, from what I've read, %[set] consumes and assigns any string made up of the characters in set (the scanset), and %[^set] has the opposite effect: it consumes and assigns any string that does not contain the characters in the scanset.
That's my understanding, albeit roughly explained. I was trying to use this specifier with sscanf to remove a specified character from a string using sscanf:
sscanf(str_1, "%[^#]", str_2);
Suppose that str_1 contains "OH#989". My intention is to store this string in str_2, but removing the hash character in the process. However, sscanf stops reading at the hash character, storing only "OH" when I am intending to store "OH989".
Am I using the correct method in the wrong way, or am I using the wrong method altogether? How can I correctly remove/extract a specified character from a string using sscanf? I know this is possible to achieve with other functions and operators, but ideally I am hoping to use sscanf.
The scanset matches a sequence of (one or more) characters that either do or don't match the contents of the scanset brackets. It stops when it comes across the first character that isn't in the scanset. To get the two parts of your string, you'd need to use something like:
sscanf(str_1, "%[^#]#%[^#]", str_2, str_3);
We can negotiate on the second conversion specification; it might be that %s is sufficient, or some other scanset is appropriate. But this would give you the 'before #' and 'after #' strings that could then be concatenated to give the desired result string.
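A minimal sketch of that split-and-concatenate idea follows. It assumes the input contains at most one '#' and is shorter than 64 characters, and that dst has room for the whole input plus its terminator; the name remove_first_hash and the fixed buffer sizes are invented for the example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies src to dst without its '#'.
 * Assumes at most one '#' in src and strlen(src) < 64. */
void remove_first_hash(const char *src, char *dst) {
    /* Pre-initialise both parts so a failed conversion leaves an empty string. */
    char before[64] = "";
    char after[64] = "";

    if (src[0] == '#') {
        /* %[^#] cannot match an empty prefix, so handle a leading '#' separately. */
        sscanf(src + 1, "%63[^#]", after);
    } else {
        sscanf(src, "%63[^#]#%63[^#]", before, after);
    }

    strcpy(dst, before);
    strcat(dst, after);
}
```

The width limit of 63 keeps each conversion inside its 64-byte buffer. The special case for a leading '#' is one of the quirks that make sscanf an awkward fit for this job.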
I guess, if you really want to use sscanf for the purpose of removing a single target character, you could do this:
char str_2[strlen(str_1) + 1];
if (sscanf(str_1, "%[^#]", str_2) == 1) {
size_t len = strlen(str_2);
/* must verify if a '#' was found at all */
if (str_1[len] != '\0') {
strcpy(str_2 + len, str_1 + len + 1);
}
} else {
/* '#' is the first character, or str_1 is empty */
strcpy(str_2, str_1 + (str_1[0] == '#'));
}
As you can see, sscanf is not the right tool for the job, because it has many quirks and shortcomings. A simple loop is more efficient and less error prone. You could also parse str_1 into 2 separate strings with sscanf(str_1, "%[^#]#%[\001-\377]", str_2, str_3); and deal with the possible return values:
char str_2[strlen(str_1) + 1];
char str_3[strlen(str_1) + 1];
switch (sscanf(str_1, "%[^#]#%[\001-\377]", str_2, str_3)) {
case EOF: /* empty string: sscanf reports an input failure */
case 0: /* initial '#' */
strcpy(str_2, str_1 + (str_1[0] == '#'));
break;
case 1: /* no '#' or no trailing part */
break;
case 2: /* general case */
strcat(str_2, str_3);
break;
}
/* str_2 holds the result */
Removing a target character from a string using sscanf
sscanf() is not the best tool for this task; see the simple loop further below.
// Not elegant code
// Width limits omitted for brevity.
str_2[0] = '\0';
char *p = str_2;
// Test for the end of the string
while (*str_1) {
int n; // record stopping offset
int cnt = sscanf(str_1, "%[^#]%n", p, &n);
if (cnt == 0) { // first character is a #
str_1++; // advance to next
} else {
str_1 += n; // advance n characters
p += n;
}
}
Simple loop:
Remove the needles from a haystack and save the hay in a bail.
char needle = '#';
assert(needle);
do {
while (*haystack == needle) haystack++;
} while (*bail++ = *haystack++);
With the 2nd method, code could use haystack = bail = str_1
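Wrapped into a function, the simple loop might look like the sketch below. It works in place, which is the haystack = bail = str_1 case; the name remove_char is made up here.

```c
#include <assert.h>

/* Hypothetical helper: removes every occurrence of needle from s, in place. */
void remove_char(char *s, char needle) {
    assert(needle != '\0');   /* the terminator cannot be the needle */

    char *out = s;            /* write position, never ahead of the read position */
    do {
        while (*s == needle)
            s++;              /* skip needles */
    } while ((*out++ = *s++) != '\0');   /* copy one character; stop once the terminator is copied */
}
```

Because the write pointer never overtakes the read pointer, no second buffer is needed. The argument must be a modifiable array, not a string literal.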
My problem is: Input the string then replace the word that we want to change
For example: input: i love coke
word: coke
replace: pepsi
result: i love pepsi
But when I run this code, it crashes. Can you help me find the mistake?
#include <stdio.h>
#include<string.h>
char replace(char s1[100],char s2[100],char s3[100])
{
int k,i,j;
for(i=0;i<strlen(s1);i++)
for(j=0;j<strlen(s2);j++)
for(k=0;k<strlen(s3);k++)
{
if(s1[i]==s2[j])
{
s1[i]=s3[k];
}
}
return s3;
}
int main()
{
char s1[100],s2[100],s3[100];
printf("input string: ");gets(s1);
printf("Find string: ");gets(s2);
printf("Replace: ");gets(s3);
printf("Result: %s",replace(s1,s2,s3));
return 0;
}
I suggest you use a 4th buffer to store the generated result. You won't be able to replace in place if the word to be replaced and the new word aren't the same length.
Also, you are comparing characters individually. Just because you found a c doesn't automatically mean you found coke and that you should replace it. You must check the entire word is there before replacing anything. Use strstr() to locate substrings inside a string.
In addition, your function is returning a char; it should return a string (char *).
Furthermore, there are plenty of examples online on how to write a function to replace words in a string, so let's not be redundant. Google it.
Strings in C are null terminated e.g. "i love coke\0". The string length does not include the null terminator. Because of this you are overwriting the null terminator after the 'e' with the 'i' in "pepsi".
A quick hack to check if null terminating the string would help, is to memset s1, s2, and s3 to 0.
Your approach doesn't quite work. What you need to do is search the input string for the word you wish to replace. So, before you even start switching things around, you need to search for the whole word you wish to replace.
Once you find that word, you need to put your new word in its place, and then start searching for the word again until you reach the end of your input string.
So, for pseudo code:
for i in input // for every letter
    if input[i] != lookfor[0]
        results[i] put input[i] into new "results" array
    else // We might have found it.
        for j in lookfor // Go through coke, one at a time
            if input[i+j] != lookfor[j] "c,o,k,e"
                break; // You didn't find coke, just "c" or "co" or "cok"
        // If you got all the way through, you found coke.
        // So now you have to switch that out for the new word in the result
        results[i] = "pepsi" // Just be careful here, because this has a different index than i, because pepsi's length != coke's length
Did that make sense?
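For reference, here is one way the search-and-replace could be written in C, using strstr() to find whole occurrences and a separate output buffer as suggested earlier. The name replace_all, its parameters, and the -1 error convention are assumptions made for this sketch. It replaces any matching substring, so it does not check word boundaries.

```c
#include <string.h>

/* Hypothetical helper: writes src to dst with every occurrence of find
 * replaced by repl. Returns 0 on success, or -1 if find is empty or
 * dst (dst_size bytes) is too small. */
int replace_all(const char *src, const char *find, const char *repl,
                char *dst, size_t dst_size) {
    size_t find_len = strlen(find);
    size_t repl_len = strlen(repl);
    size_t used = 0;   /* bytes written to dst so far */

    if (find_len == 0 || dst_size == 0)
        return -1;

    while (*src) {
        const char *hit = strstr(src, find);
        /* Copy the text up to the next hit, or the whole rest if there is none. */
        size_t chunk = hit ? (size_t)(hit - src) : strlen(src);
        size_t extra = hit ? repl_len : 0;

        if (used + chunk + extra + 1 > dst_size)
            return -1;   /* not enough room, including the terminator */

        memcpy(dst + used, src, chunk);
        used += chunk;
        if (hit) {
            memcpy(dst + used, repl, repl_len);
            used += repl_len;
        }
        /* Continue after the copied text and, if there was a hit, after the old word. */
        src += chunk + (hit ? find_len : 0);
    }
    dst[used] = '\0';
    return 0;
}
```

Called with "i love coke", "coke", and "pepsi", this should leave "i love pepsi" in dst. Reading the input with fgets() instead of gets() would also avoid buffer overruns in main.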
First of all, your replace function returns char instead of char*. You could also declare its return type as void and write the result into a char* buffer passed as an in/out parameter. Moreover, you can use the strtok(), strcmp(), and strstr() functions from string.h for most string operations.
Check this out for information about the standard string functions: String Operations
I have a C library that requires hexadecimal input of the form "\xFF". I need to pass an array of hexadecimal values formatted as "0xFF" form. Is there a way to replace "0x" by "\x" in C?
That sounds like an easy string replacement operation, but I think that's not really what you need.
The notation "\xFF" in a C string means "this string contains the character whose encoded value is 0xFF, i.e. 255 decimal".
So if that's what you mean, then you need to do the compiler's job and replace the incoming "0xFF" text with the single character that has the code 0xFF.
There is no standard function for this, since it's typically done by the compiler.
To implement this, I would write a loop that looks for 0x, and every time it's found, use strtoul() to attempt to convert a number at that location. If the number is too long (i.e. 0xDEAD) you need to figure out how to handle that.
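A sketch of that loop is shown below. It treats a value that does not fit in one byte (such as 0xDEAD) as an error, which is only one possible policy. The name decode_hex_bytes and its return convention are assumptions for the example.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: scans text for "0x.." tokens and stores each value
 * as one byte in out. Returns the number of bytes stored, or -1 if a value
 * does not fit in a byte or out (out_size bytes) is full. */
int decode_hex_bytes(const char *text, unsigned char *out, size_t out_size) {
    size_t count = 0;
    const char *p = text;

    while ((p = strstr(p, "0x")) != NULL) {
        if (!isxdigit((unsigned char)p[2])) {
            p += 2;            /* "0x" with no hex digit after it: skip it */
            continue;
        }
        char *end;
        /* With base 16, strtoul accepts the leading "0x" itself. */
        unsigned long value = strtoul(p, &end, 16);
        if (value > 0xFF || count >= out_size)
            return -1;         /* too large for one byte, or no room left */
        out[count++] = (unsigned char)value;
        p = end;               /* continue after the digits just consumed */
    }
    return (int)count;
}
```

For the input "0x01,0x0a,0x0f" this should store the three bytes 0x01, 0x0a, and 0x0f. The result is binary data rather than a string, so it may contain zero bytes and should be passed along with its length.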
You can use strstr in order to find the substring "0x" and then replace '0' with '\\':
#include <stdio.h>
#include <string.h>
int main(void)
{
char s[] = "0x01,0x0a,0x0f";
char *p = s;
printf("%s\n", s);
while (p) {
p = strstr(p, "0x");
if (p) *p = '\\';
}
printf("%s\n", s);
return 0;
}
Output:
0x01,0x0a,0x0f
\x01,\x0a,\x0f
But as pointed out by @unwind and @Sathish, that's probably not what you need.