How to check if a string starts with another string in C?

Is there something like startsWith(str_a, str_b) in the standard C library?
It should take pointers to two strings that end with null bytes, and tell me whether the first one also appears completely at the beginning of the second one.
Examples:
"abc", "abcdef" -> true
"abcdef", "abc" -> false
"abd", "abdcef" -> true
"abc", "abc" -> true

There's no standard function for this, but you can define
bool prefix(const char *pre, const char *str)
{
return strncmp(pre, str, strlen(pre)) == 0;
}
We don't have to worry about str being shorter than pre because according to the C standard (7.21.4.4/2):
"The strncmp function compares not more than n characters (characters that follow a null character are not compared) from the array pointed to by s1 to the array pointed to by s2."
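To see the short-string case in action, here is a minimal self-contained sketch that repeats the definition above and runs it against the four examples from the question. The helper name prefix_examples_hold is made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Same definition as above: true if str begins with pre. */
static bool prefix(const char *pre, const char *str)
{
    assert(pre != NULL && str != NULL);
    return strncmp(pre, str, strlen(pre)) == 0;
}

/* Hypothetical helper: checks the four examples from the question. */
static bool prefix_examples_hold(void)
{
    return prefix("abc", "abcdef")      /* prefix matches */
        && !prefix("abcdef", "abc")     /* str is shorter than pre */
        && prefix("abd", "abdcef")      /* prefix matches */
        && prefix("abc", "abc");        /* equal strings count as a match */
}
```

In the second example strncmp stops at the terminating null of the shorter string and reports a difference there, so it never reads past the end of str.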

Apparently there's no standard C function for this. So:
bool startsWith(const char *pre, const char *str)
{
size_t lenpre = strlen(pre),
lenstr = strlen(str);
return lenstr < lenpre ? false : memcmp(pre, str, lenpre) == 0;
}
Note that the above is nice and clear, but if you're doing it in a tight loop or working with very large strings, it does not offer the best performance, as it scans the full length of both strings up front (strlen). Solutions like wj32's or Christoph's may offer better performance (although this comment about vectorization is beyond my ken of C). Also note Fred Foo's solution which avoids strlen on str (he's right, it's unnecessary if you use strncmp instead of memcmp). Only matters for (very) large strings or repeated use in tight loops, but when it matters, it matters.

I'd probably go with strncmp(), but just for fun a raw implementation:
_Bool starts_with(const char *restrict string, const char *restrict prefix)
{
while(*prefix)
{
if(*prefix++ != *string++)
return 0;
}
return 1;
}

Use the strstr() function: stra == strstr(stra, strb) is true exactly when stra starts with strb.
Reference
The strstr() function finds the first occurrence of string2 in string1. The function ignores the null character (\0) that ends string2 in the matching process.
https://www.ibm.com/docs/en/i/7.4?topic=functions-strstr-locate-substring

I'm no expert at writing elegant code, but...
int prefix(const char *pre, const char *str)
{
char cp;
char cs;
if (!*pre)
return 1;
while ((cp = *pre++) && (cs = *str++))
{
if (cp != cs)
return 0;
}
if (!cs)
return 0;
return 1;
}

Optimized (v.2. - corrected):
uint32 startsWith( const void* prefix_, const void* str_ ) {
uint8 _cp, _cs;
const uint8* _pr = (uint8*) prefix_;
const uint8* _str = (uint8*) str_;
while ( ( _cs = *_str++ ) & ( _cp = *_pr++ ) ) {
if ( _cp != _cs ) return 0;
}
return !_cp;
}

I noticed the following function definition in the Linux Kernel. It returns true if str starts with prefix, otherwise it returns false.
/**
* strstarts - does @str start with @prefix?
* @str: string to examine
* @prefix: prefix to look for.
*/
bool strstarts(const char *str, const char *prefix)
{
return strncmp(str, prefix, strlen(prefix)) == 0;
}

Because I ran the accepted version and had a problem with a very long str, I had to add in the following logic:
bool longEnough(const char *str, int min_length) {
int length = 0;
while (str[length] && length < min_length)
length++;
if (length == min_length)
return true;
return false;
}
bool startsWith(const char *pre, const char *str) {
size_t lenpre = strlen(pre);
return longEnough(str, lenpre) ? strncmp(str, pre, lenpre) == 0 : false;
}

Or a combination of the two approaches:
_Bool starts_with(const char *restrict string, const char *restrict prefix)
{
const char * const restrict prefix_end = prefix + 13;
while (1)
{
if ( 0 == *prefix )
return 1;
if ( *prefix++ != *string++)
return 0;
if ( prefix_end <= prefix )
return 0 == strncmp(prefix, string, strlen(prefix));
}
}
EDIT: The code below does NOT work because if strncmp returns 0 it is not known if a terminating 0 or the length (block_size) was reached.
An additional idea is to compare block-wise. If the block is not equal compare that block with the original function:
_Bool starts_with_big(const char *restrict string, const char *restrict prefix)
{
size_t block_size = 64;
while (1)
{
if ( 0 != strncmp( string, prefix, block_size ) )
return starts_with( string, prefix);
string += block_size;
prefix += block_size;
if ( block_size < 4096 )
block_size *= 2;
}
}
The constants 13, 64, 4096, as well as the exponentiation of the block_size are just guesses. It would have to be selected for the used input data and hardware.

I use this macro:
#define STARTS_WITH(string_to_check, prefix) (strncmp(string_to_check, prefix, ((sizeof(prefix) / sizeof(prefix[0])) - 1)) ? 0:((sizeof(prefix) / sizeof(prefix[0])) - 1))
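It returns the prefix length if the string starts with the prefix, and 0 otherwise. This length is evaluated at compile time (sizeof), so there is no runtime overhead. Note that this only works when prefix is a string literal or a char array; for a char pointer, sizeof yields the size of the pointer instead.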
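A short usage sketch for the macro above. It uses a variant with parenthesized arguments, simplified to sizeof(prefix) - 1 because sizeof(char) is always 1; the helper skip_http_scheme and the "http://" prefix are assumptions made up for the example. One thing to keep in mind: with an empty prefix the macro yields 0, which cannot be told apart from "no match".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Parenthesized variant of the macro above; prefix must be a string literal
 * or a char array, because sizeof on a pointer yields the pointer size. */
#define STARTS_WITH(string_to_check, prefix) \
    (strncmp((string_to_check), (prefix), sizeof(prefix) - 1) ? 0 : sizeof(prefix) - 1)

/* Hypothetical helper: returns the part of url after "http://",
 * or NULL when url does not start with that scheme. */
static const char *skip_http_scheme(const char *url)
{
    assert(url != NULL);
    size_t n = STARTS_WITH(url, "http://");
    return n ? url + n : NULL;
}
```

Returning the length rather than a plain boolean is what makes the "skip past the prefix" step a one-liner here.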

Related

Compare two char arrays without CR LF

I would like to use the following function to compare two char arrays:
if(strcmp((PtrTst->cDatVonCom),szGeraeteAntwort)==0)
Now my problem is that PtrTst->cDatVonCom[5000] has a different size than szGeraeteAntwort[255], and the contents also look a little different
(excerpt from the logfile):
PtrTst->cDatVonCom:
04/16/19 12:53:36 AB A{CR}{LF}
0 0{CR}{LF}
szGeraeteAntwort:
04/16/19 12:53:36 AB A 0 0{CR}{LF}
Could I check if the command (in this case AB A) is the same in both?
The command can change and it must be in both the same to go through the if statement.
UPDATE:
Both char arrays are always present, and I need to check whether szGeraeteAntwort is contained in PtrTst->cDatVonCom.
In C# I would use something like cDatVonCom.Contains... to check this.
You have two strings whose logical content you want to compare, but their literal presentation may vary. In particular, there may be CR/LF line termination sequences inserted into one or both, which are not significant for the purposes of the comparison. There are many ways to approach this kind of problem, but a common one is to define a unique canonical form for your strings, convert both strings to that form, and compare the results. In this case, the canonical form would presumably be one without any CR or LF characters.
The most general way to approach this is to create canonicalized copies of your strings. This accounts for the case where you cannot modify the strings in-place. For example:
/*
* src - the source string
* dest - a pointer to the first element of an array that should receive the result.
* dest_size - the capacity of the destination buffer
* Returns 0 on success, -1 if the destination array has insufficient capacity
*/
int create_canonical_copy(const char src[], char dest[], size_t dest_size) {
static const char to_ignore[] = "\r\n";
const char *start = src;
size_t dest_length = 0;
int rval = 0;
while (*start) {
size_t segment_length = strcspn(start, to_ignore);
if (dest_length + segment_length + 1 > dest_size) {
rval = -1;
break;
}
memcpy(dest + dest_length, start, segment_length);
dest_length += segment_length;
start += segment_length;
start += strspn(start, to_ignore);
}
dest[dest_length] = '\0';
return rval;
}
You might use that like so:
char tmp1[255], tmp2[255];
if (create_canonical_copy(PtrTst->cDatVonCom, tmp1, 255) != 0) {
// COMPARISON FAILS: cDatVonCom has more non-CR/LF data than szGeraeteAntwort
// can even accommodate
return -1;
} else if (create_canonical_copy(szGeraeteAntwort, tmp2, 255) != 0) {
// should not happen, given that szGeraeteAntwort's capacity is the same as tmp2's.
// If it does, then szGeraeteAntwort must not be properly terminated
assert(0);
return -1;
} else {
return strcmp(tmp1, tmp2);
}
That assumes you are comparing the strings for equality only. If you were comparing them for order as well, you could still use this approach, but you would need to be more careful about canonicalizing as much data as the destination can accommodate, and about properly handling the data-too-large case.
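If the goal is containment rather than equality, as the question's update suggests, the same canonicalization idea can be combined with strstr(). The following is only a sketch under some assumptions: it strips CR and LF in place, so both buffers must be writable, and the names strip_crlf and contains_ignoring_crlf are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Remove every CR and LF from s in place and return s. */
static char *strip_crlf(char *s)
{
    assert(s != NULL);
    char *out = s;
    for (const char *in = s; *in != '\0'; ++in) {
        if (*in != '\r' && *in != '\n') {
            *out++ = *in;
        }
    }
    *out = '\0';
    return s;
}

/* True if needle occurs in haystack once CR/LF are ignored in both.
 * Both arguments are modified (canonicalized) by the call. */
static bool contains_ignoring_crlf(char *haystack, char *needle)
{
    return strstr(strip_crlf(haystack), strip_crlf(needle)) != NULL;
}
```

When the original buffers must stay untouched, the copying approach shown above is the safer choice; the in-place variant merely avoids the temporary arrays.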
A function that compares the strings while skipping over some characters could be used.
#include <stdio.h>
#include <string.h>
int strcmpskip ( char *match, char *against, char *skip) {
if ( ! match && ! against) { //both are NULL
return 0;
}
if ( ! match || ! against) {//one is NULL
return 1;
}
while ( *match && *against) {//both are not zero
while ( skip && strchr ( skip, *match)) {//skip not NULL and *match is in skip
match++;
if ( ! *match) {//zero
break;
}
}
while ( skip && strchr ( skip, *against)) {//skip not NULL and *against is in skip
against++;
if ( ! *against) {//zero
break;
}
}
if ( *match != *against) {
break;
}
if ( *match) {//not zero
match++;
}
if ( *against) {//not zero
against++;
}
}
return *match - *against;
}
int main( void) {
char line[] = "04/16/19 12:53:36 AB A\r\n 0 0\r\n";
char text[] = "04/16/19 12:53:36 AB A 0 0\r\n";
char ignore[] = "\n\r";
if ( strcmpskip ( line, text, ignore)) {
printf ( "do not match\n");
}
else {
printf ( "match\n");
}
return 0;
}
There are several things you can do; here are two:
Parse both strings (e.g. using sscanf() or something fancier), and ignore the newlines during parsing. You will then have the individual fields (or an indication that one of the lines can't be parsed properly, which is an error anyway). Then you can compare the commands.
Use a regular expression matcher on those two strings, to obtain just the command while ignoring everything else (treating CR and LF as newline characters essentially), and compare the commands. Of course you'll need to write an appropriate regular expression.

Check if strings contain the same number of the same characters

The problem: we have to check whether two strings contain the same characters, regardless of order. For example, s1=akash and s2=ashka match.
My program prints NO for every pair of input strings.
s1 and s2 are the two input strings
t is the number of test cases
It would be really helpful if you could tell me where the error is; I am a beginner.
#include<stdio.h>
#include<string.h>
int main(){
int t,i,j;
scanf("%d",&t);
while(t>0){
char s1[100],s2[100];
scanf("%s ",s1);
scanf("%s",s2);
int count=0;
int found[100];
for(i=0;i<strlen(s1)-1;i++){
for(j=0;j<strlen(s1);j++){
if(s1[i]==s2[j]){
found[i]=1;
break;
}
}
}
for(i=0;i<strlen(s1);i++){
if(found[i]!=1){
count=1;
break;
}
}
if(count==1)
printf("NO");
else
printf("YES");
t--;
}
}
Some good answers above suggest sorting the strings first.
If you want to modify your program above to do this job then you need to modify it as you realised. I have a suggestion (in words) for how to do this below - after that there is a modified code that works, and finally a couple of extra points.
I guess that the two strings aa and a would not be equal according to your definition, but your program would say that they were equal, because once you find a character you do not have any way of saying that it has been 'used up'.
I would suggest that you change your found[] array so that it records when a character in the second string is matched.
I suggest logic as follows.
Loop through all S1 characters
| Loop through S2 characters
| - if you get a match mark the S2 character as found
| - if you don't get a match by the end of the S2 loops then you are done - they are not equal
At the end of the S1 loop if you have not finished early then every character is matched, but you need to go through found[] array to check that every character in S2 was found.
working code is below....
note
you did not initialize found - it is initialized in the code below
the first loop needs to have < strlen(s1) not < strlen(s1)-1
the second loop you should have been going to strlen(s2).
logic changed as described above so that found records characters found in s2 not s1
logic also changed so that if a character in s1 is not found the loop breaks early. There are tests to see if the loop broke early to see if the values of i and j are what we expect at the end of the loop.
edited code below (at the bottom below the code are some extra comments)
#include<stdio.h>
#include<string.h>
int main(){
int t,i,j;
scanf("%d",&t);
while(t>0){
char s1[100],s2[100];
scanf("%s ",s1);
scanf("%s",s2);
int count=0;
int found[100]={ 0 };
for(i=0;i<strlen(s1);i++){
for(j=0;j<strlen(s2);j++){
if(found[j]==1) continue; // character S2[j] already found
if(s1[i]==s2[j]){
found[j]=1;
break;
}
}
if (j==strlen(s2)) {
break; // we get here if we did not find a match for S1[i]
}
}
if (i!=strlen(s1)) {
printf("NO"); // we get here if we did not find a match for S1[i]
}
else {
// matched all of S1 now check S2 all matched
for(i=0;i<strlen(s2);i++){
if(found[i]!=1){
count=1;
break;
}
}
if(count==1) {
printf("NO");
}
else {
printf("YES");
}
}
t--;
}
return 0;
}
Two extra points to make your code more efficient.
First, as suggested by @chux, it will probably be faster not to have strlen(s2) in the loop condition. You could write for (j=0;s2[j];j++) instead. This works because the terminating character of a string has the value 0, which C treats as false, so the loop stops there. The speed-up comes from the fact that the compiler might evaluate strlen(s2) on every pass through the loop, which costs l2 steps if l2 is the length of s2. Since the two nested loops already run up to l1*l2 times, the strlen calls can push the total to l1*l2*l2 steps.
Secondly, you could speed up many tests by checking whether the lengths of the two strings differ before checking whether they contain the same number of the same characters.
As suggested in my comment, and since it's now a bit more clear, an easy way to compare two multisets represented as strings is to:
Sort the two strings (easy using the qsort() standard function)
Compare the result (using the strcmp() standard function)
This will work since it will map both "akash" and "ashka" to "aahks", before comparing.
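A minimal sketch of the sort-then-compare idea, assuming both strings live in writable buffers (they are sorted in place). The names cmp_char and same_multiset are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator ordering bytes by their unsigned value. */
static int cmp_char(const void *a, const void *b)
{
    unsigned char x = *(const unsigned char *)a;
    unsigned char y = *(const unsigned char *)b;
    return (x > y) - (x < y);
}

/* True if s1 and s2 contain the same characters the same number of times.
 * Both strings are sorted in place as a side effect. */
static bool same_multiset(char *s1, char *s2)
{
    assert(s1 != NULL && s2 != NULL);
    size_t n1 = strlen(s1);
    size_t n2 = strlen(s2);
    if (n1 != n2) {
        return false;           /* different lengths can never match */
    }
    qsort(s1, n1, sizeof *s1, cmp_char);
    qsort(s2, n2, sizeof *s2, cmp_char);
    return strcmp(s1, s2) == 0;
}
```

The length check up front is optional but lets obviously different inputs skip the sorting entirely.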
Sort both strings using bubble sort or any other technique you know, then simply compare them using the strcmp() function.
for(i=0;i<strlen(s1)-1;i++){
for(j=0;j<strlen(s1);j++){
if(s1[i]==s2[j]){
found[i]=1;
break;
}
}
}
I do not understand why you are using j<strlen(s1) in the second loop.
I think simple solution will be sorting the characters alphabetically and comparing one by one in single loop.
First, note that found is never initialized. The values within it are unknown. It ought to be initialized by setting every element to zero before each test for equality. (Or, if not every element, every element up to strlen(s1)-1, as those are the ones that will be used.)
Once found is initialized, though, there is another problem.
The first loop on i uses for(i=0;i<strlen(s1)-1;i++). Within this, found[i] is set if a match is found to s1[i]. Note that i never reaches strlen(s1)-1 within the loop, since the loop terminates when it does.
The second loop on i uses for(i=0;i<strlen(s1);i++). Within this loop, found[i] is tested to see if it is set. Note that i does reach strlen(s1)-1, since the loop terminates only when i reaches strlen(s1). However, found[strlen(s1)-1] can never have been set by the first loop, since i never reaches strlen(s1)-1 in the first loop. Therefore, the second loop would always report failure.
Additionally, it is not clear whether two strings ought to be considered equal if and only if they are anagrams (the characters in one can be rearranged to form the other string, without adding or removing any characters) or if each character in one string is found at least once in the other (“aaabbc” would be equal to “abbccc”, because both strings contain a, b, and c).
As written, with the initialization and loop bugs fixed, your program tests whether each character in the first string appears in the second string. This is not an equivalence relation because it is not reflexive: It does not test whether each character in the second string appears in the first string. So, you need to think more about what property you want to test and how to test for it.
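To make the two candidate definitions concrete, here is a sketch that implements both with per-byte counters instead of nested loops. This is a different technique from the program in the question; the function names are made up, and characters are treated as plain bytes.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Add the number of occurrences of each byte value in s to counts. */
static void count_bytes(const char *s, size_t counts[UCHAR_MAX + 1])
{
    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        counts[(unsigned char)*s]++;
    }
}

/* Anagram test: every byte occurs equally often in both strings. */
static bool is_anagram(const char *a, const char *b)
{
    size_t ca[UCHAR_MAX + 1] = {0};
    size_t cb[UCHAR_MAX + 1] = {0};
    count_bytes(a, ca);
    count_bytes(b, cb);
    for (size_t i = 0; i <= UCHAR_MAX; ++i) {
        if (ca[i] != cb[i]) {
            return false;
        }
    }
    return true;
}

/* Weaker test: both strings use the same set of bytes, counts ignored. */
static bool same_char_set(const char *a, const char *b)
{
    size_t ca[UCHAR_MAX + 1] = {0};
    size_t cb[UCHAR_MAX + 1] = {0};
    count_bytes(a, ca);
    count_bytes(b, cb);
    for (size_t i = 0; i <= UCHAR_MAX; ++i) {
        if ((ca[i] != 0) != (cb[i] != 0)) {
            return false;
        }
    }
    return true;
}
```

With these, "aaabbc" and "abbccc" pass same_char_set but fail is_anagram, which is exactly the distinction drawn above. Both checks are symmetric in their arguments.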
Here are two more complicated solutions that I wrote as an exercise. The implementation is selected with a macro below.
The first implementation loops through every character in the string, counts its occurrences in the first and second string and compares the values.
The second implementation allocates and creates a map of characters with a count for each string and then compares these maps.
#include <stdio.h>
#include <string.h>
#include <stdbool.h>
#include <assert.h>
#include <stdlib.h>
#include <errno.h>
// configuration
#define STRCHARSETCNTCMP_METHOD_FOREACH 0
#define STRCHARSETCNTCMP_METHOD_MAP 1
// eof configuration
//#define dbgln(fmt, ...) fprintf(stderr, "%s:%d: " fmt "\n", __func__, __LINE__, ##__VA_ARGS__)
#define dbgln(...) ((void)0)
/**
* STRing CHARacter SET CouNT CoMPare
* compare the count of set of characters in strings
* @param first string
* @param the other string
* @ret true if each character in s1 is used as many times in s2
*/
bool strcharsetcntcmp(const char s1[], const char s2[]);
// Count how many times the character is in the string
size_t strcharsetcntcmp_count(const char s[], char c)
{
assert(s != NULL);
size_t ret = 0;
while (*s != '\0') {
if (*s == c) {
++ret;
}
++s;
}
return ret;
}
// foreach method implementation
bool strcharsetcntcmp_method_foreach(const char s1[], const char s2[])
{
const size_t s1len = strlen(s1);
const size_t s2len = strlen(s2);
if (s1len != s2len) {
return false;
}
for (size_t i = 0; i < s1len; ++i) {
const char c = s1[i];
const size_t cnt1 = strcharsetcntcmp_count(s1, c);
const size_t cnt2 = strcharsetcntcmp_count(s2, c);
// printf("'%s','%s' -> '%c' -> %zu %zu\n", s1, s2, c, cnt1, cnt2);
if (cnt1 != cnt2) {
return false;
}
}
return true;
}
// array of map elements
struct strcharsetcntcmp_map_s {
size_t cnt;
struct strcharsetcntcmp_map_cnt_s {
char c;
size_t cnt;
} *map;
};
// initialize empty map
void strcharsetcntcmp_map_init(struct strcharsetcntcmp_map_s *t)
{
assert(t != NULL);
dbgln("%p", t);
t->map = 0;
t->cnt = 0;
}
// free map memory
void strcharsetcntcmp_map_fini(struct strcharsetcntcmp_map_s *t)
{
assert(t != NULL);
dbgln("%p %p", t, t->map);
free(t->map);
t->map = 0;
t->cnt = 0;
}
// get the map element for character from map
struct strcharsetcntcmp_map_cnt_s *strcharsetcntcmp_map_get(const struct strcharsetcntcmp_map_s *t, char c)
{
assert(t != NULL);
for (size_t i = 0; i < t->cnt; ++i) {
if (t->map[i].c == c) {
return &t->map[i];
}
}
return NULL;
}
// check if the count for character c was already added into the map
bool strcharsetcntcmp_map_exists(const struct strcharsetcntcmp_map_s *t, char c)
{
return strcharsetcntcmp_map_get(t, c) != NULL;
}
// map element into map, without checking if it exists (only assertion)
int strcharsetcntcmp_map_add(struct strcharsetcntcmp_map_s *t, char c, size_t cnt)
{
assert(t != NULL);
assert(strcharsetcntcmp_map_exists(t, c) == false);
dbgln("%p %p %zu %c %zu", t, t->map, t->cnt, c, cnt);
void *pnt = realloc(t->map, sizeof(t->map[0]) * (t->cnt + 1));
if (pnt == NULL) {
return -errno;
}
t->map = pnt;
t->map[t->cnt].c = c;
t->map[t->cnt].cnt = cnt;
t->cnt++;
return 0;
}
// create map from string, map needs to be initialized by init and needs to be freed with fini
int strcharsetcntcmp_map_parsestring(struct strcharsetcntcmp_map_s *t, const char s[])
{
assert(t != NULL);
assert(s != NULL);
int ret = 0;
while (*s != '\0') {
const char c = *s;
if (!strcharsetcntcmp_map_exists(t, c)) {
const size_t cnt = strcharsetcntcmp_count(s, c);
ret = strcharsetcntcmp_map_add(t, c, cnt);
if (ret != 0) {
break;
}
}
++s;
}
return ret;
}
// compare two maps if they have same sets of characters and counts
bool strcharsetcntcmp_cmp(const struct strcharsetcntcmp_map_s *t, const struct strcharsetcntcmp_map_s *o)
{
assert(t != NULL);
assert(o != NULL);
if (t->cnt != o->cnt) {
return false;
}
for (size_t i = 0; i < t->cnt; ++i) {
const char c = t->map[i].c;
const size_t t_cnt = t->map[i].cnt;
struct strcharsetcntcmp_map_cnt_s *o_map_cnt = strcharsetcntcmp_map_get(o, c);
if (o_map_cnt == NULL) {
dbgln("%p(%zu) %p(%zu) %c not found", t, t->cnt, o, o->cnt, c);
return false;
}
const size_t o_cnt = o_map_cnt->cnt;
if (t_cnt != o_cnt) {
dbgln("%p(%zu) %p(%zu) %c %zu != %zu", t, t->cnt, o, o->cnt, c, t_cnt, o_cnt);
return false;
}
dbgln("%p(%zu) %p(%zu) %c %zu", t, t->cnt, o, o->cnt, c, t_cnt);
}
return true;
}
// map method implementation
bool strcharsetcntcmp_method_map(const char s1[], const char s2[])
{
struct strcharsetcntcmp_map_s map1;
strcharsetcntcmp_map_init(&map1);
if (strcharsetcntcmp_map_parsestring(&map1, s1) != 0) {
abort(); // <insert good error handler here>
}
struct strcharsetcntcmp_map_s map2;
strcharsetcntcmp_map_init(&map2);
if (strcharsetcntcmp_map_parsestring(&map2, s2) != 0) {
abort(); // <insert good error handler here>
}
const bool ret = strcharsetcntcmp_cmp(&map1, &map2);
strcharsetcntcmp_map_fini(&map1);
strcharsetcntcmp_map_fini(&map2);
return ret;
}
bool strcharsetcntcmp(const char s1[], const char s2[])
{
assert(s1 != NULL);
assert(s2 != NULL);
#if STRCHARSETCNTCMP_METHOD_FOREACH
return strcharsetcntcmp_method_foreach(s1, s2);
#elif STRCHARSETCNTCMP_METHOD_MAP
return strcharsetcntcmp_method_map(s1, s2);
#endif
}
// unittests. Should return 0
int strcharsetcntcmp_unittest(void)
{
struct {
const char *str1;
const char *str2;
bool eq;
} const tests[] = {
{ "", "", true, },
{ "a", "b", false, },
{ "abc", "bca", true, },
{ "aab", "abb", false, },
{ "aabbbc", "cbabab", true, },
{ "123456789012345678901234567890qwertyuiopqwertyuiopasdfghjklasdfghjklzxcvbnmzxcvbnm,./;", "123456789012345678901234567890qwertyuiopqwertyuiopasdfghjklasdfghjklzxcvbnmzxcvbnm,./;", true },
{ "123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890", "123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890", true },
{ "123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890", "1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678900", false },
};
int ret = 0;
for (size_t i = 0; i < sizeof(tests)/sizeof(tests[0]); ++i) {
const bool is = strcharsetcntcmp(tests[i].str1, tests[i].str2);
if (is != tests[i].eq) {
fprintf(stderr,
"Error: strings '%s' and '%s' returned %d should be %d\n",
tests[i].str1, tests[i].str2, is, tests[i].eq);
ret = -1;
}
}
return ret;
}
int main()
{
return strcharsetcntcmp_unittest();
}

How to word-wrap using specific delimiters, without dynamic allocation

I have a program that displays UTF-8 encoded strings with a size limitation (say MAX_LEN).
Whenever I get a string with a length > MAX_LEN, I want to find out where I could split it so it would be printed gracefully.
For example:
#define MAX_LEN 30U
const char big_str[] = "This string cannot be displayed on one single line: it must be splitted";
Without processing, the output will look like:
"This string cannot be displaye" // Truncated because of size limitation
"d on one single line: it must "
"be splitted"
The client would be able to choose eligible delimiters for the split, but for now I have defined a default list of delimiters:
#define DEFAULT_DELIMITERS " ;:,)]" // Delimiters to track in the string
So I am looking for an elegant and lightweight way of handling this issue without using malloc: my API should not return the sub-strings, I just want the positions of the sub-strings to display.
I already have some ideas that I will propose in an answer: any feedback (e.g. pros and cons) would be appreciated, but most of all I am interested in alternative solutions.
I just want the positions of the sub-strings to display.
So all you need is one function analysing your input returning the positions where a delimiter was found.
A possible approach using strpbrk(), assuming at least C99:
#include <unistd.h> /* for ssize_t */
#include <string.h>
#define DELIMITERS (" ;.")
void find_delimiter_positions(
const char * input,
const char * delimiters,
ssize_t * delimiter_positions)
{
ssize_t dp_current = 0;
const char * p = input;
while (NULL != (p = strpbrk(p, delimiters)))
{
delimiter_positions[dp_current] = p - input;
++dp_current;
++p;
}
}
int main(void)
{
char input[] = "some randrom data; more.";
size_t input_length = strlen(input);
ssize_t delimiter_positions[input_length];
for (size_t s = 0; s < input_length; ++s)
{
delimiter_positions[s] = -1;
}
find_delimiter_positions(input, DELIMITERS, delimiter_positions);
for (size_t s = 0; -1 != delimiter_positions[s]; ++s)
{
/* print out positions */
}
}
Why C99: C99 introduces variable-length arrays (VLAs), which are needed here to get around the restriction on dynamic memory allocation.
If VLAs may not be used either, one needs to fall back to defining a maximum number of possible occurrences of delimiters per string. This might be feasible, as the maximum length of the string to be parsed is given, which in turn implies the maximum number of possible delimiters per string.
For the latter case those lines from the example above
char input[] = "some randrom data; more.";
size_t input_length = strlen(input);
ssize_t delimiter_positions[input_length];
could be replaced by
char input[MAX_INPUT_LEN] = "some randrom data; more.";
size_t input_length = strlen(input);
ssize_t delimiter_positions[MAX_INPUT_LEN];
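With a fixed-size array it is worth passing the capacity along so the function cannot write past the end. A possible variant is sketched below; the name find_delimiter_positions_bounded, the extra max_positions parameter and the count return value are assumptions of this sketch, not part of the code above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Store the offsets of delimiter characters found in input, in order,
 * writing at most max_positions entries. Returns the number stored. */
static size_t find_delimiter_positions_bounded(
    const char *input,
    const char *delimiters,
    size_t *positions,
    size_t max_positions)
{
    assert(input != NULL && delimiters != NULL && positions != NULL);
    size_t count = 0;
    const char *p = input;
    while (count < max_positions && (p = strpbrk(p, delimiters)) != NULL) {
        positions[count++] = (size_t)(p - input);
        ++p; /* continue searching after the delimiter just found */
    }
    return count;
}
```

Because the number of hits is returned, the caller no longer needs to pre-fill the array with -1 as a sentinel, and size_t can be used for the offsets.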
An approach that doesn't require additional storage is to make the wrapping function call a callback function for each substring. In the example below, the string is just printed with plain old printf, but the callback could call any other API function.
Things to note:
There is a function next that should advance a pointer to the next UTF-8 character. The encoding width of a UTF-8 character can be determined from its first byte.
The space and punctuation delimiters are treated slightly differently: spaces are placed at neither the end nor the beginning of a line. (As long as there aren't any consecutive spaces in the text, that is.) Punctuation is retained at the end of a line.
Here's an example implementation:
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#define DELIMITERS " ;:,)]"
/*
* Advance to next character. This should advance the pointer by
* one to four bytes, depending on the UTF-8 encoding. (But at the
* moment, it doesn't.)
*/
static const char *next(const char *p)
{
return p + 1;
}
typedef struct {
const char *begin;
const char *end;
} substr_t;
/*
* Wraps the text and stores the found substrings' ranges into
* the lines struct. Return the number of word-wrapped lines.
*/
int wrap(const char *text, int width, substr_t *lines, uint32_t max_num_lines)
{
const char *begin = text;
const char *split = NULL;
uint32_t num_lines = 1;
int l = 0;
while (*text) {
if (strchr(DELIMITERS, *text)) {
split = text;
if (*text != ' ') split++;
}
if (l++ == width) {
if (split == NULL) split = text;
lines[num_lines - 1].begin = begin;
lines[num_lines - 1].end = split;
//write(fileno(stdout), begin, split - begin);
text = begin = split;
while (*begin == ' ') begin++;
split = NULL;
l = 0;
num_lines++;
if (num_lines > max_num_lines) {
//abort();
return -1;
}
}
text = next(text);
}
lines[num_lines - 1].begin = begin;
lines[num_lines - 1].end = text;
//write(fileno(stdout), begin, split - begin);
return num_lines;
}
int main()
{
const char *text = "I have a program that displays UTF-8 encoded strings "
"with a size limitation (say MAX_LEN). Whenever I get a string with a "
"length > MAX_LEN, I want to find out where I could split it so it "
"would be printed gracefully.";
substr_t lines[100];
const uint32_t max_num_lines = sizeof(lines) / sizeof(lines[0]);
const int num_lines = wrap(text, 48, lines, max_num_lines);
if (num_lines < 0) {
fprintf(stderr, "error: can't split into %u lines\n", (unsigned)max_num_lines);
return EXIT_FAILURE;
}
//printf("num_lines = %d\n", num_lines);
for (int i=0; i < num_lines; i++) {
FILE *stream = stdout;
const ptrdiff_t line_length = lines[i].end - lines[i].begin;
write(fileno(stream), lines[i].begin, line_length);
fputc('\n', stream);
}
return EXIT_SUCCESS;
}
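The next placeholder above advances one byte at a time. A possible UTF-8-aware replacement is sketched here under the assumption that the input is valid UTF-8; rather than decoding the width from the first byte, it steps over the lead byte and then skips any continuation bytes (those of the form 10xxxxxx). The name next_utf8 is chosen for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* Advance p to the start of the next UTF-8 encoded character.
 * Stays put when p already points at the terminating null. */
static const char *next_utf8(const char *p)
{
    assert(p != NULL);
    if (*p == '\0') {
        return p;
    }
    ++p; /* step over the lead byte */
    while (((unsigned char)*p & 0xC0) == 0x80) {
        ++p; /* skip continuation bytes 10xxxxxx */
    }
    return p;
}
```

Since the terminating null is not a continuation byte, the loop cannot run past the end of the string. With such a function in place, the counter l in wrap counts characters rather than bytes.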
Addendum: Here's another approach that builds loosely on the strtok pattern, but without modifying the string. It requires a state and that state must be initialised with the string to print and the maximum line width:
struct wrap_t {
const char *src;
int width;
int length;
const char *line;
};
int wrap(struct wrap_t *line)
{
const char *begin = line->src;
const char *split = NULL;
int l = 0;
if (begin == NULL) return -1;
while (*begin == ' ') begin++;
if (*begin == '\0') return -1;
while (*line->src) {
if (strchr(DELIMITERS, *line->src)) {
split = line->src;
if (*line->src != ' ') split++;
}
if (l++ == line->width) {
if (split == NULL) split = line->src;
line->line = begin;
line->length = split - begin;
line->src = split;
return 0;
}
line->src = next(line->src);
}
line->line = begin;
line->length = line->src - begin;
return 0;
}
All definitions not shown (DELIMITERS, next) are as above and the basic algorithm hasn't changed. I think this method is easy to use for the client:
int main()
{
const char *text = "I have a program that displays UTF-8 encoded strings "
"with a size limitation (say MAX_LEN). Whenever I get a string with a "
"length > MAX_LEN, I want to find out where I could split it so it "
"would be printed gracefully.";
struct wrap_t line = {text, 60};
while (wrap(&line) == 0) {
printf("%.*s\n", line.length, line.line);
}
return 0;
}
Solution1
A function that will be called successively until the whole string is processed: it would return the count of bytes to recopy to create the sub-strings:
The API:
/**
* Return the length between the beginning of the string and the
* last delimiter (such that returned length <= max_length)
*/
size_t get_next_substring_length(
const char * str, // The string to be split
const char * delim, // String of eligible delimiters for a split
size_t max_length); // The maximum length of resulting substring
On the client side:
size_t shift = 0;
for(;;)
{
// Where do we start within big_str ?
const char * tmp = big_str + shift;
size_t count = get_next_substring_length(tmp, DEFAULT_DELIMITERS, MAX_LEN);
if(count)
{
// Allocate a sub-string and recopy "count" bytes
// Display the sub-string
shift += count;
}
else // End Of String (or error)
{
// Handle potential error
// Exit the loop
}
}
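For reference, one way the function declared above could be implemented is sketched below. Several choices here are assumptions: the delimiter stays at the end of the returned chunk, a chunk without any delimiter is cut hard at max_length, and lengths are counted in bytes, so a hard cut may fall inside a multi-byte UTF-8 character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return how many bytes of str to take for the next line:
 * the whole rest if it fits, otherwise up to and including the last
 * delimiter within the first max_length bytes, otherwise max_length. */
static size_t get_next_substring_length(
    const char *str,
    const char *delim,
    size_t max_length)
{
    assert(str != NULL && delim != NULL);

    /* Bounded length scan: never looks further than max_length + 1 bytes. */
    size_t len = 0;
    while (len <= max_length && str[len] != '\0') {
        ++len;
    }
    if (len <= max_length) {
        return len; /* the remainder fits; 0 means end of string */
    }

    /* Search backwards for the last delimiter inside the window. */
    for (size_t i = max_length; i > 0; --i) {
        if (strchr(delim, str[i - 1]) != NULL) {
            return i;
        }
    }
    return max_length; /* no delimiter found: hard split */
}
```

Note that max_length must be greater than zero for the client loop to make progress; with max_length == 0 the function returns 0 even though the string is not finished.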
Solution2
Define a custom structure to store positions and lengths of sub-strings:
const char * str = "This is a long test string";
struct substrings
{
const char * str; // Beginning of the substring
size_t length; // Length of the substring
} sub[] = { {&str[0], 4},
{&str[5], 2},
{&str[8], 1},
{&str[10], 4},
{&str[15], 4},
{&str[20], 6},
{NULL, 0} };
The API:
size_t find_substrings(
struct substrings ** substr,
size_t max_length,
const char * delimiters,
const char * str);
On the client side:
#define ARRAY_LENGTH 20U
struct substrings substr[ARRAY_LENGTH];
// Fill the structure
find_substrings(
&substr,
ARRAY_LENGTH,
DEFAULT_DELIMITERS,
big_str);
// Browse the structure
for (struct substrings * sub = &substr[0]; sub->str; sub++)
{
// Display sub->length bytes of sub->str
}
Some things are bothering me though:
in Solution1 I don't like the infinite loop; such loops are often bug-prone
in Solution2 I fixed ARRAY_LENGTH arbitrarily, but it should vary depending on the input string length
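One possible way to address the first concern is sketched below: a hypothetical implementation of get_next_substring_length() whose return value drives a plain while loop, so no for(;;) is needed. It is a sketch under assumptions that are not in the question: the delimiters are single-byte ASCII characters, the returned length includes the delimiter itself (so the caller always makes progress), and when no delimiter fits, the split backs up so that it does not land inside a UTF-8 multi-byte sequence. The helper print_wrapped() is likewise made up for illustration.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical implementation of the Solution1 API: number of bytes to
 * consume for the next piece (at most max_length), or 0 at end of string. */
size_t get_next_substring_length(const char *str, const char *delim,
                                 size_t max_length)
{
    size_t last_delim = 0;  /* length up to and including the last delimiter */
    size_t i, n;

    if (str == NULL || delim == NULL || max_length == 0)
        return 0;

    for (i = 0; i < max_length && str[i] != '\0'; i++) {
        if (strchr(delim, str[i]) != NULL)
            last_delim = i + 1;
    }
    if (str[i] == '\0')     /* the rest of the string fits */
        return i;
    if (last_delim > 0)     /* split just after the last delimiter */
        return last_delim;

    /* No delimiter: hard split, but never inside a UTF-8 sequence.
     * Continuation bytes have the bit pattern 10xxxxxx. */
    n = max_length;
    while (n > 0 && ((unsigned char)str[n] & 0xC0) == 0x80)
        n--;
    return n > 0 ? n : max_length;  /* fall back to guarantee progress */
}

/* Client side without an infinite loop: the loop ends when 0 is returned.
 * Returns the number of pieces printed. */
size_t print_wrapped(const char *text, const char *delim, size_t max_length)
{
    size_t pieces = 0;
    size_t count;

    while ((count = get_next_substring_length(text, delim, max_length)) > 0) {
        printf("%.*s\n", (int)count, text);
        text += count;
        pieces++;
    }
    return pieces;
}
```

Because each piece is printed straight from the original buffer with "%.*s", this variant also avoids allocating sub-strings or sizing an array in advance, which sidesteps the ARRAY_LENGTH question in Solution2.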

Extracting key=value with scanf in C

I need to extract a value for a given key from a string. I made this quick attempt:
char js[] = "some preceding text with\n"
"new lines and spaces\n"
"param_1=123\n"
"param_2=321\n"
"param_3=string\n"
"param_2=321\n";
char* param_name = "param_2";
char *key_s, *val_s;
char buf[32];
key_s = strstr(js, param_name);
if (key_s == NULL)
return 0;
val_s = strchr(key_s, '=');
if (val_s == NULL)
return 0;
sscanf(val_s + 1, "%31s", buf);
printf("'%s'\n", buf);
This does in fact work (printf gives '321'), but I suspect that scanf/sscanf could make the task even easier; I just have not managed to work out the format string for it.
Is it possible to pass the contents of the variable param_name to sscanf so that it is treated as part of the format string? In other words, I need to tell sscanf to look for the pattern param_2=%s in this case (param_name actually comes from a function argument).
Not directly, no.
In practice, there's of course nothing stopping you from building the format string for sscanf() at runtime, with e.g. snprintf().
Something like:
void print_value(const char **js, size_t num_js, const char *key)
{
char tmp[32], value[32];
snprintf(tmp, sizeof tmp, "%s=%%31s", key);
for(size_t i = 0; i < num_js; ++i)
{
if(sscanf(js[i], tmp, value) == 1)
{
printf("found '%s'\n", value);
break;
}
}
}
OP has a good first step:
char *key_s = strstr(js, param_name);
if (key_s == NULL)
return 0;
The rest may be simplified to
if (sscanf(&key_s[strlen(param_name)], "=%31s", buf) != 1) { // also catches EOF
return 0;
}
printf("'%s'\n", buf);
Alternatively one could use " =%31s" to allow spaces before =.
OP's approach gets fooled by "param_2 321\n" "param_3=string\n".
Note: a weakness of all the answers so far is that none of them parses an empty string as the value.
One issue that bears consideration is the difference between finding a 'key=value' setting in the string for a specific key value (such as param_2 in the question), and finding any 'key=value' setting in the string (with no specific key in mind a priori). The techniques to be used are rather different.
Another issue that has not self-evidently been considered is the possibility that you're looking for a key param_2 but the string also contains param_22=xyz and t_param_2=abc. The simple-minded approaches using strstr() to hunt for param_2 will pick up either of those alternatives.
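The boundary problem described above can be illustrated with a short sketch. The functions is_key_char() and find_value() are hypothetical names, not part of any answer here; the sketch assumes keys consist of letters, digits and underscores, and it accepts a strstr() hit only if the key is not preceded by another key character and is immediately followed by '='.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: may c appear inside a key? (assumed: [A-Za-z0-9_]) */
static int is_key_char(int c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Return a pointer to the text right after "key=", or NULL if the key does
 * not occur as a whole word. The value itself is not copied or terminated. */
const char *find_value(const char *str, const char *key)
{
    size_t len = strlen(key);
    const char *p = str;

    if (len == 0)
        return NULL;
    while ((p = strstr(p, key)) != NULL) {
        /* Reject t_param_2: the character before the hit is a key character. */
        int starts_word = (p == str) || !is_key_char(p[-1]);
        /* Reject param_22: the key must be followed directly by '='. */
        if (starts_word && p[len] == '=')
            return p + len + 1;
        p++;  /* resume the search one byte further on */
    }
    return NULL;
}
```

Note that this sketch treats any non-key character, including '=', as a valid boundary, so in "param_4=param_2=confusion" it would still report a match for param_2; whether that is acceptable depends on the data.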
In the sample data, there is a collection of characters that are not in the 'key=value' format and must be skipped before any of the 'key=value' parts. In the general case, we should assume that such data appears before, in between, and after the 'key=value' pairs. It appears that the values do not need to support complications such as quoted strings and metacharacters, and the value is delimited by white space. There is no comment convention visible.
Here's some workable code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
enum { MAX_KEY_LEN = 31 };
enum { MAX_VAL_LEN = 63 };
int find_any_key_value(const char *str, char *key, char *value);
int find_key_value(const char *str, const char *key, char *value);
int find_any_key_value(const char *str, char *key, char *value)
{
char junk[256];
const char *search = str;
while (*search != '\0')
{
int offset;
if (sscanf(search, " %31[a-zA-Z_0-9]=%63s%n", key, value, &offset) == 2)
return(search + offset - str);
int rc;
if ((rc = sscanf(search, "%255s%n", junk, &offset)) != 1)
return EOF;
search += offset;
}
return EOF;
}
int find_key_value(const char *str, const char *key, char *value)
{
char found[MAX_KEY_LEN + 1];
int offset;
const char *search = str;
while ((offset = find_any_key_value(search, found, value)) > 0)
{
if (strcmp(found, key) == 0)
return(search + offset - str);
search += offset;
}
return offset;
}
int main(void)
{
char js[] = "some preceding text with\n"
"new lines and spaces\n"
"param_1=123\n"
"param_2=321\n"
"param_3=string\n"
"param_4=param_2=confusion\n"
"m= x\n"
"param_2=987\n";
const char p2_key[] = "param_2";
int offset;
const char *str;
char key[MAX_KEY_LEN + 1];
char value[MAX_VAL_LEN + 1];
printf("String being scanned is:\n[[%s]]\n", js);
str = js;
while ((offset = find_any_key_value(str, key, value)) > 0)
{
printf("Any found key = [%s] value = [%s]\n", key, value);
str += offset;
}
str = js;
while ((offset = find_key_value(str, p2_key, value)) > 0)
{
printf("Found key %s with value = [%s]\n", p2_key, value);
str += offset;
}
return 0;
}
Sample output:
$ ./so24490410
String being scanned is:
[[some preceding text with
new lines and spaces
param_1=123
param_2=321
param_3=string
param_4=param_2=confusion
m= x
param_2=987
]]
Any found key = [param_1] value = [123]
Any found key = [param_2] value = [321]
Any found key = [param_3] value = [string]
Any found key = [param_4] value = [param_2=confusion]
Any found key = [m] value = [x]
Any found key = [param_2] value = [987]
Found key param_2 with value = [321]
Found key param_2 with value = [987]
$
If you need to handle different key or value lengths, you need to adjust the format strings as well as the enumerations. If you pass the size of the key buffer and the size of the value buffer to the functions, then you need to use snprintf() to create the format strings used by sscanf(). There is an outside chance that you might have a single 'word' of 255 characters followed immediately by the target 'key=value' string. The chances are ridiculously small, but you might decide you need to worry about that (it prevents this code from being bomb-proof).
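As a rough sketch of that last suggestion, the format string can be built from the buffer sizes at run time. The function name scan_key_value() and its signature are assumptions made for illustration; it only shows the format construction, not the junk-skipping loop of find_any_key_value().

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical variant that takes the buffer sizes as arguments.
 * Returns 2 when both key and value were read, like the sscanf() call in
 * find_any_key_value(); returns EOF if the sizes are unusable. */
int scan_key_value(const char *str, char *key, size_t keysize,
                   char *value, size_t valsize)
{
    char fmt[64];
    int n;

    if (keysize < 2 || valsize < 2)
        return EOF;

    /* For keysize 32 and valsize 64 this yields " %31[a-zA-Z_0-9]=%63s".
     * "%%" produces a literal '%'; the widths leave room for the '\0'. */
    n = snprintf(fmt, sizeof fmt, " %%%zu[a-zA-Z_0-9]=%%%zus",
                 keysize - 1, valsize - 1);
    if (n < 0 || (size_t)n >= sizeof fmt)
        return EOF;

    return sscanf(str, fmt, key, value);
}
```

A call such as scan_key_value("param_1=123", key, sizeof key, value, sizeof value) then behaves like the hard-coded format, with the widths following the actual buffer sizes.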

For string, find and replace

Finding some text and replacing it with new text within a C string can be a little trickier than expected.
I am searching for an algorithm which is fast, and that has a small time complexity.
What should I use?
I couldn't find an implementation of search/replace in C that I liked so I present here my own. It does not use things like strstr(), snprintf(), arbitrary length temporary buffers, etc. It only requires that the haystack buffer is large enough to hold the resulting string after replacements are made.
// str_replace(haystack, haystacksize, oldneedle, newneedle) --
// Search haystack and replace all occurrences of oldneedle with newneedle.
// Resulting haystack contains no more than haystacksize characters (including the '\0').
// If haystacksize is too small to make the replacements, do not modify haystack at all.
//
// RETURN VALUES
// str_replace() returns haystack on success and NULL on failure.
// Failure means there was not enough room to replace all occurrences of oldneedle.
// Success is returned otherwise, even if no replacement is made.
char *str_replace(char *haystack, size_t haystacksize,
const char *oldneedle, const char *newneedle);
// ------------------------------------------------------------------
// Implementation of function
// ------------------------------------------------------------------
#include <stdbool.h> // bool, true, false
#include <string.h> // strlen, memcpy
#define SUCCESS (char *)haystack
#define FAILURE (void *)NULL
static bool
locate_forward(char **needle_ptr, char *read_ptr,
const char *needle, const char *needle_last);
static bool
locate_backward(char **needle_ptr, char *read_ptr,
const char *needle, const char *needle_last);
char *str_replace(char *haystack, size_t haystacksize,
const char *oldneedle, const char *newneedle)
{
size_t oldneedle_len = strlen(oldneedle);
size_t newneedle_len = strlen(newneedle);
char *oldneedle_ptr; // locates occurrences of oldneedle
char *read_ptr; // where to read in the haystack
char *write_ptr; // where to write in the haystack
const char *oldneedle_last = // the last character in oldneedle
oldneedle +
oldneedle_len - 1;
// Case 0: oldneedle is empty
if (oldneedle_len == 0)
return SUCCESS; // nothing to do; define as success
// Case 1: newneedle is not longer than oldneedle
if (newneedle_len <= oldneedle_len) {
// Pass 1: Perform copy/replace using read_ptr and write_ptr
for (oldneedle_ptr = (char *)oldneedle,
read_ptr = haystack, write_ptr = haystack;
*read_ptr != '\0';
read_ptr++, write_ptr++)
{
*write_ptr = *read_ptr;
bool found = locate_forward(&oldneedle_ptr, read_ptr,
oldneedle, oldneedle_last);
if (found) {
// then perform update
write_ptr -= oldneedle_len;
memcpy(write_ptr+1, newneedle, newneedle_len);
write_ptr += newneedle_len;
}
}
*write_ptr = '\0';
return SUCCESS;
}
// Case 2: newneedle is longer than oldneedle
else {
size_t diff_len = // the amount of extra space needed
newneedle_len - // to replace oldneedle with newneedle
oldneedle_len; // in the expanded haystack
// Pass 1: Perform forward scan, updating write_ptr along the way
for (oldneedle_ptr = (char *)oldneedle,
read_ptr = haystack, write_ptr = haystack;
*read_ptr != '\0';
read_ptr++, write_ptr++)
{
bool found = locate_forward(&oldneedle_ptr, read_ptr,
oldneedle, oldneedle_last);
if (found) {
// then advance write_ptr
write_ptr += diff_len;
}
if (write_ptr >= haystack+haystacksize)
return FAILURE; // no more room in haystack
}
// Pass 2: Walk backwards through haystack, performing copy/replace
for (oldneedle_ptr = (char *)oldneedle_last;
write_ptr >= haystack;
write_ptr--, read_ptr--)
{
*write_ptr = *read_ptr;
bool found = locate_backward(&oldneedle_ptr, read_ptr,
oldneedle, oldneedle_last);
if (found) {
// then perform replacement
write_ptr -= diff_len;
memcpy(write_ptr, newneedle, newneedle_len);
}
}
return SUCCESS;
}
}
// locate_forward: compare needle_ptr and read_ptr to see if a match occurred
// needle_ptr is updated as appropriate for the next call
// return true if match occurred, false otherwise
static inline bool
locate_forward(char **needle_ptr, char *read_ptr,
const char *needle, const char *needle_last)
{
if (**needle_ptr == *read_ptr) {
(*needle_ptr)++;
if (*needle_ptr > needle_last) {
*needle_ptr = (char *)needle;
return true;
}
}
else
*needle_ptr = (char *)needle;
return false;
}
// locate_backward: compare needle_ptr and read_ptr to see if a match occurred
// needle_ptr is updated as appropriate for the next call
// return true if match occurred, false otherwise
static inline bool
locate_backward(char **needle_ptr, char *read_ptr,
const char *needle, const char *needle_last)
{
if (**needle_ptr == *read_ptr) {
(*needle_ptr)--;
if (*needle_ptr < needle) {
*needle_ptr = (char *)needle_last;
return true;
}
}
else
*needle_ptr = (char *)needle_last;
return false;
}
Example usage
#define BUF 30
char *retval1, *retval2;
char message[BUF] = "Your name is $USERNAME.";
char username[] = "admin";
char username_toolong[] = "System Administrator";
int main() {
retval1 = str_replace(message, BUF, "$USERNAME", username_toolong);
retval2 = str_replace(message, BUF, "$USERNAME", username);
if (!retval1)
printf("Not enough room to replace $USERNAME with `%s'\n", username_toolong);
if (!retval2)
printf("Not enough room to replace $USERNAME with `%s'\n", username);
printf("%s\n", message);
return 0;
}
Output
Not enough room to replace $USERNAME with `System Administrator'
Your name is admin.
Cheers.
Knuth-Morris-Pratt (which is classic) or Boyer-Moore (which is sometimes faster)?
http://en.wikipedia.org/wiki/Knuth-Morris-Pratt_algorithm
http://en.wikipedia.org/wiki/Boyer-Moore_string_search_algorithm
Try using a Google search for 'string searching algorithms'.
I can't help but wonder what algorithm strstr() implements. Given that these are fairly standard algorithms, it's entirely possible that a good implementation of strstr() uses one of them.
However there's no guarantee that strstr() implements an optimised algorithm or that the same algorithm is used from one platform to another.
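For reference, a minimal sketch of the Knuth-Morris-Pratt search mentioned above is shown here. The name kmp_search() is made up for illustration, and this is not a claim about how any particular strstr() is implemented. It precomputes a failure table for the needle so that the haystack is scanned once without backing up.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of Knuth-Morris-Pratt: return a pointer to the first occurrence of
 * needle in haystack, or NULL. An empty needle matches at the start.
 * If the table cannot be allocated, fall back to strstr(). */
const char *kmp_search(const char *haystack, const char *needle)
{
    size_t m = strlen(needle);
    size_t *fail;
    size_t i, k;
    const char *result = NULL;

    if (m == 0)
        return haystack;
    fail = malloc(m * sizeof *fail);
    if (fail == NULL)
        return strstr(haystack, needle);

    /* fail[i] = length of the longest proper prefix of needle[0..i]
     * that is also a suffix of needle[0..i]. */
    fail[0] = 0;
    for (i = 1, k = 0; i < m; i++) {
        while (k > 0 && needle[i] != needle[k])
            k = fail[k - 1];
        if (needle[i] == needle[k])
            k++;
        fail[i] = k;
    }

    /* Scan the haystack; k is the number of needle bytes matched so far. */
    for (i = 0, k = 0; haystack[i] != '\0'; i++) {
        while (k > 0 && haystack[i] != needle[k])
            k = fail[k - 1];
        if (haystack[i] == needle[k])
            k++;
        if (k == m) {
            result = haystack + i + 1 - m;
            break;
        }
    }
    free(fail);
    return result;
}
```

The search runs in time proportional to the combined length of the haystack and the needle; whether it beats a library strstr() in practice would have to be measured.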
Using std::string (from <string>) you can simply use find and replace.
http://www.cplusplus.com/reference/string/string/find/ - Gets you an index.
http://www.cplusplus.com/reference/string/string/replace/ - Takes an index.
Edit: Touché. This is for C++ only.
Is this any good to you?
http://www.daniweb.com/forums/thread51976.html
Here is some nice code:
#include <stdio.h>
#include <string.h>
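char *replace_str(char *str, char *orig, char *rep)
{
static char buffer[4096];
char *p;
if(!(p = strstr(str, orig))) // Is 'orig' even in 'str'?
return str;
// Refuse to overflow the static buffer: the result plus its '\0' must fit
if(strlen(str) - strlen(orig) + strlen(rep) >= sizeof buffer)
return NULL;
memcpy(buffer, str, p-str); // Copy characters from the start of 'str' up to the start of 'orig'
buffer[p-str] = '\0';
sprintf(buffer+(p-str), "%s%s", rep, p+strlen(orig));
return buffer;
}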
int main(void)
{
puts(replace_str("Hello, world!", "world", "Miami"));
return 0;
}
My solution, based on the others, but a bit safer I believe:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MAX_SOURCE_SIZE (0x100000)
char * searchReplace(char * string, char *toReplace[], char *replacements[], int numReplacements){
int i = 0;
char *locOfToRep;
char *toRep;
char *rep;
int lenToRep,lenStr,lenAfterLocRep;
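static char buffers[2][MAX_SOURCE_SIZE]; // two buffers, so we never read from the buffer being written
char *buffer = buffers[0];
for(i = 0; i < numReplacements; ++i){
toRep = toReplace[i];
rep = replacements[i];
//if str not in the string, exit.
if (!(locOfToRep = strstr(string,toRep))){
exit(EXIT_FAILURE);
}
lenToRep = strlen(toRep);
lenStr = strlen(string);
lenAfterLocRep = strlen(locOfToRep);
buffer = buffers[i % 2]; // alternate: 'string' may point into the other buffer
//Print the string up to the pointer, then the val, and then the rest of the string.
//snprintf keeps the write inside the buffer (the result is truncated if it is too long).
snprintf(buffer, MAX_SOURCE_SIZE, "%.*s%s%s", lenStr-lenAfterLocRep, string,rep,locOfToRep+lenToRep);
string = buffer;
}
return buffer;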
}
int main(){
char * string = "Hello, world!";
int numVals;
char *names[2] = {"Hello", "world"};
char *vals[2] = {"Goodbye", "you"};
numVals = 2;
string = searchReplace(string, names, vals, numVals);
printf("%s\n",string);
}
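The strstr()-based answers above return a pointer into a fixed-size static buffer. As an alternative, here is a hedged sketch that writes into a caller-supplied buffer, replaces every occurrence, and reports failure instead of overflowing or calling exit(). The name replace_all() and its return convention are assumptions made for this example; dst must not overlap src, and on failure the contents of dst are unspecified.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, replacing every occurrence of
 * 'from' with 'to'. Returns the length of the result, or (size_t)-1 if the
 * result does not fit in dstsize bytes or 'from' is empty. */
size_t replace_all(char *dst, size_t dstsize, const char *src,
                   const char *from, const char *to)
{
    size_t from_len = strlen(from);
    size_t to_len = strlen(to);
    size_t used = 0;   /* bytes written so far; always < dstsize */
    size_t rest;
    const char *hit;

    if (dstsize == 0 || from_len == 0)
        return (size_t)-1;

    while ((hit = strstr(src, from)) != NULL) {
        size_t before = (size_t)(hit - src);  /* text preceding the match */
        if (before + to_len >= dstsize - used)
            return (size_t)-1;                /* no room left */
        memcpy(dst + used, src, before);
        memcpy(dst + used + before, to, to_len);
        used += before + to_len;
        src = hit + from_len;                 /* continue after the match */
    }

    rest = strlen(src);                       /* tail after the last match */
    if (rest >= dstsize - used)
        return (size_t)-1;
    memcpy(dst + used, src, rest + 1);        /* includes the '\0' */
    return used + rest;
}
```

For example, with char out[32], the call replace_all(out, sizeof out, "Hello, world!", "world", "Miami") leaves "Hello, Miami!" in out. Since the text after each match is not rescanned, a replacement that itself contains 'from' does not cause an endless loop.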

Resources