Hi, I'm trying to insert a string into another string at position N in C. I wrote a function that works, but the problem is that it only works once.
This is my code:
char *substring(char *string, int position, int length)
{
char *pointer;
int c;
pointer = (char*) malloc(length+1);
if( pointer == NULL )
exit(EXIT_FAILURE);
for( c = 0 ; c < length ; c++ )
*(pointer+c) = *((string+position-1)+c);
*(pointer+c) = '\0';
return pointer;
}
void insert_substring(char *a, char *b, int position)
{
char *f, *e;
int length;
length = strlen(a);
f = substring(a, 1, position - 1 );
e = substring(a, position, length-position+1);
strcpy(a, "");
strcat(a, f);
free(f);
strcat(a, b);
strcat(a, e);
free(e);
}
int main(void) {
char username[UNLEN+1];
DWORD username_len = UNLEN+1;
GetUserName(username, &username_len);
char msg1 [] ="Good morning mr ";
char msg2 [] ="Good evening mr ";
insert_substring(msg1,username,17);
printf("%s\n",msg1);
insert_substring(msg2,username,17);
printf("%s\n",msg2);
return 0;
}
The program displays only one message:
Good morning mr XXXXX
What I noticed is that the program doesn't execute any instruction after the second call to insert_substring, and it doesn't display any error. Maybe it is a problem with the local variables used in the functions.
Looking at your code, you seem to be trying to call Win32 API functions, get current user's username and print a greeting message based on time of day (but you didn't implement this part yet).
When concatenating two null-terminated strings, you need to make sure that the target buffer has enough space to hold the two strings that you are concatenating.
In your case, you already know your maximum space required to hold username, through the use of UNLEN macro.
So you can use a buffer of size sizeof(msg1) + UNLEN, allocated with malloc. Something like this:
char *target_msg1 = (char *)malloc(sizeof(msg1) + UNLEN);
You can even use the lpnSize parameter of GetUserName to allocate as little space as possible, but since you are doing Win32 programming I don't think a couple of bytes of memory is much of a concern.
Your code would change like this:
strcpy(target_msg1, msg1);
insert_substring(target_msg1,username,17);
printf("%s\n",target_msg1);
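The buffer-size point above can be sketched without any Win32 calls. This is only a minimal sketch, not the asker's function: insert_at, its 0-based pos parameter and its cap parameter are hypothetical names chosen for illustration. It refuses the insertion when the destination cannot hold both strings, instead of writing past the end of the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: insert src into dst at 0-based offset pos.
   cap is the total size of the dst buffer in bytes.
   Returns 0 on success, -1 if pos is out of range or dst is too small. */
static int insert_at(char *dst, size_t cap, const char *src, size_t pos)
{
    size_t dlen = strlen(dst);
    size_t slen = strlen(src);

    if (pos > dlen || dlen + slen + 1 > cap)
        return -1;

    /* Shift the tail (including the terminator) right to open a gap. */
    memmove(dst + pos + slen, dst + pos, dlen - pos + 1);
    /* Copy src into the gap; the terminator was already moved above. */
    memcpy(dst + pos, src, slen);
    return 0;
}
```

With char msg1[] = "Good morning mr "; the array holds exactly 17 bytes, so any non-empty insertion is rejected with -1. With a larger buffer such as char msg[64], the same call succeeds.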
I'm currently creating a program that captures user's keypresses and stores them in a string. I wanted the string that stores the keypresses to be dynamic, but I came across a problem.
My current code looks something like this:
#include <stdio.h>
#include <stdlib.h>
typedef struct Foo {
const char* str;
int size;
} Foo;
int main(void)
{
int i;
Foo foo;
foo.str = NULL;
foo.size = 0;
for (;;) {
for (i = 8; i <= 190; i++) {
if (GetAsyncKeyState(i) == -32767) { // if key is pressed
foo.str = (char*)realloc(foo.str, (foo.size + 1) * sizeof(char)); // Access violation reading location xxx
sprintf(foo.str, "%s%c", foo.str, (char)i);
foo.size++;
}
}
}
return 0;
}
Any help would be appreciated, as I don't have any ideas anymore. :(
Should I maybe also allocate the Foo object dynamically?
First, in order to handle things nicely, you need to define
typedef struct Foo {
char* str;
int size;
} Foo;
Otherwise, Foo is really annoying to mutate properly: with a const char * member you cannot modify what foo.str points to after the realloc call without casting the const away.
The seg fault is actually caused by sprintf(foo.str, "%s%c", foo.str, (char)i);, not the call to realloc. foo.str is, in general, not null-terminated.
In fact, you're duplicating work by calling sprintf at all. realloc already copies all the characters previously in foo.str, so all you have to do is add a single character via
foo.str[foo.size] = (char) i;
Edit to respond to comment:
If we wanted to append two strings (or rather, two Foos) together, we could do that as follows:
void appendFoos(Foo* const first, const Foo* const second) {
first->str = realloc(first->str, (first->size + second->size) * (sizeof(char)));
memcpy(first->str + first->size, second->str, second->size);
first->size += second->size;
}
The appendFoos function modifies first by appending second onto it.
Throughout this code, we leave Foos as non-null terminated. However, to convert to a string, you must add a final null character after reading all other characters.
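The conversion mentioned above could look roughly like this. It is a sketch under the assumption that Foo has the non-const char *str member suggested earlier; fooToCString is a hypothetical name, not something from the question.

```c
#include <stdlib.h>
#include <string.h>

typedef struct Foo {
    char *str;   /* character data, NOT null-terminated */
    int size;    /* number of valid bytes in str */
} Foo;

/* Hypothetical helper: returns a newly allocated, null-terminated copy
   of the characters in foo, or NULL if the allocation fails.
   The caller is responsible for calling free() on the result. */
static char *fooToCString(const Foo *foo)
{
    char *s = malloc((size_t)foo->size + 1);   /* +1 for the terminator */
    if (s == NULL)
        return NULL;
    if (foo->size > 0)
        memcpy(s, foo->str, (size_t)foo->size);
    s[foo->size] = '\0';
    return s;
}
```

Copying into a fresh buffer keeps the Foo itself unterminated, which matches the convention used by appendFoos.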
const char *str: you declare a pointer to const char. You can't write to the referenced object, as it invokes UB.
You use sprintf just to add one char. It makes no sense.
You do not need a pointer in the structure.
You need to set the compiler options to compile as C, not C++.
I would do it a bit differently:
#include <stdlib.h>

typedef struct Foo {
    size_t size;
    char str[1];
} Foo;

Foo *addCharToFoo(Foo *f, char ch)
{
    if(f)
    {
        /* room for the existing chars, the new char and the terminator */
        f = realloc(f, sizeof(*f) + f -> size + 1);
    }
    else
    {
        f = realloc(f, sizeof(*f) + 1);
        if(f) f -> size = 0;
    }
    if(f) //check if realloc did not fail
    {
        f -> str[f -> size++] = ch;
        f -> str[f -> size] = 0;
    }
    return f;
}
and in the main
int main(void)
{
    int i;
    Foo *foo = NULL, *tmp;
    for (;;)
    {
        for (i = 8; i <= 190; i++)
        {
            if (GetAsyncKeyState(i) == -32767) { // if key is pressed
                if ((tmp = addCharToFoo(foo, (char)i)))
                {
                    foo = tmp;
                }
                else
                {
                    /* do something - realloc failed */
                }
            }
        }
    }
    return 0;
}
sprintf(foo.str, "%s%c", foo.str, (char)i); is ill-formed: the first argument cannot be const char *. You should see a compiler error message.
After fixing this (make str be char *), then the behaviour is undefined because the source memory read by the %s overlaps with the destination.
Instead you would need to use some other method to append the character that doesn't involve overlapping read and writes (e.g. use the [ ] operator to write the character and don't forget about null termination).
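One way to do that append without any overlapping read and write is sketched below. appendChar and its parameters are hypothetical names; the sketch assumes the string lives on the heap (or is NULL when empty) and that the caller tracks its length.

```c
#include <stdlib.h>

/* Hypothetical helper: append ch to the heap-allocated string *str whose
   current length (without terminator) is *len. *str may be NULL when
   *len is 0. Returns 0 on success, -1 if realloc fails; on failure the
   old buffer in *str is left untouched and still valid. */
static int appendChar(char **str, size_t *len, char ch)
{
    /* old characters + the new character + the terminating '\0' */
    char *tmp = realloc(*str, *len + 2);
    if (tmp == NULL)
        return -1;
    tmp[*len] = ch;          /* write with the [] operator, no sprintf */
    tmp[*len + 1] = '\0';    /* keep the string null-terminated */
    *str = tmp;
    *len += 1;
    return 0;
}
```

Assigning the realloc result to a temporary first avoids losing the original pointer when the allocation fails.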
I made a program in C that can find two similar or different strings and extract the string between them. This type of program has so many uses, and generally when you use such a program, you have a lot of info, so it needs to be fast. I would like tips on how to make this program as fast and efficient as possible.
I am looking for suggestions that won't make me resort to heavy libraries (such as regex).
The code must:
be able to extract a string between two similar or different strings
find the 1st occurrence of string1
find the 1st occurrence of string2 which occurs AFTER string1
extract the string between string1 and string2
be able to use string arguments of any size
be foolproof to human error and return NULL if such occurs (example, string1 exceeds entire text string length. don't crash in an element error, but gracefully return NULL)
focus on speed and efficiency
Below is my code. I am quite new to C, coming from C++, so I could probably use a few suggestions, especially regarding efficient/proper use of the 'malloc' command:
fast_strbetween.c:
/*
Compile with:
gcc -Wall -O3 fast_strbetween.c -o fast_strbetween
*/
#include <stdio.h> // printf
#include <stdlib.h> // malloc
// inline function if it pleases the compiler gods
inline size_t fast_strlen(char *str)
{
int i; // Cannot return 'i' if inside for loop
for(i = 0; str[i] != '\0'; ++i);
return i;
}
char *fast_strbetween(char *str, char *str1, char *str2)
{
// size_t segfaults when incorrect length strings are entered (due to going below 0), so use int instead for increased robustness
int str0len = fast_strlen(str);
int str1len = fast_strlen(str1);
int str1pos = 0;
int charsfound = 0;
// Find str1
do {
charsfound = 0;
while (str1[charsfound] == str[str1pos + charsfound])
++charsfound;
} while (++str1pos < str0len - str1len && charsfound < str1len);
// '++str1pos' increments past by 1: needs to be set back by one
--str1pos;
// Whole string not found or logical impossibilty
if (charsfound < str1len)
return NULL;
/* Start searching 2 characters after last character found in str1. This will ensure that there will be space, and logical possibility, for the extracted text to exist or not, and allow immediate bail if the latter case; str1 cannot possibly have anything between it if str2 is right next to it!
Example:
str = 'aa'
str1 = 'a'
str2 = 'a'
returned = '' (should be NULL)
Without such preventative, str1 and str2 would would be found and '' would be returned, not NULL. This also saves 1 do/while loop, one check pertaining to returning null, and two additional calculations:
Example, if you didn't add +1 str2pos, you would need to change the code to:
if (charsfound < str2len || str2pos - str1pos - str1len < 1)
return NULL;
It also allows for text to be found between three similar strings—what??? I can feel my brain going fuzzy!
Let this example explain:
str = 'aaa'
str1 = 'a'
str2 = 'a'
result = '' (should be 'a')
Without the aforementioned preventative, the returned string is '', not 'a'; the program takes the first 'a' for str1 and the second 'a' for str2, and tries to return what is between them (nothing).
*/
int str2pos = str1pos + str1len + 1; // the '1' added to str2pos
int str2len = fast_strlen(str2);
// Find str2
do {
charsfound = 0;
while (str2[charsfound] == str[str2pos + charsfound])
++charsfound;
} while (++str2pos < str0len - str2len + 1 && charsfound < str2len);
// Deincrement due to '++str2pos' over-increment
--str2pos;
if (charsfound < str2len)
return NULL;
// Only allocate what is needed
char *strbetween = (char *)malloc(sizeof(char) * str2pos - str1pos - str1len);
unsigned int tmp = 0;
for (unsigned int i = str1pos + str1len; i < str2pos; i++)
strbetween[tmp++] = str[i];
return strbetween;
}
int main() {
char str[30] = { "abaabbbaaaabbabbbaaabbb" };
char str1[10] = { "aaa" };
char str2[10] = { "bbb" };
//Result should be: 'abba'
printf("The string between is: \'%s\'\n", fast_strbetween(str, str1, str2));
// free malloc as we go
for (int i = 10000000; --i;)
free(fast_strbetween(str, str1, str2));
return 0;
}
In order to have some way of measuring progress, I have already timed the code above (extracting a small string 10000000 times):
$ time fast_strbetween
The string between is: 'abba'
0m11.09s real 0m11.09s user 0m00.00s system
Process used 99.3 - 100% CPU according to 'top' command (Linux).
Memory used while running: 3.7Mb
Executable size: 8336 bytes
Ran on a Raspberry Pi 3B+ (4 x 1.4Ghz, Arm 6)
If anyone would like to offer code, tips, pointers... I would appreciate it. I will also implement the changes and give a timed result for your troubles.
Oh, and one thing that I learned is to always de-allocate malloc; I ran the code above (with extra loops), just before posting this. My computer's ram filled up, and the computer froze. Luckily, Stack made a backup draft! Lesson learned!
* EDIT *
Here is the revised code using chqrlie's advice as best I could. Added extra checks for end of string, which ended up costing about a second of time with the tested phrase but can now bail very fast if the first string is not found. Using null or illogical strings should not result in error, hopefully. Lots of notes in the code, where they can be better understood. If I've left anything out or done something incorrectly, please let me know, guys; it is not intentional.
fast_strbetween2.c:
/*
Compile with:
gcc -Wall -O3 fast_strbetween2.c -o fast_strbetween2
Corrections and additions courtesy of:
https://stackoverflow.com/questions/55308295/extracting-a-string-between-two-similar-or-different-strings-in-c-as-fast-as-p
*/
#include<stdio.h> // printf
#include<stdlib.h> // malloc, free
// Strings now set to 'const'
char * fast_strbetween(const char *str, const char *str1, const char *str2)
{
// string size will now be calculated by the characters picked up
size_t str1pos = 0;
size_t str1chars;
// Find str1
do{
str1chars = 0;
// Will the do/while str1 check for '\0' suffice?
// I haven't seen any issues yet, but not sure.
while(str1[str1chars] == str[str1pos + str1chars] && str1[str1chars] != '\0')
{
//printf("Found str1 char: %i num: %i pos: %i\n", str1[str1chars], str1chars + 1, str1pos);
++str1chars;
}
// Incrementing whilst not in conditional expression tested faster
++str1pos;
/* There are two checks for "str1[str1chars] != '\0'". Trying to find
another efficient way to do it in one. */
}while(str[str1pos] != '\0' && str1[str1chars] != '\0');
--str1pos;
//For testing:
//printf("str1pos: %i str1chars: %i\n", str1pos, str1chars);
// exit if no chars were found or if didn't reach end of str1
if(!str1chars || str1[str1chars] != '\0')
{
//printf("Bailing from str1 result\n");
return '\0';
}
/* Got rid of the '+1' code which didn't allow for '' returns.
I agree with your logic of <tag></tag> returning ''. */
size_t str2pos = str1pos + str1chars;
size_t str2chars;
//printf("Starting pos for str2: %i\n", str1pos + str1chars);
// Find str2
do{
str2chars = 0;
while(str2[str2chars] == str[str2pos + str2chars] && str2[str2chars] != '\0')
{
//printf("Found str2 char: %i num: %i pos: %i \n", str2[str2chars], str2chars + 1, str2pos);
++str2chars;
}
++str2pos;
}while(str[str2pos] != '\0' && str2[str2chars] != '\0');
--str2pos;
//For testing:
//printf("str2pos: %i str2chars: %i\n", str2pos, str2chars);
if(!str2chars || str2[str2chars] != '\0')
{
//printf("Bailing from str2 result!\n");
return '\0';
}
/* Trying to allocate strbetween with malloc. Is this correct? */
char * strbetween = malloc(2);
// Check if malloc succeeded:
if (strbetween == '\0') return '\0';
size_t tmp = 0;
// Grab and store the string between!
for(size_t i = str1pos + str1chars; i < str2pos; ++i)
{
strbetween[tmp] = str[i];
++tmp;
}
return strbetween;
}
int main() {
char str[30] = { "abaabbbaaaabbabbbaaabbb" };
char str1[10] = { "aaa" };
char str2[10] = { "bbb" };
printf("Searching \'%s\' for \'%s\' and \'%s\'\n", str, str1, str2);
printf(" 0123456789\n\n"); // Easily see the elements
printf("The word between is: \'%s\'\n", fast_strbetween(str, str1, str2));
for(int i = 10000000; --i;)
free(fast_strbetween(str, str1, str2));
return 0;
}
** Results **
$ time fast_strbetween2
Searching 'abaabbbaaaabbabbbaaabbb' for 'aaa' and 'bbb'
0123456789
The word between is: 'abba'
0m10.93s real 0m10.93s user 0m00.00s system
Process used 99.0 - 100% CPU according to 'top' command (Linux).
Memory used while running: 1.8Mb
Executable size: 8336 bytes
Ran on a Raspberry Pi 3B+ (4 x 1.4Ghz, Arm 6)
chqrlie's answer
I understand that this is just some example code that shows proper programming practices. Nonetheless, it can make for a decent control in testing.
Please note that I do not know how to deallocate malloc in your code, so it is NOT a fair test. As a result, ram usage builds up, taking 130Mb+ for the process alone. I was still able to run the test for the full 10000000 loops. I will say that I tried deallocating this code the way I did my code (via bringing the function 'simple_strbetween' down into main and deallocating with 'free(strndup(p, q - p));'), and the results weren't much different from not deallocating.
** simple_strbetween.c **
/*
Compile with:
gcc -Wall -O3 simple_strbetween.c -o simple_strbetween
Courtesy of:
https://stackoverflow.com/questions/55308295/extracting-a-string-between-two-similar-or-different-strings-in-c-as-fast-as-p
*/
#include<string.h>
#include<stdio.h>
char *simple_strbetween(const char *str, const char *str1, const char *str2) {
const char *q;
const char *p = strstr(str, str1);
if (p) {
p += strlen(str1);
q = *str2 ? strstr(p, str2) : p + strlen(p);
if (q)
return strndup(p, q - p);
}
return NULL;
}
int main() {
char str[30] = { "abaabbbaaaabbabbbaaabbb" };
char str1[10] = { "aaa" };
char str2[10] = { "bbb" };
printf("Searching \'%s\' for \'%s\' and \'%s\'\n", str, str1, str2);
printf(" 0123456789\n\n"); // Easily see the elements
printf("The word between is: \'%s\'\n", simple_strbetween(str, str1, str2));
for(int i = 10000000; --i;)
simple_strbetween(str, str1, str2);
return 0;
}
$ time simple_strbetween
Searching 'abaabbbaaaabbabbbaaabbb' for 'aaa' and 'bbb'
0123456789
The word between is: 'abba'
0m19.68s real 0m19.34s user 0m00.32s system
Process used 100% CPU according to 'top' command (Linux).
Memory used while running: 130Mb (leak due to my lack of knowledge)
Executable size: 8380 bytes
Ran on a Raspberry Pi 3B+ (4 x 1.4Ghz, Arm 6)
Results for above code ran with this alternate strndup:
char *alt_strndup(const char *s, size_t n)
{
size_t i;
char *p;
for (i = 0; i < n && s[i] != '\0'; i++)
continue;
p = malloc(i + 1);
if (p != NULL) {
memcpy(p, s, i);
p[i] = '\0';
}
return p;
}
$ time simple_strbetween
Searching 'abaabbbaaaabbabbbaaabbb' for 'aaa' and 'bbb'
0123456789
The word between is: 'abba'
0m20.99s real 0m20.54s user 0m00.44s system
I kindly ask that nobody make judgements on the results until the code is properly run. I will revise the results as soon as it is figured out.
* Edit *
Was able to decrease the time by over 25% (11.93s vs 8.7s). This was done by using pointers to increment the positions, as opposed to size_t. Collecting the return string whilst checking the last string was likely what caused the biggest change. I feel there is still lots of room for improvement. A big loss comes from having to free malloc. If there is a better way, I'd like to know.
fast_strbetween3.c:
/*
gcc -Wall -O3 fast_strbetween.c -o fast_strbetween
*/
#include<stdio.h> // printf
#include<stdlib.h> // malloc, free
char * fast_strbetween(const char *str, const char *str1, const char *str2)
{
const char *sbegin = &str1[0]; // String beginning
const char *spos;
// Find str1
do{
spos = str;
str1 = sbegin;
while(*spos == *str1 && *str1)
{
++spos;
++str1;
}
++str;
}while(*str1 && *spos);
// Nothing found if spos hasn't advanced
if (spos == str)
return NULL;
char *strbetween = malloc(1);
if (!strbetween)
return '\0';
str = spos;
int i = 0;
//char *p = &strbetween[0]; // Alt. for advancing strbetween (slower)
sbegin = &str2[0]; // Recycle sbegin
// Find str2
do{
str2 = sbegin;
spos = str;
while(*spos == *str2 && *str2)
{
++str2;
++spos;
}
//*p = *str;
//++p;
strbetween[i] = *str;
++str;
++i;
}while(*str2 && *spos);
if (spos == str)
return NULL;
//*--p = '\0';
strbetween[i - 1] = '\0';
return strbetween;
}
int main() {
char s[100] = "abaabbbaaaabbabbbaaabbb";
char s1[100] = "aaa";
char s2[100] = "bbb";
printf("\nString: \'%s\'\n", fast_strbetween(s, s1, s2));
for(int i = 10000000; --i; )
free(fast_strbetween(s, s1, s2));
return 0;
}
String: 'abba'
0m08.70s real 0m08.67s user 0m00.01s system
Process used 99.0 - 100% CPU according to 'top' command (Linux).
Memory used while running: 1.8Mb
Executable size: 8336 bytes
Ran on a Raspberry Pi 3B+ (4 x 1.4Ghz, Arm 6)
* Edit *
This doesn't really count as it does not 'return' a value, and therefore is against my own rules, but it does pass a variable through, which is changed and brought back to main. It runs with 1 library and takes 3.6s. Getting rid of malloc was the key.
/*
gcc -Wall -O3 fast_strbetween.c -o fast_strbetween
*/
#include<stdio.h> // printf
unsigned int fast_strbetween(const char *str, const char *str1, const char *str2, char *strbetween)
{
const char *sbegin = &str1[0]; // String beginning
const char *spos;
// Find str1
do{
spos = str;
str1 = sbegin;
while(*spos == *str1 && *str1)
{
++spos;
++str1;
}
++str;
}while(*str1 && *spos);
// Nothing found if spos hasn't advanced
if (spos == str)
{
strbetween[0] = '\0';
return 0;
}
str = spos;
sbegin = &str2[0]; // Recycle sbegin
// Find str2
do{
str2 = sbegin;
spos = str;
while(*spos == *str2 && *str2)
{
++str2;
++spos;
}
*strbetween = *str;
++strbetween;
++str;
}while(*str2 && *spos);
if (spos == str)
{
strbetween[0] = '\0';
return 0;
}
*--strbetween = '\0';
return 1; // Successful (found text)
}
int main() {
char s[100] = "abaabbbaaaabbabbbaaabbb";
char s1[100] = "aaa";
char s2[100] = "bbb";
char sret[100];
fast_strbetween(s, s1, s2, sret);
printf("String: %s\n", sret);
for(int i = 10000000; --i; )
fast_strbetween(s, s1, s2, sret);
return 0;
}
Your code has multiple problems and is probably not as efficient as it should be:
you use types int and unsigned int for indexes into the strings. These types may be smaller than the range of size_t. You should revise your code to use size_t and avoid mixing signed and unsigned types in comparisons.
your functions' string arguments should be declared as const char * as you do not modify the strings and should be able to pass const strings without a warning.
redefining strlen is a bad idea: your version will be slower than the system's optimized, assembly coded and very likely inlined version.
computing the length of str is unnecessary and potentially costly: both str1 and str2 may appear close to the beginning of str, scanning for the end of str will be wasteful.
the while loop inside the first do / while loop is incorrect: while(str1[charsfound] == str[str1pos + charsfound]) charsfound++; may access characters beyond the end of str and str1 as the loop does not stop at the null terminator. If str1 only appears at the end of str, you have undefined behavior.
if str1 is an empty string, you will find it at the end of str instead of at the beginning.
why do you initialize str2pos as int str2pos = str1pos + str1len + 1;? If str2 immediately follows str1 inside str, an empty string should be allocated and returned. Your comment regarding this case is unreadable, you should break such long lines to fit within a typical screen width such as 80 columns. It is debatable whether strbetween("aa", "a", "a") should return "" or NULL. IMHO it should return an allocated empty string, which would be consistent with the expected behavior on strbetween("<name></name>", "<name>", "</name>") or strbetween("''", "'", "'"). Your specification preventing strbetween from returning an empty string produces a counter-intuitive border case.
the second scanning loop has the same problems as the first.
the line char *strbetween = (char *) malloc(sizeof(char) * str2pos - str1pos - str1len); has multiple problems: no cast is necessary in C, if you insist on specifying the element size sizeof(char), which is 1 by definition, you should parenthesize the number of elements, and last but not least, you must allocate one extra element for the null terminator.
You do not test if malloc() succeeded. If it returns NULL, you will have undefined behavior, whereas you should just return NULL.
the copying loop uses a mix of signed and unsigned types, causing potentially counterintuitive behavior on overflow.
you forget to set the null terminator, which is consistent with the allocation size error, but incorrect.
Before you try and optimize code, you must ensure correctness! Your code is too complicated and has multiple flaws. Optimisation is a moot point.
You should first try a very simple implementation using standard C string functions: searching a string inside another one is performed efficiently by strstr.
Here is a simple implementation using strstr and strndup(), which should be available on your system:
#include <string.h>
char *simple_strbetween(const char *str, const char *str1, const char *str2) {
const char *q;
const char *p = strstr(str, str1);
if (p) {
p += strlen(str1);
q = *str2 ? strstr(p, str2) : p + strlen(p);
if (q)
return strndup(p, q - p);
}
return NULL;
}
strndup() is defined in POSIX and is part of the Extensions to the C Library Part II: Dynamic Allocation Functions, ISO/IEC TR 24731-2:2010. If it is not available on your system, it can be redefined as:
#include <stdlib.h>
#include <string.h>
char *strndup(const char *s, size_t n) {
size_t i;
char *p;
for (i = 0; i < n && s[i] != '\0'; i++)
continue;
p = malloc(i + 1);
if (p != NULL) {
memcpy(p, s, i);
p[i] = '\0';
}
return p;
}
To ensure correctness, write a number of test cases, with border cases such as all combinations of empty strings and identical strings.
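A handful of such border cases could be checked along these lines. This is a sketch only: strbetween_sketch follows the same strstr approach as simple_strbetween but copies with malloc and memcpy so that it does not depend on strndup being available, and check is a hypothetical helper name.

```c
#include <stdlib.h>
#include <string.h>

/* Same method as simple_strbetween, without relying on strndup(). */
static char *strbetween_sketch(const char *str, const char *str1, const char *str2)
{
    const char *p = strstr(str, str1);
    const char *q;
    char *res;
    size_t n;

    if (p == NULL)
        return NULL;
    p += strlen(str1);
    q = *str2 ? strstr(p, str2) : p + strlen(p);
    if (q == NULL)
        return NULL;
    n = (size_t)(q - p);
    res = malloc(n + 1);            /* +1 for the null terminator */
    if (res != NULL) {
        memcpy(res, p, n);
        res[n] = '\0';
    }
    return res;
}

/* Hypothetical test helper: returns 1 if the result matches expected,
   where expected == NULL means "no match should be found". */
static int check(const char *str, const char *s1, const char *s2, const char *expected)
{
    char *got = strbetween_sketch(str, s1, s2);
    int ok;

    if (got == NULL || expected == NULL)
        ok = (got == NULL && expected == NULL);
    else
        ok = (strcmp(got, expected) == 0);
    free(got);
    return ok;
}
```

Cases worth covering include check("aa", "a", "a", ""), check("<name></name>", "<name>", "</name>", ""), a missing first or second delimiter (expected NULL), and empty delimiters.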
Once you have thoroughly tested your strbetween function, you can write a benchmarking framework to test performance. Getting reliable performance figures is not so easy, as you will find if you try. Remember to configure your compiler to select the appropriate optimisations, -O3 for example.
Only then can you move to the next step: if you are really restricted from using standard C library functions, you may first recode your versions of strstr and strlen and still use the same method. Test this new version both for correctness and for performance.
The redundant parts are the computation of strlen(str1) which must have been determined by strstr when it finds a match. And the scan in strndup() which is unnecessary since no null byte is present between p and q. If you have time to waste, you can try and remove these redundancies at the expense of readability, risking non conformity. I would be surprised if you get any improvement at all on average over a wide variety of test cases. 20% would be remarkable.
The following code which I whipped up in 7 minutes takes a short string and converts all letters to lower case:
void tolower(char *out,const char *in){
int l=strlen(in);int cc;int i;
for (i=0;i<l;i++){
cc=(int)in[i]-0;
if (cc >=65 && cc <=90){cc+=0x20;}
out[i]=(char)cc;
}
}
int main(int argc, char *argv[]){
const char *w="aBcDe";
char w2[6]=" ";
tolower(w2,w);
printf("x=%s %s\n",w,w2);
return EXIT_SUCCESS;
}
The problem with it is that I will be dealing with large sets of data (approximately 10KB worth of data per second) and I want to be able to make a function that works the fastest possible.
I have seen code out there that can deal with machine registers and when I used code like that in the past with Quick Basic, things worked faster.
So I'm curious as to how I could use machine registers (like eax) compatible with both 32 and 64 bit processors in my C program.
If I could take at least 4 bytes of the string at a time and then act on all 4 bytes simultaneously, then that would be best.
In Quick Basic, I could achieve what I need with the help of its mkd$() and cvd() functions.
Anyone know how I can make the function I posted work faster? and please don't say upgrade the computer processor.
Two approaches; which one is faster depends on profiling on your system.
#include <ctype.h>
#include <limits.h>
// tolower()
void Mike_tolower1(char *out, const char *in) {
    // copies every character, including the terminating '\0'
    while ((*out++ = tolower((unsigned char) (*in++) )) != 0);
}
// table lookup
void Mike_tolower2(char *out, const char *in) {
    // fill in the table: one entry per unsigned char value
    static const char lwr[UCHAR_MAX+1] = { '\0', '\1', '\2', ...
    'a', 'b' ...
    'a', 'b' ...
    };
    while (*in) {
        *out++ = lwr[(unsigned char) (*in++)];
    }
    *out = '\0'; // the loop stops before copying the terminator
}
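Since the table literal above is elided, one alternative is to fill the table once at run time. The following is a sketch of that variant, not the answerer's code: lwr_table, init_lwr_table and table_tolower are hypothetical names, and the table contents come from the standard tolower() in the current locale.

```c
#include <ctype.h>
#include <limits.h>

/* One entry per possible unsigned char value. */
static unsigned char lwr_table[UCHAR_MAX + 1];
static int lwr_ready = 0;

/* Fill the table once, using the standard tolower() for each value. */
static void init_lwr_table(void)
{
    int c;
    for (c = 0; c <= UCHAR_MAX; c++)
        lwr_table[c] = (unsigned char)tolower(c);
    lwr_ready = 1;
}

/* Lowercase in into out via table lookup; out must have room for
   strlen(in) + 1 bytes. */
static void table_tolower(char *out, const char *in)
{
    if (!lwr_ready)
        init_lwr_table();
    while (*in)
        *out++ = (char)lwr_table[(unsigned char)*in++];
    *out = '\0';   /* terminate the result */
}
```

The lazy initialization check is not thread-safe; in threaded code the table would have to be filled before any worker starts.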
The fastest way is not to use strlen(): it does exactly what the following code does, counting the characters before the '\0' appears, so you end up iterating through the string twice. Do it this way:
#include <ctype.h>
void string_tolower(char *string)
{
    while (*string != '\0')
    {
        // cast to unsigned char: tolower() is undefined for negative values other than EOF
        *string = (char)tolower((unsigned char)*string);
        string++;
    }
}
And don't call your function tolower: it's a standard function declared in ctype.h that converts a single character to lower case.
Fastest depends on the processor, but a pretty speedy version is:
int c;
char *cp;
for (cp = out; 0 != (c=*cp); ++cp) {
if ((c >= 'A') && (c <= 'Z'))
*cp = (char)(c + 'a' - 'A');
}
That includes Jester's suggestion of avoiding strlen.
If you have some RAM to burn, you can compute a lookup table. So for example:
pseudocode for creating a tolower lookup table
lookup["AA"] = "aa"
lookup["bb"] = "bb"
This way you can lowercase 2 bytes at a time and with no if statements needed.
If you really want to go nuts, you could write a GPGPU implementation that would scream.
See chux's answer for an example of implementing a one-char-at-a-time lookup table.
I'm gonna use this code I just constructed now based from various sources including here: https://code.google.com/p/stringencoders/wiki/PerformanceAscii.
void tolower1(char *out,const char *in,int lg){
uint32_t x;
const uint32_t* s = (const uint32_t*) in;
uint32_t* d = (uint32_t*) out;
int l=(lg/sizeof(uint32_t));
int i;
for(i=0;i<l;++i){
x=s[i];
x=x-(((x+(0x05050505+0x1a1a1a1a)) >> 2) & 0x20202020);
d[i]=x;
}
}
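For comparison, here is a sketch of a four-bytes-at-a-time lowercase conversion that can be verified by hand. It is not the code from the linked page: lower4 and swar_tolower are hypothetical names, and the constants are derived directly from the ASCII range 'A'..'Z'. memcpy is used for the 32-bit loads and stores to avoid alignment and aliasing problems.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Lowercase four ASCII bytes packed in x; bytes >= 0x80 are left alone. */
static uint32_t lower4(uint32_t x)
{
    uint32_t heptets = x & 0x7f7f7f7fu;         /* low 7 bits of each byte */
    uint32_t ge_A = heptets + 0x3f3f3f3fu;      /* bit 7 set when byte >= 'A' */
    uint32_t gt_Z = heptets + 0x25252525u;      /* bit 7 set when byte >  'Z' */
    uint32_t ascii = ~x & 0x80808080u;          /* bit 7 set when byte <  0x80 */
    uint32_t upper = (ge_A ^ gt_Z) & ascii;     /* bit 7 set for 'A'..'Z' only */
    return x | (upper >> 2);                    /* 0x80 >> 2 == 0x20 */
}

/* Lowercase len bytes from in into out and terminate out;
   out must have room for len + 1 bytes. */
static void swar_tolower(char *out, const char *in, size_t len)
{
    size_t i = 0;

    for (; i + 4 <= len; i += 4) {
        uint32_t x;
        memcpy(&x, in + i, 4);
        x = lower4(x);
        memcpy(out + i, &x, 4);
    }
    for (; i < len; i++) {                      /* up to 3 leftover bytes */
        unsigned char c = (unsigned char)in[i];
        out[i] = (char)((c >= 'A' && c <= 'Z') ? c + 32 : c);
    }
    out[len] = '\0';
}
```

Because each 7-bit value plus the added constant stays below 0x100, no carry crosses a byte boundary, so the four bytes are processed independently. Whether this beats a plain loop depends on the compiler and CPU, so it is worth measuring.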
I suggest something more like:
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    const char *w="aBcDe";
    // allow room for nul termination byte
    char w2[6] = {'\0'};
    // this 'for' statement may need tweaking if
    // w[] contains '\0' byte except at the end
    for( int i=0; w[i]; i++)
    {
        w2[i] = (char)tolower((unsigned char)w[i]);
    }
    printf("x=%s %s\n",w,w2);
    return EXIT_SUCCESS;
} // end function: main
or, for a callable function,
that also allows for nul bytes within the string
// the caller must free() the returned array
char *myToLower( int byteCount, char *originalArray )
{
    char *lowerArray = NULL;
    if( NULL == (lowerArray = malloc((size_t)byteCount) ) )
    { // then, malloc failed
        perror( "malloc failed" );
        exit( EXIT_FAILURE );
    }
    // implied else, malloc successful
    for( int i=0; i< byteCount; i++ )
    {
        lowerArray[i] = (char)tolower((unsigned char)originalArray[i]);
    }
    return( lowerArray );
} // end function: myToLower
Beginner here,
somewhat confused about an exercise:
Tutorial: the last exercise on the page (it is in German). I am supposed to read HTML lines and print the attributes and their values. The declaration of the function to be used is given.
Two things irritate me:
1. The line is stored in a const char string, but I would like the user to type in the desired HTML line. It does not seem possible to change a const variable at runtime. How can this be achieved without changing the given declaration?
2. The tutorial wants me to return the position of the strtok search as an integer, but I read online that this value is stored within strtok. Is there a way to cast that, or get it somehow?
To solve the exercise I wrote this code, but the program crashes at runtime with a "Segmentation fault (core dumped)" message and I don't know why. Could someone please explain that to me? (I probably need malloc, but for which variable?)
//cHowTo Uebung Teil 2 Nr. 4
//HTMLine.c
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
//char getHTMLline ();
int getHtmlAttributes(const char *string, int start, char *attrNamem,char *attrValue); //given by Tutorial
int main(int argc, char *argv) //because i want user-input later on, if possible
{
const char strg[]= {"<img src=\"kurt.jpg\" width=\"250\" alt=\"Kurt Kanns\" />"}; //given example line by tutorial
char attriN[255]={0}, attriV[255]={0};
int pos=99;
//printf("Please type the tag for analysis.\n");
//fgets(strg, 255, stdin);
printf("attribute\tvalue\n\n");
do
{
pos = getHtmlAttributes(strg, pos, attriN, attriV); //pos should be strtok-search-position
printf("%s\t\t%s\n", attriN, attriV);
}
while(pos!=1);
return EXIT_SUCCESS;
}
int getHtmlAttributes(const char *string, int start, char *attrNamem, char *attrValue)
{
int i, len;
char *ptr;
len = strlen(string);
char stringT[len]; //variable used to be split by strtok
for(i=0; i<len; i++)
stringT[i]=string[i];//copy string to stringT
if(start==99)
ptr = strtok(stringT, "<="); //get first attribute as whole
else
ptr = strtok(NULL, "= "); // get following attributes
for(i=0; i<len; i++)
attrNamem[i] = ptr[i];
ptr = strtok(NULL, "\""); //get values
for(i=0; i<len; i++)
attrValue[i] = ptr[i];
if(ptr == NULL) //if search complete
{
return 1;
}
else // if search continues
{
return 0;
}
}
//char getHTMLline ()
//{
// char user_input;
// scanf("%s", &user_input);
// return user_input;
//}
What strtok() does is that if you call it with a string different to NULL it stores a pointer to that string internally and returns the first token. A subsequent call with NULL then uses that internally stored pointer to determine the next token.
Now what happens in your code is:
When you call getHtmlAttributes() for the first time, you create a copy of the given string in stringT and pass that copy to strtok(). On the next call you use strtok( NULL, ... ). And there we have two bugs:
your loop that copies string to stringT is not correct: it does not copy the terminating '\0'. Just use strcpy() in such cases
the important one: when you call getHtmlAttributes() for the second time, you call strtok( NULL, ... ), but the lifetime of the stringT that strtok() was originally given ended when the first call of getHtmlAttributes() returned, because stringT is a local variable that is created anew on the stack every time the function is called. You could solve that problem by either
declaring static char stringT[N], where N must be a constant (like 255); you can't use len (which should have been len+1 anyway) in that case
creating a dynamically allocated copy of string with char *stringT = strdup( string );. Please do that only if you call strtok( stringT, ... ) afterwards, and be aware that without additional code you have a memory leak because you are not able to free that memory again.
what I would prefer: use string directly instead of stringT. In that case you should not declare string as const, and you should create a copy of strg in main() that you pass to the function
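As a rough sketch of the static-buffer option: the function below is a simplified stand-in, not your getHtmlAttributes(). The names next_word and MAX_LINE, and the meaning of the start flag, are assumptions made for this example.

```c
#include <string.h>

#define MAX_LINE 255   /* assumed upper bound on the input length */

/* Hypothetical stand-in: a non-zero start means "begin with a new string",
   zero means "continue with the previous one".
   Because stringT is static, it still exists on later calls, so
   strtok(NULL, ...) keeps pointing at valid memory. */
const char *next_word(const char *string, int start)
{
    static char stringT[MAX_LINE + 1];

    if (start) {
        /* Copy including the terminator; truncate overly long input. */
        strncpy(stringT, string, MAX_LINE);
        stringT[MAX_LINE] = '\0';
        return strtok(stringT, " ");
    }
    return strtok(NULL, " ");
}
```

The price of the static buffer is that the function is not reentrant and silently truncates input longer than MAX_LINE.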
Edit
I think the request "give back the position of strtok-search as an integer" means, you should return the offset of the found token in the complete string. That's quite easy to achieve, if you use the suggested solution static char stringT[N] from above:
As strtok() works on the string passed by the first call, after receiving a ptr different from NULL, you can calculate and return int offset = ptr - stringT;
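A small sketch of that pointer subtraction, assuming a helper called token_offset (a made-up name) that walks to the n-th token and reports where it starts in the buffer:

```c
#include <string.h>

/* Hypothetical helper: returns the offset of the n-th token (0-based)
   from the start of buf, or -1 if there are fewer tokens.
   buf is modified by strtok(). */
long token_offset(char *buf, const char *delims, int n)
{
    char *ptr = strtok(buf, delims);
    while (ptr != NULL && n-- > 0)
        ptr = strtok(NULL, delims);
    /* Tokens point into buf itself, so the difference is the offset. */
    return ptr ? (long)(ptr - buf) : -1;
}
```

This only works because the tokens returned by strtok() are pointers into the buffer that was passed on the first call, not copies.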
Edit 2
And now something completely different
I've just read the tutorial you linked to (greetings from Hessen :-) ) and I think the idea is to start a new strtok() loop every time the function is called. That could look like this:
int getHtmlAttributes(const char *string, int start, char *attrName, char *attrValue)
{
char *buf;
char *attrptr;
char *ptr;
// copy the substring starting at start to buf
// you should add error checking (buf may become NULL)
buf = strdup( string+start );
// first step: tokenize on whitespace; we stop at the first token that contains a "="
for( attrptr = strtok( buf, " \t" ); attrptr && (ptr = strchr( attrptr, '=' )) == NULL; attrptr = strtok( NULL, " \t" ) ) ;
if( attrptr ) {
// copy everything before "=" to attrName (the precision argument of %.*s must be an int)
sprintf( attrName, "%.*s", (int)(ptr-attrptr), attrptr );
// copy everything after "=" to attrValue
strcpy( attrValue, ptr+1 );
// the next start is the current start + the offset of the attr found
// + the length of the complete attr
start += (attrptr - buf) + strlen( attrptr );
free( buf );
return start;
} else {
// no more attribute
free( buf );
return -1;
}
}
I'm making a raytracing engine in C using the minilibX library.
I want to read the configuration of the scene to display from a .conf file:
For example:
(Az#Az 117)cat universe.conf
#randomcomment
obj:eye:x:y:z
light:sun:100
light:moon:test
The number of objects can be anything from 1 upwards, with no fixed limit.
For now, I'm reading the file, copying each line one by one into a char **tab, and allocating according to the number of objects found, like this:
void open_file(int fd, struct s_img *m)
{
int i;
char *s;
int curs_obj;
int curs_light;
i = 0;
curs_light = 0;
curs_obj = 0;
while (s = get_next_line(fd))
{
i = i + 1;
if (s[0] == 'l')
{
m->lights[curs_light] = s;
curs_light = curs_light + 1;
}
else if (s[0] == 'o')
{
m->objs[curs_obj] = s;
curs_obj = curs_obj + 1;
}
else if (s[0] != '#')
{
show_error(i, s);
stop_parsing(m);
}
}
Now I want to store the information from each tab[i] in a new char **tab, one per object, using ':' as the separator.
So I need to initialize and malloc an undetermined number of char **tab. How can I do that?
(PS: I hope my code and my English are good enough for you to understand. I'm only using very basic functions, like read, write, open, malloc..., and I'm rebuilding everything else, like printf, get_line, and so on.)
You can't allocate an indeterminate amount of memory; malloc doesn't support it. What you can do is to allocate enough memory for now and revise that later:
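size_t buffer = 10; // capacity, counted in elements
char **tab = malloc(buffer * sizeof *tab);
//...
if (indexOfObjectToCreate >= buffer) {
buffer *= 2;
tab = realloc(tab, buffer * sizeof *tab); // check the result for NULL in real code
}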
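A slightly fuller sketch of the same idea, with the error check included. The struct line_tab and the function line_tab_push are names invented for this example; the starting capacity of 10 and the doubling factor are arbitrary choices.

```c
#include <stdlib.h>

/* Hypothetical growable table of line pointers. */
struct line_tab {
    char **items;     /* allocated array, or NULL while empty */
    size_t count;     /* number of slots in use */
    size_t capacity;  /* number of slots allocated */
};

/* Appends line to the table, doubling the capacity when it is full.
   Returns 0 on success, -1 if realloc() fails (table left unchanged). */
int line_tab_push(struct line_tab *t, char *line)
{
    if (t->count == t->capacity) {
        size_t new_cap = t->capacity ? t->capacity * 2 : 10;
        /* Keep the old pointer until we know realloc() succeeded. */
        char **grown = realloc(t->items, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        t->items = grown;
        t->capacity = new_cap;
    }
    t->items[t->count++] = line;
    return 0;
}
```

Start with a zero-initialized struct (struct line_tab t = {0};), push each line as it is read, and free(t.items) once you are done. Assigning the result of realloc() to a temporary first avoids losing the original block when the call fails.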
I'd use an alternative approach (as this is C, not C++) and simply allocate large buffers as we go:
char *my_malloc(size_t n) {
static size_t space_left = 0;
static char *base = NULL;
// BIG_N is the chunk size; requests larger than BIG_N are not handled here
if (base==NULL || space_left < n) base=malloc(space_left=BIG_N);
space_left -= n; // account for the bytes handed out
base +=n; return base-n;
}
Disclaimer: I've omitted the garbage collection stuff and testing return values and all safety measures to keep the routine short.
Another way to approach this is to read the whole file into a large enough malloc'ed array (you can get the file size with ftell), scan the buffer, replace delimiters, line feeds etc. with ASCII zero characters, and remember the starting locations of the keywords.
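Applied to a single line of the .conf file, that idea could look roughly like the sketch below. The function name split_fields and its parameters are made up for illustration; it uses nothing beyond plain pointer arithmetic, so it should fit the "basic functions only" restriction.

```c
#include <stddef.h>

/* Hypothetical helper: overwrites ':' separators in line with '\0' and
   stores the start of each field in fields[]. At most max fields are
   stored; once fields[] is full, the last field keeps the rest of the
   line, colons included. Returns the number of fields stored. */
size_t split_fields(char *line, char **fields, size_t max)
{
    size_t n = 0;
    char *p;

    if (max == 0)
        return 0;
    fields[n++] = line;               /* first field starts at the line start */
    for (p = line; *p != '\0'; p++) {
        if (*p == ':' && n < max) {
            *p = '\0';                /* terminate the previous field */
            fields[n++] = p + 1;      /* next field starts after the ':' */
        }
    }
    return n;
}
```

For the line "obj:eye:x:y:z" this should yield five fields, each a normal C string pointing into the original line, so no extra allocation per field is needed. Unlike strtok(), empty fields between two colons are kept as empty strings.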