Lower cpu usage on searching a big char array - c

I'm searching for a few bytes in a char array. The problem is that on slower machines the process goes up to 90%+ CPU usage. How can I prevent that? My code is:
for(long i = 0; i < size - 5; ) {
    if (buff[++i] == 'f' && buff[++i] == 'i' && buff[++i] == 'l' && buff[++i] == 'e') {
        printf("found at: %ld\n", i);
    }
}
EDIT:
The string "file" is not null-terminated.

This looks like an attempt at a very naive string search. I'd suggest you either use the standard functions provided for this purpose (like strstr) or research string search algorithms like Boyer-Moore.
The linked Wikipedia article on Boyer-Moore shows quite well why moving along one character at a time on a mismatch (like you do) is not necessary - it's an interesting read.
EDIT: also look at this page, it has a nice animated presentation that shows how BM does its job.
EDIT2: regarding the string not being null-terminated: either you terminate it yourself (provided the buffer has room for one more byte),
buff[size] = 0;
and use strstr, or you have a look at the BM code from the page I linked, which works with lengths, i.e. it will work with strings that have no terminating 0.
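For a buffer that is not null-terminated, a length-based search avoids the terminator issue entirely. The following is only a minimal sketch, not Boyer-Moore: it uses the standard memchr and memcmp to jump to candidate positions. The function name find_bytes and its parameters are hypothetical and do not come from the question.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the offset of the first occurrence of
   pat (pat_len bytes) in buf (size bytes) at or after `start`,
   or (size_t)-1 if there is none. No terminating 0 is required. */
size_t find_bytes(const char *buf, size_t size, size_t start,
                  const char *pat, size_t pat_len)
{
    if (pat_len == 0 || pat_len > size)
        return (size_t)-1;

    size_t last = size - pat_len;   /* last offset where a match can start */
    size_t i = start;
    while (i <= last) {
        /* Jump straight to the next byte equal to the first pattern byte. */
        const char *p = memchr(buf + i, pat[0], last - i + 1);
        if (p == NULL)
            break;
        i = (size_t)(p - buf);
        if (memcmp(p, pat, pat_len) == 0)
            return i;
        i++;                        /* mismatch: retry one byte further on */
    }
    return (size_t)-1;
}
```

To list every match, call it in a loop and pass the previous result plus one as `start`.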

There is nothing wrong with getting 90% utilisation, since the algorithm is CPU-bound. But...
Unless you expect the search term to be on a 32-bit word boundary, the code is broken. If the word 'file' begins on the second character of the buffer, you will simply skip over it. (EDIT: Short-circuit eval means the code is correct as it stands. My mistake.)
Don't roll your own code for this; use strstr.

Try just storing a list of values where 'file' is found and print them out after the loop. It will prevent context switches and will enable the CPU to use the cache better. Also put i in a register.


Ghidra Indexing C quad word

There's this Ghidra decompiled C code.
I understand that local_60 is a quad word, but I don't understand indexing it
What does local_60._3_1_ refer to here?
local_60 = 0x6c46575935676a5a;
local_28 = 0x7945474e3563544f;
printf("Enter access code: ");
__isoc99_scanf(&DAT_0010201c,&DAT_001040c0);
if ((((DAT_001040c0 == 'f') && (DAT_001040c1 == 'b')) && (DAT_001040c2 == local_60._3_1_)) &&
((DAT_001040c3 == '6' && (DAT_001040c4 == local_28._2_1_)))) {
You can usually click on a part of the decompiled C code to highlight the part of the original assembly that produced it; this can help you understand what Ghidra means by _3_1_ and _2_1_.
In my experience, the _X_Y_ syntax means that the code "indexes" into an integer like an array: it accesses Y bytes starting at byte offset X, so local_60._3_1_ is the single byte at offset 3 of the 8-byte value. In addition, if you inspect the byte values of the two constants 0x6c46575935676a5a and 0x7945474e3563544f, you may notice that all the bytes are printable ASCII characters. Together, these two things suggest that local_60 and local_28 should be char[8] rather than integers. You should be able to right-click on the variable declarations and change their type manually, which may make the code more readable by turning the syntax into array indexing and array initialization.
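The byte picked out by an offset/size suffix can be checked by hand. This is a small sketch that assumes the binary targets a little-endian machine such as x86-64 (the question does not say), so byte offset 0 is the least significant byte; byte_at is a hypothetical helper name.

```c
#include <stdint.h>

/* Hypothetical helper: the byte at `offset` within `value`, counting from
   the least significant byte (memory order on a little-endian target). */
unsigned char byte_at(uint64_t value, unsigned offset)
{
    return (unsigned char)(value >> (8 * offset));
}
```

Under that assumption, byte_at(0x6c46575935676a5aULL, 3) is 0x35 ('5') and byte_at(0x7945474e3563544fULL, 2) is 0x63 ('c'), which would be the values compared against DAT_001040c2 and DAT_001040c4.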

How to recursively return string value in C?

Trying to learn something here more than solve the specific problem. Please help me towards some best practices to apply in this situation, and if possible some clarification of why. Thanks in advance for your assistance.
Basically I'm brute-force breaking a very simple hash algorithm, within known limits. Function tests possibilities of a string (within a length limit) against the hash, until it matches the hash passed. Then recursion should stop all iterations and return the string that matched. The iteration works, but when the answer is found, it seems that each run of the function doesn't get the value returned by its call of the same function.
Here's the code of the function, with extra comments for clarity:
//'hash' is the hash to be replicated
//'leading' is for recursive iteration (1st call should have leading=="")
//'limit' is the maximum string length to be tested
string crack(string hash, string leading, int limit)
{
    string cracked=NULL, force=NULL, test=NULL;
    //as per definition of C's crypt function - validated
    char salt[3] = {hash[0], hash[1], '\0'};
    // iterate letters of the alphabet - validated
    for(char c='A'; c<='z'; c++)
    {
        // append c at the end of string 'leading' - validated
        test = append(leading,c);
        // apply hash function to tested string - validated
        force = crypt(test,salt);
        // if hash replicated, store answer in 'cracked' - validated
        if(strcmp(hash,force)==0)
        {
            cracked = test;
        }
        else
        {
            // if within length limit, iterate next character - validated
            if(strlen(test)<=limit+1)
            {
                // THIS IS WHERE THE PROBLEM OCCURS
                // value received when solution found
                // is always empty string ("", not NULL)
                // tried replacing this with strcpy, same result
                cracked = crack_des(hash,test,limit);
            }
        }
        // if answer found, break out of loop - validated
        if(cracked){break;}
        // test only alphabetic characters - validated
        if(c=='Z'){c='a' - 1;}
    }
    free(test);
    // return NULL if not cracked to continue iteration on level below
    // this has something to do with the problem
    return cracked;
} // end of function
From the little bit I recall of pointers, I'd guess it is something with passing references instead of values, but I don't have enough knowledge to solve it. I have read this thread, but this suggestion doesn't seem to solve the issue - I tried using strcpy and got the same results.
Disclaimer: this is an exercise in 2018's CS50 by Harvard at EDX. It won't affect my grading (have already submitted two perfect exercises for the week, which is what is required) but as stated above I'm looking to learn.
Edit: edited the tag back to C (as clarified in comments, string is from string.h, and append was coded by me and validated several times over - I'll get to TDD in a while). Thanks all for your comments; problem solved and lessons learned!
I found a bug in the code, but I am not sure whether it is the root cause of your problem.
When this comparison succeeds:
strcmp(hash,force)==0
the pointer 'test' is assigned to 'cracked':
cracked = test;
Then this line is reached:
if(cracked){break;}
so the loop exits, and the next line runs:
free(test);
This frees the string pointed to by 'test', which is the same string 'cracked' points to, so you return a string that has already been freed.
What happens to that string afterwards is undefined behavior and depends on your compiler and libc. You can fix this by giving 'cracked' its own copy:
cracked = strdup(test);
Also, the 'test' strings created on iterations that do not match appear never to be freed, which leaks memory, but that should be irrelevant to your problem.
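The ownership rule behind that fix can be shown in isolation. This is a simplified sketch, not the asker's crack function: find_match and its parameters are hypothetical, and strdup is a POSIX function (hence the feature-test macro).

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of the matching candidate,
   which the caller must free, or NULL if nothing matches. */
char *find_match(const char *target, const char *candidates[], size_t n)
{
    char *found = NULL;
    for (size_t i = 0; i < n && found == NULL; i++) {
        char *test = strdup(candidates[i]);   /* scratch string, like 'test' */
        if (test == NULL)
            return NULL;                      /* out of memory */
        if (strcmp(test, target) == 0)
            found = strdup(test);             /* separate copy for the caller */
        free(test);                           /* safe: 'found' does not alias it */
    }
    return found;
}
```

Because the returned string has its own allocation, freeing the scratch string on every iteration no longer invalidates the result, and nothing leaks on the non-matching iterations.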

Reading input using getchar_unlocked()

I have learnt that using getchar_unlocked is a fast way of reading input. I have seen code that reads with it in many places but was unable to understand it. Can anyone please help me understand how to read using getchar_unlocked?
Thanks in Advance.
void scanint(int &x)
{
    register int c = getchar_unlocked();
    x = 0;
    for(;(c<48 || c>57);c = getchar_unlocked())
        ;
    for(;c>47 && c<58;c = getchar_unlocked())
    {
        x = (x<<1) + (x<<3) + c - 48;
    }
}
I have seen many other versions of this code as well. I don't particularly understand the purpose of shifting the number. Any help regarding that is appreciated.
getchar_unlocked reads one character at a time. The given code is trying to read an integer. The purpose of the first for loop is to skip any characters that are not digits. The second for loop reads characters for as long as they are digits and performs
n = n*10 + (c - '0')
Since c holds the ASCII code of the digit, we subtract 48, i.e. the ASCII code of '0'. To make the code faster, shifts are used instead of multiplication:
n*10 = n*(8+2) = n*8 + n*2 = (n<<3) + (n<<1)
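The same two loops can be written in plain C with a pointer instead of a C++ reference. This is a sketch under a few assumptions: it reads from an arbitrary FILE * through the POSIX function getc_unlocked (getchar_unlocked is the stdin-only form), it stops at EOF instead of looping forever, and it does not handle signs or overflow. The name scan_uint is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Hypothetical helper: reads the next non-negative decimal integer from
   `in` into *x. Returns 1 on success, 0 if EOF comes before any digit. */
int scan_uint(FILE *in, int *x)
{
    int c = getc_unlocked(in);

    /* First loop: skip everything that is not a digit. */
    while (c != EOF && (c < '0' || c > '9'))
        c = getc_unlocked(in);
    if (c == EOF)
        return 0;

    /* Second loop: accumulate digits; (n<<3) + (n<<1) equals n*10. */
    int n = 0;
    while (c >= '0' && c <= '9') {
        n = (n << 3) + (n << 1) + (c - '0');
        c = getc_unlocked(in);
    }
    *x = n;
    return 1;
}
```

Calling scan_uint(stdin, &value) corresponds to the stdin-only version in the question. Note that the character that ends the number is consumed and discarded.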
getchar_unlocked() is like getchar() except that it does not check for multi-thread locks.
So, it is faster, but it is not thread-safe.
I think you might have the wrong idea of the purpose of getchar_unlocked(). Really.
When doing single-character I/O from a human user, it's fantastically hard to believe that you need to focus on being "fast", since the human will be very slow.
The function you included looks like it's reading an integer using getchar_unlocked(), and is written in a pretty horrible style. It certainly doesn't look like part of a solution to anything in particular. It's also totally broken in its handling of the x pointer variable.
In short, your question is not very clear.

Designing Around a Large Number of Discrete Functions in C

Greetings and salutations,
I am looking for information regarding design patterns for working with a large number of functions in C99.
Background:
I am working on a complete G-Code interpreter for my pet project, a desktop CNC mill. Currently, commands are sent over a serial interface to an AVR microcontroller. These commands are then parsed and executed to make the milling head move. A typical example of a line might look like
N01 F5.0 G90 M48 G1 X1 Y2 Z3
where G90, M48, and G1 are "action" codes and F5.0, X1, Y2, Z3 are parameters (N01 is the optional line number and is ignored). Currently the parsing is coming along swimmingly, but now it is time to make the machine actually move.
For each of the G and M codes, a specific action needs to be taken. This ranges from controlled motion to coolant activation/deactivation, to performing canned cycles. To this end, my current design features a function that uses a switch to select the proper function and return a pointer to that function which can then be used to call the individual code's function at the proper time.
Questions:
1) Is there a better way to resolve an arbitrary code to its respective function than a switch statement? Note that this is being implemented on a microcontroller and memory is EXTREMELY tight (2K total). I have considered a lookup table but, unfortunately, the code distribution is sparse leading to a lot of wasted space. There are ~100 distinct codes and sub-codes.
2) How does one go about function pointers in C when the names (and possibly signatures) may change? If the function signatures are different, is this even possible?
3) Assuming the functions have the same signature (which is where I am leaning), is there a way to typedef a generic type of that signature to be passed around and called from?
My apologies for the scattered questioning. Thank you in advance for your assistance.
1) Perfect hashing may be used to map the keywords to token numbers (opcodes), which can be used to index a table of function pointers. The number of required arguments can also be put in this table.
2) You don't want overloaded / heterogeneous functions. Optional arguments might be possible.
3) Your only choice is to use varargs, IMHO.
I'm not an expert on embedded systems, but I have experience with VLSI. So sorry if I'm stating the obvious.
The function-pointer approach is probably the best way. But you'll need to either:
1. Arrange all your action codes to be consecutive in address.
2. Implement an action code decoder similar to an opcode decoder in a normal processor.
The first option is probably the better way (simple and small memory footprint). But if you can't control your action codes, you'll need to implement a decoder via another lookup table.
I'm not entirely sure on what you mean by "function signature". Function pointers should just be a number - which the compiler resolves.
EDIT:
Either way, I think two lookup tables (1 for function pointers, and one for decoder) is still going to be much smaller than a large switch statement. For varying parameters, use "dummy" parameters to make them all consistent. I'm not sure what the consequences of force casting everything to void-pointers to structs will be on an embedded processor.
EDIT 2:
Actually, a decoder can't be implemented with just a lookup table if the opcode space is too large. My mistake there. So 1 is really the only viable option.
Is there a better way ... than a switch statement?
Make a list of all valid action codes (a constant in program memory, so it doesn't use any of your scarce RAM), and sequentially compare each one with the received code. Perhaps reserve index "0" to mean "unknown action code".
For example:
// Warning: untested code.
// speed_x etc., delay(), get_letter() and get_number() are assumed to be defined elsewhere.
typedef int (*ActionFunctionPointer)( int, int, char * );
struct parse_item{
    const char action_letter;
    const int action_number; // you might be able to get away with a single byte here, if none of your actions are above 255.
    // alas, http://reprap.org/wiki/G-code mentions a "M501" code.
    const ActionFunctionPointer action_function_pointer;
};
int unrecognized_action( int a, int b, char * message ); // error handler, defined elsewhere
int m0_handler( int speed, int extrude_rate, char * message ){ // M0: Stop
    speed_x = 0; speed_y = 0; speed_z = 0; speed_e = 0;
    return 0;
}
int g4_handler ( int dwell_time, int extrude_rate, char * message ){ // G4: Dwell
    delay(dwell_time);
    return 0;
}
const struct parse_item parse_table[] = {
    { '\0', 0, unrecognized_action }, // index 0: reserved for "unknown action code"
    { 'M', 0, m0_handler }, // M0: Stop
    // ...
    { 'G', 4, g4_handler }, // G4: Dwell
    { '\0', 0, unrecognized_action } // end-of-table sentinel
};
ActionFunctionPointer get_action_function_pointer( char * buffer ){
    char letter = get_letter( buffer );
    int action_number = get_number( buffer );
    int index = 0; // entry 0 is reserved, so the scan starts at index 1
    ActionFunctionPointer f = 0;
    do{
        index++;
        if( (letter == parse_table[index].action_letter ) &&
            (action_number == parse_table[index].action_number) ){
            f = parse_table[index].action_function_pointer;
        }
        if('\0' == parse_table[index].action_letter ){
            index = 0; // reached the sentinel: nothing matched
            f = unrecognized_action;
        }
    }while(0 == f);
    return f;
}
How does one go about function pointers in C when the names (and
possibly signatures) may change? If the function signatures are
different, is this even possible?
It's possible to create a function pointer in C that (at different times) points to functions with more or less parameters (different signatures) using varargs.
Alternatively, you can force all the functions that might possibly be pointed to by that function pointer to all have exactly the same parameters and return value (the same signature) by adding "dummy" parameters to the functions that require fewer parameters than the others.
In my experience, the "dummy parameters" approach seems to be easier to understand and use less memory than the varargs approach.
Is there a way to typedef a generic type of that signature
to be passed around and called from?
Yes.
Pretty much all the code I've ever seen that uses function pointers
also creates a typedef to refer to that particular type of function.
(Except, of course, for Obfuscated contest entries).
See the above example and Wikibooks: C programming: pointers to functions for details.
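The typedef-plus-table idea can be reduced to a version that compiles on its own on a desktop compiler. This is only a sketch: the names (ActionFn, action_table, lookup_action and the handlers) are hypothetical, the handlers are stubs, and all of them share one signature by ignoring the parameters they do not need.

```c
#include <stddef.h>

/* One signature shared by every handler; unused parameters are "dummies". */
typedef int (*ActionFn)(int arg, const char *message);

struct action_entry {
    char letter;     /* 'G' or 'M' */
    int number;      /* e.g. 4 for G4 */
    ActionFn fn;
};

static int last_dwell_ms = 0;   /* stand-in for real hardware state */

static int stop_handler(int arg, const char *message)
{
    (void)arg; (void)message;   /* dummy parameters */
    return 0;
}

static int dwell_handler(int arg, const char *message)
{
    (void)message;
    last_dwell_ms = arg;        /* a real handler would wait here */
    return 1;
}

static int unknown_handler(int arg, const char *message)
{
    (void)arg; (void)message;
    return -1;
}

static const struct action_entry action_table[] = {
    { 'M', 0, stop_handler },   /* M0: Stop */
    { 'G', 4, dwell_handler },  /* G4: Dwell */
};

/* Linear scan; always returns a callable pointer. */
ActionFn lookup_action(char letter, int number)
{
    size_t n = sizeof action_table / sizeof action_table[0];
    for (size_t i = 0; i < n; i++) {
        if (action_table[i].letter == letter && action_table[i].number == number)
            return action_table[i].fn;
    }
    return unknown_handler;
}
```

A call then looks like lookup_action('G', 4)(250, NULL). On an AVR, keeping such a table out of RAM would, as far as I know, additionally need the toolchain's program-memory mechanism (PROGMEM in avr-libc), which this sketch leaves out.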
p.s.:
Is there some reason you are re-inventing the wheel?
Could one of the following pre-existing G-code interpreters for the AVR work for you, perhaps with a little tweaking?
FiveD,
Sprinter,
Marlin,
Teacup Firmware,
sjfw,
Makerbot,
or
Grbl?
(See http://reprap.org/wiki/Comparison_of_RepRap_Firmwares ).

How to return string from a char function

I want the function getCategory() to return "invalid", instead of printing the word "invalid" (i.e. instead of using printf), when the input to the function is invalid (i.e. when either height or weight is lower than zero).
please help:
#include<stdio.h>
#include<conio.h>
char getCategory(float height,float weight)
{
    char invalid = '\0';
    float bmirange;
    if(height<=0 || weight<=0)
        return invalid;
    else
    {
        height=height*0.01; //1 centimeter = 0.01 meters
        bmirange=[weight/(height*height)];
        if(bmirange< 15 )
            return starvation;
    }
}
int main()
{
    char Category;
    float height,weight;
    printf("enter height");
    scanf("%f",&height);
    printf("enter weight");
    scanf("%f",&weight);
    Category=getCategory(height,weight);
    if(Category == 0)
        printf("invalid");
    else
        printf("%c", Category);
}
NOTE: the original question has been altered many, many times and the code has changed just as often, introducing new errors in each iteration. I leave this answer as it answered the original code, see history. Below this answer there's an update giving advice instead of code, as that seems more appropriate here.
Hmm, astander removed his answer. But perhaps this is what you should actually have:*
char getCategory(float height,float weight)
{
    char invalid = '\0';
    if(height<=0 || weight<=0)
        return invalid;
    return 'c'; /* do something for the valid cases */
}
* originally the question contained height || weight <= 0 and no value for variable invalid.
Notes on the code:
With proper indentation, your program flow becomes clearer. I corrected your if-statement, assuming this was actually your intent. The last line should contain what you currently left out in your question. I added an initialization in the first line, because having a value is better than not having a value (which means: if you don't initialize, it can be anything, really).
In your calling code, you can do this:
Category = getCategory(height, weight);
if(Category == 0)
    printf("invalid");
else
    printf("%c", Category);
which actually prints the word "invalid" to the output, if that was your intent.
Update: based on new text in the question, it's clear that the asker wants something else, so here's a new answer. I leave the above, it's still valid with the original question.
You're now asking not to print the word "invalid" and not to use a special value for the invalid case. Instead, you ask to return "invalid", which I understand as returning the string with the value "invalid" (which, taken in itself, is still returning a special value).
You cannot do it
In short: you cannot do that. The current function has return type char. I don't know the purpose of your function, but I'm sure you've given it some thought and there's a reason for using a char. A char can only contain one character. And the word "invalid" is multiple characters. You have a few options, choose whichever suits you best:
Other ways
1. Change the return type to a string instead of char; this requires a redesign of all the code involved.
2. Settle for returning a special value. You don't show the body of your function, but if it would normally never return \0, you can use that value, as in my example above. Of course, you can choose any other char value.
3. Raise an exception and use a try/catch in the body. But you use C, not C++. Here's a link that describes using C++-style exception handling for C, but this may be a bit out of scope; learning C is better done one small step at a time.
What's commonly best practice
In normal situations, it is common to choose either special-case values (typical in older or more basic languages like C or assembler) or exceptions (typical for more structured languages like C++, Java, Python). It's commonly considered bad practice to change a complete function for the purpose of special-cases (like invalid input).
Why
Instead, the caller of the function should deal with these special cases. The reason for this is a very important rule in programming: the function can never know beforehand what users of that function want to do when something bad happens (illegal input). One may choose to print "Illegal input" (for command-line users), another wants to quit the program (in a library) and yet another wants to ignore it and do nothing (for automatic processing). In short: what you are trying to achieve, you should try to achieve differently (see options 2 and 3 above, and my original solution).
Teachers and textbooks
Using this approach is by far the easiest, and also the easiest to understand for any (future) co-workers, as it follows common programming practice. Of course, I haven't seen your assignment or textbook, so I can't tell in what direction they want a solution, and it won't be the first textbook or teacher to first show you the wrong path, let you stumble, and then show you the right path.
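For completeness, the "change the return type to a string" option can be sketched in C by returning a pointer to a string literal. The name get_category, the parameter names and the "other" placeholder are hypothetical; only the centimeter conversion, the below-15 test and the words "invalid" and "starvation" come from the question.

```c
/* Hypothetical variant: returns a pointer to a constant string.
   String literals have static storage, so the pointer stays valid
   after the function returns and must not be freed or modified. */
const char *get_category(float height_cm, float weight_kg)
{
    if (height_cm <= 0 || weight_kg <= 0)
        return "invalid";

    float height_m = height_cm * 0.01f;              /* 1 cm = 0.01 m */
    float bmi = weight_kg / (height_m * height_m);

    if (bmi < 15)
        return "starvation";
    return "other";   /* placeholder for the categories not shown in the question */
}
```

The caller can then print the result with printf("%s", get_category(height, weight)) or compare it with strcmp, and it remains the caller's decision what to do about "invalid".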
The getCategory method doesn't always return (because of the if statement). Also, not sure about the height in if statement. Add another return invalid at the end of the method.
char getCategory(float height,float weight)
{
    char invalid;
    if(height<=0 || weight<=0)
        return invalid;
    return 0;
}
You need to (very carefully) pore over your textbook to work out the multitude of errors in the above code.
1. Your test in getCategory will almost certainly not do what you want it to do.
2. You ARE returning invalid in some cases (but not all, see #1). However, there is no way to know that, as invalid has no known value.
3. In other cases, getCategory returns no value at all.
You're defining a variable named invalid. Its contents are undefined (it could be anything from -128 to 127). When you return this variable you're returning anything; do you want to assign something to the invalid variable before you return it? e.g.
char invalid;
invalid = 'i';
if ( ... ) {
    return invalid;
} else {
    return 0;
}
What should invalid be mapped to? You should have a convention like this:
char invalid_category = '?';
or perhaps:
#define INVALID_CATEGORY '?'
This is better defined outside of the getCategory function so that the calling code can access it.
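A sketch of how such a shared constant might be used is shown here. The function name get_category_code and the letter codes 'S' and 'N' are hypothetical; the question does not say which characters the valid categories should map to.

```c
#define INVALID_CATEGORY '?'   /* visible to both the function and its callers */

/* Hypothetical variant: returns a one-character category code,
   or INVALID_CATEGORY when either input is not positive. */
char get_category_code(float height_cm, float weight_kg)
{
    if (height_cm <= 0 || weight_kg <= 0)
        return INVALID_CATEGORY;

    float height_m = height_cm * 0.01f;              /* 1 cm = 0.01 m */
    float bmi = weight_kg / (height_m * height_m);

    return (bmi < 15) ? 'S' : 'N';   /* 'S' = starvation, 'N' = anything else */
}
```

The calling code compares the result with INVALID_CATEGORY and decides for itself whether to print "invalid".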
Also it isn't evident what your code returns when valid arguments are passed to it.
By the way, in your function getCategory, you use a variable that is never declared: starvation. Where does that come from? I doubt that it is a global variable.
Also, the assignment to bmirange does not make sense, nor would it compile:
bmirange=[weight/(height*height)];
bmirange is the left-hand side (LHS) of the assignment, but on the right-hand side (RHS) you have wrapped the expression in square brackets, which are the array subscript operator. That is an illegal statement!
What was your intention there? Was that meant to be a pair of parentheses?
Can you confirm this?
A lot of the answers are confusing because the OP did not make clear what the error is or explain what is going on, which has led others to post code that does not satisfy the OP.
Hope this helps,
Best regards,
Tom.
