Up to this moment I know that it's quite important that the paramaters which are included in your code to have suggestive names, so that the code could be easy to read by anyone who has to read it. But ... in matters of memory, run time, how important is that the used parameters to not be too many or to have too long names? Could this be something to be aware of or is it not so important for the efficiency of the code?
The name of the parameters/arguments matters absolutely zero at runtime. The compiler does not use the names when generating object code. They will not appear in your binary unless you take special effort to get them there. They are only for the human who reads the code. As such, they should be as long and descriptive as necessary, but no longer.
On the other hand, having too many parameters can indeed have a minor effect on the runtime speed of your code, since each time that function is called, all those parameters have to be pushed. But that is really not the most significant issue. A bigger problem is usability—a function becomes very hard to understand and use [correctly] if it takes a bazillion parameters. Design your functions so that they are easy to use correctly and hard to use incorrectly. (It is also worth pointing out that a function that takes a lot of parameters is probably violating the single responsibility principle.)
Related
For a certain time now, I'm looking to build a logging framework in C (not C++!), but for small microcontrollers or devices with a small footprint of some sort. For this, I've had the idea of hashing the strings that are being logged to a certain value and just saving the hashed value with the timestamp instead of the complete ASCII string. The hash can then be correlated with a 'database' file that would be generated from an external process that parses the strings out of the C source files and saves the logged strings along with the hash value.
After doing a little bit of research, this idea is not new, but I do not find an implementation of this idea in C. In other languages, this idea has been worked out, but that is not the goal of my exercise. An example may be this talk where the same concept has been worked out in C++: youtube.com/watch?v=Dt0vx-7e_B0
Some of the requirements that I've set myself for this library are the following:
as portable C code as possible
COMPILE TIME optimization/hashing for the string hash conversion, it should be equivalent to just printf("%d\n", hashed_value) for a single log statement. (Assuming no parameters/arguments for this particular logging statement).
arguments can be passed to the logging statement similar to the printf function.
user can define their own output function (being console, file descriptor, sending the data directly over an UART connection,...)
fast to run!! fast to compile is nice to have, but it should not be terribly slow.
very easy to use, no very complicated API to use the library.
But to achieve this in C, what is a good approach? I've tried several things now, but do not seem to have found a good method of achieving this.
An overview of things I've tried so far, along with the drawbacks are:
Full pre-processor string hashing: did get it working, but the compile time is terribly slow. Also, this code does not feel to be very portable over multiple C compilers.
Semi pre-processor string hashing: The idea was to generate a hash for each string and make an external header file with the defines in of each string with their hash value. The problem here is that I cannot figure out a way of converting the string to the correct define preprocessor value.
Letting go of the default logging macro with a string pointer: Instead of working with the most used method of LOG_DEBUG("Some logging statement"), converting it with an external parser to /*LOG_DEBUG("Some logging statement") */ LOG_RAW(45). This solves the problem of hashing the string since the hash will be replaced by the external parser with the correct hash, but is not the cleanest to read since the original statement will be a comment.
Also expanding this idea to take care of arguments proved to be tricky. How to take care of multiple types of variables as efficiently as possible?
I've tried some other methods but all without success. Especially when I want to add arguments to log the value of a variable, for example, it gets very complicated, and I do not get the required result...
I have defined a customized tcl type using tcl library in c/c++. I basically make the Tcl_Obj.internalRep.otherValuePtr point to my own data structure. The problem happens by calling [string length myVar] or other similar string functions that does so called shimmering behaviour which replace my internalRep with it's own string structure. So that after the string series tcl function, myVar cannot convert back! because it's a complicate data structure cannot be converted back from the Tcl_Obj.bytes representation plus the type is no longer my customized type. How can I avoid that.
The string length command converts the internal representation of the values it is given to the special string type, which records information to allow many string operations to be performed rapidly. Apart from most of the string command's various subcommands, the regexp and regsub commands are the main ones that do this (for their string-to-match-the-RE-against argument). If you have a precious internal representation of your own and do not wish to lose it, you should avoid those commands; there are some operations that avoid the trouble. (Tcl mostly assumes that internal representations are not fragility, and therefore that they can be regenerated on demand. Beware when using fragility!)
The key operations that are mostly safe (as in they generate the bytes/length rep through calling the updateStringProc if needed, but don't clear the internal rep) are:
substitution into a string; the substituted value won't have the internal rep, but it will still be in the original object.
comparison with the eq and ne expression operators. This is particularly relevant for checks to see if the value is the empty string.
Be aware that there are many other operations that spoil the internal representation in other ways, but most don't catch people out so much.
[EDIT — far too long for a comment]: There are a number of relatively well-known extensions that work this way (e.g., TCOM and Tcl/Java both do this). The only thing you can really do is “be careful” as the values really are fragile. For example, put them in an array and then pass the indexes into the array around instead, as those need not be fragile. Or keep things as elements in a list (probably in a global variable) and pass around the list indices; those are just plain old numbers.
The traditional, robust approach is to put a map (e.g., a Tcl_HashTable or std::map) in your C or C++ code and have the indices into that be short strings with not too much meaning (I like to use the name of the type of value followed by either a sequence number or a serialisation of the pointer, such as you might get with the %p conversion in sprintf(); the printed pointer reveals more of the implementation details, is a little more helpful if you're debugging, and generally doesn't actually make that much difference in practice). You then have the removal of things from the map be an explicit deletion operation, and it is also easy to provide operations like listing all the known current values. This is safe, but prone to “leaking” (though it's not formally a memory leak if you provide the listing operation). It can be accelerated by caching the lookup in a Tcl_Obj*'s internal representation (a cheap way to handle deletion is to use a sequence number that you increment when you delete something, and only bypass the map lookup if the sequence number that you cache in the intrep is equal to the main sequence number) but it's not usually a big deal; only go to that sort of thing if you've measured a bottleneck in the lookups.
But I'd probably just live with fragility in my own code, and would just take care to ensure that I never bust the assumptions. The problem is really that you're being incautious about how you use the values; the Tcl code should just pass them around and nothing else really. Also, I've experimented a fair bit with wrapping such things up inside a TclOO object; it's far too heavyweight (by the design of TclOO) for values that you're making a lot of, but if you've only got a few of them and you're wanting to treat them as objects with methods, this can work very well indeed (and gives many more options for automatic cleanup).
strerror() function returns a short error description, given error number as argument. For example, if the argument is ENOTDIR, it will return "Not a directory", if the argument is EBUSY, it will return "Device or resource busy", etc.
But, is there a function that returns "ENOTDIR" for argument equals ENOTDIR, "EBUSY" for argument EBUSY, etc.?
I just want to avoid writing a huge switch statement for this purpose.
No- there is no standard or commonly used nonstandard function that provides this functionality.
One approach would be to write a huge switch statement, but this might not be the best approach for you to take. Most values of errno are not specified by any standard, so their values may or may not be consistent across different operating systems or even different versions of the same operating system.
Plus it would be a pain in the rear end.
A more elegant approach, if some runtime overhead is acceptable, would be to write a function that looks up these errors codes when they occur, rather than hard-coding the values into a big table. GNU/Linux systems have a list of all possible errno values at:
/usr/include/asm-generic/errno-base.h
/usr/include/asm-generic/errno.h
These files provide a #define of each errno value along with their value and a short description in an adjacent comment. It'd be pretty trivial to search these files line-by-line and print out the matching error code. Even if not, these files would be the things to start with in your quest to write a huge switch statement.
Beware that the kernel might negate these values when they're passed to userspace.
As David points out above, the problem is there is no standard function that can provide the desired functionality. So thinking that this would be a neat problem to try to write a script for, I wrote a little something to automatically generate the switch function (if it should come to be necessary) and posted the code here. Seems to work alright on OS X, otherwise mileage may vary. A script such as this could be added to the build process to make sure that the values were defined correctly.
I´m searching information about how to compare two codes and decide if the code submitted by someone is correct or not (based on a solution code defined before).
I could compare the output but many codes may have the same output. Then I think I must compare someway the codes and give a percentage of similitude.
Anybody can help me?
(the language code is C but I think this isn´t important)
Some of my teachers used online automated program grading systems like http://web-cat.org/
In the assignment they would specify a public api you must provide, and then they would just write tests against your functions, much like unit tests. They would intentionally pick tests that would exploit boundary conditions and other things students are notorious for not thinking about, and just call your code with many different inputs to try to get your code to fail.
Sometimes they would hardcode the expected values, other times they would allow values within a range, and other times they just did the assignment themselves and made it so your own code has to match the results produced by their code.
Obviously, not all programs can be effectively graded this way. It's also kinda error prone in that sometimes even the teacher made a mistake and overflowed an int or something, then the correct student submissions wouldn't match the teachers incorrect results. But, a system doesn't need to be perfect to be useful. But I think this raises an important point in that manually grading by reading the code won't necessarily reveal all mistakes either.
Another possibility is copy the submitted code, strip out all of the white space and search for substrings that must exist for the code to be correct and/or substrings that cannot exist for the code to be considered correct. The troublesome bit might be setting up to allow for some of the more tricky requirements such as [(a or c),((a or b) and c),((a or b) and c)], where the variables are the result of a boolean check as to if the substring related to the variable exists within the code.
For example, [("printf"),("for"), (not "1,2,3,4,5,6,7,9,10")], would require that "printf" and "for" be substrings in the code, while "1,2,3,4,5,6,7,9,10" i I'm not familiar with C, so I'm I'm assuming here that "printf" is required to be able to print anything without involving output streams, which could be accounted for by something like [("printf" or "out"),("for"), (not "1,2,3,4,5,6,7,9,10")], where "out" is part of C code required to make use of output streams.
It might be possible to automatically find required substrings based on a "correct" code, but as others have mentioned, there are alternative ways to do things. Which is why hard-coding the "solution" is probably required. Even so, it's quite possible that you'll miss a required substring, and it'll be marked as wrong, but it's probably the only way you can do what you ask with some degree of success.
Regular expressions might be useful here.
I've done some searching and have not found anything that would boost the file and formatting functions in Visual Studio VS2010 C (not C++).
I've been able to address the raw i/o issues to some extent by using large buffers and a SSD drive, so the more pressing issue is a replacement for the family of printf functions.
Has anyone found something worthwhile?
As I understand it, part of the glacial speed issue with the printf functions is that they have to handle myriad types of arguments. Does anyone have experience with writing a datatype-specific version of printf; eg, one that only prints ints, or only prints doubles, etc?
First off, you should profile the code first before assuming it's printf.
But if you're sure it's printf and similar then you can do a few things to fix the issue.
1) print less. IE, don't call expensive operations as much if you can avoid it. Do you need all the output, for example?
2) manually replace the string concatenation with manually built routines that do all the pieces without having to parse the format specifier.
EG: printf("--%s--", "really cool");
Can become:
write(1, "--", 2);
write(1, "really cool", 11);
write(1, "--", 2);
That may be faster. But again, you won't know until you profile it. Don't spend energy on a solution till you can confirm it's the solution you need and be able to measure the success of your proposed solution.
#Wes is right, never assume you know what you need to fix until you have proof.
Where I differ is on the method of finding out.
I and others use random pausing which works for these reasons, and here's a short slide show demo & C++ code so you can see how it works, if you want.
The thing about printf (or any output) function is it spends A) a certain number of CPU cycles creating a buffer to be output, and then it spends B) a certain amount of time waiting while the system and/or auxiliary hardware actually moves the data out.
That's maybe a bit over-simplified, but if you randomly pause and examine the state, that's what you see.
What you've done by using large buffers and an SSD drive is reduce B, and that's good.
That means of the time remaining, A is a larger fraction.
You know that.
Now of the samples you find in A, you might get a hint of what's happening if you see what subordinate routines inside printf are showing up.
Usually printf calls something like vprintf to get rid of the variable argument list, which then cycles over the format string to figure out what to do, including things like parsing precision specifiers.
If it looks like that's what it's doing, then you know about how much time goes into parsing the format.
On the other hand, if you see it inside a routine that is copying a string, or formatting an integer (along with dealing with leading/trailing characters, etc.) then you know to concentrate on that.
On yet another hand, if you see it inside a routine that looks like it's formatting a floating point number (which is actually quite complicated), you know to concentrate on that.
Given all that, you want to know what I do?
First, I ask who is going to read this anyway?
If nobody really needs to read all this text, why not pump it out in binary? Or failing that, in hex?
If you simply write binary, A shrinks to nothing, and when you read it back in with another program, guess what?
No Lost Bits!