I've done some searching and have not found anything that would speed up the file and formatting functions in Visual Studio 2010 C (not C++).
I've been able to address the raw I/O issues to some extent by using large buffers and an SSD, so the more pressing issue is a replacement for the printf family of functions.
Has anyone found something worthwhile?
As I understand it, part of the glacial speed issue with the printf functions is that they have to handle myriad types of arguments. Does anyone have experience with writing a datatype-specific version of printf; eg, one that only prints ints, or only prints doubles, etc?
First off, you should profile the code before assuming it's printf.
But if you're sure it's printf and similar then you can do a few things to fix the issue.
1) print less. IE, don't call expensive operations as much if you can avoid it. Do you need all the output, for example?
2) replace the formatted output with hand-built routines that emit each piece directly, so no format specifier has to be parsed at run time.
EG: printf("--%s--", "really cool");
Can become:
write(1, "--", 2);
write(1, "really cool", 11);
write(1, "--", 2);
That may be faster. But again, you won't know until you profile it. Don't spend energy on a solution till you can confirm it's the solution you need and be able to measure the success of your proposed solution.
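To make the datatype-specific idea from the question concrete, here is a minimal sketch of an int-only formatter, assuming plain decimal output is all you need. The names format_int and put_int_line are made up for illustration (they are not library functions), and whether this actually beats printf in your program is something only a measurement will tell you.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not a library function): formats a signed int as
   decimal into buf, NUL-terminates it, and returns the number of characters
   written (not counting the NUL). buf needs room for 12 bytes when int is
   32 bits: sign, 10 digits, NUL. */
size_t format_int(char *buf, int value)
{
    char tmp[16];
    size_t n = 0, len = 0;
    /* Work in unsigned so that INT_MIN can be negated without overflow. */
    unsigned int u = (value < 0) ? 0u - (unsigned int)value
                                 : (unsigned int)value;

    assert(buf != NULL);

    do {                        /* digits come out least significant first */
        tmp[n++] = (char)('0' + u % 10u);
        u /= 10u;
    } while (u != 0u);

    if (value < 0)
        buf[len++] = '-';
    while (n > 0)
        buf[len++] = tmp[--n];  /* reverse into the caller's buffer */
    buf[len] = '\0';
    return len;
}

/* Hypothetical helper: writes one int followed by a newline to fp using a
   single fwrite call instead of fprintf. Returns 0 on success, -1 on error. */
int put_int_line(FILE *fp, int value)
{
    char buf[16];
    size_t len = format_int(buf, value);

    buf[len++] = '\n';          /* overwrite the NUL with the newline */
    return fwrite(buf, 1, len, fp) == len ? 0 : -1;
}
```

Calling put_int_line(stdout, 42) in a loop would stand in for printf("%d\n", 42). There is no format string to parse, but you also lose width, padding and locale handling, so this only pays off if you don't need them.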
@Wes is right, never assume you know what you need to fix until you have proof.
Where I differ is on the method of finding out.
I and others use random pausing which works for these reasons, and here's a short slide show demo & C++ code so you can see how it works, if you want.
The thing about printf (or any output) function is it spends A) a certain number of CPU cycles creating a buffer to be output, and then it spends B) a certain amount of time waiting while the system and/or auxiliary hardware actually moves the data out.
That's maybe a bit over-simplified, but if you randomly pause and examine the state, that's what you see.
What you've done by using large buffers and an SSD drive is reduce B, and that's good.
That means of the time remaining, A is a larger fraction.
You know that.
Now of the samples you find in A, you might get a hint of what's happening if you see what subordinate routines inside printf are showing up.
Usually printf calls something like vprintf to get rid of the variable argument list, which then cycles over the format string to figure out what to do, including things like parsing precision specifiers.
If it looks like that's what it's doing, then you know about how much time goes into parsing the format.
On the other hand, if you see it inside a routine that is copying a string, or formatting an integer (along with dealing with leading/trailing characters, etc.) then you know to concentrate on that.
On yet another hand, if you see it inside a routine that looks like it's formatting a floating point number (which is actually quite complicated), you know to concentrate on that.
Given all that, you want to know what I do?
First, I ask who is going to read this anyway?
If nobody really needs to read all this text, why not pump it out in binary? Or failing that, in hex?
If you simply write binary, A shrinks to nothing, and when you read it back in with another program, guess what?
No Lost Bits!
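As a rough illustration of the binary route, here is a sketch that dumps an array of doubles with fwrite and reads it back with fread. The helper names save_doubles and load_doubles are invented for this example, and it assumes the writer and the reader run on machines with the same double representation and byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: dumps count doubles to path in raw binary form.
   Returns 0 on success, -1 on failure. */
int save_doubles(const char *path, const double *data, size_t count)
{
    FILE *fp;
    size_t written;

    assert(path != NULL && data != NULL);
    fp = fopen(path, "wb");     /* "b" matters on Windows: no newline translation */
    if (fp == NULL)
        return -1;
    written = fwrite(data, sizeof data[0], count, fp);
    if (fclose(fp) != 0 || written != count)
        return -1;
    return 0;
}

/* Hypothetical helper: reads up to max doubles back from path and
   returns how many were actually read. */
size_t load_doubles(const char *path, double *data, size_t max)
{
    FILE *fp;
    size_t got;

    assert(path != NULL && data != NULL);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;
    got = fread(data, sizeof data[0], max, fp);
    fclose(fp);
    return got;
}
```

No formatting happens at all here, so the values that come back compare equal to the ones written, which a decimal text round trip does not guarantee unless you print enough digits.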
TL;DR:
I am asking you to tell me what would be the most efficient approach to double my strings and print them out?
Full story:
I had trouble with the title, and the actual problem may be a bit different than you expect.
Imagine I have a main buffer.
At some index determined by the program, I want to insert a string.
But every char in that string needs to be doubled.
So "abc", inserted at index 10 of buffer[999], needs to be "aabbcc".
Now, the second part of the problem - this needs to be as efficient as possible. I could make this easily, but I need the fastest option.
I thought I had devised several approaches, but it boils down to:
1) fill buffer(1000) with single chars and double the chars when printing (pushing to stdout)
2) fill buffer(2000) with double chars and print like normal
The variations on the second approach would be when to double the chars (while copying, or by generating "aabbcc" from the start and copying the full thing).
The first approach would be the most intuitive, but I fear I would need to devise a low-level char-doubling function, because putc, printf, and any large number of function calls will carry a lot of overhead. (There are allegedly very efficient functions in libc with bit shifting and pointer magic, but I couldn't find them. I can only find the very disappointing versions where fgets() is just a wrapper for getc(), which can't be efficient.)
The second approach obviously wastes a lot of memory and requires a lot of copying, but it could probably put everything into stdout more efficiently as a chunk without the overhead of copying single chars.
I am unsure whether underneath everything there is just a system write call, and I also don't know how it works. I am just going by my research, which says that fgets is about 12 times faster than fgetc for equal data, so I assume the same holds for all the single-char vs. line functions.
So in conclusion, I am asking you to tell me what would be the most efficient approach to double my strings and print them out?
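For reference, one way the second approach (double while copying, then print the whole buffer in one go) might look is sketched below. The names double_chars and flush_buffer are made up for this example, and nothing here has been measured against the first approach, so treat it as a starting point for a benchmark rather than an answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copies src into dst with every character doubled,
   so "abc" becomes "aabbcc". Writes at most dst_size bytes, does not
   NUL-terminate, and returns the number of bytes written. */
size_t double_chars(char *dst, size_t dst_size, const char *src)
{
    size_t out = 0;

    assert(dst != NULL && src != NULL);
    while (*src != '\0' && out + 2 <= dst_size) {
        dst[out++] = *src;      /* first copy  */
        dst[out++] = *src;      /* second copy */
        src++;
    }
    return out;
}

/* Hypothetical helper: sends the first len bytes of buf to stdout with a
   single fwrite call. Returns 0 on success, -1 on error. */
int flush_buffer(const char *buf, size_t len)
{
    return fwrite(buf, 1, len, stdout) == len ? 0 : -1;
}
```

With a char buffer[2000], inserting "abc" at index 10 would be double_chars(buffer + 10, sizeof buffer - 10, "abc"), and the finished buffer goes out through one flush_buffer call instead of thousands of putc calls.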
So far I know that it's important for the parameters in your code to have suggestive names, so that the code is easy to read for anyone who has to read it. But in terms of memory and run time, how important is it that the parameters are not too numerous and do not have overly long names? Is this something to be aware of, or does it not matter much for the efficiency of the code?
The name of the parameters/arguments matters absolutely zero at runtime. The compiler does not use the names when generating object code. They will not appear in your binary unless you take special effort to get them there. They are only for the human who reads the code. As such, they should be as long and descriptive as necessary, but no longer.
On the other hand, having too many parameters can indeed have a minor effect on the runtime speed of your code, since each time that function is called, all those parameters have to be pushed. But that is really not the most significant issue. A bigger problem is usability—a function becomes very hard to understand and use [correctly] if it takes a bazillion parameters. Design your functions so that they are easy to use correctly and hard to use incorrectly. (It is also worth pointing out that a function that takes a lot of parameters is probably violating the single responsibility principle.)
MAC Addresses are 48 bits. That is equivalent to three shorts. MAC
addresses are sometimes written like this: 01:23:45:67:89:ab where
each pair of digits represents a hexadecimal number.
Write a function that will take in a character pointer pointing to a
null terminated string of characters like in the example and will
break it apart and then store it in an array of three 16-bit shorts.
The address of the array will also be passed into the function.
I figured the function header should look something like void convertMacToShort(char *macAddr, short *shorts);. What I'm having difficulty with is parsing the char*. I feel like it's possible if I loop over it, but that doesn't feel efficient enough. I don't even need to make this a universal function of sorts--the MAC address will always be a char* in the format of 01:23:45:67:89:ab.
What's a good way to go about parsing this?
Well efficiency is one thing ... robustness another.
If you have very defined circumstances like a list of millions of MAC addresses which are all in the same format (only lower case letters, always leading zeroes, ...) then I would suggest using a quick function accessing the characters directly.
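A quick function of that kind could look roughly like the sketch below. It assumes exactly the format from the question (two lowercase hex digits per byte, colons in between) and does no validation at all. The name mac_to_shorts is invented, and it stores into unsigned short rather than the short from the question so that a group like 89:ab fits without sign trouble.

```c
#include <assert.h>
#include <stddef.h>

/* Converts one hex digit; assumes it is 0-9 or lowercase a-f (no validation). */
static unsigned int hex_digit(char c)
{
    return (c <= '9') ? (unsigned int)(c - '0')
                      : (unsigned int)(c - 'a' + 10);
}

/* Hypothetical helper: turns "01:23:45:67:89:ab" into
   { 0x0123, 0x4567, 0x89ab } by indexing the characters directly. */
void mac_to_shorts(const char *mac, unsigned short *shorts)
{
    int i;

    assert(mac != NULL && shorts != NULL);
    for (i = 0; i < 3; i++) {
        const char *p = mac + 6 * i;    /* each 16-bit group spans "hh:hh:" */
        shorts[i] = (unsigned short)((hex_digit(p[0]) << 12) |
                                     (hex_digit(p[1]) << 8)  |
                                     (hex_digit(p[3]) << 4)  |
                                      hex_digit(p[4]));
    }
}
```

It still loops, but only three times over fixed offsets, so there is very little left to optimize. Feed it anything that is not in the expected format and the result is garbage.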
If you're parsing user input and need to detect input errors as well, execution speed is not the thing to worry about. In this scenario you have to make sure that you detect every mistake a user could make (and this is quite a feat). This leads to sscanf(..), and in that case I would even suggest writing your own function to parse the string (in my experience sscanf(..) sometimes causes trouble depending on the input string, so I avoid using it when processing user input).
Another thing: if you're worried about efficiency in terms of execution time, write a little benchmark that runs the parsing function a few million times and compare the execution times. This is easily done and sometimes brings up surprises...
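For the user-input case, a hand-written parser that rejects anything malformed might look like the following sketch. The name parse_mac_checked and the 0 / -1 return convention are my own choices for illustration, and it accepts both upper and lower case hex digits, which is an assumption about what you want to allow.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 0 and fills shorts[0..2] when mac is exactly
   "hh:hh:hh:hh:hh:hh" (upper or lower case hex); returns -1 otherwise. */
int parse_mac_checked(const char *mac, unsigned short *shorts)
{
    unsigned int bytes[6];
    int i, j;

    assert(shorts != NULL);
    if (mac == NULL)
        return -1;
    for (i = 0; i < 6; i++) {
        unsigned int value = 0;
        for (j = 0; j < 2; j++) {
            unsigned char c = (unsigned char)mac[3 * i + j];
            if (!isxdigit(c))
                return -1;      /* also catches a string that ends early */
            c = (unsigned char)tolower(c);
            value = value * 16u +
                    (unsigned int)(isdigit(c) ? c - '0' : c - 'a' + 10);
        }
        /* a colon must follow every pair except the last,
           and the last pair must end the string */
        if (mac[3 * i + 2] != (i < 5 ? ':' : '\0'))
            return -1;
        bytes[i] = value;
    }
    for (i = 0; i < 3; i++)
        shorts[i] = (unsigned short)((bytes[2 * i] << 8) | bytes[2 * i + 1]);
    return 0;
}
```

Because every character is checked before the next one is read, the function never reads past the terminating NUL, even when the input is too short.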
I have been reviewing example code for using OpenSSL and in every example I locate, the creator has chosen to use BIO_printf() to write things to stdout instead of printf().
I have taken their code, removed the openssl/bio.h header declaration, and changed all calls to BIO_printf() to regular printf() statements. The programs ran with identical results.
The problem I'm grappling with is why these coders use BIO_printf() when it takes a lot more to set up than just using printf(). You have to include another header (which will increase program size), and you need to set the file pointer to the stream you want to write to. Only then can you print your message to stdout. It seems a lot more complicated than using printf().
When I do a search on BIO_printf() it lists possible man pages for BIO_printf (3), but none of the pages actually contain any information!
I decided to do a benchmark test on both methods. I looped printf("Hey\n"); 1,000,000 times. Then I did it for BIO_printf(fp, "Hey\n");. I only timed the BIO_printf() statement and not the setting up of the file pointer (which would have increased the time). The difference came out to printf() being ~4.7x faster than using BIO_printf().
Why are they using it? What is the benefit? It's my understanding that in programming you either want code to be simple or efficient, and in the case of BIO_printf() it's neither.
In general, a BIO might not be writing to stdout.
You can have a BIO that writes to a file, or null, or a socket, or a network drive, or another BIO, etc.
By using the BIO_printf family, the code can easily be changed to have its output sent to a different location or another BIO which might do some further filtering and then pass the output onto wherever else.
As pointed out by others, BIOs can be stacked, unlike FILE streams. Also, snprintf() and vsnprintf() were only added in C99, and OpenSSL/SSLeay is older than that, so the SSLeay developers had to write their own implementation. Unfortunately, having a little-used implementation leads to the performance issues described by the OP, or to bugs such as CVE-2016-0799.
I need to write a program where, at run time, a set of integers of arbitrary size is taken as input. They will be separated by white space. At the end, a newline is given, marking the end of input. How do I save them into an array of integers so that I can display them later? I think it is a little difficult because the number of values that will be entered is not known at compile time.
Sounds like homework.
Correct me if I am wrong and I will give you more than hints.
You can either declare an array of a really large size that could not possibly be filled by the user input, then use scanf or something like that to grab the integers until you hit '\n', or you can grab one integer at a time, allocating memory as you go, using a combination of malloc and memcpy calls (or realloc, which does both). The first option should never be used in a real-world problem, and I am certainly not advocating such practices even though your textbook probably tells you to do it this way.
There is an example just like this in K&R.
This is a typical problem you will have in C. The solution is usually one of two options.
1) Use a really large array that is large enough to hold the input. Sometimes this is a poor option when the data could be really large. An example of when it would be a bad idea is when you are saving a video frame or a large text file to the array. This also opens you up to buffer overrun attacks if the input turns out to be longer than the array. However, this is sometimes a good quick hack for smaller (homework) programs where you can count on the user (i.e. your professor, who is not trying to break your program) not to input thousands of characters. Usually this is considered bad practice; please consider my 2nd option for the security reason I mentioned before.
2) Use dynamic arrays (i.e. malloc). This is probably what your professor wants you to do, as this sounds like a typical problem to set when a student is first learning pointers and arrays. This is a great approach, just remember to call free on your memory when you are finished. The tricky part here is that malloc still needs a size when you call it (at run time, not at compile time of course), so you either count the values first or start with a guess and grow the block with realloc when it fills up.
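A minimal sketch of the dynamic-array option, assuming the integers arrive on a single line of a stdio stream: start small and let realloc double the capacity whenever the array fills up. The function name read_int_line is made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads whitespace-separated ints from in until the end
   of the line. Returns a malloc'd array (the caller frees it) and stores the
   element count in *count. Returns NULL if nothing was read or memory ran out. */
int *read_int_line(FILE *in, size_t *count)
{
    int *values = NULL;
    size_t used = 0, capacity = 0;
    int c, value;

    assert(in != NULL && count != NULL);
    *count = 0;
    for (;;) {
        /* skip blanks, but stop at the newline that ends the input */
        do {
            c = getc(in);
        } while (c == ' ' || c == '\t' || c == '\r');
        if (c == '\n' || c == EOF)
            break;
        ungetc(c, in);
        if (fscanf(in, "%d", &value) != 1)
            break;                      /* not a number: stop reading */
        if (used == capacity) {         /* full: double the capacity */
            size_t new_capacity = capacity ? capacity * 2 : 8;
            int *grown = realloc(values, new_capacity * sizeof *grown);
            if (grown == NULL) {
                free(values);
                return NULL;
            }
            values = grown;
            capacity = new_capacity;
        }
        values[used++] = value;
    }
    *count = used;
    return values;
}
```

Typical use would be int *v = read_int_line(stdin, &n); followed by a loop that prints v[0] through v[n - 1] and a final free(v). Doubling the capacity keeps the number of realloc calls small even for long inputs.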