Customizable implementation of sprintf() - c

Can anyone point me to a source code file or to a package that has a good, reusable implementation of sprintf() in C which I can customize as per my own need?
An explanation on why I need it: Strings are not null terminated in my code (binary compatible). Therefore sprintf("%s") is useless unless I fix the code to understand how to render string.
Thanks to quinmars for pointing out that there is a way to print a string through %s without it being null terminated. Though it solves the requirement right now, I shall eventually need a sprintf (or snprintf) implementation for higher-level functions which use variants. Of the others mentioned so far, the SQLite implementation seems to me to be the best. Thanks to Doug Currie for pointing it out.

I haven't tried it, because I don't have a compiler here, but reading the man page, it looks like you can pass a precision for '%s':
... If a precision is given, no null character
need be present; if the precision is not specified, or is greater
than the size of the array, the array must contain a terminating
NUL character.
So have you tried to do something like that?
snprintf(buffer, sizeof(buffer), "%.*s", bstring_len, bstring);
As I said, I haven't tested it, and if it works, it works of course only if you have no '\0' byte inside the string.
EDIT: I've tested it now and it works!
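A minimal sketch of that idea wrapped in a function. The name format_counted and the surrounding brackets are made up for illustration; the only real trick is the "%.*s" precision, which has to be passed as an int.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format a length-delimited (not NUL-terminated) byte
   string into dst. Returns the snprintf result, or -1 if len does not fit
   in an int (the precision argument of "%.*s" must be an int). */
int format_counted(char *dst, size_t dstsize, const char *src, size_t len)
{
    if (len > (size_t)INT_MAX)
        return -1;
    /* snprintf reads at most len bytes from src, so no terminator is needed */
    return snprintf(dst, dstsize, "[%.*s]", (int)len, src);
}
```

As noted above, this stops early if the data contains an embedded '\0' byte, so it is not a fit for arbitrary binary data.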

You should really be looking for snprintf (sprintf respecting output buffer size); google suggests http://www.ijs.si/software/snprintf/.

There is a nice public domain implementation as part of SQLite here.
I agree with Dickon Reed that you want snprintf, which is included in the SQLite version.

I have used this guy's source code.
It is small, understandable and easy to modify (as opposed to glib & libc).

According to this link- http://www.programmingforums.org/thread12049.html :
If you have the full gcc distribution,
the source for the C library (glib or
libc) is one of the subdirectories
that comes for the ride.
So you can look it up there.
I don't know how helpful that will be...

The only reason I can think of for wanting to modify sprintf is to extend it, and the only reason for extending it is when you're on your way to writing some sort of parser.
If you are looking to create a parser for something like a coding language, XML, or really anything with a syntax, I suggest you look into Lexers and Parser Generators (2 of the most commonly used ones are Flex and Bison) which can pretty much write the extremely complex code for parsers for you (though the tools themselves are somewhat complex).
Otherwise, you can find the code for it in the source files that are included with Visual Studio (at least 2005 and 2008, others might have it, but those 2 definitely do).

snprintf from glibc is customizable via hook/handler mechanism

Just an idea...
Example:
#include <stdio.h>
#include <string.h>
#include <stdarg.h>
/* Wrapper with its own name: defining a function called sprintf would clash with the C library */
int my_sprintf(char *str, const char *format, ...)
{
    /* Here you can rewrite your inputs before handing them to the standard formatter */
    va_list args;
    int written;
    va_start(args, format);
    written = vsprintf(str, format, args); /* This still uses the standard formatting */
    va_end(args);
    return written; /* Number of characters written, like sprintf; post-process str before this if you want */
}
int main(void)
{
    char h[20];
    my_sprintf(h, "hei %d ", 10);
    printf("t %s\n", h);
    getchar();
    return 0;
}

Look at Hanson's C Interfaces: Implementations and Techniques. It is an interesting book in that it is written using Knuth's Literate Programming techniques, and it specifically includes an extensible formatted I/O interface based on snprintf().

This is a small implementation originally authored by Marco Paland; I have been maintaining it, fixing many bugs and adding missing functionality, in this repository: eyalroz/printf. It's ~1170 lines of code for full C99 sprintf/vsprintf/etc., compared to SQLite's 3993 (although SQLite could use a lot less; it includes this sqliteint.h header with a lot of unrelated stuff). Note that %a is not yet supported, nor are sub-normal doubles.

Related

Portable way to use the regex(3) functions on a wide char string in C

There are functions like regwcomp(3) etc. on some systems, but this does not seem to be a portable solution at the moment. When there is a wchar_t string, what is the suggested portable solution (not Linux or GNU specific) to use the regex(3) functions (which normally work with char strings only)? In my case it is not really necessary that the pattern or text to match is non-7-bit ASCII, the problem is that the code used wchar_t for other reasons.
If anyone else has this problem, feel free to borrow the functions my_regwcomp and my_regwexec that I had to write recently. You can find them in this source file in the ProofPower system. These functions simulate the regwcomp and regwexec functions of FreeBSD using the POSIX regcomp and regexec functions.
PS: my code is part of a Motif application; if you replace XtMalloc, XtRealloc and XtFree with malloc, realloc and free, it should work in any standard C/C++ development framework. Please add a comment to this answer if you need any help getting my functions working in your environment.
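For readers who only need a yes/no match, here is a rough sketch of the general approach: convert the wide strings to multibyte strings in the current locale, then hand them to POSIX regcomp and regexec. This is not the ProofPower code; wide_to_mb and wide_regex_match are names made up for this example. If you also need match offsets, note that regexec reports them in bytes, which only equal wchar_t indices when every character converts to a single byte (as with 7-bit ASCII).

```c
#include <regex.h>
#include <stdlib.h>
#include <wchar.h>

/* Convert a wide string to a freshly allocated multibyte string using the
   current locale. Returns NULL on failure; the caller frees the result. */
static char *wide_to_mb(const wchar_t *ws)
{
    size_t need = wcstombs(NULL, ws, 0);   /* bytes needed, excluding the NUL (POSIX) */
    char *mb;

    if (need == (size_t)-1)
        return NULL;                       /* character not representable */
    mb = malloc(need + 1);
    if (mb != NULL)
        wcstombs(mb, ws, need + 1);
    return mb;
}

/* Hypothetical helper: 1 = match, 0 = no match, -1 = conversion or regcomp error. */
int wide_regex_match(const wchar_t *pattern, const wchar_t *text)
{
    int result = -1;
    regex_t re;
    char *p = wide_to_mb(pattern);
    char *t = wide_to_mb(text);

    if (p != NULL && t != NULL &&
        regcomp(&re, p, REG_EXTENDED | REG_NOSUB) == 0) {
        result = (regexec(&re, t, 0, NULL, 0) == 0) ? 1 : 0;
        regfree(&re);
    }
    free(p);
    free(t);
    return result;
}
```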

Can stdio be used while coding for a Kernel...?

I need to build an OS, a very small and basic one, with the least possible functionality, coded in C.
Probably a CUI OS which does some memory management and has at least a text editor and a calculator; it's just going to be an experiment in writing code that has full and direct control over the hardware.
Still, I'll require an interface, which will need input/output functions like printf(&args), scanf(&args). Now my basic question is: should I use existing headers or code everything from scratch, and why?
I'd be very thankful to you guys for any help.
First, you can't link against anything from libc ... you're going to have to code everything from scratch.
Now having worked on a micro-kernel myself, I would not use the actual stdio headers that come with libc since they are going to be cluttered with a lot of extra information that will be either irrelevant for your OS, or will create compiler errors due to missing definitions, etc. What I would do though is keep the function signatures for these standard functions the same ... so in the end you would have a file called stdio.h for your OS, but it would be a very stripped down header file with the basic minimum requirements for your needs, and only having the standard I/O functions you need, with the correct standard signatures.
Keep in mind that on the back-end, i.e., in your stdio.c file, you're going to have to point these functions to a custom console driver or some other type of character driver for your display. Either that, or you could just use them as wrappers for some other kernel-level display printing routine. You are also going to want to make sure that even though you may use a #include <stdio.h> directive in your other OS code modules to access these printing functions, you do not link against libc. With gcc, compile with -ffreestanding and link with -nostdlib.
Just retarget newlib.
printf, scanf, etc. rely on implementation-specific functions to get a single char or print a single char. You can then make your stdin and stdout the UART 1, for example.
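To make the "single char" idea concrete, here is a minimal sketch of a printf-like routine that sends everything through one put-character hook. All names (kprintf, kprintf_set_output, ram_console_putc) are invented for this example, it is not newlib's mechanism, and it handles only a handful of conversions. The RAM buffer is just a stand-in so the sketch can be tried on a hosted system; a real kernel would write to a UART register or video memory in that function.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical hook: the kernel points this at its console or UART driver. */
typedef void (*kputc_fn)(char c);
static kputc_fn kputc_hook;

void kprintf_set_output(kputc_fn fn) { kputc_hook = fn; }

/* Stand-in "driver" for this sketch: appends to a RAM buffer. */
char ram_console[128];
size_t ram_console_len;

void ram_console_putc(char c)
{
    if (ram_console_len + 1 < sizeof ram_console) {
        ram_console[ram_console_len++] = c;
        ram_console[ram_console_len] = '\0';
    }
}

/* Send one character to the hook (if any); always counts as one character. */
static int emit(char c)
{
    if (kputc_hook)
        kputc_hook(c);
    return 1;
}

/* Print v in the given base (2..16) without using any library routine. */
static int emit_unsigned(unsigned long v, unsigned base)
{
    char tmp[sizeof(unsigned long) * 8];
    size_t i = 0;
    int n = 0;

    do {
        tmp[i++] = "0123456789abcdef"[v % base];
        v /= base;
    } while (v != 0);
    while (i > 0)
        n += emit(tmp[--i]);
    return n;
}

/* Supports only %c, %s, %d, %u, %x and %%; no width, precision or flags. */
int kprintf(const char *fmt, ...)
{
    va_list ap;
    int n = 0;

    va_start(ap, fmt);
    for (; *fmt; fmt++) {
        if (*fmt != '%') {
            n += emit(*fmt);
            continue;
        }
        fmt++;
        switch (*fmt) {
        case 'c':
            n += emit((char)va_arg(ap, int));
            break;
        case 's': {
            const char *s = va_arg(ap, const char *);
            if (!s)
                s = "(null)";
            while (*s)
                n += emit(*s++);
            break;
        }
        case 'd': {
            int v = va_arg(ap, int);
            unsigned long u = (unsigned long)v;
            if (v < 0) {
                n += emit('-');
                u = 0UL - u;          /* magnitude, safe even for INT_MIN */
            }
            n += emit_unsigned(u, 10);
            break;
        }
        case 'u': n += emit_unsigned(va_arg(ap, unsigned), 10); break;
        case 'x': n += emit_unsigned(va_arg(ap, unsigned), 16); break;
        case '%': n += emit('%'); break;
        case '\0': fmt--; break;      /* trailing '%': let the loop see the terminator */
        default:  n += emit('%'); n += emit(*fmt); break;
        }
    }
    va_end(ap);
    return n;
}
```

Usage would be along the lines of kprintf_set_output(ram_console_putc); followed by kprintf("x=%d %s", 5, "ok");.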
The kernel itself does not require the printf and scanf functions unless you intend to stay in kernel mode and run the apps you have planned there. But for basic printf and scanf features, you can write your own printf and scanf functions, which would provide basic support for printing and taking input. I do not have much experience with this, but you can try making a console buffer, where the keyboard driver puts the ASCII characters it reads (after conversion from scan codes), and then make printf and scanf work on it. I have one basic implementation where I wrote a gets instead of scanf and kept things simple. To get an integer from the input you can write an atoi function to convert the string to a number.
To port other libraries, you need to provide the components which the libraries depend on. You need to decide whether you can build that support into the kernel so that the libraries can be ported. If that is too difficult, then coding some basic input/output functions yourself won't be bad at this stage, I think.

Which string manipulation functions should I use?

In my Windows/Visual C environment there's a wide range of alternatives for doing the same basic string manipulation tasks.
For example, for doing a string copy I could use:
strcpy, the ANSI C standard library function (CRT)
lstrcpy, the version included in kernel32.dll
StrCpy, from the Shell Lightweight Utility library
StringCchCopy/StringCbCopy, from a "safe string" library
strcpy_s, security enhanced version of CRT
While I understand that all these alternatives have an historical reason, can I just choose a consistent set of functions for new code? And which one? Or should I choose the most appropriate function case by case?
First of all, let's review pros and cons of each function set:
ANSI C standard library function (CRT)
Functions like strcpy are the one and only choice if you are developing portable C code. Even in a Windows-only project, it might be a wise thing to have a separation of portable vs. OS-dependent code.
These functions are often optimized at the assembly level and are therefore very fast.
There are some drawbacks:
they have many limitations and therefore often you still have to call functions from other libraries or provide your own versions
there are some archaisms like the infamous strncpy
Kernel32 string functions
Functions like lstrcpy are exported by kernel32 and should be used only when trying to avoid any dependency on the CRT. You might want to do that for two reasons:
avoiding the CRT payload for an ultra lightweight executable (unusual these days but not in the 90s!)
avoiding initialization issues (if you launch a thread with CreateThread instead of _beginthread).
Moreover, the kernel32 function could be more optimized than the CRT version: when your executable runs on Windows 12 optimized for a Core i13, kernel32 could use an assembly-optimized version.
Shell Lightweight Utility Functions
The same considerations made for the kernel32 functions apply here, with the added value of some more complex functions. However, I doubt that they are actively maintained and I would just skip them.
StrSafe Functions
The StringCchCopy/StringCbCopy functions are usually my personal choice: they are very well designed, powerful, and surprisingly fast (I also remember a whitepaper that compared performance of these functions to the CRT equivalents).
Security-Enhanced CRT functions
These functions have the undoubted benefit of being very similar to ANSI C equivalents, so porting legacy code is a piece of cake. I especially like the template-based version (of course, available only when compiling as C++). I really hope that they will be eventually standardized. Unfortunately they have a number of drawbacks:
although a proposed standard, they have been basically rejected by the non-Windows community (probably just because they came from Microsoft)
when they fail, they don't just return an error code but execute an invalid parameter handler
Conclusions
While my personal favorite for Windows development is the StrSafe library, my advice is to use the ANSI C functions whenever possible, as portable code is always a good thing.
In real life, I developed a personalized portable library, with prototypes similar to the Security-Enhanced CRT functions (including the powerful template-based technique), that relies on the StrSafe library on Windows and on the ANSI C functions on other platforms.
My personal preference, for both new and existing projects, is the StringCchCopy/StringCbCopy versions from the safe string library. I find these functions to be overall very consistent and flexible. And they were designed from the ground up with safety / security in mind.
I'd answer this question slightly differently. Do you want to have portable code or not? If you want to be portable you cannot rely on anything else but strcpy, strncpy, or the standard wide character "string" handling functions.
Then if your code just has to run under Windows you can use the "safe string" variants.
If you want to be portable and still want to have some extra safety, then you should check cross-platform libraries like e.g.
glib or
libapr
or other "safe string libraries" like e.g:
SafeStrLibrary
I would suggest using functions from the standard library, or functions from cross-platform libraries.
I would stick to one, I would pick whichever one is in the most useful library in case you need to use more of it, and I would stay away from the kernel32.dll one as it's windows only.
But these are just tips, it's a subjective question.
Among those choices, I would simply use strcpy. At least strcpy_s and lstrcpy are cruft that should never be used. It's possibly worthwhile to investigate those independently written library functions, but I'd be hesitant to throw around nonstandard library code as a panacea for string safety.
If you're using strcpy, you need to be sure your string fits in the destination buffer. If you just allocated it with size at least strlen(source)+1, you're fine as long as the source string is not simultaneously subject to modification by another thread. Otherwise you need to test if it fits in the buffer. You can use interfaces like snprintf or strlcpy (nonstandard BSD function, but easy to copy an implementation) which will truncate strings that don't fit in your destination buffer, but then you really need to evaluate whether string truncation could lead to vulnerabilities in itself. I think a much better approach when testing whether the source string fits is to make a new allocation or return an error status rather than performing blind truncation.
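As a rough illustration of "make a new allocation or return an error status" instead of truncating, here is a sketch. The function name copy_or_alloc and its ownership convention are assumptions made for this example, not an established API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy src into buf if it fits (terminator included);
   otherwise copy it into a freshly malloc'ed block of exactly the right size.
   Returns the pointer that holds the copy, or NULL if allocation failed.
   The caller must free() the result only when it differs from buf. */
char *copy_or_alloc(char *buf, size_t bufsize, const char *src)
{
    size_t need = strlen(src) + 1;   /* bytes including the NUL */
    char *dst = buf;

    if (need > bufsize) {
        dst = malloc(need);
        if (dst == NULL)
            return NULL;             /* report the error, never truncate */
    }
    memcpy(dst, src, need);
    return dst;
}
```

The common case (the string fits in a stack buffer) costs no allocation, and the string is never silently shortened.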
If you'll be doing a lot of string concatenation/assembly, you really should write all your code to manage the length and current position as you go. Instead of:
strcpy(out, str1);
strcat(out, str2);
strcat(out, str3);
...
You should be doing something like:
size_t l, n = outsize;
char *s = out;
l = strlen(str1);
if (l>=n) goto error;
strcpy(s, str1);
s += l;
n -= l;
l = strlen(str2);
if (l>=n) goto error;
strcpy(s, str2);
s += l;
n -= l;
...
Alternatively you could avoid modifying the pointer by keeping a current index i of type size_t and using out+i, or you could avoid the use of size variables by keeping a pointer to the end of the buffer and doing things like if (l>=end-s) goto error;.
Note that, whichever approach you choose, the redundancy can be condensed by writing your own (simple) functions that take pointers to the position/size variable and call the standard library, for instance something like:
if (!my_strcpy(&s, &n, str1)) goto error;
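One possible shape for such a helper, as a sketch. The name my_strcpy and the argument order come from the call above; the return convention (1 on success, 0 on failure) and everything else are assumptions.

```c
#include <stddef.h>
#include <string.h>

/* *pos is the current write position and *left the number of bytes still
   available there, including room for the NUL. Returns 1 on success and
   0 if src does not fit; on failure nothing is modified. */
int my_strcpy(char **pos, size_t *left, const char *src)
{
    size_t len = strlen(src);

    if (len >= *left)
        return 0;
    memcpy(*pos, src, len + 1);   /* copy the terminator too */
    *pos += len;                  /* next write overwrites the NUL */
    *left -= len;
    return 1;
}
```

Because the position and remaining size travel together, each call is O(length of src) rather than rescanning the output the way strcat does.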
Avoiding strcat also has performance benefits; see Schlemiel the Painter's algorithm.
Finally, you should note that a good 75% of the string copying and assembly people perform in C is utterly useless. My theory is that the people doing it come from backgrounds in script languages where putting together strings is what you do all the time, but in C it's not useful that often. In many cases, you can get by with never copying strings at all, using the original copies instead, and get much better performance and simpler code at the same time. I'm reminded of a recent SO question where OP was using regexec to match a regular expression, then copying out the result just to print it, something like:
char *tmp = malloc(match.end-match.start+1);
memcpy(tmp, src+match.start, match.end-match.start);
tmp[match.end-match.start] = 0;
printf("%s\n", tmp);
free(tmp);
The same thing can be accomplished with:
printf("%.*s\n", (int)(match.end-match.start), src+match.start);
No allocations, no cleanup, no error cases (the original code crashed if malloc failed).
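For reference, the same idea with the actual POSIX types might look roughly like this. In regmatch_t the offsets are called rm_so and rm_eo; the wrapper name format_first_match is made up, and it writes into a caller-supplied buffer with snprintf only so the result is easy to inspect.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format the first match of pattern in src into out
   using "%.*s", with no temporary copy of the matched text. Returns the
   match length, or -1 if there is no match or the pattern is invalid. */
int format_first_match(char *out, size_t outsize,
                       const char *pattern, const char *src)
{
    regex_t re;
    regmatch_t m;
    int len = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, src, 1, &m, 0) == 0) {
        len = (int)(m.rm_eo - m.rm_so);   /* precision must be an int */
        snprintf(out, outsize, "%.*s", len, src + m.rm_so);
    }
    regfree(&re);
    return len;
}
```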

Should I use secure versions of POSIX functions on MSVC - C

I am writing some C code which is expected to compile on multiple compilers (at least on MSVC and GCC). Since I am a beginner in C, I have all warnings turned on and warnings are treated as errors (-Werror in GCC & /WX in MSVC) to prevent me from making silly mistakes.
When I compiled some code that uses strcpy on MSVC, I got a warning like this:
warning C4996: 'strcpy': This function or variable may be unsafe. Consider using strcpy_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
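I am a bit confused. A lot of common functions are deprecated on MSVC. Should I use the secured versions when on Windows? If yes, should I wrap strcpy in something like this?
#include <string.h>
char *my_strcpy(char *dst, size_t dstsize, const char *src)
{
#ifdef _WIN32
    /* use strcpy_s */
    strcpy_s(dst, dstsize, src);
#else
    /* use strcpy */
    (void)dstsize;
    strcpy(dst, src);
#endif
    return dst;
}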
Any thoughts?
Whenever you move data between non-constant-size buffers, you have to (gasp! omg!) actually think about whether it fits. Using functions (like the MS-specific strcpy_s or the BSD strlcpy) that purport to be "safe" will protect you from some obvious buffer overflow conditions, but won't protect you from the bugs that result from string truncation. It also won't protect you from integer overflows in computing the necessary sizes of buffers.
Unless you're an expert dealing with C strings, I would recommend forgetting about special functions and commenting every line of your code that will perform variable-length/position writes with a justification for how you know, at this point in the program, that the length/offset you're about to use is within the bounds of the size of the buffer. Do this for lines where you perform arithmetic on sizes/offsets too - document how you know that the arithmetic will not overflow, and add tests for overflow if you find you don't know.
Another approach is to completely wrap all your string handling in a string object that stores the length of the buffer along with the string and automatically reallocates when a string needs to be enlarged, and then only use const char * for read-only access to strings when you need to pass them to system functions or other libraries. This will sacrifice a good bit of the performance you'd expect from C, but it will help you ensure that you don't make mistakes. Just don't take it to the extreme. There's no need to duplicate stuff like strchr, strstr, etc. in your string wrapper. Just provide methods to duplicate string objects, concatenate them, and truncate them, and then with the existing library functions that operate on const char * you can do just about anything you'd want to.
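A bare-bones sketch of such a string object, just to show the shape. The type and function names (strbuf, strbuf_append, strbuf_cstr, strbuf_free) are invented here, and appending a buffer to itself is deliberately not supported.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *data;   /* NUL-terminated once non-NULL */
    size_t len;    /* length, excluding the NUL */
    size_t cap;    /* allocated bytes */
} strbuf;

/* Append src, growing the buffer as needed. Returns 0 on success, -1 on
   allocation failure or size overflow; the buffer is unchanged on failure.
   src must not point into sb's own storage. */
int strbuf_append(strbuf *sb, const char *src)
{
    size_t add = strlen(src);
    size_t need;

    if (add > (size_t)-1 - sb->len - 1)
        return -1;                        /* len + add + 1 would overflow */
    need = sb->len + add + 1;
    if (need > sb->cap) {
        size_t newcap = sb->cap ? sb->cap : 16;
        char *p;
        while (newcap < need) {
            if (newcap > (size_t)-1 / 2) { newcap = need; break; }
            newcap *= 2;                  /* geometric growth */
        }
        p = realloc(sb->data, newcap);
        if (p == NULL)
            return -1;
        sb->data = p;
        sb->cap = newcap;
    }
    memcpy(sb->data + sb->len, src, add + 1);
    sb->len += add;
    return 0;
}

/* Read-only view for passing to strchr, strstr, system calls and so on. */
const char *strbuf_cstr(const strbuf *sb) { return sb->data ? sb->data : ""; }

void strbuf_free(strbuf *sb)
{
    free(sb->data);
    sb->data = NULL;
    sb->len = sb->cap = 0;
}
```

A zero-initialized strbuf (strbuf sb = {0};) is a valid empty string, so no separate init function is needed in this sketch.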
There are lots and lots of discussions about this topic here on SO. The usual suspects like strncpy, strlcpy and whatever will pop up here again, I'm sure. Just type "strcpy" in the search box and read some of the longer threads to get an overview.
My advice is: Whatever your final choice will be, it is a good idea to follow the DRY principle and continue to do it as in your example of my_strcpy(). Don't throw the raw calls all over your code, use wrappers and centralize them in your own string handling library. This will reduce overall code (boilerplate), and you have one central location to make modifications, if you change your mind later.
Of course this opens up some other cans of worms, especially for a beginner: memory handling responsibility and interface design. Each is a topic of its own, and 5 people will give you 10 suggestions of how to do it. A central library usually has the nice effect that it enforces a decision, which you will follow throughout your whole codebase, instead of using method a in module A and method b in module B, causing you trouble when you try to connect A with B...
I would tend to use the safer function snprintf †, which is available on both platforms, rather than having different paths depending on platform. You will need to use the define to prevent the warnings on MSVC.
† though possibly slightly less safe: on truncation you may get a string which is not nul-terminated (that is how MSVC's older _snprintf behaves; C99 snprintf always terminates when the size is nonzero), so you must check the return value, but it won't cause a buffer overflow.
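Checking the return value could look like the following sketch. It assumes C99 snprintf semantics (the return value is the length that would have been written), and the wrapper name copy_with_snprintf is made up.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical wrapper: 0 = copied completely, 1 = truncated, -1 = error. */
int copy_with_snprintf(char *dst, size_t dstsize, const char *src)
{
    int n = snprintf(dst, dstsize, "%s", src);

    if (n < 0)
        return -1;                 /* encoding or output error */
    if ((size_t)n >= dstsize)
        return 1;                  /* src needed n + 1 bytes: truncated */
    return 0;
}
```

The caller still has to decide what a truncated result means for the program; the wrapper only makes sure it cannot go unnoticed.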

How can I use _stprintf in my programs, with and without UNICODE support?

Microsoft's <tchar.h> defines _stprintf as swprintf if _UNICODE is defined, and sprintf if not. But these functions take different arguments! In swprintf, the second argument is the buffer size, but sprintf doesn't have this.
Did somebody goof? If so, this is a big one. How can I use _stprintf in my programs, and have them work with and without _UNICODE?
You're seeing parallel evolution here. swprintf is a latecomer to standard C, after it was discovered that (A) 8 bits is insufficient for text and (B) you should pass buffer sizes along with buffers. TCHAR is a Microsoft idea to unify ASCII and Unicode APIs. They dropped the ball, missing point (B). The proper TCHAR solution should have been to define _stprintf as either swprintf or snprintf.
The solution is then to simply wrap <tchar.h> and do this yourself.
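A sketch of what such a wrapper could look like, written with standard C names only so it does not depend on <tchar.h>. MY_UNICODE, my_tchar, MY_T and my_sntprintf are names invented for this example (stand-ins for _UNICODE, TCHAR, _T and a size-taking _stprintf), and format_id is just a demo caller.

```c
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define MY_T(s) L##s
#define my_sntprintf swprintf    /* (buffer, count, format, ...) */
#else
typedef char my_tchar;
#define MY_T(s) s
#define my_sntprintf snprintf    /* (buffer, count, format, ...) */
#endif

/* The same call compiles in both builds: buffer, element count, format, args. */
int format_id(my_tchar *buf, size_t count, int id)
{
    return my_sntprintf(buf, count, MY_T("id=%d"), id);
}
```

One difference remains even with matching argument lists: on truncation snprintf returns the length it would have needed, while swprintf returns a negative value, so callers should treat any result that is negative or not less than count as a failure.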
these functions take different arguments!
There are two different versions available with MS compilers. Take a look here.
This is in keeping with the ANSI standard. But I think that does not answer your question. I'll skip it for a while and rather tell you how you can have uniformity.
have them work with and without _UNICODE?
You are better off using the 'safe string functions' as per MS recommendations. See this. Use '_stprintf_s' and I think you will get around your problem.
Did somebody goof?
EDITED: I don't think so. I don't have the Rationale handy to give you the answer. I'll post an update when I get my hands on something more concrete. In the meantime look at MSalters' explanation.
A curious thing is MS's C runtime does not claim compatibility with the ISO standard.
Disclaimer: I am not defending the giant of Redmond, only pointing out stuff that strikes me as odd!
This may not directly answer the question, but one alternative is to use _stprintf_s. You have to add the extra parameter, but then it will still compile both ways, and is more future-proof.

Resources