I'm well on my way on a programming language I've written in Java that compiles directly into C99 code. I want to add string interpolation functionality and am not sure what the resulting C code would be. In Ruby, you can interpolate strings: puts "Hello #{name}!" What would be the equivalent in C?
So-called interpolated strings are really just expressions in disguise: a string concatenation of the various parts of the string content, alternating string literal fragments with interpolated subexpressions converted to string values.
The interpolated string
"Hello #{name}!"
is equivalent to
concatenate(concatenate("Hello ",toString(name)),"!")
The generalization to more complicated interpolated strings should be obvious.
You can compile it to the equivalent of this in C. You will need a big library of type-specific toString operations to match the types in your language. User defined types will be fun.
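As a rough sketch of what that could look like, assuming name is already a C string (so toString is the identity) and that your runtime hands out heap-allocated strings: the concatenate helper and the hello function below are hypothetical names, not part of any standard library.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical runtime helper: returns a newly allocated string holding
   a followed by b, or NULL if allocation fails. The caller frees it. */
char *concatenate(const char *a, const char *b)
{
    size_t la = strlen(a);
    size_t lb = strlen(b);
    char *out = malloc(la + lb + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1);   /* lb + 1 copies the terminating NUL too */
    return out;
}

/* What "Hello #{name}!" might be lowered to by the compiler. */
char *hello(const char *name)
{
    char *tmp = concatenate("Hello ", name);
    if (tmp == NULL)
        return NULL;
    char *result = concatenate(tmp, "!");
    free(tmp);                     /* the intermediate string is no longer needed */
    return result;
}
```

Calling hello("World") should give back "Hello World!". Note that every intermediate result has to be freed, which is something your code generator (or a garbage collector in your runtime) would have to take care of.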
You may be able to implement special cases of this using "sprintf", which is the string-building version of C's "printf" library function, in cases where the types of the interpolated expressions match the limited set of types that printf format strings can handle (e.g., native ints and floats).
The printf family would be good to read, as is the scanf family of functions in the C library.
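For the printf-compatible case, here is a minimal sketch using snprintf twice: once with a NULL buffer to learn the required length, and once to do the formatting. The function name interpolate_greeting and the example string are made up for illustration; your compiler would generate the format string from the interpolated literal.

```c
#include <stdio.h>
#include <stdlib.h>

/* Format string a compiler might generate from
   "Hello #{name}, you are #{age}!" (hypothetical example). */
#define GREETING_FMT "Hello %s, you are %d!"

/* Returns a heap-allocated result, or NULL on failure. The caller frees it. */
char *interpolate_greeting(const char *name, int age)
{
    /* First pass: with a NULL buffer and size 0, snprintf only reports
       how many characters the result needs (excluding the NUL). */
    int len = snprintf(NULL, 0, GREETING_FMT, name, age);
    if (len < 0)
        return NULL;

    char *buf = malloc((size_t)len + 1);
    if (buf == NULL)
        return NULL;

    /* Second pass: format into the exactly sized buffer. */
    snprintf(buf, (size_t)len + 1, GREETING_FMT, name, age);
    return buf;
}
```

I'd prefer snprintf over plain sprintf here since the size of the interpolated values is generally not known at compile time.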
I'm programming in Windows and MSVC. There are two ways to write a DEBUG_PRINT statement as I know of:
printf(__FUNCTION__": Error code: %x\n", ErrorCode);
printf("%s: Error code: %x\n", __FUNCTION__, ErrorCode);
Is it okay to concatenate predefined macro with strings like this? I don't know if predefined macro like __FUNCTION__ or __LINE__ is a legit string literal. And intuitively, seems like a dangerous way to treat strings like this in C.
And what's the difference between these two? As I used /FAcs compiler option to output the code snippet to assembly, I really can't see much of a difference.
First of all, __FUNCTION__ is not in the C standard; you should probably use __func__ instead (although older versions of Microsoft's C compiler did not support it).
Second, the __FUNCTION__/__func__ "macro" is not really a macro (or at least not normally; Microsoft's compiler seems to behave differently). It behaves more like a local variable, and therefore it isn't a candidate for string concatenation. You should use string formatting instead, since that makes your code more portable.
__LINE__ is a macro, but it doesn't work with string concatenation directly, since it expands to a number rather than a string (which, by the way, can be useful in other cases). However, you can use the preprocessor to stringify it: the XSTR macro first expands its argument and then stringifies the result, while STR stringifies its argument without expanding it:
#define STR(x) # x
#define XSTR(x) STR(x)
printf("I'm on line #" XSTR(__LINE__));
The __FILE__ macro (which is also a macro) does expand to a string literal which plays well together with string concatenation directly.
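Putting the three together, a portable variant might look like the sketch below. describe_location is a hypothetical helper name; it writes into a buffer with snprintf only so that the result is easy to inspect.

```c
#include <stdio.h>

#define STR(x) #x          /* stringifies x without expanding it */
#define XSTR(x) STR(x)     /* expands x first, then stringifies the result */

/* Hypothetical helper: writes "<file>:<line> in <function>" into buf.
   Returns whatever snprintf returns (the length needed, or a negative
   value on error). */
int describe_location(char *buf, size_t size)
{
    /* __FILE__ and XSTR(__LINE__) are string literals, so the compiler
       pastes them together. __func__ is an array, not a literal, so it
       has to go through %s at run time. */
    return snprintf(buf, size, "%s in %s",
                    __FILE__ ":" XSTR(__LINE__), __func__);
}
```

The file/line part is passed as an argument rather than pasted into the format string, so a stray % in a file name can't be misread as a conversion.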
The reason you don't see any difference is that the compiler knows what printf does and can use that for optimization. It would figure out that it doesn't need to rely on printf code to expand the %s at runtime since it can do it at compile time.
The former will concatenate the function name in __FUNCTION__ with the format string at compile-time. The second will format it into the output at runtime.
This assumes it's a pre-defined macro (it's not standard). MSVC has __FUNCTION__ as a proper string literal, GCC does not.
__LINE__ is supported by GCC, and expands to a decimal integer, i.e. not a string.
For performance reasons, I would suggest always using the first way when possible, i.e. when the two strings are compile-time constants. There will be a price to pay, as usual: the string pool will be larger, since each use of the macro creates a unique format string. For desktop-class systems, this is probably negligible.
The difference is that in the first case the string literal is concatenated with the format string during the compiler's translation phases, while in the second case it is read at run time. So the first method saves a small amount of run-time formatting work.
If you know that the macro is a pre-defined string literal, I don't see anything wrong with the code.
That being said, I have no idea what __FUNCTION__ is. There is a standard C predefined identifier __func__, but it is not a string literal and should be treated like a const char array.
__LINE__ is a standard C macro that gives the source file line as an integer.
Thanks for answering. Looks like the first way is a legitimate string literal concatenation; it just happens at build time, and it's faster, just more space-consuming.
I think I'll stick with the first way. Thanks again.
In C, if I put a string literal like "Hello World \n\t\x90\x53" into my code, the compiler will parse the escape sequences into the correct bytes and leave the rest of the characters alone.
If the above string is instead supplied by the user, either on the command line or in a file, is there a way to invoke the compiler's functionality to get the same literal bytes into a char[]?
Obviously I could manually implement the functionality by hardcoding the escape sequences, but I would prefer not to do that if I can just invoke some compiler library instead.
No, there's no standard function to do that.
A suggestion for a non-standard library solution is to use glib's g_strcompress() function.
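If pulling in GLib is not an option, doing it by hand is not much code. Below is a minimal sketch that handles only a subset of the escapes (\n, \t, \r, \\, \" and \xHH); the unescape function is a hypothetical helper, and it assumes dst has room for at least strlen(src) + 1 bytes, which is always enough because the decoded output is never longer than the input.

```c
#include <ctype.h>
#include <stddef.h>

/* Numeric value of a hex digit; c must satisfy isxdigit(). */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Decodes a subset of C escape sequences from src into dst.
   Returns the number of bytes written, not counting the final NUL. */
size_t unescape(const char *src, char *dst)
{
    size_t n = 0;
    while (*src != '\0') {
        if (*src != '\\') {            /* ordinary character: copy it */
            dst[n++] = *src++;
            continue;
        }
        src++;                         /* skip the backslash */
        switch (*src) {
        case 'n':  dst[n++] = '\n'; src++; break;
        case 't':  dst[n++] = '\t'; src++; break;
        case 'r':  dst[n++] = '\r'; src++; break;
        case '\\': dst[n++] = '\\'; src++; break;
        case '"':  dst[n++] = '"';  src++; break;
        case 'x': {                    /* \x followed by hex digits */
            unsigned value = 0;
            src++;
            while (isxdigit((unsigned char)*src))
                value = value * 16 + (unsigned)hex_value((unsigned char)*src++);
            dst[n++] = (char)(value & 0xFF);
            break;
        }
        case '\0':                     /* trailing backslash: keep it */
            dst[n++] = '\\';
            break;
        default:                       /* unknown escape: copy it through */
            dst[n++] = '\\';
            dst[n++] = *src++;
            break;
        }
    }
    dst[n] = '\0';
    return n;
}
```

Because the result may contain embedded zero bytes (from \x00), use the returned length rather than strlen on the output.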
The following prints the percentage of memory used.
printf (TEXT("There is %*ld percent of memory in use.\n"),
WIDTH, statex.dwMemoryLoad);
WIDTH is defined to be equal to 7.
What does TEXT mean, and where is this sort of syntax defined in printf?
As others already said, TEXT is probably a macro.
To see what they become, simply look at the preprocessor output. If you are using gcc:
gcc -E file.c
Just guessing but TEXT is a char* to char* function that takes care of translating a text string for internationalization support.
Note that if this is the case, you may also be required to always use TEXT with a string literal (and not with expressions or variables), so that an external tool can detect all literals that need translation by a simple scan of the source code. For example, you should probably never write:
puts(TEXT(flag ? "Yes" : "No"));
and you should write instead
puts(flag ? TEXT("Yes") : TEXT("No"));
Something that is standard, but not used very often, is the parametric width of a field: for example, in printf("%*i", x, y) the first parameter x is the width used to print the second parameter y as a decimal value.
When used with scanf instead the * special char can be used to specify that you don't want to store the field (i.e. to "skip" it instead of reading it).
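A small sketch of both uses of *; the two function names are made up for the example, and snprintf/sscanf are used instead of printf/scanf only so the results end up in variables.

```c
#include <stdio.h>

/* Formats value right-aligned in a field whose width is passed as an
   argument, which is what the '*' in "%*ld" asks for. */
int format_padded(char *buf, size_t size, int width, long value)
{
    return snprintf(buf, size, "%*ld", width, value);
}

/* Reads two integers from text but stores only the second one: the '*'
   in "%*d" tells sscanf to parse the first and throw it away. */
int second_int(const char *text, int *out)
{
    return sscanf(text, "%*d %d", out);
}
```

With width 7 and value 42, the first function produces five spaces followed by 42. For the input "10 20", the second stores 20 and returns 1, since suppressed fields are not counted in the return value.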
TEXT() is probably a macro or function which returns a string value. I think it is user defined and does some manner of formatting on that string which is passed as an argument to the TEXT function. You should go to the function declaration for TEXT() to see what exactly it does.
TEXT() is a unicode support macro defined in winnt.h. If UNICODE is defined then it prepends L to the string making it wide.
Also see TEXT vs. _TEXT vs. _T, and UNICODE vs. _UNICODE blog post.
_TEXT() or _T() is a Microsoft-specific macro.
This MSDN link says
To simplify code development for various international markets,
the Microsoft run-time library provides Microsoft-specific "generic-text" mappings for many data types, routines, and other objects.
These mappings are defined in TCHAR.H.
You can use these name mappings to write generic code that can be compiled for any of the three kinds of character sets:
ASCII (SBCS), MBCS, or Unicode, depending on a manifest constant you define using a #define statement.
Generic-text mappings are Microsoft extensions that are not ANSI compatible.
_TEXT is a macro to make strings "character set neutral".
For example _T("HELLO");
Characters can be represented either in 8-bit ANSI encoding or in 16-bit Unicode notation.
If you define _TEXT for all strings and define a preprocessor symbol "_UNICODE", all such strings will follow UNICODE encoding. If you don’t define _UNICODE, the strings will all be ANSI.
Hence the macro _TEXT allows you to have all strings as UNICODE or ANSI.
So no need to change every time you change your character set.
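To make the mechanism concrete, here is a simplified sketch of how such a macro can be built. MY_TEXT and my_tchar are hypothetical stand-ins, not the real definitions from winnt.h or TCHAR.H, and the real headers have more layers than this.

```c
#include <wchar.h>

#ifdef UNICODE
/* Wide build: paste an L prefix onto the literal, giving a wchar_t string. */
#define MY_TEXT(s) L##s
typedef wchar_t my_tchar;
#else
/* Narrow build: leave the literal alone, giving an ordinary char string. */
#define MY_TEXT(s) s
typedef char my_tchar;
#endif

/* The same source line compiles to "HELLO" or L"HELLO"
   depending on whether UNICODE is defined. */
const my_tchar greeting[] = MY_TEXT("HELLO");
```

Since the L prefix is pasted on by the preprocessor, this only works on string literals, not on variables or expressions.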
I've got two C strings that I want to append and result should be assigned to an lhs variable. I saw a static initialization code like:
char* out = "May God" "Bless You";
The output really was "May GodBless You" when printed. I suspect this result may be the outcome of some undefined behaviour.
The code was actually in production and never gave wrong results. And it was not as if we had such statements in only one place: they appeared in multiple places in very stable code and were used to form SQL queries.
Does C standard allow such concatenation?
Yes, it is guaranteed.
Extract from http://en.wikipedia.org/wiki/C_syntax#String_literal_concatenation :
Adjacent string literals are concatenated at compile time; this allows long strings to be split over multiple lines, and also allows string literals resulting from C preprocessor defines and macros to be appended to strings at compile time
Yes, this concatenation is allowed in C; it is not undefined behavior.
Although I think it should produce "May GodBless You" (since there is no space in the quoted part)
The Standard says
5.1.1.2 Translation phases
6. Adjacent string literal tokens are concatenated.
So, the Solaris compiler was doing the right thing.
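A short sketch to illustrate: the first array is the string from the question, and the second shows the usual reason people do this, splitting a long query over several lines (the table and column names are invented for the example).

```c
#include <stddef.h>

/* The two literals are merged into one during translation,
   so this array holds "May GodBless You" plus the terminating NUL. */
const char blessing[] = "May God" "Bless You";

/* Typical use: a long SQL query split across lines. Note the trailing
   space inside each fragment; the compiler does not add one for you. */
const char query[] =
    "SELECT id, name "
    "FROM users "
    "WHERE id = 1";

/* Size of the merged literal in bytes, including the NUL. */
const size_t blessing_size = sizeof blessing;
```

Here blessing_size is 17: 7 characters from the first fragment, 9 from the second, and one for the terminator.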
I'm exploring wxWidgets and at the same time learning C/C++. Often wxWidgets functions expect a wxString rather than a string, therefore wxWidgets provides a macro wxT(yourString) for creating wxStrings. My question concerns the expansion of this macro. If you type wxT("banana") the expanded macro reads L"banana". What meaning does this have in C? Is L a function here that is called with argument "banana"?
"banana" is the word written using 1-byte ASCII characters.
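L"banana" is the word written using wide characters of type wchar_t (generally 2-byte Unicode on Windows; other platforms often use 4 bytes).
L is a prefix on string literals that marks them as wide (Unicode) strings.
The L tells your compiler that it's a wide string literal instead of a "normal" one. It is part of the literal's syntax, not a function call.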
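A quick sketch to show the difference in element type; the variable and function names are just for illustration.

```c
#include <stddef.h>
#include <wchar.h>

/* Ordinary string literal: an array of char. */
const char narrow[] = "banana";

/* Wide string literal: an array of wchar_t. */
const wchar_t wide[] = L"banana";

/* Wide strings need the wide-character functions from <wchar.h>;
   wcslen is the wchar_t counterpart of strlen. */
size_t wide_length(void)
{
    return wcslen(wide);
}
```

Both arrays have 7 elements (6 letters plus the terminator), but the wide one takes 7 * sizeof(wchar_t) bytes, and how big wchar_t is depends on the platform.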