The following prints the percentage of memory used.
printf(TEXT("There is %*ld percent of memory in use.\n"),
       WIDTH, statex.dwMemoryLoad);
WIDTH is defined to be equal to 7.
What does TEXT mean, and where is this sort of syntax defined in printf?
As others already said, TEXT is probably a macro.
To see what such macros expand to, simply look at the preprocessor output. If you are using gcc:
gcc -E file.c
Just guessing, but TEXT may be a char* to char* function that handles translating a string for internationalization support.
Note that if this is the case, then you may also be required to always use TEXT with a string literal (and not with expressions or variables), so that an external tool can detect all literals that need translation by a simple scan of the source code. For example, you should perhaps never write:
puts(TEXT(flag ? "Yes" : "No"));
and you should write instead
puts(flag ? TEXT("Yes") : TEXT("No"));
Something that is standard but not used very often is the parametric width of a field: for example, in printf("%*i", x, y) the first argument x is the width used to print the second argument y as a decimal value.
When used with scanf, the * character instead specifies that you don't want to store the field (i.e. you "skip" it instead of reading it).
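A minimal sketch of both uses of * (standard C, equally valid in C++); the values here are made up for illustration:

#include <cstdio>

int main() {
    int width = 7, load = 42;
    // The field width is taken from the argument list: load prints right-aligned in 7 columns.
    std::printf("There is %*d percent of memory in use.\n", width, load);

    int day = 0;
    // In the scanf family, %*d reads and discards an integer instead of storing it.
    std::sscanf("2024 17", "%*d %d", &day);
    std::printf("day = %d\n", day); // prints "day = 17"
    return 0;
}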
TEXT() is probably a macro or function which returns a string value. I think it is user defined and does some manner of formatting on that string which is passed as an argument to the TEXT function. You should go to the function declaration for TEXT() to see what exactly it does.
TEXT() is a Unicode support macro defined in winnt.h. If UNICODE is defined, it prepends L to the string literal, making it wide.
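A simplified sketch of the definition (the real winnt.h goes through a helper macro, but the effect is the same):

#ifdef UNICODE
#define TEXT(quote) L##quote   // wide literal: wchar_t string
#else
#define TEXT(quote) quote      // narrow literal: char string
#endif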
Also see TEXT vs. _TEXT vs. _T, and UNICODE vs. _UNICODE blog post.
_TEXT() or _T() is a Microsoft-specific macro.
This MSDN link says
To simplify code development for various international markets,
the Microsoft run-time library provides Microsoft-specific "generic-text" mappings for many data types, routines, and other objects.
These mappings are defined in TCHAR.H.
You can use these name mappings to write generic code that can be compiled for any of the three kinds of character sets:
ASCII (SBCS), MBCS, or Unicode, depending on a manifest constant you define using a #define statement.
Generic-text mappings are Microsoft extensions that are not ANSI compatible.
_TEXT is a macro to make strings "character set neutral".
For example, _T("HELLO");
Characters can be encoded either as 8-bit ANSI or as 16-bit Unicode (UTF-16).
If you wrap all strings in _TEXT and define the preprocessor symbol _UNICODE, all such strings will be Unicode. If you don't define _UNICODE, the strings will all be ANSI.
Hence the macro _TEXT allows you to have all strings as Unicode or ANSI.
So there is no need to change the code every time you change your character set.
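A small sketch of how this looks in practice (assuming MSVC and <tchar.h>; note that the %s behaviour of _tprintf shown here is Microsoft-specific):

#include <tchar.h>
#include <stdio.h>

int main() {
    // With _UNICODE defined this is L"HELLO" (wchar_t); otherwise it is "HELLO" (char).
    const _TCHAR *greeting = _T("HELLO");
    _tprintf(_T("%s\n"), greeting); // resolves to wprintf or printf accordingly
    return 0;
}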
Related
I stumbled on some C++ code like this:
int $T$S;
First I thought that it was some sort of PHP code or something wrongly pasted in there but it compiles and runs nicely (on MSVC 2008).
What kind of characters are valid for variables in C++ and are there any other weird characters you can use?
The only legal characters according to the standard are alphanumerics and the underscore. The standard does require that just about anything Unicode considers alphabetic be acceptable (but only as single code-point characters). In practice, implementations offer extensions (e.g. some do accept a $) and restrictions (most don't accept all of the required Unicode characters). If you want your code to be portable, restrict symbols to the 26 unaccented letters, upper or lower case, the ten digits, and the '_'.
It's an extension offered by some compilers and is not in the C or C++ standards.
MSVC:
Microsoft Specific
Only the first 2048 characters of Microsoft C++ identifiers are significant. Names for user-defined types are "decorated" by the compiler to preserve type information. The resultant name, including the type information, cannot be longer than 2048 characters. (See Decorated Names for more information.) Factors that can influence the length of a decorated identifier are:
Whether the identifier denotes an object of user-defined type or a type derived from a user-defined type.
Whether the identifier denotes a function or a type derived from a function.
The number of arguments to a function.
The dollar sign is also a valid identifier in Visual C++.
// dollar_sign_identifier.cpp
struct $Y1$ {
    void $Test$() {}
};
int main() {
    $Y1$ $x$;
    $x$.$Test$();
}
https://web.archive.org/web/20100216114436/http://msdn.microsoft.com/en-us/library/565w213d.aspx
Newest version: https://learn.microsoft.com/en-us/cpp/cpp/identifiers-cpp?redirectedfrom=MSDN&view=vs-2019
GCC:
6.42 Dollar Signs in Identifier Names
In GNU C, you may normally use dollar signs in identifier names. This is because many traditional C implementations allow such identifiers. However, dollar signs in identifiers are not supported on a few target machines, typically because the target assembler does not allow them.
http://gcc.gnu.org/onlinedocs/gcc/Dollar-Signs.html#Dollar-Signs
To my knowledge, only letters (capital and small), digits (0 to 9), and _ are valid in variable names according to the standard (note that a variable name must not start with a digit).
All other characters are compiler extensions.
This is not good practice. Generally, you should only use alphanumeric characters and underscores in identifiers ([a-z][A-Z][0-9]_).
Surface Level
Unlike in other languages (bash, perl), C does not use $ to denote the use of a variable, so the character is free to appear in identifiers. Under C11 §6.4.2 this falls under implementation-defined identifier characters, and modern compilers do seem to support it.
As for your C++ question, let's test it!
int main(void) {
    int $ = 0;
    return $;
}
On GCC/G++/Clang/Clang++, this indeed compiles, and runs just fine.
Deeper Level
Compilers take source code, lex it into a token stream, parse that into an abstract syntax tree (AST), and then use that to generate code (e.g. assembly/LLVM IR). Your question really only concerns the first part (lexing).
The grammar (and thus the lexer implementation) of C/C++ does not treat $ as special, unlike commas, periods, arrows (->), etc. As such, you may get output from the lexer like the following for the C code below:
int i_love_$ = 0;
After the lexer, this becomes a token stream like:
["int", "i_love_$", "=", "0"]
If you were to take this code:
int i_love_$,_and_.s = 0;
The lexer would output a token stream like:
["int", "i_love_$", ",", "_and_", ".", "s", "=", "0"]
As you can see, because C/C++ doesn't treat a character like $ as special, the lexer keeps it as part of the identifier token, whereas a period is split off as its own token.
I found the following in the new C++ standard:
2.11 Identifiers [lex.name]
identifier:
    identifier-nondigit
    identifier identifier-nondigit
    identifier digit
identifier-nondigit:
    nondigit
    universal-character-name
    other implementation-defined character
with the additional text
An identifier is an arbitrarily long sequence of letters and digits. Each universal-character-name in an identifier shall designate a character whose encoding in ISO 10646 falls into one of the ranges specified
in E.1. [...]
I cannot quite comprehend what this means. From the old standard I am used to a "universal character name" being written \u89ab, for example. But using those in an identifier...? Really?
Is the new standard more open w.r.t. Unicode? And I am not referring to the new literal types "uHello \u89ab thing"u32; I think I understood those. But:
Can (portable) source code be in any Unicode encoding, like UTF-8, UTF-16, or any (however-defined) codepage?
Can I write an identifier with \u1234 in it, myfu\u1234ntion (for whatever purpose)?
Or can I use the "character names" that Unicode defines, as in the ICU, i.e.
const auto x = "German Braunb\U{LOWERCASE LETTER A WITH DIARESIS}r."u32;
or even in an identifier in the source itself? That would be a treat... cough...
I think the answer to all these questions is no, but I cannot map this reliably to the wording in the standard... :-)
Edit: I found "2.2 Phases of translation [lex.phases]", Phase 1:
Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set [...] if necessary. The set of physical source file characters accepted is implementation-defined. [...] Any source file character not in the basic
source character set (2.3) is replaced by the universal-character-name that designates that character. (An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (i.e., using the \uXXXX notation), are handled equivalently except where this replacement is reverted in a raw string literal.)
Reading this, I now think that a compiler may choose to accept UTF-8, UTF-16, or any codepage it wishes (by meta information or user configuration). In phase 1 it translates this into an ASCII form (the "basic source character set") in which the Unicode characters are replaced by their \uNNNN notation (or the compiler can choose to continue to work in its Unicode representation, but then has to make sure it handles the other \uNNNN spellings the same way).
What do you think?
Is the new standard more open w.r.t. Unicode?
With respect to allowing universal character names in identifiers, the answer is no: UCNs were already allowed in identifiers back in C99 and C++98. However, compilers did not implement that particular requirement until recently. Clang 3.3, I think, introduces support for this, and GCC has had an experimental feature for it for some time. Herb Sutter also mentioned during his Build 2013 talk "The Future of C++" that this feature would be coming to VC++ at some point. (Although IIRC Herb refers to it as a C++11 feature; it is in fact a C++98 feature.)
It's not expected that identifiers will be written using UCNs. Instead the expected behavior is to write the desired character using the source encoding. E.g., source will look like:
long pörk;
not:
long p\u00F6rk;
However, UCNs are also useful for another purpose: compilers are not all required to accept the same source encodings, but modern compilers all support some encoding scheme where at least the basic source characters have the same encoding (that is, modern compilers all support some ASCII-compatible encoding).
UCNs allow you to write source code with only the basic characters and yet still name extended characters. This is useful in, for example, writing a string literal "°" in source code that will be compiled both as CP1252 and as UTF-8:
char const *degree_sign = "\u00b0";
This string literal is encoded into the appropriate execution encoding on multiple compilers, even when the source encodings differ, as long as the compilers at least share the same encoding for basic characters.
Can (portable) source code be in any Unicode encoding, like UTF-8, UTF-16, or any (however-defined) codepage?
It's not required by the standard, but most compilers will accept UTF-8 source. Clang supports only UTF-8 source (although it has some compatibility for non-UTF-8 data in character and string literals), gcc allows the source encoding to be specified and includes support for UTF-8, and VC++ will guess at the encoding and can be made to guess UTF-8.
(Update: VS2015 now provides an option to force the source and execution character sets to be UTF-8.)
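As a concrete illustration, the relevant switches look like this (flag names taken from the respective compiler documentation; check your version):

> gcc -finput-charset=UTF-8 file.cpp   # state the source encoding explicitly
> cl /utf-8 file.cpp                   # MSVC (VS2015 Update 2+): UTF-8 source and execution sets
> clang++ file.cpp                     # clang always treats the source as UTF-8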
Can I write an identifier with \u1234 in it, myfu\u1234ntion (for whatever purpose)?
Yes, the specification mandates this, although as I said not all compilers implement this requirement yet.
Or can I use the "character names" that Unicode defines, as in the ICU, i.e.
const auto x = "German Braunb\U{LOWERCASE LETTER A WITH DIARESIS}r."u32;
No, you cannot use Unicode long names.
or even in an identifier in the source itself? That would be a treat... cough...
If the compiler supports a source code encoding that contains the extended character you want then that character written literally in the source must be treated exactly the same as the equivalent UCN. So yes, if you use a compiler that supports this requirement of the C++ spec then you may write any character in its source character set directly in the source without bothering with writing UCNs.
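A small sketch of that equivalence (this assumes a compiler that implements UCNs in identifiers, e.g. a recent clang, and UTF-8 source):

int p\u00F6rk = 42;        // identifier spelled with a UCN
int f() { return pörk; }   // the same identifier spelled with the literal character ö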
I think the intent is to allow Unicode characters in identifiers, such as:
long pöjk;
ostream* å;
I suggest using clang++ instead of g++. Clang is designed to be highly compatible with GCC (see Wikipedia), so you can most likely just substitute that command.
I wanted to use Greek symbols in my source code.
If code readability is the goal, then it seems reasonable to use (for example) α over alpha. Especially when used in larger mathematical formulas, they can be read more easily in the source code.
To achieve this, this is a minimal working example:
> cat /tmp/test.cpp
#include <iostream>
int main()
{
    int α = 10;
    std::cout << "α = " << α << std::endl;
    return 0;
}
> clang++ /tmp/test.cpp -o /tmp/test
> /tmp/test
α = 10
This article https://www.securecoding.cert.org/confluence/display/seccode/PRE30-C.+Do+not+create+a+universal+character+name+through+concatenation works with the idea that int \u0401; is compliant code, though it's based on C99, instead of C++0x.
Present versions of gcc (up to version 5.2 so far) only support ASCII and, in some cases, EBCDIC input files. Therefore, Unicode characters in identifiers have to be represented using \uXXXX and \UXXXXXXXX escape sequences in ASCII-encoded files. While it may be possible to represent Unicode characters as ??/uXXXX and ??/UXXXXXXXX in EBCDIC-encoded input files, I have not tested this. At any rate, a simple one-line patch to cpp allows direct reading of UTF-8 input, provided a recent version of iconv is installed. Details are in
https://www.raspberrypi.org/forums/viewtopic.php?p=802657
and may be summarized by the patch
diff -cNr gcc-5.2.0/libcpp/charset.c gcc-5.2.0-ejo/libcpp/charset.c
*** gcc-5.2.0/libcpp/charset.c Mon Jan 5 04:33:28 2015
--- gcc-5.2.0-ejo/libcpp/charset.c Wed Aug 12 14:34:23 2015
***************
*** 1711,1717 ****
struct _cpp_strbuf to;
unsigned char *buffer;
! input_cset = init_iconv_desc (pfile, SOURCE_CHARSET, input_charset);
if (input_cset.func == convert_no_conversion)
{
to.text = input;
--- 1711,1717 ----
struct _cpp_strbuf to;
unsigned char *buffer;
! input_cset = init_iconv_desc (pfile, "C99", input_charset);
if (input_cset.func == convert_no_conversion)
{
to.text = input;
Ignoring that there are sometimes better non-macro ways to do this (I have good reasons, sadly), I need to write a big bunch of generic code using macros. Essentially a macro library that will generate a large number of functions for some pre-specified types.
To avoid breaking a large number of pre-existing unit tests, one of the things the library must do is, for every type, generate the name of that type in all caps for printing. E.g. a type "flag" must be printed as "FLAG".
I could just manually write out constants for each type, e.g.
#define flag_ALLCAPSNAME FLAG
but this is not ideal. I'd like to be able to do this programmatically.
At present, I've hacked this together:
char capname_buf[BUFSIZ];
#define __MACRO_TO_UPPERCASE(arg) strcpy(capname_buf, arg); \
for(char *c=capname_buf;*c;c++)*c = (*c >= 'a' && *c <= 'z')? *c - 'a' + 'A': *c;
__MACRO_TO_UPPERCASE(#flag)
which does what I want to some extent (i.e. after this bit of code, capname_buf has "FLAG" as its contents), but I would prefer a solution that would allow me to define a string literal using macros instead, avoiding the need for this silly buffer.
I can't see how to do this, but perhaps I'm missing something obvious?
I have a variadic foreach loop macro written (like this one), but I can't mutate the contents of the string literal produced by #flag, and in any case, my loop macro would need a list of character pointers to iterate over (i.e. it iterates over lists, not over indices or the like).
Thoughts?
It is not possible in portable C99 to have a macro which converts a constant string to all uppercase letters (in particular because the notion of a letter depends on the character encoding: a UTF-8 letter is not the same as an ASCII one).
However, you might consider some other solutions.
customize your editor to do that. For example, you could write some emacs code which would update each C source file as you require.
use some preprocessor on your C source code (perhaps a simple C code generator script which would emit a bunch of #define in some #include-d file).
use GCC extensions to have perhaps
#include <ctype.h>
/* Statement expressions and __COUNTER__ are GCC extensions; __COUNTER__
   expands to a fresh integer so each use gets its own static buffer. */
#define TO_UPPERCASE_IMPL(Str,Cnt) ({ \
    static char buf_##Cnt[sizeof(Str)+4]; \
    const char *str_##Cnt = Str; \
    int ix_##Cnt = 0; \
    for (; *str_##Cnt; str_##Cnt++, ix_##Cnt++) \
      if (ix_##Cnt < (int)sizeof(buf_##Cnt)-1) \
        buf_##Cnt[ix_##Cnt] = toupper((unsigned char)*str_##Cnt); \
    buf_##Cnt; })
/* Extra expansion step so ## pastes the number that __COUNTER__ expands to. */
#define TO_UPPERCASE_COUNTED(Str,Cnt) TO_UPPERCASE_IMPL(Str,Cnt)
#define TO_UPPERCASE(Str) TO_UPPERCASE_COUNTED(Str,__COUNTER__)
/* usage: puts(TO_UPPERCASE("flag"));   prints FLAG */
customize GCC, perhaps using MELT (a domain specific language to extend GCC), to provide your __builtin_capitalize_constant to do the job (edit: MELT is now an inactive project). Or code in C++ your own GCC plugin doing that (caveat, it will work with only one given GCC version).
It's not possible to do this entirely with the C preprocessor. The reason is that the preprocessor reads the input as (atomic) pp-tokens from which it composes the output. There is no construct that lets the preprocessor decompose a pp-token into individual characters (none that would help you here, anyway).
In your example, when the preprocessor reads the string literal "flag", that is to the preprocessor basically an atomic chunk of text. It has constructs to conditionally remove such chunks or glue them together into larger chunks.
The only constructs that in some sense let you decompose a pp-token are certain preprocessor expressions, but those only work on arithmetic types, which is why they won't help you here.
Your approach circumvents this problem by using C language constructs, i.e. you do the conversion at runtime. The only thing the preprocessor does then is insert the C code that performs the conversion.
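For contrast, a minimal sketch of the token-level operators the preprocessor does offer (standard C/C++; the macro names are mine):

#define GLUE(a, b) a##b  /* pastes two tokens into one: GLUE(FL, AG) expands to FLAG */
#define STR(x) #x        /* stringizes a token: STR(flag) expands to "flag" */
/* ...but there is no operator that splits the token flag into 'f', 'l', 'a', 'g'. */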
Basically I have some simple code that does some things for files and I'm trying to port it to windows. I have something that looks like this:
int SomeFileCall(const char * filename){
#ifndef __unix__
    SomeWindowsFileCall(filename);
#endif
#ifdef __unix__
    /**** Some unix only stat code here! ****/
#endif
}
the line SomeWindowsFileCall(filename); causes the compiler error:
cannot convert parameter 1 from 'const char *' to 'LPCWSTR'
How do I fix this, without changing the SomeFileCall prototype?
Most of the Windows APIs that take strings have two versions: one that takes char * and one that takes WCHAR * (the latter is equivalent to wchar_t *).
SetWindowText, for example, is actually a macro that expands to either SetWindowTextA (which takes char *) or SetWindowTextW (which takes WCHAR *).
In your project, it sounds like all of these macros are referencing the -W versions. This is controlled by the UNICODE preprocessor macro (which is defined if you choose the "Use Unicode Character Set" project option in Visual Studio). (Some of Microsoft's C and C++ run time library functions also have ANSI and wide versions. Which one you get is selected by the similarly-named _UNICODE macro that is also defined by that Visual Studio project setting.)
Typically, both of the -A and -W functions exist in the libraries and are available, even if your application is compiled for Unicode. (There are exceptions; some newer functions are available only in "wide" versions.)
If you have a char * that contains text in the proper ANSI code page, you can call the -A version explicitly (e.g., SetWindowTextA). The -A versions are typically wrappers that make wide character copies of the string parameters and pass control to the -W versions.
An alternative is to make your own wide character copies of the strings. You can do this with MultiByteToWideChar. Calling it can be tricky, because you have to manage the buffers. If you can get away with calling the -A version directly, that's generally simpler and already tested. But if your char * string is using UTF-8 or any encoding other than the user's current ANSI code page, you should do the conversion yourself.
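A minimal sketch of such a helper (assuming the char * holds UTF-8; the function name widen_utf8 is mine):

#include <windows.h>
#include <string>

std::wstring widen_utf8(const char *s) {
    // First call computes the required length, including the terminating NUL.
    int n = MultiByteToWideChar(CP_UTF8, 0, s, -1, nullptr, 0);
    std::wstring w(n ? n : 1, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, s, -1, &w[0], n);
    w.resize(n ? n - 1 : 0); // drop the embedded NUL
    return w;
}

For example: SetWindowTextW(hwnd, widen_utf8(filename).c_str());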
Bonus Info
The -A suffix stands for "ANSI", which was the common Windows term for a single-byte code-page character set.
The -W suffix stands for "Wide" (meaning the encoding units are wider than a single byte). Specifically, Windows uses little-endian UTF-16 for wide strings. The MSDN documentation simply calls this "Unicode", which is a little bit of a misnomer.
Configure your project to use ANSI character set. (General -> Character Set)
What are TCHAR, WCHAR, LPSTR, LPWSTR, LPCTSTR etc.
typedef const wchar_t* LPCWSTR;
{Project Properties -> Advanced -> Character Set -> Use Multi-Byte Character Set} If you follow these steps, your problem is solved.
You are building with the WinAPI in Unicode mode, so all string parameters resolve to wide strings. The simplest fix would be to switch the project to ANSI; otherwise you need to create a wchar_t* with the contents of filename and use that as the argument.
I was able to solve this error by setting the character set to "Use Multi-Byte Character Set":
Project Properties -> Configuration Properties -> General -> Character Set -> "Use Multi-Byte Character Set"
Not sure what compiler you are using, but in Visual Studio you can specify the default character type, whether UNICODE or multibyte. In your case it sounds as if UNICODE is the default, so the simplest solution is to check for the switch on your particular compiler that determines the default character type. That would save you some work; otherwise you would end up adding code to convert back and forth from UNICODE, which may add unnecessary overhead and could be an additional source of error.
I'm exploring wxWidgets and at the same time learning C/C++. Often wxWidgets functions expect a wxString rather than a string, so wxWidgets provides a macro wxT(yourString) for creating wxStrings. My question concerns the expansion of this macro. If you type wxT("banana"), the expanded macro reads L"banana". What meaning does this have in C? Is L a function here that is called with argument "banana"?
"banana" is the word written using 1-byte ASCII characters.
L"banana" is the word written using multi-byte (general 2=byte UNICODE) characters.
L is a flag on strings to let it know it's a wide (unicode) string.
The L tells your compiler that it's a unicode string instead of a "normal" one.
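A minimal sketch of the difference (standard C++; the sizes in the comments assume a 2-byte wchar_t, as on Windows):

#include <cstdio>

int main() {
    // "banana"  -> array of 7 char (6 letters + NUL), so sizeof == 7
    // L"banana" -> array of 7 wchar_t, so sizeof == 7 * sizeof(wchar_t) (14 with 2-byte wchar_t)
    std::printf("%zu %zu\n", sizeof("banana"), sizeof(L"banana"));
    return 0;
}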