C string type definition - c

I was just trying to make a string-like type so that you could just write:
string s;
And I thought to do it like:
#define string char *
Then, for example, in the main function I only have to write string s;, but if I type a different variable name it doesn't work.
How can I improve the definition, or how can I use typedef for this job, if it's not bad practice to do so? Or is there a better approach to making variables of type string?
I searched, but I couldn't find the answer.

You can't make variables of type string in C, because "string" is not a type.
A "string" is, by definition, "a contiguous sequence of characters terminated by and including the first null character". It's not a data type, it's a data format.
An array of char may contain a string. A char* may point to a string. Neither of them is a string.
If you like, you can define
typedef char *string; /* not recommended */
but that's misleading, since a variable of type char*, as I mentioned, isn't a string.
The best practice is simply to use char* directly. This makes it clear that your variable is a pointer. It's also consistent with the way the standard library is defined; for example, the strlen function is declared as:
size_t strlen(const char *s);
It's also consistent with the way most experienced C programmers write code that deals with strings.
Because of the way C treats arrays (more or less as second-class citizens), arrays, including arrays that contain strings, are usually manipulated via pointers to their elements. We can use pointer arithmetic to traverse an array. Pretending that the pointer is the array, or that it is a string, is tempting, and might seem to make the code more understandable, but in the long run it just causes confusion.
A macro approach like
#define string char*
is even worse than a typedef. Macros are expanded as sequences of tokens; the preprocessor doesn't know about the syntax of C declarations. So given the above definition, this:
string x, y;
expands to
char* x, y;
which defines x as a char* and y as a char. If you need a name for a type, typedef is almost always better than #define.

You have to learn what the Preprocessor is. As for your problem, the right solution is
typedef char *string;
and if you want to use the preprocessor, remove the semicolon and the s, like this
#define string char *
In the second case the preprocessor will replace each occurrence of string with char *, and hence if you declare
string x, y;
it will expand to
char *x, y;
where x is a pointer to char and y is simply a char. This is misleading and should be avoided.
One more thing: when working in C it is never a good idea to hide the fact that some variable is a pointer, so things like
typedef SomeType *SomeTypeName;
where SomeType could be any type, are generally a bad idea. One thing I've seen that clarifies this a little is appending a P to SomeType, like this
typedef SomeType *SomeTypeP;
but I personally prefer the * to any of these typedefs.

Don't. string is a C++ keyword. Not only will it cause problems if C++ becomes involved but, more importantly, it causes confusion and hides the fact that you are dealing with a pointer.
UPDATE: Right guys, it's not a keyword. But if like many people you put using namespace std; in your code, then string is a C++ class from the standard library.
What I wanted to say is that it is not a common practice to make an alias for char * and it might confuse other people looking at your code.

Related

C function that returns a pointer to an array correct syntax?

In C you can declare a variable that points to an array like this:
int int_arr[4] = {1,2,3,4};
int (*ptr_to_arr)[4] = &int_arr;
Although practically it is the same as just declaring a pointer to int:
int *ptr_to_arr2 = int_arr;
But syntactically it is something different.
Now, what would a function look like that returns such a pointer to an array (of int, e.g.)?
A declaration of int is int foo;.
A declaration of an array of 4 int is int foo[4];.
A declaration of a pointer to an array of 4 int is int (*foo)[4];.
A declaration of a function returning a pointer to an array of 4 int is int (*foo())[4];. The () may be filled in with parameter declarations.
As already mentioned, the correct syntax is int (*foo(void))[4]; And as you can tell, it is very hard to read.
Questionable solutions:
Use the syntax as C would have you write it. This is in my opinion something you should avoid, since it's incredibly hard to read, to the point where it is completely useless. This should simply be outlawed in your coding standard, just like any sensible coding standard enforces function pointers to be used with a typedef.
Oh so we just typedef this just like when using function pointers? One might get tempted to hide all this goo behind a typedef indeed, but that's problematic as well. And this is since both arrays and pointers are fundamental "building blocks" in C, with a specific syntax that the programmer expects to see whenever dealing with them. And the absence of that syntax suggests an object that can be addressed, "lvalue accessed" and copied like any other variable. Hiding them behind typedef might in the end create even more confusion than the original syntax.
Take this example:
typedef int(*arr)[4];
...
arr a = create(); // calls malloc etc
...
// somewhere later, let's make a hard copy! (or so we thought)
arr b = a;
...
cleanup(a);
...
print(b); // mysterious crash here
So this "hide behind typedef" system heavily relies on us naming types somethingptr to indicate that it is a pointer. Or let's say... LPWORD... and there it is, "Hungarian notation", the heavily criticized type system of the Windows API.
A slightly more sensible work-around is to return the array through one of the parameters. This isn't exactly pretty either, but at least somewhat easier to read since the strange syntax is centralized to one parameter:
void foo (int(**result)[4])
{
...
*result = &arr;
}
That is: a pointer to a pointer-to-array of int[4].
If one is prepared to throw type safety out the window, then of course void* foo (void) solves all of these problems... but creates new ones. Very easy to read, but now the problem is type safety and uncertainty regarding what the function actually returns. Not good either.
So what to do then, if these versions are all problematic? There are a few perfectly sensible approaches.
Good solutions:
Leave allocation to the caller. This is by far the best method, if you have the option. Your function would become void foo (int arr[4]); which is readable and type safe both.
Old school C. Just return a pointer to the first item in the array and pass the size along separately. This may or may not be acceptable from case to case.
Wrap it in a struct. For example this could be a sensible implementation of some generic array type:
typedef struct
{
size_t size;
int arr[];
} array_t;
array_t* alloc (size_t items)
{
array_t* result = malloc(sizeof *result + sizeof(int[items]));
if (result != NULL)
result->size = items; /* record the length so callers can rely on it */
return result;
}
The typedef keyword can make things a lot clearer/simpler in this case:
int int_arr[4] = { 1,2,3,4 };
typedef int(*arrptr)[4]; // Define a pointer to an array of 4 ints ...
arrptr func(void) // ... and use that for the function return type
{
return &int_arr;
}
Note: As pointed out in the comments and in Lundin's excellent answer, using a typedef to hide/bury a pointer is a practice that is frowned-upon by (most of) the professional C programming community – and for very good reasons. There is a good discussion about it here.
However, although, in your case, you aren't defining an actual function pointer (which is an exception to the 'rule' that most programmers will accept), you are defining a complicated (i.e. difficult to read) function return type. The discussion at the end of the linked post delves into the "too complicated" issue, which is what I would use to justify use of a typedef in a case like yours. But, if you should choose this road, then do so with caution.

The difference between "char* variable" and "char *variable" in C [duplicate]

Why do most C programmers name variables like this:
int *myVariable;
rather than like this:
int* myVariable;
Both are valid. It seems to me that the asterisk is a part of the type, not a part of the variable name. Can anyone explain this logic?
They are EXACTLY equivalent.
However, in
int *myVariable, myVariable2;
It seems obvious that myVariable has type int*, while myVariable2 has type int.
In
int* myVariable, myVariable2;
it may seem obvious that both are of type int*, but that is not correct as myVariable2 has type int.
Therefore, the first programming style is more intuitive.
If you look at it another way, *myVariable is of type int, which makes some sense.
Something nobody has mentioned here so far is that this asterisk is actually the "dereference operator" in C.
*a = 10;
The line above doesn't mean I want to assign 10 to a, it means I want to assign 10 to whatever memory location a points to. And I have never seen anyone writing
* a = 10;
have you? So the dereference operator is pretty much always written without a space. This is probably to distinguish it from a multiplication broken across multiple lines:
x = a * b * c * d
* e * f * g;
Here *e would be misleading, wouldn't it?
Okay, now what does the following line actually mean:
int *a;
Most people would say:
It means that a is a pointer to an int value.
This is technically correct, most people like to see/read it that way and that is the way how modern C standards would define it (note that language C itself predates all the ANSI and ISO standards). But it's not the only way to look at it. You can also read this line as follows:
The dereferenced value of a is of type int.
So in fact the asterisk in this declaration can also be seen as a dereference operator, which also explains its placement. And that a is a pointer is not really declared at all, it's implicit by the fact, that the only thing you can actually dereference is a pointer.
The C standard only defines two meanings to the * operator:
indirection operator
multiplication operator
And indirection is just a single meaning, there is no extra meaning for declaring a pointer, there is just indirection, which is what the dereference operation does, it performs an indirect access, so also within a statement like int *a; this is an indirect access (* means indirect access) and thus the second statement above is much closer to the standard than the first one is.
Because the * in that line binds more closely to the variable than to the type:
int* varA, varB; // This is misleading
As @Lundin points out below, const adds even more subtleties to think about. You can entirely sidestep this by declaring one variable per line, which is never ambiguous:
int* varA;
int varB;
The balance between clear code and concise code is hard to strike — a dozen redundant lines of int a; isn't good either. Still, I default to one declaration per line and worry about combining code later.
I'm going to go out on a limb here and say that there is a straight answer to this question, both for variable declarations and for parameter and return types, which is that the asterisk should go next to the name: int *myVariable;. To appreciate why, look at how you declare other types of symbol in C:
int my_function(int arg); for a function;
float my_array[3] for an array.
The general pattern, referred to as declaration follows use, is that the type of a symbol is split up into the part before the name, and the parts around the name, and these parts around the name mimic the syntax you would use to get a value of the type on the left:
int a_return_value = my_function(729);
float an_element = my_array[2];
and: int copy_of_value = *myVariable;.
C++ throws a spanner in the works with references, because the syntax at the point where you use references is identical to that of value types, so you could argue that C++ takes a different approach to C. On the other hand, C++ retains the same behaviour of C in the case of pointers, so references really stand as the odd one out in this respect.
A great guru once said "Read it the way of the compiler, you must."
http://www.drdobbs.com/conversationsa-midsummer-nights-madness/184403835
Granted this was on the topic of const placement, but the same rule applies here.
The compiler reads it as:
int (*a);
not as:
(int*) a;
If you get into the habit of placing the star next to the variable, it will make your declarations easier to read. It also avoids eyesores such as:
int* a[10];
-- Edit --
To explain exactly what I mean when I say it's parsed as int (*a), that means that * binds more tightly to a than it does to int, in very much the manner that in the expression 4 + 3 * 7, 3 binds more tightly to 7 than it does to 4 due to the higher precedence of *.
With apologies for the ascii art, a synopsis of the A.S.T. for parsing int *a looks roughly like this:
            Declaration
             /       \
            /         \
  Declaration-       Init-
  Specifiers         Declarator-
       |             List
       |               |
       |              ...
     "int"             |
                   Declarator
                    /     \
                   /      ...
               Pointer      \
                  |      Identifier
                  |          |
                 "*"         |
                            "a"
As is clearly shown, * binds more tightly to a since their common ancestor is Declarator, while you need to go all the way up the tree to Declaration to find a common ancestor that involves the int.
That's just a matter of preference.
When you read the code, distinguishing between variables and pointers is easier in the second case, but it may lead to confusion when you are putting both variables and pointers of a common type in a single line (which itself is often discouraged by project guidelines, because it decreases readability).
I prefer to declare pointers with their corresponding sign next to type name, e.g.
int* pMyPointer;
People who prefer int* x; are trying to force their code into a fictional world where the type is on the left and the identifier (name) is on the right.
I say "fictional" because:
In C and C++, in the general case, the declared identifier is surrounded by the type information.
That may sound crazy, but you know it to be true. Here are some examples:
int main(int argc, char *argv[]) means "main is a function that takes an int and an array of pointers to char and returns an int." In other words, most of the type information is on the right. Some people think function declarations don't count because they're somehow "special." OK, let's try a variable.
void (*fn)(int) means fn is a pointer to a function that takes an int and returns nothing.
int a[10] declares 'a' as an array of 10 ints.
pixel bitmap[height][width].
Clearly, I've cherry-picked examples that have a lot of type info on the right to make my point. There are lots of declarations where most--if not all--of the type is on the left, like struct { int x; int y; } center.
This declaration syntax grew out of K&R's desire to have declarations reflect the usage. Reading simple declarations is intuitive, and reading more complex ones can be mastered by learning the right-left-right rule (sometimes called the spiral rule or just the right-left rule).
C is simple enough that many C programmers embrace this style and write simple declarations as int *p.
In C++, the syntax got a little more complex (with classes, references, templates, enum classes), and, as a reaction to that complexity, you'll see more effort put into separating the type from the identifier in many declarations. In other words, you might see more int* p-style declarations if you check out a large swath of C++ code.
In either language, you can always have the type on the left side of variable declarations by (1) never declaring multiple variables in the same statement, and (2) making use of typedefs (or alias declarations, which, ironically, put the alias identifiers to the left of types). For example:
typedef int array_of_10_ints[10];
array_of_10_ints a;
A lot of the arguments in this topic are plain subjective and the argument about "the star binds to the variable name" is naive. Here's a few arguments that aren't just opinions:
The forgotten pointer type qualifiers
Formally, the "star" neither belongs to the type nor to the variable name, it is part of its own grammatical item named pointer. The formal C syntax (ISO 9899:2018) is:
(6.7) declaration:
    declaration-specifiers init-declarator-list_opt ;
Where declaration-specifiers contains the type (and storage), and the init-declarator-list contains the pointer and the variable name. Which we see if we dissect this declarator list syntax further:
(6.7.6) declarator:
    pointer_opt direct-declarator
...
(6.7.6) pointer:
    * type-qualifier-list_opt
    * type-qualifier-list_opt pointer
Where a declarator is the whole declaration, a direct-declarator is the identifier (variable name), and a pointer is the star followed by an optional type qualifier list belonging to the pointer itself.
What makes the various style arguments about "the star belongs to the variable" inconsistent, is that they have forgotten about these pointer type qualifiers. int* const x, int *const x or int*const x?
Consider int *const a, b;, what are the types of a and b? Not so obvious that "the star belongs to the variable" any longer. Rather, one would start to ponder where the const belongs to.
You can definitely make a sound argument that the star belongs to the pointer type qualifier, but not much beyond that.
The type qualifier list for the pointer can cause problems for those using the int *a style. Those who use pointers inside a typedef (which we shouldn't, very bad practice!) and think "the star belongs to the variable name" tend to write this very subtle bug:
/*** bad code, don't do this ***/
typedef int *bad_idea_t;
...
void func (const bad_idea_t *foo);
This compiles cleanly. Now you might think the code is made const correct. Not so! This code is accidentally a faked const correctness.
The type of foo is actually int*const* - the outermost pointer was made read-only, not the pointed-at data. So inside this function we can do **foo = n; and it will change the variable value in the caller.
This is because in the expression const bad_idea_t *foo, the * does not belong to the variable name here! In pseudo code, this parameter declaration is to be read as const (bad_idea_t *) foo and not as (const bad_idea_t) *foo. The star belongs to the hidden pointer type in this case - the type is a pointer and a const-qualified pointer is written as *const.
But then the root of the problem in the above example is the practice of hiding pointers behind a typedef and not the * style.
Regarding declaration of multiple variables on a single line
Declaring multiple variables on a single line is widely recognized as bad practice 1). CERT-C sums it up nicely as:
DCL04-C. Do not declare more than one variable per declaration
Just reading the English, then common sense agrees that a declaration should be one declaration.
And it doesn't matter if the variables are pointers or not. Declaring each variable on a single line makes the code clearer in almost every case.
So the argument about the programmer getting confused over int* a, b is bad. The root of the problem is the use of multiple declarators, not the placement of the *. Regardless of style, you should be writing this instead:
int* a; // or int *a
int b;
Another sound but subjective argument would be that given int* a the type of a is without question int* and so the star belongs with the type qualifier.
But basically my conclusion is that many of the arguments posted here are just subjective and naive. You can't really make a valid argument for either style - it is truly a matter of subjective personal preference.
1) CERT-C DCL04-C.
Because it makes more sense when you have declarations like:
int *a, *b;
For declaring multiple pointers in one line, I prefer int* a, * b; which more intuitively declares "a" as a pointer to an integer, and doesn't mix styles when likewise declaring "b." Like someone said, I wouldn't declare two different types in the same statement anyway.
When you initialize and assign a variable in one statement, e.g.
int *a = xyz;
you assign the value of xyz to a, not to *a. This makes
int* a = xyz;
a more consistent notation.

How to reuse a literal in a char and a one-character string constant?

I need to specify an argument short option (e.g. -F) both as a char and as a char[] constant in C code. In order to maximize code reuse I want to declare a variable which allows me to change the value in one place (a "literal" - not strictly speaking a string or char literal, but in the sense of the abstract concept). I would prefer a solution which solves this exclusively in preprocessor constants and functions/macros or exclusively in C code to a good explanation why this has to be solved in a mixture of both.
I tried/checked out
#define FOREGROUND_OPTION_VALUE 'F', which makes it hard to turn the value into a char[] (as a preprocessor constant): a macro that stringifies with # stringifies the ' quotes as well;
omitting the ' quotes, which leaves me with the problem of creating the ' quotes or creating a char some other way;
@PedroWitzel's answer to declare a char[] and use the 0th char for another constant. That's fine, but I'd prefer a way to create the char[] from the char because that enforces both to be equal (otherwise I'd have to add a compile time assertion that char[] isn't longer than 1).
The only thing that matters for me is code maintenance, nothing else (like cost in processing the code (during compilation or runtime - have not reflected intensively if there could be any and don't care)).
Over and above the discussion in comments to Pedro Witzel's answer, there's another option:
#define FOREGROUND_OPTION_VALUE 'F'
static const char fg_opt_str[] = { FOREGROUND_OPTION_VALUE, '\0' };
It's not a commonly used way of initializing a string, but it is a valid one and seems appropriate for your scenario. Now you can use FOREGROUND_OPTION_VALUE where you need a constant char (or int) value, and fg_opt_str where you need a one-character string. If you change the value defined (to f, say), then you only have to change one place for the code to continue to work, assuming you weren't using f before, which meets your maintainability requirement.
Would a static constant variable work for you?
static const char FOREGROUND_OPTION_VALUE[] = "F";

In C, is it good form to use typedef for a pointer?

Consider the following C code:
typedef char * MYCHAR;
MYCHAR x;
My understanding is that the result would be that x is a pointer of type "char". However, if the declaration of x were to occur far away from the typedef command, a human reader of the code would not immediately know that x is a pointer. Alternatively, one could use
typedef char MYCHAR;
MYCHAR *x;
Which is considered to be better form? Is this more than a matter of style?
If the pointer is never meant to be dereferenced or otherwise manipulated directly -- IOW, you only pass it as an argument to an API -- then it's okay to hide the pointer behind a typedef.
Otherwise, it's better to make the "pointerness" of the type explicit.
I would use pointer typedefs only in situations when the pointer nature of the resultant type is of no significance. For example, pointer typedef is justified when one wants to declare an opaque "handle" type which just happens to be implemented as a pointer, but is not supposed to be usable as a pointer by the user.
typedef struct HashTableImpl *HashTable;
/* 'struct HashTableImpl' is (or is supposed to be) an opaque type */
In the above example, HashTable is a "handle" for a hash table. The user will receive that handle initially from, say, CreateHashTable function and pass it to, say, HashInsert function and such. The user is not supposed to care (or even know) that HashTable is a pointer.
But in cases when the user is supposed to understand that the type is actually a pointer and is usable as a pointer, pointer typedefs are significantly obfuscating the code. I would avoid them. Declaring pointers explicitly makes code more readable.
It is interesting to note that the C standard library avoids such pointer typedefs. For example, FILE is obviously intended to be used as an opaque type, which means that the library could have defined it as typedef FILE <some pointer type> instead of making us use FILE * all the time. But for some reason they decided not to.
I don't particularly like typedef to a pointer, but there is one advantage to it. It removes confusion and common mistakes when you declare more than one pointer variable in a single declaration.
typedef char *PSTR;
...
PSTR str1, str2, str3;
is arguably clearer than:
char *str1, str2, str3; // oops
I prefer leaving the *, it shows there's a pointer. And your second example should be shortened as char* x;, it makes no sense.
I also think this is a matter of style/convention. In Apple's Core Graphics library they frequently "hide" the pointer and use a convention of appending "Ref" to the end of the type. So for example, CGImage * corresponds to CGImageRef. That way you still know it's a pointer reference.
Another way to look at it is from the perspective of types. A type defines the operations that are possible on that type, and the syntax to invoke these operations. From this perspective, MYCHAR is whatever it is. It is the programmer's responsibility to know the operations allowed on it. If it is declared like the first example, then it supports the * operator. You can always name the identifier appropriately to clarify its use.
Another case where it is useful to declare a type that is a pointer is when the nature of the parameter is opaque to the user (programmer). There may be APIs that want to return a pointer to the user, and expect the user to pass it back to the API at some other point, like an opaque handle or a cookie, to be used by the API only internally. The user does not care about the nature of the parameter. It would make sense not to muddy the waters or expose its exact nature by exposing the * in the API.
If you look at several existing APIs, it looks as if not putting the pointerness into the type seems better style:
the already mentioned FILE *
the MYSQL * returned by MySQL's mysql_real_connect()
the MYSQL_RES * returned by MySQL's mysql_store_result() and mysql_use_result()
and probably many others.
For an API it is not necessary to hide structure definitions and pointers behind "abstract" typedefs.
/* This is part of the (hypothetical) WDBC- API
** It could be found in wdbc_api.h
** The struct connection and struct statement are both incomplete types,
** but we are allowed to use pointers to incomplete types, as long as we don't
** dereference them.
*/
struct connection *wdbc_connect (char *connection_string);
int wdbc_disconnect (struct connection *con);
struct statement *wdbc_prepare (struct connection *con, char *statement);
int main(void)
{
struct connection *conn;
struct statement *stmt;
int rc;
conn = wdbc_connect( "host='localhost' database='pisbak' username='wild' password='plasser'" );
stmt = wdbc_prepare (conn, "Select id FROM users where name='wild'" );
rc = wdbc_disconnect (conn);
return 0;
}
The above fragment compiles fine. (but it fails to link, obviously)
Is this more than a matter of style?
Yes. For instance, this:
typedef int *ip;
const ip p;
is not the same as:
const int *p; // p is non-const pointer to const int
It is the same as:
int * const p; // p is constant pointer to non-const int
Read more about const weirdness with typedef in the question "typedef pointer const weirdness".

array of N pointers to functions returning pointers to functions

This was asked to me in an interview!
I really got confused.
How do I declare an array of N
pointers to functions returning
pointers to functions returning
pointers to characters
could anybody please help?
Typedefs are for wusses. Here's a straightforward, mechanical method for figuring out hairy declarations:
a -- a
a[N] -- is an N-element array
*a[N] -- of pointers
(*a[N])() -- to functions
*(*a[N])() -- returning pointers
(*(*a[N])())() -- to functions
*(*(*a[N])())() -- returning pointers
char *(*(*a[N])())() -- to char.
So, the answer is in the neighborhood of char *(*(*a[N])())();. I say "in the neighborhood" since it's never specified what arguments the functions take.
It's an obnoxious interview question (types this ugly are truly rare IME), but it does give the interviewer an idea of how well you understand declarators. Either that or they were bored and just wanted to see if they could make your brain seize.
EDIT
Most everyone else recommends using typedefs. The only time I recommend using a typedef is if the type is intended to be truly opaque (i.e., not manipulated directly by the programmer, but passed to an API, sort of like the FILE type). Otherwise, if the programmer is meant to manipulate objects of that type directly, then IME it's better to have all that information available in the declaration, ugly as it may be. For example, something like
NameFuncPickerPointer a[N];
gives me no information on how to actually use a[i]. I don't know that a[i] is callable, or what it returns, or what arguments it should take (if any), or much of anything else. I have to go looking for the typedef
typedef char *NameFunc();
typedef NameFunc *NameFuncPicker();
typedef NameFuncPicker *NameFuncPickerPointer;
and from that puzzle out how to write the expression that actually calls one of the functions. Whereas using the "naked", non-typedef'd declaration, I know immediately that the structure of the call is
char *theName = (*(*a[i])())();
typedef char* (* tCharRetFunc)();
typedef tCharRetFunc (* tFuncRetCharFunc)();
tFuncRetCharFunc arr[N];
Divide big problem into smaller parts:
/* char_func_ptr is pointer to function returning pointer to char */
typedef char* (*char_func_ptr)();
/* func_func_ptr is a pointer to function returning above type */
typedef char_func_ptr (*func_func_ptr)();
/* the_array is array of desired function pointers */
func_func_ptr the_array[42];
An array of N pointers to functions returning pointers to functions returning pointers to char:
char *(*(*arr_fp[n])(void))(void)
Is this what you are looking for:
typedef char* charptr;
typedef charptr (*innerfun)();
typedef innerfun (*outerfun)();
enum { N = 10 }; /* a const size_t is not a constant expression in C */
outerfun my_outerfun_array[N];
I hope I got it correct, it seems a strange question to me especially in an interview :(
Using typedefs as Christopher tells you is really the only humane way of declaring such a thing. Without typedefs, it'll become:
char *(*(*arr[10])(void ))(void );
(yes I had to cheat and ran
cdecl> declare arr as array 10 of pointer to function(void) returning pointer to function(void) returning pointer to char )

Resources