Why can't I declare unsigned char* test = "Some text" in C?

This isn't working in Visual Studio 2010; it gives me an error for the following code:
void main (void)
{
unsigned char* test = "ATGST";
}
Edit 1: My question is why this works on Embedded systems, but doesn't work on PC?
But when I change it to :
char* test = "ATGST";
it works.
The main thing is that I write code for embedded systems in C, and I use Visual Studio to test some functions so I don't have to test them in real time on a microcontroller.
I need an explanation, because microcontroller compilers accept the first code.

Edited to conform to the removal of the C++ tag and to appease the embedded tag.
First, the problem at hand: you are trying to assign a string literal (an array of char) to an unsigned char*. You can't really equate char with either unsigned char or signed char; it is a distinct type in that regard. Also, a string literal is given its own storage and should never be modified. If you're dealing with characters, you need to use a plain char*, into which char[] can decay. You could forcefully cast it, but I don't like to recommend such things. It is safe to do, as one of the comments pointed out. In fact, it is one of the rare casts that is a safety no-brainer.
But there is far too little space in a tight answer to properly qualify this kind of reinterpreting cast, which basically tells the compiler that you know what you're doing. That is potentially very dangerous and should only be done when you're quite sure about the problem at hand. The plain char type is generic: whether it is signed or unsigned is implementation-defined. Since unsigned char has a larger positive range than signed char, and text usually uses only the positive subset of signed char, you're good to go as long as your data is not in the extended range above 127. But do conform to the environment and code safely.
On the entry point function - conforming edit
Since it has been established that you work on an embedded system, this implies that your program is very likely not required to return anything, so it can remain void main() (it could also be the case that it requires very different returns specified by the given embedded system, the OP knows the most about the requirements his system imposes). In a lot of cases, the reason you can remain with void is because there is no environment/OS to appease, nobody to communicate with. But embedded systems can also be quite specialized and it is best to approach by studying the given platform in detail in order to satisfy the requirements imposed (if any).

For one, you need a const in there. And secondly, char != unsigned char, and also (uniquely) != signed char.
In C++, string literals are of type const char[N] for an appropriate size N, and can therefore only be converted to a const char*. (C++03 has a deprecated special rule allowing you to implicitly drop the const.) In C the type is char[N], without the const, but it is still undefined behaviour to modify a string literal, so treating it as non-const is a terribly bad idea.
The micro-controller's C implementation is non-conforming in this regard. It would be better to simply use const char*, as is correct, rather than try to hack VS into accepting incorrect code.
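The declarations discussed above can be sketched as follows. This is a minimal illustration, not the asker's actual code: the names test_bytes, test_text, test_bytes_size and test_text_as_bytes are hypothetical. It relies on the rule that an array of any character type, including unsigned char, may be initialized from a string literal, which should be accepted by both C and C++ compilers.

```c
#include <assert.h>
#include <stddef.h>

/* An array initialized from the literal gets its own modifiable copy,
   and unsigned char is allowed as the element type. */
static unsigned char test_bytes[] = "ATGST";

/* A pointer to the literal itself: keep it const and plain char. */
static const char *test_text = "ATGST";

/* Size of the array, including the terminating '\0'. */
size_t test_bytes_size(void)
{
    return sizeof test_bytes;
}

/* Explicit cast when the same text has to be viewed as unsigned bytes. */
const unsigned char *test_text_as_bytes(void)
{
    assert(test_text != NULL);
    return (const unsigned char *)test_text;
}
```

The array form avoids the cast entirely, at the cost of copying the text into writable storage, which may matter on a small microcontroller.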

I believe this is a case of assigning a string to an unsigned char *.
When you assign a string value, it assigns the ASCII values associated with the characters, so you should use char * in place of unsigned char *.
If you want to assign values other than strings or characters, then your implementation is correct.
Hope it helps.

If you are using character types for text, use the unqualified char.
If you are using character types as numbers, use unsigned char, which gives you at least the 0 to 255 range.
for more information: What is an unsigned char?
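As a rough sketch of that split, the function below takes text as plain char but does its arithmetic on unsigned char. The name byte_checksum is hypothetical, and the checksum itself is only an illustration of "characters as numbers", not a recommended algorithm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sum of all bytes in a string, reduced to 0..255. */
unsigned char byte_checksum(const char *text)
{
    assert(text != NULL);

    /* View the text as unsigned bytes so values above 127 stay positive,
       whatever the signedness of plain char on this platform. */
    const unsigned char *p = (const unsigned char *)text;
    unsigned sum = 0;

    while (*p != '\0')
        sum += *p++;

    return (unsigned char)(sum & 0xFFu);
}
```

Calling byte_checksum("A") yields the character code of 'A'; with plain char arithmetic, bytes above 127 could be negative on platforms where char is signed.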

My question is why this works on Embedded systems, but doesn't work on PC?
Most likely because you are accidentally compiling the PC code as C++, which has stricter type checking than C. In C, most compilers accept the code whether you use unsigned char or plain char, typically with nothing more than a warning about pointer signedness.
However, there are some issues with your code that should be fixed, as suggested in other answers. If the code needs to run on both embedded targets and Windows, you should rewrite it as:
#ifdef _WIN32
int main (void)
#else
void main (void)
#endif
{
const unsigned char* test = "ATGST";
}

Related

Should I use a more universal variable alias/type, or use the variable types that the functions I will use take?

Introduction to the problem
So a few months back, while I was really struggling to get better at C, I decided maybe it was time to ditch the usual variable types (char, short, int, long) in favor of the aliases provided in stdint.h, such as uint8_t. The initial problem was that long has different sizes on different machines, but I soon started to see that it had become a snob thing for me, to "differentiate" myself from other IT students.
I'm now doing a project and, as always, I started to use the aliases mentioned above, as they are easy for me to read and write, and I don't have to wonder what storage size my computer will choose (I know the size doesn't change at will and depends on the machine, but its not being evident and explicit causes me to obsess over it).
As usual, my debugger starts complaining from time to time, especially about pointer conversions like so:
Passing 'int8_t *' (aka 'signed char *') to parameter of type 'char *' converts between pointers to integer types where one is of the unique plain 'char' type and the other is not clang(-Wpointer-sign)
The Question
Although I usually just cast them explicitly, I am wondering whether I'm making the code worse because of a preference of mine, and what the good practice is: using a more universal (to the project) variable alias, or using aliases depending on how they will be used (this implies using 2-3 different aliases in a file and probably 10+ project-wide)?
Example
png_voidp per_chunck_ptr;
size_t readnum;
FILE *fp;
if ((fp = fopen(file, "rb")) == NULL){
    fp = NULL;
    //ERROR()
    //ABORT(GENERAL_OPENING_READ_FILE_ERROR, " ERR01 ");
}
int8_t *pop;
pop=malloc(GENERAL_PNG_SIG_SIZE * sizeof(int8_t));
readnum=fread(pop,sizeof(int8_t),GENERAL_PNG_SIG_SIZE,fp);
if(readnum!=GENERAL_PNG_SIG_SIZE){
    fclose(fp);
    //ERR 5
    fp = NULL;
    fprintf(stderr, GENERAL_READ_FILE_ERROR, "ERR02");
    exit(5);
}
if(png_sig_cmp((png_const_bytep)pop, 0, GENERAL_PNG_SIG_SIZE)!= 0){
    fclose(fp);
    fp = NULL;
    fprintf(stderr, "File is not recognized as png file\n");
    //err 6
    exit(6);
}
free(pop);
The error checking is terrible; I use 6 or 7 different ways to print errors for no other reason than learning new functions, or getting distracted, using a certain one once and forgetting it there. Although an answer could extend to that, let's focus only on the variable pop for now.
At the time I was pretty sure that the pointer "pop" would later be used as a parameter of a function that typically prefers png_bytep, or in this case png_const_bytep. Furthermore, the library has its own function for allocating memory, although from what I've seen it is not much different from using malloc. (I haven't read the manual for the specific implementation and only know a few generic theoretical concepts about the format; if you haven't figured it out, it's PNG.)
Now let's focus on size_t readnum. This part goes against what I said earlier, since I'm using another alias when I could have just said uint64_t. The thing is that, if I'm not wrong, size_t can store the maximum size of a theoretically possible object of any type (including arrays). (There are types for bigger "words", like long double, which from what I've read is 10 bytes long, or __uint128_t, which is 16 bytes, but these aren't as easily used by the CPU and require software support to store and manipulate; again, correct me if I'm wrong.) So I shouldn't use a fixed-width type such as uint64_t here, because if someday CPU manufacturers decide to increase the bit size of the registers, ALU, and memory addresses to something like 128 bits (I heard my professor say that modern Intel ones sometimes have around 80 bits), size_t would grow with them and my function could fail if it ever overflowed. This is why I did not use uint64_t.
Conclusion
So should I quit using my preference for exact-size types and instead use the aliases that the functions use, or is using exact-size aliases really better, as they allow certain behavior to be defined more precisely?
As a general rule, use the exact width types when you need to control exactly how big a particular variable is, using any kind of bit manipulation, or when dealing with raw data. You would use the base types when interacting with an API that specifies those types, i.e. if a function expects a long, pass it a long instead of a uint64_t.
Regarding the specific warning you're getting, that's because the type char may be either signed or unsigned, depending on the implementation. So if you were to pass an int8_t * to a function expecting a char *, and that system happens to define char as an unsigned type, then you have a mismatch. Even where char is signed, it remains a distinct type from signed char, which is why the warning mentions the "unique plain char type".
This isn't a big deal however, since it's allowed to access an integer object via a pointer to either the signed or unsigned version of its type.
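A minimal sketch of "use the type the API expects" for the signature buffer: declare it as unsigned char from the start, so no cast is needed when handing it to a byte-oriented function. This assumes the library's byte type is unsigned char (worth checking in your libpng headers). The function looks_like_png is a hypothetical stand-in written with memcmp, not libpng's png_sig_cmp.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { PNG_SIGNATURE_SIZE = 8 };

/* Hypothetical stand-in for a signature check: returns 1 if buf starts
   with the 8-byte PNG signature, 0 otherwise. */
int looks_like_png(const unsigned char *buf, size_t len)
{
    /* The PNG signature bytes, written in hex to stay charset-independent. */
    static const unsigned char signature[PNG_SIGNATURE_SIZE] = {
        0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A
    };

    assert(buf != NULL);
    return len >= PNG_SIGNATURE_SIZE
        && memcmp(buf, signature, PNG_SIGNATURE_SIZE) == 0;
}
```

With the buffer declared as unsigned char, the same pointer can go to fread (which takes void *) and to the byte-oriented check without any pointer-sign warning.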
First of all, why are you trying to convert an int8_t * into a char *? The warning is for your code's safety. It's not because you're using aliases, but because you're implicitly converting between pointers to signed char and plain char, whose signedness depends on the implementation.
In practice, you should use whatever your API expects. If you're not using any other library, use whatever suits you best. If you want to be specific about the size, use the stdint.h types, and if you want to be vague about the size (e.g. you want the second biggest type), use the normal C keywords. Beware that the "second biggest" type may be the same size as the biggest or the third biggest.

Sub-string representation — length or pointer to last byte?

Imagine parsing a string and wanting to extract a sub-string. To represent this sub-string, I see two ways:
// 1. represent it using a start pointer and a length
struct { char *start; size_t length; };
// 2. represent it using two pointers, start and end
struct { char *start; char *end; };
// or it could as well be returned by a function:
char *find_substring(char *s, size_t s_len, size_t *substring_len);
char *find_substring(char *s, size_t s_len, char **substring_end);
Is there a reason to prefer one form over the other? Is it only down to preferences? I don't see it affecting performances as one can be translated into the other using a simple addition/subtraction but I might be wrong.
The context is an HTTP request parser if that changes anything. I used the first one but I'm curious if the second one brings anything to the table as I have seen it used in picohttpparser.
Is there a reason to prefer one form over the other?
One could choose optimization and speed of execution as the measure for preferring one over the other.
If you mostly append data at the end, then *end++ would be faster than start[length++].
If you mostly need the length of the string, then reading length would be faster than computing end - start.
Remember about rules of optimization. The only real answer comes from profiling your code.
Is it only down to preferences?
I advise preferring the representation that best fits the problem you are trying to model, based on how readable it is and how easy it is to use and to find bugs in, which comes down to personal preference.
We could also inspect existing implementations. In C, all (all?) standard library functions, and POSIX interfaces such as strbuf, aiocb, XSI message queues, and iovec, use pointer + integer to represent a memory region. I think all C++ implementations of std::vector, such as those in libstdc++ and LLVM's libc++, use pointers internally, but one can expect them to be optimized for push_back() operations.
Generally I lean towards using pointers. When operating on size_t you have to handle overflow and underflow, negative or too-big values, and conversion from the pointer difference type ptrdiff_t to size_t. Such problems mostly disappear with pointers: a pointer is either valid or not, and you only need a bounds check with the < and > operators to know whether you may increment or decrement it. However, when writing an external API, I would use size_t, as C programmers are used to representing memory regions with that convention.
In most cases this would be down to personal preference. I guess most people choose the first representation. But depending what you plan to do with that substring the second implementation may be better performance-wise.
With the second implementation you have to be specific about where end points to: is it the last character still in substring or the first character beyond substring.
The first way is the preferred way. For example, consider that you have to deal with very large strings. Then it will no longer be a simple allocation of bytes, and you will have to represent it in a more complicated manner.
The second way leaks the information about internal representation of string while the first does not.
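The two representations discussed above can be converted into each other with one addition or subtraction, as this sketch shows. The struct and function names are hypothetical, and it fixes the convention that end points one past the last character of the substring.

```c
#include <assert.h>
#include <stddef.h>

/* 1. start pointer plus length */
struct span_len { const char *start; size_t length; };

/* 2. start pointer plus end pointer (end is one past the last character) */
struct span_ptr { const char *start; const char *end; };

struct span_ptr span_len_to_ptr(struct span_len s)
{
    struct span_ptr result = { s.start, s.start + s.length };
    return result;
}

struct span_len span_ptr_to_len(struct span_ptr s)
{
    /* Both pointers must refer to the same array, with end not before start. */
    assert(s.end >= s.start);
    struct span_len result = { s.start, (size_t)(s.end - s.start) };
    return result;
}
```

For a request line such as "GET /index.html HTTP/1.1", the path is the span starting at offset 4 with length 11, or equivalently the pair of pointers at offsets 4 and 15.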

Why short* instead of char* for string? Difference between char* and unsigned char*?

As the title says, I'm having two questions.
Edit: To clarify, they don't actually use char and short, they ensure them to be 8-bit and 16-bit by specific typedefs. The actual type is then called UInt8 and UInt16.
1. Question
The iTunes SDK uses unsigned short* where a string is needed. What are the advantages of using it instead of char*/unsigned char*? How to convert it to char*, and what differs when working with this type instead?
2. Question
I've only seen char* when a string must be stored, yet. When should I use unsigned char* then, or doesn't it make any difference?
unsigned short arrays can be used with wide character strings - for instance if you have UTF-16 encoded texts - although I'd expect to see wchar_t in those cases. But they may have their reasons, like being compatible between MacOS and Windows. (If my sources are right, MacOS' wchar_t is 32 bits, while Windows' is 16 bits.)
You convert between the two types of string by calling the appropriate library function. Which function is appropriate depends on the situation. Doesn't the SDK come with one?
And char instead of unsigned char, well, all strings have historically always been defined with char, so switching to unsigned char would introduce incompatibilities.
(Switching to signed char would also cause incompatibilities, but somehow not as many...)
Edit Now the question has been edited, let me say that I didn't see the edits before I typed my answer. But yes, UInt16 is a better representation of a 16 bit entity than wchar_t for the above reason.
1. Question - Answer
I would suppose that they use unsigned short* because they must be utilizing UTF-16 encoding for unicode characters and hence representing characters both in and out of the BMP. The rest of your question depends on the type of Unicode encoding of the source and the destination (UTF-8,16,32)
2. Question - Answer
Again, it depends on the type of encoding and what strings you are talking about. You should never use signed or unsigned char if you plan to deal with strings of characters outside of the Extended ASCII table (that is, almost any language other than English).
Probably a harebrained attempt to use UTF-16 strings. C has a wide character type, wchar_t and its chars (or wchar_ts) can be 16 bits long. Though I'm not familiar enough with the SDK to say why exactly they went through this route, it's probably to work around compiler issues. In C99 there are much more suitable [u]int[least/fast]16_t types - see <stdint.h>.
Note that C makes very few guarantees about data types and their underlying sizes. Signed or unsigned shorts aren't guaranteed to be 16 bits (though they are guaranteed to be at least that much), nor are chars restricted to 8 bits or wide chars to 16 or 32.
To convert between char and short strings, you'd use the conversion functions provided by the SDK. You could also write your own or use a 3rd party library, if you knew exactly what they stored in those short strings AND what you wanted in your char strings.
It doesn't really make a difference. You'd normally convert to unsigned char if you wanted to do (unsigned) arithmetic or bit manipulation on a character.
Edit: I wrote (or started writing, anyhow) this answer before you told us they used UInt16 and not unsigned short. In that case there are no hare brains involved; the proprietary type is probably used for compatibility with older (or noncompliant) compilers which don't have the stdint types, to store UTF-16 data. Which is perfectly reasonable.
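To make "write your own conversion, if you know exactly what is stored" concrete, here is a deliberately narrow sketch. It assumes the 16-bit string is zero-terminated and refuses anything outside 7-bit ASCII, so it is not a UTF-16 converter. The names u16_length and u16_to_ascii are hypothetical and have nothing to do with the iTunes SDK, and uint16_t stands in for the SDK's UInt16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of 16-bit units before the terminating zero unit. */
size_t u16_length(const uint16_t *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Narrow a zero-terminated 16-bit string to char.
   Returns 0 on success, -1 if a unit is not 7-bit ASCII or dst is too small.
   Assumes the execution character set is ASCII-compatible. */
int u16_to_ascii(const uint16_t *src, char *dst, size_t dst_size)
{
    assert(src != NULL && dst != NULL);
    size_t i = 0;

    for (; src[i] != 0; i++) {
        if (src[i] > 0x7F || i + 1 >= dst_size)
            return -1;
        dst[i] = (char)src[i];
    }
    if (dst_size == 0)
        return -1;
    dst[i] = '\0';
    return 0;
}
```

Real text with accented or non-Latin characters needs a proper UTF-16 to UTF-8 conversion from the SDK or a library, which is why the sketch rejects such input instead of guessing.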

Why can't C constant be stored in short type

As the title implies, I don't understand why it is like that.
The Code:
#include <stdio.h>
#define try 32
int main(void)
{
printf("%zu\n", sizeof try); /* sizeof yields a size_t, so print it with %zu */
return 0;
}
The Question:
1.) When I use the sizeof operator to get the size of the storage the constant try is stored in, I get 4, which is 32 bits.
2.) Why doesn't C store it in a 16-bit short, since that is large enough to hold it?
3.) Is there any way to make a constant be stored in a short type?
Thank you for reading my question, Much appreciated.
You misunderstand the C preprocessor. The C preprocessor simply performs string substitution - it has no knowledge of types. The '32' will get interpreted according to the language rules applying to the context where it gets inserted - typically, as an int.
To make your defined value always be seen as a short, you could do:
#define try ((short)32)
See also: How do I write a short literal in C++?
Declared constants are probably what you are looking for:
const short try = 32;
This is typically preferred as I believe without optimizations (or maybe even with) the compiler will not try to fit any data type into the smallest number of addressable bytes.
What you call a constant definition is in fact a macro definition that says the label try will be replaced by 32 at preprocessing time.
So the compiler sees sizeof(32). An unsuffixed decimal constant that fits in an int has type int, and the size of int is implementation-defined.
To make a short constant, you will have to either cast the value in the macro definition or declare it as a global constant, as follows:
const short try = 32;
I will take a crack at this, but I'm not entirely sure if I'm correct
Since #define is just telling the preprocessor to replace try with your number, it's acting as if it was just a value. On your specific machine, that value is being stored in a 4 byte space in memory. The only solution to this I can think of is to use a machine with an architecture that uses a default 16 bit size for an int.
Even if it was a short, it might still occupy 4 bytes in memory because of alignment, but it really depends.
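The three options mentioned in the answers can be compared side by side. This is a sketch with hypothetical names (PLAIN_CONSTANT, SHORT_CONSTANT, short_object); the asserts state what the language guarantees, while the printed numbers depend on the platform.

```c
#include <assert.h>
#include <stdio.h>

#define PLAIN_CONSTANT 32            /* unsuffixed constant: type int */
#define SHORT_CONSTANT ((short)32)   /* cast expression: type short */
static const short short_object = 32; /* a real object of type short */

void print_constant_sizes(void)
{
    assert(sizeof PLAIN_CONSTANT == sizeof(int));
    assert(sizeof SHORT_CONSTANT == sizeof(short));

    /* As soon as the short takes part in arithmetic it is promoted to int. */
    assert(sizeof (SHORT_CONSTANT + 1) == sizeof(int));

    printf("int constant: %zu, short cast: %zu, short object: %zu\n",
           sizeof PLAIN_CONSTANT, sizeof SHORT_CONSTANT, sizeof short_object);
}
```

The promotion in the third assert is why forcing a constant to short rarely saves anything: the value is widened to int again in almost every expression that uses it.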

strlen to return size_t?

In C:
My string length function is returning a size_t value?
Why is it not returning an integer, which is conventional? One more thing I noticed was that when I tried to concatenate this string with another string, I received a bus error when I ran the program.
Context: I was kind of playing with gmp library and converting big numbers to strings and I end up with the above situation.
What kind of a string is that? Is my operating system playing a role in this issue? I use a Mac with a 64-bit OS.
Edited: The error message I received was:
: warning: format ‘%d’ expects type ‘int’, but argument 3 has type ‘size_t’
Thanks!
#all: Thanks for the answers but I thought I will put the bus error as another question because it seems to be a different issue.
The problem is that int might not be wide enough to store the whole range of possible length values. For example, on a 64-bit system you can have a string longer than 4 gigabytes, and if int is 32 bits you can't possibly return the length of such a long string via an int variable.
strlen() always returned size_t ... and the POSIX standard also says that.
I guess the reason is that int has sign and the capacity of even an unsigned int might not be enough for holding size of an element (say if you have a 32bit int on x86-64 with 16GB RAM) ... the example is extreme, but possible.
POSIX strlen() does return size_t.
As to what's caused the bus error, it's impossible to say without seeing the code and knowing more details about the exact nature of your changes. One possibility is that you've caused a buffer overrun or did something with a NULL pointer you shouldn't have done.
To address your warning (which is actually an error - you've invoked undefined behavior by passing the wrong type to printf) you should use %zu rather than %d for printing size_t values.
strlen() returns a size_t since at least ISO C90 -- I just checked in my copy. And this standard should have no technical difference with ANSI C89.
There was a change of convention (size_t wasn't in K&R C), but it was a long time ago.
There is a very simple and logical reason for all of the functions from the standard library to work with size_t when it comes to lengths of memory blocks - the built-in sizeof operator yields a size_t result as well.
Moreover, size_t is unsigned, of a particular size, tied to the architecture and is semantically different than just a generic int which is meant for storing any number from the count of trees around your office to your SO reputation.
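A short sketch of the fix for the warning in the question: keep the result of strlen in a size_t and format it with %zu (a C99 conversion specifier). The helper name describe_length is hypothetical; it writes into a caller-supplied buffer with snprintf so the result can be inspected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the length of s, in decimal, into out.
   Returns what snprintf returns (characters needed, excluding '\0'). */
int describe_length(const char *s, char *out, size_t out_size)
{
    assert(s != NULL && out != NULL);

    size_t len = strlen(s);   /* strlen returns size_t, not int */

    /* %zu matches size_t; %d would be a type mismatch. */
    return snprintf(out, out_size, "%zu", len);
}
```

If a value really has to be printed with %d, cast it explicitly to int first and accept that very long strings would not fit.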

Resources