Good Practice to use integer array to store characters in C

I am currently learning to program in C and am referring to the book by Kernighan and Ritchie. In order to better grasp arrays, especially when inputting characters, I have searched the internet and noticed that a lot of solutions use an integer array instead of a character array during input. I know that characters are in a sense integers and have lower precedence, so my question is very simple: are there any major cons to declaring an integer array and using it to store characters? In addition, is it good practice to do this, as the title suggests?

No, it would be pretty bad practice, in my opinion, since:
Integers are not characters, in general. That is, "an array of integers" doesn't communicate to the reader what is going on as well as "an array of characters" does.
You can't use the standard library's string functions, which is sad since an array of characters can often be treated as a string.
It wastes memory.
It would be interesting if you had posted some code which you feel is representative of doing this. I don't think it happens in K&R.
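For what it's worth, here is a minimal sketch of the two approaches (the buffer names are made up for illustration). It shows both points: the int version can't be handed to the string functions, and on a typical platform with 4-byte int it uses four times the memory:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char cbuf[100];              /* 100 bytes */
    int  ibuf[100];              /* typically 400 bytes where int is 4 bytes */
    int c, n = 0;

    while (n < 99 && (c = getchar()) != EOF && c != '\n') {
        cbuf[n] = (char)c;       /* getchar() returns int, so narrow it here */
        ibuf[n] = c;
        n++;
    }
    cbuf[n] = '\0';

    printf("as char array: \"%s\" (length %zu)\n", cbuf, strlen(cbuf));
    printf("sizeof cbuf = %zu, sizeof ibuf = %zu\n", sizeof cbuf, sizeof ibuf);
    /* strlen, strcpy, printf("%s", ...) cannot be used on ibuf directly */
    return 0;
}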

C has various types that can store integers. (Types don't have precedence - perhaps you meant size.) Sometimes, an integer of one byte is enough to store one character of a specific text (if it's known ahead of time that the text will contain a limited range of characters).
In modern systems and applications, support for more than 256 characters is expected or required, so wider character types are often employed. For these types, there are usually library functions that perform operations comparable to those of the char processing functions.
The int type, however, is not one of these types and is generally not intended to be used like that.
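As a rough sketch (assuming a hosted implementation with wide-character support), wchar_t comes with its own counterparts of the usual char functions, which plain int does not:

#include <stdio.h>
#include <string.h>
#include <wchar.h>

int main(void)
{
    char    narrow[] = "hello";
    wchar_t wide[]   = L"hello";

    /* strlen works on char strings; wcslen is its wide-character counterpart */
    printf("strlen: %zu, wcslen: %zu\n", strlen(narrow), wcslen(wide));
    return 0;
}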


What does the following mean in context of C programming language?

From Modern C by Jens Gustedt,
Representations of values on a computer can vary “culturally” from architecture to architecture or are determined by the type the programmer gave to the value. Therefore, we should try to reason primarily about values and not about representations if we want to write portable code.
If you already have some experience in C and in manipulating bytes and bits, you will need to make an effort to actively "forget" your knowledge for most of this section. Thinking about concrete representations of values on your computer will inhibit you more than it helps.
Takeaway - C programs primarily reason about values and not about their representation.
Question 1: What kind of 'representations' of values is the author talking about? Could I be given an example where this 'representation' varies from architecture to architecture, and also an example of how the representation of a value is determined by the type the programmer gave to it?
Question 2: What's the purpose of specifying a data type in the C language? I know that's a rule of the language, but I have heard that's how a compiler knows how much memory to allocate to an object. Is that the only use, albeit a crucial one? I've heard there isn't a need to specify a data type in Python.
What kind of 'representations' of values is the author talking about?
https://en.wikipedia.org/wiki/Two%27s_complement vs https://en.wikipedia.org/wiki/Ones%27_complement vs https://en.wikipedia.org/wiki/Offset_binary. Generally https://en.wikipedia.org/wiki/Signed_number_representations.
But also the vast space of floating-point number formats https://en.wikipedia.org/wiki/Floating-point_arithmetic#IEEE_754:_floating_point_in_modern_computers - IEEE 754, minifloat, bfloat16, etc.
Could I be given an example where this 'representation' varies from architecture to architecture?
Your PC uses two's complement, vs. https://superuser.com/questions/1137182/is-there-any-existing-cpu-implementation-which-uses-ones-complement .
Ah, and of course, most notably: https://en.wikipedia.org/wiki/Endianness .
and also an example of how the representation of a value is determined by the type the programmer gave to it?
(float)1 is represented in IEEE 754 as 0b00111111100000000000000000000000 https://www.h-schmidt.net/FloatConverter/IEEE754.html .
(unsigned)1 with 32-bit int is represented as 0b00.....0001.
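A small sketch of that (the dump helper is made up for illustration): the same value 1 stored as float and as unsigned occupies four bytes either way on a typical platform, but the bit patterns differ, and the byte order you see also depends on the machine's endianness:

#include <stdio.h>

static void dump(const void *p, size_t n)
{
    const unsigned char *b = p;
    for (size_t i = 0; i < n; i++)
        printf("%02x ", b[i]);
    putchar('\n');
}

int main(void)
{
    float    f = 1;       /* IEEE 754 single: bit pattern 0x3f800000 */
    unsigned u = 1;       /* plain binary: bit pattern 0x00000001 */

    dump(&f, sizeof f);   /* e.g. "00 00 80 3f" on a little-endian machine */
    dump(&u, sizeof u);   /* e.g. "01 00 00 00" on a little-endian machine */
    return 0;
}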
What's the purpose of specifying a data type in the C language?
Use computer resources efficiently. There is no point in reserving 2 gigabytes to store 8 bits of data. The type determines the range of values that can be "contained" in a variable. You communicate that upper/lower range of allowed values to the compiler, and the compiler generates nice and fast code. (There is also Ada, where you literally specify the range of a type, like type Day_type is range 1 .. 31;).
Programs run on real hardware (see e.g. https://en.wikipedia.org/wiki/Harvard_architecture), and variables at block scope are typically put on the stack https://en.wikipedia.org/wiki/Stack_(abstract_data_type)#Hardware_stack . The idea is that the compiler has to know in advance how many bytes to reserve on the stack. Types communicate just that.
I have heard that's how a compiler knows how much memory to allocate to an object?
The type communicates to the compiler how much memory to allocate for an object, but it also communicates the range of values and the representation (float and _Float32 may look similar, but they can differ). Overflowing addition of two ints is undefined behavior; overflowing addition of two unsigned values is fine and wraps around. There are differences.
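A minimal illustration of that last point (assuming the usual 32-bit int and unsigned):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned u = UINT_MAX;
    u = u + 1;                   /* well defined: wraps around to 0 */
    printf("%u\n", u);

    int i = INT_MAX;
    /* i = i + 1; */             /* would be undefined behavior: signed overflow */
    (void)i;
    return 0;
}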
Is that the only use, albeit crucial?
The most important use of types is to clearly communicate the purpose of your code to other developers.
char character;
int numerical_variable;
uint_least8_t variable_with_8_bits_that_is_optimized_for_size;
uint_fast8_t variable_with_8_bits_that_is_optimized_for_speed;
wchar_t wide_character;
FILE *this_is_a_file;
I've heard there isn't a need to specify a data type in Python.
This is literally the difference between statically typed programming languages and dynamically typed programming languages. https://en.wikipedia.org/wiki/Type_system#Type_checking

C Programming integer size limits

I am a student currently learning the C programming language through a book called "C Primer Plus, 5th edition". I am learning it because I am pursuing a career in programming for embedded systems and devices, device drivers, low-level stuff, etc. My question is very simple, but I have not yet gotten a straight answer from the textbook & from various posts on SO that are similar to my question.
How do you determine the size of integer data types like short, int, or long? I know that this is a simple question that has been asked a lot, but everyone seems to answer the question with "depends on architecture/compiler", which leaves me clueless and doesn't help someone like me who is a novice.
Is there a hidden chart somewhere on the internet that will clearly describe these incompatibilities or is there some numerical method of looking at a compiler (16-bit, 24-bit, 32-bit, 64-bit, etc) and being able to tell what the data type will be? Or is manually using the sizeof operator with a compiler on a particular system the only way to tell what these data types will hold?
You just need the right docs; in your case you need the document that defines the standard, and you should name at least one version of it when asking this kind of question. For example, C99 is one of the most popular versions of the language, and it's defined in the ISO/IEC 9899:1999 document.
The C standard doesn't define the sizes in absolute terms; it mostly specifies a minimum size expressed in bytes, and sometimes not even that.
The notable exception is char, a type that is guaranteed to be 1 byte in size. But here is another potential pitfall for you: the C standard doesn't define exactly how big a byte is (only that it has at least 8 bits), so it says that char is 1 byte, but you can't say anything more for sure without knowing your platform.
You always need to know both the standard and your platform. If you want to do this programmatically, there is the limits.h header, which provides macros for your platform.
You're looking for limits.h. It defines various macros such as INT_MAX (the maximum value of type int) or CHAR_BIT (the number of bits in a char). You can use these values to calculate the size of each type.
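For example, a short program along these lines prints the limits and sizes for whatever platform it's compiled on:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    printf("CHAR_BIT = %d\n", CHAR_BIT);
    printf("SHRT_MAX = %d\n", SHRT_MAX);
    printf("INT_MAX  = %d\n", INT_MAX);
    printf("LONG_MAX = %ld\n", LONG_MAX);
    printf("sizeof(short) = %zu, sizeof(int) = %zu, sizeof(long) = %zu\n",
           sizeof(short), sizeof(int), sizeof(long));
    return 0;
}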

performance of pointer comparison vs string comparison strcmp

I have the choice to do either a pointer comparison or a strcmp.
I know that the string is never longer than 8 characters and I'm targeting a 64 bit platform.
Will they perform equally well, or will one of them be a better choice? I can imagine that this might differ between platforms and compilers; if so, I'd like to know the details about the platform/compiler specifics.
A pointer comparison will almost certainly be faster, as it is a single comparison of two pointers (possibly loading one or both into registers), whereas strcmp, even if inlined and even if the first bytes differ (the best case), will require dereferencing both pointers. If strcmp isn't inlined, then there's a function call and return, and if the first bytes don't differ (and aren't both NUL), then there are multiple dereferences.
For more insight into this, I suggest looking at the assembler output of your program using both methods.
Note: I'm assuming that your claim "I have the choice to do either a pointer comparison or a strcmp" is correct, which will only be the case if your strings are all known to have unique content.
The first question should be: Is this comparison the critical path in my executable? If not, the performance question might be irrelevant, because the impact may be so minor that it doesn't matter.
Comparing the pointers is only a subset of strcmp, because you don't know whether the string values are the same if they happen to be in different memory locations. You may have to consider that in your design.
A pointer comparison is certainly faster. However, if you have a guaranteed string length of 8 bytes, you may compare the strings without strcmp and use a data type that is 8 bytes long and can be compared directly. This way you get basically the same speed as a pointer comparison AND compare the string contents as well. But of course, this is only reliable if you make sure that all strings are 8 bytes, and if they are shorter, you fill the remainder with zeroes.
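A minimal sketch of that idea (the helper name is made up; it assumes both strings live in 8-byte, zero-padded buffers, and uses memcpy rather than a cast to stay within the aliasing rules):

#include <stdint.h>
#include <string.h>

/* assumes a and b each point to 8 valid, zero-padded bytes */
static int equal8(const char *a, const char *b)
{
    uint64_t x, y;
    memcpy(&x, a, sizeof x);
    memcpy(&y, b, sizeof y);
    return x == y;
}

With optimization enabled, compilers typically turn each fixed-size memcpy into a single 8-byte load, so this ends up costing roughly as much as the pointer comparison itself.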
Two strings (even short ones of 8 char) can be equal but at different addresses, so comparing pointers is not the same as using strcmp.
But your application might do hash-consing or string interning, that is, keep one canonical copy of each string (e.g. like GLib quarks).
And you should not bother that much about performance unless you measure it. Notice that some compilers (with high enough optimization levels) are able to optimize strcmp calls quite well.
Addendum:
If your strings are not real arbitrary strings but 8 bytes, you might declare them with a union (which the compiler will suitably align and perhaps optimize).
#include <stdint.h>   /* for int64_t */

typedef union {
    char eightbytes[8];
    int64_t sixtyfourbits;
} mytype_t;
then you might initialize
mytype_t foo = {.eightbytes="Foo"};
If you are sure that the strings are zero-byte padded (like the above initialization does; but if you heap-allocate them, you need to zero them before filling, e.g. with strncpy(p->eightbytes, somestring, 8), etc.), you could compare foo.sixtyfourbits == foo2.sixtyfourbits ...
But I find such code in exceedingly bad taste. If you really want to code this way, add a lot of explanatory comments. I believe that coding this way makes your code unreadable and unmaintainable, for a probably very tiny performance benefit.

What typing system does BASIC use?

I've noticed that nothing I can find gives me a definitive answer to the question above. I first wondered about this when I noticed that you never had to state the type of a variable in QBasic when declaring it, although you could add a suffix to the name of a variable to make sure it was of a particular type.
Also, as some dialects of BASIC are interpreted and others compiled, does this affect the typing system?
There are so many flavors of BASIC, some only historical and some still in use, that it's impossible to give one true answer.
Some of the old BASICs (the line-numbered BASICs) had two data types: String or Integer. The original BASIC that shipped with Apple II computers was an "Integer BASIC." Later BASICs introduced floating point, which was often single-precision FP. The BASIC that shipped with the TI-99/4A was an example of an early-'80s floating-point BASIC. "Way back when", you would make a string literal with quotes, and a string variable with a $ sigil following the identifier name. Variables that didn't have the $ sigil would usually default to the type of numeric variable that the given flavor of BASIC supported (Integer or Floating Point). GW-BASIC, for example, would default to floating point unless you specified the % sigil, which meant "Integer". TI Extended BASIC didn't have an integer type, but the floating-point numeric type had something like 15 significant digits, if I recall (floating-point math errors notwithstanding).
These early BASICs were essentially statically typed, though the distinction was far less useful than in more powerful languages. The choices for data types were few: String, Number (sometimes Int, sometimes FP), and sometimes with the ability to specify whether a number was Int or FP. Behind the scenes, some even freely converted between ints and floating point as necessary. Often such behind-the-scenes conversions were not well documented.
But that was the state of affairs in the '80s, when everyone with a home computer was a hobbyist, and standards were loose. Every hardware manufacturer seemed to have their own take on how BASIC should work.
More modern BASICs are more powerful, and allow for tighter control over variable types (when needed).
Earlier dialects of BASIC were always statically typed. Numeric variables, string variables, and arrays each required different syntax. Also, the length of names was often limited to just one symbol. The most commonly used syntax was just V for numeric and V$ for string, and arrays were separately declared with DIM.
Since I haven't programmed in BASIC for a good 15 years, I can't say for sure what is going on in modern dialects.
The enhanced version of BASIC used in MultiValue Database systems uses dynamic typing. This means that the compiler decides how to treat your variable based on the logic and context of the statements.
Anything in double quotes is a string, and any numeric value not in double quotes is a number. For writing numeric data away in the form of doubles or floats, there are various format expressions you can apply to your variables to achieve this.
Ultimately everything is saved at database level as an ASCII string. So the developer enforces the type at business logic level, as opposed to the database enforcing it.

Why isn't there an array length function in C?

While there are various ways to find the length of an array in C, the language doesn't provide one.
What was the reason for not including such a common operation in C or any of its revisions?
One of the guiding philosophies of C's design is that all data types map directly to memory, and attempting to store metadata for array types such as length runs counter to that philosophy.
From an article by Dennis Ritchie describing the development of C, we find this:
Embryonic C
...
These semantics represented an easy transition from B, and I experimented with them for some months. Problems became evident when I tried to extend the type notation, especially to add structured (record) types. Structures, it seemed, should map in an intuitive way onto memory in the machine, but in a structure containing an array, there was no good place to stash the pointer containing the base of the array, nor any convenient way to arrange that it be initialized. For example, the directory entries of early Unix systems might be described in C as
struct {
    int inumber;
    char name[14];
};
I wanted the structure not merely to characterize an abstract object but also to describe a collection of bits that might be read from a directory. Where could the compiler hide the pointer to name that the semantics demanded? Even if structures were thought of more abstractly, and the space for pointers could be hidden somehow, how could I handle the technical problem of properly initializing these pointers when allocating a complicated object, perhaps one that specified structures containing arrays containing structures to arbitrary depth?
The solution constituted the crucial jump in the evolutionary chain between typeless BCPL and typed C. It eliminated the materialization of the pointer in storage, and instead caused the creation of the pointer when the array name is mentioned in an expression. The rule, which survives in today's C, is that values of array type are converted, when they appear in expressions, into pointers to the first of the objects making up the array.
Emphasis mine. Just replace the term "pointer" with "metadata" in the passage above, and I think we have the answer to your question.
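A short sketch of what that rule means in practice: as soon as the array is passed to a function, only a pointer to its first element remains, so there is nowhere a hypothetical length function could get the length from:

#include <stdio.h>

void f(int a[10])                /* the parameter is adjusted to: int *a */
{
    printf("%zu\n", sizeof a);   /* size of a pointer, e.g. 8 (compilers often warn here, which is exactly the point) */
}

int main(void)
{
    int a[10];
    printf("%zu\n", sizeof a);   /* 40 here, where the array type is still visible */
    f(a);                        /* a decays to a pointer to its first element */
    return 0;
}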
Unless someone here happens to be on the C standard committee, you're unlikely to get an authoritative answer. But two reasons I can think of:
In many (most?) situations, you don't have an array; you just have a pointer.
Storing metadata about the array increases the storage size, etc. The general rule of C is that you don't pay for what you don't use.
C is not object-oriented, so it has no concept of methods that are attached to objects. It was designed with speed and simplicity in mind, and the common idiom sizeof(array) / sizeof(array[0]) is short and straightforward.
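For instance, the idiom is often wrapped in a macro (the macro name here is just a common convention, not part of the standard library), and it only gives the right answer where the array type itself is visible:

#include <stdio.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

int main(void)
{
    double samples[12];
    printf("%zu\n", ARRAY_LEN(samples));   /* 12 */

    double *p = samples;
    /* ARRAY_LEN(p) would still compile, but would silently give a wrong answer */
    (void)p;
    return 0;
}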
It is down to efficiency. C is a very efficient programming language.
Array syntax in C is just syntactic sugar for pointer arithmetic. If you want a real array with a length and bounds checking, you can create a struct which contains an array pointer and its length, and access it only through functions which check the bounds.
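A rough sketch of that approach (the struct and function names are made up for illustration):

#include <stddef.h>
#include <stdio.h>

struct int_array {
    int    *data;
    size_t  len;
};

/* returns 1 and stores the element on success, 0 if the index is out of bounds */
static int int_array_get(const struct int_array *a, size_t i, int *out)
{
    if (i >= a->len)
        return 0;
    *out = a->data[i];
    return 1;
}

int main(void)
{
    int buf[4] = {1, 2, 3, 4};
    struct int_array a = { buf, 4 };
    int v;

    if (int_array_get(&a, 2, &v))
        printf("%d\n", v);               /* prints 3 */
    if (!int_array_get(&a, 9, &v))
        printf("out of bounds\n");
    return 0;
}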

Resources