Standard C function names

Is there any rationale for the abbreviated way standard C functions are named? For example, malloc() is short for 'memory allocation'. sprintf() is 'string print formatted'. Neither of these names is very good at telling you what the function actually does. It never occurred to me how terrible some of these abbreviated function names are until recently, when I had to teach a new intern many of these functions.
When the language was being developed, was there any reason malloc() was chosen over memAllocate() or something similar? My best guess would be that they more closely resemble UNIX commands, but that doesn't feel like the right answer.

Check out http://publications.gbdirect.co.uk/c_book/chapter2/keywords_and_identifiers.html -
The problem is that there was never any guarantee that more than a
certain number of characters would be checked when names were compared
for equality—in Old C this was eight characters, in Standard C this
has changed to 31.
Basically, in the past (a long while back) you could only count on the first eight characters for uniqueness in a function name. So you end up with a bunch of short names for the core functions.

As Neal Stephenson wrote about Unix in In the Beginning Was the Command Line,
Note the obsessive use of abbreviations and avoidance of capital letters; this is a system invented by people to whom repetitive stress disorder is what black lung is to miners. Long names get worn down to three-letter nubbins, like stones smoothed by a river.
The first version of Unix and the first C compiler were written using versions of ed. Not vi, not emacs, not anything resembling an IDE, but the line-based ed. There comes a point where reducing the number of keystrokes really does increase the number of SLOC you can write per day, when you're inventing something brand-new and writing it for the first time.

The historical justification is that the C standard originally required implementations to distinguish only the initial six characters of external identifier names; C99 raised that minimum to 31. However, users of the C language generally:
Aim to write source code in such a way that it fits in a reasonable number of columns, usually 80 or fewer, which is difficult with long identifier names.
Type identifier names with a keyboard, which is difficult and a waste of time when the identifiers are long.
Tend to prefer high information density and signal-to-noise ratio in source code.

Related

How to definitively solve the scanf input stream problem

Suppose I want to run the following C snippet:
scanf("%d" , &some_variable);
printf("something something\n\n");
printf("Press [enter] to continue...")
getchar(); //placed to give the user some time to read the something something
This snippet will not pause! The problem is that scanf leaves the "enter" (\n) character in the input stream[1], messing up everything that comes after it; in this context the getchar() will eat the \n and not wait for an actual new character.
Since I was told not to use fflush(stdin) (I don't really get why, though), the best solution I have been able to come up with is simply to define my own scan function at the start of my code:
void nsis(int *pointer) { // nsis, acronym of: no shenanigans integer scanf
    scanf("%d", pointer);
    getchar(); // this will clean the input stream every time the scan function is called
}
And then we simply use nsis in place of scanf. This should fly. However, it seems like a really homebrew, put-together-with-duct-tape solution. How do professional C developers handle this mess? Do they not use scanf at all? Do they simply accept working with a dirty input stream? What is the standard here?
I wasn't able to find a definite answer on this anywhere! Every source I could find mentioned a different (and sketchy) solution...
EDIT: In response to all those commenting some version of "just don't use scanf": OK, I can do that, but what is the purpose of scanf then? Is it simply a useless, broken function that should never be used? Why is it in the libraries to begin with, then?
This seems really absurd, especially considering all beginners are taught to use scanf...
[1]: The \n left behind is the one that the user typed when inputting the value of the variable some_variable, and not the one present into the printf.
but what is the purpose of scanf then?
An excellent question.
Is it simply a useless broken function that should never be used?
It is almost useless. It is, arguably, quite broken. It should almost never be used.
Why is it in the libraries to begin with then?
My personal belief is that it was an experiment. It tries to be the opposite of printf. But that turned out not to be such a good idea in practice, and the function never got used very much, and pretty much fell out of favor, except for one particular use case...
This seems really absurd, especially considering all beginners are taught to use scanf...
You're absolutely right. It is really quite absurd.
There's a decent reason why all beginners are taught to use scanf, though. During week 1 of your first C programming class, you might write the little program
#include <stdio.h>

int main()
{
    int size = 5;
    for(int i = 0; i < size; i++) {
        for(int j = 0; j < size; j++)
            putchar('*');
        putchar('\n');
    }
}
to print a square. And during that first week, to make a square of a different size, you just edit the line int size = 5; and recompile.
But pretty soon — say, during week 2 — you want a way for the user to enter the size of the square, without having to recompile. You're probably not ready to muck around with argv. You're probably not ready to read a line of text using fgets and convert it back to an integer using atoi. (You're probably not even ready to seriously contemplate the vast differences between the integer 5 and the string "5" at all.) So — during week 2 of your first C programming class — scanf seems like just the ticket.
That's the "one particular use case" I was talking about. And if you only used scanf to read small integers into simple C programs during the second week of your first C programming class, things wouldn't be so bad. (You'd still have problems forgetting the &, but that would be more or less manageable.)
The problem (though this is again my personal belief) is that it doesn't stop there. Virtually every instructor of beginning C classes teaches students to use scanf. Unfortunately, few or none of those instructors ever explicitly tell students that scanf is a stopgap, to be used temporarily during that second week, and to be emphatically graduated beyond in later weeks. And, even worse, many instructors go on to assign more advanced problems, involving scanf, for which it is absolutely not a good solution, such as trying to do robust or "user friendly" input validation.
scanf's only virtue is that it seems like a nice, simple way to get small integers and other simple input from the user into your early programs. But the problem — actually a big, shuddering pile of 17 separate problems — is that scanf turns out to be vastly complicated and full of exceptions and hard to use, precisely the opposite of what you'd want in order to make things easy for beginners. scanf is only useful for beginners, and it's almost perfectly useless for beginners. It has been described as being like square training wheels on a child's bicycle.
How do professional C developers handle this mess?
Quite simply: by not using scanf at all. For one thing, very few production C programs print prompts to a line-based screen and ask users to type something followed by Return. And for those programs that do work that way, professional C developers unhesitatingly use fgets or the like to read a full line of input as text, then use other techniques to break down the line to extract the necessary information.
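A rough sketch of that approach (the helper name read_int and the fixed buffer size are my own illustrative choices, not anything from this answer): read a whole line with fgets, then parse it with strtol. Because fgets consumes the line, the '\n' ends up in the local buffer rather than lingering in stdin.

#include <stdio.h>
#include <stdlib.h>

/* Read one line and parse a single int from it.
   Returns 1 on success, 0 on EOF or parse failure. */
int read_int(int *out)
{
    char line[100];
    char *end;

    if (fgets(line, sizeof line, stdin) == NULL)
        return 0;                  /* EOF or read error */

    long v = strtol(line, &end, 10);
    if (end == line)
        return 0;                  /* no digits at the start of the line */

    *out = (int)v;                 /* range checking omitted for brevity */
    return 1;
}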
In answer to your initial question, there's no good answer. One of the fundamental rules of scanf usage (a set of rules, by the way, that no instructor ever teaches) is that you should never try to mix scanf and getchar (or fgets) in the same program. If there were a good way to make your "Press [enter] to continue..." code work after having called scanf, we wouldn't need that rule.
If you do want to try to flush the extra newline, so that a later call to getchar might work, there are several questions here with a bunch of good answers (the usual idiom they converge on is sketched after this list):
scanf() leaves the newline character in the buffer
Using fflush(stdin)
How to properly flush stdin in fgets loop
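For reference, that idiom is a loop that reads and discards characters up to and including the next newline. Unlike fflush(stdin), whose behavior on input streams is undefined by the C standard, this is fully portable (a minimal sketch):

int c;
while ((c = getchar()) != '\n' && c != EOF)
    ;   /* discard everything left on the current line */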
There's one more unrelated point that ends up being pretty significant to your question. When C was invented, there was no such thing as a GUI with multiple windows. Therefore no C programmer ever had the problem of having their output disappear before they could read it. Therefore no C programmer ever felt the need to write printf("Press [enter] to continue..."); followed by getchar(). I believe (another personal belief) that it is egregiously bad behavior for any vendor of a GUI-based C compiler to rig things up so that the output disappears upon program exit. Persistent output windows ought to be the default, for the benefit of beginning C programmers, with some kind of non-default option to turn that behavior off for those who don't want it.
Is scanf broken? No, it is not. It is an excellent input function when you want to parse free-form input data where few errors are to be expected. Free form here means that newlines are not relevant, exactly as when you read/write very long paragraphs on a normal screen. And expecting few errors is common when you read from files.
The scanf family of functions has another nice point: you have the same syntax when reading from the standard input stream, a file stream, or a character string. It can easily parse simple common types, and it provides a return value that allows cautious programmers to know whether all, or only part, of the expected data could be decoded.
That being said, it has major drawbacks: first, being a C function, it cannot directly control whether the programmer has passed types matching the format specifications; and second, as beginners are not consistently hit on the head when they forget to check its return value, it is really too easy to write fully broken programs with it.
But the rule is:
if input is expected to be line oriented, first use fgets to get lines and then sscanf, testing the return values of both (see the sketch after this list)
only if input is expected to be free form (irrelevant newlines) should scanf be used directly. But never without testing its return value, except for trivial tests.
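A minimal sketch of the line-oriented rule above (the buffer size is an arbitrary choice of mine):

#include <stdio.h>

int main(void)
{
    char line[256];
    int value;

    if (fgets(line, sizeof line, stdin) == NULL) {   /* test fgets */
        fprintf(stderr, "read error or end of file\n");
        return 1;
    }
    if (sscanf(line, "%d", &value) != 1) {           /* test sscanf */
        fprintf(stderr, "line did not start with an integer\n");
        return 1;
    }
    printf("got %d\n", value);
    return 0;
}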
Another drawback is that beginners expect it to be clever. It can indeed parse simple input formats, but it is only a poor man's parser: do not use it as a generic parser, because that is not what it is intended for.
Provided those rules are observed, it is a nice tool, consistent with most of the C language and its standard library: a simple tool for doing simple things. It is up to programmers or library implementers to build richer tools.
I have been using the C language for more than 30 years and was never bitten by scanf (well, I was when I was a beginner, but I now know that I was to blame). I have simply tried for decades to use it only for what it can do...

Display-width of multibyte character in C standard library – how accurate is the database?

The wcwidth call of the standard C library returns 2 for Asian characters. Then there are Unicode symbols, like arrows. For those it returns 1. It is often the case that a character is wider than a single column, yet the library isn't wrong, because terminals print them at a single column and allow visual overlapping, sometimes giving not-bad results, as with the ndash "–".
Are there characters that plainly suffer? I wonder how Asian people and people from other regions use terminals, and what solutions they have developed. For example, displaying a shell prompt that spans the whole line and contains the current directory name can be a serious problem. Can wcwidth be patched to obtain better results? Using github/wcwidth.c as a starting point, for example.
There are differences with the ambiguous-width characters. xterm has both Markus Kuhn's original table (the link you show appears to be his, with the comment header removed), as well as an alternate version with adjustments to accommodate CJK (East Asian) usage. Besides that, it checks at startup for usable system locale tables. Some are good enough; others are not. No one has done a systematic (unbiased) survey of what's actually good (you may see some opinions on that aspect, offered as answers).
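If you want to see what your own system's locale tables say, a small probe like the following works. Note that wcwidth is POSIX rather than ISO C, and this sketch assumes a UTF-8 locale and a compiler that accepts UTF-8 source:

#define _XOPEN_SOURCE 700      /* expose wcwidth on POSIX systems */
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

int main(void)
{
    setlocale(LC_ALL, "");     /* use the environment's locale tables */

    wchar_t samples[] = { L'A', L'漢', L'→', L'–' };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        printf("U+%04X -> width %d\n",
               (unsigned)samples[i], wcwidth(samples[i]));
    return 0;
}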

What must I know to handle UTF-8 in my C program?

I have a C program that now needs to support UTF-8 characters. What must I know in order to do that? I've always heard how problematic it is to handle in a C/C++ environment. Why exactly is it problematic? How does it differ from a usual C character, including in size? Can I do it without any operating system help, in pure C, and still make it portable? What else should I have asked but didn't? What I'm looking to implement is this: the characters are names with accents (like the French word résumé) that I need to read, put into a symbol table, and then search for and print from a file. It's part of my configuration file parsing (very much .ini-like).
There's an awesome article written by Joel Spolsky, one of the Stack Overflow creators.
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Apart from that, you might want to query some other Q&A's regarding this subject, like Handling special characters in C (UTF-8 encoding).
As cited in the aforementioned Q&A, Tips on Using Unicode with C/C++ might give you the basics.
Two good links that I have used in the past:
The-Basics-of-UTF8
reading-unicode-utf-8-by-hand-in-c
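One point those links make that is worth seeing in a few lines of code: in UTF-8, byte count and character count are different things, because every code point beyond ASCII is encoded as a lead byte plus continuation bytes of the form 10xxxxxx. A minimal sketch of mine (not taken from the articles):

#include <stdio.h>
#include <string.h>

/* Count code points by counting every byte that is NOT a continuation byte. */
static size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

int main(void)
{
    /* "résumé" written out as UTF-8 bytes: each é is the two bytes C3 A9 */
    const char *word = "r\xC3\xA9sum\xC3\xA9";
    printf("bytes: %zu, code points: %zu\n",
           strlen(word), utf8_codepoints(word));  /* bytes: 8, code points: 6 */
    return 0;
}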

How do I align a number like this in C?

I need to align a series of numbers in C with printf() like this example:
-------1
-------5
------50
-----100
----1000
Of course, there are numbers between all those but it's not relevant for the issue at hand... Oh, consider the dashes as spaces, I used dashes so it was easier to understand what I want.
I'm only able to do this:
----1---
----5---
----50--
----100-
----1000
Or this:
---1
---5
--50
-100
1000
But none of this is what I want and I can't achieve what is displayed on the first example using only printf(). Is it possible at all?
EDIT:
Sorry people, I was in a hurry and didn't explain myself well... My last example and all your suggestions (to use something like "%8d") do not work because, although the last number is 1000, it doesn't necessarily go all the way up to 1000, or even to 100 or 10 for that matter.
No matter the number of digits to be displayed, I only want 4 leading spaces at most for the largest number. Let's say I have to display numbers from 1 to 1000 (A) and from 1 to 100 (B), and I use "%4d" for both; this would be the output:
A:
---1
....
1000
Which is the output I want...
B:
---1
....
-100
Which is not the output I want, I actually want this:
--1
...
100
But like I said, I don't know the exact range of numbers I have to print; a number can have 1 digit, or 2, 3 or more, and the function should be prepared for all of them. And I want four additional leading spaces, but that's not that relevant.
EDIT 2:
It seems that what I want, the way I need it, is not possible (check David Thornley's and Blank Xavier's answers and my comments). Thank you all for your time.
Why is printf("%8d\n", intval); not working for you? It should...
You did not show the format strings for any of your "not working" examples, so I'm not sure what else to tell you.
#include <stdio.h>

int
main(void)
{
    int i;

    for (i = 1; i <= 10000; i *= 10) {
        printf("[%8d]\n", i);
    }
    return (0);
}
$ ./printftest
[       1]
[      10]
[     100]
[    1000]
[   10000]
EDIT: response to clarification of question:
#include <math.h>
int maxval = 1000;
int width = (int)log10(maxval) + 1;  /* truncate, then add 1: digit count */
...
printf("%*d\n", width, intval);
The width calculation takes log base 10 of the maximum value, truncates it, and adds 1, which gives the number of digits. (Truncation matters: rounding would give a width of 4 for 999.) The fancy * lets you supply the field width as a variable argument in the format string.
You still have to know the maximum for any given run, but there's no way around that in any language, or even with pencil and paper.
Looking this up in my handy Harbison & Steele....
Determine the maximum width of fields.
int max_width, value_to_print;
max_width = 8;
value_to_print = 1000;
printf("%*d\n", max_width, value_to_print);
Bear in mind that max_width must be of type int to work with the asterisk, and you'll have to calculate it based on how much space you're going to want to have. In your case, you'll have to calculate the maximum width of the largest number, and add 4.
printf("%8d\n",1);
printf("%8d\n",10);
printf("%8d\n",100);
printf("%8d\n",1000);
[I realize this question is a million years old, but there is a deeper question (or two) at its heart, about OP, the pedagogy of programming, and about assumption-making.]
A few people, including a mod, have suggested this is impossible. And in some contexts, including the most obvious one, it is. But it's interesting to see that that wasn't immediately obvious to the OP.
The impossibility assumes that the context is running an executable compiled from C on a line-oriented text console (e.g., console+sh or X-term+csh or Terminal+bash), which is a very reasonable assumption. But the fact that the "right" answer ("%8d") wasn't good enough for the OP while also being non-obvious suggests that there's a pretty big can of worms nearby...
Consider Curses (and its many variants). In it, you can navigate the "screen", "move" the cursor around, and "repaint" portions (windows) of text-based output. In a Curses context, it absolutely would be possible to do this; i.e., to dynamically resize a "window" to accommodate a larger number. But even Curses is just a screen-"painting" abstraction. No one suggested it, and probably rightfully so, because a Curses implementation in C doesn't mean it's "strictly C". Fine.
But what does this really mean? In order for the response "it's impossible" to be correct, it would mean that we're saying something about the runtime system. In other words, this isn't theoretical (as in, "How do I sort a statically-allocated array of ints?"), which can be explained as a "closed system" that totally ignores any aspect of the runtime.
But, in this case, we have I/O: specifically, the implementation of printf(). But that's where there's an opportunity to have said something more interesting in response (even though, admittedly, the asker was probably not digging quite this deep).
Suppose we use a different set of assumptions. Suppose OP is reasonably "clever" and understands that it would not be possible to edit previous lines on a line-oriented stream (how would you correct the horizontal position of a character output by a line printer??). Suppose also that OP isn't just a kid working on a homework assignment, not realizing it was a "trick" question intended to tease out an exploration of the meaning of "stream abstraction". Further, let's suppose OP was wondering: "Wait... If C's runtime environment supports the idea of STDOUT--and if STDOUT is just an abstraction--why isn't it just as reasonable to have a terminal abstraction that 1) can vertically scroll but 2) supports a positionable cursor? Both are moving text on a screen."
Because if that were the question we're trying to answer, then you'd only have to look as far as:
ANSI Escape Codes
to see that:
Almost all manufacturers of video terminals added vendor-specific escape sequences to perform operations such as placing the cursor at arbitrary positions on the screen. One example is the VT52 terminal, which allowed the cursor to be placed at an x,y location on the screen by sending the ESC character, a Y character, and then two characters representing with numerical values equal to the x,y location plus 32 (thus starting at the ASCII space character and avoiding the control characters). The Hazeltine 1500 had a similar feature, invoked using ~, DC1 and then the X and Y positions separated with a comma. While the two terminals had identical functionality in this regard, different control sequences had to be used to invoke them.
The first popular video terminal to support these sequences was the Digital VT100, introduced in 1978. This model was very successful in the market, which sparked a variety of VT100 clones, among the earliest and most popular of which was the much more affordable Zenith Z-19 in 1979. Others included the Qume QVT-108, Televideo TVI-970, Wyse WY-99GT as well as optional "VT100" or "VT103" or "ANSI" modes with varying degrees of compatibility on many other brands. The popularity of these gradually led to more and more software (especially bulletin board systems and other online services) assuming the escape sequences worked, leading to almost all new terminals and emulator programs supporting them.
It has been possible, as early as 1978. C itself was "born" in 1972, and the K&R version was established in 1978. If "ANSI" escape sequences were around at that time, then there is an answer "in C" if we're willing to also stipulate: "Well, assuming your terminal is VT100-capable." Incidentally, the consoles which don't support ANSI escapes? You guessed it: Windows & DOS consoles. But on almost every other platform (Unices, Vaxen, Mac OS, Linux) you can expect to.
TL;DR - There is no reasonable answer that can be given without stating assumptions about the runtime environment. Since most runtimes (unless you're using the desktop computer market share of the '80s and '90s to calculate "most") have supported cursor positioning since the time of the VT-52, I don't think it's entirely justified to say it's impossible--just that for it to be possible, it's an entirely different order of magnitude of work, and not as simple as %8d... which it kinda seemed like the OP knew about.
We just have to clarify the assumptions.
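To make the alternative assumption concrete: on any VT100/ANSI-capable terminal (so, not a bare DOS console), a C program really can go back and rewrite a line it has already printed. A sketch; the escape sequences are standard ANSI, the widths are arbitrary:

#include <stdio.h>

int main(void)
{
    printf("       1\n");        /* first draft of the line */
    /* ESC[1A = cursor up one line; ESC[2K = erase that entire line */
    printf("\x1b[1A\x1b[2K");
    printf("    1000\n");        /* rewrite it at the new width */
    return 0;
}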
And lest one think that I/O is exceptional, i.e., the only time we need to think about the runtime (or even the hardware), just dig into IEEE 754 floating-point exception handling. For those interested:
Intel Floating Point Case Study
According to Professor William Kahan, University of California at
Berkeley, a classic case occurred in June 1996. A satellite-lifting
rocket named Ariane 5 turned cartwheels shortly after launch and
scattered itself and a payload worth over half a billion dollars over
a marsh in French Guiana. Kahan found the disaster could be blamed
upon a programming language that disregarded the default
exception-handling specifications in IEEE 754. Upon launch, sensors
reported acceleration so strong that it caused a conversion-to-integer
overflow in software intended for recalibration of the rocket’s
inertial guidance while on the launching pad.
So, you want an 8-character wide field with spaces as the padding? Try "%8d". Here's a reference.
EDIT: What you're trying to do is not something that can be handled by printf alone, because it will not know what the longest number you are writing is. You will need to calculate the largest number before doing any printfs, and then figure out how many digits to use as the width of your field. Then you can use snprintf or similar to make a printf format on the spot.
char format[20];
snprintf(format, sizeof format, "%%%dd\n", max_length);  /* e.g. builds "%8d\n" */
while (got_output) {
    printf(format, number);
    got_output = still_got_output();
}
Try converting to a string and then use "%4.4s" as the format specifier. This makes it a fixed width format.
As far as I can tell from the question, the amount of padding you want will vary according to the data you have. Accordingly, the only solution is to scan the data before printing, to figure out the widest datum, and so find a width value you can pass to printf using the asterisk modifier, e.g.
loop over data - find the correct padding, put it into width
printf( "%*d\n", width, datum );
If you can't know the width in advance, then your only possible answer would depend on staging your output in a temporary buffer of some kind. For small reports, just collecting the data and deferring output until the input is bounded would be simplest.
For large reports, an intermediate file may be required if the collected data exceeds reasonable memory bounds.
Once you have the data, then it is simple to post-process it into a report using the idiom printf("%*d", width, value) for each value.
Alternatively if the output channel permits random access, you could just go ahead and write a draft of the report that assumes a (short) default width, and seek back and edit it any time your width assumption is violated. This also assumes that you can pad the report lines outside that field in some innocuous way, or that you are willing to replace the output so far by a read-modify-write process and abandon the draft file.
But unless you can predict the correct width in advance, it will not be possible to do what you want without some form of two-pass algorithm.
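A sketch of the simple in-memory variant of that two-pass idea (the array contents and the digit-counting helper are illustrative only):

#include <stdio.h>

/* Pass 1 helper: count decimal digits by repeated division. */
static int digits(int v)
{
    int d = 1;
    while (v /= 10)
        d++;
    return d;
}

int main(void)
{
    int data[] = { 1, 5, 50, 100, 1000 };   /* pretend this was collected input */
    int n = sizeof data / sizeof data[0];

    int width = 1;                          /* pass 1: find the widest datum */
    for (int i = 0; i < n; i++)
        if (digits(data[i]) > width)
            width = digits(data[i]);

    for (int i = 0; i < n; i++)             /* pass 2: print right-aligned */
        printf("%*d\n", width, data[i]);
    return 0;
}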
Looking at the edited question, you need to find the number of digits in the largest number to be presented, and then generate the printf() format using sprintf(), or use %*d with the number of digits passed as an int for the *, followed by the value. Once you've got the biggest number (and you have to determine that in advance), you can determine its number of digits with an 'integer logarithm' algorithm (how many times can you divide by 10 before you get to zero), or by using snprintf() with a buffer length of zero, the format %d, and a null pointer for the string; the return value tells you how many characters would have been formatted.
If you don't know and cannot determine the maximum number ahead of its appearance, you are snookered - there is nothing you can do.
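The snprintf() trick mentioned above looks like this in practice (a minimal sketch):

#include <stdio.h>

int main(void)
{
    int biggest = 1000;                             /* assumed known in advance */
    int width = snprintf(NULL, 0, "%d", biggest);   /* chars needed: 4 */

    printf("%*d\n", width, 1);
    printf("%*d\n", width, 50);
    printf("%*d\n", width, 1000);
    return 0;
}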
#include <stdio.h>

int main()
{
    int i, j, n, b;

    printf("Enter no of rows ");
    scanf("%d", &n);
    b = n;
    for (i = 1; i <= n; ++i) {
        for (j = 1; j <= i; j++) {
            printf("%*d", b, j);    /* first number of the row gets width b */
            b = 1;                  /* the rest of the row is packed tight */
        }
        b = n - i;                  /* shrink the leading width on each row */
        printf("\n");
    }
    return 0;
}
fp = fopen("RKdata.dat", "w");
fprintf(fp, "%5s %12s %20s %14s %15s %15s %15s\n",
        "j", "x", "integrated", "bessj2", "bessj3", "bessj4", "bessj5");
for (j = 1; j <= NSTEP; j += 1)
    fprintf(fp, "%5i\t %12.4f\t %14.6f\t %14.6f\t %14.6f\t %14.6f\t %14.6f\n",
            j, xx[j], y[6][j], bessj(2, xx[j]), bessj(3, xx[j]),
            bessj(4, xx[j]), bessj(5, xx[j]));
fclose(fp);

What does the 9th commandment mean?

In The Ten Commandments for C Programmers, what is your interpretation of the 9th commandment?
The 9th commandment:
Thy external identifiers shall be unique in the first six characters, though this harsh discipline be irksome and the years of its necessity stretch before thee seemingly without end, lest thou tear thy hair out and go mad on that fateful day when thou desirest to make thy program run on an old system.
Exactly what is this all about?
Old linkers only used a limited number of characters of the symbol - I seem to recall that the old IBM mainframes I started programming on only used 8 characters. The C standards people settled on 6 characters as a "lowest common denominator", but would allow a linker to resolve longer names if it wanted to.
If you really hit one of these lowest common denominator linkers, external symbols (function names, external variables, etc) ABCDEFG and ABCDEFH would appear the same to them. Unless you're programming on really old hardware, you can safely ignore this "commandment".
Note that any linker that can't handle more than 6 characters can't do C++ either because of the name mangling.
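A concrete (hypothetical) illustration of the kind of collision this warns about:

/* Under a linker that honors only 6 significant characters, both of these
   external names truncate to the same symbol, "totals" -- a link-time clash
   even though the C source is perfectly legal. (The names are made up.) */
int totals_income;
int totals_expense;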
External identifier = something that might have to be called from another system.
The reason for requiring the first six characters to be unique is that an ancient system may have a six-character limitation on its identifiers. If, one day, such a system tries to call your code, it needs to be able to tell the difference between all of your functions.
These days, this seems overly conservative to me, unless you are working with a lot of legacy systems.
Here are the minimum number of significant characters in an external identifier that must be supported by C/C++ compilers under various standards:
C90 - 6 characters
C99 - 31 characters
C++98 - 1024 characters
Here's an example of the kinds of problems that you can run into if your toolset skimps on these limits (from http://www.comeaucomputing.com/libcomo/):
Note to users with Borland and
MetroWerks CodeWarrior as backend C:
==================================================================
Note that the Borland compiler and linker, and the Metrowerks compiler,
seem to have a maximum external id length of 250 characters. It turns
out that some of the generated mangled template names are unable to
fit within that space. Therefore, when Borland or Metrowerks is used
as the backend C compiler, we have remapped some of the names libcomo
uses to shorter names. So short in fact we could not get away with
names beginning with underscores. In fact, it was necessary to map most
to 2 character id names.
In response to:
C++98 - 1024 characters
'begin humor'
Addendum to 9th commandment:
If thy external identifiers approach'th to be anywhere near as long as one-thousand-and-twenty-four, thou shouldst surely be quickly brought outside and shot.
'/end humor'
A lot of old compilers and linkers had limitations on how long an identifier could be. Six characters was a common limit. Actually, they could be longer than that, but the compiler or linker would throw away everything after the sixth character.
This was usually done to conserve symbol table memory.
It means you're looking at a piece of ancient history. Those commandments are mostly true, but that 9th one may as well actually be carved into a stone tablet, it's so old.
The remaining mystery is: creat. What was wrong with create? It's only six letters!
According to this site:
What does this mean? Your globals should be "Unique to the first six letters", not "limited to six letters". This is in ANSI, I hear, because of utterly painful "obsolescence" of some linkers. Hopefully ANSI will some day say "linkers will have to do longer names, or they'll cry and do longer names". (And all rational people will just ignore this commandment and tell people to upgrade their 2-penny linker - this may not get you friends, or make warnings happy...)
