Where did the name `atoi` come from? - c

In the C language where did they come up with the name atoi for converting a string to an integer? The only thing I can think of is Array To Integer for an acronym but that doesn't really make sense.

It means Ascii to Integer. Likewise, you can have atol for Ascii to Long, atof for Ascii to Float, etc.
A Google search for 'atoi "ascii to integer"' confirms this on several pages.
I'm having trouble finding any official source on it, but this listing of man pages from Third Edition Unix (1973), collected by Dennis Ritchie himself, does contain the line:
atoi(III): convert ASCII to integer
In fact, even the first edition Unix (ca 1971) man pages list atoi as meaning Ascii to Integer.
So even if there isn't any documentation more official than man pages indicating that atoi means Ascii to Integer (I suspect there is and I just haven't been able to locate it), it's been Ascii to Integer by convention at least since 1971.

I firmly believe that the function atoi means ASCII to integer.

Related

How does C language transform char literal to number and vice versa

I've been diving into C/low-level programming/system design recently. As a seasoned Java developer I still remember my attempts to pass the Sun Java Certification, and the questions about whether the char type in Java can be cast to Integer and how that can be done. What I know and remember is this: numbers up to 255 can be treated as either numbers or characters, depending on casting.
Getting to know C, I want to know more, but I find it hard to find a proper answer (I tried googling, but I usually get a gazillion results on just how to convert char to int in code) about how EXACTLY it works: how do the C compiler and system calls transform a number into a character and vice versa?
AFAIK, memory stores numbers. So let's assume a memory cell stores the value 65 (which is the letter 'A'). So there is a value stored, and then C code reads it and stores it into a char variable. So far so good. And then we call printf with %c formatting for the given char parameter.
And here is where the magic happens - HOW EXACTLY printf knows that character with value 65 is letter 'A' (and should display it as a letter). It is a basic character from the raw ASCII range (not some funny emoji-style UTF character). Does it call external standard libraries or system calls to consult the encoding system? I would love a nitty-gritty, low-level explanation, or at least a link to a trusted source.
The C language is largely agnostic about the actual encoding of characters. It has a source character set which defines how the compiler treats characters in the source code. So, for instance on an old IBM system the source character set might be EBCDIC where 65 does not represent 'A'.
C also has an execution character set which defines the meaning of characters in the running program. This is the one that seems more pertinent to your question. But it doesn't really affect the behavior of I/O functions like printf. Instead it affects the results of ctype.h functions like isalpha and toupper. printf just treats it as a char sized value which it receives as an int due to variadic functions using default argument promotions (any type smaller than int is promoted to int, and float is promoted to double). printf then shuffles off the same value to the stdout file and then it's somebody else's problem.
If the source character set and execution character set are different, then the compiler will perform the appropriate conversion, so the source token 'A' will be manipulated in the running program as the corresponding A from the execution character set. The choice of actual encoding for the two character sets, i.e. whether it's ASCII or EBCDIC or something else, is implementation defined.
With a console application it is the console or terminal which receives the character value that has to look it up in a font's glyph table to display the correct image of the character.
Character constants are of type int. Except for the fact that it is implementation defined whether char is signed or unsigned, a char can mostly be treated as a narrow integer. The only conversion needed between the two is narrowing or widening (and possibly sign extension).
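As a small illustration of the points above, here is a minimal sketch in which the same char value is rendered once with %c and once with %d. The helper name describe_char is made up for this example, and the "A=65" result assumes an ASCII-based execution character set.

```c
#include <stdio.h>

/* Hypothetical helper (not a standard function): formats c once as a
   character (%c) and once as a number (%d) into buf. On an ASCII-based
   implementation, 'A' yields the text "A=65". */
int describe_char(char c, char *buf, size_t size)
{
    /* c undergoes default argument promotion to int in both argument
       positions; only the conversion specifier decides whether that
       int is emitted as-is or converted to decimal digits. */
    return snprintf(buf, size, "%c=%d", c, c);
}
```

Calling describe_char('A', buf, sizeof buf) with a 16-byte buffer should leave "A=65" in buf on such a system; nothing in the call consults an encoding table.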
"HOW EXACTLY printf knows that character with value 65 is letter 'A' (and should display it as a letter)."
It usually doesn't, and it does not even need to. Even the compiler does not see characters ', A and ' in the C language fragment
char c = 'A';
printf("%c", c);
If the source and execution character sets are both ASCII or ASCII-compatible, as is usually the case nowadays, the compiler will have among the stream of bytes the triplet 39, 65, 39 - or rather 00100111 01000001 00100111. And its parser has been programmed with a rule that something between two 00100111s is a character literal, and since 01000001 is not a magic value it is translated as is to the final program.
The C program, at runtime, then handles 01000001 all the time (though from time to time it might be 01000001 zero-extended to an int, e.g. 00000000 00000000 00000000 01000001 on 32-bit systems; adding leading zeroes does not change its numerical value). On some systems, printf - or rather the underlying internal file routines - might translate the character value 01000001 to something else. But on most systems, 01000001 will be passed to the operating system as is. Then the operating system - or possibly a GUI program receiving the output from the operating system - will want to display that character, so the display font is consulted for the glyph that corresponds to 01000001, and usually the glyph for letter 01000001 looks something like
A
And that will be displayed to the user.
At no point does the system really operate with glyphs or characters but just binary numbers. The system in itself is a Chinese room.
The real magic of printf is not how it handles characters, but how it handles numbers, as these are converted to more characters. While %c passes values as-is, %d will convert such a simple integer value as 0b101111000110000101001110 to stream of bytes 0b00110001 0b00110010 0b00110011 0b00110100 0b00110101 0b00110110 0b00110111 0b00111000 so that the display routine will correctly display it as
12345678
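The number-to-digits step described above can be sketched roughly as follows. This is not how any particular printf is implemented; uint_to_decimal is a hypothetical helper that only illustrates the idea of turning one binary value into a sequence of digit characters.

```c
#include <stddef.h>

/* Hypothetical helper, not part of any real printf: writes the decimal
   digits of value into buf (NUL-terminated) and returns the number of
   digits, or 0 if buf is too small. */
size_t uint_to_decimal(unsigned long value, char *buf, size_t size)
{
    char tmp[3 * sizeof value + 1];   /* roomy enough for any unsigned long */
    size_t n = 0;

    /* Peel off the least significant decimal digit each round. The C
       standard guarantees '0'..'9' are consecutive, so '0' + digit
       gives the right character in any encoding. */
    do {
        tmp[n++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);

    if (n + 1 > size)
        return 0;

    /* The digits were produced in reverse order; copy them back to front. */
    for (size_t i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return n;
}
```

Given the value 12345678, this fills the buffer with the eight characters '1' through '8', which on an ASCII system are exactly the bytes listed above.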
char in C is just an integer CHAR_BIT bits long. Usually it is 8 bits long.
HOW EXACTLY printf knows that character with value 65 is letter 'A'
The implementation knows which character encoding it uses, and the printf function code takes the appropriate action to output the letter 'A'.

What is the difference between sqlite3_bind_text, sqlite3_bind_text16 and sqlite3_bind_text64?

I am using the sqlite3 C interface. After reading the documentation at https://www.sqlite.org/c3ref/bind_blob.html , I am totally confused.
What is the difference between sqlite3_bind_text, sqlite3_bind_text16 and sqlite3_bind_text64?
The documentation only says that sqlite3_bind_text64 accepts an encoding parameter, which can be SQLITE_UTF8, SQLITE_UTF16, SQLITE_UTF16BE, or SQLITE_UTF16LE.
So I guess, based on the parameters passed to these functions, that:
sqlite3_bind_text is for ANSI characters, char *
sqlite3_bind_text16 is for UTF-16 characters,
sqlite3_bind_text64 is for various encoding mentioned above.
Is that correct?
One more question:
The documentation says "If the fourth parameter to sqlite3_bind_text() or sqlite3_bind_text16() is negative, then the length of the string is the number of bytes up to the first zero terminator." But it does not say what happens for sqlite3_bind_text64. Originally I thought this was a typo. However, when I pass -1 as the fourth parameter to sqlite3_bind_text64, I always get an SQLITE_TOOBIG error, which makes me think they left sqlite3_bind_text64 out of the above statement on purpose. Is that correct?
Thanks
sqlite3_bind_text() is for UTF-8 strings.
sqlite3_bind_text16() is for UTF-16 strings using your processor's native endianness.
sqlite3_bind_text64() lets you specify a particular encoding (utf-8, native utf-16, or a particular endian utf-16). You'll probably never need it.
sqlite3_bind_blob() should be used for non-Unicode strings that are just treated as binary blobs; all sqlite string functions work only with Unicode.

Where did the name `atol` come from? [duplicate]

Does anyone know the source of the name of the function atol, which converts a string to a long?
I thought of Array To Long, but that doesn't sound right to me.
ASCII To Long is what atol(3) means (in the early days of Unix, only ASCII was used, and IIRC this was mentioned in the K&R book).
Today we usually use UTF-8 everywhere, but atol still works (since UTF-8 encodes digits the same way as ASCII).
On C implementations using another encoding (e.g. EBCDIC) atol should still do what is expected (so atol("345") would give 345), since the C standard requires that the encoding of digit characters is consecutive. Its implementation might be more complex (or encoding specific).
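To show why the consecutive-digits guarantee is enough, here is a minimal sketch of an atol-like function. The name parse_long is hypothetical, this is not how any particular C library implements atol, and overflow is deliberately not handled.

```c
#include <ctype.h>

/* Hypothetical atol-like sketch: skips leading whitespace, accepts an
   optional sign, then accumulates decimal digits. Overflow is not
   handled, so it is only meant for small illustrative inputs. */
long parse_long(const char *s)
{
    long result = 0;
    int negative = 0;

    while (isspace((unsigned char)*s))
        s++;

    if (*s == '+' || *s == '-') {
        negative = (*s == '-');
        s++;
    }

    /* Because '0'..'9' are required to be consecutive, *s - '0' is the
       digit's numeric value in ASCII, EBCDIC, or any other conforming
       execution character set. */
    while (*s >= '0' && *s <= '9') {
        result = result * 10 + (*s - '0');
        s++;
    }

    return negative ? -result : result;
}
```

With this sketch, parse_long("345") gives 345 without the code ever referring to ASCII values directly.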
So today, the atol name no longer refers to ASCII. The C11 standard (n1570) doesn't mention ASCII (as mandatory), IIRC. You might rewrite history by reading atol as "anything to long", even if historically it was ASCII to long.
It's ASCII to long; the same convention is used for atoi etc.

Print a number in base 4

Lately I had a task that included printing base-4 representation of a number. Since I didn't find a function to do it for me, I implemented it (which is not so hard of course), but I wonder, is there a way to do it using format placeholders?
I'm not asking how to implement such function, but if such function / format placeholder already exists?
There is no standard C or C++ function for this, but you may be able to use itoa, a non-standard extension that some compilers provide.
The closest you could get to doing it with printf is using snprintf to convert it to hex, then a lookup table to convert hex digits to pairs of base-4 digits. :-)
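The hex-plus-lookup-table idea could look roughly like this. It is a sketch only: to_base4 is a hypothetical name, and the code relies on the fact that each hex digit corresponds to exactly two base-4 digits.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the base-4 representation of value into
   out (NUL-terminated). Returns 0 on success, -1 if out is too small. */
int to_base4(unsigned int value, char *out, size_t size)
{
    /* Each hex digit 0..f maps to a fixed pair of base-4 digits. */
    static const char *const pairs[16] = {
        "00", "01", "02", "03", "10", "11", "12", "13",
        "20", "21", "22", "23", "30", "31", "32", "33"
    };
    static const char hexdigits[] = "0123456789abcdef";
    char hex[2 * sizeof value + 1];
    char digits[4 * sizeof value + 1];
    size_t n = 0;

    snprintf(hex, sizeof hex, "%x", value);

    for (const char *p = hex; *p != '\0'; p++) {
        /* Look the hex digit up by position, so no assumption about
           the character encoding of 'a'..'f' is needed. */
        size_t v = (size_t)(strchr(hexdigits, *p) - hexdigits);
        digits[n++] = pairs[v][0];
        digits[n++] = pairs[v][1];
    }
    digits[n] = '\0';

    /* The first pair may start with a zero (e.g. "01"); drop it unless
       it is the only digit left. */
    const char *start = digits;
    if (start[0] == '0' && start[1] != '\0')
        start++;

    if (strlen(start) + 1 > size)
        return -1;
    strcpy(out, start);
    return 0;
}
```

For example, 27 is 1b in hex, which maps to the pairs "01" and "23", so the result is "123" (1*16 + 2*4 + 3 = 27).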
No, not in the Standard C library.
I think that printf can handle only decimal, hexadecimal and octal values.
So I think not.

isLetter with accented characters in C

I'd like to create (or find) a C function to check if a char c is a letter...
I can do this for a-z and A-Z easily of course.
However, I get an error when testing c == á, ã, ô, ç, ë, etc.
Probably those special characters are stored in more than one char...
I'd like to know:
How these special characters are stored, which arguments my function needs to receive, and how to do it?
I'd also like to know whether there is any standard function that already does this.
I think you're looking for the iswalpha() routine:
#include <wctype.h>
int iswalpha(wint_t wc);
DESCRIPTION
The iswalpha() function is the wide-character equivalent of
the isalpha(3) function. It tests whether wc is a wide
character belonging to the wide-character class "alpha".
It does depend upon the LC_CTYPE of the current locale(7), so its use in a program that is supposed to handle multiple types of input correctly simultaneously might not be ideal.
If you are working with single-byte codesets such as ISO 8859-1 or 8859-15 (or any of the other 8859-x codesets), then the isalpha() function will do the job if you also remember to use setlocale(LC_ALL, ""); (or some other suitable invocation of setlocale()) in your program. Without this, the program runs in the C locale, which only classifies the ASCII characters (8859-x characters in the range 0x00..0x7F).
If you are working with multibyte or wide character codesets (such as UTF8 or UTF16), then you need to look to the wide character functions found in <wchar.h> and <wctype.h>.
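A minimal sketch of the wide-character route might look like this. The helper name count_letters and the fixed 256-element buffer are assumptions made for the example; the calls themselves (mbstowcs, iswalpha) are standard C.

```c
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: counts the letters in a multibyte string that is
   encoded in the current locale's encoding. Returns -1 if the string is
   not valid in that encoding or does not fit the fixed-size buffer. */
long count_letters(const char *s)
{
    wchar_t wide[256];
    size_t capacity = sizeof wide / sizeof wide[0];

    /* Convert the multibyte string to wide characters, so that each
       array element can be classified on its own. */
    size_t n = mbstowcs(wide, s, capacity);
    if (n == (size_t)-1 || n == capacity)
        return -1;

    long count = 0;
    for (size_t i = 0; i < n; i++) {
        /* iswalpha consults LC_CTYPE of the current locale. */
        if (iswalpha((wint_t)wide[i]))
            count++;
    }
    return count;
}
```

The caller would normally run setlocale(LC_ALL, ""); (from <locale.h>) once at program start. Without it the program stays in the C locale, where only the ASCII letters are guaranteed to be classified as alphabetic and non-ASCII input may be rejected by mbstowcs.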
How these characters are stored is locale-dependent. On most UNIX systems, they'll be stored as UTF-8, whereas a Win32 machine will likely represent them as UTF-16. UTF-8 stores a character as a variable number of chars, whereas UTF-16 stores it as one or two 16-bit code units held in wchar_t (or unsigned short). Incidentally, sizeof(wchar_t) on Windows is only 2 (vs. typically 4 on *nix), so a character outside the Basic Multilingual Plane needs a surrogate pair, i.e. two wchar_t values; common accented Latin letters such as á or ç fit in a single one.
As was mentioned, the iswalpha() routine will do this for you, and is documented here. It should take care of locale-specific issues for you.
You probably want http://site.icu-project.org/. It provides a portable library with APIs for this.

Resources