Query on handling TAB character? - c

In the ASCII character set, the following three whitespace characters have numeric codes:
Space (32), LineFeed (10), CarriageReturn (13)
So it is easy to write a program that reads or writes these whitespace characters portably in a programming language such as C, using the following notations:
LineFeed - '\n'
Carriage Return - '\r'
Space - ' '
I learnt that a TAB is a run of sometimes 4 and sometimes 8 Space characters.
My question:
How do I understand the meaning of the '\t' character in a programming language such as C, when the character set gives no standard definition of TAB?

In ASCII, a Horizontal Tab is code 9. The ASCII Horizontal Tab is code 9 regardless of what character set the code is written in.
In C, '\t' is the horizontal tab in the character set of the source code, regardless of the character set of the IO. The integer value of '\t' may or may not be 9.
The character set of the code and the character set of IO (e.g. files) are commonly the same. In 2014, they are often both ASCII (at least for the ASCII codes 0 - 127).
In C, the effect of printing a '\t' is specified somewhat precisely:
Moves the active position to the next horizontal tabulation position
on the current line. If the active position is at or past the last defined horizontal
tabulation position, the behavior of the display device is unspecified. C11 §5.2.2 2
With ASCII, the Horizontal Tab is intended not to represent printable information, but rather to control devices. The ASCII original definition led to ambiguity as to the precise action of a control device: move to the next tabulation stop.
Given these similar but different meanings, and acknowledging that other languages have various meanings of their own, the precise meaning is highly dependent on the situation. Therefore, to maintain portability, other situation-dependent information is needed (e.g. a definition or list of the tab stops) to precisely generate and interpret a horizontal tab.
Recommend:
Unless the data format requires tabs (e.g. tab-delimited files, makefiles), generate spaces rather than tabs. Upon reading a '\t', interpret it, where possible, the same as one or more consecutive spaces.

What to do with a tab depends on:
what kind of input you are processing, and
the capability of the device you are targeting.
A tab is meant as an elastic delimiter that tells the device to move to the next tab stop. Replacing tabs with N spaces is just a poor man's handling. To render a tab correctly you need to figure out appropriate tab stops somehow.
When printing a table that uses tabs as field separators onto a text terminal, you need to load the table, count the number of characters in each column, and pad each field with spaces so that it fits the widest field in that column. In this case the length of the tab is determined by the number of characters you have to write before the next column, which in turn varies with the contents of the current row.
A practical example of tab-stops: http://nickgravgaard.com/elastictabstops/

This is just an addition to the other answers. The tab stops are usually set at positions 4*n or 8*n (here I am using 0-based numbering of positions). If the cursor is at position x, after outputting a tab character it jumps to the next tab stop (the division below is integer division):
x = (x / 8 + 1) * 8;
Or if the tab-stops have spacing of s (usually s is 4 or 8, as mentioned):
x = (x / s + 1) * s;
However, if tab-stops are completely flexible (e.g. the user can specify each tab-stop in Microsoft Word), no such formula exists.

Related

What character set does C's char type use?

Forgive me for the stupid question. But I was wondering what character set C's char type uses. At first I thought it was ASCII, but then I realized it could reach 255 which exceeds ASCII's 127 characters. What character set is this? Extended ASCII?
The C standard does not require C implementations to use a particular character set. It requires the execution character set (used in running programs, in contrast to the source character set used when compiling) to have the Latin alphabet letters A-Z and a-z, the digits 0-9, these characters:
!"#%&'()*+,-./:;<=>?[\]^_{|}~
the space character, and characters for horizontal tab, vertical tab, form feed, alert, backspace, carriage return, and new line. It requires the codes for the digits to be consecutive from the code for 0 to the code for 9, and the character value zero must be available to mark the end of strings. Otherwise, it leaves the character set up to each C implementation.
C implementations overwhelmingly use ASCII with the character codes 0-127. There may be somewhat more variation in what implementations use with codes 128-255.

Can backspace escape cancel a new-line escape?

I'm working on Ubuntu.
Code:
printf("Hello\n\b world");
I get on terminal:
Hello
world
Why does backspace not cancel the \n?
Is there a hierarchy in chars?
How can I delete special chars?
Your question goes beyond the scope of the C language: printf("Hello\n\b world"); outputs the bytes from the format string, possibly translated according to the text mode handling of newlines:
On Unix systems, the bytes are output to the system handle unmodified.
On legacy Microsoft systems, the newline is converted to CR LF and the other bytes are transmitted unmodified.
If the standard output is directed to a file, the file will contain the translation of the newline and a backspace (0x08 on most systems).
If the standard output goes to a terminal, the handling of the backspace special character is outside the program's control: the terminal (hardware, virtual, local or remote...) will perform its task as programmed and configured... Most terminals move the cursor left one position on whatever display they control, some erase the character at that position. If the cursor is already at column 1, it is again system dependent whether backspace moves the cursor back to the end of the previous line, whatever that means. Many systems don't do that and keep the cursor at column 1. This seems consistent with the behavior you observe.
This is what the C standard says (in C 2018 5.2.2 2) about the new line character:
Moves the active position to the initial position of the next line.
and backspace:
Moves the active position to the previous position on the current line. If the active position is at the initial position of a line, the behavior of the display device is unspecified.
Note that the backspace character is not specified to erase a previous character. It is specified to cause a certain action on a display device.
Recall that C was developed in an era when teletypes and other physical printing devices were in common use. Many of these devices could only push the paper upward. Once a new line character caused the paper to be pushed upward, there was no way to move it downward again.
Additionally, some early video displays, or the software driving them, emulated physical printing and did not support going back a line, at least in some of their modes of operation.
On displays where one could move the cursor freely, it is not clear what a backspace from the beginning of a line should do. Consider a display which has 80 columns, numbered from 1 to 80, and the last line printed contained 40 characters, followed by a new line. When we backspace, we move the cursor back to that line, but which column do we move it to? Column 80, the last one of the display? Or column 40, the last one where something was printed? Different devices might handle this differently. Note that the latter choice requires the device to remember the length of each line, an added burden on early computing machinery. (My high school’s cheap display terminals did not have enough memory to remember all the text in a 24×80 display. I think it was only 1024 bytes, enough for 12.8 lines of 80 characters. If you wrote complete lines of text, it would scroll earlier lines off the display, keeping only the last 12.)
Because of these variations in behavior, the C standard did not specify the details of backspacing from the start of a line.
You ask about a “backspace escape” canceling a “new-line escape.” However, the escape sequences are irrelevant here; they are in a different layer of representation than the operations of the characters:
Inside a string literal, \b and \n are escape sequences. As the compiler translates the program, it replaces these with a backspace character and a new line character. Then they are no longer escape sequences; they are simply characters in a string.
When you write the characters with printf, they are transmitted as characters in a stream.
When the characters are sent to a display device (because that is what the stream is connected to), they produce the actions in the 5.2.2 2 text cited above.
Those escape sequences \b and \n represent control characters. A control character is a special character that, well, controls the behavior of the output device in some special way. When you say
printf("A");
it prints the (ordinary) character A to the screen. But when you say
printf("\n");
it doesn't print anything, instead it moves the cursor down to the beginning of the next line.
Now, the meaning of \b is not "cancel the character to the left". The control character \b does not "cancel" anything. What it does is just move the cursor one character to the left, if it can. But if the cursor is already at the left edge, it probably can't.
Once upon a time, and especially when the output was going to a printer that actually printed on paper, it was common to do things like
printf("this is u\b_n\b_d\b_e\b_r\b_l\b_i\b_n\b_e\b_d\b_\n");
or
printf("this is b\bbo\bol\bld\bd\n");
to print underlined or bold words by overprinting. These examples obviously rely on the move-one-to-the-left behavior of \b. These examples prove that the behavior of \b is not anything like "canceling"!
It sounds like you think \b might somehow affect the string it's part of.
It sounds like you think \b might somehow be processed by your C compiler, or by the C library.
It sounds like you think that the string "abc\bdef" might get converted to "abdef".
But none of these things is true. The backspace character \b is interpreted by your screen or your printer, or whatever output device your program is "printing" to. The interpretation of control characters like \b is mostly up to your output device. It is mostly not a property of the C programming language.

How tab and space affect code size in C

When working on an embedded system, every byte of memory matters. In a C/C++ program, is there any difference in the resulting code size when you use 4 spaces instead of 1 tab?
No.
The emitted binary doesn't change based on what spacing you use in your program.
The amount of space the source file takes up does change, though. Spaces and tabs are each one character, so using 1 tab vs 4 spaces takes up different amounts of memory. It's important to note that this applies only to the source file, and during compilation.
Formatting the source code itself with spaces or tabs makes no difference to the executable code size. It is a preference; mine is never to use tab formatting - please read this.
As for the program itself, tabs only make a difference inside string literals. The control character '\t' is one byte in the executable, while a run of spaces takes one byte per space.
But I prefer to use a field width specifier such as printf("%4d", i) to format the output.

BBC Basic: Inserting a control character without occupying space in Mode 7

I'm using mode 7 ("Teletext mode") on my Beeb. I'd like to print a string of unbroken characters with a coloured text control character in the middle, as per this mock-up:
However, I can't work out how this can be done. The control character needs to occupy space in the output:
PRINT CHR$129;"STACK"CHR$132;"OVERFLOW"
I read up on held graphics mode, but this only seems to allow me to repeat the last used graphics symbol, instead of inserting a space when I print a control character. When I do try this with text I just get an additional space for the held graphics character:
PRINT CHR$129;"STACK"CHR$158;CHR$132;"OVERFLOW"
Is this possible? Can I print a control character without getting a visible space?
Or perhaps there is a way to insert a control character followed by a backspace, to claim back the occupied space but retain the control code effect?
It is not possible to treat the text characters as graphics characters when using the 'held graphics' character. A good example of using 'held graphics' can be found here: http://www.riscos.com/support/developers/bbcbasic/part2/teletext.html
You also cannot use the backspace character to go back one space as each control code takes up one space on the screen.
OK so this is a bit of a fudge; but it was an answer to my problem so I will share it here for all those BBC Micro / Teletext developers struggling with the same problem...
My challenge was to avoid a noticeable space between the two coloured words. Control characters must exist in the text and occupy a character (either as a space or a copy of the last used block graphic).
Therefore, by inserting a space between every character I was able to make the text appear as one word (albeit with slightly excessive letter spacing):
PRINT CHR$129;"S T A C K"CHR$132;"O V E R F L O W"
This had the desired effect for me - it may not for some others. The only other route I could see available was to render the whole text in block graphics, which would occupy significantly more screen space than the approach I settled for.
This is from memory, I recall that CHR$(8) moves the cursor one place to the left.
Put that just before the "O":
PRINT CHR$(129);"STACK";CHR$(132);CHR$(8);"OVERFLOW"
Sadly my BBC Model B is, I believe, in my parents' attic, so I can't test this.

ncurses form: set_field_pad with no pad

In Linux ncurses and with the C language, using the form.h library it is possible to specify which pad character should be printed on screen in the empty positions of a certain field. That is: if a field is only partially filled by some characters typed by the user, the remaining ones will be momentarily equal to the specified pad character.
The function is
set_field_pad(FIELD *field, int pad)
As stated here (sec. 18.3.4), the default pad character is a blank space. First of all: if I change it to #, a # is shown not only in the empty positions but also wherever I type a space. Is this avoidable? A space is a character that I intentionally typed, while the empty positions are where I never typed anything.
I tried calling set_field_pad with several values for int pad, in particular 0 and 3 (the ASCII values for NUL and End of Text), but the resulting pad character always remained "space", as in the default case. Here it is said that some terminals incorrectly display NUL as a blank space.
I would like to have no padding character: the cursor in the field should stop on the last character typed by the user. If this is possible, how could I set it for my field?
