TSR program to change the case of characters on screen in Windows/DOS - C

I would like to know how we can change the case of the characters on screen using C. It is a TSR program for DOS using the dos.h header file.

I might be able to help partially from what I remember of my early undergrad.
In DOS, the address 0xB8000000 (0xB800:0 as segment:offset, as rightly pointed out in the comments) is the starting address of text-mode video memory (0xA000:0 being the counterpart for graphics modes). Anything written into this area is rendered directly by the VGA card. Every character cell on the screen is made up of two bytes: the first byte is the ASCII character and the second is the color attribute.
So effectively you take a far pointer in 16-bit C (since a normal near pointer can't reach it) and assign it the above address. The screen then occupies rows * columns * 2 bytes, i.e. 80 * 25 * 2 = 4000 single-byte addresses for the standard text mode.
I remember having written the equivalent of a trivial printf function using above.
Getting back to your problem: you have to write code which loops through every even offset from the above address up to the screen size. Even offsets only, because the odd ones hold the color attribute. At each one, check whether the byte is an ASCII letter and add or subtract 32 as needed, e.g. 'A' + 32 gives you 'a', and so on.
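For instance, a minimal sketch of that loop, assuming a 16-bit DOS compiler such as Turbo C (the far keyword and MK_FP() are compiler extensions, not standard C):

#include <dos.h>     /* MK_FP() - Turbo C / Borland specific */
#include <ctype.h>   /* isupper(), islower() */

void toggle_screen_case(void)
{
    unsigned char far *video = (unsigned char far *) MK_FP(0xB800, 0);
    int i;

    /* Step by 2: even offsets are characters, odd offsets are attributes. */
    for (i = 0; i < 80 * 25 * 2; i += 2) {
        unsigned char c = video[i];
        if (isupper(c))
            video[i] = c + 32;   /* 'A' + 32 == 'a' */
        else if (islower(c))
            video[i] = c - 32;
    }
}

int main(void)
{
    toggle_screen_case();   /* one pass; a real TSR would run this from a hooked interrupt */
    return 0;
}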
The question remains of when your above program does this. I believe dos.h has getvect()/setvect() to hook an interrupt such as the timer tick, so that your resident routine runs periodically and re-cases whatever is on screen, but this part is not very clear in my memory.
See if that works for you.

Related

How to dynamically change the string from the I/O stream in C

I was looking at a problem in K&R (Exercise 1-18), which asks you to remove trailing blanks and tabs. That got me thinking about text messengers like WhatsApp. Let's say I am typing the word "parochial": the moment I have written "paro", it shows "parochial" as an option, and if I click on it, it replaces the entire word (even if my spelling was wrong, it replaces the word when I choose an option).
What I am thinking is that the pointer goes back to the start of the word; or rather, that with the start of every new word I type, a pointer gets fixed to the first letter, and if I choose an option it replaces that entire word in the stream (I don't know if I'm thinking in the right direction).
I can use getchar() to read the next character, but how do I:
1. Go backward from the current position in the stream (by using fseek()?)
2. Fix a pointer at a position in an I/O stream, so that I can pin it to the beginning of a new word?
Please tell me whether my approach is correct or whether I need to understand some different concept. Thanks in advance.
Standard streams are mainly for going forward*, for minimizing the number of I/O system calls, and for avoiding the need to keep large files in memory all at once.
A GUI app is likely to want to keep all of its display output in memory, and when you have the whole thing in memory, going back and forth is just a simple matter of incrementing and decrementing pointers or indices.
*(random seeks aren't always optimal, and they prevent you from doing I/O on nonseekable files such as pipes or sockets)
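To illustrate with a sketch (the buffer, the word-start bookkeeping and the chosen suggestion are all invented for the example): once the whole line lives in memory, "going back to the start of the word" is just index arithmetic, with no stream seeking involved.

#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[256] = "I am writing paro";
    size_t word_start = strlen(line) - strlen("paro");  /* index of the last word's first letter */

    /* "Choosing a suggestion" is simply overwriting from word_start onward. */
    line[word_start] = '\0';
    strcat(line, "parochial");

    printf("%s\n", line);   /* prints: I am writing parochial */
    return 0;
}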

Parsing a MAC Address to an array of three 16-bit shorts

MAC Addresses are 48 bits. That is equivalent to three shorts. MAC
addresses are sometimes written like this: 01:23:45:67:89:ab where
each pair of digits represents a hexadecimal number.
Write a function that will take in a character pointer pointing to a
null terminated string of characters like in the example and will
break it apart and then store it in an array of three 16-bit shorts.
The address of the array will also be passed into the function.
I figured the function header should look something like void convertMacToShort(char *macAddr, short *shorts);. What I'm having difficulty with is parsing the char*. I feel like it's possible if I loop over it, but that doesn't feel efficient enough. I don't even need to make this a universal function of sorts--the MAC address will always be a char* in the format of 01:23:45:67:89:ab.
What's a good way to go about parsing this?
Well, efficiency is one thing ... robustness is another.
If you have very well-defined circumstances, like a list of millions of MAC addresses which are all in the same format (only lowercase letters, always leading zeroes, ...), then I would suggest a quick function accessing the characters directly.
If you're parsing user input and need to detect input errors as well, execution speed is not the thing to worry about. In this scenario you have to make sure that you detect every possible mistake a user can make (and that is quite a feat). This leads to sscanf(..), and in that case I would even suggest writing your own function to parse the string (in my experience sscanf(..) sometimes causes trouble depending on the input string, so I avoid it when processing user input).
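For the fixed format from the question, an sscanf(..)-based version might look like the following sketch (the function name and signature come from the question; packing each pair of bytes big-endian into one short is my assumption about the intended layout):

#include <stdio.h>

void convertMacToShort(const char *macAddr, short *shorts)
{
    unsigned int b[6];   /* sscanf's %x conversion needs unsigned int targets */

    if (sscanf(macAddr, "%2x:%2x:%2x:%2x:%2x:%2x",
               &b[0], &b[1], &b[2], &b[3], &b[4], &b[5]) == 6) {
        /* Pack each pair of bytes into one 16-bit value. */
        shorts[0] = (short)((b[0] << 8) | b[1]);
        shorts[1] = (short)((b[2] << 8) | b[3]);
        shorts[2] = (short)((b[4] << 8) | b[5]);
    }
}

int main(void)
{
    short mac[3];
    convertMacToShort("01:23:45:67:89:ab", mac);
    printf("%04hx %04hx %04hx\n", mac[0], mac[1], mac[2]);  /* 0123 4567 89ab */
    return 0;
}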
Another thing: if you're worried about efficiency in terms of execution time, write a little benchmark which runs the parsing function a few million times and compare the execution times. This is easily done and sometimes brings up surprises...
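Such a benchmark can be as small as this sketch (it reuses the hypothetical convertMacToShort() from the sketch above, declared here and linked in separately; clock() from <time.h> measures CPU time, which is fine for comparisons):

#include <stdio.h>
#include <time.h>

void convertMacToShort(const char *macAddr, short *shorts);  /* from the sketch above */

int main(void)
{
    short mac[3];
    long i;
    clock_t start = clock();

    for (i = 0; i < 10000000L; i++)   /* ten million parses */
        convertMacToShort("01:23:45:67:89:ab", mac);

    printf("%.2f seconds\n", (double)(clock() - start) / CLOCKS_PER_SEC);
    return 0;
}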

I cannot understand the abstraction between characters we see and how computers treat them

This is pretty low level and English is not my mother tongue so please be easy on me.
So imagine you are in bash and a command prompt is in front of you on the screen.
When you type ls and hit enter, you are actually sending some bytes to the CPU, 01101100 01110011 00001010 (that is: l, s, linefeed), right? The keyboard controller sends the bytes to the CPU, and the CPU tells the operating system which bytes have been received.
So we have an application called 01101100 01110011 on our hard drive (or in memory...), if I understand correctly? That is a file, and it is an executable file. But how does the operating system find 01101100 01110011 on a drive or in memory?
Also, I want to extend this question to functions. We say the C standard library has a function called printf, for example. How can a function have a name that is in a file? OK, I understand that the implementation of the printf function is CPU- and operating-system-specific and is a number of machine instructions lying somewhere in memory or on the hard drive. But I do not understand how we get to it.
When I link code that requires the implementation of printf, how is it found? I am assuming the operating system knows nothing about the name of the function, or does it?
Koray, user DrKoch gave a good answer, but I'd like to add some abstractions.
First, ASCII is a code. It is a table with bit patterns in one column and a letter in the next column. The bit patterns are exactly one byte long (leaving aside 'wide chars' and the like). If we know a byte is supposed to represent a character, we can look up the byte's bit pattern in the table. A print function (remember the matrix printers?) receives a character (a byte) and instructs the needles of the matrix printer to hammer onto the paper in some orderly way, and behold, a letter is formed that humans can read. The ASCII code was devised because computers don't think in letters. There are also other codes, such as EBCDIC, which only means the table is different.
Now, if we don't know that the byte is a representation of a letter in a certain code, then we are lost and the byte could just as well mean a number. We can multiply the byte with another byte. So you can multiply 'a' with 'p', which gives 97 * 112 = 10864. Does that make sense? Only if we know the bytes represent numbers; it is nonsense if the bytes represent characters.
The next level: we call a sequence of bytes that are all supposed to represent letters (characters) a 'string', and we have developed functions that can search, get and append from/to strings. How long is a string? In C we agreed that the end of the string is reached when we see a byte whose bit pattern is all zeroes, the null character. In other languages, a string representation can have a length member and so doesn't need a terminating null character.
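A tiny illustration of both agreements at once (the ASCII table and the terminating null), using the "ls" example from the question:

#include <stdio.h>

int main(void)
{
    const char *s = "ls";
    const char *p;

    /* Walk the bytes until the all-zeroes byte that ends the string. */
    for (p = s; *p != '\0'; p++)
        printf("%d ", *p);   /* prints the ASCII codes: 108 115 */
    printf("(then a 0 byte marks the end)\n");
    return 0;
}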
This is an example of a "stacking of agreements". Another example (referring to a question you asked before) is interrupts: the hardware defines a physical line on the circuit board as an interrupt line (agreement). It gets connected to the interrupt pin (agreement) of the processor. A signal on the line (e.g. from an external device) causes the processor to save the current state of its registers (agreement) and transfer control to a predefined memory location (agreement) where an interrupt handler is placed (agreement) which handles the request from the external device. In this example of stacking we can go many levels up to the functional application, and many levels down to the individual gates and transistors (and the basic definition of how many volts make a '1', how many volts make a '0', and how long that voltage must be observed before a one or zero has definitely been seen).
Only when you understand that all these levels are just agreements can you understand a computer. And only when you understand that they are agreements made between humans can you abstract from them and not be bothered with these basics (the engineers take care of them).
You'll hardly find an answer if you look at the individual bits or bytes and the CPU.
In fact, when you type l and s, the ASCII codes of these characters are read by the shell and combined into the string "ls". By that time the shell has built a dictionary with string keys, where it finds the key "ls" and sees that it points to a specific executable "ls" in a path like "/usr/bin".
You see, even the shell thinks in strings, not in characters, bytes or bits.
Something very similar happens inside the linker when it builds an executable from your code and a collection of library files (*.lib, *.dll). It has built a dictionary with "printf" as one of the keys, which points to the correct library file and a byte offset into that file. (This is rather simplified, to demonstrate the principle.)
There are several layers of libraries (and BIOS code) between all this and the CPU. Don't make your life too hard; don't think too much about these layers in detail.

Getting character attributes

I am using the WinAPI to get the attribute of the character located at line y and column x of the console screen.
This is what I am trying to do after a call to GetConsoleScreenBufferInfo(GetStdHandle(STD_OUTPUT_HANDLE), &nativeData); where the console cursor has been set to the specified location. This doesn't work: it returns the most recently set attribute instead of the attribute of the character at that cell.
How do I obtain the attributes used on all the characters at their locations?
EDIT:
The code I used to test ReadConsoleOutput() : http://hastebin.com/atohetisin.pl
It throws garbage values.
I see several problems off the top of my head:
No error checking. You must check the return value for ReadConsoleOutput and other functions, as documented. If the function fails, you must call GetLastError() to get the error code. If you don't check for errors, you're flying blind.
You don't allocate a buffer to receive the data in. (Granted, the documentation confusingly implies that it allocates the buffer for you, but that's obviously wrong since there's no way for it to return a pointer to it. Also, the sample code clearly shows that you have to allocate the buffer yourself. I've added a note.)
It looks as if you had intended to read the characters you had written, but you are writing to (10,5) and reading from (0,0).
You're passing newpos, which is set to (10,5), as dwBufferCoord when you call ReadConsoleOutput, but you specified a buffer size of (2,1). It doesn't make sense for the target coordinates to be outside the buffer.
Taking those last two points together I think perhaps you have dwBufferCoord and lpReadRegion confused, though I'm not sure what you meant the coordinates (200,50) to do.
You're interpreting CHAR_INFO as an integer in the final printf statement. The first element of CHAR_INFO is the character itself, not the attribute. You probably wanted to say chiBuffer[0].Attributes rather than just chiBuffer[0]. (Of course, this is moot at the moment, since chiBuffer points to a random memory address.)
If you do want to retrieve the character, you'll first need to work out whether the console is in Unicode or ASCII mode, and retrieve UnicodeChar or AsciiChar accordingly.
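Putting those points together, a minimal corrected sketch (not the asker's code; the coordinates and names are made up) that reads the single cell at column 10, row 5 and prints its attribute:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    CHAR_INFO cell;                        /* caller-allocated buffer, one element */
    COORD bufSize  = { 1, 1 };             /* the buffer is 1 column x 1 row */
    COORD bufCoord = { 0, 0 };             /* write into the buffer at (0,0) */
    SMALL_RECT region = { 10, 5, 10, 5 };  /* Left, Top, Right, Bottom: one screen cell */

    if (!ReadConsoleOutput(out, &cell, bufSize, bufCoord, &region)) {
        fprintf(stderr, "ReadConsoleOutput failed: %lu\n", GetLastError());
        return 1;
    }
    printf("attribute: 0x%04x\n", cell.Attributes);
    return 0;
}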

Saving data to a binary file

I would like to save a file as binary, because I've heard that it would probably be smaller than a normal text file.
Now I am trying to save a binary file with some text, but the problem is that the file just contains the text and NULL at the end. I would expect to see only zeros and ones inside the file.
Any explanation or suggestions are highly appreciated.
Here is my code
#include <iostream>
#include <cstdlib> /* for std::system */
#include <stdio.h>
int main()
{
/*Temporary data buffer*/
char buffer[20];
/*Data to be stored in file*/
char temp[20]="Test";
/*Opening file for writing in binary mode*/
FILE *handleWrite=fopen("test.bin","wb");
/*Writing data to file*/
fwrite(temp, 1, 13, handleWrite);
/*Closing File*/
fclose(handleWrite);
/*Opening file for reading*/
FILE *handleRead=fopen("test.bin","rb");
/*Reading data from file into temporary buffer*/
fread(buffer,1,13,handleRead);
/*Displaying content of file on console*/
printf("%s",buffer);
/*Closing File*/
fclose(handleRead);
std::system("pause");
return 0;
}
All files contain only ones and zeroes, on binary computers that's all there is to play with.
When you save text, you are saving the binary representation of that text, in a given encoding that defines how each letter is mapped to bits.
So for text, a text file or a binary file almost doesn't matter; the savings in space that you've heard about generally come into play for other data types.
Consider a floating point number, such as 3.141592653589. If saved as text, that would take one character per digit (just count them), plus the period. If saved in binary as just a copy of the float's bits, it will take four bytes (32 bits) on a typical 32-bit system. The exact number of bits stored by a call such as:
FILE *my_file = fopen("pi.bin", "wb");
float x = 3.1415;
fwrite(&x, sizeof x, 1, my_file);
is CHAR_BIT * sizeof x; see <limits.h> for CHAR_BIT.
The problem you describe is a chain of (very common1, unfortunately) mistakes and misunderstandings. Let me try to fully detail what is going on; hopefully you will take the time to read through all the material. It is lengthy, but these are very important basics that any programmer should master. Please do not despair if you do not fully understand all of it: just play around with it, come back in a week or two, practice, and see what happens :)
There is a crucial difference between the concepts of a character encoding and a character set. Unless you really understand this difference, you will never really get what is going on here. Joel Spolsky (one of the founders of Stack Overflow, come to think of it) wrote an article explaining the difference a while ago: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!). Before you continue reading this, before you continue programming, even, read that. Honestly, read it and understand it: the title is no exaggeration. You must absolutely know this stuff.
After that, let us proceed:
When a C program runs, a memory location that is supposed to hold a value of type "char" contains, just like any other memory location, a sequence of ones and zeroes. The "type" of a variable only means something to the compiler, not to the running program, which just sees ones and zeroes and does not know more than that. In other words: where you commonly think of a "letter" (an element from a character set) residing in memory somewhere, what is actually there is a bit sequence (an element from a character encoding).
Every compiler is free to use whatever encoding it wishes to represent characters in memory. As a consequence, it is free to represent what we call a "newline" internally as any number it chooses. For example, say I write a compiler; I can agree with myself that every time I want to store a "newline" internally I store it as the number six (6), which is 0x6 in hexadecimal (or 110 in binary).
Writing to a file is done by telling the operating system2 four things at the same time:
The fact that you want to write to a file (fwrite())
Where the data starts that you want to write (first argument to fwrite)
How much data you want to write (second and third argument, multiplied)
What file you want to write to (last argument)
Note that this has nothing to do with the "type" of that data: your operating system has no idea, and does not care. It knows nothing about character sets and it does not care: it just sees a sequence of ones and zeroes starting somewhere and copies that to a file.
Opening a file in "binary" mode is actually the normal, intuitive way of dealing with files that a novice programmer would expect: the memory location you specify is copied one-on-one to the file. If you write a memory location that used to hold variables that the compiler decided to store as type "char", those values are written one-on-one to the file. Unless you know how the compiler stores values internally (what value it associates with a newline, with a letter 'a', 'b', etc), THIS IS MEANINGLESS. Compare this to Joel's similar point about a text file being useless without knowing what its encoding is: same thing.
Opening a file in "text" mode is almost equal to binary mode, with one (and only one) difference: anytime a value is written that has value equal to what the compiler uses INTERNALLY for the newline (6, in our case), it writes something different to the file: not that value, but whatever the operating system you are on considers to be a newline. On windows, this is two bytes (13 and 10, or 0x0d 0x0a, on Windows). Note, again, if you do not know about the compiler's choice of internal representation of the other characters, this is STILL MEANINGLESS.
Note at this point that it is pretty clear that writing anything but data the compiler designated as characters to a file in text mode is a bad idea: in our case, a 6 might just happen to be among the values you are writing, in which case the output is altered in a way that we absolutely did not intend.
(Un)Luckily, most (all?) compilers actually use the same internal representation for characters: US-ASCII, the mother of all defaults. This is the reason you can write some "characters" to a file in your program, compiled with any random compiler, and then open that file with a text editor: they all use/understand US-ASCII and it happens to work.
OK, now to connect this to your example: why is there no difference between writing "Test" in binary mode and in text mode? Because there is no newline in "Test", that is why!
And what does it mean when you "open a file", and then "see" characters? It means that the program you used to inspect the sequence of ones and zeroes in that file (because everything is ones and zeroes on your hard disk) decided to interpret that as US-ASCII, and that happened to be what your compiler decided to encode that string as, in its memory.
Bonus points: write a program that reads the ones and zeroes from a file into memory and prints every BIT (there are multiple bits in one byte; to extract them you need to know some 'bitwise' operator tricks, google!) as a "1" or "0" to the user. Note that "1" is the CHARACTER 1, a point in the character set of your choosing, so your program must take a bit (the number 1 or 0) and transform it into the sequence of bits needed to represent the character 1 or 0 in the encoding used by the terminal emulator you are viewing the program's standard output on... oh my God. Good news: you can take lots of shortcuts by assuming US-ASCII everywhere. This program will show you what you wanted: the sequence of ones and zeroes that your compiler uses to represent "Test" internally.
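One possible solution to that bonus exercise, as a sketch (it assumes the test.bin written by the question's program, and 8-bit bytes):

#include <stdio.h>

int main(void)
{
    FILE *f = fopen("test.bin", "rb");
    int byte, bit;

    if (f == NULL)
        return 1;
    while ((byte = fgetc(f)) != EOF) {
        /* Shift each bit down to position 0 and mask it off, most significant first. */
        for (bit = 7; bit >= 0; bit--)
            putchar(((byte >> bit) & 1) ? '1' : '0');
        putchar(' ');
    }
    putchar('\n');
    fclose(f);
    return 0;
}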
This stuff is really daunting for newbies, and I know that it took me a long time to even learn that there was a difference between a character set and an encoding, let alone how all of this worked. Hopefully I did not demotivate you; if I did, just remember that you can never lose knowledge you already have, only gain it (OK, not always true :P). It is normal in life that a statement raises more questions than it answers; Socrates knew this, and his wisdom seamlessly applies to modern-day technology 2.4k years later.
Good luck, do not hesitate to continue asking. To other readers: please feel welcome to improve this post if you see errors.
Hraban
1 The person who told you that "saving a file in binary is probably smaller", for example, probably gravely misunderstands these fundamentals. Unless he was referring to compressing the data before saving it, in which case he just used a confusing word ("binary") for "compressed".
2 "telling the operating system something" is what is commonly known as a system call.
Well, the difference between text and binary mode is the way the end of line is handled.
If you write a string in binary mode, it will stay a string.
If you want to make it smaller, you'll have to compress it somehow (look at zlib, for example).
What is smaller: when you want to save binary data (like an array of bytes), it's smaller to save it as binary rather than putting it in a string (either in hex representation or Base64). I hope this helps.
I think you're a bit confused here.
The ASCII string "Test" will still be an ASCII string when you write it to the file (even in binary mode). The cases where it makes sense to write binary are for types other than char (e.g. an array of integers).
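A quick sketch of that size difference (the array contents and file names are invented for the example): a thousand ints are a fixed 4000 bytes in binary on a platform with 4-byte ints, while their decimal text form is usually far larger.

#include <stdio.h>

int main(void)
{
    int data[1000], i;
    FILE *bin, *txt;

    for (i = 0; i < 1000; i++)
        data[i] = i * 123456;

    bin = fopen("ints.bin", "wb");
    fwrite(data, sizeof data[0], 1000, bin);  /* 4000 bytes with 4-byte ints */
    fclose(bin);

    txt = fopen("ints.txt", "w");
    for (i = 0; i < 1000; i++)
        fprintf(txt, "%d\n", data[i]);        /* up to 10 bytes per number */
    fclose(txt);
    return 0;
}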
Try replacing
FILE *handleWrite=fopen("test.bin","wb");
fwrite(temp, 1, 13, handleWrite);
with
FILE *handleWrite=fopen("test.bin","w");
fprintf(handleWrite, "%s", temp);
The function printf("%s", buffer); prints buffer as a zero-terminated string.
Try to use:
char temp[20]="Test\n\rTest";
