How would you write to stdout from a bytecode interpreter?

I was reading this tutorial on building a simple virtual machine/bytecode interpreter. It had instructions like PUSH, POP, HALT, etc. These instructions are decoded and evaluated in a switch: if the current instruction equals PUSH, you push its operand onto a stack. But what if I wanted to print out a string or character?
In assembly, you would make a string in .data, push the length, then the message, then the file descriptor for stdout (1), then the system call number for write, 4 (for 32-bit), and then do int 0x80.
How would I do something like this for a virtual machine? Would I handle it similarly? I thought maybe I could just dump whatever I wanted to write in a register, and then printf the contents when it has something other than (magic number) in it, but that doesn't seem like a good idea.

"Printing" assumes having some kind of IO (input/output) system with an output device capable of presenting data (such as a printer or display). On a virtual machine such a device can only be virtual as well, and it is up to the VM implementation how it is emulated. For example, the VM can have a specific well-defined memory range, the "video memory", such that writing there is interpreted as sending data to the output device, which can be emulated by, say, a textbox in your VM's interface.

You need to create a way for programs to access the screen - they can't do it "automatically".
For example (as suggested by Eugene Sh.'s comment): create a PUTCHAR opcode, which pops a number from the stack and prints it to the screen (with the normal C function putchar).

Related

writing hex file to RAM in ARM Cortex-M

I am doing an ongoing project to write a simplified OS for hobby/learning purposes. I can generate hex files, and now I want to write a script on the chip to accept them via serial, load them into RAM, then execute them. For simplicity I'm writing in assembly so that all of the startup code is up to me. Where do I start here? I know that the hex file format is well documented, but is it as simple as reading the headers for each line, aligning the addresses, then putting the data into RAM and jumping to the address? It sounds like I need a lot more than that, but this is a problem that most people don't even try to solve. Any help would be great.
Way too vague: there are many different file formats, and at least two really popular ones that use text with the data in hex, so that's not really helping us here.
Writing a script on the chip means you have an operating system running on your microcontroller? What operating system is it, what does the command line look like, etc.?
Assembly is not required to completely control everything (basically bare metal); you can use asm to bootstrap C and then do the rest in C, not a problem.
Are you wanting to download to RAM and run, or wanting to download and then burn to flash to reset into in some way?
Basically what you are making is a bootloader. And yes, we write bootloaders all the time, one for each platform, sometimes borrowing code from a prior platform, sometimes not.
First off, on your development computer (Windows, Mac, Linux, whatever), write a program, ideally in C or Pascal so you can easily port it to the microcontroller, that reads the whole file into an array. Then write some code that accepts one byte at a time, as you would if you were receiving it serially, and parses whatever file format you choose (you can change formats later if you decide you no longer like it). Take real programs that you have built; the disassembler or other tools should have output options to show you what bytes or words should land at what addresses. Parse the file, printf the address/byte or address/word items you find, and compare that against what the toolchain showed. Then carve out the parsing tool, replace the printf with a write to memory at that address, and yes, jump to the entry point, if you can figure that out and/or you as the designer decide all programs must have a specific entry point.
Intel HEX and Motorola S-record are good choices (-O ihex or -O srec; my current favorite is --srec-forceS3 -O srec); both are documented on Wikipedia, and at least the GNU tools will produce both. You can use a dumb terminal program (minicom) to dump the file into your microcontroller and hopefully parse and write to RAM as it comes in. If you can't handle that flow, you might think of a raw binary (-O binary) and implement an xmodem receiver in your bootloader.

I cannot understand the abstraction between the characters we see and how computers treat them

This is pretty low level and English is not my mother tongue so please be easy on me.
So imagine you are in bash and command prompt is in front of your screen.
When you type ls and hit enter, you are actually sending some bytes to the CPU, 01101100 01110011 00001010 (that is: l, s, linefeed), from your computer, right? The keyboard controller sends the bytes to the CPU, and the CPU tells the operating system what bytes have been received.
So we have an application that is called 01101100 01110011 in our hard drive (or memory..) if I understand correctly? That is a file, and it is an executable file. But how does the operating system find 01101100 01110011 in a drive or in memory?
Also I want to expand this question to functions. We say C Standard library has a function called printf for example. How can a function have a name that is in a file? Ok, I understand that the implementation of printf function is cpu and operating system specific and is a number of machine instructions lying somewhere in the memory or hard drive. But I do not understand how we get to it?
When I link a code that requires the implementation of printf, how is it found? I am assuming the operating system knows nothing about the name of the function, or does it?
Koray, user #DrKoch gave a good answer, but I'd like to add some abstractions.
First, ASCII is a code. It is a table with bit patterns in one column and a letter in the next column. The bit patterns are exactly one byte long (leaving aside 'wide chars' and the like). If we know a byte is supposed to represent a character, then we can look up the bit pattern of the byte in the table. A print function (remember the matrix printers?) receives a character (a byte) and instructs the needles of the matrix printer to hammer in some orderly way onto the paper and see, a letter is formed that humans can read. The ASCII code was devised because computers don't think in letters. There are also other codes, such as EBCDIC, which only means the table is different.
Now, if we don't know the byte is a representation of a letter in a certain code, then we are lost and the byte could just mean a number. We can multiply the byte with another byte. So you can multiply 'a' with 'p', which gives 97 * 112 = 10864. Does that make sense? Only if we know the bytes represent numbers; it is nonsense if the bytes represent characters.
The next level is that we call a sequence of bytes that are all supposed to represent letters (characters) a 'string' and we developed functions that can search, get and append from/to strings. How long is a string? In C we agreed that the end of the string is reached when we see a byte whose bit pattern is all zeroes, the null character. In other languages, a string representation can have a length member and so won't need a terminating null character.
This is an example of a "stacking of agreements". Another example (referring to a question you asked before) is interrupts: the hardware defines a physical line on the circuit board as an interrupt line (agreement). It gets connected to the interrupt pin (agreement) of the processor. A signal on the line (e.g. from an external device) causes the processor to save the current state of registers (agreement) and transfer control to a pre-defined memory location (agreement) where an interrupt handler is placed (agreement) which handles the request from the external device. In this example of stacking we can go many levels up to the functional application, and many levels down to the individual gates and transistors (and the basal definition of how many volts is a '1' and how many volts is a '0', and of how long that voltage must be observed before a one or zero has definitely been seen).
Only when understanding that all these levels are only agreements, can one understand a computer. And only when understanding that all these levels are only agreements made between humans, can one abstract from it and not be bothered with these basics (the engineers take care of them).
You'll hardly find an answer if you look at the individual bits or bytes and the CPU.
In fact, if you type l and s, the ASCII codes of these characters are read by the shell and combined into the string "ls". By that time the shell has built a dictionary with string keys, where it finds the key "ls" and sees that it points to a specific executable "ls" in a path like "/usr/bin".
You see, even the shell thinks in strings not in characters, bytes or even bits.
Something very similar happens inside the linker when it tries to build an executable from your code and a collection of library files (*.lib, *.dll). It has built a dictionary with "printf" as one of the keys, which points to the correct library file and a byte offset into this file. (This is rather simplified, to demonstrate the principle.)
There are several layers of libraries (and BIOS code) before all this gets to the CPU. Don't make your life too hard, don't think too much about these layers in detail.

File format for executable on Mac OS X

I am attempting to (one step at a time) build my own copy of Forth to run on Mac OS X.
I currently have a version of Forth running on Apache and localhost in PHP, Ruby, and Python.
I want to make a version of Forth in C that will create a native executable version of Forth that can make its own native executable files of any compiled Forth code. Sorry about the semi-recursive sentence. My goal is to start in C and end up with my own Forth compiler (no longer running in any C code).
My starting point is to attempt to get a minimal test program to run as a binary executable for Terminal. Once I can understand what the existing C compiler is doing, I can modify its methods for my own purposes.
I created a small "hello world" program in C and ended up with an executable file of 8,497 bytes, which consisted mostly of 0x00 arrays (presumably buffers). My guess is that the entire stdio library was included.
Next, I created the smallest possible C program I could think of (other than a completely null program -- I wanted to be able to find my code in the resulting hex):
int main(void)
{
    char testitem;
    testitem = 'A';
    return -1;
}
That should have given me the barest possible overhead with the storage of the ASCII A and the return value of all ones being easy to find.
Instead, I ended up with a file of 4,313 bytes. There were four locations with the 0x41 (ASCII 'A'), but none were part of a MOV immediate byte instruction. Presumably the 0x41 was stored as constant data and loaded with a different MOV instruction.
Again there were a lot of 0x00 arrays (3,731 bytes, or all but 402 bytes). Presumably there is some kind of header data in the object file (which does run correctly in Terminal and does signal -1) and who knows what else.
At the moment I am not concerned with having an application bundle -- running in Terminal is my short term goal. Once I have this first step working, I can move on to a full application.
Any suggestions on how to determine what I need to have in the object file for it to correctly work as a Terminal tool?
This turns out to be a common challenge. You might want to check the following links, which provide rather in-depth information:
Let's Build A Mach-O Executable
Hello Mach-O

Why does inserting characters into an executable binary file cause it to "break"?

Why does inserting characters into an executable binary file cause it to "break" ?
And, is there any way to add characters without breaking the compiled program?
Background
I've known for a long time that it is possible to use a hex editor to change code in a compiled executable file and still have it run as normal...
Example
As an example in the application below, Facebook could be changed to Lacebook, and the program will still execute just fine:
But it Breaks with new Characters
I'm also aware that if new characters are added, it will break the program and it won't run, or it will crash immediately. For example, adding My in front of Facebook would achieve this:
What I know
I've done some work with C and understand that code is written in human readable, compiled, and linked into an executable file.
I've done introductory studies of assembly language and understand the concepts about data, commands, and pointers being moved around
I've written small programs for Windows, Mac and Linux
What I don't know
I don't quite understand the relationship between the operating system and the executable file. I'd guess that when you type in the name of the program and press return you are basically instructing the operating system to "execute" that file, which basically means loading the file into memory, setting the processor's pointer to it, and telling it 'Go!'
I understand why having extra characters in a text string of the binary file would cause problems
What I'd like to know
Why do the extra characters cause the program to break?
What thing determines that the program is broken? The OS? Does the OS also keep this program sandboxed so that it doesn't crash the whole system nowadays?
Is there any way to add in extra characters to a text string of a compiled program via a hex editor and not have the application break?
I don't quite understand the relationship between the operating system and the executable file. I'd guess that when you type in the name of the program and press return you are basically instructing the operating system to "execute" that file, which basically means loading the file into memory, setting the processor's pointer to it, and telling it 'Go!'
Modern operating systems just map the file into memory. They don't bother loading pages of it until it's needed.
Why do the extra characters cause the program to break?
Because they shift all the other information in the file into the wrong place, so the loader winds up loading the wrong things. Jumps in the code also wind up going to the wrong place, perhaps into the middle of an instruction.
What thing determines that the program is broken? The OS? Does the OS also keep this program sandboxed so that it doesn't crash the whole system nowadays?
It depends on exactly what gets screwed up. It may be that you move a header and the loader notices that some parameters in the header have invalid data.
Is there any way to add in extra characters to a text string of a compiled program via a hex editor and not have the application break?
Probably not reliably. At a minimum, you'd need to reliably identify sections of code that need to be adjusted. That can be surprisingly difficult, particularly if someone has attempted to make it so deliberately.
When a program is compiled into machine code, it includes many references to the addresses of instructions and data in the program memory. The compiler determines the layout of all the memory of the program, and puts these addresses into the program. The executable file is also organized into sections, and there's a table of contents at the beginning that contains the number of bytes in each section.
If you insert something into the program, the address of everything after that is shifted up. But the parts of the program that contain references to the program and data locations are not updated, they continue to point to the original addresses. Also, the table that contains the sizes of all the sections is no longer correct, because you increased the size of whatever section you modified.
The format of a machine-language executable file is based on hard offsets, rather than on parsing a byte stream (like textual program source code). When you insert a byte somewhere, the file format continues to reference information which follows the insertion point at the original offsets.
Offsets may occur in the file format itself, such as the header which tells the loader where things are located in the file and how big they are.
Hard offsets also occur in machine language itself, such as in instructions which refer to the program's data, or in branch instructions.
Suppose an instruction says "branch 200 bytes down from where we are now", and you insert a byte into those 200 bytes (because a character string happens to be there that you want to alter). Oops; the branch still covers 200 bytes.
On some machines, the branch couldn't even be 201 bytes even if you fixed it up because it would be misaligned and cause a CPU exception; you would have to add, say, four bytes to patch it to 204 (along with a myriad other things needed to make the file sane).

System call to plot a point in C (Linux)

I am new to Linux system calls. My question is: do we have a system call in Linux to plot points on the screen? I googled it but could not find any simple explanation. I want to write a simple C program in Linux that directly plots a point on the screen without the help of a C graphics library.
If there is no such system call, how can I create my own system call to plot a point on the screen?
The lowest level hardware independent graphics interface on linux is the framebuffer. This is manipulated by writing to a device node (generally /dev/fb0) which is the equivalent of a system call, since it is a means of sending requests to the kernel. So this does not require any libraries.
A common approach seems to be to mmap() a chunk of user space memory representing the screen to /dev/fb0 and then manipulating that. There are some ioctl() calls to get information about the framebuffer display. A good starting place for information would be the docs in the kernel source -- src/Documentation/fb is a whole directory, see e.g. "framebuffer.txt" and "api.txt" there. There are a few tutorials and such around if you look online. It doesn't matter particularly which kernel version source you look at -- the last revision of "api.txt" was 2011 and "framebuffer.txt" a decade before that (so the interface is very stable).
Note that you can't use the framebuffer from within X. If you want to do graphics stuff within X, you have to use at least Xlib, or a higher level library built on that.
#include <stdio.h>

#define MAX_SCREEN_AREA 100

int Gotoxy(int x, int y)
{
    char essq[MAX_SCREEN_AREA] = {0}; // String variable to hold the escape sequence
    sprintf(essq, "\033[%d;%df", y, x);
    printf("%s", essq);
    return 0;
}
Try this.
