Footprint of Lua on a PPC Micro


We're developing some code on Freescale PPC micros (5517 and 5668 at the moment), and I was wondering if we could put Lua on them.
The devices need to be easily programmed/reconfigured in the field, and the current product uses a proprietary interpreted logic language that can be loaded in, and our software (written in C) runs an interpreter. I would like to move to a better language (the implementation is a bit buggy and slow), so I'm considering Lua, but the memory footprint must be very low. For the 5517 (which we may not use), the maximum RAM is 80K. Things are better on the 5668, with 592K of RAM.
So does anyone know if I can put Lua on bare metal? We're effectively not running an OS. If so, are there any estimates on what kind of memory footprint we might see? How much effort it would take?
Failing this, does anyone know of any kind of interpreter that might be better in a memory-constrained environment without an OS? Or are we better just rolling our own?

See the eLua project.
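On the footprint question, one way to get real numbers is to hand the interpreter a capped allocator and read back the high-water mark. The sketch below is not from eLua; it only mirrors the shape of Lua's `lua_Alloc` callback. The `mem_budget` struct and `budget_alloc` name are made up for illustration, and `realloc`/`free` are assumed to come from whatever C library or pool you have on the target.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping for a capped heap; not part of Lua itself. */
typedef struct {
    size_t used;   /* bytes currently allocated */
    size_t peak;   /* high-water mark */
    size_t limit;  /* hard cap, e.g. the RAM reserved for the interpreter */
} mem_budget;

/* Same shape as Lua's lua_Alloc callback:
 *   void *(*)(void *ud, void *ptr, size_t osize, size_t nsize)
 * nsize == 0 means "free"; otherwise it must behave like realloc. */
void *budget_alloc(void *ud, void *ptr, size_t osize, size_t nsize)
{
    mem_budget *b = (mem_budget *)ud;

    if (ptr == NULL)
        osize = 0;  /* for new blocks, newer Lua versions pass a type tag here, not a size */

    if (nsize == 0) {
        free(ptr);
        b->used -= osize;
        return NULL;
    }

    if (b->used - osize + nsize > b->limit)
        return NULL;  /* over budget: report failure, old block stays valid */

    void *p = realloc(ptr, nsize);
    if (p != NULL) {
        b->used = b->used - osize + nsize;
        if (b->used > b->peak)
            b->peak = b->used;
    }
    return p;
}
```

With Lua 5.1 through 5.4 this would be passed as `lua_newstate(budget_alloc, &budget)`. Running your typical scripts on a PC build and then reading `budget.peak` gives a rough idea of the heap needed before committing to the 80K part.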


Do processors have optimizations and architecture preferences targeted firstly or mainly to C/C++ languages?

I have read the article "C Is Not a Low-level Language", which contains this paragraph:
Unfortunately, simple translation providing fast code is not true for
C. In spite of the heroic efforts that processor architects invest in
trying to design chips that can run C code fast, the levels of
performance expected by C programmers are achieved only as a result of
incredibly complex compiler transforms. The Clang compiler, including
the relevant parts of LLVM, is around 2 million lines of code. Even
just counting the analysis and transform passes required to make C run
quickly adds up to almost 200,000 lines (excluding comments and blank
lines).
What does the sentence in bold mean? Does it mean that manufacturers design processors with some optimizations and architecture decisions targeted first, or even specifically, at C (C++) code? Or does it just mean that they are trying to design processors that execute any code faster, including code written in C?
If some preferences to C exists, what are they?
My couple of thoughts:
a branch prediction algorithm tuned in to patterns happening mainly in C code.
instructions which are useful and used in C but aren't useful in other languages. Otherwise other languages (compilers) will use them too.
I know about language-specific processors like Jazelle or Lisp machines, for Java and Lisp respectively, but similar technologies can't be applied to C, because there is no bytecode.

Processors don't necessarily have optimizations targeted at C, but they do provide features to make C (and other procedural languages in general) map more cleanly to the platform.
Take cache-coherency in a multi-threaded environment as an example. From a C perspective, a global variable shared by two threads should look the same to both threads. If one thread writes to it the other should be able to see those modifications. But in a multi-core CPU with independent caches, that takes extra effort to support. Core 1 has to be able to detect that core 2 is accessing an address it has modified in cache and flush that out to memory (or somehow share it directly to core 2's cache).
That's essentially the thesis of that entire article. C's abstract machine model doesn't necessarily map cleanly to real modern high-performance processors like it did to the (by comparison extremely simple) PDP-11, and CPUs and compilers have to take great pains to paper over those differences.

The "heroic efforts" of the processor architects largely refers to the design of the cache and memory subsystems on the CPUs.
For a very long time now, the instruction execution circuits inside CPUs have been far, far quicker than the electronics that look after fetching and writing data from and to memory, largely because the technology we have for RAM chips hasn't really got much better. Where the cores have sped up, the memory hasn't, and so the cache and memory subsystem has to get ever more elaborate in order to pre-fetch data and move it towards the execution circuits ahead of time. Needless to say, this doesn't always pan out well.
It's also partly because of the physical distance between the CPU and RAM chips. Though only a few inches (if that) of track on a motherboard, that distance is significant; the speed of a signal down the track is about 1ns every 8 inches. For signals clocked in the GHz range (1 cycle << 1ns), a short track is a long way. This is partly why Apple have gone down the route of putting RAM onto the same package as the CPU in the home-grown M1 silicon.
Back to caches - the likes of Intel (and AMD, ARM) have strived to make CPUs that have good, general purpose performance, so that they run pretty much any code well. Modern compilers help a lot - if they know what the cache in the CPU is likely to do in any particular circumstance, the compilers can arrange code to fit in with what the hardware is likely to do.
A reasonable question then is, is that effective? Well, yes and no. Yes, because compiled code does run quite well, but no for a couple of reasons. The first is that the ultimate performance for any given algorithm is rarely reached by the compiler / CPU, and the second is that all this complexity makes it nigh on impossible for a good programmer to do their own optimisation.
Some CPUs help out the hero-programmer here. PowerPC (at least some variants) has instructions where the programmer can give the cache system a hint that the programme will shortly need data from such-and-such a location in RAM. The CPU uses that instruction to pre-load the L1 cache with that data, so that when the program actually starts to perform operations on data at that address it's already in cache.
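To make the prefetch hint concrete, here is a small sketch using the GCC/Clang builtin `__builtin_prefetch`, which on PowerPC targets is typically lowered to a cache-touch instruction. The function name and the prefetch distance are assumptions for illustration, and on other compilers the hint is simply compiled out.

```c
#include <stddef.h>

/* How far ahead to prefetch, in elements. Assumed value; tune per target. */
#define PREFETCH_DISTANCE 16

/* Sum an array while hinting that data a few iterations ahead will be needed. */
long sum_with_prefetch(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__) || defined(__clang__)
        if (i + PREFETCH_DISTANCE < n) {
            /* arguments: address, 0 = read access, 3 = keep in all cache levels */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
        }
#endif
        total += data[i];
    }
    return total;
}
```

A plain linear scan like this is usually handled well by hardware prefetchers already; the hint tends to pay off more on irregular access patterns such as pointer chasing, where the hardware cannot guess the next address.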
The IBM Cell processor took this to a whole new level. The SPE math cores (there were 8 of them) had no cache, and no way of addressing data in CPU RAM at all. What there was instead was 256K of static RAM per core into which all code and data had to fit, and a way for code to push code and data in and out of that static RAM very quickly (256Gbyte/sec at the time, which was very very quick). The developer was completely on their own; they had to write code to load code and data into a core, get that executed, and then write more code to get the results out to wherever. This was actually pretty liberating; instead of having a cache and memory subsystem trying to automatically deliver data to execution cores, get in the way or (worse) just hide inefficiencies from you, one had the freedom to break down an algorithm into core-sized lumps knowing that if it fitted, it'd be very quick, or knowing for sure it didn't fit.
Miles Budnek's answer addresses the issues that arise from multi-core CPUs with cache coherency in a Symmetric Multi-Processing (SMP) environment. It's even harder for the cache designer to get it right if there are multiple cores that might very well start tampering with a value. The difficulties involved have led to vulnerabilities like Meltdown and Spectre.
SMP could be said to be an "optimisation" put into CPUs by designers to aid the C (or other) developer in transitioning code from single to multiple threads. It's an attractive thought - in the way that a single-threaded programme can see all of its data merely by addressing it, why not extend the same visibility of data to all threads in the programme?
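The "core-sized lumps" style can be sketched in plain C. This is not Cell SDK code: `memcpy` stands in for the DMA transfers, and the tile size and function name are assumptions chosen only to show the pattern of copy in, compute locally, copy out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tile size; stands in for the fixed amount of per-core local RAM. */
#define LOCAL_STORE_FLOATS 1024

/* Scale a large array in place, one local-store-sized tile at a time. */
void scale_in_tiles(float *main_mem, size_t n, float factor)
{
    static float local[LOCAL_STORE_FLOATS];  /* stand-in for the core's static RAM */

    for (size_t off = 0; off < n; off += LOCAL_STORE_FLOATS) {
        size_t count = n - off;
        if (count > LOCAL_STORE_FLOATS)
            count = LOCAL_STORE_FLOATS;

        /* explicit transfer in (a DMA request on real hardware) */
        memcpy(local, main_mem + off, count * sizeof *local);

        /* compute touches local data only */
        for (size_t i = 0; i < count; i++)
            local[i] *= factor;

        /* explicit transfer out */
        memcpy(main_mem + off, local, count * sizeof *local);
    }
}
```

The point of the pattern is that nothing moves unless the code says so, so the cost of every transfer is visible in the source.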
Turns out that this is what makes it very difficult to design modern CPUs. However the reasons why the industry went this way are plain enough - the smallest possible delta between single and multicore CPUs was going to be the least troublesome for the existing software community to adopt. That's perfectly reasonable.
But it is running out of steam, fast. A better approach (if the goal is the outright pursuit of performance) would be to go back to the old Transputer architectures from Inmos from the 1980s and early 1990s. In such architectures, data held by one core could only be processed by another if the software was written to explicitly transfer the data. Sound familiar? Yes - the Cell processor was a bit like that.
Interestingly, languages such as Rust, Go, Erlang have all implemented Communicating Sequential Processes as a parallel processing paradigm. The irony is that, these days, CSP has to be implemented on top of a SMP environment, which is itself an artificial construct brought about by the interconnect between CPUs, cores and memory (e.g. QPI, Hypertransport). Basically, if the whole software world got fully comfortable with CSP then CPU designers wouldn't have to design cache-coherency into their multi-core CPUs. Rust in particular is very well suited, as it already has a strong concept of data ownership in its syntax (which could be leveraged to shovel data around between cores automatically).

The article referred to by the OP seems to me to have it in for C for some reason. There were so many points in it I felt triggered by, but I don't want to go addressing each one point by point. Maybe there is some bias or special interest that has not been declared. As a C programmer, with a particular interest in writing high performance programs, I thought I'd give my two cents on some of the issues raised. Hopefully, this might be of interest to others in the industry with or without a programming background.
From my point of view, the strengths of C are mainly as follows....
C allows you to do things you just can't do in 'higher level' languages.
A well written C (see weakness no.1) program is hard to beat on performance on the same hardware, written in another language.
C is comfortable handling binary data allowing for memory conservation.
C is well established with lots of libraries and programmers.
Objects in memory can be made easy to process from anywhere in the program by using pointers so the data itself doesn't need to be passed around.
Multi-threaded and multi-process programs are quite easy to implement.
It has Read-Write shared memory between threads (and processes with some fancy low-level stuff?)
Assembly can be inlined where needed (though it's not C then I know!).
... and main weaknesses...
Utilising SIMD capabilities is not possible in standard C, and difficult to implement in a portable way with intrinsics.
It takes a lot of code to do simple things for which there are no library functions.
Buffer overflow potential is easily missed, even for experienced programmers.
C pointers can be confusing.
The C programming language has a special place in the evolution of programming languages and I for one, would welcome a replacement that is a better fit to what is possible with modern hardware if it doesn't tie the hands of the programmer and offers better security and performance. From the article,...
'A processor designed purely for speed, not for a compromise between speed and C support, would likely support large numbers of threads, have wide vector units, and have a much simpler memory model. Running C code on such a system would be problematic, so, given the large amount of legacy C code in the world, it would not likely be a commercial success.'
Such things exist already: GPUs! Modern CPUs are much more like GPUs than they used to be, now that core counts can be 100+. I have used OpenCL C to write programs with amazing computational performance, but they can't do everything well. Some applications cannot be efficiently parallelised, if at all. OpenCL C program performance can become terrible when there is even a small amount of branching. Also, it is so much easier to exhaust your memory bandwidth and fast cache when running many threads that it might be judged not worth the added complexity over a good single-threaded implementation.
In OpenCL C, the programmer has somewhat more control over where data is stored in memory, which can definitely aid performance. Maybe it's a costly mistake to try to make programming languages too hardware independent. Might it be better to review some (LLVM-like) intermediate standard, like in OpenCL C, where one can define 'private', 'local' and 'constant' memory objects to get performance improvements over 'global' memory objects? Such a standard wouldn't need to be tied to an instruction set. As a programmer, I welcome fast CPU instructions, but it would be nice if they could be much more easily utilised in portable code AND compilable to portable binaries. Maybe this is something compiler writers could look into, along with using SIMD vector registers rather than memory for pushing and popping. As I see it, there are four levels of portability.
Hardware independent source code to run on any hardware conforming to the intermediate standard. The burden is on the compiler to create binaries that will run correctly and efficiently on any hardware conforming to the intermediate standard.
Hardware independent source code to run on any hardware conforming to the intermediate standard. The burden is on the host compiler to create binaries that will run on the host's hardware configuration conforming to the intermediate standard, but may not run on other hardware conforming to the same.
Hardware dependent source code where the logical execution path through the source depends on the architecture of the hardware on which it is run. Programs need to 'query' the hardware configuration.
Hardware specific source code.
In a fantasy world where one can just imagine new standards, hardware, and programming languages, one could choose which level of portability to aim for. I think that C was supposed to be hardware independent, but it isn't really if you want to get the best performance out of your hardware. OpenCL C tries also, but doesn't quite make it, though with run-time kernel compilation it does a pretty good job. The host program has the same issues as any other, though. I don't think there are any 'Level 1' portable languages currently.
Sorry my response is a bit rambling. It's unfortunate that it's difficult to have an objective constructive discussion about the pros and cons of different ideas about future changes in software and hardware. Personally, I think FPGAs have huge potential but are still a long way from where they would need to be to go mainstream. Any new computing language will probably become out of date when hardware changes occur and software trends change. It's remarkable that C still occupies such a prominent space. In another 10 or 20 years time, C will probably still be going strong. How many other modern languages will still be commonplace then?

What is XV6 operating system used for?

I have been taking online courses on operating systems, and I heard them say that the XV6 operating system can be used to learn how operating systems are implemented, and that's all. But after searching the internet, I couldn't find enough resources to get me started with it.
My question is: why should I use it, and how will it help me understand operating systems?
(Please be gentle with information you throw at me, I am a newbie :( )
Any effort is appreciated

There are only 2 possibilities:
too complex to be useful for teaching
too simple to be useful for teaching
Things like fancy features/enhanced functionality, mitigating security issues, dealing with hardware bugs/errata, performance, scalability and supporting a very wide range of different hardware all increase the complexity of the code; and if you look at a real commercial OS (e.g. Linux maybe) that has to care about all of these things it's hard to learn about one thing (e.g. memory management) without all the complexity getting in the way and making it significantly harder to learn.
If you have a simple OS that does none of those things (no fancy features, no mitigation of security issues, ...) then it's much much easier to learn basic principles from it; but it also becomes impossible to use it to learn about fancy features, mitigating security issues, dealing with hardware bugs/errata, ...
The solution is to start with a simple OS (e.g. XV6) to learn the basics, then switch to a real OS later to learn everything else.
However; most OS courses at Universities are not intended to teach you about writing an OS. Instead they're intended to give you basic information about operating systems so that you can use that knowledge when writing application programs for existing operating systems. For that reason (and because there's time constraints) they only do the first part (with a simple OS like XV6) and then the course finishes.

1. XV6 is used for teaching in many universities.
2. It's also a tool OS for many programs.

Is OpenCL on an APU capable of using the whole memory?

Is it possible to build a machine with something like 32GB of RAM, and use about ~28GB with OpenCL?
My current APU is an Athlon 5350, with a "global memory size" reported of 2142658560. I played a little with pyopencl with the CL_MEM_USE_HOST_PTR, but I didn't find a way for doing that.
Is that possible at all?
Maybe with some new-generation APU, like Ryzen Vega?
NOTE: I'm a non-professional and a newbie. I haven't spent even an hour studying OpenCL yet, because before investing money and time in this I want to know if it's possible at all... so sorry if this is a stupid question.

Yes, it is possible to have a 32GB computer and to devote ~28GB of its RAM to any program. When you are writing an OpenCL program, all management of memory spaces (on-chip and off-chip) must be done manually. I do not think you can run an OpenCL kernel that appears to access RAM directly, but even if you could, that would not be particularly worth thinking about, because the power of OpenCL is in fine-grained management of RAM, L2, and L1 - not in allowing programmers to consider their program as operating against just RAM.
Take some time, dive deep into memory management, and gain a very firm grasp of your computer's several memory spaces of varying sizes, connection speeds, and connection bandwidths.
You seem to be thinking about buying a huge amount of RAM to solve your problem. Hopefully you can find a better way to architect your solution - one that does not involve buying 128GB of RAM.
That said, some programs are inherently hard to parallelize. For these programs you might just want to buy a ton of RAM (and maybe even skip OpenCL entirely and run it on the CPU)

What language to learn for microcontroller programming? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
I'm getting into microcontroller programming and have been hearing contrasting views. What language is most used in the industry for microcontroller programming? Is this what you use in your own work? If not, why not?
P.S.: I'm hoping the answer is not assembly language.

In my experience, you absolutely must know C, and assembly language helps too.

Unless you are dealing with very bare-bones microcontrollers (like the RS08 series), C is by far the language of choice. Get to know C, understand functionality like volatile and const. Also understand the architecture - what is efficient, what isn't, what can the CPU do. These will differ wildly from a "desktop" environment. Learn to love stdint.h.
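To illustrate why `volatile`, `const` and `stdint.h` matter, here is a minimal sketch of register access. The register layout, bit positions and names (`uart_regs`, `CTRL_ENABLE` and so on) are invented for illustration; real values come from the part's reference manual.

```c
#include <stdint.h>

/* Hypothetical bit definitions; real positions come from the reference manual. */
#define STATUS_READY  (UINT32_C(1) << 0)
#define CTRL_ENABLE   (UINT32_C(1) << 3)

/* Hypothetical peripheral layout with exact-width fields from stdint.h. */
typedef struct {
    volatile uint32_t ctrl;    /* read/write control register */
    volatile uint32_t status;  /* updated by hardware behind the compiler's back */
} uart_regs;

/* Read-only table; const lets the toolchain place it in flash instead of RAM. */
static const uint8_t divisor_table[4] = { 1, 2, 4, 8 };

void uart_enable(uart_regs *regs)
{
    regs->ctrl |= CTRL_ENABLE;  /* volatile forces a real read-modify-write */
}

int uart_is_ready(const uart_regs *regs)
{
    /* volatile stops the compiler from caching the status value in a register */
    return (regs->status & STATUS_READY) != 0;
}

uint8_t uart_divisor(unsigned index)
{
    return divisor_table[index & 3u];
}
```

On a real target the struct pointer would be a fixed address cast taken from the datasheet; passing the pointer in, as here, also lets the same code be exercised on a PC against a fake register block.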
You will encounter C++ (or a restricted subset) as projects scale up.
However, you need to understand the CPU and how to read basic assembly as a debugging tool. You can't become an excellent embedded developer without this skillset.

What 'contrasting' views have you heard? To some extent it will depend on the microcontroller and the application. However C is available for almost all architectures (I hesitate to say all, but probably all that you will ever encounter); so on that point alone, learning C would give you the greatest coverage.
For all architectures, the availability of an assembler and a C compiler are pretty much a given. For 32-bit and most 16-bit architectures C++ will also be available. Notable exceptions I have encountered are Microchip's PIC24/dsPIC parts for which C++ is not supported by Microchip's own GNU based compiler (although 3rd party compilers may do so).
While there are C++ compilers for 8-bit microcontrollers, C++ is not ubiquitous on such platforms, and often the compilers support only a subset of the full language. For the types (or more specifically the size) of application for which 8-bit is usually employed, C++ may be useful but not to the extent that it is on much larger applications, so C is generally adequate.
There are a lot of myths about C++ in embedded systems; while the language is larger than C and has constructs that may compromise the performance or capacity of your system, you only pay for what you use with C++. But of course, if what you use is just the C subset, then C would be adequate in any case.
The point about C (and C++) is that it is a systems level language; it will run on your microprocessor with no additional support save a very simple runtime start-up to initialise the processor (and possibly external SDRAM), initialise static data, establish a stack, and in the case of C++ invoke static constructors. This is why along with target specific assembler, it is used to build operating systems and kernels - it needs no operating system or kernel itself to run.
One of the reasons I suggested that it may depend on the microcontroller is that if for example it is an ARM9 with a few Mb of external SDRAM, and at least say 4Mb Flash (also usually external - memory takes up a lot of die space), then you could run a 'heavyweight' OS on it such as Linux, WinCE, or Symbian, or even a large RTOS such as QNX or VxWorks. Then your choice of language (once you got the OS working), would be influenced by the OS, though for real-time applications C and C++ would still dominate, (or often Ada in military, avionics, and some transport applications).
For mid-size applications - a few hundred KBytes of code and data space - C# running on the .NET-Micro platform is possible. However, I sat in a presentation of this at the Embedded Systems Show in the UK a few years ago, just after it was launched. When I asked the question "but is it real-time?" and was told "no, you need WinCE for that", there was a gasp and a groan from much of the audience, and some stopped wasting their time and left the presentation there and then (including me).
So I am still interested in the 'contrasting' opinions you have heard; because although it is possible to use other languages; the answer to your question:
What language is most used in the
industry for microcontroller
programming?
then the definitive answer is C; for the reasons I have given. For anyone who might choose to contest this assertion here are the statistics (note the different survey method after 2004 explained in the text). However just to add to the collection of alternatives, I once spent two years programming in Forth on embedded systems, and I know of people still using it, but it is a bit of a niche.

I've successfully used both C and C++ but in almost any microcontroller project you will need to be familiar with the assembly language of the target micro. If only for debugging low level hardware issues assembly will be indispensable, even if it is a cursory familiarity.
I think the hardest thing for me when moving from a desktop environment to a micro was that almost everything needs to be allocated statically. You won't often use malloc/new in a micro unless maybe it has external RAM.
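As a rough illustration of static allocation in place of malloc/new, here is a fixed-size pool sketch. The `message` type, the pool size and the function names are all hypothetical; the idea is just that the storage is reserved at link time, so running out is a handled condition rather than heap fragmentation.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCKS 8   /* assumed capacity, fixed at build time */

/* Hypothetical object type that would otherwise be malloc'd. */
typedef struct {
    uint8_t payload[32];
} message;

static message pool[POOL_BLOCKS];    /* storage is reserved at link time */
static uint8_t in_use[POOL_BLOCKS];  /* 0 = free, 1 = handed out */

/* Returns a free block, or NULL when the pool is exhausted. */
message *message_alloc(void)
{
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        if (!in_use[i]) {
            in_use[i] = 1;
            return &pool[i];
        }
    }
    return NULL;
}

/* Returns a block obtained from message_alloc() to the pool. */
void message_free(message *m)
{
    if (m != NULL)
        in_use[(size_t)(m - pool)] = 0;
}
```

If the pool is shared between an interrupt handler and the main loop, the alloc and free calls would also need to be protected, for example by briefly disabling interrupts.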
I notice that you also tagged your question with FPGA and Verilog, take a look at Altium, they have a C to Hardware compiler that works really well with their integrated environment.

Regarding assembler:
Prefer C/C++ over assembler as much as possible. You'll get better productivity by writing as much as possible in C or C++. That includes being able to run some of your code on a PC, which can help developing the higher-level code (application-layer functions).
On many embedded platforms, it's good to have someone on the project who is comfortable with a little assembler. Mostly to get start-up code and interrupts going nicely, and perhaps functions for interrupt enable/disable. That's not the same as knowing it really thoroughly--just a basic working knowledge will be sufficient.
If you're porting an RTOS (e.g. µC/OS-II) to a new platform, then you'll have to know your assembler more. But hopefully your RTOS supports your platform well already.
If you're pushing up against CPU performance limits, you probably need to know assembler more thoroughly. But hopefully you're not pushing performance limits much, because that can be a drag on a project's viability.
If you're writing for a DSP, you probably need to know the DSP's assembler fairly thoroughly.

Microcontrollers were originally programmed only in assembly language, but various high-level programming languages are now also in common use to target microcontrollers. These languages are either designed specially for the purpose, or versions of general purpose languages such as the C programming language. Compilers for general purpose languages will typically have some restrictions as well as enhancements to better support the unique characteristics of microcontrollers. Some microcontrollers have environments to aid developing certain types of applications. Microcontroller vendors often make tools freely available to make it easier to adopt their hardware.
Many microcontrollers are so quirky that they effectively require their own non-standard dialects of C, such as SDCC for the 8051, which prevent using standard tools (such as code libraries or static analysis tools) even for code unrelated to hardware features. Interpreters are often used to hide such low level quirks.
Interpreter firmware is also available for some microcontrollers. For example, BASIC on the early microcontrollers Intel 8052[4]; BASIC and FORTH on the Zilog Z8[5] as well as some modern devices. Typically these interpreters support interactive programming.
Simulators are available for some microcontrollers, such as in Microchip's MPLAB environment. These allow a developer to analyze what the behavior of the microcontroller and their program should be if they were using the actual part. A simulator will show the internal processor state and also that of the outputs, as well as allowing input signals to be generated. While most simulators are limited in that they cannot simulate much other hardware in a system, they can exercise conditions that may otherwise be hard to reproduce at will in the physical implementation, and can be the quickest way to debug and analyze problems.

You need to know assembly language programming. You also need good knowledge of C and C++. So work hard on these things to build expertise in microcontroller programming.

And don't forget about VHDL.

For microcontrollers assembler comes before C. Before the ARMs started pushing into this market the compilers were horrible and the memory and ROM really tiny. There are not enough resources or commonality to port your code so writing in C for portability makes no sense.
Some microcontrollers' assembler is less than desirable, and ARM is taking over that market. For less money, less power, and less footprint you can have a 32-bit processor with more resources. It just makes sense. Much of your code will still not port, but you can probably get by with C.
Bottom line, assembler and C. If they advertise BASIC or Java or something like that, mark that company on your blacklist and move on. Been there, done that, have the scars to prove it.

First assembly, then C.
I think those who know both assembly and C are better off than those who know only C.

How would I go about creating my own VM?

I'm wondering how to create a minimal virtual machine that'll be modeled after the Intel 16 bit system. This would be my first actual C project, most of my code is 100 lines or less, but I have the core fundamentals down, read K&R, and understand how things ought to work, so this pretty much is a test of wits.
Could anyone guide me as far as documentation, tools, tutorials, or plain old tips/pointers on how to go about this? Thus far I understand that I require somewhere to store data, a CPU of sorts, and some sort of mechanism that functions as an interrupt controller.
I'm doing this to learn: Systems internals, ASM internals and C - three facets of computing that I want to learn in a singular project.
Please be kind enough not to tell me to do something simpler - that would only be annoying. :)
Thanks for reading, and hopefully writing!

Virtual machines fall into two categories: those that interpret the code one instruction at a time and those that compile the code to native instructions (e.g. "JIT").
The interpretation category is usually built around an instruction execution loop, using a switch statement, computed gotos or function pointers to determine how to execute each instruction.
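A minimal sketch of such a loop, using a switch statement, might look like the following. The opcode set is a made-up toy stack machine (not 8086), and all names are illustrative; a 16-bit x86 emulator would have the same fetch/decode/execute skeleton, but with registers, flags and memory in the state struct.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy instruction set, invented for this sketch. */
enum { OP_HALT, OP_PUSH, OP_ADD, OP_MUL };

#define STACK_SIZE 16

typedef struct {
    int32_t stack[STACK_SIZE];
    size_t sp;  /* number of values on the stack */
    size_t pc;  /* index of the next byte to fetch */
} vm;

/* Runs code until OP_HALT. Returns 0 on a clean halt, -1 on any error. */
int vm_run(vm *m, const uint8_t *code, size_t len)
{
    m->sp = 0;
    m->pc = 0;
    while (m->pc < len) {
        uint8_t op = code[m->pc++];          /* fetch */
        switch (op) {                        /* decode and execute */
        case OP_HALT:
            return 0;
        case OP_PUSH:                        /* one-byte immediate operand */
            if (m->pc >= len || m->sp >= STACK_SIZE)
                return -1;
            m->stack[m->sp++] = code[m->pc++];
            break;
        case OP_ADD:
        case OP_MUL: {
            if (m->sp < 2)
                return -1;
            int32_t b = m->stack[--m->sp];
            int32_t a = m->stack[--m->sp];
            m->stack[m->sp++] = (op == OP_ADD) ? a + b : a * b;
            break;
        }
        default:
            return -1;                       /* unknown opcode */
        }
    }
    return -1;                               /* ran off the end without OP_HALT */
}
```

For example, the byte sequence `OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT` leaves 20 as the single value on the stack.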
There is a fun platform that is worth studying for its simplicity and fun: Corewars.
Corewars is a programming challenge game where programs written in "Redcode" run on a MARS VM. There are many MARS VMs, typically written in C.
It has also inspired 8086-based versions, where programs written in 8086 assembler battle.

Well, for starters I would pick up a reference book for assembly language for the processor you intend to virtualize, like 80286 or similar.

For a JIT, you might want to dynamically generate and execute x86 code.

If you want to write a virtual machine using the x86 hardware virtualization technology, you will need quite a few things.
There are a few critical instructions for entering and leaving the guest, such as VMLAUNCH/VMRESUME on Intel VT-x and VMRUN on AMD-V (Intel and AMD use different names, but the functionality is similar). Those instructions are privileged, and therefore you will need to write a kernel module to use them.
The first step for your VM to start is to boot it and therefore, you will need a 'BIOS' which will be loaded. Then you need to emulate devices, etc. You could even run an old version of MSDOS in such a VM if you wanted to.
All in all, it clearly isn't trivial and requires a lot of time and effort.
Now, you could do something similar to what VMWare used to do before the Virtualization ready CPUs appeared.
