I first came across the ARM instruction set in the 80's, and have not used it since. Out of curiosity I was looking at the the tablets and other ARM devices and note that the CPU's are produced by different manufacturers.
I did a quick search but I couldn't find a definitive statement as whether the different ARM chips have differing instruction sets.
I would assume that in the main they are the same.
goto http://infocenter.arm.com along the left under contents look for ARM architecture. And under that Reference Manuals. used to be there was a single ARM ARM (ARM Architecture Reference Manual) but the family has grown to the point they had to break it into, well, families.
The ARM ARM's are going to show you the instruction sets. What I think they call the ARMv5 manual is the old ARM ARM. You will find the ARM instructions (32bit) and thumb instructions (16 bit). For each instruction they list what architecture supports it, so you might see an ARMv5 instruction that is not supported by the ARMv4 (ARMv4 a.k.a ARM7, like the popular ARM7TDMI core). Thumb instructions are supported by ARMv4T and newer, etc.
So there is the core 32 bit arm instruction set which you may have been used to with new instructions added from time to time and bugs/restrictions fixed (ldr r0,[r0] for example), etc.
The floating point unit has had one or two overhauls, most cores do not have a fpu and the ones that have an fpu that doesnt mean the chip vendor included it in the chip. the fpa being the older, vfp being newer and now neon stuff. If you pay attention these all fall into the generic coprocessor instructions category. But you dont have to know/use the coprocessor version they have aliases for everything.
There is/was this java/jazelle thing, same story some cores might have it as an option doesnt mean the vendor included it.
At least two sets of thumb2 extensions to the thumb instruction set. Before thumb2 extensions the thumb instructions were all 16 bit and had a one to one mapping to an ARM instruction, makes sense you only need an ARM core, the decoder translates from the smaller instruction to ARM instruction and feeds that to the core. All instructions are 16 bit except the branch, and if you look at that pattern you can quite easily decode that as two separate 16 bit instructions. So then they decide to make their microcontroller offering smaller, instead of everyone just using the ARM7TDMI and consuming the chip size and power, thumb2 capable processors are thumb only, they do not support 32 bit ARM instructions, there is no ARM core that thumb instructions are translated to, etc. new core. The ARMv6-M a.k.a Cortex-m0 and Cortex-m1 take the thumb instruction set and add a few 32 bit instructions to close the performance gap to ARM (thumb was smaller yes, but a little slower than ARM if you compiled the same code to both, it took like 10-20% more instructions from my experiments to use thumb). In theory thumb-2 (ARMv7-M) outperforms ARM when and where you can compare them. For whatever reason the Cortex-m3 came out first which is ARMv7-M and has a bunch of 32 bit thumb2 instructions added to the thumb instruction set. I recently counted and ARMv6-M added like 20, ARMv7-M has like 140-150 instructions added to the base thumb instruction set. thumb2 is basically variable word length. And again only runs on the cortex-m series. Looking at it it is almost like they re-built the ARM instruction set again under the name thumb. not completely but you get back a lot of arm like instructions, three register instead of two, being able to reach higher registers and use immediates, etc. What this caused is a desire to write asm that compiled for both ARM and thumb/thumb2. So they came up with a unified syntax. you can write an instruction like
add r0,r1
If assembling for thumb, that is the instruction, if assembling for arm they will convert it to
add r0,r0,r1
for you, instead of any syntax errors. You have to specify that you are using the unified syntax, at least with the gnu binutils assembler (gas).
An equally important set of documents is the Technical Reference Manuals, also at infocenter.arm.com. Each core has a trm, actually each rev of each core has a TRM. Also the extra cost items like L2 caches have their own TRM, for each rev. it is important to find out the core the chip vendor bought/used and if possible the revision (rev 2.0 r2p0, rev 1.0 r1p0, etc) as there are programming differences as well as errata differences between them (dont trust Linux as a reference!, it is a huge mess, every time I look yet another company has completely misunderstood and misapplied core/errata differences, it si a bit of a disaster at the moment). Sometimes the TRM includes instruction information, or paints a more clear picture on what that core supports and doesnt support. The ARM ARM's are generic they cover the whole family or a number of families of cores, where the TRM is very specific to one core. An example of confusing between the ARM ARM and the TRM is that looking at the ARM ARM you might get the impression that you can use BE-32 or BE-8 big endian modes, the reality is you have either one or the other ARMv6 and newer is BE-8, period, get used to it. ARMv5 and ARMv4 is BE-32 or before ARMv6 just called big endian. I highly recommend NOT using big endian on an arm despite what you think you might gain from it. go with the native mode and you will save yourself a ton of work and failure. I mention it from personal experience trying to figure out why the bits described in an ARM ARM just didnt work in the core I was using.
A 64 bit core is somewhere in the development phase, I wouldnt be surprised if it is done and just looking for someone to pull the trigger and use it. Actually the ARMv8 doc is available, downloading now.
Short answer infocenter.arm.com under ARM Architecture you find all the docs describing the different instruction sets as well as improvements/additions over time to those instruction sets.
There is no difference (with respect to the instruction set) between manufacturers.
They all respect the ARM specification.
Some extensions are optional. This is the case with NEON.
But, as far as I know, only the Tegra 2 does not include this extension.
This is why the Tegra 2 is a very bad processor for video decoding (for example).
There are a few common variations of instructions sets, UAL, Thumb and Thumb2 being the most common. Some ARM cores that contain specialized hardware (such as DSPs) extend the language as well.
This used to not be the case. ARM required adherence to their spec. ARM ships IP which of course adheres to their spec but they also require the architecture licensees to adhere to it. However, that changed slightly in 2019 when ARM began to allow custom instructions with their embedded CPUs.
Related
And can we hope it will be supported in all future mobile devices on ARM architecture (including NVIDIA Tegra)?
Unlike certain other ISA author companies, ARM does not have a hard commitment to backward compatibility. Mobile devices, the primary market for ARM cores, are unique and heterogeneous to the point where having a stable ISA would not make much difference. ARM is also targeted at low-power applications, meaning it is desirable to exclude unwanted features and keep the instruction set small, using only what the target application can take advantage of. As a result, a given release of the ARM architecture is a union of ISA families that together define feature-rich a system with relatively few unwanted instruction families.
An (incomplete) argument for a changing instruction set across releases is that an ISA is no longer seen as the programmer abstraction, and a chip-specific compiler is tasked with presenting a stable language for the programmer. Of course, compilers are only so good at squeezing C into extravagant instructions, and architecture-specific directives (and hand-coded segments of assembly) are still common in efficiency-critical libraries.
While some of the ISA extensions we have seen crop up over the years that prove to be widely useful ISA features (NEON appearing to be one of them - 128-bit SIMD is handy!) persist more reliably than others, there does not appear to be a guarantee that any specific ISA extension is to become future-compatible.
In response to the original question, NEON is mandatory in all Cortex-A8 devices but optional in Cortex-A9. ARM's business model allows companies to license a Cortex-A9 core and omit non-mandatory extensions. Tegra 2 appears to exclude NEON (but makes up for it by offering a "SIMD"-ish coprocessor in form of the GPU).
In theory, NEON is optional in ARM processors implementing ARM v7 architecture (for example Cortex-A9 based SoCs). In practice, most modern SoCs indeed have NEON.
On the other hand, NEON is mandatory in ARM processors based on ARMv8 architecture (the processors with 64-bit support).
I am looking for some information on programming ARM devices, in a particular non-particular way [1]. Assume that I am writing code for an ARM processor that is used a machine similar to a Apple II/Atari "**" XL/Commodore 64/DOS-PC, or even something that runs a multitasking OS like VMS or SUNOs. Assume further that any peripherals/OS specific stuff has already been abstracted into subroutines.Examples of this type of programing might be: a text/curses based game like rogue or moria; a curses based word processor ( or rather something based on a curses like library ) ; or a modem/terminal program.
I'm looking for two things. Materials to help learn ARM programming, though the ARM System Developers Guide may be enough, other resources would be helpful I'm looking in particular for something which explains the software ( and relative hardware ie registers ) differences of various generations of processor.
The other thing I'm looking for is a development environment which inculdes, emulation, a decent macro assembler, and a debugger. Along with any thing else that will help me see what is going on inside my programs.
[1] OK. Sorry I just couldn't resist that particular pun.
You have the choice of using ARM Cortex M or A series. If you are going to develop high end applications such as those which run on smartphones / tablets, then learning about ARM A is your choice. If you are going for an emphasis with hardware/low level stuff such as controllers then you should go for ARM Cortex-M. If you are into real time applications (which I doubt is your case, them use the R series).
Most of these new ARM generations are based on ARMv7 architecture and ISA, so reading the manuals on that could get you started. Most recently, a new ARMv8 architecture and ISA have been announced, it supports 64 bit processing.
Download the reference and technical manuals from ARM site to learn about the HW/peripherals.
I would go with auslen's suggestion of buying a board, you could go with TI's Stellaris Launch pad which has an ARM-M4F processor (supports floating point and SIMD), it sells for 12.99$
http://www.ti.com/ww/en/launchpad/stellaris_head.html?DCMP=stellaris-launchpad&HQS=stellaris-launchpad-b
or you could go with ST's discovery board (based on the same processor as above), but has audio, accelerometer and usb on board. it sells for 14.99$ http://www.st.com/internet/evalboard/product/252419.jsp
or the STM F3 board (10.99$)
http://www.st.com/internet/evalboard/product/254044.jsp
In any case, you need to check the examples which come with the board, without which you could go nowhere easily. The board comes with its own drivers, all is abstracted in a way, so you could get started from there!
As for OS, if your interest is an RTOS, ARM provides the CMSIS RTOS for it's M series processors
http://www.arm.com/products/processors/cortex-m/cortex-microcontroller-software-interface-standard.php
This book offers an introduction to the generations of ARM processors. Then focuses on cortex M3. It covers its ISA with lots of assembly code. It also addresses the built-in peripherals and how to start-up with C.
http://www.amazon.ca/Definitive-Guide-ARM-Cortex-M3/dp/185617963X/ref=sr_1_1?ie=UTF8&qid=1352506616&sr=8-1
good luck
infocenter.arm.com and look at the various ARM ARMs (architectural reference manuals) and TRM's (technical reference manuals) for the various architectures and cores. these manuals are better than most other companies documentation. except for the new 64 bit stuff, the difference from one architecture to the next is somewhat subtle as far as the instruction set goes. the major differences have to do with the peripherals, the mmu is a slow changing thing, the interrupt manager has taken big steps and the fpu has been replaced at least once if not twice wholesale (if you even have an fpu which, having one is the exception not the rule it consumes a huge real estate for such little return).
I am confused with your question. I think it is important to draw the line between learning the architecture/instruction set and learning the operating system calls, these are two separate things. Operating system stuff you rarely need to look beyond the source code (C/C++), and the limited asm is for hand tuned C libraries or boostrap code, and interrupt wrappers. Likewise the architecture, registers, instructions, etc vs the peripherals (the cores from arm generally have very very few peripherals, the bulk are in the vendor specific stuff) which I would separate as a separate learning curve, has little to do with asm and the instruction set so no different than learning a peripheral on any other platform, just some addresses you read and write.
If you are looking for non-operating system bare metal the stm32f0 discovery is $10, I highly recommend it. Looks like ti has a stellaris launchpad for just a little more (waiting for mine to arrive so I cant talk much about them, and shipping is free from ti so the cost is basically the same as the stm32 boards) the stm32f4 discovery is about $20 and I would barely call a microcontroller with all the stuff the cortex-m4 has.
Moving up to linux capable or designed for linux systems there is the raspberry pi, beaglebone and open-rd and on up (pandaboard). Again though you are just writing just another linux C/C++ program so there isnt much excitement there (related to a specific platform, the entertainment is the same for all platforms) and very little arm knowledge required if any. It is very easy to use any of these platforms for bare metal programming giving you race car like performance compared to the ARM based microcontrollers.
I have a thumb simulator which you are probably not interested in. gdb has the armulator which was the cornerstone of the company back in the day. skyeye or something like that has an arm instruction set simulator as does qemu, none of them will give you great visibility other than what gdb can provide. opencores has the amber project an armv2 clone, which you can see the close relationship to the armv4 and newer that you will not find rtl for without a box full of cash. with my arm and chip experience (No I do not work for arm) I do find the amber project worth looking at, but many folks wont know what to do with it and really are not interested in that level of visibility. (it is instruction compatible, a good design, but dont think you are looking at an arm design, no secrets there). you can learn the basic arm architecture from it and then move on to hardware for example...
With the microcontrollers being cortex-m based, you might find the older microcontrollers a better stepping stone to the upper end arm cores. ARM7tdmi based stuff like the sam7s and others from nxp, st, atmel, etc which you can still find at sparkfun and microcontroller pros and other places for arduino like prices.
Is it possible to programmatically alter a ARMv7-compiled binary to replace all the new opcodes and instructions with ARMv6 compatible ones?
I don't really care that much about performance at this point, I just want to use some ARMv7 only binaries on an ARMv6 (with vfp, if that matters).
vfp will matter if using instructions not supported on ARMv6 unless those are the ones you are replacing.
If we are talking just arm instructions not armv6 it is probably a short list. It is probably reducing the number of instructions though so you would have to modify the code such that the armv7 instruction causes a branch somewhere, that somewhere is the replacment code using armv6 or older instructions, then branch back. not branch and link, unconditional branch or ldr pc,something, etc. if you are talking about thumb2 stuff that may still be possible but
probably more work, some things you might not be able to do.
short answer: Yes in general this kind of thing can be done.
The C language was used to write UNIX to achieve portability -- the same C language program compiled using different compilers produces different machine instructions. How come Windows OS is able to run on both Intel and AMD processors?
AMD and Intel processors(*) have a large set of instructions in common, so it is possible for a compiler or assembler to write binary code which runs "the same" on both.
However, different processor families even from one manufacturer have their own sets of instructions, usually referred to as "extensions" or whatever. Ignoring the x87 co-processor, the first time I remember this being a marketing point was when everything suddenly went "with MMX(TM) technology". Binary code expected to run on any processor either needs to avoid extensions, or to detect the CPU type before using them.
Intel's Itanium 64-bit architecture was completely different from AMD's x86-64 architecture, so for a while their 64bit offerings were non-compatible (and Itanium was nothing like x86, whereas x86-64 extended the instruction set by adding 64bit instructions). Intel blinked first and adopted x86-64, although there are still a few differences: http://en.wikipedia.org/wiki/X86-64#Differences_between_AMD64_and_Intel_64
Windows probably uses the common x86 or x86-64 instruction set for almost all code. I wouldn't be surprised if various drivers and codecs are shipped in multiple versions, and the correct one selected once the CPU has been interrogated.
(*) Actually, Intel make or have made various kinds of processors, including ARM (Intel's ARM processors were called XScale, but I think they've sold that business). And AMD make other processors too. But we know which Intel/AMD processors you mean :-)
AMD are Intel compatible, otherwise they would never have gained a foothold in the market place.
They are effectively clone compatible.
As you suspect, the main stream Intel and AMD processors have the same instruction set.
Windows does not run on ARM or PowerPC chips, for example, because it is somewhat dependant on the underlying instruction set.
However, most of Windows is written in C++ (as far as I know), which should be portable to other architectures. Windows NT even ran on PowerPC and other architectures.
Intel's 80x86 CPUs and AMD's 80x86 are "mostly the same sort of", but some things are completely different (e.g. virtual machine extensions - SVM vs. VT-x) and some things (extensions) may or may not be supported. However, some things are different on different CPUs from the same manufacturer too (e.g. some Intel chips support AVX2 and some don't).
There are multiple ways to deal with the differences:
only use the common subset so the same code runs on all 80x86 CPUs (e.g. treat it like an 8086 chip).
use a subset of features that is common to a range of CPUs so the same code runs on all 80x86 CPUs in that range. This is very common (e.g. "this software requires an 80x86 CPU (and OS) that supports 64-bit extensions").
use install-time tests. For example, there might be 4 different copies of software (compiled for 4 different ranges of CPUs) where the installer decides which copy makes sense for the computer the software is being installed on.
use run-time tests. For example, code can use the CPUID instruction to do if( AVX2_is_supported() ) { set_function_pointers_so_AVX2_is_used(); } else {set_function_pointers_so_AVX2_is_not_used(); }. Note: Some compilers (Intel's ICC) can automatically generate code that does run-time tests.
These aren't mutually exclusive options. For example, the installer might decide to install a 64-bit version (and not a 32-bit version), and then the 64-bit version might check which features are supported at run-time and have different code to use different features.
Also note that different parts of an OS can be treated separately. For example, an OS could have 6 different boot loaders, 4 different "HALs", 4 different kernels, and 3 different "kernel modules" to support virtualisation; where some of these things might do run-time tests and some might not.
Do Intel and AMD processor have the same assembler?
Almost all assemblers for 80x86 support almost all extensions (from all CPU manufacturers - e.g. Intel, AMD, VIA, Cyrix, SiS, ...). In general; it's up to the programmer (or compiler) to make sure they only use things that they know exist. Some assemblers provide features to make this easier (e.g. NASM provides a CPU ... directive so that the programmer can tell the assembler to generate errors if it sees instructions that aren't supported on the specified CPU).
AMD and Intel use the same instruction set.
When you install windows on an AMD processor or an Intel processor, it doesn't "compile" code on the machine.
I remember many people being confused on this subject back during college. They believe that a "setup" means that it is compiling code on your machine. It isn't. Most if not all Windows application outside of the free realms, are given to you by binary.
As for portability, that isn't neccessarily 100% true. While C is highly portable, in many cases writing for a specific OS or system will result in the code only being able to compile/executed on that box. For example, certain Unix machines handle files and directories differently so it might not be 100% portable.
Do Intel and AMD processor have the same assembler?
An assembler assembles a program to be run on a processor, so your question is flawed. Processors DO NOT use assemblers.
If you mean can Intel and AMD processor run the same assembler? Then the answer is YES!!!
All an assemblers are, is a program that assembles other programs from structured text files. Visual Basic is an example of an assembler.
First Question
From a C programmer's point of view, what are the differences between Intel Core processors and their AMD equivalents ?
Related Second Question
I think that there are some instructions that differentiate between the Intel Core from the other processors and vis-versa. How important are those instructions ? Are they being taken into account by compilers ? Would performances be better if there was some special Intel compiler only for the Core family ?
If you are programming user-level code and most driver code, there aren't many differences (one exception is the availability of certain instruction sets - which may differ for different processors, see below). If you are writing kernel code dealing with CPU-specific features (profiling using internal counters, memory management, power management, virtualization), the architectures differ in implementation, sometimes greatly.
Most compilers do not automatically take advantage of SSE instructions. However, most do provide SSE-based intrinsics, which will allow you to write SSE-aware code. The subset of all SSE levels available differs for each processor architecture and maker.
See this page for instruction listings. Follow the links to see which architectures the specific instructions are supported on. Also, read the Intel and AMD architecture development manuals for exact details about support and implementation of any and all instruction sets.
First Question From a C programmer's point of view, what are the differences between
Intel Core processors and their AMD equivalents ?
The most significant differences are likely to show up only in highly specialized code that makes use of new generation instructions, such as vector maths, parallelization, SSE.
Would performances be better if there was some special Intel compiler only for the Core family ?
Not sure if you are aware of it, but there's a compiler specifically for Intel cores: icc. It's generally considered to be the best compiler from an optimization point of view.
You might want to check out its wikipedia article.
According to the Intel Core Wikipedia article, there were notable
improvements to SSE, SSE2, and SSE3 instructions. These instructions are SIMD (same instruction, multiple data), meaning that they are designed for applying a single arithmetic operation to a vector of values. They are certainly important, and have been made used by compilers such as GCC for quite awhile.
Of course, recent AMD processors have adopted the newest Intel instructions, and vice-versa. This is an ongoing trend.