Intel Core for a C programmer - c

First Question
From a C programmer's point of view, what are the differences between Intel Core processors and their AMD equivalents ?
Related Second Question
I think that there are some instructions that differentiate between the Intel Core from the other processors and vis-versa. How important are those instructions ? Are they being taken into account by compilers ? Would performances be better if there was some special Intel compiler only for the Core family ?

If you are programming user-level code and most driver code, there aren't many differences (one exception is the availability of certain instruction sets - which may differ for different processors, see below). If you are writing kernel code dealing with CPU-specific features (profiling using internal counters, memory management, power management, virtualization), the architectures differ in implementation, sometimes greatly.
Most compilers do not automatically take advantage of SSE instructions. However, most do provide SSE-based intrinsics, which will allow you to write SSE-aware code. The subset of all SSE levels available differs for each processor architecture and maker.
See this page for instruction listings. Follow the links to see which architectures the specific instructions are supported on. Also, read the Intel and AMD architecture development manuals for exact details about support and implementation of any and all instruction sets.

First Question From a C programmer's point of view, what are the differences between
Intel Core processors and their AMD equivalents ?
The most significant differences are likely to show up only in highly specialized code that makes use of new generation instructions, such as vector maths, parallelization, SSE.
Would performances be better if there was some special Intel compiler only for the Core family ?
Not sure if you are aware of it, but there's a compiler specifically for Intel cores: icc. It's generally considered to be the best compiler from an optimization point of view.
You might want to check out its wikipedia article.

According to the Intel Core Wikipedia article, there were notable
improvements to SSE, SSE2, and SSE3 instructions. These instructions are SIMD (same instruction, multiple data), meaning that they are designed for applying a single arithmetic operation to a vector of values. They are certainly important, and have been made used by compilers such as GCC for quite awhile.
Of course, recent AMD processors have adopted the newest Intel instructions, and vice-versa. This is an ongoing trend.

Related

Compilers and Instruction sets

"C is a genereal purpose language, not tied to a particular system"
The C programming Language, BRIAN W KERNIGHAN & DENNIS M. RITCHIE
Yet with the right compiler we can make a .exe which runs on every Windows machine, which in turn means on every CPU Windows runs on.
So my question is: does every x86-64 CPU (Intel or AMD) use the same instruction set ? (yes, I could make a comparison...) if not, then I'll have to assume that the compiler detects what CPU we're running and uses the right instruction set during compile time.
Am I totally mistaken ?
I barely know what I'm talking about so please bear with me.
Just a dude trying to look under the hood.
Thank you
Intel makes many different processor models that share a core instruction set of the “x86-64” family (and additional processor models that do not). Even among the processors with the shared core instructions, there are variations. Newer models may have instructions that older models did not, and some parts of the instruction set may be on certain models and not others.
Some instructions even behave differently on different processors.
When you compile a program, the compiler “targets” a particular combination of instruction subsets. This means the instructions in those subsets are available for the compiler to use when it is generating code. The compiler might or might not use any particular instruction or subset depending on its needs or choices when compiling a particular program. The resulting program is then suitable for processor models with the targeted instructions and not for other models (unless the compiler happened not to use any of the instructions not on those models, even though it could have).
Often, the default setting for the compiler‘s target is either a processor model like the one you are running on or some typical selection of instruction subsets that is common for modern processor models. The target may also be selected based on other settings you give the compiler, such as asking it to target a particular version of an operating system. However, you can pass the compiler switches to tell it to compile for entirely different targets, even for entirely different architectures, such as compiling for an ARM processor while running on an Intel processor.
Software is also part of a computer system, so the executable file the compiler produces may also depend on certain software libraries being available at run-time or certain operating system features being available.

Is NEON extension present at all modern ARM SoCs?

And can we hope it will be supported in all future mobile devices on ARM architecture (including NVIDIA Tegra)?
Unlike certain other ISA author companies, ARM does not have a hard commitment to backward compatibility. Mobile devices, the primary market for ARM cores, are unique and heterogeneous to the point where having a stable ISA would not make much difference. ARM is also targeted at low-power applications, meaning it is desirable to exclude unwanted features and keep the instruction set small, using only what the target application can take advantage of. As a result, a given release of the ARM architecture is a union of ISA families that together define feature-rich a system with relatively few unwanted instruction families.
An (incomplete) argument for a changing instruction set across releases is that an ISA is no longer seen as the programmer abstraction, and a chip-specific compiler is tasked with presenting a stable language for the programmer. Of course, compilers are only so good at squeezing C into extravagant instructions, and architecture-specific directives (and hand-coded segments of assembly) are still common in efficiency-critical libraries.
While some of the ISA extensions we have seen crop up over the years that prove to be widely useful ISA features (NEON appearing to be one of them - 128-bit SIMD is handy!) persist more reliably than others, there does not appear to be a guarantee that any specific ISA extension is to become future-compatible.
In response to the original question, NEON is mandatory in all Cortex-A8 devices but optional in Cortex-A9. ARM's business model allows companies to license a Cortex-A9 core and omit non-mandatory extensions. Tegra 2 appears to exclude NEON (but makes up for it by offering a "SIMD"-ish coprocessor in form of the GPU).
In theory, NEON is optional in ARM processors implementing ARM v7 architecture (for example Cortex-A9 based SoCs). In practice, most modern SoCs indeed have NEON.
On the other hand, NEON is mandatory in ARM processors based on ARMv8 architecture (the processors with 64-bit support).

Intel based hardware speed ups for DCT?

We are writing an image processing algorithm targeting some Intel hardware. Generally we prefer generic C implementations, but we have identified an algorithm that at its core does a ton of Discrete Cosine Transforms (DCT's) that works extremely well. Unfortunately, our throughput requirements are such that a generic C implementation is about 2 orders of magnitude too slow. I can get one order of magnitude through some other tricks, so if I can improve my DCT's by about an order of magnitude I have a path towards success.
Is the Intel MMX a way to get at hardware acceleration to do these DCT's? Is there other intel specific libraries and/or hardware that I can exploit to speed these bad boys up?
Where do I start to look? This is a new job for me, and my first time digging hard into Intel hardware, so any pointers would be most appreciated.
Take a look at Intel's Integrated Performance Primitives library. It contains a wealth of routines that are optimized heavily to take use of the Intel architecture, specifically MMX and SSE. Among many other things, IPP also contains routines for the DCT (documentation here).

Do different ARM manufacturers provide different instruction sets?

I first came across the ARM instruction set in the 80's, and have not used it since. Out of curiosity I was looking at the the tablets and other ARM devices and note that the CPU's are produced by different manufacturers.
I did a quick search but I couldn't find a definitive statement as whether the different ARM chips have differing instruction sets.
I would assume that in the main they are the same.
goto http://infocenter.arm.com along the left under contents look for ARM architecture. And under that Reference Manuals. used to be there was a single ARM ARM (ARM Architecture Reference Manual) but the family has grown to the point they had to break it into, well, families.
The ARM ARM's are going to show you the instruction sets. What I think they call the ARMv5 manual is the old ARM ARM. You will find the ARM instructions (32bit) and thumb instructions (16 bit). For each instruction they list what architecture supports it, so you might see an ARMv5 instruction that is not supported by the ARMv4 (ARMv4 a.k.a ARM7, like the popular ARM7TDMI core). Thumb instructions are supported by ARMv4T and newer, etc.
So there is the core 32 bit arm instruction set which you may have been used to with new instructions added from time to time and bugs/restrictions fixed (ldr r0,[r0] for example), etc.
The floating point unit has had one or two overhauls, most cores do not have a fpu and the ones that have an fpu that doesnt mean the chip vendor included it in the chip. the fpa being the older, vfp being newer and now neon stuff. If you pay attention these all fall into the generic coprocessor instructions category. But you dont have to know/use the coprocessor version they have aliases for everything.
There is/was this java/jazelle thing, same story some cores might have it as an option doesnt mean the vendor included it.
At least two sets of thumb2 extensions to the thumb instruction set. Before thumb2 extensions the thumb instructions were all 16 bit and had a one to one mapping to an ARM instruction, makes sense you only need an ARM core, the decoder translates from the smaller instruction to ARM instruction and feeds that to the core. All instructions are 16 bit except the branch, and if you look at that pattern you can quite easily decode that as two separate 16 bit instructions. So then they decide to make their microcontroller offering smaller, instead of everyone just using the ARM7TDMI and consuming the chip size and power, thumb2 capable processors are thumb only, they do not support 32 bit ARM instructions, there is no ARM core that thumb instructions are translated to, etc. new core. The ARMv6-M a.k.a Cortex-m0 and Cortex-m1 take the thumb instruction set and add a few 32 bit instructions to close the performance gap to ARM (thumb was smaller yes, but a little slower than ARM if you compiled the same code to both, it took like 10-20% more instructions from my experiments to use thumb). In theory thumb-2 (ARMv7-M) outperforms ARM when and where you can compare them. For whatever reason the Cortex-m3 came out first which is ARMv7-M and has a bunch of 32 bit thumb2 instructions added to the thumb instruction set. I recently counted and ARMv6-M added like 20, ARMv7-M has like 140-150 instructions added to the base thumb instruction set. thumb2 is basically variable word length. And again only runs on the cortex-m series. Looking at it it is almost like they re-built the ARM instruction set again under the name thumb. not completely but you get back a lot of arm like instructions, three register instead of two, being able to reach higher registers and use immediates, etc. What this caused is a desire to write asm that compiled for both ARM and thumb/thumb2. So they came up with a unified syntax. you can write an instruction like
add r0,r1
If assembling for thumb, that is the instruction, if assembling for arm they will convert it to
add r0,r0,r1
for you, instead of any syntax errors. You have to specify that you are using the unified syntax, at least with the gnu binutils assembler (gas).
An equally important set of documents is the Technical Reference Manuals, also at infocenter.arm.com. Each core has a trm, actually each rev of each core has a TRM. Also the extra cost items like L2 caches have their own TRM, for each rev. it is important to find out the core the chip vendor bought/used and if possible the revision (rev 2.0 r2p0, rev 1.0 r1p0, etc) as there are programming differences as well as errata differences between them (dont trust Linux as a reference!, it is a huge mess, every time I look yet another company has completely misunderstood and misapplied core/errata differences, it si a bit of a disaster at the moment). Sometimes the TRM includes instruction information, or paints a more clear picture on what that core supports and doesnt support. The ARM ARM's are generic they cover the whole family or a number of families of cores, where the TRM is very specific to one core. An example of confusing between the ARM ARM and the TRM is that looking at the ARM ARM you might get the impression that you can use BE-32 or BE-8 big endian modes, the reality is you have either one or the other ARMv6 and newer is BE-8, period, get used to it. ARMv5 and ARMv4 is BE-32 or before ARMv6 just called big endian. I highly recommend NOT using big endian on an arm despite what you think you might gain from it. go with the native mode and you will save yourself a ton of work and failure. I mention it from personal experience trying to figure out why the bits described in an ARM ARM just didnt work in the core I was using.
A 64 bit core is somewhere in the development phase, I wouldnt be surprised if it is done and just looking for someone to pull the trigger and use it. Actually the ARMv8 doc is available, downloading now.
Short answer infocenter.arm.com under ARM Architecture you find all the docs describing the different instruction sets as well as improvements/additions over time to those instruction sets.
There is no difference (with respect to the instruction set) between manufacturers.
They all respect the ARM specification.
Some extensions are optional. This is the case with NEON.
But, as far as I know, only the Tegra 2 does not include this extension.
This is why the Tegra 2 is a very bad processor for video decoding (for example).
There are a few common variations of instructions sets, UAL, Thumb and Thumb2 being the most common. Some ARM cores that contain specialized hardware (such as DSPs) extend the language as well.
This used to not be the case. ARM required adherence to their spec. ARM ships IP which of course adheres to their spec but they also require the architecture licensees to adhere to it. However, that changed slightly in 2019 when ARM began to allow custom instructions with their embedded CPUs.

Do Intel and AMD processor have the same assembler?

The C language was used to write UNIX to achieve portability -- the same C language program compiled using different compilers produces different machine instructions. How come Windows OS is able to run on both Intel and AMD processors?
AMD and Intel processors(*) have a large set of instructions in common, so it is possible for a compiler or assembler to write binary code which runs "the same" on both.
However, different processor families even from one manufacturer have their own sets of instructions, usually referred to as "extensions" or whatever. Ignoring the x87 co-processor, the first time I remember this being a marketing point was when everything suddenly went "with MMX(TM) technology". Binary code expected to run on any processor either needs to avoid extensions, or to detect the CPU type before using them.
Intel's Itanium 64-bit architecture was completely different from AMD's x86-64 architecture, so for a while their 64bit offerings were non-compatible (and Itanium was nothing like x86, whereas x86-64 extended the instruction set by adding 64bit instructions). Intel blinked first and adopted x86-64, although there are still a few differences: http://en.wikipedia.org/wiki/X86-64#Differences_between_AMD64_and_Intel_64
Windows probably uses the common x86 or x86-64 instruction set for almost all code. I wouldn't be surprised if various drivers and codecs are shipped in multiple versions, and the correct one selected once the CPU has been interrogated.
(*) Actually, Intel make or have made various kinds of processors, including ARM (Intel's ARM processors were called XScale, but I think they've sold that business). And AMD make other processors too. But we know which Intel/AMD processors you mean :-)
AMD are Intel compatible, otherwise they would never have gained a foothold in the market place.
They are effectively clone compatible.
As you suspect, the main stream Intel and AMD processors have the same instruction set.
Windows does not run on ARM or PowerPC chips, for example, because it is somewhat dependant on the underlying instruction set.
However, most of Windows is written in C++ (as far as I know), which should be portable to other architectures. Windows NT even ran on PowerPC and other architectures.
Intel's 80x86 CPUs and AMD's 80x86 are "mostly the same sort of", but some things are completely different (e.g. virtual machine extensions - SVM vs. VT-x) and some things (extensions) may or may not be supported. However, some things are different on different CPUs from the same manufacturer too (e.g. some Intel chips support AVX2 and some don't).
There are multiple ways to deal with the differences:
only use the common subset so the same code runs on all 80x86 CPUs (e.g. treat it like an 8086 chip).
use a subset of features that is common to a range of CPUs so the same code runs on all 80x86 CPUs in that range. This is very common (e.g. "this software requires an 80x86 CPU (and OS) that supports 64-bit extensions").
use install-time tests. For example, there might be 4 different copies of software (compiled for 4 different ranges of CPUs) where the installer decides which copy makes sense for the computer the software is being installed on.
use run-time tests. For example, code can use the CPUID instruction to do if( AVX2_is_supported() ) { set_function_pointers_so_AVX2_is_used(); } else {set_function_pointers_so_AVX2_is_not_used(); }. Note: Some compilers (Intel's ICC) can automatically generate code that does run-time tests.
These aren't mutually exclusive options. For example, the installer might decide to install a 64-bit version (and not a 32-bit version), and then the 64-bit version might check which features are supported at run-time and have different code to use different features.
Also note that different parts of an OS can be treated separately. For example, an OS could have 6 different boot loaders, 4 different "HALs", 4 different kernels, and 3 different "kernel modules" to support virtualisation; where some of these things might do run-time tests and some might not.
Do Intel and AMD processor have the same assembler?
Almost all assemblers for 80x86 support almost all extensions (from all CPU manufacturers - e.g. Intel, AMD, VIA, Cyrix, SiS, ...). In general; it's up to the programmer (or compiler) to make sure they only use things that they know exist. Some assemblers provide features to make this easier (e.g. NASM provides a CPU ... directive so that the programmer can tell the assembler to generate errors if it sees instructions that aren't supported on the specified CPU).
AMD and Intel use the same instruction set.
When you install windows on an AMD processor or an Intel processor, it doesn't "compile" code on the machine.
I remember many people being confused on this subject back during college. They believe that a "setup" means that it is compiling code on your machine. It isn't. Most if not all Windows application outside of the free realms, are given to you by binary.
As for portability, that isn't neccessarily 100% true. While C is highly portable, in many cases writing for a specific OS or system will result in the code only being able to compile/executed on that box. For example, certain Unix machines handle files and directories differently so it might not be 100% portable.
Do Intel and AMD processor have the same assembler?
An assembler assembles a program to be run on a processor, so your question is flawed. Processors DO NOT use assemblers.
If you mean can Intel and AMD processor run the same assembler? Then the answer is YES!!!
All an assemblers are, is a program that assembles other programs from structured text files. Visual Basic is an example of an assembler.

Resources