I'm writing a kernel-mode driver, and I've run into a bit of a bug when running the code on 64-bit.
The code runs fine on 32-bit, but when I build and run for amd64 I get strange results. I've read up a little on 64-bit pointers and addressing versus 32-bit and 16-bit (in Win32), and I'm sure I'm missing something fundamental about pointers on the 64-bit architecture.
Here is the C code that works just fine in 32-bit.
ncImageLoadEventSettings.buff is a char* and ncILHead->count is simply an int.
// Calculate offset
pnt = (void*)(ncImageLoadEventSettings.buff + sizeof(struct NC_IL_HEAD) + (ncILHead->count * sizeof(struct NC_IL_INFO)));
This code calculates the address at which to write a struct object onto a buffer (beginning at .buff), which works perfectly fine in 32-bit mode.
It should be noted that the program reading this buffer is 32-bit. I think I read somewhere that structs in 64-bit mode are different sizes than those in 32-bit mode.
The 32-bit reader program reads some of the buffer's contents just fine, while the majority of the entries are garbage.
Is this the proper way to calculate addresses, or might there be an issue with the 64-bit vs 32-bit reader application that is reading that buffer?
See http://en.wikipedia.org/wiki/Data_structure_alignment#Typical_alignment_of_C_structs_on_x86
In general, pointers are larger (64-bit), and fields that are 64 bits wide (including pointers) will be aligned to 8-byte boundaries, with padding added as needed.
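As an illustration (hypothetical structs, not the poster's NC_IL_HEAD/NC_IL_INFO definitions), a struct containing a pointer changes both its size and its field offsets between 32-bit and 64-bit builds, while a struct built only from fixed-width types keeps the layout a 32-bit reader expects:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical structs for illustration only. */
struct with_pointer {
    char  tag;    /* 1 byte, then padding so the pointer is naturally aligned */
    void *data;   /* 4 bytes on x86, 8 bytes on x64                           */
    int   count;  /* 4 bytes                                                  */
};  /* sizeof: 12 bytes in a 32-bit build, 24 bytes in a 64-bit build */

struct fixed_layout {
    uint32_t tag;
    uint32_t data_offset;  /* store an offset into the buffer, not a pointer */
    int32_t  count;
};  /* sizeof: 12 bytes in both 32-bit and 64-bit builds */

int main(void)
{
    printf("with_pointer: %zu bytes, fixed_layout: %zu bytes\n",
           sizeof(struct with_pointer), sizeof(struct fixed_layout));
    return 0;
}

If NC_IL_HEAD or NC_IL_INFO contains pointers, longs, or anything whose size or alignment differs between the two builds, the offsets computed by the 64-bit driver will not match what the 32-bit reader expects.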
Related
I was reading Intel 64 and IA-32 Architectures Software Developer's Manual.
I understand that before paging, the logical address first has to be converted to a linear address, and the linear address then goes through the page tables to produce the final physical address.
My question is: if we run the C code below, is the printed address of variable a a logical address or a linear address?
I know that 64-bit Windows 10 currently uses long mode, so the logical address and the linear address are the same, but the question I have in mind is:
Is the address we see in a user-mode program the logical address, or the linear address that has already gone through the global descriptor table translation?
#include <stdio.h>

int main(void)
{
    int a = 50;
    printf("%p\n", (void *)&a);
    return 0;
}
Windows has not used segmented memory since it stopped being 16-bit. Or to put it a different way, the GDT just spans from 0 to the end of linear space in 32-bit Windows.
All this is irrelevant when asking about C because it has no knowledge of such details. To answer we must ignore the C abstract machine and look directly at Windows on x86.
If we imagine your code running in 16-bit Windows, taking the address of a local variable is going to give you the offset into the segment. This is 16 bits. A FAR address, on the other hand, is 32 bits of information: the segment and the offset. The Windows function lstrlen (l for long?) takes a FAR address and can compute the length of a string coming from anywhere. The strlen C function might just use a plain char* pointer, just the segment offset. Your C compiler might support different memory models (tiny, small, compact, medium, large), giving you access to more memory, perhaps even FAR pointers. Classic DOS .com files use the tiny model: there are no segments, just a maximum of 64 KiB for all code and data. Other models might put code and data in different segments.
In 32-bit and 64-bit Windows the logical and linear addresses are the same, but if you have to think of it in terms of your graphic, %p is going to print the logical address. printf is not going to ask the CPU/OS to translate the address in any way.
Running the following on Linux x86-64 compiled with gcc -m32
#include <stdio.h>
#include <limits.h>

int main(void) {
    int a = 4;
    int *ptr = &a;
    printf("int* is %zu bits in size\n", CHAR_BIT * sizeof(ptr));
    return 0;
}
results in
int* is 32 bits in size
Why I had convinced myself (before running it) that it ought to be 64 bits: since the program runs on a 64-bit computer, addressing the memory should require 64 bits, and since &a is the address of where the value 4 is stored, it should be 64 bits. The compiler could play a trick by giving all pointers the same offset since it is running in compatibility mode, but then it couldn't guarantee consistent data after calling malloc multiple times. This reasoning is wrong. Why?
On the hardware level, a typical x86-64 processor has a 32-bit compatibility mode, in which it behaves like a 32-bit x86 processor. That means memory is addressed using 4-byte pointers, hence your pointer is 32 bits.
On the software level, the 64-bit kernel allows 32-bit processes to be run in this compatibility mode.
This is how 'old' 32-bit programs can run on 64-bit machines.
The compiler, given the -m32 flag, emits code for 32-bit x86 addressing, which is why int* is also 32 bits.
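If you want to make that assumption explicit rather than discover it at runtime (a minimal sketch, not part of the original answer), a C11 compile-time assertion fails the build whenever the pointer size is not what the code expects:

/* Passes under gcc -m32 (4-byte pointers); fails to compile under -m64. */
_Static_assert(sizeof(void *) == 4, "this code assumes 32-bit pointers");

int main(void) { return 0; }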
Modern CPUs have a memory management unit, which makes it possible for every program to have its own address space. You could even have two different programs using the same addresses. This unit is also what detects segmentation faults (access violations). Because of it, the addresses a program uses are not the same as the addresses on the address bus that connects the CPU to the peripherals, including RAM, so it's no problem for the OS to assign 32-bit addresses to a program.
An x86-64 machine running a 64bit OS runs 32bit processes in "compat" mode, which is different from "legacy" mode. In compat mode, user-space (i.e. the 32bit program's point of view) works the same as on a system in legacy mode (32bit everything).
However, the kernel is still 64-bit, and can map the compat-mode process's virtual address space anywhere in physical address space (so two different 32-bit processes can each be using 4 GB of RAM). IDK if the page tables for a compat process need to be different from those of 64-bit processes. I found http://wiki.osdev.org/Setting_Up_Long_Mode, which has some relevant material but doesn't answer that question.
In compat mode, system calls switch the CPU to 64b long mode, and returns from system calls switch back. Kernel functions that take a user-space pointer as an argument need simple wrappers to do whatever is necessary to get the appropriate address for use from kernel-space.
The high level answer is that there's hardware support for everything compat mode needs to be just as fast as legacy mode (32bit kernel).
IIRC, 32bit virtual addresses get zero-extended to 64bit by the MMU hardware, so the kernel just sets up the page tables accordingly.
If you use an address-size override prefix in 64bit code, the 32-bit address formed from the 32bit registers involved will be zero-extended. (There's an x32 ABI for code that doesn't need more than 4GB of RAM, and would benefit from smaller pointers, but still wants the performance benefit of more registers, and having them be 64b.)
As far as I can tell, a 32-bit program uses the flat memory model, and so does a 64-bit program. With a 32-bit program one has only 4 GB to address, while using 64-bit registers (rcx, for example) makes it possible to use the 40 to 48 physical address bits modern CPUs provide and address even more.
So besides this, and some additional control registers that a 32-bit processor does not have, I ask myself whether it is possible to run 32-bit code flawlessly under Linux.
I mean, must every piece of C code I execute be 64-bit, for instance?
I can understand that, since C builds on a stack frame and base pointer, pushing a 32-bit base pointer onto the stack may introduce problems when the stack pointer is 64-bit and the push and pop opcodes are used in 32-bit fashion.
So what are the differences, and is it actually possible to run 32-bit code under a 64-bit Linux kernel?
[Update]
To state the scenario clearly: I am running a 64-bit program, loading an ELF64 file into memory, mapping everything, and calling the method directly. The idea is to generate asm code dynamically.
The main difference between them is the calling conventions. On 32-bit there are several: __stdcall, __fastcall, ...
On 64-bit (x64) there is only one (on Windows® platforms at least; I don't know about others), and it has requirements that are very different from the 32-bit conventions.
More on https://future2048.blogspot.com
Note that ARM and IA64 (Itanium) also use different instruction encodings than x64 (Intel64/AMD64).
And you have 8 more general-purpose registers, r8..r15, with sub-registers
r8d..r15d, r8w..r15w, r8b..r15b.
For SIMD code, 8 additional registers, xmm8..xmm15, are also present.
Exception handling is data-based on 64-bit; on 32-bit it was code-based. So on 64-bit, no instructions are needed to build an exception frame for unwinding; exception handling is completely data-driven, so try/catch requires no additional instructions.
The memory limit of 2 GB for 32-bit apps (or 3 GB with /LARGEADDRESSAWARE on a 32-bit Windows OS, or 4 GB on a 64-bit Windows OS) is now much larger.
More on https://msdn.microsoft.com/en-us/library/windows/desktop/aa366778(v=vs.85).aspx
And of course, the general-purpose registers are 64 bits wide instead of 32, so integer calculations can process values bigger than the 32-bit limit of 0..4294967295 (signed: -2147483648..+2147483647).
Also, a simple MOV instruction can read or write a QWORD (64 bits) at once; on 32-bit a single MOV could only handle a DWORD (32 bits).
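A small illustration (a sketch, assuming a compiler targeting x86-64): a value that overflows 32 bits fits in a single 64-bit register and is stored with one MOV:

#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    /* 10^10 does not fit in 32 bits; on x64 it is held in one register
       and written to memory with a single 64-bit MOV. */
    uint64_t big = 10000000000ULL;

    printf("%" PRIu64 "\n", big);
    return 0;
}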
Some instructions have been removed: PUSHA + POPA disappeared.
And one encoding form of INC/DEC is now reused as the REX byte prefix.
Some 32 bit code will work in a 64 bit environment without modification. However, in general, functions won't work because the calling conventions are probably different (depends on the architecture). Depending on the program, you could write some glue to convert arguments to the calling convention you want. So you can't just link a 32-bit library into your 64-bit application.
However, if your entire application is 32-bit, you will probably be able to run it just fine. The word size of the kernel doesn't really matter: Linux, OS X, and Windows all support running 32-bit code under a 64-bit kernel.
In short: you can run 32-bit applications on a 64-bit system but you can't mix 32-bit and 64-bit code in the same application (barring deep wizardry).
I've written a Windows server application in C. I have to compile it into a .dll, and as a 32-bit .dll on a 32-bit machine it works beautifully. However, when I compile the same code as 64-bit using CMake on a 64-bit machine and run the program, it crashes on a line that frees some memory.
My question is this: What causes this? Why does a program, coded exactly the same, crash on a memory free in its 64-bit version on a 64-bit machine, and not in the 32-bit version on a 32-bit machine? Is there any difference between 32-bit and 64-bit Windows Server? Please help me understand the difference in memory layout between the two types of Windows OS.
Version info: I'm using Visual Studio 2010, win 2008 R2
What causes this? Why does a program, coded exactly the same way, crash on a memory free in its 64-bit version on a 64-bit machine, and not the 32-bit version on a 32-bit machine?
Because your code is incorrect. By chance it works on 32 bit, but compiling for 64 bit, with different pointer size, exposes the faults in your code.
Is there any difference between 32-bit and 64-bit Windows Server? Please help me understand the difference in memory layout between the two types of Windows OS.
The principal difference is that pointers are 32 bits wide on 32 bit, and 64 bits wide on 64 bit. There are obviously many other differences, but from your perspective it's pointer size that matters.
Far and away the most common bug uncovered by a port from 32 to 64 bit is pointer truncation. Say you have code that casts pointers to ints.
int i = (int) p;
That happens to work at runtime when compiled for 32 bit, but on 64 bit you lose half of the pointer. When you later cast back to a pointer
int* p = (int*) i;
you don't get the same pointer that you started with. Fundamentally the problem is that the code made the assumption that an int, which is 4 bytes wide on Windows, is the same size as a pointer. That assumption holds for 32 bit, but not for 64 bit.
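If the code really does need to round-trip a pointer through an integer, the usual fix (a sketch of the standard approach, not the poster's actual code) is to use uintptr_t or intptr_t from <stdint.h>, which are defined to be wide enough to hold a pointer on both 32-bit and 64-bit builds:

#include <stdint.h>

void example(int *p)
{
    uintptr_t i = (uintptr_t)p;  /* wide enough for a pointer on both builds */
    int *q = (int *)i;           /* round-trips without truncation */
    (void)q;
}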
You'll likely be suffering from this problem and very likely other more subtle problems. In order to solve the problem you will need to debug the process in detail.
I think you are looking for some simple switch that will make your program work. There is no magic solution because the problem lies in your code. And so it will need to be carefully debugged.
I've recently been (relearning) lower level CS material and I've been exploring buffer overflows. I created a basic C program that has an 8-byte array char buffer[8];. I then used GDB to explore and disassemble the program and step through its execution. I'm on a 64-bit version of Ubuntu, and I noticed that my 8-byte char array is actually represented in 16 bytes in memory - the high order bits all just being 0.
E.g. Instead of 0xDEADBEEF 0x12345678 as I might expect to represent the 8 byte array, it's actually something like 0x00000000 0xDEADBEEF 0x00000000 0x12345678.
I did some googling and was able to get GCC to compile my program as a 32-bit program (using -m32 flag) - which resulted in the expected 8 bytes as normal.
I'm just looking for an unambiguous explanation as to why the 8-byte character array is represented in 16 bytes on a 64-bit system. Is it because the minimum word size / addressable unit is 8 bytes (64 bits) and GDB is simply printing based on an 8-byte word size?
Hopefully this is clear, but let me know if clarification is needed.
64-bit systems are geared toward aligning memory to 16-byte boundaries (16-byte stack alignment is part of the System V ABI). For stack allocations there are two parts to this: first, the stack itself needs to be aligned; second, any allocations then try to preserve that alignment.
This explains the first part, why the 8-byte array becomes 16 bytes on the stack. Why it gets split into two 8-byte qwords is a little more difficult to tell, as you haven't provided any code (assembly or C) showing how this buffer is used, and trying to replicate this using mingw64 produces the 16-byte alignment but not the funny layout you are seeing.
Of course, the other possibility, given the lack of ASM, is that GDB is displaying 2x QWORDs even though it is in fact 2x DWORDs (in other words, try using p/x (char[8]) to dump the contents...).
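A quick way to poke at this yourself (a minimal sketch, assuming gcc on x86-64 Linux rather than the poster's exact program) is to print a stack buffer's address and its offset within a 16-byte slot, then rebuild with -m32 and compare:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    char buffer[8];

    /* Shows where the 8-byte array lands relative to a 16-byte boundary.
       How the compiler places it within the (16-byte aligned) frame can
       vary, which is part of why GDB's view can look padded. */
    printf("buffer = %p, addr %% 16 = %u\n",
           (void *)buffer, (unsigned)((uintptr_t)buffer % 16));
    return 0;
}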