C function code in malloc'd memory - c

Is there a way to malloc memory space and then copy function code inside the space in C?
This question might not make sense in practice. I ask it out of curiosity, to get a better understanding of how C and its underlying implementation work.
Here are the follow-up questions, if it is possible to copy the code into the heap:
How to determine the size of the function's binary code when copying?
Can we use a function pointer to execute the code? (The code is placed inside malloc'd memory, and that part of memory might be marked as non-executable for safety reasons, but I'm not sure about this.)

This (or something like it) is possible on most machines, but the techniques you'd use are system-specific -- there's no standard C or C++ way to do it.
Even figuring out the length of a function so you can copy it is difficult. I don't think you can do it reliably if the function is in the same translation unit, because the compiler may have done optimization magic that you can't see. However, if the function is in a different file, then the interface to it will probably be more reliable (although there could be linker magic going on that you would have to understand and emulate to accomplish your goal.)
Other problems (on some systems) are that malloc'd memory may not be executable. (This is often the case to improve security by preventing execution of code placed in an overrun buffer area.) However, systems with executable protection often have an alternate memory allocation function that can give you a chunk of memory where executable code can be placed, and to which execution can transfer. Some variation of this feature is necessary to implement shared libraries.
Finally, although self-modifying code is probably the first thing people think of when considering your question, a reasonable, legitimate use of the relevant techniques might be in a native-code, just-in-time compilation system.
You may get better answers by specifying a particular OS and CPU where you want to do this.

The C standard (e.g. C11, read n1570) and the C++ one (e.g. C++11, C++14; notice that they have lambda expressions and std::function; read more about closures ...) do not define what a function address or pointer is (they only define what calling such an address does; function pointers should point to existing functions, and there is no standard way to build new ones dynamically at runtime). On some systems (pure Harvard architectures) a function sits in a different address space than the C heap (and on these systems executing anything in the malloc-ed heap makes no sense and is undefined behavior). That is why the C11 standard forbids casting function pointers to data pointers and vice versa.
So, to your question
Is there a way to malloc memory space and then put function code inside the space in C?
the answer is NO in general (but on some systems you could generate code at runtime, see below).
However, on desktop, laptop, or server PCs and tablets (running common OSes like Linux, Windows, MacOSX, Android), you usually have a Von Neumann architecture, and there is (for a given process) a single virtual address space sharing both code and data (notably heap data obtained with malloc). That virtual address space is organised in pages, and each page has its own memory protection. Read more about computer architecture, instruction sets, MMUs. Quite often heap-allocated data is non-executable through the NX bit.
The operating system plays an essential role. You need to read an entire book about OS, such as Operating Systems: Three Easy Pieces.
(I am guessing that you want to "create" some new functions in your program at runtime and call them through C function pointers; you should explain why. I suppose you are coding some application for a PC or a tablet with a Unix-like OS, practically a Linux-x86_64 distribution, but you could adapt my answer to Windows.)
You could use some libraries for JIT compilation such as asmjit, libgccjit, LLVM (or libjit or GNU lightning) and they generate code which is executable.
You could also use dynamic loading techniques on some plugin; on POSIX systems look into dlopen & dlsym (which can be used to "create" function addresses from a loaded plugin, beyond what the C11 standard allows). A possible way would be to generate some C code in a temporary file, compile it into a plugin, and dlopen that generated plugin. See this answer for more details.
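As a minimal sketch of the dlopen/dlsym route (the plugin file name and function name here are made up for illustration; assumes a POSIX system, link with -ldl):

#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    /* hypothetical plugin, e.g. built from generated C code with:
       gcc -shared -fPIC generated.c -o generated-plugin.so */
    void *handle = dlopen("./generated-plugin.so", RTLD_NOW);
    if (!handle) { fprintf(stderr, "%s\n", dlerror()); return 1; }

    /* POSIX guarantees this cast works, beyond what ISO C promises */
    int (*fn)(int) = (int (*)(int))dlsym(handle, "my_generated_fn");
    if (fn)
        printf("my_generated_fn(21) = %d\n", fn(21));

    dlclose(handle);
    return 0;
}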
On Linux, you can use the mmap(2) and related system calls (used to implement malloc in your C standard library, and also by dlopen(3)) to change your virtual address space, and the mprotect(2) system call to change protection (on a page by page basis). So if you want to explicitly copy or generate some function code it has to go into an executable page (PROT_EXEC).
Notice that because of relocation issues (and offsets or absolute addresses in machine code), it is not easy to copy machine code. Copying the bytes of a given function's code into some executable page with memcpy usually won't work without pain: CALL and JUMP machine instructions often use PC-relative addressing, so copying them without adjusting their offsets won't work.
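To make the mmap/mprotect route concrete, here is a minimal sketch for Linux/x86-64. The code bytes are a hand-assembled "mov eax, 42; ret", deliberately position-independent so the relocation problem above doesn't arise, and the function-pointer cast at the end is not standard C:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    /* x86-64 machine code for: mov eax, 42 ; ret */
    static const unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

    /* allocate a writable page ... */
    void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) { perror("mmap"); return 1; }
    memcpy(page, code, sizeof code);

    /* ... then flip it to executable, respecting W^X */
    if (mprotect(page, 4096, PROT_READ | PROT_EXEC) != 0) {
        perror("mprotect");
        return 1;
    }

    int (*fn)(void) = (int (*)(void))page;  /* implementation-specific cast */
    printf("%d\n", fn());                   /* prints 42 */
    munmap(page, 4096);
    return 0;
}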
if it is possible to copy the code into heap
No, it is not possible in general, and in practice it is much more difficult than you believe (even on Linux-x86_64, where the other approaches I mentioned are preferable). If you want to go that route you need to care about low-level implementation details (instruction set, processor, compiler, calling conventions, ABIs, relocation), and your code would be non-portable and brittle.
How to determine the size of the function's binary code when copying?
That question (and the notion of function size) makes no sense in general. Some optimizing compilers are able to emit machine code which is shared between several C functions, or to emit several non-contiguous machine code chunks for a given function (and gcc -O2 is likely to do these optimizations; read about function cloning). On Linux you could use dladdr(3) (or the nm or readelf programs) to get a "symbol size" in the ELF sense, but that size might not mean much. And as I explained, you can't just byte-copy binary machine code; you need to relocate (some parts of) it.

Related

Is there still a performance advantage to redefining standard functions like memcpy?

My question is quite simple, but I can't find any clear answer, so here I am.
Nowadays C compilers are more efficient than they were a few years ago. Is there still any advantage to redefining functions like memcpy or memset in a new project?
To be more specific, let's assume that the targeted MCU on the project is a 32-bit ARM core such as a Cortex-M or Cortex-A, and the GNU ARM toolchain is used.
Thanks
No, it is not beneficial to redefine memcpy. The problem is that your own function cannot work like the standard library memcpy, because the C compiler knows that the function with name memcpy is the one that (C11 7.24.2.1p2)
[...] copies n characters from the object pointed to by s2 into the object pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.
and it is explicitly allowed to construct any equivalent program that behaves as if such a function were called. Sometimes it will lead to code that does not even touch memory, with memcpy replaced by a register copy, or with an unaligned load instruction used to load a value from memory into a register.
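A minimal illustration of that last point (what gets emitted depends on the target and optimization level, but GCC and Clang at -O2 typically compile this to a single load instruction, with no call to memcpy at all):

#include <string.h>
#include <stdint.h>

/* the idiomatic, strict-aliasing-safe way to do an unaligned load */
uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);  /* usually becomes one mov/ldr instruction */
    return v;
}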
If you define your own superduperfastmemcpy in assembler, the C compiler will not know about what it does and will slavishly call it whenever asked to.
What can be beneficial, however, is to have a special routine for copying large blocks of memory when, e.g., it is known that both source and destination address are divisible by 1k and all lengths are always divisible by 1k; in that case there could be several alternative routines, timed at program start-up, with the fastest one chosen for use. Of course, copying large amounts of memory around is mostly a sign of bad design...
The question is answerable as something other than a matter of opinion only because you have been specific about the target and toolchain. It is not possible to generalise (and never has been).
The GNU ARM toolchain uses the Newlib C library. Newlib is designed to be architecture-agnostic and portable. As such it is written in C rather than assembler, so its performance is determined by the code generation of the compiler and in turn by the compiler options applied when the library is built. It is possible to build it for a very specific ARM architecture, or for a more generic ARM instruction subset; that will affect performance too.
Moreover Newlib itself can be built with various conditional compilation options such as PREFER_SIZE_OVER_SPEED and __OPTIMIZE_SIZE__.
Now if you are able to generate better ARM assembler code than the compiler (and have the time), then that is great, but such kung-fu coding skills are increasingly rare and frankly increasingly unnecessary. Do you have sufficient assembler expertise to beat the compiler? Do you have the time, and do you really want to do that for every architecture you might use? It may be a premature optimisation, and rather unproductive.
In some circumstances, on targets with the capability, it may be worthwhile setting up a memory-to-memory DMA transfer. The GNU ARM compiler will not generate DMA code because that is chip vendor dependent and not part of the ARM architecture. However memcpy is general purpose for arbitrary copy size alignment and thread safety. For specific circumstances where DMA is optimal, better perhaps to define a new differently named routine and use it where it is needed rather than redefine memcpy and risk it being sub-optimal for small copies which may predominate, or multi-threaded applications.
The implementation of memcpy() in Newlib, for example, can be seen here. It is a reasonable, idiomatic implementation and therefore sympathetic to a typical compiler optimiser, which generally works best on idiomatic code. An alternative implementation may perform better in un-optimised compilation, but if it is "unusual", the optimiser may not work as well. If you are writing it in assembler, you just have to be better than the compiler; you'd be a rare, though not necessarily commercially valuable, commodity. That said, looking at this specific implementation, it does look far less efficient for large unaligned blocks in the speed-over-size implementation. It would be possible to improve that at some small expense, perhaps, to the more common aligned copies.
Functions like memcpy belong to the standard library and are almost surely implemented in assembler, not in C.
If you redefine them, they will surely run slower. If you want to optimize copying, don't redefine memcpy: where you now call memmove, either call memcpy instead, or declare the pointers restrict to tell the compiler that the regions do not overlap, so that a memmove can be treated as fast as a memcpy (see the sketch after the quote below).
The engineers who wrote the standard C library for a given architecture surely used existing assembler routines to move memory faster.
EDIT:
Taking the remarks from some comments, every generation of code that keeps the semantics of copying (including replacing memcpy by mov-instructions or other code) is allowed.
For copying algorithms (including the one that newlib uses) you can check this article. Quoting from it:
Special situations: If you know all about the data you're copying as well as the environment in which memcpy runs, you may be able to create a specialized version that runs very fast.
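As a minimal sketch of the restrict point made above: with restrict-qualified pointers the compiler may assume the two ranges don't overlap, and can reorder and vectorize the loop much as it would a memcpy call.

#include <stddef.h>

void copy_words(unsigned *restrict dst, const unsigned *restrict src, size_t n)
{
    /* restrict promises no overlap, so the compiler is free to
       reorder and vectorize these copies */
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}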
There are several points here, maybe already mentioned above:
Certified libs: usually the standard libs are not certified for use in safety-constrained environments. Development according to a certain ASPICE/CMM level is usually not provided, so these libs cannot be used in such environments.
Architecture-specific implementations: maybe your own implementation uses some very target-specific features that the libs cannot provide, e.g. specific load/store instructions (SIMD, vector-based instructions), or even a DMA-based implementation for bigger data, or different implementations on a multiprocessor with different core architectures (e.g. NXP S32 with e200z4 and e200z7 cores, or ARM M5 vs. A53), where the lib would need to find out on which core it is called to get the best performance.
Since embedded development is, in C-standard terms, "freestanding" rather than "hosted", a big part of the standard is "implementation-defined" or even "unspecified", and that includes the libs.

What is the mechanism of read-only memory?

In general, memory can be readable and writable. When the C compiler makes memory const, what is the mechanism behind it? Who blocks the memory from being written? And if code mistakenly forces a write to memory marked const, who reports the segmentation error?
It's the operating system which marks pages of virtual memory as either readable, writable or executable (or a combination of all).
The compiler and linker work together to mark special sections of the executable file, and then the operating system loader handles setting up the memory itself.
Nothing of this is part of the C standard, which only specifies that attempting to modify a const variable is undefined behavior.
There is no specified mechanism in the C11 standard for read-only memory. Check by reading n1570. But be scared of undefined behavior (e.g. writing to some const data).
In practice, on many C implementations running on current operating systems (e.g. Linux, Windows, Android, MacOSX, ...) and desktops, tablets or servers with an x86-64 or an ARM processor, a process has some virtual address space, with various segments, some being read only (and managed by the operating system kernel with the help of the MMU). Read also about virtual memory & segmentation fault. Take several days to read a book like Operating Systems: Three Easy Pieces (freely downloadable).
On embedded microcontrollers (e.g. Arduino like), some memory might be a hardware ROM. And some compilers might (but are not required to!) use it for some of your constant data.
You might use linker scripts (with GNU ld) to organize some read only segments into read-only memory. This is very implementation specific.
However, some platforms don't have any kind of user-programmable read-only memory (e.g. some embedded systems have a factory ROM containing a fixed boot loader in firmware, and everything else is in RAM). YMMV.
The compiler and linker implement it. For instance, in an embedded system, data in RAM is changeable, while data in Flash/ROM is not.
So if the data is defined with const, it can be placed into a non-volatile storage area, e.g. Flash/ROM or disk.
Defining a variable with const has two benefits (see the sketch below):
- It prevents the variable from being changed by a coding error (you get a compiler error instead).
- It reduces RAM usage; e.g. a long text can be placed into Flash/ROM or disk.
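A minimal demonstration of the mechanisms discussed in this thread. The behavior is platform-specific: on a typical Linux/x86-64 system the write raises SIGSEGV because the page holding .rodata is mapped read-only, while on some Flash-based MCUs the write is simply ineffective.

const char message[] = "hello";  /* typically placed in .rodata (or Flash/ROM) */

int main(void)
{
    char *p = (char *)message;   /* casting away const: undefined behavior */
    p[0] = 'H';                  /* usually a segmentation fault on Linux,
                                    silently ineffective on some Flash MCUs */
    return 0;
}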

What is the need of randomizing memory addresses for loading libraries?

ldd displays the memory addresses where the shared libraries are linked at runtime
$ cat one.c
#include<stdio.h>
int main() {
    printf("%d", 45);
}
$ gcc one.c -o one -O3
$ ldd one
linux-gate.so.1 => (0x00331000)
libc.so.6 => /lib/tls/i686/cmov/libc.so.6 (0x00bc2000)
/lib/ld-linux.so.2 (0x006dc000)
$
From this answer to another question,
... The addresses are basically random numbers. Before secure implementations were devised, ldd would consistently indicate the memory addresses where the program sections were loaded. Since about five years ago, many flavors of Linux now intentionally randomize load addresses to frustrate would-be virus writers, etc.
I do not fully understand how these memory addresses can be used for exploitations.
Is the problem something like "If the addresses are fixed, one can put some undesirable code at that address which would be linked as if it was a library" or is it something more than this?
"If the addresses are fixed, one can put some undesirable code at that address which would be linked as if it was a library"
Yes.
Also, buffer overflow exploits require a consistent memory model, so that the bytes that overflow the buffer do known things to known parts of the code.
http://www.corewars.org/ is a great illustration of the principle.
Some vulnerabilities allow overwriting some address (stack overflows allow overwriting return addresses, exploit for heap overflows typically overwrite SEH pointers on Win32 and addresses (GOT entries) of dynamically called functions on Linux, ...). So the attacker needs to make the overwritten address point to something interesting. To make this more difficult, several counter-measures have been adopted:
Non-executable stacks prevents exploits from just jumping to some code the attacker has put on the stack.
W^X segments (segments which can never be writable and executable at the same time) prevents the same for other memory areas.
Randomized load addresses for libraries and position-independent executables decrease the probability of successful exploitation via return-into-libc and return-oriented-programming techniques, ...
Randomized load addresses also prevent attackers from knowing in advance where to find some interesting function (e.g: imagine an attacker that can overwrite the GOT entry and part of the message for the next logging call, knowing the address of system would be "interesting").
So, you have to view load address randomization as another counter-measure among many (several layers of defense and all that).
Also note that exploits aren't restricted to arbitrary code execution. Getting a program to print some sensitive information instead of (or in addition to, think of string truncation bugs) some non-sensitive information also counts as an exploit; it would not be difficult to write some proof-of-concept program with this kind of vulnerability where knowing absolute addresses would make reliable exploits possible.
You should definitely take a look at return-into-libc and return-oriented-programming. These techniques make heavy use of knowledge of addresses in the executable and libraries.
And finally, I'll note there are two ways to randomize library load addresses:
Do it on every load: this makes (some) exploits less reliable even if an attacker can obtain info about addresses on one run and try to use that info on another run.
Do it once per system: this is what prelink -R does. It avoids attackers using generic information for, e.g., all Redhat 7.2 boxes. Obviously, its advantage is that it doesn't interfere with prelink :).
A simple example:
If on a popular operating system the standard C library was always loaded at address 0x00100000, and a recent version of the standard C library had the system function at offset 0x00000100, then anyone able to exploit a flaw in a program running on a computer with that operating system (such as a web server), causing it to write some data to the stack (via a buffer overrun), would know it was very likely that writing 0x00100100 to the place on the stack where the current function expected its return address to be would cause the system function to be called upon return from the current function. While they still haven't done everything needed to make system execute something they want, they are close, and there are some tricks, writing more stuff to the stack after the address mentioned above, that have a high likelihood of producing a valid string pointer and a command (or series of commands) being run by this forced call to system.
By randomizing the addresses at which libraries are loaded the attacker is more likely to just crash the web server than gain control of the system.
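You can observe the randomization yourself with a trivial program; run it several times and compare the addresses. This is a sketch for a typical Linux system, and the cast of a function address to void * is a common POSIX-ism rather than strict ISO C:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    int local;
    printf("memcpy (libc) is at %p\n", (void *)memcpy);
    printf("stack is near       %p\n", (void *)&local);
    printf("heap is near        %p\n", malloc(16));
    return 0;
}

With ASLR enabled the libc, stack, and heap addresses change on every run; disabling randomization (e.g. with setarch -R, on systems that support it) makes them stable.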
The typical method is by a buffer overrun, where you put a particular address on the stack, and then return to it. You typically pick an address in the kernel where it assumes the parameters you've passed it on the stack have already been checked, so it just uses them without any further checking, allowing you to do things that normally wouldn't be allowed.

C memcpy() a function

Is there any method to calculate the size of a function? I have a pointer to a function and I have to copy the entire function using memcpy. I have to malloc some space and know the 3rd parameter of memcpy: the size. I know that sizeof(function) doesn't work. Do you have any suggestions?
Functions are not first-class objects in C, which means they can't be passed to another function, they can't be returned from a function, and they can't be copied into another part of memory.
A function pointer, though, can satisfy all of this, and it is a first-class object. A function pointer is just a memory address, and it usually has the same size as any other pointer on your machine.
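For example, this is the kind of indirection C does support (a minimal sketch):

#include <stdio.h>

static int square(int x) { return x * x; }

int main(void)
{
    /* the pointer can be stored, passed around, and called,
       but the code it points to cannot be copied portably */
    int (*fp)(int) = square;
    printf("%d\n", fp(5));   /* prints 25 */
    return 0;
}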
It doesn't directly answer your question, but you should not implement call-backs from kernel code to user-space.
Injecting code into kernel-space is not a great work-around either.
It's better to treat the user/kernel barrier like an inter-process barrier. Pass data, not code, back and forth using a well-defined protocol over a char device. If you really need to pass code, just wrap it up in a kernel module. You can then dynamically load/unload it, just like a .so-based plugin system.
On a side note, at first I misread and thought you wanted to pass memcpy() to the kernel. You have to remember that it is a very special function. It is defined in the C standard, quite simple, and of quite broad scope, so it is a perfect target to be provided as a built-in by the compiler.
Just like strlen(), strcmp() and others in GCC.
That said, the fact that it is a built-in does not impede your ability to take a pointer to it.
Even if there were a way to get the sizeof() a function, it may still fail when you try to call a version that has been copied to another area in memory. What if the compiler emitted local or long jumps to specific memory locations? You can't just move a function in memory and expect it to run. The OS can do that, but it has all the information it takes to do it.
I was going to ask how operating systems do this but, now that I think of it, when the OS moves stuff around it usually moves a whole page and handles memory such that addresses translate to a page/offset. I'm not sure even the OS ever moves a single function around in memory.
Even in the case of the OS moving a function around in memory, the function itself must be declared or otherwise compiled/assembled to permit such action, usually through a pragma that indicates the code is relocatable. All the memory references need to be relative to its own stack frame (aka local variables) or include some sort of segment+offset structure such that the CPU, either directly or at the behest of the OS, can pick the appropriate segment value. If there was a linker involved in creating the app, the app may have to be re-linked to account for the new function address.
There are operating systems which can give each application its own 32-bit address space but it applies to the entire process and any child threads, not to an individual function.
As mentioned elsewhere, you really need a language where functions are first class objects, otherwise you're out of luck.
You want to copy a function? I do not think that this is possible in C generally.
Assume, you have a Harvard-Architecture microcontroller, where code (in other words "functions") is located in ROM. In this case you cannot do that at all.
Also, I know several compilers and linkers which optimize at file level (not only function level). This results in machine code in which parts of different C functions are interleaved.
The only way I consider possible is:
Generate the opcode of your function (e.g. by compiling/assembling it on its own).
Copy that opcode into a C array.
Use a proper function pointer, pointing to that array, to call the function.
Now you can perform all operations common to typical "data" on that array (see the sketch below).
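A hypothetical sketch of that recipe for GCC on Linux/x86-64. The bytes are a hand-assembled "lea eax, [rdi+rsi]; ret", i.e. an add function under the System V calling convention; placing the array in the .text section keeps it in executable memory, and the function-pointer cast is not portable C:

/* machine code for: lea eax, [rdi + rsi] ; ret */
__attribute__((section(".text")))
static const unsigned char add_code[] = { 0x8D, 0x04, 0x37, 0xC3 };

int main(void)
{
    /* casting a data address to a function pointer: non-standard */
    int (*add)(int, int) = (int (*)(int, int))add_code;
    return add(2, 3) == 5 ? 0 : 1;   /* exit status 0 on success */
}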
But apart from this: did you consider a redesign of your software, so that you do not need to copy a function's content?
I don't quite understand what you are trying to accomplish, but assuming you compile with -fPIC and the function doesn't do anything fancy (no calls to other functions, no access to data outside the function), you might even get away with it. I'd say the safest possibility is to limit the maximum size of a supported function to, say, one kilobyte, just transfer that, and disregard the trailing junk.
If you really needed to know the exact size of a function, figure out your compiler's epilogue and prologue. This should look something like this on x86:
:your_func_epilogue
mov esp, ebp
pop ebp
ret
:end_of_func
;expect a varying length run of NOPs here
:next_func_prologue
push ebp
mov ebp, esp
Disassemble your compiler's output to check, and take the corresponding assembled sequences to search for. The epilogue alone might be enough, but all of this can bomb if the searched-for sequence pops up too early, e.g. in data embedded in the function. Searching for the next prologue might also get you into trouble, I think.
Now please ignore everything that I wrote, since you apparently are trying to approach the problem in the wrong and inherently unsafe way. Paint us a larger picture please: WHY are you trying to do that? Then we can see whether we can figure out an entirely different approach.
A similar discussion was done here:
http://www.motherboardpoint.com/getting-code-size-function-c-t95049.html
They propose creating a dummy function after your function-to-be-copied, and then getting the memory pointers to both. But you need to switch off compiler optimizations for it to work.
If you have GCC >= 4.4, you could try switching off the optimizations for your function in particular using #pragma:
http://gcc.gnu.org/onlinedocs/gcc/Function-Specific-Option-Pragmas.html#Function-Specific-Option-Pragmas
Another proposed solution was not to copy the function at all, but to define the function in the place where you would want to copy it to.
Good luck!
If your linker doesn't do global optimizations, then just calculate the difference between the function pointer and the address of the next function.
Note that copying the function will produce something which can't be invoked unless your code is compiled to be relocatable (i.e. all addresses in the code must be relative, for example branches; globals still work, since they don't move). A sketch of the size-by-difference trick follows.
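A sketch of that measurement trick. It relies on the compiler emitting functions contiguously and in source order, which nothing in the C standard guarantees and which optimization routinely breaks, so compile with -O0 and treat the result as a guess:

#include <stdio.h>

static int add(int a, int b) { return a + b; }
static void end_marker(void) { }   /* dummy function placed right after add() */

int main(void)
{
    /* casting function pointers to object pointers is itself non-standard */
    size_t size = (size_t)((const char *)end_marker - (const char *)add);
    printf("estimated size of add(): %zu bytes\n", size);
    return 0;
}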
It sounds like you want to have a callback from your kernel driver to userspace, so that it can inform userspace when some asynchronous job has finished.
That might sound sensible, because it's the way a regular userspace library would probably do things - but for the kernel/userspace interface, it's quite wrong. Even if you manage to get your function code copied into the kernel, and even if you make it suitably position-independent, it's still wrong, because the kernel and userspace code execute in fundamentally different contexts. For just one example of the differences that might cause problems, if a page fault happens in kernel context due to a swapped-out page, that'll cause a kernel oops rather than swapping the page in.
The correct approach is for the kernel to make some file descriptor readable when the asynchronous job has finished (in your case, this file descriptor would almost certainly be the character device your driver provides). The userspace process can then wait for this event with select / poll, or with read; it can set the file descriptor non-blocking if it wants, and basically just use all the standard UNIX tools for dealing with this case. This, after all, is how the asynchronous nature of network sockets (and pretty much every other asynchronous case) is handled.
If you need to provide additional information about the event that occurred, it can be made available to the userspace process when it calls read on the readable file descriptor.
A function isn't just an object you can copy. What about cross-references / symbols and so on? Of course you could take something like the standard Linux binutils package and torture your binaries, but is that what you want?
By the way, if you are simply trying to replace the memcpy() implementation, look at the LD_PRELOAD mechanism, sketched below.
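A minimal sketch of that LD_PRELOAD route: build the file as a shared object and preload it. Note the dynamic linker can only interpose calls that actually go through the PLT, so copies the compiler inlined are unaffected.

/* my_memcpy.c -- build:  gcc -shared -fPIC my_memcpy.c -o my_memcpy.so
   run:                   LD_PRELOAD=./my_memcpy.so ./some_program     */
#include <stddef.h>

void *memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;   /* naive byte copy, just to prove interposition */
    return dst;
}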
I can think of a way to accomplish what you want, but I won't tell you because it's a horrific abuse of the language.
A cleaner method than disabling optimizations and relying on the compiler to maintain the order of functions is to arrange for that function (or a group of functions that need copying) to be in its own section. This is compiler- and linker-dependent, and you'll also need to use relative addressing if you call between the functions that are copied. For those asking why you would do this: it's a common requirement in embedded systems that need to update the running code.
My suggestion is: don't.
Injecting code into kernel space is such an enormous security hole that most modern OSes forbid self-modifying code altogether.
As near as I can tell, the original poster wants to do something that is implementation-specific, and so not portable; this is going off what the C++ standard says on the subject of casting pointers-to-functions, rather than the C standard, but that should be good enough here.
In some environments, with some compilers, it might be possible to do what the poster seems to want to do (that is, copy a block of memory that is pointed to by the pointer-to-function to some other location, perhaps allocated with malloc, cast that block to a pointer-to-function, and call it directly). But it won't be portable, which may not be an issue. Finding the size required for that block of memory is itself dependent on the environment, and compiler, and may very well require some pretty arcane stuff (e.g., scanning the memory for a return opcode, or running the memory through a disassembler). Again, implementation-specific, and highly non-portable. And again, may not matter for the original poster.
The links to potential solutions all appear to make use of implementation-specific behaviour, and I'm not even sure that they do what they purport to do, but they may be suitable for the OP.
Having beaten this horse to death, I am curious to know why the OP wants to do this. It would be pretty fragile even if it works in the target environment (e.g., could break with changes to compiler options, compiler version, code refactoring, etc). I'm glad that I don't do work where this sort of magic is necessary (assuming that it is)...
I have done this on a Nintendo GBA, where I copied some low-level render functions from flash (16-bit access, slowish memory) to the high-speed work RAM (32-bit access, at least twice as fast). This was done by taking the address of the function immediately after the function I wanted to copy: size = (int)(NextFuncPtr - SourceFuncPtr). This did work well but obviously can't be guaranteed on all platforms (it does not work on Windows, for sure).
I think one solution can be as below.
For example, if you want to know the size of func() in program a.c, put indicators before and after the function.
Try writing a Perl script which compiles this file into object format (cc -c), making sure that the pre-processor statements are not removed. You need them later on to calculate the size from the object file.
Now search for your two indicators and find out the code size in between.

What kind of C is an operating system written in?

It makes sense that something like an operating system would be written in C. But how much of it, and what kind of C? I mean, in C, if you needed some heap memory, you would call malloc. But does an OS even have a heap? As far as I know, malloc asks the operating system for memory and then adds it to a linked list, or binary tree, or something. What about a call stack? The OS is responsible for setting up all of this stuff that other applications use, but how does it do that? When you want to open or create a file in C, the appropriate functions ask the operating system for that file. So... what kind of C is on the other side of that call? Or on the other end of a memory allocation?
Also, how much of an operating system would actually be written in C? All of it? What about architecture dependent code? What about the higher levels of abstraction--does that ever get written in higher level languages, like C++?
I mean, I'm just asking this out of sheer curiosity. I'm downloading the latest linux kernel now but it's taking forever. I'm not sure if I'll wind up being able to follow the code--or if I'll be caught in an inescapably complex web of stuff I've never seen before.
Excellent questions, all. The answer is: little to none of the standard C library is available in the "dialect" of C used to write an operating system. In the Linux kernel, for example, the standard memory allocation functions malloc, nmalloc, free etc. are replaced with special kernel-internal memory allocation functions kmalloc and kfree, with special restrictions on their use. The operating system must provide its own "heap": in the Linux kernel, physical memory pages that have been allocated for kernel use must be non-pageable and often physically contiguous. See this Linux Journal article on kmalloc and kfree. Similarly, the operating system kernel maintains its own special call stack, the use of which, if memory serves, requires special support from the GCC compiler.
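For a flavour of the kernel dialect, here is a sketch of kernel-side allocation (this is Linux kernel-module code, not userspace C, and won't compile outside the kernel tree):

#include <linux/slab.h>
#include <linux/string.h>

static char *make_buffer(void)
{
    /* GFP_KERNEL may sleep; code in interrupt context must use GFP_ATOMIC */
    char *buf = kmalloc(128, GFP_KERNEL);
    if (!buf)
        return NULL;
    memset(buf, 0, 128);
    return buf;   /* the caller releases it with kfree(buf), never free() */
}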
Also, how much of an operating system would actually be written in C? All of it?
As far as I'm aware, operating systems are overwhelmingly written in C. Some architecture-specific features are coded in assembler, but usually very little, to improve portability and maintainability: the Linux kernel has some assembler but tries to minimize it as much as possible.
What about architecture dependent code? What about the higher levels of abstraction--does that ever get written in higher level languages, like C++?
Usually the kernel will be written in pure C, but sometimes the higher level frameworks and APIs are written in a higher level language. For example, the Cocoa framework/API on MacOS is written in Objective C, and the BeOS higher level APIs were written in C++. Much of Microsoft's .NET framework was written in C#, with the "Common Language Runtime" written in a mix of C++ and assembler. The QT widget set most often used on Linux is written in C++. Of course, this introduces philosophical questions about what counts as "the operating system."
The Linux kernel is definitely worth looking at for this, although, it must be said, it is huge and intimidating for anyone to read from scratch.
What kind of C?
Mostly ANSI C, with a lot of time looking at the machine code it generates.
But, does an OS even have a heap?
Malloc asks the operating system for a pointer to some memory it is allowed to use. If a program running on an OS (user mode) tries to access memory it doesn't own, it will get a segmentation fault. An OS is allowed to directly access all the physical memory on the system: no malloc needed, and no seg-faults on any address that exists.
What about a call stack?
The call stack actually often works at the hardware level, with a link register.
For file access, the OS needs access to a disk driver, which needs to know how to read the file system that's on the disk (there are a lot of different kinds). Sometimes the OS has one built in, but I think it's more common that the boot loader hands it one to start with, and it loads another (bigger) one. The disk driver has access to the hardware IO of the physical disk, and builds from that.
C is a very low-level language, and you can do a lot of things directly. Any of the C library methods (like malloc, printf, clrscr etc.) need to be implemented first, before you can invoke them from C (have a look at libc concepts, for example). I'll give an example below.
Let us see how the C library methods are implemented under the hood. We'll go with a clrscr example. When you implement such methods, you access system devices directly. For example, for clrscr (clearing the screen) we know that the video memory is resident at 0xB8000. Hence, to write to the screen or to clear it, we start by assigning a pointer to that location.
In video.c
void clrscr()
{
    /* each cell in VGA text memory is two bytes:
       the character itself, then an attribute (colour) byte */
    unsigned char *vidmem = (unsigned char *)0xB8000;
    const long size = 80*25;
    long loop;
    for (loop = 0; loop < size; loop++) {
        *vidmem++ = 0;     /* blank character */
        *vidmem++ = 0xF;   /* attribute: white on black */
    }
}
Let us write our mini kernel now. This will clear the screen when the control is handed over to our 'kernel' from the boot loader. In main.c
void main()
{
    clrscr();
    for(;;);
}
To compile our 'kernel', you might use gcc to compile it to a pure bin format.
gcc -ffreestanding -c main.c -o main.o
gcc -c video.c -o video.o
ld -e _main -Ttext 0x1000 -o kernel.o main.o video.o
ld -i -e _main -Ttext 0x1000 -o kernel.o main.o video.o
objcopy -R .note -R .comment -S -O binary kernel.o kernel.bin
If you look at the ld parameters above, you'll see that we are specifying the default load location of our kernel as 0x1000. Now you need to create a boot loader. From your boot loader logic, you might want to pass control to your kernel, like:
jump 08h:01000h
You normally write your boot loader logic in asm. Even before that, you may want to have a look at how a PC boots.
Better to start with a tinier operating system to explore. See this Roll Your Own OS tutorial:
http://www.acm.uiuc.edu/sigops/roll_your_own/
But how much of it, and what kind of C?
Some parts must be written in assembly.
I mean, in C, if you needed some heap memory, you would call malloc. But, does an OS even have a heap? As far as I know, malloc asks the operating system for memory and then adds it to a linked list, or binary tree, or something.
Some OS's have a heap. At the lowest level, there are slabs of memory, called pages, that are doled out. Your C library then partitions them with its own scheme, in a variable-sized manner, with malloc. You should learn about virtual memory, which is a common memory scheme in modern OS's.
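A sketch of what "asking the OS for pages" looks like from userspace on Linux; this is roughly what malloc does under the hood for large allocations:

#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);   /* typically 4096 */

    /* request four fresh, zeroed pages of virtual memory from the kernel */
    void *mem = mmap(NULL, 4 * page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) { perror("mmap"); return 1; }

    printf("got %ld bytes at %p\n", 4 * page, mem);
    munmap(mem, 4 * page);
    return 0;
}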
When you want to open or create a file in C, the appropriate functions ask the operating system for that file. so... What kind of C is on the other side of that call?
You call into assembly routines that talk to hardware, with instructions like IN and OUT. Sometimes regions of memory are dedicated to communicating to and from hardware (memory-mapped I/O), and hardware can also read and write memory directly (DMA).
I'm not sure if I'll wind up being able to follow the code--or if I'll be caught in an inescapably complex web of stuff I've never seen before.
Yes you will. You should pick up a book on hardware and OS's first.
I mean, in C, if you needed some heap memory, you would call malloc. But, does an OS even have a heap? As far as I know, malloc asks the operating system for memory and then adds it to a linked list, or binary tree, or something. What about a call stack?
A lot of what you say in your question is actually done by the runtime library in userspace.
All the OS needs to do is load the program into memory and jump to its entry point; most details after that can be handled by the user-space program. Heap and stack are just areas of the process's virtual memory; the stack pointer is just a register in the CPU.
Allocating physical memory is something that is done at the OS level. The OS usually allocates fixed-size pages, which are then mapped into a user-space process.
You should read Linux Device Drivers, 3rd edition. It explains the internals of the Linux kernel pretty well.
I wouldn't start by reading the Linux kernel; it's too complicated for starters.
OSDev is an excellent place to start reading.
I have done a little OS with information from OSDev for a school subject. It runs on VMware, Bochs, and QEMU, so it's easy to test. Here is the source code.
Traditionally, C is mostly needed for the kernel and device drivers due to the interaction with hardware. However, languages such as C++ and Java could be used for the entire operating system.
For more information, I've found Operating Systems Design and Implementation by Andrew Tanenbaum particularly useful, with LOTS of code samples.
malloc and the memory management functions aren't keywords in C; they are functions of the standard library. They are in fact specified by the ISO C standard itself (not just POSIX), and you use malloc in C applications on most platforms.
If you want to know how the Linux kernel works, I recommend this book: http://oreilly.com/catalog/9780596005658/ . I think it's a good explanation with some C code inserted :).
