C's strange pointer arithmetics [closed] - c

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 months ago.
I was working on pointers and came across a code snippet that I couldn't understand.
The strange thing is that after the function k() is executed, the statement y = 2 doesn't seem to work: the output is y = 1 instead of y = 2.
Any idea about this?
#include<stdio.h>
void k(void){
int x;
*(&x+5) += 7;
}
void main(void){
int y = 1;
y = 1;
k();
y = 2;
printf("y = %d", y);
}
CPU:AMD Ryzen 7 5800H with Radeon Graphics (16) # 3.200GHz
OS: Arch Linux x86_64
Compiler: GCC (Version: 12.1.1)
compile command: gcc a.c -o a

As pointed out by others, this statement suffers from undefined behavior:
*(&x+5) += 7;
Memory address &x+5 is outside the bounds of variable x. Writing to that address is a very bad idea.
OP's code sample exploits certain C compiler implementation details.
Probably educational; it can be used to demonstrate how hackers can exploit missing bounds checks to change the designed behavior of a program.
It is interesting enough to investigate the actual behavior of this program.
I will assume the program has been compiled by GCC with default settings on a 64-bit Linux distro on the x86-64 architecture (which is little-endian).
Pointers are 64 bits (8 bytes), int is 32 bits (4 bytes).
Variable x is located on the stack. During execution of k(), the stack looks like this:
+--------+----------------------------------------------------+ TOP OF STACK
| rsp+0  | unused; 16-byte alignment filler                   | <--- rsp
+--------+----------------------------------------------------+
| rsp+4  | x                                                  | <--- &x
+--------+----------------------------------------------------+
| rsp+8  | reserved; least significant 32 bits of stack guard |
+--------+----------------------------------------------------+
| rsp+12 | reserved; most significant 32 bits of stack guard  |
+--------+----------------------------------------------------+
| rsp+16 | least significant 32 bits of saved base pointer    | <--- rbp
+--------+----------------------------------------------------+
| rsp+20 | most significant 32 bits of saved base pointer     |
+--------+----------------------------------------------------+
| rsp+24 | least significant 32 bits of return address of k() | <--- &x+5
+--------+----------------------------------------------------+
| rsp+28 | most significant 32 bits of return address of k()  |
+--------+----------------------------------------------------+
| rsp+32 | start of stack frame of main()                     |
+--------+----------------------------------------------------+
&x+5 is a memory address that is 20 bytes away from &x (because sizeof(int) is 4).
That happens to be the location of the return address of k().
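Pointer arithmetic is scaled by the size of the pointed-to type, which is why &x+5 lands 20 bytes past &x. A minimal sketch of that scaling follows; the helper name is hypothetical, and it deliberately uses an int array so that the pointer arithmetic stays inside one object and remains well-defined (unlike &x+5 on a lone int).

```c
#include <stddef.h>

/* Hypothetical helper: byte distance between element 0 and element n of
   an int array. Adding n to an int pointer advances it by n * sizeof(int)
   bytes, e.g. 5 * 4 = 20 bytes when int is 4 bytes wide.
   n must be in 0..8 so the pointer stays within (or one past) the array. */
ptrdiff_t int_step_in_bytes(int n)
{
    int a[8] = {0};
    return (char *)(a + n) - (char *)a;
}
```

With a 4-byte int, int_step_in_bytes(5) is 20, the same distance the answer computes between &x and the saved return address.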
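*(&x+5) += 7 adds 7 to the lower 32 bits of the return address (on little-endian x86-64 the lower half is stored first). Since no carry crosses into the upper half here, this increases the return address by 7.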
That will have its effect when returning from k() to main().
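Because x is an int, the += 7 only touches the first 4 bytes of the 8-byte return address. A minimal sketch of that effect on an ordinary uint64_t value (the function name and sample values are made up for illustration, and a little-endian host is assumed):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: add delta to the first (lowest-addressed) 32-bit
   half of a 64-bit value, the way *(&x+5) += 7 patches the saved return
   address. On a little-endian host that half holds the low-order bits.
   The addition wraps within 32 bits; no carry reaches the upper half. */
uint64_t add_to_first_half(uint64_t value, uint32_t delta)
{
    uint32_t halves[2];
    memcpy(halves, &value, sizeof value);  /* view the 8 bytes as two words */
    halves[0] += delta;
    memcpy(&value, halves, sizeof value);
    return value;
}
```

For an address ending in ...60 this behaves like a plain + 7 (...60 becomes ...67). Only when the low half would overflow does it differ: 0x00000001FFFFFFFF plus 1 becomes 0x0000000100000000 instead of 0x0000000200000000.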
Here is objdump output of main():
void main(void)
{
45: 55 push rbp
46: 48 89 e5 mov rbp,rsp
49: 48 83 ec 10 sub rsp,0x10
int y = 1;
4d: c7 45 fc 01 00 00 00 mov DWORD PTR [rbp-0x4],0x1
y = 13;
54: c7 45 fc 0d 00 00 00 mov DWORD PTR [rbp-0x4],0xd
k();
5b: e8 00 00 00 00 call 60 <main+0x1b>
y = 2;
60: c7 45 fc 02 00 00 00 mov DWORD PTR [rbp-0x4],0x2
printf("y = %d\n", y);
67: 8b 45 fc mov eax,DWORD PTR [rbp-0x4]
6a: 89 c6 mov esi,eax
6c: 48 8d 3d 00 00 00 00 lea rdi,[rip+0x0] # 73 <main+0x2e>
73: b8 00 00 00 00 mov eax,0x0
78: e8 00 00 00 00 call 7d <main+0x38>
}
7d: 90 nop
7e: c9 leave
7f: c3 ret
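The program is supposed to return from k() at offset 60 (the start of y = 2), but instead it returns at offset 67 (the start of printf("y = %d\n", y)): 0x60 + 7 = 0x67, and the skipped mov instruction (c7 45 fc 02 00 00 00) is exactly 7 bytes long.
In other words, after returning from k(), the program skips y = 2 and continues from printf("y = %d\n", y), printing the value y had before the call: y = 1 in the question's code (the listing above was built from a variant that sets y = 13, which would print y = 13).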

There is undefined behavior (UB) on the *(&x+5) += 7; line. When there is UB, the program can do anything, from formatting your hard drive to printing y=1.
To learn what is actually happening under the hood, check the assembly output (for example with gcc -S or objdump -d).

The function k() invokes undefined behaviour.
You first declare a local variable x, and then add 7 to the int at address &x+5, which lies well past the end of x.

Related

Local variables in Stack and sequence of their usage in C program

In a C program, local variables are stored on the stack. Let us say the following three local variables are defined:
int x, y, z;
It means first 'x' will be pushed on stack, then 'y' and then 'z'.
Now if I need to use 'x' before 'y' or 'z', it would mean that I would have to pop 'y' and 'z' before I can get access to 'x' on the stack.
Is my understanding correct or is there anything missing in it?
You have a basic misunderstanding of how a typical system stack works.
Most important: It seems that you think that only the last element on the stack can be accessed but that's not how it works. All objects (aka variables) on the stack can be accessed at any time.
Further, there is typically not a push/pop of the individual objects. Typically it's a stack pointer which is changed and that single operation makes room for all the objects needed in the stack.
Finally, the order in which you declare variables doesn't dictate the order in which they are placed on the stack. It's even possible that one or more of your variables will be kept in registers and therefore won't be on the stack.
Also notice that nothing of this comes from the C standard. It all depends on the system being used. The C standard doesn't even mention/require that the implementation uses a stack.
The order in which local variables are placed on the stack is an implementation detail of the compiler. They could appear in the order declared, in the reverse order, or mixed. The existence of these variables could even be optimized out.
The C standard does not dictate the ordering of local variables. In fact, it doesn't even dictate that a stack must be used.
With regard to using the variables, the compiler keeps track of their addresses and generates code to read/write from the address in question. Typically, nothing actually gets popped until either the function they are in returns or the enclosing scope they are in ends, but again that's an implementation detail.
Your understanding is not correct. You do not have to do any pushing or popping to access different variables. As far as your source code is concerned, there's no awareness of a stack or any stack operations. You just reference the variables as you need to:
z = x * y;
First, C does not mandate the use of a stack to store local (auto) variables; that's an implementation detail. Almost every C implementation does use a stack for this purpose, but it's not required.
Secondly, for implementations that use a stack, individual items are not pushed or popped as they are accessed. Instead, when a function is called a whole region of the stack, usually called a stack frame, is set aside and the function's arguments and local variables are referred to via an offset from an address within that frame.
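To make the stack-frame idea concrete, here is a toy model, not how any particular compiler works: a byte array stands in for a downward-growing stack, the whole frame is reserved by a single stack-pointer adjustment, and each local lives at a fixed offset below a frame pointer, so the locals can be read and written in any order without popping anything. All names and offsets are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { STACK_SIZE = 256 };

/* Toy downward-growing stack: sp starts at the high end of mem. */
struct toy_stack {
    unsigned char mem[STACK_SIZE];
    size_t sp;   /* index of the current top of stack */
    size_t fp;   /* frame pointer: where the current frame starts */
};

/* Hypothetical offsets of three int locals below the frame pointer. */
enum { OFF_X = 4, OFF_Y = 8, OFF_Z = 12, FRAME_SIZE = 16 };

void enter_frame(struct toy_stack *s)
{
    s->fp = s->sp;          /* remember where this frame starts */
    s->sp -= FRAME_SIZE;    /* one adjustment reserves room for all locals */
}

void store_local(struct toy_stack *s, size_t off, int32_t v)
{
    memcpy(&s->mem[s->fp - off], &v, sizeof v);
}

int32_t load_local(const struct toy_stack *s, size_t off)
{
    int32_t v;
    memcpy(&v, &s->mem[s->fp - off], sizeof v);
    return v;
}

void leave_frame(struct toy_stack *s)
{
    s->sp = s->fp;          /* the whole frame is released at once */
}
```

Usage: start with struct toy_stack s = { .sp = STACK_SIZE };, call enter_frame(&s), store x, y and z, and load x first; nothing has to be popped to reach it.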
For example, assume the function
void foo( int x, int y )
{
int a, b;
// some code
}
On an x86-like system, when this function is called a region of space on the stack will be allocated to store the function arguments, the local variables, the return address (i.e., the address of the next instruction to execute after the function returns), and the address of the previous stack frame (which stores the state of whatever function called foo). It would look something similar to this:
            +-----------------+
0x80000000: |        y        |  Argument 2
            +-----------------+
0x7FFFFFFC: |        x        |  Argument 1
            +-----------------+
0x7FFFFFF8: |   0x01234567    |  Address of next instruction
            +-----------------+
0x7FFFFFF4: |   0x80000010    |  Address of previous stack frame  <--+
            +-----------------+                                      |
0x7FFFFFF0: |        a        |  Local variable                      |
            +-----------------+                                      |
0x7FFFFFEC: |        b        |  Local variable  <--+                |
            +-----------------+                     |                |
                                                    |                |
            +-----------------+                     |                |
       esp: |   0x7FFFFFEC    |  Stack pointer -----+                |
            +-----------------+                                      |
                                                                     |
            +-----------------+                                      |
       ebp: |   0x7FFFFFF4    |  Frame pointer ----------------------+
            +-----------------+
(This assumes 32-bit integers and addresses, and the addresses shown are only for illustration)
There are two registers that are used to keep track of the runtime stack. The stack pointer (esp or rsp on x86 and x86-64 platforms) stores the address of the last item pushed on the stack. On x86 the stack grows "downward" from high to low address, so the "top" of the stack has a lower address than the "bottom" of the stack. The frame pointer (ebp or rbp on x86 and x86-64) stores the address of the current stack frame. Function arguments and local variables are referenced via offsets from the frame pointer. So if you write something like
a = 10;
the compiled machine code does something like
movl $0xa,-0x4(%ebp)    # write 10 to the location 4 bytes "below" the address
                        # stored in ebp, i.e. 0x7FFFFFF0
Function argument x would be 8 bytes "above" the frame pointer, at 0x8(%ebp) (0x7FFFFFFC).
There's no requirement that local variables be allocated in any specific order - the compiler is free to arrange the space for them however it sees fit. Similarly, there's no requirement that function arguments be pushed in any specific order (or that they be pushed on the stack at all), but almost all systems that use a stack for function arguments expect them to be pushed in reverse order.
Did you notice the details in the tag you used?
For questions about the call stack, use [callstack] or
[stack-pointer] instead.
Only the return addresses are strictly last-in-first-out; this is built into the CALL and RET instructions.
This return-address stack can be extended to hold stack frames. The most direct way is to temporarily subtract the needed number of bytes from the stack pointer: here 32 bytes (0x20) for three long ints (x=5, y=7, z=11).
0000000000001143 <main>:
1143: 48 83 ec 20 sub rsp,0x20
1147: 48 c7 44 24 08 05 00 mov QWORD PTR [rsp+0x8],0x5
1150: 48 c7 44 24 10 07 00 mov QWORD PTR [rsp+0x10],0x7
1159: 48 c7 44 24 18 0b 00 mov QWORD PTR [rsp+0x18],0xb
1162: b8 00 00 00 00 mov eax,0x0
1167: e8 ad ff ff ff call 1119 <subsub>
116c: b8 00 00 00 00 mov eax,0x0
1171: 48 83 c4 20 add rsp,0x20
1175: c3 ret
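The called function subsub makes no calls, so it can simply use the unused area just below rsp (the 128-byte "red zone" that the x86-64 System V ABI reserves for this) without adjusting rsp: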
0000000000001119 <subsub>:
1119: c7 44 24 f4 2c 00 00 mov DWORD PTR [rsp-0xc],0x2c
1121: c7 44 24 f8 bc 01 00 mov DWORD PTR [rsp-0x8],0x1bc
1129: c7 44 24 fc 5c 11 00 mov DWORD PTR [rsp-0x4],0x115c
1132: c3 ret
This is with -fomit-frame-pointer; with that option, rsp is used directly.
With frame pointer, there is one extra push and pop per call to save and restore rbp base pointer.
0000000000001119 <subsub>:
1119: 55 push rbp
111a: 48 89 e5 mov rbp,rsp
111d: c7 45 f4 2c 00 00 00 mov DWORD PTR [rbp-0xc],0x2c
1124: c7 45 f8 bc 01 00 00 mov DWORD PTR [rbp-0x8],0x1bc
112b: c7 45 fc 5c 11 00 00 mov DWORD PTR [rbp-0x4],0x115c
1133: 5d pop rbp
1134: c3 ret

How to recognize functions start address and end address in binary of PowerPC?

Given a PowerPC binary file (ELF), we can disassemble it, but how do we recognize the functions the way IDA Pro does? Is there an algorithm?
Can't you use objdump? If the ELF file still contains its symbols, you can see the functions and their code, for example:
#include <altivec.h>
#include <stdio.h>
void print_vec_char(char *s, vector signed char _data){
int i;
printf("%s\t:", s);
for (i = 0 ; i <= 15; i++)
printf("%3d ", vec_extract(_data, i));
printf("\n");
}
void print_vec_long(char *s, vector signed long int _data){
int i;
printf("%s\t:", s);
for (i = 0 ; i <= 1; i++)
printf("%ld ", vec_extract(_data, i));
printf("\n");
}
int main(){
vector signed long int output;
signed char x, y;
vector signed char _data;
const vector signed char bits = {120, 112, 104, 96, 88, 80, 72, 64, 56, 48, 40, 32, 24, 16, 8, 0};
// Initialize the vector _data with the same values
_data = vec_splat_s8(-1);
print_vec_char("_data", _data);
print_vec_char("bits", bits);
output = vec_vbpermq(_data, bits);
print_vec_long("output", output);
y = vec_extract(output, 0);
x = vec_extract(output, 1);
printf("First Half = %x\n", y);
printf("Second Half = %x\n", x);
}
and the functions in the ELF file can be shown with:
# objdump -D test | grep print_vec_long -A 10
0000000000000924 <print_vec_long>:
924: 02 00 4c 3c addis r2,r12,2
928: dc 75 42 38 addi r2,r2,30172
92c: a6 02 08 7c mflr r0
930: 10 00 01 f8 std r0,16(r1)
934: f8 ff e1 fb std r31,-8(r1)
938: 61 ff 21 f8 stdu r1,-160(r1)
93c: 78 0b 3f 7c mr r31,r1
940: 78 00 7f f8 std r3,120(r31)
944: 56 12 02 f0 xxswapd vs0,vs34
948: d0 ff 20 39 li r9,-48
Is it what you are looking for?
Look for padding to alignment boundaries as candidates for the gaps between functions.
mflr (move from link register) is often found near the start of a function, if it's used at all (i.e. in non-leaf functions).
Compiler-generated code often ends functions by reloading many call-preserved registers from stack memory, and usually does most of the saving early in the function.
Of course, the actual return points in a function might not be the last basic blocks; it's often useful to put the code for a rare condition in a block at the very end, past the normal return, which is branched to and then jumps back after doing its work.
A real example of compiler-generated code may be illustrative. Compiled by gcc4.8.5 on the Godbolt compiler explorer for PowerPC (32-bit) with -O3 -mregnames
int ext();
int foo() { ext(); return 1; }
foo:
mflr %r0 # save link-register value
stwu %r1,-16(%r1)
stw %r0,20(%r1) # ... to memory
bl ext
lwz %r0,20(%r1) # then restore it after a function call
li %r3,1 # return 1
addi %r1,%r1,16 # stack pointer adjustment
mtlr %r0
blr # ret
Notice that blr (branch to link-register) is used as a return instruction, like x86 ret, instead of simply doing a normal register-indirect jump to the return address in %r0. (Presumably PowerPC handles blr specially, maybe with a return address predictor stack.) If the code you're analyzing does this, it makes finding the ends of functions much easier. (But remember that tail-duplication optimizations can give a function multiple return paths, and blr won't always be the last instruction.)
The bl target addresses will give you (most of) the function entry points. Have your disassembler put labels on all the bl targets: branch-and-link is basically a call instruction, so the target address is always a function.
Functions that end with a tail-call to another function won't use blr at the end.
If a function is only ever tail-called, there won't be any bl instructions that target it. Or if it's only used with function pointers.
Regular branch/jump instructions that jump more than a few kiB are almost certainly jumping outside the current function to tail-call another function. So you should look at jumps like that as likely candidates for function entry/exit points.
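The bl-target and long-branch heuristics above can be sketched as a small scanner. This is a minimal sketch under several assumptions: instructions are already read as 32-bit words in host byte order (the ppc64le listing above shows mflr r0, 0x7C0802A6, stored as a6 02 08 7c), only relative I-form branches (primary opcode 18, AA = 0) are decoded, indirect calls through ctr are ignored, and the far_limit threshold as well as all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed encodings worth spotting while scanning (32-bit words). */
#define PPC_MFLR_R0 0x7C0802A6u  /* mflr r0: typical non-leaf prologue */
#define PPC_BLR     0x4E800020u  /* blr: typical return */

/* If insn is a relative I-form branch (opcode 18, AA = 0), store its
   target and its LK bit (1 = bl, a call; 0 = plain b) and return 1. */
int ppc_rel_branch(uint32_t insn, uint64_t pc, uint64_t *target, int *link)
{
    if ((insn >> 26) != 18 || (insn & 2u) != 0)
        return 0;
    uint32_t li = insn & 0x03FFFFFCu;          /* LI field, already << 2 */
    int64_t disp = (li & 0x02000000u) ? (int64_t)li - 0x04000000
                                      : (int64_t)li;   /* sign-extend 26 bits */
    *target = pc + (uint64_t)disp;
    *link = (int)(insn & 1u);
    return 1;
}

/* Collect candidate function entry points: every bl target, plus targets
   of plain b instructions that jump farther than far_limit bytes (likely
   tail calls). code[i] is the instruction at base + 4*i. */
size_t ppc_entry_candidates(const uint32_t *code, size_t n, uint64_t base,
                            uint64_t far_limit, uint64_t *out, size_t max_out)
{
    size_t found = 0;
    for (size_t i = 0; i < n && found < max_out; i++) {
        uint64_t pc = base + 4 * (uint64_t)i, target;
        int link;
        if (!ppc_rel_branch(code[i], pc, &target, &link))
            continue;
        uint64_t dist = target > pc ? target - pc : pc - target;
        if (link || dist > far_limit)
            out[found++] = target;
    }
    return found;
}
```

Occurrences of PPC_MFLR_R0 and PPC_BLR can then be used to confirm or refine the candidate start and end points; padding between functions is another hint.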

What do the numeric values in disassembled C code mean?

I'm trying to understand assembly and how it relates to C code.
I have the following C program, compiled to generate an object file only:
#include <stdio.h>
int main()
{
int i = 10;
int j = 22 + i;
return 0;
}
I executed the following command:
objdump -S myprogram.o
The output of the above command is:
objdump -S testelf.o
testelf.o: file format elf32-i386
Disassembly of section .text:
00000000 <main>:
#include <stdio.h>
int main()
{
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 10 sub $0x10,%esp
int i = 10;
6: c7 45 f8 0a 00 00 00 movl $0xa,-0x8(%ebp)
int j = 22 + i;
d: 8b 45 f8 mov -0x8(%ebp),%eax
10: 83 c0 16 add $0x16,%eax
13: 89 45 fc mov %eax,-0x4(%ebp)
return 0;
16: b8 00 00 00 00 mov $0x0,%eax
}
1b: c9 leave
1c: c3 ret
What do the numbers before the instruction mnemonics mean,
i.e. "83 ec 10" before "sub" or
"c7 45 f8 0a 00 00 00" before "movl"?
I'm using the following platform to compile this code:
$ lscpu
Architecture: i686
CPU op-mode(s): 32-bit
Byte Order: Little Endian
CPU(s): 1
On-line CPU(s) list: 0
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 1
Vendor ID: GenuineIntel
Those are the x86 machine-code bytes of each instruction, printed in hexadecimal: the opcode followed by any ModR/M byte, displacement and immediate operand. A detailed reference, other than the ones listed in the comments above, is available here.
For example, the bytes c7 45 f8 0a 00 00 00 before movl $0xa,-0x8(%ebp) encode that single instruction. They tell the CPU to move the immediate value 10 (as a 4-byte value) to the stack location 8 bytes away from the frame base pointer (ebp - 8). That is where the variable i from your C source code is located while your code is running. The top of the stack is at a lower memory address than the bottom of the stack, so moving in the negative direction from the base pointer moves up the stack.
c7 is the opcode for MOV r/m32, imm32; MOV does not change any flags in EFLAGS. 45 is the ModR/M byte: mod = 01 (an 8-bit displacement follows), reg = 000 (the /0 extension required by opcode C7) and r/m = 101 (ebp), so the destination is [ebp + disp8]. f8 is that displacement, -8 as a signed byte. See the reference for more detail.
The remaining bytes are the immediate value. Since you are using a little-endian system, the least significant byte of a number comes first: 10 decimal is 0x0a, whose 4-byte value 0x0000000a is stored as 0a 00 00 00. Likewise, 83 ec 10 is opcode 83 (an arithmetic operation with an 8-bit immediate), ModR/M ec (mod = 11, reg = 101 selecting SUB, r/m = 100 for esp) and the immediate 0x10.
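The field-by-field reading above can be checked mechanically. This is a minimal sketch, not a general x86 decoder: it only splits a ModR/M byte and recognizes the one form used here (C7 /0 with mod = 01 and r/m = 101, i.e. mov imm32 to [ebp + disp8] in 32-bit code). The function names are hypothetical.

```c
#include <stdint.h>

/* Split a ModR/M byte into mod (bits 7-6), reg (bits 5-3), r/m (bits 2-0). */
void modrm_fields(uint8_t modrm, int *mod, int *reg, int *rm)
{
    *mod = modrm >> 6;
    *reg = (modrm >> 3) & 7;
    *rm  = modrm & 7;
}

/* Decode "C7 /0, mod = 01, r/m = 101": movl $imm32, disp8(%ebp).
   Returns 1 and fills disp and imm on success, 0 for any other form. */
int decode_mov_ebp_imm32(const uint8_t *b, int *disp, uint32_t *imm)
{
    int mod, reg, rm;
    if (b[0] != 0xC7)
        return 0;
    modrm_fields(b[1], &mod, &reg, &rm);
    if (mod != 1 || reg != 0 || rm != 5)
        return 0;
    *disp = b[2] < 0x80 ? b[2] : b[2] - 256;       /* signed 8-bit offset */
    *imm = (uint32_t)b[3] | (uint32_t)b[4] << 8 |
           (uint32_t)b[5] << 16 | (uint32_t)b[6] << 24;  /* little-endian */
    return 1;
}
```

Fed the bytes c7 45 f8 0a 00 00 00, it reports a displacement of -8 and an immediate of 10; modrm_fields(0xec, ...) gives mod = 3, reg = 5 (SUB) and r/m = 4 (esp), matching the reading of 83 ec 10.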

Smashing the stack not working

I have gone through the walkthroughs about smashing the stack: the one at http://insecure.org/stf/smashstack.html and one I found here, "Trying to smash the stack". I understand what is supposed to happen, but I can't get it to work properly.
This is just like the other scenarios. I need to skip x=1 and print 0 as the value of x.
I compile with:
gcc file.c
The original code :
void function(){
char buffer[8];
}
void main(){
int x;
x = 0;
function();
x = 1;
printf("%d\n", x);
}
When I run
objdump -dS a.out
I get
0000000000400530 <function>:
400530: 55 push %rbp
400531: 48 89 e5 mov %rsp,%rbp
400534: 5d pop %rbp
400535: c3 retq
0000000000400536 <main>:
400536: 55 push %rbp
400537: 48 89 e5 mov %rsp,%rbp
40053a: 48 83 ec 20 sub $0x20,%rsp
40053e: 89 7d ec mov %edi,-0x14(%rbp)
400541: 48 89 75 e0 mov %rsi,-0x20(%rbp)
400545: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
40054c: b8 00 00 00 00 mov $0x0,%eax
400551: e8 da ff ff ff callq 400530 <function>
400556: c7 45 fc 01 00 00 00 movl $0x1,-0x4(%rbp)
40055d: 8b 45 fc mov -0x4(%rbp),%eax
400560: 89 c6 mov %eax,%esi
400562: bf 10 06 40 00 mov $0x400610,%edi
400567: b8 00 00 00 00 mov $0x0,%eax
40056c: e8 9f fe ff ff callq 400410 <printf@plt>
400571: c9 leaveq
400572: c3 retq
400573: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
40057a: 00 00 00
40057d: 0f 1f 00 nopl (%rax)
In the function I need to figure out how many bytes the return address is beyond the start of the buffer. I am not sure about this value. But since there are 6 bytes from the beginning of the function to the return, would I add 7 bytes to the buffer?
Then I need to skip the instruction
x=1;
Since that instruction is 7 bytes long, would I add 7 to the return address?
Something like this?
void function(){
char buffer[8];
int *ret = buffer + 7;
(*ret) += 7;
}
void main(){
int x;
x = 0;
function();
x = 1;
printf("%d\n", x);
}
This throws the warning:
warning: initialization from incompatible pointer type [enabled by default]
int *ret = buffer1 + 5;
^
And the output is 1. What am I doing wrong? And can you explain how to do it right and why it is the correct way?
Thank you.
Try the function below. I wrote it for a 32-bit compiler, so compile with the -m32 gcc flag, or with a little effort you can make it work with your 64-bit compiler. (Note that in your objdump listing the instruction after the call to function, x = 1, is 7 bytes long, so use 7 instead of 8.)
void function(void)
{
unsigned long *x;
/* &x will more likely be at -4(ebp) */
/* Adding 1 (+4) gets us to stored ebp */
/* Adding 2 (+8) gets us to stored return address */
x = (unsigned long *)(&x + 2);
/* This is the tricky part */
/* TODO: On my 32-bit compiler gap between call to function
and the next instruction is 8 */
*x += 8;
}
We know that automatic variables are created on the stack, so taking the address of an automatic variable yields a pointer into the stack. When you call a function, its return address is pushed onto the stack, and the size of that address depends on your platform (normally 4 or 8 bytes). So if you pass the address of an automatic variable to a function and then write over the memory before that address, you can damage the return address and smash the stack. Here is an example:
#include <stdlib.h>
#include <stdio.h>
static void f(int *p)
{
p[0] = 0x30303030;
p[1] = 0x31313131;
*(p - 1) = 0x35353535;
*(p - 2) = 0x36363636;
}
int main()
{
int a = 0x41424344;
int b = 0x45464748;
int c = 0x494a4b5c;
f(&b);
printf("%08x %08x %08x\n", a, b, c);
return 0;
}
I compiled this on linux with 'gcc -g' and ran under gdb and got this:
Program received signal SIGSEGV, Segmentation fault.
0x000000000040056a in f (p=0x7fffffffde74) at smash.c:10
10 }
(gdb) bt
#0 0x000000000040056a in f (p=0x7fffffffde74) at smash.c:10
#1 0x3636363600400594 in ?? ()
#2 0x3030303035353535 in ?? ()
#3 0x494a4b5c31313131 in ?? ()
#4 0x0000000000000000 in ?? ()
(gdb)
As you can see, the parent function addresses now contain some of my magic numbers. I ran this on 64-bit Linux, so really I should have used 64-bit ints to fully overwrite the return address; as it is, I left the lower word untouched.

divdi3 division used for long long by gcc on x86

When gcc sees a multiplication or division of integer types that isn't supported in hardware, it generates a call to a special library function.
http://gcc.gnu.org/onlinedocs/gccint/Integer-library-routines.html#Integer-library-routines
According to the link above, long __divdi3 (long a, long b) is used for division of long. However, here http://gcc.gnu.org/onlinedocs/gcc-3.3/gccint/Library-Calls.html divdi is explained as "call for division of one signed double-word". While the first source clearly maps the di suffix to long arguments, the second says divdi is for double-words and udivdi for full-words (single, right?).
When I compile simple example
int main(int argc, char *argv[]) {
long long t1, t2, tr;
t1 = 1;
t2 = 1;
tr = t1 / t2;
return tr;
}
with gcc -Wall -O0 -m32 -march=i386 (gcc ver. 4.7.2)
the disassembly shows:
080483cc <main>:
80483cc: 55 push %ebp
80483cd: 89 e5 mov %esp,%ebp
80483cf: 83 e4 f0 and $0xfffffff0,%esp
80483d2: 83 ec 30 sub $0x30,%esp
80483d5: c7 44 24 28 01 00 00 movl $0x1,0x28(%esp)
80483dc: 00
80483dd: c7 44 24 2c 00 00 00 movl $0x0,0x2c(%esp)
80483e4: 00
80483e5: c7 44 24 20 01 00 00 movl $0x1,0x20(%esp)
80483ec: 00
80483ed: c7 44 24 24 00 00 00 movl $0x0,0x24(%esp)
80483f4: 00
80483f5: 8b 44 24 20 mov 0x20(%esp),%eax
80483f9: 8b 54 24 24 mov 0x24(%esp),%edx
80483fd: 89 44 24 08 mov %eax,0x8(%esp)
8048401: 89 54 24 0c mov %edx,0xc(%esp)
8048405: 8b 44 24 28 mov 0x28(%esp),%eax
8048409: 8b 54 24 2c mov 0x2c(%esp),%edx
804840d: 89 04 24 mov %eax,(%esp)
8048410: 89 54 24 04 mov %edx,0x4(%esp)
8048414: e8 17 00 00 00 call 8048430 <__divdi3>
8048419: 89 44 24 18 mov %eax,0x18(%esp)
804841d: 89 54 24 1c mov %edx,0x1c(%esp)
8048421: 8b 44 24 18 mov 0x18(%esp),%eax
8048425: c9 leave
8048426: c3 ret
Note 8048414: call 8048430 <__divdi3>.
I can't use libgcc in my project, and it's multiplatform. I hoped not to have to write all the __* functions for all platforms (speed doesn't matter), but now I'm a bit confused.
Can somebody explain, why is there __divdi3 (not __divti3) call generated for long long int (64-bit) division?
On x86 machines, the term "word" usually means a 16-bit value. More generally in computer science, a word can be of almost any length; words of 10 or 12 bits are not uncommon in embedded systems.
The names you have hit upon do not follow the x86 meaning of "word". They come from GCC's internal machine modes, whose sizes are defined in bytes and do not depend on the target platform of your build. A similar convention appears in gdb, which uses w for a 32-bit word and h for a 16-bit "half-word".
The convention extends to floating-point modes as well and is summarised in the bullet points below:
s - single: four-byte integers (int, SImode) / floats (float, SFmode)
d - double: eight-byte integers (long long, or long on 64-bit Unix targets; DImode) / floats (double, DFmode)
t - tetra: sixteen-byte integers (TImode, e.g. __int128) / 128-bit floats (TFmode)
So a 64-bit long long is a DImode value, which is why a 32-bit target calls __divdi3 for it; __divti3 is only needed for 128-bit integers.
This naming convention is used for all arithmetic built-ins, like __divsi3, __divdi3, __divti3 or __mulsi3, __muldi3, __multi3... (and all u - unsigned - variants). A complete list can be found here.
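As a compact summary of the suffix letters, here is a minimal lookup sketch. The function name is hypothetical, and it assumes 8-bit bytes (true on x86); it simply encodes GCC's integer machine modes QI, HI, SI, DI and TI.

```c
#include <stddef.h>

/* Operand size in bytes implied by the mode letter in libgcc helper
   names such as __divsi3, __divdi3 and __divti3:
   q = quarter, h = half, s = single, d = double, t = tetra integer. */
size_t gcc_int_mode_bytes(char letter)
{
    switch (letter) {
    case 'q': return 1;
    case 'h': return 2;
    case 's': return 4;
    case 'd': return 8;
    case 't': return 16;
    default:  return 0;   /* not an integer mode letter */
    }
}
```

On x86, both with -m32 and on 64-bit, sizeof(long long) is 8, which is gcc_int_mode_bytes('d'): hence the di in __divdi3.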
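Division of 64-bit numbers on 32-bit machines uses an advanced (and somewhat difficult) algorithm. However, you can still use the long-division principle you learned in school. Here is a simple C version of it for unsigned 64-bit values (have a look at this answer about big integers):
#include <stdint.h>

/* Unsigned 64-bit shift-and-subtract division.
   The divisor must be non-zero (otherwise the first loop never ends). */
uint64_t udiv64(uint64_t numerator, uint64_t divisor, uint64_t *rem)
{
    uint64_t result = 0;
    uint64_t remainder = numerator;
    int count = 0;
    /* Shift the divisor left until its highest set bit is bit 63. */
    while (!(divisor & 0x8000000000000000ULL)) {
        divisor <<= 1;
        count++;
    }
    /* Subtract shifted divisors from high to low, like long division. */
    while (remainder != 0) {
        if (remainder >= divisor) {
            remainder -= divisor;
            result |= (uint64_t)1 << count;
        }
        if (count == 0)
            break;
        divisor >>= 1;
        count--;
    }
    if (rem)
        *rem = remainder;
    return result;
}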
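Since the goal is to avoid libgcc, the signed case can be layered on the same long-division idea. This is a minimal sketch, not libgcc's implementation: the name my_divdi3 is hypothetical, it uses bit-by-bit restoring division (bring down one dividend bit per step), it truncates toward zero like C's / operator, and division by zero and INT64_MIN / -1 are left undefined, as in C. On some 32-bit targets the compiler may still turn 64-bit shifts into helper calls, so check the generated code.

```c
#include <stdint.h>

/* Hypothetical replacement for signed 64-bit division.
   b must be non-zero and the quotient representable (not INT64_MIN / -1). */
int64_t my_divdi3(int64_t a, int64_t b)
{
    int negative = (a < 0) != (b < 0);
    /* Magnitudes as unsigned; 0 - x also handles INT64_MIN safely. */
    uint64_t ua = a < 0 ? 0 - (uint64_t)a : (uint64_t)a;
    uint64_t ub = b < 0 ? 0 - (uint64_t)b : (uint64_t)b;
    uint64_t q = 0, r = 0;

    /* Bring down one dividend bit at a time, most significant first. */
    for (int i = 63; i >= 0; i--) {
        r = (r << 1) | ((ua >> i) & 1u);
        if (r >= ub) {
            r -= ub;
            q |= (uint64_t)1 << i;
        }
    }
    if (!negative)
        return (int64_t)q;
    if (q == 0)
        return 0;
    return -(int64_t)(q - 1) - 1;   /* -q without overflow when q == 2^63 */
}
```

For example, my_divdi3(-7, 2) gives -3 and my_divdi3(INT64_MIN, 1) gives INT64_MIN, matching what the / operator produces.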

Resources