How to experience cache misses and hits in a Linux system? - c

Hello, I've been trying to experience cache misses and hits on Linux.
To do so, I've written a program in C that measures the time, in CPU cycles, taken by a printf() call. The first part measures the time needed for a miss and the second one for a hit. Here is the program:
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sched.h>
#include <sys/types.h>
#include <unistd.h>
#include <signal.h>

/* Read the CPU timestamp counter, fenced with mfence on both sides. */
uint64_t rdtsc() {
    uint64_t a, d;
    asm volatile ("mfence");
    asm volatile ("rdtsc" : "=a" (a), "=d" (d));
    a = (d << 32) | a;   /* combine the high and low 32-bit halves */
    asm volatile ("mfence");
    return a;
}
int main(int argc, char** argv)
{
    /* first call: printf code not yet cached (expected miss) */
    size_t time = rdtsc();
    printf("Hey ");
    size_t delta1 = rdtsc() - time;
    printf("delta: %zu\n", delta1);

    /* second call: printf code should now be cached (expected hit) */
    size_t time2 = rdtsc();
    printf("Hey ");
    size_t delta2 = rdtsc() - time2;
    printf("delta: %zu\n", delta2);

    sleep(100);   /* keep the process alive so the mapping stays shared */
}
Now I would like to show that two processes (two terminals) share the cache. So I thought that running this program in two terminals would result in:
Terminal 1:
miss
hit
Terminal 2:
hit
hit
But now I have something like:
Terminal 1:
miss
hit
Terminal 2:
miss
hit
Is my understanding incorrect, or is my program wrong?

Your assumption is somewhat correct.
printf is part of the C library (libc). If you use dynamic linking, the operating system may optimize memory usage by loading the library into physical memory only once for all processes that use it.
However, there are multiple reasons why I don't expect you to measure any sizable difference:
- Compared to the difference between a cache hit and a cache miss, printf takes an enormous amount of time to complete, and there is a lot going on that introduces noise. With just a single measurement, it is very unlikely that you're able to measure that tiny difference.
- The actual reason for the first measurement taking longer is likely the lazy binding of the library function printf being resolved by the loader (https://maskray.me/blog/2021-09-19-all-about-procedure-linkage-table), or some other magic happening for the first output (buffers being set up, etc.).
- A lot of libc functions are used by many different processes. If the library is shared, printf may well be cached even though you did not call it.
I would suggest mounting a Flush+Reload attack (https://eprint.iacr.org/2013/448.pdf) on printf in one of the terminals while using it in the other. Then you may see a timing difference.
Note: to find the actual address of printf for the attack, you need to be familiar with dynamic linking and the PLT. Just using something like void* addr = printf will probably not work, since that typically yields the address of the executable's PLT stub rather than the code inside libc.
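For reference, here is a minimal sketch of the Flush+Reload timing primitive, assuming an x86-64 CPU (rdtsc and clflush available). It probes a stand-in array rather than the real printf; for the actual attack, target would have to point into the shared libc page that holds printf:

#include <stdint.h>
#include <stdio.h>

/* Read the timestamp counter, fenced so neighbouring loads can't reorder. */
static uint64_t timestamp(void) {
    uint64_t a, d;
    asm volatile ("mfence");
    asm volatile ("rdtsc" : "=a" (a), "=d" (d));
    asm volatile ("mfence");
    return (d << 32) | a;
}

/* Evict the cache line holding p from the whole cache hierarchy. */
static void flush(void *p) {
    asm volatile ("clflush 0(%0)" : : "r" (p) : "memory");
}

/* Time one load from p: a small delta means a hit, a large one a miss. */
static uint64_t probe(void *p) {
    uint64_t start = timestamp();
    *(volatile char *)p;   /* the measured memory access */
    return timestamp() - start;
}

static char target[64];    /* stand-in for the shared code you want to monitor */

int main(void) {
    flush(target);
    printf("cold: %lu cycles\n", (unsigned long)probe(target));  /* miss */
    printf("warm: %lu cycles\n", (unsigned long)probe(target));  /* hit */
    return 0;
}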

Related

What function clears the screen on a mac terminal?

I'm new to C and I do not own a Mac, but I'm working on a personal project for someone who does, and part of the project's requirements is that it clears the screen. The reason I need to clear the screen is that it's part of a loop that clears the screen and then prints something again (I'm trying to make a "ticking counter" of sorts).
I know that system("cls") works well on my terminal (obviously any system() call isn't ideal), but she's on macOS, the system() function is notoriously nonportable, and I need this to work on a Mac. I've scoured the internet trying to see what calls clear the screen on a Mac, and the most recent source I could find was from 2006. Considering how often the Mac gets updated, I'm not surprised that it's out of date.
I don't really need a solution that's elegant or secure, just an idea for something that works. My compiler is MinGW with GCC for libraries.
Here's a sample of the relevant code:
#include <stdio.h>
#include <time.h>
#include <math.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <conio.h>   /* Windows-only header; not available on macOS */

int main()
{
    time_t seconds; //variable declarations
    float days;
    float rate;
    int i;
    i = 3;
    char str[50];
    while (i > 2)   /* note: the original had a stray ';' here, which made the loop body dead code */
    {
        time(&seconds);
        days = (seconds - ((float)1584673594)) / (float)86400;
        rate = pow(1.05, days);
        rate = rate * 100;
        printf("\nCurrent Snuggle-Debt Balance: %f snuggles\n", rate);
        printf("Days passed: %f \n", days);
        sleep(1);    /* sleep() takes whole seconds; sleep(.5) truncates to 0 */
        system("cls");
    }
}
If you're writing a C program that uses standard input and output, and you need to do things like move the cursor around or clear part or all of the screen, the curses library is what you want. Curses is widely available and does what you want and much more. To clear the screen, just call the clear() function. And that's just the beginning of what you can do.
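For example, a minimal sketch of the question's ticking counter using curses (link with -lcurses or -lncurses):

#include <curses.h>
#include <unistd.h>

int main(void)
{
    initscr();                    /* enter curses mode */
    for (int i = 0; i < 10; i++) {
        clear();                  /* erase the whole screen */
        printw("ticks: %d\n", i); /* curses' formatted output */
        refresh();                /* push the changes to the terminal */
        sleep(1);
    }
    endwin();                     /* restore the normal terminal state */
    return 0;
}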

Interleaved usleep() functions being executed together. Is this compiler optimization?

I have a code that does something similar to the following repeatedly over a loop:
$ cat test.c
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

int main()
{
    char arr[6] = {'h','e','l','l','o','!'};
    for (int x = 0; x < 6; x++) {
        printf("%c", arr[x]);
        usleep(1000000);
        printf("%c", arr[x]);
        usleep(1000000);
    }
    printf("\n");
    return 0;
}
I see that the printf() calls execute one after the other WITHOUT any delay (despite the usleep), and then the program sleeps for the total usleep time at the end, before the next iteration. It seems like all the usleep() calls happen together at the end.
I tried the -O0 flag in gcc because I suspected it's the effect of compiler optimization, but I guess -O0 does not disable whatever optimization category this case falls under (if my guess about the compiler being the reason is correct).
I am trying to understand the reason for this behavior and how to achieve the desired behavior in my program.
Note: I know it might be possible to replace usleep() with some compute-heavy function call that takes an equivalent amount of time, but that is not the solution I am looking for.
You are using usleep() wrong. Use sleep(1) instead.
From man usleep:
EINVAL usec is greater than or equal to 1000000. (On systems where that is considered an error.)
Once you fix that, you should call fflush(stdout) after each printf() to avoid another surprise with output buffering: stdout is typically line-buffered on a terminal, so characters without a trailing newline sit in the buffer instead of appearing immediately.
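Putting both fixes together, a sketch of the corrected loop:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char arr[6] = {'h','e','l','l','o','!'};
    for (int x = 0; x < 6; x++) {
        printf("%c", arr[x]);
        fflush(stdout);  /* force the character out before sleeping */
        sleep(1);        /* whole seconds; stays clear of usleep's EINVAL limit */
        printf("%c", arr[x]);
        fflush(stdout);
        sleep(1);
    }
    printf("\n");
    return 0;
}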

Different run time after Segfault in Infinite Recursive main()

As we know, in the Linux world an infinitely recursive main() in userspace will receive a "segmentation fault" message, which is actually caused by stack overflow (as in the following code):
#include <stdio.h>

void main(void)
{
    main();
}
Experiment and Question:
Change code to:
#include <stdio.h>

int cnt = 0;

void main(void) {
    printf("cnt %d\n", cnt++);
    main();
}
Test environment:
x86-64 ubuntu,
gcc-4.6
I need your help, and thanks in advance! Why does the segmentation fault happen at a different "cnt" value on each run?
cnt: 523614
cnt: 523602
cnt: 523712
cnt: 523671
This is probably due to address space layout randomization (ASLR). If you run this slightly modified version of your program:
#include <stdio.h>

int cnt = 0;

void main(void)
{
    int a;
    printf("cnt %d %p\n", cnt++, (void*)&a);
    fflush(stdout);
    main();
}
you will see that the address of a is not consistent over various runs of the program. The initial position of the stack is probably also slightly randomized, so a slightly different number of stack frames fits in the available space on each run.
P.S.: I've added an fflush so the output of the program can be safely piped through, for example, tail and grep; otherwise buffering may blur the actual last line of output.
P.S.2: I had to change print into printf and add #include <stdio.h>.
P.S.3: You should not compile this program with optimization, because otherwise tail-call optimization will remove the recursion and your program will actually loop forever. My version of the program doesn't do that, because the address of a escapes via printf.
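If you want to confirm that ASLR is the cause, one Linux-specific way (sketched below using personality(2); treat it as an illustration rather than a drop-in) is to turn randomization off and re-exec the program. With randomization disabled, the crash should land on the same cnt on every run:

#include <stdio.h>
#include <unistd.h>
#include <sys/personality.h>

extern char **environ;

int cnt = 0;

/* build with -O0 so the recursion is not turned into a loop */
void recurse(void)
{
    int a;
    printf("cnt %d %p\n", cnt++, (void *)&a);
    fflush(stdout);
    recurse();
}

int main(int argc, char **argv)
{
    /* personality(0xffffffff) queries the current flags without changing them */
    if (!(personality(0xffffffff) & ADDR_NO_RANDOMIZE)) {
        personality(ADDR_NO_RANDOMIZE);        /* takes effect at the next exec */
        execve("/proc/self/exe", argv, environ);
    }
    recurse();
    return 0;
}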

Error with -mno-sse flag and gettimeofday() in C

A simple C program which uses gettimeofday() works fine when compiled without any flags (gcc 4.5.1), but doesn't give output when compiled with the flag -mno-sse.
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>   /* for gettimeofday() and struct timeval */

int main()
{
    struct timeval s, e;
    float time;
    int i;
    gettimeofday(&s, NULL);
    for (i = 0; i < 10000; i++);
    gettimeofday(&e, NULL);
    time = e.tv_sec - s.tv_sec + e.tv_usec - s.tv_usec;
    printf("%f\n", time);
    return 0;
}
I have CFLAGS=-march=native -mtune=native
Could someone explain why this happens?
The program returns a correct value normally, but prints "0" when compiled with -mno-sse enabled.
The flag -mno-sse causes floating point arguments to be passed on the stack, whereas the usual x86_64 ABI specifies that they should be passed via SSE registers.
Since printf() in your C library was compiled without -mno-sse, it is expecting floating point arguments to be passed in accordance with the ABI. This is why your code fails. It has nothing to do with gettimeofday().
If you wish to use printf() from your code compiled with -mno-sse and pass it floating point arguments, you will need to recompile your C library with that option and link against that version.
It appears that you are using a loop which does nothing in order to observe a time difference. The problem is that the compiler may optimize this loop away entirely. The issue may not be -mno-sse itself; it may be that the flag permits an optimization that removes the loop, giving you the same time each time you run it.
I would recommend putting something in that loop which can't be optimized out (such as incrementing a number which you print out at the end) and seeing if you still get the same behavior; a sketch of that suggestion follows. If not, I'd recommend looking at the generated assembler (gcc -S) and seeing what the code difference is.
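For instance (a small sketch of the suggestion above), a volatile accumulator keeps the loop alive at any optimization level, and computing the elapsed time in integer microseconds sidesteps floating point entirely:

#include <stdio.h>
#include <sys/time.h>

int main(void)
{
    struct timeval s, e;
    volatile int sink = 0;   /* volatile: every access must really happen */
    int i;

    gettimeofday(&s, NULL);
    for (i = 0; i < 10000; i++)
        sink += i;           /* the compiler cannot delete this loop */
    gettimeofday(&e, NULL);

    long us = (e.tv_sec - s.tv_sec) * 1000000L + (e.tv_usec - s.tv_usec);
    printf("sink = %d, elapsed = %ld us\n", sink, us);
    return 0;
}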
The fields tv_sec and tv_usec are usually longs.
Redeclaring the variable "time" as a long integer solved the issue.
The following link addresses the issue:
http://gcc.gnu.org/ml/gcc-patches/2006-10/msg00525.html
Working code:
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>   /* for gettimeofday() and struct timeval */

int main()
{
    struct timeval s, e;
    long time;
    int i;
    gettimeofday(&s, NULL);
    for (i = 0; i < 10000; i++);
    gettimeofday(&e, NULL);
    time = e.tv_sec - s.tv_sec + e.tv_usec - s.tv_usec;
    printf("%ld\n", time);
    return 0;
}
Thanks for the prompt replies. Hope this helps.
What do you mean by "doesn't give output"?
0 (zero) is a perfectly reasonable output to expect.
Edit: Try compiling to assembler (gcc -S ...) and see the differences between the normal and no-sse versions.

Executing machine code in memory

I'm trying to figure out how to execute machine code stored in memory.
I have the following code:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[])
{
    FILE* f = fopen(argv[1], "rb");
    fseek(f, 0, SEEK_END);
    unsigned int len = ftell(f);
    fseek(f, 0, SEEK_SET);
    char* bin = (char*)malloc(len);
    fread(bin, 1, len, f);
    fclose(f);
    return ((int (*)(int, char *)) bin)(argc-1, argv[1]);
}
The code above compiles fine in GCC, but when I try and execute the program from the command line like this:
./my_prog /bin/echo hello
The program segfaults. I've figured out the problem is on the last line, as commenting it out stops the segfault.
I don't think I'm doing it quite right, as I'm still getting my head around function pointers.
Is the problem a faulty cast, or something else?
You need a page with write and execute permissions. See mmap(2) and mprotect(2) if you are on Unix. You shouldn't do it using malloc.
Also, read what the others said: this way you can only run raw machine code, not a full executable. If you try to jump into an ELF header, it will probably segfault all the same.
Regarding the content of replies and downmods:
1- OP said he was trying to run machine code, so I replied about that rather than about executing an executable file.
2- See why you don't mix malloc and the mman functions:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/mman.h>

int main()
{
    char *a = malloc(10);
    char *b = malloc(10);
    char *c = malloc(10);
    memset(a, 'a', 4095);
    memset(b, 'b', 4095);
    memset(c, 'c', 4095);
    puts(a);
    memset(c, 0xc3, 10); /* 0xc3 is the x86 "ret" instruction */
    /* c is not aligned to a page boundary, so this is a no-op.
       Many implementations put a header before malloc'ed data, so it's
       always a no-op. */
    mprotect(c, 10, PROT_READ | PROT_EXEC);
    b[0] = 'H'; /* oops, it is still writable; if you provided an aligned
                   address it would segfault */
    char *d = mmap(0, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    memset(d, 0xc3, 4096);
    ((void (*)(void))d)();
    ((void (*)(void))c)(); /* oops, it isn't executable */
    return 0;
}
It displays exactly this behavior on Linux x86_64; other ugly behavior is sure to arise on other implementations.
Using malloc works fine.
OK, this is my final answer; please note I used the original poster's code.
I'm loading the compiled version of this code from disk into a heap-allocated area "bin", just as the original code did (the file name is fixed rather than taken from argv, and the value 0x674 comes from):
objdump -F -D foo | grep -i hoho
08048674 <hohoho> (File Offset: 0x674)
This can be looked up at run time with BFD (the Binary File Descriptor library) or something else; you can call other binaries (not just yourself) as long as they are statically linked against the same set of libraries.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

unsigned char *charp;
unsigned char *bin;

void hohoho()
{
    printf("merry mas\n");
    fflush(stdout);
}

int main(int argc, char **argv)
{
    int what;

    charp = malloc(10101);
    memset(charp, 0xc3, 10101);                  /* fill with x86 "ret" */
    mprotect(charp, 10101, PROT_EXEC | PROT_READ | PROT_WRITE);
    __asm__("leal charp, %eax");                 /* 32-bit x86 only */
    __asm__("call *(%eax)");                     /* indirect call through charp */
    printf("am I alive?\n");

    char *more = strdup("more heap operations");
    printf("%s\n", more);

    FILE *f = fopen("foo", "rb");
    fseek(f, 0, SEEK_END);
    unsigned int len = ftell(f);
    fseek(f, 0, SEEK_SET);
    bin = (unsigned char *)malloc(len);
    printf("read in %d\n", fread(bin, 1, len, f));
    printf("%p\n", bin);
    fclose(f);
    /* note: bin, not &bin as originally posted; malloc'ed memory is not
       page-aligned, so this mprotect likely fails silently anyway */
    mprotect(bin, 10101, PROT_EXEC | PROT_READ | PROT_WRITE);
    asm volatile ("movl %0, %%eax" :: "g"(bin));
    __asm__("addl $0x674, %eax");                /* file offset of hohoho() */
    __asm__("call *%eax");
    fflush(stdout);
    return 0;
}
running...
co tmp # ./foo
am I alive?
more heap operations
read in 30180
0x804d910
merry mas
You can use UPX to manage the load/modify/exec of a file.
P.S. sorry for the previous broken link :|
It seems to me you're loading an ELF image and then trying to jump straight into the ELF header? http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
If you're trying to execute another binary, why don't you use the process creation functions for whichever platform you're using?
A typical executable file has:
- a header
- entry code that is called before main(int, char **)
The first means that you can't generally expect byte 0 of the file to be executable; instead, the information in the header describes how to load the rest of the file in memory and where to start executing it.
The second means that when you have found the entry point, you can't expect to treat it like a C function taking arguments (int, char **). It may, perhaps, be usable as a function taking no parameters (and hence requiring nothing to be pushed prior to calling it), but you do need to populate the environment that the entry code will in turn use to construct the command-line strings passed to main.
Doing this by hand under a given OS would go into more depth than I can offer here, but I'm sure there is a much nicer way of doing what you're trying to do. Are you trying to execute an external file as a one-off operation, or to load an external binary and treat its functions as part of your program? Both are catered for by the C libraries on Unix.
It is more likely that the code jumped to through the function pointer is causing the segfault, rather than the call itself. There is no way, from the code you have posted, to determine whether the code loaded into bin is valid. Your best bet is to use a debugger, switch to the assembler view, break on the return statement, and step into the function call to confirm that the code you expect to run is indeed running, and that it is valid.
Note also that in order to run at all, the code will need to be position-independent and fully resolved.
Moreover, if your processor/OS enables data execution prevention, then the attempt is probably doomed. It is at best ill-advised in any case; loading code is what the OS is for.
What you are trying to do is akin to what interpreters do, except that an interpreter reads a program written in an interpreted language like Python, compiles that code on the fly, puts the executable code in memory, and then executes it.
You may want to read more about just-in-time compilation too:
Just in time compilation
Java HotSpot JIT runtime
There are libraries available for JIT code generation, such as GNU lightning and libJIT, if you are interested. You'd have to do a lot more than just read from a file and try to execute code, though. An example usage scenario would be:
1. Read a program written in a scripting language (maybe your own).
2. Parse and compile the source into an intermediate language understood by the JIT library.
3. Use the JIT library to generate code for this intermediate representation, for your target platform's CPU.
4. Execute the JIT-generated code.
As for executing the code, you'd have to use techniques such as mmap() to map the executable code into the process's address space, mark that page executable, and jump to that piece of memory; a minimal sketch follows below. It's more complicated than this, but it's a good start for understanding what goes on beneath all those interpreters of scripting languages such as Python, Ruby, etc.
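As a concrete illustration of that technique, a minimal sketch for x86-64 Linux: copy a tiny function's machine code into an anonymous mapping, flip the page to read-and-execute, and call it (the bytes encode mov eax, 42; ret):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    /* x86-64 machine code for "mov eax, 42; ret" */
    unsigned char code[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };

    /* get a writable, page-aligned anonymous mapping */
    void *mem = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return 1;
    memcpy(mem, code, sizeof code);

    /* make the page executable (and no longer writable) before jumping in */
    if (mprotect(mem, 4096, PROT_READ | PROT_EXEC) != 0)
        return 1;

    int (*fn)(void) = (int (*)(void))mem;
    printf("returned %d\n", fn());   /* prints 42 */
    return 0;
}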
The online version of the book "Linkers and Loaders" will give you more information about object file formats, what goes on behind the scenes when you execute a program, the roles of the linkers and loaders and so on. It's a very good read.
You can dlopen() a file, look up the symbol "main", and call it with 0, 1, 2, or 3 arguments (all of type char*) via a cast to the appropriate pointer-to-function-returning-int type. A sketch of this follows.
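A sketch of that approach, with the caveat that dlopen() on an executable only works when the binary actually exports its symbols (for example, a position-independent executable linked with -rdynamic); link with -ldl:

#include <stdio.h>
#include <dlfcn.h>

int main(int argc, char **argv)
{
    /* dlopen the target named on our command line */
    void *h = dlopen(argv[1], RTLD_LAZY);
    if (!h) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 1;
    }

    /* look up its "main" and call it with the remaining arguments */
    int (*target_main)(int, char **) = (int (*)(int, char **))dlsym(h, "main");
    if (!target_main) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        return 1;
    }
    return target_main(argc - 1, argv + 1);
}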
Use the operating system for loading and executing programs.
On unix, the exec calls can do this.
Your snippet in the question could be rewritten:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char* argv[])
{
    /* argv+1, not argv+2: the new program expects its own name in argv[0] */
    execv(argv[1], argv + 1);
    perror("execv");   /* execv only returns on failure */
    return 1;
}
Executable files contain much more than just code: header, code, data, more data. This stuff is separated and loaded into different areas of memory by the OS and its libraries. You can't load a program file into a single chunk of memory and expect to jump to its first byte.
If you are trying to execute your own arbitrary code, you need to look into dynamic libraries, because that is exactly what they're for.
