User time execution and kernel time on two different resources

User time execution and kernel time on two different resources - c

I have a question about how is possible to modify the Linux scheduler(Linux kernel in general) to specify resources (CPUs) to run an application, to be clear, I'm wondering is there any way (even not efficient and slow) to tell the kernel scheduler that executes the user part of the code on specific core and the kernel part on another one?
I know in terms of execution it may have no sense but in terms of debugging and studying specific subjects, it may help a lot.
I couldn't find any article, book, document about this, I appreciate it if refer me to some references too.

Related

Where to start with Linux Kernel Modules?

A little background, I'm a CMPE Student currently in an Operating Systems class. I have some basic knowledge of C coding but am more comfortable with C++ (taken about 3 semesters of that). Other than that, never had any other formal training in coding. Also, I've got a basic understanding of the linux environment.
I am working on a project that requires me and my team to code a linux kernel module that can do the following:
echoes data passed from user-level processes by printing the data received to the kernel log
is able to pass data from one user process to another.
must be possible to use the kernel module as an inter-process communication abstraction. module should provide for situations where a sender posts data to it but no receiver is waiting.module must cover the situation where a receiver asks for data but there is no data available.
module must cover the situation where a receiver asks for data but there is no data available.
must be a limit in the buffer capacity in your module.
Now I don't know how difficult this seems to those with a background in programming, but this seems like an impossibly complicated task for someone in my position.
Here's what I've done so far:
Coded, Compiled, Inserted, and Removed the basic "hello world" linux kernel module successfully
Read through about the first 4 or 5 chapters of The Linux Kernel Module Programming Guide
Read through a few stackoverflow posts, none of which seem to be able to direct me to where I need to go.
So finally here's my question: Can someone please point me in the direction that I need to go with this? I don't even know where to being to find commands to use for reading in user-level process data and I need somewhere to start me off. TLPD was great for insight on the topic but isn't helping me get to the point where I will have a workable project to turn in. In the past, I would learn off of reading source code and reverse engineering, is there anywhere I can find something like that? Any and all help is appreciated.
-Will

I've found that the Linux Kernel Module Programming Guide is a pretty good resource. From the sounds of it, something like a character device might work best for your purposes, but I'm not sure if you have other constraints.
Another direction I might consider (though this could be a bad path) is to look at examples in the Linux kernel for a kernel module that has similar functionality. I don't have a good example offhand, but perhaps look through /drivers/char/.

What you describe is pretty much the same as a pipe.
Read chapter three of Linux Device Drivers.
(But don't just copy the scull pipe example …)

Simple cache profiling API

Is there a way I can access the (Intel]) hardware counters for each core programmatically? (that is, no perf, perfmon, or valgrind, and I should add "simple", so no PAPI, e.g.) I'd like to know (for each core) how many L1-LLC cache hits/misses it (= a certain program running on that core) incurred in. This is for Linux 3.2.0-32, C, and using GCC.

The performance counters in the processor can not be read from "user-mode" code, so you need some sort of kernel module to do this. Once you have that, it's not terribly hard, there are a number of MSR's.
You can perhaps also use /dev/cpu/core-number/msr to read the values without a kernel module.
To describe all the details of how you do this is a little too much for an answer (unless I copy'n'paste the entire section of Intel's programmers manual (Vol3) - which I don't think is quite what we want here...)

How to profiling sections of code?

I need to profile a piece of software written in C. Now the problem is that while gprof or my own begin timer/end timer function calls would provide me time spent in each function, I would have no information about which is the most time consuming part within each function. Some may term it as micro-optimization but that is what the need of the hour is!
One of achieving this to "manually" place begin/end timer calls across for loops (there could be more than one of these). In this case, a smarter thing would be to allow enabling/disabling these calls using macros.
But I want to automate this instrumentation?
Can you tell me if a good tool exists to achieve the same? It would be ideal if I could invoke the instrumented program repeatedly from a script and then find the average of time spent in each "section" of the code. For now section is a loosely defined term but that "tool" could have a more specific definition of what a section is.
It would also be helpful if I could somehow learn which tools would be

You might try using Callgrind (one of Valgrind tools) in conjunction with KCachegrind. See also this question.

I have not used it myself, but I have heard that the Valgrind instrumentation framework (http://www.valgrind.org/) has tools that enable the very fine-grained profiling necessary for what you are trying to accomplish.

You want the code to run as fast as possible, right?
gprof is a measuring tool. It can help in evaluating alternative implementations, as the original authors wrote.
They did not say it is effective for locating the code needing an alternative implementation, and it isn't, even though nearly everyone thinks it is.
The fallacy is that measuring locates, but if you want to find an elephant in the room, do you need to measure it to know it's there?
No, you open your eyes.
Here's a way to open your eyes to what your program is doing.

Content for Linux Operating Systems Class

I will be TA for an operating systems class this upcoming semester. The labs will deal specifically with the Linux Kernel.
What concepts/components of the Linux kernel do you think are the most important to cover in the class?
What do you wish was covered in your studies that was left out?
Any suggestions regarding the Linux kernel or overall operating systems design would be much appreciated.

My list:
What an operating system's concerns are: Abstraction and extension of the physical machine and resource management.
How the build process works ie, how architecture specific/machine code stuff is implanted
How system calls work and how modules can link up
Memory management / Virtual Memory / Paging and all the rest
How processes are born, live and die in POSIX and other systems
userspace vs kernel threads and what the difference is between process/threads
Why the monolithic Kernel design is growing tiresome and what are the alternatives
Scheduling (and some of the alternative / domain specific schedulers)
I/O, Driver development and how they are dynamically loaded
The early stages of booting and what the kernel does to setup the environment
Problems with clocks, mmu-less systems etc
... I could go on ...
I almost forgot IPC and Unix 'eveything is a file' design decisions
POSIX, why it exists, why it shouldn't
In the end just get them to go through tanenbaum's modern operating systems and also do case studies on some other kernels like Mach/Hurd's microkernel setup and maybe some distributed and exokernel stuff.
Give a broad view past Linux too, I recon
For those who are super geeky, the history of operating systems and why they are the way they are.

The Virtual File System layer is an absolute must for any Linux Operating System class.
I took a similar class in college. The most frustrating but, at the same time, helpful project was writing a small file system for the Linux operating system. Getting this to work takes ~2-3 weeks for a group of 4 people and really teaches you the ins and outs of the Kernel.

I recently took an operating systems class, and I found the projects to be challenging, but essential in understanding the concepts in class. The projects were also fun, in that they involved us actually working with the Linux source code (version 2.6.12, or thereabouts).
Here's a list of some pretty good projects/concepts that I think should be covered in any operating systems class:
The difference between user space and kernel space
Process management (i.e. fork(), exec(), etc.)
Write a small shell that demonstrates knowledge of fork() and exec()
How system calls work, i.e. how do we switch from user to kernel mode
Add a simple system call to the Linux kernel, write a test application that calls the system call to demonstrate it works.
Synchronization in and out of the kernel
Implement synchronization primitives in user space
Understand how synchronization primitives work in kernel space
Understand how synchronization primitives differ between single-CPU architectures and SMP
Add a simple system call to the Linux kernel that demonstrates knowledge of how to use synchronization primitives in the Linux kernel (i.e. something that has to acquire, say, the tasklist lock, etc. but also make it something where you have to kmalloc, which can't be done while holding a lock (unless you GFP_ATOMIC, but you shouldn't, really))
Scheduling algorithms, and how scheduling takes place in the Linux kernel
Modify the Linux task scheduler by adding your own scheduling policy
What is paging? How does it work? Why do we have paging? How does it work in the Linux kernel?
Add a system call to the Linux kernel which, given an address, will tell you if that address is present or if it's been swapped out (or some other assignment involving paging).
File systems - what are they? Why do they exist? How do they work in the Linux kernel?
Disk scheduling algorithms - why do they exist? What are they?
Add a VFS to the Linux kernel

Well, I just finished my OS course this semester so I thought I'd chime in.
I was kind of upset that we didn't actually play around with the actual OS itself, rather we just did system programming. I'd recommend having the labs be on something that is in the OS itself, which is what it sounds like what you want to do.
One lab that I did enjoy and found useful however was writing our own malloc/free routines. It was difficult, but pretty entertaining as well.
Maybe also cover loading programs into memory and/or setting up the memory manager (such as paging).

For labs, one thing that may be cool is to show them actual code and discuss about it, ask questions about what do they think things are done that way and not another, etc.
If I were again in the University I would certainly appreciate more in depth lessons about synchronization primitives, concurrency and so on... those are hard matters that are more difficult to approach without proper guidance. I remember I went to a speech by Paul "Rusty" Russell about spinlocks and other synchronization primitives that was absolutely rad, maybe you could find it in youtube and borrow some ideas.

Another good topic (or possibly exercise for the students) would be looking at virtualisation. Especially Rusty Russel's "lguest" which is designed as a simple introduction to what is required to virtualise an operating system. The docs are good reading too.

I actually just took a class that perfectly fits your description (OS Design using linux) in the spring. I was actually very frustrated with it because I felt like the teacher focused too narrowly for the projects rather than give a broader understanding. For instance, our last project revolved around futexes. My partner and I barely learned what they were, got it working (kinda) and then turned it in. I came away with no general knowledge of anything really from that project. I wish one of the projects had been to write a simple device driver or something like that.
In other words, I think it's good to make sure a good broad overview is presented, with as much detail as you can afford, but ultimately broad. I felt like my teacher nitpicked these tiny areas and made us intensely focus on those, while in the end I did NOT come away with that great of a general understanding of the inner-workings of Linux.
Another thing I'd like to note is a lot of why I didn't retain knowledge from the class was lack of organization. Topics came out of nowhere any given week, and there was no roadmap. Give the material a logical flow. Mental organization is the key to retaining the knowledge.

The networking sub-system is also quite interesting. You could follow a packet as it goes from the socket system call to the wire and the other way around.
Fun assignments could be:
create a state-full firewall by using netfilter
create an HTTP load balancer
design and implement a simple tunneling protocol

Memory mapped I/O and the 1g/3g vs 2g/2g split between kernel address space and user addressable space in 32bit operating systems.
Limitations of 32 bit architecture on hard drive size and what this means for the design of file systems.
Actually just all the pros and cons of going to 64 bit, what it means and why as well as the history and why are aren't there yet.

Call tree for embedded software [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 years ago.
Improve this question
Does anyone know some tools to create a call tree for C application that will run on a microcontroller (Cortex-M3)? It could be generated from source code (not ideal), object code (prefered solution), or at runtime (acceptable). I've looked at gprof, but there's still a lot missing to get it to work on an embedded system.
An added bonus would be that the tool also gives the maximum stack depth.
Update: solution is preferably free.

One good way to achieve this is by using the --callgraph option to the ARM linker (armlink) that is part of RVCT (not free).
For more details - callgraph documentation.
I realize from one of the comments that you are looking for a gcc-based solution, which this isn't. But it may still be helpful.

From source code, you can use Doxygen and GraphViz even if you don't already use Doxygen to document your code. It is possible to configure it so that it will include all functions and methods whether or not they have documentation comments. With AT&T Graphviz installed, Doxygen will include call and caller graphs for most functions and methods.
From object code, I don't have a ready answer. I would imagine that this would be highly target dependent since even with the debug information present, it would have to parse the object code to find calls and other stack users. In the worst case, that approach seems like it would require effectively simulating the target.
At runtime on the target hardware your choices are going to depend in part on what kind of embedded OS is present, and how it manages stacks for each thread.
A common approach is to initialize each stack to a known value that seems unlikely to be commonly stored in automatic variables. An interrupt handler or a thread can then inspect the stack(s) and measure an approximate high water mark.
Even without pre-filling the stack and later walking it to look for footprints, an interrupt could just sample the current value of the stack pointer (for each thread) and keep a record of its greatest observed extent. That would require storage for a copy of each threads SP, and the interrupt handler wouldn't have very much work to do to maintain the information. It would have to access the saved states of all the active threads, of course.
I don't know of a tool that does this explicitly.
If you happen to be using µC/OS-II from Micrium as your OS, you might take a look at their µC/Probe product. I haven't used it myself, but it claims to allow a connected PC to observe program and OS state information in near real time. I wouldn't be surprised if it is adaptable to another RTOS if needed.

Call graphs from source code is no problem as mentioned above your compiler or doxygen can generate this information from source code. Most modern compilers can generate a call graph as part of the compile process.
On a previous embedded projects I filled that stack with a pattern and ran a task. Check up to which point the stack destroyed my pattern. Reload stack with pattern and run the next task. This makes your code very ssslloowww .... but is free. It is not fully accurate because all the data is timing out the whole time and the code spends lots of time in error handlers.
On some processors you can get a trace pod so that you can monitor code cover and what not if your processor needs to run at full speed to test and you can also not use instrumented code. Unfortunately these types of tools are very expensive. Look at Green Hills Time machine if you have money. This make all types of debugging easier.

Check out StackAnalyzer.

I haven't used these, but are you aware of:
calltree
cflow
Since they analyse the source code, they don't calculate stack depth.
Note, Doxygen can do "call graphs" and "caller graphs" but I believe these are per-function and only show the tree up to a certain number of "hops" from each function.
Stack depth and/or call tree generation may be supported by compiler tools. For example, for Renesas micros there is a utility called Call Walker.

My calltree graph generator, implemented in bash, using cscope and dot.
Can generate graphs of upstream callers, downstream callees, and call-associations between functions. You can set it up to view graphs in a number of ways, including xfig, .png viewers, and the dynamic dot visualiztion tool "zgrviewer".
http://toolchainguru.blogspot.com/2011/03/c-calltrees-in-bash-revisited.html

Just a thought. Is it possible to run it in a virtual machine (like Valgrind) and take stack samples ?

Eclipse with CDT has C/C++ indexing and will show you a call graphs. As far as I know, you don't need to be able to build in Eclipse to get the indexer to work, just make sure all your source files are in the project.
It works pretty nicely.
Visual Studio will do similar (but it's not free). I use Visual Studio to work on embedded projects; using a makefile project I can do all the work except debugging in the VS IDE.

I have suggested this approach already in another discussion about embedded development, but if you really need a callgraph, as well as stack use info, and all this for free, I would personally consider using an open source emulator to simulate the whole thing, while instrumenting the object code by adding a handful hooks to the emulator itself to get this data.
I am not familiar with this particular target, but there is a whole number of open source ARM emulators available (freshmeat, sourceforge, google), and you are probably mostly interested in opcodes related to call/ret and push/pop?
For example check out skyeye.
So, even if you find that it's not straightforward to extend a compiler or an emulator to provide this information, it should still be possible to create a simple script in order to look for the entrypoint and all calls/rets, as well as opcodes related to stack usage.
Of course, the only reliable information on stack usage is going to come from runtime instrumentation, preferably exercising all important code paths.

A pretty light tool: Egypt

Use Understand: http://www.scitools.com/
It's not free, and runs on source (not runtime), but it works, it works well, and it's well supported.
It will tell you much more than could ever want to know about your code.

I know this is reponding to a very old question, but someone might stumble upon this with the same question...
I recently experimented with a Python script that analyses the assembler version of the application, extracts the stack usage and the call tree, and reports the maximum stack use. In my build system I then use this to create a stack of exactly that size.
I used it only on small applications, but it seems to work OK for AVR8, MSP430, and Cortex-M3. Obviously, there are strict limitations: no indirect calls (no function pointers, no virtual functions), no recursion, and stack-using assembler instruction patterns that are used are limited to what I found in GCC's output. If these limitations are not met, the script will report an error.
The Python source is 24k, free (boost license), not very fast, and still under development. Contact me if you are interested.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight