Core dump truncated although ulimit set to unlimited - core

My app's core dump is always getting truncated. I have set ulimit to unlimited, I have enough disk space, and I am not killing the process while the core dump is being written. Are there any other possibilities for the core getting truncated?

Related

How can I do reproducible benchmarks on a storage system?

If the machine's memory is far larger than the cache configured for the storage system, the file system caches far more data than the storage system's configured cache. So how can I do reproducible benchmarks across machines with different amounts of memory but the same storage-system cache configuration?
Maybe try running a program that allocates and locks a bunch of memory (i.e. pin it so it can't be paged out), then sleeps. Kill it when you want to release the memory.
Specifically, I'm thinking of the mlock(2) POSIX system call, or the Linux-specific MAP_LOCKED flag for mmap(2). This requires root, since the default ulimit -l is only 64kiB for non-root users, at least on my Ubuntu desktop.
On an otherwise-idle system with nothing using much memory, it should be easy to detect the total present and lock all but 2GB of it, for example. It's probably less easy to choose a reasonable size to lock on systems with other processes running and using varying amounts of RAM.
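For what it's worth, here is a minimal sketch of that idea (Linux assumed; the 2 GiB headroom figure is just an example): allocate anonymous memory with the Linux-specific MAP_LOCKED flag so it is pinned, then sleep until killed.
/* pin_ram.c - sketch of the "lock RAM and sleep" approach above (Linux only).
 * Locks all physical RAM except ~2 GiB of headroom; kill it to release.
 * Needs root or a raised RLIMIT_MEMLOCK (ulimit -l). */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t total   = (size_t)sysconf(_SC_PHYS_PAGES) * (size_t)sysconf(_SC_PAGESIZE);
    size_t keep    = 2UL << 30;                      /* leave ~2 GiB usable (example) */
    size_t to_lock = total > keep ? total - keep : total / 2;

    /* MAP_LOCKED (Linux-specific) faults the pages in and pins them so they
     * cannot be paged out; mlock() on a normal allocation would also work. */
    void *p = mmap(NULL, to_lock, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap(MAP_LOCKED)");
        return 1;
    }

    memset(p, 0xAA, to_lock);          /* make sure every page is resident */
    printf("locked %zu MiB, sleeping; kill me to release the memory\n",
           to_lock >> 20);
    pause();                           /* block until a signal arrives */
    return 0;
}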

Is there any way to increase the stack size/recursion limit?

I'm writing a C program and am exceeding the recursion limit, which shows up as a segmentation fault. Is there any way to increase the program's recursion limit (perhaps by increasing the stack size), either via an option to GCC or via a command-line option? The program is running on Ubuntu.
You can change the stack size with ulimit on Linux, for example:
ulimit -s unlimited
On Windows with Visual Studio, use the /F option.
The stack size is a function of the operating system, though many earlier operating systems (MSDOS for example) didn't do program stack segment control: it was up to the program to reserve an adequately sized segment.
With virtual memory and 32-bit APIs, the stack size is usually provided by a resource management mechanism. For example, on Linux, the ulimit command provides one source of stack size control. Other levels of control are provided by mechanisms inside the kernel enforcing system policy, memory limitations, and other limits.
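If you'd rather not rely on the shell, ulimit -s is just a front end for the RLIMIT_STACK resource limit, which the program can raise itself with setrlimit() before it starts recursing. A rough sketch (Linux assumed; the 256 MiB figure is arbitrary):
/* raise_stack.c - sketch: raise the stack limit (RLIMIT_STACK) from inside
 * the program, i.e. the programmatic equivalent of `ulimit -s`.
 * Note: the main thread's stack grows on demand up to the current limit,
 * but pthread stacks are fixed at creation time. */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0) { perror("getrlimit"); return 1; }

    rl.rlim_cur = 256UL * 1024 * 1024;              /* e.g. 256 MiB soft limit */
    if (rl.rlim_max != RLIM_INFINITY && rl.rlim_cur > rl.rlim_max)
        rl.rlim_cur = rl.rlim_max;                  /* cannot exceed the hard limit */

    if (setrlimit(RLIMIT_STACK, &rl) != 0) { perror("setrlimit"); return 1; }

    printf("stack soft limit now %llu bytes\n",
           (unsigned long long)rl.rlim_cur);

    /* ...deeply recursive work goes here... */
    return 0;
}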

Linux kernel: how can I copy files before panic?

I have a file on a tmpfs partition which is updated a lot.
I want to copy it to another partition (a flash partition) before a crash/reboot.
It is not an option to keep this file on the flash partition in the first place,
because the flash has a limited read/write life cycle and I'm trying to avoid excessive reads/writes to it.
Too many writes will damage the flash; that is why the file is on tmpfs.
Regarding reboot: I can modify the reboot process to copy the file before rebooting - is there a neater way?
Regarding crash: I don't know of any way to do this. Any ideas?
I know that I should not mess with files from kernel space.
Thanks
Once a kernel panic occurs, it's possible that in-core data structures are already corrupted and unreliable. Ideally, your kernel is not expected to panic if the version you are using is a stable and tested release. I would recommend capturing a vmcore using the crash tool and working with the vendor on the root cause of the kernel panic.
However, if you are referring to an abrupt system shutdown due to a power failure, etc., which could cause loss of the data/file stored in memory, you could write a cron job to sync the file to disk at intervals and tune the kernel's control over how frequently dirty pages get synced. Having said that, if the file you are writing to is that important, why design it to be kept in memory in the first place?
You should be syncing this file back to disk every few seconds or at regular intervals. That way you will not lose all of the data.
As the read/write load on the tmpfs file is heavy, it may be worth considering using an SSD for this purpose. Read about how file system transaction logs are configured to be stored on SSD drives.
Write a cron job to sync the tmpfs file to the SSD or disk at frequent intervals, or whenever there are updates. You may also want to consider changing some kernel tunables (such as vm.dirty_expire_centisecs=0 and vm.dirty_background_ratio=0) so that dirty pages get synced to disk immediately. A word of caution: doing this will cause higher CPU% and I/O load, as pages will be synced to disk frequently, although data loss will be kept to a minimum.
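A minimal sketch of the copy step such a cron job would run (paths are placeholders): copy the tmpfs file to a temporary name on the flash/SSD partition, fsync() it, then rename() it into place so a power cut mid-copy never leaves a half-written file.
/* sync_copy.c - sketch of the periodic "copy tmpfs file to flash" step
 * described above (e.g. run from cron). SRC and DST are placeholders. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define SRC "/tmp/state.bin"            /* file on tmpfs (placeholder)  */
#define DST "/mnt/flash/state.bin"      /* file on flash (placeholder)  */
#define TMP DST ".tmp"

int main(void)
{
    char buf[64 * 1024];
    ssize_t n;

    int in  = open(SRC, O_RDONLY);
    int out = open(TMP, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (in < 0 || out < 0) { perror("open"); return 1; }

    while ((n = read(in, buf, sizeof buf)) > 0)
        if (write(out, buf, (size_t)n) != n) { perror("write"); return 1; }
    if (n < 0) { perror("read"); return 1; }

    if (fsync(out) != 0) { perror("fsync"); return 1; }      /* flush to flash */
    close(out);
    close(in);

    if (rename(TMP, DST) != 0) { perror("rename"); return 1; } /* atomic swap */
    return 0;
}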

Auto generate w3wp.exe process dump file when CPU threshold is reached even when PID changes

I'm trying to troubleshoot an issue with one of our websites which causes the CPU to spike intermittently. The site sits on a farm of web servers and it intermittently happens across all servers at different times. The process which causes the spike is w3wp.exe. I have checked all the obvious things and now want to analyse multiple sets of dump files for the w3wp.exe which causes the spike.
I'm trying to automatically generate a dump file of the w3wp.exe process when it reaches a specified CPU threshold for a specified time.
I can use ProcDump.exe to do this and it works a treat IF it's fired before the PID (Process ID) changes.
For example:
procdump -ma -c 80 -s 10 -n 2 5844 (where 5844 is the PID)
-ma Write a dump file with all process memory. The default dump format includes thread and handle information.
-c CPU threshold at which to create a dump of the process.
-s Consecutive seconds CPU threshold must be hit before dump written (default is 10).
-n Number of dumps to write before exiting.
The above command monitors w3wp.exe until the CPU spikes above 80% for 10 consecutive seconds, and it writes a full dump on two such occasions before exiting.
The problem:
I have multiple instances of w3wp.exe running, so I cannot use the process name; I need to specify the PID. The PID changes each time the App Pool is recycled, so it changes before I can capture multiple dump files, and I then need to start procdump again on each web server.
My question:
How can I keep automatically generating dump files even after the PID changes?
Use the Debug Diagnostic Tool (DebugDiag) 2.0 from Microsoft: https://www.microsoft.com/en-us/download/details.aspx?id=49924
It handles multiple w3wp.exe processes. If you need a generic solution, you will have to write a script - such as https://gallery.technet.microsoft.com/scriptcenter/Getting-SysInternals-027bef71
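If you go the script route, the essence of the workaround is simply to re-resolve the PID(s) from the process name each time before starting procdump, so that a recycled App Pool is picked up under its new PID. A rough C sketch of that loop (the procdump path and flags are carried over from the question, and the 30-second poll interval is an arbitrary assumption; this is not a finished tool):
/* watch_w3wp.c - rough sketch: look up the current PID(s) of w3wp.exe on
 * every pass and launch procdump against them, so a recycled app pool
 * (new PID) is picked up automatically. Compile as ANSI (no UNICODE). */
#include <windows.h>
#include <tlhelp32.h>
#include <stdio.h>
#include <string.h>

static void dump_by_name(const char *exe)
{
    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
    if (snap == INVALID_HANDLE_VALUE)
        return;

    PROCESSENTRY32 pe;
    pe.dwSize = sizeof(pe);
    BOOL ok = Process32First(snap, &pe);
    while (ok) {
        if (_stricmp(pe.szExeFile, exe) == 0) {
            char cmd[256];
            /* same flags as in the question: -ma -c 80 -s 10 -n 2 <pid> */
            snprintf(cmd, sizeof(cmd),
                     "procdump.exe -accepteula -ma -c 80 -s 10 -n 2 %lu",
                     (unsigned long)pe.th32ProcessID);
            printf("launching: %s\n", cmd);
            system(cmd);             /* blocks until procdump finishes for this PID */
        }
        ok = Process32Next(snap, &pe);
    }
    CloseHandle(snap);
}

int main(void)
{
    for (;;) {
        dump_by_name("w3wp.exe");    /* PIDs are resolved fresh on every pass */
        Sleep(30 * 1000);            /* wait, then look for (possibly new) PIDs again */
    }
}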

Appropriate Windows O/S pagefile size for SQL Server

Does any know a good rule of thumb for the appropriate pagefile size for a Windows 2003 server running SQL Server?
With all due respect to Remus (whom I respect greatly), I strongly disagree. If your page file is large enough to support a full dump, it will perform a full dump every time. If you have a very large amount of RAM, this can cause a tiny blip to become a major outage.
You do NOT want your server to have to write out 1 TB of RAM to disk if there is a one-time transient issue. If there is a recurring issue, you can increase the page file to capture a full dump. I would wait to do this until PSS (or someone else qualified to analyze a full dump) asks you to capture one. An extremely small percentage of DBAs know how to analyze a full dump. A mini-dump is sufficient for troubleshooting most issues that pop up anyway.
Plus, if your server is configured to allow a 1 TB full dump and a recurring issue occurs, how much free disk space would you recommend having on hand? You could fill up an entire SAN in a single weekend.
A page file of 1.5*RAM was the norm back in the days when you were lucky to have a SQL Server with 3 or 4 GB of RAM. This is not the case any more. I leave the page file at the Windows default size and settings on all production servers (except for an SSAS server that is experiencing memory pressure).
And just for clarification, I've worked with servers ranging from 2 GB of RAM to 2 TB of RAM. In more than 11 years, I have only had to increase the paging file to capture a full dump one time.
Regardless of the amount of RAM, you still need a pagefile at least 1.5 times the size of physical RAM. This is true even if you have a 1 TB RAM machine: you'll need a 1.5 TB pagefile on disk (sounds crazy, but is true).
When a process asks for MEM_COMMIT memory via VirtualAlloc/VirtualAllocEx, the requested size needs to be reserved in the pagefile. This was true in the first Windows NT system, and is still true today; see Managing Virtual Memory in Win32:
When memory is committed, physical pages of memory are allocated and space is reserved in a pagefile.
Barring some extremely odd cases, SQL Server will always ask for MEM_COMMIT pages. And given the fact that SQL Server uses a dynamic memory management policy that reserves upfront as much buffer pool as possible (reserves and commits, in terms of VAS), SQL Server will request at start-up a huge reservation of space in the pagefile. If the pagefile is not properly sized, errors 801/802 will start showing up in SQL Server's ERRORLOG file and operations will start failing.
This always causes some confusion, as administrators erroneously assume that a large amount of RAM eliminates the need for a pagefile. In truth the contrary happens: a large amount of RAM increases the need for a pagefile, just because of the inner workings of the Windows NT memory manager. The reserved pagefile is, hopefully, never used.
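For what it's worth, one way to see this commit accounting directly: commit a block with VirtualAlloc and watch the available commit (physical RAM plus pagefile) reported by GlobalMemoryStatusEx drop by roughly that amount before a single page has been touched. A small sketch (the 512 MiB size is an arbitrary example):
/* commit_demo.c - sketch showing that MEM_COMMIT consumes commit charge
 * (backed by RAM + pagefile) even before the pages are touched. */
#include <windows.h>
#include <stdio.h>

static unsigned long long avail_commit_mb(void)
{
    MEMORYSTATUSEX ms = { sizeof(ms) };
    GlobalMemoryStatusEx(&ms);
    return ms.ullAvailPageFile / (1024 * 1024);   /* available commit, in MiB */
}

int main(void)
{
    SIZE_T size = 512ULL * 1024 * 1024;           /* example: commit 512 MiB */

    printf("available commit before: %llu MiB\n", avail_commit_mb());

    /* Committing charges the commit limit immediately... */
    void *p = VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (!p) { printf("VirtualAlloc failed: %lu\n", GetLastError()); return 1; }
    printf("available commit after commit: %llu MiB\n", avail_commit_mb());

    /* ...even though no physical page is assigned until the memory is touched. */
    ((char *)p)[0] = 1;

    VirtualFree(p, 0, MEM_RELEASE);
    return 0;
}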
According to Microsoft, "as the amount of RAM in a computer increases, the need for a page file decreases." The article then goes on to describe how to use Performance Logs to determine how much of the page file is actually being used. Try setting your page file to 1.5X system memory for a start, then do the recommended monitoring and make adjustments from there.
How to determine the appropriate page file size for 64-bit versions of Windows
The bigger the better, up to the size of the working set of the application, at which point you start to get diminishing returns. You can try to find this by slowly increasing or decreasing the size until you see a significant change in cache hit rates. However, if the cache hit rate is over 90% or so, you're probably OK. Generally you should keep an eye on this on a production system to make sure it hasn't outgrown its RAM allocation.
We were recently having some performance issues with one of our SQL Server that we weren't able to completely narrow down, and actually used one of our Microsoft support tickets to have them help troubleshoot. The optimal pagefile size to use with SQL Server came up, and Microsoft's recommendation is that it be 1 1/2 times the amount of RAM.
In this case, the normal recommendation of 1.5 times total physical RAM is not the best. This very general recommendation is provided under the assumption that all memory is being used by "normal" processes, which can generally have their least-used pages moved to disk without generating massive performance issues for the application process the memory belongs to.
For servers running SQL Server (generally with very large amounts of RAM), the majority of the physical RAM is committed to the SQL Server process and should be (if configured correctly) locked in physical memory, preventing it from being paged out to the pagefile. SQL Server manages its own memory very carefully with performance in mind, using a large part of the RAM allocated to its process as a data cache to reduce disk I/O. It does not make sense to page out those data cache pages to the pagefile, as the sole purpose of having that data in RAM in the first place is to reduce disk I/O. (Note that the Windows OS also uses available RAM similarly as disk cache to speed up system operation.) Since SQL Server already manages its own memory space, this memory space should not be considered "pageable", and not included in a calculation for pagefile size.
In regard to MEM_COMMIT mentioned by Remus, the terminology is confusing because in the virtual memory parlance, "reserved" never refers to actual allocation, but to preventing use of an address space (not physical space) by another process. Memory available to be "committed" is basically equal to the sum of physical RAM and pagefile size, and doing a MEM_COMMIT just decrements the amount available in the committed pool. It does not allocate a matching page in the pagefile at that time. When a committed memory page is actually written to, that is when the virtual memory system will allocate a physical memory page and possibly bump another memory page from physical RAM to the pagefile. See MSDN's VirtualAlloc function reference.
The Windows OS keeps track of memory pressures between application processes and its own disk cache mechanism and decides when it should bump non-locked memory pages from physical to the pagefile. My understanding is that having a pagefile that is way too large compared to the actual non-locked memory space can result in Windows overzealously paging out application memory to the pagefile, resulting in those applications suffering the consequences of page misses (slow performance).
As long as the server is not running other memory-hungry processes, a pagefile size of 4GB should be plenty. If you have set SQL Server to allow locking pages in memory, you should also consider setting SQL Server's max memory setting so that it leaves some physical RAM available to the OS for itself and other processes.
802 errors in SQL Server indicate that the system cannot commit any more pages for the data cache. Increasing the pagefile size will only help in this situation insofar as Windows is able to page out memory from non-SQL Server processes. Allowing SQL Server memory to grow into the pagefile in this situation might get rid of the error messages, but it is counterproductive, due to the point earlier about the reason for the data cache in the first place.
If you're looking for high performance, you are going to want to avoid paging completely, so the page file size becomes less significant. Invest in as much RAM as feasible for the DB server.
After much research our dedicated SQL Servers running Enterprise x64 on Windows 2003 Enterprise x64 have no page file.
Simply put, the page file is a cache for files that gets managed by the OS, and SQL Server has its own internal memory management system.
The MS article referenced does not qualify that the advice is for the OS running out-of-the-box services such as file sharing.
Having a page file simply burdens the disk I/O because Windows is trying to help, when only the SQL OS can do the job.
