Understanding sp_spaceused values: reserved and index_size (SQL Server)

I'm running the following simple script:
create table MyTable (filler char(10))
go
insert into MyTable (filler) values ('a')
go 1
exec sp_spaceused MyTable
go
drop table MyTable
go
and get the following result:
rows   reserved   data   index_size   unused
------ ---------- ------ ------------ -------
1      72 KB      8 KB   8 KB         56 KB
My questions:
Why were 72 KB reserved?
Why is index_size 8 KB if the table is not even indexed?
EDIT:
I'd like to add a follow-up:
When changing the script a little:
create table MyTable (filler char(69))
go
insert into MyTable (filler) values ('a')
go 100
I get:
rows   reserved   data   index_size   unused
------ ---------- ------ ------------ -------
100    72 KB      16 KB  8 KB         48 KB
Note that defining filler's size as 68 bytes (and inserting 100 rows) still gives 8 KB as the data value (we can continue and set it to 148 bytes, which results in another 8 KB increment, i.e. to 24 KB).
Can you help me break down the calculation? If (apparently) only 6,900 bytes are used, what causes the additional 8 KB allocation?
EDIT #2:
Here's the results of DBCC PAGE:
PAGE: (1:4392)
BUFFER:
BUF #0x00000000061A78C0
bpage = 0x00000001EF3A8000 bhash = 0x0000000000000000 bpageno = (1:4392)
bdbid = 6 breferences = 0 bcputicks = 0
bsampleCount = 0 bUse1 = 18482 bstat = 0x9
blog = 0x15ab215a bnext = 0x0000000000000000 bDirtyContext = 0x0000000000000000
bstat2 = 0x0
PAGE HEADER:
Page #0x00000001EF3A8000
m_pageId = (1:4392) m_headerVersion = 1 m_type = 1
m_typeFlagBits = 0x0 m_level = 0 m_flagBits = 0x8200
m_objId (AllocUnitId.idObj) = 260 m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057594054967296
Metadata: PartitionId = 72057594048151552 Metadata: IndexId = 0
Metadata: ObjectId = 1698105090 m_prevPage = (0:0) m_nextPage = (0:0)
pminlen = 72 m_slotCnt = 100 m_freeCnt = 396
m_freeData = 7596 m_reservedCnt = 0 m_lsn = (55:8224:2)
m_xactReserved = 0 m_xdesId = (0:0) m_ghostRecCnt = 0
m_tornBits = -2116084714 DB Frag ID = 1
Allocation Status
GAM (1:2) = ALLOCATED SGAM (1:3) = NOT ALLOCATED PFS (1:1) = 0x44 ALLOCATED 100_PCT_FULL
DIFF (1:6) = CHANGED ML (1:7) = NOT MIN_LOGGED
Slot 0 Offset 0x60 Length 75
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP Record Size = 75
Memory Dump #0x0000000012A3A060
0000000000000000: 10004800 61202020 20202020 20202020 20202020 ..H.a
0000000000000014: 20202020 20202020 20202020 20202020 20202020
0000000000000028: 20202020 20202020 20202020 20202020 20202020
000000000000003C: 20202020 20202020 20202020 010000 ...
Slot 0 Column 1 Offset 0x4 Length 68 Length (physical) 68
filler = a
-- NOTE: The structure of each Slot is identical to that of Slot #0, so we can simply jump to slot 99:
Slot 99 Offset 0x1d61 Length 75
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP Record Size = 75
Memory Dump #0x0000000012A3BD61
0000000000000000: 10004800 61202020 20202020 20202020 20202020 ..H.a
0000000000000014: 20202020 20202020 20202020 20202020 20202020
0000000000000028: 20202020 20202020 20202020 20202020 20202020
000000000000003C: 20202020 20202020 20202020 010000 ...
Slot 99 Column 1 Offset 0x4 Length 68 Length (physical) 68
filler = a
So we can see the last slot starts after 7521 bytes, and adding its size gives us 7,596 bytes. If we add the size of the slot array (in which each pointer is 2 bytes), we get 7,796 bytes.
However, we need to get to 8,192 bytes to fill the page. What's missing?

The 72 KB of reserved space consists of a 64 KB extent (8 pages of 8 KB each) plus the 8 KB IAM page. Of this 72 KB, only the IAM page and a single data page are actually used. sp_spaceused reports the IAM page in index_size, even though it is not technically an index. You can see these details with the undocumented sys.dm_db_database_page_allocations TVF (use it only on a test system):
SELECT extent_file_id, extent_page_id, page_type_desc
FROM sys.dm_db_database_page_allocations(DB_ID(), OBJECT_ID(N'dbo.MyTable'), 0, 1, 'DETAILED');
This database apparently has the MIXED_PAGE_ALLOCATION database option set to OFF, so a full 64 KB extent is allocated initially. If the option were ON, the single data page would be allocated from a mixed extent instead of a 64 KB extent dedicated to the table. The space allocated in that case would be 16 KB: a single 8 KB data page plus the 8 KB IAM page.
Although mixed extents do reduce space requirements for small tables (under 64 KB), they have more overhead and can cause allocation contention in high-concurrency workloads, so mixed-extent allocation is off by default from SQL Server 2016 onwards. In older versions it was on by default and could be turned off at the server level with trace flag 1118.
You can see the mixed extent setting in sys.databases:
SELECT name, is_mixed_page_allocation_on
FROM sys.databases;
To toggle the setting:
ALTER DATABASE Test
SET MIXED_PAGE_ALLOCATION ON;
EDIT 1:
Space within a data page includes overhead for the page itself as well as for the records within it. This overhead, plus the space needed for user data, determines how many rows fit on a page and how many data pages are required to store a given number of rows. See Paul Randal's anatomy of a page and anatomy of a record articles for the details of that overhead.
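To make that concrete for the char(68) variant dumped above (back-of-the-envelope arithmetic of mine, not from those articles):
-- Per-record cost of one char(68) row in a heap:
SELECT 1      -- status bits A
     + 1      -- status bits B
     + 2      -- fixed-length data size
     + 68     -- the char(68) column itself
     + 2      -- column count
     + 1      -- NULL bitmap (1 byte covers up to 8 columns)
     AS record_size_bytes;  -- = 75, matching "Record Size = 75" in the dump
-- Each row also costs a 2-byte slot array entry, and the 8,192-byte page
-- loses 96 bytes to its header:
SELECT (8192 - 96) / (75 + 2) AS max_rows_per_page;  -- = 105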
EDIT 2:
From your follow-up comment:
7,998 bytes, so there are 194 more bytes to go before the next allocation. What am I missing?
I almost never use heaps, but as you can see in the page dump, the associated PFS (page free space) allocation status for this page is 100 percent full. According to Kalen Delaney's Microsoft SQL Server 2012 Internals book, the PFS status is actually a 3-bit mask covering these ranges:
000: empty
001: 1-50% full
010: 51-80% full
011: 81-95% full
100: 96-100% full
So it looks like once the heap page's fullness crossed the 96% threshold, it was considered 100% full and a new page was allocated. Note this does not happen on a table with a clustered index, because the target page for a new row is determined first by the CI key, and a new page is allocated only if the row cannot fit in that page at all. Yet another reason to avoid heaps.
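To close the byte accounting from the question (again my own arithmetic, derived from the DBCC PAGE values): the "missing" 396 bytes are simply the page's remaining free space, reported as m_freeCnt; the 96-byte page header is already inside the 7,596 figure, since slot offsets count from the start of the page.
SELECT 96          -- page header (slot 0 starts at offset 0x60 = 96)
     + 100 * 75    -- 100 records of 75 bytes (m_freeData = 96 + 7500 = 7596)
     + 100 * 2     -- slot array, 2 bytes per row
     + 396         -- remaining free space (m_freeCnt)
     AS page_bytes; -- = 8192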

Related

Missing bytes in SQL Server data page

I did not know how to title this properly. However, I am trying to understand how data pages are stored. I've created a simple table:
CREATE TABLE testFix
(
id INT,
v CHAR(10)
);
INSERT INTO dbo.testFix
(
id,
v
)
VALUES
( 1,        -- id - int
  'asdasd'  -- v - char(10)
)
GO 2
DBCC TRACEON(3604);
Then I got PageFID and PagePID by following command:
DBCC IND(tempdb, testFix, -1)
GO
Then the actual data pages:
DBCC PAGE (tempdb, 1, 368, 3)
So now I see:
Slot 0 Offset 0x60 Length 21
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP
Record Size = 21
Memory Dump #0x000000287DD7A060
0000000000000000: 10001200 01000000 61736461 73642020 20200200  ........asdasd ..
0000000000000014: 00  .
Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4
id = 1
Slot 0 Column 2 Offset 0x8 Length 10 Length (physical) 10
v = asdasd
Slot 1 Offset 0x75 Length 21
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP
Record Size = 21
Memory Dump #0x000000287DD7A075
0000000000000000: 10001200 01000000 61736461 73642020 20200200  ........asdasd ..
0000000000000014: 00  .
Slot 1 Column 1 Offset 0x4 Length 4 Length (physical) 4
id = 1
Slot 1 Column 2 Offset 0x8 Length 10 Length (physical) 10
v = asdasd
Slot 2 Offset 0x8a Length 21
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP
Record Size = 21
Memory Dump #0x000000287DD7A08A
0000000000000000: 10001200 01000000 61736461 73642020 20200200  ........asdasd ..
0000000000000014: 00  .
So the length of the record is 21 bytes. However, INT is 4 bytes and CHAR(10) is 10 bytes; 4 + 10 = 14. What are the other 7 bytes used for?
Here is the anatomy of a data row. The 7 bytes you are missing are: Status Bits A (1), Status Bits B (1), fixed-length data size (2), number of columns (2), NULL bitmap (1).
From the book Pro SQL Server Internals by Dmitri Korotkevitch.
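Applying that layout to the record in the dump (a quick arithmetic check of mine):
SELECT 1      -- status bits A
     + 1      -- status bits B
     + 2      -- fixed-length data size
     + 4      -- id (int)
     + 10     -- v (char(10))
     + 2      -- number of columns
     + 1      -- NULL bitmap
     AS record_size_bytes;  -- = 21, matching "Record Size = 21"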

SQL Server reduce on disk size

I have a database in production and need to reduce its on-disk size. I followed the instructions to shrink the file, but the result was a bit surprising.
Sorry for all the numbers here, but I do not know how to express the problem any better.
The database contains only one table, with 11,634,966 rows.
The table structure is as follows (I just changed the column names):
id bigint not null 8 -- the primary key (clustered index)
F1 int not null 4
F2 int not null 4
F3 int not null 4
F4 datetime not null 8
F5 int not null 4
F6 int not null 4
F7 int not null 4
F8 xml ?
F9 uniqueidentifier 16
F10 int 4
F11 datetime not null 8
Excluding the XML field, I calculate that the data size will be 68 bytes per row.
I ran a query against the database to find the min, max and avg size of the XML field F8, showing the following:
min : 625 bytes
max : 81782 bytes
avg : 5321 bytes
The on-disk file is 108 GB after shrinking the database.
This translates to the following:
108 GB / 11.6M records = 9,283 bytes per row
- 5,321 bytes per row (avg of XML)
= 3,962 bytes per row
- 68 bytes (data size of the other fields in the row)
= 3,894 bytes per row (must be overhead)
But this means the overhead is 41.9%.
Is this to be expected? And is there anything I can do to reduce the 108 GB disk size?
BTW, there is only one clustered index on the table, and I am using SQL Server 2008 (SP3).
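One way to measure the actual average record size and page fullness directly, instead of inferring them from the file size, is sys.dm_db_index_physical_stats; a sketch, where dbo.MyBigTable is a placeholder for the real table name:
SELECT index_id, index_level, page_count,
       avg_record_size_in_bytes, avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.MyBigTable'),  -- placeholder name
                                    NULL, NULL, 'DETAILED');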

IAR EW51: Large CONST array location

In a CC2541 (IAR EW51/7.20) project I need to store a few large const arrays (~30 KB each).
I define the first array:
const uint8 demo_signal_1[30000] = {
0X00,
0X01,
0X10,
// rest of the data
};
It links into the XDATA_ROM_C segment and runs just fine.
Then I add another 30 KB array in another segment, to overcome the 32 KB limitation:
const uint8 demo_signal_2[30000] = {
0X22,
0X33,
0X44,
// rest of data
};
The linker throws an error:
Error[e104]: Failed to fit all segments into specified ranges. Problem discovered in segment XDATA_ROM_C. Unable to place 96 block(s) (0xe37e byte(s) total) in 0x8000 byte(s) of memory.
Can anyone explain how to locate the second array in its own segment so that linking passes?
I tried to follow the documentation and the forum, but I seem to be failing to grasp something.
Many thanks for any support.
UPDATE (pretty long but please bear with me):
I played a little with the segment definitions - I've added two new (CONST) segments to the xcl file:
// Define segments for const data in flash.
// First the segment with addresses as used by the program (flash mapped as XDATA)
-P(CONST)XDATA_ROM_C=0x8000-0xFFFF
-Z(CONST)XDATA_ROM_C2=0x28000-0x2FFFF // Added
-Z(CONST)XDATA_ROM_C3=0x38000-0x3FFFF // Added
//
And defined the arrays to be located in these segments:
// Array 1 in its own segment
const uint8 demo_signal_1[28800] # "XDATA_ROM_C2" = {
0X00,
0X00,
0X01,
0X01,
// ...rest of initialization data
};
// Array 2 in its own segment
const uint8 demo_signal_2[28800] # "XDATA_ROM_C3" = {
0X00,
0X00,
0X02,
0X02,
// ...rest of initialization data
};
This time it does link fine and generates the following map file:
****************************************
* *
* SEGMENTS IN ADDRESS ORDER *
* *
****************************************
SEGMENT SPACE START ADDRESS END ADDRESS SIZE TYPE ALIGN
======= ===== ============= =========== ==== ==== =====
INTVEC CODE 00000000 - 00000085 86 com 0
CSTART CODE 00000086 - 00000136 B1 rel 0
BIT_ID CODE 00000137 dse 0
BDATA_ID CODE 00000137 dse 0
IDATA_ID CODE 00000137 dse 0
IXDATA_ID CODE 00000137 dse 0
PDATA_ID CODE 00000137 dse 0
DATA_ID CODE 00000137 dse 0
XDATA_ID CODE 00000137 - 0000057A 444 rel 0
BANK_RELAYS CODE 0000057B - 0000151C FA2 rel 0
RCODE CODE 0000151D - 00001C4F 733 rel 0
CODE_N CODE 00001C50 dse 0
DIFUNCT CODE 00001C50 dse 0
NEAR_CODE CODE 00001C50 - 00002C14 FC5 rel 2
<BANKED_CODE> 1 CODE 00002C15 - 00002C17 3 rel 0
<BANKED_CODE,CODE_C> 1
CODE 00002C18 - 00007FFB 53E4 rel 2
<BANKED_CODE,XDATA_ROM_C_FLASH> 1
CODE 00008000 - 0000FFFD 7FFE rel 2
<BANKED_CODE> 2 CODE 00010000 - 00017FF9 7FFA rel 0
<BANKED_CODE> 3 CODE 00018000 - 0001DE08 5E09 rel 0
BLENV_ADDRESS_SPACE
CODE 0003E800 - 0003F7FF 1000 rel 0
REGISTERS DATA 00000000 - 00000007 8 rel 0
VREG DATA 00000008 - 00000017 10 rel 0
PSP DATA 00000018 dse 0
XSP DATA 00000018 - 00000019 2 rel 0
DATA_I DATA 0000001A dse 0
BREG BIT 00000020.0 - 00000020.7 8 rel 0
DATA_Z DATA 00000021 - 00000028 8 rel 0
SFR_AN DATA 00000080 - 00000080 1 rel 0
DATA 00000086 - 0000008A 5
DATA 0000008C - 0000008D 2
DATA 00000090 - 00000091 2
DATA 00000094 - 00000097 4
DATA 0000009A - 000000A9 10
DATA 000000AB - 000000AF 5
DATA 000000B3 - 000000B4 2
DATA 000000B6 - 000000B6 1
DATA 000000B8 - 000000B9 2
DATA 000000BB - 000000C7 D
DATA 000000C9 - 000000C9 1
DATA 000000D1 - 000000DB B
DATA 000000E1 - 000000E9 9
DATA 000000F1 - 000000F5 5
DATA 000000F8 - 000000FA 3
DATA 000000FC - 000000FF 4
XSTACK XDATA 00000001 - 00000280 280 rel 0
XDATA_Z XDATA 00000281 - 00000BE4 964 rel 0
XDATA_I XDATA 00000BE5 - 00001028 444 rel 0
<XDATA_N> 1 XDATA 00001029 - 00001C2A C02 rel 0
XDATA_AN XDATA 0000780E - 00007813 6 rel 0
<XDATA_ROM_C> 1 CONST 00008000 - 00008805 806 rel 2
XDATA_ROM_C2 XDATA 00028000 - 0002F07F 7080 rel 0
XDATA_ROM_C3 XDATA 00038000 - 0003F07F 7080 rel 0
IDATA_I IDATA 00000029 dse 0
IDATA_Z IDATA 00000029 - 0000002A 2 rel 0
ISTACK IDATA 00000040 - 000000FF C0 rel 0
****************************************
* *
* END OF CROSS REFERENCE *
* *
****************************************
126 461 bytes of CODE memory
34 bytes of DATA memory (+ 86 absolute )
7 210 bytes of XDATA memory (+ 6 absolute )
194 bytes of IDATA memory
8 bits of BIT memory
59 654 bytes of CONST memory
Errors: none
Warnings: none
Two observations (scroll to the bottom of the map file):
1. The CODE size is reduced by 28,803 bytes, which is the size of the original array that was located in the original segment; the original segment's size is reduced by the same 28,803 bytes.
2. The newly added (CONST) segments (from the xcl file) appear as XDATA in the map file.
This would all be fine if I could download the generated binary to the chip, but when I try to 'download and debug' I receive the following message:
Fatal Error: Everything you want to place in flash memory must be placed with Xlink CODE memory segment type.
I tried to work around this and generate an intel-extended hex file to flash standalone, but the IDE returns the following error:
Error[e133]: The output format intel-extended cannot handle multiple address spaces. Use format variants (-y -O) to specify which address space is wanted
Having reached this point, I tried one more thing and changed the new segment definitions to (CODE), as advised by the error message:
// Define segments for const data in flash.
// First the segment with addresses as used by the program (flash mapped as XDATA)
-P(CONST)XDATA_ROM_C=0x8000-0xFFFF
-Z(CODE)XDATA_ROM_C2=0x28000-0x2FFFF // Modified to (CODE)
-Z(CODE)XDATA_ROM_C3=0x38000-0x3FFFF // Modified to (CODE)
Once again it links fine, and the map file shows (scroll to the bottom):
****************************************
* *
* SEGMENTS IN ADDRESS ORDER *
* *
****************************************
SEGMENT SPACE START ADDRESS END ADDRESS SIZE TYPE ALIGN
======= ===== ============= =========== ==== ==== =====
INTVEC CODE 00000000 - 00000085 86 com 0
CSTART CODE 00000086 - 00000136 B1 rel 0
DATA_ID CODE 00000137 dse 0
BDATA_ID CODE 00000137 dse 0
BIT_ID CODE 00000137 dse 0
IDATA_ID CODE 00000137 dse 0
IXDATA_ID CODE 00000137 dse 0
PDATA_ID CODE 00000137 dse 0
XDATA_ID CODE 00000137 - 0000057A 444 rel 0
BANK_RELAYS CODE 0000057B - 0000151C FA2 rel 0
RCODE CODE 0000151D - 00001C4F 733 rel 0
DIFUNCT CODE 00001C50 dse 0
CODE_N CODE 00001C50 dse 0
NEAR_CODE CODE 00001C50 - 00002C14 FC5 rel 2
<BANKED_CODE> 1 CODE 00002C15 - 00002C17 3 rel 0
<BANKED_CODE,CODE_C> 1
CODE 00002C18 - 00007FFB 53E4 rel 2
<BANKED_CODE,XDATA_ROM_C_FLASH> 1
CODE 00008000 - 0000FFFD 7FFE rel 2
XDATA_ROM_C2 CODE 00010000 - 0001707F 7080 rel 0
<BANKED_CODE> 2 CODE 00017080 - 00017FF3 F74 rel 0
XDATA_ROM_C3 CODE 00018000 - 0001F07F 7080 rel 0
<BANKED_CODE> 3 CODE 0001F080 - 0001FFF0 F71 rel 0
<BANKED_CODE> 4 CODE 00020000 - 00027FF9 7FFA rel 0
<BANKED_CODE> 5 CODE 00028000 - 0002BF23 3F24 rel 0
BLENV_ADDRESS_SPACE
CODE 0003E800 - 0003F7FF 1000 rel 0
REGISTERS DATA 00000000 - 00000007 8 rel 0
VREG DATA 00000008 - 00000017 10 rel 0
PSP DATA 00000018 dse 0
XSP DATA 00000018 - 00000019 2 rel 0
DATA_I DATA 0000001A dse 0
BREG BIT 00000020.0 - 00000020.7 8 rel 0
DATA_Z DATA 00000021 - 00000028 8 rel 0
SFR_AN DATA 00000080 - 00000080 1 rel 0
DATA 00000086 - 0000008A 5
DATA 0000008C - 0000008D 2
DATA 00000090 - 00000091 2
DATA 00000094 - 00000097 4
DATA 0000009A - 000000A9 10
DATA 000000AB - 000000AF 5
DATA 000000B3 - 000000B4 2
DATA 000000B6 - 000000B6 1
DATA 000000B8 - 000000B9 2
DATA 000000BB - 000000C7 D
DATA 000000C9 - 000000C9 1
DATA 000000D1 - 000000DB B
DATA 000000E1 - 000000E9 9
DATA 000000F1 - 000000F5 5
DATA 000000F8 - 000000FA 3
DATA 000000FC - 000000FF 4
XSTACK XDATA 00000001 - 00000280 280 rel 0
XDATA_Z XDATA 00000281 - 00000BE4 964 rel 0
XDATA_I XDATA 00000BE5 - 00001028 444 rel 0
<XDATA_N> 1 XDATA 00001029 - 00001C2A C02 rel 0
XDATA_AN XDATA 0000780E - 00007813 6 rel 0
<XDATA_ROM_C> 1 CONST 00008000 - 00008805 806 rel 2
IDATA_I IDATA 00000029 dse 0
IDATA_Z IDATA 00000029 - 0000002A 2 rel 0
ISTACK IDATA 00000040 - 000000FF C0 rel 0
****************************************
* *
* END OF CROSS REFERENCE *
* *
****************************************
184 061 bytes of CODE memory
34 bytes of DATA memory (+ 86 absolute )
7 210 bytes of XDATA memory (+ 6 absolute )
194 bytes of IDATA memory
8 bits of BIT memory
2 054 bytes of CONST memory
Errors: none
Warnings: none
Voilà - CONST has shrunk and CODE has expanded by exactly 57,600 bytes, as expected.
It even downloads, debugs, and generates a hex file.
BUT when debugging the code, it appears that instead of accessing 0x28000/0x38000, the memory controller accesses the data at 0x8000, which is the 'original' (CONST)XDATA_ROM_C segment.
To summarize:
when defining the two new segments as (CONST), the code links but I can neither debug nor generate a hex file; and
when defining the two new segments as (CODE), the code links, loads and runs, but the memory is not accessed correctly.
Phew, this was a long description. Any ideas, someone?
Note that this thread is also posted at the TI E2E forum, here.
I have the same problem. It seems all data constants need to be in a single 32 KB segment that gets mapped into the XDATA region. I did a weak workaround by using the -Z(CODE)XDATA_ROM_C2=0x38000-0x3FFFF declaration from your second attempt to map my second 32 KB array of data into BANK6. In my access routine I cheated by using the HalFlashRead routine to pull 32-byte chunks into a local array and then index into the local array - lots of bits missing, but HalFlashRead expects a 2 KB page number, an offset into the page, a buffer to place the data in, and a byte count. This is hardly a solid, portable solution, but it allowed me to move forward. Hope it helps or gives you more insight:
rd_addr = 0x38000ul + offset;
/* page number = address / 2048 (2 KB flash pages), offset = address % 2048 */
HalFlashRead((rd_addr >> 11) & 0xff, rd_addr & 0x7ff, ptr32bytes, 32);
realdata = ptr32bytes[0]; /* offset == [0], offset+1 == [1], ... */

ORACLE: Calculate storage requirement of table in terms of blocks on paper

I have to calculate the storage requirements, in terms of blocks, for the following table:
Bill(billno number(7),billdate date,ccode varchar2(20),amount number(9,2))
The table storage attributes are :
PCTFREE=20 , INITRANS=4 , PCTUSED=60 , BLOCKSIZE=8K , NUMBER OF ROWS=100000
I searched a lot on the internet and referred to many books, but didn't get anywhere.
First you need to figure out the typical value for the varchar2 column; the total size will depend on that. I created two tables from your BILL table: BILLMAX, where ccode always takes 20 characters ('12345678901234567890'), and BILLMIN, where ccode is always NULL.
The results are:
TABLE_NAME  NUM_ROWS  AVG_ROW_LEN  BLOCKS
BILLMAX     3938      37           28
BILLMIN     3938      16           13
select table_name, num_rows, avg_row_len, blocks from user_tables
where table_name in ( 'BILLMIN', 'BILLMAX')
As you can see, the number of blocks depends on that. Use exec dbms_stats.GATHER_TABLE_STATS('YourSchema','BILL') to refresh values inside user_tables.
The other thing you need to take into consideration is how big your extents will be. For example:
STORAGE (
INITIAL 64K
NEXT 1M
MINEXTENTS 1
MAXEXTENTS UNLIMITED
PCTINCREASE 0
BUFFER_POOL DEFAULT
)
will first generate 16 extents of 8 blocks each. After that it will start to create extents with a size of 1 MB (128 blocks).
Scaling the sample statistics from 3,938 rows up to 100,000 rows gives roughly 28 × (100000 / 3938) ≈ 711 blocks for BILLMAX and about 330 blocks for BILLMIN; rounding up to the extent boundaries, BILLMAX will occupy 768 blocks and BILLMIN will take 384 blocks.
As you can see, the difference is quite big.
For BILLMAX: 16 * 8 + 128 * 5 = 768
For BILLMIN: 16 * 8 + 128 * 2 = 384
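If you want to verify the actual extent allocation rather than working it out by hand, user_extents shows it directly (a quick check, run in the schema that owns the tables):
SELECT segment_name,
       COUNT(*)    AS extents,
       SUM(blocks) AS total_blocks
FROM user_extents
WHERE segment_name IN ('BILLMAX', 'BILLMIN')
GROUP BY segment_name;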

application memory optimization

We have a project written in ANSI C. Generally, memory consumption has not been a big concern, but now we have a request to fit our program into 256 KB of RAM. I don't have the exact platform at hand, so I compile the project under 32-bit x86 Linux (because it provides enough different tools to evaluate memory consumption), optimize what I can, and remove some features, and eventually I have to reach a conclusion: what features do we need to sacrifice to be able to run on very small systems (if we're able at all)? First of all I researched what exactly "memory size" means on Linux, and it seems I have to optimize the RSS size, not VSZ. But on Linux even the smallest program, which prints "Hello world!" once a second, consumes 285-320 KB of RSS:
#include <stdio.h>
#include <unistd.h>
#include <signal.h>

/* volatile sig_atomic_t so the write in the handler is visible to the main loop */
volatile sig_atomic_t cuStopCycle = 0;

void SigIntHandler(int signo)
{
    /* note: printf is not async-signal-safe; kept here only for the demo */
    printf("SIGINT received, terminating the program\n");
    cuStopCycle = 1;
}

int main(void)
{
    signal(SIGINT, SigIntHandler);
    while (!cuStopCycle)
    {
        printf("Hello, World!\n");
        sleep(1);
    }
    printf("Exiting...\n");
    return 0;
}
user@Ubuntu12-vm:~/tmp/prog_size$ size ./prog_size
text data bss dec hex filename
1456 272 12 1740 6cc ./prog_size
root@Ubuntu12-vm:/home/app# ps -C prog_size -o pid,rss,vsz,args
PID RSS VSZ COMMAND
22348 316 2120 ./prog_size
Obviously this program would run perfectly on a small PLC with 64 KB of RAM; it is just that Linux loads a lot of libraries. I generated a map file for this program, and all this data + bss comes from the CRT library. (I should mention that if I add some code to this project - 10,000 repetitions of "a = a + b", or manipulation of arrays of 2,000 long int variables - I see the difference in code size and bss size, but the RSS of the process stays the same; it isn't affected.)
So I take this as a baseline, the point I want to reach (and which I will never reach, because I need more functionality than just printing a message once a second).
So here comes my project, where I removed all extra features, removed all auxiliary functions, removed everything except the basic functionality. There are some ways to optimize further, but not by much; what could be removed has already been taken away:
root@Ubuntu12-vm:/home/app/workspace/proj_sizeopt/Cmds# ls -l App
-rwxr-xr-x 1 root root 42520 Jul 13 18:33 App
root@Ubuntu12-vm:/home/app/workspace/proj_sizeopt/Cmds# size ./App
text data bss dec hex filename
37027 404 736 38167 9517 ./App
So I have ~36 KB of code and ~1 KB of data. I do not call malloc inside my project; I use shared-memory allocation through a wrapper library, so I can control how much memory is allocated:
The total memory size allocated is 2052 bytes
Under the hood there are obviously malloc calls; if I substitute the malloc calls with my own function that sums all allocation requests, I see that ~2.4 KB of memory is allocated:
root@Ubuntu12-vm:/home/app/workspace/proj_sizeopt/Cmds# LD_PRELOAD=./override_malloc.so ./App
Malloc allocates 2464 bytes total
Now I run my project and see that it consumes 600 KB of RAM:
root@Ubuntu12-vm:/home/app/workspace/proj_sizeopt# ps -C App -o pid,rss,vsz,args
PID RSS VSZ COMMAND
22093 604 2340 ./App
I do not understand why it eats so much memory. The code size is small, there is not much memory allocated, and the data size is small. Why does it take so much memory? I tried to analyze the mapping of the process:
root@Ubuntu12-vm:/home/app/workspace/proj_sizeopt# pmap -x 22093
22093: ./App
Address Kbytes RSS Dirty Mode Mapping
08048000 0 28 0 r-x-- App
08052000 0 4 4 r---- App
08053000 0 4 4 rw--- App
09e6a000 0 4 4 rw--- [ anon ]
b7553000 0 4 4 rw--- [ anon ]
b7554000 0 48 0 r-x-- libpthread-2.15.so
b756b000 0 4 4 r---- libpthread-2.15.so
b756c000 0 4 4 rw--- libpthread-2.15.so
b756d000 0 8 8 rw--- [ anon ]
b7570000 0 300 0 r-x-- libc-2.15.so
b7714000 0 8 8 r---- libc-2.15.so
b7716000 0 4 4 rw--- libc-2.15.so
b7717000 0 12 12 rw--- [ anon ]
b771a000 0 16 0 r-x-- librt-2.15.so
b7721000 0 4 4 r---- librt-2.15.so
b7722000 0 4 4 rw--- librt-2.15.so
b7731000 0 4 4 rw-s- [ shmid=0x70000c ]
b7732000 0 4 4 rw-s- [ shmid=0x6f800b ]
b7733000 0 4 4 rw-s- [ shmid=0x6f000a ]
b7734000 0 4 4 rw-s- [ shmid=0x6e8009 ]
b7735000 0 12 12 rw--- [ anon ]
b7738000 0 4 0 r-x-- [ anon ]
b7739000 0 104 0 r-x-- ld-2.15.so
b7759000 0 4 4 r---- ld-2.15.so
b775a000 0 4 4 rw--- ld-2.15.so
bfb41000 0 12 12 rw--- [ stack ]
-------- ------- ------- ------- -------
total kB 2336 - - -
And it looks like the program itself accounts for only 28 KB of RSS; the rest is consumed by shared libraries. BTW, I do not use POSIX threads and I do not explicitly link against libpthread, but somehow the linker links this library anyway; I have no idea why (this is not really important). If we look at the mapping in more detail:
root@Ubuntu12-vm:/home/app/workspace/proj_sizeopt# cat /proc/22093/smaps
08048000-08052000 r-xp 00000000 08:01 344838 /home/app/workspace/proj_sizeopt/Cmds/App
Size: 40 kB
Rss: 28 kB
Pss: 28 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 28 kB
Private_Dirty: 0 kB
Referenced: 28 kB
Anonymous: 0 kB
AnonHugePages: 0 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB
...
09e6a000-09e8b000 rw-p 00000000 00:00 0 [heap]
Size: 132 kB
Rss: 4 kB
Pss: 4 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 4 kB
Referenced: 4 kB
Anonymous: 4 kB
AnonHugePages: 0 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB
...
b7570000-b7714000 r-xp 00000000 08:01 34450 /lib/i386-linux-gnu/libc-2.15.so
Size: 1680 kB
Rss: 300 kB
Pss: 7 kB
Shared_Clean: 300 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 300 kB
Anonymous: 0 kB
AnonHugePages: 0 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB
...
b7739000-b7759000 r-xp 00000000 08:01 33401 /lib/i386-linux-gnu/ld-2.15.so
Size: 128 kB
Rss: 104 kB
Pss: 3 kB
Shared_Clean: 104 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 104 kB
Anonymous: 0 kB
AnonHugePages: 0 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB
...
bfb41000-bfb62000 rw-p 00000000 00:00 0 [stack]
Size: 136 kB
Rss: 12 kB
Pss: 12 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 12 kB
Referenced: 12 kB
Anonymous: 12 kB
AnonHugePages: 0 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB
So I see that the size of my binary's mapping is 40 KB, but only 28 KB of it is actually resident. Does that mean this project will fit into 256 KB of RAM?
The heap size is 132 KB, but only 4 KB is used. Why is that? I'm sure it will be different on the small embedded platform.
The stack is 136 KB, but only 12 KB is used.
GLIBC/LD obviously consume some memory, but how much will that be on the embedded platform?
I do not look at PSS because it doesn't make sense in my case; I look only at RSS.
What conclusions can I draw from this picture? How exactly should I evaluate the application's memory consumption? Look at the RSS of the process? Or subtract the RSS of all mapped system libraries from it? What about the heap/stack sizes?
I would be very grateful for any advice, notes, or memory-consumption optimization techniques - DOs and DON'Ts for platforms with extremely small amounts of RAM (beyond the obvious: keep the amount of code and data to the very minimum).
I would also appreciate an explanation of WHY a program with a small amount of code and data (and which doesn't allocate much memory) still consumes so much RSS.
Thank you in advance.
... fit our program into 256 KB of RAM. I don't have the exact platform at hand, so I compile the project under 32-bit x86 Linux...
What you are seeing now is that the Linux platform tools make reasonable assumptions about your likely stack and heap needs, given that they know you run on a big machine, and they link in a reasonable set of library functions for your needs. Some you won't need, but you get them "for free".
To fit in 256 KB on your target platform, you must compile for your target platform and link with the target platform's libraries (and CRT) using the target platform's linker.
Those will make different assumptions, use possibly smaller library footprints, make smaller assumptions about stack and heap space, etc. For example, create "Hello World" for the target platform and check its needs on that target platform. Or use a realistic simulator of the target platform and libraries (and, not to forget, the OS, which partly dictates what the libraries must do).
And if it is then still too big, you have to rewrite or tweak the whole CRT and all the libraries...
The program needs to be compiled/linked with the embedded device in mind.
For best results, use a makefile.
Use the 'rt' library written for the embedded device.
Use the start.s file, located via the makefile, where execution begins.
Use 'static' in the linker parameters.
Use linker parameters that exclude every library not specifically requested.
Do not use libraries written for your development machine; only use libraries written for the embedded device.
Do NOT include stdio.h, etc., unless specifically written for the embedded device.
Do NOT call printf() inside a signal handler.
If possible, do not call printf() at all.
Instead, write a small char output function and have it perform the output through the UART (see the sketch after this list).
Do not use signals; use interrupts instead.
The resulting application will not run on your PC but, once loaded, will run on the 256 KB device.
Do not call sleep(); instead write your own function that uses a device timer peripheral, sets the timer, and puts the device into power-down mode.
The timer interrupt needs to bring the device out of power-down mode.
In the makefile, specifically set the sizes of the stack, heap, etc.
Have the link step output a .map file; study that map file until you understand everything in it.
Use a compiler/linker that is specific to the embedded device.
You will probably need to include a function that initializes the peripherals on the device, like the clock, the UART, the timer, the watchdog, and any other built-in peripherals that the code actually uses.
You will need a file that allocates the interrupt table, and a small function to handle each interrupt, even though most of those functions will do nothing beyond clearing the appropriate interrupt-pending flag and returning from the interrupt.
You will probably need a function to periodically refresh the watchdog, conditionally, depending on an indication that the main function is still cycling regularly; i.e., the main loop and the initialization function will refresh the watchdog.
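Here is a minimal sketch of the small char output function suggested above. The UART register addresses and status bit are hypothetical - take the real ones from your device's datasheet; only the polling pattern is the point:
#include <stdint.h>

/* Hypothetical memory-mapped UART registers - replace with your device's */
#define UART_DATA     (*(volatile uint8_t *)0x40001000u) /* TX data register */
#define UART_STATUS   (*(volatile uint8_t *)0x40001004u) /* status register */
#define UART_TX_READY 0x01u                              /* "transmitter empty" bit */

static void uart_putc(char c)
{
    while (!(UART_STATUS & UART_TX_READY))
        ;                          /* busy-wait until the transmitter is free */
    UART_DATA = (uint8_t)c;
}

void uart_puts(const char *s)      /* tiny printf substitute: no heap, no stdio */
{
    while (*s)
        uart_putc(*s++);
}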
