I have done a sched_setaffinity test in Linux in a server with 1 socket ,4 cores ,
the following /proc/cpuinfo showes the cpu information :
processor : 0
model name : Intel(R) Core(TM)2 Quad CPU Q8400 # 2.66GHz
cache size : 2048 KB
physical id : 0
siblings : 4
cpu cores : 4
processor : 1
model name : Intel(R) Core(TM)2 Quad CPU Q8400 # 2.66GHz
cache size : 2048 KB
physical id : 0
siblings : 4
cpu cores : 4
processor : 2
model name : Intel(R) Core(TM)2 Quad CPU Q8400 # 2.66GHz
cache size : 2048 KB
physical id : 0
siblings : 4
cpu cores : 4
processor : 3
model name : Intel(R) Core(TM)2 Quad CPU Q8400 # 2.66GHz
cache size : 2048 KB
physical id : 0
siblings : 4
cpu cores : 4
I have a simple test application :
struct foo {
int x;
int y;
} ;
//globar var
volatile struct foo fvar ;
pid_t gettid( void )
{
return syscall( __NR_gettid );
}
void *test_func0(void *arg)
{
int proc_num = (int)(long)arg;
cpu_set_t set;
CPU_ZERO( &set );
CPU_SET( proc_num, &set );
printf("proc_num=(%d)\n",proc_num) ;
if (sched_setaffinity( gettid(), sizeof( cpu_set_t ), &set ))
{
perror( "sched_setaffinity" );
return NULL;
}
int i=0;
for(i=0;i<1000000000;++i){
__sync_fetch_and_add(&fvar.x,1);
}
return NULL;
} //test_func0
compiled :
gcc testsync.c -D_GNU_SOURCE -lpthread -o testsync.exe
The following is the test results :
2 threads running test_func0 in core 0,1 take 35 secs ;
2 threads running test_func0 in core 0,2 take 55 secs ;
2 threads running test_func0 in core 0,3 take 55 secs ;
2 threads running test_func0 in core 1,2 take 55 secs ;
2 threads running test_func0 in core 1,3 take 55 secs ;
2 threads running test_func0 in core 2,3 take 35 secs ;
I wonder why 2 threads running in core (0,1) or in core(2,3) would be much
faster in others ? if I running 2 threads at the same core , like core(1,1) ,
core(2,2),core(3,3) , that would be take 28 secs , also confused why this happen ?
Cores 0 and 1 share an L2 cache, and so do cores 2 and 3. Running on two cores that share the cache makes the shared variable stay in the L2 cache, which makes things faster.
This is not true in today's Intel processors, where L2 is per core. But on the CPU you're using, this is how it works (it's actually a quad-core CPU made by gluing together two dual-core CPUs).
Related
5.13
We made tweaks on my.cnf according to mysql tuner, and tuning primer,
but i can't really figure out when the memory will be stabilize usage of mariadb.
Here mysql tuner result -
------- Performance Metrics -----------------------------------------------------------------------
[--] Up for: 5d 8h 19m 28s (226M q [491.299 qps], 1M conn, TX: 4653G, RX: 49G)
[--] Reads / Writes: 95% / 5%
[--] Binary logging is disabled
[--] Physical Memory : 125.3G
[--] Max MySQL memory : 67.1G
[--] Other process memory: 0B
[--] Total buffers: 15.4G global + 264.8M per thread (200 max threads)
[--] P_S Max memory usage: 0B
[--] Galera GCache Max memory usage: 0B
[OK] Maximum reached memory usage: 34.3G (27.34% of installed RAM)
[OK] Maximum possible memory usage: 67.1G (53.54% of installed RAM)
[OK] Overall possible memory usage with other process is compatible with memory available
[OK] Slow queries: 0% (78/226M)
[OK] Highest usage of available connections: 36% (73/200)
[OK] Aborted connections: 0.02% (447/1823123)
[!!] name resolution is active : a reverse name resolution is made for each new connection and can reduce performance
[OK] Query cache is disabled by default due to mutex contention on multiprocessor machines.
[OK] Sorts requiring temporary tables: 0% (4K temp sorts / 56M sorts)
[!!] Joins performed without indexes: 815533
[!!] Temporary tables created on disk: 87% (26M on disk / 30M total)
[OK] Thread cache hit rate: 99% (73 created / 1M connections)
[OK] Table cache hit rate: 99% (286M hits / 287M requests)
[OK] table_definition_cache(37000) is upper than number of tables(36204)
[OK] Open file limit used: 69% (51K/74K)
[OK] Table locks acquired immediately: 99% (170M immediate / 170M locks)
Here tuning primer -
WORKER THREADS
Current thread_cache_size = 200
Current threads_cached = 72
Current threads_per_sec = 0
Historic threads_per_sec = 0
Your thread_cache_size is fine
MAX CONNECTIONS
Current max_connections = 200
Current threads_connected = 1
Historic max_used_connections = 73
The number of used connections is 36% of the configured maximum.
Your max_connections variable seems to be fine.
INNODB STATUS
Current InnoDB index space = 1.46 G
Current InnoDB data space = 10.79 G
Current InnoDB buffer pool free = 24 %
Current innodb_buffer_pool_size = 15.00 G
Depending on how much space your innodb indexes take up it may be safe
to increase this value to up to 2 / 3 of total system memory
MEMORY USAGE
Max Memory Ever Allocated : 15.76 G
Configured Max Per-thread Buffers : 1.71 G
Configured Max Global Buffers : 15.14 G
Configured Max Memory Limit : 16.85 G
Physical Memory : 125.34 G
Max memory limit seem to be within acceptable norms
Here a sample of resources from top since start up until 5 days and 8 hours uptime of mariadb -
mysql 20 0 21.4g 4.1g 24576 S 4.7 3.3 0:08.90 mariadbd
mysql 20 0 21.4g 4.8g 25728 S 4.7 3.8 2:13.62 mariadbd
mysql 20 0 25.9g 8.7g 26480 S 15.0 6.9 56:52.91 mariadbd
mysql 20 0 26.2g 8.8g 26480 S 3.7 7.0 58:37.38 mariadbd
mysql 20 0 26.7g 9.3g 26480 S 10.6 7.4 66:38.53 mariadbd
mysql 20 0 26.7g 9.3g 26480 S 6.3 7.4 76:34.20 mariadbd
mysql 20 0 26.9g 9.6g 26480 S 10.0 7.7 80:31.37 mariadbd
mysql 20 0 26.9g 9.8g 26480 S 10.3 7.8 95:36.03 mariadbd
mysql 20 0 27.4g 10.2g 26480 S 8.0 8.1 96:28.67 mariadbd
mysql 20 0 27.7g 12.4g 26496 S 2.7 9.9 127:20.78 mariadbd
mysql 20 0 27.7g 15.0g 26496 S 11.3 12.0 129:35.15 mariadbd
mysql 20 0 27.7g 16.3g 26492 S 4.7 13.0 130:34.33 mariadbd
mysql 20 0 27.7g 16.3g 26492 S 22.9 13.0 135:47.21 mariadbd
mysql 20 0 27.7g 16.4g 26460 S 13.3 13.1 182:57.25 mariadbd
mysql 20 0 27.7g 16.9g 26436 S 5.0 13.5 186:18.78 mariadbd
mysql 20 0 28.2g 17.4g 26436 S 27.9 13.9 220:29.14 mariadbd
mysql 20 0 28.2g 17.4g 26436 S 10.3 13.9 240:20.25 mariadbd
mysql 20 0 28.6g 17.8g 26436 S 15.9 14.2 242:50.85 mariadbd
mysql 20 0 28.6g 17.9g 26436 S 6.6 14.2 244:48.40 mariadbd
mysql 20 0 29.0g 18.3g 26436 S 5.0 14.6 271:35.53 mariadbd
mysql 20 0 29.4g 18.5g 26480 S 11.6 14.7 327:44.08 mariadbd
mysql 20 0 29.8g 18.9g 26480 S 21.6 15.1 328:47.99 mariadbd
mysql 20 0 29.8g 18.9g 26480 S 3.7 15.1 340:27.10 mariadbd
mysql 20 0 29.8g 18.9g 26480 S 7.0 15.1 360:15.26 mariadbd
mysql 20 0 30.2g 19.6g 26460 S 3.6 15.7 397:31.67 mariadbd
mysql 20 0 31.4g 20.1g 26460 S 6.6 16.1 424:18.82 mariadbd
mysql 20 0 31.8g 20.6g 26460 S 1.7 16.5 434:33.15 mariadbd
mysql 20 0 31.8g 20.7g 26460 S 12.3 16.5 474:25.23 mariadbd
mysql 20 0 32.1g 21.1g 26444 S 9.6 16.9 501:38.46 mariadbd
mysql 20 0 32.1g 21.1g 26448 S 14.0 16.9 509:05.98
mysql 20 0 32.5g 21.6g 26448 S 10.3 17.2 524:06.26 mariadbd
mysql 20 0 33.0g 22.0g 26448 S 7.0 17.6 527:11.55 mariadbd
mysql 20 0 33.5g 22.5g 25612 S 29.9 17.9 578:12.81 mariadbd
mysql 20 0 33.5g 22.5g 25612 S 2.3 18.0 580:04.96 mariadbd
mysql 20 0 34.4g 23.4g 25044 S 17.9 18.7 647:25.39 mariadbd
mysql 20 0 35.3g 24.5g 25272 S 10.0 19.5 753:34.30 mariadbd
mysql 20 0 35.7g 24.9g 25272 S 10.0 19.9 766:17.57 mariadbd
if im trying to determine the memory usage via service status the following result -
service mariadb status
Redirecting to /bin/systemctl status mariadb.service
● mariadb.service - MariaDB 10.5.13 database server
Active: active (running) since Mon 2022-01-03 03:03:15 IST; 5 days ago
Docs: man:mariadbd(8)
https://mariadb.com/kb/en/library/systemd/
Main PID: (mariadbd)
Status: "Taking your SQL requests now..."
Tasks: 87 (limit: 820947)
Memory: 37.4G
now i just see so many indiffrent current memory usage, and for example tuning primer returns -
Configured Max Memory Limit : 16.85 G
but mysql tuner returns -
[OK] Maximum reached memory usage: 34.3G (27.34% of installed RAM)
[OK] Maximum possible memory usage: 67.1G (53.54% of installed RAM)
attaching my.cnf configurations -
performance-schema=0
#innodb_buffer_pool_size=134217728
max_allowed_packet=268435456
open_files_limit=74000
innodb_file_per_table=1
unix_socket=OFF
max_connections=200
#wait_timeout=600
#interactive_timeout=600
##
innodb_buffer_pool_size=15G
max_heap_table_size=128M
tmp_table_size=128M
#max_connections=400
table_open_cache=37000
table_definition_cache=37000
thread_cache_size=200
key_buffer_size=128M
sort_buffer_size=1M
read_buffer_size=4M
read_rnd_buffer_size=512k
join_buffer_size=3M
##
But as you can see, top command returns 24.9gb ram usage, and service status returns 37.4GB ram usage? and tuning primer saying max memory is 16.85GB, but mysql tuner says 67.1GB?
I'm completly lost in this one..
when the memory will build up until getting a stabilize memory usage?
why tuning primer, and mysql tuner memory usage is so different?from the current usage of the server itself for mariadb service?
Could anyone share his tought regards it?
Thanks!
Edit -
Hey thanks for the reply, it just keep increasing after 7 and half days -
mysql 20 0 36.2g 25.4g 25280 S 10.3 20.3 811:15.91 mariadbd
mysql 20 0 36.7g 25.8g 25296 S 5.0 20.6 840:23.93 mariadbd
mysql 20 0 37.1g 26.3g 25296 S 3.3 20.9 843:11.06 mariadbd
mysql 20 0 37.1g 26.3g 25296 S 7.6 21.0 846:27.98 mariadbd
mysql 20 0 37.1g 26.5g 25296 S 14.9 21.1 898:34.45 mariadbd
mysql 20 0 37.2g 26.5g 25272 S 13.0 21.1 933:06.45 mariadbd
mysql 20 0 37.2g 26.5g 25260 S 25.0 21.2 956:38.96 mariadbd
mysql 20 0 37.6g 26.9g 25260 S 4.3 21.5 992:54.48 mariadbd
mysql 20 0 38.0g 27.4g 25252 S 17.5 21.9 1068:31 mariadbd
mariadb.service - MariaDB 10.5.13 database server
Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/mariadb.service.d
└─migrated-from-my.cnf-settings.conf
Active: active (running) since Mon 2022-01-03 03:03:15 IST; 1 weeks 0 days ago
Docs: man:mariadbd(8)
https://mariadb.com/kb/en/library/systemd/
Main PID: 4105634 (mariadbd)
Status: "Taking your SQL requests now..."
Tasks: 86 (limit: 820947)
Memory: 41.0G
total used free shared buff/cache available
Mem: 128350 35498 9313 2356 83538 89236
Swap: 0 0 0
systemd service status 37.4G looks consistent with the 35.7g figure from top (GiB vs GB perhaps explains the differences).
The 24.9G in top is the amount paged it. The difference to 37.4G is memory not paged in (or perhaps swapped out).
Tuning primer's 16.58G looks like it hasn't accounted all threads memory usage
mysqltuner's maximum looks like it has taken the pessimistic allocation of the maximum possible per thread allocation, which isn't occurring because your workload evidently doesn't to all the things that would get to this figure.
If your 5 days of uptime is pretty consistent to your workload, then 35/37G looks to be about the maximum used.
The thing that would make it rise from this point is if your 24% free buffer pool gets used. Your tuning primer shows 24% free which on 15G gives another 3.6G of potentially memory used on our current workload.
MySQL Tuner full from around 3h ago -
-------- Performance Metrics -----------------------------------------------------------------------
[--] Up for: 7d 9h 28m 52s (316M q [495.709 qps], 2M conn, TX: 6533G, RX: 69G)
[--] Reads / Writes: 95% / 5%
[--] Binary logging is disabled
[--] Physical Memory : 125.3G
[--] Max MySQL memory : 67.1G
[--] Other process memory: 0B
[--] Total buffers: 15.4G global + 264.8M per thread (200 max threads)
[--] P_S Max memory usage: 0B
[--] Galera GCache Max memory usage: 0B
[OK] Maximum reached memory usage: 34.3G (27.34% of installed RAM)
[OK] Maximum possible memory usage: 67.1G (53.54% of installed RAM)
[OK] Overall possible memory usage with other process is compatible with memory available
[OK] Slow queries: 0% (111/316M)
[OK] Highest usage of available connections: 36% (73/200)
[OK] Aborted connections: 0.02% (605/2516029)
[!!] name resolution is active : a reverse name resolution is made for each new connection and can reduce performance
[OK] Query cache is disabled by default due to mutex contention on multiprocessor machines.
[OK] Sorts requiring temporary tables: 0% (6K temp sorts / 78M sorts)
[!!] Joins performed without indexes: 1135010
[!!] Temporary tables created on disk: 87% (36M on disk / 41M total)
[OK] Thread cache hit rate: 99% (73 created / 2M connections)
[OK] Table cache hit rate: 99% (398M hits / 400M requests)
[OK] table_definition_cache(37000) is upper than number of tables(36425)
[OK] Open file limit used: 69% (51K/74K)
[OK] Table locks acquired immediately: 99% (242M immediate / 242M locks)
-------- Performance schema ------------------------------------------------------------------------
[--] Performance schema is disabled.
[--] Memory used by P_S: 0B
[--] Sys schema isn't installed.
-------- ThreadPool Metrics ------------------------------------------------------------------------
[--] ThreadPool stat is enabled.
[--] Thread Pool Size: 36 thread(s).
[--] Using default value is good enough for your version (10.5.13-MariaDB)
-------- MyISAM Metrics ----------------------------------------------------------------------------
[OK] Key buffer used: 99.7% (127.6M used / 128.0M cache)
[OK] Key buffer size / total MyISAM indexes: 128.0M/1.3G
[OK] Read Key buffer hit rate: 99.9% (3B cached / 1M reads)
[!!] Write Key buffer hit rate: 43.1% (31M cached / 13M writes)
-------- InnoDB Metrics ----------------------------------------------------------------------------
[--] InnoDB is enabled.
[--] InnoDB Thread Concurrency: 0
[OK] InnoDB File per table is activated
[OK] InnoDB buffer pool / data size: 15.0G/12.2G
[!!] Ratio InnoDB log file size / InnoDB Buffer pool size (0.625 %): 96.0M * 1/15.0G should be equal to 25%
[--] Number of InnoDB Buffer Pool Chunk : 120 for 1 Buffer Pool Instance(s)
[OK] Innodb_buffer_pool_size aligned with Innodb_buffer_pool_chunk_size & Innodb_buffer_pool_instances
[OK] InnoDB Read buffer efficiency: 99.99% (7707381322 hits/ 7708055862 total)
[!!] InnoDB Write Log efficiency: 335.3% (2710443 hits/ 808371 total)
[OK] InnoDB log waits: 0.00% (0 waits / 3518814 writes)
-------- Aria Metrics ------------------------------------------------------------------------------
[--] Aria Storage Engine is enabled.
[OK] Aria pagecache size / total Aria indexes: 128.0M/896.0K
[OK] Aria pagecache hit rate: 97.8% (1B cached / 30M reads)
-------- TokuDB Metrics ----------------------------------------------------------------------------
[--] TokuDB is disabled.
-------- XtraDB Metrics ----------------------------------------------------------------------------
[--] XtraDB is disabled.
-------- Galera Metrics ----------------------------------------------------------------------------
[--] Galera is disabled.
-------- Replication Metrics -----------------------------------------------------------------------
[--] Galera Synchronous replication: NO
[--] No replication slave(s) for this server.
[--] Binlog format: MIXED
[--] XA support enabled: ON
[--] Semi synchronous replication Master: OFF
[--] Semi synchronous replication Slave: OFF
[--] This is a standalone server
Heres tops -
top - 17:06:15 up 67 days, 14:15, 1 user, load average: 2.88, 2.79, 2.77
Tasks: 560 total, 3 running, 557 sleeping, 0 stopped, 0 zombie
%Cpu(s): 4.0 us, 0.6 sy, 0.1 ni, 95.3 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
MiB Mem : 128350.7 total, 7895.4 free, 35970.4 used, 84484.8 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 89024.2 avail Mem
nothing except mariadb using alot of memory, some LSPHP proccesses spawn and go as for php requests to sites hosted within the server but they using memory only for proccesing the requests and then releasing it..
but overall cpu average is around 3-12% cpu usage, so server not under heavy load.
Server specs -
Intel xeon w-2295
software raid 1 with 2 NVMEs hard drives
OS - Cloudlinux 8.5 (based of almalinux 8.5 + redhat)
server using cPanel with mariadb 10.5.3 of cPanel.
128GB ram ecc memory
SELECT COUNT() FROM information_schema.tables;
COUNT()
94
Proccess list -
Show MySQL Processes
Id User Host db Command Time State Info Progress
root localhost NULL Query 0 starting SHOW PROCESSLIST 0.000
sometimes you see around 4-14 proccessess in the proccess lists.
Variable_name
Value
alter_algorithm
DEFAULT
analyze_sample_percentage
100.000000
aria_block_size
8192
aria_checkpoint_interval
30
aria_checkpoint_log_activity
1048576
aria_encrypt_tables
OFF
aria_force_start_after_recovery_failures
0
aria_group_commit
none
aria_group_commit_interval
0
aria_log_file_size
1073741824
aria_log_purge_type
immediate
aria_max_sort_file_size
9223372036853727232
aria_page_checksum
ON
aria_pagecache_age_threshold
300
aria_pagecache_buffer_size
134217728
aria_pagecache_division_limit
100
aria_pagecache_file_hash_size
512
aria_recover_options
BACKUP,QUICK
aria_repair_threads
1
aria_sort_buffer_size
268434432
aria_stats_method
nulls_unequal
aria_sync_log_dir
NEWFILE
aria_used_for_temp_tables
ON
auto_increment_increment
1
auto_increment_offset
1
autocommit
ON
automatic_sp_privileges
ON
back_log
90
basedir
/usr/
big_tables
OFF
bind_address
binlog_annotate_row_events
ON
binlog_cache_size
32768
binlog_checksum
CRC32
binlog_commit_wait_count
0
binlog_commit_wait_usec
100000
binlog_direct_non_transactional_updates
OFF
binlog_file_cache_size
16384
binlog_format
MIXED
binlog_optimize_thread_scheduling
ON
binlog_row_image
FULL
binlog_row_metadata
NO_LOG
binlog_stmt_cache_size
32768
bulk_insert_buffer_size
8388608
character_set_client
latin1
character_set_connection
latin1
character_set_database
latin1
character_set_filesystem
binary
character_set_results
latin1
character_set_server
latin1
character_set_system
utf8
character_sets_dir
/usr/share/mysql/charsets/
check_constraint_checks
ON
collation_connection
latin1_swedish_ci
collation_database
latin1_swedish_ci
collation_server
latin1_swedish_ci
column_compression_threshold
100
column_compression_zlib_level
6
column_compression_zlib_strategy
DEFAULT_STRATEGY
column_compression_zlib_wrap
OFF
completion_type
NO_CHAIN
concurrent_insert
AUTO
connect_timeout
10
core_file
OFF
datadir
/var/lib/mysql/
date_format
%Y-%m-%d
datetime_format
%Y-%m-%d %H:%i:%s
deadlock_search_depth_long
15
deadlock_search_depth_short
4
deadlock_timeout_long
50000000
deadlock_timeout_short
10000
debug_no_thread_alarm
OFF
default_password_lifetime
0
default_regex_flags
default_storage_engine
InnoDB
default_tmp_storage_engine
default_week_format
0
delay_key_write
ON
delayed_insert_limit
100
delayed_insert_timeout
300
delayed_queue_size
1000
disconnect_on_expired_password
OFF
div_precision_increment
4
encrypt_binlog
OFF
encrypt_tmp_disk_tables
OFF
encrypt_tmp_files
OFF
enforce_storage_engine
eq_range_index_dive_limit
200
event_scheduler
OFF
expensive_subquery_limit
100
expire_logs_days
0
explicit_defaults_for_timestamp
OFF
extra_max_connections
1
extra_port
0
flush
OFF
flush_time
0
foreign_key_checks
ON
ft_boolean_syntax
+ -><()~*:""&|
ft_max_word_len
84
ft_min_word_len
4
Variable_name
Value
ft_query_expansion_limit
20
ft_stopword_file
(built-in)
general_log
OFF
general_log_file
servername.log
group_concat_max_len
1048576
gtid_binlog_pos
gtid_binlog_state
gtid_cleanup_batch_size
64
gtid_current_pos
gtid_domain_id
0
gtid_ignore_duplicates
OFF
gtid_pos_auto_engines
gtid_slave_pos
gtid_strict_mode
OFF
have_compress
YES
have_crypt
YES
have_dynamic_loading
YES
have_geometry
YES
have_openssl
YES
have_profiling
YES
have_query_cache
YES
have_rtree_keys
YES
have_ssl
DISABLED
have_symlink
YES
histogram_size
254
histogram_type
DOUBLE_PREC_HB
host_cache_size
328
hostname
hostname
idle_readonly_transaction_timeout
0
idle_transaction_timeout
0
idle_write_transaction_timeout
0
ignore_builtin_innodb
OFF
ignore_db_dirs
in_predicate_conversion_threshold
1000
init_connect
init_file
init_slave
innodb_adaptive_flushing
ON
innodb_adaptive_flushing_lwm
10.000000
innodb_adaptive_hash_index
OFF
innodb_adaptive_hash_index_parts
8
innodb_adaptive_max_sleep_delay
0
innodb_autoextend_increment
64
innodb_autoinc_lock_mode
1
innodb_background_scrub_data_check_interval
0
innodb_background_scrub_data_compressed
OFF
innodb_background_scrub_data_interval
0
innodb_background_scrub_data_uncompressed
OFF
innodb_buf_dump_status_frequency
0
innodb_buffer_pool_chunk_size
134217728
innodb_buffer_pool_dump_at_shutdown
ON
innodb_buffer_pool_dump_now
OFF
innodb_buffer_pool_dump_pct
25
innodb_buffer_pool_filename
ib_buffer_pool
innodb_buffer_pool_instances
1
innodb_buffer_pool_load_abort
OFF
innodb_buffer_pool_load_at_startup
ON
innodb_buffer_pool_load_now
OFF
innodb_buffer_pool_size
16106127360
innodb_change_buffer_max_size
25
innodb_change_buffering
all
innodb_checksum_algorithm
full_crc32
innodb_cmp_per_index_enabled
OFF
innodb_commit_concurrency
0
innodb_compression_algorithm
zlib
innodb_compression_default
OFF
innodb_compression_failure_threshold_pct
5
innodb_compression_level
6
innodb_compression_pad_pct_max
50
innodb_concurrency_tickets
0
innodb_data_file_path
ibdata1:12M:autoextend
innodb_data_home_dir
innodb_deadlock_detect
ON
innodb_default_encryption_key_id
1
innodb_default_row_format
dynamic
innodb_defragment
OFF
innodb_defragment_fill_factor
0.900000
innodb_defragment_fill_factor_n_recs
20
innodb_defragment_frequency
40
innodb_defragment_n_pages
7
innodb_defragment_stats_accuracy
0
innodb_disable_sort_file_cache
OFF
innodb_disallow_writes
OFF
innodb_doublewrite
ON
innodb_encrypt_log
OFF
innodb_encrypt_tables
OFF
innodb_encrypt_temporary_tables
OFF
innodb_encryption_rotate_key_age
1
innodb_encryption_rotation_iops
100
innodb_encryption_threads
0
innodb_fast_shutdown
1
innodb_fatal_semaphore_wait_threshold
600
innodb_file_format
innodb_file_per_table
ON
innodb_fill_factor
100
innodb_flush_log_at_timeout
1
innodb_flush_log_at_trx_commit
1
innodb_flush_method
fsync
innodb_flush_neighbors
1
innodb_flush_sync
ON
Variable_name
Value
innodb_flushing_avg_loops
30
innodb_force_load_corrupted
OFF
innodb_force_primary_key
OFF
innodb_force_recovery
0
innodb_ft_aux_table
innodb_ft_cache_size
8000000
innodb_ft_enable_diag_print
OFF
innodb_ft_enable_stopword
ON
innodb_ft_max_token_size
84
innodb_ft_min_token_size
3
innodb_ft_num_word_optimize
2000
innodb_ft_result_cache_limit
2000000000
innodb_ft_server_stopword_table
innodb_ft_sort_pll_degree
2
innodb_ft_total_cache_size
640000000
innodb_ft_user_stopword_table
innodb_immediate_scrub_data_uncompressed
OFF
innodb_instant_alter_column_allowed
add_drop_reorder
innodb_io_capacity
200
innodb_io_capacity_max
2000
innodb_large_prefix
innodb_lock_schedule_algorithm
fcfs
innodb_lock_wait_timeout
50
innodb_log_buffer_size
16777216
innodb_log_checksums
ON
innodb_log_compressed_pages
ON
innodb_log_file_size
100663296
innodb_log_files_in_group
1
innodb_log_group_home_dir
./
innodb_log_optimize_ddl
OFF
innodb_log_write_ahead_size
8192
innodb_lru_flush_size
32
innodb_lru_scan_depth
1536
innodb_max_dirty_pages_pct
90.000000
innodb_max_dirty_pages_pct_lwm
0.000000
innodb_max_purge_lag
0
innodb_max_purge_lag_delay
0
innodb_max_purge_lag_wait
4294967295
innodb_max_undo_log_size
10485760
innodb_monitor_disable
innodb_monitor_enable
innodb_monitor_reset
innodb_monitor_reset_all
innodb_old_blocks_pct
37
innodb_old_blocks_time
1000
innodb_online_alter_log_max_size
134217728
innodb_open_files
36885
innodb_optimize_fulltext_only
OFF
innodb_page_cleaners
1
innodb_page_size
16384
innodb_prefix_index_cluster_optimization
OFF
innodb_print_all_deadlocks
OFF
innodb_purge_batch_size
300
innodb_purge_rseg_truncate_frequency
128
innodb_purge_threads
4
innodb_random_read_ahead
OFF
innodb_read_ahead_threshold
56
innodb_read_io_threads
4
innodb_read_only
OFF
innodb_replication_delay
0
innodb_rollback_on_timeout
OFF
innodb_scrub_log
OFF
innodb_scrub_log_speed
256
innodb_sort_buffer_size
1048576
innodb_spin_wait_delay
4
innodb_stats_auto_recalc
ON
innodb_stats_include_delete_marked
OFF
innodb_stats_method
nulls_equal
innodb_stats_modified_counter
0
innodb_stats_on_metadata
OFF
innodb_stats_persistent
ON
innodb_stats_persistent_sample_pages
20
innodb_stats_traditional
ON
innodb_stats_transient_sample_pages
8
innodb_status_output
OFF
innodb_status_output_locks
OFF
innodb_strict_mode
ON
innodb_sync_array_size
1
innodb_sync_spin_loops
30
innodb_table_locks
ON
innodb_temp_data_file_path
ibtmp1:12M:autoextend
innodb_thread_concurrency
0
innodb_thread_sleep_delay
0
innodb_tmpdir
innodb_undo_directory
./
innodb_undo_log_truncate
OFF
innodb_undo_logs
128
innodb_undo_tablespaces
0
innodb_use_atomic_writes
ON
innodb_use_native_aio
ON
innodb_version
10.5.13
innodb_write_io_threads
4
interactive_timeout
28800
join_buffer_size
3145728
join_buffer_space_limit
2097152
join_cache_level
2
keep_files_on_create
OFF
key_buffer_size
134217728
key_cache_age_threshold
300
key_cache_block_size
1024
Variable_name
Value
key_cache_division_limit
100
key_cache_file_hash_size
512
key_cache_segments
0
large_files_support
ON
large_page_size
0
large_pages
OFF
lc_messages
en_US
lc_messages_dir
lc_time_names
en_US
license
GPL
local_infile
ON
lock_wait_timeout
86400
locked_in_memory
OFF
log_bin
OFF
log_bin_basename
log_bin_compress
OFF
log_bin_compress_min_len
256
log_bin_index
log_bin_trust_function_creators
OFF
log_disabled_statements
sp
log_error
/var/lib/mysql/hostname.err
log_output
FILE
log_queries_not_using_indexes
OFF
log_slave_updates
OFF
log_slow_admin_statements
ON
log_slow_disabled_statements
sp
log_slow_filter
admin,filesort,filesort_on_disk,filesort_priority_...
log_slow_rate_limit
1
log_slow_slave_statements
ON
log_slow_verbosity
log_tc_size
24576
log_warnings
2
long_query_time
10.000000
low_priority_updates
OFF
lower_case_file_system
OFF
lower_case_table_names
0
master_verify_checksum
OFF
max_allowed_packet
268435456
max_binlog_cache_size
18446744073709547520
max_binlog_size
1073741824
max_binlog_stmt_cache_size
18446744073709547520
max_connect_errors
100
max_connections
200
max_delayed_threads
20
max_digest_length
1024
max_error_count
64
max_heap_table_size
134217728
max_insert_delayed_threads
20
max_join_size
18446744073709551615
max_length_for_sort_data
1024
max_password_errors
4294967295
max_prepared_stmt_count
16382
max_recursive_iterations
4294967295
max_relay_log_size
1073741824
max_rowid_filter_size
131072
max_seeks_for_key
4294967295
max_session_mem_used
9223372036854775807
max_sort_length
1024
max_sp_recursion_depth
0
max_statement_time
0.000000
max_tmp_tables
32
max_user_connections
0
max_write_lock_count
4294967295
metadata_locks_cache_size
1024
metadata_locks_hash_instances
8
min_examined_row_limit
0
mrr_buffer_size
262144
myisam_block_size
1024
myisam_data_pointer_size
6
myisam_max_sort_file_size
9223372036853727232
myisam_mmap_size
18446744073709551615
myisam_recover_options
BACKUP,QUICK
myisam_repair_threads
1
myisam_sort_buffer_size
134216704
myisam_stats_method
NULLS_UNEQUAL
myisam_use_mmap
OFF
mysql56_temporal_format
ON
net_buffer_length
16384
net_read_timeout
30
net_retry_count
10
net_write_timeout
60
old
OFF
old_alter_table
DEFAULT
old_mode
old_passwords
OFF
open_files_limit
74000
optimizer_max_sel_arg_weight
32000
optimizer_prune_level
1
optimizer_search_depth
62
optimizer_selectivity_sampling_limit
100
optimizer_switch
index_merge=on,index_merge_union=on,index_merge_so...
optimizer_trace
enabled=off
optimizer_trace_max_mem_size
1048576
optimizer_use_condition_selectivity
4
performance_schema
OFF
performance_schema_accounts_size
-1
performance_schema_digests_size
-1
performance_schema_events_stages_history_long_size
-1
performance_schema_events_stages_history_size
-1
performance_schema_events_statements_history_long_...
-1
Variable_name
Value
performance_schema_events_statements_history_size
-1
performance_schema_events_transactions_history_lon...
-1
performance_schema_events_transactions_history_siz...
-1
performance_schema_events_waits_history_long_size
-1
performance_schema_events_waits_history_size
-1
performance_schema_hosts_size
-1
performance_schema_max_cond_classes
90
performance_schema_max_cond_instances
-1
performance_schema_max_digest_length
1024
performance_schema_max_file_classes
80
performance_schema_max_file_handles
32768
performance_schema_max_file_instances
-1
performance_schema_max_index_stat
-1
performance_schema_max_memory_classes
320
performance_schema_max_metadata_locks
-1
performance_schema_max_mutex_classes
210
performance_schema_max_mutex_instances
-1
performance_schema_max_prepared_statements_instanc...
-1
performance_schema_max_program_instances
-1
performance_schema_max_rwlock_classes
50
performance_schema_max_rwlock_instances
-1
performance_schema_max_socket_classes
10
performance_schema_max_socket_instances
-1
performance_schema_max_sql_text_length
1024
performance_schema_max_stage_classes
160
performance_schema_max_statement_classes
222
performance_schema_max_statement_stack
10
performance_schema_max_table_handles
-1
performance_schema_max_table_instances
-1
performance_schema_max_table_lock_stat
-1
performance_schema_max_thread_classes
50
performance_schema_max_thread_instances
-1
performance_schema_session_connect_attrs_size
-1
performance_schema_setup_actors_size
-1
performance_schema_setup_objects_size
-1
performance_schema_users_size
-1
pid_file
/var/lib/mysql/servername.pid
plugin_dir
/usr/lib64/mysql/plugin/
plugin_maturity
gamma
port
3306
preload_buffer_size
32768
profiling
OFF
profiling_history_size
15
progress_report_time
5
protocol_version
10
proxy_protocol_networks
query_alloc_block_size
16384
query_cache_limit
1048576
query_cache_min_res_unit
4096
query_cache_size
1048576
query_cache_strip_comments
OFF
query_cache_type
OFF
query_cache_wlock_invalidate
OFF
query_prealloc_size
24576
range_alloc_block_size
4096
read_binlog_speed_limit
0
read_buffer_size
4194304
read_only
OFF
read_rnd_buffer_size
524288
relay_log
relay_log_basename
relay_log_index
relay_log_info_file
relay-log.info
relay_log_purge
ON
relay_log_recovery
OFF
relay_log_space_limit
0
replicate_annotate_row_events
ON
replicate_do_db
replicate_do_table
replicate_events_marked_for_skip
REPLICATE
replicate_ignore_db
replicate_ignore_table
replicate_wild_do_table
replicate_wild_ignore_table
report_host
report_password
report_port
3306
report_user
require_secure_transport
OFF
rowid_merge_buff_size
8388608
rpl_semi_sync_master_enabled
OFF
rpl_semi_sync_master_timeout
10000
rpl_semi_sync_master_trace_level
32
rpl_semi_sync_master_wait_no_slave
ON
rpl_semi_sync_master_wait_point
AFTER_COMMIT
rpl_semi_sync_slave_delay_master
OFF
rpl_semi_sync_slave_enabled
OFF
rpl_semi_sync_slave_kill_conn_timeout
5
rpl_semi_sync_slave_trace_level
32
secure_auth
ON
secure_file_priv
secure_timestamp
NO
server_id
1
session_track_schema
ON
session_track_state_change
OFF
session_track_system_variables
autocommit,character_set_client,character_set_conn...
session_track_transaction_info
OFF
skip_external_locking
ON
skip_name_resolve
OFF
skip_networking
OFF
Variable_name
Value
skip_show_database
OFF
slave_compressed_protocol
OFF
slave_ddl_exec_mode
IDEMPOTENT
slave_domain_parallel_threads
0
slave_exec_mode
STRICT
slave_load_tmpdir
/tmp
slave_max_allowed_packet
1073741824
slave_net_timeout
60
slave_parallel_max_queued
131072
slave_parallel_mode
optimistic
slave_parallel_threads
0
slave_parallel_workers
0
slave_run_triggers_for_rbr
NO
slave_skip_errors
OFF
slave_sql_verify_checksum
ON
slave_transaction_retries
10
slave_transaction_retry_errors
1158,1159,1160,1161,1205,1213,1429,2013,12701
slave_transaction_retry_interval
0
slave_type_conversions
slow_launch_time
2
slow_query_log
OFF
slow_query_log_file
servername-slow.log
socket
/var/lib/mysql/mysql.sock
sort_buffer_size
1048576
sql_auto_is_null
OFF
sql_big_selects
ON
sql_buffer_result
OFF
sql_if_exists
OFF
sql_log_bin
ON
sql_log_off
OFF
sql_mode
STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_...
sql_notes
ON
sql_quote_show_create
ON
sql_safe_updates
OFF
sql_select_limit
18446744073709551615
sql_slave_skip_counter
0
sql_warnings
OFF
ssl_ca
ssl_capath
ssl_cert
ssl_cipher
ssl_crl
ssl_crlpath
ssl_key
standard_compliant_cte
ON
storage_engine
InnoDB
stored_program_cache
256
strict_password_validation
ON
sync_binlog
0
sync_frm
ON
sync_master_info
10000
sync_relay_log
10000
sync_relay_log_info
10000
system_time_zone
IST
system_versioning_alter_history
ERROR
system_versioning_asof
DEFAULT
table_definition_cache
37000
table_open_cache
36885
table_open_cache_instances
1
tcp_keepalive_interval
0
tcp_keepalive_probes
0
tcp_keepalive_time
0
tcp_nodelay
ON
thread_cache_size
200
thread_handling
one-thread-per-connection
thread_pool_dedicated_listener
OFF
thread_pool_exact_stats
OFF
thread_pool_idle_timeout
60
thread_pool_max_threads
65536
thread_pool_oversubscribe
3
thread_pool_prio_kickup_timer
1000
thread_pool_priority
auto
thread_pool_size
36
thread_pool_stall_limit
500
thread_stack
299008
time_format
%H:%i:%s
time_zone
SYSTEM
tls_version
TLSv1.1,TLSv1.2,TLSv1.3
tmp_disk_table_size
18446744073709551615
tmp_memory_table_size
134217728
tmp_table_size
134217728
tmpdir
/tmp
Couldn't paste full global prams as stack limit but thats around 95% of it.
We have disabled swap in the previous my.cnf edit, as for mariadb even when there are alot of free memory within the system is still building swap until swap reached 100%(4GB of swap).
FreeRTOS priority 1 is special?
In my system,i have 6 priority 0-5. I know the idle pro at 0.
i assign a task at 1,others at 2-5.from the cpu time and IDEL pro info ,i could know the cpu have enough time to do all tasks.
I found a problem that the task at 1 could not work at the right time,the frequence is 10Hz,but i found sometimes it not work. maybe 8 or lower than 10,even lower than 1Hz.
when i set the task at 2,it's ok,work at 10Hz.
The code struct like this:
void SYS_MONITOR::run(){
int ret = 0;
while(1){
vTaskDelayUntil(&last_wake_time, SYS_MONITOR_RUN_INTERVAL_MS/portTICK_RATE_MS);
dosomething....
}
ID State Prio Mark CPU(%) Name
1 S 1 261 0.0000000 God
2 R 0 109 69.6136779 IDLE
3 S 5 470 3.9053585 Tmr Svc
...
...
44 B 2 179 0.0242588 SYS_MONITOR_run
Heap : Total 491520 , Used 193696 , Remain 297824
DmaHeap: Total 16384 , Used 2048 , Remain 14336
There is no enough information to answer this.
You have quite large setup , by looking at the number of tasks you have.
One thing :
1 S 1 261 0.0000000 God
.....
4 B 2 179 0.0242588 SYS_MONITOR_run
5 R 1 303 0.0142761 SYS_CLI_SERV_run
You have at least 2 Tasks with priority 1 there. If your SYS_MONITOR_run was 1 also and started working "better" after you bumped it's priority to 2 (higher) that is not surprising.
It depends on your scheduler configuration how the equal priority tasks get chance to run, e.g.: do you have time slicing round-robin or FIFO on equal priority tasks? That's one. ...
Two, you have complex setup (44 Tasks!) and too little info to really answer your question.
I am trying to get the physical address range of all the available ram in the system inside a linux kernel module.
I saw cat /proc/iomem and saw that the physical memory is itself not contiguous.
I understand that for 32bit systems compatibility there is PCI and other peripheral memory need to be inside the 4GB address range.
also the 640 kB initial for DOS.
below output is from x86_64 system
00000000-00000fff : reserved
00001000-0009d7ff : System RAM //640kB here
0009d800-0009ffff : reserved
000a0000-000bffff : PCI Bus 0000:00
000c0000-000cedff : Video ROM
000e0000-000fffff : reserved
000f0000-000fffff : System ROM
00100000-daa85fff : System RAM //~3.5 gb here
01000000-0177acb8 : Kernel code
0177acb9-01d1b53f : Kernel data
01e79000-01fbdfff : Kernel bss
daa86000-daa87fff : reserved
daa88000-dad0efff : System RAM //some RAM here
dad0f000-dae75fff : reserved
dae76000-dae95fff : ACPI Non-volatile Storage
dae96000-daf1efff : reserved
daf1f000-daf9efff : ACPI Non-volatile Storage
daf9f000-daffefff : ACPI Tables
dafff000-daffffff : System RAM //some RAM here
db000000-df9fffff : reserved
dba00000-df9fffff : Graphics Stolen Memory
dfa00000-feafffff : PCI Bus 0000:00
e0000000-efffffff : 0000:00:02.0
f0000000-f03fffff : 0000:00:02.0
f0400000-f04fffff : PCI Bus 0000:02
f0400000-f0403fff : 0000:02:00.0
f0400000-f0403fff : r8169
f0404000-f0404fff : 0000:02:00.0
f0404000-f0404fff : r8169
f0500000-f05fffff : PCI Bus 0000:01
f0500000-f0503fff : 0000:01:00.0
f0500000-f0503fff : bcma-pci-bridge
f0600000-f0603fff : 0000:00:1b.0
f0600000-f0603fff : ICH HD audio
f0604000-f06040ff : 0000:00:1f.3
f0605000-f060500f : 0000:00:16.0
f0605000-f060500f : mei_me
f0608000-f06087ff : 0000:00:1f.2
f0608000-f06087ff : ahci
f0609000-f06093ff : 0000:00:1d.0
f0609000-f06093ff : ehci_hcd
f060a000-f060a3ff : 0000:00:1a.0
f060a000-f060a3ff : ehci_hcd
f8000000-fbffffff : PCI MMCONFIG 0000 [bus 00-3f]
f8000000-fbffffff : reserved
f8000000-fbffffff : pnp 00:05
fec00000-fec00fff : reserved
fec00000-fec003ff : IOAPIC 0
fed00000-fed003ff : HPET 0
fed00000-fed003ff : PNP0103:00
fed08000-fed08fff : reserved
fed10000-fed19fff : reserved
fed10000-fed17fff : pnp 00:05
fed18000-fed18fff : pnp 00:05
fed19000-fed19fff : pnp 00:05
fed1c000-fed1ffff : reserved
fed1c000-fed1ffff : pnp 00:05
fed1f410-fed1f414 : iTCO_wdt
fed20000-fed3ffff : pnp 00:05
fed40000-fed44fff : PCI Bus 0000:00
fed45000-fed8ffff : pnp 00:05
fed90000-fed93fff : pnp 00:05
fee00000-fee00fff : Local APIC
fee00000-fee00fff : reserved
ff000000-ffffffff : INT0800:00
ffd80000-ffffffff : reserved
100000000-15fdfffff : System RAM //~1.5 gB here
15fe00000-15fffffff : RAM buffer
My Question is .
1. how to get all of the RAM which can be used for DMA, using kernel code.
2. why is there extra RAM regions. ? also why the RAM is not split at some proper boundary for ex. 2GB +3GB.
3. will only the 3.5GB of area will be used for DMA or higher 1.5 GB can also be used for DMA. in linux.
There are a few commands that can be used from the linux terminal for this. Both will show the physical memory address range in your linux system.
cat /proc/meminfo: This will print the value in terminal as:
MemTotal: 8027952 kB
MemFree: 3893748 kB
Buffers: 132208 kB
Cached: 1666864 kB
SwapCached: 226556 kB
Active: 1979556 kB
Inactive: 1849480 kB
Active(anon): 1592580 kB
Inactive(anon): 886080 kB
Active(file): 386976 kB
Inactive(file): 963400 kB
Unevictable: 68 kB
Mlocked: 68 kB
SwapTotal: 15624188 kB
SwapFree: 15050964 kB
Dirty: 172 kB
Writeback: 0 kB
AnonPages: 1907548 kB
Mapped: 223484 kB
Shmem: 448696 kB
Slab: 140444 kB
SReclaimable: 101456 kB
SUnreclaim: 38988 kB
KernelStack: 4960 kB
PageTables: 53108 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 19638164 kB
Committed_AS: 7822876 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 567356 kB
VmallocChunk: 34359151824 kB
or vmstat -s: This will print value as:
8027952 K total memory
4114688 K used memory
1960100 K active memory
1849792 K inactive memory
3913264 K free memory
132240 K buffer memory
1667108 K swap cache
15624188 K total swap
573224 K used swap
15050964 K free swap
931285 non-nice user cpu ticks
6391 nice user cpu ticks
152567 system cpu ticks
7019826 idle cpu ticks
181109 IO-wait cpu ticks
19 IRQ cpu ticks
2262 softirq cpu ticks
There is one more command using dmidecode: you can use sudo dmidecode -t memory to check the details of ram in your linux system.
I'm trying calculate crc32 for multithread. I'm trying use OpenCL.
The GPU code is:
__kernel void crc32_Sarwate( __global int* lenghtIn,
__global unsigned char *In,
__global int *OutCrc32,
int size ) {
int i, j, len;
i = get_global_id( 0 );
if( i >= size )
return;
len = j = 0;
while( j != i )
len += lenghtIn[ j++ ];
OutCrc32[ i ] = crc32( In + len, lenghtIn[ i ] ); }
I received this results( time ) with a thousand repetitions:
for 4 using work-item: 29.82
for 8 using work-item: 29.9
for 16 using work-item: 35.16
for 32 using work-item: 35.93
for 64 using work-item: 38.69
for 128 using work-item: 52.83
for 256 using work-item: 152.08
for 512 using work-item: 333.63
I have intel HD Graphics with 350 MHz and 3 work-group with 256 work-item
each work-group.
I assumed that by increasing the number of work-item 128 to 256 happen slight increase in time, but time tripled. Why?
( I'm sorry for my very bad English ).
The
while( j != i )
len += lenghtIn[ j++ ];
part runs for get_global_id( 0 ) times.
When it is 128, the latest work item to complete is doing 128 loop iterations.
When it is 256, it is doing 256 iterations so it should be %100 increase from memory's point of view but only for the last work item. When we integrate all workers' total memory access numbers,
1 item from 0 to 0 ---> 1 access
2 item from 0 to 0 and 0 to 1 ---> 3 access
4 item from 0 to 0 and 0 to 1 and 0 to 2 and 0 to 3---> 10 access
8 items: SUM(1 to 8) => 36 accesses
16 items: SUM(1 to 16) => 136 accesses (even more than + %200)
32 items: => 528 (~ %400)
64 items: => 2080 ( ~%400)
128 items: => 8256 (~%400) (cache of your igpu starts failing here)
256 items: => 32896 (~400%) (now caching is saturated and you start )
( seeing %400 per doubling of work items)
512 => uses second compute unit too! But %400 work is done
so it is not only %200 time consuming.
so each time you increase work items by %100, you increase total memory
accesses to %400 . But caching helps up to some degree. When you cross that, memory accesses increase badly. Alse the execution overhead(drivers,..) becomes unimportant.
You are accessing to memory non-parallel. You need to cache it first but it may not be possible in that hardware so you should distribute the job equally among workitems and make memory accesses contiguous between cores(vectorize). This should give more performance.
For now, each vector unit does:
unit : v0 v1 v2 v3 v4 ... v7
read address: 0 0 0 0 0 0
- 1 1 1 1 1
- - 2 2 2 2
- - - 3 3 3
- - - - 4 4
....
- - - - - ... 7
done in 8 steps on 8 streaming cores.
At the last step, only single work item is actually computing something. This should be something like:
Some Optimization
unit : v0 v1 v2 v3 no need other work items
read address: 0 0 0 0 \
1 1 1 1 \
2 2 2 2 \
3 3 3 3 / this is 5th work item's work
4 4 4 4 /
5 5 5 0 \
6 6 0 1 \ this is 0 to 3 as 4th work
7 0 1 2 /
first item<-- 0 1 2 3 /
done in 8 steps in only 4 streaming cores and is doing same job for the first
half part(probably faster).
Further Optimization Suggestion
I think it would be better with a prefix-scan(sum) algorithm on another kernel before getting to crc32() part. (probably in just 3 steps for this example rather than 8 and also more efficient)
Using precomputed values of
while( j != i )
len += lenghtIn[ j++ ];
should make crc32 immune to the current algorithm complexity (O(n²)).
I've set KMP_AFFINITY to scatter but the execution time increased a lot!
That's why I think that OpenMP is spawning threads on only 1 core.
So I need something
that returns which core a thread is using at the moment.
This is the pragma I'm using before the for loop:
int procs = omp_get_num_procs();
#pragma omp parallel for num_threads(procs)\
shared (c, u, v, w, k, j, i, nx, ny) \
reduction(+: a, b, c, d, e, f, g, h, i)
And these are the exports I did:
export OMP_NUM_THREADS=5
export KMP_AFFINITY=verbose,scatter
If it helps I'm pasting also the verbose:
OMP: Info #149: KMP_AFFINITY: Affinity capable, using global cpuid instr info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7}
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #159: KMP_AFFINITY: 2 packages x 4 cores/pkg x 1 threads/core (8 total cores)
OMP: Info #160: KMP_AFFINITY: OS proc to physical thread map ([] => level not in map):
OMP: Info #168: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 [thread 0]
OMP: Info #168: KMP_AFFINITY: OS proc 4 maps to package 0 core 1 [thread 0]
OMP: Info #168: KMP_AFFINITY: OS proc 2 maps to package 0 core 2 [thread 0]
OMP: Info #168: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 [thread 0]
OMP: Info #168: KMP_AFFINITY: OS proc 1 maps to package 1 core 0 [thread 0]
OMP: Info #168: KMP_AFFINITY: OS proc 5 maps to package 1 core 1 [thread 0]
OMP: Info #168: KMP_AFFINITY: OS proc 3 maps to package 1 core 2 [thread 0]
OMP: Info #168: KMP_AFFINITY: OS proc 7 maps to package 1 core 3 [thread 0]
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {1}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {4}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {5}
OMP: Info #147: KMP_AFFINITY: Internal thread 4 bound to OS proc set {2}
OMP: Info #147: KMP_AFFINITY: Internal thread 5 bound to OS proc set {3}
OMP: Info #147: KMP_AFFINITY: Internal thread 6 bound to OS proc set {6}
OMP: Info #147: KMP_AFFINITY: Internal thread 7 bound to OS proc set {7}
thanks in advance!
As #user3018144 pointed out, sched_getcpu(3) is what can be used to obtain CPU number.
Consider the following code:
#define _GNU_SOURCE // sched_getcpu(3) is glibc-specific (see the man page)
#include <stdio.h>
#include <sched.h>
#include <omp.h>
int main() {
#pragma omp parallel
{
int thread_num = omp_get_thread_num();
int cpu_num = sched_getcpu();
printf("Thread %3d is running on CPU %3d\n", thread_num, cpu_num);
}
return 0;
}
This is my output without affinity:
$> OMP_NUM_THREADS=4 ./a.out | sort
Thread 0 is running on CPU 2
Thread 1 is running on CPU 0
Thread 2 is running on CPU 3
Thread 3 is running on CPU 1
And this is the output with affinity:
$> GOMP_CPU_AFFINITY='0,1,2,3' OMP_NUM_THREADS=4 ./a.out | sort
Thread 0 is running on CPU 0
Thread 1 is running on CPU 1
Thread 2 is running on CPU 2
Thread 3 is running on CPU 3
If you are on linux, you might use the function sched_getcpu(). Here is a link to explain how it works and its declaration:
http://man7.org/linux/man-pages/man3/sched_getcpu.3.html
Hope this can help