Netty pipelines not getting released from memory - NIO

I have a high-volume Netty server that keeps consuming memory. Using jmap, I've tracked it down to the fact that pipelines just seem to keep growing and growing (along with NIO sockets, etc.). It is as if the sockets never disconnect.
My initialization of the ServerBootstrap is:
ServerBootstrap bootstrap = new ServerBootstrap(new NioServerSocketChannelFactory(coreThreads, workThreads, Runtime.getRuntime().availableProcessors()*2));
bootstrap.setOption("child.keepAlive", false);
bootstrap.setOption("child.tcpNoDelay", true);
bootstrap.setPipelineFactory(new HttpChannelPipelineFactory(this, HttpServer.IdleTimer));
bootstrap.bind(new InetSocketAddress(host, port));
coreThreads and workThreads are java.util.concurrent.Executors.newCachedThreadPool().
IdleTimer is private static Timer IdleTimer = new HashedWheelTimer();
My pipeline factory is:
ChannelPipeline pipeline = Channels.pipeline();
pipeline.addLast("idletimer", new HttpIdleHandler(timer));
pipeline.addLast("decoder", new HttpRequestDecoder());
pipeline.addLast("aggregator", new HttpChunkAggregator(65536));
pipeline.addLast("encoder", new HttpResponseEncoder());
pipeline.addLast("chunkwriter", new ChunkedWriteHandler());
pipeline.addLast("http.handler", handler);
pipeline.addLast("http.closer", new HttpClose());
HttpIdleHandler is the basic stock idle handler given in the examples, except that it uses the "all" idle state. It doesn't get executed that often; the timeout is 500 milliseconds (half a second), and on idle it closes the channel. HttpClose() simply closes the channel for anything that makes it to the end of the pipeline, just in case the handler doesn't process it. It executes very rarely.
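(A minimal sketch of such a terminal close handler with the Netty 3.x API, as an illustration rather than the exact code from this application:)
// Hypothetical sketch of the HttpClose handler described above (Netty 3.x).
import org.jboss.netty.channel.ChannelHandlerContext;
import org.jboss.netty.channel.MessageEvent;
import org.jboss.netty.channel.SimpleChannelUpstreamHandler;
public class HttpClose extends SimpleChannelUpstreamHandler {
    @Override
    public void messageReceived(ChannelHandlerContext ctx, MessageEvent e) {
        // Anything that reaches the end of the pipeline just has its connection closed.
        e.getChannel().close();
    }
}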
Once I've sent the response in my handler (derived from SimpleChannelUpstreamHandler), I close the channel regardless of the keep-alive setting. I've verified that I'm closing channels by adding a listener to the ChannelFuture returned by close(), and isSuccess() in that listener is true.
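(For illustration, the write-then-close logic is roughly the following sketch using the Netty 3.x API; the class and method names here are made up, not the actual handler code:)
// Sketch: write the response, then close and verify the close succeeded (Netty 3.x).
import org.jboss.netty.channel.Channel;
import org.jboss.netty.channel.ChannelFuture;
import org.jboss.netty.channel.ChannelFutureListener;
import org.jboss.netty.handler.codec.http.HttpResponse;
final class ResponseCloser {
    static void writeAndClose(Channel channel, HttpResponse response) {
        channel.write(response).addListener(new ChannelFutureListener() {
            public void operationComplete(ChannelFuture writeFuture) {
                writeFuture.getChannel().close().addListener(new ChannelFutureListener() {
                    public void operationComplete(ChannelFuture closeFuture) {
                        // isSuccess() reports true here, as described above.
                        System.out.println("channel closed: " + closeFuture.isSuccess());
                    }
                });
            }
        });
    }
}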
Some examples from the jmap output (columns are rank, number of instances, size in bytes, classname):
3: 147168 7064064 java.util.HashMap$Entry
4: 90609 6523848 org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext
6: 19788 3554584 [Ljava.util.HashMap$Entry;
8: 49893 3193152 org.jboss.netty.handler.codec.http.HttpHeaders$Entry
11: 11326 2355808 org.jboss.netty.channel.socket.nio.NioAcceptedSocketChannel
24: 11326 996688 org.jboss.netty.handler.codec.http.HttpRequestDecoder
26: 22668 906720 org.jboss.netty.util.internal.LinkedTransferQueue
28: 5165 826400 [Lorg.jboss.netty.handler.codec.http.HttpHeaders$Entry;
30: 11327 815544 org.jboss.netty.channel.AbstractChannel$ChannelCloseFuture
31: 11326 815472 org.jboss.netty.channel.socket.nio.DefaultNioSocketChannelConfig
33: 12107 774848 java.util.HashMap
34: 11351 726464 org.jboss.netty.util.HashedWheelTimer$HashedWheelTimeout
36: 11327 634312 org.jboss.netty.channel.DefaultChannelPipeline
38: 11326 634256 org.jboss.netty.handler.timeout.IdleStateHandler$State
45: 10417 500016 org.jboss.netty.util.internal.LinkedTransferQueue$Node
46: 9661 463728 org.jboss.netty.util.internal.ConcurrentIdentityHashMap$HashEntry
47: 11326 453040 org.jboss.netty.handler.stream.ChunkedWriteHandler
48: 11326 453040 org.jboss.netty.channel.socket.nio.NioSocketChannel$WriteRequestQueue
51: 11326 362432 org.jboss.netty.handler.codec.http.HttpChunkAggregator
52: 11326 362432 org.jboss.netty.util.internal.ThreadLocalBoolean
53: 11293 361376 org.jboss.netty.handler.timeout.IdleStateHandler$AllIdleTimeoutTask
57: 4150 323600 [Lorg.jboss.netty.util.internal.ConcurrentIdentityHashMap$HashEntry;
58: 4976 318464 org.jboss.netty.handler.codec.http.DefaultHttpRequest
64: 11327 271848 org.jboss.netty.channel.SucceededChannelFuture
65: 11326 271824 org.jboss.netty.handler.codec.http.HttpResponseEncoder
67: 11326 271824 org.jboss.netty.channel.socket.nio.NioSocketChannel$WriteTask
73: 5370 214800 org.jboss.netty.channel.UpstreamMessageEvent
74: 5000 200000 org.jboss.netty.channel.AdaptiveReceiveBufferSizePredictor
81: 5165 165280 org.jboss.netty.handler.codec.http.HttpHeaders
84: 1562 149952 org.jboss.netty.handler.codec.http.DefaultCookie
96: 2048 98304 org.jboss.netty.util.internal.ConcurrentIdentityHashMap$Segment
98: 2293 91720 org.jboss.netty.buffer.BigEndianHeapChannelBuffer
What am I missing? What thread is responsible for releasing its reference to the pipeline (or socket, or channel) so that the garbage collector will collect this memory? There appears to be some large hash table holding on to them (I filtered several references to hash table entries out of the list above).

Unless you have a reference to a Channel, ChannelPipeline, or ChannelHandlerContext in your application, they should become unreachable as soon as the connection is closed. Please double-check whether your application is holding a reference to one of them somewhere. Sometimes an anonymous class is a good suspect, but a precise answer isn't possible without the heap dump file.
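(As a hypothetical example of the kind of leak to look for, a long-lived collection populated from a connect-time callback and never cleaned up on close would pin every Channel, and with it the whole pipeline:)
// Hypothetical anti-pattern: a registry that keeps channels reachable after close.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.jboss.netty.channel.Channel;
final class ConnectionRegistry {
    // Entries are added when a channel connects but never removed when it closes,
    // so the Channel -> DefaultChannelPipeline -> handler graph is never collected.
    static final Map<Integer, Channel> OPEN = new ConcurrentHashMap<Integer, Channel>();
    static void register(Channel channel) {
        OPEN.put(channel.getId(), channel);
        // Missing cleanup, e.g. channel.getCloseFuture().addListener(...) to remove the entry.
    }
}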

According to this answer: https://stackoverflow.com/a/12242390/8425783, there was an issue in Netty, and it was fixed in version 3.5.4.Final.
Netty issue: https://github.com/netty/netty/issues/520

Related

Declare a queue with x-max-length programmatically using Rabbitmq-c

I am implementing an RPC function for my C application and trying to programmatically declare a queue that limits the maximum number of pending messages. After reading the declarations of amqp_table_entry_t and amqp_field_value_t in amqp.h, here's my minimal code sample:
int default_channel_id = 1;
int passive = 0;
int durable = 1;
int exclusive = 0;
int auto_delete = 0;
amqp_table_entry_t *q_arg_n_elms = malloc(sizeof(amqp_table_entry_t));
*q_arg_n_elms = (amqp_table_entry_t) {.key = amqp_cstring_bytes("x-max-length"),
                                      .value = {.kind = AMQP_FIELD_KIND_U32, .value = {.u32 = 234 }}};
amqp_table_t q_arg_table = {.num_entries=1, .entries=q_arg_n_elms};
amqp_queue_declare( conn, default_channel_id, amqp_cstring_bytes("my_queue_123"),
                    passive, durable, exclusive, auto_delete, q_arg_table );
amqp_rpc_reply_t _reply = amqp_get_rpc_reply(conn);
The code above always returns AMQP_RESPONSE_LIBRARY_EXCEPTION in the amqp_rpc_reply_t object, with the error message "a socket error occurred". I don't see any active connection triggered by this code in RabbitMQ's web management UI, so I think the rabbitmq-c library doesn't establish a connection and just replies with an error.
However, everything works perfectly when I replace the argument q_arg_table with the default amqp_empty_table (which means no arguments).
Here are my questions :
Where can I find the code that filters out invalid queue-argument keys? According to this article, x-max-length should be the correct argument key for limiting the number of messages in a queue, but I cannot figure out why the library still reports an error.
Is there any example that demonstrates how to properly set up the amqp_table_t passed to amqp_queue_declare(...)?
Development environment:
RabbitMQ v3.2.4
rabbitmq-c v0.11.0
I appreciate any feedback, thanks for reading.
[Edit]
According to the server log rabbit@myhostname-sasl.log, the RabbitMQ broker accepted a new connection, hit a decode error on a received frame, and then closed the connection immediately. I haven't dug into the Erlang implementation, but the root cause is likely a decoding error on the table argument when declaring the queue.
=CRASH REPORT==== 18-May-2022::16:05:46 ===
crasher:
  initial call: rabbit_reader:init/2
  pid: <0.23706.1>
  registered_name: []
  exception error: no function clause matching
    rabbit_binary_parser:parse_field_value(<<105,0,0,1,44>>) (src/rabbit_binary_parser.erl, line 53)
    in function rabbit_binary_parser:parse_table/1 (src/rabbit_binary_parser.erl, line 44)
    in call from rabbit_framing_amqp_0_9_1:decode_method_fields/2 (src/rabbit_framing_amqp_0_9_1.erl, line 791)
    in call from rabbit_command_assembler:process/2 (src/rabbit_command_assembler.erl, line 85)
    in call from rabbit_reader:process_frame/3 (src/rabbit_reader.erl, line 688)
    in call from rabbit_reader:handle_input/3 (src/rabbit_reader.erl, line 738)
    in call from rabbit_reader:recvloop/2 (src/rabbit_reader.erl, line 292)
    in call from rabbit_reader:run/1 (src/rabbit_reader.erl, line 273)
  ancestors: [<0.23704.1>,rabbit_tcp_client_sup,rabbit_sup,<0.145.0>]
  messages: [{'EXIT',#Port<0.31561>,normal}]
  links: [<0.23704.1>]
  dictionary: [{{channel,1},
                {<0.23720.1>,{method,rabbit_framing_amqp_0_9_1}}},
               {{ch_pid,<0.23720.1>},{1,#Ref<0.0.20.156836>}}]
  trap_exit: true
  status: running
  heap_size: 2586
  stack_size: 27
  reductions: 2849
  neighbours:
RabbitMQ may not support unsigned integers as table values. That matches the crash report above: 105 is the ASCII code for 'i', the type tag rabbitmq-c writes for AMQP_FIELD_KIND_U32, and the broker's rabbit_binary_parser has no clause for it.
Instead, try using a signed 32- or 64-bit number (e.g., .value = {.kind = AMQP_FIELD_KIND_I32, .value = {.i32 = 234 }}).
The RabbitMQ server logs may also contain additional debugging information that helps with errors like this, and the amqp_error_string2 function can be used to translate a library error code into an error string.
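A minimal sketch of the suggested change, reusing the setup and variable names from the question (only the field kind and the error reporting differ):
/* Declare the queue with a signed 32-bit x-max-length; conn and the flags are
 * assumed to be set up exactly as in the question's snippet, and <stdio.h>
 * is needed for fprintf. */
amqp_table_entry_t *q_arg_n_elms = malloc(sizeof(amqp_table_entry_t));
*q_arg_n_elms = (amqp_table_entry_t){
    .key = amqp_cstring_bytes("x-max-length"),
    .value = {.kind = AMQP_FIELD_KIND_I32, .value = {.i32 = 234}}};
amqp_table_t q_arg_table = {.num_entries = 1, .entries = q_arg_n_elms};

amqp_queue_declare(conn, default_channel_id, amqp_cstring_bytes("my_queue_123"),
                   passive, durable, exclusive, auto_delete, q_arg_table);

amqp_rpc_reply_t reply = amqp_get_rpc_reply(conn);
if (reply.reply_type == AMQP_RESPONSE_LIBRARY_EXCEPTION) {
    /* amqp_error_string2() maps the library error code to a readable string. */
    fprintf(stderr, "queue.declare failed: %s\n",
            amqp_error_string2(reply.library_error));
}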

npm package error,<--- JS stacktrace ---> FATAL ERROR: invalid table size Allocation failed - JavaScript heap out of memory

Running my other React projects works, but some others run into this error. Increasing memory does not solve the issue, and clearing the cache is not fixing Node either. I am stuck.
Windows PowerShell
Copyright (C) Microsoft Corporation. All rights reserved.
Starting the development server...
<--- Last few GCs --->
[24628:0000025F59DB78F0] 10935 ms: Scavenge 318.7 (375.5) -> 318.7 (375.5) MB, 37.4 / 0.0 ms (average mu = 0.990, current mu = 0.984) allocation failure
[24628:0000025F59DB78F0] 13187 ms: Scavenge 510.7 (567.5) -> 510.7 (567.5) MB, 165.5 / 0.0 ms (average mu = 0.990, current mu = 0.984) allocation failure
[24628:0000025F59DB78F0] 18581 ms: Scavenge 894.7 (951.6) -> 894.7 (951.6) MB, 315.3 / 0.0 ms (average mu = 0.990, current mu = 0.984) allocation failure
<--- JS stacktrace --->
FATAL ERROR: invalid table size Allocation failed - JavaScript heap out of memory
1: 00007FF635DA7B7F v8::internal::CodeObjectRegistry::~CodeObjectRegistry+114079
2: 00007FF635D34546 DSA_meth_get_flags+65542
3: 00007FF635D353FD node::OnFatalError+301
4: 00007FF63666B29E v8::Isolate::ReportExternalAllocationLimitReached+94
5: 00007FF63665587D v8::SharedArrayBuffer::Externalize+781
6: 00007FF6364F8C4C v8::internal::Heap::EphemeronKeyWriteBarrierFromCode+1468
7: 00007FF635FC8D89 v8::internal::Isolate::FatalProcessOutOfHeapMemory+25
8: 00007FF63632D115 v8::internal::HashTable<v8::internal::NumberDictionary,v8::internal::NumberDictionaryShape>::EnsureCapacity<v8::internal::Isolate>+341
9: 00007FF63632AE66 v8::internal::Dictionary<v8::internal::NumberDictionary,v8::internal::NumberDictionaryShape>::Add<v8::internal::Isolate>+86
10: 00007FF6363C8595 v8::internal::FeedbackNexus::ic_state+32581
11: 00007FF6363C29F2 v8::internal::FeedbackNexus::ic_state+9122
12: 00007FF636375714 v8::internal::JSObject::AddDataElement+1092
13: 00007FF63633442B v8::internal::StringSet::Add+1835
14: 00007FF63637700C v8::internal::JSObject::DefineAccessor+1644
15: 00007FF6363764AB v8::internal::JSObject::AddProperty+3083
16: 00007FF63637667B v8::internal::JSObject::AddProperty+3547
17: 00007FF636240658 v8::internal::Runtime::GetObjectProperty+5064
18: 00007FF6366F8F91 v8::internal::SetupIsolateDelegate::SetupHeap+494417
19: 00007FF636722E5D v8::internal::SetupIsolateDelegate::SetupHeap+666141
20: 00007FF63670CD2A v8::internal::SetupIsolateDelegate::SetupHeap+575722
21: 00007FF63668B53E v8::internal::SetupIsolateDelegate::SetupHeap+45310
22: 0000025F5C052EC8
This looks like a corrupted installation of Node.js.
Uninstall and reinstall Node.js.
Clean out your node_modules and reinstall all your dependencies.
Run it again and tell us if that solves anything.
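A minimal sketch of those cleanup steps in PowerShell, run from the project folder after Node.js has been reinstalled (assuming a standard npm-based React project):
npm cache clean --force                    # clear the npm cache
Remove-Item -Recurse -Force node_modules   # delete installed dependencies
Remove-Item -Force package-lock.json       # optionally drop the lockfile as well
npm install                                # reinstall all dependencies
npm start                                  # start the development server again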

uvm_monitor - does not sample correctly. Where am I wrong?

I have the following interface and uvm_monitor (run_phase shown below).
The DUT signals are "x" for some time. When I print the signals in my monitor, they are captured as "x". Great.
Next, the DUT signals show a valid value (for the first time). When I print the signals in my monitor, they are captured with valid values. Great.
Next, the DUT updates all three signals to their next values, and at time stamp 134, mirror_byte_wr_en remains 0 but is expected to be 0xffff.
Any idea why? I appreciate your thoughts and inputs.
Example output from the log:
UVM_INFO snp_decomp_snpd_egress_monitor.sv(65) # 122:
uvm_test_top.m_snp_decomp_env.snpd_egress[0].m_monitor
[snp_decomp_snpd_egress_monitor] mirror_data =
0x00006c61776e694720616669617a7548
UVM_INFO snp_decomp_snpd_egress_monitor.sv(71) # 122:
uvm_test_top.m_snp_decomp_env.snpd_egress[0].m_monitor
[snp_decomp_snpd_egress_monitor] mirror_byte_wr_en = 0xffff
UVM_INFO snp_decomp_snpd_egress_monitor.sv(76) # 122:
uvm_test_top.m_snp_decomp_env.snpd_egress[0].m_monitor
[snp_decomp_snpd_egress_monitor] mirror_wr_addr = 0x00000
UVM_INFO snp_decomp_snpd_egress_monitor.sv(65) # 134:
uvm_test_top.m_snp_decomp_env.snpd_egress[0].m_monitor
[snp_decomp_snpd_egress_monitor] mirror_data =
0x3c10xxxxxxxxxxxxxxxx616c00000000
UVM_INFO snp_decomp_snpd_egress_monitor.sv(71) # 134:
uvm_test_top.m_snp_decomp_env.snpd_egress[0].m_monitor
[snp_decomp_snpd_egress_monitor] mirror_byte_wr_en = 0x0000
UVM_INFO snp_decomp_snpd_egress_monitor.sv(76) # 134:
uvm_test_top.m_snp_decomp_env.snpd_egress[0].m_monitor
[snp_decomp_snpd_egress_monitor] mirror_wr_addr = 0x00010
task run_phase(uvm_phase phase);
  snp_decomp_snpd_egress_transaction tr;
  tr = snp_decomp_snpd_egress_transaction::type_id::create("tr");
  forever begin
    @(vif.egress.egress_cb);
    fork
      begin
        // @(vif.egress.egress_cb);
        tr.mirror_data = vif.egress.egress_cb.mirror_wr_data;
        `uvm_info(get_type_name(), $sformatf("mirror_data = 0x%x\n", vif.egress.egress_cb.mirror_wr_data), UVM_LOW);
      end
      begin
        // @(vif.egress.egress_cb);
        tr.mirror_wr_byte_en = vif.egress.egress_cb.mirror_byte_wr_en;
        `uvm_info(get_type_name(), $sformatf("mirror_byte_wr_en = 0x%x\n", vif.egress.egress_cb.mirror_byte_wr_en), UVM_LOW);
      end
      begin
        // @(vif.egress.egress_cb);
        tr.mirror_wr_addr = vif.egress.egress_cb.mirror_wr_addr;
        `uvm_info(get_type_name(), $sformatf("mirror_wr_addr = 0x%x\n", vif.egress.egress_cb.mirror_wr_addr), UVM_LOW);
      end
    join
  end
endtask : run_phase
interface snp_decomp_snpd_egress_intf(input logic clock, input logic reset);
  logic [127:0] mirror_wr_data;
  logic [15:0]  mirror_byte_wr_en;
  logic [18:0]  mirror_wr_addr;
  modport DUT (
    input  clock,
    input  reset,
    output mirror_wr_data,
    output mirror_byte_wr_en,
    output mirror_wr_addr
  ); // modport DUT
  clocking egress_cb @(posedge clock);
    input mirror_wr_data;
    input mirror_byte_wr_en;
    input mirror_wr_addr;
  endclocking : egress_cb
  modport egress(clocking egress_cb);
endinterface : snp_decomp_snpd_egress_intf
It is correct behaviour: the values sampled through a clocking block are taken just before the clock edge, i.e. they are the previous cycle's values. This comes from SystemVerilog's scheduling semantics and the clocking block's default input skew (1step).
begin
  @(vif.egress.egress_cb);
  `uvm_info(get_type_name(), $sformatf("mirror_byte_wr_en: value from previous cycle - 'h%0h, value from current cycle - 'h%0h",
            vif.egress.egress_cb.mirror_byte_wr_en, vif.egress.mirror_byte_wr_en), UVM_LOW)
end
For the full picture, see section 14.13 of the SystemVerilog LRM (IEEE 1800).
Best regards, Maksim.

Unrecognized JEDEC id

On Linux 2.6.25 I have this output:
physmap platform flash device: 00800000 at ff800000
physmap-flash.0: Found 1 x16 devices at 0x0 in 8-bit bank
Amd/Fujitsu Extended Query Table at 0x0040
physmap-flash.0: CFI does not contain boot bank location. Assuming top.
number of CFI chips: 1
cfi_cmdset_0002: Disabling erase-suspend-program due to code brokenness.
RedBoot partition parsing not available
Using physmap partition information
Creating 6 MTD partitions on "physmap-flash.0":
0x00000000-0x00040000 : "U-Boot image"
0x00040000-0x00050000 : "U-Boot params"
0x00050000-0x00250000 : "Linux kernel"
0x00250000-0x00750000 : "RFS"
0x00750000-0x007f0000 : "JFFS"
0x007f0000-0x00800000 : "unused"
m25p80 spi1.0: s70fl256p (16384 Kbytes)
Creating 2 MTD partitions on "tpts1691.spi.flash":
0x00000000-0x00400000 : "spi_flash_part0"
0x00400000-0x01000000 : "spi_flash_part1"
DSPI: Coldfire master initialized
I am trying to port the SPI flash driver to the newer kernel 4.12.5.
I added my JEDEC ID to spi_nor_ids in spi-nor/spi-nor.c:
{ "s70fl256p", INFO(0x012018, 0, 256 * 1024, 64, 0) },
but I get this error:
spi_coldfire spi_coldfire: master is unqueued, this is deprecated
m25p80 spi1.0: unrecognized JEDEC id bytes: 00, 00, 00
in the output:
physmap platform flash device: 00800000 at ff800000
physmap-flash.0: Found 1 x16 devices at 0x0 in 8-bit bank. Manufacturer ID 0x000001 Chip ID 0x000201
Amd/Fujitsu Extended Query Table at 0x0040
Amd/Fujitsu Extended Query version 1.3.
physmap-flash.0: CFI contains unrecognised boot bank location (1). Assuming bottom.
number of CFI chips: 1
Creating 6 MTD partitions on "physmap-flash.0":
0x000000000000-0x000000040000 : "U-Boot image"
0x000000040000-0x000000050000 : "U-Boot params"
0x000000050000-0x000000250000 : "Linux kernel"
0x000000250000-0x000000750000 : "RFS"
0x000000750000-0x0000007f0000 : "JFFS"
0x0000007f0000-0x000000800000 : "unused"
uclinux[mtd]: probe address=0x3549d0 size=0x10804000
Creating 1 MTD partitions on "ram":
0x000000000000-0x000010804000 : "ROMfs"
spi_coldfire spi_coldfire: master is unqueued, this is deprecated
m25p80 spi1.0: unrecognized JEDEC id bytes: 00, 00, 00
DSPI: Coldfire master initialized
Has anyone already solved this error?
Thank you.
The 1st message, spi_coldfire: master is unqueued, this is deprecated, is not an error.
It is just a warning that the SPI controller being registered has its own message transfer callback (master->transfer). That mechanism is deprecated but still supported in kernel 4.12.5.
Look at drivers/spi/spi.c:1993.
The 2nd message: I suspect that your flash doesn't report a JEDEC ID at all (it reads 0, 0, 0), but your flash_info entry has one. To avoid calling spi_nor_read_id(), let info->id_len be 0. id_len is calculated as .id_len = (!(_jedec_id) ? 0 : (3 + ((_ext_id) ? 2 : 0))), so a possible solution is simply to pass 0 as the jedec_id.
Like:
{ "s70fl256p", INFO(0, 0, 256 * 1024, 64, 0) },

Understanding the elements of the gdb core print

I have a core file generated. /var/log/messages displays these lines:
Jan 29 07:50:40 NetAcc-02 kernel: LR.exe[15326]: segfault at 51473861 ip 081e2dba sp 00240030 error 4 in LR.exe[8048000+34c000]
Jan 29 07:50:52 NetAcc-02 abrt[20696]: saved core dump of pid 15252 (/home/netacc/active/LR.exe) to /var/spool/abrt/ccpp-2015-01-29-07:50:40-15252.new/coredump (1642938368 bytes)
Jan 29 07:50:52 NetAcc-02 abrtd: Directory 'ccpp-2015-01-29-07:50:40-15252' creation detected
Jan 29 07:50:54 NetAcc-02 abrtd: Executable '/home/netacc/active/LR.exe' doesn't belong to any package
Jan 29 07:50:54 NetAcc-02 abrtd: Corrupted or bad dump /var/spool/abrt/ccpp-2015-01-29-07:50:40-15252 (res:2), deleting
Does the last line mean that the core is corrupted? Because a bt of my corefile seems to be corrupted:
#0 0x081e2dba in CfaPepDecision (pBuf=0xa0d6735, pIp=0x5147384d, u2DirectFlag=1, ppepserver=0x67684e6f, paccl=0x45517377, pPepMode=0x6a31396c "") at /home/TAN/release/rel/idu-sw/pep/pep/src/pepcfa.c:498
#1 0x52367331 in ?? ()
#2 0x0a0d6735 in gProfileVsatTable ()
#3 0x5147384d in ?? ()
#4 0x75417875 in ?? ()
#5 0x38000200 in ?? ()
Strangely, gProfileVsatTable is a global array!
The address pIp = 0x5147384d is out of bounds in gdb.
Any inputs are helpful.
Because a bt of my corefile seems to be corrupted:
This is usually the result of analyzing the wrong binary. Invoke GDB like this:
gdb /home/netacc/active/LR.exe \
/var/spool/abrt/ccpp-2015-01-29-07:50:40-15252.new/coredump
Make sure that you have not updated the binary since Jan 29 07:50:52. In particular, make sure you did not rebuild the binary with different options after the crash.
