LWIP hanging after some time due to memory not freeing - c

Note: I'm using an STM32F2 microcontroller, FreeRTOS and lwIP, and I'm using the raw API.
I have an application in which I'm listening to one PC and connecting to another. Everything works fine for a while when I am trying to achieve high throughput, but after about half an hour (about 80~90k packets received) it hangs. Where it hangs varies a little, but it stopped doing it when I started closing the connection whenever tcp_write returned ERR_MEM.
Sometimes it hangs in this line:
/* useg should point to last segment on unacked queue */
useg = pcb->unacked;
if (useg != NULL) {
for (; useg->next != NULL; useg = useg->next); <------- here
}
Sometimes when I call tcp_write it returns ERR_MEM, and after that it never returns anything other than ERR_MEM. This is how I send data: I accept a connection, receive data, store the PCB, do something and then reply to that same PCB:
err_t ret;
ret = tcp_write(g_response[i].pcb, data, len, 1);
if(ret == ERR_OK)
tcp_output(g_response[i].pcb);
else
tcp_close(g_response[i].pcb);
Here is how I setup the socket to listen:
pcb = tcp_new();
tcp_bind(pcb, IP_ADDR_ANY, port);
pcb = tcp_listen(pcb);
pcb->so_options |= SOF_KEEPALIVE; // enable keep-alive
pcb->keep_intvl = 1000; // sends keep-alive every second
tcp_accept(pcb, accept);
And here are my rcv, sent and accept callbacks:
static err_t rcv(void *arg, struct tcp_pcb *pcb, struct pbuf *p, err_t err) {
if(p == NULL) {
return ERR_OK;
} else if(err != ERR_OK) {
return err;
}
tcp_recved(pcb, p->len);
// do something
pbuf_free(p);
return ERR_OK;
}
int sentcounter = 0;
static err_t sent(void *arg, struct tcp_pcb *pcb, uint16_t len) {
sentcounter++;
return ERR_OK;
}
static err_t accept(void *arg, struct tcp_pcb *pcb, err_t err) {
int i;
tcp_arg(pcb, NULL);
/* Set up the various callback functions */
tcp_recv(pcb, rcv);
tcp_err(pcb, error);
tcp_sent(pcb, sent);
tcp_accepted(pcb);
}
The way I send data, closing the PCB whenever there is an ERR_MEM, may be strange, but now I have fewer lost packets and it actually got me to exchange up to 90k packets; before that it was failing randomly.
I actually need high throughput, which is why I'm calling tcp_output instead of letting the tcpip_thread deal with sending the packet. Whenever I let this thread take care of the packet everything just works, but it's too slow (maybe one packet every 200~300 ms), and by calling tcp_output in the function I got to the point where I'm sending the data in under 10 ms. I'm also not transferring big amounts of data, so that helps.
Recently I've noticed that the tcpip_thread calls the input function, goes to ipv4_input, goes to memp_free and keeps going but never leaves (I'm actually running a new test right now so later I'll update this question with the callstack to the input before it freezes).
Has somebody done anything similar? Bursts of small packets with high throughput?
EDIT: As promised, here is my call stack
osMutexWait() at cmsis_os.c:681 0x800474c
sys_arch_protect() at sys_arch.c:400 0x80146a6
do_memp_free_pool() at memp.c:415 0x800dca2
memp_free() at memp.c:486 0x800dcf8
tcp_seg_free() at tcp.c:1.336 0x800fb0e
tcp_receive() at tcp_in.c:1.162 0x8011712
tcp_process() at tcp_in.c:877 0x8011048
tcp_input() at tcp_in.c:367 0x8010692
ip4_input() at ip4.c:670 0x800c688
ethernet_input() at ethernet.c:176 0x80142fe
tcpip_thread() at tcpip.c:124 0x8006836
pxPortInitialiseStack() at port.c:231 0x8004cd0
Just like @Lundin said, it is a concurrency problem. I should probably be more careful about how the functions are called. I'll try to modify my code to work with netconn or sockets instead of tcp_pcb, and then measure the speeds I get. I really need high throughput.

Related

Managing UART buffer Thingy91

I have a Thingy91 Nordic device, and I am integrating it with a sensor that works at a 1200 baud rate. It sends strings that I need to store in a buffer and parse for use. However, I am facing a peculiar issue: I get the correct string the first time I receive data, but after that I receive garbage values.
Below is the part of the code:
uint8_t message_buf[100];
void uart_cb(struct device *x) {
uart_irq_update(x);
if (uart_irq_rx_ready(x)) {
uart_fifo_read(x, message_buf, sizeof(message_buf));
printk("%s", message_buf);
}
}
void main(){
struct device *uart = device_get_binding(UART_PORT);
......
uart_irq_callback_set(uart, uart_cb);
......
}
We think that it might be a problem in managing message_buf while getting the new string, and wanted to know the correct procedure for managing the buffer. Also, what could be the root cause of getting correct data on the first call and garbage values later on?
Regards,
Adeel.
Looking at the API document:
static int uart_fifo_read(struct device *dev, u8_t *rx_data, const int size)
Read data from FIFO.
This function is expected to be called from UART interrupt handler (ISR), if uart_irq_rx_ready() returns true. Result of calling this function not from an ISR is undefined (hardware-dependent). It’s unspecified whether “RX ready” condition as returned by uart_irq_rx_ready() is level- or edge- triggered. That means that once uart_irq_rx_ready() is detected, uart_fifo_read() must be called until it reads all available data in the FIFO (i.e. until it returns less data than was requested).
Returns: Number of bytes read.
Parameters:
dev: UART device structure.
rx_data: Data container.
size: Container size.
It clearly specifies that the API must be called until the return value is less than the requested number of bytes (size).
Your code:
if (uart_irq_rx_ready(x)) {
uart_fifo_read(x, message_buf, sizeof(message_buf));
printk("%s", message_buf);
}
It calls uart_fifo_read() once and then exits the callback uart_cb, which deviates from what is recommended.
You should read from FIFO until it is empty:
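int readBytes = 0;
int n;
if (uart_irq_rx_ready(x))
{
    /* Drain the FIFO: stop when a read returns no data or the buffer is full.
       One byte is kept free for the string terminator. */
    do {
        n = uart_fifo_read(x, &message_buf[readBytes], (sizeof(message_buf) - 1 - readBytes));
        readBytes += n;
    } while (n > 0 && readBytes < (int)sizeof(message_buf) - 1);
    message_buf[readBytes] = '\0';
    printk("Received %d bytes: ", readBytes);
    // printk("%s", message_buf); // Only safe because message_buf is NUL-terminated above!
}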
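To see the drain-and-terminate pattern outside of the target, here is a minimal host-side sketch. It does not use the Zephyr API: `mock_fifo_read()`, `mock_fifo_load()` and the 4-byte `FIFO_CHUNK` are hypothetical stand-ins for `uart_fifo_read()` and a small hardware FIFO, and `drain_fifo()` is an illustrative helper name. It loops until a read returns nothing, which is a slightly stricter stop condition than "less than requested".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a small hardware FIFO: each read hands out
 * at most FIFO_CHUNK bytes from the pending byte stream. */
#define FIFO_CHUNK 4

static const char *pending = "";   /* bytes still waiting in the mock FIFO */

static void mock_fifo_load(const char *bytes) { pending = bytes; }

/* Mimics the shape of uart_fifo_read(): returns the number of bytes copied. */
static int mock_fifo_read(unsigned char *dst, int size)
{
    int avail = (int)strlen(pending);
    int n = avail < size ? avail : size;
    if (n > FIFO_CHUNK)
        n = FIFO_CHUNK;
    if (n <= 0)
        return 0;
    memcpy(dst, pending, (size_t)n);
    pending += n;
    return n;
}

/* Drain the FIFO into buf, always leaving room for a terminating NUL.
 * Returns the number of payload bytes stored. */
static int drain_fifo(unsigned char *buf, int cap)
{
    int total = 0;
    int n;

    if (cap <= 0)
        return 0;
    do {
        n = mock_fifo_read(buf + total, cap - 1 - total);
        total += n;
    } while (n > 0 && total < cap - 1);
    buf[total] = '\0';   /* makes the buffer safe to print with %s */
    return total;
}
```

Without the terminator, a second, shorter message would leave the tail of the previous one in the buffer, which is one plausible source of the "garbage" seen after the first read.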

How to properly utilize masks to send index information to perf event output?

According to the documentation for bpf_perf_event_output found here: http://man7.org/linux/man-pages/man7/bpf-helpers.7.html
"The flags are used to indicate the index in map for which the value must be put, masked with BPF_F_INDEX_MASK."
In the following code:
SEC("xdp_sniffer")
int xdp_sniffer_prog(struct xdp_md *ctx)
{
void *data_end = (void *)(long)ctx->data_end;
void *data = (void *)(long)ctx->data;
if (data < data_end) {
/* If we have reached here, that means this
* is a useful packet for us. Pass on-the-wire
* size and our cookie via metadata.
*/
__u64 flags = BPF_F_INDEX_MASK;
__u16 sample_size;
int ret;
struct S metadata;
metadata.cookie = 0xdead;
metadata.pkt_len = (__u16)(data_end - data);
/* To minimize writes to disk, only
* pass necessary information to userspace;
* that is just the header info.
*/
sample_size = min(metadata.pkt_len, SAMPLE_SIZE);
flags |= (__u64)sample_size << 32;
ret = bpf_perf_event_output(ctx, &my_map, flags,
&metadata, sizeof(metadata));
if (ret)
bpf_printk("perf_event_output failed: %d\n", ret);
}
return XDP_PASS;
}
It works as you would expect and stores the information for the given CPU number.
However, suppose I want all packets to be sent to index 1.
I swap
__u64 flags = BPF_F_INDEX_MASK;
for
__u64 flags = 0x1ULL;
The code compiles correctly and throws no errors, however no packets get saved at all anymore. What am I doing wrong if I want all of the packets to be sent to index 1?
Partial answer: I see no reason why the packets would not be sent to the perf buffer, but I suspect the error is in the user-space code (not provided). It could be that you do not "open" the perf event for all CPUs when trying to read from the buffer. Have a look at the man page for perf_event_open(2): check that the combination of values for pid and cpu allows you to read data written for CPU 1.
As a side note, this:
__u64 flags = BPF_F_INDEX_MASK;
is misleading. The mask should be used to mask the index, not to set its value. BPF_F_CURRENT_CPU should be used instead; BPF_F_INDEX_MASK only happens to work because the two enum values are the same.
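To make the flag layout concrete, here is a small user-space sketch. The three constants mirror the values in the kernel's uapi `<linux/bpf.h>`, redefined with a `DEMO_` prefix so it compiles without kernel headers; `make_flags()`, `flags_index()` and `flags_ctxlen()` are hypothetical helper names, not part of any BPF API.

```c
#include <stdint.h>

/* Mirrors of the uapi <linux/bpf.h> values, renamed for this sketch. */
#define DEMO_BPF_F_INDEX_MASK   0xffffffffULL
#define DEMO_BPF_F_CURRENT_CPU  DEMO_BPF_F_INDEX_MASK
#define DEMO_BPF_F_CTXLEN_MASK  (0xfffffULL << 32)

/* Build the flags argument: the map index lives in the low 32 bits,
 * the number of packet bytes to append lives in bits 32..51. */
static uint64_t make_flags(uint64_t index, uint32_t sample_size)
{
    uint64_t flags = index & DEMO_BPF_F_INDEX_MASK;
    flags |= ((uint64_t)sample_size << 32) & DEMO_BPF_F_CTXLEN_MASK;
    return flags;
}

/* Recover the two fields, the way the mask is meant to be used. */
static uint64_t flags_index(uint64_t flags)
{
    return flags & DEMO_BPF_F_INDEX_MASK;
}

static uint64_t flags_ctxlen(uint64_t flags)
{
    return (flags & DEMO_BPF_F_CTXLEN_MASK) >> 32;
}
```

With these helpers, `make_flags(DEMO_BPF_F_CURRENT_CPU, n)` selects the current CPU's slot and `make_flags(1, n)` selects index 1, so user space must have a perf event open in slot 1 of the map for the second form to deliver anything.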

Raw LWIP Send TCP Transmission to Static IP

I've got the TCP Echo example working well on my hardware, and yesterday figured out how to get a UDP broadcast working. After further thought, I've realized that what I really need is to be able to set up a TCP connection to a static IP, the idea being that my hardware can connect to a server of some sort and then use that connection for all its transactions. The difference is that whereas the echo example sets up a passive connection that binds with the incoming source (as I understand it), I want to initiate the connection deliberately to a known IP.
Based on what I found on Wikia, I've attempted as a base case to implement a function that can send a packet to a defined IP. I'm simply trying to send a packet to my PC, and I'm looking for it in Wireshark.
void echo_tx_tcp()
{
err_t wr_err = ERR_OK;
struct tcp_pcb *l_tcp_pcb;
l_tcp_pcb = tcp_new();
ip_addr_t dest_ip =
{ ((u32_t)0x0C0C0C2BUL) };
wr_err = tcp_bind(l_tcp_pcb, &dest_ip, 12);
wr_err = tcp_connect(l_tcp_pcb, &dest_ip, 12, echo_accept);
tcp_sent(l_tcp_pcb, echo_sent);
struct pbuf *p = pbuf_alloc(PBUF_TRANSPORT, 1024, PBUF_RAM);
unsigned char buffer_send[1024] = "My Name Is TCP";
p->payload = buffer_send;
p->len = 1024;
p->tot_len = 1024;
wr_err = tcp_write(l_tcp_pcb, p->payload, p->len, 1);
wr_err = tcp_output(l_tcp_pcb);
if(wr_err == ERR_OK)
{
p->len++;
}
return;
}
The last if statement just exists so that I can inspect the wr_err value with a debugger. The err is coming back OK, but the packet is not seen in Wireshark. My setup is my hardware and my PC connected to an isolated router. The local IP address of the PC is 12.12.12.43.
Am I missing a step here?
The tcp_write() function will fail and return ERR_MEM if:
The length of the data exceeds the current send buffer size.
The length of the queue of the outgoing segment is larger than the upper limit defined in lwipopts.h.
The number of bytes available in the output queue can be retrieved with the tcp_sndbuf() function.
Potential solution(s):
Try again but send less data.
Monitor the amount of space available in the send buffer and only send (more) data when there is space available in the send buffer.
Suggestions:
tcp_sndbuf() can be used to find out how much send buffer space is available.
tcp_sent() can be used to register a callback function that will be called when send buffer space becomes available.
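The "only send what fits" idea can be sketched on the host without lwIP. Everything here is a hypothetical stand-in: `mock_pcb`, `mock_sndbuf()` and `mock_write()` take the place of a real PCB, `tcp_sndbuf()` and `tcp_write()`, and a return of -1 plays the role of ERR_MEM. In real code the remainder would be retried from the callback registered with `tcp_sent()`.

```c
#include <stddef.h>

/* Hypothetical model of a connection's send buffer. */
typedef struct {
    size_t sndbuf;   /* free space left in the send buffer */
    size_t queued;   /* bytes accepted so far */
} mock_pcb;

/* Stand-in for tcp_sndbuf(): free send buffer space in bytes. */
static size_t mock_sndbuf(const mock_pcb *pcb) { return pcb->sndbuf; }

/* Stand-in for tcp_write(): 0 on success, -1 (our "ERR_MEM") if it does not fit. */
static int mock_write(mock_pcb *pcb, const void *data, size_t len)
{
    (void)data;
    if (len > pcb->sndbuf)
        return -1;
    pcb->sndbuf -= len;
    pcb->queued += len;
    return 0;
}

/* Queue as much of data as currently fits and return the number of bytes
 * accepted; the caller keeps the rest and retries once space is freed. */
static size_t write_what_fits(mock_pcb *pcb, const unsigned char *data, size_t len)
{
    size_t room = mock_sndbuf(pcb);
    size_t chunk = len < room ? len : room;

    if (chunk == 0)
        return 0;
    if (mock_write(pcb, data, chunk) != 0)
        return 0;
    return chunk;
}
```

The design point is that a full buffer is treated as "try again later" rather than as a fatal error, so the connection does not have to be closed when the write cannot be queued.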

Issue with SPI (Serial Port Comm), stuck on ioctl()

I'm trying to access a SPI sensor using the SPIDEV driver but my code gets stuck on IOCTL.
I'm running embedded Linux on the SAM9X5EK (mounting AT91SAM9G25). The device is connected to SPI0. I enabled CONFIG_SPI_SPIDEV and CONFIG_SPI_ATMEL in menuconfig and added the proper code to the BSP file:
static struct spi_board_info spidev_board_info[] = {
{
.modalias = "spidev",
.max_speed_hz = 1000000,
.bus_num = 0,
.chip_select = 0,
.mode = SPI_MODE_3,
},
...
};
spi_register_board_info(spidev_board_info, ARRAY_SIZE(spidev_board_info));
1 MHz is the maximum accepted by the sensor; I tried 500 kHz but I get an error during Linux boot (too slow, apparently). .bus_num and .chip_select should be correct (I also tried all other combinations). I checked the datasheet for SPI_MODE_3.
I get no error while booting and devices appear correctly as /dev/spidevX.X. I manage to open the file and obtain a valid file descriptor. I'm now trying to access the device with the following code (inspired by examples I found online).
#define MY_SPIDEV_DELAY_USECS 100
// #define MY_SPIDEV_SPEED_HZ 1000000
#define MY_SPIDEV_BITS_PER_WORD 8
int spidevReadRegister(int fd,
unsigned int num_out_bytes,
unsigned char *out_buffer,
unsigned int num_in_bytes,
unsigned char *in_buffer)
{
struct spi_ioc_transfer mesg[2] = { {0}, };
uint8_t num_tr = 0;
int ret;
// Write data
mesg[0].tx_buf = (unsigned long)out_buffer;
mesg[0].rx_buf = (unsigned long)NULL;
mesg[0].len = num_out_bytes;
// mesg[0].delay_usecs = MY_SPIDEV_DELAY_USECS,
// mesg[0].speed_hz = MY_SPIDEV_SPEED_HZ;
mesg[0].bits_per_word = MY_SPIDEV_BITS_PER_WORD;
mesg[0].cs_change = 0;
num_tr++;
// Read data
mesg[1].tx_buf = (unsigned long)NULL;
mesg[1].rx_buf = (unsigned long)in_buffer;
mesg[1].len = num_in_bytes;
// mesg[1].delay_usecs = MY_SPIDEV_DELAY_USECS,
// mesg[1].speed_hz = MY_SPIDEV_SPEED_HZ;
mesg[1].bits_per_word = MY_SPIDEV_BITS_PER_WORD;
mesg[1].cs_change = 1;
num_tr++;
// Do the actual transmission
if(num_tr > 0)
{
ret = ioctl(fd, SPI_IOC_MESSAGE(num_tr), mesg);
if(ret == -1)
{
printf("Error: %d\n", errno);
return -1;
}
}
return 0;
}
Then I'm using this function:
#define OPTICAL_SENSOR_ADDR "/dev/spidev0.0"
...
int fd;
fd = open(OPTICAL_SENSOR_ADDR, O_RDWR);
if (fd<=0) {
printf("Device not found\n");
exit(1);
}
uint8_t buffer1[1] = {0x3a};
uint8_t buffer2[1] = {0};
spidevReadRegister(fd, 1, buffer1, 1, buffer2);
When I run it, the code gets stuck on IOCTL!
I did this way because, in order to read a register on the sensor, I need to send a byte with its address in it and then get the answer back without changing CS (however, when I tried using write() and read() functions, while learning, I got the same result, stuck on them).
I'm aware that specifying .speed_hz causes a ENOPROTOOPT error on Atmel (I checked spidev.c) so I commented that part.
Why does it get stuck? I thought it might be because the device node is created but it does not actually detect any hardware. As I wasn't sure whether hardware SPI0 corresponds to bus_num 0 or 1, I tried both, but still no success (by the way, which one is it?).
UPDATE: I managed to have the SPI working! Half of it.. MOSI is transmitting the right data, but CLK doesn't start... any idea?
When I'm working with SPI I always use an oscilloscope to see the output of the I/Os. If you have a 4-channel scope you can easily debug the issue and find out whether you're accessing the right I/Os, using the right speed, etc. I usually compare the signal I get to the datasheet diagram.
I think there are several issues here. First of all, SPI is bidirectional, so if you send something over the bus you also receive something. Therefore you always have to provide a valid buffer for both rx_buf and tx_buf.
Second, all members of the struct spi_ioc_transfer have to be initialized with a valid value. Otherwise they just point to some memory address and the underlying process accesses arbitrary data, leading to undefined behavior.
Third, why do you use a for loop with ioctl? You already tell ioctl you have an array of spi_ioc_transfer structs, so all defined transactions will be performed with one ioctl call.
Fourth, ioctl needs a pointer to your struct array, so the ioctl call should look like this:
ret = ioctl(fd, SPI_IOC_MESSAGE(num_tr), &mesg);
You see there is room for improvement in your code.
This is how I do it in a c++ library for the raspberry pi. The whole library will soon be on github. I'll update my answer when it is done.
void SPIBus::spiReadWrite(std::vector<std::vector<uint8_t> > &data, uint32_t speed,
uint16_t delay, uint8_t bitsPerWord, uint8_t cs_change)
{
struct spi_ioc_transfer transfer[data.size()];
memset(transfer, 0, sizeof(transfer)); // zero the fields not set below (needs <cstring>)
int i = 0;
for (std::vector<uint8_t> &d : data)
{
//see <linux/spi/spidev.h> for details!
transfer[i].tx_buf = reinterpret_cast<__u64>(d.data());
transfer[i].rx_buf = reinterpret_cast<__u64>(d.data());
transfer[i].len = d.size(); //number of bytes in vector
transfer[i].speed_hz = speed;
transfer[i].delay_usecs = delay;
transfer[i].bits_per_word = bitsPerWord;
transfer[i].cs_change = cs_change;
i++;
}
int status = ioctl(this->fileDescriptor, SPI_IOC_MESSAGE(data.size()), &transfer);
if (status < 0)
{
std::string errMessage(strerror(errno));
throw std::runtime_error("Failed to do full duplex read/write operation "
"on SPI Bus " + this->deviceNode + ". Error message: " +
errMessage);
}
}

When two processes read the same file simultaneously, will Linux kernel save one device I/O?

I have a generic question about Linux kernel's handling of file I/O. So far my understanding is that, in an ideal case, after process A reads a file, data is loaded into page cache, and if process B reads the same page before it is reclaimed, it does not need to hit the disk again.
My question is related to how block device I/O works. Process A's read request will eventually be queued before the I/O actually happens. Now, if process B's request (a bio struct) is to be inserted into the request_queue before A's request is executed, the elevator will consider whether to merge B's bio into an existing request. If A and B try to read the same file offset, i.e. the same device block, they are literally the same I/O (or A's and B's requests are not exactly the same but overlap for some blocks), but so far I have not seen this case being considered in kernel code. (The only relevant thing I saw is a test on whether a bio can be glued to an existing request contiguously.)
kernel 2.6.11
inline int elv_try_merge(struct request *__rq, struct bio *bio)
{
int ret = ELEVATOR_NO_MERGE;
/*
* we can merge and sequence is ok, check if it's possible
*/
if (elv_rq_merge_ok(__rq, bio)) {
if (__rq->sector + __rq->nr_sectors == bio->bi_sector)
ret = ELEVATOR_BACK_MERGE;
else if (__rq->sector - bio_sectors(bio) == bio->bi_sector)
ret = ELEVATOR_FRONT_MERGE;
}
return ret;
}
kernel 5.3.5
enum elv_merge elv_merge(struct request_queue *q, struct request **req,
struct bio *bio)
{
struct elevator_queue *e = q->elevator;
struct request *__rq;
...
/*
* See if our hash lookup can find a potential backmerge.
*/
__rq = elv_rqhash_find(q, bio->bi_iter.bi_sector);
...
}
struct request *elv_rqhash_find(struct request_queue *q, sector_t offset)
{
struct elevator_queue *e = q->elevator;
struct hlist_node *next;
struct request *rq;
hash_for_each_possible_safe(e->hash, rq, next, hash, offset) {
...
if (rq_hash_key(rq) == offset)
return rq;
}
return NULL;
}
#define rq_hash_key(rq) (blk_rq_pos(rq) + blk_rq_sectors(rq))
Does that mean kernel will just do two I/Os? Or (very likely) I missed something?
thanks!
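For reference, the adjacency test from the 2.6.11 snippet can be reproduced in a few lines of standalone C. The `demo_` types and `demo_try_merge()` are simplified, hypothetical stand-ins for `struct request`, `struct bio` and `elv_try_merge()`; the front-merge comparison is rearranged into an addition to avoid unsigned underflow, which is equivalent for the cases that matter.

```c
typedef unsigned long long demo_sector_t;

enum demo_merge { DEMO_NO_MERGE, DEMO_FRONT_MERGE, DEMO_BACK_MERGE };

/* Simplified stand-ins: only the start sector and length are modelled. */
struct demo_request { demo_sector_t sector; demo_sector_t nr_sectors; };
struct demo_bio     { demo_sector_t sector; demo_sector_t nr_sectors; };

/* Same contiguity test as the quoted code: a bio is mergeable only if it
 * starts exactly where the request ends, or ends exactly where it starts. */
static enum demo_merge demo_try_merge(const struct demo_request *rq,
                                      const struct demo_bio *bio)
{
    if (rq->sector + rq->nr_sectors == bio->sector)
        return DEMO_BACK_MERGE;
    if (bio->sector + bio->nr_sectors == rq->sector)
        return DEMO_FRONT_MERGE;
    return DEMO_NO_MERGE;
}
```

With a request covering sectors 100..107, a bio starting at 108 is a back merge and one covering 92..99 is a front merge, while a bio for the identical range 100..107 or an overlapping one at 104 gets no merge from this check. So this particular test does treat identical or overlapping ranges as unmergeable; whether a duplicate bio is ever submitted in the first place is decided in layers above it, which this sketch does not model.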
