Design of a Communication Stack on CAN

Design of a Communication Stack on CAN - c

My Goal is to design and implement a portable Communication Stack on CAN.
To be simple let's assume that the protocol stack i want implement is composed of the following Layers :
1) Data Link Layer : CAN driver and so on
2) Communication Layer: Handle the filtering of the frame in reception and Manage the sending of periodic / Event triggered frames
3) Transport Layer :Manage the segmentation of messages (Standard CAN protocol only allows a frames with length of 8 Byte)
4) Application Layer : defined by the end user
The choice of my design is to build the comunication stack around a non preemptif scheduler and to consider each layer as a task of the scheduler the communication between the different layers is done using communication mechanisms mutex and queues ext.
The questions are:
1) Could this be a good design or there is much easier one
2) How do the Communication stacks really work? , I mean what is the "engine" behind the application layer, Is it a scheduler? or the management of the communication between layers is defined by the end user?
3) Could anyone point me to a free and easy implementation (Ideally in c) of a communication stack (Not necessary for CAN)
Thank you in advance

You should consider using an existing protocol on top of CAN. This could be CANopen. A free implementation is CAN Festival

Transport Layer :Manage the segmentation of messages
No, this is the application layer. It doesn't make sense to handle segmentation unless you have a high-level protocol specifying which CAN identifiers to use and the nature of the data.
The application layer in this case needs to be implemented by you and not the end user. Otherwise you are not making a protocol stack, but merely some glorified CAN driver. Which identifiers are there? The nature of the data? Priorities? How are messages scheduled on the bus over time? Is the system sending data repeatedly and synchronously, or is it event-driven? Are RTR frames used and how? And so on.
How do the Communication stacks really work? , I mean what is the "engine" behind the application layer, Is it a scheduler? or the management of the communication between layers is defined by the end user?
This is quite a broad question, but generally such stacks are event-driven. There a message pump directing incoming data to whoever needs it. The CAN stack need to implement some sort of hardware timer for a given hardware port, to keep track of message timing, but possibly also to keep track of itself.
Some stacks have the possibility to take a "time slice" as parameter and then schedule themselves in a way. Others are built on the concept of doing as little as possible each time they are called, but instead count on constantly getting called repeatedly from the main loop. Whichever makes most sense depends on the end application, really. The former concept might make most sense in applications with RTOS or low power applications. The latter makes most sense for high integrity, fast response systems, like for example a car ECU.
3) Could anyone point me to a free and easy implementation (Ideally in c) of a communication stack (Not necessary for CAN)
(Please note that asking for external resources like libraries is off-topic on SO.)
http://www.canfestival.org is a free CANopen stack. I haven't used it myself, so I have no idea of the quality.

Related

Is it better to use select or multi-threading (or both) when creating a server that will simultaneously send/receive tasks for 200+ clients

I am creating a server that will be sending and receiving tasks from over 200 clients simultaneously (potentially more client in the future). There will also be background engines on the clients that will perform tasks and send responses to the server without asking first. I expect there to be a high volume of information transferred both ways. I've been doing research into multi-threading and using the select function, and I'm wondering given some of the parameters of the project which option (or a combination) would be the most efficient scalable solution based on the amount of traffic that might occur.
Any suggestions would be greatly appreciated. I'd be glad to answer any questions to provide more clarity.

Either approach will work; as far is which is "better", that's going to depend a lot on how you define the word "better".
The single-threaded approach avoids any chance of problems with race conditions or deadlocks, because those problems inherently can't occur in a single-threaded program. In a multithreaded program you have to be extremely careful about data-locking patterns, or else you will find yourself trying to debug very mysterious malfunctions that only occur once every few days/weeks/months.
On the other hand, the single-threaded approach limits you to using a single core; it won't be able to take advantage of a modern multi-core CPU to give you a parallelism speedup.
On the third hand, the multi-threaded approach can get hairy (and lose its speedup potential) if the various threads/connections often need to access any shared/mutable data structures. In that "shared data bottleneck" scenario, the threads may spend a lot of their time blocked waiting to lock a mutex, and then you're mostly back to using a single core anyway. If each connection operates independently of the others (e.g. as part of a simple web server) and doesn't need to interact with the other threads, then this shouldn't be a concern.
Multithreading allows you to use blocking I/O (which is simpler to implement than non-blocking I/O), but blocking I/O limits your control over the threads (e.g. how do you get a thread to exit cleanly, or take some other non-client-initiated action, if it is blocked indefinitely inside a recv() call? There aren't any good solutions to that problem, only poor ones)
Single-threading requires you to use non-blocking I/O (otherwise a single unresponsive client can halt service to all the other clients while the server is blocked inside a send() or recv() call), and non-blocking I/O is tricky to do correctly, since you have to handle partial-reads and partial-writes gracefully.
If your program ever needs to do a non-trivial amount of computation or file I/O, note that a single-threaded design will force all clients to wait while the computation (or I/O) for any client completes. In a multithreaded design, OTOH, clients B through Z can continue to be serviced on other cores/threads while client A's is busy reading from the disk or crunching numbers.
The overhead of spawning and maintaining threads will vary from one OS to another. If you're going to be running hundreds of threads simultaneously, you might want to verify first that your target OS (and hardware) will be able to handle that load efficiently. (You can reduce the overhead of spawning and reaping threads via a thread-pool, at some expense of increased RAM usage)
I personally prefer the single-threaded/non-blocking-I/O approach, because blocking I/O is problematic if you want your program to be able to shut down cleanly and reliably (which you should want, if only so you can do e.g. memory-leak testing under valgrind). If single-core performance turns out to be insufficient, it's often fairly straightforward extend the handle-N-sockets-on-1-thread design to a more powerful handle-N-sockets-on-each-of-M-threads design, and then you can play around with different values of N and M until you find the one that gives you the best performance (e.g. by setting M to the number of cores on the host machine, and handing out newly-accepted sockets to whichever thread is currently handling the smallest number of sockets)

I once made a program in Java, a chat application, that each connection with the server that was established, represented a new Thread in the server, to manage the client in question.
Inside the Server class, there was a static variable, to manage which clients were connected.
I don't know if recommend different technologies is the right way to answer you question, but i think, that for your case, would be a good idea to take a look at Erlang/Elixir platform, the premise is the is able to hold a lot of clients at the same time.
Currently, big companies, like Whatsapp uses Erlang and Discord Elixir.
I hope that my answer was helpful.

Writing program to run on server, requesting experienced advice

I'm developing a program that will need to run on Internet servers (a back-end component to be used by several cross-platform programs). I'm familiar with the security precautions to take (to prevent buffer overflows and SQL Injection attacks, for instance), but have never written a server program before, or any program that will be used on this scale.
The program needs to be able to serve hundreds or thousands of clients simultaneously. The protocols are designed for processing speed and to minimize the amount of data that must be exchanged, and the server side will be written in C. There will be both a Windows and a Linux version from the same code.
Questions:
How should the program handle communications -- multiple threads, a single thread handling all the sockets in turn, or spawn a new process for every so many incoming connections (or for each one)?
Do I need to worry about things like memory fragmentation, since this program will need to run for months at a time?
What other design issues, specific to this kind of programming, might an experienced developer of cross-platform programs for desktop and mobile systems not be aware of?
Please, no suggestions to use a different language. That decision has already been made, for reasons I'm not at liberty to go into.

For I'd use libevent or libev and non-blocking I/O. This way the operating system will take case of most of your scheduling problems. I'd also use a thread pool for processing tasks, that by nature are blocking, so they don't block the main loop. And if you ever need to read or write large amounts of data to or from the disc, use mmap, again to let the OS handle as much as possible.
The basic advice is use the OS, as much as possible. If you want a good example of a program which does this look at Varnish, it is very well written, and performs fantastic.

With my experience running multiple servers for over 3 years of uptime, and programs with little over a year of uptime I can still recommend making the setup so that the system gracefully recovers from a program error and from a server reboot.
Even though performance gets a hit when a program is restarted, you need to be able to handle that as external circumstances can force the program to such a restart.
Don't try to reinvent the wheel when not needed, and have a look at zeromq or something like that to handle distribution of incoming communications. (If you are allowed to, prototype the backends in a more forgiving language than C like Python, then reimplement in C but keeping the communications protocol)

how is tcp(kernel) bypass implemented?

Assuming I would like to avoid the overhead of the linux kernel in handling incoming packets and instead would like to grab the packet directly from user space. I have googled around a bit and it seems that all that needs to happen is one would use raw sockets with some socket options. Is this the case? Or is it more involved than this? And if so, what can I google for or reference in order to implement something like this?

There are many techniques for networking with kernel bypass.
First, if you are sending messages to another process on the same machine, you can do so through a shared memory region with no jumps into the kernel.
Passing packets over a network without involving the kernel gets more interesting, and involves specialized hardware that gets direct access to user memory. This idea is called RDMA.
Here's one way it can work (this is what InfiniBand hardware does). The application registers a memory buffer with the RDMA hardware. This buffer is pinned in physical memory, since swapping it out would obviously be bad (since the hardware will keep writing to the physical memory region). A control region is also mapped into userspace memory. When an application is ready to use the buffer to send or receive a message, it writes a command to the control region. The hardware takes the data from a registered buffer on one end, and places it into another registered buffer at the other end.
Clearly, this is too low level, so there are abstractions that make programming RDMA hardware easier. OFED verbs are one such abstraction.
The InfiniBand software stack has one extra interesting bit: the Sockets Direct Protocol (SDP) that is used for compatibility with existing applications. It works by inserting an LD_PRELOAD shim that translates standard socket API calls into IB verbs.
InfiniBand is just what I'm most familiar with. RoCE/iWARP hardware is very similar from the programmer's perspective, but uses a different transport than InfiniBand (TCP using an offload engine in iWarp, Ethernet in RoCE). There are/were also other approaches to RDMA (Quadrics, for example).

C HTTP server - multithreading model?

I'm currently writing an HTTP server in C so that I'll learn about C, network programming and HTTP. I've implemented most of the simple stuff, but I'm only handling one connection at a time. Currently, I'm thinking about how to efficiently add multitasking to my project. Here are some of the options I thought about:
Use one thread per connection. Simple but can't handle many connections.
Use non-blocking API calls only and handle everything in one thread. Sounds interesting but using select()s and such excessively is said to be quite slow.
Some other multithreading model, e.g. something complex like lighttpd uses. (Probably) the best solution, but (probably) too difficult to implement.
Any thoughts on this?

There is no single best model for writing multi-tasked network servers. Different platforms have different solutions for high performance (I/O completion ports, epoll, kqueues). Be careful about going for maximum portability: some features are mimicked on other platforms (i.e. select() is available on Windows) and yield very poor performance because they are simply mapped onto some other native model.
Also, there are other models not covered in your list. In particular, the classic UNIX "pre-fork" model.
In all cases, use any form of asynchronous I/O when available. If it isn't, look into non-blocking synchronous I/O. Design your HTTP library around asynchronous streaming of data, but keep the I/O bit out of it. This is much harder than it sounds. It usually implies writing state machines for your protocol interpreter.
That last bit is most important because it will allow you to experiment with different representations. It might even allow you to write a compact core for each platform local, high-performance tools and swap this core from one platform to the other.

Yea, do the one that's interesting to you. When you're done with it, if you're not utterly sick of the project, benchmark it, profile it, and try one of the other techniques. Or, even more interesting, abandon the work, take the learnings, and move on to something completely different.

You could use an event loop as in node.js:
Source code of node (c, c++, javascript)
https://github.com/joyent/node
Ryan Dahl (the creator of node) outlines the reasoning behind the design of node.js, non-blocking io and the event loop as an alternative to multithreading in a webserver.
http://www.yuiblog.com/blog/2010/05/20/video-dahl/
Douglas Crockford discusses the event loop in Scene 6: Loopage (Friday, August 27, 2010)
http://www.yuiblog.com/blog/2010/08/30/yui-theater-douglas-crockford-crockford-on-javascript-scene-6-loopage-52-min/
An index of Douglas Crockford's above talk (if further background information is needed). Doesn't really apply to your question though.
http://yuiblog.com/crockford/

Look at your platforms most efficient socket polling model - epoll (linux), kqueue (freebsd), WSAEventSelect (Windows). Perhaps combine with a thread pool, handle N connections per thread. You could always start with select then replace with a more efficient model once it works.

A simple solution might be having multiple processes: have one process accept connections, and as soon as the connection is established fork and handle the connection in that child process.
An interesting variant of this technique is used by SER/OpenSER/Kamailio SIP proxy: there's one main process that accepts the connections and multiple child worker processes, connected via pipes. The parent sends the new filedescriptor through the socket. See this book excerpt at 17.4.2. Passing File Descriptors over UNIX Domain Sockets. The OpenSER/Kamailio SIP proxies are used for heavy-duty SIP processing where performance is a huge issue and they do very well with this technique (plus shared memory for information sharing). Multi-threading is probably easier to implement, though.

Server Architecture for Embedded Device

I am working on a server application for an embedded ARM platform. The ARM board is connected to various digital IOs, ADCs, etc that the system will consistently poll. It is currently running a Linux kernel with the hardware interfaces developed as drivers. The idea is to have a client application which can connect to the embedded device and receive the sensory data as it is updated and issue commands to the device (shutdown sensor 1, restart sensor 2, etc). Assume the access to the sensory devices is done through typical ioctl.
Now my question relates to the design/architecture of this server application running on the embedded device. At first I was thinking to use something like libevent or libev, lightweight C event handling libraries. The application would prioritize the sensor polling event (and then send the information to the client after the polling is done) and process client commands as they are received (over a typical TCP socket). The server would typically have a single connection but may have up to a dozen or so, but not something like thousands of connections. Is this the best approach to designing something like this? Of the two event handling libraries I listed, is one better for embedded applications or are there any other alternatives?
The other approach under consideration is a multi-threaded application in which the sensor polling is done in a prioritized/blocking thread which reads the sensory data and each client connection is handled in separate thread. The sensory data is updated into some sort of buffer/data structure and the connection threads handle sending out the data to the client and processing client commands (I supposed you would still need an event loop of sort in these threads to monitor for incoming commands). Are there any libraries or typical packages used which facilitate designing an application like this or is this something you have to start from scratch?
How would you design what I am trying to accomplish?

I would use a unix domain socket -- and write the library myself, can't see any advantages to using libvent since the application is tied to linux, and libevent is also for hundreds of connections. You can do all of what you are trying to do with a single thread in your daemon. KISS.
You don't need a dedicated master thread for priority queues you just need to write your threads so that it always processes high priority events before anything else.
In terms of libraries, you will possibly benifit from Google's protocol buffers (for serialization and representing your protocol) -- however it only has first class supports for C++, and the over the wire (serialization) format does a bit of simple bit shifting to numeric data. I doubt it will add any serious overhead. However an alternative is ASN.1 (asn1c).

My suggestion would be a modified form of your 2nd proposal. I would create a server that has two threads. One thread polling the sensors, and another for ALL of your client connections. I have used in embedded devices (MIPS) boost::asio library with great results.
A single thread that handles all sockets connections asynchronously can usually handle the load easily (of course, it depends on how many clients you have). It would then serve the data it has on a shared buffer. To reduce the amount and complexity of mutexes, I would create two buffers, one 'active' and another 'inactive', and a flag to indicate the current active buffer. The polling thread would read data and put it in the inactive buffer. When it finished and had created a 'consistent' state, it would flip the flag and swap the active and inactive buffers. This could be done atomically and should therefore not require anything more complex than this.
This would all be very simple to set up since you would pretty much have only two threads that know nothing about the other.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight