The question is really general, so here is some more detailed information:
I am currently running Ubuntu 14.04 and working on a neural network. To find optimal parameters I want to train many differently parametrized networks and see which one works best.
The network and its training sequence are written in C and I have a 4-core processor. If I run the program it trains each network one after another.
The system monitor tells me the program is using about 25 percent of the total processing power. How can I improve that? What is the best way to use all cores equally and 100 percent of my CPU (and GPU?)?
Currently I am compiling with the -pthread flag, but I guess there are many more possibilities.
Yes, the question is general. So is the answer: learn about concurrent programming. Threads, or OpenMP. Especially with OpenMP you might turn your program into a multi-threaded program by adding a single #pragma before the right for loop.
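For example, a minimal sketch, assuming each training run is independent (train_network, struct net_params and the surrounding names are placeholders for your own code):

struct net_params { double learning_rate; int hidden_units; }; /* illustrative */

void train_network(struct net_params *p); /* your existing training code */

/* Train each parametrized network on its own core; iterations must not
 * share mutable state for this to be safe. */
void train_all(struct net_params *params, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        train_network(&params[i]);
}

Compile with gcc -fopenmp; on a 4-core machine this should push you toward 100 percent CPU use as long as there are at least four networks left to train.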
A different approach could be to have each of the four trainings be performed by a different process. The strategy would be to use main's arguments (argc, argv) to tell each process what to do. This is easy if there needs to be no communication between the processes.
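A minimal sketch of that strategy (run_training is a placeholder for your existing code):

#include <stdio.h>
#include <stdlib.h>

void run_training(int config_index); /* your existing training code */

int main(int argc, char *argv[])
{
    if (argc < 2)
    {
        fprintf(stderr, "usage: %s <config-index>\n", argv[0]);
        return EXIT_FAILURE;
    }
    run_training(atoi(argv[1])); /* each process trains one configuration */
    return EXIT_SUCCESS;
}

You would then launch one instance per core from the shell, e.g. ./train 0 & ./train 1 & ./train 2 & ./train 3 &.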
I would suggest you look into OpenCL and OpenMP as ways to fully exploit the processing power. There has been a lot of work on neural nets using OpenCL and CUDA.
These approaches are probably better suited to your neural net. In addition, OpenCL and OpenMP applications can be compiled to use both CPU and GPU hardware with no significant changes.
OpenCL is a C-like language, and although getting optimal performance from it can be quite tricky, it would, IMO, be well worth your while if neural net stuff is important to you. In OpenCL you write the bulk of support code in C and invoke a small kernel in OpenCL to do small operations on large volumes of data in parallel.
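To give a feel for the style, here is a purely illustrative OpenCL kernel sketch; a real net would use something more involved than this single-layer weighted sum, and the host-side setup (context, buffers, enqueueing the kernel) is omitted:

/* Each work-item computes one neuron's activation as a weighted sum
 * over the inputs. */
__kernel void neuron_forward(__global const float *inputs,
                             __global const float *weights,
                             __global float *outputs,
                             const int n_inputs)
{
    int neuron = get_global_id(0);
    float sum = 0.0f;
    for (int i = 0; i < n_inputs; i++)
        sum += inputs[i] * weights[neuron * n_inputs + i];
    outputs[neuron] = tanh(sum); /* activation function */
}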
You may be developing your own software, but I believe that the FANN neural network library did have a version that supported OpenCL.
I am working on a project related to embedded systems and wanted to know whether it is good practice to use the function exit(0) in your program.
if (Req[0] == '0')
{
    puts("Emergency stop button operated\n");
    **exit(0);**
}
exit, as well as returning from main(), only makes sense in hosted systems where there is an OS to return to. Most embedded systems do not have that, but are so-called "freestanding" systems: "bare metal" or RTOS microcontroller applications. A compiler for such a system need not provide stdlib.h so the function exit might not even be available.
The normal way to handle errors in such systems is to implement a custom error handler, which can log or print the error. And from there on in case of critical errors, you usually provoke a watchdog reset, leading to a full system re-boot. This is because errors in embedded systems are often hardware-related, and a watchdog reset doesn't just restore the software to default, but also all MCU registers and peripheral hardware connected to the MCU.
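A minimal sketch of such a handler; the watchdog register name and address are made-up placeholders for whatever your MCU actually provides:

#include <stdint.h>

/* Hypothetical watchdog refresh register; substitute your MCU's own. */
#define WDOG_REFRESH (*(volatile uint32_t *)0x40052000u)

static void log_error(int code)
{
    (void)code; /* write the code to flash, a debug UART, EEPROM, ... */
}

/* Custom error handler: log, then stop refreshing the watchdog and
 * spin until it resets the whole MCU, peripherals included. */
void fatal_error(int code)
{
    log_error(code);
    for (;;)
    {
        /* deliberately never touch WDOG_REFRESH here */
    }
}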
In high integrity embedded systems, such as the ones that actually have a real emergency stop, it is however common to keep the program running but revert it to a safe state. The MCU keeps running but all dangerous outputs etc are disabled, either to allow the system to get shut down in a controlled manner, or to "limp home" and keep running as well as it is still capable of. Writing software for such system is quite a big topic of its own.
Short answer, as @Felix G said in a comment: if you work with an operating system, exit(0); is relevant; if you work on a bare-metal application, it is not.
Please refer to @Felix G's comment for more details.
By the way **exit(0);** is not a correct statement in C. You may mean exit(0); or /*exit(0);*/.
I am a beginner and I am a little bit confused about the difference between a task in an RTOS and a state machine. Let's take an example of a state machine I am willing to implement:
enum states {
    READY_STATE,
    RUNNING_STATE,
    BLOCKED_STATE,
    FINISHED_STATE
} STATES;
What are the benefits of using an RTOS and creating tasks if this state machine can be driven with events/interrupts without using any RTOS?
An RTOS is used to handle program complexity by placing unrelated tasks in different processes, so that they can execute seemingly simultaneously. For example, it might make sense to split application logic, GUI and serial communication into 3 independent processes.
Whether this gives true multi-processing, or a multi-processing simulation, depends on the number of CPU cores available. Traditionally, most RTOSes are multi-processing simulations on a single core.
A state machine on the other hand, is a program design specification, which may or may not have the purpose of splitting up complexity. So it is not necessarily related to RTOS.
You can however design a "poor man's RTOS" in the manner of a finite state machine, where you give each state a certain time slice and expect the state to finish before the slice elapses (or the watchdog will bite). This can give the same real-time behavior as an RTOS, but there will be only one single stack and no "true" context switches.
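A minimal sketch of that idea: a cyclic executive where each state/task gets one slice per lap around the loop. kick_watchdog and wait_for_tick stand in for platform-specific primitives:

typedef void (*task_fn)(void);

void kick_watchdog(void);  /* platform-specific */
void wait_for_tick(void);  /* e.g. sleep until the next 1 ms timer tick */

static void app_logic(void)  { /* ... */ }
static void gui_update(void) { /* ... */ }
static void serial_io(void)  { /* ... */ }

static const task_fn tasks[] = { app_logic, gui_update, serial_io };

void scheduler_loop(void)
{
    for (unsigned i = 0;; i = (i + 1u) % (sizeof tasks / sizeof tasks[0]))
    {
        tasks[i]();       /* each state must return before its slice ends */
        kick_watchdog();  /* if it hangs instead, the watchdog bites      */
        wait_for_tick();
    }
}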
Picking bare metal or an RTOS depends a lot on program complexity. Unless the original program design is state of the art (it rarely is), bare-metal programs tend to become a pain when they grow to somewhere between 50k and 100k LOC. In those situations, picking an RTOS from the start would perhaps have been wiser.
On the other hand, if you don't think the program will ever grow that large, bare metal is so much easier to work with. An RTOS introduces extra complexity, and extra complexity introduces bugs. The golden rule is to always keep software as simple as possible.
I understand that implementing a state machine is the perfect way to program the computer. Since state machines are typically programmed using non-blocking calls, I wonder why blocking calls similar to the Berkeley sockets APIs were invented? Don't they encourage bad programming practice?
Thanks in advance.
Edit: The idea behind this question is to establish the fact that a multi-context event driven state machine based on non-blocking IO is indeed the perfect way to program the computer. Everything else is amateur. People who think otherwise should allow for a debate.
Your question makes some pretty substantial assertions / assumptions:
the underlying nature of computers is a state machine?
Well, surely you can model computers as state machines, but that does not in itself mean that such a model represents some fundamental "underlying nature".
I understand that implementing a state machine is the perfect way to program the computer.
Then by all means, write all your programs as state machines. Good luck.
In real life, some tasks can be conveniently and effectively written as state machines, but there are many for which a state-machine approach would be cumbersome to write and difficult to understand or maintain.
There is no "perfect" way to program a computer. Indeed, it would be pretty pretentious to claim perfection even for a single program.
Since state machines are typically programmed using non-blocking calls,
You don't say? I think you would need to be a lot more specific about what you mean by this. I have written state-machine based software at times in the past, and I would not characterize any of it as having been implemented using non-blocking calls, nor as exposing a non-blocking external API.
I wonder why blocking calls similar to the Berkeley sockets APIs were invented? Don't they encourage bad programming practice?
Before we could even consider this question, you would have to define what you mean by "bad programming practice". From what I can see, however, you are assuming the conclusion:
you assert that a state-machine approach to programming is ideal, with the implication that anything else is sub-par.
you claim, without support, that only non-blocking calls have state-machine nature
you conclude that anything that uses blocking calls must exhibit bad programming practice.
Your conclusion is not consistent with the prevailing opinion and practice of the programming community, to the extent that I can gauge it. Your argument is hollow and unconvincing.
Multiple processes (or later, threads) with synchronous (blocking) calls are easy to understand and program, and easily composable: that is, you can take two tasks that are made up of synchronous calls and run them at the same time (via a scheduler) without having to modify either one in any way.
Programming as a state machine, on the other hand, requires either manually adding states (possibly in combinatorially growing numbers) when you add new code, or some kind of tightly-coupled framework of registering handlers and explicitly storing state for the next handler to use.
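To make the contrast concrete, here is a sketch; blocking_read and nonblocking_read stand in for e.g. read() on a blocking vs. non-blocking descriptor:

#include <stddef.h>

size_t blocking_read(int fd, char *buf, size_t n);    /* stand-in, blocks    */
size_t nonblocking_read(int fd, char *buf, size_t n); /* stand-in, returns 0
                                                         if nothing is ready */

/* Blocking style: all progress lives in local variables on the stack,
 * and a scheduler can run many of these side by side unchanged. */
void read_message_blocking(int fd, char *buf, size_t len)
{
    size_t got = 0;
    while (got < len)
        got += blocking_read(fd, buf + got, len - got);
}

/* State-machine style: progress must be stored explicitly in a context
 * object between invocations of the handler. */
struct read_ctx { char *buf; size_t len; size_t got; };

void on_readable(struct read_ctx *ctx, int fd)
{
    ctx->got += nonblocking_read(fd, ctx->buf + ctx->got,
                                 ctx->len - ctx->got);
    /* when ctx->got == ctx->len the message is complete, and some
     * framework must dispatch to whichever handler comes next */
}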
What? 'blocking call' implies preemptive multitasking. The kernel of such an OS is a state-machine with interrupts as input events and a set of running threads as output actions.
The OS kernel is a state machine, and blocking calls conveniently move the FSM functionality into the kernel so that you don't have to write the miserable state-machines in user apps.
I understand that implementing a state machine is the perfect way to program the computer
What? 'perfect'? What? Have you ever developed, debugged and delivered any non-trivial multithreaded app?
So I have an embedded (Linux) system with a crypto co-processor and two userspace applications which need to use it: SSL (httpd) and proprietary code; maximizing speed and efficiency is the main requirement. I spent the day examining the kernel hooks and the part's registers and have come to three possible solutions:
1) Access the co-processor directly since it's memory mapped;
2) Use the /dev/crypto library
3) Use OpenSSL calls for my proprietary application
During standard operation, SSL is used very rarely and the proprietary application produces a very heavy load of plaintext needing crypto. Here are the pros and cons for each option as I see them, and how I got to this quandary:
1) Direct access
--Pros: probably the fastest method, closest to complete control of the crypto co-processor, least overhead, great for the proprietary app
--Cons: Race conditions or interference could occur when SSL is being used... I'm not sure how bad two userspace apps trying to asynchronously share a hardware resource could hork things up, and I may not know until a customer finds out and complains
2) /dev/crypto
--Pros: SSL already uses it, I believe it's session-based, so sharing problems would be mitigated if not avoided completely (see the /dev/crypto sketch at the end of this question)
--Cons: More overhead, lack of documentation for proper ioctl()s to configure the co-processor correctly for optimal, high duty cycle use
3) Use SSL
--Pros: already setup and working with /dev/crypto, rarely used... so it's just there and available for crypto calls, and probably the best resource sharing management
--Cons: Probably the most overhead, may not be using /dev/crypto as efficiently as possible, and things could get bursty when both the proprietary app and httpd require SSL
I'd really like to use option 1, and will code up a test framework in the morning, but I'm curious if anyone else out there has had this problem or has any opinions. Thanks!
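For reference, a sketch of what option 2 looks like in code, assuming the cryptodev-linux flavor of /dev/crypto (the OCF variant differs in detail; error handling omitted, and key, iv and the buffers are assumed to be set up by the caller):

#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <crypto/cryptodev.h>

int encrypt_buffer(uint8_t *key, uint8_t *iv,
                   uint8_t *plaintext, uint8_t *ciphertext, size_t len)
{
    int cfd = open("/dev/crypto", O_RDWR);

    /* Each caller gets its own session, which is what mitigates the
     * sharing problem from option 1. */
    struct session_op sess = { 0 };
    sess.cipher = CRYPTO_AES_CBC;
    sess.keylen = 16;
    sess.key    = key;
    ioctl(cfd, CIOCGSESSION, &sess);

    struct crypt_op cop = { 0 };
    cop.ses = sess.ses;
    cop.op  = COP_ENCRYPT;
    cop.len = len;
    cop.src = plaintext;
    cop.dst = ciphertext;
    cop.iv  = iv;
    ioctl(cfd, CIOCCRYPT, &cop);

    ioctl(cfd, CIOCFSESSION, &sess.ses);
    close(cfd);
    return 0;
}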
After writing several different custom serial protocols for various projects, I've started to become frustrated with re-inventing the wheel every time. In lieu of continuing to develop custom solutions for every project, I've been searching for a more general solution. I was wondering if anyone knows of a serial protocol (or better yet, implementation) that meets the following requirements:
Support multiple devices. We'd like to be able to support an RS485 bus.
Guaranteed delivery. Some sort of acknowledgement mechanism, and some simple error detection (CRC16 is probably fine; see the sketch below).
Not master/slave. Ideally the slave(s) would be able to send data asynchronously. This is mostly just for aesthetic reasons; the concept of polling each slave doesn't feel right to me.
OS independence. Ideally it wouldn't rely on a preemptive multitasking environment at all. I'm willing to concede this if I can get the other stuff.
ANSI C. We need to be able to compile it for several different architectures.
Speed isn't too much of an issue, we're willing to give up some speed in order to meet some of those other needs. We would, however, like to minimize the amount of required resources.
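Since CRC16 came up above: a sketch of the CRC16-CCITT variant (polynomial 0x1021, initial value 0xFFFF), one common choice for this kind of framing; pick whichever CRC16 variant your ecosystem already uses:

#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC16-CCITT; a table-driven version trades ROM for speed. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++)
    {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}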
I'm about to start implementing a sliding window protocol with piggybacked ACKs and without selective repeat, but thought that perhaps someone could save me the trouble. Does anyone know of an existing project that I could leverage? Or perhaps a better strategy?
UPDATE
I have seriously considered a TCP/IP implementation, but was really hoping for something more lightweight. Many of the features of TCP/IP are overkill for what I'm trying to do. I'm willing to accept (begrudgingly) that perhaps the features I want just aren't included in lighter protocols.
UPDATE 2
Thanks for the tips on CAN. I have looked at it in the past and will probably use it in the future. I'd really like the library to handle the acknowledgements, buffering, retries etc, though. I guess I'm more looking for a network/transport layer instead of a datalink/physical layer.
UPDATE 3
So it sounds like the state of the art in this area is:
A trimmed down TCP/IP stack. Probably starting with something like lwIP or uIP.
A CAN-based implementation; it would probably rely heavily on the CAN bus, so it wouldn't be useful on other physical layers. Something like CAN Festival could help along the way.
An HDLC or SDLC implementation (like this one). This is probably the route we'll take.
Please feel free to post more answers if you come across this question.
Have you considered HDLC or SDLC?
There's also LAP/D (Link Access Protocol, D-Channel).
Uyless Black's "Data Link Protocols" is always nearby on my bookshelf - you might find some useful material in there too (even peruse the TOC & research the different protocols)
CAN meets a number of your criteria:
Support multiple devices: It supports a large number of devices on one bus. It's not, however, compatible with RS485.
Guaranteed delivery: The physical layer uses bit-stuffing and a CRC, both of which are implemented in hardware on an increasing number of modern embedded processors. If you need acknowledgement, you need to add that on top yourself.
Not master/slave: There are no masters or slaves; all devices can transmit whenever they want. The processor hardware deals with arbitration and contention.
OS independence: Not applicable; it's a low-level bus. What you put on top of that is up to you.
ANSI C: Again, not applicable.
Speed: Typically, up to 1 Mbps up to 40 m; you can choose your own speed for your application.
As mentioned, its definition is fairly low-level, so there's still work to be done to turn it into a full protocol to meet your needs. However, the fact that a lot of the work is done in hardware for you makes it very useful for a variety of applications.
I'd guess a reasonable starting point could be uIP.
(Adding Wikipedia article on µIP since original link is dead.)
Would you consider the MODBUS protocol? It is master/slave oriented, so the slave could not initiate the transfer, but otherwise it is lightweight to implement, free, and well supported with high-level tools. You should just get a grasp of their terminology (like holding register, input register, output coil, etc.).
The PHY level could be RS232, RS485, Ethernet...
Have a look at Microcontroller Internet Network (MIN):
https://github.com/min-protocol/min
Inspired by CAN but using standard UART hardware, with Fletcher's checksum and frame format checking for error detection and byte-stuffing to mark a frame header.
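For reference, Fletcher's checksum is cheap to compute on small MCUs. A sketch of the textbook 16-bit variant (not necessarily MIN's exact implementation):

#include <stddef.h>
#include <stdint.h>

/* Fletcher-16: two running sums modulo 255; the second sum makes the
 * result sensitive to byte order, unlike a plain additive checksum. */
uint16_t fletcher16(const uint8_t *data, size_t len)
{
    uint16_t sum1 = 0, sum2 = 0;
    for (size_t i = 0; i < len; i++)
    {
        sum1 = (sum1 + data[i]) % 255u;
        sum2 = (sum2 + sum1) % 255u;
    }
    return (uint16_t)((sum2 << 8) | sum1);
}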
Take a look at Profibus.
If you don't want master/slave, I think you ought to do the arbitration in hardware (CAN bus, FlexRay).