SCIF Issue on Xeon Phi

SCIF Issue on Xeon Phi - c

I am trying to use SCIF inter-process communication on Xeon Phi. My program has two processes, one process writes data to another process using scif_writeto. Currently, I encountered an error " No device or address" for the scif_writeto API. I checked that the end point is set up correct, the offset is also returned correctly. I don't have any idea about what's going wrong here. Is there any good suggestion to debug this issue?

In user mode scif_writeto() returns -1 in case of fail and set errno to indicate the error. Possible errors are described in scif.h.
You could check the errno to debug your problem.

Related

Spi interrupt handler works when a printf() is used

I am trying to initiate a spi communication between an omap processor an sam4l one. I have configured spi protocol and omap is the master. Now what I see is the test data I am sending is correctly reaching on sam4l and I can see the isr is printing that data. Using more printf here and there in isr makes the operation happen and the respective operation happens, but if I remove all printfs I can't see any operation happening. What can be the cause of this anomaly? Is it a usual case of wrong frequency settings or something?
If code is needed I will post that too but its big.
Thanks

I think you are trying to print message in driver.
As printing message on console with slow down your driver, it may behave slowly and your driver work well.
Use pr_info() for debug and change setting to not come message on console by editing /proc/sys/kernel/printk to 4 4 1 7
-> It will store debug message in buffer.
-> Driver not slow down because of printing message on screen.
-> And you can see it by typing dmesg command later.
Then find orignal problem which may cause error.

If a routine works with printf "here and there" and not otherwise, almostcertainly the problem is that there are timing issues. As a trivial example, let's say you write to an SPI flash and then check its content. The flash memory write will take some times, so if you check immediately, the data would not be valid, but if you insert a printf call in between, it may have taken enough time that the read back is now valid.

how to simulate abnormal case for socket/tcp programming in linux, such as terminating one side of connection?

i am learning to use SO_SNDTIMEO and SO_RCVTIMEO to check the timeout.
It is easy to use with read socket. But when i want to check write timeout, it always return successful. Here is what i did:(all in blocking mode)
close the client read socket and exit before server start write
terminate the client before server start write
unplug the cable of server after accept but before write
well, it seems all these case write just return sucessfully.
I think the reason should be that port is resource managed by os, and at the client side, after program gone, the tcp connection still shows FIN_WAIT2 state.
so, is there any convenient way to simulate some cases that write can receive errors such as EPIPE, EAGAIN?

How to get the error EAGAIN?
To get the error EAGAIN, you need to be using Non-Blocking Sockets. With Non-Blocking sockets, you need to write huge amounts of data (and stop receiving data on the peer side), so that your internal TCP buffer gets filled and returns this error.
How to get the error EPIPE?
To get the error EPIPE, you need to send large amount of data after closing the socket on the peer side. You can get more info about EPIPE error from this SO Link. I had asked a question about Broken Pipe Error in the link provided and the accepted answer gives a detailed explanation. It is important to note that to get EPIPE error you should have set the flags parameter of send to MSG_NOSIGNAL. Without that, an abnormal send can generate SIGPIPE signal.
Additional Note
Please note that it is difficult to simulate a write failure, as TCP generally stores the data that you are trying to write into it's internal buffer. So, if the internal buffer has sufficient space, then you won't get an error immediately. The best way is to try to write huge amounts of data. You can also try setting a smaller buffer size for send by using setsockopt function with SO_SNDBUF option

You can simulate errors using fault injection. For example, libfiu is a fault injection library that comes with an example project that allows you to simulate errors from POSIX functions. Basically it uses LD_PRELOAD to inject a wrapper around the regular system calls (including write), and then the wrapper can be configured to either pass through to the real system call, or return whatever error you like.

You could set the receive buffer size to be really small on one side, and send a large buffer on the other. Or on the one side set the send buffer small and try to send a large message.
Otherwise the most common test (I think) is to let the server and client talk for a while, and then remove a network cable.

How can i catch a runtime error in ansi C90

I am using the library Function ConnectToTCPServer. This function times out when the host is not reachable. In that case the application crashes with the following error:
"NON-FATAL RUN-TIME ERROR: "MyClient.c", line 93, col 15, thread id 0x000017F0: Library function error (return value == -11 [0xfffffff5]). Timeout error"
The Errorcode 11 is a Timeout error, so this could happen quite often in my application - however the application crashes - i would like to catch this error rather than having my application crash.
How can i catch this runtime error in Ansi C90?
EDIT:
Here is a Codesnippet of the current use:
ConnectToTCPServer(&srvHandle, srvPort, srvName, HPMClientCb, answer, timeout);
with
int HPMClientCb(UINT handle, int xType, int errCode, void *transData){
printf("This was never printed\n");
return errCode;
}
The Callbackfunction is never called. My Server is not running, so ConnectToTCPServer will timeout. I would suspect that the callback is called - but it never is called.
EDIT 2: The Callback function is actually not called, the Returnvalue of ConnectToTCPServer contains the same error information. I think it might be a bug that ConnectToTCPServer throws this error. I just need to catch it and bin it in C90. Any Ideas?
EDIT 3: I tested the Callbackfunction, on the rare occaision that my server is online the callback function is actually called - this does not help though because the callback is not called when an error occurs.

Looking in NI documentation, I see this:
"Library error breakpoints -- You can set an option to break program execution whenever a LabWindows/CVI library function returns an error during run time. "
I would speculate they have a debug option to cause the program to stop on run-time errors, which you need to disable in configuration, in compile time or in run-time.
My first guess would have been configuration value or compilation flag, but this is the only option I found, which is a run-time option:
// If debugging is enabled, this function directs LabWindows/CVI not
// to display a run-time error dialog box when a National Instruments
// library function reports an error.
DisableBreakOnLibraryErrors();
Say if it helped.

Theres no such thing as a general case of "catching" an error (or an 'exception') in standard C. Thats up to your library to decide what to do with it. Likely its logging its state and then simply calling abort(). In Unix, that signals SIGABRT which can be handled and not just exit()ed. Or their library may just be logging and then calling exit().
You could run your application under a utility like strace to see what system calls are being performed and what signals are being asserted.
I'd work with your vendor if you can't make any headway otherwise.

From the documentation, it seems you should get a call to your clientCallbackFunction when an error occurs. If you don't, you should edit your question to clarify that.

I'm not sure I understand you.
I looked at the documentation for the library function ConnectToTCPServer(). It returns an int; 0 means success, negative numbers are the error codes.
EDIT: Here is a Codesnippet of the
current use:
ConnectToTCPServer(&srvHandle, srvPort, srvName, HPMClientCb, answer, timeout);
If that's really the current use, you don't seem to be trying to tell whether ConnectToTCPServer() succeeds. To do that, you'd need
int err_code;
...
err_code = ConnectToTCPServer(&srvHandle, srvPort, srvName, HPMClientCb, answer, timeout);
and then test err_code.
The documentation for ConnectToTCPServer()implies that your callback function won't be called unless there's a message from a TCP server. No server, no message. In that case,
ConnectToTCPServer() should return a negative number.
You should check the return value of ConnectToTCPServer().
Finding a negative number there, you should do something sensible.
Did I understand the documentation correctly?

Normally, you should be able to simply check the return value. The fact that your application exits implies that something is already catching the error and asserting (or something similar). Without seeing any context (i.e. code demonstrating how you're using this function), it's difficult to be any more precise.

The documentation states that ConnectToTCPServer will return the error code. The callback is only called if the connection is established, disconnected or when there is data ready to be read.
The message you get states that the error is NON-FATAL, hence it shouldn't abort. If you're sure the code doesn't abort later it seems indeed like a bug in the library.
I'm not familiar with CVI, but there might be a (compile-/runtime-) option to abort even on non-fatal errors (for debugging purposes). If you can reproduce this in a minimal example you should report it to NI.

uart buffer is not read

I am trying to read binary data from a serial device in c on linux.
The problem is, that sometimes there are chars in the driver's internal buffer, but polling (with select(2)) returns saying the device is not ready to be read.
I have read and re-read the man of termios and all the related man and searched over the internet. I believe I set all the flags correctly (namely VTIME, VMIN) and unset ICANON.
I tried using the function "tcmakeraw", as well, but it didn't solve the problem.
Do you guys have any ideas about what should I do?
Kind regards & Thanks in advance
Yannay

You should show us the code. I would start with using cfmakeraw on the serial port.
Once you have things working in raw mode, you can make modification and see how it works.
Here is a list of question or things you could check :
after modifying the attribute, using for example cfmakeraw, do you call tcsetattr(...) to
apply your change ?
How do you prove there is still data in the driver receive buffer ?
do you check your system call for errors ?
what is the result of stracing your program ?
Edit based on your comments :
Your protocol "guarantee" .... => check your assumption ! Unchecked, crystal clear guarantee are a good coandidate for "impossible error"
Basically : either select is broken, or your serial driver. Reason for serial driver being broken is a hardware fifo not being full enough to trigger un interrupt, or loosing an interrupt.

What happens when you read directly (not through C) /dev/ttyS0 (or equiv) after you setserial your parameters. Are you able to get the needed data outside of the select()?

C Socket Programming, problems with select() and fd_set

I'm learning my way about socket programming in C (referring to Beej).
Here is a simple multi-user chat server i'm trying to implement:
http://pastebin.com/gDzd0WqP
On runtime, it gives Bus Error. It's coming from the lines 68-78.
Help me trace the source of the problem?
in fact, WHY is my code even REACHING that particular region? I've just run the server. no clients have connected.. :#
ps - i know my code is highly unreliable (no error checks anywhere), but i WILL do that at a later stage, i just want to TEST the functionality of the code before implementing it in all it's glory ;)

line 81
msg[MSG_SIZE] = '\0';`
overruns your buffer. Make it
msg[MSG_SIZE - 1] = '\0';`
You also need to check the return value of all the calls that can fail, that's line 39,42,45,68 and 80
Edit: And if you'd checked for errors, likely you'd seen the accept() call fail, likely due to the socket not being in listen mode - that is, you're missing a call to listen()

Another thing to consider is that you can't necessarily copy fd_set variables by simple assignment. The only portable way to handle them is to regenerate the fd_set from scratch by looping over a list of active file descriptors each time.