Does SDL on linux support more than one gamepad/joystick?

I have a cheap PS3 controller and a NEO-GEO X controller. Both are detected on, e.g., Fedora 20 and Lubuntu 14.04, and both appear in lsusb:
Bus 001 Device 012: ID 0e8f:0003 GreenAsia Inc. MaxFire Blaze2
Bus 001 Device 016: ID 1292:4e47 Innomedia
Both devices appear under /dev/input. Running udevadm on each shows that the GreenAsia device uses the pantherlord driver, whereas the other uses hid-generic.
If I run the following test code, only the GreenAsia device is reported by SDL. If I unplug it, the other device is detected. Is this a known limitation of SDL or some other issue?
// from http://www.libsdl.org/release/SDL-1.2.15/docs/html/guideinput.html
#include <stdio.h>   // printf, fprintf
#include <stdlib.h>  // exit
#include "SDL/SDL.h"
int main (int argc, char *argv[]) {
if (SDL_Init( SDL_INIT_VIDEO | SDL_INIT_JOYSTICK ) < 0)
{
fprintf(stderr, "Couldn't initialize SDL: %s\n", SDL_GetError());
exit(1);
}
printf("%i joysticks were found.\n\n", SDL_NumJoysticks() );
printf("The names of the joysticks are:\n");
for( int i=0; i < SDL_NumJoysticks(); i++ )
{
printf(" %s\n", SDL_JoystickName(i));
}
SDL_Quit();
return 0;
}

The answer to my question appears to be "no" when only one of the joysticks has an event device (/dev/input/event13 or similar), which is the case for my PS3 controller.
In SDL_SYS_JoystickInit there is the following code:
#if SDL_INPUT_LINUXEV
/* This is a special case...
If the event devices are valid then the joystick devices
will be duplicates but without extra information about their
hats or balls. Unfortunately, the event devices can't
currently be calibrated, so it's a win-lose situation.
So : /dev/input/eventX = /dev/input/jsY = /dev/jsY
*/
if ( (i == 0) && (numjoysticks > 0) )
break;
#endif
When i is 0, the loop is scanning the "event" devices. My PS3 controller gets /dev/input/event13 and /dev/input/js1, but my NEO-GEO X controller only has /dev/input/js0, so breaking out of the loop after the event pass causes it to be ignored.
A workaround in this case is to add the device that has no corresponding "event" device to the SDL_JOYSTICK_DEVICE environment variable.
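A minimal sketch of that workaround, assuming SDL 1.2's Linux backend, which reads SDL_JOYSTICK_DEVICE as a colon-separated list of device paths and opens those before it scans /dev/input. The helper name `set_sdl_joystick_devices` is hypothetical; it has to run before `SDL_Init(SDL_INIT_JOYSTICK)`.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: joins device paths with ':' and exports them as
   SDL_JOYSTICK_DEVICE so SDL 1.2 opens them in addition to the devices
   it finds itself. Returns 0 on success, -1 on error. */
int set_sdl_joystick_devices(const char *const *paths, size_t count)
{
    size_t len = 1; /* terminating NUL */
    for (size_t i = 0; i < count; i++)
        len += strlen(paths[i]) + 1; /* path plus ':' separator */

    char *value = malloc(len);
    if (value == NULL)
        return -1;
    value[0] = '\0';

    for (size_t i = 0; i < count; i++) {
        if (i > 0)
            strcat(value, ":");
        strcat(value, paths[i]);
    }

    int rc = setenv("SDL_JOYSTICK_DEVICE", value, 1); /* 1 = overwrite */
    free(value); /* setenv() keeps its own copy */
    return rc == 0 ? 0 : -1;
}
```

For the setup above this would be called with a single path, `/dev/input/js0`, just before `SDL_Init`. Exporting the variable in the shell (`SDL_JOYSTICK_DEVICE=/dev/input/js0 ./a.out`) should have the same effect without touching the code.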
Thanks to Brian McFarland for his help in getting to the bottom of this.

Related

Wrong USB Location address of the FT4222 on my Windows 7

I have a Windows 7 laptop (Sony pcg 81113M) with 3 USB ports. When I connect two FT4222HQ devices and run the "Getting Started" code for the FT4222 (C++, Qt Creator), I get a wrong USB location ID of 0x00. The same happens when I connect only one device to either of those two ports.
I verify the USB location of the connected FTDI devices using the USBView software.
I am using "connect by location" in my program, and if I connect two devices at the same time, it treats them as one device (because the location ID is the same).
The FTDI driver is the latest version, and I can see all the interfaces in Device Manager.
Note: the third USB port works normally, and there I get a correct location ID.
The OS is Windows 7 64-bit; the FTDI code is built as 32-bit (I get the same result with a 64-bit build).
Does anyone have an idea about this? Is there some other test I can do to figure out the problem?
Here is the FTDI "Getting Started" code:
[source] https://www.ftdichip.com/Support/SoftwareExamples/LibFT4222-v1.4.4.zip
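// Lists all FTDI devices and collects the FT4222 ones in g_FT4222DevList
// (g_FT4222DevList and DeviceFlagToString() are defined elsewhere in the sample)
void ListFtUsbDevices()
{
FT_STATUS ftStatus = 0;
DWORD numOfDevices = 0;
ftStatus = FT_CreateDeviceInfoList(&numOfDevices);
if (FT_OK != ftStatus)
{
printf("FT_CreateDeviceInfoList failed (status %lu)\n", (unsigned long)ftStatus);
return;
}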
for(DWORD iDev=0; iDev<numOfDevices; ++iDev)
{
FT_DEVICE_LIST_INFO_NODE devInfo;
memset(&devInfo, 0, sizeof(devInfo));
ftStatus = FT_GetDeviceInfoDetail(iDev, &devInfo.Flags, &devInfo.Type, &devInfo.ID, &devInfo.LocId,
devInfo.SerialNumber,
devInfo.Description,
&devInfo.ftHandle);
if (FT_OK == ftStatus)
{
printf("Dev %d:\n", iDev);
printf(" Flags= 0x%x, (%s)\n", devInfo.Flags, DeviceFlagToString(devInfo.Flags).c_str());
printf(" Type= 0x%x\n", devInfo.Type);
printf(" ID= 0x%x\n", devInfo.ID);
printf(" LocId= 0x%x\n", devInfo.LocId);
printf(" SerialNumber= %s\n", devInfo.SerialNumber);
printf(" Description= %s\n", devInfo.Description);
printf(" ftHandle= %p\n", (void *)devInfo.ftHandle);
const std::string desc = devInfo.Description;
if(desc == "FT4222" || desc == "FT4222 A")
{
g_FT4222DevList.push_back(devInfo);
}
}
}
}
Main program:
int main(int argc, char const *argv[])
{
FT_STATUS ftStatus;
FT_HANDLE ftHandle = NULL;
ListFtUsbDevices();
if(g_FT4222DevList.empty()) {
printf("No FT4222 device is found!\n");
return 0;
}
ftStatus = FT_OpenEx((PVOID)g_FT4222DevList[0].LocId, FT_OPEN_BY_LOCATION, &ftHandle);
if (FT_OK != ftStatus)
{
printf("Open a FT4222 device failed!\n");
return 0;
}
printf("\n\n");
printf("Init FT4222 as SPI master\n");
ftStatus = FT4222_SPIMaster_Init(ftHandle, SPI_IO_SINGLE, CLK_DIV_4, CLK_IDLE_LOW, CLK_LEADING, 0x01);
if (FT_OK != ftStatus)
{
printf("Init FT4222 as SPI master device failed!\n");
FT_Close(ftHandle);
return 0;
}
printf("TODO ...\n");
printf("\n");
printf("UnInitialize FT4222\n");
FT4222_UnInitialize(ftHandle);
printf("Close FT device\n");
FT_Close(ftHandle);
return 0;
}

From FTDI's TN_152 "USB 3.0 Compatibility Issues Explained" (https://ftdichip.com/wp-content/uploads/2020/08/TN_152_USB_3.0_Compatibility_Issues_Explained.pdf), section 2.1.2, "Location ID Returned As 0":
LocationIDs are not strictly part of the USB spec in the format
provided by FTDI. The feature was added as an additional option to
back up identifying and opening ports by index, serial number or
product description strings.
When connected to a USB 2.0 port the location is provided on the basis
of the USB port that the device is connected to. These values are
derived from specific registry keys. As the registry tree for 3rd
party USB 3.0 host drivers is different to the Microsoft generic
driver the Location ID cannot be calculated.
There is no workaround to this current issue and as such devices
should be listed and opened by index, serial number or product
description strings.
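So this is known behaviour: the two affected ports are presumably USB 3.0 ports driven by a third-party host driver. By the way, Windows 7 reached end of life in January 2020, so changing the OS is strongly advised.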
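Following the note's advice, a program can check whether the location IDs reported by FT_GetDeviceInfoDetail are usable and otherwise open each device by serial number instead, e.g. `FT_OpenEx(devInfo.SerialNumber, FT_OPEN_BY_SERIAL_NUMBER, &ftHandle)`. A minimal sketch of the check; the helper name is hypothetical, and plain `unsigned long` stands in for DWORD so it compiles without the FTDI headers.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if every location ID is non-zero and
   unique, i.e. FT_OPEN_BY_LOCATION can tell the devices apart; returns
   0 otherwise (the TN_152 "Location ID returned as 0" case, or two
   devices reporting the same ID). */
int location_ids_usable(const unsigned long *loc_ids, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (loc_ids[i] == 0)
            return 0; /* location could not be computed */
        for (size_t j = i + 1; j < count; j++)
            if (loc_ids[i] == loc_ids[j])
                return 0; /* two devices share a location */
    }
    return 1;
}
```

When this returns 0, falling back to serial numbers (or descriptions) keeps two FT4222 boards distinguishable regardless of which port they are plugged into.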

VIDIOC_ENUMINPUT Not returning any video standards

I have been playing around with a userspace application for the uvc driver, based on V4L2. While trying to get the capabilities of my integrated webcam (this is a laptop), I ran into a problem: my driver does not set any video standard flags in the std field returned by the VIDIOC_ENUMINPUT ioctl. Here is my code:
struct v4l2_capability caps;
memset(&caps, 0, sizeof(caps));
if(-1 == ioctl(fd, VIDIOC_QUERYCAP, &caps)) {
perror("Unable to query capabilities");
return errno;
}
printf(
"-------- VIDIOC_QUERYCAP --------\n"
"Driver = %s\n"
"Card = %s\n"
"Bus Info = %s\n"
"Version = %d\n"
"Capabilities = %#x\n"
"Device Caps = %#x\n",
caps.driver,
caps.card,
caps.bus_info,
caps.version,
caps.capabilities,
caps.device_caps);
int index;
if(-1 == ioctl(fd, VIDIOC_G_INPUT, &index)) {
perror("Unable to get current input index");
return errno;
}
struct v4l2_input input;
memset(&input, 0, sizeof(input));
input.index = index;
if(-1 == ioctl(fd, VIDIOC_ENUMINPUT, &input)) {
perror("Unable to query attributes of video input");
return errno;
}
printf(
"--------- VIDIOC_ENUMINPUT ---------\n"
"Index = %u\n"
"Name = %s\n"
"Type = %u\n"
"Audio Set = %u\n"
"Video Stds = %llu\n"
"Status = %u\n"
"Capabilities = %u\n",
input.index,
input.name,
input.type,
input.audioset,
(unsigned long long)input.std, // v4l2_std_id is an unsigned 64-bit mask
input.status,
input.capabilities);
And the output looks like the following.
-------- VIDIOC_QUERYCAP --------
Driver = uvcvideo
Card = Integrated_Webcam_HD: Integrate
Bus Info = usb-0000:00:1d.0-1.6
Version = 266001
Capabilities = 0x84200001
Device Caps = 0x4200001
--------- VIDIOC_ENUMINPUT ---------
Index = 0
Name = Camera 1
Type = 2
Audio Set = 0
Video Stds = 0 // <--- Problem here.
Status = 0
Capabilities = 0
Notice that the video standards field is 0. To drill down further, I tried the VIDIOC_G_STD ioctl as follows:
v4l2_std_id std = 0; // VIDIOC_G_STD takes a v4l2_std_id, not a struct v4l2_standard
if(-1 == ioctl(fd, VIDIOC_G_STD, &std)) {
perror("Error");
return errno;
}
But I receive the following error:
Error: Inappropriate ioctl for device
What could be the conclusion? Am I doing anything wrong here?
Platform Details
Linux linux 4.15.0-20-generic #21-Ubuntu SMP Tue Apr 24 06:16:15 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Driver version: 4.15.17
Device node : /dev/video0 (only one device)

I think I found the answer myself. On closer inspection, I found that the integrated webcam on my laptop is attached internally via USB (see the Bus Info line above: usb-0000:00:1d.0-1.6). USB cameras are an exception for the V4L2 video standard ioctls. As per the documentation:
Special rules apply to devices such as USB cameras where the notion of video standards makes little sense. More generally for any capture or output device which is incapable of capturing fields or frames at the nominal rate of the video standard, or that does not support the video standard formats at all. Here the driver shall set the std field of struct v4l2_input and struct v4l2_output to zero and the VIDIOC_G_STD, VIDIOC_S_STD, ioctl VIDIOC_QUERYSTD and ioctl VIDIOC_ENUMSTD ioctls shall return the ENOTTY error code or the EINVAL error code.
Thus, my camera falls into one of these categories, and querying the video standard is not applicable in my case. I'm not sure whether this is also true for MIPI or parallel-bus cameras; I will update once I have experimented a little more with that hardware.

Cannot open USB device with libusb-1.0 in cygwin

I'm trying to interface with a USB peripheral using libusb-1.0 in cygwin.
libusb_get_device_list(...) works fine, I get a list of USB devices. It finds the device with the correct VendorID and ProductID in the device list, but when libusb_open(...) is called with that device, it always fails with the error code LIBUSB_ERROR_NOT_FOUND.
I don't think it's a permission issue: I've tried running this as admin, and there's a separate error code (LIBUSB_ERROR_ACCESS) for that. The same code works with libusb-1.0 on Linux.
unsigned init_usb(int vendor_id, int product_id, int interface_num)
{
int ret = libusb_init(NULL);
if (ret < 0) return CONTROL_ERROR;
libusb_device **devs = NULL;
ssize_t num_dev = libusb_get_device_list(NULL, &devs);
if (num_dev < 0) return CONTROL_ERROR;
libusb_device *dev = NULL;
for (ssize_t i = 0; i < num_dev; i++) {
struct libusb_device_descriptor desc;
if (libusb_get_device_descriptor(devs[i], &desc) < 0) continue;
if (desc.idVendor == vendor_id && desc.idProduct == product_id) {
dev = devs[i];
break;
}
}
if (dev == NULL) {
libusb_free_device_list(devs, 1); // don't leak the list on failure
return CONTROL_ERROR;
}
libusb_device_handle *devh = NULL; // would normally be kept for later transfers
ret = libusb_open(dev, &devh); // the open handle holds its own reference to dev
//ret is always -5 (LIBUSB_ERROR_NOT_FOUND) here in cygwin!
libusb_free_device_list(devs, 1);
if (ret < 0) return CONTROL_ERROR;
return CONTROL_SUCCESS;
}

It turns out this was a driver issue: I had to tell Windows to associate the particular device I'm using with the libusb drivers.
libusb-win32-1.2.6.0 comes with some tools to make that association (although you may need to configure your system to allow the installation of unsigned drivers).
There's one tricky bit. If you just want to associate the device with libusb, you can use the inf-wizard.exe tool to make that association, but that will change the primary association to be with libusb. In my case, the device is a USB Audio Class device (i.e. USB sound card) that also has some libusb functionality. When I used inf-wizard.exe, libusb started working (yay!), but then it stopped working as an audio device.
In my case, I needed to use the install-filter-win.exe tool to install a filter driver for libusb. That allows the device to still show up as a USB Audio device, but also interact with it using libusb.

Getting nan results using Peer-to-Peer in Tesla K80 Cluster

I'm applying UVA and OpenMP in my algorithm to make it faster.
The problem is that when I launch kernels in parallel (for example, 3 CPU threads each launch one kernel at the same time), one thread reads NaN values.
It seems that GPU X cannot read a variable from GPU0.
That is weird, given that I grant every GPU (in this case GPUs 1 and 2) access to GPU0.
Is there a problem with using UVA and OpenMP together, or is it a problem in the code?
Here is the code and the results. I've created an MCVE to demonstrate the error:
#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>
#include <math.h>
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include "math_constants.h"
#include <omp.h>
#include <cufft.h>
inline bool IsGPUCapableP2P(cudaDeviceProp *pProp)
{
#ifdef _WIN32
return (bool)(pProp->tccDriver ? true : false);
#else
return (bool)(pProp->major >= 2);
#endif
}
inline bool IsAppBuiltAs64()
{
#if defined(__x86_64) || defined(AMD64) || defined(_M_AMD64)
return 1;
#else
return 0;
#endif
}
__global__ void kernelFunction(cufftComplex *I, int i, int N)
{
int j = threadIdx.x + blockDim.x * blockIdx.x;
int k = threadIdx.y + blockDim.y * blockIdx.y;
if(j==0 && k==0){
printf("I'm thread %d and I'm reading device_I[0] = %f\n", i, I[N*j+k].x);
}
}
__host__ int main(int argc, char **argv) {
int num_gpus;
cudaGetDeviceCount(&num_gpus);
if(num_gpus < 1){
printf("No CUDA capable devices were detected\n");
return 1;
}
if (!IsAppBuiltAs64()){
printf("%s is only supported on 64-bit OSs and the application must be built as a 64-bit target. Test is being waived.\n", argv[0]);
exit(EXIT_SUCCESS);
}
printf("Number of host CPUs:\t%d\n", omp_get_num_procs());
printf("Number of CUDA devices:\t%d\n", num_gpus);
for(int i = 0; i < num_gpus; i++){
cudaDeviceProp dprop;
cudaGetDeviceProperties(&dprop, i);
printf("> GPU%d = \"%15s\" %s capable of Peer-to-Peer (P2P)\n", i, dprop.name, (IsGPUCapableP2P(&dprop) ? "IS " : "NOT"));
//printf(" %d: %s\n", i, dprop.name);
}
printf("---------------------------\n");
num_gpus = 3; //The case that fails
omp_set_num_threads(num_gpus);
if(num_gpus > 1){
for(int i=1; i<num_gpus; i++){
cudaDeviceProp dprop0, dpropX;
cudaGetDeviceProperties(&dprop0, 0);
cudaGetDeviceProperties(&dpropX, i);
int canAccessPeer0_x, canAccessPeerx_0;
cudaDeviceCanAccessPeer(&canAccessPeer0_x, 0, i);
cudaDeviceCanAccessPeer(&canAccessPeerx_0 , i, 0);
printf("> Peer-to-Peer (P2P) access from %s (GPU%d) -> %s (GPU%d) : %s\n", dprop0.name, 0, dpropX.name, i, canAccessPeer0_x ? "Yes" : "No");
printf("> Peer-to-Peer (P2P) access from %s (GPU%d) -> %s (GPU%d) : %s\n", dpropX.name, i, dprop0.name, 0, canAccessPeerx_0 ? "Yes" : "No");
if(canAccessPeer0_x == 0 || canAccessPeerx_0 == 0){
printf("Two or more SM 2.0 class GPUs are required for %s to run.\n", argv[0]);
printf("Support for UVA requires a GPU with SM 2.0 capabilities.\n");
printf("Peer to Peer access is not available between GPU%d <-> GPU%d, waiving test.\n", 0, i);
exit(EXIT_SUCCESS);
}else{
cudaSetDevice(0);
printf("Granting access from 0 to %d...\n", i);
cudaDeviceEnablePeerAccess(i,0);
cudaSetDevice(i);
printf("Granting access from %d to 0...\n", i);
cudaDeviceEnablePeerAccess(0,0);
printf("Checking GPU%d and GPU%d for UVA capabilities...\n", 0, i);
const bool has_uva = (dprop0.unifiedAddressing && dpropX.unifiedAddressing);
printf("> %s (GPU%d) supports UVA: %s\n", dprop0.name, 0, (dprop0.unifiedAddressing ? "Yes" : "No"));
printf("> %s (GPU%d) supports UVA: %s\n", dpropX.name, i, (dpropX.unifiedAddressing ? "Yes" : "No"));
if (has_uva){
printf("Both GPUs can support UVA, enabling...\n");
}
else{
printf("At least one of the two GPUs does NOT support UVA, waiving test.\n");
exit(EXIT_SUCCESS);
}
}
}
}
int M = 512;
int N = 512;
cufftComplex *host_I = (cufftComplex*)malloc(M*N*sizeof(cufftComplex));
for(int i=0;i<M;i++){
for(int j=0;j<N;j++){
host_I[N*i+j].x = 0.001;
host_I[N*i+j].y = 0;
}
}
cufftComplex *device_I;
cudaSetDevice(0);
cudaMalloc((void**)&device_I, sizeof(cufftComplex)*M*N);
cudaMemset(device_I, 0, sizeof(cufftComplex)*M*N);
cudaMemcpy(device_I, host_I, sizeof(cufftComplex)*M*N, cudaMemcpyHostToDevice);
dim3 threads(32,32);
dim3 blocks(M/threads.x, N/threads.y);
dim3 threadsPerBlockNN = threads;
dim3 numBlocksNN = blocks;
#pragma omp parallel
{
unsigned int i = omp_get_thread_num();
unsigned int num_cpu_threads = omp_get_num_threads();
// set and check the CUDA device for this CPU thread
int gpu_id = -1;
cudaSetDevice(i % num_gpus); // "% num_gpus" allows more CPU threads than GPU devices
cudaGetDevice(&gpu_id);
//printf("CPU thread %d (of %d) uses CUDA device %d\n", cpu_thread_id, num_cpu_threads, gpu_id);
kernelFunction<<<numBlocksNN, threadsPerBlockNN>>>(device_I, i, N);
cudaDeviceSynchronize();
}
cudaFree(device_I);
for(int i=1; i<num_gpus; i++){
cudaSetDevice(0);
cudaDeviceDisablePeerAccess(i);
cudaSetDevice(i);
cudaDeviceDisablePeerAccess(0);
}
for(int i=0; i<num_gpus; i++ ){
cudaSetDevice(i);
cudaDeviceReset();
}
free(host_I);
}
The results are:
Both GPUs can support UVA, enabling...
I'm thread 0 and I'm reading device_I[0] = 0.001000
I'm thread 2 and I'm reading device_I[0] = 0.001000
I'm thread 1 and I'm reading device_I[0] = -nan
The command line to compile is:
nvcc -Xcompiler -fopenmp -lgomp -arch=sm_37 main.cu -lcufft
Here is the result of simpleP2P:
[miguel.carcamo#belka simpleP2P]$ ./simpleP2P
[./simpleP2P] - Starting...
Checking for multiple GPUs...
CUDA-capable device count: 8
> GPU0 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
> GPU1 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
> GPU2 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
> GPU3 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
> GPU4 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
> GPU5 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
> GPU6 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
> GPU7 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
Checking GPU(s) for support of peer to peer memory access...
> Peer-to-Peer (P2P) access from Tesla K80 (GPU0) -> Tesla K80 (GPU1) : Yes
> Peer-to-Peer (P2P) access from Tesla K80 (GPU1) -> Tesla K80 (GPU0) : Yes
Enabling peer access between GPU0 and GPU1...
Checking GPU0 and GPU1 for UVA capabilities...
> Tesla K80 (GPU0) supports UVA: Yes
> Tesla K80 (GPU1) supports UVA: Yes
Both GPUs can support UVA, enabling...
Allocating buffers (64MB on GPU0, GPU1 and CPU Host)...
Creating event handles...
cudaMemcpyPeer / cudaMemcpy between GPU0 and GPU1: 0.79GB/s
Preparing host buffer and memcpy to GPU0...
Run kernel on GPU1, taking source data from GPU0 and writing to GPU1...
Run kernel on GPU0, taking source data from GPU1 and writing to GPU0...
Copy data back to host from GPU0 and verify results...
Verification error # element 0: val = nan, ref = 0.000000
Verification error # element 1: val = nan, ref = 4.000000
Verification error # element 2: val = nan, ref = 8.000000
Verification error # element 3: val = nan, ref = 12.000000
Verification error # element 4: val = nan, ref = 16.000000
Verification error # element 5: val = nan, ref = 20.000000
Verification error # element 6: val = nan, ref = 24.000000
Verification error # element 7: val = nan, ref = 28.000000
Verification error # element 8: val = nan, ref = 32.000000
Verification error # element 9: val = nan, ref = 36.000000
Verification error # element 10: val = nan, ref = 40.000000
Verification error # element 11: val = nan, ref = 44.000000
Enabling peer access...
Shutting down...
Test failed!

It seems, based on the debugging in the comments, that the problem was ultimately related to the system that was being used, not OP's code.
K80 is a dual-GPU device, so it has a PCIE bridge chip on-board. Proper use of this configuration, especially when using Peer-to-Peer (P2P) traffic, requires proper settings in the upstream PCIE switches and/or root complex. These settings are normally made by the system BIOS, and are not normally/typically software-configurable.
One possible indicator when these settings are incorrect is that the simpleP2P CUDA sample code will report errors during results validation. Therefore, a good test on any system where you are having trouble with P2P code is to run this particular CUDA sample code (simpleP2P). If validation errors are reported (see OP's posting for an example), then these should be addressed first, before any attempt is made to debug the user's P2P code.
The best recommendation is to use a system that has been validated by the system vendor for K80 usage. This is generally good practice for any usage of Tesla GPUs, as these GPUs tend to make significant demands on the host system from the standpoints of:
power delivery
cooling requirements
system compatibility (two examples are the types of PCIE settings being discussed here, as well as resource mapping and bootability issues also referred to by OP in the comments)
OEM validated systems will generally have the fewest issues associated with the above requirements/demands that Tesla GPUs place on the host system.
For this particular issue, troubleshooting starts with the simpleP2P test. When validation errors are observed in that test (but no other CUDA runtime errors are reported), the PCIE settings may be suspect. The easiest way to try to address this is to check for a newer/updated system BIOS, which may have the correct settings for this type of usage or else offer a BIOS setup option that allows the user to make the necessary changes. The settings involved here are PCIE ACS settings, and if a BIOS setup option is available, those terms will likely be involved. Since BIOS setup varies from system to system, it's not possible to be specific here.
If the BIOS update and/or settings modification does not resolve the issue, then it's probably not fixable for that particular system type. It's possible to troubleshoot the process a bit further using the final steps described here but such troubleshooting, even if successful, cannot lead to a permanent (i.e. will survive a reboot) fix without BIOS modifications.
If the simpleP2P test runs correctly, then debug focus should return to the user's code. The general recommendations to use proper CUDA error checking and to run the code with cuda-memcheck apply. Furthermore, the simpleP2P sample source code can then be referred to as an example of correct usage of P2P functionality.
Note that in general, P2P support may vary by GPU or GPU family. The ability to run P2P on one GPU type or GPU family does not necessarily indicate it will work on another GPU type or family, even in the same system/setup. The final determinant of GPU P2P support is what the runtime reports via cudaDeviceCanAccessPeer, as queried by the provided tools. P2P support can vary by system and other factors as well. No statements made here are a guarantee of P2P support for any particular GPU in any particular setup.

detecting NVIDIA GPUs without CUDA

I would like to extract a rather limited set of information about NVIDIA GPUs without linking against the CUDA libraries. The only information needed is the compute capability and the name of the GPU; more than this could be useful but is not required. The code should be written in C (or C++). The information would be used at configure time (when the CUDA toolkit is not available) and at run time (when the executed binary is not compiled with CUDA support) to suggest to the user that a supported GPU is present in the system.
As far as I understand, this is possible through the driver API, but I am not very familiar with the technical details of what this would require. So my questions are:
What are the exact steps to fulfill at least the minimum requirement (see above);
Is there such open-source code available?
Note that my first step would be to have some code for Linux, but ultimately I'd need platform-independent code. Considering the platform availability of CUDA, a complete solution would involve code for x86/AMD64 on Linux, Mac OS, and Windows (at least for now; the list could soon be extended with ARM).
Edit
What I meant by "it's possible through the driver API" is that one should be able to load libcuda.so dynamically and query the device properties through the driver API. I'm not sure about the details, though.

Unfortunately NVML doesn't provide information about device compute capability.
What you need to do is:
Load CUDA library manually (application is not linked against libcuda)
If the library doesn't exist then CUDA driver is not installed
Find pointers to necessary functions in the library
Use driver API to query information about available GPUs
I hope this code will be helpful. I've tested it under Linux but with minor modifications it should also compile under Windows.
#include <cuda.h>
#include <stdio.h>
#ifdef _WIN32
#include <Windows.h>
#else
#include <dlfcn.h>
#endif
void * loadCudaLibrary() {
#ifdef _WIN32
return (void *) LoadLibraryA("nvcuda.dll");
#else
void *lib = dlopen ("libcuda.so.1", RTLD_NOW); // name installed by the driver
if (lib == NULL)
lib = dlopen ("libcuda.so", RTLD_NOW); // unversioned symlink, if present
return lib;
#endif
}
void (*getProcAddress(void * lib, const char *name))(void){
#ifdef _WIN32
return (void (*)(void)) GetProcAddress((HMODULE) lib, name);
#else
return (void (*)(void)) dlsym(lib, name);
#endif
}
int freeLibrary(void *lib)
{
#ifdef _WIN32
return FreeLibrary((HMODULE) lib);
#else
return dlclose(lib);
#endif
}
// CUDAAPI (the calling convention) belongs inside the parentheses of a function-pointer typedef
typedef CUresult (CUDAAPI *cuInit_pt)(unsigned int Flags);
typedef CUresult (CUDAAPI *cuDeviceGetCount_pt)(int *count);
typedef CUresult (CUDAAPI *cuDeviceComputeCapability_pt)(int *major, int *minor, CUdevice dev);
int main() {
void * cuLib;
cuInit_pt my_cuInit = NULL;
cuDeviceGetCount_pt my_cuDeviceGetCount = NULL;
cuDeviceComputeCapability_pt my_cuDeviceComputeCapability = NULL;
if ((cuLib = loadCudaLibrary()) == NULL)
return 1; // cuda library is not present in the system
if ((my_cuInit = (cuInit_pt) getProcAddress(cuLib, "cuInit")) == NULL)
return 1; // sth is wrong with the library
if ((my_cuDeviceGetCount = (cuDeviceGetCount_pt) getProcAddress(cuLib, "cuDeviceGetCount")) == NULL)
return 1; // sth is wrong with the library
if ((my_cuDeviceComputeCapability = (cuDeviceComputeCapability_pt) getProcAddress(cuLib, "cuDeviceComputeCapability")) == NULL)
return 1; // sth is wrong with the library
{
int count, i;
if (CUDA_SUCCESS != my_cuInit(0))
return 1; // failed to initialize
if (CUDA_SUCCESS != my_cuDeviceGetCount(&count))
return 1; // failed
for (i = 0; i < count; i++)
{
int major, minor;
if (CUDA_SUCCESS != my_cuDeviceComputeCapability(&major, &minor, i))
return 1; // failed
printf("dev %d CUDA compute capability major %d minor %d\n", i, major, minor);
}
}
freeLibrary(cuLib);
return 0;
}
Test on Linux:
$ gcc main.c -ldl
$ ./a.out
dev 0 CUDA compute capability major 2 minor 0
dev 1 CUDA compute capability major 2 minor 0
Test on Linux with no CUDA driver:
$ ./a.out
$ echo $?
1
Cheers
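The code above still includes cuda.h, which the question says is not available at configure time. A sketch that avoids the header altogether by declaring the few function-pointer types by hand. The signatures are assumptions based on the CUDA driver API (CUresult treated as int with 0 meaning CUDA_SUCCESS, CUdevice treated as int), and it is Linux-only. It tries libcuda.so.1 first, since the driver package usually installs only that name, while the unversioned libcuda.so symlink typically comes with development packages.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Assumed driver-API signatures, declared by hand so cuda.h is not
   needed: CUresult is treated as int (0 == CUDA_SUCCESS) and CUdevice
   as int. */
typedef int (*cuInit_fn)(unsigned int flags);
typedef int (*cuDeviceGetCount_fn)(int *count);
typedef int (*cuDeviceGet_fn)(int *device, int ordinal);
typedef int (*cuDeviceGetName_fn)(char *name, int len, int device);
typedef int (*cuDeviceComputeCapability_fn)(int *major, int *minor, int device);

/* Prints the name and compute capability of every NVIDIA GPU.
   Returns the device count, or -1 if the driver library is missing,
   lacks a symbol, or fails to initialize. */
int probe_cuda_devices(void)
{
    void *lib = dlopen("libcuda.so.1", RTLD_NOW); /* installed by the driver */
    if (lib == NULL)
        lib = dlopen("libcuda.so", RTLD_NOW);     /* unversioned symlink */
    if (lib == NULL)
        return -1;                                /* no CUDA driver */

    cuInit_fn cu_init = (cuInit_fn)dlsym(lib, "cuInit");
    cuDeviceGetCount_fn cu_get_count =
        (cuDeviceGetCount_fn)dlsym(lib, "cuDeviceGetCount");
    cuDeviceGet_fn cu_get = (cuDeviceGet_fn)dlsym(lib, "cuDeviceGet");
    cuDeviceGetName_fn cu_get_name =
        (cuDeviceGetName_fn)dlsym(lib, "cuDeviceGetName");
    cuDeviceComputeCapability_fn cu_get_cc =
        (cuDeviceComputeCapability_fn)dlsym(lib, "cuDeviceComputeCapability");

    int count = 0;
    if (!cu_init || !cu_get_count || !cu_get || !cu_get_name || !cu_get_cc ||
        cu_init(0) != 0 || cu_get_count(&count) != 0) {
        dlclose(lib);
        return -1;
    }

    for (int ordinal = 0; ordinal < count; ordinal++) {
        int device = 0, major = 0, minor = 0;
        char name[256];
        /* map the ordinal to a CUdevice, then query name and capability */
        if (cu_get(&device, ordinal) == 0 &&
            cu_get_name(name, (int)sizeof name, device) == 0 &&
            cu_get_cc(&major, &minor, device) == 0)
            printf("dev %d: %s, compute capability %d.%d\n",
                   ordinal, name, major, minor);
    }

    dlclose(lib);
    return count;
}
```

On a machine without the NVIDIA driver this simply returns -1, which a configure script can treat as "no supported GPU". Newer CUDA versions mark cuDeviceComputeCapability as deprecated in favour of cuDeviceGetAttribute, so that symbol may be worth replacing eventually.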

These people surely know the answer:
http://www.ozone3d.net/gpu_caps_viewer
but I only know that it can be done with an installation of CUDA or OpenCL.
I think one way could be to use OpenGL directly; maybe that is what you meant by the driver API. I can only give you this example (CUDA required):
http://www.naic.edu/~phil/hardware/nvidia/doc/src/deviceQuery/deviceQuery.cpp

First, I think NVIDIA NVML is the API you are looking for. Second, there is an open-source project based on NVML called PAPI NVML.

I solved this problem by linking statically against the CUDA 6.0 SDK. The resulting application also works well on machines that have no NVIDIA card or on which the SDK is not installed; in that case it reports that there are zero CUDA-capable devices.
The samples included with the CUDA SDK contain an example called deviceQuery; I used snippets from it to write the following code. It determines whether CUDA-capable devices are present and, if so, which one has the highest compute capability:
#include <cuda_runtime.h>
struct GpuCap
{
bool QueryFailed; // True on error
int DeviceCount; // Number of CUDA devices found
int StrongestDeviceId; // ID of best CUDA device
int ComputeCapabilityMajor; // Major compute capability (of best device)
int ComputeCapabilityMinor; // Minor compute capability
};
GpuCap GetCapabilities()
{
GpuCap gpu;
gpu.QueryFailed = false;
gpu.StrongestDeviceId = -1;
gpu.ComputeCapabilityMajor = -1;
gpu.ComputeCapabilityMinor = -1;
cudaError_t error_id = cudaGetDeviceCount(&gpu.DeviceCount);
if (error_id != cudaSuccess)
{
gpu.QueryFailed = true;
gpu.DeviceCount = 0;
return gpu;
}
if (gpu.DeviceCount == 0)
return gpu; // "There are no available device(s) that support CUDA"
// Find the device with the highest compute capability (major first, then minor)
for (int dev = 0; dev < gpu.DeviceCount; ++dev)
{
cudaDeviceProp deviceProp;
if (cudaGetDeviceProperties(&deviceProp, dev) != cudaSuccess)
continue;
if (deviceProp.major > gpu.ComputeCapabilityMajor ||
(deviceProp.major == gpu.ComputeCapabilityMajor &&
deviceProp.minor > gpu.ComputeCapabilityMinor))
{
gpu.StrongestDeviceId = dev;
gpu.ComputeCapabilityMajor = deviceProp.major;
gpu.ComputeCapabilityMinor = deviceProp.minor;
}
}
return gpu;
}
