Error when importing keras in embedded Python in C

I'm trying to embed Python in my C application. I downloaded the package from the official Python website and managed to run a simple Hello World.
Now I want to go deeper and use some Python libraries like numpy, keras, tensorflow...
I'm working with Python 3.5.4, and I installed all the needed packages on my PC with pip3:
pip3 install keras
pip3 install tensorflow
...
Then I created my script and launched it in a Python environment, where it works fine:
Python:
# Importing the libraries
#
import numpy as np
import pandas as pd
dataset2 = pd.read_csv(r'I:\RNA\dataset19.csv')
X_test = dataset2.iloc[:, 0:228].values
y_test = dataset2.iloc[:, 228].values
# 2.
import pickle
sc = pickle.load(open(r'I:\RNA\isVerb_sc', 'rb'))
X_test = sc.transform(X_test)
# 3.
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
classifier = Sequential()
classifier.add(Dense(units = 114, kernel_initializer = 'uniform', activation = 'relu', input_dim = 228))
classifier.add(Dropout(p = 0.3))
classifier.add(Dense(units = 114, kernel_initializer = 'uniform', activation = 'relu'))
classifier.add(Dropout(p = 0.3))
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
classifier.load_weights(r'I:\RNA\isVerb_weights.h5')
y_pred = classifier.predict(X_test)
y_pred1 = (y_pred > 0.5)
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred1)
But when I execute the same script in a C environment with embedded Python, it doesn't work.
At first I executed my script directly with PyRun_SimpleFile with no luck, so I sliced it into multiple instructions with PyRun_SimpleString to locate the problem:
C:
result = PyRun_SimpleString("import numpy as np"); // result = 0 (ok)
result = PyRun_SimpleString("import pandas as pd"); // result = 0 (ok)
...
result = PyRun_SimpleString("import pickle"); // result = 0 (ok)
... (all instructions above work)
result = PyRun_SimpleString("import keras"); // result = -1 !!
... (everything below this fails)
but there is not a single stack trace for this error; I tried this but I just got:
"Here's the output: (null)"
My initialization of Python in C seems correct, since other libraries import fine:
// Python
wchar_t *stdProgramName = L"I:\\LIBs\\cpython354";
Py_SetProgramName(stdProgramName);
wchar_t *stdPythonHome = L"I:\\LIBs\\cpython354";
Py_SetPythonHome(stdPythonHome);
wchar_t *stdlib = L"I:\\LIBs\\cpython354;I:\\LIBs\\cpython354\\Lib\\python35.zip;I:\\LIBs\\cpython354\\Lib;I:\\LIBs\\cpython354\\DLLs;I:\\LIBs\\cpython354\\Lib\\site-packages";
Py_SetPath(stdlib);
// Initialize Python
Py_Initialize();
When inside a Python cmd, the line import keras takes some time (~3 s) but works (with a warning, but I found no harm in it):
>>> import keras
I:\LIBs\cpython354\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Using TensorFlow backend.
>>>
I'm at a loss now; I don't know where to look since there is no stack trace.
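Before any fix, one way to surface the hidden traceback (an editorial sketch, not from the original post) is to feed a small wrapper to PyRun_SimpleString so that the exception text lands in a Python variable that can be read back from C:
import io, traceback
_buf = io.StringIO()
try:
    import keras
except BaseException:
    traceback.print_exc(file=_buf)  # write the full traceback into the buffer
tb_text = _buf.getvalue()  # fetch this from C, or write it to a log file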

It seems that when you import keras, it executes this line:
sys.stderr.write('Using TensorFlow backend.\n')
but sys.stderr is not defined in embedded Python on Windows.
A simple correction is to define sys.stderr, for example:
import sys

class CatchOutErr:
    def __init__(self):
        self.value = ''
    def write(self, txt):
        self.value += txt

catchOutErr = CatchOutErr()
sys.stderr = catchOutErr
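With this redirect in place, the import no longer dies on the missing stderr, and everything keras writes is collected for inspection; a short usage sketch (illustrative):
import keras
print(catchOutErr.value)  # e.g. "Using TensorFlow backend.\n"
The captured text can also be fetched back on the C side through the usual PyObject calls if no console is attached.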

Related

Using glob to import txt files to an array for interpolation

Currently I am using data (wavelength, flux) in txt format and have six txt files. The wavelengths are the same but the fluxes are different. I have imported the txt files using pd.read_csv (as can be seen in the code) and assigned each flux a different name. These differently named fluxes are placed in an array. Finally, I interpolate the fluxes against a temperature array. The code works, and because I currently have only six files, writing it this way is OK. The problem moving forward is that when I have hundreds of txt files I will need a better method.
How can I use glob to import the txt files, assign a different name to each flux (if that is necessary) and finally interpolate? Any help would be appreciated. Thank you.
import pandas as pd
import numpy as np
from scipy import interpolate
fcf = 0.0000001 # flux conversion factor
wcf = 10 #wave conversion factor
temperature = np.array([725,750,775,800,825,850])
# import files and assign column headers; blank to ignore spaces
c1p = pd.read_csv("../c/725.txt",sep=" ",header=None)
c1p.columns = ["blank","0","blank","blank","1"]
c2p = pd.read_csv("../c/750.txt",sep=" ",header=None)
c2p.columns = ["blank","0","blank","blank","1"]
c3p = pd.read_csv("../c/775.txt",sep=" ",header=None)
c3p.columns = ["blank","0","blank","blank","1"]
c4p = pd.read_csv("../c/800.txt",sep=" ",header=None)
c4p.columns = ["blank","0","blank","blank","1"]
c5p = pd.read_csv("../c/825.txt",sep=" ",header=None)
c5p.columns = ["blank","0","blank","blank","1"]
c6p = pd.read_csv("../c/850.txt",sep=" ",header=None)
c6p.columns = ["blank","0","blank","blank","1"]
wave = np.array(c1p['0']/wcf)
c1fp = np.array(c1p['1']*fcf)
c2fp = np.array(c2p['1']*fcf)
c3fp = np.array(c3p['1']*fcf)
c4fp = np.array(c4p['1']*fcf)
c5fp = np.array(c5p['1']*fcf)
c6fp = np.array(c6p['1']*fcf)
cfp = np.array([c1fp,c2fp,c3fp,c4fp,c5fp,c6fp])
flux_int = interpolate.interp1d(temperature,cfp,axis=0,kind='linear',bounds_error=False,fill_value='extrapolate')
My attempts so far...I think I have loaded the files into a list using glob as
import pandas as pd
import numpy as np
from scipy import interpolate
import glob
c_list=[]
path = "../c/*.*"
for file in glob.glob(path):
    print(file)
    c = pd.read_csv(file,sep=" ",header=None)
    c.columns = ["blank","0","blank","blank","1"]
    c_list.append(c)
I am still unsure how to extract just the fluxes into an array in order to interpolate. I will continue to post my attempts.
My updated code
fcf = 0.0000001
import pandas as pd
import numpy as np
from scipy import interpolate
import glob
c_list=[]
path = "../c/*.*"
for file in glob.glob(path):
    print(file)
    c = pd.read_csv(file,sep=" ",header=None)
    c.columns = ["blank","0","blank","blank","1"]
    c = c['1']*fcf
    c_list.append(c)
fluxes = np.array(c_list)
temperature = np.array([7250,7500,7750,8000,8250,8500])
flux_int =interpolate.interp1d(temperature,fluxes,axis=0,kind='linear',bounds_error=False,fill_value='extrapolate')
When I run this code I get the following error
raise ValueError("x and y arrays must be equal in length along "
ValueError: x and y arrays must be equal in length along interpolation axis.
I think the part that needs correcting is fluxes = np.array(c_list): this gives one list of all fluxes, but I need a list of fluxes from each file. How is this done?
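A quick check (editorial sketch): interp1d requires temperature and fluxes to have the same length along axis=0, so printing both shapes shows where they disagree:
print(temperature.shape, fluxes.shape)  # the first axes must match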
Final attempt
import pandas as pd
import numpy as np
from scipy import interpolate
import glob
c_list=[]
path = "../c/*.*"
for file in glob.glob(path):
    print(file)
    c = pd.read_csv(file,sep=" ",header=None)
    c.columns = ["blank","0","blank","blank","1"]
    c = c['1']* 0.0000001
    c_list.append(c)
c1=np.array(c_list[0])
c2=np.array(c_list[1])
c3=np.array(c_list[2])
c4=np.array(c_list[3])
c5=np.array(c_list[4])
c6=np.array(c_list[5])
fluxes = np.array([c1,c2,c3,c4,c5,c6])
temperature = np.array([7250,7500,7750,8000,8250,8500])
flux_int = interpolate.interp1d(temperature,fluxes,axis=0,kind='linear',bounds_error=False,fill_value='extrapolate')
This code works but I am still not sure about
c1=np.array(c_list[0])
c2=np.array(c_list[1])
c3=np.array(c_list[2])
c4=np.array(c_list[3])
c5=np.array(c_list[4])
c6=np.array(c_list[5])
Is there a better way to write this?
Here are 2 things that you can do:
Instead of
c = c['1']* 0.0000001
try doing c = c['1'].to_numpy()* 0.0000001
This will build a list of numpy arrays rather than a list of pandas Series.
When constructing fluxes, you can then just do
fluxes = np.array(c_list)
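Putting both suggestions together, here is a minimal end-to-end sketch (an editorial example, assuming the files are named by their temperature as in the question, e.g. ../c/725.txt):
import glob
import os
import numpy as np
import pandas as pd
from scipy import interpolate

fcf = 0.0000001  # flux conversion factor
temps = []
c_list = []
for file in sorted(glob.glob("../c/*.txt")):
    c = pd.read_csv(file, sep=" ", header=None)
    c.columns = ["blank", "0", "blank", "blank", "1"]
    # derive the temperature from the file name, e.g. '725.txt' -> 725.0
    temps.append(float(os.path.splitext(os.path.basename(file))[0]))
    c_list.append(c["1"].to_numpy() * fcf)
fluxes = np.array(c_list)  # shape: (number of files, number of wavelengths)
flux_int = interpolate.interp1d(np.array(temps), fluxes, axis=0, kind='linear',
                                bounds_error=False, fill_value='extrapolate')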

To convert Tif files into RGB(png/jpg) using python

I am using the code snippet given below and it works without error, but the converted file does not get a .png extension even though I am passing png as outputFormat.
I am running it in Colab, and I am attaching the output as well.
from osgeo import gdal
import numpy as np
import os
import subprocess
def _16bit_to_8Bit(inputRaster, outputRaster, outputPixType='Byte', outputFormat='png',
                   percentiles=[2, 98]):
    # Convert 16bit image to 8bit
    # Source: Medium.com, 'Creating Training Datasets for the SpaceNet Road Detection and Routing
    # Challenge' by Adam Van Etten and Jake Shermeyer
    srcRaster = gdal.Open(inputRaster)
    cmd = ['gdal_translate', '-ot', outputPixType, '-of',
           outputFormat]
    # iterate through bands
    for bandId in range(srcRaster.RasterCount):
        bandId = bandId+1
        band = srcRaster.GetRasterBand(bandId)
        bmin = band.GetMinimum()
        bmax = band.GetMaximum()
        # if not exist minimum and maximum values
        if bmin is None or bmax is None:
            (bmin, bmax) = band.ComputeRasterMinMax(1)
        # else, rescale
        band_arr_tmp = band.ReadAsArray()
        bmin = np.percentile(band_arr_tmp.flatten(),
                             percentiles[0])
        bmax = np.percentile(band_arr_tmp.flatten(),
                             percentiles[1])
        cmd.append('-scale_{}'.format(bandId))
        cmd.append('{}'.format(bmin))
        cmd.append('{}'.format(bmax))
        cmd.append('{}'.format(0))
        cmd.append('{}'.format(255))
    cmd.append(inputRaster)
    cmd.append(outputRaster)
    print("Conversin command:", cmd)
    subprocess.call(cmd)
path = "/content/drive/MyDrive/Spacenet_data/RGB_Pan/"
files = os.listdir(path)
for file in files:
    resimPath = path+file
    dstPath = "/content/drive/MyDrive/Spacenet_data/"
    dstPath = dstPath+file
    _16bit_to_8Bit(resimPath,dstPath)
My output looks like this:
Conversin command: ['gdal_translate', '-ot', 'Byte', '-of', 'png', '-scale_1', '149.0', '863.0', '0', '255', '-scale_2', '244.0', '823.0200000000186', '0', '255', '-scale_3', '243.0', '568.0', '0', '255', '/content/drive/MyDrive/Spacenet_data/RGB_Pan/img0.tif', '/content/drive/MyDrive/Spacenet_data/img0.tif']
Make the below changes and you are done.
from osgeo import gdal
import numpy as np
import os
import subprocess
def _16bit_to_8Bit(inputRaster, outputRaster, outputPixType='Byte',
                   outputFormat='png', percentiles=[2, 98]):
    srcRaster = gdal.Open(inputRaster)
    cmd = ['gdal_translate', '-ot', outputPixType, '-of',
           outputFormat]
    for bandId in range(srcRaster.RasterCount):
        bandId = bandId+1
        band = srcRaster.GetRasterBand(bandId)
        bmin = band.GetMinimum()
        bmax = band.GetMaximum()
        # if not exist minimum and maximum values
        if bmin is None or bmax is None:
            (bmin, bmax) = band.ComputeRasterMinMax(1)
        # else, rescale
        band_arr_tmp = band.ReadAsArray()
        bmin = np.percentile(band_arr_tmp.flatten(),
                             percentiles[0])
        bmax = np.percentile(band_arr_tmp.flatten(),
                             percentiles[1])
        cmd.append('-scale_{}'.format(bandId))
        cmd.append('{}'.format(bmin))
        cmd.append('{}'.format(bmax))
        cmd.append('{}'.format(0))
        cmd.append('{}'.format(255))
    cmd.append(inputRaster)
    cmd.append(outputRaster)
    print("Conversin command:", cmd)
    subprocess.call(cmd)
path = "/content/drive/MyDrive/Spacenet_data/RGB_Pan/"
files = os.listdir(path)
for file in files:
    resimPath = path+file
    dstPath = "/content/drive/MyDrive/Spacenet_data/"
    dstPath = dstPath+file[:-3]+"png"
    _16bit_to_8Bit(resimPath,dstPath)
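The slice file[:-3] assumes a three-character extension; a slightly more robust variant (an editorial sketch) uses os.path.splitext:
dstPath = os.path.join("/content/drive/MyDrive/Spacenet_data/",
                       os.path.splitext(file)[0] + ".png")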
import os
import cv2
directory = os.fsencode(r"path")
for file in os.listdir(directory):
    filename = os.fsdecode(file)
    if filename.endswith(".tif"):
        print(filename)
        print(type(filename))
        print("\n")
        image = cv2.imread(filename)
        cv2.imwrite("{}.jpg".format(filename), image)
        continue
    else:
        continue
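Note that "{}.jpg".format(filename) produces names like img0.tif.jpg; stripping the extension first (illustrative) avoids the doubled suffix:
cv2.imwrite("{}.jpg".format(os.path.splitext(filename)[0]), image)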

Python, face_recognition convert string to array

I want to convert a variable to a string and then to an array that I can use for comparison, but I don't know how to do that.
my code:
import face_recognition
import numpy as np
a = face_recognition.load_image_file('C:\\Users\zivsi\OneDrive\תמונות\סרט צילום\WIN_20191115_10_32_24_Pro.jpg') # my picture 1
b = face_recognition.load_image_file('C:\\Users\zivsi\OneDrive\תמונות\סרט צילום\WIN_20191115_09_48_56_Pro.jpg') # my picture 2
c = face_recognition.load_image_file(
'C:\\Users\zivsi\OneDrive\תמונות\סרט צילום\WIN_20191115_09_48_52_Pro.jpg') # my picture 3
d = face_recognition.load_image_file('C:\\Users\zivsi\OneDrive\תמונות\סרט צילום\ziv sion.jpg') # my picture 4
e = face_recognition.load_image_file(
'C:\\Users\zivsi\OneDrive\תמונות\סרט צילום\WIN_20191120_17_46_40_Pro.jpg') # my picture 5
f = face_recognition.load_image_file(
'C:\\Users\zivsi\OneDrive\תמונות\סרט צילום\WIN_20191117_16_19_11_Pro.jpg') # my picture 6
a = face_recognition.face_encodings(a)[0]
b = face_recognition.face_encodings(b)[0]
c = face_recognition.face_encodings(c)[0]
d = face_recognition.face_encodings(d)[0]
e = face_recognition.face_encodings(e)[0]
f = face_recognition.face_encodings(f)[0]
Here I tried to convert the variable to a string
str_variable = str(a)
array_variable = np.array(str_variable)
my_face = a, b, c, d, e, f, array_variable
while True:
    new = input('path: ')
    print('Recognizing...')
    unknown = face_recognition.load_image_file(new)
    unknown_encodings = face_recognition.face_encodings(unknown)[0]
The program cannot use the variable:
    results = face_recognition.compare_faces(array_variable, unknown_encodings, tolerance=0.4)
    print(results)
    recognize_times = int(results.count(True))
    if (3 <= recognize_times):
        print('hello boss!')
        my_face = *my_face, unknown_encodings
please help me
The error shown:
Traceback (most recent call last):
File "C:/Users/zivsi/PycharmProjects/AI/pytt.py", line 37, in <module>
results = face_recognition.compare_faces(my_face, unknown_encodings, tolerance=0.4)
File "C:\Users\zivsi\AppData\Local\Programs\Python\Python36\lib\site-
packages\face_recognition\api.py", line 222, in compare_faces
return list(face_distance(known_face_encodings, face_encoding_to_check) <= tolerance)
File "C:\Users\zivsi\AppData\Local\Programs\Python\Python36\lib\site-packages\face_recognition\api.py", line 72, in face_distance
return np.linalg.norm(face_encodings - face_to_compare, axis=1)
ValueError: operands could not be broadcast together with shapes (7,) (128,)
First of all, array_variable should actually be a list of the known encodings, not a numpy array.
Also, you do not need str.
Now, in your case, if the input images, i.e. a, b, c, d, e, f, do NOT have the same dimensions, the error will persist. You cannot compare images of different sizes using this function, because the comparison is based on a distance, and distance is defined on vectors of the same length.
Here is a working simple example using the photos from https://github.com/ageitgey/face_recognition/tree/master/examples:
import face_recognition
import numpy as np
from PIL import Image, ImageDraw
from IPython.display import display
# Load a sample picture and learn how to recognize it.
obama_image = face_recognition.load_image_file("obama.jpg")
obama_face_encoding = face_recognition.face_encodings(obama_image)[0]
# Load a second sample picture and learn how to recognize it.
biden_image = face_recognition.load_image_file("biden.jpg")
biden_face_encoding = face_recognition.face_encodings(biden_image)[0]
array_variable = [obama_face_encoding,biden_face_encoding] # list of known encodings
# compare the list with the biden_face_encoding
results = face_recognition.compare_faces(array_variable, biden_face_encoding, tolerance=0.4)
print(results)
[False, True] # True means match, False mismatch
# False: coming from obama_face_encoding VS biden_face_encoding
# True: coming from biden_face_encoding VS biden_face_encoding
To run it go here: https://beta.deepnote.com/project/09705740-31c0-4d9a-8890-269ff1c3dfaf#
Documentation: https://face-recognition.readthedocs.io/en/latest/face_recognition.html
EDIT
To save the known encodings you can use numpy.save
np.save('encodings',biden_face_encoding) # save
load_again = np.load('encodings.npy') # load again
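To persist several known encodings at once, numpy's savez also works (a sketch; the key names are illustrative):
np.savez('known_encodings.npz', obama=obama_face_encoding, biden=biden_face_encoding)
loaded = np.load('known_encodings.npz')
array_variable = [loaded['obama'], loaded['biden']]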

TypeError: ufunc 'add' did not contain a loop

I use Anaconda and gdsCAD and get an error even though all packages are installed correctly, as explained here: http://pythonhosted.org/gdsCAD/
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('S32') dtype('S32') dtype('S32')
My imports look like this (In the end I imported everything):
import numpy as np
from gdsCAD import *
import matplotlib.pyplot as plt
My example code looks like this:
something = core.Elements()
box=shapes.Box( (5,5),(1,5),0.5)
core.default_layer = 1
core.default_colors = 2
something.add(box)
something.show()
My error message looks like this:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-5-2f90b960c1c1> in <module>()
31 puffer_wafer = shapes.Circle((0.,0.), puffer_wafer_radius, puffer_line_thickness)
32 bp.add(puffer_wafer)
---> 33 bp.show()
34 wafer = shapes.Circle((0.,0.), wafer_radius, wafer_line_thickness)
35 bp.add(wafer)
C:\Users\rpilz\AppData\Local\Continuum\Anaconda2\lib\site-packages\gdscad-0.4.5-py2.7.egg\gdsCAD\core.pyc in _show(self)
80 ax.margins(0.1)
81
---> 82 artists=self.artist()
83 for a in artists:
84 a.set_transform(a.get_transform() + ax.transData)
C:\Users\rpilz\AppData\Local\Continuum\Anaconda2\lib\site-packages\gdscad-0.4.5-py2.7.egg\gdsCAD\core.pyc in artist(self, color)
952 art=[]
953 for p in self:
--> 954 art+=p.artist()
955 return art
956
C:\Users\rpilz\AppData\Local\Continuum\Anaconda2\lib\site-packages\gdscad-0.4.5-py2.7.egg\gdsCAD\core.pyc in artist(self, color)
475 poly = lines.buffer(self.width/2.)
476
--> 477 return [descartes.PolygonPatch(poly, lw=0, **self._layer_properties(self.layer))]
478
479
C:\Users\rpilz\AppData\Local\Continuum\Anaconda2\lib\site-packages\gdscad-0.4.5-py2.7.egg\gdsCAD\core.pyc in _layer_properties(layer)
103 # Default colors from previous versions
104 colors = ['k', 'r', 'g', 'b', 'c', 'm', 'y']
--> 105 colors += matplotlib.cm.gist_ncar(np.linspace(0.98, 0, 15))
106 color = colors[layer % len(colors)]
107 return {'color': color}
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('S32') dtype('S32') dtype('S32')
gdsCAD has been a pain, from the shapely install to this plotting issue.
This issue is caused by a wrong datatype being passed to the colors function. It can be solved by editing the following line in core.py:
colors += matplotlib.cm.gist_ncar(np.linspace(0.98, 0, 15))
to
colors += list(matplotlib.cm.gist_ncar(np.linspace(0.98, 0, 15)))
If you don't know where core.py is located, just type in:
from gdsCAD import *
core
This will give you the path of the core.py file. Good luck!
Well, first, I'd ask that you please include actual code, as the 'example code' in the file is obviously different based on the traceback. When debugging, the details matter, and I need to be able to actually run the code.
You obviously have a data type problem. Chances are pretty good it's in the variables here:
puffer_wafer = shapes.Circle((0.,0.), puffer_wafer_radius, puffer_line_thickness)
I had the same error thrown when I was running a call to Pandas. I changed the data to str(data) and the code worked.
I don't know if this helps, as I am fairly new to this myself, but I had a similar error and found that it is due to a type-casting issue, as suggested by the previous answer. I can't see from the example in the question exactly what you are trying to do. Below is a small example of my issue and solution. My code builds a simple random forest model using scikit-learn.
Here is an example that will give the error; it is caused by the third-to-last line, which concatenates the results to write to file.
import scipy
import math
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn import preprocessing, metrics, cross_validation
Data = pd.read_csv("Free_Energy_exp.csv", sep=",")
Data = Data.fillna(Data.mean()) # replace the NA values with the mean of the descriptor
header = Data.columns.values # Use the column headers as the descriptor labels
Data.head()
test_name = "Test.csv"
npArray = np.array(Data)
print header.shape
npheader = np.array(header[1:-1])
print("Array shape X = %d, Y = %d " % (npArray.shape))
datax, datay = npArray.shape
names = npArray[:,0]
X = npArray[:,1:-1].astype(float)
y = npArray[:,-1].astype(float)
X = preprocessing.scale(X)
XTrain, XTest, yTrain, yTest = cross_validation.train_test_split(X,y, random_state=0)
# Predictions results initialised
RFpredictions = []
RF = RandomForestRegressor(n_estimators = 10, max_features = 5, max_depth = 5, random_state=0)
RF.fit(XTrain, yTrain) # Train the model
print("Training R2 = %5.2f" % RF.score(XTrain,yTrain))
RFpreds = RF.predict(XTest)
with open(test_name,'a') as fpred :
    lenpredictions = len(RFpreds)
    lentrue = yTest.shape[0]
    if lenpredictions == lentrue :
        fpred.write("Names/Label,, Prediction Random Forest,, True Value,\n")
        for i in range(0,lenpredictions) :
            fpred.write(RFpreds[i]+",,"+yTest[i]+",\n")
    else :
        print "ERROR - names, prediction and true value array size mismatch."
This leads to the error:
Traceback (most recent call last):
File "min_example.py", line 40, in <module>
fpred.write(RFpreds[i]+",,"+yTest[i]+",\n")
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('S32') dtype('S32') dtype('S32')
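The failing operation can be reproduced in isolation (an illustrative sketch: RFpreds[i] is a numpy scalar while ",," is a plain Python string, so + dispatches to np.add instead of string concatenation):
import numpy as np
pred = np.float64(3.14)
# pred + ",," raises TypeError: ufunc 'add' did not contain a loop ...
line = str(pred) + ",,"  # converting to str first gives ordinary concatenation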
The solution is to make each variable a str() type on the third-to-last line before writing to file. No other changes to the code have been made from the above.
import scipy
import math
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn import preprocessing, metrics, cross_validation
Data = pd.read_csv("Free_Energy_exp.csv", sep=",")
Data = Data.fillna(Data.mean()) # replace the NA values with the mean of the descriptor
header = Data.columns.values # Use the column headers as the descriptor labels
Data.head()
test_name = "Test.csv"
npArray = np.array(Data)
print header.shape
npheader = np.array(header[1:-1])
print("Array shape X = %d, Y = %d " % (npArray.shape))
datax, datay = npArray.shape
names = npArray[:,0]
X = npArray[:,1:-1].astype(float)
y = npArray[:,-1].astype(float)
X = preprocessing.scale(X)
XTrain, XTest, yTrain, yTest = cross_validation.train_test_split(X,y, random_state=0)
# Predictions results initialised
RFpredictions = []
RF = RandomForestRegressor(n_estimators = 10, max_features = 5, max_depth = 5, random_state=0)
RF.fit(XTrain, yTrain) # Train the model
print("Training R2 = %5.2f" % RF.score(XTrain,yTrain))
RFpreds = RF.predict(XTest)
with open(test_name,'a') as fpred :
    lenpredictions = len(RFpreds)
    lentrue = yTest.shape[0]
    if lenpredictions == lentrue :
        fpred.write("Names/Label,, Prediction Random Forest,, True Value,\n")
        for i in range(0,lenpredictions) :
            fpred.write(str(RFpreds[i])+",,"+str(yTest[i])+",\n")
    else :
        print "ERROR - names, prediction and true value array size mismatch."
These examples are from a larger codebase, so I hope they are clear enough.

Try statement in Cython for cimport (for use with mpi4py)

Is there a way to have the equivalent of the Python try statement in Cython for the cimport?
Something like that:
try:
    cimport something
except ImportError:
    pass
I would need this to write a Cython extension that can be compiled with or without mpi4py. This is very standard in compiled languages, where the MPI commands can be put between #ifdef and #endif preprocessor directives. How can we obtain the same result in Cython?
I tried this but it does not work:
try:
    from mpi4py import MPI
    from mpi4py cimport MPI
    from mpi4py.mpi_c cimport *
except ImportError:
    rank = 0
    nb_proc = 1

# solve an incompatibility between openmpi and mpi4py versions
cdef extern from 'mpi-compat.h': pass

does_it_work = 'Not yet'
Actually, it works well if mpi4py is correctly installed, but if import mpi4py raises an ImportError, the Cython file does not compile and I get the error:
Error compiling Cython file:
------------------------------------------------------------
...
try:
from mpi4py import MPI
from mpi4py cimport MPI
^
------------------------------------------------------------
mod.pyx:4:4: 'mpi4py.pxd' not found
The file setup.py:
from setuptools import setup, Extension
from Cython.Distutils import build_ext
import os

here = os.path.abspath(os.path.dirname(__file__))
include_dirs = [here]

try:
    import mpi4py
except ImportError:
    pass
else:
    INCLUDE_MPI = '/usr/lib/openmpi/include'
    include_dirs.extend([
        INCLUDE_MPI,
        mpi4py.get_include()])

name = 'mod'
ext = Extension(
    name,
    include_dirs=include_dirs,
    sources=['mod.pyx'])

setup(name=name,
      cmdclass={"build_ext": build_ext},
      ext_modules=[ext])
Using a try-catch block in this way is something you won't be able to do.
The extension module you are making must be statically compiled and linked against the things it uses cimport to load at the C level. A try-catch block is something that will be executed when the module is imported, not when it is compiled.
On the other hand, in theory, you should be able to get the effect you're looking for using Cython's support for conditional compilation.
In your setup.py file you can check to see if the needed modules are defined and then define environment variables to be passed to the Cython compiler that, in turn, depend on whether or not the needed modules are present.
There's an example of how to do this in one of Cython's tests.
There they pass a dictionary containing the desired environment variables to the constructor for Cython's Extension class as the keyword argument pyrex_compile_time_env (which has been renamed to cython_compile_time_env; for Cython.Build.Dependencies.cythonize it is called compile_time_env).
Thank you for your very useful answer, @IanH. I include an example to show what it gives.
The file setup.py:
from setuptools import setup
from Cython.Distutils.extension import Extension
from Cython.Distutils import build_ext
import os

here = os.path.abspath(os.path.dirname(__file__))

import numpy as np
include_dirs = [here, np.get_include()]

try:
    import mpi4py
except ImportError:
    MPI4PY = False
else:
    MPI4PY = True
    INCLUDE_MPI = '/usr/lib/openmpi/include'
    include_dirs.extend([
        INCLUDE_MPI,
        mpi4py.get_include()])

name = 'mod'
ext = Extension(
    name,
    include_dirs=include_dirs,
    cython_compile_time_env={'MPI4PY': MPI4PY},
    sources=['mod.pyx'])

setup(name=name,
      cmdclass={"build_ext": build_ext},
      ext_modules=[ext])

if not MPI4PY:
    print('Warning: since importing mpi4py raises an ImportError,\n'
          '         the extensions are compiled without mpi and \n'
          '         will work only in sequential.')
And the file mod.pyx, with a few real MPI commands:
import numpy as np
cimport numpy as np

try:
    from mpi4py import MPI
except ImportError:
    nb_proc = 1
    rank = 0
else:
    comm = MPI.COMM_WORLD
    nb_proc = comm.size
    rank = comm.Get_rank()

IF MPI4PY:
    from mpi4py cimport MPI
    from mpi4py.mpi_c cimport *

    # solve an incompatibility between openmpi and mpi4py versions
    cdef extern from 'mpi-compat.h': pass

    print('mpi4py ok')
ELSE:
    print('no mpi4py')

n = 8
if n % nb_proc != 0:
    raise ValueError('The number of processes is incorrect.')

if rank == 0:
    data_seq = np.ones([n], dtype=np.int32)
    s_seq = data_seq.sum()
else:
    data_seq = np.zeros([n], dtype=np.int32)

if nb_proc > 1:
    data_local = np.zeros([n // nb_proc], dtype=np.int32)  # integer division for the shape
    comm.Scatter(data_seq, data_local, root=0)
else:
    data_local = data_seq

s = data_local.sum()
if nb_proc > 1:
    s = comm.allreduce(s, op=MPI.SUM)

if rank == 0:
    print('s: {}; s_seq: {}'.format(s, s_seq))
    assert s == s_seq
Build with python setup.py build_ext --inplace and test with python -c "import mod" and mpirun -np 4 python -c "import mod". If mpi4py is not installed, one can still build the module and use it sequentially.
