I am trying to add a bunch of school boundary files into the database. The boundary files are inconsistent. They are processed by DataSource as either Polygon, MultiPolygon, or GeometryCollection.
Converting a Polygon into a MultiPolygon is fairly simple, but the same conversion does not work for a GeometryCollection.
from django.contrib.gis.db import models
class School(models.Model):
    boundaries = models.MultiPolygonField()
from django.contrib.gis.gdal import DataSource
from django.contrib.gis.geos import Polygon, MultiPolygon
from django.contrib.gis.geos.collections import GeometryCollection
ds = DataSource('school_boundaries.aspx')
feature = ds[0][0]
geom_geos = feature.geom.geos
if isinstance(geom_geos, Polygon):
    geom_geos = MultiPolygon(geom_geos)
elif isinstance(geom_geos, GeometryCollection):
    geom_geos = MultiPolygon(GeometryCollection)  # This does not work
school = School(boundaries=geom_geos)
Is there some way to convert a GeometryCollection to a MultiPolygon in GeoDjango?

I figured out a good solution. This only works if the GeometryCollection is an array of Polygons. In my case I just had to loop over the Polygons in the GeometryCollection, append each of them to a list, and create a MultiPolygon from the list of Polygons.
from django.contrib.gis.gdal import DataSource
from django.contrib.gis.geos import MultiPolygon
from django.contrib.gis.geos.collections import GeometryCollection
ds = DataSource('school_boundaries.aspx')
feature = ds[0][0]
geom_geos = feature.geom.geos
if isinstance(geom_geos, GeometryCollection):
    # collect every Polygon in the collection, then build one MultiPolygon from them
    poly_list = []
    for poly in geom_geos:
        poly_list.append(poly)
    geom_geos = MultiPolygon(poly_list)
school = School(boundaries=geom_geos)


Pytorch Dataloader for Image GT dataset

I am new to PyTorch. I am trying to create a DataLoader for a dataset of images in which each image has a corresponding ground truth with the same file name, stored in separate RGB and GT folders under one root folder.
When I pass the path of the root folder (which contains the RGB and GT folders) to torchvision.datasets.ImageFolder, it reads all of the images as inputs and treats RGB and GT as class labels, and there seems to be no way to pair the RGB and GT images. I would like to pair the RGB-GT images, shuffle them, and divide them into batches of a defined size. How can this be done? Any advice will be appreciated.
I think a good starting point is to use the VisionDataset class as a base. We will model our class on the DatasetFolder source code and create something similar. Note that DatasetFolder depends on two other functions from the datasets.folder module: default_loader and make_dataset.
We are not going to modify default_loader: it already does what we need (it loads images), so we will simply import it.
We do need a new make_dataset function that prepares the right pairs of images from the root folder. The original make_dataset pairs each image path with the class index of its folder, giving a list of (path, class_to_idx[target]) pairs, whereas we need (rgb_path, gt_path) pairs. Here is the code for the new make_dataset:
import os
def make_dataset(root: str) -> list:
    """Reads a directory with data.
    Returns a dataset as a list of tuples of paired image paths: (rgb_path, gt_path)
    """
    dataset = []
    # Our dir names
    rgb_dir = 'RGB'
    gt_dir = 'GT'
    # Get all the filenames from RGB folder
    rgb_fnames = sorted(os.listdir(os.path.join(root, rgb_dir)))
    # Compare file names from GT folder to file names from RGB:
    for gt_fname in sorted(os.listdir(os.path.join(root, gt_dir))):
        if gt_fname in rgb_fnames:
            # if we have a match - create pair of full path to the corresponding images
            rgb_path = os.path.join(root, rgb_dir, gt_fname)
            gt_path = os.path.join(root, gt_dir, gt_fname)
            item = (rgb_path, gt_path)
            # append to the list dataset
            dataset.append(item)
    return dataset
What do we have now? Let's compare our function with the original one:
from torchvision.datasets.folder import make_dataset as make_dataset_original
root = './data'  # dataset root, matching the paths in the output below
dataset_original = make_dataset_original(root, {'RGB': 0, 'GT': 1}, extensions='png')
dataset = make_dataset(root)
print('Original make_dataset:')
print(*dataset_original, sep='\n')
print('Our make_dataset:')
print(*dataset, sep='\n')
Original make_dataset:
('./data/GT/img1.png', 1)
('./data/GT/img2.png', 1)
('./data/RGB/img1.png', 0)
('./data/RGB/img2.png', 0)
Our make_dataset:
('./data/RGB/img1.png', './data/GT/img1.png')
('./data/RGB/img2.png', './data/GT/img2.png')
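The make_dataset above matches files by their exact name, extension included. As a minimal sketch of a variant, assuming the RGB and GT images might use different extensions (this is not stated in the question), files can be paired by their name without the extension. The function name pair_by_stem and the throwaway file names are hypothetical:

```python
import os
import tempfile


def pair_by_stem(root, rgb_dir="RGB", gt_dir="GT"):
    """Pair files whose names match once the extension is dropped (hypothetical helper)."""
    def index(folder):
        # Map 'img1' -> 'img1.png' for every file in the folder.
        # If two files in one folder share a stem, the last one listed wins.
        names = os.listdir(os.path.join(root, folder))
        return {os.path.splitext(name)[0]: name for name in names}

    rgb, gt = index(rgb_dir), index(gt_dir)
    common = sorted(rgb.keys() & gt.keys())  # stems present in both folders
    return [(os.path.join(root, rgb_dir, rgb[stem]),
             os.path.join(root, gt_dir, gt[stem])) for stem in common]


# Build a tiny throwaway dataset to exercise the function.
with tempfile.TemporaryDirectory() as root:
    layout = {"RGB": ["img1.jpg", "img2.jpg", "extra.jpg"],
              "GT": ["img1.png", "img2.png"]}
    for folder, names in layout.items():
        os.makedirs(os.path.join(root, folder))
        for name in names:
            open(os.path.join(root, folder, name), "w").close()
    pairs = pair_by_stem(root)
    pair_names = [(os.path.basename(a), os.path.basename(b)) for a, b in pairs]
    print(pair_names)
```

Files without a partner (extra.jpg here) are silently skipped, which mirrors the behaviour of the make_dataset above.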
The new make_dataset pairs the images as intended. It's time to create our Dataset class. The most important part here is the __getitem__ method, because it loads the images, applies the transforms, and returns tensors that can be used by a DataLoader. We need to read a pair of images (RGB and GT) and return a tuple of two image tensors:
from torchvision.datasets.folder import default_loader
from torchvision.datasets.vision import VisionDataset
class CustomVisionDataset(VisionDataset):
    def __init__(self,
                 root,
                 loader=default_loader,
                 rgb_transform=None,
                 gt_transform=None):
        # VisionDataset stores these as self.transform and self.target_transform
        super().__init__(root,
                         transform=rgb_transform,
                         target_transform=gt_transform)
        # Prepare dataset
        samples = make_dataset(self.root)
        self.loader = loader
        self.samples = samples
        # list of RGB images
        self.rgb_samples = [s[0] for s in samples]
        # list of GT images
        self.gt_samples = [s[1] for s in samples]
    def __getitem__(self, index):
        """Returns a data sample from our dataset."""
        # getting our paths to images
        rgb_path, gt_path = self.samples[index]
        # import each image using loader (by default it's PIL)
        rgb_sample = self.loader(rgb_path)
        gt_sample = self.loader(gt_path)
        # here go the transforms if needed
        # maybe we need different transforms for each type of image
        if self.transform is not None:
            rgb_sample = self.transform(rgb_sample)
        if self.target_transform is not None:
            gt_sample = self.target_transform(gt_sample)
        # now we return the right imported pair of images (tensors)
        return rgb_sample, gt_sample
    def __len__(self):
        return len(self.samples)
Let's test it:
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt
bs = 4  # batch size
transforms = ToTensor()  # we need this to convert PIL images to Tensor
shuffle = True
dataset = CustomVisionDataset('./data', rgb_transform=transforms, gt_transform=transforms)
dataloader = DataLoader(dataset, batch_size=bs, shuffle=shuffle)
for i, (rgb, gt) in enumerate(dataloader):
    print(f'batch {i+1}:')
    # some plots: each RGB/GT pair side by side (the subplot layout is an assumption)
    for j in range(rgb.shape[0]):  # the last batch may hold fewer than bs images
        plt.figure(figsize=(10, 5))
        plt.subplot(1, 2, 1)
        plt.imshow(rgb[j].squeeze().permute(1, 2, 0))
        plt.title(f'RGB img{j+1}')
        plt.subplot(1, 2, 2)
        plt.imshow(gt[j].squeeze().permute(1, 2, 0))
        plt.title(f'GT img{j+1}')
        plt.show()
batch 1:
Here you can find a notebook with code and simple dummy dataset.

Tfidf with a custom list

I have a list of raw strings that look like this;
listtocheck = ['fadsfsfgblahsdfgsfg','adfaghelloggfg','gagfghellosdfhere','blahsgsdfgsdfhellohsdfhgshstring']
and I want to compute TF-IDF for these using a separate list of terms (rather than the strings themselves):
mylist = ['blah','hello','here','string']
I am vectorising this list like so:
from sklearn.feature_extraction.text import TfidfVectorizer
tf = TfidfVectorizer(analyzer = 'char_wb', ngram_range=(2,3))
listvec = tf.fit_transform(mylist)
This gives me the TF-IDF of the items in mylist. What I would like to do is count the number of times the n-grams from mylist appear in each item of listtocheck, and then compute TF-IDF based on the total number of times each n-gram appears across all of the strings in listtocheck.
In order to achieve this I had to first .fit() on mylist but then .transform() on listtocheck.
Here is the code I used in the end:
from sklearn.feature_extraction.text import TfidfVectorizer
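def create_vec(listtocheck, mylist):
    tf = TfidfVectorizer(analyzer='char_wb', ngram_range=(2, 3))
    tf.fit(mylist)  # learn the n-gram vocabulary (and IDF weights) from mylist only
    X = tf.transform(listtocheck)  # score listtocheck against that vocabulary
    return X
vecs = create_vec(listtocheck, mylist)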
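To make the fit/transform split more concrete, here is a plain-Python sketch of the underlying idea: the vocabulary is built from mylist alone, and each string in listtocheck is then counted against that fixed vocabulary. This is not scikit-learn's implementation; it deliberately leaves out the whitespace padding of 'char_wb' as well as the IDF weighting and normalisation, and the helper names char_ngrams and count_row are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n_min=2, n_max=3):
    """All character n-grams of length n_min..n_max (no padding, unlike 'char_wb')."""
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


mylist = ['blah', 'hello', 'here', 'string']
listtocheck = ['fadsfsfgblahsdfgsfg', 'adfaghelloggfg',
               'gagfghellosdfhere', 'blahsgsdfgsdfhellohsdfhgshstring']

# "fit": the vocabulary comes only from mylist
vocabulary = sorted({gram for word in mylist for gram in char_ngrams(word)})


def count_row(text):
    """"transform": count only the vocabulary n-grams found in text."""
    counts = Counter(char_ngrams(text))
    return [counts[gram] for gram in vocabulary]


rows = [count_row(text) for text in listtocheck]
print(len(vocabulary), "n-grams in the vocabulary")
print(dict(zip(vocabulary, rows[2])))
```

N-grams that occur in listtocheck but not in mylist (for example 'sf') never get a column, which is the behaviour you get from fitting on one list and transforming another.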

Keras input shape error - passing the whole array not each line

I am loading images from a csv file. The images are 300 x 300 pixels but flattened to 90000 values. I am getting an input shape error. I am using the TensorFlow backend. I have attached an image of my csv file as well as an image of the error. It looks like it's passing the whole list of arrays instead of passing each line.
"ValueError: Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 arrays but instead got the following list of 380 arrays:[array([ 43., 45., 46., ..., 161., 152., 146.]), array([ 211., 222., 224., ..., 212., 213., 213.]), array([ 201., 201., "
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout
import csv
import cv2
import re
loaded_images_train = []
loaded_labels_train = []
loaded_images_test = []
loaded_labels_test = []
with open('images_train.csv') as f:
    csvReader = csv.reader(f, lineterminator = '\n')
    for row in csvReader:
        row = np.asarray(row, dtype='float')
        loaded_images_train.append(row)
with open('labels_train.csv') as f:
    csvReader = csv.reader(f, lineterminator = '\n')
    for row in csvReader:
        row = str(row)
        row = row.strip(',')
        loaded_labels_train.append(row)
with open('images_test.csv') as f:
    csvReader = csv.reader(f, lineterminator = '\n')
    for row in csvReader:
        row = np.asarray(row, dtype='float')
        loaded_images_test.append(row)
with open('labels_test.csv') as f:
    csvReader = csv.reader(f, lineterminator = '\n')
    for row in csvReader:
        row = str(row)
        row = row.strip(',')
        loaded_labels_test.append(row)
# load data
x_train = loaded_images_train
y_train = loaded_labels_train
print("Loaded Training Data")
x_test = loaded_images_test
y_test = loaded_labels_test
print("Loaded Testing Data")
model = Sequential()
model.add(Dense(64, input_shape=(90000,), activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
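# the loss, optimizer and epochs values are assumed; they are not shown in the question
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=20, batch_size=128)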
#score = model.evaluate(x_test, y_test, batch_size=128)
The way you are converting each line with asarray and then feeding Keras a list of arrays does not work.
I've tested your code with a slightly different approach, and it ran flawlessly for me with the csv you provided in the comments (after changing input_size to 400).
Read all lines from the file to loaded_images_train. It will be a list of lists:
input_size = 90000
loaded_images_train = []
with open('images_train.csv') as f:
    csvReader = csv.reader(f, lineterminator = '\n')
    for row in csvReader:
        assert len(row) == input_size
        loaded_images_train.append(row)
I've included the assertion following your feedback to my comment.
You can also assert len(row) == output_size for the labels.
On the other hand, if you are pretty sure about the sizes of the rows, you can replace the loop with a simple:
loaded_images_train = list(csvReader)
Whichever you choose, do the same to test images.
Then do the conversion to numpy.ndarray when declaring x_train:
x_train = np.asarray(loaded_images_train, dtype=float) # you don't really need the quotes here
Finally, printing the shape of the loaded data can help you know that everything is ok. For example:
print("Loaded Training Data", x_train.shape)
The problem arises because your dataset is a Python list, whereas a Keras model only accepts NumPy arrays as input.
Convert the lists to a NumPy array with np.asarray(loaded_images_train) and make sure the shape of the data is (n, 90000).
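The difference between the two containers can be checked without Keras. The sketch below uses a small in-memory CSV as a stand-in for images_train.csv (three rows of four values instead of 90000, both assumptions for illustration) and compares the list of 1-D arrays built in the question with the single 2-D array both answers recommend:

```python
import csv
import io

import numpy as np

input_size = 4  # stand-in for 90000
csv_text = "43,45,46,161\n211,222,224,212\n201,201,203,213\n"

rows = []
for row in csv.reader(io.StringIO(csv_text)):
    assert len(row) == input_size
    rows.append(row)

# What the question built: a Python list holding one 1-D array per image.
list_of_arrays = [np.asarray(row, dtype=float) for row in rows]

# What the model expects: one 2-D array of shape (n_samples, input_size).
x_train = np.asarray(rows, dtype=float)

print(type(list_of_arrays).__name__, len(list_of_arrays))
print(type(x_train).__name__, x_train.shape)
```

Printing the shape right after loading is a cheap way to confirm that the first dimension is the number of samples and the second matches the model's input_shape.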

Non conformable array error when using rpart with rpy2

I'm using rpart with rpy2 (version 2.8.6) on python 3.5, and want to train a decision tree for classification. My code snippet looks like this:
import rpy2.robjects.packages as rpackages
from rpy2.robjects.packages import importr
from rpy2.robjects import numpy2ri
from rpy2.robjects import pandas2ri
from rpy2.robjects import DataFrame, Formula
rpart = importr('rpart')
dataf = DataFrame({'responsev': owner_train_label,
'predictorv': owner_train_data})
formula = Formula('responsev ~.')
clf = rpart.rpart(formula = formula, data = dataf, method = "class", control=rpart.rpart_control(minsplit = 10, xval = 10))
where owner_train_label is a numpy float64 array of shape (12610,) and
owner_train_data is a numpy float64 array of shape (12610,88)
This is the error I'm getting when I run the last line of code to fit the data.
RRuntimeError: Error in ((xmiss %*% rep(1, ncol(xmiss))) < ncol(xmiss)) & !ymiss :
non-conformable arrays
I get that it is telling me they are non-conformable arrays but I don't know why as for the same training data, I can train using sklearn's Decision tree successfully.
Thanks for your help.
I got around this by creating the dataframe with pandas and passing the pandas dataframe to rpart, using rpy2's pandas2ri to convert it to an R dataframe.
import pandas as pd
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
from rpy2.robjects import Formula
pandas2ri.activate()  # convert pandas DataFrames to R data frames automatically
rpart = importr('rpart')
df = pd.DataFrame(data = owner_train_data)
df['l'] = owner_train_label
formula = Formula('l ~.')
clf = rpart.rpart(formula = formula, data = df, method = "class", control=rpart.rpart_control(minsplit = 10, xval = 10))

Bokeh MultiSelect plotting in infinite loop, distorting plot

I'm trying to plot multiple lines on a graph based on a user's "MultiSelect" options. I read in two separate Excel files of data and plot their axes based on the user's request. I'm using Python 3.5 and running on a Mac. I have three problems:
1). As soon as I make a multi-selection the figure gets distorted.
2). The plot seems to be running in an infinite loop.
3). The plot does not update properly when the user changes selections. It just adds more lines without removing the previous ones.
from os.path import dirname, join
from pandas import *
import numpy as np
import pandas.io.sql as psql
import sqlite3 as sql
import sys, os
from bokeh.plotting import figure
from bokeh.layouts import layout, widgetbox
from bokeh.models import ColumnDataSource, HoverTool, Div
from bokeh.models.widgets import Slider, Select, TextInput, MultiSelect
from bokeh.io import curdoc
import matplotlib.pyplot as plt
files = list()
path = os.getcwd()
for x in os.listdir(path):
    if x.endswith(".xlsx"):
        if x != 'template.xlsx':
            files.append(x)
axis_map = {
    "0% void": "0% void",
    "40% void": "40% void",
    "70% void": "70% void",
}
files_list = MultiSelect(title="Files", value=["dummy2.xlsx"],
options=open(join(dirname(__file__), 'files.txt')).read().split())
voids = MultiSelect(title="At what void[s]", value=["0% void"], options=sorted(axis_map.keys()))
p = figure(plot_height=600, plot_width=700, title="", toolbar_location=None)
pline = figure(plot_height=600, plot_width=700, title="")
path = os.getcwd()
data_dict = {}
for file in os.listdir(path):
    if file.endswith(".xlsx"):
        xls = ExcelFile(file)
        df = xls.parse(xls.sheet_names[0])
        data = df.to_dict()
        data_dict[file] = data
# converting dictionary to dataframe
newdict = {(k1, k2):v2 for k1,v1 in data_dict.items() \
for k2,v2 in data_dict[k1].items()}
xxs = DataFrame([newdict[i] for i in sorted(newdict)],
index=MultiIndex.from_tuples([i for i in sorted(newdict.keys())]))
master_data = xxs.transpose()
def select_data():
    # draws one new line per selected file/void pair
    for vals in files_list.value:
        for vox in voids.value:
            pline.line(x=master_data[vals]['Burnup'], y=master_data[vals][vox])
def update():
    select_data()
controls = [files_list, voids]
for control in controls:
    control.on_change('value', lambda attr, old, new: update())
sizing_mode = 'fixed' # 'scale_width' also looks nice with this example
inputs = widgetbox(*controls, sizing_mode=sizing_mode)
l = layout([
    [inputs, pline],
], sizing_mode=sizing_mode)
curdoc().add_root(l)
curdoc().title = "Calculations"
I am not 100% certain, since the code above is not self-contained and cannot be run and investigated, but there are some issues (as of Bokeh 0.12.4) with adding new components to documents being problematic in some situations. These issues are high on the priority list for the next two point releases.
Are the data sizes reasonable such that you could create all the combinations up front? If so, I would recommend doing that, and then having the multi-select values toggle the visibility on/off appropriately. E.g., here's a similar example using a checkbox:
import numpy as np
from bokeh.io import curdoc
from bokeh.layouts import row
from bokeh.palettes import Viridis3
from bokeh.plotting import figure
from bokeh.models import CheckboxGroup
p = figure()
props = dict(line_width=4, line_alpha=0.7)
x = np.linspace(0, 4 * np.pi, 100)
l0 = p.line(x, np.sin(x), color=Viridis3[0], legend="Line 0", **props)
l1 = p.line(x, 4 * np.cos(x), color=Viridis3[1], legend="Line 1", **props)
l2 = p.line(x, np.tan(x), color=Viridis3[2], legend="Line 2", **props)
checkbox = CheckboxGroup(labels=["Line 0", "Line 1", "Line 2"], active=[0, 1, 2], width=100)
def update(attr, old, new):
    l0.visible = 0 in checkbox.active
    l1.visible = 1 in checkbox.active
    l2.visible = 2 in checkbox.active
checkbox.on_change('active', update)
layout = row(checkbox, p)
curdoc().add_root(layout)
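The same toggle logic carries over from a checkbox to a MultiSelect, whose value is a list of selected option strings rather than a list of indices. The sketch below isolates that logic in plain Python so it runs without a Bokeh server: FakeRenderer is a hypothetical stand-in for the renderer returned by a call such as pline.line(...), and apply_selection is a hypothetical helper name.

```python
from dataclasses import dataclass


@dataclass
class FakeRenderer:
    """Hypothetical stand-in for a Bokeh glyph renderer; only `visible` matters here."""
    visible: bool = True


def apply_selection(renderers, selected):
    """Show exactly the renderers whose key appears in `selected`."""
    chosen = set(selected)
    for name, renderer in renderers.items():
        renderer.visible = name in chosen


# One renderer per series, created once up front; the keys mirror the MultiSelect options.
renderers = {name: FakeRenderer() for name in ("0% void", "40% void", "70% void")}

# In a Bokeh app this call would sit inside the MultiSelect callback, e.g.
#   voids.on_change('value', lambda attr, old, new: apply_selection(renderers, new))
apply_selection(renderers, ["0% void", "70% void"])
print({name: renderer.visible for name, renderer in renderers.items()})
```

Because no renderer is ever added inside the callback, a changed selection cannot stack new lines on top of old ones.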
If the data sizes are not such that you can create all the combinations up front, then I would suggest opening an issue on the project issue tracker with complete, minimal, self-contained code that runs as-is and reproduces the problem (i.e. it generates random or synthetic data but is otherwise identical). This is the number one thing that would help the core devs address the issue more promptly.
@bigreddot Thanks for your response.
I edited the code to make it self-contained. Two problems remain:
1). The plot does not reset. The newly selected lines are drawn over the previous plot.
2). When the user makes multiple selections (ctrl+shift), the plot axes get distorted and the plot seems to be running in an infinite loop.
from pandas import *
import numpy as np
import sys, os
from bokeh.plotting import figure
from bokeh.layouts import layout, widgetbox
from bokeh.models.widgets import MultiSelect
from bokeh.io import curdoc
from bokeh.plotting import reset_output
import math
axis_map = {
    "y1": "y3",
    "y2": "y2",
    "y3": "y1",
}
x1 = np.linspace(0,20,62)
y1 = [1.26 * math.cos(x) for x in np.linspace(-1,1,62) ]
y2 = [1.26 * math.cos(x) for x in np.linspace(-0.95,.95,62) ]
y3 = [1.26 * math.cos(x) for x in np.linspace(-.9,.90,62) ]
TOOLS = "pan,wheel_zoom,box_zoom,reset,save,hover"
vars = MultiSelect(title="At what void[s]", value=["y1"], options=sorted(axis_map.keys()))
master_data = { 'rate' : x1,
                'y1' : y1,
                'y2' : y2,
                'y3' : y3
              }
p = figure(plot_height=600, plot_width=700, title="", toolbar_location=None)
pline = figure(plot_height=600, plot_width=700, title="", tools=TOOLS)
def select_data():
    for vox in vars.value:
        pline.line(x=master_data['rate'], y=master_data[vox], line_width=2)
controls = [vars]
for control in controls:
    control.on_change('value', lambda attr, old, new: select_data())
sizing_mode = 'fixed'
inputs = widgetbox(*controls)
l = layout([
    [inputs, pline],
])
curdoc().add_root(l)
curdoc().title = "Plot"
