Matplotlib scatterplot subplot legends overwrite one another - loops

I have a scatterplot figure with subplots generated using a for loop. Within the figure, I am trying to create a single legend, but each time a subplot and its legend are rendered, the legend is overwritten by the next subplot, so the resulting figure contains a single legend pertaining only to the last subplot. I would like the legend to cover all subplots (i.e., it should include the years 2019, 2020, 2021 and 2022). Here is my code; please let me know how I can tweak it.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches
df = pd.read_excel(path)
spp = df.SPP.unique()
fig, axs = plt.subplots(nrows=8, ncols=4, figsize=(14, 14))
for spp_i, ax in zip(spp, axs.flat):
    df_1 = df[df['SPP'] == spp_i]
    labels = list(df_1.Year.unique())
    x = df_1['Length_mm']
    y = df_1['Weight_g']
    levels, categories = pd.factorize(df_1['Year'])
    colors = [plt.cm.tab10(i) for i in levels]
    handles = [matplotlib.patches.Patch(color=plt.cm.tab10(i), label=c) for i, c in enumerate(categories)]
    ax.scatter(x, y, c=colors)
    plt.legend(handles=handles)
plt.savefig('Test.png', bbox_inches='tight', pad_inches=0.1, dpi=600)
Here is the figure; as you can see, the legend in the bottom right is for the last subplot only.

Creating this type of plot is quite cumbersome with standard matplotlib. Seaborn automates a lot of the steps.
In this case, sns.relplot(...) can be used. If you don't want all the subplots to share the same x and/or y ranges, you can add facet_kws={'sharex': False, 'sharey': False}.
The size of the individual subplots is controlled via height=, while the width is calculated as the height multiplied by aspect=. col_wrap= sets how many columns of subplots are placed before starting a new row.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
spp_list = ["Aeloria", "Baelun", "Caelondia", "Draeden", "Eldrida", "Faerun", "Gorandor", "Haldira", "Ilysium",
            "Jordheim", "Kaltara", "Lorlandia", "Myridia", "Nirathia", "Oakenfort"]
df = pd.DataFrame({'SPP': np.repeat(spp_list, 100),
                   'Year': np.tile(np.repeat(np.arange(2019, 2023), 25), 15),
                   'Length_mm': np.abs(np.random.randn(1500).cumsum()) + 10,
                   'Weight_g': np.abs(np.random.randn(1500).cumsum()) + 20})
g = sns.relplot(data=df, x='Length_mm', y='Weight_g', col='SPP', col_order=spp_list,
                hue='Year', palette='turbo',
                height=3, aspect=1.5, col_wrap=6,
                facet_kws={'sharex': False, 'sharey': False})
g.set_axis_labels(x_var='Length (mm)', y_var='Weight (g)', clear_inner=True)
g.fig.tight_layout()  # nicely fit subplots with their titles, labels and ticks
g.fig.subplots_adjust(right=0.97)  # space for the legend after fitting the subplots
plt.show()
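If you prefer to stay with plain matplotlib, the question's code can be fixed in two places. plt.legend(...) always draws on the current axes (after plt.subplots, the last subplot created, i.e. bottom right), so calling it inside the loop just replaces that one legend; instead, call fig.legend(...) once, after the loop. Also, pd.factorize on each species subset can give the same code (and colour) to different years when a species is missing a year, so the sketch below builds one year-to-colour mapping from the whole data frame. It is a minimal sketch under assumptions: the synthetic data, the 2x2 grid and the legend placement are made up in place of the Excel file and the 8x4 grid.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.patches
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical synthetic data standing in for pd.read_excel(path)
rng = np.random.default_rng(0)
spp_list = ["A", "B", "C", "D"]
df = pd.DataFrame({
    "SPP": np.repeat(spp_list, 40),
    "Year": np.tile(np.repeat(np.arange(2019, 2023), 10), len(spp_list)),
    "Length_mm": rng.uniform(10, 50, 160),
    "Weight_g": rng.uniform(20, 80, 160),
})

# One colour per year, taken from the WHOLE data frame, so colours agree across subplots
years = sorted(df["Year"].unique())
color_of = {year: plt.cm.tab10(i) for i, year in enumerate(years)}

fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(8, 6))
for spp_i, ax in zip(spp_list, axs.flat):
    df_1 = df[df["SPP"] == spp_i]
    ax.scatter(df_1["Length_mm"], df_1["Weight_g"], c=[color_of[y] for y in df_1["Year"]])
    ax.set_title(spp_i)

# Build the handles once and attach a single legend to the figure, after the loop
handles = [matplotlib.patches.Patch(color=color_of[y], label=str(y)) for y in years]
fig.legend(handles=handles, loc="center right", title="Year")
fig.subplots_adjust(right=0.85)  # leave room on the right for the figure legend
```

Saving with fig.savefig('Test.png', bbox_inches='tight') afterwards should keep the figure-level legend inside the saved image.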

Related

Pytorch Dataloader for Image GT dataset

I am new to PyTorch. I am trying to create a DataLoader for a dataset of images where each image has a corresponding ground truth (with the same name):
root:
--->RGB:
------>img1.png
------>img2.png
------>...
------>imgN.png
--->GT:
------>img1.png
------>img2.png
------>...
------>imgN.png
When I use the path of the root folder (which contains the RGB and GT folders) as input for torchvision.datasets.ImageFolder, it reads all of the images as if they were all intended as input (classified as RGB and GT), and there seems to be no way to pair the RGB and GT images. I would like to pair the RGB-GT images, shuffle them, and divide them into batches of a defined size. How can this be done? Any advice will be appreciated.
Thanks.
I think a good starting point is to use the VisionDataset class as a base and to model our class on the DatasetFolder source code, creating something similar. You can notice that this class depends on two other functions from the datasets.folder module: default_loader and make_dataset.
We are not going to modify default_loader, because it already does what we need (it loads images), so we will simply import it.
But we need a new make_dataset function that builds the right pairs of images from the root folder. The original make_dataset pairs each image (more precisely, each image path) with the index of its parent folder as the target class, i.e. it returns a list of (path, class_to_idx[target]) pairs, but we need (rgb_path, gt_path) pairs. Here is the code for the new make_dataset:
import os

def make_dataset(root: str) -> list:
    """Reads a directory with data.
    Returns a dataset as a list of tuples of paired image paths: (rgb_path, gt_path)
    """
    dataset = []
    # Our dir names
    rgb_dir = 'RGB'
    gt_dir = 'GT'
    # Get all the filenames from the RGB folder (a set makes the lookup below fast)
    rgb_fnames = set(os.listdir(os.path.join(root, rgb_dir)))
    # Compare file names from the GT folder to file names from RGB:
    for gt_fname in sorted(os.listdir(os.path.join(root, gt_dir))):
        if gt_fname in rgb_fnames:
            # if we have a match, create a pair of full paths to the corresponding images
            rgb_path = os.path.join(root, rgb_dir, gt_fname)
            gt_path = os.path.join(root, gt_dir, gt_fname)
            item = (rgb_path, gt_path)
            # append to the dataset list
            dataset.append(item)
    return dataset
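The pairing logic can be checked without real images or torchvision. The sketch below is a stripped-down variant of the function above (make_pairs is a hypothetical name, not part of torchvision) that intersects the two sets of file names, run against empty placeholder files in a temporary directory; img3.png exists only in RGB and is expected to be skipped.

```python
import os
import tempfile

def make_pairs(root, rgb_dir="RGB", gt_dir="GT"):
    """Hypothetical variant of make_dataset: pair files that share a name in both folders."""
    rgb_names = set(os.listdir(os.path.join(root, rgb_dir)))
    gt_names = set(os.listdir(os.path.join(root, gt_dir)))
    common = sorted(rgb_names & gt_names)  # names present in both folders
    return [(os.path.join(root, rgb_dir, n), os.path.join(root, gt_dir, n)) for n in common]

with tempfile.TemporaryDirectory() as root:
    for sub in ("RGB", "GT"):
        os.makedirs(os.path.join(root, sub))
    # Empty placeholder files; img3.png has no GT counterpart
    for name in ("img1.png", "img2.png", "img3.png"):
        open(os.path.join(root, "RGB", name), "w").close()
    for name in ("img1.png", "img2.png"):
        open(os.path.join(root, "GT", name), "w").close()
    pairs = make_pairs(root)
    for rgb, gt in pairs:
        print(os.path.relpath(rgb, root), "<->", os.path.relpath(gt, root))
```

Unmatched files are silently dropped here, as in make_dataset; raising an error instead may be preferable if every RGB image is supposed to have a GT.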
What do we have now? Let's compare our function with the original one:
from torchvision.datasets.folder import make_dataset as make_dataset_original
root = './data'  # folder that contains the RGB and GT subfolders
dataset_original = make_dataset_original(root, {'RGB': 0, 'GT': 1}, extensions='png')
dataset = make_dataset(root)
print('Original make_dataset:')
print(*dataset_original, sep='\n')
print('Our make_dataset:')
print(*dataset, sep='\n')
Original make_dataset:
('./data/GT/img1.png', 1)
('./data/GT/img2.png', 1)
...
('./data/RGB/img1.png', 0)
('./data/RGB/img2.png', 0)
...
Our make_dataset:
('./data/RGB/img1.png', './data/GT/img1.png')
('./data/RGB/img2.png', './data/GT/img2.png')
...
I think it works great. Now it's time to create our Dataset class. The most important part here is the __getitem__ method, because it loads the images, applies the transforms and returns tensors that can be used by data loaders. We need to read a pair of images (RGB and GT) and return a tuple of 2 image tensors:
from torchvision.datasets.folder import default_loader
from torchvision.datasets.vision import VisionDataset

class CustomVisionDataset(VisionDataset):
    def __init__(self,
                 root,
                 loader=default_loader,
                 rgb_transform=None,
                 gt_transform=None):
        super().__init__(root,
                         transform=rgb_transform,
                         target_transform=gt_transform)
        # Prepare dataset (make_dataset is the pairing function defined above)
        samples = make_dataset(self.root)
        self.loader = loader
        self.samples = samples
        # list of RGB images
        self.rgb_samples = [s[0] for s in samples]
        # list of GT images
        self.gt_samples = [s[1] for s in samples]

    def __getitem__(self, index):
        """Returns a data sample from our dataset.
        """
        # getting our paths to images
        rgb_path, gt_path = self.samples[index]
        # load each image using the loader (by default it's PIL)
        rgb_sample = self.loader(rgb_path)
        gt_sample = self.loader(gt_path)
        # here go transforms if needed
        # maybe we need different transforms for each type of image
        if self.transform is not None:
            rgb_sample = self.transform(rgb_sample)
        if self.target_transform is not None:
            gt_sample = self.target_transform(gt_sample)
        # now we return the right pair of loaded images (tensors)
        return rgb_sample, gt_sample

    def __len__(self):
        return len(self.samples)
Let's test it:
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt
bs = 4  # batch size
transforms = ToTensor()  # we need this to convert PIL images to tensors
shuffle = True
dataset = CustomVisionDataset('./data', rgb_transform=transforms, gt_transform=transforms)
dataloader = DataLoader(dataset, batch_size=bs, shuffle=shuffle)
for i, (rgb, gt) in enumerate(dataloader):
    print(f'batch {i+1}:')
    # some plots; the last batch can be smaller than bs, so loop over its actual size
    for j in range(rgb.shape[0]):
        plt.figure(figsize=(10, 5))
        plt.subplot(121)
        plt.imshow(rgb[j].squeeze().permute(1, 2, 0))
        plt.title(f'RGB img{j+1}')
        plt.subplot(122)
        plt.imshow(gt[j].squeeze().permute(1, 2, 0))
        plt.title(f'GT img{j+1}')
        plt.show()
Out:
batch 1:
...
Here you can find a notebook with code and simple dummy dataset.

multiple matplotlib chart using loop

I need to draw multiple matplotlib charts using a for loop.
I have a data frame with multiple columns of data points and a time series 'Time' (years). I need to create a chart for each column.
I have the following code:
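import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'Time': ['2014', '2015', '2016', '2017', '2018', '2019'],
                   'A': [1, 8, 3, 10, 5, 6],
                   'B': [2, 3, 5, 2, 3, 5],
                   'C': [7, 4, 12, 11, 8, 1],
                   'D': [3, 4, 2, 2, 7, 7]})
x_pos = np.arange(len(df['Time']))  # a NumPy array, so m*x_pos + c works (a range would raise TypeError)
m, c = np.polyfit(x_pos, df['A'], 1)
plt.scatter(x=x_pos, y='A', data=df)
plt.plot(x_pos, m*x_pos + c, '--r')
Any help is appreciated.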
I was able to figure it out. I used matplotlib.gridspec to achieve this.
Here is the solution:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# import gridspec to fit subplots
import matplotlib.gridspec as gridspec
df = pd.DataFrame({'Time': ['2014', '2015', '2016', '2017', '2018', '2019'],
                   'A': [1, 8, 3, 10, 5, 6],
                   'B': [2, 3, 5, 2, 3, 5],
                   'C': [7, 4, 12, 11, 8, 1],
                   'D': [3, 4, 2, 2, 7, 7]})
fig = plt.figure(figsize=(10, 10))
# set grid size 2*2
ggpec = gridspec.GridSpec(2, 2)
getaxs = []
datacolumns = list(df[['A', 'B', 'C', 'D']])  # the column names: ['A', 'B', 'C', 'D']
x_pos = np.arange(len(df['Time']))
for j, col in enumerate(datacolumns):
    getaxs.append(fig.add_subplot(ggpec[j]))
    # fit the trend line to the column being plotted (not always 'A')
    m, c = np.polyfit(x_pos, df[col], 1)
    getaxs[-1].scatter(x=x_pos, y=col, data=df)
    getaxs[-1].plot(x_pos, m*x_pos + c, '--r')
    getaxs[-1].set_title(col)
plt.show()
The final result is a 2x2 grid of scatter plots, one per data column, each with a dashed red trend line.
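For comparison, plt.subplots can create the figure and a 2x2 array of axes in one call, which avoids managing GridSpec indices by hand. This is a swapped-in technique, not the poster's method; the tick labelling with the 'Time' strings is an added assumption about what the x axis should show.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({'Time': ['2014', '2015', '2016', '2017', '2018', '2019'],
                   'A': [1, 8, 3, 10, 5, 6],
                   'B': [2, 3, 5, 2, 3, 5],
                   'C': [7, 4, 12, 11, 8, 1],
                   'D': [3, 4, 2, 2, 7, 7]})
columns = ['A', 'B', 'C', 'D']
x_pos = np.arange(len(df))  # numeric x positions 0..5

fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(10, 10))
for col, ax in zip(columns, axs.flat):
    m, c = np.polyfit(x_pos, df[col], 1)  # per-column linear trend
    ax.scatter(x_pos, df[col])
    ax.plot(x_pos, m * x_pos + c, '--r')
    ax.set_xticks(x_pos)
    ax.set_xticklabels(df['Time'])  # show the years instead of 0..5
    ax.set_title(col)
fig.tight_layout()
```

zip(columns, axs.flat) stops at the shorter sequence, so with more columns than axes the extra columns would be skipped silently; sizing the grid from len(columns) avoids that.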

How can I hide markers and markerclusters outside a specific zoom level in folium?

Is it possible to hide a marker and a marker cluster on a folium map at specific zoom levels?
My code needs to react to zoom changes, decide which points I want to show, and register/deregister them on the map.
I know that it is possible to do this with Leaflet using getZoom() and the zoomend event. Since folium builds its maps with Leaflet, I guess it is also possible with folium, but I am not sure how to do it yet.
This is what I have so far (any idea on how to improve my code and make it "smarter" is also appreciated, I am just a beginner in Python):
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
import pandas as pd
import seaborn
import folium
import mplleaflet
import os
import json
from folium import plugins
from folium.plugins import MarkerCluster
from folium import FeatureGroup, LayerControl, Map, Marker
df = pd.read_csv(r'Pakistan.csv')
data = df[['Latitude', 'Longitude']].values.tolist()
x = list(df['Latitude'])
y = list(df['Longitude'])
ID = list(df['S'])
latmean = df['Latitude'].mean()
lonmean = df['Longitude'].mean()
m = folium.Map(location=[latmean, lonmean], zoom_start= 10, zoom_control=True)
folium.TileLayer('openstreetmap').add_to(m)
folium.TileLayer('Stamen Terrain').add_to(m)
#Vega data
vis1 = os.path.join('data', 'vis1.json')
#Geojson Data
overlay = os.path.join('data', 'overlay.json')
#Distrital
fgDistrital = FeatureGroup(name='Distrital', control=True)
my_Circle1 = MarkerCluster().add_to(fgDistrital)
for i in range (1,4):
    folium.Marker(location=[x[i], y[i]], popup=str("Distrital")).add_to(my_Circle1)
#Polo
fgPolo = FeatureGroup(name = 'Polo', show=False)
my_Circle2 = MarkerCluster().add_to(fgPolo)
for i in range (5,8):
    folium.Marker(location=[x[i], y[i]], popup=folium.Popup(str("Polo"), max_width=450, show=True).add_child(folium.Vega(json.load(open(vis1)), width=450, height=250))).add_to(my_Circle2)
#Rota
fgRota = FeatureGroup(name='Rota', control=True)
my_Circle3 = MarkerCluster().add_to(fgRota)
for i in range (9,20):
    folium.Marker(location=[x[i], y[i]], popup=str("Rota")).add_to(my_Circle3)
m.add_child(fgDistrital)
m.add_child(fgPolo)
m.add_child(fgRota)
folium.GeoJson(overlay, name = 'vis1').add_to(m)
folium.LayerControl(collapsed=True).add_to(m)
m.save('example.html')

array - list format input

I have the following question: how can I change the format of curve2 (a nested list)? I want it to look like curve:
curve = [0.0556, 0.0563]
curve2 = [[0.0159, 0.0178]]
Context: I'd like to apply some code, but I don't get the result I expect because the input has a different format.
My code is something like:
import pandas as pd
import numpy as np
curve = [0.0556, 0.0563]
curve2 = [[0.0159, 0.0178]]
df= pd.DataFrame()
def SUM (curve):
    df['COl1'] = curve
    return df
print(SUM(curve))
PS: curve2 is a row extracted from an array (as a list):
[[ 0.01593353 0.01783041]
[ 0.00917833 0.00593893]
[ 0.00829569 0.02123637]
[-0.03057529 -0.04138836]
[ 0.05212978 0.03239212]]
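A minimal sketch, assuming curve2 always wraps exactly one row: either take the inner list with curve2[0] or flatten it with NumPy. Indexing a 2-D NumPy array with a single integer (arr[0]) returns a 1-D row, whereas a slice such as arr[0:1] keeps the 2-D shape, which is one common way to end up with a nested list like curve2.

```python
import numpy as np
import pandas as pd

curve = [0.0556, 0.0563]
curve2 = [[0.0159, 0.0178]]

# Option 1: take the single inner list
flat_a = curve2[0]
# Option 2: flatten the nested list with NumPy, then convert back to a plain list
flat_b = np.ravel(curve2).tolist()

# Where the nesting usually comes from: slicing keeps two dimensions
arr = np.array([[0.01593353, 0.01783041],
                [0.00917833, 0.00593893]])
row_2d = arr[0:1].tolist()  # nested, same shape as curve2
row_1d = arr[0].tolist()    # flat, same shape as curve

df = pd.DataFrame()
df['COl1'] = flat_b  # now behaves like it does with curve
print(df)
```

Option 1 is the simplest when the nesting is always one level with a single row; np.ravel also copes with a list that already is flat.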

Bokeh MultiSelect plotting in infinite loop, distorting plot

I'm trying to plot multiple lines in a graph based on a user's MultiSelect options. I read in two separate Excel files of data and plot their axes based on the user's request. I'm using Python 3.5 on a Mac.
1) As soon as I make a multi-selection, the figure gets distorted.
2) The plot seems to run in an infinite loop.
3) The plot does not update properly when the user changes the selection. It just adds more plots without removing the previous ones.
from os.path import dirname, join
from pandas import *
import numpy as np
import pandas.io.sql as psql
import sqlite3 as sql
import sys, os
from bokeh.plotting import figure
from bokeh.layouts import layout, widgetbox
from bokeh.models import ColumnDataSource, HoverTool, Div
from bokeh.models.widgets import Slider, Select, TextInput, MultiSelect
from bokeh.io import curdoc
import matplotlib.pyplot as plt
files = list()
path = os.getcwd()
for x in os.listdir(path):
    if x.endswith(".xlsx"):
        if x != 'template.xlsx' :
            files.append(x)
axis_map = {
    "0% void": "0% void",
    "40% void": "40% void",
    "70% void": "70% void",
}
files_list = MultiSelect(title="Files", value=["dummy2.xlsx"],
                         options=open(join(dirname(__file__), 'files.txt')).read().split())
voids = MultiSelect(title="At what void[s]", value=["0% void"], options=sorted(axis_map.keys()))
p = figure(plot_height=600, plot_width=700, title="", toolbar_location=None)
pline = figure(plot_height=600, plot_width=700, title="")
path = os.getcwd()
data_dict = {}
for file in os.listdir(path):
    if file.endswith(".xlsx"):
        xls = ExcelFile(file)
        df = xls.parse(xls.sheet_names[0])
        data = df.to_dict()
        data_dict[file] = data
# converting dictionary to dataframe
newdict = {(k1, k2):v2 for k1,v1 in data_dict.items() \
           for k2,v2 in data_dict[k1].items()}
xxs = DataFrame([newdict[i] for i in sorted(newdict)],
                index=MultiIndex.from_tuples([i for i in sorted(newdict.keys())]))
master_data = xxs.transpose()
def select_data():
    for vals in files_list.value:
        for vox in voids.value:
            pline.line(x=master_data[vals]['Burnup'], y= master_data[vals][vox])
            pline.circle(x=master_data[vals]['Burnup'], y= master_data[vals][vox])
    return
def update():
    select_data()
controls = [ files_list, voids]
for control in controls:
    control.on_change('value', lambda attr, old, new: update())
sizing_mode = 'fixed'  # 'scale_width' also looks nice with this example
inputs = widgetbox(*controls, sizing_mode=sizing_mode)
l = layout([
    [inputs, pline],
], sizing_mode=sizing_mode)
update()
curdoc().add_root(l)
curdoc().title = "Calculations"
I am not 100% certain, since the code above is not self-contained and cannot be run and investigated, but there are some known issues (as of Bokeh 0.12.4) where adding new components to a document is problematic in some situations. These issues are high on the priority list for the next two point releases.
Are the data sizes reasonable such that you could create all the combinations up front? If so, I would recommend doing that, and then having the multi-select values toggle the visibility on/off appropriately. E.g., here's a similar example using a checkbox:
import numpy as np
from bokeh.io import curdoc
from bokeh.layouts import row
from bokeh.palettes import Viridis3
from bokeh.plotting import figure
from bokeh.models import CheckboxGroup
p = figure()
props = dict(line_width=4, line_alpha=0.7)
x = np.linspace(0, 4 * np.pi, 100)
# legend_label= requires Bokeh >= 1.4; the 0.12.x releases discussed here used legend=
l0 = p.line(x, np.sin(x), color=Viridis3[0], legend_label="Line 0", **props)
l1 = p.line(x, 4 * np.cos(x), color=Viridis3[1], legend_label="Line 1", **props)
l2 = p.line(x, np.tan(x), color=Viridis3[2], legend_label="Line 2", **props)
checkbox = CheckboxGroup(labels=["Line 0", "Line 1", "Line 2"], active=[0, 1, 2], width=100)
def update(attr, old, new):
    # only toggle visibility; no renderers are created or removed here
    l0.visible = 0 in checkbox.active
    l1.visible = 1 in checkbox.active
    l2.visible = 2 in checkbox.active
checkbox.on_change('active', update)
layout = row(checkbox, p)
curdoc().add_root(layout)
If the data sizes are not such that you can create all the combinations up front, then I would suggest opening an issue on the project issue tracker (https://github.com/bokeh/bokeh/issues) with complete, minimal, self-contained code that runs as-is and reproduces the problem (i.e. it generates random or synthetic data but is otherwise identical). This is the number one thing that would help the core devs address the issue more promptly.
@bigreddot Thanks for your response.
I edited the code to make it self-contained.
1) The plot does not reset. The newly selected lines are plotted over the previous plot.
2) When the user makes multiple selections (ctrl+shift), the plot axes get distorted and it seems to run in an infinite loop.
from pandas import *
import numpy as np
import sys, os
from bokeh.plotting import figure
from bokeh.layouts import layout, widgetbox
from bokeh.models.widgets import MultiSelect
from bokeh.io import curdoc
from bokeh.plotting import reset_output
import math
axis_map = {
    "y1": "y3",
    "y2": "y2",
    "y3": "y1",
}
x1 = np.linspace(0,20,62)
y1 = [1.26 * math.cos(x) for x in np.linspace(-1,1,62) ]
y2 = [1.26 * math.cos(x) for x in np.linspace(-0.95,.95,62) ]
y3 = [1.26 * math.cos(x) for x in np.linspace(-.9,.90,62) ]
TOOLS = "pan,wheel_zoom,box_zoom,reset,save,hover"
vars = MultiSelect(title="At what void[s]", value=["y1"], options=sorted(axis_map.keys()))
master_data = { 'rate' : x1,
                'y1' : y1,
                'y2' : y2,
                'y3' : y3
              }
p = figure(plot_height=600, plot_width=700, title="", toolbar_location=None)
pline = figure(plot_height=600, plot_width=700, title="", tools=TOOLS)
def select_data():
    for vox in vars.value:
        pline.line(x=master_data['rate'], y= master_data[vox], line_width=2)
        pline.circle(x=master_data['rate'], y=master_data[vox], line_width=2)
    return
controls = [ vars]
for control in controls:
    control.on_change('value', lambda attr, old, new: select_data())
sizing_mode = 'fixed'
inputs = widgetbox(*controls)
l = layout([
    [inputs, pline],
])
select_data()
curdoc().add_root(l)
curdoc().title = "Plot"
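Applying the answer's visibility approach to this self-contained MultiSelect example might look like the sketch below. Every line/marker pair is created once, up front, and the callback only flips visible, so repeated selections no longer stack new glyphs on the plot. It assumes a recent Bokeh (width/height instead of plot_width/plot_height, markers drawn with scatter) and has not been checked against 0.12.x.

```python
import math

import numpy as np
from bokeh.io import curdoc
from bokeh.layouts import row
from bokeh.models import MultiSelect
from bokeh.plotting import figure

# Same synthetic data as the self-contained example above
x1 = np.linspace(0, 20, 62)
master_data = {
    'y1': [1.26 * math.cos(x) for x in np.linspace(-1, 1, 62)],
    'y2': [1.26 * math.cos(x) for x in np.linspace(-0.95, 0.95, 62)],
    'y3': [1.26 * math.cos(x) for x in np.linspace(-0.9, 0.9, 62)],
}

pline = figure(height=600, width=700, title="")
renderers = {}
for name, y in master_data.items():
    # Create each line/marker pair exactly once; the callback never adds glyphs
    line = pline.line(x1, y, line_width=2)
    dots = pline.scatter(x1, y, size=4)
    renderers[name] = (line, dots)

selector = MultiSelect(title="At what void[s]", value=["y1"], options=sorted(master_data))

def update(attr, old, new):
    # Show only the series whose names are currently selected
    for name, glyphs in renderers.items():
        for glyph in glyphs:
            glyph.visible = name in new

selector.on_change('value', update)
update('value', [], selector.value)  # apply the initial selection

curdoc().add_root(row(selector, pline))
curdoc().title = "Plot"
```

Run it with bokeh serve like the original; the callback touches only existing renderers, which sidesteps the problems with adding new components to a live document mentioned in the answer.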
