Create A Scatterplot from Pandas DataFrame - arrays

I'm working on a Pandas DF question and I am having trouble converting some Pandas data into a usable format to create a Scatter Plot.
Here is my code; please let me know what I am doing wrong and how I can correct it going forward. Honest criticism is welcome, as I am a beginner.
# Import Data
df = pd.read_csv(filepath + 'BaltimoreData.csv')
df = df.dropna()
print(df.head(20))
# These are two categories within the data
df.plot(df['Bachelors degree'], df['Median Income'])
# Plotting the Data
df.plot(kind = 'scatter', x = 'Bachelor degree', y = 'Median Income')
df.plot(kind = 'density')

Simply plot y against x as below, where df is your DataFrame and x and y are your independent and dependent variables:
import matplotlib.pyplot as plt
import pandas
plt.scatter(x=df['Bachelors degree'], y=df['Median Income'])
plt.show()

You can also use the scatter plot method built into pandas:
import pandas
import matplotlib.pyplot as plt
plt.style.use('ggplot')
df.plot.scatter(x='Bachelors degree', y='Median Income');
plt.show()
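A self-contained sketch of the pandas approach, with a small made-up DataFrame standing in for BaltimoreData.csv (only the column names come from the question; the numbers are hypothetical). Note that the question's code spells the column both 'Bachelors degree' and 'Bachelor degree'. Column labels must match exactly, otherwise pandas raises a KeyError, so it is worth printing df.columns first.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop it and call plt.show() interactively
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data standing in for BaltimoreData.csv
df = pd.DataFrame({
    "Bachelors degree": [12.5, 30.1, 45.3, 22.8, 60.2],
    "Median Income": [32000, 48000, 71000, 41000, 95000],
})
df = df.dropna()

# Check the exact column labels before plotting
print(df.columns.tolist())

# DataFrame.plot.scatter takes column *names* for x and y, not Series
ax = df.plot.scatter(x="Bachelors degree", y="Median Income")
ax.set_title("Median Income vs. Bachelors degree")
plt.savefig("scatter.png")
```

Unlike plt.scatter, the pandas method labels the axes with the column names automatically.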

Related

2D Array for Heatmap

I am trying to create a 2D array that I will use to plot a heatmap.
The array needs to be n by n, with the highest value at its center and values diminishing with distance from the center.
How could I do that?
You can use NumPy to build the array and matplotlib to draw the heatmap. Something like this:
import numpy as np
import matplotlib.pyplot as plt
# creating array using numpy
array=np.ones((9,9),dtype=int)
array[1:8,1:8]=2
array[2:7,2:7]=3
array[3:6,3:6]=4
array[4,4]=5
print(array)
fig, ax = plt.subplots()
im = ax.imshow(array,cmap="PuBuGn") # cmap can be Greys, YlGnBu, PuBuGn, BuPu etc
# Create colorbar
cbar = ax.figure.colorbar(im, ax=ax,ticks=[1,2,3,4,5])
cbar.ax.set_ylabel("My bar [1-5]", rotation=-90, va="bottom")
ax.set_xticklabels([])
ax.set_yticklabels([])
ax.set_title("My heatmap")
fig.tight_layout()
plt.show()
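The array above uses square rings. If a smooth, circular falloff is wanted instead (an assumption about what the missing diagram showed), one option is a Gaussian of the Euclidean distance from the center. The helper name radial_peak and its sigma parameter are hypothetical; the result can be passed to ax.imshow exactly like array above.

```python
import numpy as np

def radial_peak(n, sigma=None):
    """Return an n x n array that peaks at the center and decays smoothly outward.

    sigma (a hypothetical tuning parameter) controls how fast values fall off;
    by default it is a quarter of the side length.
    """
    if sigma is None:
        sigma = n / 4
    c = (n - 1) / 2                      # center coordinate (works for odd and even n)
    y, x = np.indices((n, n))            # row and column index of every cell
    r2 = (x - c) ** 2 + (y - c) ** 2     # squared distance of each cell to the center
    return np.exp(-r2 / (2 * sigma ** 2))

arr = radial_peak(9)
print(arr.round(2))
```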
To create the array automatically for any size, use a loop:
import numpy as np
import matplotlib.pyplot as plt
lim=100
arr=np.ones((lim,lim),dtype=int)
# each pass fills a smaller centered square with the next higher value
for i in range(1,lim):
    arr[i:len(arr)-i,i:len(arr)-i]=i+1
fig, ax = plt.subplots()
im = ax.imshow(arr,cmap="Purples") # cmap can be Greys, YlGnBu, PuBuGn, BuPu etc
# Create colorbar
cbar = ax.figure.colorbar(im, ax=ax,ticks=list(range(1,lim,5)))
cbar.ax.set_ylabel("My bar [1-50]", rotation=-90, va="bottom")
# hide the tick labels
ax.set_xticklabels([])
ax.set_yticklabels([])
ax.set_title("My heatmap")
fig.tight_layout()
plt.show()
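The loop runs lim - 1 times even though every pass past the middle works on an empty slice. A loop-free sketch (the helper name square_rings is hypothetical) computes each cell's ring number directly as one plus its distance to the nearest edge, and produces the same array as both answers above.

```python
import numpy as np

def square_rings(n):
    """Concentric square rings: 1 on the border, increasing by 1 toward the center."""
    idx = np.arange(n)
    # distance of each row/column index from the nearest edge
    edge_dist = np.minimum(idx, idx[::-1])
    # a cell's ring is set by whichever edge (row-wise or column-wise) is closest
    return np.minimum.outer(edge_dist, edge_dist) + 1

print(square_rings(9))
```

For n = 100 the maximum is 50, which matches the "[1-50]" colorbar label above.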

How do I save multiple charts as separate .png files?

My goal is to save each plot after one pass of the for loop. Here is what I've tried, but the saved images show the plots overlapping. Below is my code:
import matplotlib.pyplot as plt
for i in range(10):
    y = arr_df[i].flatten()
    plt.title('Nitrat Graph')
    plt.xlabel('X Axis')
    plt.ylabel('Y Axis')
    plt.plot(x_axis, y, color='blue')
    plt.savefig(f'NO_{i}.png')
As long as y is a one-dimensional array, all you need to do is add plt.show() inside the loop so that each plot is displayed after its iteration. Here is an example that replaces your data with random data:
import matplotlib.pyplot as plt
import numpy as np
for i in range(10):
    y = np.random.rand(10,1)
    plt.title('Nitrat Graph')
    plt.xlabel('X Axis')
    plt.ylabel('Y Axis')
    plt.plot(y, color='blue')
    plt.savefig(f'NO_{i}.png')
    plt.show()
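Whether plt.show() actually starts a fresh figure depends on the backend; with a file-only backend such as Agg it does not close anything, and the lines keep piling up. A backend-independent sketch creates a new figure for every iteration and closes it after saving. The output directory and the random x_axis/y values are stand-ins for the question's data.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # file-only backend; nothing is displayed
import matplotlib.pyplot as plt
import numpy as np

out_dir = tempfile.mkdtemp()  # hypothetical output location for the sketch
x_axis = np.arange(10)        # stand-in for the question's x_axis

for i in range(10):
    y = np.random.rand(10)    # stand-in for arr_df[i].flatten()
    fig, ax = plt.subplots()  # a brand-new figure for this iteration only
    ax.plot(x_axis, y, color="blue")
    ax.set_title("Nitrat Graph")
    ax.set_xlabel("X Axis")
    ax.set_ylabel("Y Axis")
    fig.savefig(os.path.join(out_dir, f"NO_{i}.png"))
    plt.close(fig)            # release the figure so the next plot starts empty

print(sorted(os.listdir(out_dir)))
```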

Updating matplotlib figures within a for loop in python

I am trying to move a magnetic object and update its magnetic field, which is then plotted in a for loop. I want to update the matplotlib figure so that the previous plot is deleted before the latest one is shown (with a delay, as can be seen in the code). I want this to happen in one window only (I might be using the wrong technical term). Currently, a new figure is created every time the magnetic field is updated. I tried using plt.cla(), plt.close(), and plt.clf() without success. The code is given below. Any help will be much appreciated.
import matplotlib.pyplot as plt
import numpy as np
from time import sleep
from magpylib.source.magnet import Box,Cylinder
from magpylib import Collection, displaySystem
# create magnets
s1 = Box(mag=(0,0,600), dim=(3,3,3), pos=(-4,0,20))
# calculate the grid
xs = np.linspace(-15,10,33)
zs = np.linspace(-5,25,44)
POS = np.array([(x,0,z) for z in zs for x in xs])
X,Z = np.meshgrid(xs,zs)
for i in range(20):
    Bs = s1.getB(POS).reshape(44,33,3) #B-field
    s1.move((0,0,-1))
    # create figure
    fig = plt.figure(figsize=(5,9))
    # display field in xz-plane using matplotlib
    U,V = Bs[:,:,0], Bs[:,:,2]
    plt.streamplot(X, Z, U, V, color=np.log(U**2+V**2))
    plt.show()
    sleep(0.2)
You want to use matplotlib's interactive mode by calling plt.ion(), and to clear the axes after every frame in the loop using plt.cla():
import matplotlib.pyplot as plt
import numpy as np
from magpylib.source.magnet import Box,Cylinder
from magpylib import Collection, displaySystem
fig, ax = plt.subplots()
# create magnets
s1 = Box(mag=(0,0,600), dim=(3,3,3), pos=(-4,0,20))
# calculate the grid
xs = np.linspace(-15,10,33)
zs = np.linspace(-5,25,44)
POS = np.array([(x,0,z) for z in zs for x in xs])
X,Z = np.meshgrid(xs,zs)
plt.ion()
plt.show()
img=0
for i in range(20):
    Bs = s1.getB(POS).reshape(44,33,3) #B-field
    s1.move((0,0,-1))
    U,V = Bs[:,:,0], Bs[:,:,2]
    ax.streamplot(X, Z, U, V, color=np.log(U**2+V**2))
    plt.gcf().canvas.draw()
    plt.savefig('{}'.format(img))
    plt.pause(0.01)
    # clear the axes only; plt.clf() would also delete ax, which the loop keeps drawing into
    plt.cla()
    img=img+1
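The key detail is to reuse one figure and one Axes, and to clear the Axes (ax.cla()) rather than the whole figure (plt.clf()), because clf() also removes ax while the loop keeps drawing into it. So that the sketch runs without magpylib, the field below is a made-up rotational field around a point that moves down by 1 per frame, mimicking s1.move((0,0,-1)); it is not a magnetic field. Frames go to a temporary directory.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless for the sketch; use an interactive backend plus plt.ion() to watch it live
import matplotlib.pyplot as plt
import numpy as np

out_dir = tempfile.mkdtemp()            # hypothetical output directory
xs = np.linspace(-15, 10, 33)
zs = np.linspace(-5, 25, 44)
X, Z = np.meshgrid(xs, zs)

fig, ax = plt.subplots(figsize=(5, 9))  # one figure, reused for every frame
for i in range(20):
    z0 = 20 - i                          # the source moves down by 1 per frame
    # Hypothetical stand-in field: circulation around the point (-4, z0)
    U = -(Z - z0)
    V = X + 4
    ax.cla()                             # clear only the axes; fig and ax stay alive
    ax.streamplot(X, Z, U, V, color=np.log(U**2 + V**2 + 1e-12))
    ax.set_title(f"frame {i}")
    fig.savefig(os.path.join(out_dir, f"{i}.png"))
    # in an interactive session (after plt.ion()), call plt.pause(0.2) here to show the frame
```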

How to create pandas dataframe from array([[[135, 2270.24]]], dtype=object)

I am stuck on this. I would like to transform this array into a pandas DataFrame with one column, called "feature" for example, holding a single value, [135, 2270.24]:
array([[[135, 2270.24]]], dtype=object)
I tried the following, with C being the array, but it raises ValueError: Must pass 2-d input
df = pd.DataFrame(C, columns = ['feature'])
I'm not entirely sure I follow exactly what you're asking for, but if my interpretation is correct, you're looking for something like this:
import pandas as pd
import numpy as np
# setup
val = np.array([[[135, 2270.24]]])
# logic
data = [{'feature': val[0][0]}]
df = pd.DataFrame(data)
Output df:
            feature
0  [135.0, 2270.24]
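The error in the question comes from C having three dimensions (shape (1, 1, 2)), while pd.DataFrame accepts at most two. A sketch of two ways around it, starting from the question's exact object array: keep the pair together in one "feature" cell, or reshape so each number gets its own column (the column names "a" and "b" are hypothetical).

```python
import numpy as np
import pandas as pd

C = np.array([[[135, 2270.24]]], dtype=object)  # the array from the question

# Option 1: one column "feature" whose single cell holds the pair as a list
df = pd.DataFrame({"feature": [C[0, 0].tolist()]})
print(df)

# Option 2: reshape to 2-D (one row per pair), one column per value
df2 = pd.DataFrame(C.reshape(-1, 2), columns=["a", "b"])
print(df2)
```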

matplotlib.pyplot errorbar ValueError depends on array length?

Good afternoon.
I've been struggling with this for a while now, and although I can find similar problems online, nothing I found has helped me resolve it.
Starting with a standard data file (.csv or .txt, I tried both) containing three columns (x, y and the error of y), I want to read in the data and generate a line plot including error bars.
I can plot the x and y values without a problem, but if I want to add errorbars using the matplotlib.pyplot errorbar utility, I get the following error message:
ValueError: yerr must be a scalar, the same dimensions as y, or 2xN.
The code below works if I use some arbitrary arrays (numpy or plain python), but not for data read from the file. I've tried converting the tuples which I obtain from my input code to numpy arrays using asarray, but to no avail.
import numpy as np
import matplotlib.pyplot as plt
row = []
with open("data.csv") as data:
    for line in data:
        row.append(line.split(','))
column = zip(*row)
x = column[0]
y = column[1]
yer = column[2]
plt.figure()
plt.errorbar(x,y,yerr = yer)
fig = plt.gcf()
fig.set_size_inches(18.5, 10.5)
fig.savefig('example.png', dpi=300)
It must be that I am overlooking something. I would be very grateful for any thoughts on the matter.
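yerr is the amount added to and subtracted from each y value. Here the third column is taken as the full height of the error bar, so the upper and lower errors are each half of it. Reading the file with np.loadtxt also gives numeric arrays instead of strings.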
import numpy as np
import matplotlib.pyplot as plt
data = np.loadtxt('data.csv', delimiter=',')
plt.figure()
yerr_ = np.tile(data[:, 2]/2, (2, 1))
plt.errorbar(data[:, 0], data[:, 1], yerr=yerr_)
plt.xlim([-1, 3])
plt.show()
Contents of data.csv:
0,2,0.3
1,4,0.4
2,3,0.15
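One likely cause of the error in the question is that line.split(',') leaves every value as a string (the last one with a trailing newline), so yerr is a sequence of strings rather than numbers; in Python 3, zip(*row) also returns an iterator, so column[0] fails unless it is wrapped in list(). A sketch that keeps the question's read-then-split approach but converts each field to float, using the sample data above inlined via io.StringIO (for a real file, use `with open("data.csv", newline="") as f:` and pass f to csv.reader instead):

```python
import csv
import io

import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np

# The sample data.csv from the answer, inlined so the sketch is self-contained
raw = "0,2,0.3\n1,4,0.4\n2,3,0.15\n"

rows = []
for fields in csv.reader(io.StringIO(raw)):
    rows.append([float(v) for v in fields])  # convert the text fields to numbers

data = np.array(rows)                  # shape (N, 3): x, y, error
x, y, err = data[:, 0], data[:, 1], data[:, 2]

fig, ax = plt.subplots()
# A 1-D yerr of length N gives symmetric bars, equivalent to np.tile(err / 2, (2, 1))
container = ax.errorbar(x, y, yerr=err / 2)
fig.savefig("example.png")
```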
