JGraphT exporting a weighted graph and importing it - jgrapht

I'm looking for a method to export my graph with the weighted edges.
I have a simple directed graph with weighted edges.
SimpleDirectedWeightedGraph<Integer, DefaultWeightedEdge> exGraph =
new SimpleDirectedWeightedGraph<>(vSupplier, SupplierUtil.createDefaultWeightedEdgeSupplier());
This is my graph. I generate some vertices to it and edges.
I tried dot exporter:
DOTExporter<Integer, DefaultWeightedEdge> dotExporter = new DOTExporter<>();
Writer writer = new StringWriter();
FileWriter fw = new FileWriter("ex2.dot");
But with that I cannot export the weights.
Can somebody show me how to do the export first and after that the import?

Different graph formats support edge weights in different ways. Personally, I find the DIMACS graph format the easiest format to export a weighted graph. This graph format simply writes a graph to a file as:
<edge_source> <edge_target> <edge_weight>
Here's a complete example to export/import a weighted graph using DIMACS:
Graph<Integer, DefaultWeightedEdge> graph=new SimpleWeightedGraph<>(DefaultWeightedEdge.class);
Graphs.addEdge(graph,1,2, 10);
Graphs.addEdge(graph,2,3, 11);
Graphs.addEdge(graph,3,4, 12);
Graphs.addEdge(graph,4,1, 13);
//Example of exporting a weighted graph in DIMACS format
DIMACSExporter<Integer, DefaultWeightedEdge> dimacsExporter=new DIMACSExporter<>();
//Enable exporting of weights in the DIMACS exporter
dimacsExporter.setParameter(DIMACSExporter.Parameter.EXPORT_EDGE_WEIGHTS, true);
//Export the graph
dimacsExporter.exportGraph(graph, new File("DIMACSgraph.txt"));
//Now import the graph
Graph<Integer, DefaultWeightedEdge> importGraph = new SimpleWeightedGraph<>(SupplierUtil.createIntegerSupplier(), SupplierUtil.createDefaultWeightedEdgeSupplier());
DIMACSImporter<Integer, DefaultWeightedEdge> dimacsImporter=new DIMACSImporter<>();
dimacsImporter.importGraph(importGraph, new File("DIMACSgraph.txt"));
System.out.println("Imported DIMACS graph: "+importGraph);
System.out.println("Edge weights: ");
for(DefaultWeightedEdge edge : importGraph.edgeSet())
System.out.println("Edge: "+edge+" weight: "+importGraph.getEdgeWeight(edge));
The file created during the export contains the following:
c SOURCE: Generated using the JGraphT library
p edge 4 4
e 1 2 10.0
e 2 3 11.0
e 3 4 12.0
e 4 1 13.0
The output of the above code:
Imported DIMACS graph: ([0, 1, 2, 3], [{0,1}, {1,2}, {2,3}, {3,0}])
Edge weights:
Edge: (0 : 1) weight: 10.0
Edge: (1 : 2) weight: 11.0
Edge: (2 : 3) weight: 12.0
Edge: (3 : 0) weight: 13.0
Obviously, to import a weighted graph, the graph to which you are importing must be of a weighted type, i.e. graph.getType().isWeighted() must return true.
You could also use the DOT file format as you were originally doing. Using this format, you would have to export the edge weight as an edge attribute. See their DOT file format documentation for details.


How to forecast unknown future target values with gluonts DeepAR?

I have a time series from 1995-01-01 to 2021-10-01. Monthly frequency data. How to forecast values for the future (next 3 months): 2021-11-01 to 2022-01-01? Note that I don't have the target values for 2021-11-01, 2021-12-01 and 2022-01-01.
Many thanks!
from gluonts.model.deepar import DeepAREstimator
from gluonts.mx import Trainer
import numpy as np
import mxnet as mx
estimator = DeepAREstimator(
    prediction_length=12,  # matches the 12 values returned below
    context_length=120,
    freq='M',
    trainer=Trainer(
        learning_rate=1e-03,
        num_batches_per_epoch=50))
predictor = estimator.train(training_data=df_train)
# Forecasting
predictions = predictor.predict(df_test)
predictions = list(predictions)[0]
predictions = predictions.quantile(0.5)
[163842.34 152805.08 161326.3 176823.97 127003.79 126937.78
139575.2 117121.67 115754.67 139211.28 122623.586 120102.65 ]
As I understand it, the predicted values are not for "2021-11-01", "2021-12-01" and "2022-01-01". How do I know which months these values refer to? How do I forecast values for the next 3 months: "2021-11-01", "2021-12-01" and "2022-01-01"?
Take a look at this code. It comes from "Advanced Forecasting with Python".
It does not seem to forecast unknown future values, since it compares the last 28 values of test_ds (Listing 20-5. R2 score and prediction graph) with the predictions made over this same dataset test_ds (Listing 20-4. Prediction).
How do I forecast unknown future values?
Many thanks!
Data source
# Listing 20-1. Importing the data
import pandas as pd
y = pd.read_csv('air_visit_data.csv.zip')
y = y.pivot(index='visit_date', columns='air_store_id')['visitors']
y = y.fillna(0)
y = pd.DataFrame(y.sum(axis=1))
y = y.reset_index(drop=False)
y.columns = ['date', 'y']
# Listing 20-2. Preparing the data format required by the gluonts library
from gluonts.dataset.common import ListDataset
start = pd.Timestamp("01-01-2016", freq="H")
# train dataset: cut the last window of length "prediction_length", add "target" and "start" fields
train_ds = ListDataset([{'target': y.loc[:450,'y'], 'start': start}], freq='H')
# test dataset: use the whole dataset, add "target" and "start" fields
test_ds = ListDataset([{'target': y['y'], 'start': start}],freq='H')
# Listing 20-3. Fitting the default DeepAR model
from gluonts.model.deepar import DeepAREstimator
from gluonts.trainer import Trainer
import mxnet as mx
import numpy as np
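estimator = DeepAREstimator(
    freq="H",              # same frequency as the ListDataset above
    prediction_length=28,  # Listing 20-5 compares against the last 28 values
    trainer=Trainer(ctx="gpu"))  # remove ctx if running on windows
predictor = estimator.train(train_ds)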
# Listing 20-4. Prediction
predictions = predictor.predict(test_ds)
predictions = list(predictions)[0]
predictions = predictions.quantile(0.5)
# Listing 20-5. R2 score and prediction graph
from sklearn.metrics import r2_score
print(r2_score( list(test_ds)[0]['target'][-28:], predictions))
import matplotlib.pyplot as plt
plt.plot(predictions)
plt.plot(list(test_ds)[0]['target'][-28:])
plt.legend(['predictions', 'actuals'])
plt.show()
In your case the context length is 120 and the prediction length is 12, so the model looks back over 120 data points to predict 12 future data points.
My recommendation is to reduce the context length, perhaps to 10, and to include the data from the past 10 months in the df_test table.
The forecast object records the date at which the forecast starts. Based on that date, create a future table of 12 dates (12 being the prediction length).
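As a minimal sketch of the "future table" step, using pandas only: it assumes the series passed to predict ends at 2021-10-01 and that the monthly timestamps fall on the first of the month (pandas frequency "MS"). The helper name future_index is hypothetical. DeepAR starts forecasting at the step right after the last target value it is given, so passing the full history up to 2021-10-01 makes the first three forecast values line up with 2021-11-01, 2021-12-01 and 2022-01-01. With a real forecast object you would take the start from the forecast itself (I believe it is exposed as start_date) rather than from the last observed date.

```python
import pandas as pd

def future_index(last_observed, periods, freq="MS"):
    """Return the `periods` timestamps that follow `last_observed` at `freq`."""
    step = pd.tseries.frequencies.to_offset(freq)
    start = pd.Timestamp(last_observed) + step  # first unseen time step
    return pd.date_range(start=start, periods=periods, freq=freq)

# 12 dates because prediction_length is 12; the first 3 are the months asked about.
dates = future_index("2021-10-01", periods=12)
print(dates[:3])
```

The median forecast values can then be paired with these dates, for example with pd.Series(values, index=dates).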

baseline fitting using Numpy poly1d

I have the following baseline:
As can be seen, it has an almost sinusoidal shape. I am trying to use polyfit on it. What I actually have are two arrays of data, one called x and the other y. So what I am using is:
porder = 2
coefs = np.polyfit(x, y, porder)
baseline = np.poly1d(coefs)
cleanspec = y - baseline(x)
My goal is to obtain a clean spectrum in the end, one with a straight baseline and no undulation.
However, the fit is not working. Any suggestions for another, more effective method?
I have tried changing porder to 3, but I get this warning, and it doesn't change anything:
Polyfit may be poorly conditioned
My data for x:
[1.10192816e+11 1.10192893e+11 1.10192969e+11 1.10193045e+11
1.10193122e+11 1.10193198e+11 1.10193274e+11 1.10193350e+11
1.10193427e+11 1.10193503e+11 1.10193579e+11 1.10193656e+11
1.10193732e+11 1.10193808e+11 1.10193885e+11 1.10193961e+11
1.10194037e+11 1.10194113e+11 1.10194190e+11 1.10194266e+11
1.10194342e+11 1.10194419e+11 1.10194495e+11 1.10194571e+11
1.10194647e+11 1.10194724e+11 1.10194800e+11 1.10194876e+11
1.10194953e+11 1.10195029e+11 1.10195105e+11 1.10195182e+11
1.10195258e+11 1.10195334e+11 1.10195410e+11 1.10195487e+11
1.10195563e+11 1.10195639e+11 1.10195716e+11 1.10195792e+11
1.10195868e+11 1.10195944e+11 1.10196021e+11 1.10196097e+11
1.10196173e+11 1.10196250e+11 1.10196326e+11 1.10196402e+11
1.10196479e+11 1.10196555e+11 1.10196631e+11 1.10196707e+11
1.10196784e+11 1.10196860e+11 1.10196936e+11 1.10197013e+11
1.10197089e+11 1.10197165e+11 1.10197241e+11 1.10197318e+11
1.10197394e+11 1.10197470e+11 1.10197547e+11 1.10197623e+11
1.10197699e+11 1.10197776e+11 1.10197852e+11 1.10197928e+11
1.10198004e+11 1.10198081e+11 1.10198157e+11 1.10198233e+11
1.10198310e+11 1.10198386e+11 1.10198462e+11 1.10198538e+11
1.10198615e+11 1.10198691e+11 1.10198767e+11 1.10198844e+11
1.10198920e+11 1.10198996e+11 1.10199073e+11 1.10199149e+11
1.10199225e+11 1.10199301e+11 1.10199378e+11 1.10199454e+11
1.10199530e+11 1.10199607e+11 1.10199683e+11 1.10199759e+11
1.10199835e+11 1.10199912e+11 1.10199988e+11 1.10200064e+11
1.10200141e+11 1.10202582e+11 1.10202658e+11 1.10202735e+11
1.10202811e+11 1.10202887e+11 1.10202963e+11 1.10203040e+11
1.10203116e+11 1.10203192e+11 1.10203269e+11 1.10203345e+11
1.10203421e+11 1.10203498e+11 1.10203574e+11 1.10203650e+11
1.10203726e+11 1.10203803e+11 1.10203879e+11 1.10203955e+11
1.10204032e+11 1.10204108e+11 1.10204184e+11 1.10204260e+11
1.10204337e+11 1.10204413e+11 1.10204489e+11 1.10204566e+11
1.10204642e+11 1.10204718e+11 1.10204795e+11 1.10204871e+11
1.10204947e+11 1.10205023e+11 1.10205100e+11 1.10205176e+11
1.10205252e+11 1.10205329e+11 1.10205405e+11 1.10205481e+11
1.10205557e+11 1.10205634e+11 1.10205710e+11 1.10205786e+11
1.10205863e+11 1.10205939e+11 1.10206015e+11 1.10206092e+11
1.10206168e+11 1.10206244e+11 1.10206320e+11 1.10206397e+11
1.10206473e+11 1.10206549e+11 1.10206626e+11 1.10206702e+11
1.10206778e+11 1.10206854e+11 1.10206931e+11 1.10207007e+11
1.10207083e+11 1.10207160e+11 1.10207236e+11 1.10207312e+11
1.10207389e+11 1.10207465e+11 1.10207541e+11 1.10207617e+11
1.10207694e+11 1.10207770e+11 1.10207846e+11 1.10207923e+11
1.10207999e+11 1.10208075e+11 1.10208151e+11 1.10208228e+11
1.10208304e+11 1.10208380e+11 1.10208457e+11 1.10208533e+11
1.10208609e+11 1.10208686e+11 1.10208762e+11 1.10208838e+11
1.10208914e+11 1.10208991e+11 1.10209067e+11 1.10209143e+11
1.10209220e+11 1.10209296e+11 1.10209372e+11 1.10209448e+11
1.10209525e+11 1.10209601e+11 1.10209677e+11 1.10209754e+11
and for y:
[ 0.00143858 0.05495827 0.07481739 0.03287334 -0.06275658 0.03744501
-0.04392341 0.02849104 0.03173781 0.09748282 0.02854265 0.06573162
0.08215295 0.0240697 0.00931477 0.17572605 0.06783381 0.04853354
-0.00226023 0.03722596 0.09687121 0.10767829 0.04922701 0.08036865
0.02371989 0.13885361 0.13903188 0.09910567 0.08793601 0.06048823
0.03932097 0.04061129 0.03706228 0.13764936 0.14150589 0.12226208
0.09041878 0.13638676 0.11107155 0.12261369 0.11765545 0.07425344
0.06643712 0.1449991 0.14256909 0.0924173 0.09291525 0.12216271
0.11272059 0.07618891 0.16787807 0.07832849 0.10786856 0.12381844
0.14182937 0.08078092 0.11932429 0.06383649 0.02923562 0.0864741
0.07806758 0.04514088 0.12929371 0.11769577 0.03619867 0.02811366
0.06401639 0.06883735 0.01162673 0.0956252 0.11206549 0.0485106
0.07269545 0.01662149 0.01287365 0.13401546 0.06300487 0.01994627
0.00721926 0.04863274 -0.01578364 0.0235379 0.03102316 0.00392559
0.05662182 0.04643381 -0.00665026 0.05532307 -0.01533339 0.04838893
0.02097954 0.02551123 0.03727188 -0.04001189 -0.04294883 0.02837669
-0.06062512 -0.0743994 -0.04665618 -0.03553261 -0.07057554 -0.07028277
-0.07502298 -0.07247965 -0.03540266 -0.03226398 -0.08014487 -0.11907543
-0.18521053 -0.1117617 -0.14377897 -0.07113503 -0.02480966 -0.07459746
-0.07994097 -0.02648713 -0.10288478 -0.13328137 -0.08121377 -0.13742166
-0.024583 -0.11391389 -0.02717251 -0.08876166 -0.04369363 -0.0790144
-0.09589054 -0.12058701 0.00041344 -0.06646403 -0.06368366 -0.10335613
-0.04508286 -0.18360729 -0.0551775 -0.06476622 -0.0834523 -0.01276785
-0.04145486 -0.14549992 -0.11186823 -0.07663398 -0.11920359 -0.0539315
-0.10507118 -0.09112374 -0.09751319 -0.06848278 -0.09031172 -0.07218853
-0.03129234 -0.04543539 -0.00942861 -0.06711099 -0.00712202 -0.11696418
-0.06344093 0.03624227 -0.04798777 0.01174394 -0.08326314 -0.06761215
-0.12063419 -0.05236908 -0.03914692 -0.05370061 -0.01620056 0.06731788
-0.06600111 -0.04601257 -0.02144361 0.00256863 -0.00093034 0.00629604
-0.0252835 -0.00907992 0.03583489 -0.03761906 0.10325763 0.08016437
-0.04900467 0.0110328 0.05019604 -0.04428984 -0.03208058 0.05095359
-0.01807463 0.0691733 0.07472691 0.00659871 0.00947692 0.0014422
Having this huge offset in x is probably not helping. It definitely works when the offset is removed for the fitting process. It looks like this:
import matplotlib.pyplot as plt
import numpy as np
scaledx = xdata * 1e-8 - 1100
coefs = np.polyfit( scaledx, ydata, 7)
base = np.poly1d( coefs )
xt = np.linspace( 1.9,2.1,150)
yt = base( xt )
fig = plt.figure()
ax = fig.add_subplot( 2, 1, 1 )
bx = fig.add_subplot( 2, 1, 2 )
ax.scatter( scaledx , ydata )
ax.plot( xt , yt )
bx.plot( scaledx , ydata - base( scaledx ) )
with xdata and ydata being numpy arrays of the OP data lists.
Concerning the "poorly conditioned" warning, one should remember how simple linear least squares works. In the case of a polynomial, one builds the matrix:
A = [
[1, x1, x1**2, ...],
[1, x2, x2**2, ...],
...
[1, xn, xn**2, ...]
]
and one needs B^(-1), the inverse of B, with B = AT.A and AT being the transpose of A. Now, with x values of the order of 1e11, B will have entries of order 1 at one end of the diagonal and, for a second-order polynomial, of order 1e44 at the other. For a third-order polynomial this gets worse accordingly. Computing the inverse therefore becomes numerically unstable. Luckily, and as used above, this can be solved easily by simply rescaling the problem at hand.
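To make the conditioning argument concrete, here is a small sketch that compares the condition number of B = AT.A for raw and rescaled x values. The x array is a stand-in (same range as the question's data, but evenly spaced, which is an assumption), and normal_matrix_condition is a hypothetical helper name. Centring and scaling by the standard deviation is shown as a common alternative to the fixed shift used above.

```python
import numpy as np

def normal_matrix_condition(x, order):
    """Condition number of B = A^T A for a polynomial fit of the given order."""
    A = np.vander(x, order + 1, increasing=True)  # columns 1, x, x**2, ...
    return np.linalg.cond(A.T @ A)

# Stand-in for the question's x values: same range, evenly spaced (an assumption).
x = np.linspace(1.10192816e11, 1.10209754e11, 192)

raw = normal_matrix_condition(x, 2)
shifted = normal_matrix_condition(x * 1e-8 - 1100, 2)           # rescaling used above
centred = normal_matrix_condition((x - x.mean()) / x.std(), 2)  # a common alternative

print(f"raw: {raw:.3g}  shifted: {shifted:.3g}  centred: {centred:.3g}")
```

The smaller the condition number, the less rounding error is amplified when the system is solved, which is why the fit behaves once x is rescaled.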

How to have the xlim with seaborn automatically adjust based on dataframe date range

I am trying to loop through plots. Each "station" is a pandas DataFrame that holds a single water year of data (Oct 1 to Sept 29). The data is read in with this code:
sh_784_2020 = pd.read_csv("sh_784_WY2020.csv", parse_dates=['Date'])
sh_784_2020.columns = ["Index", "Date", "Temp_C","Precip_mm","SnowDepth_cm","SWE_mm","SM2","SM8","SM20"]
My plots loop through, but the x-axis always runs from the year 2000 to the current date, whereas my data is from 2006-2020. Is there a way to have the xlim adjust automatically to the date range in the data frame? Or is there a way to create this plot in matplotlib and not seaborn?
for station in stations:
    station['Density'] = station['SWE_mm']/(station['SnowDepth_cm']*10)*100
    station['Density range'] = pd.cut( station['Density'], [-np.inf, 25, 30, 35, 40, np.inf])
    Date = station.loc[:, 'Date'].values
    SWE_mm = station.loc[:, 'SWE_mm'].values
    Density = station.loc[:, 'Density'].values
    sns.scatterplot(station['Date'], station['SWE_mm'], hue='Density range', data= station, edgecolor = 'none', palette=['grey', 'green', 'gold', 'orange', 'crimson'], alpha= 1)
    plt.xlim()
Plot example 1
Plot example 2
If you upgrade to seaborn 0.11 you should find that the default autoscaling works better, but you can get a good result without upgrading by creating the Axes object before plotting and setting the units, e.g. something like
ax = plt.figure().subplots()
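Since the question also asks about plain matplotlib, here is a minimal sketch of that route, in which the x-limits are set explicitly from the minimum and maximum of the Date column. The station DataFrame below is made-up stand-in data for one water year, and the Agg backend is selected only so the sketch runs without a display. The same set_xlim call should also work on an Axes that seaborn has drawn into.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for one station's water year (not the question's data).
station = pd.DataFrame({"Date": pd.date_range("2019-10-01", "2020-09-29", freq="D")})
station["SWE_mm"] = range(len(station))

fig, ax = plt.subplots()
ax.scatter(station["Date"], station["SWE_mm"], s=4)

# Pin the x-axis to the date range actually present in this DataFrame.
first, last = station["Date"].min(), station["Date"].max()
ax.set_xlim(first, last)
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
fig.savefig("station_swe.png")
```

Inside the loop over stations, creating a new figure per station and computing first and last from that station's own Date column keeps each plot limited to its own water year.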

Increase speed creation for masked xarray file

I am currently trying to crop a rectangular xarray file to the shape of a country using a mask grid. Below you can find my current solution (with simpler and smaller arrays). The code works and I get the desired mask of 1s and 0s. The problem is that, when run on a real country shape (larger and more complex), the code takes over 30 minutes. Since I am using very basic operations here, such as nested for loops, I also tried alternatives such as a list-based approach. However, when I timed it, it did not improve on the code below. I wonder if there is a faster way to obtain this mask (vectorization?) or if I should approach the problem in a different way (I tried exploring xarray's properties, but have not found anything that tackles this issue yet).
Code below:
import geopandas as gpd
from shapely.geometry import Polygon, Point
import pandas as pd
import numpy as np
import xarray as xr
df = pd.read_csv('Brazil_borders.csv',index_col=0)
lats = np.array([-20, -5, -5, -20,])
lons = np.array([-60, -60, -30, -30])
lats2 = np.array([-10.25, -10.75, -11.25, -11.75, -12.25, -12.75, -13.25, -13.75,
-14.25, -14.75, -15.25, -15.75, -16.25, -16.75, -17.25, -17.75,
-18.25, -18.75, -19.25, -19.75, -20.25, -20.75, -21.25, -21.75,
-22.25, -22.75, -23.25, -23.75, -24.25, -24.75, -25.25, -25.75,
-26.25, -26.75, -27.25, -27.75, -28.25, -28.75, -29.25, -29.75,
-30.25, -30.75, -31.25, -31.75, -32.25, -32.75])
lons2 = np.array([-61.75, -61.25, -60.75, -60.25, -59.75, -59.25, -58.75, -58.25,
-57.75, -57.25, -56.75, -56.25, -55.75, -55.25, -54.75, -54.25,
-53.75, -53.25, -52.75, -52.25, -51.75, -51.25, -50.75, -50.25,
-49.75, -49.25, -48.75, -48.25, -47.75, -47.25, -46.75, -46.25,
-45.75, -45.25, -44.75, -44.25])
points = []
for i in range(len(lats)):
    _= [lats[i],lons[i]]
    points.append(_)
poly_proj = Polygon(points)
mask = np.zeros((len(lats2),len(lons2))) # Mask with the dataset's shape and size.
for i in range(len(lats2)): # Iteration to verify if a given coordinate is within the polygon's area
    for j in range(len(lons2)):
        grid_point = Point(lats2[i], lons2[j])
        if grid_point.within(poly_proj):
            mask[i][j] = 1
bool_final = mask
The alternative based on list approach, but with even worse processing time (according to timeit):
lats = np.array([-20, -5, -5, -20,])
lons = np.array([-60, -60, -30, -30])
lats2 = np.array([-10.25, -10.75, -11.25, -11.75, -12.25, -12.75, -13.25, -13.75,
-14.25, -14.75, -15.25, -15.75, -16.25, -16.75, -17.25, -17.75,
-18.25, -18.75, -19.25, -19.75, -20.25, -20.75, -21.25, -21.75,
-22.25, -22.75, -23.25, -23.75, -24.25, -24.75, -25.25, -25.75,
-26.25, -26.75, -27.25, -27.75, -28.25, -28.75, -29.25, -29.75,
-30.25, -30.75, -31.25, -31.75, -32.25, -32.75])
lons2 = np.array([-61.75, -61.25, -60.75, -60.25, -59.75, -59.25, -58.75, -58.25,
-57.75, -57.25, -56.75, -56.25, -55.75, -55.25, -54.75, -54.25,
-53.75, -53.25, -52.75, -52.25, -51.75, -51.25, -50.75, -50.25,
-49.75, -49.25, -48.75, -48.25, -47.75, -47.25, -46.75, -46.25,
-45.75, -45.25, -44.75, -44.25])
points = []
for i in range(len(lats)):
    _= [lats[i],lons[i]]
    points.append(_)
poly_proj = Polygon(points)
grid_point = [Point(lats2[i],lons2[j]) for i in range(len(lats2)) for j in range(len(lons2))]
mask = [1 if grid_point[i].within(poly_proj) else 0 for i in range(len(grid_point))]
bool_final2 = np.reshape(mask,(((len(lats2)),(len(lons2)))))
Thank you in advance!
Based on this answer from snowman2, I created this simple function that provides a much faster solution by using geopandas and rioxarray. Instead of using a list of latitudes and longitudes, one has to use a shapefile with the desired shape to be masked (Instructions for GeoDataFrame creation from list of coordinates).
import xarray as xr
import geopandas as gpd
import rioxarray
from shapely.geometry import mapping
def mask_shape_border (DS,shape_shp): #Inputs are the dataset to be cropped and the address of the mask file (.shp )
    if 'lat' in DS: #Some datasets use lat/lon, others latitude/longitude
        DS.rio.set_spatial_dims(x_dim="lon", y_dim="lat", inplace=True)
    elif 'latitude' in DS:
        DS.rio.set_spatial_dims(x_dim="longitude", y_dim="latitude", inplace=True)
    else:
        print("Error: check latitude and longitude variable names.")
    DS.rio.write_crs("epsg:4326", inplace=True)
    mask = gpd.read_file(shape_shp, crs="epsg:4326")
    DS_clipped = DS.rio.clip(mask.geometry.apply(mapping), mask.crs, drop=False)
    return DS_clipped
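If you would rather keep the list of coordinates than build a shapefile, the nested loops in the question can also be vectorised. The sketch below is one way to do that, not the method from the answer above: it swaps shapely's per-point within test for matplotlib's Path.contains_points, which tests all grid points in one call. The function name polygon_mask is hypothetical, and points lying exactly on the border may be classified differently from shapely's within.

```python
import numpy as np
from matplotlib.path import Path

def polygon_mask(poly_lats, poly_lons, grid_lats, grid_lons):
    """Return a (len(grid_lats), len(grid_lons)) array: 1 inside the polygon, 0 outside."""
    polygon = Path(np.column_stack([poly_lats, poly_lons]))  # treated as closed
    lat_grid, lon_grid = np.meshgrid(grid_lats, grid_lons, indexing="ij")
    points = np.column_stack([lat_grid.ravel(), lon_grid.ravel()])
    inside = polygon.contains_points(points)  # one call for every grid point
    return inside.reshape(lat_grid.shape).astype(int)

# Same polygon and grid as in the question (grid rebuilt with arange).
lats = np.array([-20, -5, -5, -20])
lons = np.array([-60, -60, -30, -30])
lats2 = np.arange(-10.25, -33.0, -0.5)  # -10.25 ... -32.75
lons2 = np.arange(-61.75, -44.0, 0.5)   # -61.75 ... -44.25

mask = polygon_mask(lats, lons, lats2, lons2)
print(mask.shape, int(mask.sum()))
```

The resulting array has the same layout as bool_final in the question, so it can be used as a drop-in replacement for the loop's output.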

What strategy can I use to OCR Magic the Gathering corner text?

I need to recognize the text in the bottom left corner of Magic the Gathering paper cards (latest design). Here is an example:
If the text is like this
I want to retrieve the following text:
198/280 U
M20 EN
(I don't need the card author name - Lake Hurwitz in this example)
What OCR library can I use? I've tried with Tesseract without any tuning but the results are not correct. Any advice or link to a project that already does this stuff?
You can do it with Tesseract (3.04.01) by cleaning up your image a bit, as in the code below
import numpy as np
import cv2
def prepro(zone, prefix):
    filename = 'stackmagic.png'
    oriimg = cv2.imread(filename)
    #keep the interesting part
    (a,b,c,d) = zone
    text_zone = oriimg[a:b, c:d]
    height, width, depth = text_zone.shape
    #resize it to be bigger (so less pixelized)
    H = 50
    imgScale = H/height
    newX,newY = text_zone.shape[1]*imgScale, text_zone.shape[0]*imgScale
    newimg = cv2.resize(text_zone,(int(newX),int(newY)))
    #binarize it
    gray = cv2.cvtColor(newimg, cv2.COLOR_BGR2GRAY)
    th, img = cv2.threshold(gray, 130, 255, cv2.THRESH_BINARY)
    #erode it
    kernel = np.ones((1,1),np.uint8)
    erosion = cv2.erode(img,kernel,iterations = 1)
    cv2.imwrite(prefix+'_ero.png', erosion)
    cv2.imshow("Show by CV2",erosion)
prepro((16,27, 6,130), 'upzone')
prepro((27,36, 6,130), 'downzone')
from your cropped image
you get
the upper part:
and the lower part:
and tesseract does seem to be able to extract
xx$ tesseract upzone_ero.png stdout
198/ 280 U
xx$ tesseract downzone_ero.png stdout
M20 ~ EN Duluu Hun-nu
Notice that we fail to extract Luke, but hopefully you were not interested in him/it :)
There are other tools but that'd be advertising stuff and be subjective..
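The raw Tesseract output above still contains stray spaces and symbols, so a small clean-up step can turn it into the wanted "198/280 U" and "M20 EN". The sketch below does that with regular expressions. It rests on assumptions that are not in the question: the rarity is one of the letters C, U, R or M, the set code is three characters, and the language code is two letters. The function name clean_corner_text is hypothetical.

```python
import re

# Assumed layout: "<number>/<total> <rarity>" on top, "<set> <language>" below.
TOP = re.compile(r"(\d{1,3})\s*/\s*(\d{1,3})\s+([CURM])")
BOTTOM = re.compile(r"\b([A-Z0-9]{3})\b\W+([A-Z]{2})\b")

def clean_corner_text(top_line, bottom_line):
    """Return (collector line, set line), or None if either pattern fails."""
    top = TOP.search(top_line)
    bottom = BOTTOM.search(bottom_line)
    if top is None or bottom is None:
        return None
    number, total, rarity = top.groups()
    set_code, language = bottom.groups()
    return f"{number}/{total} {rarity}", f"{set_code} {language}"

# The two lines Tesseract produced above.
result = clean_corner_text("198/ 280 U", "M20 ~ EN Duluu Hun-nu")
print(result)
```

Returning None when a pattern does not match makes it easy to flag cards that need a second pass, for example with a different threshold value.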
