Gnuplot: import x-axis from file

I have two files, 'results.dat' and 'grid.dat'.
Each row of results.dat contains a different data set of y values:
1 325.5 875.4 658.7 365.5
2 587.5 987.5 478.6 658.5
3 987.1 542.6 986.2 458.7
grid.dat contains the corresponding x values:
1 100.0 200.0 300.0 400.0
How can I plot with gnuplot the grid.dat values as x and a specific line of results.dat as the corresponding y values? E.g. for line 3:
1 100.0 987.1
2 200.0 542.6
3 300.0 986.2
4 400.0 458.7
Thanks in advance.

That's quite similar to the recent question Gnuplot: plotting the maximum of two files. In your case it is also not possible to do this with gnuplot alone.
You need an external tool to combine the two files on the fly, e.g. the following Python script (any other tool would do as well):
""" selectrow.py: Select a row from 'results.dat' and merge with 'grid.dat'."""
import numpy as np
import sys
line = int(sys.argv[1])
A = np.loadtxt('grid.dat')
B = np.loadtxt('results.dat', skiprows=(line-1))[0]
np.savetxt(sys.stdout, np.c_[A, B], delimiter='\t')
And then plot the third line of results.dat with
plot '< python selectrow.py 3' w l
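For the sample files above, python selectrow.py 3 should emit the merged x/y pairs roughly as follows (exact formatting depends on the fmt passed to savetxt):
100	987.1
200	542.6
300	986.2
400	458.7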

Related

Error in matrix concatenation, inconsistent dimensions

%% INPUT DATA
input_data = [200 10.0 0.0095; %C1
240 7.0 0.0070; %C2
200 11.0 0.0090; %C3
220 8.5 0.0090; %C4
220 10.5 0.0080; %C5
0.0015 0.0014 -0.0001 0.0009 -0.0004 %Power Loss
];
pd=830; %Power Demand
%%
lambda = input ('Enter initial lambda:')
Could anyone help me fix this? I've checked the row-to-column data but still can't fix the error.
Your 6th row, the power loss, contains 5 entries, as opposed to C1 to C5, which contain only 3 each. MATLAB doesn't do Swiss cheese: every row of a matrix must have the same number of columns. I'd suggest making the power loss a separate variable, as sketched below.
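A minimal sketch of that suggestion, reusing the numbers from the question (power_loss is a name introduced here for illustration):
% Keep the five 3-column generator rows together and move the
% power-loss coefficients into their own row vector.
input_data = [200 10.0 0.0095;   %C1
              240  7.0 0.0070;   %C2
              200 11.0 0.0090;   %C3
              220  8.5 0.0090;   %C4
              220 10.5 0.0080];  %C5
power_loss = [0.0015 0.0014 -0.0001 0.0009 -0.0004]; % now a separate 1x5 vector
pd = 830; % power demand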

Replace Null Values of a Column with Mean of Another Categorical Column in a Spark DataFrame

I have a dataset like this
id category value
1 A NaN
2 B NaN
3 A 10.5
5 A 2.0
6 B 1.0
I want to fill the NaN values with the mean of their respective category, as shown below:
id category value
1 A 4.16
2 B 0.5
3 A 10.5
5 A 2.0
6 B 1.0
I first tried to calculate the mean value of each category using groupBy:
val df2 = dataFrame.groupBy(category).agg(mean(value)).rdd.map {
  case r: Row => (r.getAs[String](category), r.get(1))
}.collect().toMap
println(df2)
I got a map of each category to its mean value. Output: Map(A -> 4.16, B -> 0.5)
Then I tried an update query in Spark SQL to fill the column, but it seems Spark SQL doesn't support update queries. I also tried to fill the null values directly in the DataFrame, but failed to do so.
What can I do? The same thing can be done in pandas, as shown in Pandas: How to fill null values with mean of a groupby? But how can I do it with a Spark DataFrame?
The simplest solution would be to use groupBy and join:
val df2 = df.filter(!isnan($"value")).groupBy("category").agg(avg($"value").as("avg"))

df.join(df2, "category")
  .withColumn("value", when(col("value").isNaN, $"avg").otherwise($"value"))
  .drop("avg")
Note that if a category contains only NaN values, it will be removed from the result.
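If you need to keep those categories in the output, here is a minimal sketch (untested) of the same approach with a left join; rows whose category has no computed average simply keep their NaN:
df.join(df2, Seq("category"), "left")
  .withColumn("value", when(col("value").isNaN && $"avg".isNotNull, $"avg").otherwise($"value"))
  .drop("avg")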
Indeed, you cannot update DataFrames, but you can transform them using functions like select and join. In this case, you can keep the grouping result as a DataFrame, join it (on the category column) to the original one, and then perform the mapping that replaces NaNs with the mean values:
import org.apache.spark.sql.functions._
import spark.implicits._

// calculate the mean per category:
val meanPerCategory = dataFrame.groupBy("category").agg(mean("value") as "mean")

// use join, select and the "nanvl" function to replace NaNs with the mean values:
val result = dataFrame
  .join(meanPerCategory, "category")
  .select($"category", $"id", nanvl($"value", $"mean") as "value")
result.show()
I stumbled upon the same problem and came across this post, but tried a different solution, namely using window functions. The code below was tested on PySpark 2.4.3 (window functions have been available since Spark 1.4). I believe this is a slightly cleaner solution.
This post is quite old, but I hope this answer is helpful for others.
from pyspark.sql import Window
from pyspark.sql.functions import *

df = spark.createDataFrame(
    [(1, "A", None), (2, "B", None), (3, "A", 10.5), (5, "A", 2.0), (6, "B", 1.0)],
    ['id', 'category', 'value'])

category_window = Window.partitionBy("category")
value_mean = mean("value0").over(category_window)

result = df\
    .withColumn("value0", coalesce("value", lit(0)))\
    .withColumn("value_mean", value_mean)\
    .withColumn("new_value", coalesce("value", "value_mean"))\
    .select("id", "category", "new_value")
result.show()
Output will be as expected (as in the question). Note that coalescing the nulls to 0 before taking the mean is what yields 12.5/3 ≈ 4.17 for category A, matching the question's expected output, rather than 6.25, the mean over only the non-null values:
id category new_value
1 A 4.166666666666667
2 B 0.5
3 A 10.5
5 A 2
6 B 1

Split array into chunks based on timestamp in Haskell

I have an array of records (a custom data type) in Haskell which I want to aggregate based on each record's timestamp. In very general terms, each record looks like this:
data Record = Record { event :: String,
                       time  :: Double,
                       from  :: Int,
                       to    :: Int
                     } deriving (Show, Eq)
I used a Double for the timestamp since that is the same format used in the tracefile.
And I parse them from a CSV file into an array of records: [Record]
Now I'm looking to get an approximation of instantaneous events per unit time. So I want to split the array into several arrays based on the timestamp (say, one per second) and then fold over each smaller array.
The problem is that I can't figure out how to split an array based on the value of a record. Looking on Hoogle I found several functions like splitEvery and splitWhen, but I'm lost. I considered using splitWhen to break up the list when, say, (mod time 0.1) == 0, but even if that worked it would remove the elements it splits on (which I don't want).
I should note that the records are NOT evenly spaced in time, i.e. the timestamps of sequential records do not differ by a fixed amount.
I am more than willing to store the data in a different format if you can suggest one that would make this sort of work easier.
A quick sample of the data I'm parsing (from a ns2 simulation):
r 0.114 1 2 tcp 1000 ________ 2 1.0 5.0 0 2
r 0.240 1 2 tcp 1000 ________ 2 1.0 5.0 0 2
r 0.914 2 1 tcp 1000 ________ 2 5.0 1.0 0 3
If you have a [Record] and you want to group its elements by a specific condition, you can use Data.List.groupBy. I'm assuming that for your time :: Double, 1 second is the base unit, so time = 1 is 1 second, time = 100 is 100 seconds, etc.; adjust this to whatever scale you're actually using:
import Data.List
import Data.Function (on)
isInSameClockSecond :: Record -> Record -> Bool
isInSameClockSecond = (==) `on` (floor . time :: Record -> Integer)
-- The type signature is given for floor . time to remove any ambiguity
-- due to floor's polymorphic type signature.
groupBySameClockSecond :: [Record] -> [[Record]]
groupBySameClockSecond = groupBy isInSameClockSecond
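Note that groupBy only merges adjacent elements, so this assumes the records are sorted by time (ns2 trace files usually are). As a hypothetical usage sketch (eventsPerSecond is a name introduced here), counting how many events fall into each clock second:
-- Groups returned by groupBy are non-empty, so head is safe here.
eventsPerSecond :: [Record] -> [(Integer, Int)]
eventsPerSecond rs =
  [ (floor (time (head g)), length g) | g <- groupBySameClockSecond rs ]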

How to plot a graph based on a txt file and split data by words?

A single row of my output txt file looks like:
1 open 0 heartbeat 0 closed 0
The gaps between the values are a random mixture of varying numbers of \t and space characters.
I wrote some code like:
import numpy as np
import matplotlib.pyplot as plt

with open("../testResults/star-6.txt") as f:
    data = f.read()
data = data.split('\n')
x = [row.split('HOW?')[0] for row in data]
y = [row.split('HOW?')[8] for row in data]

fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.set_title("diagram")
ax1.set_xlabel('x')
ax1.set_ylabel('y')
ax1.plot(x, y, c='r', label='the data')
leg = ax1.legend()
plt.show()
This obviously does not work. Is there any way I could do a sort of row.split_by_word?
I'd appreciate any help! Thanks.
Use pandas. Given this data file (I call it "test.csv"):
1 open 0 heartbeat 8 closed 0
2 Open 1 h1artbeat 7 losed 10
3 oPen 0 he2rtbeat 6 cosed 100
4 opEn 1 hea3tbeat 5 clsed 10000
5 opeN 0 hear4beat 4 cloed 10000
6 OPen 1 heart5eat 3 closd 20000
7 OpEn 0 heartb6at 2 close 2000
8 OpeN 1 heartbe7t 1 osed 200
9 oPEn 0 heartbea8 0 lsed 20
You can do this:
import pandas as pd
df = pd.read_csv('test.csv', sep=r'\s+', header=None)
df.columns = ['x', 1, 2, 3, 4, 5, 'y']
x = df['x']
y = df['y']
The rest is the same.
You could also just do:
ax = df.plot(x='x', y='y', title='diagram')
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.show()
The simplest approach is to first replace the \t characters with spaces and then split on the spaces:
row.replace('\t',' ').split()
(In fact, str.split() with no arguments already splits on runs of any whitespace, tabs included, so row.split() alone would do.) Otherwise (e.g. if you have more types of delimiter or very long rows) using re might be better:
re.split(r'\s+', row)
Obviously you need to do import re first. Note the pattern must be r'\s+' (one or more whitespace characters); r'\s*' can match the empty string and would split between every character in recent Python versions.
It's hard to decide which one is the best answer, since @TheBlackCat gives a more specific answer that looks much simpler than using matplotlib directly. I think his answer will be better for beginners like me who find this question in the future.
However, based on @hitzg's suggestion, here is the source code I worked out, to share:
import numpy as np
import matplotlib.pyplot as plt

filelist = ["mesh-10", "line-10", "ring-10", "star-10", "tree-10", "tri-graph-10"]

fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.set_title("result of all topologies")
ax1.set_xlabel('time in sec')
ax1.set_xlim([0, 30])  # set x axis limitation
ax1.set_ylabel('num of mydist msg')
for filename in filelist:
    with open("../testResults/%s.log" % filename) as f:
        data = f.read()
    data = data.replace('\t', ' ')
    rows = data.split('\n')
    rows = [row.split() for row in rows]
    x_arr = []
    y_arr = []
    for row in rows:
        if row:  # important: sanitize empty rows -> they cause an IndexError
            x = float(row[0]) / 10
            y = float(row[8])
            x_arr.append(x)
            y_arr.append(y)
    ax1.plot(x_arr, y_arr, c=np.random.rand(3), label=filename)  # random RGB colour per file
leg = ax1.legend()
plt.show()
Hope it helps. Many thanks to both @TheBlackCat and @hitzg, and best wishes to people who are new to matplotlib and are trying to find an answer to this :)

How to read a file in MATLAB?

I have a txt file whose content is rows of numbers; each row has 5 float numbers in it, with commas separating the numbers. Example:
1.1 , 12 , 1.42562, 3.5 , 2.2
2.1 , 3.3 , 3 , 3.333, 6.75
How can I read the file content into a matrix in MATLAB? So far I have this:
fid = fopen('file.txt');
comma = char(',');
A = fscanf(fid, ['%f', comma]);
fclose(fid);
The problem is that it only gives me the first line, and when I try to write out the content of A I get this: 1.0e+004 * some number.
Can someone help me please? I guess I need to read the file in a loop, but I don't know how.
Edit: One more question: when I display A I get this:
A =
1.0e+004 *
4.8631 0 0 0 0.0001
4.8638 -0.0000 -0.0000 0.0004 0.0114
4.8647 -0.0000 -0.0000 0.0008 0.0109
I want the matrix to contain the same values as the file. How can I make the numbers appear as regular floats instead of being formatted like this? Or are the numbers in the matrix actually regular floats, and the output is just displayed like this?
MATLAB's built-in dlmread function would be a much easier solution for what you want to accomplish:
A = dlmread('filename.txt', ','); % call dlmread and specify a comma as the delimiter
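If you are on MATLAB R2019a or newer, readmatrix is the recommended replacement for dlmread and detects the comma delimiter automatically:
A = readmatrix('filename.txt');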
Try using the importdata function:
A = importdata('filename.txt');
It should solve your problem.
EDIT
Alternative 1)
A = dlmread('test_so.txt',',');
The answer is surprisingly simple:
fid = fopen('depthMap.txt');
A = fscanf(fid, '%f');
fclose(fid);
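Regarding the 1.0e+004 * display in the edit above: the values in A are ordinary doubles; MATLAB's default short display just factors out a common power of ten. Changing the display format shows the plain values:
format long g % show values without the common scale factor
disp(A)
format short  % back to the default display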
