Adding an array to an existing CSV file

I know this question has already been asked, but after trying most of the accepted answers, none of them seems to work for my simple task...
I have a CSV file as follows:
Date,Median
2000-01-31,9
2000-02-28,8
2000-03-31,7
2000-04-30,6
2000-05-31,5
2000-06-30,4
2000-07-31,3
2000-08-31,2
2000-09-30,1
2000-10-31,0
2000-11-30,11
2000-12-31,12
and then an array:
[0.1829 0.171349 0.162461 0.152306 0.14122 0.137749 0.138802 0.150315
0.156784 0.168297 0.180634 0.187241]
I wish to append this array as a third column to the csv file to get the following output:
Date,Median,Median2
2000-01-31,9,0.1829
2000-02-28,8,0.171349
2000-03-31,7,0.162461
2000-04-30,6,0.152306
2000-05-31,5,0.14122
2000-06-30,4,0.137749
2000-07-31,3,0.138802
2000-08-31,2,0.150315
2000-09-30,1,0.156784
2000-10-31,0,0.168297
2000-11-30,11,0.180634
2000-12-31,12,0.187241
I tried most of the answers related to this kind of question, but I did not succeed in making them work. Here is the last code I tried, using pandas, which looks easier but does not work:
data=pd.read_csv("data_1.csv",sep=',')
array_transpose = array.reshape((-1, 1)) #in order to transpose the array
data['Median2'] = data[array_transpose]
data.to_csv('output.csv')
which produces the following error:
KeyError: '[0.1829 0.171349 0.162461 0.152306 0.14122 0.137749 0.138802 0.150315\n 0.156784 0.168297 0.180634 0.187241] not in index'
How can I append this array to my CSV file?

You may not need reshape
data=pd.read_csv("data_1.csv",sep=',')
data['Median2'] = array
data.to_csv('output.csv')
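For completeness, a minimal end-to-end sketch (assuming array is a NumPy array holding the twelve values above and data_1.csv is the input file from the question):
import numpy as np
import pandas as pd

# The twelve values from the question, as a 1-D NumPy array
array = np.array([0.1829, 0.171349, 0.162461, 0.152306, 0.14122, 0.137749,
                  0.138802, 0.150315, 0.156784, 0.168297, 0.180634, 0.187241])

data = pd.read_csv("data_1.csv", sep=',')
data['Median2'] = array                  # one value per row; lengths must match
data.to_csv('output.csv', index=False)   # index=False avoids writing an extra index column
Assigning a 1-D array to a new column works directly; the reshape into a column vector is only needed for APIs that expect 2-D input.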

Related

How do I filter my values using an array?

I am trying to sift through some free-text answers on geographical locations. As one of the steps, I want to check if the answer is any of the 290 municipalities in my country. Since 290 entries would make my code cumbersome and hard to read, I am trying to save them in an array, like below:
Data resorTEST;
keep R_res_ort_namn R_res_ort_txt R_kom_lan R_kommun;
set resor1TEST resor2TEST resor3TEST;
R_res_ort_namn=strip(lowcase(R_res_ort_namn));
R_res_ort_txt=strip(lowcase(R_res_ort_txt));
R_kom_lan=strip(lowcase(R_kom_lan));
array kommuner{290} $ ("upplands väsby" "vallentuna" "österåker" "värmdö"
"järfälla" "ekerö" "huddinge" "botkyrka" "salem"
"haninge" "tyresö" "upplands-bro" "nykvarn" "täby"
"danderyd" "sollentuna" "stockholm" "södertälje" "nacka" "sundbyberg"
"solna" "lidingö" "vaxholm" "norrtälje" "sigtuna" "nynäshamn"
"håbo" "älvkarleby" "knivsta" "heby" "tierp" "uppsala"
"enköping" "östhammar" "vingåker" "gnesta" "nyköping" "oxelösund" "flen"
"katrineholm" "eskilstuna" "strängnäs" "trosa" "ödeshög" "ydre"
"kinda" "boxholm" "åtvidaberg" "finspång" "valdemarsvik" "linköping"
"norrköping" "söderköping" "motala" "vadstena" "mjölby" "aneby"
"gnosjö" "mullsjö" "habo" "gislaved" "vaggeryd" "jönköping"
"nässjö" "värnamo" "sävsjö" "vetlanda" "eksjö" "tranås"
"uppvidinge" "lessebo" "tingsryd" "alvesta" "älmhult" "markaryd"
"växjö" "ljungby" "högsby" "torsås" "mörbylånga" "hultsfred"
"mönsterås" "emmaboda" "kalmar" "nybro" "oskarshamn" "västervik"
"vimmerby" "borgholm" "gotland" "olofström" "karlskrona" "ronneby"
"karlshamn" "sölvesborg" "svalöv" "staffanstorp" "burlöv" "vellinge"
"östra göinge" "örkelljunga" "bjuv" "kävlinge" "lomma" "svedala"
"skurup" "sjöbo" "hörby" "höör" "tomelilla" "bromölla"
"osby" "perstorp" "klippan" "åstorp" "båstad" "malmö"
"lund" "landskrona" "helsingborg" "höganäs" "eslöv" "ystad"
"trelleborg" "kristianstad" "simrishamn" "ängelholm" "hässleholm" "hylte"
"halmstad" "laholm" "falkenberg" "varberg" "kungsbacka" "härryda"
"partille" "öckerö" "stenungsund" "tjörn" "orust" "sotenäs"
"munkedal" "tanum" "dals-ed" "färgelanda" "ale" "lerum"
"vårgårda" "bollebygd" "grästorp" "essunga" "karlsborg" "gullspång"
"tranemo" "bengtsfors" "mellerud" "lilla edet" "mark" "svenljunga"
"herrljunga" "vara" "götene" "tibro" "töreboda" "göteborg"
"mölndal" "kungälv" "lysekil" "uddevalla" "strömstad" "vänersborg"
"trollhättan" "alingsås" "borås" "ulricehamn" "åmål" "mariestad"
"lidköping" "skara" "skövde" "hjo" "tidaholm" "falköping"
"kil" "eda" "torsby" "storfors" "hammarö" "munkfors"
"forshaga" "grums" "årjäng" "sunne" "karlstad" "kristinehamn"
"filipstad" "hagfors" "arvika" "säffle" "lekeberg" "laxå"
"hallsberg" "degerfors" "hällefors" "ljusnarsberg" "örebro" "kumla"
"askersund" "karlskoga" "nora" "lindesberg" "skinnskatteberg" "surahammar"
"kungsör" "hallstahammar" "norberg" "västerås" "sala" "fagersta"
"köping" "arboga" "vansbro" "malung-sälen" "gagnef" "leksand"
"rättvik" "orsa" "älvdalen" "smedjebacken" "mora" "falun"
"borlänge" "säter" "hedemora" "avesta" "ludvika" "ockelbo"
"hofors" "ovanåker" "nordanstig" "ljusdal" "gävle" "sandviken"
"söderhamn" "bollnäs" "hudiksvall" "ånge" "timrå" "härnösand"
"sundsvall" "kramfors" "sollefteå" "örnsköldsvik" "ragunda" "bräcke"
"krokom" "strömsund" "åre" "berg" "härjedalen" "östersund"
"nordmaling" "bjurholm" "vindeln" "robertsfors" "norsjö" "malå"
"storuman" "sorsele" "dorotea" "vännäs" "vilhelmina" "åsele"
"umeå" "lycksele" "skellefteå" "arvidsjaur" "arjeplog" "jokkmokk"
"överkalix" "kalix" "övertorneå" "pajala" "gällivare" "älvsbyn"
"luleå" "piteå" "boden" "haparanda" "kiruna");
/*if not missing(R_res_ort_namn) then R_kommun=prxchange("s/^.*-(.* kommun)/$1/",1,R_res_ort_namn);
else if prxmatch("/^.*([a-zA-Z]*? kommun).*$/",R_res_ort_txt) then R_kommun=prxchange("s/^.*?([a-zA-Z]*? kommun).*$/$1/",-1,R_res_ort_txt);
else if prxmatch("/^.*([a-zA-Z]*? kommun).*$/",R_kom_lan) then R_kommun=prxchange("s/^.*?([a-zA-Z]*? kommun).*$/$1/",-1,R_kom_lan);
else */if R_res_ort_txt in kommuner then R_kommun=R_res_ort_txt;
run;
However, for some reason this does not seem to work for all of the municipalities. The municipality of "uppsala" works, for instance, but not the municipality of "ängelholm".
I have tried stripping the variables of whitespace and converting everything to lowercase. What am I doing wrong?
Additional info:
For some reason it does work flawlessly if I skip the array and just copy-paste the exact same list of municipality names into a parenthesis following the in-operator. I would however need to repeat this step 5-6 times and this solution would make my code quite cumbersome.
You are defining the array
array kommuner{290} $
without an explicit length, so each element gets the default character length of 8. Names longer than 8 characters, such as "ängelholm", are silently truncated and therefore never match, while short names like "uppsala" still do. Give the array an explicit length, for example
array kommuner{290} $ 20 ("upplands väsby" ... );
and see what happens when you fix that.

What method am I not using right?

I'm new to Julia and currently trying to run the following code:
using DelimitedFiles
M=readdlm(data)
ts,A=M[:,1],M[:,2:end]
(nsweeps,N)=size(A)
dx=0.01;
x=[minimum(collect(A)):dx:maximum(collect(A))];
bx=[x-dx/2,x[end]+dx/2];
But, when I try to run the last line of code, it gives me the following error:
MethodError: no method matching -(::Array{StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}},1}, ::Float64)
Closest candidates are:
  -(!Matched::BigFloat, ::Union{Float16, Float32, Float64}) at mpfr.jl:437
  -(!Matched::Complex{Bool}, ::Real) at complex.jl:307
  -(!Matched::Missing, ::Number) at missing.jl:115
Can you please help me? Also, the data I'm using is a
30×6 Array{Float64,2}
UPDATE: here's the whole function I'm trying to run:
function mymain(filename,nsamples)
    start_time=time()
    M=readdlm(filename)
    ts,A=M[:,1],M[:,2:end]
    (nsweeps,N)=size(A)
    dx=0.01;
    x=[minimum(collect(A)):dx:maximum(collect(A))];
    bx=[x-dx/2,x[end]+dx/2];
    (bx,hA)=hist(A[:],bx);
    f1=figure()
    subplot(2,1,1); plot(ts,A,"-o"); xlabel("Time [ms]"); ylabel("Amps [mV]");
    subplot(2,1,2); plot(x,hA,"-"); xlabel("Amps [mV]"); ylabel("Density"); draw()
    nparams=8
    Sx=Array(ASCIIString,1,nparams)
    Rx=zeros(2,nparams)
    nx=zeros(Int,1,nparams)
    Sx[1,1]="p"; Rx[1:2,1]=[0.02,0.98]; nx[1]=49
    Sx[1,2]="n"; Rx[1:2,2]=[1,20]; nx[2]=20
    Sx[1,3]="tD"; Rx[1:2,3]=[50,200]; nx[3]=46
    Sx[1,4]="a"; Rx[1:2,4]=[0.05,0.5]; nx[4]=46
    Sx[1,5]="siga"; Rx[1:2,5]=[0.01,0.2]; nx[5]=39
    Sx[1,6]="sigb"; Rx[1:2,6]=[0.01,0.1]; nx[6]=19
    Sx[1,7]="tauf"; Rx[1:2,7]=[50,200]; nx[7]=46
    Sx[1,8]="u1"; Rx[1:2,8]=Rx[1:2,1]; nx[8]=nx[1]
    x=zeros(maximum(nx),nparams)
    p=zeros(maximum(nx),nparams)
    dx=zeros(1,nparams)
    for j=1:nparams
        x[1:nx[j],j]=linspace(Rx[1,j],Rx[2,j],nx[j])'
        dx[j]=x[2,j]-x[1,j]
    end
    S=zeros(Int,nsamples,nparams)
    sold=zeros(Int,1,nparams)
    for j=1:nparams
        sold[j]=rand(1:nx[j])
    end
    while x[sold[4],4]<=x[sold[5],5]
        sold[4]=rand(1:nx[4])
        sold[5]=rand(1:nx[5])
    end
    while x[sold[8],8]<=x[sold[1],1]
        sold[1]=rand(1:nx[1])
        sold[8]=rand(1:nx[8])
    end
    xold=zeros(1,nparams)
    xnew=zeros(1,nparams)
    for j=1:nparams
        xold[j]=x[sold[j],j]
    end
    llold=myloglikelihood(xold,ts,A)
    for k=1:nsamples
        snew=sold+rand(-1:1,1,nparams)
        if all(ones(1,nparams).<=snew.<=nx)
            allowed2=x[snew[4],4]>x[snew[5],5]
            allowed3=x[snew[8],8]>x[snew[1],1]
            if allowed2&allowed3
                for j=1:nparams
                    xnew[j]=x[snew[j],j]
                end
                llnew=myloglikelihood(xnew,ts,A)
                if rand()<exp(llnew-llold)
                    sold,llold=snew,llnew
                end
            end
        end
        S[k,:]=sold
    end
    for k=1:nsamples
        for j=1:nparams
            p[S[k,j],j]+=1/(nsamples*dx[j])
        end
    end
    f2=figure()
    for j=1:nparams
        subplot(2,4,j)
        plot(x[1:nx[j],j],p[1:nx[j],j]);
        xlabel(Sx[j])
    end
    diff_time=time()-start_time;
    println("Total runtime ",round(diff_time,3),"s=",round(diff_time/60,1),"mins.");
    return S
end
This goes along with some other functions, but as you can see, this is the main function, so I really can't move forward without first running this one.
It isn't clear what outcome you are hoping for here. So I'll just give some pointers that hopefully help.
First, in this line:
x=[minimum(collect(A)):dx:maximum(collect(A))];
the calls to collect are redundant. Also, I suspect you are trying to construct a StepRangeLen, but by putting it in [] you actually are getting a Vector{StepRangeLen}. I think what you want in this line is actually this:
x=minimum(A):dx:maximum(A);
Second, in this line:
bx=[x-dx/2,x[end]+dx/2];
note that dx/2 is a Float64 while x is a StepRangeLen. This is important because the latter is a collection so if you want to perform this operation element-wise across the collection you need to broadcast, that is, x .- dx/2. Note, I suspect you may not be on the latest version of Julia, because when I run this the error message actually tells me explicitly I need to broadcast. Anyway, in contrast, x[end]+dx/2 is fine and does not need to be broadcast because x[end] is Float64. So I think you want:
bx=[x .- dx/2, x[end] + dx/2];
Having said that, it isn't clear to me why you want this bx, which is why I said at the start I'm not sure what outcome you were hoping for.

How can I create and shuffle a dataset for triplet mining in TensorFlow 2?

I'm working on a network using triplet mining for training. In order to make it work properly, I need my batches to contain several images of the same class. The problem I'm currently facing is that I have 751 classes, for a total of 12,937 pictures, and a batch size of 48 pictures. When shuffling the dataset using the command below, the odds of getting pictures from the same class are really low, making the triplet mining inefficient.
dataset = dataset.shuffle(12937)
What I would need instead is a way of generating batches that contain a specific number of pictures for every class represented in the batch. As an example, let's say I want 12 classes per batch; there would then be 4 pictures for each of them.
Another problem I'm facing is how I would shuffle this dataset at the end of every epoch, so that I get different batches that still follow the condition fixed above, that is, 12 classes with 4 pictures for each of them.
Is there any proper way to do it? I can't really find one. Please let me know if I'm unclear, and if you need further details.
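To make the target concrete, here is a minimal sketch in plain NumPy (hypothetical labels array and helper name, not a tf.data pipeline) of the kind of batch I am after, with 12 classes per batch and 4 pictures per class:
import numpy as np

# Hypothetical integer class id per picture (751 classes, 12,937 pictures)
labels = np.random.randint(0, 751, size=12937)

def sample_balanced_batch(labels, classes_per_batch=12, samples_per_class=4):
    # Draw the classes represented in this batch ...
    classes = np.random.choice(np.unique(labels), classes_per_batch, replace=False)
    indices = []
    for c in classes:
        candidates = np.where(labels == c)[0]
        # ... then a fixed number of pictures from each of them
        indices.extend(np.random.choice(candidates, samples_per_class,
                                        replace=len(candidates) < samples_per_class))
    return np.array(indices)  # 48 picture indices forming one batch

batch_indices = sample_balanced_batch(labels)
The open question is how to express this kind of class-balanced sampling, and its per-epoch reshuffling, with the tf.data API.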
================ EDIT ================
I've been trying a few things, and came up with something that would do what I want. The function would be the following:
counter = 0.
# Assuming a format such as (data, label)
def predicate(data, label):
    global counter
    allowed_labels = tf.constant([counter])
    isallowed = tf.equal(allowed_labels, tf.cast(label, tf.float32))
    reduced = tf.reduce_sum(tf.cast(isallowed, tf.float32))
    counter += 1
    return tf.greater(reduced, tf.constant(0.))

# @tf.function
def custom_shuffle(train_dataset, batch_size, samples_per_class=4, iterations_in_epoch=100, database='market'):
    assert batch_size % samples_per_class == 0, F'batch size must be a {samples_per_class} multiple.'
    if database == 'market':
        class_nbr = 751
    else:
        raise Exception('Unsupported database yet')
    all_datasets = [train_dataset.filter(predicate) for _ in range(class_nbr)]  # Every element of this array is a dataset of one class
    for i in range(iterations_in_epoch):
        choice = tf.random.uniform(
            shape=(batch_size // samples_per_class,),
            minval=0,
            maxval=class_nbr,
            dtype=tf.dtypes.int64,
        )  # Which classes will be in the batch
        choice = tf.data.Dataset.from_tensor_slices(tf.concat([choice for _ in range(4)], axis=0))  # Exactly 4 pictures from each class in the batch
        batch = tf.data.experimental.choose_from_datasets(all_datasets, choice)
        if i == 0:
            all_batches = batch
        else:
            all_batches = all_batches.concatenate(batch)
    all_batches = all_batches.batch(batch_size)
    return all_batches
It does what I want; however, the returned dataset is extremely slow to iterate, making model learning impossible. As per this thread, I understood that I needed to decorate custom_shuffle with @tf.function, as in the line commented out above. However, when doing so, it raises the following error:
Traceback (most recent call last):
File "training.py", line 137, in <module>
main()
File "training.py", line 80, in main
train_dataset = get_dataset(TRAINING_FILENAMES, IMG_SIZE, BATCH_SIZE, database=database, func_type='train')
File "E:\Morgan\TransReID_TF\tfr_to_dataset.py", line 260, in get_dataset
dataset = custom_shuffle(dataset, batch_size)
File "D:\Programs\Anaconda3\envs\AlignedReID_TF\lib\site-packages\tensorflow\python\eager\def_function.py", line 780, in __call__
result = self._call(*args, **kwds)
File "D:\Programs\Anaconda3\envs\AlignedReID_TF\lib\site-packages\tensorflow\python\eager\def_function.py", line 846, in _call
return self._concrete_stateful_fn._filtered_call(canon_args, canon_kwds) # pylint: disable=protected-access
File "D:\Programs\Anaconda3\envs\AlignedReID_TF\lib\site-packages\tensorflow\python\eager\function.py", line 1843, in _filtered_call
return self._call_flat(
File "D:\Programs\Anaconda3\envs\AlignedReID_TF\lib\site-packages\tensorflow\python\eager\function.py", line 1923, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "D:\Programs\Anaconda3\envs\AlignedReID_TF\lib\site-packages\tensorflow\python\eager\function.py", line 545, in call
outputs = execute.execute(
File "D:\Programs\Anaconda3\envs\AlignedReID_TF\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: No unary variant device copy function found for direction: 1 and Variant type_index: class tensorflow::data::`anonymous namespace'::DatasetVariantWrapper
[[{{node BatchDatasetV2/_206}}]] [Op:__inference_custom_shuffle_11485]
Function call stack:
custom_shuffle
Which I don't understand, and don't see how to fix.
Is there something I'm doing wrong?
PS: I'm aware the lack of minimal code to reproduce this behavior makes it hard to debug, I'll try to provide some as soon as possible.

Many array elements: I need to search a file and print the array elements which are not present in the file, with no duplicate records in my output

Please help me with the code below:
I have an array with 155 elements and a file which contains some of the array elements. I need the value from the file for each array element that is found, and I need the array element to be printed with zero if it is not found in the file.
Thanks in advance; this is what I have tried.
args=("C9" "DP10" "DP11" "DP20" "DP21" "DP30" "DP31" "DP50" "FR31" "G128" "G402" "G602" "GA" "GI" "GT08" "GT14" "GT17" "GT25" "GT37" "GT67" "H6" "H7" "IL" "IM" "J6" "JD05" "JD09" "JD14" "JD25" "JD37" "K1" "K2" "L100" "L106" "L116" "L150" "L202" "L7" "L8" "L9" "LD11" "LD21" "LE09" "LE26" "LP11" "LP21" "LP31" "LP55" "LQ11" "LQ21" "LQ31" "LS07" "LT09" "LT10" "LT12" "LT15" "LT20" "LT22" "LT24" "LT25" "LT30" "LT38" "LT42" "LT43" "LT44" "LT48" "LT50" "LT59" "LT60" "LT65" "M395" "OV04" "OV07" "OV14" "OV18" "OV23" "OV27" "OV35" "OV39" "OV40" "OV79" "Q15" "Q150" "Q19" "QD11" "QD21" "QD31" "QD65" "QE11" "QE21" "QE31" "QF50" "QM25" "QP10" "QP15" "QP20" "QP30" "QP31" "QP50" "QT25" "QT50" "R39" "R40" "r57" "R9" "rc23" "RC27" "RC39" "rc7" "rc79" "S1" "S101" "S117" "S118" "S13p" "S18" "S202" "S317" "S318" "S319" "S40" "S408" "S67" "S76" "S82" "S99" "SD11" "SD12" "SD14" "SD17" "SD29" "SD3" "SD5" "SD98" "SF20" "SF74" "SR07" "SV19" "SV6p" "T402" "T602" "TG00" "TG17" "TG43" "TG8" "TG92" "WD09" "WD14" "WD17" "WD24" "WD29" "WD37" "WD43" "WWE1" "XR91")
MY CODE:
I have used a for loop to traverse the elements and search for them inside the file.
for i in "${args[@]}"; do
    grep $i file.txt
    if [ $? -ne 0 ]; then
        echo $i"","""0"
    fi
done >> output.txt
TOTAL FILE:
C9,5015319
DP10,36870732
DP11,188
DP20,18728254
DP21,341182
DP30,8415555
DP31,2390000
DP50,12371853
FR31,24541
G128,49780
G402,2000
G602,2000
GA,879888
GT08,1580384
GT17,1968192
GT25,4104
GT37,21550
GT67,24770
H6,660652
IL,137651
JD05,1518400
JD14,325800
JD25,828600
JD37,357100
K1,261549
K2,4715330
L100,284
L116,80000
L7,200847
L8,3158
L9,5054495
LE09,75776
LE26,343410
LP11,1030
LP21,492
LP31,113
LP55,3
LQ11,6776000
LQ21,3543600
LQ31,4525600
LT09,682800
LT12,5715
LT15,568873
LT22,236077
LT24,702800
LT25,4600
LT38,28990
LT65,300125
M395,29600
OV14,462
OV18,86300
OV40,217899
Q150,678
QD11,1000022
QD31,50
QF50,58575
QM25,57900
QP10,1792153
QP15,953400
QP20,770000
QP30,179450
QP31,163223
QP50,8
QT50,66340
R39,62440
R40,18807
r57,3456
rc23,3370
RC27,2809
RC39,2570
rc7,7137
rc79,1296
S1,25007
S117,1000000
S13p,52313
S18,75000
S317,289148
S318,3046
S319,30000
S40,300
S408,4967
S76,28
S82,103238
S99,480
SD11,6719
SD12,23123
SD14,22595
SD17,100000
SD29,252392
SD3,20000
SD5,14090
SD98,653
SF20,1000
SF74,7330
SV19,26461
SV6p,154994
T402,2000
T602,2000
TG17,2031
TG8,2964
TG92,1759
WD17,131194
WD24,94589
WD29,202198
WD37,101794
WD43,112942
WWE1,9600
XR91,70000
EXPECTED OUTPUT:
The output should contain the value from the file for each array element.
If an element is not present, the output should contain the array element with zero. For example, if c9 is not present in the file, the output for c9 should be:
c9,0
Your approach is not bad. I would just use
^$i,
as a grep-pattern. With your current file data, it's not necessary, but maybe one day your file will contain things like
X,2354
XA,1234
and suddenly your algorithm will fail if args contains the element X.
Also, the echo statement is unnecessarily complex. I would write it simply as
echo $i,0
You can also simplify the if, by combining it with the grep
if ! grep ^$i, file.txt
but this is mere cosmetics and a matter of taste.

Saving JSON dictionaries to a Python dictionary with a loop, 'no JSON object' error

I'm compiling a lot of JSON data over an API and getting an error: "ValueError: No JSON object could be decoded". I have bolded the line that generates the error.
Most perplexing is that it successfully enters the first dictionary entry, but I get the error when it repeats the for loop.
The problematic cell of code looks like this. 'genkeys' and 'conkeys' are dictionaries with about 50 keys; each one has a five- or six-digit number as its value.
gdata={}
cdata={}
for site in genkeys:
    print genkeys[site]
    gpayload = {'data_key_id': gdatakey, 'range_start': rangestart, 'range_end': rangeend, 'period': 'hour', 'token': '5b'}
    gr = requests.get("http://appl.d.com/dta/raw.json?", params=gpayload)
    print gr
    print genkeys[site]
    **gdata[gdatakey]=gr.json()**
    if site in conkeys.keys():
        cdatakey=conkeys[site]
        cpayload = {'data_key_id': cdatakey, 'range_start': rangestart, 'range_end': rangeend, 'period': 'hour', 'token': '5b'}
        cr = requests.get("http://appl.d.com/dta/raw.json?", params=cpayload)
        cdata[cdatakey]=cr.json()
print gdata
print cdata
There was a null entry in the json code. That's what threw the error.
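For anyone hitting the same error, one option is to guard the decode so the loop keeps going and reports which response is the bad one; a rough sketch in the same Python 2 style as the question, using names from the code above:
try:
    gdata[gdatakey] = gr.json()
except ValueError:
    # .json() raises ValueError when the body cannot be decoded as JSON
    print "Bad JSON for", genkeys[site], "status:", gr.status_code
    print repr(gr.text[:200])  # peek at the raw body to see what came back
Catching ValueError works because that is what .json() raises when decoding fails; inspecting gr.text then shows what the API actually returned.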
