Iterate over files in a folder to create numpy array

This is my first post and I am really new to programming.
I have a folder with some files that I want to process, collecting the values I need into a numpy array. I do:
listing = os.listdir(datapath)
my_array = np.zeros(shape=(0, 5))
for infile in listing:
    dataset = open(infile).readlines()[1:]
    data = np.genfromtxt(dataset, usecols=(1, 6, 7, 8, 9))
    new_array = np.vstack((my_array, data))
Although I have 2 files in listing (the datapath folder), new_array only ends up holding the values of the second file; the data gets overwritten on each pass.
Any ideas?
Thanks.

If I understand you correctly, the solution to your problem is simply that you need to vstack onto "my_array" itself, not onto a new variable.
Just replace the last line with this one and it should work:
my_array = np.vstack((my_array, data))
However, I do not think this is the most efficient way to do it, since vstack copies the whole array on every iteration. Since you know how many files are in that folder, you can predefine the size of the array and fill in its contents.
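A minimal sketch of that preallocation idea, using synthetic data in place of the real files (the per-file shape and the number of files are assumptions for illustration):

```python
import numpy as np

# Suppose we know there are 2 files, each contributing 3 rows of 5 columns.
n_files, rows_per_file, n_cols = 2, 3, 5

# Preallocate once instead of growing with vstack on every iteration.
my_array = np.zeros((n_files * rows_per_file, n_cols))

for i in range(n_files):
    # Stand-in for np.genfromtxt(...) on the i-th file.
    data = np.full((rows_per_file, n_cols), float(i))
    my_array[i * rows_per_file:(i + 1) * rows_per_file] = data

print(my_array.shape)  # (6, 5)
```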

Here is what you need to do to read all the files from a specific folder into a numpy array. I have a folder test containing only .txt files, and my file.py below sits in that same test folder along with all the .txt files. Each .txt file contains a 4x4 matrix/array. After running the script, the result is a numpy array of shape (N, 4, 4).
import numpy as np
from glob import glob

def read_all_files():
    file_names = glob('test/*')
    arrays = [np.loadtxt(f) for f in file_names]
    # np.stack adds a new leading axis, giving shape (N, 4, 4)
    matrices = np.stack(arrays)
    return matrices
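One caveat worth checking here: for 4x4 inputs, np.concatenate joins along the first axis and yields a (4N, 4) array, whereas np.stack is what produces the (N, 4, 4) shape described above. A small self-contained check:

```python
import numpy as np

# Two fake 4x4 "files"
arrays = [np.zeros((4, 4)), np.ones((4, 4))]

print(np.concatenate(arrays).shape)  # (8, 4): joined along the first axis
print(np.stack(arrays).shape)        # (2, 4, 4): new leading axis, one slot per file
```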


How to save the results of an np.array for future use when using Google Colab

I am working on an Information Retrieval project, using Google Colab. I am at the phase where I have computed some features ("input_features") and the labels ("labels") in a for loop, which took me about 4 hours to finish.
So at the end I have appended the results to an array:
input_features = np.array(input_features)
labels = np.array(labels)
So my question would be:
Is it possible to save those results in order to use them for future purposes when using Google Colab?
I have found 2 options that could maybe be applied, but I don't know where these files are created.
1) To save them as csv files. And my code would be:
from numpy import savetxt
# save to csv file
savetxt('input_features.csv', input_features, delimiter=',')
savetxt('labels.csv', labels, delimiter=',')
And in order to load them:
from numpy import loadtxt
# load array
input_features = loadtxt('input_features.csv', delimiter=',')
labels = loadtxt('labels.csv', delimiter=',')
# print the array
print(input_features)
print(labels)
But I still don't get anything back when I print.
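(For reference, the savetxt/loadtxt round trip itself does work within a single session; a minimal sketch with dummy data standing in for the real features:)

```python
import numpy as np
from numpy import savetxt, loadtxt

input_features = np.array([[1.0, 2.0], [3.0, 4.0]])

savetxt('input_features.csv', input_features, delimiter=',')
restored = loadtxt('input_features.csv', delimiter=',')

print(np.allclose(input_features, restored))  # True
```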
2) Save the results of an array by using pickle where I followed these instructions from here:
https://colab.research.google.com/drive/1EAFQxQ68FfsThpVcNU7m8vqt4UZL0Le1#scrollTo=gZ7OTLo3pw8M
from google.colab import files
import pickle

def features_pickeled(input_features, results):
    input_features = input_features + '.txt'
    pickle.dump(results, open(input_features, 'wb'))
    files.download(input_features)

def labels_pickeled(labels, results):
    labels = labels + '.txt'
    pickle.dump(results, open(labels, 'wb'))
    files.download(labels)
And to load them back:
from io import BytesIO

def load_features_from_local():
    loaded_features = {}
    uploaded = files.upload()
    for input_features in uploaded.keys():
        unpickeled_features = uploaded[input_features]
        loaded_features[input_features] = pickle.load(BytesIO(unpickeled_features))
    return loaded_features

def load_labels_from_local():
    loaded_labels = {}
    uploaded = files.upload()
    for labels in uploaded.keys():
        unpickeled_labels = uploaded[labels]
        loaded_labels[labels] = pickle.load(BytesIO(unpickeled_labels))
    return loaded_labels
# How do I print the pickled files to see if I have them ready for use?
When using python I would do something like this for pickle:
# Create pickle file
with open("name.pickle", "wb") as pickle_file:
    pickle.dump(name, pickle_file)

# Load the pickle file
with open("name.pickle", "rb") as name_pickled:
    name_b = pickle.load(name_pickled)
But the thing is that I don't see any files being created in my Google Drive.
Is my code correct, or am I missing some part of it?
Sorry for the long description; I hoped to explain in detail what I want to do and what I have done about this issue.
Thank you in advance for your help.
Google Colaboratory notebook instances are never guaranteed to have access to the same resources when you disconnect and reconnect because they are run on virtual machines. Therefore, you can't "save" your data in Colab. Here are a few solutions:
Colab saves your code. If the for loop operation you referenced doesn't take an extreme amount of time to run, just leave the code and run it every time you connect your notebook.
Check out np.save. This function allows you to save an array to a binary file. Then, you could re-upload your binary file when you reconnect your notebook. Better yet, you could store the binary file on Google Drive, mount your drive to your notebook, and reference it like that.
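A sketch of that np.save approach (the Drive mount point `/content/gdrive` is the Colab default, but the exact folder path is an assumption; a local path is used here so the sketch runs anywhere):

```python
import numpy as np

# In Colab you would first mount Drive and point at it:
# from google.colab import drive
# drive.mount('/content/gdrive')
# path = '/content/gdrive/My Drive/input_features.npy'

path = 'input_features.npy'  # local stand-in for the Drive path

input_features = np.array([[1.0, 2.0], [3.0, 4.0]])
np.save(path, input_features)  # writes a binary .npy file
restored = np.load(path)       # reloads it with dtype and shape intact

print(np.array_equal(input_features, restored))  # True
```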
# Mount drive to authenticate yourself to gdrive
from google.colab import drive
drive.mount('/content/gdrive')
#---
# Import necessary libraries
import numpy as np
from numpy import savetxt
import pandas as pd
#---
# Create array
arr = np.array([1, 2, 3, 4, 5])
# save to csv file
savetxt('arr.csv', arr, delimiter=',') # You will see the results if you press in the File icon (left panel)
And then you can load it again by:
# You can copy the path when you find your file in the file icon
arr = pd.read_csv('/content/arr.csv', sep=',', header=None) # You can also save your result as a txt file
arr
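Note that pd.read_csv gives you back a DataFrame rather than a numpy array; if you need the array again, convert it afterwards, e.g.:

```python
import numpy as np
import pandas as pd
from numpy import savetxt

arr = np.array([1, 2, 3, 4, 5])
savetxt('arr.csv', arr, delimiter=',')

df = pd.read_csv('arr.csv', sep=',', header=None)
restored = df.to_numpy().ravel()  # back to a flat numpy array

print(np.array_equal(arr.astype(float), restored))  # True
```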

Converting RGB data into an array from a text file to create an Image

I am trying to convert txt RGB data from file.txt into an array, and then use that array to create an image.
(The RGB data is found at this github repository: IR Sensor File.txt).
I want to convert the .txt file into an array in a format the PIL Image library accepts, and then put it through the following script to create my image.
My roadblock right now is converting the arrays in file.txt into an appropriate format to work with the Image function.
from PIL import Image
import numpy as np
data = [ARRAY FROM THE file.txt]
img = Image.fromarray(data, 'RGB')
img.save('my.png')
img.show()
The RGB data looks like as follows, and can also be found at the .txt file from that github repository linked above:
[[(0,255,20),(0,255,50),(0,255,10),(0,255,5),(0,255,10),(0,255,25),(0,255,40),(0,255,71),(0,255,137),(0,255,178),(0,255,147),(0,255,158),(0,255,142),(0,255,163),(0,255,112),(0,255,132),(0,255,137),(0,255,153),(0,255,101),(0,255,122),(0,255,122),(0,255,147),(0,255,66),(0,255,66),(0,255,30),(0,255,61),(0,255,0),(0,255,0),(0,255,40),(0,255,66),(15,255,0),(0,255,15)],
[(0,255,40),(0,255,45),(15,255,0),(20,255,0),(10,255,0),(35,255,0),(0,255,5),(0,255,56),(0,255,173),(0,255,168),(0,255,153),(0,255,137),(0,255,158),(0,255,147),(0,255,127),(0,255,117),(0,255,142),(0,255,142),(0,255,122),(0,255,122),(0,255,137),(0,255,137),(0,255,101),(0,255,66),(0,255,71),(0,255,61),(0,255,25),(0,255,25),(0,255,61),(0,255,35),(0,255,0),(35,255,0)],
[(0,255,15),(0,255,25),(51,255,0),(71,255,0),(132,255,0),(101,255,0),(35,255,0),(0,255,20),(0,255,91),(0,255,153),(0,255,132),(0,255,147),(0,255,132),(0,255,158),(0,255,122),(0,255,132),(0,255,142),(0,255,158),(0,255,122),(0,255,137),(0,255,142),(0,255,147),(0,255,101),(0,255,101),(0,255,86),(0,255,86),(0,255,50),(0,255,45),(0,255,50),(0,255,56),(0,255,30),(56,255,0)],
[(0,255,45),(0,255,10),(76,255,0),(127,255,0),(132,255,0)]]
I think this should work - no idea if it's decent Python:
#!/usr/local/bin/python3
from PIL import Image
import numpy as np
import re

# Read in entire file
with open('sensordata.txt') as f:
    s = f.read()

# Find anything that looks like numbers
l = re.findall(r'\d+', s)

# Convert to a numpy array of bytes (fromarray needs uint8) and reshape
data = np.array(l, dtype=np.uint8).reshape((24, 32, 3))

# Convert to image and save
img = Image.fromarray(data, 'RGB')
img.save('result.png')
I enlarged and contrast-stretched the image subsequently so you can see it!
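An alternative sketch: since the text in the file is already a valid Python literal (a list of lists of tuples), ast.literal_eval can parse it directly, which avoids hard-coding the 24x32 shape. A tiny stand-in string is used here in place of the real file contents:

```python
import ast
import numpy as np

# Tiny stand-in for the contents of the .txt file
s = '[[(0,255,20),(0,255,50)],[(15,255,0),(0,255,15)]]'
# In practice: s = open('sensordata.txt').read()

data = np.array(ast.literal_eval(s), dtype=np.uint8)
print(data.shape)  # (2, 2, 3)
```

The resulting data array can then be passed to Image.fromarray exactly as in the script above.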

How do I get the index to take the filename as its value?

I have a list of filenames that I want to make (they don't exist yet). I want to loop through the list and create each file. Next, I want to write to each file a path (along with other text, not shown here) that includes the name of the file. I have written something similar to the code below so far, but cannot see how to get the index i to take the file name values. Please help.
import os
biglist = ['sleep', 'heard', 'shed']
for i in biglist:
    myfile = open('C:\autumn\winter\spring\i.txt', 'w')
    myfile.write('DATA = c:\autumn\winter\spring\i.dat')
    myfile.close()
Maybe you can try the python function below.
import sys

biglist = ['sleep', 'heard', 'shed']

def create_file():
    for i in biglist:
        try:
            # Double the backslashes (or use a raw string) so Python
            # does not treat them as escape characters
            file_name_with_ext = "C:\\autumn\\winter\\spring\\" + i + ".txt"
            file = open(file_name_with_ext, 'a')
            file.close()
        except OSError:
            print("caught error!")
            sys.exit(0)

create_file()  # invoking the function
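A sketch of the part the original question was actually after, writing the file's own name into its contents as well. It uses os.path.join and an f-string to substitute the loop variable, and writes to a temporary directory here so it runs anywhere (the C:\autumn\winter\spring target from the question is assumed to be the real destination):

```python
import os
import tempfile

biglist = ['sleep', 'heard', 'shed']
base = tempfile.mkdtemp()  # stand-in for r'C:\autumn\winter\spring'

for name in biglist:
    path = os.path.join(base, name + '.txt')
    with open(path, 'w') as myfile:
        # The f-string substitutes the loop variable into the written text
        myfile.write(f'DATA = {os.path.join(base, name + ".dat")}\n')

print(sorted(os.listdir(base)))  # ['heard.txt', 'shed.txt', 'sleep.txt']
```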

Python: Can you extend an array on each iteration using glob (or similar) to read in files from a directory

Is there a way to extend an array that stores data from a file on each iteration of a for-loop and with-statement combo, using glob? Currently, I have something like:
import glob
from myfnc import func

for filename in glob.glob('*.dta'):
    with open(filename, 'rb') as thefile:
        fileHead, data = func(thefile)
where func is defined in another script, myfnc. On each iteration over the directory this stores the data from each file in fileHead and data (as arrays), erasing whatever was there on the previous iteration. What I need is something that will extend each array on each pass. Is there a nice way to do this? It doesn't need to be a for-loop/with combo; that is just how I am reading in all the files from the directory.
I thought of initializing the arrays beforehand and then extending them after the with block on each pass, but the extend command was giving me some kind of error. With the error, the code would look like:
import glob
from myfnc import func

fileHead, data = [0]*2
for filename in glob.glob('*.dta'):
    with open(filename, 'rb') as thefile:
        fileHeadExtend, dataExtend = func(thefile)
        fileHead.extend(fileHeadExtend)
        data.extend(dataExtend)
So, the issue is that fileHead and data are both initialized, but as ints. However, I don't want to initialize the arrays to so many zeros; there should not be any arbitrary values in there to begin with. That is where the issue lies.
You want:
import glob
from myfnc import func

fileHead = list()
data = list()
for filename in glob.glob('*.dta'):
    with open(filename, 'rb') as thefile:
        fileHeadExtend, dataExtend = func(thefile)
        fileHead.extend(fileHeadExtend)
        data.extend(dataExtend)
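If fileHead and data ultimately need to be numpy arrays rather than lists, convert once after the loop; a self-contained sketch with a stand-in for func (myfnc is not shown in the question, so the return values here are assumptions):

```python
import numpy as np

def func(thefile):
    # Stand-in for myfnc.func: pretend each file yields a header list and a data list
    return ['h1', 'h2'], [1.0, 2.0, 3.0]

fileHead = []
data = []
for thefile in ['a.dta', 'b.dta']:  # stand-ins for the files glob would find
    fileHeadExtend, dataExtend = func(thefile)
    fileHead.extend(fileHeadExtend)
    data.extend(dataExtend)

data = np.asarray(data)  # one conversion at the end, not per iteration
print(data.shape)  # (6,)
```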

Load CSV file to a 2-D array in Scala

I am trying to load a csv file into a 2-D array, and my code is as follows:
var data: Array[Array[AnyRef]] = _
data = Source.fromFile(filename).getLines.map(_.split(",")).flatten.toArray
But it doesn't work.
This question
provides a couple of solutions, but none of them works for me for some reason.
Does anyone have any ideas?
