How to split array with 1 column into multiple columns (numpy array)

I currently have a .txt file that has been loaded into python as a list, and then placed into a np array as a single column with n number of rows depending on the file size. The file has rows trimmed off the top and bottom to clean it up. Relevant code is shown below:
# Open the .txt file in the user selected folder
txtFile = glob.glob(folderView + "**/*.txt", recursive = True)
print (" The text files in the folder " + userInput + " are: " + str(txtFile))
# The quantitative output epatemp.txt file is always index position 1 in the glob list.
epaTemp = txtFile[1]
print (epaTemp)
###
quantData = open(epaTemp, 'r')
print (quantData)
result = [line.split(',') for line in quantData.readlines()[24:]]
print (result)
n = 4
result = result[:-n or None]
print (result)
print (" Length of List: " + str(len(result)))
print (result[0])
### Loop through all list components and split each
print (" NUMPY STUFF! ")
list_array = np.array(result)
print (list_array)
The array is then printed like this: much cleaner than the 1990's .txt files I am given to process...
My issue is that I am unable to split this single column into multiple columns after defining the opened file as a numpy array in the final lines of code listed above. I am trying to produce a single column for everything that is separated by a space, " ", but am having trouble doing so. It might also be important to note that when I use numpy.shape(list_array), it returns nothing as if the array is not being interpreted properly. The overall goal is to turn this 1col x n-rows array into 7col x n-rows for every value split by a " " in the text file. If anyone can help me with this issue I'd really appreciate it.

All fixed. I had split the .txt file incorrectly: using line.split() instead of line.split(',') solved the issue. pandas was also necessary, as this was not the best application for NumPy. Thanks everyone for the feedback.
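A minimal sketch of why the separator matters, assuming the rows are whitespace-separated. The two sample lines are made up for illustration and are not taken from the real file:

```python
import numpy as np

# Hypothetical sample lines standing in for the cleaned-up .txt rows.
lines = [
    "1 2.5 3.0 4 5 6 7\n",
    "8 9.5 1.0 2 3 4 5\n",
]

# Splitting on ',' finds no commas, so each line stays one single field.
by_comma = np.array([line.split(',') for line in lines])

# split() with no argument splits on any run of whitespace and drops the newline.
by_space = np.array([line.split() for line in lines])

print(by_comma.shape)  # (2, 1)
print(by_space.shape)  # (2, 7)
```

The values are still strings at this point; something like by_space.astype(float) would convert them if every field is numeric.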

Related

Move N elements in array from back to front

I have a text file contains 2 columns, I need to select one column of them as an array
which contains 200000 and cut N elements from this array and move them from back to front.
I used the following code:
import numpy as np
import glob
files = glob.glob("input/*.txt")
for file in files:
    data_file = np.loadtxt(file)
2nd_columns = data_file [:,1]
2nd_columns_array = np.array(2nd_columns)
cut = 62859 # number of elements to cut
remain_points = 2nd_columns_array[:cut]
cut_points = 2nd_columns_array[cut:]
new_array = cut_points + remain_points
It doesn't work and gave me the following error:
ValueError: operands could not be broadcast together with shapes (137141,) (62859,)
any help, please??
It doesn't work because + on NumPy arrays adds the stored values element-wise instead of concatenating, and the two arrays have different shapes.
One of the ways is to use numpy.hstack:
new_array = np.hstack((2nd_columns_array[cut:], 2nd_columns_array[:cut]))
Side notes:
With your code you will reorder only the 2nd column of the last file, since the reordering is outside of the for loop.
You don't need to store cut_points or remain_points in separate variables. You can operate directly on 2nd_columns_array.
Variable names cannot start with a digit in Python (2nd_columns is a SyntaxError); use a name such as second_column instead.
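A simpler method for this process is numpy.roll. Mind the sign of the shift: np.roll(a, cut) moves the last cut elements to the front, while np.roll(a, -cut) gives the same result as the hstack call above (the first cut elements go to the back).
new_array = np.roll(second_column, cut)  # second_column: the 2nd column under a valid name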
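To make the difference between the two answers concrete, here is a small sketch on a 10-element array. The names second_column, front_to_back and back_to_front are illustrative, not from the original code:

```python
import numpy as np

second_column = np.arange(10)  # small stand-in for the 200000-element column
cut = 3

# '+' on arrays is element-wise addition, so joining the pieces needs a concatenation.
front_to_back = np.hstack((second_column[cut:], second_column[:cut]))

# np.roll with a negative shift gives the same result as the hstack call.
rolled_negative = np.roll(second_column, -cut)

# A positive shift moves the last `cut` elements to the front instead.
back_to_front = np.roll(second_column, cut)

print(front_to_back)  # [3 4 5 6 7 8 9 0 1 2]
print(back_to_front)  # [7 8 9 0 1 2 3 4 5 6]
```

Which sign is wanted depends on whether "cut" counts elements from the start or from the end of the column.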

Split command in Excel VBA adding a space between every character in array

I am reading a text file and splitting the data string at each space using the following code.
L = f.ReadLine
If L = "" Then GoTo errorpoint:
On Error GoTo errorpoint:
sl = Split(L, " ", -1)
However, the array I get back takes, for example, the word "Contact" in the text file and turns it into " C o n t a c t " in a single cell of my array (see below) and I'm not sure why.
Data Output in VBA Watch Window:
I have tried using
For i = 0 To UBound(sl)
sl(i) = Replace(sl(i), " ", "")
Next
Afterwards to remove the spaces, but that doesn't appear to be removing the spaces from my data. Any ideas how I can prevent the code from adding a space between every character in the first place?
I then need to check whether sl(1) and sl(2) contain the words "Contact" and "Stress" respectively, but that condition cannot be met currently, the cells contain " C o n t a c t " and " S t r e s s ", so I changed the if loop to reflect that, but the condition is still not met.
In addition, I cannot convert my data from String > Double because of the unusual formatting, it seems.
As @LocEngineer pointed out in the comments, during the update of the external software that produces the text file I was reading, the text encoding had been changed to UTF-16. The additional characters between my data were NUL characters (Chr(0)), not spaces (Chr(32)).
For i = 0 To UBound(sl)
sl(i) = Replace(sl(i), Chr(0), "")
Next
This snippet solved the problem.
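The same effect can be reproduced outside VBA. The sketch below is in Python rather than VBA and assumes the file is UTF-16 little-endian without a byte-order mark. It shows where the Chr(0) characters come from and that decoding with the right encoding avoids them:

```python
# "Contact Stress" as a UTF-16 (little-endian) file would store it.
raw = "Contact Stress".encode("utf-16-le")

# Reading those bytes with a single-byte encoding exposes every second byte, chr(0).
misread = raw.decode("latin-1")
print(repr(misread[:6]))  # 'C\x00o\x00n\x00'

# Workaround from the answer: strip the NUL characters after the fact.
cleaned = misread.replace("\x00", "")

# Decoding with the correct encoding avoids the problem in the first place.
decoded = raw.decode("utf-16-le")

print(cleaned.split(" "))  # ['Contact', 'Stress']
```

In a watch window the NUL characters can look like spaces, which is why Replace with " " had no effect.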

Best way to compare data from file to data in array in Matlab

I am having a bit of trouble with a specific file I/O task in MATLAB. I am still fairly new to it, so some things are still a bit of a mystery to me. The input file is structured like so:
File Name: Processed_kplr003942670-2010174085026_llc.fits.txt
File contents- 6 Header Lines then:
1, 2, 3
1, 2, 3
basically a matrix of about [1443,3] with varying values
now here is the matrix that I'm comparing it to:
[(0123456, 1, 2, 3), (0123456, 2, 3, 4), (etc..)]
Now here is my problem: first I need to know how to properly do the file input in a way that lets me compare the ID number (0123456) in the filename with the ID value in the matrix, so that I can compare the other columns of both. I do not know how to achieve this in MATLAB. Furthermore, I need to be able to loop over every point in the matrix that matches up to the specific file, for example:
If I have 15 files ranging from 'Processed_0123456_1' to 'Processed_0123456_15', then I want to be able to read in the values contained in 'Processed_0123456_1' and compare them to ANY row in the matrix that corresponds to that ID (0123456). I don't know if maybe accumarray can be used for this, but as I said I'm not sure.
So the code must:
-Read in file
-Compare file to any point in the matrix with corresponding ID
-Do operations
-Loop over until full list of files in the directory are read in and processed, and output a matrix with the results.
Thanks for any help.
EDIT: Exact File Sample--
Kepler I.D.-----Channel
[1161345]--------[84]
-TTYPE1--------TTYPE8------------TTYPE4
['TIME']---['PDCSAP_FLUX']---['SAP_FLUX']
['BJD - 2454833']--['e-/s']--------['e-/s']
CROWDSAP --- 0.9791
630.195880143,277165.0,268233.0
630.216312946,277214.0,268270.0
630.23674585,277239.0,268293.0
630.257178554,277296.0,268355.0
630.277611357,277294.0,268364.0
630.29804426,277365.0,268441.0
630.318476962,277337.0,268419.0
630.338909764,277403.0,268481.0
630.359342667,277389.0,268463.0
630.379775369,277441.0,268508.0
630.40020817,277545.0,268604.0
There are more entries than what was just posted but they go for about 1000 lines so it is impractical to post that all here.
To get the file ID, use regular expressions, e.g.:
filename = 'Processed_0123456_1';
file_id_str = regexprep(filename, 'Processed_(\d+)_\d+', '$1');
file_num_str = regexprep(filename, 'Processed_\d+_(\d+)', '$1')
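For comparison, the same capture-group idea looks like this in Python. This is only a sketch and assumes file names of the form Processed_<id>_<number> as in the example above:

```python
import re

filename = "Processed_0123456_1"  # example name from the question

# One pattern with two capture groups: the ID and the file number.
match = re.fullmatch(r"Processed_(\d+)_(\d+)", filename)
if match is None:
    raise ValueError(f"unexpected file name: {filename}")

file_id_str, file_num_str = match.groups()
print(file_id_str, file_num_str)  # 0123456 1
```

Both values are strings, so the leading zero of the ID is preserved until it is converted to a number.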
To read in the file contents, assuming that it's all comma-separated values without a header, use textscan, e.g.,
fid = fopen(filename)
C = textscan(fid, '%f,%f,%f') % Use as many %f specifiers as you have entries per line in the file
textscan also works on strings. So, for example, if your file contents was:
filestr = sprintf('1, 2, 3\n1, 3, 3')
Then running textscan on filestr works like this:
C = textscan(filestr, '%f,%f,%f')
C =
[2x1 double] [2x1 double] [2x1 double]
You can convert that to a matrix using cell2mat:
cell2mat(C)
ans =
1 2 3
1 3 3
You could then repeat this procedure for all files with the same ID and concatenate them into a single matrix, e.g.,
C_full = [];
for (all files with the same ID)
C = do_all_the_above_stuff;
C_full = [C_full; C];
end
Then you can look for what you want in C_full.
Update based on updated OP Dec 12, 2013
Here's code to read the values from a single file. Wrap all of this in the loop that I mentioned above to loop over all your files and read them into a single matrix.
fid = fopen('/path/to/file');
% Skip over 12 header lines
for kk = 1:12
fgetl(fid);
end
% Read in values to a matrix
C = textscan(fid, '%f,%f,%f');
C = cell2mat(C);
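The skip-the-header-then-parse step can also be sketched with NumPy. The text below is the start of the sample posted in the question; n_header = 6 matches that sample and is an assumption to adjust for the real files:

```python
import io
import numpy as np

# First lines of the sample posted in the question (6 header lines, then data).
sample = """Kepler I.D.-----Channel
[1161345]--------[84]
-TTYPE1--------TTYPE8------------TTYPE4
['TIME']---['PDCSAP_FLUX']---['SAP_FLUX']
['BJD - 2454833']--['e-/s']--------['e-/s']
CROWDSAP --- 0.9791
630.195880143,277165.0,268233.0
630.216312946,277214.0,268270.0
630.23674585,277239.0,268293.0
"""

n_header = 6  # assumption: number of header lines before the numeric rows
data = np.loadtxt(io.StringIO(sample), delimiter=",", skiprows=n_header)
print(data.shape)  # (3, 3)
```

With a real file, the io.StringIO object would be replaced by the file path.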
I think your requirements are too complicated to write the whole script here. Nonetheless, I will try to give some pointers to help. Disclaimer: None of this is tested, just my best guess. Please expect syntax errors, etc. I hope you can figure them out :-)
1) You can use the textscan function with the delimiter option to get data from the lines of your file. Since your format varies as it does, we will probably want to use...
2) ... fgetl to read the first two lines into strings and process them separately using textscan. Such an operation might look like:
fid = fopen('file.txt','r');
tline1 = fgetl(fid);
tline2 = fgetl(fid);
fclose(fid);
C1 = textscan(tline1,'%s %d %s','delimiter','_'); %C1{2} will be the integer we want
C2 = textscan(tline2,'%s %s','delimiter',':'); %C2{2} will be the values we want, but they're still a string so...
mat = str2num(C2{2}{1});
3) Then, for the rest of the lines, we can use something like dlmread:
mat2 = dlmread('file.txt',',',2,0);
The 2,0 specifies the offset in 0-based rows,columns from the start of the file. You may need to look at something like vertcat to stitch mat and mat2 together.
4) The list of files in the directory can be found with the dir command. The filename is an attribute of the structure that's returned:
dirlist = dir;
for i = 1:length(dirlist)
filename = dirlist(i).name
%process your files
end
You can also pass matching strings to dir, like so:
dirlist = dir('*.txt');
which will find all of the files with extension .txt.
5) You can very easily loop through the comparison matrix:
sze = size(comparisonmatrix);
for i = 1:sze(1)
%compare comparisonmatrix(i,1) to C1{2}
%Perform whatever operations you need
end
Hope that helps!
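As an alternative to looping over the comparison matrix row by row, the matching rows can be selected in one step with a boolean mask. A sketch in Python, where the comparison matrix and its values are made up for illustration:

```python
import numpy as np

# Hypothetical comparison matrix: first column is the ID, the rest are values.
comparison = np.array([
    [123456, 1, 2, 3],
    [123456, 2, 3, 4],
    [999999, 5, 6, 7],
])

file_id = int("0123456")  # ID parsed from the file name; the leading zero is dropped

# The boolean mask keeps every row whose ID matches the file.
rows = comparison[comparison[:, 0] == file_id]
print(rows)
```

The selected rows can then be compared against the values read from the file with that ID.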

Break array into 'tables' with ruby

I can't seem to get this down, right.
I have a big list of words in an array. I want these words to appear in 8 'tables', each 14 rows by 9 columns, with words running down each column of the table.
So I can get as far as columns = words.each_slice(14) and then later tables = columns.each_slice(9), but from there I'm not sure. I feel like I should make a hash and append the first n items of each column to an array, and then maybe join them with a tab delimiter.
My destination is a spreadsheet, so maybe outputting to CSV would make sense? I'm just not sure how to have it grouped into separate 'tables' (instead of just 9 columns with lots of rows and no separation) but maybe all it takes is a csv line with all blanks?
Anyway, any input or insight would be welcome.
This will do what you ask
You don't say anything about the output format you want, so I've just surrounded each word with quotes, joined them with commas and put a blank line between tables.
My "words" are just the numbers 1 to 200.
words = (1 .. 200).map { |v| '%03d' % v }
words.each_slice(14).each_slice(9) do |table|
  (0 ... table[0].size).each do |i|
    row = table.map { |column| column[i] }
    row.pop if row[-1].nil?
    puts row.map { |cell| %<"#{cell}"> }.join ','
  end
  puts ''
end
output
"001","015","029","043","057","071","085","099","113"
"002","016","030","044","058","072","086","100","114"
"003","017","031","045","059","073","087","101","115"
"004","018","032","046","060","074","088","102","116"
"005","019","033","047","061","075","089","103","117"
"006","020","034","048","062","076","090","104","118"
"007","021","035","049","063","077","091","105","119"
"008","022","036","050","064","078","092","106","120"
"009","023","037","051","065","079","093","107","121"
"010","024","038","052","066","080","094","108","122"
"011","025","039","053","067","081","095","109","123"
"012","026","040","054","068","082","096","110","124"
"013","027","041","055","069","083","097","111","125"
"014","028","042","056","070","084","098","112","126"
"127","141","155","169","183","197"
"128","142","156","170","184","198"
"129","143","157","171","185","199"
"130","144","158","172","186","200"
"131","145","159","173","187"
"132","146","160","174","188"
"133","147","161","175","189"
"134","148","162","176","190"
"135","149","163","177","191"
"136","150","164","178","192"
"137","151","165","179","193"
"138","152","166","180","194"
"139","153","167","181","195"
"140","154","168","182","196"
You were on the right track. Here's a solution that writes in CSV format for lists whose length is a multiple of 14*9. You can also create a spreadsheet directly with the appropriate gem. I'll post an update which handles any length list shortly.
Note that I think each_slice requires you to include Enumerable for at least pre 2.0 Ruby versions.
(0...14*9*2).each_slice(14).collect.each_slice(9) { |table|
  table.transpose.each { |row| puts row.inspect.delete('[]') }
  puts
}
If you need to pad your input array to a multiple of 14*9 so that the transpose works, you can use the following:
def print_csv(array)
  mod = array.length % (14*9)
  array = array + [nil] * (14*9 - mod) if mod > 0
  array.each_slice(14).collect.each_slice(9) { |table|
    table.transpose.each { |row| puts row.reject(&:nil?).join(',') }
    puts
  }
end
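The slice-into-columns, slice-into-tables, then transpose idea used in the answers above is not specific to Ruby. A sketch of the same steps in Python, where chunks is a hypothetical helper and zip_longest takes the place of padding with nil:

```python
from itertools import zip_longest

def chunks(seq, size):
    """Yield consecutive slices of `seq` with at most `size` items."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

words = [f"{n:03d}" for n in range(1, 201)]

columns = list(chunks(words, 14))   # each column holds up to 14 words
tables = list(chunks(columns, 9))   # each table holds up to 9 columns

for table in tables:
    # zip_longest transposes columns into rows, padding short columns with None.
    for row in zip_longest(*table):
        print(",".join(f'"{cell}"' for cell in row if cell is not None))
    print()  # blank line between tables
```

Because zip_longest pads the short final column, no explicit padding to a multiple of 14*9 is needed.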

Reading and Writing text to a NEW file - Matlab

I have a file that contains a full set of values for some sentences which have been transcribed for a speech recognition program. I've been trying to write some MATLAB code to go through this file, extract the values for each sentence and write them to a new individual file. So instead of having them all in one 'mlf' file, I want them in separate files for each sentence.
For example, my 'mlf' file (which contains all values for all sentences) looks like this:
#!MLF!#
"/N001.lab"
AH
SEE
I
GOT
THEM
MONTHS
AGO
.
"/N002.lab"
WELL
WORK
FOR
LIVE
WIRE
BUT
ERM
.
"/N003.lab"
IM
GOING
TO
SEE
JAMES
VINCENT
MCMORROW
.
etc
So each sentence is delimited by the 'Nxxx.lab' line and the '.'. I need to create a new file for every Nxxx.lab; for example, the file for N001 would just contain:
AH
SEE
I
GOT
THEM
MONTHS
AGO
I've been trying to use fgetl to detect the 'Nxxx.lab' and '.' boundaries, but it doesn't work, as I don't know how to write the content into a new file separate from the 'mlf'.
If anyone can give me any guidance on what sort of approach to use, it would be greatly appreciated!
Cheers!
Try this code (input file test.mlf has to be in the working directory):
%# read the file
filename = 'test.mlf';
fid = fopen(filename,'r');
lines = textscan(fid,'%s','Delimiter','\n','HeaderLines',1);
lines = lines{1};
fclose(fid);
%# find start and stop indices
istart = find(cellfun(@(x) strcmp(x(1),'"'), lines));
istop = find(strcmp(lines, '.'));
assert(numel(istart)==numel(istop) && all(istop>istart),'Check the input file format.')
%# write lines to new files
for k = 1:numel(istart)
    filenew = lines{istart(k)}(2:end-1);
    fout = fopen(filenew,'wt');
    for l = (istart(k)+1):(istop(k)-1)
        fprintf(fout,'%s\n',lines{l});
    end
    fclose(fout);
end
The code assumes that the file names are in double quotes as in your example. If not, you can find the istart indices based on a pattern. Or just assume that the entries for a new file start from the 2nd line and follow each dot: istart = [1; istop(1:end-1)+1];
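The same start/stop logic can be written as a single pass over the lines. This is a sketch in Python rather than MATLAB; split_mlf and the output directory are hypothetical names, and stripping the leading slash from "/N001.lab" is an assumption so that the files land in the output directory:

```python
from pathlib import Path

def split_mlf(mlf_path, out_dir):
    """Write one file per "Nxxx.lab" section of an MLF file; return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    name, words = None, []
    for line in Path(mlf_path).read_text().splitlines():
        line = line.strip()
        if line.startswith('"') and line.endswith('"') and len(line) > 1:
            # Section header: drop the quotes and any leading slash.
            name, words = line.strip('"').lstrip("/"), []
        elif line == "." and name is not None:
            # End of sentence: write the collected words, one per line.
            target = out_dir / name
            target.write_text("\n".join(words) + "\n")
            written.append(target)
            name = None
        elif name is not None and line:
            words.append(line)
    return written
```

A call such as split_mlf("test.mlf", "labs") would then create labs/N001.lab, labs/N002.lab and so on. The "#!MLF!#" header is skipped because no section is open yet when it is read.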
You could use a growing cell array to gather the information.
Read one line at a time from the file.
Grab the file name and put it into the first column if it's the first read for the sentence.
If the line read is a period, add it to the string and move the index to the next row in the array. Write the new file with the content.
This bit of code should help you in building the cell array while appending a string within it. I assume reading line by line is not a problem. You can also retain the carriage returns/new lines within the string ('\n').
%% Declare A
A = {}
%% Fill row 1
A(1,1) = {'file1'}
A(1,2) = {'Sentence 1'}
A(1,2) = { strcat(A{1,2}, ', has been appended')}
%% Fill row 2
A(2,1) = {'file2'}
A(2,2) = {'Sentence 2'}
While I'm sure you can do this with MATLAB, I would suggest you use Perl to split the original file and then process the individual files using MATLAB.
The following Perl script reads the entire file ("xxx.txt") and writes out the individual files according to the "NAME.lab" lines:
open(my $fh, "<", "xxx.txt");
# read the entire file into $contents
# This may not be a good idea if the file is huge.
my $contents = do { local $/; <$fh> };
# iterate over the $contents string and extract the individual
# files
while($contents =~ /"(.*)"\n((.*\n)*?)\./mg) {
# We arrive here with $1 holding the filename
# and $2 the content up to the "." ending the section/sentence.
open(my $fout, ">", $1);
print $fout $2;
close($fout);
}
close($fh);
The multiline regular expression is a bit difficult but it does the job.
For this sort of text manipulation, Perl is much faster and more useful. It is a good tool to learn if you process a lot of text.
