Reading and Writing Arrays from Multiple HDF Files in IDL - arrays

I am fairly new to IDL, and I am trying to write a code that will take a MODIS HDF file (level three data MOD14A1 and MYD14A1 to be specific), read the array, and then write the data from the array preferably to a csv file, but ASCII would work, too. I have code that will allow me to do this for one file, but I want to be able to do this for multiple files. Essentially, I want it to read one HDF array, write it to a csv, move to the next HDF file, and then write that array to the same csv file in the next row. Any help here would be greatly appreciated. I've supplied the code I have so far to do this with one file.
filename = dialog_pickfile(filter = filter, path = path, title = title)
csv_file = 'Data.csv'
sd_id = HDF_SD_START(filename, /READ)
; read "FirePix", "MaxT21"
attr_index = HDF_SD_ATTRFIND(sd_id, 'FirePix')
HDF_SD_ATTRINFO, sd_id, attr_index, DATA = FirePix
attr_index = HDF_SD_ATTRFIND(sd_id, 'MaxT21')
HDF_SD_ATTRINFO, sd_id, attr_index, DATA = MaxT21
index = HDF_SD_NAMETOINDEX(sd_id, 'FireMask')
sds_id = HDF_SD_SELECT(sd_id, index)
HDF_SD_GETDATA, sds_id, FireMask
HDF_SD_ENDACCESS, sds_id
index = HDF_SD_NAMETOINDEX(sd_id, 'MaxFRP')
sds_id = HDF_SD_SELECT(sd_id, index)
HDF_SD_GETDATA, sds_id, MaxFRP
HDF_SD_ENDACCESS, sds_id
HDF_SD_END, sd_id
help, FirePix
print, FirePix, format = '(8I8)'
print, MaxT21, format = '("MaxT21:", F6.1, " K")'
help, FireMask, MaxFRP
WRITE_CSV, csv_file, FirePix
After I run this, and choose the correct file, this is the output I am getting:
FIREPIX LONG = Array[8]
0 4 0 0 3 12 3 0
MaxT21: 402.1 K
FIREMASK BYTE = Array[1200, 1200, 8]
MAXFRP LONG = Array[1200, 1200, 8]
The "FIREPIX" array is the one I want stored into a csv.
Thanks in advance for any help!

Instead of using WRITE_CSV, it is fairly simple to use the primitive IO routines to write a comma-separated array, i.e.:
openw, lun, csv_file, /get_lun
; the following line produces a single line the output CSV file
printf, lun, strjoin(strtrim(firepix, 2), ', ')
; TODO: do the above line as many times as necessary
free_lun, sun

Related

Python3 - IndexError when trying to save a text file

i'm trying to follow this tutorial with my own local data files:
CNTK tutorial
i have the following function to save my data array into a txt file feedable to CNTK:
# Save the data files into a format compatible with CNTK text reader
def savetxt(filename, ndarray):
dir = os.path.dirname(filename)
if not os.path.exists(dir):
os.makedirs(dir)
if not os.path.isfile(filename):
print("Saving", filename )
with open(filename, 'w') as f:
labels = list(map(' '.join, np.eye(11, dtype=np.uint).astype(str)))
for row in ndarray:
row_str = row.astype(str)
label_str = labels[row[-1]]
feature_str = ' '.join(row_str[:-1])
f.write('|labels {} |features {}\n'.format(label_str, feature_str))
else:
print("File already exists", filename)
i have 2 ndarrays of the following shape that i want to feed the model:
train.shape
(1976L, 15104L)
test.shape
(1976L, 15104L)
Then i try to implement the fucntion like this:
# Save the train and test files (prefer our default path for the data)
data_dir = os.path.join("C:/Users", 'myself', "OneDrive", "IA Project", 'data', 'train')
if not os.path.exists(data_dir):
data_dir = os.path.join("data", "IA Project")
print ('Writing train text file...')
savetxt(os.path.join(data_dir, "Train-128x118_cntk_text.txt"), train)
print ('Writing test text file...')
savetxt(os.path.join(data_dir, "Test-128x118_cntk_text.txt"), test)
print('Done')
and then i get the following error:
Writing train text file...
Saving C:/Users\A702628\OneDrive - Atos\Microsoft Capstone IA\Capstone data\train\Train-128x118_cntk_text.txt
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-24-b53d3c69b8d2> in <module>()
6
7 print ('Writing train text file...')
----> 8 savetxt(os.path.join(data_dir, "Train-128x118_cntk_text.txt"), train)
9
10 print ('Writing test text file...')
<ipython-input-23-610c077db694> in savetxt(filename, ndarray)
12 for row in ndarray:
13 row_str = row.astype(str)
---> 14 label_str = labels[row[-1]]
15 feature_str = ' '.join(row_str[:-1])
16 f.write('|labels {} |features {}\n'.format(label_str, feature_str))
IndexError: list index out of range
Can somebody please tell me what's going wrong with this part of the code? And how could i fix it? Thank you very much in advance.
Since you're using your own input data -- are they labelled in the range 0 to 9? The labels array only has 10 entries in it, so that could cause an out-of-range problem.

Run parallel loops in Ruby

I have two sets of arrays stored in a file and I need to extract values one by one and compare them. I am using this code but does look like I am doing correctly.
# First Dataset
File.foreach(file_set_a) do |data_a|
data_array_a = data_a.split("\t")
#file_name_a = data_array_a[0]
#file_ext_a = data_array_a[1]
# Second Dataset
File.foreach(file_set_b) do |data_b|
data_array_b = data_b.split("\t")
#file_name_b = data_array_b[0]
#file_ext_b = data_array_b[1]
#Compare
#file_name_a == #file_name_b
end
end
The problem is, I cannot go back and extract the next values in the set A when I enter the set B. Any suggestions?
First, convert those 2 files into two separated data arrays
lines_array_a = File.readlines(file_set_a)
lines_array_b = File.readlines(file_set_b)
I am assuming both of the array size will be same. Now run a loop and get the items from both array to compare them.
for i in 0..(lines_array_a.count - 1) do
data_array_a = lines_array_a[i].split("\t")
#file_name_a = data_array_a[0]
#file_ext_a = data_array_a[1]
data_array_b = lines_array_b[i].split("\t")
#file_name_b = data_array_b[0]
#file_ext_b = data_array_b[1]
#file_name_a == #file_name_b
end

Splitting two large CSV files preserving relations between file A and B across the resulting files

I have TWO large, CSV files (around 1GB). They both share relation between each other (ID is lets say like a foreign key). Structure is simple, line by line but CSV cells with a line break in the value string can appear
37373;"SOMETXT-CRCF or other other line break-";3838383;"sasa ssss"
One file is P file and other is T file. T is like 70% size of the P file (P > T). I must cut them to smaller parts since they are to big for the program I have to import them... I can not simply use split -l 100000 since I will loose ID=ID relations which must be preserved! Relation can be 1:1, 2:3, 4:6 or 1:5. So stupid file splitting is no option, we must check the place where we create a new file. This is example with simplified CSV structure and a place where I want the file to be cut (and the lines above go to separate P|T__00x file and we continue till P or T ends). Lines are sorted in both files, so no need to search for IDs across whole file!
File "P" (empty lines for clearness):
CSV_FILE_P;HEADER;GOES;HERE
564788402;1;1;"^";"01"
564788402;2;1;"^";"01"
564788402;3;1;"^";"01"
575438286;1;1;"^";"01"
575438286;1;1;"^";"01"
575438286;2;1;"37145859"
575438286;2;1;"37145859"
575438286;3;1;"37145859"
575438286;3;1;"37145859"
575439636;1;1;"^"
575439636;1;1;"^"
# lets say ~100k line limit of file P is somewhere here and no more 575439636 ID lines , so we cut.
575440718;1;1;"^"
575440718;1;1;"^"
575440718;2;1;"10943890"
575440718;2;1;"10943890"
575440718;3;1;"10943890"
575440718;3;1;"10943890"
575441229;1;1;"^";"01"
575441229;1;1;"^";"01"
575441229;2;1;"4146986"
575441229;2;1;"4146986"
575441229;3;1;"4146986"
575441229;3;1;"4146986"
File T (empty lines for clearness)
CSV_FILE_T;HEADER;GOES;HERE
564788402;4030000;1;"0204"
575438286;6102000;1;"0408"
575438286;6102000;0;"0408"
575439636;7044010;1;"0408"
575439636;7044010;0;"0408"
# we must cut here since bigger file "P" 100k limit has been reached
# and we end here because 575439636 ID lines are over.
575440718;6063000;1;"0408"
575440718;6063000;0;"0408"
575441229;8001001;0;"0408"
575441229;8001001;1;"0408"
Can you please help splitting those two files into many 100 000 (or so) lines separate files T_001 and corresponding P_001 file and so on? So ID matches between file parts. I believe awk will be the best tool but I have not got much experience in this field. And the last thing - CSV header should be preserved in each of the files.
I have powerful AIX machine to cope with that (linux also possible since AIX commands are limited sometimes)
You can parse the beginning IDs with awk and then check to see if the current ID is the same as the last one. Only when it is different are you allowed close the current output file and open a new one. At that point record the ID for tracking the next file. You can track this id in a text file or in memory. I've done it in memory but with big files like this you could run into trouble. It's easier to keep track in memory than opening multiple files and reading from them.
Then you just need to distinguish between the first file (output and recording) and the second file (output and using the prerecorded data).
The code does a very brute force check on the possibility of a CRLF in a field - if the line does not begin with what looks like an ID, then it outputs the line and does no further testing on it. Which is a problem if the CRLF is followed immediately by a number and semicolon! This might be unlikely though...
Run with: gawk -f parser.awk P T
I don't promise this works!
BEGIN {
MAXLINES = 100000
block = 0
trackprevious = 0
}
FNR == 1 {
# First line is CSV header
csvheader = $0
if (FILENAME == "-")
{
_error = 1
print "Error: Need filename on command line"
exit 1
}
if (trackprevious)
{
_error = 1
print "Only one file can track another"
exit 1
}
if (block >= 1)
{
# New file - track previous output...
close(outputname)
Tracking[block] = idval
print "Output for " FILENAME " tracks previous file"
trackprevious = 1
}
else
{
print "Chunking output (" MAXLINES ") for " FILENAME
}
linecount = 0
idval = 0
block = 1
outputprefix = FILENAME "_block"
outputname = sprintf("%s_%03d", outputprefix, block)
print csvheader > outputname
next
}
/^[0-9]+;/ {
linecount++
newidval = $0
sub(/;.*$/, "", newidval)
newidval = newidval + 0 # make a number
startnewfile = 0
if (trackprevious && (idval != newidval) && (idval == Tracking[block]))
{
startnewfile = 1
}
else if (!trackprevious && (idval != newidval) && (linecount > MAXLINES))
{
# Last ID value found before new file:
Tracking[block] = idval
startnewfile = 1
}
if (startnewfile)
{
close(outputname)
block++
outputname = sprintf("%s_%03d", outputprefix, block)
print csvheader > outputname
linecount = 1
}
print $0 > outputname
idval = newidval
next
}
{
linecount++
print $0 > outputname
}

Read from text file and assign data to new variable

Python 3 program allows people to choose from list of employee names.
Data held on text file look like this: ('larry', 3, 100)
(being the persons name, weeks worked and payment)
I need a way to assign each part of the text file to a new variable,
so that the user can enter a new amount of weeks and the program calculates the new payment.
Below is my code and attempt at figuring it out.
import os
choices = [f for f in os.listdir(os.curdir) if f.endswith(".txt")]
print (choices)
emp_choice = input("choose an employee:")
file = open(emp_choice + ".txt")
data = file.readlines()
name = data[0]
weeks_worked = data[1]
weekly_payment= data[2]
new_weeks = int(input ("Enter new number of weeks"))
new_payment = new_weeks * weekly_payment
print (name + "will now be paid" + str(new_payment))
currently you are assigning the first three lines form the file to name, weeks_worked and weekly_payment. but what you want (i think) is to separate a single line, formatted as ('larry', 3, 100) (does each file have only one line?).
so you probably want code like:
from re import compile
# your code to choose file
line_format = compile(r"\s*\(\s*'([^']*)'\s*,\s*(\d+)\s*,\s*(\d+)\s*\)")
file = open(emp_choice + ".txt")
line = file.readline() # read the first line only
match = line_format.match(line)
if match:
name, weeks_worked, weekly_payment = match.groups()
else:
raise Exception('Could not match %s' % line)
# your code to update information
the regular expression looks complicated, but is really quite simple:
\(...\) matches the parentheses in the line
\s* matches optional spaces (it's not clear to me if you have spaces or not
in various places between words, so this matches just in case)
\d+ matches a number (1 or more digits)
[^']* matches anything except a quote (so matches the name)
(...) (without the \ backslashes) indicates a group that you want to read
afterwards by calling .groups()
and these are built from simpler parts (like * and + and \d) which are described at http://docs.python.org/2/library/re.html
if you want to repeat this for many lines, you probably want something like:
name, weeks_worked, weekly_payment = [], [], []
for line in file.readlines():
match = line_format.match(line)
if match:
name.append(match.group(1))
weeks_worked.append(match.group(2))
weekly_payment.append(match.group(3))
else:
raise ...

writing p-values to file in R

Can someone help me with this piece of code. In a loop I'm saving p-values in f and then I want to write the p-values to a file but I don't know which function to use to write to file. I'm getting error with write function.
{
f = fisher.test(x, y = NULL, hybrid = FALSE, alternative = "greater",
conf.int = TRUE, conf.level = 0.95, simulate.p.value = FALSE)
write(f, file="fisher_pvalues.txt", sep=" ", append=TRUE)
}
Error in cat(list(...), file, sep, fill, labels, append) :
argument 1 (type 'list') cannot be handled by 'cat'
The return value from fisher.test is (if you read the docs):
Value:
A list with class ‘"htest"’ containing the following components:
p.value: the p-value of the test.
conf.int: a confidence interval for the odds ratio. Only present in
the 2 by 2 case and if argument ‘conf.int = TRUE’.
etc etc. R doesn't know how to write things like that to a file. More precisely, it doesn't know how YOU want it written to a file.
If you just want to write the p value, then get the p value and write that:
write(f$p.value,file="foo.values",append=TRUE)
f is an object of class 'htest', so writing it to a file will write much more than just the p-value.
If you do want to simply save a written representation of the results to a file, just as they appear on the screen, you can use capture.output() to do so:
Convictions <-
matrix(c(2, 10, 15, 3),
nrow = 2,
dimnames =
list(c("Dizygotic", "Monozygotic"),
c("Convicted", "Not convicted")))
f <- fisher.test(Convictions, alternative = "less")
capture.output(f, file="fisher_pvalues.txt", append=TRUE)
More likely, you want to just store the p-value. In that case you need to extract it from f before writing it to the file, using code something like this:
write(paste("p-value from Experiment 1:", f$p.value, "\n"),
file = "fisher_pvalues.txt", append=TRUE)

Resources