Compare and delete on an array in MATLAB

I am trying to write a short piece of code that reads a .m file (testin1.m) into an array and searches for a particular word ('auto'). If a match is found, it should be deleted. I have the following code; please help me figure out my mistake.
fid = fopen('testin1.m');
txt = textscan(fid, '%s');
fclose(fid);
m_file_idx = 1;
data = ['auto'];
B = cellstr(data);
for i = 1 : length(txt)
    A = txt{i};
    is_auto = isequal(A, B);
    if is_auto == 0
        txt{i} = [];
    end
end
If txt{i} equals 'auto', then it should delete that row.

AK4749's answer is absolutely correct in showing where you went wrong. I'll just add an alternative solution to yours, which is shorter:
C = textread('testin1.m', '%s', 'delimiter', '\n');
C = C(cellfun(@isempty, regexp(C, 'auto')));
That's it!
EDIT #1: answer modified to remove the lines that contain the word 'auto', not just the word itself.
EDIT #2: answer modified to accept regular expressions.
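If you also want to write the cleaned lines back to disk, a minimal sketch (the output filename testin1_filtered.m is just an assumption, not from the original answer):
fid = fopen('testin1_filtered.m', 'w');  % hypothetical output file
fprintf(fid, '%s\n', C{:});              % one remaining line per row
fclose(fid);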

This is an error I have hit many, many, many times:
When you set txt(i) = [], you change the length of the array, so your for loop condition is no longer valid.
A better option would be to use the powerful indexing features:
A(find(condition)) = [];
or account for the change in length:
A(i) = [];
i = i - 1; % <-- MATLAB has no i-- operator; it is too early to think, but you get the idea
EDIT: I just noticed you were also using A in your program. Mine was just a random variable name, not the same A you might be using.
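A common way to account for the shifting indices is to loop backward, so a deletion never affects the elements you have yet to visit. A minimal sketch (assuming words is a cell array of strings, e.g. words = txt{1}):
for i = numel(words) : -1 : 1
    if strcmp(words{i}, 'auto')
        words(i) = []; % safe: only already-visited indices shift
    end
end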

When you set txt(i) = [], you changed the length of the array but the loop indexing does not account for the change. You can use logical indexing to avoid the loop and the problem.
Example:
wordToDelete = 'auto';
txt = {'foo', 'bar', 'auto', 'auto', 'baz', 'auto'}
match = strcmp(wordToDelete, txt)
txt = txt(~match)
Output:
txt =
'foo' 'bar' 'auto' 'auto' 'baz' 'auto'
match =
0 0 1 1 0 1
txt =
'foo' 'bar' 'baz'

Related

Scala read only certain parts of file

I'm trying to read an input file in Scala whose structure I know; however, I only need every 9th entry. So far I have managed to read the whole thing using:
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val fields = lines.map(line => line.split(","))
The issue is that this leaves me with a huge array (we're talking 20 GB of data). Not only have I been forced to write some very ugly code to convert between RDD[Array[String]] and Array[String], but it has essentially made my code useless.
I've tried different approaches and mixes between using
.map()
.flatMap() and
.reduceByKey()
however, nothing actually put my collected "cells" into the format I need them in.
Here's what is supposed to happen: Reading a folder of text files from our server, the code should read each "line" of text in the format:
*---------*
| NASDAQ: |
*---------*
exchange, stock_symbol, date, stock_price_open, stock_price_high, stock_price_low, stock_price_close, stock_volume, stock_price_adj_close
and only keep hold of the stock_symbol, as that is the identifier I'm counting. So far my attempt has been to turn the entire thing into an array and collect every 9th index from the first one into a collected_cells var. The issue is that, based on my calculations and real-life results, that code would take 335 days to run (no joke).
Here's my current code for reference:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SparkNum {
  def main(args: Array[String]) {
    // Do some Scala voodoo
    val sc = new SparkContext(new SparkConf().setAppName("Spark Numerical"))

    // Set input file as per HDFS structure + input args
    val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
    val fields = lines.map(line => line.split(","))

    var collected_cells: Array[String] = new Array[String](0)
    //println("[MESSAGE] Length of CC: " + collected_cells.length)

    val divider: Long = 9
    val array_length = fields.count / divider
    val casted_length = array_length.toInt

    val indexedFields = fields.zipWithIndex
    val indexKey = indexedFields.map { case (k, v) => (v, k) }

    println("[MESSAGE] Number of lines: " + array_length)
    println("[MESSAGE] Casted length of: " + casted_length)

    for (i <- 1 to casted_length) {
      println("[URGENT DEBUG] Processing line " + i + " of " + casted_length)
      var index = 9 * i - 8
      println("[URGENT DEBUG] Index defined to be " + index)
      collected_cells :+ indexKey.lookup(index)
    }

    println("[MESSAGE] collected_cells size: " + collected_cells.length)
    val single_cells = collected_cells.flatMap(collected_cells => collected_cells)
    val counted_cells = single_cells.map(cell => (cell, 1).reduceByKey{case (x, y) => x + y})

    // val result = counted_cells.reduceByKey((a,b) => (a+b))
    // val inmem = counted_cells.persist()
    //
    // // Collect driver into file to be put into user archive
    // inmem.saveAsTextFile("path to server location")
    // ==> Not necessary to save the result as processing time is recorded, not output
  }
}
The bottom part is currently commented out as I tried to debug it, but it acts as pseudo-code for me to know what I need done. I should point out that I am next to completely unfamiliar with Scala, so things like the _ notation confuse the life out of me.
Thanks for your time.
There are some concepts that need clarification in the question:
When we execute this code:
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val fields = lines.map(line => line.split(","))
That does not result in a huge array the size of the data. That expression represents a lazy transformation of the base data; it can be further transformed until we reduce the data to the information set we desire.
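As a small illustration (a sketch; these lines are not from the original post), no data is read when a transformation is declared, only when an action runs:
val fields = lines.map(line => line.split(","))  // declared, but nothing is read yet
val firstFew = fields.take(5)                    // this action triggers the actual read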
In this case, we want the stock_symbol field of a record encoded as CSV:
exchange, stock_symbol, date, stock_price_open, stock_price_high, stock_price_low, stock_price_close, stock_volume, stock_price_adj_close
I'm also going to assume that the data file contains a banner like this:
*---------*
| NASDAQ: |
*---------*
The first thing we're going to do is remove anything that looks like this banner. In fact, I'm going to assume that the first field is the name of a stock exchange, which starts with a letter. We will do this before any splitting, resulting in:
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val validLines = lines.filter(line => !line.isEmpty && line.head.isLetter)
val fields = validLines.map(line => line.split(","))
It helps to write out the types of the variables, for peace of mind that we have the data types we expect. As our Scala skills progress, that might become less important. Let's rewrite the expression above with types:
val lines: RDD[String] = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val validLines: RDD[String] = lines.filter(line => !line.isEmpty && line.head.isLetter)
val fields: RDD[Array[String]] = validLines.map(line => line.split(","))
We are interested in the stock_symbol field, which is element #1 in a 0-based array:
val stockSymbols: RDD[String] = fields.map(record => record(1))
If we want to count the symbols, all that's left is to issue a count:
val totalSymbolCount = stockSymbols.count()
That's not very helpful because we have one entry for every record. Slightly more interesting questions would be:
How many different stock symbols do we have?
val uniqueStockSymbols = stockSymbols.distinct.count()
How many records for each symbol do we have?
val countBySymbol = stockSymbols.map(s => (s,1)).reduceByKey(_+_)
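To peek at the result, a small action suffices (a sketch, not part of the original answer):
countBySymbol.take(10).foreach { case (symbol, n) => println(s"$symbol: $n") }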
In Spark 2.0, CSV support for DataFrames and Datasets is available out of the box.
Given that our data does not have a header row with the field names (as is usual in large datasets), we will need to provide the column names:
val stockDF = sparkSession.read.csv("/tmp/quotes_clean.csv").toDF("exchange", "symbol", "date", "open", "close", "volume", "price")
We can answer our questions very easily now:
val uniqueSymbols = stockDF.select("symbol").distinct().count
val recordsPerSymbol = stockDF.groupBy($"symbol").agg(count($"symbol"))
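For completeness: the sparkSession value, the $"symbol" column syntax, and the count function come from Spark SQL. A minimal setup sketch (the app name is just a placeholder):
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.count

val sparkSession = SparkSession.builder.appName("Quotes").getOrCreate()
import sparkSession.implicits._  // enables the $"symbol" syntax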

Reading and Writing Arrays from Multiple HDF Files in IDL

I am fairly new to IDL, and I am trying to write code that will take a MODIS HDF file (level-3 data, MOD14A1 and MYD14A1 to be specific), read the array, and then write the data from the array, preferably to a CSV file, though ASCII would work too. I have code that will do this for one file, but I want to be able to do it for multiple files. Essentially, I want it to read one HDF array, write it to a CSV, move to the next HDF file, and then write that array to the same CSV file in the next row. Any help here would be greatly appreciated. I've supplied the code I have so far, which handles one file.
filename = dialog_pickfile(filter = filter, path = path, title = title)
csv_file = 'Data.csv'
sd_id = HDF_SD_START(filename, /READ)
; read "FirePix", "MaxT21"
attr_index = HDF_SD_ATTRFIND(sd_id, 'FirePix')
HDF_SD_ATTRINFO, sd_id, attr_index, DATA = FirePix
attr_index = HDF_SD_ATTRFIND(sd_id, 'MaxT21')
HDF_SD_ATTRINFO, sd_id, attr_index, DATA = MaxT21
index = HDF_SD_NAMETOINDEX(sd_id, 'FireMask')
sds_id = HDF_SD_SELECT(sd_id, index)
HDF_SD_GETDATA, sds_id, FireMask
HDF_SD_ENDACCESS, sds_id
index = HDF_SD_NAMETOINDEX(sd_id, 'MaxFRP')
sds_id = HDF_SD_SELECT(sd_id, index)
HDF_SD_GETDATA, sds_id, MaxFRP
HDF_SD_ENDACCESS, sds_id
HDF_SD_END, sd_id
help, FirePix
print, FirePix, format = '(8I8)'
print, MaxT21, format = '("MaxT21:", F6.1, " K")'
help, FireMask, MaxFRP
WRITE_CSV, csv_file, FirePix
After I run this, and choose the correct file, this is the output I am getting:
FIREPIX LONG = Array[8]
0 4 0 0 3 12 3 0
MaxT21: 402.1 K
FIREMASK BYTE = Array[1200, 1200, 8]
MAXFRP LONG = Array[1200, 1200, 8]
The "FIREPIX" array is the one I want stored into a csv.
Thanks in advance for any help!
Instead of using WRITE_CSV, it is fairly simple to use the primitive IO routines to write a comma-separated array, i.e.:
openw, lun, csv_file, /get_lun
; the following line produces a single line in the output CSV file
printf, lun, strjoin(strtrim(firepix, 2), ', ')
; TODO: do the above line as many times as necessary
free_lun, lun
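To extend this to multiple files, one option is to let DIALOG_PICKFILE return several paths and append one CSV row per file. A sketch built from the question's own calls (untested; the /MULTIPLE_FILES usage and loop structure are my additions):
filenames = dialog_pickfile(filter = filter, path = path, title = title, /multiple_files)
openw, lun, 'Data.csv', /get_lun
for f = 0L, n_elements(filenames) - 1L do begin
  sd_id = hdf_sd_start(filenames[f], /read)
  attr_index = hdf_sd_attrfind(sd_id, 'FirePix')
  hdf_sd_attrinfo, sd_id, attr_index, data = firepix
  hdf_sd_end, sd_id
  printf, lun, strjoin(strtrim(firepix, 2), ', ')  ; one row per input file
endfor
free_lun, lun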

Lua word search

How would I go about doing a multiple-pattern search in Lua? (I have LPeg set up.)
For example, say I'm receiving strings in a row, processing them one at a time, capitalizing them, and calling each one msg. Now I want to check whether msg contains any of the following patterns: MUFFIN MOOPHIN MUPHEN M0FF1N, for a start. How can I check if msg has any of those (it doesn't matter if it matches more than one) without having to write a huge if (or or or or)?
One thing you could do is make a table of the words you want to look for, then use gmatch to iterate over each word in the string and check whether it's in that table.
#!/usr/bin/env lua
function matchAny(str, pats)
    for w in str:gmatch('%S+') do
        if pats[w] then
            return true
        end
    end
    return false
end

pats = {
    ['MUFFIN'] = true,
    ['MOOPHIN'] = true,
    ['MUPHEN'] = true,
    ['M0FF1N'] = true,
}

print(matchAny("I want a MUFFIN", pats)) -- true
print(matchAny("I want more MUFFINs", pats)) -- false
A late answer, but you can construct a single pattern that matches all the words case-insensitively (only when not followed by an alphanumeric character) and captures both the match position in the subject and the index of the word that matched, with something like this:
local lpeg = require("lpeg")

local function find_words(subj, words)
    local patt
    for idx, word in ipairs(words) do
        word = lpeg.P(word:upper()) * lpeg.Cc(idx)
        patt = patt and (patt + word) or word
    end
    local locale = lpeg.locale()
    patt = lpeg.P{ lpeg.Cp() * patt * (1 - locale.alnum) + 1 * lpeg.V(1) }
    return patt:match(subj:upper())
end

local words = { "MUFFIN", "MOOPHIN", "MUPHEN", "M0FF1N" }
local pos, idx = find_words("aaaaa bbb ccc muPHEN ddd", words)
-- output: 15, 3 (position of the match and index of the matched word)

Addressing an index in an array in Lua

I am trying to write a simple game using the Love2D engine, which uses Lua as the scripting language. I have some problems with arrays and can't find any solution. Here is my issue:
for i = 1, 10 do
    objects.asteroids = {}
    objects.asteroids[i] = {}
    objects.asteroids[i].body = love.physics.newBody(world, 650/2, 650/2, "dynamic")
    objects.asteroids[i].size = 3
    objects.asteroids[i].angle = math.random(6)
end
In the same function I am trying to do the following operation:
for i = 1, 10 do
    objects.asteroids[i].size = 2
end
And I get this error when trying to run my game:
Error
main.lua:48: attempt to index a nil value
Where line 48 refers to this line of code:
objects.asteroids[i].size = 2
You're overwriting objects.asteroids on each loop iteration.
for i = 1, 10 do
    objects.asteroids = {} -- <== Here.
    objects.asteroids[i] = {}
What this means is that the asteroid objects you're trying to add end up being erased on the next step of the loop, since objects.asteroids is set to a new {} table and the old one becomes inaccessible thereafter.
You might want to rearrange it like so:
objects.asteroids = {}
for i = 1, 10 do
    objects.asteroids[i] = {}
    -- ...
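Putting it together, the corrected initialization from the question would look like this (a sketch; world and the love.physics call are taken from the original code):
objects.asteroids = {} -- create the container once, before the loop
for i = 1, 10 do
    objects.asteroids[i] = {
        body = love.physics.newBody(world, 650/2, 650/2, "dynamic"),
        size = 3,
        angle = math.random(6),
    }
end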

writing p-values to file in R

Can someone help me with this piece of code? In a loop I'm saving p-values in f, and then I want to write the p-values to a file, but I don't know which function to use. I'm getting an error with the write function.
{
    f = fisher.test(x, y = NULL, hybrid = FALSE, alternative = "greater",
                    conf.int = TRUE, conf.level = 0.95, simulate.p.value = FALSE)
    write(f, file = "fisher_pvalues.txt", sep = " ", append = TRUE)
}
Error in cat(list(...), file, sep, fill, labels, append) :
argument 1 (type 'list') cannot be handled by 'cat'
The return value from fisher.test is (if you read the docs):
Value:
A list with class ‘"htest"’ containing the following components:
p.value: the p-value of the test.
conf.int: a confidence interval for the odds ratio. Only present in
the 2 by 2 case and if argument ‘conf.int = TRUE’.
etc etc. R doesn't know how to write things like that to a file. More precisely, it doesn't know how YOU want it written to a file.
If you just want to write the p value, then get the p value and write that:
write(f$p.value, file = "foo.values", append = TRUE)
f is an object of class 'htest', so writing it to a file will write much more than just the p-value.
If you do want to simply save a written representation of the results to a file, just as they appear on the screen, you can use capture.output() to do so:
Convictions <- matrix(c(2, 10, 15, 3),
                      nrow = 2,
                      dimnames = list(c("Dizygotic", "Monozygotic"),
                                      c("Convicted", "Not convicted")))
f <- fisher.test(Convictions, alternative = "less")
capture.output(f, file="fisher_pvalues.txt", append=TRUE)
More likely, you want to just store the p-value. In that case you need to extract it from f before writing it to the file, using code something like this:
write(paste("p-value from Experiment 1:", f$p.value, "\n"),
file = "fisher_pvalues.txt", append=TRUE)
