Lua word search - arrays

How would I go about doing a multiple pattern search in Lua? (I have Lpeg set up).
For example, say I'm receiving strings in a row, I'm processing one at a time, captalizing them and calling them msg. Now I want to get msg and check if it has any of the following patterns: MUFFIN MOOPHIN MUPHEN M0FF1N for a start. How can I check if msg has any of those (doesn't matter if it's more than one) whithout having to write a huge if(or or or or)?

One thing you could do is make a table of words you want to look for, then use gmatch to iterate each word in the string and check if it's in that table.
#!/usr/bin/env lua
function matchAny(str, pats)
for w in str:gmatch('%S+') do
if pats[w] then
return true
end
end
return false
end
pats = {
['MUFFIN'] = true,
['MOOPHIN'] = true,
['MUPHEN'] = true,
['M0FF1N'] = true,
}
print(matchAny("I want a MUFFIN", pats)) -- true
print(matchAny("I want more MUFFINs", pats)) -- false

A late answer but you can construct a pattern to match all words case-insensitively (only if not followed by an alphanum), capture match position in subject and word index that is being matched with something like this:
local lpeg = require("lpeg")
local function find_words(subj, words)
local patt
for idx, word in ipairs(words) do
word = lpeg.P(word:upper()) * lpeg.Cc(idx)
patt = patt and (patt + word) or word
end
local locale = lpeg.locale()
patt = lpeg.P{ lpeg.Cp() * patt * (1 - locale.alnum) + 1 * lpeg.V(1) }
return patt:match(subj:upper())
end
local words = { "MUFFIN", "MOOPHIN", "MUPHEN", "M0FF1N" }
local pos, idx = find_words("aaaaa bbb ccc muPHEN ddd", words)
-- output: 16, 3

Related

How do i copy an array from a lua file

I Want to copy an array from a text file and make another array equal it
so
local mapData = {
grass = {
cam = "hud",
x = 171,
image = "valley/grass",
y = 168,
animated = true
}
}
This is an array that is in Data.lua
i want to copy this array and make it equal another array
local savedMapData = {}
savedMapData = io.open('Data.lua', 'r')
Thank you.
It depends on Lua Version what you can do further.
But i like questions about file operations.
Because filehandlers in Lua are Objects with methods.
The datatype is userdata.
That means it has methods that can directly be used on itself.
Like the methods for the datatype string.
Therefore its easy going to do lazy things like...
-- Example open > reading > loading > converting > defining
-- In one Line - That is possible with methods on datatype
-- Lua 5.4
local savedMapData = load('return {' .. io.open('Data.lua'):read('a'):gsub('^.*%{', ''):gsub('%}.*$', '') .. '}')()
for k, v in pairs(savedMapData) do print(k, '=>', v) end
Output should be...
cam => hud
animated => true
image => valley/grass
y => 168
x => 171
If you need it in the grass table then do...
local savedMapData = load('return {grass = {' .. io.open('Data.lua'):read('a'):gsub('^.*%{', ''):gsub('%}.*$', '') .. '}}')()
The Chain of above methods do...
io.open('Data.lua') - Creates Filehandler (userdata) in read only mode
(userdata):read('a') - Reading whole File into one (string)
(string):gsub('^.*%{', '') - Replace from begining to first { with nothing
(string):gsub('%}.*$', '') - Replace from End to first } with nothing

Scala read only certain parts of file

I'm trying to read an input file in Scala that I know the structure of, however I only need every 9th entry. So far I have managed to read the whole thing using:
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val fields = lines.map(line => line.split(","))
The issue, this leaves me with an array that is huge (we're talking 20GB of data). Not only have I seen myself forced to write some very ugly code in order to convert between RDD[Array[String]] and Array[String] but it's essentially made my code useless.
I've tried different approaches and mixes between using
.map()
.flatMap() and
.reduceByKey()
however nothing actually put my collected "cells" into the format that I need them to be.
Here's what is supposed to happen: Reading a folder of text files from our server, the code should read each "line" of text in the format:
*---------*
| NASDAQ: |
*---------*
exchange, stock_symbol, date, stock_price_open, stock_price_high, stock_price_low, stock_price_close, stock_volume, stock_price_adj_close
and only keep a hold of the stock_symbol as that is the identifier I'm counting. So far my attempts have been to turn the entire thing into an array only collect every 9th index from the first one into a collected_cells var. Issue is, based on my calculations and real life results, that code would take 335 days to run (no joke).
Here's my current code for reference:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
object SparkNum {
def main(args: Array[String]) {
// Do some Scala voodoo
val sc = new SparkContext(new SparkConf().setAppName("Spark Numerical"))
// Set input file as per HDFS structure + input args
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val fields = lines.map(line => line.split(","))
var collected_cells:Array[String] = new Array[String](0)
//println("[MESSAGE] Length of CC: " + collected_cells.length)
val divider:Long = 9
val array_length = fields.count / divider
val casted_length = array_length.toInt
val indexedFields = fields.zipWithIndex
val indexKey = indexedFields.map{case (k,v) => (v,k)}
println("[MESSAGE] Number of lines: " + array_length)
println("[MESSAGE] Casted lenght of: " + casted_length)
for( i <- 1 to casted_length ) {
println("[URGENT DEBUG] Processin line " + i + " of " + casted_length)
var index = 9 * i - 8
println("[URGENT DEBUG] Index defined to be " + index)
collected_cells :+ indexKey.lookup(index)
}
println("[MESSAGE] collected_cells size: " + collected_cells.length)
val single_cells = collected_cells.flatMap(collected_cells => collected_cells);
val counted_cells = single_cells.map(cell => (cell, 1).reduceByKey{case (x, y) => x + y})
// val result = counted_cells.reduceByKey((a,b) => (a+b))
// val inmem = counted_cells.persist()
//
// // Collect driver into file to be put into user archive
// inmem.saveAsTextFile("path to server location")
// ==> Not necessary to save the result as processing time is recorded, not output
}
}
The bottom part is currently commented out as I tried to debug it, but it acts as pseudo-code for me to know what I need done. I may want to point out that I am next to not at all familiar with Scala and hence things like the _ notation confuse the life out of me.
Thanks for your time.
There are some concepts that need clarification in the question:
When we execute this code:
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val fields = lines.map(line => line.split(","))
That does not result in a huge array of the size of the data. That expression represents a transformation of the base data. It can be further transformed until we reduce the data to the information set we desire.
In this case, we want the stock_symbol field of a record encoded a csv:
exchange, stock_symbol, date, stock_price_open, stock_price_high, stock_price_low, stock_price_close, stock_volume, stock_price_adj_close
I'm also going to assume that the data file contains a banner like this:
*---------*
| NASDAQ: |
*---------*
The first thing we're going to do is to remove anything that looks like this banner. In fact, I'm going to assume that the first field is the name of a stock exchange that start with an alphanumeric character. We will do this before we do any splitting, resulting in:
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val validLines = lines.filter(line => !line.isEmpty && line.head.isLetter)
val fields = validLines.map(line => line.split(","))
It helps to write the types of the variables, to have peace of mind that we have the data types that we expect. As we progress in our Scala skills that might become less important. Let's rewrite the expression above with types:
val lines: RDD[String] = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val validLines: RDD[String] = lines.filter(line => !line.isEmpty && line.head.isLetter)
val fields: RDD[Array[String]] = validLines.map(line => line.split(","))
We are interested in the stock_symbol field, which positionally is the element #1 in a 0-based array:
val stockSymbols:RDD[String] = fields.map(record => record(1))
If we want to count the symbols, all that's left is to issue a count:
val totalSymbolCount = stockSymbols.count()
That's not very helpful because we have one entry for every record. Slightly more interesting questions would be:
How many different stock symbols we have?
val uniqueStockSymbols = stockSymbols.distinct.count()
How many records for each symbol do we have?
val countBySymbol = stockSymbols.map(s => (s,1)).reduceByKey(_+_)
In Spark 2.0, CSV support for Dataframes and Datasets is available out of the box
Given that our data does not have a header row with the field names (what's usual in large datasets), we will need to provide the column names:
val stockDF = sparkSession.read.csv("/tmp/quotes_clean.csv").toDF("exchange", "symbol", "date", "open", "close", "volume", "price")
We can answer our questions very easy now:
val uniqueSymbols = stockDF.select("symbol").distinct().count
val recordsPerSymbol = stockDF.groupBy($"symbol").agg(count($"symbol"))

Addressing an index in array in Lua

I am trying to write a simple game using Love 2d engine. It uses lua as the scripting language. I have some problems with arrays and can't find any solution. Here is my issue:
for i = 1, 10 do
objects.asteroids = {}
objects.asteroids[i] = {}
objects.asteroids[i].body = love.physics.newBody(world, 650/2, 650/2, "dynamic")
objects.asteroids[i].size = 3
objects.asteroids[i].angle = math.random(6)
end
In the same function I am trying to do a following operation:
for i = 1, 10 do
objects.asteroids[i].size = 2
end
And I get this error when trying to run my game:
Error
main.lua:48: attempt to index a nil value
Where line 48 refers to this line of code:
objects.asteroids[i].size = 2
You're overwriting objects.asteroids on each loop iteration.
for i = 1, 10 do
objects.asteroids = {} -- <== Here.
objects.asteroids[i] = {}
What this means is that the asteroid objects that you're trying to add end up being erased on the next step of the loop, since object.asteroids is set to a new {} table and the old one becomes inaccessible thereafter.
You might want to rearrange it like so:
objects.asteroids = {}
for i = 1, 10 do
objects.asteroids[i] = {}
-- ...

Read from text file and assign data to new variable

Python 3 program allows people to choose from list of employee names.
Data held on text file look like this: ('larry', 3, 100)
(being the persons name, weeks worked and payment)
I need a way to assign each part of the text file to a new variable,
so that the user can enter a new amount of weeks and the program calculates the new payment.
Below is my code and attempt at figuring it out.
import os
choices = [f for f in os.listdir(os.curdir) if f.endswith(".txt")]
print (choices)
emp_choice = input("choose an employee:")
file = open(emp_choice + ".txt")
data = file.readlines()
name = data[0]
weeks_worked = data[1]
weekly_payment= data[2]
new_weeks = int(input ("Enter new number of weeks"))
new_payment = new_weeks * weekly_payment
print (name + "will now be paid" + str(new_payment))
currently you are assigning the first three lines form the file to name, weeks_worked and weekly_payment. but what you want (i think) is to separate a single line, formatted as ('larry', 3, 100) (does each file have only one line?).
so you probably want code like:
from re import compile
# your code to choose file
line_format = compile(r"\s*\(\s*'([^']*)'\s*,\s*(\d+)\s*,\s*(\d+)\s*\)")
file = open(emp_choice + ".txt")
line = file.readline() # read the first line only
match = line_format.match(line)
if match:
name, weeks_worked, weekly_payment = match.groups()
else:
raise Exception('Could not match %s' % line)
# your code to update information
the regular expression looks complicated, but is really quite simple:
\(...\) matches the parentheses in the line
\s* matches optional spaces (it's not clear to me if you have spaces or not
in various places between words, so this matches just in case)
\d+ matches a number (1 or more digits)
[^']* matches anything except a quote (so matches the name)
(...) (without the \ backslashes) indicates a group that you want to read
afterwards by calling .groups()
and these are built from simpler parts (like * and + and \d) which are described at http://docs.python.org/2/library/re.html
if you want to repeat this for many lines, you probably want something like:
name, weeks_worked, weekly_payment = [], [], []
for line in file.readlines():
match = line_format.match(line)
if match:
name.append(match.group(1))
weeks_worked.append(match.group(2))
weekly_payment.append(match.group(3))
else:
raise ...

compare and delete on an array in matlab

I am trying to write a short code to read a .m file(testin1.m) into an array, and search for a particular word( 'auto'). if match is found,delete it. i have the following code, please help me figure out my mistake.
fid = fopen('testin1.m');
txt = textscan(fid,'%s');
fclose(fid);
m_file_idx = 1;
data=['auto'];
B=cellstr(data);
for idx = i : length(txt)
A=txt{i};
is_auto=isequal(A, B);
if is_auto==0
txt{i}=[];
end
end
if txt{i}=auto then it should delete that row.
AK4749's answer is absolutely correct in showing where you went wrong. I'll just add an alternative solution to yours, which is shorter:
C = textread('testin1.m', '%s', 'delimiter', '\n');
C = C(cellfun(#isempty, regexp(C, 'auto')));
That's it!
EDIT #1: answer modified to remove the lines that contains the word 'auto', not just the word itself.
EDIT #2: answer modified to accept regular expressions.
This is an error i have hit many amany many many times:
when you set txt(i) = [], you change the length of the array. Your for loop condition is no longer valid.
A better option would be to use the powerful indexing features:
A(find(condition)) = [];
or account for the change in length:
A(i) = [];
i--; % <-- or i++, it is too early to think, but you get the idea
EDIT: I just noticed you were also using A in your program. mine was just some random variable name, not the same A you might be using
When you set txt(i) = [], you changed the length of the array but the loop indexing does not account for the change. You can use logical indexing to avoid the loop and the problem.
Example:
wordToDelete = 'auto';
txt = {'foo', 'bar', 'auto', 'auto', 'baz', 'auto'}
match = strcmp(wordToDelete, txt)
txt = txt(~match)
Output:
txt =
'foo' 'bar' 'auto' 'auto' 'baz' 'auto'
match =
0 0 1 1 0 1
txt =
'foo' 'bar' 'baz'

Resources