I'm trying to maintain/update/rewrite/fix a bit of Python that looks a bit like this:
variable = """My name is %s and it has been %s since I was born.
My parents decided to call me %s because they thought %s was a nice name.
%s is the same as %s.""" % (name, name, name, name, name, name)
There are little snippets that look like this all over the script, and I was wondering whether there's a simpler (more Pythonic?) way to write this code. I've found one instance of this that replaces the same variable about 30 times, and it just feels ugly.
Is the only way around the (in my opinion) ugliness to split it up into lots of little bits?
variable = """My name is %s and it has been %s since I was born.""" % (name, name)
variable += """My parents decided to call me %s because they thought %s was a nice name.""" % (name, name)
variable += """%s is the same as %s.""" % (name, name)
Use a dictionary instead.
var = '%(foo)s %(foo)s %(foo)s' % { 'foo': 'look_at_me_three_times' }
Or format with explicit numbering.
var = '{0} {0} {0}'.format('look_at_meeee')
Well, or format with named parameters.
var = '{foo} {foo} {foo}'.format(foo = 'python you so crazy')
Python 3.6 has introduced a simpler way to format strings. You can get details about it in PEP 498
>>> name = "Sam"
>>> age = 30
>>> f"Hello, {name}. You are {age}."
'Hello, Sam. You are 30.'
It also support runtime evaluation
>>>f"{2 * 30}"
'60'
It supports dictionary operation too
>>> comedian = {'name': 'Tom', 'age': 30}
>>> f"The comedian is {comedian['name']}, aged {comedian['age']}."
The comedian is Tom, aged 30.
Use formatting strings:
>>> variable = """My name is {name} and it has been {name} since..."""
>>> n = "alex"
>>>
>>> variable.format(name=n)
'My name is alex and it has been alex since...'
The text within the {} can be a descriptor or an index value.
Another fancy trick is to use a dictionary to define multiple variables in combination with the ** operator.
>>> values = {"name": "alex", "color": "red"}
>>> """My name is {name} and my favorite color is {color}""".format(**values)
'My name is alex and my favorite color is red'
>>>
Use the new string.format:
name = 'Alex'
variable = """My name is {0} and it has been {0} since I was born.
My parents decided to call me {0} because they thought {0} was a nice name.
{0} is the same as {0}.""".format(name)
>>> "%(name)s %(name)s hello!" % dict(name='foo')
'foo foo hello!'
You could use named parameters. See examples here
variable = """My name is {0} and it has been {0} since I was born.
My parents decided to call me {0} because they thought {0} was a nice name.
{0} is the same as {0}.""".format(name)
have a look at Template Strings
If you are using Python 3, than you can also leverage, f-strings
myname = "Test"
sample_string = "Hi my name is {name}".format(name=myname)
to
sample_string = f"Hi my name is {myname}"
Related
I'm trying to read an input file in Scala that I know the structure of, however I only need every 9th entry. So far I have managed to read the whole thing using:
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val fields = lines.map(line => line.split(","))
The issue, this leaves me with an array that is huge (we're talking 20GB of data). Not only have I seen myself forced to write some very ugly code in order to convert between RDD[Array[String]] and Array[String] but it's essentially made my code useless.
I've tried different approaches and mixes between using
.map()
.flatMap() and
.reduceByKey()
however nothing actually put my collected "cells" into the format that I need them to be.
Here's what is supposed to happen: Reading a folder of text files from our server, the code should read each "line" of text in the format:
*---------*
| NASDAQ: |
*---------*
exchange, stock_symbol, date, stock_price_open, stock_price_high, stock_price_low, stock_price_close, stock_volume, stock_price_adj_close
and only keep a hold of the stock_symbol as that is the identifier I'm counting. So far my attempts have been to turn the entire thing into an array only collect every 9th index from the first one into a collected_cells var. Issue is, based on my calculations and real life results, that code would take 335 days to run (no joke).
Here's my current code for reference:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
object SparkNum {
def main(args: Array[String]) {
// Do some Scala voodoo
val sc = new SparkContext(new SparkConf().setAppName("Spark Numerical"))
// Set input file as per HDFS structure + input args
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val fields = lines.map(line => line.split(","))
var collected_cells:Array[String] = new Array[String](0)
//println("[MESSAGE] Length of CC: " + collected_cells.length)
val divider:Long = 9
val array_length = fields.count / divider
val casted_length = array_length.toInt
val indexedFields = fields.zipWithIndex
val indexKey = indexedFields.map{case (k,v) => (v,k)}
println("[MESSAGE] Number of lines: " + array_length)
println("[MESSAGE] Casted lenght of: " + casted_length)
for( i <- 1 to casted_length ) {
println("[URGENT DEBUG] Processin line " + i + " of " + casted_length)
var index = 9 * i - 8
println("[URGENT DEBUG] Index defined to be " + index)
collected_cells :+ indexKey.lookup(index)
}
println("[MESSAGE] collected_cells size: " + collected_cells.length)
val single_cells = collected_cells.flatMap(collected_cells => collected_cells);
val counted_cells = single_cells.map(cell => (cell, 1).reduceByKey{case (x, y) => x + y})
// val result = counted_cells.reduceByKey((a,b) => (a+b))
// val inmem = counted_cells.persist()
//
// // Collect driver into file to be put into user archive
// inmem.saveAsTextFile("path to server location")
// ==> Not necessary to save the result as processing time is recorded, not output
}
}
The bottom part is currently commented out as I tried to debug it, but it acts as pseudo-code for me to know what I need done. I may want to point out that I am next to not at all familiar with Scala and hence things like the _ notation confuse the life out of me.
Thanks for your time.
There are some concepts that need clarification in the question:
When we execute this code:
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val fields = lines.map(line => line.split(","))
That does not result in a huge array of the size of the data. That expression represents a transformation of the base data. It can be further transformed until we reduce the data to the information set we desire.
In this case, we want the stock_symbol field of a record encoded a csv:
exchange, stock_symbol, date, stock_price_open, stock_price_high, stock_price_low, stock_price_close, stock_volume, stock_price_adj_close
I'm also going to assume that the data file contains a banner like this:
*---------*
| NASDAQ: |
*---------*
The first thing we're going to do is to remove anything that looks like this banner. In fact, I'm going to assume that the first field is the name of a stock exchange that start with an alphanumeric character. We will do this before we do any splitting, resulting in:
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val validLines = lines.filter(line => !line.isEmpty && line.head.isLetter)
val fields = validLines.map(line => line.split(","))
It helps to write the types of the variables, to have peace of mind that we have the data types that we expect. As we progress in our Scala skills that might become less important. Let's rewrite the expression above with types:
val lines: RDD[String] = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val validLines: RDD[String] = lines.filter(line => !line.isEmpty && line.head.isLetter)
val fields: RDD[Array[String]] = validLines.map(line => line.split(","))
We are interested in the stock_symbol field, which positionally is the element #1 in a 0-based array:
val stockSymbols:RDD[String] = fields.map(record => record(1))
If we want to count the symbols, all that's left is to issue a count:
val totalSymbolCount = stockSymbols.count()
That's not very helpful because we have one entry for every record. Slightly more interesting questions would be:
How many different stock symbols we have?
val uniqueStockSymbols = stockSymbols.distinct.count()
How many records for each symbol do we have?
val countBySymbol = stockSymbols.map(s => (s,1)).reduceByKey(_+_)
In Spark 2.0, CSV support for Dataframes and Datasets is available out of the box
Given that our data does not have a header row with the field names (what's usual in large datasets), we will need to provide the column names:
val stockDF = sparkSession.read.csv("/tmp/quotes_clean.csv").toDF("exchange", "symbol", "date", "open", "close", "volume", "price")
We can answer our questions very easy now:
val uniqueSymbols = stockDF.select("symbol").distinct().count
val recordsPerSymbol = stockDF.groupBy($"symbol").agg(count($"symbol"))
Hiya i have made a program that stores the player name and strength..Here is the code:
data = {
"PLAYER":name2,
"STRENGTH":str(round(strength, 2)),
}
with open("data2.txt", "w", encoding="utf-8") as file:
file.write(repr(data))
file.close()
So this stores the data so what to i do if i wanna append/change the value after a certain action usch as a 'BATTLE'
Is it possible the get the variable of 'STRENGTH' and then change the number?
At the moment to read data from the external file 'DATA1.txt'i am using this code:
with open("data1.txt", "r", encoding="utf-8") as file:
data_string = file.readline()
data = eval(data_string)
# (data["STRENGTH"])
S1 = (float(data["STRENGTH"]))
file.close()
Now i can do something with the variable --> 'S1'
Here is the external text file 'data1.txt'
{'PLAYER': 'Oreo', 'STRENGTH': '11.75'}
... But i wanna change the strength value after a "battle" many thanks
Maybe you're not understanding Python dict semantics?
Seems to me you're doing a lot of unnecessary things like S1 = (float(data['STRENGTH'])) to try to manipulate and change values when you could be doing really simple stuff.
>>> data = {'PLAYER': 'Oreo', 'STRENGTH': '11.75'}
>>> data['STRENGTH'] = float(data['STRENGTH'])
>>> data
{'PLAYER': 'Oreo', 'STRENGTH': 11.75}
>>> data['STRENGTH'] += 1
>>> data
{'PLAYER': 'Oreo', 'STRENGTH': 12.75}
Maybe you should give Native Data Types -- Dive Into Python 3 a read to see if it clears things up.
Python 3 program allows people to choose from list of employee names.
Data held on text file look like this: ('larry', 3, 100)
(being the persons name, weeks worked and payment)
I need a way to assign each part of the text file to a new variable,
so that the user can enter a new amount of weeks and the program calculates the new payment.
Below is my code and attempt at figuring it out.
import os
choices = [f for f in os.listdir(os.curdir) if f.endswith(".txt")]
print (choices)
emp_choice = input("choose an employee:")
file = open(emp_choice + ".txt")
data = file.readlines()
name = data[0]
weeks_worked = data[1]
weekly_payment= data[2]
new_weeks = int(input ("Enter new number of weeks"))
new_payment = new_weeks * weekly_payment
print (name + "will now be paid" + str(new_payment))
currently you are assigning the first three lines form the file to name, weeks_worked and weekly_payment. but what you want (i think) is to separate a single line, formatted as ('larry', 3, 100) (does each file have only one line?).
so you probably want code like:
from re import compile
# your code to choose file
line_format = compile(r"\s*\(\s*'([^']*)'\s*,\s*(\d+)\s*,\s*(\d+)\s*\)")
file = open(emp_choice + ".txt")
line = file.readline() # read the first line only
match = line_format.match(line)
if match:
name, weeks_worked, weekly_payment = match.groups()
else:
raise Exception('Could not match %s' % line)
# your code to update information
the regular expression looks complicated, but is really quite simple:
\(...\) matches the parentheses in the line
\s* matches optional spaces (it's not clear to me if you have spaces or not
in various places between words, so this matches just in case)
\d+ matches a number (1 or more digits)
[^']* matches anything except a quote (so matches the name)
(...) (without the \ backslashes) indicates a group that you want to read
afterwards by calling .groups()
and these are built from simpler parts (like * and + and \d) which are described at http://docs.python.org/2/library/re.html
if you want to repeat this for many lines, you probably want something like:
name, weeks_worked, weekly_payment = [], [], []
for line in file.readlines():
match = line_format.match(line)
if match:
name.append(match.group(1))
weeks_worked.append(match.group(2))
weekly_payment.append(match.group(3))
else:
raise ...
Is there a way to assign file names to set varibles using a GUI? Say I have 6 file sets which contain 4 colors each (blue, green, nir, red). There are 24 files in total, so i'd need 24 variables. And I want the set varialbes to be something like
blue1
green1
nir1
red1
blue2
green2
nir2
red2
etc...
Currently I'm trying to use GUIDE to creat a custom GUI that will allow the user to select the files they wish and have them assigned to certain variables. I am thinking something along the lines of having 24 popupmenus that are attached to a file directory and allows the user to select which file they want, and then it will assign that file and it's path to a variable (blue1 for example) I also want 24 check boxes to associate with an if statement
Let's say popupmenu1 is associated with the variable blue1 and checkbox1
if checkbox1 == checked
do import
elseif checkbox1 == unchecked
fill with zeros
I have the basic frame of the GUI created, I am just unclear on how to apply the file select and then associate the if statements, etc...
If you know the variable files in advance, it's bad practice (look also here and here) to use string defined variable names like this:
var1name = 'blue';
var2name = 'red';
% etc.
% load data
datablue=rand(4,1);
datared =rand(4,1);
% assign
eval([var1name '1 = datablue(1);']);
eval([var2name '1 = datared (1);']);
% etc.
eval([var1name '2 = datablue(2);']);
eval([var2name '1 = datared (2);']);
% etc
It's much easier and better to just use an ordinary array, given the variable name is not changing or application dependend, which in my example I already have as datablue and datared.
Another option if you'd like user defined variable names is to use an array of structs:
var1name = 'blue';
var2name = 'red';
sample(1).(var1name) = datablue(1);
sample(1).(var2name) = datared (1);
% ...
sample(2).(var1name) = datablue(2);
sample(2).(var2name) = datared (2);
Try some of these out, and only if you have a very good reason, resort to eval!
for k = 1:6
blue(k) = sprintf('blue%d', k);
green(k) = sprintf('green%d', k);
nir(k) = sprintf('nir%d', k);
red(k) = sprintf('red%d', k);
end
This will create the variable names for you. Then you can use assignin (i believe) or eval to set the values to the variable names.
I am trying to extract the Interface from an array created from an SNMP Query.
I want to create an array like THIS:
my #array = ( "Gig 11/8",
"Gig 10/1",
"Gig 10/4",
"Gig 10/2");
It currently looks like THIS:
my #array =
( "orem-g13ap-01 Gig 11/8 166 T AIR-LAP11 Gig 0",
"orem-g15ap-06 Gig 10/1 127 T AIR-LAP11 Gig 0",
"orem-g15ap-05 Gig 10/4 168 T AIR-LAP11 Gig 0",
"orem-g13ap-03 Gig 10/2 132 T AIR-LAP11 Gig 0");>
I am doing THIS:
foreach $ints (#array) {
#gig = substr("$ints", 17, 9);
print("Interface: #gig");
Sure it works, but the hostname [orem-g15ap-01] doesn't always stay the same length, it varies depending on the site. I need to extract the word "Gig" plus the next 6 characters. I have no idea what is the best way of doing this.
I am a novice at perl but trying. Thanks
# "I need to extract the word "Gig" plus the next 6 characters."
# This looks like a fixed-with file format, so consider using unpack.
foreach ( #lines ) {
my( $orem, $gig, $rest ) = unpack 'a17 a9 a*';
print "[$gig]\n";
}
If it's not fixed-with format, then you need to find out what the file spec is and then maybe use a regular expression, something like:
my( $orem, $gig, $rest ) = m/(\S+)\s+(.{9})(.*)/;
But this will not work in the general case without a proper file spec.
Stuff like that is what Perl is made for. Regular Expressions are the way to go. Read the perldoc perlre.
foreach $ints (#array) {
$ints =~ s/(Gig.{6})/$1/g;
}
So you want the second and third field.
my #array = map { /^\S+\s+(\S+\s\S+)/s } #source;
This one is like ikegami's, but I recommend that if you know how something you want looks, then by all means, specify that. Because this is done in a list context, any string that does not match the spec, returns an empty list--or is ignored.
my #results = map { m!(\bGig\s+\d+/d+)! } #array;