Python: read one word per line of a text file

This isn't complete code, but I want to know if there is a way to pick out just one word without using .split(), since that builds a list and I don't want that, using this snippet:
f=(i for i in fin.xreadlines())
for i in f:
    try:
        match=re.search(r"([A-Z]+\b) | ([A-Z\'w]+\b) | (\b[A-Z]+\b) | (\b[A-Z\'w]+\b) | (.\w+\b)", i) # | r"[A-Z\'w]+\b" | r"\b[A-Z]+\b" | r"\b[A-Z\'w]+\b" | r".\w+\b"
Also, can I make a reusable class module like this:
class LineReader: # Intended only to be used with a for loop
    def __init__(self,filename):
        self.fin=open(filename,'r')
    def __getitem__(self,index):
        line=self.fin.xreadline()
        return line.split()
where, say, f=LineReader(filepath)
and for i in f.getitem(index=line number 25) starts the loop from there?
I don't know how to do that. Any tips?

To get the first word of a line:
line[:max(line.find(' '), 0) or None]
line.find(' ') searches for the first space and returns its index. If no space is found, it returns -1.
max(..., 0) makes sure the result is never negative, turning -1 into 0. This is useful because bool(-1) is True and bool(0) is False.
x or None evaluates to x if x != 0 else None
and finally line[:None] is equal to line[:], which returns a string identical to line.
First sample:
with open('file') as f:
    for line in f:
        word = line[:max(line.find(' '), 0) or None]
        if condition(word):
            do_something(word)
And the class (implemented as a generator here)
def words(stream):
    for line in stream:
        yield line[:max(line.find(' '), 0) or None]
Which you could use like
gen = words(f)
for word in gen:
    if condition(word):
        print(word)
Or
gen = words(f)
while 1:
    try:
        word = next(gen)
        if condition(word):
            print(word)
    except StopIteration:
        break # we reached the end
But you also wanted to start reading from a certain line number. This can't be done very efficiently if you don't know the lengths of the lines. The only way is to read lines and discard them until you reach the right line number.
def words(stream, start=-1): # you could replace the -1 with 0 and remove the +1
    for i in range(start+1): # it depends on whether you start counting with 0 or 1
        try:
            next(stream)
        except StopIteration:
            break
    for line in stream:
        yield line[:max(line.find(' '), 0) or None]
Be aware that you could get strange results if a line starts with a space: find then returns 0, the "or None" turns that into None, and the whole line comes back. To prevent that, you could insert line = line.lstrip() at the beginning of the loop.
Disclaimer: None of this code is tested
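As a hedged alternative to the slicing expression above, here is a minimal sketch that takes the first whitespace-delimited word of each line with a regular expression and skips the leading lines with itertools.islice. The name first_words and the zero-based start parameter are assumptions of this sketch, not part of the answer; a line without any word yields an empty string.

```python
import io
import re
from itertools import islice

# Matches the first run of non-whitespace characters on a line.
_FIRST_WORD = re.compile(r"\S+")

def first_words(stream, start=0):
    """Yield the first word of every line, beginning at zero-based line `start`."""
    for line in islice(stream, start, None):
        match = _FIRST_WORD.search(line)
        yield match.group(0) if match else ""

# Usage with an in-memory file; a real file object works the same way.
sample = io.StringIO("alpha beta\n  gamma delta\n\nepsilon\n")
print(list(first_words(sample, start=1)))
```

Because the pattern ignores leading whitespace, a line that starts with a space needs no special handling here. Skipped lines are still read and discarded, so the cost of starting at line n stays proportional to n.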

Related

Ruby Array Elements

I am trying to create a password generator in Ruby. At the moment everything works; I just got stuck on the final piece, generating the password.
I ask the user whether the password should include numbers, lowercase or uppercase letters.
The user enters 1 for YES and 0 for NO.
I used the code below to generate the password if everything is 1, meaning the user wants to include numbers, lowercase and uppercase.
if numbers == 1 && lowercase == 1 && uppercase == 1
  passGen = [(0..9).to_a + ('A'..'Z').to_a + ('a'..'z').to_a].flatten.sample(10)
end
p passGen
This works 90% of the time. 10% of the time the generated password will not include, say, any numbers, although everything else is present. I am not sure if this is because of the size or length of the array from which the password is sampled.
Anyway, let's go to the main problem below.
Here is the problem: I am struggling to write the code to generate the password if one or more of the inputs is 0, that is, if the user doesn't want to include numbers, or wants no numbers and no uppercase, etc. As I can't predict what the user may or may not want, I need help with this please.
Thank you.
You will need to make your input array more dynamic:
passGen = []
passGen += (0..9).to_a if numbers == 1
passGen += ('A'..'Z').to_a if uppercase == 1
passGen += ('a'..'z').to_a if lowercase == 1
passGen.sample(10).join
Now, to tackle your other issue with missing characters: this happens because you are simply taking 10 random characters from the array, so the sample can, for example, consist of letters only.
To fix this, you need to take one character from each generator first, then generate the remaining characters randomly and shuffle the result:
def generators(numbers:, lowercase:, uppercase:)
  [
    (0..9 if numbers),
    ('A'..'Z' if uppercase),
    ('a'..'z' if lowercase)
  ].compact.map(&:to_a)
end
def generate_password(generators:, length:, min_per_generator: 1)
  # One guaranteed character per enabled group, then random fill
  chars = generators.flat_map {|g| Array.new(min_per_generator) { g.sample }}
  chars += Array.new(length - chars.length) { generators.sample.sample }
  chars.shuffle.join
end
gens = generators(numbers: numbers == 1, uppercase: uppercase == 1, lowercase: lowercase == 1)
Array.new(10) { generate_password(generators: gens, length: 10) }
The code doesn't know it needs to include a digit/letter from every group. sample takes random characters, and since you are basically sampling from 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz, there is a possibility that none of the picked characters is a digit.
The easiest way to fix it is to check whether a character from every group is in the "password" and then replace a random character with one from a group that is not present.
If I were to program this, I would do it like this:
def random_from_range(range)
  range.to_a.sample.to_s
end
def passGen(numbers, lowercase, uppercase)
  result = ''
  possibleSigns = []
  if numbers == 1
    range = (0..9)
    result += random_from_range(range)
    possibleSigns += range.to_a
  end
  if lowercase == 1
    range = ('a'..'z')
    result += random_from_range(range)
    possibleSigns += range.to_a
  end
  if uppercase == 1
    range = ('A'..'Z')
    result += random_from_range(range)
    possibleSigns += range.to_a
  end
  desired_length = 10
  while result.length < desired_length
    result += possibleSigns.sample.to_s
  end
  # Shuffle so the guaranteed characters are not always at the start
  result.chars.shuffle.join
end
puts passGen(1,1,1)
By saying (0..9).to_a + ('A'..'Z').to_a + ('a'..'z').to_a, you're creating an Array of 10 + 26 + 26 = 62 elements, and then you pick only 10 elements out of it.
In your place, I'd wrap the password generation in an until loop:
def generate_password_with_digits_and_caps
  [(0..9).to_a + ('A'..'Z').to_a + ('a'..'z').to_a].flatten.sample(10).join
end
passGen = ''
until passGen.match(/[A-Z]/) && passGen.match(/[a-z]/) && passGen.match(/\d/)
  passGen = generate_password_with_digits_and_caps
end
This could also work (closer to your snippet):
if numbers == 1 && lowercase == 1 && uppercase == 1
  passGen = ''
  until passGen.match(/[A-Z]/) && passGen.match(/[a-z]/) && passGen.match(/\d/)
    passGen = [(0..9).to_a + ('A'..'Z').to_a + ('a'..'z').to_a].flatten.sample(10).join
  end
end
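To get a feel for how often the retry loop above has to run, the chance that a sample misses a whole group can be worked out directly. The sketch below is Python rather than Ruby and relies on the fact that sample(10) draws without replacement; the function name prob_missing_group is an assumption of this sketch.

```python
from math import comb

def prob_missing_group(pool_size, group_size, picks):
    """Chance that `picks` distinct draws from the pool contain no member of the group."""
    return comb(pool_size - group_size, picks) / comb(pool_size, picks)

# 62 characters in total: 10 digits, 26 uppercase, 26 lowercase; 10 are sampled.
no_digit = prob_missing_group(62, 10, 10)
no_upper = prob_missing_group(62, 26, 10)
print(f"no digit: {no_digit:.3f}, no uppercase: {no_upper:.4f}")
```

The digit-free case comes out at roughly 15%, which is in the same range as the failure rate reported in the question, so the loop usually finishes after one or two attempts.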
Start with something simple and stupid:
passGen = (('0'..'9').to_a.sample(1) + ('A'..'Z').to_a.sample(1) + ('a'..'z').to_a.sample(8)).shuffle.join
Technically speaking, this already fulfills your requirement. From the viewpoint of aesthetics and security, the disadvantage here is that the number of lowercase characters is always 8. A more elegant solution would be to find three non-zero integers which add up to 10 and use them as the arguments for the sample calls. Also, if no numbers are requested, you simply pass 0 as the argument to that sample call.
Since this exceeds the scope of your question, and I don't even know whether you want to go that far, I won't elaborate on this further here.
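The idea of finding non-zero integers that add up to 10 can be sketched as follows, in Python rather than Ruby. The function names random_parts and generate_password are assumptions of this sketch, and it expects at least one group and a length no smaller than the number of groups. Unlike sample, random.choices may repeat a character.

```python
import random
import string

def random_parts(total, count):
    """Split `total` into `count` positive integers, chosen at random."""
    # Pick count-1 distinct cut points strictly inside 0..total.
    cuts = sorted(random.sample(range(1, total), count - 1))
    return [b - a for a, b in zip([0] + cuts, cuts + [total])]

def generate_password(groups, length=10):
    """Build a password with at least one character from every group."""
    chars = []
    for group, n in zip(groups, random_parts(length, len(groups))):
        chars += random.choices(group, k=n)  # characters may repeat
    random.shuffle(chars)
    return "".join(chars)

# Only the groups the user asked for are passed in (here: all three).
groups = [string.digits, string.ascii_uppercase, string.ascii_lowercase]
print(generate_password(groups))
```

Leaving a group out of the list plays the role of passing 0 for it. For real passwords, the standard secrets module would be a more appropriate source of randomness than random.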

How to loop assigning characters in a string to variable?

I need to take a string and assign each character to a new string variable for a Text To Speech engine to read out each character separately, mainly to control the speed at which it's read out by adding pauses in between each character.
The string contains a number which can vary in length from 6 digits to 16 digits, and I've put the below code together for 6 digits but would like something neater to handle any different character count.
I've done a fair bit of research but can't seem to find a solution, plus I'm new to Groovy / programming.
OrigNum= "12 34 56"
Num = OrigNum.replace(' ','')
sNum = Num.split("(?!^)")
sDigit1 = sNum[0]
sDigit2 = sNum[1]
sDigit3 = sNum[2]
sDigit4 = sNum[3]
sDigit5 = sNum[4]
sDigit6 = sNum[5]
Edit: The reason for needing a new variable for each character is the app that I'm using doesn't let the TTS engine run any code. I have to specifically declare a variable beforehand for it to be read out
Sample TTS input: "The number is [var:sDigit1] [pause] [var:sDigit2] [pause]..."
I've tried using [var:sNum[0]] [var:sNum[1]] to read from the array instead, but it is not recognised.
Read this about dynamically creating variable names.
You could use a map in your situation, which is cleaner and more Groovy:
Map digits = [:]
OrigNum.replaceAll("\\s","").eachWithIndex { digit, index ->
    digits[index] = digit
}
println digits[0] // first element == 1
println digits[digits.size() - 1] // last element == 6 (a Map has no negative indexing, so digits[-1] would be null)
println digits.size() // 6
Not 100% sure what you need, but to convert your input String to output you could use:
String origNum = "12 34 56"
String out = 'The number is ' + origNum.replaceAll( /\s/, '' ).collect{ "[var:$it]" }.join( ' [pause] ' )
gives:
The number is [var:1] [pause] [var:2] [pause] [var:3] [pause] [var:4] [pause] [var:5] [pause] [var:6]
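Since the question says every character needs its own pre-declared variable, one option is to build the variable names and the prompt together, for any number of digits. The following is a sketch in Python rather than Groovy; the function name build_tts is hypothetical, and the [var:...] and [pause] syntax is taken from the sample TTS input in the question.

```python
def build_tts(number_text):
    """Return (variables, prompt) for a spaced digit string such as "12 34 56"."""
    digits = "".join(number_text.split())  # drop all whitespace
    # One named variable per character: sDigit1, sDigit2, ...
    variables = {f"sDigit{i}": ch for i, ch in enumerate(digits, start=1)}
    prompt = "The number is " + " [pause] ".join(f"[var:{name}]" for name in variables)
    return variables, prompt

variables, prompt = build_tts("12 34 56")
print(variables["sDigit1"], variables["sDigit6"])
print(prompt)
```

Whether the app accepts variables declared this way is an open question; the sketch only shows how to avoid writing one assignment per digit by hand.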

How do you count number of characters from each lines then add them all up?

I have been given a question to write a function "that returns a count of the number of characters in the file whose name is given as a parameter."
So if a file called "data.txt" contains "Hi there!" and I print the result of my code below, it returns the value 10 (which is correct).
"""Attempting Question 7.
Author: Ark
Date: 28/04/2015
"""
def file_size(filename):
    """extracts word from a line"""
    filename = open(filename, 'r')
    for line in filename:
        result = len(line) #count number of characters in a line.
        return result
However, let's say I have made another file called "data2.txt" and it contains
EEEEE
DDDD
CCC
BB
A
If I print this out, it gives the value 6. So my challenge starts here: what can I do in my code to read all the lines and add them up?
print(file_size("data2.txt"))
expected 16 words (?)
You must sum the lengths of the lines; right now you return the length of the very first line.
Also, if the newline at the end of each line should not be counted, you must strip it. This should work:
def character_count(filename):
    with open(filename) as f:
        return sum(len(line.rstrip("\n")) for line in f)
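If the newline characters should count, as the expected value of 10 for "Hi there!" in the question suggests, a minimal sketch is to read the whole file and take its length. The temporary file below exists only to make the example self-contained; passing newline="" is a choice of this sketch so that line endings are counted exactly as stored.

```python
import os
import tempfile

def file_size(filename):
    """Return the number of characters in the file, newlines included."""
    with open(filename, newline="") as f:  # newline="" keeps line endings untranslated
        return len(f.read())

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")
    with open(path, "w", newline="") as f:
        f.write("Hi there!\n")
    print(file_size(path))  # 10
```

Reading everything at once is fine for small files; for very large ones, summing len(line) over the lines gives the same count without holding the whole file in memory.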

Print words from the corresponding line numbers

Hello Everyone,
I have two files, File1 and File2, which have the following data.
File1:
TOPIC:topic_0 30063951.0
2 19195200.0
1 7586580.0
3 2622580.0
TOPIC:topic_1 17201790.0
1 15428200.0
2 917930.0
10 670854.0
and so on. There are 15 topics, and each topic has its own weight. The numbers in the first column (2, 1, 3, ...) have corresponding words in File2. For example,
File 2 has:
1 i
2 new
3 percent
4 people
5 year
6 two
7 million
8 president
9 last
10 government
and so on. There are about 10,470 lines of words. In short, I want the corresponding words in the first column of File1 instead of the line numbers. My output should look like this:
TOPIC:topic_0 30063951.0
new 19195200.0
i 7586580.0
percent 2622580.0
TOPIC:topic_1 17201790.0
i 15428200.0
new 917930.0
government 670854.0
My Code:
import sys
d1 = {}
n = 1
with open("ap_vocab.txt") as in_file2:
    for line2 in in_file2:
        #print n, line2
        d1[n] = line2[:-1]
        n = n + 1
with open("ap_top_t15.txt") as in_file:
    for line1 in in_file:
        columns = line1.split(' ')
        firstwords = columns[0]
        #print firstwords[:-8]
        if firstwords[:-8] == 'TOPIC':
            print columns[0], columns[1]
        elif firstwords[:-8] != '\n':
            num = columns[0]
            print d1[n], columns[1]
This code runs when I type print d1[2], columns[1] instead, giving the second word in File2 for all the lines. But when the code above is run, it gives an error:
KeyError: 10472
There are 10472 lines of words in File2. Please help me with what I should do to rectify this. Thanks in advance!
In your first for loop, n is incremented with each line until reaching a final value of 10472. You are only setting values for d1[n] up to 10471 however, as you have placed the increment after you set d1 for your given n, with these two lines:
d1[n] = line2[:-1]
n = n + 1
Then on the line
print d1[n], columns[1]
in your second for loop (for in_file), you are attempting to access d1[10472], which doesn't exist. The underlying problem is that n is the counter left over from the first loop, not the number read from the current line. You already store that number in num = columns[0], so look the word up with it, converting the string to an integer first because the keys of d1 are integers:
print d1[int(num)], columns[1]
A dictionary with integer keys works fine for this lookup; you do not need a list or an OrderedDict. Note that d1[-1] would not give you the last entry: a dictionary has no negative indexing, so -1 is simply another key that doesn't exist.
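A minimal sketch of the whole task, looking each word up by the number on the current line rather than by the leftover counter n. It assumes, as the question's code does, that the vocabulary file holds one word per line and that the line number is the key; the function names load_vocab and replace_ids are hypothetical, and in-memory lists stand in for the two files.

```python
def load_vocab(lines):
    """Map 1-based line numbers to words, one word per line."""
    return {n: line.strip() for n, line in enumerate(lines, start=1)}

def replace_ids(topic_lines, vocab):
    """Yield each line with a leading word number replaced by its word."""
    for line in topic_lines:
        columns = line.split()
        if not columns:
            continue  # skip blank lines
        if columns[0].startswith("TOPIC"):
            yield " ".join(columns)
        else:
            # The key comes from the current line, converted to int.
            yield " ".join([vocab[int(columns[0])]] + columns[1:])

vocab = load_vocab(["i\n", "new\n", "percent\n"])
topics = ["TOPIC:topic_0 30063951.0\n", "2 19195200.0\n", "1 7586580.0\n", "3 2622580.0\n"]
for out in replace_ids(topics, vocab):
    print(out)
```

Open file objects can be passed in place of the lists, since both functions only iterate over lines.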

Handling CR line endings in Lua

I'm trying to read a file with CR line endings using the file:read method which seems to be acting up for some reason. The file contents look like this:
ABCDEFGH
12345
##
6789
I want it to behave consistently with all types of line endings. Every time I try to read the file, it returns the last line in the file concatenated with any trailing characters from the previous lines that lie beyond the last character of the last line. Here's what I mean:
> file=io.open("test.lua", "rb")
> function re_read(openFile)
    openFile:seek("set");
    return openFile:read("*a");
end
> =re_read(file) -- With CR
67895FGH
> =re_read(file) -- With CRLF
ABCDEFGH
12345
##
6789
> =re_read(file) -- with LF
ABCDEFGH
12345
##
6789
>
As you can see, the string being returned is the last line plus the 5 from the second line plus FGH from the first line. Any lines shorter than the last line are skipped.
My goal is to use the file:lines() method to read the file line by line. My hope is that if a 'fix' for file:read is found, it can be applied to file:lines() as well.
In the case with CR only, re_read actually works as expected: it returns the lines separated by CR. But when the interpreter displays it, it interprets the CR characters as "go back to the beginning of the line". So here is how the result changes line by line:
ABCDEFGH
12345FGH
##345FGH
67895FGH
EDIT: here it is character by character, with a "virtual cursor" (|).
|
A|
AB|
ABC|
ABCD|
ABCDE|
ABCDEF|
ABCDEFG|
ABCDEFGH|
|ABCDEFGH
1|BCDEFGH
12|CDEFGH
123|DEFGH
1234|EFGH
12345|FGH
|12345FGH
#|2345FGH
##|345FGH
|##345FGH
6|#345FGH
67|345FGH
678|45FGH
6789|5FGH
Proof:
> s = "ABCDEFGH\r12345\r##\r6789"
> =s
67895FGH
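The overwrite effect can also be reproduced without a terminal. The sketch below, in Python rather than Lua, simulates a single display line in which a carriage return moves the cursor back to column 0; the function name render_with_cr is an assumption of this sketch, and other control characters are not handled.

```python
def render_with_cr(text):
    """Simulate how a terminal shows `text` on one line when CR resets the cursor."""
    buf = []   # characters currently visible on the line
    col = 0    # cursor position
    for ch in text:
        if ch == "\r":
            col = 0                # jump back to the start of the line
        elif col < len(buf):
            buf[col] = ch          # overwrite what is already there
            col += 1
        else:
            buf.append(ch)         # extend the line
            col += 1
    return "".join(buf)

print(render_with_cr("ABCDEFGH\r12345\r##\r6789"))  # 67895FGH
```

The string itself still contains all four lines; only its display is misleading.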
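You could normalize your line endings with gsub, then iterate over the result with gmatch.
local function cr_lines(s)
    -- Turn CRLF and lone CR into LF; the parentheses keep only gsub's first result.
    s = (s:gsub('\r\n?', '\n'))
    -- Terminate the last line so that gmatch does not drop it.
    if s ~= '' and s:sub(-1) ~= '\n' then
        s = s .. '\n'
    end
    return s:gmatch('(.-)\n')
end
local function cr_file_lines(filename)
    local f = assert(io.open(filename, 'rb'))
    local s = f:read('*a')
    f:close()
    return cr_lines(s)
end
for ln in cr_file_lines('test.txt') do
    print(ln)
end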
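For comparison, the same normalize-then-split idea can be sketched in Python rather than Lua. The function name normalized_lines is an assumption of this sketch; it uses the same pattern as the gsub call above and treats a single trailing line ending as a terminator, not as the start of an extra empty line.

```python
import re

def normalized_lines(text):
    """Split text into lines, accepting CR, CRLF and LF line endings."""
    text = re.sub(r"\r\n?", "\n", text)  # same pattern as in the gsub call
    if text.endswith("\n"):
        text = text[:-1]                 # a final newline ends the last line
    return text.split("\n") if text else []

for raw in ("ABCDEFGH\r12345\r##\r6789",
            "ABCDEFGH\r\n12345\r\n##\r\n6789",
            "ABCDEFGH\n12345\n##\n6789\n"):
    print(normalized_lines(raw))
```

All three inputs produce the same four lines, which is the consistent behaviour the question asks for.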

Resources