How to read from file to array - arrays

Trying to read from a txt file and have the results displayed in a message box. I plan on copying and pasting lines in groups of 1000 and deleting them from the array later in my code. For now I'd like to be able to see that the file can be read into the array and displayed:
Local $List
FileReadToArray( "C:/Users/Desktop/recent_list.txt", $List [, $iFlags = $FRTA_COUNT [, $sDelimiter = ""] ])
MsgBox( 0, "Listing", $List )
I get an error:
>"C:\Program Files (x86)\AutoIt3\SciTE\..\autoit3.exe" /ErrorStdOut "C:\Users\Documents\Test.au3"

"FileReadToArray" has no other parameters than the file to read! You have used the function call from "_FileReadToArray".
The square brackets in the function line means: This parameters are optional! If you want to use them with the default values, its not required to write them in the function call.
And "FileReadToArray" reads the content of a file into an array. Thats why your call should look like so:
Local $arList = FileReadToArray("C:/Users/Desktop/recent_list.txt")
; to show every line in a MsgBox you must iterate
; through the result array
For $i = 0 To UBound($arList) - 1
    ; MsgBox is not sensible with hundreds of lines in the file!
    ; MsgBox(0, 'Line ' & $i + 1, $arList[$i])
    ; better way - console output
    ConsoleWrite('[' & $i + 1 & '] ' & $arList[$i] & @CRLF)
Next

Related

I am not able to get any values to store in the array @AA

It is getting the correct inputs and printing them inside the foreach loop, but when I try to send the array to a function in a module later, or if I try to print it outside the loop, it is empty.
What do I need to change?
#!/usr/bin/perl
use lib ".";   # This pragma includes the current working directory
use Mytools;

$inputfilename = shift @ARGV;
open (INFILE, $inputfilename) or die
    ("Error reading file $inputfilename: $! \n");

# Storing every line of the input file in array @file_array
while (<INFILE>) {
    $file_array[ $#file_array + 1 ] = $_;
}

my $protein;
my @AA;
foreach $protein (@file_array)
{
    @AA = Mytools::dnaToAA($protein);
    print "The main AA\n", @AA;
}
print "The main array", @file_array;

my $header1 = "AA";
my $header2 = "DNA";
Mytools::printreport($header1, $header2, \@AA, \@file_array);
You're overwriting @AA in every iteration of the foreach loop.
Instead of
@AA = Mytools::dnaToAA($protein);
use
push @AA, Mytools::dnaToAA($protein);
See push.
Next time, try to post runnable code (see minimal reproducible example), i.e. avoid Mytools, as it is irrelevant to the problem and makes the code impossible to run for anyone but you.

How to loop over multiple files, ignore empty files without raising an exception?

I'm writing a program that will accept multiple files at once. But I'm trying to change this program so that if an empty file is entered, it ignores that file but continues to read all the other files, and gives me output without raising any exceptions related to the empty file.
For example:
File 1 = contains text that will work with this program
File 2 = is empty
This is a piece of my program:
from sys import argv

script, filenames = argv[0], argv[1:]

for file in filenames:
    with open(file) as f:
        var = f.read()
    print "\n\nYour File Name: '(%r)'" % (file)
    var1 = var.split()
    var2 = len(var1)
    print '\n\nThe Total Number of Words: "({0:,})"'.format(var2)
    var3 = var.split()[0]
    var4 = len(var3)
    print '\n\nThe First Word and Length: "(%s)" ({0:,})'.format(var4) % (var3)
If I run this program using File 2, I will get the following error:
var3 = var.split()[0]
IndexError: list index out of range
Is there a way that allows me to run File 1 and File 2 together, but get the output for File 1 and then print a message for File 2 saying that it's an unrecognizable file? I tried adding try/except but it still wasn't working correctly.
Use if / else to check the length of your file:
for file in filenames:
    with open(file) as f:
        var = f.read()
    print "\n\nYour File Name: '(%r)'" % (file)
    if len(var) > 0:
        var1 = var.split()
        var2 = len(var1)
        print '\n\nThe Total Number of Words: "({0:,})"'.format(var2)
        var3 = var.split()[0]
        var4 = len(var3)
        print '\n\nThe First Word and Length: "(%s)" ({0:,})'.format(var4) % (var3)
    else:
        print 'File empty'
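Since you mentioned try/except: a minimal sketch of that approach (the message text is just an example) catches the IndexError raised for the empty file instead of checking the length up front:
for file in filenames:
    with open(file) as f:
        var = f.read()
    print "\n\nYour File Name: '(%r)'" % (file)
    try:
        words = var.split()
        first = words[0]  # raises IndexError when the file is empty
        print '\n\nThe Total Number of Words: "({0:,})"'.format(len(words))
        print '\n\nThe First Word and Length: "(%s)" ({0:,})'.format(len(first)) % (first)
    except IndexError:
        print 'Unrecognizable (empty) file'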

ref to a hash -> its member array -> this array's member's value. How to elegantly access and test?

I want to use an expression like
@{ %$hashref{'key_name'}[1]
or
%$hashref{'key_name'}->[1]
to get, and then test, the second (index 1) member of the array (reference) held by my hash as its "key_name" value. But I cannot.
This code here is correct (it works), but I would have liked to combine the two lines I have marked into one single, efficient, Perl-elegant line.
foreach my $tag ('doit', 'source', 'dest') {
    my $exists = exists( $$thisSectionConfig{$tag} );
    my @tempA = %$thisSectionConfig{$tag};    # this line
    my $non0len = ( $tempA[1] =~ /\w+/ );     # and this line
    if ( !$exists || !$non0len ) {
        print STDERR "No complete \"$tag\" ... etc ... \n";
        # program exit ...
    }
}
I know you (the general 'you') can elegantly combine these two lines. Could someone tell me how I could do this?
This code is testing a section of a config file that has been read into a $thisSectionConfig reference-to-a-hash by Config::Simple. Each config-file key=value pair is then (I looked with Data::Dumper) held as a two-member array: [0] is the key, [1] is the value. The $tag values are configuration settings that must be present in the config file sections being processed by this code snippet.
Thank you for any help.
You should read about the arrow operator (->). I guess you want something like this:
foreach my $tag ('doit', 'source', 'dest') {
    if (exists $thisSectionConfig->{$tag}) {
        my $non0len = ($thisSectionConfig->{$tag}->[1] =~ /(\w+)/);
    }
    else {
        print STDERR "No complete \"$tag\" ... etc ... \n";
        # program exit ...
    }
}

Splitting two large CSV files preserving relations between file A and B across the resulting files

I have TWO large CSV files (around 1 GB each). They share a relation with each other (the ID is, let's say, like a foreign key). The structure is simple, line by line, but CSV cells whose value string contains a line break can appear:
37373;"SOMETXT-CRCF or other other line break-";3838383;"sasa ssss"
One file is the P file and the other is the T file. T is about 70% of the size of the P file (P > T). I must cut them into smaller parts since they are too big for the program I have to import them into... I cannot simply use split -l 100000 since I would lose the ID=ID relations, which must be preserved! The relation can be 1:1, 2:3, 4:6 or 1:5. So blind file splitting is no option; we must check the place where we create a new file. Here is an example with a simplified CSV structure and the places where I want the files to be cut (the lines above go to a separate P|T__00x file and we continue until P or T ends). The lines are sorted in both files, so there is no need to search for IDs across the whole file!
File "P" (empty lines for clearness):
CSV_FILE_P;HEADER;GOES;HERE
564788402;1;1;"^";"01"
564788402;2;1;"^";"01"
564788402;3;1;"^";"01"
575438286;1;1;"^";"01"
575438286;1;1;"^";"01"
575438286;2;1;"37145859"
575438286;2;1;"37145859"
575438286;3;1;"37145859"
575438286;3;1;"37145859"
575439636;1;1;"^"
575439636;1;1;"^"
# let's say the ~100k line limit of file P is somewhere here and there are no more 575439636 ID lines, so we cut.
575440718;1;1;"^"
575440718;1;1;"^"
575440718;2;1;"10943890"
575440718;2;1;"10943890"
575440718;3;1;"10943890"
575440718;3;1;"10943890"
575441229;1;1;"^";"01"
575441229;1;1;"^";"01"
575441229;2;1;"4146986"
575441229;2;1;"4146986"
575441229;3;1;"4146986"
575441229;3;1;"4146986"
File T (empty lines for clarity):
CSV_FILE_T;HEADER;GOES;HERE
564788402;4030000;1;"0204"
575438286;6102000;1;"0408"
575438286;6102000;0;"0408"
575439636;7044010;1;"0408"
575439636;7044010;0;"0408"
# we must cut here since the bigger file "P" has reached its 100k limit
# and we end here because the 575439636 ID lines are over.
575440718;6063000;1;"0408"
575440718;6063000;0;"0408"
575441229;8001001;0;"0408"
575441229;8001001;1;"0408"
Can you please help me split those two files into many separate files of 100,000 (or so) lines each, T_001 and a corresponding P_001 file and so on, so that IDs match between the file parts? I believe awk will be the best tool, but I do not have much experience in this field. And one last thing: the CSV header should be preserved in each of the files.
I have a powerful AIX machine to cope with that (Linux is also possible, since AIX commands are sometimes limited).
You can parse the leading IDs with awk and then check whether the current ID is the same as the last one. Only when it is different are you allowed to close the current output file and open a new one. At that point, record the ID for tracking in the next file. You can track this ID in a text file or in memory; I've done it in memory, but with files this big you could run into trouble. It's easier to keep track in memory than to open multiple files and read from them.
Then you just need to distinguish between the first file (output and recording) and the second file (output and using the pre-recorded data).
The code does a very brute-force check for the possibility of a CRLF in a field: if a line does not begin with what looks like an ID, it is output with no further testing. That is a problem if the CRLF is immediately followed by a number and a semicolon! This is probably unlikely, though...
Run with: gawk -f parser.awk P T
I don't promise this works!
BEGIN {
    MAXLINES = 100000
    block = 0
    trackprevious = 0
}

FNR == 1 {
    # First line is CSV header
    csvheader = $0
    if (FILENAME == "-")
    {
        _error = 1
        print "Error: Need filename on command line"
        exit 1
    }
    if (trackprevious)
    {
        _error = 1
        print "Only one file can track another"
        exit 1
    }
    if (block >= 1)
    {
        # New file - track previous output...
        close(outputname)
        Tracking[block] = idval
        print "Output for " FILENAME " tracks previous file"
        trackprevious = 1
    }
    else
    {
        print "Chunking output (" MAXLINES ") for " FILENAME
    }
    linecount = 0
    idval = 0
    block = 1
    outputprefix = FILENAME "_block"
    outputname = sprintf("%s_%03d", outputprefix, block)
    print csvheader > outputname
    next
}

/^[0-9]+;/ {
    linecount++
    newidval = $0
    sub(/;.*$/, "", newidval)
    newidval = newidval + 0 # make a number

    startnewfile = 0
    if (trackprevious && (idval != newidval) && (idval == Tracking[block]))
    {
        startnewfile = 1
    }
    else if (!trackprevious && (idval != newidval) && (linecount > MAXLINES))
    {
        # Last ID value found before new file:
        Tracking[block] = idval
        startnewfile = 1
    }

    if (startnewfile)
    {
        close(outputname)
        block++
        outputname = sprintf("%s_%03d", outputprefix, block)
        print csvheader > outputname
        linecount = 1
    }

    print $0 > outputname
    idval = newidval
    next
}

{
    linecount++
    print $0 > outputname
}
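If you would rather not use awk, here is a minimal Python sketch of the same tracking idea. The input names "P" and "T", the output naming P_001/T_001 and the 100,000-line limit are taken from your example; like the awk script it only brute-force-guards against quoted line breaks by treating lines that do not start with digits and a semicolon as continuations of the previous record.
import re

MAXLINES = 100000                     # target chunk size, as in your example
ID_RE = re.compile(r'^(\d+);')        # brute-force check for an ID at the line start

def split_primary(path, maxlines=MAXLINES):
    """Split the bigger file (P) at ID boundaries; return the boundary ID of each chunk."""
    boundaries = []
    with open(path) as src:
        header = src.readline()
        block, count, last_id = 1, 0, None
        out = open("%s_%03d" % (path, block), "w")
        out.write(header)
        for line in src:
            m = ID_RE.match(line)
            cur_id = int(m.group(1)) if m else last_id   # continuation lines keep the ID
            if count >= maxlines and cur_id != last_id:
                boundaries.append(last_id)               # remember where this chunk ends
                out.close()
                block += 1
                count = 0
                out = open("%s_%03d" % (path, block), "w")
                out.write(header)
            out.write(line)
            count += 1
            last_id = cur_id
        out.close()
    return boundaries

def split_secondary(path, boundaries):
    """Split the smaller file (T) so each chunk ends at the recorded boundary ID."""
    with open(path) as src:
        header = src.readline()
        block, bi, last_id = 1, 0, None
        out = open("%s_%03d" % (path, block), "w")
        out.write(header)
        for line in src:
            m = ID_RE.match(line)
            cur_id = int(m.group(1)) if m else last_id
            # open a new chunk once we move past the boundary ID of the current block
            if bi < len(boundaries) and cur_id is not None and cur_id > boundaries[bi]:
                bi += 1
                out.close()
                block += 1
                out = open("%s_%03d" % (path, block), "w")
                out.write(header)
            out.write(line)
            last_id = cur_id
        out.close()

boundaries = split_primary("P")       # writes P_001, P_002, ...
split_secondary("T", boundaries)      # writes the matching T_001, T_002, ...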

Read from text file and assign data to new variable

A Python 3 program allows people to choose from a list of employee names.
The data held in the text file looks like this: ('larry', 3, 100)
(being the person's name, weeks worked and payment)
I need a way to assign each part of the text file to a new variable,
so that the user can enter a new number of weeks and the program calculates the new payment.
Below is my code and attempt at figuring it out.
import os
choices = [f for f in os.listdir(os.curdir) if f.endswith(".txt")]
print (choices)
emp_choice = input("choose an employee:")
file = open(emp_choice + ".txt")
data = file.readlines()
name = data[0]
weeks_worked = data[1]
weekly_payment= data[2]
new_weeks = int(input ("Enter new number of weeks"))
new_payment = new_weeks * weekly_payment
print (name + "will now be paid" + str(new_payment))
Currently you are assigning the first three lines from the file to name, weeks_worked and weekly_payment, but what you want (I think) is to split up a single line formatted as ('larry', 3, 100) (does each file have only one line?).
So you probably want code like:
from re import compile

# your code to choose file

line_format = compile(r"\s*\(\s*'([^']*)'\s*,\s*(\d+)\s*,\s*(\d+)\s*\)")
file = open(emp_choice + ".txt")
line = file.readline()  # read the first line only
match = line_format.match(line)
if match:
    name, weeks_worked, weekly_payment = match.groups()
else:
    raise Exception('Could not match %s' % line)
# your code to update information
The regular expression looks complicated, but is really quite simple:
\(...\) matches the parentheses in the line
\s* matches optional spaces (it's not clear to me whether you have spaces in various places between words, so this matches just in case)
\d+ matches a number (1 or more digits)
[^']* matches anything except a quote (so it matches the name)
(...) (without the \ backslashes) indicates a group that you want to read afterwards by calling .groups()
These are built from simpler parts (like *, + and \d), which are described at http://docs.python.org/2/library/re.html
If you want to repeat this for many lines, you probably want something like:
name, weeks_worked, weekly_payment = [], [], []
for line in file.readlines():
    match = line_format.match(line)
    if match:
        name.append(match.group(1))
        weeks_worked.append(match.group(2))
        weekly_payment.append(match.group(3))
    else:
        raise ...
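Since the line already looks like a Python tuple literal, another option (a minimal sketch, assuming you trust the file contents and each file holds one such line) is ast.literal_eval, which parses it without a regular expression:
import ast

with open(emp_choice + ".txt") as file:
    line = file.readline()                 # e.g. "('larry', 3, 100)"

# literal_eval safely parses the tuple literal into Python values
name, weeks_worked, weekly_payment = ast.literal_eval(line)

new_weeks = int(input("Enter new number of weeks: "))
new_payment = new_weeks * weekly_payment   # weekly_payment is already an int
print(name + " will now be paid " + str(new_payment))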
