Extracting results from a txt file to csv

I am trying to extract some results I've got in a text file into a CSV file.
results.txt has this form, and I want to extract it as CSV in the following form:
Benchmark, Pass/Fail, ops/m
compiler.compiler, PASSED, 18.37
compress, PASSED, 10.87
crypto.aes, PASSED, 3.91
etc...
So I want to keep only the iteration 1 results, in that form. What would you suggest I do?
Thank you!

The following Python 2.x script should help get you started (the question was originally tagged Python 2.x). The results.txt file can be filtered for lines containing iteration 1 as follows:
import csv
from itertools import ifilter

with open('results.txt', 'rb') as f_input, open('output.csv', 'wb') as f_output:
    # Keep only the "iteration 1" lines and split them on whitespace
    csv_input = csv.reader(ifilter(lambda x: "iteration 1" in x, f_input), delimiter=' ', skipinitialspace=True)
    csv_output = csv.writer(f_output)
    csv_output.writerow(["Benchmark", "Pass/Fail", "ops/m"])

    for row in csv_input:
        # Column 0 holds the benchmark name, column 6 the ops/m value
        csv_output.writerow([row[0], 'PASSED', row[6]])
This creates an output CSV file as follows:
Benchmark,Pass/Fail,ops/m
compiler.compiler,PASSED,18.37
compress,PASSED,10.87
crypto.aes,PASSED,3.91
crypto.rsa,PASSED,8.79
crypto.signverify,PASSED,15.10
derby,PASSED,9.40
mpegaudio,PASSED,7.81
scimark.fft.large,PASSED,4.27
scimark.lu.large,PASSED,0.85
scimark.sor.large,PASSED,2.38
scimark.sparse.large,PASSED,1.46
scimark.monte_carlo,PASSED,5.65
scimark.fft.small,PASSED,8.94
scimark.lu.small,PASSED,8.24
scimark.sor.small,PASSED,12.90
scimark.sparse.small,PASSED,5.61
serial,PASSED,4.53
startup.helloworld,PASSED,41.24
startup.compiler.compiler,PASSED,2.05
startup.compress,PASSED,3.62
startup.crypto.aes,PASSED,0.92
startup.crypto.rsa,PASSED,1.87
startup.crypto.signverify,PASSED,2.76
startup.mpegaudio,PASSED,1.82
startup.scimark.fft,PASSED,4.49
startup.scimark.lu,PASSED,2.44
startup.scimark.monte_carlo,PASSED,1.27
startup.scimark.sor,PASSED,3.14
startup.scimark.sparse,PASSED,1.54
startup.serial,PASSED,1.73
startup.sunflow,PASSED,3.55
startup.xml.transform,PASSED,0.27
startup.xml.validation,PASSED,4.28
sunflow,PASSED,3.69
xml.transform,PASSED,10.41
xml.validation,PASSED,15.37
However, it simply assumes every entry is PASSED, as the sample file contains no example of what a failed entry would look like.
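On Python 3, where itertools.ifilter no longer exists, a minimal sketch of the same approach (assuming the same whitespace-separated results.txt layout, with the benchmark name in column 0 and ops/m in column 6) would be:
import csv

with open('results.txt') as f_input, open('output.csv', 'w', newline='') as f_output:
    # A plain generator expression replaces itertools.ifilter
    filtered = (line for line in f_input if "iteration 1" in line)
    csv_input = csv.reader(filtered, delimiter=' ', skipinitialspace=True)
    csv_output = csv.writer(f_output)
    csv_output.writerow(["Benchmark", "Pass/Fail", "ops/m"])
    for row in csv_input:
        csv_output.writerow([row[0], 'PASSED', row[6]])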

Related

Python loop to extract all sequences shorter than 100AA in multiple fasta files

I'm kinda new to Python and I wrote a script to loop through all FASTA files in a directory and extract the sequences shorter than 100 AA from each file:
from Bio import SeqIO
import sys
import os

def loop_extractsmorfs(input_handle, output_handle):
    files = os.listdir(input_handle)
    for file in SeqIO.parse(files, "fasta"):
        if len(file.seq) <= 100:
            files.append(file)
    SeqIO.write(files, output_handle, "fasta")

if __name__ == "__main__":
    loop_extractsmorfs(input_handle=sys.argv[1], output_handle=sys.argv[2])
When I run this code in the terminal with both the input_handle and output_handle as arguments, I get:
AttributeError: 'list' object has no attribute 'read'
I imagine there must be some mistake in the way I'm using os.listdir or something, but the examples I found online only show how to "print the files in that directory", and I need to extract sequences and write new files.
You are trying to parse a list of files, and SeqIO cannot do that. You should parse each file separately and then, for convenience, combine the resulting iterators into one (via chain in my example). The matching sequences should be collected in a separate list.
So:
from Bio import SeqIO
import sys
from pathlib import Path
from itertools import chain

def loop_extractsmorfs(input_dirpath: Path, output_filepath: Path) -> None:
    little_records = []
    # chain() joins the per-file record iterators into one stream
    for record in chain(
        *(SeqIO.parse(filepath, "fasta") for filepath in input_dirpath.iterdir())
    ):
        if len(record.seq) <= 100:
            little_records.append(record)
    SeqIO.write(little_records, output_filepath, "fasta")

if __name__ == "__main__":
    loop_extractsmorfs(
        input_dirpath=Path(sys.argv[1]), output_filepath=Path(sys.argv[2])
    )
The collection step could be condensed into a list comprehension, but I did not want to overload the answer.
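For the curious, that condensed version might look like this (a sketch reusing the imports above, functionally equivalent to the loop):
def loop_extractsmorfs(input_dirpath: Path, output_filepath: Path) -> None:
    # The same filtering, expressed as a single list comprehension
    little_records = [
        record
        for record in chain(
            *(SeqIO.parse(filepath, "fasta") for filepath in input_dirpath.iterdir())
        )
        if len(record.seq) <= 100
    ]
    SeqIO.write(little_records, output_filepath, "fasta")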

How do I get the index to take the filename as its value?

I have a list of filenames that I want to make (they don't exist yet). I want to loop through the list and create each file. Next I want to write to each file a path (along with other text not shown here) that includes the name of the file. I have written something similar to what is below so far, but I cannot see how to get the index i to take the filename values. Please help.
import os

biglist = ['sleep', 'heard', 'shed']
for i in biglist:
    myfile = open('C:\autumn\winter\spring\i.txt', 'w')
    myfile.write('DATA = c:\autumn\winter\spring\i.dat')
    myfile.close
Maybe you can try the Python function below.
import os
import sys

biglist = ['sleep', 'heard', 'shed']

def create_file():
    for i in biglist:
        try:
            # os.path.join sidesteps the trailing-backslash escape problem
            file_name_with_ext = os.path.join(r"C:\autumn\winter\spring", i + ".txt")
            file = open(file_name_with_ext, 'a')
            file.close()
        except IOError:
            print("caught error!")
            sys.exit(0)

create_file()  # invoking the function
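The question also wants each file to contain a path that includes the filename, so here is a minimal sketch of that step (same assumed directory as above; the loop variable is interpolated with str.format instead of being written literally):
def create_files_with_data():
    for name in biglist:
        path = os.path.join(r"C:\autumn\winter\spring", name + ".txt")
        with open(path, 'w') as myfile:
            # {0} is replaced by the current filename, unlike the literal "i"
            myfile.write('DATA = c:\\autumn\\winter\\spring\\{0}.dat'.format(name))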

Writing a new column in a csv file through Command line

I want to create a Python script that will add a new column, along with its data, to a CSV file from the command line, using optparse.
For example, if the following is the input file:
User_ID,Date,Num_1,Num_2,Com_ID
101,2015-04-13,12,21,1011
102,2014-4-3,1,7,1002
then python script_name.py Num_3 1101 1102 will create a new column Num_3 with that data.
Can anybody please direct me somewhere I can learn how to achieve this? I have already read about optparse but got nothing.
Use:
import optparse

parser = optparse.OptionParser()
(options, args) = parser.parse_args()
print(args)
The args variable will contain a list of all positional arguments; then, using simple loops, you can run over the file and append the contents of args.
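To flesh that out, here is a minimal sketch (input.csv and output.csv are assumed filenames; the first argument is taken as the column name and the rest as its values):
import csv
import optparse

parser = optparse.OptionParser()
(options, args) = parser.parse_args()
column_name, values = args[0], args[1:]  # e.g. Num_3, then 1101 1102

# Read the whole CSV into memory (assumed file name: input.csv)
with open('input.csv') as f_in:
    rows = list(csv.reader(f_in))

rows[0].append(column_name)  # extend the header row
for row, value in zip(rows[1:], values):  # pair each data row with one value
    row.append(value)

with open('output.csv', 'w', newline='') as f_out:
    csv.writer(f_out).writerows(rows)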

Building a Route to output differences of two files

I'm trying to put together a file diff route... could someone help? Here is what I have:
CsvDataFormat csv = new CsvDataFormat();
csv.setDelimiter(",");

from("file:inputdir?delete=true&sortBy=ignoreCase:file:name")
    .unmarshal(csv)
    .pollEnrich("file:backup?fileName=test.csv&sendEmptyMessageWhenIdle=true")
    .unmarshal(csv)
    // Need to aggregate here!!!!
    .log("test");
A CSV file gets dropped into the /input directory, and then a backup file is consumed from the /backup directory. I would like to compare these two files and output the difference.
This is not a Camel-specific problem. To solve it you may implement diff functionality on your own, or you may use an existing library such as java-diff-utils.
Pseudocode, fleshed out a little:
// Read both files into line lists (the paths are illustrative)
List<String> list1 = Files.readAllLines(Paths.get("inputdir/test.csv"));
List<String> list2 = Files.readAllLines(Paths.get("backup/test.csv"));

// Use java-diff-utils to calculate the difference
Patch<String> patch = DiffUtils.diff(list1, list2);

How to handle large files while reading it with python xlrd in GAE without giving DeadlineExceededError

I want to read a file of size 4 MB using Python xlrd in GAE.
I am getting the file from the Blobstore. The code used is given below.
book = xlrd.open_workbook(file_contents=temp_file)
sh = book.sheet_by_index(0)
for col_no in range(sh.ncols):
It gives me a DeadlineExceededError:
book = xlrd.open_workbook(file_contents=file_data)
File "/base/data/home/apps/s~appid/app-version.369475363369053908/xlrd/__init__.py", line 416, in open_workbook
ragged_rows=ragged_rows,
File "/base/data/home/apps/s~appid/app-version.369475363369053908/xlrd/xlsx.py", line 756, in open_workbook_2007_xml
x12sheet.process_stream(zflo, heading)
File "/base/data/home/apps/s~appid/app-version.369475363369053908/xlrd/xlsx.py", line 520, in own_process_stream
for event, elem in ET.iterparse(stream):
DeadlineExceededError
But I am able to read smaller files.
Actually, I only need the first few rows (30 to 50) of the file. Is there any method, other than adding it as a task and fetching the details via the Channel API, to get the details without causing the deadline error?
What can I do to handle this?
I read an Excel file of about 1000 rows with this library and it worked okay.
I'll leave a link that might be useful: https://github.com/cjhendrix/HXLator-SpaceAppsVersion/blob/master/gae/main.py
In that code you can see that traversing the columns and rows works on a list of values for each row.
For example:
wb = xlrd.open_workbook(file_contents=inputfile.read())
sh = wb.sheet_by_index(0)
for rownum in range(sh.nrows):
    val_row = sh.row_values(rownum)
    # here, print an element of the row list
    self.response.write(val_row[1])  # index depends on the number of columns
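Since only the first 30 to 50 rows are needed, a hedged option is to cap the row loop, as in the sketch below; note that xlrd still parses the whole workbook inside open_workbook(), so this trims the per-row work but may not avoid the deadline by itself:
wb = xlrd.open_workbook(file_contents=file_data)
sh = wb.sheet_by_index(0)
# Stop after 50 rows; the parsing cost has already been paid above
for rownum in range(min(sh.nrows, 50)):
    val_row = sh.row_values(rownum)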
Regards!
