Fasta file conversion - arrays

How can I convert a FASTA file (containing peptide sequences) into a file that contains one entry per line? I would like to index it for a job array. I have 38 different FASTA files to be BLASTed against another 38 databases, for example DB40, DB41, DB42, ..., DB78. I want to convert those DBs into data.1 ... and so forth.
I'm referring to this website:
http://www3.imperial.ac.uk/bioinfsupport/help/cluster_usage/submitting_array_jobs
Any help?
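Below is a minimal sketch of the conversion in C. It assumes standard FASTA input, where each entry starts with a '>' header line followed by one or more sequence lines. It also assumes that an output line of the form header<TAB>sequence is acceptable. The function name fasta_one_per_line and the 4096-character line limit are hypothetical choices for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Rewrites a FASTA stream so that each entry occupies one line:
   the header (without the leading '>'), a tab, then the whole sequence
   with its line breaks removed. Sequence lines that appear before the
   first header are ignored. Returns the number of entries written. */
long fasta_one_per_line(FILE *in, FILE *out)
{
    char line[4096];   /* assumption: no input line is longer than this */
    long entries = 0;

    assert(in != NULL && out != NULL);
    while (fgets(line, sizeof line, in)) {
        line[strcspn(line, "\r\n")] = '\0';   /* drop the line ending */
        if (line[0] == '>') {
            if (entries > 0)
                fputc('\n', out);             /* finish the previous entry */
            fprintf(out, "%s\t", line + 1);   /* header text, then a tab */
            entries++;
        } else if (entries > 0) {
            fputs(line, out);                 /* append this sequence line */
        }
    }
    if (entries > 0)
        fputc('\n', out);                     /* terminate the last entry */
    return entries;
}
```

Run it once per FASTA file. Line N of the output is then entry N, and an array task can pick that line using its array index (the linked page describes how the index is exposed). The renaming follows the same kind of offset: DB40 becomes data.1, so DBn corresponds to data.(n - 39).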

Related

Reading data from a txt file in C

Prepare a file called "notes.txt" that contains students' names, midterm (vize) grades and final grades. Then read the names, midterm and final grades from the file into a structure. Calculate each student's overall grade as 40% of the midterm plus 60% of the final. Finally, display the students whose overall grade is 60 or above on the screen with their name and overall grade, and write them to the "Gecti.txt" file. Display the students below 60 on the screen with their name and overall grade, and write them to the "Kaldi.txt" file.
When the program runs, the sample file contents should be as follows. You can use the following data for the "notes.txt" file; the "Gecti.txt" and "Kaldi.txt" files should be created automatically when the program is run. This is what I want to do, but I don't know how to read data from a text file.
In notes.txt:
Ali 50 40
Ayse 20 90
Omer 70 80
Elif 50 50
Ahmet 50 80
But I don't know how to read the name, midterm and final grades from the text file, and I can't write the code. Please help me.
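#include <stdio.h>

/* One student: name, midterm (vize) and final grades */
struct ogrenci
{
    char isim[30];
    int vize;
    int final;
} ogr[10];

int main(void)
{
    /* Open the data file for reading (called notes.txt in the task description) */
    FILE *notlar = fopen("notlar.txt", "r");
    if (notlar == NULL) {
        perror("notlar.txt");
        return 1;
    }
    fclose(notlar);
    return 0;
}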
This is all I can write.
To achieve this, you need to do the following:
Open the file for reading (you have already done this part).
Start a loop that runs until the end of the file is reached, and inside that loop:
    Read a line from the file.
    Parse the line (using the space character as the separator) to extract the three pieces of information.
    Store that information in the structure you have created.
    Put the structure in a collection (for example, the ogr array).
Close the file.
There are plenty of examples on the internet and on this site showing how to read files and how to split strings that contain separators.
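Here is a minimal sketch of those steps. It assumes each line of the file holds a name without spaces followed by two integers, as in the sample data. It reuses the ogrenci structure from the question; the helper names read_students and write_results are hypothetical. Instead of reading a whole line and then splitting it, this sketch swaps in fscanf with the pattern "%29s %d %d", which reads the three space-separated fields of each record directly.

```c
#include <assert.h>
#include <stdio.h>

/* Same structure as in the question: name, midterm (vize), final. */
struct ogrenci {
    char isim[30];
    int vize;
    int final;
};

/* Reads "name midterm final" records until end of file or until max
   records have been stored. Returns the number of records read. */
int read_students(FILE *in, struct ogrenci *arr, int max)
{
    int n = 0;
    assert(in != NULL && arr != NULL);
    /* %29s leaves room for the '\0' in isim[30]; fscanf returns 3 only
       when all three fields of a record were read successfully. */
    while (n < max &&
           fscanf(in, "%29s %d %d", arr[n].isim, &arr[n].vize, &arr[n].final) == 3)
        n++;
    return n;
}

/* Prints every student's overall grade (40% midterm + 60% final) and writes
   students with 60 or above to gecti, the others to kaldi.
   Returns how many students passed. */
int write_results(const struct ogrenci *arr, int n, FILE *gecti, FILE *kaldi)
{
    int passed = 0;
    for (int i = 0; i < n; i++) {
        /* Ten times the overall grade, kept as an integer so the >= 60
           comparison is exact (no floating-point rounding). */
        int tenfold = 4 * arr[i].vize + 6 * arr[i].final;
        double genel = tenfold / 10.0;
        FILE *dest = (tenfold >= 600) ? gecti : kaldi;

        printf("%s %.1f\n", arr[i].isim, genel);
        fprintf(dest, "%s %.1f\n", arr[i].isim, genel);
        if (tenfold >= 600)
            passed++;
    }
    return passed;
}
```

A main function would do the following:
- Open notlar.txt with mode "r", and Gecti.txt and Kaldi.txt with mode "w" (which creates them).
- Call read_students(notlar, ogr, 10) and then write_results.
- Close all three files with fclose.

With the sample data, Ayse (62), Omer (76) and Ahmet (68) pass, while Ali (44) and Elif (50) fail.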

Fortran 90: read file names sequentially

I am working with Fortran 90. I have 50 .dat files that correspond to 50 time steps. The files have similar names, for instance tstep01.dat, tstep02.dat, tstep03.dat, etc. I need to read the names of these files sequentially. The files are located in the output directory, which is in the same directory as my script. I want to get the name of each file in order to pass it to a subroutine that generates an animation. The subroutine uses this name to read the data and then creates .png frames. I have already tried this:
character(len = 14) :: data_name !data name
nframes = 50 !number of timesteps
do i = 1, nframes
write(data_name, '(output/("/tstep", I2.2, ".dat"))') i
end do
But I got this error:
write(data_name, ('output/("/tstep", I2.2, ".dat")')) i
1
Error: Nonnegative width required in format string at (1)
I think the problem is the output/ part, but I don't know the correct way to specify the directory of the files. Your help would be appreciated.

Read matrices from multiple .csv files and print matrices in .csv files

So I have to write a C program that reads data from .csv files supplied to me by multiple users into matrices. I will perform some operations on them (like matrix addition, or multiplication with the necessary conditions on dimensions) and print these matrices (or the output data) into .csv files again.
I also need to dynamically allocate memory for my matrices.
Now, I have zero background in dealing with .csv files. I don't know the code required to read a .csv file or write into one. I have searched the Internet for a long time, but surprisingly I have not found any program that teaches how to deal with .csv files from the elementary level.
I am lost and need a lot of guidance. A complete, well-written sample C program would help, since I need a comprehensive example to start from.
A CSV file is just a plain ASCII text file that contains a grid of values. Think of the file as a set of rows in a database table: each line in the file represents one record, and the order of the data is the same on every line. Each item of data is separated by a comma character (hence the name). So, to read the file:
open file
until the end of the file
    read line into a string
    split the string into substrings, using ',' as the delimiter
    parse each substring
Since a CSV file carries no formatting information, string values raise a question: what do you do if a value contains a comma? For reading numbers, that is not a problem.
You could read the file in several passes: the first to determine how much data there is (number of columns, number of rows, etc.) and the second to actually read the data.
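Here is a minimal sketch that combines the reading steps with the two-pass idea. It assumes numeric values only (no quoted fields, no empty fields) and lines shorter than 4096 characters. Matrix, CSV_LINE_MAX, is_blank and read_csv_matrix are hypothetical names.
- The first pass counts rows and takes the column count from the first line.
- The matrix is then allocated dynamically with malloc.
- The second pass splits each line with strtok and converts each piece with strtod.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CSV_LINE_MAX 4096   /* assumption: no line is longer than this */

/* Hypothetical matrix type: rows x cols values stored row by row. */
typedef struct {
    size_t rows, cols;
    double *data;
} Matrix;

/* Returns nonzero if the line holds only whitespace. */
static int is_blank(const char *line)
{
    return line[strspn(line, " \t\r\n")] == '\0';
}

/* Reads a numeric CSV file into a newly allocated matrix in two passes.
   Returns 0 on success, -1 on an empty file, ragged rows or allocation
   failure. On success the caller must free(m->data). */
int read_csv_matrix(FILE *f, Matrix *m)
{
    char line[CSV_LINE_MAX];
    size_t rows = 0, cols = 0;

    assert(f != NULL && m != NULL);

    /* Pass 1: count non-blank rows; columns = 1 + commas on the first row. */
    rewind(f);
    while (fgets(line, sizeof line, f)) {
        if (is_blank(line))
            continue;
        if (rows == 0) {
            cols = 1;
            for (const char *p = line; *p; p++)
                if (*p == ',')
                    cols++;
        }
        rows++;
    }
    if (rows == 0)
        return -1;

    m->rows = rows;
    m->cols = cols;
    m->data = malloc(rows * cols * sizeof *m->data);
    if (m->data == NULL)
        return -1;

    /* Pass 2: go back to the start and parse every value. */
    rewind(f);
    size_t r = 0;
    while (r < rows && fgets(line, sizeof line, f)) {
        if (is_blank(line))
            continue;
        size_t c = 0;
        char *tok = strtok(line, ",\r\n");
        while (tok != NULL && c < cols) {
            m->data[r * cols + c++] = strtod(tok, NULL);
            tok = strtok(NULL, ",\r\n");
        }
        if (c != cols || tok != NULL) {   /* ragged row: too few or too many values */
            free(m->data);
            m->data = NULL;
            return -1;
        }
        r++;
    }
    return 0;
}
```

strtok treats consecutive commas as a single separator. This sketch therefore only suits files where every field is a non-empty number; quoted strings containing commas need a real CSV parser.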
Writing the CSV is quite simple:
open file
for each record to write
    for each element to write
        write element
        if not last element
            write a comma
    write a new line
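The writing pseudocode translates almost line for line into C. This sketch assumes the matrix is stored row by row in a plain double array; write_csv_matrix is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

/* Writes a rows x cols matrix (stored row by row in data) as CSV:
   values separated by commas, one record per line.
   Returns 0 on success, -1 if a write fails. */
int write_csv_matrix(FILE *f, const double *data, size_t rows, size_t cols)
{
    assert(f != NULL && data != NULL);
    for (size_t r = 0; r < rows; r++) {                 /* for each record */
        for (size_t c = 0; c < cols; c++) {             /* for each element */
            if (fprintf(f, "%g", data[r * cols + c]) < 0)
                return -1;
            if (c + 1 < cols && fputc(',', f) == EOF)   /* comma unless last */
                return -1;
        }
        if (fputc('\n', f) == EOF)                      /* end of the record */
            return -1;
    }
    return 0;
}
```

"%g" prints six significant digits. Use "%.17g" instead if the values must survive a write-then-read round trip exactly.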

Read/Write files in C

I'm writing a program in C that creates an archive file for a given list of file names. This is very similar to the ar command in Linux. This is what the archive file looks like:
!<arch>
file1.txt/ 1350248044 45503 13036 100660 28 `
hello
this is sample file 1
file2.txt/ 1350512270 45503 13036 100660 72 `
hello
this is sample file 2
this file is a little larger than file1.txt
But I'm having difficulty extracting a file from the archive. Let's say the user wants to extract file1.txt. The idea is that the program should get the index/location of the file name (in this case file1.txt), skip 58 characters to reach the content of the file, read the content, and write it to a new file. So here are my questions:
1) How can I get the index/location of the file name in the archive file? Note that duplicate file names are NOT allowed, so I don't have to worry about having two different indices.
2) How can I skip several characters (in this case 58) when reading a file?
3) How can I figure out when the content of a file ends? i.e. I need it to read the content and stop right before the file2.txt/ header.
My approach to solving this problem would be:
Add header information that contains the size of each file, its name and its location in the archive.
Then parse the header and use fseek() and ftell(), together with fgetc() or fread(), to get the bytes of the file you want. Then create the output file and write that data to it. This is the simplest way I can think of.
Header of ar archives: http://en.wikipedia.org/wiki/Ar_(Unix)#File_header
EXAMPLE:
#programmer93 Suppose your header is 80 bytes long (the header contains the metadata of the archive file). You have two files, one of 112 bytes and the other of 182 bytes, laid out one after another in a flat file (the archive file). So the layout is 80 (header), 112 (file1.txt), 182 (file2.txt), EOF. Thus, if you know the size of each file, you can easily navigate to a particular file using fseek() and extract only that file. To extract file2.txt, I would just call fseek(FILE*, (112+80), SEEK_SET); and then fgetc() 182 times. I hope that makes it clear.
If the format of the file cannot be changed by adding additional header information to help, you'll have to search through it and work things out as you go.
This should not be too hard. Just read the file, and when you read a header line such as
file1.txt/ 1350248044 45503 13036 100660 28 `
you can check the filename and size etc. (You know you'll have a header line at the start after the !<arch>). If this is the file you want, the ftell() function from stdio.h will tell you exactly where you are in the file. Since the file size in bytes is given in the header line, you can read the file by reading that particular number of bytes ahead in the normal manner. Similarly, if it is not the file you want, you can use fseek() to move forward the number of bytes in the file you are skipping and be ready to read in the header info for the next file and repeat the process.
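Here is a sketch of the search-as-you-go approach just described. It assumes the exact layout shown in the question:
- a "!<arch>" line comes first;
- each member then has one whitespace-separated header line ending in a newline;
- the header's first field is the name with a trailing '/', and its sixth field is the content size in bytes;
- exactly that many content bytes follow, with no padding.

Real ar archives use fixed-width 60-byte headers and pad odd-sized members, so this is not a general ar reader. extract_member is a hypothetical name. Open the archive with fopen(path, "rb") so that fseek() with SEEK_CUR is well defined. The sketch skips unwanted members with a relative fseek() instead of recording positions with ftell().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copies the member called `name` from an archive in the question's format
   to `out`. Returns the number of bytes copied, or -1 if the member is not
   found or the archive is malformed. */
long extract_member(FILE *ar, const char *name, FILE *out)
{
    char line[512], member[256];
    long size;

    assert(ar != NULL && name != NULL && out != NULL);
    rewind(ar);
    /* The first line must be the magic string "!<arch>". */
    if (!fgets(line, sizeof line, ar) || strncmp(line, "!<arch>", 7) != 0)
        return -1;

    /* Each iteration reads one header line; afterwards the stream is
       positioned at the first byte of that member's content. */
    while (fgets(line, sizeof line, ar)) {
        /* Field 1 is the name, fields 2-5 are skipped, field 6 is the size. */
        if (sscanf(line, "%255s %*s %*s %*s %*s %ld", member, &size) != 2)
            return -1;
        size_t len = strlen(member);
        if (len > 0 && member[len - 1] == '/')
            member[len - 1] = '\0';          /* strip the trailing '/' */

        if (strcmp(member, name) == 0) {
            /* Found it: copy exactly `size` bytes, then stop. */
            for (long i = 0; i < size; i++) {
                int ch = fgetc(ar);
                if (ch == EOF)
                    return -1;               /* archive is truncated */
                fputc(ch, out);
            }
            return size;
        }
        /* Not the wanted member: skip its content to reach the next header. */
        if (fseek(ar, size, SEEK_CUR) != 0)
            return -1;
    }
    return -1;
}
```

Because the size in each header says exactly where the content ends, this also answers question 3: the next header starts `size` bytes after the current one ends.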

C Program - Build an index file from another file

I have a .bin file that contains records in CSV format. What I want to do is assign each record in the bin file a sequence number, so the first record in the bin file would be assigned 0, and so on. These will be placed into a binary index file of (username, seq #) entries.
If I have the bin file already created with records in it, how do I go through the bin file and index each record? Thanks for any help!
You would read lines from file A, and write lines to file B. Keep a count of each line you read and use that to generate the sequence number column when writing to file B.
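Here is a minimal sketch of that line-counting approach. It assumes one CSV record per line in the data file, the username in the first field, and a fixed-size binary record for each index entry. struct index_entry, its 32-byte name field and build_index are hypothetical choices.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size index record: the username and the record's
   sequence number (0 for the first record). */
struct index_entry {
    char username[32];
    long seq;
};

/* Reads CSV records (one per line) from `data`, starting at its current
   position, and writes one binary index_entry per record to `index`.
   Assumes the username is the first field. Returns the number of entries
   written, or -1 if a write fails. */
long build_index(FILE *data, FILE *index)
{
    char line[1024];
    long seq = 0;

    assert(data != NULL && index != NULL);
    while (fgets(line, sizeof line, data)) {
        struct index_entry e;
        memset(&e, 0, sizeof e);               /* zero the name and any padding */
        size_t n = strcspn(line, ",\r\n");     /* length of the first field */
        if (n >= sizeof e.username)
            n = sizeof e.username - 1;         /* truncate over-long names */
        memcpy(e.username, line, n);
        e.seq = seq;                           /* the line counter is the sequence number */
        if (fwrite(&e, sizeof e, 1, index) != 1)
            return -1;
        seq++;
    }
    return seq;
}
```

Every entry has the same size, so entry k can later be read directly. Call fseek(index, k * (long)sizeof(struct index_entry), SEEK_SET) and then fread.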
To be honest, I would probably use Excel or another spreadsheet app if the file is already in CSV format.
