I do not understand the role of qw in Perl, which I understood in in debugging the code in this thread.
Its manual is very limited here: qw/string/.
I am trying to understand what goes wrong in the following code I am doing simple manipulation to the data (multiplying by one (1.0)).
Original code which is a simplified version of the situation in a real-world case
use v5.16;
use Math::Geometry::Planar qw(SegmentLineIntersection);
sub x_intercepts {
my ($points) = #_;
die 'Must pass at least 2 points' unless #$points >= 2;
my #intercepts;
my #x_axis = ( [0, 2000], [1, 2000] );
foreach my $i (0 .. $#$points - 1) {
my $intersect = SegmentLineIntersection([#$points[$i,$i+1], #x_axis]);
push #intercepts, $intersect if $intersect;
}
return \#intercepts;
}
my #xs = qw/22.595451 20.089094 17.380813 15.091260 12.477935 10.054821 7.270003 4.804673 4.728526 4.619254 4.526920 4.418416 4.321419 4.219890 4.123336 4.009777 3.912648 3.804183 3.705847 3.597756 3.512301 3.393413 3.301963 3.196725 3.098560 3.007482 2.899825 2.801002 2.688680 2.598862 2.496139 2.393526 2.282183 2.190449 2.084530 1.987778 1.877562 1.788788 1.678473 1.578123 1.467071 1.373372 1.283629 1.176670 1.071805 0.975422 0.877622 0.767820 0.667409 0.562480 0.469068 0.354589 0.264291 0.152522 0.063765 -0.045323 -0.136783 -0.248559 -0.343694 -0.459178 -0.551917 -0.640803 -0.755502 -0.845535 -0.955227 -1.045879 -1.155600 -1.254556 -1.365163 -1.461669 -1.571370 -1.658043 -1.772672 -1.865942 -1.981037 -2.073702 -2.176205 -2.276184 -2.367371 -2.476278 -2.567385 -2.686326 -2.777433 -2.884357 -2.980067 -3.087754 -3.183607 -3.291003 -3.386942 -3.495822 -3.586759 -3.702955 -3.793550 -3.900680 -3.999672 -4.093094 -4.200047 -4.301026 -4.399742 -4.493190 -4.602013 -4.705124 -4.812383 -4.907510 -5.022904 -5.109829 -5.214304 -5.317662 -7.703172 -10.350131 -12.921361 -15.431203 -18.188013 -20.544248 -22.822808 -25.679854 -22.999092 -20.540434 -17.964916 -15.398857 -12.990375 -10.402209 -7.888263 -5.504909 -5.217892 -5.109841 -5.014187 -4.908558 -4.811807 -4.704282 -4.605555 -4.504613 -4.406546 -4.292540 -4.204043 -4.088770 -3.995524 -3.905669 -3.796544 -3.707958 -3.596449 -3.490966 -3.382869 -3.293054 -3.185821 -3.088417 -2.971214 -2.880314 -2.772518 -2.677986 -2.569794 -2.473668 -2.365096 -2.276422 -2.179232 -2.068195 -1.973110 -1.859565 -1.771873 -1.669422 -1.569953 -1.462626 -1.364911 -1.258100 -1.159318 -1.050486 -0.959979 -0.849149 -0.749421 -0.640950 -0.547122 -0.451754 -0.344463 -0.252269 -0.134625 -0.051640 0.052970 0.154112 0.266505 0.353926 0.468739 0.561666 0.673810 0.759169 0.881697 0.973207 1.082409 1.170424 1.282163 1.378096 1.472728 1.586454 1.678473 1.785688 1.873862 1.984090 2.086021 2.196799 2.292400 2.386097 2.493190 2.601726 2.694346 2.803450 2.901878 3.011959 3.103050 3.196979 3.294507 3.397563 3.504076 3.600163 3.712539 3.809573 3.919799 4.012314 4.120694 4.216406 4.322895 4.416466 4.522871 4.623917 4.735925 4.826929 7.361253 9.647208 12.337984 14.870260 17.439730 19.921717 22.524080 25.125903/;
# -1 1.0 3 4 5 6 7 7 8 9 10/;
my #ys = qw/3699.316162 3676.939697 3659.950195 3641.476318 3605.965576 3580.152588 3555.092529 3528.118408 3509.595703 3504.416504 3508.355957 3510.452881 3510.304443 3499.548340 3499.943848 3493.196533 3499.488770 3494.984375 3494.390137 3493.938965 3492.606689 3493.434814 3488.926514 3488.890869 3484.599854 3488.077393 3484.177979 3477.801758 3478.098877 3472.040039 3477.642090 3477.802002 3472.908447 3478.532715 3469.805420 3472.759766 3464.269043 3463.950684 3465.888184 3458.441650 3459.006104 3455.686035 3455.677490 3454.548828 3454.241211 3455.250000 3449.803711 3447.423340 3457.498779 3448.445557 3453.106689 3447.701172 3444.543945 3448.558350 3450.073730 3449.884033 3444.751953 3444.056152 3444.825195 3449.671143 2593.656494 788.985779 776.407776 776.385925 767.522522 774.794250 770.596008 775.218384 770.962769 766.214294 766.735962 759.847351 760.073486 760.026489 753.721741 755.601929 753.942566 758.356506 747.932617 746.332214 746.464844 747.055115 750.173706 737.463379 739.810486 742.011475 744.332581 743.022461 737.302490 737.396606 734.325256 737.305359 740.642395 734.709717 735.754089 737.961182 740.697510 727.310913 730.918640 728.325012 721.845459 728.389893 727.765625 729.961243 725.608459 723.581909 730.736084 720.707764 720.398193 691.499390 657.534546 628.854431 615.219727 572.711365 561.505127 539.865173 517.139709 545.076416 580.880005 602.851135 628.006104 657.119263 683.746033 692.044373 716.640320 715.451294 715.415405 718.968018 723.698669 729.758606 728.564514 734.523376 731.454468 736.899780 731.257263 729.390686 732.837463 730.479431 733.497803 735.362732 742.581543 742.998108 735.918579 738.190002 738.108337 738.154297 740.425537 739.496033 743.105835 737.412537 740.537354 747.498108 747.424194 748.992920 752.244263 755.603455 756.611755 761.916504 762.920715 752.261658 758.293823 755.664062 753.728882 756.746338 754.825684 755.080444 761.192383 761.955505 763.209351 760.402771 764.342224 775.400940 767.148621 775.184998 777.084595 778.357117 776.217163 1086.248291 3444.476807 3438.105957 3440.625000 3438.325195 3438.430420 3449.251709 3453.216309 3452.126465 3458.547119 3451.694336 3456.417725 3457.336426 3457.777832 3455.553955 3457.256348 3458.823486 3459.088623 3459.492187 3463.538818 3466.455078 3456.521240 3459.809082 3457.505127 3462.721191 3466.518066 3467.562744 3469.211182 3469.120361 3464.043945 3466.291992 3472.698486 3476.146729 3471.635254 3472.539551 3475.163574 3473.687744 3479.102051 3488.351807 3482.367432 3481.961914 3484.844238 3481.511719 3482.469238 3488.947021 3488.882080 3491.247314 3499.116699 3511.889893 3539.602783 3565.981445 3598.203613 3628.028076 3657.928955 3685.231689/;
# my #ys = 1.0 * #ys;
my #input_list ;
foreach my $i ( 0..$#ys ) {
push #input_list, [ $xs[$i], $ys[$i] ] ;
}
my $intercept_list = x_intercepts(\#input_list) ;
say join ",", #$_ for #$intercept_list ;
where I just activate this line # my #ys = 1.0 * #ys; by commenting out which gives
Must pass at least 2 points at test2.pl line 7.
which suggests that 1.0 * qw_stuff is doing something else than I expect.
I run
my #ys = 1 * scalar(#ys); my #ys = qw/#ys/;
but the same error persists.
How can you do the simple manipulation of the data between integer and qw in Perl?
Quote Words:
my #xs = qw/22.595451 20.089094 17.380813 15.091260 12.477935 10.054821 7.270003 4.804673 4.728526 4.619254 4.526920 4.418416 4.321419 4.219890 4.123336 4.009777 3.912648 3.804183 3.705847 3.597756 3.512301 3.393413 3.301963 3.196725 3.098560 3.007482 2.899825 2.801002 2.688680 2.598862 2.496139 2.393526 2.282183 2.190449 2.084530 1.987778 1.877562 1.788788 1.678473 1.578123 1.467071 1.373372 1.283629 1.176670 1.071805 0.975422 0.877622 0.767820 0.667409 0.562480 0.469068 0.354589 0.264291 0.152522 0.063765 -0.045323 -0.136783 -0.248559 -0.343694 -0.459178 -0.551917 -0.640803 -0.755502 -0.845535 -0.955227 -1.045879 -1.155600 -1.254556 -1.365163 -1.461669 -1.571370 -1.658043 -1.772672 -1.865942 -1.981037 -2.073702 -2.176205 -2.276184 -2.367371 -2.476278 -2.567385 -2.686326 -2.777433 -2.884357 -2.980067 -3.087754 -3.183607 -3.291003 -3.386942 -3.495822 -3.586759 -3.702955 -3.793550 -3.900680 -3.999672 -4.093094 -4.200047 -4.301026 -4.399742 -4.493190 -4.602013 -4.705124 -4.812383 -4.907510 -5.022904 -5.109829 -5.214304 -5.317662 -7.703172 -10.350131 -12.921361 -15.431203 -18.188013 -20.544248 -22.822808 -25.679854 -22.999092 -20.540434 -17.964916 -15.398857 -12.990375 -10.402209 -7.888263 -5.504909 -5.217892 -5.109841 -5.014187 -4.908558 -4.811807 -4.704282 -4.605555 -4.504613 -4.406546 -4.292540 -4.204043 -4.088770 -3.995524 -3.905669 -3.796544 -3.707958 -3.596449 -3.490966 -3.382869 -3.293054 -3.185821 -3.088417 -2.971214 -2.880314 -2.772518 -2.677986 -2.569794 -2.473668 -2.365096 -2.276422 -2.179232 -2.068195 -1.973110 -1.859565 -1.771873 -1.669422 -1.569953 -1.462626 -1.364911 -1.258100 -1.159318 -1.050486 -0.959979 -0.849149 -0.749421 -0.640950 -0.547122 -0.451754 -0.344463 -0.252269 -0.134625 -0.051640 0.052970 0.154112 0.266505 0.353926 0.468739 0.561666 0.673810 0.759169 0.881697 0.973207 1.082409 1.170424 1.282163 1.378096 1.472728 1.586454 1.678473 1.785688 1.873862 1.984090 2.086021 2.196799 2.292400 2.386097 2.493190 2.601726 2.694346 2.803450 2.901878 3.011959 3.103050 3.196979 3.294507 3.397563 3.504076 3.600163 3.712539 3.809573 3.919799 4.012314 4.120694 4.216406 4.322895 4.416466 4.522871 4.623917 4.735925 4.826929 7.361253 9.647208 12.337984 14.870260 17.439730 19.921717 22.524080 25.125903/;
is identical to
my #xs = ('22.595451', '20.089094', '17.380813', '15.091260', 'etc...');
but, as you can see saves a bunch of typing.
So now that you've got a bunch of strings, you can just use them as numbers and perl will automatically do the conversion for you (generally recommended), or if their stringatude offends you, you can explicitly do a premature manual conversion.
When you want to transform an array or list, think map or foreach.
#ys = map { 1 * $_ } #ys;
or
foreach my $element (#ys) {
$element *= 1; # or += 0 both work.
}
... but seriously, what is that actually buying you? If you allow Perl to do the conversion at it's leisure, the values will become dual valued and won't need to be re-converted if you use them as strings. With Manual conversion you risk ending up with a numeric only value that may require re-stringifying and thus a possible second bite at the round-off error apple.
The only time you may really want to do a manual conversion is when you need to convert to BigInt, BigFloat, or similar objects (i.e. you are concerned about any possible loss of precision an auto convert to a float will do); or if you are prepping the values for output via JSON or YAML to a system that cares about the difference between 'text representation of a number' (49) and 'quoted strings of digits' ("49"), though be warned that this counts as re-stringifying.
Related
I am currently pulling in F1 prices from an Api, placing them into an Array. and determining what combination is less than or equal to 20. Using the below successfully:
require 'net/http'
require 'json'
#url = 'HIDDEN URL AS HAS NO RELEVANCE'
#uri = URI(#url)
#response = Net::HTTP.get(#uri)
#fantasy = JSON.parse(#response)
arr= [[#fantasy.first["Mercedes"].to_f, #fantasy.first["Ferrari"].to_f], [#fantasy.first["Hamilton"].to_f, #fantasy.first["Verstappen"].to_f]]
target = 20
#array = arr[0].product(*arr[1..-1]).select { |a| a.reduce(:+) <= target }
Where:
#fantasy = [{"Mercedes" => "4", "Ferrari" => "6.2", "Hamilton" => "7.1", "Verstappen" => "3"}]
This is successfully outputting:
[[4.0, 7.1], [4.0, 3.0], [6.2, 7.1], [6.2, 3.0]]
Eventually this will contain all F1 teams on the left side and all F1 drivers on the right (making an F1 fantasy teambuilder). But the idea is that only 1 constructor is needed and 5 drivers for the combination that should be equal or less than 20.
Is there a way to define this? To only use 1 Team (Mercedes, Ferrari etc) and 5 drivers (Hamilton, Verstappen etc) in the calculation? Obviously do not have 5 drivers included yet as just testing. So that my output would be:
[[4.0, 7.1, 3.0], [6.2, 7.1, 3.0]]
Where the constructor forms the 'base' for the calculation and then it can have any 5 of the driver calls?
My final question is, considering what I am trying to do, is this the best way to put my API into an array? As in to manually place #fantasy.first["Mercedes"].to_f inside my array brackets?
Thanks!
Not sure if I understand the question, but does this help?
arr = #fantasy.first.values.map(&:to_f)
target = 20
p result = arr.combination(2).select{|combi| combi.sum <= target}
I'm creating a shiny app and i'm letting the user choose what data that should be displayed in a plot and a table. This choice is done through 3 different input variables that contain 14, 4 and two choices respectivly.
ui <- dashboardPage(
dashboardHeader(),
dashboardSidebar(
selectInput(inputId = "DataSource", label = "Data source", choices =
c("Restoration plots", "all semi natural grasslands")),
selectInput(inputId = "Variabel", label = "Variable", choices =
choicesVariables)),
#choicesVariables definition is omitted here, because it's very long but it
#contains 14 string values
selectInput(inputId = "Factor", label = "Factor", choices = c("Company
type", "Region and type of application", "Approved or not approved
applications", "Age group" ))
),
dashboardBody(
plotOutput("thePlot"),
tableOutput("theTable")
))
This adds up to 73 choices (yes, i know the math doesn't add up there, but some choices are invalid). I would like to do this using a lookup table so a created one with every valid combination of choices like this:
rad1<-c(rep("Company type",20), rep("Region and type of application",20),
rep("Approved or not approved applications", 13), rep("Age group", 20))
rad2<-choicesVariable[c(1:14,1,4,5,9,10,11, 1:14,1,4,5,9,10,11, 1:7,9:14,
1:14,1,4,5,9,10,11)]
rad3<-c(rep("Restoration plots",14),rep("all semi natural grasslands",6),
rep("Restoration plots",14), rep("all semi natural grasslands",6),
rep("Restoration plots",27), rep("all semi natural grasslands",6))
rad4<-1:73
letaLista<-data.frame(rad1,rad2,rad3, rad4)
colnames(letaLista) <- c("Factor", "Variabel", "rest_alla", "id")
Now its easy to use subset to only get the choice that the user made. But how do i use this information to plot the plot and table without using a 73 line long ifelse statment?
I tried to create some sort of multidimensional array that could hold all the tables (and one for the plots) but i couldn't make it work. My experience with these kind of arrays is limited and this might be a simple issue, but any hints would be helpful!
My dataset that is the foundation for the plots and table consists of dataframe with 23 variables, factors and numerical. The plots and tabels are then created using the following code for all 73 combinations
s_A1 <- summarySE(Samlad_info, measurevar="Dist_brukcentrum",
groupvars="Companytype")
s_A1 <- s_A1[2:6,]
p_A1=ggplot(s_A1, aes(x=Companytype,
y=Dist_brukcentrum))+geom_bar(position=position_dodge(), stat="identity") +
geom_errorbar(aes(ymin=Dist_brukcentrum-se,
ymax=Dist_brukcentrum+se),width=.2,position=position_dodge(.9))+
scale_y_continuous(name = "") + scale_x_discrete(name = "")
where summarySE is the following function, burrowed from cookbook for R
summarySE <- function(data=NULL, measurevar, groupvars=NULL, na.rm=TRUE,
conf.interval=.95, .drop=TRUE) {
# New version of length which can handle NA's: if na.rm==T, don't count them
length2 <- function (x, na.rm=FALSE) {
if (na.rm) sum(!is.na(x))
else length(x)
}
# This does the summary. For each group's data frame, return a vector with
# N, mean, and sd
datac <- ddply(data, groupvars, .drop=.drop,
.fun = function(xx, col) {
c(N = length2(xx[[col]], na.rm=na.rm),
mean = mean (xx[[col]], na.rm=na.rm),
sd = sd (xx[[col]], na.rm=na.rm)
)
},
measurevar
)
# Rename the "mean" column
datac <- rename(datac, c("mean" = measurevar))
datac$se <- datac$sd / sqrt(datac$N) # Calculate standard error of the mean
# Confidence interval multiplier for standard error
# Calculate t-statistic for confidence interval:
# e.g., if conf.interval is .95, use .975 (above/below), and use df=N-1
ciMult <- qt(conf.interval/2 + .5, datac$N-1)
datac$ci <- datac$se * ciMult
return(datac)
}
The code in it's entirety is a bit to large but i hope this may clarify what i'm trying to do.
Well, thanks to florian's comment i think i might have found a solution my self. I'll present it here but leave the question open as there is probably far neater ways of doing it.
I rigged up the plots (that was created as lists by ggplot) into a list
plotList <- list(p_A1, p_A2, p_A3...)
tableList <- list(s_A1, s_A2, s_A3...)
I then used subset on my lookup table to get the matching id of the list to select the right plot and table.
output$thePlot <-renderPlot({
plotValue<-subset(letaLista, letaLista$Factor==input$Factor &
letaLista$Variabel== input$Variabel & letaLista$rest_alla==input$DataSource)
plotList[as.integer(plotValue[1,4])]
})
output$theTable <-renderTable({
plotValue<-subset(letaLista, letaLista$Factor==input$Factor &
letaLista$Variabel== input$Variabel & letaLista$rest_alla==input$DataSource)
skriva <- tableList[as.integer(plotValue[4])]
print(skriva)
})
I have an array as #versions with various versions ex- 6.1.1.1; 6.2.22.123; 2.12.1.123; 2.1.11.11, etc.
I have to make classes of all these specific format versions like- An array with version whose format is 1.1.1.1, 1.1.11.1, 1.1.1.111 and so on
My code is:
my #version1=();
foreach my $values (#versions)
{
if( $values=~ /(.?[0-9]{1}.[0-9]{1}.[0-9]{1}.[0-9]{1}$)/) # Regular expression to find version ex-1.1.1.1 at the end of the line($).
{
push (#version1,$values);
}
};
print "Version1 values are: #version1\n";
my #version2=();
foreach my $values (#versions)
{
if( $values=~ /(.?[0-9]{1}.[0-9]{1}.[0-9]{1}.[0-9]{2}$)/) # Regular expression to find version ex-1.1.1.11 at the end of the line($).
{
push (#version2,$values);
}
};
#print "Version2 values are: #version2\n";
It gives me the result but I cannot create so many regular expression, so I need to use looping.
Use the + quantifier to mean 1 or more digits:
/(.?[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$)/
I am a Perl newbie, stuck with another bioinformatics problem that requires some help and input.
The problem in brief:
I have a file, which has over 40,000 unique DNA sequences. By unique, I mean unique sequence id. I am attaching a portion of it at the end of my post to help you show what it looks like.
I need to divide each of the 40,000 sequences into 3 parts. So if a particular sequence is 999 character long, each of the 3 parts would have 333 characters.
I need to look for the following pattern through each of the 3 individual parts:
$gpat = [G]{3,5};
$npat = [A-Z]{1,25};
$pattern = $gpat.$npat.$gpat.$npat.$gpat.$npat.$gpat;
If $pattern appears in the first of the 3 parts, increase the counter of 'beginning', if $pattern occurs in the 2nd of the 3 parts, increase counter of 'middle' and lastly if the $pattern appears in the 3rd part, increase counter of 'end'.
Print the counters of 'beginning','middle' and 'end' i.e basically summation of 'beginning','middle','end' for each of the sequences.
Say in 1st sequence, the values are like '2','5','3' respectively and in 2nd sequence, the values are '4','1','6', the final count should be '7,6,9'.
The issues I am having:
If a particular sequence is split into 3 parts, potential $pattern is lost. eg say on a sequence like :
gggatgtcgatgcatggggatgcatcgatgcggggactagctagcgggatgctacgatggggatgatgataatatcgcggcgcatatatgctagtctatatatta
a split into 3 parts produces following 3 sub-parts,each of 35 character length:
gggatgtcgatgcatggggatgcatcgatgcgggg
actagctagcgggatgctacgatggggatgatgat
aatatcgcggcgcatatatgctagtctatatatta
Hence, $pattern gets split into the first 2 parts. Is there anyway to say "If $pattern begins in 1st part and ends in 2nd part", increase count of "beginning" ?
##UPDATE## The following issue has been resolved thanks to the code suggested by Cupidvogel
2.How do I divide a sequence into 3 parts if its length is not divisible by 3? I tried using int, but then the last part is 1-2
characters short.
The following is the code I have written so far.
It reads in the file, displays the header name and sequence, the length into which each sequence will be divided and finally the sequence split into 3 parts which works fine provided the sequence length is divisible by 3, for those which aren't, the final 3rd part is 1-2 characters short.
#Take Filename from user
print "Please enter file name : ";
$in =<>;
chomp $in;
open (FASTA,"$in") or die ;
while (<FASTA>)
{
$/=">";
#array = split '\n', $_;
$header=shift #array; # Header of the fasta sequence
print "\n\nNext sequence: \n";
print $header,"\n";
$seq= join '', #array; # sequence
$seq=~s/\s//g;
$seq=~s/\*//g;
$seq=~s/>//g;
print $seq,"\n\n";
$num = int(length($seq)/3);
#arr = unpack("A$num A$num A*",$seq);
print " New method gives this :", #arr;
print "\nThe first element is :", $arr[0];
print "\nThe second element is :",$arr[1];
print "\nThe third element is :",$arr[2] ;
#The following lines of code were originally written to split...
#...the sequence into 3 parts, albeit unsuccessfully
#my $split = (length $seq)/3;
#print $split,"\n\n";
#my $int = int $split;
#print $int,"\n\n";
#my #array2 = $seq =~ /(.{$int})/g;
#print join (" ", #array2),"\n\n";
#print $array2[0],"\n",$array2[1],"\n",$array2[2];
}
exit;
I have been trying the code I have written so far with the following sample file : sample.fa
>ABC_123 2
atgtcgatcgatcggcgggcatgcgcgcgcggatg
atatatagcgcgcgctatatagcgcgactctacgc
atgctgctgactagctatagtcgctgactgcgcgt
gggaaaaagggcccgggccccgttttggggatcta
ggggatagctgatgctagcatgcatgctgactgca
>DEF_456 4
gggatgtcgatgcatggggatgcatcgatgcgggg
actagctagcgggatgctacgatggggatgatgat
aatatcgcggcgcatatatgctagtctatatatta
>GHI_789 1
atagctgctagtcgatcggcgcgggtatcgatcgg
ggatcgatcgatcggggatcgatcgggggatcgat
The actual input file looks like the following:
>NR_037701 1
aggagctatgaatattaatgaaagtggtcctgatgcatgcatattaaaca
tgcatcttacatatgacacatgttcaccttggggtggagacttaatattt
aaatattgcaatcaggccctatacatcaaaaggtctattcaggacatgaa
ggcactcaagtatgcaatctctgtaaacccgctagaaccagtcatggtcg
gtgggctccttaccaggagaaaattaccgaaatcactcttgtccaatcaa
agctgtagttatggctggtggagttcagttagtcagcatctggtggagct
gcaagtgttttagtattgtttatttagaggccagtgcttatttagctgct
agagaaaaggaaaacttgtggcagttagaacatagtttattcttttaagt
gtagggctgcatgacttaacccttgtttggcatggccttaggtcctgttt
gtaatttggtatcttgttgccacaaagagtgtgtttggtcagtcttatga
cctctattttgacattaatgctggttggttgtgtctaaaccataaaaggg
aggggagtataatgaggtgtgtctgacctcttgtcctgtcatggctggga
actcagtttctaaggtttttctggggtcctctttgccaagagcgtttcta
ttcagttggtggaggggacttaggattttatttttagtttgcagccaggg
tcagtacatttcagtcacccccgcccagccctcctgatcctcctgtcatt
cctcacatcctgtcattgtcagagattttacagatatagagctgaatcat
ttcctgccatctcttttaacacacaggcctcccagatctttctaacccag
gacctacttggaaaggcatgctgggtctcttccacagactttaagctctc
cctacaccagaatttaggtgagtgctttgaggacatgaagctattcctcc
caccaccagtagccttgggctggcccacgccaactgtggagctggagcgg
gagggaggagtacagacatggaattttaattctgtaatccagggcttcag
ttatgtacaacatccatgccatttgatgattccaccactccttttccatc
tcccagaagcctgctttttaatgcccgcttaatattatcagagccgagcc
tggaatcaaactgcctctttcaaaacctgccactatatcctggctttgtg
acctcagccaagttgcttgactattctcagtctcagtttctgcacctgtc
aaatagggtttatgttaacctaactttcagggctgtcaggattaaatgag
catgaaccacataaaatgtttggtgtatagtaagtgtacagtaaatactt
ccattatcagtccctgcaattctatttttcttccttctctacacagcccc
tgtctggctttaaaatgtcctgccctgctttttatgagtggataccccca
gccctatgtggattagcaagttaagtaatgacactcagagacagttccat
ctttgtccataacttgctctgtgatccagtgtgcatcactcaaacagact
atctcttttctcctacaaaacagacagctgcctctcagataatgttgggg
gcataggaggaatgggaagcccgctaagagaacagaagtcaaaaacagtt
gggttctagatgggaggaggtgtgcgtgcacatgtatgtttgtgtttcag
gtcttggaatctcagcaggtcagtcacattgcagtgtgtcgcttcacctg
gctccctcttttaaagattttccttccctctttccaactccctgggtcct
ggatcctccaacagtgtcagggttagatgccttttatgggccacttgcat
tagtgtcctgatagaggcttaatcactgctcagaaactgccttctgccca
ctggcaaagggaggcaggggaaatacatgattctaattaatggtccaggc
agagaggacactcagaatttcaggactgaagagtatacatgtgtgtgatg
gtaaatgggcaaaaatcatcccttggcttctcatgcataatgcatgggca
cacagactcaaaccctctctcacacacatacacatatacattgttattcc
acacacaaggcataatcccagtgtccagtgcacatgcatacacgcacaca
ttcccttcctaggccactgtattgctttcctagggcatcttcttataaga
caccagtcgtataaggagcccaccccactcatctgagcttatcaaccaat
tacattaggaaagactgtatttcctagtaaggtcacattcagtagtactg
agggttgggacttcaacacagctttttgggggatcataattcaacccatg
acagccactgagattattatatctccagagaataaatgtgtggagttaaa
aggaagatacatgtggtacaaggggtggtaaggcaagggtaaaaggggag
ggaggggattgaactagacacagacacatgagcaggactttggggagtgt
gttttatatctgtcagatgcctagaacagcacctgaaatatgggactcaa
tcattttagtccccttctttctataagtgtgtgtgtgcggatatgtgtgc
tagatgttcttgctgtgttaggaggtgataaacatttgtccatgttatat
aggtggaaagggtcagactactaaattgtgaagacatcatctgtctgcat
ttattgagaatgtgaatatgaaacaagctgcaagtattctataaatgttc
actgttattagatattgtatgtctttgtgtccttttattcatgaattctt
gcacattatgaagaaagagtccatgtggtcagtgtcttacccggtgtagg
gtaaatgcacctgatagcaataacttaagcacacctttataatgacccta
tatggcagatgctcctgaatgtgtgtttcgagctagaaaatccgggagtg
gccaatcggagattcgtttcttatctataatagacatctgagcccctggc
ccatcccatgaaacccaggctgtagagaggattgaggccttaagttttgg
gttaaatgacagttgccaggtgtcgctcattagggaaaggggttaagtga
aaatgctgtataaactgcatgatgtttgcaggcagttgtggttttcctgc
ccagcctgccaccaccgggccatgcggatatgttgtccagcccaacacca
caggaccatttctgtatgtaagacaattctatccagcccgccacctctgg
actccctcccctgtatgtaagccctcaataaaaccccacgtctcttttgc
tggcaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaa
>NM_198399 1
aacagattttaactctgaaaagccatttccagtgtctatagactattgtg
agcctggagaagtagcatttagttgggatagcttcactagagctgcctgc
caaagacttccttccacaggatcttgtcgcaccagcaactgacaggagct
tgggagctcgggagcttgggagagggcttatgtttttaataatgtagctg
tcagttcgaagcctggaaatgttgaccctcaaagggcataaaatcttgtt
attttaatttgcatctgggagaatgtctgagcaaggagacctgaatcagg
caatagcagaggaaggagggactgagcaggagacggccactccagagaac
ggcattgttaaatcagaaagtctggatgaagaggagaaactggaactgca
gaggcggctggaggctcagaatcaagaaagaagaaaatccaagtcaggag
caggaaaaggtaaactgactcgcagccttgctgtctgtgaggaatcttct
gccagaccaggaggtgaaagtcttcaggatcagactctctgaaaactgca
aatggaaaggaattcaaaagaatttagattaaaagttaaataaaaagtag
gcacagtagtgctgaattttcctcaaaggctctcttttgataaggctgaa
ccaaatataatcccaagtatcctctctccttccttgttggagatgtctta
cctctcagctccccaaaatgcacttgcctataagaaacacaattgctggt
tcatatgaaacttaggaaatagtgaataaggtgcatttaactttggagaa
atacttttatggctttggtggagatttctcaatactgcaaaagttgtcca
gaaatgaatctgagctgatggtgactttaagttaatattattaatatatc
actgcatatttttacccttatttttgctccttacagcaagattagtaggt
tataaaaatttaaatttaaacaaaattatttcatgacaaaatgggaaact
tcacatcatacttatttttgtttgcctttcaggcatcatattagctttta
taaaaaatggtcttgctgctgaaattgtacttattttatcagaggctggg
tgcagtcaagacaaaagtaaaatggtttacctgagcccaggggagggaaa
attgattaagatatcattatttttgtttggtttggttttgcttttttcct
cttactttaattgaaatactctgaattcccctcatggaaacagagagcat
tgagagcactttctttaaaaggaccaaaaataaattcctaatagattttg
tcctaagagagtgtttttttttctagcatcattttctttacatgccactc
atgtcataaggcatggacaggctatctttcagtggccattactatgtttc
gtacacatgctttattttacttgggctctgagaaatgtgtggctttcctt
cagcattttatttgtgcttctctttttaatggagattgaaaagggagaat
aatgtgaatatcacggcttatattattaaatgttgattgatggcttgtaa
tgtactgcacacaatatatgttaactctgcagaatgacagaccctgggag
aagtaatgccccagttgtcccccactcctaatgccaggcagagaaggaca
gcctttatagacttaatctgctttttgtcccatttgacaaggtaccagga
ggaaattttttaagggatcaactgtatcacagtgcccactctggacctaa
gtctagtgtatccatacaattggtgcagagaaataaggtgtaaatggtgc
tttgttcctgctggttccaagctcagaaaccaagactagctttgtaggag
agaatgagagcctgcaagcctctctttggattggctgaggagtggtggga
gcagggggttgatagaaaacatccagacacacatataagcaagtggccgt
gctacctttttagagaataaagaaacagacttttgagtttatatgcaatg
ccttcattaggtaccaccggcacttacaaaatgtgcggactgaatcccag
agaacactggcagatgtatacagtatatggattgtatcgcttccccaatg
tttgtaaattcacagtatttggaaaactgccttcattttccagtgtggga
aaaactcttgctacctgtattacttgatctcagacccatacctgatggtt
cagtctgtccttaagttaaaagaattttgcttttctaatgttatactatt
tacctgtcagtgtattactgcaacttgaatcactcttttactgttgttgg
atataaacttatcctgtaccaatgtatttattaacacttgtattttatta
ttgagcatatcaataaaaatattaaaaaataacagattgttttttaccaa
aaaaaaaaaaaaa
>NR_026816 1
caacccactctctgtgctatgacttcattactctttcccagcccagccct
gggcaagccccttacgaagtctcaggctacctggatgaccaccctttctt
atgatgctgcaaggagggcaggtgggcagagccccgtgcatcctgggctc
aggccagggacccaagagcttgggagaagctggttctcagactgaaggcc
agagcccagcaccttgtcaccatcccggggagcatcatggcacacaacaa
ccagagccaaggctacagctagagagttgactcctctatttgagattgac
aggcctcggaagtcaaaataagtggtttcctagaccgggtcgagagcaag
tctctattggtcccaactgagttttttcagctggtttttcaaccaaacag
cacctcatctcccagtgaggggaagggaaggctgggctgagagcagcaag
gctgctcatctcacctctccccacccagccatgccagccgcctcacctgg
tggggagaggtgggcctcacctgggtcccctggcagtgctctgtgaaggg
tcttgacattgcactgtaataataaaggtgtgtgtgaagtatcaaaaaaa
>NR_027917 1
atgaagatgattgagcagcacaatcaggaatacagggaagggaaacacag
cttcacaatggccatgaacgcctttggagaaatgaccagtgaagaattca
ggcaggtggtgaatggctttcaaaaccagaagcacaggaaggggaaagtg
ctccaggaacctctgcttcatgacatccgcaaatctgtggattggagaga
gaaaggctacgtgactcctgtgaaggatcagtgcagctggggctctgtaa
ggacagatgttaggaaaactgagaaactagtttcactgagtgtgcagacc
tggtggactgctctaggcttcaaggcaatgttggctgcatttttggagaa
ccattattttgcttccagtatgttgccgacaatggaggcctggactctga
ggaatccttttcatatgaagaaaagctctggagactggaaagtccaaggt
cacagaggtgcatctggtgagagccttcttgctagtggggaatctcagca
gagtcctgaggtggcacagtattctgggaagcatcaagtgcagtgtcatc
ttatcgaggaggctctgcagatgctaagtggtggggatgaggatcacgat
gaagacaaatggccccatgacatgaggaatcatctggctggagaggccca
ggtgtag
>NR_002777 3
cttgtcctttcagaagatcagagacaagtgatatctgtgccaatttggcc
ttttcagtgttataattatggtgtcttgggatcccaatatttctcctaat
gtttccctgatgtgatactttgagagcccaggatgccagtacaataattg
aaattcacaaatgtctggtatcttgtccctcgtgccccatatattatctg
tggtttcggagagctcacttgtctcttatcttcagaaatgacagcacatg
aaatgttgtttggagccactgtcacatcaactgtagaaaaattaacaggt
cagctaagggatataatgtaactttatttgtgatatgagagaaatcttga
taaagacttgagagaaaactgggaggaaccttgtttagaagttataagga
ggggtaagttatgtgtgtcttggaaggagaatcataaatcttaaaacatg
agcctaatagagaacataaaattctaaaagataaagataataataatgat
aagccgcagggtggcttatgataatgtgacttctccttaccccagtagcg
tcggacatctgtcagctctgaaatgataaaaatgcacaatattgaataca
aacaaaggagtcagcactgaaattcattttctctccagattagggaaaga
gtaggtatgccctatggtagggcagtaaattgctgaatgatgagatgaaa
cagccacctagccatttcccattaaatataatcccatcagcagcagacaa
tatctatcctcccctatcccctctatccatatttggaaactgcaccctct
tccctatttagcaccctaacaccacttgaattccataaccctgttgttga
tctagctctcctcacctctaaacacttctagcattcctttcagatcagga
gctcgaaacactctcctttgattttttggaaaagtttctggcttcttcaa
ggtcacgttctccgtcctaagaattaaaaaaaaaaaaaaaaacttccaaa
cctttgaccttgtgtccgtggaacacccctgacttcctatcatttcaacc
cattgaggcacttgaactctcttcttggggatcctgagaagggagagtgc
aaactcttgaccctggaggcaaacaaaatgttctcatgtttgccttccca
cttactttctgtgagaacgtgggaagatcttaacctctcagaagcacagt
ttcttccttctaaaatgaaataattaacctctccctgtctacattcttaa
actcataggacataaaaaaaaaaaaaa
>NR_033769 1
ggcctctggcgggcctccagccagttagaccatttgactaggacgtgtgc
agctcagccagccacagaactggaatttttcaggagcagggggagcatgg
agtttggactttgctgagcaactgaagtggagcgcagagcttgctcgctt
aggagagggcagcatggatggcaaacaagggggcatggatgggagcaagc
ccacggggccaagagactctcctgacaccaggcttctttcaaacccattg
atgggtgattctgtgtctgattggtctcctatgcctgaagctgcaatcta
cggacatcagctgtctctgaggaacctcatcagccacgggtggcttgtga
acatcatcatggcagatcatgtttccccactccatgaagcctgtctcaga
ggtcatccctctcgtgtaaagattttattaaagcatggagctcaggtgaa
tggcgtgacaacagactggcacactccactgtttaatgtttgtatcagca
gcagctgggattatgcttctgcagcatggagccagcgttcaacctgagag
tgatctggcatcccccgtccatgaagctgctaggagaggccacgtggagt
gtgtcgactctcttacagcttataggggcaaaaatgaccataacatcagc
cacgtgggcacttcactgtatttggcttgtgaaaaccagcagatagcctg
tgtcaagaagcttctggagtcaggagcagacctgaacccagggagaggtt
ccccacttcatgcagtggccttcatgaaggccctcatgaaggattcccca
cttcatgcagtggccaggacagccagtgaagagctggcctgcctgctcat
ggattttggagcagacacccaggccaagaatgctgaaggcaaatgtcatg
tggagctggtgcctccagagagccctttgatccagctcttcttggagaga
gaagggcccccttcttttgatgcagttatgcctagaaatcagaagggctt
tggaatccagcagcatcataagataaccaaagtcgtcctcccagaggatc
tgaaatggtttctcctacatctttgtatgtatcaatggaatggattcaca
aacaatgtgaaaacattattgagtgttgtagccactagaattttaaaatc
aagttaggtttatagagtttgactagttttttcgattagatttgtattag
ttataaatttgttcatagagtttgactaattttttcgattagatttgtat
ttgttaaactctgaagccagagtttaaacacactgcatacgtttgtatga
ttagttagaaggcatgaagacttttttccctgcttggagactgtctaaaa
taacagctattgttttgcatatccactgcaggccaagcactttcagcatc
atctaattcagccctcacagcaactgggtcaatctgtccaatttcccagg
gcaaggatagaggagtcagattcaaatacaggttttctgacgttaactta
tgtgatgatttgatcaaagcaggattttccagcatcactatccttgttcc
atctctgctatatgggaatgaaaataaagaaatgtatttcaaaaaaataa
aaagaaaagaaaaacagagacggtc
>NM_016326 3
atgcgcgcaagagagcgggaagccgagctgggcgagaagtaggggagggc
ggtgctccgccgcggtggcggttgctatcgcttcgcagaacctactcagg
cagccagctgagaagagttgagggaaagtgctgctgctgggtctgcagac
gcgatggataacgtgcagccgaaaataaaacatcgccccttctgcttcag
tgtgaaaggccacgtgaagatgctgcggctggtgtttgcacttgtgacag
cagtatgctgtcttgccgacggggcccttatttaccggaagcttctgttc
aatcccagcggtccttaccagaaaaagcctgtgcatgaaaaaaaagaagt
tttgtaattttatattactttttagtttgatactaagtattaaacatatt
tctgtattcttccacatattttctgcagttattttaactcagtataggag
ctagaggaagagatttccgaagtctgcaccccgcgcagagcactactgta
acttccaagggagcgctgggagcagcgggatcgggttttccggcacccgg
gcctgggtggcagggaagaatgtgccgggatccgcctcagggatctttga
atctctttactgcctggctggccggcagctccg
>NM_181641 2
atgcgcgcaagagagcgggaagccgagctgggcgagaagtaggggagggc
ggtgctccgccgcggtggcggttgctatcgcttcgcagaacctactcagg
cagccagctgagaagagttgagggaaagtgctgctgctgggtctgcagac
gcgatggataacgtgcagccgaaaataaaacatcgccccttctgcttcag
tgtgaaaggccacgtgaagatgctgcggctggcactaactgtgacatcta
tgaccttttttatcatcgcacaagcccctgaaccatatattgttatcact
ggatttgaagtcaccgttatcttatttttcatacttttatatgtactcag
acttgatcgattaatgaagtggttattttggcctttgcttgtgtttgcac
ttgtgacagcagtatgctgtcttgccgacggggcccttatttaccggaag
cttctgttcaatcccagcggtccttaccagaaaaagcctgtgcatgaaaa
aaaagaagttttgtaattttatattactttttagtttgatactaagtatt
aaacatatttctgtattcttccacatattttctgcagttattttaactca
gtataggagctagaggaagagatttccgaagtctgcaccccgcgcagagc
actactgtaacttccaagggagcgctgggagcagcgggatcgggttttcc
ggcacccgggcctgggtggcagggaagaatgtgccgggatccgcctcagg
gatctttgaatctctttactgcctggctggccggcagctccg
>NM_001144931 1
gtttccgttcctctgcccgccatgccgttcctagagctgcacacgaattt
ccccgccaaccgagtgcccgcggggctggagaaacggctgtgcgccgtcg
ctgcctccatcttgggcaaacctgcagaccttgtgaacgtgacggtacgg
ccgggcctggccagggcgctgagcgggtccaccgagccctgcgcgcagct
gtccatctcctccatcggcgtagtgggcaccgccgaggacaaccgcagcc
acagtgcccacttctttgagtttctcaccaaggagctagccctgggccag
gaccggtgcgcaggggtagtaggcccggaatattattctaaaacacaatc
agagtactccattcctgctaacagtttaaagccaaacacctaggcaggcc
atttaggcttctgaatgactgggtcttgaccaggagagctgctgtctagg
ttttctcttcctgaccagttcctcaagagaaatgcaaaactagtgattaa
cagtaagagtcaggcagggcgcggtggctcacgcctgtaatcccagcact
ttgggaggccgag
>NR_029429 1
ggacaccaccccaaaatttcctagtcctctttgatacgggttcctccaat
ctgtagctgccctccatctactgccagagccaagtctgctccaatcacaa
caggttcaatcccagcctgtcctccaccttcagaaacgatggacaaacct
atggactatcctatgggagtggcagcctgagtgtgttcctgggctatgac
actgtgactgttcataacatcgttgtcaataaccaggagtttggcctgag
tgagaatgagcccagcgaccccttttactattcagactttgacgggatcc
tgggaatggcctacccaaacatggcagaggggaattcccctacagtaatg
caggggatgctgcagcagagccagcttactcagcccgtcttcagcttcta
cttcacctgccagccaacccgccagtattgtggagagctcatccttggag
gtgtggaccccaactttattctggtcagatcatctggacccctgtcagcc
cgtaactgtactggcagattgccatcgaggaatttgccatcggtaaccag
gccactggcttgtgctctgagggttgccaggccattgtggataccgagac
cttcctgc
>NR_026551 1
tgtggcctgagaggacggccaggactggccagaaaagagagggacgtggc
taaacgtgagggggcgtggccaagatggccgcgtgcgggatcctcgggta
ccgggagcgaacgaggaggttctggctcagtgcatccactctgggagagc
gtggacctggttcctgggggcgatcgccagtcacccatcaacattcggtg
gagggacagtgtttatgatcccggcttaaaaccactgaccatctcttatg
acccagccacctgcctccacgtctggaataatgggtactctttcctcgtg
gaatttgaagattctacagataaatcagctgcacttagtgcattggaacg
cagtcaaatttgaaaactttgaggatgcagcactggaagaaaatggtttg
gctgtgataggagtatttttaaagatttcggaaacttctggcagcccagt
gtctactggaaggcccaagccgcttgccagaaagctgcgccccgcccaaa
agcactgggttctgcagtccaggcccttcctcagctcccaggtccaggag
aactgcaaggtcacctacttccacaggaagcactgggtccgcatccggcc
cctccgcaccactcctcccagctgggactacacccgcatctgcatccaga
gagagatggtccccgcccgcatccgcgtcctgagagagatggtccccgag
gcctggaggtgctttcccaacaggctgccgctgctgagcaacatcaggcc
tgatttctccaaggctcccctggcctacgtgaagcggtggctttggaccg
cccgccacccccacagcctgtccgcagcctggtgaccgtgaaaatcgccc
cgccagagagcagaggaagcccgacgcccaggccatctgccttcaggtct
gtgatgagaaacggagtggcctgttccgttgtgcccaggtctaggccgct
gagcagagccctcactcccaggcagagttgtctgaatccttcct
>NM_181640 2
atgcgcgcaagagagcgggaagccgagctgggcgagaagtaggggagggc
ggtgctccgccgcggtggcggttgctatcgcttcgcagaacctactcagg
cagccagctgagaagagttgagggaaagtgctgctgctgggtctgcagac
gcgatggataacgtgcagccgaaaataaaacatcgccccttctgcttcag
tgtgaaaggccacgtgaagatgctgcggctggatattatcaactcactgg
taacaacagtattcatgctcatcgtatctgtgttggcactgataccagaa
accacaacattgacagttggtggaggggtgtttgcacttgtgacagcagt
atgctgtcttgccgacggggcccttatttaccggaagcttctgttcaatc
ccagcggtccttaccagaaaaagcctgtgcatgaaaaaaaagaagttttg
taattttatattactttttagtttgatactaagtattaaacatatttctg
tattcttccacatattttctgcagttattttaactcagtataggagctag
aggaagagatttccgaagtctgcaccccgcgcagagcactactgtaactt
ccaagggagcgctgggagcagcgggatcgggttttccggcacccgggcct
gggtggcagggaagaatgtgccgggatccgcctcagggatctttgaatct
ctttactgcctggctggccggcagctccg
>NM_016951 3
atgcgcgcaagagagcgggaagccgagctgggcgagaagtaggggagggc
ggtgctccgccgcggtggcggttgctatcgcttcgcagaacctactcagg
cagccagctgagaagagttgagggaaagtgctgctgctgggtctgcagac
gcgatggataacgtgcagccgaaaataaaacatcgccccttctgcttcag
tgtgaaaggccacgtgaagatgctgcggctggcactaactgtgacatcta
tgaccttttttatcatcgcacaagcccctgaaccatatattgttatcact
ggatttgaagtcaccgttatcttatttttcatacttttatatgtactcag
acttgatcgattaatgaagtggttattttggcctttgcttgatattatca
actcactggtaacaacagtattcatgctcatcgtatctgtgttggcactg
ataccagaaaccacaacattgacagttggtggaggggtgtttgcacttgt
gacagcagtatgctgtcttgccgacggggcccttatttaccggaagcttc
tgttcaatcccagcggtccttaccagaaaaagcctgtgcatgaaaaaaaa
gaagttttgtaattttatattactttttagtttgatactaagtattaaac
atatttctgtattcttccacatattttctgcagttattttaactcagtat
aggagctagaggaagagatttccgaagtctgcaccccgcgcagagcacta
ctgtaacttccaagggagcgctgggagcagcgggatcgggttttccggca
cccgggcctgggtggcagggaagaatgtgccgggatccgcctcagggatc
tttgaatctctttactgcctggctggccggcagctccg
>NR_002773 1
cagcaccacaccaggaccctccagaggctgtgagaaacatcctgcaccca
ggtcctctctatctgtttatcattgtctattttgtattctgcattcagaa
ccaagagcctgaagacgacccaggagctttagctatggctgtcttcatta
ttttgtccctgtttagtgttctggtgacaggcatgggtgaaggtggggct
gggagtgagaaaggaggtgagagggaatgtaagctgaaccagcttcccca
ttgcccctccgtatctcccagtgcccagccttggacacaccctggccaga
gccagctgtttgcagacctgagccgagaggagctgacggctgtgatgcgc
tttctgacccagcagctggggccagggctggtggatgcagcccaggccca
gccctcggacaactgtgtcttctcagtggagttgcagctgcctcccaagg
ctgcagccctggctcacttggacagggggagccccccacctgcccgggag
gcactggccatcgtcttctttggcaggcaaccccagcccaacgtgagtga
gctggtggtggggccactgcctcacccctcctacatgcgggacgtgactg
tggagcgtcatggaggccccctgccctatcaccgacgccccatgttgttc
caagagtacctggacatagaccagatgatcttcgacagagagctgcccca
ggcttctgggcttctccatcactgttgcttctacaagcgccggggacgga
acctggtgacaatgaccacggctccccgtggtctgcaatcaggggaccgg
gccacctagtttggcctctactacaacatctcgggcgctgggttcttcct
gcaccacgtgggcttggagctgctagtgaaccacaaggcccttgaccctg
cccgctggactatccagaaggtgttctatcaaggccgctactatgacagc
ctggcccagctggaggcccagtttgaggccggcctggtgaatgtggtgct
gatcccagacaatggcacaggtgggtcctggtccctgaagtcccctgtgc
ccccgggtccagctccccctctgcagttccatccccaaggcccccgcttc
agtgtccagggaagtcgagtggcctcctcactgtggactttctcctttgg
cctcggagcattcagtggcccaaggatctttgacgttcccttccaagggg
agagggtggcctatgaagtcagtgtccaggcggccttggccatctatgga
ggcaattctccttctgctctacgaagccggtacatagatagtggctttgg
cttgggccacttctccacgcccctgacccatggggtggactgcccctacc
tggccacctacgtggactggcacttcctttttgagtcccaggccgccaag
acaatacgcgatgccttttgtatatttgaacagaaccagggcctccccct
gcggcgacaccactcagatctctactcccactactttgggggccttgcgg
aaacggtgctggtcatcagatctgtgtctactatgctcaactatgactat
gtgtgggatatggtcttccaccctaatggggccatagaaatcagactcca
caccaccggctacatcagctcagcattcccctttggtgctgcccagaggt
atggaaacaaagtttcagagcacaccctgggcacggtccacacccacagc
gcccacttcaaggtggacctggatgtagcaggtaaggcatcctggcagag
gcaaaagtgctggaggggtgagctgaagtctccatgcctagctttaaaag
ttttcgttgggctgggagcagtagcttatgcctgtaagcccaacactttg
ggagactgaggggggtggatcacttgaggtcaggagttcaaaaccagcct
ggccaacatggcgaaatcctgtctgtactaaaaatacaaaaattagctgg
gcatgggtatgctgtaatcctagctactcgggaggctgaggcaggagaat
cacttgaatctgggagtcagaggttgcagtgagctgagattgagccactg
cactccatcctgcgtgactgaac
>NR_037806 1
attcccagtcacccactcactcagaaagccgggagtcatcggacaccttg
ctggtcagaggtcctgggggtggttttgaaccatcagagcttggactttt
ctgacttccccagcaaggatcttcccacttcctgctccctgtgttcccac
cctccagtgttggcacaggcccacccctggctccaccagagccagaagca
gaggtagaatcaggcgggccccgggctgcactccgagcagtgttcctggc
catctttgctactttcctagagaacccggctgttgccttaaatgtgtgag
agggacttggccaaggcaaaagctggggagatgccagtgacaacatacag
ttcatgactaggtttaggaattgggcactgagaaaattctcaatatttca
gagagtccttcccttatttgggactcttaacacggtatcctcgctagttg
gttttaagggaaacactctgctcctgggtgtgagcagaggctctggtctt
gccctgtggtttgactctccttagaaccaccgcccaccagaaacataaag
gattaaaatcacactaataacccctggatggtcaatctgataataggatc
agatttacgtctaccctaattcttaacattgcagctttctctccatctgc
agattattcccagtctcccagtaacacgtttctacccagatcctttttca
tttccttaagttttgatctccgtcttcctgatgaagcaggcagagctcag
aggatcttggcatcacccaccaaagttagctgaaagcagggcactcctgg
ataaagcagcttcactcaactctggggaatgctaccattttttttccaaa
gtagaaaggaagcacttctgagccagtgaccactgaaagatgaacactct
tcctgatcctctcctctagaattcatctcctcctgctagcagccgcgtcc
tggaggagcagcggatggggaatccattctgtttcttcctggtgtttagg
aagttgccccacacacagattgccccgatgtccaaccagaagaagtgaaa
ctgctgctgggtctggagaggtgaagacccgtggccagcttctgttgttg
ccatcggccattgctttttgttcgcttgcttttggttttgcaagaagagc
ggcctctgtctctgatctgcttcaaatcatcattccatcagtgacagaag
tggctgttccatcagtggtcgcagccagttcagctcctgcatccatcccc
aagtgttctgagtggaatttgaggcctccccaaccacctaccaaaaaagg
agggtgaaatgaaaggaagaagaaaaactcagcattctttcctctgacaa
agagtaaaacgacaaggaatatcggcctgaattctcttcccaagaagaaa
gaaagcacaccaacgcaggcatttgtcttctgtccatggtgctgaagttt
attcactttcaaaccactttcagtaacagcaaattctttagaaaaggaaa
atacagggaaagggataaacctcactgacttggaggaaatcaagaggagt
gagcacagcatcagaaagccccctggccccagactgcacccgctttcctg
gccctaccttgaaatccatcaggtctgcgttggacacggcattgtacatg
ggattagctctg
Any help and input would be deeply appreciated.
Thank you for taking the time to go through my problem!
Rather than splitting the sequence into three parts, the way I see this working is to find all occurrences of $pattern in the complete sequence and determine in which third the pattern starts.
The built-in variable $-[0] contains the offset of the start of the most recent successful match.
The code below does what I think you want. It works by accumulating each sequence (which ends either when a new sequence ID is found or the end of file is reached) and passing it to the process_seq subroutine.
The subroutine takes the length of the sequence and caclulates the offset of the end of each third of the string. The idiomatic sprintf '%.0f', $value is used to round fractional values to the nearest character position.
The #counts array is adjusted for each occurrence of $regex in the sequence. The element of #counts to be incremented is established by comparing the starting position of the match in $-[0] with the end offset of each of the three segments of the sequence.
Once each sequence has been processed the values in #counts are accumulated into #totals to give overall figures for all sequences.
The output of the program when using your sample data is shown. The grand total is (9, 1, 6).
use strict;
use warnings;
my $gpat = '[G]{3,5}';
my $npat = '[A-Z]{1,25}';
my $pattern = $gpat.$npat.$gpat.$npat.$gpat.$npat.$gpat;
my $regex = qr/$pattern/i;
open my $fh, '<', 'sequences.txt' or die $!;
my ($id, $seq);
my #totals = (0, 0, 0);
while (<$fh>) {
chomp;
if (/^>(\w+)/) {
process_seq($seq) if $id;
$id = $1;
$seq = '';
print "$id\n";
}
elsif ($id) {
$seq .= $_;
process_seq($seq) if eof;
}
}
print "Total: #totals\n";
sub process_seq {
my $sequence = shift;
my $length = length $sequence;
my #offsets = map {sprintf '%.0f', $length * $_ / 3} 1..3;
my #counts = (0, 0, 0);
while ($sequence =~ /$regex/g) {
my $place = $-[0];
for my $i (0..2) {
next if $place >= $offsets[$i];
$counts[$i]++;
last;
}
}
print "#counts\n\n";
$totals[$_] += $counts[$_] for 0..2;
}
output
NR_037701
0 0 1
NM_198399
1 0 0
NR_026816
1 0 1
NR_027917
0 0 0
NR_002777
0 0 0
NR_033769
1 0 0
NM_016326
1 0 1
NM_181641
1 0 1
NM_001144931
0 0 0
NR_029429
0 1 0
NR_026551
1 0 0
NM_181640
1 0 1
NM_016951
1 0 1
NR_002773
1 0 0
NR_037806
0 0 0
Total: 9 1 6
I lifted Borodin's process_seq function but used Bio:SeqIO to read in the file sequence by sequence, an advantage over manually reading line by line and the logic to determine various processing. I believe those advantages are:
Code that has been developed and tested by many others
Whenever possible, if output is done via the Bio::SeqIO module, the result file can then be read using Bio::SeqIO read (next_seq) method.
Other reasons I can't think of now :-)
I imagine the BioPerl package of Bio Genetic code modules must be overwhelming to a biologist beginning programming. He might not be willing to try to dig out the information he needs to begin building a program. BioPerl wiki is a good starting place, especially the Howto section, and then there's a how to for beginners and others. You'll find code examples which are mostly(?) helpful. Bio::Seq has some good code examples in the beginning and is where most of the general sequence functions are. Also, for input/output, the Bio::SeqIO module is used and it has examples at the beginning of it's manual.
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
my $gpat = '[G]{3,5}';
my $npat = '[A-Z]{1,25}';
my $pattern = $gpat.$npat.$gpat.$npat.$gpat.$npat.$gpat;
my $regex = qr/$pattern/i;
my $in = Bio::SeqIO->new ( -file => "fasta_dat.txt",
-format => 'fasta');
my #totals;
while ( my $seq = $in->next_seq() ) {
process($seq);
}
print "Totals: ";
print "#totals\n";
sub process {
my $seq = shift;
my #offset = map {sprintf '%.0f', $seq->length * $_ / 3} 1..3;
my $sequence = $seq->seq;
my #count = (0,0,0);
while ($sequence =~ /$regex/g) {
my $place = $-[0];
for my $i (0 .. 2) {
next if $place >= $offset[$i];
$count[$i]++;
last;
}
}
print $seq->id, "\n#count\n";
$totals[$_] += $count[$_] for 0 .. $#count;
}
I am trying to extract the Interface from an array created from an SNMP Query.
I want to create an array like THIS:
my #array = ( "Gig 11/8",
"Gig 10/1",
"Gig 10/4",
"Gig 10/2");
It currently looks like THIS:
my #array =
( "orem-g13ap-01 Gig 11/8 166 T AIR-LAP11 Gig 0",
"orem-g15ap-06 Gig 10/1 127 T AIR-LAP11 Gig 0",
"orem-g15ap-05 Gig 10/4 168 T AIR-LAP11 Gig 0",
"orem-g13ap-03 Gig 10/2 132 T AIR-LAP11 Gig 0");>
I am doing THIS:
foreach $ints (#array) {
#gig = substr("$ints", 17, 9);
print("Interface: #gig");
Sure it works, but the hostname [orem-g15ap-01] doesn't always stay the same length, it varies depending on the site. I need to extract the word "Gig" plus the next 6 characters. I have no idea what is the best way of doing this.
I am a novice at perl but trying. Thanks
# "I need to extract the word "Gig" plus the next 6 characters."
# This looks like a fixed-with file format, so consider using unpack.
foreach ( #lines ) {
my( $orem, $gig, $rest ) = unpack 'a17 a9 a*';
print "[$gig]\n";
}
If it's not fixed-with format, then you need to find out what the file spec is and then maybe use a regular expression, something like:
my( $orem, $gig, $rest ) = m/(\S+)\s+(.{9})(.*)/;
But this will not work in the general case without a proper file spec.
Stuff like that is what Perl is made for. Regular Expressions are the way to go. Read the perldoc perlre.
foreach $ints (#array) {
$ints =~ s/(Gig.{6})/$1/g;
}
So you want the second and third field.
my #array = map { /^\S+\s+(\S+\s\S+)/s } #source;
This one is like ikegami's, but I recommend that if you know how something you want looks, then by all means, specify that. Because this is done in a list context, any string that does not match the spec, returns an empty list--or is ignored.
my #results = map { m!(\bGig\s+\d+/d+)! } #array;