Perl: Find maximum value of a hash and compute averages - arrays

After a big break of ~6 months I am back in the world of Perl and Bioinformatics, interning under a different scientist. But the very first assignment is unlike any I had encountered last time, so while I have made some progress, I haven't been able to tackle the problem in its entirety. I am also trying to revise whatever I learnt last time as fast as possible, because I completely lost touch with programming these last 6 months.
The dataset looks like the following:
NR_046018 DDX11L1 , 0 0 1 1 1 1 1 1 1 1 0 0 0 0 1.44 2.72 3.84 4.92 5.6 6.64 7.08 9.12 9.56 8.28 7.16 6.08 5.4 4.36 3.92 1.88 0 0 0.76 1 1 1 1.2 2 2 2 1.72 2 2 2 1.8 1 1.88 2.4 3 3.36 5 6 6 6.72 6.12 5.6 5.44 5.56 5 4.04 5 4.28 4 4 3.08 2.08 1.68 1.96 1.44 3 3.68 4 4.16 5 4.32 4.8 6.16 6 6.28 6.92 7.84 7 7.32 7.2 5.96 5 4.52 4.08 3 3 4.04 4.12 4.44 4 3.52 3.4 4 4 2.64 1.88 1 1 1 0.64 1 1 1.24 2 2.92 3 3 2.96 2 2 2.56 2 1.08 2.12 3 3 3 3 2.6 3 4.64 3.88 3.72 4 4 4.96 4.6 4 2.36 2 1.28 1 1 0.04 0 0.24 1.08 2.68 3.84 4.12 5.72 6 6 5.76 4.92 3.32 3.12 2.88 2.08 2 2 2 2 2 1.44 2.92 3.04 4.28 5.8 7.8 9.48 10.52 13.04 12.08 11.6 11.72 11 9.2 7.52 7.12 7.08 7.08 8.32 7 6.6 7.6 8.04 8.36 6.72 7.88 7.72 8.4 9.24 8.88 8.96 9.88 10.08 9.24 9.28 10.16 11.04 10.52 10 8.56 8 7.8 7.72 6.44 4.32 4 4 3.72 3.68 3.68 3.28 5.56 7.36 9.48 10 10.52 11 12.16 11.96 9.44 8.64 7.52 7 6.48 6 5 5.12 6.28 6 5.52 6 6.68 6.08 7.52 8.16 7.72 8.52 8.56 9.2 9.16 8.92 7.44 6 5 3.48 2.92 2.16 2 2 1.2 1 1 1 1.24 1.64 1 1 1.96 2 2 2 1.76 1 1 1 0.52 1.76 3.64 5.12 6 6 6 6 5.52 4.24 2.36 0.88 0 0 0.68 1 1 1 1 1 1 1 0.32 0 0 1 1 1.44 2.44 3.68 5.4 6.88 7 6 6.52 6.76 6.56 5.32 3.6 2.92 3 3.72 3.96 3.8 3 3 3 2.2 2.4 2.28 1.52 1 1 1 1.72 2 1.6 1 1 1 1 1 0.28 0.92 2 2 2.72 3.64 4 4.84 5 4.08 3 3 2.68 2.36 2 1.16 1 1 2 4.92 4.6 4 4 4 4 4.32 4 1.08 1 1.52 2 2 2 1.68 1 1 1.32 1.48 1 1 1.52 2 2 2 1.68 1 1 1.88 1.48 1 1 1 1 1 1 0.12 0.4 1 1 1.2 3.88 4 5 5 4.6 4 4 3.8 2.08 2 1 1 1.44 2.4 3
NR_047520 LOC643837 , 3 2.2 0.2 0 0 0.28 1 1 1 1 2.2 4.8 5 5.32 5 5 5 5 3.8 1.2 1 0.4 0 0 0 0 0 1 1 1 1 1 1 1 1.56 1 1 1 1 1 1 1 0.44 0.68 1 1.52 3 3.6 4.96 6.8 9 8.32 8.72 8.48 7 7.4 8.8 7.92 7.12 8.84 8.56 9.4 10.2 10 7.24 6.44 6.76 6.16 5.72 4.96 4.8 5.16 6 5.84 4.12 3 3 2.64 2.56 3.08 3 4.16 5 6.72 7 7.16 7.44 5.76 5 4.56 4 3.68 5 5.4 5.52 6 6 5.28 5 3.6 2 2.08 1.48 1 2 2 2 2 2 1.36 1 1 0 0 0.68 1 1 1 1 1 1 1 0.32 0 0 0 1.16 2 2 2 2 2.88 3 3 1.84 1 2 2 2.04 2.12 2 2 2 2 1 1.28 1.96 1.36 2.76 3 3 3 3 2.72 2 1.64 0.76 1 1.36 2 2 2 2 2 1.48 1 0.64 0 0.08 1 1 1.08 2 2 2 2 2.68 2 2 2.16 3.4 4 4 4.2 4.24 4 5.68 6.52 4.6 4 4 3.8 3.8 4 3.12 2.24 2.6 3 4 4 3.2 3 2.2 2 1.4 1.84 1.24 2 2 2 2 2 2 1.16 0.76 0 0 0 0 0 0 0 0.36 1 1.68 2 2 2.92 5.4 6.76 7.64 7 6.88 7 7.36 7.92 6.24 5.92 7.04 9.52 11.52 12.88 14.8 16.36 19.88 22.24 20 19.36 16.92 15.24 13.84 10.88 8.24 5.08 4.96 3.12 3 2.88 2 2.8 2.96 4 4.44 5 6 6 6 5.12 3.28 2 1.56 1 0.08 1.68 2 2 2.84 3 3 3.8 3.92 2.32 2 2.2 2.16 2 2 1.2 1 1 1 0.8 0 0 0 0.72 2.88 3 3 3 3 3 3 2.28 0.12 0 0.52 1 1 1 1 1.44 2 2 1.48 1 1 1 1.56 1.56 1 1 1 1 1 1 0.44 0.8 1.48 3 3 3 3 3 3.56 3.2 2.76 2 2 2 2 2.68 2.44 2 1.76 1 1.4 2 2 1.56 2 2 2 2 2.04 2 2 1.76 1 1 1 1 0.56 0 0 0 0 0 0 0 0 0.72 1.52 2 2 2 2 2 2 1.28 0.48
1. What is needed
For each row in the data file, find the maximum value from the range of numbers.
Once the maximum has been found for every row, find the average of those maxima.
2. Strategy I was thinking
Separate the non-numerical part of each row from the numerical part and use it as the "keys" of a hash.
Put the numerical part into the "values" of a hash.
Assign the "values" to the array @values
Use the List::Util module (use List::Util qw(max)) to find the maximum value in the array
Store these maximum values in another array and find average from this array.
3. Code written so far
use warnings;
use List::Util qw(max);
#Input filename
$file = 'test1.data';
#Open file
open I, '<', $file or die;
#Separate data into keys and values, based on ','
chop (%hash = map { split /\s*,\s*/,$_,2 } grep (!/^$/,<I>));
print "$_ => $hash{$_}\n" for keys %hash; #Code is working fine till here
#Create a values array
@values = values %hash;
foreach $value (@values) {
    print "The values are : ", $value, "\n";
}
4. The Problem
Beyond this, I am not able to figure out how to add each "individual" array
element into a new array so that I may use the max function.
What I mean is that, for example, the first element of @values contains data like 0 0 1 1 3 4.4, and the second might contain data like 3 2.2 0.28 1 1 4.8. So I need to put each of these elements into its own array so that I can use the max function on it.
5. Points to Note
Most of the rows contain 400 numbers, some have a little less than that, but never more than 400.
There are a total of 23,558 rows.
File is a .txt file and all the numbers in each row are tab delimited.
I would be grateful to anyone who would be kind enough to point me in the right direction, or perhaps provide a better code to tackle the problem as mentioned in 1.

If I understand your problem correctly you're making it overly complicated:
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(max);
#Input filename
my $file = 'test1.data';
#Open file
open my $fh, '<', $file or die "Unable to open $file: $!\n";
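my ($total, $num) = (0, 0);
while (<$fh>) {
    next unless /\S/;    # skip blank lines, if the file has any
    # Fields 0-2 are the ID, the gene name and the comma; the numbers start at index 3
    my @values = split;
    my $max = max(@values[3 .. $#values]);
    $total += $max;
    $num++;
}
my $average_max = $total / $num;
print "Average maximum: $average_max\n";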
Just make one pass over your file, splitting each line into an array and feeding everything from index 3 onward to max. Add $max to $total for each line, increment a counter ($num), and calculate the average max from those two.
You should also always enable use strict and use lexical filehandles.
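If the name part of a row does not always have exactly two fields, relying on index 3 becomes fragile. Below is a minimal sketch of the same one-pass idea that splits on the comma instead. The function name average_row_max and the two-row sample are made up for illustration, and it assumes every data row has the form "ID NAME , n1 n2 n3 ...":

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: return the average of the per-row maxima read
# from a filehandle. Assumes rows look like "ID NAME , n1 n2 n3 ...".
sub average_row_max {
    my ($fh) = @_;
    my ($total, $rows) = (0, 0);
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;                 # skip blank lines
        # Split once at the first comma: name part, then the numbers
        my (undef, $numbers) = split /\s*,\s*/, $line, 2;
        next unless defined $numbers;              # no comma: not a data row
        my @values = split ' ', $numbers;
        next unless @values;                       # comma but no numbers
        $total += max(@values);
        $rows++;
    }
    return $rows ? $total / $rows : undef;         # undef if no usable rows
}

# Made-up sample, read through an in-memory filehandle
my $sample = "GENE_A X , 0 1.5 3\n\nGENE_B Y , 2 7 4.5\n";
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!\n";
print "average max: ", average_row_max($fh), "\n";   # row maxima are 3 and 7
close $fh;
```

To run it on the real file, open test1.data with a lexical filehandle as above and pass that handle to the function instead.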

Here is a fun solution. If you are using List::Util, you might as well use sum also.
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw/max sum/;
my %line_max = map {
    /([\w\s]*?)\s*,\s*(.*)/ or die "bad line";
    $1 => max split ' ', $2
} <DATA>;
print "$_: $line_max{$_}\n" foreach (keys %line_max);
my $avg_max = sum (values %line_max) / scalar (values %line_max);
print "average: $avg_max\n";
__DATA__
NR_046018 DDX11L1 , 0 0 1 1 1 1 1 1 1 1 0 0 0 0 1.44 2.72 3.84 4.92 5.6 6.64 7.08 9.12 9.56 8.28 7.16 6.08 5.4 4.36 3.92 1.88 0 0 0.76 1 1 1 1.2 2 2 2 1.72 2 2 2 1.8 1 1.88 2.4 3 3.36 5 6 6 6.72 6.12 5.6 5.44 5.56 5 4.04 5 4.28 4 4 3.08 2.08 1.68 1.96 1.44 3 3.68 4 4.16 5 4.32 4.8 6.16 6 6.28 6.92 7.84 7 7.32 7.2 5.96 5 4.52 4.08 3 3 4.04 4.12 4.44 4 3.52 3.4 4 4 2.64 1.88 1 1 1 0.64 1 1 1.24 2 2.92 3 3 2.96 2 2 2.56 2 1.08 2.12 3 3 3 3 2.6 3 4.64 3.88 3.72 4 4 4.96 4.6 4 2.36 2 1.28 1 1 0.04 0 0.24 1.08 2.68 3.84 4.12 5.72 6 6 5.76 4.92 3.32 3.12 2.88 2.08 2 2 2 2 2 1.44 2.92 3.04 4.28 5.8 7.8 9.48 10.52 13.04 12.08 11.6 11.72 11 9.2 7.52 7.12 7.08 7.08 8.32 7 6.6 7.6 8.04 8.36 6.72 7.88 7.72 8.4 9.24 8.88 8.96 9.88 10.08 9.24 9.28 10.16 11.04 10.52 10 8.56 8 7.8 7.72 6.44 4.32 4 4 3.72 3.68 3.68 3.28 5.56 7.36 9.48 10 10.52 11 12.16 11.96 9.44 8.64 7.52 7 6.48 6 5 5.12 6.28 6 5.52 6 6.68 6.08 7.52 8.16 7.72 8.52 8.56 9.2 9.16 8.92 7.44 6 5 3.48 2.92 2.16 2 2 1.2 1 1 1 1.24 1.64 1 1 1.96 2 2 2 1.76 1 1 1 0.52 1.76 3.64 5.12 6 6 6 6 5.52 4.24 2.36 0.88 0 0 0.68 1 1 1 1 1 1 1 0.32 0 0 1 1 1.44 2.44 3.68 5.4 6.88 7 6 6.52 6.76 6.56 5.32 3.6 2.92 3 3.72 3.96 3.8 3 3 3 2.2 2.4 2.28 1.52 1 1 1 1.72 2 1.6 1 1 1 1 1 0.28 0.92 2 2 2.72 3.64 4 4.84 5 4.08 3 3 2.68 2.36 2 1.16 1 1 2 4.92 4.6 4 4 4 4 4.32 4 1.08 1 1.52 2 2 2 1.68 1 1 1.32 1.48 1 1 1.52 2 2 2 1.68 1 1 1.88 1.48 1 1 1 1 1 1 0.12 0.4 1 1 1.2 3.88 4 5 5 4.6 4 4 3.8 2.08 2 1 1 1.44 2.4 3
NR_047520 LOC643837 , 3 2.2 0.2 0 0 0.28 1 1 1 1 2.2 4.8 5 5.32 5 5 5 5 3.8 1.2 1 0.4 0 0 0 0 0 1 1 1 1 1 1 1 1.56 1 1 1 1 1 1 1 0.44 0.68 1 1.52 3 3.6 4.96 6.8 9 8.32 8.72 8.48 7 7.4 8.8 7.92 7.12 8.84 8.56 9.4 10.2 10 7.24 6.44 6.76 6.16 5.72 4.96 4.8 5.16 6 5.84 4.12 3 3 2.64 2.56 3.08 3 4.16 5 6.72 7 7.16 7.44 5.76 5 4.56 4 3.68 5 5.4 5.52 6 6 5.28 5 3.6 2 2.08 1.48 1 2 2 2 2 2 1.36 1 1 0 0 0.68 1 1 1 1 1 1 1 0.32 0 0 0 1.16 2 2 2 2 2.88 3 3 1.84 1 2 2 2.04 2.12 2 2 2 2 1 1.28 1.96 1.36 2.76 3 3 3 3 2.72 2 1.64 0.76 1 1.36 2 2 2 2 2 1.48 1 0.64 0 0.08 1 1 1.08 2 2 2 2 2.68 2 2 2.16 3.4 4 4 4.2 4.24 4 5.68 6.52 4.6 4 4 3.8 3.8 4 3.12 2.24 2.6 3 4 4 3.2 3 2.2 2 1.4 1.84 1.24 2 2 2 2 2 2 1.16 0.76 0 0 0 0 0 0 0 0.36 1 1.68 2 2 2.92 5.4 6.76 7.64 7 6.88 7 7.36 7.92 6.24 5.92 7.04 9.52 11.52 12.88 14.8 16.36 19.88 22.24 20 19.36 16.92 15.24 13.84 10.88 8.24 5.08 4.96 3.12 3 2.88 2 2.8 2.96 4 4.44 5 6 6 6 5.12 3.28 2 1.56 1 0.08 1.68 2 2 2.84 3 3 3.8 3.92 2.32 2 2.2 2.16 2 2 1.2 1 1 1 0.8 0 0 0 0.72 2.88 3 3 3 3 3 3 2.28 0.12 0 0.52 1 1 1 1 1.44 2 2 1.48 1 1 1 1.56 1.56 1 1 1 1 1 1 0.44 0.8 1.48 3 3 3 3 3 3.56 3.2 2.76 2 2 2 2 2.68 2.44 2 1.76 1 1.4 2 2 1.56 2 2 2 2 2.04 2 2 1.76 1 1 1 1 0.56 0 0 0 0 0 0 0 0 0.72 1.52 2 2 2 2 2 2 1.28 0.48
Note: the map syntax is cute, but if the file is large you should be using a while loop for efficiency. The while loop avoids reading the whole file into memory:
my %line_max;
while (<DATA>)
{
    if (/^([\w\s]*?)\s*,\s*(.*)/)
    {
        $line_max{$1} = max split ' ', $2;
    }
    else
    {
        print "Line $. is bad.\n";
    }
}

Related

Spotfire Line chart with min max bars

I am trying to make a chart that has a line graph showing the change in value in the count column for each month, and then two points showing the min and max value in that month. The table is below.
Date Min Max Count
1/1/2015 0.28 6.02 13
2/1/2015 0.2 7.72 8
3/1/2015 1 1 1
4/1/2015 0.4 6.87 7
5/1/2015 0.36 3.05 8
6/1/2015 0.17 1.26 13
7/1/2015 0.31 1.59 15
8/1/2015 0.39 3.35 13
9/1/2015 0.22 0.86 10
10/1/2015 0.3 2.48 13
11/1/2015 0.16 0.82 9
12/1/2015 0.33 2.18 5
1/1/2016 0.23 1.16 14
2/1/2016 0.38 1.74 7
3/1/2016 0.1 8.87 9
4/1/2016 0.28 0.68 3
5/1/2016 0.13 3.23 11
6/1/2016 0.33 1 5
7/1/2016 0.28 1.26 4
8/1/2016 0.08 0.41 2
9/1/2016 0.43 0.61 2
10/1/2016 0.49 1.39 4
11/1/2016 0.89 0.89 1
I tried doing a scatter plot but when I try to Add a Line from Column value I get an error saying that the line cannot work on categorical data.
Any suggestions on how I can prepare this visualization?
Thanks!
I would do this in a combination chart.
Insert a combination chart (Line & Bar Graph)
On your X-Axis put your date as <BinByDateTime([Date],"Year.Month",1)>
On your Y-Axis put your aggregations: Sum([Count]), Max([Max]), Min([Min])
Right click > Properties > Series > set the Min and Max to Line Type
(Optional) Change the Y-Axis scale

Index Rebuild and Reorganize

How can we identify when indexes in SQL Server need to be rebuilt or reorganized?
In other words, what percentage of fragmentation is the threshold for rebuilding an index?
For example, see the status report below:
index_id avg_page_space_used_in_percent avg_fragmentation_in_percent index_level record_count page_count fragment_count avg_record_size_in_bytes
1 99.47111441 0 0 300000 2231 2 57.888
1 89.55707932 0 1 2231 4 2 11
1 0.617741537 0 2 4 1 1 11
4 99.72704472 0.113895216 0 300000 878 4 21.629
4 80.40214974 0 1 878 4 2 27.657
4 1.383741043 0 2 4 1 1 26.5
5 99.71136644 0 0 300000 1236 4 31.259
5 85.67899679 0 1 1236 7 2 37.286
5 3.261675315 0 2 7 1 1 36
Please let me know. I would like to know the criteria for when this action is required.
Act according to this link; it explains how and when.

Plot this kind of graph from data of an array

Good afternoon,
I am working on a Matlab project and I have stored some data in an array. I would like to produce a plot like the one shown below. However, I don't know which plotting function I need to use, or how, to obtain that kind of image (it will not be identical, but in this style).
My data is in an 11x16 matrix.
Thank you guys so much beforehand!
@rayryeng
It was a really useful answer, although I didn't need that exact shape. I need the shape that my data would create. I've been trying to modify the code you wrote to obtain what I need, but I have not managed it...
My data is
data = [10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 ;
8.00 8.02 8.04 8.07 8.12 8.20 8.30 8.42 8.53 8.63 8.72 8.80 8.86 8.91 8.96 9.00;
6.00 6.03 6.07 6.12 6.22 6.37 6.59 6.83 7.07 7.28 7.45 7.60 7.72 7.83 7.92 8.00;
4.00 4.03 4.07 4.14 4.26 4.48 4.85 5.26 5.63 5.95 6.21 6.43 6.61 6.75 6.88 7.00;
2.00 2.02 2.05 2.10 2.20 2.44 3.08 3.70 4.23 4.67 5.01 5.29 5.52 5.70 5.86 6.00;
0 0 0 0 0 0 1.33 2.24 2.93 3.47 3.88 4.21 4.46 4.67 4.84 5.00;
0 0 0 0 0 0 0 1.01 1.78 2.38 2.84 3.19 3.46 3.67 3.84 4.00;
0 0 0 0 0 0 0 0 0.80 1.43 1.91 2.25 2.51 2.70 2.86 3.00;
0 0 0 0 0 0 0 0 0 0.63 1.10 1.41 1.62 1.77 1.89 2.00;
0 0 0 0 0 0 0 0 0 0 0.44 0.66 0.79 0.88 0.94 1.00;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
This is my matrix of data (sorry, I know it's long). When I try to plot it by writing:
[x,y] = meshgrid(1:16,1:11);
contourf(x,y,data,20,'LineStyle','none');
colorbar
It should have a different shape than what I get. I need the parts that are 0 (zeros) to appear like the white part of the plot I showed before (with a different shape, though). I don't really know how to do it (my data should be read properly). If you could help me I would be really thankful.
Thank you so much for the last answer.
It depends on your data, I believe you should use contourf.
This is as close as I could get,
[x,y] = meshgrid(1:16,1:11);
data = - y;
data(end,5:10) = NaN;
data(end-1,6:9) = NaN;
data(end-2,7:8) = NaN;
contourf(x,y,data,20,'LineStyle','none');
colorbar
with,
data = - y .* abs(log(sin(.10 * x - 5.5)+.5));
data(data < -4) = NaN;
So I suppose the code is right and it's a matter of your data,
with data = max(data(:)) - data;
What you have is almost correct. All you need to do is set any data that is 0 to NaN. That way, when you throw it into contourf, those parts are not visualized. As such:
data = [10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 ;
8.00 8.02 8.04 8.07 8.12 8.20 8.30 8.42 8.53 8.63 8.72 8.80 8.86 8.91 8.96 9.00;
6.00 6.03 6.07 6.12 6.22 6.37 6.59 6.83 7.07 7.28 7.45 7.60 7.72 7.83 7.92 8.00;
4.00 4.03 4.07 4.14 4.26 4.48 4.85 5.26 5.63 5.95 6.21 6.43 6.61 6.75 6.88 7.00;
2.00 2.02 2.05 2.10 2.20 2.44 3.08 3.70 4.23 4.67 5.01 5.29 5.52 5.70 5.86 6.00;
0 0 0 0 0 0 1.33 2.24 2.93 3.47 3.88 4.21 4.46 4.67 4.84 5.00;
0 0 0 0 0 0 0 1.01 1.78 2.38 2.84 3.19 3.46 3.67 3.84 4.00;
0 0 0 0 0 0 0 0 0.80 1.43 1.91 2.25 2.51 2.70 2.86 3.00;
0 0 0 0 0 0 0 0 0 0.63 1.10 1.41 1.62 1.77 1.89 2.00;
0 0 0 0 0 0 0 0 0 0 0.44 0.66 0.79 0.88 0.94 1.00;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];
data(data == 0) = NaN;
[x,y] = meshgrid(1:16,1:11);
contourf(x,y,data,20,'LineStyle','none');
colorbar
This is what I get:
Given your comments, you want the y-axis to be reversed. Simply put axis ij; at the end of the code above to flip the y-axis so that y-down is the positive direction. If you do that, we get this figure:
Credit should go to Kamtal as he figured out where you needed to start. I just helped finish off the requirement.

reading and printing a .csv file like a 2D matrix with both integer and float values in c

I am trying to read a file with a .csv extension in C. The file contains both integer and float values. Is there any way to read the CSV file? Any help is appreciated.
The data is as follows:
Application_No.,Actual_Effort (in PM),No of Processes,No of Tasks,No of partnerLinks,Task Variables,Element Variables,Event Variables,Script,Developer's Skills,Developer's Confidence,TPSS,TS,TCC
1,918.28,1,3,5,33,7,2,3,3.5,1,8,135,143
2,8891.513,3,9,3,100,15,6,12,3,1,36,1197,1233
3,22479.261,5,15,23,125,25,10,20,3,1,190,2700,2890
4,2961.131,2,4,9,70,13,4,17,2,0,72,416,488
5,19650.198,7,14,19,130,28,12,5,2.5,0,231,2450,2681
6,377.75,1,2,4,22,8,2,2,3,1,6,68,74
7,2671.93,1,5,12,55,12,6,4,2,0,17,385,402
8,966.15,3,3,6,31,8,5,7,2.5,0,27,153,180
9,3765.81,2,6,17,73,14,2,3,3.5,1,46,552,590
10,7467.11,4,8,21,87,19,13,1,2,0,116,960,1076

Subsetting Last N Values From a Data Frame, R

I have a data frame of all the results of a football season, in a data frame called new. I want to extract the last 5 games of all teams home and away. The home variable is column 1 and away variable is column 2.
Say there are 20 teams in a character vector called teams, each with a unique name. If it was just a single team it would be easy to subset - say if team1 was "Arsenal", using something like
Arsenal <- "Arsenal"
head(new[new[,1] == Arsenal | new[,2] == Arsenal,], 5)
But I want to loop through the character vector teams to obtain the last 5 results of all teams, 20 in total. Can somebody help me please?
Edit: Here is some sample data. As an example, I would like to obtain the last two games of all teams- it would be easy to subset a single team but I'm not sure how to subset multiple teams.
V1 V2 V3 V4 V5
1 Chelsea Everton 2 1 19/05/2013
2 Liverpool QPR 1 0 19/05/2013
3 Man City Norwich 2 3 19/05/2013
4 Newcastle Arsenal 0 1 19/05/2013
5 Southampton Stoke 1 1 19/05/2013
6 Swansea Fulham 0 3 19/05/2013
7 Tottenham Sunderland 1 0 19/05/2013
8 West Brom Man United 5 5 19/05/2013
9 West Ham Reading 4 2 19/05/2013
10 Wigan Aston Villa 2 2 19/05/2013
11 Arsenal Wigan 4 1 14/05/2013
12 Reading Man City 0 2 14/05/2013
13 Everton West Ham 2 0 12/05/2013
14 Fulham Liverpool 1 3 12/05/2013
15 Man United Swansea 2 1 12/05/2013
16 Norwich West Brom 4 0 12/05/2013
17 QPR Newcastle 1 2 12/05/2013
18 Stoke Tottenham 1 2 12/05/2013
19 Sunderland Southampton 1 1 12/05/2013
20 Aston Villa Chelsea 1 2 11/05/2013
21 Chelsea Tottenham 2 2 08/05/2013
22 Man City West Brom 1 0 07/05/2013
23 Wigan Swansea 2 3 07/05/2013
24 Sunderland Stoke 1 1 06/05/2013
25 Liverpool Everton 0 0 05/05/2013
26 Man United Chelsea 0 1 05/05/2013
27 Fulham Reading 2 4 04/05/2013
28 Norwich Aston Villa 1 2 04/05/2013
29 QPR Arsenal 0 1 04/05/2013
30 Swansea Man City 0 0 04/05/2013
31 Tottenham Southampton 1 0 04/05/2013
32 West Brom Wigan 2 3 04/05/2013
33 West Ham Newcastle 0 0 04/05/2013
34 Aston Villa Sunderland 6 1 29/04/2013
35 Arsenal Man United 1 1 28/04/2013
36 Chelsea Swansea 2 0 28/04/2013
37 Reading QPR 0 0 28/04/2013
38 Everton Fulham 1 0 27/04/2013
39 Man City West Ham 2 1 27/04/2013
40 Newcastle Liverpool 0 6 27/04/2013
41 Southampton West Brom 0 3 27/04/2013
42 Stoke Norwich 1 0 27/04/2013
43 Wigan Tottenham 2 2 27/04/2013
Where df is your data.frame, this will create a list of 20 data.frames with each element being the dataset for one team. This also assumes that the dataset is already ordered, since you mentioned it.
# Base R column rename; no extra package needed
names(df) <- c('hometeam','awayteam','homegoals','awaygoals','fixturedate')
allteams <- sort(unique(df$hometeam))
eachteamlastfive <- vector(mode = "list", length = length(allteams))
for (i in seq_along(allteams))
{
    # Rows where the team played at home or away; the first 5 are the most recent
    eachteamlastfive[[i]] <- head(df[df$hometeam == allteams[i] | df$awayteam == allteams[i], ], 5)
}
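Take a look at sapply. Passing simplify = FALSE keeps one data frame per team instead of collapsing the result into a matrix:
sapply(unique(new[,1]), function(team) head(new[new[,1] == team | new[,2] == team,], 5), simplify = FALSE)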
