I have looked into other threads on this problem and could not find an easy solution. I imported data from Excel tables and joined them into lists, which generally look like this:
> Hemo
[[1]]
V1 V2 V3 V4 V5 V6 V7
1 0d 3d 6d 9d 12d 15d 18d
2 10 40 20 60 50 30 40
3 20 30 30 30 30 30 30
4 20 20 30 20 40 20 50
[[2]]
V1 V2 V3 V4 V5 V6 V7
1 0d 3d 6d 9d 12d 15d 18d
2 0 10 10 0 0 0 0
3 0 10 20 20 20 0 0
4 0 0 10 20 20 0 0
However, I'd like them to look like this (an array):
, , 1
0d 3d 6d 9d 12d 15d 18d
V2 10 40 20 60 50 30 40
V3 20 30 30 30 30 30 30
V4 20 20 30 20 40 20 50
, , 2
0d 3d 6d 9d 12d 15d 18d
V2 0 10 10 0 0 0 0
V3 0 10 20 20 20 0 0
V4 0 0 10 20 20 0 0
In the first case all elements are characters and I am not able to coerce them to numbers. Ultimately I'd like to convert the first list into the second array, with the first imported row used as the column names. There must be some package enabling this? Please suggest a simple workaround, as I am a newbie. Thanks
It appears as though you imported the data from Excel, but the column names were interpreted as data. You didn't specify which function you used for the import, but with most of them you can specify that the first row of the data contains the column names.
library(readxl)
data <- read_excel(filename, col_names = TRUE)
When you import your data properly, the header row won't be mixed in with the actual values, and the numeric columns should be read as numerics automatically. This way you won't have to convert them yourself.
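If re-importing isn't an option, the conversion can also be done on the list you already have. This is a rough sketch, assuming the list is named Hemo (as in the question) and every element has the same dimensions:

```r
# Sketch: drop the header row from each list element, coerce the rest to
# numeric, and stack the resulting matrices into a 3-D array.
mats <- lapply(Hemo, function(df) {
  m <- apply(as.matrix(df[-1, ]), 2, as.numeric)  # row 1 is the header
  colnames(m) <- as.character(unlist(df[1, ]))    # "0d", "3d", ...
  m
})
arr <- array(unlist(mats),
             dim = c(nrow(mats[[1]]), ncol(mats[[1]]), length(mats)),
             dimnames = c(dimnames(mats[[1]]), list(NULL)))
```

This needs no extra package; `apply` with `as.numeric` handles the coercion and `array` does the stacking.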
I have an SQL table containing the GPS coordinates of a device, updated every n minutes (the device is installed in a vehicle). Given the nature of GPS, many of the entries are very similar, but entirely different as far as the server is concerned. I can match positions approximately (within ~3.6' or maybe 36') easily enough with CAST(Lat AS DECIMAL(7,4)).
I'd like to be able to take a result set and condense the approximately duplicate entries, while still maintaining the time-based order. Here's an example:
Row Lat Lng vel Hdg Time
01 31.12345 -88.12345 00 00 12-4-21 01:45:00
02 31.12346 -88.12345 00 00 12-4-21 01:46:00
03 31.12455 -88.12410 10 01 12-4-21 01:47:00
04 31.12495 -88.12480 17 01 12-4-21 01:48:00
05 31.12532 -88.12560 22 01 12-4-21 01:49:00
06 31.12567 -88.12608 25 02 12-4-21 01:50:00
07 31.12638 -88.12672 24 02 12-4-21 01:51:00
08 31.12689 -88.12722 19 02 12-4-21 01:52:00
09 31.12345 -88.12345 00 00 12-4-21 01:53:00
10 31.12346 -88.12346 00 00 12-4-21 01:54:00
11 31.12347 -88.12345 00 00 12-4-21 01:55:00
12 31.12346 -88.12346 00 00 12-4-21 01:56:00
13 31.12689 -88.12788 10 40 12-4-21 01:57:00
14 31.12604 -88.12691 13 39 12-4-21 01:58:00
15 31.12572 -88.12603 15 39 12-4-21 01:59:00
My desired end result would be rows 1 and 2 condensed into a single row, and rows 9 through 12 condensed into a single row, each containing AVG(Lat), AVG(Lng), and MIN(Time).
This is the result set I would like to receive, given the above data:
Row Lat Lng vel Hdg Time
01 31.123455 -88.12345 00 00 12-4-21 01:45:00
02 31.12455 -88.12410 10 01 12-4-21 01:47:00
03 31.12495 -88.12480 17 01 12-4-21 01:48:00
04 31.12532 -88.12560 22 01 12-4-21 01:49:00
05 31.12567 -88.12608 25 02 12-4-21 01:50:00
06 31.12638 -88.12672 24 02 12-4-21 01:51:00
07 31.12689 -88.12722 19 02 12-4-21 01:52:00
08 31.12346 -88.123455 00 00 12-4-21 01:53:00
09 31.12689 -88.12788 10 40 12-4-21 01:57:00
10 31.12604 -88.12691 13 39 12-4-21 01:58:00
11 31.12572 -88.12603 15 39 12-4-21 01:59:00
The boundaries between groupings would be movement: velocity > 0, or the GPS coordinate changing by more than x. In this case, x is .0001. The problem, as described below, is that multiple stops (AT DIFFERENT TIMES) at a given coordinate are lumped into a single stop. If I visit coordinate x today at 4 pm, tomorrow at 8 am, and then again at 6 pm, the only one I see is the tomorrow @ 6 pm stop (in the case of MAX(Time)) or the today @ 4 pm stop (in the case of MIN(Time)).
It's a given that if velocity is 0, heading is also 0. It is, however, important that rows 1 and 2, and rows 9 through 12, NOT be grouped together even if their coordinates are similar enough to be considered the same (i.e. when rounded to 4 decimal places).
I have a query that does the condensing, but it lumps separate stops together:
SELECT Geography::Point(AVG(dbo.GPSEntries.Latitude),
                        AVG(dbo.GPSEntries.Longitude),
                        4326) AS Location,
       dbo.GPSEntries.Velocity,
       dbo.GPSEntries.Heading,
       MAX(dbo.GPSEntries.Time) AS maxTime,
       MIN(dbo.GPSEntries.Time) AS minTime,
       AVG(dbo.RFDatas.RSSI) AS avgRSSI,
       COUNT(1) AS samples
FROM dbo.GPSEntries
INNER JOIN dbo.Reports
        ON dbo.GPSEntries.Report_Id = dbo.Reports.Id
INNER JOIN dbo.RFDatas
        ON dbo.GPSEntries.Report_Id = dbo.RFDatas.Report_Id
GROUP BY CAST(Latitude AS DECIMAL(7,4)),
         CAST(Longitude AS DECIMAL(7,4)),
         Velocity,
         Heading
ORDER BY MAX(Time)
In other words, if I travel from point A to point B, stay for 30 minutes (30 reports at 1 per minute), then travel to point C, stay for 20 minutes, then travel back to point B and stay for 20 more minutes before heading to point D, I would like to be able to see both separate stops at point B.
Here's some actual data from my DB, sanitized to protect the innocent, or to blame someone in northeast Alabama.
Latitude Longitude Spd Vel MAX(Time) MIN(Time) sig RowCount
34.747420 -86.302580 68 157 2012-06-13 01:31:37.000 2012-06-13 01:31:37.000 -91 1
34.759140 -86.307620 61 134 2012-06-13 01:33:06.000 2012-06-13 01:33:06.000 -91 2
34.763237 -86.307264 0 0 2012-06-13 01:34:36.000 2012-06-12 01:27:21.000 -97 7
34.763288 -86.307280 0 0 2012-06-13 14:30:44.000 2012-06-12 01:30:21.000 -98 527
34.760220 -86.308200 38 110 2012-06-13 14:33:44.000 2012-06-13 14:33:44.000 -98 1
34.750350 -86.305750 5 90 2012-06-13 14:35:13.000 2012-06-13 14:35:13.000 -83 2
34.737160 -86.298040 70 88 2012-06-13 14:36:43.000 2012-06-13 14:36:43.000 -80 1
34.736420 -86.277270 120 33 2012-06-13 14:38:13.000 2012-06-13 14:38:13.000 -87 2
34.747090 -86.248370 120 37 2012-06-13 14:39:43.000 2012-06-13 14:39:43.000 -93 2
34.755620 -86.240640 70 179 2012-06-13 14:41:13.000 2012-06-13 14:41:13.000 -81 1
34.771240 -86.242760 70 0 2012-06-13 14:42:42.000 2012-06-13 14:42:42.000 -88 2
34.785510 -86.245710 70 6 2012-06-13 14:44:12.000 2012-06-13 14:44:12.000 -99 2
34.800220 -86.239400 70 1 2012-06-13 14:45:42.000 2012-06-13 14:45:42.000 -86 1
34.815070 -86.232180 70 16 2012-06-13 14:47:12.000 2012-06-13 14:47:12.000 -98 2
34.824540 -86.226198 0 0 2012-06-13 14:51:41.000 2012-06-13 00:13:48.000 -101 9
34.824579 -86.226171 0 0 2012-06-14 00:26:19.000 2012-06-12 00:46:57.000 -99 168
You'll note the 4th and last rows have 527 and 168 entries, respectively, and they span 2 days. Those entries are from a single device, from occasions when it was stopped for several hours in the same place at different times.
Here's some zipped csv data: sample
What I Finally Done Did
Some minor modifications to Aaron Bertrand's supplied query shown below:
WITH d AS
(
SELECT Time
,Latitude
,Longitude
,Velocity
,Heading
,TimeRN = ROW_NUMBER() OVER (ORDER BY [Time])
FROM dbo.GPSEntries
GROUP BY Time, Latitude, Longitude, Velocity, Heading
),
y AS (
SELECT BeginTime = MIN(Time)
,EndTime = MAX(Time)
,Latitude = AVG(Latitude)
,Longitude = AVG(Longitude)
-- ,[RowCount] = COUNT(*)
,GroupNumber
FROM (
SELECT Time
,Latitude
,Longitude
,GroupNumber = (
SELECT MIN(d2.TimeRN)
FROM d AS d2
WHERE d2.TimeRN >= d.TimeRN AND
NOT EXISTS (
SELECT 1
FROM d AS d3 -- Between 250 and 337 feet
WHERE ABS(d2.Latitude - d.Latitude) <= .0007 AND
ABS(d2.Longitude - d.Longitude) <= .0007 AND
d2.Velocity = d.Velocity ) )
FROM d ) AS x
GROUP BY GroupNumber
)
SELECT y.Latitude
,y.Longitude
,d.Velocity
,d.Heading
,y.BeginTime
-- ,y.EndTime
-- ,y.[RowCount]
-- ,Duration = CONVERT(time(0),DATEADD(SS,DATEDIFF(SS,y.BeginTime, y.EndTime), '0:00:00'), 108)
FROM y INNER JOIN d ON y.BeginTime = d.[Time]
-- FOR STOPS (5 minute):
-- WHERE DATEDIFF(MI, Y.BeginTime, y.EndTime) + 1 > 5
ORDER BY y.BeginTime;
Here is some sample data in tempdb:
USE tempdb;
GO
CREATE TABLE dbo.GPSEntries
(
Latitude DECIMAL(8,5),
Longitude DECIMAL(8,5),
Velocity TINYINT,
Heading TINYINT,
[Time] SMALLDATETIME
);
INSERT dbo.GPSEntries VALUES
(31.12345,-88.12345,00,00,'2012-04-21 01:45:00'),
(31.12346,-88.12345,00,00,'2012-04-21 01:46:00'),
(31.12455,-88.12410,10,01,'2012-04-21 01:47:00'),
(31.12495,-88.12480,17,01,'2012-04-21 01:48:00'),
(31.12532,-88.12560,22,01,'2012-04-21 01:49:00'),
(31.12567,-88.12608,25,02,'2012-04-21 01:50:00'),
(31.12638,-88.12672,24,02,'2012-04-21 01:51:00'),
(31.12689,-88.12722,19,02,'2012-04-21 01:52:00'),
(31.12345,-88.12345,00,00,'2012-04-21 01:53:00'),
(31.12346,-88.12346,00,00,'2012-04-21 01:54:00'),
(31.12347,-88.12345,00,00,'2012-04-21 01:55:00'),
(31.12346,-88.12346,00,00,'2012-04-21 01:56:00'),
(31.12689,-88.12788,10,40,'2012-04-21 01:57:00'),
(31.12604,-88.12691,13,39,'2012-04-21 01:58:00'),
(31.12572,-88.12603,15,39,'2012-04-21 01:59:00');
And my attempt at the query:
;WITH d AS
(
SELECT Time, Latitude, Longitude, Velocity, Heading,
NormLat = CONVERT(DECIMAL(7,4), Latitude),
NormLong = CONVERT(DECIMAL(7,4), Longitude),
TimeRN = ROW_NUMBER() OVER (ORDER BY [Time])
FROM dbo.GPSEntries
-- You probably want filters:
-- WHERE DeviceID = @SomeDeviceID
--   AND [Time] >= @SomeStartDate
--   AND [Time] < DATEADD(DAY, 1, @SomeEndDate)
-- Your sample CSV file also had lots of duplicates, so:
GROUP BY Time, Latitude, Longitude, Velocity, Heading
),
y AS (
SELECT MinTime = MIN(Time), MaxTime = MAX(Time), Latitude = AVG(Latitude),
Longitude = AVG(Longitude), [RowCount] = COUNT(*) FROM
(
SELECT Time, Latitude, Longitude, GroupNumber =
(
SELECT MIN(d2.TimeRN)
FROM d AS d2 WHERE d2.TimeRN >= d.TimeRN
AND NOT EXISTS
(
SELECT 1 FROM d AS d3
WHERE d2.NormLat = d.NormLat
AND d2.NormLong = d.NormLong
)
)
FROM d
) AS x GROUP BY GroupNumber
)
SELECT [Row] = ROW_NUMBER() OVER (ORDER BY y.MinTime),
y.Latitude, y.Longitude, d.Velocity, d.Heading,
y.MinTime, y.MaxTime, y.[RowCount]
FROM y INNER JOIN d ON y.MinTime = d.[Time]
ORDER BY y.MinTime;
Results:
Row Latitude Longitude Velocity Heading MinTime MaxTime RowCount
---|---------|----------|--------|-------|----------------|----------------|--------
1 31.123455 -88.123450 0 0 2012-04-21 01:45 2012-04-21 01:46 2
2 31.124550 -88.124100 10 1 2012-04-21 01:47 2012-04-21 01:47 1
3 31.124950 -88.124800 17 1 2012-04-21 01:48 2012-04-21 01:48 1
4 31.125320 -88.125600 22 1 2012-04-21 01:49 2012-04-21 01:49 1
5 31.125670 -88.126080 25 2 2012-04-21 01:50 2012-04-21 01:50 1
6 31.126380 -88.126720 24 2 2012-04-21 01:51 2012-04-21 01:51 1
7 31.126890 -88.127220 19 2 2012-04-21 01:52 2012-04-21 01:52 1
8 31.123460 -88.123455 0 0 2012-04-21 01:53 2012-04-21 01:56 4
9 31.126890 -88.127880 10 40 2012-04-21 01:57 2012-04-21 01:57 1
10 31.126040 -88.126910 13 39 2012-04-21 01:58 2012-04-21 01:58 1
11 31.125720 -88.126030 15 39 2012-04-21 01:59 2012-04-21 01:59 1
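As an aside, on SQL Server 2012 and later the same stop/movement islands can be sketched more directly with window functions. This is an untested sketch against the same dbo.GPSEntries table, using the .0001-degree threshold from the question:

```sql
-- Sketch (SQL Server 2012+): flag a row as starting a new group whenever the
-- vehicle is moving, or has shifted more than .0001 degrees since the
-- previous fix; then number the groups with a running sum of the flags.
WITH flagged AS (
    SELECT *,
           CASE WHEN Velocity > 0
                  OR ABS(Latitude  - LAG(Latitude)  OVER (ORDER BY [Time])) > 0.0001
                  OR ABS(Longitude - LAG(Longitude) OVER (ORDER BY [Time])) > 0.0001
                THEN 1 ELSE 0 END AS IsNewGroup
    FROM dbo.GPSEntries
),
grouped AS (
    SELECT *,
           SUM(IsNewGroup) OVER (ORDER BY [Time]
                                 ROWS UNBOUNDED PRECEDING) AS GroupNumber
    FROM flagged
)
SELECT AVG(Latitude)  AS Latitude,
       AVG(Longitude) AS Longitude,
       MIN([Time])    AS BeginTime,
       MAX([Time])    AS EndTime,
       COUNT(*)       AS Samples
FROM grouped
GROUP BY GroupNumber
ORDER BY BeginTime;
```

Because repeat visits to the same spot are separated in time by moving rows, they land in different islands and are never lumped together, which is the behavior the question asks for.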
Let's say I have an array:
@time = qw(
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
);
but the values 1..50 depend on the size of an array @arr.
So instead of declaring @time manually, how can I populate @time with 1 .. @arr, and possibly have other TYPES of elements, like TIME in seconds, etc.?
This will initialise @time with the values from 1 to $#arr:
@time = (1..$#arr);
I suspect you probably want 0 .. $#arr rather than 1 .. $#arr?
and possibly have other TYPES of elements like TIME in seconds, etc.
I'm not quite sure what you mean here, but you should have a look at map for one convenient way of generating a list of values by transforming another list. That might be what you're after.
@time = 1 .. @arr;
If you want to do something with each number, like multiply them by 2, you can use map:
@time = map { 2 * $_ } 1 .. @arr;
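For the "TIME in seconds" part, map works the same way. A small sketch, assuming a hypothetical 30-second sampling interval between the readings in @arr:

```perl
# Sketch: one timestamp (in seconds) per element of @arr, assuming a
# hypothetical 30-second interval between samples.
my @seconds = map { $_ * 30 } 0 .. $#arr;   # 0, 30, 60, ...
```

Any per-element transformation fits the same pattern: the range supplies the indices, and the map block turns each index into whatever value you need.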