ABSTRACT
We describe a method for tracking the path of animals in the field, based on stereo videography and aiming-angle measurements, combined in a single, rotational device. In open environments, this technique has the potential to extract multiple 3D positions per second, with a spatial uncertainty of <1 m (rms) within 300 m of the observer, and <0.1 m (rms) within 100 m of the observer, in all directions. The tracking device is transportable, is operated by a single observer, and does not involve any animal tagging. As a video of the moving animal is recorded, track data can easily be complemented with behavioural data. We present a prototype device based on accessible components that achieves about 70% of the theoretical maximal range. We show examples of bird ground and flight tracks, and discuss the strengths and limits of the method compared with existing fine-scale (e.g. fixed-camera stereo videography) and large-scale tracking methods (e.g. GPS tracking).
INTRODUCTION
Tracking the path of wild animals in the field yields information about multiple aspects of a species' biology. Long-term tracks, over days or more, inform ecologists about large-scale space use (e.g. home range, migration, dispersal). Locally, short-term tracks with higher sampling frequency and finer spatial resolution allow biologists to observe the animal's path during a given activity phase (e.g. foraging), addressing questions about the animal's exploratory strategy, orientation skills or even biomechanical interaction with its physical milieu. Nathan (2008) synthesized the existing approaches to the study of organismal movement, and proposed an integrated ‘movement ecology’ framework.
Here, we present a method for local tracking of animal movement in 3D, based on stereo videography and aiming-angle measurements, from a single observation point. We aimed for: (i) tracking free-moving animals in the field; (ii) no animal tagging; (iii) a spatial uncertainty finer than GPS; (iv) omnidirectional tracking around the observer; (v) video recording the animal's behaviour; and (vi) a transportable, single-operator, affordable device.
The general principle of our method is to measure the position of an animal through its spherical coordinates, relative to the stationary observer (Fig. 1A). An angle measuring base (AMB), similar to a theodolite, records azimuth (a) and inclination (i) angles while the observer frames the moving animal in a viewfinder. Supported by the AMB, a stereo-videography device (SVD) records stereo images of the animal, from which the distance (d) from the observer is calculated. Altogether, the device is similar to a surveying tacheometer (or total station), but works at a higher sampling frequency (up to the video frame rate). Moreover, the embedded video record of the animal is used to extract additional behavioural data that can be combined with the tracking data.
There are two expected limits to this tracking method. First, the animal must remain visible during its movement; hence, the method only applies to terrestrial and aerial paths in open environments. The second limit results from the stereo-image-based distance evaluation: because the uncertainty of the distance measurement increases quadratically with distance from the observer (Cavagna et al., 2008), the range of the tracking device is finite, restricting precise tracking to a given radius around the observer.
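This quadratic growth follows directly from the stereo geometry. In a simplified parallel-axis form (the actual device adds a small convergence angle; see Appendix):

```latex
d = \frac{BL \cdot FL}{s}
\qquad\Rightarrow\qquad
\Delta d \approx \left|\frac{\partial d}{\partial s}\right| \Delta s
  = \frac{d^{2}}{BL \cdot FL}\,\Delta s ,
\qquad \Delta s = \frac{SW}{IW}\ \text{(one pixel)} .
```

Thus, for a fixed pixel pitch, the distance resolution Δd degrades with the square of the distance, whereas the angular resolutions Δm and Δp only degrade linearly with d.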
In order to assess the usefulness and limits of this method, we investigated its theoretical aspects, constructed a prototype device and tracked various bird species during their locomotor activities.
RESULTS AND DISCUSSION
Quantization resolution and position uncertainty
The main theoretical results, which are essential to understanding the field results, are reported here (see Appendix for details).
List of symbols and abbreviations
- a: azimuth angle (rad)
- AMB: angle measuring base
- BL: base length between cameras (m)
- d: observer–animal distance (m)
- dmax: maximal range (m)
- Dmax: maximum distance at which animals move (m)
- Dmin: minimum distance at which animals move (m)
- Dtyp: typical distance at which animals move (m)
- DOF: depth of field (m)
- eqFL: 35 mm-equivalent focal length (m)
- f: quadratic polynomial model
- FL: focal length (m)
- FOV: camera field of view (rad)
- h: inverse-curve model
- i: inclination angle (rad)
- IW: digital image width (pixels)
- k: error multiplying factor
- lhFOV: camera linear horizontal field of view (m)
- NI: noise index
- POI: point of interest
- QPU: quantization position uncertainty (m)
- rms: root-mean-square
- RSV: rotational stereo videography
- s: lateral shift between stereo images (m)
- sc: s at the centre of the image (m)
- SF: sampling frequency (Hz)
- SVD: stereo-videography device
- SW: sensor physical width (m)
- TSL: track step length (m)
- V: animal speed (m s⁻¹)
- VOI: volume of interest (m³)
- Δa: azimuth angle resolution (rad)
- Δd: distance resolution (m)
- Δi: inclination angle resolution (rad)
- Δm: meridian resolution as per Δi (m)
- Δp: parallel resolution as per Δa (m)
- Δs: lateral shift resolution (m)
- ε: residual difference between s and sc (m)
Device implementation
Aiming at a large range without compromising transportability, we set BL to 1 m (Fig. 2). Videos are recorded in the high definition available on most current retail digital video cameras (IW=1920 pixels), at 25 Hz. To avoid synchronization issues between dual cameras, we rely on a single camera and a set of mirrors, projecting stereo images side by side on the sensor (Inaba et al., 1993). Telephoto lenses of eqFL=323 or 646 mm are used, depending on the animal's proximity. Azimuth and inclination angles are measured continuously by a pair of 13-bit digital rotary encoders and recorded on a data logger.
Our prototype device weighs ∼20 kg, and when folded can be transported by a single operator on a hand trolley. The cost of the device components amounts to approximately €5000 (including camera: €1500, lenses: €1200, tripod and head: €1000, rotary encoders: €500; laser rangefinder for calibration: €300; AMB and SVD materials and components: €500).
True error of the device
While a perfect device would measure positions with an error equal to QPUrms, a real system (physical device+video analysis) will inevitably make larger errors. Two types of error can occur: (1) systematic error, which shifts successive positions along the track by a similar vector; this type of error matters to users aiming to position the track in its absolute environment (e.g. on a landscape map); (2) random error, which scatters successive positions in unpredictable directions; this is of particular importance to users interested in relative measurements between positions (e.g. distance, speed, angle). We focus on random error below.
There are several possible sources of error: (i) space quantization; (ii) point of interest (POI) placement error in stereo images; (iii) calibration error, in particular static optical or structural distortion of the device that is not fully corrected by the calibration procedure (see Appendix); (iv) in-motion structural distortion of the device, caused by mechanical load during active tracking; and (v) time-stamping errors, causing temporal misalignment between the a, i and d measurements.
A series of error tests should be performed with any new device in order to assess its real error characteristics. For our prototype device, we performed static and dynamic tests (see supplementary material Figs S1, S2). The results show that, overall, the random error is about twice the error expected from space quantization alone. We call this ratio the error multiplying factor, k. For our current prototype device, k≈2.
Predicting the tracking range for a given species and locomotor activity
If the acceptable positional error is clearly known (e.g. indexed on animal size; Theriault et al., 2014), Table 1 directly gives the theoretical maximal range of the tracking method. The k error factor of the device should be accounted for, either by dividing the acceptable error by k before entering the data in the table, or by multiplying the output dmax range value by 1/√k (i.e. 0.7 for k=2, see Eqn 6).
For NI=1, the rms random error (i.e. the standard deviation of position) is equal to the interval between two track points, resulting in a very noisy track. The acceptable NI will depend on the aim of the study (path pattern description versus biomechanics), on the scale of relevant path patterns relative to TSL, and on whether data smoothing is intended. If the user expects a raw, unsmoothed track containing readable spatial patterns, we suggest keeping NI<0.5, and monitoring the real NI value along the measured tracks. Setting upper bounds on TSL and NI allows calculation of an acceptable QPUrms (Eqn 8), and in turn a maximal range dmax for the device (Table 1, Eqn 6). In the end, dmax is the radius of a spherical volume of interest (VOI) within which the animal should be reliably trackable.
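As a worked illustration, the following sketch (in Python) reproduces the error and NI figures reported for the bird tracks below. The quantization model is our reading of the approach described above, with Δd taken as d²·(SW/IW)/(BL·FL), Δm=Δp=d·(2π/2¹³) for the 13-bit encoders, QPUrms as the rms of a uniform quantization cell (division by √12) and k≈2; it approximates, but does not transcribe, Eqns 6 and 8.

```python
import numpy as np

# Prototype parameters (from Materials and methods)
BL = 1.0                 # stereo base length (m)
SW = 22.3e-3             # sensor physical width (m)
IW = 1920                # digital image width (pixels)
DA = 2 * np.pi / 2**13   # 13-bit encoder angular resolution (rad)
K = 2.0                  # empirical error multiplying factor of the device

def qpu_rms(d, fl):
    """rms quantization position uncertainty at distance d (m), actual FL (m)."""
    dd = d**2 * (SW / IW) / (BL * fl)   # distance resolution (m)
    dm = d * DA                         # meridian resolution (m)
    dp = d * DA                         # parallel resolution (m), cos(i) ~ 1
    return np.sqrt(dd**2 + dm**2 + dp**2) / np.sqrt(12)

def noise_index(d, fl, tsl):
    """NI: rms random error (K * QPUrms) over track step length."""
    return K * qpu_rms(d, fl) / tsl

# Swift track: 400 mm lens (eqFL 646 mm), mean TSL 1.72 m
for d in (100, 150, 200):
    err = K * qpu_rms(d, 0.4)
    print(f"{d} m: error {err:.2f} m, NI {noise_index(d, 0.4, 1.72):.2f}")
# -> errors ~0.18, 0.39, 0.68 m and NI ~0.10, 0.23, 0.40,
#    close to the 0.18/0.38/0.68 m and 0.10/0.22/0.40 reported below
```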
Magpie walk track
We tracked a common magpie (Pica pica) walking and feeding on a flat grass lawn, using the 323 mm eqFL lens. The 345 s track was sampled at 1 Hz (i.e. once every 25 video frames). The bird moved at a mean speed of 0.25 m s⁻¹, covering a distance of about 90 m. The mean TSL was 0.25 m. The random position error (2 QPUrms) was 0.04 m at 30 m from the device, 0.06 m at 40 m and 0.09 m at 50 m. Based on the mean TSL, NI was 0.16, 0.24 and 0.36, respectively. Fig. 3A shows a trajectory that is indeed smoother at shorter distances, and noisier beyond 50 m. NI could potentially be lowered by using a lower SF/larger TSL (i.e. downsampling). The video record allowed identification of moments when the bird pecked in the grass (most of which were immediately followed by a trophic interaction with another, younger magpie rejoining the focal individual). With these behavioural data combined with the track (and many replications), it would be possible to study the spatial strategy underlying the foraging activity.
Swift flight track
We recorded the flight of a common swift (Apus apus) for 45 s, and sampled its 3D track at 6.25 Hz (i.e. once every 4 video frames). We used our longer lens for this track (eqFL=646 mm), and it was sometimes difficult to keep the bird within the frame, resulting in some missing data along the track. The mean speed from raw positions was 10.76 m s⁻¹, for a travelled distance of about 470 m. The mean TSL was 1.72 m. Random error (2 QPUrms) was 0.18 m at 100 m, 0.38 m at 150 m and 0.68 m at 200 m. NI was 0.10, 0.22 and 0.40, respectively, and again the track appears less smooth at greater distances (Fig. 3B). Speed data obtained by differencing raw positions are therefore substantially noisy. As an alternative to downsampling, we performed spline smoothing (Garcia, 2010). The smoothed path (Fig. 3C,D) yields speed data that could potentially be used for a kinematic analysis. The speed from smoothed data ranged from 4.77 to 14.31 m s⁻¹ (mean 10.22 m s⁻¹). We detected a probable prey capture at the upper-right of the track (lowest speed, protracted head). Hence, both flight (flapping/gliding) and aerial feeding behaviour data can be combined with the positional and speed data. Note that the wind speed would have to be subtracted from the ground speed to yield the air speed of the animal, as required for a biomechanical analysis. Depending on the bird's distance and height, wind measurements from stationary anemometers or from balloon launch tracking should be integrated with the tracking data (see Henningsson et al., 2009; Pennycuick et al., 2013). With these complementary data, and many replications, one could provide reliable foraging speeds of a swift, to be compared with migration, roosting and display flight speeds (Henningsson et al., 2009, 2010).
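Our smoothing follows Garcia (2010). For readers without access to that routine, the sketch below illustrates the general idea with generic smoothing splines, fitting each coordinate against time and differentiating the spline to obtain ground speed; the tolerance is set by hand from the expected random error rather than by Garcia's automated cross-validation, so this is an illustration of the approach, not a reimplementation.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def smooth_track(t, xyz, err_rms):
    """Smooth a 3D track sampled at times t (s); xyz is an (N, 3) array.
    err_rms (m) sets the spline tolerance from the expected random error."""
    n = len(t)
    smoothed, velocity = [], []
    for k in range(3):
        # s bounds the sum of squared residuals; n * err_rms**2 keeps the
        # per-point residual near the expected random error level
        spl = UnivariateSpline(t, xyz[:, k], s=n * err_rms**2)
        smoothed.append(spl(t))
        velocity.append(spl.derivative()(t))
    xyz_s = np.column_stack(smoothed)
    v = np.linalg.norm(np.column_stack(velocity), axis=1)  # ground speed (m/s)
    return xyz_s, v

# e.g. for the swift track: samples every 0.16 s, err_rms ~ 0.2-0.4 m
```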
Woodpecker flight track
We recorded a brief (5 s) flight bout of a European green woodpecker (Picus viridis) at close range, with the same device configuration and error as for the magpie walk. Because the woodpecker was moving much faster (mean speed from raw data, 9.63 m s−1), we could sample its track at 25 Hz (i.e. on every video frame), with a TSL large enough (mean 0.38 m) to maintain acceptable NI values (0.11, 0.16 and 0.24 at 30, 40 and 50 m, respectively). A side view of the track (Fig. 3E) shows a typical undulating pattern, with alternating flapping and bounding (fully retracted wings) phases. The ground speed during these phases can be estimated after spline smoothing.
Comparison with existing tracking methods
A first comparable tracking method is the ‘Ornithodolite’ of Pennycuick (1982), and subsequent implementations (Tucker, 1995; Hedenström et al., 1999). Those systems measure the same variables (a, i, d) from a single point. However, the distance measurement is based not on recorded stereo images but on the manual actuation of an optical rangefinder by the operator. A downside is that the tracking accuracy depends on the operator's skill in aiming exactly at the moving bird while simultaneously adjusting the rangefinder knob. Our method corrects aiming errors as long as the animal remains within the recorded images, and postpones distance measurement to later image analysis. Although this is time consuming, it enables accurate, corrected positions to be extracted at high frequency, with less dependence on user skill. Another downside of the Ornithodolite is the lack of an embedded record of the animal's behaviour, unless the system is augmented with secondary behavioural data acquisition (e.g. video).
Recently, Pennycuick et al. (2013) used a pair of military binoculars equipped with a laser rangefinder, a magnetic compass and an inclinometer. Although this system is very portable, it has limited SF (<0.5 Hz) and is much less affordable than our system.
Delinger and Willis (1988) proposed a device measuring only the aiming angles (a, i) of a video camera; position is measured by triangulation between two such systems placed some distance apart. The requirement for two operators is a downside and implies synchronization issues, but this system potentially offers low uncertainty at long distances, as well as behavioural records. Tucker and Schmidt-Koenig (1971) had used a similar dual-theodolite system, without the video record.
Image-based tracking using fixed cameras is another, more widespread method. A single fixed camera can record 2D movements, in the laboratory (e.g. Aureli et al., 2012) or even outdoors (Pillot et al., 2010; Collett et al., 2013). 3D tracks in the field have been measured using multiple fixed cameras (Major and Dill, 1978; Pomeroy and Heppner, 1992; Ikawa et al., 1994; Budgey, 1998; Ballerini et al., 2008; Corcoran and Conner, 2012; Shelton et al., 2014). The VOI is defined by the fixed intersection of the cameras' field of view (FOV). To cover a large VOI, and track animals for a significant duration, this technique usually requires wider angle lenses (eqFL≈50 mm), which has a few drawbacks. First, a larger between-cameras distance is required to maintain a low position uncertainty (Eqns 1,5), which can limit the system's portability (Cavagna et al., 2008; see Theriault et al., 2014, for recent progress). Moreover, the animal projects a small image on the camera sensor, which can limit positional and behavioural analysis (Theriault et al., 2014). However, as a benefit, fixed cameras capture the entire VOI continuously, hence multiple animals present in the VOI can be tracked simultaneously. The size of the VOI depends on the desired spatial uncertainty: recent studies have monitored VOIs from 10² m³ (Corcoran and Conner, 2012) up to 10⁴ m³ (Theriault et al., 2014; Shelton et al., 2014) or even 10⁶ m³ (Ballerini et al., 2008; Cavagna et al., 2008). Vertebrate flight bouts of a few seconds can usually be recorded. In comparison with fixed-camera stereo videography, our method is based on a short BL/long eqFL, rotational configuration. The short BL allows for a single, easily transportable device. The long eqFL allows a greater magnification of the animal image, but can limit the possibility of tracking multiple animals. The rotational, omnidirectional device yields a virtually spherical VOI, which in some conditions allows for longer tracking bouts (e.g. 45 s in Fig. 3B, in a field VOI≈10⁷ m³). However, as with other stereo-videography techniques, the size of the VOI remains strongly dependent on the tolerated spatial uncertainty.
Aside from optical systems, GPS tracking (Cagnacci et al., 2010) has as its main benefit an unlimited, global range. The position uncertainty of GPS is about 6.5 m in 2D (distance rms, drms; Seeber, 2003) and more than 10 m in 3D (mean radial spherical error, MRSE). This uncertainty can be increased by various environmental factors, and field errors of 30 m are often assumed (Frair et al., 2010). Although GPS tags can sample positions at up to 1 Hz (e.g. Dell'Ariccia et al., 2008; Vyssotski et al., 2009), they are often used at much lower SF to preserve the tag's battery life (e.g. Debeffe et al., 2013). These specifications make GPS tracking well adapted to large-scale/long-term tracking, but less so to fine-scale local path investigations (Frair et al., 2010; Rowcliffe et al., 2012). Other radiowave-based tracking methods, such as VHF tracking (smaller tags than GPS; Daniel Kissling et al., 2014), scanning harmonic radar (even smaller, passive tags; Ovaskainen et al., 2008; Lihoreau et al., 2012) and surveillance or tracking radars (no tag; Gauthreaux and Belser, 2003; Henningsson et al., 2009) each have specific advantages over GPS (especially for tracking small species), but lack the global range, and usually provide neither lower spatial uncertainty than GPS tracking nor an SF above 1 Hz.
The present tracking method attains GPS-like uncertainty (QPUrms≈10 m) around 500–1000 m from the device (Table 1). This finite range suggests that the present method should not be considered as an alternative to GPS for long-term tracking (an animal flying forward at 10 m s⁻¹ crosses such a VOI within a few minutes), but rather as a valuable complementary technique at the local scale. Within its range, it is capable of much finer – metres to centimetres – uncertainty, combined with higher SF. Animal follow-up is based on continuous visibility rather than tagging, which has both downsides (limited to open environments, pseudoreplication) and benefits (no animal capture, sample size not limited by the cost of the tags). Lastly, the embedded record of animal behaviour provides supplementary data that help with understanding the mechanisms at play along the animal path, and reveal both movement patterns and processes (Nathan, 2008).
In conclusion, by allowing animal image magnification and omnidirectional tracking, the present method expands the range of operation – and the potential track duration – of field stereo videography, with minimal field deployment difficulties. It cannot match the range of a GPS tracking system, but within its operational range provides richer information (fine-scale spatio-temporal and behavioural data), non-invasively. We hope that this comparatively accessible tracking method (we propose the acronym RSV for rotational stereo videography) will allow biologists to develop new spatial behaviour and movement ecology studies, at intermediate spatial scales.
MATERIALS AND METHODS
Prototype components
The AMB is composed of a Manfrotto™ (Cassola, Italy) 545B tripod (25 kg payload) and 509HD head, coupled with two AKIndustrie™ (Thal-Marmoutier, France) CHO5 13-bit encoders. The 26 parallel encoder outputs are wired to an Arduino MEGA microcontroller board (www.arduino.cc) through a latch interface based on four SN74LS374N octal flip-flops (Texas Instruments™, Dallas, TX, USA). The angular SF is 50 Hz. Angle values are converted from Gray code to steps (0–8191), time-stamped to the closest millisecond, and recorded on an SD memory card in a Data Logging Shield (Adafruit™, New York, NY, USA). The AMB is powered by a 7.2 V, 2700 mAh battery.
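The Gray-to-binary conversion is the standard cumulative-XOR operation; a minimal sketch (shown in Python for readability, although the actual firmware runs on the Arduino):

```python
import math

def gray_to_binary(gray):
    """Convert a reflected-binary (Gray) code word to a plain binary count."""
    binary = 0
    while gray:
        binary ^= gray
        gray >>= 1
    return binary

def steps_to_radians(steps, bits=13):
    """Map encoder steps (0-8191 for 13 bits) onto a full turn."""
    return steps * 2 * math.pi / (1 << bits)

# Example: gray_to_binary(0b1100000000000) -> 4096; steps_to_radians(4096) -> pi
```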
The SVD has a BL of 1 m. We use 6 mm-thick first surface mirrors (FSM, Toledo, OH, USA) of dimensions 150×150 mm (outer, primary mirrors) and 70×150 mm (W×H; inner, secondary mirrors). Mirrors and camera are supported by 30×30 mm aluminium beams, assembled with 9 mm-thick PVC machined plates. The angular position of outer mirrors can be adjusted, allowing for FOV convergence adjustment. We use a Canon™ (Tokyo, Japan) EOS 7D camera, recording full HD (1920×1088 pixels, W×H) frames at 25 Hz. The lens is either a Nikon™ (Tokyo, Japan) 200 mm f/4 Ai, or a Canon™ EF 400 mm f/5.6 L. As the camera has a 22.3 mm-wide sensor, the eqFL is 323 mm and 646 mm, respectively (see Appendix).
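The eqFL values follow from scaling the actual focal length by the ratio of the 36 mm width of a full-frame (135 format) sensor to the actual sensor width; we take this to be the convention behind Eqn A7. A quick check:

```python
def eq_focal_length(fl_mm, sensor_width_mm, ref_width_mm=36.0):
    """35 mm-equivalent focal length from actual FL and sensor width."""
    return fl_mm * ref_width_mm / sensor_width_mm

print(round(eq_focal_length(200, 22.3)))  # -> 323
print(round(eq_focal_length(400, 22.3)))  # -> 646
```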
Image analysis
We use Matlab™ (MathWorks™, Natick, MA, USA) to analyse individual video frames. The lateral distance between the left and right images of the animal is measured and converted to distance using a reference curve inferred from a calibration video. The horizontal and vertical positions of the animal in the frame are used to correct the recorded angles for aiming errors. See Appendix and supplementary material Fig. S5 for details.
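In our routine, the POI is pointed manually in each half-image (supplementary material Fig. S5). As an indication of how this step could be automated, here is a sketch (assuming OpenCV is available; function and parameter names are ours) that locates a POI clicked in the left half-frame within the right half-frame by normalized cross-correlation:

```python
import cv2

def lateral_shift_px(frame, poi_left, box=40):
    """Estimate the stereo lateral shift (in pixels) for one side-by-side
    stereo frame. poi_left is the (x, y) pixel position of the animal in
    the left half-image (e.g. clicked by the user)."""
    h, w = frame.shape[:2]
    left, right = frame[:, : w // 2], frame[:, w // 2 :]
    x, y = poi_left
    template = left[y - box : y + box, x - box : x + box]  # patch around POI
    scores = cv2.matchTemplate(right, template, cv2.TM_CCOEFF_NORMED)
    _, _, _, (bx, by) = cv2.minMaxLoc(scores)              # best match
    return (bx + box) - x  # match centre column minus the POI column

# Multiply by the pixel pitch (SW/IW) to obtain s in metres on the sensor;
# offsets due to mirror convergence should be absorbed by the calibration.
```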
APPENDIX
Distance resolution
Device design issues
Focal length choice
If the FOV is very narrow, continuously framing an erratically moving animal is not easy, which gives rise to missing data. Moreover, for a given sensor size (SW), longer lenses provide less depth of field (DOF), suggesting the lens should be used at a smaller aperture to obtain a sharp image throughout the range of the device. Lastly, very long lenses are heavy; hence, a stiffer and vibration-dampened support is needed. Note that a solution for obtaining a higher eqFL without the DOF and weight downsides is to use a camera with a smaller sensor (Eqn A7). See supplementary material Figs S3 and S4 for FOV and DOF plots that can help identify the appropriate FL value.
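Supplementary Figs S3 and S4 give the actual plots. As an indication of how such DOF curves are computed, the following sketch uses the standard thin-lens approximation, with the circle of confusion set to one pixel pitch (an assumption on our part):

```python
def depth_of_field(fl_m, f_number, focus_m, coc_m=22.3e-3 / 1920):
    """Near and far limits of acceptable sharpness (thin-lens approximation).
    coc defaults to one pixel pitch of our 22.3 mm / 1920 px sensor."""
    H = fl_m**2 / (f_number * coc_m) + fl_m  # hyperfocal distance (m)
    near = H * focus_m / (H + (focus_m - fl_m))
    far = H * focus_m / (H - (focus_m - fl_m)) if focus_m < H else float("inf")
    return near, far

# 400 mm lens at f/16 focused at 150 m:
print(depth_of_field(0.4, 16, 150))
# -> ~(128 m, 182 m): even at f/16, sharpness covers a limited range
```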
Convergence
A small angle of convergence (typically <1 deg), tilting the optical axes of each camera inwards, is needed to maximize superposition of the two FOVs, especially at shorter distances. Although the formal relationship between d and s (Eqn A2) becomes more complex (see Woods et al., 1993), the small angle implied does not significantly affect the subsequent calculation results.
Video versus photo
Current retail cameras are both photo and video capable, contrary to what was previously available (Cavagna et al., 2008). Most cameras provide a video mode, recording 1920×1080 pixel frames at 30 or even 60 Hz. In photo mode, higher definition images can be recorded, at a lower frame rate (e.g. 5184×3456 pixels at 8 Hz for our camera). Hence, when the tracking does not need a very high SF, using the camera in photo mode instead of video mode can increase IW and hence the range of the device (Eqn 6). However, photo frame rate is usually less stable than video frame rate, and a series of photo files contains less behavioural information than a video file.
Mirrors versus dual cameras
With the mirrors/single camera configuration, left and right images are each projected on one half of the same sensor, which solves synchronization issues. The range of the apparatus remains unchanged (as SW/IW in Eqn A6), but the captured FOV is halved compared with a dual camera system (multiply the results of Eqn A8 by 0.5).
Rolling shutter effect
On widely available CMOS-sensor cameras, each video frame is captured progressively from top to bottom, usually within 1/100 to 1/30 s (‘rolling’ electronic shutter), such that different parts of a single frame are not recorded perfectly simultaneously. This can contribute to time-stamping errors (see Results and discussion, ‘True error of the device’, error source v) when the SVD is rotated very quickly (fast, close movements), but could be corrected in a refined analysis method. Note that the rolling shutter effect is much less pronounced, but still present, in photo mode (about 1/250 s for a mechanical shutter). CCD sensors are free from this effect (‘global’ shutter).
Ways to increase range
According to the noise-to-signal approach of maximal range (Eqn 8), the first way to increase the range, as already discussed, is to choose a larger TSL (lower SF, i.e. downsampling). If this is not possible without losing relevant path information, Eqn 6 states that doubling BL, IW (e.g. the ‘4K’ video standard) or eqFL will multiply the range by a factor of √2. These effects are multiplicative; hence, the ranges given in Table 1 could be increased about 3-fold by doubling all three parameters (with cost, portability and data storage consequences).
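A quick numerical check of this multiplicative claim (assuming, per Eqn 6, that dmax scales with the square root of each of BL, IW and eqFL):

```python
from math import sqrt

def range_multiplier(bl_factor, iw_factor, eqfl_factor):
    """Relative change in dmax when scaling BL, IW and eqFL (per Eqn 6)."""
    return sqrt(bl_factor * iw_factor * eqfl_factor)

print(range_multiplier(2, 1, 1))  # doubling BL alone   -> 1.41
print(range_multiplier(2, 2, 2))  # doubling all three  -> 2.83, i.e. ~3-fold
```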
Operation in the field
To set the device in the field: (i) choose an unobstructed point of view and evaluate a typical distance (Dtyp) and distance range (Dmin to Dmax) at which animals move; (ii) select a focal length that will provide enough distance resolution (dmax≥Dmax), and check that the FOV is not too narrow to reliably frame the moving animal, even at Dmin; (iii) install the tripod and AMB using a spirit level, then place the SVD on top; set the tripod head friction and counterbalance so that the SVD can move smoothly; (iv) set the camera to full manual video mode; (v) focus the lens to Dtyp, and close the lens aperture until the captured image is sharp from Dmin to Dmax; an aperture as small as f/16 or smaller might be needed; leave the focus ring untouched afterwards; (vi) set the mirrors' convergence so that a point at Dtyp is projected on the centre of each stereo image; then check that FOV superposition is effective from Dmin up to Dmax; (vii) set the camera exposure: set a shutter speed that will stop animal motion on each video frame (1/200 s or faster) and then adjust camera sensitivity (ISO) to get a properly exposed image.
The procedure for tracking an animal is as follows: (i) start the video record; (ii) start the angle record; (iii) perform a brief angular oscillation with the SVD, for angular/video synchronization purposes; (iv) track the animal(s) by keeping it in both right and left images; (v) at the end of tracking, perform a second quick angular movement; (vi) stop the video and angle records.
Image analysis
The main steps of the analysis are: (i) extract still frames from the video file, at the desired SF; (ii) for each frame, measure the lateral shift (s) and the position of the animal in the image (xm, ym) (see supplementary material Fig. S5); (iii) convert s to distance (d), using the reference curve from the calibration video (see below); (iv) synchronize distance and angle data, and for each d value obtain an associated azimuth (a) and inclination (i) value; (v) correct a and i for aiming errors using the position of the animal in the image (xm, ym); (vi) convert spherical coordinates (a, i, d) to Cartesian coordinates (x, y, z); (vii) plot the track.
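Step (vi) is a standard spherical-to-Cartesian conversion; a minimal sketch (the axis and sign conventions here are illustrative, and may differ from those of our routine):

```python
import numpy as np

def spherical_to_cartesian(a, i, d):
    """Convert azimuth a (rad), inclination i (rad) and distance d (m) to
    x, y, z (m) in a device-centred frame. Conventions assumed here:
    x east, y north, z up, azimuth measured clockwise from north."""
    x = d * np.cos(i) * np.sin(a)
    y = d * np.cos(i) * np.cos(a)
    z = d * np.sin(i)
    return x, y, z
```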
Calibration
Using this calibration reference model to compute the distance to a tracked animal is a three-step process: (i) extract (s, xm, ym) from the video frame; (ii) compute ɛ using the f model, and subtract ɛ from s to obtain sc (i.e. the lateral shift if the animal was perfectly centred); (iii) compute d using the h model.
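As an indication of the shapes involved, the sketch below assumes a quadratic polynomial in (xm, ym) for the f model and an inverse curve of the form d = p0/sc + p1 for the h model (consistent with d ∝ 1/s, though the exact parameterization of our routine may differ):

```python
import numpy as np

def fit_f(xm, ym, eps):
    """Fit eps = f(xm, ym), with f a quadratic polynomial (least squares)."""
    A = np.column_stack([np.ones_like(xm), xm, ym, xm**2, xm * ym, ym**2])
    coef, *_ = np.linalg.lstsq(A, eps, rcond=None)
    return coef

def f_model(coef, xm, ym):
    return (coef[0] + coef[1] * xm + coef[2] * ym
            + coef[3] * xm**2 + coef[4] * xm * ym + coef[5] * ym**2)

def fit_h(sc, d):
    """Fit d = p0/sc + p1; this inverse form is linear in its parameters."""
    A = np.column_stack([1.0 / sc, np.ones_like(sc)])
    (p0, p1), *_ = np.linalg.lstsq(A, d, rcond=None)
    return p0, p1

def distance(s, xm, ym, coef, p0, p1):
    """Steps (i)-(iii): centre the shift, then apply the inverse model."""
    sc = s - f_model(coef, xm, ym)
    return p0 / sc + p1
```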
Acknowledgements
The authors are grateful to Stéphane Louazon and Fouad Nassur (Rennes University) for technical support in the field, and Prof. Marie Trabalon (Rennes University) for supporting the present method development. We thank C. Baczkowski (AST35) for providing access to property for swift tracking. We also thank three anonymous referees for their useful comments (including the idea of a ‘ball toss’ test procedure).
Footnotes
Author contributions
The behavioural questioning underlying this project was elaborated by E.d.M., C.H. and S.L. E.d.M. proposed the method's concept, studied the theoretical aspects, designed the device (optics and mechanics) and programmed the analysis routine. J.-P.C. designed the device's electronics. M.S. collected and analysed the data, under mentorship by E.d.M. The manuscript was composed in its entirety by E.d.M. with revisions by C.H., S.L., J.-P.C. and M.S.
Funding
A grant from the city of Rennes Métropole to E.d.M. enabled the acquisition of the analysis software and computers used in this study.
References
Competing interests
The authors declare no competing or financial interests.