In this paper, we present a method for synchronizing high-speed audio and video recordings of bio-acoustic experiments. By embedding a random signal into the recorded video and audio data, robust synchronization of a diverse set of sensor streams can be performed without the need to keep detailed records. The synchronization can be performed using recording devices without dedicated synchronization inputs. We demonstrate the efficacy of the approach in two sets of experiments: behavioral experiments on different species of echolocating bats and recordings of field crickets. We present the general operating principle of the synchronization method, discuss its synchronization strength and provide insights into how to construct such a device using off-the-shelf components.
Data acquisition in behavioral experiments on a wide range of animal species often relies on multi-modal (i.e. video and single- or multi-channel audio) sensor information (Valente et al., 2007). Indeed, many behavioral experiments on bats (e.g. Geberl et al., 2015; Geipel et al., 2013; Greif et al., 2017; Luo and Moss, 2017; Stilz, 2017; Übernickel et al., 2013), zebra finches (e.g. Ullrich et al., 2016) and even insects such as fruit flies (Coen and Murthy, 2016) rely on capturing synchronized high-speed video and single- or multi-channel audio data. Having an accurate measure of the relative time shift between the individual sensor data streams is essential for subsequent analysis of the sensor data. Determining and compensating for this time shift is referred to as time synchronization. The time synchronization of both commercial and custom-built multi-modal sensor systems often relies on a so-called trigger event, e.g. a post-trigger provided by the experimenter once the animal has performed its task. This trigger ends the data-capture sequence on multiple devices simultaneously, providing a common synchronization point for all captured sequences. It is often implemented by broadcasting a digital transistor–transistor logic (TTL) pulse to the dedicated synchronization inputs of each individual data-acquisition device. We argue that this synchronization approach suffers from two important disadvantages. First, it depends on the availability of compatible synchronization inputs on the various data-acquisition devices. Moreover, if the synchronization-handling mechanism is not implemented carefully, precise synchronization cannot be guaranteed, e.g. whenever part of the synchronization system relies on a software component running on a non-real-time operating system. Second, with a pulse-based approach the synchronization information is not embedded in the data itself.
By synchronizing either the start or the end of the captured data sequences, the relative time shift between the individual data streams, e.g. audio and video streams, can be deduced. However, this synchronization information is easily lost in the case of truncation of the data sequences. Furthermore, data sequences are often recorded in such a way that a portion of the data is recorded before the so-called trigger event, and a portion after the trigger event (pre- and post-trigger data). The information about the type of the captured data sequence needs to be recorded very carefully in metadata, which increases the risk of data loss or inconsistencies. Again, truncation of the data, i.e. throwing away uninteresting sections of the data sequences, aggravates the risk of inconsistencies.
To overcome these shortcomings of traditional synchronization techniques, we propose a method based on embedding a random 1-bit signal into the data streams directly. This type of signal is exceptionally good for alignment purposes as it exhibits a very narrow autocorrelation function. Embedding this type of synchronization signal into the recorded data sets solves both issues at once: no specialized synchronization input is needed to store synchronization information and the accuracy and precision of the synchronization do not depend on the manufacturer of the recording equipment. In addition, as the synchronization information is embedded in the data streams directly, the synchronization information can be made very robust to truncation and even re-sampling of the sensor data. It should be noted that our proposed approach is similar in concept to the SMPTE timecode system (Poynton, 1996) used in synchronization of television audio and video, the main difference being that we propose to embed a random sequence in the data instead of using structured timecode.
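The narrow autocorrelation of a random 1-bit sequence can be illustrated with a short script. This is a minimal Python sketch (not the analysis code used in this study), using a per-sample random ±1 sequence for simplicity; the embedded signal described below uses random-length periods, but the same property holds.

```python
import random

def autocorr(seq, lag):
    """Normalized autocorrelation of a +/-1 sequence at a given lag."""
    n = len(seq) - lag
    return sum(seq[i] * seq[i + lag] for i in range(n)) / n

random.seed(0)
sync = [random.choice((-1, 1)) for _ in range(10_000)]

peak = autocorr(sync, 0)                                      # 1.0 by construction
sidelobe = max(abs(autocorr(sync, k)) for k in range(1, 50))  # close to zero
```

The autocorrelation equals one at zero lag and drops to near zero for every non-zero lag, which is exactly the property that makes the sequence a sharp alignment marker.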
MATERIALS AND METHODS
Topology of the synchronization system
The proposed synchronization system consists of two main parts: a 1-bit signal generator with random periods, and one or more means of transferring the generated random sequences to the sensor modalities that require synchronization. In the case of video and multi-channel audio, these means are a blinking LED and an electrical copy of the 1-bit synchronization signal, respectively (e.g. a TTL-compatible version with ‘−1’ represented by 0 V and ‘+1’ represented by 5 V). The blinking LED is recorded by the video camera, and the electrical signal is recorded with the multi-channel microphone recording equipment, sacrificing one microphone channel. Indeed, many multi-channel microphone array systems can spare a single channel in return for highly accurate synchronization with the video data.
Pseudo-random number generators
Note that in Fig. 4, the total fragment length has been chosen to be rather short (only 100 samples), leading to a very limited range of valid Pmin and Pmax values and a relatively high risk of occasional synchronization failure. This short length has been chosen specifically to illustrate these effects. In practical cases, the fragment length will be significantly greater, resulting in a larger valid range and a vanishing probability of synchronization failure. In practice, we commonly use fragment lengths above 500 samples, easily accommodating 50 transitions or more.
We can illustrate the procedure outlined above with a practical example. Most often, the slowest recording modality is a video camera; even high-speed video cameras typically have frame rates of only 100 to 5000 frames s−1. We choose the minimal period Pmin to be at least twice the sampling period of the slowest device. For example, for a camera with a sampling rate of 100 Hz (a sampling period of 10 ms), we set the minimal period Pmin=20 ms. The maximal period is chosen to be four times the minimal period: Pmax=80 ms. This allows for reliable synchronization of fragments as short as 500 ms.
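This parameter choice can be sketched as follows. The Python function below (`sync_signal` is a hypothetical helper for illustration, not the firmware described in this paper) samples a 1-bit signal whose level toggles after random periods drawn uniformly from [Pmin, Pmax], as it would appear when sampled by the 100 Hz camera of the example.

```python
import random

def sync_signal(duration_s, p_min_s, p_max_s, fs_hz, seed=None):
    """Sample a 1-bit signal whose level toggles after random periods
    drawn uniformly from [p_min_s, p_max_s], at sampling rate fs_hz."""
    rng = random.Random(seed)
    n = int(duration_s * fs_hz)
    samples, level = [], 1
    t, t_next = 0.0, rng.uniform(p_min_s, p_max_s)
    for _ in range(n):
        if t >= t_next:                       # time for the next random toggle
            level = -level
            t_next += rng.uniform(p_min_s, p_max_s)
        samples.append(level)
        t += 1.0 / fs_hz
    return samples

# Example values from the text: Pmin = 20 ms, Pmax = 80 ms, 100 Hz camera.
sig = sync_signal(duration_s=5.0, p_min_s=0.020, p_max_s=0.080, fs_hz=100, seed=1)
```

Because Pmin is twice the camera's sampling period, every level of the signal persists for at least two video frames, so no transition can be missed by the slowest device.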
As hardware for an implementation, an STMicroelectronics Nucleo F410RB development board was chosen. This development board features an ARM Cortex M4 micro-controller together with a USB programmer and debugger. The board can be powered through the USB connection or using an external DC voltage (between 7 and 12 V). The STM32F410 micro-controller integrates a large number of peripherals on chip. The most interesting peripheral for this application is the integrated hardware true-random number generator (TRNG), which generates random 32-bit values using analog noise. This is an improvement over the PRNG assumed above, though it is by no means required for a well-functioning system. A pseudo-code description of the software running on the micro-controller is presented in Listing 1 (below). For easy replication of the proposed synchronization system, we also provide Arduino-compatible code (Listing 2), which can be used directly to build synchronization devices. In order to feed the 1-bit random signal to multiple external measurement systems, the micro-controller output pin is connected to multiple Bayonet Neill–Concelman (BNC) connectors. In the experimental setup shown in Fig. 2, one synchronization output is connected to a National Instruments DAQ device, acquiring both the microphone signal and the synchronization signal, while the other is connected to an LED driver placed in the field of view of a GoPro camera.
Note that in many cases DAQ devices come with high-speed digital-to-analog (DA) channels or digital output channels that can be used to generate the pseudo-random synchronization signal. In addition, most studio audio devices provide a large number of DA channels that can be used for driving the inputs of other audio devices, or even for directly driving an LED for video synchronization (provided the output is DC-coupled and can source sufficient current). As many such devices support DA conversion rates of 192 kHz or higher, the proposed method can yield accurate synchronization without the need to build separate hardware: one only needs to generate an audio sequence according to the principles outlined above.
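As an illustration of this DA-based variant, the following Python sketch (standard library only; `write_sync_wav` is a hypothetical helper, not software used in this study) renders the random 1-bit synchronization signal as a 192 kHz mono WAV file that could be played back through a DC-coupled DA channel.

```python
import random
import struct
import wave

def write_sync_wav(path, duration_s=5.0, fs=192_000,
                   p_min_s=0.020, p_max_s=0.080, seed=None):
    """Render the random-period 1-bit synchronization signal as a
    mono 16-bit WAV file suitable for playback on a DA channel."""
    rng = random.Random(seed)
    frames = bytearray()
    t, t_next, level = 0.0, rng.uniform(p_min_s, p_max_s), 1
    for _ in range(int(duration_s * fs)):
        if t >= t_next:                       # next random toggle
            level = -level
            t_next += rng.uniform(p_min_s, p_max_s)
        frames += struct.pack('<h', level * 32000)   # +/- near full-scale int16
        t += 1.0 / fs
    with wave.open(path, 'wb') as w:
        w.setnchannels(1)
        w.setsampwidth(2)       # 16-bit samples
        w.setframerate(fs)
        w.writeframes(bytes(frames))

write_sync_wav('sync.wav', duration_s=1.0, seed=2)
```

The same file can drive several device inputs in parallel, since all copies of the analog signal carry an identical random sequence.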
Aligning multiple data sequences
RESULTS AND DISCUSSION
Illustration: cricket songs
To illustrate the efficacy of the proposed approach, we performed two sets of experiments. The first experiment consisted of recording a calling field cricket (Gryllus campestris) using a GoPro Hero 3+ camera capturing video at 240 frames s−1. The audio was recorded using a Brüel & Kjær (B&K) 1/8 in microphone, and both the microphone and synchronization signals were recorded using a National Instruments USB 6356 DAQ device. Fig. 2 shows the setup in more detail. We performed audio and video recordings of 5 s, and extracted the synchronization signals from the audio and video streams separately. Using custom-written Matlab code, we aligned the audio and video data. Separate wave (audio) and MPEG-4 (video) files were written to the hard disk using a 10× lower sampling rate. Using a video-editing tool (Wondershare Filmora), the two sequences were combined; the resulting video is shown in Movie 1. In the video, the motion of the cricket's wings during stridulation can be seen to be synchronized with the recorded sound.
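The alignment step can be sketched as a cross-correlation search over candidate lags. This is a minimal Python illustration, not the custom Matlab code used in the experiment; `best_lag` and the toy traces are hypothetical.

```python
import random

def best_lag(ref, sig, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation
    between two extracted +/-1 synchronization traces."""
    def xcorr(lag):
        return sum(ref[i] * sig[i + lag]
                   for i in range(len(ref)) if 0 <= i + lag < len(sig))
    return max(range(-max_lag, max_lag + 1), key=xcorr)

# Toy example: the "video" trace is the "audio" trace delayed by 7 samples.
random.seed(3)
audio, level = [], 1
for _ in range(40):                      # random run lengths of 2-8 samples
    audio += [level] * random.randint(2, 8)
    level = -level
video = [0] * 7 + audio                  # delayed copy of the same sequence

lag = best_lag(audio, video, max_lag=20)
```

Because the embedded sequence is random, the cross-correlation has a single sharp maximum at the true offset, which is then used to shift one data stream relative to the other.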
Illustration: bat behavioral experiments
The second illustration is a series of bat behavioral experiments performed during the EU-FP7 project ‘ChiRoPing’, on Barro Colorado Island, Panama. We performed experiments on Micronycteris microtis, Macrophyllum macrophyllum and Noctilio leporinus, one gleaning and two trawling species, respectively. All required research permissions were obtained from the Panamanian Environmental Agency (ANAM) and the Smithsonian Institutional Animal Care and Use Committee (IACUC; 2008-11-06-24-08). The acoustic data were recorded with a custom-made 16-channel microphone array based around Knowles FG-23329-p07 condenser microphones. The video data were recorded with a high-speed camera (CamRecord CR600×2, Optronis GmbH, Kehl, Germany) at 500 frames s−1 (M. microtis) and 250 frames s−1 (N. leporinus). Synchronization was performed using our proposed synchronization mechanism, and audio and video data were combined into single video files. The results of these measurements are shown in Movie 1, and annotated screenshots of the M. microtis and N. leporinus recordings can be seen in Fig. 3.
In this paper, we have described and demonstrated a flexible and low-cost synchronization method for multi-modal bio-acoustic experimental data. Our proposed synchronization method relies on embedding synchronization information into the sensor data directly. The main advantages are (1) that no manufacturer-provided synchronization method is needed, (2) that synchronization information is not easily lost, i.e. no manual recording of metadata is required, and (3) that different sensor data streams can be easily checked for correspondence. As the method does not rely upon manufacturer-standardized synchronization mechanisms, it can be easily extended to other sensing modalities using vibration sensors, force sensors, etc., by electrically coupling them into a sacrificial channel of a multi-channel recording device. The synchronization method can even be extended to synchronize with optical 3D tracking equipment, e.g. the Qualisys Miqus cameras, by using an 830 nm infrared LED. We have provided an example of how to construct such a synchronization device using off-the-shelf hardware components, and provided a pseudo-code implementation for the pseudo-random generator.
Currently, we have only demonstrated synchronization with multi-channel recording systems through a sacrificial data channel. Where multi-channel recording equipment is not available, other means of inserting the synchronization data can be devised. For example, the synchronization information could be embedded in the least-significant bit (LSB) of the recorded data, which is often occupied by noise in real-world recordings. This, however, would require alteration of the recording hardware, making the approach less straightforward. The synchronization information could also be inserted acoustically using amplitude-shift keying (ASK) modulation (or more advanced modulation schemes) in a section of the acoustic spectrum that is not relevant to the biological experiment, analogous to the blinking-LED approach used for the cameras. Alternatively, the ASK-modulated signal can be inserted electrically into the analog-to-digital converter of the recording device, requiring a small electronic circuit to sum the microphone signal with the ASK-modulated signal.
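The ASK variant can be sketched as follows. This is a minimal Python illustration; `ask_modulate`, the fixed bit duration and the 40 kHz carrier are illustrative assumptions (in practice, the carrier band would be chosen outside the spectrum relevant to the experiment, and the bit stream would be the random-period sequence described above).

```python
import math

def ask_modulate(bits, fs, carrier_hz, samples_per_bit, depth=0.5):
    """Amplitude-shift keying of a +/-1 bit stream onto a sine carrier:
    '+1' plays the carrier at full amplitude, '-1' at reduced amplitude."""
    out = []
    for k, b in enumerate(bits):
        amp = 1.0 if b > 0 else depth
        for n in range(samples_per_bit):
            t = (k * samples_per_bit + n) / fs
            out.append(amp * math.sin(2 * math.pi * carrier_hz * t))
    return out

# Example: embed 8 sync bits on a 40 kHz carrier sampled at 192 kHz,
# with 5 ms (960 samples) per bit.
wave_out = ask_modulate([1, -1, 1, 1, -1, 1, -1, -1],
                        fs=192_000, carrier_hz=40_000, samples_per_bit=960)
```

Demodulation on the recording side then reduces to envelope detection in the carrier band, after which the recovered 1-bit sequence is aligned exactly as in the electrically coupled case.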
The proposed approach opens the opportunity to synchronize an arbitrary number of data-acquisition systems and sensor streams, as no intrinsic limitation is present in the proposed architecture. During our bat behavioral experiments, we routinely synchronized up to four high-speed cameras with two 16-channel microphone arrays. We argue that the flexibility and robustness of our proposed approach, in combination with the fact that it can be constructed using off-the-shelf components, make it a useful tool that can be applied in a broad range of biological behavioral experiments in which the combined recording of multi-sensor data streams is required.
The authors would like to acknowledge the entire ChiRoPing consortium for the fruitful discussions, collaboration and experimentation leading to this synchronization mechanism and the illustrative data. The videos of the bat experiments were produced during the ChiRoPing EU-FP7 project. More details can be found on the ChiRoPing website (http://www.chiroping.org).
Conceptualization: H.P., J.S.; Methodology: D.L., H.P.; Software: D.L., E.V., J.S.; Validation: D.L., E.V., J.S.; Investigation: D.L., E.V., I.G., J.S.; Resources: I.G.; Data curation: J.S.; Writing - original draft: D.L., E.V., I.G., W.D., H.P., J.S.; Visualization: J.S.; Supervision: W.D., H.P., J.S.; Project administration: W.D., H.P., J.S.; Funding acquisition: H.P.
The authors gratefully acknowledge the support of the Industrial Research Fund (Industrieel Onderzoeksfonds) of the University of Antwerp. Part of this study was funded through the ChiRoPing project (Seventh Framework Programme of the European Union, IST contract number 215370).
The authors declare no competing or financial interests.