Audio Data

Audio data is primarily produced by hydrophones (underwater microphones). ONC hosts a large number of hydrophones (over 160 devices, 11 different types) that are the single largest contributor to the archive (about 70% of the 1.2 petabytes of storage). These hydrophones produce audio files with sounds at a wide range of frequencies, having applications in seismology, marine mammal studies, ship noise and more.

Audio data may be diverted or delayed for security reasons or bandwidth limitations. Read on below for more information.


We have recently switched our primary storage format for Audio Data from compressed wav files to FLAC files. FLAC is more space and CPU efficient, and is widely supported. As of April 9, 2021, all hydrophones produce FLAC files as source. Before that, during a transition period from August 10, 2020 to April 9, 2021, FLAC files were generated from the source wav files, and both wav and FLAC formats were archived. File formats that are archived are readily available, while other formats are generated on-the-fly and are much slower to access. For fast results for lossless audio data, select wav file formats up to August 10, 2020, and select FLAC thereafter. The data availability graph in Data Search may be updated in the future to show which files are available directly; the archivefiles service can also list the files in the archive.
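As an illustration of the date ranges above, here is a minimal Python sketch that picks the lossless format expected to be served straight from the archive. The function names are hypothetical, and the handling of the exact boundary days (August 10, 2020 and April 9, 2021) is an assumption, since the text does not say which side of the boundary those days fall on.

```python
from datetime import date
from typing import List

# Dates taken from the paragraph above.
FLAC_ARCHIVING_START = date(2020, 8, 10)  # FLAC copies begin to be archived
FLAC_SOURCE_START = date(2021, 4, 9)      # hydrophones produce FLAC as source


def archived_lossless_formats(day: date) -> List[str]:
    """Lossless formats expected to be archived for recordings made on `day`.

    Assumption: boundary days are treated as belonging to the later period.
    """
    if day < FLAC_ARCHIVING_START:
        return ["wav"]
    if day < FLAC_SOURCE_START:
        return ["wav", "flac"]  # transition period: both formats archived
    return ["flac"]


def preferred_lossless_format(day: date) -> str:
    """Follow the guidance above: wav up to August 10, 2020, FLAC thereafter."""
    return "wav" if day < FLAC_ARCHIVING_START else "flac"
```

A search that spans one of these dates could be split at the boundary so that each part requests only an archived format.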

If users create large requests that generate FLAC/MP3/WAV formats on-the-fly, the temporary space to hold these data products may fill up, in which case their search requests will be cancelled and the data deleted. Searches with very large search bounds may also be cancelled if they would take too long to complete without being interrupted by semi-monthly software maintenance. Please see the above date ranges to create searches that only retrieve data from the archive and avoid generating data products on-the-fly. While the temporary / holding space is quite large, it is easy to request over 10 terabytes of data in a single search. We're working on a more permanent fix for this issue. Please accept our apologies if your search requests are impacted. Cancellations will only be done if necessary and the user will be contacted via email.

Another very good workaround is to request data in smaller "chunks" of time. Users can also do this programmatically using the dataProductDelivery Service (Oceans 3.0 API), so that users can download and process the data as they go. Another alternative is to run processing code or software on the ONC cloud. This avoids downloading the data entirely, removing the limitations of local computing resources. Contact us if you're interested.
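The chunking workaround can be sketched in Python as follows. This only shows how a long time range might be split into smaller windows; `time_chunks` is a hypothetical helper, and the actual request for each window (for example through the dataProductDelivery service) is deliberately left out.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def time_chunks(
    start: datetime, end: datetime, step: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (chunk_start, chunk_end) windows covering [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be a positive duration")
    chunk_start = start
    while chunk_start < end:
        # The last window is shortened so it never runs past `end`.
        chunk_end = min(chunk_start + step, end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end


if __name__ == "__main__":
    # Example: split a week into one-day requests.
    for t0, t1 in time_chunks(
        datetime(2021, 5, 1), datetime(2021, 5, 8), timedelta(days=1)
    ):
        print(t0.isoformat(), "to", t1.isoformat())
```

Each window can then be requested, downloaded and processed before moving on to the next, which keeps both the temporary holding space and local disk use small.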


On-the-fly conversion between audio formats is now disabled by default because of the tendency to fill the temporary space as noted above. With the default option, searches may return no data found for some formats while there is data in other audio formats. FLAC and wav formats are available as noted above. MP3 format is not recommended and is almost always generated on-the-fly. See Audio Format Conversion for more information.

IOS Hydrophone Arrays always convert on-the-fly from their source raw data .hyd files to audio file products - this is slow and may also fill the temporary space. Contact us before requesting IOS Hydrophone Array audio data.

Given the sensitive nature of hydrophone data, the military has the ability to divert the audio data as required. Diverted data is then reviewed by military authorities, and if it does not contain sensitive recordings it is returned to the ONC archive. This process usually blocks out a few days of near-live data and then the review and return takes up to 3 months (usually about a month). Diversion occurs securely at the source and ONC has no access to the data until it is reviewed and returned. The data products, Data Preview and data availability are all updated automatically when the data is returned. Because hydrophones generate multiple types of data, the data availability plot in Data Search may show available data when audio data is not available. This is often indicated by the shade of the colour in the data availability plot: light green means less data, so the audio data is likely not available.

The other data provided by hydrophones are scalar sensor data, stored as log .txt files, and FFT files, neither of which is subject to diversion. Spectrogram plots and associated data products can be made from FFT files, audio data, or a combination of both as source files; users will see the option to choose the source in a few places (Data Search and Search Hydrophone Data pages). Data Preview's monthly tab shows both the FFT source and audio source spectrograms for the past month; users can also refer to those previews to see if audio data is available. Please note that data may be unavailable for reasons other than diversion, such as internet bandwidth limitations and technical issues.

Prior to August 2016, instead of diverting the entire frequency range of the data, the military diverted only a low frequency band of the data. When this filtering occurred, the file-name was appended with 'LPF' for low-pass filtering. Once filtered, this data remained in separate frequency bands in separate files, so data returned to ONC was labelled 'HPF' for high-pass filtered. See here for more information on the diversion of hydrophone and seismometer data. Data Search and the DataProductDelivery API provide options for filtering out HPF and LPF files with the "original data" filter in the data diversion mode options; the HPF or LPF files can also be selected for download (see the Data Product Options section below). If you happen to download a dataset that has LPF or HPF files and you want to segregate the files, follow these instructions: How to remove LPF and HPF data from an acoustic data set.
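For users who prefer to separate the files with a script, a minimal Python sketch follows. It is not the procedure from the linked instructions. It assumes only what is stated above, namely that the label is appended to the file name, and it interprets that as the name (without its extension) ending in 'LPF' or 'HPF'. The function name is hypothetical; check the pattern against your own files before relying on it.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def split_by_diversion_label(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Group file names into 'LPF', 'HPF' and 'original' lists.

    Assumption: the label is the last part of the name before the extension.
    """
    groups: Dict[str, List[str]] = {"LPF": [], "HPF": [], "original": []}
    for name in filenames:
        stem = Path(name).stem.upper()  # name without its final extension
        if stem.endswith("LPF"):
            groups["LPF"].append(name)
        elif stem.endswith("HPF"):
            groups["HPF"].append(name)
        else:
            groups["original"].append(name)
    return groups
```

The returned lists can then be used to move or delete the unwanted band on disk.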

For hydrophones located on remote observatories with limited internet bandwidth (Cambridge Bay for instance), live hydrophone audio files may not be available. In this case, any missing audio data is usually stored on site to be retrieved during regular maintenance (yearly or less). Spectrogram data (FFT) files will normally be available near-live as they are much smaller and can be sent over limited connections.

When the requested format of audio data already exists in the archive, the search will complete immediately; otherwise some processing will occur to convert the formats. Processing times vary, but in general, expect approximately 15 seconds to process one 5-minute file, so roughly an hour (288 files, about 72 minutes) to process a day's worth of data.
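The rule of thumb above can be turned into a quick estimate before submitting a search. This is a rough sketch only: the 15-second figure is the approximate value quoted above, actual processing times vary, and the function name is hypothetical.

```python
import math

SECONDS_PER_FILE = 15    # approximate conversion time per file (from the text)
FILE_LENGTH_MINUTES = 5  # nominal length of one audio file


def estimated_conversion_minutes(hours_of_audio: float) -> float:
    """Rough on-the-fly conversion time, in minutes, for a span of audio."""
    files = math.ceil(hours_of_audio * 60 / FILE_LENGTH_MINUTES)
    return files * SECONDS_PER_FILE / 60


if __name__ == "__main__":
    # One day of audio is 288 five-minute files.
    print(estimated_conversion_minutes(24))
```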

Some software applications that employ good error checking may reject some of our wav files for erroneous header information. The cause of this is in the military diversion drivers and is outside of ONC's direct control (they hardcoded these values). Only a select few customized applications will reject the files. Audacity, MATLAB and many others read these files fine. If you get an error like this: 

and you are an active MATLAB user, you can fix the wav files with this MATLAB code: oncwavfilefix.m and its dependency: wavchunksizefix.m. If not, please contact us and we'll be happy to help.

See the format sections below the data product options for information on calibration.

Oceans 3.0 API filter: dataProductCode=AD

Revision History

  1. 20100217: Hydrophone files initially made publicly available
  2. 20130123: searches handled by MATLAB search to process data on the fly if needed; all formats made available to all devices

Data Product Options

Hydrophone Channel

Hydrophone Data Diversion Mode

Audio Downsampling

Audio Format Conversion

Format

Hydrophone data is available in WAV, MP3, and FLAC audio files. For normal hydrophones, WAV files were the source data format until April 9, 2021, after which FLAC is the source data format. For the IOS hydrophone arrays, HYD files are the source data format. MP3s are generated from the source data format (WAV or FLAC), usually in 128 kilobit quality, and were stored in the archive for fast retrieval for most hydrophones until August 10, 2020. After that date, FLAC files were generated from the WAV source format and archived for fast retrieval. After April 9, 2021, FLAC is the source format, stored directly in the archive. All three formats may be generated on-the-fly if they are not pre-generated and archived. On-the-fly generation is relatively slow; users should try to access the archived data formats for best results. Overall, audio / hydrophone data is voluminous and is best handled in small chunks, perhaps via the API: Oceans 3.0 API Home

Some autonomously operated third-party hydrophones only acquired data in FLAC format and did not have any formats pre-generated. These files are archived and retrieved without any modification: none of the above data product options apply to FLAC files from those hydrophones. 

For WAV, FLAC and other lossless format data, the files will be accompanied by a calibration file (except for FileDownloadService requests). This file is a text file, comma delimited with one descriptive header row. It is named following the usual standard and ending with '-hydrophoneCalibration.txt'. The dateFrom / dateTo is taken from the calibration date range of the first sensitivity bin attribute. If a search extends over multiple calibrations, multiple files will be produced. The calibrations included are any that overlap with the data time range. A currently applicable or on-going calibration will produce a file named with a dateTo that is midnight tomorrow. Here is an example of the first few lines of a hydrophone calibration text file:

#Hydrophone calibration sensitivities. The file contains one header line followed by comma delimited data. First column is the centre frequency of each frequency bin(Hz). Second column is the sensitivity calibration for each bin (dB/uPa). Data is from device attributes: http://qaweb2.neptune.uvic.ca/DeviceListing?DeviceId=1230 . Device attribute HydrophoneSensitivityVectorPart1 last modified: 02-Dec-2015 20:59:52. File created: 29-Mar-2016 15:36:06.
1, -33.1853
2, -30.6233
3, -29.8957
4, -29.6103
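A calibration file in this layout can be read with a few lines of Python. The sketch below assumes only the format shown in the example: one header line starting with '#', followed by comma-delimited frequency and sensitivity pairs. The function name is hypothetical.

```python
from typing import List, Tuple


def parse_hydrophone_calibration(text: str) -> Tuple[str, List[Tuple[float, float]]]:
    """Return (header, rows) where rows are (frequency_hz, sensitivity) pairs."""
    header = ""
    rows: List[Tuple[float, float]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("#"):
            header = line.lstrip("#").strip()  # descriptive header row
            continue
        freq, sens = line.split(",")
        rows.append((float(freq), float(sens)))
    return header, rows


if __name__ == "__main__":
    sample = "#Hydrophone calibration sensitivities.\n1, -33.1853\n2, -30.6233\n"
    header, rows = parse_hydrophone_calibration(sample)
    print(header)
    print(rows)
```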

Calibrations are stored in the device attributes system, visible in the "additional attributes" tab in device details, for example: https://data.oceannetworks.ca/DeviceListing?DeviceId=23483 . Sensitivities are stored in attributes named HydrophoneSensitivityVectorPart# and the frequency bins in attributes named HydrophoneSensitivityVectorBinsLeadingEdgePart#. Both are broken into parts in order to fit in the system, where # is the part number in order of increasing frequency.

Post-deployment calibrations are also available. The device attribute fields are the same except with a "Post" inserted, e.g. HydrophonePostSensitivityVectorPart1. The format of the calibration text file is the same except that the file-name ends with '-hydrophonePostCalibration.txt' and the header line includes the text 'post-deployment'. Post-deployment calibrations are taken after a device is returned from the field and the frequencies calibrated may only be up to 2000 Hz (lower frequencies tend to show degradation over time, whereas higher frequency responsiveness is more stable). For all calibrations, dateFrom on the device attribute is the time when the calibration was generated. Data Search requests for post-deployment calibrations therefore include the calibrations that occur during the data time range and the first one after the data, if available, so that users have access to the post-deployment calibration.

Oceans 3.0 API filter: extension={wav,mp3,flac}
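As a hedged illustration of how the two filters documented on this page (dataProductCode=AD and the extension values) might be combined into a query string, here is a small Python sketch. Only those two filters come from this page; the function name is hypothetical, and the dateFrom / dateTo parameter names and the timestamp format are assumptions, so consult the Oceans 3.0 API documentation for the actual service parameters.

```python
from urllib.parse import urlencode

AUDIO_EXTENSIONS = {"wav", "mp3", "flac"}  # values listed in the filter above


def audio_filter_query(extension: str, date_from: str, date_to: str) -> str:
    """Build a URL query string for an audio data product request.

    dateFrom / dateTo are assumed parameter names, not confirmed by this page.
    """
    if extension not in AUDIO_EXTENSIONS:
        raise ValueError(f"extension must be one of {sorted(AUDIO_EXTENSIONS)}")
    params = {
        "dataProductCode": "AD",  # audio data product code from this page
        "extension": extension,
        "dateFrom": date_from,
        "dateTo": date_to,
    }
    return urlencode(params)


if __name__ == "__main__":
    print(audio_filter_query("flac", "2021-05-01T00:00:00.000Z", "2021-05-02T00:00:00.000Z"))
```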
