How RoEx Automix works under the hood
In this post, we'll peek under the hood of our RoEx Automix technology to show how we process your audio and return a professional, balanced-sounding mix in minutes instead of days.
When mixing audio, it is paramount that all audio components/signals be heard clearly. Masking is the phenomenon where a sound source becomes inaudible due to the presence of other sources. In the context of mixing audio in a studio, an example would be a kick drum that cannot be heard over the bass guitar. In practice, a balanced mix, in which every sound source can be heard clearly, is achieved by applying audio effects to each source or combination of sources. Volume levels are set, sources are positioned to the left or right in the stereo field (panning), and equalisation (EQ) is applied to boost or reduce certain frequencies. Finally, dynamic range compression (DRC) is applied to control the dynamics of sources.
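As a toy illustration of frequency masking (not RoEx's actual analysis), the sketch below measures how much energy two stems put into the same low band, using a naive DFT. A real system would use an FFT over perceptually spaced (e.g. Bark) bands, but the idea is the same: if the bass dominates the band the kick also lives in, the kick is a masking candidate.

```python
import math

def band_energy(signal, sample_rate, f_lo, f_hi):
    """Energy of `signal` in the band [f_lo, f_hi) Hz via a naive DFT.
    Toy illustration only -- real analysis would use an FFT and
    perceptual bands."""
    n = len(signal)
    energy = 0.0
    for k in range(n // 2):
        freq = k * sample_rate / n
        if f_lo <= freq < f_hi:
            re = sum(signal[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
            im = -sum(signal[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
            energy += (re * re + im * im) / n
    return energy

# Two toy "stems": a loud 100 Hz bass tone and a quieter kick at the
# same fundamental (amplitudes are made up for the example).
sr, n = 8000, 1024
bass = [math.sin(2 * math.pi * 100 * t / sr) for t in range(n)]
kick = [0.3 * math.sin(2 * math.pi * 100 * t / sr) for t in range(n)]

low_bass = band_energy(bass, sr, 50, 200)
low_kick = band_energy(kick, sr, 50, 200)
masked = low_kick < low_bass  # crude "masker louder than maskee" check
```

A mixing system that detects this would respond with EQ cuts, level changes or panning to separate the two sources.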
Our mixing system
RoEx Automix uses AI to figure out the correct audio effect settings for spatial balance, masking reduction and perceived loudness within any multitrack audio submitted to be mixed.
RoEx Automix uses advanced music information retrieval (MIR) techniques to analyse each track/stem in the context of all the other tracks/stems interacting with it. This occurs in the "Multitrack Analysis Module" as illustrated in the figure above. Within this module, we analyse and extract various multitrack audio features. Some of these features can be extracted in real-time, which allows us to mix audio in real-time if required.
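To make the idea of per-stem feature extraction concrete, here is a minimal sketch using two classic MIR features: RMS level (a loudness proxy) and zero-crossing rate (a cheap brightness proxy). These particular features are illustrative assumptions, not a description of what the Multitrack Analysis Module actually computes.

```python
import math

def rms(signal):
    """Root-mean-square level -- a basic loudness proxy."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs that change sign --
    a cheap brightness/noisiness proxy often used in MIR."""
    crossings = sum((a >= 0) != (b >= 0) for a, b in zip(signal, signal[1:]))
    return crossings / (len(signal) - 1)

# Toy stems: a low-frequency "bass" and a quieter, brighter "lead".
sr, n = 8000, 1000
bass = [math.sin(2 * math.pi * 100 * t / sr) for t in range(n)]
lead = [0.5 * math.sin(2 * math.pi * 2000 * t / sr) for t in range(n)]

# Feature vectors like these (one per stem) are what a downstream
# mixing model would consume.
features = {
    "bass": {"rms": rms(bass), "zcr": zero_crossing_rate(bass)},
    "lead": {"rms": rms(lead), "zcr": zero_crossing_rate(lead)},
}
```

Both features can be computed frame-by-frame on a live stream, which is what makes real-time operation feasible.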
The multitrack features that are extracted from each track/stem are then fed to a model that understands mix engineering rules and makes decisions on what it thinks are the most appropriate settings for volume, EQ, DRC, Panning and Reverb. The decision the model makes is based on the sonic characteristics of each track/stem submitted, how they interact with each other as well as the musical style. This allows RoEx Automix to mix individual stems just as well as a full multitrack.
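Once such decisions exist, applying them per stem is mechanical. The sketch below applies a gain and a constant-power pan to mono stems and sums them onto a stereo bus; the stem names, decision values and pan law are illustrative assumptions, not the model's actual output format.

```python
import math

def apply_settings(stem, gain_db, pan):
    """Apply a gain (in dB) and a constant-power pan (-1 = hard left,
    +1 = hard right) to a mono stem, returning (left, right) lists."""
    gain = 10 ** (gain_db / 20)
    theta = (pan + 1) * math.pi / 4          # map [-1, 1] -> [0, pi/2]
    left_g, right_g = math.cos(theta), math.sin(theta)
    left = [x * gain * left_g for x in stem]
    right = [x * gain * right_g for x in stem]
    return left, right

# Hypothetical per-stem decisions (made-up values for illustration).
decisions = {"vocals": {"gain_db": -3.0, "pan": 0.0},
             "guitar": {"gain_db": -6.0, "pan": -0.5}}
stems = {"vocals": [1.0, -1.0, 0.5], "guitar": [0.5, 0.5, -0.5]}

mix_l, mix_r = [0.0] * 3, [0.0] * 3
for name, stem in stems.items():
    l, r = apply_settings(stem, **decisions[name])
    mix_l = [a + b for a, b in zip(mix_l, l)]
    mix_r = [a + b for a, b in zip(mix_r, r)]
```

Constant-power panning keeps a source's perceived loudness stable as it moves across the stereo field, which is why it is a common default in mixers.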
Once the model has decided on the best multitrack audio settings, the audio effect, panning and loudness settings are applied to each track/stem, the multitrack audio is combined, and the result is peak-normalised to -3 dBFS. This leaves the headroom required for mastering.
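Peak normalisation to -3 dBFS is simple to sketch: scale the mix so its absolute peak sits at 10^(-3/20) ≈ 0.708 of full scale, leaving roughly 29% of headroom below clipping.

```python
def peak_normalise(samples, target_dbfs=-3.0):
    """Scale `samples` so the absolute peak sits at `target_dbfs`
    (decibels relative to full scale)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)            # silence: nothing to scale
    target = 10 ** (target_dbfs / 20)   # -3 dBFS ~= 0.708 linear
    return [s * target / peak for s in samples]

mix = [0.2, -0.9, 0.5]
normalised = peak_normalise(mix)        # peak now at ~0.708
```

Note this is peak normalisation, not loudness normalisation: it bounds the maximum sample value but says nothing about perceived loudness, which is handled later in mastering.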
We have also built a mastering module based on very similar technology. This part of the signal chain is optional: it takes the mixed audio and the loudness preference the user has indicated, and applies our AI mastering signal chain. The final output is a WAV, FLAC or MP3 file that is ready for distribution on the likes of Spotify, SoundCloud or Bandcamp.
Our API (Tonn)
We currently host our RoEx Automix technology on Google Cloud Platform (GCP) as part of our Tonn API, where it runs as a containerised application that scales with the number of mixes required at any one time, matching demand and our customers' needs. Furthermore, it allows any external application to create mix tasks in parallel, speeding up the mixing of a very large multitrack.
As a toy example, suppose a multitrack consisted of 10 guitar tracks, 10 drum tracks, 10 string tracks and 10 synth tracks, i.e. 40 tracks in total. The user would create a mix task for each instrument group, all running in parallel. Once the guitar, drum, string and synth mixes were finished, the user could create a final mixed and mastered track from those four group mixes. This is illustrated in the figure above.
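Assuming each instrument group can be submitted independently, the fan-out pattern looks like the sketch below. `create_mix_task` here is a stand-in stub, not the real Tonn API client; its name, arguments and return value are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def create_mix_task(group_name, tracks):
    """Stand-in for a real API call: pretend to submit one instrument
    group for mixing and return the (fake) resulting group stem."""
    return f"{group_name}_mix({len(tracks)} tracks)"

# 40 tracks split into four instrument groups, as in the example above.
groups = {
    "guitars": [f"gtr_{i}.wav" for i in range(10)],
    "drums":   [f"drm_{i}.wav" for i in range(10)],
    "strings": [f"str_{i}.wav" for i in range(10)],
    "synths":  [f"syn_{i}.wav" for i in range(10)],
}

# Fan out: one mix task per group, running concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {name: pool.submit(create_mix_task, name, trks)
               for name, trks in groups.items()}
    group_mixes = [f.result() for f in futures.values()]

# Fan in: the four group mixes become the inputs to the final
# mix-and-master task.
final_inputs = group_mixes
```

The same fan-out/fan-in shape works with any HTTP client in place of the stub, since each task is independent until the final combine step.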
Using the Tonn API, our RoEx Automix technology can mix eight tracks, each three minutes long (a typical pop song), in roughly 4.5 minutes. This is our current benchmark and something we are always striving to improve. If you would like to try out the Tonn API for yourself, please contact us for an API key. The Tonn API documentation can be found here.
RoEx Realtime Mix
We are currently developing a system that can take multiple audio channels, analyse each channel and apply audio effects in real-time, with the aim of reducing masking and enhancing auditory clarity. This is based on the same technology we use for RoEx Automix, but is meant for things like live broadcast, video games or VR, where multiple sound sources come in and out over time. The system is adaptive and can respond to external stimuli: for example, if the main character in a video game is speaking, it automatically emphasises their speech and filters the other sounds so they mask it less. Please get in touch if you'd like to find out more.
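A heavily simplified sketch of that adaptive behaviour: gate on the speech channel's level frame-by-frame, and duck the other sources while speech is active. The gain amount, threshold and frame values are illustrative assumptions, not the real system's parameters (which would also smooth the gain to avoid pumping).

```python
def adaptive_duck(frames_speech, frames_music, duck_db=-9.0, threshold=0.05):
    """Frame-by-frame: when the speech channel is active (simple level
    gate), attenuate the music bed so it masks the speech less.
    A toy stand-in for the adaptive behaviour described above."""
    duck_gain = 10 ** (duck_db / 20)
    out = []
    for s, m in zip(frames_speech, frames_music):
        speech_active = abs(s) > threshold
        g = duck_gain if speech_active else 1.0
        out.append(s + m * g)
    return out

# Toy frame levels: speech enters on frames 2-3 over a steady music bed.
speech = [0.0, 0.0, 0.6, 0.7, 0.0]
music  = [0.4, 0.4, 0.4, 0.4, 0.4]
mixed = adaptive_duck(speech, music)
```

In a real-time system the same decision runs on short audio buffers as they arrive, with the "is speech active" test replaced by whatever external stimulus the host application provides.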
Music production, or 'mixing', is time-consuming and draws on a very different skill set from music creation. A typical project has many sound sources that all need to be heard simultaneously, yet each may have been recorded in a different environment with different characteristics. The final mix should keep every source distinct while blending them into a clean whole. Achieving this is very challenging and usually requires a professional sound engineer.
Automatic music production tools like RoEx Automix solve these problems, letting musicians get their content to their intended audience faster, more easily and more cheaply than if they had mixed and mastered it themselves or paid for professional services. This lowers barriers to entry and makes a music career more accessible to those without a technical background.