Amazon AI Synthesizes Singers
Kyle Wiggers, writing for VentureBeat: AI and machine learning algorithms are quite skilled at generating works of art, and highly realistic images of apartments, people, and pets to boot. But relatively few have been tuned to singing synthesis, the task of cloning musicians' voices. Researchers from Amazon and Cambridge put their collective minds to the challenge in a recent paper in which they propose an AI system that requires "considerably" less modeling than previous work of features like vibratos and note durations. It taps WaveNet, a Google-designed algorithm, to synthesize song from mel-spectrograms (representations of the power spectrum of sounds) that another model produces from a combination of speech and singing data.

The system comprises three parts. The first is a frontend that takes a musical score as input and produces note embeddings (i.e., numerical representations of notes) to be sent to an encoder. The second is an encoder-decoder model modified to accept those embeddings, whose decoder produces mel-spectrograms. The third and final component, the WaveNet vocoder, which mimics things like stress and intonation in speech, synthesizes the spectrograms into song.

The frontend performs linguistic analysis on the score's lyrics, allowing for three levels of vowel stress and ignoring punctuation. It determines which phonemes (perceptually distinct units of sound) correspond to each note of the score using syllabification information specified in the score itself. It also computes the expected duration in seconds of each note, as well as the tempo and time signature of the score, which it combines into the embeddings.
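To make that last step concrete, here is a minimal sketch of the kind of note-duration arithmetic a score frontend would perform: a note's length in beats, taken together with the tempo and the time signature's beat unit, determines its length in seconds. The function name, the beat-unit convention, and the example tempo are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of converting written note values into seconds.
# Not the paper's implementation; names and conventions are assumptions.

from fractions import Fraction

def note_duration_seconds(note_value: Fraction, tempo_bpm: float,
                          beat_unit: int = 4) -> float:
    """Convert a written note value into seconds.

    note_value: length of the note as a fraction of a whole note
                (quarter note = 1/4, eighth note = 1/8, ...).
    tempo_bpm:  tempo in beats per minute, where one beat is a
                1/beat_unit note (beat_unit is the time signature's
                denominator, e.g. 4 in 4/4 time).
    """
    beats = note_value * beat_unit           # how many beats the note spans
    seconds_per_beat = 60.0 / tempo_bpm      # one beat at the given tempo
    return float(beats) * seconds_per_beat

# Example: in 4/4 time at 120 BPM, a quarter note lasts 0.5 s
# and an eighth note lasts 0.25 s.
print(note_duration_seconds(Fraction(1, 4), tempo_bpm=120, beat_unit=4))  # 0.5
print(note_duration_seconds(Fraction(1, 8), tempo_bpm=120, beat_unit=4))  # 0.25
```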
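For readers who want a picture of the overall data flow, the following is a greatly simplified, hypothetical sketch of the three-stage pipeline the article describes: score to note embeddings, note embeddings to mel-spectrogram, mel-spectrogram to waveform. Every name, shape, and the toy math inside each stage is an illustrative stand-in, not the researchers' models.

```python
# Hypothetical three-stage data flow: frontend -> acoustic model -> vocoder.
# All internals here are toy placeholders used only to show the plumbing.

import numpy as np

EMBED_DIM = 64   # assumed size of a note embedding
N_MELS = 80      # a common mel-spectrogram channel count (assumption)

def frontend(score: list[dict]) -> np.ndarray:
    """Stage 1: turn a score (notes with phonemes, pitch, duration) into
    one embedding per note. Here it is just a fixed random projection."""
    rng = np.random.default_rng(0)
    features = np.array([[n["pitch"], n["duration"], len(n["phonemes"])]
                         for n in score], dtype=float)
    projection = rng.standard_normal((features.shape[1], EMBED_DIM))
    return features @ projection                        # (n_notes, EMBED_DIM)

def seq2seq_acoustic_model(note_embeddings: np.ndarray) -> np.ndarray:
    """Stage 2: an encoder-decoder would expand note embeddings into a
    frame-level mel-spectrogram. Here each note simply emits 10 frames."""
    frames_per_note = 10
    rng = np.random.default_rng(1)
    mixing = rng.standard_normal((EMBED_DIM, N_MELS))
    frames = note_embeddings @ mixing                    # (n_notes, N_MELS)
    return np.repeat(frames, frames_per_note, axis=0)    # (n_frames, N_MELS)

def wavenet_vocoder(mel: np.ndarray, hop_length: int = 256) -> np.ndarray:
    """Stage 3: a neural vocoder turns the mel-spectrogram into audio.
    Here we just return silence of the matching length."""
    return np.zeros(mel.shape[0] * hop_length)

score = [{"pitch": 60, "duration": 0.5, "phonemes": ["HH", "AH"]},
         {"pitch": 62, "duration": 0.5, "phonemes": ["L", "OW"]}]
audio = wavenet_vocoder(seq2seq_acoustic_model(frontend(score)))
print(audio.shape)   # number of synthesized audio samples
```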