Key Points

  • Meta has developed Voicebox, an advanced AI model capable of speech editing, sampling, and stylization.
  • Voicebox can generate high-quality audio fragments and edit pre-recorded audio, including noise reduction and pronunciation correction.
  • The model is multilingual and can generate speech in six different languages, offering versatility and global applicability.
  • Voicebox represents a significant milestone in generative AI research, opening new possibilities for speech synthesis, style transfer, and audio editing.
  • While Voicebox is not currently available to the public due to potential risks, its development showcases the potential of AI in audio processing and inspires further technological advancements.

Meta has announced Voicebox, a state-of-the-art artificial intelligence model that can perform speech tasks such as editing, sampling, and stylization through in-context learning, even though it was not specifically trained for them, according to Mark Zuckerberg.

Voicebox has the ability to produce high-quality audio snippets and edit pre-recorded audio, such as removing unwanted noise or correcting pronunciations, while maintaining the original content and style. Additionally, this model is multilingual and can generate speech in six different languages. Meta is creating a new way to use AI.

Voicebox is Here

In the future, multipurpose generative AI models like this are expected to give natural-sounding voices to virtual assistants and metaverse characters, let visually impaired people listen to written messages in their preferred styles, and provide creators with audio editing tools for video production, among many other applications.

The versatility of Voicebox stands out in tasks such as contextual text-to-speech synthesis, speech editing and noise reduction, style transfer across languages, and diverse speech sampling.

This advancement represents a significant milestone in generative AI research and promises to open up new possibilities in the field of audio, while also inspiring other researchers to further develop this technology.

Key Features:

  • Voicebox: State-of-the-art generative speech model.
  • Flow Matching Method: New approach used by Meta AI to address text-guided speech infilling tasks.
  • Data Scale: Trained on over 50,000 hours of speech to strengthen its in-context learning capability.
  • Style Variety: Can generate results in various styles and create high-quality audio clips.
  • Limited Availability: Due to potential risks of misuse, the model and its code are currently not available to the public.
  • Transparency and Accountability: Meta AI seeks to strike a balance between sharing its research with the AI community and ensuring responsible use of its models.

Large-scale generative models like GPT and DALL-E have revolutionized research in natural language processing and computer vision. These models not only generate high-fidelity text or images but are also generalists that can tackle tasks they were never explicitly trained on.

However, generative speech models are still at an early stage in terms of scale and task generalization. Voicebox is a non-autoregressive flow matching model trained to fill in missing speech segments given the surrounding audio and a text transcript, using over 50,000 hours of unfiltered and unenhanced speech.

Similar to GPT, this tool can perform various tasks through in-context learning, but with the advantage of being able to condition on future context as well as past context. Without task-specific training, it can be used for monolingual or cross-lingual text-to-speech synthesis, noise removal, content editing, style conversion, and diverse sample generation.
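
To make the infilling idea more concrete, here is a minimal sketch of how one training example for a masked flow matching objective could be built. It is not Meta's code, which has not been released. It follows the general flow matching recipe with a straight-line path from noise to data. The function names, the SIGMA_MIN constant, and the toy one-number-per-frame features are all assumptions made for illustration.

```python
import random

SIGMA_MIN = 1e-5  # assumed small constant, not taken from the article


def flow_matching_example(x1, mask, t, rng):
    """Build one toy training example for masked flow matching.

    x1   : list of floats, the clean target frames (toy audio features)
    mask : list of 0/1 flags, 1 marks a frame the model must fill in
    t    : float in [0, 1], the position along the noise-to-data path
    rng  : random.Random instance used to draw the noise sample
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # noise sample
    scale = 1.0 - (1.0 - SIGMA_MIN) * t
    # Point on the straight path between noise (t=0) and data (t=1).
    x_t = [scale * n + t * d for n, d in zip(x0, x1)]
    # Velocity of that path, which the model is trained to predict.
    target = [d - (1.0 - SIGMA_MIN) * n for n, d in zip(x0, x1)]
    # Audio context: only the unmasked frames stay visible.
    context = [(1 - m) * d for m, d in zip(mask, x1)]
    return x_t, target, context


def masked_loss(predicted, target, mask):
    """Mean squared error over the masked frames only."""
    errors = [(p - u) ** 2 for p, u, m in zip(predicted, target, mask) if m]
    return sum(errors) / len(errors) if errors else 0.0


rng = random.Random(0)
frames = [0.2, -0.1, 0.5, 0.3]
mask = [0, 1, 1, 0]  # the two middle frames are hidden
x_t, target, context = flow_matching_example(frames, mask, 0.5, rng)
# A real model would predict the velocity from (x_t, t, context, text).
# A stand-in prediction of all zeros is scored here.
print(masked_loss([0.0] * len(frames), target, mask))
```

Because the visible frames on both sides of the gap are passed in as context, a model trained this way can use audio that comes after the missing segment, which is the "future context" advantage mentioned above.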


