OpenAI is moving into voice cloning with a preview release of Voice Engine, an extension of its text-to-speech API. While the technology shows promise, OpenAI is proceeding cautiously, aware of the risks of misuse.
Voice Engine lets users generate a synthetic copy of a voice from a 15-second sample, using a generative AI model developed over two years. The tool’s public release date remains uncertain as OpenAI works to address those concerns and ensure responsible deployment.
Jeff Harris, a member of OpenAI’s product staff, emphasized the company’s commitment to understanding and mitigating the risks associated with voice cloning technology. The model powering Voice Engine is already in use in other products, including OpenAI’s ChatGPT chatbot and Spotify’s podcast dubbing feature.
OpenAI has not disclosed the specific training data behind Voice Engine, though Harris noted that it comprises a mix of licensed and publicly available data. No user data is used for training or fine-tuning, an approach OpenAI says preserves user privacy without sacrificing the quality of synthesized speech.
Voice Engine’s pricing, set at $15 per one million characters, positions it competitively in the market. However, the tool currently lacks customization options, such as tone or pitch adjustment, which may limit its appeal to some users.
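To put the quoted rate in concrete terms, the short Python sketch below estimates what a given piece of text would cost to synthesize at $15 per one million characters. The function name `estimate_cost` is hypothetical, and the sketch assumes every character (including spaces and punctuation) is billed; the article does not say how characters are actually counted.

```python
from decimal import Decimal

# Rate quoted in the article: $15 per one million characters.
PRICE_PER_MILLION_CHARS = Decimal("15")


def estimate_cost(text: str) -> Decimal:
    """Estimate the USD cost of synthesizing `text`.

    Assumption: every character, including whitespace and punctuation,
    counts toward the billed total. Actual billing rules may differ.
    """
    return PRICE_PER_MILLION_CHARS * len(text) / Decimal(1_000_000)


if __name__ == "__main__":
    script = "word " * 400  # a 2,000-character stand-in for a short script
    print(f"{len(script)} characters -> ${estimate_cost(script)}")
```

At this rate a 2,000-character script works out to about three cents, and a full million characters to $15.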
The introduction of Voice Engine raises ethical questions regarding the future of voice talent and potential misuse of synthesized voices. OpenAI is monitoring industry developments closely, recognizing the impact on voice actors and exploring ways to balance technological advancements with ethical considerations.
While Voice Engine holds promise for various applications, including healthcare and accessibility, OpenAI is taking steps to prevent misuse. Watermarking techniques and engagement with a red teaming network aim to safeguard against malicious use cases.
As OpenAI moves through the preview phase and gathers feedback, the company says it remains committed to the safe and responsible development of Voice Engine. The technology presents both opportunities and challenges for the future of synthesized speech.