Speak in any language
The AI models we developed empower people to speak in any language by generating a unique synthetic voice that acts as an extension of their real voice in the digital space.
About the project
Let your digital DNA speak the language you want!
With our unique AI models, we can train your cloned voice to speak any language you want. Don't spend months studying a new language; let your digital DNA do it for you. Request your voice clone now and train it in the language of your choice!
The idea
As speech is one of our main means of communication, the AI models we developed empower people to talk in any language by generating a unique synthetic voice that acts as an extension of their real voice in the digital space. The technology enables full translation that preserves the speaker's original voice. This way, everyone can generate their own avatar and make it speak multiple languages.
Each language has its own facial expressions and lip movements, which the system translates automatically once a language is chosen. As such, the technology not only generates a digital avatar that can talk in multiple languages but also adapts the lip-sync to fit the language being spoken, rendering an avatar that can be mistaken for a real person. In this initial use case, users enter text input, which is converted into audio and spoken in any language by the digital avatar.
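The text-to-audio step can be pictured with a minimal sketch like the one below. It uses the gTTS library purely as a stand-in for the project's proprietary voice-cloning model, which is not public: gTTS renders a stock voice rather than a cloned one, but it shows the shape of the "text in, spoken audio in a chosen language out" interface.

```python
# Minimal sketch: turning text input into spoken audio in a chosen language.
# gTTS is only a stand-in for the project's proprietary voice model (not public);
# it produces a stock voice, not a cloned one.
from gtts import gTTS

def text_to_speech(text: str, language: str, out_path: str) -> None:
    """Convert text to an audio file spoken in the requested language."""
    tts = gTTS(text=text, lang=language)
    tts.save(out_path)

# Example: the same greeting rendered in English and in French.
text_to_speech("Welcome to the digital world.", "en", "greeting_en.mp3")
text_to_speech("Bienvenue dans le monde numérique.", "fr", "greeting_fr.mp3")
```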
The tech
Human voices have a wide range of natural variations and fluctuations that are very challenging for a machine to reproduce. To tackle this, the cross-language voice conversion we developed enables the generation of speech in other languages from samples in a speaker's native language. A model trained on a mix of languages disentangles the information in the user's data and generates the mel-spectrograms for the output. Finally, a vocoder converts the result to speech. The algorithm is able to generate phonemes that are not found in the source dataset, so it can replicate the user's voice as it would sound coming from a native speaker.
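The mel-spectrogram and vocoder stages of that pipeline can be sketched as follows. This assumes librosa's Griffin-Lim inversion as a stand-in vocoder; the actual conversion model, which rewrites the spectrogram so the voice speaks another language, is not public, so the sketch only shows the audio-to-spectrogram plumbing around it.

```python
# Minimal sketch of the mel-spectrogram -> vocoder stage, assuming librosa's
# Griffin-Lim inversion as a stand-in vocoder. The cross-language conversion
# model itself is not shown because it is not public.
import librosa
import soundfile as sf

def audio_to_mel(path: str, sr: int = 22050, n_mels: int = 80):
    """Load a speech sample and compute its mel-spectrogram."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=1024, hop_length=256, n_mels=n_mels
    )
    return mel, sr

def mel_to_speech(mel, sr: int, out_path: str) -> None:
    """Invert a mel-spectrogram back to a waveform (stand-in vocoder)."""
    audio = librosa.feature.inverse.mel_to_audio(
        mel, sr=sr, n_fft=1024, hop_length=256
    )
    sf.write(out_path, audio, sr)

# In the full pipeline, a trained model would transform `mel` here so the
# output speaks a different language in the same voice.
mel, sr = audio_to_mel("native_sample.wav")
mel_to_speech(mel, sr, "reconstructed.wav")
```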
To map the newly generated language onto the avatar, the lip animation starts from sample videos of the speaker pronouncing different words and pieces of text. A neural network then detects patterns such as what the lips look like when the speaker says different vowels and consonants, as well as the subtle micro-expressions the face makes while speaking.
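The kind of association the network learns can be illustrated with a toy phoneme-to-viseme lookup. The phoneme and viseme labels below are hypothetical examples chosen for illustration; the real system learns this mapping from video rather than from a hand-written table.

```python
# Illustrative sketch only: a toy phoneme-to-viseme lookup showing the kind of
# mapping the network learns. Labels are hypothetical examples, not the
# project's actual categories.
PHONEME_TO_VISEME = {
    "AA": "open_jaw",      # as in "father"
    "IY": "wide_spread",   # as in "see"
    "UW": "rounded",       # as in "blue"
    "M":  "closed_lips",   # bilabial consonants close the lips
    "B":  "closed_lips",
    "F":  "lip_to_teeth",  # labiodental consonants
    "V":  "lip_to_teeth",
}

def phonemes_to_visemes(phonemes):
    """Map a phoneme sequence to the lip shapes the avatar should display."""
    return [PHONEME_TO_VISEME.get(p, "neutral") for p in phonemes]

# Example: a rough phoneme sequence for the word "movie".
print(phonemes_to_visemes(["M", "UW", "V", "IY"]))
```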
This is known as mapping, a process in which the neural network learns to associate different phonemes with the corresponding lip movements. A mapping process is also performed on the avatar itself to ensure that the mouth movement stays in sync with the provided text in any language. More precisely, the neural network maps the image, outlining its key features such as the eyes, nose, and ears, with a particular focus on the lips.
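A minimal sketch of that landmark-mapping step is shown below, assuming MediaPipe Face Mesh as the landmark detector (the project does not state which detector it uses). It outlines the face's key points; a subset of the indices around the mouth would then be tracked for lip-sync.

```python
# Minimal sketch of facial landmark mapping, assuming MediaPipe Face Mesh as
# the detector (an assumption; the project's actual detector is not stated).
import cv2
import mediapipe as mp

def extract_landmarks(image_path: str):
    """Detect facial landmarks (eyes, nose, lips, ...) as normalised (x, y) points."""
    image = cv2.imread(image_path)
    with mp.solutions.face_mesh.FaceMesh(static_image_mode=True, max_num_faces=1) as mesh:
        results = mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if not results.multi_face_landmarks:
        return []
    face = results.multi_face_landmarks[0]
    return [(lm.x, lm.y) for lm in face.landmark]

landmarks = extract_landmarks("avatar_frame.jpg")
print(f"Detected {len(landmarks)} facial landmarks")
```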