FlexLip: A controllable text-to-lip system

This webpage presents qualitative results for our submission:

FlexLip: A controllable text-to-lip system.
Dan Oneață, Beáta Lőrincz, Adriana Stan, Horia Cucu.
Submitted to Sensors, 2022.

Audio samples

The following samples are generated with the proposed text-to-speech component. They correspond to the following three sentences:


System id Sample 1 Sample 2 Sample 3
Natural
O-8
O-8-LJ
O-8-LT
O-8-LJ-dvb
O-8-LT-dvb
O-1-LJ
O-1-LT
O-1-LJ-dvb
O-1-LT-dvb
O-0.3-LJ
O-0.3-LT
O-0.3-LJ-dvb
O-0.3-LT-dvb

Text to keypoints

Here we present results for the full pipeline, which goes from text to keypoints. These results correspond to Section 5.3 of the paper.

Zero-shot speaker adaptation

Below we show results from applying the model pretrained on Obama to audio data collected from Trump. We present two cases:

The predictions are shown in orange and the ground-truth lips in blue. These results correspond to Table 3 in our paper.


key: xrPZBTNjX_o-000-023

key: xrPZBTNjX_o-000-034

key: xrPZBTNjX_o-000-039

key: xrPZBTNjX_o-000-045
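
A visual comparison of predicted and ground-truth lips can be complemented by a simple numeric one. The sketch below is a minimal illustration, not necessarily the metric reported in the paper: it assumes each sequence is a list of frames, each frame is a list of 2D (x, y) lip keypoints, and it averages the Euclidean distance between corresponding points. The function name `mean_keypoint_error` and the data layout are hypothetical choices made for this example.

```python
import math


def mean_keypoint_error(predicted, ground_truth):
    """Average Euclidean distance between corresponding 2D keypoints.

    Both arguments are lists of frames; each frame is a list of (x, y)
    tuples. This layout is an assumption made for illustration only.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("sequences must have the same number of frames")

    total = 0.0
    count = 0
    for pred_frame, true_frame in zip(predicted, ground_truth):
        if len(pred_frame) != len(true_frame):
            raise ValueError("frames must have the same number of keypoints")
        for (px, py), (tx, ty) in zip(pred_frame, true_frame):
            # Distance between one predicted point and its ground-truth match.
            total += math.hypot(px - tx, py - ty)
            count += 1

    if count == 0:
        raise ValueError("no keypoints to compare")
    return total / count


# Toy example: one frame, two keypoints; the second is off by a 3-4-5 triangle.
example_pred = [[(0.0, 0.0), (3.0, 4.0)]]
example_true = [[(0.0, 0.0), (0.0, 0.0)]]
example_error = mean_keypoint_error(example_pred, example_true)
```

In the toy example the two distances are 0 and 5, so the mean error is 2.5. Real lip keypoints would normally be normalised (for instance for scale and head position) before such a comparison; that step is omitted here.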