Oculus Lipsync maps human speech to a set of mouth shapes called “visemes”, which are the visual analog of phonemes. Each viseme depicts the mouth shape for a specific set of phonemes. Over time, these visemes are interpolated to simulate natural mouth motion. Below are the reference images we used to create our own demo shapes. Each row gives the viseme name, example phonemes that map to that viseme, example words, and images showing both mild and emphasized production of that viseme, plus a 3/4 rotated view. We hope you will find these useful in creating your own models. For more information on these 15 visemes and how they were selected, see the Viseme MPEG-4 Standard documentation.
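This page does not document how Oculus Lipsync blends visemes internally. The following sketch gives a rough picture of what "interpolating visemes over time" can mean. It treats a mouth pose as a vector of 15 weights, one per viseme in table order, and blends two poses linearly. The `VISEMES` ordering, the helper names, and the use of linear interpolation are all assumptions made for illustration. None of them are the Lipsync SDK's API.

```python
# Hypothetical sketch: blend two viseme weight vectors over time.
# Assumption: a pose is 15 weights in table order, and blending is linear.
# This is NOT Oculus Lipsync's internal method, just an illustration.

VISEMES = ["sil", "PP", "FF", "TH", "DD", "kk", "CH", "SS",
           "nn", "RR", "aa", "E", "I", "O", "U"]


def one_hot(name: str) -> list[float]:
    """Return a weight vector with full weight on a single viseme."""
    weights = [0.0] * len(VISEMES)
    weights[VISEMES.index(name)] = 1.0
    return weights


def lerp_visemes(start: list[float], end: list[float], t: float) -> list[float]:
    """Linearly interpolate between two viseme weight vectors, with t in [0, 1]."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be in [0, 1]")
    return [(1.0 - t) * a + t * b for a, b in zip(start, end)]


if __name__ == "__main__":
    # Halfway between the closed-lip PP shape and the open aa shape.
    mid = lerp_visemes(one_hot("PP"), one_hot("aa"), 0.5)
    for name, w in zip(VISEMES, mid):
        if w > 0:
            print(f"{name}: {w:.2f}")
```

Running the script prints `PP: 0.50` and `aa: 0.50`. That is a pose halfway between the closed-lip PP shape and the open aa shape.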
Only a subset of phonemes is shown for each viseme.
Viseme Name | Phonemes | Examples | Mild Production | Emphasized Production | 3/4 Rotation |
---|---|---|---|---|---|
sil | neutral | (none - silence) | None | | |
PP | p, b, m | put, bat, mat | | | |
FF | f, v | fat, vat | | | |
TH | th | think, that | | | |
DD | t, d | tip, doll | | | |
kk | k, g | call, gas | | | |
CH | tS, dZ, S | chair, join, she | | | |
SS | s, z | sir, zeal | | | |
nn | n, l | lot, not | | | |
RR | r | red | | | |
aa | A: | car | | | |
E | e | bed | | | |
I | ih | tip | | | |
O | oh | toe | | | |
U | ou | book | | | |
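The table can also be read as a lookup from phoneme to viseme. The following Python sketch transcribes it into a dictionary. The phoneme symbols are copied verbatim from the table, so only the listed subset is covered. Mapping unrecognized symbols to `sil` is an illustrative assumption, not documented Lipsync behavior. The names `PHONEME_TO_VISEME` and `to_visemes` are hypothetical.

```python
# Hypothetical phoneme-to-viseme lookup transcribed from the reference table.
# Symbols are copied verbatim and are case-sensitive ("S" -> CH, "s" -> SS).
# Only the subset of phonemes listed in the table is covered.

PHONEME_TO_VISEME: dict[str, str] = {
    "p": "PP", "b": "PP", "m": "PP",
    "f": "FF", "v": "FF",
    "th": "TH",
    "t": "DD", "d": "DD",
    "k": "kk", "g": "kk",
    "tS": "CH", "dZ": "CH", "S": "CH",
    "s": "SS", "z": "SS",
    "n": "nn", "l": "nn",
    "r": "RR",
    "A:": "aa",
    "e": "E",
    "ih": "I",
    "oh": "O",
    "ou": "U",
}


def to_visemes(phonemes: list[str]) -> list[str]:
    """Map each phoneme symbol to its viseme name.

    Assumption: symbols missing from the table fall back to "sil";
    this is an illustrative choice, not documented Lipsync behavior.
    """
    return [PHONEME_TO_VISEME.get(p, "sil") for p in phonemes]


if __name__ == "__main__":
    # "tip": t -> DD, ih -> I, p -> PP
    print(to_visemes(["t", "ih", "p"]))
```

For example, `to_visemes(["t", "ih", "p"])` returns `['DD', 'I', 'PP']`.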
This document is a reproduction of the official Meta Developer Documentation, made in response to notice that it may soon be removed from Meta's site. Because this document is very important to VRChat avatar development, I thought it prudent to create this backup here.