
Facial Expression Communication for Healthcare Androids
Patrick Santos
University of Ottawa
Ottawa, Canada
Erbene de Castro Maia Junior
Military Institute of Engineering
Rio de Janeiro, Brazil
Miriam Goubran, Emil Petriu
University of Ottawa
Ottawa, Canada


Abstract: This paper presents an experimental android platform for two-way communication between humans and robots in a healthcare environment using facial expressions. A depth and color sensor is used to analyze and map human facial expressions to a standard 3D face model. An android robotic face is then configured to act out the expressions by deforming an elastic skin using servo actuators.

Keywords: Human robot interaction; humanoid robots; deformable models
I. INTRODUCTION
Facial expressions are an important nonverbal
communication modality between humans. They can also give personal service robots a more humanlike communication method, allowing for an affective human-robot interface (HRI). In smart homes or long-term healthcare environments, the ability of service robots to communicate using facial expressions would allow them to interact with the humans in the facility (patients and any caregivers) more naturally than through industrial-style mechanical interfaces such as buttons and switches.
In this paper, we present an experimental hardware and
software platform to enable an android (humanlike robot) to
generate facial expressions through analysis of facial
expressions from a depth and colour (RGBD) sensor. It is
designed as the counterpart to our previous facial expression
recognition work in [1]. Section II discusses the use of facial
expressions for healthcare scenarios. Sections III and IV
discuss previous work with expressive robots and facial
expression cloning. Section V shows the details of the software
design. The description of the android head, developed in
parallel with the software, is discussed in Section VI.
II. FACIAL EXPRESSIONS
Facial expressions are temporary deformations on the face
that humans use for communication. Facial expressions in HRI
provide a powerful nonverbal communication modality along
with the tone of voice, body gestures, and body postures.
Research in psychology has shown that humans can accurately
predict the (projected) emotional state of other humans by
looking at photographs of their facial expressions [2]. The
expressions for happiness, anger, sadness, disgust, fear, and
surprise, sometimes referred to as the "six basic emotions",
have been found to cross cultural boundaries. Facial
expressions also complement speech in a conversation [3] and
convey information a human is not deliberately trying to communicate, such as pain and tiredness [4].
The use of facial expressions allows for richer
communication between humans and healthcare androids.
Robots with humanlike gestures and facial expressions have
been studied for elder care applications [5] and for assisting in
the treatment of children with autism [6]. These robots used
nonverbal communication, including facial expressions, to
provide more engaging interaction with human patients. In a
healthcare scenario, androids should be able to use nonverbal
communication to facilitate communication with all humans in
their environment.
III. FACIAL EXPRESSION ANIMATION IN ROBOTICS
Many face robots (robots that consist only of an expressive
face) and androids have been implemented with facial expressions.
The robots in [7] and [8] used facial expressions as responses
to a human's attempts to communicate with them. For instance, these two works present robots which use a "confused" expression to let a human user know that their verbal query
was not understood. This use of facial expressions is faster than
having the robot say a verbal error message, and it could
complement such an error message. The Furby robot toy [9],
which was popular among children, used facial expressions as
part of emotional displays. To perform facial expressions, most
robots use actuators to deform their structure or rubber skin.
The mechanical construction of a face robot or android
depends largely on which facial expressions it is designed to
convey, its application, and its general appearance. The SAYA
[10] android was constructed with a very humanlike
appearance. It controlled 19 points on its face using McKibben
pneumatic actuators, despite only having six pre-set facial
expressions (plus neutral). The CRF3 robot in [11] was able to
use a simpler set of motors to control 13 points on its face
because its designers opted for an intentionally non-human
appearance. Other robots with non-human appearances use
other methods or components to emphasise their facial
expressions, such as colored lighting on the face [12]. The
Probo robot [13], designed for interaction with children,
incorporated flexible cables instead of attaching motors
directly to its control points. The use of cables provided more
compliant joints, to avoid injuring children who touch the
robot. The robot also had an elephant-like trunk which moved
as part of its facial expressions. Many robots have pre-
programmed facial expressions, some of which are transferred,
or cloned, from existing animation or capture data.
This work is partially supported by NSERC.


IV. FACIAL EXPRESSION CLONING
Facial expression cloning (FEC) allows human animators
to create a set of expressions for a face and automatically
transfer these expressions to other faces with minimal effort.
FEC is the process of acquiring existing animations from a
source face and mapping those animations to another model,
the target face, which is not identical to the first model [14].
The source face can be a human or a virtual 3D face model,
and the target face can be a 3D face model or a robot.
Most work in FEC is concerned with the mapping of facial
expressions from a human to a virtual model or between virtual
models. A method in [15] cloned facial expressions from a
humanlike 3D model to cartoon-like animal and fantasy
creature models with human-like facial features. Its mapping
algorithm fitted the velocity of the points on the source and
target faces during an animation. However, the work relied on
the assumption that the target could express all the same
expressions as the source. The work in [16] cloned expressions
from humans to various 3D models. The system of [16]
measured 54 feature points on a human actor's face. Similar to
[15], it then mapped the velocities of the source face to the
target. The authors of [16] claimed that fitting the velocities
rather than the positions of the feature points reduced artifacts
that would occur if the target face was less expressive than the
source face. The use of velocities in FEC targeting a robot head
could also be effective since the robot's actuators set limits on
the speed and the largest displacements during animation.
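
To illustrate the general idea of velocity-based retargeting discussed above (this is not the specific algorithm of [15] or [16]), the following Python sketch transfers scaled frame-to-frame velocities of the source feature points onto a target face. The array shapes and the scaling gain are assumptions made only for illustration.

```python
import numpy as np

def retarget_by_velocity(source_frames, scale, target_neutral):
    """Transfer an animation by mapping per-frame feature-point velocities
    rather than absolute positions.

    source_frames  : (T, N, 3) array of source feature-point positions.
    scale          : scalar or (N, 1) gain relating the source motion to the
                     target's (possibly smaller) range of motion.
    target_neutral : (N, 3) array with the target face's neutral pose.
    """
    # Frame-to-frame velocities of the source feature points.
    velocities = np.diff(source_frames, axis=0)               # (T-1, N, 3)

    # Scale the velocities into the target's motion range and integrate them
    # forward from the target's neutral pose.
    target_frames = np.empty(source_frames.shape)
    target_frames[0] = target_neutral
    target_frames[1:] = target_neutral + np.cumsum(scale * velocities, axis=0)
    return target_frames
```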
Compared to FEC systems that map to virtual 3D models,
the current FEC systems that map to robot targets have greater
limitations. The system in [11] mapped facial expressions
defined in a 3D model to a robot which could move 13 control
points on its face. It used linear scaling to do the mapping of
static expressions by multiplying the magnitude of the source
displacements. Since a simple scaling was used, this system
could only map between faces that had the same degrees of
freedom and had very similar shapes. FEC was performed from
a human to a humanlike robot in [17]. That system used the
partial least squares method to map facial expressions gathered
from video of a human's performance to a robot with 34
degrees of freedom. This method required a human animator to
manually map static images from the video to servo positions
for the robot, so this manual mapping process would have to be
repeated if the configuration of servos on the robot was
changed. There is a need for fully automated methods that can
map facial expressions from 3D models or from humans to a
dissimilar robot.
V. SOFTWARE PLATFORM
The software in our system is currently being developed to
analyze human and animated facial expressions and experiment
with techniques to map them to a robot. It contains several
components. The first component, called Kinect capture, uses
the Kinect for Windows RGBD sensor [18] to take facial
expressions from a human's performance and convert the
expressions into the Candide3 format [19]. The second
component, the analysis component, analyzes static poses of an
input face. These components are the first steps in a system
that will automatically map facial expressions from a human
performance or Candide3 face animation to a robot face. The
final framework of the system is shown in Fig. 1. Since the
mapping system is still being designed, only the Kinect capture
and analysis components are discussed in this paper.
The Kinect capture component is written in C# using the
Kinect SDK version 1.5, and it stores a copy of the Candide3
face directly from the definitions in [20]. The mapping from
the Kinect sensor output to the Candide3 face is a direct
mapping, since the Kinect SDK's face animation parameters are a subset of those for the Candide3 face.
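
As an illustration of this direct mapping, the sketch below (in Python rather than the C# used in our implementation) simply copies each Kinect animation-unit coefficient onto a corresponding Candide3 animation unit. The Kinect-side names follow the list in Section VII; the Candide3-side labels and the dictionary structure are illustrative assumptions, not the actual SDK or Candide3 identifiers.

```python
# Illustrative pass-through from the Kinect face-tracking animation units to
# Candide3 animation units. The Kinect-side names follow the list in Section
# VII; the Candide3-side labels are assumptions used here for illustration.
KINECT_TO_CANDIDE3 = {
    "UpperLipRaiser":     "AUV0  Upper lip raiser",
    "JawDrop":            "AUV11 Jaw drop",
    "LipStretcher":       "AUV2  Lip stretcher",
    "BrowLowerer":        "AUV3  Brow lowerer",
    "LipCornerDepressor": "AUV14 Lip corner depressor",
    "OuterBrowRaiser":    "AUV5  Outer brow raiser",
}

def kinect_to_candide3(kinect_coefficients):
    """Copy Kinect animation-unit coefficients onto the corresponding
    Candide3 animation units (a one-to-one mapping, with no rescaling)."""
    return {KINECT_TO_CANDIDE3[name]: value
            for name, value in kinect_coefficients.items()}
```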
Since the Candide3 source face is so different from the
target face (in terms of its shape and possible deformations),
the expressions must be transformed to an intermediate space
that represents the intersection of the expressions possible on
the source face and the expressions possible on the target face.
To find this intersection of the expression spaces, both faces
must be analyzed separately to find similarities in both their
shape and their possible expressions, which is the purpose of
the analysis component.
The analysis component is written in MATLAB. The output of the analysis is a matrix of size p × v, where p is the number of control parameters the face can use for a deformation and v is the number of vertices in the face. Its elements denote, for each control parameter, how much a vertex on the face moves relative to other points. Each element c_{ij} in the matrix is described by (1).

c_{ij} = \frac{\sum_{k=0}^{d} \tanh\left( \frac{x_{jk}}{\max(x_k)} \right)}{\sum_{k=0}^{d} p_{ik}}    (1)

In (1), d is the number of recorded deformations input to the system, x_{jk} is the individual displacement of vertex j in the single recorded deformation k, max(x_k) is the largest displacement of any vertex recorded in deformation k, and p_{ik} is the value of control parameter i used in deformation k.
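
A minimal numerical sketch of (1) is given below (in Python rather than the MATLAB used by the analysis component), assuming the recorded deformations are supplied as an array of vertex displacement magnitudes and an array of control-parameter values.

```python
import numpy as np

def analysis_matrix(X, P):
    """Compute the p-by-v analysis matrix defined by (1).

    X : (d, v) array; X[k, j] is the displacement magnitude x_jk of vertex j
        in recorded deformation k.
    P : (d, p) array; P[k, i] is the value p_ik of control parameter i used
        in deformation k.
    Returns C, a (p, v) array whose element C[i, j] is c_ij.
    """
    # Normalise each deformation by its largest vertex displacement, squash
    # with tanh, and sum over the recorded deformations (numerator of (1)).
    max_per_deformation = X.max(axis=1, keepdims=True)        # max(x_k), (d, 1)
    numerator = np.tanh(X / max_per_deformation).sum(axis=0)  # shape (v,)

    # Sum each control parameter's values over the recorded deformations
    # (denominator of (1)).
    denominator = P.sum(axis=0)                               # shape (p,)

    # c_ij = numerator_j / denominator_i, broadcast to a (p, v) matrix.
    return numerator[np.newaxis, :] / denominator[:, np.newaxis]
```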
This analysis is necessary for mapping to a robot target because there will inevitably be a nonlinear relationship between expressions on the virtual 3D model and the target robot. The nonlinear relationship is due to the robot not exactly matching the virtual 3D model in terms of its facial expressions and because of the nonlinear relationship between the control parameters (servo positions) of the robot and the displacements of control points during its actuation.

Fig. 1: Software block diagram
VI. MECHANICAL ROBOT PLATFORM
Our mechanical robot platform is a prototype to calibrate
and test the software described in the previous section and to
experiment with the construction of simple but robust
mechanical structures to generate facial expressions. Its design
was loosely inspired by the robot in [10], in that its actuators move control points on the skin to produce deformations. The
robot moves 15 control points distributed along its eyebrows
and mouth as shown in Fig. 2.
The mechanical platform was designed with do-it-
yourself concepts in mind to make it possible for others to
construct similar robot platforms with readily-available tools
and materials. The use of simple and inexpensive materials
also allows for rapid prototyping of different face
configurations, including the reconfiguration of control points.
For the structure, we used a Halloween skull made of plastic
with a moveable jaw. To produce facial expressions, standard
RC servos pull metal strings located at the back of the head as
shown in Fig. 3. The robot uses a total of 12 servos. Eight
servos are used for the mouth and jaw, while the remaining
four actuate the eyebrows.
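
As a rough illustration of how a normalized control-point activation could be turned into a command for one of these RC servos, consider the sketch below; the pulse-width values are typical for hobby servos and are assumptions, not calibrated measurements from our platform.

```python
def activation_to_pulse_us(activation, rest_us=1500, span_us=400):
    """Map a normalized control-point activation in [0, 1] to an RC servo
    pulse width in microseconds.

    rest_us : pulse width holding the skin at its neutral position
              (an illustrative value, not a calibration from our platform).
    span_us : additional pulse width at a full string pull (illustrative).
    """
    activation = min(max(activation, 0.0), 1.0)   # clamp to the valid range
    return rest_us + activation * span_us

# Example: a half pull on one of the eyebrow control points.
print(activation_to_pulse_us(0.5))                # -> 1700.0
```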
VII. EXPERIMENT
Experiments were performed to see how the movements
provided by the Kinect Capture stage could be used to map
expressions to the robot. Note that the mappings for both the robot and the software model were performed manually, following the guidelines for animating cartoon characters in [21].
First, the Analysis function of the software was tested on the Candide3 face to assess its viability for use on the robot face. The following animations were input into the analysis stage:
Upper Lip Raiser
Jaw Drop
Lip Stretcher
Brow Lowerer
Lip Corner Depressor
Outer Brow Raiser
These parameters were chosen because they are the animations provided by the Kinect Capture stage (available
through the Kinect SDK). See the Candide3 definition file in
[20] for a numerical description of the animations themselves.
Fig. 2: Layout and direction of control points
Fig. 3: Servo actuators connected with strings
Fig. 4: Map of active vertices on Candide3 face

We summed all the rows of the matrix generated from the analysis stage (described earlier in Section V) into a vector of length v, which represents how much each vertex in the Candide3 face can move for any one of the parameters from the Kinect Capture stage. The values of that vector were mapped to an image of the Candide3 face in Fig. 4. The figure shows that the most active points on the Candide3 face are around the mouth and eyebrows. Compared to the Candide3 model, the robot performs larger displacements on the inner parts of the eyebrows and at the mouth corners.
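
This row summation amounts to collapsing the p × v analysis matrix into a single activity value per vertex, as in the following sketch (assuming the matrix C produced by the analysis sketch in Section V):

```python
import numpy as np

def vertex_activity(C):
    """Collapse the (p, v) analysis matrix into one activity value per vertex
    by summing over the control parameters (the rows of C)."""
    activity = C.sum(axis=0)            # shape (v,)
    # Normalize to [0, 1] so the values can be colour-mapped onto the mesh,
    # as in the visualisation of Fig. 4.
    return activity / activity.max()
```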
The second experiment was to attempt to make expressions
manually in both the Candide3 face and the robot face. The
servo parameters on the robot face were compared to potential
Candide3 animation parameters to make a similar expression.
In this experiment, the robot was configured to display a
neutral face plus the six basic emotions listed in Section II.
Next, a similar configuration of the Candide3 face was
manually constructed. The manual mapping result is shown in
Fig. 5.
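
The servo positions and Candide3 parameter values used in this manual mapping are not reproduced here; the structure below only illustrates, with placeholder names and values, how such a manually constructed correspondence can be stored for the neutral face and the basic emotions.

```python
# Illustrative storage of the manually constructed correspondence between
# robot servo poses and Candide3 animation parameters. All names and numeric
# values are placeholders, not the settings used in the experiment.
EXPRESSION_TABLE = {
    "happiness": {
        "servo_positions": {"mouth_corner_left": 0.8, "mouth_corner_right": 0.8},
        "candide3_params": {"LipStretcher": 0.7, "UpperLipRaiser": 0.3},
    },
    "sadness": {
        "servo_positions": {"brow_inner_left": 0.6, "brow_inner_right": 0.6},
        "candide3_params": {"BrowLowerer": 0.4, "LipCornerDepressor": 0.6},
    },
    # ... the remaining basic emotions (surprise, anger, fear, disgust) and
    # the neutral pose would be stored in the same way.
}
```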
VIII. DISCUSSION
Although the analysis stage was successful in detecting the
most active vertices on the Candide3 face, it has properties that
would make it poor for analyzing the expressions on a robot
platform. Since it accumulates all deformations across an
animation, it would count noisy readings as control points and
it would miss subtle deformations, such as movements in the
cheeks (which have lower magnitude than those around the
mouth corners). Additionally, parts of the face move in
different directions during expressions, and the analysis stage
does not consider this at all. As such, the output of the analysis component, as it stands, would not be useful for constructing an intermediate space for the robot face.
The robot platform was able to perform the deformations
required to mimic cartoon-like face animations, and its
deformations could be reasonably matched to the Candide3
face using parameters captured from a Kinect sensor. As
expected, there are many nonlinear relationships between servo
positions and the Candide3 parameters. For instance, to make
the frowning mouth shape, the robot's servos only pulled control points, but the Candide3 face needed to push and pull
vertices from their neutral positions. Additionally, the robot
platform had issues with the strings and the latex rubber skin.
After several cycles, the strings loosened and required
recalibration of the motors to produce the expressions properly,
or they broke completely. The latex skin provided a sufficient
amount of deformation for the expressions, but it lacked a
human-like skin texture.
IX. CONCLUSION AND FUTURE WORK
We have successfully implemented a function that can
identify where on a face deformations occur by examining the
displacements, and we implemented an android head with movements similar to a subset of the Candide3 head's animation parameters. This platform will be refined over
several revisions to create a testbed to experiment with facial
expression animation for an android in a long-term care or
smart home scenario.
In future work, we plan to continue developing both the software and robot platforms. The software will be extended to examine movements from the robot as well as from the Kinect capture stage. Additionally, the function in the analysis stage will be refined to include both the magnitude and the direction of displacements. We plan to measure the quality of the cloned expressions by computing the difference between the displacement of the source face and the displacement of the robot face, with the magnitudes scaled to the maximum distance from neutral of each component of the face in any given expression. This will give us a quantitative means to optimize the mapping function and the construction and placement of the robot's skin and actuators. The mechanics of the robot platform will be strengthened by using thicker coated flex cable instead of the steel strings, and it will be given compliant links to reduce the stress on the servo motors. We also plan to try other materials, such as silicone and gel layers, in the skin of the robot to allow for "puffing" of the cheeks and other regions during an expression.
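
A sketch of the planned quality measure is shown below; the pairing of control points between the two faces and the choice of error norm are assumptions, since only the normalization by the maximum distance from neutral is fixed at this stage.

```python
import numpy as np

def cloning_error(source_disp, robot_disp):
    """Compare source and robot control-point displacements after scaling
    each face by its own maximum distance from neutral in the expression.

    source_disp, robot_disp : (n, 3) arrays of displacements from neutral for
    n corresponding control points. The point pairing and the error norm are
    assumptions; only the normalization step is fixed in our plan.
    """
    src = source_disp / np.linalg.norm(source_disp, axis=1).max()
    rob = robot_disp / np.linalg.norm(robot_disp, axis=1).max()
    # Mean distance between the normalized displacement vectors.
    return np.linalg.norm(src - rob, axis=1).mean()
```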








Fig. 5: Facial Expressions of Virtual and Robot Faces (Neutral, Happiness, Sadness, Surprise, Anger, Fear, Disgust)


REFERENCES
[1] Y. Zhao, X. Wang, M. Goubran, T. Whalen, and E. M. Petriu, "Human emotion and cognition recognition from body language of the head using soft computing techniques," Journal of Ambient Intelligence and Humanized Computing, Feb. 2012.
[2] P. Ekman, "Universals and cultural differences in facial expressions of emotion," Nebraska Symposium on Motivation, vol. 19, 1971.
[3] C. L. Breazeal, "Sociable machines: expressive social exchange between humans and robots," Massachusetts Institute of Technology, 2000.
[4] B. Fasel and J. Luettin, "Automatic facial expression analysis: a survey," Pattern Recognition, vol. 36, no. 1, pp. 259-275, 2003.
[5] C. Datta, P. Tiwari, H. Y. Yang, E. Broadbent, and B. A. MacDonald, "Utilizing a closed loop medication management workflow through an engaging interactive robot for older people," in e-Health Networking, Applications and Services (Healthcom), 2012 IEEE 14th International Conference on, 2012, pp. 313-316.
[6] E. S. Kim, L. D. Berkovits, E. P. Bernier, D. Leyzberg, F. Shic, R. Paul, and B. Scassellati, "Social robots as embedded reinforcers of social behavior in children with autism," Journal of Autism and Developmental Disorders, Oct. 2012.
[7] T. Tojo, Y. Matsusaka, T. Ishii, and T. Kobayashi, "A conversational robot utilizing facial and body expressions," in SMC 2000 Conference Proceedings, 2000 IEEE International Conference on Systems, Man and Cybernetics, 2000, vol. 2, pp. 858-863.
[8] C. Breazeal, "Role of expressive behaviour for robots that learn from people," Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, vol. 364, no. 1535, pp. 3527-3538, Dec. 2009.
[9] T. Fong, I. Nourbakhsh, and K. Dautenhahn, "A survey of socially interactive robots," Robotics and Autonomous Systems, vol. 42, no. 3-4, pp. 143-166, Mar. 2003.
[10] T. Hashimoto, H. Kobayashi, and N. Kato, "Educational system with the android robot SAYA and field trial," in Fuzzy Systems (FUZZ), 2011 IEEE International Conference on, 2011, pp. 766-771.
[11] T. Fukuda, M. Nakashima, F. Arai, and Y. Hasegawa, "Facial expressive robotic head system for human-robot communication and its application in home environment," Proceedings of the IEEE, vol. 92, no. 11, pp. 1851-1865, Nov. 2004.
[12] H. Song, Y.-M. Kim, J.-C. Park, C. H. Kim, and D.-S. Kwon, "Design of a robot head for emotional expression: EEEX," in RO-MAN 2008 - The 17th IEEE International Symposium on Robot and Human Interactive Communication, 2008, pp. 207-212.
[13] K. Goris, J. Saldien, B. Vanderborght, and D. Lefeber, "Probo, an intelligent huggable robot for HRI studies with children," in Human-Robot Interaction, 2010, pp. 33-42.
[14] J. Noh and U. Neumann, "Expression cloning," in Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, 2001, pp. 277-288.
[15] J. Song, B. Choi, Y. Seol, and J. Noh, "Characteristic facial retargeting," Computer Animation and Virtual Worlds, vol. 22, no. 2-3, pp. 187-194, Apr. 2011.
[16] Y. Seol, J. P. Lewis, J. Seo, B. Choi, K. Anjyo, and J. Noh, "Spacetime expression cloning for blendshapes," ACM Transactions on Graphics, vol. 31, no. 2, pp. 14:1-14:12, Apr. 2012.
[17] P. Jaeckel, N. Campbell, and C. Melhuish, "Facial behaviour mapping - from video footage to a robot head," Towards Autonomous Robotic Systems 2008: Mobile Robotics in the UK, 10th British Conference on Mobile Robotics (TAROS 2007), vol. 56, no. 12, pp. 1042-1049, 2008.
[18] Microsoft, "Kinect for Windows Features," 2012. [Online]. Available: http://www.microsoft.com/en-us/kinectforwindows/discover/features.aspx
[19] J. Ahlberg, "CANDIDE-3 - An Updated Parameterised Face," vol. 1, 2001.
[20] J. Ahlberg, "Candide-3.1.6," 2012. [Online]. Available: http://www.icg.isy.liu.se/candide/candide3.wfm
[21] S. Roberts, "Animation of acting facial expressions," in Character Animation: 2D Skills for Better 3D, 2nd ed. Burlington, USA: Focal Press, 2007, pp. 219-224.
