architekture.com, inc. TM
design with intelligence
Optimizing Video Conferences with
Macromedia Flash Technologies
Jim Cheng
jim.cheng@architekture.com
Allen Ellison
allen.ellison@architekture.com
February 2005
INTRODUCTION
It is well known that the combination of Macromedia Flash Communication Server
and Macromedia Flash Player offers many exciting possibilities for live video
conferencing. The task of choosing optimal hardware and software settings,
however, has remained burdensome and arcane. All too often, developers have to
contend with out-of-sync audio, frozen video images, and lag. Even for
seasoned Macromedia Flash developers, the task of implementing quality Flash-based
video conferencing applications becomes a challenge when confronted with the
bewildering selection of cameras, network configurations, and software settings.
However, the ability to create high-quality video conferencing experiences in Flash is
essential to meeting client expectations for many of today’s cutting-edge Flash
Communication Server applications. In the course of developing such applications for
a variety of clients during 2004, Architekture.com has conducted significant research
on optimizing high-bandwidth video conferencing applications with the goal of finding
a good balance between video and sound quality, and limiting the use of CPU and
network resources to mitigate problems associated with skipped frames, lag, or out-of-
sync sound. We are pleased to present our findings and recommendations to the
Flash developer community in this white paper.
Architekture.com is a leading Macromedia Flash development firm with recognized
expertise in Flash Communication Server. Our world-class development team creates
cutting-edge solutions that push the limits of what is thought possible. We specialize in
the development of immersive, real-time multi-player simulations, as well as rapid
prototype development and real-time business collaboration applications.
Copyright 2005, Architekture.com, All Rights Reserved.
CONTENTS
Introduction........................................................................................................ iii
Why Optimization Matters ................................................................................... 1
Focusing on the Client Side ................................................................................. 1
Testing Environment ............................................................................................ 2
Hardware........................................................................................................... 2
Cameras ........................................................................................................ 2
Microphones................................................................................................... 8
Networking..................................................................................................... 8
Software Settings ................................................................................................ 9
Camera Settings.............................................................................................. 9
Camera.setMode() ....................................................................................... 9
Camera.setQuality()................................................................................... 10
Camera.setKeyFrameInterval()..................................................................... 13
Microphone Settings ...................................................................................... 13
Microphone.setRate() ................................................................................. 13
Microphone.setGain() and Microphone.setSilenceLevel()................................ 13
Microphone.setUseEchoSuppression() .......................................................... 14
Buffer Times.................................................................................................. 14
Embedded Video Sizes................................................................................... 14
MovieClip.attachAudio() ................................................................................ 15
Stream Latency.............................................................................................. 15
Scaling ............................................................................................................ 16
Flash Communication Server Limitations.......................................................... 16
Network Limitations ....................................................................................... 17
Client Machine Limitations ............................................................................. 18
CPU Utilization and Resolution ....................................................................... 19
Summary ......................................................................................................... 21
Appendix A: Error Margins and Significance ........................................................ 22
Appendix B: Detailed Experimental Setups and Results.......................................... 23
Camera Testing ............................................................................................ 23
Encoding/Decoding and CPU Utilization ......................................................... 27
Video Settings ............................................................................................... 30
Scaling......................................................................................................... 34
Appendix C: Where to Download Test Files ......................................................... 38
Appendix D: IIDC/DCAM Camera List ................................................................ 39
WHY OPTIMIZATION MATTERS
Many-to-many video conferencing on desktop computers requires significant
quantities of resources, both in terms of processor utilization and network bandwidth.
In order to achieve optimal results, it is necessary to find a good balance between
video and sound quality that limits the use of resources to a level where processor and
network loads do not introduce deleterious effects such as frame skipping, lag, or out-
of-sync sound into the video conference experience.
Poor choices in hardware selection and improper software settings often contribute to
a poor video conferencing experience, and the bewildering number of options often
makes it seem next to impossible to create high-quality video conferencing
experiences, even with best-of-breed tools. This discourages both clients and
developers alike, and convinces many that even with today’s technologies, video
conferencing applications are difficult to use and cannot meet the promise of rich
audio and visual communication between groups of individuals.
Judicious choices of optimal hardware configuration and software settings, however,
can make all the difference between a glitchy and nearly useless video conference
application, and an impressive high-quality experience that exceeds client
expectations. In the course of developing rich video conferencing applications using
Macromedia technologies, we at Architekture.com have spent many hours
determining best choices in specifying and configuring collaborative video
conferencing products for our clients. We hope that sharing our results with the Flash
developer community will lead to the development and release of many high-quality
video conferencing applications in the future.
FOCUSING ON THE CLIENT SIDE
Although Flash Communication Server plays a crucial role in facilitating video
conferencing with Flash technologies, for the most part it only serves to relay streams
from one client machine to another in live video conferencing situations. In our testing
environments, we have noted that even fairly modest server hardware setups such as a
single 2.8 GHz Pentium 4 system with 512 MB of RAM can easily accommodate
relatively intensive video conferencing situations that push the limit of a typical
professional license.
The limitations affecting video conferencing performance are instead mainly
concentrated on the client side, because this is where the bulk of the work is done.
When publishing a stream, the client machine has to acquire video and audio data,
encode it, and push it across the network to the server, all in real time. And in a many-
to-many conferencing situation, the same machine will need to subscribe to streams
published by all of the other participants, decode them in real time, and present the
results onscreen and through the speakers or headphones—this too in real time (or as
close to it as possible). Consequently, our optimization research and
recommendations focus nearly entirely on the client-side systems.
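The client-side workload described above can be sketched in a few lines of ActionScript 2. This is a minimal illustration, not production code: the application URL, stream names, and the on-stage Video instance name (remote_video) are all hypothetical.

```actionscript
// Sketch of the per-client workload: publish one outgoing stream and
// subscribe to one incoming stream per remote participant.
var nc:NetConnection = new NetConnection();
nc.onStatus = function (info:Object):Void {
    if (info.code == "NetConnection.Connect.Success") {
        // Acquire, encode, and push our own camera and microphone.
        var outStream:NetStream = new NetStream(nc);
        outStream.attachVideo(Camera.get());
        outStream.attachAudio(Microphone.get());
        outStream.publish("me", "live");

        // Subscribe to and decode a remote participant's stream.
        var inStream:NetStream = new NetStream(nc);
        inStream.play("participant2");
        remote_video.attachVideo(inStream);  // Video instance on the stage
    }
};
nc.connect("rtmp://fcs.example.com/conference");
```

In a conference of n participants, each client runs one publish path and n - 1 decode paths like the one above, which is why the encoding and decoding load concentrates on the client machines rather than on the server.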
TESTING ENVIRONMENT
Principal testing was conducted in the Architekture.com development laboratory on a
hyper-threaded 2.8 GHz Pentium 4 computer running Windows XP Professional SP1
with 1.25 GB of RAM. The Flash Communication Server application ran on a similar
processor with 512 MB of RAM under Windows Server 2003. These machines are
connected on a 100 Mbps Ethernet LAN through a switch, and tests were conducted
with in-house testing utilities running under Flash Player 7.0.19.0. We also conducted
some additional testing on machines belonging to clients for proprietary video
conferencing applications.
HARDWARE
A developer's ability to specify or recommend hardware configurations for use with an
application will vary depending on client requirements. However, we have found that
the choice of hardware goes a long way in affecting the overall video conferencing
experience. Even if you are building a video conferencing application for the web and
have no control over the hardware configurations of client machines, these findings
may help in determining minimum system requirements and in optimizing software
settings for an expected range of client machines and network configurations.
Our goal in making effective hardware choices for optimal performance is to minimize
the load on the client processor and network while maintaining a high-quality audio
and video stream. During our tests, we found that high processor loads were strongly
correlated with poor performance, because the CPU’s time became divided between
processes supporting the video conference and other applications contending for
processor time.
Maintaining reasonable network loads is an important secondary consideration,
particularly in low-bandwidth settings, because available network bandwidth directly
limits the amount of data that can be transferred between the client machine and
Flash Communication Server.
CAMERAS
Cameras play a basic role in acquiring the video signal for conferencing applications.
However, the video signal itself usually requires some degree of additional processing
by the CPU before it is ready for use by the client Flash Player. Equally important are
the drivers used to interface the camera with the operating system, because poorly
written camera drivers coupled with a camera’s high data throughput can place even
greater demands on the processor.
For most video conferencing applications, camera resolutions greater than 640 x 480
and frame rates greater than 30 frames per second (fps) are generally not necessary.
Furthermore, consumer-level cameras intended for use with video conferencing
applications seldom provide resolutions and frame rates higher than these for real-
time video feeds. Because of this, we will limit our discussion to these cameras and will
not consider those with higher resolutions or frame rates that are typically used for
scientific and industrial applications.
Most cameras designed for video conferencing use one of two serial bus architectures
for communication with the client machine: USB (typically the faster 2.0 specification),
or Firewire, also known as IEEE 1394. Firewire cameras can be further divided into
two categories based on data transfer protocol: DV (digital video) cameras, which
provide a compressed data stream to the computer, and IIDC/DCAM cameras, which
output uncompressed data streams and also offer camera hardware control over the
Firewire bus.
Our tests, as well as available documentation, suggest that there are significant
differences in terms of overall processor demands between the various protocols used
to transfer data from the camera to the computer. To determine the processor use
required to handle video acquisition for different cameras, we conducted tests with
three representative cameras using different bus and protocol combinations for
transferring data to the client machine under identical resolution and frame rate
settings.
For our tests, we used the following cameras: Apple iSight, an IIDC/DCAM-compliant
webcam that connects through a 400-Mbit Firewire bus; Sony DCR-TRV460, a DV-
compliant consumer camcorder that also connects through 400-Mbit Firewire bus;
and Creative Labs NX Ultra, a higher-quality USB webcam.
All cameras were specified by their manufacturers as having a maximum live video
resolution of 640 x 480 pixels as well as the capability of yielding streams of up to 30
fps (with the exception of the Creative NX Ultra camera, which was limited to 15 fps
according to manufacturer specifications). Although the Sony DCR-TRV460 camera
also sports a USB connection, we only used its Firewire DV connection for our tests.
Table 1 provides an overview of the cameras we used for our tests.
Table 1: Basic Camera Capabilities
Camera Data Bus Max. Resolution Max. FPS
Apple iSight 1394 IIDC/DCAM 640x480 30
Sony DCR-TRV460 1394 DV 640x480 30
Creative NX Ultra USB 640x480 15
We measured CPU utilization for locally visualizing video output at varying resolutions
and frame rates using each camera. To isolate the processor requirements needed to
process the video signal and import it into Flash, we conducted these tests entirely
locally using a simple Flash application running under Flash Player 7.0.19.0 without
Flash Communication Server integration.
All tested resolutions used the standard-definition 4:3 aspect ratio (160 x 120,
200 x 150, 240 x 180, 320 x 240, 400 x 300, and 640 x 480), at frame rates of 1, 5,
10, 15, 24, and 30 fps. CPU utilization was measured using Windows Task Manager and
averaged over roughly 30 seconds of video acquisition with all other applications and
non-essential processes disabled.
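A local capture harness of this sort is straightforward to sketch in ActionScript 2. The code below is illustrative, not our actual test utility; the on-stage Video instance name (monitor_video) is hypothetical.

```actionscript
// Minimal local capture harness: render the camera directly to a Video
// object without encoding or any Flash Communication Server connection.
var cam:Camera = Camera.get();
cam.setMode(320, 240, 15);       // requested resolution and frame rate
monitor_video.attachVideo(cam);  // display locally

// Periodically report what the camera hardware is actually delivering.
var poll:Number = setInterval(function ():Void {
    trace("actual: " + cam.width + " x " + cam.height +
          " @ " + cam.currentFps + " fps (requested " + cam.fps + ")");
}, 1000);
```

With all other applications closed, Windows Task Manager then reflects the processor cost of acquisition and display alone.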
Although data points were obtained for all cameras at our test resolutions and frame
rates, no camera supported all the resolutions natively. The actual resolution and
frame rate in use can be determined programmatically after a Camera.setMode()
call by inspecting the camera object's width, height, and currentFps
properties. When unsupported resolutions or frame rates were requested, Flash
typically returned the video stream from the camera at a lower resolution and scaled
it up for display with fairly obvious pixelization.
Figure 1 shows example frame captures illustrating this pixelization effect.
[Two frame captures. Left: Creative Labs NX Ultra at a requested 240 x 180
(actual camera resolution 176 x 132). Right: Apple iSight at 240 x 180
(native camera resolution 240 x 180).]
Figure 1: Sample frame captures illustrating pixelization
In this example, a resolution of 240x180 was requested of both the Creative Labs NX
Ultra and the Apple iSight cameras. The NX Ultra, which does not support a 240x180
capture resolution, is instead yielding a 176 x 132 stream, resulting in pixelization as
Flash scales up the image to the display resolution of 240 x 180. On the other hand,
Apple iSight natively supports a 240 x 180 capture resolution, resulting in significantly
better picture quality.
Table 2 lists the supported resolutions for each camera in the test set.
Table 2: Supported Camera Resolutions
160 x 120 200 x 150 240 x 180 320 x 240 400 x 300 640 x 480
Apple iSight Yes Yes Yes Yes No Yes
Sony DCR-TRV460 Yes No No Yes No Yes
Creative NX Ultra Yes No No Yes No Yes
The cameras tested do not all support the same range of resolutions and frame rates.
For this reason, we focused our analysis on configurations supported by multiple
cameras to determine comparative performance, even though data points were
obtained for a significantly larger set of configurations. In particular, the 160 x 120,
320 x 240, and 640 x 480 resolutions allowed commensurate comparisons between
the cameras at various frame rates up to 15 fps for all cameras, and up to 30 fps for
the Sony DCR-TRV460 and the Apple iSight cameras.
We also made a number of fairly interesting observations with regard to frame rates.
In the case of the Creative NX Ultra camera, Flash was able to request and receive
video streams at frame rates of up to 30 fps as reported by the camera object's
currentFps property, although the camera itself is specified as having a maximum
frame rate of 15 fps. We suspect this is due either to inaccurate reporting by the
driver or to software-level interpolation; our experiments do not yield conclusive
evidence for either possibility.
Also, although the Apple iSight camera is not officially supported on the Windows
platform, we were able to use it with the default Microsoft drivers for 1394 desktop
cameras. However, when using this driver, the frame rate was capped at 15 fps.
Using the third-party Unibrain Fire-i IIDC/DCAM driver instead
enabled us to reach the specified hardware maximum frame rate of 30 fps as shown
in Figure 2.
It should also be noted that the Creative Labs NX Ultra camera yielded significantly
noisier CPU utilization data than the other cameras during testing. We presume this is
due to USB bus usage by other devices, including our keyboard and mouse, but could
not conclusively determine the source.
Overall, the processor load results came in strongly in favor of the IIDC/DCAM-
compliant Apple iSight camera. Processor utilization for image acquisition and
importing in Flash was roughly half that required for the other two cameras at the
same resolution and frame rate in all comparable cases, with the Unibrain Fire-i driver
slightly outperforming the Microsoft driver.
Processor utilization was roughly comparable between the Sony DCR-TRV460 and the
Creative NX Ultra cameras at low resolutions. At a resolution of 320 x 240, the DV-
compliant Sony DCR-TRV460 camera came out in the middle and outperformed the
Creative Labs NX Ultra camera, although at 640 x 480, the Sony DCR-TRV460
camera came in last when used with higher frame rates.
Also, as expected, processor utilization increases with higher resolutions and frame
rates.
From a hardware perspective, we recommend the use of IIDC/DCAM-compliant
cameras, because the uncompressed data stream appears to significantly reduce the
overhead needed to process the image for consumption by Flash, particularly if
processor resources are at a premium (for example, slower machines, visually rich
user interfaces, or video conferences involving more than two participants).
Figure 2 shows graphs of experimental results for various requested resolutions at
reported frame rates of 15, 24, and 30 fps (lower CPU utilization is better). Note that
resolutions other than 160 x 120, 320 x 240, and 640 x 480 are not directly
commensurable between cameras due to differences in actual hardware resolution.
[Three bar charts: Resolution vs. CPU Utilization at 15 FPS, 24 FPS, and 30 FPS.
Each plots percent CPU utilization (0 to 25) against requested resolutions from
160x120 through 640x480 for the three cameras.]
Figure 2: Resolution versus CPU utilization graphs
MICROPHONES
One of the most common problems we encountered with microphones used for video
conferencing was the introduction of unwanted echoes and background noise.
Although Flash does provide an option for echo suppression via software, we have
found that we were able to obtain significantly better results when we reduced the
incidence of echoes and irrelevant background noise on the hardware level through
proper microphone selection. Echoes and ambient noise are particularly undesirable
because they not only make speech less intelligible, but also interfere with our ability
to accurately set the silence level needed to toggle the microphone activity state.
In the course of developing video conferencing applications for our clients, we have
experimented with a number of different microphone setups, including analog
headsets, USB headsets, and discrete microphone and speaker combinations to
determine the best configurations for obtaining high-quality sound capture while
minimizing unwanted noise. The best setup for reducing echo and ambient noise we
have found so far seems to be with noise-canceling USB headsets.
Additional improvements to audio quality that can be made through software will be
discussed later.
NETWORKING
Our video conferencing application development is, for the most part, geared towards
high-bandwidth intranet applications. For this reason, we primarily conduct our testing
over 100 Mbit Ethernet connections, with and without non-RTMP “noise” traffic. In our
experiments with up to 5 actual participants and simulated conferences involving up to
10 participants, we have not encountered any problems with network saturation thus
far. For LAN-based intranet applications, a 100 Mbit Ethernet setup appears to be
quite sufficient for video conferencing. We have not tested other local network
technologies such as 802.11, but we would expect results similar to those we have
obtained, given ample bandwidth and network latencies commensurate with 100 Mbit
Ethernet connections.
High-quality live video conferencing over high-bandwidth Ethernet connections is
possible even at relatively high resolutions such as 320 x 240 for small numbers of
simultaneous participants. Additionally, bandwidth utilization can be capped at
reasonably low levels (for example, 38,400 bytes per second per video stream)
without significant loss of video quality given a judicious choice of video encoding
parameters as we describe later.
For conferencing across the Internet, available bandwidth will be markedly lower
than on a LAN, and latency—the amount of time elapsed from when the video has
been encoded on one machine to when the video
has been decoded on the recipient machine—will be increased. These issues are
essentially the facts of life when developing Internet-based applications. However, they
can be dealt with fairly effectively by minimizing bandwidth usage and allowing for
increased latency.
It should also be noted that for many-to-many video conferencing, the total bandwidth
required grows quadratically with the number of participants: each of n participants
publishes one stream and subscribes to the n - 1 streams of the others, for n(n - 1)
relayed streams in all (90 streams for a 10-person conference, for example). We
discuss this issue in greater depth shortly when we consider network limitations on
scaling. This is particularly relevant in cases of limited bandwidth, but is an important
concern for any collaborative video conference with a growing number of participants.
SOFTWARE SETTINGS
We have experimented with a large number of the possible software settings in Flash
Player 7 for video conferencing and have documented our observations in this section.
In particular, we have found that many of the typical glitches observed in video
conferencing can be addressed with changes in the settings used in the Flash Player
client-side communication objects. We also review several other interesting items that
we have found in engineering video conferencing applications.
CAMERA SETTINGS
The principal methods for manipulating the camera object in Flash Player are
setMode(), setQuality(), and setKeyFrameInterval(). As the camera
object is responsible for generating the bulk of the data needed to be streamed to
Flash Communication Server, the settings here have a significant effect on both the
video quality and the overall video conferencing experience.
We’ll consider each of these methods in turn and discuss the possible options for each
setting and our observations, test results, and recommendations for configuring an
optimal video conferencing experience.
Camera.setMode()
The Camera.setMode() method allows specification of the desired resolution and
frame rate for the video data being collected. Of course, only certain resolutions and
frame rates are supported natively by each physical camera due to hardware
limitations. If the specified settings are not intrinsically supported by the camera, Flash
Player will instead fall back to the closest available setting. Capture size is favored
over frame rate by default, but this preference can be reversed by setting the optional
favorArea parameter to false. While this behavior does
allow specification of practically any resolution and frame rate, we have found that
using unsupported resolutions is undesirable, because it usually results in a pixelated
image (as shown in Figure 1 earlier).
From experience, we have found that resolutions of 160 x 120 and 320 x 240 tend to
be good choices because they seem to be supported natively by many typical cameras
used for video conferencing applications, and they are small enough to function well
when encoding for streaming. It is possible to detect programmatically whether the
specified size and frame rate were actually used for the camera hardware by
inspecting the read-only width, height, and currentFps properties.
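This check can be sketched in ActionScript 2 as follows. The snippet is a minimal illustration of the fallback detection described above; the requested settings are only an example.

```actionscript
var cam:Camera = Camera.get();
// Request 320 x 240 at 24 fps; the optional fourth parameter (favorArea,
// default true) favors capture size over frame rate on fallback.
cam.setMode(320, 240, 24, true);

// Compare the read-only properties against the request. A mismatch means
// the camera fell back to a different native mode, and Flash will scale
// the stream up to the display size (with visible pixelization).
if (cam.width != 320 || cam.height != 240) {
    trace("Fallback resolution: " + cam.width + " x " + cam.height);
}
trace("Delivered frame rate: " + cam.currentFps + " fps");
```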
From our previous tests conducting basic video capture without encoding for network
transport, we observed that lower resolutions and frame rates reduce the processor
demand on the machine. With this in mind, we recommend choosing the lowest
acceptable capture size and frame rate for an application. For high-bandwidth
intranet applications, we have found that a resolution of 320 x 240 at 24 fps works
relatively well for up to five simultaneous participants. For conferences intended to be
conducted across the Internet through broadband connections, capture size and frame
rate will need to be scaled down accordingly.
Camera.setQuality()
Camera.setQuality() allows specification of both the maximum bandwidth per
second to be used by an outgoing video stream, and the required video quality of the
outgoing stream. By default, these are 16384 and 0, respectively. These settings allow
for the choice of different setups, each with its own benefits.
Either parameter can be set to zero to allow Flash to automatically use as much
bandwidth as necessary to maintain a specified video quality, or to throttle video
quality to avoid exceeding the given bandwidth cap. The video quality can also be set
to 100 to use the lossless non-compressing codec instead. Also, an exact bandwidth
limit and a required video quality can be specified when both are equally important.
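The three setups can be expressed directly in ActionScript 2; the numeric values below are examples, not recommendations.

```actionscript
var cam:Camera = Camera.get();

// 1. Fixed quality, unrestricted bandwidth: Flash uses as much bandwidth
//    as needed to hold frame quality at 80.
cam.setQuality(0, 80);

// 2. Fixed bandwidth, variable quality: Flash degrades frame quality as
//    needed to stay under 38,400 bytes per second.
cam.setQuality(38400, 0);

// 3. Both constrained: Flash drops frames when it cannot satisfy both
//    the bandwidth cap and the required quality.
cam.setQuality(51200, 70);
```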
We have been unable to determine significant differences in processor utilization
between the various setups. However, our experiments revealed marked differences in
how Flash handles the edge cases where quality or bandwidth must be sacrificed to
remain within the specified limits. In particular, we focused on settings intended for use
in intranets with high-bandwidth network connectivity.
For the case where both a maximum bandwidth and desired frame quality are
specified, we found that a bandwidth limit between 400,000 and 900,000 bytes per
second and a frame quality setting of 60 to 85 gave very acceptable results with
smooth playback and no audio synchronization issues.
Lower frame quality settings yielded increasingly pixelated video as expected. Low
bandwidth limits, however, yielded skipped frames as described in the camera object’s
documentation.
We also note that in cases where we chose relatively high bandwidth caps, the actual
outgoing bandwidth usage seemed to reach a maximal upper limit below the specified
cap. For example, we observed total bandwidth usage to seldom exceed 250,000
bytes per second for a 320 x 240 stream captured at 24 fps despite the fact that
maximum bandwidth was allocated for video and that the server-to-client maximum
total bandwidth on the Flash Communication Server application was set to higher
values.
With the frame quality specified and bandwidth usage left up to Flash (set to zero), we
conducted a series of experiments to determine actual bandwidth usage and observed
video quality for various frame quality settings under simulated video conferencing
conditions by publishing and self-subscribing to the same stream with the settings
recommended by Giacomo “Peldi” Guilizzoni on his weblog.
Table 3: Camera.setQuality() Basic Settings
Bandwidth: 0
FPS: 24
Favor Size: 0
Frame Quality: As Below
Key Frame Interval: 8
Camera Width: 280
Camera Height: 208
Buffer Time: 0
Audio Rate: 22 kHz
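Applied in ActionScript 2, the Table 3 baseline looks roughly like the sketch below. This is an illustration of the settings, not our actual test harness; "nc" is assumed to be an already-connected NetConnection, and the stream name is hypothetical.

```actionscript
// Publisher-side configuration matching the Table 3 baseline.
var cam:Camera = Camera.get();
cam.setMode(280, 208, 24, false);  // favorArea = false: favor frame rate
cam.setQuality(0, 80);             // bandwidth 0; frame quality varied per run
cam.setKeyFrameInterval(8);

var mic:Microphone = Microphone.get();
mic.setRate(22);                   // 22 kHz audio capture

var ns:NetStream = new NetStream(nc);
ns.attachVideo(cam);
ns.attachAudio(mic);
ns.publish("test", "live");

// On the self-subscribing side, a buffer time of 0 was used:
var inStream:NetStream = new NetStream(nc);
inStream.setBufferTime(0);
inStream.play("test");
```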
Table 4 shows the results we obtained for each specified frame quality. Outgoing
bandwidth usage per second and processor utilization were averaged over 30 seconds
of simulated video conference usage with intermittent audio and relatively little
physical motion.
Table 4: Variable Frame Quality Results
Frame Quality Bandwidth/Sec. CPU Util. (%) Subjective Findings
100 250,000 33 Excellent picture, marked frame skipping
90 68,000 29 Excellent picture, some frame skipping
80 36,000 30 Excellent picture, occasional frame skipping
70 24,000 Not Measured Faint pixelization, smooth playback
60 19,000 Not Measured Mild pixelization, smooth playback
50 13,000 Not Measured Medium pixelization, smooth playback
40 11,000 Not Measured Loss of fine detail, smooth playback
30 10,000 Not Measured Moderate loss of detail, smooth playback
20 9,000 Not Measured Severe loss of detail, smooth playback
10 8,000 27 Loss of gross detail, smooth playback
From the data, we observed that CPU utilization dropped rather slowly with decreasing
frame quality. High frame quality yielded very high quality pictures at the cost of frame
skipping, whereas specifying lower frame quality yielded smooth playback by
sacrificing detail. The sweet spot, as it were, seems to be a frame quality between
70 and 80.
It is also rather interesting to note that at a frame quality of 100 (using the lossless
codec with no compression, and causing exceptionally high bandwidth consumption),
the CPU utilization seems to be somewhat greater than when the frame quality is set to
lower values and the video data compressed.
Using similar settings with the frame quality set to 80 and varying the specified
bandwidth, we repeated the experiment to obtain the results shown in Table 5.
Table 5: Variable Bandwidth Results
Spec. Bandwidth CPU Use (%) Subjective Findings
19,200 30 Smooth, significant pixelization upon movement
38,400 Not Measured Smooth, some pixelization upon movement
51,200 Not Measured Occasional frame skips, pixelization on gross movement
76,800 Not Measured Frequent frame skips, pixelization with extreme movement
128,000 Not Measured Frequent frame skips, high-quality picture
192,000 Not Measured Frequent frame skips, high-quality picture
256,000 Not Measured Very frequent frame skips, high-quality picture
384,000 30 Constant frame skip, high-quality picture
Here, the trade-off seems to be smooth video playback versus greater pixelization upon movement. If the video image is very still over time, a high-quality picture can be obtained at practically all the specified bandwidths. However, this is somewhat impractical for most video conferencing applications, where one would expect at least a small amount of movement. The sweet spot here for a frame quality of 80 is apparently somewhere between 38,400 and 51,200 bytes per second, though 38,400 is quite acceptable if you don't mind momentary pixelization upon a video conference participant's sudden movement. Processor utilization, however, appears to be fairly constant throughout.
Allowing Flash to modulate the frame quality as needed has the considerable benefit
of keeping the bandwidth usage capped to relatively low levels without significantly
sacrificing image quality. This is particularly important for low-bandwidth usage
scenarios, such as video conferencing over the Internet, and for scaling video
conferences to larger numbers of simultaneous participants for intranet use. It is our
preferred setting, because momentary pixelization upon gross movements is
considered preferable to frequent and unpredictable frame skipping.
However, each application may benefit from experimentation with various bandwidth and frame quality settings, depending on requirements and preferences. Alternatively, Guilizzoni offers a rather handy calculator for choosing these settings with a number of configurable options at:
http://www.peldi.com/blog/archives/2005/01/generating_opti_1.html
Camera.setKeyFrameInterval()
The key frame interval determines how often a full key frame is published to the
stream, as opposed to an interpolated frame generated by the codec. Flash allows
values ranging from 1 to 48, with the default being 15 (every fifteenth frame is a key
frame). Testing with varying values for the key frame interval indicates that low key
frame intervals tend to contribute to increased frame skipping (as additional
bandwidth is used to transmit a full key frame more often), whereas large intervals
yield decreased to non-existent frame skipping, but introduce longer normalization
times in cases where the frame quality was automatically throttled down in response to
motion. For applications demanding very high quality video, we typically set the key
frame interval to be equal to or greater than our frame rate, because we feel that
occasionally lowered frame quality is preferable to frequent frame skipping.
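To make this trade-off concrete, a short calculation (our own illustration, not part of the Flash API) shows how the key frame interval and the capture frame rate together determine how often a full key frame is actually published:

```python
# Illustrative helper: given a capture frame rate and a key frame
# interval, compute how many seconds elapse between full key frames
# and how many key frames are published per second.

def keyframe_spacing(fps, key_frame_interval):
    """Return (seconds between key frames, key frames per second)."""
    seconds_between = key_frame_interval / fps
    per_second = fps / key_frame_interval
    return seconds_between, per_second

# At 24 fps with the interval set to 48 (twice the frame rate), a full
# key frame goes out only once every two seconds:
print(keyframe_spacing(24, 48))   # (2.0, 0.5)

# The Flash default interval of 15 at a 15 fps capture rate publishes
# one key frame per second:
print(keyframe_spacing(15, 15))   # (1.0, 1.0)
```

This is why setting the interval at or above the frame rate caps key frame overhead at one full frame per second or less.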
MICROPHONE SETTINGS
There are several settings that can be specified for the microphone object from within
Flash. Specifically, these are the sampling rate, the gain, the silence level and time
out, and whether to enable echo suppression in the audio codec. These settings affect
sound acquisition and encoding for publishing to Flash Communication Server.
Microphone.setRate()
This method determines the sampling rate used to acquire sound from the microphone
in kilohertz (kHz). Flash allows settings of 5, 8, 11, 22, and 44 kHz, with 8 being the
default in most cases. In general, higher sampling rates yield more natural-sounding
audio with increased bandwidth usage. We generally use settings of 22 or 44 kHz to
achieve relatively high-quality audio transmission and haven’t noticed significant
performance increases with lower sampling rates.
Microphone.setGain() and Microphone.setSilenceLevel()
The gain on the microphone is applied as a multiplier for boosting the input much like
a volume knob works, with zero silencing the audio, the default level of 50 leaving the
signal strength unchanged, and a maximum value of 100. This setting is used in
conjunction with the silence level, which determines the threshold above which the
microphone is activated for publishing audio data. Optionally, the
Microphone.setSilenceLevel() method can also take a second parameter to
specify the silence timeout, which is the time in milliseconds that audio should
continue to be published after the sound level drops below the specified silence level.
We have noted that oftentimes it can be rather difficult to set the audio gain and silence levels as precisely as we would like to enable the microphone to toggle state correctly. In some cases, the sweet spot for the silence level has been as narrow as one unit, with too low a value causing the microphone to be keyed on constantly and picking up all manner of ambient noise, while a slightly higher value would not accurately detect a video conference participant's voice at normal conversational volume.
The proper choice of gain and silence level values seems to differ significantly among individual machines and microphone setups, so we cannot recommend specific values beyond encouraging experimentation with particular hardware setups. We do, however, recommend implementing a tell-tale "talk" light in many cases so a participant can see whether his or her audio signal is being broadcast. Too frequently, we have seen a participant on-screen mouthing words silently, unaware that the microphone remains deactivated.
If it is necessary to silence the audio programmatically in response to low activity levels
or to implement a push-to-talk feature, setting the gain to zero is an effective means of
doing this. However, we have not found setting the silence level to 100 to be effective
in all instances, because very loud microphone input can raise the activity level to 100
and thus breach the threshold.
Microphone.setUseEchoSuppression()
Flash allows for optional echo suppression through the audio codec to be toggled on
and off using ActionScript. We usually enable this with good results, although we have
found that a more effective solution to echo reduction is to use USB headsets with
noise cancellation over analog headsets or discrete microphone and speaker setups.
This has the added benefit of filtering out the majority of background noise before it
hits Flash, making it easier to get the silenceLevel setting right.
BUFFER TIMES
The NetStream object allows a buffer time to be set on both publishing and subscribing, but with significantly different effects. When set on a publishing stream, it determines the maximum size of the outgoing buffer; once that buffer is full, remaining frames are dropped. The Macromedia documentation states that this is generally not a concern on high-bandwidth connections, and we have found this to be the case in our use. On the subscribing end, the buffer time determines the amount of data to be buffered prior to display. We have typically set both of these to zero with excellent results for live video conferencing applications.
EMBEDDED VIDEO SIZES
Our experience with sizing embedded videos suggests that processor load is
minimized when the embedded video object is sized to match the subscribed video
stream’s resolution exactly. In experiments where the displayed video is sized to be
both larger and smaller than the published resolution, we have observed increased processor utilization. Given that the camera resolution on a publishing machine can be changed easily, we recommend matching subscribers' embedded video object sizes with the stream's video resolution. The stream's native resolution can be determined programmatically on the subscriber machine by examining the attached Video object's width and height properties.
MOVIECLIP.ATTACHAUDIO()
In order to control certain aspects of a stream’s incoming sound (such as volume and
panning), a developer can use the MovieClip.attachAudio() method to attach
the incoming sound to a MovieClip and then control it through a Sound object as
suggested in the client-side development documentation. In our experience, however, while this technique does provide additional control over the incoming sound, it also has an unfortunate tendency to desynchronize the audio playback from the video playback. We have not yet found an adequate solution to this problem and recommend against using MovieClip.attachAudio() on live video conferencing streams.
STREAM LATENCY
Latency can be a significant problem in many video conferencing situations; it manifests itself as the delay between events captured at the publishing machine and their arrival and display on a subscriber machine. Because there is no native provision for a client-side determination of latency, we measure latency by broadcasting a message using the NetStream.send() method on a publishing machine and timing the interval between the initial broadcast and the subsequent receipt of the message on a second, self-subscribed stream. While this technique measures data latency, all of our observations thus far indicate that it coincides directly with video latency, so we have taken to interpreting data latency as video latency.
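The bookkeeping behind this probe is simple. In the actual application it lives in ActionScript around NetStream.send(); the sketch below reduces it to plain Python (names and structure are ours) to show the idea: the publisher stamps each ping with its send time, and the self-subscribed handler computes the elapsed time on receipt.

```python
import time

# Sketch of the latency-probe bookkeeping: pair each outgoing ping with
# its send timestamp, then compute the elapsed time when the same ping
# arrives back over the self-subscribed stream.
class LatencyProbe:
    def __init__(self):
        self.pending = {}   # ping id -> send timestamp
        self.samples = []   # measured latencies in milliseconds
        self.next_id = 0

    def send_ping(self):
        """Publishing side: record the send time, return the ping id to broadcast."""
        ping_id = self.next_id
        self.next_id += 1
        self.pending[ping_id] = time.monotonic()
        return ping_id

    def receive_ping(self, ping_id):
        """Self-subscribed stream's message handler: compute one latency sample."""
        latency_ms = (time.monotonic() - self.pending.pop(ping_id)) * 1000.0
        self.samples.append(latency_ms)
        return latency_ms
```

Because both timestamps are taken on the same machine, no clock synchronization is needed, which is what makes the self-subscription trick attractive.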
In the course of our research, we have noted that, upon subscribing to a live stream,
latency typically averages below 50 milliseconds (ms) when audio data is entirely
absent. However, upon playback of streamed audio data, latency will typically
increase rapidly to several hundred milliseconds with little to no recovery to previous
levels, even after audio data has ceased. We have also observed that in some cases
with continuous audio data (for example, when the microphone is always keyed on
because of significant volume or too low a silence level), the measured latency
increases slowly in a continuous manner.
While in many cases the latency will tend to level out at 200 to 400 ms (values that we
find acceptable), latency will sometimes continue to grow into the seconds, yielding a
very poor-quality video conferencing experience. While we typically can restore the
latency to low levels by closing the subscribed stream and resubscribing, such a
solution is not particularly appealing because it interrupts the video and audio for several seconds while the stream is reconnected. To date, we have not found an adequate solution for capping latency at manageable levels.
It is also important to note that we have not discovered a way of automating the
measurement of audio latency, and aside from implementing a questionable
hardware-based solution such as feeding the speaker jack into the microphone jack
and monitoring the audio activity level, we are at a loss on how to measure audio
latency. A means of determining audio latency would be extremely valuable, of
course, because we could then identify and measure audio sync issues as they
occurred through automated means.
SCALING
While it is relatively easy to create a high-quality video conferencing experience for
two simultaneous participants, the demands on both the network and the machines
increase quickly as an application is scaled to involve greater numbers of
simultaneous participants. Specifically, the bandwidth needed to support many-to-many video conferencing grows quadratically with the number of participants: n² streams are required for n participants, since each participant publishes one stream and subscribes to the n - 1 streams of the others (n + n(n - 1) = n²). (For more information on bandwidth usage, see Brian Hock's Macromedia white paper entitled Calculating Your Bandwidth and Software License Needs for the Macromedia Flash Communication Server MX.) Additionally, each client machine will need to dedicate additional resources to handle the decoding of each subscribed stream.
These factors place upper limits on the maximum number of possible participants in a single video conference on several fronts: the Flash Communication Server itself, the network infrastructure's available bandwidth, and the capabilities of the client machines.
FLASH COMMUNICATION SERVER LIMITATIONS
Flash Communication Server is licensed in increments of 10 Mbit per second or 2,500 simultaneous connections, so the primary consideration when scaling Flash Communication Server to accommodate increasing numbers of video conference participants is whether its current license(s) provide adequate bandwidth.
For video conferencing applications, the 10 Mbit per second peak bandwidth limit will
almost surely be reached before coming close to making 2,500 simultaneous
connections. There aren’t any limits on the number of streams served, just peak
bandwidth usage and total simultaneous connections.
A single professional license offers 10 Mbit per second, or about 1.19 megabytes per
second in available bandwidth. To calculate the usage for a hypothetical case, let us
assume a fairly typical high-bandwidth video conferencing stream with a maximum of
38,400 bytes per second allocated to video data, and a 44 kHz audio sampling rate.
Experimentally, this utilizes a peak maximum of roughly 50 kilobytes per second.
Using 50 kilobytes per second as our estimated per-stream bandwidth usage, we can generate the total streams and estimated maximum bandwidth usage per second for increasing numbers of participants, as shown in Table 6.
Table 6: Example Bandwidth Calculations for n Participants
Participants Total Streams Max. Bandwidth (Bytes per Sec.)
2 4 200,000
3 9 450,000
4 16 800,000
5 25 1,250,000
6 36 1,800,000
7 49 2,450,000
8 64 3,200,000
9 81 4,050,000
10 100 5,000,000
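The figures in Table 6 follow directly from the n² relationship. A short calculation (our own illustration, using the assumed 50,000 bytes per second peak per stream) reproduces them:

```python
# Reproduce Table 6: for n participants, each publishes one stream and
# subscribes to the other n - 1, so the server carries n + n*(n-1) = n**2
# streams. Peak bandwidth assumes every stream simultaneously hits its
# estimated 50,000 bytes/sec ceiling.
PEAK_BYTES_PER_STREAM = 50_000

def conference_load(participants):
    total_streams = participants ** 2
    max_bandwidth = total_streams * PEAK_BYTES_PER_STREAM
    return total_streams, max_bandwidth

for n in range(2, 11):
    streams, bandwidth = conference_load(n)
    print(f"{n:2d} participants: {streams:3d} streams, {bandwidth:,} bytes/sec")
```

As noted below, these are worst-case figures, since not every stream peaks at once in practice.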
Of course, these numbers are a rough estimate and probably err slightly on the high
side, because we are assuming that all streams are simultaneously reaching their
expected peak bandwidth utilization. However, we can use these figures to estimate
the bandwidth load on the Flash Communication Server software.
Given our earlier assumptions, a single professional license will likely become
saturated somewhere between four and five simultaneous participants. To
accommodate larger numbers of participants, the maximum bandwidth cap on Flash
Communication Server would need to be increased by stacking additional licenses or
purchasing higher capacity licenses from Macromedia.
In practice, actual bandwidth usage will depend on the choice of settings and how the
application is actually used. As screen real estate on the client side is also expected to
diminish with increasing numbers of video conference participants, we recommend a
strategy of reducing per-stream bandwidth usage with increasing numbers of
participants by scaling down the capture resolution and frame rate, video bandwidth
cap, or frame quality as the number of participants in a video conference grows.
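One way to apply this strategy is to work backward from the server's bandwidth cap. The sketch below is a heuristic of our own (not a Macromedia recommendation): it divides a license's capacity across the n² streams of an n-participant conference to obtain a per-stream budget that could then guide the choice of resolution, frame rate, and bandwidth settings.

```python
# Heuristic: derive a per-stream bandwidth budget from the server cap.
# LICENSE_BYTES_PER_SEC approximates a single 10 Mbit/sec professional
# license (10,000,000 bits / 8 = 1,250,000 bytes per second).
LICENSE_BYTES_PER_SEC = 1_250_000

def per_stream_budget(participants, server_cap=LICENSE_BYTES_PER_SEC):
    """Bytes per second available to each of the n**2 conference streams."""
    return server_cap // participants ** 2

# At five participants the budget is already down to 50,000 bytes/sec,
# matching the saturation point estimated above:
print(per_stream_budget(5))   # 50000
```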
Even with an unlimited capacity license on Flash Communication Server, the
limitations on hardware, operating system, and processor performance will eventually
impose a hard ceiling on the number of simultaneous participants supported for a
video conferencing application.
NETWORK LIMITATIONS
Much of our research has been focused on video conferencing in high-bandwidth
intranet configurations with ample network headroom. However, network limitations
should be kept in mind when scaling video conferencing applications for deployment
on all network configurations, particularly those in heavily used environments or across the Internet. Also, when comparing bandwidth utilization reported by Flash Communication Server to actual bandwidth used on the physical network, some degree of additional network overhead used for packet envelopes, retransmitted packets, and control messages should be taken into account.
In our experience, video conferencing works very well in an intranet setting. However,
in busy local network environments, you will need to take into account additional,
non-video conference traffic such as e-mail, web browsing, and file transfers also
contending for network bandwidth. Depending on local traffic volume and the network
architecture, you may encounter lower available bandwidth and quality of service than
might be expected in ideal conditions.
While we have not encountered any problems traceable to network congestion in test
cases involving both shared and dedicated 100 Mbit Ethernet connections for our
video conferencing tests, we do suggest testing to ensure that an application runs well
in its specific network environment.
When video conferencing is conducted over the Internet, other factors come into play.
First, significantly greater latency and lower available bandwidth can be expected than
those achievable in a local network configuration, even for users with broadband
connections. Also, some users may have asymmetric upload and download bandwidth
capacities. These limitations place additional constraints on the size and quality of
video streams that can be delivered to each client. As recommended by Guilizzoni and
Hock, lowering the capture size, bandwidth and video quality of your streams will be
necessary to accommodate the limitations of Internet-based conferencing.
CLIENT MACHINE LIMITATIONS
On the client machines, the principal consideration in scaling to larger numbers of
participants is the incremental growth of the number of streams that need to be
decoded and displayed. We have conducted a number of simulated tests on single
video conference clients subscribing to and displaying up to 10 live streams without
significant problems when used with reasonable bandwidth and quality settings. We
observed that the settings in Table 7 yield very acceptable performance with smooth
playback when decoding and rendering up to 10 incoming streams on our test
machine.
Table 7: Recommended 10-Participant Settings
Bandwidth: 38400
FPS: 15
Favor Size: 0
Frame Quality: 0
Key Frame Interval: 30
Camera Width: 160
Camera Height: 120
Buffer Time: 0
Audio Rate: 22 kHz
Average processor utilization for 10 streams utilizing the settings in Table 7 was only
36%, demonstrating that a high-quality 10-participant video conference is entirely
possible on current systems using Macromedia Flash technologies. We have also
conducted additional tests with varying parameters, but have found this combination
of settings to yield the best results.
CPU UTILIZATION AND RESOLUTION
We wanted to determine the effect of stream resolution on processor usage and
determine optimal resolutions to use with different numbers of simultaneous
participants. Using matched publishing and display resolutions, we measured
averaged CPU utilization on our test machine over 60 to 90 seconds when subscribed
to 4, 6, or 8 simulated video conferencing streams at various resolutions in 4:3 aspect
ratios using the settings in Table 8.
Table 8: CPU Utilization versus Resolution Basic Settings
Bandwidth: 38400
FPS: 24
Favor Size: 0
Frame Quality: 0
Key Frame Interval: 48
Buffer Time: 0
Audio Rate: 44 kHz
Figure 3 shows the plotted results obtained for the 4, 6, and 8 stream cases at various resolutions.
Figure 3: CPU utilization versus stream resolution area graph
The x-axis in this graph is measured in somewhat unusual units, the area resolution of
a stream’s video feed in thousands of pixels. For example, a video resolution of 320 x
240 would yield an area of 76.8 kilopixels (320 x 240 = 76,800). To convert back
from the area to the original 4:3 aspect ratio dimensions, divide the area by 12 and
take the square root of the resulting value. This can be multiplied by 4 to obtain the
width, and by 3 to obtain the height. This unit of measurement was used so that we
could quantitatively compare various resolutions against each other. The numeric
results are provided in Appendix B.
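The conversion described above can be expressed as a small helper (our own, for reading the graph's axis):

```python
import math

# Convert between a 4:3 video resolution and the "area in pixels" unit
# used on the x-axis of Figure 3. For a 4:3 frame, width = 4k and
# height = 3k, so area = 12 * k**2 and k = sqrt(area / 12).

def area_of(width, height):
    """Pixel area of a frame; divide by 1,000 for kilopixels."""
    return width * height

def dims_from_area(area):
    """Recover the 4:3 (width, height) pair from a pixel area."""
    k = math.sqrt(area / 12)
    return round(4 * k), round(3 * k)

print(area_of(320, 240))        # 76800, i.e. 76.8 kilopixels
print(dims_from_area(76800))    # (320, 240)
```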
The positions of the 160 x 120 and 320 x 240 capture resolutions that are typically
supported at the hardware level by many commonly used video conferencing cameras
are indicated on the graph to assist in reading.
At present, we suspect that the appearance of shelves, where CPU utilization remains
fairly stable across relatively substantial changes in resolution with marked changes
between certain resolutions, stems from the encoding algorithm used by Flash in
compressing video. However, we do not have sufficient information to determine
conclusively whether this is the case.
SUMMARY
In summary, we offer these findings of optimal hardware and software configurations
for use in live conferencing applications using Flash Communication Server:
• Cameras for video conferencing differ significantly in the processor load
needed for video acquisition. We have found that Firewire cameras using the
IIDC/DCAM protocol perform significantly better than USB cameras or DV
Firewire cameras.
• USB headsets with active noise cancellation are preferred, because they
provide superior sound quality and echo reduction compared to analog
headsets or discrete setups.
• Resolutions natively supported by the camera hardware are preferable in order
to avoid pixelization. Typically, 160 x 120 and 320 x 240 are supported and
work reasonably well for streaming.
• Bandwidth utilization should be carefully balanced with image quality.
Maximizing either or both tends to yield poor results. A bandwidth limit of
about 38,400 bytes per second with an unspecified frame quality and a key
frame interval at or above the camera frame rate serves our purposes rather
well. Experimentation may be in order to find the configuration best fitting a
given application’s requirements. Giacomo Guilizzoni has provided an easy-
to-use calculator that recommends values for various setups at:
http://www.peldi.com/blog/archives/2005/01/generating_opti_1.html
• Microphone sampling rates of 22 or 44 kHz work well. Low sampling rates,
while reducing bandwidth usage, also result in poor audio quality.
• Embedded videos used for displaying subscribed streams should be sized to
match the originating camera resolution for optimal performance.
• MovieClip.attachAudio() should not be used to manipulate the audio
from a subscribed stream. This has a tendency to introduce unwanted
synchronization issues.
APPENDIX A: ERROR MARGINS AND SIGNIFICANCE
Most of our test results, particularly the processor utilization figures read from Windows Task Manager, were obtained by manually estimating averages from values reported periodically by various tools. Additionally, video conferences were typically simulated by speaking into our USB headsets in front of our test cameras in a calm manner for up to several minutes. Unfortunately, such practices limit our ability to reproduce visual and audio inputs exactly for each test case.
As such, we have assumed a moderate error margin and have refrained from reading
significance into cases where only marginal differences were observed due to our
inability to obtain results with high precision or granularity.
We are actively working to obtain results with greater statistical rigor by researching tools that would yield both more reproducible test cases and more precise results. With such tools, we would be able to analyze for significant variations more effectively.
To alleviate some of the problems that our current methods introduce, we provide
detailed experimental results and community access to our experimental tools in these
appendixes so that our tests can be reproduced and the results be compared by others
in the Flash Communication Server development community.
APPENDIX B: DETAILED EXPERIMENTAL SETUPS AND RESULTS
CAMERA TESTING
For our camera tests, three representative cameras supporting different protocols were
used in conjunction with our CamTest tool: Apple iSight, an IIDC/DCAM-compliant
webcam that connects via Firewire; Sony DCR-TRV460, a DV-compliant camcorder
that also connects via Firewire; and Creative Labs NX Ultra, a USB webcam. All cameras were specified as having a maximum live video resolution of 640 x 480 pixels and the capability of yielding streams of up to 30 fps (with the exception of the Creative NX Ultra, which was limited to 15 fps). Although the Sony DCR-TRV460 camcorder also supports a USB connection, we tested it only over its DV connection.
Table 9: Basic Camera Specifications
Camera Data Bus Max. Resolution Max. FPS
Apple iSight IIDC/DCAM 640x480 30
Sony DCR-TRV460 DV 640x480 30
Creative NX Ultra USB 640x480 15
CPU utilization for locally visualizing video output at varying resolutions and frame rates was measured for each camera using the Windows Task Manager with all non-essential processes disabled. To isolate the processor requirements needed to process the video signal into Flash, these tests were conducted entirely locally using a simple Flash application running under Flash Player 7.0.19.0 with no Flash Communication Server integration. Resolutions tested were all at the standard-definition 4:3 aspect ratio: 160 x 120, 200 x 150, 240 x 180, 320 x 240, 400 x 300, and 640 x 480, at rates of 1, 5, 10, 15, 24, and 30 fps. CPU utilization was averaged over roughly 30 seconds of video acquisition.
Table 10 provides the supported resolutions for each camera among the test set. The
footnotes provide additional details on the actual sizes of the video streams when the
given resolution was requested.
Table 10: Supported Resolutions for Test Cameras (Extended)
160x120 200x150 240x180 320x240 400x300 640x480
Apple iSight Yes Yes Yes Yes No [1] Yes
Sony DCR-TRV460 Yes No [2] No [2] Yes No [1] Yes [3]
Creative NX Ultra Yes [4] No [5] No [5] Yes [4] No [6] Yes [4]
As a result, the CPU utilization observations obtained for the 200 x 150, 240 x 180,
and 400 x 300 resolutions should be interpreted with some caution compared to the
resolutions for which all tested cameras provided matched video streams. It is
probable that the scaling of lower-resolution video streams to the originally requested
size in the Flash Player contributes somewhat to the overall CPU utilization.
Additionally, we had some issues with frame rates. In the case of the Creative NX
Ultra, although the camera itself is specified as having a maximum frame rate of 15
fps, Flash was successfully able to request and receive video streams at frame rates up
to 30 fps. We suspect this might be due to inaccurate reporting on the part of the
driver or software-level interpolation. The results from our experiments do not yield
conclusive evidence for either possibility.
In the case of the Apple iSight camera, we were only able to attain a maximum frame
rate of 15 fps, although the technical specifications state that a frame rate of 30 fps
was possible. This was likely due to the use of the generic Windows 1394 Desktop
Camera driver, because a manufacturer-supplied driver for the Windows operating
system was not available. Resolution and frame rate testing for the Apple iSight
camera was therefore limited to frame rates of 15 fps and below for the tests
described here, though at a later point, we were able to obtain a 30 fps frame rate
from the Apple iSight camera using the Unibrain third-party Fire-i drivers for 1394
IIDC/DCAM cameras as described in the main body of this white paper.
It should also be noted that results for the Creative NX Ultra camera were significantly
noisier than for the other cameras, presumably due to noise from additional USB
devices connected to the test machine.
Figure 4 presents graphs of our experimental results (lower CPU utilization is better).
[1] A video stream of 320 x 240 was obtained when a 400 x 300 stream was requested.
[2] Video streams of 160 x 120 were obtained when 200 x 150 and 240 x 180 streams were requested.
[3] The Sony DCR-TRV460 camera produces an interlaced video stream at 640 x 480.
[4] The Creative NX Ultra camera produced slightly letterboxed frames at 160 x 120, 320 x 240, and 640 x 480.
[5] Video streams of 176 x 132 were obtained when 200 x 150 and 240 x 180 streams were requested.
[6] A video stream of 352 x 264 was obtained when a 400 x 300 stream was requested.
[Figure 4 comprises six graphs of frame rate (frames per second, x-axis) versus % CPU utilization (y-axis), one for each resolution tested: 160x120, 200x150, 240x180, 320x240, 400x300, and 640x480. Each graph plots the Apple iSight (1394 IIDC/DCAM), Sony DCR-TRV460 (1394 DV), and Creative NX Ultra (USB) cameras.]
Figure 4: Frame rate versus CPU utilization graphs
Figure 5 shows the results of graphing the same data to compare CPU utilization at 15, 24, and 30 fps.
[Figure 5 comprises graphs of resolution (160x120 through 640x480, x-axis) versus % CPU utilization (y-axis) at the higher frame rates, plotting the Sony DCR-TRV460 and Creative NX Ultra cameras.]
Figure 5: Resolution versus utilization graphs
ENCODING/DECODING AND CPU UTILIZATION
With the understanding obtained of the effects of various cameras and video stream
formats on processor utilization, we next analyzed the CPU utilization incurred by
publishing audio and video to Flash Communication Server as well as that needed for
subscribing to a stream from Flash Communication Server using our FCSDiag tool.
We tested a broadcasting-only configuration (with no local visualization), and a simple
loopback configuration where the published stream was resubscribed and rendered by
the same machine under several different video setting configurations that have
proven to give high-quality results. The loopback case effectively simulates the load for a participant machine in a simple 1-to-1 video conference.
The first test was conducted with a configuration that yields, as we have found through
prior work in Flash video conferencing, a relatively high-quality experience. Two
additional tests were also conducted, the first having the camera bandwidth set to
38,400 and allowing Flash to throttle the video quality dynamically, and the second
having the video quality set to 90 and the bandwidth unspecified by being set to zero,
as recent experiments have shown that these configurations also yielded relatively
high-quality results.
Table 11 lists the configurations used for each of these tests. In the graphed results
that follow, the “Publish Only” CPU utilization encompasses the CPU use needed for
video acquisition and publishing of the encoded stream to Flash Communication
Server, while the “Loopback” CPU utilization adds on the additional processor use
needed to subscribe and display the same stream on the test machine.
Table 11: Encoding/Decoding Test Configurations

                     Test Config. A   Test Config. B   Test Config. C
Bandwidth:           400,000          38,400           0
FPS:                 24 [7]           24 [7]           24 [7]
Favor Size:          0                0                0
Frame Quality:       85               0                90
Key Frame Interval:  48               48               48
Camera Width:        320              320              320
Camera Height:       240              240              240
Buffer Time:         0.01             0.01             0.01
Audio Rate:          22 kHz           22 kHz           22 kHz
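In ActionScript 2 terms, a configuration such as Test Configuration A maps onto the Camera, Microphone, and NetStream APIs roughly as follows; this is a minimal sketch, and the RTMP URL and stream name are illustrative placeholders rather than part of FCSDiag.

```actionscript
// Sketch: publishing with settings corresponding to Test Configuration A.
var cam:Camera = Camera.get();
cam.setMode(320, 240, 24, false);   // width, height, fps; false = favor frame rate over size
cam.setQuality(400000, 85);         // bandwidth cap in bytes/sec, frame quality 0-100
cam.setKeyFrameInterval(48);        // one key frame every 48 frames

var mic:Microphone = Microphone.get();
mic.setRate(22);                    // 22 kHz audio capture

var nc:NetConnection = new NetConnection();
nc.connect("rtmp://localhost/fcsdiag");   // placeholder application URL

var ns:NetStream = new NetStream(nc);
ns.attachVideo(cam);
ns.attachAudio(mic);
ns.publish("testStream");           // placeholder stream name
```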
Figure 6 shows the results graphically.
[7] In practice, this results in an actual frame rate of 15 FPS for the Apple iSight due to driver limitations.
[Figure 6: Encoding/Decoding graphs. Three bar charts, one per test configuration (A, B, and C),
plot % CPU utilization for the "Publish Only" and "Loopback" cases against camera and Flash
Player combination: Apple iSight - FP7, Sony DCR-TRV460 - FP7, and Creative NX Ultra - FP7.
All three panels share a 0-30% vertical scale.]
Our three test configurations yielded comparable results in terms of CPU utilization
despite the differences in settings. Because the Apple iSight camera was operating at
only 15 fps for these tests (as described earlier), we believe that the CPU utilization in
tests employing it is artificially lowered to a certain extent. As such, there do not
appear to be substantial differences in the amount of work needed to encode or
decode video from the cameras tested. In terms of subjective quality, all tested
configurations utilizing the different cameras were quite acceptable, as we had
anticipated.
Additionally, when combined with our earlier results for CPU utilization during video
acquisition with our test cameras, the data from this series of experiments enables us
to break down CPU usage into its constituent parts. Applying this to the data for
Test Configuration A, we can derive the breakdown shown in Figure 7. Results for the
other test configurations are similar.
[Figure 7: CPU utilization breakdown for Test Configuration A. A stacked bar chart decomposes
loopback % CPU utilization into acquisition, encoding, and decoding components for the Apple
iSight - FP7, Sony DCR-TRV460 - FP7, and Creative NX Ultra - FP7 configurations, on the same
0-30% scale as Figure 6.]
While there are slight differences in the CPU usage for encoding and decoding in the
three cases shown, the greatest factor affecting total CPU utilization in these simulated
simplest-case 1-to-1 video conferencing tests remains the choice of camera.
VIDEO SETTINGS
From prior experience with video conferences involving five participants, we had
usually set both frame quality and the maximum bandwidth for the camera during
testing and used a static video resolution of 320 x 240 pixels at 24 fps (the same as
the Flash movie’s frame rate). After significant trial and error, we had arrived at the
settings shown in Table 12. These settings yielded the best overall performance with
minimal frozen frames and synchronization issues.
Table 12: Initial Video Settings
Bandwidth: 400,000-900,000
FPS: 24
Favor Size: 0
Frame Quality: 60-85
Key Frame Interval: 48
Camera Width: 320
Camera Height: 240
Buffer Time: 0.01
Audio Rate: 22 kHz
In FCSDiag loopback tests employing both video and audio input, the CPU utilization
and average latency (time for a NetStream.send call to reach the Flash
Communication Server application and return) did not show significant variation within
the range of bandwidth and frame quality settings given in Table 12 and were
essentially the same as the results obtained in Table 8 for Test Configuration A in the
encoding/decoding tests.
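FCSDiag measures latency via a round trip through the server. One simplified way to approximate this, sketched below, is a timestamped call to a server-side echo method; the `echo` method name and application path are hypothetical, not part of FCSDiag itself, and would need a corresponding server-side handler (e.g. a `Client.prototype.echo` that returns its argument).

```actionscript
// Sketch: approximating round-trip latency with a hypothetical server-side echo method.
var nc:NetConnection = new NetConnection();
nc.connect("rtmp://localhost/fcsdiag");   // placeholder application URL

function pingServer(nc:NetConnection):Void {
    var sent:Number = getTimer();         // milliseconds since the player started
    nc.call("echo", {onResult: function (t0:Number):Void {
        // The server echoes the timestamp back; the difference is the round trip.
        trace("round-trip latency: " + (getTimer() - t0) + " ms");
    }}, sent);
}
```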
Subjectively, the video stream appeared very smooth and no frozen frames or
problems with audio synchronization were observed. At lower frame quality settings,
some pixelization was observed as expected.
Table 13 lists typical CPU utilization and average latency obtained using these settings
on Flash Player 7.
Table 13: CPU Utilization and Latency for Cameras
Camera Avg. Latency (ms) % CPU Utilization
Apple iSight 150 13
Sony DCR-TRV460 180 21
Creative NX Ultra 180 25
It should be noted that the average latency tends to remain fairly stable, with the
loopback signal being delayed about 150 to 180 ms from real time once audio data
has been introduced to the stream. On some occasions, latency will increase to
markedly higher values (~1,500 ms) for unknown reasons and yield unsatisfactory
results as the received stream lags over a second behind real time.
We have also experimented with setting only the maximum bandwidth or only the
frame quality, allowing Flash to manage the other in real time. We were initially
introduced to this possibility by Giacomo Guilizzoni's weblog, where he presented a
calculator for determining optimal Flash Communication Server video settings under
different conditions.
Adapting his results to our needs, we conducted a number of tests to quantify the
effects of each of the parameters under such regimes. Our experimental results indicate
that these approaches also produce relatively high-quality results with properly chosen
settings. Because these tests were done using a different program than was employed
in the earlier tests (in order to measure and graph bandwidth utilization in real time),
the resultant CPU utilization measures are not directly comparable to the data obtained
in previous experiments. We used the Creative NX Ultra for these experiments. For our
initial battery of tests, we set the bandwidth to 0 and throttled the frame quality from
100 down to 0 with the audio muted (to keep latency relatively constant) under the
conditions given in Table 14.
Table 14: Variable Frame Quality Settings
Bandwidth: 0
FPS: 24
Favor Size: 0
Frame Quality: As Below
Key Frame Interval: 8
Camera Width: 280
Camera Height: 208
Buffer Time: 0
Audio Rate: 22 kHz
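This regime maps onto Camera.setQuality as sketched below: passing 0 as the bandwidth argument removes the bandwidth cap, so the specified frame quality is held fixed while bandwidth floats.

```actionscript
// Sketch: fixing frame quality and letting Flash manage bandwidth (Table 14 regime).
var cam:Camera = Camera.get();
cam.setMode(280, 208, 24, false);   // 280 x 208 at 24 fps, favoring frame rate
cam.setKeyFrameInterval(8);
cam.setQuality(0, 80);              // bandwidth unconstrained; quality fixed (80 shown here)
```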
We obtained the following results; the average bandwidth utilization in bytes per
second, selected average CPU utilizations, and subjective findings for each test are
listed in Table 15.
Table 15: Variable Frame Quality Results
Frame Quality Bandwidth/Sec CPU Util. (%) Subjective Findings
100 250,000 33 High-quality picture, marked frame skipping
90 68,000 29 High-quality picture, some frame skipping
80 36,000 30 High-quality picture, occasional frame skipping
70 24,000 Not Measured Faint pixelization, smooth playback
60 19,000 Not Measured Mild pixelization, smooth playback
50 13,000 Not Measured Medium pixelization, smooth playback
40 11,000 Not Measured Loss of fine detail, smooth playback
30 10,000 Not Measured Moderate loss of detail, smooth playback
20 9,000 Not Measured Severe loss of detail, smooth playback
10 8,000 27 Loss of gross detail, smooth playback
Here, CPU utilization seems to drop rather slowly with decreasing frame quality. High
frame quality yielded very high-quality pictures at the cost of frame skipping, whereas
specifying lower frame quality yielded smooth playback by sacrificing detail. The sweet
spot, as it were, seems to be at about a frame quality of 70 to 80. It is also interesting
to note that at a frame quality of 100 (zero compression, accompanied by
exceptionally high bandwidth consumption), the CPU utilization seems to be somewhat
greater than when the frame quality is set to lower values and the video data
compressed.
Subsequently, we performed another battery of experiments, this time varying the
specified bandwidth while keeping the frame quality set to 80, with settings otherwise
identical to those given in Table 14. Although a frame quality of 80 had produced
occasional frame skipping, as shown in Table 15, previous experience indicated that
such a value typically yields a decent trade-off between bandwidth and CPU utilization
on the one hand and picture quality on the other, and so it was chosen for this set of
experiments. Table 16 lists the results.
Table 16: Variable Bandwidth Results
Spec. Bandwidth CPU Util. (%) Subjective Findings
19,200 30 Smooth, significant pixelization upon movement
38,400 Not Measured Smooth, some pixelization upon movement
51,200 Not Measured Occasional frame skips, pixelization on gross movement
76,800 Not Measured Frequent frame skips, pixelization with extreme movement
128,000 Not Measured Frequent frame skips, high quality picture
192,000 Not Measured Frequent frame skips, high quality picture
256,000 Not Measured Very frequent frame skips, high quality picture
384,000 30 Constant frame skip, high quality picture
The trade-off seems to be smooth video playback versus greater pixelization upon
movement. If the video image is very still over time, a high-quality picture can be
obtained at practically all of the specified bandwidths. The sweet spot for a frame
quality of 80 is apparently somewhere between 38,400 and 51,200 bytes per second,
although 38,400 is quite acceptable if momentary pixelization upon a video
conference participant's sudden movement can be tolerated.
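The converse regime, capping bandwidth while letting the player throttle quality, is a one-line change to the earlier sketch; passing 0 as the quality argument lets Flash vary compression dynamically to honor the cap.

```actionscript
// Sketch: capping bandwidth and letting Flash throttle frame quality dynamically.
var cam:Camera = Camera.get();
cam.setQuality(38400, 0);   // ~38,400 bytes/sec cap; 0 = quality adjusts as needed
```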
Such settings also have the benefit of keeping the bandwidth usage capped relatively
low without significantly sacrificing image quality. This is of particular benefit as we
assume that keeping the bandwidth usage in check becomes increasingly necessary
when scaling the video conference to greater numbers of participants.
Additionally, several ad hoc tests indicate that a low key-frame interval tends to
contribute to increased frame skipping, whereas high key-frame intervals, particularly
ones higher than the frame rate, result in decreased frame skipping but introduce
somewhat longer normalization times in cases where the video image has become
pixelated due to motion.
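In code, the key-frame interval is set independently of the bandwidth/quality regime; per the ad hoc observations above, an interval above the capture frame rate tended to reduce frame skipping at the cost of slower recovery once the image had pixelated.

```actionscript
// Sketch: a key-frame interval above the frame rate (48 > 24).
var cam:Camera = Camera.get();
cam.setMode(320, 240, 24, false);
cam.setKeyFrameInterval(48);   // one full (key) frame every 48 frames, i.e. every 2 seconds
```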
Although these tests were not repeated on the Apple iSight camera or the Sony DCR-
TRV460 camcorder, the results obtained here lead to the configurations chosen for
Test Configuration B and C in the encoding/decoding tests described earlier, which
replicate a subset of these batteries for the two additional cameras.
SCALING
The other major goal of our research was to determine both the feasibility and
extensibility of scaling Flash video conferencing to support up to 10 simultaneous
participants. To do this, we conducted a number of tests using our FCSDiag suite of
test applications.
Due to both screen size and network bandwidth constraints, we are primarily looking
at a resolution of 160 x 120 for each participant’s video stream. The principal
considerations in finding optimal settings for supporting a 10-participant conference
are maintaining a relatively low CPU utilization, as each machine will need to encode
its own stream as well as decode 10 incoming streams, and minimizing network
bandwidth utilization, as aggregate bandwidth requirements grow quadratically with
the number of participants.
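To make the scaling concern concrete, a rough back-of-the-envelope estimate, assuming every participant publishes one stream at the same byte rate, can be sketched as:

```actionscript
// Rough bandwidth estimate for an n-participant conference where every client
// publishes one stream and subscribes to the other n - 1, all at streamBps bytes/sec.
function perClientBps(n:Number, streamBps:Number):Number {
    return streamBps * n;   // 1 upstream + (n - 1) downstream
}
function serverAggregateBps(n:Number, streamBps:Number):Number {
    return n * perClientBps(n, streamBps);   // grows with n squared
}
// e.g. 10 participants at 11,000 bytes/sec per stream:
trace(perClientBps(10, 11000) + " bytes/sec per client");
```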
Some of the initial scaling tests documented in the following tables were performed
prior to our determination that the Apple iSight camera performed significantly better
in reducing the CPU overhead involved in video acquisition. These initial tests were
done using the Creative NX Ultra camera with relatively naïve video settings and
yielded marginally acceptable results.
Significantly better results were obtained in tests conducted with the Apple iSight
camera incorporating refinements in the video configuration learned through testing.
Our efforts in determining optimal configurations for scaling video conferences to 10
participants are described below.
All tests were conducted with the test machine publishing its own stream, and
subscribing to and displaying n (varying between 1 and 10) streams with identical
video settings being broadcast from a second participant machine through Flash
Communication Server. This effectively simulates the load of a participant machine in
a conference with n + 1 participants where the participant machine is not monitoring
a loopback stream.
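The subscription side of this setup can be sketched as below; this assumes the stage holds Video instances named video1..videoN and that the remote streams are published as "participant1".."participantN", all of which are illustrative names rather than FCSDiag's actual ones.

```actionscript
// Sketch: subscribing to and displaying n remote streams, as in the scaling tests.
function subscribeAll(nc:NetConnection, count:Number):Void {
    for (var i:Number = 1; i <= count; i++) {
        var ns:NetStream = new NetStream(nc);
        ns.setBufferTime(0.01);              // minimal buffer, as in the test configurations
        ns.play("participant" + i);          // hypothetical published stream name
        var vid:Video = _root["video" + i];  // hypothetical Video instance on the stage
        vid.attachVideo(ns);
    }
}
```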
A second participant machine was used to provide the streams to be subscribed on
the test machine as this allowed us to focus the second machine’s camera (Logitech
QuickCam Orbit) on ambient street traffic outside our facility. With large numbers of
video feeds, it was significantly easier to assess frame skipping when imaging steadily
moving vehicles rather than facial movements. Audio data was collected and
published by both machines from ambient sound in the room.
Our initial test (Test 1) was conducted with the configuration shown in Table 17,
chosen to sacrifice video quality momentarily if necessary to contain bandwidth usage
to reasonable limits.