This document provides a summary of the fundamentals of AV Foundation, including analog and digital media, encoding, containers, and metadata. It then discusses the different iOS media frameworks over time, including Core Audio, Media Player, and AV Foundation. Finally, it covers topics within AV Foundation like assets and players, capture, and HTTP Live Streaming. The document is intended to provide a high-level roadmap and introduction to working with media on iOS using AV Foundation.
Introduction to AV Foundation (CocoaConf, Aug '11)
1. Introduction to AV Foundation
Chris Adamson — @invalidname — http://www.subfurther.com/blog
CocoaConf ’11 — Columbus, OH — August 12, 2011
2. Road Map
✤ Fundamentals of Dynamic Media
✤ iOS Media Frameworks
✤ Playback
✤ Capture
✤ Editing
✤ Advanced Stuff
4. Analog
✤ An analog signal has a measurable value that is continuously variable
✤ Contrast with digital, or “discrete,” signals
5. Audio
✤ Phonograph — Grooves vibrate a needle, which is amplified to a speaker
✤ Telephone — Voice vibrates microphone membrane; vibration is transmitted as voltage and reproduced by vibrating headset speaker
✤ Radio — Audio signal modulated on a carrier wave
6. Film
✤ Light projected through translucent film frames onto screen
✤ Each frame held in place briefly (≈ 1/24 sec)
✤ Eye sees a moving image due to “persistence of vision”
✤ Sound may be out-of-band
7. Television
✤ Chrominance and luminance sent as continuous AM signal
✤ CRT gun sweeps across screen in zig-zag pattern, illuminating phosphors
✤ Sound is FM in adjacent spectrum
8. Digital Media
✤ Represents a continuous signal numerically
✤ Audio — sample the signal at some frequency
✤ Video — digital images as frames
✤ Other kinds of samples — text (captions, subtitles), metadata, web links, executable code, etc.
9. Encoding
✤ How do we turn an analog signal into numbers?
✤ Audio — Pulse Code Modulation (PCM). Each sample represents the amplitude of the audio signal at a specific time.
✤ Compressed audio — Lossless and lossy transformations to and from PCM
✤ Video — Series of images (e.g., M-JPEG), or keyframe image (I-frame) followed by deltas (P-frames and B-frames)
✤ Other media — Text samples are just strings
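To make the PCM idea concrete, here is a minimal sketch in C (not from the deck; the 440 Hz tone and 8000 Hz sample rate are arbitrary choices):

#include <math.h>
#include <stdint.h>
#include <stdio.h>

int main (int argc, const char *argv[]) {
    const double sampleRate = 8000.0;   // samples per second (arbitrary)
    const double frequency = 440.0;     // tone pitch in Hz (arbitrary)
    const int sampleCount = 8000;       // one second of audio
    int16_t samples[8000];
    // Each PCM sample is the signal’s amplitude at one instant in time
    for (int i = 0; i < sampleCount; i++) {
        double t = i / sampleRate;
        samples[i] = (int16_t) (INT16_MAX * sin (2.0 * M_PI * frequency * t));
    }
    printf ("second sample: %d\n", samples[1]);  // samples[0] is 0, sin(0)
    return 0;
}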
10. Containers
✤ Transport and/or storage of encoded streams
✤ Examples: MP3, AIFF, QuickTime Movie, Core Audio Format, .mp4, MPEG-2 transport stream
✤ Containers may be optimized for streaming, editing, end-user delivery, etc.
13. Metadata
✤ Information related to the media data other than the signal itself
✤ Song title/album/artist, movie title, TV episode title/series, etc.
✤ Some containers support metadata; otherwise it is provided out-of-band
14. Keep in mind…
✤ Different codecs may go in different containers
✤ A network stream is a container
✤ A media stream and a network stream are two different things
✤ Containers can contain multiple media streams
✤ A stream’s data is not necessarily in the container file
✤ Media samples may be in distinct places, or interleaved
16. iPhone 2 Media Frameworks
✤ Core Audio / OpenAL — Low-level audio streaming
✤ Media Player — Full-screen video player
✤ AV Foundation — Obj-C wrapper for audio playback (2.2 only)
17. iPhone 3 Media Frameworks
✤ Core Audio / OpenAL — Low-level audio streaming
✤ Media Player — iPod library search/playback
✤ AV Foundation — Obj-C wrapper for audio playback, recording
18. iOS 4 Media Frameworks
✤ Core Audio / OpenAL — Low-level audio streaming
✤ Media Player — iPod library search/playback
✤ AV Foundation — Audio / video capture, editing, playback, export…
✤ Core Video — Quartz effects on moving images
✤ Core Media — Objects for representing media times, formats, buffers
28. “Boom Box” APIs
✤ Simple API for playback, sometimes recording
✤ Little or no support for editing, mixing, metadata, etc.
✤ Example: HTML 5 <audio> tag
30. “Streaming” APIs
✤ Use “stream of audio” metaphor
✤ Strong support for mixing, effects, other real-time operations
✤ Example: Core Audio and AV Foundation (capture)
32. “Document” APIs
✤ Use “media document” metaphor
✤ Strong support for editing
✤ Mixing may be a special case of editing
✤ Example: QuickTime and AV Foundation (playback and editing)
33. AV Foundation Classes
✤ Capture
✤ Assets and compositions
✤ Playback, editing, and export
✤ Legacy classes
36. AVAsset
✤ A collection of time-based media data
✤ Sound, video, text (closed captions, subtitles, etc.)
✤ Each distinct media type is contained in a track
✤ An asset represents the arrangement of the tracks; the tracks represent the traits of the media’s presentation (volume, pan, affine transforms, opacity, etc.)
✤ Asset ≠ media. Track ≠ media. Media = media.
✤ Also contains metadata (where common to all tracks)
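As an illustration of the asset/track relationship (a sketch, not code from the deck; assumes an AVAsset named asset already exists):

// Log each track’s ID, media type, and timing within the asset
for (AVAssetTrack *track in asset.tracks) {
    NSLog (@"track %d: type %@, starts at %f sec, lasts %f sec",
           track.trackID,
           track.mediaType,
           CMTimeGetSeconds (track.timeRange.start),
           CMTimeGetSeconds (track.timeRange.duration));
}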
37. AVAsset subclasses
✤ AVURLAsset — An asset created from a URL, such as a song or movie file or network document/stream
✤ AVComposition — An asset created from assets in multiple files, used to combine and present media together. Used for editing.
38. AVPlayer
✤ Provides the ability to play an asset
✤ play, pause, seekToTime: methods; currentTime, rate properties
✤ Init with URL or with AVPlayerItem
NSURL *url = [NSURL URLWithString:
    @"http://www.subfurther.com/video/running-start-iphone.m4v"];
AVURLAsset *asset = [AVURLAsset URLAssetWithURL:url options:nil];
AVPlayerItem *playerItem = [AVPlayerItem playerItemWithAsset:asset];
player = [[AVPlayer playerWithPlayerItem:playerItem] retain];
39. AVPlayerLayer (or not)
✤ CALayer used to display video from a player
✤ Check that the media has video
NSArray *visualTracks = [asset tracksWithMediaCharacteristic:
    AVMediaCharacteristicVisual];
if ((! visualTracks) || ([visualTracks count] == 0)) {
    playerView.hidden = YES;
    noVideoLabel.hidden = NO;
}
40. AVPlayerLayer (no really)
✤ If you have video, create AVPlayerLayer from AVPlayer.
✤ Set bounds and video “gravity” (bounds-filling behavior)
else {
    playerView.hidden = NO;
    noVideoLabel.hidden = YES;
    AVPlayerLayer *playerLayer = [AVPlayerLayer
        playerLayerWithPlayer:player];
    [playerView.layer addSublayer:playerLayer];
    playerLayer.frame = playerView.layer.bounds;
    playerLayer.videoGravity = AVLayerVideoGravityResizeAspect;
}
43. HTTP Live Streaming
✤ Audio / Video network streaming standard developed by Apple
✤ Replaces RTP/RTSP
✤ Built-in support in iOS (AV Foundation, Media Player) and Mac OS X 10.6 (QTKit)
✤ Required for apps that stream more than 10 MB over the cellular network
44. How HTTP Live Streaming works
✤ Segmenting server splits source media into separate files (usually .m4a for audio-only, .ts for A/V), usually of about 10 seconds each, and creates an .m3u8 playlist file
✤ Playlist may point to bandwidth-appropriate playlists
✤ Clients download the playlist, fetch the segments, and queue them up
✤ Server updates the playlist periodically with the latest segments; clients refresh the playlist, fetch and queue new segments
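For illustration only, a minimal media playlist might look like this (the segment names and durations are hypothetical):

#EXTM3U
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10,
segment0.ts
#EXTINF:10,
segment1.ts
#EXTINF:10,
segment2.ts
#EXT-X-ENDLIST

A live stream omits #EXT-X-ENDLIST and keeps appending segments; a variant playlist instead lists other playlists, one per bandwidth tier.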
46. HTTP Live Streaming wins
✤ Works with existing file servers and content delivery networks
✤ Port 80 is never blocked
✤ Adapts to changes in available bandwidth
✤ Can be encrypted
✤ Has been submitted as a proposed IETF standard
✤ http://tools.ietf.org/html/draft-pantos-http-live-streaming-04
47. HTTP Live Streaming fails
✤ Not really “live” when buffer can be a minute long
✤ Can’t watch a game on TV and listen to HLS web radio for the audio
✤ Modest adoption outside of the Apple world
✤ VLC 1.2, third-party Silverlight project, partial support in Android 3.0
48. HLS in AVF
NSURL *url = [NSURL URLWithString:
    @"http://qthttp.apple.com.edgesuite.net/11piubpwiqubf06/sl_vod.m3u8"];
AVURLAsset *asset = [AVURLAsset URLAssetWithURL:url options:nil];
✤ Just use the .m3u8 URL to create an AVAsset like any other NSURL
✤ Do not check for presence of visual tracks immediately
✤ AVF needs to start parsing the stream before it knows what kinds of tracks it has
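A sketch of deferring that check with AVAsset’s loadValuesAsynchronouslyForKeys:completionHandler: (the logging here is illustrative):

NSArray *keys = [NSArray arrayWithObject:@"tracks"];
[asset loadValuesAsynchronouslyForKeys:keys completionHandler:^{
    NSError *error = nil;
    AVKeyValueStatus status = [asset statusOfValueForKey:@"tracks"
                                                   error:&error];
    if (status == AVKeyValueStatusLoaded) {
        // Now it is safe to look for visual tracks
        NSArray *visualTracks = [asset tracksWithMediaCharacteristic:
            AVMediaCharacteristicVisual];
        NSLog (@"found %d visual tracks", (int) [visualTracks count]);
    }
}];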
50. Media Capture
✤ AV Foundation capture classes for audio / video capture, along with still image capture
✤ Programmatic control of white balance, autofocus, zoom, etc.
✤ Does not exist on the simulator. AV Foundation capture apps can only be compiled for and run on the device.
✤ API design is borrowed from QTKit on the Mac
53. Capture basics
✤ Create an AVCaptureSession to coordinate the capture
✤ Investigate available AVCaptureDevices
✤ Create AVCaptureDeviceInput and connect it to the session
✤ Optional: set up an AVCaptureVideoPreviewLayer
✤ Optional: connect AVCaptureOutputs
✤ Tell the session to start recording
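The session creation in the first step is not shown in the excerpted slides; it is essentially this (the preset choice here is arbitrary):

AVCaptureSession *captureSession = [[AVCaptureSession alloc] init];
captureSession.sessionPreset = AVCaptureSessionPresetHigh;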
55. Getting capture device and input
NSError *setUpError = nil;
AVCaptureDevice *videoDevice = [AVCaptureDevice
    defaultDeviceWithMediaType:AVMediaTypeVideo];
if (videoDevice) {
    NSLog (@"got videoDevice");
    AVCaptureDeviceInput *videoInput = [AVCaptureDeviceInput
        deviceInputWithDevice:videoDevice error:&setUpError];
    if (videoInput) {
        [captureSession addInput:videoInput];
    }
}
Note 1: You may also want to check for AVMediaTypeMuxed
Note 2: Do not assume devices based on model (c.f. iPad Camera Connection Kit)
56. Creating a video preview layer
AVCaptureVideoPreviewLayer *previewLayer = [AVCaptureVideoPreviewLayer
    layerWithSession:captureSession];
previewLayer.frame = captureView.layer.bounds;
previewLayer.videoGravity = AVLayerVideoGravityResizeAspect;
[captureView.layer addSublayer:previewLayer];
Keep in mind that the iPhone cameras have a portrait orientation
57. Setting an output
captureMovieOutput = [[AVCaptureMovieFileOutput alloc] init];
if (! captureMovieURL) {
    captureMoviePath = [getCaptureMoviePath() retain];
    captureMovieURL = [[NSURL alloc]
        initFileURLWithPath:captureMoviePath];
}
NSLog (@"recording to %@", captureMovieURL);
[captureSession addOutput:captureMovieOutput];
We’ll use the captureMovieURL later…
58. Start capturing
[captureSession startRunning];
recordButton.selected = YES;
if ([[NSFileManager defaultManager]
        fileExistsAtPath:captureMoviePath]) {
    [[NSFileManager defaultManager]
        removeItemAtPath:captureMoviePath error:nil];
}
// note: must have a delegate
[captureMovieOutput
    startRecordingToOutputFileURL:captureMovieURL
    recordingDelegate:self];
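The delegate must implement AVCaptureFileOutputRecordingDelegate; a minimal version of its required method (the logging here is illustrative):

- (void) captureOutput:(AVCaptureFileOutput *)captureOutput
didFinishRecordingToOutputFileURL:(NSURL *)outputFileURL
       fromConnections:(NSArray *)connections
                 error:(NSError *)error
{
    if (error) {
        NSLog (@"recording failed: %@", error);
    } else {
        NSLog (@"finished recording to %@", outputFileURL);
    }
}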
60. More fun with capture
✤ Can analyze video data coming off the camera with the AVCaptureVideoDataOutput class
✤ Can provide uncompressed frames to your AVCaptureVideoDataOutputSampleBufferDelegate
✤ The callback provides you with a CMSampleBufferRef
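A rough sketch of that callback, pulling the frame out as a Core Video pixel buffer (the logging here is illustrative):

- (void) captureOutput:(AVCaptureOutput *)captureOutput
 didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
        fromConnection:(AVCaptureConnection *)connection
{
    // Get this frame’s pixel data as a Core Video buffer
    CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer (sampleBuffer);
    CVPixelBufferLockBaseAddress (imageBuffer, 0);
    size_t width = CVPixelBufferGetWidth (imageBuffer);
    size_t height = CVPixelBufferGetHeight (imageBuffer);
    NSLog (@"got a %zu x %zu frame", width, height);
    CVPixelBufferUnlockBaseAddress (imageBuffer, 0);
}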
62. Core Media
✤ C-based framework containing structures that represent media samples and media timing
✤ Opaque types: CMBlockBuffer, CMBufferQueue, CMFormatDescription, CMSampleBuffer, CMTime, CMTimeRange
✤ Handful of convenience functions to work with these
✤ Buffer types provide wrappers around possibly-fragmented memory; time types provide timing at arbitrary precision
72. AVComposition
✤ An AVAsset that gets its tracks from multiple file-based sources
✤ To create a movie, you typically use an AVMutableComposition
composition = [[AVMutableComposition alloc] init];
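A sketch of a simple edit, assuming a source AVAsset named sourceAsset already exists; insertTimeRange:ofAsset:atTime:error: copies all of the source’s tracks into the composition:

NSError *editError = nil;
CMTimeRange firstFiveSeconds = CMTimeRangeMake (kCMTimeZero,
                                                CMTimeMake (5, 1));
if (! [composition insertTimeRange:firstFiveSeconds
                           ofAsset:sourceAsset
                            atTime:kCMTimeZero
                             error:&editError]) {
    NSLog (@"edit failed: %@", editError);
}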
74. CMTime
✤ CMTime contains a value and a timescale (similar to QuickTime)
✤ Time scale is how the time is measured: “nths of a second”
✤ Time in seconds = value / timescale
✤ Allows for exact timing of any kind of media
✤ Different tracks of an asset can and will have different timescales
✤ Convert with CMTimeConvertScale()
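For example (the values here are illustrative):

// 2.5 seconds at a timescale of 600 (a common QuickTime default)
CMTime twoAndAHalf = CMTimeMake (1500, 600);   // 1500 / 600 = 2.5 sec
// The same instant re-expressed at a 44100 Hz audio timescale
CMTime inAudioUnits = CMTimeConvertScale (twoAndAHalf, 44100,
                                          kCMTimeRoundingMethod_Default);
NSLog (@"%lld / %d = %f sec", inAudioUnits.value, inAudioUnits.timescale,
       CMTimeGetSeconds (inAudioUnits));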
76. Export
✤ Create an AVAssetExportSession
✤ Must set outputURL and outputFileType properties
✤ Inspect possible types with supportedFileTypes property (list of AVFileType… strings in docs)
✤ Begin export with exportAsynchronouslyWithCompletionHandler:
✤ This takes a block, which will be called on completion, failure, cancellation, etc.
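A minimal sketch, assuming composition and a writable exportURL already exist (the preset choice here is arbitrary):

AVAssetExportSession *exportSession = [[AVAssetExportSession alloc]
    initWithAsset:composition
       presetName:AVAssetExportPresetMediumQuality];
exportSession.outputURL = exportURL;
exportSession.outputFileType = AVFileTypeQuickTimeMovie;
[exportSession exportAsynchronouslyWithCompletionHandler:^{
    if (exportSession.status == AVAssetExportSessionStatusCompleted) {
        NSLog (@"export finished");
    } else {
        NSLog (@"export failed: %@", exportSession.error);
    }
}];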
78. Coming Up in Advanced…
✤ Capture-time data processing
✤ Writing raw samples
✤ Reading raw samples
✤ Multi-track editing with effects
79. Q&A Time
✤ Cleaned-up code will be available on my blog:
✤ http://www.subfurther.com/blog
✤ invalidname [at] gmail.com
✤ @invalidname
✤ Check out the Rough Cut of my Core Audio book too.
✤ Get the WWDC sessions on AV Foundation