This work was presented at the Open Standards session at IEEE ISMAR 2013. It gives a detailed overview, with working examples, of where Augmented Reality and Computer Vision currently stand on the Web Platform, and it sets out a working definition of the Augmented Web.
Web Standards for AR workshop at ISMAR13
1. https://buildAR.com - image credit
Web Standards for AR
An intro to the latest developments on the Augmented Web Platform
2. Who's making these outlandish claims?
Rob Manson @nambor
CEO of MOB-labs, the creators of buildAR.com
Chair of the W3C's Augmented Web Community Group
Invited Expert with the ISO, W3C & the Khronos Group
Author of “Getting Started with WebRTC” (Packt Publishing)
15. So what the hell is a Stream processing pipeline?
The vision we proposed in 2010 is now here!
AR Standards Workshop in Seoul 2010 – Rob Manson
16. Stream processing pipelines
1. Get Stream
2. Connect to a Stream pipeline
A way to connect a Stream to an ArrayBuffer
3. Get ArrayBuffer
4. Populate a scene Array with ArrayBuffer Views
5. Move through Views to process data
6. Output events and metadata
7. Update UI and/or send requests
17. Now let's look at the more specific MediaStream processing pipeline
18. MediaStream processing pipelines
1. Get MediaStream
MediaStream from getUserMedia or RTCPeerConnection
2. Connect to a MediaStream pipeline
Canvas/Video, Image Capture, Recording,
ScriptProcessorNode or Video/Shader pipelines
3. Get ArrayBuffer
new ArrayBuffer()
4. Populate a scene Array with ArrayBuffer Views
var scene = [new Uint8ClampedArray(buffer), ...]
5. Move through Views to process data
scene.forEach(function (view) { …process(view)… })
6. Output events and metadata
return results
7. Update UI and/or send requests
Wave hands inspirationally here!
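To make those steps concrete, here is a minimal sketch in 2013-era JavaScript. connectPipeline() and process() are hypothetical placeholders for the pipeline-specific and computer-vision code (the real pipelines are covered on the following slides); they are not standard APIs.
// A minimal sketch of steps 1-7, assuming the prefixed getUserMedia() of late 2013.
navigator.getUserMedia = navigator.getUserMedia ||
  navigator.webkitGetUserMedia || navigator.mozGetUserMedia;
navigator.getUserMedia({ video: true }, function (stream) { // 1. Get MediaStream
  var buffer = connectPipeline(stream);                     // 2-3. hypothetical: pipeline -> ArrayBuffer
  var scene = [new Uint8ClampedArray(buffer)];              // 4. Views onto the buffer
  scene.forEach(function (view) {                           // 5. Move through the Views
    var results = process(view);                            // hypothetical CV routine
    window.dispatchEvent(                                   // 6. Output events and metadata
      new CustomEvent('ar-results', { detail: results }));
  });
  // 7. Update the UI and/or send requests from the 'ar-results' event handler
}, function (error) { console.error(error); });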
19. This is where the APIs are still evolving
20. This is where the Augmented Web comes to life
21. This is where the fun is!
(Slides 19–21 repeat the pipeline from slide 18 under these headings.)
26–29. Now we've broken through the “binary barrier”
The Video/Canvas pipeline
stream -> <video> -> <canvas> -> image data -> array buffer -> process -> output
The MediaStream Image Capture API pipeline
stream -> track -> image capture -> image data -> array buffer -> process -> output
The MediaStream Recording API pipeline
stream -> recorder -> blob -> file reader -> array buffer -> process -> output
Plus other non-MediaStream pipelines:
????? -> Web Sockets/XHR/File/JS -> array buffer -> process -> output
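As a worked example, the Recording pipeline can be sketched end-to-end in a few lines, assuming a browser that implements the MediaStream Recording API (only Firefox shipped it in late 2013); process() is a hypothetical placeholder. Note that the ArrayBuffers it yields hold encoded media, not raw pixels.
// stream -> recorder -> blob -> file reader -> array buffer -> process
function recordingPipeline(stream, process) {
  var recorder = new MediaRecorder(stream);
  recorder.ondataavailable = function (event) {  // recorder -> Blob
    var reader = new FileReader();
    reader.onload = function () {
      process(reader.result);                    // reader.result is an ArrayBuffer
    };
    reader.readAsArrayBuffer(event.data);        // Blob -> ArrayBuffer
  };
  recorder.start(100);                           // deliver a Blob roughly every 100ms
}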
30. Now let's take a closer look at the Video/Canvas MediaStream pipeline
31–32. The Video/Canvas MediaStream pipeline
Follow along on GitHub: github.com/buildar/getting_started_with_webrtc
Access the camera (or some other stream) source
a. getUserMedia()
Set up a <video> element in the DOM
a. declaratively, then via getElementById() or similar
b. createElement(“video”) (no need to appendChild())
Pipe the camera stream into the <video>
a. video.src = URL.createObjectURL(stream)
Set up a <canvas> element in the DOM
a. declaratively, then via getElementById() or similar
b. createElement(“canvas”) then appendChild()
Get a 2D drawing context
a. context = canvas.getContext('2d');
Draw the current <video> frame onto the <canvas>
a. context.drawImage(video, x, y, width, height);
Get an RGBA Uint8ClampedArray of the pixels
a. context.getImageData(x, y, width, height).data.buffer;
Burn CPU (not GPU) cycles
a. for (var i = 0; i < view.length; i += 4) { …process pixel i… }
NOTE: Integrate other streams & sensor data here
Render results
a. using HTML/JS/CSS
b. using another <canvas> and drawImage()
c. using WebGL
d. or a combination of all of the above
Drive the loop with setInterval(), setTimeout() or requestAnimationFrame()
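Putting those steps together, a minimal sketch of the whole Video/Canvas pipeline, using the prefixed getUserMedia() and URL.createObjectURL() of late 2013; processFrame() is a hypothetical placeholder for your computer vision routine.
navigator.getUserMedia = navigator.getUserMedia ||
  navigator.webkitGetUserMedia || navigator.mozGetUserMedia;
var video = document.createElement('video');    // no need to add it to the DOM
var canvas = document.createElement('canvas');
document.body.appendChild(canvas);
var context = canvas.getContext('2d');
navigator.getUserMedia({ video: true }, function (stream) {
  video.src = URL.createObjectURL(stream);      // pipe the camera into <video>
  video.play();
  requestAnimationFrame(tick);
}, function (error) { console.error(error); });
function tick() {
  if (video.videoWidth > 0) {                   // wait for the first decoded frame
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    context.drawImage(video, 0, 0, canvas.width, canvas.height);
    var view = context.getImageData(0, 0, canvas.width, canvas.height).data;
    processFrame(view);                         // RGBA Uint8ClampedArray, 4 bytes per pixel
  }
  requestAnimationFrame(tick);                  // burn those CPU cycles every frame
}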
33. Now let's compare that to the WebGL/GLSL-based Video/Shader pipeline
34. The Video/Shader MediaStream pipeline
Access the camera (or some other stream) source
a. getUserMedia()
Set up a <video> element in the DOM
a. declaratively, then via getElementById() or similar
b. createElement(“video”) (no need to appendChild())
Pipe the camera stream into the <video>
a. video.src = URL.createObjectURL(stream)
Set up a <canvas> (createElement(“canvas”) then appendChild())
Compile shaders from text in <script> elements in the DOM
a. declaratively, then via getElementById() or similar
Get a 3D drawing context
a. canvas.getContext('webgl') || canvas.getContext('experimental-webgl');
Load each <video> frame into the fragment shader
a. gl.texImage2D(..., video)
Burn GPU cycles
a. the fragment shader runs in parallel for every pixel
NOTE: Integrate other streams & sensor data here
Render results
a. using HTML/JS/CSS
b. using another <canvas> and drawImage()
c. using WebGL
d. or a combination of all of the above
Drive the loop with setInterval(), setTimeout() or requestAnimationFrame()
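For comparison, a hedged sketch of the Video/Shader approach: a full-screen quad textured with the current <video> frame. The fragment shader below just copies pixels; that one line of GLSL is where per-pixel vision kernels would run in parallel on the GPU.
var canvas = document.createElement('canvas');
document.body.appendChild(canvas);
var gl = canvas.getContext('webgl') || canvas.getContext('experimental-webgl');
function compile(type, source) {                // build one shader stage
  var shader = gl.createShader(type);
  gl.shaderSource(shader, source);
  gl.compileShader(shader);
  return shader;
}
var program = gl.createProgram();
gl.attachShader(program, compile(gl.VERTEX_SHADER,
  'attribute vec2 pos; varying vec2 uv;' +
  'void main() { uv = (pos + 1.0) * 0.5; gl_Position = vec4(pos, 0.0, 1.0); }'));
gl.attachShader(program, compile(gl.FRAGMENT_SHADER,
  'precision mediump float; uniform sampler2D frame; varying vec2 uv;' +
  'void main() { gl_FragColor = texture2D(frame, uv); }')); // CV kernel goes here
gl.linkProgram(program);
gl.useProgram(program);
// Full-screen quad as two triangles.
gl.bindBuffer(gl.ARRAY_BUFFER, gl.createBuffer());
gl.bufferData(gl.ARRAY_BUFFER, new Float32Array(
  [-1, -1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1]), gl.STATIC_DRAW);
var pos = gl.getAttribLocation(program, 'pos');
gl.enableVertexAttribArray(pos);
gl.vertexAttribPointer(pos, 2, gl.FLOAT, false, 0, 0);
// Video frames are non-power-of-two textures, so clamp and skip mipmaps.
gl.bindTexture(gl.TEXTURE_2D, gl.createTexture());
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
gl.pixelStorei(gl.UNPACK_FLIP_Y_WEBGL, true);
function drawFrame(video) { // call once per frame, e.g. from requestAnimationFrame
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, video);
  gl.drawArrays(gl.TRIANGLES, 0, 6);
}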
37. Views are moveable windows into the raw data streams
1. Get MediaStream
MediaStream from getUserMedia or RTCPeerConnection
2. Connect to a MediaStream pipeline
Canvas/Video, Image Capture or Recording pipelines
3. Get ArrayBuffer
new ArrayBuffer()
4. Populate a scene Array with ArrayBuffer Views
var scene = [new Uint8ClampedArray(buffer), ...]
5. Move through Views to process data
scene.forEach(function (view) { …process(view)… })
6. Output events and metadata
return results
7. Update UI and/or send requests
Wave hands inspirationally here!
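A small sketch of why Views act as moveable windows: several typed-array Views can share a single ArrayBuffer, each with its own byte offset and length, so moving through the data costs no copies.
var frame = new ArrayBuffer(640 * 480 * 4);                // one raw RGBA frame
var whole = new Uint8ClampedArray(frame);                  // window over everything
var row0  = new Uint8ClampedArray(frame, 0, 640 * 4);      // first row only
var row1  = new Uint8ClampedArray(frame, 640 * 4, 640 * 4);// second row only
row0[0] = 255;          // a write through one view...
console.log(whole[0]);  // 255 - ...is visible through the others (shared bytes)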
41–47. Mobile AR vs. the Augmented Web
Mobile AR at the end of 2009:
Apple had not yet opened up camera access on iOS through Objective-C (see the Letter from Ori).
QR codes, Geo AR and Fiducial Marker tracking were commonly available, but Natural Feature Tracking was still evolving.
There was limited, if any, support for rich 3D in the mainstream AR browsers.
The Augmented Web at the end of 2013:
Apple has not yet opened up camera access by adopting getUserMedia()/WebRTC (a 1.0 spec is expected within ~5 months).
QR codes, Geo AR and Fiducial Marker tracking are now available, and Natural Feature Tracking is quickly evolving in our R&D.
Rich 3D is supported in Chrome and Firefox on Android. Safari supports WebGL, and iOS devices support OpenGL, but iOS Safari does not have WebGL enabled (except for advertisers!!!?).
48–51. What's in the near future?
Integrating WebRTC and Visual Search
Using WebGL/GLSL to utilise GPU parallelism
Khronos Group's OpenVX
Khronos Group's Camera Working Group
Lots more demos to share! 8)