2. So you don't get bored…
• Grab your smartphone, tablet, or laptop
• Install the Socrative Student app
• On a computer, go to socrative.com
• Sign in (Google login or registration)
• Enter the code ictmiem
• Questions will appear there; answer them.
4. The nature of information (in video)
• Analog
• Requires digitization
• Analog-to-digital conversion: what is it?
• Which two main processes take place during analog-to-digital conversion (ADC)?
• Sampling
• Quantization; color spaces and models
5. + Sampling
• How to turn light and sound into numbers
• Formulas may appear!
• You will need to recall the physics basics from the computer graphics course
• All of this follows Section 2 of the Book*
• * The book recommended to you: Compression for Great Video and Audio, B. Waggoner, Elsevier, 2010, p. 15
6. Analog nature and digital form of a signal
• Light and sound are continuous analog signals.
• A signal for sensors (the eye, the ear, a camera, a microphone)
• Continuous means infinitely detailed
• We cannot record everything in discrete form ->
• How often will we measure the signal?
• That is the sampling rate
8. The Nyquist-Shannon theorem, also known as the Kotelnikov theorem
• The Nyquist-Shannon theorem
• In Russia it is called the Kotelnikov theorem
If an analog signal has a finite (band-limited) spectrum, it can be reconstructed uniquely and without loss from its samples taken at a rate greater than or equal to twice its highest frequency.
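The theorem can be illustrated with a small sketch. The function names below are illustrative, not from the slides; the second function uses the standard frequency-folding picture of aliasing for a pure tone.

```python
def min_sampling_rate(f_max_hz: float) -> float:
    """Lowest sampling rate the theorem allows for a signal band-limited to f_max_hz."""
    return 2.0 * f_max_hz


def apparent_frequency(f_signal_hz: float, f_sample_hz: float) -> float:
    """Frequency a sampled pure tone appears to have.

    Anything above half the sampling rate folds back into the range 0..fs/2.
    """
    f = f_signal_hz % f_sample_hz
    return min(f, f_sample_hz - f)


print(min_sampling_rate(20_000))         # 40000.0
print(apparent_frequency(1_000, 8_000))  # 1000: below fs/2, preserved
print(apparent_frequency(5_000, 8_000))  # 3000: above fs/2, aliased
```

A 5 kHz tone sampled at 8 kHz is indistinguishable from a 3 kHz tone, which is why the upper frequency of the signal has to be limited before sampling.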
10. The Nyquist criterion in space
Downscaling 320x240 -> 256x192:
• a 3 px detail becomes 256/320 × 3 = 2.4 px
• a 2 px detail becomes 256/320 × 2 = 1.6 px
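One way to read the numbers on this slide, sketched in Python: a detail needs at least two samples (pixels) to survive resampling. The 2 px threshold and the function names are assumptions made for this illustration; the slide itself only gives the arithmetic.

```python
def scaled_width_px(feature_px: float, src_width: int, dst_width: int) -> float:
    """Width of a detail after the frame is resized from src_width to dst_width."""
    return feature_px * dst_width / src_width


def survives_downscale(feature_px: float, src_width: int, dst_width: int) -> bool:
    """Assumed criterion: the detail must still span at least 2 pixels."""
    return scaled_width_px(feature_px, src_width, dst_width) >= 2.0


for feature in (3, 2):
    width = scaled_width_px(feature, 320, 256)
    print(feature, "px ->", round(width, 2), "px, survives:",
          survives_downscale(feature, 320, 256))
```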
11. Quantization
• How many values each discrete element (sample) can take: 2^x for x bits
• Remember the allowed ranges:
8-bit RGB: 0–255 (256 levels)
8-bit YCbCr (luma): 16–235 (a span of 219 steps), "by convention"
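A minimal sketch of these numbers. The linear mapping from full range to the 16–235 range is the usual scaling, shown here as an illustration; the function names are made up for this example.

```python
def levels(bits: int) -> int:
    """Number of distinct values a sample with the given bit depth can take."""
    return 2 ** bits


def full_to_video_range(value: int) -> int:
    """Map an 8-bit full-range value (0..255) onto the 16..235 range."""
    return 16 + round(value * 219 / 255)


print(levels(8))                 # 256
print(levels(10))                # 1024
print(full_to_video_range(0))    # 16
print(full_to_video_range(255))  # 235
```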
12. Dynamic range
• Remember what that is?
• For a monitor, the brightness ratio is about 4000:1
• In a movie theater there may be slightly more than 100 steps.
• People perceive differences, not absolute values
• More bits (10-16) may be allocated to luma
• Chroma is usually cut down (subsampling)
13. Working with video codecs
• Profile
• Level
• Data rate
H.264 High 4:2:0 @ 2.1
Codec: H.264; Profile: High; Chroma subsampling: 4:2:0; Level: 2.1
20. Frame size
Rule of "^0.75"
640 x 360 @ 1000 Kbps
1280 x 720 @ ?
(1280 x 720)/(640 x 360) = 4
4^0.75 = 2.828
1000 Kbps x 2.828 = 2828 Kbps
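The same calculation as a short Python sketch. The function name and signature are illustrative; the rule itself is a rule of thumb, not an exact law.

```python
def scale_bitrate(base_kbps: float, base_size: tuple, new_size: tuple,
                  exponent: float = 0.75) -> float:
    """Estimate the bitrate for a new frame size using the power-of-0.75 rule."""
    base_w, base_h = base_size
    new_w, new_h = new_size
    area_ratio = (new_w * new_h) / (base_w * base_h)
    return base_kbps * area_ratio ** exponent


estimate = scale_bitrate(1000, (640, 360), (1280, 720))
print(round(estimate))  # 2828
```

Four times the pixels calls for roughly 2.8 times the bitrate, not 4 times.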
21. Non-square pixels
• When the output format requires it (DVD).
• When the source format defines it.
• When there is strong motion along one axis.
• Computer video uses square pixels.
22. Color depth
• Typically: 8 bits per channel
• Professional codecs: 10, 12, or 16 bits per channel
• For capture and intermediate storage
• Playback: 8 bits per channel only.
25. B-frames
• Smallest size.
• Can be lower quality, since no other frame references them.
• The space saved goes to the I- and P-frames on which those same B-frames are built.
• They can be skipped, since no other frames depend on them.
• They make seeking easier: fewer P-frames to decode.
28. Some H.264 features
• CABAC entropy coding
• Context Adaptive Binary Arithmetic Coding
• +40% decoder load*
• +10-20% compression efficiency (at high compression)
• Not used in the Baseline profile.
* Compared with the previously used CAVLC.
Compression Efficiency
Compression efficiency is the key competitive feature between codecs and bitstreams. When people talk about a “best codec” they’re really talking “most efficient.” When a codec is described as “20 percent better than X” this means it can deliver equivalent quality at a 20 percent lower data rate, even if it’s used to improve quality at the original data rate.
Some authoring and special-use codecs don’t offer a data rate control at all, either because the data rate is fixed (as in DV) or because it is solely determined by image complexity (like PNG, Lagarith, and Cineform).
Back in 1875, the metric system was fully codified, and it defined the common kilo-, mega-, and giga- prefixes, each 1000x greater than the one before. These are power-of-ten numbers, and thus can be written in scientific notation.
However, computer technology is based on binary calculation and hence uses power-of-two numbers. Ten binary digits (1 × 2^10) is 1024, very close to three decimal digits (1 × 10^3), 1000. And so computer folks started calling 1024 “kilo.” And then extended that to mega, tera, and on to peta and so on.
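A quick sketch of how far the two readings of each prefix drift apart. The function name is illustrative only.

```python
def binary_excess_percent(power: int) -> float:
    """How much larger 1024**power is than 1000**power, in percent."""
    return (1024 ** power / 1000 ** power - 1) * 100


for name, power in [("kilo", 1), ("mega", 2), ("giga", 3), ("tera", 4)]:
    print(f"{name}: {binary_excess_percent(power):.2f}% larger in binary")
```

The gap is small for kilo but grows with every prefix, so it matters which convention a tool uses when it reports file sizes or data rates.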
All interframe compressed video codecs are variable in the sense that not every frame uses the same number of bits as every other frame.
Even codecs labeled “CBR” can vary data rate quite a bit throughout the file. The only true CBR video codecs are some fixed frame-size authoring codecs like DV.
A CBR codec will vary quality in order to maintain bitrate, and a VBR codec will vary bitrate in order to maintain quality.
What the decoder really cares about is getting new video data fast enough that it’s able to have a frame ready by the time that frame needs to be displayed, but doesn’t get so much video data that it can’t store all the frames it hasn’t decoded yet.
Video buffering verifier (VBV) defines what a decoder has to be able to handle for a particular profile@level, and hence how much variability the encoder could face.
A 4-second buffer at 1000 Kbps would be 4000 Kbits, or 500,000 bytes.
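The buffer arithmetic as a small sketch, assuming the decimal meaning of "kilo" (1 Kbit = 1000 bits), which is what the figures above imply. The function name is illustrative.

```python
def buffer_size(bitrate_kbps: int, seconds: int) -> tuple:
    """Return the buffer size as (kilobits, bytes) for a bitrate and duration."""
    kbits = bitrate_kbps * seconds
    size_bytes = kbits * 1000 // 8  # 1 Kbit = 1000 bits, 8 bits per byte
    return kbits, size_bytes


print(buffer_size(1000, 4))  # (4000, 500000)
```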
Think of a stream as a constant flow of water: the bits keep arriving at a fixed rate. If the video fades to black for a few seconds, there simply isn’t enough detail to spend those bits on.
Depending on format, it’s perfectly possible to use VBR files for streaming. This is sometimes used as a cost-saving measure in order to reduce total bandwidth consumption.
The average and peak rates are really independent axes. Average bitrate gets down to how big a file you want, and the peak is based on how much CPU you need to play it back. Take a DVD with video at a pretty typical 5 Mbps average and 9 Mbps peak. If a bunch of content gets added, meaning more minutes need to be stuffed into the same space, the average bitrate may need to be dropped to 4 Mbps, but the 9 Mbps maximum peak will stay the same. Conversely, if the project shifts from using replication to DVD-R without any change in content, the peak bitrate may be dropped to 6.5 Mbps for better compatibility with old DVD players, but the average wouldn’t need to change.
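A rough sketch of why only the average depends on running time. It assumes a nominal 4.7 GB single-layer disc and ignores audio and container overhead; both are simplifications made for this illustration, and the function name is hypothetical.

```python
def max_minutes(capacity_bytes: float, average_mbps: float) -> float:
    """Minutes of video that fit in capacity_bytes at the given average bitrate."""
    total_bits = capacity_bytes * 8
    seconds = total_bits / (average_mbps * 1_000_000)
    return seconds / 60


DISC_BYTES = 4.7e9  # assumed nominal capacity, video only

print(round(max_minutes(DISC_BYTES, 5), 1))  # 125.3
print(round(max_minutes(DISC_BYTES, 4), 1))  # 156.7
```

The peak bitrate never enters the calculation: it constrains the player, not the amount of content that fits.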
Quality-limited VBR is VBR where the data rate isn’t controlled at all; each frame gets as many bits as it needs in order to hit the quality target.
In some cases, this can be a fixed-QP encode. Other models can be a little more sophisticated in targeting a constant perceptual quality, for example allowing B-frames to have a higher QP.
Left: These graphs all aim to show the relationship between two values over time: data rate and the quantization parameter (QP).
Right: Three CBR encodes with the same bitrate, with a 1, 4, and 8 second buffer. The larger buffer results in a little more variability in bitrate (higher spikes) and a slight reduction in variability of QP.
Left: Three 500 Kbps encodes with different peaks: one CBR (so 500 Kbps peak), one VBR at 750 Kbps peak, the last VBR with 1500 Kbps peak. As bitrate variability increases, we see bigger changes in bitrate, but much smaller changes in QP.
Right: Three encodes with the same peak bitrate of 750 Kbps, but different averages of 250, 500, and 750. As you’d expect, the CBR encode has a flat bitrate and a pretty variable QP, always lower than the streams with a lower average. But the 500 and 750 Kbps streams match closely in terms of the hardest part of the video, where they both use the full 750 Kbps peak and hit the same QP.
Any live encoding will require 1-pass encoding.
The limitation of traditional 1-pass codecs is that they have no foreknowledge of how complex future content is and thus how to optimally set frame types and distribute bits.
Most VC-1 implementations are able to buffer up to 16 frames to analyze for scene changes, fades to/from black, and flash frames, and then set each frame for the optimum mode. Some also support lookahead rate control, where bitrate itself is tuned based on future frames in the buffer.
2-pass codecs first do an analysis pass, where they essentially do a scratch encode to figure out how hard each frame is.
This lets the codec see into the future, so it always knows when and how much to vary the data rate. 2-pass compression can yield substantial improvements in compression efficiency for VBR encodes, easily 50 percent with highly variable content.
Segment re-encoding is when the encoder is able to encode just specific sections of the video, leaving others alone.
More commonly, segment re-encoding is a manual process, with a compressionist picking particular shots that aren’t quite perfect and adjusting encoding settings for them. This is the domain of high-end, high-touch compression products.
The relationship between frame size and bitrate isn’t linear. Math Warning! There’s an old rule of thumb called the “Power of 0.75” that says data rate needs to be changed by the power of 0.75 of the relative change in frame size.
3: By squeezing the video to 75 percent of the original width (like 480 × 360 instead of 640 × 360) we were able to get better overall compression efficiency.
Square-pixel is slightly more efficient to encode, so it’s the right default when there’s not an obvious reason to do something else.
Modern codecs in common use for content delivery are all 8-bit per channel.
10-bit would mainly be used in making an archival or mezzanine file for later processing using a codec like Cineform, DNxHD, or ProRes.
(W149)
Modern formats let you specify duration per frame, changing frame rate on the fly.
MPEG-1 and MPEG-2 have a limited number of options (the native frame rates for PAL, NTSC, and film).
Frame rate has a less linear impact on bitrate than you might imagine.
Having a keyframe rate of “every 100” normally doesn’t mean you’ll get a keyframe at frames 1, 101, 201, 301, and so on. In this case, if you had scene changes triggering natural keyframes at 30 and 140, you’d get keyframes at 1, 30, 130, 140, 240, and so on.
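The keyframe placement described above can be sketched as a simple simulation. This is a simplified model, assuming every keyframe (natural or forced) resets the interval counter; real encoders apply more elaborate heuristics, and the function name is illustrative.

```python
def keyframe_positions(total_frames: int, max_interval: int,
                       natural_keyframes: list) -> list:
    """List the frames that become keyframes.

    A frame is a keyframe if it is a scene change (natural keyframe) or if
    max_interval frames have passed since the previous keyframe.
    """
    natural = set(natural_keyframes)
    keyframes = [1]  # the first frame is always a keyframe
    last = 1
    for frame in range(2, total_frames + 1):
        if frame in natural or frame - last >= max_interval:
            keyframes.append(frame)
            last = frame
    return keyframes


print(keyframe_positions(250, 100, []))         # [1, 101, 201]
print(keyframe_positions(250, 100, [30, 140]))  # [1, 30, 130, 140, 240]
```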
With a short GOP, a regular pulsing can be quite visible and annoying. Modern codecs have made great strides in reducing strobing, particularly when doing 2-pass, lookahead, or VBR encoding. Open GOP also reduces keyframe strobing.
MPEG-1 and MPEG-2 typically use a keyframe every half second.
For the web, the GOP lengths are typically 1–10 seconds.
A few tools let you manually specify particular frames to be keyframes. Like natural keyframes, inserted keyframes typically reset the GOP length target.
A B-frame is a bidirectional frame that can reference the previous and next I or P frame.
In an Open GOP, the first frame of the GOP can actually be a B-frame, with the I-frame following. That B-frame can reference the last P-frame in the previous GOP.
This can be a big help in reducing keyframe strobing.
H.264 High Profile can achieve quality at 100 Kbps that Cinepak required more than 1,000 Kbps for. Some codecs work well across a wide range of data rates; others have a minimum floor below which results aren’t acceptable.
Playback performance is typically proportional to pixels per second: height x width x fps. B-frames may also be dropped on playback on slower machines. The worst case would be that only keyframes would be played.
Great-looking content that can’t be seen doesn’t count.
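The pixels-per-second proportionality mentioned above can be used to compare the relative decode load of two encodes. This is a rough sketch: the function name is illustrative, and actual playback cost also depends on the codec, profile, and bitrate.

```python
def pixels_per_second(width: int, height: int, fps: float) -> float:
    """Raw pixel throughput the decoder has to sustain."""
    return width * height * fps


small = pixels_per_second(640, 360, 30)
large = pixels_per_second(1280, 720, 30)
print(large)          # 27648000
print(large / small)  # 4.0
```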