Tag Archives: ProRes

Some thoughts on performance optimization

I have a bad habit of trying to make my code highly optimized even when the unit is not fully finished, and I’m serious when I say it’s a bad habit, as it really slows down the development process and sometimes affects code quality…

For example here is an interesting trick I added to my ProRes encoder/decoder:

When you encode/decode a DC coefficient you should use one of 4 existing codebooks, and the codebook is chosen based on the value of the DC that was just coded, which obviously can be way bigger than 3. So what most of us would do here is just add a branch:

if (codebook > 3) {
    codebook = 3;
}

And in 999 of 1000 cases it probably will be the best solution. In my case, I have 3 * 30 DCs per slice (technically 3 * 32, but the first 2 DCs of each component don’t need any branching to know the codebook), or 364500 DCs per UHD frame, which is kinda bad…

As you might see, the max value of codebook is 3, i.e. (2^2 – 1), and this is a case where we can easily avoid branching, so I replaced that branch with this code:

codebook = (3 & (4 - !!(codebook & 0xfffc))) + ((codebook & 3) & (4 - !(codebook & 0xfffc)));

This avoids branching and in my particular case improves performance a bit (~0.3-0.5%), but to be fair this code would be hard to support, and I still doubt whether I need to push it.
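For anyone who wants to convince themselves that the one-liner really matches the branch, here is a small sketch that compares both over the whole 16-bit range. The function names are made up for this example, and the 16-bit limit is an assumption that follows from the 0xfffc mask (bits above 15 are simply not looked at).

```c
#include <assert.h>

/* Straightforward version: clamp the codebook index to 3 with a branch. */
static unsigned clamp_branch(unsigned codebook)
{
    if (codebook > 3) {
        codebook = 3;
    }
    return codebook;
}

/* Branchless version from the post. The 0xfffc mask only inspects bits 2..15,
   so equivalence is only claimed for inputs below 65536. */
static unsigned clamp_branchless(unsigned codebook)
{
    return (3 & (4 - !!(codebook & 0xfffc))) + ((codebook & 3) & (4 - !(codebook & 0xfffc)));
}

/* Exhaustively compares both versions over the 16-bit range. */
static void check_clamp_equivalence(void)
{
    for (unsigned v = 0; v <= 0xffff; ++v) {
        assert(clamp_branch(v) == clamp_branchless(v));
    }
}
```

Calling check_clamp_equivalence() once in a unit test is probably the cheapest way to keep a trick like this supportable.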

This is a really tricky example, which probably shouldn’t even be considered, especially when your unit is not 100% complete and when there are for sure a million other ways to optimize your code.
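A side note on the 364500 figure mentioned earlier: it can be reproduced with a few lines of arithmetic. The sketch below is an assumption-based reconstruction, not code from the encoder: it assumes 16x16 macroblocks, 8 macroblocks per slice and 3 components with 32 blocks each per slice, and the function name is hypothetical.

```c
#include <assert.h>

/* Number of DC coefficients per frame that still need the codebook clamp.
   Assumptions (not stated in the post): 16x16 macroblocks, 8 macroblocks per
   slice, 3 components with 32 blocks each per slice. */
static long clamped_dcs_per_frame(int width, int height)
{
    const int mb_size = 16;                  /* assumed macroblock edge in pixels */
    const int mbs_per_slice = 8;             /* assumed slice width in macroblocks */
    const int dcs_per_slice = 3 * (32 - 2);  /* first 2 DCs per component skip the clamp */

    /* The sketch only handles frames that split evenly into full slices. */
    assert(width % mb_size == 0 && height % mb_size == 0);
    assert((width / mb_size) % mbs_per_slice == 0);

    long macroblocks = (long)(width / mb_size) * (height / mb_size);
    long slices = macroblocks / mbs_per_slice;
    return slices * dcs_per_slice;
}
```

For 3840x2160 this gives 240 * 135 = 32400 macroblocks, 4050 slices and 4050 * 90 = 364500 DCs, which matches the number above.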

One more example

This one is more critical; I faced it while doing my contract.
I had to fix a performance regression which the company hit after upgrading from FFmpeg 3.x to FFmpeg 4.2, as they use FFmpeg to demux MOV files.

One of the developers found out that FFmpeg finally added 12-bit decoding support and now reports all HQX and 4444 profiles as 12-bit, and indeed that commit causes the regression. It sounds weird if you consider the fact that they use the official libs from Apple for decoding…

So how is it possible? My first thought was that the decoder is still used somewhere, how else could file open become x2 slower? So how do you open a file with the FFmpeg libs? Something like:

avformat_open_input ...
avformat_find_stream_info ..

What I found out is that avformat_find_stream_info reads the first frame from the file and decodes it, and does it single-threaded. How do you like it? To be fair, there is a reason behind it, as sometimes there is no way to get all the needed metadata without decoding the frame header (for example bit depth, pixel format and so on), but the problem is that we don’t need to decode the whole frame to get that metadata, we just need to decode the frame header… So I added an extra flag which forces the FFmpeg ProRes and DNx decoders to stop after the headers are decoded, something like:

// proresdec2.c, line 784
// AV_CODEC_STOP_AFTER_HEADER_DECODED is the extra flag I added, not a stock FFmpeg flag
if (avctx->flags & AV_CODEC_STOP_AFTER_HEADER_DECODED) {
    return avpkt->size;
}

Believe it or not, but instead of a x2 slowdown we achieved a x2 speed-up, and now file open takes constant time regardless of resolution, whereas with the previous version the higher the resolution was, the slower file open would be…
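To show why the early return makes file open independent of resolution, here is a toy model of the idea. Nothing in it is FFmpeg code: the packet layout, the ToyDecoder struct and the TOY_FLAG_STOP_AFTER_HEADER flag are all invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag for this toy model; it only mirrors the idea of the flag above. */
#define TOY_FLAG_STOP_AFTER_HEADER 1u

typedef struct {
    unsigned flags;
    int width;
    int height;
    size_t payload_bytes_touched;  /* stand-in for the cost of a full decode */
} ToyDecoder;

/* Toy packet layout (invented for illustration): 2 bytes width, 2 bytes height,
   both big-endian, followed by the picture payload.
   Returns the number of bytes consumed, or -1 if the packet is too short. */
static int toy_decode(ToyDecoder *dec, const uint8_t *data, size_t size)
{
    assert(dec != NULL && data != NULL);
    if (size < 4) {
        return -1;
    }
    dec->width = (data[0] << 8) | data[1];
    dec->height = (data[2] << 8) | data[3];

    if (dec->flags & TOY_FLAG_STOP_AFTER_HEADER) {
        return (int)size;  /* metadata is known, skip the expensive part */
    }
    for (size_t i = 4; i < size; ++i) {
        dec->payload_bytes_touched++;  /* a real decoder would do VLC/IDCT work here */
    }
    return (int)size;
}
```

With the flag set, the work done is the same 4 header bytes no matter how big the payload is; without it, the cost grows with the frame.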

Prores tools updates

I haven’t written for a while as I was quite busy and unfortunately didn’t work a lot on the announced tools, as I’m still doing my main contract and literally have max 2 hours a day for my own project :(.

Nevertheless, as there was high interest I want to post some updates:

  1. I finally finished the ProRes decoder, which is needed for a lot of reasons (playback; transcode, as we need to stop decoding before doing IDCT; in-place edit, as we need to decode just some of the slices, not the whole frame)
  2. I finished the MOV demuxer, which is based on the project I started here (https://github.com/da8eat/qtfile_pp); I never committed the updates though, but at least you can see the main idea
  3. I made huge progress on the MOV muxer (also based on the GitHub project mentioned above)
  4. I finished a basic UI (Qt/QML based)
    [Image: ProRes Tools UI]
  5. I implemented GLSL shaders for the Video Renderer

Over the next couple of months I’m going to finish the Player and the MOV muxer, and after that, finally, start features integration.

The first one will be ProRes to ProRes transcode (for example, transcode HQX to the Proxy profile)

The second one will be in-place ProRes editing (I plan to detect faces in the video and blur all detected faces in the input file without video re-encoding and file re-muxing)

Prores QUALITY

UPD: the ffmpeg builds I shared were added just to show the encoder is not a myth, but as FFmpeg (or at least some guys from the community) has something against it, I had to remove the repo… so all GitHub links below are invalid, sorry

In the previous post I forgot to mention the problem I mentioned a couple of times before – Quality. It’s not always easy to detect a big difference by eye, but I have some test files where every ffmpeg prores encoder really fails.

I uploaded one to github if you want to check:

https://github.com/da8eat/ffmpeg_prores_encoder/blob/master/1.bmp

and you can see how badly ffmpeg encodes it if you want the Proxy profile:

ffmpeg -i 1.bmp -c:v prores_aw -profile:v 0 -pix_fmt yuv422p10le aw.mov

ffmpeg -i 1.bmp -c:v prores_ks -profile:v 0 -pix_fmt yuv422p10le ks.mov

As you see, both look quite blurry (aw looks better, but as I said before there is nothing about rate control in it, and aw guarantees nothing except a correct bitstream)

That’s how the same frame looks encoded with the encoder I made:

ffmpeg -i 1.bmp -c:v prores_amcdx -profile:v 0 -pix_fmt yuv422p10le amcdx.mov

I uploaded all 3 mov files so you can compare the results yourself:
https://github.com/da8eat/ffmpeg_prores_encoder/blob/master/aw.mov
https://github.com/da8eat/ffmpeg_prores_encoder/blob/master/ks.mov
https://github.com/da8eat/ffmpeg_prores_encoder/blob/master/amcdx.mov

I also believe you have your own test footage which you want to try the encoder with, so I built the ffmpeg master branch with one more ProRes encoder added, so you can test and check the results:
https://github.com/da8eat/ffmpeg_prores_encoder/blob/master/build/ffmpeg_win_MSVS2015.7z
https://github.com/da8eat/ffmpeg_prores_encoder/blob/master/build/ffmpeg_osx_clang.7z

Usage:

ffmpeg.exe -i 1.bmp -c:v prores_amcdx -profile:v 5 -pix_fmt yuv444p12le xq.mov

The profiles are the same as in the other ffmpeg ProRes encoders: 0 – Proxy, 1 – LT, 2 – Standard, 3 – HQ, 4 – 4444, 5 – XQ

supported pixel formats: uyvy422, yuv422p10le, yuv422p12le, yuv444p12le

I believe my encoder still has some bugs, so if you face any, do not hesitate to message me

Prores progress updates

As I got some questions about progress, I decided to post some updates and clarifications:

  1. I succeeded in improving performance, so now the encoder is a bit faster than the Apple implementation with identical output (and I still see room for improvement)
  2. I fixed some minor issues and fully implemented the XQ profile
  3. About 12-bit support: there was a thread on the ffmpeg dev list where some core developers were claiming that 12-bit ProRes is a myth. So, just so you know, the Apple encoder encodes all data as 12-bit: even if you pass 8-bit uyvy, it is first converted to 12 bits and encoded after that
  4. Based on statement (3), I can expose a big mistake I made in Cinedeck ProRes Insert-Edit. Basically, Cinedeck checks some stream parameters to decide whether the input stream should be re-encoded before insert. One of them is the src pixel format, and if input and output have different src pixel formats, the video gets re-encoded before insert. Now I can say this is wrong behavior, as on the encoder side it’s always 12-bit; it is chroma subsampling that had to be checked, not the src pixel format
  5. There is one more util I work on. It’s a ProRes smart transcoder:
    Let’s say we need to transcode ProRes HQ to ProRes Proxy; that’s how any transcoding app will do it:
    1) Decode (vlc decode -> dequantize -> inverse dct -> assemble slices into frame buffer)
    2) Encode (disassemble frame into slices -> forward dct -> rate control -> quantize -> vlc encode)

    At first glance it looks ok, but from my point of view it should be:
    1) vlc decode -> dequantize -> rate control -> quantize -> vlc encode
    so basically I got rid of some heavy but useless steps, which makes transcode almost x2 faster compared to the classical way
    Obviously, it works only if you transcode from ProRes to ProRes
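The "dequantize -> quantize" part of the short pipeline can be sketched for a single 8x8 block as below. This is only an illustration under a simplifying assumption, namely that a coefficient is reconstructed as level * qmat[i] * qscale; the function name is hypothetical and the new qscale is expected to come from rate control, which is not modeled here.

```c
#include <assert.h>

/* Requantize one 8x8 block of coefficient levels from one quantizer scale to
   another without leaving the DCT domain (no inverse/forward DCT involved).
   Assumption for this sketch: coefficient = level * qmat[i] * qscale. */
static void requantize_block(const int levels_in[64], int levels_out[64],
                             const int qmat[64], int qscale_in, int qscale_out)
{
    for (int i = 0; i < 64; ++i) {
        int coeff = levels_in[i] * qmat[i] * qscale_in;  /* dequantize */
        int step = qmat[i] * qscale_out;                 /* new quantizer step */
        assert(step != 0);
        levels_out[i] = coeff / step;  /* quantize again, truncating toward zero */
    }
}
```

For example, a level of 10 with qmat 4 going from qscale 2 to qscale 4 becomes 80 / 16 = 5. Rounding is plain truncation here, which is a choice made for the sketch only.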

I’m still quite far from showing a demo (except some command line applications), but here is the priority list:

  1. Make a user-friendly UI so it’s easy to show and use
  2. Finish the MOV parser/muxer (as I’m not a fan of using FFmpeg for a demo)

how good is your prores workflow?

I was silent for a year, and to be fair I haven’t touched ProRes for a year as well, except for the last couple of months

So what happened:
Basically I was looking for myself, for what I want to do and what kind of work makes me happy. I tried a couple of different projects, but in the very end returned to the fact that reverse engineering and video codecs are the most interesting things to me.

I returned to reversing ProRes, where I went through a couple of different periods/questions:

1. Why would anyone need it if Apple shares it for free?

2. Why would anyone need it if ffmpeg has 2 different implementations?

3. Why can’t I contribute to ffmpeg?

Every period made my position that one more ProRes encoder is needed even stronger, and here are my answers to the questions I asked myself:

1. First of all, it is possible to get the Apple implementation (obviously not the source code) for free, my experience at a previous employer confirms it, but the procedure is unclear and it seems you need connections as well

2. Both ffmpeg implementations are far from what the Apple encoder does. Neither supports 12-bit input, and both produce bad output, especially for Proxy and XQ qualities.

2.1 Anatoly’s implementation has really poor rate control with all these min/max quant limits

2.2 Kostya’s implementation has better rate control, but still nothing similar to what Apple does, and it is waaay slower

3. At some point I tried to contribute to ffmpeg. One of my patches was approved, but when I started to work on performance optimizations and pushed a patch, I just got ignored… Yes, the patch didn’t break the FATE tests and it improved performance, but it was neither approved nor declined nor commented on in any way; it will just stay forever in the review state. And to be fair, reading the ffmpeg dev emails I found that the guys are more open to fighting about sponsored changes than anything else, so yes, deep inside ffmpeg became super commercialized 🙁

So now about the progress I made:

1. I finished the first version of the encoder, which supports 8/10/12-bit input and 422 and 444 input (alpha is still not implemented)

2. The encoder has Apple-like rate control and, to be fair, on the same input produces absolutely identical output compared to the Apple version

It is like x3 faster than the ffmpeg encoder but still slower than the Apple one (there is still a lot of room for optimization)

Returning to the question of why anyone would need one more ProRes encoder, here are the pros of what I do:

  1. Absolutely identical output to the Apple implementation
  2. 12-bit support
  3. At some point ProRes RAW will be added

In other words, it’s way better than what ffmpeg does and easy to port to any platform (which is not that easy to get from Apple, even in case you succeeded in getting anything)

But the main reason is the new product I’m building based on this encoder.

When I started to understand the Apple bitstream and its logic in general well, I almost immediately had a couple of ideas how to use it:
1. In-place editing. For example, you don’t like a couple of frames in your final file; now you can replace them without re-encoding the whole file.
1.1. You can’t do it with the Apple implementation, as you cannot control the output frame size.
1.2. You can do it with Cinedeck tools (but you would need to re-wrap your file), as they rely on the Apple implementation. It’s probably not a big deal if your file is a couple of hundred megabytes, but what if it’s hundreds of gigabytes and your file is on S3?

2. I decided to go even further. Let’s say you need to put a logo on your frame or blur some part of the frame. Now you can not only replace frames inside the file, it’s also possible to replace a part of any frame; in other words, to add a logo to a ProRes-encoded frame you would need to re-encode only that part of the frame where the logo will be placed

That’s basically it. Sounds exciting? Ping me if you’re going to NAB2019 so you can see how powerful it is!

Reversing ProRes: Part1 (Bitrate)

One of the main problems I met with 3rd-party ProRes encoders is bitrate. The approximate bitrate can be found in the ProRes white papers.

But I’m talking about the logic of calculating the max possible size. Why max? Because Apple doesn’t really care about the lower bound, so with a black frame as the source you will never be even close to the bitrate mentioned in the white papers.

Basically, Apple has a simple algorithm that calculates two values: the first is the frame size that the encoder never exceeds, and the second seems to be the ideal size to fulfill the declared bitrate.

So the max/avg size depends on a couple of input conditions:

  1. resolution
  2. quality
  3. alpha

The alpha case is quite simple and weird at the same time: if you tell the ProRes encoder that you’re going to encode alpha, it automatically increases the max size by 3 * width * height

The main logic is resolution based:

  • if the resolution is less than or equal to SD_NTSC, it’s 288 * 1024
  • if the resolution is less than or equal to SD_PAL, it’s 336 * 1024
  • and so on

That actually was the second weird thing I found, the way it’s implemented in code:

int size = width * height;
int rate = 0;

if (size <= 720 * 486) {
    rate = 288 * 1024;
}
else if (size <= 720 * 576) {
    rate = 336 * 1024;
}
else if (size <= 960 * 720) {
    rate = 432 * 1024;
}
// so on

my first question was “have they ever heard about binary search?” 🙂

Nevertheless, when the base value is found they tune it with respect to quality:

if (qual == proxy) {
    rate = 13 * rate / 63;
}
else if (qual == lt) {
    rate = 13 * rate / 28;
}
//so on

That’s basically it for the max frame size. The second value, which I named the avg value, is calculated even more easily: we just multiply the previously found rate by 8 and divide by 9
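Putting the pieces above together, a minimal table-driven sketch could look like this. It covers only the three resolution steps and the two quality factors quoted in this post; everything else returns 0 here, since the remaining values are not listed. The alpha adjustment is left out because its place in the calculation is not described, and all names (Quality, base_rate, max_frame_size, avg_frame_size) are made up for the example.

```c
#include <assert.h>

/* Only the two qualities whose factors are quoted in the post. */
typedef enum { QUAL_PROXY, QUAL_LT } Quality;

/* Base rate by frame area; 0 means "not covered by this sketch". */
static int base_rate(int width, int height)
{
    static const struct { int max_area; int rate; } table[] = {
        { 720 * 486, 288 * 1024 },  /* up to SD NTSC */
        { 720 * 576, 336 * 1024 },  /* up to SD PAL */
        { 960 * 720, 432 * 1024 },
    };
    assert(width > 0 && height > 0);
    int area = width * height;
    for (unsigned i = 0; i < sizeof table / sizeof table[0]; ++i) {
        if (area <= table[i].max_area) {
            return table[i].rate;
        }
    }
    return 0;
}

/* Max frame size: base rate tuned by quality (integer math, as in the post). */
static int max_frame_size(int width, int height, Quality qual)
{
    int rate = base_rate(width, height);
    switch (qual) {
    case QUAL_PROXY: return 13 * rate / 63;
    case QUAL_LT:    return 13 * rate / 28;
    }
    return 0;
}

/* Avg frame size: the max value multiplied by 8 and divided by 9. */
static int avg_frame_size(int max_size)
{
    return max_size * 8 / 9;
}
```

For a 720x486 Proxy frame this gives a max of 13 * 294912 / 63 = 60854 and an avg of 60854 * 8 / 9 = 54092. With a sorted table like this, switching the linear scan to a binary search would be a small change, though for a handful of entries it hardly matters.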

https://github.com/da8eat/prores_encoder

P.S. There are also some color space tricks, for example when we encode a 422 frame to 4444 quality, but I won’t cover them as it’s a bit of an artificial situation

Reversing Prores – Part0

Yes, I know the ProRes encoder was reversed a long time ago, for example there are 3 different encoders in ffmpeg, but:

  • a) there are several needs which none of the ffmpeg versions meets
  • b) it seems I’m that kind of person who prefers to invent the bicycle

So I started to reverse the ProRes encoder on my own. There are min and max goals I will try to achieve:

The minimum goal is:

  • create a ProRes encoder that produces a correct bitstream (by correct I mean decodable by the most popular ProRes decoders)
  • the encoded frame size is more or less equal to the size produced by the native Apple encoder
  • the encoder performs better than the ffmpeg/apple versions

The maximum goal is:

  • the encoded frame is binary identical to what the native Apple encoder produces
  • the encoder performs better than the native Apple encoder

So basically the maximum goal is to create a better version of the Apple ProRes encoder without the source code 🙂

Here in the blog I am going to post my progress and thoughts, and I think I will share the code on GitHub