Cuda optimized ProRes decoder V0.2b

After I released the early beta I received a couple of messages. Most of them, to be fair, were something like: “80fps, I'm not impressed…”
But there was one message from a company which is also working on a GPU-optimized decoder/encoder. And to be fair, it's good to know someone else is working on it, as I hope at some point to have a chance to compare performance and quality.
I don't have much info on what exactly they are trying to do (most probably just optimizing the FFmpeg version), but as far as I know, their initial goal is to be able to decode/encode ProRes 4:2:2 HD at 1000 fps.

As my decoder is at an early stage and the encoder is way behind the decoder, I'm still not sure whether 1000 fps is the limit, but I guess it will be the first goal I try to achieve.

Even though I didn't have much time over the last few weeks, I was inspired by the fact that someone else is working on it, so today I'm ready to release a new version of the decoder.

V0.2b is still an early beta and still decodes only progressive frames, but it's twice as fast as the previous version (~150 fps for 4K and ~610 fps for HD).

The API has not changed.

6 thoughts on “Cuda optimized ProRes decoder V0.2b”

    1. As I mentioned in one of the previous posts, I struggle to find the motivation to continue working on the metadata editor, so to be honest I don't have real estimates; it's more about when I get the inspiration to work on it again.

  1. 150fps on UHD is better. Looks like current FFmpeg (not very well optimised code?) does about 250fps (ProRes HQ) on a 10-core i9 @ 4GHz, so once you get there, it will be something. This is probably in line with 1000fps for HD. Another problem is how this would translate to a real-world scenario where, e.g., FFmpeg decodes ProRes on the GPU and encodes, e.g., DNxHR. How much performance will be lost on data traveling from GPU to CPU?

    1. 1) First of all, I would say it's an unfair comparison: I test on a P4000 (a GPU from 2017). I don't think it will be a problem to reach 250 FPS on an RTX 3080 or even an RTX 3070.
      2) Luckily, the real-world scenario is not just transcoding but also video effects, color correction and so on, and most NLEs do these on the GPU, so there is no need to transfer data from GPU to CPU. More than that, there is a data transfer from CPU to GPU… not to mention that the 150 FPS figure includes data transfer.

      And there are tons of different use cases, like play-out, video recorders and so on, where you cannot afford to load the CPU to 100%.

  2. Yes, but apps which rely heavily on the GPU, e.g. Resolve, don't need decoding to happen on the GPU, because typically the GPU is the bottleneck there. We don't want another process to go to the GPU and kill it while the CPU is doing nothing. CPU/GPU balance is very important.
    GPU decoding would be very useful for apps like VDMX, etc. In this case it could be a nice replacement for the HAP codec.

    1. I'm not sure what we are arguing about here… I do see how it can improve Resolve workflows.
      I can also imagine how useful it is for play-out, as I can easily play out 4K streams, which in my experience was not possible even with 32 cores.
      Still, this conversation doesn't lead anywhere; I just work on the project I like, regardless of whether anyone needs it or not.
