CUDA optimized ProRes decoder V0.2b

After I released the early beta, I received a couple of messages. To be fair, most of them said something like: “80 fps? I’m not impressed…”
But one message came from a company that is also working on a GPU-optimized decoder/encoder. It’s good to know someone else is working on this, as I hope that at some point I will have a chance to compare performance and quality.
I don’t have much information on exactly what they are doing (most probably optimizing the FFmpeg implementation), but as far as I know, their initial goal is to decode/encode ProRes 4:2:2 HD at 1000 fps.

As my decoder is at an early stage and the encoder is well behind the decoder, I’m still not sure whether 1000 fps is the limit, but I guess it will be the first goal I try to reach.

Even though I didn’t have much time over the last few weeks, I was inspired by the fact that someone else is working on this, so today I’m ready to release a new version of the decoder.

V0.2b is still an early beta and still decodes only progressive frames, but it’s twice as fast as the previous version (~150 fps for 4K and ~610 fps for HD).
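To put these rates next to the 1000 fps target, it helps to think in time per frame. A trivial sketch of that arithmetic; the function names are only for illustration, and it treats the quoted fps figures as plain throughput numbers, not as a model of the decoder.

```c
#include <assert.h>

/* Time available to decode one frame, in milliseconds, at a given rate.
 * 1000 fps -> 1.00 ms, ~610 fps -> ~1.64 ms, ~150 fps -> ~6.67 ms. */
double frame_budget_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Factor by which throughput still has to grow to reach a target rate,
 * e.g. 1000 / 610 ~= 1.64 for HD. */
double speedup_needed(double current_fps, double target_fps)
{
    assert(current_fps > 0.0);
    return target_fps / current_fps;
}
```

In other words, going from ~610 fps to 1000 fps for HD means shaving each frame from roughly 1.64 ms down to 1 ms.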

The API has not changed.

CUDA optimized ProRes decoder

I haven’t had a lot of free time in recent months, so I’ve really struggled to release a new version of AMCDX Video Patcher. And to be fair, once the main challenges are solved, I don’t have that much motivation.

I’m still interested in the performance side of the project, which, to be fair, was the main goal.
There isn’t much left to optimize on the CPU side, though (the ProRes encoder/decoder outperforms the FFmpeg implementation, and Frame Editor and File To File are fast enough)…

So I finally found some time to start the part of the project I had been dreaming of for the last year: GPU optimization.

So today I’m officially releasing the CUDA optimized ProRes decoder. It’s an early beta: it doesn’t support interlaced frames yet, and there is plenty of room for further optimization, but it’s still good enough to show.

So what is supported in v0.1b?
1) Decoding of progressive ProRes frames on the GPU (both 444 and 422 are supported)
2) Always decodes to 12-bit
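Because the output is always 12-bit, an application that works in 8-bit or 16-bit has to rescale the samples itself. A minimal sketch, assuming (this is not stated above) that each decoded sample is stored in an unsigned 16-bit word holding a value in 0..4095; the helper names are hypothetical and not part of the decoder.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: one 12-bit sample per uint16_t, value range 0..4095. */
#define MAX_12BIT 4095u

/* 12-bit -> 8-bit: drop the 4 low bits, rounding to nearest.
 * Inputs 4088..4095 would round to 256, so clamp to 255. */
uint8_t sample12_to_8(uint16_t v)
{
    assert(v <= MAX_12BIT);
    unsigned int r = (v + 8u) >> 4;
    return (uint8_t)(r > 255u ? 255u : r);
}

/* 12-bit -> 16-bit by bit replication, so 0 maps to 0 and 4095 to 65535. */
uint16_t sample12_to_16(uint16_t v)
{
    assert(v <= MAX_12BIT);
    return (uint16_t)((v << 4) | (v >> 8));
}

/* Convert one row of `count` 12-bit samples to 8-bit. */
void row12_to_8(const uint16_t *src, uint8_t *dst, unsigned int count)
{
    for (unsigned int i = 0; i < count; ++i)
        dst[i] = sample12_to_8(src[i]);
}
```

Bit replication is used for the 16-bit case because a plain shift would map full white (4095) to 65520 instead of 65535.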

API
amcdx_cu_prores_decoder.dll exports the following functions:

1) void * amcdx_cupr_decoder_create() – creates a decoder instance (allocates memory and so on).
Returns the decoder handle.

2) int amcdx_cupr_decoder_decode(void * decoder, void * buffer, int size) – decodes the passed frame
decoder – decoder handle returned by amcdx_cupr_decoder_create
buffer – encoded ProRes frame
size – size of the encoded frame
Returns 0 on success, otherwise returns an error code.

3) unsigned int amcdx_cupr_get_pitch(void * decoder) – returns the plane line size (pitch). In the 444 case the line size is the same for all 3 planes; in the 422 case the line size of the chroma planes equals amcdx_cupr_get_pitch / 2.
decoder – decoder handle returned by amcdx_cupr_decoder_create

4) unsigned int amcdx_cupr_get_width(void * decoder) – returns the width of the decoded frame
decoder – decoder handle returned by amcdx_cupr_decoder_create

5) unsigned int amcdx_cupr_get_height(void * decoder) – returns the height of the decoded frame
decoder – decoder handle returned by amcdx_cupr_decoder_create

6) int amcdx_cupr_is_444(void * decoder) – returns 1 if the frame is 444 (no chroma subsampling), otherwise returns 0
decoder – decoder handle returned by amcdx_cupr_decoder_create

7) void amcdx_cupr_decoder_read(void * decoder, void ** buffer) – copies the decoded frame from GPU to CPU memory. Call this function if your frame buffer line sizes match the ones reported by amcdx_cupr_get_pitch.
decoder – decoder handle returned by amcdx_cupr_decoder_create
buffer – output frame plane buffers

8) void amcdx_cupr_decoder_read_pitch(void * decoder, void ** buffer, int * pitch) – copies the decoded frame from GPU to CPU memory. Call this function if your frame buffer line sizes do not match the ones reported by amcdx_cupr_get_pitch.
decoder – decoder handle returned by amcdx_cupr_decoder_create
buffer – output frame plane buffers
pitch – output frame plane line sizes

9) void amcdx_cupr_decoder_destroy(void * decoder) – destroys the decoder instance
decoder – decoder handle returned by amcdx_cupr_decoder_create

10) const char * amcdx_cupr_version() – returns the library version string
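The pitch rules in functions 3, 7 and 8 decide how large each output plane must be and which read function to call. A minimal sketch of that bookkeeping, assuming pitch is measured in bytes and that plane 0 is luma and planes 1 and 2 are chroma (neither is stated above). The struct and helper names are hypothetical and are not exported by amcdx_cu_prores_decoder.dll; copy_plane_pitched only illustrates on the CPU what a pitched copy between buffers with different line sizes involves.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type: line size and byte size of each of the 3 planes.
 * ASSUMPTIONS: pitch is in bytes; plane 0 is luma, planes 1 and 2 are chroma. */
typedef struct {
    unsigned int pitch[3];
    size_t size[3];
} plane_layout;

/* Documented rule: 444 -> all planes share the pitch; 422 -> chroma pitch
 * is pitch / 2. 4:2:2 subsamples horizontally only, so every plane has
 * the full frame height. */
plane_layout compute_layout(unsigned int pitch, unsigned int height, int is_444)
{
    plane_layout layout;
    unsigned int chroma_pitch = is_444 ? pitch : pitch / 2;
    layout.pitch[0] = pitch;
    layout.pitch[1] = chroma_pitch;
    layout.pitch[2] = chroma_pitch;
    for (int i = 0; i < 3; ++i)
        layout.size[i] = (size_t)layout.pitch[i] * height;
    return layout;
}

/* Returns 1 if the caller's line sizes differ from the decoder's layout,
 * i.e. amcdx_cupr_decoder_read_pitch is needed instead of
 * amcdx_cupr_decoder_read; returns 0 otherwise. */
int needs_read_pitch(const plane_layout *layout, const int dst_pitch[3])
{
    for (int i = 0; i < 3; ++i)
        if ((unsigned int)dst_pitch[i] != layout->pitch[i])
            return 1;
    return 0;
}

/* CPU illustration of a pitched copy: copy row_bytes from each of `rows`
 * rows of src into dst, where the two buffers have different line sizes. */
void copy_plane_pitched(unsigned char *dst, size_t dst_pitch,
                        const unsigned char *src, size_t src_pitch,
                        size_t row_bytes, size_t rows)
{
    assert(row_bytes <= dst_pitch && row_bytes <= src_pitch);
    for (size_t y = 0; y < rows; ++y)
        memcpy(dst + y * dst_pitch, src + y * src_pitch, row_bytes);
}
```

In a decode loop, after amcdx_cupr_decoder_decode returns 0 you would query amcdx_cupr_get_pitch, amcdx_cupr_get_height and amcdx_cupr_is_444, size the planes with compute_layout, then call amcdx_cupr_decoder_read when needs_read_pitch returns 0 and amcdx_cupr_decoder_read_pitch otherwise, and finally release the instance with amcdx_cupr_decoder_destroy.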

I also added a simple wrapper so it can be used with FFmpeg.
Pre-built Binaries

P.S. As I mentioned before, it’s an early beta, so I haven’t done much benchmarking. Currently I get ~80 fps decoding 4K ProRes 4444 XQ on a Quadro P4000.

AMCDX VIDEO PATCHER V0.5.5

AMCDX Video Patcher v0.5.5 released
Starting with v0.5.5, a CPU that supports the AVX2 instruction set is no longer required (so you can use it on older Intel/AMD CPUs and on ARM CPUs).

Features:
1) You can type the exact frame you want to jump to (instead of moving the scrubber)
2) Added “Show Shortcuts” Menu Item

Bugs Fixed:
1) Fixed a bug where keyboard shortcuts became inactive after certain actions

Windows Installer
OSX Installer
Linux Build available upon request

What’s Next?
1) Insert directly into files stored on AWS S3
2) VC-3 playback and Insert support