
Cuda optimized ProRes decoder

I haven’t had much free time in recent months, so I have really struggled to release a new version of AMCDX Video Patcher. And to be fair, now that the main challenges are solved, I don’t have that much motivation.

I am still interested in the performance part of the project, which was, to be fair, the main goal.
There is not much left to optimize on the CPU side, though (the ProRes encoder/decoder outperforms the FFmpeg implementation, and Frame Editor and File To File are fast enough)…

So I finally found some time to start the part of the project I had been dreaming of for the last year: GPU optimization.

So today I am officially releasing a CUDA-optimized ProRes decoder. It’s an early beta: it doesn’t support interlaced frames yet and there is plenty of room for extra optimizations, but it’s already good enough to show.

So what is supported in v0.1b?
1) Decoding of progressive ProRes frames on GPU (both 444 and 422 supported)
2) Always decodes to 12 bit

API
amcdx_cu_prores_decoder.dll exports functions:

1) void * amcdx_cupr_decoder_create() – creates a decoder instance, allocates memory and so on.
Returns decoder handle.

2) int amcdx_cupr_decoder_decode(void * decoder, void * buffer, int size) – decodes passed frame
decoder – decoder handle returned by amcdx_cupr_decoder_create
buffer – encoded ProRes frame
size – frame size
Returns 0 on success, otherwise an error code.

3) unsigned int amcdx_cupr_get_pitch(void * decoder) – returns the plane line size. In the 444 case the line size is the same for all 3 planes; in the 422 case the line size of the chroma planes equals amcdx_cupr_get_pitch / 2
decoder – decoder handle returned by amcdx_cupr_decoder_create

4) unsigned int amcdx_cupr_get_width(void * decoder) – returns width of decoded frame
decoder – decoder handle returned by amcdx_cupr_decoder_create

5) unsigned int amcdx_cupr_get_height(void * decoder) – returns height of decoded frame
decoder – decoder handle returned by amcdx_cupr_decoder_create

6) int amcdx_cupr_is_444(void * decoder) – returns 1 if we have 444 chroma subsampling, otherwise returns 0
decoder – decoder handle returned by amcdx_cupr_decoder_create

7) void amcdx_cupr_decoder_read(void * decoder, void ** buffer) – copies the decoded frame from GPU to CPU. Call this function if your frame buffer line sizes are equal to amcdx_cupr_get_pitch
decoder – decoder handle returned by amcdx_cupr_decoder_create
buffer – output frame plane buffers

8) void amcdx_cupr_decoder_read_pitch(void * decoder, void ** buffer, int * pitch) – copies the decoded frame from GPU to CPU. Call this function if your frame buffer line sizes are not equal to amcdx_cupr_get_pitch
decoder – decoder handle returned by amcdx_cupr_decoder_create
buffer – output frame plane buffers
pitch – output frame plane line sizes

9) void amcdx_cupr_decoder_destroy(void * decoder) – destroys decoder instance
decoder – decoder handle returned by amcdx_cupr_decoder_create

10) const char * amcdx_cupr_version() – returns library version string
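
To make the pitch rules above concrete, here is a minimal sketch of the host-side buffer arithmetic a caller might need before calling amcdx_cupr_decoder_read or amcdx_cupr_decoder_read_pitch. None of these helpers are part of the DLL; plane_pitches, plane_bytes and copy_plane are hypothetical names. The sketch also assumes that the pitch is expressed in bytes and that every plane has the full frame height (4:2:2 halves chroma only horizontally), neither of which is confirmed by the API description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: line size of each of the three planes, following the
 * rule stated for amcdx_cupr_get_pitch (444: all planes equal,
 * 422: chroma planes use half of the luma line size). */
static void plane_pitches(unsigned pitch, int is_444, unsigned out[3])
{
    out[0] = pitch;
    out[1] = is_444 ? pitch : pitch / 2;
    out[2] = out[1];
}

/* Assumption: pitch is in bytes and each plane has `height` lines. */
static size_t plane_bytes(unsigned plane_pitch, unsigned height)
{
    return (size_t)plane_pitch * height;
}

/* Row-by-row copy between buffers with different line sizes. This only
 * illustrates what "pitch" means for the read_pitch variant; it is not
 * how the library itself copies data. */
static void copy_plane(uint8_t *dst, size_t dst_pitch,
                       const uint8_t *src, size_t src_pitch,
                       size_t row_bytes, unsigned rows)
{
    assert(row_bytes <= dst_pitch && row_bytes <= src_pitch);
    for (unsigned y = 0; y < rows; y++)
        memcpy(dst + y * dst_pitch, src + y * src_pitch, row_bytes);
}
```

With these, a caller would allocate three buffers of plane_bytes(pitches[i], height) each and pass them as the buffer array, provided the assumptions above hold.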

I added a simple wrapper so it could be used with FFmpeg
Pre-built Binaries

P.S. As I mentioned, it’s an early beta, so I haven’t done much benchmarking. Currently I get ~80 FPS decoding 4K ProRes 4444 XQ on a Quadro P4000.

Prores QUALITY

UPD: the ffmpeg builds I shared were only meant to show that the encoder is not a myth, but since FFmpeg (or at least some people from the community) objected, I had to remove the repo… so all GitHub links below are invalid, sorry.

In the previous post I forgot to cover a problem I have mentioned a couple of times before: quality. A big difference is not always easy to spot by eye, but I have some test files where every one of the ffmpeg ProRes encoders really fails.

I uploaded one to github if you want to check:

https://github.com/da8eat/ffmpeg_prores_encoder/blob/master/1.bmp

and you can see how badly ffmpeg encodes it with the Proxy profile (profile 0):

ffmpeg -i 1.bmp -c:v prores_aw -profile:v 0 -pix_fmt yuv422p10le aw.mov

ffmpeg -i 1.bmp -c:v prores_ks -profile:v 0 -pix_fmt yuv422p10le ks.mov

As you can see, both look quite blurry (aw looks better, but as I said before it has no rate control, and aw guarantees nothing except a correct bitstream).

This is how the same frame looks when encoded with the encoder I made:


ffmpeg -i 1.bmp -c:v prores_amcdx -profile:v 0 -pix_fmt yuv422p10le amcdx.mov

I uploaded all 3 mov files so you can compare the results yourself:
https://github.com/da8eat/ffmpeg_prores_encoder/blob/master/aw.mov
https://github.com/da8eat/ffmpeg_prores_encoder/blob/master/ks.mov
https://github.com/da8eat/ffmpeg_prores_encoder/blob/master/amcdx.mov

I am sure you also have your own test footage that you want to try the encoder with, so I built the ffmpeg master branch with one more ProRes encoder added, so that you can test it and check the results:
https://github.com/da8eat/ffmpeg_prores_encoder/blob/master/build/ffmpeg_win_MSVS2015.7z
https://github.com/da8eat/ffmpeg_prores_encoder/blob/master/build/ffmpeg_osx_clang.7z

Usage:

ffmpeg.exe -i 1.bmp -c:v prores_amcdx -profile:v 5 -pix_fmt yuv444p12le xq.mov

Profiles are the same as in the other ffmpeg ProRes encoders: 0 – Proxy, 1 – LT, 2 – Standard, 3 – HQ, 4 – 4444, 5 – XQ

Supported pixel formats: uyvy422, yuv422p10le, yuv422p12le, yuv444p12le

I am sure my encoder still has some bugs, so if you run into any, do not hesitate to message me.

Prores progress updates

As I got some questions about progress I decided to post some updates and clarifications:

  1. I managed to improve performance, so the encoder is now a bit faster than Apple’s implementation with identical output (and I still see room for improvement)
  2. I fixed some minor issues and fully implemented the XQ profile
  3. About 12-bit support: there was a thread on the ffmpeg dev list where some core developers claimed that 12-bit ProRes is a myth. Just so you know, the Apple encoder encodes all data as 12-bit: even if you pass 8-bit uyvy, it is first converted to 12 bits and encoded after that
  4. Based on statement (3), I can admit a big mistake I made in Cinedeck ProRes Insert-Edit. Cinedeck checks some stream parameters to decide whether the input stream should be re-encoded before the insert. One of them is the source pixel format: if the input and output have different source pixel formats, the video gets re-encoded before the insert. I can now say this is wrong behavior, because on the encoder side the data is always 12-bit; it is the chroma subsampling, not the source pixel format, that should have been checked
  5. There is one more utility I am working on: a ProRes smart transcoder.
    Let’s say we need to transcode ProRes HQ to ProRes Proxy. This is how any transcoding app will do it:
    1) Decode (vlc decode -> dequantize -> inverse dct -> assemble slices into a frame buffer)
    2) Encode (split the frame into slices -> forward dct -> rate control -> quantize -> vlc encode)

    At first glance this looks OK, but in my view it should be:
    1) vlc decode -> dequantize -> rate control -> quantize -> vlc encode
    So basically I got rid of some heavy but useless steps, which makes transcoding almost 2x faster compared to the classical way.
    Obviously, it works only if you transcode from ProRes to ProRes
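
The shortened pipeline in item 5 works because an inverse DCT followed directly by a forward DCT returns the same coefficients (up to rounding), so the pair can be dropped. Below is a minimal sketch of the remaining dequantize -> quantize step on coefficients that have already been VLC-decoded. It is not the actual transcoder: it assumes a single uniform quantizer step per block, whereas real ProRes also applies a per-coefficient quantization matrix, and it takes the output step as a parameter instead of running rate control. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified uniform dequantizer: coefficient = level * step.
 * The per-coefficient quantization matrix is deliberately left out. */
static int32_t dequantize(int32_t level, int32_t step)
{
    return level * step;
}

/* Quantize with rounding to nearest, symmetric around zero. */
static int32_t quantize(int32_t coeff, int32_t step)
{
    assert(step > 0);
    int32_t mag = coeff < 0 ? -coeff : coeff;
    int32_t q = (mag + step / 2) / step;
    return coeff < 0 ? -q : q;
}

/* Requantize a block of levels entirely in the DCT domain:
 * no inverse DCT, no frame buffer, no forward DCT.
 * step_out would normally be chosen by rate control. */
static void requantize_block(int32_t *levels, int n,
                             int32_t step_in, int32_t step_out)
{
    for (int i = 0; i < n; i++)
        levels[i] = quantize(dequantize(levels[i], step_in), step_out);
}
```

For example, levels {10, -3, 1, 0} with an input step of 2 and an output step of 8 become {3, -1, 0, 0}: the coarser step shrinks the levels, which is where the bitrate reduction comes from.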

I’m still quite far from being able to show a demo (except for some command-line applications), but here is the priority list:

  1. Make a user-friendly UI so it is easy to show and use
  2. Finish the MOV parser/muxer (as I’m not a fan of using FFmpeg for the demo)

MXF OP1b + FFmpeg Part1

Some time ago I was asked to fix a strange ffmpeg bug. In the customer’s words, they had OP1b files from a Panasonic camera and ffmpeg didn’t read the audio correctly (the first couple of seconds were looped).

My first thought was “easy money”, so I signed up.
The problem was trivial. OP1b allows more than one essence container (smpte 319), and each essence is stored in its own essence container. As a result, each track is unique only within its essence container, and all audio tracks have the same track number. So when ffmpeg assigns a newly read essence packet to a stream, it checks only the track number and always assigns the packet to the first audio track:

static int mxf_get_stream_index(AVFormatContext *s, KLVPacket *klv)
{
    int i;
    for (i = 0; i < s->nb_streams; i++) {
        MXFTrack *track = s->streams[i]->priv_data;
        /* SMPTE 379M 7.3 */
        if (track && !memcmp(klv->key + sizeof(mxf_essence_element_key), track->track_number, sizeof(track->track_number))) {
            return i;
        }
    }
    /* return 0 if only one stream, for OP Atom files with 0 as track number */
    return s->nb_streams == 1 ? 0 : -1;
}

As the file had 4 tracks, this created the loop effect, because all 4 tracks carried the same audio.

To resolve this we need to:

1) In MXFContentStorage, read the field which contains the list of all EssenceContainerData (0x1902)

static int mxf_read_content_storage(void *arg, AVIOContext *pb, int tag, int size, UID uid, int64_t klv_offset)
{
    MXFContext *mxf = arg;
    switch (tag) {
    case 0x1901:
        if (mxf->packages_refs)
            av_log(mxf->fc, AV_LOG_VERBOSE, "Multiple packages_refs\n");
        av_free(mxf->packages_refs);
        return mxf_read_strong_ref_array(pb, &mxf->packages_refs, &mxf->packages_count);
    case 0x1902:
        av_free(mxf->essence_container_data_refs);
        return mxf_read_strong_ref_array(pb, &mxf->essence_container_data_refs, &mxf->essence_container_data_count);
    }
    return 0;
}


2) Read each EssenceContainerData (ffmpeg didn’t read it at all).

3) Each EssenceContainerData has a reference to a SourcePackage and also holds the index SID and body SID

typedef struct MXFEssenceContainerData {
    UID uid;
    enum MXFMetadataSetType type;
    UID package_uid;
    UID package_ul;
    int index_sid;
    int body_sid;
} MXFEssenceContainerData;

static int mxf_read_essence_container_data(void *arg, AVIOContext *pb, int tag, int size, UID uid, int64_t klv_offset)
{
    MXFEssenceContainerData * essence_data = arg;
    switch(tag) {
        case 0x2701:
            /* linked package umid UMID */
            avio_read(pb, essence_data->package_ul, 16);
            avio_read(pb, essence_data->package_uid, 16);
            break;
        case 0x3f06:
            essence_data->index_sid = avio_rb32(pb);
            break;
        case 0x3f07:
            essence_data->body_sid = avio_rb32(pb);
            break;
    }
    return 0;
}

static const MXFMetadataReadTableEntry mxf_metadata_read_table[] = {
    /* other entries omitted to keep the listing short */
    { { 0x06,0x0e,0x2b,0x34,0x02,0x53,0x01,0x01,0x0d,0x01,0x01,0x01,0x01,0x01,0x23,0x00 }, mxf_read_essence_container_data, sizeof(MXFEssenceContainerData), EssenceContainerData },
    { { 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 }, NULL, 0, AnyType },
};


4) Go through all tracks of each SourcePackage and assign to each track the index SID and body SID found in the corresponding EssenceContainerData

/* for the current source track, find the EssenceContainerData that
 * references the same SourcePackage and copy its SIDs to the track */
for (k = 0; k < mxf->essence_container_data_count; k++) {
    if (!(essence_data = mxf_resolve_strong_ref(mxf, &mxf->essence_container_data_refs[k], EssenceContainerData))) {
        av_log(mxf->fc, AV_LOG_TRACE, "could not resolve essence container data strong ref\n");
        continue;
    }

    /* skip entries that belong to a different package */
    if (memcmp(component->source_package_ul, essence_data->package_ul, sizeof(UID)) ||
        memcmp(component->source_package_uid, essence_data->package_uid, sizeof(UID))) {
        continue;
    }

    source_track->body_sid = essence_data->body_sid;
    source_track->index_sid = essence_data->index_sid;
}


5) When we read the next KLV triplet, we look for the partition this triplet belongs to; each partition has a body SID

static int find_body_sid_by_offset(MXFContext *mxf, int64_t offset)
{
    /* look for the partition in which the current KLV triplet is placed:
     * the last partition with a body SID that starts before the offset */
    int i;
    MXFPartition *prev = NULL;

    for (i = 0; i < mxf->partitions_count; ++i) {
        MXFPartition *partition = &mxf->partitions[i];

        if (partition->body_sid) {
            if (partition->this_partition < offset)
                prev = partition;
            else
                break;
        }
    }

    /* 0 means no matching partition was found */
    return prev ? prev->body_sid : 0;
}


6) When deciding which track to assign this triplet to, we compare both the track number and the body SID (before, only the track number was compared)

static int mxf_get_stream_index(AVFormatContext *s, KLVPacket *klv, int body_sid)
{
    int i;
    for (i = 0; i < s->nb_streams; i++) {
        MXFTrack *track = s->streams[i]->priv_data;
        /* SMPTE 379M 7.3 */
        /* a body_sid or track->body_sid of zero is accepted to stay compatible with the old behavior, where no body_sid was assigned to the track */
        if (track && (body_sid == 0 || track->body_sid == 0 || track->body_sid == body_sid) && !memcmp(klv->key + sizeof(mxf_essence_element_key), track->track_number, sizeof(track->track_number))) {
            return i;
        }
    }
    /* return 0 if only one stream, for OP Atom files with 0 as track number */
    return s->nb_streams == 1 ? 0 : -1;
}
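
To see the effect of the extra body SID check in isolation, here is a small self-contained model of the two lookup rules. It is only a sketch: DemoTrack is a hypothetical stand-in for MXFTrack holding just the two fields the comparison needs, and the track number and SID values used in the example are made up rather than taken from the customer's file.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for MXFTrack with only the fields the lookup uses. */
typedef struct {
    uint8_t track_number[4];
    int body_sid;
} DemoTrack;

/* Old rule: match on the track number only. */
static int find_by_number(const DemoTrack *t, int n, const uint8_t num[4])
{
    for (int i = 0; i < n; i++)
        if (!memcmp(num, t[i].track_number, 4))
            return i;
    return n == 1 ? 0 : -1;
}

/* New rule: the body SID must match as well. A zero SID on either side
 * means "unknown" and is accepted, mirroring the compatibility check. */
static int find_by_number_and_sid(const DemoTrack *t, int n,
                                  const uint8_t num[4], int body_sid)
{
    for (int i = 0; i < n; i++)
        if ((body_sid == 0 || t[i].body_sid == 0 || t[i].body_sid == body_sid) &&
            !memcmp(num, t[i].track_number, 4))
            return i;
    return n == 1 ? 0 : -1;
}
```

With four tracks that share one track number but carry body SIDs 2, 3, 4 and 5, the old rule returns index 0 for every packet, while the new rule returns index 2 for a packet found in a partition with body SID 4.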

Those changes fixed audio read 🙂

Unfortunately that wasn’t the end of it for me… seek issues showed up and the audio packets were too big (the audio was custom wrapped, i.e. clip wrapped but split into 2-second chunks, each chunk in a new partition), but that is a topic for the next post.

 

P.S. The changes described in this post can be found at:

https://github.com/da8eat/FFmpeg/blob/master/libavformat/mxfdec.c
