MXF OP1b + FFmpeg Part1

Some time ago I was asked to fix a strange FFmpeg bug. According to the customer, they had OP1b files from a Panasonic camera and FFmpeg did not read the audio correctly (the first couple of seconds were looped).

My first thought was “easy money”, so I signed up.
The problem was trivial. OP1b (SMPTE 391M) allows more than one essence container, and here each essence is stored in its own essence container. As a result each track is unique only within its essence container, and all audio tracks have the same track number. So when FFmpeg assigns a newly read essence packet to a stream, it checks only the track number and always assigns the packet to the first audio track.

static int mxf_get_stream_index(AVFormatContext *s, KLVPacket *klv)
{
    int i;
    for (i = 0; i < s->nb_streams; i++) {
        MXFTrack *track = s->streams[i]->priv_data;
        /* SMPTE 379M 7.3 */
        if (track && !memcmp(klv->key + sizeof(mxf_essence_element_key), track->track_number, sizeof(track->track_number)))
            return i;
    }
    /* return 0 if only one stream, for OP Atom files with 0 as track number */
    return s->nb_streams == 1 ? 0 : -1;
}

As the file had 4 audio tracks, this produced the loop effect: packets from all 4 tracks, which carried the same audio, landed in the first stream.

To resolve this we need to:

1) In MXFContentStorage, read the field that contains the list of all EssenceContainerData references (local tag 0x1902).

static int mxf_read_content_storage(void *arg, AVIOContext *pb, int tag, int size, UID uid, int64_t klv_offset)
{
    MXFContext *mxf = arg;
    switch (tag) {
    case 0x1901:
        if (mxf->packages_refs)
            av_log(mxf->fc, AV_LOG_VERBOSE, "Multiple packages_refs\n");
        return mxf_read_strong_ref_array(pb, &mxf->packages_refs, &mxf->packages_count);
    case 0x1902:
        /* new: list of EssenceContainerData references */
        return mxf_read_strong_ref_array(pb, &mxf->essence_container_data_refs, &mxf->essence_container_data_count);
    }
    return 0;
}

2) Read each EssenceContainerData set (FFmpeg did not read it at all).

3) Each EssenceContainerData references a SourcePackage and also holds the index SID and body SID.

typedef struct MXFEssenceContainerData {
    UID uid;
    enum MXFMetadataSetType type;
    UID package_uid;
    UID package_ul;
    int index_sid;
    int body_sid;
} MXFEssenceContainerData;

static int mxf_read_essence_container_data(void *arg, AVIOContext *pb, int tag, int size, UID uid, int64_t klv_offset)
{
    MXFEssenceContainerData *essence_data = arg;
    switch (tag) {
    case 0x2701:
        /* linked package UMID: 32 bytes, stored as two 16-byte halves */
        avio_read(pb, essence_data->package_ul, 16);
        avio_read(pb, essence_data->package_uid, 16);
        break;
    case 0x3f06:
        essence_data->index_sid = avio_rb32(pb);
        break;
    case 0x3f07:
        essence_data->body_sid = avio_rb32(pb);
        break;
    }
    return 0;
}

static const MXFMetadataReadTableEntry mxf_metadata_read_table[] = {
    /* other entries omitted to keep the post short */
    { { 0x06,0x0e,0x2b,0x34,0x02,0x53,0x01,0x01,0x0d,0x01,0x01,0x01,0x01,0x01,0x23,0x00 }, mxf_read_essence_container_data, sizeof(MXFEssenceContainerData), EssenceContainerData },
    { { 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 }, NULL, 0, AnyType },
};

4) Go through all tracks of each SourcePackage and assign to each track the index SID and body SID found in the corresponding EssenceContainerData.

for (k = 0; k < mxf->essence_container_data_count; k++) {
    if (!(essence_data = mxf_resolve_strong_ref(mxf, &mxf->essence_container_data_refs[k], EssenceContainerData))) {
        av_log(mxf, AV_LOG_TRACE, "could not resolve essence container data strong ref\n");
        continue;
    }

    /* skip EssenceContainerData that belongs to a different source package */
    if (memcmp(component->source_package_ul, essence_data->package_ul, sizeof(UID)) || memcmp(component->source_package_uid, essence_data->package_uid, sizeof(UID)))
        continue;

    source_track->body_sid = essence_data->body_sid;
    source_track->index_sid = essence_data->index_sid;
}

5) When we read the next KLV triplet, we look up the partition this triplet belongs to; each partition has a body SID.

static int find_body_sid_by_offset(MXFContext *mxf, int64_t offset)
{
    /* look for the partition in which the current KLV triplet is placed */
    int i;
    MXFPartition *prev = NULL;

    for (i = 0; i < mxf->partitions_count; ++i) {
        MXFPartition *partition = &mxf->partitions[i];

        if (partition->body_sid) {
            if (partition->this_partition < offset)
                prev = partition;
            else
                break;
        }
    }

    if (prev)
        return prev->body_sid;

    return 0;
}

6) When we decide which track this triplet belongs to, we compare both the track number and the body SID (before, only the track number was compared).

static int mxf_get_stream_index(AVFormatContext *s, KLVPacket *klv, int body_sid)
{
    int i;
    for (i = 0; i < s->nb_streams; i++) {
        MXFTrack *track = s->streams[i]->priv_data;
        /* SMPTE 379M 7.3 */
        /* a body_sid or track->body_sid of zero is accepted to stay compatible with the old behaviour, where no body_sid was assigned to the track */
        if (track && (body_sid == 0 || track->body_sid == 0 || track->body_sid == body_sid) && !memcmp(klv->key + sizeof(mxf_essence_element_key), track->track_number, sizeof(track->track_number)))
            return i;
    }
    /* return 0 if only one stream, for OP Atom files with 0 as track number */
    return s->nb_streams == 1 ? 0 : -1;
}
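
To see the difference between the two matching rules outside of FFmpeg, here is a minimal toy model. ToyTrack, match_by_number and match_by_number_and_sid are hypothetical names made up for this sketch, and the track number bytes in the usage note are sample values only; none of this is FFmpeg code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for a demuxer track; NOT FFmpeg's MXFTrack. */
typedef struct ToyTrack {
    int     body_sid;          /* 0 means "unknown", as in the post */
    uint8_t track_number[4];
} ToyTrack;

/* Old rule: the first track with a matching track number wins. */
static int match_by_number(const ToyTrack *tracks, int count,
                           const uint8_t number[4])
{
    assert(tracks != NULL && number != NULL);
    for (int i = 0; i < count; i++)
        if (!memcmp(tracks[i].track_number, number, 4))
            return i;
    return -1;
}

/* New rule: the body SID must agree too, unless either side is 0. */
static int match_by_number_and_sid(const ToyTrack *tracks, int count,
                                   const uint8_t number[4], int body_sid)
{
    assert(tracks != NULL && number != NULL);
    for (int i = 0; i < count; i++) {
        int sid_ok = body_sid == 0 || tracks[i].body_sid == 0 ||
                     tracks[i].body_sid == body_sid;
        if (sid_ok && !memcmp(tracks[i].track_number, number, 4))
            return i;
    }
    return -1;
}
```

With three tracks that share one track number but have body SIDs 2, 3 and 4, match_by_number always returns index 0, while match_by_number_and_sid returns 1 for a packet read from the partition with body SID 3.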

Those changes fixed the audio reading 🙂

Unfortunately that wasn't the end of it for me: seek issues appeared and the audio packets were too big (the audio was custom wrapped, i.e. clip wrapped but split into 2-second chunks, each chunk in a new partition). But this is a topic for the next post.


P.S. The changes described in this post can be found at this link:


Reversing ProRes: Part1 (Bitrate)

One of the main problems I have met with third-party ProRes encoders is bitrate. Approximate bitrates can be found in the ProRes white papers.

But here I want to talk about the logic for calculating the maximum possible frame size. Why max? Because Apple doesn't really care about the lower bound, so with a black frame as the source you will never get even close to the bitrate mentioned in the white papers.

Basically Apple has a simple algorithm that calculates two values: the frame size that the encoder never exceeds, and a second one that seems to be the ideal size to fulfill the declared bitrate.

So the max/avg size depends on a few input conditions:

  1. resolution
  2. quality
  3. alpha

The alpha case is quite simple and weird at the same time: if you tell the ProRes encoder that you are going to encode alpha, it automatically increases the max size by 3 * width * height.

The main logic is resolution based:

  • if the resolution is less than or equal to SD_NTSC, it's 288 * 1024
  • if the resolution is less than or equal to SD_PAL, it's 336 * 1024
  • and so on

That was actually the second weird thing I found. I don't mean the link between resolution and bitrate, but the code that implements it:

int size = width * height;
int rate = 0;

if (size <= 720 * 486) {
    rate = 288 * 1024;
} else if (size <= 720 * 576) {
    rate = 336 * 1024;
} else if (size <= 960 * 720) {
    rate = 432 * 1024;
}
// and so on

My first question was “have they ever heard about binary search?” 🙂

Nevertheless, once the base value is found they tune it with respect to quality:

if (qual == proxy) {
    rate = 13 * rate / 63;
} else if (qual == lt) {
    rate = 13 * rate / 28;
}
// and so on

That's basically it for the max frame size. The second value, which I named the avg value, is calculated even more easily: we just multiply the previously found rate by 8 and divide by 9.
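
Putting the pieces together, the whole calculation can be sketched as below. This covers only the three resolution tiers and two quality levels shown in this post; everything else returns -1 rather than a guess. The names (ToyQuality, prores_max_frame_size, prores_avg_frame_size) are made up for the sketch, and the order of the steps (quality scaling first, alpha added last) is my assumption, not something confirmed here.

```c
#include <assert.h>

/* Only the two quality levels shown in the post. */
typedef enum { QUAL_PROXY, QUAL_LT } ToyQuality;

/*
 * Max frame size, in the same units as the post's "rate".
 * Returns -1 for resolutions or qualities the post does not cover.
 */
static long prores_max_frame_size(int width, int height,
                                  ToyQuality qual, int has_alpha)
{
    assert(width > 0 && height > 0);
    long size = (long)width * height;
    long rate;

    if (size <= 720L * 486)
        rate = 288L * 1024;
    else if (size <= 720L * 576)
        rate = 336L * 1024;
    else if (size <= 960L * 720)
        rate = 432L * 1024;
    else
        return -1;              /* larger tiers are not listed in the post */

    switch (qual) {
    case QUAL_PROXY: rate = 13 * rate / 63; break;
    case QUAL_LT:    rate = 13 * rate / 28; break;
    default:         return -1; /* other qualities are not listed */
    }

    /* Assumption: the alpha allowance is added after quality scaling. */
    if (has_alpha)
        rate += 3L * width * height;

    return rate;
}

/* The "avg" value: the max value multiplied by 8 and divided by 9. */
static long prores_avg_frame_size(long max_rate)
{
    return max_rate * 8 / 9;
}
```

For example, prores_max_frame_size(720, 486, QUAL_PROXY, 0) gives 13 * 294912 / 63 = 60854 with integer division.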

P.S. There are also some color space tricks, for example when we encode a 422 frame at 4444 quality, but I won't cover them as it's a somewhat artificial situation.

Reversing ProRes – Part0

Yes, I know the ProRes encoder was reversed a long time ago (for example, there are 3 different encoders in ffmpeg), but:

  • a) there are several needs that none of the ffmpeg versions meets
  • b) it seems I'm the kind of person who prefers to reinvent the wheel

So I started to reverse the ProRes encoder on my own. There are min and max goals I will try to achieve:

The minimum goal is:

  • create a ProRes encoder that produces a correct bitstream (by correct I mean decodable by the most popular ProRes decoders)
  • the encoded frame size is more or less equal to the size produced by the native Apple encoder
  • the encoder performs better than the ffmpeg/Apple versions

The maximum goal is:

  • the encoded frame is binary identical to what the native Apple encoder produces
  • the encoder performs better than the native Apple encoder

So basically the maximum goal is to create a better version of the Apple ProRes encoder without the source code 🙂

Here in the blog I am going to post my progress and thoughts, and I think I will share the code on GitHub.


It sounds a bit strange to me, but I still receive emails asking me to sell or share the DirectShowNETCF source code.

As I said in the first post, I can hardly find a reason why anyone would still need it, so there is no need to support it anymore. Nevertheless, if someone still needs it for any reason, I have published the source code I found on one of my old laptops (I'm not quite sure it is the most recent version).


P.S. Please don't judge the code quality too harshly; it was written ten years ago, mostly for educational purposes 🙂

Reversing MXF OP1a

Today I want to show a technique for reversing MXF files. It's what I do from time to time to repair broken files, but it is also useful if you just need to extract some information.

Our use case: we have an MXF OP1a file with an XAVC stream inside. If we open that file in Sony Catalyst Browser, we can see detailed information including Level and Profile. We need to understand which part of the file contains that information.

What we will need for that:
 1. libMXF, any version (I use a custom fork of v.1.0.0-rc1)
 2. MXFDump from libMXF
 3. any hex viewer (I use Far Manager on Windows)

Before we start: the best way to understand which part of the MXF contains the Level/Profile information is to buy and read
 SMPTE 381-3 "Mapping AVC Streams into the MXF Generic Container". But would that help to improve your MXF reversing technique?

First of all, I want to note that you must have at least a basic understanding of MXF (SMPTE 336M, 377M, 379M). That basic knowledge helps us understand the object structure, BER encoding, the MXF list structure and so on.

There are 2 known ways to find the XAVC Profile/Level:
 1) read an encoded XAVC frame and parse the Sequence Parameter Set (SPS) to extract the fields profile_idc and level_idc
 2) parse the MXF metadata (in most cases, and in our specific case, the descriptor)

For me the 1st way is more solid, except for some weird cases like Avid AVCI (they exclude SPS and PPS). I will explain at the end why the first method is more solid.
 Nevertheless, as I have seen tons of XAVC files that have an SPS but for which Catalyst Browser still does not detect or show the Profile/Level, we can conclude that it uses the 2nd way.
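
For reference, the 1st way can be sketched in a few lines. In the H.264 syntax an SPS starts with profile_idc (1 byte), one byte of constraint flags, then level_idc (1 byte), right after the 1-byte NAL header. The sketch assumes the frame is an Annex B byte stream (NAL units separated by 00 00 01 start codes); find_sps_profile_level is a name invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Scans an Annex B byte stream for the first SPS (nal_unit_type 7) and
 * reads profile_idc and level_idc from it.
 * Returns 0 on success, -1 if no complete SPS header was found.
 */
static int find_sps_profile_level(const uint8_t *buf, size_t len,
                                  int *profile_idc, int *level_idc)
{
    assert(buf != NULL && profile_idc != NULL && level_idc != NULL);

    /* Need 3 start-code bytes + NAL header + 3 SPS bytes = 7 bytes. */
    for (size_t i = 0; i + 7 <= len; i++) {
        if (buf[i] != 0x00 || buf[i + 1] != 0x00 || buf[i + 2] != 0x01)
            continue;
        if ((buf[i + 3] & 0x1F) != 7)   /* low 5 bits: nal_unit_type */
            continue;
        *profile_idc = buf[i + 4];
        /* buf[i + 5] holds the constraint_set flags, skipped here */
        *level_idc = buf[i + 6];
        return 0;
    }
    return -1;
}
```

For a High422@5.2 stream this should report profile_idc 122 and level_idc 52.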

So the first thing we do is make a dump of the MXF file using the MXFDump app from libMXF:

MXFDump.exe -m "e:\xavc.mxf" >> "e:\xavc.txt"

The console output is redirected to a file so it can be opened and analyzed. As we concluded, Catalyst Browser looks for the profile and level in the metadata, most probably in the stream descriptor.

In the resulting dump file a CDCI descriptor was found. The descriptor contains 1 field that MXFDump knows nothing about, so it is marked as Dark (see image 1).
(image 1)
That dark field has the structure typical of an MXF list: the first 4 bytes are the number of elements, the next 4 bytes are the element size, and after that come all the elements of the defined size.
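
A minimal sketch of reading such a list header, assuming big-endian integers as everywhere else in MXF. MxfArrayHeader and parse_mxf_array_header are names made up for the example, not libMXF API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct MxfArrayHeader {
    uint32_t count;      /* number of elements */
    uint32_t item_size;  /* size of each element in bytes */
} MxfArrayHeader;

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Parses the 8-byte list header at the start of `buf` and points *items
 * at the first element. Returns 0 on success, or -1 if the buffer is too
 * short for the header or for the elements it announces.
 */
static int parse_mxf_array_header(const uint8_t *buf, size_t len,
                                  MxfArrayHeader *hdr, const uint8_t **items)
{
    assert(buf != NULL && hdr != NULL && items != NULL);
    if (len < 8)
        return -1;
    hdr->count     = read_be32(buf);
    hdr->item_size = read_be32(buf + 4);
    /* 64-bit product so that count * item_size cannot overflow */
    if ((uint64_t)hdr->count * hdr->item_size > len - 8)
        return -1;
    *items = buf + 8;
    return 0;
}
```

For a dark field holding a single UID, the header bytes would be 00 00 00 01 00 00 00 10, followed by the 16 bytes of the UID.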
 In this particular case we see a list with 1 element that is 16 bytes long, so it looks like a UID. One observation confirms it: if we compare the value with the instance UID of the CDCI descriptor, the two are almost identical except for the 4th byte, which is increased by 1. That makes me think it was generated by the same UID generation algorithm, right after the CDCI descriptor's instance UID. We still don't know what that UID means, but the line right after the CDCI descriptor says "[ Omitted 1 dark KLV triplets ]", which suggests there is an object unknown to MXFDump. Let's assume the dark UID in the CDCI descriptor refers to this object. So it's time to open the hex viewer, find our unknown UID and check what precedes and follows it.
(image 2)
The UID was found twice (highlighted in image 2 by the black and red rectangles). The first is obviously our CDCI field; the second looks more interesting. We assumed it is the Instance UID of the unknown object, so let's check whether that is true. Time to get back to our basic understanding of MXF. We know that an MXF object looks like:
 1) 16 bytes long Key
 2) BER encoded length
 3) Set of values that represented like:
 3.1) 2 bytes unique value (Local Tag) which we can use to find key in Primer
 3.2) 2 bytes length of value
 3.3) Value
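
The item layout (3.1) to (3.3) is simple enough to walk by hand. A minimal sketch, assuming 2-byte big-endian tags and lengths as described above and that `body` points just past the key and BER length; find_local_tag is a hypothetical helper, not part of libMXF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Walks the items of a local set body and returns a pointer to the value
 * of the first item whose local tag equals `wanted_tag`. The value length
 * is stored in *value_len. Returns NULL if the tag is absent or an item
 * runs past the end of the body.
 */
static const uint8_t *find_local_tag(const uint8_t *body, size_t body_len,
                                     uint16_t wanted_tag, size_t *value_len)
{
    assert(body != NULL && value_len != NULL);
    size_t pos = 0;

    while (body_len - pos >= 4) {
        uint16_t tag = (uint16_t)((body[pos] << 8) | body[pos + 1]);
        size_t   len = (size_t)((body[pos + 2] << 8) | body[pos + 3]);
        pos += 4;
        if (len > body_len - pos)
            return NULL;            /* truncated item */
        if (tag == wanted_tag) {
            *value_len = len;
            return body + pos;
        }
        pos += len;                 /* skip the value, go to next item */
    }
    return NULL;
}
```

Called with tag 0x3C0A on an object body, it should return the 16-byte Instance UID if the object has one.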

So if the found UID is the Instance UID of an unknown object, it should follow the same rules: the UID corresponds to (3.3), and two 2-byte values, (3.1) and (3.2), should precede it. Moreover, we know our UID is 16 bytes long, so (3.2) should be equal to 16, or 0x00 0x10 in hex. If you check the hex viewer (image 2, highlighted by the yellow rectangle) you'll see that (3.2) really is equal to 16
 and (3.1) is equal to 0x3C 0x0A. If we are right, we should find a record in the Primer with Local Tag equal to 0x3C 0x0A (image 3).
(image 3)
Let's check the dump again: yes, there is a record with Local Tag equal to 3c.0a, and the UID that corresponds to this tag is 06.0e.2b.
 If you grep libMXF for that UID you'll find:
MXF_ITEM_DEFINITION(InterchangeObject, InstanceUID,
                        MXF_LABEL (0x06,0x0e,0x2b,0x34,0x01,0x01,0x01,0x01,0x01,0x01,0x15,0x02,0x00,0x00,0x00,0x00),
OK, we confirmed that the Dark item contains the UID of an object referenced from the CDCI descriptor. Let's name it SubDescriptor.
 Next assumption: Catalyst Browser looks for Profile and Level in the SubDescriptor referenced from the CDCI descriptor. That is easy to confirm. For this particular file the browser shows
 "Profile and level" as High422@5.2. A quick search tells us that for the High422 profile profile_idc should be equal to 122 (0x7A in hex), and for level 5.2 level_idc should be equal to 52 (0x34 in hex). If we return to the sub descriptor we found (image 2, highlighted by the orange rectangles) we can see both profile and level present:
 Profile - 1 byte long value with Local Tag equal to 80 08
 Level - 1 byte long value with Local Tag equal to 80 0B

Knowing the local tags, we can find the unique UIDs for the Level and Profile fields in the Primer (image 3).

So in the end we can conclude that Sony Catalyst Browser reads Profile and Level from the sub descriptor referenced from the CDCI descriptor.
P.S. Yes, it is described on 15 pages of SMPTE 381-3. Unfortunately I spent one day reversing sample files instead of buying that spec.

P.P.S. I promised to explain why parsing the SPS is more solid than reading the sub descriptor. I have seen tons of different MXF files with a wrapped AVC stream (AVC, AVCI, XAVC) and tons of different descriptors: some have an AVC sub descriptor, some have the sub descriptor fields added directly to the CDCI descriptor, some files have an MPEG descriptor instead of CDCI, and so on. And by the way, SMPTE 381-3 says all AVC sub descriptor fields are optional. Only parsing the SPS can guarantee you the same result regardless of the descriptor structure (except for the Avid case I mentioned before).

Closed Captions and MXF

Last week I went deep into wrapping Closed Captions into MXF (OP1a, AS-02).
The task was to extend our MXF writer to write a captions track and extend the reader to read a captions track.
By captions I mean CDP (Closed Caption Data Packet).

I had a bunch of OP1a/AS-02 files containing a captions track, so of course the first thing I did was make a dump to reverse the structure.
Any CDP should have a header that starts with the magic 0x9669, so to read it you don't even have to understand the structure (in case you don't need AFD or the like). It's enough to go through the data after the Captions Essence Element key and find the start of the CDP header. But that's not enough if you want to write captions or want to do everything right.
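
The "just look for the magic" approach can be sketched as follows. The scan for 0x96 0x69 is exactly what is described above; the extra sanity check uses the byte right after the magic as the CDP length, which is my understanding of the CDP layout rather than something taken from this post. find_cdp is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns the offset of the first plausible CDP in `buf`, or -1 if none.
 * A candidate is accepted when the magic 0x96 0x69 is followed by a
 * length byte (assumed layout) that keeps the whole packet inside the
 * buffer.
 */
static long find_cdp(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i + 3 <= len; i++) {
        if (buf[i] != 0x96 || buf[i + 1] != 0x69)
            continue;
        size_t cdp_len = buf[i + 2];    /* assumed: total CDP size */
        if (cdp_len >= 3 && cdp_len <= len - i)
            return (long)i;
    }
    return -1;
}
```

The reader would call this on the value of the captions essence element and, on success, hand the bytes starting at the returned offset to the CDP parser.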
Captions data mapping in MXF is described in SMPTE 436M, which costs $75. Before buying it I decided to google whether something had been discussed on forums. The most useful comment was by link (
It more or less describes the header before the CDP packet, except for the last 8 bytes. As a result I had to buy SMPTE 436M, and I was really surprised that 436M says nothing about those 8 bytes 🙁
I read the standard 4 or 5 times and finally found the answer in the line “The Payload Byte Array is an MXF array including an array element count”.
So those 8 bytes are part of the MXF Array structure:
first 4 bytes – number of elements (in our case the size of the packet)
second 4 bytes – element size; as we have a byte array it is always 1

After that, it is logical to ask why that size differs from the “payload sample count” mentioned at that link.
This is the second thing not mentioned in that comment: the standard says that the payload should be double word aligned, and if it is not, it should be padded with zeros.

So let's say you need to map a CDP of size 15 to MXF. The header structure will be:

1st – 2nd bytes: 0x00 0x01 (we have just 1 packet so it's 1)
3rd – 4th bytes: 0x00 0x09 (as a rule captions are in line 9, but you can set the number you need)
5th byte: 0x02 (if it's an interlaced or PsF frame, or 0x04 if it's progressive)
6th byte: 0x04 (8-bit luma)
7th – 8th bytes: 0x00 0x12 (why 18? because we add 3 bytes before the CDP: DID, SDID and CDP size)
9th – 12th bytes: 0x00 0x00 0x00 0x14 (20 because we pad the 18-byte packet to 20 to be double word aligned)
13th – 16th bytes: 0x00 0x00 0x00 0x01 (always 1)
17th byte: 0x61 (DID, could be 0x80 as well)
18th byte: 0x01 (SDID: 1 if CEA-708 or 2 if 608; could also be different if DID = 0x80)
19th byte: 0x0F (size of your CDP)
20th – 34th bytes: your CDP
35th – 36th bytes: 0x00 0x00 (padding)
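
The same layout as code, as a minimal sketch: it writes exactly the bytes listed above for one CDP on one line, with DID 0x61 and SDID 0x01 hard-coded. wrap_cdp_436m and its parameters are invented for this example and are not taken from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/*
 * Writes the header, DID/SDID/size, the CDP and zero padding into `out`.
 * Returns the number of bytes written, or 0 if the CDP is longer than
 * 255 bytes or `out_cap` is too small.
 */
static size_t wrap_cdp_436m(const uint8_t *cdp, size_t cdp_len,
                            int progressive, uint16_t line,
                            uint8_t *out, size_t out_cap)
{
    assert(cdp != NULL && out != NULL);
    if (cdp_len > 255)
        return 0;

    size_t samples = cdp_len + 3;                 /* DID + SDID + size */
    size_t padded  = (samples + 3) & ~(size_t)3;  /* 4-byte alignment */
    size_t total   = 16 + padded;
    if (total > out_cap)
        return 0;

    put_be16(out, 1);                     /* number of packets */
    put_be16(out + 2, line);              /* line number */
    out[4] = progressive ? 0x04 : 0x02;   /* frame type */
    out[5] = 0x04;                        /* 8-bit luma */
    put_be16(out + 6, (uint16_t)samples); /* payload sample count */
    put_be32(out + 8, (uint32_t)padded);  /* array: element count */
    put_be32(out + 12, 1);                /* array: element size */
    out[16] = 0x61;                       /* DID */
    out[17] = 0x01;                       /* SDID (CEA-708) */
    out[18] = (uint8_t)cdp_len;           /* size of the CDP */
    memcpy(out + 19, cdp, cdp_len);
    memset(out + 16 + samples, 0, padded - samples);
    return total;
}
```

For a 15-byte CDP on line 9 of an interlaced frame this produces the 36 bytes from the list above.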

1st one

hello world 🙂

After 3 years of silence I decided to get back to blogging.
my previous one was
It was more or less about DirectShow for .NET CF. Unfortunately, one day I lost the DB because of hosting problems and decided not to restore that blog, as I had no more interest in Win Mobile/CE programming, nor in .NET or .NET CF.

So this one will be somehow related to video/audio processing:

– sometimes I will post about problems I meet and resolve while repairing media

– sometimes I hope to post something about performance optimizations: SIMD, GPU or algorithmic

– sometimes it could be media parsing/muxing issues