Prores progress updates

Since I've gotten some questions about progress, I decided to post some updates and clarifications:

  1. I managed to improve performance, so the encoder is now a bit faster than the Apple implementation while producing identical output (and I still see room for improvement)
  2. I fixed some minor issues and fully implemented the XQ profile
  3. About 12-bit support: there was a thread on the ffmpeg-devel list where some core developers claimed that 12-bit ProRes is a myth. For the record, the Apple encoder encodes all data as 12 bit: even if you pass 8-bit uyvy, it is first converted to 12 bits and only then encoded
  4. Based on point (3) I can admit a big mistake I made in Cinedeck ProRes insert-edit. Basically, Cinedeck checks some stream parameters to decide whether the input stream should be re-encoded before the insert. One of them is the source pixel format: if input and output have different source pixel formats, the video gets re-encoded before the insert. Now I can say this is wrong behavior, because on the encoder side everything is always 12 bit anyway, so it is the chroma subsampling that should be checked, not the source pixel format
  5. There is one more utility I'm working on: a ProRes smart transcoder.
    Let's say we need to transcode ProRes HQ to ProRes Proxy. This is how any transcoding app will do it:
    1) Decode the frame (VLC decode -> dequantize -> inverse DCT -> assemble slices into a frame buffer)
    2) Encode (disassemble the frame into slices -> forward DCT -> rate control -> quantize -> VLC encode)

    At first glance that looks fine, but from my point of view it should be:
    1) VLC decode -> dequantize -> rate control -> quantize -> VLC encode
    So basically I got rid of some heavy but useless steps, which makes the transcode almost 2x faster compared to the classical way (a rough sketch follows this list).
    Obviously, this only works if you transcode from ProRes to ProRes.
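
Here is a minimal slice-level sketch of that idea. All helper names (vlc_decode_slice, dequantize_slice, rate_control, quantize_slice, vlc_encode_slice) and the Slice layout are hypothetical placeholders of mine, not the real encoder's API; the point is only that the data never leaves the DCT domain:

#include <stddef.h>
#include <stdint.h>

/* hypothetical slice kept in the DCT-coefficient domain */
typedef struct Slice {
    int16_t coeffs[8 * 8 * 64]; /* 8 macroblocks x 8 blocks x 64 coefficients (4:2:2) */
    int     mb_count;
    int     qscale;
} Slice;

/* hypothetical helpers, named only for illustration */
void vlc_decode_slice(const uint8_t *src, size_t src_size, Slice *s);
void dequantize_slice(Slice *s);
int  rate_control(const Slice *s, int target_slice_bits);
void quantize_slice(Slice *s, int qscale);
void vlc_encode_slice(const Slice *s, uint8_t *dst, size_t *dst_size);

/* smart transcode of one slice (e.g. HQ in, Proxy out): the inverse DCT,
 * frame assembly, frame disassembly and forward DCT steps of the classical
 * path all disappear */
void transcode_slice(const uint8_t *src, size_t src_size,
                     uint8_t *dst, size_t *dst_size, int target_slice_bits)
{
    Slice s;
    vlc_decode_slice(src, src_size, &s);            /* entropy decode           */
    dequantize_slice(&s);                           /* recover DCT coefficients */
    s.qscale = rate_control(&s, target_slice_bits); /* pick a new quantizer     */
    quantize_slice(&s, s.qscale);                   /* re-quantize              */
    vlc_encode_slice(&s, dst, dst_size);            /* entropy encode           */
}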

I'm still quite far from showing a demo (except for some command-line applications), but here is my priority list:

  1. Make a user-friendly UI so it's easy to show and use
  2. Finish the MOV parser/muxer (as I'm not a fan of using FFmpeg for the demo)

How good is your ProRes workflow?

I was silent for a year, and to be fair I hadn't touched ProRes for a year either, except for the last couple of months.

So what happened:
Basically, I was searching for myself, for what I want to do and what kind of work makes me happy. I tried a couple of different projects, but in the end I came back to the fact that reverse engineering and video codecs are what interest me most.

I returned to reversing ProRes, where I went through a couple of different periods/questions:

1. Why would anyone need it if Apple shares it for free?

2. Why would anyone need it if ffmpeg has 2 different implementations?

3. Why can't I contribute to ffmpeg?

Every period made my conviction that one more ProRes encoder is needed even stronger, and here are my answers to the questions I asked myself:

1. First of all, it is possible to get the Apple implementation (obviously not the source code) for free; my experience at a previous employer confirms it. But the procedure is unclear, and it seems you need connections as well.

2. Both ffmpeg implementations are far from what the Apple encoder does: neither supports 12-bit input, and both produce bad output, especially for the Proxy and XQ qualities.

2.1 Anatoly's implementation has really poor rate control, with all those min/max quant limits.

2.2 Kostya's implementation has better rate control, but it's still nothing like what Apple does, and it's way slower.

3. At some point I tried to contribute to ffmpeg. One of my patches was approved, but when I started working on performance optimizations and pushed a patch, I just got ignored… Yes, the patch didn't break the FATE tests and it improved performance, but it was neither approved nor declined nor even commented on; it will just sit in the review state forever. And to be fair, reading the ffmpeg-devel emails I found that the guys are more eager to fight about sponsored changes than about anything else, so yes, deep inside, ffmpeg has become super commercialized 🙁

So now, about the progress I've made:

1. I finished the first version of the encoder, which supports 8/10/12-bit input and 422 and 444 input (alpha is still not implemented).

2. The encoder has Apple-like rate control and, to be fair, on the same input it produces absolutely identical output compared to the Apple version.

It is about 3x faster than the ffmpeg encoder, but still slower than the Apple one (there is still a lot of room for optimization).

Returning to the question of why anyone would need one more ProRes encoder, here are the pros of what I'm doing:

  1. Absolutely identical output to the Apple implementation
  2. 12-bit support
  3. At some point ProRes RAW will be added

In other words, it's way better than what ffmpeg does and easy to port to any platform (which is not that easy to get from Apple, even if you do manage to get anything from them).

But the main reason is a new product I'm building on top of this encoder.

Once I understood the Apple bitstream and logic in general well enough, I almost immediately had a couple of ideas for how to use it:
1. In-place editing. For example, if you don't like a couple of frames in your final file, you can now replace them without re-encoding the whole file.
1.1. You can't do it with the Apple implementation, as you cannot control the output frame size.
1.2. You can do it with Cinedeck tools (but you would need to re-wrap your file), as they rely on the Apple implementation. That's probably not a big deal if your file is a couple of hundred megabytes, but what if it's hundreds of gigabytes? And what if your file is on S3?

2. I decided to go even further. Let's say you need to put a logo on a frame or blur some part of a frame. Now you can not only replace frames inside a file, it's also possible to replace part of any frame. In other words, to add a logo to a ProRes-encoded frame you only need to re-encode the part of the frame where the logo will be placed.

That's basically it. Sounds exciting? Ping me if you're going to NAB2019 so you can see how powerful it is!

MXF OP1b + FFmpeg Part1

Some time ago I was asked to fix a strange ffmpeg bug. In the customer's words, they had OP1b files from a Panasonic camera whose audio ffmpeg didn't read correctly (the first couple of seconds were looped).

My first thought was "easy money", so I signed up.
The problem was trivial. Basically, OP1b (SMPTE 391M) allows more than one essence container, and each essence is stored in its own essence container. As a result, each track is unique within its essence container, and all audio tracks have the same track number. So when ffmpeg assigns a newly read essence packet, it checks only the track number and always assigns the packet to the first audio track:

static int mxf_get_stream_index(AVFormatContext *s, KLVPacket *klv)
{
    int i;
    for (i = 0; i < s->nb_streams; i++) {
        MXFTrack *track = s->streams[i]->priv_data;
        /* SMPTE 379M 7.3 */
        if (track && !memcmp(klv->key + sizeof(mxf_essence_element_key), track->track_number, sizeof(track->track_number))) {
            return i;
        }
    }
    /* return 0 if only one stream, for OP Atom files with 0 as track number */
    return s->nb_streams == 1 ? 0 : -1;
}

As the file had 4 audio tracks, this produced the loop effect, because all 4 tracks ended up with the same audio.

To resolve this we need to:

1) In MXFContentStorage, read the field that contains the list of all EssenceContainerData sets (tag 0x1902):

static int mxf_read_content_storage(void *arg, AVIOContext *pb, int tag, int size, UID uid, int64_t klv_offset)
{
    MXFContext *mxf = arg;
    switch (tag) {
    case 0x1901:
        if (mxf->packages_refs)
            av_log(mxf->fc, AV_LOG_VERBOSE, "Multiple packages_refs\n");
        av_free(mxf->packages_refs);
        return mxf_read_strong_ref_array(pb, &mxf->packages_refs, &mxf->packages_count);
    case 0x1902:
        av_free(mxf->essence_container_data_refs);
        return mxf_read_strong_ref_array(pb, &mxf->essence_container_data_refs, &mxf->essence_container_data_count);
    }
    return 0;
}


2) Read each EssenceContainerData set (ffmpeg didn't read it at all).

3) Each EssenceContainerData set holds a reference to a SourcePackage as well as the index SID and body SID:

typedef struct MXFEssenceContainerData {
    UID uid;
    enum MXFMetadataSetType type;
    UID package_uid;
    UID package_ul;
    int index_sid;
    int body_sid;
} MXFEssenceContainerData;

static int mxf_read_essence_container_data(void *arg, AVIOContext *pb, int tag, int size, UID uid, int64_t klv_offset)
{
    MXFEssenceContainerData * essence_data = arg;
    switch(tag) {
        case 0x2701:
            /* linked package umid UMID */
            avio_read(pb, essence_data->package_ul, 16);
            avio_read(pb, essence_data->package_uid, 16);
            break;
        case 0x3f06:
            essence_data->index_sid = avio_rb32(pb);
            break;
        case 0x3f07:
            essence_data->body_sid = avio_rb32(pb);
            break;
    }
    return 0;
}

static const MXFMetadataReadTableEntry mxf_metadata_read_table[] = {
//other entries removed to keep the post short
    { { 0x06,0x0e,0x2b,0x34,0x02,0x53,0x01,0x01,0x0d,0x01,0x01,0x01,0x01,0x01,0x23,0x00 }, mxf_read_essence_container_data, sizeof(MXFEssenceContainerData), EssenceContainerData },
    { { 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 }, NULL, 0, AnyType },
};


4) Go through all tracks of each SourcePackage and assign to each track the index SID and body SID found in the corresponding EssenceContainerData set:

for (k = 0; k < mxf->essence_container_data_count; k++) {
    if (!(essence_data = mxf_resolve_strong_ref(mxf, &mxf->essence_container_data_refs[k], EssenceContainerData))) {
        av_log(mxf, AV_LOG_TRACE, "could not resolve essence container data strong ref\n");
        continue;
    }

    if (memcmp(component->source_package_ul, essence_data->package_ul, sizeof(UID)) || memcmp(component->source_package_uid, essence_data->package_uid, sizeof(UID))) {
        continue;
    }

    source_track->body_sid = essence_data->body_sid;
    source_track->index_sid = essence_data->index_sid;
}


5) When we read the next KLV triplet, we look up the partition this triplet belongs to; each partition has a body SID:

static int find_body_sid_by_offset(MXFContext *mxf, int64_t offset) {
    //we basically look for the partition where the current klv triplet is placed

    int i;
    MXFPartition * prev = 0;

    for (i = 0; i < mxf->partitions_count; ++i) {
        MXFPartition * partition = &mxf->partitions[i];

        if (partition->body_sid) {
            if (partition->this_partition < offset) {
                prev = partition;
            }
            else {
                break;
            }
        }
    }

    if (prev) {
        return prev->body_sid;
    }

    return 0;
}


6) When we look for the track to assign this triplet to, we compare both the track number and the body SID (previously only the track number was compared):

static int mxf_get_stream_index(AVFormatContext *s, KLVPacket *klv, int body_sid)
{
    int i;
    for (i = 0; i < s->nb_streams; i++) {
        MXFTrack *track = s->streams[i]->priv_data;
        /* SMPTE 379M 7.3 */
        //we accept body_sid == 0 or track->body_sid == 0 just to stay compatible with the old path where no body_sid is assigned to the track
        if (track && (body_sid == 0 || track->body_sid == 0 || track->body_sid == body_sid) && !memcmp(klv->key + sizeof(mxf_essence_element_key), track->track_number, sizeof(track->track_number))) {
            return i;
        }
    }
    /* return 0 if only one stream, for OP Atom files with 0 as track number */
    return s->nb_streams == 1 ? 0 : -1;
}

Those changes fixed the audio reading 🙂

Unfortunately, that wasn't the end of it for me… Seek issues appeared and audio packets were too big (the audio was custom wrapped: essentially clip wrapped, but split into 2-second chunks, each chunk in a new partition). But that is a topic for the next post.


P.S. The changes described in this post can be found here:

https://github.com/da8eat/FFmpeg/blob/master/libavformat/mxfdec.c


Reversing ProRes: Part1 (Bitrate)

One of the main problems I've run into with third-party ProRes encoders is bitrate. Approximate bitrates can be found in the ProRes white paper.

But here I'm talking about the logic for calculating the maximum possible frame size. Why maximum? Because Apple doesn't really care about the lower bound, so with a black frame as the source you will never get even close to the bitrate mentioned in the white paper.

Basically, Apple has a simple algorithm: they calculate a frame size that the encoder never exceeds, and a second value that seems to be the ideal size to fulfill the declared bitrate.

So the max/avg size depends on a few input parameters:

  1. resolution
  2. quality
  3. alpha

The alpha case is quite simple and weird at the same time: if you tell the ProRes encoder that you are going to encode alpha, it automatically increases the max size by 3 * width * height.

The main logic is resolution based:

  • if the resolution is less than or equal to SD NTSC, it's 288 * 1024
  • if the resolution is less than or equal to SD PAL, it's 336 * 1024
  • and so on

That was actually the second weird thing I found: the way it is implemented in code:

int size = width * height;
int rate = 0;

if (size <= 720 * 486) {
    rate = 288 * 1024;
}
else if (size <= 720 * 576) {
    rate = 336 * 1024;
}
else if (size <= 960 * 720) {
    rate = 432 * 1024;
}
// so on

My first question was "have they ever heard of binary search?" 🙂

Nevertheless, once the base value is found, they tune it with respect to quality:

if (qual == proxy) {
    rate = 13 * rate / 63;
}
else if (qual == lt) {
    rate = 13 * rate / 28;
}
//so on

That's basically it for the max frame size. The second value, which I call the avg value, is calculated even more simply: we just multiply the previously found rate by 8 and divide by 9. A small sketch putting it all together is below.
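
Here is a toy re-implementation of the logic described in this post, with my own names (prores_frame_sizes, prores_quality) and only the thresholds and ratios quoted above filled in; how the alpha bump interacts with the avg value is my guess:

#include <stdio.h>

typedef enum { PRORES_PROXY, PRORES_LT } prores_quality; /* other qualities elided */

/* returns 0 on success, -1 for the cases not quoted in this post */
static int prores_frame_sizes(int width, int height, prores_quality qual,
                              int alpha, int *max_size, int *avg_size)
{
    int size = width * height;
    int rate;

    /* base value depends on resolution */
    if (size <= 720 * 486)      rate = 288 * 1024;
    else if (size <= 720 * 576) rate = 336 * 1024;
    else if (size <= 960 * 720) rate = 432 * 1024;
    else return -1;             /* further resolution steps elided ("and so on") */

    /* tune with respect to quality */
    if (qual == PRORES_PROXY)   rate = 13 * rate / 63;
    else if (qual == PRORES_LT) rate = 13 * rate / 28;
    else return -1;             /* SQ/HQ/4444/XQ factors elided */

    *max_size = rate;
    if (alpha)                               /* alpha enlarges the upper bound;     */
        *max_size += 3 * width * height;     /* its effect on avg is my assumption */

    *avg_size = rate * 8 / 9;   /* the "ideal" size used to hit the declared bitrate */
    return 0;
}

int main(void)
{
    int max_size, avg_size;
    if (!prores_frame_sizes(720, 576, PRORES_PROXY, 0, &max_size, &avg_size))
        printf("Proxy SD PAL: max %d bytes, target %d bytes per frame\n", max_size, avg_size);
    return 0;
}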

https://github.com/da8eat/prores_encoder

P.S. There are also some color-space tricks, for example when we encode a 422 frame at 4444 quality, but I won't cover that, as it's a somewhat artificial situation.

Reversing Prores – Part0

Yes, I know the ProRes encoder was reversed a long time ago; for example, there are 3 different encoders in ffmpeg. But:

  • a) there are several needs that none of the ffmpeg versions meet
  • b) it seems I'm the kind of person who prefers to reinvent the wheel

So I started to reverse the ProRes encoder on my own. There are minimum and maximum goals I will try to achieve.

The minimum goal is:

  • create a ProRes encoder that produces a correct bitstream (by correct I mean decodable by the most popular ProRes decoders),
  • encoded frame size more or less equal to the size produced by the native Apple encoder
  • the encoder performs better than the ffmpeg/Apple versions

The maximum goal is:

  • encoded frames binary-identical to what the native Apple encoder produces
  • the encoder performs better than the native Apple encoder

So basically the maximum goal is to create a better version of the Apple ProRes encoder without the source code 🙂

Here in the blog I am going to post my progress and thoughts, and I think I will share the code on GitHub.

DirectShowNETCF

It sounds a bit strange to me, but I still receive emails asking me to sell or share the DirectShowNETCF source code.

As I said in my first post, I can hardly find a reason why anyone would still need it, so there is no need to support it anymore. Nevertheless, if someone still needs it for any reason, I've published the source code I found on one of my old laptops (not quite sure if it's the most recent version):

https://github.com/da8eat/directshow_netcf


P.S. Please don't judge the code quality too harshly; it was written many years ago, mostly for educational purposes 🙂

Reversing mxf op1a

Today I want to show a technique for reversing MXF files. It's something I do from time to time to repair broken files, but it's also useful if you need to extract some information.

Our use case: we have an MXF OP1a file with an XAVC stream inside. If we open that file in the Sony Catalyst browser, we can see detailed information including Level and Profile; we need to understand which part of the file contains that information.

What we will need for that:
 1. libMXF, any version (I use a custom fork of v1.0.0-rc1)
 2. MXFDump from libMXF
 3. any hex viewer (I use Far Manager on Windows)

Before we start: the best way to understand which part of the MXF contains the Level/Profile information is to buy and read SMPTE 381-3, "Mapping AVC Streams into the MXF Generic Container". But would that improve your MXF reversing technique?

First of all, I want to note that you must have at least a basic understanding of MXF (SMPTE 336M, 377M, 379M); that basic knowledge helps us understand the object structure, BER lengths, the MXF list structure and so on.

There are 2 known ways to find the XAVC Profile/Level:
 1) read an encoded XAVC frame and parse the Sequence Parameter Set (SPS) to extract the profile_idc and level_idc fields
 2) parse the MXF metadata (in most cases, and in our specific case, parse the descriptor)

The 1st way is more solid in my opinion, except for some weird cases like Avid AVC-I (they exclude the SPS and PPS); I will explain at the end why the first method is more solid. Nevertheless, since I've seen tons of XAVC files that have an SPS but for which the Catalyst browser still doesn't detect/show the Profile/Level, we can conclude that it uses the 2nd way. (A minimal SPS-parsing sketch follows.)
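
For reference, way (1) boils down to reading two bytes at fixed positions in the SPS NAL unit; this little helper of mine (not part of libMXF) shows the idea, assuming the buffer starts at the NAL header byte:

#include <stddef.h>
#include <stdint.h>

/* returns 0 on success, -1 if the buffer does not start with an SPS NAL */
static int parse_sps_profile_level(const uint8_t *sps, size_t size,
                                   int *profile_idc, int *level_idc)
{
    if (size < 4 || (sps[0] & 0x1f) != 7)  /* nal_unit_type 7 == SPS */
        return -1;

    *profile_idc = sps[1];  /* e.g. 122 (0x7A) == High 4:2:2 */
    /* sps[2] carries the constraint_set flags and reserved bits */
    *level_idc   = sps[3];  /* e.g. 52 (0x34) == level 5.2 */
    return 0;
}

For the file discussed below it would return 122 and 52, i.e. High422@5.2.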

So the first thing we do is make a dump of the MXF file in question using the MXFDump app from libMXF:

MXFDump.exe -m "e:\xavc.mxf" >> "e:\xavc.txt"

The console output is redirected to a file so it can be opened and analyzed. As established above, the Catalyst browser looks for the profile and level in the metadata, most probably in the stream descriptor.

In the resulting dump file a CDCI descriptor was found; the descriptor contains one field that MXFDump knows nothing about, so it marked it as dark (see image 1).

(image 1)

That dark field has a structure typical for an MXF list: the first 4 bytes are the number of elements, the next 4 bytes are the element size, and after that come all the elements of the given size.
In our particular case we see a list with one element, 16 bytes long, so it looks like a UID. There is one observation that confirms it: if we compare the UID value with the Instance UID of the CDCI descriptor, we find that they are almost identical except for the 4th byte, which is incremented by 1. This makes me think we have found a UID generated with the same UID generation algorithm, right after the CDCI descriptor's Instance UID was generated. We still don't know what that UID means, but the next line after the CDCI descriptor says "[ Omitted 1 dark KLV triplets ]", which makes me think there is an object there (unknown to MXFDump). Let's assume the dark UID in the CDCI descriptor refers to this object, so it's time to open our hex viewer, find our unknown UID, and check what precedes and follows it.

(image 2)

The UID was found twice (highlighted in image 2 with the black and red rectangles). The 1st occurrence is obviously our CDCI field; the second looks more interesting to us. We assumed it is the Instance UID of an object unknown to us; let's check whether that's true. Time to go back to our basic understanding of MXF: we know that an MXF object looks like this (a small parsing sketch follows the list):
 1) a 16-byte Key
 2) a BER-encoded length
 3) a set of values, each represented as:
 3.1) a 2-byte unique value (Local Tag), which we can use to find the key in the Primer
 3.2) a 2-byte length of the value
 3.3) the value
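
In code, walking the value area of such a local set looks roughly like this (the callback and buffer handling are mine, just for illustration):

#include <stddef.h>
#include <stdint.h>

static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* walks the value area of a local set and reports every (tag, value) item */
static void walk_local_set(const uint8_t *body, size_t size,
                           void (*on_item)(uint16_t tag, const uint8_t *val, uint16_t len))
{
    size_t pos = 0;
    while (pos + 4 <= size) {
        uint16_t tag = read_be16(body + pos);     /* (3.1) local tag, e.g. 0x3c0a */
        uint16_t len = read_be16(body + pos + 2); /* (3.2) value length, e.g. 16 for a UID */
        pos += 4;
        if (pos + len > size)
            break;
        on_item(tag, body + pos, len);            /* (3.3) the value itself */
        pos += len;
    }
}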

So if the found UID is the Instance UID of an unknown object, it should follow the same rules: the found UID corresponds to (3.3), and two 2-byte values should precede it ((3.1) and (3.2)). Moreover, we know our UID is 16 bytes long, so (3.2) should be equal to 16, or 0x00 0x10 in hex. If you check the hex viewer (image 2, highlighted with the yellow rectangle), you'll see that (3.2) is indeed equal to 16 and (3.1) is equal to 0x3C 0x0A. If we are right, we should find a record in the Primer with a Local Tag equal to 0x3C 0x0A (image 3).

(image 3)

Let's check the dump again, and yes, we see a record there with a Local Tag equal to 3c.0a, and the UID corresponding to this tag is 06.0e.2b.34.01.01.01.01.01.01.15.02.00.00.00.00.
If you grep libMXF for that UID you'll find:
MXF_ITEM_DEFINITION(InterchangeObject, InstanceUID,
                        MXF_LABEL(0x06,0x0e,0x2b,0x34,0x01,0x01,0x01,0x01,0x01,0x01,0x15,0x02,0x00,0x00,0x00,0x00),
                        0x3c0a,
                        MXF_UUID_TYPE,
                        1
);
OK, we've confirmed that the dark item contains the UID of an object referenced from the CDCI descriptor; let's name it SubDescriptor.
Next assumption: the Catalyst browser looks for the Profile and Level in the SubDescriptor referenced from the CDCI descriptor. It's easy to confirm: for this particular file the browser shows "Profile and level" detected as High422@5.2. It's not hard to google that for the High 4:2:2 profile profile_idc should be 122, or 0x7A in hex, and that level_idc for level 5.2 is 52, or 0x34 in hex. If we go back to the sub descriptor we found (image 2, highlighted with the orange rectangles), we can see both the profile and the level present:
 Profile - a 1-byte value with Local Tag equal to 80 08
 Level - a 1-byte value with Local Tag equal to 80 0B

Knowing the local tags, we can find the unique UIDs for the Level and Profile fields in the Primer (image 3).

So in the end we can conclude that the Sony Catalyst browser reads the Profile and Level from a sub descriptor referenced from the CDCI descriptor.
P.S. Yes, all of this is described in the 15 pages of SMPTE 381-3; unfortunately, I spent a day reversing sample files instead of buying the spec.

P.P.S. I promised to explain why parsing the SPS is more solid than reading the sub descriptor. I've seen tons of different MXF files with wrapped AVC streams (AVC, AVC-I, XAVC) and tons of different descriptors: some have an AVC sub descriptor, some have the sub descriptor fields added directly to the CDCI descriptor, some files have an MPEG descriptor instead of a CDCI descriptor, and so on. By the way, SMPTE 381-3 says all AVC sub descriptor fields are optional, so only parsing the SPS can guarantee the same result regardless of the descriptor structure (except for the Avid case I mentioned before).

Closed Captions and MXF

Last week I dove deep into wrapping Closed Captions into MXF (OP1a, AS-02).
The task was to extend our MXF writer to write a captions track and to extend the reader to read it.
By captions I mean CDP (Caption Distribution Packet) data.

I had a bunch of OP1a/AS-02 files with a captions track inside, so of course the first thing I did was make a dump to reverse the structure.
Any CDP has a header that starts with the magic 0x9669, so for reading you don't even have to understand the structure (as long as you don't need AFD and the like): it's enough to go through the data after the captions essence element key and find the start of the CDP header (a small scan sketch below). But that's not enough if you want to write captions, or if you want to do everything properly.
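
A minimal version of that lazy scan might look like this (the function name and buffer handling are mine, just for illustration):

#include <stddef.h>
#include <stdint.h>

/* returns the offset of the CDP header inside the payload, or -1 if not found */
static long find_cdp_offset(const uint8_t *payload, size_t size)
{
    size_t i;
    for (i = 0; i + 1 < size; i++) {
        if (payload[i] == 0x96 && payload[i + 1] == 0x69)
            return (long)i;
    }
    return -1;
}
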
The mapping of captions data into MXF is described in SMPTE 436M, which costs $75. Before buying it I decided to google whether it had been discussed on forums. The most useful comment was at this link (https://trac.ffmpeg.org/ticket/726#comment:10): it more or less describes the header that precedes the CDP packet, except for the last 8 bytes. As a result I had to buy SMPTE 436M, but I was really surprised to find that 436M says nothing about those 8 bytes 🙁
I read the standard like 4 or 5 times and finally found the answer in the line "The Payload Byte Array is an MXF array including an array element count".
So those 8 bytes are part of the MXF array structure:
first 4 bytes – number of elements (in our case, the size of the packet)
second 4 bytes – element size; since we have a byte array, it's always = 1

After that, it's logical to ask why that size differs from the "payload sample count" mentioned in that link. This is the second thing not mentioned in that comment: the standard says the payload should be double-word aligned, and if it isn't, it must be padded with zeros.

So let's say you need to map a CDP of size 15 into MXF. The header structure will be as follows (a sketch that builds it comes after the list):

first 2 bytes: 0x00 0x01 (we have just one packet, so it's 1)
second 2 bytes: 0x00 0x09 (as a rule captions are on line 9, but you can set the number you need)
fifth byte: 0x02 (for an interlaced or psf frame, or 0x04 if it's progressive)
sixth byte: 0x04 (8-bit luma)
7th and 8th bytes: 0x00 0x12 (the payload sample count; why 18? because we add 3 bytes before the CDP: DID, SDID and CDP size)
9th – 12th bytes: 0x00 0x00 0x00 0x14 (the array element count; 20 because we pad the 18-byte packet to 20 to be double-word aligned)
13th – 16th bytes: 0x00 0x00 0x00 0x01 (the array element size, always 1)
17th byte: 0x61 (DID; could be 0x80 as well)
18th byte: 0x01 (SDID; 1 for CEA-708 or 2 for CEA-608; could also be different if DID = 0x80)
19th byte: 0x0F (the size of your CDP)
20th – 34th bytes: your CDP
35th – 36th bytes: 0x00 0x00 (padding)
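
Here is a small sketch that builds exactly this layout for one CDP; the function name (wrap_cdp_436m) and the buffer handling are my own, only the byte layout comes from the list above:

#include <stdint.h>
#include <string.h>

static void put_be16(uint8_t *p, uint16_t v) { p[0] = v >> 8; p[1] = (uint8_t)v; }
static void put_be32(uint8_t *p, uint32_t v) {
    p[0] = v >> 24; p[1] = v >> 16; p[2] = v >> 8; p[3] = (uint8_t)v;
}

/* returns the number of bytes written into out (caller provides enough room) */
static size_t wrap_cdp_436m(const uint8_t *cdp, uint8_t cdp_size,
                            int progressive, uint8_t *out)
{
    uint16_t payload_count = cdp_size + 3;       /* DID + SDID + data count byte */
    uint32_t padded = (payload_count + 3) & ~3u; /* double-word aligned size     */
    size_t pos = 0;

    put_be16(out + pos, 1);             pos += 2; /* number of ANC packets             */
    put_be16(out + pos, 9);             pos += 2; /* line number (9 as a rule)         */
    out[pos++] = progressive ? 0x04 : 0x02;       /* wrapping type                     */
    out[pos++] = 0x04;                            /* payload sample coding: 8-bit luma */
    put_be16(out + pos, payload_count); pos += 2; /* payload sample count (18 here)    */
    put_be32(out + pos, padded);        pos += 4; /* MXF array: element count (20)     */
    put_be32(out + pos, 1);             pos += 4; /* MXF array: element size, always 1 */
    out[pos++] = 0x61;                            /* DID (0x80 also possible)          */
    out[pos++] = 0x01;                            /* SDID: 1 for CEA-708               */
    out[pos++] = cdp_size;                        /* data count                        */
    memcpy(out + pos, cdp, cdp_size);   pos += cdp_size;
    memset(out + pos, 0, padded - payload_count); /* zero padding to alignment         */
    pos += padded - payload_count;

    return pos;
}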

1st one

hello world 🙂

After 3 years of keeping silent I decided to get back to blogging.
My previous blog was http://alexmogurenko.com/blog
It was more or less about DirectShow for .NET CF. Unfortunately, one day I lost the database because of hosting problems and decided not to restore that blog, as I no longer had any interest in Windows Mobile/CE programming, nor in .NET or .NET CF.

So this one will be related in one way or another to video/audio processing:

– sometimes I will post about problems I meet and resolve while doing media repair

– sometimes I hope to post something about performance optimization: SIMD, GPU, or algorithmic

– sometimes it could be about media parsing/muxing issues