AMCDX VIDEO PATCHER V0.5.5

AMCDX Video Patcher v0.5.5 released
Starting with v0.5.5, a CPU that supports the AVX2 instruction set is no longer required (so you can use it on old Intel/AMD CPUs and on ARM CPUs)

Features:
1) You can type the exact frame you want to jump to (instead of moving the scrubber)
2) Added “Show Shortcuts” Menu Item

Bugs Fixed:
1) Fixed a bug where keyboard shortcuts became inactive after some manipulations

Windows Installer
OSX Installer
Linux Build available upon request

What’s Next?
1) Insert directly into a file stored on AWS S3
2) VC-3 playback and Insert support

Some thoughts on performance optimization

I have a bad habit of trying to make my code highly optimized even when the unit is not fully finished, and I'm serious when I say it's a bad habit, as it really slows down the development process and sometimes affects code quality…

For example, here is an interesting trick I added to my ProRes encoder/decoder:

When you encode/decode a DC you have to use one of 4 existing codebooks, and the codebook is chosen based on the value of the DC just encoded, which obviously can be way bigger than 3. So what most of us would do here is just add a branch:

if (codebook > 3) {
    codebook = 3;
}

And in 999 of 1000 cases that will probably be the best solution. In my case, I have 3 * 30 DCs per slice (technically 3 * 32, but the first 2 DCs don't need any branching to know the codebook), or 364500 DCs per UHD frame, which is kinda bad…
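The 364500 figure can be reproduced with a few lines of arithmetic. This is only a sketch, and it rests on assumptions that are not spelled out above: a 3840x2160 frame, 16x16 macroblocks, slices 8 macroblocks wide, and 4:4:4 sampling, so that each of the 3 components has 4 blocks (and so 4 DCs) per macroblock. The function name is made up for illustration.

```c
#include <assert.h>

/* Hypothetical helper: number of DCs per frame that need the codebook clamp.
 * Assumes 16x16 macroblocks, slices 8 macroblocks wide and 1 macroblock tall,
 * 3 components with 4 blocks per macroblock each (4:4:4), and a width that is
 * a multiple of 128 pixels so that every slice is full. */
static long clamped_dcs_per_frame(int width, int height)
{
    int mb_cols = width / 16;                      /* 3840 -> 240 */
    int mb_rows = (height + 15) / 16;              /* 2160 -> 135 */
    int slices_per_row = mb_cols / 8;              /* 240  -> 30  */
    long slices = (long)slices_per_row * mb_rows;  /* 30 * 135 = 4050 */
    int dcs_per_component = 8 * 4;                 /* 32 DCs per slice */
    int clamped = dcs_per_component - 2;           /* first 2 need no clamp */
    return slices * 3 * clamped;                   /* 4050 * 90 */
}
```

Under these assumptions `clamped_dcs_per_frame(3840, 2160)` gives 4050 slices times 90 DCs, which matches the number quoted above.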

As you can see, the max value of codebook is 3, i.e. (2^2 - 1), and this is a case where we can easily avoid branching, so I replaced that branch with this code:

codebook = (3 & (4 - !!(codebook & 0xfffc))) + ((codebook & 3) & (4 - !(codebook & 0xfffc)));

This avoids branching and in my particular case improves performance a bit (~0.3-0.5%), but to be fair this code would be hard to maintain and I still doubt whether I should push it.
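One cheap way to gain confidence in a trick like this is to compare it exhaustively against the plain branch. The sketch below does that for every 16-bit value, which is the range the 0xfffc mask covers. The function names are made up for illustration, and the 16-bit input range is an assumption on my side.

```c
#include <assert.h>
#include <stdint.h>

/* Reference version: the obvious clamp with a branch. */
static unsigned clamp_branch(unsigned codebook)
{
    if (codebook > 3) {
        codebook = 3;
    }
    return codebook;
}

/* Branchless version from the text. (codebook & 0xfffc) is non-zero exactly
 * when a 16-bit codebook is above 3, so one of the two masks below becomes 3
 * and the other becomes 4, which has no bits in common with values 0..3. */
static unsigned clamp_branchless(unsigned codebook)
{
    return (3 & (4 - !!(codebook & 0xfffc))) +
           ((codebook & 3) & (4 - !(codebook & 0xfffc)));
}

/* Returns 1 if both versions agree for every 16-bit input, 0 otherwise. */
static int clamps_agree(void)
{
    for (unsigned v = 0; v <= UINT16_MAX; v++) {
        if (clamp_branch(v) != clamp_branchless(v)) {
            return 0;
        }
    }
    return 1;
}
```

Note that the mask only looks at 16 bits: for wider inputs (65536, for example) the branchless version would no longer match the branch, so the mask would have to be widened.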

This is a really tricky example which probably shouldn't even be considered, especially when your unit is not 100% complete and there are surely a million other ways to optimize your code.

One more example

This one is more critical, and I faced it while doing my contract.
I had to fix a performance regression which the company faced after upgrading from FFmpeg 3.x to FFmpeg 4.2, as they use FFmpeg to demux MOV files.

One of the developers found out that FFmpeg had finally added 12-bit decoding support and now reports all HQX and 4444 profiles as 12-bit, and indeed that commit causes the regression. It sounds weird if you consider the fact that they use official libs from Apple for decoding…

So how is it possible? My first thought was that the decoder is still used somewhere; how else could file open become 2x slower? So how do you open a file with the FFmpeg libs? Something like:

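AVFormatContext *fmt = NULL;
avformat_open_input(&fmt, filename, NULL, NULL);  // filename: path to the MOV file (error checks omitted)
avformat_find_stream_info(fmt, NULL);             // probes the streams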

What I found out is that avformat_find_stream_info reads the first frame from the file and decodes it, and does so single-threaded. How do you like that? To be fair, there is a reason behind it, as sometimes there is no way to get all the needed metadata without decoding the frame header (for example bit depth, pixel format and so on), but the problem is that we don't need to decode the whole frame to get that metadata, we just need to decode the frame header… So I added an extra flag which forces the FFmpeg ProRes and DNx decoders to stop after the headers are decoded, something like:

// proresdec2.c, line 784
// AV_CODEC_STOP_AFTER_HEADER_DECODED is the extra flag I added, it is not part of stock FFmpeg
if (avctx->flags & AV_CODEC_STOP_AFTER_HEADER_DECODED) {
    return avpkt->size;
}

Believe it or not, instead of a 2x slowdown we achieved a 2x speedup, and file open now takes constant time regardless of resolution, whereas with the previous version, the higher the resolution, the slower the file open would be…
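To illustrate why stopping after the header is enough for probing, here is a sketch that pulls width, height and chroma format out of a ProRes frame without touching any slice data. The field offsets follow my reading of how FFmpeg's proresdec2.c parses the frame header and should be treated as an assumption. The struct and function names are hypothetical and are not part of FFmpeg.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the few fields a prober needs. */
typedef struct {
    int width;
    int height;
    int is_444;       /* 1 for 4:4:4, 0 for 4:2:2 */
    int interlaced;   /* 0 = progressive, non-zero = interlaced */
    int header_size;  /* size of the frame header in bytes */
} ProresProbe;

/* Reads a 16-bit big-endian value. */
static unsigned rb16(const uint8_t *p)
{
    return (unsigned)p[0] << 8 | p[1];
}

/* Parses only the frame header. Returns 0 on success, -1 if the buffer is
 * too short or does not carry the 'icpf' frame identifier. Slice data is
 * never read, so the cost does not depend on the resolution. */
static int prores_probe_header(const uint8_t *buf, size_t size, ProresProbe *out)
{
    if (size < 28 || memcmp(buf + 4, "icpf", 4) != 0)
        return -1;
    buf += 8;                               /* skip frame size + 'icpf' */
    out->header_size = (int)rb16(buf);
    out->width       = (int)rb16(buf + 8);
    out->height      = (int)rb16(buf + 10);
    out->is_444      = (buf[12] & 0xC0) == 0xC0;
    out->interlaced  = (buf[12] >> 2) & 3;
    return 0;
}
```

The work done here is a handful of byte reads, which is in line with the constant-time file open described above.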

Prores tools updates

I haven't written lately as I was quite busy, and unfortunately I didn't work a lot on the announced tools, as I'm still doing my main contract and literally have max 2 hours a day for my project :(.

Nevertheless, as there was high interest, I want to post some updates:

  1. I finally finished the ProRes decoder, which is needed for a lot of reasons (playback; transcode, as we need to stop decoding before doing IDCT; in-place edit, as we need to decode just some of the slices, not the whole frame)
  2. I finished the MOV demuxer, which is based on the project I started here (https://github.com/da8eat/qtfile_pp). I never committed the updates, but at least you can see the main idea
  3. I made huge progress on the MOV muxer (also based on the GitHub project mentioned above)
  4. I finished a basic UI (Qt/QML based)
    ProRes Tools UI
  5. I implemented GLSL shaders for the Video Renderer

Over the next couple of months I'm going to finish the Player and the MOV muxer, and after that, finally, start feature integration.

The first one will be ProRes to ProRes transcode (for example, transcoding HQX to the Proxy profile)

The second one will be in-place ProRes editing (I plan to detect faces in the video and blur all detected faces in the input file without re-encoding the video or re-muxing the file)

FFMPEG + GPL

Just to make things clear

0) Yes, I agree I violated the GPL and I am sorry, but I didn't do it on purpose (and to be fair, I didn't know much about licensing details until the week I was notified about my violation)

1) After Kieran left the comment about the violation I made the repo private (to spend some time understanding all the details), and today I fully removed the repo

2) To be fair, it's easy to see it wasn't done on purpose, as I never posted updates even though the posted version had an interlaced coding bug. I was asked a couple of times privately to make a custom build with my encoder and I declined, as the only purpose I pursued was to show that the encoder exists and performs well

3) Why don't I disclose the source code?

3.1) I was going to rework prores_ks or add one more encoder; I had an exact plan for how to do it and I started on it. I sent a couple of patches: the 1st was approved, the 2nd is still under review. I decided not to wait forever, and I'm not the one to ping every day/week to get it pushed (I believe that if the community needs something it will be pushed, and it's easy to prove with my MXF OP1b research, where my changes were pushed even though I never sent that patch)

3.2) I was still interested in finishing my ProRes encoder, so I continued to work on it. And now that the encoder is done, I plan to make a product based on it. It will be free but probably closed-source.

3.3) Even if one day I want to make it part of FFmpeg, it won't be easy to do, as my implementation is done in C++ and FFmpeg is C

3.4) So the build I shared (and later removed) is literally a bag of tricks.

3.5) So when Kieran/Martin/Carl or whoever says I should share prores_amcdx_encoder, I can easily do it, but what will you see there? It's basically a skeleton copied from prores_anatoly, with the whole logic replaced by calls to functions from a private static library:

#include "libavutil/opt.h"
#include "avcodec.h"
#include "internal.h"
#include "profiles.h"
#include "prores_defs.hpp"

#define DEFAULT_SLICE_MB_WIDTH 8

static const AVProfile profiles[] = {
    { FF_PROFILE_PRORES_PROXY,    "apco"},
    { FF_PROFILE_PRORES_LT,       "apcs"},
    { FF_PROFILE_PRORES_STANDARD, "apcn"},
    { FF_PROFILE_PRORES_HQ,       "apch"},
    { FF_PROFILE_PRORES_4444,     "ap4h"},
    { FF_PROFILE_PRORES_XQ,       "ap4x"},
    { FF_PROFILE_UNKNOWN }
};

static const int valid_primaries[9]  = { AVCOL_PRI_RESERVED0, AVCOL_PRI_BT709, AVCOL_PRI_UNSPECIFIED, AVCOL_PRI_BT470BG,
                                         AVCOL_PRI_SMPTE170M, AVCOL_PRI_BT2020, AVCOL_PRI_SMPTE431, AVCOL_PRI_SMPTE432,INT_MAX };
static const int valid_trc[4]        = { AVCOL_TRC_RESERVED0, AVCOL_TRC_BT709, AVCOL_TRC_UNSPECIFIED, INT_MAX };
static const int valid_colorspace[5] = { AVCOL_SPC_BT709, AVCOL_SPC_UNSPECIFIED, AVCOL_SPC_SMPTE170M,
                                         AVCOL_SPC_BT2020_NCL, INT_MAX };

typedef struct {
    AVClass *class;
    void *encoder;
    int cs;
    int qual;
    int field_order;
    int planes;
    int target_size;
} ProresContext;

static int prores_encode_frame2(AVCodecContext *avctx, AVPacket *pkt,
                               const AVFrame *pict, int *got_packet)
{
    ProresContext *ctx = avctx->priv_data;
    int ret;
    int frame_size = amcdx_pr_encoder_encode(ctx->encoder, (void **)pict->data, (int *)pict->linesize, ctx->planes); //for the time being


    if ((ret = ff_alloc_packet2(avctx, pkt, frame_size, 0)) < 0)
        return ret;

    amcdx_pr_encoder_read(ctx->encoder, pkt->data, &pkt->size);


    pkt->flags |= AV_PKT_FLAG_KEY;

    *got_packet = 1;
    return 0;
}

static av_cold int prores_encode_init2(AVCodecContext *avctx)
{
    ProresContext* ctx = avctx->priv_data;

    avctx->bits_per_raw_sample = 10;

    if (avctx->width & 0x1) {
        av_log(avctx, AV_LOG_ERROR,
                "frame width needs to be multiple of 2\n");
        return AVERROR(EINVAL);
    }

    if (avctx->width > 65534 || avctx->height > 65535) {
        av_log(avctx, AV_LOG_ERROR, "The maximum dimensions are 65534x65535\n");
        return AVERROR(EINVAL);
    }

    switch (avctx->profile) {
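    case FF_PROFILE_UNKNOWN:
        /* No profile requested: default to standard 422 and store it, so the
         * profiles[avctx->profile] lookup below stays in range. */
        avctx->profile = FF_PROFILE_PRORES_STANDARD;
        /* fall through */
    case FF_PROFILE_PRORES_STANDARD:
        ctx->qual = Quality_422;
        break;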
    case FF_PROFILE_PRORES_4444:
        ctx->qual = Quality_4444;
        break;
    case FF_PROFILE_PRORES_HQ:
        ctx->qual = Quality_422HQ;
        break;
    case FF_PROFILE_PRORES_LT:
        ctx->qual = Quality_422LT;
        break;
    case FF_PROFILE_PRORES_PROXY:
        ctx->qual = Quality_422Proxy;
        break;
    case FF_PROFILE_PRORES_XQ:
        ctx->qual = Quality_4444XQ;
        break;
    default:
        av_log(avctx, AV_LOG_ERROR, "unsupported ProRes profile %d\n", avctx->profile);
        return AVERROR(EINVAL);
    }

    switch (avctx->pix_fmt) {
    case AV_PIX_FMT_UYVY422:
        ctx->cs = ColorSpace_uyvy;
        ctx->planes = 1;
        break;
    case AV_PIX_FMT_YUV422P10:
        ctx->cs = ColorSpace_yuv10_422_planar;
        ctx->planes = 3;
        break;
    case AV_PIX_FMT_YUV422P12:
        ctx->cs = ColorSpace_yuv12_422_planar;
        ctx->planes = 3;
        break;
    case AV_PIX_FMT_YUV444P12:
        ctx->cs = ColorSpace_yuv12_444_planar;
        ctx->planes = 3;
        break;
    default:
        break;
    }

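    /* Map FFmpeg's field order onto the encoder's own enum. The result goes
     * into ctx->field_order, which is what amcdx_pr_encoder_init() receives
     * below; avctx->field_order itself is left untouched. */
    switch (avctx->field_order) {
    case AV_FIELD_BT:
        ctx->field_order = FieldOrder_BottomFieldFirst;
        break;
    case AV_FIELD_TB:
        ctx->field_order = FieldOrder_TopFieldFirst;
        break;
    case AV_FIELD_PROGRESSIVE:
    default: /* anything else is treated as progressive */
        ctx->field_order = FieldOrder_Progressive;
        break;
    }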

    ctx->encoder = amcdx_pr_encoder_create();

    if (ctx->target_size != 0) {
        amcdx_pr_encoder_set_frame_size(ctx->encoder, ctx->target_size);
    }

    avctx->codec_tag = MKTAG(profiles[avctx->profile].name[0], profiles[avctx->profile].name[1], profiles[avctx->profile].name[2], profiles[avctx->profile].name[3]);// AV_RL32((const uint8_t*)profiles[avctx->profile].name);

    return amcdx_pr_encoder_init(ctx->encoder, avctx->width, avctx->height, ctx->cs, ctx->qual, ctx->field_order) - 1;
}

static av_cold int prores_encode_close2(AVCodecContext *avctx)
{
    ProresContext* ctx = avctx->priv_data;
    amcdx_pr_encoder_destroy(ctx->encoder);

    return 0;
}

#define OFFSET(x) offsetof(ProresContext, x)
#define VE     AV_OPT_FLAG_VIDEO_PARAM | AV_OPT_FLAG_ENCODING_PARAM

static const AVOption options[] = {
    { "target_size", "force frame size", OFFSET(target_size), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INT_MAX, VE },
    { NULL }
};

static const AVClass proresamcdx_enc_class = {
    .class_name = "ProRes amcdx encoder",
    .item_name  = av_default_item_name,
    .option     = options,
    .version    = LIBAVUTIL_VERSION_INT,
};

AVCodec ff_prores_amcdx_encoder = {
    .name           = "prores_amcdx",
    .long_name      = NULL_IF_CONFIG_SMALL("Apple ProRes"),
    .type           = AVMEDIA_TYPE_VIDEO,
    .id             = AV_CODEC_ID_PRORES,
    .priv_data_size = sizeof(ProresContext),
    .init           = prores_encode_init2,
    .close          = prores_encode_close2,
    .encode2        = prores_encode_frame2,
    .pix_fmts       = (const enum AVPixelFormat[]){AV_PIX_FMT_UYVY422, AV_PIX_FMT_YUV422P10, AV_PIX_FMT_YUV422P12, AV_PIX_FMT_YUV444P12, AV_PIX_FMT_NONE},
    .capabilities   = AV_CODEC_CAP_FRAME_THREADS | AV_CODEC_CAP_INTRA_ONLY,
    .priv_class     = &proresamcdx_enc_class,
    .profiles       = NULL_IF_CONFIG_SMALL(ff_prores_profiles),
};

how good is your prores workflow?

I was silent for a year and, to be fair, I haven't touched ProRes for a year as well, except for the last couple of months.

So what happened:
Basically I was looking for myself: what I want to do and what kind of work makes me happy. I tried a couple of different projects, but in the end I came back to the fact that reverse engineering and video codecs are what interest me most.

I went back to reversing ProRes, where I went through a couple of different periods/questions:

1. Why would anyone need it if Apple shares it for free?

2. Why would anyone need it if FFmpeg has 2 different implementations?

3. Why can't I contribute to FFmpeg?

Every period made my position, that one more ProRes encoder is needed, even stronger, and here are my answers to the questions I asked myself:

1. First of all, it is possible to get the Apple implementation (obviously not the source code) for free; my experience at a previous employer confirms it. But the procedure is unclear and it seems you need connections as well

2. Both FFmpeg implementations are far from what the Apple encoder does. Neither supports 12-bit input, and both produce bad output, especially for the Proxy and XQ qualities.

2.1 Anatoly's implementation has really poor rate control with all those min/max quant limits

2.2 Kostya's implementation has better rate control, but still nothing similar to what Apple does, and it is waaay slower

3. At some point I tried to contribute to FFmpeg. One of my patches was approved, but when I started to work on performance optimizations and sent a patch I just got ignored… Yes, the patch didn't break the FATE tests and it improved performance, but it was neither approved nor declined nor commented on in any way; it will just stay in the review state forever. And to be fair, reading the ffmpeg dev emails I found that the guys are more open to fighting about sponsored changes than about anything else, so yes, deep inside FFmpeg has become super commercialized 🙁

So now about the progress I made:

1. I finished the first version of the encoder, which supports 8/10/12-bit input and supports 422 and 444 input (alpha is still not implemented)

2. The encoder has Apple-like rate control and, to be fair, on the same input it produces output absolutely identical to the Apple version

It is about 3x faster than the FFmpeg encoder but still slower than the Apple one (there is still a lot of room for optimization)

Returning to the question of why anyone would need one more ProRes encoder, here are the pros of what I do:

  1. Output absolutely identical to the Apple implementation
  2. 12-bit support
  3. At some point ProRes RAW will be added

In other words, it is way better than what FFmpeg does and easy to port to any platform (which is not that easy to get from Apple, even if you do succeed in getting anything)

But the main reason is the new product I'm building based on this encoder.

When I started to understand the Apple bitstream and the logic in general well, I almost immediately had a couple of ideas on how to use it:
1. In-place editing. For example, you don't like a couple of frames in your final file; now you can replace them without re-encoding the whole file.
1.1. You can't do it with the Apple implementation, as you cannot control the output frame size.
1.2. You can do it with Cinedeck tools (but you would need to re-wrap your file), as they rely on the Apple implementation. That's probably not a big deal if your file is a couple hundred megabytes, but what if it's hundreds of gigabytes? And your file is on S3?

2. I decided to go even further. Let's say you need to put a logo on your frame or blur some part of the frame. Now you can not only replace frames inside the file, but also replace a part of any frame. In other words, to add a logo to a ProRes-encoded frame you would need to re-encode only the part of the frame where the logo will be placed
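What makes a partial re-encode possible is that a ProRes frame is split into independently coded slices, so only the slices under the logo need to change. Here is a minimal sketch of finding them. It assumes 16x16 macroblocks and slices that are one macroblock tall and a fixed 8 macroblocks wide (narrower slices at the right edge of a row are ignored), and all names are hypothetical:

```c
#include <assert.h>

/* Hypothetical slice range covered by a pixel rectangle. */
typedef struct {
    int first_row, last_row;   /* macroblock rows, inclusive */
    int first_col, last_col;   /* slice columns within a row, inclusive */
} SliceRange;

enum { MB_SIZE = 16, SLICE_MBS = 8 };

/* Maps the rectangle [x, x + w) x [y, y + h), with w > 0 and h > 0,
 * to the range of slices it touches. */
static SliceRange slices_for_rect(int x, int y, int w, int h)
{
    SliceRange r;
    r.first_row = y / MB_SIZE;
    r.last_row  = (y + h - 1) / MB_SIZE;
    r.first_col = x / (MB_SIZE * SLICE_MBS);
    r.last_col  = (x + w - 1) / (MB_SIZE * SLICE_MBS);
    return r;
}

/* Number of slices that would have to be re-encoded. */
static int slice_count(SliceRange r)
{
    return (r.last_row - r.first_row + 1) * (r.last_col - r.first_col + 1);
}
```

For a 200x100 logo placed at (100, 50) this gives 7 macroblock rows by 3 slice columns, so 21 slices, compared with 4050 slices in a whole UHD frame under the same assumptions.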

That's basically it. Sounds exciting? Ping me if you are going to NAB2019 so you can see how powerful it is!

DirectShowNETCF

It sounds a bit strange to me, but I still receive emails with requests to sell or share the DirectShowNETCF source code.

As I said in the first post, I can hardly find a reason why anyone would still need it, and as a result there is no need to support it anymore. Nevertheless, if someone still needs it for any reason, I have published the source code I found on one of my old laptops (not quite sure if it's the most recent version)

https://github.com/da8eat/directshow_netcf

P.S. Please don't judge the code quality too much; it was done ten years ago and mostly for educational purposes 🙂

1st one

hello world 🙂

After 3 years of keeping silence I decided to get back to blogging.
My previous blog was http://alexmogurenko.com/blog
It was more or less about DirectShow for .NET CF. Unfortunately, one day, because of hoster problems, I lost the DB and decided not to restore that blog, as I had no more interest in Win Mobile/CE programming, nor in .NET or .NET CF

So this one will be somehow related to video/audio processing:

- sometimes I will post problems I meet and resolve while doing media repair

- sometimes I hope to post something about performance optimizations: SIMD, GPU or algorithmic

- sometimes it could be media parsing/muxing issues