Futzing with the x264 code -- possible improvements

mace · Messages: 842 Registration: 12.30.2001

Could be useful as an HQ option extension for UMH. :)

DHoang · Messages: 39 Registration: 11.17.2003

I've heard somewhere (I can't remember where) that the x264 code is rather unoptimised: there are a lot of places where MMX/MMXEXT/SSE/SSE2/SSE3/SSSE3 code could be added for extra speed but is currently missing. Is this true?

AceHazardX · Messages: 121 Registration: 09.27.2001

Dark Shikari: everything you did is OK, except the way you compare the results. As you have noticed, the quality of the result depends on both the bitrate and the PSNR/SSIM/other metric, and since both change at the same time, they are not easy to compare. You avoided that issue by arbitrarily defining 'quality = 1/(1-SSIM)/bitrate' and then comparing those quality values. That is definitely not how it should be done. The proper way is to encode at several CRFs and draw the metric-vs-bitrate curve for each version. Once the curves are drawn, you can compare the modifications. In particular, you can say "at the same bitrate, the metrics differ by XXX", or "at the same metric, the bitrate differs by YYY %". It's slower, but it works.
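A minimal sketch of that comparison, in C to match x264. It assumes each build has been encoded at a few CRFs, giving (bitrate, metric) points sorted by increasing bitrate, and it simply joins neighbouring points with straight lines. That linear interpolation is a simplification chosen here; Bjøntegaard's BD-rate method, for instance, fits a cubic through log-bitrate instead. All names (rd_point, metric_at_bitrate, and so on) are hypothetical.

```c
#include <stddef.h>

/* One encode of the test clip at some CRF: bitrate (e.g. kbit/s) and a
   quality metric (e.g. PSNR in dB or SSIM). Points in a curve must be
   sorted by increasing bitrate, with the metric increasing as well. */
typedef struct {
    double bitrate;
    double metric;
} rd_point;

/* Linear interpolation of y at x between (x0, y0) and (x1, y1). */
static double lerp_at(double x0, double y0, double x1, double y1, double x)
{
    double d = x1 - x0;
    return d != 0.0 ? y0 + (x - x0) / d * (y1 - y0) : y0;
}

/* Metric of curve c (n points) at bitrate br. Returns -1 if br lies
   outside the measured range: extrapolating an RD curve is unreliable. */
int metric_at_bitrate(const rd_point *c, size_t n, double br, double *out)
{
    for (size_t k = 0; k + 1 < n; k++)
        if (br >= c[k].bitrate && br <= c[k + 1].bitrate) {
            *out = lerp_at(c[k].bitrate, c[k].metric,
                           c[k + 1].bitrate, c[k + 1].metric, br);
            return 0;
        }
    return -1;
}

/* Bitrate of curve c at metric value m (same conventions). */
int bitrate_at_metric(const rd_point *c, size_t n, double m, double *out)
{
    for (size_t k = 0; k + 1 < n; k++)
        if (m >= c[k].metric && m <= c[k + 1].metric) {
            *out = lerp_at(c[k].metric, c[k].bitrate,
                           c[k + 1].metric, c[k + 1].bitrate, m);
            return 0;
        }
    return -1;
}

/* "At the same bitrate, the metrics differ by XXX": metric(b) - metric(a). */
int metric_diff_at_bitrate(const rd_point *a, size_t na,
                           const rd_point *b, size_t nb,
                           double br, double *diff)
{
    double ma, mb;
    if (metric_at_bitrate(a, na, br, &ma) != 0 ||
        metric_at_bitrate(b, nb, br, &mb) != 0)
        return -1;
    *diff = mb - ma;
    return 0;
}

/* "At the same metric, the bitrate differs by YYY %": change of b relative to a. */
int bitrate_diff_pct_at_metric(const rd_point *a, size_t na,
                               const rd_point *b, size_t nb,
                               double m, double *pct)
{
    double ra, rb;
    if (bitrate_at_metric(a, na, m, &ra) != 0 ||
        bitrate_at_metric(b, nb, m, &rb) != 0)
        return -1;
    *pct = 100.0 * (rb - ra) / ra;
    return 0;
}
```

Queries outside the overlap of the two curves are rejected rather than extrapolated, so the comparison bitrate or metric should be one that both builds actually reached.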

BumblBeeRacer · Messages: 737 Registration: 04.26.2003

burfadel: you've heard wrong. x264 can be made faster (everything can be made faster), but it's definitely not "rather unoptimized". What is missing, last time I checked, is SSSE3 for 32-bit OSes (since akupenguin uses a 64-bit OS) and, perhaps, SSE2 versions of some functions that currently have only MMXEXT versions (that would help on P4/Conroe). IMHO, that won't amount to more than a 5-10% speed gain. And, IMHO, if development time were to be spent on x264, I would rather look toward psychovisual enhancements: there are none at the moment, and they can improve things dramatically.

clex2 · Messages: 578 Registration: 10.30.2003

While you're at it, remove MMX1, SSE1, and SSE3 from your list of instruction sets. SSE1 and SSE3 are floating-point and thus useless for video coding, and the last CPU that only had MMX1 was way too slow for x264 anyway.

06.12.2002

CROSS( cross_start, i_me_range, i_me_range/2 ); if(saved_omx != bmx

Low Level · Messages: 2,867 Registration: 09.19.2002

It appears that changing the hexagon grid in UMH to:

    /* hexagon grid */
    omx = bmx;
    omy = bmy;
    for( i = 1; i <= i_me_range/4; i++ )
    {
        /* 16 stock points plus 4 new ones at (+-3, +-2), scaled by i */
        static const int hex4[20][2] = {
            {-4, 2}, {-4, 1}, {-4, 0}, {-4,-1}, {-4,-2},
            { 4,-2}, { 4,-1}, { 4, 0}, { 4, 1}, { 4, 2},
            { 2, 3}, { 0, 4}, {-2, 3}, {-2,-3}, { 0,-4}, { 2,-3},
            { 3, 2}, { 3,-2}, {-3, 2}, {-3,-2}
        };
        /* ring may cross the MV range border: check each point individually */
        if( 4*i > X264_MIN4( mv_x_max-omx, omx-mv_x_min, mv_y_max-omy, omy-mv_y_min ) )
        {
            for( j = 0; j < 20; j++ )
            {
                int mx = omx + hex4[j][0]*i;
                int my = omy + hex4[j][1]*i;
                if( CHECK_MVRANGE(mx, my) )
                    COST_MV( mx, my );
            }
        }
        else
        {
            /* whole ring is in range: evaluate the 20 points in batches of 4 */
            COST_MV_X4( -4*i, 2*i, -4*i, 1*i, -4*i, 0*i, -4*i,-1*i );
            COST_MV_X4( -4*i,-2*i,  4*i,-2*i,  4*i,-1*i,  4*i, 0*i );
            COST_MV_X4(  4*i, 1*i,  4*i, 2*i,  2*i, 3*i,  0*i, 4*i );
            COST_MV_X4( -2*i, 3*i, -2*i,-3*i,  0*i,-4*i,  2*i,-3*i );
            COST_MV_X4( -3*i, 2*i, -3*i,-2*i,  3*i, 2*i,  3*i,-2*i ); /* the 4 new points */
        }
    }

gives a decent boost on the clips/settings I've tried it on (it adds 4 more points to the hexagon).
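To sanity-check which positions the modified pattern probes, here is a standalone sketch that lifts the 20-entry table from the post and replays the border-case loop without any x264 encoder state. The mv_bounds struct and hex4_ring function are hypothetical, and the plain bounds test is only assumed to behave like CHECK_MVRANGE in x264's motion-estimation code.

```c
/* The 20-point pattern from the post: entries 0-15 are the stock UMH
   hexagon grid, entries 16-19 are the added (+-3, +-2) points. */
static const int hex4[20][2] = {
    {-4, 2}, {-4, 1}, {-4, 0}, {-4,-1}, {-4,-2},
    { 4,-2}, { 4,-1}, { 4, 0}, { 4, 1}, { 4, 2},
    { 2, 3}, { 0, 4}, {-2, 3}, {-2,-3}, { 0,-4}, { 2,-3},
    { 3, 2}, { 3,-2}, {-3, 2}, {-3,-2}
};

/* Hypothetical stand-in for mv_x_min / mv_x_max / mv_y_min / mv_y_max. */
typedef struct {
    int x_min, x_max, y_min, y_max;
} mv_bounds;

/* Writes the in-range candidates of ring i around (omx, omy) to out and
   returns how many there are (at most 20). The bounds test plays the
   role of CHECK_MVRANGE; the real code would pass each surviving
   candidate to COST_MV instead of storing it. */
int hex4_ring(int omx, int omy, int i, const mv_bounds *b, int out[20][2])
{
    int n = 0;
    for (int j = 0; j < 20; j++) {
        int mx = omx + hex4[j][0] * i;
        int my = omy + hex4[j][1] * i;
        if (mx >= b->x_min && mx <= b->x_max &&
            my >= b->y_min && my <= b->y_max) {
            out[n][0] = mx;
            out[n][1] = my;
            n++;
        }
    }
    return n;
}
```

Near the border fewer candidates survive: around (60, 0) at ring 2 with bounds of ±64, the seven points with horizontal offsets +3 and +4 (scaled to +6 and +8) fall outside, leaving 13.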

jim m · Messages: 274 Registration: 08.28.2002

Any results for those metrics you were planning to run? Assuming this "futzing" does yield such an improvement in the general case, what effect would changing --merange have with this new algorithm? Would the X- and Y-direction motion search be extended in proportion to the overall increase in search range? Also, on the subject of possible improvements, I would without hesitation suggest moving the exhaustive search onto a different thread from all the other processing, that is, if it proves too difficult to integrate ESA into the current multithreading framework.
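Part of the --merange question can be read straight off the posted loop: ring i scales every offset by i, and i runs from 1 to i_me_range/4. The helpers below (hypothetical names) only encode that arithmetic. They cover the hexagon-grid stage alone, not the rest of UMH, and the evaluation count is a work estimate, not a measured speed.

```c
/* Number of hexagon-grid rings: the posted loop runs i = 1 .. i_me_range/4
   (integer division, so a merange that is not a multiple of 4 rounds down). */
int hex_grid_rings(int i_me_range)
{
    return i_me_range / 4;
}

/* Farthest offset probed along x and along y: ring i contains (+-4i, 0)
   and (0, +-4i), so both axes reach the same distance. */
int hex_grid_reach(int i_me_range)
{
    return 4 * hex_grid_rings(i_me_range);
}

/* Upper bound on cost evaluations in this stage when no candidate is
   clipped at the border: 16 per ring for the stock pattern, 20 for the
   modified one. */
int hex_grid_evals(int i_me_range, int points_per_ring)
{
    return points_per_ring * hex_grid_rings(i_me_range);
}
```

Under these assumptions the grid reaches equally far in x and y for a given merange, and the four extra points add a fixed 25% (20 versus 16 per ring) to this stage's evaluations at any merange.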
