Futzing with the x264 code -- possible improvements

Could be useful as a HQ option extension for UMH. :)

---------------------------

mace

users




Statistics:
Messages: 842
Registration: 12.30.2001
13.07.23 - 18:15:28
Message # 1
RE: Futzing with the x264 code -- possible improvements

I've heard somewhere (can't remember where) that the x264 code is rather unoptimised: there are a lot of places where MMX/MMXEXT/SSE/SSE2/SSE3/SSSE3 code could be added for extra speed but is currently missing. Is this true?

---------------------------

DHoang

users




Statistics:
Messages: 39
Registration: 11.17.2003
13.07.23 - 18:26:37
Message # 2
RE: Futzing with the x264 code -- possible improvements

Dark Shikari: everything you did is OK, except the way you compare the results. As you have noticed, the quality of a result depends on both the bitrate and the metric (PSNR, SSIM, or whatever else you measure), and since both change at the same time, the results are not easy to compare. You decided to avoid that issue by declaring, arbitrarily, that 'quality = 1/(1-SSIM)/bitrate', and then comparing those quality values. That is definitely not how it should be done. The proper way is to encode at several CRFs and then plot the metric against the bitrate. Once the curves are drawn, you can compare the modifications. In particular, you can say "at the same bitrate, the metrics differ by XXX", or "at the same metric value, the bitrate differs by YYY %". It's slower, but it works.
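
To make the "at the same bitrate, the metrics differ by XXX" comparison concrete, here is a rough sketch in C. It assumes each encode has already been measured at several CRFs and stored as (bitrate, metric) points sorted by ascending bitrate. The names rd_point, metric_at_bitrate and metric_gain_at_bitrate are made up for this example and are not part of x264, and plain linear interpolation between neighbouring points is an assumption of the sketch, not something prescribed above.

```c
#include <stddef.h>

/* One measured point of a rate/quality curve (hypothetical type):
 * the bitrate of an encode and the metric (e.g. SSIM or PSNR) it scored. */
typedef struct {
    double bitrate;
    double metric;
} rd_point;

/* Linearly interpolate the metric at `bitrate` on a curve sorted by
 * ascending bitrate. Returns 0 on success, -1 if the curve has fewer than
 * two points or the bitrate lies outside the measured range
 * (the sketch never extrapolates). */
int metric_at_bitrate(const rd_point *curve, size_t n, double bitrate, double *out)
{
    if (n < 2 || bitrate < curve[0].bitrate || bitrate > curve[n - 1].bitrate)
        return -1;
    for (size_t k = 1; k < n; k++) {
        if (bitrate <= curve[k].bitrate) {
            double span = curve[k].bitrate - curve[k - 1].bitrate;
            double t = span > 0.0 ? (bitrate - curve[k - 1].bitrate) / span : 0.0;
            *out = curve[k - 1].metric + t * (curve[k].metric - curve[k - 1].metric);
            return 0;
        }
    }
    return -1;
}

/* Metric difference (curve b minus curve a) at one common bitrate.
 * Returns -1 if either curve does not cover that bitrate. */
int metric_gain_at_bitrate(const rd_point *a, size_t na,
                           const rd_point *b, size_t nb,
                           double bitrate, double *gain)
{
    double ma, mb;
    if (metric_at_bitrate(a, na, bitrate, &ma) != 0 ||
        metric_at_bitrate(b, nb, bitrate, &mb) != 0)
        return -1;
    *gain = mb - ma;
    return 0;
}
```

Fill one curve from the unmodified encoder and one from the patched encoder, then call metric_gain_at_bitrate at a bitrate both curves cover. The "same metric, bitrate differs by YYY %" comparison works the same way with the two axes swapped.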

---------------------------

AceHazardX

users




Statistics:
Messages: 121
Registration: 09.27.2001
13.07.23 - 18:34:12
Message # 3
RE: Futzing with the x264 code -- possible improvements

burfadel: you've heard wrong. x264 can be made faster; everything can be made faster. But it's definitely not "rather unoptimized". What is missing, last time I checked, is SSSE3 for 32-bit OSes (since akupenguin uses a 64-bit OS) and, perhaps, some SSE2 functions in place of MMXEXT (which would help on P4/Conroe). IMHO, that won't amount to more than a 5 to 10% speed gain. And, IMHO, if development time were to be spent on x264, I would rather look toward psychovisual enhancements: there are none at the moment, and they can dramatically improve things.

---------------------------

BumblBeeRacer

users




Statistics:
Messages: 737
Registration: 04.26.2003
13.07.23 - 18:43:44
Message # 4
RE: Futzing with the x264 code -- possible improvements

While you're at it, remove MMX1, SSE1, and SSE3 from your list of instruction sets. SSE1 and SSE3 are floating-point and thus useless for video coding, and the last CPU that only had MMX1 was way too slow for x264 anyway.

---------------------------

clex2

users




Statistics:
Messages: 578
Registration: 10.30.2003
13.07.23 - 18:49:54
Message # 5
RE: Futzing with the x264 code -- possible improvements

    CROSS( cross_start, i_me_range, i_me_range/2 );
    if( saved_omx != bmx || saved_omy != bmy )
    {
        omx = bmx;
        omy = bmy;
        CROSS( cross_start, i_me_range, i_me_range/2 );
    }

gives the exact same results and seems to be a bit faster, so this would be preferable to the Double Cross solution above. This requires this:

    int saved_omx = omx;
    int saved_omy = omy;

to be placed after the previous instance of omx = bmx; omy = bmy;

---------------------------

users




Statistics:
Messages: 10
Registration: 06.12.2002
13.07.23 - 18:57:29
Message # 6
RE: Futzing with the x264 code -- possible improvements

It appears that changing the hexagon grid in UMH to:

    /* hexagon grid */
    omx = bmx; omy = bmy;
    for( i = 1; i <= i_me_range/4; i++ )
    {
        static const int hex4[20][2] = {
            {-4, 2}, {-4, 1}, {-4, 0}, {-4,-1}, {-4,-2},
            { 4,-2}, { 4,-1}, { 4, 0}, { 4, 1}, { 4, 2},
            { 2, 3}, { 0, 4}, {-2, 3},
            {-2,-3}, { 0,-4}, { 2,-3},
            { 3, 2}, { 3,-2}, {-3, 2}, {-3,-2}
        };
        if( 4*i > X264_MIN4( mv_x_max-omx, omx-mv_x_min,
                             mv_y_max-omy, omy-mv_y_min ) )
        {
            for( j = 0; j < 20; j++ )
            {
                int mx = omx + hex4[j][0]*i;
                int my = omy + hex4[j][1]*i;
                if( CHECK_MVRANGE(mx, my) )
                    COST_MV( mx, my );
            }
        }
        else
        {
            COST_MV_X4( -4*i, 2*i, -4*i, 1*i, -4*i, 0*i, -4*i,-1*i );
            COST_MV_X4( -4*i,-2*i,  4*i,-2*i,  4*i,-1*i,  4*i, 0*i );
            COST_MV_X4(  4*i, 1*i,  4*i, 2*i,  2*i, 3*i,  0*i, 4*i );
            COST_MV_X4( -2*i, 3*i, -2*i,-3*i,  0*i,-4*i,  2*i,-3*i );
            COST_MV_X4( -3*i, 2*i, -3*i,-2*i,  3*i, 2*i,  3*i,-2*i );
        }
    }

gives a decent boost on the clips/settings I've tried it on (adding 4 more spots to the hexagon).
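
For anyone who wants to play with the 20-point pattern outside the x264 tree, here is a stand-alone sketch of the same ring scan. It is not the x264 code: hex20_search, mv_range, cost_fn and toy_cost are hypothetical names made up for the example. The cost callback stands in for COST_MV, every candidate goes through the range check (there is no COST_MV_X4 fast path), and unlike the loop above, the sketch also evaluates the origin itself.

```c
#include <stdlib.h>

/* Hypothetical stand-in for COST_MV: returns the cost of a candidate vector. */
typedef int (*cost_fn)(int mx, int my, void *ctx);

/* Allowed motion vector range, inclusive (hypothetical helper type). */
typedef struct { int x_min, x_max, y_min, y_max; } mv_range;

/* The 20 offsets from the post: the 16 hexagon points plus (+-3, +-2). */
static const int hex20[20][2] = {
    {-4, 2}, {-4, 1}, {-4, 0}, {-4,-1}, {-4,-2},
    { 4,-2}, { 4,-1}, { 4, 0}, { 4, 1}, { 4, 2},
    { 2, 3}, { 0, 4}, {-2, 3},
    {-2,-3}, { 0,-4}, { 2,-3},
    { 3, 2}, { 3,-2}, {-3, 2}, {-3,-2}
};

/* Scan rings i = 1 .. me_range/4 around (omx, omy), scaling each offset by i
 * and skipping candidates outside `r`. Stores the best vector found in
 * (*bmx, *bmy) and returns its cost. */
int hex20_search(int omx, int omy, int me_range, mv_range r,
                 cost_fn cost, void *ctx, int *bmx, int *bmy)
{
    int best = cost(omx, omy, ctx);
    *bmx = omx;
    *bmy = omy;
    for (int i = 1; i <= me_range / 4; i++) {
        for (int j = 0; j < 20; j++) {
            int mx = omx + hex20[j][0] * i;
            int my = omy + hex20[j][1] * i;
            if (mx < r.x_min || mx > r.x_max || my < r.y_min || my > r.y_max)
                continue;
            int c = cost(mx, my, ctx);
            if (c < best) {
                best = c;
                *bmx = mx;
                *bmy = my;
            }
        }
    }
    return best;
}

/* Toy cost for experiments: city-block distance to a target vector passed
 * through ctx as int[2]. A real encoder would compute SAD plus mv cost. */
int toy_cost(int mx, int my, void *ctx)
{
    const int *target = ctx;
    return abs(mx - target[0]) + abs(my - target[1]);
}
```

With the toy cost and a target of (6, 4), the scan lands exactly on the target at i = 2 through the added {3, 2} offset, a position that none of the original 16 offsets reaches at any scale.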

---------------------------

Low Level

users




Statistics:
Messages: 2,867
Registration: 09.19.2002
13.07.23 - 19:05:41
Message # 7
RE: Futzing with the x264 code -- possible improvements

Any results for those metrics you were planning to run? Assuming this "futzing" would indeed yield such improvement in the general case, what effect would changing --merange have with this new algorithm? Would X- and Y-direction motion searching be offset proportionally to the overall extension in search range? Also, in the neighborhood of suggested improvements, I would without hesitation suggest shunting the Exhaustive search onto a different thread than all the other processing. That is, if it proves too difficult to implement ESA into the current multi-thread framework.

---------------------------

jim m

users




Statistics:
Messages: 274
Registration: 08.28.2002
13.07.23 - 19:13:28
Message # 8
RE: Futzing with the x264 code -- possible improvements