RE: Futzing with the x264 code -- possible improvements
I've heard somewhere (can't remember where) that the x264 code is rather unoptimised: there are a lot of places where MMX/MMXEXT/SSE/SSE2/SSE3/SSSE3 code could be added for extra speed but is currently missing. Is this true?
RE: Futzing with the x264 code -- possible improvements
Dark Shikari: everything you did is OK, except the way you compare the results. As you have noticed, judging a result means looking at both the bitrate and the metric (PSNR, SSIM, etc.), and since both change at the same time, the results are not easy to compare. You decided to avoid that issue by defining, arbitrarily, 'quality = 1/(1-SSIM)/bitrate', and then comparing those quality values. That is definitely not how it should be done. The proper way is to encode at several CRFs and then plot the metric-versus-bitrate curve for each version. Once the curves are drawn, you can compare the modifications. In particular, you can say "at the same bitrate, the metrics differ by XXX", or "at the same metric value, the bitrate differs by YYY %". It's slower, but it works.
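To make the curve comparison concrete, here is a minimal sketch in Python. It assumes each encoder version has been run at several CRFs and that you have collected (bitrate, metric) pairs yourself. The function names, the log-bitrate linear interpolation, and the sample numbers are my own illustrative choices, not anything taken from x264 or from real measurements.

```python
import math
from bisect import bisect_left

def interp_metric(curve, bitrate):
    """Metric value at `bitrate`, interpolated linearly on a log-bitrate axis.

    `curve` is a list of (bitrate_kbps, metric) points sorted by bitrate.
    Returns None when `bitrate` lies outside the measured range.
    """
    rates = [r for r, _ in curve]
    if not rates[0] <= bitrate <= rates[-1]:
        return None
    i = bisect_left(rates, bitrate)
    if rates[i] == bitrate:
        return curve[i][1]
    (r0, m0), (r1, m1) = curve[i - 1], curve[i]
    t = (math.log(bitrate) - math.log(r0)) / (math.log(r1) - math.log(r0))
    return m0 + t * (m1 - m0)

def interp_bitrate(curve, metric):
    """Bitrate at which the curve reaches `metric` (inverse lookup).

    Assumes the metric does not decrease as bitrate increases.
    Returns None when `metric` lies outside the measured range.
    """
    for (r0, m0), (r1, m1) in zip(curve, curve[1:]):
        if m0 <= metric <= m1:
            if m1 == m0:
                return r0
            t = (metric - m0) / (m1 - m0)
            return math.exp(math.log(r0) + t * (math.log(r1) - math.log(r0)))
    return None

def metric_gain_at(baseline, modified, bitrate):
    """'At the same bitrate, the metrics differ by XXX.'"""
    a, b = interp_metric(baseline, bitrate), interp_metric(modified, bitrate)
    return None if a is None or b is None else b - a

def bitrate_change_at(baseline, modified, metric):
    """'At the same metric, the bitrate differs by YYY %.'"""
    a, b = interp_bitrate(baseline, metric), interp_bitrate(modified, metric)
    return None if a is None or b is None else 100.0 * (b - a) / a

# Placeholder numbers for illustration only, NOT real encoder results.
baseline = [(500, 36.0), (1000, 39.0), (2000, 42.0)]
modified = [(480, 36.2), (950, 39.3), (1900, 42.1)]

print(metric_gain_at(baseline, modified, 1000))    # metric difference at 1000 kbps
print(bitrate_change_at(baseline, modified, 39.0)) # % bitrate change at metric 39.0
```

Both comparisons are only meaningful where the two curves overlap, which is why the helpers return None instead of extrapolating. More CRF points per curve give a more trustworthy interpolation.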
RE: Futzing with the x264 code -- possible improvements
burfadel: you've heard wrong. x264 can be made faster; everything can be made faster. But it's definitely not "rather unoptimized". What is missing, last time I checked, is SSSE3 for 32-bit OSes (since akupenguin uses a 64-bit OS) and, perhaps, some SSE2 functions in place of MMXEXT (it would help on P4/Conroe). IMHO, that won't amount to more than a 5-10% speed gain. And, IMHO, if development time were to be spent on x264, I would rather look toward psychovisual enhancements: there are none at the moment, and they can dramatically improve things.
RE: Futzing with the x264 code -- possible improvements
While you're at it, remove MMX1, SSE1, and SSE3 from your list of instruction sets. SSE1 and SSE3 are floating-point and thus useless for video coding, and the last CPU that had only MMX1 was way too slow for x264 anyway.
RE: Futzing with the x264 code -- possible improvements
Any results for those metrics you were planning to run? Assuming this "futzing" would indeed yield such an improvement in the general case, what effect would changing --merange have with this new algorithm? Would X- and Y-direction motion searching be offset proportionally to the overall extension in search range? Also, in the neighborhood of suggested improvements, I would without hesitation suggest shunting the exhaustive search (ESA) onto a different thread from all the other processing. That is, if it proves too difficult to integrate ESA into the current multi-thread framework.