Horde3D

Next-Generation Graphics Engine
All times are UTC + 1 hour




 Post subject: utMath elite
PostPosted: 22.01.2009, 17:35 
Offline

Joined: 21.08.2008, 11:44
Posts: 354
Hi, I've performed some high-school level optimizations [not SIMD] on utMath. Any ideas are welcome :wink:


Attachments:
File comment: outdated, only compatible with Horde3D SDK Beta2
utMath_elite.zip [6.23 KiB]
Downloaded 626 times


Last edited by Siavash on 04.04.2009, 16:49, edited 1 time in total.
 Post subject: Re: utMath elite
PostPosted: 22.01.2009, 18:53 
Offline

Joined: 22.11.2007, 17:05
Posts: 707
Location: Boston, MA
Siavash wrote:
Hi, I've performed some high-school level optimizations [not SIMD] on utMath.
I am pretty sure that isn't what you meant, high-level would be the term ;)

_________________
Tristam MacDonald - [swiftcoding]


 Post subject: Re: utMath elite
PostPosted: 23.01.2009, 03:50 
Offline

Joined: 21.08.2008, 11:44
Posts: 354
swiftcoder wrote:
Siavash wrote:
Hi, I've performed some high-school level optimizations [not SIMD] on utMath.
I am pretty sure that isn't what you meant, high-level would be the term ;)
High-level? Is there any performance increase? [there are only some precalculated values, factoring of polynomials, and ...]


 Post subject: Re: utMath elite
PostPosted: 04.04.2009, 15:03 
Offline

Joined: 21.08.2008, 11:44
Posts: 354
Is there any noticeable performance difference between the Horde3D beta2 & beta3 utMath libs?


 Post subject: Re: utMath elite
PostPosted: 05.04.2009, 12:31 
Offline
Engine Developer

Joined: 10.09.2006, 15:52
Posts: 1217
Siavash wrote:
Is there any noticeable performance difference between the Horde3D beta2 & beta3 utMath libs?

The most important change in utMath Beta3 is an optimized float-to-int conversion.
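
For readers unfamiliar with the idea, one widely used way to speed up float-to-int conversion is the "magic number" rounding trick sketched below. Whether Beta3 uses exactly this technique is an assumption. The name fastRoundToInt is hypothetical, and the sketch assumes IEEE 754 doubles, the default round-to-nearest mode, and inputs that fit in a 32-bit int.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper, not taken from utMath: rounds to the nearest integer
// (ties go to the even integer) without an explicit double-to-int cast.
inline std::int32_t fastRoundToInt(double val)
{
    // Adding 1.5 * 2^52 pushes the integer part of val into the low mantissa bits.
    const double magic = 6755399441055744.0;
    const double shifted = val + magic;

    std::uint64_t bits;
    std::memcpy(&bits, &shifted, sizeof bits);   // well-defined type punning

    // The low 32 bits now hold the rounded value in two's complement.
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(bits));
}
```

Note that this rounds rather than truncates, so fastRoundToInt(2.5) gives 2 and fastRoundToInt(3.5) gives 4, which differs from a plain (int) cast.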


 Post subject: Re: utMath elite
PostPosted: 11.04.2009, 12:18 
Offline

Joined: 11.04.2009, 08:42
Posts: 14
Location: France
Not about the patch above, but on the same topic.

The determinant() function looked so... suboptimal that I couldn't restrain myself and factorized it. The resulting code isn't pretty, but the performance gain is there:

Code:
float fastDeterminant() const
   {
      /*
         factorization result :
            192 -> 20 ptr deref
            96 -> 28 fpmult
            24 -> 18 fpadd
            
            => solid 30% speed improvement
      */
      
      const float * const c0 = c[0];
      const float * const c1 = c[1];
      const float * const c2 = c[2];
      const float * const c3 = c[3];
      
      const float c00 = c0[0];
      const float c01 = c0[1];
      const float c02 = c0[2];
      const float c03 = c0[3];
      
      const float c10 = c1[0];
      const float c11 = c1[1];
      const float c12 = c1[2];
      const float c13 = c1[3];
      
      const float c0011 = c00*c11;
      const float c0012 = c00*c12;
      const float c0013 = c00*c13;
      
      const float c0110 = c01*c10;
      const float c0112 = c01*c12;
      const float c0113 = c01*c13;
      
      const float c0210 = c02*c10;
      const float c0211 = c02*c11;
      const float c0213 = c02*c13;
      
      const float c0310 = c03*c10;
      const float c0311 = c03*c11;
      const float c0312 = c03*c12;
      
      const float c03x12m02x13 = c0312 - c0213;
      const float c03x11m01x13 = c0311 - c0113;
      const float c02x11m01x12 = c0211 - c0112;
      const float c02x10m00x12 = c0210 - c0012;
      const float c03x10m00x13 = c0310 - c0013;
      const float c01x10m00x11 = c0110 - c0011;
      
      const float c20 = c2[0];
      const float c21 = c2[1];
      const float c22 = c2[2];
      const float c23 = c2[3];
      
      return
         c3[0] * ( c03x12m02x13*c21 - c03x11m01x13*c22 + c02x11m01x12*c23 )
         -
         c3[1] * ( c03x12m02x13*c20 - c03x10m00x13*c22 + c02x10m00x12*c23 )
         +
         c3[2] * ( c03x11m01x13*c20 - c03x10m00x13*c21 + c01x10m00x11*c23 )
         -
         c3[3] * ( c02x11m01x12*c20 - c02x10m00x12*c21 + c01x10m00x11*c22 );
   }


Now there is probably room for explicit SIMD vectorization here, but I'm not familiar enough with it to do it myself atm. By the way, "implicit" vectorization (passing -msse -msse2 -msse3 -mfpmath=sse to gcc) does not change the benchmark results.

Also note that some profiling (valgrind, 32-bit Linux, Core 2, compiled with -O2) shows that the strategy used for the +=, *= and /= operators is quite suboptimal (I'm not posting "fixes" here as they are extremely simple).
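
The post does not say what the fix is. One plausible reading, sketched here as an assumption, is that the compound operators build a temporary through the binary operator and assign it back, in which case updating the members in place avoids the extra copy. Vec3 below is a hypothetical stand-in, not the actual utMath class.

```cpp
// Hypothetical stand-in for a utMath-style vector; not the actual Horde3D class.
struct Vec3
{
    float x, y, z;

    // Updates the members in place instead of building a temporary
    // through operator+ and assigning it back.
    Vec3 &operator+=(const Vec3 &v)
    {
        x += v.x; y += v.y; z += v.z;
        return *this;
    }

    Vec3 &operator*=(float f)
    {
        x *= f; y *= f; z *= f;
        return *this;
    }

    // One division followed by three multiplications; the result can differ
    // from three separate divisions in the last bit.
    Vec3 &operator/=(float f)
    {
        const float inv = 1.0f / f;
        x *= inv; y *= inv; z *= inv;
        return *this;
    }
};
```

With optimizations enabled a compiler may well generate the same code for both forms, so any gain should be confirmed by measuring.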


 Post subject: Re: utMath elite
PostPosted: 11.04.2009, 13:40 
Offline

Joined: 21.08.2008, 11:44
Posts: 354
Thanks a lot for the patch, fullmetalcoder. You prompted me to run some benchmarks using Intel VTune :wink:
[MSVC2008 Express : Debug]
Code:
////////Determinant() Benchmark///////////////
utMath beta3   : 33 ;
fullmetalcoder : 30 ;
utMath elite   : 28 ;
//////////////////////////////////////////////

////////Inverted() Benchmark//////////////////
utMath beta3   : 148 ;
fullmetalcoder : 148 ;
utMath elite   : 132 ;
//////////////////////////////////////////////
Looks like utMath elite rocks 8)


 Post subject: Re: utMath elite
PostPosted: 11.04.2009, 13:57 
Offline

Joined: 11.04.2009, 08:42
Posts: 14
Location: France
Interesting benchmark figures. Could it be that MSVC does a better job at optimizing complex lookup + math ops? Here is what I get with the code from SVN:

Quote:
det test : 100000000
normal : result=-0.480000, elapsed=123
fast : result=-0.480000, elapsed=88

Elapsed time in ms for 10**8 determinant computations. Code compiled with GCC (-march=i686 -mtune=generic -O2), running on my laptop (Core 2 T7700, 3 GB DDR, Linux).

Edit: I just tried replacing the regular determinant() with the one from the archive above. Here are the results:
Quote:
det test : 100000000
normal : result=-0.480000, elapsed=112
fast : result=-0.480000, elapsed=87


Edit 2: benchmarking in debug mode is generally not a good idea for such simple operations. It is very likely that det = o(debug overhead).


 Post subject: Re: utMath elite
PostPosted: 11.04.2009, 14:24 
Offline

Joined: 21.08.2008, 11:44
Posts: 354
fullmetalcoder wrote:
interesting benchmark figures. Could it be that MSVC does a better job at optimizing complex lookup + math ops?
It's even more interesting because the system I was using was an old Pentium III 750 MHz with 256 MB SDRAM on Windows XP SP3, without any optimizations.
fullmetalcoder wrote:
benchmarking in debug mode is generally not a good idea for such simple operations. It is very likely that det = o(debug overhead)
Yes, I know, but if you look at the disassembly of the code you will see that MSVC cheats in release mode [precalculated values and ...].


 Post subject: Re: utMath elite
PostPosted: 11.04.2009, 14:27 
Offline

Joined: 11.04.2009, 08:42
Posts: 14
Location: France
Siavash wrote:
Yes, I know, but if you look at the disassembly of the code you will see that MSVC cheats in release mode [precalculated values and ...].

All compilers cheat. I was forced to add some extra code inside the loop of determinant computations to make sure the whole loop was not optimized away (turned into a single call)...
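
The extra code itself was not posted. A minimal sketch of one common approach is shown below: change the input on every iteration and hand the accumulated result back to the caller, so the loop has an observable effect. Mat4, standInDet and timeLoop are hypothetical names used for illustration only, and a sufficiently clever compiler could still simplify such a loop.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical stand-in for the matrix type (column-major, like c[col][row]).
struct Mat4 { float c[4][4]; };

// Cheap stand-in for the function under test; swap in the real determinant.
inline float standInDet(const Mat4 &m)
{
    return m.c[0][0] * m.c[1][1] - m.c[0][1] * m.c[1][0];
}

// Times `iterations` calls of f and returns the elapsed milliseconds.
// The accumulated result is written to `checksum`; the caller should print
// or otherwise use it so the work cannot be discarded as dead code.
template <typename F>
double timeLoop(F f, Mat4 m, std::size_t iterations, float &checksum)
{
    float acc = 0.0f;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i)
    {
        // The input changes every iteration, so a single call cannot be
        // hoisted out of the loop and reused.
        m.c[0][0] = static_cast<float>(i & 7);
        acc += f(m);
    }
    const auto stop = std::chrono::steady_clock::now();
    checksum = acc;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Printing the checksum after the timed loop also gives a quick sanity check that two implementations compute the same values.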


 Post subject: Re: utMath elite
PostPosted: 11.04.2009, 15:14 
Offline

Joined: 21.08.2008, 11:44
Posts: 354
I've forced the compiler to print out the results outside of the loop [Release mode /O2]:
Code:
/////// det -  inv /////////
beta3 : 13  -  56
fast  : 9   -  49
elite : 9   -  49
////////////////////////////
fullmetalcoder's version and the elite version perform the same :wink:


 Post subject: Re: utMath elite
PostPosted: 12.04.2009, 18:22 
Offline

Joined: 21.08.2008, 11:44
Posts: 354
It's a good idea to replace some parts of utMath beta3 with the elite version; there is a ~13% performance boost in the det & inv functions.
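
For reference, the relative gains implied by the Release-mode figures posted earlier in the thread (beta3 vs. elite: 13 vs. 9 for det, 56 vs. 49 for inv) can be worked out as in this small sketch. The helper name gainPercent is hypothetical, and the units of those figures were not stated.

```cpp
// Relative reduction in measured time, in percent.
inline double gainPercent(double before, double after)
{
    return (before - after) / before * 100.0;
}
```

With those figures, gainPercent(56, 49) gives 12.5 for inv, in line with the ~13% quoted, while gainPercent(13, 9) gives roughly 30.8 for det.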

