<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>hombrealto</title>
    <link rel="alternate" type="text/html" href="http://hombrealto.com/blog/" />
    <link rel="self" type="application/atom+xml" href="http://hombrealto.com/blog/atom.xml" />
    <id>tag:hombrealto.com,2009-11-04:/blog//1</id>
    <updated>2010-04-16T16:56:19Z</updated>
    
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type 4.32-en</generator>

<entry>
    <title>Trip to Kenya</title>
    <link rel="alternate" type="text/html" href="http://hombrealto.com/blog/2010/04/trip-to-kenya.html" />
    <id>tag:hombrealto.com,2010:/blog//1.21</id>

    <published>2010-04-16T16:23:05Z</published>
    <updated>2010-04-16T16:56:19Z</updated>

    <summary>I didn&apos;t tell most people, but on April 1st I went on a 10-day trip to Kenya. I&apos;m planning to upload here a good portion of the nearly one thousand photos we took during the trip, but until then,...</summary>
    <author>
        <name>Jon Valdés Furriel</name>
        
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="http://hombrealto.com/blog/">
        <![CDATA[I didn't tell most people, but on April 1st I went on a 10-day trip to Kenya. I'm planning to upload here a good portion of the nearly one thousand photos we took during the trip, but until then, here's a good one to show that IT-related services are never the best-paid career. Not even down there ;-)<div><br /><div><img src="http://hombrealto.com/blog/assets_c/2010/04/IMGA0061_2-thumb-500x888-10.jpg" width="500" height="888" alt="IMGA0061_2.JPG" class="mt-image-center" style="text-align: center; display: block; margin: 0 auto 20px;" /><div style="text-align: left;">And here's an image of an absolutely beautiful place on the delta of the Ramisi river, south of Mombasa. The guy in the image is me, running in that crystal clear, mildly-warm water. That beach is an island that disappears when the tide comes in. All white sand and&nbsp;no-one&nbsp;else but ourselves on the island. I just loved that place.</div></div><div style="text-align: left;"><br /></div><div><a href="http://hombrealto.com/blog/assets_c/2010/04/IMGP1330-13.html" onclick="window.open('http://hombrealto.com/blog/assets_c/2010/04/IMGP1330-13.html','popup','width=3264,height=2448,scrollbars=no,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false"><img src="http://hombrealto.com/blog/assets_c/2010/04/IMGP1330-thumb-500x375-13.jpg" width="500" height="375" alt="IMGP1330.JPG" class="mt-image-center" style="text-align: center; display: block; margin: 0 auto 20px;" /></a></div><div>When I have some more time, I'll create a gallery with loads and loads of photographs of that extremely beautiful country.</div></div>]]>
        
    </content>
</entry>

<entry>
    <title>GPU Raymarching with Distance Fields</title>
    <link rel="alternate" type="text/html" href="http://hombrealto.com/blog/2010/03/gpu-raymarching-with-distance-fields.html" />
    <id>tag:hombrealto.com,2010:/blog//1.20</id>

    <published>2010-03-26T10:03:26Z</published>
    <updated>2010-03-26T10:13:38Z</updated>

    <summary>These last two weeks I&apos;ve been pretty distracted with personal issues, and to get back on track it always helps me to start a small personal project in which I experiment with something I&apos;ve never done before. So here we go:...</summary>
    <author>
        <name>Jon Valdés Furriel</name>
        
    </author>
    
        <category term="Graphics" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="OpenGL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Programming" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://hombrealto.com/blog/">
        <![CDATA[These last two weeks I've been pretty distracted with personal issues, and to get back on track it always helps me to start a small personal project in which I experiment with something I've never done before. So here we go: GPU Raymarching using Distance Fields.<div><br /></div><div>This project is completely based on the work of Iñigo Quilez, the great IQ from rgba. He put some slides on his website explaining how to do this kind of rendering, and after reading them I just had to try it for myself.</div><div><br /></div><div>If you want to read more, just go here:&nbsp;</div><div><a href="http://www.iquilezles.org/www/articles/raymarchingdf/raymarchingdf.htm">http://www.iquilezles.org/www/articles/raymarchingdf/raymarchingdf.htm</a></div><div><br /></div><div>And now, time for some videos:</div><div><br /></div><object width="501" height="376"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=10346355&amp;server=vimeo.com&amp;show_title=0&amp;show_byline=0&amp;show_portrait=0&amp;color=00adef&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=10346355&amp;server=vimeo.com&amp;show_title=0&amp;show_byline=0&amp;show_portrait=0&amp;color=00adef&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="501" height="376"></object>
<br />
<object width="501" height="376"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=10349168&amp;server=vimeo.com&amp;show_title=0&amp;show_byline=0&amp;show_portrait=0&amp;color=00adef&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=10349168&amp;server=vimeo.com&amp;show_title=0&amp;show_byline=0&amp;show_portrait=0&amp;color=00adef&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="501" height="376"></object>
<br />

<object width="501" height="376"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=10383803&amp;server=vimeo.com&amp;show_title=0&amp;show_byline=0&amp;show_portrait=0&amp;color=00adef&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=10383803&amp;server=vimeo.com&amp;show_title=0&amp;show_byline=0&amp;show_portrait=0&amp;color=00adef&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="501" height="376"></object>&nbsp;<br /><div><br /></div><div>I already have another build with more shapes and weird things, but it'll have to wait until I get distracted again ;-)</div>]]>
        
    </content>
</entry>

<entry>
    <title>On C++ performance: The Evil Mr. Branch</title>
    <link rel="alternate" type="text/html" href="http://hombrealto.com/blog/2009/12/on-performance-function-calls-or-branching.html" />
    <id>tag:hombrealto.com,2009:/blog//1.18</id>

    <published>2009-12-26T08:53:01Z</published>
    <updated>2009-12-26T08:54:03Z</updated>

    <summary>Here&apos;s a simple problem. For some reason, you&apos;ve got to select between two types of values: smaller than (or equal to) 50, and bigger than 50. And you need to do that fast using one core (no multithreading). How do...</summary>
    <author>
        <name>Jon Valdés Furriel</name>
        
    </author>
    
        <category term="C++" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Programming" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://hombrealto.com/blog/">
        <![CDATA[Here's a simple problem. For some reason, you've got to select between two types of values: smaller than (or equal to) 50, and bigger than 50. And you need to do that <b>fast</b> using one core (no multithreading). How do you do it?<br/><br/>

For our example, we'll have an array called <em>data</em> where all values are stored, and an array called <em>results</em>, where an integer value is stored, indicating if the input was bigger or smaller than 50.<br/><br/>

So, this is the straightforward way to do it.

<pre class='brush: cpp'>
for(int i=0;i&lt;NUM_DATA;++i)
{
    if(data[i] &lt;= 50)
        results[i] = SMALLERTHAN_50;
    else
        results[i] = BIGGERTHAN_50;
}
</pre>

And most of us (and myself 2 days ago) would say: "you won't get much faster than that!". Well, in fact you can. How? <br/><br/>

By not branching!<br/><br/>

In school they show you that branching can make the processor flush the data it has in the pipeline and start over. They also tell you that today's general-purpose processors do not generally take a big hit when branching, but that hit still exists (I'm talking x86 PCs, here. Consoles DO have serious problems with branching). <br/><br/>

So, how do you do this without branching? With a small trick ;-)<br/><br/>
<code>
result = x + ((y-x) &amp; condition)
</code>
<br/><br/>
If condition has all bits set to 0, the second half of the computation will be 0, so result ends up being x. On the other hand, if condition has all bits set to 1, the x cancels out, and we get result=y.<br/><br/>
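A minimal sketch to check that formula in isolation. The names <code>make_mask</code> and <code>select_masked</code> are made up for this illustration (they don't come from the linked article), and it assumes two's complement integers, where -1 has every bit set:<br/><br/>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: turns a bool into an all-zeros or all-ones mask.
// Negating 1 gives -1, which has every bit set in two's complement.
inline std::int32_t make_mask(bool condition)
{
    return -static_cast<std::int32_t>(condition);
}

// result = x + ((y - x) & mask)
// mask == 0  -> (y - x) & 0 is 0, so the result is x
// mask == -1 -> (y - x) & -1 is (y - x), so the result is x + (y - x) = y
inline std::int32_t select_masked(std::int32_t x, std::int32_t y, std::int32_t mask)
{
    return x + ((y - x) & mask);
}
```

With a mask built from <code>false</code> this gives back x, and with a mask built from <code>true</code> it gives back y. Note the extra parentheses: in C++ the <code>+</code> binds tighter than the <code>&amp;</code>, so they are needed.<br/><br/>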

So we get a function like this (code shamelessly <a href=http://assemblyrequired.crashworks.org/2009/01/04/fcmp-conditional-moves-for-branchless-math/>taken from here</a>):
<pre class='brush: cpp'>
int isel( int a, int x, int y )
{
    int mask = a &gt;&gt; 31; // arithmetic shift right, splat out the sign bit
    return x + ((y - x) &amp; mask); // mask is 0xFFFFFFFF if (a &lt; 0) and 0x00 otherwise.
}
</pre>

If <em>a</em> is zero or positive, it returns <em>x</em>; if it's negative, it returns <em>y</em>.<br/><br/>

So using this function, we can get our array sorted out just like this:
<pre class='brush: cpp'>
for(int i=0;i&lt;NUM_DATA;++i)
    results[i] = isel(50 - data[i], SMALLERTHAN_50, BIGGERTHAN_50); 
</pre>

It works just as well... the only question is performance. What will run faster?
<br/><br/>
Well, on my machine (a Core 2 Duo at 2.4GHz), for NUM_DATA=10,000, and 200,000 iterations of those loops, I get:<br/><br/>

- Using GCC with no optimizations:
<pre>
Elapsed time BRANCHING:       19.156000000  sec
Elapsed time NOT BRANCHING:   17.566000000  sec
</pre>
- Using GCC with -O3:
<pre>
Elapsed time BRANCHING:       2.605000000  sec
Elapsed time NOT BRANCHING:   1.981000000  sec
</pre>

The difference is not that big without optimizations, but with them, it's clear as day: not branching gets you further. More than 20% less time, in this case!!
<br/><br/>
And this goes to show you what some people might already have told you: compilers inline the hell out of your code if set to high optimization values, even if you don't explicitly ask them to. This code is <em>faster</em> using a function call inside the main loop than without it!<br/><br/>

By the way, GCC is NOT generating MMX, SSE, or any instructions like that. This is the disassembly for the main loop in the branchless version (the text on the right is my interpretation of the asm):
<br/><br/>
<pre>
.text:00401170 loc_401170:
.text:00401170             mov     eax, ecx                  ## eax=50
.text:00401172             sub     eax, [ebx+edx*4]          ## eax-=data[i]
.text:00401175             shr     eax, 1Fh                  ## eax>>=31
.text:00401178             mov     [edi+edx*4], eax          ## result[i] = eax
.text:0040117B             inc     edx                       ## ++i
.text:0040117C             cmp     edx, 270Fh                ## if(i&lt;NUM_DATA)
.text:00401182             jle     short loc_401170          ##    repeat loop
</pre><br/>
The process is heavily optimized by GCC, but it's all there in normal, honest-to-fsm, everyday-life instructions.<br/><br/>

I do still have some doubts about this, however, as the GCC optimization for the branching loop doesn't use a <code>j(n)le</code> or <code>jz</code> instruction, but <code>setnle</code>. And I can't find out if this instruction does in fact flush the pipeline or not. Anyway, it still results in faster processing when using the <code>isel</code> version.
<br/><br/>
For comparison, here's the optimized branching loop:
<br/><pre>
.text:004010F0 loc_4010F0:
.text:004010F0             xor     eax, eax
.text:004010F2             cmp     dword ptr [ebx+edx*4], 32h
.text:004010F6             setnle  al
.text:004010F9             mov     [esi+edx*4], eax
.text:004010FC             inc     edx
.text:004010FD             cmp     edx, 270Fh
.text:00401103             jle     short loc_4010F0
</pre>
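Read back into C++, that optimized loop seems to boil down to storing the result of the comparison itself: <code>setnle</code> is not a jump, it just writes 0 or 1 into <code>al</code> depending on the flags left by <code>cmp</code>. Here's a rough sketch of that reading. The function name <code>classify</code> is made up for this illustration, and it only works because SMALLERTHAN_50 is 0 and BIGGERTHAN_50 is 1:<br/><br/>

```cpp
#include <cassert>
#include <cstddef>

enum { SMALLERTHAN_50, BIGGERTHAN_50 };

// Hypothetical helper: classifies each value without an explicit if/else.
// (data[i] > 50) evaluates to false or true, which converts to 0 or 1,
// the same values as SMALLERTHAN_50 and BIGGERTHAN_50.
void classify(const int *data, int *results, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        results[i] = (data[i] > 50);
}
```

So the "branching" source code may well end up without a conditional jump in the loop body once the optimizer is done with it.<br/><br/>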


And, well, here's the whole testing code if anyone is curious about it:
<pre class='brush: cpp'>
#include &lt;cstdlib&gt;
#include &lt;cstdio&gt;
#include &lt;ctime&gt;

enum{ SMALLERTHAN_50, BIGGERTHAN_50};

int isel( int a, int x, int y )
{
    int mask = a &gt;&gt; 31; // arithmetic shift right, splat out the sign bit
    return x + ((y - x) &amp; mask); // mask is 0xFFFFFFFF if (a &lt; 0) and 0x00 otherwise.
}

int main ()
{
    const int NUM_DATA = 10000;
    const size_t NUM_ITERS = 200000;

    clock_t startTime, endTime;

    int * data     = new int[NUM_DATA];
    int * results  = new int[NUM_DATA];
    int * results2 = new int[NUM_DATA];
    
    // Initialize test data
    for(int i=0;i&lt;NUM_DATA;++i)
        data[i]= rand()%100; 

    startTime = clock();
    // ------------------------------ 
    // Branch test
    for (size_t j=0;j&lt;NUM_ITERS;++j)
        for(int i=0;i&lt;NUM_DATA;++i)
        {
            if(data[i] &lt;= 50)
                results[i] = SMALLERTHAN_50;
            else
                results[i] = BIGGERTHAN_50;
        }
    // ------------------------------ 
    endTime = clock();

    printf (&quot;\tElapsed time BRANCHING:       %.9f  sec\n&quot;, 
        (float)(endTime-startTime) / CLOCKS_PER_SEC);
    
    startTime = clock();
    // ------------------------------ 
    // Branchless test
    for (size_t j=0;j&lt;NUM_ITERS;++j)
        for(int i=0;i&lt;NUM_DATA;++i)
            results2[i] = isel(
                50 - data[i], 
                SMALLERTHAN_50, 
                BIGGERTHAN_50);                                                                                   
    // ------------------------------ 
    endTime = clock();
 
    printf (&quot;\tElapsed time NOT BRANCHING:   %.9f  sec\n&quot;,
        (float)(endTime-startTime) / CLOCKS_PER_SEC);

    // Check we didn't mess things up 
    for(int i=0;i&lt;NUM_DATA;++i)
        if( results[i] != results2[i])
            printf (&quot;ERROR in elem %i=%i, first value: %i, second value: %i\n&quot;,
                i, data[i], results[i], results2[i]);

    // Release the test buffers
    delete[] data;
    delete[] results;
    delete[] results2;
}
</pre>]]>
        
    </content>
</entry>

<entry>
    <title>C++: Could you please code cleanly? Pretty please?</title>
    <link rel="alternate" type="text/html" href="http://hombrealto.com/blog/2009/12/how-not-to-write-c.html" />
    <id>tag:hombrealto.com,2009:/blog//1.17</id>

    <published>2009-12-20T23:07:20Z</published>
    <updated>2009-12-21T13:50:45Z</updated>

    <summary>One of my little things when coding is that I always feel ashamed when my code is not clean. This means I try very hard to make my code simple and easy to understand. I don&apos;t always succeed (not even...</summary>
    <author>
        <name>Jon Valdés Furriel</name>
        
    </author>
    
        <category term="C++" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Programming" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://hombrealto.com/blog/">
        <![CDATA[One of my little things when coding is that I always feel ashamed when my code is not clean. This means I try very hard to make my code simple and easy to understand. I don't always succeed (not even close), but I try.
<br/><br/>
I wasn't always like that, mind you. As a teenager, my code was just a horrible mess of spaghetti code.  But <a href="http://pablo.ordunya.com">Pablo Orduña</a> showed me long ago the value of easily understandable code, and since then I always try my best to code in a clear and non-hackish way.
<br/><br/>
Anyway, these days I've found myself working with code from other people. And, at that, people that didn't try to create readable code. I'm sure they did know how to do it. They just weren't in the mood. You can just read it in the code: "if it works, ship it!"
<br/><br/>
As a small example (there are much much worse things than this), here's a slightly modified version of a function I found the other day:

<pre class='brush: cpp'>
bool RetrieveProperty(object_t * object, string propName, VARIANT* pPropertyValue)
{
	PropertyReader * pPropertyReader = NULL;
	object-&gt;FromInterface(IID_IPropertyReader, (void**)&amp;pPropertyReader);
	if (pPropertyReader)
	{
		Properties * pProperties = NULL;
		pPropertyReader-&gt;get_Properties(&amp;pProperties);
		if (pProperties)
		{
			long propCount = 0;
			if (pProperties-&gt;get_Count(&amp;propCount) == OK)
			{
				for (long i = 0; i &lt; propCount; i++)
				{
					Property * pProperty = NULL;
					pProperties-&gt;get_Item(i, &amp;pProperty);
					if (pProperty)
					{
						PropertyDictionary * pPropertyDictionary = NULL;
						pProperty-&gt;FromInterface(IID_PropertyDictionary, (void**)&amp;pPropertyDictionary);
						if (pPropertyDictionary)
						{
							string bDictionaryName;
							pPropertyDictionary-&gt;get_Name(&amp;bDictionaryName);
							
							if (!strcmp((char*)bDictionaryName.c_str(), &quot;MY_DICTIONARY&quot;))
							{
								HRESULT hr;
								if ((hr = pPropertyDictionary-&gt;get_Item(propName, pPropertyValue)) == OK)
									return true;					
							}
						}
					}
				}
			}
		}
	}

	return false;
}
</pre>

Now, did anyone notice something odd? The nested <em>if</em>s and loops? They make it unnecessarily hard to follow the program flow and they don't buy us anything in return!
<br/><br/>
In fact, the whole structure of the function allows us to remove most of the 9 (yes, 9) indentation levels in that code. We just have to invert the <em>ifs</em> and return early. Like this:

<pre class='brush: cpp'>
bool RetrieveProperty(object_t * object, string propName, VARIANT* pPropertyValue)
{
	PropertyReader * pPropertyReader = NULL;
	object-&gt;FromInterface( IID_IPropertyReader, (void**)&amp;pPropertyReader);
	if (!pPropertyReader)
		return false;

	Properties * pProperties = NULL;
	pPropertyReader-&gt;get_Properties(&amp;pProperties);
	if (!pProperties)
		return false;

	long propCount = 0;
	if (pProperties-&gt;get_Count(&amp;propCount) != OK)
		return false;
		
	for (long i = 0; i &lt; propCount; i++)
	{
		Property * pProperty = NULL;
		pProperties-&gt;get_Item(i, &amp;pProperty);
		if (!pProperty)
			continue;
		
		PropertyDictionary * pPropertyDictionary = NULL;
		pProperty-&gt;FromInterface(IID_PropertyDictionary, (void**)&amp;pPropertyDictionary);
		if (!pPropertyDictionary)
			continue;
	
		string bDictionaryName;
		pPropertyDictionary-&gt;get_Name(&amp;bDictionaryName);
	
		if (strcmp((char*)bDictionaryName.c_str(), &quot;MY_DICTIONARY&quot;))
			continue;

		HRESULT hr; // why do we need this, I say?
		if ((hr = pPropertyDictionary-&gt;get_Item(propName, pPropertyValue)) == OK)
			return true;					
	}

	return false;
}
</pre>
Max indentation: 3 levels.
<br/><br/>
It takes less than a minute, it's safe to do, the meaning of the program is exactly the same... and it's much cleaner and easier to understand! And as a bonus, it isn't even longer than the previous version!!
<br/><br/>
There are even clever people that have said this before me <a href=http://www.refactoring.com/catalog/replaceNestedConditionalWithGuardClauses.html>several</a> <a href=http://www.codinghorror.com/blog/archives/000486.html>times!</a>
<br/><br/>
So please, do try. I know it seems easier and faster not to, but it's only in the short run. Some day (when you have to go back to try to understand your own code) you'll regret it.
<br/><br/>
Still unconvinced? Now try imagining this with <a href=http://thedailywtf.com/Articles/Coding-Like-the-Tour-de-France.aspx>700 loc functions</a> (and yes, I have a few like that one here...)]]>
        
    </content>
</entry>

<entry>
    <title>New job!</title>
    <link rel="alternate" type="text/html" href="http://hombrealto.com/blog/2009/11/new-job.html" />
    <id>tag:hombrealto.com,2009:/blog//1.16</id>

    <published>2009-11-22T22:12:58Z</published>
    <updated>2009-11-22T22:53:26Z</updated>

    <summary><![CDATA[Tomorrow, Monday, is officially my first day working for RandomControl as the fryrender&nbsp;plugins subsystem developer and&nbsp;maintainer. By the way, I'll be telecommuting from home, so I'll avoid cubicle-work once again (yay!) And, well, I hope this job leaves me some time (and...]]></summary>
    <author>
        <name>Jon Valdés Furriel</name>
        
    </author>
    
        <category term="Blog" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://hombrealto.com/blog/">
        <![CDATA[Tomorrow, Monday, is officially my first day working for <a href="http://randomcontrol.com">RandomControl</a> as the <a href="http://randomcontrol.com/index.php?option=com_content&amp;view=article&amp;id=3&amp;Itemid=16">fryrender</a>&nbsp;plugins subsystem developer and&nbsp;maintainer.<div><br /></div><div>By the way, I'll be telecommuting from home, so I'll avoid cubicle-work once again (yay!)</div><div><br /></div><div>And, well, I hope this job leaves me some time (and it doesn't eliminate my coding-mood) so I can keep on posting some weird stuff here!</div><div><br /></div><div>And now, I'm going to bed. Gotta get up early to go to the office tomorrow morning :-P</div>]]>
        
    </content>
</entry>

<entry>
    <title>Voronoi video-filling</title>
    <link rel="alternate" type="text/html" href="http://hombrealto.com/blog/2009/11/voronoi-video-filling.html" />
    <id>tag:hombrealto.com,2009:/blog//1.15</id>

    <published>2009-11-18T10:43:06Z</published>
    <updated>2009-11-18T10:51:12Z</updated>

    <summary><![CDATA[After showing the last test to a&nbsp;colleague, he guessed that this method would look really bad if seen in motion, using a random sampling method for the image. He thought that a constant sampling pattern would be needed... and so...]]></summary>
    <author>
        <name>Jon Valdés Furriel</name>
        
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="http://hombrealto.com/blog/">
        <![CDATA[After showing the last test to a&nbsp;colleague, he guessed that this method would look really bad if seen in motion, using a random sampling method for the image. He thought that a constant sampling pattern would be needed... and so I had to test it out.<div><br /></div><div>This is the result, using a video sample from a Fringe episode. I think it's not half-bad when you get past 1000-2000 samples in the image. And it's completely real-time! :-)</div><div><br /></div>

<object width="501" height="376"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=7680865&amp;server=vimeo.com&amp;show_title=0&amp;show_byline=0&amp;show_portrait=0&amp;color=00adef&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=7680865&amp;server=vimeo.com&amp;show_title=0&amp;show_byline=0&amp;show_portrait=0&amp;color=00adef&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="501" height="376"></object>]]>
        
    </content>
</entry>

<entry>
    <title>Voronoi image-filling</title>
    <link rel="alternate" type="text/html" href="http://hombrealto.com/blog/2009/11/voronoi-image-filling.html" />
    <id>tag:hombrealto.com,2009:/blog//1.14</id>

    <published>2009-11-18T01:49:54Z</published>
    <updated>2009-11-18T01:55:06Z</updated>

    <summary>Little test to see what you can do to approximate a big image using only a few samples, and using Voronoi diagrams for that. It&apos;s not amazingly cool or anything, but it runs in real-time :)...</summary>
    <author>
        <name>Jon Valdés Furriel</name>
        
    </author>
    
        <category term="Graphics" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="OpenGL" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://hombrealto.com/blog/">
        <![CDATA[Little test to see what you can do to approximate a big image using only a few samples, and using Voronoi diagrams for that. It's not amazingly cool or anything, but it runs in real-time :)<div><br /></div>

<object width="501" height="376"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=7675420&amp;server=vimeo.com&amp;show_title=0&amp;show_byline=0&amp;show_portrait=0&amp;color=00adef&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=7675420&amp;server=vimeo.com&amp;show_title=0&amp;show_byline=0&amp;show_portrait=0&amp;color=00adef&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="501" height="376"></object>]]>
        
    </content>
</entry>

<entry>
    <title>C++ template instantiation problem (and some solutions)</title>
    <link rel="alternate" type="text/html" href="http://hombrealto.com/blog/2009/11/c-template-instantiation-problem-and-some-solutions.html" />
    <id>tag:hombrealto.com,2009:/blog//1.13</id>

    <published>2009-11-15T22:36:14Z</published>
    <updated>2009-11-16T10:23:12Z</updated>

    <summary>Disclaimer: Really long post. If you&apos;re not interested in C++ templates (or in correcting me when I write about them), don&apos;t read this! Also, if you know more about C++ than I do, please respond with a better solution! Now, these...</summary>
    <author>
        <name>Jon Valdés Furriel</name>
        
    </author>
    
        <category term="C++" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Programming" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://hombrealto.com/blog/">
        <![CDATA[<p><b><i>Disclaimer</i></b><i>: Really long post. If you're not interested in C++ templates (or in correcting me when I write about them), don't read this! Also, if you know more about C++ than I do, please respond with a better solution!</i></p><p>Now, these days I've been reading one of Herb Sutter's wonderful books (Exceptional C++ Style), and one piece of advice he gives is "<i>Where possible, prefer writing functions as nonmembers nonfriends</i>". His arguments seemed pretty solid, so I decided to give it a shot. However, not a day after reading that, I've already found something that keeps me from doing it in certain types of templated code.</p><p>For the small engine I've been writing, I've coded a templated Vec2&lt;T&gt; class. And while writing some code to test it, g++ gave this wonderful error:</p>

<blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><code>
error: no match for 'operator*' in 'geom::Vec2&lt;float&gt;(((const float&amp;)((const float*)(&amp;-1.0e+1f))), ((const float&amp;)((const float*)(&amp;0.0f)))) * ((#'float_expr' not supported by dump_expr#&lt;expression error&gt; * 5.00000000000000010408340855860842566471546888351e-3) + 1.0e+0)'</code></blockquote><blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><code><br /></code></blockquote><p>The line that gave that error is this one:</p>

<blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><code>geom::Vec2&lt;float&gt; v = geom::Vec2&lt;float&gt;(-10,0) * (1+rand()%10*0.005); </code></blockquote><blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><code><br /></code></blockquote> 

<p>The problem (leaving aside some really weird things about float_expr and dump_expr) seems to be that the compiler can't find operator*, even though I did write one. So, what's happening?</p>

<p>Let's see. operator* is defined as a nonmember function, like this:</p>

<blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><code><span class="Apple-style-span" style="font-family: monospace; ">
template &lt;typename T&gt; const Vec2&lt;T&gt; operator*(const Vec2&lt;T&gt; &amp;lhs, T rhs)&nbsp;</span></code></blockquote><blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><code><span class="Apple-style-span" style="font-family: monospace; ">{&nbsp;</span></code></blockquote><blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><code><span class="Apple-style-span" style="font-family: monospace; ">&nbsp;&nbsp; &nbsp;return Vec2&lt;T&gt;(lhs.x*rhs,lhs.y*rhs);&nbsp;</span></code></blockquote><blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><code><span class="Apple-style-span" style="font-family: monospace; ">}</span></code></blockquote><blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><code><span class="Apple-style-span" style="font-family: monospace; "><br /></span></code></blockquote><p>And if we look closely at the line where it fails, we see it's a multiplication just like this one:</p><blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><p><code>Vec2&lt;float&gt;(10,0) * 0.005</code></p></blockquote>

<p>See the problem now?</p><p>The compiler, when it instantiates the <code>operator*</code> template, has to deduce the type T to instantiate it with, but finds none that fits exactly. It sees a call to <code>operator*(Vec2&lt;float&gt;, double)</code>, but there's only <code>operator*(Vec2&lt;T&gt;,T)</code> defined: the first argument says T is float, the second says T is double, so it just sighs and proclaims "What the hell do you want me to do with this?".</p><p>In fact, what we probably want it to do is to convert that double to a float, and then choose the float version of <code>operator*</code>. However, the compiler won't do that: implicit conversions such as double to float are not considered while template arguments are being deduced, so template argument deduction and automatic type conversion don't mix very well (I read something about it in one of Sutter's books, but I don't remember the exact details). So what can we do?&nbsp;</p><p><br /></p>
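<p>The failure can be reproduced in a few lines. This is only a sketch: the <code>Vec2</code> below is a stripped-down stand-in for the engine's class, and <code>scale_demo</code> is a made-up name for the example.</p>

```cpp
#include <cassert>

// Stripped-down stand-in for the post's Vec2<T> (the real class has more in it).
template <typename T>
struct Vec2
{
    T x, y;
    Vec2(T x_, T y_) : x(x_), y(y_) {}
};

// Same signature as the operator* shown above: one T for both arguments.
template <typename T>
const Vec2<T> operator*(const Vec2<T> &lhs, T rhs)
{
    return Vec2<T>(lhs.x * rhs, lhs.y * rhs);
}

// Hypothetical demo function.
Vec2<float> scale_demo()
{
    // Compiles: both arguments agree, so T is deduced as float.
    Vec2<float> v = Vec2<float>(-10, 0) * 0.5f;

    // Does not compile: Vec2<float> says T = float, the literal 0.5 says T = double.
    // Vec2<float> w = Vec2<float>(-10, 0) * 0.5;

    return v;
}
```

<p>Uncommenting the second multiplication should bring back a "no match for operator*" error similar to the one above.</p>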

<p><font class="Apple-style-span" style="font-size: 1.25em; "><font class="Apple-style-span" style="font-size: 1.25em; ">Option 1 (bad)</font></font></p>

<p>One option would be to give the compiler a little nudge (well, not so little really) so it chooses the correct template instantiation. For example, this would compile cleanly:</p>

<blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><code>geom::Vec2&lt;float&gt; v = geom::Vec2&lt;float&gt;(-10,0) * (float)(1+rand()%10*0.005);</code></blockquote><blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><code><br /></code></blockquote>

<p>However, it's not very polite of us, as utility class programmers, to deny the class user the option to multiply a float vector by a double scalar. So what other option is there?&nbsp;</p><p><span class="Apple-style-span" style="font-size: 20px; "><br /></span></p><p><span class="Apple-style-span" style="font-size: 20px; ">Option 2 (better?)</span></p><p>Another option would be to make <code>operator*</code> a template with 2 type parameters, one for each side of the operation. Like this:</p>

<blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><span class="Apple-style-span" style="font-family: monospace; ">
template &lt;typename T,typename Y&gt;&nbsp;</span></blockquote><blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><span class="Apple-style-span" style="font-family: monospace; ">const Vec2&lt;T&gt; operator*(const Vec2&lt;T&gt; &amp;lhs, Y rhs)<br />
{<br />&nbsp;&nbsp; &nbsp;return Vec2&lt;T&gt;(lhs.x*rhs,lhs.y*rhs);<br />
}</span></blockquote><blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><span class="Apple-style-span" style="font-family: monospace; "><br /></span></blockquote>
<p>This would work pretty well.&nbsp;</p><p>As a side note, we would have problems if we tried to do multiplications with the scalar on the left side, like:&nbsp;</p><blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><code><p>Vec2&lt;float&gt; v = 2.0 * Vec2&lt;float&gt;(10,0);</p></code></blockquote>

<p>So we would have to define the swapped version too:</p>
<blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><code>
template &lt;typename T,typename Y&gt;&nbsp;</code></blockquote><blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><code>const Vec2&lt;T&gt; operator*(Y lhs, const Vec2&lt;T&gt; &amp;rhs)<br />
{<br />&nbsp;&nbsp; &nbsp;return Vec2&lt;T&gt;(rhs.x*lhs,rhs.y*lhs); <br />
}</code></blockquote><blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><code><br /></code></blockquote><p>The problem with this approach is that every different scalar type we use will instantiate another version of the code. For this particular function it won't really matter because the compiler will probably inline it anyway. But for more complicated functions it will just instantiate another full version of the function, creating a serious case of template bloat. For example (assuming they are not inlined), each of these expressions would instantiate another version of <code>operator*</code>:</p>

<blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><code>
Vec2&lt;float&gt; v1 = Vec2&lt;float&gt;(10,0) * 65;&nbsp;</code></blockquote><blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><code>Vec2&lt;float&gt; v2 = Vec2&lt;float&gt;(10,0) * 65.0;&nbsp;</code></blockquote><blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><code>Vec2&lt;float&gt; v3 = Vec2&lt;float&gt;(10,0) * 65.0f;&nbsp;</code></blockquote><blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><code>Vec2&lt;float&gt; v4 = Vec2&lt;float&gt;(10,0) * 'a';</code></blockquote>
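<p>One rough way to see this from inside the program, as a sketch only: the <code>seen_scalar_types</code> registry and the <code>count_scalar_types</code> function below are hypothetical bookkeeping added for the demonstration, and <code>Vec2</code> is again an assumed minimal version. Each call records the scalar type <code>Y</code> it was instantiated with:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <typeindex>
#include <typeinfo>

// Assumed minimal Vec2, only what the example needs.
template <typename T>
struct Vec2 {
    T x, y;
    Vec2(T x_, T y_) : x(x_), y(y_) {}
};

// Hypothetical bookkeeping: remembers every scalar type Y that
// operator* has been called with.
inline std::set<std::type_index>& seen_scalar_types() {
    static std::set<std::type_index> seen;
    return seen;
}

// Two-typename operator from Option 2, plus the bookkeeping line.
template <typename T, typename Y>
const Vec2<T> operator*(const Vec2<T>& lhs, Y rhs) {
    seen_scalar_types().insert(std::type_index(typeid(Y)));
    return Vec2<T>(lhs.x * rhs, lhs.y * rhs);
}

// Evaluates the four expressions from the text and returns how many
// distinct scalar types were recorded.
inline std::size_t count_scalar_types() {
    Vec2<float> v1 = Vec2<float>(10, 0) * 65;     // Y = int
    Vec2<float> v2 = Vec2<float>(10, 0) * 65.0;   // Y = double
    Vec2<float> v3 = Vec2<float>(10, 0) * 65.0f;  // Y = float
    Vec2<float> v4 = Vec2<float>(10, 0) * 'a';    // Y = char
    (void)v1; (void)v2; (void)v3; (void)v4;
    return seen_scalar_types().size();
}
```

<p>Here <code>count_scalar_types()</code> reports four distinct types, one per expression, which matches one instantiation of <code>operator*</code> for each of them.</p>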

<p><br /></p><p>In fact, if we compile those lines with the <code>-ggdb</code> flag, and then inspect them with gdb, we'll see all the instantiations appear:</p>
<blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><code>(gdb) info function geom::operator*</code><p><code>All functions matching regular expression "geom::operator":</code></p><p><code>File test.cpp:</code></p><p><code>const geom::Vec2&lt;float&gt; geom::Vec2&lt;float&gt; const geom::operator*&lt;float, char&gt;(geom::Vec2&lt;float&gt; const&amp;, char);</code></p><p><code>const geom::Vec2&lt;float&gt; geom::Vec2&lt;float&gt; const geom::operator*&lt;float, double&gt;(geom::Vec2&lt;float&gt; const&amp;, double);</code></p><p><code>const geom::Vec2&lt;float&gt; geom::Vec2&lt;float&gt; const geom::operator*&lt;float, float&gt;(geom::Vec2&lt;float&gt; const&amp;, float);</code></p><p><code>const geom::Vec2&lt;float&gt; geom::Vec2&lt;float&gt; const geom::operator*&lt;float, int&gt;(geom::Vec2&lt;float&gt; const&amp;, int);</code></p><p><code><br /></code></p></blockquote><p>I know this is an&nbsp;exaggerated&nbsp;example, but template bloat (excessive template instantiation) can be a real problem. So, in the end, I'm not sure this option is a very good idea.</p><p><br /></p><p><font class="Apple-style-span" style="font-size: 1.25em; "><font class="Apple-style-span" style="font-size: 1.25em; "><font class="Apple-style-span" style="font-size: 1.25em; "><font class="Apple-style-span" style="font-size: 0.8em; ">Option 3 (almost good?)</font></font></font></font></p><p>Another option (even though we wanted to avoid it from the start to follow the advice of Mr Sutter) is a member method. As simple as this:</p>

<blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><code>template &lt;typename T&gt; struct Vec2<br /></code><code>
{<br /></code><code>&nbsp;&nbsp; &nbsp;...<br /></code><code>&nbsp;&nbsp; &nbsp;Vec2&lt;T&gt; operator*(T scalar) const<br /></code><code>&nbsp;&nbsp; &nbsp;{<br /></code><code>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp;return Vec2&lt;T&gt;(x*scalar,y*scalar);<br /></code><code>&nbsp;&nbsp; &nbsp;}<br /></code><code>
};</code></blockquote><div><br /></div><div>This method would have no problem instantiating, as there's only one possible <code>T</code> it can accept, and it's clear that we want to convert the type of <code>scalar</code> to that type <code>T</code>. And for the same reason, it would only instantiate once (again, assuming the compiler won't inline it because of size or something).</div><div><br /><p>But we would still have the problem with the swapped version. If we try to do a multiplication like <code>2.0*Vec2&lt;float&gt;(10,0)</code>, we'll get a nice compiler error. What can we do about that?</p><p>The best way I've found is to do this:</p>

</div><blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><div><code>template &lt;typename T, typename Y&gt;&nbsp;</code></div><div><code>const Vec2&lt;T&gt; operator*(Y lhs, const Vec2&lt;T&gt; &amp;rhs)</code></div><div><code>
{</code></div><div><code>&nbsp;&nbsp; &nbsp;return rhs*lhs;</code></div><div><code>
}</code></div></blockquote><div><br /></div><div>That is, create a nonmember swapped version with two typenames, and make it just call the straight member one. Hopefully the compiler will inline it (it's a small function after all) and we'll avoid the associated template bloat I talked about previously.</div><div><br /></div><div>So, in the straight version we've got no template bloat, and in the swapped one we will most likely avoid it too, thanks to the compiler's inlining capabilities. We would, however, be forced to use member methods and ignore Sutter's advice this time.<p><br /></p><p>And I've already run out of ideas to get this to work properly. If anyone knows of a more elegant way to do it, or has found an error in code, logic, technique, grammar or whatever, <b>please</b> <b>leave a comment</b> :-)</p><p><br /></p><p></p>
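<p>For reference, Option 3 put together might look like the sketch below. The <code>Vec2</code> members and constructor are assumptions (the real class has more in it); the two operators are the ones discussed above:</p>

```cpp
#include <cassert>

namespace geom {

// Assumed minimal Vec2, only what the example needs.
template <typename T>
struct Vec2 {
    T x, y;
    Vec2(T x_, T y_) : x(x_), y(y_) {}

    // Straight version (vector * scalar) as a member: T is already fixed
    // by the class, so the scalar is simply converted to T at the call.
    Vec2<T> operator*(T scalar) const {
        return Vec2<T>(x * scalar, y * scalar);
    }
};

// Swapped version (scalar * vector): a thin nonmember forwarder with two
// typenames that just calls the member operator.
template <typename T, typename Y>
const Vec2<T> operator*(Y lhs, const Vec2<T>& rhs) {
    return rhs * lhs;
}

} // namespace geom
```

<p>With this in place, both <code>geom::Vec2&lt;float&gt;(10,0) * 2.0</code> and <code>2.0 * geom::Vec2&lt;float&gt;(10,0)</code> compile, and both convert the double scalar to float before multiplying.</p>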
</div>]]>
        
    </content>
</entry>

<entry>
    <title>Time for some OpenGL bashing</title>
    <link rel="alternate" type="text/html" href="http://hombrealto.com/blog/2009/11/time-for-some-opengl-bashing.html" />
    <id>tag:hombrealto.com,2009:/blog//1.12</id>

    <published>2009-11-11T18:06:19Z</published>
    <updated>2009-11-11T19:06:44Z</updated>

    <summary><![CDATA[Yes, I know, I'm a bit late to the game, but today (after 8 years of using OpenGL exclusively) I've just gotten the small rendering engine I'm working on to use DirectX 9.&nbsp;Why DX9 and not 10 (or 11)? Because...]]></summary>
    <author>
        <name>Jon Valdés Furriel</name>
        
    </author>
    
        <category term="DirectX" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="OpenGL" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://hombrealto.com/blog/">
        <![CDATA[<p>Yes, I know, I'm a bit late to the game, but today (after 8 years of using OpenGL exclusively) I've just gotten the small rendering engine I'm working on to use DirectX 9.&nbsp;</p><div>Why DX9 and not 10 (or 11)? Because I need it to run on computers with WinXP (could Microsoft have been more stupid, limiting DirectX like that?)<div><br /></div><div>Anyway, the engine was working on MacOS and Linux, but I needed to be sure to support DX as well (OpenGL drivers for Windows are known to be buggy as hell and fairly incomplete in some cases). So here we are, coding using a proprietary Microsoft library. Yes, I know I'm unclean and un-free, but such is life (and for what it's worth, I'm still using bash, g++ and vim, even on Windows ;)</div><div><br /></div><div>The thing is, after using DirectX for 2 days, I wish OpenGL was more like it in some ways.&nbsp;</div><div><br /></div><div>Small example. First, OpenGL code to render from a vertex buffer object:</div><div><br /></div><div><b>glBindBuffer(GL_ARRAY_BUFFER, m_dataBuffer);</b></div><div><b>glEnableClientState(GL_VERTEX_ARRAY);</b></div><div><b>glEnableClientState(GL_COLOR_ARRAY);</b></div><div><b>glColorPointer(4, GL_UNSIGNED_BYTE, sizeof(VertexData), (char*)NULL+sizeof(uint8_t)*3);</b></div><div><b>glVertexPointer(3, GL_FLOAT, sizeof(VertexData), 0);</b></div><div><b>glDrawArrays(GL_TRIANGLES, startVertex, endVertex-startVertex);</b></div><div><br /></div><div>And now, equivalent DirectX9 code:</div><div><br /></div><div><b>m_d3dDevice-&gt;SetFVF(D3DFVF_XYZ | D3DFVF_DIFFUSE);</b></div><div><b>m_d3dDevice-&gt;SetStreamSource( 0, m_dataBuffer, 0, sizeof(VertexData) );</b></div><div><b>m_d3dDevice-&gt;DrawPrimitive( D3DPT_TRIANGLELIST, startVertex, (endVertex-startVertex)/3);</b></div><div><br /></div><div>Those 2 blocks of code do exactly the same thing: take information from a buffer about a bunch of vertices (vertex position and color), and use that info to render some triangles. Just that.</div><div><br /></div><div>But see the difference in the code? Now choose the cleaner, saner API. I don't think many will choose OpenGL. And that's a shame.</div><div><br /></div><div>Sometimes I hope for a clean API redesign for OpenGL that does away with all that crap. ...sigh...</div><div><br /></div><div>I highly recommend&nbsp;<a href="http://www.khronos.org/developers/library/2008_siggraph_asia/SA2008_Modern_OpenGL.zip">this session (page 194)</a>&nbsp;(btw, thanks to shash for that link) where an NVidia engineer talks about problems the OpenGL API (and DirectX's too) causes for application portability.&nbsp;</div></div><p></p>]]>
        
    </content>
</entry>

<entry>
    <title>New blog, and new video</title>
    <link rel="alternate" type="text/html" href="http://hombrealto.com/blog/2009/11/new-blog-and-new-video.html" />
    <id>tag:hombrealto.com,2009:/blog//1.10</id>

    <published>2009-11-06T10:37:50Z</published>
    <updated>2009-11-16T10:24:02Z</updated>

    <summary><![CDATA[Hi there! New blog (again), I hope this one lives for more than 2 years...&nbsp;This last year I got really tired of Drupal, and decided to use something else and create another blog. In the end (after some googling), I...]]></summary>
    <author>
        <name>Jon Valdés Furriel</name>
        
    </author>
    
        <category term="Blog" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="C++" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Graphics" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Programming" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://hombrealto.com/blog/">
        <![CDATA[<div>Hi there! New blog (again), I hope this one lives for more than 2 years...&nbsp;</div><div><br /></div>This last year I got really tired of Drupal, and decided to use something else and create another blog. In the end (after some googling), I decided to go with Movable Type. It's much more organized and stable than Drupal, and has almost no CPU cost for each page view (pages are generated just once as static HTML, instead of each time a user views the page, like Drupal does). I hope it doesn't give me as many headaches as the old website did...<div><br /></div><div>Also, I've decided to write in English from now on. Almost everyone who read the previous blog knows English, so I won't be losing lots of readers (except my mother. Sorry, mom). And more importantly, this way I get to practice my written English, which has always been my&nbsp;weakest skill and was getting weaker by the moment.</div><div><br /></div><div>And about the page itself, I'm not entirely convinced by the page design (it's just a default theme with a changed header), but it'll have to do for the moment. My design skills aren't really the best, and my designer friends are all too busy to work for free (or in exchange for a couple of beers).</div><div><br /></div><div>Anyway, to start filling this new website with something useful, I've uploaded a new video showing the output of my software raytracer (anyone remember it?). It's just that I recently upgraded to a quad core machine, and had to test the machine with a heavily multithreaded program :)</div><div><br /></div><div>So, here's the video (it's in full HD, so hit the fullscreen button!):</div><div><br /></div>

<object width="501" height="282"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=7468446&amp;server=vimeo.com&amp;show_title=0&amp;show_byline=0&amp;show_portrait=0&amp;color=00adef&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=7468446&amp;server=vimeo.com&amp;show_title=0&amp;show_byline=0&amp;show_portrait=0&amp;color=00adef&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="501" height="282"></object>
<br />
<br />
<div>The video is also linked in the Software Raytracer page in the Projects section, by the way ;-)</div>]]>
        
    </content>
</entry>

</feed>
