Search this blog

27 May, 2008

Collecting garbage

Our is one of the most fast advancing fields in computer science. Still there are some misconceptions that are hard to be broken. One of those is about garbage collection. Most game programmers hate it. Still most game programmers work in frameworks that do at least reference counting (very basic one, most of them are not thread safe, nor able to handle cyclic references correctly), and many times, when two reference counted systems have to be bridged, we design our own, bad, slow implementations of garbage collection.

The best examples of garbage collectors are the ones built into java and c# virtual machines, those systems manage to be faster than explicit allocators on server workloads. They are tested, they have excellent performances. Why most people still think that garbage collection is slow?

Well I think that has various reasons behind it. First of all, of course, is that early GC implementations, conservative and stop-the-world, were not as refined as current ones. Then probably there's a misconception that is really related to desktop java programs. Java used to suck for such tasks, mostly due to slow graphics and slow IO, that were never related to the GC itself. Last, but not least (at all) is that GC on desktop systems have to live with other programs, and the OS itself, that does not understanda GC concepts. That usually means that the GC will allocate a huge heap that they will manage themselves, thus eating up (or seeming to eat) much memory for even the most trivial hello world program.

Moreover for typical game systems, we don't rely on heap allocators as well, for performance critical code. In game, we usually want to have close to no allocations/deallocations, we want to be in complete control of the memory budgets for each subsystem, and most of the times, we just craft a bunch of ad-hoc class pools were memory management is a performance problem. Class pools can be done in a GC system as well, so it shouldn't really matter if we use GC or manual allocation, at all.

But first of all, why should we strieve for a GC world? Why should we like it, were are the advantages?

Many people think that a key advantage is that GC ensure not to leak memory. What a small feature! In any decent C++ code you will have your own debug memory allocator that makes tracking and slashing leak bugs a trivial task. Also, while you can't leak memory, you could leak other resources, and it also makes RAII more difficoult to implement, as usually you don't know when your destructors are going to be called!
GC does not help with other memory related bugs, like memory stomping, that's a language feature (not allowing out-of-bounds access and pointer arithmethic), it's true that such a feature is usually available in languages that use GC (because it makes the design of the collector much easier, otherwise, only conservative ones could be engineered), but still its not strictly a GC feature. GC do protect you from double deletes and using accessing deleted memory, still those bugs are not too hard to detect for a debugging allocator.

The real benefit is not that. It's about composibility. Simply, manual allocation does not compose (in a somewhat similar way to explicit locking). If you allocate a class, and then pass it to another subsystem, or you have to completelly transfer its ownership (and still that should be clearly documented, it's not the standard policy), and forget about it (don't store a pointer to it), or you have to manage the subsystem as well, making sure it will not use that class after you destroy it.
That's why we prefer to use reference counting. Garbage collection is only a usually more powerful, and faster, implementation of that idea, that is, decoupling memory managment from the rest of the code, to make composition easier.

There is also another benefit, knowing where all the pointers are (non conservative GC) also lets you move classes around in memory (moving GC). That's why GC usually have much lower fragmentation than the best non-fragmenting heap allocators. and it's another big win as fragmentation is usually a nasty problem. That also lets the system improve cache and page friendliness (this is done automatically by most GC, but noone prevents you to implement a system were you can expilictly hint about a given memory ordering in a moving GC) and it's one of the key features that let some GC systems to be faster than manual heap allocators, heap allocations are not simple as they might seem...

Update: an interesting read here and here
Update: I know that there are people that believe that manual allocation is still better because you can code a reference counting system or even a (usually incredibly bad) GC system for it. So you get more choice, and more power. That's a typical C++ mantra, sung without any real world experience. The problem is that as manual allocation is bad, everyone will need at least RC, so everyone will code its own system, that's because they have the power to choose, and they will choose, in different ways. Different and incompatible. Different and incompatible ways that will eventually meet and produce tons of slow, bad, ugly code in order to let them talk each other. That's it, typical C++ code.

26 May, 2008

Artist-coders iteration

  1. Artists gather references
  2. Coders gather knowledge
  3. Artists make prototypes - visual targets in DCC applications
  4. Coders find a procedural way to create the same effect
  5. Artists iterate with the existing technology
  6. If existing tech is not enough, coders improve it, and we return to point 5
Everything trying always to communicate, and work together. Most coders think that artists should be limited or constrained so they can't end up using too many resources or doing too crazy things.
That's completely wrong, the real key is to always explain how things work, what are the limits, and to make tools that can easily and interactively profile the cost of a scene while it's being worked.
Artists have the great gift to find new and incredible ways to tweak parameters to make our technology reach limits that we did not even foresee. They will usually try only to achieve a given visual target, but they are more than able to optimize on performaces as well, if we make easy for them to have a feedback about that.

On the other hand many times artist won't trust coders as well. They will always ask to have more control, just in case they need to tweak something. Usually they will be more happy to achieve an effect using textures and vertices, stuff they can manipulate, instead of algorithms and procedures. They have to learn that some effects are cheaper and way better and easier to do procedurally, instead of craft them by hand.

Cross-contamination is fundamental. Artists that know how to write scripts and shaders will find that many tasks are incredibly easier to do with coding. Coders that know 3d and 2d applications will find that many algorithms are easy to visually prototype in those applications. We really do need coder-artists and artist-coders. Procedural art is the only way we can author such huge amounts of contents that are required for next-gen games, and are also our best bet of rendering all that with decent performances (as ALU is cheaper than data access).

25 May, 2008

Devil is in the details

I'm not even halfway through "the exorcism of Emily Rose" so I still have to write. From some comments I've noticed that I wrote too much, too fast in my post about the rendering equation, so this is a (hopefully better, even if I doubt it, as I'm watching those scary movies) rewrite of its second part.

One of the things I'm confident of is that wathever models we provide the artits with, they are capable of tweaking them in unpredictable ways in order to make them fit the idea they want to express. That's great, but has two problems. The first one is that such tweaking could end up with suboptimal use of our precious, scarce, computing resources. The second, somewhat related to the other, is that bad models could be too complicated to fit, to find parameters that achieve the desidered look.

So what has the good rendering engineer to do in order to avoid those problems? To me the main thing is to always work together with the artists (a good idea is to look at what they're trying to do, ask them to make prototypes with their DCC tools, and then see how we can express in a more procedural way their art), known what they need, know the physics, and base our models both on artists needs and good maths. Good maths do not mean correct physics, we are far from that in realtime rendering, but reasonable physics, models that are based on solid ideas.

Now a simple example of how things can go wrong, it's something related to a small problem I had at work.

A nice trick, known since the software rendering times, is to simulate the specular reflections on a curved object by assuming that each point in that object sees the same scene. It's the basic environment mapping technique that has been around for years. We project an image of the surrounding environment on an infinite sphere or cube around the model, save it into a texture, and index that texture with the per-pixel reflection vector (camera to point vector reflected with the point normal). We can do it in realtime as well, and we usually do, i.e. in racing games.

That's intresting because we're shading the object by considering the light reflected from other objects towards it, so it's a rare example of simulating indirect global illumination (indirect specular, that's quite easy compared to indirect diffuse or worse, glossy).

But what is, that texturemap, encoding? Well, it's a spherical function, something that is indexed by a direction vector. It's an approximation of the spherical function that encodes the incoming radiance, the incoming light that scene objects are transmitting towards the point we want to shade.
Now for a purely specular mirror we have to notice that its BRDF is a Dirac puse, it's a spherical function that is one only along the reflection vector, and zero everywere else. That BRDF can be encoded in a two dimensional function (in general, BRDFs are four dimensional, i.e. they need two direction vectors, the incoming and outgoing one).

What happens if we convolve that BRDF with our approximated incoming light function, as the rendering equation does in order to compute the total incoming energy that is going to be scattered towards the view direction? Well we have a function that's zero everywhere and that the envmap texture value only along the reflection direction. That's exactly the same as taking only that sample from the envmap in the first place. So our envmapping algorithm is a reasonable approximation for a purely specular part of our material. Easy!

Now another thing that was easily discovered in those mystical early days is that if you replace your spherical reflection image with an image that encodes a phong lobe, you get a cheap way of doing phong shading (cheap when memory access was not so expensive compared to ALU instructions).
Why do that work? It does because what we're encoding in the envmap, for each direction is the convolution of the BRDF with the lighting function. In that case we are considering the light function as a Dirac impulse (a single point light), and convolving it with a phong lobe. Convolving something with a Dirac again results in an unchanged function, so we're storing the phong lobe in our texture map, and as Phong specular reflection model can be reduced to a bidimensional function, that precomputation works.

But we can think to be smarter, and not use Dirac impulses. We can take an arbitrary light configuration, convolve it with our specular model, index it with the reflection vector, and voilà, we have (an approximation of) the specular part of our shading. If we do the same, convolving this time the light function with a cosine lobe (Lambert model), and index that with the normal vector, we get the diffuse part as well.

This is a clever trick, that we use a lot nowdays, in some way it's the same thing we do with spherical harmonics too (SH are another way of storing spherical functions, they're really intresting but that's the subject for another post). You can use a cubemap indexed with the surface normal for the diffuse term and another indexed with the reflection vector for the glossy one. But care has to be taken when computing those cubemaps. They have to be the light function convolved with the term of the local lighting model we're considering, as we just said!

What is usually done instead is for the artists to use the gaussian blur in photoshop, or, if the cubemaps are generated in realtime, for the renderer to use a separable gaussian filter (as gaussians are the only circular filters that are separable).
But a gaussian is not a cos lobe nor a phong one! And I seriously doubt that artists are going to find a gaussian that is a good approximation of those as well. And even if they do that, filtering the cubemap faces is not the same as applying a convolution to the spherical function the cubemap is representing (the equivalent convolution will be distorted towards the vertices of the cubemap, as a cubemap has not the same topology of a sphere, its texels are not equally distant when projected on the sphere, so we have to consider that when applying our filter!).
Moreover, we can't have a pair of cubemaps for each different specular function, so we have to choose a convolution size and hack some kind of exponential AFTER the cubemap access!
That led to complains about the inability of finding the right blur sizes to achieve the correct look, that wasn't a failure of the cubemaps, but of how we were using them. That incorrect model could not be made to fit the look we wanted. In theory, that was a good approximation for diffuse and phong shading arbitrary light sources, in practice, the implementation details made our approximation different from what we were thinking, and in the end, bad.

Update: HDRShop is capable of doing the right convolutions on cubemaps, as I described there!

Best tech for next-gen games

I've got a new projector, here at home, so I'm trying to watch some horror movies. I have "the exorcism of Emily Rose" and "the Blair witch project 2", both of them I've already seen. Problem is that I'm really scared by horrors (that's why I watch them) so I used to always watch them with a friend of mine, when I was in Naples. Shame is that I can't call her now, so I'll try to distract myself from the movie by writing a couple of posts.

What's the best tech, the most useful piece of technology we can code for a next-gen game. Well, some will say things like realtime shadows, ambient occlusion, HDR, DOF, motion blur, rigid body physics etc... Very interesting and exciting stuff.

Stuff that requres time, to be done properly. And who knows what other interesting stuff we will invent, in the following years... Does all that really matter?

I don't think so, no, it does not matter too much. I've seen incredible games being made with relatively low tech engines an example being GT5. The truth is that we're not limited by the technology, we're limited by the time (and money). Next-gen projects take so much effort in order to be done that we're not really using properly even well known algorithms and shaders.
Designers and artists need time, and iterations, in order to fully use what we give them. Giving them more time, faster iterations, that will make the difference. And the same applies to coders as well.

The key to the success of a nextgen game is the iteration time. What's the key to lowering iteration times? Data-driven design. Databases. Databases are the best tech for next-generation games. Tools matter. More than anything else.

Some effort has been made, it's not rare to have some kind of scripting, probably an in-game console, the ability to tweak some parameters, probably to save them too. You will be really lucky if you are working with an engine that lets you hotswap shaders. Even more lucky if you can hotswap meshes and textures, after processing them with your asset pipeline.

But usually even if you have all those commodities, what I still have to see is an integrated system, decoupling code from data, dividing data in its most basic components and relation between components (i.e. meshes, shaders, textures, material parameters, are separate, but related, entities), and integrating with version control, build, DCC applications, networked update (hotswapping), reflection of code structures into data and code generation of data into code structures.

I would like to be able to launch the game, drop a track, make a model of a car with a DCC application, "build" it, load into the game, attach the car to the AI, tweak its material parameters, and save the resulting configuration into the version control system. That would really be a technological breaktrough.

It would be a dream come true if I could even hotswap code, not only tweaking scripts, but actually having a dynamic linking system... On some platforms this is not allowed in a shipping game, but for development is doable...

P.P.S. Data driven does NOT mean overengineered. You should not reduce everything to single system, small, specialized systems are fundamental, procedural models, ad-hoc algorithms, are often better than collating together pre-existing generic pieces. The data-driven infastructure
should also make easier to write a new subsystem, and to integrate it with the tools. Generalizations should be made only if they save work, and only if and when they are needed, not apriori.

19 May, 2008

Reflection saved the day

An easy question: can fx composer 2 render to a given mipmap level (surface) of a rendertarget texture instead of using only the first one?

Ask this question on NVidia developer forums. Wait. Hope. No reply... Well, never fear! Let's just download the .NET Reflector and try to read the code by ourselves. After a little bit of wandering around (wow, they have quite a few assemblies!), we find FXComposer.Scene.Render.Direct3D.dll, that's a really nice starting point. Class FXRenderDevice_D3D, we're getting closer, SetRenderTarget method, cool, let's disassemble it and we find:

using (Surface surf = target.GetSurfaceLevel(0))

Unfortunately that makes me pretty sure that the answer is no, nope, you can't, doesn't matter if you use SAS or ColladaFX, better luck next time.

Reflection is great. You can use it to gather all kinds of information on your code, or to dynamically create it as well (via Reflection.Emit or CodeDOM, the latter requires the c# compiler). Cool stuff.

P.S. Shame on you NVidia I really do badly need to be able to render to a mipmap level and everyone involved with shadowmaps and/or posteffects (follow that link!) probably have the same needs as well! I'm really disappointed by the new FX composer, you added a lot of functionality but for coders, it's as good or worse than the old one. Also I don't believe that most studios do need all those scene/animation capabilities just because usually, when you have to author a material shader, you test it directly in yor DCC application anyway...

P.P.S. The next step, for the hacker inclinded guy, would be to (if assemblies are not signed) use a debugger to find where the SAS scripting calls the setrendertarget, modify it to parse a surface number, and pass it into the texture that will be set. Everything should be doable within .Net Reflector via the Reflexil and Deblector addins!

13 May, 2008

Give your matrices the right shape

Maths have been around for a while. A bit more than rendering (well to tell the whole truth, matrices are kinda "new", even if the ideas weren't, the formalization is not one of the oldests topics in mathematics, still they're way older than rendering :D). So please, respect them. MAKE YOUR MATRICES COLUMN MAJOR. There's NO other option, you can't choose, it's NOT up to you. If you're happy with that you can store four row vectors in your matrix or four column ones, that's fine, the order in memory is an implementation detail that's up to you. But basis vectors are NOT in the rows, your GetX() should not be the same as GetRow(0), GetTranslation() returns your last COLUMN, not your last row. Well let's not talk about the code I had to debug today, involving a matrix library that had them all (and wrong) GetRow(), GetXAxis(), GetX(), XAxis(), X(), Up() and GetXColumn()!!!

Chris Hecker wrote a fairly detailed article on why maths is done the way it is.

Last but not least, if your matrix is stored per-rows in memory but logically column-major has the nice advantage of being displayed correctly in your debugger, and more importantly, letting you create a 3x4 (affine) matrix with 3 vector4s that's way better than 4 vector3 (of a per-row row-major or of a per-column column-major*) as most of the times your vector3, to be SIMD friendly, will be vector4 anyway (due to alignment), and also the shaders love to use a float4[3] constant array much more than a float3[4]...

P.S. Cross-product does follow the RIGHT HAND RULE

* note: of course those two are in fact EXACTLY the same, and that's why OpenGL is column-major in a per-column order, so it's 100% compatible with all crazy libraries that are per-row in memory and logically row-major...

10 May, 2008

The rendering equation in the real(time) world

So, I've just submitted the first draft of my ShaderX article (and I don't like it), I'm watching "a beautiful mind" and I've been banned from facebook, tomorrow I don't have to work (Victoria day or something like that) so it seemed to be the perfect time for writing this post...

My motivation in writing this article is because I do believe that we should root our work in the correct maths, and in the correct physics. That's not because hacks do not have a value, they do. It's because if we don't know what we're doing, we're giving artists models that simply can't achieve the result they're aiming for, or that are too complicated. In the end everything is about that, we provide models, artists optimize parameters to fit our parametric models to a given look. Unfortunately, providing the "right" parameters is not as easy as in the context of automatic optimization, we strieve not only for a minimun number of orthogonal ones, but for something that is actually understoodable by humans. Still doing things wrong surely won't help, as the whole linear lighting thing demonstrated.

All the work we do as rendering enginners is rooted in one, single equation that describes how light propagates in the world, under a simple model of light that's well enough for our visualization purposes (geometric optics). This equation is appropriately called the rendering equation, and was first derived by Jim Kajiya (here).

I won't delve much into the details (a very good read is the Eric Veach
thesis). To make that long story short(er) (and more suitable to our needs), shading a visible point on a surface amounts to computing how much light that surface scatters from the light sources back to our camera plane.
That amount is the product of three functions:
  1. The local lighting model, that's a function that depends on the incoming ray of light direction and the outgoing direction we're considering, it usually varies from point to point on a surface (by changing the input parameters to a model for a given material, i.e. using texturemaps). This function is called the BRDF (bidirectional reflectance distribution). Note: for convenience, in the following I'll assume in the lighting model also the integration of the cosine angle term of the rendering equation.
  2. The visibility function, in other words, for each considered light source direction (as seen from the shaded point) we need to know if that light source is blocked or not by other geometries in the scene (this is the non-local part as it involves interactions with other surfaces other than the one being shaded)
  3. The incoming light value.
We take those three functions, multiply them together, and gather for each direction the amount of lighting scattered back to the camera. If we consider only direct lighting (i.e. the direct effect of the lights on the surfaces, not considering that the amount of lighting not scattered into the camera actually bounces around the environment, acting as other, indirect light sources) that amounts to, for each light, to take the light intensity, scale it by the occlusion (that's a binary function) and by the BRDF.
Or, if we see those functions as hemispherical maps (by fixing a given outcoming direction for the BRDF) then what we're doing is multiplying the maps together and computing the integral of the result (and this is exactly what Spherical Harmonic Lighting methods do, read this and this one).

In our current rendering work, what we do is to further (usually) split that integral into three parts, corresponding to different frequencies of lighting (an interesting read related to that is the presentation of the Halo 3 material and lighting system):
  1. The ambient term, that's the diffuse (Lambertian) response of the material to a light that's everywhere in the environment and emits in every direction with a constant intensity. Note that the sum of the correct visibility function for that light exactly is what is called ambient occlusion.
  2. The diffuse term, the part of the material response that is view-independent. It's the Lambert model of light scattering, its BRDF is constant, only the cosine term is considered.
  3. The specular term, responsible for the glossy highlights, like the Phong or the Blinn model.
Note: the sum of those three terms should give you a spherical function that integrates at most to one, otherwise you have a material that's not scattering light, but emitting it!
Computing the second and the third term mostly depend on our choice of how to represent lights. Given a light model, the correct visibility function for that light function has to be found.

Using the wrong shadow model will cause incorrect results, worst thing, as our occlusion functions usually are approximate too (i.e. shadow maps) having them to operate in regions that should already be in shadow by the local lighting, but they happen not to be due to inconsitencies between local and direct models (i.e. lighting with N lights but shadowing only the most influential one to save rendering time), shows those defects even those regions.

Update: I've moved the second part of this article here.

Rendering equation in the real world (my apartment)

05 May, 2008


I don't post many links to other sites and news on this blog. That's because I believe that there are plenty of sites that already do an excellent job at this.

Today I want to share you a few RSS feeds, for your reading pleasure. If you look around, you'll find somewhere in my blog a list of RSS readers for various portable devices, my advice is to use them, I'm at the moment reading those feeds with an HTC smartphone (windows ce) on my way to work (usually when I return home I read magazines, books or articles that I print at work).

The following is just a list of what I read, I've selected those sites based on various criterias, signal-to-noise ratio, orthogonality with the other feeds, etc. I expecially hate feeds that do not provide full text as they make harder my offline reading.
And that's what I actually have on my smartphone, probably there are some feeds that I should remove too.
That's not the full list as well, I also read non-programming stuff, photography, politics etc, but let's stick to the topic of this blog. I also have some feeds about generative art, that probably should post as well, maybe at a later time, don't want to make this too long.

  • DevMaster: news, mostly about libraries and tools, not one of my favourites but's it's fast to read and sometimes useful.
  • Tom Forsyth Wiki: one of the old, big names in realtime graphics, everything he writes is interesting.
  • Lab49 Blog: lab 49 provides high-end information services for finance. Its blog is very intresting, a mixed bag of languages, algorithms, high performance computing. A must read.
  • Lambda the ultimate: language theory, way too much for me, but the posts I manage to understand are always nice.
  • Chris Hecker: I guess most people know him, a must read.
  • Power of two games: A startup game company, the blog is about them, agile development, computer engineering. Not updated often.
  • Gooey Bugs: visual studio debugger tips and tricks. Not updated often.
  • Wolfram Blog: mostly advertisement, from the creators of Mathematica, sometimes it's intresting.
  • Lucille development blog: blog written by the developer of the Lucille raytracer. Japanese. And I can't read japanese. Still, sometimes I can find some really nice stuff.
  • Peter Shirley's blog: another guy that surely does not need presentations. Raytracing and other stuff. Must read.
  • Realtimecollisiondetection: Christer Ericson's blog. I love his book, and his blog. Really smart. The book is a must read even if you don't care at all about collision detection (it has a very nice chapter about linear algebra and geometry, it's a nice introduction to space partitioning structures, and the last chapters about numerical robustness and performance optimization are a must read)
  • Senzee 5: EA technology guru. Must read.
  • Game physics: Erin Catto's blog, nice physics, shamefully it's not updated often.
  • LukeH: F# and C# at its best, from a MS guy
  • Sutter's Mill: Herb Sutter... C++ and multithreading. Must read.
  • Fabulous adventures in coding: Eric Lippert's blog, another MS guy, very nice, about some intricacies of the C# compiler.
  • Mellow Musings of dr.T: more MS, functional programming and more.
  • Generalities&Details: CLR, parallel computing,
  • Artima: news site, a little too focused on web-ish programming for me, but interesting, really well made.
  • ThinkingParallel: parallel programming at its best.
  • Diary of a graphics programmer: Wolfgang Engel, lead graphics programmer at Rockstar and ShaderX editor.
  • Coder Corner: graphics programming by Pierre Terdiman, very good.
  • Level of Detail: another incredibly good one about graphics programming. One of my favourites
  • Meshula: must read, this guy has a lot of experience. Ex-realtime rendering coder now working at ILM.
  • Cedrick Collomb: the maker of Unlocker, one must have tools for windows. And he's running a fine blog too!
  • Pixel's too many: Marco Salvi, the author of the ESM technique for soft shadows. And an ex italian demoscener too.
  • NVidia developers news: I find the ATI developer site to be more useful, but they don't seem to have a news RSS...
  • Beyond3d News: graphics technology more than programming but very good.
  • as for devmaster, it's usually boring, but still worth a read.
  • CellPerformance: Cell and PS3 (of course, in the limits of NDA), but also parallel programming in general, compiler tricks, optimizations for modern CPUs in general.
  • Mike Acton: Also on Cell (hosted on CellPerformance)
  • Physics based animation: very intersting
  • Hectorgon: I've recently added this one, looks promising!
Of course if you now any other cool feeds, please, post them in the comments section!!!

03 May, 2008

I'll go to the programmer's hell...

...but I have to say that I don't agree with Donald Knuth. Well to say the truth, it's not a big deal, he simply expressed some doubts, in a (very intresting) recent interview, about the recent multicore CPU trend. But I don't want to "flame", it's a very reasonable interview, he only says that for most applications multicore does not matter much, and probably it's also true, because for a lot of applications CPU does not matter, they are simply I/O bound.

What I can't agree with is that CPU designers are just becoming lazy and so are throwing at us all this parallel programming burden instead of finding nice ways to make us happy. In fact it's quite the contrary. Moore's law talks about transistors, not performace. Performance is due to transistors and clock speed. CPU performance following the Moore's law actually indicated that they were "wasting" transistor usage, as the "correct" speedup should be more than that, as it's not surprisingly happening with GPUs. The thing is that engineers were so concerned in not changing our habits that they did avoid spending transistors on raw computing power, but they tried to find smarter way to decode and execute our instruction streams instead. Until they reached a limit that was predicted and predictable, as it's ultimately driven by physics. Multithread and multicore architectures are the natural evolution and a paradigm shift towards them is surely needed. And the problem is that I suspect it's not the only one that will be needed. The limits of parallel programming lies in data access, we not only need to learn how to do threads, but also how to engineer data access in a more CPU friendly way. Luckily the answer of BOTH problems seem to be possible by employing data parallel paradigms, like the stream computing one, that we should all be familar with by having knowledge of how GPUs work.

Read it anyway, of course it's intresting, and then maybe you could compensate by reading something about parallel programming, as it seems to be our future. I would reccomend Suess' blog, in which I've found this nice article about OpenMP performance (and parallel programming in general).

P.S. It's incredible how some news sites take those very reasonable interviews and make big titles out of them. An example is the whole raytracing versus rasterization debate, surely it was intresting, probably there was something to say about Carmak's interview, but how can you title as Slashdot did, Yerli's one as "
Crytek Bashes Intel's Ray Tracing Plans"?

01 May, 2008

Mipmaps, generation and level selection

How do we generate, use and manage mipmaps? The devil is in the details, I want to talk about this because today at work, we had a couple of problems with those, and because they are another nice example of how doing things without exactly knowing what you're doing, in a mathematical perspective, can cause subtle or less subtle problems.

Mipmap levels
On modern hardware (GPU) mipmaps are chosen according to UV gradients. The GPU computes the rate of change for the shaded point of the texture coordinates for each texture fetch instruction, by using finite differences. That's why GPU always operates on 2x2 pixel quads, and also why dynamic branching in pixel shader can't in the most general case conditionally jump over texture fetching instructions (it usually can if the texture fetch is indexed by coordinates that are directly obtained from the vertex shader interpolated outputs, as the gradient of those outputs is always available).

Those gradients are an estimate of the footprint of the currently shaded pixel (or sample) in texture space, it's like taking our square pixel, projecting it into the triangle and transforming this projection in the texture space. Note that if our texture coordinates are computed for each vertex and then linearly interpolated over the triangle (with no perspective correction), that linear estimate is exactly our footprint.

If the sampler is not using anisotropic filtering, the mipmap level is chosen by considering the gradient vector with the highest magnitude and finding a mipmap level where the texel size best fits that vector. If the sampler is using anisotropic filtering, then the mipmap is chosen according to the smallest of the two vectors, and a number of samples is averaged along the directon of the longest one. This operation is needed as if the two gradients are of radically different sizes, then considering the biggest one results in overblurring by sampling a too coarse mipmap. If even we bias GPU mipmap computations to choose a less coarse mipmap level we can't fix the problem, we'll just trade overblurring with aliasing. If the pixel footprint is not round, we need anisotropic filtering.

Optimizing samplers
Mipmapping is a win-win technique on modern GPU. It improves quality, by avoiding texture aliasing, and it improves performances, by insuring that texture fetches from nearby pixels happen in nearby texels in the chosen mipmap level. But care has to be taken. If we use anisotropic filtering or if we don't we want to minimize the anisotropy of our gradients. In the first case to reduce blurring, in the second, also to improve performances, as anisotropic filtering on some GPUs is dynamic, it performs more samples if needed.
What causes texture gradients anisotropy? Well, it depends on how the mapping is seen by the currect viewing direction, so it's a factor of the mapping itself and of the viewing angle between the triangle and the camera. We usually have total control over the first factor, as texture coordinates are most of the time authored by artists (UV mapping) and a limited one over the second. What we should try is to UV map our objects in a way that minimizes texture streching.

For most objects, that have no "preferred" viewing angle, that means simply that the mapping has to be uniform, a triangle in the UV map has to maintain a shape similar to its 3d version. Isotropic sampling will work well for front-facing triangles and blurriness will be seen at grazing angles. Failing to achieve this can lead to terrible results, yesterday I "fixed" a severely blurred texture by simply asking the artists to resize it from 1024x1024 to 512x1024, as that size (even if smaller!) resulted in better gradients.

But this also leads to an interesting optimization for objects that do have a preferred viewing angle (in texture space), like roads in a racing game, where the asphalt is usually mapped in a way that follows the road (i.e. the U coordinate is always parallel to the road the the V is perpendicular to it) and mostly seen at a given angle. In those cases we can easily compute the optimal U/V mapping ratio (to minimize gradient anisotropy) by either analitic means or by visualizing and analyzing an image that displays the gradient ratio.
Usually in those cases anisotropic filtering is required (and if even we optimize for a given viewing direction, it still will be needed, because eventually our player will do something wrong and the car will run perpendicular to the track, pessimizing our optimized gradients), but better gradients can lead to better quality and performances (on some GPUs also trilinear filtering can be tuned, and a performance gain can be obtained by limiting the regions were trilinar filter is used, but that's mostly an hack that trades quality for performance).

Mipmap generation (1)
Usually each level of a mipmap chain is half the size, in both dimensions, of the previous level. This means that mipmapped textures cost only 1.(3) times non mipmapped ones. And also makes mipmap generation trivial, for each level, we compute a texel by taking the mean of a 2x2 texel block in the preceeding mipmap level. It's so easy and fast that we can also do it for runtime generated (render-to-texture) textures. And we do. And it is wrong.

The mean that we do exactly corresponds to computing the integral over the texel's footprint of the original texture texels. If we interpret our texture as a bidimensional function, made of square steps (the "little squares" model), then it's an exact way of computing a resampled version of it, as it's analytical, no aliasing occours. The problem is in the model we are using, our texture does not encode that function, and it's easy to see that is not what we want, that function if magnified will show all the texel's steps as big, square pixels. Steps are discontinous, and are a source of infinite frequency in the Fourier domain. Also, when we sample the texture, we don't usually interpet it as made of steps (if we're not point filtering it), we use bilinear interpolation. So we have to change our framework, our texels are not little squares, are sample points!

Boring, long digression

Let's introduce a bit of point sampling theory. When we take uniform samples of a function (equally spaced) we can exactly reconstruct that function from the samples if that function did not contain too high frequencies. How high? Up to a limit, that is directly proportional to the sampling density and that's called the Nyquist limit.
Unfortunately in our rendering work, most of the times we're sampling a geometrical scene, and that scene contains discontinuities, and those discontinuities yield infinite frequencies, so we can't sample it in a way where the samples contain all the information we need. Some frequencies will mix with others, in an unpleasing way called aliasing. There's an easy way to fight aliasing, and it's to prefilter the function we're sampling to cut the nasty frequencies. In other words we want to take our source function in the frequency space and multiply its frequencies by a step function to cut everything above a given limit. Multiplying in frequency space equals to convolution in the image space, the step in frequency equals to a sinc function.
We incour in two problems. First one is that the representation we have of our function is not in any way analitically filtrable. The second one is that the sinc function has an infinite support. So this approach is not useful. What we do instead to fight aliasing is to take many samples, to push our Nyquist limit as far as possible, and then reconstruct from those samples a bandlimited function, by convolving them with an approximate version of the sinc (linear and cubical approximations exists other than more complex windowed sincs), and then using for our pixel the integral over the pixel's footprint of that reconstructed function (in practice, it's as simple as multiplying the samples with the filter function, centered in the pixel center and taking their mean). Or if we can take non uniform samples, trade aliasing for noise by using random sampling positions (or better, quasi-random, or randomized-quasi-random)

Mipmap generation (2)

We can see our texture as a point sampled version of a bandlimited function, containing frequencies only up to the Nyquist limit. If so, to resample it, we have to reconstruct our function from our samples, and then evaluate it at the new sample locations. Bilinear filtering used by GPU samplers approximates the function with a linear kernel. Nice thing is that for that linear kernel, taking the average is still the correct analityical resampling we need, so mipmap generation by averaging is coherent with that the GPU does when sampling. But it also introduces overblurring.

The bottom line is that using better resizing methods, even if they're not coherent with how the GPU intepretes samples, leads to sharper mipmaps and better quality.

But that's not the only thing we have to be aware of when generating the mipmaps. We also have to take care of what kind of data we store in the texture itself. Is that a color? Colors are usually stored in a non linear, gamma corrected space. That's fine, it optimizes the bit use storing colors in a space that is visually linear(ish). But it's bad when we have to perform computations on them because our math prefer to work on numbers that are uniformly spaced. So when we generate mipmaps for color textures, we should take care of the degamma/gamma problem, as we do in shaders and samplers (see valve publications, post processing in the orange box, or the post that started it all, at least for me, by Peter Shirley).
Failing to correctly process gamma in mipmap generation usually produces mipmaps that are darker than they should be.

What if they contain other data? Well, of course, similar care has to be taken. Normalmaps are hard (see this paper and this one too). At least we should always take care that the normals do not change their mean length across the mipmap levels (i.e. maintain them normalized if we only want normal vectors), but that does not solve all the problems. Normals are directions on a sphere, normalmaps are four dimensional functions, reampling is not trivial. The easiest thing is to thing about them as the gradients of another function, that we can obtain by solving a Poisson integral (that's actually quite fast to do in practice). Then resample that function, and compute from that our gradients. That's also a good way to think about all operation involving normal data (see the GDC 2008 slides "care and feeding of normal vectors", by
Jörn Loviscach).

Final remarks
Computer graphics is both easy and hard. You will always have an image, but if you don't know what are you doing and why, subtle errors will creep in, that probably won't be filed as bugs, but will compromise quality, or performance, and worse, artists will try to fix those errors by tweaking their assets, producing images that usually are fine, but still do not look quite right. Most of the times we blame our cheap mathematical models for that failures and seek for more complex tools, or worse, introduce hacks and tweaks on those models to bend them into something that looks better but it's even more wrong.
We should start right, do the right maths, and only then blame our cheap models and seek for something else, always trying to know why we are doing that, basing our maths on a solid base.