Master’s thesis: Motivation speech. What do you think of it?

I’m starting my master’s thesis on “Skin and Object detection” right now. The idea is to detect skin in video sequences like the ones you find on YouTube or MSN Video (formerly Soapbox).

I have already done some groundwork during a university exercise on image detection and processing. Some of the results can be found here, here and here.

The motivation speech, the kick-off event, will be held in about one week, and I have already prepared the presentation. I have 15 minutes to present my ideas. After that there will be a short discussion in which the people from the institute give me input for my master’s thesis.

You can download the current draft of the presentation from here.

What do you think about my presentation and motivation speech? Any feedback is welcome! :)

Published on Feb 29th, 2008

Interesting insights from the GDC keynote

Seems like I’m studying the right thing ;)

Published on Feb 21st, 2008

Channel 8 DreamSpark: Student? Get a lot of Microsoft’s software for free!

Tomorrow, on the 18th of February 2008, we are going to launch Channel 8 DreamSpark. This is exciting news for students, because from then on you get a lot of Microsoft software for free! Yes, you read that right: for free, at no charge at all! You only need to sign in with your university e-mail address so that we can verify that you really study at a university, and then the download fun can begin!

The following packages will be available:

  • Expression Studio
  • SQL Server 2005 Express
  • SQL Server Developer Edition
  • Virtual PC 2007
  • Visual Basic 2005
  • Visual C++ 2005
  • Visual C# 2005
  • Visual J# 2005
  • Visual Studio 2008
  • Visual Web Developer 2005
  • Visual Studio 2005 Professional
  • Windows Server 2003
  • XNA Game Studio

Cool, huh? Isn’t that a huge list? 8)

When the software becomes available, we will also publish a series of webcasts that introduce you to the software: a getting-started package of webcasts! I have done some of them :P

Keep an eye on Channel 8 to be one of the first to download the software! Happy downloading and playing with the new software! :)

Published on Feb 19th, 2008

Edelweiss: demonstrating some cool graphic effects

Martin Kinkelin (a friend of mine) and I have finished developing the graphics demo (see related posts: 1, 2). It shows a museum scene with a dinosaur (a raptor), four Anubis statues sitting on stone bases and two dragons. The dragon model is taken from “The Stanford 3D Scanning Repository”.

In addition, a moving light source is positioned at the top of the room and generates some nice shadow effects. On the left and right sides of the room there are two dragons that look in opposite directions:



We have implemented the following effects: Bloom, Shadow Maps, Normal Mapping, Parallax Mapping and animated objects. Try to spot them in the demo :D

Curious? Want to check it out? Download the demo from here.

Published on Jan 27th, 2008

Dynamic Shadow Maps

As mentioned in a previous post, a friend of mine and I are currently developing a rendering engine in OpenGL that implements a selection of graphic effects. I also mentioned that the next item on the plan was shadow mapping. It’s done! After meeting on a few afternoons, we finally got it working properly and finished the whole project.

But why do we need shadows? Shadows add realism to a game because they are found everywhere in real life: how can you have light without shadows? :) Without shadows everything looks as if it were floating in the air rather than sitting on the ground (this effect is also visible in the pictures at the end of this post).

There are two approaches to creating shadows in games: shadow mapping and shadow volumes. Shadow volumes are expensive, especially for complex objects (that’s also why Doom 3 has very simple meshes): computing the volumes takes a lot of CPU and GPU cycles, which is why they are rarely seen in today’s games. DirectX 10 adds new features that make creating shadow volumes faster, but since neither of us owns a DX10 graphics card, we couldn’t test them and therefore didn’t add shadow volumes to the engine. We went with shadow maps.

When creating shadows in games you have to take two things into consideration: the position of the light and the area visible to the camera, i.e. the player’s field of view.

The first step in creating the shadows is to switch to the position of the light and generate a shadow map. A shadow map is basically a depth map that stores one depth value per pixel: the depth of the nearest surface seen through that pixel from the current position. Since we are at the position of the light, this is the surface closest to the light source. Keep in mind that creating the shadow map involves rendering all the objects, although you can disable shaders and texturing for this pass.

To create the shadow map we used an OpenGL framebuffer object (FBO) with a texture attached as its depth component:

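// create the depth texture that receives the shadow map
glBindTexture(GL_TEXTURE_2D, _textures[0]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT, _width, _height, 0, GL_DEPTH_COMPONENT, GL_FLOAT, NULL);
// no filtering: each texel is one exact depth value
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
// lookups outside the map return the border depth 1.0 (far plane), i.e. "lit"
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_BORDER);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_BORDER);
float borderColor[] = { 1.0f, 1.0f, 1.0f, 1.0f };
glTexParameterfv(GL_TEXTURE_2D, GL_TEXTURE_BORDER_COLOR, borderColor);
// the pixel shader reads the depth value as luminance
glTexParameteri(GL_TEXTURE_2D, GL_DEPTH_TEXTURE_MODE, GL_LUMINANCE);
glBindTexture(GL_TEXTURE_2D, 0);

// set the texture as depth attachment of the currently bound FBO
// (an FBO must be bound with glBindFramebufferEXT before this call).
// The last argument is the mipmap level, an integer, not a pointer.
glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_DEPTH_ATTACHMENT_EXT,
    GL_TEXTURE_2D, _textures[0], 0);

// required to be set to none: the FBO has no color attachment.
glDrawBuffer(GL_NONE);
glReadBuffer(GL_NONE);

// make sure the FBO is usable before rendering the shadow map into it
if (glCheckFramebufferStatusEXT(GL_FRAMEBUFFER_EXT) != GL_FRAMEBUFFER_COMPLETE_EXT) {
    // handle the error, e.g. fall back to rendering without shadows
}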

The next step is to switch back to the camera view. For each pixel that belongs to a mesh, its depth as seen from the light (that is, its distance from the light position) is compared with the value stored for the corresponding position in the shadow map. This is done in the pixel shader, to which the shadow map is passed as a texture. If the pixel’s depth is greater than the value in the shadow map, something closer to the light blocks it and the pixel is in shadow; otherwise it is lit.
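To make the comparison concrete, here is a minimal CPU sketch in C of the per-pixel shadow test; it is not our shader code. It assumes the shadow map is a plain float array with depths from 0 (near) to 1 (far), that u and v are the pixel's coordinates in the light's view in the range 0 to 1, and it adds a small depth bias, which is a common trick not mentioned above, to keep surfaces from shadowing themselves because of limited depth precision. The names sample_depth and shadow_test are hypothetical.

```c
#include <stddef.h>

/* Nearest-texel lookup into a w x h shadow map (like GL_NEAREST).
 * Coordinates outside the map behave like the white border color,
 * i.e. depth 1.0 (far away), so such pixels count as lit. */
static float sample_depth(const float *shadow_map, int w, int h, float u, float v)
{
    if (u < 0.0f || u >= 1.0f || v < 0.0f || v >= 1.0f)
        return 1.0f;
    int x = (int)(u * (float)w);
    int y = (int)(v * (float)h);
    return shadow_map[(size_t)y * (size_t)w + (size_t)x];
}

/* Returns 1.0f if the pixel is lit and 0.0f if it is in shadow.
 * depth is the pixel's own depth as seen from the light; bias is an
 * assumed small offset against self-shadowing ("shadow acne"). */
float shadow_test(const float *shadow_map, int w, int h,
                  float u, float v, float depth, float bias)
{
    float stored = sample_depth(shadow_map, w, h, u, v);
    /* something nearer to the light was recorded: the pixel is occluded */
    return (depth - bias > stored) ? 0.0f : 1.0f;
}
```

In the engine the same comparison runs in the pixel shader for every fragment; the CPU version only shows the logic.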

The problem with this approach is that you get hard shadows, something that isn’t found in real life. By “hard shadows” I mean that a pixel is either completely in the light or completely in shadow. To avoid that, PCF (Percentage Closer Filtering) is used. PCF also checks the neighboring values in the shadow map, and if a certain number of them (a threshold) are in the light, the current point is treated as not being fully in shadow. This allows us to simulate soft shadows.
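A minimal CPU sketch of PCF in C, under the same assumptions as the shadow test (float depth array, 0 = near, 1 = far). Note that this shows the common averaging form of PCF: the depth comparison is repeated for a (2r+1)×(2r+1) block of shadow-map texels and the fraction of lit samples is returned, rather than the threshold variant described above. The function name and the radius parameter r are hypothetical.

```c
#include <stddef.h>

/* Percentage closer filtering around texel (x, y) of a w x h shadow map.
 * Returns the fraction of samples that pass the depth test:
 * 0.0 = fully in shadow, 1.0 = fully lit, values in between = soft edge. */
float pcf(const float *shadow_map, int w, int h,
          int x, int y, float depth, float bias, int r)
{
    int lit = 0, total = 0;
    for (int dy = -r; dy <= r; dy++) {
        for (int dx = -r; dx <= r; dx++) {
            int sx = x + dx, sy = y + dy;
            /* texels outside the map count as lit (border depth 1.0) */
            float stored = 1.0f;
            if (sx >= 0 && sx < w && sy >= 0 && sy < h)
                stored = shadow_map[(size_t)sy * (size_t)w + (size_t)sx];
            if (depth - bias <= stored)
                lit++;
            total++;
        }
    }
    return (float)lit / (float)total;
}
```

The returned factor can then scale the light contribution of the pixel, so pixels near a shadow edge end up partially lit.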

The final results look like this:

 

Published on Jan 23rd, 2008

Naked people detection in videos (part 3)

I’m doing a university project whose goal is to detect naked people in videos. It’s going to be a kind of filtering algorithm that should help webmasters of video sites decide which videos should be removed and which shouldn’t. I have already published part 1 and part 2, which contain some of my findings. Now it’s time for a new post with my latest findings and results :)

Since the last post I have read a few papers on face detection via skin recognition. This seems like a good approach for detecting skin, since these methods find people’s faces via their skin. The papers I have read are all referenced in the one I mentioned in part 2 of this series:

  • D. Chai, K.N. Ngan, Locating facial region of a head-and-shoulders color image, in: Proceedings of Third IEEE International Conference on Automatic Face and Gesture Recognition (FG ’98), Nara, Japan, April 1998, pp. 124–129.
  • D. Chai, K.N. Ngan, Face segmentation using skin-color map in videophone applications, IEEE Trans. Circuits Syst. for Video Technol. 9 (4) (1999) 551–564.
  • S.L. Phung, A. Bouzerdoum, D. Chai, A novel skin color model in YCBCR color space and its application to human face detection, in: Proceedings of IEEE International Conference on Image Processing, vol. I, 2002, pp. 289–292.

The most interesting one is the 1998 paper, because the other papers are based on its approach and add little of relevance for the work here. The idea in the paper is to perform the following four steps to detect skin:

The first step converts the image into the YCbCr color space. The ranges 77-127 for Cb and 133-173 for Cr are used to detect “skin pixels”: if the Cb and Cr values of a pixel fall into these ranges, it is a possible skin pixel and is marked as such. The Y component is ignored because in YCbCr it holds only the brightness of the pixel, so by measuring only Cb and Cr we can cover all skin tones. All skin pixels are set to solid black and all other pixels to solid white.
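The classification itself is a simple range check. The following C sketch assumes the Cb and Cr planes are already available as byte arrays and treats both range bounds as inclusive, which the text does not state explicitly; classify_skin and skin_mask are hypothetical names.

```c
#include <stdint.h>

/* Step 1: a pixel is a possible skin pixel if Cb is in 77..127 and
 * Cr is in 133..173 (inclusive bounds assumed). Y is ignored.
 * Skin is written as black (0), everything else as white (255). */
uint8_t classify_skin(uint8_t cb, uint8_t cr)
{
    int is_skin = cb >= 77 && cb <= 127 && cr >= 133 && cr <= 173;
    return is_skin ? 0 : 255;
}

/* Applies the classification to whole Cb and Cr planes of n pixels. */
void skin_mask(const uint8_t *cb, const uint8_t *cr, uint8_t *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = classify_skin(cb[i], cr[i]);
}
```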

To convert the RGB values into YCbCr I used the following formula:



This formula is also used by JPEG and MPEG to encode images and videos. It simply converts the RGB values into YCbCr values and doesn’t reserve any values for control information. The German Wikipedia article on the YCbCr color space lists other formulas as well, but those don’t use the full range of the byte, because certain values are reserved for control information. TVs use this information to understand how to process the signal. That’s not required here because we work on images, so we can use the full range.
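The formula image did not survive in this copy of the post. Assuming it is the full-range conversion used by JPEG (JFIF), which matches the description above, it can be sketched in C like this. The rounding and clamping helper clamp_byte and the function name rgb_to_ycbcr are hypothetical.

```c
#include <stdint.h>

/* Rounds to the nearest integer and clamps to the byte range 0..255. */
static uint8_t clamp_byte(double v)
{
    if (v < 0.0) return 0;
    if (v > 255.0) return 255;
    return (uint8_t)(v + 0.5);
}

/* Full-range RGB -> YCbCr as in JFIF: all 256 values of each byte are
 * used, none are reserved. Whether this is exactly the formula from the
 * missing image is an assumption. */
void rgb_to_ycbcr(uint8_t r, uint8_t g, uint8_t b,
                  uint8_t *y, uint8_t *cb, uint8_t *cr)
{
    *y  = clamp_byte(        0.299    * r + 0.587    * g + 0.114    * b);
    *cb = clamp_byte(128.0 - 0.168736 * r - 0.331264 * g + 0.5      * b);
    *cr = clamp_byte(128.0 + 0.5      * r - 0.418688 * g - 0.081312 * b);
}
```

Gray values (R = G = B) always end up at Cb = Cr = 128, which is why the chroma channels alone can describe the color of a pixel independently of its brightness.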

In the second step, the image from step 1 is downscaled (width / 4 and height / 4). During downscaling, the black pixels are counted: for each black pixel in a 4×4 block we add 1 to the resulting pixel, so the resulting pixel has a value of 16 if all 16 pixels of its block in the image from step 1 are black.
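The downscaling with counting can be sketched in C as follows, assuming the step 1 mask is a byte array with 0 for black (skin) and that incomplete 4×4 blocks at the right and bottom edges are simply dropped; density_map is a hypothetical name.

```c
#include <stdint.h>

/* Step 2a: shrink the w x h black/white mask (0 = black = skin) by 4 in
 * each direction. Each output pixel holds the number of black pixels in
 * its 4x4 block, i.e. a value from 0 to 16. Output size: (w/4) x (h/4). */
void density_map(const uint8_t *mask, int w, int h, uint8_t *density)
{
    int dw = w / 4, dh = h / 4;
    for (int by = 0; by < dh; by++) {
        for (int bx = 0; bx < dw; bx++) {
            uint8_t count = 0;
            for (int y = 0; y < 4; y++)
                for (int x = 0; x < 4; x++)
                    if (mask[(by * 4 + y) * w + (bx * 4 + x)] == 0)
                        count++;
            density[by * dw + bx] = count;
        }
    }
}
```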

After that we have to weigh the pixels to decide whether a pixel is still “skin” in the resulting image. A pixel is made black (skin) when all of its original pixels are black (value = 16) and more than 5 of its neighbors are also black (value = 16); otherwise it is made white. If a pixel is white (or gray, i.e. a value below 16), it is made black if more than 2 of its neighbors are black (value = 16), and white otherwise.
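Here is a C sketch of this weighting with the thresholds from the text. It assumes the 8-neighborhood, reads all counts from the unmodified input so the result does not depend on the scan order, and writes 1 for black (skin) and 0 for white; regularize_density and full_neighbors are hypothetical names.

```c
#include <stdint.h>

/* Counts the 8-connected neighbors of (x, y) whose density is 16;
 * neighbors outside the image are ignored. */
static int full_neighbors(const uint8_t *d, int w, int h, int x, int y)
{
    int n = 0;
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++) {
            int nx = x + dx, ny = y + dy;
            if ((dx || dy) && nx >= 0 && nx < w && ny >= 0 && ny < h &&
                d[ny * w + nx] == 16)
                n++;
        }
    return n;
}

/* Step 2b: a full pixel (16) stays skin only with more than 5 full
 * neighbors; any other pixel becomes skin with more than 2 full
 * neighbors. Output: 1 = black (skin), 0 = white. */
void regularize_density(const uint8_t *d, int w, int h, uint8_t *out)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            int n = full_neighbors(d, w, h, x, y);
            int need = (d[y * w + x] == 16) ? 5 : 2;
            out[y * w + x] = n > need ? 1 : 0;
        }
}
```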

The third step takes the output of step 2 and calculates the mean and variance over a neighborhood of 16 pixels. The mean and variance are calculated on the original image using the Y component. That’s also why I chose 16 pixels: they are the 4×4 block that one pixel of the downscaled image represents.

If the variance is below a certain threshold (I chose 4), the pixel is turned white; otherwise it is turned black. The idea is that the background in pictures is usually very flat, whereas the skin of a person (the face in particular) is not. This works great for faces, but has problems when, for example, a person’s back is in the image: a back is usually also very flat and hard to distinguish from the background.
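The variance check can be sketched in C as follows. It assumes the luminance plane of the original image is a byte array of width w, that the step 2 output uses 1 for black (skin), and that only black pixels are tested while white pixels stay white; luminance_check is a hypothetical name. The variance is computed as the mean of the squares minus the square of the mean.

```c
#include <stdint.h>

/* Step 3: for each skin pixel (1) of the dw x dh step-2 output, compute
 * the mean and variance of Y over the matching 4x4 block of the original
 * image (row width w). Blocks with variance below the threshold (4 in
 * the post) count as flat background and become white (0). */
void luminance_check(const uint8_t *luma, int w, const uint8_t *skin,
                     int dw, int dh, double threshold, uint8_t *out)
{
    for (int by = 0; by < dh; by++)
        for (int bx = 0; bx < dw; bx++) {
            int i = by * dw + bx;
            if (!skin[i]) {          /* white pixels stay white */
                out[i] = 0;
                continue;
            }
            double sum = 0.0, sum_sq = 0.0;
            for (int y = 0; y < 4; y++)
                for (int x = 0; x < 4; x++) {
                    double v = luma[(by * 4 + y) * w + (bx * 4 + x)];
                    sum += v;
                    sum_sq += v * v;
                }
            double mean = sum / 16.0;
            double variance = sum_sq / 16.0 - mean * mean;
            out[i] = variance < threshold ? 0 : 1;
        }
}
```

A variance threshold of 4 corresponds to a standard deviation of 2 brightness levels, so even very subtle texture keeps a block marked as skin.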

The fourth step again weighs each pixel by its neighborhood, as in step 2, to decide whether it should stay black or turn white. This time the weighting is slightly different: if the pixel is black (a value of 1 here, since there is no further downscaling) and 3 or more neighbors are black, it stays black; otherwise it turns white. If the pixel is white and 5 or more neighbors are black, it turns black; otherwise it stays white.
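A C sketch of this second weighting, again assuming the 8-neighborhood, 1 for black and 0 for white, and counts taken from the unmodified input; weigh_step4 and black_neighbors are hypothetical names.

```c
#include <stdint.h>

/* Counts the 8-connected neighbors of (x, y) that are black (value 1);
 * neighbors outside the image are ignored. */
static int black_neighbors(const uint8_t *m, int w, int h, int x, int y)
{
    int n = 0;
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++) {
            int nx = x + dx, ny = y + dy;
            if ((dx || dy) && nx >= 0 && nx < w && ny >= 0 && ny < h &&
                m[ny * w + nx])
                n++;
        }
    return n;
}

/* Step 4 weighting: a black pixel stays black with 3 or more black
 * neighbors, a white pixel turns black with 5 or more. */
void weigh_step4(const uint8_t *m, int w, int h, uint8_t *out)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            int n = black_neighbors(m, w, h, x, y);
            int need = m[y * w + x] ? 3 : 5;
            out[y * w + x] = n >= need ? 1 : 0;
        }
}
```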

This step also involves a vertical and a horizontal scan of the image: if fewer than 4 pixels in a row or column are black, they are all removed. This removes some of the noise that might still be in the image.

Finally, the remaining black pixels are counted and their percentage of the image is calculated.
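The scans and the final count can be sketched in C as follows. “Fewer than 4 pixels in a row” is interpreted here as runs of fewer than 4 consecutive black pixels, which is an assumption; such runs are cleared in place, first row by row and then column by column. All function names are hypothetical.

```c
#include <stdint.h>

/* Clears every run of fewer than min_run consecutive black (1) pixels
 * along one line of len pixels, starting at index start with the given
 * stride (1 for a row, the image width for a column). */
static void clear_short_runs(uint8_t *m, int start, int stride, int len, int min_run)
{
    int run = 0;
    for (int i = 0; i <= len; i++) {
        int black = i < len && m[start + i * stride];
        if (black) {
            run++;
            continue;
        }
        if (run > 0 && run < min_run)
            for (int k = i - run; k < i; k++)
                m[start + k * stride] = 0;
        run = 0;
    }
}

/* Horizontal scan, then vertical scan, removing runs shorter than 4. */
void remove_short_runs(uint8_t *m, int w, int h)
{
    for (int y = 0; y < h; y++)
        clear_short_runs(m, y * w, 1, w, 4);
    for (int x = 0; x < w; x++)
        clear_short_runs(m, x, w, h, 4);
}

/* Percentage of black (skin) pixels in the final w x h map. */
double skin_percentage(const uint8_t *m, int w, int h)
{
    int count = 0;
    for (int i = 0; i < w * h; i++)
        count += m[i] != 0;
    return 100.0 * count / (w * h);
}
```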

The results of the algorithm are really poor; I expected a lot more from it. It seems to me that the algorithm is tuned to detect faces only, and no other parts of the body. In some scenes it also finds arms and the like, but only because arms look like faces to this algorithm. Backs aren’t detected at all. To show some results, I extracted two frames of a short scene from “Baywatch”. In the first one the algorithm works really well. In the second one it doesn’t work at all: the main part facing the camera is the guy’s back:

[First frame: images 92-0.jpg to 92-4.jpg]

[Second frame: images 213-0.jpg to 213-4.jpg]

My next idea was to remove the variance calculation, so that all possible skin pixels are included, even if some wrong skin pixels are detected as well. But that was before I tried the algorithm on another video (remember: black is skin):

[Images 0-0.jpg and 0-1.jpg]

“Houston, we have a problem!” It seems that the YCbCr classification already has problems detecting skin at all. As you can see in the two images, the whole background is detected as skin and the skin itself isn’t. I don’t know if that can be fixed by adjusting the ranges, but doing so would change the results on the “Baywatch” video, and I’m sure they won’t get better there…

But since this is still an ongoing project I’m going to investigate further and post about my findings - stay tuned!

Published on Dec 18th, 2007

Bloom: a little bit of HDR

Another project that a friend of mine and I are currently working on is a rendering engine in OpenGL that implements a selection of graphic effects. The university exercise is called “Echtzeitgraphik” (real-time graphics), and the goal is to implement some effects in a demo-like fashion. The demo has to be powered by a real-time game engine that also has to be developed by each group, which we did by recycling and enhancing the FishSalad engine :P

Right now the engine features normal mapping and parallax mapping:

If you look at the dragon, you might be able to see the normal mapping effect. We use a low-polygon model and apply a normal map in the pixel shader to make it look like a high-polygon model. The parallax mapping effect is visible on the box: the stones look very three-dimensional although they are just a flat texture. The effect is a lot more visible if you move around in the game.
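For illustration, here are CPU versions in C of the two per-pixel computations, not our actual shaders. Normal mapping decodes a tangent-space normal stored in the texture (each component remapped from [-1, 1] to [0, 1]) and uses it in a Lambert diffuse term instead of the coarse mesh normal. Parallax mapping shifts the texture coordinate along the tangent-space view direction (pointing from the surface to the eye) in proportion to the height-map value; sign conventions differ between implementations. The scale and bias values in the usage are assumptions, and all names are hypothetical.

```c
typedef struct { float x, y, z; } vec3;

/* Normal mapping: decode an RGB normal-map texel into a tangent-space normal. */
vec3 decode_normal(float r, float g, float b)
{
    vec3 n = { 2.0f * r - 1.0f, 2.0f * g - 1.0f, 2.0f * b - 1.0f };
    return n;
}

/* Lambert diffuse term: N dot L, clamped so back-facing light gives 0. */
float lambert(vec3 n, vec3 l)
{
    float d = n.x * l.x + n.y * l.y + n.z * l.z;
    return d > 0.0f ? d : 0.0f;
}

/* Parallax mapping: offset (u, v) by the scaled height along the
 * tangent-space view vector, divided by its z component. */
void parallax_uv(float u, float v, float height, vec3 view,
                 float scale, float bias, float *out_u, float *out_v)
{
    float h = height * scale + bias;
    *out_u = u + h * view.x / view.z;
    *out_v = v + h * view.y / view.z;
}
```

Looking straight down at the surface (view = (0, 0, 1)) leaves the coordinates unchanged; the flatter the viewing angle, the larger the shift, which is why the effect is more visible when you move around.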

Adding the bloom
What I did over the last two days was implement the bloom effect. This effect is used in a lot of current games and was also showcased in Half-Life 2 with the famous Lost Coast level. Bloom isn’t a real HDR effect; it is a kind of faked HDR that simulates looking from a darker standpoint into something very bright. You usually get dazzled by the bright light, and the outlines of the object become very blurry.

The question was how to achieve this effect. Bloom is a post-processing effect, which means that you apply it after the scene has been rendered. As I found out, it takes only a few steps. The theory looks very easy, but implementing it is another story:

  1. You render your scene into a texture. Frame Buffer Objects (FBOs) are best suited for this, because they let you render the scene directly into a texture and avoid copying the data from the back buffer.
  2. You use a pixel shader to extract the brighter parts of the scene by darkening the darker parts and brightening the brighter parts, based on a threshold. The result is rendered into another texture; this is where an FBO that can target multiple textures comes in handy.
  3. Next you apply a Gaussian blur (again in a pixel shader) to that texture in one direction. The bigger the kernel, the better it looks, but the slower it gets, because more texture lookups are needed. The result is again rendered into a texture.
  4. Then you apply the Gaussian blur again, this time in the other direction, using the result of the previous step as input. The result is rendered into another texture and is an image blurred in both directions.
  5. The final step uses another pixel shader to merge the original scene with the blurred scene by additive blending (adding both together). The brighter parts brighten the original texture and produce a nice bloom effect.
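The five steps can be sketched on the CPU in C on a single-channel float image. This is only an illustration of the data flow, not the shader code: the bright pass uses one common form (subtract the threshold, clamp at zero), the blur uses binomial weights (a row of Pascal's triangle, a common approximation of a Gaussian), and the threshold, radius and all function names are assumptions.

```c
#include <stdlib.h>

/* Step 2: bright pass, keep only what exceeds the threshold. */
static void bright_pass(const float *src, float *dst, size_t n, float threshold)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] > threshold ? src[i] - threshold : 0.0f;
}

static int clampi(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

/* Steps 3 and 4: one 1D blur pass, horizontal (dx=1, dy=0) or vertical
 * (dx=0, dy=1). Samples outside the image are clamped to the edge. */
static void blur_pass(const float *src, float *dst, int w, int h,
                      const float *kernel, int radius, int dx, int dy)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            float sum = 0.0f;
            for (int k = -radius; k <= radius; k++) {
                int sx = clampi(x + k * dx, 0, w - 1);
                int sy = clampi(y + k * dy, 0, h - 1);
                sum += kernel[k + radius] * src[sy * w + sx];
            }
            dst[y * w + x] = sum;
        }
}

/* Full chain; returns 0 on success, -1 on bad radius or allocation failure. */
int bloom(const float *scene, float *out, int w, int h, float threshold, int radius)
{
    if (radius < 0 || radius > 16) return -1;
    /* binomial weights C(2r, k) normalized so that they sum to 1 */
    double coeffs[33], c = 1.0, total = 0.0;
    float kernel[33];
    int n2 = 2 * radius;
    for (int k = 0; k <= n2; k++) {
        coeffs[k] = c;
        total += c;
        c = c * (n2 - k) / (k + 1);
    }
    for (int k = 0; k <= n2; k++) kernel[k] = (float)(coeffs[k] / total);

    size_t n = (size_t)w * (size_t)h;
    float *bright = malloc(n * sizeof *bright);
    float *tmp = malloc(n * sizeof *tmp);
    if (!bright || !tmp) { free(bright); free(tmp); return -1; }

    bright_pass(scene, bright, n, threshold);             /* step 2 */
    blur_pass(bright, tmp, w, h, kernel, radius, 1, 0);   /* step 3: horizontal */
    blur_pass(tmp, bright, w, h, kernel, radius, 0, 1);   /* step 4: vertical */
    for (size_t i = 0; i < n; i++)
        out[i] = scene[i] + bright[i];                     /* step 5: additive blend */

    free(bright);
    free(tmp);
    return 0;
}
```

Here the scene array plays the role of the texture from step 1, and each intermediate buffer corresponds to one of the FBO render targets.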

If you do this on a 1280×1024 texture, you can imagine how many texture lookups that are. But it runs quite fast on my older graphics card: it’s impressive how powerful even older graphics cards are. I wonder how fast this would fly on a card with 128 shader units, like the GeForce 8800 GTS, GTX or Ultra. :D
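A quick back-of-the-envelope count shows why the blur is split into two directions. Assuming a kernel of 2r + 1 taps (the kernel size we used is not stated, so r = 4 below is only an example), two 1D passes need 2(2r + 1) lookups per pixel, while a single 2D pass would need (2r + 1)². The function names are hypothetical.

```c
/* Texture lookups per frame for blurring a w x h texture with a
 * (2r + 1)-tap kernel: two 1D passes vs. one 2D pass. */
long long separable_lookups(int w, int h, int r)
{
    return (long long)w * h * 2 * (2 * r + 1);
}

long long full_2d_lookups(int w, int h, int r)
{
    long long taps = 2 * r + 1;
    return (long long)w * h * taps * taps;
}
```

With a 9-tap kernel (r = 4), a 1280×1024 texture (1,310,720 pixels) needs separable_lookups(1280, 1024, 4) = 23,592,960 lookups for the two blur passes, compared with full_2d_lookups(1280, 1024, 4) = 106,168,320 for a single 2D pass.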

The final result is shown in the image underneath:

Next on the plan for this little engine is shadow mapping. :)

Published on Dec 16th, 2007

Naked people detection in videos (part 2)

I’m still working on the university project about naked image detection in videos. My professor sent me a nice paper that I read yesterday. Its title is: Naked image detection based on adaptive and extensible skin color model by Jiann-Shu Lee et al.

The paper has some nice ideas, like switching to another color space in which the different tones of skin no longer matter: by ignoring one channel of the new space (YCrCb), all the different skin tones, like black, white and brown, are covered. The approach sounds very promising to me. Another thing, which I’ll probably not implement, is that they train a neural network to find the best values for the skin color; I don’t know if I have the time to do that. The algorithm also groups pixels together into one or more blobs that represent the skin areas in the picture.
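Grouping skin pixels into blobs is essentially connected-component labeling. The following C sketch is a generic flood-fill labeling with 4-connectivity and an explicit stack, not the procedure from the paper; label_blobs is a hypothetical name, and the mask is assumed to be nonzero for skin pixels.

```c
#include <stdint.h>
#include <stdlib.h>

/* Gives every 4-connected region of skin pixels (mask != 0) its own
 * label 1, 2, ...; label 0 means background. Returns the number of
 * blobs, or -1 if memory allocation fails. */
int label_blobs(const uint8_t *mask, int w, int h, int *labels)
{
    int n = w * h, count = 0;
    /* each pixel is pushed at most once, so n entries are enough */
    int *stack = malloc((size_t)n * sizeof *stack);
    if (!stack) return -1;
    for (int i = 0; i < n; i++) labels[i] = 0;

    for (int i = 0; i < n; i++) {
        if (!mask[i] || labels[i]) continue;
        int top = 0;
        labels[i] = ++count;          /* start a new blob */
        stack[top++] = i;
        while (top > 0) {
            int p = stack[--top], x = p % w, y = p / w;
            int nb[4] = { x > 0 ? p - 1 : -1, x < w - 1 ? p + 1 : -1,
                          y > 0 ? p - w : -1, y < h - 1 ? p + w : -1 };
            for (int k = 0; k < 4; k++) {
                int q = nb[k];
                if (q >= 0 && mask[q] && !labels[q]) {
                    labels[q] = count;
                    stack[top++] = q;
                }
            }
        }
    }
    free(stack);
    return count;
}
```

Once the blobs are labeled, properties such as their size or position can be computed per label and used for further decisions.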

They also make very strong assumptions, which might not hold for videos. For example, they assume that the naked person is in the middle of the image. That’s true for images but not for videos, where the person might move through the visible area in one or more directions. They also use face detection to decide whether a skin blob in the image is a person or not. That improves their results dramatically, but isn’t very suitable for me either, because in a video the face often isn’t visible. Moreover, a visible face (like in a face close-up sequence) usually shouldn’t be classified as a naked scene because it is part of a movie, whereas a sequence with no face, where you see only the body, should be classified as containing naked persons. Among the sample videos I have received there are a lot of amateur videos where people show only their naked body and not their face at all.

The problem hasn’t gotten any easier so far.

What’s also a little disappointing is that the paper says an SVM (Support Vector Machine) is not very usable for this kind of pattern recognition, because it often produces wrong results.

I have also found a nice project from Microsoft Research called Accelerator. A guy in the Channel 9 forums posted a question about it, and it seems very usable for my color detector. It moves code from the CPU to the GPU and uses the GPU’s power to process images or even to do general-purpose computing. I hope I can speed things up by using it.

Published on Dec 5th, 2007