A fast way to grayscale an image in .NET (C#)

Have you ever needed a way to edit images directly in .NET? The framework comes with the Bitmap and Image classes, which allow you to edit images. The problem is that by default you will probably end up using the GetPixel and SetPixel methods of the Bitmap class to edit the image.

It won’t be long until you realize that the GetPixel/SetPixel pair is very slow, because every single call goes through the whole API to read or write one pixel. But there’s another way. It is a lot faster, but it requires a little bit of pointer arithmetic, and that piece of code has to live inside an unsafe block. That means that you need to compile the project with the /unsafe switch (found in the project properties). If you don’t want to do that for your whole project, you can move this class into its own assembly, compile only that assembly with the /unsafe switch, and reference it from the other assembly.

The Bitmap class has a method called LockBits. It locks the bits of the image in memory, and from then on you can work on the pixel data directly. The following piece of code (it needs the System.Drawing and System.Drawing.Imaging namespaces) shows how to walk through the pixels of the bitmap and convert all of them into a grayscale representation. Note that the loop assumes 3 bytes per pixel, stored in blue, green, red order, which is the layout of a 24bpp bitmap. For the gray value I used the Y value that is calculated when converting from RGB into the YCbCr color space. The Y component contains the brightness of the pixel, which is exactly what we need:

/// <summary>
/// Grayscales a given image.
/// </summary>
/// <param name="image">The image that is transformed to a grayscale image.</param>
public static void GrayScaleImage(Bitmap image)
{
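    if (image == null)
        throw new ArgumentNullException("image");

    // the loop below assumes 3 bytes per pixel in B, G, R order.
    if (image.PixelFormat != PixelFormat.Format24bppRgb)
        throw new ArgumentException("Only 24bpp RGB bitmaps are supported.", "image");

    // lock the bitmap.
    var data = image.LockBits(new Rectangle(0, 0, image.Width, image.Height), ImageLockMode.ReadWrite, image.PixelFormat);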
    try
    {
        unsafe
        {
            // get a pointer to the data.
            byte* ptr = (byte*)data.Scan0;

            // loop over all the data.
            for (int i = 0; i < data.Height; i++)
            {
                for (int j = 0; j < data.Width; j++)
                {
                    // calculate the gray value.
                    byte y = (byte)((0.299 * ptr[2]) + (0.587 * ptr[1]) + (0.114 * ptr[0]));

                    // set the gray value.
                    ptr[0] = ptr[1] = ptr[2] = y;

                    // increment the pointer.
                    ptr += 3;
                }

                // move on to the next line.
                ptr += data.Stride - data.Width * 3;
            }
        }
    }
    finally
    {
        // unlock the bits when done or when
        // an exception has been thrown.
        image.UnlockBits(data);
    }
}
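
If compiling with /unsafe is not an option at all, the same idea can be sketched without pointers by copying the locked data into a managed array with Marshal.Copy. This is a minimal sketch and not part of the original code: the class name SafeGrayscale and the helper GrayscaleBuffer are made up for illustration, and it assumes a 24bpp bitmap with a positive stride (top-down layout).

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;

public static class SafeGrayscale
{
    // Hypothetical helper: grayscales a 24bpp buffer (B, G, R byte order) in place.
    // Padding bytes at the end of each row (stride > width * 3) are left untouched.
    public static void GrayscaleBuffer(byte[] buffer, int width, int height, int stride)
    {
        for (int row = 0; row < height; row++)
        {
            int offset = row * stride;
            for (int col = 0; col < width; col++, offset += 3)
            {
                // same luma weights as in the unsafe version above.
                byte y = (byte)((0.299 * buffer[offset + 2]) + (0.587 * buffer[offset + 1]) + (0.114 * buffer[offset]));
                buffer[offset] = buffer[offset + 1] = buffer[offset + 2] = y;
            }
        }
    }

    public static void GrayScaleImage(Bitmap image)
    {
        if (image == null)
            throw new ArgumentNullException("image");
        if (image.PixelFormat != PixelFormat.Format24bppRgb)
            throw new ArgumentException("Only 24bpp RGB bitmaps are supported.", "image");

        var rect = new Rectangle(0, 0, image.Width, image.Height);
        BitmapData data = image.LockBits(rect, ImageLockMode.ReadWrite, image.PixelFormat);
        try
        {
            // assumption of this sketch: top-down bitmap, so Scan0 is the lowest address.
            if (data.Stride < 0)
                throw new NotSupportedException("Bottom-up bitmaps are not handled in this sketch.");

            var buffer = new byte[data.Stride * data.Height];
            Marshal.Copy(data.Scan0, buffer, 0, buffer.Length);   // unmanaged -> managed
            GrayscaleBuffer(buffer, data.Width, data.Height, data.Stride);
            Marshal.Copy(buffer, 0, data.Scan0, buffer.Length);   // managed -> unmanaged
        }
        finally
        {
            image.UnlockBits(data);
        }
    }
}
```

The two copies cost some time and memory compared to the pointer version, but the result should still be far faster than calling GetPixel and SetPixel for every pixel.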

Published on Dec 28th, 2007

Naked people detection in videos (part 3)

I’m doing a university project where the goal is to detect naked people in videos. It’s going to be some kind of filtering algorithm that should allow webmasters of video sites to decide which videos should be removed and which not. I have already published part 1 and part 2, which contain some of my findings. It’s now time for a new post where I lay out my latest findings and results :)

Since the last post I have read a few papers on face detection via skin recognition. That seems like a good approach for detecting skin, since these papers find the faces of persons via their skin color. The papers I read are all referenced in the one I mentioned in part 2 of this series:

  • D. Chai, K.N. Ngan, Locating facial region of a head-and-shoulders color image, in: Proceedings of Third IEEE International Conference on Automatic Face and Gesture Recognition (FG ’98), Nara, Japan, April 1998, pp. 124–129.
  • D. Chai, K.N. Ngan, Face segmentation using skin-color map in videophone applications, IEEE Trans. Circuits Syst. for Video Technol. 9 (4) (1999) 551–564.
  • S.L. Phung, A. Bouzerdoum, D. Chai, A novel skin color model in YCBCR color space and its application to human face detection, in: Proceedings of IEEE International Conference on Image Processing, vol. I, 2002, pp. 289–292.

The most interesting is the one from 1998 because the other papers are based on that approach and add little that is relevant for the work here. The idea in the paper is to perform the following four steps to detect the skin:

The first step converts the image into the YCbCr color space. The ranges 77-127 for Cb and 133-173 for Cr are used to detect “skin pixels” in the image: if the Cb and Cr values of a pixel fall into these ranges, it is treated as a possible skin pixel. The Y component is ignored because in YCbCr it only holds the brightness of the pixel, so by measuring only Cb and Cr we can cover all skin tones. Every skin pixel becomes solid black and every other pixel solid white.
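
As a minimal sketch, the range test of step 1 could look like the following. The class and method names are made up for illustration, and the mask uses true for “skin” (black) and false for everything else (white).

```csharp
using System;

public static class SkinRange
{
    // Ranges taken from the text above.
    public const int CbMin = 77, CbMax = 127;
    public const int CrMin = 133, CrMax = 173;

    // The Y component is deliberately not part of the test.
    public static bool IsSkin(byte cb, byte cr)
    {
        return cb >= CbMin && cb <= CbMax && cr >= CrMin && cr <= CrMax;
    }

    // Builds the step 1 mask from separate Cb and Cr planes of equal size.
    public static bool[,] BuildMask(byte[,] cb, byte[,] cr)
    {
        int height = cb.GetLength(0), width = cb.GetLength(1);
        if (cr.GetLength(0) != height || cr.GetLength(1) != width)
            throw new ArgumentException("Cb and Cr planes must have the same size.");

        var mask = new bool[height, width];
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                mask[y, x] = IsSkin(cb[y, x], cr[y, x]);
        return mask;
    }
}
```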

To convert the RGB values into YCbCr I used the following formula:



This formula is also used by JPEG and MPEG to encode images and videos. It simply converts the RGB values into YCbCr values and uses the full value range of a byte. The German Wikipedia article about the YCbCr color space lists other formulas as well, but those don’t use the full range of the byte because part of it is reserved for control information. That information is used by TVs to understand how to process the image. It is not required here because we work on plain images, so we can use the whole range.
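
The formula image itself did not survive on this page. The sketch below uses the full-range conversion defined for JPEG (JFIF), which fits the description above; treat the exact coefficients as an assumption about what the missing image showed. The class and method names are made up for illustration.

```csharp
using System;

public static class YCbCrConverter
{
    // Full-range (0-255) RGB -> YCbCr conversion as used by JPEG/JFIF.
    public static void FromRgb(byte r, byte g, byte b, out byte y, out byte cb, out byte cr)
    {
        y  = ToByte(0.299 * r + 0.587 * g + 0.114 * b);
        cb = ToByte(128 - 0.168736 * r - 0.331264 * g + 0.5 * b);
        cr = ToByte(128 + 0.5 * r - 0.418688 * g - 0.081312 * b);
    }

    // Rounds to the nearest integer and clamps into the byte range.
    private static byte ToByte(double value)
    {
        return (byte)Math.Max(0, Math.Min(255, Math.Round(value)));
    }
}
```

A neutral gray pixel has no color information, so both Cb and Cr end up at the midpoint 128, and only Y carries the brightness.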

The second step downscales the image from step 1 to width / 4 and height / 4. While downscaling, the black pixels are counted: each black pixel adds 1 to the resulting pixel, which gives a value of 16 if all 16 source pixels (one 4x4 block) from step 1 are black.

After that we have to weigh the pixels to decide whether a pixel is still “skin” in the resulting image. A pixel whose source pixels are all black (value = 16) stays black (skin) if more than 5 of its neighbors are also fully black (value = 16); otherwise it is made white. A pixel that is white or gray (a value below 16) is made black if more than 2 of its neighbors are fully black (value = 16).
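
Step 2 can be sketched as follows. The names are made up for illustration, and the sketch makes a few assumptions the text does not spell out: “neighbors” means the 8 surrounding pixels, positions outside the image count as not black, and rows or columns that don’t fill a complete 4x4 block are dropped.

```csharp
using System;

public static class DensityStep
{
    // Counts the skin pixels of every 4x4 block; each result value is 0..16.
    public static int[,] BuildDensityMap(bool[,] skin)
    {
        int height = skin.GetLength(0) / 4, width = skin.GetLength(1) / 4;
        var density = new int[height, width];
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                for (int dy = 0; dy < 4; dy++)
                    for (int dx = 0; dx < 4; dx++)
                        if (skin[y * 4 + dy, x * 4 + dx])
                            density[y, x]++;
        return density;
    }

    // Applies the neighbor rule: true = black (skin), false = white.
    public static bool[,] Regularize(int[,] density)
    {
        int height = density.GetLength(0), width = density.GetLength(1);
        var result = new bool[height, width];
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                int full = CountFullNeighbors(density, y, x);
                // full pixels need more than 5 full neighbors, all others more than 2.
                result[y, x] = density[y, x] == 16 ? full > 5 : full > 2;
            }
        }
        return result;
    }

    // Counts the neighbors (out of 8) that have the full value of 16.
    private static int CountFullNeighbors(int[,] density, int y, int x)
    {
        int height = density.GetLength(0), width = density.GetLength(1), count = 0;
        for (int dy = -1; dy <= 1; dy++)
        {
            for (int dx = -1; dx <= 1; dx++)
            {
                if (dy == 0 && dx == 0) continue;
                int ny = y + dy, nx = x + dx;
                if (ny >= 0 && ny < height && nx >= 0 && nx < width && density[ny, nx] == 16)
                    count++;
            }
        }
        return count;
    }
}
```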

The third step takes the output image of step 2 and calculates the average and the variance for a neighborhood of 16 pixels. Both are calculated on the original image, using the Y component. That’s also why I selected 16 pixels: they represent exactly one pixel in the downscaled version of the image.

If the variance is below a certain threshold (I selected 4) the pixel is turned white. Otherwise the pixel is turned black. The idea here is that the background is usually very flat (in pictures) and that the skin of a person (the face in particular) is not flat. This works great for faces, but has some problems when you have, for example, a back in the image. The back of a person is usually also very flat and hard to distinguish from the background.
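
A minimal sketch of the variance test, with made-up names. It assumes the population variance over the 4x4 block of Y values that belongs to a downscaled pixel, and it assumes that only pixels that were still black after step 2 can stay black.

```csharp
using System;

public static class VarianceStep
{
    // Variance of the 16 luma (Y) values behind one downscaled pixel.
    public static double BlockVariance(byte[,] luma, int blockY, int blockX)
    {
        double sum = 0, sumOfSquares = 0;
        for (int dy = 0; dy < 4; dy++)
        {
            for (int dx = 0; dx < 4; dx++)
            {
                double value = luma[blockY * 4 + dy, blockX * 4 + dx];
                sum += value;
                sumOfSquares += value * value;
            }
        }
        double mean = sum / 16.0;
        return sumOfSquares / 16.0 - mean * mean;
    }

    // Turns flat regions white (false); mask is the output of step 2.
    public static bool[,] Apply(bool[,] mask, byte[,] luma, double threshold)
    {
        int height = mask.GetLength(0), width = mask.GetLength(1);
        var result = new bool[height, width];
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                result[y, x] = mask[y, x] && BlockVariance(luma, y, x) >= threshold;
        return result;
    }
}
```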

The fourth step is very similar to the second step. It’s again time to weigh the pixels by their neighborhood to decide whether a pixel should stay black or turn white. This time the weighting is a little bit different: if the pixel is black (a value of 1 in this case, since we don’t downscale again) and 3 or more neighbors are black, it stays black; otherwise it is turned white. If the pixel is white and 5 or more neighbors are black, it is turned black; otherwise it stays white.
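
The neighbor rule of this step can be sketched like this. The names are made up, and as before the sketch assumes the 8 surrounding pixels as neighborhood and treats positions outside the image as white.

```csharp
using System;

public static class GeometricStep
{
    // mask: true = black (skin). Returns a new mask; the input is not modified.
    public static bool[,] Apply(bool[,] mask)
    {
        int height = mask.GetLength(0), width = mask.GetLength(1);
        var result = new bool[height, width];
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                int black = CountBlackNeighbors(mask, y, x);
                // black pixels need 3+ black neighbors to survive,
                // white pixels need 5+ black neighbors to turn black.
                result[y, x] = mask[y, x] ? black >= 3 : black >= 5;
            }
        }
        return result;
    }

    private static int CountBlackNeighbors(bool[,] mask, int y, int x)
    {
        int height = mask.GetLength(0), width = mask.GetLength(1), count = 0;
        for (int dy = -1; dy <= 1; dy++)
        {
            for (int dx = -1; dx <= 1; dx++)
            {
                if (dy == 0 && dx == 0) continue;
                int ny = y + dy, nx = x + dx;
                if (ny >= 0 && ny < height && nx >= 0 && nx < width && mask[ny, nx])
                    count++;
            }
        }
        return count;
    }
}
```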

This step also involves a horizontal and a vertical scan of the image. If fewer than 4 black pixels sit next to each other in a row or column, they are all removed. This is done to remove some of the noise that might still be in the image.

After that the resulting black pixels are counted and a percentage is calculated.
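
A minimal sketch of the scan and of the final percentage, with made-up names. It assumes that the rule applies to runs of consecutive black pixels and that the horizontal scan runs before the vertical one; the text does not state the order.

```csharp
using System;

public static class RunFilter
{
    // Removes runs of black (true) pixels shorter than minRun,
    // first along the rows, then along the columns.
    public static bool[,] RemoveShortRuns(bool[,] mask, int minRun)
    {
        var result = (bool[,])mask.Clone();
        int height = result.GetLength(0), width = result.GetLength(1);

        for (int y = 0; y < height; y++)
        {
            int x = 0;
            while (x < width)
            {
                if (!result[y, x]) { x++; continue; }
                int start = x;
                while (x < width && result[y, x]) x++;
                if (x - start < minRun)
                    for (int i = start; i < x; i++) result[y, i] = false;
            }
        }

        for (int x = 0; x < width; x++)
        {
            int y = 0;
            while (y < height)
            {
                if (!result[y, x]) { y++; continue; }
                int start = y;
                while (y < height && result[y, x]) y++;
                if (y - start < minRun)
                    for (int i = start; i < y; i++) result[i, x] = false;
            }
        }
        return result;
    }

    // Share of black pixels in the mask, in percent.
    public static double SkinPercentage(bool[,] mask)
    {
        if (mask.Length == 0) return 0.0;
        int black = 0;
        foreach (bool pixel in mask)
            if (pixel) black++;
        return 100.0 * black / mask.Length;
    }
}
```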

The results of the algorithm are really poor; I expected a lot more from it. It seems to me that the algorithm is tuned to detect faces only and no other parts of the body. In some scenes it also finds arms and similar parts, but that only works because arms look like faces to this algorithm. Backs aren’t detected at all. To show some results I extracted two frames of a short scene from “Baywatch”. In the first one the algorithm works really well. In the second one it doesn’t work at all - you can see that the main part facing the camera is the back of the guy:

[Images: 92-0.jpg, 92-1.jpg, 92-2.jpg, 92-3.jpg, 92-4.jpg]

[Images: 213-0.jpg, 213-1.jpg, 213-2.jpg, 213-3.jpg, 213-4.jpg]

My next idea was to remove the variance calculation so that all possible skin pixels are included, accepting that some wrong pixels would be detected as skin too. But that was before I tried the algorithm on another video (remember: black is skin):

[Images: 0-0.jpg, 0-1.jpg]

“Houston, we have a problem!” It seems to me that the approach based on the YCbCr color space already has problems detecting skin at all. As you can see in the two images, all of the background is detected as skin and the skin itself isn’t. I don’t know if that can be fixed by adjusting the ranges, but that would change the results on the “Baywatch” video and I’m sure they won’t get better there…

But since this is still an ongoing project I’m going to investigate further and post about my findings - stay tuned!

Published on Dec 18th, 2007

Naked people detection in videos (part 2)

I’m still working on the university project about naked image detection in videos. My professor sent me a nice paper that I read yesterday. Its title is: Naked image detection based on adaptive and extensible skin color model by Jiann-Shu Lee et al.

They have some nice ideas in the paper, like switching to another color space where the different tones of the skin aren’t important anymore. You ignore one channel in that new space (YCbCr) to cover all the different skin tones: black, white, brown etc. The approach sounds very promising to me. Another thing, which I’ll probably not implement, is that they train a neural network to get the best values for the skin color. I don’t know if I have the time to do that. The algorithm also groups pixels together to create one or more blobs that represent the skin areas in the picture.

They also make very strong assumptions, which might not hold for videos. For example, they assume that the naked person is in the middle of the image. That’s true for images but not for videos, where the person might move through the visible area in one or more directions. They also use face recognition to decide if a skin blob in the image is a person or not. That improves their results dramatically, but it isn’t very suitable for me either, because in a video the face isn’t always visible. More often it is the other way around: if you have a face (like a face closeup sequence), it shouldn’t be classified as a naked scene because it is part of a normal movie. Whereas when you have no face and see only the body, it should be classified as a sequence with naked persons in it - among the sample videos that I got there are a lot of amateur videos where people show only their naked body and not the face at all.

The problem hasn’t gotten any easier so far.

What’s also a little disappointing is that they say in the paper that an SVM (Support Vector Machine) is not very usable for this kind of pattern recognition, because it often produces wrong results.

I have also found a nice project from Microsoft Research, called Accelerator. A guy in the Channel 9 forums posted a question about it and it seems very usable for my color detector. It moves code from the CPU to the GPU and uses the power of the GPU to process images or even to do other things like general-purpose computing. I hope I can speed things up by using it.

Published on Dec 5th, 2007

How to automatically classify videos as pr0n?

I got an interesting assignment from university. I’m attending a course where we create software that analyzes video material and reasons about it. Think of algorithms that understand which parts have moved and that find blobs of data that make up different things (like a person or something else) in the video.

My assignment is to find out whether videos contain adult-only scenes (aka pr0n). The uni department got a request from a company to design something like that. I guess it has to do with automatic filtering of video content for a website (something like YouTube). The assignment says that the video should be analyzed and the probability of it being pr0n should be returned.

The first thing I did was look around for existing solutions. And guess what: I didn’t find much. I searched for how YouTube or Google Video filter adult material and found out that they rely completely on the community. People can flag videos as pr0n and an employee then checks whether that video should be removed. The same happens on Google Video. It’s also interesting to know that the community is rather slow and new material is only filtered after a few hours, which means that YouTube could possibly get problems in the future.

Next I thought: but how do they filter copyrighted material? Is that also done by the community? No! They use the sound stream to do that. They create a sound profile for a movie (or the company that produces the movie does that) and validate each uploaded video against that sound profile. If they get a match they remove the video. Sound is also much easier to classify than images. Images may be blurry or of very low quality. Sound, on the other hand, keeps a certain profile even at low quality.

But that doesn’t work for pr0n because even somebody with a webcam can produce such material. You can’t validate against a sound profile to find out if a movie contains adult content. ;)

So there I was with a problem: how to classify these videos? I started coding, and what I have done so far (in a few hours of coding) is to let people select a color and a tolerance level. For example, you select R: 100 G: 200 B: 150 and a tolerance of R: 50 G: 50 B: 50, which means that all pixels that fall into the range R: 50 - 150 G: 150 - 250 B: 100 - 200 are classified as possible “skin pixels”. They are counted and compared against the total number of pixels in the image. That is then done for all images in the video to get an average score of “skin” in the movie.

What’s interesting is that this method returns quite good results. I have a few videos here and the ones with a lot of skin have been classified correctly. I even have a video with a pig running around and that isn’t classified as human skin at all. It’s very interesting.

I don’t know how stable this algorithm is going to be. I think it is not very solid because it depends heavily on the light and the skin color, and I don’t know if this is what I’m going to end up with. But I’ll keep you guys posted about what I get out of my research in this area :)
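
The color and tolerance test described above can be sketched like this. All names are made up for illustration, and the sketch assumes that every frame is available as a byte array of packed R, G, B triples.

```csharp
using System;
using System.Collections.Generic;

public static class ColorToleranceDetector
{
    // True if every channel is within its tolerance of the reference color.
    public static bool IsCandidate(byte r, byte g, byte b,
                                   byte refR, byte refG, byte refB,
                                   int tolR, int tolG, int tolB)
    {
        return Math.Abs(r - refR) <= tolR
            && Math.Abs(g - refG) <= tolG
            && Math.Abs(b - refB) <= tolB;
    }

    // Share (0..1) of candidate pixels in one frame of packed R, G, B bytes.
    public static double SkinFraction(byte[] frame,
                                      byte refR, byte refG, byte refB,
                                      int tolR, int tolG, int tolB)
    {
        int pixels = frame.Length / 3;
        if (pixels == 0) return 0.0;

        int hits = 0;
        for (int i = 0; i + 2 < frame.Length; i += 3)
            if (IsCandidate(frame[i], frame[i + 1], frame[i + 2], refR, refG, refB, tolR, tolG, tolB))
                hits++;
        return (double)hits / pixels;
    }

    // Average score over all frames of a video.
    public static double AverageScore(IEnumerable<byte[]> frames,
                                      byte refR, byte refG, byte refB,
                                      int tolR, int tolG, int tolB)
    {
        double sum = 0;
        int count = 0;
        foreach (byte[] frame in frames)
        {
            sum += SkinFraction(frame, refR, refG, refB, tolR, tolG, tolB);
            count++;
        }
        return count == 0 ? 0.0 : sum / count;
    }
}
```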

Ah yes, as a side note: I have used the .NET Parallel FX library to process the different frames of the movie on the different cores. I don’t own a multi core CPU but I’ll test it on my dad’s PC to see if the different cores are leveraged properly :D
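
Processing the frames in parallel can be sketched as follows. This sketch uses Parallel.For from System.Threading.Tasks, the API that later shipped with .NET 4; the preview library from 2007 may have used different namespaces and signatures. The names ParallelFrames and ScoreFrames and the scoreFrame delegate are made up for illustration.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelFrames
{
    // Scores every frame independently. Each iteration writes only to its
    // own slot of the result array, so no locking is needed.
    public static double[] ScoreFrames(byte[][] frames, Func<byte[], double> scoreFrame)
    {
        var scores = new double[frames.Length];
        Parallel.For(0, frames.Length, i =>
        {
            scores[i] = scoreFrame(frames[i]);
        });
        return scores;
    }
}
```

Because the frames don’t depend on each other, the runtime can spread the iterations over all available cores; on a single core CPU the code still works but runs no faster.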

Published on Dec 2nd, 2007

Fascinating algorithm to resize pictures

Dr. Ariel Shamir from the Efi Arazi School of Computer Science is one of the co-inventors of a clever algorithm to resize images. Usually, if you have an image that is too big, you crop or resize it. If you resize it, important information might get lost because some parts of the picture get so tiny that you can’t see them anymore (they might even shrink to a sub-pixel level where they are merged with other pixels). On the other hand, if you crop the image, parts are missing completely. That could also include important information! Another problem would be two hot spots in an image: you would need to crop and merge the two pieces somehow, which isn’t always an easy task.

Now the people around Dr. Shamir have invented a way to resize pictures without losing important details. They have published a video on how their algorithm works. The paper describing the whole algorithm and ideas is found here.

A few days ago Dr. Shamir was hired by Adobe. Wouldn’t it be very cool if we got the algorithm as a feature in one of the next Photoshop versions?

Published on Aug 30th, 2007