CSE/EE 585: Digital Image Processing II

    Computer Project Report 2 :

    Image Segmentation

    Implementation of a MDL-Based Multi-Band Image Segmentation Algorithm using a Fast Region Merging Scheme

    Group: Nils Krahnstoever, Kaizhi Tang, Chunwei Yu

    Date: 05/06/1999

    1. Objectives

    2. The goal of this project was to implement one of several state-of-the art region based image segmentation algorithm published since 1995. For this report the algorithm published by Tapas Kanungo, Byron Dom, Wayne Niblack, David Steele and Jacob Sheinvald, "MDL-Based Multi-Band Image Segmentation Using a Fast Region Merging Scheme", IBM Research Report,  RJ 9960 (87919) May 17, 1995 was implemented and tested on a set of synthetic and real images.
       

    3. Introduction

    4. The algorithm implemented for this project is utilizing the minimum description length (MDL) principle. The image segmentation is driven by an objective function that seeks to minimize the amount of information necessary to describe the segmented image. The segmented regions are described by their boundaries, polynomial surfaces and residuals. The degree of the polynomials are increased during the segmentation. The segmentation starts with single pixel regions and uses a region merging scheme that minimizes the objective function following the greedy principle.
       

    5. Summary of related work.

    6. The MDL criterion was first used for gray level images with polynomial approximations of 2nd order degree [2]. In [3], a continuation scheme was used to minimize the description length of the segmentation, resulting in a potentially better segmentation because local minima are avoided. However the continuation scheme is computationally much more expensive. The MDL principle was also used to obtain segmentation by encoding the topology of the regions [4] and for model based segmentation [5]. Many more researchers applied the MDL and other information related principles to the problem of image segmentation and other applications. A more complete summary of related work and image segmentation in general can be found in [1] and [6].
       

    7. Algorithm and Implementation Details

    8. The algorithm tries to achieve a segmentation of the image by finding the global optimum of an objective function that measures the amount of information necessary to encode the image together with its segmentation. The segmentation consists of the boundaries of the regions, the coefficients of the polynomials that describe the contents of each region and the residuals of all the pixels in each region, i.e., the difference in gray value between each pixel and the polynomial approximation.
      The algorithm assumes that the image consists of regions and that the pixel values in each region arose from stochastic processes that lead to polynomial intensity distributions plus white spatially uncorrelated Gaussian noise. Furthermore, the algorithm assumes that the segmentation of the image can be described with a minimal amount of information, i.e, that the segmentation corresponds to the minimum of the objective function.
      A segmentation is described by the boundaries of the regions.  Assuming the region boundaries are known, the parameters for the polynomials are obtained through multivariate regression. Once the boundaries and the polynomial approximations are known for each region, only the residuals of the pixel values are necessary to reproduce the image exactly. Therefore, the objective function measures the amount of information necessary to encode the boundaries (or more specifically, their lengths), the parameters of the polynomial approximations and the residuals.
      At each stage and for a given polynomial degree, the algorithm seeks to merge two regions such that the description length is maximally reduced until no such regions can be found in which case the degree of the polynomials used for the approximations is increased. The segmentation is stopped at a given maximum polynomial degree.
      One feature of the algorithm is that the merging of two regions is done incrementally, i.e., the algorithm can use available information about two regions to obtain the corresponding information for the combined regions.

      In practice, the algorithm keeps pairs of neighboring regions the decrease (or increase) in description length that would result from a merging of the regions. At each step it selects the pair that reduces the DL maximally, merges the regions and recalculates all pairs that are affected by the merger.

      For a successful implementation additional points had to be considered:
       

      • When the degree of the polynomial where increased the algorithm had difficulties if regions where present that where so small that the regression problem was underdetermined, e.g., when it tried to approximate two pixel regions with a polynomial degree of 1 (in three dimensions at least three points are necessary to specify a plane). We solved this problem by forcefully merging all regions of insufficient size with one of its neighbors.
      • The algorithm has difficulties handling single pixel regions in the beginning of the segmentation, single pixel regions tend to be assimilated by other larger regions resulting in an incorrect segmentation. The algorithm has to run in a special mode that avoids this problem until all regions are sufficiently "grown up", i.e., until all regions have a size of at least two and maximally three. We followed the suggestions outlined in [1] that initially favors the creation of two pixel regions. In addition to the solution outlined in [1] it proved to be vital to afterwards eliminate all remaining 1 pixel regions by the method mentioned in the previous item - otherwise the above problem remained.
      • For multi-band images (rather than single band images) an additional problem arose: It turned out that the algorithm terminated without ever merging any regions because no merging of regions led to a decrease of the description length. This problem was avoided by dropping all terms but the one accounting for the residuals from the objective function during the initialization phase. When no single pixel regions remained the algorithm proceeded normally.

      •  
    9. Results and Observations

    10. The algorithm was applied to a large number of different images. We considered the following classes:

      (i) single band vs. multi-band images
      (ii) synthetic vs. natural images
      (iii) noise vs. no noise (for synthetic images)

      Single band synthetic noise free  image
       
       

      _
      Figure 1: A synthetic single band image without noise and the result of the segmentation algorithm (right). The image shown here is enlarged by a factor of two.

      In Fig. 1 the application of the algorithm in case of a noise free synthetic single band image can be seen. The algorithm produced the correct segmentation. A small deviation from the perfect result can be seen in the bottom right corner. However, the result is very satisfactory.

      Single band synthetic noisy  image

      _
      Figure 2: A synthetic single band image with noise (sigma=40 pixel) and the result of the segmentation algorithm (right). The image shown here is enlarged by a factor of two.

      In Fig. 2 Gaussian noise with a sigma of 40 pixel has been added. The algorithm is able to handle the noise; it still produces a perfect segmentation.

      The cost to encode noise vs. encoding the boundaries

      _
      Figure 3: Segmentation of a synthetic wheel.

      When the algorithm is applied to the synthetic image of a 'wheel' it does not manage to capture the 'spokes' (Fig. 3). This is a surprising result. It appears as if the intensity difference of the spokes compared to the background is too small. The algorithm finds it better to encode the difference as noise rather than introducing additional boundaries. Due to its length, the boundary of the spokes is especially expensive to encode.

      _
      Figure 4: Same as Fig. 3 but with the intensity of the 'spokes' increased.

      Increasing the intensity of the spokes in Fig. 4 indeed leads to a different result but the algorithm still tries to save boundary length by merging the 'spokes' with the 'rim' of the wheel.

      _
      Figure 5: Fig. 3 with noise of sigma=40 added and the white corner removed.

      Adding noise leads to yet another result. The result is similar to the one for Fig. 3 but one of holes in the wheel was segmented. This indicates that the difference between encoding the 'spokes' and not encoding them is small.

      The above series shows that the algorithm does not always produce results that are 'natural'. The decision whether or not to segment a region depends on the length of its boundary.


      Figure 6: A single band image similar to the one considered in [1] (without noise).

      In Fig. 6 an image of a circle with a gray value gradient is shown. This image is similar to the one displayed by the authors of Ref. [1]. This image tests the influence of the degree of the polynomials that approximate the contents of the regions.

      __
      Figure 7: Segmentation with polynomials of degree 0, 1 and 2 (left to right). The outline of the ball is segmented perfectly the region boundary only blends with the image background.

      The result of the segmentation algorithm applied to Fig. 6 reveals that the algorithm produces a different result than the implementation in [1]. Even with a degree of 0 the algorithm nearly encodes the whole circle in one region. This behavior could not be traced back to an error in the algorithm. The implementation for this project appears to weigh the cost of encoding region boundaries higher than the cost of encoding the 'noise' (i.e., the residuals).

      Single band natural scene


      Figure 8: A single band image of a dolphin.

      __
      Figure 9: Segmentation of Fig. 8 with polynomials of degree 0, 1 and 2 (left to right).

      Applying the algorithm to a natural scene, the image of a dolphin, Fig. 8, yields the result in Fig. 9 for varying degrees of the polynomials. Except for the dark region of the left fin,  the algorithm manages to produce a result that does not seem unlikely. Here as with many other images of natural scenes, the question of what exactly the ground truth is, arises. The MDL objective function produces good results in many situations but the concept of MDL is not necessarily always compatible with the 'objective function' of a human.
       
       


      Figure 10: A common natural image.

      __
      Figure 11: Segmentation of Fig. 10 with polynomials of degree 0, 1 and 2 (left to right).

      Fig. 10 and 11 show another segmentation of natural scene. The result of the segmentation is not what one would expect. The algorithm seems to tend to combine neighboring regions with low contrast but also obvious regions such as the arch to the right in the background have been missed. The most likely cause of the relatively bad segmentation seems to be the low overall contrast in the image. Again, the algorithm seems to merge regions more happily than it should.

      _
      Figure 12: A part of the previous image with equalized histogram to increase the contrast.

      Fig. 12 shows parts of Fig. 10 enlarged with the contrast increased. The segmentation of this image is considerably better indicating that the low contrast in Fig. 10 was indeed the reason for the bad result.

      Multi-band synthetic images

      _
      Figure 13: A synthetic three band image without noise (RGB) and the result of the segmentation (right).

      Fig. 13 shows the segmentation of a synthetic noise free three band color image. The result is almost perfect.

      _
      Figure 14: A synthetic three band image with noise (RGB) and the result of the segmentation (right).

      Fig. 14 shows the same image as Fig. 13 with noise added. The algorithm fails to merge some small regions. Noise will most likely always have an affect like this on a segmentation.

      Multi-band natural scenes


      Figure 15: A four band color image (CMYK) of a natural scene.

      _
      Figure 16: Segmentation of Fig. 13 with polynomial degrees of  0 and 2 (left to right), segmentation of degree 1 is identical to the one of degree 2.

      Fig. 15 and 16 show the segmentation of a four band image of a natural scene. Again the question of ground truth arises but the balloons are not segmented and blend with the crowd. Some colors of the balloons where similar to the sky. All the flying balloons where segmented but the top-most was divided into many small regions. It is worth noting that the increase in degree of the polynomial 'ate' one of the balloons.

      This shows how the algorithm is, unlike humans, unaware of texture information. Although the blue color of the balloons is somewhat similar to the sky, a human can clearly distinguish the blue areas of the balloons from the sky because of the variation in texture (ignoring the obvious contextual information). The algorithm as it was applied here is not adapted to handle texture.

      It is desirable to test the algorithm on an image where the 'solution' is more obvious.


      Figure 16: Parts of the image from Fig. 13.

      __
      Figure 17: Segmentation of Fig. 16 with varying polynomial degrees (as before).

      Fig. 16 and 17 show the application of the algorithm to a natural scene where the ground truth is more obvious than in the previous example. The resulting segmentation can be considered to be very good. The balloons are clearly segmented and almost surprisingly, the blue color of the larger balloon did not merge with the sky.

      It is worth noting that the computational time of the algorithm is very high. The segmentation of Fig. 16 (141 pixel x 182 pixel x 4 bands) took 146 minutes on a 400Mhz Pentium PC.

      Increasing the weight of the residuals

      The algorithm did not show the expected behavior when the degree of the approximating polynomials was increased. As stated above our implementation of the algorithm seemed to have a tendency to merge and encode the residuals rather than encoding additional region boundaries. An ad hoc solution to this problem was to multiply the term that encodes the residuals with a small factor greater than one.

      Applying the modified algorithm to an image containing a gray value gradient gives the following result:

      __
      Figure 18: Segmentation of a gray value gradient giving more weight to the cost of encoding the residuals. The right image is a segmentation with polynomials of degree at most 2 which is virtually identical to order 0 and 1.

      Fig. 18 shows that this changed the behavior of the segmentation but that the algorithm is now reluctant to merge regions. Even with 2nd order polynomials the different gray levels are still segmented separately.
       

    11. Conclusions


    12. In this project a multi band MDL based region merging segmentation algorithm was implemented. The implementation closely follows the description in [1] but deviations and additional policies where necessary to avoid numerical instabilities.
      The implementation produced acceptable results in most cases for both single band and multi-band images. The algorithm encountered difficulties in cases with low image contrast where it favored merging regions that the eye can clearly distinguish. Due to the MDL based objective function the resulting segmentations where clearly dependent on related image properties such as the length of a region boundary. The algorithm also seemed to not utilize the different polynomial degrees as the implementation by Kanungo et al. [1]. Whether or not this was due to an error in the algorithm or due to differing implementation policies could not be determined. An experiment suggests that the different parts of the objective function where somehow weighted differently compared to the implementation in [1]. The algorithm does not depend on any parameters that could account for different behaviors.
      After all the MDL approach to image segmentation clearly appears to have many very promising properties. It is free of any parameters that could be adjusted and can be used for images with arbitrary numbers of bands. A clear disadvantage is that the algorithm is slow, even though a fast region merging scheme was implemented. In addition the memory demand is high in the beginning of the segmentation.
       
    13. List of references


    14. [1] Tapas Kanungo, Byron Dom, Wayne Niblack, David Steele and Jacob Sheinvald, "MDL-Based Multi-Band Image Segmentation Using a Fast Region Merging Scheme", IBM Research Report,  RJ 9960 (87919) May 17, 1995
      [2] Y. G. Leclerc, "Constructing simple stable descriptions for image partitioning", International Journal of Computer Vision, 3, 73-102, 1989
      [3] A. Blake, A. Zisserman, "Visual Reconstruction", MIT Press, Cambridge, MA, 1987
      [4] A. Keeler, "Minimal length encoding of planar subdivision topologies with application to image segmentation", AAAI 1990 Spring Symposium of the Theory and Application of Minimal Length Encoding, 1990
      [5] P. Fua, A. J. Hanson, "Extracting generic shapes using model driven optimization", In Proceedings of the Image Understanding Workshop, pages 994-1004, Boston 1988
      [6] R. M. Haralick and L. G. Shapiro, "Computer and Robot Vision", Vol. 1, Addison-Wesley, 1992
       
    15. Time spent doing the project:


    16. Approximately 100 man hours. Most time was spent debugging the algorithm, especially after the different part from the team members where merged.
       
    17. Code