
Aug 4, 2012

N3 - yet another SixthSense project

Introduction

This article describes my new project N3, which follows the idea of SixthSense (hereinafter 'SS'), a 'wearable gesture interface that augments the physical world around us with digital information'.

Motivation

I am a big fan of SS. In the TED video, the original developer Pranav showed a lot of examples of future technology and great prospects for user interfaces. The source code of SS was open-sourced, and I tried it on my PC the other day. It worked, but it seemed to have some problems.

  • The first problem is recognition precision. SS recognizes the color markers, but the precision is not so good. SS uses the Touchless SDK to detect markers, and the precision depends entirely on this SDK, so improving it is not easy. 
  • Second, the portability of the original SS is not good, because it depends on DirectX and runs only on Windows. SS is a 'wearable' application, so I think a more cross-platform framework should be used so that SS can run on mobile devices.
  • The third problem is the structure of the original SS source code. The source code is not well structured, and it takes extra effort to add or change functionality. As an open source project, the source code itself should be attractive, I believe.
Of course, in spite of these problems, SS is a great product, and I believe the essence of SS is its idea. The idea of a wearable gesture interface which can bring the digital world into the real world is a great invention. However, to improve SS, I believe these problems should be fixed. That drove me to make my own SS -- N3.


N3's Approach

For N3, I am using OpenCV for most of the logic, including marker recognition. OpenCV should fix SS's recognition precision problem and its portability problem. The recognition method I came up with first is color histogram comparison. OpenCV has a lot of convenient functions for color histogram comparison, and it's easier to tune. Currently I am using a very simple algorithm: first convert the image into HSV planes, compute a histogram from the H and S planes using cvCalcHist(), calculate the back projection image using cvCalcBackProject(), and extract the maximum matched area.
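As a minimal sketch of this step, here is the idea using the C++ API's cv::calcHist / cv::calcBackProject, the counterparts of the C functions above. The function name and bin counts are illustrative assumptions, not N3's actual code.

```cpp
#include <opencv2/opencv.hpp>

// Sketch: build an H-S histogram of the marker and back-project it
// onto the current frame. Bright pixels in the result are likely
// marker pixels; the maximum matched area can then be extracted with
// thresholding and contour analysis.
cv::Mat markerBackProjection(const cv::Mat& frameBGR, const cv::Mat& markerBGR)
{
    cv::Mat frameHSV, markerHSV;
    cv::cvtColor(frameBGR, frameHSV, cv::COLOR_BGR2HSV);
    cv::cvtColor(markerBGR, markerHSV, cv::COLOR_BGR2HSV);

    int histSize[] = {30, 32};                  // H and S bin counts (tunable)
    float hRange[] = {0, 180}, sRange[] = {0, 256};
    const float* ranges[] = {hRange, sRange};
    int channels[] = {0, 1};                    // use the H and S planes

    cv::Mat hist;
    cv::calcHist(&markerHSV, 1, channels, cv::Mat(), hist, 2, histSize, ranges);
    cv::normalize(hist, hist, 0, 255, cv::NORM_MINMAX);

    cv::Mat backProj;
    cv::calcBackProject(&frameHSV, 1, channels, hist, backProj, ranges);
    return backProj;
}
```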

Draw App Demo

Here is my draw app demo video. This is a bit different from SS's original draw app. The difference is that the projection plane is calibrated by projecting a chessboard, so that we can draw at the exact point our finger touches and it feels like drawing on the wall. For camera calibration, cvFindHomography() is used to find the homography matrix between the coordinates of the chessboard in the projector image and those in the camera image.
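A minimal sketch of this calibration step, assuming the projected chessboard's corner coordinates in the projector image are already known (cv::findChessboardCorners and cv::findHomography are the C++ counterparts of the C functions; the function name is mine):

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Sketch: find the projected chessboard in the camera image and compute
// the homography mapping camera coordinates to projector coordinates, so
// a fingertip detected in the camera frame can be drawn at the matching
// point on the wall.
cv::Mat calibrateProjection(const cv::Mat& cameraFrame,
                            const std::vector<cv::Point2f>& projectorCorners,
                            cv::Size boardSize)   // inner corners, e.g. 9x6
{
    std::vector<cv::Point2f> cameraCorners;
    cv::Mat gray;
    cv::cvtColor(cameraFrame, gray, cv::COLOR_BGR2GRAY);
    if (!cv::findChessboardCorners(gray, boardSize, cameraCorners))
        return cv::Mat();   // chessboard not found in this frame

    return cv::findHomography(cameraCorners, projectorCorners);
}
```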


Gesture and Physical World Demo

One more demo is an application which recognizes a circle gesture and creates a circle in a virtual physical world. For gesture recognition, I am using baylor wetzel's C++ port of the $1 Unistroke Recognizer, which the original SS uses as well. In this application, we can also get a real tissue box mixed into the virtual world. To do so, I use the SURF feature detector to extract key points from a tissue box image and find it in a camera frame. For the physics engine, Box2D is used.
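For illustration, a rough sketch of the keypoint-matching step with OpenCV 2.x's nonfree SURF module looks like this (the Hessian threshold and function name are assumptions, not N3's actual code):

```cpp
#include <opencv2/opencv.hpp>
#include <opencv2/nonfree/features2d.hpp>   // SURF lives here in OpenCV 2.4
#include <vector>

// Sketch: extract SURF keypoints/descriptors from the object image and
// the camera frame, then match them. The good matches can feed
// cv::findHomography to locate the box in the frame.
std::vector<cv::DMatch> matchObject(const cv::Mat& objectImg, const cv::Mat& frame)
{
    cv::SurfFeatureDetector detector(400);   // Hessian threshold (tunable)
    cv::SurfDescriptorExtractor extractor;

    std::vector<cv::KeyPoint> objKeys, frameKeys;
    cv::Mat objDesc, frameDesc;
    detector.detect(objectImg, objKeys);
    detector.detect(frame, frameKeys);
    extractor.compute(objectImg, objKeys, objDesc);
    extractor.compute(frame, frameKeys, frameDesc);

    cv::FlannBasedMatcher matcher;
    std::vector<cv::DMatch> matches;
    matcher.match(objDesc, frameDesc, matches);
    return matches;
}
```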


Full Source Code of N3

The source code is on my github. It's not well documented now, but if someone is interested in this project, I'll put more effort into documentation.

Conclusion

This was a brief introduction to N3. It uses OpenCV as its base library, and it aims to improve recognition precision, portability, and source code structure. Currently some applications are implemented, and the following technologies are integrated into them.

  • Color Marker Detection and Tracking
  • Camera Calibration
  • Object Recognition
  • Gesture Recognition
  • Physics Engine
I will improve these technologies and add more features so that N3 can be more useful and practical. Stay tuned!

Jan 1, 2012

Image Morphing


Happy new year. I've started this blog today with an article about computer vision.

Introduction

In this article, I will explain "Image Morphing", which means gradually changing the shape of one object into the shape of another. I wrote a program using OpenCV. My algorithm is based on this slide. My code morphs me into a "dragon". Why a dragon? In Japan, each year is associated with an animal according to a 12-year cycle, and 2012 is the year of the dragon. So, this program is a kind of new year greeting to readers :)

Overview

Image Morphing takes two images and a mapping between their feature points as input, and is composed of two phases. First, the images are warped so that their corresponding feature points are at the same positions. Second, the two images are blended into one image using cross-dissolving. So the images are warped and cross-dissolved into one image while a time parameter increases. I will explain these methods in detail.

Warping

Warping is required to make corresponding pixels in the two images be at the same positions. To do so, we have to prepare pairs of lines. In this experiment, I use these two images. The left is the source image, and the right is the destination image.

[source image (left) and destination image (right)]

The pairs of lines for these images are below.
First, the lines of the source image:

[srcFeature.txt]
15
74 130 122 1
74 130 82 132
82 132 106 135
106 135 153 137
106 135 118 177
118 177 153 137
122 1 205 130
74 130 81 209
205 130 208 219
81 209 123 266
208 219 123 266
98 211 119 226
119 226 159 215
98 211 119 206
119 206 159 215
And the lines of the destination image are below.

[destFeature.txt]
15
84 83 147 20
84 83 96 82
96 82 119 88
119 88 184 88
119 88 154 127
154 127 184 88
147 20 230 82
84 83 75 135
230 82 221 135
75 135 156 275
221 135 156 275
80 135 156 257
156 257 217 140
80 135 151 154
151 154 217 140
The first line is the number of lines n. Each of the next n lines contains four integers, Px, Py, Qx, Qy, which represent the coordinates of the starting point and the ending point. For example, the first line of the left image's list goes from (74, 130) to (122, 1), which means from the left ear to the top of the head, and the first line of the right image's list is the corresponding one.
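For illustration, a small sketch of reading a feature file in this format (the Line struct and readLines() are names I use only for these sketches):

```cpp
#include <opencv2/opencv.hpp>
#include <cstdio>
#include <vector>

struct Line { cv::Point2f P, Q; };   // one feature line: start P, end Q

std::vector<Line> readLines(const char* path)
{
    std::vector<Line> lines;
    std::FILE* fp = std::fopen(path, "r");
    if (!fp) return lines;

    int n = 0;
    std::fscanf(fp, "%d", &n);        // first line: the number of lines
    for (int i = 0; i < n; ++i) {
        Line l;                       // each line: Px Py Qx Qy
        std::fscanf(fp, "%f %f %f %f", &l.P.x, &l.P.y, &l.Q.x, &l.Q.y);
        lines.push_back(l);
    }
    std::fclose(fp);
    return lines;
}
```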

We need to compute the warped image from these pairs. If the location of X' in the destination image is given, how do we calculate the location of X in the source image which corresponds to X'?

First, we calculate u and v. The value u goes from 0 to 1 as the pixel moves from P' to Q'. The value v is the perpendicular distance from the line to X'. Once we have u and v, we can apply them to the corresponding line in the source image, and so the coordinate of X is acquired.
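For reference, in the Beier-Neely formulation (which, as far as I can tell, the slide follows), u, v, and X are computed as:

```latex
u = \frac{(X' - P') \cdot (Q' - P')}{\lVert Q' - P' \rVert^{2}}, \qquad
v = \frac{(X' - P') \cdot \mathrm{perp}(Q' - P')}{\lVert Q' - P' \rVert}
```

```latex
X = P + u\,(Q - P) + \frac{v \cdot \mathrm{perp}(Q - P)}{\lVert Q - P \rVert}
```

Here P'Q' is a line in the destination image, PQ is the corresponding line in the source image, and perp() rotates a vector by 90 degrees.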

In this case, we have multiple pairs of lines, so we average the X coordinates computed from each pair, weighted by a factor. The factor is set to be inversely proportional to the distance from the line and directly proportional to the length of the line. (The details are described in the slide mentioned in "Introduction".)
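Concretely, the weight has the form below, where L_i is the length of line i, d_i is the distance from X' to that line, and a, b, p are tuning constants (I am assuming the same form as in the Beier-Neely paper):

```latex
w_i = \left( \frac{L_i^{\,p}}{a + d_i} \right)^{b}, \qquad
X = X' + \frac{\sum_i w_i \,(X_i - X')}{\sum_i w_i}
```

where X_i is the source position computed from line pair i alone.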

By calculating the source pixel for each pixel in the destination image, we get a warped image whose shape is closer to the destination image.
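A simplified sketch of this per-pixel multi-line warp, following the formulas above, is below. It assumes 8-bit BGR images and uses nearest-neighbor sampling; Line is the struct from the file-reading sketch, and a, b, p are the tuning constants just mentioned.

```cpp
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cmath>
#include <vector>

struct Line { cv::Point2f P, Q; };   // same layout as in the reader sketch

static cv::Point2f perp(const cv::Point2f& v) { return cv::Point2f(-v.y, v.x); }

cv::Mat warpImage(const cv::Mat& src,
                  const std::vector<Line>& srcLines,
                  const std::vector<Line>& dstLines,
                  float a = 1.0f, float b = 2.0f, float p = 0.5f)
{
    cv::Mat out(src.size(), src.type());
    for (int y = 0; y < out.rows; ++y) {
        for (int x = 0; x < out.cols; ++x) {
            cv::Point2f X((float)x, (float)y), sum(0.f, 0.f);
            float wSum = 0.f;
            for (size_t i = 0; i < dstLines.size(); ++i) {
                // u, v of X relative to the destination line P'Q'.
                cv::Point2f PQ = dstLines[i].Q - dstLines[i].P;
                float len2 = PQ.dot(PQ), len = std::sqrt(len2);
                float u = (X - dstLines[i].P).dot(PQ) / len2;
                float v = (X - dstLines[i].P).dot(perp(PQ)) / len;
                // Apply u, v to the corresponding source line PQ.
                cv::Point2f pq = srcLines[i].Q - srcLines[i].P;
                float slen = std::sqrt(pq.dot(pq));
                cv::Point2f Xi = srcLines[i].P + u * pq + perp(pq) * (v / slen);
                // Distance to the segment: |v| beside it, else to an endpoint.
                cv::Point2f e = X - (u < 0.f ? dstLines[i].P : dstLines[i].Q);
                float dist = (u >= 0.f && u <= 1.f) ? std::abs(v)
                                                    : std::sqrt(e.dot(e));
                float w = std::pow(std::pow(len, p) / (a + dist), b);
                sum += Xi * w;
                wSum += w;
            }
            cv::Point2f Xsrc = sum * (1.f / wSum);   // weighted average
            int sx = std::max(0, std::min(src.cols - 1, (int)Xsrc.x));
            int sy = std::max(0, std::min(src.rows - 1, (int)Xsrc.y));
            out.at<cv::Vec3b>(y, x) = src.at<cv::Vec3b>(sy, sx);
        }
    }
    return out;
}
```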

Cross-dissolving

Cross-dissolving is simpler than warping. We already have two similarly shaped images. For each pixel in the destination image, we mix the values of the corresponding source pixel and destination pixel. The ratio of the mixture depends on the time parameter, which goes from 0 to 1: at the beginning (t=0) the ratio of the source image is 1 and that of the destination image is 0, and at the end (t=1) vice versa.
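With OpenCV this step is essentially a one-liner; a minimal sketch, assuming the two warped images have the same size and type:

```cpp
#include <opencv2/opencv.hpp>

// Sketch: blend the warped source and destination with weights (1-t) and t.
cv::Mat crossDissolve(const cv::Mat& warpedSrc, const cv::Mat& warpedDst, double t)
{
    cv::Mat blended;
    cv::addWeighted(warpedSrc, 1.0 - t, warpedDst, t, 0.0, blended);
    return blended;
}
```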

Warping+cross-dissolving=Morphing

As I mentioned above, the time parameter goes from 0 to 1. To morph the source image into the destination image, we first compute the intermediate image at each time step. This means that both the source image and the destination image are warped toward the intermediate feature lines. So, for each pixel in the intermediate image, we can compute the corresponding pixels in both the source image and the destination image. After that, we apply cross-dissolving to these pixels.
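Putting it together, here is a sketch of the per-frame loop, reusing warpImage() and crossDissolve() from the sketches above (the frame count and file naming are illustrative, not my program's actual code):

```cpp
#include <opencv2/opencv.hpp>
#include <cstdio>
#include <vector>

// Linearly blend the endpoints of corresponding lines at time t.
std::vector<Line> interpolateLines(const std::vector<Line>& a,
                                   const std::vector<Line>& b, float t)
{
    std::vector<Line> mid(a.size());
    for (size_t i = 0; i < a.size(); ++i) {
        mid[i].P = (1 - t) * a[i].P + t * b[i].P;
        mid[i].Q = (1 - t) * a[i].Q + t * b[i].Q;
    }
    return mid;
}

void morph(const cv::Mat& srcImg, const cv::Mat& dstImg,
           const std::vector<Line>& srcLines,
           const std::vector<Line>& dstLines, int numFrames)
{
    for (int f = 0; f <= numFrames; ++f) {
        float t = (float)f / numFrames;
        std::vector<Line> midLines = interpolateLines(srcLines, dstLines, t);
        // Warp both images toward the intermediate line positions...
        cv::Mat warpedSrc = warpImage(srcImg, srcLines, midLines);
        cv::Mat warpedDst = warpImage(dstImg, dstLines, midLines);
        // ...then cross-dissolve them with ratio t and save the frame.
        cv::Mat frame = crossDissolve(warpedSrc, warpedDst, (double)t);
        char name[64];
        std::sprintf(name, "frame_%03d.png", f);
        cv::imwrite(name, frame);
    }
}
```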

Result


Source code

Notice: this source code requires OpenCV 2.0 or later, and it only outputs image files corresponding to each frame. I used ffmpeg to combine those image files into a movie file.