Apr 1, 2015
Aug 4, 2012
This article describes my new project N3, which follows the idea of SixthSense (hereinafter 'SS'), a 'wearable gesture interface that augments the physical world around us with digital information'.
Motivation
I am a big fan of SS. In the TED video, the original developer Pranav showed many examples of future technology and the great prospects of this kind of user interface. The source code of SS was open-sourced, and I tried it on my PC the other day. It worked, but it seemed to have some problems.
- The first problem is recognition precision. SS recognizes color markers, but the precision is not very good. SS uses the Touchless SDK to detect markers, and the precision depends entirely on this SDK, so improving it is not easy.
- Second, the portability of the original SS is poor, because it depends on DirectX and runs only on Windows. SS is a 'wearable' application, so I think a more cross-platform framework should be used so that SS can run on mobile devices.
- The third problem is the structure of the original SS source code. The code is not well structured, and it takes extra effort to add or change features. For an open source project, I believe the source code itself should be attractive.
Of course, in spite of these problems, SS is a great product, and I believe its essence is its idea. A wearable gesture interface that brings the digital world into the real world is a great invention. However, to improve SS, these problems should be fixed. That is what drove me to make my own SS -- N3.
For N3, I am using OpenCV for most of the logic, including marker recognition. OpenCV should fix SS's problems with recognition precision and portability. The first recognition method I came up with is color histogram comparison. OpenCV has many convenient functions for it, and it is easy to tune. Currently I am using a very simple algorithm: first convert the image into HSV planes, compute a histogram from the H and S planes using cvCalcHist(), calculate a back projection image using cvCalcBackProject(), and extract the maximum matched area.
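To make the algorithm concrete, here is a rough sketch of the histogram back-projection idea in Python with plain NumPy rather than the actual cvCalcHist()/cvCalcBackProject() calls; the bin counts and the toy "marker" colors below are my own illustrative assumptions, not values from N3.

```python
import numpy as np

def hs_histogram(hsv, h_bins=30, s_bins=32):
    # 2D hue/saturation histogram -- the job cvCalcHist() does on the H and S planes
    hist, _, _ = np.histogram2d(hsv[..., 0].ravel(), hsv[..., 1].ravel(),
                                bins=[h_bins, s_bins],
                                range=[[0, 180], [0, 256]])
    return hist / hist.max()  # normalize so the back projection lies in [0, 1]

def back_project(hsv, hist, h_bins=30, s_bins=32):
    # per-pixel histogram lookup -- the job cvCalcBackProject() does
    h_idx = np.minimum(hsv[..., 0].astype(int) * h_bins // 180, h_bins - 1)
    s_idx = np.minimum(hsv[..., 1].astype(int) * s_bins // 256, s_bins - 1)
    return hist[h_idx, s_idx]

# Toy HSV frame: a marker-colored patch (H=60, S=200) on a dull background
frame = np.zeros((8, 8, 3))
frame[..., 0], frame[..., 1] = 10, 30             # background hue / saturation
frame[2:4, 5:7, 0], frame[2:4, 5:7, 1] = 60, 200  # the "marker"

marker_sample = np.zeros((4, 4, 3))
marker_sample[..., 0], marker_sample[..., 1] = 60, 200

bp = back_project(frame, hs_histogram(marker_sample))
y, x = np.unravel_index(bp.argmax(), bp.shape)    # maximum matched area
print(y, x)
```

In the real pipeline, the camera frame would of course be converted to HSV first, and the back projection would be smoothed and thresholded before extracting the matched region.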
Draw App Demo
Here is my draw app demo video. This is a bit different from SS's original draw app: the projection plane is calibrated by projecting a chessboard, so we can draw at the exact point our finger touches and it feels like drawing on the wall. For the camera calibration, cvFindHomography() is used to find the homography matrix between the coordinates of the chessboard in the projector image and those in the camera image.
Gesture and Physical World Demo
One more demo is an application which recognize the circle gesture and create a circle in the virtual physical world. For gesture recognition, I am using baylor wetzel's C++ port of $1 Unistroke Recognizer, which original SS uses as well. In this application, we can also get a real tissue box mixed in the virtual world. To do so, I use SURF feature detector to extract key points from tissue box image and find it from a camera frame. For physical engine, Box2D is used.
Full Source Code of N3
Source code is on my github. It's not well documented now, but if someone is interested in this project, I'll make more effort to documentation.
This is a brief introduction of N3. It's using OpenCV as a basic library, and it's aiming to improve recognition precision, portability and source code structure. Currently some applications are implemented and the following technologies are implemented/integrated in them.
- Color Marker Detection and Tracking
- Camera calibration
- Object Recognition
- Gesture Recognition
- Physical Engine
I will improve these technologies and add more features, so that N3 can be more useful and practical. Stay tuned!
Jan 1, 2012
Happy new year. I've started this blog today with an article about computer vision.
IntroductionIn this article, I will explain about "Image Morphing" which means changing the shape of a object gradually into the shape of another object. I wrote a program code using OpenCV. My algorithm is based on this slide. My code morph myself into "dragon". Why dragon? In Japan, each year is associated with an animal according to a 12-year mathematical cycle, and 2012 is a year-of-dragon. So, this program is a kind of new year greeting to readers :)
OverviewImage Morphing needs two images and mapping between their feature point as input, and is composed of two phase. First, images need to be warped so that their corresponding feature point is at the same position. Second, blend two images into one image using cross-dissolving. So images are warped and cross-dissolving into one image while time parameter is increasing. I will explain the detail about these methods.
WarpingWarping is required to make corresponding pixels in two images be at the same position. To do so, we have to prepare pairs of lines. In this experiment, I use these two images. the left is a source image, and the right is a destination image.
|source image||destination image|
Pairs of lines for these images are below.
First, lines of the source image.
15 74 130 122 1 74 130 82 132 82 132 106 135 106 135 153 137 106 135 118 177 118 177 153 137 122 1 205 130 74 130 81 209 205 130 208 219 81 209 123 266 208 219 123 266 98 211 119 226 119 226 159 215 98 211 119 206 119 206 159 215And lines of the destination image are below.
15 84 83 147 20 84 83 96 82 96 82 119 88 119 88 184 88 119 88 154 127 154 127 184 88 147 20 230 82 84 83 75 135 230 82 221 135 75 135 156 275 221 135 156 275 80 135 156 257 156 257 217 140 80 135 151 154 151 154 217 140The first line is the number of lines n. Next n lines contains four integers - Px, Py, Qx, Qy which represent the coordinate of the starting point and the ending point. For example, the first line or left image is from (74, 130) to (122, 1) which means from left ear to top of the head, and the first line of right image is the corresponding one.
We need to get warped image from the pairs. If the location of X' on the destination image is given, how to calculate the location of X on the source image which is corresponding to X'?
First, we calculate u and v. The value u goes from 0 to 1 as the pixel moves from P' to Q'. The value v is the perpendicular distance from the line to X'. Once we get u and v, we can apply them to the source image, and so X coordinate will be acquired.
In this case, we have multiple pairs of lines, so we need to average the sum of X coordinate weighted by a kind of factor. The factor is set to be inversely proportional to the distance from line and directly proportional to the length of the line. (the details are described in the slide mentioned in "Instruction".)
By calculating source pixels for each pixels in the destination image, we can get warped image whose shape is closer to the destination image.