影像座標轉換是將,影像中某個像素,映射到另一張影像的某個位置,常見的應用有:縮放&旋轉&校正變形影像。
在OpenCV之中,是透過cv::remap()的函式來達成。
本文就是針對remap的函式進行解析。
1. 呼叫OpenCV的remap函式:1.1 remap的函式宣告如下(*1):
cv::remap(InputArray src, OutputArray dst, InputArray map1, InputArray map2, int interpolation, int borderMode=BORDER_CONSTANT, const Scalar& borderValue=Scalar());
src : 想要轉換的影像(任何影像型別)。
dst : 轉換後的影像(與src一樣的型別) 。
map1 : 指定src轉換的座標到dst(X座標),型態可以是CV_16SC2(X,Y) , CV_32FC1(X) , or CV_32FC2(X,Y) 。
map2 : 指定src轉換的座標到dst(Y座標),假設map1為CV_32FC1,則需要輸入此參數,反之,將此參數設定成空白的Mat。
interpolation : 指定在帶有小數點座標時如何轉換,轉換方法有:
(1) INTER_NEAREST : 將小數點的座標做四捨五入,取整數位的座標位置,假設座標為(0.3 , 0.8)則轉換為(0 , 1),轉換的灰階值就會是從src(0 , 1)轉換。
(2) INTER_LINEAR : 將小數點的座標,利用線性內插的方式(2 * 2),找到轉換的灰階值[*2 ]。
(3) INTER_CUBIC : 將小數點的座標,利用三次內插的方式(4 * 4),找到轉換的灰階值[*3]。
(4) INTER_LANCZOS4 : 將小數點的座標,利用LANCZOS的方式(8 * 8),找到轉換的灰階值[*4]。
borderMode : 當指定轉換的座標不在src裡面,轉換的座標方式,預設值: BORDER_CONSTANT。(5) borderValue : 當指定的轉換座標不在src裡面,轉換座標的座標值。
1.2 呼叫範例如下:
#include "opencv2\core.hpp"
#include "opencv2\highgui.hpp"
#include "opencv2\imgproc.hpp"
int main() {
//產生1000 * 1000的影像
cv::Mat mat = cv::Mat::zeros( 1000 , 1000 , CV_8UC1 ) ;
for ( int i = 0 ; i < 1000 ; ++i ) {
uchar *ptr = mat.data + i * mat.step ;
for ( int j = 0 ; j < 1000 ; ++j , ++ptr) {
int val = ( i * j ) % 255 ;
*ptr = (uchar)val ;
}
}
//產生230 * 30 的轉換座標
cv::Mat abc = cv::Mat::zeros( 230 , 30 , CV_32FC2 ) ;
for ( int a = 0 ; a < 230 ; ++a ) {
float *ptr = (float *)(abc.data + a * abc.step);
for ( int j = 0 ; j < 30 ; ++j , ptr += 2 ) {
//這邊可以指定轉換後的座標,無論是旋轉,縮放,變形。
ptr[0] = a + 0.5F ;
ptr[1] = j + 0.5F ;
}
}
//轉換後的影像
cv::Mat dst ;
cv::remap( mat , dst , abc , cv::Mat() , cv::InterpolationFlags::INTER_LINEAR , cv::BorderTypes::BORDER_CONSTANT , cv::Scalar(0)) ;
//顯示結果
cv::imshow("src" , mat) ;
cv::imshow("dst" , dst) ;
cv::waitKey() ;
cv::destroyAllWindows() ;
}
在學會使用remap的函式之後,來研究一下OpenCV裡面是如何實作。
2. 解析:
2.1 運算概念: 其實裡面只是將指定的座標,套用內插法,將灰階值填入對應的位置。
寫成數學式就是 dst(i , j) = src( map( i , j ) )
2.2 SSE指令集(SIMD)加速,SSE是CPU中特殊的指令集[*5],然而CPU裡面包含特殊的暫存器(128 bits),
其指令可以一次對128 bits的資料進行運算(and , or , add , multi),相對於一次只處理1個8 bits的灰階值的運算,
SSE最多可以把16 個8 bits灰階值進行一次的運算。
2.3 以內插法為Bilinear的為例(程式碼摘錄自OpenCV 3.2之中的imgwarp.cpp):
(1)
假設目前要內插的位置 = ( 1.2 , 1.8 ),且鄰近的灰階值為 : [ 100 , 200 ; 120 , 220 ],
則dst( 1.2 , 1.8 )
= [ ( 100 * ( 1 - 0.2 ) + 200 * 0.2 ) * ( 1 - 0.8 ) + ( 120 * ( 1 - 0.2 ) + 220 * 0.2 ) * ( 0.8 ) ]
= ( 100 * 0.8 * 0.2 + 200 * 0.2 * 0.2 ) + ( 120 * 0.8 * 0.8 + 220 * 0.2 * 0.8 )
= 100 * coef[0] + 200 * coef[1] + 120 * coef[2] + 220 * coef[3]
係數的部分就可以先用一張表存起來,往後只需要查表做乘積即可得到灰階值。
可是電腦無法儲存如此多的係數(0~1,有無限多的小數),所以OpenCV裡面將1個像素(pixel),
切分成32等分,以下為建立表格的程式碼。
const int INTER_RESIZE_COEF_BITS=11;
const int INTER_RESIZE_COEF_SCALE=1 << INTER_RESIZE_COEF_BITS;
const int INTER_REMAP_COEF_BITS=15;
const int INTER_REMAP_COEF_SCALE=1 << INTER_REMAP_COEF_BITS;
static uchar NNDeltaTab_i[INTER_TAB_SIZE2][2];
static float BilinearTab_f[INTER_TAB_SIZE2][2][2];
static short BilinearTab_i[INTER_TAB_SIZE2][2][2];
#if CV_SSE2 || CV_NEON
static short BilinearTab_iC4_buf[INTER_TAB_SIZE2+2][2][8];
static short (*BilinearTab_iC4)[2][8] = (short (*)[2][8])alignPtr(BilinearTab_iC4_buf, 16);
#endif
static float BicubicTab_f[INTER_TAB_SIZE2][4][4];
static short BicubicTab_i[INTER_TAB_SIZE2][4][4];
static float Lanczos4Tab_f[INTER_TAB_SIZE2][8][8];
static short Lanczos4Tab_i[INTER_TAB_SIZE2][8][8];
static inline void interpolateLinear( float x, float* coeffs )
{
coeffs[0] = 1.f - x;
coeffs[1] = x;
}
static inline void interpolateCubic( float x, float* coeffs )
{
const float A = -0.75f;
coeffs[0] = ((A*(x + 1) - 5*A)*(x + 1) + 8*A)*(x + 1) - 4*A;
coeffs[1] = ((A + 2)*x - (A + 3))*x*x + 1;
coeffs[2] = ((A + 2)*(1 - x) - (A + 3))*(1 - x)*(1 - x) + 1;
coeffs[3] = 1.f - coeffs[0] - coeffs[1] - coeffs[2];
}
static inline void interpolateLanczos4( float x, float* coeffs )
{
static const double s45 = 0.70710678118654752440084436210485;
static const double cs[][2]=
{{1, 0}, {-s45, -s45}, {0, 1}, {s45, -s45}, {-1, 0}, {s45, s45}, {0, -1}, {-s45, s45}};
if( x < FLT_EPSILON )
{
for( int i = 0; i < 8; i++ )
coeffs[i] = 0;
coeffs[3] = 1;
return;
}
float sum = 0;
double y0=-(x+3)*CV_PI*0.25, s0 = sin(y0), c0=cos(y0);
for(int i = 0; i < 8; i++ )
{
double y = -(x+3-i)*CV_PI*0.25;
coeffs[i] = (float)((cs[i][0]*s0 + cs[i][1]*c0)/(y*y));
sum += coeffs[i];
}
sum = 1.f/sum;
for(int i = 0; i < 8; i++ )
coeffs[i] *= sum;
}
static void initInterTab1D(int method, float* tab, int tabsz)
{
float scale = 1.f/tabsz;
if( method == INTER_LINEAR )
{
for( int i = 0; i < tabsz; i++, tab += 2 )
interpolateLinear( i*scale, tab );
}
else if( method == INTER_CUBIC )
{
for( int i = 0; i < tabsz; i++, tab += 4 )
interpolateCubic( i*scale, tab );
}
else if( method == INTER_LANCZOS4 )
{
for( int i = 0; i < tabsz; i++, tab += 8 )
interpolateLanczos4( i*scale, tab );
}
}
static const void* initInterTab2D( int method, bool fixpt )
{
static bool inittab[INTER_MAX+1] = {false};
float* tab = 0;
short* itab = 0;
int ksize = 0;
if( method == INTER_LINEAR )
tab = BilinearTab_f[0][0], itab = BilinearTab_i[0][0], ksize=2;
else if( method == INTER_CUBIC )
tab = BicubicTab_f[0][0], itab = BicubicTab_i[0][0], ksize=4;
else if( method == INTER_LANCZOS4 )
tab = Lanczos4Tab_f[0][0], itab = Lanczos4Tab_i[0][0], ksize=8;
if( !inittab[method] )
{
AutoBuffer _tab(8*INTER_TAB_SIZE);
int i, j, k1, k2;
initInterTab1D(method, _tab, INTER_TAB_SIZE);
for( i = 0; i < INTER_TAB_SIZE; i++ )
for( j = 0; j < INTER_TAB_SIZE; j++, tab += ksize*ksize, itab += ksize*ksize )
{
int isum = 0;
NNDeltaTab_i[i*INTER_TAB_SIZE+j][0] = j < INTER_TAB_SIZE/2;
NNDeltaTab_i[i*INTER_TAB_SIZE+j][1] = i < INTER_TAB_SIZE/2;
//for each 1D Tab Coef
for( k1 = 0; k1 < ksize; k1++ )
{
//vy = 1D Tab Coef
float vy = _tab[i*ksize + k1];
for( k2 = 0; k2 < ksize; k2++ )
{
//
float v = vy*_tab[j*ksize + k2];
tab[k1*ksize + k2] = v;
isum +=
itab[k1*ksize + k2] =
saturate_cast(v*INTER_REMAP_COEF_SCALE);
}
}
if( isum != INTER_REMAP_COEF_SCALE )
{
int diff = isum - INTER_REMAP_COEF_SCALE;
int ksize2 = ksize/2, Mk1=ksize2, Mk2=ksize2, mk1=ksize2, mk2=ksize2;
for( k1 = ksize2; k1 < ksize2+2; k1++ )
for( k2 = ksize2; k2 < ksize2+2; k2++ )
{
if( itab[k1*ksize+k2] < itab[mk1*ksize+mk2] )
mk1 = k1, mk2 = k2;
else if( itab[k1*ksize+k2] > itab[Mk1*ksize+Mk2] )
Mk1 = k1, Mk2 = k2;
}
if( diff < 0 )
itab[Mk1*ksize + Mk2] = (short)(itab[Mk1*ksize + Mk2] - diff);
else
itab[mk1*ksize + mk2] = (short)(itab[mk1*ksize + mk2] - diff);
}
}
tab -= INTER_TAB_SIZE2*ksize*ksize;
itab -= INTER_TAB_SIZE2*ksize*ksize;
#if CV_SSE2 || CV_NEON
if( method == INTER_LINEAR )
{
for( i = 0; i < INTER_TAB_SIZE2; i++ )
for( j = 0; j < 4; j++ )
{
BilinearTab_iC4[i][0][j*2] = BilinearTab_i[i][0][0];
BilinearTab_iC4[i][0][j*2+1] = BilinearTab_i[i][0][1];
BilinearTab_iC4[i][1][j*2] = BilinearTab_i[i][1][0];
BilinearTab_iC4[i][1][j*2+1] = BilinearTab_i[i][1][1];
}
}
#endif
inittab[method] = true;
}
return fixpt ? (const void*)itab : (const void*)tab;
}
(2) 接下來將轉換的map轉成整數與小數部,並將小數部換算成表的索引位置。
int x1 = 0 ;
for( ; x1 < bcols; x1++ )
{
int sx = cvRound(sXY[x1*2]*INTER_TAB_SIZE);
int sy = cvRound(sXY[x1*2+1]*INTER_TAB_SIZE);
int v = (sy & (INTER_TAB_SIZE-1))*INTER_TAB_SIZE + (sx & (INTER_TAB_SIZE-1));
XY[x1*2] = saturate_cast(sx >> INTER_BITS); //取整數部分
XY[x1*2+1] = saturate_cast(sy >> INTER_BITS);
A[x1] = (ushort)v; //取小數點部分
}
(3) 算出灰階值並填入dst。
struct RemapVec_8u
{
int operator()( const Mat& _src, void* _dst, const short* XY,
const ushort* FXY, const void* _wtab, int width ) const
{
//假設 src.step = 1000
// ch = 1
int cn = _src.channels(), x = 0, sstep = (int)_src.step;
if( (cn != 1 && cn != 3 && cn != 4) || !checkHardwareSupport(CV_CPU_SSE2) ||
sstep > 0x8000 )
return 0;
const uchar *S0 = _src.ptr(), *S1 = _src.ptr(1);
const short* wtab = cn == 1 ? (const short*)_wtab : &BilinearTab_iC4[0][0][0];
uchar* D = (uchar*)_dst;
//delta = REMAP_COEF_SCALE / 2 = 32768 / 2
// = 0x00004000 , 0x00004000 , 0x00004000 , 0x00004000
__m128i delta = _mm_set1_epi32(INTER_REMAP_COEF_SCALE/2);
//xy2ofs = 1 + ( 0x000003E8 << 16 )
// = 1 + ( 0x03E80000 )
// = 0x03E80001 , 0x03E80001 , 0x03E80001 , 0x03E80001
__m128i xy2ofs = _mm_set1_epi32(cn + (sstep << 16));
//z = 0x00000000 , 0x00000000 , 0x00000000 , 0x00000000
__m128i z = _mm_setzero_si128();
int CV_DECL_ALIGNED(16) iofs0[4], iofs1[4];
__m128i tmp1 , tmp2 ;
if( cn == 1 )
{
for( ; x <= width - 8; x += 8 )
{
//load 16 bytes = map of (x , y) * 8
__m128i xy0 = _mm_loadu_si128( (const __m128i*)(XY + x*2));
__m128i xy1 = _mm_loadu_si128( (const __m128i*)(XY + x*2 + 8));
__m128i v0, v1, v2, v3, a0, a1, b0, b1;
unsigned i0, i1;
//y0 , x0 , y1 , x1 , y2 , x2 , y3 , x3
// xy0 = (y0 * widthStep + x0 , y1 * widthStep + x1 , y2 * widthStep + x2 , y3 * widthStep + x3 )
// = (p0 , p1 , p2 , p3)
xy0 = _mm_madd_epi16( xy0, xy2ofs );
//y4 , x4 , y5 , x5 , y6 , x6 , y7 , x7
// xy1 = (y4 * widthStep + x4 , y5 * widthStep + x5 , y6 * widthStep + x6 , y7 * widthStep + x7 )
// = (p4 , p5 , p6 , p7)
xy1 = _mm_madd_epi16( xy1, xy2ofs );
//save target offset(index) to src
_mm_store_si128( (__m128i*)iofs0, xy0 );
_mm_store_si128( (__m128i*)iofs1, xy1 );
//get the pixel value from src
//i0 = (ushort)(src[p0] , src[p0 + 1]) | ((ushort)(src[p1] , src[p1 + 1])) << 16
i0 = *(ushort*)(S0 + iofs0[0]) + (*(ushort*)(S0 + iofs0[1]) << 16);
//i1 = (ushort)(src[p2] , src[p2 + 1]) | ((ushort)(src[p3] , src[p3 + 1])) << 16
i1 = *(ushort*)(S0 + iofs0[2]) + (*(ushort*)(S0 + iofs0[3]) << 16);
//v0 = ( src[p0] , src[p0 + 1], src[p1] , src[p1 + 1] ,
// src[p2] , src[p2 + 1], src[p3] , src[p3 + 1] ,
// 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 )
// = ( s00 , s01 , s10 , s11 , s20 , s21 , s30 , s31 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 )
v0 = _mm_unpacklo_epi32(
_mm_cvtsi32_si128(i0), // ( src[p0] , src[p0 + 1], src[p1] , src[p1 + 1] , 0 , 0 , 0 , 0 )
_mm_cvtsi32_si128(i1) // ( src[p2] , src[p2 + 1], src[p3] , src[p3 + 1] , 0 , 0 , 0 , 0 )
);
//i0 = (ushort)(srcN[p0] , srcN[p0 + 1]) | ((ushort)(srcN[p1] , srcN[p1 + 1])) << 16
i0 = *(ushort*)(S1 + iofs0[0]) + (*(ushort*)(S1 + iofs0[1]) << 16);
//i1 = (ushort)(srcN[p2] , srcN[p2 + 1]) | ((ushort)(srcN[p3] , srcN[p3 + 1])) << 16
i1 = *(ushort*)(S1 + iofs0[2]) + (*(ushort*)(S1 + iofs0[3]) << 16);
//v1 = (srcN[p0] , srcN[p0 + 1] , srcN[p1] , srcN[p1 + 1] ,
// (srcN[p2] , srcN[p2 + 1] , srcN[p3] , srcN[p3 + 1] ,
// 0 , 0 , 0 , 0 , 0 , 0 , 0 ,0)
// = ( n00 , n01 , n10 , n11 , n20 , n21 , n30 , n31 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 )
v1 = _mm_unpacklo_epi32(
_mm_cvtsi32_si128(i0), // ( srcN[p0] , srcN[p0 + 1] , srcN[p1] , srcN[p1 + 1] , 0 , 0 , 0 , 0 )
_mm_cvtsi32_si128(i1) // ( srcN[p2] , srcN[p2 + 1] , srcN[p3] , srcN[p3 + 1] , 0 , 0 , 0 , 0 )
);
//v0 = ( 0 , s00 , 0 , s01 , 0 , s10 , 0 , s11 , 0 , s20 , 0 , s21 , 0 , s30 , 0 , s31)
// = ( s00 , s01 , s10 , s11 , s20 , s21 , s30 , s31 )
v0 = _mm_unpacklo_epi8(v0, z);
//v1 = ( 0 , n00 , 0 , n01 , 0 , n10 , 0 , n11 , 0 , n20 , 0 , n21 , 0 , n30 , 0 , n31)
// = ( n00 , n01 , n10 , n11 , n20 , n21 , n30 , n31 )
v1 = _mm_unpacklo_epi8(v1, z);
//load table by FXY(floating part)
//load p0 interpolate table(4 short value , i00 , i01 , i02 , i03)
tmp1 = _mm_loadl_epi64(
(__m128i*)(wtab+FXY[x]*4));
//load p1 interpolate table( i10 , i11 , i12 , i13 )
tmp2 = _mm_loadl_epi64(
(__m128i*)(wtab+FXY[x+1]*4));
//( i00 , i01 , i02 , i03 , i10 , i11 , i12 , i13 )
a0 = _mm_unpacklo_epi32 (
tmp1 ,
tmp2
) ;
//i20 , i21 , i22 , i23
tmp1 = _mm_loadl_epi64(
(__m128i*)(wtab+FXY[x+2]*4));
//i30 , i31 , i32 , i33
tmp2 = _mm_loadl_epi64(
(__m128i*)(wtab+FXY[x+3]*4));
//( i20 , i21 , i22 , i23 , i30 , i31 , i32 , i33 )
a1 = _mm_unpacklo_epi32(
tmp1,
tmp2
);
//(i00 , i01 , i10 , i11 , i20 , i21 , i30 , i31)
b0 = _mm_unpacklo_epi64(a0, a1);
//(i02 , i03 , i12 , i13 , i22 , i23 , i32 , i33)
b1 = _mm_unpackhi_epi64(a0, a1);
//x direction interpolate
/*
// v0 = (s00 , s01 , s10 , s11 , s20 , s21 , s30 , s31 )
// b0 = (i00 , i01 , i10 , i11 , i20 , i21 , i30 , i31 )
*/
// ( s00 * i00 + s01 * i01 ,
// s10 * i10 + s11 * i11 ,
// s20 * i20 + s21 * i21 ,
// s30 * i30 + s31 * i31 )
// = ( y0 , y1 , y2 , y3 )
v0 = _mm_madd_epi16(v0, b0);
//y direction interpolate
// v1 = ( n00 , n01 , n10 , n11 , n20 , n21 , n30 , n31 )
// b1 = ( i02 , i03 , i12 , i13 , i22 , i23 , i32 , i33 )
//
// ( n00 * i02 + n01 * i03 ,
// n10 * i12 + n11 * i13 ,
// n20 * i22 + n21 * i23 ,
// n30 * i32 + n31 * i33 )
// = ( x0 , x1 , x2 , x3 )
v1 = _mm_madd_epi16(v1, b1);
//v0 = ( y0 + x0 + delta , y1 + x1 + delta , y2 + x2 + delta , y3 + x3 + delta )
v0 = _mm_add_epi32(_mm_add_epi32(v0, v1), delta);
//next 4 * (x , y) pixel
i0 = *(ushort*)(S0 + iofs1[0]) + (*(ushort*)(S0 + iofs1[1]) << 16);
i1 = *(ushort*)(S0 + iofs1[2]) + (*(ushort*)(S0 + iofs1[3]) << 16);
v2 = _mm_unpacklo_epi32(
_mm_cvtsi32_si128(i0),
_mm_cvtsi32_si128(i1)
);
i0 = *(ushort*)(S1 + iofs1[0]) + (*(ushort*)(S1 + iofs1[1]) << 16);
i1 = *(ushort*)(S1 + iofs1[2]) + (*(ushort*)(S1 + iofs1[3]) << 16);
v3 = _mm_unpacklo_epi32(
_mm_cvtsi32_si128(i0),
_mm_cvtsi32_si128(i1)
);
v2 = _mm_unpacklo_epi8(v2, z);
v3 = _mm_unpacklo_epi8(v3, z);
a0 = _mm_unpacklo_epi32(_mm_loadl_epi64((__m128i*)(wtab+FXY[x+4]*4)),
_mm_loadl_epi64((__m128i*)(wtab+FXY[x+5]*4)));
a1 = _mm_unpacklo_epi32(_mm_loadl_epi64((__m128i*)(wtab+FXY[x+6]*4)),
_mm_loadl_epi64((__m128i*)(wtab+FXY[x+7]*4)));
b0 = _mm_unpacklo_epi64(a0, a1);
b1 = _mm_unpackhi_epi64(a0, a1);
v2 = _mm_madd_epi16(v2, b0);
v3 = _mm_madd_epi16(v3, b1);z
v2 = _mm_add_epi32(_mm_add_epi32(v2, v3), delta);
//------------------------------------------------------------------------
v0 = _mm_srai_epi32(v0, INTER_REMAP_COEF_BITS);
v2 = _mm_srai_epi32(v2, INTER_REMAP_COEF_BITS);
v0 = _mm_packus_epi16(_mm_packs_epi32(v0, v2), z);
_mm_storel_epi64( (__m128i*)(D + x), v0 );
}
}
return x;
}
};
以上如有錯誤請指教。
*1: remap宣告(https://docs.opencv.org/2.4/modules/imgproc/doc/geometric_transformations.html?highlight=remap#remap)
*2: Bilinear內插(https://en.wikipedia.org/wiki/Bilinear_interpolation)
*3: Bicubic內插(https://en.wikipedia.org/wiki/Bicubic_interpolation)
*4: Lanczos取樣(https://en.wikipedia.org/wiki/Lanczos_resampling)
*5: SSE指令集參考(https://software.intel.com/sites/landingpage/IntrinsicsGuide/)
沒有留言:
張貼留言