2017年11月4日 星期六

OpenCV影像座標轉換(remap)程式碼解析

影像座標轉換是將,影像中某個像素,映射到另一張影像的某個位置,常見的應用有:縮放&旋轉&校正變形影像。
在OpenCV之中,是透過cv::remap()的函式來達成。
本文就是針對remap的函式進行解析。

1. 呼叫OpenCV的remap函式:
1.1 remap的函式宣告如下(*1):
cv::remap(InputArray src, OutputArray dst, InputArray map1, InputArray map2, int interpolation, int borderMode=BORDER_CONSTANT, const Scalar& borderValue=Scalar()); 
src : 想要轉換的影像(任何影像型別)。
dst : 轉換後的影像(與src一樣的型別) 。
map1 : 指定src轉換的座標到dst(X座標),型態可以是CV_16SC2(X,Y) , CV_32FC1(X) , or CV_32FC2(X,Y) 。
map2 : 指定src轉換的座標到dst(Y座標),假設map1為CV_32FC1,則需要輸入此參數,反之,將此參數設定成空白的Mat。
interpolation : 指定在帶有小數點座標時如何轉換,轉換方法有:
(1) INTER_NEAREST : 將小數點的座標做四捨五入,取整數位的座標位置,假設座標為(0.3 , 0.8)則轉換為(0 , 1),轉換的灰階值就會是從src(0 , 1)轉換。
(2) INTER_LINEAR : 將小數點的座標,利用線性內插的方式(2 * 2),找到轉換的灰階值[*2 ]。
(3) INTER_CUBIC  : 將小數點的座標,利用三次內插的方式(4 * 4),找到轉換的灰階值[*3]。
(4) INTER_LANCZOS4 : 將小數點的座標,利用LANCZOS的方式(8 * 8),找到轉換的灰階值[*4]。
borderMode : 當指定轉換的座標不在src裡面,轉換的座標方式,預設值: BORDER_CONSTANT。
(5) borderValue : 當指定的轉換座標不在src裡面,轉換座標的座標值。


1.2 呼叫範例如下:


#include "opencv2\core.hpp"
#include "opencv2\highgui.hpp"
#include "opencv2\imgproc.hpp"
int main() {
 //產生1000 * 1000的影像
 cv::Mat mat = cv::Mat::zeros( 1000 , 1000 , CV_8UC1 ) ;
 for ( int i = 0 ; i < 1000 ; ++i ) {
  uchar *ptr = mat.data + i * mat.step ;
  for ( int j = 0 ; j < 1000 ; ++j , ++ptr) {
    int val = ( i * j ) % 255 ;
    *ptr = (uchar)val ;
  }
 }
 //產生230 * 30 的轉換座標
 cv::Mat abc = cv::Mat::zeros( 230 , 30 , CV_32FC2 ) ;
 for ( int a = 0 ; a < 230 ; ++a ) {
  float *ptr = (float *)(abc.data + a * abc.step);
  for ( int j = 0 ; j < 30 ; ++j , ptr += 2 ) {
   //這邊可以指定轉換後的座標,無論是旋轉,縮放,變形。
   ptr[0] = a + 0.5F ;
   ptr[1] = j + 0.5F ;
  }
 }
 //轉換後的影像
 cv::Mat dst ;
 cv::remap( mat , dst , abc , cv::Mat() , cv::InterpolationFlags::INTER_LINEAR , cv::BorderTypes::BORDER_CONSTANT , cv::Scalar(0)) ;
 //顯示結果
 cv::imshow("src" , mat) ;
 cv::imshow("dst" , dst) ;
 cv::waitKey() ;
 cv::destroyAllWindows() ;
}


在學會使用remap的函式之後,來研究一下OpenCV裡面是如何實作。
2. 解析:
2.1 運算概念: 其實裡面只是將指定的座標,套用內插法,將灰階值填入對應的位置。
寫成數學式就是 dst(i , j) = src( map( i , j ) )
2.2 SSE指令集(SIMD)加速,SSE是CPU中特殊的指令集[*5],然而CPU裡面包含特殊的暫存器(128 bits),
其指令可以一次對128 bits的資料進行運算(and , or , add , multi),相對於一次只處理1個8 bits的灰階值的運算,
SSE最多可以把16 個8 bits灰階值進行一次的運算。
2.3 以內插法為Bilinear的為例(程式碼摘錄自OpenCV 3.2之中的imgwarp.cpp):
(1)
假設目前要內插的位置 = ( 1.2 , 1.8 ),且鄰近的灰階值為 : [ 100 , 200 ; 120 , 220 ],
則dst( 1.2 , 1.8 )
= [ ( 100 * ( 1 - 0.2 ) + 200 * 0.2 ) * ( 1 - 0.8 ) + ( 120 * ( 1 - 0.2 ) + 220 * 0.2 ) * ( 0.8 ) ]
= ( 100 * 0.8 * 0.2 + 200 * 0.2 * 0.2 ) + ( 120 * 0.8 * 0.8 + 220 * 0.2 * 0.8 )
= 100 * coef[0] + 200 * coef[1] + 120 * coef[2] + 220 * coef[3]
係數的部分就可以先用一張表存起來,往後只需要查表做乘積即可得到灰階值。
可是電腦無法儲存如此多的係數(0~1,有無限多的小數),所以OpenCV裡面將1個像素(pixel),
切分成32等分,以下為建立表格的程式碼。


const int INTER_RESIZE_COEF_BITS=11;
const int INTER_RESIZE_COEF_SCALE=1 << INTER_RESIZE_COEF_BITS;

const int INTER_REMAP_COEF_BITS=15;
const int INTER_REMAP_COEF_SCALE=1 << INTER_REMAP_COEF_BITS;

static uchar NNDeltaTab_i[INTER_TAB_SIZE2][2];

static float BilinearTab_f[INTER_TAB_SIZE2][2][2];
static short BilinearTab_i[INTER_TAB_SIZE2][2][2];

#if CV_SSE2 || CV_NEON
static short BilinearTab_iC4_buf[INTER_TAB_SIZE2+2][2][8];
static short (*BilinearTab_iC4)[2][8] = (short (*)[2][8])alignPtr(BilinearTab_iC4_buf, 16);
#endif

static float BicubicTab_f[INTER_TAB_SIZE2][4][4];
static short BicubicTab_i[INTER_TAB_SIZE2][4][4];

static float Lanczos4Tab_f[INTER_TAB_SIZE2][8][8];
static short Lanczos4Tab_i[INTER_TAB_SIZE2][8][8];

static inline void interpolateLinear( float x, float* coeffs )
{
    coeffs[0] = 1.f - x;
    coeffs[1] = x;
}

static inline void interpolateCubic( float x, float* coeffs )
{
    const float A = -0.75f;

    coeffs[0] = ((A*(x + 1) - 5*A)*(x + 1) + 8*A)*(x + 1) - 4*A;
    coeffs[1] = ((A + 2)*x - (A + 3))*x*x + 1;
    coeffs[2] = ((A + 2)*(1 - x) - (A + 3))*(1 - x)*(1 - x) + 1;
    coeffs[3] = 1.f - coeffs[0] - coeffs[1] - coeffs[2];
}

static inline void interpolateLanczos4( float x, float* coeffs )
{
    static const double s45 = 0.70710678118654752440084436210485;
    static const double cs[][2]=
    {{1, 0}, {-s45, -s45}, {0, 1}, {s45, -s45}, {-1, 0}, {s45, s45}, {0, -1}, {-s45, s45}};

    if( x < FLT_EPSILON )
    {
        for( int i = 0; i < 8; i++ )
            coeffs[i] = 0;
        coeffs[3] = 1;
        return;
    }

    float sum = 0;
    double y0=-(x+3)*CV_PI*0.25, s0 = sin(y0), c0=cos(y0);
    for(int i = 0; i < 8; i++ )
    {
        double y = -(x+3-i)*CV_PI*0.25;
        coeffs[i] = (float)((cs[i][0]*s0 + cs[i][1]*c0)/(y*y));
        sum += coeffs[i];
    }

    sum = 1.f/sum;
    for(int i = 0; i < 8; i++ )
        coeffs[i] *= sum;
}

static void initInterTab1D(int method, float* tab, int tabsz)
{
    float scale = 1.f/tabsz;
    if( method == INTER_LINEAR )
    {
        for( int i = 0; i < tabsz; i++, tab += 2 )
            interpolateLinear( i*scale, tab );
    }
    else if( method == INTER_CUBIC )
    {
        for( int i = 0; i < tabsz; i++, tab += 4 )
            interpolateCubic( i*scale, tab );
    }
    else if( method == INTER_LANCZOS4 )
    {
        for( int i = 0; i < tabsz; i++, tab += 8 )
            interpolateLanczos4( i*scale, tab );
    }
}

static const void* initInterTab2D( int method, bool fixpt )
{
    static bool inittab[INTER_MAX+1] = {false};
    float* tab = 0;
    short* itab = 0;
    int ksize = 0;
    if( method == INTER_LINEAR )
        tab = BilinearTab_f[0][0], itab = BilinearTab_i[0][0], ksize=2;
    else if( method == INTER_CUBIC )
        tab = BicubicTab_f[0][0], itab = BicubicTab_i[0][0], ksize=4;
    else if( method == INTER_LANCZOS4 )
        tab = Lanczos4Tab_f[0][0], itab = Lanczos4Tab_i[0][0], ksize=8;

    if( !inittab[method] )
    {
        AutoBuffer _tab(8*INTER_TAB_SIZE);
        int i, j, k1, k2;
        initInterTab1D(method, _tab, INTER_TAB_SIZE);
        for( i = 0; i < INTER_TAB_SIZE; i++ )
            for( j = 0; j < INTER_TAB_SIZE; j++, tab += ksize*ksize, itab += ksize*ksize )
            {
                int isum = 0;
                NNDeltaTab_i[i*INTER_TAB_SIZE+j][0] = j < INTER_TAB_SIZE/2;
                NNDeltaTab_i[i*INTER_TAB_SIZE+j][1] = i < INTER_TAB_SIZE/2;

    //for each 1D Tab Coef
                for( k1 = 0; k1 < ksize; k1++ )
                {
     //vy = 1D Tab Coef
                    float vy = _tab[i*ksize + k1];
                    for( k2 = 0; k2 < ksize; k2++ )
                    {
      //
                        float v = vy*_tab[j*ksize + k2];
                        tab[k1*ksize + k2] = v;
                        isum += 
       itab[k1*ksize + k2] =
       saturate_cast(v*INTER_REMAP_COEF_SCALE);
                    }
                }

                if( isum != INTER_REMAP_COEF_SCALE )
                {
                    int diff = isum - INTER_REMAP_COEF_SCALE;
                    int ksize2 = ksize/2, Mk1=ksize2, Mk2=ksize2, mk1=ksize2, mk2=ksize2;
                    for( k1 = ksize2; k1 < ksize2+2; k1++ )
                        for( k2 = ksize2; k2 < ksize2+2; k2++ )
                        {
                            if( itab[k1*ksize+k2] < itab[mk1*ksize+mk2] )
                                mk1 = k1, mk2 = k2;
                            else if( itab[k1*ksize+k2] > itab[Mk1*ksize+Mk2] )
                                Mk1 = k1, Mk2 = k2;
                        }
                    if( diff < 0 )
                        itab[Mk1*ksize + Mk2] = (short)(itab[Mk1*ksize + Mk2] - diff);
                    else
                        itab[mk1*ksize + mk2] = (short)(itab[mk1*ksize + mk2] - diff);
                }
            }
        tab -= INTER_TAB_SIZE2*ksize*ksize;
        itab -= INTER_TAB_SIZE2*ksize*ksize;
#if CV_SSE2 || CV_NEON
        if( method == INTER_LINEAR )
        {
            for( i = 0; i < INTER_TAB_SIZE2; i++ )
                for( j = 0; j < 4; j++ )
                {
                    BilinearTab_iC4[i][0][j*2] = BilinearTab_i[i][0][0];
                    BilinearTab_iC4[i][0][j*2+1] = BilinearTab_i[i][0][1];
                    BilinearTab_iC4[i][1][j*2] = BilinearTab_i[i][1][0];
                    BilinearTab_iC4[i][1][j*2+1] = BilinearTab_i[i][1][1];
                }
        }
#endif
        inittab[method] = true;
    }
    return fixpt ? (const void*)itab : (const void*)tab;
}


(2) 接下來將轉換的map轉成整數與小數部,並將小數部換算成表的索引位置。

int x1 = 0 ;
for( ; x1 < bcols; x1++ )
{
    int sx = cvRound(sXY[x1*2]*INTER_TAB_SIZE);
    int sy = cvRound(sXY[x1*2+1]*INTER_TAB_SIZE);
    int v = (sy & (INTER_TAB_SIZE-1))*INTER_TAB_SIZE + (sx & (INTER_TAB_SIZE-1));
    XY[x1*2] = saturate_cast(sx >> INTER_BITS); //取整數部分
    XY[x1*2+1] = saturate_cast(sy >> INTER_BITS); 
    A[x1] = (ushort)v; //取小數點部分
}

(3) 算出灰階值並填入dst。

struct RemapVec_8u
{
    int operator()( const Mat& _src, void* _dst, const short* XY,
                    const ushort* FXY, const void* _wtab, int width ) const
    {
  //假設  src.step = 1000
  //  ch = 1
        int cn = _src.channels(), x = 0, sstep = (int)_src.step;

        if( (cn != 1 && cn != 3 && cn != 4) || !checkHardwareSupport(CV_CPU_SSE2) ||
            sstep > 0x8000 )
            return 0;

        const uchar *S0 = _src.ptr(), *S1 = _src.ptr(1);
        const short* wtab = cn == 1 ? (const short*)_wtab : &BilinearTab_iC4[0][0][0];
        uchar* D = (uchar*)_dst;
  //delta = REMAP_COEF_SCALE / 2 = 32768 / 2
  //  = 0x00004000 , 0x00004000 , 0x00004000 , 0x00004000
        __m128i delta = _mm_set1_epi32(INTER_REMAP_COEF_SCALE/2);
  //xy2ofs = 1 + ( 0x000003E8 << 16 )
  //   = 1 + ( 0x03E80000 )
  //   = 0x03E80001 , 0x03E80001 , 0x03E80001 , 0x03E80001
        __m128i xy2ofs = _mm_set1_epi32(cn + (sstep << 16));
  //z   = 0x00000000 , 0x00000000 , 0x00000000 , 0x00000000
        __m128i z = _mm_setzero_si128();
        int CV_DECL_ALIGNED(16) iofs0[4], iofs1[4];
  __m128i tmp1 , tmp2 ;

        if( cn == 1 )
        {
            for( ; x <= width - 8; x += 8 )
            {
    //load 16 bytes = map of (x , y) * 8
                __m128i xy0 = _mm_loadu_si128( (const __m128i*)(XY + x*2));
                __m128i xy1 = _mm_loadu_si128( (const __m128i*)(XY + x*2 + 8));
                __m128i v0, v1, v2, v3, a0, a1, b0, b1;
                unsigned i0, i1;

    //y0 , x0 , y1 , x1 , y2 , x2 , y3 , x3
    // xy0 = (y0 * widthStep + x0 , y1 * widthStep + x1 , y2 * widthStep + x2 , y3 * widthStep + x3 )
    //  = (p0 , p1 , p2 , p3) 
                xy0 = _mm_madd_epi16( xy0, xy2ofs );
    //y4 , x4 , y5 , x5 , y6 , x6 , y7 , x7
    // xy1 = (y4 * widthStep + x4 , y5 * widthStep + x5 , y6 * widthStep + x6 , y7 * widthStep + x7 )
    //  = (p4 , p5 , p6 , p7)
                xy1 = _mm_madd_epi16( xy1, xy2ofs );
    //save target offset(index) to src
                _mm_store_si128( (__m128i*)iofs0, xy0 );
                _mm_store_si128( (__m128i*)iofs1, xy1 );

    //get the pixel value from src
    //i0 = (ushort)(src[p0] , src[p0 + 1]) | ((ushort)(src[p1] , src[p1 + 1])) << 16
                i0 = *(ushort*)(S0 + iofs0[0]) + (*(ushort*)(S0 + iofs0[1]) << 16);
    //i1 = (ushort)(src[p2] , src[p2 + 1]) | ((ushort)(src[p3] , src[p3 + 1])) << 16
                i1 = *(ushort*)(S0 + iofs0[2]) + (*(ushort*)(S0 + iofs0[3]) << 16);
    //v0 = ( src[p0] , src[p0 + 1], src[p1] , src[p1 + 1] ,
    //   src[p2] , src[p2 + 1], src[p3] , src[p3 + 1] ,
    //   0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 )
    //  = ( s00 , s01 , s10 , s11 , s20 , s21 , s30 , s31 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 )
                v0 = _mm_unpacklo_epi32( 
     _mm_cvtsi32_si128(i0),  // ( src[p0] , src[p0 + 1], src[p1] , src[p1 + 1] , 0 , 0 , 0 , 0 )
     _mm_cvtsi32_si128(i1) // ( src[p2] , src[p2 + 1], src[p3] , src[p3 + 1] , 0 , 0 , 0 , 0 )
     ); 
                //i0 = (ushort)(srcN[p0] , srcN[p0 + 1]) | ((ushort)(srcN[p1] , srcN[p1 + 1])) << 16
    i0 = *(ushort*)(S1 + iofs0[0]) + (*(ushort*)(S1 + iofs0[1]) << 16);
    //i1 = (ushort)(srcN[p2] , srcN[p2 + 1]) | ((ushort)(srcN[p3] , srcN[p3 + 1])) << 16
                i1 = *(ushort*)(S1 + iofs0[2]) + (*(ushort*)(S1 + iofs0[3]) << 16);
    //v1 = (srcN[p0] , srcN[p0 + 1] , srcN[p1] , srcN[p1 + 1] ,
    //    (srcN[p2] , srcN[p2 + 1] , srcN[p3] , srcN[p3 + 1] ,
    //    0 , 0 , 0 , 0 , 0 , 0 , 0 ,0)
    //  = ( n00 , n01 , n10 , n11 , n20 , n21 , n30 , n31 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 )
                v1 = _mm_unpacklo_epi32(
     _mm_cvtsi32_si128(i0), // ( srcN[p0] , srcN[p0 + 1] , srcN[p1] , srcN[p1 + 1] , 0 , 0 , 0 , 0 )
     _mm_cvtsi32_si128(i1)  // ( srcN[p2] , srcN[p2 + 1] , srcN[p3] , srcN[p3 + 1] , 0 , 0 , 0 , 0 )
     );
                //v0 = ( 0 , s00 , 0 , s01 , 0 , s10 , 0 , s11 , 0 , s20 , 0 , s21 , 0 , s30 , 0 , s31)
    //  = ( s00 , s01 , s10 , s11 , s20 , s21 , s30 , s31 )
    v0 = _mm_unpacklo_epi8(v0, z);
    //v1 = ( 0 , n00 , 0 , n01 , 0 , n10 , 0 , n11 , 0 , n20 , 0 , n21 , 0 , n30 , 0 , n31)
    //  = ( n00 , n01 , n10 , n11 , n20 , n21 , n30 , n31 )
                v1 = _mm_unpacklo_epi8(v1, z);
    
    //load table by FXY(floating part)
    //load p0 interpolate table(4 short value , i00 , i01 , i02 , i03)
    tmp1 = _mm_loadl_epi64(
      (__m128i*)(wtab+FXY[x]*4));
    //load p1 interpolate table( i10 , i11 , i12 , i13 )
    tmp2 = _mm_loadl_epi64(
      (__m128i*)(wtab+FXY[x+1]*4));

    //( i00 , i01 , i02 , i03 , i10 , i11 , i12 , i13 )
                a0 = _mm_unpacklo_epi32 (
      tmp1 ,
      tmp2
     ) ;

    //i20 , i21 , i22 , i23
    tmp1 = _mm_loadl_epi64(
       (__m128i*)(wtab+FXY[x+2]*4));
    //i30 , i31 , i32 , i33
    tmp2 = _mm_loadl_epi64(
       (__m128i*)(wtab+FXY[x+3]*4));

    //( i20 , i21 , i22 , i23 , i30 , i31 , i32 , i33 )
                a1 = _mm_unpacklo_epi32(
      tmp1,
                        tmp2
      );


    //(i00 , i01 , i10 , i11 , i20 , i21 , i30 , i31)
                b0 = _mm_unpacklo_epi64(a0, a1);
    //(i02 , i03 , i12 , i13 , i22 , i23 , i32 , i33)
                b1 = _mm_unpackhi_epi64(a0, a1);
    
    
    //x direction interpolate
    /*
    // v0 = (s00 , s01 , s10 , s11 , s20 , s21 , s30 , s31 )
    // b0 = (i00 , i01 , i10 , i11 , i20 , i21 , i30 , i31 )
    */
    // ( s00 * i00 + s01 * i01 ,
    //  s10 * i10 + s11 * i11 ,
    //  s20 * i20 + s21 * i21 ,
    //  s30 * i30 + s31 * i31 )
    // = ( y0 , y1 , y2 , y3 )
                v0 = _mm_madd_epi16(v0, b0);
    //y direction interpolate
    // v1 = ( n00 , n01 , n10 , n11 , n20 , n21 , n30 , n31 )
    // b1 = ( i02 , i03 , i12 , i13 , i22 , i23 , i32 , i33 )
    //
    // ( n00 * i02 + n01 * i03 ,
    //  n10 * i12 + n11 * i13 ,
    //  n20 * i22 + n21 * i23 ,
    //  n30 * i32 + n31 * i33 )
    // = ( x0 , x1 , x2 , x3 )
                v1 = _mm_madd_epi16(v1, b1);

    //v0 = ( y0 + x0 + delta , y1 + x1 + delta , y2 + x2 + delta , y3 + x3 + delta )
                v0 = _mm_add_epi32(_mm_add_epi32(v0, v1), delta);


    //next 4 * (x , y) pixel
                i0 = *(ushort*)(S0 + iofs1[0]) + (*(ushort*)(S0 + iofs1[1]) << 16);
                i1 = *(ushort*)(S0 + iofs1[2]) + (*(ushort*)(S0 + iofs1[3]) << 16);
                v2 = _mm_unpacklo_epi32(
     _mm_cvtsi32_si128(i0), 
     _mm_cvtsi32_si128(i1)
     );
                i0 = *(ushort*)(S1 + iofs1[0]) + (*(ushort*)(S1 + iofs1[1]) << 16);
                i1 = *(ushort*)(S1 + iofs1[2]) + (*(ushort*)(S1 + iofs1[3]) << 16);
                v3 = _mm_unpacklo_epi32(
     _mm_cvtsi32_si128(i0), 
     _mm_cvtsi32_si128(i1)
     );
                v2 = _mm_unpacklo_epi8(v2, z);
                v3 = _mm_unpacklo_epi8(v3, z);

                a0 = _mm_unpacklo_epi32(_mm_loadl_epi64((__m128i*)(wtab+FXY[x+4]*4)),
                                        _mm_loadl_epi64((__m128i*)(wtab+FXY[x+5]*4)));
                a1 = _mm_unpacklo_epi32(_mm_loadl_epi64((__m128i*)(wtab+FXY[x+6]*4)),
                                        _mm_loadl_epi64((__m128i*)(wtab+FXY[x+7]*4)));
                b0 = _mm_unpacklo_epi64(a0, a1);
                b1 = _mm_unpackhi_epi64(a0, a1);
                v2 = _mm_madd_epi16(v2, b0);
                v3 = _mm_madd_epi16(v3, b1);z
                v2 = _mm_add_epi32(_mm_add_epi32(v2, v3), delta);
    //------------------------------------------------------------------------

                v0 = _mm_srai_epi32(v0, INTER_REMAP_COEF_BITS);
                v2 = _mm_srai_epi32(v2, INTER_REMAP_COEF_BITS);
                v0 = _mm_packus_epi16(_mm_packs_epi32(v0, v2), z);
                _mm_storel_epi64( (__m128i*)(D + x), v0 );
            }
        }
        

        return x;
    }
};

以上如有錯誤請指教。

*1: remap宣告(https://docs.opencv.org/2.4/modules/imgproc/doc/geometric_transformations.html?highlight=remap#remap)
*2: Bilinear內插(https://en.wikipedia.org/wiki/Bilinear_interpolation)
*3: Bicubic內插(https://en.wikipedia.org/wiki/Bicubic_interpolation)
*4: Lanczos取樣(https://en.wikipedia.org/wiki/Lanczos_resampling)
*5: SSE指令集參考(https://software.intel.com/sites/landingpage/IntrinsicsGuide/)

沒有留言: