BitMiracle.LibJpeg.Classic.Internal.jpeg_forward_dct.jpeg_fdct_islow C# (CSharp) Method

jpeg_fdct_islow() private static method

Performs the forward DCT on one block of samples. NOTE: this code only copes with 8x8 DCTs.

This is a slow-but-accurate integer implementation of the forward DCT (Discrete Cosine Transform). A 2-D DCT can be done by a 1-D DCT on each row followed by a 1-D DCT on each column. Direct algorithms are also available, but they are much more complex and seem not to be any faster when reduced to code.

This implementation is based on an algorithm described in C. Loeffler, A. Ligtenberg and G. Moschytz, "Practical Fast 1-D DCT Algorithms with 11 Multiplications", Proc. Int'l. Conf. on Acoustics, Speech, and Signal Processing 1989 (ICASSP '89), pp. 988-991. The primary algorithm described there uses 11 multiplies and 29 adds. We use their alternate method with 12 multiplies and 32 adds. The advantage of this method is that no data path contains more than one multiplication; this allows a very simple and accurate implementation in scaled fixed-point arithmetic, with a minimal number of shifts.

The scaling works as follows. Each 1-D DCT step produces outputs that are a factor of sqrt(N) larger than the true DCT outputs. The final outputs are therefore a factor of N larger than desired; since N=8, this can be cured by a simple right shift at the end of the algorithm. The advantage of this arrangement is that we save two multiplications per 1-D DCT, because the y0 and y4 outputs need not be divided by sqrt(N). In the IJG code, this factor of 8 is removed by the quantization step, NOT here.

We have to do addition and subtraction of the integer inputs, which is no problem, and multiplication by fractional constants, which is awkward in integer arithmetic. We multiply all the constants by CONST_SCALE (that is, 2**SLOW_INTEGER_CONST_BITS) and convert them to integer constants, thus retaining SLOW_INTEGER_CONST_BITS bits of precision in the constants. After a multiplication we have to divide the product by CONST_SCALE, with proper rounding, to produce the correct output. This division can be done cheaply as a right shift of SLOW_INTEGER_CONST_BITS bits. We postpone shifting as long as possible so that partial sums can be added together with full fractional precision.

The outputs of the first pass are scaled up by SLOW_INTEGER_PASS1_BITS bits so that they are represented to better-than-integral precision. These outputs require BITS_IN_JSAMPLE + SLOW_INTEGER_PASS1_BITS + 3 bits; this fits in a 16-bit word with the recommended scaling. (For 12-bit sample data, the intermediate array is int anyway.)

To avoid overflow of the 32-bit intermediate results in pass 2, we must have BITS_IN_JSAMPLE + SLOW_INTEGER_CONST_BITS + SLOW_INTEGER_PASS1_BITS <= 26. Error analysis shows that the values chosen for these constants are the most effective.
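The fixed-point scheme described above can be illustrated with a small, self-contained sketch. It is not the library's code: the class and method names (FixedPointSketch, Fix, Descale, MultiplyByFraction, FitsIn32Bits) are hypothetical, and the values 13 for CONST_BITS, 2 for PASS1_BITS and 8 for BITS_IN_JSAMPLE are assumed here as the usual IJG settings for 8-bit samples.

```csharp
using System;

// Illustrative sketch only; these names are hypothetical, not the library's API.
public static class FixedPointSketch
{
    // Assumed settings for 8-bit samples (not shown in the listing on this page).
    public const int ConstBits = 13;
    public const int Pass1Bits = 2;
    public const int BitsInSample = 8;
    public const int ConstScale = 1 << ConstBits;

    // Convert a fractional constant to a scaled integer, rounding to nearest.
    public static int Fix(double x)
    {
        return (int)(x * ConstScale + 0.5);
    }

    // Divide by 2^n with rounding: add half of the divisor, then shift right.
    // The >> operator on int is an arithmetic shift, so negative values work too.
    public static int Descale(int x, int n)
    {
        return (x + (1 << (n - 1))) >> n;
    }

    // Multiply an integer by a fractional constant using integer arithmetic only.
    public static int MultiplyByFraction(int value, double constant)
    {
        return Descale(value * Fix(constant), ConstBits);
    }

    // Check the pass-2 overflow condition quoted in the description.
    public static bool FitsIn32Bits()
    {
        return BitsInSample + ConstBits + Pass1Bits <= 26;
    }
}
```

Under these assumptions, Fix(0.541196100) yields 4433, and the bit budget comes to 8 + 13 + 2 = 23, which satisfies the <= 26 condition. The method in the listing does not call a rounding helper for every product; it adds the rounding term ("fudge factor") to z1 once and then uses a plain right shift.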
private static jpeg_fdct_islow ( int[] data, byte[][] sample_data, int start_row, int start_col ) : void
data int[]
sample_data byte[][]
start_row int
start_col int
return void
        private static void jpeg_fdct_islow(int[] data, byte[][] sample_data, int start_row, int start_col)
        {
            /* Pass 1: process rows. */
            /* Note results are scaled up by sqrt(8) compared to a true DCT; */
            /* furthermore, we scale the results by 2**SLOW_INTEGER_PASS1_BITS. */
            int dataIndex = 0;
            for (int ctr = 0; ctr < JpegConstants.DCTSIZE; ctr++)
            {
                byte[] elem = sample_data[start_row + ctr];
                int elemIndex = start_col;

                int tmp0 = elem[elemIndex + 0] + elem[elemIndex + 7];
                int tmp1 = elem[elemIndex + 1] + elem[elemIndex + 6];
                int tmp2 = elem[elemIndex + 2] + elem[elemIndex + 5];
                int tmp3 = elem[elemIndex + 3] + elem[elemIndex + 4];

                /* Even part per LL&M figure 1 --- note that published figure is faulty;
                * rotator "sqrt(2)*c1" should be "sqrt(2)*c6".
                */

                int tmp10 = tmp0 + tmp3;
                int tmp12 = tmp0 - tmp3;
                int tmp11 = tmp1 + tmp2;
                int tmp13 = tmp1 - tmp2;

                tmp0 = elem[elemIndex + 0] - elem[elemIndex + 7];
                tmp1 = elem[elemIndex + 1] - elem[elemIndex + 6];
                tmp2 = elem[elemIndex + 2] - elem[elemIndex + 5];
                tmp3 = elem[elemIndex + 3] - elem[elemIndex + 4];

                data[dataIndex + 0] = (tmp10 + tmp11 - 8 * JpegConstants.CENTERJSAMPLE) << SLOW_INTEGER_PASS1_BITS;
                data[dataIndex + 4] = (tmp10 - tmp11) << SLOW_INTEGER_PASS1_BITS;

                int z1 = (tmp12 + tmp13) * SLOW_INTEGER_FIX_0_541196100;       /* c6 */
                /* Add fudge factor here for final descale. */
                z1 += 1 << (SLOW_INTEGER_CONST_BITS - SLOW_INTEGER_PASS1_BITS - 1);

                /* z1 already carries the rounding term, so a plain right shift is used. */
                data[dataIndex + 2] = JpegUtils.RIGHT_SHIFT(z1 + tmp12 * SLOW_INTEGER_FIX_0_765366865, /* c2-c6 */
                                                SLOW_INTEGER_CONST_BITS - SLOW_INTEGER_PASS1_BITS);
                data[dataIndex + 6] = JpegUtils.RIGHT_SHIFT(z1 - tmp13 * SLOW_INTEGER_FIX_1_847759065, /* c2+c6 */
                                                SLOW_INTEGER_CONST_BITS - SLOW_INTEGER_PASS1_BITS);

                /* Odd part per figure 8 --- note paper omits factor of sqrt(2).
                * cK represents sqrt(2) * cos(K*pi/16).
                * i0..i3 in the paper are tmp0..tmp3 here.
                */

                tmp12 = tmp0 + tmp2;
                tmp13 = tmp1 + tmp3;

                z1 = (tmp12 + tmp13) * SLOW_INTEGER_FIX_1_175875602; /*  c3 */
                /* Add fudge factor here for final descale. */
                z1 += 1 << (SLOW_INTEGER_CONST_BITS - SLOW_INTEGER_PASS1_BITS - 1);

                tmp12 = tmp12 * (-SLOW_INTEGER_FIX_0_390180644);          /* -c3+c5 */
                tmp13 = tmp13 * (-SLOW_INTEGER_FIX_1_961570560);          /* -c3-c5 */
                tmp12 += z1;
                tmp13 += z1;

                z1 = (tmp0 + tmp3) * (-SLOW_INTEGER_FIX_0_899976223);       /* -c3+c7 */
                tmp0 = tmp0 * SLOW_INTEGER_FIX_1_501321110;              /*  c1+c3-c5-c7 */
                tmp3 = tmp3 * SLOW_INTEGER_FIX_0_298631336;              /* -c1+c3+c5-c7 */
                tmp0 += z1 + tmp12;
                tmp3 += z1 + tmp13;

                z1 = (tmp1 + tmp2) * (-SLOW_INTEGER_FIX_2_562915447);       /* -c1-c3 */
                tmp1 = tmp1 * SLOW_INTEGER_FIX_3_072711026;              /*  c1+c3+c5-c7 */
                tmp2 = tmp2 * SLOW_INTEGER_FIX_2_053119869;              /*  c1+c3-c5+c7 */
                tmp1 += z1 + tmp13;
                tmp2 += z1 + tmp12;

                data[dataIndex + 1] = JpegUtils.RIGHT_SHIFT(tmp0, SLOW_INTEGER_CONST_BITS - SLOW_INTEGER_PASS1_BITS);
                data[dataIndex + 3] = JpegUtils.RIGHT_SHIFT(tmp1, SLOW_INTEGER_CONST_BITS - SLOW_INTEGER_PASS1_BITS);
                data[dataIndex + 5] = JpegUtils.RIGHT_SHIFT(tmp2, SLOW_INTEGER_CONST_BITS - SLOW_INTEGER_PASS1_BITS);
                data[dataIndex + 7] = JpegUtils.RIGHT_SHIFT(tmp3, SLOW_INTEGER_CONST_BITS - SLOW_INTEGER_PASS1_BITS);

                dataIndex += JpegConstants.DCTSIZE;     /* advance pointer to next row */
            }

            /* Pass 2: process columns.
            * We remove the SLOW_INTEGER_PASS1_BITS scaling, but leave the results scaled up
            * by an overall factor of 8.
            * cK represents sqrt(2) * cos(K*pi/16).
            */

            dataIndex = 0;
            for (int ctr = JpegConstants.DCTSIZE - 1; ctr >= 0; ctr--)
            {
                /* Even part per LL&M figure 1 --- note that published figure is faulty;
                 * rotator "sqrt(2)*c1" should be "sqrt(2)*c6".
                 */
                int tmp0 = data[dataIndex + JpegConstants.DCTSIZE * 0] + data[dataIndex + JpegConstants.DCTSIZE * 7];
                int tmp1 = data[dataIndex + JpegConstants.DCTSIZE * 1] + data[dataIndex + JpegConstants.DCTSIZE * 6];
                int tmp2 = data[dataIndex + JpegConstants.DCTSIZE * 2] + data[dataIndex + JpegConstants.DCTSIZE * 5];
                int tmp3 = data[dataIndex + JpegConstants.DCTSIZE * 3] + data[dataIndex + JpegConstants.DCTSIZE * 4];

                /* Add fudge factor here for final descale. */
                int tmp10 = tmp0 + tmp3 + (1 << (SLOW_INTEGER_PASS1_BITS - 1));
                int tmp12 = tmp0 - tmp3;
                int tmp11 = tmp1 + tmp2;
                int tmp13 = tmp1 - tmp2;

                tmp0 = data[dataIndex + JpegConstants.DCTSIZE * 0] - data[dataIndex + JpegConstants.DCTSIZE * 7];
                tmp1 = data[dataIndex + JpegConstants.DCTSIZE * 1] - data[dataIndex + JpegConstants.DCTSIZE * 6];
                tmp2 = data[dataIndex + JpegConstants.DCTSIZE * 2] - data[dataIndex + JpegConstants.DCTSIZE * 5];
                tmp3 = data[dataIndex + JpegConstants.DCTSIZE * 3] - data[dataIndex + JpegConstants.DCTSIZE * 4];

                data[dataIndex + JpegConstants.DCTSIZE * 0] = JpegUtils.RIGHT_SHIFT(tmp10 + tmp11, SLOW_INTEGER_PASS1_BITS);
                data[dataIndex + JpegConstants.DCTSIZE * 4] = JpegUtils.RIGHT_SHIFT(tmp10 - tmp11, SLOW_INTEGER_PASS1_BITS);

                int z1 = (tmp12 + tmp13) * SLOW_INTEGER_FIX_0_541196100;       /* c6 */
                /* Add fudge factor here for final descale. */
                z1 += 1 << (SLOW_INTEGER_CONST_BITS + SLOW_INTEGER_PASS1_BITS - 1);

                data[dataIndex + JpegConstants.DCTSIZE * 2] = JpegUtils.RIGHT_SHIFT(
                    z1 + tmp12 * SLOW_INTEGER_FIX_0_765366865,
                    SLOW_INTEGER_CONST_BITS + SLOW_INTEGER_PASS1_BITS);
                data[dataIndex + JpegConstants.DCTSIZE * 6] = JpegUtils.RIGHT_SHIFT(
                    z1 - tmp13 * SLOW_INTEGER_FIX_1_847759065,
                    SLOW_INTEGER_CONST_BITS + SLOW_INTEGER_PASS1_BITS);

                /* Odd part per figure 8 --- note paper omits factor of sqrt(2).
                * i0..i3 in the paper are tmp0..tmp3 here.
                */

                tmp12 = tmp0 + tmp2;
                tmp13 = tmp1 + tmp3;

                z1 = (tmp12 + tmp13) * SLOW_INTEGER_FIX_1_175875602; /*  c3 */
                /* Add fudge factor here for final descale. */
                z1 += 1 << (SLOW_INTEGER_CONST_BITS + SLOW_INTEGER_PASS1_BITS - 1);

                tmp12 = tmp12 * (-SLOW_INTEGER_FIX_0_390180644);          /* -c3+c5 */
                tmp13 = tmp13 * (-SLOW_INTEGER_FIX_1_961570560);          /* -c3-c5 */
                tmp12 += z1;
                tmp13 += z1;

                z1 = (tmp0 + tmp3) * (-SLOW_INTEGER_FIX_0_899976223);       /* -c3+c7 */
                tmp0 = tmp0 * SLOW_INTEGER_FIX_1_501321110;              /*  c1+c3-c5-c7 */
                tmp3 = tmp3 * SLOW_INTEGER_FIX_0_298631336;              /* -c1+c3+c5-c7 */
                tmp0 += z1 + tmp12;
                tmp3 += z1 + tmp13;

                z1 = (tmp1 + tmp2) * (-SLOW_INTEGER_FIX_2_562915447);       /* -c1-c3 */
                tmp1 = tmp1 * SLOW_INTEGER_FIX_3_072711026;              /*  c1+c3+c5-c7 */
                tmp2 = tmp2 * SLOW_INTEGER_FIX_2_053119869;              /*  c1+c3-c5+c7 */
                tmp1 += z1 + tmp13;
                tmp2 += z1 + tmp12;

                data[dataIndex + JpegConstants.DCTSIZE * 1] = JpegUtils.RIGHT_SHIFT(tmp0, SLOW_INTEGER_CONST_BITS + SLOW_INTEGER_PASS1_BITS);
                data[dataIndex + JpegConstants.DCTSIZE * 3] = JpegUtils.RIGHT_SHIFT(tmp1, SLOW_INTEGER_CONST_BITS + SLOW_INTEGER_PASS1_BITS);
                data[dataIndex + JpegConstants.DCTSIZE * 5] = JpegUtils.RIGHT_SHIFT(tmp2, SLOW_INTEGER_CONST_BITS + SLOW_INTEGER_PASS1_BITS);
                data[dataIndex + JpegConstants.DCTSIZE * 7] = JpegUtils.RIGHT_SHIFT(tmp3, SLOW_INTEGER_CONST_BITS + SLOW_INTEGER_PASS1_BITS);

                dataIndex++;          /* advance pointer to next column */
            }
        }
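
To sanity-check the row-then-column structure and the overall factor of 8, a floating-point reference can be handy. The following is a minimal sketch, not part of the library: ReferenceDct and its members are hypothetical names, and it assumes 8-bit samples with a level shift of 128 (the value CENTERJSAMPLE is assumed to have). It computes an orthonormal 2-D DCT by rows and then columns, and multiplies the result by 8 so that it is on the same scale as the integer method's output.

```csharp
using System;

// Illustrative reference only; ReferenceDct is a hypothetical helper.
public static class ReferenceDct
{
    private const int N = 8;

    // Orthonormal 1-D DCT-II of the 8 values at input[offset + k * stride].
    private static double[] Dct1D(double[] input, int offset, int stride)
    {
        double[] output = new double[N];
        for (int u = 0; u < N; u++)
        {
            double sum = 0.0;
            for (int x = 0; x < N; x++)
            {
                sum += input[offset + x * stride]
                     * Math.Cos((2 * x + 1) * u * Math.PI / (2 * N));
            }
            double scale = (u == 0) ? Math.Sqrt(1.0 / N) : Math.Sqrt(2.0 / N);
            output[u] = scale * sum;
        }
        return output;
    }

    // 2-D DCT of one 8x8 block: rows first, then columns.
    // Samples are level-shifted by 128 (assumed) and the result is scaled by 8.
    public static double[] Forward(byte[][] samples, int startRow, int startCol)
    {
        double[] block = new double[N * N];
        for (int r = 0; r < N; r++)
            for (int c = 0; c < N; c++)
                block[r * N + c] = samples[startRow + r][startCol + c] - 128.0;

        // Pass 1: transform each row in place.
        for (int r = 0; r < N; r++)
        {
            double[] row = Dct1D(block, r * N, 1);
            Array.Copy(row, 0, block, r * N, N);
        }

        // Pass 2: transform each column and apply the overall factor of 8.
        for (int c = 0; c < N; c++)
        {
            double[] col = Dct1D(block, c, N);
            for (int r = 0; r < N; r++)
                block[r * N + c] = col[r] * 8.0;
        }
        return block;
    }
}
```

For a block in which every sample is 255, this reference gives a DC coefficient of 64 * 127 = 8128 and AC coefficients of zero (up to floating-point error). The integer method's output should agree with the reference to within rounding, which makes a comparison of this kind a simple regression check.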