Lucene.Net.Analysis.CommonGrams.CommonGramsFilter.IncrementToken C# (CSharp) Method

Project: apache/lucenenet

IncrementToken() public method

Inserts bigrams for common words into a token stream. For each input token, the filter outputs the token itself. If the token and/or the following token is in the list of common words, it also outputs a bigram with position increment 0 and type="gram".

TODO: Consider adding an option to not emit unigram stopwords, as in CDL XTF BigramStopFilter. CommonGramsQueryFilter would need to be changed to work with this.

TODO: Consider optimizing for the case of three common grams, i.e. "man of the year" normally produces 3 bigrams: "man-of", "of-the", "the-year", but with proper management of positions we could eliminate the middle bigram "of-the" and save a disk seek and a whole set of position lookups.
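The rule described above can be sketched without any Lucene.Net dependency. The following is a minimal illustration, not the library's implementation: the class name `CommonGramsSpecSketch` and the method `Expand` are hypothetical, the "_" separator is taken from the test expectations further down ("How_the"), and the unigram type "word" with position increment 1 is an assumption about the upstream tokenizer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: applies the documented rule to a plain list of words.
public static class CommonGramsSpecSketch
{
    public static List<(string Term, int PositionIncrement, string Type)> Expand(
        IReadOnlyList<string> words, ISet<string> commonWords)
    {
        var output = new List<(string Term, int PositionIncrement, string Type)>();
        for (int i = 0; i < words.Count; i++)
        {
            // Every input token is always emitted as a unigram
            // (type "word" and increment 1 are assumptions).
            output.Add((words[i], 1, "word"));

            // If this token or the next one is a common word, also emit a
            // bigram stacked at the same position (position increment 0).
            bool hasNext = i + 1 < words.Count;
            if (hasNext &&
                (commonWords.Contains(words[i]) || commonWords.Contains(words[i + 1])))
            {
                output.Add((words[i] + "_" + words[i + 1], 0, "gram"));
            }
        }
        return output;
    }
}
```

Calling `Expand` on the words "man", "of", "the", "year" with "of" and "the" as common words yields man, man_of, of, of_the, the, the_year, year, which includes the three bigrams mentioned in the second TODO.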

public IncrementToken ( ) : bool
Returns: bool (true if a token was produced, false when the input is exhausted)

        public override bool IncrementToken()
        {
            // get the next piece of input
            if (savedState != null)
            {
                RestoreState(savedState);
                savedState = null;
                SaveTermBuffer();
                return true;
            }
            else if (!input.IncrementToken())
            {
                return false;
            }

            /* We build n-grams before and after stopwords.
             * When valid, the buffer always contains at least the separator.
             * If it's empty, there is nothing before this stopword.
             */
            if (lastWasCommon || (Common && buffer.Length > 0))
            {
                savedState = CaptureState();
                GramToken();
                return true;
            }

            SaveTermBuffer();
            return true;
        }
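To make the control flow of the method above easier to follow, here is a simplified stand-alone sketch of the same state machine. It is an approximation under stated assumptions: plain strings stand in for the captured attribute state, offsets, position increments, and token types are omitted, the "_" separator is inferred from the test expectations below, and the class name `CommonGramsStateSketch` is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for the filter: same branching, strings instead of attributes.
public sealed class CommonGramsStateSketch
{
    private readonly IEnumerator<string> input;
    private readonly ISet<string> commonWords;
    private readonly StringBuilder buffer = new StringBuilder();
    private string saved = "";      // stand-in for savedState
    private bool hasSaved;          // true while a unigram is waiting to be replayed
    private bool lastWasCommon;

    public string Current { get; private set; } = "";

    public CommonGramsStateSketch(IEnumerable<string> words, ISet<string> commonWords)
    {
        input = words.GetEnumerator();
        this.commonWords = commonWords;
    }

    public bool IncrementToken()
    {
        if (hasSaved)
        {
            // A bigram was emitted on the previous call; now replay the unigram.
            Current = saved;
            hasSaved = false;
            SaveTermBuffer();
            return true;
        }
        if (!input.MoveNext())
        {
            return false;
        }
        Current = input.Current;

        if (lastWasCommon || (commonWords.Contains(Current) && buffer.Length > 0))
        {
            // Park the unigram, then emit "previous_current" in its place.
            saved = Current;
            hasSaved = true;
            buffer.Append(Current);
            Current = buffer.ToString();
            buffer.Length = 0;
            return true;
        }

        SaveTermBuffer();
        return true;
    }

    // Remember the current term plus separator as the prefix of a possible bigram.
    private void SaveTermBuffer()
    {
        buffer.Length = 0;
        buffer.Append(Current).Append('_');
        lastWasCommon = commonWords.Contains(Current);
    }
}
```

Feeding this sketch the words "How", "the", "s" with "the" and "s" as common words produces How, How_the, the, the_s, s, which matches the order asserted in the usage example below. The key design point is that each bigram is emitted one call before the unigram it ends with, so the unigram has to be saved and replayed on the next call.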

Usage Example

Example #1

File: CommonGramsFilterTest.cs Project: ChristopherHaws/lucenenet

        public virtual void TestReset()
        {
            const string input = "How the s a brown s cow d like A B thing?";
            WhitespaceTokenizer wt = new WhitespaceTokenizer(TEST_VERSION_CURRENT, new StringReader(input));
            CommonGramsFilter cgf = new CommonGramsFilter(TEST_VERSION_CURRENT, wt, commonWords);

            ICharTermAttribute term = cgf.AddAttribute<ICharTermAttribute>();
            cgf.Reset();
            assertTrue(cgf.IncrementToken());
            assertEquals("How", term.ToString());
            assertTrue(cgf.IncrementToken());
            assertEquals("How_the", term.ToString());
            assertTrue(cgf.IncrementToken());
            assertEquals("the", term.ToString());
            assertTrue(cgf.IncrementToken());
            assertEquals("the_s", term.ToString());
            cgf.Dispose();

            wt.Reader = new StringReader(input);
            cgf.Reset();
            assertTrue(cgf.IncrementToken());
            assertEquals("How", term.ToString());
        }
