Reporting a stop-word bug in Paoding (庖丁) 2.01
guoyi
2007-09-12
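Single-character stop words from x-noise-charactor.dic are correctly left out of the index, but every stop word from x-noise-word.dic still gets indexed. With the code below I index the text "但是,的". A search for "的" returns no results, but a search for the stop word "但是" returns one hit.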
package textss;

import net.paoding.analysis.analyzer.PaodingAnalyzer;
import net.paoding.analysis.knife.Paoding;
import net.paoding.analysis.knife.PaodingMaker;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class Test {
    private static Paoding paoding = null;
    private static Analyzer analyzer = null;
    private static IndexReader reader = null;
    private static IndexSearcher searcher = null;
    private static IndexWriter writer = null;
    private static String index = "d:/lucene/index";

    // Build a fresh index containing one document whose "contents" field is str
    public static void createIndex(String str) {
        try {
            paoding = PaodingMaker.make();
            analyzer = PaodingAnalyzer.writerMode(paoding);
            writer = new IndexWriter(index, analyzer, true); // true = create a new index
            Document doc = new Document();
            doc.add(new Field("contents", str, Field.Store.YES, Field.Index.TOKENIZED));
            writer.addDocument(doc);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                if (writer != null) writer.close();
            } catch (Exception e) {}
        }
    }

    // Parse the query with the same analyzer and print the number of hits
    public static void searchIndex(String queries) {
        try {
            // Assign the static fields rather than declaring new locals,
            // so the finally block closes the objects actually opened here
            reader = IndexReader.open(index);
            searcher = new IndexSearcher(reader);
            QueryParser parser = new QueryParser("contents", analyzer);
            Query query = parser.parse(queries);
            Hits hits = searcher.search(query);
            System.out.println(hits.length());
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                if (searcher != null) searcher.close();
                if (reader != null) reader.close();
            } catch (Exception e) {}
        }
    }

    public static void main(String[] args) {
        createIndex("但是,的");
        searchIndex("但是"); // 2.01: 1 hit, although "但是" is a stop word
        searchIndex("的");   // 2.01: 0 hits, as expected
    }
}
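What the report expects is that tokens from both dictionaries are dropped before indexing. A minimal, self-contained sketch of that expected behavior follows, assuming two in-memory sets stand in for the contents of x-noise-charactor.dic and x-noise-word.dic. The class NoiseFilterSketch and its filter method are hypothetical names for illustration only; this is not Paoding's API and not how Paoding actually implements stop-word removal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not Paoding code: drop any token that appears
// in either the single-character stop set or the multi-character stop set.
public class NoiseFilterSketch {
    // Assumed stand-ins for entries of x-noise-charactor.dic and x-noise-word.dic
    static final Set<String> NOISE_CHARS = new HashSet<String>(Arrays.asList("的"));
    static final Set<String> NOISE_WORDS = new HashSet<String>(Arrays.asList("但是"));

    static List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<String>();
        for (String t : tokens) {
            // Both sets are consulted; the bug report suggests 2.01
            // effectively applied only the single-character one.
            if (NOISE_CHARS.contains(t) || NOISE_WORDS.contains(t)) {
                continue;
            }
            kept.add(t);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("但是", "的", "搜索");
        System.out.println(filter(tokens)); // prints [搜索]
    }
}
```

Under this expectation, neither "但是" nor "的" would reach the index, so both searches in the test program above would return 0.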
guoyi
2007-09-12
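I downloaded it from here:
ZIP download: http://code.google.com/p/paoding/downloads/list. Eclipse's encoding is set to UTF-8, and I did not modify any of the dictionaries under /dic. Debugging shows that both the stop-character and the stop-word dictionaries are loaded; I haven't yet worked out from the code why the stop words are not removed afterward.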
guoyi
2007-09-12
I just downloaded the 2.02 trial version, and the problem no longer occurs.

