Bug report: stop words in Paoding 2.01

guoyi 2007-09-12
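The single-character stop words in x-noise-charactor.dic are kept out of the index, but the stop words in x-noise-word.dic are all indexed. The code below indexes the text "但是,的": searching for "的" returns no results, but searching for the stop word "但是" returns one hit.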


package textss;

import net.paoding.analysis.analyzer.PaodingAnalyzer;
import net.paoding.analysis.knife.Paoding;
import net.paoding.analysis.knife.PaodingMaker;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class Test {
    private static final String index = "d:/lucene/index";
    // Set by createIndex and reused by searchIndex, so both sides analyze text the same way.
    private static Analyzer analyzer = null;

    // Creates a fresh index holding one document whose "contents" field is str.
    public static void createIndex(String str) {
        IndexWriter writer = null;
        try {
            Paoding paoding = PaodingMaker.make();
            analyzer = PaodingAnalyzer.writerMode(paoding);
            writer = new IndexWriter(index, analyzer, true);
            Document doc = new Document();
            doc.add(new Field("contents", str, Field.Store.YES, Field.Index.TOKENIZED));
            writer.addDocument(doc);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                // Guard against a failed constructor leaving writer null.
                if (writer != null) writer.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    // Parses queries with the shared analyzer and prints the number of hits.
    public static void searchIndex(String queries) {
        // Declared outside the try block so the finally block closes these same objects.
        IndexReader reader = null;
        IndexSearcher searcher = null;
        try {
            reader = IndexReader.open(index);
            searcher = new IndexSearcher(reader);
            QueryParser parser = new QueryParser("contents", analyzer);
            Query query = parser.parse(queries);
            Hits hits = searcher.search(query);
            System.out.println(queries + ": " + hits.length());
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                if (searcher != null) searcher.close();
                if (reader != null) reader.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) {
        createIndex("但是,的");
        searchIndex("的");   // with 2.01: no results, as expected for a stop character
        searchIndex("但是"); // with 2.01: one result, although "但是" is a stop word
    }
}
guoyi 2007-09-12
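I downloaded it from here:
Zip download: http://code.google.com/p/paoding/downloads/list
Eclipse's encoding is set to UTF-8, and none of the dictionaries under /dic have been modified.
Debugging shows that both the stop-character and the stop-word dictionaries are loaded; I have not yet worked out from the code why the stop words are not removed afterwards.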
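
For comparison, the expected behaviour can be sketched in isolation: a token is dropped if it appears in either stop list, so neither "的" nor "但是" should reach the index. This is only a toy model and not Paoding's actual implementation. The class name StopWordSketch, the two sets, and the filter method are hypothetical stand-ins for the contents of x-noise-charactor.dic and x-noise-word.dic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopWordSketch {
    // Hypothetical stand-ins for the two dictionaries; the real files hold many more entries.
    static final Set<String> NOISE_CHARS = new HashSet<String>(Arrays.asList("的"));
    static final Set<String> NOISE_WORDS = new HashSet<String>(Arrays.asList("但是"));

    // Returns the tokens that appear in neither stop list, preserving their order.
    public static List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<String>();
        for (String token : tokens) {
            if (NOISE_CHARS.contains(token) || NOISE_WORDS.contains(token)) {
                continue; // stop character or stop word: should not be indexed
            }
            kept.add(token);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Both tokens are stop entries, so nothing is left to index.
        System.out.println(filter(Arrays.asList("但是", "的")).size());
        // An ordinary word survives the filter.
        System.out.println(filter(Arrays.asList("但是", "的", "索引")));
    }
}
```

Under this model the reported 2.01 behaviour corresponds to only the NOISE_CHARS check taking effect, which is consistent with "的" being dropped while "但是" is still indexed.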
guoyi 2007-09-12
I just downloaded the 2.02 trial version, and the problem no longer occurs.