java, j2ee: Using Lucene Java for full-text search

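Lucene is an excellent full-text search library. After reading some of the documentation and running a few tests,
here is how we integrated it into our product, RUNEC/WE.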

Adding objects to the index

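Every object we want to index needs a unique oid/uuid/guid, because we need a way to tie the object to its index entry.
getIndexDir() is the directory where Lucene stores the index it generates.
getAnalyzer() returns the analyzer. We use StandardAnalyzer; you may want to consider CJKAnalyzer.
o is the object to index.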

Example

protected void addIndex(Object o) {
  // "Object" stands in for your own domain type that exposes oid(), getTitle(), etc.
  // Create a new index only if none exists yet in the index directory.
  final boolean create = !IndexReader.indexExists(getIndexDir());

  IndexWriter w = null;
  try {
    w = new IndexWriter(getIndexDir(), getAnalyzer(), create);
    w.mergeFactor = 20; //default:10

    final Document ld = new Document();
    //oid: stored and indexed as one untokenized term, so it can be matched exactly later
    ld.add( Field.Keyword("oid", o.oid().toString()) );
    //other fields to index
    //note: Keyword fields are not tokenized, Text fields are
    ld.add( Field.Keyword("title", o.getTitle()) );
    ld.add( Field.Text("content", o.getMessage()) );
    ld.add( Field.Text("content", o.getAppendix()) );

    //...
    w.addDocument(ld);
    w.optimize(); //merges index segments; costly if done on every add
  } catch (IOException ex) {
    throw SystemException.Aide.wrap(ex);
  } finally {
    try {
      if (w != null)
        w.close();
    } catch (IOException ex) {
      throw SystemException.Aide.wrap(ex);
    }
  }
}
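
On the analyzer choice mentioned above: CJKAnalyzer indexes runs of Chinese, Japanese, or Korean characters as overlapping two-character tokens (bigrams), so a phrase such as 全文搜尋 becomes 全文, 文搜, 搜尋. The sketch below only illustrates that idea in plain Java. BigramSketch and bigrams() are hypothetical names, not part of Lucene, and the real analyzer also deals with mixed-script text, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not Lucene code: shows overlapping bigram tokens. */
public class BigramSketch {

  /** Returns overlapping two-character tokens; a single character yields itself. */
  public static List<String> bigrams(String text) {
    final List<String> tokens = new ArrayList<String>();
    if (text == null || text.length() == 0)
      return tokens;
    if (text.length() == 1) {
      tokens.add(text);
      return tokens;
    }
    // Slide a window of width 2 across the text, one character at a time.
    for (int i = 0; i + 1 < text.length(); i++)
      tokens.add(text.substring(i, i + 2));
    return tokens;
  }

  public static void main(String[] args) {
    // The escapes spell the four-character phrase for "full-text search".
    System.out.println(bigrams("\u5168\u6587\u641c\u5c0b"));
  }
}
```

Because the bigrams overlap, a two-character query term can match in the middle of a longer run of text, which is the usual reason to prefer this style of tokenization for CJK content.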

Remember to remove the index entry when an object is deleted

Use o's oid to find the index entry, then delete it.

Example

protected void removeIndex(Object o) {
  // Nothing to remove if no index has been created yet.
  if (!IndexReader.indexExists(getIndexDir()))
    return;

  IndexReader r = null;
  try {
    r = IndexReader.open(getIndexDir());

    // Delete every document whose "oid" term matches exactly.
    final Term term = new Term("oid", o.oid().toString());
    r.delete(term);
  } catch (IOException ex) {
    throw SystemException.Aide.wrap(ex);
  } finally {
    try {
      if (r != null)
        r.close();
    } catch (IOException ex) {
      throw SystemException.Aide.wrap(ex);
    }
  }
}

Remember to update the index when an object is modified

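I could not find a way to modify an existing index entry in Lucene, so the only option is to use o's oid to find the entry, delete it, and add it again. In other words, combine the two steps above: remove, then add.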
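
The remove-then-add pattern can be sketched without Lucene at all. The class below is a hypothetical in-memory stand-in (UpdateSketch and its methods are made-up names for illustration), modelled as an append-only list so that, like the real index, adding the same oid twice leaves two entries. That is why the removal has to come first.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical in-memory stand-in for the index; not Lucene code. */
public class UpdateSketch {

  /** One indexed entry: the object's oid plus its searchable text. */
  static final class Entry {
    final String oid;
    final String content;
    Entry(String oid, String content) {
      this.oid = oid;
      this.content = content;
    }
  }

  // Append-only list: adding never replaces an earlier entry with the same oid.
  private final List<Entry> entries = new ArrayList<Entry>();

  public void addIndex(String oid, String content) {
    entries.add(new Entry(oid, content));
  }

  /** Removes every entry stored under the given oid. */
  public void removeIndex(String oid) {
    for (Iterator<Entry> it = entries.iterator(); it.hasNext();)
      if (it.next().oid.equals(oid))
        it.remove();
  }

  /** Update = remove the stale entry by oid, then add the fresh one. */
  public void updateIndex(String oid, String content) {
    removeIndex(oid);
    addIndex(oid, content);
  }

  /** Number of entries currently stored under the given oid. */
  public int count(String oid) {
    int n = 0;
    for (Entry e : entries)
      if (e.oid.equals(oid))
        n++;
    return n;
  }

  /** Content of the first entry with the given oid, or null if none. */
  public String contentOf(String oid) {
    for (Entry e : entries)
      if (e.oid.equals(oid))
        return e.content;
    return null;
  }
}
```

With the methods shown earlier in this post, the equivalent would simply be a method that calls removeIndex(o) and then addIndex(o).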

Searching the index

Lucene supports a fairly complex query syntax, but simple queries are enough for us.
"keyword" is the string to search for.

Example

Hits hits = null;
try {
  final Query qt = QueryParser.parse(keyword, "title", getAnalyzer());
  final Query qc = QueryParser.parse(keyword, "content", getAnalyzer());

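  // OR the two queries: each clause is neither required nor prohibited.
  final BooleanQuery q = new BooleanQuery();
  q.add(qt, false, false);
  q.add(qc, false, false);

  // The searcher is left open here because Hits loads documents lazily.
  final IndexSearcher s = new IndexSearcher(getIndexDir());
  hits = s.search(q);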
} catch (ParseException ex) {
  throw SystemException.Aide.wrap(ex);
} catch (IOException ex) {
  throw SystemException.Aide.wrap(ex);
}

Example, where n is the zero-based position of a result within the hits

try {
  final Document ld = hits.doc(n);
  final float score = hits.score(n);

  final String oid = ld.get("oid");
  //you can look for o from oid here

  final String title = ld.get("title");
  final String[] cts = ld.getValues("content");
  final StringBuffer sb = new StringBuffer();
  for (int i = 0; i < cts.length; i++)
    sb.append(cts[i]).append(' ');
} catch (IOException ex) {
  throw SystemException.Aide.wrap(ex);
}
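
A search screen usually shows one page of results at a time, so n runs over a sub-range of the hits. Here is a minimal sketch of computing that range. PageRange is a hypothetical helper, not part of Lucene, and it assumes zero-based page numbers; total would come from hits.length().

```java
/** Hypothetical helper, not Lucene code: bounds of one page of results. */
public class PageRange {
  public final int from; // first result index on the page, inclusive
  public final int to;   // one past the last result index, exclusive

  /**
   * @param total    total number of results, e.g. hits.length()
   * @param page     zero-based page number (an assumption of this sketch)
   * @param pageSize number of results per page, must be positive
   */
  public PageRange(int total, int page, int pageSize) {
    if (total < 0 || page < 0 || pageSize <= 0)
      throw new IllegalArgumentException("invalid paging arguments");
    // Use long arithmetic so large page numbers cannot overflow.
    final long start = (long) page * pageSize;
    this.from = (int) Math.min(start, total);
    this.to = (int) Math.min(start + pageSize, total);
  }
}
```

The loop over one page would then be for (int n = r.from; n < r.to; n++), with the body reading hits.doc(n) and hits.score(n) as in the example above. A page past the end yields an empty range rather than an error.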

Luke is a handy tool for inspecting the contents of the index directory that Lucene generates, which helps with debugging.

Monday, February 20, 2006