The server is under maintenance between 08:00 to 12:00 (GMT+08:00), and please visit later.
We apologize for any inconvenience caused
Login  | Sign Up  |  Oriprobe Inc. Feed
China/Asia On Demand
Journal Articles
Laws/Policies/Regulations
Companies/Products
Bookmark and Share
Automatic Selection of Chinese Stoplist
Author(s): 
Pages: 337-340
Year: Issue:  4
Journal: TRANSACTIONS OF BEIJING INSTITUTE OF TECHNOLOGY

Keyword:  停用词中文停用词表联合熵;
Abstract: 通过对现有基于统计的停用词选取方法的考察,提出了一种新的停用词选取方法.用该方法分别计算词条在语料库中各个句子内发生的概率和包含该词条的句子在语料库中的概率,在此基础上计算它们的联合熵,依据联合熵选取停用词.将该方法与传统方法选取的停用词表进行了对比,并比较了将各种方法用于文本分类的预处理时对分类效果的影响.实验结果表明,该方法更好地避免了语料的行文格式对停用词选取的影响,比传统方法更适用于文本分类的预处理.
Related Articles
No related articles found