Hadoop svm

A Keyword Classification Solution Based on Hadoop + SVM




Team name: 雨石 (Yushi)

Team members: 张延祥 (Zhang Yanxiang), 潘临杰 (Pan Linjie)



Contents

  • Overall algorithm flow

  • Hadoop implementation

  • Tuning

  • Extension points

  • References



Overall algorithm flow

  • Chinese word segmentation

  • Vectorization

  • Model training

  • Sample prediction
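The vectorization step in the flow above can be sketched as follows. This is a minimal illustration, not the team's code: it assumes the text has already been segmented into tokens, uses raw term counts as feature values, and writes the sparse `label index:value` line format that LIBLINEAR reads (1-based indices in ascending order). The helper names `build_vocab` and `to_liblinear_line` are hypothetical.

```python
from collections import Counter


def build_vocab(docs):
    """Assign each distinct token a 1-based feature index, in first-seen order."""
    vocab = {}
    for tokens in docs:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab


def to_liblinear_line(label, tokens, vocab):
    """Render one sample as '<label> <index>:<value> ...' with ascending indices.

    Tokens missing from the vocabulary are skipped; values are raw counts
    (an assumption, the slides do not say which weighting was used here).
    """
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    feats = " ".join(f"{idx}:{cnt}" for idx, cnt in sorted(counts.items()))
    return f"{label} {feats}".rstrip()


# Toy, already-segmented documents (hypothetical data).
docs = [["hadoop", "svm", "hadoop"], ["keyword", "svm"]]
vocab = build_vocab(docs)
lines = [to_liblinear_line(label, toks, vocab) for label, toks in zip([1, 2], docs)]
```

Each resulting line could be written to a training file, one sample per line, before model training.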



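Hadoop implementation

  • Word segmentation

    • IKAnalyzer

  • SVM

    • Liblinear

    • Reading from HDFS

    • One-vs-one training or grouped training

    • Training and prediction with map-reduce

  • Prediction by voting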
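The voting step can be pictured as a small map/reduce-shaped program. The sketch below runs in plain Python rather than on Hadoop, and it rests on assumptions: each trained model (whether one-vs-one or one per group) is represented by a hypothetical callable that returns a label, the map step emits `(sample_id, label)` pairs, and the reduce step takes a majority vote with ties broken by the smallest label. The slides do not specify the tie-breaking rule or the exact job layout.

```python
from collections import Counter, defaultdict


def map_predict(model, samples):
    """Map step: one model emits (sample_id, predicted_label) for every sample."""
    for sample_id, features in samples:
        yield sample_id, model(features)


def reduce_vote(sample_id, labels):
    """Reduce step: majority vote; ties go to the smallest label (an assumption)."""
    counts = Counter(labels)
    top = max(counts.values())
    winner = min(label for label, c in counts.items() if c == top)
    return sample_id, winner


def predict_by_voting(models, samples):
    """Simulate the shuffle phase by grouping emitted labels per sample id."""
    grouped = defaultdict(list)
    for model in models:
        for sample_id, label in map_predict(model, samples):
            grouped[sample_id].append(label)
    return dict(reduce_vote(sid, labels) for sid, labels in grouped.items())


# Three stand-in "models" (hypothetical); real ones would wrap trained classifiers.
models = [
    lambda f: "a",
    lambda f: "b" if f[0] > 0 else "a",
    lambda f: "b",
]
samples = [("s1", [1.0]), ("s2", [-1.0])]
result = predict_by_voting(models, samples)
```

On a real cluster the grouping by `sample_id` would be done by the framework's shuffle, so only the map and reduce functions would need to be ported.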



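Tuning

  • Trade-off between the number of groups and classification performance (0.05%-0.15%)

  • Fine-grained word segmentation (about 0.8%)

    • Coarse-grained: 张三/说的/确实/在理

    • Fine-grained: 张三/三/说的/的确/确实/实在/在理

  • Vectorization weights (0.02%)

  • SVM parameters (0.2%)

    • -s 4 (MCSVM_CS, multi-class SVM by Crammer and Singer)

  • Stop words (0.04%)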
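To make the fine-grained segmentation example above concrete, here is a toy sketch that emits every dictionary word found in a sentence, including overlapping ones, and then filters stop words. It is not IKAnalyzer's algorithm: the dictionary, the `max_len` limit, and the function names are assumptions chosen so that the slide's example can be reproduced.

```python
def fine_grained_segment(text, dictionary, max_len=4):
    """Emit every dictionary word occurring in text, ordered by start position.

    Overlapping matches are all kept, which is what makes the output
    finer-grained than a single non-overlapping segmentation.
    """
    words = []
    for start in range(len(text)):
        longest_end = min(len(text), start + max_len)
        for end in range(start + 1, longest_end + 1):
            piece = text[start:end]
            if piece in dictionary:
                words.append(piece)
    return words


def remove_stop_words(tokens, stop_words):
    """Drop tokens that appear in the stop-word set."""
    return [t for t in tokens if t not in stop_words]


# Toy dictionary (hypothetical) containing the words from the slide's example.
dictionary = {"张三", "三", "说的", "的确", "确实", "实在", "在理"}
tokens = fine_grained_segment("张三说的确实在理", dictionary)
filtered = remove_stop_words(tokens, {"说的"})
```

Keeping the overlapping words gives the classifier more features per sample, which is one plausible reason this setting helped. The SVM parameter listed on the slide corresponds to LIBLINEAR's solver option `-s 4`.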



Extension points

  • Parallelizing model training

  • Splitting the data (sampling, clustering, etc.)
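One way the sampling-based data split might look is sketched below. The slides list this only as a possible extension, so everything here is an assumption: samples are `(label, features)` pairs, each class is shuffled with a fixed seed and dealt round-robin into shards so that every shard keeps roughly the original class balance, and each shard could then be trained independently in parallel. `stratified_shards` is a hypothetical name.

```python
import random
from collections import Counter, defaultdict


def stratified_shards(samples, n_shards, seed=0):
    """Split (label, features) samples into n_shards with similar class balance."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for label, features in samples:
        by_label[label].append((label, features))

    shards = [[] for _ in range(n_shards)]
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        # Deal this class's samples round-robin across the shards.
        for i, item in enumerate(items):
            shards[i % n_shards].append(item)
    return shards


# Toy data (hypothetical): four samples of class "a", two of class "b".
samples = [("a", i) for i in range(4)] + [("b", i) for i in range(2)]
shards = stratified_shards(samples, n_shards=2)
label_counts = [Counter(label for label, _ in shard) for shard in shards]
```

A clustering-based split, the other option named on the slide, would replace the per-class shuffle with an assignment of samples to clusters.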



References

  • IKAnalyzer website: https://code.google.com/p/ik-analyzer/

  • Liblinear website: http://www.csie.ntu.edu.tw/~cjlin/liblinear/

  • Fan R E, Chang K W, Hsieh C J, et al. LIBLINEAR: A library for large linear classification[J]. The Journal of Machine Learning Research, 2008, 9: 1871-1874.

  • Keerthi S S, Sundararajan S, Chang K W, et al. A sequential dual method for large scale multi-class linear SVMs[C]//Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2008: 408-416.



Thank you!

  • Q&A

