A Keyword Classification Solution Based on Hadoop + SVM



Presentation Transcript
slide1

A Keyword Classification Solution Based on Hadoop + SVM

Team name: 雨石 (Yushi)

Team members: 张延祥 (Zhang Yanxiang), 潘临杰 (Pan Linjie)

slide2
Contents
  • Overall algorithm flow
  • Hadoop implementation
  • Tuning
  • Extension points
  • References
slide3
Overall algorithm flow
  • Chinese word segmentation
  • Vectorization
  • Model training
  • Sample prediction
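As an illustration of the vectorization step, the following is a minimal sketch that turns already-segmented documents into the sparse `label index:value` text format that Liblinear reads. The TF-IDF weighting, the smoothing formula, and all function names are assumptions made for this example; the slides do not say which weighting scheme the team used.

```python
import math
from collections import Counter


def build_vocab(docs):
    """Assign each distinct token a 1-based feature index (Liblinear indices start at 1)."""
    vocab = {}
    for tokens in docs:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab


def idf_table(docs, vocab):
    """Smoothed inverse document frequency (an assumed weighting, not from the slides)."""
    n = len(docs)
    df = Counter(tok for tokens in docs for tok in set(tokens))
    return {tok: math.log((1 + n) / (1 + df[tok])) + 1.0 for tok in vocab}


def to_liblinear_line(label, tokens, vocab, idf):
    """Format one sample as 'label idx:value ...' with indices in ascending order."""
    tf = Counter(t for t in tokens if t in vocab)  # unknown tokens are dropped
    feats = sorted((vocab[t], count * idf[t]) for t, count in tf.items())
    return " ".join([str(label)] + [f"{i}:{w:.4f}" for i, w in feats])


# Toy data: two segmented keywords with made-up class labels.
docs = [["张三", "说的", "确实", "在理"], ["说的", "在理"]]
labels = [1, 2]
vocab = build_vocab(docs)
idf = idf_table(docs, vocab)
lines = [to_liblinear_line(y, d, vocab, idf) for y, d in zip(labels, docs)]
```

Each string in `lines` is one training sample; written to a file, they form the input for the model training step.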
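slide4
Hadoop implementation
  • Word segmentation
    • IKAnalyzer
  • SVM
    • Liblinear
    • Reading from HDFS
    • One-vs-one training or grouped training
    • Training and prediction with MapReduce
  • Voting-based prediction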
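The combination of one-vs-one models, MapReduce prediction, and voting can be sketched in plain Python as below. This is a single-process simulation rather than Hadoop code: the model representation (a weight dictionary plus a bias), the tie-breaking rule, and the names `map_predict`, `reduce_votes`, and `run` are all assumptions for illustration, and the slides do not describe how the team laid out its actual jobs.

```python
from collections import Counter, defaultdict


def map_predict(model_key, model, samples):
    """Map step: one pairwise model scores every sample and emits (sample_id, voted_label)."""
    a, b = model_key
    weights, bias = model
    for sample_id, features in samples.items():
        score = bias + sum(weights.get(i, 0.0) * v for i, v in features.items())
        yield sample_id, (a if score > 0 else b)


def reduce_votes(sample_id, voted_labels):
    """Reduce step: majority vote; ties go to the smallest label (an assumed rule)."""
    counts = Counter(voted_labels)
    return sample_id, min(counts, key=lambda c: (-counts[c], c))


def run(models, samples):
    """Group mapper output by sample id (stands in for the shuffle phase), then reduce."""
    grouped = defaultdict(list)
    for key, model in models.items():
        for sample_id, label in map_predict(key, model, samples):
            grouped[sample_id].append(label)
    return dict(reduce_votes(s, v) for s, v in grouped.items())


# Toy pairwise linear models for classes 1, 2, 3: {(a, b): (weights, bias)}.
models = {
    (1, 2): ({1: 1.0}, 0.0),
    (1, 3): ({1: -1.0}, 0.0),
    (2, 3): ({1: -0.5}, 0.0),
}
samples = {"s1": {1: 1.0}, "s2": {1: -1.0}}
predictions = run(models, samples)
```

Because each model scores samples independently, the map step can be spread across workers, and only the per-sample vote lists need to meet in the reducer.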
slide5
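Tuning
  • Trade-off between the number of groups and classification performance (0.05%-0.15%)
  • Fine-grained word segmentation (about 0.8%)
    • Coarse: 张三/说的/确实/在理
    • Fine-grained: 张三/三/说的/的确/确实/实在/在理
  • Vectorization weights (0.02%)
  • SVM parameters (0.2%)
    • -s 4 (MCSVM_CS, multi-class SVM by Crammer and Singer)
  • Stop words (0.04%)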
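The difference between the two segmentations in the example can be reproduced with a small dictionary-based sketch. This is not IKAnalyzer's algorithm: the dictionary, the `max_len` limit, and both function names are hypothetical, chosen only so that the toy output matches the two lines on the slide.

```python
def coarse_segment(text, dictionary, max_len=4):
    """Forward maximum matching: take the longest dictionary word at each position."""
    words, pos = [], 0
    while pos < len(text):
        for end in range(min(len(text), pos + max_len), pos, -1):
            # Fall back to a single character when nothing in the dictionary matches.
            if text[pos:end] in dictionary or end == pos + 1:
                words.append(text[pos:end])
                pos = end
                break
    return words


def fine_grained_segment(text, dictionary, max_len=4):
    """Emit every dictionary word found at every start position, overlaps included."""
    words = []
    for start in range(len(text)):
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            if text[start:end] in dictionary:
                words.append(text[start:end])
    return words


# Hypothetical dictionary covering the example sentence.
dictionary = {"张三", "三", "说的", "的确", "确实", "实在", "在理"}
sentence = "张三说的确实在理"
coarse = "/".join(coarse_segment(sentence, dictionary))
fine = "/".join(fine_grained_segment(sentence, dictionary))
```

The fine-grained output contains overlapping words such as 的确 and 实在, so a keyword yields more features than under coarse segmentation.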
slide6
Extension points
  • Parallelizing model training
  • Partitioning the data (sampling, clustering, etc.)
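One way the data partitioning idea might look is a stratified split, where each shard keeps roughly the same class mix so that a separate model could be trained on each shard in parallel. The slides only name sampling and clustering as options; the stratified strategy, the function name `stratified_shards`, and the `(label, features)` sample layout are assumptions of this sketch.

```python
import random
from collections import defaultdict


def stratified_shards(samples, num_shards, seed=0):
    """Split (label, features) samples into shards with a similar label distribution."""
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[0]].append(sample)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shards = [[] for _ in range(num_shards)]
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        # Deal the samples of this class round-robin across the shards.
        for i, sample in enumerate(group):
            shards[i % num_shards].append(sample)
    return shards


# Toy data: six samples of class 1 and four of class 2.
samples = [(1, {1: float(i)}) for i in range(6)] + [(2, {2: float(i)}) for i in range(4)]
shards = stratified_shards(samples, num_shards=2)
label_counts = [sorted(label for label, _ in shard) for shard in shards]
```

Each shard could then be handed to its own training task, with the resulting models combined at prediction time, for example by voting.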
slide7
References
  • IKAnalyzer website: https://code.google.com/p/ik-analyzer/
  • Liblinear website: http://www.csie.ntu.edu.tw/~cjlin/liblinear/
  • Fan R E, Chang K W, Hsieh C J, et al. LIBLINEAR: A library for large linear classification[J]. The Journal of Machine Learning Research, 2008, 9: 1871-1874.
  • Keerthi S S, Sundararajan S, Chang K W, et al. A sequential dual method for large scale multi-class linear SVMs[C]//Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2008: 408-416.