
Fujitsu R&D Center (Beijing) Internship Report, Qiu Cheng



  1. Fujitsu R&D Center (Beijing) Internship Report, Qiu Cheng

  2. Report Topics • Work at Fujitsu • Introduction to the Auto-Regressive and Moving Average Model (ARMA) • Introduction to RHadoop

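  3. Work at Fujitsu • Studying data-selection methods: TBSC, the mean method, and indicative segments; • Optimizing the ARMA and SVR models; • Dynamically combining the ARMA and SVR models.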

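  4. Description of the Mean Method • Basic steps: • Find the five days whose values over hours 1–9 are closest, in Euclidean distance, to those of the forecast day; • Expand this set of five days using the Euclidean distance over hours 10–20; • Cluster all the days obtained in the first two steps into two groups with k-means; • Take the most recent earlier day that falls on the same day of the week as the forecast day as the reference day, compute its Euclidean distance to the two cluster centers, and choose the closer cluster; • Average the days in the chosen cluster to produce the forecast.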
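The five steps above can be sketched in R as follows. This is a minimal illustration under stated assumptions, not the original implementation: each day is a row of a matrix with one column per hour, `mean_method_forecast` and its arguments are hypothetical names, and the "expansion" in step 2 is interpreted here as adding each selected day's nearest neighbour over hours 10–20, since the slide does not spell that step out.

```r
# Minimal sketch of the mean method (hypothetical helper, not the original code).
# history:       matrix, one past day per row, one column per hour
# today_morning: observed values of the forecast day for the `morning` hours
# ref_day:       full profile of the most recent earlier day on the same weekday
mean_method_forecast <- function(history, today_morning, ref_day,
                                 morning = 1:9, midday = 10:20, k = 5) {
  # Euclidean distance from every row of m to the vector v
  dist_to <- function(m, v) sqrt(rowSums(sweep(m, 2, v)^2))

  # Step 1: the k days closest to the forecast day over the morning hours
  near <- order(dist_to(history[, morning, drop = FALSE], today_morning))[seq_len(k)]

  # Step 2 (assumed interpretation of "expand"): add each selected day's
  # nearest other day, measured over the midday hours
  mid <- history[, midday, drop = FALSE]
  extra <- vapply(near, function(i) {
    d <- dist_to(mid, mid[i, ])
    d[i] <- Inf  # a day is not its own neighbour
    which.min(d)
  }, integer(1))
  pool <- unique(c(near, extra))

  # Step 3: split the pooled days into two clusters with k-means
  km <- kmeans(history[pool, , drop = FALSE], centers = 2, nstart = 10)

  # Step 4: keep the cluster whose centre is closer to the reference day
  chosen <- which.min(dist_to(km$centers, ref_day))

  # Step 5: the forecast is the mean profile of the days in that cluster
  colMeans(history[pool[km$cluster == chosen], , drop = FALSE])
}
```

Called as `mean_method_forecast(history, today_morning, ref_day)`, it returns one averaged value per hour column. Because k-means starts from random centres, call `set.seed()` first if results must be reproducible.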

  5. Introduction to the ARMA Model • Principles of the ARMA model • Optimizing the ARMA model • Using ARMA models in R

  6. ARMA Basics • Auto-Regressive model • Moving Average model

  7. [Figure: a time series of observations X0 through X12]

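  8. ARMA Basics • The autoregressive (AR) model describes the relationship between the current value and past values; • The moving-average (MA) model describes the accumulated effect of past error terms, i.e. the part the autoregression leaves unexplained; • The ARMA model combines the autoregressive prediction with these accumulated errors.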
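For reference, the standard ARMA(p, q) model is written below, with $\varepsilon_t$ white noise and $c$ a constant. The first sum is the AR part and the second the MA part; the MA terms use a plus sign, the same convention R's `arima()` documents. Setting $q = 0$ gives a pure AR(p) model and $p = 0$ a pure MA(q) model.

```latex
X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}
```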

  9. Optimizing (Selecting) the ARMA Model • Akaike's Information Criterion (AIC) • AIC, Bias Corrected (AICc) • Bayesian Information Criterion (BIC) • All of these criteria apply to ARMA models fitted by maximum likelihood estimation.

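  10. The AIC Criterion: $\mathrm{AIC} = -2\ln(L) + 2k$, where $L$ is the maximized likelihood and $k$ is the number of model parameters; the model with the lowest AIC is preferred.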
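A small R check of the AIC formula: the helper below recomputes AIC from the maximized log-likelihood of an `arima()` fit and also derives AICc and BIC with their standard formulas, AICc = AIC + 2k(k+1)/(n-k-1) and BIC = -2 ln(L) + k ln(n). `info_criteria` is a hypothetical helper; `logLik()`, `AIC()` and `BIC()` come from base R's stats package, and k counts the estimated coefficients plus the innovation variance, which is what `logLik()` reports for Arima fits.

```r
# Hypothetical helper: information criteria from an ML-fitted Arima object.
# n is the number of observations used in the fit (length(x) when d = 0).
info_criteria <- function(fit, n) {
  ll <- logLik(fit)
  k <- attr(ll, "df")               # coefficients + innovation variance
  aic <- -2 * as.numeric(ll) + 2 * k
  c(aic  = aic,
    aicc = aic + 2 * k * (k + 1) / (n - k - 1),   # small-sample correction
    bic  = -2 * as.numeric(ll) + k * log(n))
}

set.seed(42)
x <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 200)
fit <- arima(x, order = c(1, 0, 1), method = "ML")
ic <- info_criteria(fit, length(x))
print(ic)
```

The AICc penalty grows as k approaches n, so it matters most for short series; for long series AICc and AIC nearly coincide.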

  11. Using ARMA Models in R • arima (stats package) • auto.arima (forecast package)

  12. The arima function: arima(x, order = c(0, 0, 0), seasonal = list(order = c(0, 0, 0), period = NA), xreg = NULL, include.mean = TRUE, transform.pars = TRUE, fixed = NULL, init = NULL, method = c("CSS-ML", "ML", "CSS"), n.cond, optim.method = "BFGS", optim.control = list(), kappa = 1e6)
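A short usage sketch of `arima()` on simulated data: `order = c(p, d, q)` with d = 0 gives a plain ARMA(p, q) model, `method = "ML"` requests exact maximum likelihood (the default "CSS-ML" starts from a conditional-sum-of-squares fit), and `predict()` returns point forecasts with their standard errors. The series comes from `arima.sim()`, and the chosen coefficients are arbitrary.

```r
# Simulate a stationary ARMA(2, 1) series with arbitrary coefficients
set.seed(7)
x <- arima.sim(model = list(ar = c(0.5, -0.2), ma = 0.4), n = 300)

# order = c(p, d, q); d = 0 means no differencing, i.e. a plain ARMA model
fit <- arima(x, order = c(2, 0, 1), method = "ML")
print(coef(fit))   # ar1, ar2, ma1 and the mean, reported as "intercept"

# Forecast 10 steps ahead: point forecasts and standard errors
fc <- predict(fit, n.ahead = 10)
print(fc$pred)
print(fc$se)
```

The standard errors grow with the horizon, so `fc$pred ± 1.96 * fc$se` gives approximate 95% forecast intervals under Gaussian errors.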

  13. Parameters of arima in R

  14. The auto.arima function (forecast package): auto.arima(x, d=NA, D=NA, max.p=5, max.q=5, max.P=2, max.Q=2, max.order=5, start.p=2, start.q=2, start.P=1, start.Q=1, stationary=FALSE, ic=c("aicc","aic", "bic"), stepwise=TRUE, trace=FALSE, approximation=(length(x)>100 | frequency(x)>12), xreg=NULL, test=c("kpss","adf","pp"), seasonal.test=c("ocsb","ch"), allowdrift=TRUE, lambda=NULL, parallel=FALSE, num.cores=NULL)
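auto.arima() searches over model orders and, as the `ic` default in its signature shows, ranks candidates by AICc. As a rough base-R analogue, the sketch below fits every ARMA(p, q) up to `max.p` and `max.q` with `arima()` and keeps the lowest-AICc model. It is a simplification, not a reimplementation: it fixes d = 0 instead of choosing it with unit-root tests, skips seasonal terms, and searches exhaustively rather than stepwise. `select_arma` is a hypothetical name.

```r
# Exhaustive ARMA(p, q) search by AICc (hypothetical helper, base R only)
select_arma <- function(x, max.p = 3, max.q = 3) {
  n <- length(x)
  best <- NULL
  for (p in 0:max.p) for (q in 0:max.q) {
    fit <- tryCatch(arima(x, order = c(p, 0, q), method = "ML"),
                    error = function(e) NULL)   # skip orders that fail to fit
    if (is.null(fit)) next
    k <- length(coef(fit)) + 1                  # coefficients + innovation variance
    aicc <- fit$aic + 2 * k * (k + 1) / (n - k - 1)
    if (is.null(best) || aicc < best$aicc)
      best <- list(p = p, q = q, aicc = aicc, fit = fit)
  }
  best
}

set.seed(123)
x <- arima.sim(model = list(ar = 0.7), n = 500)
best <- select_arma(x, max.p = 2, max.q = 2)
cat(sprintf("selected ARMA(%d, %d)\n", best$p, best$q))
```

Exhaustive search costs (max.p + 1)(max.q + 1) fits, which is why auto.arima defaults to a stepwise search for larger order limits.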

  15. Nowadays, we have lots of data. BIG DATA!

  16. What is R?

  17. What is R?

  18. Why R?

  19. Why R?

  20. What Is Needed? • Big data sets call for more than counts and averages • Analyzing all of the data can reveal insights that samples or subsets cannot

  21. Why R and Hadoop?

  22. Why R and Hadoop?

  23. Why R and Hadoop?

  24. Why R and Hadoop?

  25. Introduction to RHadoop

  26. What RHadoop Is For: The open-source RHadoop project makes it easier to extract data from Hadoop for analysis with R and to run R within the nodes of a Hadoop cluster, essentially turning Hadoop into a massively parallel statistical computing cluster based on R.

  27. RHadoop

  28. rhdfs • Manipulates HDFS directly from R • Mimics as much of the HDFS Java API as possible

  29. rhdfs Functions

  30. rmr • Designed to be the simplest and most elegant way to write MapReduce programs • Gives R programmers the tools to perform data analysis in an R-like way • Provides an abstraction layer that hides the implementation details

  31. rmr mapreduce Function
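rmr's mapreduce() asks the programmer for a map function and a reduce function that work on key-value pairs. The sketch below emulates that programming model in memory with base R, so the data flow (map, group by key, reduce) can be seen without a Hadoop cluster. `local_mapreduce` and its list-of-pairs format are hypothetical illustrations, not the rmr API.

```r
# In-memory emulation of the MapReduce model (NOT the rmr API)
local_mapreduce <- function(input, map, reduce) {
  # Map phase: each input record yields a list of list(key, value) pairs
  pairs <- unlist(lapply(input, map), recursive = FALSE)
  keys <- vapply(pairs, function(kv) as.character(kv$key), character(1))
  values <- lapply(pairs, `[[`, "value")
  # Shuffle phase: group all values that share a key
  groups <- split(values, keys)
  # Reduce phase: one call per key with all of that key's values
  mapply(reduce, names(groups), groups, SIMPLIFY = FALSE)
}

# Word count, the classic MapReduce example
docs <- c("big data big", "data R hadoop", "R R")
counts <- local_mapreduce(
  docs,
  map = function(line) lapply(strsplit(line, " ")[[1]],
                              function(w) list(key = w, value = 1)),
  reduce = function(key, vals) sum(unlist(vals))
)
print(unlist(counts))
```

On a real cluster the map calls run in parallel on the nodes holding the data and the shuffle moves pairs across the network; the emulation keeps only the contract between the two user-supplied functions.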

  32. Thank you!
