320 likes | 343 Views
INTRODUCTION OF R PROGRAMMING DEFINING ITS DATA TYPES, FUNCTION AND APPLICATION
E N D
R PROGRAMMING ARAVALI COLLEGE OF ENGINEERING & MANAGEMENT (ACEM, FARIDABAD)
Program Name : B.TECH CSESemester : VIICourse Name: R PROGRAMMINGCourse Code: OEC-CS-701(III)Faculty Name: RASHIKA SINGHDesignation: ASSISTANT PROFESSORDepartment : __CSE_____
UNIT No.1 • INTRODUCTION • Getting R • R Version • 32-bit versus 64-bit • The R Environment • Command Line Interface • RStudio • Revolution Analytics • Learning Outcome Familiarize themselves with R and the RStudio IDE
R PROGRAMMING R is a software environment which is used to analyze statistical information and graphical representation. R allows us to do modular programming using functions. This programming language was named R, based on the first name letter of the two authors (Robert Gentleman and Ross Ihaka).
"R is an interpreted computer programming language which was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand." • The R Development Core Team currently develops R. It is also a software environment used to analyze statistical information, graphical representation, reporting, and data modeling. • R is the implementation of the S programming language, which is combined with lexical scoping semantics. • R not only allows us to do branching and looping but also allows to do modular programming using functions. R allows integration with the procedures written in the C, C++, .Net, Python, and FORTRAN languages to improve efficiency. • In the present era, R is one of the most important tool which is used by researchers, data analyst, statisticians, and marketers for retrieving, cleaning, analyzing, visualizing, and presenting data.
History of R Programming • The history of R goes back about 20-30 years ago. R was developed by Ross lhaka and Robert Gentleman in the University of Auckland, New Zealand, and the R Development Core Team currently develops it. • This programming language name is taken from the name of both the developers. • The first project was considered in 1992. The initial version was released in 1995, and in 2000, a stable beta version was released.
The following table shows the release date, version, and description of R language:
Features of R programming • It is a simple and effective programming language which has been well developed. • It is data analysis software. • It is a well-designed, easy, and effective language which has the concepts of user-defined, looping, conditional, and various I/O facilities. • It has a consistent and incorporated set of tools which are used for data analysis. • For different types of calculation on arrays, lists and vectors, R contains a suite of operators. • It provides effective data handling and storage facility. • It is an open-source, powerful, and highly extensible software. • It provides highly extensible graphical techniques. • It allows us to perform multiple calculations using vectors. • R is an interpreted language.
R is important for Data Science? R plays a very important role in Data Science, you will be benefited with following operations in R. • You can run your code without any compiler – R is an interpreted language. Hence we can run code without any compiler. R interprets the code and makes the development of code easier. • Many calculations done with vectors – R is a vector language, so anyone can add functions to a single Vector without putting in a loop. Hence, R is powerful and faster than other languages. • Statistical Language – R used in biology, genetics as well as in statistics. R is a turning complete language where any type of task can perform. 2. R is Good for Business? • R will just not help you in the technical fields, it will also be a great help in your business. • Here, the major reason is that R is open-source, therefore it can be modified and redistributed as per the user’s need. It is great for visualization and has far more capabilities as compared to other tools. • For data-driven businesses, lack of Data Scientists is a huge concern. Companies are using R programming as their core platform and are recruiting trained R programmers.
3. R is a gateway to Lucrative Career R language is used extensively in Data Science. This field offers some of the highest-paying jobs in the world today. 4. Open-source R is an open-source language. It is maintained by a community of active users and you can avail R for free. You can modify various functions in R and make your own packages. Since R is issued under the General Public Licence (GNU), there are no restrictions on its usage. 5. Popularity R has become one of the most popular programming languages in the industries. Conventionally, R was mostly used in academia but with the emergence of Data Science, the need for R in the industries became evident. R is used at Facebook for social network analysis. It is being used at Twitter for semantic analysis as well as visualizations. 6. Robust Visualization Library R comprises of libraries like ggplot2, plotly that offer aesthetic graphical plots to its users. R is most widely recognized for its stunning visualizations which gives it an edge over other Data Science programming languages.
7. With R, you can develop amazing Web-Apps R provides you with the ability to build aesthetic web-applications. Using the R Shiny package, you can develop interactive dashboards straight from the console of your R IDE. Using this, you can embed your visualizations and enhance the storytelling of your data analysis through aesthetic visualizations. 8. A go-to language for Statistics and Data Science R is the standard language for Statistics and Data Science. R was developed for statistics, by statisticians. It has been in use even before the word “Data Science” was coined. Statisticians and Data Scientists are most familiar with R than any other programming language. R facilitates various statistical operations through its thousands of packages. 9. R is being used in almost every industry R is one of the most widely used programming languages in the world today. It is used in almost every industry, ranging from finance, banking to medicine and manufacturing. R is used for portfolio management, risk analytics in finance and banking industries. It is used for carrying out an analysis of drug discovery and genomic analysis in bioinformatics. R is also used to implement various statistical measures to optimize industrial processes.
PROS/ADVATAGES OF R PROGRAMMING 1. Open Source • R is an open-source programming language. This means that anyone can work with R without any need for a license or a fee. Furthermore, you can contribute towards the development of R by customizing its packages, developing new ones and resolving issues. 2. Exemplary Support for Data Wrangling • R provides exemplary support for data wrangling. The packages like dplyr, readr are capable of transforming messy data into a structured form. 3. The Array of Packages • R has a vast array of packages. With over 10,000 packages in the CRAN repository, the number is constantly growing. These packages appeal to all the areas of industry. 4. Quality Plotting and Graphing • R facilitates quality plotting and graphing. The popular libraries like ggplot2 and plotly advocate for aesthetic and visually appealing graphs that set R apart from other programming languages. 5. Highly Compatible • R is highly compatible and can be paired with many other programming languages like C, C++, Java, and Python. It can also be integrated with technologies like Hadoop and various other database management systems as well.
6. Platform Independent • R is a platform-independent language. It is a cross-platform programming language, meaning that it can be run quite easily on Windows, Linux, and Mac. 7. Eye-Catching Reports • With packages like Shiny and Markdown, reporting the results of an analysis is extremely easy with R. You can make reports with the data, plots and R scripts embedded in them. You can even make interactive web apps that allow the user to play with the results and the data. 8. Machine Learning Operations • R provides various facilities for carrying out machine learning operations like classification, regression and also provides features for developing artificial neural networks. 9. Statistics • R is prominently known as the lingua franca of statistics. This is the main reason as to why R is dominant among other programming languages for developing statistical tools. 10. Continuously Growing • R is a constantly evolving programming language. It is a state of the art technology that provides updates whenever any new feature is added.
Disadvantages of R Programming 1. Weak Origin • R shares its origin with a much older programming language “S”. This means that it’s base package does not have support for dynamic or 3D graphics. With common packages of R like Ggplot2 and Plotly, it is possible to create dynamic, 3D as well as animated graphics. 2. Data Handling • In R, the physical memory stores the objects. This is in contrast to other languages like Python. Furthermore, R utilizes more memory as compared with Python. Also, R requires the entire data in one single place, that is, in the memory. Therefore, it is not an ideal option when dealing with Big Data. However, with data management packages and integration with Hadoop possible, this is easily covered. 3. Basic Security • R lacks basic security. This feature is an essential part of most programming languages like Python. Because of this, there are several restrictions with R as it cannot be embedded into a web-application.
4. Complicated Language • R is not an easy language to learn. It has a steep learning curve. Due to this, people who do not have prior programming experience may find it difficult to learn R. 5. Lesser Speed • R packages and the R programming language is much slower than other languages like MATLAB and Python. 6. Spread Across various Packages • The algorithms in R are spread across different packages. Programmers without prior knowledge of packages may find it difficult to implement algorithms.
R VERSIONS • RStudio Server enables users and administrators to have very fine-grained control over which versions of R are used in various contexts. Capabilities include: • Administrators can install several versions of R and specify a global default version as well as per-user or per-group default versions. • Users can switch between any of the available versions of R as they like. • Users can specify that individual R projects remember their last version of R and always use that version until explicitly migrated to a new version.
On Windows, RStudio uses the system's current version of R by default. When R is installed on Windows it writes the version being installed to the Registry as the "current" version of R (the specific registry keys written are described here). This is the version of R which RStudio runs against by default. • You can override which version of R is used via General panel of the RStudio Options dialog. This dialog allows you to specify that RStudio should always bind to the default 32 or 64-bit version of R, or to specify a different version altogether:
Note that by holding down the Control key during the launch of RStudio you can cause the R version selection dialog to display at startup.
32-BIT VERSUS 64-BIT • we have two R binaries • /APPS/32/bin/R /APPS/64/bin/R If you are on 32 bit, then you automatically get the first when you just say R. If you are on 64 bit, then you automatically get the second when you just say R. • On 64 bit you can run the first, if you so desire, but you must invoke it using the full path name /APPS/32/bin/R. It will work in emulation mode. • If you never load your own C code into R with dyn.load(sharedlibraryname) and never use libraries other than those that those installed by the system administrators and you load withlibrary(packagename), then you should have no problems. Just say R with no additional fuss to invoke R, and it will work. • Otherwise, you will have to be aware of What architecture you are running on.What architecture the R binary you are r
APPLICATIONS OF R 1. Finance • Data Science is most widely used in the financial industry. • R is the most popular tool for this role. This is because R provides an advanced statistical suite that is able to carry out all the necessary financial tasks. • With the help of R, financial institutions are able to perform downside risk measurement, adjust risk performance and utilize visualizations like candlestick charts, density plots, drawdown plots, etc. 2. Banking • Just like financial institutions, banking industries make use of R for credit risk modeling and other forms of risk analytics. • Bank of America makes use of R for financial reporting. With the help of R, the data scientists at BOA are able to analyze financial losses and make use of R’s visualization tools.
3. Healthcare • Genetics, Bioinformatics, Drug Discovery, Epidemiology are some of the fields in healthcare that make heavy usage of R. With the help of R, these companies are able to crunch data and process information, providing an essential backdrop for further analysis and data processing. 4. Social Media • For many beginners in Data Science and R, social media is a data playground. Sentiment Analysis and other forms of social media data mining are some of the important statistical tools that are used with R. • Social Media is also a challenging field for Data Science because the data prevalent on social media websites is mostly unstructured in nature. R is used for social media analytics, for segmenting potential customers and targeting them for selling your products.
5. E-Commerce • The e-commerce industry is one of the most important sectors that utilize Data Science. R is one of the standard tools that is being used in e-commerce. • Since these internet-based companies have to deal with various forms of data, structured and unstructured, as well as from varying data sources like spreadsheets and databases (SQL & NoSQL), R proves to be an effective choice for these industries. 6. Manufacturing • Manufacturing companies like Ford, Modelez, and John Deere use R to analyze customer sentiment. This helps them optimize their product according to trending consumer interests and also to match their production volume to varying market demand. They also use R to minimize their production costs and maximize profits.
Real-Life Use Cases of R Language • Facebook – Facebook uses R to update status and its social network graph. It is also used for predicting colleague interactions with R. • Ford Motor Company – Ford relies on Hadoop. It also relies on R for statistical analysis as well as carrying out data-driven support for decision making. • Google – Google uses R to calculate ROI on advertising campaigns and to predict economic activity and also to improve the efficiency of online advertising. • Foursquare – R is an important stack behind Foursquare’s famed recommendation engine. • John Deere – Statisticians at John Deere use R for time series modeling and also geospatial analysis in a reliable and reproducible way. The results are then integrated with Excel and SAP. • Microsoft – Microsoft uses R for the Xbox matchmaking service and also as a statistical engine within the Azure ML framework. • Mozilla – It is the foundation behind the Firefox web browser and uses R to visualize web activity.
1. Data Scientist The profession of Data Scientist is the most demanding job role. A Data Scientist is supposed to extract data, transform it into a structured format, perform analysis and forecast future insights. For this purpose, R is the most ideal tool as it provides efficient data handling capability as well as a robust set of analysis and machine learning tools. 2. Business Analyst A Business Analyst has to develop solutions that are technical in nature for the various business problems. They are required to seek solutions, advance the efforts of the company as well as fulfill the requirements of the business. For this purpose, R provides various business intelligence tools through its extensive packages.
3. Data Analyst A Data Analyst is responsible for extracting and analyzing data. This task requires extensive usage of R’s statistical libraries to deliver accurate results so that the companies can make careful data-driven decisions. 4. Data Visualization Expert R is most popular for its visualization libraries. Due to this reason, Data Visualization experts in R programming are in-demand in the industries. The various packages of R likeggplot2, plotly,etc provide visually appealing graphs and plots to their users. 5. Quantitative Analyst Quantitative Analysts are engaged in the financial and banking industries. These industries have to deal with all types of data and R provides an ideal solution to their various data problems.
Aravali College of Engineering And Management Jasana, Tigaon Road, Neharpar, Faridabad, Delhi NCR Toll Free Number : 91- 8527538785 Website : www.acem.edu.in