Data Science using R
R is a programming language and software environment for statistical analysis and graphics. It is available under the GNU General Public License, and the software and installation instructions can be obtained from the Comprehensive R Archive Network (CRAN). This section gives an overview of the basic functionality of R. This foundation is then used to demonstrate many of the analytical techniques presented.
The following R code illustrates a common analytical situation in which a dataset is imported, the contents of the dataset are examined, and some model-building tasks are carried out. Although the reader may not yet be familiar with R syntax, the code can be followed by reading the embedded comments, which are indicated by #. In the following scenario, the annual sales in U.S. dollars for 10,000 retail customers have been provided in the form of a comma-separated-value (CSV) file. The read.csv() function imports the CSV file. The dataset is stored in the R variable sales using the assignment operator <-.
# import a CSV file of the total annual sales for each customer
sales <- read.csv("c:/data/yearly_sales.csv")

# examine the imported dataset
head(sales)
summary(sales)

# plot num_of_orders vs. sales
plot(sales$num_of_orders, sales$sales_total,
     main = "Number of Orders vs. Sales")

# perform a statistical analysis (fit a linear regression model)
results <- lm(sales$sales_total ~ sales$num_of_orders)
summary(results)

# perform some diagnostics on the fitted model
# plot histogram of the residuals
hist(results$residuals, breaks = 800)
In this example, the data file is imported using the read.csv() function. Once the file has been imported, it is useful to examine the contents to ensure that the data was loaded properly and to become familiar with the data.
The summary() function provides some descriptive statistics, including the mean and median, for each data column. The minimum and maximum values, as well as the 1st and 3rd quartiles, are also provided.
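For readers without access to the yearly_sales.csv file, the same steps can be tried on simulated data. The following is a minimal sketch, not the tutorial's dataset: the simulated values, the seed, the slope of 50, and the temporary file path are all illustrative assumptions. Only the column names num_of_orders and sales_total are taken from the example above.

```r
# Simulate a small dataset (hypothetical values, not the tutorial's data)
set.seed(42)
n <- 200
num_of_orders <- sample(1:20, n, replace = TRUE)
sales_total <- 50 * num_of_orders + rnorm(n, mean = 0, sd = 25)
simulated <- data.frame(num_of_orders = num_of_orders,
                        sales_total = sales_total)

# Write the data to a temporary CSV file, then import it with read.csv()
csv_path <- tempfile(fileext = ".csv")
write.csv(simulated, csv_path, row.names = FALSE)
sales <- read.csv(csv_path)

# Check that the data was loaded properly
str(sales)      # column names and types
head(sales)     # first six rows
nrow(sales)     # number of rows read

# Fit the same kind of linear regression model as in the example
results <- lm(sales_total ~ num_of_orders, data = sales)
coef(results)   # estimated intercept and slope
```

Because the data was generated with a slope of 50, the estimated coefficient for num_of_orders should come out close to that value, which gives a quick sanity check on the whole import-and-fit sequence.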
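To see which statistics summary() reports, it can help to run it on a tiny data frame whose values are easy to check by hand. The sketch below uses made-up numbers chosen only for illustration; the object names orders and stats are assumptions, not part of the original example.

```r
# A small, made-up data frame (values chosen only for illustration)
orders <- data.frame(
  num_of_orders = c(1, 2, 3, 4, 5),
  sales_total   = c(10, 20, 30, 40, 100)
)

# Descriptive statistics for every column of the data frame
print(summary(orders))

# The same statistics for a single column, returned as a named vector
stats <- summary(orders$sales_total)
print(stats)
```

For sales_total, the median is 30 while the mean is 40, because the single large value of 100 pulls the mean upward. Comparing the two is a quick way to spot skewed columns.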
The summary() function is an example of a generic function. A generic function is a group of functions sharing the same name but behaving differently depending on the number and the type of arguments they receive.
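The behavior of a generic function can be observed by calling summary() on objects of different classes. In R's S3 system, the method that runs is chosen by the class of the first argument. The following sketch uses made-up data, and the class "sale" with its summary.sale() method is a hypothetical example defined here purely for illustration.

```r
# Made-up data for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
df <- data.frame(x = x, y = y)
fit <- lm(y ~ x, data = df)

# The same generic, summary(), is called on objects of different classes
s_df  <- summary(df)    # dispatches to summary.data.frame()
s_fit <- summary(fit)   # dispatches to summary.lm()

class(df)      # "data.frame"
class(fit)     # "lm"
class(s_df)    # "table"
class(s_fit)   # "summary.lm"

# A user-defined S3 method for a hypothetical class "sale"
summary.sale <- function(object, ...) {
  cat("Sale of", object$amount, "USD\n")
  invisible(object)
}
one_sale <- structure(list(amount = 19.99), class = "sale")
summary(one_sale)   # calls summary.sale()
```

Calling methods(summary) in an R session lists the summary methods that are currently available.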
R Graphical User Interfaces
R software uses a command-line interface (CLI) that is similar to the Bash shell in Linux or the interactive versions of scripting languages such as Python. UNIX and Linux users can enter the command R at the terminal prompt to start the CLI. For Windows installations, R comes with RGui.exe, which provides a basic graphical user interface (GUI).
The figure shows a screenshot of the R code example executed in RStudio, a widely used graphical development environment for R.
The four highlighted window panes are as follows.
Scripts: Serves as an area to write and save R code
Workspace: Lists the datasets and variables in the R environment
Plots: Displays the plots generated by the R code and provides a straightforward mechanism to export the plots
Console: Provides a history of the executed R code and the output
R allows one to save the workspace environment, that is, the variables, functions, and other objects in the current session, into an .RData file using the save.image() function. An existing .RData file can be loaded using the load() function. Loaded libraries are not stored in the file and must be attached again with library() in a new session. Tools such as RStudio prompt the user to save the workspace contents before exiting the GUI.
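Saving and restoring a workspace can be tried with a short round trip. This is a minimal sketch that assumes the code is run at the top level of an R session or script; the object names and the temporary file path are illustrative assumptions.

```r
# Create two objects in the global environment (illustrative values)
threshold <- 0.05
exam_scores <- c(88, 92, 79)

# Save every object in the workspace to a file (temporary path assumed)
workspace_file <- tempfile(fileext = ".RData")
save.image(file = workspace_file)

# Remove the objects, then restore them from the file
rm(threshold, exam_scores)
exists("exam_scores", inherits = FALSE)   # FALSE: the object is gone
restored <- load(workspace_file)          # returns the names it restored
exists("exam_scores", inherits = FALSE)   # TRUE: the object is back
```

load() returns, invisibly, a character vector with the names of the restored objects, which is useful for checking what a file contained.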
NOIR (Nominal, Ordinal, Interval, Ratio) Attribute Types
| |Nominal|Ordinal|Interval|Ratio|
|---|---|---|---|---|
|Definition|The values represent labels that distinguish one item from another.|The attribute values imply a sequence.|The difference between two values is meaningful.|Both the difference and the ratio of two values are meaningful.|
|Examples|ZIP codes, nationality, street names, gender, employee ID numbers, TRUE or FALSE|Quality of diamonds, academic grades, magnitude of earthquakes|Temperature in Celsius or Fahrenheit, calendar dates, latitudes|Age, temperature in Kelvin, counts, length, weight|
|Operations|=, ≠|=, ≠, <, ≤, >, ≥|=, ≠, <, ≤, >, ≥, +, -|=, ≠, <, ≤, >, ≥, +, -, ×, ÷|
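The four attribute types map naturally onto R data structures. The following is a minimal sketch with made-up values: unordered factors are used for nominal data, ordered factors for ordinal data, and numeric vectors for interval and ratio data. The variable names are illustrative assumptions.

```r
# Nominal: unordered labels; only equality comparisons are meaningful
nationality <- factor(c("FR", "JP", "FR", "BR"))
nationality[1] == nationality[3]          # TRUE

# Ordinal: ordered labels; comparisons such as < and >= are meaningful
grade <- factor(c("B", "A", "C"), levels = c("C", "B", "A"), ordered = TRUE)
grade[1] < grade[2]                       # TRUE: B ranks below A

# Interval: differences are meaningful, ratios are not
temp_celsius <- c(10, 20)
diff(temp_celsius)                        # 10 degrees warmer

# Ratio: a true zero makes ratios meaningful
temp_kelvin <- temp_celsius + 273.15
temp_kelvin[2] / temp_kelvin[1]           # about 1.035, not 2
weight_kg <- c(40, 80)
weight_kg[2] / weight_kg[1]               # 2: twice as heavy
```

R enforces part of this distinction itself: applying < to an unordered factor returns NA with a warning, whereas the same comparison on an ordered factor is valid.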