
Sunday, June 14, 2020

R LANGUAGE

BASIC :

1.  ASSIGN 2 VALUES USING <- OPERATOR AND PRINT SUM OF TWO VALUES IN R LANGUAGE - CLICK   CLICK
2.  ASSIGN 2 VALUES USING -> OPERATOR AND PRINT SUM USING CAT() IN R LANGUAGE -  CLICK CLICK
3.  ASSIGN 2 VALUES USING = OPERATOR AND PRINT SUM OF TWO VALUES IN R LANGUAGE -  CLICK CLICK
4.   ASSIGN 2 VALUES USING -> OPERATOR AND PRINT SUM OF TWO VALUES IN R LANGUAGE -  CLICK  CLICK
5.   ARITHMETIC OPERATOR IN R LANGUAGE (  USING TWO INTEGER VALUES ) -  CLICK  CLICK 
6.   ARITHMETIC OPERATOR IN R LANGUAGE (  USING TWO FLOAT VALUES ) - CLICK CLICK
7.   RELATIONAL OPERATORS IN R LANGUAGE - CLICK  CLICK
8.   LOGICAL OPERATORS IN R LANGUAGE -  CLICK CLICK
9.   IF ELSE STRUCTURE IN R LANGUAGE -  CLICK CLICK
10. NESTED IF STRUCTURE IN R LANGUAGE - CLICK
11. INPUT STRING FROM USER AND PRINT IN R LANGUAGE - CLICK
12. FOR LOOP STRUCTURE IN R LANGUAGE - CLICK
13. WHILE LOOP STRUCTURE IN R LANGUAGE - CLICK
14. SWITCH CASE IN R LANGUAGE - CLICK
15. SWITCH CASE WHEN CASE VALUE GIVEN BY USER IN R LANGUAGE - CLICK




VECTOR:
1. VECTOR CREATION, MODIFICATION, DELETION, BASIC OPERATIONS ON TWO VECTORS IN R LANGUAGE - CLICK
2. SUM AND AVERAGE OF ELEMENTS IN VECTOR IN R LANGUAGE - CLICK
3. LARGEST AND SMALLEST ELEMENT IN VECTOR IN R LANGUAGE  -  CLICK
4. LINEAR SEARCH IN R LANGUAGE  - CLICK
5. BINARY SEARCH IN R LANGUAGE  - CLICK 
6. BUBBLE SORT IN R LANGUAGE  -  CLICK
7. INSERTION SORT IN R LANGUAGE  - CLICK
8. SELECTION SORT IN R LANGUAGE  - CLICK 
9. RANDOMIZED QUICK SORT IN R LANGUAGE  - CLICK
10. MERGE SORT IN R LANGUAGE  - CLICK
11. VECTOR USING RANDOM VALUES - CLICK


MATRIX :
1. MATRIX CREATION IN R LANGUAGE , ADDITION OF TWO MATRICES IN R LANGUAGE , SUBTRACTION OF TWO MATRICES IN R LANGUAGE, MULTIPLICATION OF TWO MATRICES IN R LANGUAGE, DIVISION OF TWO MATRICES IN R LANGUAGE - CLICK


APPLY FAMILY IN R LANGUAGE:
1.APPLY() IN R LANGUAGE: CLICK
2.LAPPLY() IN R LANGUAGE: CLICK
3.SAPPLY () IN R LANGUAGE: CLICK


UNIVERSITY ASSIGNMENT - CLICK HERE

DATA HANDLING :
1. READ DATA FROM CSV FILE    I) CLICK  II) CLICK  III) CLICK 
2.  Read data from CSV files - FIND LARGEST AND SMALLEST    I) CLICK   II) CLICK
3. WRITE DATA TO CSV FILE  I)  CLICK  II) CLICK
4. DATA / SUBSET FROM CSV FILE IN  R LANGUAGE CLICK
5. Read data from txt file CLICK


DATA PROCESSING / CLEANING
1. Type conversion in R language CLICK










📘 Module 1: Introduction to R Programming

(6 Classes)


🎯 Learning Outcomes

After completing this module, students will be able to:

✔ Install R and RStudio

✔ Understand the RStudio Interface

✔ Write basic R programs

✔ Perform arithmetic and logical operations

✔ Work with different data types

✔ Create and manipulate vectors, lists, matrices and data frames

✔ Understand factors and categorical variables


CLASS 1

Introduction to R


What is R?

R is an open-source programming language specially designed for

  • Data Analysis
  • Statistics
  • Machine Learning
  • Artificial Intelligence
  • Data Visualization
  • Research

It was developed by

  • Ross Ihaka
  • Robert Gentleman

at the University of Auckland.

Today R is developed by the R Core Team and supported by the R Foundation.


Why Learn R?

Advantages

✔ Free

✔ Open Source

✔ Easy to Learn

✔ Powerful Graphics

✔ Huge Package Library

✔ Excellent Statistical Functions

✔ Cross Platform


Applications of R

R is widely used in

  • Data Science
  • Business Analytics
  • Bioinformatics
  • Finance
  • Healthcare
  • Marketing
  • Machine Learning
  • Research

Installing R

Step 1

Download R from

https://cran.r-project.org

Install normally.


Step 2

Download RStudio

https://posit.co/download/rstudio-desktop/

Install after installing R.


CLASS 2

RStudio Interface


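When RStudio opens, four main panes appear. The numbered items below describe the Source and Console panes and the most-used tabs of the other two panes.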

+---------------------+----------------------+
| Source Editor | Environment |
| | History |
+---------------------+----------------------+
| Console | Files |
| | Plots |
| | Packages |
| | Help |
+---------------------+----------------------+

1. Source

Used to

  • Write scripts
  • Save programs
  • Edit code

Shortcut

Ctrl + Shift + N

2. Console

Used to execute commands immediately.

Example

5+10

Output

15

3. Environment

Shows

  • Variables
  • Data
  • Functions

4. Files

Displays project files.


5. Plots

Displays graphs.


6. Packages

Shows installed packages.


7. Help

Displays documentation.

Example

help(mean)

Understanding the R Command Prompt

Console Prompt

>

means R is ready.

Example

> 5+2
[1] 7

CLASS 3

Basic Operations


Arithmetic Operators

Operator   Meaning
+          Addition
-          Subtraction
*          Multiplication
/          Division
^          Power
%%         Modulus
%/%        Integer Division

Example

a <- 20
b <- 6

a+b
a-b
a*b
a/b
a%%b
a%/%b
a^2

Output

26
14
120
3.333333
2
3
400
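The link between integer division and modulus is easy to miss, so here is a minimal sketch using the same values of a and b. The variable names quotient, remainder and rebuilt are only illustrative.

```r
# Values taken from the example above
a <- 20
b <- 6

quotient  <- a %/% b   # integer division: 3
remainder <- a %% b    # modulus (remainder): 2

# Integer division and modulus together rebuild the original value
rebuilt <- quotient * b + remainder
print(rebuilt)         # 20

# Operator precedence: ^ is applied before the unary minus
print(-2^2)            # -4, read as -(2^2)
print((-2)^2)          # 4
```

Use brackets whenever the order of evaluation is not obvious.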

Comparison Operators

Operator   Meaning
>          Greater than
<          Less than
>=         Greater than or equal to
<=         Less than or equal to
==         Equal to
!=         Not equal to

Example

10>5
5==5
10!=2

Output

TRUE
TRUE
TRUE

Logical Operators

Operator   Meaning
&          AND
|          OR
!          NOT

Example

TRUE & FALSE
TRUE | FALSE
!TRUE

Output

FALSE
TRUE
FALSE
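The operators above also work on whole vectors, one element at a time. The sketch below uses a hypothetical scores vector to show this, and also shows the double forms && and ||, which expect single TRUE/FALSE values and are normally used inside if().

```r
scores <- c(45, 72, 88, 30)   # hypothetical sample values

# & and | compare element by element and return one result per element
pass_and_high <- scores >= 40 & scores >= 80   # FALSE FALSE TRUE FALSE
fail_or_high  <- scores < 40 | scores >= 80    # FALSE FALSE TRUE TRUE

# && and || work on single values, which suits if() conditions
first <- scores[1]
if (first >= 40 && first < 80) {
  label <- "pass"
} else {
  label <- "other"
}

print(pass_and_high)
print(fail_or_high)
print(label)                  # "pass"
```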

CLASS 4

Data Types


R supports many data types.


Numeric

x <- 10.5

class(x)
typeof(x)

Output

"numeric"

"double"

Integer

x <- 10L

class(x)

Output

"integer"

Character

name <- "Rahul"

class(name)

Output

"character"

Logical

flag <- TRUE

class(flag)

Output

"logical"

Factor

gender <- factor(c("Male","Female","Male"))

gender

Output

[1] Male   Female Male
Levels: Female Male
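The difference between class() and typeof() is clearest on a factor. The following sketch reuses the gender factor from above; the variables n and d are only illustrative.

```r
gender <- factor(c("Male", "Female", "Male"))

print(class(gender))       # "factor"
print(typeof(gender))      # "integer": a factor is stored as integer codes
print(levels(gender))      # "Female" "Male" (sorted alphabetically by default)
print(as.integer(gender))  # 2 1 2, the position of each value in levels()

n <- 10L                   # the L suffix creates an integer
d <- 10                    # without L the value is stored as a double
print(c(class(n), typeof(n)))   # "integer" "integer"
print(c(class(d), typeof(d)))   # "numeric" "double"
```

In short, class() reports how R treats the object, while typeof() reports how it is stored internally.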

Variable Assignment

Three assignment operators are commonly used. The <- form is the usual convention in R code.

x <- 10

y = 20

30 -> z

Output

x=10

y=20

z=30

Variable Naming Rules

✔ Can contain letters

✔ Numbers

✔ Underscore

✔ Dot

Cannot start with a number or an underscore. Names are case-sensitive, and spaces and hyphens are not allowed.

Correct

student_name

age

salary1

marks.math

Wrong

1age

my-name
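One way to check names against these rules is the base R function make.names(), which converts any text into a syntactically valid name. The sketch below assumes that a name is valid when make.names() returns it unchanged; the candidates vector is only illustrative.

```r
candidates <- c("student_name", "marks.math", "1age", "my-name")

# A name that comes back unchanged was already valid
is_valid <- make.names(candidates) == candidates
names(is_valid) <- candidates
print(is_valid)

# Invalid names are repaired rather than rejected
print(make.names("1age"))      # "X1age"
print(make.names("my-name"))   # "my.name"
```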

CLASS 5

Data Structures in R


Vector

A vector stores elements of the same data type.


Create Vector

marks <- c(80,90,75,85,95)

marks

Output

80 90 75 85 95

Length

length(marks)

Output

5

Class

class(marks)

Output

"numeric"

Type

typeof(marks)

Output

"double"

Indexing

marks[2]

Output

90

Multiple Values

marks[c(2,4)]

Output

90 85

Functions

sum(marks)

mean(marks)

max(marks)

min(marks)

Output

425

85

95

75
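Vectors also support arithmetic on every element at once and selection by condition. The following sketch continues with the marks vector; the names bonus, high and sorted_marks are only illustrative.

```r
marks <- c(80, 90, 75, 85, 95)

bonus <- marks + 5                    # adds 5 to every element: 85 95 80 90 100
high  <- marks[marks > 80]            # logical indexing: 90 85 95
best_position <- which.max(marks)     # position of the largest value: 5
without_first <- marks[-1]            # negative index drops an element: 90 75 85 95
sorted_marks  <- sort(marks, decreasing = TRUE)   # 95 90 85 80 75

print(bonus)
print(high)
print(best_position)
print(without_first)
print(sorted_marks)
```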

List

Lists store different data types.

student <- list(
Name="Amit",
Age=20,
Marks=85,
Passed=TRUE
)

student

Output

$Name
"Amit"

$Age
20

$Marks
85

$Passed
TRUE

Access

student$Name

student[[2]]

Output

"Amit"

20
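Single and double brackets behave differently on a list, which often causes confusion. The sketch below uses the same student list; the Course element and its value are hypothetical additions for illustration.

```r
student <- list(Name = "Amit", Age = 20, Marks = 85, Passed = TRUE)

as_list  <- student["Age"]     # single brackets return a list of length 1
as_value <- student[["Age"]]   # double brackets return the value itself
print(class(as_list))          # "list"
print(class(as_value))         # "numeric"

student$Course <- "BCA"        # add a new element (hypothetical value)
student$Marks  <- 90           # modify an existing element
print(names(student))
print(length(student))         # 5
```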

Matrix

Stores data in rows and columns.

mat <- matrix(1:9,nrow=3,ncol=3)

mat

Output

1 4 7

2 5 8

3 6 9

Indexing

mat[2,3]

Output

8

Matrix Addition

A<-matrix(1:4,2,2)

B<-matrix(5:8,2,2)

A+B

Output

6 10

8 12
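Multiplication needs extra care, because * multiplies element by element while %*% performs true matrix multiplication. A minimal sketch with the same A and B; the name byrow_mat is only illustrative.

```r
A <- matrix(1:4, 2, 2)   # filled column by column: rows are (1 3) and (2 4)
B <- matrix(5:8, 2, 2)   # rows are (5 7) and (6 8)

elementwise <- A * B     # rows are (5 21) and (12 32)
product     <- A %*% B   # rows are (23 31) and (34 46)

# byrow = TRUE fills the matrix row by row instead
byrow_mat <- matrix(1:4, nrow = 2, byrow = TRUE)   # rows are (1 2) and (3 4)

print(elementwise)
print(product)
print(byrow_mat)
```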

CLASS 6

Data Frame and Factors


Data Frame

The data frame is the most widely used structure for tabular data in R. Each column can hold a different data type.

student <- data.frame(

Roll=c(1,2,3),

Name=c("A","B","C"),

Marks=c(90,85,95)

)

student

Output

Roll Name Marks

1 A 90

2 B 85

3 C 95

Structure

str(student)

Summary

summary(student)

Access Column

student$Marks

First Row

student[1,]
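Rows can also be selected by a condition, and new columns can be added with $. The sketch below rebuilds the same student data frame; the Result column and its 90-mark rule are hypothetical and used only for illustration.

```r
student <- data.frame(
  Roll  = c(1, 2, 3),
  Name  = c("A", "B", "C"),
  Marks = c(90, 85, 95)
)

# Keep only the rows where the condition is TRUE
top <- student[student$Marks >= 90, ]

# Add a new column (hypothetical grading rule)
student$Result <- ifelse(student$Marks >= 90, "Distinction", "Pass")

# Sort rows by Marks, highest first
ranked <- student[order(-student$Marks), ]

print(top)
print(ranked)
```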

Import CSV

data <- read.csv("student.csv")

head(data)

Export CSV

write.csv(student,"student.csv")
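The two steps can be combined into a round trip: write the file first, then read it back. The sketch below writes to R's temporary folder so that no existing file is overwritten; the path variable is only illustrative.

```r
student <- data.frame(
  Roll  = c(1, 2, 3),
  Name  = c("A", "B", "C"),
  Marks = c(90, 85, 95)
)

# Write to a temporary folder
path <- file.path(tempdir(), "student.csv")

# row.names = FALSE stops R from writing the row numbers as an extra column
write.csv(student, path, row.names = FALSE)

data <- read.csv(path)
print(head(data))
print(names(data))   # "Roll" "Name" "Marks"
```

Without row.names = FALSE, the file gains an unnamed first column, which read.csv() imports as a column called X.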

Factors

Factors store categorical data.

Example

grade <- factor(c(

"A",

"B",

"A",

"C",

"B"

))

grade

Output

[1] A B A C B
Levels: A B C

Levels

levels(grade)

Output

"A"

"B"

"C"

Frequency

table(grade)

Output

A 2

B 2

C 1
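By default the levels are sorted alphabetically. When the categories have a natural order, the levels can be given explicitly. The sketch below assumes that grade C is the lowest and grade A the highest.

```r
grade <- factor(
  c("A", "B", "A", "C", "B"),
  levels  = c("C", "B", "A"),   # assumed order: C lowest, A highest
  ordered = TRUE
)

print(grade)                    # Levels: C < B < A
print(table(grade))             # counts follow the level order: C 1, B 2, A 2

# Ordered factors can be compared
at_least_b <- grade >= "B"      # TRUE TRUE TRUE FALSE TRUE
print(at_least_b)
print(nlevels(grade))           # 3
```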

Summary of Data Structures

Data Structure   Stores
Vector           Same Data Type
List             Different Data Types
Matrix           2D Same Data Type
Data Frame       Tabular Data
Factor           Categorical Data

Common Built-in Functions

Function    Purpose
length()    Number of elements
class()     Data class
typeof()    Internal type
sum()       Addition
mean()      Average
max()       Maximum
min()       Minimum
str()       Structure
summary()   Summary
head()      First rows
tail()      Last rows
table()     Frequency

Practical Exercises

  1. Create two variables and perform all arithmetic operations.
  2. Compare two numbers using comparison operators.
  3. Demonstrate logical operators using TRUE and FALSE.
  4. Create variables of numeric, integer, character, logical, and factor types.
  5. Create a vector of 10 numbers and calculate its sum, mean, maximum, and minimum.
  6. Create a list containing a student's name, age, course, and marks.
  7. Create a 3×3 matrix and print the second row.
  8. Create a data frame of five students with roll number, name, and marks.
  9. Import a CSV file and display the first five records.
  10. Create a factor for student grades and display the frequency of each grade.

Viva Questions

  1. What is R?
  2. What is RStudio?
  3. What is the difference between R and RStudio?
  4. What are the data types in R?
  5. Explain vectors with an example.
  6. What is a list?
  7. What is a matrix?
  8. What is a data frame?
  9. What are factors?
  10. Explain the difference between class() and typeof().
  11. What is the use of summary()?
  12. What is indexing in R?
  13. How do you import a CSV file?
  14. How do you export a CSV file?
  15. Why are factors important in statistical analysis?






📘 Module 2: Data Manipulation and Management (10 Classes)

📚 Syllabus

1. Data Import and Export

  • Reading data from CSV files
  • Reading data from Excel files
  • Writing data to CSV files
  • Writing data to Excel files

2. Data Cleaning and Preparation

  • Handling missing values (NA)
  • Detecting and removing duplicates
  • Data type conversion
  • Renaming rows and columns

3. Data Transformation

  • Selecting columns (select())
  • Filtering rows (filter())
  • Arranging data (arrange())
  • Creating new variables (mutate())
  • Transforming variables (transmute())
  • Summarizing data (summarise())
  • Grouping data (group_by())

📖 Class-wise Course Plan

Class      Topics
Class 1    Introduction to Data Manipulation, Reading CSV Files (read.csv())
Class 2    Reading Excel Files (readxl), Importing Different File Formats
Class 3    Writing Data to CSV and Excel (write.csv(), writexl)
Class 4    Data Cleaning: Missing Values (NA), is.na(), na.omit()
Class 5    Handling Duplicate Records, Data Type Conversion
Class 6    Renaming Rows and Columns, Working with Data Frames
Class 7    Data Transformation: select(), filter(), arrange()
Class 8    mutate(), transmute(), Creating New Variables
Class 9    summarise(), group_by(), Statistical Summaries
Class 10   Complete Data Cleaning & Transformation Case Study, Revision, Viva Questions, Lab Exercises



Class 1: Data Import and Export – Reading Data from CSV Files

Duration: 1 Class


🎯 Learning Objectives

After completing this lesson, students will be able to:

  • Understand the concept of data import.
  • Know different file formats supported by R.
  • Read CSV files into R.
  • Display and inspect imported data.
  • Understand the structure of a data frame.
  • Perform basic data exploration.

📖 2.1 Introduction to Data Import

Definition

Data Import is the process of loading data from external sources into R for analysis and visualization.

Most real-world datasets are stored in external files such as:

  • CSV Files
  • Excel Files
  • Text Files
  • JSON Files
  • Database Tables

R provides powerful functions to import these datasets efficiently.


🌟 Why Is Data Import Important?

Data import is the first step in any data analysis project because it allows users to work with real-world datasets.

Advantages

  • Imports large datasets quickly.
  • Supports multiple file formats.
  • Easy to analyze imported data.
  • Compatible with data visualization and machine learning.

📊 Common Data File Formats

File Format   Extension   Description
CSV           .csv        Comma-Separated Values
Excel         .xlsx       Microsoft Excel Workbook
Text          .txt        Plain Text File
JSON          .json       JavaScript Object Notation
R Data        .RData      Native R Data File

📖 2.2 What is a CSV File?

CSV stands for Comma-Separated Values.

Each row represents one record, and each column represents one variable.

CSV is the most widely used format for data exchange because it is simple and supported by almost every software application.


📊 Sample CSV Dataset (10 Records)

File Name: employee.csv

Emp_ID   Name     Department   Age   Salary
101      Amit     HR           25    30000
102      Priya    Sales        28    35000
103      Rahul    IT           30    50000
104      Sneha    HR           27    32000
105      Karan    IT           35    60000
106      Neha     Finance      31    55000
107      Arjun    Sales        29    40000
108      Pooja    Finance      33    58000
109      Rohan    IT           26    45000
110      Anjali   HR           32    52000

📖 2.3 Creating a CSV File

The dataset above can be saved in Notepad or Microsoft Excel as:

employee.csv

CSV Content

Emp_ID,Name,Department,Age,Salary
101,Amit,HR,25,30000
102,Priya,Sales,28,35000
103,Rahul,IT,30,50000
104,Sneha,HR,27,32000
105,Karan,IT,35,60000
106,Neha,Finance,31,55000
107,Arjun,Sales,29,40000
108,Pooja,Finance,33,58000
109,Rohan,IT,26,45000
110,Anjali,HR,32,52000

🔵 2.4 Reading a CSV File

Method 1: Using read.csv()

Syntax

read.csv(file, header = TRUE)

Parameters

Parameter   Description
file        CSV file path
header      TRUE if the first row contains column names

💻 Example 1: Read Employee Data

employee <- read.csv("employee.csv")

employee

Output

   Emp_ID   Name Department Age Salary
1 101 Amit HR 25 30000
2 102 Priya Sales 28 35000
3 103 Rahul IT 30 50000
4 104 Sneha HR 27 32000
5 105 Karan IT 35 60000
6 106 Neha Finance 31 55000
7 107 Arjun Sales 29 40000
8 108 Pooja Finance 33 58000
9 109 Rohan IT 26 45000
10 110 Anjali HR 32 52000

Explanation

  • read.csv() imports the CSV file.
  • The data is stored as a data frame.
  • Each row represents one employee.
  • Each column represents one variable.
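The effect of the header argument can be tried without creating a file, because read.csv() also accepts CSV content through its text argument. The sketch below uses the first three employee records; the names csv_text and no_header are only illustrative.

```r
# First three records of employee.csv, held in a string
csv_text <- "Emp_ID,Name,Department,Age,Salary
101,Amit,HR,25,30000
102,Priya,Sales,28,35000
103,Rahul,IT,30,50000"

# header = TRUE (the default) uses the first row as column names
employee <- read.csv(text = csv_text, header = TRUE)
str(employee)

# header = FALSE treats the first row as data and names the columns V1, V2, ...
no_header <- read.csv(text = csv_text, header = FALSE)
print(names(no_header))
print(nrow(no_header))   # 4, because the header line is counted as a record
```

With header = FALSE every column is read as text here, because the column names end up mixed in with the numbers.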

💻 Example 2: View the First Six Records

head(employee)

Output

  Emp_ID Name Department Age Salary
1 101 Amit HR 25 30000
2 102 Priya Sales 28 35000
3 103 Rahul IT 30 50000
4 104 Sneha HR 27 32000
5 105 Karan IT 35 60000
6 106 Neha Finance 31 55000

💻 Example 3: View the Last Six Records

tail(employee)

Output

  Emp_ID Name Department Age Salary
5 105 Karan IT 35 60000
6 106 Neha Finance 31 55000
7 107 Arjun Sales 29 40000
8 108 Pooja Finance 33 58000
9 109 Rohan IT 26 45000
10 110 Anjali HR 32 52000

💻 Example 4: Display Structure of Dataset

str(employee)

Output

'data.frame': 10 obs. of 5 variables:

$ Emp_ID : int
$ Name : chr
$ Department : chr
$ Age : int
$ Salary : int

Explanation

str() displays:

  • Number of rows
  • Number of columns
  • Data types of variables

💻 Example 5: Dataset Dimensions

dim(employee)

Output

[1] 10 5

Interpretation: The dataset contains 10 rows and 5 columns.


💻 Example 6: Column Names

colnames(employee)

Output

[1] "Emp_ID" "Name" "Department" "Age" "Salary"

💻 Example 7: Row Names

rownames(employee)

Output

[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"

💻 Example 8: Summary of Dataset

summary(employee)

Output (Example)

Emp_ID
Min. :101
1st Qu.:103.25
Median :105.5
Mean :105.5
3rd Qu.:107.75
Max. :110

Age
Min. :25
Mean :29.6
Max. :35

Salary
Min. :30000
Mean :45700
Max. :60000

💻 Example 9: Display Individual Column

employee$Salary

Output

[1] 30000 35000 50000 32000 60000
[6] 55000 40000 58000 45000 52000

💻 Example 10: Display Multiple Columns

employee[,c("Name","Salary")]

Output

     Name Salary

1 Amit 30000

2 Priya 35000

3 Rahul 50000

4 Sneha 32000

5 Karan 60000

6 Neha 55000

7 Arjun 40000

8 Pooja 58000

9 Rohan 45000

10 Anjali 52000

📊 Common Functions for Exploring Data

Function     Purpose
head()       First 6 rows
tail()       Last 6 rows
str()        Structure
summary()    Statistical summary
dim()        Rows and columns
nrow()       Number of rows
ncol()       Number of columns
colnames()   Column names
rownames()   Row names
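A short exploration script might combine several of these functions. The sketch below builds the employee data directly as a data frame so that it runs without the CSV file; sapply() and table() are used here as one possible way to list the column classes and count the employees in each department.

```r
employee <- data.frame(
  Emp_ID = 101:110,
  Name = c("Amit", "Priya", "Rahul", "Sneha", "Karan",
           "Neha", "Arjun", "Pooja", "Rohan", "Anjali"),
  Department = c("HR", "Sales", "IT", "HR", "IT",
                 "Finance", "Sales", "Finance", "IT", "HR"),
  Age = c(25, 28, 30, 27, 35, 31, 29, 33, 26, 32),
  Salary = c(30000, 35000, 50000, 32000, 60000,
             55000, 40000, 58000, 45000, 52000)
)

print(nrow(employee))            # 10
print(ncol(employee))            # 5
print(head(employee, 3))         # head() accepts the number of rows to show

col_types <- sapply(employee, class)        # class of every column
print(col_types)

dept_counts <- table(employee$Department)   # employees per department
print(dept_counts)
```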

🌍 Real-Life Applications

  • Importing student records
  • Employee databases
  • Sales reports
  • Banking transactions
  • Hospital patient data
  • Survey results
  • Research datasets
  • Machine learning datasets

✔ Advantages of CSV Files

  • Easy to create and edit.
  • Lightweight and portable.
  • Supported by Excel, R, Python, and databases.
  • Ideal for data exchange.

✖ Limitations

  • Does not store formatting.
  • Does not support formulas.
  • No multiple worksheets (unlike Excel).
  • Data types are not preserved automatically.

📝 Lab Exercises

  1. Create an employee.csv file with 10 employee records.
  2. Import the file using read.csv().
  3. Display the first and last six records.
  4. Find the number of rows and columns.
  5. Display the structure of the dataset.
  6. Print only the Name and Salary columns.
  7. Generate a statistical summary using summary().

❓ Viva Questions

  1. What is a CSV file?
  2. What is the purpose of read.csv()?
  3. What does the header argument do?
  4. Which function displays the first six rows?
  5. Which function shows the structure of a dataset?
  6. How do you display column names?
  7. What is the difference between head() and tail()?
  8. What information does summary() provide?
  9. Name two advantages of CSV files.
  10. Give two real-world applications of importing CSV data.

📚 Class Summary

In this class, you learned:

  • The concept of data import.
  • CSV file structure.
  • Reading CSV files using read.csv().
  • Exploring datasets with head(), tail(), str(), dim(), and summary().
  • Practical examples using a 10-record employee dataset.
  • Real-world applications, advantages, limitations, exercises, and viva questions.



Class 2: Data Import and Export – Reading Data from Excel Files (.xlsx)

Duration: 1 Class


🎯 Learning Objectives

After completing this lesson, students will be able to:

  • Understand Excel file formats.
  • Install and use the readxl package.
  • Read Excel files into R.
  • Import specific worksheets.
  • Read multiple sheets from an Excel workbook.
  • Explore imported data using R functions.
  • Compare CSV and Excel file formats.

📖 2.5 Introduction to Excel Files

Definition

An Excel file is a spreadsheet created using Microsoft Excel. It stores data in rows and columns and may contain multiple worksheets, formulas, charts, and formatting.

Unlike CSV files, Excel files can store multiple sheets in a single workbook.


🌟 Advantages of Excel Files

  • Multiple worksheets in one file
  • Supports formulas and functions
  • Can contain charts and graphs
  • Easy to edit using Microsoft Excel
  • Widely used in businesses and organizations

📊 Excel File Extensions

Extension   Description
.xls        Excel 97–2003 Workbook
.xlsx       Excel 2007 and Later Workbook
.xlsm       Macro-Enabled Workbook

📖 2.6 The readxl Package

The readxl package is used to import Excel files into R.

If it is not installed, install it once using:


Install Package

install.packages("readxl")

Load Package

library(readxl)

📊 Sample Excel File

File Name: employee.xlsx

Worksheet: Employee

Emp_ID   Name     Department   Age   Salary
101      Amit     HR           25    30000
102      Priya    Sales        28    35000
103      Rahul    IT           30    50000
104      Sneha    HR           27    32000
105      Karan    IT           35    60000
106      Neha     Finance      31    55000
107      Arjun    Sales        29    40000
108      Pooja    Finance      33    58000
109      Rohan    IT           26    45000
110      Anjali   HR           32    52000

📖 2.7 Reading an Excel File

Syntax

read_excel(path)

💻 Example 1: Read an Excel File

library(readxl)

employee <- read_excel("employee.xlsx")

employee

Output

# A tibble: 10 × 5

Emp_ID Name Department Age Salary

1 101 Amit HR 25 30000
2 102 Priya Sales 28 35000
3 103 Rahul IT 30 50000
4 104 Sneha HR 27 32000
5 105 Karan IT 35 60000
6 106 Neha Finance 31 55000
7 107 Arjun Sales 29 40000
8 108 Pooja Finance 33 58000
9 109 Rohan IT 26 45000
10 110 Anjali HR 32 52000

💻 Example 2: Read a Specific Worksheet

Suppose the workbook contains two sheets:

  • Employee
  • Salary

library(readxl)

employee <- read_excel(
"employee.xlsx",
sheet="Employee"
)

employee

Output

Displays all records from the Employee worksheet.


💻 Example 3: Read Sheet by Number

library(readxl)

employee <- read_excel(
"employee.xlsx",
sheet=1
)

Output

Imports the first worksheet.


💻 Example 4: Display Available Sheet Names

library(readxl)

excel_sheets("employee.xlsx")

Output

[1] "Employee"

[2] "Salary"
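Once the sheet names are known, every sheet can be read in one step. The sketch below is one possible approach, assuming that both readxl and writexl are installed: it first creates a small two-sheet demo workbook in the temporary folder (the file name employee_demo.xlsx and its contents are hypothetical) and then reads each sheet into a named list with lapply().

```r
# Runs only if both packages are available (assumption: they are installed)
if (requireNamespace("readxl", quietly = TRUE) &&
    requireNamespace("writexl", quietly = TRUE)) {

  # Create a small demo workbook with two sheets (hypothetical data)
  path <- file.path(tempdir(), "employee_demo.xlsx")
  writexl::write_xlsx(
    list(
      Employee = data.frame(Emp_ID = c(101, 102), Name = c("Amit", "Priya")),
      Salary   = data.frame(Emp_ID = c(101, 102), Salary = c(30000, 35000))
    ),
    path
  )

  # Read every sheet into a list, one data set per sheet
  sheets <- readxl::excel_sheets(path)
  all_sheets <- lapply(sheets, function(s) readxl::read_excel(path, sheet = s))
  names(all_sheets) <- sheets

  print(names(all_sheets))
  print(all_sheets$Salary)
}
```

Each sheet is then available by name, for example all_sheets$Employee.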

💻 Example 5: Read Selected Columns

library(readxl)

employee <- read_excel(
"employee.xlsx",
range=cell_cols("A:C")
)

employee

Output

Emp_ID Name Department

101 Amit HR

102 Priya Sales

103 Rahul IT

...

110 Anjali HR

💻 Example 6: Read Specific Cell Range

library(readxl)

employee <- read_excel(
"employee.xlsx",
range="A1:E6"
)

employee

Output

Imports only the first six rows of the worksheet: the header row and the first five records.


💻 Example 7: View Dataset Structure

str(employee)

Output

tibble [10 × 5]

Emp_ID : numeric

Name : character

Department : character

Age : numeric

Salary : numeric

💻 Example 8: Display Summary

summary(employee)

Output

Emp_ID

Min :101

Mean :105.5

Max :110

Age

Min :25

Mean :29.6

Max :35

Salary

Min :30000

Mean :45700

Max :60000

💻 Example 9: First Six Records

head(employee)

Output

First six employee records are displayed.

💻 Example 10: Last Six Records

tail(employee)

Output

Last six employee records are displayed.

📊 Comparison: CSV vs Excel

Feature               CSV             Excel
File Extension        .csv            .xlsx
Multiple Sheets       ❌ No            ✅ Yes
Supports Formatting   ❌ No            ✅ Yes
Supports Charts       ❌ No            ✅ Yes
File Size             Small           Larger
Speed                 Faster          Slightly Slower
Best For              Data Exchange   Business Reports

🌍 Real-Life Applications

  • Student attendance records
  • Employee payroll
  • Banking reports
  • Hospital patient data
  • Sales reports
  • Inventory management
  • Research datasets
  • Financial statements

✔ Advantages of readxl

  • Reads Excel files directly.
  • Supports .xls and .xlsx.
  • Imports selected sheets.
  • Imports selected cell ranges.
  • Fast and reliable.

✖ Limitations

  • Cannot modify Excel files (reading only).
  • Formatting is not imported.
  • Macros are ignored.
  • Charts and images are not imported.

📝 Lab Exercises

Exercise 1

Install the readxl package.


Exercise 2

Read an Excel file named employee.xlsx.


Exercise 3

Display available worksheet names.


Exercise 4

Read only the first worksheet.


Exercise 5

Import only columns A to C.


Exercise 6

Import rows 1–6 from the worksheet.


Exercise 7

Display the structure and summary of the imported dataset.


❓ Viva Questions

  1. What is an Excel workbook?
  2. Which package is used to read Excel files in R?
  3. Which function imports Excel data?
  4. What is the purpose of excel_sheets()?
  5. How do you read a worksheet by name?
  6. How do you read a worksheet by number?
  7. What is the difference between CSV and Excel?
  8. Can readxl read .xls files?
  9. Can readxl import charts?
  10. Give two applications of Excel data import.

📚 Class Summary

In this class, you learned:

  • Introduction to Excel files.
  • Installing and loading the readxl package.
  • Reading Excel files with read_excel().
  • Importing specific worksheets and ranges.
  • Viewing sheet names with excel_sheets().
  • Comparing CSV and Excel formats.
  • Practical R programs with outputs.
  • Real-world applications, exercises, and viva questions.


Class 3: Data Export – Writing Data to CSV and Excel Files

Duration: 1 Class


🎯 Learning Objectives

After completing this lesson, students will be able to:

  • Understand data export in R.
  • Write data frames to CSV files.
  • Write data frames to Excel files.
  • Export selected columns and filtered data.
  • Save processed data for future use.
  • Understand the differences between CSV and Excel exports.

📖 2.8 Introduction to Data Export

Definition

Data Export is the process of saving data from R into an external file so that it can be used in other software such as Microsoft Excel, LibreOffice Calc, databases, or shared with others.

Common export formats include:

  • CSV (.csv)
  • Excel (.xlsx)
  • Text (.txt)
  • RData (.RData)

🌟 Why Is Data Export Important?

Data export allows users to:

  • Save processed datasets.
  • Share reports with others.
  • Store analysis results.
  • Create backup copies.
  • Use data in other applications.

📊 Sample Dataset (10 Records)

employee <- data.frame(

Emp_ID=c(101,102,103,104,105,106,107,108,109,110),

Name=c("Amit","Priya","Rahul","Sneha","Karan",
"Neha","Arjun","Pooja","Rohan","Anjali"),

Department=c("HR","Sales","IT","HR","IT",
"Finance","Sales","Finance","IT","HR"),

Age=c(25,28,30,27,35,31,29,33,26,32),

Salary=c(30000,35000,50000,32000,60000,
55000,40000,58000,45000,52000)
)

employee

Output

   Emp_ID   Name Department Age Salary
1 101 Amit HR 25 30000
2 102 Priya Sales 28 35000
3 103 Rahul IT 30 50000
4 104 Sneha HR 27 32000
5 105 Karan IT 35 60000
6 106 Neha Finance 31 55000
7 107 Arjun Sales 29 40000
8 108 Pooja Finance 33 58000
9 109 Rohan IT 26 45000
10 110 Anjali HR 32 52000

🔵 2.9 Writing Data to a CSV File

Syntax

write.csv(data, file, row.names = FALSE)

Parameters

Parameter         Description
data              Data frame to export
file              Output file name
row.names=FALSE   Prevents row numbers from being written

💻 Example 1: Export Entire Dataset

write.csv(employee,
"employee.csv",
row.names=FALSE)

Output

employee.csv created successfully.

💻 Example 2: Export Selected Columns

emp_salary <- employee[,c("Name","Salary")]

write.csv(emp_salary,
"salary.csv",
row.names=FALSE)

Output

salary.csv created successfully.

💻 Example 3: Export Employees from IT Department

IT_emp <- subset(employee,
Department=="IT")

write.csv(IT_emp,
"IT_Employees.csv",
row.names=FALSE)

Output

IT_Employees.csv created successfully.

💻 Example 4: Export Employees with Salary > 50,000

high_salary <- subset(employee,
Salary>50000)

write.csv(high_salary,
"HighSalary.csv",
row.names=FALSE)

Output

HighSalary.csv created successfully.
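Instead of writing one subset() call per department, the data frame can be split once and each part exported in a loop. The sketch below is one possible way to do this with base R; it writes to the temporary folder, and the file name pattern <Department>_Employees.csv is an assumption modelled on Example 3.

```r
employee <- data.frame(
  Emp_ID = 101:110,
  Name = c("Amit", "Priya", "Rahul", "Sneha", "Karan",
           "Neha", "Arjun", "Pooja", "Rohan", "Anjali"),
  Department = c("HR", "Sales", "IT", "HR", "IT",
                 "Finance", "Sales", "Finance", "IT", "HR"),
  Age = c(25, 28, 30, 27, 35, 31, 29, 33, 26, 32),
  Salary = c(30000, 35000, 50000, 32000, 60000,
             55000, 40000, 58000, 45000, 52000)
)

out_dir <- tempdir()   # temporary folder, so nothing is overwritten

# split() returns a list with one data frame per department
by_dept <- split(employee, employee$Department)

for (dept in names(by_dept)) {
  out_file <- file.path(out_dir, paste0(dept, "_Employees.csv"))
  write.csv(by_dept[[dept]], out_file, row.names = FALSE)
}

# Check that the files exist and read one back
written <- file.path(out_dir, paste0(names(by_dept), "_Employees.csv"))
print(file.exists(written))
it_back <- read.csv(file.path(out_dir, "IT_Employees.csv"))
print(it_back)
```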

🟢 2.10 Writing Data to Excel Files

Base R cannot write Excel files directly. The writexl package is one of several packages that can export data frames to .xlsx files.


Install Package

install.packages("writexl")

Load Package

library(writexl)

Syntax

write_xlsx(data, path)

💻 Example 5: Export to Excel

library(writexl)

write_xlsx(employee,
"employee.xlsx")

Output

employee.xlsx created successfully.

💻 Example 6: Export Salary Data

salary_data <- employee[,c("Name","Salary")]

write_xlsx(salary_data,
"EmployeeSalary.xlsx")

Output

EmployeeSalary.xlsx created successfully.

💻 Example 7: Export HR Department

HR_emp <- subset(employee,
Department=="HR")

write_xlsx(HR_emp,
"HR_Department.xlsx")

Output

HR_Department.xlsx created successfully.

💻 Example 8: Export Finance Department

Finance_emp <- subset(employee,
Department=="Finance")

write_xlsx(Finance_emp,
"Finance.xlsx")

Output

Finance.xlsx created successfully.

💻 Example 9: Export Employees Older Than 30

older_emp <- subset(employee,
Age>30)

write.csv(older_emp,
"AgeAbove30.csv",
row.names=FALSE)

Output

AgeAbove30.csv created successfully.

💻 Example 10: Export Summary Statistics

summary_data <- summary(employee)

write.table(summary_data,
"Summary.txt")

Output

Summary.txt created successfully.

📊 Comparison: write.csv() vs write_xlsx()

Feature               write.csv()   write_xlsx()
Output Format         CSV           Excel (.xlsx)
Multiple Sheets       ❌ No          ✅ Yes (pass a named list of data frames)
File Size             Smaller       Larger
Readable in Excel     ✅ Yes         ✅ Yes
Supports Formatting   ❌ No          Limited

🌍 Real-Life Applications

  • Exporting employee payroll reports.
  • Saving student examination results.
  • Generating monthly sales reports.
  • Creating financial statements.
  • Exporting survey responses.
  • Sharing machine learning results.
  • Backing up processed datasets.
  • Sending reports to management.

✔ Advantages

  • Saves processed data permanently.
  • Easy to share with others.
  • Compatible with Excel and other software.
  • Useful for report generation.
  • Supports automation.

✖ Limitations

  • CSV files cannot store formatting.
  • Excel export requires an additional package.
  • Charts and formulas are not exported automatically.

📝 Lab Exercises

  1. Create a data frame containing 10 student records.
  2. Export the data frame to a CSV file.
  3. Export only the Name and Marks columns.
  4. Export students scoring more than 80 marks.
  5. Export the dataset to an Excel file.
  6. Create separate Excel files for different departments.
  7. Generate a summary report and save it as a text file.

❓ Viva Questions

  1. What is data export?
  2. Which function exports data to CSV?
  3. Why is row.names = FALSE commonly used?
  4. Which package is used to export Excel files?
  5. What is the purpose of write_xlsx()?
  6. Can CSV files store formatting?
  7. Name two advantages of exporting data.
  8. What is the difference between CSV and Excel export?
  9. How can you export only selected columns?
  10. Give two real-life applications of data export.

📚 Class Summary

In this class, you learned:

  • The concept of data export.
  • Writing data frames to CSV files using write.csv().
  • Writing Excel files using the writexl package.
  • Exporting filtered and selected datasets.
  • Comparison of CSV and Excel exports.
  • Practical examples with outputs.
  • Real-world applications, lab exercises, and viva questions. 

Class 4: Data Cleaning and Preparation – Handling Missing Values (NA)

Duration: 1 Class


🎯 Learning Objectives

After completing this lesson, students will be able to:

  • Understand missing values (NA) in R.
  • Identify missing values in datasets.
  • Count missing values.
  • Remove missing values.
  • Replace missing values.
  • Perform statistical analysis after handling missing data.

📖 2.11 Introduction to Data Cleaning

Definition

Data Cleaning is the process of detecting, correcting, or removing inaccurate, incomplete, duplicate, or inconsistent data from a dataset.

Data cleaning is one of the most important steps in Data Science, Machine Learning, and Statistical Analysis because the quality of the analysis depends on the quality of the data.


🌟 Why Is Data Cleaning Important?

Data cleaning helps to:

  • Improve data quality.
  • Increase the accuracy of analysis.
  • Remove errors and inconsistencies.
  • Handle missing values effectively.
  • Improve machine learning model performance.

📖 2.12 What are Missing Values?

A missing value is a data value that is unavailable or unknown. In R, missing values are represented by NA (Not Available).

Common Causes of Missing Values

  • Data entry errors
  • Survey respondents skipping questions
  • Equipment or sensor failures
  • Data transmission errors
  • Incomplete records

📊 Sample Dataset (10 Records)

student <- data.frame(

Roll_No=c(1,2,3,4,5,6,7,8,9,10),

Name=c("Amit","Priya","Rahul","Sneha","Karan",
"Neha","Arjun","Pooja","Rohan","Anjali"),

Marks=c(85,NA,78,92,NA,81,75,88,NA,95),

Age=c(20,21,20,22,21,NA,20,22,21,20)

)

student

Output

   Roll_No   Name Marks Age
1 1 Amit 85 20
2 2 Priya NA 21
3 3 Rahul 78 20
4 4 Sneha 92 22
5 5 Karan NA 21
6 6 Neha 81 NA
7 7 Arjun 75 20
8 8 Pooja 88 22
9 9 Rohan NA 21
10 10 Anjali 95 20

📖 2.13 Detecting Missing Values

Syntax

is.na(object)

is.na() checks each value and returns TRUE if it is missing, otherwise FALSE.


💻 Example 1: Detect Missing Values

is.na(student)

Output

      Roll_No Name Marks Age
1 FALSE FALSE FALSE FALSE
2 FALSE FALSE TRUE FALSE
3 FALSE FALSE FALSE FALSE
4 FALSE FALSE FALSE FALSE
5 FALSE FALSE TRUE FALSE
6 FALSE FALSE FALSE TRUE
7 FALSE FALSE FALSE FALSE
8 FALSE FALSE FALSE FALSE
9 FALSE FALSE TRUE FALSE
10 FALSE FALSE FALSE FALSE

💻 Example 2: Count Missing Values

sum(is.na(student))

Output

[1] 4

Explanation: There are 4 missing values in the dataset.


💻 Example 3: Missing Values in Each Column

colSums(is.na(student))

Output

Roll_No   0
Name 0
Marks 3
Age 1

💻 Example 4: Missing Values in Each Row

rowSums(is.na(student))

Output

1 0
2 1
3 0
4 0
5 1
6 1
7 0
8 0
9 1
10 0

📖 2.14 Removing Missing Values

Syntax

na.omit(data)

💻 Example 5: Remove Missing Records

clean_student <- na.omit(student)

clean_student

Output

   Roll_No   Name Marks Age
1        1   Amit    85  20
3        3  Rahul    78  20
4        4  Sneha    92  22
7        7  Arjun    75  20
8        8  Pooja    88  22
10      10 Anjali    95  20

Explanation

Rows containing missing values are removed.
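na.omit() drops a row if any column is missing, which can remove more data than necessary. The sketch below shows two alternatives on a shortened version of the student data (the Name column is left out for brevity): complete.cases() to flag complete rows, and is.na() on a single column to drop only the rows where that column is missing.

```r
student <- data.frame(
  Roll_No = 1:10,
  Marks = c(85, NA, 78, 92, NA, 81, 75, 88, NA, 95),
  Age   = c(20, 21, 20, 22, 21, NA, 20, 22, 21, 20)
)

# TRUE for rows that have no missing value in any column
complete <- complete.cases(student)
print(sum(complete))                 # 6 complete rows

# Same result as na.omit(student)
all_complete <- student[complete, ]

# Drop rows only when Marks is missing; row 6 (missing Age) is kept
has_marks <- student[!is.na(student$Marks), ]
print(nrow(has_marks))               # 7
```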


📖 2.15 Replacing Missing Values

Instead of deleting rows, missing values can be replaced.


💻 Example 6: Replace Missing Marks with Zero

student$Marks[is.na(student$Marks)] <- 0

student

Output

Marks

85

0

78

92

0

81

75

88

0

95

💻 Example 7: Replace Missing Age with Mean Age

student$Age[is.na(student$Age)] <-

mean(student$Age, na.rm=TRUE)

student

Output

Age

20

21

20

22

21

20.78

20

22

21

20

Explanation

na.rm=TRUE ignores missing values while calculating the mean.


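💻 Example 8: Calculate Mean Without Missing Values

Note: Examples 8 to 10 use the original student data frame, created before the replacements in Examples 6 and 7. Re-create it first; otherwise the zeros inserted in Example 6 are counted as real marks.

mean(student$Marks, na.rm=TRUE)

Output

[1] 84.86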

💻 Example 9: Calculate Median

median(student$Marks, na.rm=TRUE)

Output

[1] 85

💻 Example 10: Standard Deviation

sd(student$Marks, na.rm=TRUE)

Output

[1] 7.34

(Approximate value.)


📖 2.16 Methods for Handling Missing Values

Method                    Description
Delete rows               Remove incomplete records
Replace with Mean         Numerical data
Replace with Median       Skewed numerical data
Replace with Mode         Categorical data
Predict Missing Values    Machine learning techniques
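The examples above cover deletion, zero, and mean replacement. Median and mode replacement can be sketched the same way. The marks below are the ones from the student dataset; the grade vector and the helper stat_mode() are hypothetical, added only for illustration (base R has no built-in function for the statistical mode).

```r
# Marks from the student dataset; grade is a made-up categorical column.
marks <- c(85, NA, 78, 92, NA, 81, 75, 88, NA, 95)
grade <- c("A", "B", NA, "A", "B", "A", NA, "A", "B", "A")

# Median imputation: fill numeric gaps with the median of the observed values.
marks_filled <- marks
marks_filled[is.na(marks_filled)] <- median(marks, na.rm = TRUE)

# Hypothetical helper: most frequent non-missing value.
# On ties it returns the value that appears first in the data.
stat_mode <- function(x) {
  x <- x[!is.na(x)]
  values <- unique(x)
  values[which.max(tabulate(match(x, values)))]
}

# Mode imputation: fill categorical gaps with the most frequent category.
grade_filled <- grade
grade_filled[is.na(grade_filled)] <- stat_mode(grade)

marks_filled
grade_filled
```

The median is less affected by extreme values than the mean, which is why the table suggests it for skewed data.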

📊 Useful Functions

Function                   Purpose
is.na()                    Detect missing values
sum(is.na())               Count missing values
colSums(is.na())           Missing values by column
rowSums(is.na())           Missing values by row
na.omit()                  Remove missing rows
mean(..., na.rm=TRUE)      Ignore missing values
median(..., na.rm=TRUE)    Ignore missing values
sd(..., na.rm=TRUE)        Standard deviation without missing values

🌍 Real-Life Applications

  • Student attendance records
  • Hospital patient databases
  • Banking transactions
  • Insurance claims
  • Sales and inventory management
  • Customer feedback analysis
  • Survey data cleaning
  • Machine learning preprocessing

✔ Advantages

  • Improves data quality.
  • Increases analysis accuracy.
  • Prevents errors in statistical calculations.
  • Enhances model performance.
  • Produces reliable reports.

✖ Disadvantages

  • Removing records may reduce dataset size.
  • Replacing values may introduce bias if done incorrectly.
  • Requires careful selection of imputation methods.

📝 Lab Exercises

  1. Create a dataset with 10 student records containing missing values.
  2. Detect missing values using is.na().
  3. Count total missing values.
  4. Find missing values in each column.
  5. Find missing values in each row.
  6. Remove missing records using na.omit().
  7. Replace missing marks with 0.
  8. Replace missing ages with the mean age.
  9. Calculate the mean and median while ignoring missing values.
  10. Find the standard deviation of marks after handling missing values.

❓ Viva Questions

  1. What is a missing value in R?
  2. How are missing values represented in R?
  3. What is the purpose of is.na()?
  4. What does na.omit() do?
  5. Why is na.rm=TRUE used?
  6. How can missing values be counted?
  7. What are common causes of missing data?
  8. When should you replace missing values instead of deleting rows?
  9. What are the advantages of handling missing values?
  10. Give two real-life applications of data cleaning.

📚 Class Summary

In this class, you learned:

  • The concept of data cleaning.
  • Missing values (NA) and their causes.
  • Detecting missing values using is.na().
  • Counting missing values.
  • Removing missing records with na.omit().
  • Replacing missing values with constants and statistical measures.
  • Practical examples with outputs.
  • Real-world applications, exercises, and viva questions.

Class 5: Handling Duplicate Records and Data Type Conversion

Duration: 1 Class


🎯 Learning Objectives

After completing this lesson, students will be able to:

  • Understand duplicate records in datasets.
  • Detect duplicate rows and values.
  • Remove duplicate records.
  • Understand different data types in R.
  • Convert data between numeric, character, factor, and logical types.
  • Apply data type conversion in real-world datasets.

📖 2.17 Introduction to Duplicate Data

Definition

A duplicate record is a row or value that appears more than once in a dataset.

Duplicate data may occur because of:

  • Repeated data entry
  • System errors
  • Database merging
  • Data import from multiple sources

Duplicate records can lead to inaccurate statistical analysis and incorrect reports.


🌟 Why Remove Duplicate Records?

Removing duplicates helps to:

  • Improve data quality.
  • Reduce storage space.
  • Increase analysis accuracy.
  • Prevent incorrect statistical results.
  • Improve machine learning performance.

📊 Sample Dataset (10 Records)

employee <- data.frame(

Emp_ID=c(101,102,103,104,105,103,107,108,109,110),

Name=c("Amit","Priya","Rahul","Sneha","Karan",
"Rahul","Arjun","Pooja","Rohan","Anjali"),

Department=c("HR","Sales","IT","HR","IT",
"IT","Sales","Finance","IT","HR"),

Salary=c(30000,35000,50000,32000,60000,
50000,40000,58000,45000,52000)

)

employee

Output

Emp_ID  Name    Department  Salary
101     Amit    HR          30000
102     Priya   Sales       35000
103     Rahul   IT          50000
104     Sneha   HR          32000
105     Karan   IT          60000
103     Rahul   IT          50000
107     Arjun   Sales       40000
108     Pooja   Finance     58000
109     Rohan   IT          45000
110     Anjali  HR          52000

📖 2.18 Detecting Duplicate Records

Syntax

duplicated(data)

💻 Example 1: Detect Duplicate Rows

duplicated(employee)

Output

[1] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE

Explanation

The 6th row is a duplicate of the 3rd row.


💻 Example 2: Display Duplicate Records

employee[duplicated(employee),]

Output

Emp_ID  Name   Department  Salary
103     Rahul  IT          50000

💻 Example 3: Count Duplicate Records

sum(duplicated(employee))

Output

[1] 1

💻 Example 4: Remove Duplicate Records

employee_unique <- employee[!duplicated(employee),]

employee_unique

Output

The data frame is printed without the duplicate row (row 6).

Total Records = 9

💻 Example 5: Detect Duplicate Employee IDs

duplicated(employee$Emp_ID)

Output

[1] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
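Detecting a repeated Emp_ID is usually the first step towards removing it. The sketch below, which rebuilds the sample dataset so that it runs on its own, shows three base R options: removing by a key column, removing fully identical rows with unique(), and keeping the last occurrence with fromLast = TRUE. The variable names by_id, by_row, and keep_last are illustrative.

```r
# Same records as the sample dataset (row 6 repeats row 3).
employee <- data.frame(
  Emp_ID = c(101, 102, 103, 104, 105, 103, 107, 108, 109, 110),
  Name = c("Amit", "Priya", "Rahul", "Sneha", "Karan",
           "Rahul", "Arjun", "Pooja", "Rohan", "Anjali"),
  Department = c("HR", "Sales", "IT", "HR", "IT",
                 "IT", "Sales", "Finance", "IT", "HR"),
  Salary = c(30000, 35000, 50000, 32000, 60000,
             50000, 40000, 58000, 45000, 52000)
)

# Keep the first row for each Emp_ID, even if other columns differ.
by_id <- employee[!duplicated(employee$Emp_ID), ]

# unique() drops rows that are identical in every column.
by_row <- unique(employee)

# fromLast = TRUE flags the earlier copy, so the last occurrence is kept.
keep_last <- employee[!duplicated(employee$Emp_ID, fromLast = TRUE), ]

nrow(by_id)
nrow(by_row)
rownames(keep_last)
```

Removing by key column is stricter than removing identical rows: two records with the same Emp_ID but different salaries would be reduced to one, so the records should be checked before choosing this option.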

📖 2.19 Data Types in R

R supports different types of data.

Data Type    Description    Example
Numeric      Numbers        100
Character    Text           "Amit"
Logical      TRUE/FALSE     TRUE
Factor       Categories     HR, Sales

📖 2.20 Data Type Conversion

Data type conversion changes one data type into another.


💻 Example 6: Numeric to Character

x <- 100

class(x)

x <- as.character(x)

class(x)

Output

[1] "numeric"

[1] "character"

💻 Example 7: Character to Numeric

x <- "250"

class(x)

x <- as.numeric(x)

class(x)

Output

[1] "character"

[1] "numeric"

💻 Example 8: Character to Factor

department <- c("HR", "Sales", "IT", "HR", "Finance")

factor_department <- as.factor(department)

factor_department

Output

[1] HR      Sales   IT      HR      Finance
Levels: Finance HR IT Sales

💻 Example 9: Numeric to Logical

x <- c(1,0,5)

as.logical(x)

Output

[1]  TRUE FALSE  TRUE

Explanation

  • 0 becomes FALSE.
  • Any non-zero value becomes TRUE.

💻 Example 10: Check Data Type

class(employee)

str(employee)

Output

[1] "data.frame"

'data.frame': 10 obs. of 4 variables:

(str() then prints one line per column, giving its name, type, and first values.)

📊 Common Conversion Functions

Function          Purpose
as.numeric()      Convert to numeric
as.character()    Convert to character
as.factor()       Convert to factor
as.logical()      Convert to logical
class()           Display data type
str()             Display structure

📊 Comparison of Data Types

Type         Stores        Example
Numeric      Numbers       100
Character    Text          "Amit"
Logical      TRUE/FALSE    TRUE
Factor       Categories    HR

🌍 Real-Life Applications

Duplicate Handling

  • Banking transactions
  • Employee databases
  • Hospital patient records
  • Student admission systems
  • Customer databases

Data Type Conversion

  • Machine learning preprocessing
  • Survey analysis
  • Statistical modeling
  • Financial analysis
  • Database management

✔ Advantages

  • Removes redundant information.
  • Improves dataset quality.
  • Ensures correct data types for analysis.
  • Enhances model accuracy.
  • Simplifies data manipulation.

✖ Disadvantages

  • Removing duplicates without verification may delete valid records.
  • Incorrect data type conversion may cause data loss.
  • Requires careful validation before conversion.
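Two common ways a conversion can lose data are shown in the sketch below: text that is not a number becomes NA, and calling as.numeric() directly on a factor returns its internal level codes rather than the original numbers. The vectors values and f are made-up examples.

```r
# Text that is not a number becomes NA (R also issues a warning,
# suppressed here to keep the example quiet).
values <- c("250", "abc", "40.5")
converted <- suppressWarnings(as.numeric(values))
converted

# Count how many values were lost before trusting the converted column.
sum(is.na(converted))

# A factor stores integer level codes, so as.numeric() returns the codes.
f <- factor(c("10", "20", "10"))
as.numeric(f)                  # level codes
as.numeric(as.character(f))    # original numbers
```

Converting the factor to character first, and only then to numeric, recovers the values that were actually recorded.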

📝 Lab Exercises

  1. Create a dataset containing duplicate employee records.
  2. Detect duplicate rows using duplicated().
  3. Count duplicate records.
  4. Remove duplicate records.
  5. Detect duplicate employee IDs.
  6. Convert numeric data to character.
  7. Convert character data to numeric.
  8. Convert department names to factors.
  9. Convert numeric values to logical.
  10. Display the structure of the dataset.

❓ Viva Questions

  1. What is a duplicate record?
  2. Which function detects duplicate rows?
  3. How can duplicate rows be removed?
  4. What is the purpose of duplicated()?
  5. What are the four basic data types in R?
  6. Which function converts data to numeric?
  7. Which function converts data to character?
  8. What is a factor in R?
  9. How does as.logical() work?
  10. Why is data type conversion important?

📚 Class Summary

In this class, you learned:

  • Duplicate records and their effects.
  • Detecting and removing duplicate data.
  • Basic data types in R.
  • Data type conversion using as.numeric(), as.character(), as.factor(), and as.logical().
  • Practical examples with outputs.
  • Real-world applications, exercises, and viva questions. 

Class 6: Renaming Columns and Rows in R

Duration: 1 Class


🎯 Learning Objectives

After completing this lesson, students will be able to:

  • Understand the importance of meaningful column and row names.
  • Rename columns using colnames(), names(), and rename().
  • Rename rows using rownames().
  • Rename multiple columns simultaneously.
  • Apply renaming techniques in real-world datasets.

📖 2.21 Introduction to Renaming

Definition

Renaming is the process of changing the names of columns or rows in a dataset to make them more meaningful, readable, and easier to understand.

For example:

Old Name    New Name
M1          Marks
Dept        Department
Sal         Salary
Age1        Age

Using meaningful names improves code readability and makes data analysis easier.


🌟 Why Rename Columns and Rows?

Renaming helps to:

  • Improve readability.
  • Use meaningful variable names.
  • Avoid confusion during analysis.
  • Make reports easier to understand.
  • Prepare data for machine learning and visualization.

📊 Sample Dataset (10 Records)

employee <- data.frame(

ID=c(101,102,103,104,105,106,107,108,109,110),

EmpName=c("Amit","Priya","Rahul","Sneha","Karan",
"Neha","Arjun","Pooja","Rohan","Anjali"),

Dept=c("HR","Sales","IT","HR","IT",
"Finance","Sales","Finance","IT","HR"),

Age=c(25,28,30,27,35,31,29,33,26,32),

Sal=c(30000,35000,50000,32000,60000,
55000,40000,58000,45000,52000)
)

employee

Output

ID   EmpName  Dept     Age  Sal
101  Amit     HR       25   30000
102  Priya    Sales    28   35000
103  Rahul    IT       30   50000
104  Sneha    HR       27   32000
105  Karan    IT       35   60000
106  Neha     Finance  31   55000
107  Arjun    Sales    29   40000
108  Pooja    Finance  33   58000
109  Rohan    IT       26   45000
110  Anjali   HR       32   52000

📖 2.22 Renaming Columns Using colnames()

Syntax

colnames(dataframe) <- c("Column1","Column2",...)

💻 Example 1: Rename All Columns

colnames(employee) <- c("Emp_ID",
"Name",
"Department",
"Age",
"Salary")

employee

Output

Emp_ID  Name   Department  Age  Salary
101     Amit   HR          25   30000
102     Priya  Sales       28   35000
103     Rahul  IT          30   50000
...     ...    ...         ...  ...

💻 Example 2: Display Column Names

colnames(employee)

Output

[1] "Emp_ID"     "Name"       "Department" "Age"        "Salary"

📖 2.23 Renaming Columns Using names()

names() works similarly to colnames().


Syntax

names(dataframe)

💻 Example 3

names(employee)

Output

[1] "Emp_ID"     "Name"       "Department" "Age"        "Salary"

💻 Example 4: Rename One Column

names(employee)[5] <- "Monthly_Salary"

employee

Output

Emp_ID  Name   Department  Age  Monthly_Salary
101     Amit   HR          25   30000
102     Priya  Sales       28   35000
...     ...    ...         ...  ...
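Renaming by position, as in names(employee)[5], silently renames the wrong column if the column order ever changes. A minimal base R sketch of renaming by the current name instead is shown below; the three-row data frame is a shortened, illustrative version of the sample dataset.

```r
# A small data frame using the original short column names.
employee <- data.frame(
  ID = c(101, 102, 103),
  EmpName = c("Amit", "Priya", "Rahul"),
  Sal = c(30000, 35000, 50000)
)

# Find the column by its current name rather than by position.
names(employee)[names(employee) == "Sal"] <- "Monthly_Salary"

# If the old name is not present, nothing changes, so check the result.
stopifnot("Monthly_Salary" %in% names(employee))

names(employee)
```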

📖 2.24 Renaming Rows

Rows can also have names.


Syntax

rownames(dataframe)

💻 Example 5: Display Row Names

rownames(employee)

Output

[1] "1"  "2"  "3"  ...  "10"

💻 Example 6: Rename Rows

rownames(employee) <-

paste("Employee",

1:10,

sep="_")

employee

Output

Employee_1

Employee_2

Employee_3

...

Employee_10

📖 2.25 Renaming Using rename() from dplyr

The dplyr package provides the rename() function.


Install Package

install.packages("dplyr")

Load Package

library(dplyr)

Syntax

rename(data,

NewName = OldName)

💻 Example 7

library(dplyr)

employee <-

rename(employee,

Salary=Monthly_Salary)

employee

Output

The column Monthly_Salary is renamed to Salary.


💻 Example 8: Rename Multiple Columns

library(dplyr)

employee <-

rename(

employee,

Employee_ID=Emp_ID,

Employee_Name=Name
)

Output

Employee_ID  Employee_Name  Department  Age  Salary
101          Amit           HR          25   30000
102          Priya          Sales       28   35000
...          ...            ...         ...  ...

💻 Example 9: Verify Structure

str(employee)

Output

'data.frame': 10 obs. of 5 variables:

(str() then prints one line per column, giving its name, type, and first values.)

💻 Example 10: Display Dataset

head(employee)

Output

Employee_ID  Employee_Name  Department  Age  Salary
101          Amit           HR          25   30000
102          Priya          Sales       28   35000
103          Rahul          IT          30   50000
104          Sneha          HR          27   32000
105          Karan          IT          35   60000
106          Neha           Finance     31   55000

📊 Comparison of Renaming Functions

Function      Purpose
colnames()    Rename all columns
names()       Rename one or more columns
rownames()    Rename rows
rename()      Rename selected columns using dplyr

📊 Advantages of Meaningful Column Names

Poor Name    Better Name
M1           Marks
Dept         Department
Sal          Salary
Emp          Employee_Name
ID           Employee_ID

🌍 Real-Life Applications

  • Employee management systems
  • Student databases
  • Banking records
  • Hospital patient databases
  • Inventory management
  • Sales reporting
  • Data visualization
  • Machine learning preprocessing

✔ Advantages

  • Improves readability.
  • Makes code easier to understand.
  • Helps create professional reports.
  • Simplifies data manipulation.
  • Enhances collaboration among team members.

✖ Limitations

  • Renaming columns incorrectly may break existing code.
  • Duplicate column names should be avoided.
  • Frequent renaming may reduce code consistency.
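Duplicate column names can be checked for, and repaired, in base R. The sketch below creates a clash on purpose (check.names = FALSE is needed only to allow that) and then uses anyDuplicated() and make.unique(). The data frame df is a made-up example.

```r
# check.names = FALSE lets this example create a clashing name on purpose.
df <- data.frame(Name = c("Amit", "Priya"), Name = c("HR", "Sales"),
                 check.names = FALSE)

# A non-zero result means at least one column name is repeated.
anyDuplicated(names(df))

# make.unique() appends .1, .2, ... to repeated names.
names(df) <- make.unique(names(df))
names(df)
```

After such a repair, the generated names are still worth replacing with meaningful ones.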

📝 Lab Exercises

  1. Create a dataset containing 10 employee records.
  2. Rename all column names using colnames().
  3. Display column names.
  4. Rename one column using names().
  5. Display row names.
  6. Rename all row names.
  7. Install and load the dplyr package.
  8. Rename one column using rename().
  9. Rename two columns simultaneously.
  10. Display the structure of the renamed dataset.

❓ Viva Questions

  1. What is the purpose of renaming columns?
  2. Which function changes column names?
  3. Which function changes row names?
  4. What is the difference between colnames() and names()?
  5. Which package contains rename()?
  6. How do you rename multiple columns?
  7. Why are meaningful column names important?
  8. Can row names be customized?
  9. What is the syntax of rename()?
  10. Give two real-life applications of renaming data.

📚 Class Summary

In this class, you learned:

  • The importance of meaningful column and row names.
  • Renaming columns using colnames() and names().
  • Renaming rows using rownames().
  • Using rename() from the dplyr package.
  • Practical examples with outputs.
  • Comparison tables, real-world applications, lab exercises, and viva questions.

Class 7: Data Transformation – select(), filter(), and arrange()

Duration: 1 Class

🎯 Learning Objectives

After completing this lesson, students will be able to:

  • Select specific columns from a dataset.

  • Filter rows based on conditions.

  • Sort data in ascending and descending order.

  • Combine multiple transformation operations.

  • Use dplyr functions for efficient data analysis.

📖 2.26 Introduction to Data Transformation

Data Transformation means modifying, selecting, filtering, or arranging data into a form suitable for analysis.

R provides powerful transformation functions through the dplyr package.

Install and Load dplyr

install.packages("dplyr")

library(dplyr)

📊 Sample Dataset (10 Records)
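The outputs in this class match the ten employee records used in Class 8, so a minimal setup, assuming that same dataset, could look like this:

```r
library(dplyr)

# Assumed dataset: the ten employee records used in Class 8.
employee <- data.frame(
  Emp_ID = c(101, 102, 103, 104, 105, 106, 107, 108, 109, 110),
  Name = c("Amit", "Priya", "Rahul", "Sneha", "Karan",
           "Neha", "Arjun", "Pooja", "Rohan", "Anjali"),
  Department = c("HR", "Sales", "IT", "HR", "IT",
                 "Finance", "Sales", "Finance", "IT", "HR"),
  Age = c(25, 28, 30, 27, 35, 31, 29, 33, 26, 32),
  Salary = c(30000, 35000, 50000, 32000, 60000,
             55000, 40000, 58000, 45000, 52000)
)

employee
```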

🔵 2.27 Selecting Columns with select()

Definition

The select() function chooses specific columns from a dataset.

Syntax

select(dataframe, Column1, Column2, ...)

💻 Example 1: Select Name and Salary
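A minimal sketch of this example, assuming the ten-record employee data frame from Class 8 (the variable name name_salary is illustrative):

```r
library(dplyr)

# Assumed dataset: the ten employee records used in Class 8.
employee <- data.frame(
  Emp_ID = c(101, 102, 103, 104, 105, 106, 107, 108, 109, 110),
  Name = c("Amit", "Priya", "Rahul", "Sneha", "Karan",
           "Neha", "Arjun", "Pooja", "Rohan", "Anjali"),
  Department = c("HR", "Sales", "IT", "HR", "IT",
                 "Finance", "Sales", "Finance", "IT", "HR"),
  Age = c(25, 28, 30, 27, 35, 31, 29, 33, 26, 32),
  Salary = c(30000, 35000, 50000, 32000, 60000,
             55000, 40000, 58000, 45000, 52000)
)

# Keep only the Name and Salary columns; all rows are retained.
name_salary <- employee %>% select(Name, Salary)

name_salary
```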

Output

Name     Salary
Amit     30000
Priya    35000
Rahul    50000
...      ...

💻 Example 2: Select Multiple Columns

💻 Example 3: Exclude a Column
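Examples 2 and 3 can be sketched as follows, again assuming the Class 8 employee records. Which columns the original Example 2 selected is not stated, so Emp_ID, Name, and Department are an illustrative choice; Age is excluded in Example 3 to match the lab exercise.

```r
library(dplyr)

# Assumed dataset: the ten employee records used in Class 8.
employee <- data.frame(
  Emp_ID = c(101, 102, 103, 104, 105, 106, 107, 108, 109, 110),
  Name = c("Amit", "Priya", "Rahul", "Sneha", "Karan",
           "Neha", "Arjun", "Pooja", "Rohan", "Anjali"),
  Department = c("HR", "Sales", "IT", "HR", "IT",
                 "Finance", "Sales", "Finance", "IT", "HR"),
  Age = c(25, 28, 30, 27, 35, 31, 29, 33, 26, 32),
  Salary = c(30000, 35000, 50000, 32000, 60000,
             55000, 40000, 58000, 45000, 52000)
)

# Example 2: keep several columns, in the order listed.
three_cols <- employee %>% select(Emp_ID, Name, Department)

# Example 3: a minus sign drops a column and keeps the rest.
no_age <- employee %>% select(-Age)

names(three_cols)
names(no_age)
```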

🟢 2.28 Filtering Rows with filter()

Definition

The filter() function selects rows that satisfy specified conditions.

Syntax

filter(dataframe, Condition)

💻 Example 4: Employees from IT Department
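A minimal sketch of this example, assuming the Class 8 employee records. The select() call is added only so that the printed result shows the same two columns as the output below.

```r
library(dplyr)

# Assumed dataset: the ten employee records used in Class 8.
employee <- data.frame(
  Emp_ID = c(101, 102, 103, 104, 105, 106, 107, 108, 109, 110),
  Name = c("Amit", "Priya", "Rahul", "Sneha", "Karan",
           "Neha", "Arjun", "Pooja", "Rohan", "Anjali"),
  Department = c("HR", "Sales", "IT", "HR", "IT",
                 "Finance", "Sales", "Finance", "IT", "HR"),
  Age = c(25, 28, 30, 27, 35, 31, 29, 33, 26, 32),
  Salary = c(30000, 35000, 50000, 32000, 60000,
             55000, 40000, 58000, 45000, 52000)
)

# Keep only the rows whose Department is exactly "IT".
it_staff <- employee %>% filter(Department == "IT")

it_staff %>% select(Name, Department)
```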

Output

Name     Department
Rahul    IT
Karan    IT
Rohan    IT

💻 Example 5: Salary Greater Than 50,000

💻 Example 6: Multiple Conditions (AND)

💻 Example 7: Multiple Conditions (OR)
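Examples 5 to 7 can be sketched as below, assuming the Class 8 employee records. The exact conditions used in Examples 6 and 7 are not stated, so the ones here (IT with salary above 45,000; HR or Sales) are illustrative.

```r
library(dplyr)

# Assumed dataset: the ten employee records used in Class 8.
employee <- data.frame(
  Emp_ID = c(101, 102, 103, 104, 105, 106, 107, 108, 109, 110),
  Name = c("Amit", "Priya", "Rahul", "Sneha", "Karan",
           "Neha", "Arjun", "Pooja", "Rohan", "Anjali"),
  Department = c("HR", "Sales", "IT", "HR", "IT",
                 "Finance", "Sales", "Finance", "IT", "HR"),
  Age = c(25, 28, 30, 27, 35, 31, 29, 33, 26, 32),
  Salary = c(30000, 35000, 50000, 32000, 60000,
             55000, 40000, 58000, 45000, 52000)
)

# Example 5: salary strictly greater than 50,000.
high_paid <- employee %>% filter(Salary > 50000)

# Example 6 (AND): conditions separated by a comma must all hold;
# writing Department == "IT" & Salary > 45000 is equivalent.
it_high <- employee %>% filter(Department == "IT", Salary > 45000)

# Example 7 (OR): the | operator keeps rows matching either condition.
hr_or_sales <- employee %>% filter(Department == "HR" | Department == "Sales")

high_paid$Name
it_high$Name
hr_or_sales$Name
```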

🟣 2.29 Arranging Data with arrange()

Definition

The arrange() function sorts rows based on one or more columns.

Syntax

arrange(dataframe, Column)

💻 Example 8: Sort by Salary (Ascending)
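A minimal sketch of this example, assuming the Class 8 employee records; select() is added so the result shows only the two columns listed in the output below.

```r
library(dplyr)

# Assumed dataset: the ten employee records used in Class 8.
employee <- data.frame(
  Emp_ID = c(101, 102, 103, 104, 105, 106, 107, 108, 109, 110),
  Name = c("Amit", "Priya", "Rahul", "Sneha", "Karan",
           "Neha", "Arjun", "Pooja", "Rohan", "Anjali"),
  Department = c("HR", "Sales", "IT", "HR", "IT",
                 "Finance", "Sales", "Finance", "IT", "HR"),
  Age = c(25, 28, 30, 27, 35, 31, 29, 33, 26, 32),
  Salary = c(30000, 35000, 50000, 32000, 60000,
             55000, 40000, 58000, 45000, 52000)
)

# arrange() sorts in ascending order by default.
by_salary <- employee %>%
  arrange(Salary) %>%
  select(Name, Salary)

by_salary
```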

Output

Name     Salary
Amit     30000
Sneha    32000
Priya    35000
...      ...

💻 Example 9: Sort by Salary (Descending)
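A minimal sketch of this example, assuming the Class 8 employee records. Wrapping a column in desc() reverses the sort order.

```r
library(dplyr)

# Assumed dataset: the ten employee records used in Class 8.
employee <- data.frame(
  Emp_ID = c(101, 102, 103, 104, 105, 106, 107, 108, 109, 110),
  Name = c("Amit", "Priya", "Rahul", "Sneha", "Karan",
           "Neha", "Arjun", "Pooja", "Rohan", "Anjali"),
  Department = c("HR", "Sales", "IT", "HR", "IT",
                 "Finance", "Sales", "Finance", "IT", "HR"),
  Age = c(25, 28, 30, 27, 35, 31, 29, 33, 26, 32),
  Salary = c(30000, 35000, 50000, 32000, 60000,
             55000, 40000, 58000, 45000, 52000)
)

# desc() sorts from the highest salary to the lowest.
by_salary_desc <- employee %>%
  arrange(desc(Salary)) %>%
  select(Name, Salary)

by_salary_desc
```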

Output

Name     Salary
Karan    60000
Pooja    58000
Neha     55000
...      ...

💻 Example 10: Sort by Department and Salary
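A minimal sketch of this example, assuming the Class 8 employee records. Rows are sorted by Department first; Salary breaks ties within each department. Ascending order for both columns is an assumption.

```r
library(dplyr)

# Assumed dataset: the ten employee records used in Class 8.
employee <- data.frame(
  Emp_ID = c(101, 102, 103, 104, 105, 106, 107, 108, 109, 110),
  Name = c("Amit", "Priya", "Rahul", "Sneha", "Karan",
           "Neha", "Arjun", "Pooja", "Rohan", "Anjali"),
  Department = c("HR", "Sales", "IT", "HR", "IT",
                 "Finance", "Sales", "Finance", "IT", "HR"),
  Age = c(25, 28, 30, 27, 35, 31, 29, 33, 26, 32),
  Salary = c(30000, 35000, 50000, 32000, 60000,
             55000, 40000, 58000, 45000, 52000)
)

# Sort by Department, then by Salary within each department.
by_dept_salary <- employee %>% arrange(Department, Salary)

by_dept_salary %>% select(Name, Department, Salary)
```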

📊 Combining Functions

Example: IT Employees Sorted by Salary
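The three functions can be chained with the pipe operator %>%. A minimal sketch, assuming the Class 8 employee records and a descending sort (which matches the order of the output below):

```r
library(dplyr)

# Assumed dataset: the ten employee records used in Class 8.
employee <- data.frame(
  Emp_ID = c(101, 102, 103, 104, 105, 106, 107, 108, 109, 110),
  Name = c("Amit", "Priya", "Rahul", "Sneha", "Karan",
           "Neha", "Arjun", "Pooja", "Rohan", "Anjali"),
  Department = c("HR", "Sales", "IT", "HR", "IT",
                 "Finance", "Sales", "Finance", "IT", "HR"),
  Age = c(25, 28, 30, 27, 35, 31, 29, 33, 26, 32),
  Salary = c(30000, 35000, 50000, 32000, 60000,
             55000, 40000, 58000, 45000, 52000)
)

# Each step receives the result of the previous one.
it_sorted <- employee %>%
  filter(Department == "IT") %>%   # keep IT rows
  arrange(desc(Salary)) %>%        # highest salary first
  select(Name, Salary)             # keep two columns

it_sorted
```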

Output

Name     Salary
Karan    60000
Rahul    50000
Rohan    45000

📊 Comparison of Functions

Function     Purpose
select()     Choose columns
filter()     Choose rows
arrange()    Sort rows

🌍 Real-Life Applications

  • Selecting important columns from large databases.

  • Filtering customers with high purchases.

  • Sorting employees by salary.

  • Analyzing sales by region.

  • Preparing data for machine learning.

  • Generating management reports.

✔ Advantages

  • Simple and readable syntax.

  • Fast processing.

  • Works well with large datasets.

  • Easy to combine multiple operations.

  • Widely used in data science projects.

✖ Limitations

  • Requires the dplyr package.

  • Very large datasets may require additional optimization.

  • Incorrect conditions may produce unexpected results.

📝 Lab Exercises

  • Select only Name and Salary columns.

  • Exclude the Age column.

  • Filter employees from the Sales department.

  • Filter employees with salary greater than 40,000.

  • Filter employees from IT with salary greater than 45,000.

  • Sort employees by Age ascending.

  • Sort employees by Salary descending.

  • Sort employees by Department and Salary.

  • Display only IT employees sorted by salary.

  • Combine select(), filter(), and arrange() in one program.

❓ Viva Questions

  • What is data transformation?

  • What is the purpose of select()?

  • What is the purpose of filter()?

  • What is the purpose of arrange()?

  • How do you sort data in descending order?

  • How do you apply multiple conditions in filter()?

  • What does desc() do?

  • Can select() exclude columns?

  • What is the pipe operator %>%?

  • Give two real-life applications of data transformation.

📚 Class Summary

In this class, you learned:

  • select() for choosing columns.

  • filter() for selecting rows.

  • arrange() for sorting data.

  • Using multiple conditions.

  • Combining transformation functions with the pipe operator.

  • Practical examples with outputs.

  • Real-world applications, exercises, and viva questions.


Class 8: Data Transformation using mutate() and transmute()

Duration: 1 Class


🎯 Learning Objectives

After completing this lesson, students will be able to:

  • Understand the purpose of mutate() and transmute().
  • Create new variables in a dataset.
  • Modify existing variables.
  • Perform arithmetic operations on columns.
  • Calculate bonus, tax, gross salary, and net salary.
  • Understand the difference between mutate() and transmute().

📖 2.30 Introduction to mutate()

Definition

The mutate() function from the dplyr package is used to create new columns or modify existing columns in a data frame.

It is one of the most frequently used functions in data analysis and machine learning.


Install and Load Package

install.packages("dplyr")

library(dplyr)

📊 Sample Dataset (10 Records)

employee <- data.frame(

Emp_ID=c(101,102,103,104,105,106,107,108,109,110),

Name=c("Amit","Priya","Rahul","Sneha","Karan",
"Neha","Arjun","Pooja","Rohan","Anjali"),

Department=c("HR","Sales","IT","HR","IT",
"Finance","Sales","Finance","IT","HR"),

Age=c(25,28,30,27,35,31,29,33,26,32),

Salary=c(30000,35000,50000,32000,60000,
55000,40000,58000,45000,52000)
)

employee

Output

Emp_ID  Name    Department  Age  Salary
101     Amit    HR          25   30000
102     Priya   Sales       28   35000
103     Rahul   IT          30   50000
104     Sneha   HR          27   32000
105     Karan   IT          35   60000
106     Neha    Finance     31   55000
107     Arjun   Sales       29   40000
108     Pooja   Finance     33   58000
109     Rohan   IT          26   45000
110     Anjali  HR          32   52000

📖 2.31 Creating New Columns with mutate()

Syntax

mutate(dataframe,
NewColumn = Expression)

💻 Example 1: Calculate 10% Bonus

library(dplyr)

employee_bonus <- employee %>%
mutate(Bonus = Salary * 0.10)

employee_bonus

Output

Name    Salary  Bonus
Amit    30000   3000
Priya   35000   3500
Rahul   50000   5000
Sneha   32000   3200
Karan   60000   6000
Neha    55000   5500
Arjun   40000   4000
Pooja   58000   5800
Rohan   45000   4500
Anjali  52000   5200

💻 Example 2: Calculate Gross Salary

employee_gross <- employee %>%
mutate(Gross_Salary = Salary + (Salary * 0.10))

employee_gross

Output

Name    Salary  Gross_Salary
Amit    30000   33000
Priya   35000   38500
Rahul   50000   55000
Sneha   32000   35200
Karan   60000   66000
Neha    55000   60500
Arjun   40000   44000
Pooja   58000   63800
Rohan   45000   49500
Anjali  52000   57200

💻 Example 3: Calculate 5% Income Tax

employee_tax <- employee %>%
mutate(Tax = Salary * 0.05)

employee_tax

Output

Name    Salary  Tax
Amit    30000   1500
Priya   35000   1750
Rahul   50000   2500
Sneha   32000   1600
Karan   60000   3000
Neha    55000   2750
Arjun   40000   2000
Pooja   58000   2900
Rohan   45000   2250
Anjali  52000   2600

💻 Example 4: Calculate Net Salary

employee_net <- employee %>%
mutate(

Bonus = Salary*0.10,

Tax = Salary*0.05,

Net_Salary = Salary + Bonus - Tax

)

employee_net

Output

Name    Salary  Bonus  Tax   Net_Salary
Amit    30000   3000   1500  31500
Priya   35000   3500   1750  36750
Rahul   50000   5000   2500  52500
Sneha   32000   3200   1600  33600
Karan   60000   6000   3000  63000
Neha    55000   5500   2750  57750
Arjun   40000   4000   2000  42000
Pooja   58000   5800   2900  60900
Rohan   45000   4500   2250  47250
Anjali  52000   5200   2600  54600

💻 Example 5: Increase Salary by ₹5,000

employee %>%
mutate(Salary = Salary + 5000)

Output

Each employee's salary increases by ₹5,000.
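Example 5 prints the increased salaries but does not store them, because mutate() returns a modified copy and leaves the original data frame unchanged. The sketch below illustrates this with a hypothetical two-row data frame; employee_raised is an illustrative name.

```r
library(dplyr)

# Hypothetical two-row data frame, to keep the example short.
employee <- data.frame(Name = c("Amit", "Priya"), Salary = c(30000, 35000))

# Without assignment, mutate() only returns (and prints) a modified copy.
employee %>% mutate(Salary = Salary + 5000)
employee$Salary    # the original values are unchanged

# Assign the result to keep the change.
employee_raised <- employee %>% mutate(Salary = Salary + 5000)
employee_raised$Salary
```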


📖 2.32 The transmute() Function

Definition

The transmute() function creates new columns and returns only the columns named in the call.

Unlike mutate(), all other original columns are dropped. An existing column such as Name is kept only if it is listed explicitly.


Syntax

transmute(dataframe,
NewColumn = Expression)

💻 Example 6: Display Bonus Only

employee %>%
transmute(Name,
Bonus = Salary*0.10)

Output

Name    Bonus
Amit    3000
Priya   3500
Rahul   5000
Sneha   3200
Karan   6000
Neha    5500
Arjun   4000
Pooja   5800
Rohan   4500
Anjali  5200

💻 Example 7: Gross Salary Only

employee %>%
transmute(Name,
Gross = Salary*1.10)

Output

Displays only Name and Gross Salary.


💻 Example 8: Age After Five Years

employee %>%
mutate(Age_After_5_Years = Age + 5)

Output

Name    Age  Age_After_5_Years
Amit    25   30
Priya   28   33
Rahul   30   35
Sneha   27   32
Karan   35   40
Neha    31   36
Arjun   29   34
Pooja   33   38
Rohan   26   31
Anjali  32   37

💻 Example 9: Annual Salary

employee %>%
mutate(Annual_Salary = Salary * 12)

Output

Name    Salary  Annual_Salary
Amit    30000   360000
Priya   35000   420000
Rahul   50000   600000
Sneha   32000   384000
Karan   60000   720000
Neha    55000   660000
Arjun   40000   480000
Pooja   58000   696000
Rohan   45000   540000
Anjali  52000   624000

💻 Example 10: Employee Category

employee %>%
mutate(Category = ifelse(Salary >= 50000,
"High Salary",
"Normal Salary"))

Output

Name    Salary  Category
Amit    30000   Normal Salary
Priya   35000   Normal Salary
Rahul   50000   High Salary
Sneha   32000   Normal Salary
Karan   60000   High Salary
Neha    55000   High Salary
Arjun   40000   Normal Salary
Pooja   58000   High Salary
Rohan   45000   Normal Salary
Anjali  52000   High Salary
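ifelse() handles two categories. For three or more, dplyr's case_when() checks conditions in order and uses the first one that matches. The sketch below uses a hypothetical three-row data frame, and the 50,000 and 35,000 thresholds are assumptions chosen for illustration.

```r
library(dplyr)

# Hypothetical three-row data frame, one employee per category.
employee <- data.frame(
  Name = c("Amit", "Arjun", "Karan"),
  Salary = c(30000, 40000, 60000)
)

# Conditions are tested top to bottom; TRUE acts as the catch-all case.
categorised <- employee %>%
  mutate(Category = case_when(
    Salary >= 50000 ~ "High Salary",
    Salary >= 35000 ~ "Medium Salary",
    TRUE            ~ "Low Salary"
  ))

categorised
```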

📊 Comparison of mutate() and transmute()

Feature                      mutate()    transmute()
Keeps Original Columns       ✅ Yes      ❌ No
Creates New Columns          ✅ Yes      ✅ Yes
Modifies Existing Columns    ✅ Yes      ✅ Yes
Returns Only New Columns     ❌ No       ✅ Yes

🌍 Real-Life Applications

  • Employee payroll systems
  • Student result processing
  • Banking interest calculation
  • GST and tax calculation
  • Insurance premium calculation
  • Sales commission reports
  • Financial reporting
  • Business analytics

📝 Lab Exercises

  1. Calculate a 15% bonus for each employee.
  2. Create a Gross Salary column.
  3. Create a Net Salary column after deducting 8% tax.
  4. Calculate annual salary.
  5. Increase every salary by ₹2,000.
  6. Create a category column (High Salary, Medium Salary, Low Salary).
  7. Display only Name and Bonus using transmute().
  8. Calculate age after 10 years.
  9. Create a PF deduction column (12% of salary).
  10. Calculate Take Home Salary = Salary + Bonus − Tax − PF.

❓ Viva Questions

  1. What is the purpose of mutate()?
  2. What is the difference between mutate() and transmute()?
  3. Can mutate() modify existing columns?
  4. Which package contains mutate()?
  5. Which function returns only new columns?
  6. How do you create a new column in R?
  7. What is the use of ifelse() inside mutate()?
  8. How do you calculate annual salary?
  9. What are the advantages of mutate()?
  10. Give two real-life applications of transmute().

📚 Class Summary

In this class, you learned:

  • Creating new variables with mutate().
  • Modifying existing variables.
  • Using transmute() to return only selected transformed columns.
  • Calculating bonus, tax, gross salary, net salary, annual salary, and employee categories.
  • Practical R programs with outputs.
  • Real-world applications, lab exercises, and viva questions.

Class 9: Data Transformation using summarise() and group_by()

Duration: 1 Class


🎯 Learning Objectives

After completing this lesson, students will be able to:

  • Understand the purpose of summarise() and group_by().
  • Calculate statistical summaries of datasets.
  • Group data based on one or more columns.
  • Generate department-wise reports.
  • Perform grouped statistical analysis.
  • Apply summary functions in real-world business scenarios.

📖 2.33 Introduction to summarise()

Definition

The summarise() (or summarize()) function from the dplyr package is used to calculate summary statistics for a dataset. It reduces multiple rows into a single summary.

Common statistics include:

  • Mean
  • Sum
  • Minimum
  • Maximum
  • Count
  • Standard Deviation
  • Variance

Install and Load Package

install.packages("dplyr")
library(dplyr)

📊 Sample Dataset (10 Records)

employee <- data.frame(

Emp_ID=c(101,102,103,104,105,106,107,108,109,110),

Name=c("Amit","Priya","Rahul","Sneha","Karan",
"Neha","Arjun","Pooja","Rohan","Anjali"),

Department=c("HR","Sales","IT","HR","IT",
"Finance","Sales","Finance","IT","HR"),

Age=c(25,28,30,27,35,31,29,33,26,32),

Salary=c(30000,35000,50000,32000,60000,
55000,40000,58000,45000,52000)
)

employee

Output

Emp_ID  Name    Department  Age  Salary
101     Amit    HR          25   30000
102     Priya   Sales       28   35000
103     Rahul   IT          30   50000
104     Sneha   HR          27   32000
105     Karan   IT          35   60000
106     Neha    Finance     31   55000
107     Arjun   Sales       29   40000
108     Pooja   Finance     33   58000
109     Rohan   IT          26   45000
110     Anjali  HR          32   52000

📖 2.34 Using summarise()

Syntax

summarise(dataframe,
NewColumn = function(column))

💻 Example 1: Calculate Average Salary

library(dplyr)

employee %>%
summarise(
Average_Salary = mean(Salary)
)

Output

Average_Salary
45700

💻 Example 2: Total Salary

employee %>%
summarise(
Total_Salary = sum(Salary)
)

Output

Total_Salary
457000

💻 Example 3: Minimum and Maximum Salary

employee %>%
summarise(

Minimum = min(Salary),

Maximum = max(Salary)

)

Output

Minimum  Maximum
30000    60000

💻 Example 4: Count Employees

employee %>%
summarise(
Total_Employees = n()
)

Output

Total_Employees
10

💻 Example 5: Standard Deviation

employee %>%
summarise(
Standard_Deviation = sd(Salary)
)

Output

Standard_Deviation
10965.1 (approx.)

📖 2.35 Using group_by()

Definition

The group_by() function divides a dataset into groups. When used with summarise(), it calculates statistics for each group separately.


Syntax

group_by(dataframe, Column_Name)

💻 Example 6: Average Salary by Department

employee %>%
group_by(Department) %>%
summarise(
Average_Salary = mean(Salary)
)

Output

Department  Average_Salary
Finance     56500
HR          38000
IT          51667
Sales       37500

💻 Example 7: Total Salary by Department

employee %>%
group_by(Department) %>%
summarise(
Total_Salary = sum(Salary)
)

Output

Department  Total_Salary
Finance     113000
HR          114000
IT          155000
Sales       75000

💻 Example 8: Employee Count by Department

employee %>%
group_by(Department) %>%
summarise(
Employees = n()
)

Output

Department  Employees
Finance     2
HR          3
IT          3
Sales       2

💻 Example 9: Department-wise Minimum and Maximum Salary

employee %>%
group_by(Department) %>%
summarise(

Minimum = min(Salary),

Maximum = max(Salary)

)

Output

Department  Minimum  Maximum
Finance     55000    58000
HR          30000    52000
IT          45000    60000
Sales       35000    40000

💻 Example 10: Multiple Summary Statistics

employee %>%
group_by(Department) %>%
summarise(

Average_Age = mean(Age),

Average_Salary = mean(Salary),

Highest_Salary = max(Salary),

Lowest_Salary = min(Salary),

Employees = n()

)

Output

Department  Average_Age  Average_Salary  Highest_Salary  Lowest_Salary  Employees
Finance     32.0         56500           58000           55000          2
HR          28.0         38000           52000           30000          3
IT          30.3         51667           60000           45000          3
Sales       28.5         37500           40000           35000          2

📊 Common Summary Functions

Function    Purpose
mean()      Average
sum()       Total
min()       Minimum
max()       Maximum
n()         Count
sd()        Standard Deviation
var()       Variance
median()    Median

📊 Comparison of Functions

Function       Purpose
summarise()    Creates summary statistics
group_by()     Groups data into categories
n()            Counts rows in each group
mean()         Calculates average
sum()          Calculates total

🌍 Real-Life Applications

  • Department-wise salary analysis.
  • Student performance reports by class.
  • Monthly sales summaries by region.
  • Customer purchase analysis.
  • Banking transaction summaries.
  • Hospital patient statistics.
  • Inventory reports.
  • Business intelligence dashboards.

✔ Advantages

  • Produces concise statistical summaries.
  • Supports grouped analysis.
  • Easy to combine with other dplyr functions.
  • Ideal for dashboards and reports.
  • Highly efficient for large datasets.

✖ Limitations

  • Requires correctly grouped data.
  • Missing values should be handled before summarizing.
  • Complex summaries may require additional functions.
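The effect of missing values on a grouped summary can be seen in a short sketch. The four-row staff data frame below is hypothetical; one HR salary is missing.

```r
library(dplyr)

# Hypothetical data: one salary is missing.
staff <- data.frame(
  Department = c("HR", "HR", "IT", "IT"),
  Salary = c(30000, NA, 50000, 60000)
)

# A single NA makes the mean of its whole group NA.
with_na <- staff %>%
  group_by(Department) %>%
  summarise(Average_Salary = mean(Salary))

# na.rm = TRUE skips missing values; counting them keeps the gap visible.
without_na <- staff %>%
  group_by(Department) %>%
  summarise(
    Average_Salary = mean(Salary, na.rm = TRUE),
    Missing = sum(is.na(Salary))
  )

with_na
without_na
```

Reporting the number of missing values alongside the average shows how many records each figure is actually based on.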

📝 Lab Exercises

  1. Calculate the average salary of all employees.
  2. Find the total salary paid.
  3. Count the total number of employees.
  4. Find the highest and lowest salary.
  5. Calculate the standard deviation of salaries.
  6. Find the average salary for each department.
  7. Count employees in each department.
  8. Calculate total salary by department.
  9. Find the minimum and maximum salary for each department.
  10. Create a department-wise summary showing average age, average salary, highest salary, lowest salary, and employee count.

❓ Viva Questions

  1. What is the purpose of summarise()?
  2. What is the purpose of group_by()?
  3. Which function counts the number of rows?
  4. How do you calculate the average salary?
  5. What is the difference between summarise() and group_by()?
  6. Can summarise() be used without group_by()?
  7. Which function calculates standard deviation?
  8. What is the purpose of n()?
  9. Why is grouped analysis important?
  10. Give two real-life applications of group_by().

Class 10 (Final): Complete Data Cleaning and Data Transformation Case Study

Duration: 1 Class


🎯 Learning Objectives

After completing this lesson, students will be able to:

  • Import data from a CSV file.
  • Explore the dataset.
  • Handle missing values.
  • Remove duplicate records.
  • Rename columns.
  • Transform data using dplyr.
  • Generate summary reports.
  • Export the processed dataset.
  • Apply the complete data analysis workflow in R.

📖 2.36 Complete Data Analysis Workflow

A typical data analysis project follows these steps:

Raw Data → Import Data → Explore Dataset → Clean Data → Transform Data → Summarize Data → Export Results

📊 Case Study: Employee Salary Analysis

Suppose a company provides the following employee dataset.

Sample Dataset (10 Records)

employee <- data.frame(

Emp_ID = c(101,102,103,104,105,106,107,108,109,109),

Name = c("Amit","Priya","Rahul","Sneha","Karan",
"Neha","Arjun","Pooja","Rohan","Rohan"),

Department = c("HR","Sales","IT","HR","IT",
"Finance","Sales","Finance","IT","IT"),

Age = c(25,28,30,27,35,NA,29,33,26,26),

Salary = c(30000,35000,50000,32000,60000,
55000,40000,58000,45000,45000)

)

employee

Output

Emp_ID  Name   Department  Age  Salary
101     Amit   HR          25   30000
102     Priya  Sales       28   35000
103     Rahul  IT          30   50000
104     Sneha  HR          27   32000
105     Karan  IT          35   60000
106     Neha   Finance     NA   55000
107     Arjun  Sales       29   40000
108     Pooja  Finance     33   58000
109     Rohan  IT          26   45000
109     Rohan  IT          26   45000

Notice that:

  • One missing value exists in Age.
  • One duplicate employee record exists.

Step 1: Explore the Dataset

Program 1

str(employee)
summary(employee)

Output

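'data.frame': 10 obs. of 5 variables:

(str() then prints one line per column, giving its name, type, and first values.)

summary() prints one block per column (Emp_ID, Name, Department, Age, Salary). For numeric columns this includes the minimum, quartiles, mean, and maximum, and for Age it also reports the number of NA values.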

Step 2: Detect Missing Values

Program 2

sum(is.na(employee))

Output

[1] 1
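
A single total does not say which column holds the gap. The sketch below, which re-creates only a few of the employee rows so that it runs on its own, counts missing values per column and lists the affected rows:

```r
# Small stand-in for the employee data (illustrative subset)
employee <- data.frame(
  Emp_ID = c(101, 105, 106),
  Name   = c("Amit", "Karan", "Neha"),
  Age    = c(25, 35, NA),
  Salary = c(30000, 60000, 55000)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(employee))
print(na_per_column)

# Rows that contain at least one missing value
rows_with_na <- employee[!complete.cases(employee), ]
print(rows_with_na)
```

Here colSums() adds up the TRUE values produced by is.na() column by column, and complete.cases() marks rows that have no NA.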

Step 3: Replace Missing Age with Mean

Program 3

employee$Age[is.na(employee$Age)] <- mean(employee$Age, na.rm = TRUE)

employee

Output

The missing Age (Neha) is replaced by the mean of the nine known ages, about 28.78.
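
The role of na.rm = TRUE can be checked on the Age values alone. This is a small sketch using the same ten ages as the dataset. Note that the duplicated Rohan row has not been removed yet at this step, so his age is counted twice:

```r
ages <- c(25, 28, 30, 27, 35, NA, 29, 33, 26, 26)

mean_with_na <- mean(ages)                # NA: one missing value makes the result NA
mean_no_na   <- mean(ages, na.rm = TRUE)  # mean of the 9 known ages

# Same replacement idea as Program 3, applied to a plain vector
ages[is.na(ages)] <- mean_no_na

print(mean_with_na)
print(round(mean_no_na, 2))
print(anyNA(ages))   # FALSE once the gap is filled
```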

Step 4: Detect Duplicate Records

Program 4

duplicated(employee)

Output

[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

Step 5: Remove Duplicate Records

Program 5

employee <- employee[!duplicated(employee), ]

Output

Duplicate record removed.

Total Records = 9
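
Base R also offers unique(), which should give the same result for whole-row duplicates. The sketch below uses a three-row stand-in for the dataset and also shows a stricter rule (an assumption made here for illustration) that treats rows with the same Emp_ID as duplicates:

```r
# Illustrative subset containing one repeated row
employee <- data.frame(
  Emp_ID = c(108, 109, 109),
  Name   = c("Pooja", "Rohan", "Rohan"),
  Salary = c(58000, 45000, 45000)
)

rows_before <- nrow(employee)
employee_unique <- unique(employee)   # keeps the first copy of each row
rows_after <- nrow(employee_unique)

cat("Rows before:", rows_before, " Rows after:", rows_after, "\n")

# Hypothetical stricter rule: compare on the ID column only
employee_by_id <- employee[!duplicated(employee$Emp_ID), ]
print(employee_by_id)
```

Comparing row counts before and after is a quick way to confirm how many records were dropped.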

Step 6: Rename Columns

Program 6

colnames(employee) <- c("Employee_ID", "Employee_Name",
                        "Department", "Age", "Salary")

employee

Output

Columns renamed successfully.
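
Assigning a full vector to colnames() depends on the column order. When only some columns change, renaming by name is less error-prone. A minimal sketch on a two-row stand-in data frame:

```r
employee <- data.frame(
  Emp_ID = c(101, 102),
  Name   = c("Amit", "Priya"),
  Salary = c(30000, 35000)
)

# Rename individual columns by matching their current names
names(employee)[names(employee) == "Emp_ID"] <- "Employee_ID"
names(employee)[names(employee) == "Name"]   <- "Employee_Name"

print(names(employee))
```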

Step 7: Create Bonus Column

Program 7

library(dplyr)

employee <- employee %>%
  mutate(Bonus = Salary * 0.10)

employee

Output

Employee_Name | Salary | Bonus
Amit | 30000 | 3000
Priya | 35000 | 3500
Rahul | 50000 | 5000
Sneha | 32000 | 3200
Karan | 60000 | 6000
Neha | 55000 | 5500
Arjun | 40000 | 4000
Pooja | 58000 | 5800
Rohan | 45000 | 4500

Step 8: Create Gross Salary

Program 8

employee <- employee %>%
  mutate(Gross_Salary = Salary + Bonus)

employee

Output

Employee_Name | Gross_Salary
Amit | 33000
Priya | 38500
Rahul | 55000
Sneha | 35200
Karan | 66000
Neha | 60500
Arjun | 44000
Pooja | 63800
Rohan | 49500

Step 9: Department-wise Summary

Program 9

employee %>%
  group_by(Department) %>%
  summarise(
    Employees = n(),
    Average_Salary = mean(Salary),
    Highest = max(Salary),
    Lowest = min(Salary)
  )

Output

Department | Employees | Average_Salary | Highest | Lowest
Finance | 2 | 56500 | 58000 | 55000
HR | 2 | 31000 | 32000 | 30000
IT | 3 | 51667 | 60000 | 45000
Sales | 2 | 37500 | 40000 | 35000
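
The department averages can be cross-checked without dplyr. The sketch below uses base R's aggregate() on the nine de-duplicated Department and Salary values and rounds the result, which is where the IT figure of 51667 comes from:

```r
# Department and Salary for the nine unique employees
employee <- data.frame(
  Department = c("HR", "Sales", "IT", "HR", "IT",
                 "Finance", "Sales", "Finance", "IT"),
  Salary     = c(30000, 35000, 50000, 32000, 60000,
                 55000, 40000, 58000, 45000)
)

# Mean salary per department, rounded to whole units
avg_salary <- aggregate(Salary ~ Department, data = employee, FUN = mean)
avg_salary$Salary <- round(avg_salary$Salary)
print(avg_salary)
```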

Step 10: Export Processed Dataset

Program 10

write.csv(employee, "Employee_Report.csv", row.names = FALSE)

Output

Employee_Report.csv created successfully.
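
The import step of the workflow mirrors the export step. The following sketch writes a small stand-in data frame to a temporary file (so that no real file is overwritten) and reads it back with read.csv(). The file path and the two sample rows are assumptions made for illustration:

```r
employee <- data.frame(
  Employee_ID   = c(101, 102),
  Employee_Name = c("Amit", "Priya"),
  Salary        = c(30000, 35000)
)

# Export to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
write.csv(employee, csv_path, row.names = FALSE)

# Import the same file and inspect it
employee_in <- read.csv(csv_path)
str(employee_in)
head(employee_in)
```

With row.names = FALSE the file holds only the data columns, so the imported data frame has the same column names as the exported one.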

📊 Complete Workflow Summary

Step | Function
Import Data | read.csv()
Check Structure | str()
Summary | summary()
Missing Values | is.na()
Remove Missing | na.omit()
Replace Missing | mean()
Duplicate Detection | duplicated()
Remove Duplicates | !duplicated()
Rename Columns | colnames()
Create New Columns | mutate()
Group Data | group_by()
Statistical Summary | summarise()
Export Data | write.csv()

📊 Best Practices

✔ Keep a backup of the original dataset.

✔ Handle missing values before analysis.

✔ Remove duplicate records carefully.

✔ Use meaningful column names.

✔ Verify data types.

✔ Use group_by() for grouped analysis.

✔ Export the final cleaned dataset.

✔ Document every transformation step.


⚠ Common Errors and Solutions

Error | Cause | Solution
Object not found | Incorrect variable name | Check spelling
Missing package | Package not installed | install.packages()
NA values in mean | Missing values present | Use na.rm = TRUE
Duplicate records | Repeated data | Use duplicated()
Wrong column name | Typing mistake | Use colnames()

🌍 Real-Life Applications

  • Employee payroll processing.
  • Student examination systems.
  • Banking customer databases.
  • Hospital patient records.
  • Insurance claim processing.
  • Retail sales analysis.
  • Inventory management.
  • Government census data.
  • Customer relationship management (CRM).
  • Machine learning data preprocessing.

📝 Lab Programs

  1. Import a CSV file.
  2. Display the first 10 records.
  3. Check the structure of the dataset.
  4. Count missing values.
  5. Replace missing values with the mean.
  6. Detect duplicate records.
  7. Remove duplicate records.
  8. Rename all columns.
  9. Create a Bonus column.
  10. Calculate Gross Salary.
  11. Calculate Annual Salary.
  12. Group employees by department.
  13. Calculate average salary department-wise.
  14. Export the cleaned dataset.
  15. Create a complete employee report.

❓ Viva Questions

  1. What is data cleaning?
  2. What is data transformation?
  3. Which function imports a CSV file?
  4. How do you detect missing values?
  5. Which function removes duplicate records?
  6. What is the purpose of mutate()?
  7. What is group_by() used for?
  8. Which function exports data to CSV?
  9. Why is data cleaning important?
  10. What are the steps in a data analysis workflow?
  11. What is the difference between summarise() and mutate()?
  12. Why are meaningful column names important?
  13. What is the purpose of na.rm = TRUE?
  14. How do you calculate department-wise statistics?
  15. Give three real-life applications of data transformation.
  16. What is the use of duplicated()?
  17. How do you create a new variable in R?
  18. What is the difference between CSV and Excel files?
  19. Why should raw data be backed up before cleaning?
  20. Explain the complete data analysis process in R.

📚 Module 2 Summary

In this module, you learned:

  • Importing and exporting data using CSV and Excel files.
  • Handling missing values and duplicate records.
  • Converting data types.
  • Renaming rows and columns.
  • Selecting, filtering, and arranging data.
  • Creating and transforming variables with mutate() and transmute().
  • Summarizing data using summarise() and group_by().
  • Applying a complete data cleaning and transformation workflow using R.
  • Solving real-world data analysis problems with practical R programs and outputs.





Module 3: Data Visualization in R Programming



📘 CHAPTER 1: Introduction to Data Visualization in R


🌟 1.1 What is Data Visualization?

Data Visualization is the graphical representation of data using charts, graphs, and plots.
It helps to convert raw data into meaningful visual information.

🎯 Purpose:

  • To understand patterns in data
  • To identify trends and relationships
  • To detect outliers
  • To support decision making

📊 1.2 Importance of Data Visualization

  • Makes complex data easy to understand
  • Improves analysis speed
  • Helps in statistical interpretation
  • Useful in business intelligence
  • Enhances presentation quality

📈 1.3 Types of Data Visualizations in R

Type | Purpose
Scatter Plot | Relationship between variables
Line Plot | Trend analysis
Bar Chart | Category comparison
Histogram | Data distribution
Pie Chart | Percentage representation
Box Plot | Outlier detection

🟦 1.4 Base R Graphics

Base R provides built-in functions to create plots without installing additional packages.

🔧 Common Functions:

  • plot() → General plotting
  • barplot() → Bar chart
  • hist() → Histogram
  • pie() → Pie chart
  • boxplot() → Box plot

📍 1.5 Scatter Plot in Base R

🎯 Objective:

To show relationship between two variables.

💻 R Script:

# Scatter Plot Example

x <- c(10, 20, 30, 40, 50)
y <- c(15, 25, 35, 45, 60)

plot(x, y,
main = "Scatter Plot Example",
xlab = "X Values",
ylab = "Y Values",
col = "blue",
pch = 19,
cex = 1.5)

🖥️ Output:

  • A blue scatter plot
  • Points increasing diagonally
  • Title: Scatter Plot Example

📌 Interpretation:

There is a positive relationship between X and Y values.
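
The visual impression can be backed by a number. As a small sketch, cor() computes the Pearson correlation coefficient for the same x and y vectors. A value close to +1 indicates a strong positive linear relationship:

```r
x <- c(10, 20, 30, 40, 50)
y <- c(15, 25, 35, 45, 60)

# Pearson correlation between x and y
r <- cor(x, y)
print(round(r, 3))
```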


📉 1.6 Line Plot in Base R

💻 R Script:

# Line Plot Example

sales <- c(100, 120, 150, 180, 200)

plot(sales,
type = "l",
col = "red",
lwd = 3,
main = "Sales Growth Over Time",
xlab = "Time",
ylab = "Sales")

🖥️ Output:

  • Red line graph
  • Shows increasing trend

📌 Interpretation:

Sales are increasing steadily over time.


📊 1.7 Bar Plot in Base R

💻 R Script:

# Bar Plot Example

students <- c(30, 25, 40, 35)

barplot(students,
names.arg = c("A", "B", "C", "D"),
col = "green",
main = "Class Strength")

🖥️ Output:

  • Green vertical bars
  • Categories A, B, C, D

📊 1.8 Histogram in Base R

💻 R Script:

# Histogram Example

marks <- c(45, 50, 55, 60, 65, 70, 75, 80, 85)

hist(marks,
col = "skyblue",
main = "Marks Distribution",
xlab = "Marks")

🖥️ Output:

  • Blue histogram bars
  • Frequency distribution of marks
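
To see the counts behind the bars without drawing anything, hist() can be called with plot = FALSE. The sketch below assumes class intervals of width 10 from 40 to 90, a choice made here for illustration; by default hist() picks its own breaks:

```r
marks <- c(45, 50, 55, 60, 65, 70, 75, 80, 85)

# Compute the histogram only; no plot is drawn
h <- hist(marks, breaks = seq(40, 90, by = 10), plot = FALSE)

# One row per class interval (from, to] with its frequency
bins <- data.frame(
  from  = head(h$breaks, -1),
  to    = tail(h$breaks, -1),
  count = h$counts
)
print(bins)
```

Each interval includes its upper limit, so a mark of 50 is counted in the 40 to 50 class.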

🥧 1.9 Pie Chart in Base R

💻 R Script:

# Pie Chart Example

data <- c(20, 30, 25, 25)

pie(data,
labels = c("Food", "Rent", "Travel", "Savings"),
col = rainbow(4),
main = "Expense Distribution")

🖥️ Output:

  • Multicolor pie chart
  • Shows percentage distribution

📦 1.10 Box Plot in Base R

💻 R Script:

# Box Plot Example

marks <- c(40, 50, 55, 60, 65, 70, 75, 90)

boxplot(marks,
col = "orange",
main = "Marks Analysis")

🖥️ Output:

  • Orange box plot
  • Shows median and spread
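
The numbers that a box plot draws can be printed directly. A minimal sketch using fivenum() for the five-number summary and boxplot.stats() for any points that would be drawn as outliers (the element names are added here only for readability):

```r
marks <- c(40, 50, 55, 60, 65, 70, 75, 90)

# Minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(marks)
names(five) <- c("min", "lower_hinge", "median", "upper_hinge", "max")
print(five)

# Values beyond the whiskers; empty if there are none
outliers <- boxplot.stats(marks)$out
print(outliers)
```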

⚡ 1.11 Key Advantages of Base R Graphics

  • Easy to use
  • No installation required
  • Fast execution
  • Good for basic analysis

📌 1.12 Summary

  • Data visualization converts data into graphical form
  • Base R provides simple plotting tools
  • Common plots: scatter, line, bar, histogram, pie, box
  • Helps in understanding patterns and trends

❓ 1.13 Viva Questions

  1. What is data visualization?
  2. What is the use of plot() in R?
  3. What is a scatter plot?
  4. Difference between bar plot and histogram?
  5. What is the purpose of a box plot?
  6. What does col parameter do?
  7. What is the use of pch in scatter plot?


📘 CHAPTER 2: Advanced Data Visualization Using Base R Graphics + Introduction to ggplot2


🌟 2.1 Limitations of Base R Graphics

Although Base R graphics are useful, they have some limitations:

  • ❌ Limited customization
  • ❌ Not visually attractive for reports
  • ❌ Difficult to create complex plots
  • ❌ No grammar-based structure
  • ❌ Hard to build advanced dashboards

👉 To overcome these problems, we use ggplot2


🎨 2.2 Introduction to ggplot2

ggplot2 is a powerful visualization package in R based on the Grammar of Graphics.

📦 Install Package:

install.packages("ggplot2")

📥 Load Package:

library(ggplot2)

📚 2.3 Grammar of Graphics (Core Concept)

A plot in ggplot2 is built using layers:

🧩 Components:

Component | Meaning
Data | Dataset
Aesthetics (aes) | Mapping variables
Geom | Type of plot
Stats | Statistical transformation
Coord | Coordinate system
Theme | Visual appearance

📊 2.4 Basic ggplot Structure

ggplot(data, aes(x, y)) +
geom_function()

📌 2.5 Example Dataset

student <- data.frame(
Name = c("A", "B", "C", "D", "E"),
Marks = c(70, 85, 90, 60, 75),
Age = c(18, 19, 20, 18, 21)
)

📍 2.6 Scatter Plot (ggplot2)

library(ggplot2)

ggplot(student, aes(x = Age, y = Marks)) +
geom_point(color = "blue", size = 4) +
ggtitle("Age vs Marks Scatter Plot") +
xlab("Age") +
ylab("Marks")

🖥️ Output:

  • Blue circular points
  • Shows how Marks vary with Age across the five students

📉 2.7 Line Plot (ggplot2)

ggplot(student, aes(x = Age, y = Marks)) +
geom_line(color = "red", linewidth = 1.5) +  # linewidth replaces size for lines in ggplot2 3.4.0 and later
geom_point(color = "black", size = 3) +
ggtitle("Line Plot of Marks")

🖥️ Output:

  • Red line connecting points
  • Black dots on each value

📊 2.8 Bar Plot (ggplot2)

ggplot(student, aes(x = Name, y = Marks)) +
geom_bar(stat = "identity", fill = "green") +
ggtitle("Student Marks Bar Chart")

🖥️ Output:

  • Green vertical bars
  • Each student’s marks compared

📊 2.9 Histogram (ggplot2)

ggplot(student, aes(x = Marks)) +
geom_histogram(binwidth = 10,
fill = "skyblue",
color = "black") +
ggtitle("Marks Distribution")

🖥️ Output:

  • Histogram showing frequency of marks

📦 2.10 Box Plot (ggplot2)

ggplot(student, aes(y = Marks)) +
geom_boxplot(fill = "orange") +
ggtitle("Box Plot of Marks")

🖥️ Output:

  • Orange box showing the median and quartiles (this small dataset has no outliers)

🌈 2.11 Density Plot

ggplot(student, aes(x = Marks)) +
geom_density(fill = "pink", alpha = 0.5) +
ggtitle("Density Plot of Marks")

🖥️ Output:

  • Smooth curve showing distribution

🎨 2.12 Customizing ggplot2

🔹 Titles & Labels

ggplot(student, aes(Age, Marks)) +
geom_point() +
labs(title = "Student Performance",
x = "Age",
y = "Marks")

🔹 Themes

ggplot(student, aes(Age, Marks)) +
geom_point() +
theme_minimal()

Other Themes:

  • theme_bw()
  • theme_classic()
  • theme_dark()

🔹 Colors & Size

ggplot(student, aes(Age, Marks)) +
geom_point(color = "red", size = 4)

🔹 Scales

ggplot(student, aes(Age, Marks)) +
geom_point() +
scale_y_continuous(limits = c(50, 100))

🧩 2.13 Faceting (Multiple Plots)

student$Gender <- c("M", "F", "M", "F", "M")

ggplot(student, aes(Age, Marks)) +
geom_point() +
facet_wrap(~Gender)

🖥️ Output:

  • Separate plots for Male and Female

📊 2.14 Multiple Plot Layout

library(ggplot2)
library(gridExtra)  # run install.packages("gridExtra") first if it is not installed

p1 <- ggplot(student, aes(Age, Marks)) + geom_point()
p2 <- ggplot(student, aes(Name, Marks)) + geom_bar(stat="identity")

grid.arrange(p1, p2, ncol = 2)

📌 2.15 Summary

  • Base R is simple but limited
  • ggplot2 is powerful and flexible
  • Grammar of Graphics is core concept
  • Customization is easy in ggplot2
  • Faceting helps in multi-view analysis

❓ 2.16 Viva Questions

  1. What is ggplot2?
  2. What is Grammar of Graphics?
  3. Difference between base R and ggplot2?
  4. What is aes() in ggplot2?
  5. What is geom_point()?
  6. What is faceting?
  7. What is theme in ggplot2?
  8. What is density plot?

📘 CHAPTER 3: Interactive Data Visualization in R (Plotly & Shiny)


🌟 3.1 What is Interactive Visualization?

Interactive visualization allows users to:

  • 🔍 Zoom in/out of graphs
  • 🖱️ Hover to see values
  • 🎯 Click and explore data
  • 📊 Filter and analyze dynamically

👉 It makes data exploration more powerful than static graphs.


📦 3.2 Plotly in R

Plotly is used to create interactive charts in R.

📥 Install Plotly

install.packages("plotly")

📥 Load Library

library(plotly)

📊 3.3 Interactive Scatter Plot

library(plotly)

x <- c(1,2,3,4,5)
y <- c(10,20,15,25,30)

fig <- plot_ly(
x = x,
y = y,
type = "scatter",
mode = "markers",
marker = list(color = "blue", size = 10)
)

fig

🖥️ Output:

  • Interactive blue points
  • Hover shows values
  • Zoom enabled

📈 3.4 Interactive Line Plot

plot_ly(
x = 1:10,
y = (1:10)^2,
type = "scatter",
mode = "lines+markers",
line = list(color = "red")
)

🖥️ Output:

  • Red curve showing quadratic growth
  • Click and zoom enabled

📊 3.5 Interactive Bar Chart

plot_ly(
x = c("A", "B", "C", "D"),
y = c(20, 35, 30, 40),
type = "bar",
marker = list(color = "green")
)

🖥️ Output:

  • Green bars
  • Hover shows values

📊 3.6 ggplot2 + Plotly Integration

library(ggplot2)
library(plotly)

student <- data.frame(
Name = c("A","B","C","D"),
Marks = c(70,80,90,85)
)

p <- ggplot(student, aes(Name, Marks)) +
geom_bar(stat="identity", fill="blue")

ggplotly(p)

🖥️ Output:

  • Interactive bar chart
  • Hover + zoom + click enabled

🌐 3.7 Introduction to Shiny

Shiny is used to create interactive web applications in R.

👉 Used for:

  • Dashboards
  • Data apps
  • Live reports

📥 Install Shiny

install.packages("shiny")

📥 Load Library

library(shiny)

🧱 3.8 Structure of Shiny App

A Shiny app has 2 parts:

Component | Purpose
UI | User Interface
Server | Logic/Backend

📱 3.9 Simple Shiny App

library(shiny)

ui <- fluidPage(
titlePanel("Simple Shiny App"),

sidebarLayout(
sidebarPanel(
sliderInput("num",
"Select Number:",
min = 1,
max = 100,
value = 50)
),

mainPanel(
textOutput("result")
)
)
)

server <- function(input, output) {
output$result <- renderText({
paste("Selected Value:", input$num)
})
}

shinyApp(ui = ui, server = server)

🖥️ Output:

  • Slider input (1–100)
  • Dynamic text updates instantly

📊 3.10 Shiny Dashboard Example

library(shiny)

ui <- fluidPage(
titlePanel("Student Dashboard"),

sidebarLayout(
sidebarPanel(
selectInput("subject",
"Choose Subject:",
choices = c("Math", "Science", "English"))
),

mainPanel(
textOutput("outputText")
)
)
)

server <- function(input, output) {

output$outputText <- renderText({
paste("You selected:", input$subject)
})

}

shinyApp(ui = ui, server = server)

🖥️ Output:

  • Dropdown menu
  • Dynamic response display

📊 3.11 Advantages of Interactive Visualization

  • 🎯 Real-time interaction
  • 📊 Better data understanding
  • 📈 Professional dashboards
  • 🧠 Easy decision-making
  • 🌐 Web-based applications

⚖️ 3.12 Comparison

Tool | Type | Use
Base R | Static | Basic plots
ggplot2 | Static advanced | Publication graphs
Plotly | Interactive | Dynamic charts
Shiny | Web app | Dashboards

📌 3.13 Summary

  • Plotly adds interactivity to graphs
  • ggplotly converts ggplot to interactive charts
  • Shiny creates full web applications
  • Interactive tools are used in real-world analytics

❓ 3.14 Viva Questions

  1. What is interactive visualization?
  2. What is Plotly used for?
  3. What is Shiny in R?
  4. Difference between ggplot2 and Plotly?
  5. What are UI and Server in Shiny?
  6. What is ggplotly()?
  7. What are dashboards?

🎓 FINAL SUMMARY (FULL MODULE)

✔ Base R Graphics → Simple plots
✔ ggplot2 → Advanced visualization
✔ Plotly → Interactive charts
✔ Shiny → Full web dashboards



📘 MODULE 4: STATISTICAL ANALYSIS AND MODELING

Class 1: Descriptive Statistics and Measures of Central Tendency


🌟 Learning Objectives

After completing this chapter, students will be able to:

  • Understand the concept of descriptive statistics.
  • Explain measures of central tendency.
  • Calculate Mean, Median, and Mode using R.
  • Interpret statistical results.
  • Apply descriptive statistics to real-world data.

📚 4.1 Introduction to Statistics

Statistics is the science of collecting, organizing, analyzing, interpreting, and presenting data. It helps researchers, businesses, scientists, and governments make informed decisions based on numerical information.

For example:

  • A school calculates the average marks of students.
  • A company analyzes monthly sales.
  • A hospital studies patient recovery rates.
  • Weather departments analyze temperature records.

Statistics transforms raw data into useful information.


📖 Types of Statistics

Statistics is broadly classified into two categories:

1. Descriptive Statistics

Descriptive statistics summarizes and describes the main features of a dataset. It does not make predictions but presents the data in a meaningful way.

Examples:

  • Mean
  • Median
  • Mode
  • Range
  • Variance
  • Standard Deviation

Applications

  • Student result analysis
  • Employee salary reports
  • Sales reports
  • Population surveys

2. Inferential Statistics

Inferential statistics uses sample data to make predictions or conclusions about a larger population.

Examples:

  • Hypothesis testing
  • Regression analysis
  • ANOVA
  • Confidence intervals

⭐ Importance of Descriptive Statistics

Descriptive statistics helps to:

  • Summarize large datasets.
  • Identify patterns and trends.
  • Compare different datasets.
  • Support decision-making.
  • Prepare data for advanced analysis.

📊 Measures of Central Tendency

Measures of Central Tendency describe the center or typical value of a dataset.

The three common measures are:

  1. Mean
  2. Median
  3. Mode

🔵 4.2 Mean (Arithmetic Mean)

Definition

The Mean is the arithmetic average of all observations.

It is the most commonly used measure of central tendency.


Formula

Mean = ΣX / N

Where:

  • ΣX = Sum of all observations
  • N = Number of observations

Sample Data (10 Students' Marks)

Student | Marks
1 | 45
2 | 52
3 | 58
4 | 63
5 | 67
6 | 72
7 | 78
8 | 84
9 | 90
10 | 95

Manual Calculation

Step 1: Add all values

45 + 52 + 58 + 63 + 67 + 72 + 78 + 84 + 90 + 95

= 704

Step 2: Count observations

Number of observations = 10

Step 3: Apply Formula

Mean = 704 ÷ 10

= 70.4


💻 R Program

# Program to Calculate Mean

marks <- c(45,52,58,63,67,72,78,84,90,95)

print("Student Marks")
print(marks)

mean_value <- mean(marks)

print("Mean of Marks")
print(mean_value)

🖥 Output

[1] "Student Marks"

[1] 45 52 58 63 67 72 78 84 90 95

[1] "Mean of Marks"

[1] 70.4

📖 Explanation

The mean() function in R calculates the arithmetic average of all values in the vector.

mean(marks)

returns

70.4

because the total marks are 704, divided by 10 students.
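
The same result can be reproduced from the formula itself. This short sketch computes ΣX with sum() and N with length(), then compares the quotient with mean():

```r
marks <- c(45, 52, 58, 63, 67, 72, 78, 84, 90, 95)

total <- sum(marks)        # sum of all observations
n     <- length(marks)     # number of observations
manual_mean <- total / n

print(total)
print(n)
print(manual_mean)

# TRUE when the manual value agrees with mean()
print(isTRUE(all.equal(manual_mean, mean(marks))))
```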


✅ Interpretation

The average marks obtained by the students are 70.4.

This means that if the total marks were equally distributed among all students, each student would receive 70.4 marks.


🌍 Real-Life Applications of Mean

  • Calculating students' average marks.
  • Measuring average monthly income.
  • Determining average rainfall.
  • Calculating average temperature.
  • Business profit analysis.
  • Cricket batting average.
  • Manufacturing quality control.

✔ Advantages of Mean

  • Easy to calculate.
  • Uses all observations.
  • Suitable for mathematical analysis.
  • Widely used in statistics.

✖ Disadvantages of Mean

  • Affected by very high or very low values (outliers).
  • Not suitable for highly skewed data.
  • Cannot be used for categorical data.

💡 Important Note

The Mean is the most widely used measure of central tendency, but it can be misleading when a dataset contains extreme values.
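
The effect of an extreme value is easy to demonstrate. In the sketch below, the last mark is changed from 95 to 950 (a hypothetical typing error introduced only for illustration), and the mean and median are compared before and after:

```r
marks <- c(45, 52, 58, 63, 67, 72, 78, 84, 90, 95)

# Hypothetical data-entry error: 95 typed as 950
marks_outlier <- replace(marks, length(marks), 950)

print(mean(marks))            # mean of the original marks
print(mean(marks_outlier))    # mean pulled upward by the single extreme value
print(median(marks))          # median of the original marks
print(median(marks_outlier))  # median is unchanged
```

One wrong entry more than doubles the mean, while the median stays where it was.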


📝 Practice Exercise

Use the following data to calculate the Mean manually and using R.

Data
25
30
35
40
45
50
55
60
65
70

Write an R Program

marks <- c(25,30,35,40,45,50,55,60,65,70)

mean(marks)

Expected Output

[1] 47.5

📌 Key Points

  • Mean is the arithmetic average.
  • It is calculated using all observations.
  • R provides the mean() function.
  • Mean is affected by extreme values.
  • It is widely used in business, science, education, and research.

🎯 Learning Summary

After completing this lesson, you have learned:

  • What is descriptive statistics?
  • Types of statistics.
  • Importance of descriptive statistics.
  • Definition and formula of Mean.
  • Manual calculation of Mean.
  • R program to calculate Mean.
  • Interpretation of output.
  • Applications, advantages, and disadvantages of Mean.




🔴 4.3 Median


📖 Definition

The Median is the middle value of a dataset when the observations are arranged in ascending or descending order.

Unlike the Mean, the Median is not affected by extremely high or low values (outliers). Therefore, it is considered a better measure of central tendency for skewed data.


🎯 Formula

For Odd Number of Observations

Median = ((n + 1) / 2)th Observation

For Even Number of Observations

Median = (Middle Value 1 + Middle Value 2) / 2

Where:

  • n = Total number of observations

📊 Example (10 Student Marks)

Student | Marks
1 | 45
2 | 52
3 | 58
4 | 63
5 | 67
6 | 72
7 | 78
8 | 84
9 | 90
10 | 95

The data is already arranged in ascending order.


🧮 Manual Calculation

Number of observations = 10 (Even)

Middle positions:

  • 5th value = 67
  • 6th value = 72

Median

= (67 + 72) ÷ 2

= 69.5


💻 R Program

# Program to Calculate Median

marks <- c(45,52,58,63,67,72,78,84,90,95)

print("Student Marks")
print(marks)

median_value <- median(marks)

print("Median of Marks")
print(median_value)

🖥 Output

[1] "Student Marks"

[1] 45 52 58 63 67 72 78 84 90 95

[1] "Median of Marks"

[1] 69.5

📖 Explanation

The median() function automatically sorts the values (if required) and finds the middle value.

For an even number of observations, it calculates the average of the two middle values.
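
The two formulas can be written out by hand to see what median() does. The function name manual_median() below is a hypothetical helper used only for this sketch:

```r
# Median from the odd/even formulas (illustrative helper)
manual_median <- function(x) {
  s <- sort(x)          # arrange in ascending order
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                      # odd n: the middle observation
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2       # even n: average of the two middle values
  }
}

marks <- c(45, 52, 58, 63, 67, 72, 78, 84, 90, 95)

print(manual_median(marks))
print(median(marks))
```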


✅ Interpretation

The median marks are 69.5.

This means:

  • 50% of students scored below 69.5
  • 50% of students scored above 69.5

🌍 Real-Life Applications

  • Income analysis
  • House price analysis
  • Population studies
  • Salary surveys
  • Medical research

✔ Advantages

  • Not affected by outliers.
  • Easy to understand.
  • Suitable for skewed data.
  • Useful for ordinal data.

✖ Disadvantages

  • Does not use every observation.
  • Difficult to calculate for grouped data manually.

📝 Practice Exercise

Find the median of the following data using R.

Sample Data

28, 35, 40, 45, 50, 55, 60, 65, 70, 80

R Script

marks <- c(28,35,40,45,50,55,60,65,70,80)

median(marks)

Output

[1] 52.5

🟣 4.4 Mode


📖 Definition

The Mode is the value that appears most frequently in a dataset.

A dataset may have:

  • One Mode (Unimodal)
  • Two Modes (Bimodal)
  • More than Two Modes (Multimodal)
  • No Mode (all values occur once)

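R does not provide a built-in function for the statistical mode. The built-in mode() function returns the storage type of an object (for example, "numeric"), not the most frequent value, so we create a custom function.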


📊 Example (10 Student Marks)

Student | Marks
1 | 45
2 | 52
3 | 63
4 | 63
5 | 63
6 | 72
7 | 78
8 | 84
9 | 90
10 | 95

📋 Frequency Table

Marks | Frequency
45 | 1
52 | 1
63 | 3
72 | 1
78 | 1
84 | 1
90 | 1
95 | 1

The highest frequency is 3.

Therefore,

Mode = 63


💻 R Program

# Program to Calculate Mode

marks <- c(45,52,63,63,63,72,78,84,90,95)

Mode <- function(x)
{
unique_values <- unique(x)
unique_values[which.max(tabulate(match(x, unique_values)))]
}

mode_value <- Mode(marks)

print("Student Marks")
print(marks)

print("Mode of Marks")
print(mode_value)

🖥 Output

[1] "Student Marks"

[1] 45 52 63 63 63 72 78 84 90 95

[1] "Mode of Marks"

[1] 63

📖 Explanation

The custom Mode() function:

  1. Finds the unique values.
  2. Counts how many times each value appears.
  3. Returns the value with the highest frequency.
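
Because which.max() returns only the first maximum, the Mode() function reports a single value even when several values tie, and it reports the first element when nothing repeats. One possible variant, called all_modes() here (a hypothetical helper name, not part of R), returns every tied value and an empty vector when there is no mode:

```r
# Returns all most-frequent values; empty vector if no value repeats
all_modes <- function(x) {
  unique_values <- unique(x)
  counts <- tabulate(match(x, unique_values))   # frequency of each unique value
  if (length(counts) == 0 || max(counts) == 1) {
    return(unique_values[0])                    # no mode
  }
  unique_values[counts == max(counts)]
}

print(all_modes(c(45, 52, 63, 63, 63, 72, 78, 84, 90, 95)))  # unimodal
print(all_modes(c(1, 2, 2, 3, 3)))                           # bimodal
print(all_modes(c(45, 52, 58, 63, 67)))                      # no mode
```

This follows the convention used in this chapter that a dataset in which every value occurs once has no mode.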

✅ Interpretation

The most frequently occurring mark is 63.

This indicates that 63 is the most common score among the students.


🌍 Real-Life Applications

  • Most sold product
  • Most common blood group
  • Most frequently purchased item
  • Customer preference analysis
  • Election survey analysis

✔ Advantages

  • Easy to understand.
  • Suitable for categorical data.
  • Not affected by outliers.
  • Represents the most common value.

✖ Disadvantages

  • Some datasets have multiple modes.
  • Some datasets have no mode.
  • Less useful for mathematical calculations.

📊 Comparison of Mean, Median, and Mode

Feature | Mean | Median | Mode
Definition | Average of all values | Middle value | Most frequent value
Uses All Data | ✔ Yes | ✖ No | ✖ No
Affected by Outliers | ✔ Yes | ✖ No | ✖ No
Suitable for Categorical Data | ✖ No | ✖ No | ✔ Yes
R Function | mean() | median() | Custom Function

Class 2: Measures of Dispersion


🌟 Learning Objectives

After completing this chapter, students will be able to:

  • Understand the concept of dispersion.
  • Explain the importance of measures of dispersion.
  • Calculate Range, Variance, and Standard Deviation using R.
  • Interpret statistical results.
  • Compare different measures of dispersion.

📖 4.5 Measures of Dispersion

Definition

Measures of Dispersion are statistical measures that describe how spread out or scattered the data values are around a central value (usually the mean).

While measures of central tendency tell us the center of the data, measures of dispersion indicate how much the observations vary.

For example, two classes may have the same average marks, but one class may have marks that are closely grouped while the other has marks that are widely spread.


Importance of Measures of Dispersion

Measures of dispersion help us to:

  • Determine the consistency of data.
  • Compare different datasets.
  • Measure variability.
  • Analyze business and scientific data.
  • Make better statistical decisions.

🔵 4.6 Range


Definition

The Range is the simplest measure of dispersion. It is the difference between the highest and the lowest value in a dataset.


Formula

Range = Maximum Value − Minimum Value

Sample Data (10 Students' Marks)

Student | Marks
1 | 45
2 | 52
3 | 58
4 | 63
5 | 67
6 | 72
7 | 78
8 | 84
9 | 90
10 | 95

Manual Calculation

Maximum Value = 95

Minimum Value = 45

Range = 95 − 45

= 50


💻 R Program

# Program to Calculate Range

marks <- c(45,52,58,63,67,72,78,84,90,95)

print("Student Marks")
print(marks)

range_value <- max(marks) - min(marks)

print("Range")
print(range_value)

🖥 Output

[1] "Student Marks"

[1] 45 52 58 63 67 72 78 84 90 95

[1] "Range"

[1] 50

Explanation

  • max() finds the highest value.
  • min() finds the lowest value.
  • Their difference gives the range.
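
R also has a function named range(), but it returns the minimum and maximum as a pair rather than their difference. A short sketch of how the two relate:

```r
marks <- c(45, 52, 58, 63, 67, 72, 78, 84, 90, 95)

limits <- range(marks)   # vector of two values: minimum, maximum
spread <- diff(limits)   # maximum minus minimum

print(limits)
print(spread)
```

So diff(range(marks)) is an equivalent way to write max(marks) - min(marks).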

Interpretation

The marks are spread over 50 marks, indicating the overall spread between the highest and lowest scores.


Real-Life Applications

  • Temperature variation
  • Stock market prices
  • Monthly rainfall
  • Student performance analysis

Advantages

  • Very easy to calculate.
  • Easy to understand.
  • Quick measure of spread.

Disadvantages

  • Uses only two values.
  • Strongly affected by extreme values.
  • Does not describe the distribution of all observations.

🟣 4.7 Variance


Definition

Variance measures the average squared deviation of each observation from the mean. It tells us how much the data values vary around the average.

A small variance indicates that the data points are close to the mean, while a large variance indicates that the data points are widely spread.




Sample Data (10 Students' Marks)

45, 52, 58, 63, 67, 72, 78, 84, 90, 95


Step 1: Calculate Mean

Mean = 70.4


Step 2: Find Squared Deviations

Marks | x − Mean | (x − Mean)²
45 | −25.4 | 645.16
52 | −18.4 | 338.56
58 | −12.4 | 153.76
63 | −7.4 | 54.76
67 | −3.4 | 11.56
72 | 1.6 | 2.56
78 | 7.6 | 57.76
84 | 13.6 | 184.96
90 | 19.6 | 384.16
95 | 24.6 | 605.16

Sum of squared deviations = 2438.40


Step 3: Calculate Variance

Variance = 2438.40 ÷ (10 − 1)

Variance = 2438.40 ÷ 9

Variance ≈ 270.93
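
The three manual steps can be followed in R one at a time. This sketch builds the deviation columns, sums the squared deviations, divides by n − 1, and compares the result with var():

```r
marks <- c(45, 52, 58, 63, 67, 72, 78, 84, 90, 95)

deviations <- marks - mean(marks)   # x - mean
squared    <- deviations^2          # (x - mean)^2

# Step-by-step table of deviations
print(data.frame(Marks = marks, Deviation = deviations, Squared = squared))

sum_squares <- sum(squared)                 # sum of squared deviations
n <- length(marks)
sample_variance <- sum_squares / (n - 1)    # divide by n - 1 (sample variance)

print(sum_squares)
print(sample_variance)
print(isTRUE(all.equal(sample_variance, var(marks))))
```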


💻 R Program

# Program to Calculate Variance

marks <- c(45,52,58,63,67,72,78,84,90,95)

print("Student Marks")
print(marks)

variance_value <- var(marks)

print("Variance")
print(variance_value)

🖥 Output

[1] "Student Marks"

[1] 45 52 58 63 67 72 78 84 90 95

[1] "Variance"

[1] 270.9333

Explanation

The var() function calculates the sample variance by dividing the sum of squared deviations by n − 1.


Interpretation

The variance of 270.93 is expressed in squared marks, so it is hard to read directly. Its square root, the standard deviation (about 16.46 marks), is easier to interpret.


Applications

  • Quality control
  • Financial risk analysis
  • Scientific experiments
  • Educational performance analysis

Advantages

  • Uses all observations.
  • Provides an accurate measure of variability.
  • Widely used in statistical analysis.

Disadvantages

  • Expressed in squared units.
  • Less intuitive than standard deviation.

🔴 4.8 Standard Deviation


Definition

The Standard Deviation is the positive square root of the variance. It measures the typical distance of the observations from the mean and is expressed in the same units as the original data.


Formula

Standard Deviation = √Variance

Sample Data

45, 52, 58, 63, 67, 72, 78, 84, 90, 95


Manual Calculation

Variance = 270.93

Standard Deviation

= √270.93

≈ 16.46


💻 R Program

# Program to Calculate Standard Deviation

marks <- c(45,52,58,63,67,72,78,84,90,95)

print("Student Marks")
print(marks)

sd_value <- sd(marks)

print("Standard Deviation")
print(sd_value)

🖥 Output

[1] "Student Marks"

[1] 45 52 58 63 67 72 78 84 90 95

[1] "Standard Deviation"

[1] 16.46005

Explanation

The sd() function calculates the square root of the sample variance.
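
This relationship can be confirmed directly. A minimal check that the square root of var() agrees with sd() for the same marks:

```r
marks <- c(45, 52, 58, 63, 67, 72, 78, 84, 90, 95)

sd_from_var <- sqrt(var(marks))   # square root of the sample variance

print(sd_from_var)
print(sd(marks))
print(isTRUE(all.equal(sd_from_var, sd(marks))))
```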


Interpretation

The marks typically vary by about 16.46 marks from the average.


Real-Life Applications

  • Exam result analysis
  • Investment risk measurement
  • Manufacturing quality control
  • Medical research
  • Weather forecasting

Advantages

  • Uses all observations.
  • Expressed in original units.
  • Easy to interpret.
  • Most commonly used measure of dispersion.

Disadvantages

  • Influenced by outliers.
  • More computationally intensive than range.

📊 Comparison of Measures of Dispersion

Measure | Formula | R Function | Uses All Data | Affected by Outliers
Range | Max − Min | max() - min() | ❌ No | ✔ Yes
Variance | Σ(x − x̄)² / (n − 1) | var() | ✔ Yes | ✔ Yes
Standard Deviation | √Variance | sd() | ✔ Yes | ✔ Yes

📝 Practice Exercises

  1. Calculate the Range for: 25, 30, 35, 40, 45, 50, 55, 60, 65, 70.
  2. Write an R program to calculate the Variance of 10 observations.
  3. Write an R program to calculate the Standard Deviation of a dataset.
  4. Explain the difference between Range and Standard Deviation.
  5. Which measure of dispersion is most commonly used? Why?

🎯 Chapter Summary

After studying this chapter, you should be able to:

  • Define measures of dispersion.
  • Calculate Range manually and using R.
  • Calculate Variance manually and using R.
  • Calculate Standard Deviation manually and using R.
  • Interpret the results produced by R.
  • Compare different measures of dispersion.
  • Apply these concepts to real-world datasets.




Class 3: Complete Descriptive Statistics Using R


🎯 Learning Objectives

After completing this practical session, students will be able to:

  • Create an R program to calculate descriptive statistics.
  • Calculate Mean, Median, Mode, Range, Variance, and Standard Deviation.
  • Interpret statistical results.
  • Analyze a dataset using R.

📖 Introduction

In previous classes, we studied each statistical measure separately. In this practical session, we will develop a single R program that calculates all descriptive statistics for a dataset.


📊 Sample Dataset (10 Students' Marks)

Student | Marks
1 | 45
2 | 52
3 | 58
4 | 63
5 | 67
6 | 72
7 | 78
8 | 84
9 | 90
10 | 95

💻 Complete R Program

#---------------------------------------
# Descriptive Statistics in R
#---------------------------------------

# Sample Dataset

marks <- c(45,52,58,63,67,72,78,84,90,95)

print("Student Marks")
print(marks)

# Mean

mean_value <- mean(marks)

# Median

median_value <- median(marks)

# Mode

Mode <- function(x)
{
unique_values <- unique(x)
unique_values[
which.max(tabulate(match(x, unique_values)))
]
}

mode_value <- Mode(marks)

# Range

range_value <- max(marks)-min(marks)

# Variance

variance_value <- var(marks)

# Standard Deviation

sd_value <- sd(marks)

print("-------------------------")

print(paste("Mean =",mean_value))

print(paste("Median =",median_value))

print(paste("Mode =",mode_value))

print(paste("Range =",range_value))

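# Round so the printed values match the sample output below;
# paste() would otherwise show up to 15 significant digits.
print(paste("Variance =",round(variance_value,4)))

print(paste("Standard Deviation =",round(sd_value,5)))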

print("-------------------------")

🖥 Sample Output

[1] "Student Marks"

[1] 45 52 58 63 67 72 78 84 90 95

[1] "-------------------------"

[1] "Mean = 70.4"

[1] "Median = 69.5"

[1] "Mode = 45"

[1] "Range = 50"

[1] "Variance = 270.9333"

[1] "Standard Deviation = 16.46005"

[1] "-------------------------"

Note: In this dataset, every value appears only once, so the custom Mode() function returns the first value (45). Statistically, this dataset has no mode because no value occurs more frequently than the others. To demonstrate a true mode, use a dataset with repeated values (for example: 45, 52, 58, 63, 63, 63, 72, 84, 90, 95).
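The note above can be checked directly. The following is a minimal sketch that reuses the same Mode() helper with the repeated-value dataset suggested in the note; the variable names marks_repeated and mode_repeated are chosen here for illustration only.

```r
# Same helper as in the program above: returns the most frequent value,
# or the first value when every value occurs equally often.
Mode <- function(x) {
  unique_values <- unique(x)
  unique_values[which.max(tabulate(match(x, unique_values)))]
}

# Dataset from the note, with 63 repeated three times
marks_repeated <- c(45, 52, 58, 63, 63, 63, 72, 84, 90, 95)

mode_repeated <- Mode(marks_repeated)
print(mode_repeated)   # 63

# A frequency table is a quick way to see whether a real mode exists
print(table(marks_repeated))
```

Checking the frequency table first is a simple way to avoid reporting a mode for data in which every value occurs only once.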


📖 Explanation of the Program

Function | Purpose
mean() | Calculates the arithmetic mean
median() | Calculates the median
Mode() | Finds the most frequently occurring value
max() | Returns the maximum value
min() | Returns the minimum value
var() | Calculates the sample variance
sd() | Calculates the sample standard deviation

📊 Interpretation of Results

Mean = 70.4

The average marks of the students are 70.4.


Median = 69.5

Half of the students scored below 69.5, while the other half scored above it.


Mode

Since all values occur only once, there is no statistical mode in this dataset.


Range = 50

The difference between the highest and lowest marks is 50.


Variance = 270.93

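Variance is expressed in squared units (marks²), so it is hard to read directly. Its square root, the standard deviation, describes the same spread in the original units.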


Standard Deviation = 16.46

The marks typically vary by approximately 16.46 marks from the mean.


🌍 Real-Life Applications

Descriptive statistics is used in:

  • 🎓 Student performance analysis
  • 🏥 Hospital patient data
  • 💼 Employee salary analysis
  • 🏦 Banking and finance
  • 📈 Stock market analysis
  • 🌦 Weather forecasting
  • 🛒 Business sales analysis
  • 🧪 Scientific research
  • 🏭 Manufacturing quality control
  • 📊 Government census reports

📋 Summary Table

Statistical Measure | Formula | R Function | Result
Mean | ΣX / N | mean() | 70.4
Median | Middle value | median() | 69.5
Mode | Most frequent value | Custom Function | No mode (or 45 with this simple function)
Range | Max − Min | max() - min() | 50
Variance | Σ(x − x̄)² / (n − 1) | var() | 270.93
Standard Deviation | √Variance | sd() | 16.46
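To see where the variance and standard deviation in the table come from, the formula Σ(x − x̄)² / (n − 1) can be applied step by step. This is a small sketch using the same marks; the variable names are illustrative, and the population variance (dividing by n) is included only for comparison.

```r
marks <- c(45, 52, 58, 63, 67, 72, 78, 84, 90, 95)

n <- length(marks)
deviations <- marks - mean(marks)        # x - x̄ for every observation

# Sample variance: divide by (n - 1), which is what var() does
sample_variance <- sum(deviations^2) / (n - 1)

# Population variance: divide by n (R has no built-in function for this)
population_variance <- sum(deviations^2) / n

sample_sd <- sqrt(sample_variance)

print(sample_variance)       # 270.9333
print(population_variance)   # 243.84
print(sample_sd)             # 16.46005

# The manual results agree with the built-in functions
print(all.equal(sample_variance, var(marks)))
print(all.equal(sample_sd, sd(marks)))
```

The difference between the two variances is the divisor only, which is the point of the viva question on sample versus population variance.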

📌 Advantages of Descriptive Statistics

  • Summarizes large datasets.
  • Easy to understand.
  • Supports decision-making.
  • Helps compare datasets.
  • Forms the basis for advanced statistical analysis.

⚠ Limitations

  • Describes only the available data.
  • Cannot make predictions about a population.
  • Sensitive to outliers (especially mean, variance, and standard deviation).
  • Does not establish cause-and-effect relationships.

📝 Lab Exercises

Exercise 1

Write an R program to calculate the Mean of the following data:

25, 30, 35, 40, 45, 50, 55, 60, 65, 70


Exercise 2

Write an R program to calculate the Median of:

18, 25, 32, 40, 45, 48, 55, 60, 68, 72


Exercise 3

Write an R program to calculate the Mode of:

10, 15, 20, 20, 20, 25, 30, 35, 40, 45


Exercise 4

Write an R program to calculate the Range of:

50, 60, 65, 70, 75, 80, 85, 90, 95, 100


Exercise 5

Write an R program to calculate the Variance and Standard Deviation of:

12, 15, 18, 20, 24, 27, 30, 32, 35, 40


❓ Viva Questions

  1. What is descriptive statistics?
  2. Define mean.
  3. Define median.
  4. What is mode?
  5. Why does R require a custom function for mode?
  6. Define range.
  7. What is variance?
  8. Define standard deviation.
  9. Differentiate between variance and standard deviation.
  10. Which measure of central tendency is least affected by outliers?
  11. What is the difference between sample variance and population variance?
  12. Which R function calculates the mean?
  13. Which R function calculates the median?
  14. Which R function calculates the variance?
  15. Which R function calculates the standard deviation?

🎯 Learning Outcomes

After completing Module 4, students can:

  • Explain descriptive statistics.
  • Calculate measures of central tendency.
  • Calculate measures of dispersion.
  • Develop R programs for statistical analysis.
  • Interpret statistical output correctly.
  • Apply descriptive statistics to real-world datasets.


📘 MODULE 5: ADVANCED R PROGRAMMING

Class 1: Control Structures in R

Duration: 1 Class


🎯 Learning Objectives

After completing this lesson, students will be able to:

  • Understand the concept of control structures.
  • Use conditional statements in R.
  • Write programs using if, if...else, and switch().
  • Make decisions based on different conditions.
  • Develop real-world decision-making programs in R.

📖 5.1 Introduction to Control Structures

Control structures determine the flow of execution of a program. They allow the program to make decisions and execute different blocks of code based on specified conditions.

Without control structures, every statement would execute sequentially.

Control structures help programmers create intelligent and interactive programs.


🌟 Types of Control Structures in R

There are two major categories:

1. Conditional Statements

  • if
  • if...else
  • else if
  • switch()

2. Looping Statements

  • for
  • while
  • repeat

📌 Advantages of Control Structures

  • Makes programs intelligent.
  • Supports decision making.
  • Reduces unnecessary code.
  • Improves program efficiency.
  • Makes programs easier to maintain.

🔵 5.2 The if Statement

Definition

The if statement executes a block of code only if the specified condition is TRUE.

If the condition is FALSE, the statements inside the if block are skipped.


Syntax

if(condition)
{
statements
}

Flow Diagram

        Condition
            │
    ┌───────┴────────┐
    │                │
  TRUE             FALSE
    │                │
Execute Block    Skip Block

Example 1: Check Positive Number

Problem

Write an R program to check whether a number is positive.

R Program

number <- 25

if(number > 0)
{
print("Positive Number")
}

Output

[1] "Positive Number"

Explanation

Since 25 > 0, the condition is TRUE.

Therefore,

Positive Number

is printed.


Example 2: Student Passed or Not

Sample Marks

Marks = 68

R Program

marks <- 68

if(marks >= 40)
{
print("Student Passed")
}

Output

[1] "Student Passed"

Example 3: Check Even Number

R Program

number <- 20

if(number %% 2 == 0)
{
print("Even Number")
}

Output

[1] "Even Number"

Real-Life Applications

  • ATM transaction approval
  • Online payment verification
  • Student pass/fail checking
  • Login authentication
  • Age verification

Advantages

  • Simple and easy to use.
  • Executes code only when required.
  • Improves efficiency.

Limitation

  • Cannot specify an alternative action when the condition is FALSE.

🟢 5.3 The if...else Statement

Definition

The if...else statement executes one block of code if the condition is TRUE and another block if the condition is FALSE.


Syntax

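# Note: in a script or at the console, else must be on the same
# line as the closing brace of the if block. Starting a new line
# with else gives the error: unexpected 'else'.
if (condition) {
  statements
} else {
  statements
}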

Flow Diagram

            Condition
                │
       ┌────────┴────────┐
       │                 │
     TRUE              FALSE
       │                 │
Execute IF Block   Execute ELSE Block

Example 1: Pass or Fail

R Program

marks <- 35

if (marks >= 40) {
  print("Pass")
} else {
  print("Fail")
}

Output

[1] "Fail"

Explanation

The student's marks are 35, which is less than 40.

Therefore, the else block is executed.


Example 2: Voting Eligibility

R Program

age <- 20

if (age >= 18) {
  print("Eligible for Voting")
} else {
  print("Not Eligible")
}

Output

[1] "Eligible for Voting"

Example 3: Largest of Two Numbers

Sample Data

A = 45

B = 60

R Program

a <- 45
b <- 60

if (a > b) {
  print("A is Largest")
} else {
  print("B is Largest")
}

Output

[1] "B is Largest"

Real-Life Applications

  • Bank loan approval
  • Employee promotion
  • Scholarship eligibility
  • Insurance claim approval
  • Online order verification

Advantages

  • Supports two-way decision making.
  • Easy to implement.
  • Improves program readability.

Practice Exercise

Write an R program to check whether a person is:

  • Adult (Age ≥ 18)
  • Minor (Age < 18)

Expected Program

age <- 16

if (age >= 18) {
  print("Adult")
} else {
  print("Minor")
}

Output

[1] "Minor"

🟣 5.4 The switch() Statement

Definition

The switch() statement selects one option from multiple alternatives based on a given expression.

It is useful when there are many possible choices, making the code shorter and easier to read than multiple if...else if statements.


Syntax

switch(expression,
option1,
option2,
option3,
...
)

Example 1: Day of the Week

R Program

day <- 3

result <- switch(day,
"Monday",
"Tuesday",
"Wednesday",
"Thursday",
"Friday",
"Saturday",
"Sunday")

print(result)

Output

[1] "Wednesday"

Example 2: Calculator Menu

R Program

choice <- 2

result <- switch(choice,
"Addition",
"Subtraction",
"Multiplication",
"Division")

print(result)

Output

[1] "Subtraction"

Advantages

  • Easy to read.
  • Suitable for menu-driven programs.
  • Reduces lengthy if...else if statements.
  • Improves program organization.

Limitations

  • Best suited for fixed choices.
  • Not suitable for complex logical conditions.
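The two examples above select an option by position. switch() also accepts a character string, in which case the options are matched by name and a final unnamed option acts as the default. The sketch below shows this for the calculator menu; the function name calculate and the option names are hypothetical and chosen only for illustration.

```r
# Hypothetical helper for illustration; not part of the original lesson
calculate <- function(a, b, operation) {
  switch(operation,
    "add"      = a + b,
    "subtract" = a - b,
    "multiply" = a * b,
    "divide"   = a / b,
    "Invalid choice"   # unnamed last option: used when nothing matches
  )
}

print(calculate(12, 4, "add"))       # 16
print(calculate(12, 4, "divide"))    # 3
print(calculate(12, 4, "modulus"))   # "Invalid choice"
```

With a numeric selector there is no default: a number outside the list of options makes switch() return NULL, so a named version with a default is often safer for menu-driven programs.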

📊 Comparison of Conditional Statements

Statement | Purpose | Best Use
if | Executes code when a condition is TRUE | Single condition
if...else | Chooses between two alternatives | Two-way decisions
switch() | Selects one option from many | Menu-driven programs

📝 Lab Exercises

  1. Check whether a number is positive or negative.
  2. Check whether a number is even or odd.
  3. Check whether a student has passed or failed.
  4. Find the larger of two numbers.
  5. Display the day of the week using switch().
  6. Create a simple calculator menu using switch().

❓ Viva Questions

  1. What is a control structure?
  2. What is the purpose of the if statement?
  3. Explain the if...else statement with an example.
  4. What is the difference between if and if...else?
  5. What is the purpose of the switch() statement?
  6. Give two real-life applications of conditional statements.
  7. Which statement is suitable for menu-driven programs?
  8. What happens if the if condition is FALSE?

📚 Class Summary

In this class, you learned:

  • The concept of control structures.
  • How to use if for single-condition decisions.
  • How if...else handles two-way decisions.
  • How switch() simplifies multiple-choice selection.
  • Real-world applications, advantages, limitations, and practice exercises.


Class 2: Nested if...else and else if Ladder

Duration: 1 Class


🎯 Learning Objectives

After completing this lesson, students will be able to:

  • Understand nested if...else statements.
  • Use the else if ladder for multiple conditions.
  • Develop programs involving multiple decision-making scenarios.
  • Apply conditional logic to solve practical problems.

📖 5.5 Nested if...else

Definition

A nested if...else statement is an if or if...else statement placed inside another if or else block.

It is used when one decision depends on the result of another decision.


Syntax

if (condition1) {
  if (condition2) {
    statements
  } else {
    statements
  }
} else {
  statements
}

Flow of Execution

  1. Check the first condition.
  2. If it is TRUE, check the second condition.
  3. Execute the appropriate block.
  4. If the first condition is FALSE, execute the outer else block.

💻 Example 1: Voting Eligibility and Senior Citizen

Problem

Write an R program to determine whether a person is:

  • Not Eligible for Voting
  • Eligible for Voting
  • Senior Citizen

Sample Data

Person | Age
Rahul | 65

R Program

age <- 65

if (age >= 18) {
  if (age >= 60) {
    print("Senior Citizen")
  } else {
    print("Eligible for Voting")
  }
} else {
  print("Not Eligible for Voting")
}

Output

[1] "Senior Citizen"

Explanation

  • Age = 65
  • First condition (age >= 18) is TRUE.
  • Second condition (age >= 60) is also TRUE.
  • Therefore, Senior Citizen is displayed.

💻 Example 2: Login Authentication

Problem

Check both username and password.


R Program

username <- "admin"
password <- "12345"

if (username == "admin") {
  if (password == "12345") {
    print("Login Successful")
  } else {
    print("Incorrect Password")
  }
} else {
  print("Invalid Username")
}

Output

[1] "Login Successful"

Real-Life Applications

  • ATM authentication
  • Online banking
  • User login systems
  • Online examinations
  • Employee verification

🟢 5.6 else if Ladder


Definition

The else if ladder is used when there are more than two possible conditions.

The conditions are checked from top to bottom, and the first TRUE condition is executed.

If none of the conditions is TRUE, the else block is executed.


Syntax

# Each else if / else must start on the same line as the
# closing brace of the previous block.
if (condition1) {
  statements
} else if (condition2) {
  statements
} else if (condition3) {
  statements
} else {
  statements
}

💻 Example 3: Grade Calculation

Problem

Write an R program to display the grade of a student based on marks.


Grade Table

Marks | Grade
90–100 | A+
80–89 | A
70–79 | B
60–69 | C
40–59 | D
Below 40 | F

Sample Data

Marks = 76


R Program

marks <- 76

if (marks >= 90) {
  print("Grade A+")
} else if (marks >= 80) {
  print("Grade A")
} else if (marks >= 70) {
  print("Grade B")
} else if (marks >= 60) {
  print("Grade C")
} else if (marks >= 40) {
  print("Grade D")
} else {
  print("Grade F")
}

Output

[1] "Grade B"

Explanation

Since 76 is greater than or equal to 70 but less than 80, the program prints Grade B.


💻 Example 4: Largest of Three Numbers

Sample Data

Variable | Value
A | 45
B | 75
C | 60

R Program

a <- 45
b <- 75
c <- 60

if (a > b && a > c) {
  print("A is Largest")
} else if (b > a && b > c) {
  print("B is Largest")
} else {
  print("C is Largest")
}

Output

[1] "B is Largest"

💻 Example 5: Leap Year Check

Rule

A year is a leap year if:

  • It is divisible by 400, or
  • It is divisible by 4 but not divisible by 100.

Sample Data

Year = 2024


R Program

year <- 2024

if ((year %% 400 == 0) || (year %% 4 == 0 && year %% 100 != 0)) {
  print("Leap Year")
} else {
  print("Not a Leap Year")
}

Output

[1] "Leap Year"

Explanation

  • 2024 is divisible by 4.
  • 2024 is not divisible by 100.
  • Therefore, 2024 is a leap year.
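The same rule can be wrapped in a function and tried on several years, including century years, which are the cases where the two parts of the rule differ. This is a minimal sketch; the function name is_leap_year and the list of test years are assumptions made for illustration.

```r
# Hypothetical helper: returns TRUE for leap years, FALSE otherwise
is_leap_year <- function(year) {
  (year %% 400 == 0) || (year %% 4 == 0 && year %% 100 != 0)
}

for (year in c(1900, 2000, 2023, 2024)) {
  if (is_leap_year(year)) {
    cat(year, "is a leap year\n")
  } else {
    cat(year, "is not a leap year\n")
  }
}
```

Here 1900 is divisible by 100 but not by 400, so it is not a leap year, while 2000 is divisible by 400 and therefore is one.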

📊 Comparison of Conditional Statements

Feature | if | if...else | else if Ladder
Number of Conditions | One | Two | Multiple
Alternative Action | ❌ No | ✔ Yes | ✔ Yes
Best Use | Single decision | Two-way decision | Multi-way decision

🌍 Real-Life Applications

  • Student grading systems
  • Online login verification
  • Bank loan approval
  • Employee salary classification
  • Tax calculation
  • Scholarship eligibility
  • Election voting systems

✔ Advantages

  • Handles multiple conditions efficiently.
  • Makes code easier to read.
  • Supports complex decision-making.
  • Suitable for real-world applications.

✖ Disadvantages

  • Long else if ladders can reduce readability.
  • Incorrect condition order may lead to wrong results.
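The second disadvantage is easy to reproduce. The sketch below uses a simplified three-band version of the grade ladder (only B, D, and F) so that the effect of the ordering is visible; both function names are hypothetical.

```r
# Wrong order: the broad condition comes first and catches everything >= 40
grade_wrong_order <- function(marks) {
  if (marks >= 40) {
    "Grade D"
  } else if (marks >= 70) {
    "Grade B"        # never reached: any value >= 70 is also >= 40
  } else {
    "Grade F"
  }
}

# Correct order: check the highest threshold first
grade_right_order <- function(marks) {
  if (marks >= 70) {
    "Grade B"
  } else if (marks >= 40) {
    "Grade D"
  } else {
    "Grade F"
  }
}

print(grade_wrong_order(76))   # "Grade D" (incorrect)
print(grade_right_order(76))   # "Grade B"
```

Because only the first TRUE condition is executed, thresholds should normally be listed from the most restrictive to the least restrictive.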

📝 Lab Exercises

  1. Write an R program to find the largest of three numbers.
  2. Write an R program to calculate grades using the else if ladder.
  3. Write an R program to check whether a year is a leap year.
  4. Write an R program to classify a person's age as Child, Teenager, Adult, or Senior Citizen.
  5. Write an R program to calculate electricity bills based on slab rates.

❓ Viva Questions

  1. What is a nested if...else statement?
  2. What is an else if ladder?
  3. What is the difference between nested if and else if?
  4. When should you use an else if ladder?
  5. Write the syntax of a nested if statement.
  6. How is a leap year determined in R?
  7. Why is the order of conditions important in an else if ladder?
  8. Give two practical applications of nested if.

📚 Class Summary

In this class, you learned:

  • Nested if...else
  • else if ladder
  • Grade calculation
  • Largest of three numbers
  • Leap year checking
  • Login authentication
  • Practical applications
  • Lab exercises and viva questions 


Class 3: Looping Constructs in R (Part 1)

Duration: 1 Class


🎯 Learning Objectives

After completing this lesson, students will be able to:

  • Understand the need for loops in programming.
  • Use the for loop in R.
  • Apply nested for loops.
  • Generate tables and patterns using loops.
  • Solve repetitive programming tasks efficiently.

📖 5.7 Introduction to Loops

Definition

A loop is a control structure that repeatedly executes a block of code until a specified condition is met or for a fixed number of iterations.

Loops reduce repetitive coding and make programs shorter, more efficient, and easier to maintain.

Why Use Loops?

Without loops, printing numbers from 1 to 10 would require ten separate print() statements.

Using a loop, the same task can be completed with just a few lines of code.


🌟 Types of Loops in R

R supports three main looping constructs:

  1. for Loop
  2. while Loop
  3. repeat Loop

In this class, we focus on the for loop.


🔵 5.8 The for Loop

Definition

The for loop executes a block of code a fixed number of times. It is commonly used when the number of iterations is known in advance.


Syntax

for(variable in sequence)
{
statements
}

Flow Diagram

          Start
            │
            ▼
   Initialize Variable
            │
            ▼
 Is Next Value Available? ◄──┐
       │           │         │
      Yes          No        │
       │           │         │
       ▼           ▼         │
 Execute Block    Stop       │
       │                     │
       └─────────────────────┘

💻 Example 1: Print Numbers from 1 to 10

R Program

for(i in 1:10)
{
print(i)
}

Output

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10

Explanation

The loop variable i takes the values from 1 to 10, one at a time.

The print(i) statement executes once for each value.


💻 Example 2: Print Even Numbers from 2 to 20

R Program

for(i in seq(2,20,2))
{
print(i)
}

Output

[1] 2
[1] 4
[1] 6
[1] 8
[1] 10
[1] 12
[1] 14
[1] 16
[1] 18
[1] 20

Explanation

The seq(2,20,2) function generates the sequence:

2, 4, 6, 8, 10, 12, 14, 16, 18, 20

The loop prints each value.


💻 Example 3: Multiplication Table of 5

R Program

number <- 5

for(i in 1:10)
{
print(paste(number,"x",i,"=",number*i))
}

Output

[1] "5 x 1 = 5"
[1] "5 x 2 = 10"
[1] "5 x 3 = 15"
[1] "5 x 4 = 20"
[1] "5 x 5 = 25"
[1] "5 x 6 = 30"
[1] "5 x 7 = 35"
[1] "5 x 8 = 40"
[1] "5 x 9 = 45"
[1] "5 x 10 = 50"

💻 Example 4: Sum of First 10 Natural Numbers

Formula

Sum = 1 + 2 + 3 + ... + 10


R Program

sum <- 0

for(i in 1:10)
{
sum <- sum + i
}

print(sum)

Output

[1] 55

Explanation

The variable sum starts at 0.

Each loop iteration adds the current value of i.

Final result = 55.
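The loop result can be cross-checked in two other ways: with the built-in sum() function and with the closed-form formula n(n + 1) / 2. This is a short sketch; the variable names are illustrative.

```r
n <- 10

# Loop version, as in the example above
loop_total <- 0
for (i in 1:n) {
  loop_total <- loop_total + i
}

# Built-in function applied to the whole sequence
builtin_total <- sum(1:n)

# Closed-form formula for the sum of the first n natural numbers
formula_total <- n * (n + 1) / 2

print(loop_total)      # 55
print(builtin_total)   # 55
print(formula_total)   # 55
```

The accumulator is named loop_total here rather than sum so that it is not confused with the built-in sum() function.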


💻 Example 5: Factorial of a Number

Sample Data

Number = 5


R Program

fact <- 1

for(i in 1:5)
{
fact <- fact * i
}

print(fact)

Output

[1] 120

Explanation

5! = 1 × 2 × 3 × 4 × 5 = 120
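For comparison, R also provides the built-in factorial() and prod() functions. The sketch below runs the loop for a variable n and checks it against both; using seq_len(n) instead of 1:n is a choice made here so that the loop also gives the right answer for n = 0.

```r
n <- 5

fact <- 1
for (i in seq_len(n)) {   # seq_len(0) is empty, so 0! stays 1
  fact <- fact * i
}

print(fact)           # 120
print(factorial(n))   # 120
print(prod(1:n))      # 120
```

With 1:n, the sequence for n = 0 would be 1, 0, and the loop would wrongly multiply by 0.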


🟣 5.9 Nested for Loop

Definition

A nested for loop is a for loop placed inside another for loop.

It is useful for:

  • Pattern printing
  • Matrix operations
  • Multiplication tables
  • Two-dimensional data processing

Syntax

for(i in 1:n)
{
for(j in 1:m)
{
statements
}
}

💻 Example 6: Print a 5 × 5 Star Pattern

R Program

for(i in 1:5)
{
for(j in 1:5)
{
cat("* ")
}
cat("\n")
}

Output

* * * * *
* * * * *
* * * * *
* * * * *
* * * * *

💻 Example 7: Multiplication Table (1 to 5)

R Program

for(i in 1:5)
{
for(j in 1:5)
{
cat(i*j,"\t")
}
cat("\n")
}

Output

1       2       3       4       5
2       4       6       8       10
3       6       9       12      15
4       8       12      16      20
5       10      15      20      25

🌍 Real-Life Applications of for Loops

  • Processing student marks
  • Reading records from a dataset
  • Generating reports
  • Printing invoices
  • Matrix calculations
  • Creating tables and charts
  • Automating repetitive tasks

✔ Advantages

  • Reduces repetitive code.
  • Easy to understand.
  • Suitable when the number of iterations is known.
  • Improves code readability.

✖ Disadvantages

  • Not suitable when the number of iterations is unknown.
  • Can become inefficient for extremely large datasets if vectorized solutions are available.

📊 Summary Table

Loop Type | Best Used When | Example
for | Number of iterations is known | Print 1–10
Nested for | Two-dimensional processing | Matrix, patterns

📝 Lab Exercises

  1. Print numbers from 1 to 20.
  2. Print all odd numbers from 1 to 50.
  3. Print the multiplication table of 7.
  4. Calculate the sum of the first 20 natural numbers.
  5. Calculate the factorial of 6.
  6. Print a 4 × 4 star pattern.
  7. Print the multiplication table from 1 to 10 using nested for loops.

❓ Viva Questions

  1. What is a loop?
  2. Why are loops used in programming?
  3. Define the for loop.
  4. What is a nested for loop?
  5. Write the syntax of a for loop.
  6. When should you use a for loop?
  7. Give two applications of nested for loops.
  8. What is the output of for(i in 1:5) print(i)?

📚 Class Summary

In this class, you learned:

  • The concept of loops.
  • The for loop.
  • Nested for loops.
  • Programs for printing numbers, even numbers, multiplication tables, sums, factorials, and star patterns.
  • Real-world applications, advantages, disadvantages, and practice exercises. 

Class 4: while Loop, repeat Loop, break, and next

Duration: 1 Class


🎯 Learning Objectives

After completing this lesson, students will be able to:

  • Understand the while loop.
  • Use the repeat loop.
  • Apply break and next statements.
  • Write efficient looping programs.
  • Compare different looping constructs in R.

📖 5.10 The while Loop

Definition

A while loop repeatedly executes a block of code as long as the specified condition is TRUE.

Unlike a for loop, the number of iterations is not fixed. The loop continues until the condition becomes FALSE.


Syntax

while(condition)
{
statements
}

Flow Diagram

          Start
            │
            ▼
     Check Condition ◄───┐
            │            │
     ┌──────┴──────┐     │
     │             │     │
   TRUE          FALSE   │
     │             │     │
Execute Block     Stop   │
     │                   │
     └───────────────────┘

💻 Example 1: Print Numbers from 1 to 10

R Program

i <- 1

while(i <= 10)
{
print(i)
i <- i + 1
}

Output

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10

Explanation

  • Variable i starts at 1.
  • The condition i <= 10 is checked.
  • The value of i is printed.
  • i is increased by 1.
  • The process repeats until i becomes 11. At that point the condition i <= 10 is FALSE and the loop stops.

💻 Example 2: Sum of First 10 Natural Numbers

i <- 1
sum <- 0

while(i <= 10)
{
sum <- sum + i
i <- i + 1
}

print(sum)

Output

[1] 55

💻 Example 3: Multiplication Table of 8

i <- 1

while(i <= 10)
{
print(paste("8 x", i, "=", 8*i))
i <- i + 1
}

Output

[1] "8 x 1 = 8"
[1] "8 x 2 = 16"
[1] "8 x 3 = 24"
...
[1] "8 x 10 = 80"

✔ Advantages of while

  • Suitable when the number of iterations is unknown.
  • Easy to implement.
  • Flexible for condition-based repetition.

✖ Disadvantages

  • May result in an infinite loop if the condition never becomes FALSE.
  • Requires careful updating of the loop variable.
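The examples above all run exactly ten times, so they could also be written with for. The sketch below shows a case where the number of iterations is not known in advance, together with one possible guard against an infinite loop. The starting value of 1000 and the limit max_steps are assumptions chosen for illustration.

```r
value <- 1000
steps <- 0
max_steps <- 100   # safety limit (assumed value) so the loop cannot run forever

# Keep halving until the value drops below 1
while (value >= 1 && steps < max_steps) {
  value <- value / 2
  steps <- steps + 1
}

print(steps)   # 10
print(value)   # 0.9765625
```

The second part of the condition costs very little and guarantees termination even if the first part never becomes FALSE because of a programming mistake.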

🟣 5.11 The repeat Loop


Definition

The repeat loop executes a block of code indefinitely until it is explicitly stopped using the break statement.


Syntax

repeat
{
statements

if(condition)
break
}

💻 Example 4: Print Numbers from 1 to 10

i <- 1

repeat
{
print(i)

i <- i + 1

if(i > 10)
break
}

Output

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10

Explanation

The repeat loop runs continuously until the break statement stops it.


💻 Example 5: Find First Number Divisible by 17

i <- 1

repeat
{
if(i %% 17 == 0)
{
print(i)
break
}

i <- i + 1
}

Output

[1] 17

🔴 5.12 The break Statement


Definition

The break statement immediately terminates the loop, regardless of whether the loop condition is still TRUE.


Syntax

break

💻 Example 6: Stop at Number 6

for(i in 1:10)
{
if(i == 6)
{
break
}

print(i)
}

Output

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

Explanation

When i becomes 6, the loop stops immediately.


🟢 5.13 The next Statement


Definition

The next statement skips the current iteration and continues with the next iteration of the loop.


Syntax

next

💻 Example 7: Skip Number 5

for(i in 1:10)
{
if(i == 5)
{
next
}

print(i)
}

Output

[1] 1
[1] 2
[1] 3
[1] 4
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10

Explanation

The value 5 is skipped because the next statement moves directly to the next iteration.


💻 Example 8: Print Only Odd Numbers

for(i in 1:20)
{
if(i %% 2 == 0)
{
next
}

print(i)
}

Output

[1] 1
[1] 3
[1] 5
[1] 7
[1] 9
[1] 11
[1] 13
[1] 15
[1] 17
[1] 19

📊 Comparison of Loops

Feature | for | while | repeat
Number of Iterations | Known | Unknown | Infinite until break
Condition Checked | Before each iteration | Before each iteration | Inside the loop
Best Use | Fixed repetitions | Condition-controlled repetition | Continuous processes

📊 Comparison of break and next

Feature | break | next
Action | Stops the loop | Skips current iteration
Continues Loop? | ❌ No | ✔ Yes
Typical Use | Exit early | Ignore specific values
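Both statements can appear in the same loop. The following sketch skips multiples of 3 with next and leaves the loop with break once the value exceeds 10; the vector name selected is illustrative.

```r
selected <- c()

for (i in 1:20) {
  if (i %% 3 == 0) {
    next    # skip multiples of 3, continue with the next i
  }
  if (i > 10) {
    break   # leave the loop completely
  }
  selected <- c(selected, i)
}

print(selected)   # 1 2 4 5 7 8 10
```

The loop ends at i = 11, the first value above 10 that is not a multiple of 3, so values 12 to 20 are never visited.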

🌍 Real-Life Applications

  • ATM cash withdrawal limits
  • Password validation
  • Online shopping carts
  • Data cleaning
  • Reading files line by line
  • Sensor monitoring
  • Menu-driven applications
  • Network communication

✔ Advantages

  • Automates repetitive tasks.
  • Supports flexible programming.
  • Easy to combine with conditions.
  • Can process large datasets one element at a time.

✖ Disadvantages

  • Infinite loops may occur if conditions are incorrect.
  • Can reduce readability if nested excessively.

📝 Lab Exercises

Exercise 1

Print numbers from 1 to 50 using a while loop.


Exercise 2

Find the sum of the first 25 natural numbers.


Exercise 3

Print the multiplication table of 12 using a while loop.


Exercise 4

Write a repeat loop that prints numbers from 10 to 1.


Exercise 5

Print numbers from 1 to 20, stopping at 15 using break.


Exercise 6

Print numbers from 1 to 20, skipping multiples of 3 using next.


Exercise 7

Write a program to print only even numbers between 1 and 50.


❓ Viva Questions

  1. What is a while loop?
  2. How does a while loop differ from a for loop?
  3. What is a repeat loop?
  4. Why is the break statement used?
  5. What is the purpose of the next statement?
  6. What happens if a while loop condition never becomes FALSE?
  7. Can a repeat loop execute without a break statement?
  8. Which loop is best when the number of iterations is unknown?

📚 Class Summary

In this class, you learned:

  • The while loop and its syntax.
  • The repeat loop and how it differs from while.
  • The use of break to terminate loops.
  • The use of next to skip iterations.
  • Practical R programs with outputs and explanations.
  • Real-world applications, advantages, limitations, exercises, and viva questions. 



Class 5: Vectorized Operations in R

Duration: 1 Class


🎯 Learning Objectives

After completing this lesson, students will be able to:

  • Understand vectorized operations in R.
  • Perform arithmetic operations on vectors.
  • Apply logical and relational operators.
  • Use mathematical functions with vectors.
  • Compare vectorized operations with loops.

📖 5.14 Introduction to Vectorized Operations

Definition

Vectorization is one of the most powerful features of R. It allows operations to be performed on an entire vector at once instead of processing one element at a time using loops.

This makes R programs:

  • ✔ Faster
  • ✔ Shorter
  • ✔ Easier to read
  • ✔ More efficient

Example

Without vectorization:

a <- c(10,20,30,40,50)
b <- c(2,4,6,8,10)

result <- numeric(5)

for(i in 1:5)
{
result[i] <- a[i] + b[i]
}

print(result)

With vectorization:

a <- c(10,20,30,40,50)
b <- c(2,4,6,8,10)

result <- a + b

print(result)

Both programs produce the same result, but the vectorized version is shorter and more efficient.


🔵 5.15 Arithmetic Operations on Vectors


Sample Data (10 Values)

Vector A

10 20 30 40 50 60 70 80 90 100

Vector B

2 4 6 8 10 12 14 16 18 20

💻 Example 1: Addition

A <- c(10,20,30,40,50,60,70,80,90,100)
B <- c(2,4,6,8,10,12,14,16,18,20)

A + B

Output

[1] 12 24 36 48 60 72 84 96 108 120

💻 Example 2: Subtraction

A - B

Output

[1] 8 16 24 32 40 48 56 64 72 80

💻 Example 3: Multiplication

A * B

Output

[1] 20 80 180 320 500 720 980 1280 1620 2000

💻 Example 4: Division

A / B

Output

[1] 5 5 5 5 5 5 5 5 5 5

💻 Example 5: Power

A^2

Output

[1] 100 400 900 1600 2500 3600 4900 6400 8100 10000

🟢 5.16 Relational Operations

Relational operators compare vector elements and return TRUE or FALSE.


Operators

Operator    Meaning
>           Greater than
<           Less than
>=          Greater than or equal
<=          Less than or equal
==          Equal
!=          Not equal

💻 Example 6

A > 50

Output

[1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE

💻 Example 7

A == 40

Output

[1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE

🟣 5.17 Logical Operations

Logical operators combine multiple conditions.


Operators

Operator    Meaning
&           AND
|           OR
!           NOT

💻 Example 8

(A > 30) & (A < 80)

Output

[1] FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE

💻 Example 9

(A < 30) | (A > 80)

Output

[1] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
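A logical vector such as the ones above is often used to select or count elements. The sketch below also contrasts the element-wise & with &&, which is meant for single TRUE/FALSE values such as an if condition. The names in_range and x are illustrative.

```r
A <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)

# Element-wise AND gives one TRUE/FALSE per element
in_range <- (A > 30) & (A < 80)

# Use the logical vector to pick out the matching elements
print(A[in_range])     # 40 50 60 70

# TRUE counts as 1, so sum() counts the matches
print(sum(in_range))   # 4

# && compares single values, which is what an if condition needs
x <- 45
if (x > 30 && x < 80) {
  print("x is between 30 and 80")
}
```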

🟠 5.18 Mathematical Functions

R provides built-in mathematical functions that work directly on vectors.


Example 10: Square Root

sqrt(A)

Output

[1] 3.162278 4.472136 5.477226 6.324555 7.071068
[6] 7.745967 8.366600 8.944272 9.486833 10.000000

Example 11: Logarithm

log(A)

Output

[1] 2.302585 2.995732 3.401197 3.688879 3.912023
[6] 4.094345 4.248495 4.382027 4.499810 4.605170

Example 12: Exponential

exp(1:5)

Output

[1] 2.718282 7.389056 20.085537 54.598150 148.413159

Example 13: Absolute Value

x <- c(-5,-10,15,-20,25)

abs(x)

Output

[1] 5 10 15 20 25

🔴 5.19 Statistical Functions


Example 14

marks <- c(45,52,58,63,67,72,78,84,90,95)

sum(marks)

mean(marks)

max(marks)

min(marks)

length(marks)

Output

[1] 704

[1] 70.4

[1] 95

[1] 45

[1] 10

⚡ Performance Comparison

Using Loop

result <- numeric(10)

for(i in 1:10)
{
result[i] <- A[i] + B[i]
}

Using Vectorization

result <- A + B

Which is Better?

Feature | Loop | Vectorization
Speed | Slower | Faster
Readability | Moderate | Excellent
Memory Use | One element at a time | May need temporary vectors for very large data
Code Length | Longer | Shorter
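The speed difference is hard to see with 10 values, so the sketch below repeats the comparison with one million values and measures both versions with system.time(). The vector names and the size n are assumptions chosen for illustration, and the measured times depend on the machine, so no timings are shown.

```r
n <- 1e6
A_big <- as.numeric(1:n)   # hypothetical larger test vectors
B_big <- as.numeric(n:1)

# Loop version: one addition per iteration
loop_time <- system.time({
  result_loop <- numeric(n)
  for (i in seq_len(n)) {
    result_loop[i] <- A_big[i] + B_big[i]
  }
})

# Vectorized version: one call for the whole vector
vector_time <- system.time({
  result_vector <- A_big + B_big
})

print(loop_time["elapsed"])
print(vector_time["elapsed"])

# Both versions produce the same values
print(identical(result_loop, result_vector))
```

On most machines the vectorized call is expected to finish noticeably faster, although the exact ratio varies with the R version and hardware.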

🌍 Real-Life Applications

  • Financial analysis
  • Data preprocessing
  • Machine learning
  • Scientific computing
  • Image processing
  • Bioinformatics
  • Statistical analysis
  • Business analytics

✔ Advantages

  • Faster execution.
  • Cleaner code.
  • Less programming effort.
  • Better performance.
  • Optimized for large datasets.

✖ Disadvantages

  • May use more memory for very large vectors.
  • Requires vectors of compatible lengths or understanding of R's recycling rules.
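The recycling rule mentioned above means that a shorter vector is repeated until it matches the length of the longer one. A minimal sketch, with vector names chosen for illustration:

```r
long  <- c(1, 2, 3, 4)
short <- c(10, 20)

# short is recycled to 10, 20, 10, 20
print(long + short)   # 11 22 13 24

# A single number is recycled across the whole vector
print(long * 2)       # 2 4 6 8

# Lengths that are not multiples still work, but R issues a warning:
# "longer object length is not a multiple of shorter object length"
uneven <- suppressWarnings(c(1, 2, 3) + c(10, 20))
print(uneven)         # 11 22 13
```

The warning is suppressed here only to keep the output short; in real analysis it usually signals a mistake in the data lengths.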

📝 Lab Exercises

  1. Create two vectors of 10 elements and perform addition.
  2. Perform subtraction, multiplication, and division on two vectors.
  3. Find the square of every element in a vector.
  4. Check which elements are greater than 50.
  5. Find the square root of all elements.
  6. Calculate the sum and mean of a vector.
  7. Compare the performance of a loop and vectorized addition.
  8. Create a vector of temperatures and convert them from Celsius to Fahrenheit using vectorized operations.

❓ Viva Questions

  1. What is vectorization in R?
  2. Why are vectorized operations faster than loops?
  3. Name four arithmetic operations on vectors.
  4. What are relational operators?
  5. What is the purpose of logical operators?
  6. Which function calculates the square root?
  7. Which function returns the absolute value?
  8. Which function calculates the mean?
  9. How does vectorization improve code readability?
  10. Give two applications of vectorized operations.

📚 Class Summary

In this class, you learned:

  • The concept of vectorization.
  • Arithmetic operations on vectors.
  • Relational and logical operations.
  • Mathematical and statistical functions.
  • Performance comparison between loops and vectorized operations.
  • Practical programs with outputs and explanations.
  • Applications, exercises, and viva questions.



Class 6: The apply() Function in R

Duration: 1 Class


🎯 Learning Objectives

After completing this lesson, students will be able to:

  • Understand the purpose of the apply() function.
  • Perform row-wise and column-wise operations on matrices.
  • Replace explicit loops with concise apply() calls.
  • Apply mathematical and statistical functions to matrices.
  • Analyze matrix data effectively.

📖 5.20 Introduction to the Apply Family

The Apply Family is one of the most powerful features of R. It provides efficient alternatives to explicit loops for processing data.

The Apply Family includes:

Function | Purpose
apply() | Apply a function to rows or columns of a matrix/array
lapply() | Apply a function to each element of a list
sapply() | Simplified version of lapply()
tapply() | Apply a function to groups of data
mapply() | Apply a function to multiple objects simultaneously

In this class, we focus on apply().


🔵 5.21 The apply() Function

Definition

The apply() function applies a specified function to the rows or columns of a matrix or array.

It eliminates the need for explicit loops and makes the code more concise.


Syntax

apply(X, MARGIN, FUN)

Parameters

Parameter | Description
X | Matrix or array
MARGIN = 1 | Apply function to rows
MARGIN = 2 | Apply function to columns
FUN | Function to apply (e.g., sum, mean, max)

📊 Sample Matrix (5 × 2)

We will use the following matrix of 10 values throughout this chapter.

   English Mathematics
S1      65          70
S2      75          82
S3      68          75
S4      90          95
S5      88          91

Creating the Matrix

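# Build a 5 x 2 matrix, filling the values row by row.
marks <- matrix(c(65, 70,
                  75, 82,
                  68, 75,
                  90, 95,
                  88, 91),
                nrow = 5,
                byrow = TRUE)

# Label the columns (subjects) and rows (students).
colnames(marks) <- c("English", "Mathematics")
rownames(marks) <- c("S1", "S2", "S3", "S4", "S5")

marks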

Output

   English Mathematics
S1      65          70
S2      75          82
S3      68          75
S4      90          95
S5      88          91

💻 Example 1: Row-wise Sum

apply(marks,1,sum)

Output

 S1  S2  S3  S4  S5
135 157 143 185 179

Explanation

Here,

  • 1 means rows
  • sum calculates the total marks of each student.

💻 Example 2: Column-wise Sum

apply(marks,2,sum)

Output

    English Mathematics
        386         413

Explanation

2 indicates columns.

The total marks for each subject are calculated.


💻 Example 3: Row-wise Mean

apply(marks,1,mean)

Output

  S1   S2   S3   S4   S5
67.5 78.5 71.5 92.5 89.5

💻 Example 4: Column-wise Mean

apply(marks,2,mean)

Output

    English Mathematics
       77.2        82.6

💻 Example 5: Maximum Marks in Each Subject

apply(marks,2,max)

Output

    English Mathematics
         90          95

💻 Example 6: Minimum Marks

apply(marks,2,min)

Output

    English Mathematics
         65          70

💻 Example 7: Standard Deviation

apply(marks,2,sd)

Output

    English Mathematics
      11.39       10.50

(Approximate values.)


💻 Example 8: Variance

apply(marks,2,var)

Output

    English Mathematics
      129.7       110.3

(Approximate values.)


💻 Example 9: Square Root of All Elements

apply(marks,c(1,2),sqrt)

Output

   English Mathematics
S1    8.06        8.37
S2    8.66        9.06
S3    8.25        8.66
S4    9.49        9.75
S5    9.38        9.54

(Values rounded to two decimal places.)

💻 Example 10: Find Maximum Marks of Each Student

apply(marks,1,max)

Output

S1 S2 S3 S4 S5
70 82 75 95 91

💻 Example 11: Find Minimum Marks of Each Student

apply(marks,1,min)

Output

S1 S2 S3 S4 S5
65 75 68 90 88

📊 Understanding MARGIN

Value | Operation
1 | Apply function row-wise
2 | Apply function column-wise
c(1,2) | Apply function to every element
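The three MARGIN values can be compared side by side. The following is a minimal sketch on a hypothetical 2 × 3 matrix m; the row and column names are assumptions for illustration.

```r
# A small matrix with assumed row and column names.
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
# m is
#    a b c
# r1 1 3 5
# r2 2 4 6

row_totals <- apply(m, 1, sum)                      # one value per row
col_totals <- apply(m, 2, sum)                      # one value per column
doubled    <- apply(m, c(1, 2), function(v) v * 2)  # same shape as m

row_totals   # r1 = 9, r2 = 12
col_totals   # a = 3, b = 7, c = 11

# Base R also provides dedicated helpers for sums and means.
all(row_totals == rowSums(m))
all(col_totals == colSums(m))
```

For plain sums and means, rowSums(), colSums(), rowMeans(), and colMeans() give the same results; apply() is the general tool when any other function is needed.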

🌍 Real-Life Applications

The apply() function is commonly used in:

  • Student result analysis
  • Employee salary calculations
  • Financial data analysis
  • Sales report generation
  • Machine learning data preprocessing
  • Scientific research
  • Medical statistics
  • Data mining

✔ Advantages

  • More concise than explicit loops, with broadly comparable speed (apply() itself loops internally).
  • Reduces program length.
  • Easy to read and maintain.
  • Efficient for matrix operations.
  • Ideal for statistical analysis.

✖ Disadvantages

  • Designed for arrays and matrices; a data frame is first converted to a matrix, which can change column types.
  • Less suitable for irregular data structures (lists with different element types).

📊 Comparison: Loop vs. apply()

Feature | for Loop | apply()
Code Length | Long | Short
Speed | Moderate | Similar (apply() loops internally)
Readability | Good | Excellent
Matrix Operations | Manual | Built-in
Performance | Comparable | Comparable
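The difference in code length is easiest to see with the same calculation written both ways. The following is a sketch using the sample marks matrix from this class; the names loop_totals and apply_totals are assumptions for illustration.

```r
# Sample 5 x 2 marks matrix, as used in this class.
marks <- matrix(c(65, 70, 75, 82, 68, 75, 90, 95, 88, 91),
                nrow = 5, byrow = TRUE,
                dimnames = list(paste0("S", 1:5),
                                c("English", "Mathematics")))

# for loop: preallocate the result, then fill it row by row.
loop_totals <- numeric(nrow(marks))
names(loop_totals) <- rownames(marks)
for (i in seq_len(nrow(marks))) {
  loop_totals[i] <- sum(marks[i, ])
}

# apply(): the same calculation in one call.
apply_totals <- apply(marks, 1, sum)

loop_totals
apply_totals
all.equal(loop_totals, apply_totals)
```

Both versions return the same named vector. Any speed difference depends on the size of the data and is best measured rather than assumed.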

📝 Lab Exercises

Exercise 1

Create a 5 × 2 matrix and calculate the row-wise sum.


Exercise 2

Find the column-wise average of a matrix.


Exercise 3

Find the maximum value in each row.


Exercise 4

Find the minimum value in each column.


Exercise 5

Calculate the standard deviation of each column.


Exercise 6

Calculate the variance of each row.


Exercise 7

Find the square root of every matrix element using apply().


Exercise 8

Compare a for loop with apply() for calculating row sums.


❓ Viva Questions

  1. What is the purpose of the apply() function?
  2. Write the syntax of apply().
  3. What does MARGIN = 1 mean?
  4. What does MARGIN = 2 mean?
  5. Can apply() be used on vectors?
  6. Which data structures are suitable for apply()?
  7. Give two advantages of apply().
  8. Why is apply() preferred over loops?
  9. Name five functions in the Apply Family.
  10. Give two real-life applications of apply().

📚 Class Summary

In this class, you learned:

  • The concept of the Apply Family.
  • Syntax and parameters of apply().
  • Row-wise and column-wise calculations.
  • Matrix operations using statistical functions.
  • Comparison of apply() and loops.
  • Practical R programs with outputs and explanations.
  • Real-world applications, lab exercises, and viva questions.









Class 7: lapply() and sapply() Functions in R

Duration: 1 Class


🎯 Learning Objectives

After completing this lesson, students will be able to:

  • Understand the purpose of lapply() and sapply().
  • Apply functions to list elements.
  • Differentiate between lapply() and sapply().
  • Use these functions for efficient data processing.
  • Analyze list-based data in R.

📖 5.22 Introduction to lapply() and sapply()

In R, lists can store different types of data, such as numbers, characters, vectors, and matrices. The lapply() and sapply() functions are designed to apply a function to each element of a list.

These functions help eliminate repetitive loops and produce concise, efficient code.


🔵 5.23 The lapply() Function

Definition

The lapply() function applies a specified function to each element of a list and always returns a list.


Syntax

lapply(X, FUN)

Parameters

Parameter | Description
X | List or vector
FUN | Function to apply

📊 Sample List

student_data <- list(
  English = c(65, 70, 75, 80, 85),
  Math    = c(72, 78, 82, 88, 90),
  Science = c(68, 74, 79, 84, 89)
)

💻 Example 1: Find Mean of Each Subject

R Program

student_data <- list(
  English = c(65, 70, 75, 80, 85),
  Math    = c(72, 78, 82, 88, 90),
  Science = c(68, 74, 79, 84, 89)
)

lapply(student_data, mean)

Output

$English
[1] 75

$Math
[1] 82

$Science
[1] 78.8

Explanation

The mean() function is applied to each list element individually. The result is returned as a list.


💻 Example 2: Find Sum of Each Subject

lapply(student_data,sum)

Output

$English
[1] 375

$Math
[1] 410

$Science
[1] 394

💻 Example 3: Find Maximum Marks

lapply(student_data,max)

Output

$English
[1] 85

$Math
[1] 90

$Science
[1] 89

💻 Example 4: Find Minimum Marks

lapply(student_data,min)

Output

$English
[1] 65

$Math
[1] 72

$Science
[1] 68

💻 Example 5: Find Standard Deviation

lapply(student_data,sd)

Output

$English
[1] 7.91

$Math
[1] 7.35

$Science
[1] 8.23

(Approximate values.)


🟢 5.24 The sapply() Function

Definition

The sapply() function works like lapply(), but it tries to simplify the result. If possible, it returns a vector or matrix instead of a list.


Syntax

sapply(X, FUN)

💻 Example 6: Mean of Each Subject

sapply(student_data,mean)

Output

English    Math Science
   75.0    82.0    78.8

Explanation

Unlike lapply(), the result is returned as a named vector.


💻 Example 7: Sum of Each Subject

sapply(student_data,sum)

Output

English    Math Science
    375     410     394

💻 Example 8: Maximum Marks

sapply(student_data,max)

Output

English    Math Science
     85      90      89

💻 Example 9: Minimum Marks

sapply(student_data,min)

Output

English    Math Science
     65      72      68

💻 Example 10: Length of Each Subject Vector

sapply(student_data,length)

Output

English    Math Science
      5       5       5

💻 Example 11: Square Root of All Values

sapply(student_data,sqrt)

Output

     English Math Science
[1,]    8.06 8.49    8.25
[2,]    8.37 8.83    8.60
[3,]    8.66 9.06    8.89
[4,]    8.94 9.38    9.17
[5,]    9.22 9.49    9.43

(Values rounded to two decimal places.)

📊 Comparison of lapply() and sapply()

Feature | lapply() | sapply()
Return Type | List | Vector, Matrix, or List
Simplifies Output | No | Yes
Easy to Read | Moderate | Excellent
Suitable For | Complex Objects | Simple Results
Common Usage | Lists | Reports and Summaries
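The difference in return type can be confirmed directly. The following is a minimal sketch; the list scores and the result names are hypothetical.

```r
scores <- list(a = c(1, 2, 3), b = c(4, 5, 6))   # hypothetical data

as_list   <- lapply(scores, mean)   # list with elements $a and $b
as_vector <- sapply(scores, mean)   # named numeric vector

is.list(as_list)        # TRUE
is.numeric(as_vector)   # TRUE
as_vector[["b"]]        # 5

# unlist() converts the lapply() result to the same named vector.
identical(unlist(as_list), as_vector)
```

When every result is a single value, the two functions carry the same information; sapply() simply saves the unlist() step.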

🌍 Real-Life Applications

  • Student result analysis
  • Employee salary reports
  • Machine learning preprocessing
  • Financial reporting
  • Medical research
  • Sales analysis
  • Survey data analysis
  • Scientific computing

✔ Advantages

lapply()

  • Always returns a list.
  • Preserves the original structure.
  • Suitable for complex data.

sapply()

  • Produces simplified output.
  • Easier to use in calculations.
  • Ideal for reports and summaries.

✖ Disadvantages

lapply()

  • Output may require additional extraction.

sapply()

  • Simplification may not always produce the expected structure.
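The point about sapply() simplification can be illustrated with a list whose elements have different lengths. This is a sketch under stated assumptions: the list ragged is hypothetical, and vapply() is shown as one base R alternative that checks the result type, not as the only remedy.

```r
ragged <- list(x = 1:3, y = 1:5)   # hypothetical list with unequal lengths

# range() always returns two values, so sapply() builds a 2-row matrix.
r <- sapply(ragged, range)
is.matrix(r)          # TRUE

# rev() returns vectors of different lengths, so sapply() cannot
# simplify and falls back to a list without any message.
s <- sapply(ragged, rev)
is.list(s)            # TRUE

# vapply() takes the expected type and length up front and raises an
# error if any result does not match.
v <- vapply(ragged, function(el) mean(el), numeric(1))
v                     # x = 2, y = 3
```

The same sapply() call can therefore return a vector, a matrix, or a list depending on the data, which is why the result type is worth checking in longer programs.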

📊 Comparison with Loops

Feature | for Loop | lapply() | sapply()
Speed | Moderate | Comparable or slightly faster | Comparable or slightly faster
Code Length | Long | Short | Short
Readability | Good | Excellent | Excellent
Output | Manual | List | Simplified

📝 Lab Exercises

Exercise 1

Create a list containing marks for three subjects and calculate the mean using lapply().


Exercise 2

Find the sum of each subject using sapply().


Exercise 3

Calculate the maximum and minimum marks for each subject.


Exercise 4

Find the standard deviation of each subject.


Exercise 5

Use sapply() to calculate the length of each vector in the list.


Exercise 6

Find the square root of all values in the list using sapply().


Exercise 7

Compare the outputs of lapply() and sapply() for the same dataset.


❓ Viva Questions

  1. What is the purpose of lapply()?
  2. What is the purpose of sapply()?
  3. What is the main difference between lapply() and sapply()?
  4. Which function always returns a list?
  5. Which function simplifies its output?
  6. Can sapply() return a matrix?
  7. Give two applications of lapply().
  8. Give two applications of sapply().
  9. Why are these functions preferred over loops?
  10. Name five functions in the Apply Family.

📚 Class Summary

In this class, you learned:

  • The purpose of lapply() and sapply().
  • How to apply functions to list elements.
  • The difference between the two functions.
  • Practical R programs with outputs.
  • Real-world applications.
  • Comparison with loops.
  • Lab exercises and viva questions. 









📘 MODULE 5: ADVANCED R PROGRAMMING

Class 8: tapply() and mapply() Functions in R

Duration: 1 Class


🎯 Learning Objectives

After completing this lesson, students will be able to:

  • Understand the purpose of tapply() and mapply().
  • Perform group-wise calculations using tapply().
  • Apply functions to multiple vectors simultaneously using mapply().
  • Compare all members of the Apply Family.
  • Solve practical data analysis problems efficiently.

📖 5.25 The tapply() Function

Definition

The tapply() function is used to apply a function to subsets of a vector, where the subsets are defined by a grouping variable (factor).

It is particularly useful for group-wise statistical analysis.


Syntax

tapply(X, INDEX, FUN)

Parameters

Parameter | Description
X | Numeric vector
INDEX | Grouping factor
FUN | Function to apply (e.g., mean, sum, max)
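One way to understand tapply() is as a split step followed by an apply step. The following is a minimal sketch; the amount and region vectors are hypothetical data.

```r
# Hypothetical data: six sales amounts in two regions.
amount <- c(100, 150, 200, 120, 180, 90)
region <- c("East", "West", "East", "West", "East", "West")

by_region <- tapply(amount, region, sum)
by_region
# East West
#  480  360

# The same grouping written out in two steps.
groups    <- split(amount, region)   # list of vectors, one per region
two_steps <- sapply(groups, sum)

all(by_region == two_steps)
```

INDEX does not have to be a factor already; a character vector such as region is converted to a factor automatically.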

Sample Data (10 Students)

student <- c("S1","S2","S3","S4","S5","S6","S7","S8","S9","S10")

department <- c(
"Science","Commerce","Science","Arts","Commerce",
"Arts","Science","Commerce","Arts","Science")

marks <- c(85,72,90,65,78,70,88,80,75,92)

💻 Example 1: Average Marks by Department

R Program

tapply(marks, department, mean)

Output

    Arts Commerce  Science
   70.00    76.67    88.75

Explanation

The students are grouped by department, and the mean marks are calculated separately for each group.


💻 Example 2: Total Marks by Department

tapply(marks, department, sum)

Output

    Arts Commerce  Science
     210      230      355

💻 Example 3: Maximum Marks

tapply(marks, department, max)

Output

    Arts Commerce  Science
      75       80       92

💻 Example 4: Minimum Marks

tapply(marks, department, min)

Output

    Arts Commerce  Science
      65       72       85

💻 Example 5: Standard Deviation

tapply(marks, department, sd)

Output

    Arts Commerce  Science
    5.00     4.16     2.99

(Approximate values.)


📊 Real-Life Uses of tapply()

  • Average salary by department
  • Sales by region
  • Marks by class
  • Hospital patients by ward
  • Profit by branch
  • Employee performance by team

🟢 5.26 The mapply() Function


Definition

The mapply() function applies a function to multiple vectors or lists simultaneously.

It is the multivariate version of sapply().


Syntax

mapply(FUN, vector1, vector2, ...)
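Two details of mapply() are easy to miss: the result is a list when the individual results differ in length, and the MoreArgs argument passes values that stay fixed for every call. The following is a minimal sketch; all data values and the 10% tax rate are assumptions for illustration.

```r
# Hypothetical inputs.
times <- c(3, 2, 1)
value <- c("a", "b", "c")

# Results of different lengths cannot be simplified, so a list is returned.
repeated <- mapply(rep, value, times)
is.list(repeated)        # TRUE
repeated[["a"]]          # "a" "a" "a"

# MoreArgs supplies arguments that stay the same for every call.
tax_rate <- 0.1
price    <- c(100, 200)
quantity <- c(2, 3)
totals <- mapply(function(p, q, rate) p * q * (1 + rate),
                 price, quantity,
                 MoreArgs = list(rate = tax_rate))
totals                   # 220 660
```

For simple element-wise arithmetic such as price * quantity, the vectorized expression alone gives the same result; mapply() is most useful when the function is not itself vectorized.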

Sample Data

A <- c(10,20,30,40,50)

B <- c(2,4,6,8,10)

💻 Example 6: Addition

mapply(function(x,y)x+y,A,B)

Output

[1] 12 24 36 48 60

💻 Example 7: Multiplication

mapply(function(x,y)x*y,A,B)

Output

[1] 20 80 180 320 500

💻 Example 8: Division

mapply(function(x,y)x/y,A,B)

Output

[1] 5 5 5 5 5

💻 Example 9: Power Function

base <- c(2,3,4,5,6)

power <- c(2,2,2,2,2)

mapply(function(x,y)x^y,base,power)

Output

[1] 4 9 16 25 36

💻 Example 10: Maximum of Two Numbers

mapply(max,A,B)

Output

[1] 10 20 30 40 50

💻 Example 11: Minimum of Two Numbers

mapply(min,A,B)

Output

[1] 2 4 6 8 10

💻 Example 12: Product of Price and Quantity

Sample Data

price <- c(150,250,300,120,400)

quantity <- c(2,1,3,4,2)

R Program

mapply(function(p,q)p*q,price,quantity)

Output

[1] 300 250 900 480 800

Explanation

Each product price is multiplied by its corresponding quantity to calculate the total cost.


📊 Comparison of the Apply Family

Function | Input | Returns | Best Used For
apply() | Matrix | Vector/Matrix | Row-wise & column-wise operations
lapply() | List | List | Complex list processing
sapply() | List | Vector/Matrix | Simplified summaries
tapply() | Vector + Group | Group-wise result | Statistical analysis
mapply() | Multiple vectors | Vector/List | Multiple input operations

🌍 Real-Life Applications

tapply()

  • Sales by region
  • Student marks by department
  • Employee salary by designation
  • Hospital patient analysis
  • Banking transaction summaries

mapply()

  • Billing systems
  • Invoice generation
  • Salary calculations
  • Shopping cart totals
  • Financial computations

✔ Advantages

tapply()

  • Performs grouped calculations efficiently.
  • Reduces coding effort.
  • Ideal for statistical reporting.

mapply()

  • Works with multiple vectors simultaneously.
  • Eliminates nested loops.
  • Improves readability and efficiency.

✖ Disadvantages

  • mapply() recycles shorter vectors, so inputs should have equal or compatible lengths to avoid unexpected results.
  • tapply() needs an appropriate grouping factor.

📝 Lab Exercises

Exercise 1

Create a vector of marks and departments. Find the average marks department-wise using tapply().


Exercise 2

Find the maximum marks department-wise.


Exercise 3

Calculate the standard deviation for each department.


Exercise 4

Create two vectors and perform addition using mapply().


Exercise 5

Multiply two vectors element-wise using mapply().


Exercise 6

Calculate the total cost of products using price and quantity vectors.


Exercise 7

Compare the outputs of apply(), lapply(), sapply(), tapply(), and mapply() using suitable datasets.


❓ Viva Questions

  1. What is the purpose of tapply()?
  2. What is a grouping factor in tapply()?
  3. Write the syntax of tapply().
  4. What is the purpose of mapply()?
  5. How is mapply() different from sapply()?
  6. Give two real-life applications of tapply().
  7. Give two applications of mapply().
  8. Which Apply Family function is used for grouped statistical analysis?
  9. Which function processes multiple vectors simultaneously?
  10. List all five Apply Family functions in R.

📚 Class Summary

In this class, you learned:

  • The tapply() function for group-wise calculations.
  • The mapply() function for operations on multiple vectors.
  • Practical examples with outputs.
  • Comparison of all Apply Family functions.
  • Real-world applications.
  • Lab exercises and viva questions.




Class 9: Debugging Tools in R (debug(), trace(), browser(), debugonce())

Duration: 1 Class


🎯 Learning Objectives

After completing this lesson, students will be able to:

  • Understand the concept of debugging.
  • Identify different types of programming errors.
  • Use built-in debugging tools in R.
  • Trace and inspect program execution.
  • Find and fix logical and runtime errors.

📖 5.27 Introduction to Debugging

Definition

Debugging is the process of finding, identifying, and correcting errors (bugs) in a program.

Every programmer encounters errors while writing code. Debugging tools help locate these errors efficiently.


🌟 Why Debugging is Important?

Debugging helps to:

  • Find programming mistakes.
  • Correct logical errors.
  • Prevent program crashes.
  • Improve code quality.
  • Reduce development time.
  • Increase program reliability.

📊 Types of Errors in R

Error Type | Description | Example
Syntax Error | Incorrect grammar | Missing )
Runtime Error | Error during execution | log("ABC") (note that 10/0 returns Inf in R, not an error)
Logical Error | Program runs but gives wrong output | Incorrect formula

🔵 Example 1: Syntax Error

Incorrect Program

x <- 10
y <- 20

print(x+y

Output

Error: unexpected end of input

Correct Program

x <- 10
y <- 20

print(x+y)

Output

[1] 30

🔴 Example 2: Logical Error

Incorrect Program

length <- 10
breadth <- 5

area <- 2*(length+breadth)

print(area)

Output

[1] 30

Explanation

The program calculates the perimeter, not the area.


Correct Program

length <- 10
breadth <- 5

area <- length * breadth

print(area)

Output

[1] 50

🟣 5.28 The debug() Function

Definition

The debug() function places a function into debugging mode. The function pauses before executing each statement, allowing you to inspect variables and execution flow.


Syntax

debug(function_name)

💻 Example 3

square <- function(x)
{
y <- x*x
return(y)
}

debug(square)

square(5)

Console Output (Simplified)

debugging in: square(5)

Browse[2]>

Explanation

The program enters debug mode and pauses before each statement. At the Browse prompt you can inspect variables, type n to run the next statement, c to continue to the end of the function, or Q to quit the debugger.


🟢 5.29 The debugonce() Function

Definition

debugonce() works like debug(), but it enables debugging only for the next function call.


Syntax

debugonce(function_name)

💻 Example 4

cube <- function(x)
{
x^3
}

debugonce(cube)

cube(4)

Output

debugging in: cube(4)

Browse[2]>

Explanation

The debugger is activated only once. Future calls to cube() run normally unless debugonce() is called again.


🟡 5.30 The trace() Function

Definition

The trace() function inserts temporary debugging or tracing code into an existing function without modifying the original function definition.


Syntax

trace(function_name)

💻 Example 5

add <- function(a,b)
{
a+b
}

trace(add)

add(10,20)

Output (Simplified)

trace: add(10, 20)

[1] 30

Explanation

The trace message shows when the function is called, helping you understand program flow. Call untrace(add) to turn tracing off.


🔵 5.31 The browser() Function

Definition

The browser() function pauses program execution at a specific point, allowing you to inspect variables interactively.


Syntax

browser()

💻 Example 6

calculate <- function(x,y)
{
browser()

z <- x+y

print(z)
}

calculate(15,25)

Output (Simplified)

Called from: calculate(15,25)

Browse[1]>

Explanation

Execution stops at browser(). You can inspect variables, execute commands, and continue execution after checking the program state.


💻 Example 7: Removing Debug Mode

undebug(square)

Explanation

The undebug() function disables debugging for the specified function.
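Whether a function is currently flagged for debugging can be checked with isdebugged(). The following is a minimal sketch that runs without opening the interactive debugger, because the function is only called after undebug(); the names flag_on and flag_off are assumptions for illustration.

```r
square <- function(x) {
  x * x
}

debug(square)
flag_on <- isdebugged(square)    # TRUE: the next call would open the debugger

undebug(square)
flag_off <- isdebugged(square)   # FALSE: calls run normally again

flag_on
flag_off
square(5)   # 25
```

Checking the flag is useful in long sessions where it is unclear which functions still have debugging switched on.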


📊 Debugging Workflow

Write Program
      ↓
Run Program
      ↓
Error Found?

  • No → End
  • Yes → Use Debugging Tools → Fix the Error → Test Again

📊 Comparison of Debugging Functions

Function | Purpose | Stops Execution?
debug() | Debug every call | ✔ Yes
debugonce() | Debug next call only | ✔ Yes
trace() | Trace function execution | ✖ Usually No
browser() | Pause at a specific line | ✔ Yes
undebug() | Remove debug mode | ✖ No

🌍 Real-Life Applications

  • Software development
  • Data science projects
  • Financial applications
  • Machine learning model debugging
  • Scientific computing
  • Statistical analysis
  • Database programming
  • Web application development

✔ Advantages

  • Quickly identifies programming errors.
  • Improves program reliability.
  • Saves development time.
  • Helps understand program execution.
  • Useful for large R projects.

✖ Disadvantages

  • Can slow program execution.
  • Requires understanding of program flow.
  • Excessive debugging may become time-consuming.

📝 Lab Exercises

Exercise 1

Create a function to calculate the square of a number and debug it using debug().


Exercise 2

Use debugonce() with a factorial function.


Exercise 3

Insert browser() into a function and inspect variable values.


Exercise 4

Trace a user-defined function using trace().


Exercise 5

Enable debugging and then remove it using undebug().


Exercise 6

Create a program with a logical error and identify it using debugging techniques.


Exercise 7

Write a function to calculate the average of five numbers and debug the function.


❓ Viva Questions

  1. What is debugging?
  2. Why is debugging important?
  3. What are the three main types of errors?
  4. What is the purpose of debug()?
  5. How does debugonce() differ from debug()?
  6. What is the purpose of trace()?
  7. What does the browser() function do?
  8. How can you remove debugging from a function?
  9. Which debugging function pauses execution at a specified line?
  10. Give two real-life applications of debugging.

📚 Class Summary

In this class, you learned:

  • The concept and importance of debugging.
  • Types of programming errors.
  • Using debug(), debugonce(), trace(), browser(), and undebug().
  • Practical debugging examples with outputs.
  • Debugging workflow.
  • Real-world applications.
  • Lab exercises and viva questions.





Class 10: Error Handling in R (try(), tryCatch(), warning(), stop())

Duration: 1 Class


🎯 Learning Objectives

After completing this lesson, students will be able to:

  • Understand error handling in R.
  • Use try() to prevent program termination.
  • Handle errors using tryCatch().
  • Generate warning messages with warning().
  • Stop execution using stop().
  • Write robust and fault-tolerant R programs.

📖 5.32 Introduction to Error Handling

Definition

Error handling is the process of detecting, managing, and responding to errors that occur during program execution.

Instead of allowing a program to terminate unexpectedly, error handling enables the program to continue executing gracefully or display meaningful messages.


Why is Error Handling Important?

Error handling helps to:

  • Prevent unexpected program crashes.
  • Improve software reliability.
  • Provide user-friendly error messages.
  • Handle invalid input safely.
  • Simplify debugging and maintenance.

📊 Types of Conditions in R

Type | Description | Example
Error | Stops program execution | Invalid operations such as log("ABC") (note that 10/0 returns Inf, not an error)
Warning | Displays a warning but continues execution | sqrt(-1)
Message | Provides informational messages | Package loading messages
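The three condition types can be told apart in code. The following is a minimal sketch; describe() is a hypothetical helper written for this illustration, not a built-in R function.

```r
# Classify what an expression signals. describe() is a hypothetical helper.
describe <- function(expr) {
  tryCatch({
    expr                      # lazy evaluation: expr is evaluated here
    "no condition"
  },
  error   = function(e) "error",
  warning = function(w) "warning",
  message = function(m) "message")
}

describe(log("ABC"))          # "error"
describe(sqrt(-1))            # "warning"
describe(message("Loading"))  # "message"
describe(10 / 0)              # "no condition" (the result is Inf)
describe(1 + 1)               # "no condition"
```

The helper relies on R evaluating function arguments lazily, so the expression is only run inside tryCatch(), where its conditions can be caught.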

🔵 5.33 The try() Function

Definition

The try() function executes an expression and captures any errors without stopping the entire program.


Syntax

try(expression)

Example 1: Division

result <- try(10 / 2)

print(result)

Output

[1] 5

Example 2: Invalid Operation

x <- "ABC"

result <- try(log(x))

print(result)

Output

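Error in log(x) : non-numeric argument to mathematical function

The program continues running even though an error occurs. The variable result then holds an object of class "try-error" containing the error message, which print(result) displays.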


Advantages of try()

  • Prevents abrupt program termination.
  • Useful in loops and batch processing.
  • Easy to implement.
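The use of try() in loops and batch processing can be sketched as follows. The inputs list is hypothetical data; silent = TRUE and inherits() are standard base R features.

```r
inputs <- list(100, "ABC", 1)   # hypothetical mixed inputs

results <- vector("list", length(inputs))
for (i in seq_along(inputs)) {
  # silent = TRUE hides the error message; the error is still captured.
  results[[i]] <- try(log10(inputs[[i]]), silent = TRUE)
}

# A failed call leaves an object of class "try-error" in its slot.
failed <- sapply(results, inherits, what = "try-error")
failed
# [1] FALSE  TRUE FALSE

results[[1]]   # 2
results[[3]]   # 0
```

The loop completes all three iterations even though the second input is invalid, and the failed vector shows which items need attention afterwards.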

🟢 5.34 The tryCatch() Function

Definition

The tryCatch() function provides advanced error handling by allowing different actions for errors, warnings, and successful execution.


Syntax

tryCatch(
  expression,
  error   = function(e) { },
  warning = function(w) { },
  finally = { }
)

💻 Example 3: Handle an Error

result <- tryCatch(

{
log("ABC")
},

error = function(e)
{
print("Error Detected")
}
)

Output

[1] "Error Detected"

💻 Example 4: Successful Execution

result <- tryCatch(

{
sqrt(64)
},

error = function(e)
{
print("Error")
}
)

print(result)

Output

[1] 8

💻 Example 5: Using finally

tryCatch(

{
print("Program Started")
},

finally=
{
print("Program Finished")
}
)

Output

[1] "Program Started"

[1] "Program Finished"
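The error, warning, and finally parts can be combined in a single call. The following is a minimal sketch; safe_sqrt() is a hypothetical helper, and returning NA for bad input is an assumption made for this illustration.

```r
# safe_sqrt() is a hypothetical helper written for this sketch.
safe_sqrt <- function(x) {
  tryCatch({
    sqrt(x)
  },
  warning = function(w) {
    # sqrt() of a negative number signals a warning.
    message("Warning caught: ", conditionMessage(w))
    NA_real_
  },
  error = function(e) {
    # Non-numeric input signals an error.
    message("Error caught: ", conditionMessage(e))
    NA_real_
  },
  finally = {
    # Runs in every case, whether or not a condition occurred.
    message("Finished checking input.")
  })
}

safe_sqrt(64)      # 8
safe_sqrt(-4)      # NA, after the warning handler runs
safe_sqrt("ABC")   # NA, after the error handler runs
```

conditionMessage() extracts the text of the caught condition, so the handler can report what went wrong without stopping the program.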

🟡 5.35 The warning() Function

Definition

The warning() function displays a warning message but does not stop the program.


Syntax

warning("Message")

💻 Example 6

marks <- -10

if(marks < 0)
{
warning("Marks cannot be negative.")
}

Output

Warning message:
Marks cannot be negative.

Explanation

The program continues executing after displaying the warning.


🔴 5.36 The stop() Function

Definition

The stop() function immediately terminates execution and displays an error message.


Syntax

stop("Error Message")

💻 Example 7

age <- -5

if(age < 0)
{
stop("Age cannot be negative.")
}

Output

Error: Age cannot be negative.

Explanation

The program stops immediately because stop() generates an error.


💻 Example 8: Combining warning() and stop()

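temperature <- -300

# "else" must follow the closing brace on the same line; otherwise R
# reports "unexpected 'else'" when the code runs at the top level.
if (temperature < -273.15) {
  stop("Temperature below absolute zero is not possible.")
} else if (temperature < 0) {
  warning("Temperature is below freezing.")
} else {
  print("Temperature is valid.")
}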

Output

Error: Temperature below absolute zero is not possible.

📊 Comparison of Error Handling Functions

Function | Stops Program | Purpose
try() | ❌ No | Continue after an error
tryCatch() | ❌ No | Handle errors and warnings gracefully
warning() | ❌ No | Display warning message
stop() | ✔ Yes | Terminate program with an error

🌍 Real-Life Applications

  • Banking software
  • E-commerce applications
  • Online registration systems
  • Student management systems
  • Medical data processing
  • Machine learning pipelines
  • Scientific computing
  • Financial analysis

✔ Best Practices

  • Validate user input before processing.
  • Use meaningful error and warning messages.
  • Handle expected errors with tryCatch().
  • Use stop() only for critical errors.
  • Test programs with both valid and invalid inputs.
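The practices above can be combined in one small validation routine. The following is a sketch under stated assumptions: validate_marks() and check() are hypothetical helpers, the 0 to 100 range follows the lab exercise on student marks, and the "unusually low" threshold of 10 is chosen only for illustration.

```r
# validate_marks() is a hypothetical helper; the threshold of 10 for
# "unusually low" marks is an assumption made for this sketch.
validate_marks <- function(marks) {
  if (!is.numeric(marks)) {
    stop("Marks must be numeric.")
  }
  if (marks < 0 || marks > 100) {
    stop("Marks must be between 0 and 100.")
  }
  if (marks < 10) {
    warning("Marks are unusually low.")
  }
  marks
}

# check() turns each outcome into a short status string.
check <- function(marks) {
  tryCatch({
    validate_marks(marks)
    "valid"
  },
  warning = function(w) paste("warning:", conditionMessage(w)),
  error   = function(e) paste("error:", conditionMessage(e)))
}

check(85)     # "valid"
check(5)      # "warning: Marks are unusually low."
check(150)    # "error: Marks must be between 0 and 100."
check("A")    # "error: Marks must be numeric."
```

Keeping the validation in one function and the handling in another means stop() is reserved for values that cannot be processed, while unusual but possible values only raise a warning.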

📝 Lab Exercises

Exercise 1

Use try() to execute a division operation safely.


Exercise 2

Use tryCatch() to handle invalid numeric input.


Exercise 3

Create a warning if marks are negative.


Exercise 4

Use stop() when age is less than zero.


Exercise 5

Write a function that checks whether a number is positive. If not, display an appropriate warning or error.


Exercise 6

Create a simple calculator and handle division by zero using tryCatch().


Exercise 7

Write a program to validate student marks (0–100). Display a warning for unusual values and stop execution for invalid values.


❓ Viva Questions

  1. What is error handling?
  2. Why is error handling important?
  3. What is the purpose of try()?
  4. How does tryCatch() differ from try()?
  5. What is the purpose of warning()?
  6. When should stop() be used?
  7. Does warning() terminate program execution?
  8. What is the role of the finally block in tryCatch()?
  9. Give two real-life applications of error handling.
  10. Why should programs validate user input?

📚 Module 5 Summary

In Module 5: Advanced R Programming, you learned:

  • Control Structures (if, if...else, switch)
  • Looping Constructs (for, while, repeat)
  • break and next
  • Vectorized Operations
  • The Apply Family (apply(), lapply(), sapply(), tapply(), mapply())
  • Debugging (debug(), debugonce(), trace(), browser())
  • Error Handling (try(), tryCatch(), warning(), stop())

You also practiced each concept through:

  • ✔ Step-by-step explanations
  • ✔ R programs with sample data
  • ✔ Expected outputs
  • ✔ Real-world applications
  • ✔ Comparison tables
  • ✔ Lab exercises
  • ✔ Viva questions

🎓 End of Module 5