R LANGUAGE

BASIC :

1. ASSIGN 2 VALUES USING <- OPERATOR AND PRINT SUM OF TWO VALUES IN R LANGUAGE - CLICK CLICK

2. ASSIGN 2 VALUES USING -> OPERATOR AND PRINT SUM USING CAT() IN R LANGUAGE - CLICK CLICK

3. ASSIGN 2 VALUES USING = OPERATOR AND PRINT SUM OF TWO VALUES IN R LANGUAGE - CLICK CLICK

4. ASSIGN 2 VALUES USING -> OPERATOR AND PRINT SUM OF TWO VALUES IN R LANGUAGE - CLICK CLICK

5. ARITHMETIC OPERATOR IN R LANGUAGE ( USING TWO INTEGER VALUES ) - CLICK CLICK

6. ARITHMETIC OPERATOR IN R LANGUAGE ( USING TWO FLOAT VALUES ) - CLICK CLICK

7. RELATIONAL OPERATORS IN R LANGUAGE - CLICK CLICK

8. LOGICAL OPERATORS IN R LANGUAGE - CLICK CLICK

9. IF ELSE STRUCTURE IN R LANGUAGE - CLICK CLICK

10. NESTED IF STRUCTURE IN R LANGUAGE - CLICK

11. INPUT STRING FROM USER AND PRINT IN R LANGUAGE - CLICK

12. FOR LOOP STRUCTURE IN R LANGUAGE - CLICK

13. WHILE LOOP STRUCTURE IN R LANGUAGE - CLICK

14. SWITCH CASE IN R LANGUAGE - CLICK

15. SWITCH CASE WHEN CASE VALUE GIVEN BY USER IN R LANGUAGE - CLICK

VECTOR:

1. VECTOR CREATION, MODIFICATION, DELETION, BASIC OPERATION OF TWO VECTORS IN R LANGUAGE -- CLICK

2. SUM AND AVERAGE OF ELEMENTS IN VECTOR IN R LANGUAGE - CLICK

3. LARGEST AND SMALLEST ELEMENT IN VECTOR IN R LANGUAGE - CLICK

4. LINEAR SEARCH IN R LANGUAGE - CLICK

5. BINARY SEARCH IN R LANGUAGE - CLICK

6. BUBBLE SORT IN R LANGUAGE - CLICK

7. INSERTION SORT IN R LANGUAGE - CLICK

8. SELECTION SORT IN R LANGUAGE - CLICK

9. RANDOMIZED QUICK SORT IN R LANGUAGE - CLICK

10. MERGE SORT IN R LANGUAGE - CLICK

11. VECTOR USING RANDOM VALUES - CLICK

MATRIX :

1. MATRIX CREATION IN R LANGUAGE , ADDITION OF TWO MATRICES IN R LANGUAGE , SUBTRACTION OF TWO MATRICES IN R LANGUAGE, MULTIPLICATION OF TWO MATRICES IN R LANGUAGE, DIVISION OF TWO MATRICES IN R LANGUAGE - CLICK

APPLY FAMILY IN R LANGUAGE:

1.APPLY() IN R LANGUAGE: CLICK

2.LAPPLY() IN R LANGUAGE: CLICK

3.SAPPLY () IN R LANGUAGE: CLICK

UNIVERSITY ASSIGNMENT - CLICK HERE

DATA HANDLING :

1. READ DATA FROM CSV FILE I) CLICK II) CLICK III) CLICK

2. Read data from CSV files - FIND LARGEST AND SMALLEST I) CLICK II) CLICK

3. WRITE DATA TO CSV FILE I) CLICK II) CLICK

4. DATA / SUBSET FROM CSV FILE IN R LANGUAGE CLICK

5. Read data from txt file CLICK

DATA PROCESSING / CLEANING

1. Type conversion in R language CLICK

📘 Module 1: Introduction to R Programming

(6 Classes)

🎯 Learning Outcomes

After completing this module, students will be able to:

✔ Install R and RStudio

✔ Understand the RStudio Interface

✔ Write basic R programs

✔ Perform arithmetic and logical operations

✔ Work with different data types

✔ Create and manipulate vectors, lists, matrices and data frames

✔ Understand factors and categorical variables

CLASS 1

Introduction to R

What is R?

R is an open-source programming language specially designed for

Data Analysis
Statistics
Machine Learning
Artificial Intelligence
Data Visualization
Research

It was developed by

Ross Ihaka
Robert Gentleman

at the University of Auckland.

Today R is developed by the R Core Team and supported by the R Foundation.

Why Learn R?

Advantages

✔ Free

✔ Open Source

✔ Easy to Learn

✔ Powerful Graphics

✔ Huge Package Library

✔ Excellent Statistical Functions

✔ Cross Platform

Applications of R

R is widely used in

Data Science
Business Analytics
Bioinformatics
Finance
Healthcare
Marketing
Machine Learning
Research

Installing R

Step 1

Download R from

https://cran.r-project.org

Install normally.

Step 2

Download RStudio

https://posit.co/download/rstudio-desktop/

Install after installing R.

CLASS 2

RStudio Interface

When RStudio opens, four main windows appear.


+---------------------+----------------------+
| Source Editor       | Environment          |
|                     | History              |
+---------------------+----------------------+
| Console             | Files                |
|                     | Plots                |
|                     | Packages             |
|                     | Help                 |
+---------------------+----------------------+

1. Source

Used to

Write scripts
Save programs
Edit code

Shortcut


Ctrl + Shift + N

2. Console

Used to execute commands immediately.

Example


5+10

Output


[1] 15

3. Environment

Shows

Variables
Data
Functions

4. Files

Displays project files.

5. Plots

Displays graphs.

6. Packages

Shows installed packages.

7. Help

Displays documentation.

Example


help(mean)

Understanding the R Command Prompt

Console Prompt

The > symbol at the start of a console line means R is ready to accept a command.

Example


> 5+2
[1] 7

CLASS 3

Basic Operations

Arithmetic Operators

Operator	Meaning
+	Addition
-	Subtraction
*	Multiplication
/	Division
^	Power
%%	Modulus
%/%	Integer Division

Example


a <- 20
b <- 6

a+b
a-b
a*b
a/b
a%%b
a%/%b
a^2

Output


[1] 26
[1] 14
[1] 120
[1] 3.333333
[1] 2
[1] 3
[1] 400

Comparison Operators

Operator	Meaning
>	Greater than
<	Less than
>=	Greater than or equal to
<=	Less than or equal to
==	Equal to
!=	Not equal to

Example


10>5
5==5
10!=2

Output


TRUE
TRUE
TRUE

Logical Operators

Operator	Meaning
&	AND
|	OR
!	NOT

Example


TRUE & FALSE
TRUE | FALSE
!TRUE

Output


FALSE
TRUE
FALSE
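
The same operators also work element by element on logical vectors, which is how conditions on data are usually combined. The following is a small sketch; the marks values and variable names are illustrative assumptions, not part of the lesson data.

```r
# Hypothetical marks for three students (illustrative values)
marks <- c(35, 72, 90)

passed      <- marks >= 40   # FALSE TRUE TRUE
distinction <- marks >= 75   # FALSE FALSE TRUE

# Element-wise AND, OR and NOT applied to whole vectors
passed_without_distinction <- passed & !distinction
passed_or_distinction      <- passed | distinction

print(passed_without_distinction)
print(passed_or_distinction)
```

Each position in the result corresponds to one student, so the vectors can be used directly to filter data.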

CLASS 4

Data Types

R supports many data types.

Numeric


x <- 10.5

class(x)
typeof(x)

Output


"numeric"

"double"

Integer


x <- 10L

class(x)

Output


"integer"

Character


name <- "Rahul"

class(name)

Output


"character"

Logical


flag <- TRUE

class(flag)

Output


"logical"

Factor


gender <- factor(c("Male","Female","Male"))

gender

Output


[1] Male   Female Male
Levels: Female Male

Variable Assignment

There are three commonly used assignment operators.


x <- 10

y = 20

30 -> z

Output


x=10

y=20

z=30

Variable Naming Rules

✔ Can contain letters

✔ Numbers

✔ Underscore

✔ Dot

A name cannot start with a number or an underscore, and it cannot contain hyphens or spaces.

Correct


student_name

age

salary1

marks.math

Wrong


1age

my-name
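
One way to check the rules above is the base R function make.names(), which rewrites any string into a syntactically valid name. If the result differs from the input, the input was not a valid name. This is only a sketch; the vector name candidates is an assumption made for illustration.

```r
# Names taken from the Correct and Wrong lists above
candidates <- c("student_name", "age", "salary1", "marks.math",
                "1age", "my-name")

# make.names() returns a valid version of each string;
# a name is valid when nothing had to be changed
fixed    <- make.names(candidates)
is_valid <- fixed == candidates

print(data.frame(candidates, fixed, is_valid))
```

The same check also flags reserved words such as if, which make.names() alters as well.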

CLASS 5

Data Structures in R

Vector

A vector stores elements of the same data type.

Create Vector


marks <- c(80,90,75,85,95)

marks

Output


80 90 75 85 95

Length


length(marks)

Output


[1] 5

Class


class(marks)

Output


"numeric"

Type


typeof(marks)

Output


"double"

Indexing


marks[2]

Output


[1] 90

Multiple Values


marks[c(2,4)]

Output


[1] 90 85
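
Besides single positions and position vectors, R also accepts ranges, negative positions, and logical conditions inside the square brackets. A short sketch using the same marks vector; the result variable names are assumptions made for illustration.

```r
marks <- c(80, 90, 75, 85, 95)

middle_three <- marks[2:4]         # positions 2 to 4: 90 75 85
without_first <- marks[-1]         # drop position 1: 90 75 85 95
above_80 <- marks[marks > 80]      # keep values greater than 80: 90 85 95

print(middle_three)
print(without_first)
print(above_80)
```

Note that indexing in R starts at 1, not 0.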

Functions


sum(marks)

mean(marks)

max(marks)

min(marks)

Output


[1] 425
[1] 85
[1] 95
[1] 75

List

Lists store different data types.


student <- list(
Name="Amit",
Age=20,
Marks=85,
Passed=TRUE
)

student

Output


$Name
"Amit"

$Age
20

$Marks
85

$Passed
TRUE

Access


student$Name

student[[2]]

Output


"Amit"

20
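
Single and double square brackets behave differently on a list: single brackets return a smaller list, while double brackets (and $) return the element itself. A minimal sketch; the Course value "BCA" is a hypothetical addition used only to show how a new element is attached.

```r
student <- list(Name = "Amit", Age = 20, Marks = 85, Passed = TRUE)

age_as_list  <- student["Age"]     # still a list, of length 1
age_as_value <- student[["Age"]]   # the number 20 itself

# Add a new element by assigning to a new name (hypothetical value)
student$Course <- "BCA"

print(class(age_as_list))
print(age_as_value)
print(names(student))
```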

Matrix

Stores data in rows and columns.


mat <- matrix(1:9,nrow=3,ncol=3)

mat

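Output


     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

matrix() fills the values column by column unless byrow=TRUE is given.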

Indexing


mat[2,3]

Output


[1] 8

Matrix Addition


A<-matrix(1:4,2,2)

B<-matrix(5:8,2,2)

A+B

Output


     [,1] [,2]
[1,]    6   10
[2,]    8   12
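
Addition works element by element, and so does the * operator. True matrix multiplication uses the separate %*% operator. The sketch below reuses A and B from the example above; the result variable names are assumptions made for illustration.

```r
A <- matrix(1:4, 2, 2)   # columns: (1, 2) and (3, 4)
B <- matrix(5:8, 2, 2)   # columns: (5, 6) and (7, 8)

elementwise <- A * B     # multiplies matching cells
matprod     <- A %*% B   # rows of A times columns of B
A_transpose <- t(A)      # swaps rows and columns

print(elementwise)
print(matprod)
print(A_transpose)
```

For example, the top-left cell of matprod is 1*5 + 3*6 = 23, while the top-left cell of elementwise is simply 1*5 = 5.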

CLASS 6

Data Frame and Factors

Data Frame

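A data frame stores tabular data, where each column can have a different data type. It is the most widely used data structure for data analysis in R.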


student <- data.frame(

Roll=c(1,2,3),

Name=c("A","B","C"),

Marks=c(90,85,95)

)

student

Output


  Roll Name Marks
1    1    A    90
2    2    B    85
3    3    C    95

Structure


str(student)

Summary


summary(student)

Access Column


student$Marks

First Row


student[1,]
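
The same [row, column] notation can select rows by condition rather than by position. A brief sketch using the student data frame from above; the threshold of 90 and the result variable names are illustrative assumptions.

```r
student <- data.frame(
  Roll  = c(1, 2, 3),
  Name  = c("A", "B", "C"),
  Marks = c(90, 85, 95)
)

# Rows where the condition is TRUE, all columns
toppers <- student[student$Marks >= 90, ]

# Same rows, only the Name column
topper_names <- student[student$Marks >= 90, "Name"]

# A single cell: row 2, column Marks
second_row_marks <- student[2, "Marks"]

print(toppers)
print(topper_names)
print(second_row_marks)
```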

Import CSV


data <- read.csv("student.csv")

head(data)

Export CSV


write.csv(student,"student.csv")

Factors

Factors store categorical data.

Example


grade <- factor(c(

"A",

"B",

"A",

"C",

"B"

))

grade

Output


[1] A B A C B
Levels: A B C

Levels


levels(grade)

Output


[1] "A" "B" "C"

Frequency


table(grade)

Output


grade
A B C
2 2 1
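
Internally, a factor stores each value as an integer code that points to one of its levels. When the categories have a natural order, an ordered factor also allows comparisons. This is a sketch only; the ordering C < B < A is an assumption about how the grades rank.

```r
grade <- factor(c("A", "B", "A", "C", "B"))

codes    <- as.integer(grade)   # position of each value in levels(grade)
n_levels <- nlevels(grade)      # number of distinct categories

# Assumed ranking: C is lowest, A is highest
grade_ordered <- factor(grade, levels = c("C", "B", "A"), ordered = TRUE)
at_least_b    <- grade_ordered >= "B"

print(codes)
print(n_levels)
print(at_least_b)
```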

Summary of Data Structures

Data Structure	Stores
Vector	Same Data Type
List	Different Data Types
Matrix	2D Same Data Type
Data Frame	Tabular Data
Factor	Categorical Data

Common Built-in Functions

Function	Purpose
length()	Number of elements
class()	Data class
typeof()	Internal type
sum()	Addition
mean()	Average
max()	Maximum
min()	Minimum
str()	Structure
summary()	Summary
head()	First rows
tail()	Last rows
table()	Frequency
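
The difference between class() and typeof() in the table above is easiest to see on a data frame and a factor: class() reports how R treats the object, while typeof() reports how it is stored internally. A minimal sketch; the object names df and f are assumptions made for illustration.

```r
df <- data.frame(x = 1:3)
f  <- factor(c("A", "B"))

df_class <- class(df)    # "data.frame"
df_type  <- typeof(df)   # "list": a data frame is stored as a list of columns

f_class <- class(f)      # "factor"
f_type  <- typeof(f)     # "integer": a factor is stored as integer codes

print(c(df_class, df_type))
print(c(f_class, f_type))
```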

Practical Exercises

Create two variables and perform all arithmetic operations.
Compare two numbers using comparison operators.
Demonstrate logical operators using TRUE and FALSE.
Create variables of numeric, integer, character, logical, and factor types.
Create a vector of 10 numbers and calculate its sum, mean, maximum, and minimum.
Create a list containing a student's name, age, course, and marks.
Create a 3×3 matrix and print the second row.
Create a data frame of five students with roll number, name, and marks.
Import a CSV file and display the first five records.
Create a factor for student grades and display the frequency of each grade.

Viva Questions

What is R?
What is RStudio?
What is the difference between R and RStudio?
What are the data types in R?
Explain vectors with an example.
What is a list?
What is a matrix?
What is a data frame?
What are factors?
Explain the difference between class() and typeof().
What is the use of summary()?
What is indexing in R?
How do you import a CSV file?
How do you export a CSV file?
Why are factors important in statistical analysis?

📘 Module 2: Data Manipulation and Management (10 Classes)

📚 Syllabus

1. Data Import and Export

Reading data from CSV files
Reading data from Excel files
Writing data to CSV files
Writing data to Excel files

2. Data Cleaning and Preparation

Handling missing values (NA)
Detecting and removing duplicates
Data type conversion
Renaming rows and columns

3. Data Transformation

Selecting columns (select())
Filtering rows (filter())
Arranging data (arrange())
Creating new variables (mutate())
Transforming variables (transmute())
Summarizing data (summarise())
Grouping data (group_by())

📖 Class-wise Course Plan

Class	Topics
Class 1	Introduction to Data Manipulation, Reading CSV Files (`read.csv()`)
Class 2	Reading Excel Files (`readxl`), Importing Different File Formats
Class 3	Writing Data to CSV and Excel (`write.csv()`, `writexl`)
Class 4	Data Cleaning: Missing Values (`NA`), `is.na()`, `na.omit()`
Class 5	Handling Duplicate Records, Data Type Conversion
Class 6	Renaming Rows and Columns, Working with Data Frames
Class 7	Data Transformation: `select()`, `filter()`, `arrange()`
Class 8	`mutate()`, `transmute()`, Creating New Variables
Class 9	`summarise()`, `group_by()`, Statistical Summaries
Class 10	Complete Data Cleaning & Transformation Case Study, Revision, Viva Questions, Lab Exercises

Class 1: Data Import and Export – Reading Data from CSV Files

Duration: 1 Class

🎯 Learning Objectives

After completing this lesson, students will be able to:

Understand the concept of data import.
Know different file formats supported by R.
Read CSV files into R.
Display and inspect imported data.
Understand the structure of a data frame.
Perform basic data exploration.

📖 2.1 Introduction to Data Import

Definition

Data Import is the process of loading data from external sources into R for analysis and visualization.

Most real-world datasets are stored in external files such as:

CSV Files
Excel Files
Text Files
JSON Files
Database Tables

R provides powerful functions to import these datasets efficiently.

🌟 Why Data Import is Important?

Data import is the first step in any data analysis project because it allows users to work with real-world datasets.

Advantages

Imports large datasets quickly.
Supports multiple file formats.
Easy to analyze imported data.
Compatible with data visualization and machine learning.

📊 Common Data File Formats

File Format	Extension	Description
CSV	`.csv`	Comma-Separated Values
Excel	`.xlsx`	Microsoft Excel Workbook
Text	`.txt`	Plain Text File
JSON	`.json`	JavaScript Object Notation
R Data	`.RData`	Native R Data File

📖 2.2 What is a CSV File?

CSV stands for Comma-Separated Values.

Each row represents one record, and each column represents one variable.

CSV is the most widely used format for data exchange because it is simple and supported by almost every software application.

📊 Sample CSV Dataset (10 Records)

File Name: employee.csv

Emp_ID	Name	Department	Age	Salary
101	Amit	HR	25	30000
102	Priya	Sales	28	35000
103	Rahul	IT	30	50000
104	Sneha	HR	27	32000
105	Karan	IT	35	60000
106	Neha	Finance	31	55000
107	Arjun	Sales	29	40000
108	Pooja	Finance	33	58000
109	Rohan	IT	26	45000
110	Anjali	HR	32	52000

📖 2.3 Creating a CSV File

The dataset above can be saved in Notepad or Microsoft Excel as:


employee.csv

CSV Content


Emp_ID,Name,Department,Age,Salary
101,Amit,HR,25,30000
102,Priya,Sales,28,35000
103,Rahul,IT,30,50000
104,Sneha,HR,27,32000
105,Karan,IT,35,60000
106,Neha,Finance,31,55000
107,Arjun,Sales,29,40000
108,Pooja,Finance,33,58000
109,Rohan,IT,26,45000
110,Anjali,HR,32,52000

🔵 2.4 Reading a CSV File

Method 1: Using `read.csv()`

Syntax


read.csv(file, header = TRUE)

Parameters

Parameter	Description
`file`	CSV file path
`header`	TRUE if the first row contains column names
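
The effect of the header argument can be seen by reading the same file both ways. The sketch below writes a small temporary file first so that it runs on its own; the three-column sample and the variable names are assumptions made for illustration.

```r
# Create a small temporary CSV file (illustrative content)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Emp_ID,Name,Salary",
             "101,Amit,30000",
             "102,Priya,35000"), csv_path)

# First row used as column names
with_header <- read.csv(csv_path, header = TRUE)

# First row treated as data; columns are named V1, V2, V3
without_header <- read.csv(csv_path, header = FALSE)

print(names(with_header))
print(names(without_header))
print(c(nrow(with_header), nrow(without_header)))
```

With header = FALSE the column titles become an ordinary data row, so the result has one extra row and every column is read as text.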

💻 Example 1: Read Employee Data


employee <- read.csv("employee.csv")

employee

Output


   Emp_ID   Name Department Age Salary
1     101   Amit         HR  25  30000
2     102  Priya      Sales  28  35000
3     103  Rahul         IT  30  50000
4     104  Sneha         HR  27  32000
5     105  Karan         IT  35  60000
6     106   Neha    Finance  31  55000
7     107  Arjun      Sales  29  40000
8     108  Pooja    Finance  33  58000
9     109  Rohan         IT  26  45000
10    110 Anjali         HR  32  52000

Explanation

read.csv() imports the CSV file.
The data is stored as a data frame.
Each row represents one employee.
Each column represents one variable.

💻 Example 2: View the First Six Records


head(employee)

Output


  Emp_ID  Name Department Age Salary
1    101  Amit         HR  25  30000
2    102 Priya      Sales  28  35000
3    103 Rahul         IT  30  50000
4    104 Sneha         HR  27  32000
5    105 Karan         IT  35  60000
6    106  Neha    Finance  31  55000

💻 Example 3: View the Last Six Records


tail(employee)

Output


   Emp_ID   Name Department Age Salary
5     105  Karan         IT  35  60000
6     106   Neha    Finance  31  55000
7     107  Arjun      Sales  29  40000
8     108  Pooja    Finance  33  58000
9     109  Rohan         IT  26  45000
10    110 Anjali         HR  32  52000

💻 Example 4: Display Structure of Dataset


str(employee)

Output


'data.frame': 10 obs. of 5 variables:

$ Emp_ID      : int
$ Name        : chr
$ Department  : chr
$ Age         : int
$ Salary      : int

Explanation

str() displays:

Number of rows
Number of columns
Data types of variables

💻 Example 5: Dataset Dimensions


dim(employee)

Output


[1] 10 5

Interpretation: The dataset contains 10 rows and 5 columns.

💻 Example 6: Column Names


colnames(employee)

Output


[1] "Emp_ID" "Name" "Department" "Age" "Salary"

💻 Example 7: Row Names


rownames(employee)

Output


[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"

💻 Example 8: Summary of Dataset


summary(employee)

Output (Example)


Emp_ID
Min. :101
1st Qu.:103.25
Median :105.5
Mean :105.5
3rd Qu.:107.75
Max. :110

Age
Min. :25
Mean :29.6
Max. :35

Salary
Min. :30000
Mean :45700
Max. :60000

💻 Example 9: Display Individual Column


employee$Salary

Output


[1] 30000 35000 50000 32000 60000
[6] 55000 40000 58000 45000 52000

💻 Example 10: Display Multiple Columns


employee[,c("Name","Salary")]

Output


     Name Salary
1    Amit  30000
2   Priya  35000
3   Rahul  50000
4   Sneha  32000
5   Karan  60000
6    Neha  55000
7   Arjun  40000
8   Pooja  58000
9   Rohan  45000
10 Anjali  52000

📊 Common Functions for Exploring Data

Function	Purpose
`head()`	First 6 rows
`tail()`	Last 6 rows
`str()`	Structure
`summary()`	Statistical summary
`dim()`	Rows and columns
`nrow()`	Number of rows
`ncol()`	Number of columns
`colnames()`	Column names
`rownames()`	Row names

🌍 Real-Life Applications

Importing student records
Employee databases
Sales reports
Banking transactions
Hospital patient data
Survey results
Research datasets
Machine learning datasets

✔ Advantages of CSV Files

Easy to create and edit.
Lightweight and portable.
Supported by Excel, R, Python, and databases.
Ideal for data exchange.

✖ Limitations

Does not store formatting.
Does not support formulas.
No multiple worksheets (unlike Excel).
Data types are not preserved automatically.
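
The last limitation matters in practice: read.csv() guesses each column's type, so an identifier with leading zeros is read as a number unless a type is specified. The sketch below uses the colClasses argument of read.csv(); the two-row sample with zero-padded IDs is a hypothetical dataset, not the employee file used in this class.

```r
# Hypothetical CSV content with zero-padded IDs
csv_text <- "Emp_ID,Name\n007,Amit\n012,Priya"

# Default: Emp_ID is guessed to be an integer, so the zeros are lost
default_read <- read.csv(text = csv_text)

# colClasses forces Emp_ID to be read as text
typed_read <- read.csv(text = csv_text,
                       colClasses = c(Emp_ID = "character"))

print(default_read$Emp_ID)
print(typed_read$Emp_ID)
```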

📝 Lab Exercises

Create an employee.csv file with 10 employee records.
Import the file using read.csv().
Display the first and last six records.
Find the number of rows and columns.
Display the structure of the dataset.
Print only the Name and Salary columns.
Generate a statistical summary using summary().

❓ Viva Questions

What is a CSV file?
What is the purpose of read.csv()?
What does the header argument do?
Which function displays the first six rows?
Which function shows the structure of a dataset?
How do you display column names?
What is the difference between head() and tail()?
What information does summary() provide?
Name two advantages of CSV files.
Give two real-world applications of importing CSV data.

📚 Class Summary

In this class, you learned:

The concept of data import.
CSV file structure.
Reading CSV files using read.csv().
Exploring datasets with head(), tail(), str(), dim(), and summary().
Practical examples using a 10-record employee dataset.
Real-world applications, advantages, limitations, exercises, and viva questions.

Class 2: Data Import and Export – Reading Data from Excel Files (.xlsx)

Duration: 1 Class

🎯 Learning Objectives

After completing this lesson, students will be able to:

Understand Excel file formats.
Install and use the readxl package.
Read Excel files into R.
Import specific worksheets.
Read multiple sheets from an Excel workbook.
Explore imported data using R functions.
Compare CSV and Excel file formats.

📖 2.5 Introduction to Excel Files

Definition

An Excel file is a spreadsheet created using Microsoft Excel. It stores data in rows and columns and may contain multiple worksheets, formulas, charts, and formatting.

Unlike CSV files, Excel files can store multiple sheets in a single workbook.

🌟 Advantages of Excel Files

Multiple worksheets in one file
Supports formulas and functions
Can contain charts and graphs
Easy to edit using Microsoft Excel
Widely used in businesses and organizations

📊 Excel File Extensions

Extension	Description
`.xls`	Excel 97–2003 Workbook
`.xlsx`	Excel 2007 and Later Workbook
`.xlsm`	Macro-Enabled Workbook

📖 2.6 The `readxl` Package

The readxl package is used to import Excel files into R.

If it is not installed, install it once using:

Install Package


install.packages("readxl")

Load Package


library(readxl)

📊 Sample Excel File

File Name: employee.xlsx

Worksheet: Employee

Emp_ID	Name	Department	Age	Salary
101	Amit	HR	25	30000
102	Priya	Sales	28	35000
103	Rahul	IT	30	50000
104	Sneha	HR	27	32000
105	Karan	IT	35	60000
106	Neha	Finance	31	55000
107	Arjun	Sales	29	40000
108	Pooja	Finance	33	58000
109	Rohan	IT	26	45000
110	Anjali	HR	32	52000

📖 2.7 Reading an Excel File

Syntax


read_excel(path)

💻 Example 1: Read an Excel File


library(readxl)

employee <- read_excel("employee.xlsx")

employee

Output


# A tibble: 10 × 5
   Emp_ID Name   Department   Age Salary
    <dbl> <chr>  <chr>      <dbl>  <dbl>
 1    101 Amit   HR            25  30000
 2    102 Priya  Sales         28  35000
 3    103 Rahul  IT            30  50000
 4    104 Sneha  HR            27  32000
 5    105 Karan  IT            35  60000
 6    106 Neha   Finance       31  55000
 7    107 Arjun  Sales         29  40000
 8    108 Pooja  Finance       33  58000
 9    109 Rohan  IT            26  45000
10    110 Anjali HR            32  52000

💻 Example 2: Read a Specific Worksheet

Suppose the workbook contains two sheets:

Employee
Salary


library(readxl)

employee <- read_excel(
"employee.xlsx",
sheet="Employee"
)

employee

Output

Displays all records from the Employee worksheet.

💻 Example 3: Read Sheet by Number


library(readxl)

employee <- read_excel(
"employee.xlsx",
sheet=1
)

Output

Imports the first worksheet.

💻 Example 4: Display Available Sheet Names


library(readxl)

excel_sheets("employee.xlsx")

Output


[1] "Employee"

[2] "Salary"
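
Combining excel_sheets() with read_excel() gives one way to read every sheet of a workbook into a named list. The sketch below is self-contained: it first creates a small temporary workbook with the writexl package, and it runs only if both readxl and writexl are installed. The two-row sheet contents are hypothetical sample data.

```r
if (requireNamespace("readxl", quietly = TRUE) &&
    requireNamespace("writexl", quietly = TRUE)) {

  # Build a temporary two-sheet workbook (hypothetical data)
  path <- tempfile(fileext = ".xlsx")
  writexl::write_xlsx(
    list(
      Employee = data.frame(Emp_ID = c(101, 102), Name = c("Amit", "Priya")),
      Salary   = data.frame(Emp_ID = c(101, 102), Salary = c(30000, 35000))
    ),
    path
  )

  # Read each sheet by name and keep the results in a named list
  sheet_names <- readxl::excel_sheets(path)
  all_sheets  <- lapply(sheet_names,
                        function(s) readxl::read_excel(path, sheet = s))
  names(all_sheets) <- sheet_names

  print(names(all_sheets))
  print(all_sheets$Salary)
}
```

Each element of all_sheets is a tibble, so all_sheets$Employee and all_sheets$Salary can be explored with head(), str(), and summary() as usual.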

💻 Example 5: Read Selected Columns


library(readxl)

employee <- read_excel(
"employee.xlsx",
range=cell_cols("A:C")
)

employee

Output


Emp_ID Name Department

101 Amit HR

102 Priya Sales

103 Rahul IT

...

110 Anjali HR

💻 Example 6: Read Specific Cell Range


library(readxl)

employee <- read_excel(
"employee.xlsx",
range="A1:E6"
)

employee

Output

Imports only cells A1 to E6: the header row plus the first five employee records.

💻 Example 7: View Dataset Structure


str(employee)

Output


tibble [10 × 5]

Emp_ID : numeric

Name : character

Department : character

Age : numeric

Salary : numeric

💻 Example 8: Display Summary


summary(employee)

Output


Emp_ID

Min :101

Mean :105.5

Max :110

Age

Min :25

Mean :29.6

Max :35

Salary

Min :30000

Mean :45700

Max :60000

💻 Example 9: First Six Records


head(employee)

Output


First six employee records are displayed.

💻 Example 10: Last Six Records


tail(employee)

Output


Last six employee records are displayed.

📊 Comparison: CSV vs Excel

Feature	CSV	Excel
File Extension	`.csv`	`.xlsx`
Multiple Sheets	❌ No	✅ Yes
Supports Formatting	❌ No	✅ Yes
Supports Charts	❌ No	✅ Yes
File Size	Small	Larger
Speed	Faster	Slightly Slower
Best For	Data Exchange	Business Reports

🌍 Real-Life Applications

Student attendance records
Employee payroll
Banking reports
Hospital patient data
Sales reports
Inventory management
Research datasets
Financial statements

✔ Advantages of `readxl`

Reads Excel files directly.
Supports .xls and .xlsx.
Imports selected sheets.
Imports selected cell ranges.
Fast and reliable.

✖ Limitations

Cannot modify Excel files (reading only).
Formatting is not imported.
Macros are ignored.
Charts and images are not imported.

📝 Lab Exercises

Exercise 1

Install the readxl package.

Exercise 2

Read an Excel file named employee.xlsx.

Exercise 3

Display available worksheet names.

Exercise 4

Read only the first worksheet.

Exercise 5

Import only columns A to C.

Exercise 6

Import rows 1–6 from the worksheet.

Exercise 7

Display the structure and summary of the imported dataset.

❓ Viva Questions

What is an Excel workbook?
Which package is used to read Excel files in R?
Which function imports Excel data?
What is the purpose of excel_sheets()?
How do you read a worksheet by name?
How do you read a worksheet by number?
What is the difference between CSV and Excel?
Can readxl read .xls files?
Can readxl import charts?
Give two applications of Excel data import.

📚 Class Summary

In this class, you learned:

Introduction to Excel files.
Installing and loading the readxl package.
Reading Excel files with read_excel().
Importing specific worksheets and ranges.
Viewing sheet names with excel_sheets().
Comparing CSV and Excel formats.
Practical R programs with outputs.
Real-world applications, exercises, and viva questions.

Class 3: Data Export – Writing Data to CSV and Excel Files

Duration: 1 Class

🎯 Learning Objectives

After completing this lesson, students will be able to:

Understand data export in R.
Write data frames to CSV files.
Write data frames to Excel files.
Export selected columns and filtered data.
Save processed data for future use.
Understand the differences between CSV and Excel exports.

📖 2.8 Introduction to Data Export

Definition

Data Export is the process of saving data from R into an external file so that it can be used in other software such as Microsoft Excel, LibreOffice Calc, databases, or shared with others.

Common export formats include:

CSV (.csv)
Excel (.xlsx)
Text (.txt)
RData (.RData)

🌟 Why Data Export is Important?

Data export allows users to:

Save processed datasets.
Share reports with others.
Store analysis results.
Create backup copies.
Use data in other applications.

📊 Sample Dataset (10 Records)


employee <- data.frame(

Emp_ID=c(101,102,103,104,105,106,107,108,109,110),

Name=c("Amit","Priya","Rahul","Sneha","Karan",
       "Neha","Arjun","Pooja","Rohan","Anjali"),

Department=c("HR","Sales","IT","HR","IT",
             "Finance","Sales","Finance","IT","HR"),

Age=c(25,28,30,27,35,31,29,33,26,32),

Salary=c(30000,35000,50000,32000,60000,
         55000,40000,58000,45000,52000)
)

employee

Output


   Emp_ID   Name Department Age Salary
1     101   Amit         HR  25  30000
2     102  Priya      Sales  28  35000
3     103  Rahul         IT  30  50000
4     104  Sneha         HR  27  32000
5     105  Karan         IT  35  60000
6     106   Neha    Finance  31  55000
7     107  Arjun      Sales  29  40000
8     108  Pooja    Finance  33  58000
9     109  Rohan         IT  26  45000
10    110 Anjali         HR  32  52000

🔵 2.9 Writing Data to a CSV File

Syntax


write.csv(data, file, row.names = FALSE)

Parameters

Parameter	Description
`data`	Data frame to export
`file`	Output file name
`row.names=FALSE`	Prevents row numbers from being written
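
The effect of row.names can be checked by writing the same data frame both ways and reading the files back. This is a sketch under simple assumptions: a two-row sample data frame and temporary file paths, so it does not touch the employee.csv used in this class.

```r
# Small sample data frame (illustrative)
employee <- data.frame(Emp_ID = c(101, 102),
                       Name   = c("Amit", "Priya"))

with_rn    <- tempfile(fileext = ".csv")
without_rn <- tempfile(fileext = ".csv")

write.csv(employee, with_rn)                       # default: row names written
write.csv(employee, without_rn, row.names = FALSE) # row names skipped

cols_with_rn    <- names(read.csv(with_rn))
cols_without_rn <- names(read.csv(without_rn))

print(cols_with_rn)
print(cols_without_rn)
```

When row names are written, the file gains an unnamed first column, which read.csv() imports as a column called X.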

💻 Example 1: Export Entire Dataset


write.csv(employee,
          "employee.csv",
          row.names=FALSE)

Output


write.csv() prints nothing to the console; the file employee.csv is created in the working directory.

💻 Example 2: Export Selected Columns


emp_salary <- employee[,c("Name","Salary")]

write.csv(emp_salary,
          "salary.csv",
          row.names=FALSE)

Output


salary.csv created successfully.

💻 Example 3: Export Employees from IT Department


IT_emp <- subset(employee,
                 Department=="IT")

write.csv(IT_emp,
          "IT_Employees.csv",
          row.names=FALSE)

Output


IT_Employees.csv created successfully.

💻 Example 4: Export Employees with Salary > 50,000


high_salary <- subset(employee,
                      Salary>50000)

write.csv(high_salary,
          "HighSalary.csv",
          row.names=FALSE)

Output


HighSalary.csv created successfully.

🟢 2.10 Writing Data to Excel Files

The writexl package is one way to export data frames from R to Excel files.

Install Package


install.packages("writexl")

Load Package


library(writexl)

Syntax


write_xlsx(data, path)

💻 Example 5: Export to Excel


library(writexl)

write_xlsx(employee,
           "employee.xlsx")

Output


employee.xlsx created successfully.

💻 Example 6: Export Salary Data


salary_data <- employee[,c("Name","Salary")]

write_xlsx(salary_data,
           "EmployeeSalary.xlsx")

Output


EmployeeSalary.xlsx created successfully.

💻 Example 7: Export HR Department


HR_emp <- subset(employee,
                 Department=="HR")

write_xlsx(HR_emp,
           "HR_Department.xlsx")

Output


HR_Department.xlsx created successfully.

💻 Example 8: Export Finance Department


Finance_emp <- subset(employee,
                      Department=="Finance")

write_xlsx(Finance_emp,
           "Finance.xlsx")

Output


Finance.xlsx created successfully.

💻 Example 9: Export Employees Older Than 30


older_emp <- subset(employee,
                    Age>30)

write.csv(older_emp,
          "AgeAbove30.csv",
          row.names=FALSE)

Output


AgeAbove30.csv created successfully.

💻 Example 10: Export Summary Statistics


summary_data <- summary(employee)

write.table(summary_data,
            "Summary.txt")

Output


Summary.txt created successfully.

📊 Comparison: `write.csv()` vs `write_xlsx()`

Feature	`write.csv()`	`write_xlsx()`
Output Format	CSV	Excel
Multiple Sheets	❌ No	✅ Yes (pass a named list of data frames)
File Size	Smaller	Larger
Readable in Excel	✅ Yes	✅ Yes
Supports Formatting	❌ No	Limited

🌍 Real-Life Applications

Exporting employee payroll reports.
Saving student examination results.
Generating monthly sales reports.
Creating financial statements.
Exporting survey responses.
Sharing machine learning results.
Backing up processed datasets.
Sending reports to management.

✔ Advantages

Saves processed data permanently.
Easy to share with others.
Compatible with Excel and other software.
Useful for report generation.
Supports automation.

✖ Limitations

CSV files cannot store formatting.
Excel export requires an additional package.
Charts and formulas are not exported automatically.

📝 Lab Exercises

Create a data frame containing 10 student records.
Export the data frame to a CSV file.
Export only the Name and Marks columns.
Export students scoring more than 80 marks.
Export the dataset to an Excel file.
Create separate Excel files for different departments.
Generate a summary report and save it as a text file.

❓ Viva Questions

What is data export?
Which function exports data to CSV?
Why is row.names = FALSE commonly used?
Which package is used to export Excel files?
What is the purpose of write_xlsx()?
Can CSV files store formatting?
Name two advantages of exporting data.
What is the difference between CSV and Excel export?
How can you export only selected columns?
Give two real-life applications of data export.

📚 Class Summary

In this class, you learned:

The concept of data export.
Writing data frames to CSV files using write.csv().
Writing Excel files using the writexl package.
Exporting filtered and selected datasets.
Comparison of CSV and Excel exports.
Practical examples with outputs.
Real-world applications, lab exercises, and viva questions.

Class 4: Data Cleaning and Preparation – Handling Missing Values (NA)

Duration: 1 Class

🎯 Learning Objectives

After completing this lesson, students will be able to:

Understand missing values (NA) in R.
Identify missing values in datasets.
Count missing values.
Remove missing values.
Replace missing values.
Perform statistical analysis after handling missing data.

📖 2.11 Introduction to Data Cleaning

Definition

Data Cleaning is the process of detecting, correcting, or removing inaccurate, incomplete, duplicate, or inconsistent data from a dataset.

Data cleaning is one of the most important steps in Data Science, Machine Learning, and Statistical Analysis because the quality of the analysis depends on the quality of the data.

🌟 Why Data Cleaning is Important?

Data cleaning helps to:

Improve data quality.
Increase the accuracy of analysis.
Remove errors and inconsistencies.
Handle missing values effectively.
Improve machine learning model performance.

📖 2.12 What are Missing Values?

A missing value is a data value that is unavailable or unknown. In R, missing values are represented by NA (Not Available).

Common Causes of Missing Values

Data entry errors
Survey respondents skipping questions
Equipment or sensor failures
Data transmission errors
Incomplete records

📊 Sample Dataset (10 Records)


student <- data.frame(

Roll_No=c(1,2,3,4,5,6,7,8,9,10),

Name=c("Amit","Priya","Rahul","Sneha","Karan",
       "Neha","Arjun","Pooja","Rohan","Anjali"),

Marks=c(85,NA,78,92,NA,81,75,88,NA,95),

Age=c(20,21,20,22,21,NA,20,22,21,20)

)

student

Output


   Roll_No   Name Marks Age
1        1   Amit    85  20
2        2  Priya    NA  21
3        3  Rahul    78  20
4        4  Sneha    92  22
5        5  Karan    NA  21
6        6   Neha    81  NA
7        7  Arjun    75  20
8        8  Pooja    88  22
9        9  Rohan    NA  21
10      10 Anjali    95  20

📖 2.13 Detecting Missing Values

Syntax


is.na(object)

is.na() checks each value and returns TRUE if it is missing, otherwise FALSE.
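
A common mistake is to test for missing values with ==, which does not work because any comparison with NA is itself NA. A short sketch on a plain vector; the values and variable names are illustrative assumptions.

```r
marks <- c(85, NA, 78)

missing_flags <- is.na(marks)        # FALSE TRUE FALSE
wrong_test    <- marks == NA         # NA NA NA, never TRUE
has_missing   <- anyNA(marks)        # TRUE if at least one NA exists
missing_pos   <- which(is.na(marks)) # positions of the missing values

print(missing_flags)
print(wrong_test)
print(has_missing)
print(missing_pos)
```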

💻 Example 1: Detect Missing Values


is.na(student)

Output


      Roll_No  Name Marks   Age
 [1,]   FALSE FALSE FALSE FALSE
 [2,]   FALSE FALSE  TRUE FALSE
 [3,]   FALSE FALSE FALSE FALSE
 [4,]   FALSE FALSE FALSE FALSE
 [5,]   FALSE FALSE  TRUE FALSE
 [6,]   FALSE FALSE FALSE  TRUE
 [7,]   FALSE FALSE FALSE FALSE
 [8,]   FALSE FALSE FALSE FALSE
 [9,]   FALSE FALSE  TRUE FALSE
[10,]   FALSE FALSE FALSE FALSE

💻 Example 2: Count Missing Values


sum(is.na(student))

Output


[1] 4

Explanation: There are 4 missing values in the dataset.

💻 Example 3: Missing Values in Each Column


colSums(is.na(student))

Output


Roll_No   0
Name      0
Marks     3
Age       1

💻 Example 4: Missing Values in Each Row


rowSums(is.na(student))

Output


[1] 0 1 0 0 1 1 0 0 1 0

Rows 2, 5, 6, and 9 each contain one missing value.

📖 2.14 Removing Missing Values

Syntax


na.omit(data)

💻 Example 5: Remove Missing Records


clean_student <- na.omit(student)

clean_student

Output


   Roll_No   Name Marks Age
1        1   Amit    85  20
3        3  Rahul    78  20
4        4  Sneha    92  22
7        7  Arjun    75  20
8        8  Pooja    88  22
10      10 Anjali    95  20

Explanation

Rows containing missing values are removed.
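
na.omit() drops a row if any column is missing. When only one column matters, it can be better to filter on that column alone, and the base R function complete.cases() shows which rows na.omit() would keep. A minimal sketch on a hypothetical four-row dataset; the variable names are assumptions made for illustration.

```r
# Hypothetical four-row dataset with missing values
student <- data.frame(
  Roll_No = 1:4,
  Marks   = c(85, NA, 78, 92),
  Age     = c(20, 21, NA, 22)
)

# TRUE for rows with no missing value in any column
keep  <- complete.cases(student)
clean <- student[keep, ]

# Drop rows only where Marks is missing; a missing Age is tolerated
marks_known <- student[!is.na(student$Marks), ]

print(keep)
print(clean)
print(marks_known)
```

Here clean keeps two rows, while marks_known keeps three, so less data is lost.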

📖 2.15 Replacing Missing Values

Instead of deleting rows, missing values can be replaced.

💻 Example 6: Replace Missing Marks with Zero


student$Marks[is.na(student$Marks)] <- 0

student

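Output


   Roll_No   Name Marks Age
1        1   Amit    85  20
2        2  Priya     0  21
3        3  Rahul    78  20
4        4  Sneha    92  22
5        5  Karan     0  21
6        6   Neha    81  NA
7        7  Arjun    75  20
8        8  Pooja    88  22
9        9  Rohan     0  21
10      10 Anjali    95  20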

💻 Example 7: Replace Missing Age with Mean Age


student$Age[is.na(student$Age)] <- mean(student$Age, na.rm=TRUE)

student

Output

The NA in Age (row 6, Neha) is replaced with the mean of the nine known ages: 187 / 9 ≈ 20.78.

Explanation

na.rm=TRUE ignores missing values while calculating the mean.

💻 Example 8: Calculate Mean Without Missing Values


mean(student$Marks, na.rm=TRUE)

Output


[1] 84.85714

💻 Example 9: Calculate Median


median(student$Marks, na.rm=TRUE)

Output


[1] 85

💻 Example 10: Standard Deviation


sd(student$Marks, na.rm=TRUE)

Output


[1] 7.34

(Approximate value.)

📖 2.16 Methods for Handling Missing Values

Method	Description
Delete rows	Remove incomplete records
Replace with Mean	Numerical data
Replace with Median	Skewed numerical data
Replace with Mode	Categorical data
Predict Missing Values	Machine learning techniques
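The median and mode rows of the table can be sketched as follows. The marks vector reuses the Marks values from this class; the dept vector and the helper stat_mode() are hypothetical, introduced only for illustration (base R has no built-in function for the statistical mode).

```r
# marks reuses the lesson's Marks column; dept is a hypothetical categorical column.
marks <- c(85, NA, 78, 92, NA, 81, 75, 88, NA, 95)
dept <- c("HR", "IT", NA, "IT", "Sales", "IT", NA, "HR")

# Median imputation for numeric data.
marks_filled <- marks
marks_filled[is.na(marks_filled)] <- median(marks, na.rm = TRUE)

# Helper (assumption, not a base R function): most frequent non-missing value.
stat_mode <- function(x) {
  counts <- table(x)                 # table() ignores NA by default
  names(counts)[which.max(counts)]   # first value wins if there is a tie
}

# Mode imputation for categorical data.
dept_filled <- dept
dept_filled[is.na(dept_filled)] <- stat_mode(dept)

print(marks_filled)
print(dept_filled)
```

The median is less affected by extreme values than the mean, which is why the table suggests it for skewed data.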

📊 Useful Functions

Function	Purpose
`is.na()`	Detect missing values
`sum(is.na())`	Count missing values
`colSums(is.na())`	Missing values by column
`rowSums(is.na())`	Missing values by row
`na.omit()`	Remove missing rows
`mean(..., na.rm=TRUE)`	Ignore missing values
`median(..., na.rm=TRUE)`	Ignore missing values
`sd(..., na.rm=TRUE)`	Standard deviation without missing values

🌍 Real-Life Applications

Student attendance records
Hospital patient databases
Banking transactions
Insurance claims
Sales and inventory management
Customer feedback analysis
Survey data cleaning
Machine learning preprocessing

✔ Advantages

Improves data quality.
Increases analysis accuracy.
Prevents errors in statistical calculations.
Enhances model performance.
Produces reliable reports.

✖ Disadvantages

Removing records may reduce dataset size.
Replacing values may introduce bias if done incorrectly.
Requires careful selection of imputation methods.

📝 Lab Exercises

Create a dataset with 10 student records containing missing values.
Detect missing values using is.na().
Count total missing values.
Find missing values in each column.
Find missing values in each row.
Remove missing records using na.omit().
Replace missing marks with 0.
Replace missing ages with the mean age.
Calculate the mean and median while ignoring missing values.
Find the standard deviation of marks after handling missing values.

❓ Viva Questions

What is a missing value in R?
How are missing values represented in R?
What is the purpose of is.na()?
What does na.omit() do?
Why is na.rm=TRUE used?
How can missing values be counted?
What are common causes of missing data?
When should you replace missing values instead of deleting rows?
What are the advantages of handling missing values?
Give two real-life applications of data cleaning.

📚 Class Summary

In this class, you learned:

The concept of data cleaning.
Missing values (NA) and their causes.
Detecting missing values using is.na().
Counting missing values.
Removing missing records with na.omit().
Replacing missing values with constants and statistical measures.
Practical examples with outputs.
Real-world applications, exercises, and viva questions.

Class 5: Handling Duplicate Records and Data Type Conversion

Duration: 1 Class

🎯 Learning Objectives

After completing this lesson, students will be able to:

Understand duplicate records in datasets.
Detect duplicate rows and values.
Remove duplicate records.
Understand different data types in R.
Convert data between numeric, character, factor, and logical types.
Apply data type conversion in real-world datasets.

📖 2.17 Introduction to Duplicate Data

Definition

A duplicate record is a row or value that appears more than once in a dataset.

Duplicate data may occur because of:

Repeated data entry
System errors
Database merging
Data import from multiple sources

Duplicate records can lead to inaccurate statistical analysis and incorrect reports.

🌟 Why Remove Duplicate Records?

Removing duplicates helps to:

Improve data quality.
Reduce storage space.
Increase analysis accuracy.
Prevent incorrect statistical results.
Improve machine learning performance.

📊 Sample Dataset (10 Records)


employee <- data.frame(

Emp_ID=c(101,102,103,104,105,103,107,108,109,110),

Name=c("Amit","Priya","Rahul","Sneha","Karan",
       "Rahul","Arjun","Pooja","Rohan","Anjali"),

Department=c("HR","Sales","IT","HR","IT",
             "IT","Sales","Finance","IT","HR"),

Salary=c(30000,35000,50000,32000,60000,
         50000,40000,58000,45000,52000)

)

employee

Output


Emp_ID	Name	Department	Salary
101	Amit	HR	30000
102	Priya	Sales	35000
103	Rahul	IT	50000
104	Sneha	HR	32000
105	Karan	IT	60000
103	Rahul	IT	50000
107	Arjun	Sales	40000
108	Pooja	Finance	58000
109	Rohan	IT	45000
110	Anjali	HR	52000

📖 2.18 Detecting Duplicate Records

Syntax


duplicated(data)

💻 Example 1: Detect Duplicate Rows


duplicated(employee)

Output


 [1] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE

Explanation

The 6th row is a duplicate of the 3rd row.

💻 Example 2: Display Duplicate Records


employee[duplicated(employee),]

Output


Emp_ID	Name	Department	Salary
103	Rahul	IT	50000

💻 Example 3: Count Duplicate Records


sum(duplicated(employee))

Output


[1] 1

💻 Example 4: Remove Duplicate Records


employee_unique <- employee[!duplicated(employee),]

employee_unique

Output


Duplicate row removed successfully.

Total Records = 9

💻 Example 5: Detect Duplicate Employee IDs


duplicated(employee$Emp_ID)

Output


 [1] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
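duplicated() flags only the second and later occurrences, so the first copy of a repeated ID is not marked. The sketch below, under the assumption that you want to inspect every row sharing a repeated Emp_ID, lists all copies and also shows unique() as a shorter way to drop exact duplicate rows. The names dup_ids and all_copies are illustrative.

```r
# Rebuild the employee data from this class so the snippet is self-contained.
employee <- data.frame(
  Emp_ID = c(101, 102, 103, 104, 105, 103, 107, 108, 109, 110),
  Name = c("Amit", "Priya", "Rahul", "Sneha", "Karan",
           "Rahul", "Arjun", "Pooja", "Rohan", "Anjali"),
  Department = c("HR", "Sales", "IT", "HR", "IT",
                 "IT", "Sales", "Finance", "IT", "HR"),
  Salary = c(30000, 35000, 50000, 32000, 60000,
             50000, 40000, 58000, 45000, 52000)
)

# IDs that occur more than once.
dup_ids <- unique(employee$Emp_ID[duplicated(employee$Emp_ID)])

# Every row carrying one of those IDs, first occurrence included.
all_copies <- employee[employee$Emp_ID %in% dup_ids, ]
print(all_copies)

# unique() on a data frame drops exact duplicate rows,
# like employee[!duplicated(employee), ].
employee_unique <- unique(employee)
print(nrow(employee_unique))
```

Looking at all copies side by side makes it easier to verify that the rows really are duplicates before removing one of them.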

📖 2.19 Data Types in R

R supports different types of data.

Data Type	Description	Example
Numeric	Numbers	100
Character	Text	"Amit"
Logical	TRUE/FALSE	TRUE
Factor	Categories	HR, Sales

📖 2.20 Data Type Conversion

Data type conversion changes one data type into another.

💻 Example 6: Numeric to Character


x <- 100

class(x)

x <- as.character(x)

class(x)

Output


[1] "numeric"

[1] "character"

💻 Example 7: Character to Numeric


x <- "250"

class(x)

x <- as.numeric(x)

class(x)

Output


[1] "character"

[1] "numeric"

💻 Example 8: Character to Factor


department <- c(

"HR",

"Sales",

"IT",

"HR",

"Finance"

)

factor_department <-

as.factor(department)

factor_department

Output


[1] HR      Sales   IT      HR      Finance
Levels: Finance HR IT Sales

💻 Example 9: Numeric to Logical


x <- c(1,0,5)

as.logical(x)

Output


[1]  TRUE FALSE  TRUE

Explanation

0 becomes FALSE.
Any non-zero value becomes TRUE.

💻 Example 10: Check Data Type


class(employee)

str(employee)

Output


[1] "data.frame"

'data.frame': 10 obs. of 4 variables:

📊 Common Conversion Functions

Function	Purpose
`as.numeric()`	Convert to numeric
`as.character()`	Convert to character
`as.factor()`	Convert to factor
`as.logical()`	Convert to logical
`class()`	Display data type
`str()`	Display structure

📊 Comparison of Data Types

Type	Stores	Example
Numeric	Numbers	100
Character	Text	"Amit"
Logical	TRUE/FALSE	TRUE
Factor	Categories	HR

🌍 Real-Life Applications

Duplicate Handling

Banking transactions
Employee databases
Hospital patient records
Student admission systems
Customer databases

Data Type Conversion

Machine learning preprocessing
Survey analysis
Statistical modeling
Financial analysis
Database management

✔ Advantages

Removes redundant information.
Improves dataset quality.
Ensures correct data types for analysis.
Enhances model accuracy.
Simplifies data manipulation.

✖ Disadvantages

Removing duplicates without verification may delete valid records.
Incorrect data type conversion may cause data loss.
Requires careful validation before conversion.
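The data-loss risk mentioned above can be checked explicitly. In this minimal sketch, raw_salary is a hypothetical text column containing two entries that are not valid numbers; as.numeric() turns them into NA, and the code lists which inputs failed.

```r
# Hypothetical raw column read in as text; "N/A" and "12k" are not valid numbers.
raw_salary <- c("30000", "35000", "N/A", "50000", "12k")

# as.numeric() converts unparseable strings to NA and emits a warning.
# suppressWarnings() is used here only because the NAs are checked right after.
salary <- suppressWarnings(as.numeric(raw_salary))

# Values that were present before conversion but are NA afterwards.
failed <- raw_salary[is.na(salary) & !is.na(raw_salary)]

print(salary)
print(failed)
```

Inspecting the failed values before continuing helps decide whether to correct them, treat them as missing, or stop the analysis.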

📝 Lab Exercises

Create a dataset containing duplicate employee records.
Detect duplicate rows using duplicated().
Count duplicate records.
Remove duplicate records.
Detect duplicate employee IDs.
Convert numeric data to character.
Convert character data to numeric.
Convert department names to factors.
Convert numeric values to logical.
Display the structure of the dataset.

❓ Viva Questions

What is a duplicate record?
Which function detects duplicate rows?
How can duplicate rows be removed?
What is the purpose of duplicated()?
What are the four basic data types in R?
Which function converts data to numeric?
Which function converts data to character?
What is a factor in R?
How does as.logical() work?
Why is data type conversion important?

📚 Class Summary

In this class, you learned:

Duplicate records and their effects.
Detecting and removing duplicate data.
Basic data types in R.
Data type conversion using as.numeric(), as.character(), as.factor(), and as.logical().
Practical examples with outputs.
Real-world applications, exercises, and viva questions.

Class 6: Renaming Columns and Rows in R

Duration: 1 Class

🎯 Learning Objectives

After completing this lesson, students will be able to:

Understand the importance of meaningful column and row names.
Rename columns using colnames(), names(), and rename().
Rename rows using rownames().
Rename multiple columns simultaneously.
Apply renaming techniques in real-world datasets.

📖 2.21 Introduction to Renaming

Definition

Renaming is the process of changing the names of columns or rows in a dataset to make them more meaningful, readable, and easier to understand.

For example:

Old Name	New Name
M1	Marks
Dept	Department
Sal	Salary
Age1	Age

Using meaningful names improves code readability and makes data analysis easier.

🌟 Why Rename Columns and Rows?

Renaming helps to:

Improve readability.
Use meaningful variable names.
Avoid confusion during analysis.
Make reports easier to understand.
Prepare data for machine learning and visualization.

📊 Sample Dataset (10 Records)


employee <- data.frame(

ID=c(101,102,103,104,105,106,107,108,109,110),

EmpName=c("Amit","Priya","Rahul","Sneha","Karan",
          "Neha","Arjun","Pooja","Rohan","Anjali"),

Dept=c("HR","Sales","IT","HR","IT",
       "Finance","Sales","Finance","IT","HR"),

Age=c(25,28,30,27,35,31,29,33,26,32),

Sal=c(30000,35000,50000,32000,60000,
      55000,40000,58000,45000,52000)
)

employee

Output

ID	EmpName	Dept	Age	Sal
101	Amit	HR	25	30000
102	Priya	Sales	28	35000
103	Rahul	IT	30	50000
104	Sneha	HR	27	32000
105	Karan	IT	35	60000
106	Neha	Finance	31	55000
107	Arjun	Sales	29	40000
108	Pooja	Finance	33	58000
109	Rohan	IT	26	45000
110	Anjali	HR	32	52000

📖 2.22 Renaming Columns Using `colnames()`

Syntax


colnames(dataframe) <- c("Column1","Column2",...)

💻 Example 1: Rename All Columns


colnames(employee) <- c("Emp_ID",
                        "Name",
                        "Department",
                        "Age",
                        "Salary")

employee

Output

Emp_ID	Name	Department	Age	Salary
101	Amit	HR	25	30000
102	Priya	Sales	28	35000
103	Rahul	IT	30	50000
...	...	...	...	...

💻 Example 2: Display Column Names


colnames(employee)

Output


[1] "Emp_ID"     "Name"       "Department" "Age"        "Salary"

📖 2.23 Renaming Columns Using `names()`

For a data frame, names() and colnames() return the same column names, and either can be used on the left side of <- to rename columns.

Syntax


names(dataframe)

💻 Example 3


names(employee)

Output


[1] "Emp_ID"     "Name"       "Department" "Age"        "Salary"

💻 Example 4: Rename One Column


names(employee)[5] <- "Monthly_Salary"

employee

Output

Emp_ID	Name	Department	Age	Monthly_Salary
101	Amit	HR	25	30000
102	Priya	Sales	28	35000
...	...	...	...	...
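Renaming by position, as in names(employee)[5], breaks silently if the column order changes. A possible alternative is to match on the old name. The sketch below uses a small hypothetical data frame df with the short column names from the sample dataset; lookup and hits are illustrative names.

```r
# Small hypothetical data frame using the short names from the sample dataset.
df <- data.frame(
  ID = c(101, 102),
  Dept = c("HR", "Sales"),
  Sal = c(30000, 35000)
)

# Rename by matching the old name, so the code still works if columns are reordered.
names(df)[names(df) == "Sal"] <- "Salary"

# Rename several columns at once through a lookup vector (old name = new name).
lookup <- c(ID = "Emp_ID", Dept = "Department")
hits <- names(df) %in% names(lookup)
names(df)[hits] <- unname(lookup[names(df)[hits]])

print(names(df))
```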

📖 2.24 Renaming Rows

Rows can also have names.

Syntax


rownames(dataframe)

💻 Example 5: Display Row Names


rownames(employee)

Output


 [1] "1"  "2"  "3"  ...  "10"

💻 Example 6: Rename Rows


rownames(employee) <-

paste("Employee",

1:10,

sep="_")

employee

Output


Employee_1

Employee_2

Employee_3

...

Employee_10

📖 2.25 Renaming Using `rename()` from dplyr

The dplyr package provides the rename() function.

Install Package


install.packages("dplyr")

Load Package


library(dplyr)

Syntax


rename(data,

NewName = OldName)

💻 Example 7


library(dplyr)

employee <-

rename(employee,

Salary=Monthly_Salary)

employee

Output

The column Monthly_Salary is renamed to Salary.

💻 Example 8: Rename Multiple Columns


library(dplyr)

employee <-

rename(

employee,

Employee_ID=Emp_ID,

Employee_Name=Name
)

Output

Employee_ID	Employee_Name	Department	Age	Salary
101	Amit	HR	25	30000
102	Priya	Sales	28	35000
...	...	...	...	...

💻 Example 9: Verify Structure


str(employee)

Output


'data.frame': 10 obs. of 5 variables:

💻 Example 10: Display Dataset


head(employee)

Output


Employee_ID	Employee_Name	Department	Age	Salary
101	Amit	HR	25	30000
102	Priya	Sales	28	35000
103	Rahul	IT	30	50000
104	Sneha	HR	27	32000
105	Karan	IT	35	60000
106	Neha	Finance	31	55000

📊 Comparison of Renaming Functions

Function	Purpose
`colnames()`	Rename all columns
`names()`	Rename one or more columns
`rownames()`	Rename rows
`rename()`	Rename selected columns using dplyr

📊 Advantages of Meaningful Column Names

Poor Name	Better Name
M1	Marks
Dept	Department
Sal	Salary
Emp	Employee_Name
ID	Employee_ID

🌍 Real-Life Applications

Employee management systems
Student databases
Banking records
Hospital patient databases
Inventory management
Sales reporting
Data visualization
Machine learning preprocessing

✔ Advantages

Improves readability.
Makes code easier to understand.
Helps create professional reports.
Simplifies data manipulation.
Enhances collaboration among team members.

✖ Limitations

Renaming columns incorrectly may break existing code.
Duplicate column names should be avoided.
Frequent renaming may reduce code consistency.

📝 Lab Exercises

Create a dataset containing 10 employee records.
Rename all column names using colnames().
Display column names.
Rename one column using names().
Display row names.
Rename all row names.
Install and load the dplyr package.
Rename one column using rename().
Rename two columns simultaneously.
Display the structure of the renamed dataset.

❓ Viva Questions

What is the purpose of renaming columns?
Which function changes column names?
Which function changes row names?
What is the difference between colnames() and names()?
Which package contains rename()?
How do you rename multiple columns?
Why are meaningful column names important?
Can row names be customized?
What is the syntax of rename()?
Give two real-life applications of renaming data.

📚 Class Summary

In this class, you learned:

The importance of meaningful column and row names.
Renaming columns using colnames() and names().
Renaming rows using rownames().
Using rename() from the dplyr package.
Practical examples with outputs.
Comparison tables, real-world applications, lab exercises, and viva questions.

Class 7: Data Transformation – select(), filter(), and arrange()

Duration: 1 Class

🎯 Learning Objectives

After completing this lesson, students will be able to:

Select specific columns from a dataset.
Filter rows based on conditions.
Sort data in ascending and descending order.
Combine multiple transformation operations.
Use dplyr functions for efficient data analysis.

📖 2.26 Introduction to Data Transformation

Data Transformation means modifying, selecting, filtering, or arranging data into a form suitable for analysis.

R provides powerful transformation functions through the dplyr package.

Install and Load dplyr


install.packages("dplyr")

library(dplyr)

📊 Sample Dataset (10 Records)
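No dataset is listed at this point. The outputs shown in this class (for example Karan 60000, Pooja 58000, and Neha 55000 as the three highest salaries) match the employee table used in Class 8, so the sketch below assumes that same data frame.

```r
# Assumed sample data: the same employee table that appears in Class 8.
employee <- data.frame(
  Emp_ID = c(101, 102, 103, 104, 105, 106, 107, 108, 109, 110),
  Name = c("Amit", "Priya", "Rahul", "Sneha", "Karan",
           "Neha", "Arjun", "Pooja", "Rohan", "Anjali"),
  Department = c("HR", "Sales", "IT", "HR", "IT",
                 "Finance", "Sales", "Finance", "IT", "HR"),
  Age = c(25, 28, 30, 27, 35, 31, 29, 33, 26, 32),
  Salary = c(30000, 35000, 50000, 32000, 60000,
             55000, 40000, 58000, 45000, 52000)
)

print(employee)
```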

🔵 2.27 Selecting Columns with select()

Definition

The select() function chooses specific columns from a dataset.

Syntax


select(dataframe, Column1, Column2, ...)

💻 Example 1: Select Name and Salary
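The program for this example is not shown, so the following is a minimal sketch of one way to write it. It assumes the employee data frame from Class 8 and that dplyr is installed; name_salary is an illustrative name.

```r
library(dplyr)

# Assumed sample data (same employee table as in Class 8).
employee <- data.frame(
  Emp_ID = c(101, 102, 103, 104, 105, 106, 107, 108, 109, 110),
  Name = c("Amit", "Priya", "Rahul", "Sneha", "Karan",
           "Neha", "Arjun", "Pooja", "Rohan", "Anjali"),
  Department = c("HR", "Sales", "IT", "HR", "IT",
                 "Finance", "Sales", "Finance", "IT", "HR"),
  Age = c(25, 28, 30, 27, 35, 31, 29, 33, 26, 32),
  Salary = c(30000, 35000, 50000, 32000, 60000,
             55000, 40000, 58000, 45000, 52000)
)

# Keep only the Name and Salary columns; all 10 rows are retained.
name_salary <- employee %>%
  select(Name, Salary)

print(name_salary)
```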

Output

Name	Salary
Amit	30000
Priya	35000
Rahul	50000
...	...

💻 Example 2: Select Multiple Columns

💻 Example 3: Exclude a Column
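Examples 2 and 3 have no programs listed. A possible sketch of both follows. The choice of columns (Emp_ID, Name, Department for Example 2, and dropping Age for Example 3) is an assumption, as is the employee data, which is taken from Class 8.

```r
library(dplyr)

# Assumed sample data (same employee table as in Class 8).
employee <- data.frame(
  Emp_ID = c(101, 102, 103, 104, 105, 106, 107, 108, 109, 110),
  Name = c("Amit", "Priya", "Rahul", "Sneha", "Karan",
           "Neha", "Arjun", "Pooja", "Rohan", "Anjali"),
  Department = c("HR", "Sales", "IT", "HR", "IT",
                 "Finance", "Sales", "Finance", "IT", "HR"),
  Age = c(25, 28, 30, 27, 35, 31, 29, 33, 26, 32),
  Salary = c(30000, 35000, 50000, 32000, 60000,
             55000, 40000, 58000, 45000, 52000)
)

# Example 2 (assumed columns): select several columns by name.
id_name_dept <- employee %>%
  select(Emp_ID, Name, Department)

# Example 3 (assumed column): a minus sign excludes a column and keeps the rest.
without_age <- employee %>%
  select(-Age)

print(head(id_name_dept, 3))
print(head(without_age, 3))
```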

🟢 2.28 Filtering Rows with filter()

Definition

The filter() function selects rows that satisfy specified conditions.

Syntax


filter(dataframe, Condition)

💻 Example 4: Employees from IT Department
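The program for this example is missing. The sketch below is one way to produce a Name/Department listing of IT staff, assuming the employee data from Class 8; it_staff is an illustrative name, and the trailing select() is added only to show the two columns of interest.

```r
library(dplyr)

# Assumed sample data (same employee table as in Class 8).
employee <- data.frame(
  Emp_ID = c(101, 102, 103, 104, 105, 106, 107, 108, 109, 110),
  Name = c("Amit", "Priya", "Rahul", "Sneha", "Karan",
           "Neha", "Arjun", "Pooja", "Rohan", "Anjali"),
  Department = c("HR", "Sales", "IT", "HR", "IT",
                 "Finance", "Sales", "Finance", "IT", "HR"),
  Age = c(25, 28, 30, 27, 35, 31, 29, 33, 26, 32),
  Salary = c(30000, 35000, 50000, 32000, 60000,
             55000, 40000, 58000, 45000, 52000)
)

# Keep rows whose Department is exactly "IT" (note == for comparison, not =).
it_staff <- employee %>%
  filter(Department == "IT") %>%
  select(Name, Department)

print(it_staff)
```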

Output

Name	Department
Rahul	IT
Karan	IT
Rohan	IT

💻 Example 5: Salary Greater Than 50,000

💻 Example 6: Multiple Conditions (AND)

💻 Example 7: Multiple Conditions (OR)
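Examples 5 to 7 are listed without programs. The following sketch shows one possible version of each. The specific conditions in Examples 6 and 7 (IT with salary above 45,000; HR or Sales) are assumptions chosen for illustration, and the employee data is taken from Class 8.

```r
library(dplyr)

# Assumed sample data (same employee table as in Class 8).
employee <- data.frame(
  Emp_ID = c(101, 102, 103, 104, 105, 106, 107, 108, 109, 110),
  Name = c("Amit", "Priya", "Rahul", "Sneha", "Karan",
           "Neha", "Arjun", "Pooja", "Rohan", "Anjali"),
  Department = c("HR", "Sales", "IT", "HR", "IT",
                 "Finance", "Sales", "Finance", "IT", "HR"),
  Age = c(25, 28, 30, 27, 35, 31, 29, 33, 26, 32),
  Salary = c(30000, 35000, 50000, 32000, 60000,
             55000, 40000, 58000, 45000, 52000)
)

# Example 5: salary strictly greater than 50,000.
high_paid <- employee %>%
  filter(Salary > 50000)

# Example 6 (assumed condition): conditions separated by commas are combined with AND.
it_high <- employee %>%
  filter(Department == "IT", Salary > 45000)

# Example 7 (assumed condition): the | operator combines conditions with OR.
hr_or_sales <- employee %>%
  filter(Department == "HR" | Department == "Sales")

print(high_paid)
print(it_high)
print(hr_or_sales)
```

Note that Salary > 50000 excludes an employee earning exactly 50,000; use >= if that row should be included.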

🟣 2.29 Arranging Data with arrange()

Definition

The arrange() function sorts rows based on one or more columns.

Syntax


arrange(dataframe, Column)

💻 Example 8: Sort by Salary (Ascending)
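The program for this example is not shown. A minimal sketch, assuming the employee data from Class 8, is given below; by_salary is an illustrative name, and select() is added only to display the two relevant columns.

```r
library(dplyr)

# Assumed sample data (same employee table as in Class 8).
employee <- data.frame(
  Emp_ID = c(101, 102, 103, 104, 105, 106, 107, 108, 109, 110),
  Name = c("Amit", "Priya", "Rahul", "Sneha", "Karan",
           "Neha", "Arjun", "Pooja", "Rohan", "Anjali"),
  Department = c("HR", "Sales", "IT", "HR", "IT",
                 "Finance", "Sales", "Finance", "IT", "HR"),
  Age = c(25, 28, 30, 27, 35, 31, 29, 33, 26, 32),
  Salary = c(30000, 35000, 50000, 32000, 60000,
             55000, 40000, 58000, 45000, 52000)
)

# arrange() sorts in ascending order by default.
by_salary <- employee %>%
  arrange(Salary) %>%
  select(Name, Salary)

print(by_salary)
```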

Output

Name	Salary
Amit	30000
Sneha	32000
Priya	35000
...	...

💻 Example 9: Sort by Salary (Descending)
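No program is listed for this example. One way to sort in descending order is to wrap the column in desc(), as sketched below with the employee data assumed from Class 8; by_salary_desc is an illustrative name.

```r
library(dplyr)

# Assumed sample data (same employee table as in Class 8).
employee <- data.frame(
  Emp_ID = c(101, 102, 103, 104, 105, 106, 107, 108, 109, 110),
  Name = c("Amit", "Priya", "Rahul", "Sneha", "Karan",
           "Neha", "Arjun", "Pooja", "Rohan", "Anjali"),
  Department = c("HR", "Sales", "IT", "HR", "IT",
                 "Finance", "Sales", "Finance", "IT", "HR"),
  Age = c(25, 28, 30, 27, 35, 31, 29, 33, 26, 32),
  Salary = c(30000, 35000, 50000, 32000, 60000,
             55000, 40000, 58000, 45000, 52000)
)

# desc() reverses the sort order for the wrapped column.
by_salary_desc <- employee %>%
  arrange(desc(Salary)) %>%
  select(Name, Salary)

print(by_salary_desc)
```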

Output

Name	Salary
Karan	60000
Pooja	58000
Neha	55000
...	...

💻 Example 10: Sort by Department and Salary
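This example has no program. The sketch below assumes the intent is to sort by Department first and then by Salary (ascending) within each department, using the employee data from Class 8; by_dept is an illustrative name.

```r
library(dplyr)

# Assumed sample data (same employee table as in Class 8).
employee <- data.frame(
  Emp_ID = c(101, 102, 103, 104, 105, 106, 107, 108, 109, 110),
  Name = c("Amit", "Priya", "Rahul", "Sneha", "Karan",
           "Neha", "Arjun", "Pooja", "Rohan", "Anjali"),
  Department = c("HR", "Sales", "IT", "HR", "IT",
                 "Finance", "Sales", "Finance", "IT", "HR"),
  Age = c(25, 28, 30, 27, 35, 31, 29, 33, 26, 32),
  Salary = c(30000, 35000, 50000, 32000, 60000,
             55000, 40000, 58000, 45000, 52000)
)

# The first column is the primary sort key; later columns break ties.
by_dept <- employee %>%
  arrange(Department, Salary) %>%
  select(Department, Name, Salary)

print(by_dept)
```

Replacing Salary with desc(Salary) would list the highest-paid employee first within each department.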

📊 Combining Functions

Example: IT Employees Sorted by Salary
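The program for this combined example is not listed. The following sketch chains filter(), arrange(), and select() with the pipe operator %>%, assuming the employee data from Class 8; it_by_salary is an illustrative name.

```r
library(dplyr)

# Assumed sample data (same employee table as in Class 8).
employee <- data.frame(
  Emp_ID = c(101, 102, 103, 104, 105, 106, 107, 108, 109, 110),
  Name = c("Amit", "Priya", "Rahul", "Sneha", "Karan",
           "Neha", "Arjun", "Pooja", "Rohan", "Anjali"),
  Department = c("HR", "Sales", "IT", "HR", "IT",
                 "Finance", "Sales", "Finance", "IT", "HR"),
  Age = c(25, 28, 30, 27, 35, 31, 29, 33, 26, 32),
  Salary = c(30000, 35000, 50000, 32000, 60000,
             55000, 40000, 58000, 45000, 52000)
)

# Each step receives the result of the previous one through %>%.
it_by_salary <- employee %>%
  filter(Department == "IT") %>%   # keep IT rows
  arrange(desc(Salary)) %>%        # highest salary first
  select(Name, Salary)             # keep two columns

print(it_by_salary)
```

Filtering before sorting means arrange() only has to order the three IT rows rather than the whole table.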

Output

Name	Salary
Karan	60000
Rahul	50000
Rohan	45000

📊 Comparison of Functions

Function	Purpose
select()	Choose columns
filter()	Choose rows
arrange()	Sort rows

🌍 Real-Life Applications

Selecting important columns from large databases.
Filtering customers with high purchases.
Sorting employees by salary.
Analyzing sales by region.
Preparing data for machine learning.
Generating management reports.

✔ Advantages

Simple and readable syntax.
Fast processing.
Works well with large datasets.
Easy to combine multiple operations.
Widely used in data science projects.

✖ Limitations

Requires the dplyr package.
Very large datasets may require additional optimization.
Incorrect conditions may produce unexpected results.

📝 Lab Exercises

Select only Name and Salary columns.
Exclude the Age column.
Filter employees from the Sales department.
Filter employees with salary greater than 40,000.
Filter employees from IT with salary greater than 45,000.
Sort employees by Age ascending.
Sort employees by Salary descending.
Sort employees by Department and Salary.
Display only IT employees sorted by salary.
Combine select(), filter(), and arrange() in one program.

❓ Viva Questions

What is data transformation?
What is the purpose of select()?
What is the purpose of filter()?
What is the purpose of arrange()?
How do you sort data in descending order?
How do you apply multiple conditions in filter()?
What does desc() do?
Can select() exclude columns?
What is the pipe operator %>%?
Give two real-life applications of data transformation.

📚 Class Summary

In this class, you learned:

select() for choosing columns.
filter() for selecting rows.
arrange() for sorting data.
Using multiple conditions.
Combining transformation functions with the pipe operator.
Practical examples with outputs.
Real-world applications, exercises, and viva questions.

Class 8: Data Transformation using `mutate()` and `transmute()`

Duration: 1 Class

🎯 Learning Objectives

After completing this lesson, students will be able to:

Understand the purpose of mutate() and transmute().
Create new variables in a dataset.
Modify existing variables.
Perform arithmetic operations on columns.
Calculate bonus, tax, gross salary, and net salary.
Understand the difference between mutate() and transmute().

📖 2.30 Introduction to `mutate()`

Definition

The mutate() function from the dplyr package is used to create new columns or modify existing columns in a data frame.

It is one of the most frequently used functions in data analysis and machine learning.

Install and Load Package


install.packages("dplyr")

library(dplyr)

📊 Sample Dataset (10 Records)


employee <- data.frame(

Emp_ID=c(101,102,103,104,105,106,107,108,109,110),

Name=c("Amit","Priya","Rahul","Sneha","Karan",
"Neha","Arjun","Pooja","Rohan","Anjali"),

Department=c("HR","Sales","IT","HR","IT",
"Finance","Sales","Finance","IT","HR"),

Age=c(25,28,30,27,35,31,29,33,26,32),

Salary=c(30000,35000,50000,32000,60000,
55000,40000,58000,45000,52000)
)

employee

Output

Emp_ID	Name	Department	Age	Salary
101	Amit	HR	25	30000
102	Priya	Sales	28	35000
103	Rahul	IT	30	50000
104	Sneha	HR	27	32000
105	Karan	IT	35	60000
106	Neha	Finance	31	55000
107	Arjun	Sales	29	40000
108	Pooja	Finance	33	58000
109	Rohan	IT	26	45000
110	Anjali	HR	32	52000

📖 2.31 Creating New Columns with `mutate()`

Syntax


mutate(dataframe,
       NewColumn = Expression)

💻 Example 1: Calculate 10% Bonus


library(dplyr)

employee_bonus <- employee %>%
mutate(Bonus = Salary * 0.10)

employee_bonus

Output

Name	Salary	Bonus
Amit	30000	3000
Priya	35000	3500
Rahul	50000	5000
Sneha	32000	3200
Karan	60000	6000
Neha	55000	5500
Arjun	40000	4000
Pooja	58000	5800
Rohan	45000	4500
Anjali	52000	5200

💻 Example 2: Calculate Gross Salary


employee_gross <- employee %>%
mutate(Gross_Salary = Salary + (Salary * 0.10))

employee_gross

Output

Name	Salary	Gross_Salary
Amit	30000	33000
Priya	35000	38500
Rahul	50000	55000
Sneha	32000	35200
Karan	60000	66000
Neha	55000	60500
Arjun	40000	44000
Pooja	58000	63800
Rohan	45000	49500
Anjali	52000	57200

💻 Example 3: Calculate 5% Income Tax


employee_tax <- employee %>%
mutate(Tax = Salary * 0.05)

employee_tax

Output

Name	Salary	Tax
Amit	30000	1500
Priya	35000	1750
Rahul	50000	2500
Sneha	32000	1600
Karan	60000	3000
Neha	55000	2750
Arjun	40000	2000
Pooja	58000	2900
Rohan	45000	2250
Anjali	52000	2600

💻 Example 4: Calculate Net Salary


employee_net <- employee %>%
mutate(

Bonus = Salary*0.10,

Tax = Salary*0.05,

Net_Salary = Salary + Bonus - Tax

)

employee_net

Output

Name	Salary	Bonus	Tax	Net_Salary
Amit	30000	3000	1500	31500
Priya	35000	3500	1750	36750
Rahul	50000	5000	2500	52500
Sneha	32000	3200	1600	33600
Karan	60000	6000	3000	63000
Neha	55000	5500	2750	57750
Arjun	40000	4000	2000	42000
Pooja	58000	5800	2900	60900
Rohan	45000	4500	2250	47250
Anjali	52000	5200	2600	54600

💻 Example 5: Increase Salary by ₹5,000


employee %>%
mutate(Salary = Salary + 5000)

Output

Each employee's salary increases by ₹5,000.

📖 2.32 The `transmute()` Function

Definition

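The transmute() function computes columns in the same way as mutate() but returns only the columns named in the call.

Unlike mutate(), the other original columns are dropped; an existing column such as Name is kept only if it is listed explicitly.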

Syntax


transmute(dataframe,
          NewColumn = Expression)

💻 Example 6: Display Bonus Only


employee %>%
transmute(Name,
Bonus = Salary*0.10)

Output

Name	Bonus
Amit	3000
Priya	3500
Rahul	5000
Sneha	3200
Karan	6000
Neha	5500
Arjun	4000
Pooja	5800
Rohan	4500
Anjali	5200

💻 Example 7: Gross Salary Only


employee %>%
transmute(Name,
Gross = Salary*1.10)

Output

Displays only Name and Gross Salary.

💻 Example 8: Age After Five Years


employee %>%
mutate(Age_After_5_Years = Age + 5)

Output

Name	Age	Age_After_5_Years
Amit	25	30
Priya	28	33
Rahul	30	35
Sneha	27	32
Karan	35	40
Neha	31	36
Arjun	29	34
Pooja	33	38
Rohan	26	31
Anjali	32	37

💻 Example 9: Annual Salary


employee %>%
mutate(Annual_Salary = Salary * 12)

Output

Name	Salary	Annual_Salary
Amit	30000	360000
Priya	35000	420000
Rahul	50000	600000
Sneha	32000	384000
Karan	60000	720000
Neha	55000	660000
Arjun	40000	480000
Pooja	58000	696000
Rohan	45000	540000
Anjali	52000	624000

💻 Example 10: Employee Category


employee %>%
mutate(Category = ifelse(Salary >= 50000,
                         "High Salary",
                         "Normal Salary"))

Output

Name	Salary	Category
Amit	30000	Normal Salary
Priya	35000	Normal Salary
Rahul	50000	High Salary
Sneha	32000	Normal Salary
Karan	60000	High Salary
Neha	55000	High Salary
Arjun	40000	Normal Salary
Pooja	58000	High Salary
Rohan	45000	Normal Salary
Anjali	52000	High Salary
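ifelse() handles two categories. For three or more, one option is dplyr's case_when(), sketched below. The thresholds (50,000 and 35,000) and the name categorised are assumptions chosen for illustration, and only the Name and Salary columns of the sample data are rebuilt.

```r
library(dplyr)

# Name and Salary columns from the sample employee data.
employee <- data.frame(
  Name = c("Amit", "Priya", "Rahul", "Sneha", "Karan",
           "Neha", "Arjun", "Pooja", "Rohan", "Anjali"),
  Salary = c(30000, 35000, 50000, 32000, 60000,
             55000, 40000, 58000, 45000, 52000)
)

# Conditions are tested top to bottom; the first match wins.
# The cut-offs below are hypothetical.
categorised <- employee %>%
  mutate(Category = case_when(
    Salary >= 50000 ~ "High Salary",
    Salary >= 35000 ~ "Medium Salary",
    TRUE            ~ "Low Salary"    # fallback for all remaining rows
  ))

print(categorised)
```

Because the conditions are checked in order, the second line only sees salaries below 50,000, so it does not need an explicit upper bound.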

📊 Comparison of `mutate()` and `transmute()`

Feature	`mutate()`	`transmute()`
Keeps Original Columns	✅ Yes	❌ No
Creates New Columns	✅ Yes	✅ Yes
Modifies Existing Columns	✅ Yes	✅ Yes
Returns Only New Columns	❌ No	✅ Yes

🌍 Real-Life Applications

Employee payroll systems
Student result processing
Banking interest calculation
GST and tax calculation
Insurance premium calculation
Sales commission reports
Financial reporting
Business analytics

📝 Lab Exercises

Calculate a 15% bonus for each employee.
Create a Gross Salary column.
Create a Net Salary column after deducting 8% tax.
Calculate annual salary.
Increase every salary by ₹2,000.
Create a category column (High Salary, Medium Salary, Low Salary).
Display only Name and Bonus using transmute().
Calculate age after 10 years.
Create a PF deduction column (12% of salary).
Calculate Take Home Salary = Salary + Bonus − Tax − PF.

❓ Viva Questions

What is the purpose of mutate()?
What is the difference between mutate() and transmute()?
Can mutate() modify existing columns?
Which package contains mutate()?
Which function returns only new columns?
How do you create a new column in R?
What is the use of ifelse() inside mutate()?
How do you calculate annual salary?
What are the advantages of mutate()?
Give two real-life applications of transmute().

📚 Class Summary

In this class, you learned:

Creating new variables with mutate().
Modifying existing variables.
Using transmute() to return only selected transformed columns.
Calculating bonus, tax, gross salary, net salary, annual salary, and employee categories.
Practical R programs with outputs.
Real-world applications, lab exercises, and viva questions.

Class 9: Data Transformation using `summarise()` and `group_by()`

Duration: 1 Class

🎯 Learning Objectives

After completing this lesson, students will be able to:

Understand the purpose of summarise() and group_by().
Calculate statistical summaries of datasets.
Group data based on one or more columns.
Generate department-wise reports.
Perform grouped statistical analysis.
Apply summary functions in real-world business scenarios.

📖 2.33 Introduction to `summarise()`

Definition

The summarise() (or summarize()) function from the dplyr package is used to calculate summary statistics for a dataset. It reduces multiple rows into a single summary row, or into one row per group when combined with group_by().

Common statistics include:

Mean
Sum
Minimum
Maximum
Count
Standard Deviation
Variance

Install and Load Package


install.packages("dplyr")
library(dplyr)

📊 Sample Dataset (10 Records)


employee <- data.frame(

Emp_ID=c(101,102,103,104,105,106,107,108,109,110),

Name=c("Amit","Priya","Rahul","Sneha","Karan",
"Neha","Arjun","Pooja","Rohan","Anjali"),

Department=c("HR","Sales","IT","HR","IT",
"Finance","Sales","Finance","IT","HR"),

Age=c(25,28,30,27,35,31,29,33,26,32),

Salary=c(30000,35000,50000,32000,60000,
55000,40000,58000,45000,52000)
)

employee

Output

Emp_ID	Name	Department	Age	Salary
101	Amit	HR	25	30000
102	Priya	Sales	28	35000
103	Rahul	IT	30	50000
104	Sneha	HR	27	32000
105	Karan	IT	35	60000
106	Neha	Finance	31	55000
107	Arjun	Sales	29	40000
108	Pooja	Finance	33	58000
109	Rohan	IT	26	45000
110	Anjali	HR	32	52000

📖 2.34 Using `summarise()`

Syntax


summarise(dataframe,
          NewColumn = function(column))

💻 Example 1: Calculate Average Salary


library(dplyr)

employee %>%
summarise(
Average_Salary = mean(Salary)
)

Output

Average_Salary
45700

💻 Example 2: Total Salary


employee %>%
summarise(
Total_Salary = sum(Salary)
)

Output

Total_Salary
457000

💻 Example 3: Minimum and Maximum Salary


employee %>%
summarise(

Minimum = min(Salary),

Maximum = max(Salary)

)

Output

Minimum	Maximum
30000	60000

💻 Example 4: Count Employees


employee %>%
summarise(
Total_Employees = n()
)

Output

Total_Employees
10

💻 Example 5: Standard Deviation


employee %>%
summarise(
Standard_Deviation = sd(Salary)
)

Output

Standard_Deviation
10965.1 (approx.)

📖 2.35 Using `group_by()`

Definition

The group_by() function divides a dataset into groups. When used with summarise(), it calculates statistics for each group separately.

Syntax


group_by(dataframe, Column_Name)

💻 Example 6: Average Salary by Department


employee %>%
group_by(Department) %>%
summarise(
Average_Salary = mean(Salary)
)

Output

Department	Average_Salary
Finance	56500
HR	38000
IT	51667
Sales	37500

💻 Example 7: Total Salary by Department


employee %>%
group_by(Department) %>%
summarise(
Total_Salary = sum(Salary)
)

Output

Department	Total_Salary
Finance	113000
HR	114000
IT	155000
Sales	75000

💻 Example 8: Employee Count by Department


employee %>%
group_by(Department) %>%
summarise(
Employees = n()
)

Output

Department	Employees
Finance	2
HR	3
IT	3
Sales	2

💻 Example 9: Department-wise Minimum and Maximum Salary


employee %>%
group_by(Department) %>%
summarise(

Minimum = min(Salary),

Maximum = max(Salary)

)

Output

Department	Minimum	Maximum
Finance	55000	58000
HR	30000	52000
IT	45000	60000
Sales	35000	40000

💻 Example 10: Multiple Summary Statistics


employee %>%
group_by(Department) %>%
summarise(

Average_Age = mean(Age),

Average_Salary = mean(Salary),

Highest_Salary = max(Salary),

Lowest_Salary = min(Salary),

Employees = n()

)

Output

Department	Average_Age	Average_Salary	Highest_Salary	Lowest_Salary	Employees
Finance	32.0	56500	58000	55000	2
HR	28.0	38000	52000	30000	3
IT	30.3	51667	60000	45000	3
Sales	28.5	37500	40000	35000	2

📊 Common Summary Functions

Function	Purpose
`mean()`	Average
`sum()`	Total
`min()`	Minimum
`max()`	Maximum
`n()`	Count
`sd()`	Standard Deviation
`var()`	Variance
`median()`	Median

📊 Comparison of Functions

Function	Purpose
`summarise()`	Creates summary statistics
`group_by()`	Groups data into categories
`n()`	Counts rows in each group
`mean()`	Calculates average
`sum()`	Calculates total

🌍 Real-Life Applications

Department-wise salary analysis.
Student performance reports by class.
Monthly sales summaries by region.
Customer purchase analysis.
Banking transaction summaries.
Hospital patient statistics.
Inventory reports.
Business intelligence dashboards.

✔ Advantages

Produces concise statistical summaries.
Supports grouped analysis.
Easy to combine with other dplyr functions.
Ideal for dashboards and reports.
Highly efficient for large datasets.

✖ Limitations

Requires correctly grouped data.
Missing values should be handled before summarizing.
Complex summaries may require additional functions.
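The point about missing values can be illustrated with a small sketch. The staff data frame below is hypothetical (one salary is NA); it shows how a single NA makes a group mean NA unless na.rm = TRUE is passed, and how to count the missing entries per group.

```r
library(dplyr)

# Hypothetical data with one missing salary in HR.
staff <- data.frame(
  Department = c("HR", "HR", "IT", "IT", "IT"),
  Salary = c(30000, NA, 50000, 60000, 45000)
)

# Without na.rm, any NA in a group makes that group's mean NA.
with_na <- staff %>%
  group_by(Department) %>%
  summarise(Average_Salary = mean(Salary))

# With na.rm = TRUE the NA is skipped; Missing reports how many were skipped.
without_na <- staff %>%
  group_by(Department) %>%
  summarise(
    Average_Salary = mean(Salary, na.rm = TRUE),
    Missing = sum(is.na(Salary))
  )

print(with_na)
print(without_na)
```

Reporting the count of missing values next to the mean makes it visible that the HR average here rests on a single salary.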

📝 Lab Exercises

Calculate the average salary of all employees.
Find the total salary paid.
Count the total number of employees.
Find the highest and lowest salary.
Calculate the standard deviation of salaries.
Find the average salary for each department.
Count employees in each department.
Calculate total salary by department.
Find the minimum and maximum salary for each department.
Create a department-wise summary showing average age, average salary, highest salary, lowest salary, and employee count.

❓ Viva Questions

What is the purpose of summarise()?
What is the purpose of group_by()?
Which function counts the number of rows?
How do you calculate the average salary?
What is the difference between summarise() and group_by()?
Can summarise() be used without group_by()?
Which function calculates standard deviation?
What is the purpose of n()?
Why is grouped analysis important?
Give two real-life applications of group_by().

Class 10 (Final): Complete Data Cleaning and Data Transformation Case Study

Duration: 1 Class

🎯 Learning Objectives

After completing this lesson, students will be able to:

Import data from a CSV file.
Explore the dataset.
Handle missing values.
Remove duplicate records.
Rename columns.
Transform data using dplyr.
Generate summary reports.
Export the processed dataset.
Apply the complete data analysis workflow in R.

📖 2.36 Complete Data Analysis Workflow

A typical data analysis project follows these steps:


Raw Data
    │
    ▼
Import Data
    │
    ▼
Explore Dataset
    │
    ▼
Clean Data
    │
    ▼
Transform Data
    │
    ▼
Summarize Data
    │
    ▼
Export Results

📊 Case Study: Employee Salary Analysis

Suppose a company provides the following employee dataset.

Sample Dataset (10 Records)


employee <- data.frame(

Emp_ID = c(101,102,103,104,105,106,107,108,109,109),

Name = c("Amit","Priya","Rahul","Sneha","Karan",
         "Neha","Arjun","Pooja","Rohan","Rohan"),

Department = c("HR","Sales","IT","HR","IT",
               "Finance","Sales","Finance","IT","IT"),

Age = c(25,28,30,27,35,NA,29,33,26,26),

Salary = c(30000,35000,50000,32000,60000,
           55000,40000,58000,45000,45000)

)

employee

Output

Emp_ID	Name	Department	Age	Salary
101	Amit	HR	25	30000
102	Priya	Sales	28	35000
103	Rahul	IT	30	50000
104	Sneha	HR	27	32000
105	Karan	IT	35	60000
106	Neha	Finance	NA	55000
107	Arjun	Sales	29	40000
108	Pooja	Finance	33	58000
109	Rohan	IT	26	45000
109	Rohan	IT	26	45000

Notice that:

One missing value exists in Age.
One duplicate employee record exists.

Step 1: Explore the Dataset

Program 1


str(employee)
summary(employee)

Output


'data.frame': 10 observations of 5 variables

Summary:
Emp_ID
Name
Department
Age
Salary

Step 2: Detect Missing Values

Program 2


sum(is.na(employee))

Output


[1] 1

Step 3: Replace Missing Age with Mean

Program 3


employee$Age[is.na(employee$Age)] <-

mean(employee$Age,
     na.rm=TRUE)

employee

Output


The missing Age (Neha, Emp_ID 106) is replaced by the mean of the nine available ages, about 28.78.
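
One point worth checking: Program 3 runs before the duplicate record is removed, so Rohan's age is counted twice in the mean. The sketch below compares the two orders using the Age values of the case study. Whether to impute before or after removing duplicates is a judgment call and not something this lesson prescribes; the variable names are illustrative.

```r
# Age column of the case study, including the duplicated last record
age_with_duplicate <- c(25, 28, 30, 27, 35, NA, 29, 33, 26, 26)

# Same column after dropping the duplicated record (the last row)
age_deduplicated <- age_with_duplicate[-10]

mean_before <- mean(age_with_duplicate, na.rm = TRUE)  # duplicate counted twice
mean_after  <- mean(age_deduplicated, na.rm = TRUE)    # each employee counted once

print(mean_before)  # 28.77778
print(mean_after)   # 29.125
```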

Step 4: Detect Duplicate Records

Program 4


duplicated(employee)

Output


 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

Step 5: Remove Duplicate Records

Program 5


employee <-

employee[!duplicated(employee),]

Output


Duplicate record removed.

Total Records = 9
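
duplicated(employee) flags a row only when every column matches an earlier row. If the same employee ID could appear with slightly different details, checking the key column alone is a common alternative. A sketch with hypothetical values (the data frame emp is an assumption for illustration):

```r
# Hypothetical records: Emp_ID 109 appears twice with different salaries
emp <- data.frame(
  Emp_ID = c(107, 108, 109, 109),
  Salary = c(40000, 58000, 45000, 46000)
)

# Whole-row check: no row is an exact copy of an earlier one
full_row_dupes <- duplicated(emp)

# Key-based check: flags the second occurrence of Emp_ID 109
key_dupes <- duplicated(emp$Emp_ID)

# Keep the first record for each Emp_ID
emp_unique <- emp[!key_dupes, ]
print(emp_unique)
```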

Step 6: Rename Columns

Program 6


colnames(employee) <-

c("Employee_ID",

"Employee_Name",

"Department",

"Age",

"Salary")

employee

Output


Columns renamed successfully.
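
Assigning a full vector to colnames() depends on the column order being exactly as expected. To rename a single column by its current name instead, a base R sketch (emp is a small stand-in data frame):

```r
emp <- data.frame(Emp_ID = c(101, 102), Name = c("Amit", "Priya"))

# Rename only the column currently called "Emp_ID"
names(emp)[names(emp) == "Emp_ID"] <- "Employee_ID"

print(names(emp))
```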

Step 7: Create Bonus Column

Program 7


library(dplyr)

employee <-

employee %>%

mutate(

Bonus = Salary*0.10

)

employee

Output

Employee_Name	Salary	Bonus
Amit	30000	3000
Priya	35000	3500
Rahul	50000	5000
Sneha	32000	3200
Karan	60000	6000
Neha	55000	5500
Arjun	40000	4000
Pooja	58000	5800
Rohan	45000	4500

Step 8: Create Gross Salary

Program 8


employee <-

employee %>%

mutate(

Gross_Salary=

Salary+Bonus

)

employee

Output

Employee_Name	Gross_Salary
Amit	33000
Priya	38500
Rahul	55000
Sneha	35200
Karan	66000
Neha	60500
Arjun	44000
Pooja	63800
Rohan	49500

Step 9: Department-wise Summary

Program 9


employee %>%

group_by(Department)%>%

summarise(

Employees=n(),

Average_Salary=

mean(Salary),

Highest=max(Salary),

Lowest=min(Salary)

)

Output

Department	Employees	Average_Salary	Highest	Lowest
Finance	2	56500	58000	55000
HR	2	31000	32000	30000
IT	3	51667	60000	45000
Sales	2	37500	40000	35000
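
The department-wise averages can be cross-checked without dplyr using base R's aggregate(). A sketch using the Department and Salary columns of the nine cleaned records (the data frame name emp is illustrative):

```r
# Department and Salary of the nine records left after cleaning
emp <- data.frame(
  Department = c("HR", "Sales", "IT", "HR", "IT",
                 "Finance", "Sales", "Finance", "IT"),
  Salary     = c(30000, 35000, 50000, 32000, 60000,
                 55000, 40000, 58000, 45000)
)

# Mean salary per department (formula: Salary grouped by Department)
avg_salary <- aggregate(Salary ~ Department, data = emp, FUN = mean)
print(avg_salary)

# Number of employees per department
dept_counts <- table(emp$Department)
print(dept_counts)
```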

Step 10: Export Processed Dataset

Program 10


write.csv(

employee,

"Employee_Report.csv",

row.names=FALSE

)

Output


Employee_Report.csv created successfully.
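
The case study builds the data frame in code, while the objectives also mention importing from a CSV file. A minimal round-trip sketch: write a small data frame to a temporary file and read it back with read.csv(). The temporary path is used only so the example runs anywhere; a real project would use its own file name.

```r
emp <- data.frame(
  Employee_ID = c(101, 102, 103),
  Department  = c("HR", "Sales", "IT"),
  Salary      = c(30000, 35000, 50000)
)

# Write to a temporary CSV file (no row-number column)
csv_path <- tempfile(fileext = ".csv")
write.csv(emp, csv_path, row.names = FALSE)

# Import it again and inspect the first rows and the structure
emp_in <- read.csv(csv_path)
print(head(emp_in))
str(emp_in)
```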

📊 Complete Workflow Summary

Step	Function
Import Data	`read.csv()`
Check Structure	`str()`
Summary	`summary()`
Missing Values	`is.na()`
Remove Missing	`na.omit()`
Replace Missing	`mean()`
Duplicate Detection	`duplicated()`
Remove Duplicates	`!duplicated()`
Rename Columns	`colnames()`
Create New Columns	`mutate()`
Group Data	`group_by()`
Statistical Summary	`summarise()`
Export Data	`write.csv()`

📊 Best Practices

✔ Keep a backup of the original dataset.

✔ Handle missing values before analysis.

✔ Remove duplicate records carefully.

✔ Use meaningful column names.

✔ Verify data types.

✔ Use group_by() for grouped analysis.

✔ Export the final cleaned dataset.

✔ Document every transformation step.

⚠ Common Errors and Solutions

Error	Cause	Solution
Object not found	Incorrect variable name	Check spelling
Missing package	Package not installed	`install.packages()`
NA values in mean	Missing values present	Use `na.rm = TRUE`
Duplicate records	Repeated data	Use `duplicated()`
Wrong column name	Typing mistake	Use `colnames()`
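
The "NA values in mean" row can be reproduced in a few lines. The ages vector is a made-up example:

```r
ages <- c(25, 28, NA, 30)

mean_default <- mean(ages)                # NA: one missing value makes the result NA
mean_no_na   <- mean(ages, na.rm = TRUE)  # missing value dropped before averaging

print(mean_default)
print(mean_no_na)
```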

🌍 Real-Life Applications

Employee payroll processing.
Student examination systems.
Banking customer databases.
Hospital patient records.
Insurance claim processing.
Retail sales analysis.
Inventory management.
Government census data.
Customer relationship management (CRM).
Machine learning data preprocessing.

📝 Lab Programs

Import a CSV file.
Display the first 10 records.
Check the structure of the dataset.
Count missing values.
Replace missing values with the mean.
Detect duplicate records.
Remove duplicate records.
Rename all columns.
Create a Bonus column.
Calculate Gross Salary.
Calculate Annual Salary.
Group employees by department.
Calculate average salary department-wise.
Export the cleaned dataset.
Create a complete employee report.

❓ Viva Questions

What is data cleaning?
What is data transformation?
Which function imports a CSV file?
How do you detect missing values?
Which function removes duplicate records?
What is the purpose of mutate()?
What is group_by() used for?
Which function exports data to CSV?
Why is data cleaning important?
What are the steps in a data analysis workflow?
What is the difference between summarise() and mutate()?
Why are meaningful column names important?
What is the purpose of na.rm = TRUE?
How do you calculate department-wise statistics?
Give three real-life applications of data transformation.
What is the use of duplicated()?
How do you create a new variable in R?
What is the difference between CSV and Excel files?
Why should raw data be backed up before cleaning?
Explain the complete data analysis process in R.

📚 Module 2 Summary

In this module, you learned:

Importing and exporting data using CSV and Excel files.
Handling missing values and duplicate records.
Converting data types.
Renaming rows and columns.
Selecting, filtering, and arranging data.
Creating and transforming variables with mutate() and transmute().
Summarizing data using summarise() and group_by().
Applying a complete data cleaning and transformation workflow using R.
Solving real-world data analysis problems with practical R programs and outputs.

Module 3: Data Visualization in R Programming

📘 CHAPTER 1: Introduction to Data Visualization in R

🌟 1.1 What is Data Visualization?

Data Visualization is the graphical representation of data using charts, graphs, and plots.
It helps to convert raw data into meaningful visual information.

🎯 Purpose:

To understand patterns in data
To identify trends and relationships
To detect outliers
To support decision making

📊 1.2 Importance of Data Visualization

Makes complex data easy to understand
Improves analysis speed
Helps in statistical interpretation
Useful in business intelligence
Enhances presentation quality

📈 1.3 Types of Data Visualizations in R

Type	Purpose
Scatter Plot	Relationship between variables
Line Plot	Trend analysis
Bar Chart	Category comparison
Histogram	Data distribution
Pie Chart	Percentage representation
Box Plot	Outlier detection

🟦 1.4 Base R Graphics

Base R provides built-in functions to create plots without installing additional packages.

🔧 Common Functions:

plot() → General plotting
barplot() → Bar chart
hist() → Histogram
pie() → Pie chart
boxplot() → Box plot

📍 1.5 Scatter Plot in Base R

🎯 Objective:

To show the relationship between two variables.

💻 R Script:


# Scatter Plot Example

x <- c(10, 20, 30, 40, 50)
y <- c(15, 25, 35, 45, 60)

plot(x, y,
     main = "Scatter Plot Example",
     xlab = "X Values",
     ylab = "Y Values",
     col = "blue",
     pch = 19,
     cex = 1.5)

🖥️ Output:

A blue scatter plot
Points increasing diagonally
Title: Scatter Plot Example

📌 Interpretation:

There is a positive relationship between X and Y values.
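
The visual impression can be backed by a number. A short sketch using cor(), which computes the Pearson correlation coefficient by default, on the same x and y values:

```r
x <- c(10, 20, 30, 40, 50)
y <- c(15, 25, 35, 45, 60)

# Pearson correlation: values near +1 mean a strong positive linear relationship
r <- cor(x, y)
print(r)
```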

📉 1.6 Line Plot in Base R

💻 R Script:


# Line Plot Example

sales <- c(100, 120, 150, 180, 200)

plot(sales,
     type = "l",
     col = "red",
     lwd = 3,
     main = "Sales Growth Over Time",
     xlab = "Time",
     ylab = "Sales")

🖥️ Output:

Red line graph
Shows increasing trend

📌 Interpretation:

Sales are increasing steadily over time.

📊 1.7 Bar Plot in Base R

💻 R Script:


# Bar Plot Example

students <- c(30, 25, 40, 35)

barplot(students,
        names.arg = c("A", "B", "C", "D"),
        col = "green",
        main = "Class Strength")

🖥️ Output:

Green vertical bars
Categories A, B, C, D

📊 1.8 Histogram in Base R

💻 R Script:


# Histogram Example

marks <- c(45, 50, 55, 60, 65, 70, 75, 80, 85)

hist(marks,
     col = "skyblue",
     main = "Marks Distribution",
     xlab = "Marks")

🖥️ Output:

Blue histogram bars
Frequency distribution of marks
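
The counts behind the bars can be inspected without drawing anything by calling hist() with plot = FALSE. The explicit breaks below are an assumption made here so that the bins are unambiguous; the program above lets R choose them.

```r
marks <- c(45, 50, 55, 60, 65, 70, 75, 80, 85)

# Compute the histogram without drawing it; bins are [40,50], (50,60], ...
h <- hist(marks, breaks = seq(40, 90, by = 10), plot = FALSE)

print(h$breaks)
print(h$counts)
```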

🥧 1.9 Pie Chart in Base R

💻 R Script:


# Pie Chart Example

data <- c(20, 30, 25, 25)

pie(data,
    labels = c("Food", "Rent", "Travel", "Savings"),
    col = rainbow(4),
    main = "Expense Distribution")

🖥️ Output:

Multicolor pie chart
Shows percentage distribution
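
pie() does not print percentages by itself. One way to show them is to build the labels first and pass them as the labels argument of pie(). A sketch (the names categories, percent, and slice_labels are illustrative):

```r
data <- c(20, 30, 25, 25)
categories <- c("Food", "Rent", "Travel", "Savings")

# Share of each category as a percentage of the total
percent <- round(100 * data / sum(data), 1)

# Build labels such as "Food (20%)"
slice_labels <- paste0(categories, " (", percent, "%)")
print(slice_labels)
```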

📦 1.10 Box Plot in Base R

💻 R Script:


# Box Plot Example

marks <- c(40, 50, 55, 60, 65, 70, 75, 90)

boxplot(marks,
        col = "orange",
        main = "Marks Analysis")

🖥️ Output:

Orange box plot
Shows median and spread
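
The numbers that a box plot draws can be obtained with boxplot.stats(). A sketch on the same marks:

```r
marks <- c(40, 50, 55, 60, 65, 70, 75, 90)

# Five-number summary used to draw the box and whiskers
bp <- boxplot.stats(marks)

print(bp$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(bp$out)    # points beyond the whiskers (none for this data)
```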

⚡ 1.11 Key Advantages of Base R Graphics

Easy to use
No installation required
Fast execution
Good for basic analysis

📌 1.12 Summary

Data visualization converts data into graphical form
Base R provides simple plotting tools
Common plots: scatter, line, bar, histogram, pie, box
Helps in understanding patterns and trends

❓ 1.13 Viva Questions

What is data visualization?
What is the use of plot() in R?
What is a scatter plot?
What is the difference between a bar plot and a histogram?
What is the purpose of a box plot?
What does the col parameter do?
What is the use of pch in a scatter plot?

📘 CHAPTER 2: Advanced Data Visualization Using Base R Graphics + Introduction to ggplot2

🌟 2.1 Limitations of Base R Graphics

Although Base R graphics are useful, they have some limitations:

❌ Limited customization
❌ Not visually attractive for reports
❌ Difficult to create complex plots
❌ No grammar-based structure
❌ Hard to build advanced dashboards

👉 To overcome these problems, we use ggplot2

🎨 2.2 Introduction to ggplot2

ggplot2 is a powerful visualization package in R based on the Grammar of Graphics.

📦 Install Package:


install.packages("ggplot2")

📥 Load Package:


library(ggplot2)

📚 2.3 Grammar of Graphics (Core Concept)

A plot in ggplot2 is built using layers:

🧩 Components:

Component	Meaning
Data	Dataset
Aesthetics (aes)	Mapping variables
Geom	Type of plot
Stats	Statistical transformation
Coord	Coordinate system
Theme	Visual appearance

📊 2.4 Basic ggplot Structure


ggplot(data, aes(x, y)) +
  geom_function()

📌 2.5 Example Dataset


student <- data.frame(
  Name = c("A", "B", "C", "D", "E"),
  Marks = c(70, 85, 90, 60, 75),
  Age = c(18, 19, 20, 18, 21)
)

📍 2.6 Scatter Plot (ggplot2)


library(ggplot2)

ggplot(student, aes(x = Age, y = Marks)) +
  geom_point(color = "blue", size = 4) +
  ggtitle("Age vs Marks Scatter Plot") +
  xlab("Age") +
  ylab("Marks")

🖥️ Output:

Blue circular points
Shows how Marks vary with Age (with only five students, the relationship is weak)

📉 2.7 Line Plot (ggplot2)


ggplot(student, aes(x = Age, y = Marks)) +
  geom_line(color = "red", linewidth = 1.5) +  # linewidth replaces size for lines in ggplot2 3.4.0 and later
  geom_point(color = "black", size = 3) +
  ggtitle("Line Plot of Marks")

🖥️ Output:

Red line connecting points
Black dots on each value

📊 2.8 Bar Plot (ggplot2)


ggplot(student, aes(x = Name, y = Marks)) +
  geom_bar(stat = "identity", fill = "green") +
  ggtitle("Student Marks Bar Chart")

🖥️ Output:

Green vertical bars
Each student’s marks compared

📊 2.9 Histogram (ggplot2)


ggplot(student, aes(x = Marks)) +
  geom_histogram(binwidth = 10,
                 fill = "skyblue",
                 color = "black") +
  ggtitle("Marks Distribution")

🖥️ Output:

Histogram showing frequency of marks

📦 2.10 Box Plot (ggplot2)


ggplot(student, aes(y = Marks)) +
  geom_boxplot(fill = "orange") +
  ggtitle("Box Plot of Marks")

🖥️ Output:

Orange box showing the median and spread (outliers, if any, appear as separate points; this data has none)

🌈 2.11 Density Plot


ggplot(student, aes(x = Marks)) +
  geom_density(fill = "pink", alpha = 0.5) +
  ggtitle("Density Plot of Marks")

🖥️ Output:

Smooth curve showing distribution

🎨 2.12 Customizing ggplot2

🔹 Titles & Labels


ggplot(student, aes(Age, Marks)) +
  geom_point() +
  labs(title = "Student Performance",
       x = "Age",
       y = "Marks")

🔹 Themes


ggplot(student, aes(Age, Marks)) +
  geom_point() +
  theme_minimal()

Other Themes:

theme_bw()
theme_classic()
theme_dark()

🔹 Colors & Size


ggplot(student, aes(Age, Marks)) +
  geom_point(color = "red", size = 4)

🔹 Scales


ggplot(student, aes(Age, Marks)) +
  geom_point() +
  scale_y_continuous(limits = c(50, 100))

🧩 2.13 Faceting (Multiple Plots)


student$Gender <- c("M", "F", "M", "F", "M")

ggplot(student, aes(Age, Marks)) +
  geom_point() +
  facet_wrap(~Gender)

🖥️ Output:

Separate plots for Male and Female

📊 2.14 Multiple Plot Layout


library(gridExtra)  # run install.packages("gridExtra") first if the package is missing

p1 <- ggplot(student, aes(Age, Marks)) + geom_point()
p2 <- ggplot(student, aes(Name, Marks)) + geom_bar(stat="identity")

grid.arrange(p1, p2, ncol = 2)

📌 2.15 Summary

Base R is simple but limited
ggplot2 is powerful and flexible
Grammar of Graphics is the core concept
Customization is easy in ggplot2
Faceting helps in multi-view analysis

❓ 2.16 Viva Questions

What is ggplot2?
What is Grammar of Graphics?
Difference between base R and ggplot2?
What is aes() in ggplot2?
What is geom_point()?
What is faceting?
What is theme in ggplot2?
What is density plot?

📘 CHAPTER 3: Interactive Data Visualization in R (Plotly & Shiny)

🌟 3.1 What is Interactive Visualization?

Interactive visualization allows users to:

🔍 Zoom in/out of graphs
🖱️ Hover to see values
🎯 Click and explore data
📊 Filter and analyze dynamically

👉 It makes data exploration more powerful than static graphs.

📦 3.2 Plotly in R

Plotly is used to create interactive charts in R.

📥 Install Plotly


install.packages("plotly")

📥 Load Library


library(plotly)

📊 3.3 Interactive Scatter Plot


library(plotly)

x <- c(1,2,3,4,5)
y <- c(10,20,15,25,30)

fig <- plot_ly(
  x = x,
  y = y,
  type = "scatter",
  mode = "markers",
  marker = list(color = "blue", size = 10)
)

fig

🖥️ Output:

Interactive blue points
Hover shows values
Zoom enabled

📈 3.4 Interactive Line Plot


plot_ly(
  x = 1:10,
  y = (1:10)^2,
  type = "scatter",
  mode = "lines+markers",
  line = list(color = "red")
)

🖥️ Output:

Red curve showing quadratic growth
Click and zoom enabled

📊 3.5 Interactive Bar Chart


plot_ly(
  x = c("A", "B", "C", "D"),
  y = c(20, 35, 30, 40),
  type = "bar",
  marker = list(color = "green")
)

🖥️ Output:

Green bars
Hover shows values

📊 3.6 ggplot2 + Plotly Integration


library(ggplot2)
library(plotly)

student <- data.frame(
  Name = c("A","B","C","D"),
  Marks = c(70,80,90,85)
)

p <- ggplot(student, aes(Name, Marks)) +
  geom_bar(stat="identity", fill="blue")

ggplotly(p)

🖥️ Output:

Interactive bar chart
Hover + zoom + click enabled

🌐 3.7 Introduction to Shiny

Shiny is used to create interactive web applications in R.

👉 Used for:

Dashboards
Data apps
Live reports

📥 Install Shiny


install.packages("shiny")

📥 Load Library


library(shiny)

🧱 3.8 Structure of Shiny App

A Shiny app has 2 parts:

Component	Purpose
UI	User Interface
Server	Logic/Backend

📱 3.9 Simple Shiny App


library(shiny)

ui <- fluidPage(
  titlePanel("Simple Shiny App"),

  sidebarLayout(
    sidebarPanel(
      sliderInput("num",
                  "Select Number:",
                  min = 1,
                  max = 100,
                  value = 50)
    ),

    mainPanel(
      textOutput("result")
    )
  )
)

server <- function(input, output) {
  output$result <- renderText({
    paste("Selected Value:", input$num)
  })
}

shinyApp(ui = ui, server = server)

🖥️ Output:

Slider input (1–100)
Dynamic text updates instantly

📊 3.10 Shiny Dashboard Example


library(shiny)

ui <- fluidPage(
  titlePanel("Student Dashboard"),

  sidebarLayout(
    sidebarPanel(
      selectInput("subject",
                  "Choose Subject:",
                  choices = c("Math", "Science", "English"))
    ),

    mainPanel(
      textOutput("outputText")
    )
  )
)

server <- function(input, output) {

  output$outputText <- renderText({
    paste("You selected:", input$subject)
  })

}

shinyApp(ui = ui, server = server)

🖥️ Output:

Dropdown menu
Dynamic response display

📊 3.11 Advantages of Interactive Visualization

🎯 Real-time interaction
📊 Better data understanding
📈 Professional dashboards
🧠 Easy decision-making
🌐 Web-based applications

⚖️ 3.12 Comparison

Tool	Type	Use
Base R	Static	Basic plots
ggplot2	Static advanced	Publication graphs
Plotly	Interactive	Dynamic charts
Shiny	Web app	Dashboards

📌 3.13 Summary

Plotly adds interactivity to graphs
ggplotly converts ggplot to interactive charts
Shiny creates full web applications
Interactive tools are used in real-world analytics

❓ 3.14 Viva Questions

What is interactive visualization?
What is Plotly used for?
What is Shiny in R?
Difference between ggplot2 and Plotly?
What are UI and Server in Shiny?
What is ggplotly()?
What are dashboards?

🎓 FINAL SUMMARY (FULL MODULE)

✔ Base R Graphics → Simple plots
✔ ggplot2 → Advanced visualization
✔ Plotly → Interactive charts
✔ Shiny → Full web dashboards

📘 MODULE 4: STATISTICAL ANALYSIS AND MODELING

Class 1: Descriptive Statistics and Measures of Central Tendency

🌟 Learning Objectives

After completing this chapter, students will be able to:

Understand the concept of descriptive statistics.
Explain measures of central tendency.
Calculate Mean, Median, and Mode using R.
Interpret statistical results.
Apply descriptive statistics to real-world data.

📚 4.1 Introduction to Statistics

Statistics is the science of collecting, organizing, analyzing, interpreting, and presenting data. It helps researchers, businesses, scientists, and governments make informed decisions based on numerical information.

For example:

A school calculates the average marks of students.
A company analyzes monthly sales.
A hospital studies patient recovery rates.
Weather departments analyze temperature records.

Statistics transforms raw data into useful information.

📖 Types of Statistics

Statistics is broadly classified into two categories:

1. Descriptive Statistics

Descriptive statistics summarizes and describes the main features of a dataset. It does not make predictions but presents the data in a meaningful way.

Examples:

Mean
Median
Mode
Range
Variance
Standard Deviation

Applications

Student result analysis
Employee salary reports
Sales reports
Population surveys

2. Inferential Statistics

Inferential statistics uses sample data to make predictions or conclusions about a larger population.

Examples:

Hypothesis testing
Regression analysis
ANOVA
Confidence intervals

⭐ Importance of Descriptive Statistics

Descriptive statistics helps to:

Summarize large datasets.
Identify patterns and trends.
Compare different datasets.
Support decision-making.
Prepare data for advanced analysis.

📊 Measures of Central Tendency

Measures of Central Tendency describe the center or typical value of a dataset.

The three common measures are:

Mean
Median
Mode

🔵 4.2 Mean (Arithmetic Mean)

Definition

The Mean is the arithmetic average of all observations.

It is the most commonly used measure of central tendency.

Formula

$Mean = \frac{\sum X}{N}$

Where:

ΣX = Sum of all observations
N = Number of observations

Sample Data (10 Students' Marks)

Student	Marks
1	45
2	52
3	58
4	63
5	67
6	72
7	78
8	84
9	90
10	95

Manual Calculation

Step 1: Add all values

45 + 52 + 58 + 63 + 67 + 72 + 78 + 84 + 90 + 95

= 704

Step 2: Count observations

Number of observations = 10

Step 3: Apply Formula

Mean = 704 ÷ 10

= 70.4

💻 R Program


# Program to Calculate Mean

marks <- c(45,52,58,63,67,72,78,84,90,95)

print("Student Marks")
print(marks)

mean_value <- mean(marks)

print("Mean of Marks")
print(mean_value)

🖥 Output


[1] "Student Marks"

[1] 45 52 58 63 67 72 78 84 90 95

[1] "Mean of Marks"

[1] 70.4

📖 Explanation

The mean() function in R calculates the arithmetic average of all values in the vector.


mean(marks)

returns


70.4

because the total marks are 704, divided by 10 students.

✅ Interpretation

The average marks obtained by the students are 70.4.

This means that if the total marks were equally distributed among all students, each student would receive 70.4 marks.

🌍 Real-Life Applications of Mean

Calculating students' average marks.
Measuring average monthly income.
Determining average rainfall.
Calculating average temperature.
Business profit analysis.
Cricket batting average.
Manufacturing quality control.

✔ Advantages of Mean

Easy to calculate.
Uses all observations.
Suitable for mathematical analysis.
Widely used in statistics.

✖ Disadvantages of Mean

Affected by very high or very low values (outliers).
Not suitable for highly skewed data.
Cannot be used for categorical data.

💡 Important Note

The Mean is the most widely used measure of central tendency, but it can be misleading when a dataset contains extreme values.
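
A quick sketch of this effect: one hypothetical data-entry error (95 typed as 950) shifts the mean sharply, while the median, covered in the next section, stays the same.

```r
marks <- c(45, 52, 58, 63, 67, 72, 78, 84, 90, 95)

# Hypothetical data-entry error: the last mark typed as 950 instead of 95
marks_with_outlier <- replace(marks, 10, 950)

print(mean(marks))                 # 70.4
print(mean(marks_with_outlier))    # 155.9
print(median(marks))               # 69.5
print(median(marks_with_outlier))  # 69.5 (unchanged)
```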

📝 Practice Exercise

Use the following data to calculate the Mean manually and using R.

Data
25
30
35
40
45
50
55
60
65
70

Write an R Program


marks <- c(25,30,35,40,45,50,55,60,65,70)

mean(marks)

Expected Output


[1] 47.5

📌 Key Points

Mean is the arithmetic average.
It is calculated using all observations.
R provides the mean() function.
Mean is affected by extreme values.
It is widely used in business, science, education, and research.

🎯 Learning Summary

After completing this lesson, you have learned:

What is descriptive statistics?
Types of statistics.
Importance of descriptive statistics.
Definition and formula of Mean.
Manual calculation of Mean.
R program to calculate Mean.
Interpretation of output.
Applications, advantages, and disadvantages of Mean.

🔴 4.3 Median

📖 Definition

The Median is the middle value of a dataset when the observations are arranged in ascending or descending order.

Unlike the Mean, the Median is not affected by extremely high or low values (outliers). Therefore, it is considered a better measure of central tendency for skewed data.

🎯 Formula

For Odd Number of Observations

$Median = \left(\frac{n + 1}{2}\right)^{\text{th}} \text{ Observation}$

For Even Number of Observations

$Median = \frac{\text{Middle Value}_{1} + \text{Middle Value}_{2}}{2}$

Where:

n = Total number of observations

📊 Example (10 Student Marks)

Student	Marks
1	45
2	52
3	58
4	63
5	67
6	72
7	78
8	84
9	90
10	95

The data is already arranged in ascending order.

🧮 Manual Calculation

Number of observations = 10 (Even)

Middle positions:

5th value = 67
6th value = 72

Median

= (67 + 72) ÷ 2

= 69.5

💻 R Program


# Program to Calculate Median

marks <- c(45,52,58,63,67,72,78,84,90,95)

print("Student Marks")
print(marks)

median_value <- median(marks)

print("Median of Marks")
print(median_value)

🖥 Output


[1] "Student Marks"

[1] 45 52 58 63 67 72 78 84 90 95

[1] "Median of Marks"

[1] 69.5

📖 Explanation

The median() function automatically sorts the values (if required) and finds the middle value.

For an even number of observations, it calculates the average of the two middle values.
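
To see the two formulas at work, the sketch below implements them by hand and compares the result with median(). The function name manual_median is an assumption used only for this illustration.

```r
manual_median <- function(x) {
  sorted <- sort(x)
  n <- length(sorted)
  if (n %% 2 == 1) {
    # Odd n: the ((n + 1) / 2)-th observation
    sorted[(n + 1) / 2]
  } else {
    # Even n: average of the (n / 2)-th and (n / 2 + 1)-th observations
    (sorted[n / 2] + sorted[n / 2 + 1]) / 2
  }
}

marks <- c(45, 52, 58, 63, 67, 72, 78, 84, 90, 95)
print(manual_median(marks))        # 69.5 (even n)
print(manual_median(marks[1:9]))   # 67   (odd n)
```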

✅ Interpretation

The median marks are 69.5.

This means:

50% of students scored below 69.5
50% of students scored above 69.5

🌍 Real-Life Applications

Income analysis
House price analysis
Population studies
Salary surveys
Medical research

✔ Advantages

Not affected by outliers.
Easy to understand.
Suitable for skewed data.
Useful for ordinal data.

✖ Disadvantages

Does not use every observation.
Difficult to calculate for grouped data manually.

📝 Practice Exercise

Find the median of the following data using R.

Sample Data

28, 35, 40, 45, 50, 55, 60, 65, 70, 80

R Script


marks <- c(28,35,40,45,50,55,60,65,70,80)

median(marks)

Output


[1] 52.5

🟣 4.4 Mode

📖 Definition

The Mode is the value that appears most frequently in a dataset.

A dataset may have:

One Mode (Unimodal)
Two Modes (Bimodal)
More than Two Modes (Multimodal)
No Mode (all values occur once)

Since R does not provide a built-in function for statistical mode, we create a custom function.

📊 Example (10 Student Marks)

Student	Marks
1	45
2	52
3	63
4	63
5	63
6	72
7	78
8	84
9	90
10	95

📋 Frequency Table

Marks	Frequency
45	1
52	1
63	3
72	1
78	1
84	1
90	1
95	1

The highest frequency is 3.

Therefore,

Mode = 63

💻 R Program


# Program to Calculate Mode

marks <- c(45,52,63,63,63,72,78,84,90,95)

Mode <- function(x)
{
    unique_values <- unique(x)
    unique_values[which.max(tabulate(match(x, unique_values)))]
}

mode_value <- Mode(marks)

print("Student Marks")
print(marks)

print("Mode of Marks")
print(mode_value)

🖥 Output


[1] "Student Marks"

[1] 45 52 63 63 63 72 78 84 90 95

[1] "Mode of Marks"

[1] 63

📖 Explanation

The custom Mode() function:

Finds the unique values.
Counts how many times each value appears.
Returns the value with the highest frequency.
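
Because which.max() returns only the first maximum, the Mode() function above reports a single value even for bimodal data, and it returns the first element when nothing repeats. A possible variant that returns every mode, and an empty vector when there is no mode, is sketched below. The name all_modes is hypothetical.

```r
all_modes <- function(x) {
  unique_values <- unique(x)
  frequencies <- tabulate(match(x, unique_values))
  # If nothing repeats, report no mode (an empty vector)
  if (max(frequencies) == 1 && length(unique_values) > 1) {
    return(x[0])
  }
  unique_values[frequencies == max(frequencies)]
}

print(all_modes(c(45, 52, 63, 63, 63, 72)))   # 63
print(all_modes(c(10, 10, 20, 20, 30)))       # 10 20
print(all_modes(c(45, 52, 58)))               # numeric(0)
```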

✅ Interpretation

The most frequently occurring mark is 63.

This indicates that 63 is the most common score among the students.

🌍 Real-Life Applications

Most sold product
Most common blood group
Most frequently purchased item
Customer preference analysis
Election survey analysis

✔ Advantages

Easy to understand.
Suitable for categorical data.
Not affected by outliers.
Represents the most common value.

✖ Disadvantages

Some datasets have multiple modes.
Some datasets have no mode.
Less useful for mathematical calculations.

📊 Comparison of Mean, Median, and Mode

Feature	Mean	Median	Mode
Definition	Average of all values	Middle value	Most frequent value
Uses All Data	✔ Yes	✖ No	✖ No
Affected by Outliers	✔ Yes	✖ No	✖ No
Suitable for Categorical Data	✖ No	✖ No	✔ Yes
R Function	`mean()`	`median()`	Custom Function

Class 2: Measures of Dispersion

🌟 Learning Objectives

After completing this chapter, students will be able to:

Understand the concept of dispersion.
Explain the importance of measures of dispersion.
Calculate Range, Variance, and Standard Deviation using R.
Interpret statistical results.
Compare different measures of dispersion.

📖 4.5 Measures of Dispersion

Definition

Measures of Dispersion are statistical measures that describe how spread out or scattered the data values are around a central value (usually the mean).

While measures of central tendency tell us the center of the data, measures of dispersion indicate how much the observations vary.

For example, two classes may have the same average marks, but one class may have marks that are closely grouped while the other has marks that are widely spread.

Importance of Measures of Dispersion

Measures of dispersion help us to:

Determine the consistency of data.
Compare different datasets.
Measure variability.
Analyze business and scientific data.
Make better statistical decisions.

🔵 4.6 Range

Definition

The Range is the simplest measure of dispersion. It is the difference between the highest and the lowest value in a dataset.

Formula

Range = Maximum Value - Minimum Value

Sample Data (10 Students' Marks)

Student	Marks
1	45
2	52
3	58
4	63
5	67
6	72
7	78
8	84
9	90
10	95

Manual Calculation

Maximum Value = 95

Minimum Value = 45

Range = 95 − 45

= 50

💻 R Program


# Program to Calculate Range

marks <- c(45,52,58,63,67,72,78,84,90,95)

print("Student Marks")
print(marks)

range_value <- max(marks) - min(marks)

print("Range")
print(range_value)

🖥 Output


[1] "Student Marks"

[1] 45 52 58 63 67 72 78 84 90 95

[1] "Range"

[1] 50

Explanation

max() finds the highest value.
min() finds the lowest value.
Their difference gives the range.

Interpretation

The marks are spread over 50 marks, indicating the overall spread between the highest and lowest scores.
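
R also has a function named range(), but it returns the minimum and maximum rather than their difference. A short sketch:

```r
marks <- c(45, 52, 58, 63, 67, 72, 78, 84, 90, 95)

# range() returns c(minimum, maximum), not their difference
print(range(marks))        # 45 95

# diff() of that pair gives the statistical range
range_value <- diff(range(marks))
print(range_value)         # 50
```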

Real-Life Applications

Temperature variation
Stock market prices
Monthly rainfall
Student performance analysis

Advantages

Very easy to calculate.
Easy to understand.
Quick measure of spread.

Disadvantages

Uses only two values.
Strongly affected by extreme values.
Does not describe the distribution of all observations.

🟣 4.7 Variance

Definition

Variance measures the average squared deviation of each observation from the mean. It tells us how much the data values vary around the average.

A small variance indicates that the data points are close to the mean, while a large variance indicates that the data points are widely spread.

Sample Data (10 Students' Marks)

45, 52, 58, 63, 67, 72, 78, 84, 90, 95

Step 1: Calculate Mean

Mean = 70.4

Step 2: Find Squared Deviations

Marks	x − Mean	(x − Mean)²
45	−25.4	645.16
52	−18.4	338.56
58	−12.4	153.76
63	−7.4	54.76
67	−3.4	11.56
72	1.6	2.56
78	7.6	57.76
84	13.6	184.96
90	19.6	384.16
95	24.6	605.16

Sum of squared deviations = 2438.40

Step 3: Calculate Variance

Variance = 2438.40 ÷ (10 − 1)

Variance = 2438.40 ÷ 9

≈ 270.93

💻 R Program


# Program to Calculate Variance

marks <- c(45,52,58,63,67,72,78,84,90,95)

print("Student Marks")
print(marks)

variance_value <- var(marks)

print("Variance")
print(variance_value)

🖥 Output


[1] "Student Marks"

[1] 45 52 58 63 67 72 78 84 90 95

[1] "Variance"

[1] 270.9333

Explanation

The var() function calculates the sample variance by dividing the sum of squared deviations by n − 1.
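
The manual steps can be reproduced in R to confirm that var() divides by n − 1. The sketch also shows the population variance (dividing by n) for comparison; the variable names are illustrative.

```r
marks <- c(45, 52, 58, 63, 67, 72, 78, 84, 90, 95)
n <- length(marks)

# Sum of squared deviations from the mean
ss <- sum((marks - mean(marks))^2)

sample_variance     <- ss / (n - 1)  # what var() computes
population_variance <- ss / n        # divides by n instead

print(ss)                   # 2438.4
print(sample_variance)      # 270.9333
print(population_variance)  # 243.84
print(var(marks))           # 270.9333
```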

Interpretation

The variance of 270.93 is expressed in squared marks, so it is hard to judge on its own. Its square root, the standard deviation, gives the spread in marks.

Applications

Quality control
Financial risk analysis
Scientific experiments
Educational performance analysis

Advantages

Uses all observations.
Provides an accurate measure of variability.
Widely used in statistical analysis.

Disadvantages

Expressed in squared units.
Less intuitive than standard deviation.

🔴 4.8 Standard Deviation

Definition

The Standard Deviation is the positive square root of the variance. It indicates the typical distance of the observations from the mean and is expressed in the same units as the original data.

Formula

$Standard Deviation = \sqrt{Variance}$

Sample Data

45, 52, 58, 63, 67, 72, 78, 84, 90, 95

Manual Calculation

Variance = 270.93

Standard Deviation

= √270.93

≈ 16.46

💻 R Program


# Program to Calculate Standard Deviation

marks <- c(45,52,58,63,67,72,78,84,90,95)

print("Student Marks")
print(marks)

sd_value <- sd(marks)

print("Standard Deviation")
print(sd_value)

🖥 Output


[1] "Student Marks"

[1] 45 52 58 63 67 72 78 84 90 95

[1] "Standard Deviation"

[1] 16.46005

Explanation

The sd() function calculates the square root of the sample variance.

Interpretation

The marks typically vary by about 16.46 marks from the average.
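
A small sketch that checks the relationship between sd() and var(), and lists the marks that lie within one standard deviation of the mean (the variable names are illustrative):

```r
marks <- c(45, 52, 58, 63, 67, 72, 78, 84, 90, 95)

m <- mean(marks)
s <- sd(marks)

# sd() equals the square root of var()
print(all.equal(s, sqrt(var(marks))))   # TRUE

# Marks lying within one standard deviation of the mean
within_one_sd <- marks[abs(marks - m) <= s]
print(within_one_sd)                    # 58 63 67 72 78 84
```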

Real-Life Applications

Exam result analysis
Investment risk measurement
Manufacturing quality control
Medical research
Weather forecasting

Advantages

Uses all observations.
Expressed in original units.
Easy to interpret.
Most commonly used measure of dispersion.

Disadvantages

Influenced by outliers.
More computationally intensive than range.

📊 Comparison of Measures of Dispersion

Measure	Formula	R Function	Uses All Data	Affected by Outliers
Range	Max − Min	`max() - min()`	❌ No	✔ Yes
Variance	Σ(x − x̄)² / (n − 1)	`var()`	✔ Yes	✔ Yes
Standard Deviation	√Variance	`sd()`	✔ Yes	✔ Yes

📝 Practice Exercises

Calculate the Range for: 25, 30, 35, 40, 45, 50, 55, 60, 65, 70.
Write an R program to calculate the Variance of 10 observations.
Write an R program to calculate the Standard Deviation of a dataset.
Explain the difference between Range and Standard Deviation.
Which measure of dispersion is most commonly used? Why?

🎯 Chapter Summary

After studying this chapter, you should be able to:

Define measures of dispersion.
Calculate Range manually and using R.
Calculate Variance manually and using R.
Calculate Standard Deviation manually and using R.
Interpret the results produced by R.
Compare different measures of dispersion.
Apply these concepts to real-world datasets.

Class 3: Complete Descriptive Statistics Using R

🎯 Learning Objectives

After completing this practical session, students will be able to:

Create an R program to calculate descriptive statistics.
Calculate Mean, Median, Mode, Range, Variance, and Standard Deviation.
Interpret statistical results.
Analyze a dataset using R.

📖 Introduction

In previous classes, we studied each statistical measure separately. In this practical session, we will develop a single R program that calculates all descriptive statistics for a dataset.

📊 Sample Dataset (10 Students' Marks)

Student	Marks
1	45
2	52
3	58
4	63
5	67
6	72
7	78
8	84
9	90
10	95

💻 Complete R Program


#---------------------------------------
# Descriptive Statistics in R
#---------------------------------------

# Sample Dataset

marks <- c(45,52,58,63,67,72,78,84,90,95)

print("Student Marks")
print(marks)

# Mean

mean_value <- mean(marks)

# Median

median_value <- median(marks)

# Mode

Mode <- function(x)
{
  unique_values <- unique(x)
  unique_values[
    which.max(tabulate(match(x, unique_values)))
  ]
}

mode_value <- Mode(marks)

# Range

range_value <- max(marks)-min(marks)

# Variance

variance_value <- var(marks)

# Standard Deviation

sd_value <- sd(marks)

print("-------------------------")

print(paste("Mean =",mean_value))

print(paste("Median =",median_value))

print(paste("Mode =",mode_value))

print(paste("Range =",range_value))

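# round() keeps the printed values short; paste() would otherwise show about 15 significant digits

print(paste("Variance =",round(variance_value,4)))

print(paste("Standard Deviation =",round(sd_value,5)))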

print("-------------------------")

🖥 Sample Output


[1] "Student Marks"

[1] 45 52 58 63 67 72 78 84 90 95

[1] "-------------------------"

[1] "Mean = 70.4"

[1] "Median = 69.5"

[1] "Mode = 45"

[1] "Range = 50"

[1] "Variance = 270.9333"

[1] "Standard Deviation = 16.46005"

[1] "-------------------------"

Note: In this dataset, every value appears only once, so the custom Mode() function returns the first value (45). Statistically, this dataset has no mode because no value occurs more frequently than the others. To demonstrate a true mode, use a dataset with repeated values (for example: 45, 52, 58, 63, 63, 63, 72, 84, 90, 95).
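One way to make the "no mode" case explicit is sketched below. The helper name modes_or_na() is hypothetical (it is not a built-in R function). It returns NA when every value occurs only once, and otherwise returns every value tied for the highest frequency.

```r
# Hypothetical helper: returns NA when no value repeats,
# otherwise all values that share the highest frequency.
modes_or_na <- function(x) {
  counts <- table(x)                 # frequency of each distinct value
  if (max(counts) == 1) {
    return(NA)                       # every value occurs once: no mode
  }
  top <- names(counts)[counts == max(counts)]
  as.numeric(top)                    # table() stores the values as names
}

no_repeat   <- c(45, 52, 58, 63, 67, 72, 78, 84, 90, 95)
with_repeat <- c(45, 52, 58, 63, 63, 63, 72, 84, 90, 95)

print(modes_or_na(no_repeat))    # NA
print(modes_or_na(with_repeat))  # 63
```

Returning NA keeps the "no mode" case visible in the output instead of silently reporting the first value.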

📖 Explanation of the Program

Function	Purpose
`mean()`	Calculates the arithmetic mean
`median()`	Calculates the median
`Mode()`	Finds the most frequently occurring value
`max()`	Returns the maximum value
`min()`	Returns the minimum value
`var()`	Calculates the sample variance
`sd()`	Calculates the sample standard deviation

📊 Interpretation of Results

Mean = 70.4

The average marks of the students are 70.4.

Median = 69.5

Half of the students scored below 69.5, while the other half scored above it.

Mode

Since all values occur only once, there is no statistical mode in this dataset.

Range = 50

The difference between the highest and lowest marks is 50.

Variance = 270.93

The variance is measured in squared marks, so it is hard to interpret directly; it is usually read through its square root, the standard deviation.

Standard Deviation = 16.46

The marks typically vary by approximately 16.46 marks from the mean.

🌍 Real-Life Applications

Descriptive statistics is used in:

🎓 Student performance analysis
🏥 Hospital patient data
💼 Employee salary analysis
🏦 Banking and finance
📈 Stock market analysis
🌦 Weather forecasting
🛒 Business sales analysis
🧪 Scientific research
🏭 Manufacturing quality control
📊 Government census reports

📋 Summary Table

Statistical Measure	Formula	R Function	Result
Mean	Σx / n	`mean()`	70.4
Median	Middle value	`median()`	69.5
Mode	Most frequent value	Custom Function	No mode (or 45 with this simple function)
Range	Max − Min	`max() - min()`	50
Variance	Σ(x − x̄)² / (n − 1)	`var()`	270.93
Standard Deviation	√Variance	`sd()`	16.46
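The variance formula in the table can be checked by hand. The sketch below recomputes the sample variance of the marks dataset from the formula and compares it with var(). The variable names are illustrative only. The last step divides by n instead of n − 1 to show the population variance, which var() does not compute.

```r
marks <- c(45, 52, 58, 63, 67, 72, 78, 84, 90, 95)

n          <- length(marks)
deviations <- marks - mean(marks)          # each value minus the mean
manual_var <- sum(deviations^2) / (n - 1)  # sample variance
manual_sd  <- sqrt(manual_var)             # sample standard deviation

print(all.equal(manual_var, var(marks)))   # TRUE
print(all.equal(manual_sd, sd(marks)))     # TRUE

# Population variance divides by n instead of n - 1
population_var <- sum(deviations^2) / n
print(population_var)                      # 243.84
```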

📌 Advantages of Descriptive Statistics

Summarizes large datasets.
Easy to understand.
Supports decision-making.
Helps compare datasets.
Forms the basis for advanced statistical analysis.

⚠ Limitations

Describes only the available data.
Cannot make predictions about a population.
Sensitive to outliers (especially mean, variance, and standard deviation).
Does not establish cause-and-effect relationships.

📝 Lab Exercises

Exercise 1

Write an R program to calculate the Mean of the following data:

25, 30, 35, 40, 45, 50, 55, 60, 65, 70

Exercise 2

Write an R program to calculate the Median of:

18, 25, 32, 40, 45, 48, 55, 60, 68, 72

Exercise 3

Write an R program to calculate the Mode of:

10, 15, 20, 20, 20, 25, 30, 35, 40, 45

Exercise 4

Write an R program to calculate the Range of:

50, 60, 65, 70, 75, 80, 85, 90, 95, 100

Exercise 5

Write an R program to calculate the Variance and Standard Deviation of:

12, 15, 18, 20, 24, 27, 30, 32, 35, 40

❓ Viva Questions

What is descriptive statistics?
Define mean.
Define median.
What is mode?
Why does R require a custom function for mode?
Define range.
What is variance?
Define standard deviation.
Differentiate between variance and standard deviation.
Which measure of central tendency is least affected by outliers?
What is the difference between sample variance and population variance?
Which R function calculates the mean?
Which R function calculates the median?
Which R function calculates the variance?
Which R function calculates the standard deviation?

🎯 Learning Outcomes

After completing Module 4, students can:

Explain descriptive statistics.
Calculate measures of central tendency.
Calculate measures of dispersion.
Develop R programs for statistical analysis.
Interpret statistical output correctly.
Apply descriptive statistics to real-world datasets.

📘 MODULE 5: ADVANCED R PROGRAMMING

Class 1: Control Structures in R

Duration: 1 Class

🎯 Learning Objectives

After completing this lesson, students will be able to:

Understand the concept of control structures.
Use conditional statements in R.
Write programs using if, if...else, and switch().
Make decisions based on different conditions.
Develop real-world decision-making programs in R.

📖 5.1 Introduction to Control Structures

Control structures determine the flow of execution of a program. They allow the program to make decisions and execute different blocks of code based on specified conditions.

Without control structures, every statement would execute sequentially.

Control structures help programmers create intelligent and interactive programs.

🌟 Types of Control Structures in R

There are two major categories:

1. Conditional Statements

if
if...else
else if
switch()

2. Looping Statements

for
while
repeat

📌 Advantages of Control Structures

Makes programs intelligent.
Supports decision making.
Reduces unnecessary code.
Improves program efficiency.
Makes programs easier to maintain.

🔵 5.2 The `if` Statement

Definition

The if statement executes a block of code only if the specified condition is TRUE.

If the condition is FALSE, the statements inside the if block are skipped.

Syntax


if(condition)
{
   statements
}

Flow Diagram


          Condition
              │
      ┌───────┴────────┐
      │                │
    TRUE             FALSE
      │                │
Execute Block      Skip Block

Example 1: Check Positive Number

Problem

Write an R program to check whether a number is positive.

R Program


number <- 25

if(number > 0)
{
   print("Positive Number")
}

Output


[1] "Positive Number"

Explanation

Since 25 > 0, the condition is TRUE.

Therefore,


Positive Number

is printed.

Example 2: Student Passed or Not

Sample Marks


Marks = 68

R Program


marks <- 68

if(marks >= 40)
{
   print("Student Passed")
}

Output


[1] "Student Passed"

Example 3: Check Even Number

R Program


number <- 20

if(number %% 2 == 0)
{
   print("Even Number")
}

Output


[1] "Even Number"

Real-Life Applications

ATM transaction approval
Online payment verification
Student pass/fail checking
Login authentication
Age verification

Advantages

Simple and easy to use.
Executes code only when required.
Improves efficiency.

Limitation

Cannot specify an alternative action when the condition is FALSE.

🟢 5.3 The `if...else` Statement

Definition

The if...else statement executes one block of code if the condition is TRUE and another block if the condition is FALSE.

Syntax


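if(condition)
{
   statements
} else {
   statements
}

Note: When an if...else statement is written at the top level of a script or typed at the console, else must be on the same line as the closing brace of the if block. If else starts a new line, R treats the if statement as already complete and reports an "unexpected 'else'" error.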

Flow Diagram


           Condition
               │
      ┌────────┴────────┐
      │                 │
    TRUE              FALSE
      │                 │
Execute IF Block   Execute ELSE Block

Example 1: Pass or Fail

R Program


marks <- 35

if(marks >= 40)
{
   print("Pass")
} else {
   print("Fail")
}

Output


[1] "Fail"

Explanation

The student's marks are 35, which is less than 40.

Therefore, the else block is executed.

Example 2: Voting Eligibility

R Program


age <- 20

if(age >= 18)
{
   print("Eligible for Voting")
} else {
   print("Not Eligible")
}

Output


[1] "Eligible for Voting"

Example 3: Largest of Two Numbers

Sample Data


A = 45

B = 60

R Program


a <- 45
b <- 60

if(a > b)
{
   print("A is Largest")
} else {
   print("B is Largest")
}

Output


[1] "B is Largest"

Real-Life Applications

Bank loan approval
Employee promotion
Scholarship eligibility
Insurance claim approval
Online order verification

Advantages

Supports two-way decision making.
Easy to implement.
Improves program readability.
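An if...else statement evaluates a single condition. To make the same two-way decision for every element of a vector, base R provides the vectorized ifelse() function. A brief sketch with illustrative data:

```r
# Illustrative marks for five students
marks <- c(35, 68, 40, 22, 91)

# ifelse(test, yes, no) chooses "Pass" or "Fail" element by element
result <- ifelse(marks >= 40, "Pass", "Fail")

print(result)   # "Fail" "Pass" "Pass" "Fail" "Pass"
```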

Practice Exercise

Write an R program to check whether a person is:

Adult (Age ≥ 18)
Minor (Age < 18)

Expected Program


age <- 16

if(age >= 18)
{
   print("Adult")
} else {
   print("Minor")
}

Output


[1] "Minor"

🟣 5.4 The `switch()` Statement

Definition

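In R, switch() is a function that selects one option from multiple alternatives based on the value of an expression. When the expression is a number, switch() returns the alternative at that position; when it is a character string, it returns the alternative with the matching name.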

It is useful when there are many possible choices, making the code shorter and easier to read than multiple if...else if statements.

Syntax


switch(expression,
       option1,
       option2,
       option3,
       ...
)

Example 1: Day of the Week

R Program


day <- 3

result <- switch(day,
                 "Monday",
                 "Tuesday",
                 "Wednesday",
                 "Thursday",
                 "Friday",
                 "Saturday",
                 "Sunday")

print(result)

Output


[1] "Wednesday"

Example 2: Calculator Menu

R Program


choice <- 2

result <- switch(choice,
                 "Addition",
                 "Subtraction",
                 "Multiplication",
                 "Division")

print(result)

Output


[1] "Subtraction"

Advantages

Easy to read.
Suitable for menu-driven programs.
Reduces lengthy if...else if statements.
Improves program organization.

Limitations

Best suited for fixed choices.
Not suitable for complex logical conditions.
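switch() also accepts a character string, in which case it matches the string against the names of the alternatives, and a single unnamed final argument acts as the default. The sketch below wraps this in a small function; calculate() and its argument names are hypothetical and used only for illustration.

```r
# Hypothetical calculator built on a character-based switch()
calculate <- function(operation, x, y) {
  switch(operation,
         add      = x + y,
         subtract = x - y,
         multiply = x * y,
         divide   = x / y,
         stop("Unknown operation: ", operation))  # default branch
}

print(calculate("add", 12, 4))       # 16
print(calculate("divide", 12, 4))    # 3

# With a numeric selector there is no default: an out-of-range
# value makes switch() return NULL (invisibly).
result <- switch(9, "Monday", "Tuesday", "Wednesday")
print(is.null(result))               # TRUE
```

Named alternatives make menu code easier to read than position numbers, and the default branch turns an invalid choice into a clear error instead of a silent NULL.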

📊 Comparison of Conditional Statements

Statement	Purpose	Best Use
`if`	Executes code when a condition is TRUE	Single condition
`if...else`	Chooses between two alternatives	Two-way decisions
`switch()`	Selects one option from many	Menu-driven programs

📝 Lab Exercises

Check whether a number is positive or negative.
Check whether a number is even or odd.
Check whether a student has passed or failed.
Find the larger of two numbers.
Display the day of the week using switch().
Create a simple calculator menu using switch().

❓ Viva Questions

What is a control structure?
What is the purpose of the if statement?
Explain the if...else statement with an example.
What is the difference between if and if...else?
What is the purpose of the switch() statement?
Give two real-life applications of conditional statements.
Which statement is suitable for menu-driven programs?
What happens if the if condition is FALSE?

📚 Class Summary

In this class, you learned:

The concept of control structures.
How to use if for single-condition decisions.
How if...else handles two-way decisions.
How switch() simplifies multiple-choice selection.
Real-world applications, advantages, limitations, and practice exercises.

Class 2: Nested `if...else` and `else if` Ladder

Duration: 1 Class

🎯 Learning Objectives

After completing this lesson, students will be able to:

Understand nested if...else statements.
Use the else if ladder for multiple conditions.
Develop programs involving multiple decision-making scenarios.
Apply conditional logic to solve practical problems.

📖 5.5 Nested `if...else`

Definition

A nested if...else statement is an if or if...else statement placed inside another if or else block.

It is used when one decision depends on the result of another decision.

Syntax


if(condition1)
{
    if(condition2)
    {
        statements
    } else {
        statements
    }
} else {
    statements
}

Flow of Execution

Check the first condition.
If it is TRUE, check the second condition.
Execute the appropriate block.
If the first condition is FALSE, execute the outer else block.

💻 Example 1: Voting Eligibility and Senior Citizen

Problem

Write an R program to determine whether a person is:

Not Eligible for Voting
Eligible for Voting
Senior Citizen

Sample Data

Person	Age
Rahul	65

R Program


age <- 65

if(age >= 18)
{
    if(age >= 60)
    {
        print("Senior Citizen")
    } else {
        print("Eligible for Voting")
    }
} else {
    print("Not Eligible for Voting")
}

Output


[1] "Senior Citizen"

Explanation

Age = 65
First condition (age >= 18) is TRUE.
Second condition (age >= 60) is also TRUE.
Therefore, Senior Citizen is displayed.

💻 Example 2: Login Authentication

Problem

Check both username and password.

R Program


username <- "admin"
password <- "12345"

if(username == "admin")
{
    if(password == "12345")
    {
        print("Login Successful")
    } else {
        print("Incorrect Password")
    }
} else {
    print("Invalid Username")
}

Output


[1] "Login Successful"

Real-Life Applications

ATM authentication
Online banking
User login systems
Online examinations
Employee verification

🟢 5.6 `else if` Ladder

Definition

The else if ladder is used when there are more than two possible conditions.

The conditions are checked from top to bottom, and the first TRUE condition is executed.

If none of the conditions is TRUE, the else block is executed.

Syntax


if(condition1)
{
    statements
} else if(condition2) {
    statements
} else if(condition3) {
    statements
} else {
    statements
}

💻 Example 3: Grade Calculation

Problem

Write an R program to display the grade of a student based on marks.

Grade Table

Marks	Grade
90–100	A+
80–89	A
70–79	B
60–69	C
40–59	D
Below 40	F

Sample Data

Marks = 76

R Program


marks <- 76

if(marks >= 90)
{
    print("Grade A+")
} else if(marks >= 80) {
    print("Grade A")
} else if(marks >= 70) {
    print("Grade B")
} else if(marks >= 60) {
    print("Grade C")
} else if(marks >= 40) {
    print("Grade D")
} else {
    print("Grade F")
}

Output


[1] "Grade B"

Explanation

Since 76 is greater than or equal to 70 but less than 80, the program prints Grade B.

💻 Example 4: Largest of Three Numbers

Sample Data

Variable	Value
A	45
B	75
C	60

R Program


a <- 45
b <- 75
c <- 60

# >= is used so that ties are not wrongly reported as "C is Largest"
if(a >= b && a >= c)
{
    print("A is Largest")
} else if(b >= a && b >= c) {
    print("B is Largest")
} else {
    print("C is Largest")
}

Output


[1] "B is Largest"

💻 Example 5: Leap Year Check

Rule

A year is a leap year if:

It is divisible by 400, or
It is divisible by 4 but not divisible by 100.

Sample Data

Year = 2024

R Program


year <- 2024

if((year %% 400 == 0) || (year %% 4 == 0 && year %% 100 != 0))
{
    print("Leap Year")
} else {
    print("Not a Leap Year")
}

Output


[1] "Leap Year"

Explanation

2024 is divisible by 4.
2024 is not divisible by 100.
Therefore, 2024 is a leap year.
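The century clauses of the rule only matter for years such as 1900 and 2000. The sketch below wraps the rule in a hypothetical helper, is_leap_year(), and uses the element-wise operators | and & so that a whole vector of years can be checked at once.

```r
# Hypothetical helper wrapping the leap year rule
is_leap_year <- function(year) {
  (year %% 400 == 0) | (year %% 4 == 0 & year %% 100 != 0)
}

years <- c(1900, 2000, 2023, 2024)
print(is_leap_year(years))   # FALSE  TRUE FALSE  TRUE
```

1900 is divisible by 100 but not by 400, so it is not a leap year; 2000 is divisible by 400, so it is.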

📊 Comparison of Conditional Statements

Feature	`if`	`if...else`	`else if` Ladder
Number of Conditions	One	One (two branches)	Multiple
Alternative Action	❌	✔	✔
Best Use	Single decision	Two-way decision	Multi-way decision

🌍 Real-Life Applications

Student grading systems
Online login verification
Bank loan approval
Employee salary classification
Tax calculation
Scholarship eligibility
Election voting systems

✔ Advantages

Handles multiple conditions efficiently.
Makes code easier to read.
Supports complex decision-making.
Suitable for real-world applications.

✖ Disadvantages

Long else if ladders can reduce readability.
Incorrect condition order may lead to wrong results.
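The effect of condition order can be seen by comparing two versions of the grading ladder. Both function names below are hypothetical. The first checks the thresholds from lowest to highest, so the first TRUE condition hides all the stricter ones.

```r
# Wrong order: marks >= 40 is TRUE for every passing mark,
# so the later conditions are never reached.
grade_wrong_order <- function(marks) {
  if (marks >= 40) {
    "D"
  } else if (marks >= 60) {
    "C"
  } else if (marks >= 70) {
    "B"
  } else if (marks >= 80) {
    "A"
  } else if (marks >= 90) {
    "A+"
  } else {
    "F"
  }
}

# Correct order: check the highest threshold first.
grade_right_order <- function(marks) {
  if (marks >= 90) {
    "A+"
  } else if (marks >= 80) {
    "A"
  } else if (marks >= 70) {
    "B"
  } else if (marks >= 60) {
    "C"
  } else if (marks >= 40) {
    "D"
  } else {
    "F"
  }
}

print(grade_wrong_order(76))   # "D"
print(grade_right_order(76))   # "B"
```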

📝 Lab Exercises

Write an R program to find the largest of three numbers.
Write an R program to calculate grades using the else if ladder.
Write an R program to check whether a year is a leap year.
Write an R program to classify a person's age as Child, Teenager, Adult, or Senior Citizen.
Write an R program to calculate electricity bills based on slab rates.

❓ Viva Questions

What is a nested if...else statement?
What is an else if ladder?
What is the difference between nested if and else if?
When should you use an else if ladder?
Write the syntax of a nested if statement.
How is a leap year determined in R?
Why is the order of conditions important in an else if ladder?
Give two practical applications of nested if.

📚 Class Summary

In this class, you learned:

Nested if...else
else if ladder
Grade calculation
Largest of three numbers
Leap year checking
Login authentication
Practical applications
Lab exercises and viva questions

Class 3: Looping Constructs in R (Part 1)

Duration: 1 Class

🎯 Learning Objectives

After completing this lesson, students will be able to:

Understand the need for loops in programming.
Use the for loop in R.
Apply nested for loops.
Generate tables and patterns using loops.
Solve repetitive programming tasks efficiently.

📖 5.7 Introduction to Loops

Definition

A loop is a control structure that repeatedly executes a block of code until a specified condition is met or for a fixed number of iterations.

Loops reduce repetitive coding and make programs shorter, more efficient, and easier to maintain.

Why Use Loops?

Without loops, printing numbers from 1 to 10 would require ten separate print() statements.

Using a loop, the same task can be completed with just a few lines of code.

🌟 Types of Loops in R

R supports three main looping constructs:

for Loop
while Loop
repeat Loop

In this class, we focus on the for loop.

🔵 5.8 The `for` Loop

Definition

The for loop executes a block of code once for each element of a sequence, such as a vector or list. It is commonly used when the number of iterations is known in advance.

Syntax


for(variable in sequence)
{
    statements
}

Flow Diagram


Start
  │
  ▼
Initialize Variable
  │
  ▼
Is Next Value Available? ◄────────┐
  │                               │
  ├── Yes ──► Execute Block ──────┘
  │
  └── No ───► Stop

💻 Example 1: Print Numbers from 1 to 10

R Program


for(i in 1:10)
{
    print(i)
}

Output


[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10

Explanation

The loop variable i takes the values from 1 to 10, one at a time.

The print(i) statement executes once for each value.

💻 Example 2: Print Even Numbers from 2 to 20

R Program


for(i in seq(2,20,2))
{
    print(i)
}

Output


[1] 2
[1] 4
[1] 6
[1] 8
[1] 10
[1] 12
[1] 14
[1] 16
[1] 18
[1] 20

Explanation

The seq(2,20,2) function generates the sequence:

2, 4, 6, 8, 10, 12, 14, 16, 18, 20

The loop prints each value.

💻 Example 3: Multiplication Table of 5

R Program


number <- 5

for(i in 1:10)
{
    print(paste(number,"x",i,"=",number*i))
}

Output


[1] "5 x 1 = 5"
[1] "5 x 2 = 10"
[1] "5 x 3 = 15"
[1] "5 x 4 = 20"
[1] "5 x 5 = 25"
[1] "5 x 6 = 30"
[1] "5 x 7 = 35"
[1] "5 x 8 = 40"
[1] "5 x 9 = 45"
[1] "5 x 10 = 50"

💻 Example 4: Sum of First 10 Natural Numbers

Formula

Sum = 1 + 2 + 3 + ... + 10

R Program


sum <- 0

for(i in 1:10)
{
    sum <- sum + i
}

print(sum)

Output


[1] 55

Explanation

The variable sum starts at 0.

Each loop iteration adds the current value of i.

Final result = 55.

💻 Example 5: Factorial of a Number

Sample Data

Number = 5

R Program


fact <- 1

for(i in 1:5)
{
    fact <- fact * i
}

print(fact)

Output


[1] 120

Explanation

5! = 1 × 2 × 3 × 4 × 5 = 120
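For comparison, base R can compute the same product without an explicit loop. The sketch below generalizes the example with a variable n (an illustrative name) and checks the loop against prod() and the built-in factorial() function.

```r
n <- 5

loop_result <- 1
for (i in 1:n) {
  loop_result <- loop_result * i   # running product 1 * 2 * ... * n
}

print(loop_result)     # 120
print(prod(1:n))       # 120, product of the whole sequence at once
print(factorial(n))    # 120, built-in factorial function
```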

🟣 5.9 Nested `for` Loop

Definition

A nested for loop is a for loop placed inside another for loop.

It is useful for:

Pattern printing
Matrix operations
Multiplication tables
Two-dimensional data processing

Syntax


for(i in 1:n)
{
    for(j in 1:m)
    {
        statements
    }
}
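One caveat with the 1:n form used in the syntax above: when n is 0, 1:0 counts down and yields the values 1 and 0, so the loop body runs twice instead of not at all. A minimal sketch of the difference, using seq_len() as the safer alternative (the counter variables are illustrative):

```r
n <- 0

count_colon <- 0
for (i in 1:n) {           # 1:0 is c(1, 0), so this runs twice
  count_colon <- count_colon + 1
}

count_seq_len <- 0
for (i in seq_len(n)) {    # seq_len(0) is empty, so this never runs
  count_seq_len <- count_seq_len + 1
}

print(count_colon)     # 2
print(count_seq_len)   # 0
```

This matters mainly when n comes from data, for example the number of rows in a matrix that may be empty.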

💻 Example 6: Print a 5 × 5 Star Pattern

R Program


for(i in 1:5)
{
    for(j in 1:5)
    {
        cat("* ")
    }
    cat("\n")
}

Output


* * * * *
* * * * *
* * * * *
* * * * *
* * * * *

💻 Example 7: Multiplication Table (1 to 5)

R Program


for(i in 1:5)
{
    for(j in 1:5)
    {
        cat(i*j,"\t")
    }
    cat("\n")
}

Output


1   2   3   4   5
2   4   6   8   10
3   6   9   12  15
4   8   12  16  20
5   10  15  20  25

🌍 Real-Life Applications of `for` Loops

Processing student marks
Reading records from a dataset
Generating reports
Printing invoices
Matrix calculations
Creating tables and charts
Automating repetitive tasks

✔ Advantages

Reduces repetitive code.
Easy to understand.
Suitable when the number of iterations is known.
Improves code readability.

✖ Disadvantages

Not suitable when the number of iterations is unknown.
Can become inefficient for extremely large datasets if vectorized solutions are available.

📊 Summary Table

Loop Type	Best Used When	Example
`for`	Number of iterations is known	Print 1–10
Nested `for`	Two-dimensional processing	Matrix, patterns

📝 Lab Exercises

Print numbers from 1 to 20.
Print all odd numbers from 1 to 50.
Print the multiplication table of 7.
Calculate the sum of the first 20 natural numbers.
Calculate the factorial of 6.
Print a 4 × 4 star pattern.
Print the multiplication table from 1 to 10 using nested for loops.

❓ Viva Questions

What is a loop?
Why are loops used in programming?
Define the for loop.
What is a nested for loop?
Write the syntax of a for loop.
When should you use a for loop?
Give two applications of nested for loops.
What is the output of for(i in 1:5) print(i)?

📚 Class Summary

In this class, you learned:

The concept of loops.
The for loop.
Nested for loops.
Programs for printing numbers, even numbers, multiplication tables, sums, factorials, and star patterns.
Real-world applications, advantages, disadvantages, and practice exercises.

Class 4: `while` Loop, `repeat` Loop, `break`, and `next`

Duration: 1 Class

🎯 Learning Objectives

After completing this lesson, students will be able to:

Understand the while loop.
Use the repeat loop.
Apply break and next statements.
Write efficient looping programs.
Compare different looping constructs in R.

📖 5.10 The `while` Loop

Definition

A while loop repeatedly executes a block of code as long as the specified condition is TRUE.

Unlike a for loop, the number of iterations is not fixed. The loop continues until the condition becomes FALSE.

Syntax


while(condition)
{
    statements
}

Flow Diagram


           Start
             │
             ▼
      Check Condition
             │
      ┌──────┴──────┐
      │             │
    TRUE          FALSE
      │             │
Execute Block      Stop
      │
      └─────────────┘

💻 Example 1: Print Numbers from 1 to 10

R Program


i <- 1

while(i <= 10)
{
    print(i)
    i <- i + 1
}

Output


[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10

Explanation

Variable i starts at 1.
The condition i <= 10 is checked.
The value of i is printed.
i is increased by 1.
The process repeats until i becomes 11, at which point the condition i <= 10 is FALSE and the loop stops.

💻 Example 2: Sum of First 10 Natural Numbers


i <- 1
sum <- 0

while(i <= 10)
{
    sum <- sum + i
    i <- i + 1
}

print(sum)

Output


[1] 55

💻 Example 3: Multiplication Table of 8


i <- 1

while(i <= 10)
{
    print(paste("8 x", i, "=", 8*i))
    i <- i + 1
}

Output


[1] "8 x 1 = 8"
[1] "8 x 2 = 16"
[1] "8 x 3 = 24"
...
[1] "8 x 10 = 80"

✔ Advantages of `while`

Suitable when the number of iterations is unknown.
Easy to implement.
Flexible for condition-based repetition.

✖ Disadvantages

May result in an infinite loop if the condition never becomes FALSE.
Requires careful updating of the loop variable.
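A common safeguard against accidental infinite loops is an iteration cap. The sketch below counts how many times 1000 must be halved to drop below 1, a case where the number of iterations is not obvious in advance. The name max_iterations and its value are arbitrary choices made for illustration.

```r
value          <- 1000
steps          <- 0
max_iterations <- 100   # arbitrary safety limit (an assumption)

# Stop when value drops below 1 OR when the safety limit is reached
while (value >= 1 && steps < max_iterations) {
  value <- value / 2
  steps <- steps + 1
}

print(steps)   # 10
```

If the update step were forgotten or wrong, the cap would still end the loop after 100 iterations.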

🟣 5.11 The `repeat` Loop

Definition

The repeat loop executes a block of code indefinitely until it is explicitly stopped using the break statement.

Syntax


repeat
{
    statements

    if(condition)
        break
}

💻 Example 4: Print Numbers from 1 to 10


i <- 1

repeat
{
    print(i)

    i <- i + 1

    if(i > 10)
        break
}

Output


[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10

Explanation

The repeat loop runs continuously until the break statement stops it.

💻 Example 5: Find First Number Divisible by 17


i <- 1

repeat
{
    if(i %% 17 == 0)
    {
        print(i)
        break
    }

    i <- i + 1
}

Output


[1] 17

🔴 5.12 The `break` Statement

Definition

The break statement immediately terminates the innermost loop that contains it, regardless of whether the loop condition is still TRUE.

Syntax


break

💻 Example 6: Stop at Number 6


for(i in 1:10)
{
    if(i == 6)
    {
        break
    }

    print(i)
}

Output


[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

Explanation

When i becomes 6, the loop stops immediately.
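In nested loops, break leaves only the loop in which it appears. The sketch below records which (i, j) pairs are visited; the vector name visited is illustrative.

```r
visited <- character(0)

for (i in 1:3) {
  for (j in 1:3) {
    if (j == 2) {
      break                 # leaves the inner loop only
    }
    visited <- c(visited, paste0(i, "-", j))
  }
}

print(visited)   # "1-1" "2-1" "3-1"
```

The outer loop still runs for i = 1, 2, and 3; only the inner loop is cut short each time.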

🟢 5.13 The `next` Statement

Definition

The next statement skips the current iteration and continues with the next iteration of the loop.

Syntax


next

💻 Example 7: Skip Number 5


for(i in 1:10)
{
    if(i == 5)
    {
        next
    }

    print(i)
}

Output


[1] 1
[1] 2
[1] 3
[1] 4
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10

Explanation

The value 5 is skipped because the next statement moves directly to the next iteration.

💻 Example 8: Print Only Odd Numbers


for(i in 1:20)
{
    if(i %% 2 == 0)
    {
        next
    }

    print(i)
}

Output


[1] 1
[1] 3
[1] 5
[1] 7
[1] 9
[1] 11
[1] 13
[1] 15
[1] 17
[1] 19

📊 Comparison of Loops

Feature	`for`	`while`	`repeat`
Number of Iterations	Known	Unknown	Infinite until `break`
Condition Checked	Before each iteration	Before each iteration	Inside the loop
Best Use	Fixed repetitions	Condition-controlled repetition	Continuous processes

📊 Comparison of `break` and `next`

Feature	`break`	`next`
Action	Stops the loop	Skips current iteration
Continues Loop?	❌ No	✔ Yes
Typical Use	Exit early	Ignore specific values

🌍 Real-Life Applications

ATM cash withdrawal limits
Password validation
Online shopping carts
Data cleaning
Reading files line by line
Sensor monitoring
Menu-driven applications
Network communication

✔ Advantages

Automates repetitive tasks.
Supports flexible programming.
Easy to combine with conditions.
Can process datasets of any size one element at a time (although vectorized code is usually faster in R when it is available).

✖ Disadvantages

Infinite loops may occur if conditions are incorrect.
Can reduce readability if nested excessively.

📝 Lab Exercises

Exercise 1

Print numbers from 1 to 50 using a while loop.

Exercise 2

Find the sum of the first 25 natural numbers.

Exercise 3

Print the multiplication table of 12 using a while loop.

Exercise 4

Write a repeat loop that prints numbers from 10 to 1.

Exercise 5

Print numbers from 1 to 20, stopping at 15 using break.

Exercise 6

Print numbers from 1 to 20, skipping multiples of 3 using next.

Exercise 7

Write a program to print only even numbers between 1 and 50.

❓ Viva Questions

What is a while loop?
How does a while loop differ from a for loop?
What is a repeat loop?
Why is the break statement used?
What is the purpose of the next statement?
What happens if a while loop condition never becomes FALSE?
Can a repeat loop execute without a break statement?
Which loop is best when the number of iterations is unknown?

📚 Class Summary

In this class, you learned:

The while loop and its syntax.
The repeat loop and how it differs from while.
The use of break to terminate loops.
The use of next to skip iterations.
Practical R programs with outputs and explanations.
Real-world applications, advantages, limitations, exercises, and viva questions.

Class 5: Vectorized Operations in R

Duration: 1 Class

🎯 Learning Objectives

After completing this lesson, students will be able to:

Understand vectorized operations in R.
Perform arithmetic operations on vectors.
Apply logical and relational operators.
Use mathematical functions with vectors.
Compare vectorized operations with loops.

📖 5.14 Introduction to Vectorized Operations

Definition

Vectorization is one of the most powerful features of R. It allows operations to be performed on an entire vector at once instead of processing one element at a time using loops.

This makes R programs:

✔ Faster
✔ Shorter
✔ Easier to read
✔ More efficient

Example

Without vectorization:


a <- c(10,20,30,40,50)
b <- c(2,4,6,8,10)

result <- numeric(5)

for(i in 1:5)
{
    result[i] <- a[i] + b[i]
}

print(result)

With vectorization:


a <- c(10,20,30,40,50)
b <- c(2,4,6,8,10)

result <- a + b

print(result)

Both programs produce the same result, but the vectorized version is shorter and more efficient.

🔵 5.15 Arithmetic Operations on Vectors

Sample Data (10 Values)


Vector A

10 20 30 40 50 60 70 80 90 100

Vector B

2 4 6 8 10 12 14 16 18 20

💻 Example 1: Addition


A <- c(10,20,30,40,50,60,70,80,90,100)
B <- c(2,4,6,8,10,12,14,16,18,20)

A + B

Output


[1] 12 24 36 48 60 72 84 96 108 120

💻 Example 2: Subtraction


A - B

Output


[1] 8 16 24 32 40 48 56 64 72 80

💻 Example 3: Multiplication


A * B

Output


[1] 20 80 180 320 500 720 980 1280 1620 2000

💻 Example 4: Division


A / B

Output


[1] 5 5 5 5 5 5 5 5 5 5

💻 Example 5: Power

A^2

Output


[1] 100 400 900 1600 2500 3600 4900 6400 8100 10000

🟢 5.16 Relational Operations

Relational operators compare vector elements and return TRUE or FALSE.

Operators

Operator	Meaning
>	Greater than
<	Less than
>=	Greater than or equal
<=	Less than or equal
==	Equal
!=	Not equal

💻 Example 6


A > 50

Output


[1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE

💻 Example 7


A == 40

Output


[1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE

🟣 5.17 Logical Operations

Logical operators combine multiple conditions.

Operators

Operator	Meaning
&	AND
|	OR
!	NOT

💻 Example 8


(A > 30) & (A < 80)

Output


[1] FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE

💻 Example 9


(A < 30) | (A > 80)

Output


[1] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
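Element-wise logical results like these are often used to select or count values. The sketch below reuses vector A from the examples above; in_range is an illustrative name. Note that & and | work element by element, whereas && and || (as used in if conditions) are meant for single TRUE/FALSE values.

```r
A <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)

in_range <- (A > 30) & (A < 80)   # element-wise AND

print(A[in_range])     # 40 50 60 70
print(sum(in_range))   # 4  (each TRUE counts as 1)
print(any(A > 95))     # TRUE
print(all(A > 5))      # TRUE
```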

🟠 5.18 Mathematical Functions

R provides built-in mathematical functions that work directly on vectors.

Example 10: Square Root


sqrt(A)

Output


[1] 3.162278 4.472136 5.477226 6.324555 7.071068
[6] 7.745967 8.366600 8.944272 9.486833 10.000000

Example 11: Logarithm


log(A)

Output


[1] 2.302585 2.995732 3.401197 3.688879 3.912023
[6] 4.094345 4.248495 4.382027 4.499810 4.605170

Example 12: Exponential


exp(1:5)

Output


[1] 2.718282 7.389056 20.085537 54.598150 148.413159

Example 13: Absolute Value


x <- c(-5,-10,15,-20,25)

abs(x)

Output


[1] 5 10 15 20 25

🔴 5.19 Statistical Functions

Example 14


marks <- c(45,52,58,63,67,72,78,84,90,95)

sum(marks)

mean(marks)

max(marks)

min(marks)

length(marks)

Output


[1] 704

[1] 70.4

[1] 95

[1] 45

[1] 10

⚡ Performance Comparison

Using Loop


result <- numeric(10)

for(i in 1:10)
{
    result[i] <- A[i] + B[i]
}

Using Vectorization


result <- A + B

Which is Better?

Feature	Loop	Vectorization
Speed	Slower	Faster
Readability	Moderate	Excellent
Memory Use	Result vector is filled in place once preallocated	May create temporary vectors for intermediate results
Code Length	Longer	Shorter
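The speed difference can be measured with system.time(). The sketch below is illustrative only: the vector length of one million is an arbitrary choice, and actual timings depend on the machine and R version, so no timings are quoted here.

```r
n <- 1e6                      # arbitrary size chosen for illustration
x <- runif(n)
y <- runif(n)

loop_time <- system.time({
  loop_result <- numeric(n)   # preallocate the result vector
  for (i in seq_len(n)) {
    loop_result[i] <- x[i] + y[i]
  }
})

vector_time <- system.time({
  vector_result <- x + y
})

print(identical(loop_result, vector_result))  # TRUE
print(loop_time["elapsed"])                   # seconds taken by the loop
print(vector_time["elapsed"])                 # seconds taken by x + y
```

Both approaches produce identical values; only the time taken differs.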

🌍 Real-Life Applications

Financial analysis
Data preprocessing
Machine learning
Scientific computing
Image processing
Bioinformatics
Statistical analysis
Business analytics

✔ Advantages

Faster execution.
Cleaner code.
Less programming effort.
Better performance.
Optimized for large datasets.

✖ Disadvantages

May use more memory for very large vectors.
Requires vectors of compatible lengths or understanding of R's recycling rules.
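The recycling rule mentioned above can be illustrated briefly. In the sketch below (vector names are illustrative), the shorter vector is repeated until it matches the length of the longer one.

```r
long_vec  <- c(10, 20, 30, 40, 50, 60)
short_vec <- c(1, 2)

# short_vec is recycled as 1, 2, 1, 2, 1, 2
print(long_vec + short_vec)   # 11 22 31 42 51 62

# A single number is recycled across the whole vector
print(long_vec * 2)           # 20 40 60 80 100 120

# Lengths that are not multiples still work, but R raises a warning
# (suppressed here so the example runs quietly)
result <- suppressWarnings(long_vec + c(1, 2, 3, 4))
print(result)                 # 11 22 33 44 51 62
```

Unintended recycling is a common source of silent errors, so it is worth checking vector lengths before combining them.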

📝 Lab Exercises

Create two vectors of 10 elements and perform addition.
Perform subtraction, multiplication, and division on two vectors.
Find the square of every element in a vector.
Check which elements are greater than 50.
Find the square root of all elements.
Calculate the sum and mean of a vector.
Compare the performance of a loop and vectorized addition.
Create a vector of temperatures and convert them from Celsius to Fahrenheit using vectorized operations.

❓ Viva Questions

What is vectorization in R?
Why are vectorized operations faster than loops?
Name four arithmetic operations on vectors.
What are relational operators?
What is the purpose of logical operators?
Which function calculates the square root?
Which function returns the absolute value?
Which function calculates the mean?
How does vectorization improve code readability?
Give two applications of vectorized operations.

📚 Class Summary

In this class, you learned:

The concept of vectorization.
Arithmetic operations on vectors.
Relational and logical operations.
Mathematical and statistical functions.
Performance comparison between loops and vectorized operations.
Practical programs with outputs and explanations.
Applications, exercises, and viva questions.

Class 6: The `apply()` Function in R

Duration: 1 Class

🎯 Learning Objectives

After completing this lesson, students will be able to:

Understand the purpose of the apply() function.
Perform row-wise and column-wise operations on matrices.
Replace explicit loops with concise apply() calls.
Apply mathematical and statistical functions to matrices.
Analyze matrix data effectively.

📖 5.20 Introduction to the Apply Family

The Apply Family is one of the most widely used features of R. It provides concise alternatives to explicit loops for processing data.

The Apply Family includes:

Function	Purpose
`apply()`	Apply a function to rows or columns of a matrix/array
`lapply()`	Apply a function to each element of a list
`sapply()`	Simplified version of `lapply()`
`tapply()`	Apply a function to groups of data
`mapply()`	Apply a function to multiple objects simultaneously

In this class, we focus on apply().

🔵 5.21 The `apply()` Function

Definition

The apply() function applies a specified function to the rows or columns of a matrix or array.

It eliminates the need for explicit loops and makes the code more concise.

Syntax


apply(X, MARGIN, FUN)

Parameters

Parameter	Description
`X`	Matrix or array
`MARGIN = 1`	Apply function to rows
`MARGIN = 2`	Apply function to columns
`FUN`	Function to apply (e.g., sum, mean, max)

📊 Sample Matrix (5 × 2)

We will use the following matrix of 10 values throughout this chapter.


     English  Mathematics

S1      65        70
S2      75        82
S3      68        75
S4      90        95
S5      88        91

Creating the Matrix


marks <- matrix(c(
65,70,
75,82,
68,75,
90,95,
88,91),
nrow=5,
byrow=TRUE)

colnames(marks)<-c("English","Mathematics")

rownames(marks)<-c("S1","S2","S3","S4","S5")

marks

Output


   English Mathematics

S1      65        70
S2      75        82
S3      68        75
S4      90        95
S5      88        91

💻 Example 1: Row-wise Sum


apply(marks,1,sum)

Output


S1  S2  S3  S4  S5

135 157 143 185 179

Explanation

Here,

1 means rows
sum calculates the total marks of each student.

💻 Example 2: Column-wise Sum


apply(marks,2,sum)

Output


English      Mathematics

386             413

Explanation

2 indicates columns.

The total marks for each subject are calculated.

💻 Example 3: Row-wise Mean


apply(marks,1,mean)

Output


S1   S2   S3   S4   S5

67.5 78.5 71.5 92.5 89.5

💻 Example 4: Column-wise Mean


apply(marks,2,mean)

Output


English      Mathematics

77.2            82.6

💻 Example 5: Maximum Marks in Each Subject


apply(marks,2,max)

Output


English      Mathematics

90               95

💻 Example 6: Minimum Marks


apply(marks,2,min)

Output


English      Mathematics

65              70

💻 Example 7: Standard Deviation


apply(marks,2,sd)

Output


English      Mathematics

11.39          10.50

(Approximate values.)

💻 Example 8: Variance


apply(marks,2,var)

Output


English      Mathematics

129.7        110.3

(Approximate values.)

💻 Example 9: Square Root of All Elements


apply(marks,c(1,2),sqrt)

Output


        English Mathematics

S1      8.06      8.37

S2      8.66      9.06

S3      8.25      8.66

S4      9.49      9.75

S5      9.38      9.54

💻 Example 10: Find Maximum Marks of Each Student


apply(marks,1,max)

Output


S1 S2 S3 S4 S5

70 82 75 95 91

💻 Example 11: Find Minimum Marks of Each Student


apply(marks,1,min)

Output


S1 S2 S3 S4 S5

65 75 68 90 88

📊 Understanding `MARGIN`

Value	Operation
`1`	Apply function row-wise
`2`	Apply function column-wise
`c(1,2)`	Apply function to every element
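
FUN does not have to be a built-in function. The sketch below, which rebuilds the sample marks matrix, passes a small anonymous function and also shows what happens when FUN returns more than one value per row. The names gap and row_range are illustrative.

```r
marks <- matrix(c(65, 70, 75, 82, 68, 75, 90, 95, 88, 91),
                nrow = 5, byrow = TRUE,
                dimnames = list(c("S1", "S2", "S3", "S4", "S5"),
                                c("English", "Mathematics")))

# Custom function per row: Mathematics marks minus English marks
gap <- apply(marks, 1, function(r) r["Mathematics"] - r["English"])
print(gap)

# range() returns two values (min, max) for each row.
# apply() stores each row's result as a COLUMN, so the result is 2 x 5.
row_range <- apply(marks, 1, range)
print(dim(row_range))
print(row_range)
```

When FUN returns several values with MARGIN = 1, the result comes back with one column per row of the input; use t() if you want the original orientation.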

🌍 Real-Life Applications

The apply() function is commonly used in:

Student result analysis
Employee salary calculations
Financial data analysis
Sales report generation
Machine learning data preprocessing
Scientific research
Medical statistics
Data mining

✔ Advantages

More concise than explicit loops (speed is usually similar, because apply() loops internally).
Reduces program length.
Easy to read and maintain.
Efficient for matrix operations.
Ideal for statistical analysis.

✖ Disadvantages

Designed for matrices and arrays; a data frame is first coerced to a matrix, which can change column types.
Less suitable for irregular data structures (lists with different element types).

📊 Comparison: Loop vs. `apply()`

Feature	`for` Loop	`apply()`
Code Length	Long	Short
Speed	Similar	Similar (apply() loops internally)
Readability	Good	Excellent
Matrix Operations	Manual	Built-in
Performance	Similar	Similar; dedicated functions such as rowSums() and colMeans() are faster still
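
To make the comparison concrete, here is a minimal sketch that calculates the row totals of the sample marks matrix in three ways. The variable names (loop_totals, apply_totals, fast_totals) are illustrative.

```r
marks <- matrix(c(65, 70, 75, 82, 68, 75, 90, 95, 88, 91),
                nrow = 5, byrow = TRUE,
                dimnames = list(c("S1", "S2", "S3", "S4", "S5"),
                                c("English", "Mathematics")))

# 1. Explicit loop: pre-allocate the result, then fill it row by row
loop_totals <- numeric(nrow(marks))
for (i in seq_len(nrow(marks))) {
  loop_totals[i] <- sum(marks[i, ])
}
names(loop_totals) <- rownames(marks)

# 2. apply(): the same calculation in one call
apply_totals <- apply(marks, 1, sum)

# 3. rowSums(): a base R function written for exactly this task
fast_totals <- rowSums(marks)

print(loop_totals)
print(apply_totals)
print(fast_totals)
```

All three give the same totals. The main gain from apply() is shorter code; for simple sums and means, rowSums(), colSums(), rowMeans(), and colMeans() are the usual choice.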

📝 Lab Exercises

Exercise 1

Create a 5 × 2 matrix and calculate the row-wise sum.

Exercise 2

Find the column-wise average of a matrix.

Exercise 3

Find the maximum value in each row.

Exercise 4

Find the minimum value in each column.

Exercise 5

Calculate the standard deviation of each column.

Exercise 6

Calculate the variance of each row.

Exercise 7

Find the square root of every matrix element using apply().

Exercise 8

Compare a for loop with apply() for calculating row sums.

❓ Viva Questions

What is the purpose of the apply() function?
Write the syntax of apply().
What does MARGIN = 1 mean?
What does MARGIN = 2 mean?
Can apply() be used on vectors?
Which data structures are suitable for apply()?
Give two advantages of apply().
Why is apply() preferred over loops?
Name five functions in the Apply Family.
Give two real-life applications of apply().

📚 Class Summary

In this class, you learned:

The concept of the Apply Family.
Syntax and parameters of apply().
Row-wise and column-wise calculations.
Matrix operations using statistical functions.
Comparison of apply() and loops.
Practical R programs with outputs and explanations.
Real-world applications, lab exercises, and viva questions.

Class 7: `lapply()` and `sapply()` Functions in R

Duration: 1 Class

🎯 Learning Objectives

After completing this lesson, students will be able to:

Understand the purpose of lapply() and sapply().
Apply functions to list elements.
Differentiate between lapply() and sapply().
Use these functions for efficient data processing.
Analyze list-based data in R.

📖 5.22 Introduction to `lapply()` and `sapply()`

In R, lists can store different types of data, such as numbers, characters, vectors, and matrices. The lapply() and sapply() functions are designed to apply a function to each element of a list.

These functions help eliminate repetitive loops and produce concise, efficient code.

🔵 5.23 The `lapply()` Function

Definition

The lapply() function applies a specified function to each element of a list and always returns a list.

Syntax


lapply(X, FUN)

Parameters

Parameter	Description
`X`	List or vector
`FUN`	Function to apply

📊 Sample List


student_data <- list(

English = c(65,70,75,80,85),

Math = c(72,78,82,88,90),

Science = c(68,74,79,84,89)

)

💻 Example 1: Find Mean of Each Subject

R Program


student_data <- list(

English=c(65,70,75,80,85),

Math=c(72,78,82,88,90),

Science=c(68,74,79,84,89)

)

lapply(student_data,mean)

Output


$English

[1] 75

$Math

[1] 82

$Science

[1] 78.8

Explanation

The mean() function is applied to each list element individually. The result is returned as a list.

💻 Example 2: Find Sum of Each Subject


lapply(student_data,sum)

Output


$English

[1] 375

$Math

[1] 410

$Science

[1] 394

💻 Example 3: Find Maximum Marks


lapply(student_data,max)

Output


$English

[1] 85

$Math

[1] 90

$Science

[1] 89

💻 Example 4: Find Minimum Marks


lapply(student_data,min)

Output


$English

[1] 65

$Math

[1] 72

$Science

[1] 68

💻 Example 5: Find Standard Deviation


lapply(student_data,sd)

Output


$English

[1] 7.91

$Math

[1] 7.35

$Science

[1] 8.23

(Approximate values.)
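
lapply() also accepts your own functions and passes any extra arguments on to FUN. The sketch below uses the same student_data list; the names high_marks and trimmed are illustrative.

```r
student_data <- list(
  English = c(65, 70, 75, 80, 85),
  Math    = c(72, 78, 82, 88, 90),
  Science = c(68, 74, 79, 84, 89)
)

# Anonymous function: keep only the marks of 80 or more in each subject.
# The results have different lengths, which is why a list is a natural return type.
high_marks <- lapply(student_data, function(x) x[x >= 80])
print(high_marks)

# Extra arguments after FUN are passed to it: here mean(x, trim = 0.2),
# which drops the lowest and highest value of each 5-element vector.
trimmed <- lapply(student_data, mean, trim = 0.2)
print(trimmed)
```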

🟢 5.24 The `sapply()` Function

Definition

The sapply() function works like lapply(), but it tries to simplify the result. If possible, it returns a vector or matrix instead of a list.

Syntax


sapply(X, FUN)

💻 Example 6: Mean of Each Subject


sapply(student_data,mean)

Output


English      Math   Science

75.0        82.0      78.8

Explanation

Unlike lapply(), the result is returned as a named vector.
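
A quick way to see the difference is to run both functions on the same list and inspect the result types. This is a small sketch; as_list and as_vector are illustrative names.

```r
student_data <- list(
  English = c(65, 70, 75, 80, 85),
  Math    = c(72, 78, 82, 88, 90),
  Science = c(68, 74, 79, 84, 89)
)

as_list   <- lapply(student_data, mean)   # list of three numbers
as_vector <- sapply(student_data, mean)   # named numeric vector

print(class(as_list))
print(class(as_vector))

# The values are the same; only the container differs
print(all.equal(unlist(as_list), as_vector))

# A vector can be used directly in arithmetic
print(as_vector["Math"] + 5)
```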

💻 Example 7: Sum of Each Subject


sapply(student_data,sum)

Output


English      Math   Science

375         410        394

💻 Example 8: Maximum Marks


sapply(student_data,max)

Output


English      Math   Science

85          90          89

💻 Example 9: Minimum Marks


sapply(student_data,min)

Output


English      Math   Science

65          72         68

💻 Example 10: Length of Each Subject Vector


sapply(student_data,length)

Output


English      Math   Science

5            5          5

💻 Example 11: Square Root of All Values


sapply(student_data,sqrt)

Output


          English      Math    Science

[1,]      8.06       8.49      8.25

[2,]      8.37       8.83      8.60

[3,]      8.66       9.06      8.89

[4,]      8.94       9.38      9.17

[5,]      9.22       9.49      9.43

📊 Comparison of `lapply()` and `sapply()`

Feature	`lapply()`	`sapply()`
Return Type	List	Vector, Matrix, or List
Simplifies Output	No	Yes
Easy to Read	Moderate	Excellent
Suitable For	Complex Objects	Simple Results
Common Usage	Lists	Reports and Summaries

🌍 Real-Life Applications

Student result analysis
Employee salary reports
Machine learning preprocessing
Financial reporting
Medical research
Sales analysis
Survey data analysis
Scientific computing

✔ Advantages

`lapply()`

Always returns a list.
Preserves the original structure.
Suitable for complex data.

`sapply()`

Produces simplified output.
Easier to use in calculations.
Ideal for reports and summaries.

✖ Disadvantages

`lapply()`

Output may require additional extraction.

`sapply()`

Simplification may not always produce the expected structure.
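
The sketch below illustrates this disadvantage. When the results have different lengths, sapply() quietly returns a list. The base R function vapply(), which is not covered in this lesson, is shown as one possible safeguard because it checks each result against a declared template.

```r
student_data <- list(
  English = c(65, 70, 75, 80, 85),
  Math    = c(72, 78, 82, 88, 90),
  Science = c(68, 74, 79, 84, 89)
)

# Results of length 2, 3, and 2 cannot be simplified, so a list comes back
high <- sapply(student_data, function(x) x[x >= 80])
print(is.list(high))

# vapply() states the expected shape of every result: one number each
means <- vapply(student_data, mean, FUN.VALUE = numeric(1))
print(means)

# If a result does not match the template, vapply() raises an error
failed <- tryCatch(
  vapply(student_data, function(x) x[x >= 80], FUN.VALUE = numeric(1)),
  error = function(e) "vapply refused the result"
)
print(failed)
```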

📊 Comparison with Loops

Feature	`for` Loop	`lapply()`	`sapply()`
Speed	Baseline	Similar or slightly faster	Similar or slightly faster
Code Length	Long	Short	Short
Readability	Good	Excellent	Excellent
Output	Manual	List	Simplified

📝 Lab Exercises

Exercise 1

Create a list containing marks for three subjects and calculate the mean using lapply().

Exercise 2

Find the sum of each subject using sapply().

Exercise 3

Calculate the maximum and minimum marks for each subject.

Exercise 4

Find the standard deviation of each subject.

Exercise 5

Use sapply() to calculate the length of each vector in the list.

Exercise 6

Find the square root of all values in the list using sapply().

Exercise 7

Compare the outputs of lapply() and sapply() for the same dataset.

❓ Viva Questions

What is the purpose of lapply()?
What is the purpose of sapply()?
What is the main difference between lapply() and sapply()?
Which function always returns a list?
Which function simplifies its output?
Can sapply() return a matrix?
Give two applications of lapply().
Give two applications of sapply().
Why are these functions preferred over loops?
Name five functions in the Apply Family.

📚 Class Summary

In this class, you learned:

The purpose of lapply() and sapply().
How to apply functions to list elements.
The difference between the two functions.
Practical R programs with outputs.
Real-world applications.
Comparison with loops.
Lab exercises and viva questions.

📘 MODULE 5: ADVANCED R PROGRAMMING

Class 8: `tapply()` and `mapply()` Functions in R

Duration: 1 Class

🎯 Learning Objectives

After completing this lesson, students will be able to:

Understand the purpose of tapply() and mapply().
Perform group-wise calculations using tapply().
Apply functions to multiple vectors simultaneously using mapply().
Compare all members of the Apply Family.
Solve practical data analysis problems efficiently.

📖 5.25 The `tapply()` Function

Definition

The tapply() function is used to apply a function to subsets of a vector, where the subsets are defined by a grouping variable (factor).

It is particularly useful for group-wise statistical analysis.

Syntax


tapply(X, INDEX, FUN)

Parameters

Parameter	Description
`X`	Numeric vector
`INDEX`	Grouping factor
`FUN`	Function to apply (e.g., mean, sum, max)

Sample Data (10 Students)


student <- c("S1","S2","S3","S4","S5","S6","S7","S8","S9","S10")

department <- c(
"Science","Commerce","Science","Arts","Commerce",
"Arts","Science","Commerce","Arts","Science")

marks <- c(85,72,90,65,78,70,88,80,75,92)

💻 Example 1: Average Marks by Department

R Program


tapply(marks, department, mean)

Output


    Arts Commerce  Science

   70.00    76.67    88.75

Explanation

The students are grouped by department, and the mean marks are calculated separately for each group.
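
One way to picture what tapply() does is to carry out the grouping by hand with split() and then summarise each group. This is only a sketch of the idea, not a description of how tapply() is implemented; groups, by_hand, and by_tapply are illustrative names.

```r
department <- c("Science", "Commerce", "Science", "Arts", "Commerce",
                "Arts", "Science", "Commerce", "Arts", "Science")
marks <- c(85, 72, 90, 65, 78, 70, 88, 80, 75, 92)

# Step 1: split the marks into one vector per department
groups <- split(marks, department)
print(groups)

# Step 2: apply the function to each group
by_hand <- sapply(groups, mean)

# tapply() performs both steps in a single call
by_tapply <- tapply(marks, department, mean)

print(by_hand)
print(by_tapply)
```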

💻 Example 2: Total Marks by Department


tapply(marks, department, sum)

Output


Arts Commerce Science

210    230      355

💻 Example 3: Maximum Marks


tapply(marks, department, max)

Output


Arts Commerce Science

75      80       92

💻 Example 4: Minimum Marks


tapply(marks, department, min)

Output


Arts Commerce Science

65      72       85

💻 Example 5: Standard Deviation


tapply(marks, department, sd)

Output


Arts Commerce Science

5.00   4.16    2.99

(Approximate values.)

📊 Real-Life Uses of `tapply()`

Average salary by department
Sales by region
Marks by class
Hospital patients by ward
Profit by branch
Employee performance by team

🟢 5.26 The `mapply()` Function

Definition

The mapply() function applies a function to multiple vectors or lists simultaneously.

It is the multivariate version of sapply().

Syntax


mapply(FUN, vector1, vector2)

Sample Data


A <- c(10,20,30,40,50)

B <- c(2,4,6,8,10)

💻 Example 6: Addition


mapply(function(x,y)x+y,A,B)

Output


[1] 12 24 36 48 60

💻 Example 7: Multiplication


mapply(function(x,y)x*y,A,B)

Output


[1] 20 80 180 320 500

💻 Example 8: Division


mapply(function(x,y)x/y,A,B)

Output


[1] 5 5 5 5 5

💻 Example 9: Power Function


base <- c(2,3,4,5,6)

power <- c(2,2,2,2,2)

mapply(function(x,y)x^y,base,power)

Output


[1] 4 9 16 25 36

💻 Example 10: Maximum of Two Numbers


mapply(max,A,B)

Output


[1] 10 20 30 40 50

💻 Example 11: Minimum of Two Numbers


mapply(min,A,B)

Output


[1] 2 4 6 8 10

💻 Example 12: Product of Price and Quantity

Sample Data


price <- c(150,250,300,120,400)

quantity <- c(2,1,3,4,2)

R Program


mapply(function(p,q)p*q,price,quantity)

Output


[1] 300 250 900 480 800

Explanation

Each product price is multiplied by its corresponding quantity to calculate the total cost.
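
For simple arithmetic like this, ordinary vectorized operators give the same answer, so mapply() is most useful when the function is not already vectorized over all of its arguments. A minimal sketch (the names via_mapply, via_vector, and repeated are illustrative):

```r
price    <- c(150, 250, 300, 120, 400)
quantity <- c(2, 1, 3, 4, 2)

via_mapply <- mapply(function(p, q) p * q, price, quantity)
via_vector <- price * quantity            # vectorized equivalent
print(all.equal(via_mapply, via_vector))

# A case where mapply() is needed: call rep() once per pair,
# i.e. rep("A", 3), rep("B", 2), rep("C", 1).
# The results differ in length, so a list is returned.
repeated <- mapply(rep, c("A", "B", "C"), 3:1)
print(repeated)
```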

📊 Comparison of the Apply Family

Function	Input	Returns	Best Used For
`apply()`	Matrix	Vector/Matrix	Row-wise & Column-wise operations
`lapply()`	List	List	Complex list processing
`sapply()`	List	Vector/Matrix	Simplified summaries
`tapply()`	Vector + Group	Group-wise result	Statistical analysis
`mapply()`	Multiple vectors	Vector/List	Multiple input operations

🌍 Real-Life Applications

`tapply()`

Sales by region
Student marks by department
Employee salary by designation
Hospital patient analysis
Banking transaction summaries

`mapply()`

Billing systems
Invoice generation
Salary calculations
Shopping cart totals
Financial computations

✔ Advantages

`tapply()`

Performs grouped calculations efficiently.
Reduces coding effort.
Ideal for statistical reporting.

`mapply()`

Works with multiple vectors simultaneously.
Replaces index-based loops over parallel vectors.
Improves readability and efficiency.

✖ Disadvantages

mapply() recycles shorter arguments, so vectors of unequal length can give unexpected results.
tapply() needs an appropriate grouping factor.
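
The length issue can be seen in a short sketch. The helper add_pairs() is a hypothetical function written for this illustration; it simply refuses inputs of different lengths.

```r
# The second vector is recycled: pairs are (1,1), (2,2), (3,1), (4,2)
recycled <- mapply(function(x, y) x + y, 1:4, 1:2)
print(recycled)

# Hypothetical guard: check the lengths before calling mapply()
add_pairs <- function(x, y) {
  stopifnot(length(x) == length(y))
  mapply(function(a, b) a + b, x, y)
}

ok <- add_pairs(1:3, 4:6)
print(ok)

mismatch <- tryCatch(add_pairs(1:4, 1:2),
                     error = function(e) "length mismatch")
print(mismatch)
```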

📝 Lab Exercises

Exercise 1

Create a vector of marks and departments. Find the average marks department-wise using tapply().

Exercise 2

Find the maximum marks department-wise.

Exercise 3

Calculate the standard deviation for each department.

Exercise 4

Create two vectors and perform addition using mapply().

Exercise 5

Multiply two vectors element-wise using mapply().

Exercise 6

Calculate the total cost of products using price and quantity vectors.

Exercise 7

Compare the outputs of apply(), lapply(), sapply(), tapply(), and mapply() using suitable datasets.

❓ Viva Questions

What is the purpose of tapply()?
What is a grouping factor in tapply()?
Write the syntax of tapply().
What is the purpose of mapply()?
How is mapply() different from sapply()?
Give two real-life applications of tapply().
Give two applications of mapply().
Which Apply Family function is used for grouped statistical analysis?
Which function processes multiple vectors simultaneously?
List all five Apply Family functions in R.

📚 Class Summary

In this class, you learned:

The tapply() function for group-wise calculations.
The mapply() function for operations on multiple vectors.
Practical examples with outputs.
Comparison of all Apply Family functions.
Real-world applications.
Lab exercises and viva questions.

📘 MODULE 5: ADVANCED R PROGRAMMING

Class 9: Debugging Tools in R (`debug()`, `trace()`, `browser()`, `debugonce()`)

Duration: 1 Class

🎯 Learning Objectives

After completing this lesson, students will be able to:

Understand the concept of debugging.
Identify different types of programming errors.
Use built-in debugging tools in R.
Trace and inspect program execution.
Find and fix logical and runtime errors.

📖 5.27 Introduction to Debugging

Definition

Debugging is the process of finding, identifying, and correcting errors (bugs) in a program.

Every programmer encounters errors while writing code. Debugging tools help locate these errors efficiently.

🌟 Why Debugging is Important?

Debugging helps to:

Find programming mistakes.
Correct logical errors.
Prevent program crashes.
Improve code quality.
Reduce development time.
Increase program reliability.

📊 Types of Errors in R

Error Type	Description	Example
Syntax Error	Incorrect grammar	Missing `)`
Runtime Error	Error during execution	`log("ABC")`, or using an undefined variable
Logical Error	Program runs but gives wrong output	Incorrect formula
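
Note that in R, dividing a number by zero does not raise an error; it returns Inf or NaN. The sketch below contrasts this with a call that really does fail at run time. It uses tryCatch(), which is explained in Class 10, only to capture the error text; outcome is an illustrative name.

```r
# Division by zero: no error in R
print(10 / 0)    # Inf
print(0 / 0)     # NaN

# A genuine runtime error: log() of a character value
outcome <- tryCatch(log("ABC"),
                    error = function(e) conditionMessage(e))
print(outcome)
```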

🔵 Example 1: Syntax Error

Incorrect Program


x <- 10
y <- 20

print(x+y

Output


Error: unexpected end of input

Correct Program


x <- 10
y <- 20

print(x+y)

Output


[1] 30

🔴 Example 2: Logical Error

Incorrect Program


length <- 10
breadth <- 5

area <- 2*(length+breadth)

print(area)

Output


[1] 30

Explanation

The program calculates the perimeter, not the area.

Correct Program


length <- 10
breadth <- 5

area <- length * breadth

print(area)

Output


[1] 50

🟣 5.28 The `debug()` Function

Definition

The debug() function places a function into debugging mode. The function pauses before executing each statement, allowing you to inspect variables and execution flow.

Syntax


debug(function_name)

💻 Example 3


square <- function(x)
{
    y <- x*x
    return(y)
}

debug(square)

square(5)

Console Output (Simplified)


debugging in: square(5)

Browse[2]>

Explanation

The program enters debug mode and pauses at each line. At the Browse prompt you can type a variable name to inspect it, n to run the next line, c to continue to the end of the function, or Q to quit the debugger.

🟢 5.29 The `debugonce()` Function

Definition

debugonce() works like debug(), but it enables debugging only for the next function call.

Syntax


debugonce(function_name)

💻 Example 4


cube <- function(x)
{
    x^3
}

debugonce(cube)

cube(4)

Output


debugging in: cube(4)

Browse[2]>

Explanation

The debugger is activated only once. Future calls to cube() run normally unless debugonce() is called again.

🟡 5.30 The `trace()` Function

Definition

The trace() function inserts temporary debugging or tracing code into an existing function without modifying the original function definition.

Syntax


trace(function_name)

💻 Example 5


add <- function(a,b)
{
    a+b
}

trace(add)

add(10,20)

Output (Simplified)


trace: add(10, 20)

[1] 30

Explanation

The trace message shows when the function is called, helping you understand program flow. Call untrace(add) to remove the tracing when you are finished.
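
trace() can also run a small piece of code each time the function is entered. The following is a minimal sketch, assuming the code is run at the top level of a script or the console; the message text inside cat() is made up for illustration.

```r
add <- function(a, b) {
  a + b
}

# tracer: an expression evaluated on entry to add(), with access to a and b.
# print = FALSE suppresses the default trace line so only our message appears.
trace("add",
      tracer = quote(cat("add() called with a =", a, "and b =", b, "\n")),
      print = FALSE)

result <- add(10, 20)
print(result)

# Restore the original, untraced function
untrace("add")
```

Because the tracing code is attached temporarily, the original definition of add() does not need to be edited and is restored by untrace().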

🔵 5.31 The `browser()` Function

Definition

The browser() function pauses program execution at a specific point, allowing you to inspect variables interactively.

Syntax


browser()

💻 Example 6


calculate <- function(x,y)
{
    browser()

    z <- x+y

    print(z)
}

calculate(15,25)

Output (Simplified)


Called from: calculate(15,25)

Browse[1]>

Explanation

Execution stops at browser(). You can inspect variables, execute commands, and continue execution after checking the program state.

💻 Example 7: Removing Debug Mode


undebug(square)

Explanation

The undebug() function disables debugging for the specified function.
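
You can check whether a function is currently flagged for debugging with the base R function isdebugged(). A short sketch (before and after are illustrative names):

```r
square <- function(x) {
  y <- x * x
  return(y)
}

debug(square)
before <- isdebugged(square)   # TRUE: the next call would open the debugger
print(before)

undebug(square)
after <- isdebugged(square)    # FALSE: the function runs normally again
print(after)

print(square(5))
```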

📊 Debugging Workflow


Write Program
      │
      ▼
Run Program
      │
      ▼
Error Found?
      │
 ┌────┴────┐
 │         │
No        Yes
 │         │
End   Use Debugging Tools
            │
            ▼
      Fix the Error
            │
            ▼
      Test Again

📊 Comparison of Debugging Functions

Function	Purpose	Stops Execution?
`debug()`	Debug every call	✔ Yes
`debugonce()`	Debug next call only	✔ Yes
`trace()`	Trace function execution	✖ Usually No
`browser()`	Pause at a specific line	✔ Yes
`undebug()`	Remove debug mode	✖ No

🌍 Real-Life Applications

Software development
Data science projects
Financial applications
Machine learning model debugging
Scientific computing
Statistical analysis
Database programming
Web application development

✔ Advantages

Quickly identifies programming errors.
Improves program reliability.
Saves development time.
Helps understand program execution.
Useful for large R projects.

✖ Disadvantages

Can slow program execution.
Requires understanding of program flow.
Excessive debugging may become time-consuming.

📝 Lab Exercises

Exercise 1

Create a function to calculate the square of a number and debug it using debug().

Exercise 2

Use debugonce() with a factorial function.

Exercise 3

Insert browser() into a function and inspect variable values.

Exercise 4

Trace a user-defined function using trace().

Exercise 5

Enable debugging and then remove it using undebug().

Exercise 6

Create a program with a logical error and identify it using debugging techniques.

Exercise 7

Write a function to calculate the average of five numbers and debug the function.

❓ Viva Questions

What is debugging?
Why is debugging important?
What are the three main types of errors?
What is the purpose of debug()?
How does debugonce() differ from debug()?
What is the purpose of trace()?
What does the browser() function do?
How can you remove debugging from a function?
Which debugging function pauses execution at a specified line?
Give two real-life applications of debugging.

📚 Class Summary

In this class, you learned:

The concept and importance of debugging.
Types of programming errors.
Using debug(), debugonce(), trace(), browser(), and undebug().
Practical debugging examples with outputs.
Debugging workflow.
Real-world applications.
Lab exercises and viva questions.

📘 MODULE 5: ADVANCED R PROGRAMMING

Class 10: Error Handling in R (`try()`, `tryCatch()`, `warning()`, `stop()`)

Duration: 1 Class

🎯 Learning Objectives

After completing this lesson, students will be able to:

Understand error handling in R.
Use try() to prevent program termination.
Handle errors using tryCatch().
Generate warning messages with warning().
Stop execution using stop().
Write robust and fault-tolerant R programs.

📖 5.32 Introduction to Error Handling

Definition

Error handling is the process of detecting, managing, and responding to errors that occur during program execution.

Instead of allowing a program to terminate unexpectedly, error handling enables the program to continue executing gracefully or display meaningful messages.

Why is Error Handling Important?

Error handling helps to:

Prevent unexpected program crashes.
Improve software reliability.
Provide user-friendly error messages.
Handle invalid input safely.
Simplify debugging and maintenance.

📊 Types of Conditions in R

Type	Description	Example
Error	Stops program execution	`log("ABC")`, a call to `stop()`
Warning	Displays a warning but continues execution	`sqrt(-1)`
Message	Provides informational messages	Package loading messages

🔵 5.33 The `try()` Function

Definition

The try() function executes an expression and captures any errors without stopping the entire program.

Syntax


try(expression)

Example 1: Division


result <- try(10 / 2)

print(result)

Output


[1] 5

Example 2: Invalid Operation


x <- "ABC"

result <- try(log(x))

print(result)

Output


Error in log(x) :
  non-numeric argument to mathematical function

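try() prints the error message and returns an object of class "try-error" instead of stopping, so the program continues running. print(result) then shows the captured message.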

Advantages of `try()`

Prevents abrupt program termination.
Useful in loops and batch processing.
Easy to implement.
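
The loop use case can be sketched as follows. The input list is made-up sample data; silent = TRUE hides the error message, and inherits() checks whether a particular attempt failed.

```r
inputs <- list(100, "ABC", 1000)       # the second value is invalid
results <- numeric(length(inputs))

for (i in seq_along(inputs)) {
  attempt <- try(log10(inputs[[i]]), silent = TRUE)

  if (inherits(attempt, "try-error")) {
    results[i] <- NA                   # record the failure and carry on
  } else {
    results[i] <- attempt
  }
}

print(results)
```

Without try(), the loop would stop at the second element and the third value would never be processed.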

🟢 5.34 The `tryCatch()` Function

Definition

The tryCatch() function provides advanced error handling by allowing different actions for errors, warnings, and successful execution.

Syntax


tryCatch(
   expression,

   error = function(e){},

   warning = function(w){},

   finally = {}
)
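
Putting all parts of the syntax together, a minimal sketch might look like this. The function name safe_log() and the message texts are assumptions made for the illustration.

```r
safe_log <- function(x) {
  tryCatch(
    log(x),
    warning = function(w) {
      message("Warning caught: ", conditionMessage(w))
      NA_real_
    },
    error = function(e) {
      message("Error caught: ", conditionMessage(e))
      NA_real_
    },
    finally = message("Finished processing input.")
  )
}

print(safe_log(100))     # normal result
print(safe_log(-1))      # log(-1) raises a warning, handler returns NA
print(safe_log("ABC"))   # log("ABC") raises an error, handler returns NA
```

The handler's return value becomes the value of tryCatch(), and the finally expression runs in every case.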

💻 Example 3: Handle an Error


result <- tryCatch(

{
    log("ABC")
},

error = function(e)
{
    print("Error Detected")
}
)

Output


[1] "Error Detected"

💻 Example 4: Successful Execution


result <- tryCatch(

{
    sqrt(64)
},

error = function(e)
{
    print("Error")
}
)

print(result)

Output


[1] 8

💻 Example 5: Using `finally`


tryCatch(

{
    print("Program Started")
},

finally=
{
    print("Program Finished")
}
)

Output


[1] "Program Started"

[1] "Program Finished"

🟡 5.35 The `warning()` Function

Definition

The warning() function displays a warning message but does not stop the program.

Syntax


warning("Message")

💻 Example 6


marks <- -10

if(marks < 0)
{
    warning("Marks cannot be negative.")
}

Output


Warning message:

Marks cannot be negative.

Explanation

The program continues executing after displaying the warning.
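
Inside a function, this means the lines after warning() still run. A small sketch, where check_marks() is a hypothetical helper that replaces a negative value with 0:

```r
check_marks <- function(marks) {
  if (marks < 0) {
    warning("Marks cannot be negative; using 0 instead.")
    marks <- 0          # still executed, because warning() does not stop the function
  }
  marks
}

# suppressWarnings() hides the warning so only the returned value is shown
cleaned <- suppressWarnings(check_marks(-10))
print(cleaned)

# tryCatch() can intercept the warning and read its text
caught <- tryCatch(check_marks(-10),
                   warning = function(w) conditionMessage(w))
print(caught)
```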

🔴 5.36 The `stop()` Function

Definition

The stop() function immediately terminates execution and displays an error message.

Syntax


stop("Error Message")

💻 Example 7


age <- -5

if(age < 0)
{
    stop("Age cannot be negative.")
}

Output


Error:

Age cannot be negative.

Explanation

The program stops immediately because stop() generates an error.
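
stop() is most often used inside functions to reject invalid input. In the sketch below, validate_age() is a hypothetical helper; the error it raises is caught with tryCatch() so the script can keep running.

```r
validate_age <- function(age) {
  if (!is.numeric(age)) stop("Age must be numeric.")
  if (age < 0) stop("Age cannot be negative.")
  cat("Age accepted:", age, "\n")   # reached only for valid input
  age
}

accepted <- validate_age(21)

# The error from stop() is captured instead of ending the script
msg <- tryCatch(validate_age(-5),
                error = function(e) conditionMessage(e))
print(msg)
```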

💻 Example 8: Combining `warning()` and `stop()`


temperature <- -300

# At the top level, else must follow the closing brace on the same line
if(temperature < -273.15) {
    stop("Temperature below absolute zero is not possible.")
} else if(temperature < 0) {
    warning("Temperature is below freezing.")
} else {
    print("Temperature is valid.")
}

Output


Error:

Temperature below absolute zero is not possible.

📊 Comparison of Error Handling Functions

Function	Stops Program	Purpose
`try()`	❌ No	Continue after an error
`tryCatch()`	❌ No	Handle errors and warnings gracefully
`warning()`	❌ No	Display warning message
`stop()`	✔ Yes	Terminate program with an error

🌍 Real-Life Applications

Banking software
E-commerce applications
Online registration systems
Student management systems
Medical data processing
Machine learning pipelines
Scientific computing
Financial analysis

✔ Best Practices

Validate user input before processing.
Use meaningful error and warning messages.
Handle expected errors with tryCatch().
Use stop() only for critical errors.
Test programs with both valid and invalid inputs.
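
Several of these practices can be combined in one small function. This is a sketch only: safe_divide() and its messages are hypothetical. Because 10 / 0 returns Inf in R rather than an error, the function checks the divisor itself and calls stop().

```r
safe_divide <- function(x, y) {
  tryCatch({
    # Validate input before processing
    if (!is.numeric(x) || !is.numeric(y)) stop("Both inputs must be numeric.")
    if (y == 0) stop("Division by zero is not allowed.")
    x / y
  },
  error = function(e) {
    # Meaningful message, then a safe fallback value
    message("Could not divide: ", conditionMessage(e))
    NA_real_
  })
}

print(safe_divide(10, 2))     # valid input
print(safe_divide(10, 0))     # invalid divisor
print(safe_divide("a", 2))    # invalid type
```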

📝 Lab Exercises

Exercise 1

Use try() to execute a division operation safely.

Exercise 2

Use tryCatch() to handle invalid numeric input.

Exercise 3

Create a warning if marks are negative.

Exercise 4

Use stop() when age is less than zero.

Exercise 5

Write a function that checks whether a number is positive. If not, display an appropriate warning or error.

Exercise 6

Create a simple calculator and handle division by zero using tryCatch(). (In R, 10/0 returns Inf rather than an error, so check the divisor and call stop() yourself.)

Exercise 7

Write a program to validate student marks (0–100). Display a warning for unusual values and stop execution for invalid values.

❓ Viva Questions

What is error handling?
Why is error handling important?
What is the purpose of try()?
How does tryCatch() differ from try()?
What is the purpose of warning()?
When should stop() be used?
Does warning() terminate program execution?
What is the role of the finally block in tryCatch()?
Give two real-life applications of error handling.
Why should programs validate user input?

📚 Module 5 Summary

In Module 5: Advanced R Programming, you learned:

Control Structures (if, if...else, switch)
Looping Constructs (for, while, repeat)
break and next
Vectorized Operations
The Apply Family (apply(), lapply(), sapply(), tapply(), mapply())
Debugging (debug(), debugonce(), trace(), browser())
Error Handling (try(), tryCatch(), warning(), stop())

You also practiced each concept through:

✔ Step-by-step explanations
✔ R programs with sample data
✔ Expected outputs
✔ Real-world applications
✔ Comparison tables
✔ Lab exercises
✔ Viva questions

Sunday, June 14, 2020

R LANGUAGE

📘 Module 1: Introduction to R Programming

(6 Classes)

🎯 Learning Outcomes

CLASS 1

Introduction to R

What is R?

Why Learn R?

Applications of R

Installing R

Step 1

Step 2

CLASS 2

RStudio Interface

1. Source

2. Console

3. Environment

4. Files

5. Plots

6. Packages

7. Help

Understanding the R Command Prompt

CLASS 3

Basic Operations

Arithmetic Operators

Comparison Operators

Logical Operators

CLASS 4

Data Types

Numeric

Integer

Character

Logical

Factor

Variable Assignment

Variable Naming Rules

CLASS 5

Data Structures in R

Vector

List

Matrix

CLASS 6

Data Frame and Factors

Data Frame

Factors

Summary of Data Structures

Common Built-in Functions

Practical Exercises

Viva Questions

📘 Module 2: Data Manipulation and Management (10 Classes)

📚 Syllabus

1. Data Import and Export

2. Data Cleaning and Preparation

3. Data Transformation

📖 Class-wise Course Plan

Class 1: Data Import and Export – Reading Data from CSV Files

🎯 Learning Objectives

📖 2.1 Introduction to Data Import

Definition

🌟 Why Data Import is Important?

Advantages

📊 Common Data File Formats

📖 2.2 What is a CSV File?

📊 Sample CSV Dataset (10 Records)

📖 2.3 Creating a CSV File

CSV Content

🔵 2.4 Reading a CSV File

Method 1: Using read.csv()

Syntax

Parameters

💻 Example 1: Read Employee Data

Output

Explanation

💻 Example 2: View the First Six Records

Output

💻 Example 3: View the Last Six Records

Output

💻 Example 4: Display Structure of Dataset

Method 1: Using `read.csv()`

📖 2.6 The `readxl` Package

✔ Advantages of `readxl`