Temperature (in Celsius)	Sales
20	2,000
25	2,100
26	2,300
28	2,400
30	2,600
36	3,100

No. of rooms	Floors	Area (sq ft)	Price
2	0	900	$400,000
3	2	1,100	$600,000
3.5	5	1,500	$900,000
4	3	2,100	$1,200,000

Error	Residual Error
The difference between the actual value and the predicted value is called an error. Common error metrics in data science include Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Squared Error (MSE).	The difference between an observed value and the arithmetic mean of the observed group of values is called a residual error.
An error is generally unobservable.	A residual error can be represented using a graph.
An error shows how the actual population data and the observed data differ from each other.	A residual error shows how the sample data and the observed data differ from each other.
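
The three metrics named in the table can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names and the `actual` and `predicted` values are made up for the example and do not come from the article.

```python
import math

def mse(actual, predicted):
    """Mean Squared Error: average of the squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the MSE."""
    return math.sqrt(mse(actual, predicted))

def mae(actual, predicted):
    """Mean Absolute Error: average of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical values, chosen only for illustration.
actual = [2000, 2100, 2300]
predicted = [2050, 2100, 2200]

print("MSE: ", mse(actual, predicted))
print("RMSE:", rmse(actual, predicted))
print("MAE: ", mae(actual, predicted))
```

RMSE is expressed in the same units as the target, which often makes it easier to interpret than MSE. MAE is less sensitive to a few large misses because it does not square the differences.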

Standardization	Normalization
The technique of rescaling data so that it has a mean of 0 and a standard deviation of 1.	The technique of rescaling all data values to lie between 0 and 1. This is also known as min-max scaling.
Standardization centers the data on 0 and scales it to unit standard deviation; it does not change the shape of the distribution.	Normalization ensures that the data falls within the 0 to 1 range.
Standardization formula: X' = (X - μ) / σ, where μ is the feature's mean and σ is its standard deviation.	Normalization formula: X' = (X - Xmin) / (Xmax - Xmin), where Xmin is the feature's minimum value and Xmax is the feature's maximum value.
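
As a rough sketch, min-max normalization, X' = (X - Xmin) / (Xmax - Xmin), and standardization, X' = (X - μ) / σ, can be applied with the standard library alone. The function names are illustrative. The sample values are the temperature column from the first table in this article. Using the population standard deviation (`statistics.pstdev`) is an assumption; some tools use the sample standard deviation instead.

```python
import statistics

def min_max_normalize(values):
    """Rescale values to the 0 to 1 range: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant feature")
    return [(x - lo) / (hi - lo) for x in values]

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1: (x - mean) / std."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(x - mu) / sigma for x in values]

temperatures = [20, 25, 26, 28, 30, 36]

normalized = min_max_normalize(temperatures)
standardized = standardize(temperatures)

print(normalized)    # smallest value maps to 0.0, largest to 1.0
print(standardized)  # values now have mean 0 and standard deviation 1
```

Both transforms are linear, so neither changes the shape of the distribution; they only shift and scale it.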

NAME	ATTRIBUTE	VALUE
RAMA	HEIGHT	182
SITA	HEIGHT	160

NAME	HEIGHT
RAMA	182
SITA	160
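
The two tables above appear to show the same records in long format (one row per name and attribute) and wide format (one column per attribute). Assuming that reading is right, a minimal sketch of the long-to-wide conversion in plain Python could look like this; the `long_to_wide` helper is hypothetical and written only for this example.

```python
# Long-format rows, copied from the first of the two tables.
long_rows = [
    {"NAME": "RAMA", "ATTRIBUTE": "HEIGHT", "VALUE": 182},
    {"NAME": "SITA", "ATTRIBUTE": "HEIGHT", "VALUE": 160},
]

def long_to_wide(rows, key="NAME", attr="ATTRIBUTE", value="VALUE"):
    """Collect every attribute of a key into a single wide record."""
    wide = {}
    for row in rows:
        # Start a record for this key the first time it is seen.
        record = wide.setdefault(row[key], {key: row[key]})
        # Each attribute name becomes its own column.
        record[row[attr]] = row[value]
    return list(wide.values())

wide_rows = long_to_wide(long_rows)
for record in wide_rows:
    print(record)
```

With more attributes per name (for example, a WEIGHT row), each one would become an additional column in the same wide record.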

Data Science Interview Questions and Answers

Table of Contents

Data Science Interview Questions At A Glance

1. Basic Data Science Interview Questions

2. Advanced Data Science Interview Questions

A. Basic Data Science Interview Questions

1. What are the differences between supervised and unsupervised learning?

2. How is logistic regression done?

3. Explain the steps in making a decision tree.

4. How do you build a random forest model?

Steps to build a random forest model:

5. How can you avoid overfitting your model?

6. Differentiate between univariate, bivariate, and multivariate analysis.

Bivariate

Multivariate

7. What feature selection methods are used to select the right variables?

1. Filter Methods

2. Wrapper Methods

8. In your choice of language, write a program that prints the numbers ranging from one to 50.

9. You are given a data set consisting of variables with more than 30 percent missing values. How will you deal with them?

10. For the given points, how will you calculate the Euclidean distance in Python?

11. What is dimensionality reduction, and what are its benefits?

12. How will you calculate eigenvalues and eigenvectors of the following 3x3 matrix?

13. How should you maintain a deployed model?

Monitor

Evaluate

Compare

Rebuild

14. What are recommender systems?

Collaborative Filtering

Content-based Filtering

15. How do you find RMSE and MSE in a linear regression model?

16. How can you select k for k-means?

17. What is the significance of p-value?

18. How can outlier values be treated?

19. How can time-series data be declared as stationary?

20. How can you calculate accuracy using a confusion matrix?

21. Write the equation and calculate the precision and recall rate.

22. 'People who bought this also bought…' recommendations seen on Amazon are a result of which algorithm?

23. Write a basic SQL query that lists all orders with customer information.

24. You are given a dataset on cancer detection. You have built a classification model and achieved an accuracy of 96 percent. Why shouldn't you be happy with your model performance? What can you do about it?

25. Which of the following machine learning algorithms can be used for imputing missing values of both categorical and continuous variables?

26. Below are the eight actual values of the target variable in the train file. What is the entropy of the target variable?

27. We want to predict the probability of death from heart disease based on three risk factors: age, gender, and blood cholesterol level. What is the most appropriate algorithm for this case?

28. After studying the behavior of a population, you have identified four specific individual types that are valuable to your study. You would like to find all users who are most similar to each individual type. Which algorithm is most appropriate for this study?