Our Machine Learning course in Mumbai pairs coding experience with hands-on projects. Alongside each concept, we also provide the theoretical motivation and the mathematical problem formulation behind it.
This course includes one primary capstone project and more than 25 ancillary exercises based on 17 machine learning algorithms.
Capstone Project Details:
Project Name: Predicting house prices in California
Description: The project involves building a model that predicts median house values in Californian districts. You will be given metrics such as population, median income, median housing price, and so on for each block group in California. Block groups are the smallest geographical unit for which the US Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people). The model you build should learn from this data and be able to predict the median housing price in any district.
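The capstone workflow can be sketched as follows. This is a minimal outline with scikit-learn, using synthetic stand-in data (the real project uses the California housing data from the course files); the feature names and coefficients here are illustrative assumptions, not the actual census figures.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the census table: two of its metrics per
# block group, with house value driven mostly by median income.
rng = np.random.RandomState(42)
median_income = rng.uniform(0.5, 15.0, 1000)
population = rng.uniform(600, 3000, 1000)
median_house_value = 50000 + 30000 * median_income + rng.randn(1000) * 20000

X = np.c_[median_income, population]
X_tr, X_te, y_tr, y_te = train_test_split(X, median_house_value, random_state=42)

# Fit on the training split, then measure error on the hold-out split.
model = RandomForestRegressor(n_estimators=100, random_state=42).fit(X_tr, y_tr)
rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
print(round(rmse))  # hold-out prediction error in dollars
```

The same train/split/fit/evaluate loop applies unchanged once the real census table is loaded in place of the synthetic arrays.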
Concept Covered: Techniques of Machine Learning
Case Study 1: Given a dataset of consumers' salaries and ages, predict whether or not each consumer will purchase a house
Project 1: In reference to the above problem statement, what issues can be observed in the plot generated by the code?
Project 2: What is the estimated cost of the houses with areas 1700 and 1900?
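The kind of estimate Project 2 asks for can be sketched with a simple linear regression. The area/price figures below are hypothetical stand-ins; the real numbers come from the course dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical area (sq ft) vs price data standing in for the tutorial's file.
area = np.array([[1000], [1200], [1500], [1800], [2000], [2400]])
price = np.array([200000, 240000, 300000, 360000, 400000, 480000])

# Fit a straight line price = a * area + b, then query it at new areas.
model = LinearRegression().fit(area, price)
estimates = model.predict([[1700], [1900]])
print(estimates)  # estimated costs for areas 1700 and 1900
```

With the course's actual dataset, only the two input arrays change; the fit-then-predict pattern is identical.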
Concept Covered: Data Preprocessing
Case Study 2: Using the information provided in the dataset, demonstrate the methods to handle missing data, categorical data, and data standardization
Project 3: Review the training dataset (Excel file). Observe that weight is missing for the fifth and eighth rows. For the mentioned rows, what are the values computed by the imputer?
Project 4: In the tutorial code, find the call to the Imputer class. Change the strategy parameter from “mean” to “median” and rerun the code. What new values are assigned to the blank Weight and Height fields in the two rows?
Project 5: In the code snippet given below in the tutorial, why does the array X have 5 columns instead of 3 columns as before?
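A sketch of what Projects 3-5 explore, using hypothetical rows in place of the course's Excel file and SimpleImputer (the current scikit-learn replacement for the tutorial's Imputer class):

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical rows: one categorical column (Country) plus
# Height (cm) and Weight (kg), with two blanks.
country = np.array([["France"], ["Spain"], ["Germany"], ["Spain"]])
numeric = np.array([[175.0, 70.0],
                    [160.0, np.nan],   # missing Weight
                    [np.nan, 85.0],    # missing Height
                    [168.0, 62.0]])

# strategy="mean" fills each blank with its column mean;
# strategy="median" fills it with the column median instead.
mean_filled = SimpleImputer(strategy="mean").fit_transform(numeric)
median_filled = SimpleImputer(strategy="median").fit_transform(numeric)
print(mean_filled[1, 1], median_filled[1, 1])   # the two imputed Weight values

# One-hot encoding replaces the single Country column with one column
# per category (3 here), which is why X grows from 3 to 5 columns.
country_1hot = OneHotEncoder().fit_transform(country).toarray()
X = np.hstack([country_1hot, mean_filled])
print(X.shape)  # (4, 5)
```

The column count in X is the general answer to Project 5: a categorical column with c categories becomes c one-hot columns, so 2 numeric + 3 one-hot = 5.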
Case Study 3: Demonstrate how to reduce data dimensions from 3D to 2D using the information provided
Project 6: What does the hyperplane shadow represent in the PCA output chart on random data?
Project 7: What is the reconstruction error after PCA transformation? Give interpretation.
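The hyperplane "shadow" and the reconstruction error from Projects 6-7 can be sketched on random data with scikit-learn's PCA. The 3-D data below is synthetic, generated to lie close to a 2-D plane:

```python
import numpy as np
from sklearn.decomposition import PCA

# Random 3-D points whose third coordinate is almost a linear
# combination of the first two, so they hug a 2-D plane.
rng = np.random.RandomState(42)
X = rng.randn(200, 3)
X[:, 2] = 0.5 * X[:, 0] + 0.3 * X[:, 1] + 0.05 * rng.randn(200)

pca = PCA(n_components=2)
X2d = pca.fit_transform(X)            # the 2-D "shadow" on the hyperplane
X_back = pca.inverse_transform(X2d)   # projections mapped back into 3-D

# Reconstruction error: mean squared distance between each original
# point and its projection; a small value means little information lost.
error = np.mean(np.sum((X - X_back) ** 2, axis=1))
print(round(error, 5))
```

Interpretation: the shadow is the orthogonal projection of the cloud onto the best-fitting plane, and the error measures the variance PCA discarded along the dropped third component.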
Concept Covered: Regression
Case Study 4: Demonstrate linear and polynomial regression on the insurance claims dataset using the information provided
Project 8: Change the degree parameter of PolynomialFeatures from 1 to 2 and then 3, and interpret the resulting regression plots. State whether each fit is underfitted, well-fitted, or overfitted.
Project 9: Predict the insurance claims for age 70 using polynomial regression with degree 2 and linear regression, and compare the results.
Project 10: In the code snippet given below in the tutorial, why does the array X have 5 columns instead of 3 columns as before?
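Projects 8-9 can be sketched as follows. The age/claims data is a hypothetical stand-in (claims grow roughly quadratically with age here by construction); the real figures come from the course dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data: insurance claims rising non-linearly with age.
age = np.arange(18, 66).reshape(-1, 1)
claims = 0.05 * (age.ravel() - 18) ** 2 + np.random.RandomState(0).randn(48) * 2

scores = {}
for degree in (1, 2, 3):
    # degree=1 is plain linear regression; higher degrees add x**2, x**3 terms.
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(age, claims)
    scores[degree] = model.score(age, claims)
    print(degree, round(scores[degree], 3), round(model.predict([[70]])[0], 1))
```

On data like this, degree 1 underfits (lower R squared), degree 2 matches the generating curve, and degree 3 adds little; plotting each fit over a scatter of the points makes the comparison visual.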
Case Study 5: Predict insurance premium per year based on a person’s age using Decision Trees using the information provided in the dataset
Project 11: Modify the code to predict insurance claim values for people over 55 years of age in the given dataset.
Case Study 6: Generate random quadratic data and demonstrate Decision Tree regression
Project 12: Modify the max_depth from 2 to 3 or 4, and observe the output.
Project 13: Modify the max_depth to 20, and observe the output.
Project 14: What is the class prediction for petal_length = 3 cm and petal_width = 1 cm with max_depth = 2?
Project 15: Using the information provided, explain the Decision Tree regression graphs produced when max_depth is 2 and 3. How many leaf nodes exist in each case, and what does the average value in each leaf represent?
Project 16: Modify the regularization parameter min_samples_leaf from 10 to 6, and check the output of the Decision Tree regression. What change do you observe? Explain the reason.
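Projects 12-16 can be sketched on random quadratic data like the case study's. This assumes scikit-learn's DecisionTreeRegressor; the specific noise level and sample count below are illustrative choices:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Random quadratic data: y = x**2 plus a little noise, x in [-3, 3].
rng = np.random.RandomState(42)
X = np.sort(6 * rng.rand(100, 1) - 3, axis=0)
y = X.ravel() ** 2 + 0.1 * rng.randn(100)

leaves, scores = {}, {}
for depth in (2, 3, 20):
    tree = DecisionTreeRegressor(max_depth=depth, random_state=42).fit(X, y)
    # Each leaf predicts the average target of its training samples, so a
    # depth-d tree produces a step function with at most 2**d steps.
    leaves[depth] = tree.get_n_leaves()
    scores[depth] = tree.score(X, y)
    print(depth, leaves[depth], round(scores[depth], 3))
```

Depth 2 gives at most 4 coarse steps, depth 3 at most 8 finer ones, and depth 20 memorizes the noise (near-perfect training score), which is the overfitting Projects 13 and 16 ask you to explain; raising min_samples_leaf counteracts it by forcing each leaf to average more points.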
Case Study 7: Use Random Forests to predict the insurance premium per year based on a person’s age.
Project 17: What is the output insurance value for individuals aged 60 and with n_estimators = 10?
Case Study 8: Demonstrate various regression techniques over a random dataset using the information provided in the dataset
Project 18: The program shows the learning process for learning rate η values of 0.02, 0.1, and 0.5. Interpret these charts.
Project 19: Modify the learning rate values to 0.001, 0.25, and 0.9, observe the output, and give your interpretation.
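The effect the two projects above examine can be sketched with plain batch gradient descent on a simple linear model. The data here is synthetic (y = 4 + 3x plus noise, an assumption of this sketch), chosen so the three learning rates show slow, healthy, and diverging behaviour:

```python
import numpy as np

# Noisy samples from y = 4 + 3x.
rng = np.random.RandomState(42)
X = 2 * rng.rand(100, 1)
y = 4 + 3 * X + rng.randn(100, 1)
Xb = np.c_[np.ones((100, 1)), X]  # prepend a bias column of ones

def run(eta, n_iter=200):
    """Batch gradient descent on the MSE cost, starting from zeros."""
    theta = np.zeros((2, 1))
    for _ in range(n_iter):
        gradient = 2 / 100 * Xb.T @ (Xb @ theta - y)
        theta -= eta * gradient
    return theta.ravel()

results = {eta: run(eta) for eta in (0.02, 0.1, 0.5)}
for eta, theta in results.items():
    print(eta, theta)
```

With η = 0.02 the parameters creep toward (4, 3) but have not arrived after 200 steps; η = 0.1 converges cleanly; η = 0.5 exceeds the stable step size for this problem, so the iterates blow up, which is the pattern to look for when repeating the exercise with 0.001, 0.25, and 0.9.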
Concept Covered: Classification
Case Study 9: Predict if the houses will be purchased by the consumers, given their salary and age. Use the information provided in the dataset
Project 20: Typically, the n_neighbors parameter in KNN is set to 5. Modify the code to set it to 2 and then 20, and note down your observations.
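Project 20 can be sketched as below. The salary/age dataset here is a synthetic stand-in (buyers are defined as customers over 40 earning above 60,000, an assumption of this sketch), and features are standardized first because KNN's distances are otherwise dominated by the salary scale:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the salary/age purchase dataset.
rng = np.random.RandomState(0)
age = rng.randint(20, 61, 300)
salary = rng.randint(20000, 150001, 300)
purchased = ((age > 40) & (salary > 60000)).astype(int)
X = np.c_[age, salary]

X_tr, X_te, y_tr, y_te = train_test_split(X, purchased, random_state=0)
scores = {}
for k in (2, 5, 20):
    # Scale both features so age and salary contribute comparably.
    knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    knn.fit(X_tr, y_tr)
    scores[k] = knn.score(X_te, y_te)
    print(k, round(scores[k], 3))
```

The observation to record: small k gives a jagged, noise-sensitive boundary, while large k over-smooths it; 5 is the usual compromise.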
Case Study 10: Classify the IRIS dataset using SVM, and demonstrate how Kernel SVMs can help classify non-linear data.
Project 21: Change the kernel from RBF to linear, and check the type of classifier produced for the XOR data in this program. Interpret the result.
Project 22: For the Iris dataset, add code at the end of the program to classify with the RBF kernel and gamma = 1.0. Discuss the result.
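The Iris side of these exercises can be sketched with scikit-learn's SVC, comparing a linear kernel against the RBF kernel with gamma = 1.0 (the split and random seed below are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

scores = {}
for name, clf in [("linear", SVC(kernel="linear")),
                  ("rbf, gamma=1.0", SVC(kernel="rbf", gamma=1.0))]:
    clf.fit(X_tr, y_tr)
    scores[name] = clf.score(X_te, y_te)
    print(name, round(scores[name], 3))
```

Iris is close to linearly separable, so both kernels score highly here; the kernel choice only becomes decisive on non-linear data such as the XOR pattern in Project 21, where a linear kernel cannot separate the classes at all.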
Case Study 11: Use Decision Trees to classify the IRIS flower dataset. Use the information provided.
Project 23: Run a Decision Tree on the IRIS dataset with max depths of 3 and 4, and display the tree output.
Project 24: Predict and print class probability for Iris flower instance with petal_len 1 cm and petal_width 0.5 cm.
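Project 24 can be sketched directly with scikit-learn, training on the two petal features only (an assumption matching the exercise's inputs):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:]   # petal length (cm) and petal width (cm) only
clf = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, iris.target)

# Class and per-class probabilities for petal length 1 cm, width 0.5 cm.
# Such a small flower falls in a pure setosa leaf of the depth-2 tree,
# so the predicted probability for setosa is 1.0.
sample = [[1.0, 0.5]]
print(iris.target_names[clf.predict(sample)[0]])
print(clf.predict_proba(sample))
```

predict_proba reports, for each class, the fraction of training samples of that class in the leaf the instance lands in, which is what the exercise asks you to print.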
Case Study 12: Classify the IRIS flower dataset using various classification algorithms. Use the information provided.
Project 25: Add Logistic Regression classification to the program, and compare its output with that of the previous algorithms.
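A minimal sketch of the Logistic Regression addition, scored with cross-validation so it can be compared against the other classifiers on the same footing (the cv and max_iter values are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
# 5-fold cross-validated accuracy of Logistic Regression on Iris.
acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
print(round(acc, 3))
```

Running the same cross_val_score call for each earlier classifier yields the side-by-side comparison the exercise asks for.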
Concept Covered: Unsupervised Learning with Clustering
Case Study 13: Demonstrate the Clustering algorithm and the Elbow method on a random dataset.
Project 26: Change the number of clusters k to 2, and record the observations.
Project 27: Modify n_samples from 150 to 15000 and the number of centers to 4, keeping n_clusters at 3. Run the code, and record your observations.
Project 28: Modify n_samples from 150 to 15000 and the number of centers to 4, keeping n_clusters at 4. Run the code, and record your observations.
Project 29: Modify the number of clusters k to 6, and record your observations.
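The clustering exercises above can be sketched with k-means and the Elbow method on random blob data (the sample counts and seed below are illustrative choices):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Random dataset with 4 true centers, as in Projects 27-28.
X, _ = make_blobs(n_samples=150, centers=4, random_state=42)

# Elbow method: plot inertia (within-cluster sum of squares) against k;
# the bend marks a good number of clusters.
inertias = []
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias.append(km.inertia_)
print([round(i, 1) for i in inertias])
```

Inertia always decreases as k grows, but the drop flattens sharply past the true number of centers; rerunning with k = 2, 3, 4, and 6 (and with n_samples raised to 15000) shows how under- and over-clustering look against that elbow.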