FRESHERS INTERVIEW Q&A
1. Explain How a System Can Play a Game of Chess Using Reinforcement Learning.
Reinforcement learning has an environment and an agent. The agent performs actions to achieve a specific goal. Every time the agent takes an action that moves it towards the goal, it is rewarded. Every time it takes a step that works against that goal, it is penalized.
Earlier chess programs relied on hand-written rules and evaluation factors that took a great deal of research to get right. Building a machine to play such games this way requires many rules to be specified.
With reinforcement learning, we don’t have to deal with this problem because the agent learns by playing the game. It makes a move (decision), checks whether it was the right move (feedback), and keeps the outcome in memory for the next step it takes (learning). There is a reward for every correct decision the system makes and a penalty for every wrong one.
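Chess is far too large for a short example, so the sketch below uses a made-up five-cell corridor as a stand-in to show the same decision, feedback, and learning loop. It uses tabular Q-learning, which is one common reinforcement learning method and not necessarily what a real chess engine would use. The environment, the reward values, and every name in the code are assumptions chosen for illustration.

```python
import random

# Hypothetical toy environment: cells 0..4, the goal is cell 4.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # step left, step right


def step(state, action):
    """Apply an action; reward reaching the goal, penalize every other step."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else -0.1
    return next_state, reward


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: act, observe the reward, update the stored estimate."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            # Explore occasionally, otherwise exploit the best known action.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate toward reward + discounted future value.
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


q_table = train()
# The learned policy: the best-scoring action in each non-goal cell.
policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(GOAL)}
print(policy)
```

After training, the policy should pick +1 (move right, towards the goal) in every cell, because that action has collected the higher long-run reward.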
2. How Will You Know Which Machine Learning Algorithm to Choose for Your Classification Problem?
While there is no fixed rule to choose an algorithm for a classification problem, you can follow these guidelines:
If accuracy is a concern, test different algorithms and cross-validate them
If the training dataset is small, use models that have low variance and high bias
If the training dataset is large, use models that have high variance and little bias
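As a rough illustration of the first guideline, the sketch below cross-validates two very simple models on made-up data using only the Python standard library. The helper names (`k_fold_indices`, `cross_val_accuracy`, `fit_majority`, `fit_one_nn`) and the dataset are hypothetical; in practice a library such as scikit-learn would usually do this work.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_accuracy(fit, X, y, k=5):
    """Average held-out accuracy of the model built by `fit` over k folds."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [t for i, t in enumerate(y) if i not in held_out]
        predict = fit(train_X, train_y)
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def fit_majority(train_X, train_y):
    """High-bias baseline: always predict the most common training label."""
    majority = max(set(train_y), key=train_y.count)
    return lambda x: majority


def fit_one_nn(train_X, train_y):
    """Low-bias model: copy the label of the closest training point."""
    def predict(x):
        nearest = min(range(len(train_X)), key=lambda i: abs(train_X[i] - x))
        return train_y[nearest]
    return predict


# Made-up 1-D data: the label is 1 when the feature is at least 5.
X = [0, 9, 1, 8, 2, 7, 3, 6, 4, 5]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

for name, fit in [("majority", fit_majority), ("1-NN", fit_one_nn)]:
    print(name, cross_val_accuracy(fit, X, y, k=5))
```

On this toy data the nearest-neighbour model scores higher than the baseline, which is the kind of comparison cross-validation is meant to give you before committing to one algorithm.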
3. How is Amazon Able to Recommend Other Things to Buy? How Does the Recommendation Engine Work?
Once a user buys something from Amazon, Amazon stores that purchase data for future reference and finds products that are most likely to be bought alongside it. This is possible because of association algorithms (association rule mining), which identify patterns in a given dataset, such as items that are frequently purchased together.
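A minimal sketch of the idea, assuming a handful of made-up purchase baskets: it counts how often items appear together and recommends the ones whose confidence (the share of baskets containing the first item that also contain the second) passes a threshold. The function names and the 0.5 threshold are illustrative choices, not Amazon’s actual system.

```python
from collections import Counter
from itertools import combinations

# Made-up purchase baskets, for illustration only.
baskets = [
    {"phone", "case", "charger"},
    {"phone", "case"},
    {"phone", "charger"},
    {"laptop", "mouse"},
    {"phone", "case", "screen protector"},
]

# How many baskets contain each item, and each pair of items.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)


def confidence(antecedent, consequent):
    """Share of baskets with `antecedent` that also contain `consequent`."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]


def recommend(item, min_confidence=0.5):
    """Items bought alongside `item` often enough to pass the threshold."""
    scored = [
        (other, confidence(item, other))
        for other in item_counts
        if other != item
    ]
    kept = [(other, conf) for other, conf in scored if conf >= min_confidence]
    # Highest confidence first; ties broken alphabetically.
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


print(recommend("phone"))  # [('case', 0.75), ('charger', 0.5)]
```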
4. When Will You Use Classification over Regression?
Classification is used when your target is categorical, while regression is used when your target variable is continuous. Both classification and regression belong to the category of supervised machine learning algorithms.
Examples of classification problems include:
Predicting yes or no
Estimating gender
Breed of an animal
Type of color
Examples of regression problems include:
Estimating sales and price of a product
Predicting the score of a team
Predicting the amount of rainfall
5. How Do You Design an Email Spam Filter?
Building a spam filter involves the following process:
The email spam filter will be fed with thousands of emails
Each of these emails already has a label: ‘spam’ or ‘not spam.’
The supervised machine learning algorithm will then learn which types of emails are being marked as spam, based on words that are common in spam such as ‘lottery’, ‘free offer’, ‘no money’, ‘full refund’, etc.
The next time an email is about to hit your inbox, the spam filter will use statistical analysis and algorithms like Decision Trees and SVM to determine how likely the email is spam
If the likelihood is high, it will label it as spam, and the email won’t hit your inbox
After testing all the models, we will use the algorithm with the highest accuracy.
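The process above can be sketched with a tiny word-count classifier. Note that this example swaps in naive Bayes, a simple statistical method, rather than the Decision Trees or SVM mentioned above, because it fits in a few lines of standard-library Python. The six training emails and all function names are made up for illustration.

```python
import math
from collections import Counter

# Tiny made-up training set of (text, label) pairs.
training = [
    ("win the lottery now", "spam"),
    ("free offer full refund", "spam"),
    ("claim your free lottery prize", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("please review the attached report", "not spam"),
    ("lunch on monday", "not spam"),
]

# Count how often each word appears under each label.
word_counts = {"spam": Counter(), "not spam": Counter()}
label_counts = Counter()
for text, label in training:
    label_counts[label] += 1
    word_counts[label].update(text.lower().split())

vocabulary = set(word_counts["spam"]) | set(word_counts["not spam"])


def log_score(text, label):
    """Log prior plus log likelihood of the words, with add-one smoothing."""
    total_words = sum(word_counts[label].values())
    score = math.log(label_counts[label] / sum(label_counts.values()))
    for word in text.lower().split():
        count = word_counts[label][word] + 1
        score += math.log(count / (total_words + len(vocabulary)))
    return score


def classify(text):
    """Return the label with the higher naive Bayes score."""
    return max(("spam", "not spam"), key=lambda label: log_score(text, label))


print(classify("free lottery offer"))             # spam
print(classify("report for the monday meeting"))  # not spam
```

A real filter would train on thousands of labeled emails and compare several models on held-out data before picking one.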
6. What is a Random Forest?
A ‘random forest’ is a supervised machine learning algorithm that is used for both classification and regression problems. It operates by constructing multiple decision trees during the training phase. For classification, the random forest takes the decision of the majority of the trees as the final decision; for regression, it averages the trees’ predictions.
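The sketch below shows only the voting step for classification. Training the trees (each on a random sample of the data and features) is left out, and the three "trees" are hypothetical hand-written rules standing in for trained decision trees.

```python
from collections import Counter


def majority_vote(trees, sample):
    """Ask every tree for a class and return the most common answer."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for trained trees: each one is a simple rule.
trees = [
    lambda s: "spam" if s["links"] > 3 else "not spam",
    lambda s: "spam" if s["has_offer"] else "not spam",
    lambda s: "spam" if s["length"] < 20 else "not spam",
]

email = {"links": 5, "has_offer": True, "length": 120}
print(majority_vote(trees, email))  # spam (two votes to one)
```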
7. Considering a Long List of Machine Learning Algorithms, given a Data Set, How Do You Decide Which One to Use?
There is no master algorithm for all situations. Choosing an algorithm depends on the following questions:
How much data do you have, and is it continuous or categorical?
Is the problem related to classification, association, clustering, or regression?
Is the data labeled, unlabeled, or a mix of both?
What is the goal?
8. What is Bias and Variance in a Machine Learning Model?
Bias
High bias in a machine learning model means the predicted values are, on average, far from the actual values. Low bias indicates a model whose predictions are very close to the actual ones.
Underfitting: High bias can cause an algorithm to miss the relevant relations between features and target outputs.
Variance
Variance refers to the amount the target model will change when trained with different training data. For a good model, the variance should be minimized.
Overfitting: High variance can cause an algorithm to model the random noise in the training data rather than the intended outputs.
9. What is the Trade-off Between Bias and Variance?
The bias-variance decomposition breaks the expected prediction error of any learning algorithm into the sum of three terms: the squared bias, the variance, and an irreducible error due to noise in the underlying data.
Generally, if you make the model more complex and add more variables, you’ll reduce bias but gain variance. To minimize the total error, you have to trade off bias against variance. Neither high bias nor high variance is desired.
High bias and low variance algorithms train models that are consistent, but inaccurate on average.
High variance and low bias algorithms train models that are accurate but inconsistent.
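The decomposition can be written out for squared-error loss. The notation is an assumption introduced here: f is the true function, f-hat is the learned model, and the expectation is taken over training sets and noise.

```latex
\[
y = f(x) + \varepsilon, \qquad \mathbb{E}[\varepsilon] = 0, \qquad \operatorname{Var}(\varepsilon) = \sigma^2
\]
\[
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
\]
```

Only the first two terms depend on the model, which is why the trade-off is between bias and variance.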
10. Define Precision and Recall.
Precision
Precision is the ratio of the number of events you correctly recall to the total number of events you recall (a mix of correct and wrong recalls).
Precision = (True Positive) / (True Positive + False Positive)
Recall
Recall is the ratio of the number of events you correctly recall to the total number of events.
Recall = (True Positive) / (True Positive + False Negative)
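As a quick worked example of both formulas, the sketch below counts true positives, false positives, and false negatives from two made-up label lists. The function name and the data are illustrative.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # Guard against division by zero when nothing is predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels: 5 actual positives, 4 predicted positives, 3 of them correct.
y_true = [1, 1, 1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0]

print(precision_recall(y_true, y_pred))  # (0.75, 0.6)
```

Here TP = 3, FP = 1, and FN = 2, so precision is 3/4 and recall is 3/5.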
11. What is a Decision Tree Classification?
A decision tree builds classification (or regression) models as a tree structure, with datasets broken up into ever-smaller subsets while developing the decision tree, literally in a tree-like way with branches and nodes. Decision trees can handle both categorical and numerical data.
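To show how a single node might break the data into smaller subsets, the sketch below searches for the best threshold on one numeric feature. It uses Gini impurity as the splitting criterion, which is one common choice and an assumption here; the function names and the toy data are also illustrative.

```python
def gini(labels):
    """Gini impurity: 0 for a pure group, higher for mixed groups."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Find the threshold on one feature with the lowest weighted impurity."""
    best_threshold, best_score = None, float("inf")
    # Try each distinct value (except the smallest) as a candidate threshold.
    for threshold in sorted(set(values))[1:]:
        left = [l for v, l in zip(values, labels) if v < threshold]
        right = [l for v, l in zip(values, labels) if v >= threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Made-up data: customer age and whether they bought the product.
ages = [22, 25, 31, 38, 45, 52]
bought = ["no", "no", "no", "yes", "yes", "yes"]

print(best_split(ages, bought))  # (38, 0.0)
```

A full tree repeats this search on each resulting subset until the subsets are pure or some stopping rule is reached.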
12. What is Pruning in Decision Trees, and How Is It Done?
Pruning is a technique in machine learning that reduces the size of decision trees. It reduces the complexity of the final classifier, and hence improves predictive accuracy by the reduction of overfitting.
Pruning can occur in:
Top-down fashion. It will traverse nodes and trim subtrees starting at the root
Bottom-up fashion. It will begin at the leaf nodes
There is a popular pruning algorithm called reduced error pruning, in which:
Starting at the leaves, each node is replaced with its most popular class
If the prediction accuracy is not affected, the change is kept
There is an advantage of simplicity and speed
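The steps above can be sketched on a small hand-built tree. Accuracy is measured here on a held-out validation set, and each internal node is assumed to store the most popular class of the training examples that reached it under a `majority` key. The dictionary layout, the function names, and the data are all hypothetical.

```python
def predict(node, x):
    """Follow the splits until a leaf is reached."""
    while "label" not in node:
        branch = "left" if x[node["feature"]] < node["threshold"] else "right"
        node = node[branch]
    return node["label"]


def accuracy(tree, X, y):
    """Fraction of examples the tree classifies correctly."""
    return sum(predict(tree, x) == t for x, t in zip(X, y)) / len(y)


def prune(node, tree, X_val, y_val):
    """Bottom-up reduced error pruning, modifying the tree in place."""
    if "label" in node:
        return
    prune(node["left"], tree, X_val, y_val)
    prune(node["right"], tree, X_val, y_val)
    before = accuracy(tree, X_val, y_val)
    saved = dict(node)
    # Tentatively turn this node into a leaf with its most popular class.
    node.clear()
    node["label"] = saved["majority"]
    if accuracy(tree, X_val, y_val) < before:
        # Accuracy dropped, so restore the original subtree.
        node.clear()
        node.update(saved)


# Hand-built tree whose lower-right split fits noise in the training data.
tree = {
    "feature": 0, "threshold": 5, "majority": "a",
    "left": {"label": "a"},
    "right": {
        "feature": 0, "threshold": 8, "majority": "b",
        "left": {"label": "b"},
        "right": {"label": "a"},
    },
}

X_val = [[1], [3], [6], [7], [9], [10]]
y_val = ["a", "a", "b", "b", "b", "b"]

prune(tree, tree, X_val, y_val)
print(tree["right"])                # {'label': 'b'}
print(accuracy(tree, X_val, y_val))  # 1.0
```

The noisy lower split is collapsed into a single leaf, while the root split is kept because removing it would lower validation accuracy.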
13. Briefly Explain Logistic Regression.
Logistic regression is a classification algorithm used to predict a binary outcome for a given set of independent variables.
Logistic regression outputs a probability between 0 and 1, which is converted to a class label of 0 or 1 using a threshold, generally 0.5. Any value above 0.5 is classified as 1, and any value below 0.5 is classified as 0.
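A minimal sketch of the prediction step, assuming the model has already been trained: the weights and bias below are made-up numbers, not fitted coefficients. The weighted sum of the inputs is passed through the sigmoid function to get a probability, which is then compared with the threshold.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of class 1 for one example."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict_label(features, weights, bias, threshold=0.5):
    """Class label; a probability exactly at the threshold goes to 1 here."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical coefficients, chosen only for illustration.
weights = [0.8, -0.4]
bias = -1.0

print(predict_label([3.0, 1.0], weights, bias))  # 1 (probability about 0.73)
print(predict_label([0.5, 2.0], weights, bias))  # 0 (probability about 0.20)
```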
14. Explain the K Nearest Neighbour Algorithm.
The K nearest neighbour algorithm is a classification algorithm that assigns a new data point to the class that is most common among the K training points closest to it. K can be any positive integer; for two-class problems an odd value is often chosen to avoid tied votes. So, for every new data point we want to classify, we find its K nearest neighbours and take the majority class among them.
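A minimal sketch using Euclidean distance and a majority vote, assuming a handful of made-up 2-D points. The function name and the data are illustrative, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Label `query` with the majority class of its k nearest neighbours."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up training data: two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["red", "red", "red", "blue", "blue", "blue"]

print(knn_classify(points, labels, (2, 2)))  # red
print(knn_classify(points, labels, (6, 5)))  # blue
```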
15. What is a Recommendation System?
Anyone who has used Spotify or shopped at Amazon will recognize a recommendation system: It’s an information filtering system that predicts what a user might want to hear or see based on choice patterns provided by the user.