===Supervised Learning===
[[Supervised learning]] is a type of machine learning in which an algorithm learns from labeled data and then uses what it has learned to assign labels to new, unlabeled data. In biology, supervised learning is useful when some data have already been sorted into known categories and further data need to be assigned to those same categories.
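The idea of learning from labeled examples and then labeling new data can be sketched with a nearest-centroid classifier. This method is chosen here only because it is very short; it is not one of the algorithms discussed in this section. The two measurements per sample, their values, the "tumor"/"normal" labels and the function names are hypothetical and purely illustrative.

```python
import math
from collections import defaultdict


def fit_centroids(samples, labels):
    """'Training': average the feature vectors of each labeled class."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        for i, value in enumerate(x):
            sums[y][i] += value
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Label an unlabeled sample with the class whose centroid is nearest."""
    return min(centroids, key=lambda y: math.dist(x, centroids[y]))


# Hypothetical labeled data: two made-up measurements per sample.
train_x = [[2.0, 0.1], [1.8, 0.3], [0.2, 1.9], [0.4, 2.2]]
train_y = ["tumor", "tumor", "normal", "normal"]
model = fit_centroids(train_x, train_y)

print(predict(model, [1.9, 0.2]))  # tumor
print(predict(model, [0.1, 2.0]))  # normal
```

The labeled training set fixes the categories; the model can only ever assign one of the labels it was trained on.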

[[File:Random forest explain.png|thumb|350x350px|Diagram showing a simple random forest]]
A common supervised learning algorithm is the [[random forest]], which trains numerous [[Decision tree learning|decision trees]] and combines their outputs (for classification, typically by majority vote) to classify a dataset. Forming the basis of the random forest, a decision tree is a structure that aims to classify, or label, a set of data using certain known features of that data. A practical biological example would be using an individual's genetic data to predict whether that individual is predisposed to develop a certain disease or cancer. At each internal node, the algorithm tests exactly one feature of the input, such as a specific gene in the previous example, and branches left or right based on the result. Each leaf node then assigns a class label. In practice, the algorithm follows a single root-to-leaf path through the tree determined by the input, and the leaf at the end of that path gives the classification. Commonly, the target variable of a decision tree takes discrete values, such as yes/no, in which case the tree is referred to as a [[Classification chart|classification tree]]; if the target variable is continuous, it is called a [[regression tree]]. Before a decision tree can be used, it must be trained on a training set to identify which features are the best predictors of the target variable.
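A minimal Python sketch of these ideas follows; it is an illustration under stated assumptions, not a reference implementation. The trees are built by hand rather than trained, and the gene names GENE_A and GENE_B, the thresholds, the "low risk"/"high risk" labels and the helper names (classify, forest_predict, gini, best_split) are all hypothetical. To show how training picks the best predictor, splits are scored with Gini impurity, one common criterion (used, for example, by CART) that is assumed here rather than prescribed by the text.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # class label assigned to any sample that reaches this leaf


@dataclass
class Node:
    gene: str  # the single feature tested at this internal node
    threshold: float
    left: "Tree"  # followed when the sample's value is <= threshold
    right: "Tree"  # followed when the sample's value is > threshold


Tree = Union[Leaf, Node]


def classify(tree, sample):
    """Walk one root-to-leaf path, testing exactly one gene per internal node."""
    while isinstance(tree, Node):
        tree = tree.left if sample[tree.gene] <= tree.threshold else tree.right
    return tree.label


def forest_predict(trees, sample):
    """Classify with every tree and return the majority-vote label."""
    votes = Counter(classify(t, sample) for t in trees)
    return votes.most_common(1)[0][0]


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions (0 = pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(samples, labels):
    """Return (weighted impurity, gene, threshold) for the lowest-impurity split."""
    n = len(labels)
    best = None
    for gene in samples[0]:
        for threshold in sorted({s[gene] for s in samples}):
            left = [y for s, y in zip(samples, labels) if s[gene] <= threshold]
            right = [y for s, y in zip(samples, labels) if s[gene] > threshold]
            if not left or not right:
                continue  # a split must send samples down both branches
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, gene, threshold)
    return best


# Hypothetical, hand-built trees over two made-up genes.
tree1 = Node("GENE_A", 1.0, Leaf("low risk"), Leaf("high risk"))
tree2 = Node("GENE_B", 0.5, Leaf("low risk"),
             Node("GENE_A", 2.0, Leaf("low risk"), Leaf("high risk")))
tree3 = Node("GENE_B", 0.8, Leaf("low risk"), Leaf("high risk"))

sample = {"GENE_A": 1.5, "GENE_B": 0.9}
print(forest_predict([tree1, tree2, tree3], sample))  # high risk (2 of 3 votes)

# Hypothetical labeled training set: which gene best separates the classes?
train_samples = [
    {"GENE_A": 0.2, "GENE_B": 1.0},
    {"GENE_A": 0.4, "GENE_B": 0.1},
    {"GENE_A": 1.8, "GENE_B": 0.9},
    {"GENE_A": 2.1, "GENE_B": 0.2},
]
train_labels = ["low risk", "low risk", "high risk", "high risk"]
print(best_split(train_samples, train_labels))  # (0.0, 'GENE_A', 0.4)
```

For a regression tree the leaves would hold numbers and a forest would average the trees' predictions instead of voting. A full random forest also trains each tree on a bootstrap resample of the training set and considers only a random subset of features at each split; both steps are omitted from this sketch.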