Training a model is the process of getting a machine to "learn" from data: we supply an adequate training set of relevant examples so that the model can later recognize new instances of the same type of data that were not part of the training set. This is done by pairing the input data with the expected output. For example, to train a model to recognize human speech in English, we must supply it with a training data set containing recordings of a large number of people speaking English. With the help of this training set, the machine learns that the samples are human speech by associating them with common patterns in their feature vectors (for example, MFCCs). Once a model is believed to be trained, it must be tested with test data; in the context of the above example, this could be a few more samples of human speech that are not present in the training data set.
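The sketch below illustrates this train/test workflow with scikit-learn. The "MFCC" feature vectors here are random placeholders standing in for features that would, in practice, be extracted from audio (for instance with a library such as librosa); the labels play the role of the expected output paired with each input.

```python
# Minimal sketch of the train/test workflow, assuming placeholder data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# 200 samples of 13-dimensional "MFCC" vectors, each paired with a label:
# 1 = English speech, 0 = non-speech (the label is the expected output).
X = rng.normal(size=(200, 13))
y = rng.integers(0, 2, size=200)

# Hold out a test set that the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = SVC()                   # any comparable classifier would do here
model.fit(X_train, y_train)     # the training phase
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```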
Training a model can be accomplished with a supervised, an unsupervised or a semi-supervised algorithm. A supervised algorithm helps the machine infer a mapping from inputs to outputs, because the data in the training set is labelled. An unsupervised algorithm, on the other hand, lets the machine learn from the data on its own by finding hidden patterns or organization in it (the data is unlabelled). In semi-supervised learning, both labelled and unlabelled data are used. Commonly, a small amount of labelled data is combined with a large amount of unlabelled data, where the labelled data is used to help understand or learn the structure of the unlabelled data.
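A rough illustration of the three settings, again with scikit-learn on toy data; the particular estimators chosen are arbitrary and only meant to show how the labels are (or are not) supplied.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from sklearn.semi_supervised import LabelSpreading

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 2, size=100)

# Supervised: every training sample comes with its label.
LogisticRegression().fit(X, y)

# Unsupervised: no labels at all; the algorithm looks for structure
# (here, two clusters) on its own.
KMeans(n_clusters=2, n_init=10).fit(X)

# Semi-supervised: a small amount of labelled data plus a large amount
# of unlabelled data (marked with -1, by scikit-learn convention).
y_partial = y.copy()
y_partial[10:] = -1          # keep only the first 10 labels
LabelSpreading().fit(X, y_partial)
```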
The training phase is extremely important, since the datasets used in the process largely influence the machine's ability to learn, its performance (measured in terms of how many test cases are identified correctly or erroneously) and its efficiency (in terms of speed and energy utilization). The datasets must be large enough and, in the case of speech and sound applications, diverse enough. Most often, the richer the dataset used in the training phase, the more accurate the results.
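One simple way to quantify the performance mentioned above is to count how many test cases were identified correctly or erroneously. The label arrays below are made-up placeholders standing in for a model's test-set predictions.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # expected outputs for the test cases
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # the model's predictions

print("accuracy:", accuracy_score(y_true, y_pred))   # fraction identified correctly
print(confusion_matrix(y_true, y_pred))              # correct vs. erroneous, per class
```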