Friday 8 May 2015

K-Means Clustering Code

We have written code for the K-Means clustering of the MFCCs we have obtained. We have yet to test it; that will be our task for the coming week.

#train1.txt - contains all mfccs of all the categories' samples (FIRST frame coefficients only)

import math
import random

def dv(p1, p2): #Euclidean distance between two 2-D points
    return math.sqrt((p1[0] - p2[0])**2 + (p1[1] - p2[1])**2)

def mindv(p): #index of the centroid nearest to point p
    mind = dv(p, centroids[0])
    mini = 0
    for i in range(1, len(centroids)):
        temp = dv(p, centroids[i])
        if temp < mind:
            mind = temp
            mini = i
    return mini

def meanPair(lst): #mean of a list of 2-D points
    meanValX = 0
    meanValY = 0
    for l in lst:
        meanValX += l[0]
        meanValY += l[1]
    return [float(meanValX)/len(lst), float(meanValY)/len(lst)]

def costFunc(cen, cls, m): #mean squared distance of the m points from their centroids
    J = 0.0
    for l in range(len(cls)):
        for p in cls[l]:
            J += (p[0] - cen[l][0])**2
            J += (p[1] - cen[l][1])**2
    J = float(J)/m
    return J

fd = open("train1.txt", "r")
L1 = []
for l in fd.read().split("\n"):
    if l: #skip blank lines at the end of the file
        L1.append([float(i) for i in l.split(',')])

#train2.txt - same layout as train1.txt (SECOND frame coefficients)
fd = open("train2.txt", "r")
L2 = []
for l in fd.read().split("\n"):
    if l:
        L2.append([float(i) for i in l.split(',')])

#pair the j-th coefficient from frame 1 with the j-th coefficient from frame 2
pairs = []
for i in range(len(L1)):
    for j in range(13):
        pairs.append([L1[i][j], L2[i][j]])

k = 4
centroids = []
for i in range(k): #initialize centroids with randomly chosen points
    centroids.append(pairs[random.randint(0, len(pairs)-1)])

oldJ = 999999

#main k-means loop: assign points to nearest centroids, then recompute centroids
while 1:
    C = [] #classification: one list of points per centroid
    for i in range(k):
        C.append([])

    for p in pairs:
        C[mindv(p)].append(p)

    newJ = costFunc(centroids, C, len(pairs))
    if (float(oldJ) - float(newJ))/float(oldJ) <= 0.05:
        break
    oldJ = newJ #remember the cost so the next iteration can test for convergence

    newCentroids = []
    for i in range(k):
        if C[i]: #guard against an empty cluster: keep its old centroid
            newCentroids.append(meanPair(C[i]))
        else:
            newCentroids.append(centroids[i])
    centroids = newCentroids

print centroids, C
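Once the centroids converge, a new coefficient pair can be assigned to a cluster with the same nearest-centroid rule. A minimal usage sketch (the test pair here is made up for illustration):

#assign a hypothetical unseen (frame-1, frame-2) coefficient pair to a cluster
testPair = [1.2, -0.5] #made-up values, for illustration only
print "test pair belongs to cluster", mindv(testPair)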

Friday 17 April 2015

Performance Measure of Naive Bayes Classifier

Performance Measure
The Naive Bayes classifier works well when the data set is small because of its low variance. It follows a simple algorithm that essentially amounts to counting. An NB classifier is fast when the conditional independence assumption holds, and even when it doesn't, it still performs better than expected more often than not. A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. It's also a good choice when some kind of semi-supervised learning is needed.

Examples and Test Cases
The Naive Bayes classifier is well suited to the text classification problem of email spam filtering: classifying email messages into spam and non-spam. Since a document is often represented as a bag of words, text classifiers usually don't use any deep representation of language. This is an extremely simple representation: it only records which words appear in the document and how often, and discards the word order.
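To make the bag-of-words idea concrete, here is a minimal sketch of a counting-based Naive Bayes spam scorer (the tiny training set and the equal-priors assumption are made up for illustration):

import math

#toy labelled documents, made up for illustration
spam = ["win cash now", "cheap meds now"]
ham = ["meeting at noon", "project report draft"]

def wordCounts(docs): #count how often each word occurs in a class
    counts = {}
    for d in docs:
        for w in d.split():
            counts[w] = counts.get(w, 0) + 1
    return counts

spamC, hamC = wordCounts(spam), wordCounts(ham)
spamN, hamN = sum(spamC.values()), sum(hamC.values())
vocab = set(spamC) | set(hamC)

def logScore(doc, counts, total): #log P(doc|class) with add-one smoothing
    s = 0.0
    for w in doc.split():
        s += math.log((counts.get(w, 0) + 1.0) / (total + len(vocab)))
    return s

msg = "cheap cash now"
#equal priors assumed, so we just pick the class with the higher likelihood
label = "spam" if logScore(msg, spamC, spamN) > logScore(msg, hamC, hamN) else "ham"
print msg, "->", label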


Its strong feature independence assumptions make it unsuitable for speech recognition. Consider a model that uses the average sentence length as one feature among others. If we add features modelling the syntactic complexity of sentences in a text, these may add new cues to the model, but syntactic complexity is also correlated with sentence length. In such situations naive Bayes models may fail, since they treat all features as independent.

Testing A Model

Once the training phase is completed, the next phase is the testing phase. Here the chosen algorithm is actually run on the test data (new samples that are not present in the training set) and an output is obtained. For instance, if the input is a sound sample that needs to be classified as human or dog, the algorithm uses the parameters learned from the training set to identify the given test sample.

For this, we need to select a classifier appropriate to the application at hand. Different classification algorithms yield varying performance on different applications; for example, a classifier may be good for text processing but not for audio processing. A few candidate classifiers may be identified, each evaluated by its accuracy in identifying the test data correctly, and the one with the best results used. The testing phase is dependent on the training phase for its performance: care must be taken that the samples in the training set contain sufficiently clear and relevant data (in the case of speech processing, pure samples of speech with the required words and little distortion and background noise).
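A simple way to compare candidate classifiers, sketched below with hypothetical predict functions and a made-up labelled test set, is to measure the fraction of test samples each one labels correctly:

#hypothetical test set: (feature vector, true label) pairs, made up for illustration
testData = [([1.0, 2.0], "dog"), ([0.2, 0.9], "human"), ([1.1, 1.8], "dog")]

def accuracy(predict, data): #fraction of samples labelled correctly
    correct = 0
    for features, label in data:
        if predict(features) == label:
            correct += 1
    return float(correct) / len(data)

#predictA and predictB are stand-ins for trained classifiers
predictA = lambda f: "dog" if f[0] > 0.5 else "human"
predictB = lambda f: "dog"
for name, p in [("A", predictA), ("B", predictB)]:
    print "classifier", name, "accuracy:", accuracy(p, testData)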

The testing phase also determines the performance of the model. 

Scatter Plots

Below are the scatter plots of two samples of dogs barking:

Here are the scatter plots of human speech samples:

Report - What It Means To Train A Model

Training a model is the process of getting a machine to "learn" the data by supplying an adequate training set of relevant data, such that the model is then able to recognize new instances of the same type of data that were not part of the training set. This is done by pairing the input data with the expected output. For example, to train a model to recognize human speech in English, we must supply it with a training set containing a large number of people speaking English. With the help of this training set, the machine is taught that the samples are human speech by associating them with common patterns in their feature vectors (for example, MFCCs). Once a model is believed to be trained, it must be tested with test data; in the context of the above example, this could be a few more samples of human speech that are not present in the training set.

Training a model can be accomplished using a supervised, an unsupervised or a semi-supervised algorithm. A supervised algorithm helps a machine infer from labelled training data. An unsupervised algorithm, on the other hand, enables the machine to learn the data on its own by finding a hidden pattern or organization in it (the data is unlabelled). In semi-supervised learning, both labelled and unlabelled data are used; commonly, a small amount of labelled data is combined with a large amount of unlabelled data, with the labelled data used to understand the structure of the unlabelled data.

The training phase is extremely important, since the datasets used will largely influence the machine's ability to learn, its performance (measured in terms of how many test cases are identified correctly or erroneously) and its efficiency (in terms of speed and energy utilization). The datasets must be large enough and, in the case of speech and sound applications, diverse enough. Most often, the richer the dataset used in training, the more accurate the results.

Tuesday 7 April 2015

The tasks to be done this week:

1) Get the scatter plots for different sound samples and try to separate them manually.

2) Report on classification.

3) Classify data in Weka and compare with known results.

Wednesday 1 April 2015

Using Weka - First Example We Ran

We created a .arff file with the following small sample data-set about housing. It has six attributes - housingSize, lotSize, bedrooms, granite, bathroom and sellingPrice.
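For reference, a minimal sketch of what such an .arff file looks like; the attribute names follow our data set, while the data rows below are made up for illustration:

@RELATION house

@ATTRIBUTE housingSize NUMERIC
@ATTRIBUTE lotSize NUMERIC
@ATTRIBUTE bedrooms NUMERIC
@ATTRIBUTE granite NUMERIC
@ATTRIBUTE bathroom NUMERIC
@ATTRIBUTE sellingPrice NUMERIC

@DATA
3529,9191,6,0,0,205000
3247,10061,5,1,1,224900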


Selecting this .arff file in Weka Explorer:


Selecting the Linear Regression classifier:


We used help from the following link: http://www.ibm.com/developerworks/library/os-weka2/ 

Tuesday 31 March 2015

The scatter plots for three samples (human speech (h1), bird chirping (b1), dog barking (d1)) have been obtained.
Fig b1b1 gives the scatter plot when both the axes are values of b1.
Fig d1d1 gives the scatter plot when both the axes are values of d1.
Fig h1h1 gives the scatter plot when both the axes are values of h1.
Fig b1h1 gives the scatter plot where values of h1 are plotted on one axis and b1 on the other.
Fig d1h1 gives the scatter plot where values of h1 are plotted on one axis and d1 on the other.
Fig b1d1 gives the scatter plot where values of d1 are plotted on one axis and b1 on the other.

https://drive.google.com/folderview?id=0B3U5ydL0qUhyfkpLdFZSQzVsM0VFVXlRcmRXWUNHTkNEa2FfaE8wZWxPRlNBRmd2R0Q1bXc&usp=sharing
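For the record, a minimal sketch of how such a scatter plot can be produced with matplotlib; the file names are placeholders for our comma-separated coefficient dumps:

import matplotlib.pyplot as plt

#load one coefficient sequence per sample; file names are placeholders
b1 = [float(x) for x in open("b1.txt").read().split(",")]
h1 = [float(x) for x in open("h1.txt").read().split(",")]

n = min(len(b1), len(h1))
plt.scatter(b1[:n], h1[:n]) #Fig b1h1: b1 on one axis, h1 on the other
plt.xlabel("b1")
plt.ylabel("h1")
plt.savefig("b1h1.png")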

Saturday 28 March 2015

One of our tasks was to compute the MFCCs for different sound samples.
The link below has the outputs for three samples: a bird chirping (B1), a dog barking (D1) and a human speaking (H1).

https://drive.google.com/folderview?id=0B3U5ydL0qUhyfjkyeUwwV1lXWDdxMU50YVJvWW1xMnF3Z1BHSUJZS1VOSHZ4Rmk0MVdtOW8&usp=sharing

Friday 27 March 2015

So far, we have used Python code to find the MFCCs of the samples we have collected. We are also continuing to collect more sounds of the various categories. (Our current sound sample categories include dogs barking, birds chirping, human speech, bike engines, car engines, jet sounds, crowds cheering, and musical instruments - violin, piano, cello, guitar and drums.)
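For context, a minimal sketch of this kind of MFCC extraction, assuming the python_speech_features package (the wav file name is a placeholder):

import scipy.io.wavfile as wav
from python_speech_features import mfcc

#read a sample (assumed mono) and compute 13 MFCCs per frame; file name is a placeholder
(rate, sig) = wav.read("dog_bark_01.wav")
mfcc_feat = mfcc(sig, rate, numcep=13)
print mfcc_feat.shape #(number of frames, 13)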

Our next task is to carefully study the classification of the data sets available on the UCI Machine Learning Repository page and use the knowledge to classify the samples we have using Weka. We have also been going through tutorials of Weka to familiarize ourselves with how it works.

We will shortly be uploading the samples we have collected to Google Drive and sharing the link on the blog.

Monday 23 March 2015

Tasks over the next one week:
MFCCs for the different recordings
Scatter plots
Use UCI database sets as examples for testing different classifiers

Thursday 19 March 2015

Over the past few days we have been collecting more samples until we have a sizable amount, which as of now is around 200 samples of pure (unmixed) events, ranging from human speech, air-conditioning sounds and engine sounds to birds chirping and musical instruments.
Our next task is to identify a suitable classifier that will enable us to classify new samples based on the training set of these 200 sounds. 
We are also working on computing the MFCCs for the samples, and the classification of the samples based on the MFCCs using Weka. 
We have also been reviewing the slides on asr.cs.cmu.edu with rigor for a more thorough understanding of the underlying concepts.   

Friday 13 March 2015

In the last ten days we have collected more samples (we now have around 35 samples of environmental noise). We are also familiarizing ourselves with Weka and how to use it to classify these sounds. Our current task is to try to use Weka to classify the sounds we have collected. We have also been thoroughly reviewing the course material available on asr.cs.cmu.edu.

Thursday 26 February 2015

Audio Sample Taken in a Public Restaurant

The following is the audio signal of a sample obtained at a restaurant with a small crowd. Prominent background noise sources include tapping feet (stationary), footsteps (moving around at intervals), distant chatter at low volume (within roughly a 5-foot radius), music from a player, and noise from the kitchen (around 30 feet from the point of recording).



The frequency analysis spectrum, obtained with the help of audacity:



The cepstrum for the above sample:




Below is the signal of a sample taken at the same location a few minutes later. This sample includes noise from a nearby conversation (around 2-3 feet away) in addition to the events described above:



The frequency spectrum:


The cepstrum:


Judging by the audio signals alone, the two differ visibly. The sample with the conversation has sharp spikes occurring throughout its duration, and the amplitude of the conversation segments is considerably larger than that of the rest of the signal. Factors such as the proximity of the source may also affect the signal.
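Outside Audacity, the same magnitude spectrum and (real) cepstrum can be computed with numpy; a minimal sketch, with the wav file name as a placeholder:

import numpy as np
import scipy.io.wavfile as wav

rate, sig = wav.read("restaurant_sample.wav") #placeholder name; assumes a mono recording
sig = sig.astype(float)

spectrum = np.abs(np.fft.rfft(sig))               #magnitude spectrum
cepstrum = np.fft.irfft(np.log(spectrum + 1e-10)) #real cepstrum: IFFT of the log spectrum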

About Us

We are a team of engineering students studying at PES University. We are currently in our sixth semester. The objective of this project is to analyse sound and deduce meaningful events from the analysis. We are primarily focusing on classifying environmental sound by studying the spectra and properties of the audio signals. Our mentor is Mrs. Savitha Murthy and our guide is Dr. Dinkar Sitaram, head of the Center for Cloud Computing and Big Data.

Our team comprises the following members:

Saikarun BR
Department of Telecommunication Engineering

Sanchitha Seshadri
Department of Information Sciences Engineering

Sagar T.P.
Department of Computer Science Engineering