A Support Vector Machine Implementation for Sign Language Recognition on x86

Quick Update:

I have not had enough time to write a comprehensive update on all the recent developments around this project, so here is a quick summary before delving into the technical implementation:

  • The project recently emerged as the grand winner of the hardware trailblazer award at the American Society of Mechanical Engineers (ASME) global finals in New York. You can read more about this here. It competed against other impressive innovations from America, India and the rest of the world.
  • Sign-IO was also second runner-up at the Royal Academy of Engineering Leaders in Innovation Fellowship in London.
  • Below are a few images of the prototyping efforts so far (the top being the most recent version).

The current iteration is refined for portability and appears as shown below:

Pretty neat, right?:-)


The previous version

Currently, more than 30 million people in the world have speech impairments and rely on sign language to communicate, which creates a language barrier between sign language users and non-users. This project explores the development of a sign-language-to-speech translation glove by implementing a Support Vector Machine (SVM) on the Intel Edison to recognize letters signed by sign language users. The predicted gesture is then transmitted to an Android application, where it is vocalized.

The Intel Edison is the board of choice for this project primarily because:

  1. The project has heavy and immediate processing needs: Support Vector Machines are machine learning algorithms that require a lot of processing power and memory, and we need the output in real time.
  2. The inbuilt Bluetooth module on the Edison is used to transmit the predicted gesture to the companion Android application for vocalization.

I have also published this project on the Intel Developer Zone site.

The project code can be downloaded from here

Below is the short project video demo from a while back:

1. The Hardware Design

The sign language glove has five flex sensors, one mounted on each finger, to quantify how much each finger is bent.

Flex sensors change their electrical resistance depending on how much they are bent: the more the bend, the higher the resistance.

flex

The flex sensors used in this project are the unidirectional 4.5” Spectra Symbol flex sensors. Each one is an analog variable resistor that forms one half of a voltage divider, so a change in bend shows up as a change in voltage that the Edison can read.
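
To make the voltage divider idea concrete, here is a minimal sketch of how a sensor's resistance could map to a raw ADC reading. The supply voltage, fixed resistor value, ADC resolution, resistance values and the wiring (flex sensor on the high side) are all assumptions for illustration, not values taken from this project's PCB.

```javascript
// Illustrative assumptions (not taken from the glove's actual circuit):
var VCC = 5.0;        // supply voltage in volts
var R_FIXED = 47000;  // fixed divider resistor in ohms
var ADC_MAX = 1023;   // full-scale count of a 10-bit ADC

// Voltage at the divider midpoint with the flex sensor on the high side
// and the fixed resistor to ground: Vout = VCC * R_FIXED / (R_FIXED + rFlex).
function dividerVoltage(rFlex) {
    return VCC * R_FIXED / (R_FIXED + rFlex);
}

// Convert that voltage to the integer count an ideal ADC would report.
function adcCount(rFlex) {
    return Math.round(dividerVoltage(rFlex) / VCC * ADC_MAX);
}

console.log(adcCount(10000));   // hypothetical resistance of a flat sensor
console.log(adcCount(100000));  // hypothetical resistance of a bent sensor
```

With this wiring the reading falls as the finger bends; swapping the two resistors would make it rise instead, which is why the minimum and maximum readings have to be measured per sensor.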

Below is the design of the glove PCB in KiCad

circuit

We use the flex sensor library in the Intel XDK IoT Edition to read values from each of the flex sensors.

// Load the UPM flex sensor library and attach a sensor on analog pin 4
var flexSensor_lib = require('jsupm_flex');
var Flex1 = new flexSensor_lib.Flex(4);

We want the data from all the flex sensors in a standardized format, since the raw readings span a wide range and are difficult to interpret. To achieve this, we first establish the minimum and maximum bend values for each flex sensor, then use these values to scale the readings to between 1.0 and 2.0. The snippet below shows how this is done for one of the sensors.

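var ScaleMin = 1.0;
var ScaleMax = 2.0;
var flexOneMin = 280; // minimum bend value for sensor 1
var flexOneMax = 400; // maximum bend value for sensor 1

// Scale the raw reading and round it to two decimals.
// toFixed() returns a string, so Number() converts it back to a number.
var flex1 = Number(scaleDown(Flex1.value(), flexOneMin, flexOneMax).toFixed(2));

// Min-max scale a raw reading from [flexMin, flexMax] to [ScaleMin, ScaleMax]
function scaleDown(flexval, flexMin, flexMax) {
    return (flexval - flexMin) / (flexMax - flexMin) * (ScaleMax - ScaleMin) + ScaleMin;
}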
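
As a rough illustration of the calibration step described above, the sketch below derives a sensor's minimum and maximum from a batch of raw readings and then maps a reading onto the 1.0 to 2.0 range. The helper names `calibrate` and `scaleReading` and the sample readings are hypothetical, and clamping out-of-range readings is a design choice of this sketch rather than something the project code is known to do.

```javascript
var SCALE_MIN = 1.0;
var SCALE_MAX = 2.0;

// Find the smallest and largest raw reading seen while the wearer
// fully straightens and fully bends the finger.
function calibrate(samples) {
    return {
        min: Math.min.apply(null, samples),
        max: Math.max.apply(null, samples)
    };
}

// Min-max scale a raw reading into [SCALE_MIN, SCALE_MAX].
// Readings outside the calibrated range are clamped to the nearest bound.
function scaleReading(raw, range) {
    var clamped = Math.min(Math.max(raw, range.min), range.max);
    var fraction = (clamped - range.min) / (range.max - range.min);
    return fraction * (SCALE_MAX - SCALE_MIN) + SCALE_MIN;
}

// Hypothetical raw readings collected during calibration.
var range = calibrate([280, 301, 355, 400, 342]);
console.log(scaleReading(340, range).toFixed(2));
```

Clamping keeps an occasional noisy reading from producing a feature outside the range the classifier was trained on. The sketch assumes the calibration samples contain at least two distinct values; otherwise the division would be by zero.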

We then pass these values to our support vector classifier.

2. Support Vector Machine Implementation

Support Vector Machines (SVMs) are supervised machine learning models with associated learning algorithms that analyze data for classification and regression analysis. In the basic case, given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier. Given labeled training data, the algorithm outputs an optimal hyperplane which categorizes new examples. Since we need to recognize more than two letters, we rely on LIBSVM's multi-class support, which combines several such binary classifiers.
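
For intuition about what a trained two-class classifier computes, here is a minimal sketch of a decision function, assuming an RBF kernel (the kernel that a `gamma` parameter applies to). The support vectors, coefficients and bias are made-up numbers for illustration; in practice LIBSVM learns them during training.

```javascript
// RBF kernel: K(a, b) = exp(-gamma * ||a - b||^2)
function rbfKernel(a, b, gamma) {
    var sum = 0;
    for (var i = 0; i < a.length; i++) {
        var d = a[i] - b[i];
        sum += d * d;
    }
    return Math.exp(-gamma * sum);
}

// Decision value f(x) = sum_i coef_i * K(sv_i, x) + bias.
// coef_i is alpha_i * y_i, so its sign carries the class of support vector i.
function decisionValue(x, model) {
    var total = model.bias;
    for (var i = 0; i < model.supportVectors.length; i++) {
        total += model.coefs[i] * rbfKernel(model.supportVectors[i], x, model.gamma);
    }
    return total;
}

// The sign of the decision value picks one of the two classes.
function classify(x, model) {
    return decisionValue(x, model) >= 0 ? 1 : -1;
}

// Hypothetical hand-written model over two scaled flex readings.
var toyModel = {
    gamma: 0.25,
    bias: 0,
    supportVectors: [[1.0, 1.0], [2.0, 2.0]],
    coefs: [1, -1]
};
console.log(classify([1.1, 1.0], toyModel));
console.log(classify([1.9, 2.0], toyModel));
```

A point close to the first support vector lands in class 1 and a point close to the second lands in class -1, because the kernel value shrinks with distance.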

We used node-svm, a Node.js wrapper around LIBSVM, one of the most popular SVM libraries. To install the library, run:

npm install node-svm

We then copy the node-svm library build folder into our project directory.
Additionally, to use node-svm, you will have to install all the required npm packages used by the library, which are stringify-object, mout, graceful-fs, optimist, osenv, numeric, q and underscore. You do this by running:

npm install <package name>

1. We create the classifier, setting all the required kernel parameters:

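// Load node-svm (adjust the path if you load it from the copied build folder)
var svm = require('node-svm');

var clf = new svm.CSVC({
    gamma: 0.25,
    c: 1,
    normalize: false,
    reduce: false, // no dimensionality reduction
    kFold: 2 // enable k-fold cross-validation
});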

The parameter C controls the tradeoff between the SVM's errors on the training data and margin maximization. C is used during the training phase and determines how strongly outliers are taken into account when calculating the support vectors. The best values for the C and gamma parameters are determined using a grid search. We don’t do dimensionality reduction, as each of the values (dimensions) from the 5 flex sensors is important in predicting signed gestures.
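
The idea of a grid search can be sketched independently of any SVM library. In this sketch, `gridSearch` and the `evaluate` callback are hypothetical names: `evaluate(c, gamma)` stands in for whatever trains a classifier with those parameters and returns its cross-validated accuracy. The toy scoring function exists only so the example runs on its own.

```javascript
// Try every (c, gamma) pair and keep the one with the best score.
function gridSearch(cValues, gammaValues, evaluate) {
    var best = { c: null, gamma: null, score: -Infinity };
    cValues.forEach(function (c) {
        gammaValues.forEach(function (gamma) {
            var score = evaluate(c, gamma);
            if (score > best.score) {
                best = { c: c, gamma: gamma, score: score };
            }
        });
    });
    return best;
}

// Stand-in for "train with (c, gamma) and return cross-validation accuracy".
// This toy function simply peaks at c = 1, gamma = 0.25.
function toyAccuracy(c, gamma) {
    return 1 - Math.abs(Math.log(c)) * 0.1 - Math.abs(gamma - 0.25);
}

var best = gridSearch([0.125, 0.5, 1, 2, 8], [0.0625, 0.125, 0.25, 0.5], toyAccuracy);
console.log('best c = %d, best gamma = %d', best.c, best.gamma);
```

Candidate values are usually spaced exponentially, as here, because the useful range of C and gamma spans several orders of magnitude.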

2. We build the model by training the classifier using a training dataset and generate a training report. Training the classifier should take a few seconds.

var so = require('stringify-object'); // pretty-prints the training report

// fileName is the path to the training dataset (training.ds in this project)
svm.read(fileName)
    .then(function (dataset) {
        return clf.train(dataset)
            .progress(function (progress) {
                console.log('training progress: %d%', Math.round(progress*100));
            });
    })
    .spread(function (model, report) {
        console.log('SVM trained. \nReport:\n%s', so(report));
    }).done(function () {
        console.log('Training Complete.');
    });

3. We then use the classifier to synchronously predict the signed gesture. The classifier accepts a 1-D array as input to make the predictions. We pass the flex sensor values as the parameters.

prediction = clf.predictSync([flex1, flex2, flex3, flex4, flex5]);

We can also get the probability of each of the predicted gestures at each instance by running the following:

probability = clf.predictProbabilitiesSync([flex1, flex2, flex3, flex4, flex5]);
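
One possible use of these probabilities is to suppress low-confidence predictions. The sketch below assumes the probabilities come back as a plain object keyed by class label, which should be checked against the node-svm documentation; the helper name `pickConfident`, the threshold and the sample values are hypothetical.

```javascript
// Return the label with the highest probability, or null when even the
// best candidate falls below the confidence threshold.
function pickConfident(probabilities, threshold) {
    var bestLabel = null;
    var bestProb = -Infinity;
    Object.keys(probabilities).forEach(function (label) {
        if (probabilities[label] > bestProb) {
            bestProb = probabilities[label];
            bestLabel = label;
        }
    });
    return bestProb >= threshold ? Number(bestLabel) : null;
}

// Hypothetical probabilities for three gesture classes (0 = A, 1 = B, 2 = C).
var sample = { '0': 0.08, '1': 0.85, '2': 0.07 };
console.log(pickConfident(sample, 0.6));
console.log(pickConfident({ '0': 0.4, '1': 0.35, '2': 0.25 }, 0.6));
```

Returning null for an uncertain reading lets the caller skip vocalizing a gesture while the hand is still moving between two signs.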

The predicted symbol is transmitted to the Android device each time the application receives a read request from the Android app.

Creating the training file.

The training.ds file, which is the training file for the project, has 832 lines of training data. Keying in all these values manually would be time-consuming, so this was done using the code snippet below from logtrainingdata.js:

var fs = require('fs');

// X is the current alphabet letter we are training on. We represent these as numbers, e.g. A=0, B=1, C=2…
var data = "X" + " " + "1:" + flex1 + " " + "2:" + flex2 + " " + "3:" + flex3 + " " + "4:" + flex4 + " " + "5:" + flex5 + "\n";

// Append the data to the dataset file
fs.appendFile('training.ds', data, function(err) {
    if (err) {
        console.log(err);
    }
});
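
Each appended line follows the LIBSVM text format: a numeric label followed by `index:value` pairs. The sketch below shows that round trip together with the A=0, B=1, C=2 label mapping. The helper names `formatSample`, `parseSample` and `labelToLetter` and the sample values are hypothetical, and the parser assumes the features are listed in order with none omitted.

```javascript
// Build one dataset line: "<label> 1:<v1> 2:<v2> ...".
function formatSample(label, values) {
    var pairs = values.map(function (value, i) {
        return (i + 1) + ':' + value;
    });
    return label + ' ' + pairs.join(' ');
}

// Parse a dataset line back into a label and an array of numbers.
function parseSample(line) {
    var parts = line.trim().split(/\s+/);
    return {
        label: Number(parts[0]),
        values: parts.slice(1).map(function (pair) {
            return Number(pair.split(':')[1]);
        })
    };
}

// Map a numeric label to its letter, using A = 0, B = 1, C = 2, ...
function labelToLetter(label) {
    return String.fromCharCode('A'.charCodeAt(0) + label);
}

var line = formatSample(2, [1.25, 1.9, 1.87, 1.92, 1.4]);
console.log(line);
console.log(labelToLetter(parseSample(line).label));
```

The same mapping can be applied to the classifier's numeric prediction before the letter is sent to the Android app.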

3. Running the program

We need to turn on the Bluetooth radio on the Edison before we begin advertising. To do this we run:

rfkill unblock bluetooth
killall bluetoothd
hciconfig hci0 up

You can verify that the Bluetooth radio is up with the command below which will return the Bluetooth MAC address of the Edison:

hcitool dev

We then execute the JavaScript program with:

node main.js

4. Android Companion App

The Android companion application uses the Android text-to-speech engine to vocalize the predicted gestures. The application lets the user test and set the language locale, voice speed and voice pitch, as shown in the screenshots. The settings are stored in SharedPreferences. The user also sets the interval at which read requests are sent to the Edison to get the advertised characteristics. It is important that users set their own preferred interval, since a user just beginning to learn sign language will sign at a slower rate than a sign language expert.

In the next activity, the user scans for and connects to the Intel Edison device, which is advertising the predicted gestures. They then proceed to the activity where the predicted gestures are displayed and vocalized, as can be seen in the demo video at the top.

 

 
