NWG - Automatic Pattern Recognition

About This Project

The primary objective of this project was to gain a basic understanding of machine learning by studying neural networks and applying MATLAB’s Neural Network Toolbox to character recognition on a widely studied dataset of handwritten characters. After evaluating various features and network configurations with ten-fold cross-validation, we selected a network that combined distance profiles, projection histograms, and pixel features. This network achieved a classification success of 81.94% ± 2.26% on all 62 character classes, which is comparable to the reported successes of a convolutional neural network committee and of human classification.

NIST Special Database 19

Special Database 19 was obtained from the National Institute of Standards and Technology (NIST) for use in this project. The dataset contains 815,000 handwritten characters (0-9, a-z, A-Z) collected from 3,600 different writers for use in character recognition.

What is a Neural Network

Artificial neural networks are problem-solving models inspired by biological neural networks (i.e., a brain). They consist of many interconnected nodes, akin to neurons, that receive information and pass it forward, potentially through multiple layers. A node can receive multiple data inputs and produces a single output from them. A layer can contain multiple nodes, each receiving different inputs and making a different “decision.” Adding further layers of nodes lets the outputs of the previous layer be combined, which allows the network to view the data in more abstract ways.
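
The node and layer behavior described above can be sketched in a few lines of Python. This is only an illustration, not the MATLAB Neural Network Toolbox code used in the project: the sigmoid activation, the random weights, the layer sizes, and every function name below are assumptions chosen to keep the example small.

```python
import math
import random

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def node_output(inputs, weights, bias):
    """One node: several inputs in, a single output out."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(total)

def layer_output(inputs, layer):
    """A layer is a list of (weights, bias) pairs, one pair per node."""
    return [node_output(inputs, weights, bias) for weights, bias in layer]

def forward(inputs, layers):
    """Pass the inputs through each layer in turn."""
    activations = inputs
    for layer in layers:
        activations = layer_output(activations, layer)
    return activations

def random_layer(n_inputs, n_nodes, rng):
    """Untrained layer with random weights (a trained network would learn these)."""
    return [([rng.uniform(-1, 1) for _ in range(n_inputs)], rng.uniform(-1, 1))
            for _ in range(n_nodes)]

rng = random.Random(0)
# Hypothetical sizes: 4 input features, two hidden layers of 3 nodes, 2 output classes.
network = [random_layer(4, 3, rng), random_layer(3, 3, rng), random_layer(3, 2, rng)]
outputs = forward([0.2, 0.7, 0.1, 0.9], network)
predicted_class = outputs.index(max(outputs))  # class with the strongest output
```

In the project itself the output layer would have 62 nodes, one per character class, and the weights would come from training rather than from a random number generator.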

Profiling Features

For this project, the network inputs were numerical features that describe the characters and help distinguish them from each other. The output targets were the 62 different character classes. Two other features, crossings and Euler number, were also researched. They are explained in more detail in the report, but they were not used in the final network and are therefore omitted from this abbreviated version.
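
As a rough illustration of what such features can look like, the sketch below computes projection histograms, distance profiles, and pixel features for a tiny binary image. It follows the common textbook definitions of these features, which may differ from the ones used in the report; the 5x5 glyph, the function names, and the choice of left and right profiles only are all assumptions made for the example.

```python
def projection_histograms(image):
    """Count foreground pixels in every row and in every column."""
    rows = [sum(row) for row in image]
    cols = [sum(col) for col in zip(*image)]
    return rows, cols

def distance_profiles(image):
    """For each row, distance from the left and right edges to the first
    foreground pixel. Empty rows get the full image width (an assumption)."""
    width = len(image[0])
    left, right = [], []
    for row in image:
        if 1 in row:
            left.append(row.index(1))
            right.append(row[::-1].index(1))
        else:
            left.append(width)
            right.append(width)
    return left, right

def pixel_features(image):
    """Flatten the image so that every pixel becomes one input value."""
    return [pixel for row in image for pixel in row]

# Toy 5x5 binary image of an "L" (1 = ink, 0 = background).
glyph = [
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
]

row_hist, col_hist = projection_histograms(glyph)
left_profile, right_profile = distance_profiles(glyph)

# A merged feature vector simply concatenates the individual feature sets.
merged = row_hist + col_hist + left_profile + right_profile + pixel_features(glyph)
```

Concatenating the three feature sets in this way mirrors the idea of the merged network, which was given all of them as inputs at once.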

Results

Principal Component Analysis (PCA) was performed on the features to locate the eigenvectors that contribute the most to the overall variance of the data. The transformed data was used as the new feature space. Ten-fold cross-validation was performed on each feature network.
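
The PCA step can be illustrated with a small pure-Python sketch. It centers the data, builds the covariance matrix, and uses power iteration to find only the single eigenvector with the largest variance; a full PCA, such as the one MATLAB provides, returns all components. The toy two-feature data set and the function names are assumptions made for the example.

```python
import math

def center(data):
    """Subtract the per-feature mean from every sample."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    return [[x - m for x, m in zip(row, means)] for row in data]

def covariance(centered):
    """Sample covariance matrix of already-centered data."""
    n = len(centered)
    d = len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(d)] for i in range(d)]

def leading_eigenvector(matrix, iterations=200):
    """Power iteration: repeatedly multiply and renormalize a vector until it
    lines up with the eigenvector of the largest eigenvalue."""
    d = len(matrix)
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec

def project(centered, direction):
    """Coordinates of each sample along the chosen direction."""
    return [sum(x * w for x, w in zip(row, direction)) for row in centered]

# Toy data: two strongly correlated features per sample (invented values).
samples = [[2.0, 1.9], [1.0, 1.1], [3.0, 3.2], [4.0, 3.8], [5.0, 5.0]]

centered = center(samples)
cov = covariance(centered)
pc1 = leading_eigenvector(cov)     # direction of greatest variance
scores = project(centered, pc1)    # the data expressed in the new feature space

total_variance = cov[0][0] + cov[1][1]
pc1_variance = sum(s * s for s in scores) / (len(scores) - 1)
explained = pc1_variance / total_variance
```

Here almost all of the variance lies along one direction, so a single transformed feature can stand in for the original two. The same reasoning motivates keeping only the highest-variance components of the character features.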

Feature	Network Configuration	Classification Success
Crossings	[150,110]	36.59% ± 1.17%
Projection Histogram	[100]	48.17% ± 0.74%
Distance Profile	[100,100]	71.46% ± 1.47%
Pixel	[70,70]	71.59% ± 14.49%
Merged	[90,90]	81.94% ± 2.26%
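
The sketch below shows one way that figures of the form "mean ± spread" can come out of ten-fold cross-validation. It assumes the ± values are standard deviations across the ten folds, which the table does not state. The `dummy_evaluate` function is a hypothetical stand-in for training a network on nine folds and scoring it on the held-out fold.

```python
import random
import statistics

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(n_samples, evaluate, k=10):
    """Hold out each fold once, train on the rest, and collect k accuracies."""
    folds = k_fold_indices(n_samples, k)
    accuracies = []
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [i for f, fold in enumerate(folds) if f != held_out
                     for i in fold]
        accuracies.append(evaluate(train_idx, test_idx))
    return accuracies

def summarize(accuracies):
    """Mean and sample standard deviation of the per-fold accuracies."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)

def dummy_evaluate(train_idx, test_idx):
    """Hypothetical placeholder for 'train on train_idx, score on test_idx'.
    It just returns the fraction of even indices in the held-out fold."""
    return sum(1 for i in test_idx if i % 2 == 0) / len(test_idx)

accuracies = cross_validate(100, dummy_evaluate)
mean_acc, std_acc = summarize(accuracies)
report = f"{100 * mean_acc:.2f}% ± {100 * std_acc:.2f}%"
```

Because every sample is held out exactly once, the mean reflects performance on the whole data set, while the spread shows how much the result depends on which samples were held out.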

The optimal neural network used a combination of projection histograms, distance profiles, and individual pixels as features. The success achieved by the merged network is comparable to the success of a convolutional neural network committee (88.12% ± 0.09%) and to the success of actual human classification (81.8% with 0.1% standard error).

Note: All three of these studies were conducted using the full NIST Database. Most of the merged network’s incorrect classifications resulted from very similar characters that would require contextual information to correctly classify (e.g. {0,O,o}, {1,l,I,i}, {9,q}, {C,c}, etc.).

Multilayer Perceptron Neural Networks for Handwritten Character Recognition

Logan Greer, Eric Guidarelli, Joseph Mandara, Craig Szot, Ric Rey Vergara, and Ryan Wright
