Artificial Neural Networks
Artificial Neural Networks (ANNs) are computing systems, organized as graphs of nodes, that are roughly modeled on the biological neural networks that constitute animal brains.
Input Nodes - take in feature data used for model training and prediction processing
Hidden Nodes - sit between the input and output nodes; they take in data and apply processes such as activation functions to produce outputs that are sent on to other nodes
Output Nodes - receive the result of the neural processing of activation nodes
Data Array Links - connect and pass data between nodes
Weights - are adjusted by the model training process to modify data array links so that they produce increasingly accurate output results; weights are used by the Activation Functions to modify data input values (a short code sketch of a single node's computation follows this list)
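As a concrete illustration of these components, the short sketch below computes the output of a single hidden node using assumed, illustrative input, weight, and bias values: the inputs arriving over the data array links are multiplied by their weights, summed with a bias, and passed through an activation function (ReLU in this case).

import numpy as np

# Inputs arriving at the node over its data array links (illustrative values).
inputs = np.array([0.5, -1.2, 3.0])

# Weights and bias that model training would adjust (illustrative values).
weights = np.array([0.8, 0.1, 0.4])
bias = 0.2

# Weighted sum of the inputs plus the bias.
weighted_sum = np.dot(inputs, weights) + bias

# ReLU activation function: negative values become zero.
node_output = max(0.0, weighted_sum)

print(node_output)  # about 1.68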
The diagram below shows the high-level conceptual components of an ANN. There are many variations on ANN architecture, such as recurrence and convolution.
Key aspects of ANNs include:
Flexibility - ANNs can be configured to address a wide variety of Machine Learning applications
Accuracy - in many applications, ANNs have achieved accuracies exceeding those of humans
Advancements - advancements in ANN technology over the past few years have made ANNs some of the most widely used Machine Learning algorithms
Model Training
Data is iteratively processed through the neural network while the weights and biases applied to data array links are adjusted using backpropagation, producing increasingly accurate output results. A sketch of a single training step follows the list below.
Data Inputs - data is fed into the training process
Iteration - data is iteratively passed through the neural network
Forward Propagation - data is passed from node to node
Outputs - output results are fed into loss calculations
Loss Calculation - the difference between output results and desired results is calculated
Weight Optimization - the amount of change to data flow weights is calculated
Backpropagation - modifies the weights and biases applied to data array links
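The sketch below walks through one such training iteration for a single linear node with a squared-error loss, using assumed, illustrative input, target, weight, and bias values; it performs forward propagation, the loss calculation, weight optimization via the gradient, and the weight and bias update that backpropagation carries out across all of a network's layers.

import numpy as np

learning_rate = 0.1

# One training example: input features and the desired output (illustrative values).
x = np.array([1.0, 2.0])
desired = 1.5

# Current weights and bias on the node's data array links.
weights = np.array([0.3, -0.1])
bias = 0.05

# Forward propagation: compute the node's output.
output = np.dot(x, weights) + bias

# Loss calculation: squared difference between the output and the desired result.
loss = (output - desired) ** 2

# Weight optimization: gradient of the loss with respect to each weight and the bias.
gradient_weights = 2.0 * (output - desired) * x
gradient_bias = 2.0 * (output - desired)

# Update step: adjust the weights and bias against the gradient to reduce the loss.
weights -= learning_rate * gradient_weights
bias -= learning_rate * gradient_bias

print(loss, weights, bias)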
Prediction Processing
Data is passed forward through the neural network to produce a result and an associated confidence level indicating how likely that result is to be correct.
Data Inputs - data is fed into the prediction process
Forward Propagation - data is passed from node to node
Outputs - output results are fed into confidence level calculations
Confidence Level - is a number from 0 to 1 indicating the probability that the output result is correct
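With scikit-learn's MLPClassifier (the same class used in the Python example below), these per-class confidence levels can be obtained from predict_proba after the forward pass; the sketch below uses a small synthetic dataset purely for illustration.

from sklearn.neural_network import MLPClassifier
from sklearn.datasets import make_classification

# Generate a small synthetic dataset (illustrative sizes only).
X, y = make_classification(n_samples=200, n_features=10, n_classes=2, random_state=1)

# Train a small neural network classifier.
classifier = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=1)
classifier.fit(X, y)

# Forward propagation on one sample: the predicted class and the per-class
# probabilities, each between 0 and 1 and summing to 1.
print(classifier.predict(X[:1]))
print(classifier.predict_proba(X[:1]))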
Processing Enhancements
Processing enhancements include methods such as:
Batch Normalization - normalizes the output of a previous activation layer by subtracting the batch mean and dividing by the batch standard deviation (see the sketch after this list)
Batch Gradient Descent - averages the gradients of training examples and uses the mean to update parameters
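The sketch below shows the batch normalization arithmetic on assumed, illustrative activation values: each node's activations have the batch mean subtracted and are divided by the batch standard deviation, with a small epsilon added to avoid division by zero.

import numpy as np

# Activations from a previous layer for a batch of 4 samples and 3 nodes (illustrative values).
activations = np.array([[ 0.2, 1.5, -0.3],
                        [ 1.1, 0.7,  0.4],
                        [-0.5, 2.0,  0.9],
                        [ 0.8, 1.2, -0.1]])

epsilon = 1e-5  # avoids division by zero when a node's activations are constant

# Batch normalization: subtract the batch mean and divide by the batch standard deviation.
batch_mean = activations.mean(axis=0)
batch_std = activations.std(axis=0)
normalized = (activations - batch_mean) / (batch_std + epsilon)

print(normalized.mean(axis=0))  # approximately 0 for each node
print(normalized.std(axis=0))   # approximately 1 for each node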
Python Example
This code example uses a number of hyperparameters to control aspects of model instantiation and training. The main hyperparameters are described below. For more information on the processing functions used and additional hyperparameters, see the scikit-learn MLPClassifier documentation.
activation_function: which Activation Function is used in Activation Nodes
batch_size: the number of training samples processed in each minibatch during Weight Optimization
hidden_network_layers: the number of nodes in each hidden network layer; hidden layers are those between the input and output layers
learning_rate: the schedule used to adjust the learning rate during Weight Optimization (set to 'adaptive' in the code below)
maximum_number_of_iterations: the maximum number of iterations in which data is processed through the neural network
number_of_data_features: the number of data features used for model training and inference processing
number_of_informative_data_features: the number of data features correlated to the training outputs; this simulates real world model training where the correlation of data features may not be known
number_of_model_classes: the number of output classes the neural network is being trained to predict
number_of_prediction_tests: the number of prediction tests included in the example code
number_of_training_and_test_samples: the total number of generated data samples, which are split into training and test sets
print_training_progress: whether to print the loss after each training iteration; loss is a measure of the difference between calculated outputs and expected outputs
tolerance_for_optimization: the minimum improvement in loss required for training to continue; when the loss fails to improve by at least this amount over consecutive iterations, the training iteration cycle ends (or the learning rate is reduced when learning_rate is 'adaptive')
weight_optimization_algorithm: the algorithm used for Weight Optimization, such as Stochastic Gradient Descent
The complete code example is below.
""" neural_network_with_scikit_learn.py creates, trains and tests an artificial neural network With the parameter values set as they are, running the code may take as much as a few minutes to finish. To reduce the running time, reduce the parameter value for: number_of_training_and_test_samples """ from sklearn.neural_network import MLPClassifier from sklearn.datasets import make_classification from sklearn.model_selection import train_test_split # Set parameters. number_of_training_and_test_samples = 10000 number_of_data_features = 60 batch_size = min(1000, number_of_training_and_test_samples) number_of_informative_data_features = 50 number_of_model_classes = 8 number_of_prediction_tests = 30 activation_function = 'relu' hidden_network_layers = (50, 50, 50) weight_optimization_algorithm = 'sgd' learning_rate = 'adaptive' tolerance_for_optimization = 1e-5 maximum_number_of_iterations = 10000 random_state = 1 print_training_progress = True # Generate model training and test data. X, y = make_classification(n_samples=number_of_training_and_test_samples, n_features=number_of_data_features, n_informative=number_of_informative_data_features, n_classes=number_of_model_classes, random_state=random_state) # Split the classification data into training and testing sets. X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=random_state) # Instantiate a neural network classifier. classifier = MLPClassifier(random_state=random_state, hidden_layer_sizes=hidden_network_layers, batch_size=batch_size, activation=activation_function, solver=weight_optimization_algorithm, learning_rate=learning_rate, tol=tolerance_for_optimization, max_iter=maximum_number_of_iterations, verbose=print_training_progress) # Train the classifier. trained_classifier = classifier.fit(X_train, y_train) # Get the trained model mean accuracy score using test data. mean_accuracy = trained_classifier.score(X_test, y_test) print"Mean Accuracy of All Test Predictions:" print(mean_accuracy) # Process test predictions. test_predictions = trained_classifier.predict(X_test[:number_of_prediction_tests, :]) print("Actual Prediction Test Classes:") print(y_test[:number_of_prediction_tests]) print("Predicted Test Classes:") print(test_predictions)
The example output is below:
Iteration 1, loss = 3.12814482
Iteration 2, loss = 2.67163940
Iteration 3, loss = 2.45638738
Iteration 4, loss = 2.34068097
Iteration 5, loss = 2.26461961
Iteration 6, loss = 2.21042757
Iteration 7, loss = 2.16944587
Iteration 8, loss = 2.13762949
Iteration 9, loss = 2.11139347
Iteration 10, loss = 2.08858667
Iteration 11, loss = 2.06853117
.
.
.
Iteration 3104, loss = 0.01097677
Iteration 3105, loss = 0.01097640
Iteration 3106, loss = 0.01097621
Training loss did not improve more than tol=0.000010 for two consecutive epochs. Setting learning rate to 0.000002
Iteration 3107, loss = 0.01097576
Iteration 3108, loss = 0.01097568
Iteration 3109, loss = 0.01097563
Training loss did not improve more than tol=0.000010 for two consecutive epochs. Setting learning rate to 0.000000
Iteration 3110, loss = 0.01097555
Iteration 3111, loss = 0.01097553
Iteration 3112, loss = 0.01097552
Training loss did not improve more than tol=0.000010 for two consecutive epochs. Learning rate too small. Stopping.
Mean Accuracy of All Test Predictions:
0.6672
Actual Prediction Test Classes:
[0 4 3 7 5 5 4 0 0 6 2 0 6 1 6 3 4 0 2 2 4 0 2 4 5 3 0 2 3 5]
Predicted Test Classes:
[0 4 3 7 6 5 4 0 0 6 2 0 6 1 6 3 4 0 2 2 7 5 2 4 0 3 0 2 3 5]