Artificial Neural Networks
Artificial Neural Networks (ANNs) are computing systems, organized as graphs of nodes, that are roughly modeled on the biological neural networks that constitute animal brains.
Input Nodes - take in feature data used for model training and prediction processing
Hidden Nodes - sit between the input and output nodes; they take in data and apply processes such as activation functions to produce outputs that are sent on to other nodes
Output Nodes - receive the result of the neural processing of activation nodes
Data Array Links - connect and pass data between nodes
Weights - are adjusted by the model training process to modify data array links so that they produce increasingly accurate output results; weights are used by the Activation Functions to modify data input values (a short code sketch of a single node's computation follows this list)
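As a concrete illustration of these components, the short sketch below computes the output of a single hidden node using assumed, illustrative input, weight, and bias values: the inputs arriving over the data array links are multiplied by their weights, summed with a bias, and passed through an activation function (ReLU in this case).

import numpy as np

# Inputs arriving at the node over its data array links (illustrative values).
inputs = np.array([0.5, -1.2, 3.0])

# Weights and bias that model training would adjust (illustrative values).
weights = np.array([0.8, 0.1, 0.4])
bias = 0.2

# Weighted sum of the inputs plus the bias.
weighted_sum = np.dot(inputs, weights) + bias

# ReLU activation function: negative values become zero.
node_output = max(0.0, weighted_sum)

print(node_output)  # about 1.68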
The diagram below shows the high-level conceptual components of an ANN. There are many variations on ANN architecture, such as recurrence and convolution.
Key aspects of ANNs include:
Flexibility - ANNs can be configured to address a wide variety of Machine Learning applications
Accuracy - in many applications, ANNs have achieved accuracies exceeding those of humans
Advancements - advancements in ANN technology over the past few years have made ANNs some of the most widely used Machine Learning algorithms
Model Training
Data is iteratively processed through the neural network while the weights and biases applied to data array links are adjusted using backpropagation, producing increasingly accurate output results. A sketch of a single training step follows the list below.
Data Inputs - data is fed into the training process
Iteration - data is iteratively passed through the neural network
Forward Propagation - data is passed from node to node
Outputs - output results are fed into loss calculations
Loss Calculation - the difference between output results and desired results is calculated
Weight Optimization - the amount of change to data flow weights is calculated
Backpropagation - modifies the weights and biases applied to data array links
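The sketch below walks through one such training iteration for a single linear node with a squared-error loss, using assumed, illustrative input, target, weight, and bias values; it performs forward propagation, the loss calculation, weight optimization via the gradient, and the weight and bias update that backpropagation carries out across all of a network's layers.

import numpy as np

learning_rate = 0.1

# One training example: input features and the desired output (illustrative values).
x = np.array([1.0, 2.0])
desired = 1.5

# Current weights and bias on the node's data array links.
weights = np.array([0.3, -0.1])
bias = 0.05

# Forward propagation: compute the node's output.
output = np.dot(x, weights) + bias

# Loss calculation: squared difference between the output and the desired result.
loss = (output - desired) ** 2

# Weight optimization: gradient of the loss with respect to each weight and the bias.
gradient_weights = 2.0 * (output - desired) * x
gradient_bias = 2.0 * (output - desired)

# Update step: adjust the weights and bias against the gradient to reduce the loss.
weights -= learning_rate * gradient_weights
bias -= learning_rate * gradient_bias

print(loss, weights, bias)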
Prediction Processing
Data is passed forward through the neural network to produce a result and an associated confidence level indicating how likely that result is to be correct.
Data Inputs - data is fed into the prediction process
Forward Propagation - data is passed from node to node
Outputs - output results are fed into confidence level calculations
Confidence Level - is a number from 0 to 1 indicating the probability that the output result is correct
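With scikit-learn's MLPClassifier (the same class used in the Python example below), these per-class confidence levels can be obtained from predict_proba after the forward pass; the sketch below uses a small synthetic dataset purely for illustration.

from sklearn.neural_network import MLPClassifier
from sklearn.datasets import make_classification

# Generate a small synthetic dataset (illustrative sizes only).
X, y = make_classification(n_samples=200, n_features=10, n_classes=2, random_state=1)

# Train a small neural network classifier.
classifier = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=1)
classifier.fit(X, y)

# Forward propagation on one sample: the predicted class and the per-class
# probabilities, each between 0 and 1 and summing to 1.
print(classifier.predict(X[:1]))
print(classifier.predict_proba(X[:1]))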
Processing Enhancements
Processing enhancements include methods such as:
Batch Normalization - normalizes the output of a previous activation layer by subtracting the batch mean and dividing by the batch standard deviation (see the sketch after this list)
Batch Gradient Descent - averages the gradients of training examples and uses the mean to update parameters
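The sketch below shows the batch normalization arithmetic on assumed, illustrative activation values: each node's activations have the batch mean subtracted and are divided by the batch standard deviation, with a small epsilon added to avoid division by zero.

import numpy as np

# Activations from a previous layer for a batch of 4 samples and 3 nodes (illustrative values).
activations = np.array([[ 0.2, 1.5, -0.3],
                        [ 1.1, 0.7,  0.4],
                        [-0.5, 2.0,  0.9],
                        [ 0.8, 1.2, -0.1]])

epsilon = 1e-5  # avoids division by zero when a node's activations are constant

# Batch normalization: subtract the batch mean and divide by the batch standard deviation.
batch_mean = activations.mean(axis=0)
batch_std = activations.std(axis=0)
normalized = (activations - batch_mean) / (batch_std + epsilon)

print(normalized.mean(axis=0))  # approximately 0 for each node
print(normalized.std(axis=0))   # approximately 1 for each node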
Python Example
This code example uses a number of hyperparameters to control aspects of model instantiation and training. The main hyperparameters are described below. For more information on the processing functions used and additional hyperparameters, see the scikit-learn MLPClassifier documentation.
activation_function: which Activation Function is used in Activation Nodes
batch_size: the number of training samples processed in each minibatch during Weight Optimization
hidden_network_layers: the number of nodes in each hidden network layer; hidden layers are those between the input and output layers
learning_rate: the schedule used to adjust the learning rate during Weight Optimization (set to 'adaptive' in the code below)
maximum_number_of_iterations: the maximum number of iterations in which data is processed through the neural network
number_of_data_features: the number of data features used for model training and inference processing
number_of_informative_data_features: the number of data features correlated to the training outputs; this simulates real world model training where the correlation of data features may not be known
number_of_model_classes: the number of output classes the neural network is being trained to predict
number_of_prediction_tests: the number of prediction tests included in the example code
number_of_training_and_test_samples: the total number of generated data samples, which are split into training and test sets
print_training_progress: whether to print the loss after each training iteration; loss is a measure of the difference between calculated outputs and expected outputs
tolerance_for_optimization: the minimum improvement in loss required for training to continue; when the loss fails to improve by at least this amount over consecutive iterations, the training iteration cycle ends (or the learning rate is reduced when learning_rate is 'adaptive')
weight_optimization_algorithm: the algorithm used for Weight Optimization, such as Stochastic Gradient Descent
The complete code example is below.
""" neural_network_with_scikit_learn.py creates, trains and tests an artificial neural network With the parameter values set as they are, running the code may take as much as a few minutes to finish. To reduce the running time, reduce the parameter value for: number_of_training_and_test_samples """ from sklearn.neural_network import MLPClassifier from sklearn.datasets import make_classification from sklearn.model_selection import train_test_split # Set parameters. number_of_training_and_test_samples = 10000 number_of_data_features = 60 batch_size = min(1000, number_of_training_and_test_samples) number_of_informative_data_features = 50 number_of_model_classes = 8 number_of_prediction_tests = 30 activation_function = 'relu' hidden_network_layers = (50, 50, 50) weight_optimization_algorithm = 'sgd' learning_rate = 'adaptive' tolerance_for_optimization = 1e-5 maximum_number_of_iterations = 10000 random_state = 1 print_training_progress = True # Generate model training and test data. X, y = make_classification(n_samples=number_of_training_and_test_samples, n_features=number_of_data_features, n_informative=number_of_informative_data_features, n_classes=number_of_model_classes, random_state=random_state) # Split the classification data into training and testing sets. X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=random_state) # Instantiate a neural network classifier. classifier = MLPClassifier(random_state=random_state, hidden_layer_sizes=hidden_network_layers, batch_size=batch_size, activation=activation_function, solver=weight_optimization_algorithm, learning_rate=learning_rate, tol=tolerance_for_optimization, max_iter=maximum_number_of_iterations, verbose=print_training_progress) # Train the classifier. trained_classifier = classifier.fit(X_train, y_train) # Get the trained model mean accuracy score using test data. mean_accuracy = trained_classifier.score(X_test, y_test) print"Mean Accuracy of All Test Predictions:" print(mean_accuracy) # Process test predictions. test_predictions = trained_classifier.predict(X_test[:number_of_prediction_tests, :]) print("Actual Prediction Test Classes:") print(y_test[:number_of_prediction_tests]) print("Predicted Test Classes:") print(test_predictions)
The example output is below:
Iteration 1, loss = 3.12814482
Iteration 2, loss = 2.67163940
Iteration 3, loss = 2.45638738
Iteration 4, loss = 2.34068097
Iteration 5, loss = 2.26461961
Iteration 6, loss = 2.21042757
Iteration 7, loss = 2.16944587
Iteration 8, loss = 2.13762949
Iteration 9, loss = 2.11139347
Iteration 10, loss = 2.08858667
Iteration 11, loss = 2.06853117
.
.
.
Iteration 3104, loss = 0.01097677
Iteration 3105, loss = 0.01097640
Iteration 3106, loss = 0.01097621
Training loss did not improve more than tol=0.000010 for two consecutive epochs. Setting learning rate to 0.000002
Iteration 3107, loss = 0.01097576
Iteration 3108, loss = 0.01097568
Iteration 3109, loss = 0.01097563
Training loss did not improve more than tol=0.000010 for two consecutive epochs. Setting learning rate to 0.000000
Iteration 3110, loss = 0.01097555
Iteration 3111, loss = 0.01097553
Iteration 3112, loss = 0.01097552
Training loss did not improve more than tol=0.000010 for two consecutive epochs. Learning rate too small. Stopping.
Mean Accuracy of All Test Predictions:
0.6672
Actual Prediction Test Classes:
[0 4 3 7 5 5 4 0 0 6 2 0 6 1 6 3 4 0 2 2 4 0 2 4 5 3 0 2 3 5]
Predicted Test Classes:
[0 4 3 7 6 5 4 0 0 6 2 0 6 1 6 3 4 0 2 2 7 5 2 4 0 3 0 2 3 5]