Modeling Process
Modeling is a multi-stage methodology for creating trained and tested Machine Learning and AI models.
The Modeling Process is essentially a scientific experiment which includes:
- Development of a Hypothesis - e.g., data collected about a specific previous consumer behavior can be used to predict future behavior 
- Design of the Experiment - e.g., model/algorithm selection 
- Execution of the Experiment - e.g., model training and testing 
- Evaluation and Explanation of Results - e.g., is the hypothesis true or false, what is the accuracy 
Process Phases
Phases in the modeling process, which can be highly recursive/iterative, generally include:
- Type Identification 
- Platform Selection 
- Data Collection 
- Model/Algorithm Selection 
- Model Hyperparameters Setting 
- Model Training 
- Model Testing 
- Model Evaluation 
- Model Deployment 
Type Identification
The type of ML/AI needed can have a significant influence on the details of the modeling process phases. Type identification can be driven by:
- Areas of Interest 
- Educational Needs 
- Research and Development 
Major category types include:
- Computer Vision (e.g., object recognition, facial recognition, handwriting recognition) 
- Natural Language Processing (e.g., speech to text, translation, understanding) 
- Pattern Recognition (e.g., event prediction, medical diagnosis) 
Traditional Machine Learning vs. AI Modeling
Generative AI Models such as Large Language Models differ from more traditional models such as Decision Trees in several respects.
Model Architecture
- AI Models - rely heavily on Transformer Neural Networks 
- ML Models - use a large variety of algorithms such as Artificial Neural Networks and Decision Trees 
Training Data
- AI Models - are trained on very large volumes of data 
- ML Models - are trained on much smaller volumes of data 
Training Compute Resources
- AI Models - use high levels of computing resources during training 
- ML Models - use relatively lower levels of computing resources during training 
Model Fine-Tuning
- AI Models - can be fine-tuned using small datasets for focused applications 
- ML Models - are typically not fine-tuned with additional data after initial training 
Transfer Learning
- AI Models - can be applied to a wide variety of applications 
- ML Models - are trained and used for specific applications 
Platform Selection
ML and AI Platforms are generally of two types:
- Open Source (e.g., TensorFlow, Keras, Scikit-Learn, Theano, Caffe, Torch) 
- Commercial/Proprietary (e.g., cloud-based ML and AI services) 
Data Collection
Data Collection is the process of finding, organizing, cleaning, and storing data in a form that can be fed into model training and prediction processing; a simple code sketch follows the list below.
Data Collection can involve:
- Databases (e.g., Columnar, Document, Relational) 
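As a small illustration of the finding, organizing, cleaning, and storing steps, the sketch below assumes pandas is available and uses hypothetical file and column names (customer_data.csv, age, income, prior_purchases, purchased_again); it is one possible workflow, not a prescribed one.

```python
# Minimal sketch: collecting and cleaning tabular data for model training.
# Assumes pandas; the file name and column names are hypothetical placeholders.
import pandas as pd

# Find / load: read raw data from a hypothetical CSV export
raw = pd.read_csv("customer_data.csv")

# Clean: drop duplicate rows and rows with missing values
cleaned = raw.drop_duplicates().dropna()

# Organize: keep only the feature columns and the target column (names are placeholders)
features = cleaned[["age", "income", "prior_purchases"]]
target = cleaned["purchased_again"]

# Store: save the prepared data so it can be fed into model training
features.assign(purchased_again=target).to_csv("training_data.csv", index=False)
```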
Model/Algorithm Selection
Model/Algorithm Options
There is a wide range of model algorithms to select from; the methodologies below can help narrow the choice.
Selection Methodologies
Methods of selecting an algorithm include:
- Identifying Project Key Criteria - often include model application, need for model explainability and interpretability, training data availability 
- Reviewing Model Categories - a categorization of models and their variations can provide insights useful for algorithm selection 
- Researching the Latest Advancements - Machine Learning is a very dynamic field; internet searches related to the type of ML being pursued can be valuable; use the Application page of this site to see a Google search for specific areas of interest 
- Experimenting with Various Options - running tests using various algorithms can provide insights into their effectiveness for the type of use envisioned 
- Comparing Models - use a method such as a spreadsheet to compare various models; a simple code-based comparison is sketched below 
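As a rough sketch of experimenting with and comparing options, the example below assumes scikit-learn and uses a synthetic dataset; the candidate models and settings are illustrative choices, not recommendations.

```python
# Minimal sketch: comparing candidate algorithms with cross-validation.
# Assumes scikit-learn; the dataset and model list are illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for project data
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_classes=2, random_state=42)

candidates = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Neural Network": MLPClassifier(max_iter=1000, random_state=42),
}

# Score each candidate with 5-fold cross-validation and print a simple comparison
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```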
Model Hyperparameter Settings
Hyperparameters control aspects of model instantiation and training. Depending on the model algorithm being used, they can include factors such as the following (an illustrative code example follows the list):
- activation_function: which Activation Function is used in Activation Nodes 
- batch_size: the number of training samples processed in each iteration before the model weights are updated 
- hidden_network_layers: the number of hidden layers and the number of nodes in each layer; hidden layers are those between the input and output layers 
- learning_rate: the step size used when adjusting weights during Weight Optimization 
- maximum_number_of_iterations: the maximum number of iterations for which data is processed through the neural network 
- number_of_data_features: the number of data features used for model training and inference processing 
- number_of_informative_data_features: the number of data features correlated to the training outputs; this simulates real world model training where the correlation of data features may not be known 
- number_of_model_classes: the number of output classes the neural network is being trained to predict 
- number_of_training_and_test_samples: the number of data samples processed through model training 
- print_training_progress: whether to print the loss after each training iteration; loss is a measure of the difference between calculated outputs and expected outputs 
- tolerance_for_optimization: a numeric threshold used to end the training iteration cycles when the loss improvement falls below it 
- weight_optimization_algorithm: the algorithm used for Weight Optimization, such as Stochastic Gradient Descent 
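To make the list concrete, the sketch below assumes scikit-learn and maps the descriptive names above to roughly corresponding scikit-learn parameters (make_classification for the data-related settings and MLPClassifier for the model-related ones); the mapping and the values chosen are illustrative assumptions.

```python
# Minimal sketch: setting hyperparameters for a neural network classifier.
# Assumes scikit-learn; the parameter values are illustrative only.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Data generation hyperparameters
X, y = make_classification(
    n_samples=1000,        # number_of_training_and_test_samples
    n_features=20,         # number_of_data_features
    n_informative=10,      # number_of_informative_data_features
    n_classes=3,           # number_of_model_classes
    random_state=42,
)

# Model instantiation and training hyperparameters
model = MLPClassifier(
    activation="relu",            # activation_function
    batch_size=32,                # batch_size
    hidden_layer_sizes=(64, 32),  # hidden_network_layers (two hidden layers of 64 and 32 nodes)
    learning_rate_init=0.001,     # learning_rate
    max_iter=500,                 # maximum_number_of_iterations
    tol=1e-4,                     # tolerance_for_optimization
    solver="sgd",                 # weight_optimization_algorithm (Stochastic Gradient Descent)
    verbose=True,                 # print_training_progress (prints loss each iteration)
    random_state=42,
)
model.fit(X, y)
```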
Model Training
Data is iteratively processed through the model to adjust the weights and biases applied to the links between nodes, producing increasingly accurate output results. The diagram below illustrates an Artificial Neural Network; the concepts apply to other model algorithms as well.
- Data Inputs - data is fed into the training process 
- Iteration - data is iteratively passed through the neural network 
- Forward Propagation - data is passed from node to node 
- Outputs - output results are fed into loss calculations 
- Loss Calculation - the difference between output results and desired results is calculated 
- Weight Optimization - the amount of change to data flow weights is calculated 
- Backpropagation - modifies the weights and biases applied to the links between nodes 
Typically, the training process is performed iteratively while monitoring factors such as the best accuracy achieved, as illustrated below:
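The following minimal NumPy sketch shows one version of this loop for a tiny two-layer network on made-up data: forward propagation, loss calculation, backpropagation, and weight optimization. The network size, data, and learning rate are illustrative, and the code is a simplified illustration rather than a production training routine.

```python
# Minimal sketch of the training loop described above. NumPy only;
# the network size, data, and learning rate are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                               # Data Inputs: 100 samples, 4 features
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)    # toy target values

# Initialize weights and biases for one hidden layer and one output node
W1, b1 = rng.normal(scale=0.5, size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)
lr = 0.1                                                    # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(200):                                    # Iteration
    # Forward Propagation: data is passed from layer to layer
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)                              # Outputs

    # Loss Calculation: mean squared difference between outputs and desired results
    loss = np.mean((out - y) ** 2)

    # Backpropagation: gradients of the loss with respect to weights and biases
    d_out = 2 * (out - y) / len(y) * out * (1 - out)
    dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
    d_h = d_out @ W2.T * h * (1 - h)
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)

    # Weight Optimization: adjust weights and biases to reduce the loss (gradient descent)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

    if epoch % 50 == 0:
        print(f"epoch {epoch}: loss {loss:.4f}")            # monitor training progress
```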
Model Testing
Data is passed forward through the neural network to produce a result and an associated confidence level that the result is correct. The diagram below illustrates an Artificial Neural Network; the concepts apply to other model algorithms as well.
- Data Inputs - data is fed into the testing process 
- Forward Propagation - data is passed from node to node 
- Outputs - output results are fed into confidence level calculations 
- Confidence Level - is a number from 0 to 1 indicating the probability that the output result is correct; a simple sketch follows below 
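A brief sketch of this forward pass, assuming scikit-learn and an illustrative synthetic dataset, shows predicted classes alongside their confidence levels (class probabilities):

```python
# Minimal sketch: passing held-out test data forward through a trained model to get
# predicted classes and confidence levels. Assumes scikit-learn; the dataset and model
# are illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = MLPClassifier(max_iter=500, random_state=42).fit(X_train, y_train)

# Forward Propagation on test inputs; predict_proba returns a 0-to-1 confidence per class
predictions = model.predict(X_test[:5])
confidences = model.predict_proba(X_test[:5])
for pred, conf in zip(predictions, confidences):
    print(f"predicted class {pred} with confidence {conf.max():.2f}")
```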
Model Evaluation
Model Evaluation involves applying Probability and Statistics using measurements such as accuracy, precision, recall, F1 score, and the confusion matrix.
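A minimal sketch of computing these measurements, assuming scikit-learn and using made-up true and predicted labels purely for illustration:

```python
# Minimal sketch: common evaluation measurements computed with scikit-learn.
# The true and predicted labels below are made-up values for illustration only.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

y_true = [0, 1, 1, 0, 1, 1, 0, 0, 1, 0]   # expected outputs
y_pred = [0, 1, 0, 0, 1, 1, 0, 1, 1, 0]   # model outputs from testing

print("accuracy:  ", accuracy_score(y_true, y_pred))
print("precision: ", precision_score(y_true, y_pred))
print("recall:    ", recall_score(y_true, y_pred))
print("f1 score:  ", f1_score(y_true, y_pred))
print("confusion matrix:\n", confusion_matrix(y_true, y_pred))
```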
Depending on the results of model evaluation, previous modeling steps may need to be adjusted and repeated.
To reduce overfitting, consider using:
- Fewer Variables 
- Reduced Model Training Time 
Model Reinforcement Learning with Human Feedback (RLHF)
RLHF is a type of machine learning that combines reinforcement learning and human feedback to train AI models.
Key Benefits
- Improved alignment: RLHF helps align the agent's objectives with human values and preferences. 
- Flexibility: RLHF can be applied to various domains, including those with complex or nuanced objectives. 
- Efficient learning: Human feedback accelerates learning, reducing the need for large amounts of data or trial and error. 
Challenges and Limitations
- Scalability: Obtaining high-quality human feedback can be time-consuming and expensive. 
- Bias and variability: Human feedback may be subjective, inconsistent, or biased. 
- Evaluation metrics: Assessing the effectiveness of RLHF can be challenging due to the complexity of human feedback. 
By combining reinforcement learning with human feedback, RLHF enables AI agents to learn complex behaviors and make decisions that align with human values and preferences. Steps 2-4 below are repeated, with the agent refining its policy through continuous human feedback and reward signals.
Step 1: Environment and Agent
The AI agent interacts with an environment, such as a game, simulation, or text-based interface.
Step 2: Human Feedback
Humans provide feedback on the agent's actions, such as:
- Rewards (e.g., +1 for good action, -1 for bad action) 
- Preferences (e.g., "I like this action better than that one") 
- Corrections (e.g., "No, do this instead") 
Step 3: Reward Signal
The human feedback is converted into a reward signal (often by training a reward model on the feedback), which guides the agent's learning process. A widely used algorithm for optimizing the policy against this reward signal is Proximal Policy Optimization (PPO).
- PPO was introduced by OpenAI in 2017 and is designed to optimize the policy of an agent in a stable and efficient manner. PPO is a type of policy gradient method, which means it focuses on optimizing the policy directly rather than relying on a value function. 
- The key innovation of PPO lies in its use of a clipped surrogate objective function. This function constrains the policy updates by clipping the probability ratio between the new and old policies within a specified range. By doing so, PPO prevents large, destabilizing updates to the policy, ensuring that changes remain within a "trust region" that maintains training stability. A minimal sketch of this clipped objective appears after this list. 
- This approach allows PPO to achieve a balance between exploration and exploitation, making it more sample efficient and stable compared to previous methods like Trust Region Policy Optimization (TRPO). 
- PPO's simplicity, combined with its effectiveness, has made it a popular choice for various applications, including robotics, game playing, and other high-dimensional tasks. 
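As a rough illustration of the clipped surrogate objective described above (not OpenAI's implementation), the sketch below computes the PPO loss term for a batch of made-up probability ratios and advantage estimates:

```python
# Minimal sketch of PPO's clipped surrogate objective, using made-up numbers.
# ratio = new policy probability / old policy probability for each sampled action;
# advantage = estimated advantage of that action. Not a full PPO implementation.
import numpy as np

def ppo_clipped_objective(ratio, advantage, epsilon=0.2):
    """Return the mean clipped surrogate objective L^CLIP."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    # Taking the minimum keeps policy updates inside the "trust region"
    return np.mean(np.minimum(unclipped, clipped))

ratio = np.array([0.8, 1.1, 1.5, 0.6])        # illustrative probability ratios
advantage = np.array([1.0, -0.5, 2.0, 0.3])   # illustrative advantage estimates
print("clipped objective:", ppo_clipped_objective(ratio, advantage))
```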
Step 4: Policy Update
The agent updates its policy (behavior) based on the reward signal, using reinforcement learning algorithms (e.g., Q-learning, policy gradients).
Q-learning is a reinforcement learning algorithm that enables an agent to learn optimal action-selection policies in an environment. Here's how it works:
1. Q-Table Initialization
The algorithm starts by creating a Q-table, which is a matrix where rows represent states and columns represent actions. All Q-values are initially set to zero or random small values.
2. Exploration and Exploitation
The agent interacts with the environment, balancing between exploring new actions and exploiting known good actions, often using an epsilon-greedy strategy.
3. Action Selection
In each state, the agent selects an action, either randomly (exploration) or based on the highest Q-value for that state (exploitation).
4. Reward Observation
After taking an action, the agent observes the reward received and the new state it has transitioned to.
5. Q-Value Update
The Q-value for the state-action pair is updated using the Q-learning formula:
Q(s,a) = Q(s,a) + α * [R + γ * max(Q(s',a')) - Q(s,a)]
Where:
- Q(s,a) is the current Q-value
- α is the learning rate
- R is the reward received
- γ is the discount factor
- max(Q(s',a')) is the maximum Q-value for the next state
6. Iteration
Steps 3-5 are repeated for many episodes, allowing the agent to learn from various experiences.
7. Convergence
Over time, the Q-values converge to optimal values, representing the expected cumulative reward for each action in each state.
8. Policy Extraction
Once training is complete, the optimal policy can be extracted by selecting the action with the highest Q-value for each state.
Q-learning is model-free (doesn't require knowledge of the environment's dynamics) and off-policy (can learn from actions not in the current policy). It effectively learns to make optimal decisions by iteratively improving its estimates of action values based on the rewards received and the structure of the environment.
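To tie the steps together, here is a minimal tabular Q-learning sketch on a made-up five-state chain environment; the environment, rewards, and hyperparameters are illustrative assumptions, not part of any particular library.

```python
# Minimal tabular Q-learning sketch on a toy 5-state chain environment.
# The agent starts in state 0; action 1 moves right, action 0 moves left;
# reaching state 4 gives a reward of +1 and ends the episode. All values illustrative.
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))    # 1. Q-table initialization
alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount factor, exploration rate
rng = np.random.default_rng(0)

def step(state, action):
    """Toy environment dynamics: returns (next_state, reward, done)."""
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done

for episode in range(500):             # 6. Iteration over many episodes
    state, done = 0, False
    while not done:
        # 2-3. Exploration vs. exploitation (epsilon-greedy action selection)
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(Q[state]))
        # 4. Observe the reward and the new state
        next_state, reward, done = step(state, action)
        # 5. Q-value update: Q(s,a) += alpha * [R + gamma * max(Q(s',a')) - Q(s,a)]
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state

# 8. Policy extraction: pick the highest-valued action in each state
print("learned policy (0=left, 1=right):", np.argmax(Q, axis=1))
```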
Model Deployment
Model software deployment typically involves packaging the trained model and making it available to applications for inference.
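As one minimal, hedged example, assuming scikit-learn and joblib, a trained model can be persisted as a file artifact and later reloaded by the serving application for inference; the model, data, and file name below are placeholders.

```python
# Minimal sketch: persisting a trained model and reloading it for inference.
# Assumes scikit-learn and joblib; the model, data, and file name are illustrative.
import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Package the trained model as a file artifact for deployment
joblib.dump(model, "model.joblib")

# In the deployed application, load the artifact and serve predictions
deployed_model = joblib.load("model.joblib")
print(deployed_model.predict(X[:3]))
```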