47. Software for DOE (Design of Experiments)

Concepts and Terms

DOE Fundamentals

  • Design of Experiments (DOE) - Systematic method to determine factor effects
  • Factor - Independent variable being studied (e.g., temperature, pressure)
  • Level - Specific value of a factor (e.g., 200°C, 250°C)
  • Response - Dependent variable being measured (e.g., film thickness)
  • Treatment - Specific combination of factor levels
  • Experimental unit - Subject receiving treatment (e.g., wafer)
  • Replication - Repeating treatments for statistical power
  • Randomization - Random order to avoid systematic bias
  • Blocking - Grouping units to reduce variability
  • Main effect - Effect of single factor on response
  • Interaction - Combined effect of multiple factors
  • Confounding - When factor effects cannot be separated
  • Resolution - Ability to distinguish effects (Roman numerals; typically III, IV, V)

Classical DOE Designs

  • Full factorial - All combinations of factors and levels
  • 2^k design - Two levels for k factors; 2^k runs
  • 3^k design - Three levels for k factors
  • Fractional factorial - Subset of full factorial; fewer runs
  • Half-fraction (2^(k-1)) - Half the runs of full factorial
  • Quarter-fraction (2^(k-2)) - One-quarter of runs
  • Alias structure - Which effects are confounded
  • Generator - Defines relationship creating fraction
  • Defining relation - Mathematical expression for design
  • Plackett-Burman - Screening design for many factors
  • Central composite design (CCD) - For response surface methodology
  • Box-Behnken - Alternative to CCD; avoids corners
  • Latin square - Balanced design for one treatment factor and two blocking factors

Response Surface Methodology (RSM)

  • Response surface - Mathematical model of response vs factors
  • Quadratic model - Includes squared terms and interactions
  • Contour plot - 2D representation of response surface
  • Surface plot - 3D visualization of response
  • Saddle point - Point that's max in one direction, min in another
  • Steepest ascent - Sequential optimization method
  • Ridge analysis - Finding optimum along ridge
  • Canonical analysis - Understanding surface shape
  • Desirability function - Combines multiple responses
  • Overlay plot - Simultaneous constraints visualization

Robust Design (Taguchi Methods)

  • Taguchi method - Focus on reducing variation
  • Noise factors - Uncontrollable variables
  • Control factors - Controllable variables
  • Orthogonal array - Balanced fractional factorial
  • Signal-to-noise ratio (S/N) - Robustness metric
  • Larger-is-better (LIB) - S/N for maximizing response
  • Smaller-is-better (SIB) - S/N for minimizing response
  • Nominal-is-best (NIB) - S/N for targeting specific value
  • Crossed array - Inner (control) and outer (noise) arrays
  • Loss function - Quadratic loss from target
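  • Parameter design - Choose control-factor settings that make the response insensitive to noise
  • Tolerance design - Tighten tolerances on the factors that still drive variation, trading cost against quality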
  • Confirmation experiment - Verify optimal settings

Advanced DOE Methods

  • Optimal design - Computer-generated optimal designs
  • D-optimal - Maximize determinant of X'X; best for coefficients
  • I-optimal - Minimize prediction variance; best for prediction
  • A-optimal - Minimize trace of variance matrix
  • G-optimal - Minimize maximum prediction variance
  • Space-filling designs - Cover design space uniformly
  • Latin hypercube - Each factor's range is split into n intervals, each sampled exactly once
  • Maximin - Maximize minimum distance between points
  • Uniform design - Low-discrepancy sequences
  • Definitive screening design - Efficient for many factors, quadratic effects
  • Split-plot design - When some factors hard to change
  • Mixture design - When factors must sum to 100%

DOE Software Packages

  • JMP (SAS Institute) - Comprehensive DOE capabilities
      • Custom Design - Flexible optimal design generation
      • Augment Design - Add runs to existing design
      • Evaluate Design - Assess design quality
      • Fit Model - Statistical analysis platform
  • Minitab - Widely used statistical software
      • Factorial Designs - Full and fractional factorials
      • Response Surface - CCD, Box-Behnken
      • Mixture Designs - Formulation experiments
      • Taguchi Designs - Robust design methods
  • Design-Expert (Stat-Ease) - Specialized DOE software
      • Power calculations - Sample size determination
      • Optimal designs - Various criteria
      • Split-plot - Hard-to-change factors
      • Historical data - Analyze existing data
  • SAS/STAT - Enterprise statistical software
      • PROC FACTEX - Generate factorial designs
      • PROC OPTEX - Optimal experimental designs
      • PROC RSREG - Response surface analysis
      • PROC GLM - General linear models

Specialized Semiconductor DOE Tools

  • Synopsys SiVL (Silicon Variation Library) - Process variation analysis
  • TEL Process Integration Platform - Equipment-linked DOE
  • Applied Materials Advisor - Equipment-specific optimization
  • KLA SPECS - Metrology-integrated DOE
  • Yield management systems - DOE modules in YMS
  • Run-to-run control systems - Automated DOE execution
  • SEMI E133 - Interface standard for automated process control systems; a basis for automated DOE

DOE Data Analysis

  • ANOVA (Analysis of Variance) - Decompose variance by source
  • Sum of squares - Measure of variation
  • Mean square - Sum of squares divided by degrees of freedom
  • F-statistic - Ratio of mean squares; tests significance
  • P-value - Probability of a result at least as extreme as observed if the null hypothesis is true
  • Effect plot - Visualization of factor effects
  • Pareto chart - Rank effects by magnitude
  • Half-normal plot - Identify significant effects
  • Residual analysis - Check model assumptions
  • Normal probability plot - Test normality of residuals
  • Lack of fit - Test if model captures true relationship
  • R-squared - Fraction of variance explained by model
  • Adjusted R-squared - R² corrected for number of terms
  • Predicted R-squared - Cross-validation metric
  • PRESS (Predicted Residual Sum of Squares) - Leave-one-out error

Modeling & Prediction

  • Regression - Fit model to data
  • Linear model - First-order terms only
  • Quadratic model - Includes squared terms
  • Interaction model - Includes cross-product terms
  • Stepwise regression - Automatic term selection
  • Forward selection - Add terms sequentially
  • Backward elimination - Remove terms sequentially
  • Best subsets - Evaluate all combinations
  • Model hierarchy - Include lower-order terms if higher-order significant
  • Multicollinearity - Correlated predictors cause instability
  • VIF (Variance Inflation Factor) - Measure of multicollinearity
  • Prediction interval - Range for individual predictions
  • Confidence interval - Range for mean prediction

Optimization

  • Numerical optimization - Find optimal factor settings
  • Graphical optimization - Visual identification of optimal region
  • Desirability - Combine multiple responses into single objective
  • Individual desirability - Transform each response to 0-1 scale
  • Overall desirability - Geometric mean of individual desirabilities
  • Constraints - Limits on factors or responses
  • Optimization algorithm - Gradient, simplex, genetic algorithm
  • Multiple optima - Check for local vs global optimum
  • Robust optimum - Settings insensitive to variation
  • Confirmation runs - Verify predicted optimum

Sequential Experimentation

  • EVOP (Evolutionary Operation) - Small changes during production
  • Simplex method - Geometric optimization approach
  • OFAT (One Factor At a Time) - Inefficient but common approach
  • Steepest ascent/descent - Move in direction of greatest improvement
  • Ridge analysis - Follow ridge toward optimum
  • Sequential assembly - Build design in stages
  • Augmentation - Add runs to existing design
  • Screening → Optimization - Two-stage approach

DOE Best Practices

  • Pre-experimental planning - Define objectives before running
  • Subject matter expertise - Include process engineers
  • Factor selection - Choose relevant factors
  • Range selection - Wide enough to see effects, narrow enough to run
  • Center points - Detect curvature, estimate error
  • Replication strategy - Balance power and cost
  • Run order - Randomize to avoid bias
  • Documentation - Record all conditions
  • Analysis strategy - Plan analysis before running
  • Follow-up experiments - Iterate based on results

Speech Content

Design of Experiments Software for Semiconductor Manufacturing

This overview covers D O E fundamentals, classical and advanced designs, software platforms, and opportunities for next-generation fabs. Key terms include factorial design, response surface methodology, Taguchi methods, optimal design, and desirability functions.

Design of Experiments Fundamentals

Design of Experiments, or D O E, is the systematic methodology for understanding how input factors affect output responses in semiconductor processes. Unlike the naive approach of changing one factor at a time, D O E captures interactions, which are the synergistic or antagonistic effects when factors combine in non-additive ways.

The core vocabulary starts with factors, which are your independent variables like chamber temperature, pressure, or gas flow rates. Each factor has levels, which are specific values you test, such as 200 degrees versus 250 degrees Celsius. The response is what you measure, like film thickness or defect density. A treatment is one specific combination of all factor levels. The experimental unit is typically a wafer or wafer region receiving that treatment.

Replication means running the same treatment multiple times to gain statistical power. Randomization means running experiments in random order to prevent time-based drift from confounding your results. Blocking groups experimental units by known sources of variation, like chamber or lot, to isolate that variability.

Main effects measure the average change in response when a single factor moves from low to high. Interactions occur when the effect of factor A depends on the level of factor B. Confounding happens when effects cannot be mathematically separated. Resolution describes a design's ability to distinguish effects, ranging from resolution three where main effects are tangled with two-way interactions, up to resolution five or higher where two-way interactions can be cleanly estimated.

Classical Design Structures

A full factorial design tests every combination of factor levels. With two levels and k factors, you need two to the k runs. For five factors, that is 32 experiments. Three levels per factor grow even faster: four factors require 81 runs.

Fractional factorials reduce this by using clever aliasing. A half-fraction uses two to the k minus one runs. A generator expression defines how factor columns relate, creating an alias structure that shows which effects are confounded together. The defining relation is the mathematical identity underlying the design.

Plackett-Burman designs are saturated screening designs that handle up to n minus one factors in n runs, where n is a multiple of four. They are well suited to initial screening of 10 to 20 factors, though main effects are partially aliased with two-way interactions, so the interactions cannot be estimated separately.

Definitive screening designs, introduced by Jones and Nachtsheim in 2011, revolutionized semiconductor D O E by estimating main effects, quadratic effects, and some interactions in as few as two k plus one runs.

Response Surface Methods

Response surface methodology, or R S M, fits polynomial models, typically quadratic with squared terms and cross products, to characterize how responses vary across factor space.

Central composite designs add axial points to factorial corners plus center points, enabling quadratic model fitting. Box-Behnken designs avoid corner points, useful when corners represent physically extreme conditions.

Visualization includes contour plots showing iso-response lines in two dimensions, surface plots rendering three dimensional landscapes, and overlay plots superimposing multiple constraints to reveal feasible operating regions.

Saddle points are maxima in one direction but minima in another, common in semiconductor processes involving trade-offs. Ridges are extended optimal regions offering operational flexibility.

For multiple responses, desirability functions transform each response to a zero to one scale, then combine them via geometric mean into overall desirability. This handles simultaneous optimization of thickness, uniformity, stress, and defect density.

Taguchi Robust Design

Genichi Taguchi focused on making processes insensitive to noise. Control factors are what engineers can set. Noise factors are uncontrollable or costly to control, like incoming wafer variation.

Orthogonal arrays like L 8, L 9, and L 18 provide balanced fractional factorials. Crossed arrays combine an inner array of control factors with an outer array of noise factors.

Signal to noise ratios include larger is better for maximizing yield, smaller is better for minimizing defects, and nominal is best for targeting specific values like film thickness. The loss function quantifies quadratic penalty from target, monetizing variation.

Advanced and Optimal Designs

Computer-generated optimal designs handle irregular experimental regions and constraints. D optimal designs maximize parameter estimation precision. I optimal designs minimize average prediction variance. G optimal designs minimize worst-case prediction error.

Space-filling designs like Latin hypercube sampling divide each factor's range into as many equal intervals as there are runs and place exactly one run in each interval, which is critical for computer experiments and T C A D simulation D O Es. Maximin designs maximize the minimum distance between points.

Split-plot designs address hard to change factors like chamber temperature requiring long stabilization. Proper analysis needs mixed models accounting for whole-plot and sub-plot error structures.

Software Platforms

J M P from S A S Institute offers custom design generation, design augmentation, evaluation tools, and interactive profilers. Minitab provides traditional design catalogs with excellent tutorials. Design Expert from Stat-Ease specializes exclusively in D O E with strong mixture experiment support. S A S slash S T A T offers enterprise-grade procedures like P R O C F A C T E X and P R O C O P T E X.

Semiconductor-specific tools include Applied Materials Advisor for equipment-specific optimization, K L A S P E C S for metrology-integrated D O E, and yield management systems with embedded D O E modules. The S E M I E 1 33 standard, which specifies the interface for automated process control systems, provides a basis for automated D O E execution.

Statistical Analysis Essentials

A N O V A, analysis of variance, decomposes total variation into model and error components. The F statistic tests whether factor effects exceed random noise. P values below 0.05 typically indicate significance.

Effect plots and Pareto charts visualize factor importance. Half-normal plots elegantly identify active factors in fractional factorials. Residual analysis checks model assumptions. Lack of fit tests compare the variation the model fails to explain against pure error estimated from replicated runs.

R squared measures variance explained, while adjusted R squared penalizes excess terms. Predicted R squared uses leave-one-out cross-validation and should fall within 0.2 of adjusted R squared. Multicollinearity inflates coefficient variance when predictors correlate; it is measured by the variance inflation factor, with values above 10 indicating a serious problem.

Opportunities for Advanced Fabs

For lunar manufacturing, virtual D O E using calibrated T C A D models becomes essential given high experiment costs. Bayesian optimal experimental design selects maximum-value physical experiments. Reduced noise factors from stable vacuum simplify Taguchi crossed arrays. Autonomous D O E execution handles communication latency.

For western competitive fabs pursuing vacuum-native processing, D O E for cold welding and wafer bonding creates entirely new response variables: bond strength, contact resistance, hermeticity. Chiplet integration D O E addresses bonding parameters and alignment tolerances.

A I acceleration via Bayesian optimization can reduce required experiments three to five fold. Gaussian process models with acquisition functions like expected improvement guide sequential experiment selection. Transfer learning from T C A D simulations initializes models before expensive physical runs.

Mature robotics enables automated recipe parameter changes, inline metrology integration, and elimination of operator variation as a noise factor. High-throughput combinatorial approaches test thousands of conditions per wafer using spatial gradients.

Historical methods worth revisiting include E V O P, evolutionary operation, for continuous production optimization compatible with autonomous fabs. Emerging methods include physics-informed machine learning surrogates, federated D O E combining data across fabs without sharing raw data, and causal inference approaches moving beyond correlation to mechanism understanding.

Core Concepts Review

You have now covered D O E fundamentals including factors, levels, responses, treatments, replication, randomization, blocking, main effects, interactions, confounding, and resolution. Classical designs span full factorial, fractional factorial, Plackett-Burman, central composite, and Box-Behnken. Response surface methodology enables quadratic modeling and desirability optimization. Taguchi methods address robustness through signal to noise ratios and crossed arrays. Advanced optimal designs include D optimal, I optimal, and space-filling Latin hypercube. Software platforms J M P, Minitab, Design Expert, and S A S provide implementation. Analysis relies on A N O V A, F statistics, R squared metrics, and residual diagnostics. Strategic opportunities lie in Bayesian optimization, autonomous experimentation, vacuum-native process D O E, and A I-accelerated sequential design.

Technical Overview

Design of Experiments (DOE) Software for Semiconductor Manufacturing

Fundamental Framework

DOE is the systematic methodology for determining causal relationships between input factors (temperature, pressure, gas flows, time, power) and output responses (film thickness, uniformity, defect density, electrical parameters). Unlike OFAT (one-factor-at-a-time), DOE captures interactions—the synergistic or antagonistic effects when factors combine non-additively.

Core Terminology:
- Factor: Independent variable (e.g., chamber pressure)
- Level: Discrete value of factor (e.g., 1 mTorr, 5 mTorr, 10 mTorr)
- Response: Measured output (e.g., etch rate in nm/min)
- Treatment: Specific combination of all factor levels
- Experimental unit: Individual wafer or wafer region receiving treatment
- Replication: Running same treatment multiple times for statistical power
- Randomization: Random run order to prevent time-drift confounding
- Blocking: Grouping experimental units (e.g., by lot, chamber) to isolate known variability sources
- Main effect: Average change in response when factor moves from low to high
- Interaction: When effect of factor A depends on level of factor B
- Confounding: When effects cannot be mathematically separated
- Resolution: Design's ability to distinguish effects (III: main effects confounded with 2-way interactions; IV: main effects clear, 2-way interactions confounded with each other; V+: 2-way interactions estimable)
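
As a small illustration of the main effect and interaction definitions above, the sketch below computes them for a 2^2 factorial in coded units. The response values and the helper name `contrast_effect` are made up for illustration only.

```python
# Hypothetical etch-rate data (nm/min) for a 2^2 factorial in coded units.
# Keys are (A, B) with -1 = low and +1 = high; the values are invented.
runs = {
    (-1, -1): 100.0,
    (+1, -1): 120.0,
    (-1, +1): 110.0,
    (+1, +1): 150.0,
}

def contrast_effect(runs, sign):
    """Average response where sign(...) is +1 minus average where it is -1."""
    high = [y for x, y in runs.items() if sign(x) > 0]
    low = [y for x, y in runs.items() if sign(x) < 0]
    return sum(high) / len(high) - sum(low) / len(low)

effect_a = contrast_effect(runs, lambda x: x[0])          # main effect of A
effect_b = contrast_effect(runs, lambda x: x[1])          # main effect of B
effect_ab = contrast_effect(runs, lambda x: x[0] * x[1])  # AB interaction

print(effect_a, effect_b, effect_ab)
```

A nonzero AB value means the effect of A differs between the low and high levels of B, which is exactly what an OFAT study cannot see.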

Classical DOE Designs

Full Factorial (2^k and 3^k):
- 2^k: Two levels per factor, 2^k total runs. For k=5 factors: 32 runs
- 3^k: Three levels enable curvature detection. k=4 requires 81 runs, which is often impractical
- Provides complete information but exponential growth limits applicability
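
The run counts quoted above can be checked by enumerating the designs directly. This is a minimal sketch using only the standard library; the function name `full_factorial` and the fixed seed are illustrative choices, not part of any DOE package.

```python
from itertools import product
import random

def full_factorial(levels_per_factor):
    """Return every combination of the given factor levels, one tuple per run."""
    return list(product(*levels_per_factor))

two_level = full_factorial([(-1, +1)] * 5)       # 2^5 design in coded units
three_level = full_factorial([(-1, 0, +1)] * 4)  # 3^4 design

# Randomize the run order; the fixed seed is only for reproducibility here.
rng = random.Random(0)
run_order = two_level[:]
rng.shuffle(run_order)

print(len(two_level), len(three_level))
```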

Fractional Factorial:
- Half-fraction (2^(k-1)): Uses generator to alias effects. 5 factors in 16 runs instead of 32
- Quarter-fraction (2^(k-2)): Two generators, more aliasing. 6 factors in 16 runs
- Alias structure: Mathematical relationship showing which effects are confounded
- Generator: Expression like E=ABCD means factor E's column equals product of A,B,C,D columns
- Defining relation: I=ABCDE means product of all factors equals identity column
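
The generator and defining relation can be made concrete by building the 2^(5-1) half-fraction by hand. The sketch below assumes coded -1/+1 levels; the helper `column` is a hypothetical name introduced only for this example.

```python
from itertools import product
from math import prod

# Base design: full 2^4 factorial in A, B, C, D (coded -1/+1).
base = list(product((-1, 1), repeat=4))

# Generator E = ABCD: the E column is the product of the other four columns.
design = [(a, b, c, d, a * b * c * d) for a, b, c, d in base]

# Defining relation I = ABCDE: the product of all five columns is +1 in every run.
assert all(prod(run) == 1 for run in design)

def column(design, factors):
    """Elementwise product of the named factor columns, e.g. 'AB'."""
    idx = ["ABCDE".index(f) for f in factors]
    return [prod(run[i] for i in idx) for run in design]

# Multiplying AB by the defining word ABCDE gives CDE, so AB and CDE share a column.
ab_equals_cde = column(design, "AB") == column(design, "CDE")
print(len(design), ab_equals_cde)
```

Because the shortest word in the defining relation has five letters, this fraction is resolution V: two-way interactions are aliased only with three-way interactions.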

Screening Designs:
- Plackett-Burman: Saturated designs for up to n-1 factors in n runs (n a multiple of 4). Excellent for initial screening of 10-20 factors, but main effects are partially aliased with 2-way interactions, so interactions cannot be estimated separately
- Definitive Screening Design (DSD): Jones & Nachtsheim (2011). Estimates main effects, quadratic effects, and some 2-way interactions in as few as 2k+1 runs. Revolutionary for semiconductor processes, where factors often have nonlinear effects
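
For the Plackett-Burman case, a 12-run design can be sketched from the commonly tabulated first row by cyclic shifting. The code below is an illustration rather than a replacement for a DOE package; it checks that main-effect columns are balanced and orthogonal, and that a main effect is only partially correlated with a two-factor interaction.

```python
# Commonly tabulated first row of the 12-run Plackett-Burman design (11 columns).
GENERATOR = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]

def plackett_burman_12():
    """Cyclically shift the generator row 11 times, then append an all-low run."""
    n = len(GENERATOR)
    rows = [[GENERATOR[(j - s) % n] for j in range(n)] for s in range(n)]
    rows.append([-1] * n)
    return rows

design = plackett_burman_12()

def dot(design, i, j):
    """Inner product of two factor columns."""
    return sum(run[i] * run[j] for run in design)

# Main-effect columns are balanced and mutually orthogonal.
balanced = all(sum(run[j] for run in design) == 0 for j in range(11))
orthogonal = all(dot(design, i, j) == 0 for i in range(11) for j in range(i + 1, 11))

# Correlation between the first factor and the interaction of the next two.
partial_alias = sum(run[0] * run[1] * run[2] for run in design) / len(design)
print(balanced, orthogonal, partial_alias)
```

The nonzero but fractional correlation is why interactions bias main-effect estimates in these designs without being cleanly separable.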

Response Surface Designs:
- Central Composite Design (CCD): 2^k factorial + axial points + center points. Axial points at ±α from center enable quadratic model fitting
- Box-Behnken: Alternative avoiding corner points. Useful when corners represent physically extreme/dangerous conditions
- Latin Square: Balances one treatment factor against two nuisance (blocking) factors; it is a blocking design rather than a response surface design
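
The CCD structure described above (factorial corners, axial points, center points) can be sketched in a few lines. The default axial distance below is the rotatable choice α = (2^k)^(1/4), and the number of center points is an arbitrary assumption; both are illustrative defaults, not values taken from this document.

```python
from itertools import product

def central_composite(k, n_center=3, alpha=None):
    """Coded CCD points: 2^k corners, 2k axial points at +/-alpha, center points.

    alpha defaults to the rotatable choice (2^k)^(1/4); n_center is arbitrary.
    """
    if alpha is None:
        alpha = (2 ** k) ** 0.25
    corners = [tuple(float(v) for v in p) for p in product((-1, 1), repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            axial.append(tuple(point))
    centers = [(0.0,) * k] * n_center
    return corners + axial + centers

design = central_composite(k=3)
print(len(design))  # 8 corners + 6 axial + 3 center runs
```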

Response Surface Methodology (RSM)

RSM fits polynomial models (typically quadratic: y = β₀ + Σβᵢxᵢ + Σβᵢᵢxᵢ² + ΣΣβᵢⱼxᵢxⱼ) to characterize response surfaces.

Visualization:
- Contour plot: 2D slices showing iso-response lines
- Surface plot: 3D rendering of response vs two factors
- Overlay plot: Multiple constraints superimposed to show feasible region

Surface Features:
- Saddle point: Maximum in one direction, minimum in another—common in semiconductor processes where trade-offs exist
- Ridge: Extended optimal region—allows operating flexibility

Sequential Optimization:
- Steepest ascent/descent: Follow gradient direction with sequential experiments
- Ridge analysis: When optimum lies outside design region or along a ridge

Multi-Response Optimization:
- Desirability function: Transform each response to 0-1 scale based on target/bounds
- Overall desirability: Geometric mean D = (d₁^w₁ × d₂^w₂ × ... × dₙ^wₙ)^(1/Σwᵢ)
- Critical for semiconductor where multiple specs (thickness, uniformity, stress, defects) must be simultaneously optimized
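
A minimal sketch of the desirability calculation follows. The piecewise power-ramp transforms are one common choice (in the style of Derringer and Suich) and are an assumption here, as are the function names, bounds, and response values; the overall desirability is the weighted geometric mean given above.

```python
from math import prod

def d_smaller_is_better(y, target, high, s=1.0):
    """1 at or below `target`, 0 at or above `high`, power ramp in between."""
    if y >= high:
        return 0.0
    if y <= target:
        return 1.0
    return ((high - y) / (high - target)) ** s

def d_target(y, low, target, high, s=1.0, t=1.0):
    """1 at `target`, falling to 0 at `low` and `high`."""
    if y <= low or y >= high:
        return 0.0
    if y <= target:
        return ((y - low) / (target - low)) ** s
    return ((high - y) / (high - target)) ** t

def overall_desirability(ds, weights):
    """Weighted geometric mean; any zero individual desirability gives zero."""
    return prod(d ** w for d, w in zip(ds, weights)) ** (1.0 / sum(weights))

# Hypothetical responses: film thickness (nm) on target, defect count minimized.
d_thickness = d_target(95.0, low=90.0, target=100.0, high=110.0)
d_defects = d_smaller_is_better(2.0, target=0.0, high=10.0)
D = overall_desirability([d_thickness, d_defects], weights=[1.0, 1.0])
print(d_thickness, d_defects, D)
```

The geometric mean is what makes a single unacceptable response (d = 0) veto the whole setting, unlike an arithmetic average.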

Taguchi Methods (Robust Design)

Genichi Taguchi's philosophy: design products/processes insensitive to noise.

Key Concepts:
- Control factors: Engineer can set (temperature setpoint, flow rate)
- Noise factors: Uncontrollable or costly to control (incoming wafer variation, ambient humidity, chamber drift)
- Orthogonal arrays: L4, L8, L9, L16, L18 designs providing balanced fractional factorials
- Crossed array: Inner array (control factors) × outer array (noise factors)

Signal-to-Noise Ratios:
- Larger-is-better (LIB): S/N = -10×log₁₀(mean(1/y²)) — maximize yield
- Smaller-is-better (SIB): S/N = -10×log₁₀(mean(y²)) — minimize defects
- Nominal-is-best (NIB): S/N = 10×log₁₀(μ²/σ²) — target thickness

Loss Function: L(y) = k(y-τ)² — quadratic loss from target τ, monetizing variation
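
The three S/N ratios and the quadratic loss translate directly into code. This sketch uses the sample variance for σ² in the nominal-is-best case, which is an assumption; the function names are illustrative.

```python
from math import log10
from statistics import mean, variance

def sn_larger_is_better(ys):
    """S/N = -10 * log10(mean(1 / y^2))."""
    return -10 * log10(mean([1 / y ** 2 for y in ys]))

def sn_smaller_is_better(ys):
    """S/N = -10 * log10(mean(y^2))."""
    return -10 * log10(mean([y ** 2 for y in ys]))

def sn_nominal_is_best(ys):
    """S/N = 10 * log10(mean^2 / variance), using the sample variance."""
    return 10 * log10(mean(ys) ** 2 / variance(ys))

def quadratic_loss(y, target, k=1.0):
    """L(y) = k * (y - target)^2."""
    return k * (y - target) ** 2

print(sn_larger_is_better([10.0, 10.0]))
print(sn_smaller_is_better([1.0, 1.0, 1.0]))
print(sn_nominal_is_best([9.0, 10.0, 11.0]))
print(quadratic_loss(12.0, target=10.0, k=2.0))
```

In a crossed array, each inner-array row would get one S/N value computed across its outer-array (noise) runs.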

Controversial Aspects: Statisticians criticize Taguchi's two-stage optimization and reliance on S/N ratios. Modern practice often uses RSM with explicit noise factor modeling.

Advanced DOE Methods

Optimal Designs (Computer-Generated):
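- D-optimal: Maximize |X'X|, which minimizes the generalized variance of the parameter estimates
- I-optimal: Minimize average prediction variance over the design region; best for prediction/transfer models
- A-optimal: Minimize trace((X'X)⁻¹), the sum of the parameter estimate variances (up to the σ² scale factor)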
- G-optimal: Minimax prediction variance
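
To make the D and A criteria concrete, the sketch below evaluates |X'X| and trace((X'X)⁻¹) for two small candidate designs under a first-order model with intercept. The designs, function names, and the hand-rolled Gauss-Jordan routine are illustrative assumptions; real software searches over many candidates with exchange algorithms.

```python
def model_matrix(design):
    """Intercept plus main effects for each run (first-order model)."""
    return [[1.0, *run] for run in design]

def information_matrix(X):
    """Compute X'X."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]

def det_and_inverse(M):
    """Gauss-Jordan elimination with partial pivoting; fine for small matrices."""
    n = len(M)
    A = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(M)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular information matrix")
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            det = -det
        piv = A[col][col]
        det *= piv
        A[col] = [v / piv for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return det, [row[n:] for row in A]

def d_and_a_criteria(design):
    """Return (|X'X|, trace((X'X)^-1)) for the first-order model."""
    det, inv = det_and_inverse(information_matrix(model_matrix(design)))
    return det, sum(inv[i][i] for i in range(len(inv)))

factorial = [(-1, -1), (1, -1), (-1, 1), (1, 1)]  # orthogonal 2^2 design
clustered = [(-1, -1), (1, -1), (-1, 1), (0, 0)]  # one corner swapped for the center

print(d_and_a_criteria(factorial))
print(d_and_a_criteria(clustered))
```

The orthogonal factorial has the larger determinant and the smaller trace, so it is preferred under both criteria for this model.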

These designs handle irregular experimental regions, missing combinations, constrained factor spaces—common in semiconductor where some combinations are physically impossible.

Space-Filling Designs:
- Latin Hypercube: Stratified sampling; each factor's range is split into n equal intervals for n runs, and each interval is sampled exactly once. Critical for computer experiments, TCAD simulation DOEs
- Maximin: Maximize minimum inter-point distance
- Uniform designs (Fang): Low-discrepancy sequences filling space evenly
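
A Latin hypercube and a crude maximin selection can be sketched with the standard library alone. The random-restart search below is a simple stand-in for the optimizers used in DOE software, and the function names, candidate count, and seed are assumptions made for illustration.

```python
import random
from itertools import combinations
from math import dist

def latin_hypercube(n_runs, n_factors, rng):
    """n_runs points in [0, 1)^n_factors with one point per stratum per factor."""
    columns = []
    for _ in range(n_factors):
        strata = list(range(n_runs))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n_runs for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_runs)]

def min_distance(points):
    """Smallest Euclidean distance between any two design points."""
    return min(dist(p, q) for p, q in combinations(points, 2))

def maximin_lhs(n_runs, n_factors, n_candidates=200, seed=0):
    """Keep the candidate Latin hypercube with the largest minimum distance."""
    rng = random.Random(seed)
    candidates = (latin_hypercube(n_runs, n_factors, rng) for _ in range(n_candidates))
    return max(candidates, key=min_distance)

design = maximin_lhs(n_runs=10, n_factors=3)
print(len(design), round(min_distance(design), 3))
```

Points come out on the unit cube; scaling each coordinate to the real factor range (for example a pressure window) is left to the caller.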

Split-Plot Designs: Used when some factors are hard to change (HTC), such as chamber temperature that requires long stabilization. Proper analysis requires mixed models accounting for whole-plot and sub-plot error structures. Critical for fab experiments where chamber conditions can't be randomized between wafers.

Mixture Designs: Used when factors are proportions summing to 100% (e.g., gas mixtures). Simplex-lattice and simplex-centroid designs account for this constraint.
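
As an illustration of the mixture constraint, a {q, m} simplex-lattice can be enumerated directly: every component takes a proportion from {0, 1/m, ..., 1} and each blend sums to one. The function name is hypothetical; exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction
from itertools import product

def simplex_lattice(q, m):
    """{q, m} simplex-lattice: proportions in {0, 1/m, ..., 1} that sum to 1."""
    return [
        tuple(Fraction(i, m) for i in combo)
        for combo in product(range(m + 1), repeat=q)
        if sum(combo) == m
    ]

# Three-component gas mixture with proportions in steps of 1/2.
blends = simplex_lattice(q=3, m=2)
for blend in blends:
    print([str(x) for x in blend])
```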

DOE Software Platforms

JMP (SAS Institute):
- Custom Design platform: Specify model, constraints, generate optimal design
- Augment Design: Add runs to existing design for more power or model terms
- Evaluate Design: Power analysis, alias structure, variance inflation
- Profiler: Interactive prediction profiler for optimization
- Functional Data Explorer: Handle spectral/profile data as responses
- Strengths: Visualization, interactivity, scripting (JSL)

Minitab:
- Traditional design catalog approach
- Excellent tutorials, widely used in six sigma
- Response optimizer for multi-response
- Limitations: Less flexible than JMP for irregular designs

Design-Expert (Stat-Ease):
- Focused exclusively on DOE
- Strong for mixture experiments
- Good split-plot support
- 3D response surfaces with hold factors

SAS/STAT:
- PROC FACTEX: Generate factorial designs
- PROC OPTEX: Optimal design generation with exchange algorithm
- PROC RSREG: Ridge analysis, canonical analysis
- PROC GLM/MIXED: Analysis with proper error structures
- Enterprise-grade, integrates with fab data systems

Python/R Open Source:
- pyDOE2, dexpy (Python)
- AlgDesign, rsm, FrF2 (R)
- Increasingly used for automation integration

Specialized Semiconductor DOE Tools

Equipment Vendor Platforms:
- Applied Materials Advisor: Equipment-specific models, process sensitivity analysis integrated with chamber data
- TEL Process Integration Platform: DOE module linked to equipment controllers
- Lam Research: Process window characterization tools

Metrology Integration:
- KLA SPECS: DOE driven by inline metrology, automatic response collection
- Yield management systems (YMS): DOE modules in PDF Solutions, Synopsys Yield Explorer

Automation Standards:
- SEMI E133: Interface specification for automated process control systems (PCS), on which recipe-parametric DOE execution can build
- Run-to-run (R2R) control: Automated DOE within control framework

TCAD Integration:
- Synopsys SiVL: Process variation analysis using TCAD, virtual DOE
- Virtual DOE: Computer experiments using process simulators before physical experiments

Statistical Analysis Methods

ANOVA Framework:
- Sum of squares decomposition: SS_total = SS_model + SS_error
- Mean square: MS = SS/df
- F-statistic: F = MS_factor/MS_error
- P-value: Probability, assuming the null hypothesis is true, of a result at least as extreme as the one observed; <0.05 is conventionally treated as significant
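
The sum-of-squares decomposition and F-statistic can be worked through for a single factor. The thickness data below are invented and the function name is illustrative; the p-value is omitted because it needs the F distribution, which the standard library does not provide.

```python
from statistics import mean

def one_way_anova(groups):
    """Sum-of-squares decomposition and F-statistic for one factor."""
    all_values = [y for g in groups for y in g]
    grand = mean(all_values)
    ss_total = sum((y - grand) ** 2 for y in all_values)
    ss_model = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_error = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_model = len(groups) - 1
    df_error = len(all_values) - len(groups)
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    return {
        "ss_total": ss_total,
        "ss_model": ss_model,
        "ss_error": ss_error,
        "f": ms_model / ms_error,
    }

# Hypothetical film thickness (nm) at three temperature levels, three wafers each.
result = one_way_anova([[10.0, 12.0, 11.0], [14.0, 15.0, 16.0], [20.0, 19.0, 21.0]])
print(result)
```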

Effect Analysis:
- Effect plot: Main effects as line plots
- Pareto chart: Effects ranked by magnitude
- Half-normal plot: Significant effects deviate from line; elegant for identifying active factors in fractional factorials

Model Diagnostics:
- Residual plots: Check normality, constant variance, independence
- Lack of fit test: Compare lack-of-fit variation to pure error from replicates; significant LOF indicates inadequate model
- R²: Proportion of variance explained
- Adjusted R²: Penalizes additional terms
- Predicted R²: Leave-one-out cross-validation; should be within 0.2 of adjusted R²
- PRESS: Predicted residual sum of squares
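
The fit statistics above can be computed by hand for a one-predictor model. This sketch refits the line with each point left out to obtain PRESS; the data and function names are made up, and real DOE models would have more terms.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def fit_statistics(xs, ys):
    """R², adjusted R², PRESS, and predicted R² for the straight-line model."""
    n, p = len(xs), 2  # p counts the intercept and the slope
    b0, b1 = fit_line(xs, ys)
    y_bar = sum(ys) / n
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    ss_error = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    press = 0.0
    for i in range(n):
        # Refit without point i, then predict the held-out point.
        b0_i, b1_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (b0_i + b1_i * xs[i])) ** 2
    return {
        "r2": 1 - ss_error / ss_total,
        "adj_r2": 1 - (ss_error / (n - p)) / (ss_total / (n - 1)),
        "press": press,
        "pred_r2": 1 - press / ss_total,
    }

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
stats = fit_statistics(xs, ys)
print(stats)
```

A large gap between adjusted and predicted R² suggests the model is fitting noise or leaning on a few influential runs.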

Regression Concerns:
- Multicollinearity: Correlated predictors inflate variance
- VIF (Variance Inflation Factor): VIF > 10 indicates serious multicollinearity
- Model hierarchy: Include lower-order terms if higher-order terms significant
- Stepwise methods: Forward selection, backward elimination, best subsets—use with caution, can overfit
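
For the special case of two predictors, the VIF reduces to 1 / (1 - r²), where r is their correlation. The sketch below uses that simplification (an assumption that does not cover models with more terms) to contrast an orthogonal design with hypothetical correlated historical data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient between two predictor columns."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF for either predictor when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Orthogonal 2^2 factorial columns: no variance inflation.
vif_designed = vif_two_predictors([-1, 1, -1, 1], [-1, -1, 1, 1])

# Hypothetical historical data where two settings drifted together.
vif_historical = vif_two_predictors([1.0, 2.0, 3.0, 4.0, 5.0], [1.1, 1.9, 3.2, 3.9, 5.1])
print(vif_designed, vif_historical)
```

This is one reason designed experiments are easier to analyze than happenstance production data.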

Optimization Strategies

Numerical Optimization: Gradient-based or evolutionary algorithms find optimal factor settings. Common choices include sequential quadratic programming and the Nelder-Mead simplex method.

Multiple Optima: Response surfaces often have multiple local optima. Use multiple starting points, grid search, or genetic algorithms.

Robust Optimization: Find settings where response is insensitive to factor variation (flat region of surface).

Confirmation Runs: Always verify predicted optimum with independent experiments. Compare predicted vs actual with prediction intervals.

Sequential Experimentation Philosophy

Traditional Two-Stage:
1. Screening: Plackett-Burman or low-resolution fractional factorial to identify vital few factors
2. Optimization: RSM design on significant factors

EVOP (Evolutionary Operation): Small factorial designs during production for continuous improvement without disrupting manufacturing. Pioneered by Box.

OFAT Inefficiency: One-factor-at-a-time misses interactions and requires more runs to achieve same precision. Still common due to simplicity but statistically inferior.

Augmentation: Add runs to existing design based on initial results. JMP Augment Design feature supports this iterative approach.

AI/ML Integration Opportunities

Bayesian Optimization: Model response surface with Gaussian Process, use acquisition function (Expected Improvement, Upper Confidence Bound) to select next experiment. Dramatically reduces experiments needed for optimization. Software: BoTorch, Ax (Facebook), GPyOpt.
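
The Expected Improvement acquisition function has a closed form once the Gaussian Process supplies a posterior mean and standard deviation at each candidate. The sketch below implements only that formula for maximization; the posterior values, recipe names, and the `xi` exploration offset are hypothetical, and fitting the GP itself is left to libraries such as those named above.

```python
from math import erf, exp, pi, sqrt

def normal_pdf(z):
    """Standard normal density."""
    return exp(-0.5 * z * z) / sqrt(2 * pi)

def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def expected_improvement(mu, sigma, best, xi=0.0):
    """EI for maximization given a posterior mean and standard deviation."""
    if sigma <= 0.0:
        return max(mu - best - xi, 0.0)
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * normal_cdf(z) + sigma * normal_pdf(z)

# Hypothetical posterior summaries at three candidate recipes: (mean, std dev).
candidates = {
    "recipe_a": (1.00, 0.05),
    "recipe_b": (0.95, 0.30),
    "recipe_c": (0.80, 0.10),
}
best_so_far = 1.00

scores = {name: expected_improvement(mu, sd, best_so_far)
          for name, (mu, sd) in candidates.items()}
next_run = max(scores, key=scores.get)
print(next_run, scores)
```

Here the uncertain candidate wins even though its predicted mean is slightly lower, which is the exploration behavior that lets sequential designs escape local optima.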

Neural Network Surrogate Models: Train neural networks on TCAD or experimental data. Use for rapid prediction, gradient-based optimization over many factors.

Active Learning: Sequentially select most informative experiments. Uncertainty sampling focuses on regions of high model uncertainty.

Transfer Learning: Use data from similar processes to initialize models, reducing experiments needed on new process.

Automated Feature Engineering: ML identifies non-obvious factor transformations or interactions.

High-Dimensional DOE: Traditional DOE struggles beyond ~15 factors. ML methods (random forests, gradient boosting) can handle 100+ factors with proper regularization.

Moon-Based Fab Considerations

Virtual DOE Dominance: With high experiment costs (wafer transport, vacuum chamber cycling), maximize virtual DOE using TCAD models calibrated to sparse physical experiments. Bayesian optimal experimental design selects highest-value physical experiments.

Reduced Noise Factors: Lunar vacuum eliminates atmospheric variation, humidity noise. Simplified crossed arrays in Taguchi designs.

Vibration Blocking: With low ambient and seismic vibration, vibration may no longer be needed as a blocking factor.

Thermal Factors: Extreme day/night temperature variation becomes critical noise factor. Robust designs must address thermal control.

Communication Latency: Sequential experimentation algorithms must run locally; 2.5s round-trip prohibits Earth-based interactive optimization. Autonomous DOE execution essential.

Resource Constraints: Consumable factors (gases, chemicals) tightly constrained. Mixture designs critical for optimizing minimal chemistry approaches.

Western Competitive Fab Considerations

Vacuum-Native Processing:
- DOE for cold welding parameters, wafer bonding without oxides
- Vacuum dielectric eliminates barrier/passivation DOEs
- Novel response variables: contact resistance, bond strength, hermeticity
- Simplified factor space if atmospheric contamination eliminated

Chiplet Integration DOE:
- Bonding temperature, pressure, time, surface preparation
- Alignment tolerance as response
- Thermal cycling as noise factor

AI-Accelerated Experimentation:
- Bayesian optimization reduces experiments 3-5x
- Autonomous lab execution with robotics
- Real-time metrology feeding adaptive DOE
- Transfer learning from TCAD simulations

Talent & Software:
- JMP, Design-Expert expertise concentrated in pharma (Boston, San Francisco)
- Semiconductor DOE expertise: Arizona, Oregon, Texas
- European statistics centers: Netherlands, Germany
- Open-source alternatives reduce software costs

Key Opportunities:
- Unified vacuum process window characterization
- DOE software integrated with autonomous fab robotics
- ML-native DOE replacing classical designs
- Process digital twins enabling massive virtual DOE

Robotics & Automation Impact

Automated DOE Execution:
- Recipe parameter changes automated
- Metrology integrated inline
- No operator variation as noise factor

High-Throughput Experimentation:
- Combinatorial approaches: 1000s of conditions on single wafer
- Spatial DOE using wafer gradients
- Automatic analysis pipeline

Adaptive Experimentation:
- Real-time response to prior results
- Bayesian optimization with continuous updates
- Sequential design modification without human delay

Documentation & Traceability:
- Complete experimental provenance
- Automatic reporting
- Regulatory compliance for aerospace/auto applications

Historical and Emerging Methods

Historical (Worth Revisiting):
- EVOP: Continuous production optimization, compatible with autonomous fabs
- Sequential simplex: Geometric optimization, minimal compute requirements
- Supersaturated designs: More factors than runs, modern sparse regression enables analysis

Emerging:
- Self-validating designs: Built-in model checking
- Causal inference DOE: Beyond correlation to mechanism understanding
- Physics-informed ML for DOE: Incorporate known physics into surrogate models
- Federated DOE: Combine data across fabs/companies without sharing raw data
- Quantum-inspired optimization: For extremely high-dimensional factor spaces

Open Research Questions:
- Optimal experiment selection with non-stationary processes
- DOE for hierarchical/multi-scale processes
- Uncertainty quantification in neural network surrogates
- DOE for processes with long memory/hysteresis