Using several recent innovations, the company Databricks will let customers boost the IQ of their AI models even if they don’t have squeaky clean data.
Databricks developed a method called "Test-time Adaptive Optimization" (TAO) that improves AI model performance by combining reinforcement learning with synthetic training data, allowing models to improve even without clean, labeled data. TAO builds on a technique called "best-of-N," in which a model generates several candidate outputs for the same prompt and the best one is kept. Databricks trains a model to predict which outputs would be preferred and uses its picks to create synthetic data for further fine-tuning. Databricks demonstrated TAO on the FinanceBench benchmark, where a tuned version of Meta's Llama 3.1 8B model surpassed OpenAI's models.
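To make the best-of-N idea concrete, here is a minimal Python sketch of how sampled outputs might be turned into synthetic fine-tuning pairs. It is an illustration only, not Databricks' implementation: the names `best_of_n`, `build_synthetic_dataset`, `toy_generate`, and `toy_reward` are hypothetical, and the toy generator and scorer merely stand in for a real language model and a trained preference model.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical type aliases for this sketch.
Generator = Callable[[str, random.Random], str]   # (prompt, rng) -> response
Reward = Callable[[str, str], float]              # (prompt, response) -> score


def best_of_n(prompt: str, generate: Generator, reward: Reward,
              n: int, rng: random.Random) -> str:
    """Sample n candidate responses and return the highest-scoring one."""
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=lambda response: reward(prompt, response))


def build_synthetic_dataset(prompts: List[str], generate: Generator,
                            reward: Reward, n: int = 8,
                            seed: int = 0) -> List[Tuple[str, str]]:
    """Pair each unlabeled prompt with its best-of-n response.

    The resulting (prompt, response) pairs could then serve as
    synthetic training data for a later fine-tuning step.
    """
    rng = random.Random(seed)
    return [(p, best_of_n(p, generate, reward, n, rng)) for p in prompts]


# Toy stand-ins (assumptions): a real system would call a language model
# and a learned preference model here.
def toy_generate(prompt: str, rng: random.Random) -> str:
    words = ["revenue", "rose", "fell", "12%", "unknown"]
    return " ".join(rng.choice(words) for _ in range(rng.randint(1, 5)))


def toy_reward(prompt: str, response: str) -> float:
    # Placeholder scorer: counts distinct words in the response.
    return float(len(set(response.split())))


dataset = build_synthetic_dataset(
    ["What happened to revenue last quarter?"], toy_generate, toy_reward, n=4
)
```

Only unlabeled prompts go in: the scorer, not a human annotator, decides which response is kept, which is why this kind of loop does not depend on clean, labeled data.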