
Hand it rows. Get back a predictor.

Estimator is a Python library that turns in-memory data tables into calibrated, deterministic, dict-in/dict-out predictors — without manual pipeline tuning, model selection, or file serialization.

Core Specs

  • ✓ Three-method API (init, evaluate, info)
  • ✓ Six model families, one wall-clock budget
  • ✓ Deterministic under fixed seed and data hash
  • ✓ In-memory, service-free library footprint

Secure and calibrated prediction for applied engineering

Estimator is an open-source, lightweight Python library for tabular regression and Bernoulli-probability calibration (currently at v0.16.0, with source code available on GitHub). It is designed to produce reliable predictors without manual model selection or hand-written preprocessing pipelines: the library profiles incoming rows, selects appropriate losses, and trains six model families under a shared wall-clock budget. It runs linear models, random forests, XGBoost, TabPFN v2, Gaussian processes, and symbolic regression in parallel, picks either the best-performing single model or an equal-weighted ensemble, and persists the result through an injected storage adapter.
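
The choice between a single model and an equal-weighted ensemble can be pictured with a small sketch. It is not the library's code: the function names, the use of RMSE as the selection metric, and the toy predictions are all assumptions made for illustration.

```python
from statistics import fmean


def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length sequences."""
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred)) ** 0.5


def pick_winner(y_val, preds_by_family):
    """Return (name, predictions) for the lowest-RMSE candidate.

    Candidates are every single family plus one equal-weighted ensemble,
    i.e. the plain average of all families' predictions.
    The metric (RMSE on held-out rows) is an assumption of this sketch.
    """
    candidates = dict(preds_by_family)
    n_models = len(preds_by_family)
    candidates["ensemble"] = [
        sum(column) / n_models for column in zip(*preds_by_family.values())
    ]
    best = min(candidates, key=lambda name: rmse(y_val, candidates[name]))
    return best, candidates[best]


# Toy held-out targets and per-family predictions (made-up numbers).
y_val = [10.0, 20.0, 30.0]
preds = {
    "linear": [12.0, 22.0, 32.0],  # biased high by 2
    "forest": [8.0, 18.0, 28.0],   # biased low by 2
}
name, winner = pick_winner(y_val, preds)
```

In this toy case the two biases cancel, so the averaged candidate scores best; with differently shaped errors a single family can win instead.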

Estimator is built for Python developers and ML practitioners who already have tabular data in memory (typically fetched from a database or upstream service). It prioritizes correctness, determinism, and calibrated uncertainty over leaderboard-winning heuristics. Because models are keyed by a fixed seed and a data fingerprint rather than by timestamps, predictions are reproducible and the reload-vs-retrain decision is handled automatically. The library runs entirely in memory, depends on no global state or service layer, and can be integrated into a production environment in minutes.
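
One way to picture fingerprint-based reload-vs-retrain is the sketch below. The source does not describe how the fingerprint is computed, so the SHA-256 over canonical JSON, the function names, and the plain dict used as a cache are assumptions, not the library's implementation.

```python
import hashlib
import json


def data_fingerprint(rows, seed, config):
    """Hash rows, seed, and config into a stable hex digest.

    sort_keys=True makes the digest independent of dict key order.
    Row order still matters in this sketch.
    """
    payload = json.dumps(
        {"rows": rows, "seed": seed, "config": config},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def reload_or_retrain(fingerprint, cache, train_fn):
    """Reuse a cached model when the fingerprint is known, otherwise train."""
    if fingerprint in cache:
        return cache[fingerprint], "reloaded"
    model = train_fn()
    cache[fingerprint] = model
    return model, "retrained"


rows = [{"x": 1.0, "price": 2.0}, {"x": 2.0, "price": 4.0}]
config = {"target_column": "price"}
fp = data_fingerprint(rows, seed=42, config=config)

cache = {}
# "toy-model" stands in for a trained model object.
first = reload_or_retrain(fp, cache, lambda: "toy-model")
second = reload_or_retrain(fp, cache, lambda: "toy-model")
```

No clock or timestamp enters the key, so the same rows, seed, and config always map to the same stored model, while changing any of the three produces a new key and forces a retrain.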

Key Principles

  • Three-Method API: The entire public surface is defined by Estimator.init, evaluate, and info. Integration is measured in minutes, not days.
  • Six Families, One Budget: Runs linear models, random forests, XGBoost, TabPFN v2, Gaussian processes, and symbolic regression behind a unified protocol, all under one shared wall-clock budget.
  • Honest Evaluation & Calibrated Intervals: Uses a leak-proof 70/15/15 split for larger datasets (n ≥ 50) and leave-one-out cross-validation (LOO-CV) for tiny tiers (4–49 rows), returning first-class conformal uncertainty intervals.
  • Determinism & Fingerprinting: The same data, seed, and config yield bit-identical outputs. Persistence is delegated through a simple three-method adapter contract (get/put/exists).
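
The evaluation tiers and interval calibration described above can be sketched as follows. This is a rough illustration under stated assumptions: the function names are hypothetical, rejecting fewer than 4 rows is a guess at behavior the source does not specify, and split conformal is assumed because the source names conformal intervals without saying which variant is used.

```python
import math
import random


def split_plan(n_rows, seed):
    """Choose an evaluation plan from the row count, using the tiers above."""
    if n_rows < 4:
        # Assumption: the source does not say what happens below 4 rows.
        raise ValueError("need at least 4 rows")
    if n_rows < 50:
        # Tiny tier: leave-one-out, each row is held out exactly once.
        return {"mode": "loo", "folds": n_rows}
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded, so the split is repeatable
    n_train = n_rows * 70 // 100
    n_val = n_rows * 15 // 100
    return {
        "mode": "holdout",
        "train": indices[:n_train],
        "val": indices[n_train:n_train + n_val],
        "test": indices[n_train + n_val:],
    }


def conformal_half_width(calibration_residuals, alpha=0.1):
    """Split-conformal half-width for a (1 - alpha) interval.

    Takes the ceil((n + 1) * (1 - alpha))-th smallest absolute residual.
    Returns infinity when there are too few calibration points.
    """
    scores = sorted(abs(r) for r in calibration_residuals)
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return scores[rank - 1]


plan = split_plan(100, seed=42)
residuals = [-1, 2, -3, 4, -5, 6, -7, 8, -9]
half_width = conformal_half_width(residuals, alpha=0.25)
```

An interval for a new row would then be the point prediction plus or minus `half_width`. The three index lists never overlap, which is the sense in which no held-out row can leak into training.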

Minimal Integration Example

The snippet below illustrates the minimal dict-in/dict-out public contract in action:

from cognitive_estimator import Estimator, Config

# 1. Initialize the estimator with an injected storage adapter
config = Config(target_column="price", wall_clock_seconds=30)
estimator = Estimator.init(
    config=config,
    seed=42,
    storage_adapter=my_custom_db_adapter
)

# 2. Train and evaluate the models on your in-memory list of dicts
metrics = estimator.evaluate(rows=[
    {"feature_1": 1.2, "feature_2": "A", "price": 100.5},
    {"feature_1": 2.4, "feature_2": "B", "price": 150.2},
    # ... more in-memory dicts
])

# 3. Get predictions with calibrated uncertainty intervals
prediction = estimator.predict(row={"feature_1": 1.8, "feature_2": "A"})
print(f"Mean prediction: {prediction.values}")
print(f"Calibrated interval: [{prediction.lower_bound}, {prediction.upper_bound}]")
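
The snippet above leaves my_custom_db_adapter undefined. A minimal stand-in that follows the get/put/exists contract might look like the sketch below; the method signatures (string keys, bytes values, None on a missing key) and the class names are assumptions, since the source names the three methods but not their types.

```python
from typing import Dict, Optional, Protocol


class StorageAdapter(Protocol):
    """Hypothetical shape of the three-method adapter contract."""

    def get(self, key: str) -> Optional[bytes]: ...

    def put(self, key: str, value: bytes) -> None: ...

    def exists(self, key: str) -> bool: ...


class InMemoryAdapter:
    """Dict-backed adapter, handy for tests and local experiments."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        # Returns None when the key has never been stored.
        return self._blobs.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._blobs[key] = value

    def exists(self, key: str) -> bool:
        return key in self._blobs


adapter: StorageAdapter = InMemoryAdapter()
adapter.put("model:abc123", b"serialized-model-bytes")
```

A database-backed adapter would keep the same three methods and swap the dict for queries against a table or key-value store.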