Software
Estimator is a Python library that turns in-memory data tables into calibrated, deterministic, dict-in/dict-out predictors — without manual pipeline tuning, model selection, or file serialization.
Core Specs
Estimator is an open-source, lightweight Python library for tabular regression and Bernoulli-probability calibration (currently at v0.16.0, with source code available on GitHub), designed to streamline the creation of reliable predictors. Instead of requiring the user to select model parameters by hand or write a preprocessing pipeline, the library automatically profiles incoming rows, selects appropriate losses, and trains six distinct model families under a shared wall-clock budget. By running linear models, random forests, XGBoost, TabPFN v2, Gaussian processes, and symbolic regression in parallel, it identifies either the best-performing single model or an equal-weighted ensemble, and persists the result through an injected storage adapter.
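The choice between the best single model and an equal-weighted ensemble can be sketched as follows. This is a minimal illustration and not the library's internal code: the function names (`mse`, `pick_best`), the use of mean squared error as the score, and the model labels are all assumptions made for the example.

```python
from statistics import fmean


def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def pick_best(y_true, preds_by_model):
    """Return (name, score) for the winner on held-out predictions.

    preds_by_model maps a hypothetical model name to its predictions.
    The ensemble is the plain per-row mean across all models.
    """
    scores = {name: mse(y_true, preds) for name, preds in preds_by_model.items()}
    # Sort names first so ties are broken deterministically.
    best_name = min(sorted(scores), key=scores.get)

    # Equal weights: average the models' predictions row by row.
    ensemble = [fmean(column) for column in zip(*preds_by_model.values())]
    ensemble_score = mse(y_true, ensemble)

    if ensemble_score < scores[best_name]:
        return "ensemble", ensemble_score
    return best_name, scores[best_name]


y_true = [1.0, 2.0, 3.0]
winner, score = pick_best(
    y_true,
    {"linear": [1.5, 2.5, 3.5], "forest": [0.5, 1.5, 2.5]},
)
print(winner, score)
```

Here the two models err in opposite directions, so their average beats either one alone. When one model is already accurate, averaging it with a weaker model makes the result worse and the single model is kept.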
Designed for Python developers and ML practitioners who already have tabular data in memory (typically fetched from a database or upstream service), Estimator prioritizes correctness, determinism, and calibrated uncertainty over leaderboard-winning heuristics. Because it keys models on a fixed seed and a data fingerprint rather than on timestamps, predictions are reproducible and reload-vs-retrain decisions are handled automatically. The library runs entirely in memory, depends on no global state or service layers, and can be integrated into a production environment in minutes.
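One way to picture the seed-plus-fingerprint idea is to hash the training rows together with the seed and use the digest as the model's key. The sketch below is an assumption about how such a scheme could work, not Estimator's actual fingerprinting: the helper names (`fingerprint`, `reload_or_retrain`), the choice of SHA-256 over canonical JSON, and the `stored_keys` set are all hypothetical.

```python
import hashlib
import json


def fingerprint(rows, seed):
    """Deterministic digest of the training rows and the seed.

    Keys are sorted so that dict key order does not change the digest.
    Row order is preserved, so reordering rows yields a different key.
    """
    payload = json.dumps(
        {"seed": seed, "rows": rows},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def reload_or_retrain(rows, seed, stored_keys):
    """Decide whether a model for exactly this data and seed already exists."""
    key = fingerprint(rows, seed)
    action = "reload" if key in stored_keys else "retrain"
    return action, key


rows = [{"feature_1": 1.2, "feature_2": "A", "price": 100.5}]
action, key = reload_or_retrain(rows, seed=42, stored_keys=set())
print(action)  # nothing stored yet, so a fresh fit is needed

action_again, _ = reload_or_retrain(rows, seed=42, stored_keys={key})
print(action_again)  # same data and seed map to the same key
```

Because no timestamp enters the key, running the same call twice on the same rows with the same seed always resolves to the same stored artifact.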
The public API centers on Estimator.init, evaluate, and info. Integration is measured in minutes, not days.
Storage is supplied through an injected adapter (get/put/exists).
The snippet below illustrates the minimal dict-in/dict-out public contract in action:
from cognitive_estimator import Estimator, Config
# 1. Initialize the estimator with an injected storage adapter
config = Config(target_column="price", wall_clock_seconds=30)
# my_custom_db_adapter is supplied by the caller; it is not defined in this snippet
estimator = Estimator.init(
    config=config,
    seed=42,
    storage_adapter=my_custom_db_adapter,
)
# 2. Train and evaluate the models on your in-memory list of dicts
metrics = estimator.evaluate(rows=[
    {"feature_1": 1.2, "feature_2": "A", "price": 100.5},
    {"feature_1": 2.4, "feature_2": "B", "price": 150.2},
    # ... more in-memory dicts
])
# 3. Get predictions with calibrated uncertainty intervals
prediction = estimator.predict(row={"feature_1": 1.8, "feature_2": "A"})
print(f"Mean prediction: {prediction.values}")
print(f"Calibrated interval: [{prediction.lower_bound}, {prediction.upper_bound}]")
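The snippet above passes in a storage adapter without showing one. A minimal in-memory stand-in might look like the sketch below. Only the three method names (get/put/exists) come from the description above. The signatures, the use of string keys and bytes values, and the class name `InMemoryAdapter` are assumptions, so check the library's documentation for the real interface before relying on this.

```python
class InMemoryAdapter:
    """Hypothetical dict-backed adapter exposing get/put/exists.

    Signatures are assumed for illustration; the real adapter
    contract may differ.
    """

    def __init__(self):
        self._blobs = {}

    def exists(self, key: str) -> bool:
        # True if something was stored under this key.
        return key in self._blobs

    def put(self, key: str, value: bytes) -> None:
        # Store a copy so later mutation by the caller cannot leak in.
        self._blobs[key] = bytes(value)

    def get(self, key: str) -> bytes:
        # Raises KeyError for unknown keys; callers can check exists() first.
        return self._blobs[key]


adapter = InMemoryAdapter()
adapter.put("model:abc123", b"serialized-model-bytes")
print(adapter.exists("model:abc123"))
print(adapter.get("model:abc123"))
```

An adapter backed by a database or object store would keep the same three methods and swap the dict for the corresponding client calls.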