Reproducibility and Seeds
SAES supports deterministic behavior for reproducible research through random seed control.
Why Reproducibility Matters
When analyzing stochastic algorithms, reproducibility is crucial for:
Research validation: Others can verify your results
Debugging: Consistent results make it easier to identify issues
Comparisons: Fair comparison requires consistent conditions
Publication: Many journals and conferences require reproducible results
Functions with Random Seeds
The following SAES functions support deterministic execution via the seed parameter:
Bayesian Statistical Tests
Both Bayesian tests support the seed parameter for reproducibility:
from SAES.statistical_tests.bayesian import bayesian_sign_test, bayesian_signed_rank_test
import pandas as pd
data = pd.DataFrame({
'Algorithm_A': [0.9, 0.85, 0.95, 0.9, 0.92],
'Algorithm_B': [0.5, 0.6, 0.55, 0.58, 0.52]
})
# Deterministic results with seed
result1, _ = bayesian_sign_test(data, sample_size=5000, seed=42)
result2, _ = bayesian_sign_test(data, sample_size=5000, seed=42)
# result1 and result2 will be identical
# Same for signed rank test
result3, _ = bayesian_signed_rank_test(data, sample_size=1000, seed=123)
Histogram Plots
The HistoPlot class supports seeding for consistent jitter when handling identical values:
from SAES.plots.histoplot import HistoPlot
import pandas as pd
data = pd.read_csv("results.csv")
metrics = pd.read_csv("metrics.csv")
# Create histoplot with reproducible jitter
histoplot = HistoPlot(data, metrics, "Accuracy", seed=42)
histoplot.save_instance("Problem1", "output.png")
Best Practices
Always use seeds for published research: Set explicit seeds for all random operations
Document your seeds: Include seed values in your research papers and code
Use different seeds for different experiments: Avoid accidentally reusing the same random sequence
Version control: Include seed values in your version-controlled analysis scripts
Example: Complete Reproducible Workflow
from SAES.statistical_tests.bayesian import bayesian_sign_test, bayesian_signed_rank_test
from SAES.plots.histoplot import HistoPlot
import pandas as pd
# Load data
data = pd.read_csv("algorithm_results.csv")
metrics = pd.read_csv("metrics.csv")
# Reproducible Bayesian analysis
SEED = 42
algorithm_a = data[data['Algorithm'] == 'A']['MetricValue']
algorithm_b = data[data['Algorithm'] == 'B']['MetricValue']
comparison_data = pd.DataFrame({
'Algorithm_A': algorithm_a.values,
'Algorithm_B': algorithm_b.values
})
# Run Bayesian test with seed
result, samples = bayesian_sign_test(
comparison_data,
sample_size=5000,
seed=SEED
)
print(f"P(A < B): {result[0]:.4f}")
print(f"P(A ≈ B): {result[1]:.4f}")
print(f"P(A > B): {result[2]:.4f}")
# Create reproducible visualization
histoplot = HistoPlot(data, metrics, "Accuracy", seed=SEED)
histoplot.save_all_instances("comparison.png")
Headless Mode for Automated Workflows
SAES can be run in headless mode (without display) for automated pipelines and CI/CD:
# Set matplotlib to use non-interactive backend
export MPLBACKEND=Agg
# Run SAES commands
python -m SAES -ls -ds data.csv -ms metrics.csv -m HV -s friedman -op results.tex
python -m SAES -bp -ds data.csv -ms metrics.csv -m HV -i Problem1 -op boxplot.png
python -m SAES -cdp -ds data.csv -ms metrics.csv -m HV -op cdplot.png
For Python scripts in headless environments:
import matplotlib
matplotlib.use('Agg') # Must be called before importing pyplot
from SAES.plots.boxplot import Boxplot
import pandas as pd
# Your analysis code here
data = pd.read_csv("results.csv")
metrics = pd.read_csv("metrics.csv")
boxplot = Boxplot(data, metrics, "Accuracy")
boxplot.save_instance("Problem1", "output.png")