Reproducibility and Seeds
=========================

SAES supports deterministic behavior for reproducible research through random seed control.

Why Reproducibility Matters
---------------------------

When analyzing stochastic algorithms, reproducibility is crucial for:

- **Research validation**: Others can verify your results
- **Debugging**: Consistent results make it easier to identify issues
- **Comparisons**: Fair comparison requires consistent conditions
- **Publication**: Many journals and conferences require reproducible results

Functions with Random Seeds
---------------------------

The following SAES functions support deterministic execution via the ``seed`` parameter:

Bayesian Statistical Tests
~~~~~~~~~~~~~~~~~~~~~~~~~~

Both Bayesian tests support the ``seed`` parameter for reproducibility:

.. code-block:: python

    from SAES.statistical_tests.bayesian import bayesian_sign_test, bayesian_signed_rank_test
    import pandas as pd

    data = pd.DataFrame({
        'Algorithm_A': [0.9, 0.85, 0.95, 0.9, 0.92],
        'Algorithm_B': [0.5, 0.6, 0.55, 0.58, 0.52]
    })

    # Deterministic results with seed
    result1, _ = bayesian_sign_test(data, sample_size=5000, seed=42)
    result2, _ = bayesian_sign_test(data, sample_size=5000, seed=42)
    # result1 and result2 will be identical

    # Same for signed rank test
    result3, _ = bayesian_signed_rank_test(data, sample_size=1000, seed=123)

Histogram Plots
~~~~~~~~~~~~~~

The HistoPlot class supports seeding for consistent jitter when handling identical values:

.. code-block:: python

    from SAES.plots.histoplot import HistoPlot
    import pandas as pd

    data = pd.read_csv("results.csv")
    metrics = pd.read_csv("metrics.csv")

    # Create histoplot with reproducible jitter
    histoplot = HistoPlot(data, metrics, "Accuracy", seed=42)
    histoplot.save_instance("Problem1", "output.png")

Best Practices
-------------

1. **Always use seeds for published research**: Set explicit seeds for all random operations
2. **Document your seeds**: Include seed values in your research papers and code
3. **Use different seeds for different experiments**: Avoid accidentally reusing the same random sequence
4. **Version control**: Include seed values in your version-controlled analysis scripts

Example: Complete Reproducible Workflow
---------------------------------------

.. code-block:: python

    from SAES.statistical_tests.bayesian import bayesian_sign_test, bayesian_signed_rank_test
    from SAES.plots.histoplot import HistoPlot
    import pandas as pd

    # Load data
    data = pd.read_csv("algorithm_results.csv")
    metrics = pd.read_csv("metrics.csv")

    # Reproducible Bayesian analysis
    SEED = 42
    algorithm_a = data[data['Algorithm'] == 'A']['MetricValue']
    algorithm_b = data[data['Algorithm'] == 'B']['MetricValue']
    
    comparison_data = pd.DataFrame({
        'Algorithm_A': algorithm_a.values,
        'Algorithm_B': algorithm_b.values
    })

    # Run Bayesian test with seed
    result, samples = bayesian_sign_test(
        comparison_data, 
        sample_size=5000, 
        seed=SEED
    )

    print(f"P(A < B): {result[0]:.4f}")
    print(f"P(A ≈ B): {result[1]:.4f}")
    print(f"P(A > B): {result[2]:.4f}")

    # Create reproducible visualization
    histoplot = HistoPlot(data, metrics, "Accuracy", seed=SEED)
    histoplot.save_all_instances("comparison.png")

Headless Mode for Automated Workflows
-------------------------------------

SAES can be run in headless mode (without display) for automated pipelines and CI/CD:

.. code-block:: bash

    # Set matplotlib to use non-interactive backend
    export MPLBACKEND=Agg

    # Run SAES commands
    python -m SAES -ls -ds data.csv -ms metrics.csv -m HV -s friedman -op results.tex
    python -m SAES -bp -ds data.csv -ms metrics.csv -m HV -i Problem1 -op boxplot.png
    python -m SAES -cdp -ds data.csv -ms metrics.csv -m HV -op cdplot.png

For Python scripts in headless environments:

.. code-block:: python

    import matplotlib
    matplotlib.use('Agg')  # Must be called before importing pyplot
    
    from SAES.plots.boxplot import Boxplot
    import pandas as pd

    # Your analysis code here
    data = pd.read_csv("results.csv")
    metrics = pd.read_csv("metrics.csv")
    
    boxplot = Boxplot(data, metrics, "Accuracy")
    boxplot.save_instance("Problem1", "output.png")