API Documentation

Common Data Types

Sorting

All Designs allow you to control the sort order of values.

One of the options is CUSTOM sorting. This is entirely optional but can be very useful. For example, if you have the following values in the Age Group variable:

'<20', '20-39', ... '80+'

you don't want the default alphabetical sorting by value. Otherwise '<20' appears at the end.
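
The effect can be reproduced in plain Python. This is a minimal sketch rather than sofastats code, and the intermediate age-group labels are assumed for illustration because the list above elides them.

```python
# Hypothetical age-group labels; the middle values are assumed for illustration
age_groups = ['<20', '20-39', '40-59', '60-79', '80+']

# Default alphabetical sorting compares characters, and '<' sorts after the digits
alphabetical = sorted(age_groups)
print(alphabetical)

# A custom order is simply the desired sequence; sort by each value's position in it
custom_order = ['<20', '20-39', '40-59', '60-79', '80+']
position = {value: idx for idx, value in enumerate(custom_order)}
shuffled = ['60-79', '<20', '80+', '20-39', '40-59']
by_custom_order = sorted(shuffled, key=position.__getitem__)
print(by_custom_order)
```

Looking up each value's position raises a KeyError for any value missing from the custom order, which mirrors the requirement that every value must appear in a custom sort order.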

If you want to supply a CUSTOM sort order, all Design objects have sort_orders and sort_orders_yaml_file_path settings. See CommonDesign.

`sofastats.conf.main.SortOrder`

Bases: StrEnum

Sort orders to apply. Note: INCREASING and DECREASING only apply to sorting at the final values level. For example, if the variables are nested as 'Age Group' > 'Handedness' > 'Home Location Type', then only 'Home Location Type' can potentially be sorted by frequency.

Source code in src/sofastats/conf/main.py

class SortOrder(StrEnum):
    """
    Sort orders to apply.
    Note - INCREASING & DECREASING only apply to sorting at the final values level.
    E.g. If 'Age Group' > 'Handedness' > 'Home Location Type' then only 'Home Location Type'
    can potentially have sort order by frequency
    """
    CUSTOM = 'by custom order'
    "By custom order configured in YAML or dictionary for relevant variable"
    DECREASING = 'by decreasing frequency'
    "By decreasing frequency"
    INCREASING = 'by increasing frequency'
    "By increasing frequency"
    VALUE = 'by value'
    "By value alphabetically sorted"

`CUSTOM = 'by custom order'` `class-attribute` `instance-attribute`

By custom order configured in YAML or dictionary for relevant variable

`DECREASING = 'by decreasing frequency'` `class-attribute` `instance-attribute`

By decreasing frequency

`INCREASING = 'by increasing frequency'` `class-attribute` `instance-attribute`

By increasing frequency

`VALUE = 'by value'` `class-attribute` `instance-attribute`

By value alphabetically sorted
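
As a rough illustration of what the frequency-based and value-based orders mean, the sketch below sorts the distinct values of one variable using only the standard library. The data is hypothetical, and how sofastats itself computes these orders (including how it breaks ties) is not shown on this page.

```python
from collections import Counter

# Hypothetical observed values for a final-level variable
home_location_types = ['City', 'Rural', 'City', 'Town', 'City', 'Town']

# Count how often each value occurs: City 3, Town 2, Rural 1
freqs = Counter(home_location_types)

by_value = sorted(freqs)                                 # alphabetical by value
by_increasing = sorted(freqs, key=freqs.get)             # least frequent first
by_decreasing = sorted(freqs, key=freqs.get, reverse=True)  # most frequent first

print(by_value)
print(by_increasing)
print(by_decreasing)
```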

Common Parameters

The parameters in CommonDesign are common to all¹ output design dataclasses:

`sofastats.output.interfaces.CommonDesign` `dataclass`

Bases: ABC

Output dataclasses (e.g. ClusteredBoxplotChartDesign) inherit from CommonDesign. A dataclass can't have attributes with defaults in CommonDesign (which go first) followed by attributes without defaults in the output dataclasses. Therefore, we are required to supply defaults for everything in the output dataclasses, including mandatory fields. So how do we ensure those mandatory field arguments are supplied? We use a decorator (add_post_init_enforcing_mandatory_cols) to add a `__post_init__` handler which runs `CommonDesign.__post_init__` and then enforces the supply of values for every attribute which has DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY.
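
The pattern described above can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the sofastats implementation: `BaseDesign`, `ExampleDesign`, `enforce_mandatory`, and the sentinel's value are all hypothetical stand-ins.

```python
from dataclasses import dataclass, fields

# Sentinel name taken from the description above; its value here is an assumption
DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY = '__mandatory__'


@dataclass
class BaseDesign:
    """Hypothetical stand-in for CommonDesign: every attribute has a default."""
    style_name: str = 'default'

    def __post_init__(self):
        self.base_checked = True  # placeholder for the base class's own validation


def enforce_mandatory(cls):
    """Simplified, hypothetical version of add_post_init_enforcing_mandatory_cols."""
    def __post_init__(self):
        BaseDesign.__post_init__(self)  # run the shared checks first
        for field in fields(self):
            if getattr(self, field.name) is DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY:
                raise ValueError(f"You need to supply a value for {field.name}")
    cls.__post_init__ = __post_init__
    return cls


@enforce_mandatory
@dataclass
class ExampleDesign(BaseDesign):
    # Has a default only to satisfy dataclass field ordering; must still be supplied
    field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY


design = ExampleDesign(field_name='Age')
print(design.field_name)

try:
    ExampleDesign()  # mandatory field left at its sentinel default
except ValueError as e:
    print(e)
```

Because the generated `__init__` looks up `__post_init__` at call time, replacing the method after the dataclass has been created is enough for the check to run on every instantiation.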

Parameters:

csv_file_path (Path | str | None, default: None ) –

full file path to CSV file (if using CSV as source)
csv_separator (str, default: ',' ) –

CSV separator (if using CSV as source)
cur (Any | None, default: None ) –

DB-API 2 cursor i.e. an object able to run cur.execute(), cur.fetchall() etc. (if using a cursor as source)
database_engine_name (DbeName | str | None, default: None ) –

e.g. DbeName.SQLITE or 'sqlite' (if using a cursor as the source)
source_table_name (str | None, default: None ) –

source table name (if using the cursor as a source OR using the internal SOFA SQLite database)
table_filter_sql (str | None, default: None ) –

valid SQL to filter the source table as supplied in source_table_name - must be in the appropriate SQL dialect and entities should be quoted appropriately as needed e.g. SQLite requires backticks for field names with spaces such as `Age Group`. You cannot filter a CSV. CSVs must already be in the form required for analysis.
style_name (str, default: 'default' ) –

e.g. 'default'. Either one of the built-in styles under sofastats.output.styles or a custom style defined by YAML in the custom_styles subfolder of the sofastats local folder e.g. ~/Documents/sofastats/custom_styles
output_file_path (Path | str | None, default: None ) –

full path to the HTML file where the output will be generated. If not supplied, a timestamped file name is generated in the current working directory.
output_title (str | None, default: None ) –

the title the HTML output will display in a web browser
show_in_web_browser (bool, default: True ) –

if True will open a tab in your default browser to display the output file generated
sort_orders (SortOrderSpecs | None, default: None ) –

if supplied, a dictionary that provides the sort orders for any variables given a custom sort order (SortOrder.CUSTOM). Multiple sort orders can be defined, with each variable given a custom sort order being a key in the dictionary. Example:
```
{
    'Age Group': [
        '<20',
        '20 to <30', '30 to <40', '40 to <50',
        '50 to <60', '60 to <70', '70 to <80',
        '80+',
    ]
}
```
If the sort order applied was SortOrder.VALUE, we would see '<20' appearing as the last value by alphabetical order. If a custom order is defined, every value must appear in the list defining the desired sequence. Don't supply both sort_orders and sort_orders_yaml_file_path.
sort_orders_yaml_file_path (Path | str | None, default: None ) –

file path containing YAML defining custom sort orders. See structure and effect as discussed under sort_orders. Don't supply both sort_orders and sort_orders_yaml_file_path.
decimal_points (int, default: 3 ) –

defines the maximum number of decimal points displayed. If set to 3, for example, 1.23456789 will be displayed as 1.235. 1.320000000 will be displayed as 1.32, and 1.60000000 as 1.6.
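
The decimal_points behaviour described above (a maximum, with trailing zeros dropped) can be sketched as follows. `format_number` is a hypothetical helper written for illustration, not a sofastats function, and its treatment of whole numbers is an assumption.

```python
def format_number(value: float, decimal_points: int = 3) -> str:
    """Round to at most decimal_points places, then drop trailing zeros."""
    text = f"{value:.{decimal_points}f}"
    if '.' in text:
        # '1.320' -> '1.32'; a bare trailing '.' is also removed (assumed behaviour)
        text = text.rstrip('0').rstrip('.')
    return text


print(format_number(1.23456789))
print(format_number(1.320000000))
print(format_number(1.60000000))
```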

Source code in src/sofastats/output/interfaces.py

@dataclass(frozen=False)
class CommonDesign(ABC):
    r"""
    Output dataclasses (e.g. ClusteredBoxplotChartDesign) inherit from CommonDesign.
    Can't have defaults in CommonDesign attributes (which go first) and then missing defaults for the output dataclasses.
    Therefore, we are required to supply defaults for everything in the output dataclasses.
    That includes mandatory fields.
    So how do we ensure those mandatory field arguments are supplied?
    We use a decorator (add_post_init_enforcing_mandatory_cols) to add a __post_init__ handler
    which runs CommonDesign.__post_init__ and then enforces the supply of values for every attribute
    which has DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY.

    Args:
        csv_file_path: full file path to CSV file (if using CSV as source)
        csv_separator: CSV separator (if using CSV as source)
        cur: dpapi2 cursor i.e. an object able to run cur.execute, `cur.fetchall()` etc. (if using a cursor as source)
        database_engine_name: e.g. `DbeName.SQLITE` or 'sqlite' (if using a cursor as the source)
        source_table_name: source table name (if using the cursor as a source OR using the internal SOFA SQLite database)
        table_filter_sql: valid SQL to filter the source table as supplied in source_table_name -
            must be in the appropriate SQL dialect and entities should be quoted appropriately as needed
            e.g. SQLite requires backticks for field names with spaces such as \`Age Group\`.
            You cannot filter a CSV. CSVs must already be in the form required for analysis.
        style_name: e.g. 'default'. Either one of the built-in styles under `sofastats.output.styles`
            or a custom style defined by YAML in the custom_styles subfolder of the sofastats local folder
            e.g. `~/Documents/sofastats/custom_styles`
        output_file_path: full path to folder where output HTML will be generated.
        output_title: the title the HTML output will display in a web browser
        show_in_web_browser: if `True` will open a tab in your default browser to display the output file generated
        sort_orders: if supplied, a dictionary that provides the sort orders for any variables given a custom sort order
            (`SortOrder.CUSTOM`). Multiple sort orders can be defined - with each variable given a custom sort order
            being a key in the dictionary. Example:

            ```python
            {
                'Age Group': [
                    '<20',
                    '20 to <30', '30 to <40', '40 to <50',
                    '50 to <60', '60 to <70', '70 to <80',
                    '80+',
                ]
            }
            ```

            If the sort order applied was `SortOrder.VALUE`, we would see '<20' appearing as the last value
            by alphabetical order. If a custom order is defined, every value must appear in the list defining the
            desired sequence.
            Don't supply both `sort_orders` and `sort_orders_yaml_file_path`.
        sort_orders_yaml_file_path: file path containing YAML defining custom sort orders. See structure and effect as
            discussed under `sort_orders`. Don't supply both `sort_orders` and `sort_orders_yaml_file_path`.
        decimal_points: defines the maximum number of decimal points displayed.
            If set to 3, for example, 1.23456789 will be displayed as 1.235. 1.320000000 will be displayed as 1.32, and
            1.60000000 as 1.6.
    """
    ## inputs ***********************************
    csv_file_path: Path | str | None = None
    csv_separator: str = ','
    cur: Any | None = None
    database_engine_name: DbeName | str | None = None
    source_table_name: str | None = None
    table_filter_sql: str | None = None
    ## outputs **********************************
    style_name: str = 'default'
    output_file_path: Path | str | None = None
    output_title: str | None = None
    show_in_web_browser: bool = True
    sort_orders: SortOrderSpecs | None = None
    sort_orders_yaml_file_path: Path | str | None = None
    decimal_points: int = 3

    @abstractmethod
    def to_html_design(self) -> HTMLItemSpec:
        """
        From the design produce the HTML to display as one of the attributes of the HTMLItemSpec.
        Also return the style name and output item type e.g. whether a chart, table, or statistical output
        """
        pass

    def _handle_inputs(self):
        """
        There are three main paths for specific data values to be supplied to the design:

        1. CSV - data will be ingested into internal sofastats SQLite database
        (`source_table_name` optional - later analyses might be referring to that ingested table
        so nice to let user choose the name)
        2. `cur`, `database_engine_name`, and `source_table_name`
        3. or just a `source_table_name` (assumed to be using internal sofastats SQLite database)

        Any supplied cursors are "wrapped" inside an `ExtendedCursor` so we can use `.exe()` instead of `.execute()`
        and to provide better error messages on query failure.

        Client code supplies `database_engine_name` rather than dbe_spec for simplicity but internally
        `CommonDesign` supplies all code that inherits from it a `dbe_spec` attribute ready to use.

        Settings are validated e.g. to prevent client code supplying both CSV settings and database settings.
        """
        if self.csv_file_path:
            if self.cur or self.database_engine_name or self.source_table_name or self.table_filter_sql:
                raise Exception("If supplying a CSV path don't also supply database requirements")
            if not self.csv_separator:
                self.csv_separator = ','
            if not SQLITE_DB.get('sqlite_default_cur'):
                SQLITE_DB['sqlite_default_con'] = sqlite.connect(INTERNAL_DATABASE_FPATH)
                SQLITE_DB['sqlite_default_cur'] = ExtendedCursor(SQLITE_DB['sqlite_default_con'].cursor())
            self.cur = SQLITE_DB['sqlite_default_cur']
            self.dbe_spec = get_dbe_spec(DbeName.SQLITE)
            if not self.source_table_name:
                self.source_table_name = get_safer_name(Path(self.csv_file_path).stem)
            ## ingest CSV into database
            df = pd.read_csv(self.csv_file_path, sep=self.csv_separator)
            try:
                df.to_sql(self.source_table_name, SQLITE_DB['sqlite_default_con'], if_exists='replace', index=False)
            except Exception as e:  ## TODO: supply more specific exception
                logger.info(f"Failed at attempt to ingest CSV from '{self.csv_file_path}' "
                    f"into internal pysofa SQLite database as table '{self.source_table_name}'.\nError: {e}")
            else:
                logger.info(f"Successfully ingested CSV from '{self.csv_file_path}' "
                    f"into internal pysofa SQLite database as table '{self.source_table_name}'")
        elif self.cur:
            self.cur = ExtendedCursor(self.cur)
            if not self.database_engine_name:
                supported_names = '"' + '", "'.join(name.value for name in DbeName) + '"'
                raise Exception("When supplying a cursor, a database_engine_name must also be supplied. "
                    f"Supported names currently are: {supported_names}")
            else:
                self.dbe_spec = get_dbe_spec(self.database_engine_name)
            if not self.source_table_name:
                raise Exception("When supplying a cursor, a source_table_name must also be supplied")
        elif self.source_table_name:
            if not SQLITE_DB.get('sqlite_default_cur'):
                SQLITE_DB['sqlite_default_con'] = sqlite.connect(INTERNAL_DATABASE_FPATH)
                SQLITE_DB['sqlite_default_cur'] = ExtendedCursor(SQLITE_DB['sqlite_default_con'].cursor())
            self.cur = SQLITE_DB['sqlite_default_cur']  ## not already set if in the third path - will have gone down first
            if self.database_engine_name and self.database_engine_name != DbeName.SQLITE:
                raise Exception("If not supplying a csv_file_path, or a cursor, the only permitted database engine is "
                    "SQLite (the dbe of the internal sofastats SQLite database)")
            self.dbe_spec = get_dbe_spec(DbeName.SQLITE)
        else:
            raise Exception("Either supply a path to a CSV "
                "(optional tbl_name for when ingested into internal sofastats SQLite database), "
                "a cursor (with dbe_name and tbl_name), "
                "or a tbl_name (data assumed to be in internal sofastats SQLite database)")

    def _handle_outputs(self):
        """
        Validate configuration and provide sane defaults for `output_title` and `output_file_path` if nothing set.
        """
        ## output file path and title
        nice_name = '_'.join(self.__module__.split('.')[-2:]) + f"_{self.__class__.__name__}"
        if not self.output_file_path:
            now = datetime.datetime.now().strftime('%Y_%m_%d_%H_%M_%S')
            self.output_file_path = Path.cwd() / f"{nice_name}_{now}.html"
        if not self.output_title:
            self.output_title = f"{nice_name} Output"
        ## sort orders
        if self.sort_orders:
            if self.sort_orders_yaml_file_path:
                raise Exception("Oops - it looks like you supplied settings for both sort_orders "
                    "and sort_orders_yaml_file_path. Please set one or both of them to None.")
            else:
                pass
        elif self.sort_orders_yaml_file_path:
            yaml = YAML(typ='safe')  ## default, if not specified, is 'rt' (round-trip)
            self.sort_orders = yaml.load(Path(self.sort_orders_yaml_file_path))  ## might be a str or Path so make sure
        else:
            self.sort_orders = {}

    def __post_init__(self):
        self._handle_inputs()
        self._handle_outputs()
        for field in fields(self):
            if self.__getattribute__(field.name) == DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY:
                ## raise a friendly error for when they didn't supply a mandatory field that technically had a default (DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY), but we want to insist they supply a real value
                client_module = self.__module__.split('.')[-1]
                nice_name = f"{client_module}.{self.__class__.__name__}"  ## e.g. anova.AnovaDesign
                raise Exception(f"Oops - you need to supply a value for {field.name} in your {nice_name}")

    def __repr_html__(self):
        return self.__str__()

    def make_output(self):
        """
        Produce HTML output, e.g. charts and numerical results, save to `output_file_path`,
        and open in web browser if `show_in_web_browser=True`.
        """
        self.to_html_design().to_file(fpath=self.output_file_path)
        if self.show_in_web_browser:
            open_new_tab(url=f"file://{self.output_file_path}")
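
To make the cursor-based input path more concrete, the sketch below builds the kind of objects that `cur`, `source_table_name`, and `table_filter_sql` describe, using only the standard library's sqlite3 module. The table and its data are hypothetical, and the way the filter is spliced into a query here is an assumption for illustration; it does not show how sofastats builds its own queries.

```python
import sqlite3

# An in-memory database standing in for the user's own data source
con = sqlite3.connect(':memory:')
cur = con.cursor()  # a DB-API 2 cursor: supports execute(), fetchall() etc.

# SQLite accepts backticks around a field name containing a space
cur.execute("CREATE TABLE demo (`Age Group` TEXT, Height REAL)")
cur.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [('<20', 1.6), ('20-39', 1.7), ('<20', 1.5)],
)

# Values of the kind a design would receive (hypothetical)
source_table_name = 'demo'
table_filter_sql = "`Age Group` = '<20'"

# Assumed usage: the filter acts as a WHERE clause on the source table
cur.execute(f"SELECT * FROM {source_table_name} WHERE {table_filter_sql}")
rows = cur.fetchall()
print(rows)
con.close()
```

When a cursor like this is supplied, the validation in `_handle_inputs` shown above also requires a `database_engine_name` (e.g. 'sqlite') and a `source_table_name`.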

`to_html_design() -> HTMLItemSpec` `abstractmethod`

From the design produce the HTML to display as one of the attributes of the HTMLItemSpec. Also return the style name and output item type e.g. whether a chart, table, or statistical output

Source code in src/sofastats/output/interfaces.py

@abstractmethod
def to_html_design(self) -> HTMLItemSpec:
    """
    From the design produce the HTML to display as one of the attributes of the HTMLItemSpec.
    Also return the style name and output item type e.g. whether a chart, table, or statistical output
    """
    pass

`make_output()`

Produce HTML output, e.g. charts and numerical results, save to output_file_path, and open in web browser if show_in_web_browser=True.

Source code in src/sofastats/output/interfaces.py

def make_output(self):
    """
    Produce HTML output, e.g. charts and numerical results, save to `output_file_path`,
    and open in web browser if `show_in_web_browser=True`.
    """
    self.to_html_design().to_file(fpath=self.output_file_path)
    if self.show_in_web_browser:
        open_new_tab(url=f"file://{self.output_file_path}")
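
The division of labour between `to_html_design()` and `make_output()` follows a familiar shape: subclasses produce the HTML and the base class saves it. The sketch below shows that shape with hypothetical classes (`SketchDesign`, `HelloDesign`) and a plain string in place of HTMLItemSpec. It returns a file URL instead of opening a browser, and it uses `Path.as_uri()` as one way to build that URL, which is an assumption rather than what the source above does.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class SketchDesign(ABC):
    """Hypothetical, much-simplified stand-in for CommonDesign."""

    def __init__(self, output_file_path: Path):
        self.output_file_path = Path(output_file_path)

    @abstractmethod
    def to_html(self) -> str:
        """Subclasses supply the HTML; the base class handles saving."""

    def make_output(self) -> str:
        self.output_file_path.write_text(self.to_html(), encoding='utf-8')
        # as_uri() needs an absolute path and percent-encodes characters such as spaces
        return self.output_file_path.resolve().as_uri()


class HelloDesign(SketchDesign):
    def to_html(self) -> str:
        return "<html><body><h1>Hello</h1></body></html>"


with tempfile.TemporaryDirectory() as tmp_dir:
    output_path = Path(tmp_dir) / 'hello output.html'
    url = HelloDesign(output_path).make_output()
    saved_html = output_path.read_text(encoding='utf-8')

print(url)
```

The returned URL could then be handed to the standard library's `webbrowser.open_new_tab` to display the file.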

Charts

Area Charts

See CommonDesign for the parameters common to all output design dataclasses in sofastats - for example, style_name.

See AreaChartDesign for the parameters configuring individual area chart designs.

API Documentation

Common Data Types

Sorting

sofastats.conf.main.SortOrder

CUSTOM = 'by custom order' class-attribute instance-attribute

DECREASING = 'by decreasing frequency' class-attribute instance-attribute

INCREASING = 'by increasing frequency' class-attribute instance-attribute

VALUE = 'by value' class-attribute instance-attribute

Common Parameters

sofastats.output.interfaces.CommonDesign dataclass

to_html_design() -> HTMLItemSpec abstractmethod

make_output()

Charts

Area Charts

sofastats.output.charts.area.AreaChartDesign dataclass

sofastats.output.charts.area.MultiChartAreaChartDesign dataclass

Bar Charts

sofastats.output.charts.bar.CommonBarDesign dataclass

sofastats.output.charts.bar.SimpleBarChartDesign dataclass

sofastats.output.charts.bar.MultiChartBarChartDesign dataclass

sofastats.output.charts.bar.ClusteredBarChartDesign dataclass

sofastats.output.charts.bar.MultiChartClusteredBarChartDesign dataclass

Box Plots

sofastats.output.charts.box_plot.BoxplotChartDesign dataclass

sofastats.output.charts.box_plot.ClusteredBoxplotChartDesign dataclass

Histograms

sofastats.output.charts.histogram.HistogramChartDesign dataclass

sofastats.output.charts.histogram.MultiChartHistogramChartDesign dataclass

Line Charts

sofastats.output.charts.line.LineChartDesign dataclass

sofastats.output.charts.line.MultiChartLineChartDesign dataclass

sofastats.output.charts.line.MultiLineChartDesign dataclass

sofastats.output.charts.line.MultiChartMultiLineChartDesign dataclass

Pie Charts

sofastats.output.charts.pie.PieChartDesign dataclass

sofastats.output.charts.pie.MultiChartPieChartDesign dataclass

Scatter Plots

sofastats.output.charts.scatter_plot.SimpleScatterChartDesign dataclass

sofastats.output.charts.scatter_plot.MultiChartScatterChartDesign dataclass

sofastats.output.charts.scatter_plot.BySeriesScatterChartDesign dataclass

sofastats.output.charts.scatter_plot.MultiChartBySeriesScatterChartDesign dataclass

Tables

sofastats.output.tables.interfaces.DimensionSpec dataclass

sofastats.output.tables.interfaces.Row dataclass

sofastats.output.tables.interfaces.Column dataclass

sofastats.output.tables.freq.FrequencyTableDesign dataclass

sofastats.output.tables.cross_tab.CrossTabDesign dataclass

Statistical Tests

sofastats.output.stats.interfaces.CommonStatsDesign dataclass

to_result() -> Type[StatsResult] abstractmethod

ANOVA

sofastats.output.stats.anova.AnovaDesign dataclass

Chi Square

sofastats.output.stats.chi_square.ChiSquareDesign dataclass

Independent Samples T-Test

sofastats.output.stats.independent_t_test.IndependentTTestDesign dataclass

Kruskal-Wallis H

sofastats.output.stats.kruskal_wallis_h.KruskalWallisHDesign dataclass

Mann-Whitney U

sofastats.output.stats.mann_whitney_u.MannWhitneyUDesign dataclass

Normality

sofastats.output.stats.normality.NormalityDesign dataclass

Paired Samples T-Test

sofastats.output.stats.paired_t_test.PairedTTestDesign dataclass

Pearson's R Correlation

sofastats.output.stats.pearsons_r.PearsonsRDesign dataclass

Spearman's R Correlation

sofastats.output.stats.spearmans_r.SpearmansRDesign dataclass

Wilcoxon Signed Ranks

sofastats.output.stats.wilcoxon_signed_ranks.WilcoxonSignedRanksDesign dataclass
