API Documentation

Common Data Types

Sorting

All Designs allow you to control the sort order of values.

One of the options is CUSTOM sorting. This is entirely optional but can be very useful. For example, if you have the following values in the Age Group variable:

'<20', '20-39', ... '80+'

you don't want the default alphabetical sorting by value. Otherwise '<20' appears at the end.

If you want to supply a CUSTOM sort order, all Design objects have sort_orders and sort_orders_yaml_file_path settings. See CommonDesign
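
To see why value sorting misplaces '<20', and what a custom order amounts to, here is a minimal plain-Python sketch. It does not call sofastats: the `age_groups` values and the `custom_order` list are illustrative, and the membership check only mirrors the documented rule that every value must appear in a custom order.

```python
# Illustrative values only; the middle groups are made up for this sketch.
age_groups = ['20-39', '40-59', '60-79', '80+', '<20']

# Default string sorting compares character codes: '<' sorts after the digits,
# so '<20' lands at the end instead of the start.
by_value = sorted(age_groups)

# A custom order is just the desired sequence spelled out in full.
custom_order = ['<20', '20-39', '40-59', '60-79', '80+']

# Every value present in the data must appear in the custom order.
missing = set(age_groups) - set(custom_order)
if missing:
    raise ValueError(f"Values missing from custom order: {sorted(missing)}")

# Sort by each value's position in the custom order.
by_custom = sorted(age_groups, key=custom_order.index)
print(by_value)
print(by_custom)
```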

sofastats.conf.main.SortOrder

Bases: StrEnum

Sort orders to apply. Note - INCREASING & DECREASING only apply to sorting at the final values level. E.g. If 'Age Group' > 'Handedness' > 'Home Location Type' then only 'Home Location Type' can potentially have sort order by frequency

Source code in src/sofastats/conf/main.py
class SortOrder(StrEnum):
    """
    Sort orders to apply.
    Note - INCREASING & DECREASING only apply to sorting at the final values level.
    E.g. If 'Age Group' > 'Handedness' > 'Home Location Type' then only 'Home Location Type'
    can potentially have sort order by frequency
    """
    CUSTOM = 'by custom order'
    "By custom order configured in YAML or dictionary for relevant variable"
    DECREASING = 'by decreasing frequency'
    "By decreasing frequency"
    INCREASING = 'by increasing frequency'
    "By increasing frequency"
    VALUE = 'by value'
    "By value alphabetically sorted"

CUSTOM = 'by custom order' class-attribute instance-attribute

By custom order configured in YAML or dictionary for relevant variable

DECREASING = 'by decreasing frequency' class-attribute instance-attribute

By decreasing frequency

INCREASING = 'by increasing frequency' class-attribute instance-attribute

By increasing frequency

VALUE = 'by value' class-attribute instance-attribute

By value alphabetically sorted
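
Because SortOrder is a StrEnum, its members compare equal to their string values and can be looked up from those strings. The sketch below uses a local copy of the enum so that it runs without sofastats installed; whether every sofastats parameter accepts the raw strings is not confirmed here, although the `SortOrder | str` type hints suggest it.

```python
try:
    from enum import StrEnum  # Python 3.11+
except ImportError:  # fallback for older Pythons, used only by this sketch
    from enum import Enum

    class StrEnum(str, Enum):
        pass


class SortOrder(StrEnum):
    """Local copy of the members documented above, for illustration only."""
    CUSTOM = 'by custom order'
    DECREASING = 'by decreasing frequency'
    INCREASING = 'by increasing frequency'
    VALUE = 'by value'


# Look a member up from its string value ...
order = SortOrder('by value')
# ... and compare a member directly against a plain string.
same = SortOrder.DECREASING == 'by decreasing frequency'
print(order is SortOrder.VALUE, same)
```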

Common Parameters

The parameters in CommonDesign are common to all output design dataclasses:

sofastats.output.interfaces.CommonDesign dataclass

Bases: ABC

Output dataclasses (e.g. ClusteredBoxplotChartDesign) inherit from CommonDesign. A dataclass can't have attributes with defaults in CommonDesign (which come first) followed by attributes without defaults in the output dataclasses. We are therefore required to supply defaults for everything in the output dataclasses, including mandatory fields. So how do we ensure arguments for those mandatory fields are supplied? We use a decorator (add_post_init_enforcing_mandatory_cols) to add a __post_init__ handler which runs CommonDesign.__post_init__ and then enforces the supply of values for every attribute whose default is DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY.
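
The "default supplied but mandatory anyway" idea can be sketched with plain dataclasses. This is a simplified illustration, not the sofastats implementation: `MANDATORY`, `BaseDesign`, and `ExampleChartDesign` are hypothetical names, and the check lives in an inherited `__post_init__` rather than being added by a decorator.

```python
from dataclasses import dataclass, fields

# Hypothetical sentinel standing in for DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY.
MANDATORY = '__mandatory__'


@dataclass
class BaseDesign:
    """Stand-in for CommonDesign: every attribute has a default."""
    style_name: str = 'default'

    def __post_init__(self):
        # Reject any field still holding the sentinel after __init__ has run.
        for field in fields(self):
            if getattr(self, field.name) == MANDATORY:
                raise ValueError(f"Please supply a value for {field.name}")


@dataclass
class ExampleChartDesign(BaseDesign):
    # A default is required because this field follows defaulted base fields,
    # but the sentinel marks it as mandatory in practice.
    category_field_name: str = MANDATORY


# Supplying the mandatory field works as usual.
design = ExampleChartDesign(category_field_name='Age Group')

# Omitting it is caught when the object is created.
try:
    ExampleChartDesign()
except ValueError as e:
    error_message = str(e)
```

The sentinel keeps the dataclass field ordering legal while still failing fast, with a readable message, when a required argument is left out.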

Parameters:

  • csv_file_path (Path | str | None, default: None ) –

    full file path to CSV file (if using CSV as source)

  • csv_separator (str, default: ',' ) –

    CSV separator (if using CSV as source)

  • cur (Any | None, default: None ) –

    DB-API 2.0 cursor, i.e. an object able to run cur.execute(), cur.fetchall() etc. (if using a cursor as source)

  • database_engine_name (DbeName | str | None, default: None ) –

    e.g. DbeName.SQLITE or 'sqlite' (if using a cursor as the source)

  • source_table_name (str | None, default: None ) –

    source table name (if using the cursor as a source OR using the internal SOFA SQLite database)

  • table_filter_sql (str | None, default: None ) –

    valid SQL to filter the source table - must be in the appropriate SQL dialect, with entities quoted as needed e.g. in SQLite, field names with spaces must be quoted, for example with backticks: `Age Group`

  • style_name (str, default: 'default' ) –

    e.g. 'default'. Either one of the built-in styles under sofastats.output.styles or a custom style defined by YAML in the custom_styles subfolder of the sofastats local folder e.g. ~/Documents/sofastats/custom_styles

  • output_file_path (Path | str | None, default: None ) –

    full path to the HTML file where the output will be saved. If not supplied, a timestamped file is created in the current working directory.

  • output_title (str | None, default: None ) –

    the title the HTML output will display in a web browser

  • show_in_web_browser (bool, default: True ) –

    if True will open a tab in your default browser to display the output file generated

  • sort_orders (SortOrderSpecs | None, default: None ) –

    if supplied, a dictionary that provides the sort orders for any variables given a custom sort order (SortOrder.CUSTOM). Multiple sort orders can be defined - with each variable given a custom sort order being a key in the dictionary. Example:

    {
        'Age Group': [
            '<20',
            '20 to <30', '30 to <40', '40 to <50',
            '50 to <60', '60 to <70', '70 to <80',
            '80+',
        ]
    }
    

    If the sort order applied were SortOrder.VALUE, '<20' would appear last in alphabetical order. If a custom order is defined, every value must appear in the list defining the desired sequence. Don't supply both sort_orders and sort_orders_yaml_file_path.

  • sort_orders_yaml_file_path (Path | str | None, default: None ) –

    file path containing YAML defining custom sort orders. See structure and effect as discussed under sort_orders. Don't supply both sort_orders and sort_orders_yaml_file_path.

  • decimal_points (int, default: 3 ) –

    defines the maximum number of decimal points displayed. If set to 3, for example, 1.23456789 will be displayed as 1.235. 1.320000000 will be displayed as 1.32, and 1.60000000 as 1.6.
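
The `decimal_points` behaviour described above (a maximum, with trailing zeros dropped) can be reproduced with a small helper. This is only a sketch of the documented effect; `format_max_dp` is a hypothetical name and is not how sofastats necessarily implements it.

```python
def format_max_dp(value: float, decimal_points: int = 3) -> str:
    """Round to at most `decimal_points` places, dropping trailing zeros."""
    fixed = f"{value:.{decimal_points}f}"
    if '.' in fixed:
        # Strip padding zeros, then a dangling decimal point if one is left.
        fixed = fixed.rstrip('0').rstrip('.')
    return fixed


for raw in (1.23456789, 1.32, 1.6):
    print(raw, '->', format_max_dp(raw))
```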

Source code in src/sofastats/output/interfaces.py
@dataclass(frozen=False)
class CommonDesign(ABC):
    """
    Output dataclasses (e.g. ClusteredBoxplotChartDesign) inherit from CommonDesign.
    Can't have defaults in CommonDesign attributes (which go first) and then missing defaults for the output dataclasses.
    Therefore, we are required to supply defaults for everything in the output dataclasses.
    That includes mandatory fields.
    So how do we ensure those mandatory field arguments are supplied?
    We use a decorator (add_post_init_enforcing_mandatory_cols) to add a __post_init__ handler
    which runs CommonDesign.__post_init__ and then enforces the supply of values for every attribute
    which has DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY.

    Args:
        csv_file_path: full file path to CSV file (if using CSV as source)
        csv_separator: CSV separator (if using CSV as source)
        cur: DB-API 2.0 cursor, i.e. an object able to run `cur.execute()`, `cur.fetchall()` etc. (if using a cursor as source)
        database_engine_name: e.g. `DbeName.SQLITE` or 'sqlite' (if using a cursor as the source)
        source_table_name: source table name (if using the cursor as a source OR using the internal SOFA SQLite database)
        table_filter_sql: valid SQL to filter the source table - must be in the appropriate SQL dialect
            and entities should be quoted appropriately as needed
            e.g. in SQLite, field names with spaces must be quoted, for example with backticks: \`Age Group\`
        style_name: e.g. 'default'. Either one of the built-in styles under `sofastats.output.styles`
            or a custom style defined by YAML in the custom_styles subfolder of the sofastats local folder
            e.g. `~/Documents/sofastats/custom_styles`
        output_file_path: full path to the HTML file where the output will be saved.
            If not supplied, a timestamped file is created in the current working directory.
        output_title: the title the HTML output will display in a web browser
        show_in_web_browser: if `True` will open a tab in your default browser to display the output file generated
        sort_orders: if supplied, a dictionary that provides the sort orders for any variables given a custom sort order
            (`SortOrder.CUSTOM`). Multiple sort orders can be defined - with each variable given a custom sort order
            being a key in the dictionary. Example:

            ```python
            {
                'Age Group': [
                    '<20',
                    '20 to <30', '30 to <40', '40 to <50',
                    '50 to <60', '60 to <70', '70 to <80',
                    '80+',
                ]
            }
            ```

            If the sort order applied was `SortOrder.VALUE`, we would see '<20' appearing as the last value
            by alphabetical order. If a custom order is defined, every value must appear in the list defining the
            desired sequence.
            Don't supply both `sort_orders` and `sort_orders_yaml_file_path`.
        sort_orders_yaml_file_path: file path containing YAML defining custom sort orders. See structure and effect as
            discussed under `sort_orders`. Don't supply both `sort_orders` and `sort_orders_yaml_file_path`.
        decimal_points: defines the maximum number of decimal points displayed.
            If set to 3, for example, 1.23456789 will be displayed as 1.235. 1.320000000 will be displayed as 1.32, and
            1.60000000 as 1.6.
    """
    ## inputs ***********************************
    csv_file_path: Path | str | None = None
    csv_separator: str = ','
    cur: Any | None = None
    database_engine_name: DbeName | str | None = None
    source_table_name: str | None = None
    table_filter_sql: str | None = None
    ## outputs **********************************
    style_name: str = 'default'
    output_file_path: Path | str | None = None
    output_title: str | None = None
    show_in_web_browser: bool = True
    sort_orders: SortOrderSpecs | None = None
    sort_orders_yaml_file_path: Path | str | None = None
    decimal_points: int = 3

    @abstractmethod
    def to_html_design(self) -> HTMLItemSpec:
        """
        From the design produce the HTML to display as one of the attributes of the HTMLItemSpec.
        Also return the style name and output item type e.g. whether a chart, table, or statistical output
        """
        pass

    def _handle_inputs(self):
        """
        There are three main paths for specific data values to be supplied to the design:

        1. CSV - data will be ingested into internal sofastats SQLite database
        (`source_table_name` optional - later analyses might be referring to that ingested table
        so nice to let user choose the name)
        2. `cur`, `database_engine_name`, and `source_table_name`
        3. or just a `source_table_name` (assumed to be using internal sofastats SQLite database)

        Any supplied cursors are "wrapped" inside an `ExtendedCursor` so we can use `.exe()` instead of `.execute()`
        and to provide better error messages on query failure.

        Client code supplies `database_engine_name` rather than dbe_spec for simplicity but internally
        `CommonDesign` supplies all code that inherits from it a `dbe_spec` attribute ready to use.

        Settings are validated e.g. to prevent client code supplying both CSV settings and database settings.
        """
        if self.csv_file_path:
            ## source_table_name is allowed here - it names the table the CSV is ingested into
            if self.cur or self.database_engine_name or self.table_filter_sql:
                raise Exception("If supplying a CSV path don't also supply database requirements")
            if not self.csv_separator:
                self.csv_separator = ','
            if not SQLITE_DB.get('sqlite_default_cur'):
                SQLITE_DB['sqlite_default_con'] = sqlite.connect(INTERNAL_DATABASE_FPATH)
                SQLITE_DB['sqlite_default_cur'] = ExtendedCursor(SQLITE_DB['sqlite_default_con'].cursor())
            self.cur = SQLITE_DB['sqlite_default_cur']
            self.dbe_spec = get_dbe_spec(DbeName.SQLITE)
            if not self.source_table_name:
                self.source_table_name = get_safer_name(Path(self.csv_file_path).stem)
            ## ingest CSV into database
            df = pd.read_csv(self.csv_file_path, sep=self.csv_separator)
            try:
                df.to_sql(self.source_table_name, SQLITE_DB['sqlite_default_con'], if_exists='replace', index=False)
            except Exception as e:  ## TODO: supply more specific exception
                logger.error(f"Failed at attempt to ingest CSV from '{self.csv_file_path}' "
                    f"into internal pysofa SQLite database as table '{self.source_table_name}'.\nError: {e}")
            else:
                logger.info(f"Successfully ingested CSV from '{self.csv_file_path}' "
                    f"into internal pysofa SQLite database as table '{self.source_table_name}'")
        elif self.cur:
            self.cur = ExtendedCursor(self.cur)
            if not self.database_engine_name:
                supported_names = '"' + '", "'.join(name.value for name in DbeName) + '"'
                raise Exception("When supplying a cursor, a database_engine_name must also be supplied. "
                    f"Supported names currently are: {supported_names}")
            else:
                self.dbe_spec = get_dbe_spec(self.database_engine_name)
            if not self.source_table_name:
                raise Exception("When supplying a cursor, a source_table_name must also be supplied")
        elif self.source_table_name:
            if not SQLITE_DB.get('sqlite_default_cur'):
                SQLITE_DB['sqlite_default_con'] = sqlite.connect(INTERNAL_DATABASE_FPATH)
                SQLITE_DB['sqlite_default_cur'] = ExtendedCursor(SQLITE_DB['sqlite_default_con'].cursor())
            self.cur = SQLITE_DB['sqlite_default_cur']  ## cur can't already be set here - a supplied cursor would have taken the branch above
            if self.database_engine_name and self.database_engine_name != DbeName.SQLITE:
                raise Exception("If not supplying a csv_file_path, or a cursor, the only permitted database engine is "
                    "SQLite (the dbe of the internal sofastats SQLite database)")
            self.dbe_spec = get_dbe_spec(DbeName.SQLITE)
        else:
            raise Exception("Either supply a path to a CSV "
                "(optional source_table_name for when ingested into internal sofastats SQLite database), "
                "a cursor (with database_engine_name and source_table_name), "
                "or a source_table_name (data assumed to be in internal sofastats SQLite database)")

    def _handle_outputs(self):
        """
        Validate configuration and provide sane defaults for `output_title` and `output_file_path` if nothing set.
        """
        ## output file path and title
        nice_name = '_'.join(self.__module__.split('.')[-2:]) + f"_{self.__class__.__name__}"
        if not self.output_file_path:
            now = datetime.datetime.now().strftime('%Y_%m_%d_%H_%M_%S')
            self.output_file_path = Path.cwd() / f"{nice_name}_{now}.html"
        if not self.output_title:
            self.output_title = f"{nice_name} Output"
        ## sort orders
        if self.sort_orders:
            if self.sort_orders_yaml_file_path:
                raise Exception("Oops - it looks like you supplied settings for both sort_orders "
                    "and sort_orders_yaml_file_path. Please set one or both of them to None.")
            else:
                pass
        elif self.sort_orders_yaml_file_path:
            yaml = YAML(typ='safe')  ## default, if not specified, is 'rt' (round-trip)
            self.sort_orders = yaml.load(Path(self.sort_orders_yaml_file_path))  ## might be a str or Path so make sure
        else:
            self.sort_orders = {}

    def __post_init__(self):
        self._handle_inputs()
        self._handle_outputs()
        for field in fields(self):
            if self.__getattribute__(field.name) == DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY:
                ## raise a friendly error for when they didn't supply a mandatory field that technically had a default (DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY), but we want to insist they supply a real value
                client_module = self.__module__.split('.')[-1]
                nice_name = f"{client_module}.{self.__class__.__name__}"  ## e.g. anova.AnovaDesign
                raise Exception(f"Oops - you need to supply a value for {field.name} in your {nice_name}")

    def __repr_html__(self):
        return self.__str__()

    def make_output(self):
        """
        Produce HTML output, e.g. charts and numerical results, save to `output_file_path`,
        and open in web browser if `show_in_web_browser=True`.
        """
        self.to_html_design().to_file(fpath=self.output_file_path)
        if self.show_in_web_browser:
            open_new_tab(url=f"file://{self.output_file_path}")
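
The three input paths described in the `_handle_inputs` docstring can be summarised as a small decision function. This is a sketch of the documented rules only: `choose_source` is a hypothetical helper, it follows the docstring in treating `source_table_name` as optional alongside a CSV, and it assumes 'sqlite' is the string form of `DbeName.SQLITE` as the parameter description suggests.

```python
def choose_source(csv_file_path=None, cur=None,
                  database_engine_name=None, source_table_name=None) -> str:
    """Return which documented input path a combination of settings selects."""
    if csv_file_path:
        # Path 1: CSV, ingested into the internal SQLite database.
        if cur or database_engine_name:
            raise ValueError("Supply either a CSV path or database settings, not both")
        return 'csv'
    if cur:
        # Path 2: a user-supplied cursor needs an engine name and a table name.
        if not (database_engine_name and source_table_name):
            raise ValueError("A cursor needs database_engine_name and source_table_name")
        return 'cursor'
    if source_table_name:
        # Path 3: a table already in the internal SQLite database.
        if database_engine_name and database_engine_name != 'sqlite':
            raise ValueError("Without a CSV or cursor, only the internal SQLite database is available")
        return 'internal_sqlite'
    raise ValueError("Supply a CSV path, a cursor, or a source_table_name")


print(choose_source(csv_file_path='data.csv'))
print(choose_source(source_table_name='people'))
```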

to_html_design() -> HTMLItemSpec abstractmethod

From the design produce the HTML to display as one of the attributes of the HTMLItemSpec. Also return the style name and output item type e.g. whether a chart, table, or statistical output

Source code in src/sofastats/output/interfaces.py
@abstractmethod
def to_html_design(self) -> HTMLItemSpec:
    """
    From the design produce the HTML to display as one of the attributes of the HTMLItemSpec.
    Also return the style name and output item type e.g. whether a chart, table, or statistical output
    """
    pass

make_output()

Produce HTML output, e.g. charts and numerical results, save to output_file_path, and open in web browser if show_in_web_browser=True.

Source code in src/sofastats/output/interfaces.py
def make_output(self):
    """
    Produce HTML output, e.g. charts and numerical results, save to `output_file_path`,
    and open in web browser if `show_in_web_browser=True`.
    """
    self.to_html_design().to_file(fpath=self.output_file_path)
    if self.show_in_web_browser:
        open_new_tab(url=f"file://{self.output_file_path}")
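
The save-then-open flow of `make_output` can be sketched with the standard library alone. `save_and_show` is a hypothetical helper, and it assumes `open_new_tab` behaves like `webbrowser.open_new_tab`. It builds the URL with `Path.as_uri()`, one option that also copes with relative paths and Windows drive letters, where a hand-built `file://` string may not.

```python
import tempfile
import webbrowser
from pathlib import Path


def save_and_show(html: str, output_file_path, show_in_web_browser: bool = True) -> str:
    """Write HTML to a file and optionally open it; return the file URL."""
    fpath = Path(output_file_path).resolve()  # as_uri() needs an absolute path
    fpath.write_text(html, encoding='utf-8')
    url = fpath.as_uri()
    if show_in_web_browser:
        webbrowser.open_new_tab(url)
    return url


# Demonstrate without opening a browser, using a throwaway folder.
with tempfile.TemporaryDirectory() as folder:
    demo_url = save_and_show('<h1>Demo</h1>', Path(folder) / 'demo.html',
                             show_in_web_browser=False)
    saved_html = (Path(folder) / 'demo.html').read_text(encoding='utf-8')
```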

Charts

Area Charts

See CommonDesign for the parameters common to all output design dataclasses in sofastats - for example, style_name.

See AreaChartDesign for the parameters configuring individual area chart designs.

sofastats.output.charts.area.AreaChartDesign dataclass

Bases: CommonDesign

Parameters:

  • category_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    name of field in the x-axis

  • category_sort_order (SortOrder | str, default: VALUE ) –

    define order of categories in each chart e.g. SortOrder.VALUE or SortOrder.CUSTOM

  • is_time_series (bool, default: False ) –

    space x-axis labels according to time e.g. there might be variable gaps between items

  • show_major_ticks_only (bool, default: True ) –

    suppress minor ticks

  • show_markers (bool, default: True ) –

    show markers on the line bounding the area

  • rotate_x_labels (bool, default: False ) –

    make x-axis labels vertical

  • show_n_records (bool, default: True ) –

    show the number of records the chart is based on

  • x_axis_font_size (int, default: 12 ) –

    font size for x-axis labels

  • y_axis_title (str, default: 'Freq' ) –

    title displayed vertically alongside y-axis

Source code in src/sofastats/output/charts/area.py
@dataclass(frozen=False)
class AreaChartDesign(CommonDesign):
    """
    Args:
        category_field_name: name of field in the x-axis
        category_sort_order: define order of categories in each chart e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
        is_time_series: space x-axis labels according to time e.g. there might be variable gaps between items
        show_major_ticks_only: suppress minor ticks
        show_markers: show markers on the line bounding the area
        rotate_x_labels: make x-axis labels vertical
        show_n_records: show the number of records the chart is based on
        x_axis_font_size: font size for x-axis labels
        y_axis_title: title displayed vertically alongside y-axis
    """
    category_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    category_sort_order: SortOrder | str = SortOrder.VALUE

    is_time_series: bool = False
    show_major_ticks_only: bool = True
    show_markers: bool = True
    rotate_x_labels: bool = False
    show_n_records: bool = True
    x_axis_font_size: int = 12
    y_axis_title: str = 'Freq'

    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = get_by_category_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            category_field_name=self.category_field_name,
            sort_orders=self.sort_orders,
            category_sort_order=self.category_sort_order,
            table_filter_sql=self.table_filter_sql)
        ## chart details
        charting_spec = AreaChartingSpec(
            categories=intermediate_charting_spec.sorted_categories,
            indiv_chart_specs=[intermediate_charting_spec.to_indiv_chart_spec(), ],
            series_legend_label=None,
            rotate_x_labels=self.rotate_x_labels,
            show_n_records=self.show_n_records,
            is_time_series=self.is_time_series,
            show_major_ticks_only=self.show_major_ticks_only,
            show_markers=self.show_markers,
            x_axis_font_size=self.x_axis_font_size,
            x_axis_title=intermediate_charting_spec.category_field_name,
            y_axis_title=self.y_axis_title,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )

sofastats.output.charts.area.MultiChartAreaChartDesign dataclass

Bases: CommonDesign

Parameters:

  • chart_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the field name defining the charts e.g. a chart_field_name of 'Country' might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.

  • chart_sort_order (SortOrder | str, default: VALUE ) –

    define order of charts e.g. SortOrder.VALUE or SortOrder.CUSTOM

Source code in src/sofastats/output/charts/area.py
@dataclass(frozen=False)
class MultiChartAreaChartDesign(CommonDesign):
    """
    Args:
        chart_field_name: the field name defining the charts e.g. a `chart_field_name` of 'Country'
            might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.
        chart_sort_order: define order of charts e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
    """
    category_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    category_sort_order: SortOrder | str = SortOrder.VALUE
    chart_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    chart_sort_order: SortOrder | str = SortOrder.VALUE

    is_time_series: bool = False
    show_major_ticks_only: bool = True
    show_markers: bool = True
    rotate_x_labels: bool = False
    show_n_records: bool = True
    x_axis_font_size: int = 12
    y_axis_title: str = 'Freq'

    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = get_by_chart_category_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            category_field_name=self.category_field_name,
            chart_field_name=self.chart_field_name,
            sort_orders=self.sort_orders,
            category_sort_order=self.category_sort_order, chart_sort_order=self.chart_sort_order,
            table_filter_sql=self.table_filter_sql, decimal_points=self.decimal_points)
        ## chart details
        charting_spec = AreaChartingSpec(
            categories=intermediate_charting_spec.sorted_categories,
            indiv_chart_specs=intermediate_charting_spec.to_indiv_chart_specs(),
            series_legend_label=None,
            rotate_x_labels=self.rotate_x_labels,
            show_n_records=self.show_n_records,
            is_time_series=self.is_time_series,
            show_major_ticks_only=self.show_major_ticks_only,
            show_markers=self.show_markers,
            x_axis_font_size=self.x_axis_font_size,
            x_axis_title=intermediate_charting_spec.category_field_name,
            y_axis_title=self.y_axis_title,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
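
Conceptually, a multi-chart design splits the data by the chart field and then counts categories within each chart. The sketch below shows that grouping in plain Python with made-up records; it is not the sofastats query logic, and `freqs_by_chart` and `category_order` are illustrative names.

```python
from collections import Counter, defaultdict

# Illustrative records as (chart value, category value) pairs; made-up data.
records = [
    ('NZ', '<20'), ('NZ', '20-39'), ('NZ', '20-39'),
    ('USA', '<20'), ('USA', '<20'), ('USA', '80+'),
]

# One frequency table per distinct chart field value.
freqs_by_chart = defaultdict(Counter)
for chart_val, category_val in records:
    freqs_by_chart[chart_val][category_val] += 1

# Charts in value order; categories in a custom order within each chart.
category_order = ['<20', '20-39', '80+']
for chart_val in sorted(freqs_by_chart):
    counts = freqs_by_chart[chart_val]
    series = [(cat, counts.get(cat, 0)) for cat in category_order]
    print(chart_val, series)
```

Keeping the chart order and the category order as separate settings is what lets each chart share the same x-axis sequence while the charts themselves are ordered independently.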

Bar Charts

See CommonDesign for the parameters common to all output design dataclasses in sofastats - for example, style_name.

See SimpleBarChartDesign for the parameters configuring individual bar chart designs.

sofastats.output.charts.bar.CommonBarDesign dataclass

Bases: CommonDesign

Parameters:

  • metric (ChartMetric, default: FREQ ) –

    defines what bar heights represent - whether ChartMetric.FREQ, ChartMetric.PCT, etc.

  • field_name (str | None, default: None ) –

    the name of the field being aggregated when the metric is an aggregate e.g. ChartMetric.AVG or ChartMetric.SUM

  • y_axis_title (str | None, default: None ) –

    title displayed vertically alongside y-axis

Source code in src/sofastats/output/charts/bar.py
@dataclass(frozen=False)
class CommonBarDesign(CommonDesign):
    """
    Args:
        metric: defines what bar heights represent - whether ChartMetric.FREQ, ChartMetric.PCT, etc.
        field_name: the name of the field being aggregated when the metric is an aggregate
            e.g. ChartMetric.AVG or ChartMetric.SUM
        y_axis_title: title displayed vertically alongside y-axis
    """

    metric: ChartMetric = ChartMetric.FREQ
    field_name: str | None = None
    y_axis_title: str | None = None

    def __post_init__(self):
        super().__post_init__()
        if self.y_axis_title is None:  ##TODO - no field name unless aggregating
            if self.metric == ChartMetric.AVG:
                self.y_axis_title = f"Average {self.field_name}"
            elif self.metric == ChartMetric.FREQ:
                self.y_axis_title = 'Frequency'
            elif self.metric == ChartMetric.PCT:
                self.y_axis_title = 'Percent'
            elif self.metric == ChartMetric.SUM:
                self.y_axis_title = f"Summed {self.field_name}"
            else:
                raise ValueError(f'Metric {self.metric} is not supported.')
        if self.field_name is None:
            if self.metric in (ChartMetric.AVG, ChartMetric.SUM):
                raise ValueError("A field_name must be set if the metric aggregates "
                    "e.g. ChartMetric.AVG or ChartMetric.SUM")
        else:
            if self.metric not in (ChartMetric.AVG, ChartMetric.SUM):
                raise ValueError("A field_name should only be supplied if the metric aggregates "
                    "e.g. ChartMetric.AVG or ChartMetric.SUM")

    @abstractmethod
    def to_html_design(self) -> HTMLItemSpec:
        pass
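
The metric rules enforced in `CommonBarDesign.__post_init__` can be restated as a standalone function. This is a sketch under assumptions: `Metric` is a hypothetical stand-in for `ChartMetric` with made-up member values, and, unlike the source above, it validates `field_name` before building the default title.

```python
from enum import Enum


class Metric(Enum):
    """Hypothetical stand-in for ChartMetric; the member values are assumed."""
    AVG = 'avg'
    FREQ = 'freq'
    PCT = 'pct'
    SUM = 'sum'


AGGREGATES = (Metric.AVG, Metric.SUM)


def resolve_y_axis_title(metric: Metric, field_name=None, y_axis_title=None) -> str:
    """Validate metric / field_name and return the y-axis title to display."""
    aggregating = metric in AGGREGATES
    if aggregating and field_name is None:
        raise ValueError("A field_name must be set if the metric aggregates")
    if not aggregating and field_name is not None:
        raise ValueError("A field_name should only be supplied if the metric aggregates")
    if y_axis_title is not None:
        return y_axis_title  # an explicit title always wins
    defaults = {
        Metric.AVG: f"Average {field_name}",
        Metric.FREQ: 'Frequency',
        Metric.PCT: 'Percent',
        Metric.SUM: f"Summed {field_name}",
    }
    return defaults[metric]


print(resolve_y_axis_title(Metric.FREQ))
print(resolve_y_axis_title(Metric.AVG, field_name='Age'))
```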

sofastats.output.charts.bar.SimpleBarChartDesign dataclass

Bases: CommonBarDesign

Parameters:

  • category_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    name of field in the x-axis

  • category_sort_order (SortOrder, default: VALUE ) –

    define order of categories in each chart e.g. SortOrder.VALUE or SortOrder.CUSTOM

  • rotate_x_labels (bool, default: False ) –

    make x-axis labels vertical

  • show_borders (bool, default: False ) –

    show a coloured border around the bars

  • show_n_records (bool, default: True ) –

    show the number of records the chart is based on

  • x_axis_font_size (int, default: 12 ) –

    font size for x-axis labels

Source code in src/sofastats/output/charts/bar.py
@dataclass(frozen=False)
class SimpleBarChartDesign(CommonBarDesign):
    """
    Args:
        category_field_name: name of field in the x-axis
        category_sort_order: define order of categories in each chart e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
        rotate_x_labels: make x-axis labels vertical
        show_borders: show a coloured border around the bars
        show_n_records: show the number of records the chart is based on
        x_axis_font_size: font size for x-axis labels
    """
    category_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    category_sort_order: SortOrder = SortOrder.VALUE

    rotate_x_labels: bool = False
    show_borders: bool = False
    show_n_records: bool = True
    x_axis_font_size: int = 12

    def to_html_design(self) -> HTMLItemSpec:
        ## style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = from_data.get_by_category_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            category_field_name=self.category_field_name, sort_orders=self.sort_orders,
            category_sort_order=self.category_sort_order,
            metric=self.metric, field_name=self.field_name,
            table_filter_sql=self.table_filter_sql, decimal_points=self.decimal_points)
        ## chart details
        charting_spec = BarChartingSpec(
            categories=intermediate_charting_spec.sorted_categories,
            indiv_chart_specs=[intermediate_charting_spec.to_indiv_chart_spec(), ],
            series_legend_label=None,
            rotate_x_labels=self.rotate_x_labels,
            show_borders=self.show_borders,
            show_n_records=self.show_n_records,
            x_axis_font_size=self.x_axis_font_size,
            x_axis_title=intermediate_charting_spec.category_field_name,
            y_axis_title=self.y_axis_title,
        )
        ## output
        html = get_html(charting_spec, style_spec)  ## see get_indiv_chart_html() below
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )

sofastats.output.charts.bar.MultiChartBarChartDesign dataclass

Bases: CommonBarDesign

Parameters:

  • chart_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the field name defining the charts e.g. a chart_field_name of 'Country' might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.

  • chart_sort_order (SortOrder, default: VALUE ) –

    define order of charts e.g. SortOrder.VALUES or SortOrder.CUSTOM
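
To make the role of chart_field_name concrete, the sketch below groups a few made-up records by a chart field and counts category frequencies within each group, which is the shape of data a multi-chart bar chart displays. It is plain Python rather than sofastats code: the records, the field names, and the helper freqs_by_chart are assumptions for illustration only.

```python
from collections import Counter, defaultdict

## Hypothetical records; the field names are illustrative only.
records = [
    {'Country': 'NZ', 'Age Group': '<20'},
    {'Country': 'NZ', 'Age Group': '20-39'},
    {'Country': 'NZ', 'Age Group': '<20'},
    {'Country': 'USA', 'Age Group': '20-39'},
    {'Country': 'Denmark', 'Age Group': '<20'},
]

def freqs_by_chart(records, chart_field_name, category_field_name):
    """Return {chart value: Counter of category frequencies}, one entry per chart."""
    charts = defaultdict(Counter)
    for record in records:
        charts[record[chart_field_name]][record[category_field_name]] += 1
    ## ordering chart values alphabetically mirrors the idea behind SortOrder.VALUE
    return {chart_val: charts[chart_val] for chart_val in sorted(charts)}

charts = freqs_by_chart(records, chart_field_name='Country', category_field_name='Age Group')
for chart_val, freqs in charts.items():
    print(chart_val, dict(freqs))
```

Each key of the result corresponds to one chart, and each Counter holds the bar heights for that chart.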

Source code in src/sofastats/output/charts/bar.py
@dataclass(frozen=False)
class MultiChartBarChartDesign(CommonBarDesign):
    """
    Args:
        chart_field_name: the field name defining the charts e.g. a `chart_field_name` of 'Country'
            might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.
        chart_sort_order: define order of charts e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
    """
    category_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    category_sort_order: SortOrder = SortOrder.VALUE
    chart_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    chart_sort_order: SortOrder = SortOrder.VALUE

    metric: ChartMetric = ChartMetric.FREQ
    rotate_x_labels: bool = False
    show_borders: bool = False
    show_n_records: bool = True
    x_axis_font_size: int = 12

    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = from_data.get_by_chart_category_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            category_field_name=self.category_field_name, chart_field_name=self.chart_field_name,
            sort_orders=self.sort_orders,
            category_sort_order=self.category_sort_order, chart_sort_order=self.chart_sort_order,
            metric=self.metric, field_name=self.field_name,
            table_filter_sql=self.table_filter_sql, decimal_points=self.decimal_points)
        ## charts details
        charting_spec = BarChartingSpec(
            categories=intermediate_charting_spec.sorted_categories,
            indiv_chart_specs=intermediate_charting_spec.to_indiv_chart_specs(),
            series_legend_label=None,
            rotate_x_labels=self.rotate_x_labels,
            show_borders=self.show_borders,
            show_n_records=self.show_n_records,
            x_axis_font_size=self.x_axis_font_size,
            x_axis_title=intermediate_charting_spec.category_field_name,
            y_axis_title=self.y_axis_title,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )

sofastats.output.charts.bar.ClusteredBarChartDesign dataclass

Bases: CommonBarDesign

Parameters:

  • series_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the field name defining the series e.g. a series_field_name of 'Country' might generate separate bars within each category cluster for 'USA', 'NZ', 'Denmark', and 'South Korea'.

  • series_sort_order (SortOrder, default: VALUE ) –

    define order of series within each category cluster e.g. SortOrder.VALUE or SortOrder.CUSTOM
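
The following sketch shows one way to picture a clustered layout: every series gets a list of frequencies aligned to the same ordered categories, so each category forms a cluster with one bar per series. It is not sofastats code. The data, the explicit order lists (standing in for the idea of SortOrder.CUSTOM), and the helper clustered_freqs are assumptions for illustration.

```python
from collections import Counter

## Hypothetical (category, series) pairs, e.g. (Age Group, Country).
pairs = [
    ('<20', 'NZ'), ('<20', 'USA'), ('<20', 'NZ'),
    ('20-39', 'USA'), ('20-39', 'NZ'), ('80+', 'USA'),
]

## Assumed explicit orders, analogous in spirit to a custom sort order.
category_order = ['<20', '20-39', '80+']
series_order = ['USA', 'NZ']

freqs = Counter(pairs)

def clustered_freqs(freqs, category_order, series_order):
    """Return one list of frequencies per series, aligned to category_order."""
    return {
        series: [freqs.get((category, series), 0) for category in category_order]
        for series in series_order
    }

series_freqs = clustered_freqs(freqs, category_order, series_order)
print(series_freqs)
```

A missing combination, such as 'NZ' in the '80+' group, is filled with 0 so every series has a bar slot in every cluster.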

Source code in src/sofastats/output/charts/bar.py
@dataclass(frozen=False)
class ClusteredBarChartDesign(CommonBarDesign):
    """
    Args:
        series_field_name: the field name defining the series e.g. a `series_field_name` of 'Country'
            might generate separate bars within each category cluster for 'USA', 'NZ', 'Denmark', and 'South Korea'.
        series_sort_order: define order of series within each category cluster e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
    """
    category_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    category_sort_order: SortOrder = SortOrder.VALUE
    series_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    series_sort_order: SortOrder = SortOrder.VALUE

    metric: ChartMetric = ChartMetric.FREQ
    rotate_x_labels: bool = False
    show_borders: bool = False
    show_n_records: bool = True
    x_axis_font_size: int = 12

    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = from_data.get_by_series_category_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            category_field_name=self.category_field_name, series_field_name=self.series_field_name,
            sort_orders=self.sort_orders,
            category_sort_order=self.category_sort_order, series_sort_order=self.series_sort_order,
            metric=self.metric, field_name=self.field_name,
            table_filter_sql=self.table_filter_sql, decimal_points=self.decimal_points)
        ## chart details
        charting_spec = BarChartingSpec(
            categories=intermediate_charting_spec.sorted_categories,
            indiv_chart_specs=[intermediate_charting_spec.to_indiv_chart_spec(), ],
            series_legend_label=intermediate_charting_spec.series_field_name,
            rotate_x_labels=self.rotate_x_labels,
            show_borders=self.show_borders,
            show_n_records=self.show_n_records,
            x_axis_font_size=self.x_axis_font_size,
            x_axis_title=intermediate_charting_spec.category_field_name,
            y_axis_title=self.y_axis_title,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )

sofastats.output.charts.bar.MultiChartClusteredBarChartDesign dataclass

Bases: CommonBarDesign

Parameters:

  • series_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the field name defining the series e.g. a series_field_name of 'Country' might generate separate bars within each category cluster for 'USA', 'NZ', 'Denmark', and 'South Korea'.

  • series_sort_order (SortOrder, default: VALUE ) –

    define order of series within each category cluster e.g. SortOrder.VALUE or SortOrder.CUSTOM

  • chart_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the field name defining the charts e.g. a chart_field_name of 'Country' might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.

  • chart_sort_order (SortOrder, default: VALUE ) –

    define order of charts e.g. SortOrder.VALUE or SortOrder.CUSTOM
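
With chart, series, and category fields all in play, the data nests three levels deep and each level can be ordered independently. The sketch below is a plain-Python illustration of that nesting, not sofastats code. The rows, the helpers nest and custom_key, and the choice of which level gets an explicit order are all assumptions.

```python
from collections import defaultdict

## Hypothetical rows: (chart value, series value, category value)
rows = [
    ('Urban', 'NZ', '<20'), ('Urban', 'NZ', '20-39'), ('Urban', 'USA', '20-39'),
    ('Rural', 'NZ', '20-39'), ('Rural', 'NZ', '<20'), ('Rural', 'NZ', '<20'),
]

def nest(rows):
    """Group counts as {chart: {series: {category: freq}}}."""
    nested = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for chart_val, series_val, category_val in rows:
        nested[chart_val][series_val][category_val] += 1
    return nested

def custom_key(order):
    """Sort key that ranks values by their position in an explicit list."""
    rank = {val: idx for idx, val in enumerate(order)}
    return lambda val: rank[val]

nested = nest(rows)
age_key = custom_key(['<20', '20-39'])  ## assumed explicit order for categories
layout = [
    (chart_val, series_val, sorted(nested[chart_val][series_val], key=age_key))
    for chart_val in sorted(nested)              ## charts ordered by value
    for series_val in sorted(nested[chart_val])  ## series ordered by value
]
print(layout)
```

Here charts and series are ordered alphabetically while categories follow the explicit list, which keeps '<20' ahead of '20-39'.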

Source code in src/sofastats/output/charts/bar.py
@dataclass(frozen=False)
class MultiChartClusteredBarChartDesign(CommonBarDesign):
    """
    Args:
        series_field_name: the field name defining the series e.g. a `series_field_name` of 'Country'
            might generate separate bars within each category cluster for 'USA', 'NZ', 'Denmark', and 'South Korea'.
        series_sort_order: define order of series within each category cluster e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
        chart_field_name: the field name defining the charts e.g. a `chart_field_name` of 'Country'
            might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.
        chart_sort_order: define order of charts e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
    """
    category_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    category_sort_order: SortOrder = SortOrder.VALUE
    series_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    series_sort_order: SortOrder = SortOrder.VALUE
    chart_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    chart_sort_order: SortOrder = SortOrder.VALUE

    metric: ChartMetric = ChartMetric.FREQ
    rotate_x_labels: bool = False
    show_borders: bool = False
    show_n_records: bool = True
    x_axis_font_size: int = 12

    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = from_data.get_by_chart_series_category_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            category_field_name=self.category_field_name,
            series_field_name=self.series_field_name,
            chart_field_name=self.chart_field_name,
            sort_orders=self.sort_orders,
            category_sort_order=self.category_sort_order,
            series_sort_order=self.series_sort_order,
            chart_sort_order=self.chart_sort_order,
            metric=self.metric, field_name=self.field_name,
            table_filter_sql=self.table_filter_sql,
            decimal_points=self.decimal_points)
        ## chart details
        charting_spec = BarChartingSpec(
            categories=intermediate_charting_spec.sorted_categories,
            indiv_chart_specs=intermediate_charting_spec.to_indiv_chart_specs(),
            series_legend_label=intermediate_charting_spec.series_field_name,
            rotate_x_labels=self.rotate_x_labels,
            show_borders=self.show_borders,
            show_n_records=self.show_n_records,
            x_axis_font_size=self.x_axis_font_size,
            x_axis_title=intermediate_charting_spec.category_field_name,
            y_axis_title=self.y_axis_title,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )

Box Plots

See CommonDesign for the parameters common to all output design dataclasses in sofastats - for example, style_name.

See BoxplotChartDesign for the parameters configuring individual box plot chart designs.

sofastats.output.charts.box_plot.BoxplotChartDesign dataclass

Bases: CommonDesign

Parameters:

  • field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    field summarised in each box

  • category_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    name of field in the x-axis

  • category_sort_order (SortOrder, default: VALUE ) –

    define order of categories in each chart e.g. SortOrder.VALUE or SortOrder.CUSTOM

  • box_plot_type (BoxplotType, default: INSIDE_1_POINT_5_TIMES_IQR ) –

    options for what the boxes represent and whether outliers are displayed or not.

  • rotate_x_labels (bool, default: False ) –

    make x-axis labels vertical

  • show_n_records (bool, default: True ) –

    show the number of records the chart is based on

  • x_axis_font_size (int, default: 12 ) –

    font size for x-axis labels
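
The default box_plot_type, INSIDE_1_POINT_5_TIMES_IQR, is named after the common 1.5 × IQR rule. The sketch below shows that rule in plain Python under stated assumptions: whiskers reach the most extreme values inside the fences, anything beyond is treated as an outlier, and quartiles use the 'inclusive' method of statistics.quantiles. The quartile method and outlier handling sofastats actually uses are not stated here, and box_summary is a hypothetical helper.

```python
import statistics

def box_summary(vals):
    """Return quartiles, whisker ends, and outliers using the 1.5 x IQR rule."""
    q1, median, q3 = statistics.quantiles(vals, n=4, method='inclusive')
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    ## whiskers stop at the most extreme values that still lie inside the fences
    inside = [val for val in vals if lower_fence <= val <= upper_fence]
    outliers = [val for val in vals if val < lower_fence or val > upper_fence]
    return {
        'q1': q1, 'median': median, 'q3': q3,
        'lower_whisker': min(inside), 'upper_whisker': max(inside),
        'outliers': outliers,
    }

summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

In this made-up sample the value 30 lies beyond the upper fence, so the upper whisker stops at 9 and 30 is reported separately.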

Source code in src/sofastats/output/charts/box_plot.py
@dataclass(frozen=False)
class BoxplotChartDesign(CommonDesign):
    """
    Args:
        field_name: field summarised in each box
        category_field_name: name of field in the x-axis
        category_sort_order: define order of categories in each chart e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
        box_plot_type: options for what the boxes represent and whether outliers are displayed or not.
        rotate_x_labels: make x-axis labels vertical
        show_n_records: show the number of records the chart is based on
        x_axis_font_size: font size for x-axis labels
    """
    field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    category_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    category_sort_order: SortOrder = SortOrder.VALUE

    box_plot_type: BoxplotType = BoxplotType.INSIDE_1_POINT_5_TIMES_IQR
    rotate_x_labels: bool = False
    show_n_records: bool = True
    x_axis_font_size: int = 12

    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = get_by_category_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            field_name=self.field_name,
            category_field_name=self.category_field_name,
            sort_orders=self.sort_orders,
            category_sort_order=self.category_sort_order,
            table_filter_sql=self.table_filter_sql,
            box_plot_type=self.box_plot_type)
        ## charts details
        categories = [
            category_vals_spec.category_val for category_vals_spec in intermediate_charting_spec.category_vals_specs]
        indiv_chart_spec = intermediate_charting_spec.to_indiv_chart_spec()
        charting_spec = BoxplotChartingSpec(
            categories=categories,
            indiv_chart_specs=[indiv_chart_spec, ],
            series_legend_label=intermediate_charting_spec.series_field_name,
            rotate_x_labels=self.rotate_x_labels,
            show_n_records=self.show_n_records,
            x_axis_title=intermediate_charting_spec.category_field_name,
            y_axis_title=intermediate_charting_spec.field_name,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )

sofastats.output.charts.box_plot.ClusteredBoxplotChartDesign dataclass

Bases: CommonDesign

Parameters:

  • series_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the field name defining the series e.g. a series_field_name of 'Country' might generate separate boxes within each category cluster for 'USA', 'NZ', 'Denmark', and 'South Korea'.

  • series_sort_order (SortOrder, default: VALUE ) –

    define order of series within each category cluster e.g. SortOrder.VALUE or SortOrder.CUSTOM

Source code in src/sofastats/output/charts/box_plot.py
@dataclass(frozen=False)
class ClusteredBoxplotChartDesign(CommonDesign):
    """
    Args:
        series_field_name: the field name defining the series e.g. a `series_field_name` of 'Country'
            might generate separate boxes within each category cluster for 'USA', 'NZ', 'Denmark', and 'South Korea'.
        series_sort_order: define order of series within each category cluster e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
    """
    field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    category_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    category_sort_order: SortOrder = SortOrder.VALUE
    series_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    series_sort_order: SortOrder = SortOrder.VALUE

    box_plot_type: BoxplotType = BoxplotType.INSIDE_1_POINT_5_TIMES_IQR
    rotate_x_labels: bool = False
    show_n_records: bool = True
    x_axis_font_size: int = 12

    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = get_by_series_category_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            field_name=self.field_name,
            category_field_name=self.category_field_name,
            series_field_name=self.series_field_name,
            sort_orders=self.sort_orders,
            category_sort_order=self.category_sort_order,
            series_sort_order=self.series_sort_order,
            table_filter_sql=self.table_filter_sql,
            box_plot_type=self.box_plot_type)
        ## charts details
        categories = [category_vals_spec.category_val
            for category_vals_spec in intermediate_charting_spec.series_category_vals_specs[0].category_vals_specs]
        indiv_chart_spec = intermediate_charting_spec.to_indiv_chart_spec(dp=self.decimal_points)
        charting_spec = BoxplotChartingSpec(
            categories=categories,
            indiv_chart_specs=[indiv_chart_spec, ],
            series_legend_label=intermediate_charting_spec.series_field,
            rotate_x_labels=self.rotate_x_labels,
            show_n_records=self.show_n_records,
            x_axis_title=intermediate_charting_spec.category_field,
            y_axis_title=intermediate_charting_spec.field,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )

Histograms

See CommonDesign for the parameters common to all output design dataclasses in sofastats - for example, style_name.

See HistogramChartDesign for the parameters configuring individual histogram chart designs.

sofastats.output.charts.histogram.HistogramChartDesign dataclass

Bases: CommonDesign

Parameters:

  • field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    field whose values are summarised in the histogram

  • show_borders (bool, default: False ) –

    show a coloured border around the bars

  • show_n_records (bool, default: True ) –

    show the number of records the chart is based on

  • show_normal_curve (bool, default: True ) –

    if True display normal curve on the chart

  • x_axis_font_size (int, default: 12 ) –

    font size for x-axis labels
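
As a rough picture of what a histogram with show_normal_curve involves, the sketch below bins values into equal-width bins and evaluates a normal curve, scaled to counts, at each bin centre. It is a generic illustration, not the sofastats implementation: the bin count, the equal-width binning, and the helper histogram_with_normal_curve are assumptions, and the values must not all be identical.

```python
import math
import statistics

def histogram_with_normal_curve(vals, n_bins):
    """Return bin edges, bin counts, and normal-curve heights at the bin centres."""
    lo, hi = min(vals), max(vals)
    bin_width = (hi - lo) / n_bins  ## assumes hi > lo
    edges = [lo + idx * bin_width for idx in range(n_bins + 1)]
    counts = [0] * n_bins
    for val in vals:
        ## the maximum value goes in the last bin rather than a bin of its own
        idx = min(int((val - lo) / bin_width), n_bins - 1)
        counts[idx] += 1
    mean = statistics.mean(vals)
    sd = statistics.stdev(vals)
    curve = []
    for idx in range(n_bins):
        centre = edges[idx] + bin_width / 2
        density = math.exp(-0.5 * ((centre - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
        ## scale density to counts so the curve shares the y-axis with the bars
        curve.append(density * len(vals) * bin_width)
    return edges, counts, curve

edges, counts, curve = histogram_with_normal_curve([1, 2, 2, 3, 3, 3, 4, 4, 5], n_bins=4)
print(edges, counts, [round(y, 2) for y in curve])
```

Multiplying the density by the number of records and the bin width puts the curve on the same frequency scale as the bars.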

Source code in src/sofastats/output/charts/histogram.py
@dataclass(frozen=False)
class HistogramChartDesign(CommonDesign):
    """
    Args:
        field_name: field whose values are summarised in the histogram
        show_borders: show a coloured border around the bars
        show_n_records: show the number of records the chart is based on
        show_normal_curve: if `True` display normal curve on the chart
        x_axis_font_size: font size for x-axis labels
    """
    field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY

    show_borders: bool = False
    show_n_records: bool = True
    show_normal_curve: bool = True
    x_axis_font_size: int = 12

    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = get_by_vals_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            field_name=self.field_name, table_filter_sql=self.table_filter_sql, decimal_points=self.decimal_points)
        bin_labels = intermediate_charting_spec.to_bin_labels()
        x_axis_min_val, x_axis_max_val = intermediate_charting_spec.to_x_axis_range()
        ## charts details
        indiv_chart_specs = intermediate_charting_spec.to_indiv_chart_specs()
        charting_spec = HistoChartingSpec(
            bin_labels=bin_labels,
            indiv_chart_specs=indiv_chart_specs,
            show_borders=self.show_borders,
            show_n_records=self.show_n_records,
            show_normal_curve=self.show_normal_curve,
            var_label=intermediate_charting_spec.field_name,
            x_axis_font_size=self.x_axis_font_size,
            x_axis_max_val=x_axis_max_val,
            x_axis_min_val=x_axis_min_val,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )

sofastats.output.charts.histogram.MultiChartHistogramChartDesign dataclass

Bases: CommonDesign

Parameters:

  • chart_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the field name defining the charts e.g. a chart_field_name of 'Country' might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.

  • chart_sort_order (SortOrder, default: VALUE ) –

    define order of charts e.g. SortOrder.VALUE or SortOrder.CUSTOM
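
The source for this class appears to take a single x-axis range and a single set of bin labels from the intermediate spec and reuse them for every chart. The sketch below illustrates why that helps: binning each group against shared edges keeps the histograms directly comparable. It is a plain-Python illustration under assumptions (equal-width bins, a hypothetical helper shared_bin_counts, made-up values), not the sofastats implementation.

```python
def shared_bin_counts(vals_by_chart, n_bins):
    """Bin every chart's values against one shared set of equal-width bins."""
    all_vals = [val for vals in vals_by_chart.values() for val in vals]
    lo, hi = min(all_vals), max(all_vals)
    bin_width = (hi - lo) / n_bins  ## assumes hi > lo
    counts_by_chart = {}
    for chart_val, vals in vals_by_chart.items():
        counts = [0] * n_bins
        for val in vals:
            ## the overall maximum goes in the last bin
            idx = min(int((val - lo) / bin_width), n_bins - 1)
            counts[idx] += 1
        counts_by_chart[chart_val] = counts
    return (lo, hi), counts_by_chart

x_range, counts_by_chart = shared_bin_counts(
    {'NZ': [1, 2, 2, 3], 'USA': [5, 7, 8, 9]}, n_bins=4)
print(x_range, counts_by_chart)
```

Because both groups share the range 1 to 9, the 'NZ' values fall in the left-hand bins and the 'USA' values in the right-hand bins of the same axis.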

Source code in src/sofastats/output/charts/histogram.py
@dataclass(frozen=False)
class MultiChartHistogramChartDesign(CommonDesign):
    """
    Args:
        chart_field_name: the field name defining the charts e.g. a `chart_field_name` of 'Country'
            might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.
        chart_sort_order: define order of charts e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
    """
    field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    chart_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    chart_sort_order: SortOrder = SortOrder.VALUE

    show_borders: bool = False
    show_n_records: bool = True
    show_normal_curve: bool = True
    x_axis_font_size: int = 12

    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = get_by_chart_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            field_name=self.field_name,
            chart_field_name=self.chart_field_name,
            sort_orders=self.sort_orders,
            chart_sort_order=self.chart_sort_order,
            table_filter_sql=self.table_filter_sql,
            decimal_points=self.decimal_points,
        )
        x_axis_min_val, x_axis_max_val = intermediate_charting_spec.to_x_axis_range()
        ## charts details
        indiv_chart_specs = intermediate_charting_spec.to_indiv_chart_specs()
        charting_spec = HistoChartingSpec(
            bin_labels=intermediate_charting_spec.to_bin_labels(),
            indiv_chart_specs=indiv_chart_specs,
            show_borders=self.show_borders,
            show_n_records=self.show_n_records,
            show_normal_curve=self.show_normal_curve,
            var_label=intermediate_charting_spec.field_name,
            x_axis_font_size=self.x_axis_font_size,
            x_axis_max_val=x_axis_max_val,
            x_axis_min_val=x_axis_min_val,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )

Line Charts

See CommonDesign for the parameters common to all output design dataclasses in sofastats - for example, style_name.

See LineChartDesign for the parameters configuring individual line chart designs.

sofastats.output.charts.line.LineChartDesign dataclass

Bases: CommonDesign

Parameters:

  • category_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    name of field in the x-axis

  • category_sort_order (SortOrder, default: VALUE ) –

    define order of categories in each chart e.g. SortOrder.VALUE or SortOrder.CUSTOM

  • is_time_series (bool, default: False ) –

    space x-axis labels according to time e.g. there might be variable gaps between items

  • show_major_ticks_only (bool, default: True ) –

    suppress minor ticks

  • show_markers (bool, default: True ) –

    show markers on the line

  • show_smooth_line (bool, default: False ) –

    if True also show smoothed version of line

  • show_trend_line (bool, default: False ) –

    if True also show trend line

  • rotate_x_labels (bool, default: False ) –

    make x-axis labels vertical

  • show_n_records (bool, default: True ) –

    show the number of records the chart is based on

  • x_axis_font_size (int, default: 12 ) –

    font size for x-axis labels

  • y_axis_title (str, default: 'Freq' ) –

    title displayed vertically alongside y-axis
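
To give a feel for what show_trend_line and show_smooth_line add to a chart, the sketch below computes a least-squares straight line and a centred moving average over a short series of frequencies. Both are generic techniques chosen for illustration. The smoothing and trend methods sofastats actually applies are not stated here, and trend_line and moving_average are hypothetical helpers.

```python
def trend_line(ys):
    """Least-squares straight line through (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    slope = (
        sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        / sum((x - x_mean) ** 2 for x in xs)
    )
    intercept = y_mean - slope * x_mean
    return [intercept + slope * x for x in xs]

def moving_average(ys, window=3):
    """Centred moving average; the window shrinks at the ends of the series."""
    half = window // 2
    smoothed = []
    for idx in range(len(ys)):
        neighbours = ys[max(0, idx - half): idx + half + 1]
        smoothed.append(sum(neighbours) / len(neighbours))
    return smoothed

freqs = [10, 12, 11, 15, 17]
trend = trend_line(freqs)
smooth = moving_average(freqs)
print(trend)
print(smooth)
```

The trend line summarises the overall direction with a single slope, while the moving average follows local rises and dips with less jitter than the raw line.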

Source code in src/sofastats/output/charts/line.py
@dataclass(frozen=False)
class LineChartDesign(CommonDesign):
    """
    Args:
        category_field_name: name of field in the x-axis
        category_sort_order: define order of categories in each chart e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
        is_time_series: space x-axis labels according to time e.g. there might be variable gaps between items
        show_major_ticks_only: suppress minor ticks
        show_markers: show markers on the line
        show_smooth_line: if `True` also show smoothed version of line
        show_trend_line: if `True` also show trend line
        rotate_x_labels: make x-axis labels vertical
        show_n_records: show the number of records the chart is based on
        x_axis_font_size: font size for x-axis labels
        y_axis_title: title displayed vertically alongside y-axis
    """
    category_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    category_sort_order: SortOrder = SortOrder.VALUE

    is_time_series: bool = False
    show_major_ticks_only: bool = True
    show_markers: bool = True
    show_smooth_line: bool = False
    show_trend_line: bool = False
    rotate_x_labels: bool = False
    show_n_records: bool = True
    x_axis_font_size: int = 12
    y_axis_title: str = 'Freq'

    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = get_by_category_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            category_field_name=self.category_field_name,
            sort_orders=self.sort_orders,
            category_sort_order=self.category_sort_order,
            table_filter_sql=self.table_filter_sql,
            decimal_points=self.decimal_points,
        )
        ## chart details
        charting_spec = LineChartingSpec(
            categories=intermediate_charting_spec.sorted_categories,
            indiv_chart_specs=[intermediate_charting_spec.to_indiv_chart_spec(), ],
            series_legend_label=None,
            rotate_x_labels=self.rotate_x_labels,
            show_n_records=self.show_n_records,
            is_time_series=self.is_time_series,
            show_major_ticks_only=self.show_major_ticks_only,
            show_markers=self.show_markers,
            show_smooth_line=self.show_smooth_line,
            show_trend_line=self.show_trend_line,
            x_axis_font_size=self.x_axis_font_size,
            x_axis_title=intermediate_charting_spec.category_field_name,
            y_axis_title=self.y_axis_title,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
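
The effect of `category_sort_order` can be illustrated with a small self-contained sketch. The `SortOrder` class below is a stand-in that mirrors the documented enum values, and `sort_categories` is a hypothetical helper written for this illustration; it is not the sofastats implementation, and its alphabetical tie-breaking is an assumption.

```python
from collections import Counter
from enum import Enum


class SortOrder(str, Enum):
    """Stand-in mirroring the documented sofastats SortOrder values."""
    CUSTOM = 'by custom order'
    DECREASING = 'by decreasing frequency'
    INCREASING = 'by increasing frequency'
    VALUE = 'by value'


def sort_categories(values, sort_order, custom_order=None):
    """Hypothetical helper: return the distinct categories in the requested order."""
    freqs = Counter(values)
    if sort_order == SortOrder.VALUE:
        return sorted(freqs)  # plain alphabetical sort of the category values
    if sort_order == SortOrder.CUSTOM:
        if custom_order is None:
            raise ValueError("CUSTOM sorting needs an explicit order")
        # keep only the configured values that actually occur in the data
        return [val for val in custom_order if val in freqs]
    reverse = sort_order == SortOrder.DECREASING
    # sort alphabetically first so ties in frequency come out in a stable order
    return sorted(sorted(freqs), key=lambda val: freqs[val], reverse=reverse)


age_groups = ['<20', '20-39', '20-39', '40-59', '40-59', '40-59', '80+']
by_value = sort_categories(age_groups, SortOrder.VALUE)
by_custom = sort_categories(
    age_groups, SortOrder.CUSTOM,
    custom_order=['<20', '20-39', '40-59', '60-79', '80+'])
by_decreasing = sort_categories(age_groups, SortOrder.DECREASING)
print(by_value)       # ['20-39', '40-59', '80+', '<20']
print(by_custom)      # ['<20', '20-39', '40-59', '80+']
print(by_decreasing)  # ['40-59', '20-39', '80+', '<20']
```

Sorting by value puts '<20' last, which is the situation a CUSTOM order is meant to fix.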

sofastats.output.charts.line.MultiChartLineChartDesign dataclass

Bases: CommonDesign

Parameters:

  • chart_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the field name defining the charts e.g. a chart_field_name of 'Country' might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.

  • chart_sort_order (SortOrder, default: VALUE ) –

    define order of charts e.g. SortOrder.VALUE or SortOrder.CUSTOM

Source code in src/sofastats/output/charts/line.py
@dataclass(frozen=False)
class MultiChartLineChartDesign(CommonDesign):
    """
    Args:
        chart_field_name: the field name defining the charts e.g. a `chart_field_name` of 'Country'
            might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.
        chart_sort_order: define order of charts e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
    """
    category_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    category_sort_order: SortOrder = SortOrder.VALUE
    chart_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    chart_sort_order: SortOrder = SortOrder.VALUE

    is_time_series: bool = False
    show_major_ticks_only: bool = True
    show_markers: bool = True
    show_smooth_line: bool = False
    show_trend_line: bool = False
    rotate_x_labels: bool = False
    show_n_records: bool = True
    x_axis_font_size: int = 12
    y_axis_title: str = 'Freq'

    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = get_by_chart_category_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            category_field_name=self.category_field_name,
            chart_field_name=self.chart_field_name,
            sort_orders=self.sort_orders,
            category_sort_order=self.category_sort_order,
            chart_sort_order=self.chart_sort_order,
            table_filter_sql=self.table_filter_sql,
            decimal_points=self.decimal_points,
        )
        ## chart details
        charting_spec = LineChartingSpec(
            categories=intermediate_charting_spec.sorted_categories,
            indiv_chart_specs=intermediate_charting_spec.to_indiv_chart_specs(),
            series_legend_label=None,
            rotate_x_labels=self.rotate_x_labels,
            show_n_records=self.show_n_records,
            is_time_series=self.is_time_series,
            show_major_ticks_only=self.show_major_ticks_only,
            show_markers=self.show_markers,
            show_smooth_line=self.show_smooth_line,
            show_trend_line=self.show_trend_line,
            x_axis_font_size=self.x_axis_font_size,
            x_axis_title=intermediate_charting_spec.category_field_name,
            y_axis_title=self.y_axis_title,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
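
The line chart designs share an `is_time_series` flag, described as spacing x-axis labels according to time so that uneven gaps between items are visible. A minimal sketch of that idea follows, assuming the categories are dates; `time_positions` is a hypothetical helper, and how sofastats actually positions labels is not shown in this documentation.

```python
from datetime import date


def time_positions(dates):
    """Hypothetical helper: map dates to x positions in [0, 1], proportional to elapsed days."""
    ordinals = [d.toordinal() for d in dates]
    start, end = min(ordinals), max(ordinals)
    span = end - start
    if span == 0:
        # all items share one date, so there is nothing to space out
        return [0.0 for _ in ordinals]
    return [(ordinal - start) / span for ordinal in ordinals]


dates = [date(2024, 1, 1), date(2024, 1, 11), date(2024, 1, 21), date(2024, 4, 10)]
positions = time_positions(dates)
for d, pos in zip(dates, positions):
    print(d.isoformat(), round(pos, 2))
```

The first three dates are ten days apart and sit close together, while the last one lands at the far end of the axis. With evenly spaced category labels that gap would not be visible.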

sofastats.output.charts.line.MultiLineChartDesign dataclass

Bases: CommonDesign

Parameters:

  • series_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the field name defining the series e.g. a series_field_name of 'Country' might generate separate lines with different colours for 'USA', 'NZ', 'Denmark', and 'South Korea'.

  • series_sort_order (SortOrder, default: VALUE ) –

    define order of series in legend e.g. SortOrder.VALUE or SortOrder.CUSTOM

Source code in src/sofastats/output/charts/line.py
@dataclass(frozen=False)
class MultiLineChartDesign(CommonDesign):
    """
    Args:
        series_field_name: the field name defining the series e.g. a `series_field_name` of 'Country'
            might generate separate lines with different colours for 'USA', 'NZ', 'Denmark', and 'South Korea'.
        series_sort_order: define order of series in legend e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
    """
    category_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    category_sort_order: SortOrder = SortOrder.VALUE
    series_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    series_sort_order: SortOrder = SortOrder.VALUE

    is_time_series: bool = False
    show_major_ticks_only: bool = True
    show_markers: bool = True
    show_smooth_line: bool = False
    show_trend_line: bool = False
    rotate_x_labels: bool = False
    show_n_records: bool = True
    x_axis_font_size: int = 12
    y_axis_title: str = 'Freq'

    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = get_by_series_category_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            category_field_name=self.category_field_name, series_field_name=self.series_field_name,
            sort_orders=self.sort_orders,
            category_sort_order=self.category_sort_order, series_sort_order=self.series_sort_order,
            table_filter_sql=self.table_filter_sql,
            decimal_points=self.decimal_points,
        )
        ## chart details
        charting_spec = LineChartingSpec(
            categories=intermediate_charting_spec.sorted_categories,
            indiv_chart_specs=[intermediate_charting_spec.to_indiv_chart_spec(), ],
            series_legend_label=intermediate_charting_spec.series_field_name,
            rotate_x_labels=self.rotate_x_labels,
            show_n_records=self.show_n_records,
            is_time_series=self.is_time_series,
            show_major_ticks_only=self.show_major_ticks_only,
            show_markers=self.show_markers,
            show_smooth_line=self.show_smooth_line,
            show_trend_line=self.show_trend_line,
            x_axis_font_size=self.x_axis_font_size,
            x_axis_title=intermediate_charting_spec.category_field_name,
            y_axis_title=self.y_axis_title,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )

sofastats.output.charts.line.MultiChartMultiLineChartDesign dataclass

Bases: CommonDesign

Parameters:

  • series_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the field name defining the series e.g. a series_field_name of 'Country' might generate separate lines with different colours for 'USA', 'NZ', 'Denmark', and 'South Korea'.

  • series_sort_order (SortOrder, default: VALUE ) –

    define order of series in legend e.g. SortOrder.VALUE or SortOrder.CUSTOM

  • chart_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the field name defining the charts e.g. a chart_field_name of 'Country' might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.

  • chart_sort_order (SortOrder, default: VALUE ) –

    define order of charts e.g. SortOrder.VALUE or SortOrder.CUSTOM

Source code in src/sofastats/output/charts/line.py
@dataclass(frozen=False)
class MultiChartMultiLineChartDesign(CommonDesign):
    """
    Args:
        series_field_name: the field name defining the series e.g. a `series_field_name` of 'Country'
            might generate separate lines with different colours for 'USA', 'NZ', 'Denmark', and 'South Korea'.
        series_sort_order: define order of series in legend e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
        chart_field_name: the field name defining the charts e.g. a `chart_field_name` of 'Country'
            might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.
        chart_sort_order: define order of charts e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
    """
    category_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    category_sort_order: SortOrder = SortOrder.VALUE
    series_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    series_sort_order: SortOrder = SortOrder.VALUE
    chart_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    chart_sort_order: SortOrder = SortOrder.VALUE

    is_time_series: bool = False
    show_major_ticks_only: bool = True
    show_markers: bool = True
    show_smooth_line: bool = False
    show_trend_line: bool = False
    rotate_x_labels: bool = False
    show_n_records: bool = True
    x_axis_font_size: int = 12
    y_axis_title: str = 'Freq'

    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = get_by_chart_series_category_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            category_field_name=self.category_field_name,
            series_field_name=self.series_field_name,
            chart_field_name=self.chart_field_name,
            sort_orders=self.sort_orders,
            category_sort_order=self.category_sort_order,
            series_sort_order=self.series_sort_order,
            chart_sort_order=self.chart_sort_order,
            table_filter_sql=self.table_filter_sql,
            decimal_points=self.decimal_points,
        )
        ## chart details
        charting_spec = LineChartingSpec(
            categories=intermediate_charting_spec.sorted_categories,
            indiv_chart_specs=intermediate_charting_spec.to_indiv_chart_specs(),
            series_legend_label=intermediate_charting_spec.series_field_name,
            rotate_x_labels=self.rotate_x_labels,
            show_n_records=self.show_n_records,
            is_time_series=self.is_time_series,
            show_major_ticks_only=self.show_major_ticks_only,
            show_markers=self.show_markers,
            show_smooth_line=self.show_smooth_line,
            show_trend_line=self.show_trend_line,
            x_axis_font_size=self.x_axis_font_size,
            x_axis_title=intermediate_charting_spec.category_field_name,
            y_axis_title=self.y_axis_title,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
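
In `MultiChartMultiLineChartDesign` the three field names play different roles: `chart_field_name` splits the output into charts, `series_field_name` splits each chart into lines, and `category_field_name` supplies the points along the x-axis. The sketch below shows that three-level grouping on a few made-up records. The helper and the field names ('Country', 'Gender', 'Year') are hypothetical; the real data step is `get_by_chart_series_category_charting_spec`, whose internals are not shown here.

```python
from collections import Counter, defaultdict

records = [
    {'Country': 'NZ', 'Gender': 'Female', 'Year': '2023'},
    {'Country': 'NZ', 'Gender': 'Female', 'Year': '2023'},
    {'Country': 'NZ', 'Gender': 'Male', 'Year': '2023'},
    {'Country': 'NZ', 'Gender': 'Female', 'Year': '2024'},
    {'Country': 'USA', 'Gender': 'Male', 'Year': '2023'},
    {'Country': 'USA', 'Gender': 'Male', 'Year': '2024'},
    {'Country': 'USA', 'Gender': 'Male', 'Year': '2024'},
]


def freqs_by_chart_and_series(rows, chart_field, series_field, category_field):
    """Hypothetical helper: build {chart: {series: Counter(category -> frequency)}}."""
    nested = defaultdict(lambda: defaultdict(Counter))
    for row in rows:
        nested[row[chart_field]][row[series_field]][row[category_field]] += 1
    return nested


nested = freqs_by_chart_and_series(
    records, chart_field='Country', series_field='Gender', category_field='Year')
for chart in sorted(nested):  # one chart per country
    for series in sorted(nested[chart]):  # one line per gender within the chart
        points = sorted(nested[chart][series].items())  # x-axis categories with frequencies
        print(chart, series, points)
```

Dropping the chart level gives the shape needed by `MultiLineChartDesign`, and dropping the series level gives the shape needed by `MultiChartLineChartDesign`.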

Pie Charts

See CommonDesign for the parameters common to all output design dataclasses in sofastats - for example, style_name.

See PieChartDesign for the parameters configuring individual pie chart designs.
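
As a rough illustration of what a pie chart design works from, the sketch below counts records per category and converts the counts to slice percentages, alongside the record count that `show_n_records` would report. `slice_percentages` is a hypothetical helper and the 'Left' / 'Right' values are made up; the library's own calculation is not shown in this documentation.

```python
from collections import Counter


def slice_percentages(values):
    """Hypothetical helper: percentage of records per category, plus the record count."""
    freqs = Counter(values)
    n_records = sum(freqs.values())
    pcts = {category: 100 * freq / n_records for category, freq in sorted(freqs.items())}
    return pcts, n_records


handedness = ['Left', 'Right', 'Right', 'Right']
pcts, n_records = slice_percentages(handedness)
print(pcts)       # {'Left': 25.0, 'Right': 75.0}
print(n_records)  # 4
```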

sofastats.output.charts.pie.PieChartDesign dataclass

Bases: CommonDesign

Parameters:

  • category_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    name of field defining the pie slices

  • category_sort_order (SortOrder, default: VALUE ) –

    define order of categories in each chart e.g. SortOrder.VALUE or SortOrder.CUSTOM

  • show_n_records (bool, default: True ) –

    show the number of records the chart is based on

Source code in src/sofastats/output/charts/pie.py
@dataclass(frozen=False)
class PieChartDesign(CommonDesign):
    """
    Args:
        category_field_name: name of field defining the pie slices
        category_sort_order: define order of categories in each chart e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
        show_n_records: show the number of records the chart is based on
    """
    category_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    category_sort_order: SortOrder = SortOrder.VALUE

    show_n_records: bool = True

    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = get_by_category_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            category_field_name=self.category_field_name,
            sort_orders=self.sort_orders, category_sort_order=self.category_sort_order,
            table_filter_sql=self.table_filter_sql)
        ## charts details
        charting_spec = PieChartingSpec(
            categories=intermediate_charting_spec.sorted_categories,
            indiv_chart_specs=[intermediate_charting_spec.to_indiv_chart_spec(), ],
            show_n_records=self.show_n_records,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )

sofastats.output.charts.pie.MultiChartPieChartDesign dataclass

Bases: CommonDesign

Parameters:

  • chart_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the field name defining the charts e.g. a chart_field_name of 'Country' might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.

  • chart_sort_order (SortOrder, default: VALUE ) –

    define order of charts e.g. SortOrder.VALUE or SortOrder.CUSTOM

Source code in src/sofastats/output/charts/pie.py
@dataclass(frozen=False)
class MultiChartPieChartDesign(CommonDesign):
    """
    Args:
        chart_field_name: the field name defining the charts e.g. a `chart_field_name` of 'Country'
            might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.
        chart_sort_order: define order of charts e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
    """
    category_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    category_sort_order: SortOrder = SortOrder.VALUE
    chart_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    chart_sort_order: SortOrder = SortOrder.VALUE

    show_n_records: bool = True

    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = get_by_chart_category_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            category_field_name=self.category_field_name, chart_field_name=self.chart_field_name,
            sort_orders=self.sort_orders,
            category_sort_order=self.category_sort_order, chart_sort_order=self.chart_sort_order,
            table_filter_sql=self.table_filter_sql)
        ## charts details
        charting_spec = PieChartingSpec(
            categories=intermediate_charting_spec.sorted_categories,
            indiv_chart_specs=intermediate_charting_spec.to_indiv_chart_specs(),
            show_n_records=self.show_n_records,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )

Scatter Plots

See CommonDesign for the parameters common to all output design dataclasses in sofastats - for example, style_name.

See SimpleScatterChartDesign for the parameters configuring individual scatter plot chart designs.

sofastats.output.charts.scatter_plot.SimpleScatterChartDesign dataclass

Bases: CommonDesign

Parameters:

  • x_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    field defining the x value of each x-y pair

  • y_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    field defining the y value of each x-y pair

  • show_dot_borders (bool, default: True ) –

    if True show borders around individual dots

  • show_n_records (bool, default: True ) –

    show the number of records the chart is based on

  • show_regression_line (bool, default: True ) –

    if True show regression line of best fit

  • x_axis_font_size (int, default: 10 ) –

    font size for x-axis labels
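
When `show_regression_line` is enabled the chart includes a regression line of best fit. The sketch below shows one standard way to get such a line, ordinary least squares, using only the standard library. `line_of_best_fit` is a hypothetical helper; whether sofastats uses exactly this calculation is an assumption, as the fitting code is not shown in this documentation.

```python
def line_of_best_fit(xs, ys):
    """Ordinary least-squares slope and intercept for paired x-y values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two x-y pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # sum of squared deviations of x, and of cross-products of x and y deviations
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    if ss_xx == 0:
        raise ValueError("x values must not all be identical")
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = ss_xy / ss_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = line_of_best_fit(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

The line is then drawn from the smallest to the largest x value using `y = slope * x + intercept`.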

Source code in src/sofastats/output/charts/scatter_plot.py
@dataclass(frozen=False)
class SimpleScatterChartDesign(CommonDesign):
    """
    Args:
        x_field_name: field defining the x value of each x-y pair
        y_field_name: field defining the y value of each x-y pair
        show_dot_borders: if `True` show borders around individual dots
        show_n_records: show the number of records the chart is based on
        show_regression_line: if `True` show regression line of best fit
        x_axis_font_size: font size for x-axis labels
    """
    x_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    y_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY

    show_dot_borders: bool = True
    show_n_records: bool = True
    show_regression_line: bool = True
    x_axis_font_size: int = 10

    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = get_by_xy_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            x_field_name=self.x_field_name, y_field_name=self.y_field_name,
            table_filter_sql=self.table_filter_sql)
        ## charts details
        indiv_chart_specs = intermediate_charting_spec.to_indiv_chart_specs()
        charting_spec = ScatterChartingSpec(
            indiv_chart_specs=indiv_chart_specs,
            series_legend_label=None,
            show_dot_borders=self.show_dot_borders,
            show_n_records=self.show_n_records,
            show_regression_line=self.show_regression_line,
            x_axis_font_size=self.x_axis_font_size,
            x_axis_title=intermediate_charting_spec.x_field_name,
            y_axis_title=intermediate_charting_spec.y_field_name,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )

sofastats.output.charts.scatter_plot.MultiChartScatterChartDesign dataclass

Bases: CommonDesign

Parameters:

  • chart_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the field name defining the charts e.g. a chart_field_name of 'Country' might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.

  • chart_sort_order (SortOrder, default: VALUE ) –

    define order of charts e.g. SortOrder.VALUE or SortOrder.CUSTOM

Source code in src/sofastats/output/charts/scatter_plot.py
@dataclass(frozen=False)
class MultiChartScatterChartDesign(CommonDesign):
    """
    Args:
        chart_field_name: the field name defining the charts e.g. a `chart_field_name` of 'Country'
            might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.
        chart_sort_order: define order of charts e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
    """
    x_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    y_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    chart_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    chart_sort_order: SortOrder = SortOrder.VALUE

    show_dot_borders: bool = True
    show_n_records: bool = True
    show_regression_line: bool = True
    x_axis_font_size: int = 10

    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = get_by_chart_xy_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            x_field_name=self.x_field_name, y_field_name=self.y_field_name,
            chart_field_name=self.chart_field_name,
            sort_orders=self.sort_orders,
            chart_sort_order=self.chart_sort_order,
            table_filter_sql=self.table_filter_sql)
        ## charts details
        indiv_chart_specs = intermediate_charting_spec.to_indiv_chart_specs()
        charting_spec = ScatterChartingSpec(
            indiv_chart_specs=indiv_chart_specs,
            series_legend_label=None,
            show_dot_borders=self.show_dot_borders,
            show_n_records=self.show_n_records,
            show_regression_line=self.show_regression_line,
            x_axis_font_size=self.x_axis_font_size,
            x_axis_title=intermediate_charting_spec.x_field_name,
            y_axis_title=intermediate_charting_spec.y_field_name,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )

sofastats.output.charts.scatter_plot.BySeriesScatterChartDesign dataclass

Bases: CommonDesign

Parameters:

  • series_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the field name defining the series e.g. a series_field_name of 'Country' might generate different coloured dots for 'USA', 'NZ', 'Denmark', and 'South Korea'.

  • series_sort_order (SortOrder, default: VALUE ) –

    define order of series in the legend e.g. SortOrder.VALUE or SortOrder.CUSTOM

Source code in src/sofastats/output/charts/scatter_plot.py
@dataclass(frozen=False)
class BySeriesScatterChartDesign(CommonDesign):
    """
    Args:
        series_field_name: the field name defining the series e.g. a `series_field_name` of 'Country'
            might generate different coloured dots for 'USA', 'NZ', 'Denmark', and 'South Korea'.
        series_sort_order: define order of series in the legend e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
    """
    x_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    y_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    series_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    series_sort_order: SortOrder = SortOrder.VALUE

    show_dot_borders: bool = True
    show_n_records: bool = True
    show_regression_line: bool = True
    x_axis_font_size: int = 10

    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = get_by_series_xy_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            x_field_name=self.x_field_name, y_field_name=self.y_field_name,
            series_field_name=self.series_field_name,
            sort_orders=self.sort_orders,
            series_sort_order=self.series_sort_order,
            table_filter_sql=self.table_filter_sql)
        ## charts details
        indiv_chart_specs = intermediate_charting_spec.to_indiv_chart_specs()
        charting_spec = ScatterChartingSpec(
            indiv_chart_specs=indiv_chart_specs,
            series_legend_label=intermediate_charting_spec.series_field_name,
            show_dot_borders=self.show_dot_borders,
            show_n_records=self.show_n_records,
            show_regression_line=self.show_regression_line,
            x_axis_font_size=self.x_axis_font_size,
            x_axis_title=intermediate_charting_spec.x_field_name,
            y_axis_title=intermediate_charting_spec.y_field_name,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )

sofastats.output.charts.scatter_plot.MultiChartBySeriesScatterChartDesign dataclass

Bases: CommonDesign

Parameters:

  • series_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the field name defining the series e.g. a series_field_name of 'Country' might generate different coloured dots for 'USA', 'NZ', 'Denmark', and 'South Korea'.

  • series_sort_order (SortOrder, default: VALUE ) –

    define order of series in the legend e.g. SortOrder.VALUE or SortOrder.CUSTOM

  • chart_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the field name defining the charts e.g. a chart_field_name of 'Country' might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.

  • chart_sort_order (SortOrder, default: VALUE ) –

    define order of charts e.g. SortOrder.VALUE or SortOrder.CUSTOM

Source code in src/sofastats/output/charts/scatter_plot.py
@dataclass(frozen=False)
class MultiChartBySeriesScatterChartDesign(CommonDesign):
    """
    Args:
        series_field_name: the field name defining the series e.g. a `series_field_name` of 'Country'
            might generate different coloured dots for 'USA', 'NZ', 'Denmark', and 'South Korea'.
        series_sort_order: define order of series in the legend e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
        chart_field_name: the field name defining the charts e.g. a `chart_field_name` of 'Country'
            might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.
        chart_sort_order: define order of charts e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
    """
    x_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    y_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    series_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    series_sort_order: SortOrder = SortOrder.VALUE
    chart_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    chart_sort_order: SortOrder = SortOrder.VALUE

    show_dot_borders: bool = True
    show_n_records: bool = True
    show_regression_line: bool = True
    x_axis_font_size: int = 10

    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = get_by_chart_series_xy_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            x_field_name=self.x_field_name, y_field_name=self.y_field_name,
            series_field_name=self.series_field_name, chart_field_name=self.chart_field_name,
            sort_orders=self.sort_orders,
            series_sort_order=self.series_sort_order, chart_sort_order=self.chart_sort_order,
            table_filter_sql=self.table_filter_sql)
        ## charts details
        indiv_chart_specs = intermediate_charting_spec.to_indiv_chart_specs()
        charting_spec = ScatterChartingSpec(
            indiv_chart_specs=indiv_chart_specs,
            series_legend_label=intermediate_charting_spec.series_field_name,
            show_dot_borders=self.show_dot_borders,
            show_n_records=self.show_n_records,
            show_regression_line=self.show_regression_line,
            x_axis_font_size=self.x_axis_font_size,
            x_axis_title=intermediate_charting_spec.x_field_name,
            y_axis_title=intermediate_charting_spec.y_field_name,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )

Tables

See CommonDesign for the parameters common to all output design dataclasses in sofastats - for example, style_name.

DimensionSpec defines the main parameters of both the Row and Column table dimensions. The only parameter Row and Column add is the appropriate setting for is_col.
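
Dimensions nest through the `child` field, forming a chain rather than a tree. The sketch below uses `MiniDimensionSpec`, a trimmed stand-in written for this illustration that copies only `variable_name`, `has_total`, `child`, and a few of the descendant properties from the `DimensionSpec` source. It is not the library class and leaves out `is_col`, `pct_metrics`, and `sort_order`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MiniDimensionSpec:
    """Trimmed stand-in for DimensionSpec: only the fields needed to show nesting."""
    variable_name: str
    has_total: bool = False
    child: Optional["MiniDimensionSpec"] = None

    @property
    def self_and_descendants(self) -> list:
        """This spec followed by every spec nested underneath it."""
        dims = [self]
        if self.child:
            dims.extend(self.child.self_and_descendants)
        return dims

    @property
    def self_and_descendant_vars(self) -> list:
        """Variable names for this spec and everything nested underneath it."""
        return [dim.variable_name for dim in self.self_and_descendants]

    @property
    def self_and_descendant_totalled_vars(self) -> list:
        """Variable names in the chain that have a total."""
        return [dim.variable_name for dim in self.self_and_descendants if dim.has_total]


# 'Age Group' > 'Handedness' > 'Home Location Type'
row = MiniDimensionSpec(
    'Age Group', has_total=True,
    child=MiniDimensionSpec(
        'Handedness',
        child=MiniDimensionSpec('Home Location Type', has_total=True)))

print(row.self_and_descendant_vars)
# ['Age Group', 'Handedness', 'Home Location Type']
print(row.self_and_descendant_totalled_vars)
# ['Age Group', 'Home Location Type']
```

Because each spec has at most one child, the chain can always be read top to bottom as a single list.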

sofastats.output.tables.interfaces.DimensionSpec dataclass

Parameters:

  • variable_name (str) –

    name of variable

  • has_total (bool, default: False ) –

    if True add a total

  • is_col (bool, default: False ) –

    if True is a column

  • pct_metrics (Collection[Metric] | None, default: None ) –

    define which metrics to display - options: Metric.ROW_PCT and Metric.COL_PCT

  • sort_order (SortOrder | str, default: VALUE ) –

    sort order of variable

  • child (Self | None, default: None ) –

    a child DimensionSpec if nesting underneath
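
The `pct_metrics` options correspond to the two usual ways of percentaging a cross-tabulation. The sketch below computes both from a small table of made-up cell counts, taking a row percentage as a cell's share of its row total and a column percentage as its share of its column total. `row_and_col_pcts` is a hypothetical helper; how sofastats computes `Metric.ROW_PCT` and `Metric.COL_PCT` internally is not shown in this documentation.

```python
# cell counts keyed by (row value, column value); the values are made up
counts = {
    ('Left', 'Rural'): 10, ('Left', 'Urban'): 30,
    ('Right', 'Rural'): 40, ('Right', 'Urban'): 120,
}


def row_and_col_pcts(cell_counts):
    """Hypothetical helper: row and column percentages for each cell.

    Assumes every row total and column total is non-zero.
    """
    row_totals, col_totals = {}, {}
    for (row, col), n in cell_counts.items():
        row_totals[row] = row_totals.get(row, 0) + n
        col_totals[col] = col_totals.get(col, 0) + n
    row_pcts = {key: 100 * n / row_totals[key[0]] for key, n in cell_counts.items()}
    col_pcts = {key: 100 * n / col_totals[key[1]] for key, n in cell_counts.items()}
    return row_pcts, col_pcts


row_pcts, col_pcts = row_and_col_pcts(counts)
print(row_pcts[('Left', 'Rural')])  # 25.0 (10 of the 40 'Left' records)
print(col_pcts[('Left', 'Rural')])  # 20.0 (10 of the 50 'Rural' records)
```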

Source code in src/sofastats/output/tables/interfaces.py
@dataclass(frozen=False)
class DimensionSpec:
    """
    Args:
        variable_name: name of variable
        has_total: if `True` add a total
        is_col: if `True` is a column
        pct_metrics: define which metrics to display - options: `Metric.ROW_PCT` and `Metric.COL_PCT`
        sort_order: sort order of variable
        child: a child DimensionSpec if nesting underneath
    """
    variable_name: str
    has_total: bool = False
    is_col: bool = False
    pct_metrics: Collection[Metric] | None = None
    sort_order: SortOrder | str = SortOrder.VALUE
    child: Self | None = None

    @property
    def descendant_vars(self) -> list[str]:
        """
        All variables under, but not including, this DimensionSpec.
        Note - only includes chains, not trees, as a deliberate design choice to avoid excessively complicated tables.
        Tables are for computers to make, but for humans to read and understand :-).
        """
        dim_vars = []
        if self.child:
            dim_vars.append(self.child.variable_name)
            dim_vars.extend(self.child.descendant_vars)
        return dim_vars

    @property
    def self_and_descendants(self) -> list[Self]:
        """
        All DimensionSpecs under, and including, this DimensionSpec.
        """
        dims = [self, ]
        if self.child:
            dims.extend(self.child.self_and_descendants)
        return dims

    @property
    def self_and_descendant_vars(self) -> list[str]:
        """
        All variable names under, and including, this DimensionSpec.
        """
        return [dim.variable_name for dim in self.self_and_descendants]

    @property
    def self_and_descendant_totalled_vars(self) -> list[str]:
        """
        All variables under, and including, this DimensionSpec that are totalled (if any).
        """
        return [dim.variable_name for dim in self.self_and_descendants if dim.has_total]

    @property
    def self_or_descendant_pct_metrics(self) -> Collection[Metric] | None:
        """
        All percentage metrics (row and/or column percentages) under, or for, this DimensionSpec.
        """
        if self.pct_metrics:
            return self.pct_metrics
        elif self.child:
            return self.child.self_or_descendant_pct_metrics
        else:
            return None

    def __post_init__(self):
        if self.pct_metrics:
            if self.child:
                raise ValueError(f"Metrics are only for terminal dimension specs e.g. a > b > c (can have metrics)")
            if not self.is_col:
                raise ValueError(f"Metrics are only for terminal column specs, yet this is a row spec")
        if self.child:
            if not self.is_col == self.child.is_col:
                raise ValueError(f"This dim has a child that is inconsistent e.g. a col parent having a row child")
        if self.variable_name in self.descendant_vars:
            raise ValueError("Variables can't be repeated in the same dimension spec "
                f"e.g. Car > Country > Car. Variable {self.variable_name}")

sofastats.output.tables.interfaces.Row dataclass

Bases: DimensionSpec

Source code in src/sofastats/output/tables/interfaces.py
@dataclass(frozen=False)
class Row(DimensionSpec):

    def __post_init__(self):
        self.is_col = False
        super().__post_init__()

sofastats.output.tables.interfaces.Column dataclass

Bases: DimensionSpec

Source code in src/sofastats/output/tables/interfaces.py
@dataclass(frozen=False)
class Column(DimensionSpec):

    def __post_init__(self):
        self.is_col = True
        super().__post_init__()
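
The validation rules in the listings above (metrics only on terminal column specs, and a parent and child of the same kind) can be tried out with a small sketch. The classes Dim, RowDim, and ColumnDim and the helper error_for are hypothetical stand-ins written for this example, metrics are plain strings rather than Metric members, and the error messages are shortened.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Dim:
    """Hypothetical, cut-down stand-in for DimensionSpec; metrics are plain strings here."""
    variable_name: str
    is_col: bool = False
    pct_metrics: tuple[str, ...] | None = None
    child: Dim | None = None

    def __post_init__(self):
        if self.pct_metrics and self.child:
            raise ValueError("Metrics are only for terminal dimension specs")
        if self.pct_metrics and not self.is_col:
            raise ValueError("Metrics are only for terminal column specs")
        if self.child and self.is_col != self.child.is_col:
            raise ValueError("A parent and its child must both be rows or both be columns")


@dataclass
class RowDim(Dim):
    def __post_init__(self):
        self.is_col = False  ## forced, whatever the caller passed
        super().__post_init__()


@dataclass
class ColumnDim(Dim):
    def __post_init__(self):
        self.is_col = True
        super().__post_init__()


def error_for(make_dim) -> str | None:
    """Return the validation message raised while building a spec, or None if it is valid."""
    try:
        make_dim()
    except ValueError as e:
        return str(e)
    return None


ok = error_for(lambda: ColumnDim('Web Browser', child=ColumnDim('Age Group', pct_metrics=('row %', ))))
metrics_on_row = error_for(lambda: RowDim('Country', pct_metrics=('row %', )))
mixed = error_for(lambda: RowDim('Country', child=ColumnDim('Gender')))
print(ok, metrics_on_row, mixed, sep='\n')
```

Setting is_col before calling the parent check means a row spec with percentage metrics is rejected even though is_col was never passed explicitly.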

sofastats.output.tables.freq.FrequencyTableDesign dataclass

Bases: CommonDesign

Parameters:

  • row_variable_designs (list[Row], default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    list of Rows

  • include_column_percent (bool, default: False ) –

    if True add a column percentage column

Source code in src/sofastats/output/tables/freq.py
@dataclass(frozen=False, kw_only=True)
class FrequencyTableDesign(CommonDesign):
    """
    Args:
        row_variable_designs: list of Rows
        include_column_percent: if `True` add a column percentage column
    """
    row_variable_designs: list[Row] = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY

    include_column_percent: bool = False
    debug: bool = False
    verbose: bool = False

    @property
    def totalled_vars(self) -> list[str]:
        tot_vars = []
        for row_spec in self.row_variable_designs:
            tot_vars.extend(row_spec.self_and_descendant_totalled_vars)
        return tot_vars

    @property
    def max_row_depth(self) -> int:
        max_depth = 0
        for row_spec in self.row_variable_designs:
            row_depth = len(row_spec.self_and_descendant_vars)
            if row_depth > max_depth:
                max_depth = row_depth
        return max_depth

    def __post_init__(self):
        CommonDesign.__post_init__(self)
        row_vars = [spec.variable_name for spec in self.row_variable_designs]
        row_dupes = set()
        seen = set()
        for row_var in row_vars:
            if row_var in seen:
                row_dupes.add(row_var)
            else:
                seen.add(row_var)
        if row_dupes:
            raise ValueError(f"Duplicate top-level variable(s) detected in row dimension - {sorted(row_dupes)}")

    def get_row_df(self, cur, *, row_idx: int, dp: int = 2) -> pd.DataFrame:
        """
        See cross_tab docs
        """
        row_spec = self.row_variable_designs[row_idx]
        totalled_variables = row_spec.self_and_descendant_totalled_vars
        row_vars = row_spec.self_and_descendant_vars
        data = get_data_from_spec(cur, dbe_spec=self.dbe_spec,
            source_table_name=self.source_table_name, table_filter_sql=self.table_filter_sql,
            all_variables=row_vars, totalled_variables=totalled_variables, debug=self.debug)
        n_row_fillers = self.max_row_depth - len(row_vars)
        df = get_all_metrics_df_from_vars(data, row_vars=row_vars, n_row_fillers=n_row_fillers,
            inc_col_pct=self.include_column_percent,
            dp=dp, debug=self.debug)
        return df

    def get_tbl_df(self, cur) -> pd.DataFrame:
        """
        See cross_tab docs
        """
        dfs = [self.get_row_df(cur, row_idx=row_idx, dp=self.decimal_points)
            for row_idx in range(len(self.row_variable_designs))]
        df_t = dfs[0].T
        dfs_remaining = dfs[1:]
        for df_next in dfs_remaining:
            df_t = df_t.join(df_next.T, how='outer')
        df = df_t.T  ## re-transpose back so cols are cols and rows are rows again
        if self.debug: print(f"\nCOMBINED:\n{df}")
        ## Sorting indexes
        raw_df = get_raw_df(cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name)
        order_rules_for_multi_index_branches = get_order_rules_for_multi_index_branches(self.row_variable_designs)
        ## ROWS
        unsorted_row_multi_index_list = list(df.index)
        sorted_row_multi_index_list = get_sorted_multi_index_list(
            unsorted_row_multi_index_list, order_rules_for_multi_index_branches=order_rules_for_multi_index_branches,
            sort_orders=self.sort_orders, raw_df=raw_df, has_metrics=False, debug=self.debug)
        sorted_row_multi_index = pd.MultiIndex.from_tuples(
            sorted_row_multi_index_list)  ## https://pandas.pydata.org/docs/user_guide/advanced.html
        sorted_col_multi_index_list = sorted(
            df.columns, key=lambda metric_label_and_metric: get_metric2order(metric_label_and_metric[1]))
        sorted_col_multi_index = pd.MultiIndex.from_tuples(sorted_col_multi_index_list)
        df = df.reindex(index=sorted_row_multi_index, columns=sorted_col_multi_index)
        if self.debug: print(f"\nORDERED:\n{df}")
        return df

    def to_html_design(self) -> HTMLItemSpec:
        get_tbl_df_for_cur = partial(self.get_tbl_df)
        df = get_tbl_df_for_cur(self.cur)
        pd_styler = set_table_styles(df.style)
        style_spec = get_style_spec(style_name=self.style_name)
        pd_styler = apply_index_styles(df, style_spec, pd_styler, axis='rows')
        pd_styler = apply_index_styles(df, style_spec, pd_styler, axis='columns')
        raw_tbl_html = pd_styler.to_html()
        if self.debug:
            print(raw_tbl_html)
        ## Fix
        html = raw_tbl_html
        html = fix_top_left_box(html, style_spec, debug=self.debug, verbose=self.verbose)
        html = merge_cols_of_blanks(html, debug=self.debug)
        if self.debug:
            print(pd_styler.uuid)  ## A unique identifier to avoid CSS collisions; generated automatically.
            print(html)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.MAIN_TABLE,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
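
In get_row_df above, n_row_fillers is the difference between the deepest row chain and the current one. The following sketch illustrates that padding idea in plain Python. The example data, the helper names, and the filler labels are assumptions made for illustration; sofastats builds its own filler columns internally.

```python
## each inner list is one row chain, top level first (hypothetical example data)
row_chains = [
    ['Country', 'Gender'],
    ['Age Group'],
]

max_row_depth = max(len(chain) for chain in row_chains)


def get_n_row_fillers(chain: list[str]) -> int:
    """Number of placeholder levels needed to make this chain as deep as the deepest one."""
    return max_row_depth - len(chain)


def get_padded_levels(chain: list[str]) -> list[str]:
    ## the filler label is illustrative only
    fillers = [f'row_filler_{i}' for i in range(get_n_row_fillers(chain))]
    return chain + fillers


padded = [get_padded_levels(chain) for chain in row_chains]
print(padded)
```

Padding every chain to the same depth is what lets the per-row dataframes share one index shape when they are combined.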

sofastats.output.tables.cross_tab.CrossTabDesign dataclass

Bases: CommonDesign

Parameters:

  • row_variable_designs (list[Row], default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    list of Rows

  • column_variable_designs (list[Column], default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    list of Columns
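
A CrossTabDesign rejects duplicate top-level variables within a dimension, and any variable that appears in both rows and columns. A minimal sketch of those two checks, using plain lists of variable names in place of Row and Column objects (the function names and data here are hypothetical):

```python
from itertools import product


def get_dupes(variables: list[str]) -> set[str]:
    """Variables that appear more than once."""
    dupes, seen = set(), set()
    for var in variables:
        if var in seen:
            dupes.add(var)
        else:
            seen.add(var)
    return dupes


def get_overlaps(row_chains: list[list[str]], col_chains: list[list[str]]) -> set[str]:
    """Variables used in both a row chain and a column chain."""
    overlaps = set()
    for row_chain, col_chain in product(row_chains, col_chains):
        overlaps |= set(row_chain) & set(col_chain)
    return overlaps


## each inner list is one chain, top level first
row_chains = [['Country', 'Gender'], ['Home Location Type']]
col_chains = [['Age Group'], ['Web Browser', 'Gender']]

top_level_row_dupes = get_dupes([chain[0] for chain in row_chains])
row_col_overlaps = get_overlaps(row_chains, col_chains)
print(top_level_row_dupes, row_col_overlaps)
```

Here the rows have no duplicate top-level variables, but 'Gender' is used in both a row chain and a column chain, so a design like this would be rejected.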

Source code in src/sofastats/output/tables/cross_tab.py
@dataclass(frozen=False, kw_only=True)
class CrossTabDesign(CommonDesign):
    """
    Args:
        row_variable_designs: list of Rows
        column_variable_designs: list of Columns
    """
    row_variable_designs: list[Row] = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    column_variable_designs: list[Column] = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY

    debug: bool = False
    verbose: bool = False

    @staticmethod
    def _get_dupes(_vars: Collection[str]) -> set[str]:
        dupes = set()
        seen = set()
        for var in _vars:
            if var in seen:
                dupes.add(var)
            else:
                seen.add(var)
        return dupes

    @property
    def totalled_vars(self) -> list[str]:
        tot_vars = []
        for row_spec in self.row_variable_designs:
            tot_vars.extend(row_spec.self_and_descendant_totalled_vars)
        for col_spec in self.column_variable_designs:
            tot_vars.extend(col_spec.self_and_descendant_totalled_vars)
        return tot_vars

    def _get_max_dim_depth(self, *, is_col=False) -> int:
        max_depth = 0
        dim_specs = self.column_variable_designs if is_col else self.row_variable_designs
        for dim_spec in dim_specs:
            dim_depth = len(dim_spec.self_and_descendant_vars)
            if dim_depth > max_depth:
                max_depth = dim_depth
        return max_depth

    @property
    def max_row_depth(self) -> int:
        return self._get_max_dim_depth()

    @property
    def max_col_depth(self) -> int:
        return self._get_max_dim_depth(is_col=True)

    def __post_init__(self):
        CommonDesign.__post_init__(self)
        row_dupes = CrossTabDesign._get_dupes([spec.variable_name for spec in self.row_variable_designs])
        if row_dupes:
            raise ValueError(f"Duplicate top-level variable(s) detected in row dimension - {sorted(row_dupes)}")
        col_dupes = CrossTabDesign._get_dupes([spec.variable_name for spec in self.column_variable_designs])
        if col_dupes:
            raise ValueError(f"Duplicate top-level variable(s) detected in column dimension - {sorted(col_dupes)}")
        ## var can't be in both row and col e.g. car vs country > car
        for row_spec, col_spec in product(self.row_variable_designs, self.column_variable_designs):
            row_spec_vars = set([row_spec.variable_name] + row_spec.descendant_vars)
            col_spec_vars = set([col_spec.variable_name] + col_spec.descendant_vars)
            overlapping_vars = row_spec_vars.intersection(col_spec_vars)
            if overlapping_vars:
                raise ValueError("Variables can't appear in both rows and columns. "
                    f"Found the following overlapping variable(s): {', '.join(overlapping_vars)}")

    def get_df_from_row_spec(self, cur, *, row_spec_idx: int) -> pd.DataFrame:
        """
        Get a combined df for, e.g. the combined top df. Or the middle df. Or the bottom df. Or whatever you have.
        e.g.
        row_variables_design_1 = Row(variable_name='country', has_total=True,
            child=Row(variable_name='gender', has_total=True))
        vs
        column_variables_design_1 = Column(variable_name='Age Group', has_total=True)
        column_variables_design_2 = Column(variable_name='Web Browser', has_total=True,
            child=Column(variable_name='Age Group', has_total=True, pct_metrics=[Metric.ROW_PCT, Metric.COL_PCT]))
        column_variables_design_3 = Column(variable_name='Standard Age Group', has_total=True)
        """
        row_spec = self.row_variable_designs[row_spec_idx]
        row_vars = row_spec.self_and_descendant_vars
        n_row_fillers = self.max_row_depth - len(row_vars)
        df_cols = []
        for col_spec in self.column_variable_designs:
            col_vars = col_spec.self_and_descendant_vars
            totalled_variables = row_spec.self_and_descendant_totalled_vars + col_spec.self_and_descendant_totalled_vars
            all_variables = row_vars + col_vars
            data = get_data_from_spec(cur, dbe_spec=self.dbe_spec,
                source_table_name=self.source_table_name, table_filter_sql=self.table_filter_sql,
                all_variables=all_variables, totalled_variables=totalled_variables, debug=self.debug)
            df_col = get_all_metrics_df_from_vars(data, row_vars=row_vars, col_vars=col_vars,
                n_row_fillers=n_row_fillers, n_col_fillers=self.max_col_depth - len(col_vars),
                pct_metrics=col_spec.self_or_descendant_pct_metrics, dp=self.decimal_points, debug=self.debug)
            df_cols.append(df_col)
        df = df_cols[0]
        df_cols_remaining = df_cols[1:]
        row_merge_on = []
        for row_var in row_vars:
            row_merge_on.append(get_pandas_friendly_name(row_var, '_var'))
            row_merge_on.append(row_var)
        for i in range(n_row_fillers):
            row_merge_on.append(f'row_filler_var_{i}')
            row_merge_on.append(f'row_filler_{i}')
        for df_next_col in df_cols_remaining:
            df = df.merge(df_next_col, how='outer', on=row_merge_on)
        return df

    def get_tbl_df(self, cur) -> pd.DataFrame:
        """
        For each row_variable_designs get a completed df and then merge those.

        Note - using pd.concat or df.merge(how='outer') has the same result, but I use merge for horizontal joining
        to avoid repeating the row dimension columns e.g. country and gender.

        Basically we are merging left and right dfs. Merging is typically on an id field that both parts share.
        In this case there are as many fields to merge on as there are fields in the row index -
        in this example there are 4 (var_00, val_00, var_01, and val_01).
        There is one added complexity because the column is multi-index.
        We need to supply a tuple with an item (possibly an empty string) for each level.
        In this case there are two levels (browser and age_group). So we merge on
        [('var_00', ''), ('val_00', ''), ('var_01', ''), ('val_01', '')]
        If there were three row levels and four col levels we would need something like:
        [('var_00', '', '', ''), ('val_00', '', '', ''), ... ('val_02', '', '', '')]

        BOTTOM LEFT:
        browser    var_00       val_00     var_01     val_01 Chrome                       Firefox
        agegroup                                                <20 20-29 30-39 40-64 65+     <20 20-29 30-39 40-64 65+
        0         Country           NZ  __blank__  __blank__     10    19    17    28  44      25    26    14    38  48
        ...

        BOTTOM RIGHT:
        agegroup   var_00       val_00     var_01     val_01 <20 20-29 30-39 40-64 65+
        dummy
        0         Country           NZ  __blank__  __blank__  35    45    31    66  92
        ...

        Note - we flatten out the row multi-index using reset_index().
        This flattening results in a column per row variable e.g. one for country and one for gender
         (at this point we're ignoring the labelling step where we split each row variable e.g. for country into Country (var) and NZ (val)).
        Given it is a column, it has to have as many levels as the column dimension columns.
        So if there are two column dimension levels each row column will need to be a two-tuple e.g. ('gender', '').
        If there were three column dimension levels the row column would need to be a three-tuple e.g. ('gender', '', '').
        """
        dfs = [self.get_df_from_row_spec(cur, row_spec_idx=row_spec_idx)
            for row_spec_idx in range(len(self.row_variable_designs))]
        ## COMBINE using pandas JOINing (the big magic trick at the middle of this approach to complex table-making)
        ## Unfortunately, delegating to Pandas means we can't fix anything intrinsic to what Pandas does.
        ## And there is a bug (from my point of view) whenever tables are merged with the same variables at the top level.
        ## To prevent this we have to disallow variable reuse at top-level.
        ## Transpose, join, and re-transpose back. JOINing on rows works differently from columns and will include all items in sub-levels under the correct upper levels even if missing from the first multi-index
        ## E.g. if Age Group > 40-64 is missing from the first index it will not be appended on the end but will be alongside all its siblings so we end up with Age Group > <20, 20-29, 30-39, 40-64, 65+
        ## Note - variable levels (odd numbered levels if 1 is the top level) should be in the same order as they were originally
        df_t = dfs[0].T
        dfs_remaining = dfs[1:]
        for df_next in dfs_remaining:
            df_t = df_t.join(df_next.T, how='outer')
        df = df_t.T  ## re-transpose back so cols are cols and rows are rows again
        if self.debug: print(f"\nCOMBINED:\n{df}")
        ## Sorting indexes
        raw_df = get_raw_df(cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name, debug=self.debug)
        order_rules_for_row_multi_index_branches = get_order_rules_for_multi_index_branches(self.row_variable_designs)
        order_rules_for_col_multi_index_branches = get_order_rules_for_multi_index_branches(self.column_variable_designs)
        ## COLS
        unsorted_col_multi_index_list = list(df.columns)
        sorted_col_multi_index_list = get_sorted_multi_index_list(
            unsorted_col_multi_index_list, order_rules_for_multi_index_branches=order_rules_for_col_multi_index_branches,
            sort_orders=self.sort_orders, raw_df=raw_df, has_metrics=True, debug=self.debug)
        sorted_col_multi_index = pd.MultiIndex.from_tuples(sorted_col_multi_index_list)  ## https://pandas.pydata.org/docs/user_guide/advanced.html
        ## ROWS
        unsorted_row_multi_index_list = list(df.index)
        sorted_row_multi_index_list = get_sorted_multi_index_list(
            unsorted_row_multi_index_list, order_rules_for_multi_index_branches=order_rules_for_row_multi_index_branches,
            sort_orders=self.sort_orders, raw_df=raw_df, has_metrics=False, debug=self.debug)
        sorted_row_multi_index = pd.MultiIndex.from_tuples(sorted_row_multi_index_list)  ## https://pandas.pydata.org/docs/user_guide/advanced.html
        df = df.reindex(index=sorted_row_multi_index, columns=sorted_col_multi_index)
        if self.debug: print(f"\nORDERED:\n{df}")
        return df

    def to_html_design(self) -> HTMLItemSpec:
        get_tbl_df_for_cur = partial(self.get_tbl_df)
        df = get_tbl_df_for_cur(self.cur)
        pd_styler = set_table_styles(df.style)
        style_spec = get_style_spec(style_name=self.style_name)
        pd_styler = apply_index_styles(df, style_spec, pd_styler, axis='rows')
        pd_styler = apply_index_styles(df, style_spec, pd_styler, axis='columns')
        raw_tbl_html = pd_styler.to_html()
        if self.debug:
            print(raw_tbl_html)
        ## Fix
        html = raw_tbl_html
        html = fix_top_left_box(html, style_spec, debug=self.debug, verbose=self.verbose)
        html = merge_cols_of_blanks(html, debug=self.debug)
        html = merge_rows_of_blanks(html, debug=self.debug, verbose=self.verbose)
        if self.debug:
            print(pd_styler.uuid)
            print(html)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.MAIN_TABLE,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
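
The get_tbl_df docstring above describes merge keys that need one item per column level, padded with empty strings. A small sketch of how such keys could be generated follows. The helper get_merge_keys is hypothetical and written for this example; the var_NN / val_NN names follow the docstring's illustration rather than any confirmed sofastats function.

```python
def get_merge_keys(n_row_levels: int, n_col_levels: int) -> list[tuple[str, ...]]:
    """
    One key per flattened row column (a var / val pair for each row level).
    Each key is padded with empty strings so it has one item per column level.
    """
    padding = ('', ) * (n_col_levels - 1)
    keys = []
    for level in range(n_row_levels):
        keys.append((f'var_{level:02}', ) + padding)
        keys.append((f'val_{level:02}', ) + padding)
    return keys


two_by_two = get_merge_keys(n_row_levels=2, n_col_levels=2)
three_by_four = get_merge_keys(n_row_levels=3, n_col_levels=4)
print(two_by_two)
print(three_by_four)
```

With two row levels and two column levels this gives the four two-tuples listed in the docstring; with three row levels and four column levels it gives six four-tuples.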

Statistical Tests

sofastats.output.stats.interfaces.CommonStatsDesign dataclass

Bases: CommonDesign

Output dataclasses for statistical tests (e.g. MannWhitneyUDesign) inherit from CommonStatsDesign.

Source code in src/sofastats/output/stats/interfaces.py
class CommonStatsDesign(CommonDesign):
    """
    Output dataclasses for statistical tests (e.g. MannWhitneyUDesign) inherit from CommonStatsDesign.
    """

    @abstractmethod
    def to_result(self) -> Type[StatsResult]:
        """
        Return a dataclass with results as attributes
        """
        pass

to_result() -> Type[StatsResult] abstractmethod

Return a dataclass with results as attributes

Source code in src/sofastats/output/stats/interfaces.py
@abstractmethod
def to_result(self) -> Type[StatsResult]:
    """
    Return a dataclass with results as attributes
    """
    pass
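
The pattern is an abstract to_result() that each concrete design implements, returning a dataclass whose attributes are the results. A self-contained sketch of the same pattern, with hypothetical names (StatsDesign, MeanDesign, MeanResult) that are not part of sofastats:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class MeanResult:
    """Hypothetical result dataclass - the results are plain attributes."""
    n: int
    mean_value: float


class StatsDesign(ABC):
    """Hypothetical stand-in for CommonStatsDesign."""

    @abstractmethod
    def to_result(self) -> MeanResult:
        """Return a dataclass with results as attributes"""


@dataclass
class MeanDesign(StatsDesign):
    values: list[float]

    def to_result(self) -> MeanResult:
        return MeanResult(n=len(self.values), mean_value=mean(self.values))


result = MeanDesign(values=[1.0, 2.0, 3.0]).to_result()
print(result.n, result.mean_value)
```

Returning a dataclass rather than HTML lets calling code read individual results by attribute name without parsing any output.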

ANOVA

See CommonStatsDesign for details of the to_result() method common to all stats output design dataclasses in sofastats.

sofastats.output.stats.anova.AnovaDesign dataclass

Bases: CommonStatsDesign

Parameters:

  • measure_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the name of the field aggregated by group - the ANOVA compares the mean value of each group. For example, 'Age'

  • grouping_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the name of the field used to define the groups compared in the ANOVA e.g. 'Country'

  • group_values (Collection[Any], default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the ANOVA will compare the means of the groups defined by the values of the grouping field listed here e.g. ['South Korea', 'NZ', 'USA']

  • high_precision_required (bool, default: False ) –

    if True, the calculation uses a high precision algorithm that is not vulnerable to certain edge cases. Why not use it by default? Because it runs much, much, much slower and the edge cases are quite rare. The high precision algorithm uses Python's decimal data type rather than floats. Using floating point math is a pragmatic strategy, but it reduces accuracy, and in particular edge cases it can produce results wildly different from the correct ones. High precision is needed to handle difficult datasets e.g. ANOVA test 9 from the NIST website. Search for articles / videos on the topic of floating point math if interested. It is a fascinating topic. When values are recorded to one decimal point, the high precision algorithm also multiplies some values by 10 to convert them from floats to integers (to reduce error) and then divides squared values by 100 (10 squared) at the end of key calculations to restore the correct magnitude.
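
The two ideas in the high_precision_required description (decimal arithmetic instead of floats, and scaling one-decimal-point values up to integers) can be illustrated in a few lines. This is a sketch of the general technique only, with made-up values; it is not the sofastats high precision implementation.

```python
from decimal import Decimal

## floats cannot represent most decimal fractions exactly, so small errors creep in
float_sum = 0.1 + 0.2
decimal_sum = Decimal('0.1') + Decimal('0.2')
print(float_sum == 0.3)                ## False
print(decimal_sum == Decimal('0.3'))   ## True

## scaling trick for values recorded to one decimal point:
## work on integers (value x 10), then divide the sum of squares by 100 (10 squared)
values = [1.1, 2.2, 3.3]
scaled = [round(value * 10) for value in values]     ## exact integers
sum_of_squares = sum(x ** 2 for x in scaled) / 100
print(scaled, sum_of_squares)
```

Integer arithmetic is exact, so the only rounding in the scaled calculation happens in the single division at the end.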

Source code in src/sofastats/output/stats/anova.py
@dataclass(frozen=False)
class AnovaDesign(CommonStatsDesign):
    """
    Args:
        measure_field_name: the name of the field aggregated by group - the ANOVA compares the mean value of each group.
            For example, 'Age'
        grouping_field_name: the name of the field used to define the groups compared in the ANOVA e.g. 'Country'
        group_values: the ANOVA will compare the means of the groups defined
            by the values of the grouping field listed here e.g. ['South Korea', 'NZ', 'USA']
        high_precision_required: if `True`, the calculation will be high precision
            and the algorithm used will not be vulnerable to certain edge cases.
            Why not use it by default? Because it runs much, much, much slower and the edge cases are quite rare.
            The high precision algorithm uses Python's
            [decimal](https://docs.python.org/3/library/decimal.html) data type rather than floats.
            Using floating point math is a pragmatic strategy, but it reduces accuracy.
            In particular edge cases, it can produce wildly different results from the correct results.
            High precision is needed to handle difficult datasets e.g. ANOVA test 9 from the NIST website.
            Search for articles / videos on the topic of floating point math if interested. It is a fascinating topic.
            When values are recorded to one decimal point, the high precision algorithm also multiplies some values by 10
            to convert them from floats to integers (to reduce error) and then divides squared values by 100 (10 squared)
            at the end of key calculations to restore the correct magnitude.
    """
    measure_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    grouping_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    group_values: Collection[Any] = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    high_precision_required: bool = False

    def to_result(self) -> AnovaResult:
        ## values (sorted)
        grouping_field_values = apply_custom_sorting_to_values(
            variable_name=self.grouping_field_name, values=list(self.group_values), sort_orders=self.sort_orders)
        ## data
        grouping_val_is_numeric = all(is_numeric(x) for x in self.group_values)
        ## build sample results ready for anova function
        samples = []
        for grouping_field_value in grouping_field_values:
            grouping_filter = ValFilterSpec(variable_name=self.grouping_field_name, value=grouping_field_value,
                val_is_numeric=grouping_val_is_numeric)
            sample = get_sample(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
                grouping_filt=grouping_filter, measure_field_name=self.measure_field_name,
                table_filter_sql=self.table_filter_sql)
            samples.append(sample)
        stats_result = anova_stats_calc(
            self.grouping_field_name, self.measure_field_name, samples, high=self.high_precision_required)
        return stats_result

    def to_html_design(self) -> HTMLItemSpec:
        ## style
        style_spec = get_style_spec(style_name=self.style_name)
        ## values (sorted)
        grouping_field_values = apply_custom_sorting_to_values(
            variable_name=self.grouping_field_name, values=list(self.group_values), sort_orders=self.sort_orders)
        ## data
        grouping_val_is_numeric = all(is_numeric(x) for x in self.group_values)
        ## build sample results ready for anova function
        samples = []
        for grouping_field_value in grouping_field_values:
            grouping_filter = ValFilterSpec(variable_name=self.grouping_field_name, value=grouping_field_value,
                val_is_numeric=grouping_val_is_numeric)
            sample = get_sample(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
                grouping_filt=grouping_filter, measure_field_name=self.measure_field_name,
                table_filter_sql=self.table_filter_sql)
            samples.append(sample)
        ## calculations
        stats_result = anova_stats_calc(
            self.grouping_field_name, self.measure_field_name, samples, high=self.high_precision_required)
        ## output
        histograms2show = []
        for group_spec in stats_result.group_specs:
            try:
                histogram_html = get_embedded_histogram_html(
                    self.measure_field_name, style_spec.chart, group_spec.vals, group_spec.label)
            except Exception as e:
                html_or_msg = f"<b>{group_spec.label}</b> - unable to display histogram. Reason: {e}"
            else:
                html_or_msg = histogram_html
            histograms2show.append(html_or_msg)
        result = Result(**todict(stats_result),
            grouping_field_name=self.grouping_field_name,
            measure_field_name=self.measure_field_name,
            histograms2show=histograms2show,
            decimal_points=self.decimal_points,
        )
        html = get_html(result, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.STATS,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
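
Both methods above build one sample per group value and pass the samples to the ANOVA calculation. For orientation, here is a textbook one-way ANOVA F statistic computed over such samples. The function get_f_statistic and the data are assumptions for illustration; this is the standard formula, not the body of anova_stats_calc, and it uses ordinary floats.

```python
from statistics import mean


def get_f_statistic(samples: list[list[float]]) -> float:
    """Textbook one-way ANOVA F: between-group mean square / within-group mean square."""
    all_values = [value for sample in samples for value in sample]
    grand_mean = mean(all_values)
    ss_between = sum(len(sample) * (mean(sample) - grand_mean) ** 2 for sample in samples)
    ss_within = sum((value - mean(sample)) ** 2 for sample in samples for value in sample)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    return (ss_between / df_between) / (ss_within / df_within)


## one list of measure values per group (made-up numbers)
samples = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [3.0, 4.0, 5.0],
]
f_statistic = get_f_statistic(samples)
print(f_statistic)
```

A larger F means the group means differ more than the spread within the groups would suggest.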

Chi Square

See CommonStatsDesign for details of the to_result() method common to all stats output design dataclasses in sofastats.

sofastats.output.stats.chi_square.ChiSquareDesign dataclass

Bases: CommonStatsDesign

Parameters:

  • variable_a_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the name of the first variable

  • variable_b_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the name of the second variable

  • show_workings (bool, default: False ) –

    show the workings so you can see how the final results were derived
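
The test compares observed counts for each combination of the two variables against the counts expected if the variables were independent. A plain-Python sketch of that standard calculation follows. The function get_chi_square and the counts are made up for illustration; this is the textbook Pearson formula, not the sofastats chi_square_stats_calc function.

```python
def get_chi_square(observed: list[list[float]]) -> tuple[float, int, list[float]]:
    """
    observed: one inner list per value of variable A, one item per value of variable B.
    Returns the chi square statistic, degrees of freedom, and expected values (A then B order).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    expected = [row_total * col_total / grand_total
        for row_total in row_totals for col_total in col_totals]
    flat_observed = [obs for row in observed for obs in row]
    chi_square = sum((obs - exp) ** 2 / exp for obs, exp in zip(flat_observed, expected))
    degrees_of_freedom = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_square, degrees_of_freedom, expected


## made-up counts for a 2 x 2 table of variable A values by variable B values
chi_square, degrees_of_freedom, expected = get_chi_square([[10, 20], [20, 10]])
print(chi_square, degrees_of_freedom, expected)
```

Each expected value is the row total times the column total divided by the grand total, which is the count independence would predict for that cell.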

Source code in src/sofastats/output/stats/chi_square.py
@dataclass(frozen=False)
class ChiSquareDesign(CommonStatsDesign):
    """
    Args:
        variable_a_name: the name of the first variable
        variable_b_name: the name of the second variable
        show_workings: show the workings so you can see how the final results were derived
    """
    variable_a_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    variable_b_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY

    show_workings: bool = False

    def to_result(self) -> ChiSquareResult:
        ## data
        chi_square_data = get_chi_square_data(cur=self.cur, dbe_spec=self.dbe_spec,
            source_table_name=self.source_table_name, table_filter_sql=self.table_filter_sql,
            variable_a_name=self.variable_a_name, variable_b_name=self.variable_b_name,
            sort_orders=self.sort_orders)
        ## get results
        stats_result = chi_square_stats_calc(
            f_obs=chi_square_data.observed_values_a_then_b_ordered,
            f_exp=chi_square_data.expected_values_a_then_b_ordered,
            df=chi_square_data.degrees_of_freedom)
        return stats_result

    def to_html_design(self) -> HTMLItemSpec:
        ## style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        chi_square_data = get_chi_square_data(cur=self.cur, dbe_spec=self.dbe_spec,
            source_table_name=self.source_table_name, table_filter_sql=self.table_filter_sql,
            variable_a_name=self.variable_a_name, variable_b_name=self.variable_b_name,
            sort_orders=self.sort_orders)
        ## get results
        stats_result = chi_square_stats_calc(
            f_obs=chi_square_data.observed_values_a_then_b_ordered,
            f_exp=chi_square_data.expected_values_a_then_b_ordered,
            df=chi_square_data.degrees_of_freedom)

        observed_vs_expected_tbl = get_observed_vs_expected_tbl(
            variable_a_name=self.variable_a_name, variable_b_name=self.variable_b_name,
            variable_a_values=chi_square_data.variable_a_values, variable_b_values=chi_square_data.variable_b_values,
            observed_values_a_then_b_ordered=chi_square_data.observed_values_a_then_b_ordered,
            expected_values_a_then_b_ordered=chi_square_data.expected_values_a_then_b_ordered,
            style_name_hyphens=style_spec.style_name_hyphens,
        )

        chi_square_charts = get_chi_square_charts(
            style_spec=style_spec,
            variable_a_name=self.variable_a_name, variable_b_name=self.variable_b_name,
            variable_a_values=chi_square_data.variable_a_values, variable_b_values=chi_square_data.variable_b_values,
            observed_values_a_then_b_ordered=chi_square_data.observed_values_a_then_b_ordered)

        if self.show_workings:
            worked_result = get_worked_result(
                variable_a_values=chi_square_data.variable_a_values, variable_b_values=chi_square_data.variable_b_values,
                observed_values_a_then_b_ordered=chi_square_data.observed_values_a_then_b_ordered,
                degrees_of_freedom=chi_square_data.degrees_of_freedom)
            worked_example = get_worked_example(worked_result)
        else:
            worked_result = None
            worked_example = ''
        result = Result(
            variable_a_name=self.variable_a_name, variable_b_name=self.variable_b_name,
            variable_a_values=chi_square_data.variable_a_values, variable_b_values=chi_square_data.variable_b_values,
            observed_values_a_then_b_ordered=chi_square_data.observed_values_a_then_b_ordered,
            expected_values_a_then_b_ordered=chi_square_data.expected_values_a_then_b_ordered,
            p=stats_result.p, chi_square=stats_result.chi_square, degrees_of_freedom=chi_square_data.degrees_of_freedom,
            minimum_cell_count=chi_square_data.minimum_cell_count, pct_cells_lt_5=chi_square_data.pct_cells_freq_under_5,
            observed_vs_expected_tbl=observed_vs_expected_tbl, chi_square_charts=chi_square_charts,
            worked_example=worked_example, decimal_points=self.decimal_points,
        )
        html = get_html(result, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.STATS,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )

Kruskal-Wallis H

See CommonStatsDesign for details of the to_result() method common to all stats output design dataclasses in sofastats.

sofastats.output.stats.kruskal_wallis_h.KruskalWallisHDesign dataclass

Bases: CommonStatsDesign

Parameters:

  • measure_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the name of the field whose values are compared across groups - the analysis compares the ranks of the values in each group rather than group means. For example, 'Age'

  • grouping_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the name of the field used to define the groups compared in the analysis e.g. 'Country'

  • group_values (Sequence[Any], default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the analysis will compare the groups defined by the values of the grouping field listed here e.g. ['South Korea', 'NZ', 'USA']

  • show_workings (bool, default: False ) –

    show the workings so you can see how the final results were derived
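
To make the rank-based nature of the test concrete, here is a minimal sketch of the textbook Kruskal-Wallis H statistic. It is not the sofastats implementation: `average_ranks` and `kruskal_wallis_h` are hypothetical helpers, the data is made up, and no tie correction is applied.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for i in order[pos:end + 1]:
            ranks[i] = mean_rank
        pos = end + 1
    return ranks


def kruskal_wallis_h(samples):
    """samples: one list of measure values per group."""
    pooled = [val for sample in samples for val in sample]
    ranks = average_ranks(pooled)  ## ranks are assigned across all groups combined
    n_total = len(pooled)
    rank_sum_term = 0.0
    start = 0
    for sample in samples:
        sample_ranks = ranks[start:start + len(sample)]
        rank_sum_term += sum(sample_ranks) ** 2 / len(sample)
        start += len(sample)
    return 12 / (n_total * (n_total + 1)) * rank_sum_term - 3 * (n_total + 1)


## three made-up groups, e.g. 'Age' values for three countries
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 4))
```

Because only ranks enter the formula, extreme values in the measure field influence H far less than they would influence a comparison of means.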

Source code in src/sofastats/output/stats/kruskal_wallis_h.py
@dataclass(frozen=False)
class KruskalWallisHDesign(CommonStatsDesign):
    """
    Args:
        measure_field_name: the name of the field whose values are compared across groups - the analysis compares
            the ranks of the values in each group rather than group means. For example, 'Age'
        grouping_field_name: the name of the field used to define the groups compared in the analysis e.g. 'Country'
        group_values: the analysis will compare the groups defined
            by the values of the grouping field listed here e.g. ['South Korea', 'NZ', 'USA']
        show_workings: show the workings so you can see how the final results were derived
    """
    measure_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    grouping_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    group_values: Sequence[Any] = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY

    show_workings: bool = False

    def to_result(self) -> KruskalWallisHResult:
        ## values (sorted)
        grouping_field_values = apply_custom_sorting_to_values(
            variable_name=self.grouping_field_name, values=list(self.group_values), sort_orders=self.sort_orders)
        ## data
        grouping_val_is_numeric = all(is_numeric(x) for x in self.group_values)
        samples = []
        for grouping_field_value in grouping_field_values:
            grouping_filter = ValFilterSpec(variable_name=self.grouping_field_name, value=grouping_field_value,
                val_is_numeric=grouping_val_is_numeric)
            sample = get_sample(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
                grouping_filt=grouping_filter, measure_field_name=self.measure_field_name,
                table_filter_sql=self.table_filter_sql)
            samples.append(sample)
        stats_result = kruskal_wallis_h_stats_calc(samples)
        return stats_result

    def to_html_design(self) -> HTMLItemSpec:
        ## style
        style_spec = get_style_spec(style_name=self.style_name)
        ## values (sorted)
        grouping_field_values = apply_custom_sorting_to_values(
            variable_name=self.grouping_field_name, values=list(self.group_values), sort_orders=self.sort_orders)
        ## data
        grouping_val_is_numeric = all(is_numeric(x) for x in self.group_values)
        samples = []
        for grouping_field_value in grouping_field_values:
            grouping_filter = ValFilterSpec(variable_name=self.grouping_field_name, value=grouping_field_value,
                val_is_numeric=grouping_val_is_numeric)
            sample = get_sample(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
                grouping_filt=grouping_filter, measure_field_name=self.measure_field_name,
                table_filter_sql=self.table_filter_sql)
            samples.append(sample)
        stats_result = kruskal_wallis_h_stats_calc(samples)
        result = Result(**todict(stats_result),
            grouping_field_name=self.grouping_field_name,
            measure_field_name=self.measure_field_name,
            decimal_points=self.decimal_points,
        )
        html = get_html(result, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.STATS,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )

Mann-Whitney U

See CommonStatsDesign for details of the to_result() method common to all stats output design dataclasses in sofastats.

sofastats.output.stats.mann_whitney_u.MannWhitneyUDesign dataclass

Bases: CommonStatsDesign

Parameters:

  • measure_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the name of the field whose values are compared across groups - the analysis compares the ranks of the values in each group rather than group means. For example, 'Age'

  • grouping_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the name of the field used to define the groups compared in the analysis e.g. 'Country'

  • group_a_value (Any, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the analysis will compare the ranks of this group against the ranks of the group defined by group_b_value

  • group_b_value (Any, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the analysis will compare the ranks of this group against the ranks of the group defined by group_a_value

  • show_workings (bool, default: False ) –

    show the workings so you can see how the final results were derived
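
The idea of comparing the ranks of two groups can be sketched by counting pairwise "wins". This is an illustrative standalone version, not the library's algorithm: `mann_whitney_u` is a hypothetical helper and the samples are made up. The second value returned, n_a * n_b / 2, is the U value expected when neither group tends to outrank the other.

```python
def mann_whitney_u(sample_a, sample_b):
    """
    Compare every value in sample_a with every value in sample_b.
    A higher value wins the pair; ties count as half a win each.
    Returns (u, even_matches).
    """
    wins_a = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in sample_a
        for b in sample_b
    )
    n_pairs = len(sample_a) * len(sample_b)
    wins_b = n_pairs - wins_a
    u = min(wins_a, wins_b)
    even_matches = n_pairs / 2  ## U expected if the two groups were evenly matched
    return u, even_matches


## made-up measure values for group A and group B
u, even_matches = mann_whitney_u([1, 2, 3, 4], [3, 5, 6, 7])
print(u, even_matches)
```

The further U falls below the evenly matched value, the more consistently one group outranks the other. Comparing every pair is quadratic in the sample sizes, which hints at why large inputs are expensive for rank-based tests.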

Source code in src/sofastats/output/stats/mann_whitney_u.py
@dataclass(frozen=False)
class MannWhitneyUDesign(CommonStatsDesign):
    """
    Args:
        measure_field_name: the name of the field whose values are compared across groups - the analysis compares
            the ranks of the values in each group rather than group means. For example, 'Age'
        grouping_field_name: the name of the field used to define the groups compared in the analysis e.g. 'Country'
        group_a_value: the analysis will compare the ranks of this group
            against the ranks of the group defined by group_b_value
        group_b_value: the analysis will compare the ranks of this group
            against the ranks of the group defined by group_a_value
        show_workings: show the workings so you can see how the final results were derived
    """
    measure_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    grouping_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    group_a_value: Any = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    group_b_value: Any = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY

    show_workings: bool = False

    def to_result(self) -> MannWhitneyUResult:
        ## build samples ready for mann whitney u function
        grouping_filt_a = ValFilterSpec(variable_name=self.grouping_field_name,
            value=self.group_a_value, val_is_numeric=is_numeric(self.group_a_value))
        sample_a = get_sample(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            grouping_filt=grouping_filt_a, measure_field_name=self.measure_field_name,
            table_filter_sql=self.table_filter_sql)
        grouping_filt_b = ValFilterSpec(variable_name=self.grouping_field_name,
            value=self.group_b_value, val_is_numeric=is_numeric(self.group_b_value))
        sample_b = get_sample(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            grouping_filt=grouping_filt_b, measure_field_name=self.measure_field_name,
            table_filter_sql=self.table_filter_sql)
        stats_result = mann_whitney_u_stats_calc(sample_a=sample_a, sample_b=sample_b, high_volume_ok=False)
        return stats_result

    def to_html_design(self) -> HTMLItemSpec:
        ## style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        ## build samples ready for mann whitney u function
        grouping_filt_a = ValFilterSpec(variable_name=self.grouping_field_name,
            value=self.group_a_value, val_is_numeric=is_numeric(self.group_a_value))
        sample_a = get_sample(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            grouping_filt=grouping_filt_a, measure_field_name=self.measure_field_name,
            table_filter_sql=self.table_filter_sql)
        grouping_filt_b = ValFilterSpec(variable_name=self.grouping_field_name,
            value=self.group_b_value, val_is_numeric=is_numeric(self.group_b_value))
        sample_b = get_sample(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            grouping_filt=grouping_filt_b, measure_field_name=self.measure_field_name,
            table_filter_sql=self.table_filter_sql)
        ## get result
        stats_result = mann_whitney_u_stats_calc(sample_a=sample_a, sample_b=sample_b, high_volume_ok=False)
        n_a = stats_result.group_a_spec.n
        n_b = stats_result.group_b_spec.n
        even_matches = (n_a * n_b) / float(2)

        if self.show_workings:
            result_workings = mann_whitney_u_for_workings(sample_a=sample_a, sample_b=sample_b, high_volume_ok=False)
            worked_example = get_worked_example(result_workings, style_spec.style_name_hyphens)
        else:
            worked_example = ''

        result = Result(**todict(stats_result),
            sample_a=sample_a,
            sample_b=sample_b,
            grouping_field_name=self.grouping_field_name,
            measure_field_name=self.measure_field_name,
            n_a=n_a,
            n_b=n_b,
            even_matches=even_matches,
            worked_example=worked_example,
            decimal_points=self.decimal_points,
        )
        html = get_html(result, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.STATS,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )

Normality

See CommonStatsDesign for details of the to_result() method common to all stats output design dataclasses in sofastats.

sofastats.output.stats.normality.NormalityDesign dataclass

Bases: CommonStatsDesign

Parameters:

  • variable_a_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    if only this variable name is supplied, display the distribution and test it for normality. If another variable name is also supplied, do the same thing but for the difference between the two variables.

  • variable_b_name (str | None, default: None ) –

    if supplied, will be testing the normality of the difference between two variables rather than the normality of a variable.
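
The HTML output for this design comments on skew and kurtosis using cut-offs of 1 and 2 on the absolute value. The sketch below shows one way such figures can be computed and classified. It is an assumption-laden illustration: `skew_and_excess_kurtosis` uses simple moment-based formulas that may not match how sofastats derives its own skew and kurtosis values, `indication` is a hypothetical helper, and the order of subtraction for paired differences is a guess.

```python
def skew_and_excess_kurtosis(vals):
    """Moment-based skew and excess kurtosis (both 0 for a perfect normal curve)."""
    n = len(vals)
    mean = sum(vals) / n
    m2 = sum((v - mean) ** 2 for v in vals) / n
    m3 = sum((v - mean) ** 3 for v in vals) / n
    m4 = sum((v - mean) ** 4 for v in vals) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


def indication(statistic):
    """Classify using the same cut-offs as the messages in the HTML output."""
    if abs(statistic) <= 1:
        return 'a great sign'
    elif abs(statistic) <= 2:
        return 'a good sign'
    return 'not a good sign'


## single-variable case: assess the values directly
vals = [1, 2, 3, 4, 5]
skew, kurtosis = skew_and_excess_kurtosis(vals)
print(round(skew, 3), indication(skew))
print(round(kurtosis, 3), indication(kurtosis))

## paired case: assess the differences instead (a minus b is an assumption)
vals_a = [5, 7, 9, 12]
vals_b = [4, 4, 5, 6]
diffs = [a - b for a, b in zip(vals_a, vals_b)]
print(diffs)
```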

Source code in src/sofastats/output/stats/normality.py
@dataclass(frozen=False)
class NormalityDesign(CommonStatsDesign):
    """
    Args:
        variable_a_name: if only this variable name is supplied, display the distribution and test it for normality.
            If another variable name is also supplied, do the same thing
            but for the difference between the two variables.
        variable_b_name: if supplied, will be testing the normality of the difference between two variables
            rather than the normality of a variable.
    """
    variable_a_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    variable_b_name: str | None = None

    def to_result(self) -> NormalTestResult:
        ## data
        paired = self.variable_b_name is not None
        if paired:
            sample = get_paired_diffs_sample(
                cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
                variable_a_name=self.variable_a_name, variable_b_name=self.variable_b_name,
                table_filter_sql=self.table_filter_sql)
        else:
            sample = get_sample(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
                measure_field_name=self.variable_a_name, grouping_filt=None, table_filter_sql=self.table_filter_sql)
        n_vals = len(sample.vals)
        if n_vals < MIN_VALS_FOR_NORMALITY_TEST:
            raise Exception(f"We need at least {MIN_VALS_FOR_NORMALITY_TEST:,} values to test normality.")
        else:
            stats_result = normal_test(sample.vals)
        return stats_result

    def to_html_design(self) -> HTMLItemSpec:
        ## style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        paired = self.variable_b_name is not None
        if paired:
            data_label = f'Difference Between "{self.variable_a_name}" and "{self.variable_b_name}"'
            sample = get_paired_diffs_sample(
                cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
                variable_a_name=self.variable_a_name, variable_b_name=self.variable_b_name,
                table_filter_sql=self.table_filter_sql)
        else:
            data_label = self.variable_a_name
            sample = get_sample(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
                measure_field_name=self.variable_a_name, grouping_filt=None, table_filter_sql=self.table_filter_sql)
        title = f"Normality Tests for {data_label}"
        ## message
        n_vals = len(sample.vals)
        if n_vals < MIN_VALS_FOR_NORMALITY_TEST:
            message = (f"<p>We need at least {MIN_VALS_FOR_NORMALITY_TEST:,} values to test normality.</p>"
            "<p>Rely entirely on visual inspection of graph above.</p>")
        else:
            try:
                stats_result = normal_test(sample.vals)
            except Exception as e:
                logger.info(f"Unable to calculate normality. Orig error: {e}")
                message = "<p>Unable to calculate normality tests</p>"
            else:
                ## skew
                if abs(stats_result.c_skew) <= 1:
                    skew_indication = 'a great sign'
                elif abs(stats_result.c_skew) <= 2:
                    skew_indication = 'a good sign'
                else:
                    skew_indication = 'not a good sign'
                skew_msg = (f"Skew (lopsidedness) is {round(stats_result.c_skew, self.decimal_points)} "
                    f"which is probably {skew_indication}.")
                ## kurtosis
                if abs(stats_result.c_kurtosis) <= 1:
                    kurtosis_indication = 'a great sign'
                elif abs(stats_result.c_kurtosis) <= 2:
                    kurtosis_indication = 'a good sign'
                else:
                    kurtosis_indication = 'not a good sign'
                kurtosis_msg = (
                    f"Kurtosis (peakedness or flatness) is {round(stats_result.c_kurtosis, self.decimal_points)} "
                    f"which is probably {kurtosis_indication}.")
                ## combined
                if n_vals > N_WHERE_NORMALITY_USUALLY_FAILS_NO_MATTER_WHAT:
                    message = ("<p>Rely on visual inspection of graph to assess normality.</p>"
                        "<p>Although the data failed the ideal normality test, "
                        f"most real-world data-sets with as many results ({n_vals:,}) would fail "
                        f"for even slight differences from the perfect normal curve.</p>"
                        f"<p>{skew_msg}</p><p>{kurtosis_msg}</p>")
                else:
                    if stats_result.p >= 0.05:  ## null hypothesis of normality not rejected
                        message = (f"<p>The distribution of {data_label} passed one test for normality.</p>"
                            f"<p>Confirm or reject based on visual inspection of graph. {skew_msg} {kurtosis_msg}</p>")
                    else:
                        message = (f'<p>Although the distribution of {data_label} is not perfectly "normal", '
                            f'it may still be "normal" enough for use. View graph to decide.</p>'
                            f"<p>{skew_msg}</p><p>{kurtosis_msg}</p>")
        ## histogram
        histogram = get_embedded_histogram_html(measure_field_label=data_label, style_spec=style_spec.chart,
            vals=sample.vals, width_scalar=1.5, label_chart_from_var_if_needed=False)

        result = Result(
            title=title,
            message=message,
            histogram=histogram,
        )
        html = get_html(result)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.STATS,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )

Pearson's R Correlation

See CommonStatsDesign for details of the to_result() method common to all stats output design dataclasses in sofastats.

sofastats.output.stats.pearsons_r.PearsonsRDesign dataclass

Bases: CommonStatsDesign

Parameters:

  • variable_a_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the first variable in each pair we are checking for correlation

  • variable_b_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the second variable in each pair we are checking for correlation
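
For readers who want to see what is being measured, the following standalone sketch computes Pearson's r together with a least-squares regression line for paired values. It is not the sofastats implementation: `pearsons_r_and_line` is a hypothetical helper and the data is made up.

```python
from math import sqrt


def pearsons_r_and_line(xs, ys):
    """Return (r, slope, intercept) for paired values xs and ys."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ## sums of cross-products and squares of deviations from the means
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    r = sum_xy / sqrt(sum_xx * sum_yy)
    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


## made-up pairs: variable A on the x axis, variable B on the y axis
r, slope, intercept = pearsons_r_and_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4), round(slope, 4), round(intercept, 4))
```

Values of r near 1 or -1 indicate a strong linear relationship; values near 0 indicate little or none.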

Source code in src/sofastats/output/stats/pearsons_r.py
@dataclass(frozen=False)
class PearsonsRDesign(CommonStatsDesign):
    """
    Args:
        variable_a_name: the first variable in each pair we are checking for correlation
        variable_b_name: the second variable in each pair we are checking for correlation
    """
    variable_a_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    variable_b_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY

    def to_result(self) -> CorrelationCalcResult:
        ## data
        paired_data = get_paired_data(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            variable_a_name=self.variable_a_name, variable_b_name=self.variable_b_name,
            table_filter_sql=self.table_filter_sql)
        stats_result = pearsonsr_stats_calc(paired_data.sample_a.vals, paired_data.sample_b.vals)
        return stats_result

    def to_html_design(self) -> HTMLItemSpec:
        ## style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        paired_data = get_paired_data(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            variable_a_name=self.variable_a_name, variable_b_name=self.variable_b_name,
            table_filter_sql=self.table_filter_sql)
        coords = [Coord(x=x, y=y) for x, y in zip(paired_data.sample_a.vals, paired_data.sample_b.vals, strict=True)]
        pearsonsr_calc_result = pearsonsr_stats_calc(paired_data.sample_a.vals, paired_data.sample_b.vals)
        regression_result = get_regression_result(xs=paired_data.sample_a.vals, ys=paired_data.sample_b.vals)

        correlation_result = CorrelationResult(
            variable_a_name=self.variable_a_name,
            variable_b_name=self.variable_b_name,
            coords=coords,
            stats_result=pearsonsr_calc_result,
            regression_result=regression_result,
            decimal_points=self.decimal_points,
        )

        scatterplot_series = ScatterplotSeries(
            coords=correlation_result.coords,
            dot_colour=style_spec.chart.colour_mappings[0].main,
            dot_line_colour=style_spec.chart.major_grid_line_colour,
            show_regression_details=True,
        )
        vars_series = [scatterplot_series, ]
        xs = correlation_result.xs
        ys = correlation_result.ys
        x_min, x_max = get_optimal_min_max(axis_min=min(xs), axis_max=max(xs))
        y_min, y_max = get_optimal_min_max(axis_min=min(ys), axis_max=max(ys))
        chart_conf = ScatterplotConf(
            width_inches=7.5,
            height_inches=4.0,
            inner_background_colour=style_spec.chart.plot_bg_colour,
            text_colour=style_spec.chart.axis_font_colour,
            x_axis_label=correlation_result.variable_a_name,
            y_axis_label=correlation_result.variable_b_name,
            show_dot_lines=True,
            x_min=x_min,
            x_max=x_max,
            y_min=y_min,
            y_max=y_max,
        )
        fig = get_scatterplot_fig(vars_series, chart_conf)
        image_as_data = plot2image_as_data(fig)
        scatterplot_html = f'<img src="{image_as_data}"/>'

        result = Result(**todict(correlation_result),
            scatterplot_html=scatterplot_html,
        )
        html = get_html(result, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.STATS,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )

Spearman's R Correlation

See CommonStatsDesign for details of the to_result() method common to all stats output design dataclasses in sofastats.

sofastats.output.stats.spearmans_r.SpearmansRDesign dataclass

Bases: CommonStatsDesign

Parameters:

  • variable_a_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the first variable in each pair we are checking for correlation

  • variable_b_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the second variable in each pair we are checking for correlation

  • show_workings (bool, default: False ) –

    show the workings so you can see how the final results were derived

  • high_volume_ok (bool, default: False ) –

    the algorithm is more expensive than those which can make parametric assumptions so we need to stop people unknowingly starting very slow operations. This setting has no impact if the number of records is less than MAX_RANK_DATA_VALS (currently 50,000 records). If set to False, an exception is raised if the code is asked to operate on an amount of data which will make it run very slowly. If True, the operation is allowed to proceed but a message tells the user to expect the process to take a fairly long time (so they don't terminate it early on the assumption that something has gone wrong and the analysis is never going to finish).
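
The sketch below illustrates both ideas from the parameter descriptions: Spearman's correlation as Pearson's correlation applied to ranks, and a guard that refuses large inputs unless the caller opts in. It is a simplified stand-in rather than sofastats code. `average_ranks`, `spearmans_rho`, the `max_vals` argument, and the choice of `ValueError` are all assumptions made for illustration; only the 50,000 figure comes from the description above.

```python
from math import sqrt

MAX_RANK_DATA_VALS = 50_000  ## figure quoted in the parameter description


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for i in order[pos:end + 1]:
            ranks[i] = mean_rank
        pos = end + 1
    return ranks


def spearmans_rho(xs, ys, *, high_volume_ok=False, max_vals=MAX_RANK_DATA_VALS):
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of values")
    if len(xs) > max_vals and not high_volume_ok:
        ## hypothetical guard mirroring the documented behaviour
        raise ValueError(f"{len(xs):,} records is too many unless high_volume_ok=True")
    ranks_x = average_ranks(xs)
    ranks_y = average_ranks(ys)
    n = len(ranks_x)
    mean_x = sum(ranks_x) / n
    mean_y = sum(ranks_y) / n
    sum_xy = sum((rx - mean_x) * (ry - mean_y) for rx, ry in zip(ranks_x, ranks_y))
    sum_xx = sum((rx - mean_x) ** 2 for rx in ranks_x)
    sum_yy = sum((ry - mean_y) ** 2 for ry in ranks_y)
    return sum_xy / sqrt(sum_xx * sum_yy)


## made-up pairs
rho = spearmans_rho([10, 20, 30, 40, 50], [1, 3, 2, 5, 4])
print(round(rho, 4))
```

Because the values are replaced by ranks before correlating, any monotonic relationship scores highly, not only a straight-line one.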

Source code in src/sofastats/output/stats/spearmans_r.py
@dataclass(frozen=False)
class SpearmansRDesign(CommonStatsDesign):
    """
    Args:
        variable_a_name: the first variable in each pair we are checking for correlation
        variable_b_name: the second variable in each pair we are checking for correlation
        show_workings: show the workings so you can see how the final results were derived
        high_volume_ok: the algorithm is more expensive than those which can make
            parametric assumptions so we need to stop people unknowingly starting very slow operations.
            This setting has no impact if the number of records is less than MAX_RANK_DATA_VALS
            (currently 50,000 records). If set to `False`, an exception is raised if the code is being asked to operate
            on an amount of data which will make it run very slowly. If `True`, the operation is allowed to proceed
            but a message tells the user they can expect the process to take a fairly long time
            (so they don't terminate early on the assumption that something has gone wrong and the analysis
            is never going to finish).
    """
    variable_a_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    variable_b_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY

    show_workings: bool = False
    high_volume_ok: bool = False

    def to_result(self) -> CorrelationCalcResult:
        ## data
        paired_data = get_paired_data(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            variable_a_name=self.variable_a_name, variable_b_name=self.variable_b_name,
            table_filter_sql=self.table_filter_sql)
        stats_result = spearmansr_stats_calc(paired_data.sample_a.vals, paired_data.sample_b.vals,
            high_volume_ok=self.high_volume_ok)
        return stats_result

    def to_html_design(self) -> HTMLItemSpec:
        ## style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        paired_data = get_paired_data(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            variable_a_name=self.variable_a_name, variable_b_name=self.variable_b_name,
            table_filter_sql=self.table_filter_sql)
        coords = [Coord(x=x, y=y) for x, y in zip(paired_data.sample_a.vals, paired_data.sample_b.vals, strict=True)]
        spearmansr_calc_result = spearmansr_stats_calc(paired_data.sample_a.vals, paired_data.sample_b.vals,
            high_volume_ok=self.high_volume_ok)
        regression_result = get_regression_result(xs=paired_data.sample_a.vals, ys=paired_data.sample_b.vals)

        if self.show_workings:
            worked_result = get_worked_result(
                variable_a_values=paired_data.sample_a.vals,
                variable_b_values=paired_data.sample_b.vals,
            )
        else:
            worked_result = None

        correlation_result = CorrelationResult(
            variable_a_name=self.variable_a_name,
            variable_b_name=self.variable_b_name,
            coords=coords,
            stats_result=spearmansr_calc_result,
            regression_result=regression_result,
            worked_result=worked_result,
            decimal_points=self.decimal_points,
        )

        worked_example = (
            get_worked_example(correlation_result, style_spec.style_name_hyphens) if self.show_workings else '')

        scatterplot_series = ScatterplotSeries(
            coords=coords,
            dot_colour=style_spec.chart.colour_mappings[0].main,
            dot_line_colour=style_spec.chart.major_grid_line_colour,
            show_regression_details=True,
        )
        vars_series = [scatterplot_series, ]
        xs = correlation_result.xs
        ys = correlation_result.ys
        x_min, x_max = get_optimal_min_max(axis_min=min(xs), axis_max=max(xs))
        y_min, y_max = get_optimal_min_max(axis_min=min(ys), axis_max=max(ys))
        chart_conf = ScatterplotConf(
            width_inches=7.5,
            height_inches=4.0,
            inner_background_colour=style_spec.chart.plot_bg_colour,
            text_colour=style_spec.chart.axis_font_colour,
            x_axis_label=self.variable_a_name,
            y_axis_label=self.variable_b_name,
            show_dot_lines=True,
            x_min=x_min,
            x_max=x_max,
            y_min=y_min,
            y_max=y_max,
        )
        fig = get_scatterplot_fig(vars_series, chart_conf)
        image_as_data = plot2image_as_data(fig)
        scatterplot_html = f'<img src="{image_as_data}"/>'

        result = Result(**todict(correlation_result),
            scatterplot_html=scatterplot_html,
            worked_example=worked_example,
        )
        html = get_html(result, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.STATS,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
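
The listing above embeds the scatterplot directly in the HTML via `<img src="{image_as_data}"/>`. `plot2image_as_data` itself is not shown in this section, so the following is only a minimal sketch of one common way to inline an image, assuming the image is available as PNG bytes and is embedded as a base64 data URI. All function names below are hypothetical and are not part of sofastats.

```python
import base64
import struct
import zlib


def _png_chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    body = kind + data
    return struct.pack('>I', len(data)) + body + struct.pack('>I', zlib.crc32(body) & 0xFFFFFFFF)


def make_1x1_png() -> bytes:
    """Hypothetical stand-in for a rendered chart: a 1x1 red PNG."""
    signature = b'\x89PNG\r\n\x1a\n'
    ihdr = struct.pack('>IIBBBBB', 1, 1, 8, 2, 0, 0, 0)  # 1x1 pixels, 8-bit RGB
    raw_row = b'\x00' + b'\xff\x00\x00'  # filter byte followed by one red pixel
    idat = zlib.compress(raw_row)
    return signature + _png_chunk(b'IHDR', ihdr) + _png_chunk(b'IDAT', idat) + _png_chunk(b'IEND', b'')


def png_bytes_to_data_uri(png_bytes: bytes) -> str:
    """Encode PNG bytes as a data URI usable in an <img> src attribute."""
    encoded = base64.b64encode(png_bytes).decode('ascii')
    return f"data:image/png;base64,{encoded}"


def png_bytes_to_img_html(png_bytes: bytes) -> str:
    """Wrap the data URI in an <img> tag, mirroring the f-string in the listing."""
    return f'<img src="{png_bytes_to_data_uri(png_bytes)}"/>'


png_bytes = make_1x1_png()
img_html = png_bytes_to_img_html(png_bytes)
```

Inlining the image this way keeps the HTML output self-contained, with no separate image files to ship alongside it.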

Independent Samples T-Test

See CommonStatsDesign for details of the to_result() method common to all stats output design dataclasses in sofastats.

sofastats.output.stats.ttest_indep.TTestIndepDesign dataclass

Bases: CommonStatsDesign

Parameters:

  • measure_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the name of the field whose values are averaged within each group; the analysis compares the two group means. For example, 'Age'

  • grouping_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the name of the field used to define the groups compared in the analysis e.g. 'Country'

  • group_a_value (Any, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the analysis will compare the mean value for this group against the mean value of the group defined by group_b_value

  • group_b_value (Any, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the analysis will compare the mean value of this group against the mean value of the group defined by group_a_value
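
To make the comparison of group means concrete, here is a minimal standard-library sketch of a Student's (pooled-variance) independent samples t statistic. It is an illustration only: the function name is hypothetical, and `ttest_indep_stats_calc` in sofastats may use a different variant and also reports further details (such as a p value) that are not computed here.

```python
import math
import statistics


def student_t_indep(sample_a: list[float], sample_b: list[float]) -> tuple[float, int]:
    """Return (t statistic, degrees of freedom) for two independent samples.

    Hypothetical helper; assumes equal variances (pooled-variance Student's t).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("Each group needs at least two values")
    mean_a = statistics.fmean(sample_a)
    mean_b = statistics.fmean(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("No variation in either group, so t is undefined")
    t = (mean_a - mean_b) / std_err
    return t, df


# Group A plays the role of group_a_value, group B of group_b_value
ages_group_a = [1, 2, 3, 4, 5]
ages_group_b = [2, 3, 4, 5, 6]
t, df = student_t_indep(ages_group_a, ages_group_b)
```

In this example both groups have a variance of 2.5 and their means differ by 1, so the statistic is negative because group A's mean is the lower of the two.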

Source code in src/sofastats/output/stats/ttest_indep.py
@dataclass(frozen=False)
class TTestIndepDesign(CommonStatsDesign):
    """
    Args:
        measure_field_name: the name of the field whose values are averaged within each group;
            the analysis compares the two group means. For example, 'Age'
        grouping_field_name: the name of the field used to define the groups compared in the analysis e.g. 'Country'
        group_a_value: the analysis will compare the mean value for this group
            against the mean value of the group defined by group_b_value
        group_b_value: the analysis will compare the mean value of this group
            against the mean value of the group defined by group_a_value
    """
    measure_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    grouping_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    group_a_value: Any = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    group_b_value: Any = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY

    def to_result(self) -> TTestIndepResult:
        ## data
        ## build samples ready for ttest_indep function
        grouping_filt_a = ValFilterSpec(variable_name=self.grouping_field_name,
            value=self.group_a_value, val_is_numeric=is_numeric(self.group_a_value))
        sample_a = get_sample(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            grouping_filt=grouping_filt_a, measure_field_name=self.measure_field_name,
            table_filter_sql=self.table_filter_sql)
        grouping_filt_b = ValFilterSpec(variable_name=self.grouping_field_name,
            value=self.group_b_value, val_is_numeric=is_numeric(self.group_b_value))
        sample_b = get_sample(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            grouping_filt=grouping_filt_b, measure_field_name=self.measure_field_name,
            table_filter_sql=self.table_filter_sql)
        ## get result
        stats_result = ttest_indep_stats_calc(sample_a, sample_b)
        return stats_result

    def to_html_design(self) -> HTMLItemSpec:
        ## style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        ## build samples ready for ttest_indep function
        grouping_filt_a = ValFilterSpec(variable_name=self.grouping_field_name,
            value=self.group_a_value, val_is_numeric=is_numeric(self.group_a_value))
        sample_a = get_sample(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            grouping_filt=grouping_filt_a, measure_field_name=self.measure_field_name,
            table_filter_sql=self.table_filter_sql)
        grouping_filt_b = ValFilterSpec(variable_name=self.grouping_field_name,
            value=self.group_b_value, val_is_numeric=is_numeric(self.group_b_value))
        sample_b = get_sample(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            grouping_filt=grouping_filt_b, measure_field_name=self.measure_field_name,
            table_filter_sql=self.table_filter_sql)
        ## get result
        stats_result = ttest_indep_stats_calc(sample_a, sample_b)

        mpl_pngs.set_gen_mpl_settings(axes_label_size=10, xtick_label_size=8, ytick_label_size=8)
        histograms2show = []
        for group_spec in [stats_result.group_a_spec, stats_result.group_b_spec]:
            try:
                histogram_html = get_embedded_histogram_html(
                    self.measure_field_name, style_spec.chart, group_spec.vals, group_spec.label)
            except Exception as e:
                html_or_msg = f"<b>{group_spec.label}</b> - unable to display histogram. Reason: {e}"
            else:
                html_or_msg = histogram_html
            histograms2show.append(html_or_msg)

        result = Result(**todict(stats_result),
            grouping_field_name=self.grouping_field_name,
            measure_field_name=self.measure_field_name,
            histograms2show=histograms2show,
            decimal_points=self.decimal_points,
        )
        html = get_html(result, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.STATS,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )

Wilcoxon Signed Ranks

See CommonStatsDesign for details of the to_result() method common to all stats output design dataclasses in sofastats.

sofastats.output.stats.wilcoxon_signed_ranks.WilcoxonSignedRanksDesign dataclass

Bases: CommonStatsDesign

Parameters:

  • variable_a_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the first variable in each pair we are checking for a difference

  • variable_b_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY ) –

    the second variable in each pair we are checking for a difference

  • show_workings (bool, default: False ) –

    show the workings so you can see how the final results were derived
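
As a rough picture of the kind of workings involved in a signed ranks test, the following standard-library sketch computes the signed rank sums for paired values. It is an assumption-laden illustration rather than the sofastats implementation: the function name is hypothetical, zero differences are dropped, tied absolute differences share an average rank, and no p value is computed.

```python
def wilcoxon_signed_rank_sums(vals_a: list[float], vals_b: list[float]) -> tuple[float, float, float]:
    """Return (sum of positive ranks, sum of negative ranks, T) for paired values.

    Hypothetical helper. T is the smaller of the two rank sums.
    """
    if len(vals_a) != len(vals_b):
        raise ValueError("Paired samples must have the same length")
    # Differences within each pair; pairs with no difference are dropped
    diffs = [a - b for a, b in zip(vals_a, vals_b) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        # Find the run of tied absolute differences starting at position i
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        avg_rank = ((i + 1) + (j + 1)) / 2  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    sum_pos = sum(rank for diff, rank in zip(ordered, ranks) if diff > 0)
    sum_neg = sum(rank for diff, rank in zip(ordered, ranks) if diff < 0)
    return sum_pos, sum_neg, min(sum_pos, sum_neg)


# variable_a_name and variable_b_name would identify columns holding values like these
before = [10, 12, 9, 15, 11]
after = [8, 12, 10, 11, 14]
sum_pos, sum_neg, t_stat = wilcoxon_signed_rank_sums(before, after)
```

Here the differences are 2, 0, -1, 4 and -3; the zero is dropped and the remaining absolute differences receive ranks 1 to 4.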

Source code in src/sofastats/output/stats/wilcoxon_signed_ranks.py
@dataclass(frozen=False)
class WilcoxonSignedRanksDesign(CommonStatsDesign):
    """
    Args:
        variable_a_name: the first variable in each pair we are checking for a difference
        variable_b_name: the second variable in each pair we are checking for a difference
        show_workings: show the workings so you can see how the final results were derived
    """
    variable_a_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    variable_b_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY

    show_workings: bool = False

    def to_result(self) -> WilcoxonSignedRanksResult:
        ## data
        paired_data = get_paired_data(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            variable_a_name=self.variable_a_name, variable_b_name=self.variable_b_name,
            table_filter_sql=self.table_filter_sql)
        stats_result = wilcoxon_signed_ranks_stats_calc(
            sample_a=paired_data.sample_a, sample_b=paired_data.sample_b, high_volume_ok=False)
        return stats_result

    def to_html_design(self) -> HTMLItemSpec:
        ## style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        paired_data = get_paired_data(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            variable_a_name=self.variable_a_name, variable_b_name=self.variable_b_name,
            table_filter_sql=self.table_filter_sql)
        stats_result = wilcoxon_signed_ranks_stats_calc(
            sample_a=paired_data.sample_a, sample_b=paired_data.sample_b, high_volume_ok=False)

        if self.show_workings:
            result_workings = wilcoxon_signed_ranks_for_workings(
                sample_a=paired_data.sample_a, sample_b=paired_data.sample_b,
                label_a=self.variable_a_name, label_b=self.variable_b_name)
            worked_example = get_worked_example(result_workings, style_spec.style_name_hyphens)
        else:
            worked_example = ''

        result = Result(**todict(stats_result),
            worked_example=worked_example,
            decimal_points=self.decimal_points,
        )
        html = get_html(result, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.STATS,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )