API Documentation

Common Data Types
Sorting
All Designs allow you to control the sort order of values.
One of the options is CUSTOM sorting.
This is entirely optional but can be very useful.
For example, if you have the following values in the Age Group variable:
'<20', '20-39', ... '80+'
you probably don't want the default alphabetical sorting by value, because '<20' would then appear at the end.
To supply a CUSTOM sort order, use either the sort_orders or the sort_orders_yaml_file_path setting, which all Design objects have.
See CommonDesign for details.
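The following sketch uses plain Python, not sofastats internals, to show why alphabetical sorting misplaces '<20' and how an explicit sequence fixes it. The intermediate labels ('40-59', '60-79') are made up for the example.

```python
# Default alphabetical (VALUE-style) sorting vs a custom order for age-group labels.
age_groups = ['80+', '20-39', '<20', '60-79', '40-59']

# String sorting compares characters by code point. '<' (U+003C) sorts after
# the digits '0'-'9' (U+0030-U+0039), so '<20' ends up last.
alphabetical = sorted(age_groups)

# A custom order lists every value in the desired sequence.
# list.index raises ValueError for any value missing from the list.
custom_order = ['<20', '20-39', '40-59', '60-79', '80+']
custom = sorted(age_groups, key=custom_order.index)

print(alphabetical)  # ['20-39', '40-59', '60-79', '80+', '<20']
print(custom)        # ['<20', '20-39', '40-59', '60-79', '80+']
```

Using `custom_order.index` as the key also shows why a custom order must include every value: an unlisted value has no position to sort by.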
sofastats.conf.main.SortOrder
Bases: StrEnum
Sort orders to apply.
Note: INCREASING and DECREASING only apply to sorting at the final (innermost) values level.
E.g. if the variables are nested as 'Age Group' > 'Handedness' > 'Home Location Type', only 'Home Location Type'
can be sorted by frequency.
Source code in src/sofastats/conf/main.py
class SortOrder(StrEnum):
    """
    Sort orders to apply.
    Note - INCREASING & DECREASING only apply to sorting at the final values level.
    E.g. If 'Age Group' > 'Handedness' > 'Home Location Type' then only 'Home Location Type'
    can potentially have sort order by frequency
    """
    CUSTOM = 'by custom order'
    "By custom order configured in YAML or dictionary for relevant variable"
    DECREASING = 'by decreasing frequency'
    "By decreasing frequency"
    INCREASING = 'by increasing frequency'
    "By increasing frequency"
    VALUE = 'by value'
    "By value alphabetically sorted"
CUSTOM = 'by custom order'
By a custom order configured in YAML or a dictionary for the relevant variable
DECREASING = 'by decreasing frequency'
By decreasing frequency
INCREASING = 'by increasing frequency'
By increasing frequency
VALUE = 'by value'
By value, sorted alphabetically
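To make the four members concrete, here is a minimal sketch of applying each sort order to a list of values. It relies on two assumptions. First, `SortOrder` is a local stand-in that mirrors the documented member values; it uses `(str, Enum)` so it also runs before Python 3.11's `StrEnum`. Second, `sort_values` is a hypothetical helper, not part of sofastats. Breaking frequency ties alphabetically is also an assumption, not documented library behaviour.

```python
from collections import Counter
from enum import Enum


class SortOrder(str, Enum):
    """Local stand-in mirroring the documented SortOrder member values."""
    CUSTOM = 'by custom order'
    DECREASING = 'by decreasing frequency'
    INCREASING = 'by increasing frequency'
    VALUE = 'by value'


def sort_values(values, sort_order, custom_order=None):
    """Hypothetical helper: return the distinct values in the requested order."""
    freqs = Counter(values)
    distinct = sorted(freqs)  # alphabetical; sorted() is stable, so this also breaks ties
    if sort_order == SortOrder.VALUE:
        return distinct
    if sort_order == SortOrder.INCREASING:
        return sorted(distinct, key=lambda v: freqs[v])
    if sort_order == SortOrder.DECREASING:
        return sorted(distinct, key=lambda v: -freqs[v])
    if sort_order == SortOrder.CUSTOM:
        custom_order = custom_order or []
        missing = set(distinct) - set(custom_order)
        if missing:  # every value must appear in a custom order
            raise ValueError(f"Custom order is missing: {sorted(missing)}")
        return [v for v in custom_order if v in freqs]
    raise ValueError(f"Unsupported sort order: {sort_order!r}")


handedness = ['Left', 'Right', 'Left', 'Ambidextrous', 'Left', 'Right']
print(sort_values(handedness, SortOrder.VALUE))       # ['Ambidextrous', 'Left', 'Right']
print(sort_values(handedness, SortOrder.DECREASING))  # ['Left', 'Right', 'Ambidextrous']
print(sort_values(handedness, 'by increasing frequency'))  # ['Ambidextrous', 'Right', 'Left']
```

Because the members are also strings, the plain string 'by increasing frequency' compares equal to `SortOrder.INCREASING`. This mirrors the `SortOrder | str` type hints used by the designs.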
Common Parameters
The parameters in CommonDesign are common to all output design dataclasses:
sofastats.output.interfaces.CommonDesign
dataclass
Bases: ABC
Output dataclasses (e.g. ClusteredBoxplotChartDesign) inherit from CommonDesign.
Because the CommonDesign attributes come first and all have defaults, dataclass rules do not allow the output dataclasses to add attributes without defaults.
Therefore, every attribute in the output dataclasses must have a default, including mandatory fields.
So how do we ensure that values for those mandatory fields are actually supplied?
We use a decorator (add_post_init_enforcing_mandatory_cols) to add a __post_init__ handler
which runs CommonDesign.__post_init__ and then raises an error for every attribute
still set to DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY.
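The following self-contained sketch shows the sentinel-default pattern described above. `MANDATORY`, `BaseDesign` and `BarDesign` are hypothetical names. The sketch does not reproduce sofastats' own sentinel value or decorator.

```python
from dataclasses import dataclass, fields

# Hypothetical sentinel standing in for sofastats' DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY.
MANDATORY = '__DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY__'


@dataclass
class BaseDesign:
    style_name: str = 'default'  # every base attribute has a default

    def __post_init__(self):
        # Raise a friendly error for any field the caller left at the sentinel.
        for field in fields(self):
            if getattr(self, field.name) == MANDATORY:
                raise ValueError(f"You need to supply a value for {field.name} "
                                 f"in your {self.__class__.__name__}")


@dataclass
class BarDesign(BaseDesign):
    # A plain `category_field_name: str` (no default) here would make @dataclass
    # raise TypeError, because it would follow the defaulted base attribute.
    category_field_name: str = MANDATORY
    rotate_x_labels: bool = False


design = BarDesign(category_field_name='Country')
try:
    BarDesign()
except ValueError as e:
    print(e)  # You need to supply a value for category_field_name in your BarDesign
```

The subclass inherits `__post_init__`, so the generated `__init__` runs the check automatically. On Python 3.10+, `@dataclass(kw_only=True)` is another way around the ordering restriction.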
Parameters:
- csv_file_path (Path | str | None, default: None) – full file path to CSV file (if using CSV as source)
- csv_separator (str, default: ',') – CSV separator (if using CSV as source)
- cur (Any | None, default: None) – DB-API 2.0 (PEP 249) cursor, i.e. an object able to run cur.execute(), cur.fetchall() etc. (if using a cursor as source)
- database_engine_name (DbeName | str | None, default: None) – e.g. DbeName.SQLITE or 'sqlite' (if using a cursor as the source)
- source_table_name (str | None, default: None) – source table name (if using the cursor as a source OR using the internal SOFA SQLite database)
- table_filter_sql (str | None, default: None) – valid SQL to filter the source table. It must be in the appropriate SQL dialect,
  and entities should be quoted appropriately as needed,
  e.g. in SQLite, field names with spaces such as `Age Group` can be quoted with backticks
- style_name (str, default: 'default') – e.g. 'default'. Either one of the built-in styles under sofastats.output.styles
  or a custom style defined by YAML in the custom_styles subfolder of the sofastats local folder
  e.g. ~/Documents/sofastats/custom_styles
- output_file_path (Path | str | None, default: None) – full path of the HTML output file to be generated.
  If not set, a timestamped HTML file is created in the current working directory.
- output_title (str | None, default: None) – the title the HTML output will display in a web browser
- show_in_web_browser (bool, default: True) – if True, opens a tab in your default browser displaying the generated output file
- sort_orders (SortOrderSpecs | None, default: None) – if supplied, a dictionary that provides the sort orders for any variables given a custom sort order
  (SortOrder.CUSTOM). Multiple sort orders can be defined, with each variable given a custom sort order
  being a key in the dictionary. Example:
    {
        'Age Group': [
            '<20',
            '20 to <30', '30 to <40', '40 to <50',
            '50 to <60', '60 to <70', '70 to <80',
            '80+',
        ]
    }
  If the sort order applied was SortOrder.VALUE, we would see '<20' appearing as the last value
  by alphabetical order. If a custom order is defined, every value must appear in the list defining the
  desired sequence.
  Don't supply both sort_orders and sort_orders_yaml_file_path.
- sort_orders_yaml_file_path (Path | str | None, default: None) – path to a YAML file defining custom sort orders. See structure and effect as
  discussed under sort_orders. Don't supply both sort_orders and sort_orders_yaml_file_path.
- decimal_points (int, default: 3) – the maximum number of decimal places displayed.
  If set to 3, for example, 1.23456789 will be displayed as 1.235, 1.320000000 as 1.32, and
  1.60000000 as 1.6.
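The display rule for decimal_points (round to at most N places, then drop trailing zeros) can be sketched as follows. `format_decimal` is a hypothetical helper, not sofastats' implementation. Values exactly halfway between two representable results may round differently, because Python formats binary floats.

```python
def format_decimal(value: float, decimal_points: int = 3) -> str:
    """Hypothetical: round to at most `decimal_points` places, then strip trailing zeros."""
    text = f"{value:.{decimal_points}f}"  # fixed-point, rounded to decimal_points places
    if '.' in text:
        text = text.rstrip('0').rstrip('.')  # '1.320' -> '1.32', '2.000' -> '2'
    return text


print(format_decimal(1.23456789))   # 1.235
print(format_decimal(1.320000000))  # 1.32
print(format_decimal(1.60000000))   # 1.6
```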
Source code in src/sofastats/output/interfaces.py
@dataclass(frozen=False)
class CommonDesign(ABC):
    """
    Output dataclasses (e.g. ClusteredBoxplotChartDesign) inherit from CommonDesign.
    Can't have defaults in CommonDesign attributes (which go first) and then missing defaults for the output dataclasses.
    Therefore, we are required to supply defaults for everything in the output dataclasses.
    That includes mandatory fields.
    So how do we ensure those mandatory field arguments are supplied?
    We use a decorator (add_post_init_enforcing_mandatory_cols) to add a __post_init__ handler
    which runs CommonDesign.__post_init__ and then enforces the supply of values for every attribute
    which has DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY.
    Args:
        csv_file_path: full file path to CSV file (if using CSV as source)
        csv_separator: CSV separator (if using CSV as source)
        cur: DB-API 2.0 (PEP 249) cursor i.e. an object able to run `cur.execute()`, `cur.fetchall()` etc. (if using a cursor as source)
        database_engine_name: e.g. `DbeName.SQLITE` or 'sqlite' (if using a cursor as the source)
        source_table_name: source table name (if using the cursor as a source OR using the internal SOFA SQLite database)
        table_filter_sql: valid SQL to filter the source table - must be in the appropriate SQL dialect
            and entities should be quoted appropriately as needed
            e.g. in SQLite, field names with spaces such as \`Age Group\` can be quoted with backticks
        style_name: e.g. 'default'. Either one of the built-in styles under `sofastats.output.styles`
            or a custom style defined by YAML in the custom_styles subfolder of the sofastats local folder
            e.g. `~/Documents/sofastats/custom_styles`
        output_file_path: full path of the HTML output file to be generated
        output_title: the title the HTML output will display in a web browser
        show_in_web_browser: if `True` will open a tab in your default browser to display the output file generated
        sort_orders: if supplied, a dictionary that provides the sort orders for any variables given a custom sort order
            (`SortOrder.CUSTOM`). Multiple sort orders can be defined - with each variable given a custom sort order
            being a key in the dictionary. Example:
            ```python
            {
                'Age Group': [
                    '<20',
                    '20 to <30', '30 to <40', '40 to <50',
                    '50 to <60', '60 to <70', '70 to <80',
                    '80+',
                ]
            }
            ```
            If the sort order applied was `SortOrder.VALUE`, we would see '<20' appearing as the last value
            by alphabetical order. If a custom order is defined, every value must appear in the list defining the
            desired sequence.
            Don't supply both `sort_orders` and `sort_orders_yaml_file_path`.
        sort_orders_yaml_file_path: file path containing YAML defining custom sort orders. See structure and effect as
            discussed under `sort_orders`. Don't supply both `sort_orders` and `sort_orders_yaml_file_path`.
        decimal_points: defines the maximum number of decimal places displayed.
            If set to 3, for example, 1.23456789 will be displayed as 1.235. 1.320000000 will be displayed as 1.32, and
            1.60000000 as 1.6.
    """
    ## inputs ***********************************
    csv_file_path: Path | str | None = None
    csv_separator: str = ','
    cur: Any | None = None
    database_engine_name: DbeName | str | None = None
    source_table_name: str | None = None
    table_filter_sql: str | None = None
    ## outputs **********************************
    style_name: str = 'default'
    output_file_path: Path | str | None = None
    output_title: str | None = None
    show_in_web_browser: bool = True
    sort_orders: SortOrderSpecs | None = None
    sort_orders_yaml_file_path: Path | str | None = None
    decimal_points: int = 3
    @abstractmethod
    def to_html_design(self) -> HTMLItemSpec:
        """
        From the design produce the HTML to display as one of the attributes of the HTMLItemSpec.
        Also return the style name and output item type e.g. whether a chart, table, or statistical output
        """
        pass
    def _handle_inputs(self):
        """
        There are three main paths for specific data values to be supplied to the design:
        1. CSV - data will be ingested into internal sofastats SQLite database
        (`source_table_name` optional - later analyses might be referring to that ingested table
        so nice to let user choose the name)
        2. `cur`, `database_engine_name`, and `source_table_name`
        3. or just a `source_table_name` (assumed to be using internal sofastats SQLite database)
        Any supplied cursors are "wrapped" inside an `ExtendedCursor` so we can use `.exe()` instead of `.execute()`
        and to provide better error messages on query failure.
        Client code supplies `database_engine_name` rather than dbe_spec for simplicity but internally
        `CommonDesign` supplies all code that inherits from it a `dbe_spec` attribute ready to use.
        Settings are validated e.g. to prevent client code supplying both CSV settings and database settings.
        """
        if self.csv_file_path:
            if self.cur or self.database_engine_name or self.source_table_name or self.table_filter_sql:
                raise Exception("If supplying a CSV path don't also supply database requirements")
            if not self.csv_separator:
                self.csv_separator = ','
            if not SQLITE_DB.get('sqlite_default_cur'):
                SQLITE_DB['sqlite_default_con'] = sqlite.connect(INTERNAL_DATABASE_FPATH)
                SQLITE_DB['sqlite_default_cur'] = ExtendedCursor(SQLITE_DB['sqlite_default_con'].cursor())
            self.cur = SQLITE_DB['sqlite_default_cur']
            self.dbe_spec = get_dbe_spec(DbeName.SQLITE)
            if not self.source_table_name:
                self.source_table_name = get_safer_name(Path(self.csv_file_path).stem)
            ## ingest CSV into database
            df = pd.read_csv(self.csv_file_path, sep=self.csv_separator)
            try:
                df.to_sql(self.source_table_name, SQLITE_DB['sqlite_default_con'], if_exists='replace', index=False)
            except Exception as e:  ## TODO: supply more specific exception
                logger.info(f"Failed at attempt to ingest CSV from '{self.csv_file_path}' "
                    f"into internal pysofa SQLite database as table '{self.source_table_name}'.\nError: {e}")
            else:
                logger.info(f"Successfully ingested CSV from '{self.csv_file_path}' "
                    f"into internal pysofa SQLite database as table '{self.source_table_name}'")
        elif self.cur:
            self.cur = ExtendedCursor(self.cur)
            if not self.database_engine_name:
                supported_names = '"' + '", "'.join(name.value for name in DbeName) + '"'
                raise Exception("When supplying a cursor, a database_engine_name must also be supplied. "
                    f"Supported names currently are: {supported_names}")
            else:
                self.dbe_spec = get_dbe_spec(self.database_engine_name)
            if not self.source_table_name:
                raise Exception("When supplying a cursor, a source_table_name must also be supplied")
        elif self.source_table_name:
            if not SQLITE_DB.get('sqlite_default_cur'):
                SQLITE_DB['sqlite_default_con'] = sqlite.connect(INTERNAL_DATABASE_FPATH)
                SQLITE_DB['sqlite_default_cur'] = ExtendedCursor(SQLITE_DB['sqlite_default_con'].cursor())
            self.cur = SQLITE_DB['sqlite_default_cur']  ## not already set if in the third path - will have gone down first
            if self.database_engine_name and self.database_engine_name != DbeName.SQLITE:
                raise Exception("If not supplying a csv_file_path, or a cursor, the only permitted database engine is "
                    "SQLite (the dbe of the internal sofastats SQLite database)")
            self.dbe_spec = get_dbe_spec(DbeName.SQLITE)
        else:
            raise Exception("Either supply a path to a CSV "
                "(optional tbl_name for when ingested into internal sofastats SQLite database), "
                "a cursor (with dbe_name and tbl_name), "
                "or a tbl_name (data assumed to be in internal sofastats SQLite database)")
    def _handle_outputs(self):
        """
        Validate configuration and provide sane defaults for `output_title` and `output_file_path` if nothing set.
        """
        ## output file path and title
        nice_name = '_'.join(self.__module__.split('.')[-2:]) + f"_{self.__class__.__name__}"
        if not self.output_file_path:
            now = datetime.datetime.now().strftime('%Y_%m_%d_%H_%M_%S')
            self.output_file_path = Path.cwd() / f"{nice_name}_{now}.html"
        if not self.output_title:
            self.output_title = f"{nice_name} Output"
        ## sort orders
        if self.sort_orders:
            if self.sort_orders_yaml_file_path:
                raise Exception("Oops - it looks like you supplied settings for both sort_orders "
                    "and sort_orders_yaml_file_path. Please set one or both of them to None.")
            else:
                pass
        elif self.sort_orders_yaml_file_path:
            yaml = YAML(typ='safe')  ## default, if not specified, is 'rt' (round-trip)
            self.sort_orders = yaml.load(Path(self.sort_orders_yaml_file_path))  ## might be a str or Path so make sure
        else:
            self.sort_orders = {}
    def __post_init__(self):
        self._handle_inputs()
        self._handle_outputs()
        for field in fields(self):
            if self.__getattribute__(field.name) == DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY:
                ## raise a friendly error for when they didn't supply a mandatory field that technically had a default (DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY), but we want to insist they supply a real value
                client_module = self.__module__.split('.')[-1]
                nice_name = f"{client_module}.{self.__class__.__name__}"  ## e.g. anova.AnovaDesign
                raise Exception(f"Oops - you need to supply a value for {field.name} in your {nice_name}")
    def __repr_html__(self):
        return self.__str__()  ## call __str__ so a string is returned rather than the bound method
    def make_output(self):
        """
        Produce HTML output, e.g. charts and numerical results, save to `output_file_path`,
        and open in web browser if `show_in_web_browser=True`.
        """
        self.to_html_design().to_file(fpath=self.output_file_path)
        if self.show_in_web_browser:
            open_new_tab(url=f"file://{self.output_file_path}")
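The three input paths that `_handle_inputs` validates can be sketched with only the standard library. `resolve_source` and the default table name 'ingested_csv' are hypothetical. The sketch uses `csv` + `sqlite3` in place of sofastats' pandas ingestion and `ExtendedCursor`. It also treats source_table_name as optional for the CSV path, as the docstring describes. It ends with a table_filter_sql written in SQLite dialect, using backticks around a field name that contains a space.

```python
import csv
import io
import sqlite3


def resolve_source(internal_con, csv_text=None, csv_separator=',', cur=None,
                   database_engine_name=None, source_table_name=None):
    """Hypothetical sketch of the three input paths; returns (cursor, table_name)."""
    if csv_text is not None:
        # Path 1: CSV ingested into the internal SQLite database.
        if cur is not None or database_engine_name:
            raise ValueError("If supplying a CSV don't also supply database settings")
        table_name = source_table_name or 'ingested_csv'
        header, *data = list(csv.reader(io.StringIO(csv_text), delimiter=csv_separator))
        columns = ', '.join(f'`{name}`' for name in header)
        placeholders = ', '.join('?' for _ in header)
        internal_con.execute(f'DROP TABLE IF EXISTS `{table_name}`')  # replace any earlier ingest
        internal_con.execute(f'CREATE TABLE `{table_name}` ({columns})')
        internal_con.executemany(f'INSERT INTO `{table_name}` VALUES ({placeholders})', data)
        return internal_con.cursor(), table_name
    if cur is not None:
        # Path 2: the caller's own cursor, plus an engine name and a table name.
        if not database_engine_name:
            raise ValueError("When supplying a cursor, a database_engine_name must also be supplied")
        if not source_table_name:
            raise ValueError("When supplying a cursor, a source_table_name must also be supplied")
        return cur, source_table_name
    if source_table_name:
        # Path 3: a table assumed to already be in the internal SQLite database.
        if database_engine_name and database_engine_name != 'sqlite':
            raise ValueError("Without a CSV or a cursor, only SQLite is permitted")
        return internal_con.cursor(), source_table_name
    raise ValueError("Supply a CSV, a cursor (with engine and table names), or a table name")


con = sqlite3.connect(':memory:')
cur, table = resolve_source(con, csv_text='Age Group;Handedness\n<20;Left\n80+;Right\n<20;Right\n',
                            csv_separator=';')
table_filter_sql = "`Age Group` = '<20'"  # SQLite dialect: backticks quote a name with a space
cur.execute(f'SELECT COUNT(*) FROM `{table}` WHERE {table_filter_sql}')
print(cur.fetchone()[0])  # 2
```

Interpolating names and filter SQL into queries like this is only reasonable for trusted input. Values themselves go through `?` placeholders.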
to_html_design() -> HTMLItemSpec
abstractmethod
From the design, produce an HTMLItemSpec whose attributes hold the HTML to display, the style name,
and the output item type (e.g. whether a chart, table, or statistical output).
make_output()
Produce HTML output (e.g. charts and numerical results), save it to output_file_path,
and open it in a web browser if show_in_web_browser=True.
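A minimal stand-in for the save-then-open step is sketched below, using `pathlib` and the standard `webbrowser` module. `make_output` here is a hypothetical free function, not the sofastats method. The source listing builds the URL as f"file://{path}". This sketch uses `Path.as_uri()` instead, which escapes special characters and handles Windows drive letters. `as_uri()` requires an absolute path, hence the `resolve()` call.

```python
import tempfile
import webbrowser
from pathlib import Path


def make_output(html: str, output_file_path, show_in_web_browser: bool = True) -> str:
    """Hypothetical stand-in: write the HTML, then optionally open it in a new browser tab."""
    output_file_path = Path(output_file_path).resolve()  # as_uri() needs an absolute path
    output_file_path.write_text(html, encoding='utf-8')
    url = output_file_path.as_uri()  # e.g. file:///tmp/.../demo.html
    if show_in_web_browser:
        webbrowser.open_new_tab(url)
    return url


out_dir = Path(tempfile.mkdtemp())
url = make_output('<h1>Demo</h1>', out_dir / 'demo.html', show_in_web_browser=False)
print(url)
```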
Charts
Area Charts
See CommonDesign
for the parameters common to all output design dataclasses in sofastats - for example, style_name.
See AreaChartDesign for the parameters
configuring individual area chart designs.
sofastats.output.charts.area.AreaChartDesign
dataclass
Bases: CommonDesign
Parameters:
- category_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY) – name of field in the x-axis
- category_sort_order (SortOrder | str, default: VALUE) – define order of categories in each chart e.g. SortOrder.VALUE or SortOrder.CUSTOM
- is_time_series (bool, default: False) – space x-axis labels according to time e.g. there might be variable gaps between items
- show_major_ticks_only (bool, default: True) – suppress minor ticks
- show_markers (bool, default: True) – show markers on the line bounding the area
- rotate_x_labels (bool, default: False) – make x-axis labels vertical
- show_n_records (bool, default: True) – show the number of records the chart is based on
- x_axis_font_size (int, default: 12) – font size for x-axis labels
- y_axis_title (str, default: 'Freq') – title displayed vertically alongside y-axis
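As a rough illustration of what is_time_series changes, the sketch below compares evenly spaced category positions with positions proportional to elapsed time. `x_positions` is hypothetical. In sofastats the flag is simply passed on to the charting spec, and the actual spacing is handled by the chart rendering.

```python
from datetime import date


def x_positions(dates, is_time_series):
    """Hypothetical: x-axis positions for category labels."""
    if not is_time_series:
        return list(range(1, len(dates) + 1))  # evenly spaced, one slot per category
    first = min(dates)
    return [(d - first).days for d in dates]  # spaced by elapsed days, so gaps show


dates = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 10)]
print(x_positions(dates, is_time_series=False))  # [1, 2, 3]
print(x_positions(dates, is_time_series=True))   # [0, 1, 9]
```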
Source code in src/sofastats/output/charts/area.py
@dataclass(frozen=False)
class AreaChartDesign(CommonDesign):
    """
    Args:
        category_field_name: name of field in the x-axis
        category_sort_order: define order of categories in each chart e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
        is_time_series: space x-axis labels according to time e.g. there might be variable gaps between items
        show_major_ticks_only: suppress minor ticks
        show_markers: show markers on the line bounding the area
        rotate_x_labels: make x-axis labels vertical
        show_n_records: show the number of records the chart is based on
        x_axis_font_size: font size for x-axis labels
        y_axis_title: title displayed vertically alongside y-axis
    """
    category_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    category_sort_order: SortOrder | str = SortOrder.VALUE
    is_time_series: bool = False
    show_major_ticks_only: bool = True
    show_markers: bool = True
    rotate_x_labels: bool = False
    show_n_records: bool = True
    x_axis_font_size: int = 12
    y_axis_title: str = 'Freq'
    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = get_by_category_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            category_field_name=self.category_field_name,
            sort_orders=self.sort_orders,
            category_sort_order=self.category_sort_order,
            table_filter_sql=self.table_filter_sql)
        ## chart details
        charting_spec = AreaChartingSpec(
            categories=intermediate_charting_spec.sorted_categories,
            indiv_chart_specs=[intermediate_charting_spec.to_indiv_chart_spec(), ],
            series_legend_label=None,
            rotate_x_labels=self.rotate_x_labels,
            show_n_records=self.show_n_records,
            is_time_series=self.is_time_series,
            show_major_ticks_only=self.show_major_ticks_only,
            show_markers=self.show_markers,
            x_axis_font_size=self.x_axis_font_size,
            x_axis_title=intermediate_charting_spec.category_field_name,
            y_axis_title=self.y_axis_title,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
sofastats.output.charts.area.MultiChartAreaChartDesign
dataclass
Bases: CommonDesign
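Parameters:
- chart_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY) – the field name defining the charts e.g. a chart_field_name of 'Country'
  might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.
- chart_sort_order (SortOrder | str, default: VALUE) – define order of charts e.g. SortOrder.VALUE or SortOrder.CUSTOM
The remaining parameters (category_field_name, category_sort_order, is_time_series, show_major_ticks_only, show_markers,
rotate_x_labels, show_n_records, x_axis_font_size, y_axis_title) work as for AreaChartDesign.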
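To show how chart_field_name splits the data and how chart_sort_order can be applied independently of the category order, here is a minimal plain-Python sketch. `split_into_charts` and the sample rows are hypothetical, not sofastats internals.

```python
from collections import defaultdict


def split_into_charts(rows, chart_field_name, category_field_name, chart_order=None):
    """Hypothetical: per-chart category frequencies, one chart per chart-field value."""
    charts = defaultdict(lambda: defaultdict(int))
    for row in rows:
        charts[row[chart_field_name]][row[category_field_name]] += 1
    # VALUE-style ordering sorts chart names alphabetically; a CUSTOM order lists them explicitly.
    names = chart_order if chart_order is not None else sorted(charts)
    return {name: dict(sorted(charts[name].items())) for name in names}


rows = [
    {'Country': 'NZ', 'Age Group': '<20'},
    {'Country': 'USA', 'Age Group': '80+'},
    {'Country': 'NZ', 'Age Group': '80+'},
    {'Country': 'NZ', 'Age Group': '<20'},
]
print(split_into_charts(rows, 'Country', 'Age Group'))
# {'NZ': {'80+': 1, '<20': 2}, 'USA': {'80+': 1}}
print(list(split_into_charts(rows, 'Country', 'Age Group', chart_order=['USA', 'NZ'])))
# ['USA', 'NZ']
```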
Source code in src/sofastats/output/charts/area.py
@dataclass(frozen=False)
class MultiChartAreaChartDesign(CommonDesign):
    """
    Args:
        chart_field_name: the field name defining the charts e.g. a `chart_field_name` of 'Country'
            might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.
        chart_sort_order: define order of charts e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
    """
    category_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    category_sort_order: SortOrder | str = SortOrder.VALUE
    chart_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    chart_sort_order: SortOrder | str = SortOrder.VALUE
    is_time_series: bool = False
    show_major_ticks_only: bool = True
    show_markers: bool = True
    rotate_x_labels: bool = False
    show_n_records: bool = True
    x_axis_font_size: int = 12
    y_axis_title: str = 'Freq'
    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = get_by_chart_category_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            category_field_name=self.category_field_name,
            chart_field_name=self.chart_field_name,
            sort_orders=self.sort_orders,
            category_sort_order=self.category_sort_order, chart_sort_order=self.chart_sort_order,
            table_filter_sql=self.table_filter_sql, decimal_points=self.decimal_points)
        ## chart details
        charting_spec = AreaChartingSpec(
            categories=intermediate_charting_spec.sorted_categories,
            indiv_chart_specs=intermediate_charting_spec.to_indiv_chart_specs(),
            series_legend_label=None,
            rotate_x_labels=self.rotate_x_labels,
            show_n_records=self.show_n_records,
            is_time_series=self.is_time_series,
            show_major_ticks_only=self.show_major_ticks_only,
            show_markers=self.show_markers,
            x_axis_font_size=self.x_axis_font_size,
            x_axis_title=intermediate_charting_spec.category_field_name,
            y_axis_title=self.y_axis_title,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
Bar Charts
See CommonDesign
for the parameters common to all output design dataclasses in sofastats - for example, style_name.
See SimpleBarChartDesign for the parameters
configuring individual bar chart designs.
sofastats.output.charts.bar.CommonBarDesign
dataclass
Bases: CommonDesign
Parameters:
- metric (ChartMetric, default: FREQ) – defines what bar heights represent, e.g. ChartMetric.FREQ, ChartMetric.PCT, etc.
- field_name (str | None, default: None) – the name of the field being aggregated when the metric is an aggregate
  e.g. ChartMetric.AVG or ChartMetric.SUM. Required for those metrics and must not be supplied for the others.
- y_axis_title (str | None, default: None) – title displayed vertically alongside y-axis. If not supplied, it is derived from the metric:
  'Frequency', 'Percent', 'Average <field_name>', or 'Summed <field_name>'.
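To illustrate what each metric means for bar heights, and the field_name rule enforced in `CommonBarDesign.__post_init__`, here is a minimal sketch. `ChartMetric` below is a local stand-in: its member names follow the docs, but its string values are assumed. `bar_heights` is hypothetical; sofastats computes these values from the database instead.

```python
from enum import Enum
from statistics import mean


class ChartMetric(str, Enum):
    """Local stand-in: member names follow the docs, string values are assumed."""
    AVG = 'avg'
    FREQ = 'freq'
    PCT = 'pct'
    SUM = 'sum'


def bar_heights(rows, category_field_name, metric=ChartMetric.FREQ, field_name=None):
    """Hypothetical: one bar height per category (alphabetical) for the given metric."""
    aggregating = metric in (ChartMetric.AVG, ChartMetric.SUM)
    if aggregating != (field_name is not None):
        raise ValueError("field_name is required for AVG/SUM and only allowed for them")
    groups = {}
    for row in rows:
        groups.setdefault(row[category_field_name], []).append(row)
    heights = {}
    for category, members in sorted(groups.items()):
        if metric == ChartMetric.FREQ:
            heights[category] = len(members)
        elif metric == ChartMetric.PCT:
            heights[category] = 100 * len(members) / len(rows)
        elif metric == ChartMetric.AVG:
            heights[category] = mean(r[field_name] for r in members)
        else:  # ChartMetric.SUM
            heights[category] = sum(r[field_name] for r in members)
    return heights


rows = [{'Country': 'NZ', 'Age': 30}, {'Country': 'NZ', 'Age': 50},
        {'Country': 'USA', 'Age': 40}, {'Country': 'USA', 'Age': 20}]
print(bar_heights(rows, 'Country'))                                     # {'NZ': 2, 'USA': 2}
print(bar_heights(rows, 'Country', ChartMetric.PCT))                    # {'NZ': 50.0, 'USA': 50.0}
print(bar_heights(rows, 'Country', ChartMetric.AVG, field_name='Age'))  # {'NZ': 40, 'USA': 30}
```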
Source code in src/sofastats/output/charts/bar.py
@dataclass(frozen=False)
class CommonBarDesign(CommonDesign):
    """
    Args:
        metric: defines what bar heights represent - whether ChartMetric.FREQ, ChartMetric.PCT, etc.
        field_name: the name of the field being aggregated when the metric is an aggregate
            e.g. ChartMetric.AVG or ChartMetric.SUM
        y_axis_title: title displayed vertically alongside y-axis
    """
    metric: ChartMetric = ChartMetric.FREQ
    field_name: str | None = None
    y_axis_title: str | None = None
    def __post_init__(self):
        super().__post_init__()
        if self.y_axis_title is None:  ##TODO - no field name unless aggregating
            if self.metric == ChartMetric.AVG:
                self.y_axis_title = f"Average {self.field_name}"
            elif self.metric == ChartMetric.FREQ:
                self.y_axis_title = 'Frequency'
            elif self.metric == ChartMetric.PCT:
                self.y_axis_title = 'Percent'
            elif self.metric == ChartMetric.SUM:
                self.y_axis_title = f"Summed {self.field_name}"
            else:
                raise ValueError(f'Metric {self.metric} is not supported.')
        if self.field_name is None:
            if self.metric in (ChartMetric.AVG, ChartMetric.SUM):
                raise ValueError("A field_name must be set if the metric aggregates "
                    "e.g. ChartMetric.AVG or ChartMetric.SUM")
        else:
            if self.metric not in (ChartMetric.AVG, ChartMetric.SUM):
                raise ValueError("A field_name should only be supplied if the metric aggregates "
                    "e.g. ChartMetric.AVG or ChartMetric.SUM")
    @abstractmethod
    def to_html_design(self) -> HTMLItemSpec:
        pass
sofastats.output.charts.bar.SimpleBarChartDesign
dataclass
Bases: CommonBarDesign
Parameters:
- category_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY) – name of field in the x-axis
- category_sort_order (SortOrder, default: VALUE) – define order of categories in each chart e.g. SortOrder.VALUE or SortOrder.CUSTOM
- rotate_x_labels (bool, default: False) – make x-axis labels vertical
- show_borders (bool, default: False) – show a coloured border around the bars
- show_n_records (bool, default: True) – show the number of records the chart is based on
- x_axis_font_size (int, default: 12) – font size for x-axis labels
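For a simple frequency bar chart, the underlying aggregation amounts to "count rows per category, optionally filtered". The sketch below shows that with `sqlite3`. The table, the data and the SQL are illustrative assumptions, not the query sofastats actually generates. Note that SQLite's default ordering puts '80+' before '<20', the same VALUE-order problem that custom sort orders address.

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE people (`Country` TEXT, `Age Group` TEXT)')
con.executemany('INSERT INTO people VALUES (?, ?)',
                [('NZ', '<20'), ('NZ', '80+'), ('USA', '<20'), ('NZ', '<20')])

category_field_name = 'Age Group'
table_filter_sql = "`Country` = 'NZ'"  # optional filter, SQLite dialect

# One row per category with its frequency; ORDER BY gives alphabetical (VALUE-style) order.
sql = (f"SELECT `{category_field_name}`, COUNT(*) AS freq FROM people "
       f"WHERE {table_filter_sql} "
       f"GROUP BY `{category_field_name}` ORDER BY `{category_field_name}`")
print(con.execute(sql).fetchall())  # [('80+', 1), ('<20', 2)]
```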
Source code in src/sofastats/output/charts/bar.py
@dataclass(frozen=False)
class SimpleBarChartDesign(CommonBarDesign):
    """
    Args:
        category_field_name: name of field in the x-axis
        category_sort_order: define order of categories in each chart e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
        rotate_x_labels: make x-axis labels vertical
        show_borders: show a coloured border around the bars
        show_n_records: show the number of records the chart is based on
        x_axis_font_size: font size for x-axis labels
    """
    category_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    category_sort_order: SortOrder = SortOrder.VALUE
    rotate_x_labels: bool = False
    show_borders: bool = False
    show_n_records: bool = True
    x_axis_font_size: int = 12
    def to_html_design(self) -> HTMLItemSpec:
        ## style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = from_data.get_by_category_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            category_field_name=self.category_field_name, sort_orders=self.sort_orders,
            category_sort_order=self.category_sort_order,
            metric=self.metric, field_name=self.field_name,
            table_filter_sql=self.table_filter_sql, decimal_points=self.decimal_points)
        ## chart details
        charting_spec = BarChartingSpec(
            categories=intermediate_charting_spec.sorted_categories,
            indiv_chart_specs=[intermediate_charting_spec.to_indiv_chart_spec(), ],
            series_legend_label=None,
            rotate_x_labels=self.rotate_x_labels,
            show_borders=self.show_borders,
            show_n_records=self.show_n_records,
            x_axis_font_size=self.x_axis_font_size,
            x_axis_title=intermediate_charting_spec.category_field_name,
            y_axis_title=self.y_axis_title,
        )
        ## output
        html = get_html(charting_spec, style_spec)  ## see get_indiv_chart_html() below
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
sofastats.output.charts.bar.MultiChartBarChartDesign
dataclass
Bases: CommonBarDesign
Parameters:
- chart_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY) – the field name defining the charts e.g. a chart_field_name of 'Country'
  might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.
- chart_sort_order (SortOrder, default: VALUE) – define order of charts e.g. SortOrder.VALUE or SortOrder.CUSTOM
The remaining parameters (category_field_name, category_sort_order, rotate_x_labels, show_borders,
show_n_records, x_axis_font_size) work as for SimpleBarChartDesign.
Source code in src/sofastats/output/charts/bar.py
@dataclass(frozen=False)
class MultiChartBarChartDesign(CommonBarDesign):
    """
    Args:
        chart_field_name: the field name defining the charts e.g. a `chart_field_name` of 'Country'
            might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.
        chart_sort_order: define order of charts e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
    """
    category_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    category_sort_order: SortOrder = SortOrder.VALUE
    chart_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    chart_sort_order: SortOrder = SortOrder.VALUE
    metric: ChartMetric = ChartMetric.FREQ
    rotate_x_labels: bool = False
    show_borders: bool = False
    show_n_records: bool = True
    x_axis_font_size: int = 12
    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = from_data.get_by_chart_category_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            category_field_name=self.category_field_name, chart_field_name=self.chart_field_name,
            sort_orders=self.sort_orders,
            category_sort_order=self.category_sort_order, chart_sort_order=self.chart_sort_order,
            metric=self.metric, field_name=self.field_name,
            table_filter_sql=self.table_filter_sql, decimal_points=self.decimal_points)
        ## charts details
        charting_spec = BarChartingSpec(
            categories=intermediate_charting_spec.sorted_categories,
            indiv_chart_specs=intermediate_charting_spec.to_indiv_chart_specs(),
            series_legend_label=None,
            rotate_x_labels=self.rotate_x_labels,
            show_borders=self.show_borders,
            show_n_records=self.show_n_records,
            x_axis_font_size=self.x_axis_font_size,
            x_axis_title=intermediate_charting_spec.category_field_name,
            y_axis_title=self.y_axis_title,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
sofastats.output.charts.bar.ClusteredBarChartDesign
dataclass
Bases: CommonBarDesign
Parameters:
- series_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY) – the field name defining the series e.g. a series_field_name of 'Country'
  might generate a separate bar within each category cluster for 'USA', 'NZ', 'Denmark', and 'South Korea'.
- series_sort_order (SortOrder, default: VALUE) – define order of series within each category cluster e.g. SortOrder.VALUE or SortOrder.CUSTOM
The remaining parameters (category_field_name, category_sort_order, rotate_x_labels, show_borders,
show_n_records, x_axis_font_size) work as for SimpleBarChartDesign.
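A clustered bar chart needs one height per (series, category) pair, including pairs with no records. The plain-Python sketch below shows that series-by-category grid, with 0 filled in for missing combinations. The data and variable names are hypothetical, and this is not sofastats' internal representation.

```python
from collections import Counter

rows = [('<20', 'NZ'), ('<20', 'USA'), ('80+', 'NZ'), ('<20', 'NZ')]  # (category, series)

freqs = Counter(rows)                  # missing keys count as 0
categories = ['<20', '80+']            # category order (here a CUSTOM order)
series = sorted({s for _, s in rows})  # series order (here VALUE order)

# One list of bar heights per series, aligned with `categories`; 0 where a combination is absent.
clusters = {s: [freqs[(c, s)] for c in categories] for s in series}
print(clusters)  # {'NZ': [2, 1], 'USA': [1, 0]}
```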
Source code in src/sofastats/output/charts/bar.py
@dataclass(frozen=False)
class ClusteredBarChartDesign(CommonBarDesign):
    """
    Args:
        series_field_name: the field name defining the series e.g. a `series_field_name` of 'Country'
            might generate a separate bar within each category cluster for 'USA', 'NZ', 'Denmark', and 'South Korea'.
        series_sort_order: define order of series within each category cluster e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
    """
    category_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    category_sort_order: SortOrder = SortOrder.VALUE
    series_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    series_sort_order: SortOrder = SortOrder.VALUE
    metric: ChartMetric = ChartMetric.FREQ
    rotate_x_labels: bool = False
    show_borders: bool = False
    show_n_records: bool = True
    x_axis_font_size: int = 12
    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = from_data.get_by_series_category_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            category_field_name=self.category_field_name, series_field_name=self.series_field_name,
            sort_orders=self.sort_orders,
            category_sort_order=self.category_sort_order, series_sort_order=self.series_sort_order,
            metric=self.metric, field_name=self.field_name,
            table_filter_sql=self.table_filter_sql, decimal_points=self.decimal_points)
        ## chart details
        charting_spec = BarChartingSpec(
            categories=intermediate_charting_spec.sorted_categories,
            indiv_chart_specs=[intermediate_charting_spec.to_indiv_chart_spec(), ],
            series_legend_label=intermediate_charting_spec.series_field_name,
            rotate_x_labels=self.rotate_x_labels,
            show_borders=self.show_borders,
            show_n_records=self.show_n_records,
            x_axis_font_size=self.x_axis_font_size,
            x_axis_title=intermediate_charting_spec.category_field_name,
            y_axis_title=self.y_axis_title,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
sofastats.output.charts.bar.MultiChartClusteredBarChartDesign
dataclass
Bases: CommonBarDesign
Parameters:
- series_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY) – the field name defining the series, e.g. a series_field_name of 'Country' might generate separate bars within each category cluster for 'USA', 'NZ', 'Denmark', and 'South Korea'.
- series_sort_order (SortOrder, default: VALUE) – define the order of series within each category cluster, e.g. SortOrder.VALUE or SortOrder.CUSTOM
- chart_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY) – the field name defining the charts, e.g. a chart_field_name of 'Country' might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.
- chart_sort_order (SortOrder, default: VALUE) – define the order of charts, e.g. SortOrder.VALUE or SortOrder.CUSTOM
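To see how the three fields combine, here is a standalone sketch (not sofastats code) that counts records by chart, category, and series, then lays the counts out so every chart has the same categories and series, with 0 wherever a combination never occurs. `count_by_chart_category_series` and `to_chart_matrices` are hypothetical helpers, the records are made-up illustration data, and everything is sorted by value; the intermediate spec sofastats builds with `get_by_chart_series_category_charting_spec` is not shown here and may be organised differently.

```python
from collections import Counter


def count_by_chart_category_series(records, chart_field_name, category_field_name, series_field_name):
    """Hypothetical helper: frequency of each (chart, category, series) combination."""
    return Counter(
        (rec[chart_field_name], rec[category_field_name], rec[series_field_name]) for rec in records)


def to_chart_matrices(freqs):
    """Hypothetical helper: for each chart, a {series: [freq per category]} mapping.

    All charts share one category list and one series list (sorted by value here),
    so missing combinations are filled with 0 and the clusters line up across charts.
    """
    charts = sorted({chart for chart, _, _ in freqs})
    categories = sorted({category for _, category, _ in freqs})
    series_vals = sorted({series for _, _, series in freqs})
    matrices = {
        chart: {series: [freqs[(chart, category, series)] for category in categories]
                for series in series_vals}
        for chart in charts
    }
    return categories, matrices


records = [
    {'Country': 'NZ', 'Handedness': 'Left', 'Gender': 'Female'},
    {'Country': 'NZ', 'Handedness': 'Right', 'Gender': 'Male'},
    {'Country': 'NZ', 'Handedness': 'Right', 'Gender': 'Female'},
    {'Country': 'USA', 'Handedness': 'Right', 'Gender': 'Male'},
]
freqs = count_by_chart_category_series(records, 'Country', 'Handedness', 'Gender')
categories, matrices = to_chart_matrices(freqs)
print(categories)  # ['Left', 'Right']
print(matrices['USA'])  # {'Female': [0, 0], 'Male': [0, 1]}
```

Indexing a `Counter` with a missing key returns 0 without adding the key, which keeps the zero-filling to a single expression.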
Source code in src/sofastats/output/charts/bar.py
@dataclass(frozen=False)
class MultiChartClusteredBarChartDesign(CommonBarDesign):
    """
    Args:
        series_field_name: the field name defining the series e.g. a `series_field_name` of 'Country'
            might generate separate bars within each category cluster for 'USA', 'NZ', 'Denmark', and 'South Korea'.
        series_sort_order: define order of series within each category cluster e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
        chart_field_name: the field name defining the charts e.g. a `chart_field_name` of 'Country'
            might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.
        chart_sort_order: define order of charts e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
    """
    category_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    category_sort_order: SortOrder = SortOrder.VALUE
    series_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    series_sort_order: SortOrder = SortOrder.VALUE
    chart_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    chart_sort_order: SortOrder = SortOrder.VALUE
    metric: ChartMetric = ChartMetric.FREQ
    rotate_x_labels: bool = False
    show_borders: bool = False
    show_n_records: bool = True
    x_axis_font_size: int = 12

    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = from_data.get_by_chart_series_category_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            category_field_name=self.category_field_name,
            series_field_name=self.series_field_name,
            chart_field_name=self.chart_field_name,
            sort_orders=self.sort_orders,
            category_sort_order=self.category_sort_order,
            series_sort_order=self.series_sort_order,
            chart_sort_order=self.chart_sort_order,
            metric=self.metric, field_name=self.field_name,
            table_filter_sql=self.table_filter_sql,
            decimal_points=self.decimal_points)
        ## chart details
        charting_spec = BarChartingSpec(
            categories=intermediate_charting_spec.sorted_categories,
            indiv_chart_specs=intermediate_charting_spec.to_indiv_chart_specs(),
            series_legend_label=intermediate_charting_spec.series_field_name,
            rotate_x_labels=self.rotate_x_labels,
            show_borders=self.show_borders,
            show_n_records=self.show_n_records,
            x_axis_font_size=self.x_axis_font_size,
            x_axis_title=intermediate_charting_spec.category_field_name,
            y_axis_title=self.y_axis_title,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
Box Plots
See CommonDesign for the parameters common to all output design dataclasses in sofastats, for example style_name.
See BoxplotChartDesign for the parameters configuring individual box plot chart designs.
sofastats.output.charts.box_plot.BoxplotChartDesign
dataclass
Bases: CommonDesign
Parameters:
- field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY) – field summarised in each box
- category_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY) – name of the field on the x-axis
- category_sort_order (SortOrder, default: VALUE) – define the order of categories in each chart, e.g. SortOrder.VALUE or SortOrder.CUSTOM
- box_plot_type (BoxplotType, default: INSIDE_1_POINT_5_TIMES_IQR) – options for what the boxes represent and whether outliers are displayed
- rotate_x_labels (bool, default: False) – make x-axis labels vertical
- show_n_records (bool, default: True) – show the number of records the chart is based on
- x_axis_font_size (int, default: 12) – font size for x-axis labels
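The name `BoxplotType.INSIDE_1_POINT_5_TIMES_IQR` points to the common Tukey convention: the box runs from the first to the third quartile, the whiskers reach the most extreme values lying within 1.5 × IQR of the box, and anything further out is an outlier. The sketch below assumes that convention. It uses `statistics.quantiles` with the inclusive method; sofastats may compute quartiles another way, and `BoxSummary` and `summarise_box` are illustrative names, not sofastats API.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxSummary:
    q1: float
    median: float
    q3: float
    lower_whisker: float
    upper_whisker: float
    outliers: list


def summarise_box(values: list) -> BoxSummary:
    """Quartiles, whisker ends, and outliers using the 1.5 x IQR rule (assumed convention)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method='inclusive')
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data values still inside the fences
    inside = [val for val in values if lower_fence <= val <= upper_fence]
    outliers = sorted(val for val in values if val < lower_fence or val > upper_fence)
    return BoxSummary(q1=q1, median=median, q3=q3,
                      lower_whisker=min(inside), upper_whisker=max(inside),
                      outliers=outliers)


summary = summarise_box([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
# q1=3.25, median=5.5, q3=7.75, whiskers at 1 and 9, outliers=[100]
```

Here the upper fence is 7.75 + 1.5 × 4.5 = 14.5, so 100 is drawn as an outlier and the upper whisker stops at 9 rather than stretching to the maximum.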
Source code in src/sofastats/output/charts/box_plot.py
@dataclass(frozen=False)
class BoxplotChartDesign(CommonDesign):
    """
    Args:
        field_name: field summarised in each box
        category_field_name: name of field in the x-axis
        category_sort_order: define order of categories in each chart e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
        box_plot_type: options for what the boxes represent and whether outliers are displayed or not.
        rotate_x_labels: make x-axis labels vertical
        show_n_records: show the number of records the chart is based on
        x_axis_font_size: font size for x-axis labels
    """
    field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    category_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    category_sort_order: SortOrder = SortOrder.VALUE
    box_plot_type: BoxplotType = BoxplotType.INSIDE_1_POINT_5_TIMES_IQR
    rotate_x_labels: bool = False
    show_n_records: bool = True
    x_axis_font_size: int = 12

    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = get_by_category_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            field_name=self.field_name,
            category_field_name=self.category_field_name,
            sort_orders=self.sort_orders,
            category_sort_order=self.category_sort_order,
            table_filter_sql=self.table_filter_sql,
            box_plot_type=self.box_plot_type)
        ## charts details
        categories = [
            category_vals_spec.category_val for category_vals_spec in intermediate_charting_spec.category_vals_specs]
        indiv_chart_spec = intermediate_charting_spec.to_indiv_chart_spec()
        charting_spec = BoxplotChartingSpec(
            categories=categories,
            indiv_chart_specs=[indiv_chart_spec, ],
            series_legend_label=intermediate_charting_spec.series_field_name,
            rotate_x_labels=self.rotate_x_labels,
            show_n_records=self.show_n_records,
            x_axis_title=intermediate_charting_spec.category_field_name,
            y_axis_title=intermediate_charting_spec.field_name,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
sofastats.output.charts.box_plot.ClusteredBoxplotChartDesign
dataclass
Bases: CommonDesign
Parameters:
- series_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY) – the field name defining the series, e.g. a series_field_name of 'Country' might generate separate boxes within each category cluster for 'USA', 'NZ', 'Denmark', and 'South Korea'.
- series_sort_order (SortOrder, default: VALUE) – define the order of series within each category cluster, e.g. SortOrder.VALUE or SortOrder.CUSTOM
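A clustered box plot needs one set of values per box, that is, per combination of category and series. A minimal standalone sketch of that grouping, using made-up records and a hypothetical `group_values` helper (not sofastats API), with `statistics.median` standing in for a full box summary:

```python
import statistics
from collections import defaultdict


def group_values(records, field_name, category_field_name, series_field_name):
    """Hypothetical helper: collect field_name values for each (category, series) box."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec[category_field_name], rec[series_field_name])].append(rec[field_name])
    return dict(groups)


records = [
    {'Age': 23, 'Handedness': 'Left', 'Country': 'NZ'},
    {'Age': 31, 'Handedness': 'Left', 'Country': 'NZ'},
    {'Age': 45, 'Handedness': 'Right', 'Country': 'NZ'},
    {'Age': 52, 'Handedness': 'Right', 'Country': 'USA'},
    {'Age': 38, 'Handedness': 'Right', 'Country': 'USA'},
]
groups = group_values(records, 'Age', 'Handedness', 'Country')
medians = {box: statistics.median(vals) for box, vals in sorted(groups.items())}
print(medians)
# {('Left', 'NZ'): 27.0, ('Right', 'NZ'): 45, ('Right', 'USA'): 45.0}
```

Combinations with no records, such as ('Left', 'USA') here, simply have no box in this sketch.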
Source code in src/sofastats/output/charts/box_plot.py
@dataclass(frozen=False)
class ClusteredBoxplotChartDesign(CommonDesign):
    """
    Args:
        series_field_name: the field name defining the series e.g. a `series_field_name` of 'Country'
            might generate separate boxes within each category cluster for 'USA', 'NZ', 'Denmark', and 'South Korea'.
        series_sort_order: define order of series within each category cluster e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
    """
    field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    category_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    category_sort_order: SortOrder = SortOrder.VALUE
    series_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    series_sort_order: SortOrder = SortOrder.VALUE
    box_plot_type: BoxplotType = BoxplotType.INSIDE_1_POINT_5_TIMES_IQR
    rotate_x_labels: bool = False
    show_n_records: bool = True
    x_axis_font_size: int = 12

    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = get_by_series_category_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            field_name=self.field_name,
            category_field_name=self.category_field_name,
            series_field_name=self.series_field_name,
            sort_orders=self.sort_orders,
            category_sort_order=self.category_sort_order,
            series_sort_order=self.series_sort_order,
            table_filter_sql=self.table_filter_sql,
            box_plot_type=self.box_plot_type)
        ## charts details
        categories = [category_vals_spec.category_val
            for category_vals_spec in intermediate_charting_spec.series_category_vals_specs[0].category_vals_specs]
        indiv_chart_spec = intermediate_charting_spec.to_indiv_chart_spec(dp=self.decimal_points)
        charting_spec = BoxplotChartingSpec(
            categories=categories,
            indiv_chart_specs=[indiv_chart_spec, ],
            series_legend_label=intermediate_charting_spec.series_field,
            rotate_x_labels=self.rotate_x_labels,
            show_n_records=self.show_n_records,
            x_axis_title=intermediate_charting_spec.category_field,
            y_axis_title=intermediate_charting_spec.field,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
Histograms
See CommonDesign for the parameters common to all output design dataclasses in sofastats, for example style_name.
See HistogramChartDesign for the parameters configuring individual histogram chart designs.
sofastats.output.charts.histogram.HistogramChartDesign
dataclass
Bases: CommonDesign
Parameters:
- field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY) – field summarised in the histogram
- show_borders (bool, default: False) – show a coloured border around the bars
- show_n_records (bool, default: True) – show the number of records the chart is based on
- show_normal_curve (bool, default: True) – if True, display a normal curve on the chart
- x_axis_font_size (int, default: 12) – font size for x-axis labels
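One way to put a normal curve on the same scale as the bars is to plot, for each bin, the count expected under a normal distribution with the sample mean and standard deviation. The sketch below does that with `statistics.NormalDist`. The equal-width binning, the number of bins, and the helper names `histogram` and `normal_curve_heights` are assumptions for illustration; this section does not show how sofastats chooses bins or scales its curve.

```python
import statistics


def histogram(values, n_bins):
    """Equal-width bins from min to max; the maximum value goes in the last bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError('Need at least two distinct values to bin')
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for val in values:
        idx = min(int((val - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


def normal_curve_heights(values, edges):
    """Count expected in each bin if the values were normal with the sample mean and SD."""
    dist = statistics.NormalDist(statistics.mean(values), statistics.stdev(values))
    n = len(values)
    return [n * (dist.cdf(right) - dist.cdf(left)) for left, right in zip(edges, edges[1:])]


values = [1, 2, 2, 3, 3, 3, 4, 4, 5]
edges, counts = histogram(values, n_bins=4)
heights = normal_curve_heights(values, edges)
print(counts)  # [1, 2, 3, 3]
print([round(height, 2) for height in heights])
```

Using CDF differences rather than the density at each bin midpoint keeps the curve in the same units as the bar counts.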
Source code in src/sofastats/output/charts/histogram.py
@dataclass(frozen=False)
class HistogramChartDesign(CommonDesign):
    """
    Args:
        field_name: field summarised in the histogram
        show_borders: show a coloured border around the bars
        show_n_records: show the number of records the chart is based on
        show_normal_curve: if `True` display normal curve on the chart
        x_axis_font_size: font size for x-axis labels
    """
    field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    show_borders: bool = False
    show_n_records: bool = True
    show_normal_curve: bool = True
    x_axis_font_size: int = 12

    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = get_by_vals_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            field_name=self.field_name, table_filter_sql=self.table_filter_sql, decimal_points=self.decimal_points)
        bin_labels = intermediate_charting_spec.to_bin_labels()
        x_axis_min_val, x_axis_max_val = intermediate_charting_spec.to_x_axis_range()
        ## charts details
        indiv_chart_specs = intermediate_charting_spec.to_indiv_chart_specs()
        charting_spec = HistoChartingSpec(
            bin_labels=bin_labels,
            indiv_chart_specs=indiv_chart_specs,
            show_borders=self.show_borders,
            show_n_records=self.show_n_records,
            show_normal_curve=self.show_normal_curve,
            var_label=intermediate_charting_spec.field_name,
            x_axis_font_size=self.x_axis_font_size,
            x_axis_max_val=x_axis_max_val,
            x_axis_min_val=x_axis_min_val,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
sofastats.output.charts.histogram.MultiChartHistogramChartDesign
dataclass
Bases: CommonDesign
Parameters:
- chart_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY) – the field name defining the charts, e.g. a chart_field_name of 'Country' might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.
- chart_sort_order (SortOrder, default: VALUE) – define the order of charts, e.g. SortOrder.VALUE or SortOrder.CUSTOM
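In the `to_html_design` source for this class, a single `bin_labels` list and one x-axis range are passed to `HistoChartingSpec` for all the charts, which suggests the charts share their bins and can be compared directly. A standalone sketch of shared binning, where the bin edges come from the values across all charts; `shared_bin_counts`, the equal-width bins, and the records are assumptions for illustration, not sofastats code.

```python
from collections import defaultdict


def shared_bin_counts(records, field_name, chart_field_name, n_bins):
    """Hypothetical helper: bin every chart's values on the same equal-width edges."""
    all_vals = [rec[field_name] for rec in records]
    lo, hi = min(all_vals), max(all_vals)
    if hi == lo:
        raise ValueError('Need at least two distinct values to bin')
    width = (hi - lo) / n_bins
    counts = defaultdict(lambda: [0] * n_bins)  # one row of bin counts per chart
    for rec in records:
        idx = min(int((rec[field_name] - lo) / width), n_bins - 1)
        counts[rec[chart_field_name]][idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, dict(sorted(counts.items()))


records = [
    {'Country': 'NZ', 'Age': 10},
    {'Country': 'NZ', 'Age': 20},
    {'Country': 'NZ', 'Age': 20},
    {'Country': 'USA', 'Age': 30},
    {'Country': 'USA', 'Age': 40},
]
edges, counts = shared_bin_counts(records, 'Age', 'Country', n_bins=3)
print(edges)  # [10.0, 20.0, 30.0, 40.0]
print(counts)  # {'NZ': [1, 2, 0], 'USA': [0, 0, 2]}
```

Binning each chart on its own range would make the NZ and USA bars look alike even though their ages barely overlap; shared edges keep that difference visible.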
Source code in src/sofastats/output/charts/histogram.py
@dataclass(frozen=False)
class MultiChartHistogramChartDesign(CommonDesign):
    """
    Args:
        chart_field_name: the field name defining the charts e.g. a `chart_field_name` of 'Country'
            might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.
        chart_sort_order: define order of charts e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
    """
    field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    chart_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    chart_sort_order: SortOrder = SortOrder.VALUE
    show_borders: bool = False
    show_n_records: bool = True
    show_normal_curve: bool = True
    x_axis_font_size: int = 12

    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = get_by_chart_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            field_name=self.field_name,
            chart_field_name=self.chart_field_name,
            sort_orders=self.sort_orders,
            chart_sort_order=self.chart_sort_order,
            table_filter_sql=self.table_filter_sql,
            decimal_points=self.decimal_points,
        )
        x_axis_min_val, x_axis_max_val = intermediate_charting_spec.to_x_axis_range()
        ## charts details
        indiv_chart_specs = intermediate_charting_spec.to_indiv_chart_specs()
        charting_spec = HistoChartingSpec(
            bin_labels=intermediate_charting_spec.to_bin_labels(),
            indiv_chart_specs=indiv_chart_specs,
            show_borders=self.show_borders,
            show_n_records=self.show_n_records,
            show_normal_curve=self.show_normal_curve,
            var_label=intermediate_charting_spec.field_name,
            x_axis_font_size=self.x_axis_font_size,
            x_axis_max_val=x_axis_max_val,
            x_axis_min_val=x_axis_min_val,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
Line Charts
See CommonDesign for the parameters common to all output design dataclasses in sofastats, for example style_name.
See LineChartDesign for the parameters configuring individual line chart designs.
sofastats.output.charts.line.LineChartDesign
dataclass
Bases: CommonDesign
Parameters:
- category_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY) – name of the field on the x-axis
- category_sort_order (SortOrder, default: VALUE) – define the order of categories in each chart, e.g. SortOrder.VALUE or SortOrder.CUSTOM
- is_time_series (bool, default: False) – space x-axis labels according to time, e.g. there might be variable gaps between items
- show_major_ticks_only (bool, default: True) – suppress minor ticks
- show_markers (bool, default: True) – show markers on the line
- show_smooth_line (bool, default: False) – if True, also show a smoothed version of the line
- show_trend_line (bool, default: False) – if True, also show a trend line
- rotate_x_labels (bool, default: False) – make x-axis labels vertical
- show_n_records (bool, default: True) – show the number of records the chart is based on
- x_axis_font_size (int, default: 12) – font size for x-axis labels
- y_axis_title (str, default: 'Freq') – title displayed vertically alongside the y-axis
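Two of the options above can be sketched in a few lines of standalone Python. For `show_trend_line`, the sketch assumes an ordinary least-squares straight line fitted to the values at evenly spaced x positions; for `is_time_series`, it assumes x positions proportional to elapsed days. Neither calculation is taken from sofastats, and `trend_line` and `time_series_positions` are hypothetical names.

```python
from datetime import date


def trend_line(ys):
    """Least-squares straight line through ys, with x = 0, 1, 2, ... for evenly spaced categories."""
    n = len(ys)
    if n < 2:
        raise ValueError('Need at least two points for a trend line')
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(range(n), ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return [intercept + slope * x for x in range(n)]


def time_series_positions(dates):
    """x positions in days since the earliest date, so uneven gaps stay visible."""
    start = min(dates)
    return [(day - start).days for day in dates]


print([round(y, 2) for y in trend_line([2, 4, 5, 4, 5])])  # [2.8, 3.4, 4.0, 4.6, 5.2]
print(time_series_positions([date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 10)]))  # [0, 1, 9]
```

With evenly spaced categories the three dates above would sit at 0, 1, and 2; spacing them by elapsed time shows the eight-day gap before the last point.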
Source code in src/sofastats/output/charts/line.py
@dataclass(frozen=False)
class LineChartDesign(CommonDesign):
    """
    Args:
        category_field_name: name of field in the x-axis
        category_sort_order: define order of categories in each chart e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
        is_time_series: space x-axis labels according to time e.g. there might be variable gaps between items
        show_major_ticks_only: suppress minor ticks
        show_markers: show markers on the line
        show_smooth_line: if `True` also show smoothed version of line
        show_trend_line: if `True` also show trend line
        rotate_x_labels: make x-axis labels vertical
        show_n_records: show the number of records the chart is based on
        x_axis_font_size: font size for x-axis labels
        y_axis_title: title displayed vertically alongside y-axis
    """
    category_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    category_sort_order: SortOrder = SortOrder.VALUE
    is_time_series: bool = False
    show_major_ticks_only: bool = True
    show_markers: bool = True
    show_smooth_line: bool = False
    show_trend_line: bool = False
    rotate_x_labels: bool = False
    show_n_records: bool = True
    x_axis_font_size: int = 12
    y_axis_title: str = 'Freq'

    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = get_by_category_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            category_field_name=self.category_field_name,
            sort_orders=self.sort_orders,
            category_sort_order=self.category_sort_order,
            table_filter_sql=self.table_filter_sql,
            decimal_points=self.decimal_points,
        )
        ## chart details
        charting_spec = LineChartingSpec(
            categories=intermediate_charting_spec.sorted_categories,
            indiv_chart_specs=[intermediate_charting_spec.to_indiv_chart_spec(), ],
            series_legend_label=None,
            rotate_x_labels=self.rotate_x_labels,
            show_n_records=self.show_n_records,
            is_time_series=self.is_time_series,
            show_major_ticks_only=self.show_major_ticks_only,
            show_markers=self.show_markers,
            show_smooth_line=self.show_smooth_line,
            show_trend_line=self.show_trend_line,
            x_axis_font_size=self.x_axis_font_size,
            x_axis_title=intermediate_charting_spec.category_field_name,
            y_axis_title=self.y_axis_title,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
sofastats.output.charts.line.MultiChartLineChartDesign
dataclass
Bases: CommonDesign
Parameters:
- chart_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY) – the field name defining the charts, e.g. a chart_field_name of 'Country' might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.
- chart_sort_order (SortOrder, default: VALUE) – define the order of charts, e.g. SortOrder.VALUE or SortOrder.CUSTOM
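In the `to_html_design` source for this class, one `categories` list (`sorted_categories`) is passed to `LineChartingSpec` for all the charts, so each chart's line is plotted against the same x-axis categories. A standalone sketch of that alignment, filling in 0 where a chart has no records for a category; `lines_per_chart` and the records are hypothetical, and sorting by value is assumed.

```python
from collections import Counter


def lines_per_chart(records, category_field_name, chart_field_name):
    """Hypothetical helper: frequencies per chart, aligned to one shared category list."""
    freqs = Counter((rec[chart_field_name], rec[category_field_name]) for rec in records)
    categories = sorted({category for _, category in freqs})
    charts = sorted({chart for chart, _ in freqs})
    # Counter returns 0 for missing (chart, category) pairs
    return categories, {chart: [freqs[(chart, category)] for category in categories] for chart in charts}


records = [
    {'Country': 'NZ', 'Year': '2023'},
    {'Country': 'NZ', 'Year': '2024'},
    {'Country': 'NZ', 'Year': '2024'},
    {'Country': 'USA', 'Year': '2024'},
]
categories, lines = lines_per_chart(records, 'Year', 'Country')
print(categories)  # ['2023', '2024']
print(lines)  # {'NZ': [1, 2], 'USA': [0, 1]}
```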
Source code in src/sofastats/output/charts/line.py
@dataclass(frozen=False)
class MultiChartLineChartDesign(CommonDesign):
    """
    Args:
        chart_field_name: the field name defining the charts e.g. a `chart_field_name` of 'Country'
            might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.
        chart_sort_order: define order of charts e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
    """
    category_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    category_sort_order: SortOrder = SortOrder.VALUE
    chart_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    chart_sort_order: SortOrder = SortOrder.VALUE
    is_time_series: bool = False
    show_major_ticks_only: bool = True
    show_markers: bool = True
    show_smooth_line: bool = False
    show_trend_line: bool = False
    rotate_x_labels: bool = False
    show_n_records: bool = True
    x_axis_font_size: int = 12
    y_axis_title: str = 'Freq'

    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = get_by_chart_category_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            category_field_name=self.category_field_name,
            chart_field_name=self.chart_field_name,
            sort_orders=self.sort_orders,
            category_sort_order=self.category_sort_order,
            chart_sort_order=self.chart_sort_order,
            table_filter_sql=self.table_filter_sql,
            decimal_points=self.decimal_points,
        )
        ## chart details
        charting_spec = LineChartingSpec(
            categories=intermediate_charting_spec.sorted_categories,
            indiv_chart_specs=intermediate_charting_spec.to_indiv_chart_specs(),
            series_legend_label=None,
            rotate_x_labels=self.rotate_x_labels,
            show_n_records=self.show_n_records,
            is_time_series=self.is_time_series,
            show_major_ticks_only=self.show_major_ticks_only,
            show_markers=self.show_markers,
            show_smooth_line=self.show_smooth_line,
            show_trend_line=self.show_trend_line,
            x_axis_font_size=self.x_axis_font_size,
            x_axis_title=intermediate_charting_spec.category_field_name,
            y_axis_title=self.y_axis_title,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
sofastats.output.charts.line.MultiLineChartDesign
dataclass
Bases: CommonDesign
Parameters:
- series_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY) – the field name defining the series, e.g. a series_field_name of 'Country' might generate separate lines with different colours for 'USA', 'NZ', 'Denmark', and 'South Korea'.
- series_sort_order (SortOrder, default: VALUE) – define the order of series in the legend, e.g. SortOrder.VALUE or SortOrder.CUSTOM
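A standalone sketch of how a series field turns into several lines: one frequency line per series value, each with a point for every category, with the series listed in legend order. `lines_by_series` and the records are hypothetical. The series are sorted by value unless a custom order is passed in, loosely mirroring `SortOrder.VALUE` and `SortOrder.CUSTOM`; raising an error for a series with no custom position is an assumption, not documented sofastats behaviour.

```python
from collections import Counter
from typing import Optional


def lines_by_series(records, category_field_name, series_field_name,
                    series_custom_order: Optional[list] = None):
    """Hypothetical helper: one frequency line per series, keyed in legend order."""
    freqs = Counter((rec[series_field_name], rec[category_field_name]) for rec in records)
    categories = sorted({category for _, category in freqs})
    series_vals = {series for series, _ in freqs}
    if series_custom_order is None:
        legend_order = sorted(series_vals)  # like SortOrder.VALUE
    else:
        missing = series_vals - set(series_custom_order)
        if missing:
            raise ValueError(f'No custom position for: {sorted(missing)}')
        legend_order = [series for series in series_custom_order if series in series_vals]
    # dicts keep insertion order, so iterating the result follows the legend order
    return categories, {series: [freqs[(series, category)] for category in categories]
                        for series in legend_order}


records = [
    {'Year': '2023', 'Country': 'NZ'},
    {'Year': '2024', 'Country': 'NZ'},
    {'Year': '2024', 'Country': 'USA'},
    {'Year': '2023', 'Country': 'Denmark'},
    {'Year': '2023', 'Country': 'Denmark'},
]
categories, lines = lines_by_series(records, 'Year', 'Country',
                                    series_custom_order=['USA', 'NZ', 'Denmark'])
print(list(lines))  # ['USA', 'NZ', 'Denmark']
print(lines)  # {'USA': [0, 1], 'NZ': [1, 1], 'Denmark': [2, 0]}
```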
Source code in src/sofastats/output/charts/line.py
@dataclass(frozen=False)
class MultiLineChartDesign(CommonDesign):
    """
    Args:
        series_field_name: the field name defining the series e.g. a `series_field_name` of 'Country'
            might generate separate lines with different colours for 'USA', 'NZ', 'Denmark', and 'South Korea'.
        series_sort_order: define order of series in legend e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
    """
    category_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    category_sort_order: SortOrder = SortOrder.VALUE
    series_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    series_sort_order: SortOrder = SortOrder.VALUE
    is_time_series: bool = False
    show_major_ticks_only: bool = True
    show_markers: bool = True
    show_smooth_line: bool = False
    show_trend_line: bool = False
    rotate_x_labels: bool = False
    show_n_records: bool = True
    x_axis_font_size: int = 12
    y_axis_title: str = 'Freq'

    def to_html_design(self) -> HTMLItemSpec:
        # style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        intermediate_charting_spec = get_by_series_category_charting_spec(
            cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            category_field_name=self.category_field_name, series_field_name=self.series_field_name,
            sort_orders=self.sort_orders,
            category_sort_order=self.category_sort_order, series_sort_order=self.series_sort_order,
            table_filter_sql=self.table_filter_sql,
            decimal_points=self.decimal_points,
        )
        ## chart details
        charting_spec = LineChartingSpec(
            categories=intermediate_charting_spec.sorted_categories,
            indiv_chart_specs=[intermediate_charting_spec.to_indiv_chart_spec(), ],
            series_legend_label=intermediate_charting_spec.series_field_name,
            rotate_x_labels=self.rotate_x_labels,
            show_n_records=self.show_n_records,
            is_time_series=self.is_time_series,
            show_major_ticks_only=self.show_major_ticks_only,
            show_markers=self.show_markers,
            show_smooth_line=self.show_smooth_line,
            show_trend_line=self.show_trend_line,
            x_axis_font_size=self.x_axis_font_size,
            x_axis_title=intermediate_charting_spec.category_field_name,
            y_axis_title=self.y_axis_title,
        )
        ## output
        html = get_html(charting_spec, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.CHART,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
sofastats.output.charts.line.MultiChartMultiLineChartDesign
dataclass
Bases: CommonDesign
Parameters:
- series_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY) – the field name defining the series, e.g. a series_field_name of 'Country' might generate separate lines with different colours for 'USA', 'NZ', 'Denmark', and 'South Korea'.
- series_sort_order (SortOrder, default: VALUE) – define the order of series in the legend, e.g. SortOrder.VALUE or SortOrder.CUSTOM
- chart_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY) – the field name defining the charts, e.g. a chart_field_name of 'Country' might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.
- chart_sort_order (SortOrder, default: VALUE) – define the order of charts, e.g. SortOrder.VALUE or SortOrder.CUSTOM
Source code in src/sofastats/output/charts/line.py
508 | @dataclass(frozen=False)
class MultiChartMultiLineChartDesign(CommonDesign):
"""
Args:
series_field_name: the field name defining the series e.g. a `series_field_name` of 'Country'
might generate separate lines with different colours for 'USA', 'NZ', 'Denmark', and 'South Korea'.
series_sort_order: define the order of series in the legend e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
chart_field_name: the field name defining the charts e.g. a `chart_field_name` of 'Country'
might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.
chart_sort_order: define the order of charts e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
"""
category_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
category_sort_order: SortOrder = SortOrder.VALUE
series_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
series_sort_order: SortOrder = SortOrder.VALUE
chart_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
chart_sort_order: SortOrder = SortOrder.VALUE
is_time_series: bool = False
show_major_ticks_only: bool = True
show_markers: bool = True
show_smooth_line: bool = False
show_trend_line: bool = False
rotate_x_labels: bool = False
show_n_records: bool = True
x_axis_font_size: int = 12
y_axis_title: str = 'Freq'
def to_html_design(self) -> HTMLItemSpec:
# style
style_spec = get_style_spec(style_name=self.style_name)
## data
intermediate_charting_spec = get_by_chart_series_category_charting_spec(
cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
category_field_name=self.category_field_name,
series_field_name=self.series_field_name,
chart_field_name=self.chart_field_name,
sort_orders=self.sort_orders,
category_sort_order=self.category_sort_order,
series_sort_order=self.series_sort_order,
chart_sort_order=self.chart_sort_order,
table_filter_sql=self.table_filter_sql,
decimal_points=self.decimal_points,
)
## chart details
charting_spec = LineChartingSpec(
categories=intermediate_charting_spec.sorted_categories,
indiv_chart_specs=intermediate_charting_spec.to_indiv_chart_specs(),
series_legend_label=intermediate_charting_spec.series_field_name,
rotate_x_labels=self.rotate_x_labels,
show_n_records=self.show_n_records,
is_time_series=self.is_time_series,
show_major_ticks_only=self.show_major_ticks_only,
show_markers=self.show_markers,
show_smooth_line=self.show_smooth_line,
show_trend_line=self.show_trend_line,
x_axis_font_size=self.x_axis_font_size,
x_axis_title=intermediate_charting_spec.category_field_name,
y_axis_title=self.y_axis_title,
)
## output
html = get_html(charting_spec, style_spec)
return HTMLItemSpec(
html_item_str=html,
output_item_type=OutputItemType.CHART,
output_title=self.output_title,
design_name=self.__class__.__name__,
style_name=self.style_name,
)
Pie Charts
See CommonDesign
for the parameters common to all output design dataclasses in sofastats - for example, style_name.
See PieChartDesign for the parameters
configuring individual pie chart designs.
sofastats.output.charts.pie.PieChartDesign
dataclass
Bases: CommonDesign
Parameters:
-
category_field_name
(str, default:
DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
)
–
name of the field defining the categories (pie slices)
-
category_sort_order
(SortOrder, default:
VALUE
)
–
define the order of categories e.g. SortOrder.VALUE or SortOrder.CUSTOM
-
show_n_records
(bool, default:
True
)
–
show the number of records the chart is based on
Source code in src/sofastats/output/charts/pie.py
@dataclass(frozen=False)
class PieChartDesign(CommonDesign):
"""
Args:
category_field_name: name of the field defining the categories (pie slices)
category_sort_order: define the order of categories e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
show_n_records: show the number of records the chart is based on
"""
category_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
category_sort_order: SortOrder = SortOrder.VALUE
show_n_records: bool = True
def to_html_design(self) -> HTMLItemSpec:
# style
style_spec = get_style_spec(style_name=self.style_name)
## data
intermediate_charting_spec = get_by_category_charting_spec(
cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
category_field_name=self.category_field_name,
sort_orders=self.sort_orders, category_sort_order=self.category_sort_order,
table_filter_sql=self.table_filter_sql)
## charts details
charting_spec = PieChartingSpec(
categories=intermediate_charting_spec.sorted_categories,
indiv_chart_specs=[intermediate_charting_spec.to_indiv_chart_spec(), ],
show_n_records=self.show_n_records,
)
## output
html = get_html(charting_spec, style_spec)
return HTMLItemSpec(
html_item_str=html,
output_item_type=OutputItemType.CHART,
output_title=self.output_title,
design_name=self.__class__.__name__,
style_name=self.style_name,
)
sofastats.output.charts.pie.MultiChartPieChartDesign
dataclass
Bases: CommonDesign
Parameters:
-
chart_field_name
(str, default:
DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
)
–
the field name defining the charts e.g. a chart_field_name of 'Country'
might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.
-
chart_sort_order
(SortOrder, default:
VALUE
)
–
define the order of charts e.g. SortOrder.VALUE or SortOrder.CUSTOM
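Each pie in a MultiChartPieChartDesign shows the distribution of category_field_name values within one chart_field_name value. The sketch below computes those per-chart shares as percentages from plain records. The helper pie_shares_by_chart and its decimal_points argument are hypothetical, only loosely modelled on the decimal_points setting these designs pass through elsewhere.

```python
from collections import Counter, defaultdict


def pie_shares_by_chart(records, chart_field, category_field, decimal_points=1):
    """Hypothetical helper: percentage share of each category within each chart."""
    counts = defaultdict(Counter)  # chart value -> Counter of category frequencies
    for record in records:
        counts[record[chart_field]][record[category_field]] += 1
    shares = {}
    for chart, category_counts in counts.items():
        total = sum(category_counts.values())
        shares[chart] = {category: round(100 * n / total, decimal_points)
                         for category, n in category_counts.items()}
    return shares


records = [
    {'Country': 'NZ', 'Web Browser': 'Chrome'},
    {'Country': 'NZ', 'Web Browser': 'Chrome'},
    {'Country': 'NZ', 'Web Browser': 'Firefox'},
    {'Country': 'USA', 'Web Browser': 'Firefox'},
]
shares = pie_shares_by_chart(records, 'Country', 'Web Browser')
```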
Source code in src/sofastats/output/charts/pie.py
@dataclass(frozen=False)
class MultiChartPieChartDesign(CommonDesign):
"""
Args:
chart_field_name: the field name defining the charts e.g. a `chart_field_name` of 'Country'
might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.
chart_sort_order: define the order of charts e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
"""
category_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
category_sort_order: SortOrder = SortOrder.VALUE
chart_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
chart_sort_order: SortOrder = SortOrder.VALUE
show_n_records: bool = True
def to_html_design(self) -> HTMLItemSpec:
# style
style_spec = get_style_spec(style_name=self.style_name)
## data
intermediate_charting_spec = get_by_chart_category_charting_spec(
cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
category_field_name=self.category_field_name, chart_field_name=self.chart_field_name,
sort_orders=self.sort_orders,
category_sort_order=self.category_sort_order, chart_sort_order=self.chart_sort_order,
table_filter_sql=self.table_filter_sql)
## charts details
charting_spec = PieChartingSpec(
categories=intermediate_charting_spec.sorted_categories,
indiv_chart_specs=intermediate_charting_spec.to_indiv_chart_specs(),
show_n_records=self.show_n_records,
)
## output
html = get_html(charting_spec, style_spec)
return HTMLItemSpec(
html_item_str=html,
output_item_type=OutputItemType.CHART,
output_title=self.output_title,
design_name=self.__class__.__name__,
style_name=self.style_name,
)
Scatter Plots
See CommonDesign
for the parameters common to all output design dataclasses in sofastats - for example, style_name.
See SimpleScatterChartDesign for the parameters
configuring individual scatter plot chart designs.
sofastats.output.charts.scatter_plot.SimpleScatterChartDesign
dataclass
Bases: CommonDesign
Parameters:
-
x_field_name
(str, default:
DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
)
–
field defining the x value of each x-y pair
-
y_field_name
(str, default:
DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
)
–
field defining the y value of each x-y pair
-
show_dot_borders
(bool, default:
True
)
–
if True, show borders around individual dots
-
show_n_records
(bool, default:
True
)
–
show the number of records the chart is based on
-
show_regression_line
(bool, default:
True
)
–
if True show regression line of best fit
-
x_axis_font_size
(int, default:
10
)
–
font size for x-axis labels
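show_regression_line draws a line of best fit. Assuming that means an ordinary least-squares fit of y on x (the usual meaning; the source below does not show how sofastats computes it), the slope and intercept can be sketched with the standard library alone. The helper name least_squares_line is hypothetical.

```python
def least_squares_line(xs, ys):
    """Hypothetical helper: (slope, intercept) of the ordinary least-squares line of y on x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("Need at least two x-y pairs, with as many x values as y values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("All x values are identical, so the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Points lying exactly on y = 2x + 1
slope, intercept = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
```

On Python 3.10+, statistics.linear_regression(xs, ys) returns the same ordinary least-squares slope and intercept.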
Source code in src/sofastats/output/charts/scatter_plot.py
@dataclass(frozen=False)
class SimpleScatterChartDesign(CommonDesign):
"""
Args:
x_field_name: field defining the x value of each x-y pair
y_field_name: field defining the y value of each x-y pair
show_dot_borders: if `True`, show borders around individual dots
show_n_records: show the number of records the chart is based on
show_regression_line: if `True` show regression line of best fit
x_axis_font_size: font size for x-axis labels
"""
x_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
y_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
show_dot_borders: bool = True
show_n_records: bool = True
show_regression_line: bool = True
x_axis_font_size: int = 10
def to_html_design(self) -> HTMLItemSpec:
# style
style_spec = get_style_spec(style_name=self.style_name)
## data
intermediate_charting_spec = get_by_xy_charting_spec(
cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
x_field_name=self.x_field_name, y_field_name=self.y_field_name,
table_filter_sql=self.table_filter_sql)
## charts details
indiv_chart_specs = intermediate_charting_spec.to_indiv_chart_specs()
charting_spec = ScatterChartingSpec(
indiv_chart_specs=indiv_chart_specs,
series_legend_label=None,
show_dot_borders=self.show_dot_borders,
show_n_records=self.show_n_records,
show_regression_line=self.show_regression_line,
x_axis_font_size=self.x_axis_font_size,
x_axis_title=intermediate_charting_spec.x_field_name,
y_axis_title=intermediate_charting_spec.y_field_name,
)
## output
html = get_html(charting_spec, style_spec)
return HTMLItemSpec(
html_item_str=html,
output_item_type=OutputItemType.CHART,
output_title=self.output_title,
design_name=self.__class__.__name__,
style_name=self.style_name,
)
sofastats.output.charts.scatter_plot.MultiChartScatterChartDesign
dataclass
Bases: CommonDesign
Parameters:
-
chart_field_name
(str, default:
DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
)
–
the field name defining the charts e.g. a chart_field_name of 'Country'
might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.
-
chart_sort_order
(SortOrder, default:
VALUE
)
–
define the order of charts e.g. SortOrder.VALUE or SortOrder.CUSTOM
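The chart_sort_order options can be illustrated with a small sketch. It defines a local stand-in enum with the same member values as sofastats.conf.main.SortOrder and a hypothetical sort_values helper; it is not sofastats' own sorting code, and its handling of ties (they keep alphabetical order) and of values missing from a CUSTOM order (an error) are assumptions. The example data also shows why plain VALUE sorting puts '<20' last.

```python
from collections import Counter
from enum import Enum


class SortOrder(str, Enum):
    """Local stand-in with the same values as sofastats.conf.main.SortOrder."""
    CUSTOM = 'by custom order'
    DECREASING = 'by decreasing frequency'
    INCREASING = 'by increasing frequency'
    VALUE = 'by value'


def sort_values(values, sort_order, custom_order=None):
    """Hypothetical helper: return the distinct values in the requested order."""
    freqs = Counter(values)
    alphabetical = sorted(freqs)  # SortOrder.VALUE; also the tie-breaker below
    if sort_order == SortOrder.VALUE:
        return alphabetical
    if sort_order == SortOrder.INCREASING:
        return sorted(alphabetical, key=lambda v: freqs[v])
    if sort_order == SortOrder.DECREASING:
        # reverse=True keeps sort stability, so ties stay alphabetical
        return sorted(alphabetical, key=lambda v: freqs[v], reverse=True)
    if sort_order == SortOrder.CUSTOM:
        if custom_order is None:
            raise ValueError("SortOrder.CUSTOM needs a custom order")
        missing = set(freqs) - set(custom_order)
        if missing:
            raise ValueError(f"No custom position for: {sorted(missing)}")
        # Listed values that never occur in the data are skipped
        return [v for v in custom_order if v in freqs]
    raise ValueError(f"Unknown sort order: {sort_order!r}")


ages = ['80+', '<20', '20-39', '<20', '20-39', '<20']
```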
Source code in src/sofastats/output/charts/scatter_plot.py
@dataclass(frozen=False)
class MultiChartScatterChartDesign(CommonDesign):
"""
Args:
chart_field_name: the field name defining the charts e.g. a `chart_field_name` of 'Country'
might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.
chart_sort_order: define the order of charts e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
"""
x_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
y_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
chart_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
chart_sort_order: SortOrder = SortOrder.VALUE
show_dot_borders: bool = True
show_n_records: bool = True
show_regression_line: bool = True
x_axis_font_size: int = 10
def to_html_design(self) -> HTMLItemSpec:
# style
style_spec = get_style_spec(style_name=self.style_name)
## data
intermediate_charting_spec = get_by_chart_xy_charting_spec(
cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
x_field_name=self.x_field_name, y_field_name=self.y_field_name,
chart_field_name=self.chart_field_name,
sort_orders=self.sort_orders,
chart_sort_order=self.chart_sort_order,
table_filter_sql=self.table_filter_sql)
## charts details
indiv_chart_specs = intermediate_charting_spec.to_indiv_chart_specs()
charting_spec = ScatterChartingSpec(
indiv_chart_specs=indiv_chart_specs,
series_legend_label=None,
show_dot_borders=self.show_dot_borders,
show_n_records=self.show_n_records,
show_regression_line=self.show_regression_line,
x_axis_font_size=self.x_axis_font_size,
x_axis_title=intermediate_charting_spec.x_field_name,
y_axis_title=intermediate_charting_spec.y_field_name,
)
## output
html = get_html(charting_spec, style_spec)
return HTMLItemSpec(
html_item_str=html,
output_item_type=OutputItemType.CHART,
output_title=self.output_title,
design_name=self.__class__.__name__,
style_name=self.style_name,
)
sofastats.output.charts.scatter_plot.BySeriesScatterChartDesign
dataclass
Bases: CommonDesign
Parameters:
-
series_field_name
(str, default:
DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
)
–
the field name defining the series e.g. a series_field_name of 'Country'
might generate different-coloured dots for 'USA', 'NZ', 'Denmark', and 'South Korea'.
-
series_sort_order
(SortOrder, default:
VALUE
)
–
define the order of series in the legend e.g. SortOrder.VALUE or SortOrder.CUSTOM
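BySeriesScatterChartDesign colours each dot by its series_field_name value. Conceptually that means partitioning the x-y pairs by series, as in this hypothetical group_points_by_series sketch (plain Python, not sofastats code), which orders the legend alphabetically in the spirit of SortOrder.VALUE.

```python
from collections import defaultdict


def group_points_by_series(records, x_field, y_field, series_field):
    """Hypothetical helper: {series value: [(x, y), ...]} in alphabetical legend order."""
    series_points = defaultdict(list)
    for record in records:
        series_points[record[series_field]].append((record[x_field], record[y_field]))
    # Alphabetical legend order, analogous to SortOrder.VALUE
    return {series: series_points[series] for series in sorted(series_points)}


records = [
    {'Country': 'USA', 'Age': 34, 'Weight': 82},
    {'Country': 'NZ', 'Age': 25, 'Weight': 70},
    {'Country': 'NZ', 'Age': 41, 'Weight': 77},
]
points = group_points_by_series(records, 'Age', 'Weight', 'Country')
```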
Source code in src/sofastats/output/charts/scatter_plot.py
@dataclass(frozen=False)
class BySeriesScatterChartDesign(CommonDesign):
"""
Args:
series_field_name: the field name defining the series e.g. a `series_field_name` of 'Country'
might generate different-coloured dots for 'USA', 'NZ', 'Denmark', and 'South Korea'.
series_sort_order: define the order of series in the legend e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
"""
x_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
y_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
series_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
series_sort_order: SortOrder = SortOrder.VALUE
show_dot_borders: bool = True
show_n_records: bool = True
show_regression_line: bool = True
x_axis_font_size: int = 10
def to_html_design(self) -> HTMLItemSpec:
# style
style_spec = get_style_spec(style_name=self.style_name)
## data
intermediate_charting_spec = get_by_series_xy_charting_spec(
cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
x_field_name=self.x_field_name, y_field_name=self.y_field_name,
series_field_name=self.series_field_name,
sort_orders=self.sort_orders,
series_sort_order=self.series_sort_order,
table_filter_sql=self.table_filter_sql)
## charts details
indiv_chart_specs = intermediate_charting_spec.to_indiv_chart_specs()
charting_spec = ScatterChartingSpec(
indiv_chart_specs=indiv_chart_specs,
series_legend_label=intermediate_charting_spec.series_field_name,
show_dot_borders=self.show_dot_borders,
show_n_records=self.show_n_records,
show_regression_line=self.show_regression_line,
x_axis_font_size=self.x_axis_font_size,
x_axis_title=intermediate_charting_spec.x_field_name,
y_axis_title=intermediate_charting_spec.y_field_name,
)
## output
html = get_html(charting_spec, style_spec)
return HTMLItemSpec(
html_item_str=html,
output_item_type=OutputItemType.CHART,
output_title=self.output_title,
design_name=self.__class__.__name__,
style_name=self.style_name,
)
sofastats.output.charts.scatter_plot.MultiChartBySeriesScatterChartDesign
dataclass
Bases: CommonDesign
Parameters:
-
series_field_name
(str, default:
DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
)
–
the field name defining the series e.g. a series_field_name of 'Country'
might generate different-coloured dots for 'USA', 'NZ', 'Denmark', and 'South Korea'.
-
series_sort_order
(SortOrder, default:
VALUE
)
–
define the order of series in the legend e.g. SortOrder.VALUE or SortOrder.CUSTOM
-
chart_field_name
(str, default:
DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
)
–
the field name defining the charts e.g. a chart_field_name of 'Country'
might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.
-
chart_sort_order
(SortOrder, default:
VALUE
)
–
define the order of charts e.g. SortOrder.VALUE or SortOrder.CUSTOM
Source code in src/sofastats/output/charts/scatter_plot.py
@dataclass(frozen=False)
class MultiChartBySeriesScatterChartDesign(CommonDesign):
"""
Args:
series_field_name: the field name defining the series e.g. a `series_field_name` of 'Country'
might generate different-coloured dots for 'USA', 'NZ', 'Denmark', and 'South Korea'.
series_sort_order: define the order of series in the legend e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
chart_field_name: the field name defining the charts e.g. a `chart_field_name` of 'Country'
might generate separate charts for 'USA', 'NZ', 'Denmark', and 'South Korea'.
chart_sort_order: define the order of charts e.g. `SortOrder.VALUE` or `SortOrder.CUSTOM`
"""
x_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
y_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
series_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
series_sort_order: SortOrder = SortOrder.VALUE
chart_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
chart_sort_order: SortOrder = SortOrder.VALUE
show_dot_borders: bool = True
show_n_records: bool = True
show_regression_line: bool = True
x_axis_font_size: int = 10
def to_html_design(self) -> HTMLItemSpec:
# style
style_spec = get_style_spec(style_name=self.style_name)
## data
intermediate_charting_spec = get_by_chart_series_xy_charting_spec(
cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
x_field_name=self.x_field_name, y_field_name=self.y_field_name,
series_field_name=self.series_field_name, chart_field_name=self.chart_field_name,
sort_orders=self.sort_orders,
series_sort_order=self.series_sort_order, chart_sort_order=self.chart_sort_order,
table_filter_sql=self.table_filter_sql)
## charts details
indiv_chart_specs = intermediate_charting_spec.to_indiv_chart_specs()
charting_spec = ScatterChartingSpec(
indiv_chart_specs=indiv_chart_specs,
series_legend_label=intermediate_charting_spec.series_field_name,
show_dot_borders=self.show_dot_borders,
show_n_records=self.show_n_records,
show_regression_line=self.show_regression_line,
x_axis_font_size=self.x_axis_font_size,
x_axis_title=intermediate_charting_spec.x_field_name,
y_axis_title=intermediate_charting_spec.y_field_name,
)
## output
html = get_html(charting_spec, style_spec)
return HTMLItemSpec(
html_item_str=html,
output_item_type=OutputItemType.CHART,
output_title=self.output_title,
design_name=self.__class__.__name__,
style_name=self.style_name,
)
Tables
See CommonDesign
for the parameters common to all output design dataclasses in sofastats - for example, style_name.
DimensionSpec defines the main parameters of both the Row and Column table dimensions.
Row and Column add no parameters of their own; each simply sets is_col (False for Row, True for Column).
sofastats.output.tables.interfaces.DimensionSpec
dataclass
Parameters:
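-
variable_name
(str)
–
name of the variable
-
has_total
(bool, default:
False
)
–
if True add a total
-
is_col
(bool, default:
False
)
–
if True the dimension is a column rather than a row (set automatically by Row and Column)
-
pct_metrics
(Collection[Metric] | None, default:
None
)
–
define which metrics to display - options: Metric.ROW_PCT and Metric.COL_PCT (only allowed on the terminal, childless spec of a Column chain)
-
sort_order
(SortOrder | str, default:
VALUE
)
–
sort order of the variable
-
child
(Self | None, default:
None
)
–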
a child DimensionSpec if nesting underneath
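Because each Row or Column is a chain (a parent with at most one child, as descendant_vars notes), its table labels can be pictured as the Cartesian product of each level's values, with an extra total entry where has_total is True. The sketch below uses a hypothetical Level stand-in rather than DimensionSpec, and the 'TOTAL' label is an assumption; the (variable, value, variable, value) tuple shape echoes the var/val column pairs described in the CrossTabDesign.get_tbl_df docstring.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Level:
    """Hypothetical stand-in for one link in a Row or Column chain."""
    variable_name: str
    values: list[str]
    has_total: bool = False


def chain_labels(chain: list[Level]) -> list[tuple[str, ...]]:
    """Expand a parent > child > ... chain into one label tuple per table row."""
    per_level = []
    for level in chain:
        values = list(level.values) + (['TOTAL'] if level.has_total else [])
        per_level.append([(level.variable_name, value) for value in values])
    # One tuple per combination, flattened to (var, val, var, val, ...)
    return [tuple(part for pair in combo for part in pair) for combo in product(*per_level)]


chain = [Level('Country', ['NZ', 'USA'], has_total=True), Level('Gender', ['Female', 'Male'])]
labels = chain_labels(chain)
```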
Source code in src/sofastats/output/tables/interfaces.py
@dataclass(frozen=False)
class DimensionSpec:
"""
Args:
variable_name: name of variable
has_total: if `True` add a total
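is_col: if `True` the dimension is a column rather than a row (set automatically by `Row` and `Column`)
pct_metrics: define which metrics to display - options: `Metric.ROW_PCT` and `Metric.COL_PCT` (terminal column specs only)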
sort_order: sort order of variable
child: a child DimensionSpec if nesting underneath
"""
variable_name: str
has_total: bool = False
is_col: bool = False
pct_metrics: Collection[Metric] | None = None
sort_order: SortOrder | str = SortOrder.VALUE
child: Self | None = None
@property
def descendant_vars(self) -> list[str]:
"""
All variables under, but not including, this DimensionSpec.
Note - only includes chains, not trees, as a deliberate design choice to avoid excessively complicated tables.
Tables are for computers to make, but for humans to read and understand :-).
"""
dim_vars = []
if self.child:
dim_vars.append(self.child.variable_name)
dim_vars.extend(self.child.descendant_vars)
return dim_vars
@property
def self_and_descendants(self) -> list[Self]:
"""
All DimensionSpecs under, and including, this DimensionSpec.
"""
dims = [self, ]
if self.child:
dims.extend(self.child.self_and_descendants)
return dims
@property
def self_and_descendant_vars(self) -> list[str]:
"""
All variable names under, and including, this DimensionSpec.
"""
return [dim.variable_name for dim in self.self_and_descendants]
@property
def self_and_descendant_totalled_vars(self) -> list[str]:
"""
All variables under, and including, this DimensionSpec that are totalled (if any).
"""
return [dim.variable_name for dim in self.self_and_descendants if dim.has_total]
@property
def self_or_descendant_pct_metrics(self) -> Collection[Metric] | None:
"""
All percentage metrics (row and/or column percentages) under, or for, this DimensionSpec.
"""
if self.pct_metrics:
return self.pct_metrics
elif self.child:
return self.child.self_or_descendant_pct_metrics
else:
return None
def __post_init__(self):
if self.pct_metrics:
if self.child:
raise ValueError("Metrics are only for terminal dimension specs e.g. in a > b > c only c can have metrics")
if not self.is_col:
raise ValueError("Metrics are only for terminal column specs, yet this is a row spec")
if self.child:
if self.is_col != self.child.is_col:
raise ValueError("This dim has a child that is inconsistent e.g. a col parent having a row child")
if self.variable_name in self.descendant_vars:
raise ValueError("Variables can't be repeated in the same dimension spec "
f"e.g. Car > Country > Car. Variable {self.variable_name}")
sofastats.output.tables.interfaces.Row
dataclass
Bases: DimensionSpec
Source code in src/sofastats/output/tables/interfaces.py
@dataclass(frozen=False)
class Row(DimensionSpec):
def __post_init__(self):
self.is_col = False
super().__post_init__()
sofastats.output.tables.interfaces.Column
dataclass
Bases: DimensionSpec
Source code in src/sofastats/output/tables/interfaces.py
@dataclass(frozen=False)
class Column(DimensionSpec):
def __post_init__(self):
self.is_col = True
super().__post_init__()
sofastats.output.tables.freq.FrequencyTableDesign
dataclass
Bases: CommonDesign
Parameters:
-
row_variable_designs
(list[Row], default:
DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
)
–
list of Rows, one per top-level row variable (duplicate top-level variables raise a ValueError)
-
include_column_percent
(bool, default:
False
)
–
if True add a column percentage column
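A one-variable frequency table with include_column_percent=True can be sketched as counts plus each count's share of the column total. The helper frequency_table and the 'Freq' and 'Col %' labels are hypothetical; sofastats itself builds the table with pandas from database queries.

```python
from collections import Counter


def frequency_table(values, include_column_percent=False, decimal_points=2):
    """Hypothetical helper: one dict per distinct value, alphabetically ordered."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = []
    for value in sorted(counts):
        row = {'value': value, 'Freq': counts[value]}
        if include_column_percent:
            # Share of the whole column, as a percentage
            row['Col %'] = round(100 * counts[value] / total, decimal_points)
        rows.append(row)
    return rows


rows = frequency_table(['Chrome', 'Firefox', 'Chrome', 'Safari'], include_column_percent=True)
```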
Source code in src/sofastats/output/tables/freq.py
@dataclass(frozen=False, kw_only=True)
class FrequencyTableDesign(CommonDesign):
"""
Args:
row_variable_designs: list of Rows
include_column_percent: if `True` add a column percentage column
"""
row_variable_designs: list[Row] = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
include_column_percent: bool = False
debug: bool = False
verbose: bool = False
@property
def totalled_vars(self) -> list[str]:
tot_vars = []
for row_spec in self.row_variable_designs:
tot_vars.extend(row_spec.self_and_descendant_totalled_vars)
return tot_vars
@property
def max_row_depth(self) -> int:
max_depth = 0
for row_spec in self.row_variable_designs:
row_depth = len(row_spec.self_and_descendant_vars)
if row_depth > max_depth:
max_depth = row_depth
return max_depth
def __post_init__(self):
CommonDesign.__post_init__(self)
row_vars = [spec.variable_name for spec in self.row_variable_designs]
row_dupes = set()
seen = set()
for row_var in row_vars:
if row_var in seen:
row_dupes.add(row_var)
else:
seen.add(row_var)
if row_dupes:
raise ValueError(f"Duplicate top-level variable(s) detected in row dimension - {sorted(row_dupes)}")
def get_row_df(self, cur, *, row_idx: int, dp: int = 2) -> pd.DataFrame:
"""
See cross_tab docs
"""
row_spec = self.row_variable_designs[row_idx]
totalled_variables = row_spec.self_and_descendant_totalled_vars
row_vars = row_spec.self_and_descendant_vars
data = get_data_from_spec(cur, dbe_spec=self.dbe_spec,
source_table_name=self.source_table_name, table_filter_sql=self.table_filter_sql,
all_variables=row_vars, totalled_variables=totalled_variables, debug=self.debug)
n_row_fillers = self.max_row_depth - len(row_vars)
df = get_all_metrics_df_from_vars(data, row_vars=row_vars, n_row_fillers=n_row_fillers,
inc_col_pct=self.include_column_percent,
dp=dp, debug=self.debug)
return df
def get_tbl_df(self, cur) -> pd.DataFrame:
"""
See cross_tab docs
"""
dfs = [self.get_row_df(cur, row_idx=row_idx, dp=self.decimal_points)
for row_idx in range(len(self.row_variable_designs))]
df_t = dfs[0].T
dfs_remaining = dfs[1:]
for df_next in dfs_remaining:
df_t = df_t.join(df_next.T, how='outer')
df = df_t.T ## re-transpose back so cols are cols and rows are rows again
if self.debug: print(f"\nCOMBINED:\n{df}")
## Sorting indexes
raw_df = get_raw_df(cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name)
order_rules_for_multi_index_branches = get_order_rules_for_multi_index_branches(self.row_variable_designs)
## ROWS
unsorted_row_multi_index_list = list(df.index)
sorted_row_multi_index_list = get_sorted_multi_index_list(
unsorted_row_multi_index_list, order_rules_for_multi_index_branches=order_rules_for_multi_index_branches,
sort_orders=self.sort_orders, raw_df=raw_df, has_metrics=False, debug=self.debug)
sorted_row_multi_index = pd.MultiIndex.from_tuples(
sorted_row_multi_index_list) ## https://pandas.pydata.org/docs/user_guide/advanced.html
sorted_col_multi_index_list = sorted(
df.columns, key=lambda metric_label_and_metric: get_metric2order(metric_label_and_metric[1]))
sorted_col_multi_index = pd.MultiIndex.from_tuples(sorted_col_multi_index_list)
df = df.reindex(index=sorted_row_multi_index, columns=sorted_col_multi_index)
if self.debug: print(f"\nORDERED:\n{df}")
return df
def to_html_design(self) -> HTMLItemSpec:
df = self.get_tbl_df(self.cur)
pd_styler = set_table_styles(df.style)
style_spec = get_style_spec(style_name=self.style_name)
pd_styler = apply_index_styles(df, style_spec, pd_styler, axis='rows')
pd_styler = apply_index_styles(df, style_spec, pd_styler, axis='columns')
raw_tbl_html = pd_styler.to_html()
if self.debug:
print(raw_tbl_html)
## Fix
html = raw_tbl_html
html = fix_top_left_box(html, style_spec, debug=self.debug, verbose=self.verbose)
html = merge_cols_of_blanks(html, debug=self.debug)
if self.debug:
print(pd_styler.uuid) ## A unique identifier to avoid CSS collisions; generated automatically.
print(html)
return HTMLItemSpec(
html_item_str=html,
output_item_type=OutputItemType.MAIN_TABLE,
output_title=self.output_title,
design_name=self.__class__.__name__,
style_name=self.style_name,
)
sofastats.output.tables.cross_tab.CrossTabDesign
dataclass
Bases: CommonDesign
Parameters:
-
row_variable_designs
(list[Row], default:
DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
)
–
list of Rows, one per top-level row variable
-
column_variable_designs
(list[Column], default:
DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
)
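–
list of Columns, one per top-level column variable; no variable may appear in both the rows and the columns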
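In a cross tab each cell count can be expressed as a row percentage (its share of the row total) or a column percentage (its share of the column total), which appears to be what Metric.ROW_PCT and Metric.COL_PCT request on a terminal Column. The hypothetical cross_tab helper below sketches those three numbers per cell for one row variable crossed with one column variable; it ignores nesting, total rows and columns, and decimal_points rounding.

```python
from collections import Counter


def cross_tab(pairs):
    """Hypothetical helper: Freq, Row % and Col % for every (row value, column value) cell."""
    cell_counts = Counter(pairs)
    row_totals = Counter(row for row, _ in pairs)
    col_totals = Counter(col for _, col in pairs)
    table = {}
    for row in sorted(row_totals):
        for col in sorted(col_totals):
            n = cell_counts[(row, col)]  # Counter returns 0 for empty cells
            table[(row, col)] = {
                'Freq': n,
                'Row %': 100 * n / row_totals[row],
                'Col %': 100 * n / col_totals[col],
            }
    return table


pairs = [('NZ', 'Chrome'), ('NZ', 'Chrome'), ('NZ', 'Firefox'), ('USA', 'Firefox')]
table = cross_tab(pairs)
```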
Source code in src/sofastats/output/tables/cross_tab.py
@dataclass(frozen=False, kw_only=True)
class CrossTabDesign(CommonDesign):
"""
Args:
row_variable_designs: list of Rows
column_variable_designs: list of Columns
"""
row_variable_designs: list[Row] = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
column_variable_designs: list[Column] = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
debug: bool = False
verbose: bool = False
@staticmethod
def _get_dupes(_vars: Collection[str]) -> set[str]:
dupes = set()
seen = set()
for var in _vars:
if var in seen:
dupes.add(var)
else:
seen.add(var)
return dupes
@property
def totalled_vars(self) -> list[str]:
tot_vars = []
for row_spec in self.row_variable_designs:
tot_vars.extend(row_spec.self_and_descendant_totalled_vars)
for col_spec in self.column_variable_designs:
tot_vars.extend(col_spec.self_and_descendant_totalled_vars)
return tot_vars
def _get_max_dim_depth(self, *, is_col=False) -> int:
max_depth = 0
dim_specs = self.column_variable_designs if is_col else self.row_variable_designs
for dim_spec in dim_specs:
dim_depth = len(dim_spec.self_and_descendant_vars)
if dim_depth > max_depth:
max_depth = dim_depth
return max_depth
@property
def max_row_depth(self) -> int:
return self._get_max_dim_depth()
@property
def max_col_depth(self) -> int:
return self._get_max_dim_depth(is_col=True)
def __post_init__(self):
CommonDesign.__post_init__(self)
row_dupes = CrossTabDesign._get_dupes([spec.variable_name for spec in self.row_variable_designs])
if row_dupes:
raise ValueError(f"Duplicate top-level variable(s) detected in row dimension - {sorted(row_dupes)}")
col_dupes = CrossTabDesign._get_dupes([spec.variable_name for spec in self.column_variable_designs])
if col_dupes:
raise ValueError(f"Duplicate top-level variable(s) detected in column dimension - {sorted(col_dupes)}")
## var can't be in both row and col e.g. car vs country > car
for row_spec, col_spec in product(self.row_variable_designs, self.column_variable_designs):
row_spec_vars = set([row_spec.variable_name] + row_spec.descendant_vars)
col_spec_vars = set([col_spec.variable_name] + col_spec.descendant_vars)
overlapping_vars = row_spec_vars.intersection(col_spec_vars)
if overlapping_vars:
raise ValueError("Variables can't appear in both rows and columns. "
f"Found the following overlapping variable(s): {', '.join(overlapping_vars)}")
def get_df_from_row_spec(self, cur, *, row_spec_idx: int) -> pd.DataFrame:
"""
Get the combined df for one row spec (e.g. the top, middle, or bottom block of the table)
by outer-merging the dfs for that row spec crossed with each column spec.
e.g.
row_variables_design_1 = Row(variable_name='country', has_total=True,
child=Row(variable_name='gender', has_total=True))
vs
column_variables_design_1 = Column(variable_name='Age Group', has_total=True)
column_variables_design_2 = Column(variable_name='Web Browser', has_total=True,
child=Column(variable_name='Age Group', has_total=True, pct_metrics=[Metric.ROW_PCT, Metric.COL_PCT]))
column_variables_design_3 = Column(variable_name='Standard Age Group', has_total=True)
"""
row_spec = self.row_variable_designs[row_spec_idx]
row_vars = row_spec.self_and_descendant_vars
n_row_fillers = self.max_row_depth - len(row_vars)
df_cols = []
for col_spec in self.column_variable_designs:
col_vars = col_spec.self_and_descendant_vars
totalled_variables = row_spec.self_and_descendant_totalled_vars + col_spec.self_and_descendant_totalled_vars
all_variables = row_vars + col_vars
data = get_data_from_spec(cur, dbe_spec=self.dbe_spec,
source_table_name=self.source_table_name, table_filter_sql=self.table_filter_sql,
all_variables=all_variables, totalled_variables=totalled_variables, debug=self.debug)
df_col = get_all_metrics_df_from_vars(data, row_vars=row_vars, col_vars=col_vars,
n_row_fillers=n_row_fillers, n_col_fillers=self.max_col_depth - len(col_vars),
pct_metrics=col_spec.self_or_descendant_pct_metrics, dp=self.decimal_points, debug=self.debug)
df_cols.append(df_col)
df = df_cols[0]
df_cols_remaining = df_cols[1:]
row_merge_on = []
for row_var in row_vars:
row_merge_on.append(get_pandas_friendly_name(row_var, '_var'))
row_merge_on.append(row_var)
for i in range(n_row_fillers):
row_merge_on.append(f'row_filler_var_{i}')
row_merge_on.append(f'row_filler_{i}')
for df_next_col in df_cols_remaining:
df = df.merge(df_next_col, how='outer', on=row_merge_on)
return df
def get_tbl_df(self, cur) -> pd.DataFrame:
"""
For each row_variable_designs get a completed df and then merge those.
Note - using pd.concat or df.merge(how='outer') has the same result, but I use merge for horizontal joining
to avoid repeating the row dimension columns e.g. country and gender.
Basically we are merging left and right dfs. Merging is typically on an id field that both parts share.
In this case there are as many fields to merge on as there are fields in the row index -
in this example there are 4 (var_00, val_00, var_01, and val_01).
There is one added complexity because the column is multi-index.
We need to supply a tuple with an item (possibly an empty string) for each level.
In this case there are two levels (browser and age_group). So we merge on
[('var_00', ''), ('val_00', ''), ('var_01', ''), ('val_01', '')]
If there were three row levels and four col levels we would need something like:
[('var_00', '', '', ''), ('val_00', '', '', ''), ... ('val_02', '', '', '')]
BOTTOM LEFT:
browser var_00 val_00 var_01 val_01 Chrome Firefox
agegroup <20 20-29 30-39 40-64 65+ <20 20-29 30-39 40-64 65+
0 Country NZ __blank__ __blank__ 10 19 17 28 44 25 26 14 38 48
...
BOTTOM RIGHT:
agegroup var_00 val_00 var_01 val_01 <20 20-29 30-39 40-64 65+
dummy
0 Country NZ __blank__ __blank__ 35 45 31 66 92
...
Note - we flatten out the row multi-index using reset_index().
This flattening results in a column per row variable e.g. one for country and one for gender
(at this point we're ignoring the labelling step where we split each row variable e.g. for country into Country (var) and NZ (val)).
Given it is a column, it has to have as many levels as the column dimension columns.
So if there are two column dimension levels each row column will need to be a two-tuple e.g. ('gender', '').
If there were three column dimension levels the row column would need to be a three-tuple e.g. ('gender', '', '').
"""
dfs = [self.get_df_from_row_spec(cur, row_spec_idx=row_spec_idx)
for row_spec_idx in range(len(self.row_variable_designs))]
## COMBINE using pandas JOINing (the big magic trick at the middle of this approach to complex table-making)
## Unfortunately, delegating to Pandas means we can't fix anything intrinsic to what Pandas does.
## And there is a bug (from my point of view) whenever tables are merged with the same variables at the top level.
## To prevent this we have to disallow variable reuse at top-level.
## transpose, join, and re-transpose back. JOINing on rows works differently from columns and will include all items in sub-levels under the correct upper levels even if missing from the first multi-index
## E.g. if Age Group > 40-64 is missing from the first index it will not be appended on the end but will be alongside all its siblings so we end up with Age Group > >20, 20-29 30-39, 40-64, 65+
## Note - variable levels (odd numbered levels if 1 is the top level) should be in the same order as they were originally
df_t = dfs[0].T
dfs_remaining = dfs[1:]
for df_next in dfs_remaining:
df_t = df_t.join(df_next.T, how='outer')
df = df_t.T ## re-transpose back so cols are cols and rows are rows again
if self.debug: print(f"\nCOMBINED:\n{df}")
## Sorting indexes
raw_df = get_raw_df(cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name, debug=self.debug)
order_rules_for_row_multi_index_branches = get_order_rules_for_multi_index_branches(self.row_variable_designs)
order_rules_for_col_multi_index_branches = get_order_rules_for_multi_index_branches(self.column_variable_designs)
## COLS
unsorted_col_multi_index_list = list(df.columns)
sorted_col_multi_index_list = get_sorted_multi_index_list(
unsorted_col_multi_index_list, order_rules_for_multi_index_branches=order_rules_for_col_multi_index_branches,
sort_orders=self.sort_orders, raw_df=raw_df, has_metrics=True, debug=self.debug)
sorted_col_multi_index = pd.MultiIndex.from_tuples(sorted_col_multi_index_list) ## https://pandas.pydata.org/docs/user_guide/advanced.html
## ROWS
unsorted_row_multi_index_list = list(df.index)
sorted_row_multi_index_list = get_sorted_multi_index_list(
unsorted_row_multi_index_list, order_rules_for_multi_index_branches=order_rules_for_row_multi_index_branches,
sort_orders=self.sort_orders, raw_df=raw_df, has_metrics=False, debug=self.debug)
sorted_row_multi_index = pd.MultiIndex.from_tuples(sorted_row_multi_index_list) ## https://pandas.pydata.org/docs/user_guide/advanced.html
df = df.reindex(index=sorted_row_multi_index, columns=sorted_col_multi_index)
if self.debug: print(f"\nORDERED:\n{df}")
return df
def to_html_design(self) -> HTMLItemSpec:
get_tbl_df_for_cur = partial(self.get_tbl_df)
df = get_tbl_df_for_cur(self.cur)
pd_styler = set_table_styles(df.style)
style_spec = get_style_spec(style_name=self.style_name)
pd_styler = apply_index_styles(df, style_spec, pd_styler, axis='rows')
pd_styler = apply_index_styles(df, style_spec, pd_styler, axis='columns')
raw_tbl_html = pd_styler.to_html()
if self.debug:
print(raw_tbl_html)
## Fix
html = raw_tbl_html
html = fix_top_left_box(html, style_spec, debug=self.debug, verbose=self.verbose)
html = merge_cols_of_blanks(html, debug=self.debug)
html = merge_rows_of_blanks(html, debug=self.debug, verbose=self.verbose)
if self.debug:
print(pd_styler.uuid)
print(html)
return HTMLItemSpec(
html_item_str=html,
output_item_type=OutputItemType.MAIN_TABLE,
output_title=self.output_title,
design_name=self.__class__.__name__,
style_name=self.style_name,
)
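The overlap check at the start of this listing and the padded merge keys described in the get_tbl_df docstring can be illustrated in plain Python. This is a hypothetical sketch, not sofastats code. It represents each spec as a simple list of variable names (top-level variable first, then descendants) rather than the library's Row/Column design objects. The function names are made up for illustration.

```python
from itertools import product


def find_overlapping_vars(row_specs: list[list[str]], col_specs: list[list[str]]) -> set[str]:
    """Collect every variable that appears in both a row spec and a column spec.

    Hypothetical simplification: each spec is a plain list holding the top-level
    variable followed by its descendant variables.
    """
    overlapping: set[str] = set()
    for row_vars, col_vars in product(row_specs, col_specs):
        overlapping |= set(row_vars) & set(col_vars)
    return overlapping


def padded_merge_keys(row_index_names: list[str], n_col_levels: int) -> list[tuple[str, ...]]:
    """Pad each flattened row-index column name with '' so it has one item per column level."""
    return [(name,) + ('',) * (n_col_levels - 1) for name in row_index_names]


rows = [['Country', 'Gender']]
cols = [['Age Group'], ['Web Browser', 'Age Group']]
print(find_overlapping_vars(rows, cols))  # set()
print(find_overlapping_vars([['car']], [['country', 'car']]))  # {'car'}
print(padded_merge_keys(['var_00', 'val_00', 'var_01', 'val_01'], 2))
# [('var_00', ''), ('val_00', ''), ('var_01', ''), ('val_01', '')]
```

With four column levels, each key becomes a four-tuple such as ('var_00', '', '', ''), matching the docstring's description.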
Statistical Tests
sofastats.output.stats.interfaces.CommonStatsDesign
dataclass
Bases: CommonDesign
Output dataclasses for statistical tests (e.g. MannWhitneyUDesign) inherit from CommonStatsDesign.
Source code in src/sofastats/output/stats/interfaces.py
class CommonStatsDesign(CommonDesign):
    """
    Output dataclasses for statistical tests (e.g. MannWhitneyUDesign) inherit from CommonStatsDesign.
    """
    @abstractmethod
    def to_result(self) -> Type[StatsResult]:
        """
        Return a dataclass with results as attributes
        """
        pass
to_result() -> Type[StatsResult]
abstractmethod
Return a dataclass with results as attributes
Source code in src/sofastats/output/stats/interfaces.py
@abstractmethod
def to_result(self) -> Type[StatsResult]:
    """
    Return a dataclass with results as attributes
    """
    pass
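The to_result() contract can be illustrated with a small self-contained sketch. A base class declares to_result() abstract, so every concrete design must implement it and return an object whose results are attributes. All class names below (StatsDesignBase, MeanDesign, MeanResult) are hypothetical stand-ins, not sofastats classes, and the example reads an in-memory list rather than a database table.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from statistics import fmean


@dataclass
class MeanResult:
    """Hypothetical result dataclass: each result is exposed as an attribute."""
    n: int
    mean: float


@dataclass
class StatsDesignBase(ABC):
    """Hypothetical stand-in for CommonStatsDesign: subclasses must implement to_result()."""

    @abstractmethod
    def to_result(self):
        """Return a dataclass with results as attributes."""


@dataclass
class MeanDesign(StatsDesignBase):
    """Hypothetical concrete design that reads an in-memory list instead of a database table."""
    values: list[float]

    def to_result(self) -> MeanResult:
        return MeanResult(n=len(self.values), mean=fmean(self.values))


result = MeanDesign(values=[2.0, 4.0, 9.0]).to_result()
print(result.n, result.mean)  # 3 5.0
```

Because to_result() is abstract, trying to instantiate StatsDesignBase directly raises a TypeError.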
ANOVA
See CommonStatsDesign
for details of the to_result() method common to all stats output design dataclasses in sofastats.
sofastats.output.stats.anova.AnovaDesign
dataclass
Bases: CommonStatsDesign
Parameters:
- measure_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY): the name of the field aggregated by group. The ANOVA compares the mean value of each group. For example, 'Age'.
- grouping_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY): the name of the field used to define the groups compared in the ANOVA, e.g. 'Country'.
- group_values (Collection[Any], default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY): the values of the grouping field that define the groups whose means the ANOVA compares, e.g. ['South Korea', 'NZ', 'USA'].
- high_precision_required (bool, default: False): if True, the calculation is done at high precision using an algorithm that is not vulnerable to certain edge cases.
  Why not use it by default? Because it runs much, much, much slower and the edge cases are quite rare.
  The high precision algorithm uses Python's decimal data type rather than floats.
  Floating point math is a pragmatic strategy, but it reduces accuracy,
  and in particular edge cases it can produce results wildly different from the correct ones.
  High precision is needed to handle difficult datasets, e.g. ANOVA test 9 from the NIST website.
  Search for articles or videos on floating point math if interested. It is a fascinating topic.
  When values have one decimal place, the high precision algorithm also multiplies some values by 10
  to convert them from floats to integers (reducing error), and then divides squared values by 100 (10 squared)
  at the end of key calculations to restore the correct magnitude.
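The float-versus-decimal point above can be seen in a few lines using Python's standard decimal module. The two sum-of-squares functions are hypothetical illustrations of the scale-by-10 and divide-by-100 bookkeeping. They are not the sofastats implementation, and they assume every value has exactly one decimal place.

```python
from decimal import Decimal

# Binary floats cannot represent 0.1, 0.2 or 0.3 exactly, so rounding error creeps in.
print(0.1 + 0.2 == 0.3)  # False
# Decimals built from strings are exact in base 10.
print(Decimal('0.1') + Decimal('0.2') == Decimal('0.3'))  # True


def sum_sq_deviations(values: list[str]) -> Decimal:
    """Sum of squared deviations from the mean, using Decimal arithmetic throughout."""
    vals = [Decimal(v) for v in values]
    mean = sum(vals) / len(vals)  # division still rounds to the context precision (28 digits by default)
    return sum((v - mean) ** 2 for v in vals)


def scaled_sum_sq_deviations(values: list[str], scale: int = 10) -> Decimal:
    """Same quantity, but one-decimal-place values are first scaled to integers,
    and the squared result is divided by scale ** 2 to restore the magnitude."""
    ints = [int(Decimal(v) * scale) for v in values]  # assumes exactly one decimal place
    mean = Decimal(sum(ints)) / len(ints)
    return sum((i - mean) ** 2 for i in ints) / (scale ** 2)


print(sum_sq_deviations(['1.3', '1.5', '1.7']))         # 0.08
print(scaled_sum_sq_deviations(['1.3', '1.5', '1.7']))  # 0.08
```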
Source code in src/sofastats/output/stats/anova.py
@dataclass(frozen=False)
class AnovaDesign(CommonStatsDesign):
    """
    Args:
        measure_field_name: the name of the field aggregated by group - the ANOVA compares the mean value of each group.
            For example, 'Age'
        grouping_field_name: the name of the field used to define the groups compared in the ANOVA e.g. 'Country'
        group_values: the ANOVA will compare the means of the groups defined
            by the values of the grouping field listed here e.g. ['South Korea', 'NZ', 'USA']
        high_precision_required: if `True`, the calculation will be high precision
            and the algorithm used will not be vulnerable to certain edge cases.
            Why not use it by default? Because it runs much, much, much slower and the edge cases are quite rare.
            The high precision algorithm uses Python's
            [decimal](https://docs.python.org/3/library/decimal.html) data type rather than floats.
            Using floating point math is a pragmatic strategy, but it reduces accuracy.
            In particular edge cases, it can produce wildly different results from the correct results.
            High precision is needed to handle difficult datasets e.g. ANOVA test 9 from the NIST website.
            Search for articles / videos on the topic of floating point math if interested. It is a fascinating topic.
            If to one decimal point the high precision algorithm also multiplies some values by 10
            to push from float to integer (to reduce error) and then divides squared values by 100 (10 squared)
            at the end in key calculations to restore to correct magnitude.
    """
    measure_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    grouping_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    group_values: Collection[Any] = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    high_precision_required: bool = False

    def to_result(self) -> AnovaResult:
        ## values (sorted)
        grouping_field_values = apply_custom_sorting_to_values(
            variable_name=self.grouping_field_name, values=list(self.group_values), sort_orders=self.sort_orders)
        ## data
        grouping_val_is_numeric = all(is_numeric(x) for x in self.group_values)
        ## build sample results ready for anova function
        samples = []
        for grouping_field_value in grouping_field_values:
            grouping_filter = ValFilterSpec(variable_name=self.grouping_field_name, value=grouping_field_value,
                val_is_numeric=grouping_val_is_numeric)
            sample = get_sample(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
                grouping_filt=grouping_filter, measure_field_name=self.measure_field_name,
                table_filter_sql=self.table_filter_sql)
            samples.append(sample)
        stats_result = anova_stats_calc(
            self.grouping_field_name, self.measure_field_name, samples, high=self.high_precision_required)
        return stats_result

    def to_html_design(self) -> HTMLItemSpec:
        ## style
        style_spec = get_style_spec(style_name=self.style_name)
        ## values (sorted)
        grouping_field_values = apply_custom_sorting_to_values(
            variable_name=self.grouping_field_name, values=list(self.group_values), sort_orders=self.sort_orders)
        ## data
        grouping_val_is_numeric = all(is_numeric(x) for x in self.group_values)
        ## build sample results ready for anova function
        samples = []
        for grouping_field_value in grouping_field_values:
            grouping_filter = ValFilterSpec(variable_name=self.grouping_field_name, value=grouping_field_value,
                val_is_numeric=grouping_val_is_numeric)
            sample = get_sample(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
                grouping_filt=grouping_filter, measure_field_name=self.measure_field_name,
                table_filter_sql=self.table_filter_sql)
            samples.append(sample)
        ## calculations
        stats_result = anova_stats_calc(
            self.grouping_field_name, self.measure_field_name, samples, high=self.high_precision_required)
        ## output
        histograms2show = []
        for group_spec in stats_result.group_specs:
            try:
                histogram_html = get_embedded_histogram_html(
                    self.measure_field_name, style_spec.chart, group_spec.vals, group_spec.label)
            except Exception as e:
                html_or_msg = f"<b>{group_spec.label}</b> - unable to display histogram. Reason: {e}"
            else:
                html_or_msg = histogram_html
            histograms2show.append(html_or_msg)
        result = Result(**todict(stats_result),
            grouping_field_name=self.grouping_field_name,
            measure_field_name=self.measure_field_name,
            histograms2show=histograms2show,
            decimal_points=self.decimal_points,
        )
        html = get_html(result, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.STATS,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
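The quantity AnovaDesign reports is the one-way ANOVA F statistic. It can be sketched in plain floating point Python: the ratio of between-group variance to within-group variance, given samples already split by group. This is a textbook illustration, not the anova_stats_calc implementation, and the function name is hypothetical.

```python
from statistics import fmean


def one_way_anova_f(samples: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA on already-split groups."""
    all_vals = [v for sample in samples for v in sample]
    grand_mean = fmean(all_vals)
    group_means = [fmean(sample) for sample in samples]
    # Between-group variation: how far each group mean sits from the grand mean
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, group_means))
    # Within-group variation: how far each value sits from its own group mean
    ss_within = sum((v - m) ** 2 for s, m in zip(samples, group_means) for v in s)
    df_between = len(samples) - 1
    df_within = len(all_vals) - len(samples)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


f, df_between, df_within = one_way_anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
print(f, df_between, df_within)  # 27.0 2 6
```

Turning F into a p value requires the F distribution, e.g. scipy.stats.f.sf(f, df_between, df_within). scipy.stats.f_oneway is a convenient cross-check.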
Chi Square
See CommonStatsDesign
for details of the to_result() method common to all stats output design dataclasses in sofastats.
sofastats.output.stats.chi_square.ChiSquareDesign
dataclass
Bases: CommonStatsDesign
Parameters:
- variable_a_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY): the name of the first variable
- variable_b_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY): the name of the second variable
- show_workings (bool, default: False): show the workings so you can see how the final results were derived
Source code in src/sofastats/output/stats/chi_square.py
@dataclass(frozen=False)
class ChiSquareDesign(CommonStatsDesign):
    """
    Args:
        variable_a_name: the name of the first variable
        variable_b_name: the name of the second variable
        show_workings: show the workings so you can see how the final results were derived
    """
    variable_a_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    variable_b_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    show_workings: bool = False

    def to_result(self) -> ChiSquareResult:
        ## data
        chi_square_data = get_chi_square_data(cur=self.cur, dbe_spec=self.dbe_spec,
            source_table_name=self.source_table_name, table_filter_sql=self.table_filter_sql,
            variable_a_name=self.variable_a_name, variable_b_name=self.variable_b_name,
            sort_orders=self.sort_orders)
        ## get results
        stats_result = chi_square_stats_calc(
            f_obs=chi_square_data.observed_values_a_then_b_ordered,
            f_exp=chi_square_data.expected_values_a_then_b_ordered,
            df=chi_square_data.degrees_of_freedom)
        return stats_result

    def to_html_design(self) -> HTMLItemSpec:
        ## style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        chi_square_data = get_chi_square_data(cur=self.cur, dbe_spec=self.dbe_spec,
            source_table_name=self.source_table_name, table_filter_sql=self.table_filter_sql,
            variable_a_name=self.variable_a_name, variable_b_name=self.variable_b_name,
            sort_orders=self.sort_orders)
        ## get results
        stats_result = chi_square_stats_calc(
            f_obs=chi_square_data.observed_values_a_then_b_ordered,
            f_exp=chi_square_data.expected_values_a_then_b_ordered,
            df=chi_square_data.degrees_of_freedom)
        observed_vs_expected_tbl = get_observed_vs_expected_tbl(
            variable_a_name=self.variable_a_name, variable_b_name=self.variable_b_name,
            variable_a_values=chi_square_data.variable_a_values, variable_b_values=chi_square_data.variable_b_values,
            observed_values_a_then_b_ordered=chi_square_data.observed_values_a_then_b_ordered,
            expected_values_a_then_b_ordered=chi_square_data.expected_values_a_then_b_ordered,
            style_name_hyphens=style_spec.style_name_hyphens,
        )
        chi_square_charts = get_chi_square_charts(
            style_spec=style_spec,
            variable_a_name=self.variable_a_name, variable_b_name=self.variable_b_name,
            variable_a_values=chi_square_data.variable_a_values, variable_b_values=chi_square_data.variable_b_values,
            observed_values_a_then_b_ordered=chi_square_data.observed_values_a_then_b_ordered)
        if self.show_workings:
            worked_result = get_worked_result(
                variable_a_values=chi_square_data.variable_a_values, variable_b_values=chi_square_data.variable_b_values,
                observed_values_a_then_b_ordered=chi_square_data.observed_values_a_then_b_ordered,
                degrees_of_freedom=chi_square_data.degrees_of_freedom)
            worked_example = get_worked_example(worked_result)
        else:
            worked_result = None
            worked_example = ''
        result = Result(
            variable_a_name=self.variable_a_name, variable_b_name=self.variable_b_name,
            variable_a_values=chi_square_data.variable_a_values, variable_b_values=chi_square_data.variable_b_values,
            observed_values_a_then_b_ordered=chi_square_data.observed_values_a_then_b_ordered,
            expected_values_a_then_b_ordered=chi_square_data.expected_values_a_then_b_ordered,
            p=stats_result.p, chi_square=stats_result.chi_square, degrees_of_freedom=chi_square_data.degrees_of_freedom,
            minimum_cell_count=chi_square_data.minimum_cell_count, pct_cells_lt_5=chi_square_data.pct_cells_freq_under_5,
            observed_vs_expected_tbl=observed_vs_expected_tbl, chi_square_charts=chi_square_charts,
            worked_example=worked_example, decimal_points=self.decimal_points,
        )
        html = get_html(result, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.STATS,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
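The chi-square statistic and degrees of freedom used above can be sketched from a small contingency table of observed counts. Each expected count is row total × column total / grand total. This is a standard textbook sketch with a hypothetical function name, not the get_chi_square_data or chi_square_stats_calc code.

```python
def chi_square_from_table(observed: list[list[int]]) -> tuple[float, int]:
    """Return (chi_square, degrees_of_freedom) for a contingency table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi_square = 0.0
    for row, row_total in zip(observed, row_totals):
        for obs, col_total in zip(row, col_totals):
            # Expected count if the two variables were independent
            expected = row_total * col_total / grand_total
            chi_square += (obs - expected) ** 2 / expected
    degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi_square, degrees_of_freedom


chi_square, dof = chi_square_from_table([[10, 20], [20, 10]])
print(round(chi_square, 4), dof)  # 6.6667 1
```

The p value is then scipy.stats.chi2.sf(chi_square, dof). If you cross-check with scipy.stats.chi2_contingency, note that it applies Yates' continuity correction when there is one degree of freedom unless you pass correction=False.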
Kruskal-Wallis H
See CommonStatsDesign
for details of the to_result() method common to all stats output design dataclasses in sofastats.
sofastats.output.stats.kruskal_wallis_h.KruskalWallisHDesign
dataclass
Bases: CommonStatsDesign
Parameters:
- measure_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY): the name of the field whose values are compared across groups. The analysis ranks all the values and compares the mean rank of each group. For example, 'Age'.
- grouping_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY): the name of the field used to define the groups compared in the analysis, e.g. 'Country'.
- group_values (Sequence[Any], default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY): the values of the grouping field that define the groups the analysis compares, e.g. ['South Korea', 'NZ', 'USA'].
- show_workings (bool, default: False): show the workings so you can see how the final results were derived.
Source code in src/sofastats/output/stats/kruskal_wallis_h.py
@dataclass(frozen=False)
class KruskalWallisHDesign(CommonStatsDesign):
    """
    Args:
        measure_field_name: the name of the field whose values are compared across groups - the analysis ranks
            all the values and compares the mean rank of each group. For example, 'Age'
        grouping_field_name: the name of the field used to define the groups compared in the analysis e.g. 'Country'
        group_values: the analysis will compare the groups defined
            by the values of the grouping field listed here e.g. ['South Korea', 'NZ', 'USA']
        show_workings: show the workings so you can see how the final results were derived
    """
    measure_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    grouping_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    group_values: Sequence[Any] = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    show_workings: bool = False

    def to_result(self) -> KruskalWallisHResult:
        ## values (sorted)
        grouping_field_values = apply_custom_sorting_to_values(
            variable_name=self.grouping_field_name, values=list(self.group_values), sort_orders=self.sort_orders)
        ## data
        grouping_val_is_numeric = all(is_numeric(x) for x in self.group_values)
        samples = []
        for grouping_field_value in grouping_field_values:
            grouping_filter = ValFilterSpec(variable_name=self.grouping_field_name, value=grouping_field_value,
                val_is_numeric=grouping_val_is_numeric)
            sample = get_sample(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
                grouping_filt=grouping_filter, measure_field_name=self.measure_field_name,
                table_filter_sql=self.table_filter_sql)
            samples.append(sample)
        stats_result = kruskal_wallis_h_stats_calc(samples)
        return stats_result

    def to_html_design(self) -> HTMLItemSpec:
        ## style
        style_spec = get_style_spec(style_name=self.style_name)
        ## values (sorted)
        grouping_field_values = apply_custom_sorting_to_values(
            variable_name=self.grouping_field_name, values=list(self.group_values), sort_orders=self.sort_orders)
        ## data
        grouping_val_is_numeric = all(is_numeric(x) for x in self.group_values)
        samples = []
        for grouping_field_value in grouping_field_values:
            grouping_filter = ValFilterSpec(variable_name=self.grouping_field_name, value=grouping_field_value,
                val_is_numeric=grouping_val_is_numeric)
            sample = get_sample(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
                grouping_filt=grouping_filter, measure_field_name=self.measure_field_name,
                table_filter_sql=self.table_filter_sql)
            samples.append(sample)
        stats_result = kruskal_wallis_h_stats_calc(samples)
        result = Result(**todict(stats_result),
            grouping_field_name=self.grouping_field_name,
            measure_field_name=self.measure_field_name,
            decimal_points=self.decimal_points,
        )
        html = get_html(result, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.STATS,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
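Kruskal-Wallis H works on ranks rather than raw values. The values from all groups are pooled and ranked, with ties sharing the average of the ranks they span. H then measures how far each group's rank sum departs from what equal groups would give. The sketch below uses the textbook formula without the tie correction factor. It is a hypothetical illustration, not the kruskal_wallis_h_stats_calc code.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..n, giving tied values the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(samples: list[list[float]]) -> float:
    """H = 12 / (N(N+1)) * sum(R_i**2 / n_i) - 3(N+1), with no tie correction."""
    pooled = [v for sample in samples for v in sample]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted = 0.0
    start = 0
    for sample in samples:
        rank_sum = sum(ranks[start:start + len(sample)])
        weighted += rank_sum ** 2 / len(sample)
        start += len(sample)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
print(round(kruskal_wallis_h([[1, 2, 3], [4, 5, 6]]), 4))  # 3.8571
```

scipy.stats.kruskal also applies a tie correction, so its H matches this sketch only when there are no ties.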
Mann-Whitney U
See CommonStatsDesign
for details of the to_result() method common to all stats output design dataclasses in sofastats.
sofastats.output.stats.mann_whitney_u.MannWhitneyUDesign
dataclass
Bases: CommonStatsDesign
Parameters:
- measure_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY): the name of the field whose values are compared between the two groups. The analysis ranks all the values and compares the ranks of each group. For example, 'Age'.
- grouping_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY): the name of the field used to define the groups compared in the analysis, e.g. 'Country'.
- group_a_value (Any, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY): the analysis compares the ranks of this group against the ranks of the group defined by group_b_value.
- group_b_value (Any, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY): the analysis compares the ranks of this group against the ranks of the group defined by group_a_value.
- show_workings (bool, default: False): show the workings so you can see how the final results were derived.
Source code in src/sofastats/output/stats/mann_whitney_u.py
@dataclass(frozen=False)
class MannWhitneyUDesign(CommonStatsDesign):
    """
    Args:
        measure_field_name: the name of the field whose values are compared between the two groups - the analysis
            ranks all the values and compares the ranks of each group. For example, 'Age'
        grouping_field_name: the name of the field used to define the groups compared in the analysis e.g. 'Country'
        group_a_value: the analysis will compare the ranks of this group
            against the ranks of the group defined by group_b_value
        group_b_value: the analysis will compare the ranks of this group
            against the ranks of the group defined by group_a_value
        show_workings: show the workings so you can see how the final results were derived
    """
    measure_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    grouping_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    group_a_value: Any = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    group_b_value: Any = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    show_workings: bool = False

    def to_result(self) -> MannWhitneyUResult:
        ## build samples ready for mann whitney u function
        grouping_filt_a = ValFilterSpec(variable_name=self.grouping_field_name,
            value=self.group_a_value, val_is_numeric=is_numeric(self.group_a_value))
        sample_a = get_sample(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            grouping_filt=grouping_filt_a, measure_field_name=self.measure_field_name,
            table_filter_sql=self.table_filter_sql)
        grouping_filt_b = ValFilterSpec(variable_name=self.grouping_field_name,
            value=self.group_b_value, val_is_numeric=is_numeric(self.group_b_value))
        sample_b = get_sample(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            grouping_filt=grouping_filt_b, measure_field_name=self.measure_field_name,
            table_filter_sql=self.table_filter_sql)
        stats_result = mann_whitney_u_stats_calc(sample_a=sample_a, sample_b=sample_b, high_volume_ok=False)
        return stats_result

    def to_html_design(self) -> HTMLItemSpec:
        ## style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        ## build samples ready for mann whitney u function
        grouping_filt_a = ValFilterSpec(variable_name=self.grouping_field_name,
            value=self.group_a_value, val_is_numeric=is_numeric(self.group_a_value))
        sample_a = get_sample(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            grouping_filt=grouping_filt_a, measure_field_name=self.measure_field_name,
            table_filter_sql=self.table_filter_sql)
        grouping_filt_b = ValFilterSpec(variable_name=self.grouping_field_name,
            value=self.group_b_value, val_is_numeric=is_numeric(self.group_b_value))
        sample_b = get_sample(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            grouping_filt=grouping_filt_b, measure_field_name=self.measure_field_name,
            table_filter_sql=self.table_filter_sql)
        ## get result
        stats_result = mann_whitney_u_stats_calc(sample_a=sample_a, sample_b=sample_b, high_volume_ok=False)
        n_a = stats_result.group_a_spec.n
        n_b = stats_result.group_b_spec.n
        even_matches = (n_a * n_b) / float(2)
        if self.show_workings:
            result_workings = mann_whitney_u_for_workings(sample_a=sample_a, sample_b=sample_b, high_volume_ok=False)
            worked_example = get_worked_example(result_workings, style_spec.style_name_hyphens)
        else:
            worked_example = ''
        result = Result(**todict(stats_result),
            sample_a=sample_a,
            sample_b=sample_b,
            grouping_field_name=self.grouping_field_name,
            measure_field_name=self.measure_field_name,
            n_a=n_a,
            n_b=n_b,
            even_matches=even_matches,
            worked_example=worked_example,
            decimal_points=self.decimal_points,
        )
        html = get_html(result, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.STATS,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
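The even_matches value in to_html_design, n_a × n_b / 2, is the U expected if neither group tends to rank higher than the other. The sketch below computes U directly from its pairwise definition: every value in group a is compared with every value in group b, a win counts 1 and a tie counts 0.5. This convention and the function name are assumptions for illustration, not mann_whitney_u_stats_calc. The pairwise loop is O(n_a × n_b), whereas rank-sum formulas scale better for large samples.

```python
def mann_whitney_u(sample_a: list[float], sample_b: list[float]) -> tuple[float, float]:
    """Return (u_a, u_b) by comparing every a-b pair; u_a counts wins for a, ties count 0.5."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1
            elif a == b:
                u_a += 0.5
    # Every pair is shared between the two groups, so the two U values sum to n_a * n_b
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b


sample_a, sample_b = [1, 3], [2, 3]
u_a, u_b = mann_whitney_u(sample_a, sample_b)
even_matches = len(sample_a) * len(sample_b) / 2
print(u_a, u_b, even_matches)  # 1.5 2.5 2.0
```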
Normality
See CommonStatsDesign
for details of the to_result() method common to all stats output design dataclasses in sofastats.
sofastats.output.stats.normality.NormalityDesign
dataclass
Bases: CommonStatsDesign
Parameters:
- variable_a_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY): if only this variable name is supplied, display its distribution and test it for normality. If variable_b_name is also supplied, do the same thing but for the difference between the two variables.
- variable_b_name (str | None, default: None): if supplied, test the normality of the difference between the two variables rather than the normality of a single variable.
Source code in src/sofastats/output/stats/normality.py
@dataclass(frozen=False)
class NormalityDesign(CommonStatsDesign):
    """
    Args:
        variable_a_name: if only this variable name is supplied, display the distribution and test it for normality.
            If another variable name is also supplied, do the same thing
            but for the difference between the two variables.
        variable_b_name: if supplied, will be testing the normality of the difference between two variables
            rather than the normality of a variable.
    """
    variable_a_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    variable_b_name: str | None = None

    def to_result(self) -> NormalTestResult:
        ## data
        paired = self.variable_b_name is not None
        if paired:
            sample = get_paired_diffs_sample(
                cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
                variable_a_name=self.variable_a_name, variable_b_name=self.variable_b_name,
                table_filter_sql=self.table_filter_sql)
        else:
            sample = get_sample(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
                measure_field_name=self.variable_a_name, grouping_filt=None, table_filter_sql=self.table_filter_sql)
        n_vals = len(sample.vals)
        if n_vals < MIN_VALS_FOR_NORMALITY_TEST:
            raise Exception(f"We need at least {MIN_VALS_FOR_NORMALITY_TEST:,} values to test normality.")
        else:
            stats_result = normal_test(sample.vals)
        return stats_result

    def to_html_design(self) -> HTMLItemSpec:
        ## style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        paired = self.variable_b_name is not None
        if paired:
            data_label = f'Difference Between "{self.variable_a_name}" and "{self.variable_b_name}"'
            sample = get_paired_diffs_sample(
                cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
                variable_a_name=self.variable_a_name, variable_b_name=self.variable_b_name,
                table_filter_sql=self.table_filter_sql)
        else:
            data_label = self.variable_a_name
            sample = get_sample(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
                measure_field_name=self.variable_a_name, grouping_filt=None, table_filter_sql=self.table_filter_sql)
        title = f"Normality Tests for {data_label}"
        ## message
        n_vals = len(sample.vals)
        if n_vals < MIN_VALS_FOR_NORMALITY_TEST:
            message = (f"<p>We need at least {MIN_VALS_FOR_NORMALITY_TEST:,} values to test normality.</p>"
                "<p>Rely entirely on visual inspection of graph above.</p>")
        else:
            try:
                stats_result = normal_test(sample.vals)
            except Exception as e:
                logger.info(f"Unable to calculate normality. Orig error: {e}")
                message = "<p>Unable to calculate normality tests</p>"
            else:
                ## skew
                if abs(stats_result.c_skew) <= 1:
                    skew_indication = 'a great sign'
                elif abs(stats_result.c_skew) <= 2:
                    skew_indication = 'a good sign'
                else:
                    skew_indication = 'not a good sign'
                skew_msg = (f"Skew (lopsidedness) is {round(stats_result.c_skew, self.decimal_points)} "
                    f"which is probably {skew_indication}.")
                ## kurtosis
                if abs(stats_result.c_kurtosis) <= 1:
                    kurtosis_indication = 'a great sign'
                elif abs(stats_result.c_kurtosis) <= 2:
                    kurtosis_indication = 'a good sign'
                else:
                    kurtosis_indication = 'not a good sign'
                kurtosis_msg = (
                    f"Kurtosis (peakedness or flatness) is {round(stats_result.c_kurtosis, self.decimal_points)} "
                    f"which is probably {kurtosis_indication}.")
                ## combined
                if n_vals > N_WHERE_NORMALITY_USUALLY_FAILS_NO_MATTER_WHAT:
                    message = ("<p>Rely on visual inspection of graph to assess normality.</p>"
                        "<p>Although the data failed the ideal normality test, "
                        f"most real-world data-sets with as many results ({n_vals:,}) would fail "
                        f"for even slight differences from the perfect normal curve.</p>"
                        f"<p>{skew_msg}</p><p>{kurtosis_msg}</p>")
                else:
                    if stats_result.p >= 0.05:  ## normality tests take normality as the null hypothesis, so a low p suggests non-normality
                        message = (f"<p>The distribution of {data_label} passed one test for normality.</p>"
                            f"<p>Confirm or reject based on visual inspection of graph. {skew_msg} {kurtosis_msg}</p>")
                    else:
                        message = (f'<p>Although the distribution of {data_label} is not perfectly "normal", '
                            f'it may still be "normal" enough for use. View graph to decide.</p>'
                            f"<p>{skew_msg}</p><p>{kurtosis_msg}</p>")
        ## histogram
        histogram = get_embedded_histogram_html(measure_field_label=data_label, style_spec=style_spec.chart,
            vals=sample.vals, width_scalar=1.5, label_chart_from_var_if_needed=False)
        result = Result(
            title=title,
            message=message,
            histogram=histogram,
        )
        html = get_html(result)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.STATS,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
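The skew and kurtosis interpretation in the listing above comes down to a few lines of arithmetic. The sketch below is a minimal, stdlib-only illustration. It computes moment-based skewness and excess kurtosis and applies the same 1 and 2 bands used above. Treating `c_skew` and `c_kurtosis` as these moment-based statistics is an assumption, because this section does not show how `normal_test` derives them. The function names are hypothetical and are not part of sofastats.

```python
from collections.abc import Sequence


def _central_moment(vals: Sequence[float], k: int) -> float:
    """k-th central moment in population form (divides by n)."""
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** k for v in vals) / len(vals)


def moment_skewness(vals: Sequence[float]) -> float:
    """Moment-based skewness g1 = m3 / m2**1.5 (0 for a symmetric sample)."""
    m2 = _central_moment(vals, 2)
    if m2 == 0:
        raise ValueError("skewness is undefined for a constant sample")
    return _central_moment(vals, 3) / m2 ** 1.5


def excess_kurtosis(vals: Sequence[float]) -> float:
    """Moment-based excess kurtosis g2 = m4 / m2**2 - 3 (0 for a normal distribution)."""
    m2 = _central_moment(vals, 2)
    if m2 == 0:
        raise ValueError("kurtosis is undefined for a constant sample")
    return _central_moment(vals, 4) / m2 ** 2 - 3


def shape_indication(statistic: float) -> str:
    """Same bands as the listing: |x| <= 1 great, |x| <= 2 good, otherwise not good."""
    magnitude = abs(statistic)
    if magnitude <= 1:
        return 'a great sign'
    if magnitude <= 2:
        return 'a good sign'
    return 'not a good sign'
```

For example, `[1, 2, 3, 4, 5]` has zero skew and an excess kurtosis of about -1.3, which these bands classify as 'a good sign'.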
Pearson's R Correlation
See CommonStatsDesign for details of the to_result() method common to all stats output design dataclasses in sofastats.
sofastats.output.stats.pearsons_r.PearsonsRDesign
dataclass
Bases: CommonStatsDesign
Parameters:
- variable_a_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY): the first variable in each pair we are checking for correlation
- variable_b_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY): the second variable in each pair we are checking for correlation
Source code in src/sofastats/output/stats/pearsons_r.py
@dataclass(frozen=False)
class PearsonsRDesign(CommonStatsDesign):
    """
    Args:
        variable_a_name: the first variable in each pair we are checking for correlation
        variable_b_name: the second variable in each pair we are checking for correlation
    """
    variable_a_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    variable_b_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY

    def to_result(self) -> CorrelationCalcResult:
        ## data
        paired_data = get_paired_data(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            variable_a_name=self.variable_a_name, variable_b_name=self.variable_b_name,
            table_filter_sql=self.table_filter_sql)
        stats_result = pearsonsr_stats_calc(paired_data.sample_a.vals, paired_data.sample_b.vals)
        return stats_result

    def to_html_design(self) -> HTMLItemSpec:
        ## style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        paired_data = get_paired_data(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            variable_a_name=self.variable_a_name, variable_b_name=self.variable_b_name,
            table_filter_sql=self.table_filter_sql)
        coords = [Coord(x=x, y=y) for x, y in zip(paired_data.sample_a.vals, paired_data.sample_b.vals, strict=True)]
        pearsonsr_calc_result = pearsonsr_stats_calc(paired_data.sample_a.vals, paired_data.sample_b.vals)
        regression_result = get_regression_result(xs=paired_data.sample_a.vals, ys=paired_data.sample_b.vals)
        correlation_result = CorrelationResult(
            variable_a_name=self.variable_a_name,
            variable_b_name=self.variable_b_name,
            coords=coords,
            stats_result=pearsonsr_calc_result,
            regression_result=regression_result,
            decimal_points=self.decimal_points,
        )
        scatterplot_series = ScatterplotSeries(
            coords=correlation_result.coords,
            dot_colour=style_spec.chart.colour_mappings[0].main,
            dot_line_colour=style_spec.chart.major_grid_line_colour,
            show_regression_details=True,
        )
        vars_series = [scatterplot_series, ]
        xs = correlation_result.xs
        ys = correlation_result.ys
        x_min, x_max = get_optimal_min_max(axis_min=min(xs), axis_max=max(xs))
        y_min, y_max = get_optimal_min_max(axis_min=min(ys), axis_max=max(ys))
        chart_conf = ScatterplotConf(
            width_inches=7.5,
            height_inches=4.0,
            inner_background_colour=style_spec.chart.plot_bg_colour,
            text_colour=style_spec.chart.axis_font_colour,
            x_axis_label=correlation_result.variable_a_name,
            y_axis_label=correlation_result.variable_b_name,
            show_dot_lines=True,
            x_min=x_min,
            x_max=x_max,
            y_min=y_min,
            y_max=y_max,
        )
        fig = get_scatterplot_fig(vars_series, chart_conf)
        image_as_data = plot2image_as_data(fig)
        scatterplot_html = f'<img src="{image_as_data}"/>'
        result = Result(**todict(correlation_result),
            scatterplot_html=scatterplot_html,
        )
        html = get_html(result, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.STATS,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
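`to_html_design()` relies on two calculations: `pearsonsr_stats_calc()` for the correlation coefficient and `get_regression_result()` for the fitted line drawn on the scatterplot. The sketch below writes out the underlying textbook formulas using only the standard library. It is not the library's implementation, and it assumes the regression line is an ordinary least squares fit. The helper names are hypothetical.

```python
import math
from collections.abc import Sequence


def _centred_sums(xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float, float, float, float]:
    """Return (mean_x, mean_y, Sxy, Sxx, Syy) for paired samples."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must be paired, i.e. the same length")
    if len(xs) < 2:
        raise ValueError("at least two pairs are needed")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxy, sxx, syy


def pearsons_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's r = Sxy / sqrt(Sxx * Syy)."""
    _, _, sxy, sxx, syy = _centred_sums(xs, ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept; returns (slope, intercept)."""
    mean_x, mean_y, sxy, sxx, _ = _centred_sums(xs, ys)
    if sxx == 0:
        raise ValueError("the slope is undefined when x is constant")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Both functions share the same centred sums, which is one reason a single pass over the paired data can feed both the statistic and the plotted line.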
Spearman's R Correlation
See CommonStatsDesign for details of the to_result() method common to all stats output design dataclasses in sofastats.
sofastats.output.stats.spearmans_r.SpearmansRDesign
dataclass
Bases: CommonStatsDesign
Parameters:
- variable_a_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY): the first variable in each pair we are checking for correlation
- variable_b_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY): the second variable in each pair we are checking for correlation
- show_workings (bool, default: False): show the workings so you can see how the final results were derived
- high_volume_ok (bool, default: False): the algorithm is more expensive than those that can make parametric assumptions, so this setting stops people from unknowingly starting very slow operations. It has no effect if the number of records is less than MAX_RANK_DATA_VALS (currently 50,000 records). If False, an exception is raised when the code is asked to operate on an amount of data that would make it run very slowly. If True, the operation is allowed to proceed, but a message tells the user to expect the process to take a fairly long time (so they don't terminate it early on the assumption that something has gone wrong and the analysis is never going to finish).
Source code in src/sofastats/output/stats/spearmans_r.py
@dataclass(frozen=False)
class SpearmansRDesign(CommonStatsDesign):
    """
    Args:
        variable_a_name: the first variable in each pair we are checking for correlation
        variable_b_name: the second variable in each pair we are checking for correlation
        show_workings: show the workings so you can see how the final results were derived
        high_volume_ok: the algorithm is more expensive than those which can make
            parametric assumptions, so we need to stop people unknowingly starting very slow operations.
            This setting has no impact if the number of records is less than MAX_RANK_DATA_VALS
            (currently 50,000 records). If set to `False`, an exception is raised if the code is being asked to operate
            on an amount of data which will make it run very slowly. If `True`, the operation is allowed to proceed
            but a message tells the user they can expect the process to take a fairly long time
            (so they don't terminate early on the assumption that something has gone wrong and the analysis
            is never going to finish).
    """
    variable_a_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    variable_b_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    show_workings: bool = False
    high_volume_ok: bool = False

    def to_result(self) -> CorrelationCalcResult:
        ## data
        paired_data = get_paired_data(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            variable_a_name=self.variable_a_name, variable_b_name=self.variable_b_name,
            table_filter_sql=self.table_filter_sql)
        ## pass high_volume_ok so to_result() honours the setting just as to_html_design() does
        stats_result = spearmansr_stats_calc(paired_data.sample_a.vals, paired_data.sample_b.vals,
            high_volume_ok=self.high_volume_ok)
        return stats_result

    def to_html_design(self) -> HTMLItemSpec:
        ## style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        paired_data = get_paired_data(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            variable_a_name=self.variable_a_name, variable_b_name=self.variable_b_name,
            table_filter_sql=self.table_filter_sql)
        coords = [Coord(x=x, y=y) for x, y in zip(paired_data.sample_a.vals, paired_data.sample_b.vals, strict=True)]
        spearmansr_calc_result = spearmansr_stats_calc(paired_data.sample_a.vals, paired_data.sample_b.vals,
            high_volume_ok=self.high_volume_ok)
        regression_result = get_regression_result(xs=paired_data.sample_a.vals, ys=paired_data.sample_b.vals)
        if self.show_workings:
            worked_result = get_worked_result(
                variable_a_values=paired_data.sample_a.vals,
                variable_b_values=paired_data.sample_b.vals,
            )
        else:
            worked_result = None
        correlation_result = CorrelationResult(
            variable_a_name=self.variable_a_name,
            variable_b_name=self.variable_b_name,
            coords=coords,
            stats_result=spearmansr_calc_result,
            regression_result=regression_result,
            worked_result=worked_result,
            decimal_points=self.decimal_points,
        )
        worked_example = (
            get_worked_example(correlation_result, style_spec.style_name_hyphens) if self.show_workings else '')
        scatterplot_series = ScatterplotSeries(
            coords=coords,
            dot_colour=style_spec.chart.colour_mappings[0].main,
            dot_line_colour=style_spec.chart.major_grid_line_colour,
            show_regression_details=True,
        )
        vars_series = [scatterplot_series, ]
        xs = correlation_result.xs
        ys = correlation_result.ys
        x_min, x_max = get_optimal_min_max(axis_min=min(xs), axis_max=max(xs))
        y_min, y_max = get_optimal_min_max(axis_min=min(ys), axis_max=max(ys))
        chart_conf = ScatterplotConf(
            width_inches=7.5,
            height_inches=4.0,
            inner_background_colour=style_spec.chart.plot_bg_colour,
            text_colour=style_spec.chart.axis_font_colour,
            x_axis_label=self.variable_a_name,
            y_axis_label=self.variable_b_name,
            show_dot_lines=True,
            x_min=x_min,
            x_max=x_max,
            y_min=y_min,
            y_max=y_max,
        )
        fig = get_scatterplot_fig(vars_series, chart_conf)
        image_as_data = plot2image_as_data(fig)
        scatterplot_html = f'<img src="{image_as_data}"/>'
        result = Result(**todict(correlation_result),
            scatterplot_html=scatterplot_html,
            worked_example=worked_example,
        )
        html = get_html(result, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.STATS,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
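Spearman's rho is Pearson's r computed on ranks rather than raw values, so it involves an extra ranking (sorting) step. The sketch below is a minimal, stdlib-only illustration. It assumes tied values receive the average of the ranks they span, which is the usual convention, but this section does not say how `spearmansr_stats_calc()` handles ties. All names are hypothetical.

```python
import math
from collections.abc import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the run while the next sorted value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def pearsons_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearmans_rho(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho: Pearson's r applied to the ranks of each variable."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must be paired, i.e. the same length")
    return pearsons_r(average_ranks(xs), average_ranks(ys))
```

Note that `[1, 4, 9, 16]` is not linearly related to `[1, 2, 3, 4]`, yet rho is exactly 1 because the relationship is perfectly monotonic.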
Independent Samples T-Test
See CommonStatsDesign for details of the to_result() method common to all stats output design dataclasses in sofastats.
sofastats.output.stats.ttest_indep.TTestIndepDesign
dataclass
Bases: CommonStatsDesign
Parameters:
- measure_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY): the name of the field aggregated by group, e.g. 'Age'. The analysis compares the mean value of each group.
- grouping_field_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY): the name of the field used to define the groups compared in the analysis, e.g. 'Country'
- group_a_value (Any, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY): the analysis compares the mean value for this group against the mean value of the group defined by group_b_value
- group_b_value (Any, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY): the analysis compares the mean value for this group against the mean value of the group defined by group_a_value
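In the source listing that follows, each group's sample is fetched with a SQL filter on `grouping_field_name` (`ValFilterSpec` plus `get_sample()`). The in-memory sketch below shows the same idea on a hypothetical list of row dictionaries. The function name is hypothetical. Skipping rows whose measure is missing is also an assumption, not documented behaviour.

```python
from collections.abc import Iterable, Mapping
from typing import Any


def split_samples(
        rows: Iterable[Mapping[str, Any]], measure_field_name: str, grouping_field_name: str,
        group_a_value: Any, group_b_value: Any) -> tuple[list[float], list[float]]:
    """Collect the measure values for group A and group B; rows in other groups are ignored."""
    sample_a: list[float] = []
    sample_b: list[float] = []
    for row in rows:
        measure = row.get(measure_field_name)
        if measure is None:
            continue  # assumption: rows with a missing measure are excluded
        group = row.get(grouping_field_name)
        if group == group_a_value:
            sample_a.append(measure)
        elif group == group_b_value:
            sample_b.append(measure)
    return sample_a, sample_b
```

Rows belonging to any third group (here, anything other than the two chosen countries) simply never reach either sample.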
Source code in src/sofastats/output/stats/ttest_indep.py
@dataclass(frozen=False)
class TTestIndepDesign(CommonStatsDesign):
    """
    Args:
        measure_field_name: the name of the field aggregated by group - the analysis compares the mean value of each group.
            For example, 'Age'
        grouping_field_name: the name of the field used to define the groups compared in the analysis e.g. 'Country'
        group_a_value: the analysis will compare the mean value for this group
            against the mean value of the group defined by group_b_value
        group_b_value: the analysis will compare the mean value of this group
            against the mean value of the group defined by group_a_value
    """
    measure_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    grouping_field_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    group_a_value: Any = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    group_b_value: Any = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY

    def to_result(self) -> TTestIndepResult:
        ## data
        ## build samples ready for ttest_indep function
        grouping_filt_a = ValFilterSpec(variable_name=self.grouping_field_name,
            value=self.group_a_value, val_is_numeric=is_numeric(self.group_a_value))
        sample_a = get_sample(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            grouping_filt=grouping_filt_a, measure_field_name=self.measure_field_name,
            table_filter_sql=self.table_filter_sql)
        grouping_filt_b = ValFilterSpec(variable_name=self.grouping_field_name,
            value=self.group_b_value, val_is_numeric=is_numeric(self.group_b_value))
        sample_b = get_sample(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            grouping_filt=grouping_filt_b, measure_field_name=self.measure_field_name,
            table_filter_sql=self.table_filter_sql)
        ## get result
        stats_result = ttest_indep_stats_calc(sample_a, sample_b)
        return stats_result

    def to_html_design(self) -> HTMLItemSpec:
        ## style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        ## build samples ready for ttest_indep function
        grouping_filt_a = ValFilterSpec(variable_name=self.grouping_field_name,
            value=self.group_a_value, val_is_numeric=is_numeric(self.group_a_value))
        sample_a = get_sample(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            grouping_filt=grouping_filt_a, measure_field_name=self.measure_field_name,
            table_filter_sql=self.table_filter_sql)
        grouping_filt_b = ValFilterSpec(variable_name=self.grouping_field_name,
            value=self.group_b_value, val_is_numeric=is_numeric(self.group_b_value))
        sample_b = get_sample(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            grouping_filt=grouping_filt_b, measure_field_name=self.measure_field_name,
            table_filter_sql=self.table_filter_sql)
        ## get result
        stats_result = ttest_indep_stats_calc(sample_a, sample_b)
        mpl_pngs.set_gen_mpl_settings(axes_label_size=10, xtick_label_size=8, ytick_label_size=8)
        histograms2show = []
        for group_spec in [stats_result.group_a_spec, stats_result.group_b_spec]:
            try:
                histogram_html = get_embedded_histogram_html(
                    self.measure_field_name, style_spec.chart, group_spec.vals, group_spec.label)
            except Exception as e:
                html_or_msg = f"<b>{group_spec.label}</b> - unable to display histogram. Reason: {e}"
            else:
                html_or_msg = histogram_html
            histograms2show.append(html_or_msg)
        result = Result(**todict(stats_result),
            grouping_field_name=self.grouping_field_name,
            measure_field_name=self.measure_field_name,
            histograms2show=histograms2show,
            decimal_points=self.decimal_points,
        )
        html = get_html(result, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.STATS,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
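`ttest_indep_stats_calc()` compares the means of the two samples built above. This section does not say which t-test variant it uses. The sketch below assumes the classic Student's t-test with pooled variance. Welch's unequal-variance version would give a different statistic and different degrees of freedom. The sketch stops at t and df because a p-value needs the t-distribution CDF, which the standard library does not provide. The function name is hypothetical.

```python
import math
from collections.abc import Sequence
from statistics import mean, variance


def pooled_t_test(sample_a: Sequence[float], sample_b: Sequence[float]) -> tuple[float, int]:
    """Student's independent-samples t statistic (pooled variance) and its degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two values")
    df = n_a + n_b - 2
    # weighted average of the two sample variances (statistics.variance divides by n - 1)
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    if pooled_var == 0:
        raise ValueError("t is undefined when both groups have zero variance")
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / std_err
    return t, df
```

A negative t simply means group A's mean is lower than group B's. Which group is labelled A therefore only affects the sign.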
Wilcoxon Signed Ranks
See CommonStatsDesign for details of the to_result() method common to all stats output design dataclasses in sofastats.
sofastats.output.stats.wilcoxon_signed_ranks.WilcoxonSignedRanksDesign
dataclass
Bases: CommonStatsDesign
Parameters:
- variable_a_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY): the first variable in each pair we are checking for a difference
- variable_b_name (str, default: DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY): the second variable in each pair we are checking for a difference
- show_workings (bool, default: False): show the workings so you can see how the final results were derived
Source code in src/sofastats/output/stats/wilcoxon_signed_ranks.py
@dataclass(frozen=False)
class WilcoxonSignedRanksDesign(CommonStatsDesign):
    """
    Args:
        variable_a_name: the first variable in each pair we are checking for a difference
        variable_b_name: the second variable in each pair we are checking for a difference
        show_workings: show the workings so you can see how the final results were derived
    """
    variable_a_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    variable_b_name: str = DEFAULT_SUPPLIED_BUT_MANDATORY_ANYWAY
    show_workings: bool = False

    def to_result(self) -> WilcoxonSignedRanksResult:
        ## data
        paired_data = get_paired_data(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            variable_a_name=self.variable_a_name, variable_b_name=self.variable_b_name,
            table_filter_sql=self.table_filter_sql)
        stats_result = wilcoxon_signed_ranks_stats_calc(
            sample_a=paired_data.sample_a, sample_b=paired_data.sample_b, high_volume_ok=False)
        return stats_result

    def to_html_design(self) -> HTMLItemSpec:
        ## style
        style_spec = get_style_spec(style_name=self.style_name)
        ## data
        paired_data = get_paired_data(cur=self.cur, dbe_spec=self.dbe_spec, source_table_name=self.source_table_name,
            variable_a_name=self.variable_a_name, variable_b_name=self.variable_b_name,
            table_filter_sql=self.table_filter_sql)
        stats_result = wilcoxon_signed_ranks_stats_calc(
            sample_a=paired_data.sample_a, sample_b=paired_data.sample_b, high_volume_ok=False)
        if self.show_workings:
            result_workings = wilcoxon_signed_ranks_for_workings(
                sample_a=paired_data.sample_a, sample_b=paired_data.sample_b,
                label_a=self.variable_a_name, label_b=self.variable_b_name)
            worked_example = get_worked_example(result_workings, style_spec.style_name_hyphens)
        else:
            worked_example = ''
        result = Result(**todict(stats_result),
            worked_example=worked_example,
            decimal_points=self.decimal_points,
        )
        html = get_html(result, style_spec)
        return HTMLItemSpec(
            html_item_str=html,
            output_item_type=OutputItemType.STATS,
            output_title=self.output_title,
            design_name=self.__class__.__name__,
            style_name=self.style_name,
        )
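The Wilcoxon signed-ranks test works on the within-pair differences. The sketch below shows the core calculation and is not the library's code. It drops zero differences, which is Wilcoxon's original convention; this section does not say how `wilcoxon_signed_ranks_stats_calc()` treats them. It then ranks the absolute differences, averaging tied ranks, and sums the ranks of the positive and negative differences separately. The test statistic T is the smaller of the two sums. All names are hypothetical.

```python
from collections.abc import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def signed_rank_sums(sample_a: Sequence[float], sample_b: Sequence[float]) -> tuple[float, float]:
    """Return (W+, W-): rank sums of the positive and negative differences a - b."""
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must be paired, i.e. the same length")
    diffs = [a - b for a, b in zip(sample_a, sample_b) if a != b]  # zero differences dropped
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus
```

As a check, W+ and W- always add up to n(n + 1)/2, where n is the number of non-zero differences.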