This repo provides a webpage that is embedded in stateofthenation.gov.za. This webpage is published at pres-employment.openup.org.za. An embedded preview is available at sona-shell.netlify.app.
Data processing is done using Python, website UX design in Webflow, and website dynamics using jQuery and D3.js.
NOTE: Read this if you are updating the spreadsheet used as input for the website.
The basic structure of the spreadsheet is as follows:
- Targets - a sheet listing all programmes and their target number of beneficiaries. This stays the same for a phase, as targets are set once.
- Trends - a sheet listing programme outcomes. As the spreadsheet is updated, columns are added to this sheet.
- Provincial (beneficiaries) - the by-province breakdown of programmes; each province gets a column.
- Demographic data - all non-province breakdowns: gender, youth, etc.
- Implementation status - the implementation status of each programme.
- Department Descriptions - the descriptions and blurbs ("lead" and "paragraph") for each department.
General rules for the spreadsheet:
- Keep it rectangular: the code expects a grid of rows and columns, so there must not be any merged cells, etc.
- Pay attention to naming: the programme names need to be exactly the same throughout the spreadsheet.
- Whitespace matters: "Educational Assistants" is different to "Educational  Assistants" (two spaces between the words) and "Educational Assistants " (trailing space).
- Each change needs a new version: to make it clear which version is which, give the file a new name each time you change the spreadsheet and store it in the appropriate place on Google Drive.
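The naming and whitespace rules above can be checked mechanically before running the update. The following is a minimal sketch, assuming the programme names from each sheet have already been read into plain lists. The function names and the sample data are hypothetical and not part of this repo.

```python
from collections import defaultdict


def normalise(name: str) -> str:
    """Collapse runs of whitespace into single spaces and strip both ends."""
    return " ".join(name.split())


def find_whitespace_variants(names_by_sheet):
    """Group spellings that differ only in whitespace.

    names_by_sheet maps a sheet name to the list of programme names in it.
    Returns {normalised name: sorted list of distinct raw spellings} for
    every name that appears with more than one raw spelling.
    """
    spellings = defaultdict(set)
    for names in names_by_sheet.values():
        for raw in names:
            spellings[normalise(raw)].add(raw)
    return {
        key: sorted(variants)
        for key, variants in spellings.items()
        if len(variants) > 1
    }


# Hypothetical sample data: the same programme typed three different ways.
sample = {
    "Targets": ["Educational Assistants"],
    "Trends": ["Educational  Assistants"],
    "Demographic data": ["Educational Assistants "],
}
variants = find_whitespace_variants(sample)
for key, raw_spellings in variants.items():
    print(key, "->", [repr(s) for s in raw_spellings])
```

Printing with `repr` makes double and trailing spaces visible, which is usually the quickest way to find the offending cell.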
NOTE: Read this if you are running the data update code.
Data is processed by the Python script in python-src/update_all_data.py. The previously used Jupyter Notebook is deprecated. The Python script has these parameters:
usage: update_all_data.py [-h] [--phase1_excel PHASE1_EXCEL] [--phase2_excel PHASE2_EXCEL] [--output_dir OUTPUT_DIR] [--output_filename OUTPUT_FILENAME]
options:
-h, --help show this help message and exit
--phase1_excel PHASE1_EXCEL
--phase2_excel PHASE2_EXCEL
--output_dir OUTPUT_DIR
--output_filename OUTPUT_FILENAME
The default output file is data/all_data.json.
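For orientation, a usage message of this shape is what Python's `argparse` produces. The sketch below is not the script's actual code. It only illustrates how the four options could combine into an output path. The split of the default into `data` and `all_data.json`, and the spreadsheet file names, are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the same option names as the usage message."""
    parser = argparse.ArgumentParser(prog="update_all_data.py")
    parser.add_argument("--phase1_excel")
    parser.add_argument("--phase2_excel")
    # Assumed defaults: the README only states that the default output
    # ends up at data/all_data.json.
    parser.add_argument("--output_dir", default="data")
    parser.add_argument("--output_filename", default="all_data.json")
    return parser


# Hypothetical spreadsheet file names, passed explicitly for the example.
args = build_parser().parse_args(
    ["--phase1_excel", "phase1.xlsx", "--phase2_excel", "phase2.xlsx"]
)
output_path = f"{args.output_dir}/{args.output_filename}"
print(output_path)
```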
Commits made to the data-updates branch are visible at https://data-updates--presidency-employment-stimulus.netlify.app/ and the staging branch updates to https://staging--presidency-employment-stimulus.netlify.app/.
The list of valid months and corresponding columns in the Trends sheet is in python-src/presidential_employment/__init__.py, lines 14-117.
The number of months must match the number of columns in the Trends sheet: no more, no less. For lookup on the web interface, data/lookups.json should also be updated.
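The "no more, no less" rule lends itself to a quick sanity check before the update is run. This is a minimal sketch, assuming both the month list and the Trends column headers are available as plain Python lists. The real structure in `__init__.py` is not shown here, and the function name and sample values are hypothetical.

```python
def check_months_match_columns(months, trend_columns):
    """Return a list of problems; an empty list means the counts line up."""
    problems = []
    difference = len(months) - len(trend_columns)
    if difference > 0:
        problems.append(
            f"{difference} more month(s) listed than Trends columns: "
            f"{months[len(trend_columns):]}"
        )
    elif difference < 0:
        problems.append(
            f"{-difference} more Trends column(s) than months listed: "
            f"{trend_columns[len(months):]}"
        )
    return problems


# Hypothetical values: three months configured, but only two data columns.
months = ["202010", "202011", "202012"]
trend_columns = ["Oct 2020", "Nov 2020"]
problems = check_months_match_columns(months, trend_columns)
for problem in problems:
    print(problem)
```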
Update the end date of phases in src/index.html. These are in the elements with class="feature-value__phase-label" and class="phase-legend__text", on lines 269 and 202.
If the "number of direct participants" needs to be changed, this is in src/js/viz-phased.js (line 58).
The data from the spreadsheet is read into an Overview and a list of Departments. Within each of these, there are Phases, which in turn contain Sections. Sections are essentially the top-level page breakdowns: for example, the Programme Achievements in the Overview is a Section, and the Programme Targets for a Department is another. Within each Department there is a Section for each type of opportunity: jobs created, livelihoods supported and jobs retained. Each Section has multiple Metrics. These are, for example, programmes like the DBE's Education Assistants programme. Each Metric has overall values and targets and zero or more Dimensions. The Dimensions are the breakdowns by time and by various demographics. Each Dimension has an associated visualisation type and a set of values and targets.
The JSON is generated by the patched version of the dataclasses_json module (the patch is in this PR) and the classes are defined in the python-src/presidential_employment.py file in this repository.
Dimensions are parsed into their Python class representation by the code in compute_all_data_departments (in the above-mentioned Python file) and the make_dim function (for Dimensions that are represented by their own sheet, e.g. the Provincial breakdown), and also the code in that function that looks for columns (e.g. the gender ones) in the Demographics sheet. Demographic information is aggregated (for use in the Overview) by the compute_breakdowns function.
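To illustrate the kind of aggregation described above, here is a minimal sketch that sums per-programme demographic counts into overall totals. It is not the repo's compute_breakdowns function. The function name, the input shape and the sample numbers are all hypothetical.

```python
from collections import Counter


def aggregate_breakdowns(programmes):
    """Sum each demographic category across programmes.

    Each programme is a dict with a "demographics" mapping of
    category -> count. Missing counts (None) are skipped, not treated as 0.
    """
    totals = Counter()
    for programme in programmes:
        for category, value in programme["demographics"].items():
            if value is not None:
                totals[category] += value
    return dict(totals)


# Hypothetical sample data; real values come from the Demographics sheet.
programmes = [
    {"name": "Programme A", "demographics": {"female": 60, "male": 40, "youth": 80}},
    {"name": "Programme B", "demographics": {"female": 10, "male": 15, "youth": None}},
]
overview_totals = aggregate_breakdowns(programmes)
print(overview_totals)
```

Skipping missing values keeps a programme that has not reported a breakdown from being counted as zero, although the real code may handle this differently.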
To update the website with a Webflow export, save the Webflow export to /webflow-export.zip, then run:
npm run webflow-import
Commits to main are deployed to presidency-employment-stimulus.netlify.app by Netlify. The site pres-employment.openup.org.za points at this site.
Dependencies:
python>=3.9
dataclasses-json>=0.5.7
pandas>=1.4.1
numpy>=1.21.5
The data structures in use are:
Everything -> Overview
           -> List[Department]
Overview -> List[PhaseDates]  # this describes the start and end dates of the phases
         -> List[Sections]  # the sections are for different types of beneficiary or other top-level divisions e.g. totals vs breakdowns

# when used in Overview
Section -> List[PhasedMetrics]  # Metrics are both the top level summary (budget, total beneficiaries) and the different breakdowns
                                # a "PhasedMetric" has a list of total values and target values
PhasedMetric -> List[Dimension]  # Dimensions hold the data displayed as line charts, bar charts, etc. i.e. breakdowns of a Metric
Dimension -> List[MultiMetricValue]  # MultiMetricValues are used for dimensions that need values that map phase_num -> value

# when used in Department - in this case the phases are split apart at the top level as Phases, not via PhasedMetrics
Department -> Phase
Phase -> List[Section]
         List[Beneficiary]
         List[ImplementationDetail]  # when the implementation status is stored on the Department level
Section -> List[Metric]
Metric -> List[Dimension]
          ImplementationDetail  # when we have implementation status for a programme
Dimension -> List[MetricValue]  # where the MetricValue stores value and value_target e.g. by time, by gender, etc
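
The Department side of this hierarchy can be sketched with plain dataclasses. This is an illustration under assumptions, not the repo's class definitions: the field names are hypothetical, Beneficiary and ImplementationDetail are left out for brevity, a Department is assumed to hold a list of Phases, and the standard library's `dataclasses.asdict` stands in for the patched dataclasses_json module.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class MetricValue:
    key: str  # e.g. a month or a gender label
    value: Optional[int] = None
    value_target: Optional[int] = None


@dataclass
class Dimension:
    name: str
    viz: str  # visualisation type; "line" and "bar" are hypothetical values
    values: List[MetricValue] = field(default_factory=list)


@dataclass
class Metric:
    name: str
    value: Optional[int] = None
    value_target: Optional[int] = None
    dimensions: List[Dimension] = field(default_factory=list)


@dataclass
class Section:
    name: str
    metrics: List[Metric] = field(default_factory=list)


@dataclass
class Phase:
    phase_num: int
    sections: List[Section] = field(default_factory=list)


@dataclass
class Department:
    name: str
    phases: List[Phase] = field(default_factory=list)  # assumed to be a list


# Hypothetical sample numbers, for illustration only.
by_gender = Dimension(
    name="gender",
    viz="bar",
    values=[MetricValue("female", 100), MetricValue("male", 50)],
)
metric = Metric("Education Assistants", value=150, value_target=200, dimensions=[by_gender])
department = Department("DBE", [Phase(1, [Section("Jobs created", [metric])])])

# asdict() recurses through nested dataclasses and lists.
payload = json.dumps(asdict(department), indent=2)
print(payload)
```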