chris48s edited this page Nov 24, 2017 · 2 revisions

UPRNs as a first-class Citizen in WhereDIV: 2 - Plan

Data requirements

  • UPRN does become a hard requirement. I can't find any areas that gave us no UPRNs at all for parl.2017 (even South Ribble, Hartlepool, Blaenau Gwent, Barnsley and Torfaen, which were all possibilities)
  • It will be fine for Xpress, Halarose or DCounts users
  • It will be fine for GIS data
  • Idox users may have problems
  • There are some areas with low UPRN coverage (e.g: Haringey - only 93% with UPRNs)
  • Even where there is low UPRN coverage, the number of rows with no UPRN will be roughly similar to the number of ambiguous rows we're discarding using the current processes
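A coverage figure like the Haringey one can be checked per input file before deciding whether an area is usable. This is a minimal sketch only: it assumes rows arrive as dicts (e.g. from `csv.DictReader`) and that the column is called `uprn`, which will differ between EMS exports.

```python
def uprn_coverage(rows, uprn_field="uprn"):
    """Return (rows_with_uprn, total_rows, fraction) for an iterable of dict rows.

    The field name is an assumption; pass the real column name per supplier.
    """
    total = 0
    with_uprn = 0
    for row in rows:
        total += 1
        # Treat missing, None and whitespace-only values as "no UPRN"
        value = (row.get(uprn_field) or "").strip()
        if value:
            with_uprn += 1
    fraction = with_uprn / total if total else 0.0
    return with_uprn, total, fraction
```

Rows counted as having no UPRN here are the ones that would be dropped under the new model, so this number is the one to compare against the count of ambiguous rows discarded today.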

Where does AddressBase live?

  • Postgres can't join across databases
  • You can have multiple schemas in the same database and join across them
  • How does Django like this setup?
  • Is Django happy for multiple apps to live in separate schemas within the same database?
  • Deploy changes
  • How would this setup impact on your local dev copy?

You need to work out where AddressBase, ONSAD, ONSPD will live:

  • Same DB?
  • Different DB?
  • Same DB, different schema?

Factors to consider:

  • Do you want to put polling station data directly into the AddressBase table?
  • Do you want to have a 1-to-1 relationship?
  • Do you actually need to do JOINs or FKs?

Sort this out first - it is important and it defines constraints on your data model.

[Captain Hindsight]

https://stackoverflow.com/questions/35404442/django-orm-confusion-about-router-allow-relation

In WhereDIV we can only have a FK relationship between Loggedpostcode and Council because at some point we have run the councils importer on the logger DB so all the IDs are in that table. It isn't FK-ing across connections.

Postgres explicitly doesn't support cross-db JOINs http://wiki.postgresql.org/wiki/FAQ#How_do_I_perform_queries_using_multiple_databases.3F

  • postgres_fdw - can't find ANY docs on using this with django :(
  • contrib/dblink - not supported in django
  • Different DB doesn't work
  • Same DB, different schema does work, but you are still dealing with fundamentally different DB connections
  • either way you'd have to use loads of raw SQL not the ORM
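For reference, the "different DB connections" setup is what a Django database router expresses. The sketch below is illustrative only: the app label `addressbase` and the connection alias `addressbase_db` are hypothetical names, not the ones used in WhereDIV. The four method names are Django's router hooks; returning `None` means "no opinion".

```python
class AddressBaseRouter:
    """Send one app's models to a separate connection alias.

    'addressbase' and 'addressbase_db' are hypothetical names for this sketch.
    """

    app_label = "addressbase"
    db_alias = "addressbase_db"

    def _is_ours(self, model_or_obj):
        # Works for model classes and instances: both expose _meta.app_label
        return model_or_obj._meta.app_label == self.app_label

    def db_for_read(self, model, **hints):
        return self.db_alias if self._is_ours(model) else None

    def db_for_write(self, model, **hints):
        return self.db_alias if self._is_ours(model) else None

    def allow_relation(self, obj1, obj2, **hints):
        # Refuse relations that would span the two connections
        if self._is_ours(obj1) != self._is_ours(obj2):
            return False
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        if app_label == self.app_label:
            return db == self.db_alias
        if db == self.db_alias:
            # Keep other apps' tables out of the AddressBase connection
            return False
        return None
```

Note that `allow_relation` only governs what the ORM will accept. It does not make the database enforce a foreign key or perform a JOIN across connections, which is the limitation described above.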

Database

  • You don't need blacklist anymore
  • In principle, you don't need PollingDistrict model, but it might be practical to store them in a DB either for performance or consistency checking (but don't query them interactively)
  • There is no reason for ResidentialAddress and Address to be different models - you only need one address model.
  • You should only need Address (addressbase) and PollingStation (again, in principle)
  • These data model changes impact on the API endpoints

AddressBase App

Most of the proposed work on the AddressBase app was done under

Polling Stations App

  • Data model changes - lots of work in models.py (see Database section)
  • get_polling_station() will need substantial changes

Address Pickers

Fundamental question: In the front-end, do we always show an address picker?

This has impact on

  • UX/accessibility
  • Licence considerations
  • API spec
  • API users: EC, WhoCIVF, Labour, widget (remember widget does not check EE)
  • Directions source point:
    • If we always show an address picker, we can use doorstep grid refs for source point
    • If not, we use centroid, or inconsistently use centroid/doorstep
  • EE
    • If we always show a picker, we can call the EE API by grid ref and that abstracts a lot of issues - we can shift the issue to the client app
    • If not, we need EE and WhereDIV to be 'in step'
    • Polling Districts must be a strict subset of current electoral boundaries (i.e: the votes at a single station won't be split across multiple posts) but not historic. That only helps us in areas where we do hold data though.
    • Sort this out early on
  • Design of data_finder app

[Captain Hindsight]

We decided we wanted to retain the "if everyone with your postcode votes at the same place, we don't show a picker" behaviour at the expense of:

  • providing a centroid to EE
  • using a centroid as the source point for directions
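The retained rule can be sketched as a small decision function. This is an assumption-laden illustration, not the WhereDIV implementation: the function name, the `(uprn, polling_station_id)` tuple shape and the `"station"` / `"picker"` / `"unknown"` labels are all made up for the example.

```python
def route_postcode(addresses):
    """Decide what to show for one postcode.

    addresses: list of (uprn, polling_station_id) tuples; station id may be
    None where we hold no data. All names here are hypothetical.
    """
    station_ids = {station_id for _uprn, station_id in addresses}
    if not station_ids or station_ids == {None}:
        # No addresses, or none of them has a station attached
        return ("unknown", None)
    if len(station_ids) == 1:
        # Everyone in the postcode votes at the same place: skip the picker
        return ("station", next(iter(station_ids)))
    # Split postcode: the user has to choose their address
    return ("picker", sorted(uprn for uprn, _station_id in addresses))
```

In the `"station"` case no specific address is known, which is why a centroid has to stand in for the doorstep grid reference.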

Stuff you don't need to change

  • Feedback app: No changes
  • NUS Wales app: No changes
  • Whitelabel app: No changes

Data Collection App

  • Large chunks of this do need to be rewritten and re-imagined.
  • A lot of code needs to be deleted, rather than anything else

Shouldn't need to make any changes to:

  • s3wrapper.py
  • geo_utils.py
  • filehelpers.py
  • loghelper.py
  • Data collection views (models need a bit of changing) - should we just bin this though?

Everything else needs substantial changes.

Should you:

  • Try to keep the top-level interface surface the same for import classes (i.e: try to maintain the current public method signatures/returns) and just modify the implementation details OR
  • Is this your opportunity to bin it off and start again?
  • Could you just re-write
    • BaseStationsImporter
    • BaseDistrictsImporter (do you even need this?)
    • BaseAddressesImporter and then minimise changes at the next level of abstraction?

Maybe you could do that for BaseStationsImporter, but not BaseAddressesImporter.

  • Need to think about performance at import time as well as query time. Ensure you are not creating a crazy-slow process.
  • Think about tooling for checking importers (reports, logging, etc)
  • There are checks you do now (e.g: checking that the number of districts in the input file is the same as the number in the DB) that won't 'translate' to the new model. You will need to find new ways to check the data/debug import scripts.
  • Fortunately there is a lot of test/sample data to play with.

[Captain Hindsight]

If you make queries like UPDATE address SET polling_station_id='foo' WHERE uprn IN (0001, 0002...) the performance is surprisingly good. Prototype implementation was able to attach station ids to ~7.2 million UPRNs in about 10 mins running 4x scripts in parallel (~100 import scripts).
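A rough sketch of that batched UPDATE pattern follows. It is not the prototype code: it uses the standard-library `sqlite3` module in place of Postgres so it runs anywhere, assumes an `address` table with `uprn` and `polling_station_id` columns, and stores UPRNs as text. The helper names and the chunk size are arbitrary choices for the example.

```python
import sqlite3


def chunked(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def attach_station(conn, station_id, uprns, chunk_size=500):
    """Set polling_station_id for the given UPRNs; return the rows updated.

    One UPDATE ... WHERE uprn IN (...) per chunk, with bound parameters
    rather than string-formatted values.
    """
    updated = 0
    for chunk in chunked(list(uprns), chunk_size):
        placeholders = ",".join("?" for _ in chunk)
        sql = (
            "UPDATE address SET polling_station_id = ? "
            f"WHERE uprn IN ({placeholders})"
        )
        cursor = conn.execute(sql, [station_id, *chunk])
        updated += cursor.rowcount
    conn.commit()
    return updated
```

Comparing the returned count with the number of UPRNs in the input file gives a cheap per-station check that could stand in for some of the district-count checks that no longer apply.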

Data Finder App

  • views.py
    • Try to keep BasePollingStationView fairly consistent
    • PostcodeView and AddressView need major changes (or may be both replaced by UPRNView if everyone sees an address picker)
    • AddressForm - major rewrite
    • MultipleCouncilsView - you'd think this is not needed, but if you say "my address not in list" on a split postcode, it is still relevant.
    • WeDontKnowView - changes needed
  • Helpers:
    • LoggedPostcode --> LoggedUPRN (low priority)
    • Geocoders need to go, but review the use-cases for geocode() and geocode_point_only() again. Review v. thoroughly before thinking about implementation. https://gist.github.com/chris48s/3fc6b354dec4de6ae7d85b029f7ef5d1
    • get_council() - changes needed
    • AddressSorter - probably keep it, but depends on how you are storing AddressBase.
    • EE wrapper will need to reflect changes to EE but until you've made them, leave it as-is.
    • DirectionsHelper - Fine. Leave it as it is.
    • RoutingHelper - heavily dependent on the 'do we show all users an address picker?' question, but definitely needs some changes. If everyone sees an address picker, do we even need this?
    • Directions clients - fine
  • Remember to account for Northern Ireland correctly
  • Need new tests to account for new behaviour

[Captain Hindsight]

Never made it this far

Templates

  • Data Quality list - changes
  • Address Select - text edits, but nothing major
  • Multiple Councils ??
  • Postcode view - likely to need some edits to account for changes to views.
  • Rest is prob. fine

[Captain Hindsight]

Never made it this far