Discussion about statistics/ data-driven evaluation of mapping activities within Colouring Cities platforms #1115
-
@traveller195 this is great. @mdsimpson42 it would be very good if we could co-work with IOER on integrating a dashboard into the core code, so you can see local and international data collected for the categories identified above. I also think we should move this to a new feature issue. Eventually it would also be good to collect information on the percentage of datasets verified by alternative data capture methods (e.g. volume computational generation using inference, verified through manual crowdsourcing from expert sources).
-
I would consider showing the count of people who have made their first edit instead, at least as a second property. In general it is typical that the vast majority of people sign up and never edit, and it gets worse once you get a steady stream of spam accounts (which are often fully automated and in many cases not even spamming). Such a display is vastly less impressive but far more useful. The raw account count is usually much higher, but stops being impressive once you look into it. I suspect that the CCRP may be less affected by this, but it takes just a single spam wave to end up with data that does not actually reflect anything useful.
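For illustration, a minimal sketch of that distinction (accounts that have made at least one edit versus all registered accounts), using node-postgres; the table and column names (users, logs.user_id) are assumptions and may differ from the actual Colouring Cities schema:

```typescript
// Sketch: accounts with at least one edit vs. all registered accounts.
// Assumed tables/columns: users (one row per account), logs.user_id (one row per edit).
import { Pool } from 'pg';

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

export async function accountCounts() {
  const { rows } = await pool.query(`
    SELECT
      (SELECT count(*) FROM users)               AS registered_accounts,
      (SELECT count(DISTINCT user_id) FROM logs) AS accounts_with_edits
  `);
  return rows[0]; // e.g. { registered_accounts: '1200', accounts_with_edits: '310' }
}
```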
-
It is surprisingly easy to make major mistakes. For example, when @polly64 asked me to provide edit counts made by human accounts, I initially provided a value ten times larger than the real one, because some bot accounts were misclassified (I caught that within minutes by cross-checking with other data - but sometimes such mistakes go uncaught for a long time, see spinach as a source of bio-available iron). I have tried to do some data analysis for other projects, and it is surprisingly easy to end up with a conclusion that looks supported by data but actually is not, or is highly misleading. It gets worse when users start caring about the statistics and game them, or modify their behaviour in ways that make them less productive but more visible on the leaderboard. I am not saying that it is pointless to try, but it is (as with anything interesting) far more complicated than expected, especially if you want to get some knowledge and insight and not just Large Numbers For Marketing Purposes.
-
https://colouring.london/leaderboard.html already provides some data
-
It may also be useful to publish as much of the editing data as possible. Sometimes people in the community will create their own analyses and statistics. (I hope it is OK to post this as four separate parts, as they touch on quite different aspects.)
-
@traveller195 @matkoniecz @mattnkm @aldenc A key feature of the CCRP is that it is a research-led initiative that brings together academics involved in building stock research and software engineers to co-develop platforms, to analyse data across countries, and to test data capture/enrichment/verification methods. The idea is to create feedback loops between researchers analysing the data across countries and the CCRP international teams co-working on interface and code. At present we have over 45 researchers and engineers working on the project, yet in each country and department teams range from 1-3 - Dresden currently being the largest funded team and Colombia the largest in help-in-kind, as many engineering students are involved.

A key problem is that it is very hard to capture the huge amount of valuable help-in-kind being given by academia, despite this being very important for CCRP partners in securing research funding. Collective specialist international help-in-kind for the CCRP from academic sources alone will massively outweigh the small investment requested from research funders for national CCRP teams, as the model also looks at low-cost distributed maintenance at national scale (see the UK and Australia academic network work model testing).

It would be interesting to discuss the above issue in the context of the dashboard and the planned 'Showcase section' #644, which we wish to implement to allow case studies to be easily uploaded and searched for using a standard template (which requires a link to a spatial data visualisation - still/animation/simulation - to be added). We need to somehow separate out and cost the time that is being fed back into the project design through CCRP teams and through the wider CCRP academic networks we are developing, as distinct from the specialist crowdsourcing from other stakeholders. The point is to demonstrate the value of these networks, not only in terms of showing the amount of funding flowing in this way, but also to help departments collaborate more easily, to reduce academic competition that may inhibit the flow of open knowledge and data, and to drive feedback loops that improve core Colouring code.

Anyway, these are just thoughts, but the model can be applied to any area of academic research. I can see it is incredibly hard to quantify this, but it is worth making a start, as there will be huge benefits for the CCRP model if we can crack it.
-
@polly64 @tomalrussell @matkoniecz @popcorndoublefeature @blaumiau So, we extended the Colouring Cities API with a new endpoint /statistics and some first analyses for dashboard usage. You can find the existing endpoints at the following links/URIs: The SQL (PostgreSQL/PostGIS) queries behind them can be found here:

At the moment we have an internal (not public) dashboard, external to the platform and built with Grafana Cloud, to visualize these data. For the future we plan to add further analyses, such as mapped attributes broken down by city borough, showing the last xx edits or the edits of the last xx minutes/hours in a map within the dashboard, or distinguishing the analysis into newly mapped attributes, modified attributes and removed attributes...

1.) Is there currently a proper mechanism in the backend of the API to handle performance and requests within a certain period, to protect the API and the server and to avoid too much traffic from a large number of API requests and related trouble?

3.) We are thinking about integrating a React- and TypeScript-compatible package/library to allow dynamic and interactive charts/plots or maps for use in the frontend rather than in an external dashboard. Do you have a recommended library which would fit our needs? I ask because it could help resolve some Node.js/npm dependency questions, and it would be best to use the same library across the whole CCRP (if you agree to include such functionality in the platform...).

all the best, Theodor
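On 1.), a minimal sketch of what request throttling for the statistics routes could look like, assuming the backend is the Express app used in the core code; express-rate-limit is only one common option, and the window/limit values below are placeholders rather than recommendations:

```typescript
// Sketch of request throttling for the statistics routes, assuming an Express backend.
// express-rate-limit is one common option; the window and limit here are placeholders.
import express from 'express';
import rateLimit from 'express-rate-limit';

const app = express();

const statisticsLimiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15-minute window per client IP
  max: 100,                 // max requests per window before 429 responses
  standardHeaders: true,    // send RateLimit-* headers
  legacyHeaders: false,     // disable legacy X-RateLimit-* headers
});

// Apply only to the statistics routes so normal editing traffic is unaffected
app.use('/api/statistics', statisticsLimiter);
```

On 3.), for what it is worth, Recharts and react-chartjs-2 are widely used React/TypeScript charting libraries; which one fits best would need to be checked against the existing frontend dependencies.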
-
@tomalrussell could we arrange a technical meeting with @traveller195, @matkoniecz and @mdsimpson42 to talk about this, and about the data storage model and archiving process? @traveller195 would you like to open a new issue with your comments and diagram, so that everyone can comment there? Is this also linked to the question of whether we introduce email verification - which, would you agree, we need to do?
-
Hello everybody,
@matkoniecz @tomalrussell @mfbenitezp @polly64 @mdsimpson42 and all others :-)
for our citizen science project Colouring Dresden, evaluation is an important part of the project as well. On the one hand, we run online surveys to collect feedback and also socio-demographic data about our participants. On the other hand, we also plan to analyse the mapping activities in a quantitative and data-driven way. For this, the PostGIS database is a perfect basis for analysis (e.g. the tables 'buildings', 'users' or 'logs').
In the end, we could generate several plots about various possible aspects of those data. And maybe we will also provide those plots in dashboards with a real-time data flow.
For a better comparison between Colouring Cities projects - and also to learn from your experiences - I/we want to ask all of you: what experience have you had with finding suitable indicators/metrics, or with your chosen technology for dashboards etc.?
As a first proposal, here are some basic and very simple indicators we could imagine visualizing first (a query sketch follows the list):
- number of new accounts per day (bar plot)
- number of new edits per day (bar plot)
- number of unique active accounts per day (which made edits on this day) (bar plot)
- number of page views per day (all page views, including those without edits) --> not in the database, but in the logfiles of the gateway/proxy
- percentage of buildings with collected values (not null) for each building feature, maybe ordered descending (e.g. building age is collected for 4.5% of all buildings, is_domestic for 2.3%, etc.)
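A minimal sketch of how two of these indicators could be computed, using node-postgres against the project database; the table and column names (logs.log_timestamp, logs.user_id, buildings) are assumptions based on this discussion and may differ from the real Colouring Cities schema:

```typescript
// Sketch of two of the indicators above, using node-postgres ('pg').
// Table/column names (logs.log_timestamp, logs.user_id, buildings) are assumptions.
import { Pool } from 'pg';

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Edits per day and unique active accounts per day (input for the bar plots)
export async function editsPerDay() {
  const { rows } = await pool.query(`
    SELECT date_trunc('day', log_timestamp) AS day,
           count(*)                         AS edits,
           count(DISTINCT user_id)          AS active_accounts
    FROM logs
    GROUP BY day
    ORDER BY day
  `);
  return rows;
}

// Percentage of buildings with a non-null value for one attribute column.
// The column name must be validated against a whitelist before being interpolated.
export async function attributeCompleteness(column: string) {
  const { rows } = await pool.query(`
    SELECT round(100.0 * count(${column}) / count(*), 1) AS percent_collected
    FROM buildings
  `);
  return rows[0].percent_collected;
}
```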
Currently, I see this as an extra little front-end/dashboard which has access either directly to the database or to the API (the latter would be better, but could require new GET endpoints to access the proper data for the plots...). So it could run under an extra URL. But of course, an integration into the main Colouring Cities front-end would also be possible...
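As an illustration of such a new GET endpoint, here is a sketch assuming an Express router (as in the core code) and node-postgres; the route path and column names are assumptions, not the real Colouring Cities API:

```typescript
// Sketch of one possible GET endpoint serving plot data, assuming an Express router
// and node-postgres; the route path and the logs column names are assumptions.
import express from 'express';
import { Pool } from 'pg';

const router = express.Router();
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// GET /api/statistics/edits-per-day -> { data: [{ day, edits }, ...] }
router.get('/api/statistics/edits-per-day', async (_req, res) => {
  try {
    const { rows } = await pool.query(`
      SELECT date_trunc('day', log_timestamp) AS day, count(*) AS edits
      FROM logs
      GROUP BY day
      ORDER BY day
    `);
    res.json({ data: rows });
  } catch (err) {
    res.status(500).json({ error: 'could not load statistics' });
  }
});

export default router;
```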
What are your experiences and opinions about the field of data-driven, intrinsic evaluation of our projects?
(Because in Dresden over the next six months we plan different activities such as mapathons, presentations, social events or events with kids... it is also interesting to evaluate, in a quantitative way, which kind of activity is most suitable for mapping building features...)
thanks in advance
best
Theodor from Dresden, Germany