Discussion about statistics/ data-driven evaluation of mapping activities within Colouring Cities platforms #1115
-
@traveller195 this is great. @mdsimpson42 it would be very good if we could co-work with IOER on integrating a dashboard into the core code, so you can see local and international data collected for the categories identified above. I also think we should move this to a new feature issue. Eventually it would also be good to collect information on the percentage of datasets verified by alternative data capture methods (e.g. volume computational generation using inference, verified through manual crowdsourcing from expert sources).
-
I would consider showing the count of people who have made their first edit instead, at least as a second property. In general it is typical that the vast majority of people sign up and never edit, and it gets worse once you get a steady stream of spam accounts (which are often fully automated and in many cases not even spamming). Such a display is vastly less impressive but far more useful. The raw account count is usually much higher, but stops being impressive once you look into it. I suspect that the CCRP may be less affected by this, but it takes just a single spam wave to end up with data that does not actually reflect anything useful.
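For illustration, a minimal sketch of that distinction (accounts that have made at least one edit versus all registered accounts), using node-postgres; the table and column names (users, logs.user_id) are assumptions and may differ from the actual Colouring Cities schema:

```typescript
// Sketch: accounts with at least one edit vs. all registered accounts.
// Assumed tables/columns: users (one row per account), logs.user_id (one row per edit).
import { Pool } from 'pg';

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

export async function accountCounts() {
  const { rows } = await pool.query(`
    SELECT
      (SELECT count(*) FROM users)               AS registered_accounts,
      (SELECT count(DISTINCT user_id) FROM logs) AS accounts_with_edits
  `);
  return rows[0]; // e.g. { registered_accounts: '1200', accounts_with_edits: '310' }
}
```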
-
It is surprisingly easy to make major mistakes. For example, when @polly64 asked me to provide edit counts made by human accounts, I initially provided a value ten times larger than the real one, because some bot accounts were misclassified (I caught that within minutes by cross-checking with other data - but sometimes such mistakes go uncaught for a long time, see spinach as a source of bio-available iron). I have tried to do some data analysis for other projects, and it is surprisingly easy to end up with a conclusion that looks supported by data but actually is not, or is highly misleading. It gets worse when users start caring about the statistics and game them, or modify their behaviour in ways that make them less productive but more visible on the leaderboard. I am not saying that it is pointless to try, but it is (as with anything interesting) far more complicated than expected, especially if you want to get some knowledge and insight and not just Large Numbers For Marketing Purposes.
-
https://colouring.london/leaderboard.html already provides some data
-
It may also be useful to publish as much of the editing data as possible. Sometimes people in the community will create their own analyses and statistics. (I hope it is OK to post this as four separate parts, as they touch on quite different aspects.)
-
@traveller195 @matkoniecz @mattnkm @aldenc A key feature of the CCRP is that it is a research-led initiative that brings together academics involved in building stock research and software engineers to co-develop platforms, to analyse data across countries, and to test data capture/enrichment/verification methods. The idea is to create feedback loops between researchers analysing the data across countries and the CCRP international teams co-working on interface and code. At present we have over 45 researchers and engineers working on the project, yet in each country and department teams range from 1-3 - Dresden currently being the largest funded team and Colombia the largest in help-in-kind, as many engineering students are involved.

A key problem is that it is very hard to capture the huge amount of valuable help-in-kind being given by academia, despite this being very important for CCRP partners in securing research funding. Collective specialist international help-in-kind for the CCRP from academic sources alone will massively outweigh the small investment requested from research funders for national CCRP teams, as the model also looks at low-cost distributed maintenance at national scale (see the UK and Australia academic network work model testing).

It would be interesting to discuss the above issue in the context of the dashboard and the planned 'Showcase section' #644, which we wish to implement to allow case studies to be easily uploaded and searched for using a standard template (which requires a link to a spatial data visualisation - still/animation/simulation - to be added). We need to somehow separate out and cost the time that is being fed back into the project design through CCRP teams and through the wider CCRP academic networks we are developing, as distinct from the specialist crowdsourcing from other stakeholders. The point is to demonstrate the value of these networks, not only in terms of showing the amount of funding flowing in this way, but also to help departments collaborate more easily, to reduce academic competition that may inhibit the flow of open knowledge and data, and to drive feedback loops that improve core Colouring code.

Anyway, these are just thoughts, but the model can be applied to any area of academic research. I can see it is incredibly hard to quantify this, but it is worth making a start, as there will be huge benefits for the CCRP model if we can crack it.
-
@polly64 @tomalrussell @matkoniecz @popcorndoublefeature @blaumiau So, we extended the Colouring Cities API with a new endpoint /statistics and some first analyses for dashboard usage. You can find the existing endpoints at the following links/URIs: The SQL (PostgreSQL/PostGIS) queries behind them can be found here:

At the moment we have an internal (not public) dashboard, external to the platform and built with Grafana Cloud, to visualize these data. For the future we plan to add further analyses, such as mapped attributes broken down by city borough, showing the last xx edits or the edits of the last xx minutes/hours in a map within the dashboard, or distinguishing the analysis into newly mapped attributes, modified attributes and removed attributes...

1.) Is there currently a proper mechanism in the backend of the API to handle performance and requests within a certain period, to protect the API and the server and to avoid too much traffic from a large number of API requests and related trouble?

3.) We are thinking about integrating a React- and TypeScript-compatible package/library to allow dynamic and interactive charts/plots or maps for use in the frontend rather than in an external dashboard. Do you have a recommended library which would fit our needs? I ask because it could help resolve some Node.js/npm dependency questions, and it would be best to use the same library across the whole CCRP (if you agree to include such functionality in the platform...).

all the best, Theodor
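On 1.), a minimal sketch of what request throttling for the statistics routes could look like, assuming the backend is the Express app used in the core code; express-rate-limit is only one common option, and the window/limit values below are placeholders rather than recommendations:

```typescript
// Sketch of request throttling for the statistics routes, assuming an Express backend.
// express-rate-limit is one common option; the window and limit here are placeholders.
import express from 'express';
import rateLimit from 'express-rate-limit';

const app = express();

const statisticsLimiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15-minute window per client IP
  max: 100,                 // max requests per window before 429 responses
  standardHeaders: true,    // send RateLimit-* headers
  legacyHeaders: false,     // disable legacy X-RateLimit-* headers
});

// Apply only to the statistics routes so normal editing traffic is unaffected
app.use('/api/statistics', statisticsLimiter);
```

On 3.), for what it is worth, Recharts and react-chartjs-2 are widely used React/TypeScript charting libraries; which one fits best would need to be checked against the existing frontend dependencies.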
-
@tomalrussell could we arrange a technical meeting with @traveller195, @matkoniecz and @mdsimpson42 to talk about this, and about the data storage model and archiving process? @traveller195 would you like to open a new issue with your comments and diagram, so that everyone can comment there? Is this also linked to the question of whether we introduce email verification - which, would you agree, we need to do?
-
Hello everybody,
@matkoniecz @tomalrussell @mfbenitezp @polly64 @mdsimpson42 and all others :-)
for our citizen science project Colouring Dresden, evaluation is an important part of the project as well. On the one hand, we run online surveys to collect feedback and also socio-demographic data about our participants. On the other hand, we also plan to analyse the mapping activities in a quantitative and data-driven way. For this, the PostGIS database is a perfect basis for analysis (e.g. the tables 'buildings', 'users' or 'logs').
In the end, we could generate several plots about various possible aspects of those data. And maybe we will also provide those plots in dashboards with a real-time data flow.
For a better comparison between Colouring Cities projects - and also to learn from your experiences - I/we want to ask all of you: what experience have you had with finding suitable indicators/metrics, or with your chosen technology for dashboards etc.?
As a first proposal, here are some basic and very simple indicators we could imagine visualizing first (a query sketch follows the list):
- number of new accounts per day (bar plot)
- number of new edits per day (bar plot)
- number of unique active accounts per day (which made edits on this day) (bar plot)
- number of page views per day (all page views, including those without edits) --> not in the database, but in the logfiles of the gateway/proxy
- percentage of buildings with collected values (not null) for each building feature, maybe ordered descending (e.g. building age is collected for 4.5% of all buildings, is_domestic for 2.3%, etc.)
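A minimal sketch of how two of these indicators could be computed, using node-postgres against the project database; the table and column names (logs.log_timestamp, logs.user_id, buildings) are assumptions based on this discussion and may differ from the real Colouring Cities schema:

```typescript
// Sketch of two of the indicators above, using node-postgres ('pg').
// Table/column names (logs.log_timestamp, logs.user_id, buildings) are assumptions.
import { Pool } from 'pg';

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Edits per day and unique active accounts per day (input for the bar plots)
export async function editsPerDay() {
  const { rows } = await pool.query(`
    SELECT date_trunc('day', log_timestamp) AS day,
           count(*)                         AS edits,
           count(DISTINCT user_id)          AS active_accounts
    FROM logs
    GROUP BY day
    ORDER BY day
  `);
  return rows;
}

// Percentage of buildings with a non-null value for one attribute column.
// The column name must be validated against a whitelist before being interpolated.
export async function attributeCompleteness(column: string) {
  const { rows } = await pool.query(`
    SELECT round(100.0 * count(${column}) / count(*), 1) AS percent_collected
    FROM buildings
  `);
  return rows[0].percent_collected;
}
```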
Currently, I see this as an extra little front-end/dashboard which has access either directly to the database or to the API (the latter would be better, but could require new GET endpoints to access the proper data for the plots...). So it could run under an extra URL. But of course, an integration into the main Colouring Cities front-end would also be possible...
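As an illustration of such a new GET endpoint, here is a sketch assuming an Express router (as in the core code) and node-postgres; the route path and column names are assumptions, not the real Colouring Cities API:

```typescript
// Sketch of one possible GET endpoint serving plot data, assuming an Express router
// and node-postgres; the route path and the logs column names are assumptions.
import express from 'express';
import { Pool } from 'pg';

const router = express.Router();
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// GET /api/statistics/edits-per-day -> { data: [{ day, edits }, ...] }
router.get('/api/statistics/edits-per-day', async (_req, res) => {
  try {
    const { rows } = await pool.query(`
      SELECT date_trunc('day', log_timestamp) AS day, count(*) AS edits
      FROM logs
      GROUP BY day
      ORDER BY day
    `);
    res.json({ data: rows });
  } catch (err) {
    res.status(500).json({ error: 'could not load statistics' });
  }
});

export default router;
```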
What are your experiences and opinions about the field of data-driven, intrinsic evaluation of our projects?
(Because in Dresden over the next six months we plan different activities such as mapathons, presentations, social events or events with kids... it is also interesting to evaluate, in a quantitative way, which kind of activity is most suitable for mapping building features...)
thanks in advance
best
Theodor from Dresden, Germany