slash vs dash (IATI vs PublicBodies); "aggregator conflict" #152

Closed
VladimirAlexiev opened this issue May 20, 2017 · 9 comments

@VladimirAlexiev

The "slash vs underscore" issue (split off from datasets/publicbodies#74) reflects a big difference in philosophy, so I believe it merits a new issue to be opened.

  • IATI has mostly been used in XML exchanges where URLs are not used. @rory09, can you elaborate on why having a slash in the ID would harm the goal of "wanting the identifier to be easily inserted into a url as is, given the variety"?
  • PublicBodies is driven by newer Cool URI trends, in particular "hackability" (removing a suffix of the URL still produces a useful URL). E.g. http://publicbodies.org/eu/dg-infso is one directorate, while http://publicbodies.org/eu is the list of all EU orgs known to PublicBodies.
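
To make "hackability" concrete, here is a minimal Python sketch that lists the URLs obtained by trimming trailing path segments. The function name `hackable_parents` is made up for illustration, and whether each shortened URL actually resolves depends on the site.

```python
from typing import List
from urllib.parse import urlsplit, urlunsplit


def hackable_parents(url: str) -> List[str]:
    """Return the URLs obtained by repeatedly dropping the last path segment."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    parents = []
    # Drop one trailing segment at a time, stopping before the bare site root.
    for end in range(len(segments) - 1, 0, -1):
        path = "/" + "/".join(segments[:end])
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents


print(hackable_parents("http://publicbodies.org/eu/dg-infso"))
```

For the PublicBodies example this yields the single parent http://publicbodies.org/eu, i.e. the list of EU bodies.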

If we register OpenCorporates (who have info on 127M companies) as XI-OC in IATI, we'll have a similar issue:

Since there is no RO prefix in IATI (and maybe the official RO registry is not yet openly available), XI-OC-ro would be a very useful prefix.

This raises a bigger issue (@CountCulture). Consider
http://data.companieshouse.gov.uk/doc/company/07444723, which is the same entity as
https://beta.companieshouse.gov.uk/company/07444723 and also
https://opencorporates.com/companies/gb/07444723.

The GB official register is online and registered in IATI.
So one should prefer GB-COH-07444723 to XI-OC-gb-07444723.
But does this mean when an official register becomes available, we should deprecate "aggregator" identifier schemes or URLs (like OpenCorporates or PublicBodies) in favor of that official register?

@timgdavies

@VladimirAlexiev The org-id.guide methodology includes the idea of a 'primary' register and always prefers these over secondary aggregations.

XI-OC is unlikely to ever be created as a register/organisation identifier list, as it simply republishes information from existing registers - so in all cases where there is an identifier in Open Corporates, it should be possible to cross-reference it to an official register, and publish it as such.

Codes (such as GB-COH) are used rather than URIs so that users can choose which endpoint to resolve an identifier against, and so that identifiers stay robust against changes in URI patterns.

(E.g., faced with GB-COH-07444723, with the meta-data available in org-id.guide the user could choose to resolve against Open Corporates data at https://opencorporates.com/companies/gb/07444723 or Companies House data at http://data.companieshouse.gov.uk/doc/company/07444723)
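
A rough Python sketch of that "choose your endpoint" idea follows. The URL templates are copied from the URLs quoted in this thread; the lookup table, the endpoint keys and the function names are hypothetical and are not an org-id.guide API. It also assumes the list prefix is always the first two dash-separated parts of the identifier.

```python
from typing import Tuple

# Hypothetical lookup table (an assumption for illustration): one list prefix
# mapped to several endpoints that can resolve the same local identifier.
URL_TEMPLATES = {
    "GB-COH": {
        "companies_house": "http://data.companieshouse.gov.uk/doc/company/{id}",
        "companies_house_beta": "https://beta.companieshouse.gov.uk/company/{id}",
        "opencorporates": "https://opencorporates.com/companies/gb/{id}",
    },
}


def split_identifier(org_id: str) -> Tuple[str, str]:
    """Split e.g. 'GB-COH-07444723' into ('GB-COH', '07444723').

    Assumes the prefix is the first two dash-separated parts; everything
    after the second dash is treated as the local identifier.
    """
    jurisdiction, register, local_id = org_id.split("-", 2)
    return f"{jurisdiction}-{register}", local_id


def resolve(org_id: str, endpoint: str) -> str:
    """Build a URL for org_id at the endpoint chosen by the user."""
    prefix, local_id = split_identifier(org_id)
    return URL_TEMPLATES[prefix][endpoint].format(id=local_id)


print(resolve("GB-COH-07444723", "opencorporates"))
print(resolve("GB-COH-07444723", "companies_house"))
```

The identifier stays the same whichever endpoint is picked; only the template changes if a site reorganises its URLs.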

@VladimirAlexiev
Author

Ok, got it. But:

  • The metadata doesn't include URL patterns. As you see above, COH have 2 patterns; and often there may be different prefixes for machine-readable data vs human-readable page.
  • Your description doesn't allow for company URLs to be created and used, since it says "the user could choose". IATI prefixes COULD be used as part of permanent URL schemes, but since they don't give specific guidance on which URL templates to choose, they WON'T be used in permanent URLs.

Given that linked data / semantic web is now the predominant way of doing inter-enterprise data integration, wouldn't it be nice for IATI to think of permanent URLs?

@timgdavies

> The metadata doesn't include URL patterns. As you see above, COH have 2 patterns; and often there may be different prefixes for machine-readable data vs human-readable page.

Happy to look at getting this included. If you can suggest the best way to capture this, we could add it to the schema for org-id.guide meta-data.

> Given that linked data / semantic web is now the predominant way of doing inter-enterprise data integration, wouldn't it be nice for IATI to think of permanent URLs?

@VladimirAlexiev I'm not sure that is true.

Whilst getting the second part of a URI standardised well may be possible - experience suggests that getting agreement on using the same domain - or maintenance of dereferenceable URIs at a particular location - is far from easy, and tends to undermine attempts to use URIs for data integration across distributed publication.

@VladimirAlexiev
Author

Wikidata has 3 such props:

> getting agreement is far from easy

That seems like a bad excuse not to try it. There are many successful examples where this has happened, e.g.:

  • VIAF has 20M entities (15M people) from 30 contributors (20 national libraries)
  • Wikidata has 20M entities (but few companies, maybe 300k), and may soon be recommended for use in schema.org data
  • EN wikipedia & DBpedia have 5M entities, and these are used very widely

If you have several URLs for an entity, owl:sameAs can be used to declare them equivalent.
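
As a small illustration, the Python sketch below prints pairwise owl:sameAs statements in N-Triples syntax for the three company URLs mentioned earlier in this thread. The helper name `same_as_triples` is invented here, and treating each URL as an identifier for the company itself (rather than for a document about it) is a simplifying assumption.

```python
from itertools import combinations
from typing import Iterable, List

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triples(uris: Iterable[str]) -> List[str]:
    """Return N-Triples lines linking every pair of URIs with owl:sameAs."""
    return [f"<{a}> <{OWL_SAME_AS}> <{b}> ." for a, b in combinations(uris, 2)]


company_urls = [
    "http://data.companieshouse.gov.uk/doc/company/07444723",
    "https://beta.companieshouse.gov.uk/company/07444723",
    "https://opencorporates.com/companies/gb/07444723",
]

for line in same_as_triples(company_urls):
    print(line)
```

Since owl:sameAs is symmetric and transitive, a reasoner would not need every pair; emitting all pairs just keeps the example explicit.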

@VladimirAlexiev
Author

> identifier in Open Corporates, it should be possible to cross-reference it to an official register, and publish it as such.

And when the particular national register is not online, still use the OC site in formatterURL? That's a good idea.

@benparkergit

benparkergit commented May 22, 2017 via email

@hayfield
Contributor

hayfield commented Nov 1, 2017

@IATI/bas Is this resolved by the org-id changes?

@VladimirAlexiev
Author

> OpenCorporates only publishes records that are already online in one format or another.

That is not entirely true. Many registers publish data in weird, non-user-friendly and non-web-friendly ways, while @openc makes that data uniformly available. E.g. the BG register hides companies behind MS.NET postbacks and CAPTCHA. Also, there are no company page URLs that include the official ID. There are pages keyed by an ugly GUID (e.g. https://public.brra.bg/CheckUps/Verifications/ActiveCondition.ra?guid=617f4edf8c154f4296efdf146513de21 for EIK=204060254) and even these are behind CAPTCHA.

@openc doesn't yet have the BG register online but hopefully will soon, as part of @euBusinessGraph. The full data is dumped at http://opendata.government.bg/dataset/tbprobckn-pernctbp. We've analyzed it, and a simple version is at http://data.businessgraph.io; e.g. see http://data.businessgraph.io/resource?uri=http://data.businessgraph.io/company/BG/200356710

@amy-silcock
Contributor

IATI follows the guidance provided by org-id.guide. I am closing the issue on the IATI GitHub.

The org-id GitHub is here: https://github.com/org-id/register/issues


5 participants