We need to show several things with the SCER on the road to being published as a specification and endorsed by the GSF.
It has consensus with the Standards Working Group.
It has consensus with the Steering Committee.
To put it simply, this means that any concerns raised must be resolved; we can't ignore them.
To get adopted as an ISO standard we need to show:
There are multiple GSF member organizations contributing to it (in terms of PRs, but this could also take the form of issues/discussions on the repos).
We need to engineer interactions as much as possible so we can show evidence of organizations being involved, for instance by forcing communications to happen through GitHub issues. More generally, it's something we need to think about and plan for now: this won't get into ISO if just two people wrote the whole spec.
To get adopted by policy makers and governments and regulators we need to show:
It's a Royalty Free Specification, which means there are no IP/patent claims in the specification, or, if there are, we have an agreement from the party holding the IP/patent not to charge royalty fees.
We can ensure this for the members that engage, since it's part of their member agreement.
When external parties contribute or give feedback, they must all sign a form waiving any IP claims resulting from their feedback, and we must clearly document that whole interaction.
We can keep internal conversations informal, but we need to formalize the feedback process now that this project is being discussed externally.
Challenges
There have been several meetings with the Standards Working Group, and the feedback indicates this is of great interest, but there isn't consensus on:
Categorization
Benchmarking
Ratings
Interestingly, the area where there seems to be more consensus (at least no one has vocally raised any concerns so far) is the labeling component (minus ratings, of course).
There has also been feedback that this proposed specification is too large to reasonably expect all our organizations to examine and reach consensus on. We'll need to break things down into smaller pieces; otherwise we risk the default answer being to object, since it's safer to say no to something you don't understand.
Proposed Roadmap
It's good to start off with a baseline that everyone agrees with and iterate from there. I propose that we formally split this project into several milestones: make the first milestone just the concepts that seem to have consensus (labeling), and move the areas that need much more work to reach consensus into future milestones. Then we can seriously work on getting consensus for the first milestone and have a path to ISO, while still working on future versions with more functionality.
Milestone 1: Disclosure
Scope: Strip out everything else and just work on a labeling system (minus ratings and categorization). Essentially a mechanism of disclosure, much like the food ingredient labeling system: it's not a statement about how healthy the food is, just a disclosure of what's in it.
Milestone 2: Categorization
Scope: Get consensus on the proposed categorization mechanism. Rating is a function of categorization: if we can't agree on how to categorize, there won't be agreement on how to perform ratings.
Milestone 3: Benchmarking
Scope: Get consensus on how to perform benchmarking, or on whether benchmarking belongs in an underlying specification (like SCI). NOTE: This might actually make more sense as a separate OSS project.
Milestone 4: Ratings
Scope: Get consensus on a ratings mechanism. This is going to be the hard one!
Thank you @jawache for the very meaningful suggestion! All agreed! We need more contributors! In terms of work, maybe start with the definitions of the four steps: categorization, benchmarking, rating, and labeling. Initial attempts to define them are already in the base spec, with concrete examples explaining the definitions. The same thought process is applied in SCER for LLMs, for instance. In software engineering terms, the base spec (i.e. SCER) is like a base class, and SCER for LLMs, for example, is an implementation of that base class. My point is that while the abstractions are being worked on, examples/implementations should be provided to back them up. This helps people understand the spec and improves readability.
Using SCER for LLMs as an example, in the context of carbon efficiency, categorization is primarily based on the LLM's size and type, and the spec uses Hugging Face as an example to illustrate the point.
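The "base class / implementation" analogy above can be sketched in code. This is purely illustrative: the class and method names, the size threshold, and the rating rule are my own assumptions, not anything defined by the SCER spec.

```python
from abc import ABC, abstractmethod

class SCER(ABC):
    """Hypothetical sketch of the base spec: it defines the four steps
    abstractly, and the generic labeling flow that ties them together."""

    @abstractmethod
    def categorize(self, model: dict) -> str: ...

    @abstractmethod
    def benchmark(self, model: dict) -> float: ...

    @abstractmethod
    def rate(self, score: float, category: str) -> str: ...

    def label(self, model: dict) -> dict:
        # The labeling flow is shared by every implementation of the base spec.
        category = self.categorize(model)
        score = self.benchmark(model)
        return {"category": category, "score": score,
                "rating": self.rate(score, category)}

class SCERForLLMs(SCER):
    """Hypothetical implementation of the base class for LLMs."""

    def categorize(self, model: dict) -> str:
        # Categorize primarily by model size, as the comment above suggests.
        # The 10B-parameter cutoff is an arbitrary illustrative choice.
        return "large" if model["params_b"] >= 10 else "small"

    def benchmark(self, model: dict) -> float:
        # Assume a pre-measured carbon-intensity figure for the sketch.
        return model["gco2e_per_1k_tokens"]

    def rate(self, score: float, category: str) -> str:
        # Rating is relative to the category: each category gets its own bar.
        threshold = 5.0 if category == "large" else 1.0
        return "A" if score <= threshold else "B"

# Usage: the abstract flow lives in the base class, the LLM-specific
# decisions live in the implementation.
label = SCERForLLMs().label({"params_b": 70, "gco2e_per_1k_tokens": 3.2})
```

Note how the `rate` step depends on the output of `categorize`, which mirrors the point made elsewhere in this thread that rating is a function of categorization.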
The whole spec is following this thought process. Therefore, I would encourage contributors to use this as a reference and go through the rest of the document and see how to make it through to GSF endorsement and an ISO standard.
In order for SCER to get GSF internal consensus, @jawache is suggesting splitting the SCER spec into four mini specs, starting with a SCER-Labeling spec, because labeling (for transparency) is perceived to be the least contentious topic. The visuals/labeling section of the current SCER for LLMs spec includes information for:
category
rating (relative value)
gCO2e numbers (absolute values)
QR code that explains how these ratings and numbers come about.
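The four label fields above could be sketched as a small data structure. The field names and example values here are my own illustrative assumptions, not anything fixed by the spec.

```python
from dataclasses import dataclass

@dataclass
class SCERLabel:
    """Hypothetical container for the four items on the label."""
    category: str         # 1. category, e.g. the model's size/type class
    rating: str           # 2. rating (a relative value within the category)
    gco2e: float          # 3. gCO2e number (an absolute value)
    methodology_url: str  # 4. what the QR code resolves to

# Illustrative example; the URL is a placeholder.
example = SCERLabel(
    category="large language model",
    rating="B",
    gco2e=3.2,
    methodology_url="https://example.org/scer/methodology",
)
```

Under the suggestion discussed below, a first "disclosure-only" milestone would keep only fields 3 and 4 and defer `category` and `rating` to later milestones.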
Can or should the SCER specification be divided into separate, independent specifications?
Should they be maintained as a cohesive, organic whole?
For example, when it comes to AI models, if categorization information is missing from the label, people might ask, "What type of AI model does this label refer to?" Is it fair to compare the carbon emissions of large language models with those of small language models? Wouldn't that be comparing apples to oranges?
It looks like @jawache's suggestion is that the label only include the information for items 3 and 4 above.