[Feature] Added ML-BOM examples #50
Changes from all commits
91b77aa
d95fc31
44b64c1
cb2791f
64378ba
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,119 @@ | ||
{ | ||
"$schema": "http://cyclonedx.org/schema/bom-1.6.schema.json", | ||
"bomFormat": "CycloneDX", | ||
"specVersion": "1.6", | ||
"serialNumber": "urn:uuid:75de3b9b-9e53-4421-a259-11f18afc22bf", | ||
"version": 1, | ||
"metadata": { | ||
"timestamp": "2024-11-24T13:10:49Z" | ||
}, | ||
"components": [ | ||
{ | ||
"type": "data", | ||
"supplier": { | ||
"name": "Wikimedia" | ||
}, | ||
"manufacturer": { | ||
"name": "Wikimedia" | ||
}, | ||
"publisher": "Hugging Face Inc", | ||
"name": "wikipedia", | ||
"version": "b04c8d1ceb2f5cd4588862100d08de323dccfbaa", | ||
"data": [ | ||
{ | ||
"type": "dataset", | ||
"name": "wikipedia", | ||
"contents": { | ||
"url": "https://huggingface.co/datasets/wikimedia/wikipedia" | ||
} | ||
} | ||
], | ||
"licenses": [ | ||
{ | ||
"license": { | ||
"id": "CC-BY-SA-3.0", | ||
"name": "Creative Commons Attribution Share Alike 3.0", | ||
"url": "https://spdx.org/licenses/CC-BY-SA-3.0.html" | ||
} | ||
}, | ||
{ | ||
"license": { | ||
"id": "GFDL-1.3", | ||
"name": "GNU Free Documentation License family", | ||
"url": "https://www.gnu.org/licenses/fdl-1.3.en.html" | ||
} | ||
} | ||
], | ||
"externalReferences": [ | ||
{ | ||
"type": "website", | ||
"url": "https://huggingface.co/datasets/wikimedia/wikipedia" | ||
} | ||
], | ||
"hashes": [ | ||
{ | ||
"alg": "SHA-1", | ||
"content": "b04c8d1ceb2f5cd4588862100d08de323dccfbaa" | ||
} | ||
], | ||
"properties": [ | ||
{ | ||
"name": "task_categories", | ||
"value": "text-generation" | ||
}, | ||
{ | ||
"name": "task_categories", | ||
"value": "fill-mask" | ||
}, | ||
{ | ||
"name": "task_ids", | ||
"value": "language-modeling" | ||
}, | ||
{ | ||
"name": "task_ids", | ||
"value": "masked-language-modeling" | ||
}, | ||
{ | ||
"name": "language", | ||
"value": "en" | ||
}, | ||
{ | ||
"name": "language", | ||
"value": "es" | ||
}, | ||
{ | ||
"name": "size_categories", | ||
"value": "10M<n<100M" | ||
}, | ||
{ | ||
"name": "format", | ||
"value": "parquet" | ||
}, | ||
{ | ||
"name": "modality", | ||
"value": "text" | ||
}, | ||
{ | ||
"name": "library", | ||
"value": "datasets" | ||
}, | ||
{ | ||
"name": "library", | ||
"value": "dask" | ||
}, | ||
{ | ||
"name": "library", | ||
"value": "mlcroissant" | ||
}, | ||
{ | ||
"name": "library", | ||
"value": "polars" | ||
}, | ||
{ | ||
"name": "region", | ||
"value": "us" | ||
} | ||
] | ||
} | ||
] | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
{ | ||
"$schema": "http://cyclonedx.org/schema/bom-1.6.schema.json", | ||
"bomFormat": "CycloneDX", | ||
"specVersion": "1.6", | ||
"serialNumber": "urn:uuid:56315ffe-c0af-4474-9c11-c94d1af986a9", | ||
"version": 1, | ||
"metadata": { | ||
"timestamp": "2024-11-24T13:05:42Z", | ||
"manufacturer": { | ||
"name": "Noma Security Inc." | ||
} | ||
}, | ||
"components": [ | ||
Do you want to include a This component does not include the Refer to:
{ | ||
"type": "machine-learning-model", | ||
"supplier": { | ||
"name": "OpenAI Inc" | ||
}, | ||
"manufacturer": { | ||
"name": "OpenAI Inc" | ||
}, | ||
"publisher": "OpenAI Inc", | ||
"name": "gpt-4o", | ||
"modelCard": { | ||
"modelParameters": { | ||
"modelArchitecture": "GPT-4", | ||
"inputs": [ | ||
{ | ||
"format": "string" | ||
}, | ||
{ | ||
"format": "image" | ||
} | ||
], | ||
"outputs": [ | ||
{ | ||
"format": "string" | ||
}, | ||
{ | ||
"format": "image" | ||
} | ||
] | ||
} | ||
} | ||
} | ||
] | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,87 @@ | ||
{ | ||
"$schema": "http://cyclonedx.org/schema/bom-1.6.schema.json", | ||
"bomFormat": "CycloneDX", | ||
"specVersion": "1.6", | ||
"serialNumber": "urn:uuid:21d0b6f8-f5b0-44df-8587-79c5d70cd1da", | ||
"version": 1, | ||
"metadata": { | ||
"timestamp": "2024-11-24T13:10:49Z" | ||
}, | ||
"components": [ | ||
{ | ||
"type": "machine-learning-model", | ||
Same issue here. The type does not have a modelCard property. And there should ideally be a data subcomponent.
"supplier": { | ||
"name": "google-bert" | ||
}, | ||
"manufacturer": { | ||
"name": "google-bert" | ||
}, | ||
"publisher": "Hugging Face Inc", | ||
"name": "bert-base-cased", | ||
"version": "cd5ef92a9fb2f889e972770a36d4ed042daf221e", | ||
"licenses": [ | ||
{ | ||
"license": { | ||
"id": "Apache-2.0", | ||
"name": "Apache License 2.0", | ||
"url": "https://www.apache.org/licenses/LICENSE-2.0" | ||
} | ||
} | ||
], | ||
"externalReferences": [ | ||
{ | ||
"type": "website", | ||
"url": "https://huggingface.co/google-bert/bert-base-cased" | ||
} | ||
], | ||
"hashes": [ | ||
{ | ||
"alg": "SHA-1", | ||
"content": "cd5ef92a9fb2f889e972770a36d4ed042daf221e" | ||
} | ||
], | ||
"modelCard": { | ||
"modelParameters": { | ||
"datasets": [ | ||
{ | ||
"type": "dataset", | ||
"name": "legacy-datasets/wikipedia", | ||
"contents": { | ||
"url": "https://huggingface.co/datasets/legacy-datasets/wikipedia" | ||
}, | ||
"description": "Wikipedia dataset containing cleaned articles of all languages." | ||
}, | ||
{ | ||
"type": "dataset", | ||
"name": "bookcorpus/bookcorpus", | ||
"contents": { | ||
"url": "https://huggingface.co/datasets/bookcorpus/bookcorpus" | ||
}, | ||
"description": "A corpus of fine-grained information and high-level semantics text" | ||
} | ||
] | ||
} | ||
}, | ||
"properties": [ | ||
{ | ||
"name": "region", | ||
"value": "us" | ||
} | ||
], | ||
"tags": [ | ||
"transformers", | ||
"pytorch", | ||
"tf", | ||
"jax", | ||
"safetensors", | ||
"bert", | ||
"fill-mask", | ||
"exbert", | ||
"en", | ||
"arxiv:1810.04805", | ||
"autotrain_compatible", | ||
"endpoints_compatible" | ||
] | ||
} | ||
] | ||
} |
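The review comment on this file asks for a data subcomponent. One way that might look is sketched below: the dataset becomes a nested `data` component, and `modelCard.modelParameters.datasets` points at it by reference. This is only a fragment of a single component, not a complete BOM, and the `bom-ref` values (`model-bert-base-cased`, `dataset-wikipedia`) are hypothetical names chosen for illustration. Whether the maintainers prefer nesting over a top-level component is not stated in the thread.

```json
{
  "type": "machine-learning-model",
  "bom-ref": "model-bert-base-cased",
  "name": "bert-base-cased",
  "modelCard": {
    "modelParameters": {
      "datasets": [
        { "ref": "dataset-wikipedia" }
      ]
    }
  },
  "components": [
    {
      "type": "data",
      "bom-ref": "dataset-wikipedia",
      "name": "legacy-datasets/wikipedia",
      "data": [
        {
          "type": "dataset",
          "name": "legacy-datasets/wikipedia",
          "contents": {
            "url": "https://huggingface.co/datasets/legacy-datasets/wikipedia"
          }
        }
      ]
    }
  ]
}
```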
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
# Machine Learning Bill of Materials (ML-BOM) | ||
|
||
Organizations often lack transparency into how their machine learning and AI models are created, used, and managed across their lifecycle. The Machine Learning Bill of Materials (ML-BOM) builds on CycloneDX to offer a detailed representation of machine learning models, datasets, and related artifacts. ML-BOM helps organizations document, manage, and secure their machine learning assets while improving visibility into model lineage and mitigating supply chain risks. | ||
|
||
## Features of ML-BOM | ||
- Captures machine learning models, datasets, libraries, and their interdependencies. | ||
- Documents comprehensive metadata about models, including architecture, training datasets, performance metrics, and ethical considerations. | ||
- Facilitates model lineage tracking, integrating it into the lifecycle management of ML components from design to decommission. | ||
- Enhances transparency by showing how software incorporates ML/AI components and by embedding those components within the broader SBOM framework. | ||
- Highlights critical details about model biases and ethical implications stemming from training datasets, while identifying and classifying the presence of sensitive data in datasets or trained models. | ||
|
||
## Key Components | ||
|
||
### 1. **Machine Learning Models** | ||
ML-BOM can document models and their parameters, including model architecture, performance metrics, and ethical and fairness considerations. | ||
|
||
### 2. **Datasets** | ||
Datasets used for training, validation, and inference can be described with: | ||
- **Data Classification**: Tags to specify sensitivity and value. | ||
- **Data Governance**: Ownership, stewardship, and custodianship details. | ||
- **Sensitive Data**: Annotations for datasets containing sensitive information. | ||
|
||
### 3. **Libraries** | ||
ML-BOM records the ML/AI libraries that models depend on, including their versions, licenses, and security considerations, so that their use stays transparent and traceable. | ||
|
||
|
||
## High-Level Object Model | ||
![CycloneDX Object Model Swimlane](https://cyclonedx.org/theme/assets/images/CycloneDX-Object-Model-Swimlane.svg) |
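As a rough illustration of the three Key Components described in the README above, the fragment below sketches a model, a dataset, and a library in one `components` array, tied together by `dependencies`. It is not taken from the PR. Every name, `bom-ref`, version, and value is a placeholder, and the top-level BOM fields (`bomFormat`, `specVersion`, and so on) are omitted for brevity.

```json
{
  "components": [
    {
      "type": "machine-learning-model",
      "bom-ref": "example-model",
      "name": "example-classifier",
      "modelCard": {
        "modelParameters": {
          "modelArchitecture": "example-architecture",
          "datasets": [
            { "ref": "example-dataset" }
          ]
        },
        "quantitativeAnalysis": {
          "performanceMetrics": [
            { "type": "accuracy", "value": "placeholder", "slice": "validation" }
          ]
        },
        "considerations": {
          "ethicalConsiderations": [
            {
              "name": "Placeholder risk description",
              "mitigationStrategy": "Placeholder mitigation"
            }
          ]
        }
      }
    },
    {
      "type": "data",
      "bom-ref": "example-dataset",
      "name": "example-dataset",
      "data": [
        {
          "type": "dataset",
          "name": "example-dataset",
          "classification": "public",
          "sensitiveData": ["placeholder: none identified"],
          "governance": {
            "owners": [
              { "organization": { "name": "Example Org" } }
            ]
          }
        }
      ]
    },
    {
      "type": "library",
      "bom-ref": "example-library",
      "name": "example-ml-library",
      "version": "0.0.0"
    }
  ],
  "dependencies": [
    {
      "ref": "example-model",
      "dependsOn": ["example-dataset", "example-library"]
    }
  ]
}
```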
This is a data component but does not include a `data` property. Data components should include the `data` property. Without it, the consumer does not know what kind of data this is (e.g. configuration, source code, dataset, etc.). Refer to https://cyclonedx.org/docs/1.6/json/#components_items_data
Added the `data` property. I noticed it also includes a `contents.url` property. Should I put the Hugging Face link here, in `externalReferences.url`, or both?
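For reference, the "both" option from the question above might look like the fragment below. It reuses the values already in the Wikipedia example: `data[].contents.url` says where the dataset contents can be retrieved, and `externalReferences` links to the project page. This is one possible arrangement, not a ruling from the maintainers, and the thread shown here does not record which they prefer.

```json
{
  "type": "data",
  "name": "wikipedia",
  "data": [
    {
      "type": "dataset",
      "name": "wikipedia",
      "contents": {
        "url": "https://huggingface.co/datasets/wikimedia/wikipedia"
      }
    }
  ],
  "externalReferences": [
    {
      "type": "website",
      "url": "https://huggingface.co/datasets/wikimedia/wikipedia"
    }
  ]
}
```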