From 00450f709585f9eae8618fcd991245359a623128 Mon Sep 17 00:00:00 2001 From: jxue16 <105090474+jxue16@users.noreply.github.com> Date: Tue, 14 Jan 2025 14:51:42 -0800 Subject: [PATCH 1/4] 2024 updates --- assets/amazon.yaml | 39 +++++++++++++++++++++ assets/anthropic.yaml | 35 ++++++++++++++++++ assets/genmo.yaml | 43 +++++++++++++++++++++++ assets/google.yaml | 66 ++++++++++++++++++++++++++++++++++ assets/ibm.yaml | 44 +++++++++++++++++++++++ assets/microsoft.yaml | 35 ++++++++++++++++++ assets/mistral.yaml | 76 ++++++++++++++++++++++++++++++++++++++++ assets/openx.yaml | 40 +++++++++++++++++++++ assets/stability_ai.yaml | 40 +++++++++++++++++++++ assets/unknown.yaml | 40 +++++++++++++++++++++ js/main.js | 2 ++ 11 files changed, 460 insertions(+) create mode 100644 assets/genmo.yaml create mode 100644 assets/unknown.yaml diff --git a/assets/amazon.yaml b/assets/amazon.yaml index d539cd24..6cae7fcf 100644 --- a/assets/amazon.yaml +++ b/assets/amazon.yaml @@ -86,3 +86,42 @@ prohibited_uses: '' monitoring: '' feedback: https://github.com/amazon-science/chronos-forecasting/discussions +- type: model + name: Amazon Nova + organization: Amazon Web Services (AWS) + description: A new generation of state-of-the-art foundation models (FMs) that + deliver frontier intelligence and industry leading price performance, available + exclusively in Amazon Bedrock. You can use Amazon Nova to lower costs and latency + for almost any generative AI task. + created_date: 2024-12-03 + url: https://aws.amazon.com/blogs/aws/introducing-amazon-nova-frontier-intelligence-and-industry-leading-price-performance/ + model_card: unknown + modality: + explanation: Amazon Nova understanding models accept text, image, or video inputs + to generate text output. Amazon creative content generation models accept + text and image inputs to generate image or video output. + value: text, image, video; text, image, video + analysis: Amazon Nova Pro is capable of processing up to 300K input tokens and + sets new standards in multimodal intelligence and agentic workflows that require + calling APIs and tools to complete complex workflows. It achieves state-of-the-art + performance on key benchmarks including visual question answering ( TextVQA + ) and video understanding ( VATEX ). + size: unknown + dependencies: [] + training_emissions: unknown + training_time: unknown + training_hardware: unknown + quality_control: All Amazon Nova models include built-in safety controls and creative + content generation models include watermarking capabilities to promote responsible + AI use. + access: + explanation: available exclusively in Amazon Bedrock + value: limited + license: unknown + intended_uses: You can build on Amazon Nova to analyze complex documents and videos, + understand charts and diagrams, generate engaging video content, and build sophisticated + AI agents, from across a range of intelligence classes optimized for enterprise + workloads. + prohibited_uses: unknown + monitoring: unknown + feedback: unknown diff --git a/assets/anthropic.yaml b/assets/anthropic.yaml index 1fe22119..9400ee16 100644 --- a/assets/anthropic.yaml +++ b/assets/anthropic.yaml @@ -637,3 +637,38 @@ integrated to ensure robustness of evaluations. feedback: Feedback on Claude 3.5 Sonnet can be submitted directly in-product to inform the development roadmap and improve user experience. 
+- type: model
+  name: Claude 3.5 Haiku
+  organization: Anthropic
+  description: Claude 3.5 Haiku is Anthropic's fastest model, delivering advanced
+    coding, tool use, and reasoning capability, surpassing the previous Claude 3
+    Opus in intelligence benchmarks. It is designed for critical use cases where
+    low latency is essential, such as user-facing chatbots and code completions.
+  created_date: 2024-10-22
+  url: https://www.anthropic.com/claude/haiku
+  model_card: unknown
+  modality:
+    explanation: Claude 3.5 Haiku is available...initially as a text-only model
+      and with image input to follow.
+    value: text; text
+  analysis: Claude 3.5 Haiku offers strong performance and speed across a variety
+    of coding, tool use, and reasoning tasks. It has also been tested in extensive
+    safety evaluations and exceeded expectations in reasoning and code generation
+    tasks.
+  size: unknown
+  dependencies: []
+  training_emissions: unknown
+  training_time: unknown
+  training_hardware: unknown
+  quality_control: During Claude 3.5 Haiku’s development, we conducted extensive
+    safety evaluations spanning multiple languages and policy domains.
+  access:
+    explanation: Claude 3.5 Haiku is available across Claude.ai, our first-party
+      API, Amazon Bedrock, and Google Cloud’s Vertex AI.
+    value: limited
+  license: unknown
+  intended_uses: Critical use cases where low latency matters, like user-facing
+    chatbots and code completions.
+  prohibited_uses: unknown
+  monitoring: unknown
+  feedback: unknown
diff --git a/assets/genmo.yaml b/assets/genmo.yaml
new file mode 100644
index 00000000..7425032a
--- /dev/null
+++ b/assets/genmo.yaml
@@ -0,0 +1,43 @@
+---
+- type: model
+  name: Mochi 1
+  organization: Genmo
+  description: Mochi 1 is an open-source video generation model designed to produce
+    high-fidelity motion and strong prompt adherence in generated videos, setting
+    a new standard for open video generation systems.
+  created_date: 2024-10-22
+  url: https://www.genmo.ai/blog
+  model_card: unknown
+  modality:
+    explanation: Mochi 1 generates smooth videos... Measures how accurately generated
+      videos follow the provided textual instructions
+    value: text; video
+  analysis: Mochi 1 sets a new best-in-class standard for open-source video generation.
+    It also performs very competitively with the leading closed models... We benchmark
+    prompt adherence with an automated metric using a vision language model as a
+    judge following the protocol in OpenAI DALL-E 3. We evaluate generated videos
+    using Gemini-1.5-Pro-002.
+  size:
+    explanation: featuring a 10 billion parameter diffusion model
+    value: 10B parameters
+  dependencies: [DDPM, DreamFusion, Emu Video, T5-XXL]
+  training_emissions: unknown
+  training_time: unknown
+  training_hardware: unknown
+  quality_control: robust safety moderation protocols in the playground to ensure
+    that all video generations remain safe and aligned with ethical guidelines.
+  access:
+    explanation: open state-of-the-art video generation model... The weights and
+      architecture for Mochi 1 are open
+    value: open
+  license:
+    explanation: We're releasing the model under a permissive Apache 2.0 license.
+    value: Apache 2.0
+  intended_uses: Advance the field of video generation and explore new methodologies.
+    Build innovative applications in entertainment, advertising, education, and
+    more. Empower artists and creators to bring their visions to life with AI-generated
+    videos. Generate synthetic data for training AI models in robotics, autonomous
+    vehicles and virtual environments.
+ prohibited_uses: unknown + monitoring: unknown + feedback: unknown diff --git a/assets/google.yaml b/assets/google.yaml index 8bc851ae..95af22d3 100644 --- a/assets/google.yaml +++ b/assets/google.yaml @@ -1908,3 +1908,69 @@ monitoring: unknown feedback: Encourages developer feedback to inform model improvements and future updates. +- type: model + name: Veo 2 + organization: Google DeepMind + description: Veo 2 is a state-of-the-art video generation model that creates videos + with realistic motion and high-quality output, up to 4K, with extensive camera + controls. It simulates real-world physics and offers advanced motion capabilities + with enhanced realism and fidelity. + created_date: 2024-12-16 + url: https://deepmind.google/technologies/veo/veo-2/ + model_card: unknown + modality: + explanation: Our state-of-the-art video generation model ... text-to-image model + Veo 2 + value: text; video + analysis: Veo 2 outperforms other leading video generation models, based on human + evaluations of its performance. + size: unknown + dependencies: [] + training_emissions: unknown + training_time: unknown + training_hardware: unknown + quality_control: Veo 2 includes features that enhance realism, fidelity, detail, + and artifact reduction to ensure high-quality output. + access: '' + license: unknown + intended_uses: Creating high-quality videos with realistic motion, different styles, + camera controls, shot styles, angles, and movements. + prohibited_uses: unknown + monitoring: unknown + feedback: unknown + +- type: model + name: Gemini 2.0 + organization: Google DeepMind + description: Google DeepMind introduces Gemini 2.0, a new AI model designed for + the 'agentic era.' + created_date: 2024-12-11 + url: https://blog.google/technology/google-deepmind/google-gemini-ai-update-december-2024/#ceo-message + model_card: unknown + modality: + explanation: The first model built to be natively multimodal, Gemini 1.0 and + 1.5 drove big advances with multimodality and long context to understand information + across text, video, images, audio and code... + value: text, video, images, audio, code; image, audio + analysis: unknown + size: unknown + dependencies: [] + training_emissions: unknown + training_time: unknown + training_hardware: + explanation: It’s built on custom hardware like Trillium, our sixth-generation + TPUs. + value: custom hardware like Trillium, our sixth-generation TPUs + quality_control: Google is committed to building AI responsibly, with safety and + security as key priorities. + access: + explanation: Gemini 2.0 Flash is available to developers and trusted testers, + with wider availability planned for early next year. + value: limited + license: unknown + intended_uses: Develop more agentic models, meaning they can understand more about + the world around you, think multiple steps ahead, and take action on your behalf, + with your supervision. + prohibited_uses: unknown + monitoring: unknown + feedback: unknown diff --git a/assets/ibm.yaml b/assets/ibm.yaml index 260b3a9f..3df96d5b 100644 --- a/assets/ibm.yaml +++ b/assets/ibm.yaml @@ -75,3 +75,47 @@ prohibited_uses: '' monitoring: '' feedback: '' +- type: model + name: IBM Granite 3.0 + organization: IBM + description: IBM Granite 3.0 models deliver state-of-the-art performance relative + to model size while maximizing safety, speed and cost-efficiency for enterprise + use cases. 
+ created_date: 2024-10-21 + url: https://www.ibm.com/new/ibm-granite-3-0-open-state-of-the-art-enterprise-models + model_card: unknown + modality: + explanation: IBM Granite 3.0 8B Instruct model for classic natural language + use cases including text generation, classification, summarization, entity + extraction and customer service chatbots + value: text; text + analysis: Granite 3.0 8B Instruct matches leading similarly-sized open models + on academic benchmarks while outperforming those peers on benchmarks for enterprise + tasks and safety. + size: + explanation: 'Dense, general purpose LLMs: Granite-3.0-8B-Instruct' + value: 8B parameters + dependencies: [Hugging Face’s OpenLLM Leaderboard v2] + training_emissions: unknown + training_time: unknown + training_hardware: unknown + quality_control: The entire Granite family of models are trained on carefully + curated enterprise datasets, filtered for objectionable content with critical + concerns like governance, risk, privacy and bias mitigation in mind + access: + explanation: In keeping with IBM’s strong historical commitment to open source + , all Granite models are released under the permissive Apache 2.0 license + value: open + license: + explanation: In keeping with IBM’s strong historical commitment to open source + , all Granite models are released under the permissive Apache 2.0 license + value: Apache 2.0 + intended_uses: classic natural language use cases including text generation, classification, + summarization, entity extraction and customer service chatbots, programming + language use cases such as code generation, code explanation and code editing, + and for agentic use cases requiring tool calling + prohibited_uses: unknown + monitoring: IBM provides a detailed disclosure of training data sets and methodologies + in the Granite 3.0 technical paper , reaffirming IBM’s dedication to building + transparency, safety and trust in AI products. + feedback: unknown diff --git a/assets/microsoft.yaml b/assets/microsoft.yaml index 137996a4..1c7dc960 100644 --- a/assets/microsoft.yaml +++ b/assets/microsoft.yaml @@ -1032,3 +1032,38 @@ use case, particularly for high risk scenarios. monitoring: Unknown feedback: Unknown +- type: model + name: Phi-4 + organization: Microsoft + description: the latest small language model in Phi family, that offers high quality + results at a small size (14B parameters). + created_date: 2024-12-13 + url: https://techcommunity.microsoft.com/blog/aiplatformblog/introducing-phi-4-microsoft%E2%80%99s-newest-small-language-model-specializing-in-comple/4357090 + model_card: unknown + modality: + explanation: Today we are introducing Phi-4 , our 14B parameter state-of-the-art + small language model (SLM) that excels at complex reasoning in areas such + as math, in addition to conventional language processing. + value: text; text + analysis: Phi-4 outperforms comparable and larger models on math related reasoning. + size: + explanation: a small size (14B parameters). + value: 14B parameters + dependencies: [] + training_emissions: unknown + training_time: unknown + training_hardware: unknown + quality_control: Building AI solutions responsibly is at the core of AI development + at Microsoft. We have made our robust responsible AI capabilities available + to customers building with Phi models. + access: + explanation: Phi-4 is available on Azure AI Foundry and on Hugging Face. 
+ value: open + license: unknown + intended_uses: Specialized in complex reasoning, particularly good at math problems + and high-quality language processing. + prohibited_uses: unknown + monitoring: Azure AI evaluations in AI Foundry enable developers to iteratively + assess the quality and safety of models and applications using built-in and + custom metrics to inform mitigations. + feedback: unknown diff --git a/assets/mistral.yaml b/assets/mistral.yaml index 3c78ac23..04f2b8a7 100644 --- a/assets/mistral.yaml +++ b/assets/mistral.yaml @@ -192,3 +192,79 @@ monitoring: Unknown feedback: Feedback is likely expected to be given through the HuggingFace platform where the model's weights are hosted or directly to the Mistral AI team. +- type: model + name: Pixtral Large + organization: Mistral AI + description: Pixtral Large is the second model in our multimodal family and demonstrates + frontier-level image understanding. Particularly, the model is able to understand + documents, charts and natural images, while maintaining the leading text-only + understanding of Mistral Large 2. + created_date: 2024-11-18 + url: https://mistral.ai/news/pixtral-large/ + model_card: unknown + modality: + explanation: Pixtral Large is the second model in our multimodal family and + demonstrates frontier-level image understanding. + value: text, image; text + analysis: We evaluate Pixtral Large against frontier models on a set of standard + multimodal benchmarks, through a common testing harness. + size: + explanation: Today we announce Pixtral Large, a 124B open-weights multimodal + model. + value: 124B parameters + dependencies: [Mistral Large 2] + training_emissions: unknown + training_time: unknown + training_hardware: unknown + quality_control: unknown + access: + explanation: The model is available under the Mistral Research License (MRL) + for research and educational use; and the Mistral Commercial License for experimentation, + testing, and production for commercial purposes. + value: open + license: + explanation: The model is available under the Mistral Research License (MRL) + for research and educational use; and the Mistral Commercial License for experimentation, + testing, and production for commercial purposes. + value: Mistral Research License (MRL), Mistral Commercial License + intended_uses: RAG and agentic workflows, making it a suitable choice for enterprise + use cases such as knowledge exploration and sharing, semantic understanding + of documents, task automation, and improved customer experiences. + prohibited_uses: unknown + monitoring: unknown + feedback: unknown + +- type: model + name: Codestral 25.01 + organization: Mistral AI + description: Lightweight, fast, and proficient in over 80 programming languages, + Codestral is optimized for low-latency, high-frequency usecases and supports + tasks such as fill-in-the-middle (FIM), code correction and test generation. + created_date: 2025-01-13 + url: https://mistral.ai/news/codestral-2501/ + model_card: unknown + modality: + explanation: it for free in Continue for VS Code or JetBrains + value: text; text + analysis: Benchmarks We have benchmarked the new Codestral with the leading sub-100B + parameter coding models that are widely considered to be best-in-class for FIM + tasks. 
+ size: + explanation: Codestral-2501 256k + value: 256k + dependencies: [] + training_emissions: unknown + training_time: unknown + training_hardware: unknown + quality_control: unknown + access: + explanation: The API is also available on Google Cloud’s Vertex AI, in private + preview on Azure AI Foundry, and coming soon to Amazon Bedrock. + value: closed + license: unknown + intended_uses: Highly capable coding companion, regularly boosting productivity + several times over. + prohibited_uses: unknown + monitoring: unknown + feedback: We can’t wait to hear your experience! Try it now Try it on Continue.dev + with VsCode or JetBrains diff --git a/assets/openx.yaml b/assets/openx.yaml index 25d61646..8123bfab 100644 --- a/assets/openx.yaml +++ b/assets/openx.yaml @@ -77,3 +77,43 @@ prohibited_uses: none monitoring: unknown feedback: none +- type: model + name: GPT-4o + organization: OpenAI + description: GPT-4o is an autoregressive omni model that accepts a combination + of text, audio, image, and video as input and produces any combination of text, + audio, and image outputs. It is trained end-to-end across text, vision, and + audio, focusing on multimodal capabilities. + created_date: 2024-08-08 + url: https://arxiv.org/pdf/2410.21276 + model_card: unknown + modality: + explanation: '...accepts as input any combination of text, audio, image, and + video and generates any combination of text, audio, and image outputs.' + value: text, audio, image, video; text, audio, image + analysis: GPT-4o underwent evaluations that included the Preparedness Framework, + external red teaming, and third-party assessments to ensure safe and aligned + deployment. The evaluations focused on identifying and mitigating potential + risks across its capabilities, especially speech-to-speech functionality. + size: unknown + dependencies: [Shutterstock] + training_emissions: unknown + training_time: unknown + training_hardware: unknown + quality_control: Quality and safety measures included prior risk assessments, + post-training mitigation, moderation tools, advanced data filtering, and external + red teaming efforts with experts to evaluate potential risks like bias, discrimination, + and information harms. + access: + explanation: we are sharing the GPT-4o System Card, which includes our Preparedness + Framework evaluations. + value: closed + license: unknown + intended_uses: Use in multimodal applications requiring understanding and generation + of combinations of text, audio, and image outputs, better performance on non-English + languages, and enhanced vision and audio understanding. + prohibited_uses: Uses that could involve bias, discrimination, harmful content, + or violation of usage policies. + monitoring: Continuous monitoring and enforcement, providing moderation tools + and transparency reports, and gathering feedback from users. + feedback: unknown diff --git a/assets/stability_ai.yaml b/assets/stability_ai.yaml index 3ff8188c..0972b1bb 100644 --- a/assets/stability_ai.yaml +++ b/assets/stability_ai.yaml @@ -113,3 +113,43 @@ monitoring: Unknown feedback: Information on any downstream issues with the model can be reported to Stability AI through their support request system. +- type: model + name: Stable Diffusion 3.5 + organization: Stability AI + description: Stable Diffusion 3.5 reflects our commitment to empower builders + and creators with tools that are widely accessible, cutting-edge, and free for + most use cases. 
+  created_date: 2024-10-22
+  url: https://stability.ai/news/introducing-stable-diffusion-3-5
+  model_card: unknown
+  modality:
+    explanation: Capable of generating a wide range of styles and aesthetics like
+      3D, photography, painting, line art, and virtually any visual style imaginable.
+    value: text; image
+  analysis: Our analysis shows that Stable Diffusion 3.5 Large leads the market
+    in prompt adherence and rivals much larger models in image quality.
+  size:
+    explanation: At 8.1 billion parameters, with superior quality and prompt adherence,
+      this base model is the most powerful in the Stable Diffusion family.
+    value: 8.1B parameters
+  dependencies: []
+  training_emissions: unknown
+  training_time: unknown
+  training_hardware: unknown
+  quality_control: We believe in safe, responsible AI practices and take deliberate
+    measures to ensure integrity starts at the early stages of development.
+  access:
+    explanation: This open release includes multiple model variants, including Stable
+      Diffusion 3.5 Large and Stable Diffusion 3.5 Large Turbo, and as of October
+      29th, Stable Diffusion 3.5 Medium.
+    value: open
+  license:
+    explanation: This open release includes multiple variants that are customizable,
+      run on consumer hardware, and are available for use under the permissive Stability
+      AI Community License.
+    value: Stability AI Community
+  intended_uses: This model is ideal for professional use cases at 1 megapixel resolution.
+  prohibited_uses: unknown
+  monitoring: unknown
+  feedback: We look forward to hearing your feedback on Stable Diffusion 3.5 and
+    seeing what you create with the models.
diff --git a/assets/unknown.yaml b/assets/unknown.yaml
new file mode 100644
index 00000000..a8cddcc9
--- /dev/null
+++ b/assets/unknown.yaml
@@ -0,0 +1,40 @@
+---
+- type: model
+  name: DeepSeek-V3
+  organization: unknown
+  description: DeepSeek-V3 is a Mixture-of-Experts (MoE) language model with 671B
+    total parameters and 37B activated per token. It utilizes Multi-head Latent
+    Attention (MLA) and adopts innovative strategies for improved performance, such
+    as an auxiliary-loss-free load balancing and a multi-token prediction training
+    objective. Comprehensive evaluations show it achieves performance comparable
+    to leading closed-source models.
+  created_date: 2024-12-26
+  url: https://huggingface.co/deepseek-ai/DeepSeek-V3
+  model_card: https://huggingface.co/deepseek-ai/DeepSeek-V3
+  modality: unknown
+  analysis: Comprehensive evaluations reveal that DeepSeek-V3 outperforms other
+    open-source models and achieves performance comparable to leading closed-source
+    models.
+  size:
+    explanation: a strong Mixture-of-Experts (MoE) language model with 671B total
+      parameters with 37B activated for each token.
+    value: 671B parameters (sparse)
+  dependencies: [DeepSeek-R1]
+  training_emissions: unknown
+  training_time:
+    explanation: DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training.
+    value: 2.788M GPU hours
+  training_hardware:
+    explanation: DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training.
+    value: H800 GPUs
+  quality_control: Post-training includes knowledge distillation from the DeepSeek-R1
+    model, incorporating verification and reflection patterns to enhance reasoning
+    performance.
+  access:
+    explanation: producing the currently strongest open-source base model.
+ value: open + license: unknown + intended_uses: unknown + prohibited_uses: unknown + monitoring: unknown + feedback: unknown diff --git a/js/main.js b/js/main.js index 6b7611e5..3809eae1 100644 --- a/js/main.js +++ b/js/main.js @@ -635,9 +635,11 @@ function loadAssetsAndRenderPageContent() { const paths = [ 'assets/adept.yaml', + 'assets/genmo.yaml', 'assets/mila.yaml', 'assets/soochow.yaml', 'assets/baichuan.yaml', + 'assets/unknown.yaml', 'assets/xwin.yaml', 'assets/mistral.yaml', 'assets/adobe.yaml', From 9d3e2bc25bffe3be796e6d1d7af19ccd79cf73b0 Mon Sep 17 00:00:00 2001 From: jxue16 <105090474+jxue16@users.noreply.github.com> Date: Tue, 14 Jan 2025 16:47:35 -0800 Subject: [PATCH 2/4] minor change to veo access --- assets/google.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/assets/google.yaml b/assets/google.yaml index 95af22d3..fbe25014 100644 --- a/assets/google.yaml +++ b/assets/google.yaml @@ -1931,7 +1931,7 @@ training_hardware: unknown quality_control: Veo 2 includes features that enhance realism, fidelity, detail, and artifact reduction to ensure high-quality output. - access: '' + access: limited license: unknown intended_uses: Creating high-quality videos with realistic motion, different styles, camera controls, shot styles, angles, and movements. From 79a027ed566f50c446a1e494ee2ba76297509753 Mon Sep 17 00:00:00 2001 From: jxue16 <105090474+jxue16@users.noreply.github.com> Date: Tue, 14 Jan 2025 18:37:03 -0800 Subject: [PATCH 3/4] assets update --- assets/amazon.yaml | 46 ++++++++++++++++++++++++++++++++++------ assets/anthropic.yaml | 12 ++++++----- assets/cohere.yaml | 22 +++++++++++++++++++ assets/google.yaml | 2 +- assets/ibm.yaml | 4 +--- assets/inflection.yaml | 26 +++++++++++++++++++++++ assets/meta.yaml | 48 ++++++++++++++++++++++++++++++++++++++++++ assets/mistral.yaml | 4 +--- assets/openx.yaml | 38 ++++++++++++++++++++++++++++++++- assets/unknown.yaml | 4 ++-- 10 files changed, 185 insertions(+), 21 deletions(-) diff --git a/assets/amazon.yaml b/assets/amazon.yaml index 6cae7fcf..70aad8fb 100644 --- a/assets/amazon.yaml +++ b/assets/amazon.yaml @@ -87,20 +87,19 @@ monitoring: '' feedback: https://github.com/amazon-science/chronos-forecasting/discussions - type: model - name: Amazon Nova + name: Amazon Nova (Understanding) organization: Amazon Web Services (AWS) description: A new generation of state-of-the-art foundation models (FMs) that deliver frontier intelligence and industry leading price performance, available - exclusively in Amazon Bedrock. You can use Amazon Nova to lower costs and latency - for almost any generative AI task. + exclusively in Amazon Bedrock. Amazon Nova understanding models excel in Retrieval-Augmented + Generation (RAG), function calling, and agentic applications. created_date: 2024-12-03 url: https://aws.amazon.com/blogs/aws/introducing-amazon-nova-frontier-intelligence-and-industry-leading-price-performance/ model_card: unknown modality: explanation: Amazon Nova understanding models accept text, image, or video inputs - to generate text output. Amazon creative content generation models accept - text and image inputs to generate image or video output. - value: text, image, video; text, image, video + to generate text output. + value: text, image, video; text analysis: Amazon Nova Pro is capable of processing up to 300K input tokens and sets new standards in multimodal intelligence and agentic workflows that require calling APIs and tools to complete complex workflows. 
It achieves state-of-the-art @@ -125,3 +124,38 @@ prohibited_uses: unknown monitoring: unknown feedback: unknown +- type: model + name: Amazon Nova (Creative Content Generation) + organization: Amazon Web Services (AWS) + description: A new generation of state-of-the-art foundation models (FMs) that + deliver frontier intelligence and industry leading price performance, available + exclusively in Amazon Bedrock. + created_date: 2024-12-03 + url: https://aws.amazon.com/blogs/aws/introducing-amazon-nova-frontier-intelligence-and-industry-leading-price-performance/ + model_card: unknown + modality: + explanation: Amazon creative content generation models accept text and image + inputs to generate image or video output. + value: text, image;image, video + analysis: Amazon Nova Canvas excels on human evaluations and key benchmarks such + as text-to-image faithfulness evaluation with question answering (TIFA) and + ImageReward. + size: unknown + dependencies: [] + training_emissions: unknown + training_time: unknown + training_hardware: unknown + quality_control: All Amazon Nova models include built-in safety controls and creative + content generation models include watermarking capabilities to promote responsible + AI use. + access: + explanation: available exclusively in Amazon Bedrock + value: limited + license: unknown + intended_uses: You can build on Amazon Nova to analyze complex documents and videos, + understand charts and diagrams, generate engaging video content, and build sophisticated + AI agents, from across a range of intelligence classes optimized for enterprise + workloads. + prohibited_uses: unknown + monitoring: unknown + feedback: unknown diff --git a/assets/anthropic.yaml b/assets/anthropic.yaml index 9400ee16..758ed030 100644 --- a/assets/anthropic.yaml +++ b/assets/anthropic.yaml @@ -608,15 +608,17 @@ speed of its predecessor, Claude 3 Opus, and is designed to tackle tasks like context-sensitive customer support, orchestrating multi-step workflows, interpreting charts and graphs, and transcribing text from images. - created_date: 2024-06-21 - url: https://www.anthropic.com/news/claude-3-5-sonnet + created_date: + explanation: Claude 3.5 Sonnet updated on Oct. 22, initially released on June + 20 of the same year. + value: 2024-10-22 + url: https://www.anthropic.com/news/3-5-models-and-computer-use model_card: unknown modality: text; image, text analysis: The model has been evaluated on a range of tests including graduate-level reasoning (GPQA), undergraduate-level knowledge (MMLU), coding proficiency (HumanEval), - and standard vision benchmarks. In an internal agentic coding evaluation, Claude - 3.5 Sonnet solved 64% of problems, outperforming the previous version, Claude - 3 Opus, which solved 38%. + and standard vision benchmarks. Claude 3.5 Sonnet demonstrates state-of-the-art + performance on most benchmarks. size: Unknown dependencies: [] training_emissions: Unknown diff --git a/assets/cohere.yaml b/assets/cohere.yaml index bc7e0a2f..c5f86d73 100644 --- a/assets/cohere.yaml +++ b/assets/cohere.yaml @@ -592,3 +592,25 @@ prohibited_uses: unknown monitoring: unknown feedback: https://huggingface.co/CohereForAI/aya-23-35B/discussions +- type: model + name: Command R+ + organization: Cohere + description: Command R+ is a state-of-the-art RAG-optimized model designed to + tackle enterprise-grade workloads, and is available first on Microsoft Azure. 
+ created_date: 2024-04-04 + url: https://cohere.com/blog/command-r-plus-microsoft-azure + model_card: unknown + modality: unknown + analysis: unknown + size: unknown + dependencies: [] + training_emissions: unknown + training_time: unknown + training_hardware: unknown + quality_control: unknown + access: '' + license: unknown + intended_uses: unknown + prohibited_uses: unknown + monitoring: unknown + feedback: unknown diff --git a/assets/google.yaml b/assets/google.yaml index fbe25014..94d73f70 100644 --- a/assets/google.yaml +++ b/assets/google.yaml @@ -1951,7 +1951,7 @@ explanation: The first model built to be natively multimodal, Gemini 1.0 and 1.5 drove big advances with multimodality and long context to understand information across text, video, images, audio and code... - value: text, video, images, audio, code; image, audio + value: text, video, image, audio; image, text analysis: unknown size: unknown dependencies: [] diff --git a/assets/ibm.yaml b/assets/ibm.yaml index 3df96d5b..368680ae 100644 --- a/assets/ibm.yaml +++ b/assets/ibm.yaml @@ -115,7 +115,5 @@ language use cases such as code generation, code explanation and code editing, and for agentic use cases requiring tool calling prohibited_uses: unknown - monitoring: IBM provides a detailed disclosure of training data sets and methodologies - in the Granite 3.0 technical paper , reaffirming IBM’s dedication to building - transparency, safety and trust in AI products. + monitoring: '' feedback: unknown diff --git a/assets/inflection.yaml b/assets/inflection.yaml index 4f82b1f9..d18db5ce 100644 --- a/assets/inflection.yaml +++ b/assets/inflection.yaml @@ -93,3 +93,29 @@ prohibited_uses: '' monitoring: '' feedback: none +- type: model + name: Inflection 3.0 + organization: Inflection AI + description: Inflection for Enterprise, powered by our industry-first, enterprise-grade + AI system, Inflection 3.0. + created_date: 2024-10-07 + url: https://inflection.ai/blog/enterprise + model_card: unknown + modality: unknown + analysis: unknown + size: unknown + dependencies: [] + training_emissions: unknown + training_time: unknown + training_hardware: unknown + quality_control: unknown + access: + explanation: Developers can now access Inflection AI’s Large Language Model + through its new commercial API. + value: open + license: unknown + intended_uses: unknown + prohibited_uses: unknown + monitoring: unknown + feedback: So please drop us a line. We want to keep hearing from enterprises about + how we can help solve their challenges and make AI a reality for their business. diff --git a/assets/meta.yaml b/assets/meta.yaml index 0b7653ec..8b337691 100644 --- a/assets/meta.yaml +++ b/assets/meta.yaml @@ -891,3 +891,51 @@ prohibited_uses: Unknown monitoring: Unknown feedback: Unknown +- type: model + name: Llama 3.3 + organization: Meta + description: The Meta Llama 3.3 multilingual large language model (LLM) is an + instruction tuned generative model in 70B (text in/text out). + created_date: 2024-12-06 + url: https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct + model_card: https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct + modality: + explanation: The Llama 3.3 instruction tuned text only model is optimized for + multilingual dialogue use cases. + value: text; text + analysis: Unknown + size: + explanation: The Meta Llama 3.3 multilingual large language model (LLM) is an + instruction tuned generative model in 70B (text in/text out). 
+ value: 70B parameters + dependencies: [] + training_emissions: + explanation: Training Greenhouse Gas Emissions Estimated total location-based + greenhouse gas emissions were 11,390 tons CO2eq for training. + value: 11,390 tons CO2eq + training_time: + explanation: Training utilized a cumulative of 39.3M GPU hours of computation + on H100-80GB (TDP of 700W) type hardware. + value: 39.3M GPU hours + training_hardware: + explanation: Training utilized a cumulative of 39.3M GPU hours of computation + on H100-80GB (TDP of 700W) type hardware. + value: H100-80GB (TDP of 700W) type hardware + quality_control: Used "supervised fine-tuning (SFT) and reinforcement learning + with human feedback (RLHF) to align with human preferences for helpfulness and + safety." + access: + explanation: Future versions of the tuned models will be released as we improve + model safety with community feedback. + value: open + license: + explanation: A custom commercial license, the Llama 3.3 Community License Agreement + value: Llama 3.3 Community License Agreement + intended_uses: Intended for commercial and research use in multiple languages. + Instruction tuned text only models are intended for assistant-like chat. + prohibited_uses: Use in any manner that violates applicable laws or regulations + (including trade compliance laws). Use in any other way that is prohibited by + the Acceptable Use Policy and Llama 3.3 Community License. + monitoring: Unknown + feedback: Instructions on how to provide feedback or comments on the model can + be found in the model README. diff --git a/assets/mistral.yaml b/assets/mistral.yaml index 04f2b8a7..7fe7ce5a 100644 --- a/assets/mistral.yaml +++ b/assets/mistral.yaml @@ -249,9 +249,7 @@ analysis: Benchmarks We have benchmarked the new Codestral with the leading sub-100B parameter coding models that are widely considered to be best-in-class for FIM tasks. - size: - explanation: Codestral-2501 256k - value: 256k + size: unknown dependencies: [] training_emissions: unknown training_time: unknown diff --git a/assets/openx.yaml b/assets/openx.yaml index 8123bfab..afaf3ab6 100644 --- a/assets/openx.yaml +++ b/assets/openx.yaml @@ -107,7 +107,7 @@ access: explanation: we are sharing the GPT-4o System Card, which includes our Preparedness Framework evaluations. - value: closed + value: limited license: unknown intended_uses: Use in multimodal applications requiring understanding and generation of combinations of text, audio, and image outputs, better performance on non-English @@ -117,3 +117,39 @@ monitoring: Continuous monitoring and enforcement, providing moderation tools and transparency reports, and gathering feedback from users. feedback: unknown + +- type: model + name: OpenAI o1 + organization: OpenAI + description: OpenAI o1 is a new series of AI models designed to spend more time + thinking before they respond. They can reason through complex tasks and solve + harder problems than previous models in science, coding, and math. + created_date: 2024-09-12 + url: https://openai.com/o1/ + model_card: unknown + modality: text; text + analysis: Evaluated on challenging benchmark tasks in physics, chemistry, and + biology. In a qualifying exam for the International Mathematics Olympiad (IMO), + GPT-4o correctly solved only 13% of problems, while the reasoning model o1 scored + 83%. 
+  size: unknown
+  dependencies: []
+  training_emissions: unknown
+  training_time: unknown
+  training_hardware: unknown
+  quality_control: To match the new capabilities of these models, OpenAI has bolstered
+    safety work, internal governance, and federal government collaboration. This
+    includes rigorous testing and evaluations using their Preparedness Framework,
+    best-in-class red teaming, and board-level review processes, including by
+    OpenAI's Safety & Security Committee.
+  access: limited
+  license: unknown
+  intended_uses: These enhanced reasoning capabilities may be particularly useful
+    if you’re tackling complex problems in science, coding, math, and similar fields.
+    For example, o1 can be used by healthcare researchers to annotate cell sequencing
+    data, by physicists to generate complicated mathematical formulas needed for
+    quantum optics, and by developers in all fields to build and execute multi-step
+    workflows.
+  prohibited_uses: ''
+  monitoring: ''
+  feedback: unknown
diff --git a/assets/unknown.yaml b/assets/unknown.yaml
index a8cddcc9..c2f323c3 100644
--- a/assets/unknown.yaml
+++ b/assets/unknown.yaml
@@ -1,7 +1,7 @@
 ---
 - type: model
   name: DeepSeek-V3
-  organization: unknown
+  organization: DeepSeek
   description: DeepSeek-V3 is a Mixture-of-Experts (MoE) language model with 671B
     total parameters and 37B activated per token. It utilizes Multi-head Latent
     Attention (MLA) and adopts innovative strategies for improved performance, such
@@ -33,7 +33,7 @@ access:
     explanation: producing the currently strongest open-source base model.
     value: open
-  license: unknown
+  license: MIT
   intended_uses: unknown
   prohibited_uses: unknown
   monitoring: unknown
   feedback: unknown

From ffa7c88edf990b87b8bd32261c87920221a7cac7 Mon Sep 17 00:00:00 2001
From: jxue16 <105090474+jxue16@users.noreply.github.com>
Date: Thu, 16 Jan 2025 16:30:13 -0800
Subject: [PATCH 4/4] add o3

---
 assets/openx.yaml | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/assets/openx.yaml b/assets/openx.yaml
index afaf3ab6..e67348bd 100644
--- a/assets/openx.yaml
+++ b/assets/openx.yaml
@@ -119,7 +119,7 @@
   feedback: unknown
 
 - type: model
-  name: OpenAI o1
+  name: o1
   organization: OpenAI
   description: OpenAI o1 is a new series of AI models designed to spend more time
     thinking before they respond. They can reason through complex tasks and solve
@@ -153,3 +153,27 @@
   prohibited_uses: ''
   monitoring: ''
   feedback: unknown
+
+- type: model
+  name: o3
+  organization: OpenAI
+  description: OpenAI o3 is, as of release, the latest model in OpenAI's o-series
+    of reasoning models.
+  created_date: 2024-12-20
+  url: https://x.com/OpenAI/status/1870186518230511844
+  model_card: unknown
+  modality: text; text
+  analysis: Makes significant progress on the ARC-AGI evaluation framework compared
+    to all existing models.
+  size: unknown
+  dependencies: []
+  training_emissions: unknown
+  training_time: unknown
+  training_hardware: unknown
+  quality_control: ''
+  access: limited
+  license: unknown
+  intended_uses: ''
+  prohibited_uses: ''
+  monitoring: ''
+  feedback: unknown