Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: escaped Unicode in BaseAdapter #105

Merged
merged 1 commit into from
Jan 14, 2025

Conversation

leonardmq
Copy link
Contributor

What does this PR do?

Fix a bug where Unicode in structured input and output is escaped in the BaseAdapter, causing two main issues:

  1. Structured input and output are sent back to the frontend as escaped strings (code reference), which are displayed as-is in the Datasets page.
  2. Structured input and output are persisted in .kiln files as ASCII. While valid JSON, this is not ideal for scenarios like collaboration, committing datasets to GitHub, and searching. Additionally, there is a risk of improperly decoding these strings.

After this fix, Kiln will store structured input and output without escaping Unicode, which is already the behavior for plain text input and output. This change is non-breaking, as json.load and json.loads can handle both Unicode and ASCII-escaped Unicode.

Related Issues

Part of a group of bugs caused by escaped Unicode: #95

Contributor License Agreement

I, @leonardmq, confirm that I have read and agree to the Contributors License Agreement.

Checklists

  • Tests have been run locally and passed
  • New tests have been added to any work in /lib

@leonardmq
Copy link
Contributor Author

Run summary output preview before / after:

image

@leonardmq leonardmq force-pushed the fix/unicode-in-adapter branch from 0c22ff5 to e9fd1d6 Compare January 13, 2025 21:39
@leonardmq
Copy link
Contributor Author

Task before (with escaping):

{
  "v": 1,
  "id": "179446515248",
  "created_at": "2025-01-13T22:36:58.663778",
  "created_by": "leonardmarcq",
  "input": "I like cars.",
  "input_source": {
    "type": "human",
    "properties": {
      "created_by": "leonardmarcq"
    }
  },
  "output": {
    "v": 1,
    "id": "172557301377",
    "path": null,
    "created_at": "2025-01-13T22:36:58.663734",
    "created_by": "leonardmarcq",
    "output": "{\"translation\": \"\\u6211\\u559c\\u6b22\\u6c7d\\u8f66\\u3002\", \"sentiment\": 4}",
    "source": {
      "type": "synthetic",
      "properties": {
        "adapter_name": "kiln_langchain_adapter",
        "model_name": "gpt_4o",
        "model_provider": "openrouter",
        "prompt_builder_name": "simple_prompt_builder"
      }
    },
    "rating": null,
    "model_type": "task_output"
  },
  "repair_instructions": null,
  "repaired_output": null,
  "intermediate_outputs": {},
  "tags": [
    "manual_run"
  ],
  "model_type": "task_run"
}

Task after (without escaping):

{
  "v": 1,
  "id": "305769926316",
  "created_at": "2025-01-13T22:37:19.291991",
  "created_by": "leonardmarcq",
  "input": "I like cars.",
  "input_source": {
    "type": "human",
    "properties": {
      "created_by": "leonardmarcq"
    }
  },
  "output": {
    "v": 1,
    "id": "272336414910",
    "path": null,
    "created_at": "2025-01-13T22:37:19.291943",
    "created_by": "leonardmarcq",
    "output": "{\"translation\": \"我喜欢汽车。\", \"sentiment\": 4}",
    "source": {
      "type": "synthetic",
      "properties": {
        "adapter_name": "kiln_langchain_adapter",
        "model_name": "gpt_4o",
        "model_provider": "openrouter",
        "prompt_builder_name": "simple_prompt_builder"
      }
    },
    "rating": null,
    "model_type": "task_output"
  },
  "repair_instructions": null,
  "repaired_output": null,
  "intermediate_outputs": {},
  "tags": [
    "manual_run"
  ],
  "model_type": "task_run"
}

@leonardmq leonardmq force-pushed the fix/unicode-in-adapter branch from e9fd1d6 to 645e15f Compare January 13, 2025 21:56
@scosman scosman merged commit cedad39 into Kiln-AI:main Jan 14, 2025
2 checks passed
@scosman
Copy link
Collaborator

scosman commented Jan 14, 2025

Thanks @leonardmq !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants