
WIP new JSON for queries #2053

Closed
wants to merge 29 commits into from

Conversation

simonw commented Apr 5, 2023

Refs:

TODO:

  • Read queries JSON
  • Implement error display with "ok": false and an errors key
  • Read queries HTML
  • Read queries other formats (plugins)
  • Canned read queries (dispatched to from table)
  • Write queries (a canned query thing)
  • Implement different shapes, refactoring to share code with table
  • Implement a sensible subset of extras, also refactoring to share code with table
  • Get all tests passing

📚 Documentation preview 📚: https://datasette--2053.org.readthedocs.build/en/2053/

simonw commented Apr 5, 2023

Table errors page currently does this:

{
  "ok": false,
  "error": "no such column: blah",
  "status": 400,
  "title": "Invalid SQL"
}
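The TODO list above mentions moving to `"ok": false` with an `errors` key. The exact shape is not yet settled, so this is only a guessed sketch of a helper that would produce that kind of response (the `errors` list structure is an assumption):

```python
# Hypothetical helper for the "ok": false / "errors" key idea from the
# TODO list; the exact JSON shape is not yet decided, so this is a guess.

def error_response(message, status=400, title=None):
    return {
        "ok": False,
        "errors": [{"error": message, "status": status, "title": title}],
    }
```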

simonw commented May 25, 2023

Uncommitted experimental code:

diff --git a/datasette/views/database.py b/datasette/views/database.py
index 455ebd1f..85775433 100644
--- a/datasette/views/database.py
+++ b/datasette/views/database.py
@@ -909,12 +909,13 @@ async def query_view(
     elif format_ in datasette.renderers.keys():
         # Dispatch request to the correct output format renderer
         # (CSV is not handled here due to streaming)
+        print(data)
         result = call_with_supported_arguments(
             datasette.renderers[format_][0],
             datasette=datasette,
-            columns=columns,
-            rows=rows,
-            sql=sql,
+            columns=data["rows"][0].keys(),
+            rows=data["rows"],
+            sql='',
             query_name=None,
             database=db.name,
             table=None,
@@ -923,7 +924,7 @@ async def query_view(
             # These will be deprecated in Datasette 1.0:
             args=request.args,
             data={
-                "rows": rows,
+                "rows": data["rows"],
             },  # TODO what should this be?
         )
         result = await await_me_maybe(result)
diff --git a/docs/index.rst b/docs/index.rst
index 5a9cc7ed..254ed3da 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -57,6 +57,7 @@ Contents
    settings
    introspection
    custom_templates
+   template_context
    plugins
    writing_plugins
    plugin_hooks

Where docs/template_context.rst looked like this:

.. _template_context:

Template context
================

.. currentmodule:: datasette.context

This page describes the variables made available to templates used by Datasette to render different pages of the application.

.. autoclass:: QueryContext
    :members:

And datasette/context.py had this:

from dataclasses import dataclass

@dataclass
class QueryContext:
    """
    Used by the ``/database`` page when showing the results of a SQL query
    """
    id: int
    "Id is a thing"
    rows: list[dict]
    "Name is another thing"

simonw commented May 26, 2023

Now that I have the new View subclass from #2078 I want to use it to simplify this code.

Challenge: there are several things to consider here:

  • The /db page without ?sql= displays a list of tables in that database
  • With ?sql= it shows the query results for that query (or an error)
  • If it's a /db/name-of-canned-query it works a bit like the query page, but executes a canned query instead of the ?sql= query
  • POST /db/name-of-canned-query provides support for writable canned queries
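The four cases above can be sketched as a small dispatch function. This is purely illustrative of the routing logic, not Datasette's actual View API; the return values are invented labels:

```python
# Illustrative sketch of the /db dispatch logic; the labels returned
# are invented names, not real Datasette handlers.

def dispatch(path_is_canned_query, method, sql):
    if path_is_canned_query:
        # /db/name-of-canned-query: GET renders results, POST runs a write
        return "canned_query_write" if method == "POST" else "canned_query"
    if sql:
        # /db?sql=... shows the query results (or an error)
        return "query_results"
    # Bare /db shows the list of tables in that database
    return "database_index"
```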

simonw commented May 26, 2023

I'm going to entirely split canned queries off from ?sql= queries - they share a bunch of code right now which is just making everything much harder to follow.

I'll refactor their shared bits into functions that they both call.

Or maybe I'll try having CannedQueryView as a subclass of QueryView.

simonw commented May 26, 2023

Or maybe...

  • BaseQueryView(View) - knows how to render the results of a SQL query
  • QueryView(BaseQueryView) - renders from ?sql=
  • CannedQueryView(BaseQueryView) - renders for a named canned query

And then later perhaps:

  • RowQueryView(BaseQueryView) - renders the select * from t where pk = ?
  • TableQueryView(BaseQueryView) - replaces the super complex existing TableView
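A minimal skeleton of that hierarchy might look like this. The method name `sql_for_request` is an assumption for illustration, not part of Datasette's actual View API:

```python
# Illustrative skeleton of the proposed view hierarchy; method names
# are invented, not Datasette's real API.

class BaseQueryView:
    """Knows how to render the results of a SQL query."""

    def sql_for_request(self, request):
        raise NotImplementedError


class QueryView(BaseQueryView):
    """Renders results of the ?sql= parameter."""

    def sql_for_request(self, request):
        return request.get("sql", "")


class CannedQueryView(BaseQueryView):
    """Renders results for a named canned query."""

    def __init__(self, canned_sql):
        self.canned_sql = canned_sql

    def sql_for_request(self, request):
        return self.canned_sql
```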

simonw commented May 26, 2023

I should split out a canned_query.html template too, as something that extends the query.html template.

simonw commented May 26, 2023

I should have an extra called extra_html_context which bundles together all of the weird extra stuff needed by the HTML template. It would then be passed as the root context when the template is rendered, with the other stuff from extras patched into it.
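The mechanics of that would be simple dictionary merging, something like this sketch (function and parameter names are illustrative):

```python
# Sketch of the extra_html_context idea: one extra bundles the HTML-only
# template variables, becomes the root render context, and the other
# requested extras are patched into it. Names are illustrative.

def build_template_context(extra_html_context, other_extras):
    context = dict(extra_html_context)  # root context for the template
    context.update(other_extras)        # patch in the other extras
    return context
```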

simonw and others added 7 commits July 26, 2023 06:51
Bumps [blacken-docs](https://github.com/asottile/blacken-docs) from 1.13.0 to 1.14.0.
- [Changelog](https://github.com/adamchainz/blacken-docs/blob/main/CHANGELOG.rst)
- [Commits](adamchainz/blacken-docs@1.13.0...1.14.0)

---
updated-dependencies:
- dependency-name: blacken-docs
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Also ensure codespell runs as part of just lint
simonw commented Jul 26, 2023

Big chunk of commented-out code I just removed:

    import pdb

    pdb.set_trace()

    if isinstance(output, dict) and output.get("ok") is False:
        # TODO: Other error codes?

        response.status_code = 400

    if datasette.cors:
        add_cors_headers(response.headers)

    return response

    # registry = Registry(
    #     extra_count,
    #     extra_facet_results,
    #     extra_facets_timed_out,
    #     extra_suggested_facets,
    #     facet_instances,
    #     extra_human_description_en,
    #     extra_next_url,
    #     extra_columns,
    #     extra_primary_keys,
    #     run_display_columns_and_rows,
    #     extra_display_columns,
    #     extra_display_rows,
    #     extra_debug,
    #     extra_request,
    #     extra_query,
    #     extra_metadata,
    #     extra_extras,
    #     extra_database,
    #     extra_table,
    #     extra_database_color,
    #     extra_table_actions,
    #     extra_filters,
    #     extra_renderers,
    #     extra_custom_table_templates,
    #     extra_sorted_facet_results,
    #     extra_table_definition,
    #     extra_view_definition,
    #     extra_is_view,
    #     extra_private,
    #     extra_expandable_columns,
    #     extra_form_hidden_args,
    # )

    results = await registry.resolve_multi(
        ["extra_{}".format(extra) for extra in extras]
    )
    data = {
        "ok": True,
        "next": next_value and str(next_value) or None,
    }
    data.update(
        {
            key.replace("extra_", ""): value
            for key, value in results.items()
            if key.startswith("extra_") and key.replace("extra_", "") in extras
        }
    )
    raw_sqlite_rows = rows[:page_size]
    data["rows"] = [dict(r) for r in raw_sqlite_rows]

    private = False
    if canned_query:
        # Respect canned query permissions
        visible, private = await datasette.check_visibility(
            request.actor,
            permissions=[
                ("view-query", (database, canned_query)),
                ("view-database", database),
                "view-instance",
            ],
        )
        if not visible:
            raise Forbidden("You do not have permission to view this query")

    else:
        await datasette.ensure_permissions(request.actor, [("execute-sql", database)])

    # If there's no sql, show the database index page
    if not sql:
        return await database_index_view(request, datasette, db)

    validate_sql_select(sql)

    # Extract any :named parameters
    named_parameters = named_parameters or await derive_named_parameters(db, sql)
    named_parameter_values = {
        named_parameter: params.get(named_parameter) or ""
        for named_parameter in named_parameters
        if not named_parameter.startswith("_")
    }

    # Set to blank string if missing from params
    for named_parameter in named_parameters:
        if named_parameter not in params and not named_parameter.startswith("_"):
            params[named_parameter] = ""

    extra_args = {}
    if params.get("_timelimit"):
        extra_args["custom_time_limit"] = int(params["_timelimit"])
    if _size:
        extra_args["page_size"] = _size

    templates = [f"query-{to_css_class(database)}.html", "query.html"]
    if canned_query:
        templates.insert(
            0,
            f"query-{to_css_class(database)}-{to_css_class(canned_query)}.html",
        )

    query_error = None

    # Execute query - as write or as read
    if write:
        raise NotImplementedError("Write queries not yet implemented")
        # if request.method == "POST":
        #     # If database is immutable, return an error
        #     if not db.is_mutable:
        #         raise Forbidden("Database is immutable")
        #     body = await request.post_body()
        #     body = body.decode("utf-8").strip()
        #     if body.startswith("{") and body.endswith("}"):
        #         params = json.loads(body)
        #         # But we want key=value strings
        #         for key, value in params.items():
        #             params[key] = str(value)
        #     else:
        #         params = dict(parse_qsl(body, keep_blank_values=True))
        #     # Should we return JSON?
        #     should_return_json = (
        #         request.headers.get("accept") == "application/json"
        #         or request.args.get("_json")
        #         or params.get("_json")
        #     )
        #     if canned_query:
        #         params_for_query = MagicParameters(params, request, self.ds)
        #     else:
        #         params_for_query = params
        #     ok = None
        #     try:
        #         cursor = await self.ds.databases[database].execute_write(
        #             sql, params_for_query
        #         )
        #         message = metadata.get(
        #             "on_success_message"
        #         ) or "Query executed, {} row{} affected".format(
        #             cursor.rowcount, "" if cursor.rowcount == 1 else "s"
        #         )
        #         message_type = self.ds.INFO
        #         redirect_url = metadata.get("on_success_redirect")
        #         ok = True
        #     except Exception as e:
        #         message = metadata.get("on_error_message") or str(e)
        #         message_type = self.ds.ERROR
        #         redirect_url = metadata.get("on_error_redirect")
        #         ok = False
        #     if should_return_json:
        #         return Response.json(
        #             {
        #                 "ok": ok,
        #                 "message": message,
        #                 "redirect": redirect_url,
        #             }
        #         )
        #     else:
        #         self.ds.add_message(request, message, message_type)
        #         return self.redirect(request, redirect_url or request.path)
        # else:

        #     async def extra_template():
        #         return {
        #             "request": request,
        #             "db_is_immutable": not db.is_mutable,
        #             "path_with_added_args": path_with_added_args,
        #             "path_with_removed_args": path_with_removed_args,
        #             "named_parameter_values": named_parameter_values,
        #             "canned_query": canned_query,
        #             "success_message": request.args.get("_success") or "",
        #             "canned_write": True,
        #         }

        #     return (
        #         {
        #             "database": database,
        #             "rows": [],
        #             "truncated": False,
        #             "columns": [],
        #             "query": {"sql": sql, "params": params},
        #             "private": private,
        #         },
        #         extra_template,
        #         templates,
        #     )

    # Not a write
    rows = []
    if canned_query:
        params_for_query = MagicParameters(params, request, datasette)
    else:
        params_for_query = params
    try:
        results = await datasette.execute(
            database, sql, params_for_query, truncate=True, **extra_args
        )
        columns = [r[0] for r in results.description]
        rows = list(results.rows)
    except sqlite3.DatabaseError as e:
        query_error = e
        results = None
        columns = []

    allow_execute_sql = await datasette.permission_allowed(
        request.actor, "execute-sql", database
    )

    format_ = request.url_vars.get("format") or "html"

    if format_ == "csv":
        raise NotImplementedError("CSV format not yet implemented")
    elif format_ in datasette.renderers.keys():
        # Dispatch request to the correct output format renderer
        # (CSV is not handled here due to streaming)
        result = call_with_supported_arguments(
            datasette.renderers[format_][0],
            datasette=datasette,
            columns=columns,
            rows=rows,
            sql=sql,
            query_name=None,
            database=db.name,
            table=None,
            request=request,
            view_name="table",  # TODO: should this be "query"?
            # These will be deprecated in Datasette 1.0:
            args=request.args,
            data={
                "rows": rows,
            },  # TODO what should this be?
        )
        result = await await_me_maybe(result)
        if result is None:
            raise NotFound("No data")
        if isinstance(result, dict):
            r = Response(
                body=result.get("body"),
                status=result.get("status_code") or 200,
                content_type=result.get("content_type", "text/plain"),
                headers=result.get("headers"),
            )
        elif isinstance(result, Response):
            r = result
            # if status_code is not None:
            #     # Over-ride the status code
            #     r.status = status_code
        else:
            assert False, f"{result} should be dict or Response"
    elif format_ == "html":
        headers = {}
        templates = [f"query-{to_css_class(database)}.html", "query.html"]
        template = datasette.jinja_env.select_template(templates)
        alternate_url_json = datasette.absolute_url(
            request,
            datasette.urls.path(path_with_format(request=request, format="json")),
        )
        headers.update(
            {
                "Link": '{}; rel="alternate"; type="application/json+datasette"'.format(
                    alternate_url_json
                )
            }
        )
        r = Response.html(
            await datasette.render_template(
                template,
                dict(
                    data,
                    append_querystring=append_querystring,
                    path_with_replaced_args=path_with_replaced_args,
                    fix_path=datasette.urls.path,
                    settings=datasette.settings_dict(),
                    # TODO: review up all of these hacks:
                    alternate_url_json=alternate_url_json,
                    datasette_allow_facet=(
                        "true" if datasette.setting("allow_facet") else "false"
                    ),
                    is_sortable=any(c["sortable"] for c in data["display_columns"]),
                    allow_execute_sql=await datasette.permission_allowed(
                        request.actor, "execute-sql", resolved.db.name
                    ),
                    query_ms=1.2,
                    select_templates=[
                        f"{'*' if template_name == template.name else ''}{template_name}"
                        for template_name in templates
                    ],
                ),
                request=request,
                view_name="table",
            ),
            headers=headers,
        )
    else:
        assert False, "Invalid format: {}".format(format_)
    # if next_url:
    #     r.headers["link"] = f'<{next_url}>; rel="next"'
    return r

simonw commented Jul 26, 2023

I think the hardest part of getting this working is dealing with the different formats.

Idea: refactor .html as a format (since it's by far the most complex) and tweak the plugin hook a bit as part of that, then use what I learn from that to get the other formats working.

simonw commented Jul 26, 2023

Another point of confusion is how /content sometimes serves the database index page (with a list of tables) and sometimes serves the results of a query.

I could resolve this by turning the information on the index page into extras, which can optionally be requested any time a query is run but default to being shown if there is no query.
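That defaulting rule could be as simple as the following sketch (the extra name "tables" and the function signature are assumptions for illustration):

```python
# Sketch of "index page info as extras": extras can be requested
# explicitly, and default on only when there is no query.
# The "tables" extra name is invented for illustration.

def extras_for_request(requested_extras, sql):
    if requested_extras is not None:
        return set(requested_extras)
    # No explicit request: show the index-page extras if there is no query
    return {"tables"} if not sql else set()
```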

simonw commented Jul 26, 2023

Worth noting that the register_output_renderer() hook is actually pretty easy to extend, because it returns a dictionary which could have more keys (like the required set of extras) added to it:

@hookimpl
def register_output_renderer(datasette):
    return {
        "extension": "test",
        "render": render_demo,
        "can_render": can_render_demo,  # Optional
    }

https://docs.datasette.io/en/0.64.3/plugin_hooks.html#register-output-renderer-datasette
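A self-contained sketch of what that extension could look like. Note that "required_extras" is an invented key name, not part of the real hook, and render_demo/can_render_demo are stand-in callables:

```python
# Hedged sketch of extending the register_output_renderer() return
# dictionary with a hypothetical "required_extras" key. The callables
# are stand-ins; "required_extras" is an invented name.

def render_demo(rows):
    return {"body": repr(rows), "content_type": "text/plain"}

def can_render_demo(columns):
    return True

def register_output_renderer():
    return {
        "extension": "test",
        "render": render_demo,
        "can_render": can_render_demo,  # Optional
        "required_extras": ["columns", "rows"],  # hypothetical new key
    }
```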

One slight hiccup with that plugin hook is this:

rows - list of sqlite3.Row objects

I could turn that into a Datasette-defined object that behaves like a sqlite3.Row though, which would give me extra flexibility in the future.

A bit tricky though since it's implemented in C for performance: https://github.com/python/cpython/blob/b0202a4e5d6b629ba5acbc703e950f08ebaf07df/Modules/_sqlite/row.c

Pasted that into Claude for the following explanation:

  • pysqlite_Row is the structure defining the Row object. It contains the tuple of data (self->data) and description of columns (self->description).
  • pysqlite_row_new() is the constructor which creates a new Row object given a cursor and tuple of data.
  • pysqlite_row_dealloc() frees the memory when Row object is deleted.
  • pysqlite_row_keys() returns the column names of the row.
  • pysqlite_row_length() and pysqlite_row_subscript() implement sequence like behavior to access row elements by index.
  • pysqlite_row_subscript() also allows accessing by column name by doing a lookup in description.
  • pysqlite_row_hash() and pysqlite_row_richcompare() implement equality checks and hash function.

I could use protocols in Python to make my own DatasetteRow which can be used interchangeably with sqlite3.Row - https://docs.python.org/3/library/typing.html#typing.Protocol
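A sketch of that Protocol approach, capturing just the sqlite3.Row behaviours the renderer hook relies on. "DatasetteRow" and "RowLike" are invented names for illustration:

```python
# Sketch of a typing.Protocol covering the sqlite3.Row behaviours used
# by the renderer hook, so a Datasette-defined row class could be used
# interchangeably. "DatasetteRow" and "RowLike" are invented names.

from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class RowLike(Protocol):
    def keys(self) -> list: ...
    def __getitem__(self, key: Any) -> Any: ...
    def __len__(self) -> int: ...


class DatasetteRow:
    """Behaves like sqlite3.Row for the subset of behaviour above."""

    def __init__(self, columns, values):
        self._columns = list(columns)
        self._values = list(values)

    def keys(self):
        return list(self._columns)

    def __getitem__(self, key):
        if isinstance(key, int):
            return self._values[key]
        return self._values[self._columns.index(key)]

    def __len__(self):
        return len(self._values)
```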

Turned this into a TIL: https://til.simonwillison.net/python/protocols

simonw commented Jul 26, 2023

I'm abandoning this branch in favour of a fresh attempt, described here:

I'll copy bits and pieces of this branch across as-needed.

simonw closed this Jul 26, 2023