Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarification: Data Type display on docs serve using mysql adapter #108

Open
clementchong opened this issue Aug 30, 2022 · 6 comments
Open
Labels
bug Something isn't working

Comments

@clementchong
Copy link

Describe the bug

Apologies, we do not know if this is a bug yet. Hence like to check with the rest of users who may be able to help. After docs generate and serve, we see that the data type are displayed with b'xxxx'. Please see screenshot below:

image

We do see the same in the generated target/catalog.json file. Not sure if this is adapter specific or configuration issue but we searched and could not find a solution.

Appreciate some guidance on this. Thank you!

Steps To Reproduce

In as much detail as possible, please provide steps to reproduce the issue. Sample data that triggers the issue, example model code, etc is all very helpful here.

Expected behavior

A clear and concise description of what you expected to happen.

Screenshots and log output

If applicable, add screenshots or log output to help explain your problem.

The output of dbt --version:
dbt-ver

The operating system you're using:

Ubuntu 22.04.1 LTS

The output of python --version:
3.9.13

Additional context

Add any other context about the problem here.

@clementchong clementchong added the bug Something isn't working label Aug 30, 2022
@dbeatty10
Copy link
Owner

@clementchong thanks for opening this issue!

It sounds like target/catalog.json file contains a Python bytes literal rather than a str literal (see here and here for further explanation).

Basically, it has encoded data that needs to be decoded in order to render properly.

Which database and database version are you using (e.g., MySQL 8.0.32)?
Do you know what your "Connection Character Set and Collation System Variable" settings are?

character_set_results is "the character set used for returning query results to the client. This includes result data such as column values, result metadata such as column names, and error messages." I'm wondering if your setting might explain the bytes encoding like b'varchar(8)', etc.

@clementchong
Copy link
Author

clementchong commented Aug 31, 2022

@dbeatty10 Thank you for the quick and detailed overview. Our MySQL version is 8.0.25. The variable values are as below:

SmartSelect_20220831_133825_Gallery

Do you see anything wrong? Or could someone share their versions if not facing such display issue?

@clementchong
Copy link
Author

@clementchong
Copy link
Author

I also found the information_schema.columns table having binary collation for the data_type field. However, I am unable to change it.

image

@moszutij
Copy link

moszutij commented May 6, 2023

I also ran into this issue and it appears to be targeted at MySQL 8.0. The data_type column's collation in information_schema.columns is binary in this version, while it is non-binary in MySQL 5.7 and MariaDB 10.5. When a column is defined with binary collation, the Python MySQL connector returns the column value as a byte.

The collation settings for each database versions are as follows:

DATABASE DATA_TYPE CHARACTER_SET_NAME COLLATION_NAME
mysql:5.7 varchar utf8 utf8_general_ci
mysql:8.0 longtext utf8mb3 utf8mb3_bin
mariadb:10.5 varchar utf8 utf8_general_ci

The different collation settings for each database version can be observed by running the following commands and changing the DOCKER_IMAGE environment variables:

$ DOCKER_IMAGE=mysql:8.0; export DOCKER_IMAGE
$ read -d '' QUERY_TEXT << EOF
select column_name, 
       data_type,
       character_set_name,
       collation_name
  from information_schema.columns 
 where table_schema = 'information_schema' 
   and table_name = 'COLUMNS' 
   and column_name = 'DATA_TYPE';
EOF

$ docker run --name col-data-type-collation-check -e MYSQL_ALLOW_EMPTY_PASSWORD=yes -d ${DOCKER_IMAGE}

# Wait for database to start up then:
$ docker exec col-data-type-collation-check mysql -e "${QUERY_TEXT}"
COLUMN_NAME	DATA_TYPE	CHARACTER_SET_NAME	COLLATION_NAME
DATA_TYPE	longtext	utf8mb3	                utf8mb3_bin

# Tidy up
$ docker stop col-data-type-collation-check && docker rm col-data-type-collation-check && docker image rm ${DOCKER_IMAGE}

A proposed fix is to remove the b character prefix from dtype in the MySQLAdapter parse_show_columns function as follows:

import re
...
MySQLColumn(
    table_database=None,
    table_schema=relation.schema,
    table_name=relation.name,
    table_type=relation.type,
    table_owner=None,
    table_stats=None,
    column=column.column,
    column_index=idx,
    # dtype=column.dtype,
    dtype = re.sub(r"^b'|'$", "", column.dtype)
 )

@clementchong
Copy link
Author

clementchong commented May 7, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants