Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Database returns: Duplicate entry '2147483647' for key 'PRIMARY' #6176

Closed
tmartensson opened this issue Sep 26, 2022 · 11 comments
Closed

Database returns: Duplicate entry '2147483647' for key 'PRIMARY' #6176

tmartensson opened this issue Sep 26, 2022 · 11 comments
Assignees
Milestone

Comments

@tmartensson
Copy link

tmartensson commented Sep 26, 2022

Environmental Info:
K3s Version:
k3s version v1.22.12+k3s1 (17b9454)
go version go1.16.10

Node(s) CPU architecture, OS, and Version:
OS: Ubuntu 22.04.1 LTS
CPU: AMD EPYC 7R32
Linux ip-10-21-14-218 5.15.0-1020-aws #24-Ubuntu SMP Thu Sep 1 16:04:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
3 control-plane nodes
~200 agents
Database: AWS 5.7.mysql_aurora.2.10.2
Cluster managed with Rancher and using Rancher Fleet for deployments to the nodes.
Cluster have been up and running for almost 18 months.

Describe the bug:
control-plane nodes went crazy and was hammering the database with almost 200k connections. I have tried debug logs but they did not give me the root cause, using tcpdump I can see that MySQL replies:
#23000Duplicate entry '2147483647' for key 'PRIMARY'

We have reached the maximum allowed number for the id field:

mysql> DESCRIBE kine;
+-----------------+--------------+------+-----+---------+----------------+
| Field           | Type         | Null | Key | Default | Extra          |
+-----------------+--------------+------+-----+---------+----------------+
| id              | int(11)      | NO   | PRI | NULL    | auto_increment |
| name            | varchar(630) | YES  | MUL | NULL    |                |
[...]

Expected behavior:
control-plane nodes should be up and running, but if they encounter this error it should be visible in the logs.

Actual behavior:
control-plane nodes can not start, the error from MySQL was not visible in any logs.

@tmartensson
Copy link
Author

tmartensson commented Sep 26, 2022

Running a count on id shows that we only have 20k active rows:

mysql> select count(id) from kine;
+-----------+
| count(id) |
+-----------+
|     20865 |
+-----------+
1 row in set (0.00 sec)

@tmartensson tmartensson changed the title Database returns" Duplicate entry '2147483647' for key 'PRIMARY'" Database returns: Duplicate entry '2147483647' for key 'PRIMARY' Sep 26, 2022
@tmartensson
Copy link
Author

Since we just had 20k active rows, we dropped the indexes and run a alter table to change to BIGINT and then re-created the indexes.

+-----------------+--------------+------+-----+---------+----------------+
| Field           | Type         | Null | Key | Default | Extra          |
+-----------------+--------------+------+-----+---------+----------------+
| id              | bigint(20)   | NO   | PRI | NULL    | auto_increment |
| name            | varchar(630) | YES  | MUL | NULL    |                |
| created         | bigint(20)   | YES  |     | NULL    |                |
| deleted         | bigint(20)   | YES  |     | NULL    |                |
| create_revision | bigint(20)   | YES  |     | NULL    |                |
| prev_revision   | bigint(20)   | YES  | MUL | NULL    |                |
| lease           | bigint(20)   | YES  |     | NULL    |                |
| value           | mediumblob   | YES  |     | NULL    |                |
| old_value       | mediumblob   | YES  |     | NULL    |                |
+-----------------+--------------+------+-----+---------+----------------+

control-plane nodes starts successfully after this.

@brandond
Copy link
Member

brandond commented Sep 26, 2022

This has been brought up in the past, although I'm not finding the issue at the moment. If we're going to migrate to another datatype, it should probably be something like BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE. I'll have to check what the equivalent might be for other storage engines.

Can I ask how long it took to run the alter table statement? When we've considered doing this in the past, the biggest challenge has been figuring out how to do it automatically for multi-node clusters with a minimum of disruption.

@brandond brandond self-assigned this Sep 26, 2022
@brandond brandond added this to the v1.26.0+k3s1 milestone Sep 26, 2022
@tmartensson
Copy link
Author

tmartensson commented Sep 26, 2022

@brandond Here is the full output from the commands, ALTER TABLE took just 2.2 seconds on a AWS RDS db.r6g.4xlarge instance (see rancher/fleet#499 why we need such a large instance)

Note: we did shut down k3s on all 3 control-plane servers, also ran the k3s-killall command before running the commands below. Verified with show processlist; that no instance was connected to the database.

mysql> DROP INDEX kine_name_index ON kine;
Query OK, 0 rows affected (0.04 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> DROP INDEX kine_name_id_index ON kine;
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> DROP INDEX kine_id_deleted_index ON kine;
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> DROP INDEX kine_prev_revision_index ON kine;
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> DROP INDEX kine_name_prev_revision_uindex ON kine;
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE kine
    ->   MODIFY COLUMN id BIGINT NOT NULL AUTO_INCREMENT,
    ->   MODIFY COLUMN created BIGINT,
    ->   MODIFY COLUMN deleted BIGINT,
    ->   MODIFY COLUMN create_revision BIGINT,
    ->   MODIFY COLUMN prev_revision BIGINT,
    ->   MODIFY COLUMN lease BIGINT;
Query OK, 20865 rows affected (2.22 sec)
Records: 20865  Duplicates: 0  Warnings: 0

mysql> CREATE INDEX kine_name_index ON kine (name);
Query OK, 0 rows affected (0.09 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> CREATE INDEX kine_name_id_index ON kine (name,id);
Query OK, 0 rows affected (0.09 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> CREATE INDEX kine_id_deleted_index ON kine (id,deleted);
Query OK, 0 rows affected (0.06 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> CREATE INDEX kine_prev_revision_index ON kine (prev_revision);
Query OK, 0 rows affected (0.05 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> CREATE UNIQUE INDEX kine_name_prev_revision_uindex ON kine (name, prev_revision);
Query OK, 0 rows affected (0.10 sec)
Records: 0  Duplicates: 0  Warnings: 0

@dnoland1
Copy link
Contributor

dnoland1 commented Feb 1, 2024

Hit this problem, fixed using:

>  ALTER TABLE kine MODIFY COLUMN id BIGINT UNSIGNED AUTO_INCREMENT NOT NULL UNIQUE, MODIFY COLUMN create_revision BIGINT UNSIGNED, MODIFY COLUMN prev_revision BIGINT UNSIGNED;
Query OK, 118253 rows affected (7.728 sec)
Records: 118253  Duplicates: 0  Warnings: 0

@brandond
Copy link
Member

brandond commented Feb 1, 2024

I need to evaluate the impact of running that statement while multiple servers are connected and active. I would love to proactively address this, but I am concerned that it will cause surprising outages during an upgrade of we just throw the switch by default.

@brandond
Copy link
Member

brandond commented Feb 6, 2024

Just for my own notes: It appears that we do not need to make any changes for sqlite; AUTOINCREMENT (rowid columns) only work on INTEGER PRIMARY KEY columns, and the INTEGER type already supports 8 byte ints:

INTEGER. The value is a signed integer, stored in 0, 1, 2, 3, 4, 6, or 8 bytes depending on the magnitude of the value.

So we only need to migrate mysql and postgres over to a larger integer column type.

@brandond
Copy link
Member

brandond commented Feb 6, 2024

WIP: k3s-io/kine#273

The current idea is that new databases will created with the new schema, while old databases are left alone. Users who want to opt-in to manually converting to the new schema can set KINE_SCHEMA_MIGRATION=1 in the k3s server environment. At some point on the first release of a new minor (say v1.30.0+k3s1) we may set var by default for users to force schema changes when upgrading.

@brandond brandond removed this from K3s Backlog Feb 7, 2024
@brandond brandond moved this from New to Working in K3s Development Feb 7, 2024
@brandond brandond removed this from the Backlog milestone Feb 7, 2024
@brandond
Copy link
Member

brandond commented Mar 6, 2024

Bumping out to v1.30

@rancher-max rancher-max moved this from Working to To Test in K3s Development May 7, 2024
@ShylajaDevadiga
Copy link
Contributor

Validated using mysql as db on rke2 version v1.30.0-rc2+rke2r1

Environment Details

Infrastructure
Cloud EC2 instance

Node(s) CPU architecture, OS, and Version:
Ubuntu 22.04

Cluster Configuration:
3 server 1 agent node

Config.yaml:

cat /etc/rancher/rke2/config,yaml
node-external-ip: 3.144.97.253
token: <TOKEN>
write-kubeconfig-mode: 644
debug: true
kine-tls: true
datastore-endpoint: mysql://<MYSQL DB>:3306)/mydb

Steps to validate

  1. Install rke2 using mysql as the databse
  2. Describe the table and check the datatype
  3. Verify cluster comes up and no duplicate key error in logs

Validation results:

Type is seen and biginit unsigned

mysql> describe kine;
+-----------------+-----------------+------+-----+---------+----------------+
| Field           | Type            | Null | Key | Default | Extra          |
+-----------------+-----------------+------+-----+---------+----------------+
| id              | bigint unsigned | NO   | PRI | NULL    | auto_increment |
| name            | varchar(630)    | YES  | MUL | NULL    |                |
| created         | int             | YES  |     | NULL    |                |
| deleted         | int             | YES  |     | NULL    |                |
| create_revision | bigint unsigned | YES  |     | NULL    |                |
| prev_revision   | bigint unsigned | YES  | MUL | NULL    |                |
| lease           | int             | YES  |     | NULL    |                |
| value           | mediumblob      | YES  |     | NULL    |                |
| old_value       | mediumblob      | YES  |     | NULL    |                |
+-----------------+-----------------+------+-----+---------+----------------+
9 rows in set (0.02 sec)

ubuntu@ip-172-31-12-206:~$ rke2 -v
rke2 version v1.30.0-rc1+rke2r1 (ebe1d35dbd701b6dcfa1a08a27b3877d35a00855)
go version go1.22.2 X:boringcrypto
ubuntu@ip-172-31-12-206:~$ kubectl get nodes
NAME               STATUS   ROLES                  AGE     VERSION
ip-172-31-13-230   Ready    control-plane,master   10m     v1.30.0+rke2r1
ip-172-31-3-94     Ready    control-plane,master   9m27s   v1.30.0+rke2r1
ip-172-31-12-20    Ready    <none>                 9m30s   v1.30.0+rke2r1
ip-172-31-12-206   Ready    control-plane,master   13m     v1.30.0+rke2r1
ubuntu@ip-172-31-12-206:~$ kubectl get pods -A
NAMESPACE     NAME                                                    READY   STATUS      RESTARTS   AGE
kube-system   kube-proxy-ip-172-31-12-206                             1/1     Running     0          13m
kube-system   kube-scheduler-ip-172-31-12-206                         1/1     Running     0          13m
kube-system   kube-controller-manager-ip-172-31-12-206                1/1     Running     0          13m
kube-system   cloud-controller-manager-ip-172-31-12-206               1/1     Running     0          13m
kube-system   kube-apiserver-ip-172-31-12-206                         1/1     Running     0          13m
kube-system   helm-install-rke2-coredns-bwqc8                         0/1     Completed   0          13m
kube-system   helm-install-rke2-canal-rzv4h                           0/1     Completed   0          13m
kube-system   rke2-canal-z47f2                                        2/2     Running     0          13m
kube-system   rke2-coredns-rke2-coredns-autoscaler-5749cd7b8b-qlc74   1/1     Running     0          13m
kube-system   helm-install-rke2-snapshot-controller-crd-8p4dm         0/1     Completed   0          13m
kube-system   helm-install-rke2-metrics-server-8bxjz                  0/1     Completed   0          13m
kube-system   helm-install-rke2-snapshot-controller-zqk6w             0/1     Completed   0          13m
kube-system   helm-install-rke2-snapshot-validation-webhook-ffww5     0/1     Completed   0          13m
kube-system   rke2-snapshot-validation-webhook-bf7bbd6fc-4s9pt        1/1     Running     0          12m
kube-system   rke2-snapshot-controller-7dcf5d5b46-5qbvk               1/1     Running     0          12m
kube-system   helm-install-rke2-ingress-nginx-7qq9z                   0/1     Completed   0          13m
kube-system   rke2-metrics-server-868fc8795f-2xfk8                    1/1     Running     0          12m
kube-system   rke2-coredns-rke2-coredns-64dcf4f58b-4wfzp              1/1     Running     0          13m
kube-system   rke2-ingress-nginx-controller-6h6fm                     1/1     Running     0          12m
kube-system   kube-apiserver-ip-172-31-13-230                         1/1     Running     0          10m
kube-system   cloud-controller-manager-ip-172-31-13-230               1/1     Running     0          10m
kube-system   kube-controller-manager-ip-172-31-13-230                1/1     Running     0          10m
kube-system   kube-scheduler-ip-172-31-13-230                         1/1     Running     0          10m
kube-system   kube-proxy-ip-172-31-13-230                             1/1     Running     0          10m
kube-system   rke2-canal-x8kvs                                        2/2     Running     0          10m
kube-system   kube-apiserver-ip-172-31-3-94                           1/1     Running     0          9m29s
kube-system   rke2-ingress-nginx-controller-wwfpx                     1/1     Running     0          9m49s
kube-system   cloud-controller-manager-ip-172-31-3-94                 1/1     Running     0          9m26s
kube-system   kube-controller-manager-ip-172-31-3-94                  1/1     Running     0          9m26s
kube-system   kube-scheduler-ip-172-31-3-94                           1/1     Running     0          9m26s
kube-system   rke2-coredns-rke2-coredns-64dcf4f58b-zntp5              1/1     Running     0          9m59s
kube-system   kube-proxy-ip-172-31-12-20                              1/1     Running     0          9m34s
kube-system   kube-proxy-ip-172-31-3-94                               1/1     Running     0          9m19s
kube-system   rke2-canal-x8xwk                                        2/2     Running     0          9m31s
kube-system   rke2-canal-tbxpt                                        2/2     Running     0          9m34s
kube-system   rke2-ingress-nginx-controller-cv4p6                     1/1     Running     0          9m12s
kube-system   rke2-ingress-nginx-controller-p7bbt                     1/1     Running     0          8m42s
ubuntu@ip-172-31-12-206:~$ 

@mdrahman-suse
Copy link

mdrahman-suse commented May 10, 2024

Validated with RC v1.30.0-rc2+rke2r1 with PostgreSQL 15 as External DB

Environment Details

Infrastructure
Cloud EC2 instance

Node(s) CPU architecture, OS, and Version:
Ubuntu 22.04

Cluster Configuration:
3 server 1 agent node

Config.yaml:

$ cat /etc/rancher/rke2/config,yaml
node-external-ip: <public-ip>
token: <TOKEN>
cni: calico
write-kubeconfig-mode: 644
debug: true
datastore-endpoint: postgres://<POSTGRESQL DB>:5432)/pgdb

Steps to validate

  1. Install rke2 using postgreSQL as the database
  2. Describe the table and check the datatype
  3. Verify cluster comes up and no duplicate key error in logs

Validation results

  • Type is bigint
pgdb=> \d+ kine;
                                                                   Table "public.kine"
     Column      |          Type          | Collation | Nullable |             Default              | Storage  | Compression | Stats target | Description
-----------------+------------------------+-----------+----------+----------------------------------+----------+-------------+--------------+-------------
 id              | bigint                 |           | not null | nextval('kine_id_seq'::regclass) | plain    |             |              |
 name            | character varying(630) |           |          |                                  | extended |             |              |
 created         | integer                |           |          |                                  | plain    |             |              |
 deleted         | integer                |           |          |                                  | plain    |             |              |
 create_revision | bigint                 |           |          |                                  | plain    |             |              |
 prev_revision   | bigint                 |           |          |                                  | plain    |             |              |
 lease           | integer                |           |          |                                  | plain    |             |              |
 value           | bytea                  |           |          |                                  | extended |             |              |
 old_value       | bytea                  |           |          |                                  | extended |             |              |
Indexes:
    "kine_pkey" PRIMARY KEY, btree (id)
    "kine_id_deleted_index" btree (id, deleted)
    "kine_name_id_index" btree (name, id)
    "kine_name_index" btree (name)
    "kine_name_prev_revision_uindex" UNIQUE, btree (name, prev_revision)
    "kine_prev_revision_index" btree (prev_revision)
Access method: heap

$ rke2 -v
rke2 version v1.30.0-rc2+rke2r1 (a78baabd2b8a9846cc7a96633d3b82715bcc4a01)
go version go1.22.2 X:boringcrypto

$ kubectl get nodes
NAME                                         STATUS   ROLES                  AGE    VERSION
ip-172-31-13-96.us-east-2.compute.internal   Ready    control-plane,master   143m   v1.30.0+rke2r1
ip-172-31-14-27.us-east-2.compute.internal   Ready    control-plane,master   143m   v1.30.0+rke2r1
ip-172-31-12-54.us-east-2.compute.internal   Ready    control-plane,master   147m   v1.30.0+rke2r1
ip-172-31-3-142.us-east-2.compute.internal   Ready    <none>                 140m   v1.30.0+rke2r1

$ kubectl get pods -A
NAMESPACE         NAME                                                                  READY   STATUS      RESTARTS   AGE
kube-system       kube-controller-manager-ip-172-31-12-54.us-east-2.compute.internal    1/1     Running     0          147m
kube-system       kube-scheduler-ip-172-31-12-54.us-east-2.compute.internal             1/1     Running     0          147m
kube-system       cloud-controller-manager-ip-172-31-12-54.us-east-2.compute.internal   1/1     Running     0          147m
kube-system       kube-apiserver-ip-172-31-12-54.us-east-2.compute.internal             1/1     Running     0          147m
kube-system       helm-install-rke2-coredns-zzmx4                                       0/1     Completed   0          147m
kube-system       helm-install-rke2-calico-crd-5ppkp                                    0/1     Completed   0          147m
kube-system       helm-install-rke2-calico-s9sts                                        0/1     Completed   1          147m
kube-system       kube-proxy-ip-172-31-12-54.us-east-2.compute.internal                 1/1     Running     0          147m
tigera-operator   tigera-operator-6c58658696-mj4f5                                      1/1     Running     0          147m
calico-system     calico-typha-7787777cb8-zzt6f                                         1/1     Running     0          146m
kube-system       rke2-coredns-rke2-coredns-autoscaler-5749cd7b8b-dkktr                 1/1     Running     0          147m
calico-system     calico-node-ckmbc                                                     1/1     Running     0          146m
calico-system     calico-kube-controllers-688bd57c69-575df                              1/1     Running     0          146m
kube-system       helm-install-rke2-snapshot-controller-crd-scwd2                       0/1     Completed   0          147m
kube-system       helm-install-rke2-metrics-server-swqhh                                0/1     Completed   0          147m
kube-system       helm-install-rke2-snapshot-validation-webhook-k6q6v                   0/1     Completed   0          147m
kube-system       helm-install-rke2-snapshot-controller-8fbp9                           0/1     Completed   0          147m
kube-system       rke2-snapshot-controller-7dcf5d5b46-7wn97                             1/1     Running     0          145m
kube-system       rke2-snapshot-validation-webhook-bf7bbd6fc-hwdph                      1/1     Running     0          145m
kube-system       rke2-coredns-rke2-coredns-64dcf4f58b-kxqfx                            1/1     Running     0          147m
kube-system       rke2-metrics-server-868fc8795f-sb62z                                  1/1     Running     0          145m
kube-system       helm-install-rke2-ingress-nginx-6xnzg                                 0/1     Completed   0          147m
kube-system       rke2-ingress-nginx-controller-rtgq4                                   1/1     Running     0          145m
kube-system       kube-apiserver-ip-172-31-14-27.us-east-2.compute.internal             1/1     Running     0          143m
kube-system       kube-apiserver-ip-172-31-13-96.us-east-2.compute.internal             1/1     Running     0          143m
kube-system       kube-scheduler-ip-172-31-13-96.us-east-2.compute.internal             1/1     Running     0          143m
kube-system       kube-controller-manager-ip-172-31-13-96.us-east-2.compute.internal    1/1     Running     0          143m
kube-system       cloud-controller-manager-ip-172-31-13-96.us-east-2.compute.internal   1/1     Running     0          143m
kube-system       kube-proxy-ip-172-31-14-27.us-east-2.compute.internal                 1/1     Running     0          143m
kube-system       kube-proxy-ip-172-31-13-96.us-east-2.compute.internal                 1/1     Running     0          143m
kube-system       kube-scheduler-ip-172-31-14-27.us-east-2.compute.internal             1/1     Running     0          143m
kube-system       kube-controller-manager-ip-172-31-14-27.us-east-2.compute.internal    1/1     Running     0          143m
kube-system       cloud-controller-manager-ip-172-31-14-27.us-east-2.compute.internal   1/1     Running     0          143m
calico-system     calico-typha-7787777cb8-2tn9w                                         1/1     Running     0          143m
calico-system     calico-node-svpd4                                                     1/1     Running     0          143m
calico-system     calico-node-j9bgj                                                     1/1     Running     0          143m
kube-system       rke2-coredns-rke2-coredns-64dcf4f58b-vzhrn                            1/1     Running     0          143m
kube-system       rke2-ingress-nginx-controller-jww42                                   1/1     Running     0          142m
kube-system       rke2-ingress-nginx-controller-4ggmb                                   1/1     Running     0          142m
kube-system       kube-proxy-ip-172-31-3-142.us-east-2.compute.internal                 1/1     Running     0          141m
calico-system     calico-node-vv4fs                                                     1/1     Running     0          141m
kube-system       rke2-ingress-nginx-controller-h7t5m                                   1/1     Running     0          140m

@github-project-automation github-project-automation bot moved this from To Test to Done Issue in K3s Development May 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

No branches or pull requests

6 participants