Testing whether Chinese-to-English translation works well #4
Answering your questions out of order.

First, the question of using an auto-increment ID as the primary key. In distributed databases this is a well-worn topic. Here is what GPT had to say:

Write hotspots: an auto-increment primary key means every newly inserted row gets a higher key value than the rows before it. In a distributed database this concentrates new key values in a single range, so all new inserts land on one particular partition or node. This creates a so-called "write hotspot", which can reduce the system's performance and scalability.

Distributed transaction conflicts: keeping auto-increment primary key values unique across a distributed system requires coordination among multiple nodes, which can lead to unavoidable distributed transaction conflicts and coordination overhead. Synchronizing the auto-increment value can add latency and further performance cost.

Scalability limits: an auto-increment primary key imposes an insertion order on the data, which can lead to uneven physical distribution of the data. That is a drawback for distributed databases that need to scale horizontally: scaling out requires rebalancing and migrating data, and an auto-increment primary key can make that process more complex and less efficient.

There are also many blog posts online about auto-increment primary keys. The general consensus is that they work fine on non-partitioned databases but cause real difficulties on partitioned ones.

Next, the question: "For this scenario, if OceanBase still recommends using them together as a composite primary key, is that because an LSM-Tree handles the page split problem well?"
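The write-hotspot point can be seen in a toy routing model. The sketch below assumes a hypothetical table split into 4 partitions, either by fixed key ranges (boundaries chosen when the table was created) or by `key % 4`; the partition count, boundaries, and function names are illustrative assumptions, not any particular database's scheme.

```python
import bisect
from collections import Counter

NUM_PARTITIONS = 4
# Hypothetical range boundaries fixed when the table was created:
# p0: key < 250_000, p1: < 500_000, p2: < 750_000, p3: everything above.
RANGE_BOUNDS = [250_000, 500_000, 750_000]

def range_partition(key: int) -> int:
    """Index of the range partition that owns this key."""
    return bisect.bisect_right(RANGE_BOUNDS, key)

def hash_partition(key: int) -> int:
    """Hash (here: modulo) partitioning of the same key."""
    return key % NUM_PARTITIONS

def route(keys, partitioner) -> Counter:
    """Count how many of the given keys each partition would receive."""
    return Counter(partitioner(k) for k in keys)

# The next 10,000 auto-increment values after 1,000,000 existing rows.
new_keys = range(1_000_001, 1_010_001)

print("range:", sorted(route(new_keys, range_partition).items()))
print("hash: ", sorted(route(new_keys, hash_partition).items()))
```

With range partitioning every new key lands in the last partition, which is exactly the hotspot described above; modulo partitioning spreads the same keys evenly, at the cost of losing efficient range scans on the key.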
Testing with a shorter piece of Chinese.
Question
I noticed the following description in the official documentation:
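Suppose an e-commerce SaaS has an orders table with a unique constraint on merchant ID (int) and order number (varchar(32)). Should we use these two columns as a composite primary key in this case too? As is well known, MySQL stores data in a B+Tree, and the usual advice is to make the ID auto-increment. That avoids random inserts, which cause frequent data page splits in MySQL and in turn increase IO overhead when retrieving data.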
However, merchant ID + order number is clearly not auto-incrementing either, which is why I am asking.
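The page-split concern can be illustrated with a toy model of a B+Tree leaf level. This sketch models only leaf pages (no internal nodes), assumes a hypothetical capacity of 16 rows per page, and uses a simplified "append at the right edge" rule for sequential inserts; it is an illustration of the general effect, not InnoDB's actual split logic.

```python
import bisect
import random

PAGE_CAPACITY = 16  # hypothetical number of rows per leaf page

def build_leaf_level(keys):
    """Insert keys one at a time into sorted leaf pages; return (pages, split_count)."""
    pages = [[]]
    splits = 0
    for key in keys:
        # Route the key to the last page whose first key is <= key (page 0 if none).
        firsts = [page[0] if page else float("-inf") for page in pages]
        idx = max(bisect.bisect_right(firsts, key) - 1, 0)
        page = pages[idx]
        bisect.insort(page, key)
        if len(page) <= PAGE_CAPACITY:
            continue
        splits += 1
        if idx == len(pages) - 1 and page[-1] == key:
            # Key appended at the right edge: leave this page full, start a new one.
            pages.append([page.pop()])
        else:
            # Ordinary split: move the upper half into a new page.
            mid = len(page) // 2
            pages.insert(idx + 1, page[mid:])
            del page[mid:]
    return pages, splits

def fill_factor(pages):
    """Fraction of leaf-page slots actually holding a row."""
    return sum(len(page) for page in pages) / (len(pages) * PAGE_CAPACITY)

sequential = list(range(1_600))
shuffled = sequential[:]
random.Random(42).shuffle(shuffled)

for name, keys in (("sequential", sequential), ("random", shuffled)):
    pages, splits = build_leaf_level(keys)
    print(f"{name:>10}: {len(pages)} pages, {splits} splits, fill {fill_factor(pages):.0%}")
```

Sequential keys leave every page full, while the same keys in random order trigger mid-page splits that leave pages partly empty, so the same data occupies more pages.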
For this scenario, if OceanBase still recommends using them together as a composite primary key, is that because an LSM-Tree handles the page split problem well?
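To make the question concrete, here is a generic toy LSM-tree: writes go into an in-memory memtable in any order, and each flush sorts the memtable once and writes it out as an immutable sorted run (an "SSTable"). It is a sketch of the general mechanism the question refers to, not OceanBase's storage engine, and it does not establish why OceanBase makes its recommendation. The class and method names are hypothetical; keys are `(merchant_id, order_no)` tuples to mirror the composite key in the question.

```python
import bisect
import random

class TinyLSM:
    """Toy LSM-tree: a mutable memtable plus immutable sorted runs (SSTables)."""

    def __init__(self, memtable_limit: int = 4):
        self.memtable = {}   # recent writes, accepted in any key order
        self.sstables = []   # sorted lists of (key, value); newest run last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # No on-disk page is touched here, so there is nothing to split.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        if self.memtable:
            # Sort once at flush time; the run is then written sequentially.
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):  # newest run wins
            keys = [k for k, _ in run]       # O(n) per lookup; fine for a toy
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return run[i][1]
        return None

    def compact(self):
        """Merge all runs into one sorted run; newer values overwrite older ones."""
        merged = {}
        for run in self.sstables:  # oldest first, so later runs overwrite
            merged.update(run)
        self.sstables = [sorted(merged.items())] if merged else []

lsm = TinyLSM(memtable_limit=4)
keys = [(m, f"ORD{n:04d}") for m in (7, 3, 9) for n in (5, 1, 3)]
random.Random(0).shuffle(keys)  # composite keys arrive in no particular order
for k in keys:
    lsm.put(k, "row")
lsm.flush()
```

Random-order inserts only cost an in-memory update here; the ordering work is deferred to flush and compaction, which rewrite data sequentially instead of splitting pages in place.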
Besides the primary key index, we usually also create several secondary indexes on key business tables.
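In MySQL, the leaf nodes of every secondary index store the primary key value of the corresponding row. So if the primary key is large, each secondary index's leaf nodes need more space as well, because they have to store that larger primary key value.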
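The layout described above can be modeled in a few lines: the clustered index maps the primary key to the full row, and each secondary-index entry is a (secondary key, primary key) pair, so a secondary lookup ends with a second lookup by primary key. In this sketch a dict and a sorted list stand in for the two B+Trees, and the `Order` columns and the index on `buyer_id` are hypothetical.

```python
import bisect
from dataclasses import dataclass

@dataclass(frozen=True)
class Order:
    merchant_id: int
    order_no: str
    status: str
    buyer_id: int

# Clustered index: primary key -> full row (a dict stands in for the B+Tree).
clustered = {}
# Secondary index on buyer_id: sorted (buyer_id, primary_key) entries.
# Every entry carries a full copy of the composite primary key.
idx_buyer = []

def insert(order):
    pk = (order.merchant_id, order.order_no)
    clustered[pk] = order
    bisect.insort(idx_buyer, (order.buyer_id, pk))

def find_by_buyer(buyer_id):
    """Scan the secondary index, then fetch each row from the clustered index by primary key."""
    start = bisect.bisect_left(idx_buyer, (buyer_id,))
    rows = []
    for sec_key, pk in idx_buyer[start:]:
        if sec_key != buyer_id:
            break
        rows.append(clustered[pk])  # second lookup, keyed by the stored primary key
    return rows

insert(Order(7, "ORD0002", "paid", 42))
insert(Order(3, "ORD0001", "new", 42))
insert(Order(7, "ORD0001", "paid", 99))
print([(o.merchant_id, o.order_no) for o in find_by_buyer(42)])
```

The wider the primary key tuple, the wider every entry in `idx_buyer` becomes, which is the space concern raised in the question.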
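If the primary key is a bigint auto-increment ID, the primary key reference stored in each secondary index takes only 8 bytes. If the primary key is a composite of int + varchar(32), does that mean the primary key reference takes at least 4 + 32 bytes? With multiple secondary indexes (typically 2 to 4), wouldn't that extra space be paid again for each one? In that case, wouldn't both the storage space and the cost of maintaining these secondary indexes increase considerably?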
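The 8 versus 4 + 32 byte comparison can be turned into a rough per-table estimate. This sketch uses the question's own per-entry figures and assumes a single-byte character set with full-length 32-character order numbers; it ignores the varchar length prefix, record headers, and page overhead. Note that MySQL's varchar(32) counts characters, so under utf8mb4 the column can take up to 128 bytes, while shorter values take less. The row count and index count are hypothetical.

```python
def pk_ref_bytes(pk_columns):
    """Bytes of primary-key data copied into each secondary-index entry."""
    return sum(pk_columns.values())

def extra_index_bytes(rows, secondary_indexes, pk_columns):
    """Total primary-key bytes stored across all secondary indexes."""
    return rows * secondary_indexes * pk_ref_bytes(pk_columns)

# Assumed per-column sizes (single-byte charset, full-length order numbers).
BIGINT_PK = {"id": 8}
COMPOSITE_PK = {"merchant_id": 4, "order_no": 32}

ROWS = 10_000_000      # hypothetical table size
SECONDARY_INDEXES = 3  # within the 2 to 4 range mentioned above

for name, pk in (("bigint", BIGINT_PK), ("int + varchar(32)", COMPOSITE_PK)):
    total = extra_index_bytes(ROWS, SECONDARY_INDEXES, pk)
    print(f"{name:>18}: {pk_ref_bytes(pk)} B per entry, {total / 2**20:,.0f} MiB across indexes")
```

Under these assumptions the composite key adds 28 bytes per secondary-index entry, so the overhead scales linearly with both row count and the number of secondary indexes.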