Skip to content

Quirks specific to some Database systems

José Fidalgo edited this page May 22, 2024 · 13 revisions

This page describes Database system choices that are specific to their implementation of the standard SQL and that differ from all other supported systems.

In case you use one of the following Database systems, make sure you take the respective information into account so as to avoid possible problems.

OracleDB

Oracle Database currently treats a character value with a length of zero as null. However, this may not continue to be true in future releases, and Oracle recommends that you do not treat empty strings the same as nulls.

  • BOOLEAN is not supported — PDB is using CHAR with value 1 for TRUE and 0 for FALSE.

  • JDBC statements that include string literals with length > 2000 characters will fail with error "string literal too long"; as a workaround, prepared statements should be used instead (they don't fail with these long strings)
    Note: though not experimented, it is reported that splitting the long string into smaller chunks may work
    INSERT INTO table (clob_column) VALUES ( to_clob( 'chunk 1' ) || to_clob( 'chunk 2' ) );

  • LOBs >= 4000 bytes are stored off row. This is detrimental to the performance when accessing those columns because the scan of rows is no longer sequential (and databases are optimized for sequential accesses). See Oracle LOB Performance Guide for more details.

  • By default, recent driver versions will check if the timezone of the client system has a match among the timezones known to the server, otherwise the connection will fail with the error ORA-01882: timezone region not found.
    There are several workarounds for this:

    • revert to old driver behavior by creating a file inside the classpath, in the following path oracle/jdbc/defaultConnectionProperties.properties; inside this file, add the property oracle.jdbc.timezoneAsRegion=false;
    • check the available timezones in the server with SELECT DISTINCT tzname FROM V$TIMEZONE_NAMES, then set a timezone in the client system that matches one of the timezones available in the server (e.g. if "Etc/UTC" is the current timezone in the client and it is unavailable in the server, but the server contains "UTC", use the latter on the client system instead);
    • instead of setting the timezone for the whole client system, that information can be passed with the JVM flag -Duser.timezone=<timezone>.
  • Query cancellation /timeouts not supported with the thin jdbc driver.

MySQL

  • No support for IEEE 754-1985 non-numbers: positive and negative infinities and NaN (not-a-number)

  • A communication packet (e.g. a query) has a relatively low default value of 4MB (this is increased to 64MB in MySQL >= 8.0.3);
    this value must be increased if large BLOB columns or long strings are used in the queries (should be at least as big as the largest BLOB/string that is expected to appear). See MySQL documentation for "Packet too large" error and max_allowed_packet flag.
    If you are using InnoDB engine, the innodb-log-file-size flag must be also increased from the default value of 5MB (this is increased to 48MB in MySQL >= 5.6.8). See MySQL documentation for Redo Log File Configuration.

  • Since MySQL’s UTF-8 implementation only supports up to 3 bytes (the standard allows up to 4), some Unicode characters may be wrongly encoded when using "utf8" encoding; if you need to use real UTF-8, then the correct MySQL encoding option to use is "utf8mb4" (this is already the default in MySQL v8).
    Also, unlike other database systems, MySQL has a case-insensitive default collation.
    Hence, when working with string columns and selecting data, sorting data, or determining uniqueness, the default behavior will be case insensitive. For example, the strings "resume", "Resume" and "résumé" will be considered the same.
    To have a consistent behavior across database systems with PDB, it is therefore recommended to create databases using a proper UTF-8 character set (or the character set necessary to represent the data to be stored) and case-sensitive collation: CREATE DATABASE database_name CHARACTER SET charset_name COLLATE collation_name;
    Afterwards, if necessary, you can set the character set and the collation of specific columns to case-insensitive.
    The character sets available in the server can be obtained with the following query: SHOW CHARACTER SET
    The collations available in the server can be obtained with the following query: SHOW COLLATION
    The default character set and collation for a particular database can be verified using this query: SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA WHERE SCHEMA_NAME = 'database_name';
    See MySQL - Database Character Set and Collation for information on changing the character set and collation on existing databases and the implications on the data already stored there.

Microsoft SQL Server

  • During concurrent transactions, at least with isolation level "read committed", SQL Server blocks a transaction trying to read a row that another transaction has already locked for writing (exclusive access).
    In this situation other supported Database systems will not block, they will return the current committed value instead. This may seem less safe regarding the correctness of the read data, but the reads will be optimistic (won't block) and data verification can be done on the application level. To make this behavior consistent with other Database systems and to avoid application deadlocks, the database(s) in use should have the flag READ_COMMITTED_SNAPSHOT enabled. This can be done right after creating a database, by running
    ALTER DATABASE database_name SET READ_COMMITTED_SNAPSHOT ON;
    WARNING: enabling READ_COMMITTED_SNAPSHOT does not work on "master" database (which should not be used for data anyway).

  • Unlike other database systems, SQL Server has a case-insensitive default collation (SQL_Latin1_General_CP1_CI_AS as documented in SQL Server - Set or Change the Server Collation).
    Hence, when working with string columns and selecting data, sorting data, or determining uniqueness, the default behavior will be case insensitive. For example, the strings "resume", "Resume" and "résumé" will be considered the same.
    To have a consistent behavior across database systems with PDB, it is therefore recommended to create databases using a case-sensitive collation: CREATE DATABASE database_name COLLATE SQL_Latin1_General_CP1_CS_AS;
    Afterwards, if necessary, you can set the collation of specific columns to case-insensitive.
    The collations available in the server can be obtained with the following query: SELECT Name, Description FROM fn_helpcollations();
    The default collation for a particular database can be verified using this query: SELECT CONVERT (varchar(256), TABASEPROPERTYEX('database_name','collation'));
    See SQL Server - Set or Change the Database Collation for information on changing the collation on existing databases and the implications on the data already stored there.

H2

  • Login timeout, query timeout and socket timeout are not supported in PDB. H2 does support socket timeout, but only as a "global" timeout setting, defined in system properties. See H2 SysProperties - SOCKET_CONNECT_TIMEOUT).

  • This database doesn't correctly calculate the length of a string containing high-surrogate characters (supplementary characters in UTF-16, such as 😁).

Extra information

Fetching LOBs

The process of obtaining the contents of a LOB column on the client side is not specific to a single Database system.
Usually the contents of LOB columns are not fetched immediately like other types of columns. Instead, an object is created with a reference to the current DB connection to fetch the contents of the LOB when required. We confirmed this in OracleDB and PostgreSQL, but probably applies to the other systems as well. Therefore, for a table with "COL1" STRING and "COL2" BLOB, the following Java code works:

List<Map<String, ResultColumn>> result = engine.query(select(all()).from(table("TEST")));  
engine.close();  
result.get(0).get("COL1").toString();

However, the following Java code doesn’t work, as it complains about closed connection when getting BLOB data:

List<Map<String, ResultColumn>> result = engine.query(select(all()).from(table("TEST")));  
engine.close();  
result.get(0).get("COL2").<BlobTest>toBlob();

(it would work if the DB engine connection wasn't closed)

Identifier sizes

The following table presents a general overview of the limits of identifier sizes.
It is not exhaustive and may eventually be outdated. Check the respective documentation pages to ensure you get the most up-to-date maximum identifier lengths, as well as other possible limits.

Database system Maximum length of identifier
PostgreSQL 63 bytes
OracleDB 128 bytes as of v12c R2 (older versions were limited to 30 bytes)
MySQL 64
Microsoft SQL server 128
H2 limited only by memory/storage capacity
DB2 128 bytes as of v11 (older versions were limited to 30 bytes)