Altinity Stable for ClickHouse 21.3.13.9

Details of the Altinity Stable Release 21.3.13.9

A few months ago we certified ClickHouse 21.1 as an Altinity Stable release. Since then, we have worked on newer releases and run them in-house. We completed several new features, and even more have been added by community contributors. We were running ClickHouse 21.3 to power our public datasets at Altinity.Cloud instance and testing it in our environments. As of 21.3.13.9 we are confident in certifying 21.3 as an Altinity Stable release.

This release is a relatively small upgrade since the previous Altinity Stable release. But even a small upgrade includes 742 pull requests from 133 contributors; these mostly focused on performance and stability improvements. There are several important new features as well! Please look below for detailed release notes.

Major new features since the previous stable release 21.1.x

A new release introduces a lot of changes and new features. These are organized by feature in the Appendix, so refer to this section for more detail. The following new features are worth mentioning on the front page:

  • SQL features:
    • Window functions! See our blog article for an introduction.
    • Support multiple levels nesting in the Nested datatype.
  • Security features:
    • LDAP roles mapping a).
    • Apply row-level security as a separate PREWHERE step a).
    • Server side keys support for S3 table function a).
  • Cluster improvements:
    • Hedged requests! That allows to reduce tail latencies on large clusters by running the same query at different replicas. This is controlled by use_hedged_requests and max_parallel_replicas settings.
    • Allow inserts into a specific shard via a distributed table. This is controlled by the insert_shard_id setting.
    • Allow inserts into cluster() table function and specify a sharding key.
    • Replicated database engine (experimental). Allows to replicate DDL statements across the cluster. It is not production ready in this release.
  • MergeTree features:
    • Table level concurrency control. Can be controlled by the max_concurrent_queries merge tree setting.
  • Integrations:
    • PostgreSQL table engine, table function and dictionary source.

As usual with ClickHouse, there are many performance and operational improvements in different server components.

a)- contributed by Altinity developers.

Backward Incompatible Changes

The following changes are backward incompatible and require user attention during an upgrade:

  • It is no longer allowed to create MergeTree tables in the old syntax with table TTL; this syntax is now ignored. Attaching old tables is still possible.
  • Floating point columns and values are disallowed in:
    • Partitioning key – these can be turned back on with the allow_floating_point_partition_key merge tree setting.
    • Bitwise functions.
  • Excessive parenthesis in type definitions are no longer supported, example: Array((UInt8)).

Upgrade Notes

There were several changes between versions that may affect the rolling upgrade of big clusters. Upgrading only part of the cluster is not recommended.

  • Now replicas that are processing the ALTER TABLE ATTACH PART[ITION] command search their detached/ folders before fetching the data from other replicas. As an implementation detail, a new command ATTACH_PART is introduced in the replicated log. Parts are searched and compared by their checksums. ATTACH PART[ITION] queries may not work during cluster upgrade.

Other Important Changes

Some new ClickHouse features are now enabled by default. It may lead to a change in behaviour, so review those carefully and disable features that may affect your system:

  • distributed_aggregation_memory_efficient
  • enable_global_with_statement
  • optimize_normalize_count_variants
  • optimize_respect_aliases
  • optimize_rewrite_sum_if_to_count_if

Known issues in 21.3.13.9

The development team continues to improve the quality of the 21.3 release. The following issues still exist in the 21.3.13.9 version and may affect ClickHouse operation. Please inspect them carefully to decide if those are applicable to your applications:

ClickHouse Altinity Stable releases are based on the community versions. Community builds can be downloaded from repo.clickhouse.tech.

ClickHouse Altinity builds may be found at:

Please contact us at info@altinity.com if you experience any issues with the upgrade.

Appendix

New functions

  • DateTime functions:
    • Compatibility with MySQL: DATE, DAY, DAYOFMONTH, DAYOFWEEK, DAYOFYEAR, HOUR, MINUTE, MONTH, QUARTER, SECOND, YEAR
    • parseDateTimeBestEffortUSOrNull
    • parseDateTimeBestEffortUSOrZero
    • timezoneOffset
  • String functions:
    • decodeXMLComponent, extractTextFromHTML
  • Map functions:
    • mapContains, mapKeys, mapValues
  • Aggregate functions
    • New combinator -SimpleState
    • deltaSum
    • runningConcurrency
  • Functions to work with new Map data type:
    • map
  • Math/Statistics:
    • sign
  • Type related:
    • reinterpret, reinterpretAsInt128, reinterpretAsInt256, reinterpretAsUInt256
  • Internal ClickHouse:
    • connectionID, connection_id
    • normalizeQueryKeepNames, normalizedQueryHashKeepNames

New table functions

  • postgresql
  • file

New table engines

  • PostgreSQL

New metrics and events

system.metrics.

  • PartsCommitted
  • PartsCompact
  • PartsDeleteOnDestroy
  • PartsDeleting
  • PartsInMemory
  • PartsOutdated
  • PartsPreCommitted
  • PartsTemporary
  • PartsWide

system.events:

  • DistributedDelayedInserts
  • DistributedDelayedInsertsMilliseconds
  • DistributedRejectedInserts
  • HedgedRequestsChangeReplica

New system tables

  • distributed_ddl_queue

New columns in system tables

  • metric_log.
    • ProfileEvent_DistributedDelayedInserts, ProfileEvent_DistributedRejectedInserts, ProfileEvent_DistributedDelayedInsertsMilliseconds, ProfileEvent_HedgedRequestsChangeReplica, CurrentMetric_PartsTemporary, CurrentMetric_PartsPreCommitted, CurrentMetric_PartsCommitted, CurrentMetric_PartsOutdated, CurrentMetric_PartsDeleting, CurrentMetric_PartsDeleteOnDestroy, CurrentMetric_PartsWide, CurrentMetric_PartsCompact, CurrentMetric_PartsInMemory
  • Parts:
    • group_by_ttl_info.expression, group_by_ttl_info.min, group_by_ttl_info.max, rows_where_ttl_info.expression, rows_where_ttl_info.min, rows_where_ttl_info.max
  • processes.http_referer
  • query_log:
    • http_referer, log_comment, used_aggregate_functions, used_aggregate_function_combinators, used_database_engines, used_data_type_families, used_dictionaries, used_formats, used_functions, used_storages, used_table_functions
  • query_thread_log.http_referer
  • quota_limits:
    • max_query_selects, max_query_inserts
  • quota_usage:
    • query_selects, max_query_selects, query_inserts, max_query_inserts
  • quotas_usage:
    • query_selects, max_query_selects, query_inserts, max_query_inserts

System.merge_tree_settings added/changed

Name Old value New value Description
allow_floating_point_partition_key 0 Allow floating point as partition key.
inactive_parts_to_delay_insert 0 If table contains at least that many inactive parts in single partition, artificially slow down insert into table.
inactive_parts_to_throw_insert 0 If more than this number of inactive parts are in a single partition, throw the ‘Too many inactive parts …’ exception.
max_concurrent_queries 0 Max number of concurrently executed queries related to the MergeTree table (0 - disabled). Queries will still be limited by other max_concurrent_queries settings.
min_marks_to_honor_max_concurrent_queries 0 Minimal number of marks to honor the MergeTree-level’s max_concurrent_queries (0 - disabled). Queries will still be limited by other max_concurrent_queries settings.

system.settings added/changed

Name Old value New value Description
allow_changing_replica_until_first_data_packet 0 Allow HedgedConnections to change the replica until receiving the first data packet.
allow_experimental_database_replicated 0 Allows creating databases with the Replicated engine.
allow_experimental_query_deduplication 0 Allows sending parts’ UUIDs for a query in order to deduplicate data parts, if any.
async_socket_for_remote 1 0 Asynchronously read from socket executing remote query.
background_fetches_pool_size 3 8 Number of threads performing background fetches for replicated tables. Only has meaning at server startup.
checksum_on_read 1 Validates checksums on reading. It is enabled by default and should be always enabled in production. Please do not expect any benefits in disabling this setting. It may only be used for experiments and benchmarks. These settings are only applicable for tables of the MergeTree family. Checksums are always validated for other table engines and when receiving data over the network.
database_replicated_ddl_output 1 Returns table with query execution status as a result of a DDL query.
database_replicated_initial_query_timeout_sec 300 How long an initial DDL query should wait for a Replicated database to process previous DDL queue entries.
distributed_aggregation_memory_efficient 0 1 The memory-saving mode of distributed aggregation is enabled.
enable_global_with_statement 0 1 Propagate WITH statements to UNION queries and all subqueries.
engine_file_empty_if_not_exists 0 Allows selecting data from a file engine table without a file.
engine_file_truncate_on_insert 0 Enables or disables truncate before insert in file engine tables.
flatten_nested 1 If true, columns of type Nested will be flatten to separate array columns instead of one array of tuples.
hedged_connection_timeout_ms 100 Connection timeout for establishing connection with replica for Hedged requests.
insert_in_memory_parts_timeout 600000 REMOVED.
insert_shard_id 0 If non zero, when inserting into a distributed table, the data will be inserted into the shard insert_shard_id synchronously. Possible values range from 1 to shards_number of the corresponding distributed table.
log_comment Log comment into system.query_log table and server log. It can be set to an arbitrary string no longer than max_query_size.
normalize_function_names 0 Normalize function names to their canonical names.
optimize_normalize_count_variants 1 Rewrite aggregate functions that semantically equals count() as count().
optimize_respect_aliases 1 If it is set to true, it will respect aliases in WHERE/GROUP BY/ORDER BY, that will help with partition pruning/secondary indexes/optimize_aggregation_in_order/optimize_read_in_order/optimize_trivial_count.
optimize_rewrite_sum_if_to_count_if 1 Rewrite sumIf() and sum(if()) function countIf() function when logically equivalent.
periodic_live_view_refresh 60 Interval after which a periodically refreshed live view is forced to refresh.
query_plan_max_optimizations_to_apply 10000 Limit the total number of optimizations applied to the query plan. If zero, ignored. If the limit is reached, throw an exception.
receive_data_timeout_ms 2000 Connection timeout for receiving the first packet of data or packet with positive progress from replica.
s3_max_connections 1024 The maximum number of connections per server.
sleep_in_send_data 0 Time to sleep in sending data in TCPHandler.
sleep_in_send_tables_status 0 Time to sleep in sending tables the status response in TCPHandler.
unknown_packet_in_send_data 0 Sends unknown packet instead of N-th data packet.
use_hedged_requests 0 Use hedged requests for distributed queries.

Also, please refer to the release notes from the development team available at the following URLs:

Last modified 2021.06.29