Altinity Stable for ClickHouse 188.8.131.52
This release is a significant step forward since the previous Altinity Stable release 184.108.40.206. It includes 1203 pull requests from 171 contributors.
Major New Features
- TTL moves. See the Putting Things Where They Belong Using New TTL Moves article. While we consider this feature to be production ready, we have a roadmap to further improve it. TTL moves have been developed by Altinity.
- Dictionary DDLs that many ClickHouse users dreamed of! See the ClickHouse Dictionaries: Reloaded! article that highlights this feature.
- New DateTime64 datatype with configurable precision up to nanoseconds. The feature is in beta state and more functions for usability will follow in next releases. It has also been developed by Altinity.
- Joins have been improved a lot, including SEMI/ANTI JOIN, experimental merge join and other changes. See the join_algorithm setting below for more info on merge joins.
- A new compact format for MergeTree tables that store all columns in one file. It improves performance of small inserts. The usual format where every column is stored separately is now called “wide.” Compact format is disabled by default. See the min_bytes_for_wide_part and min_rows_for_wide_part settings.
- A built-in Prometheus exporter endpoint for ClickHouse monitoring statistics
- Porting some functionality of the H3 library — A Hexagonal Hierarchical Geospatial Indexing System
- ALTER MODIFY/DROP are currently implemented as mutations for ReplicatedMergeTree* engine family. Now ALTER commands block only at the metadata update stage, and don’t block after that.
- Import/export gzip-compressed files directly for file based storage engines and table functions: File, URL, HDFS and S3.
- Server improvements across different server functions. For example, performance of the Kafka Engine has been improved, parallel
INSERTis now possible (see max_insert_threads setting), etc.
- Replication fetch protocol has been changed in this release. If only one replica is upgraded in a shard, replication may get stuck with
unknown protocol versionerror until all replicas are upgraded. 220.127.116.11 and 18.104.22.168 contain a special compatibility fix that allows smooth upgrade without replication downtime.
- If enable_optimize_predicate_expression is turned on, it may result in incorrect data when only part of the cluster is upgraded. Please turn it off before the upgrade, and turn it back on afterwards.
- There were some optimizations that are incompatible between versions, so we recommend disabling enable_scalar_subquery_optimization before the upgrade. This setting turns off new scalar subquery optimizations and was backported specifically in order to facilitate smooth upgrades.
- There were some optimizations around distributed query execution that may result in incorrect data if part of the cluster is upgraded. We recommend disabling distributed_aggregation_memory_efficient while upgrading, and turn it on afterwards.
Backward Incompatible Changes
The following changes are backward incompatible and require user attention.
- ALTER on ReplicatedMergeTree is not compatible with previous versions.20.3 creates a different metadata structure in ZooKeeper for ALTERs. Earlier versions do not understand the format and cannot proceed with their replication queue.
- The format of replication log entries for mutation commands has changed. You have to wait for old mutations to process before installing the new version.
- 20.3 requires an alias for every subquery participating in a join by default. Set
joined_subquery_requires_alias=0in order to keep the previous behavior.
- ANY JOIN logic has been changed. To upgrade without changes in behavior, you need to add
SETTINGS any_join_distinct_right_table_keys = 1to Engine Join tables metadata or recreate these tables after upgrade.
- Functions indexHint, findClusterValue, findClusterIndex were removed
- Settings merge_tree_uniform_read_distribution, allow_experimental_cross_to_join_conversion, allow_experimental_multiple_joins_emulation are deprecated and ignored
Other Important Notes
- If you encounter
[Argument at index 0 for function dictGetDateTime must be a constant](https://github.com/ClickHouse/ClickHouse/issues/7798)or a similar error message after upgrading, set
- The new release adds a parallel parsing of input formats. While it should improve the performance of data loading in text formats, sometimes it may result in a slower load. Set
input_format_parallel_parsing=0if you experience insert performance degradation.
- The check that a query is performed fast enough is enabled by default. Queries can get an exception like this:
DB::Exception: Estimated query execution time (N seconds) is too long. Maximum: 12334.. Set
timeout_before_checking_execution_speed=0to fix this problem.
- Mutations containing non-deterministic functions, e.g. dictGet or joinGet are disabled by default. Set
Also note that this release has a new query pipeline enabled by default (codename — Processors). This was a significant internal refactoring, and we believe all issues have been fixed. However, in the rare case that you face some weird query behavior or performance degradation, try disabling
experimental_use_processors and check if the problem goes away.
New or Improved Functions
- CRC32IEEE, CRC64
- New JSON functions.
- JSONExtractArrayRaw — very useful for parsing nested JSON structures
- New array functions.
- New geo functions.
- H3 functions.
- Functions for DateTime64.
- parseDateTime64BestEffort, parseDateTime64BestEffortOrNull, parseDateTime64BestEffortOrZero
- toDateTime64, toDateTime64OrNull, toDateTime64OrZero
- New aggregate functions:
New table functions
- clusterAllReplicas — similar to cluster function but queries all replicas in the cluster
- generateRandom — allows users to generate random dataset with defined structure
- zeros, zeros_mt
New metrics and events
New system tables
metric_log— stores system.metrics and system.events history
quota_usage— introspection to quotes
row_policies— introspection to row security policies
stack_trace— for low-level debugging purpose
generators, like system.numbers, but faster and returns zeros
New columns in system tables
processes: thread_ids replaces old columns thread_numbers and os_thread_ids
query_log: exception_code, thread_ids replaces old columns thread_numbers and os_thread_ids
settings: min, max, readonly
table_engines: supports_settings, supports_skipping_indices, supports_sort_order, supports_ttl, supports_replication, supports_deduplication
|Settings||Old value||New value||Description|
|disable_background_merges||0||Disable background merges|
|merge_max_block_size||8192||How many rows in blocks should be formed for merge operations|
|min_bytes_for_wide_part||0||Minimal uncompressed size in bytes to create part in wide format instead of compact|
|min_rows_for_wide_part||0||Minimal number of rows to create part in wide format instead of compact|
|use_minimalistic_part_header_in_zookeeper||0||1||Store part header (checksums and columns) in a compact format and a single part znode instead of separate znodes. This setting was available for a year already, and many users have it enabled.|
|Setting||Old value||New value||Description|
|allow_experimental_alter_materialized_view_structure||0||Allow atomic alter on Materialized views. Work in progress|
|allow_experimental_data_skipping_indices||0||1||Data skipping indices are now enabled by default|
|allow_nondeterministic_mutations||0||Allow non-deterministic functions in ALTER UPDATE/ALTER DELETE statements, e.g. with dictGet() functions|
|background_move_pool_size||8||Number of threads performing background moves for tables. Only has meaning at server startup|
|default_max_bytes_in_join||100000000||1000000000||Maximum size of right-side table if limit is required but max_bytes_in_join is not set|
|enable_early_constant_folding||0||1||Enable query optimization where we analyze function and subqueries results and rewrite query if there’re constants there|
|enable_scalar_subquery_optimization||0||1||If it is set to true, prevent scalar subqueries from (de)serializing large scalar values and possibly avoid running the same subquery more than once|
|experimental_use_processors||0||1||Processors pipeline is now enabled by default|
|force_optimize_skip_unused_shards||1||Throw an exception if unused shards cannot be skipped (1 – throw only if the table has the sharding key, 2 – always throw)|
|format_avro_schema_registry_url||For AvroConfluent format: Confluent Schema Registry URL|
|input_format_parallel_parsing||1||Enable parallel parsing for some data formats|
|input_format_values_deduce_templates_of_expressions||0||1||For Values format: if the field could not be parsed by streaming parser, run SQL parser, deduce template of the SQL expression, try to parse all rows using template and then interpret expression for all rows|
|join_algorithm||hash||Specify join algorithm:
|joined_subquery_requires_alias||0||1||Force joined subqueries and table functions to have aliases for correct name qualification|
|max_insert_threads||0||The maximum number of threads to execute the INSERT SELECT query. Values 0 or 1 means that INSERT SELECT is not run in parallel. Higher values will lead to higher memory usage. Parallel INSERT SELECT has effect only if the SELECT part is run on parallel, see
|max_joined_block_size_rows||65536||Maximum block size for JOIN result (if join algorithm supports it). 0 means unlimited|
|max_parser_depth||1000||Maximum parser depth|
|memory_profiler_step||0||Every number of bytes the memory profiler will collect the allocating stack trace. The minimal effective step is 4 MiB (less values will work as clamped to 4 MiB). Zero means disabled memory profiler|
|min_bytes_to_use_mmap_io||0||The minimum number of bytes for reading the data with the mmap option during SELECT queries execution. 0 – disabled|
|min_chunk_bytes_for_parallel_parsing||1048576||The minimum chunk size in bytes, which each thread will parse in parallel|
|mutations_sync||0||Wait for synchronous execution of ALTER TABLE UPDATE/DELETE queries (mutations). 0 – execute asynchronously. 1 – wait current server. 2 – wait all replicas if they exist|
|optimize_if_chain_to_miltiif||0||Replace if(cond1, then1, if(cond2, …)) chains to multiIf|
|output_format_avro_codec||Compression codec used for output. Possible values:
|output_format_avro_sync_interval||16384||Sync interval in bytes|
|output_format_csv_crlf_end_of_line||If it is set true, end of line in CSV format will be \r\n instead of \n|
|output_format_tsv_crlf_end_of_line||If it is set true, end of line in TSV format will be \r\n instead of \n|
|timeout_before_checking_execution_speed||0||10||Check that the speed is not too low after the specified time has elapsed|
|use_compact_format_in_distributed_parts_names||0||Changes format of directories names for distributed table insert parts|