This is the multi-page printable view of this section. Click here to print.
Altinity Stable for ClickHouse 19.16
- 1: Altinity Stable for ClickHouse 184.108.40.206
- 2: Altinity Stable for ClickHouse 220.127.116.11
- 3: Altinity Stable for ClickHouse 18.104.22.168
1 - Altinity Stable for ClickHouse 22.214.171.124
This is a combined release of 126.96.36.199, upgraded to 188.8.131.52 after review.
Major New Features
enable_scalar_subquery_optimization setting is disabled by default. It is required in order to perform rolling upgrades to 20.x versions.
- Fixed a bug with non-Date/DateTime columns not being allowed in TTL expressions
- Fixed a bug in the Kafka engine that sometimes prevented committing a message back to the broker if the broker was temporarily unavailable
- Fixed a bug in which the boolean functions ‘or’ and ‘and’ might return incorrect results when called on more than 10 arguments if some of them are NULL
- Fixed server side certificates in docker image
- Fixed a possible race condition between insert quorum and drop partition
- Volume selection in merge process could move data back to the first volume in storage policy in some cases
- Mutations that did not work if
insert_quorumhas been used for the table
- Deduplication logic that ignored dependent materialized views if a duplicate for the main table has been detected. The fix is turned off by default and can be enabled with a new
- Kafka engine could result in data losses when ZooKeeper is temporarily not available
- Kafka engine did not allow users to drop a table with incorrect Kafka topic
- Kafka engine did not allow users to use subqueries in attached materialized views
ALTER TABLE MODIFY CODECdefault expression and a codec are specified
2 - Altinity Stable for ClickHouse 184.108.40.206
Improved Kafka engine reliability, performance and usability.
- When there is a rebalance of partitions between consumers, occasional data duplicates and losses were possible.
- When data was polled from several partitions with one poll and committed partially, occasional data losses were possible.
- Block size threshold (‘kafka_max_block_size’ setting) now triggers block flush correctly that reduces the chance of data loss under high load.
- When new columns are added to the Kafka pipeline (Kafka engine -> Materialized View -> Table) it was previously not possible to alter the destination tables first with default value and then alter MV in order to minimize downtime. The problem was not related directly to Kafka, but general implementation of materialized views. It is fixed now.
- Few other minor problems have been addressed as well.
Since ClickHouse now respects the ‘kafka_max_block_size’ setting that defaults to 65535, we recommend increasing it to the bigger values for high volume streaming. Setting it to 524288 or 1048576 may increase consumer throughput up to 20%.
The ‘kafka_max_block_size’ defaults to 65535. We recommend increasing it to the bigger values for high volume streaming. Setting it to 524288 or 1048576 may increase consumer throughput up to 20%.
3 - Altinity Stable for ClickHouse 220.127.116.11
This release has major updates:: 2744 commits in 528 pull-requests from 113 contributors.
Major New Features
Tiered storage: multiple storage volumes for MergeTree tables. It’s possible to store fresh data on SSD and automatically move old data to HDD. We already discussed this functionality in our blog.
LIVE VIEW tables that we also described in our blog. The support in this release is not complete, but new improvements are coming quickly. Feel free to try it out.
WITH FILLmodifier of
ORDER BYfor data gap filling. It allows to fill
missing datato provide uniform reports. For example, you can fill missing dates for time series so that every day is shown even if some days have no data. This feature is not documented yet, so here is an example how missing dates can be filled:
SELECT arrayJoin([today()-10, today()]) AS d ORDER BY d ASC WITH FILL
Table constraints that are checked at insert.
Sensitive data masking for query_log, server logs, process list with regexp-based rules.
Table function input() for reading incoming data in
INSERT SELECTquery. A very useful feature when you need to preprocess data just before inserting. We are going to showcase it in our planned articles around log processing use cases.
Cascaded materialized views. It is an important feature for many use cases including Kafka integration. In particular, it allows you to have load raw data from Kafka using MV and then aggregate it using another MV. That was not possible in previous ClickHouse releases.
Kafka: Altinity took over support for the Kafka engine a few months ago. Kafka functionality and stability has been improved in this release, in particular:
- ClickHouse can act as Kafka producer, and not just to read from Kafka, but also send data back with an insert statement.
- Atomic parsing of each message:
kafka_row_delimiteris now obsolete (ignored)
- More reliable commit logic
- Virtual columns
_timestampfor Kafka engine table.
- Parsing of most of the formats is working properly now
- In case of topic rebalancing on the Kafka side, duplicates are possible. We are going to address it in 2020/Q1. https://github.com/ClickHouse/ClickHouse/issues/7259.
- Select from mysql() table function is broken in this release. Two workarounds are possible:
- Use MySQL table engine.
create table as mysql(...)and then select from this table — this is actually a shortcut to MySQL table engine.
- Some constant-like functions can misbehave when queried through Distributed tables: https://github.com/ClickHouse/ClickHouse/issues/7666.
- ‘system reload dictionary’ does not reload dictionaries with ‘update_field’: https://github.com/ClickHouse/ClickHouse/issues/7440.
whereconditions involving fields of UInt8 type with values > 1 can return unexpected results. (workaround: instead of a and b and c use a <> 0 and b <> 0 and c <> 0) https://github.com/ClickHouse/ClickHouse/issues/7772.
- If you want to preserve old ANY JOIN behavior while upgrading from a version before 19.13, you may need to install 19.13.7 first, change
any_join_distinct_right_table_keyssetting there and after that you can upgrade to 19.16. But we recommend to review your queries and rewrite them without this join type. In future releases it will be available, but with a different name (
Backward Incompatible Changes
count()supports only a single argument.
asterisk_left_columns_onlysetting has been removed (it was disabled by default).
- Numeric values for Enums can now be used directly in
INsection of the query.
- Changed serialization format of
bitmap*aggregate function states to improve performance. Serialized states of
bitmap*from previous versions cannot be read. If you happen to use bitmap aggregate functions, please contact us before upgrading.
system.query_logcolumn type was changed from
ANY RIGHT/FULL/INNER JOINis disabled by default. Set
any_join_distinct_right_table_keyssetting to enable them.
- ORC format.
- Template/TemplateIgnoreSpaces format. It allows to parse / generate data in custom text formats. So you can for example generate HTML directly from ClickHouse thus turning ClickHouse to the web server.
- CustomSeparated/CustomSeparatedIgnoreSpaces format. Supports custom escaping and delimiter rules.
- Parse unquoted NULL literal as NULL (enabled by setting
- Initialize null fields with default values if the data type of this field is not nullable (enabled by setting
New or improved functions
Aggregate function combinators which fill null or default value when there is nothing to aggregate: -OrDefault, -OrNull
uniqCombined64()to get sane results for cardinality > UINT_MAX
QuantileExactExclusive and Inclusive aggregate functions
hasToken/hasTokenCaseInsensitive (look for the token in string the same way as token_bf secondary index)
multiFuzzyMatchAllIndices, multiMatchAllIndices (return the Array of all matched indices in multiMatch family functions)
repeat function for strings
sigmoid and tanh functions for ML applications
- Changes CRoaring serialization functions (you will not be able to read Bitmaps created by earlier versions)
- bitmapMin, bitmapMax,
- bitmapSubsetLimit(bitmap, range_start, limit),
- groupBitmapAnd, groupBitmapOr, groupBitmapXor
geohashesInBox(longitude_min, latitude_min, longitude_max, latitude_max, precision) which creates an array of precision-long strings of geohash boxes covering the provided area.
Support for wildcards in paths of table functions file and hdfs. If the path contains wildcards, the table will be readonly:
SELECT * FROM hdfs('hdfs://hdfs1:9000/some_dir/another_dir/*/file', 'Parquet', 'col1 String')
New function neighbour(value, offset[, default_value]). Allows to reach prev/next row within the column.
Optimize queries with ORDER BY expressions clause, where expressions have coinciding prefix with sorting key in MergeTree tables. This optimization is controlled by
New function arraySplit and arrayReverseSplit which can split an array by “cut off” conditions. They are useful in time sequence handling.
Table function values (the name is case-insensitive). It allows to create table with some data inline.
SELECT * FROM VALUES('a UInt64, s String', (1, 'one'), (2, 'two'), (3, 'three'))
fullHostName (alias FQDN)
numbers_mt() — multithreaded version of numbers().
currentUser() (and alias user()), returning login of authorized user.
S3 engine and table function. Partial support in this release (no authentication), complete version is expected in 19.18.x and later
WITH TIES modifier for LIMIT
New dictionaries features
- Redis as an external dictionary source.
- <sparse_hashed> dictionary layout, that is functionally equivalent to the <hashed> layout, but is more memory efficient. It uses about twice as less memory at the cost of slower value retrieval.
allow_dictionariesuser setting that works similar to
- HTTP source new attributes: credentials and http-headers.
Operations / Monitoring
system.metric_log table which stores values of system.events and system.metrics with specified time interval.
system.text_log in order to store ClickHouse logs to itself
Support for detached parts removal:
ALTER TABLE <table_name> DROP DETACHED PART '<part_id>'
MergeTree now has an additional option
ttl_only_drop_parts(disabled by default) to avoid partial pruning of parts, so that they dropped completely when all the rows in a part are expired.
Added miscellaneous function getMacro(name) that returns String with the value of corresponding <macros> from configuration file on current server where the function is executed.
New metrics & events
- system.metrics: DistributedFilesToInsert, GlobalThread, GlobalThreadActive, LocalThread, LocalThreadActive
- system.events: Merge
New system tables
New columns in system tables
- system.clusters: errors_count, estimated_recovery_time
- system.collations: language
- system.parts: disk_name
- system.parts_columns: disk_name
- system.processes: os_thread_ids
- system.query_log: os_thread_ids
- system.tables: storage_policy
|concurrent_part_removal_threshold||100||Activate concurrent part removal (see ‘max_part_removal_threads’) only if the number of inactive data parts is at least this.|
|max_part_loading_threads||auto(6)||The number of threads to load data parts at startup.|
|max_part_removal_threads||auto(6)||The number of threads for concurrent removal of inactive data parts. One is usually enough, but in ‘Google Compute Environment SSD Persistent Disks’ file removal (unlink) operation is extraordinarily slow and you probably have to increase this number (recommended is up to 16).|
|storage_policy||default||Name of storage disk policy|
|ttl_only_drop_parts||0||Only drop altogether the expired parts and not partially prune them.|
|allow_drop_detached||0||Allow ALTER TABLE … DROP DETACHED PARTITION … queries|
|allow_experimental_live_view||0||Enable LIVE VIEW. Not mature enough.|
|allow_introspection_functions||0||Allow functions for introspection of ELF and DWARF for query profiling. These functions are slow and may impose security considerations.|
|any_join_distinct_right_table_keys||0 (was 1)||Enable old ANY JOIN logic with many-to-one left-to-right table keys mapping for all ANY JOINs. It leads to confusing not equal results for ’t1 ANY LEFT JOIN t2’ and ’t2 ANY RIGHT JOIN t1’. ANY RIGHT JOIN needs one-to-many keys mapping to be consistent with LEFT one.|
|connection_pool_max_wait_ms||0||The wait time when connection pool is full.|
|default_max_bytes_in_join||100000000||Maximum size of right-side table if limit’s required but max_bytes_in_join is not set.|
|distributed_directory_monitor_max_sleep_time_ms||30000||Maximum sleep time for StorageDistributed DirectoryMonitors, it limits exponential growth too.|
|distributed_replica_error_cap||1000||Max number of errors per replica, prevents piling up incredible amount of errors if replica was offline for some time and allows it to be reconsidered in a shorter amount of time.|
|distributed_replica_error_half_life||60||Time period reduces replica error counter by 2 times.|
|format_custom_escaping_rule||Escaped||Field escaping rule (for CustomSeparated format)|
|format_custom_field_delimiter||Delimiter between fields (for CustomSeparated format)|
|format_custom_result_after_delimiter||Suffix after result set (for CustomSeparated format)|
|format_custom_result_before_delimiter||Prefix before result set (for CustomSeparated format)|
|format_custom_row_after_delimiter||Delimiter after field of the last column (for CustomSeparated format)|
|format_template_resultset||Path to file which contains format string for result set (for Template format)|
|format_template_row||Path to file which contains format string for rows (for Template format)|
|format_template_rows_between_delimiter||Delimiter between rows (for Template format)|
|input_format_csv_unquoted_null_literal_as_null||0||Consider unquoted NULL literal as N|
|input_format_null_as_default||0||For text input formats initialize null fields with default values if data type of this field is not nullable|
|input_format_tsv_empty_as_default||0||Treat empty fields in TSV input as default values.|
|input_format_values_accurate_types_of_literals||1||For Values format: when parsing and interpreting expressions using template, check actual type of literal to avoid possible overflow and precision issues.|
|input_format_values_deduce_templates_of_expressions||0||For Values format: if field could not be parsed by streaming parser, run SQL parser, deduce template of the SQL expression, try to parse all rows using template and then interpret expression for all rows.|
|joined_subquery_requires_alias||0||Force joined subqueries to have aliases for correct name qualification.|
|kafka_max_wait_ms||5000||The wait time for reading from Kafka before retry.|
|live_view_heartbeat_interval||15||The heartbeat interval in seconds to indicate live query is alive.|
|max_http_get_redirects||0||Max number of http GET redirects hops allowed. Make sure additional security measures are in place to prevent a malicious server to redirect your requests to unexpected services.|
|max_live_view_insert_blocks_before_refresh||64||Limit maximum number of inserted blocks after which mergeable blocks are dropped and query is re-executed.|
|min_free_disk_space_for_temporary_data||0||The minimum disk space to keep while writing temporary data used in external sorting and aggregation.|
|optimize_read_in_order||1||Enable ORDER BY optimization for reading data in corresponding order in MergeTree tables.|
|partial_merge_join||0||Use partial merge join instead of hash join for LEFT and INNER JOINs.|
|partial_merge_join_optimizations||0||Enable optimizations in partial merge join|
|partial_merge_join_rows_in_left_blocks||10000||Group left-hand joining data in bigger blocks. Setting it to a bigger value increase JOIN performance and memory usage.|
|partial_merge_join_rows_in_right_blocks||10000||Split right-hand joining data in blocks of specified size. It’s a portion of data indexed by min-max values and possibly unloaded on disk.|
|query_profiler_cpu_time_period_ns||1000000000 (was 0)||Highly experimental. Period for CPU clock timer of query profiler (in nanoseconds). Set 0 value to turn off CPU clock query profiler. Recommended value is at least 10000000 (100 times a second) for single queries or 1000000000 (once a second) for cluster-wide profiling.|
|query_profiler_real_time_period_ns||1000000000 (was 0)||Highly experimental. Period for real clock timer of query profiler (in nanoseconds). Set 0 value to turn off real clock query profiler. Recommended value is at least 10000000 (100 times a second) for single queries or 1000000000 (once a second) for cluster-wide profiling.|
|replace_running_query_max_wait_ms||5000||The wait time for running query with the same query_id to finish when setting ‘replace_running_query’ is active.|
|s3_min_upload_part_size||536870912||The minimum size of part to upload during multipart upload to S3.|
|temporary_live_view_timeout||5||Timeout after which temporary live view is deleted.|