How are encoded logs read back as original lines in ClickHouse?

A Dictionary serves each template pre-split into literals and slots, and a view over six SQL lambda functions splices the row's values back between the literals at query time, so an ordinary SELECT returns the original text.

We shipped a Rust UDF for ClickHouse, then deleted it: plain SQL was 100x faster

Q: Why was the Rust UDF 100x slower than plain SQL in ClickHouse?

ClickHouse talks to an executable UDF over stdin/stdout, paying per-row IPC, while the SQL path never leaves the vectorized engine: roughly 5.8 s vs 60 ms to decode the full 137,418-row sample.

An earlier port shipped a Rust executable UDF, on the assumption a compiled binary would beat interpreted SQL. It lost by about 100x on bulk decode: roughly 5.8 s for the full 137K-row table against 60 ms for the SQL path. We deleted the binary.

What replaced it is one SQL file. A Dictionary, a view, and a few lambda functions are the whole integration.

That is the opposite of how the earlier ports went. When we ported our compact-log decoder to Splunk, we built a transparency layer: a browser hook, an SPL rewriter, a KV-store cron. The Elasticsearch port was a native Lucene plugin with a scorer interceptor. On ClickHouse I built none of that. The ClickHouse port is the simplest of the three, not because ClickHouse does less, but because it exposes as plain SQL the exact primitives the other two had to smuggle into a JVM or fake in a browser.

The problem is the same one the Splunk and Elasticsearch posts open with. The Receiver is a Log10x agent that runs at the edge, alongside your forwarder. In Compact mode it recognizes the format string behind each log.info(...) call as a template, stores it once, and stores each event as the template's hash plus the handful of values that change line to line.

That hash stays stable across deploys because we derive the template vocabulary from the source environment at compile time. We do not infer it from a sliding window the way Drain, the standard streaming log-clustering algorithm, does. Why the hash holds is its own post. This one is the query side.

Everything below is runnable from Apache 2.0 code, install.sql, the sample corpus, and the standalone decoders included; the Receiver that produces compact events is the commercial piece.

What a row looks like before ClickHouse sees through it

The Receiver ships two streams: a small templates set, about one row per pattern, and the bulk of compact events. Each event arrives as a JSON envelope whose log field holds a payload like ~x7Kp2m,1744628581000,admin,192.168.1.1. The leading ~ marks it encoded, the part up to the first comma is the template hash, and the rest are the values.

To rebuild the line, the view looks up that hash's stored template, then drops the row's values into the template's slots in order, splicing them between the fixed literal fragments. A Grafana panel pointed at the raw form shows the encoded string, not a log line. The job is to make an ordinary SELECT return the original text without anyone editing a query.

Each object in install.sql replaces a layer the siblings hand-built

The install is clickhouse-client --multiquery < install.sql. It creates a database, two tables, one dictionary, six SQL functions, and two views, with no binary, executable UDF, or platform-specific path. Each object stands in for something the Splunk app built by hand.

The Splunk app keeps its templates in a MongoDB-backed KV Store and refreshes it with a saved search named "Consume KV" every two minutes. On ClickHouse that collapses into a table plus a dictionary:

CREATE DICTIONARY tenx.templates_dict
(
    templateHash String,
    literals     Array(String),
    slots        Array(String)
)
PRIMARY KEY templateHash
SOURCE(CLICKHOUSE(TABLE 'templates' DB 'tenx'))
LAYOUT(COMPLEX_KEY_HASHED())
LIFETIME(MIN 60 MAX 120);

LIFETIME(MIN 60 MAX 120) is the cron. New templates become queryable within 60 to 120 seconds, with no scheduled job to maintain. ClickHouse randomizes the refresh inside that window so replicas don't all reload at once. The templates table materializes two arrays at INSERT time, literals and slots, by splitting each template on its $ markers, so the dictionary serves a parsed structure rather than a string the query re-parses.

The Splunk tenx-inflate macro and the Lucene scorer both become a view over six lambda functions:

CREATE VIEW tenx.events AS
SELECT
    container, namespace, pod, templateHash,
    log AS encoded_log,
    tenx_inflate_iso(
        log,
        dictGetOrDefault('tenx.templates_dict', 'literals', tuple(templateHash), []::Array(String)),
        dictGetOrDefault('tenx.templates_dict', 'slots',    tuple(templateHash), []::Array(String))
    ) AS decoded_log
FROM tenx.encoded_events;

The view is the user surface. A SELECT from tenx.events reads like any table, and decoded_log comes back as the original line. The jQuery beforeSend hook from Splunk and the ActionFilter query-rewriter from Elasticsearch map to nothing here. There is no query to rewrite when dashboards already point at the view.

I'd build it this way again. A view plus a dictionary plus a handful of lambda functions is the right shape for transparent decode. Everything the other two ports compiled into a JVM or wired into a browser is, on ClickHouse, declarative SQL the optimizer already understands. You can argue a compiled UDF or external decode service would be faster. We tested that, and it lost.

formatDateTimeInJodaSyntax wants a constant, so there are two views

formatDateTimeInJodaSyntax and formatDateTime require the format-string argument to be a compile-time constant. Hand them a per-row value and you get Code: 44. Argument at index 1 must be constant. But a template's timestamp format is per template, not constant. That one constraint forks the design into two views over the same data.

tenx.events_native dispatches with multiIf over the 17 timestamp formats we've observed, one constant call per branch. It preserves each template's original format at a fixed cost of about 600 ms per query.

tenx.events renders every timestamp as one constant ISO 8601 format, stays inside the vectorized engine, and expands the full 137,418-row sample in about 60 ms, near a raw scan. ISO is the default because most Grafana pickers and BI tools want ISO anyway, and it holds at scan speed regardless of result size.

Filter on templateHash, not on the decoded text

The inflate functions are opaque to the optimizer, so a WHERE on decoded_log won't push down and forces every candidate row to expand. The materialized columns push down. Filter on templateHash, container, namespace, or pod first, then read decoded_log on the survivors; the table is ORDER BY (container, templateHash) for that.

If a hash isn't in the dictionary, or a line doesn't start with ~, the function returns the raw text rather than erroring, and a test pins that behavior. A missing template never crashes a query.

Where the 100x comes from

The gap from the title is a process boundary. ClickHouse talks to an executable UDF over stdin/stdout, so the binary pays per-row IPC while the SQL path never leaves the vectorized engine, and bulk decode stays near scan speed. Pure SQL also runs on ClickHouse Cloud, which blocks executable UDFs, so one install file works everywhere.

The other surprise cuts against the storage pitch. On the 200 MB OpenTelemetry-demo sample, on-disk savings are about 42% under LZ4 but only 27% under ZSTD-3, both totals including the row-count drop from grouping. Per row, templating keeps roughly 17% under LZ4 and about 0% under ZSTD, because ZSTD's own pattern detection already finds most of the structure templating removes.

So I don't lead with storage. The codec-independent wins are transport, about 35% fewer wire bytes from edge to cluster before any compression runs, and ingest plus background-merge CPU, about 25 to 30%. Those numbers are from an 8-core Apple Silicon Mac running ClickHouse 26.5 in Docker.

The decoder is Apache 2.0 at github.com/log-10x/clickhouse-app; the Receiver that produces compact events needs a commercial license. To decode the same format outside ClickHouse, there are standalone Java and JavaScript decoders. The same install.sql runs on self-hosted, Altinity Cloud, and ClickHouse Cloud, CI'd against 24.8 LTS, 25.3, and latest. Clone it, run it against your own logs, and tell me whether you'd have shaped the decode any differently than a view, a dictionary, and six lambdas.

Related: why the template hash stays stable, what else attaches to a template, where the symbol library comes from, and the same problem solved on Splunk and Elasticsearch.

We shipped a Rust UDF for ClickHouse, then deleted it: plain SQL was 100x faster

What a row looks like before ClickHouse sees through it

Each object in install.sql replaces a layer the siblings hand-built

formatDateTimeInJodaSyntax wants a constant, so there are two views

Filter on templateHash, not on the decoded text

Where the 100x comes from

Read more

The Drain pattern ID is a join column you can't trust

Why MCP servers for logs break

Cutting Elasticsearch log storage without rewriting a single query

Cutting Splunk log storage without rewriting a single query