Bonree ONE Storage Architecture

2026-07-03

Bonree ONE Storage Architecture (ClickHouse-based)

Bonree ONE is built on three core principles: lightweight, structured, and precise.
All three depend on a stable, reliable, and high-performance data storage foundation.

Bonree ONE currently uses ClickHouse as its core storage engine for multi-domain observability data, including:

  • APM (Application Performance Monitoring)

  • RUM (Real User Monitoring)

  • Logs

  • Session data

  • User behavior analytics

Storage Challenges

With multiple integrated modules and highly diverse data scenarios, the underlying storage layer faces several key challenges:

  • High ingestion throughput: Ingestion capacity must scale to petabyte (PB) levels.

  • Extreme traffic variability: Workloads exhibit significant peaks and troughs, including sudden traffic spikes.

  • Complex query patterns: OLAP analytics, raw data queries, and multi-dimensional sorting.

  • High query stability requirements: Critical metrics and alert queries must achieve millisecond-level response times.

  • Complex cluster operations: Scaling, rebalancing, and data redistribution.

ClickHouse Optimization Strategy

To address these challenges, we optimize ClickHouse across four key dimensions:
write performance, read performance, multi-tenancy, and failover resilience.


Write Optimization

1. Batch Writing per Table

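ClickHouse performs best with batch ingestion: each INSERT creates a new data part on disk, so many small inserts force excessive background merges, while larger batches significantly improve throughput.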

To maximize ingestion efficiency across multiple data scenarios, we introduce a consumer-layer batching mechanism.
Each table is assigned a customized batching strategy, ensuring:

  • Maximum ingestion throughput on the ClickHouse side

  • Transparency for upstream business systems, which need little or no awareness of batching

  • Optimized end-to-end ingestion efficiency
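The per-table batching idea can be sketched in Python. This is a minimal illustration under stated assumptions, not Bonree ONE's consumer code: the names TableBatcher and BatchPolicy, the row, byte, and age thresholds, and the flush_fn callback (a stand-in for whatever ClickHouse client performs the INSERT) are all hypothetical. A batch is flushed as soon as any one threshold is reached, so each table can trade latency for batch size independently.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class BatchPolicy:
    """Hypothetical per-table thresholds; a batch flushes when any one is reached."""
    max_rows: int
    max_bytes: int
    max_wait_s: float

@dataclass
class _Buffer:
    rows: List[dict] = field(default_factory=list)
    size_bytes: int = 0
    opened_at: Optional[float] = None  # time the first row of this batch arrived

class TableBatcher:
    """Buffers rows per table and hands complete batches to flush_fn.

    flush_fn(table, rows) stands in for a real ClickHouse INSERT; it is an
    assumption of this sketch, not a specific client API.
    """

    def __init__(self, policies: Dict[str, BatchPolicy],
                 flush_fn: Callable[[str, List[dict]], None],
                 clock: Callable[[], float] = time.monotonic):
        self.policies = policies
        self.flush_fn = flush_fn
        self.clock = clock
        self.buffers: Dict[str, _Buffer] = {t: _Buffer() for t in policies}

    def add(self, table: str, row: dict, row_bytes: int) -> None:
        buf = self.buffers[table]
        if buf.opened_at is None:
            buf.opened_at = self.clock()
        buf.rows.append(row)
        buf.size_bytes += row_bytes
        policy = self.policies[table]
        # Size-based triggers: flush immediately once the batch is big enough.
        if len(buf.rows) >= policy.max_rows or buf.size_bytes >= policy.max_bytes:
            self._flush(table)

    def tick(self) -> None:
        """Call periodically: flushes batches that have waited long enough."""
        now = self.clock()
        for table, buf in self.buffers.items():
            if buf.rows and now - buf.opened_at >= self.policies[table].max_wait_s:
                self._flush(table)

    def _flush(self, table: str) -> None:
        rows = self.buffers[table].rows
        self.buffers[table] = _Buffer()  # start a fresh batch
        self.flush_fn(table, rows)
```

A consumer would call add() for each incoming record and tick() on a timer, so that low-traffic tables still flush within max_wait_s while busy tables flush on size.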

2. Rate Limiting

Storage resources are finite, so ingestion capacity is inherently limited. High ingestion pressure typically comes from two factors:

  • Excessive total data volume

  • Sudden ingestion spikes

For sustained overload (high volume), we trigger alerts and address the issue via cluster scaling or data pruning.

For burst traffic, we apply rate limiting at the consumer layer to keep the system stable.

Specifically, we introduce a time-window-based control mechanism, including:

  • Per-second request limits (QPS)

  • Controlled intervals between ingestion requests

This ensures stable ingestion under peak workloads.
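A time-window limiter combining both controls might look like the sketch below. It is an assumption-labeled illustration: WindowRateLimiter, max_per_window, and min_interval_s are hypothetical names, and the article does not say whether its windows are fixed or sliding (this sketch uses a sliding window).

```python
import time
from collections import deque
from typing import Callable, Deque

class WindowRateLimiter:
    """Sliding-window limiter for consumer-side inserts (a sketch, not Bonree's code).

    Hypothetical knobs mirroring the two controls described above:
      max_per_window - cap on inserts within any window_s seconds (QPS when window_s=1)
      min_interval_s - minimum gap between two consecutive inserts
    """

    def __init__(self, max_per_window: int, window_s: float = 1.0,
                 min_interval_s: float = 0.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_per_window = max_per_window
        self.window_s = window_s
        self.min_interval_s = min_interval_s
        self.clock = clock
        self.sent: Deque[float] = deque()  # timestamps of recent accepted inserts

    def try_acquire(self) -> bool:
        """Return True and record the insert if it is allowed right now."""
        now = self.clock()
        # Drop timestamps that have slid out of the window.
        while self.sent and now - self.sent[0] >= self.window_s:
            self.sent.popleft()
        # Enforce the minimum interval since the previous insert.
        if self.sent and now - self.sent[-1] < self.min_interval_s:
            return False
        # Enforce the per-window cap.
        if len(self.sent) >= self.max_per_window:
            return False
        self.sent.append(now)
        return True
```

When try_acquire() returns False, the consumer keeps the batch buffered and retries later, which turns a traffic spike into a short delay rather than a surge of inserts hitting ClickHouse.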

Read Optimization

To support stable and efficient query performance across multiple business domains, we optimize query execution in the following areas:


1. Query Acceleration

ORDER BY & Primary Key Design

  • The ORDER BY clause defines physical data sorting and is critical for query efficiency.

  • It should align with high-frequency query patterns.

  • Key columns should be ordered from low to high cardinality.

By default, the PRIMARY KEY is the same as the ORDER BY key.
If queries rarely filter on all ORDER BY fields, a shorter primary key made of the leading fields can be declared instead, which keeps the primary index smaller.
However, the primary key must always be a prefix of the ORDER BY fields.
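The ORDER BY and PRIMARY KEY rules above can be expressed as a small DDL helper. The sketch below is hypothetical: the table apm_spans, its columns, and the cardinality estimates are invented for illustration. It sorts candidate key columns by estimated cardinality as a starting point (the most frequent query filters should still take precedence) and rejects a primary key that is not a prefix of the ORDER BY key, a rule ClickHouse itself also enforces.

```python
from typing import Dict, List, Optional

def order_by_from_cardinality(cardinality: Dict[str, int]) -> List[str]:
    """Order candidate sort-key columns from low to high estimated cardinality."""
    return sorted(cardinality, key=cardinality.get)

def create_table_ddl(table: str, columns: Dict[str, str], order_by: List[str],
                     primary_key: Optional[List[str]] = None) -> str:
    """Render a MergeTree CREATE TABLE statement.

    Raises ValueError if primary_key is empty or not a prefix of order_by.
    """
    if primary_key is not None and (
            not primary_key or order_by[:len(primary_key)] != primary_key):
        raise ValueError("PRIMARY KEY must be a non-empty prefix of ORDER BY")
    cols = ",\n".join(f"    {name} {ctype}" for name, ctype in columns.items())
    ddl = (f"CREATE TABLE {table}\n(\n{cols}\n)\n"
           f"ENGINE = MergeTree\n"
           f"ORDER BY ({', '.join(order_by)})")
    if primary_key is not None:
        ddl += f"\nPRIMARY KEY ({', '.join(primary_key)})"
    return ddl

# Hypothetical example: estimated distinct values per candidate key column.
order = order_by_from_cardinality({"trace_id": 10**9, "app_id": 50, "service": 800})
print(create_table_ddl(
    "apm_spans",
    {"app_id": "UInt32", "service": "LowCardinality(String)",
     "trace_id": "String", "duration_ms": "UInt32"},
    order,
    primary_key=["app_id", "service"],
))
```

For this input, the statement ends with ORDER BY (app_id, service, trace_id) and PRIMARY KEY (app_id, service): rows remain physically sorted by all three columns, while the primary index covers only the two leading ones.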


Indexing Strategy

  • Bloom filter index (bloom_filter): for equality filtering

  • MinMax index (minmax): for range queries

  • Token Bloom filter index (tokenbf_v1): for token-based full-text search, such as finding words in log messages
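These data-skipping indexes are added with ALTER TABLE ... ADD INDEX. The helper below maps each query pattern from the list to a ClickHouse index type. The table and column names, the bloom_filter false-positive rate of 0.01, the tokenbf_v1 parameters (filter size in bytes, number of hash functions, random seed), and the GRANULARITY value are illustrative assumptions that would need tuning on real data.

```python
# Hypothetical mapping from query pattern to a ClickHouse data-skipping index type.
INDEX_TYPES = {
    "equality": "bloom_filter(0.01)",       # e.g. WHERE trace_id = '...'
    "range": "minmax",                       # e.g. WHERE duration_ms > 500
    "token": "tokenbf_v1(32768, 3, 0)",      # e.g. WHERE hasToken(message, 'timeout')
}

def add_index_ddl(table: str, column: str, pattern: str, granularity: int = 4) -> str:
    """Return an ALTER TABLE ... ADD INDEX statement suited to the query pattern."""
    index_type = INDEX_TYPES[pattern]
    return (f"ALTER TABLE {table} ADD INDEX idx_{column} {column} "
            f"TYPE {index_type} GRANULARITY {granularity}")

print(add_index_ddl("logs", "trace_id", "equality"))
print(add_index_ddl("apm_spans", "duration_ms", "range"))
print(add_index_ddl("logs", "message", "token"))
```

An index added this way is built only for newly inserted parts; ALTER TABLE ... MATERIALIZE INDEX builds it for data already on disk.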


Materialized Views

For fixed and repeatable query patterns, materialized views are used to:

  • Improve query performance significantly

  • Keep aggregated results consistent with the source data, since the view is updated on every insert

  • Avoid recomputing the same aggregation at query time
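ClickHouse materialized views act as insert triggers: the view's SELECT runs over each newly inserted block only, and its output is appended to a target table. The toy model below (hypothetical class and column names) mimics that behavior in memory. A real deployment would typically use a SummingMergeTree or AggregatingMergeTree target table, which merges partial rows in the background rather than immediately as the dictionary here does.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

class IncrementalView:
    """Toy model of a ClickHouse materialized view over a request table."""

    def __init__(self) -> None:
        self.source: List[dict] = []
        # (app_id, minute) -> [request_count, total_duration_ms]
        self.target: Dict[Tuple[int, int], List[int]] = defaultdict(lambda: [0, 0])

    def insert(self, block: List[dict]) -> None:
        self.source.extend(block)
        # Only the newly inserted block is aggregated, never the whole table.
        for row in block:
            key = (row["app_id"], row["ts"] // 60)
            self.target[key][0] += 1
            self.target[key][1] += row["duration_ms"]

    def avg_duration(self, app_id: int, minute: int) -> float:
        """Answer from pre-aggregated state; returns 0.0 if there is no data."""
        count, total = self.target.get((app_id, minute), (0, 0))
        return total / count if count else 0.0
```

Storing a count and a sum instead of an average is deliberate: partial sums from separate insert blocks can be added together, whereas partial averages cannot.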

Projections

For pre-aggregation scenarios, ClickHouse projections provide:

  • Higher query efficiency

  • Automatic query routing: ClickHouse reads from a matching projection without any change to the query

  • Reduced application-side complexity
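Projections are declared on the source table itself. The sketch below (hypothetical table, projection, and column names) generates the two statements involved: one that adds an aggregating projection and one that builds it for data already on disk.

```python
from typing import List

def projection_ddl(table: str, name: str, group_by: List[str],
                   aggregates: List[str]) -> List[str]:
    """Statements that add an aggregating projection and materialize it."""
    select_list = ", ".join(group_by + aggregates)
    keys = ", ".join(group_by)
    return [
        f"ALTER TABLE {table} ADD PROJECTION {name} "
        f"(SELECT {select_list} GROUP BY {keys})",
        # Build the projection for parts that existed before it was added.
        f"ALTER TABLE {table} MATERIALIZE PROJECTION {name}",
    ]

for stmt in projection_ddl("apm_spans", "p_app_stats", ["app_id"],
                           ["count()", "sum(duration_ms)"]):
    print(stmt)
```

The application keeps sending its usual query, for example SELECT app_id, count(), sum(duration_ms) FROM apm_spans GROUP BY app_id; when a projection can answer it, ClickHouse reads the smaller pre-aggregated data instead.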


2. Compression & Encoding

ClickHouse supports multiple compression algorithms:

  • NONE: No compression

  • LZ4: Fast compression

  • LZ4HC: High compression variant with adjustable level

  • ZSTD: High-efficiency general-purpose compression

Benchmark results show that ZSTD achieves 5–6x better compression efficiency than LZ4.

Encoding Techniques

To further optimize storage efficiency, ClickHouse provides multiple encoding strategies:

  • Delta encoding: Stores differences between adjacent values

  • DoubleDelta encoding: Stores differences of deltas (ideal for time series)

  • Gorilla encoding: XOR-based compression for slowly changing floating-point values

  • T64 encoding: Bit-level compression for integer types

  • FPC encoding: Prediction-based compression for floating-point values
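Delta and DoubleDelta encoding are easiest to see on a small example. The functions below are a conceptual sketch, not ClickHouse's implementation: ClickHouse's DoubleDelta codec additionally packs the resulting small values into variable-width bit fields, which is omitted here. For timestamps sampled at a near-constant interval, DoubleDelta turns most values into zeros, which a general-purpose compressor such as ZSTD then shrinks very effectively.

```python
from typing import List

def delta_encode(values: List[int]) -> List[int]:
    """Keep the first value, then store each difference from the previous value."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(encoded: List[int]) -> List[int]:
    """Invert delta_encode by running a cumulative sum."""
    out: List[int] = []
    for d in encoded:
        out.append(d if not out else out[-1] + d)
    return out

def double_delta_encode(values: List[int]) -> List[int]:
    """Keep the first value, then delta-encode the deltas."""
    deltas = delta_encode(values)
    return deltas[:1] + delta_encode(deltas[1:])

def double_delta_decode(encoded: List[int]) -> List[int]:
    if not encoded:
        return []
    deltas = delta_decode(encoded[1:])         # recover the first-order deltas
    return delta_decode(encoded[:1] + deltas)  # then recover the values

# Timestamps every 10 s, with one sample arriving a second late.
ts = [1000, 1010, 1020, 1030, 1041]
print(delta_encode(ts))         # [1000, 10, 10, 10, 11]
print(double_delta_encode(ts))  # [1000, 10, 0, 0, 1]
```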

Based on data characteristics, we apply:

  • Time-series fields use DoubleDelta + ZSTD(1)

  • String fields use ZSTD(1)
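In ClickHouse DDL, these choices are expressed with a per-column CODEC clause. Codecs in the clause are applied in the listed order on write, so DoubleDelta transforms the values before ZSTD compresses them. The helper below is a hypothetical sketch of such a per-kind policy; the column names are illustrative.

```python
# Hypothetical codec policy following the two rules above.
CODEC_BY_KIND = {
    "timeseries": "CODEC(DoubleDelta, ZSTD(1))",  # DoubleDelta first, then ZSTD level 1
    "string": "CODEC(ZSTD(1))",
}

def column_def(name: str, ctype: str, kind: str) -> str:
    """Render one column definition with the codec chosen for its kind of data."""
    return f"{name} {ctype} {CODEC_BY_KIND[kind]}"

# Illustrative columns for a log table.
for name, ctype, kind in [("ts", "DateTime", "timeseries"),
                          ("message", "String", "string")]:
    print(column_def(name, ctype, kind))
```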

3. Fine-Grained Data Types

ClickHouse provides highly granular data types to optimize storage and computation:

  • Choose the smallest integer type (Int8 / Int16 / Int32 / Int64, or the unsigned UInt variants) that covers the value range, e.g., Int8 instead of Int64 when values fit in one byte

  • Use LowCardinality(String) for low-cardinality string fields

  • Use Map for semi-structured data where appropriate

  • Use JSON only when necessary
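Type selection can be automated when value ranges are known. The helpers below are hypothetical: smallest_int_type picks the narrowest integer type for a value range, also considering the unsigned UInt variants that ClickHouse provides, and string_type applies a distinct-value threshold for LowCardinality(String). The 10,000 default follows the rough guidance in ClickHouse's documentation and is an assumption to validate per dataset.

```python
from typing import Tuple

# (type name, min, max), ordered from narrowest to widest storage.
_INT_TYPES: Tuple[Tuple[str, int, int], ...] = (
    ("UInt8", 0, 2**8 - 1),
    ("Int8", -2**7, 2**7 - 1),
    ("UInt16", 0, 2**16 - 1),
    ("Int16", -2**15, 2**15 - 1),
    ("UInt32", 0, 2**32 - 1),
    ("Int32", -2**31, 2**31 - 1),
    ("UInt64", 0, 2**64 - 1),
    ("Int64", -2**63, 2**63 - 1),
)

def smallest_int_type(lo: int, hi: int) -> str:
    """Return the narrowest ClickHouse integer type holding every value in [lo, hi]."""
    for name, tmin, tmax in _INT_TYPES:
        if tmin <= lo and hi <= tmax:
            return name
    raise ValueError("range exceeds 64-bit integers; consider Int128/UInt128")

def string_type(distinct_values: int, threshold: int = 10_000) -> str:
    """Use LowCardinality(String) when the number of distinct values is small."""
    return "LowCardinality(String)" if distinct_values < threshold else "String"

print(smallest_int_type(0, 100))   # HTTP-status-like field
print(string_type(40))             # e.g. a browser-name column
```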

Multi-Tenancy

We run ClickHouse in a multi-tenant setup to separate workloads and keep query performance stable.

In Bonree ONE:

  • Each product line is assigned a dedicated tenant

  • Tenant-level resource configuration is customized based on priority and workload characteristics

Although ClickHouse does not provide strict internal resource isolation, we implement:

  • End-to-end monitoring

  • Alerting and tracing

  • Rapid tenant resource release mechanisms

This reduces resource contention and improves system stability under load.
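The article does not describe how tenants map onto ClickHouse, so the sketch below assumes each product line corresponds to a ClickHouse user governed by a settings profile. The TenantSpec fields and limits are hypothetical; max_memory_usage, max_threads, and max_concurrent_queries_for_user are standard ClickHouse settings, and KILL QUERY is one way to release a tenant's resources quickly.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TenantSpec:
    """Hypothetical per-product-line tenant limits."""
    name: str                    # assumed to be an existing ClickHouse user
    max_memory_bytes: int        # per-query memory cap
    max_threads: int             # per-query thread cap
    max_concurrent_queries: int  # concurrent queries allowed for this user

def tenant_ddl(t: TenantSpec) -> List[str]:
    """Statements that create a settings profile and attach it to the tenant's user."""
    profile = f"{t.name}_profile"
    return [
        f"CREATE SETTINGS PROFILE IF NOT EXISTS {profile} SETTINGS "
        f"max_memory_usage = {t.max_memory_bytes}, "
        f"max_threads = {t.max_threads}, "
        f"max_concurrent_queries_for_user = {t.max_concurrent_queries}",
        f"ALTER USER {t.name} SETTINGS PROFILE '{profile}'",
    ]

def release_tenant_sql(name: str) -> str:
    """Kill all running queries of one tenant to free its resources quickly."""
    return f"KILL QUERY WHERE user = '{name}' ASYNC"

for stmt in tenant_ddl(TenantSpec("rum", 8 * 2**30, 8, 20)):
    print(stmt)
```

Under this assumption, a lower-priority tenant would receive a smaller memory cap and fewer threads than the tenant serving alert queries.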

Failover Strategy

To ensure high availability for both ingestion and query paths, Bonree ONE implements a robust failover mechanism.

When either:

  • Consumer nodes fail, or

  • ClickHouse nodes experience anomalies

the CH-Manager control layer detects the failure and reroutes traffic:

  • Redirects ingestion traffic away from failed nodes

  • Ensures uninterrupted query services

  • Adjusts ingestion strategies dynamically

  • Prevents cascading failures (snowball effects)
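The rerouting behavior can be sketched as a health-aware round-robin router. This is an assumption-based illustration, not CH-Manager: FailoverRouter, max_failures, and the node names are hypothetical. A node is ejected after a number of consecutive errors, its traffic is spread across all remaining healthy nodes rather than onto a single neighbor, and pick() returns None when every node is down so the caller can back off instead of retrying in a tight loop.

```python
import itertools
from typing import Dict, List, Optional

class FailoverRouter:
    """Round-robin over healthy nodes (a sketch, not CH-Manager itself).

    A node is ejected after max_failures consecutive errors and re-admitted
    by mark_healthy(), e.g. after a passing health check.
    """

    def __init__(self, nodes: List[str], max_failures: int = 3):
        self.nodes = list(nodes)
        self.max_failures = max_failures
        self.failures: Dict[str, int] = {n: 0 for n in self.nodes}
        self._rr = itertools.cycle(self.nodes)

    def healthy(self) -> List[str]:
        return [n for n in self.nodes if self.failures[n] < self.max_failures]

    def pick(self) -> Optional[str]:
        """Next healthy node in round-robin order, or None if all are down."""
        for _ in range(len(self.nodes)):
            node = next(self._rr)
            if self.failures[node] < self.max_failures:
                return node
        return None

    def report(self, node: str, ok: bool) -> None:
        """Record the outcome of a request; consecutive errors accumulate."""
        self.failures[node] = 0 if ok else self.failures[node] + 1

    def mark_healthy(self, node: str) -> None:
        self.failures[node] = 0
```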



Results

  • Write performance: The latest Bonree ONE release improves ingestion throughput by 3–5x over the spring release, with significantly better stability under peak traffic.

  • Read performance: In production public cloud environments, ClickHouse TP99 query latency is under one second.

  • System stability:

    • Single-node failure does not impact cluster-level ingestion or querying

    • Consumer node failures do not affect overall ingestion continuity

