Optimizing Storage Costs and Accelerating Big Queries in Telecommunication Log Analysis

OBJECTIVES/GOAL

The objective of this project was to help one of the world’s leading telecommunication service providers enhance its log analysis system to support real-time monitoring, threat tracing, and alerting while reducing storage costs. The work had two targets: storage strategies that cut costs by 50%, and query optimizations that let big queries finish within 1 second.

CHALLENGES

The customer faced several challenges with their existing log analysis system:

  • Managing and storing a massive volume of logs (15 billion new log entries per day) across multiple petabyte-scale clusters was expensive and resource-intensive.
  • Storage costs were escalating, placing a significant financial burden on the organization.
  • Queries on large datasets were slow, with some taking minutes to process, which hampered real-time monitoring and threat tracing.
  • Critical log data had to remain immediately accessible while storage costs came down, and balancing the two was difficult.

ACCOMPLISHMENTS

The implementation of storage and query optimization strategies resulted in significant achievements:

  • Storage Cost Reduction: Three storage strategies (ZSTD compression, tiered storage of hot and cold data, and differentiated replica numbers for different data partitions) led to a 50% reduction in storage costs.
  • Query Performance Improvement: Query strategies differentiated by data size (partitioning small tables by date, using materialized views for medium-sized tables, and employing the Aggregate Key model for large tables) enabled big queries to be processed within 1-2 seconds, significantly improving system responsiveness and real-time monitoring.
  • Enhanced Efficiency: Queries that previously took minutes now finish within milliseconds, so abnormal events and failures can be traced and located quickly. Queries on large tables containing billions of records complete in just a few seconds, improving overall system efficiency and user experience.
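
To illustrate why pre-aggregation speeds up queries on large tables, the following is a minimal Python sketch of the general idea behind an aggregate-key table: rows that share the same key are merged as they arrive, so a query reads one row per key rather than every raw log line. The record layout (date, device_id, bytes_sent), the function name pre_aggregate, and the sample data are hypothetical and are not taken from the customer's system.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical log record: (date, device_id, bytes_sent). These column names
# are illustrative assumptions, not the customer's actual schema.
LogRow = Tuple[str, str, int]
AggKey = Tuple[str, str]


def pre_aggregate(rows: Iterable[LogRow]) -> Dict[AggKey, Dict[str, int]]:
    """Merge rows that share the same (date, device_id) key at ingest time.

    Each key keeps a running event count and byte total, so a later query
    reads one row per key instead of scanning every raw log line.
    """
    table: Dict[AggKey, Dict[str, int]] = defaultdict(
        lambda: {"events": 0, "bytes_sent": 0}
    )
    for date, device_id, bytes_sent in rows:
        agg = table[(date, device_id)]
        agg["events"] += 1
        agg["bytes_sent"] += bytes_sent
    return dict(table)


# Made-up sample data for demonstration only.
raw_logs = [
    ("2024-01-01", "router-a", 120),
    ("2024-01-01", "router-a", 80),
    ("2024-01-01", "router-b", 50),
    ("2024-01-02", "router-a", 10),
]
aggregated = pre_aggregate(raw_logs)
print(len(raw_logs), "raw rows ->", len(aggregated), "aggregated rows")
print(aggregated[("2024-01-01", "router-a")])
```

The trade-off in this kind of design is that detail is given up at write time: only the aggregated values survive, which suits dashboards and alert counters better than forensic lookups of individual log lines.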

TECHNOLOGIES USED
  • ZSTD Compression Algorithm: Implemented to achieve a compression ratio of 10:1 for tables larger than 1TB, reducing storage costs while maintaining data integrity.
  • Tiered Storage: Utilized SSD for storing hot data (within the past 7 days), HDD for cooled-down data, and object storage for colder data, optimizing storage costs based on data access patterns.
  • Differentiated Replica Numbers: Applied to different data partitions based on access frequency, ensuring higher replication for newer data and lower replication for older data to balance performance and cost.
  • Materialized Views: Utilized for pre-computed result sets in tables ranging from 100GB to 1TB, improving query performance and reducing resource consumption.
  • Aggregate Key Model: Implemented for tables exceeding 100TB, enabling pre-aggregation of data to accelerate query processing and achieve near real-time response times.
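
The tiered storage and differentiated replica rules above can be sketched as a small placement policy in Python. Only the 7-day hot window comes from the text. The 90-day cutoff between HDD and object storage, the replica counts, the link between tier and replica count, and the names Placement and place_partition are all hypothetical values chosen for illustration, not the customer's actual configuration.

```python
from dataclasses import dataclass
from datetime import date

HOT_DAYS = 7    # from the text: hot data is "within the past 7 days"
WARM_DAYS = 90  # ASSUMPTION: the HDD-to-object-storage cutoff is not stated
# ASSUMPTION: illustrative replica counts; the text only says newer data
# gets more replicas than older data.
REPLICAS = {"ssd": 3, "hdd": 2, "object_storage": 1}


@dataclass(frozen=True)
class Placement:
    tier: str
    replicas: int


def place_partition(partition_date: date, today: date) -> Placement:
    """Pick a storage tier and replica count from the partition's age."""
    age_days = (today - partition_date).days
    if age_days < 0:
        raise ValueError("partition_date is in the future")
    if age_days <= HOT_DAYS:
        tier = "ssd"
    elif age_days <= WARM_DAYS:
        tier = "hdd"
    else:
        tier = "object_storage"
    return Placement(tier=tier, replicas=REPLICAS[tier])


# Example: classify three partitions of different ages.
today = date(2024, 6, 30)
for partition_date in (date(2024, 6, 29), date(2024, 5, 31), date(2023, 6, 30)):
    print(partition_date, place_partition(partition_date, today))
```

Keeping the thresholds and replica counts in named constants makes the cost and performance trade-off easy to adjust as access patterns change.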