Cameron Archer for Tinybird

Build a Real-time Database Query Performance Analytics API with Tinybird

Monitoring and analyzing database query performance is crucial for keeping an application fast and efficient. Slow or inefficient queries degrade the user experience and increase resource consumption, which drives up costs. This tutorial walks you through building a real-time API that ingests database query logs and exposes endpoints to analyze query trends, compare performance by query type, and identify slow queries for optimization.

We'll use Tinybird, a data analytics backend for software developers, to implement this solution. Tinybird lets you build real-time analytics APIs without setting up or managing the underlying infrastructure. It uses data sources to ingest and store your data, and pipes to transform it and expose it through high-performance APIs.

Understanding the data

Imagine your data looks like this:

{"query_id": "qid_7802", "database": "staging", "query_text": "SELECT * FROM table2 WHERE column2 = 2", "query_type": "SELECT", "user": "user2", "start_time": "2025-05-12 17:01:52", "end_time": "2025-05-12 17:01:54", "duration_ms": 2902, "rows_read": 737802, "bytes_read": 377737802, "memory_usage": 377737802, "status": "success", "error_message": "", "client_ip": "192.168.87.87"}
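If your application already wraps query execution, an event with this shape can be assembled in a few lines. The Python sketch below is illustrative only: build_query_log is a hypothetical helper, not part of Tinybird or any database driver, and it assumes you capture the start and end timestamps yourself. Deriving duration_ms from the two timestamps keeps the three timing fields consistent with each other.

```python
import json
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # same layout as the sample event's timestamps


def build_query_log(query_id, database, query_text, user, start, end,
                    rows_read=0, bytes_read=0, memory_usage=0,
                    error_message="", client_ip=""):
    """Hypothetical helper: assemble one query log event as a plain dict."""
    words = query_text.split()
    return {
        "query_id": query_id,
        "database": database,
        "query_text": query_text,
        # Assumption: the statement's first keyword is a good enough query type.
        "query_type": words[0].upper() if words else "UNKNOWN",
        "user": user,
        "start_time": start.strftime(TIME_FORMAT),
        "end_time": end.strftime(TIME_FORMAT),
        # Dividing two timedeltas yields a float number of milliseconds.
        "duration_ms": (end - start) / timedelta(milliseconds=1),
        "rows_read": rows_read,
        "bytes_read": bytes_read,
        "memory_usage": memory_usage,
        "status": "error" if error_message else "success",
        "error_message": error_message,
        "client_ip": client_ip,
    }


start = datetime(2025, 5, 12, 17, 1, 52)
event = build_query_log(
    "qid_7802", "staging", "SELECT * FROM table2 WHERE column2 = 2", "user2",
    start, start + timedelta(milliseconds=2902),
    rows_read=737802, client_ip="192.168.87.87",
)
print(json.dumps(event))
```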

Each event is the log of a single database query: its identifier, the database it ran against, the query text and type, the user and client IP, performance metrics such as duration, rows read, and memory usage, and the outcome of the query. To store this data in Tinybird, we create a data source with a schema that mirrors these fields. Here's how you can define such a data source in a .datasource file:

DESCRIPTION >
    Stores database query logs for performance analysis

SCHEMA >
    `query_id` String `json:$.query_id`,
    `database` String `json:$.database`,
    `query_text` String `json:$.query_text`,
    `query_type` String `json:$.query_type`,
    `user` String `json:$.user`,
    `start_time` DateTime `json:$.start_time`,
    `end_time` DateTime `json:$.end_time`, 
    `duration_ms` Float64 `json:$.duration_ms`,
    `rows_read` UInt64 `json:$.rows_read`,
    `bytes_read` UInt64 `json:$.bytes_read`,
    `memory_usage` UInt64 `json:$.memory_usage`,
    `status` String `json:$.status`,
    `error_message` String `json:$.error_message`,
    `client_ip` String `json:$.client_ip`

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(start_time)"
ENGINE_SORTING_KEY "start_time, database, query_type, user"

The table is partitioned by month and sorted by start_time, database, query_type, and user, which matches the time-range, database, and query-type filters we'll apply when querying the data. To ingest data, Tinybird's Events API lets you stream JSON/NDJSON events from your application frontend or backend with a simple HTTP request, which makes it a good fit for real-time data like query logs. Here's how you would use it:

curl -X POST "https://api.europe-west2.gcp.tinybird.co/v0/events?name=query_logs" \
     -H "Authorization: Bearer $TB_ADMIN_TOKEN" \
     -d '{"query_id": "b3f1c8a5-7d84-4f93-9c64-8f7b36c5e1d2", ... }'
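The same call can be made from application code. Here is a minimal Python sketch using only the standard library, assuming the host and data source name from the curl example; build_events_request is a hypothetical helper, and the two events are abbreviated for brevity (real events need every field in the schema). It batches several events into one NDJSON body and builds the request without sending it.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the regional host from the curl example; use your workspace's host.
TINYBIRD_HOST = "https://api.europe-west2.gcp.tinybird.co"


def build_events_request(events, token, datasource="query_logs"):
    """Build (but do not send) a POST request for the Events API."""
    url = f"{TINYBIRD_HOST}/v0/events?" + urllib.parse.urlencode({"name": datasource})
    # NDJSON: one JSON object per line.
    body = "\n".join(json.dumps(event) for event in events).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


request = build_events_request(
    [
        {"query_id": "qid_1", "duration_ms": 12.5},   # abbreviated event
        {"query_id": "qid_2", "duration_ms": 2902},   # abbreviated event
    ],
    token="<your token>",
)
print(request.get_method(), request.full_url)
# To actually send it (requires a valid token):
# urllib.request.urlopen(request)
```

Batching several events per request reduces the number of HTTP round trips when your application produces many query logs per second.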

In addition to the Events API, Tinybird also offers a Kafka connector for streaming data, and the Data Sources API and S3 connector for batch or file-based ingestion.

Transforming data and publishing APIs

Tinybird's pipes enable batch and real-time data transformations, and the creation of API endpoints. Let's explore how to use pipes to analyze our query logs.

Analyzing trends in query performance over time

The query_trends pipe aggregates query logs by hour, calculating metrics such as the average query duration and error count. This helps identify periods of high load or frequent errors.

DESCRIPTION >
    Analyzes trends in query performance over time

NODE query_trends_node
SQL >
    SELECT 
        toStartOfHour(start_time) as hour,
        count() as query_count,
        avg(duration_ms) as avg_duration_ms,
        ...
    GROUP BY hour
    ORDER BY hour DESC

TYPE endpoint

This SQL aggregates queries by the hour they were started, providing insights into the number of queries, their average duration, and other metrics over time. To call this API, you would use:

curl -X GET "https://api.europe-west2.gcp.tinybird.co/v0/pipes/query_trends.json?token=$TB_ADMIN_TOKEN&start_time=2023-06-01+00%3A00%3A00&end_time=2023-06-30+23%3A59%3A59&database=production&query_type=SELECT"

By changing the parameters, you can filter the data for specific databases or query types.
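When you vary the filters often, it can be easier to assemble the URL in code. Here is a minimal Python sketch, assuming the parameter names from the curl call above (start_time, end_time, database, query_type); pipe_url is a hypothetical helper. The token is read from the environment so that its value, not the variable name, ends up in the query string.

```python
import os
import urllib.parse

# Assumption: the regional host from the curl example; use your workspace's host.
TINYBIRD_HOST = "https://api.europe-west2.gcp.tinybird.co"


def pipe_url(pipe_name, token, **params):
    """Return the URL of a published pipe endpoint, skipping unset filters."""
    query = {"token": token}
    query.update({key: value for key, value in params.items() if value is not None})
    # urlencode escapes spaces and colons in the timestamps.
    return f"{TINYBIRD_HOST}/v0/pipes/{pipe_name}.json?" + urllib.parse.urlencode(query)


token = os.environ.get("TB_ADMIN_TOKEN", "<your token>")
print(pipe_url(
    "query_trends",
    token,
    start_time="2023-06-01 00:00:00",
    end_time="2023-06-30 23:59:59",
    database="production",
    query_type=None,  # no filter on query type
))
```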

Grouping query performance by type

The query_performance_by_type pipe groups performance metrics by query type, such as SELECT, UPDATE, or INSERT. This can help identify which query types are most resource-intensive.

DESCRIPTION >
    Analyzes query performance metrics grouped by query type

NODE query_performance_by_type_node
SQL >
    SELECT 
        query_type,
        count() as query_count,
        avg(duration_ms) as avg_duration_ms,
        ...
    GROUP BY query_type
    ORDER BY avg_duration_ms DESC

TYPE endpoint
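To make the aggregation concrete, here is a rough in-memory equivalent in Python. It covers only the columns visible in the pipe (query_count and avg_duration_ms); the elided columns are not reproduced, the function name is hypothetical, and the sample rows are made up for illustration.

```python
from collections import defaultdict


def performance_by_type(rows):
    """Group rows by query_type and sort by average duration, slowest first."""
    durations = defaultdict(list)
    for row in rows:
        durations[row["query_type"]].append(row["duration_ms"])
    stats = [
        {
            "query_type": query_type,
            "query_count": len(values),
            "avg_duration_ms": sum(values) / len(values),
        }
        for query_type, values in durations.items()
    ]
    return sorted(stats, key=lambda stat: stat["avg_duration_ms"], reverse=True)


sample_rows = [  # made-up rows, for illustration only
    {"query_type": "SELECT", "duration_ms": 100.0},
    {"query_type": "SELECT", "duration_ms": 300.0},
    {"query_type": "INSERT", "duration_ms": 50.0},
    {"query_type": "UPDATE", "duration_ms": 400.0},
]
for stat in performance_by_type(sample_rows):
    print(stat)
```

In Tinybird the same grouping runs inside the database over the full data source, so the client only receives one row per query type.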

Identifying slow queries

The slow_queries pipe returns only the queries whose duration exceeds a specified threshold, sorted from slowest to fastest and capped by a limit parameter (100 by default), so you can focus optimization work on the worst offenders.

DESCRIPTION >
    Identifies and analyzes slow queries that exceed a specified duration threshold

NODE slow_queries_node
SQL >
    SELECT 
        query_id,
        database,
        query_type,
        ...
    ORDER BY duration_ms DESC
    LIMIT {{Int32(limit, 100)}}

TYPE endpoint
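The filter, sort, and limit steps can be sketched in plain Python as follows. This is an approximation under stated assumptions: the pipe's WHERE clause is elided above, so the strict greater-than comparison is a guess, and duration_threshold_ms is the parameter name used in the example API call later in this tutorial. The sample rows are made up.

```python
def slow_queries(rows, duration_threshold_ms, limit=100):
    """Return up to `limit` rows slower than the threshold, slowest first."""
    # Assumption: strictly greater than; the pipe's actual comparison is not shown.
    slow = [row for row in rows if row["duration_ms"] > duration_threshold_ms]
    slow.sort(key=lambda row: row["duration_ms"], reverse=True)
    return slow[:limit]


sample_rows = [  # made-up rows, for illustration only
    {"query_id": "a", "duration_ms": 2500.0},
    {"query_id": "b", "duration_ms": 150.0},
    {"query_id": "c", "duration_ms": 9000.0},
    {"query_id": "d", "duration_ms": 2000.0},
]
for row in slow_queries(sample_rows, duration_threshold_ms=2000):
    print(row["query_id"], row["duration_ms"])
```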

Deploying to production

To deploy these data sources and pipes to Tinybird Cloud, run tb --cloud deploy with the Tinybird CLI. This command creates scalable, production-ready API endpoints. Because Tinybird manages these resources as code, your data infrastructure can be version-controlled and deployed through CI/CD pipelines. Secure your APIs with token-based authentication to keep your data protected. Here's an example API call:

curl -X GET "https://api.europe-west2.gcp.tinybird.co/v0/pipes/slow_queries.json?token=$TB_ADMIN_TOKEN&duration_threshold_ms=2000"

Conclusion

In this tutorial, you've learned how to ingest, transform, and expose database query logs as real-time analytics APIs using Tinybird. By leveraging data sources and pipes, you can efficiently analyze query performance trends, identify resource-intensive query types, and spotlight slow queries for optimization. Tinybird's scalable infrastructure and developer-centric tools enable rapid development and deployment of data-driven APIs. Sign up for Tinybird to build and deploy your first real-time data APIs in a few minutes.
