Add `/api/v2/test/grafana-table` endpoint to validate Grafana table format compatibility before implementing the full time range API.

- Create server/grafana.go with table format structures
- Add structured logging and OpenTelemetry tracing
- Include realistic NTP Pool sample data with null handling
- Set proper CORS and cache headers for testing
- Update implementation plan with Phase 0 completion status

Ready for Grafana JSON API data source integration testing.
DETAILED IMPLEMENTATION PLAN: Grafana Time Range API with Future Downsampling Support
Overview
Implement a new Grafana-compatible API endpoint /api/v2/server/scores/{server}.{mode} that returns time series data in Grafana format with time range support and future downsampling capabilities.
API Specification
Endpoint
- URL: `/api/v2/server/scores/{server}.{mode}`
- Method: GET
- Path Parameters:
  - `server`: Server IP address or ID (same validation as the existing API)
  - `mode`: Only `json` supported initially
Query Parameters (following Grafana conventions)
- `from`: Unix timestamp in milliseconds (required)
- `to`: Unix timestamp in milliseconds (required)
- `maxDataPoints`: Integer, default 50000, max 50000 (for future downsampling)
- `monitor`: Monitor ID, name prefix, or `*` for all (optional, same as the existing API)
- `interval`: Future downsampling interval such as `1m`, `5m`, or `1h` (optional, not implemented initially)
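For example, a hypothetical request for all monitors over a one-day window (the server IP is illustrative):

```
GET /api/v2/server/scores/203.0.113.42.json?from=1753345267000&to=1753431667000&maxDataPoints=50000&monitor=*
```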
Response Format
Grafana table format JSON array (more efficient than separate series):
```json
[
  {
    "target": "monitor{name=zakim1-yfhw4a}",
    "tags": {
      "monitor_id": "126",
      "monitor_name": "zakim1-yfhw4a",
      "type": "monitor",
      "status": "active"
    },
    "columns": [
      {"text": "time", "type": "time"},
      {"text": "score", "type": "number"},
      {"text": "rtt", "type": "number", "unit": "ms"},
      {"text": "offset", "type": "number", "unit": "s"}
    ],
    "values": [
      [1753431667000, 20.0, 18.865, -0.000267],
      [1753431419000, 20.0, 18.96, -0.000390],
      [1753431151000, 20.0, 18.073, -0.000768],
      [1753430063000, 20.0, 18.209, null]
    ]
  }
]
```
Implementation Details
1. Server Routing (server/server.go)
Add the new route after the existing scores routes:

```go
e.GET("/api/v2/server/scores/:server.:mode", srv.scoresTimeRange)
```
Key Implementation Clarifications
Monitor Filtering Behavior
- monitor=*: Return ALL monitors (no monitor count limit)
- 50k datapoint limit: Applied in database query (LIMIT clause)
- Return whatever data we get from the database to the user (no post-processing truncation)
Null Value Handling Strategy
- Score: Always include (should never be null)
- RTT: Skip datapoints where RTT is null
- Offset: Skip datapoints where offset is null
Time Range Validation Rules
- Zero duration: Return 400 Bad Request
- Future timestamps: Allow for now
- Minimum range: 1 second
- Maximum range: 90 days
2. New Handler Function (server/history.go)
Function Signature
```go
func (srv *Server) scoresTimeRange(c echo.Context) error
```
Parameter Parsing & Validation
```go
// Extend the existing historyParameters struct for time range support
type timeRangeParams struct {
	historyParameters // embed existing struct
	from          time.Time
	to            time.Time
	maxDataPoints int
	interval      string // for future downsampling
}
```
```go
func (srv *Server) parseTimeRangeParams(ctx context.Context, c echo.Context) (timeRangeParams, error) {
	// Start with the existing parameter parsing logic
	baseParams, err := srv.getHistoryParameters(ctx, c)
	if err != nil {
		return timeRangeParams{}, err
	}
	params := timeRangeParams{historyParameters: baseParams}
	// Parse and validate the from/to millisecond timestamps
	// Validate the time range (max 90 days, min 1 second)
	// Parse maxDataPoints (default 50000, max 50000)
	// Return the extended parameters
	return params, nil
}
```
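The elided validation could look like the following helper, which `parseTimeRangeParams` might call. This is a minimal sketch assuming Grafana-style millisecond timestamps; the helper name and error messages are illustrative:

```go
import (
	"net/http"
	"strconv"
	"time"

	"github.com/labstack/echo/v4"
)

// parseTimeRange applies the validation rules above: parse millisecond
// timestamps, then enforce the 1 second minimum and 90 day maximum
// (d < time.Second also covers zero duration and from >= to).
func parseTimeRange(c echo.Context) (from, to time.Time, err error) {
	fromMs, err := strconv.ParseInt(c.QueryParam("from"), 10, 64)
	if err != nil {
		return from, to, echo.NewHTTPError(http.StatusBadRequest, "invalid 'from' timestamp")
	}
	toMs, err := strconv.ParseInt(c.QueryParam("to"), 10, 64)
	if err != nil {
		return from, to, echo.NewHTTPError(http.StatusBadRequest, "invalid 'to' timestamp")
	}
	from, to = time.UnixMilli(fromMs), time.UnixMilli(toMs)
	switch d := to.Sub(from); {
	case d < time.Second:
		return from, to, echo.NewHTTPError(http.StatusBadRequest, "time range must be at least 1 second")
	case d > 90*24*time.Hour:
		return from, to, echo.NewHTTPError(http.StatusBadRequest, "time range must not exceed 90 days")
	}
	return from, to, nil
}
```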
Response Structure
```go
type ColumnDef struct {
	Text string `json:"text"`
	Type string `json:"type"`
	Unit string `json:"unit,omitempty"`
}

type GrafanaTableSeries struct {
	Target  string            `json:"target"`
	Tags    map[string]string `json:"tags"`
	Columns []ColumnDef       `json:"columns"`
	Values  [][]interface{}   `json:"values"`
}

type GrafanaTimeSeriesResponse []GrafanaTableSeries
```
Cache Control
```go
// Reuse the existing setHistoryCacheControl function for consistency.
// Logic based on data recency and entry count:
//   - Empty or >8h old data: "s-maxage=260,max-age=360"
//   - Single entry:          "s-maxage=60,max-age=35"
//   - Multiple entries:      "s-maxage=90,max-age=120"
setHistoryCacheControl(c, history)
```
3. ClickHouse Data Access (chdb/logscores.go)
New Method
```go
func (d *ClickHouse) LogscoresTimeRange(ctx context.Context, serverID, monitorID int, from, to time.Time, limit int) ([]ntpdb.LogScore, error) {
	// Build the query with a time range WHERE clause
	// Always order by ts ASC (Grafana convention)
	// Apply the limit to prevent memory issues
	// Use the same row scanning logic as the existing Logscores method
	return nil, nil // skeleton; see the sketch after the query structure below
}
```
Query Structure
```sql
SELECT id, monitor_id, server_id, ts,
       toFloat64(score), toFloat64(step), offset,
       rtt, leap, warning, error
FROM log_scores
WHERE server_id = ?
  AND ts >= ?
  AND ts <= ?
  [AND monitor_id = ?] -- if a specific monitor is requested
ORDER BY ts ASC
LIMIT ?
```
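One way the method might assemble that query, shown as a sketch that assumes the `ClickHouse` struct wraps a clickhouse-go `driver.Conn` in a `d.conn` field (an assumption about the chdb internals), with the row scanning elided:

```go
func (d *ClickHouse) LogscoresTimeRange(ctx context.Context, serverID, monitorID int, from, to time.Time, limit int) ([]ntpdb.LogScore, error) {
	query := `SELECT id, monitor_id, server_id, ts,
	                 toFloat64(score), toFloat64(step), offset,
	                 rtt, leap, warning, error
	          FROM log_scores
	          WHERE server_id = ? AND ts >= ? AND ts <= ?`
	args := []interface{}{serverID, from, to}
	if monitorID > 0 { // only filter when a specific monitor was requested
		query += " AND monitor_id = ?"
		args = append(args, monitorID)
	}
	query += " ORDER BY ts ASC LIMIT ?" // ascending time, per Grafana convention
	args = append(args, limit)

	rows, err := d.conn.Query(ctx, query, args...)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var scores []ntpdb.LogScore
	// ...scan each row into an ntpdb.LogScore with the same logic as the
	// existing Logscores method...
	return scores, rows.Err()
}
```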
4. Data Transformation Logic (server/history.go)
Core Transformation Function
```go
func transformToGrafanaTableFormat(history *logscores.LogScoreHistory, monitors []ntpdb.Monitor) GrafanaTimeSeriesResponse {
	// Group data by monitor_id (one series per monitor)
	// Create the table format with columns: time, score, rtt, offset
	// Convert timestamps to milliseconds
	// Build proper target names and tags
	// Handle null values appropriately in the table values
	return nil // skeleton; see the sketch after the grouping notes below
}
```
Grouping Strategy
- Group by Monitor: One table series per monitor
- Table Columns: time, score, rtt, offset (all metrics in one table)
- Target Naming: `monitor{name={sanitized_monitor_name}}`
- Tag Structure: Include monitor metadata (no metric type needed)
- Monitor Status: Query real monitor data using `q.GetServerScores()` like the existing API
- Series Ordering: No guaranteed order (standard Grafana behavior)
- Efficiency: More efficient than separate series, with less JSON overhead
Timestamp Conversion
```go
timestampMs := logScore.Ts.UnixMilli() // same as Ts.Unix()*1000 for whole seconds, but keeps sub-second precision
```
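Putting the grouping rules together, a hedged sketch of the transformation; the `LogScores`, `MonitorID`, `Rtt`, and `Offset` field names (as `sql.Null*` values) and the `monitorNameFor` helper are assumptions about the existing types, and nulls are passed through into the values array as in the sample response above:

```go
import (
	"fmt"
	"strconv"
)

func transformToGrafanaTableFormat(history *logscores.LogScoreHistory, monitors []ntpdb.Monitor) GrafanaTimeSeriesResponse {
	columns := []ColumnDef{
		{Text: "time", Type: "time"},
		{Text: "score", Type: "number"},
		{Text: "rtt", Type: "number", Unit: "ms"},
		{Text: "offset", Type: "number", Unit: "s"},
	}
	bySeries := map[int]*GrafanaTableSeries{}
	var order []int // first-seen monitor order, to keep output deterministic
	for _, ls := range history.LogScores {
		id := int(ls.MonitorID)
		s, ok := bySeries[id]
		if !ok {
			name := monitorNameFor(monitors, id) // hypothetical lookup helper
			s = &GrafanaTableSeries{
				Target: fmt.Sprintf("monitor{name=%s}", name),
				Tags: map[string]string{
					"monitor_id":   strconv.Itoa(id),
					"monitor_name": name,
					"type":         "monitor",
				},
				Columns: columns,
			}
			bySeries[id] = s
			order = append(order, id)
		}
		var rtt, offset interface{} // nil encodes as JSON null when the DB value is missing
		if ls.Rtt.Valid {
			rtt = ls.Rtt.Float64
		}
		if ls.Offset.Valid {
			offset = ls.Offset.Float64
		}
		s.Values = append(s.Values, []interface{}{ls.Ts.UnixMilli(), ls.Score, rtt, offset})
	}
	resp := make(GrafanaTimeSeriesResponse, 0, len(order))
	for _, id := range order {
		resp = append(resp, *bySeries[id])
	}
	return resp
}
```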
5. Error Handling
Validation Errors (400 Bad Request)
- Invalid timestamp format
- from >= to (including zero duration)
- Time range too large (> 90 days)
- Time range too small (< 1 second minimum)
- maxDataPoints > 50000
- Invalid mode (not "json")
Not Found Errors (404)
- Server not found
- Monitor not found
- Server deleted
Server Errors (500)
- ClickHouse connection issues
- Database query errors
6. Future Downsampling Design
API Extension Points
- `interval` parameter parsing ready
- `maxDataPoints` limit already enforced
- Response format supports downsampled data seamlessly
Downsampling Algorithm (Future Implementation)
```go
// When datapoints > maxDataPoints:
// 1. Calculate the downsample interval: (to - from) / maxDataPoints
// 2. Group data into time buckets
// 3. Aggregate per bucket: avg for score/rtt, last for offset
// 4. Return the aggregated datapoints
```
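For illustration, a bucketing sketch of that algorithm over one monitor's data; the `point` type is hypothetical and null handling is omitted for brevity:

```go
// point is a hypothetical flattened row for one monitor's series.
type point struct {
	tsMs   int64
	score  float64
	rtt    float64
	offset float64
}

// downsample groups ascending-ts points into fixed-width buckets and
// aggregates each bucket: average for score/rtt, last value for offset.
func downsample(points []point, fromMs, toMs int64, maxDataPoints int) []point {
	if maxDataPoints <= 0 || len(points) <= maxDataPoints {
		return points // nothing to do
	}
	bucketMs := (toMs - fromMs) / int64(maxDataPoints)
	if bucketMs < 1 {
		bucketMs = 1
	}
	var out []point
	for i := 0; i < len(points); {
		b := (points[i].tsMs - fromMs) / bucketMs
		agg := point{tsMs: fromMs + b*bucketMs} // bucket start as representative timestamp
		n := 0.0
		for ; i < len(points) && (points[i].tsMs-fromMs)/bucketMs == b; i++ {
			agg.score += points[i].score
			agg.rtt += points[i].rtt
			agg.offset = points[i].offset // "last" wins within the bucket
			n++
		}
		agg.score /= n
		agg.rtt /= n
		out = append(out, agg)
	}
	return out
}
```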
Testing Strategy
Unit Tests
- Parameter parsing and validation
- Data transformation logic
- Error handling scenarios
- Timestamp conversion accuracy
Integration Tests
- End-to-end API requests
- ClickHouse query execution
- Multiple monitor scenarios
- Large time range handling
Manual Testing
- Grafana integration testing
- Performance with various time ranges
- Cache behavior validation
Performance Considerations
Current Implementation
- 50k datapoint limit applied in the database query via a LIMIT clause (covers roughly a few weeks of data)
- ClickHouse-only for better range query performance
- Proper indexing on (server_id, ts) assumed
- Table format more efficient than separate time series (less JSON overhead)
Future Optimizations (Critical for Production)
- Downsampling for large ranges: Essential for 90-day queries with reasonable performance
- Query optimization based on range size
- Potential parallel monitor queries
- Adaptive sampling rates based on time range duration
Documentation Updates
API.md Addition
### 7. Server Scores Time Range (v2)
**GET** `/api/v2/server/scores/{server}.{mode}`
Grafana-compatible time series endpoint for NTP server scoring data.
#### Path Parameters
- `server`: Server IP address or ID
- `mode`: Response format (`json` only)
#### Query Parameters
- `from`: Start time as Unix timestamp in milliseconds (required)
- `to`: End time as Unix timestamp in milliseconds (required)
- `maxDataPoints`: Maximum data points to return (default: 50000, max: 50000)
- `monitor`: Monitor filter (ID, name prefix, or "*" for all)
#### Response Format
Grafana table format array with one series per monitor containing all metrics as columns.
Key Research Findings
Grafana Error Format Requirements
- HTTP Status Codes: Standard 400/404/500 work fine
- Response Body: JSON preferred, with `Content-Type: application/json`
- Structure: A simple `{"error": "message", "status": code}` is sufficient
- Compatibility: Existing Echo error patterns are Grafana-compatible
Data Volume Considerations
- 50k Datapoint Limit: Covers only roughly a few weeks of data, not sufficient for 90-day ranges
- Downsampling Critical: Required for production use with 90-day time ranges
- Current Approach: Acceptable for MVP, downsampling essential for full utility
Implementation Checklist
Phase 0: Grafana Table Format Validation ✅ COMPLETED
- Add test endpoint `/api/v2/test/grafana-table` returning sample table format
- Implement Grafana table format response structures in `server/grafana.go`
- Add structured logging and OpenTelemetry tracing to the test endpoint
- Verify endpoint compiles and serves correct JSON format
- Test endpoint response format and headers (CORS, Content-Type, Cache-Control)
- Test with actual Grafana instance to validate table format compatibility
- Confirm time series panels render table format correctly
- Validate column types and units display properly
Phase 0 Implementation Details
Files Created/Modified:
- `server/grafana.go`: New file containing the Grafana table format structures and test endpoint
- `server/server.go`: Added route `e.GET("/api/v2/test/grafana-table", srv.testGrafanaTable)`
Test Endpoint Features:
- URL: `http://localhost:8030/api/v2/test/grafana-table`
- Response Format: Grafana table format with realistic NTP Pool data
- Sample Data: Two monitor series (zakim1-yfhw4a, nj2-mon01) with time-based values
- Columns: time, score, rtt (ms), offset (s) with proper units
- Null Handling: Demonstrates null offset values
- Headers: CORS, JSON content-type, cache control
- Observability: Structured logging with context, OpenTelemetry tracing
Recommended Grafana Data Source: JSON API plugin (marcusolsson-json-datasource), ideal for REST APIs returning table-format JSON
Phase 1: Core Implementation
- Add route in server.go
- Implement parseTimeRangeParams function
- Add LogscoresTimeRange method to ClickHouse
- Implement transformToGrafanaTableFormat function
- Add scoresTimeRange handler
- Error handling and validation (reuse existing Echo patterns)
- Cache control headers (reuse setHistoryCacheControl)
Phase 2: Testing & Polish
- Unit tests for all functions
- Integration tests
- Manual Grafana testing with real data
- Performance testing with large ranges (up to 50k points)
- API documentation updates
Phase 3: Future Enhancement Ready
- Interval parameter parsing (no-op initially)
- Downsampling framework hooks (critical for 90-day ranges)
- Monitoring and metrics for new endpoint
This design provides a solid foundation for immediate Grafana integration while being fully prepared for future downsampling capabilities without breaking changes.
Critical Notes for Production
- Downsampling Required: 50k datapoint limit means 90-day ranges will hit limits quickly
- Table Format Validation: Phase 0 testing ensures Grafana compatibility before full implementation
- Error Handling: Existing Echo patterns are sufficient for Grafana requirements
- Scalability: Current design handles weeks of data well, downsampling needed for months