# SigNoz Setup Guide - Monitor Server

*Anh Hai Newton · DEV Community*

## 1. Environment

- OS: Ubuntu (snap docker)
- Docker path: `/var/snap/docker/common/var-lib-docker/`
- SigNoz path: `/root/signoz/`
- SigNoz version: v0.117.1
- ClickHouse version: 25.5.6
- CPU: no AVX2 support (important: this affects the configuration below)

Clone the repository:

```shell
cd /root
git clone -b main https://github.com/SigNoz/signoz.git
cd signoz/deploy/docker
```

## 2. Disable simdjson on CPUs without AVX2

CPUs without AVX2 will encounter `CANNOT_ALLOCATE_MEMORY` errors when SigNoz queries metrics via `JSONExtractString`. Check whether your CPU supports AVX2:

```shell
grep -o 'avx2' /proc/cpuinfo | head -1
# No output = no AVX2 support → apply the fix below
```

Edit `/root/signoz/deploy/common/clickhouse/users.xml` and add `<allow_simdjson>0</allow_simdjson>` to the `default` profile:

```xml
<profiles>
    <default>
        <max_memory_usage>10000000000</max_memory_usage>
        <allow_simdjson>0</allow_simdjson>
        <load_balancing>random</load_balancing>
        <!-- ... -->
    </default>
</profiles>
```

ClickHouse uses the simdjson library for JSON parsing functions (`JSONExtractString`, etc.). Starting from ClickHouse 24.1+, simdjson requires AVX2 CPU instructions. On CPUs without AVX2, any query using `JSONExtractString` on the `distributed_time_series_v4` table throws:

```
Code: 173. DB::Exception: Couldn't allocate N bytes when parsing JSON
```

Setting `allow_simdjson = 0` forces ClickHouse to fall back to rapidjson, which does not require AVX2. Without the fix, this breaks:

- the Infrastructure → Hosts tab (returns a 500 error)
- any dashboard querying metrics with label filters

## 3. Disable ClickHouse system logs

By default, ClickHouse writes internal diagnostic logs (`trace_log`, `metric_log`, etc.) that can grow to 70 GB+ over time. Create `/root/signoz/deploy/common/clickhouse/z_log_disable.xml` to turn them off. The `z_` prefix ensures this file is loaded last (configs in `config.d` are applied alphabetically), overriding any previous settings.
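The exact contents of `z_log_disable.xml` were not preserved above. A minimal sketch, assuming the goal is to remove every `system` log table except `query_log` (the tables named in the truncate commands later in this guide), using ClickHouse's `remove` override attribute:

```xml
<clickhouse>
    <!-- Sketch: drop the config sections for ClickHouse's internal log
         tables so they are no longer written. Table list assumed from the
         TRUNCATE commands in the maintenance section; query_log is kept. -->
    <trace_log remove="remove"/>
    <metric_log remove="remove"/>
    <asynchronous_metric_log remove="remove"/>
    <part_log remove="remove"/>
    <processors_profile_log remove="remove"/>
    <query_views_log remove="remove"/>
</clickhouse>
```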
## 4. Mount the config and configure SMTP

Mount this file into the ClickHouse container in `/root/signoz/deploy/docker/docker-compose.yaml`:

```yaml
  clickhouse:
    volumes:
      - ../common/clickhouse/config.xml:/etc/clickhouse-server/config.xml
      - ../common/clickhouse/users.xml:/etc/clickhouse-server/users.xml
      - ../common/clickhouse/custom-function.xml:/etc/clickhouse-server/custom-function.xml
      - ../common/clickhouse/user_scripts:/var/lib/clickhouse/user_scripts/
      - ../common/clickhouse/cluster.xml:/etc/clickhouse-server/config.d/cluster.xml
      - ../common/clickhouse/ttl.xml:/etc/clickhouse-server/config.d/ttl.xml
      - ../common/clickhouse/z_log_disable.xml:/etc/clickhouse-server/config.d/z_log_disable.xml  # ADD THIS LINE
      - clickhouse:/var/lib/clickhouse/
```

Add the following environment variables to the `signoz` service in `docker-compose.yaml`:

```yaml
  signoz:
    environment:
      - SIGNOZ_ALERTMANAGER_PROVIDER=signoz
      - SIGNOZ_TELEMETRYSTORE_CLICKHOUSE_DSN=tcp://clickhouse:9000
      - SIGNOZ_SQLSTORE_SQLITE_PATH=/var/lib/signoz/signoz.db
      # SMTP configuration
      - SIGNOZ_ALERTMANAGER_SIGNOZ_GLOBAL_SMTP__FROM=your-email@gmail.com
      - SIGNOZ_ALERTMANAGER_SIGNOZ_GLOBAL_SMTP__SMARTHOST=smtp.gmail.com:587
      - SIGNOZ_ALERTMANAGER_SIGNOZ_GLOBAL_SMTP__AUTH__USERNAME=your-email@gmail.com
      - SIGNOZ_ALERTMANAGER_SIGNOZ_GLOBAL_SMTP__AUTH__PASSWORD=your-16-char-app-password
```

**Gmail note:** You must use an App Password, not your regular Gmail password: https://myaccount.google.com/apppasswords

**Important:** The correct variable names use the `SIGNOZ_ALERTMANAGER_SIGNOZ_GLOBAL_SMTP__` prefix with double underscores. Using `SIGNOZ_ALERTMANAGER_SMTP_*` (without `SIGNOZ_GLOBAL`) will not work.
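Because the double-underscore prefix is easy to get wrong, a quick sanity check (a sketch; run it from `/root/signoz/deploy/docker`, where `docker-compose.yaml` lives) is to grep the compose file for the exact prefix before starting:

```shell
# Count SMTP variables that use the correct double-underscore prefix.
# A count of 0 means the variables are misnamed and alert emails will fail.
grep -c 'SIGNOZ_ALERTMANAGER_SIGNOZ_GLOBAL_SMTP__' docker-compose.yaml
```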
## 5. Start SigNoz

```shell
cd /root/signoz/deploy/docker
docker compose up -d
```

Verify all containers are healthy:

```shell
docker compose ps
```

Verify ClickHouse system logs are disabled:

```shell
docker exec signoz-clickhouse clickhouse-client --query "
SELECT name FROM system.tables
WHERE database='system' AND name LIKE '%log%'
ORDER BY name"
# Only query_log should remain
```

## 6. Monitor the main server with an otel-collector agent

Run an otel-collector agent on the main application server. The scraper and processor bodies below follow the standard hostmetrics agent layout; adjust the interval, scrapers, and `deployment.environment` value to your setup:

```shell
mkdir -p /opt/otel-agent
cat > /opt/otel-agent/config.yaml <<'EOF'
receivers:
  hostmetrics:
    collection_interval: 30s
    root_path: /hostfs
    scrapers:
      cpu: {}
      disk: {}
      filesystem: {}
      load: {}
      memory: {}
      network: {}

processors:
  resourcedetection:
    detectors: [env, system]
  resource/env:
    attributes:
      - key: deployment.environment
        value: production
        action: upsert
  batch: {}

exporters:
  otlp:
    endpoint: "<MONITOR_IP>:4317"
    tls:
      insecure: true

extensions:
  health_check:

service:
  extensions: [health_check]
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [resourcedetection, resource/env, batch]
      exporters: [otlp]
EOF
```

Replace `<MONITOR_IP>` with your SigNoz monitor server IP. Then start the agent:

```shell
docker run -d \
  --name otel-agent \
  --restart unless-stopped \
  --hostname navio-main-server \
  -v /opt/otel-agent/config.yaml:/etc/otelcol-contrib/config.yaml \
  -v /:/hostfs:ro \
  -e HOST_PROC=/hostfs/proc \
  -e HOST_SYS=/hostfs/sys \
  -e HOST_ETC=/hostfs/etc \
  -e HOST_VAR=/hostfs/var \
  -e HOST_RUN=/hostfs/run \
  -e HOST_DEV=/hostfs/dev \
  otel/opentelemetry-collector-contrib:latest
```

After a few minutes, the host should appear in SigNoz UI → Infrastructure → Hosts.
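Before starting the container, it can be worth checking that every component referenced in the pipeline is also defined in the generated file, since the collector refuses to start when a pipeline names an undefined component. A small sketch (assuming the config path used above):

```shell
# Each pipeline component must also be declared in its own top-level
# section (receivers/processors/exporters), or the collector exits at startup.
for c in hostmetrics resourcedetection resource/env batch otlp; do
  grep -q "$c" /opt/otel-agent/config.yaml && echo "ok: $c" || echo "MISSING: $c"
done
```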
## 7. Disk usage alert

Go to SigNoz UI → Alerts → New Alert → Metric Based Alert:

- **Query A (used disk):** metric `system.filesystem.usage`, filter `state = used AND host.name = <your-host>`; time aggregation: `avg`, space aggregation: `sum`
- **Query B (total disk, all states):** metric `system.filesystem.usage`, filter `host.name = <your-host>`; time aggregation: `avg`, space aggregation: `sum`
- **Formula F1:** `A / B * 100`
- **Conditions:** Warning above 80, Critical above 90; evaluation window: last 5 min
- **Alert annotation:** `Disk usage has exceeded {{$threshold}}% (current: {{$value}}%)`

## 8. Maintenance and troubleshooting

Routine checks:

```shell
# Check disk usage on the monitor server
df -h /

# Verify ClickHouse system logs are not growing
docker exec signoz-clickhouse clickhouse-client --query "
SELECT table, formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE database = 'system'
GROUP BY table
ORDER BY sum(bytes_on_disk) DESC"

# Clean up unused Docker images on the main server
docker image prune -af

# Check Docker disk usage
docker system df
```

If the system log tables have already grown large, truncate them:

```shell
# Enter clickhouse-client
docker exec -it signoz-clickhouse clickhouse-client

# Create the force-drop flag (run in a separate terminal)
docker exec signoz-clickhouse bash -c \
  "touch /var/lib/clickhouse/flags/force_drop_table && \
   chmod 666 /var/lib/clickhouse/flags/force_drop_table"
```

```sql
-- Truncate all large log tables
TRUNCATE TABLE system.trace_log;
TRUNCATE TABLE system.metric_log;
TRUNCATE TABLE system.query_log;
TRUNCATE TABLE system.processors_profile_log;
TRUNCATE TABLE system.part_log;
TRUNCATE TABLE system.asynchronous_metric_log;
TRUNCATE TABLE system.query_views_log;
```

**Infrastructure → Hosts returns a 500 error.** Root cause: the CPU does not support AVX2 instructions.

```shell
# Check AVX2 support
grep -o 'avx2' /proc/cpuinfo | head -1
# No output = no AVX2
```

Fix: add `<allow_simdjson>0</allow_simdjson>` to the ClickHouse `users.xml` as described in Section 2.

**A host does not appear in the Hosts tab.**

```shell
docker logs otel-agent 2>&1 | tail -20
# Check that host.name is detected correctly
# Check that the endpoint IP is correct
```
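If the disk alert values look wrong, the formula F1 (`used / total * 100`) can be reproduced locally to cross-check what the alert computes. A sketch using GNU `df` (the `--output` option is coreutils-specific) and `awk`:

```shell
# Compute used-disk percentage for / the same way as formula F1
used=$(df --output=used / | tail -1)    # used 1K-blocks
total=$(df --output=size / | tail -1)   # total 1K-blocks
awk -v u="$used" -v t="$total" 'BEGIN { printf "%.1f%%\n", u / t * 100 }'
```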