Professional PostgreSQL monitoring made easy
Kaarel Moppel
www.cybertec.at
Kaarel Moppel
www.cybertec.at
Why to monitor
Failure / Downtime detection
Slowness / Performance analysis
Proactive predictions
Maybe wasting money?
Kaarel Moppel
www.cybertec.at
Dierent levels of Database monitoring
Kaarel Moppel
www.cybertec.at
High level service availability
Try to periodically connect/query from an outside system
DIY - e.g. a simple Cron script
SaaS - lots of service providers
Who will guard the guards themselves?
You’ll probably want two services for more critical stu
Kaarel Moppel
www.cybertec.at
System monitoring
Operating System / Process monitoring
DIY involving typically a TSDB and some graphing/alerting
engine
Graphite, RRDtool, OpenTSDB
Nagios / Icinga /
Something provided out-of-the-box by cloud providers usually
Included in VM software like VMware vSphere etc
Kaarel Moppel
www.cybertec.at
System monitoring
Make sure to understand what you’re measuring!
Do you know what does the CPU load number actually mean?
Is it a good metric?
What’s the dierence between VIRT, RES, SHR memory
values for a process?
Kaarel Moppel
www.cybertec.at
PostgreSQL land
Kaarel Moppel
www.cybertec.at
Log analysis
“Just in case” storing of logs for possible ad hoc needs.
Moving logs to a central place makes sense.
Cron + rsync
(r)syslog(-ng), redislog
Active parsing
DIY (Graylog, ELK, …)
pgBadger
Some cloud service (Loggly, Splunk, …)
Kaarel Moppel
www.cybertec.at
Logging conguration
Settings to note
log_destination (CSV format recommended)
log_statement
log_min_duration_statement
log_min_messages / log_min_error_statement
krl@postgres=# SELECT count(*) FROM pg_settings
WHERE category LIKE 'Reporting and Logging%';
count
------
35
(1 row)
Kaarel Moppel
www.cybertec.at
Stats Collector
Not all track_* parameters enabled by default
Dynamic views
pg_stat_activity, pg_stat_replication/pg_stat_wal_receiver,
pg_stat_ssl
Cumulative views
Most pg_stat_* views
Long uptimes cause “lag” for problem detection
Selective stats reset possible
Kaarel Moppel
www.cybertec.at
Stats Collector
pg_stat_database
pg_stat(io)_user_tables
pg_stat(io)_user_indexes
pg_stat_user_functions
(see “\dv pg_stat*“, 31 views for PG 10)
Kaarel Moppel
www.cybertec.at
Extensions
Most notably pg_stat_statments
pgstattuple
pg_buercache
auto_explain
Kaarel Moppel
www.cybertec.at
Locks
Separate from Stats Collector
pg_locks
pg_stat_activity
wait_event_type/wait_event (9.6+, very detailed info)
log_lock_waits (uses deadlock_timeout)
Kaarel Moppel
www.cybertec.at
Autovacuum
For busy databases monitor also Autovacuum
pg_stat_progress_vacuum
pg_stat_activity WHERE query LIKE ‘autovacuum%’
If Autovacuum is lagging behind you’ll end up with unecessary
bloat
Tip: idle_in_transaction_session_timeout /
old_snapshot_threshold
Kaarel Moppel
www.cybertec.at
Real life
Mixed approach for bigger setups
DYI
Log collection / parsing
Continuous storing of pg_stat* snapshots via some tool
Alerting and trends predictions (it’s hard!)
APM
A more high level concept, requires some trust / lock-in
AppDynamics, New Relic, DataDog,
Kaarel Moppel
www.cybertec.at
PostgreSQL Monitoring Tools
Kaarel Moppel
www.cybertec.at
No shortage of tools
https://wiki.postgresql.org/wiki/Monitoring
Kaarel Moppel
www.cybertec.at
Approaches
Ad hoc
Continuous monitoring frameworks
Cloud / SaaS
DIY
Kaarel Moppel
www.cybertec.at
Ad hoc monitoring / troubleshooting
Kaarel Moppel
www.cybertec.at
Open Source Ad hoc tools
pgAdmin4
pg_activity
pg_view
pgcenter
pghero
Kaarel Moppel
www.cybertec.at
Continuous monitoring frameworks
Kaarel Moppel
www.cybertec.at
Commercial
AppDynamics
New Relic
Datadog
Vividcortex
EDB Enterprise Manager
pganalyze
Most also have some free version with basic features
Kaarel Moppel
www.cybertec.at
Open Source
Genral Monitoring Frameworks
Nagios
Icinga
Munin
Zabbix
check_postgres script
Kaarel Moppel
www.cybertec.at
Open Source
Postgres specic
pghero
PoWa (server side, quite advanced - pg_qualstats,
pg_stat_kcache)
PgObserver (client side + ad hoc)
pgwatch2 (client side)
Kaarel Moppel
www.cybertec.at
pgwatch2
Kaarel Moppel
www.cybertec.at
Main principles - why another tool?
1-minute setup
Docker
Custom visuals / Dashboarding
Non-invasive
No extensions for main functionality
Easy extensibility
SQL metrics
Do minimal work needed, use existing SW
Kaarel Moppel
www.cybertec.at
Architecture components
Metrics gathering daemon
Go
Cong database
Postgres
Metrics storage layer
InuxDB (Graphite possible)
Web UI for administration
Python / Bootstrap
Easy dashboarding with data discovery
Grafana
Kaarel Moppel
www.cybertec.at
Features
Ready to go
Default cover almost all pg_stat* views
Test database (possible to disable) as playground
Supports Postgres 9.0+ (older versions also possible)
Security (SSL)
Custom metrics via SQL, also for business layer!
Reuse of existing components (Postgres, Grafana, InuxDB)
possible
Can be integrated with your “cloud”
Features
Component logs available via Web UI for troubleshooting
Possible to monitor all databases of a cluster automatically
Change detection
Added/changed/deleted table/index/sproc/cong events
Alerting easily possible
Grafana
Kapacitor (“K” from InuxData’s TICK stack)
Extensible - Grafana has plugins!
Getting started
1. docker run -d -p 3000:3000 -p 8080:8080 \
--name pw2 cybertec/pgwatch2
2. Wait some seconds and open browser at localhost:8080
3. Insert your DB connection strings and wait some minutes
4. Start Dashboarding!
Alerting / Anomaly detection
Grafana
Eeasy setup, point and click
Most important alerting services covered
Email
PagerDuty
Slack
Web hooks
Kafka
Graph panel only
Kapacitor
Part of the InuxData’s TICK stack
Harder to get going but very powerful!
Features
Extensive math/string processing support
Statistical data mangling
UDF-s
Alert topics - pub/sub
Stream caching (e.g. last 10min moving average)
Stream redirection - store transformed data back into InuxDB
Kapacitor sample - simple
stream
|from()
.measurement('cpu')
|alert()
.crit(lambda: "usage_idle" < 70)
.log('/tmp/alerts.log')
.email()
Kapacitor sample - complex
|from()
.measurement('cpu')
|groupBy('service', 'datacenter')
|window()
.period(1m)
|percentile('load_1min', 95.0)
|eval(lambda: sigma("percentile"))
.as('sigma')
|alert()
.id('{{ .Name }}/{{ index .Tags "service" }}/{{ index .Tags "datacenter"}}')
.message('{{ .ID }} is {{ .Level }} cpu-95th:{{ index .Fields "percentile" }}')
.crit(lambda: "sigma" > 3.0)
pgwatch2 - What’s next?
Improvement areas
More system level metrics
Better wrappers for cpu, disk, mem
Better query text handling
Web UI has pg_stat_statements overview
Fully automatic Docker updates
Log parsing?
User input expected @ github.com/cybertec-postgresql/pgwatch2
Contact us
Web: www.cybertec.at
Github: github.com/cybertec-postgresql
Twitter: @PostgresSupport