matrix-docker-ansible-deploy/docs/configuring-playbook-prometheus-grafana.md

163 lines
13 KiB
Markdown
Raw Normal View History

<!--
SPDX-FileCopyrightText: 2021 MDAD Team and contributors
SPDX-License-Identifier: AGPL-3.0-or-later
-->
# Enabling metrics and graphs (Prometheus, Grafana) for your Matrix server (optional)
2021-01-29 09:59:27 +00:00
The playbook can install [Grafana](https://grafana.com/) with [Prometheus](https://prometheus.io/) and configure performance metrics of your homeserver with graphs for you.
2021-01-29 09:59:27 +00:00
## Adjusting the playbook configuration
2021-01-29 09:59:27 +00:00
To enable Grafana and/or Prometheus, add the following configuration to your `inventory/host_vars/matrix.example.com/vars.yml` file:
2021-01-29 09:59:27 +00:00
```yaml
prometheus_enabled: true
2021-01-29 09:59:27 +00:00
# You can remove this, if unnecessary.
prometheus_node_exporter_enabled: true
2021-01-29 09:59:27 +00:00
# You can remove this, if unnecessary.
prometheus_postgres_exporter_enabled: true
# You can remove this, if unnecessary.
matrix_prometheus_nginxlog_exporter_enabled: true
grafana_enabled: true
grafana_anonymous_access: false
# This has no relation to your Matrix user ID. It can be any username you'd like.
2021-02-12 12:02:53 +00:00
# Changing the username subsequently won't work.
grafana_default_admin_user: "some_username_chosen_by_you"
# Changing the password subsequently won't work.
grafana_default_admin_password: "some_strong_password_chosen_by_you"
2021-01-29 09:59:27 +00:00
```
The retention policy of Prometheus metrics is [15 days by default](https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects). Older data gets deleted automatically.
### Adjusting the Grafana URL
By default, this playbook installs Grafana web user-interface on the `stats.` subdomain (`stats.example.com`) and requires you to [adjust your DNS records](#adjusting-dns-records).
By tweaking the `grafana_hostname` variable, you can easily make the service available at a **different hostname** than the default one.
Example additional configuration for your `inventory/host_vars/matrix.example.com/vars.yml` file:
```yaml
# Change the default hostname
grafana_hostname: grafana.example.com
```
## Adjusting DNS records
Once you've decided on the domain, **you may need to adjust your DNS** records to point the Grafana domain to the Matrix server.
By default, you will need to create a CNAME record for `stats`. See [Configuring DNS](configuring-dns.md) for details about DNS changes.
**Note**: It is possible to install Prometheus without installing Grafana. This case it is not required to create the CNAME record.
## Installing
Edit descriptions about installation of components (#3842) * Replace installation command shortcut for the "just" program with the most conservative raw ansible-playbook command This commit replaces installation command shortcut ("recipe") for the "just" program with the raw ansible-playbook command, so that the shortcut will be added to it later. The command is so conservative that failure of the command will mean something is clearly broken. Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> * Add comments about using setup-all instead of install-all Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> * Add description about shortcut command with the "just" program to the ansible-playbook command with "setup-all" and "start" tags It also explains difference between "just install-all" and "just setup-all" recipes. The explanation is based on docs/playbook-tags.md Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> * Update raw ansible-playbook command to have it do what "just install-all" or "just setup-all" does Since "just install-all" or "just setup-all" invokes "ensure-matrix-users-created" as well, it needs adding to the raw ansible-playbook command. Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> * Remove "ensure-matrix-users-created" from the raw ansible-playbook command which does not need it Also: update the "just" recipes accordingly. "just install-all" and "just setup-all" run "ensure-matrix-users-created" tag as well, therefore they need to be replaced with "run-tags" recipes to skip "ensure-matrix-users-created" Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> * Update docs/configuring-playbook-etherpad.md: add ensure-matrix-users-created to the raw ansible-playbook Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> * Add description about "ensure-matrix-users-created" and create a list with description about shortcut commands with "just" This commit also fixes list item capitalization and punctuation. Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> * Add notes bullet lists Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> * Update docs/configuring-playbook-matrix-corporal.md and docs/configuring-playbook-email2matrix.md: adopt common instructions Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> * Replace "run the installation command" with "run the playbook with tags" Now that shortcut commands for the "just" program are displayed along with the existing "installation command", this commit replaces "run the installation command" with "run the playbook with tags" in order to prevent misunderstanding and confusion. Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> * Add notes about changing passwords of users specified on vars.yml Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> * Update docs/configuring-playbook-synapse-admin.md: add the playbook command and just recipes Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> * Remove redundant blank lines Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> * Update docs/configuring-playbook-alertmanager-receiver.md: remove the direction to proceed to Usage Such a kind of direction is not used on other documentation, so it should be fine to just remove it. Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> * Update docs/importing-synapse-media-store.md: code block for ansible-playbook Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> --------- Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> Co-authored-by: Suguru Hirahara <acioustick@noreply.codeberg.org>
2024-12-01 07:42:30 +00:00
After configuring the playbook and potentially [adjusting your DNS records](#adjusting-dns-records), run the playbook with [playbook tags](playbook-tags.md) as below:
<!-- NOTE: let this conservative command run (instead of install-all) to make it clear that failure of the command means something is clearly broken. -->
```sh
ansible-playbook -i inventory/hosts setup.yml --tags=setup-all,start
```
The shortcut commands with the [`just` program](just.md) are also available: `just install-all` or `just setup-all`
Edit descriptions about installation of components (#3842) * Replace installation command shortcut for the "just" program with the most conservative raw ansible-playbook command This commit replaces installation command shortcut ("recipe") for the "just" program with the raw ansible-playbook command, so that the shortcut will be added to it later. The command is so conservative that failure of the command will mean something is clearly broken. Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> * Add comments about using setup-all instead of install-all Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> * Add description about shortcut command with the "just" program to the ansible-playbook command with "setup-all" and "start" tags It also explains difference between "just install-all" and "just setup-all" recipes. The explanation is based on docs/playbook-tags.md Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> * Update raw ansible-playbook command to have it do what "just install-all" or "just setup-all" does Since "just install-all" or "just setup-all" invokes "ensure-matrix-users-created" as well, it needs adding to the raw ansible-playbook command. Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> * Remove "ensure-matrix-users-created" from the raw ansible-playbook command which does not need it Also: update the "just" recipes accordingly. "just install-all" and "just setup-all" run "ensure-matrix-users-created" tag as well, therefore they need to be replaced with "run-tags" recipes to skip "ensure-matrix-users-created" Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> * Update docs/configuring-playbook-etherpad.md: add ensure-matrix-users-created to the raw ansible-playbook Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> * Add description about "ensure-matrix-users-created" and create a list with description about shortcut commands with "just" This commit also fixes list item capitalization and punctuation. Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> * Add notes bullet lists Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> * Update docs/configuring-playbook-matrix-corporal.md and docs/configuring-playbook-email2matrix.md: adopt common instructions Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> * Replace "run the installation command" with "run the playbook with tags" Now that shortcut commands for the "just" program are displayed along with the existing "installation command", this commit replaces "run the installation command" with "run the playbook with tags" in order to prevent misunderstanding and confusion. Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> * Add notes about changing passwords of users specified on vars.yml Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> * Update docs/configuring-playbook-synapse-admin.md: add the playbook command and just recipes Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> * Remove redundant blank lines Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> * Update docs/configuring-playbook-alertmanager-receiver.md: remove the direction to proceed to Usage Such a kind of direction is not used on other documentation, so it should be fine to just remove it. Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> * Update docs/importing-synapse-media-store.md: code block for ansible-playbook Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> --------- Signed-off-by: Suguru Hirahara <acioustick@noreply.codeberg.org> Co-authored-by: Suguru Hirahara <acioustick@noreply.codeberg.org>
2024-12-01 07:42:30 +00:00
`just install-all` is useful for maintaining your setup quickly ([2x-5x faster](../CHANGELOG.md#2x-5x-performance-improvements-in-playbook-runtime) than `just setup-all`) when its components remain unchanged. If you adjust your `vars.yml` to remove other components, you'd need to run `just setup-all`, or these components will still remain installed. Note these shortcuts run the `ensure-matrix-users-created` tag too.
2021-01-29 09:59:27 +00:00
## What does it do?
Name | Description
-----|----------
`prometheus_enabled`|[Prometheus](https://prometheus.io) is a time series database. It holds all the data we're going to talk about.
`prometheus_node_exporter_enabled`|[Node Exporter](https://prometheus.io/docs/guides/node-exporter/) is an addon of sorts to Prometheus that collects generic system information such as CPU, memory, filesystem, and even system temperatures
`prometheus_postgres_exporter_enabled`|[Postgres Exporter](configuring-playbook-prometheus-postgres.md) is an addon of sorts to expose Postgres database metrics to Prometheus.
`matrix_prometheus_nginxlog_exporter_enabled`|[NGINX Log Exporter](configuring-playbook-prometheus-nginxlog.md) is an addon of sorts to expose NGINX logs to Prometheus.
`grafana_enabled`|[Grafana](https://grafana.com/) is the visual component. It shows (on the `stats.example.com` subdomain) the dashboards with the graphs that we're interested in
`grafana_anonymous_access`|By default you need to log in to see graphs. If you want to publicly share your graphs (e.g. when asking for help in [`#synapse:matrix.org`](https://matrix.to/#/#synapse:matrix.org?via=matrix.org&via=privacytools.io&via=mozilla.org)) you'll want to enable this option.
`grafana_default_admin_user`<br>`grafana_default_admin_password`|By default Grafana creates a user with `admin` as the username and password. If you feel this is insecure and you want to change it beforehand, you can do that here
2021-01-29 09:59:27 +00:00
## Security and privacy
2021-01-31 17:43:47 +00:00
Metrics and resulting graphs can contain a lot of information. This includes system specs but also usage patterns. This applies especially to small personal/family scale homeservers. Someone might be able to figure out when you wake up and go to sleep by looking at the graphs over time. Think about this before enabling anonymous access. And you should really not forget to change your Grafana password.
Most of our docker containers run with limited system access, but the `prometheus-node-exporter` has access to the host network stack and (readonly) root filesystem. This is required to report on them. If you don't like that, you can set `prometheus_node_exporter_enabled: false` (which is actually the default). You will still get Synapse metrics with this container disabled. Both of the dashboards will always be enabled, so you can still look at historical data after disabling either source.
## Collecting metrics to an external Prometheus server
**If the integrated Prometheus server is enabled** (`prometheus_enabled: true`), metrics are collected by it from each service via communication that happens over the container network. Each service does not need to expose its metrics "publicly".
When you'd like **to collect metrics from an external Prometheus server**, you need to expose service metrics outside of the container network.
The playbook provides a single endpoint (`https://matrix.example.com/metrics/*`), under which various services may expose their metrics (e.g. `/metrics/node-exporter`, `/metrics/postgres-exporter`, `/metrics/hookshot`, etc). To expose all services on this `/metrics/*` feature, use `matrix_metrics_exposure_enabled`. To protect access using [Basic Authentication](https://en.wikipedia.org/wiki/Basic_access_authentication), see `matrix_metrics_exposure_http_basic_auth_enabled` and `matrix_metrics_exposure_http_basic_auth_users` below.
When using `matrix_metrics_exposure_enabled`, you don't need to expose metrics for individual services one by one.
The following variables may be of interest:
Name | Description
-----|----------
`matrix_metrics_exposure_enabled`|Set this to `true` to **enable metrics exposure for all services** on `https://matrix.example.com/metrics/*`. If you think this is too much, refer to the helpful (but nonexhaustive) list of individual `matrix_SERVICE_metrics_proxying_enabled` (or similar) variables below for exposing metrics on a per-service basis.
`matrix_metrics_exposure_http_basic_auth_enabled`|Set this to `true` to protect all `https://matrix.example.com/metrics/*` endpoints with [Basic Authentication](https://en.wikipedia.org/wiki/Basic_access_authentication) (see the other variables below for supplying the actual credentials). When enabled, all endpoints beneath `/metrics` will be protected with the same credentials
`matrix_metrics_exposure_http_basic_auth_users`|Set this to the Basic Authentication credentials (raw `htpasswd` file content) used to protect `/metrics/*`. This htpasswd-file needs to be generated with the `htpasswd` tool and can include multiple username/password pairs.
`matrix_synapse_metrics_enabled`|Set this to `true` to make Synapse expose metrics (locally, on the container network)
`matrix_synapse_metrics_proxying_enabled`|Set this to `true` to expose Synapse's metrics on `https://matrix.example.com/metrics/synapse/main-process` and `https://matrix.example.com/metrics/synapse/worker/TYPE-ID`. Read [below](#collecting-synapse-worker-metrics-to-an-external-prometheus-server) if you're running a Synapse worker setup (`matrix_synapse_workers_enabled: true`). To password-protect the metrics, see `matrix_metrics_exposure_http_basic_auth_users` above.
`prometheus_node_exporter_enabled`|Set this to `true` to enable the node (general system stats) exporter (locally, on the container network)
`prometheus_node_exporter_container_labels_traefik_enabled`|Set this to `true` to expose the node (general system stats) metrics on `https://matrix.example.com/metrics/node-exporter`. To password-protect the metrics, see `matrix_metrics_exposure_http_basic_auth_users` above.
`prometheus_postgres_exporter_enabled`|Set this to `true` to enable the [Postgres exporter](configuring-playbook-prometheus-postgres.md) (locally, on the container network)
`prometheus_postgres_exporter_container_labels_traefik_enabled`|Set this to `true` to expose the [Postgres exporter](configuring-playbook-prometheus-postgres.md) metrics on `https://matrix.example.com/metrics/postgres-exporter`. To password-protect the metrics, see `matrix_metrics_exposure_http_basic_auth_users` above.
`matrix_prometheus_nginxlog_exporter_enabled`|Set this to `true` to enable the [NGINX Log exporter](configuring-playbook-prometheus-nginxlog.md) (locally, on the container network)
2024-06-10 21:30:22 +00:00
`matrix_sliding_sync_metrics_enabled`|Set this to `true` to make [Sliding Sync](configuring-playbook-sliding-sync-proxy.md) expose metrics (locally, on the container network)
`matrix_sliding_sync_metrics_proxying_enabled`|Set this to `true` to expose the [Sliding Sync](configuring-playbook-sliding-sync-proxy.md) metrics on `https://matrix.example.com/metrics/sliding-sync`. To password-protect the metrics, see `matrix_metrics_exposure_http_basic_auth_users` above.
`matrix_bridge_hookshot_metrics_enabled`|Set this to `true` to make [Hookshot](configuring-playbook-bridge-hookshot.md) expose metrics (locally, on the container network)
`matrix_bridge_hookshot_metrics_proxying_enabled`|Set this to `true` to expose the [Hookshot](configuring-playbook-bridge-hookshot.md) metrics on `https://matrix.example.com/metrics/hookshot`. To password-protect the metrics, see `matrix_metrics_exposure_http_basic_auth_users` above.
`matrix_SERVICE_metrics_proxying_enabled`|Various other services/roles may provide similar `_metrics_enabled` and `_metrics_proxying_enabled` variables for exposing their metrics. Refer to each role for details. To password-protect the metrics, see `matrix_metrics_exposure_http_basic_auth_users` above or `matrix_SERVICE_container_labels_metrics_middleware_basic_auth_enabled`/`matrix_SERVICE_container_labels_metrics_middleware_basic_auth_users` variables provided by each role.
2023-07-12 06:09:27 +00:00
`matrix_media_repo_metrics_enabled`|Set this to `true` to make media-repo expose metrics (locally, on the container network)
### Collecting Synapse worker metrics to an external Prometheus server
2021-10-19 21:32:28 +00:00
If you are using workers (`matrix_synapse_workers_enabled: true`) and have enabled `matrix_synapse_metrics_proxying_enabled` as described above, the playbook will also automatically expose all Synapse worker threads' metrics to `https://matrix.example.com/metrics/synapse/worker/ID`, where `ID` corresponds to the worker `id` as exemplified in `matrix_synapse_workers_enabled_list`.
2021-10-19 21:32:28 +00:00
The playbook also generates an exemplary config file (`/matrix/synapse/external_prometheus.yml.template`) with all the correct paths which you can copy to your Prometheus server and adapt to your needs. Make sure to edit the specified `password_file` path and contents and path to your `synapse-v2.rules`. It will look a bit like this:
```yaml
scrape_configs:
- job_name: 'synapse'
metrics_path: /metrics/synapse/main-process
scheme: https
basic_auth:
username: prometheus
password_file: /etc/prometheus/password.pwd
static_configs:
- targets: ['matrix.example.com:443']
labels:
job: "master"
index: 1
- job_name: 'matrix-synapse-synapse-worker-generic-worker-0'
metrics_path: /metrics/synapse/worker/generic-worker-0
scheme: https
basic_auth:
username: prometheus
password_file: /etc/prometheus/password.pwd
static_configs:
- targets: ['matrix.example.com:443']
labels:
job: "generic_worker"
index: 18111
```
2021-03-02 06:02:31 +00:00
## More information
2021-01-29 09:59:27 +00:00
- [Enabling synapse-usage-exporter for Synapse usage statistics](configuring-playbook-synapse-usage-exporter.md)
- [Understanding Synapse Performance Issues Through Grafana Graphs](https://element-hq.github.io/synapse/latest/usage/administration/understanding_synapse_through_grafana_graphs.html) at the Synapse Github Wiki
- [The Prometheus scraping rules](https://github.com/element-hq/synapse/tree/master/contrib/prometheus) (we use v2)
- [The Synapse Grafana dashboard](https://github.com/element-hq/synapse/tree/master/contrib/grafana)
2021-01-29 09:59:27 +00:00
- [The Node Exporter dashboard](https://github.com/rfrail3/grafana-dashboards) (for generic non-synapse performance graphs)