Prometheus is an open-source tool that can monitor nearly everything, and it has a large catalogue of integrations (exporters).
Before you start, please have a look at the config files needed here:
https://github.com/globalsquadproject/monitoring
# Download the Prometheus Server package
wget https://github.com/prometheus/prometheus/releases/download/v2.40.1/prometheus-2.40.1.linux-amd64.tar.gz
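It is worth checking the download before extracting it. Below is a minimal sketch, assuming the release page also offers a sha256sums.txt file next to the tarball. The helper name verify_sha256 is something I made up for this post, not part of Prometheus.

```shell
# verify_sha256 FILE EXPECTED_HASH
# Hypothetical helper: returns 0 when the SHA-256 digest of FILE equals
# EXPECTED_HASH, and 1 otherwise.
verify_sha256() {
    local file="$1"
    local expected="$2"
    local actual
    actual="$(sha256sum "$file" | awk '{print $1}')"
    [ "$actual" = "$expected" ]
}

# Example usage (assumes sha256sums.txt was downloaded from the same release):
# expected="$(grep 'prometheus-2.40.1.linux-amd64.tar.gz' sha256sums.txt | awk '{print $1}')"
# verify_sha256 prometheus-2.40.1.linux-amd64.tar.gz "$expected" && echo "checksum OK"
```

If the function returns a non-zero status, download the archive again rather than installing it.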
# Add Prometheus User
useradd --no-create-home --shell /bin/false prometheus
# Create the directories and change ownership
mkdir /etc/prometheus
mkdir /var/lib/prometheus
chown prometheus:prometheus /etc/prometheus
chown prometheus:prometheus /var/lib/prometheus
# Extract the Prometheus archive and rename the directory
tar -xvzf prometheus-2.40.1.linux-amd64.tar.gz
mv prometheus-2.40.1.linux-amd64 prometheuspackage
# Copy “prometheus” and “promtool” binary and change ownership
cp prometheuspackage/prometheus /usr/local/bin/
cp prometheuspackage/promtool /usr/local/bin/
chown prometheus:prometheus /usr/local/bin/prometheus
chown prometheus:prometheus /usr/local/bin/promtool
# Copy “consoles” and “console_libraries”
cp -r prometheuspackage/consoles /etc/prometheus
cp -r prometheuspackage/console_libraries /etc/prometheus
chown -R prometheus:prometheus /etc/prometheus/consoles
chown -R prometheus:prometheus /etc/prometheus/console_libraries
# Create the Prometheus configuration file with the content below
vi /etc/prometheus/prometheus.yml
global:
  scrape_interval: 10s
scrape_configs:
  - job_name: 'prometheus_master'
    scrape_interval: 5s
    static_configs:
      - targets: ['SERVER_IP:9090']
# Change the ownership
chown prometheus:prometheus /etc/prometheus/prometheus.yml
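Before starting the service, it can save some debugging to validate the file with promtool, which was copied to /usr/local/bin in an earlier step. A small sketch follows. The wrapper name check_prometheus_config is my own, and the only real command it runs is promtool check config.

```shell
# check_prometheus_config [CONFIG_PATH]
# Hypothetical wrapper: validates a Prometheus configuration file with
# promtool. Defaults to the path used in this post.
check_prometheus_config() {
    local config="${1:-/etc/prometheus/prometheus.yml}"
    if ! command -v promtool >/dev/null 2>&1; then
        echo "promtool not found in PATH" >&2
        return 127
    fi
    promtool check config "$config"
}

# Example usage:
# check_prometheus_config && echo "configuration looks valid"
```

YAML indentation mistakes are the most common reason for Prometheus refusing to start, and this check reports them before systemd gets involved.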
# Create the systemd unit file with the content below
vi /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries
[Install]
WantedBy=multi-user.target
# Start the service and enable it at boot
systemctl daemon-reload
systemctl start prometheus
systemctl enable prometheus
systemctl status prometheus
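Prometheus can take a few seconds to become ready after the service starts. The sketch below retries a command until it succeeds. The function name wait_until is hypothetical, and the example assumes curl is installed and uses the /-/ready endpoint that Prometheus serves.

```shell
# wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND up to ATTEMPTS times, sleeping
# DELAY_SECONDS between tries. Returns 0 as soon as COMMAND succeeds,
# or 1 if every attempt fails.
wait_until() {
    local attempts="$1"
    local delay="$2"
    local i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Pause between attempts, but not after the last one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example usage (SERVER_IP is the placeholder used throughout this post):
# wait_until 10 2 curl -fsS "http://SERVER_IP:9090/-/ready"
```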
Access the Prometheus server using this URL: http://SERVER_IP:9090/graph

Node Exporter
The node exporter is an essential companion to Prometheus. It collects metrics from each node and exposes them on an HTTP endpoint that Prometheus scrapes to fetch all the data it needs.
# Install Node Exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.4.0/node_exporter-1.4.0.linux-amd64.tar.gz
tar -xvzf node_exporter-1.4.0.linux-amd64.tar.gz
useradd -rs /bin/false nodeusr
mv node_exporter-1.4.0.linux-amd64/node_exporter /usr/local/bin/
vim /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
After=network.target
[Service]
User=nodeusr
Group=nodeusr
Type=simple
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target
systemctl daemon-reload
systemctl start node_exporter
systemctl enable node_exporter
Access the node exporter interface using this URL: http://SERVER_IP:9100/metrics
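The /metrics page is plain text, so a single value can be pulled out with standard tools. This is a rough sketch: metric_value is a name I chose for illustration, and it only handles samples without labels.

```shell
# metric_value NAME
# Hypothetical helper: reads Prometheus text exposition format on stdin and
# prints the value of the first unlabeled sample called NAME.
# Exits with status 1 when the metric is not present.
metric_value() {
    awk -v name="$1" '
        $1 == name { print $2; found = 1; exit }
        END { exit !found }
    '
}

# Example usage (assumes curl is installed):
# curl -s http://SERVER_IP:9100/metrics | metric_value node_memory_MemFree_bytes
```

Lines starting with # HELP or # TYPE are skipped automatically because their first field is "#", not the metric name.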
Log in to the Prometheus server again and add the following job under scrape_configs:
vim /etc/prometheus/prometheus.yml
  - job_name: 'node_exporter_centos'
    scrape_interval: 5s
    static_configs:
      - targets: ['CLIENT_IP:9100']
systemctl restart prometheus
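Open http://SERVER_IP:9090/targets and check that the new target is listed as UP.
Then, on the graph page (http://SERVER_IP:9090/graph), select a metric, for example:
node_memory_MemFree_bytes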
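The same metric can also be read from the command line through the Prometheus HTTP API, which is handy on servers without a browser. A minimal sketch, assuming curl is installed. The function name prom_query is hypothetical, while /api/v1/query is the instant-query endpoint that Prometheus provides.

```shell
# prom_query BASE_URL PROMQL
# Hypothetical helper: sends an instant query to the Prometheus HTTP API and
# prints the JSON response. --data-urlencode takes care of escaping the
# PromQL expression.
prom_query() {
    curl -fsS --get --data-urlencode "query=$2" "$1/api/v1/query"
}

# Example usage (SERVER_IP is the placeholder used throughout this post):
# prom_query "http://SERVER_IP:9090" "node_memory_MemFree_bytes"
```

Note that metric names are case-sensitive, so the name must be typed in lowercase as node_memory_MemFree_bytes.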
Grafana
Grafana is an open-source tool for visualizing metrics. In our case, we are going to use Prometheus as its data source.
# Download the grafana package
wget https://dl.grafana.com/enterprise/release/grafana-enterprise-9.2.4-1.x86_64.rpm
# Install the package
sudo yum install grafana-enterprise-9.2.4-1.x86_64.rpm
# Start the service
sudo systemctl start grafana-server
sudo systemctl status grafana-server
Open Grafana at http://SERVER_IP:3000 and log in with the default credentials:
User: admin
Pass: admin
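The Prometheus data source can be added from the Grafana UI, or it can be provisioned from a file. Here is a sketch of the file-based route, assuming the default provisioning directory of the RPM install (/etc/grafana/provisioning/datasources). The function name write_grafana_datasource and the file name are my own choices.

```shell
# write_grafana_datasource OUTPUT_FILE PROMETHEUS_URL
# Hypothetical helper: writes a Grafana data source provisioning file that
# points at the given Prometheus URL.
write_grafana_datasource() {
    cat > "$1" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $2
    isDefault: true
EOF
}

# Example usage (run as root, then restart Grafana to load the file):
# write_grafana_datasource /etc/grafana/provisioning/datasources/prometheus.yaml "http://SERVER_IP:9090"
# systemctl restart grafana-server
```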

After that, you can create your own dashboard or import one from Grafana’s website: https://grafana.com/grafana/dashboards/
I used this one:
https://grafana.com/grafana/dashboards/14513-linux-exporter-node/
As the screenshot below shows, it is easy to spot strange behaviour in your application (the huge spike in the graph).

You can use the stress command to simulate that spike
[root@prometheus01 ~]# stress --cpu 2 --timeout 60
stress: info: [20691] dispatching hogs: 2 cpu, 0 io, 0 vm, 0 hdd
stress: info: [20691] successful run completed in 60s
[root@prometheus01 ~]#
That’s it, I hope you have learned something new. If you have any questions please don’t hesitate to reach out.