===============
Troubleshooting
===============

.. contents::
    :local:

How to test if DNS resolver and network connections are working from a container?
=================================================================================

Depending on your networking environment and local DNS settings, you may encounter networking errors and DNS resolver issues.

To keep the Fleio Docker container images small and to follow best practices, containers do not include network debugging utilities like ``ping`` or ``telnet``. As an exception, the Fleio ``utils`` container (which you can enter using the ``fleio bash`` command) includes these and many other handy utilities. ``backend``, ``celery`` and the other containers that provide Fleio's actual functionality do not include debugging tools.

Use the following commands to confirm that the DNS resolver is working and that outgoing network connections to the specified host are allowed:

.. code-block:: bash

    # The following command enters the fleio-celery-1 Docker container; you can replace it with another Fleio container,
    # e.g. fleio-backend-1.
    docker exec -it fleio-celery-1 bash

    # Replace the host ('fleio.com') and TCP port (443) with the values you want to test.
    # connect() raises an error if it fails for any reason, otherwise "ok" is printed.
    python -c "import socket; socket.socket(socket.AF_INET, socket.SOCK_STREAM).connect(('fleio.com', 443)); print('ok')"

    # exit the container:
    exit

If a connection is successfully established to the specified host on the specified TCP port, "ok" is printed at the console. Otherwise, you get an error. Some common errors are shown below.

If the host cannot be resolved to an IP address, meaning that the DNS resolver is not working correctly in the container (or you used a host that does not have a record in the global DNS system), you will get:

.. code-block:: bash

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    socket.gaierror: [Errno -2] Name or service not known

If the target host refused the connection:

.. code-block:: bash

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ConnectionRefusedError: [Errno 111] Connection refused

If the connection times out, possibly because a firewall is blocking it:

.. code-block:: bash

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    TimeoutError: [Errno 110] Connection timed out

.. _troubleshoot-openstack-connection:

How to check if Fleio connects successfully to the OpenStack API
================================================================

After you fill in the details on the OpenStack settings :ref:`openstack-settings-credentials` tab, press **Test connection** to see if the credentials are working.

Then press the **Save & sync** or **Sync** button: all the OpenStack objects (projects, instances, volumes, networks etc.) in the Fleio database are reset and the list of objects with their states is retrieved from OpenStack.

You can then go to **Cloud** > **Instances** and check the list of instances (or volumes, or some other resource that you know exists in OpenStack) to see if any objects are present.

.. _troubleshoot-openstack-notifications:

How to check if OpenStack notifications are working
===================================================

Fleio requires notification messages from OpenStack's internal RabbitMQ queue to update the state of cloud objects in the Fleio database, and most pricing rules depend on them as well.

**You can confirm that Fleio receives notifications** by shutting down an existing instance and checking that the instance status is updated in Fleio. Follow these steps:

#. Go to the Fleio ``/staff`` panel
#. Sync OpenStack objects as described in section :ref:`troubleshoot-openstack-connection`
#. Assuming you have a running instance, shut it down
#. Wait a few seconds for the instance to reach the **STOPPED** state.

If your instance still shows the **RUNNING** state in Fleio while you can see it as ``SHUTOFF`` in OpenStack (``openstack server list --all-projects``), notifications are not working.

The same check applies to other operations as well: for example, you create an instance and it never shows up in Fleio while you can see it in OpenStack, or you start a shut-off instance and its state does not change in Fleio.

To repeat these steps, make sure you start from a synced database by pressing Sync on the OpenStack :ref:`openstack-settings-credentials` tab.

**The checklist to have notifications working is**:

* Make sure you :ref:`enabled the OpenStack notifications`
* you added :ref:`add the RabbitMQ user` for Fleio
* you set the :ref:`notification settings`

For debugging purposes, you can also enable notification logging with :ref:`log-notifications` and see in the log file which notifications are received.

See in this video how the notifications log should work:

.. raw:: html

.. _troubleshoot-email:

Troubleshoot email messages are not sent
========================================

To check if your Fleio installation is able to send email messages, try to reset a staff user's password.

Go to the Fleio staff panel, log out first if you are logged in, and go to the login form (*/staff* URL). Click on the *Forgot password* link and fill in the email address of an existing staff user.

If the email settings are working correctly, you will receive a reset password email message.

If you do not receive a reset password email message, log in to the staff panel and check *Utilities* > *Email message log*. You should see a *Password reset* entry marked with a red bar if email sending failed. Click on the entry and check the "*Error:*" line on screen. Fix the email settings according to :ref:`email-settings` and test the password reset again.

If you can't see a *Password reset* entry in the *Email message log*, double check that you used a valid staff user email address.

.. _troubleshoot-db:

Troubleshoot database errors on upgrade
=======================================

The Fleio database must be created with the ``utf8mb4`` character set. You can check the character set by running the following query on your database (usually ``fleio``):

.. code-block:: bash

    fleio mysql
    SELECT @@character_set_database, @@collation_database;

If your database contains tables that use a different character set, you will have to convert them to ``utf8mb4``. This can be done with the following steps:

1. Export the current database

   .. code-block:: bash

       cd /home/fleio/compose
       db_pass=$(cat /home/fleio/compose/secrets/.db_password)
       docker compose exec -T db mysqldump fleio -u fleio -p"$db_pass" > fleio`date +%d.%m.%Y`.sql

2. Adjust the dump so that all the ``CREATE TABLE`` statements use utf8mb4.

3. Import the database (replace ``FIXED_DB`` with the actual name of your adjusted dump file)

   .. code-block:: bash

       sudo chown fleio /home/fleio/compose/FIXED_DB.sql
       sudo -i -u fleio
       cd /home/fleio/compose
       # stop all services
       docker compose stop
       # and start just the database service
       docker compose start db
       # get the database password in a variable
       db_pass=$(cat /home/fleio/compose/secrets/.db_password)
       # enter the mysql console
       docker compose exec db mysql -u fleio -p"$db_pass"
       # once in the console, drop the existing database and create a new one:
       DROP DATABASE fleio;
       CREATE DATABASE fleio CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
       # exit the console to Bash
       exit
       # import the adjusted dump
       cat FIXED_DB.sql | docker compose exec -T db mysql fleio -u fleio -p"$db_pass"

4. Apply migrations by running ``fleio django migrate``

How to enable debugging
=======================

If you're in a development environment (not a production server), you can also enable debugging from the ``settings.py`` file to get more information in the browser when you access an invalid URL or when an error occurs during development:

.. code-block:: python

    DEBUG = True

Now you'll be able to see the backend API nicely formatted and even perform POST, PUT, and DELETE requests. Note that you need to be authenticated as Here's an example of PHP code calling the Fleio API: https://github.com/fleio/fleio-whmcs.

You can also run Django commands from the command line after you activate the Fleio Python virtual environment and move to the Django project directory. Please see :ref:`djangoCli`.

For more information regarding the Django command line see https://docs.djangoproject.com/en/dev/ref/django-admin/

.. _troubleshoot-ceilometer:

Ceilometer troubleshooting
==========================

.. important::

    Data points might take some time to be added, so wait at least 30 minutes to 1 hour after the instance was deployed before you start troubleshooting.

If no metrics are displayed for an instance, the first step is to check the Fleio backend logs.

If you find no errors in the logs, or you find BadRequest errors when retrieving metrics, make sure gnocchi is installed and both gnocchi and Ceilometer are configured correctly for the instance's region.

See :ref:`ceilometer-configuration` for instructions on how to configure gnocchi and Ceilometer.

Network related metrics
~~~~~~~~~~~~~~~~~~~~~~~

If you properly configured Ceilometer / gnocchi, the metrics should show correctly in the instance's "Metrics" tab. However, if no metrics are shown even a few hours after you deployed the instance, you might have some configuration issues.

To get the network metrics, please run the following commands:

.. code-block:: bash

    # Measures:
    gnocchi measures show --resource-id=`gnocchi resource list | grep instance_network_interface | tail -n 1 | awk '{print $2}'` --aggregation=max --granularity=3600 network.incoming.bytes

    # Metric:
    gnocchi metric show --resource-id=`gnocchi resource list | grep instance_network_interface | tail -n 1 | awk '{print $2}'` network.incoming.bytes

    # Archive-policy settings:
    gnocchi archive-policy show `gnocchi metric show --resource-id=\`gnocchi resource list | grep instance_network_interface | tail -n 1 | awk '{print $2}'\` network.incoming.bytes | grep archive_policy | awk '{print $4}'`

If the measures command returns "Aggregation method 'max' at granularity '3600.0' for metric xxxxxx does not exist (HTTP 404)", you must check the archive policy granularity. You should have a 1h granularity defined, which was configured with:

.. code-block:: bash

    gnocchi archive-policy create -d granularity:5m,points:290 -d granularity:30m,points:336 -d granularity:1h,points:768 -m max -m mean -m sum your_policy

The next step is to run the metric command and check if your resource is using the correct policy. In the output you should be able to identify the archive_policy/name, and you must make sure that it is the correct one.

If everything is correct so far, run the third command. It should confirm that your archive policy has the correct aggregation methods and the correct granularity. If it doesn't have the aggregation methods max, sum, mean and the correct granularity (shown below), please check your configuration.

Below you will find the correct archive policy:

.. code-block:: bash

    +---------------------+-----------------------------------------------------------------+
    | Field               | Value                                                           |
    +---------------------+-----------------------------------------------------------------+
    | aggregation_methods | max, sum, mean                                                  |
    | back_window         | 0                                                               |
    | definition          | - points: 290, granularity: 0:05:00, timespan: 1 day, 0:10:00   |
    |                     | - points: 336, granularity: 0:30:00, timespan: 7 days, 0:00:00  |
    |                     | - points: 768, granularity: 1:00:00, timespan: 32 days, 0:00:00 |
    | name                | your policy                                                     |
    +---------------------+-----------------------------------------------------------------+

Another thing that needs to be checked is the pipeline.yaml file (see :ref:`pipeline`). In that file, check the archive policy set in the publishers (in the example from the configuration guide it is fleio_policy: ``gnocchi://?filter_project=service&archive_policy=fleio_policy``).

The policy does not need to be the same across all regions, but it must be the same in both Ceilometer's pipeline.yaml file and in gnocchi. If they are different, we recommend clearing all the metrics while Ceilometer is stopped, by running this command:

.. code-block:: bash

    set -e; set -x;
    # stop the Ceilometer agents on every hypervisor
    for host in $(ssh $(awk '/utility/ {print $NF}' /etc/hosts) "source openrc; openstack hypervisor list -f value -c 'Hypervisor Hostname'"); do
        ssh $host systemctl stop ceilometer-polling;
        ssh $host systemctl stop ceilometer-agent-notification;
    done;
    # recreate the archive policy and the default archive policy rule
    ssh $(awk '/utility/ {print $NF}' /etc/hosts) "source openrc; gnocchi archive-policy-rule delete default";
    ssh $(awk '/utility/ {print $NF}' /etc/hosts) "source openrc; gnocchi archive-policy create -d granularity:5m,points:290 -d granularity:30m,points:336 -d granularity:1h,points:768 -m max -m mean -m sum fleio; gnocchi archive-policy-rule create -a fleio -m \\* default";
    # delete all existing gnocchi resources
    for gnocchi_id in $(ssh $(awk '/utility/ {print $NF}' /etc/hosts) "source openrc; gnocchi resource list -f value -c id"); do
        ssh $(awk '/utility/ {print $NF}' /etc/hosts) "source openrc; gnocchi resource delete $gnocchi_id";
    done;
    # start the Ceilometer agents again
    for host in $(ssh $(awk '/utility/ {print $NF}' /etc/hosts) "source openrc; openstack hypervisor list -f value -c 'Hypervisor Hostname'"); do
        ssh $host systemctl start ceilometer-polling;
        ssh $host systemctl start ceilometer-agent-notification;
    done

After that, you will have to follow the guide for :ref:`ceilometer-configuration`.

CPU usage related metrics
~~~~~~~~~~~~~~~~~~~~~~~~~

The CPU metrics are required in order to have data in the Instance details -> metrics tab -> CPU metrics graph:

.. image:: /_static/images/newstaff/cloud/instances/instance-detailsCPUmetrics.png
    :scale: 50

If you don't have any data on this graph, you can run the following commands to see if you have any gnocchi metrics for that instance:

.. code-block:: bash

    gnocchi aggregates '(/ (rateofchange (metric cpu max)) 3000000000)' id=$(openstack metric list | grep -w "cpu" | tail -n 1 | awk '{print $2}') --granularity=300
    gnocchi aggregates '(/ (rateofchange (metric cpu max)) 18000000000)' id=$(openstack metric list | grep -w "cpu" | tail -n 1 | awk '{print $2}') --granularity=1800
    gnocchi aggregates '(/ (rateofchange (metric cpu max)) 36000000000)' id=$(openstack metric list | grep -w "cpu" | tail -n 1 | awk '{print $2}') --granularity=3600

Note that the commands above return the data points for your last deployed instance. For a specific instance use the following commands:

.. code-block:: bash

    gnocchi aggregates '(/ (rateofchange (metric cpu max)) 3000000000)' id=REPLACE-ME --granularity=300
    gnocchi aggregates '(/ (rateofchange (metric cpu max)) 18000000000)' id=REPLACE-ME --granularity=1800
    gnocchi aggregates '(/ (rateofchange (metric cpu max)) 36000000000)' id=REPLACE-ME --granularity=3600

If there are data points in Ceilometer, you should get some values, as in the following example:

.. code-block:: none

    +----------------------------------------------+---------------------------+-------------+---------------------+
    | name                                         | timestamp                 | granularity | value               |
    +----------------------------------------------+---------------------------+-------------+---------------------+
    | ffbe5368-607b-437b-8253-9bc390c8d049/cpu/max | 2021-06-22T07:40:00+00:00 | 300.0       | 76.76666666666667   |
    | ffbe5368-607b-437b-8253-9bc390c8d049/cpu/max | 2021-06-22T07:45:00+00:00 | 300.0       | 97.0                |
    | ffbe5368-607b-437b-8253-9bc390c8d049/cpu/max | 2021-06-22T07:50:00+00:00 | 300.0       | 97.14333333333333   |
    | ffbe5368-607b-437b-8253-9bc390c8d049/cpu/max | 2021-06-22T07:55:00+00:00 | 300.0       | 97.06               |
    | ffbe5368-607b-437b-8253-9bc390c8d049/cpu/max | 2021-06-22T08:00:00+00:00 | 300.0       | 97.15333333333334   |
    | ffbe5368-607b-437b-8253-9bc390c8d049/cpu/max | 2021-06-22T08:05:00+00:00 | 300.0       | 96.86333333333333   |
    | ffbe5368-607b-437b-8253-9bc390c8d049/cpu/max | 2021-06-22T08:10:00+00:00 | 300.0       | 97.25333333333333   |
    | ffbe5368-607b-437b-8253-9bc390c8d049/cpu/max | 2021-06-22T08:15:00+00:00 | 300.0       | 53.93               |
    | ffbe5368-607b-437b-8253-9bc390c8d049/cpu/max | 2021-06-22T08:20:00+00:00 | 300.0       | 0.31333333333333335 |
    | ffbe5368-607b-437b-8253-9bc390c8d049/cpu/max | 2021-06-22T08:25:00+00:00 | 300.0       | 0.3333333333333333  |
    | ffbe5368-607b-437b-8253-9bc390c8d049/cpu/max | 2021-06-22T08:30:00+00:00 | 300.0       | 0.31333333333333335 |
    | ffbe5368-607b-437b-8253-9bc390c8d049/cpu/max | 2021-06-22T08:35:00+00:00 | 300.0       | 0.32                |
    | ffbe5368-607b-437b-8253-9bc390c8d049/cpu/max | 2021-06-22T08:40:00+00:00 | 300.0       | 0.32                |
    | ffbe5368-607b-437b-8253-9bc390c8d049/cpu/max | 2021-06-22T08:45:00+00:00 | 300.0       | 0.30666666666666664 |
    +----------------------------------------------+---------------------------+-------------+---------------------+

If you get the ``Metrics not found`` error, you will have to re-check the :ref:`ceilometer-configuration` guide.
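
The divisors in the ``gnocchi aggregates`` commands above (3000000000, 18000000000 and 36000000000) follow from the granularity. The short Python sketch below only illustrates that arithmetic. It assumes the ``cpu`` metric holds cumulative CPU time in nanoseconds, and ``cpu_divisor`` and ``cpu_percent`` are hypothetical helper names, not part of Fleio or gnocchi.

.. code-block:: python

    NS_PER_SECOND = 1_000_000_000


    def cpu_divisor(granularity_seconds):
        """Divisor that turns a CPU-time delta (ns) over one interval into a percentage."""
        # interval length in nanoseconds, divided by 100 so the result is a percentage
        return granularity_seconds * NS_PER_SECOND // 100


    def cpu_percent(previous_ns, current_ns, granularity_seconds):
        """CPU usage (%) between two consecutive cumulative CPU-time samples."""
        # (current_ns - previous_ns) is what gnocchi's rateofchange yields per interval
        return (current_ns - previous_ns) / cpu_divisor(granularity_seconds)


    for granularity in (300, 1800, 3600):
        print(granularity, cpu_divisor(granularity))

    # An instance that used 150 seconds of CPU time during a 300-second interval:
    print(cpu_percent(0, 150 * NS_PER_SECOND, 300))

If you query a different granularity, the divisor has to be recomputed in the same way so that the values stay comparable.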

Detailed info on the Docker Fleio deployment
============================================

See :ref:`docker_deploy_notes`.

fleio backup command gives error: bc command not found, aborting
================================================================

You need the ``bc`` system package (a command line calculator). Here's how you install it on Debian or Ubuntu:

.. code-block:: bash

    sudo apt install bc

Other Linux distributions have similar commands to install ``bc``.

"504 Gateway Timeout" error shows in web interface / uwsgi keeps restarting
===========================================================================

You periodically see "504 Gateway Timeout" toast messages in the web interface, and when you check the ``backend`` container logs (``docker logs fleio-backend-1``) you see uWSGI restarting every few seconds.

This may be caused by incorrect system time on the Fleio server. Make sure you :ref:`keep system time up-to-date `.

"Verification code is invalid" when enabling second factor authentication
=========================================================================

When you try to secure a user account by enabling two-factor authentication, you get the error "Verification code is invalid".

Try a couple of times, and even scan the QR code again and retry. If it still doesn't work and you made sure you are using the right code from Google Authenticator (or another application), this may be caused by incorrect system time on the server where Fleio is installed. Make sure you :ref:`keep system time up-to-date `.
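
To illustrate why an incorrect system time makes verification codes fail, here is a minimal sketch of a time-based one-time password (TOTP, RFC 6238) calculation. It assumes the standard 30-second time step and HMAC-SHA1. This document does not describe how Fleio verifies codes internally, so treat the ``totp`` function, the sample secret (taken from the RFC 6238 test vectors) and the 5-minute clock offset as illustrative assumptions only.

.. code-block:: python

    import hashlib
    import hmac
    import struct


    def totp(secret, unix_time, step=30, digits=6):
        """Return the RFC 6238 TOTP code for the given secret and Unix time."""
        # the code depends only on the secret and on the current time step number
        counter = int(unix_time) // step
        digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
        # dynamic truncation as defined in RFC 4226
        offset = digest[-1] & 0x0F
        truncated = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
        return str(truncated % 10 ** digits).zfill(digits)


    secret = b"12345678901234567890"  # sample secret from the RFC 6238 test vectors
    phone_time = 1111111109           # clock of the device running the authenticator app
    server_time = phone_time + 300    # hypothetical server clock running 5 minutes ahead

    print("phone :", totp(secret, phone_time))
    print("server:", totp(secret, server_time))

Each side derives the code from its own clock, so when the server clock is in a different time step than the authenticator device, the two generally compute different codes and verification fails.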