===============
Troubleshooting
===============

.. contents::
  :local:
  :backlinks: none

.. _troubleshoot-dns-resolver:

How to test if DNS resolver and network connections are working from a container?
=================================================================================

Depending on your networking environment and local DNS settings, you may encounter networking errors and DNS resolver
issues.

To validate that DNS and networking settings are working correctly from within a Fleio container, run the
``fleio info`` command. This connects to a couple of HTTPS URLs that Fleio uses, implicitly testing the Docker DNS
resolvers:

.. code-block:: bash

  fleio info
  # ...
  # last lines of the output:
  Request to https://fleio.com/versions.txt succeeded.
  Request to https://licensing.fleio.com/ succeeded.

If the DNS resolver and networking are working correctly, you will see the success messages above. If requests fail,
debug further based on the error messages.

.. note::

  If you have a custom MTU set on the host where Fleio is installed, you also need to
  :ref:`customize the MTU value for Docker containers`.

To keep the Fleio Docker container images small and to follow best practices, containers do not include network
debugging utilities like ``ping`` or ``telnet``. The Fleio ``utils`` container (which you can enter using the
``fleio bash`` command) includes these and many other handy utilities. The ``backend``, ``celery`` and other containers
that are actually used for Fleio's main functionality do not include debugging tools.

Use the following commands to confirm that the DNS resolver is working and that outgoing network connections to the
specified host are allowed:

.. code-block:: bash

  # The following command enters the fleio-celery-1 Docker container, you can replace it with another Fleio container,
  # e.g. fleio-backend-1.
  docker exec -it fleio-celery-1 bash

  # Replace the host ('fleio.com') and TCP port (443) with desired values to test.
  # Connect throws an error if it fails for any reason, otherwise "ok" is printed.
  python -c "import socket; socket.socket(socket.AF_INET, socket.SOCK_STREAM).connect(('fleio.com', 443)); print('ok')"

  # exit the container:
  exit

If a connection is successfully established to the specified host on the specified TCP port, "ok" is printed at the
console. Otherwise, you will get an error. Some common errors are shown below.

If the host cannot be resolved to an IP address, meaning that the DNS resolver is not working correctly in the
container (or you used a host that does not have a record in the global DNS system), you will get:

.. code-block:: bash

  Traceback (most recent call last):
    File "<string>", line 1, in <module>
  socket.gaierror: [Errno -2] Name or service not known

If the target host refused the connection:

.. code-block:: bash

  Traceback (most recent call last):
    File "<string>", line 1, in <module>
  ConnectionRefusedError: [Errno 111] Connection refused

If the connection times out, possibly because a firewall is preventing the connection:

.. code-block:: bash

  Traceback (most recent call last):
    File "<string>", line 1, in <module>
  TimeoutError: [Errno 110] Connection timed out

.. _troubleshoot-openstack-connection:

How to check if Fleio connects successfully to the OpenStack API
================================================================

After you fill in the details on the OpenStack settings :ref:`openstack-settings-credentials` tab, press
**Test connection** to see if the credentials are working. Then, when you press the **Save & sync** or **Sync** button,
all the OpenStack objects (projects, instances, volumes, networks etc.) in the Fleio database are reset and the list of
objects with their states is retrieved from OpenStack.

You can then go to **Cloud** > **Instances** and check the list of instances (or volumes, or some other resource that
you know has elements in OpenStack) to see if there are any objects present.

.. _troubleshoot-openstack-notifications:

How to check if OpenStack notifications are working
===================================================

Fleio requires notification messages from OpenStack's internal RabbitMQ queue to update the cloud objects' state in
the Fleio database and for most of the pricing rules.

**You can confirm that Fleio receives notifications** by shutting down an existing instance and checking that the
instance status is updated in Fleio. Follow these steps:

#. Go to the Fleio ``/staff`` panel.
#. Sync OpenStack objects as described in section :ref:`troubleshoot-openstack-connection`.
#. Assuming you have a running instance, shut it down.
#. Wait a few seconds for the instance to reach the **STOPPED** state.

If your instance still shows the **RUNNING** state in Fleio, while you can see the instance being ``SHUTOFF`` in
OpenStack (``openstack server list --all-projects``), notifications are not working.

The same check applies to other operations as well: for example, you create an instance and it never shows up in
Fleio although you can see it in OpenStack, or you start a shut-off instance and its state does not change in Fleio.

To repeat these steps, make sure you start from a synced database by pressing Sync on the OpenStack
:ref:`openstack-settings-credentials` tab.

**The checklist to have notifications working is**:

* You :ref:`enabled the OpenStack notifications`, selected an :ref:`OpenStack notifications setup option` and filled
  in the necessary values in the :ref:`notification settings`.
* You created the RabbitMQ user for Fleio (see :ref:`add the RabbitMQ user`).
* You set the :ref:`notification settings`.

For debugging purposes, you can also enable notification logging with :ref:`log-notifications` and see in the log file
which notifications are received.

See in this video how the notifications log should work:

.. raw:: html

If some notifications are received by Fleio and some are not, see :ref:`missing-some-os-notifications`.

.. _missing-some-os-notifications:

Some OpenStack notifications are not received
=============================================

For general OpenStack notifications troubleshooting, especially if none of the OpenStack notifications are received,
see :ref:`troubleshoot-openstack-notifications`.

If only some notifications are not received, here are two possible reasons:

1. You have the same "Notifications pool name" (in OpenStack settings, Notifications tab) configured in two different
   Fleio installations that are connected to the same RabbitMQ. In this case, half of the notifications go to one
   Fleio installation and the other half to the second installation. To fix this, fill in another, unique
   "Notifications pool name" (it will be automatically created in RabbitMQ).

2. If the "AMQP durable queues" checkbox is checked (in the staff panel > **Settings** > OpenStack, on the
   **NOTIFICATIONS** tab), about 33% of the notifications are not received. Make sure that the
   **AMQP durable queues** checkbox is unchecked.

.. _troubleshoot-license-set:

Troubleshoot error when setting license
=======================================

On Fleio install, upgrade or when running ``fleio license`` to set a new license key, you may get an error:

.. code-block:: bash

  * Setting license
  Error: License setting failed. Aborting ...
  Use 'fleio license' command to set a new license

Your license key may be bound to another Fleio installation. If you are in one of the following cases:

* you have used the license key before and you are now performing a new Fleio installation,
* you have changed the domain name on which Fleio is installed,
* you have changed the Django ``SECRET_KEY`` in the :ref:`settings.py file`,
* or you exceed the licensed amount of RAM on your OpenStack compute nodes or, in case of Fleio Hosting Billing
  Edition or Full Edition, you exceed the number of licensed clients

then contact the Fleio support department to reset your license.

Another possible reason for failing to set the license key is networking or DNS resolver issues in Docker containers.
See :ref:`troubleshoot-dns-resolver`.

.. _troubleshoot-email:

Troubleshoot email messages are not sent
========================================

To check if your Fleio installation is able to send email messages, try to reset a staff user's password.

Go to the Fleio staff panel, log out first if you are logged in, and go to the login form (*/staff* URL). Click on the
*Forgot password* link and fill in the email address of an existing staff user. If email settings are working
correctly, you will receive a reset password email message.

If you do not receive a reset password email message, log in to the staff panel and check *Utilities* >
*Email message log*. You should see a *Password reset* entry marked with a red bar if email sending failed. Click on
the entry and check the "*Error:*" line on screen.

Fix the email settings according to :ref:`email-settings` and test the password reset again.

If you can't see a *Password reset* entry in the *Email message log*, double check that you used a valid staff user
email address.

Incoming emails are not recorded in the support ticketing system
================================================================

Make sure the mail server pipe is :ref:`correctly configured`.

You can also enable :ref:`logging for incoming ticketing system emails`. Once logging is enabled, if no emails are
logged, the mail server pipe is probably misconfigured.

.. _troubleshoot-db:

Troubleshoot database errors on upgrade
=======================================

The Fleio database must be created with the utf8mb4 character set. You can check the character set by running the
following query on your database (usually fleio):

.. code-block:: bash

  fleio mysql
  SELECT @@character_set_database, @@collation_database;

If your database contains tables that are using a different character set, you will have to update the tables to use
the utf8mb4 character set.
This can be done with the following steps:

1. Export the current database:

   .. code-block:: bash

     cd /home/fleio/compose
     db_pass=$(cat /home/fleio/compose/secrets/.db_password)
     docker compose exec -T db mysqldump fleio -u fleio -p"$db_pass" > fleio`date +%d.%m.%Y`.sql

2. Adjust the dump so all the ``CREATE TABLE`` statements use utf8mb4.

3. Import the database (replace ``FIXED_DB.sql`` with the name of your adjusted dump file):

   .. code-block:: bash

     sudo chown fleio /home/fleio/compose/FIXED_DB.sql
     sudo -i -u fleio
     cd /home/fleio/compose
     # stop all services
     docker compose stop
     # and start just the database service
     docker compose start db
     # get the database password in a variable
     db_pass=$(cat /home/fleio/compose/secrets/.db_password)
     # enter the mysql console
     docker compose exec db mysql -u fleio -p"$db_pass"
     # once in console, drop the existing database and create a new one:
     DROP DATABASE fleio;
     CREATE DATABASE fleio CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
     # exit the console to Bash
     exit
     # import the database
     cat FIXED_DB.sql | docker compose exec -T db mysql fleio -u fleio -p"$db_pass"

4. Apply migrations by running ``fleio django migrate``.

.. _no-networks:

Network is missing on instance create form
==========================================

When you create an instance, as a staff user or end-user, no network is shown or a network is missing from the
**Network selection** dropdown.

By default, the **Network selection** dropdown includes:

* OpenStack networks that are marked as ``shared=True``
* the client's own networks

The fix may simply be to mark some network(s) as ``shared=True``.

.. note::

  Only networks that have at least one subnet are shown in the create instance form.

The **Network selection** field may be hidden or may have some networks pre-selected depending on the
:ref:`frontend settings`.

Networks included in the **Network selection** dropdown field also depend on whether the
``openstack.networks.display_external_networks`` and ``openstack.networks.display_shared_networks``
:ref:`feature toggles are enabled or disabled`.

How to enable debugging
=======================

If you're in a development environment (not a production server), you can also enable debugging in the
``settings.py`` file to get more information in the browser when you access an invalid URL or when an error occurs
during development:

.. code-block:: python

  DEBUG = True

Now you'll be able to see the backend API nicely formatted and even perform POST, PUT, and DELETE requests. Note that
you need to be authenticated.

Here's an example of PHP code calling the Fleio API: https://github.com/fleio/fleio-whmcs.

You can also run Django commands from the command line after you activate the Fleio Python virtual environment and
move to the Django project directory. Please see :ref:`djangoCli`.

For more information regarding the Django command line see https://docs.djangoproject.com/en/dev/ref/django-admin/

.. _troubleshoot-ceilometer:

Ceilometer troubleshooting
==========================

.. important::

  Data points might take some time to be added, so you might want to wait at least 30 minutes to 1 hour after the
  instance was deployed before starting to troubleshoot this.

If no metrics are displayed for an instance, the first step is to check the Fleio backend logs. If you find no errors
in the logs, or you find BadRequest errors when retrieving metrics, ensure that gnocchi is installed and that both
gnocchi and Ceilometer are configured correctly for the instance's region.

See :ref:`ceilometer-configuration` for instructions on how to configure gnocchi and Ceilometer.

Network related metrics
~~~~~~~~~~~~~~~~~~~~~~~

If you properly configured Ceilometer / gnocchi, the metrics should show correctly in the instance's "Metrics" tab.
However, if no metrics are shown even a few hours after you deployed the instance, you might have some
configuration issues.

To get the network metrics, please run the following commands:

.. code-block:: bash

  # Measures:
  openstack metric measures show --resource-id=`gnocchi resource list | grep instance_network_interface | tail -n 1 | awk '{print $2}'` --aggregation=max --granularity=3600 network.incoming.bytes

  # Metric:
  openstack metric show --resource-id=`gnocchi resource list | grep instance_network_interface | tail -n 1 | awk '{print $2}'` network.incoming.bytes

  # Archive-policy settings:
  openstack metric archive-policy show `gnocchi metric show --resource-id=\`gnocchi resource list | grep instance_network_interface | tail -n 1 | awk '{print $2}'\` network.incoming.bytes | grep archive_policy | awk '{print $4}'`

If the measures command returns "Aggregation method 'max' at granularity '3600.0' for metric xxxxxx does not exist
(HTTP 404)", you must check the archive policy granularity. You should have a 1h granularity defined, which was
configured with:

.. code-block:: bash

  openstack metric archive-policy create -d granularity:5m,points:290 -d granularity:30m,points:336 -d granularity:1h,points:768 -m max -m mean -m sum your_policy

The next step is to run the metric command and check if your resource is using the correct policy. In the output you
should be able to identify the archive_policy/name, and you must ensure that it is the correct one.

If everything is correct so far, run the third command. This should confirm that your archive policy has the correct
aggregation methods and the correct granularity. If it doesn't have the aggregation_methods max, sum, mean and the
correct granularity (shown below), please check your configuration. Below you will find the correct archive policy:

.. code-block:: bash

  +---------------------+-----------------------------------------------------------------+
  | Field               | Value                                                           |
  +---------------------+-----------------------------------------------------------------+
  | aggregation_methods | max, sum, mean                                                  |
  | back_window         | 0                                                               |
  | definition          | - points: 290, granularity: 0:05:00, timespan: 1 day, 0:10:00   |
  |                     | - points: 336, granularity: 0:30:00, timespan: 7 days, 0:00:00  |
  |                     | - points: 768, granularity: 1:00:00, timespan: 32 days, 0:00:00 |
  | name                | your_policy                                                     |
  +---------------------+-----------------------------------------------------------------+

Another thing that needs to be checked is the pipeline.yaml file (see :ref:`pipeline`). In that file, check the
publisher's archive policy (in the example from the configuration guide it is fleio_policy:
``gnocchi://?filter_project=service&archive_policy=fleio_policy``). This does not need to be the same across all
regions, but it should be the same in both Ceilometer's pipeline.yaml file and in gnocchi. If they are different, we
recommend clearing all the metrics with Ceilometer disabled by running these commands:

.. code-block:: bash

  set -e; set -x
  # stop the Ceilometer agents on every hypervisor
  for host in $(ssh $(awk '/utility/ {print $NF}' /etc/hosts) "source openrc; openstack hypervisor list -f value -c 'Hypervisor Hostname'"); do
    ssh $host systemctl stop ceilometer-polling
    ssh $host systemctl stop ceilometer-agent-notification
  done
  # recreate the archive policy and the default archive policy rule
  ssh $(awk '/utility/ {print $NF}' /etc/hosts) "source openrc; openstack metric archive-policy-rule delete default"
  ssh $(awk '/utility/ {print $NF}' /etc/hosts) "source openrc; openstack metric archive-policy create -d granularity:5m,points:290 -d granularity:30m,points:336 -d granularity:1h,points:768 -m max -m mean -m sum fleio; openstack metric archive-policy-rule create -a fleio -m \\* default"
  # delete all existing gnocchi resources
  for gnocchi_id in $(ssh $(awk '/utility/ {print $NF}' /etc/hosts) "source openrc; openstack metric resource list -f value -c id"); do
    ssh $(awk '/utility/ {print $NF}' /etc/hosts) "source openrc; openstack metric resource delete $gnocchi_id"
  done
  # start the Ceilometer agents again
  for host in $(ssh $(awk '/utility/ {print $NF}' /etc/hosts) "source openrc; openstack hypervisor list -f value -c 'Hypervisor Hostname'"); do
    ssh $host systemctl start ceilometer-polling
    ssh $host systemctl start ceilometer-agent-notification
  done

After that, you will have to follow the guide for :ref:`ceilometer-configuration`.

CPU usage related metrics
~~~~~~~~~~~~~~~~~~~~~~~~~

The CPU metrics are required in order to have data in the Instance details -> metrics tab -> CPU metrics graph:

.. image:: /_static/images/newstaff/cloud/instances/instance-detailsCPUmetrics.png
  :scale: 50

If you don't have any data on this graph, you can run the following commands to see if you have any gnocchi metrics
for that instance:

.. code-block:: bash

  openstack metric aggregates '(/ (rateofchange (metric cpu max)) 3000000000)' id=$(openstack metric list | grep -w "cpu" | tail -n 1 | awk '{print $2}') --granularity=300
  openstack metric aggregates '(/ (rateofchange (metric cpu max)) 18000000000)' id=$(openstack metric list | grep -w "cpu" | tail -n 1 | awk '{print $2}') --granularity=1800
  openstack metric aggregates '(/ (rateofchange (metric cpu max)) 36000000000)' id=$(openstack metric list | grep -w "cpu" | tail -n 1 | awk '{print $2}') --granularity=3600

Note that the commands above return the data points for your last deployed instance. For a specific instance, use
the following commands:

.. code-block:: bash

  openstack metric aggregates '(/ (rateofchange (metric cpu max)) 3000000000)' id=REPLACE-ME --granularity=300
  openstack metric aggregates '(/ (rateofchange (metric cpu max)) 18000000000)' id=REPLACE-ME --granularity=1800
  openstack metric aggregates '(/ (rateofchange (metric cpu max)) 36000000000)' id=REPLACE-ME --granularity=3600

If there are data points in Ceilometer, you should get some values, as in the following example:

.. code-block:: none

  +----------------------------------------------+---------------------------+-------------+---------------------+
  | name                                         | timestamp                 | granularity | value               |
  +----------------------------------------------+---------------------------+-------------+---------------------+
  | ffbe5368-607b-437b-8253-9bc390c8d049/cpu/max | 2021-06-22T07:40:00+00:00 | 300.0       | 76.76666666666667   |
  | ffbe5368-607b-437b-8253-9bc390c8d049/cpu/max | 2021-06-22T07:45:00+00:00 | 300.0       | 97.0                |
  | ffbe5368-607b-437b-8253-9bc390c8d049/cpu/max | 2021-06-22T07:50:00+00:00 | 300.0       | 97.14333333333333   |
  | ffbe5368-607b-437b-8253-9bc390c8d049/cpu/max | 2021-06-22T07:55:00+00:00 | 300.0       | 97.06               |
  | ffbe5368-607b-437b-8253-9bc390c8d049/cpu/max | 2021-06-22T08:00:00+00:00 | 300.0       | 97.15333333333334   |
  | ffbe5368-607b-437b-8253-9bc390c8d049/cpu/max | 2021-06-22T08:05:00+00:00 | 300.0       | 96.86333333333333   |
  | ffbe5368-607b-437b-8253-9bc390c8d049/cpu/max | 2021-06-22T08:10:00+00:00 | 300.0       | 97.25333333333333   |
  | ffbe5368-607b-437b-8253-9bc390c8d049/cpu/max | 2021-06-22T08:15:00+00:00 | 300.0       | 53.93               |
  | ffbe5368-607b-437b-8253-9bc390c8d049/cpu/max | 2021-06-22T08:20:00+00:00 | 300.0       | 0.31333333333333335 |
  | ffbe5368-607b-437b-8253-9bc390c8d049/cpu/max | 2021-06-22T08:25:00+00:00 | 300.0       | 0.3333333333333333  |
  | ffbe5368-607b-437b-8253-9bc390c8d049/cpu/max | 2021-06-22T08:30:00+00:00 | 300.0       | 0.31333333333333335 |
  | ffbe5368-607b-437b-8253-9bc390c8d049/cpu/max | 2021-06-22T08:35:00+00:00 | 300.0       | 0.32                |
  | ffbe5368-607b-437b-8253-9bc390c8d049/cpu/max | 2021-06-22T08:40:00+00:00 | 300.0       | 0.32                |
  | ffbe5368-607b-437b-8253-9bc390c8d049/cpu/max | 2021-06-22T08:45:00+00:00 | 300.0       | 0.30666666666666664 |
  +----------------------------------------------+---------------------------+-------------+---------------------+

If you get the ``Metrics not found`` error, you will have to re-check the :ref:`ceilometer-configuration` guide.

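
The divisors used in the ``openstack metric aggregates`` commands above appear to follow from the units involved. The
sketch below assumes that the ``cpu`` metric is cumulative CPU time in nanoseconds (the unit Ceilometer documents for
this meter), so the change over one granularity window, divided by ``granularity * 10**9 / 100``, is a percentage of
one CPU core. The function names are hypothetical and only illustrate the arithmetic; they are not part of Fleio or
gnocchi.

```python
NANOSECONDS_PER_SECOND = 10**9


def cpu_percent_divisor(granularity_seconds):
    """Divisor that converts a CPU-time delta in ns into percent of one core."""
    # A fully busy core accumulates granularity * 1e9 ns per window;
    # dividing that by 100 makes a fully busy core come out as 100.
    return granularity_seconds * NANOSECONDS_PER_SECOND // 100


def cpu_percent(cpu_time_delta_ns, granularity_seconds):
    """CPU usage (percent of one core) for one granularity window."""
    return cpu_time_delta_ns / cpu_percent_divisor(granularity_seconds)


if __name__ == "__main__":
    # Print the divisor for each granularity used in the commands above.
    for granularity in (300, 1800, 3600):
        print(granularity, cpu_percent_divisor(granularity))
```

Under this assumption an instance with several vCPUs can report values above 100, because the divisor does not take
the number of vCPUs into account.
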

Detailed info on the Docker Fleio deployment
============================================

See :ref:`docker_deploy_notes`.

fleio backup command gives error: bc command not found, aborting
================================================================

You need the ``bc`` system package (bash command line calculator). Here's how you install it on Debian or Ubuntu:

.. code-block:: bash

  sudo apt install bc

Other Linux distributions have similar commands to install ``bc``.

"504 Gateway Timeout" error shows in web interface / uwsgi keeps restarting
===========================================================================

You periodically see "504 Gateway Timeout" toast messages in the web interface and, when you check the ``backend``
container logs (``docker logs fleio-backend-1``), you see uWSGI restarting every few seconds.

This may be caused by incorrect system time on the Fleio server. Make sure you
:ref:`keep system time up-to-date`.

"Verification code is invalid" when enabling second factor authentication
=========================================================================

When you try to secure a user account by enabling two-factor authentication, you get the error
"Verification code is invalid".

Try a couple of times, and even scan the QR code again and retry. If it still doesn't work, and you made sure you are
using the right code from Google Authenticator (or another application), then this may be caused by incorrect system
time on the server where Fleio is installed. Make sure you :ref:`keep system time up-to-date`.
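
To see why an incorrect clock can produce this error: authenticator applications typically generate time-based
one-time passwords (TOTP, RFC 6238), where the code is derived from a shared secret and the current 30-second time
step. The following is a minimal sketch of that algorithm using only the Python standard library. It is not Fleio's
implementation; the ``totp`` function name, the secret (the RFC 6238 test secret) and the time values are
illustrative assumptions.

```python
import hashlib
import hmac
import struct


def totp(secret, unix_time, step=30, digits=6):
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for the given Unix time."""
    # The moving factor is the number of whole time steps since the epoch.
    counter = int(unix_time) // step
    message = struct.pack(">Q", counter)
    digest = hmac.new(secret, message, hashlib.sha1).digest()
    # Dynamic truncation as defined in RFC 4226.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)


if __name__ == "__main__":
    secret = b"12345678901234567890"  # RFC 6238 test secret, not a real key
    phone_time = 59                   # time according to the authenticator app
    server_time = phone_time + 120    # hypothetical server clock, 2 minutes ahead
    print("code on phone: ", totp(secret, phone_time))
    print("code on server:", totp(secret, server_time))
```

The two printed codes differ because the two clocks fall into different time steps, so a server with a skewed clock
would compute a different code than the one shown by the application and reject it.
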