
Let's start with an experiment. When you run an experiment with RLlib, it stores all the results in a default results directory. You can find the path to the default results directory in the logs, right before each training summary table:

– You can also set a different results directory by passing the “local_dir” argument to “tune.run()”:

And let's run this. When you run it, you will see that RLlib has created a folder called “cartpole_v1”. Each experiment creates the following files and folders inside the results directory.

First, you get a folder with the algorithm name. Inside it are time-stamped files storing the experiment state, which you can usually ignore. Then there is a time-stamped folder. This folder is very important, as it contains all the training and evaluation data. A tool called TensorBoard, which was installed automatically when you installed RLlib, can visualize the training and evaluation data in real time while the experiment is running.

– Basically, instead of plain text output like this, you get graphs.

So let's see how it actually works. For this, open a terminal:

And type “tensorboard --logdir”, followed by the path to the folder containing the training and evaluation data:
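The command might look like this. The experiment folder name below is only an illustration; use the time-stamped path printed by your own run.

```shell
# Point TensorBoard at the folder holding the training and
# evaluation data (example path; substitute your own).
tensorboard --logdir ~/cartpole_v1/PPO
```

TensorBoard then prints a local URL, typically http://localhost:6006.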

Then open the returned URL in your browser and you will see:

This is the TensorBoard interface.

– Many aspects of training and evaluation can be visualized here. The most important metric is the average total reward per episode during training and evaluation, because it indicates how close the agent is to its goal.

– To track this metric during training, look for the following graph:

– The X-axis here is the total number of steps the agent has taken so far during training. The Y-axis shows the average total reward per episode as the agent learns in the environment.

– As you can see, the metric is steadily increasing, indicating that the agent is learning. And if the experiment is still running, you can update the graphs to include the most recent data.

– Simply click the reload button and the graph will update to include all the data available so far. If the experiment has stopped, the reload button won't do anything, because there is no new data to show. You might sometimes find that the graph is cut off at the edges.

To prevent that, just uncheck the following checkbox, and then you will see the entire data set.

The training metric is useful as an indication of learning, but the evaluation metric is the real indicator of the agent's current capability. So let's look at the average total reward per episode during evaluation: find the graph with the tag “ray/tune/evaluation/episode_reward_mean”.

The X-axis and Y-axis are the same as in the previous graph. You can also visualize and compare two experiments simultaneously.

– Let's run a second experiment with the same environment, but now with a different algorithm.

– You can choose any algorithm implemented by RLlib; the full list is on this web page.

– Let's choose the famous DQN algorithm. Just replace the algorithm name with “DQN”, and to visualize both experiments simultaneously, the easiest way is to choose the same results directory as before. So use “local_dir” and set it to the same directory.

– That's because TensorBoard can automatically detect different experiments stored in the same results directory and display them simultaneously.

– To use this capability, you simply need to pass the common top-level directory as the “logdir” argument.

– TensorBoard will start from the supplied directory and auto-detect the training and evaluation data from both experiments under it.

Let's try it in the terminal:

1. Stop the previously running command and start the new one, passing the top-level directory as the “logdir” argument.
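Concretely, the new invocation might look like this (example path, matching the directory used above):

```shell
# Pass the common top-level directory; TensorBoard discovers the
# PPO and DQN experiment subdirectories under it automatically.
tensorboard --logdir ~/cartpole_v1
```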

Here you can see the results from both running experiments in the image above.

– For example, in the image above the orange data corresponds to the experiment running the PPO algorithm, and the blue data to the experiment running the DQN algorithm.

– If you reload, you will see new data coming in:

You can also turn off the visualization of any experiment you want. So let's turn off PPO, the orange data, and then you will see only the blue DQN data.

– You can run different algorithms on different environments and visualize them all in TensorBoard.



    Copyright 1999- Ducat Creative, All rights reserved.
