Technical Summary

TrailNet DNN Architecture

The TrailNet DNN is a novel deep neural network architecture optimized to run within the hardware limitations of the Jetson TX2. It is possible to retrain the weights of the TrailNet DNN by creating your own set of training images, but that is beyond the scope of this documentation. Here we are interested in how well the network can maintain the pose of the drone with respect to the center of the trail. The stability of the controller is tightly coupled to the network's ability to make accurate, stable inferences about the drone's current orientation and position. To increase the stability of those outputs, the authors modified the loss function to include cross entropy and entropy reward terms [2].

Differentiable loss function used to train the weights of the TrailNet DNN
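Since the equation image is not reproduced here, the LaTeX sketch below shows only the general form of the two terms named above: a cross-entropy term plus an entropy reward weighted by an assumed coefficient φ, written assuming the reward is subtracted so that less overconfident (higher-entropy) predictions are penalized less. The exact loss in the paper may contain additional terms and weights.

L(y, p) \;=\; \underbrace{-\sum_{i} y_i \log p_i}_{\text{cross entropy}} \;-\; \varphi\, H(p),
\qquad H(p) = -\sum_{i} p_i \log p_i

Rewarding higher-entropy predictions in this way discourages overconfident outputs, which is what smooths the yaw commands fed to the controller.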

Once the DNN has completed its convolutions on the input image, it outputs a 1x6 vector of softmax probabilities: three for the predicted view orientation (vo), i.e. facing left, center, or right of the trail, and three for the lateral offset (lo), i.e. left of, centered on, or right of the trail center. If, for example, the inferred view orientation is 0.9 (90%) left and 0.1 (10%) right, then alpha (the yaw angle) becomes negative. Neglecting the second term for now, this produces a clockwise yaw command, which turns the drone back towards the center of the trail. Beta, the dnn_turn_angle_ parameter in the code below, is essentially the proportional gain of the controller; it corresponds to 10 degrees in our case.

// Compute turn angle from probabilities. Positive angle - turn left, negative - turn right, 0 - go straight
float current_turn_angle_deg = dnn_turn_angle_*(right_view_p - left_view_p) + dnn_lateralcorr_angle_*(right_side_p - left_side_p);
Method for computing the yaw angle considering the inferred lateral offsets and view orientations
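As a quick worked example of the first term, using the 10 degree gain and the example probabilities above (and ignoring the lateral-offset correction):

\text{turn angle} = 10^{\circ} \times (p_{\mathrm{right\_view}} - p_{\mathrm{left\_view}}) = 10^{\circ} \times (0.1 - 0.9) = -8^{\circ}

That is, a negative (clockwise) yaw command of 8 degrees that steers the drone back toward the center of the trail.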

In addition to publishing a new yaw angle, the PX4 controller node also computes the next local waypoint based on the current orientation and position of the drone. This is handled by the computeNextWaypoint() function shown below.

inline geometry_msgs::Point PX4Controller::computeNextWaypoint(
    geometry_msgs::PoseStamped& current_pose,
    float linear_control_val,
    float angular_control_val,
    float linear_speed ) const
{
    // Create movement vector in the "body" frame (it is still in mavros frame of reference)
    Eigen::Vector3d movement_vector(linear_control_val, angular_control_val, 0);
    movement_vector *= linear_speed;

    // Convert movement_vector to the inertial frame based on current pose and compute new position
    Eigen::Quaterniond current_orientation;
    tf::quaternionMsgToEigen(current_pose.pose.orientation, current_orientation);
    Eigen::Matrix3d rotation_mat = current_orientation.toRotationMatrix();
    movement_vector = rotation_mat * movement_vector;

    Eigen::Vector3d current_position;
    tf::pointMsgToEigen(current_pose.pose.position, current_position);
    Eigen::Vector3d new_position = current_position + movement_vector;
    geometry_msgs::Point next_waypoint;
    tf::pointEigenToMsg(new_position, next_waypoint);

    return next_waypoint;
}
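As a quick numeric sketch of what this function does (the control values and the 90 degree yaw are assumed purely for illustration): with linear_control_val = 1.0, angular_control_val = 0.2, and linear_speed = 2.0 m/s, the body-frame movement vector and its rotation into the inertial frame are:

\mathbf{m}_{\text{body}} = 2.0 \times (1.0,\ 0.2,\ 0) = (2.0,\ 0.4,\ 0),
\qquad
\mathbf{m}_{\text{inertial}} = R_z(90^{\circ})\,\mathbf{m}_{\text{body}} = (-0.4,\ 2.0,\ 0)

The next waypoint is then simply the current position plus this inertial-frame movement vector.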

Altogether, the final commanded 'goto' pose message is built using a fixed altitude, a constant forward velocity, and the computed waypoint.

else
{
  is_moving_ = true;

  // Compute next waypoint based on current commands
  geometry_msgs::Point next_waypoint = computeNextWaypoint(current_pose, linear_control_val,
                                                           angular_control_val, linear_speed_);
  next_waypoint.z = altitude_; // need to set Z so it holds the set altitude
  goto_pose.pose.position = next_waypoint;

  // Compute new orientation for the next waypoint if we are not going backwards or strafing
  if(linear_control_val > 0)
  {
    goto_pose.pose.orientation = getRotationTo(current_pose.pose.position, next_waypoint);
  }
}

The navigation state includes conditional statements that check whether the joystick node is running, allowing the published pose commands to be overridden within ROS without needing to switch out of OFFBOARD mode on the flight controller.

case ControllerState::Navigating:
  // Log
  tf::quaternionMsgToEigen(current_pose.pose.orientation, orientation_quaternion);
  rotation_matrix = orientation_quaternion.toRotationMatrix();
  euler_angles    = rotation_matrix.eulerAngles(0,1,2);
  ROS_DEBUG("NAVPOS: %4.2f, %4.2f, %4.2f, Att: %4.2f, %4.2f, %4.2f, SetAlt: %4.2f",
            current_pose.pose.position.x, current_pose.pose.position.y, current_pose.pose.position.z,
            angles::to_degrees(euler_angles[0]), angles::to_degrees(euler_angles[1]), angles::to_degrees(euler_angles[2]),
            altitude_);

  if (fcu_state_.mode != vehicle_->getOffboardModeName())
  {
    // If offboard is off, don't move, just update goto_pose to be the current pose to avoid flyaway when offboard gets turned on!
    goto_pose = current_pose;
    break;
  }

  has_command = got_new_joy_command_ || got_new_dnn_command_;
  if(!use_dnn_data_)  // Use joystick/teleop control if DNN disabled
  {
    if(got_new_joy_command_)
    {
      linear_control_val   = linear_control_val_;
      angular_control_val  = angular_control_val_;
      yaw_control_val      = yaw_control_val_;
      altitude_control_val = altitude_control_val_;
      got_new_joy_command_ = false;
    }
  }
  else // Use DNN control if joystick is not touched
  {
    if(got_new_joy_command_ && (linear_control_val_!=0 || angular_control_val_!=0 ||
                                yaw_control_val_!=0 || altitude_control_val_!=0))
    {
      linear_control_val   = linear_control_val_;
      angular_control_val  = angular_control_val_;
      yaw_control_val      = yaw_control_val_;
      altitude_control_val = altitude_control_val_;
      got_new_joy_command_ = false;
      joy_commands_count_++;
    }
    else if(got_new_dnn_command_)
    {
      linear_control_val   = dnn_linear_control_val_;
      angular_control_val  = dnn_angular_control_val_;
      got_new_dnn_command_ = false;
      dnn_commands_count_++;
    }
    else
    {
      // We get here only when there are no commands from the DNN and no joystick movements outside of the deadzone.
      // Clearing the flag is currently required only for the rover.
      has_command = false;
      break;
    }
  }

Generating Disparity Images from the Stereo DNN

Stereo DNN Architecture

It has been a long-standing challenge in the computer vision community to find a robust way to extract depth from a single image, because a single picture fundamentally lacks the geometric information needed to recover true depth measurements. Stereo cameras have two image sensors which can be calibrated and rectified in order to compute a disparity image, which represents the depth of objects in the frame. Rather than continuing to pursue a monocular depth solution, the redtail team developed a novel semi-supervised DNN architecture that estimates left and right disparity maps directly from a stereo image pair [1]. Their approach used a combination of supervised and unsupervised learning, which at the time of its development made it, to the authors' knowledge, the first network applied to this problem to do so.

Loss Function with unsupervised and supervised terms

Eq. (2) ensures photometric consistency, Eq. (3) compares the estimated disparities to the sparse LIDAR data, Eq. (4) ensures that the left and right disparity maps are consistent with each other, and Eq. (5) encourages the disparity maps to be piecewise smooth.
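Since the equation images are not reproduced here, the combined loss has the general form sketched below. This is only a sketch: the λ symbols and subscripted term names are placeholders for the coefficients and terms defined in Eqs. (2) through (5) of the paper, listed in the same order.

E \;=\; \lambda_1\, E_{\text{photometric}} \;+\; \lambda_2\, E_{\text{LIDAR}} \;+\; \lambda_3\, E_{\text{LR-consistency}} \;+\; \lambda_4\, E_{\text{smoothness}}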

Expanded definitions of the loss function's coefficients

The photometric consistency of the images provides an unsupervised learning signal for training the network, while the inclusion of the ground truth LIDAR data provides a supervised one. In their research, the authors also experimented with training exclusively supervised and exclusively unsupervised networks, and the results with the least error came from the combined approach.

Expansion of the coefficients in equations (2) and (3)
Definitions for terms in the above equations

We have tested this network on our drone and are working to integrate the output disparity map into our GAAS project. The major advantage of this approach to computing depth is that we get a better representation of the disparity image before filtering it into a point cloud for the occupancy map. Further, this algorithm is optimized to run using the CUDA, cuDNN, and TensorRT libraries, which make use of the GPU; this leaves room for other functionality to run on the CPU, whereas most other disparity image generation algorithms, such as block matching (BM), are extremely CPU intensive. A classical way to generate a disparity image comparable to the one inferred by the Stereo DNN is semi-global block matching (SGBM), but it is extremely CPU intensive, and its output is smooth enough that any excessive filtering would strip major features from the point cloud and occupancy map.

Semi-Global Block Matching Disparity Image

The result below, generated with block matching plus the WLS filter, is the best trade-off we achieved with respect to computational efficiency. It is extremely important that the disparity image is computed quickly and contains just enough disparity information about the environment to generate a point cloud for the occupancy map.

Block Matching Disparity Image
Example of an occupancy map inside a room using a pose fused point cloud from the BM WLS disparity image
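For reference, below is a minimal standalone sketch of a classical block-matching plus WLS-filter disparity pipeline like the one described above, using OpenCV's ximgproc module. The file names and parameter values are assumptions for illustration, not the project's actual settings.

#include <opencv2/opencv.hpp>
#include <opencv2/ximgproc.hpp>

int main()
{
    // Assumed file names for a rectified stereo pair (StereoBM expects 8-bit grayscale).
    cv::Mat left  = cv::imread("left_rect.png",  cv::IMREAD_GRAYSCALE);
    cv::Mat right = cv::imread("right_rect.png", cv::IMREAD_GRAYSCALE);

    // Left and right block matchers (numDisparities must be a multiple of 16).
    cv::Ptr<cv::StereoBM> left_matcher = cv::StereoBM::create(96, 15);
    cv::Ptr<cv::StereoMatcher> right_matcher = cv::ximgproc::createRightMatcher(left_matcher);

    cv::Mat left_disp, right_disp;
    left_matcher->compute(left, right, left_disp);
    right_matcher->compute(right, left, right_disp);

    // WLS filter: edge-preserving smoothing of the raw disparity, guided by the left image.
    cv::Ptr<cv::ximgproc::DisparityWLSFilter> wls =
        cv::ximgproc::createDisparityWLSFilter(left_matcher);
    wls->setLambda(8000.0);   // regularization strength (assumed value)
    wls->setSigmaColor(1.5);  // edge sensitivity (assumed value)

    cv::Mat filtered_disp;
    wls->filter(left_disp, left, filtered_disp, right_disp);

    // Scale the fixed-point disparity to an 8-bit image for visualization or further filtering.
    cv::Mat disp_vis;
    cv::ximgproc::getDisparityVis(filtered_disp, disp_vis);
    cv::imwrite("bm_wls_disparity.png", disp_vis);
    return 0;
}

Running such a pipeline at camera frame rates keeps the CPU heavily loaded, which is the motivation for offloading disparity estimation to the GPU via the Stereo DNN.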

References

[1] Smolyanskiy, Nikolai, et al. “On the Importance of Stereo for Accurate Depth Estimation: An Efficient Semi-Supervised Deep Neural Network Approach.” ArXiv.org, 8 July 2020, arxiv.org/abs/1803.09719.

[2] Smolyanskiy, Nikolai, et al. “Toward Low-Flying Autonomous MAV Trail Navigation Using Deep Neural Networks for Environmental Awareness.” ArXiv.org, 22 July 2017, arxiv.org/abs/1705.02550.
