Categories
Offsites

Pathdreamer: A World Model for Indoor Navigation

When a person navigates around an unfamiliar building, they take advantage of many visual, spatial and semantic cues to help them efficiently reach their goal. For example, even in an unfamiliar house, if they see a dining area, they can make intelligent predictions about the likely location of the kitchen and lounge areas, and therefore the expected location of common household objects. For robotic agents, taking advantage of semantic cues and statistical regularities in novel buildings is challenging. A typical approach is to implicitly learn what these cues are, and how to use them for navigation tasks, in an end-to-end manner via model-free reinforcement learning. However, navigation cues learned in this way are expensive to learn, hard to inspect, and difficult to re-use in another agent without learning again from scratch.

People navigating in unfamiliar buildings can take advantage of visual, spatial and semantic cues to predict what’s around a corner. A computational model with this capability is a visual world model.

An appealing alternative for robotic navigation and planning agents is to use a world model to encapsulate rich and meaningful information about their surroundings, which enables an agent to make specific predictions about actionable outcomes within their environment. Such models have seen widespread interest in robotics, simulation, and reinforcement learning with impressive results, including finding the first known solution for a simulated 2D car racing task, and achieving human-level performance in Atari games. However, game environments are still relatively simple compared to the complexity and diversity of real-world environments.

In “Pathdreamer: A World Model for Indoor Navigation”, published at ICCV 2021, we present a world model that generates high-resolution 360º visual observations of areas of a building unseen by an agent, using only limited seed observations and a proposed navigation trajectory. As illustrated in the video below, the Pathdreamer model can synthesize an immersive scene from a single viewpoint, predicting what an agent might see if it moved to a new viewpoint or even a completely unseen area, such as around a corner. Beyond potential applications in video editing and bringing photos to life, solving this task promises to codify knowledge about human environments to benefit robotic agents navigating in the real world. For example, a robot tasked with finding a particular room or object in an unfamiliar building could perform simulations using the world model to identify likely locations before physically searching anywhere. World models such as Pathdreamer can also be used to increase the amount of training data for agents, by training agents in the model.

Provided with just a single observation (RGB, depth, and segmentation) and a proposed navigation trajectory as input, Pathdreamer synthesizes high resolution 360º observations up to 6-7 meters away from the original location, including around corners. For more results, please refer to the full video.

How Does Pathdreamer Work?
Pathdreamer takes as input a sequence of one or more previous observations, and generates predictions for a trajectory of future locations, which may be provided up front or iteratively by the agent interacting with the returned observations. Both inputs and predictions consist of RGB, semantic segmentation, and depth images. Internally, Pathdreamer uses a 3D point cloud to represent surfaces in the environment. Points in the cloud are labelled with both their RGB color value and their semantic segmentation class, such as wall, chair or table.

To predict visual observations in a new location, the point cloud is first re-projected into 2D at the new location to provide ‘guidance’ images, from which Pathdreamer generates realistic high-resolution RGB, semantic segmentation and depth. As the model ‘moves’, new observations (either real or predicted) are accumulated in the point cloud. One advantage of using a point cloud for memory is temporal consistency — revisited regions are rendered in a consistent manner to previous observations.

Internally, Pathdreamer represents surfaces in the environment via a 3D point cloud containing both semantic labels (top) and RGB color values (bottom). To generate a new observation, Pathdreamer ‘moves’ through the point cloud to the new location and uses the re-projected point cloud image for guidance.
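
The re-projection step described above can be illustrated with a minimal sketch. It is not Pathdreamer's actual implementation: the equirectangular projection, the function name `reproject_to_equirect`, the tiny image size, and the toy point cloud are all assumptions made for illustration. Pixels that no point lands on stay `None`, which plays the role of the unseen regions the generator has to fill in.

```python
import math

def reproject_to_equirect(points, camera_pos, width=16, height=8):
    """Project labelled 3D points into a 360-degree equirectangular image.

    points: iterable of (x, y, z, rgb, semantic_label) in world coordinates.
    Returns (rgb, semantic, depth) grids of size height x width. Pixels that
    receive no point stay None (previously unseen regions).
    """
    rgb_img = [[None] * width for _ in range(height)]
    sem_img = [[None] * width for _ in range(height)]
    depth_img = [[None] * width for _ in range(height)]
    cx, cy, cz = camera_pos
    for x, y, z, rgb, label in points:
        dx, dy, dz = x - cx, y - cy, z - cz
        dist = math.sqrt(dx * dx + dy * dy + dz * dz)
        if dist == 0.0:
            continue  # point coincides with the camera, so it has no direction
        azimuth = math.atan2(dy, dx)                           # in [-pi, pi]
        elevation = math.asin(max(-1.0, min(1.0, dz / dist)))  # in [-pi/2, pi/2]
        col = int((azimuth + math.pi) / (2 * math.pi) * width) % width
        row = min(int((math.pi / 2 - elevation) / math.pi * height), height - 1)
        # Simple z-buffer: keep only the nearest point that falls on a pixel.
        if depth_img[row][col] is None or dist < depth_img[row][col]:
            rgb_img[row][col] = rgb
            sem_img[row][col] = label
            depth_img[row][col] = dist
    return rgb_img, sem_img, depth_img

# Toy point cloud (hypothetical values): each point carries color and class.
cloud = [
    (1.0, 0.0, 0.0, (200, 200, 200), "wall"),
    (2.0, 0.0, 0.0, (120, 80, 40), "chair"),   # hidden behind the wall point
    (0.0, 0.0, 3.0, (255, 255, 255), "ceiling"),
]
rgb, sem, depth = reproject_to_equirect(cloud, camera_pos=(0.0, 0.0, 0.0))
seen = sum(1 for row in depth for d in row if d is not None)
print(sem[4][8], depth[4][8], seen)
```

Calling the function again with a different `camera_pos` corresponds to 'moving' through the point cloud; appending newly observed or predicted points to `cloud` corresponds to accumulating observations.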

To convert guidance images into plausible, realistic outputs Pathdreamer operates in two stages: the first stage, the structure generator, creates segmentation and depth images, and the second stage, the image generator, renders these into RGB outputs. Conceptually, the first stage provides a plausible high-level semantic representation of the scene, and the second stage renders this into a realistic color image. Both stages are based on convolutional neural networks.

Pathdreamer operates in two stages: the first stage, the structure generator, creates segmentation and depth images, and the second stage, the image generator, renders these into RGB outputs. The structure generator is conditioned on a noise variable to enable the model to synthesize diverse scenes in areas of high uncertainty.

Diverse Generation Results
In regions of high uncertainty, such as an area predicted to be around a corner or in an unseen room, many different scenes are possible. Incorporating ideas from stochastic video generation, the structure generator in Pathdreamer is conditioned on a noise variable, which represents the stochastic information about the next location that is not captured in the guidance images. By sampling multiple noise variables, Pathdreamer can synthesize diverse scenes, allowing an agent to sample multiple plausible outcomes for a given trajectory. These diverse outputs are reflected not only in the first stage outputs (semantic segmentation and depth images), but in the generated RGB images as well.

Pathdreamer is capable of generating multiple diverse and plausible images for regions of high uncertainty. Guidance images on the leftmost column represent pixels that were previously seen by the agent. Black pixels represent regions that were previously unseen, for which Pathdreamer renders diverse outputs by sampling multiple random noise vectors. In practice, the generated output can be informed by new observations as the agent navigates the environment.

Pathdreamer is trained with images and 3D environment reconstructions from Matterport3D, and is capable of synthesizing realistic images as well as continuous video sequences. Because the output imagery is high-resolution and 360º, it can be readily converted for use by existing navigation agents for any camera field of view. For more details and to try out Pathdreamer yourself, we recommend taking a look at our open source code.

Application to Visual Navigation Tasks
As a visual world model, Pathdreamer shows strong potential to improve performance on downstream tasks. To demonstrate this, we apply Pathdreamer to the task of Vision-and-Language Navigation (VLN), in which an embodied agent must follow a natural language instruction to navigate to a location in a realistic 3D environment. Using the Room-to-Room (R2R) dataset, we conduct an experiment in which an instruction-following agent plans ahead by simulating many possible navigable trajectories through the environment, ranking each against the navigation instructions, and choosing the best ranked trajectory to execute. Three settings are considered. In the Ground-Truth setting, the agent plans by interacting with the actual environment, i.e. by moving. In the Baseline setting, the agent plans ahead without moving by interacting with a navigation graph that encodes the navigable routes within the building, but does not provide any visual observations. In the Pathdreamer setting, the agent plans ahead without moving by interacting with the navigation graph and also receives corresponding visual observations generated by Pathdreamer.
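
The simulate, rank, and choose loop described above might look roughly like the following sketch. The names `plan_ahead`, `toy_simulate`, and `toy_score`, along with the toy data, are hypothetical stand-ins: in the experiment, simulation would come from the real environment, the navigation graph alone, or Pathdreamer, and scoring would come from the instruction-following agent.

```python
def plan_ahead(candidates, simulate, score, instruction):
    """Return the candidate trajectory that best matches the instruction.

    simulate(trajectory) -> observations expected along the trajectory.
    score(instruction, trajectory, observations) -> higher is better.
    """
    best, best_score = None, float("-inf")
    for trajectory in candidates:
        observations = simulate(trajectory)
        s = score(instruction, trajectory, observations)
        if s > best_score:
            best, best_score = trajectory, s
    return best, best_score

# Hypothetical toy world: what an agent would "see" at each graph node.
TOY_VIEWS = {"hall": ["door"], "kitchen": ["sink"], "bedroom": ["bed"]}

def toy_simulate(trajectory):
    # Stand-in for a world model: look up the predicted view at every node.
    return [obj for node in trajectory for obj in TOY_VIEWS[node]]

def toy_score(instruction, trajectory, observations):
    # Stand-in for the agent's ranker: count observed objects that the
    # instruction mentions.
    words = set(instruction.lower().split())
    return sum(1 for obj in observations if obj in words)

candidates = [["hall", "kitchen"], ["hall", "bedroom"]]
best, best_score = plan_ahead(
    candidates, toy_simulate, toy_score, "go past the door to the sink"
)
print(best, best_score)
```

In this framing, the three settings differ only in what `simulate` returns: real observations gathered by moving, no visual observations at all, or observations synthesized by the world model.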

When planning ahead for three steps (approximately 6m), in the Pathdreamer setting the VLN agent achieves a navigation success rate of 50.4%, significantly higher than the 40.6% success rate in the Baseline setting without Pathdreamer. This suggests that Pathdreamer encodes useful and accessible visual, spatial and semantic knowledge about real-world indoor environments. As an upper bound illustrating the performance of a perfect world model, under the Ground-Truth setting (planning by moving) the agent’s success rate is 59%, although we note that this setting requires the agent to expend significant time and resources to physically explore many trajectories, which would likely be prohibitively costly in a real-world setting.

We evaluate several planning settings for an instruction-following agent using the Room-to-Room (R2R) dataset. Planning ahead using a navigation graph with corresponding visual observations synthesized by Pathdreamer (Pathdreamer setting) is more effective than planning ahead using the navigation graph alone (Baseline setting), capturing around half the benefit of planning ahead using a world model that perfectly matches reality (Ground-Truth setting).
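
As a quick check of the "around half" figure, the success rates quoted above can be compared directly. This is only arithmetic on the reported numbers; the variable names are ours.

```python
# Success rates (in percent) reported for three-step planning.
baseline = 40.6       # navigation graph only
pathdreamer = 50.4    # navigation graph + Pathdreamer observations
ground_truth = 59.0   # planning by actually moving

gain = pathdreamer - baseline       # improvement from Pathdreamer
gap = ground_truth - baseline       # improvement from a perfect world model
fraction = gain / gap
print(f"{fraction:.0%} of the Ground-Truth benefit")
```

The gain of about 9.8 points out of a possible 18.4 works out to roughly 53%.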

Conclusions and Future Work
These results showcase the promise of using world models such as Pathdreamer for complicated embodied navigation tasks. We hope that Pathdreamer will help unlock model-based approaches to challenging embodied navigation tasks such as navigating to specified objects and VLN.

Applying Pathdreamer to other embodied navigation tasks such as Object-Nav, continuous VLN, and street-level navigation is a natural direction for future work. We also envision further research on improved architecture and modeling directions for the Pathdreamer model, as well as testing it on more diverse datasets, including but not limited to outdoor environments. To explore Pathdreamer in more detail, please visit our GitHub repository.

Acknowledgements
This project is a collaboration with Jason Baldridge, Honglak Lee, and Yinfei Yang. We thank Austin Waters, Noah Snavely, Suhani Vora, Harsh Agrawal, David Ha, and others who provided feedback throughout the project. We are also grateful for general support from Google Research teams. Finally, we thank Tom Small for creating the animation in the third figure.

Categories
Misc

NVIDIA AI Perception Coming to ROS Developers

NVIDIA announces new initiatives to deliver a suite of perception technologies to the ROS developer community.

All things that move will become autonomous. And all things autonomous will require advanced real-time perception. 

NVIDIA announced its latest initiatives to deliver a suite of perception technologies to the ROS developer community. These initiatives will reduce development time and improve performance for developers seeking to incorporate cutting-edge computer vision and AI/ML functionality into their ROS-based robotics applications.

Open Robotics to Extend ROS for NVIDIA AI

NVIDIA and Open Robotics have entered into an agreement to accelerate ROS 2 performance on NVIDIA’s Jetson edge AI platform and GPU-based systems and to enable seamless simulation interoperability between Open Robotics’s Ignition Gazebo and NVIDIA Isaac Sim on Omniverse. 

The Jetson platform is widely adopted by roboticists across a spectrum of applications. It is designed to enable high-performance, low latency processing for robots to be responsive, safe, and collaborative. Open Robotics will enhance ROS 2 to enable efficient management of data flow and shared memory across GPU and other processors present on Jetson. This will significantly improve the performance of applications that have to process high bandwidth data from sensors such as cameras and lidars in real-time. 

In addition to the enhancements for deployment of robot applications on Jetson, Open Robotics and NVIDIA are working on plans to integrate Ignition Gazebo and NVIDIA Isaac Sim. NVIDIA Isaac Sim already supports ROS 1 & 2 out of the box and features an all-important ecosystem of 3D content with its connection to popular applications, e.g., Blender and Unreal Engine 4. 

Ignition Gazebo brings a decades-long track record of widespread use throughout the robotics community, including in high-profile competition events such as the ongoing DARPA Subterranean Challenge. With the two simulators connected, ROS developers can easily move their robots and environments between Ignition and Isaac Sim to run large scale simulations and take advantage of each simulator’s advanced features, such as high-fidelity dynamics, accurate sensor models, and photorealistic rendering to generate synthetic data for training and testing of AI models. 

“As more ROS developers leverage hardware platforms that contain additional compute capabilities designed to offload the host CPU, ROS is evolving to make it easier to efficiently take advantage of these advanced hardware resources,” said Brian Gerkey, CEO of Open Robotics. “Working with an accelerated computing leader like NVIDIA and its vast experience in AI and robotics innovation will bring significant benefits to the entire ROS community.”

Software resulting from this collaboration is expected to be released in the spring of 2022.

Isaac GEMs Released for ROS with Significant Speedup

Isaac GEMs for ROS are hardware accelerated packages that make it easier for ROS developers to build high-performance solutions on the Jetson platform. The focus of these GEMs is on improving throughput on image processing and on DNN-based perception models that are of growing importance to roboticists. These packages reduce the load on the host CPU while providing significant performance gain. 

The new Isaac GEMs for ROS include: 

Figure 1. Stereo camera support in ROS, with left and right camera views in the ROS RViz tool. Both RGB and depth images are shown in RViz.

New Isaac Sim Features Enable ROS Developers

The latest release of Isaac Sim includes significant support for the ROS developer community. Some of the more compelling examples of this are the ROS 2 Navigation stack and the MoveIt Motion Planning Framework. These examples are available today and can be found in the Isaac Sim documentation.

List of ROS Examples in Isaac Sim 

  • ROS April Tag
  • ROS Stereo Camera
  • ROS Navigation
  • ROS TurtleBot3 Sample
  • ROS Manipulation and Camera Sample
  • ROS Services
  • MoveIt Motion Planning Framework
  • Native Python ROS Usage
  • ROS2 Navigation
Figure 2. Functional block diagram of Isaac Sim on Omniverse showing robot model, environment model, and 3D assets inputs.

Isaac Sim Generates Synthetic Datasets for Training Perception

In addition to being a robotics simulator, Isaac Sim has a powerful set of capabilities to generate synthetic data to train and test perception models. These capabilities will become more important as roboticists incorporate more perception features into their platforms. The better a robot can perceive its environment, the more autonomous it can be, and the less human intervention it requires.

Once Isaac Sim generates synthetic datasets, they can be fed directly into NVIDIA TAO, an AI model adaptation platform, to adapt perception models for a robot’s specific working environment. The task of ensuring that a robot’s perception stack is going to perform in a given working environment can be started well before any real data is collected from the target surroundings.

Roboticists have long faced challenges in integrating classic robotics tasks such as navigation with AI-based perception stacks. Isaac Sim addresses this workflow challenge by being a robotics simulator and a synthetic data generation tool at the same time, with streamlined integration with the TAO training platform.



More to Come at ROS World and GTC 2021

NVIDIA is gearing up for ROS World on Oct 21-22, 2021. We are planning to release more new GEMs for Jetson developers including several popular DNNs. We will also announce features in Isaac Sim to support the ROS developer community. Be sure to stop by our virtual booth, attend our NVIDIA ROS roundtable, watch the technical presentation on Isaac Sim, and more.

NVIDIA has a great lineup of speakers, talks, and content at the upcoming GTC, scheduled for Nov. 8-11. We have a track for robotics developers, including a presentation by Brian Gerkey, CEO and cofounder of Open Robotics. Additionally, we have talks covering NVIDIA Jetson, Isaac ROS, Isaac Sim, Isaac Gym, and more.

Getting Started Today

The following resources are available for developers interested in getting started today on adding NVIDIA AI Perception to their products.

Isaac GEMs for ROS >>
Isaac Sim information >>
Tutorials on Synthetic Data Generation with Isaac Sim >>
Accelerating ML Training with TAO toolkit information >>

Categories
Misc

AI Model Rapidly Identifies Structures Damaged by Wildfires

New research develops a deep learning algorithm to detect wildfire damage remotely.

Wildfire evacuees and disaster response groups could soon have the power to remotely scan a town for structural damage within minutes, using the newly developed AI tool DamageMap.

The project is a collaboration between researchers at Stanford University and California Polytechnic State University, San Luis Obispo. It uses aerial imagery and a deep learning algorithm to pinpoint building damage after a wildfire event. The research could guide disaster relief and personnel toward the areas that need them most, while keeping concerned homeowners informed.

“After a fire or disaster, lots of people need or want to know the extent and severity of damage. We set out to help reduce the response time to get actionable information valuable to fire victims, and emergency and recovery personnel,” said G. Andrew Fricker, an assistant professor at Cal Poly and codeveloper of DamageMap.

As the impacts of climate change lead to warmer and drier conditions, wildfire disasters are hitting communities more frequently and severely. In 2020, wildfires in the western US destroyed over 13,000 buildings, amounting to almost $20 billion in losses. With months to go in this season, California has already seen over 7,000 fires damage about 3,000 structures.

When blazes subside, damage assessment teams perform inspections and evaluate the safety of burned areas. These reports are used by emergency operations centers to organize disaster relief and recovery resources for residents. Knowing the location and the amount of damage in a region could help emergency groups allocate resources, especially when juggling multiple fires simultaneously.

While inspections are an essential step for repopulation, they are also time-consuming and resource-intensive.

Recent machine learning models have looked to alleviate this burden using satellite imagery. However, most methods require high-quality pre- and post-wildfire images of similar composition (such as lighting and angle) to detect changes and pinpoint areas of damage. They also require up-to-date images for accuracy, which can be costly to maintain and difficult to scale.

With DamageMap, the researchers trained a new deep learning algorithm capable of detecting damage by employing two models that work together and sleuth out the conditions of a building. The first model relies on any pre-fire drone or satellite imagery in a region to detect buildings and map out footprints. The second model uses post-fire aerial images to determine structural damage, such as scorched roofs or destroyed buildings.

The researchers used a database of 47,543 images of structures from five different wildfires across the globe to train the neural network. After the researchers hand-labeled a subset of these images as damaged or undamaged, the algorithm learned to identify and classify structures.

They tested the model using imagery from two recent California wildfires—the Butte County Camp Fire, and Shasta and Trinity County Carr Fire. Comparing model predictions against ground surveyor data—which records the location of damaged buildings—DamageMap accurately detected damaged structures about 96% of the time.

The technology is not only accurate, it’s also fast. Using an NVIDIA GPU and the cuDNN-accelerated PyTorch deep learning framework, DamageMap processes images at a rate of about 60 milliseconds per image.

Figure 1. DamageMap identifies damaged buildings in red and safe in green.
Courtesy of the DamageMap team 

Classifying the 15,931 buildings in the town of Paradise—an area almost completely destroyed by the 2018 Camp Fire—takes 16 minutes.  
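
The two figures above are consistent with each other, as a back-of-the-envelope check shows. The sketch assumes one image per building and strictly sequential processing, neither of which is stated explicitly in the article.

```python
# Figures quoted in the article.
buildings = 15_931          # buildings classified in Paradise
seconds_per_image = 0.060   # about 60 milliseconds per image

# Assumption: one image per building, processed one after another.
total_seconds = buildings * seconds_per_image
total_minutes = total_seconds / 60
print(f"{total_minutes:.1f} minutes")
```

This gives just under 16 minutes, matching the reported time.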

The work is available for testing and exploring, with the code and supporting analysis on GitHub. The researchers encourage others to use, develop, and improve the model further. 

According to Fricker, the tool can be trained to look beyond damaged buildings and include elements such as burned cars, or downed power lines to further inform response and recovery efforts.


Read the full article in the International Journal of Disaster Risk Reduction >>

Categories
Misc

NVIDIA Invites Healthcare Startup Submissions to Access UK’s Most Powerful Supercomputer

It takes major computing power to tackle major projects in digital biology — and that’s why we’re connecting pioneering healthcare startups with the U.K.’s most powerful supercomputer, Cambridge-1. U.K. startups can now apply to harness the system, which is dedicated to advancing healthcare with AI and digital biology. Since inaugurating Cambridge-1 in July, five founding Read article >

The post NVIDIA Invites Healthcare Startup Submissions to Access UK’s Most Powerful Supercomputer appeared first on The Official NVIDIA Blog.

Categories
Misc

Wild Things: 3D Reconstructions of Endangered Species with NVIDIA’s Sifei Liu

Endangered species can be difficult to study — they’re elusive, and the very act of observing them can disrupt their lives. Now, scientists can take a closer look at endangered species by studying AI-generated 3D representations of them. Sifei Liu, a senior research scientist at NVIDIA, has worked with her team to create an algorithm Read article >

The post Wild Things: 3D Reconstructions of Endangered Species with NVIDIA’s Sifei Liu appeared first on The Official NVIDIA Blog.

Categories
Misc

Next Generation: ‘Teens in AI’ Takes on the Ada Lovelace Hackathon

Jobs in data science and AI are among the fastest growing in the entire workforce, according to LinkedIn’s 2021 Jobs Report. Teens in AI, a London-based initiative, is working to inspire the next generation of AI researchers, entrepreneurs and leaders through a combination of hackathons, accelerators, networking events and bootcamps. In October, the organization, with Read article >

The post Next Generation: ‘Teens in AI’ Takes on the Ada Lovelace Hackathon appeared first on The Official NVIDIA Blog.

Categories
Misc

NVIDIA Calls UK AI Strategy “Important Step,” Will Open Cambridge-1 Supercomputer to UK Healthcare Startups

NVIDIA today called the U.K. government’s launch of its AI Strategy an important step forward, and announced a programme to open the Cambridge-1 supercomputer to U.K. healthcare…

Categories
Misc

Guide to Autoencoders with TensorFlow & Keras

Guide to Autoencoders with TensorFlow & Keras submitted by /u/RubiksCodeNMZ
Categories
Misc

Spyder/Tensorflow stuck on first epoch

I’ll link the StackOverflow post: https://stackoverflow.com/questions/69267805/spyder-tensorflow-stuck-on-first-epoch

Help is deeply appreciated. Thanks.

submitted by /u/Snoo37084

Categories
Misc

Getting Started with NVIDIA Networking

Preview and test Cumulus Linux in your own environment, at your own pace, without organizational or economic barriers.

Looking to try open networking for free? Try NVIDIA Cumulus VX—a free virtual appliance that provides all the features of NVIDIA Cumulus Linux. You can preview and test Cumulus Linux in your own environment, at your own pace, without organizational or economic barriers. You can also produce sandbox environments for prototype assessment, preproduction rollouts, and script development.

Cumulus VX runs on all popular hypervisors, such as VirtualBox and VMware vSphere, and orchestrators, such as Vagrant and GNS3.

Our website has the images needed to run NVIDIA Cumulus VX on your preferred hypervisor, and downloading them is simple. What’s more, we provide a detailed guide on how to install and set up Cumulus VX to create this simple two-leaf, one-spine topology.

Figure 1. Cumulus VX two-leaf, one-spine topology.

With these three switches up and running, you are all set to try out NVIDIA Cumulus Linux features, such as traditional networking protocols (BGP and MLAG), and NVIDIA Cumulus-specific technologies, such as ONIE and Prescriptive Topology Manager (PTM). And, not to worry, the Cumulus Linux User’s Guide is always close at hand to help you out, as well as the community Slack channel, where you can submit questions and engage with the wider community.

Explore further and try advanced configurations:

  • Update your virtual environment to use the NVIDIA Cumulus Linux on-demand self-paced labs, a quick and easy way to learn the fundamentals.
  • Run the topology converter to simulate a custom network topology with VirtualBox and Vagrant, or KVM-QEMU and Vagrant.

If your needs are different, or if you have platform or disk limitations, we also provide an alternative to NVIDIA Cumulus VX. NVIDIA Cumulus in the Cloud is a free, personal, virtual data center network that provides a low-effort way to see NVIDIA Cumulus technology in action—no hypervisor needed.