<p>In this lab, you'll learn about a number of memory optimization techniques when programming with CUDA Fortran for an NVIDIA GPU. You'll be working with a basic matrix transpose example.</p> <br> <p>The prerequisites for this lab are as follows: <ul> <li>Basic knowledge of programming with CUDA Fortran</li> </ul> </p> <br> <b><span style="color: red;">Please read the instructions at the bottom of this page before clicking the Start Lab button!</span></b>
This lab teaches you how to use the Computational Network Toolkit (CNTK) from Microsoft for training and testing neural networks to recognize handwritten digits. You will work through a series of examples that will allow you to design, create, train and test a neural network to classify the MNIST handwritten digit dataset, illustrating the use of convolutional, pooling and fully connected layers as well as different types of activation functions. By the end of the lab you will have basic knowledge of convolutional neural networks, which will prepare you to move to more advanced usage of CNTK.
<p>In this lab, you'll learn about a number of memory optimization techniques when programming with CUDA C/C++ for an NVIDIA GPU. You'll be working with a basic matrix transpose example.</p> <br> <p>The prerequisites for this lab are as follows: <ul> <li>Basic knowledge of programming with CUDA C/C++</li> </ul> </p>
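A CPU-only sketch may help make the lab's memory-access issue concrete (illustrative C, not the lab's CUDA code). The naive transpose below turns contiguous reads into strided writes, which is exactly the access pattern the lab optimizes with shared-memory tiling on the GPU; the tiled variant mirrors that idea on the CPU by improving cache locality.

```c
#include <stddef.h>

/* Naive out-of-place transpose: out[j][i] = in[i][j].
   Row-major reads become strided writes -- the memory-access
   pattern the lab optimizes with shared-memory tiling. */
void transpose_naive(const float *in, float *out, size_t rows, size_t cols) {
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            out[j * rows + i] = in[i * cols + j];
}

/* Tiled transpose: processing TILE x TILE blocks improves locality,
   mirroring the shared-memory tiling technique used in CUDA. */
#define TILE 32
void transpose_tiled(const float *in, float *out, size_t rows, size_t cols) {
    for (size_t ii = 0; ii < rows; ii += TILE)
        for (size_t jj = 0; jj < cols; jj += TILE)
            for (size_t i = ii; i < rows && i < ii + TILE; ++i)
                for (size_t j = jj; j < cols && j < jj + TILE; ++j)
                    out[j * rows + i] = in[i * cols + j];
}
```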
<p>Learn how to accelerate your C/C++ or Fortran application using OpenACC to harness the massively parallel power of NVIDIA GPUs. <a href="http://www.openacc.org" target="_blank">OpenACC</a> is a directive based approach to computing where you provide compiler hints to accelerate your code, instead of writing the accelerator code yourself. In 90 minutes, you will experience a four-step process for accelerating applications using OpenACC: <ol> <li>Characterize and profile your application</li> <li>Add compute directives</li> <li>Add directives to optimize data movement</li> <li>Optimize your application using kernel scheduling</li> </ol> </p>
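As a rough illustration of step 2 (adding compute directives), the C sketch below decorates a simple loop with an OpenACC directive; the function and loop are invented for illustration, not taken from the lab. A compiler without OpenACC support simply ignores the pragma and runs the loop serially, which is what makes the directive approach incremental.

```c
#include <stddef.h>

/* Step 2 of the four-step process: add a compute directive to a
   hot loop. A non-OpenACC compiler ignores the pragma and runs
   the loop serially, so the code stays portable. Compile with an
   OpenACC compiler (e.g. `nvc -acc`) to offload it to the GPU. */
void vec_add(const float *a, const float *b, float *c, size_t n) {
    #pragma acc parallel loop copyin(a[0:n], b[0:n]) copyout(c[0:n])
    for (size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}
```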
The primary purpose of this lab is to explore how deep learning can be leveraged in a healthcare setting to predict severity of illness in patients based on information provided in electronic health records (EHR). We will use the Python library <a href="http://pandas.pydata.org/">pandas</a> to manage a dataset provided in <a href="https://en.wikipedia.org/wiki/Hierarchical_Data_Format">HDF5</a> format and the deep learning framework <a href="https://keras.io/">Keras</a> to build recurrent neural networks (<a href="https://en.wikipedia.org/wiki/Recurrent_neural_network">RNNs</a>). In particular, this lab constructs a special kind of deep recurrent neural network called a long short-term memory network (<a href="https://en.wikipedia.org/wiki/Long_short-term_memory">LSTM</a>). The general idea is to develop an analytic framework, powered by deep learning techniques, that gives medical professionals the capability to generate patient mortality predictions at any time of interest. Such a solution provides essential feedback to clinicians assessing the impact of treatment decisions, and can raise early warnings to flag at-risk patients in a busy hospital care setting. Finally, we will compare the performance of this LSTM approach to standard mortality indices such as PIM2 and PRISM3, and contrast it with alternative formulations using more traditional machine learning methods like logistic regression.<br><br> <u><b>If you launch this lab then you agree to the following terms:</b></u> <ol> <li>The Dataset cannot be downloaded, shared, transferred or provided to any users for any activities outside of the Permitted Use during the workshop.</li> <li>All Permitted Users agree not to use the information in the Dataset to identify or contact the individuals who are data subjects in the DDS or his/her relatives, employers or household members.</li> </ol>
<p>Thrust is a parallel algorithms library loosely based on the C++ Standard Template Library. Thrust provides a number of building blocks, such as sorts, scans, transforms, and reductions, to enable developers to quickly embrace the power of parallel computing. In addition to targeting the massive parallelism of NVIDIA GPUs, Thrust supports multiple system back-ends such as OpenMP and Intel’s Threading Building Blocks. This means that it’s possible to compile your code for different parallel processors with a simple flick of a compiler switch.<br> In 90 minutes, you will work through a number of exercises including: <ol> <li>Basic Iterators, Containers, and Functions</li> <li>Built-in and Custom Functors</li> <li>Fancy Iterators</li> <li>Portability to CPU processing</li> <li>Exception and Error handling</li> <li>A case study implementing all of the above</li> </ol></p>
Learn how to accelerate your C/C++ application using drop-in libraries to harness the massively parallel power of NVIDIA GPUs. In about two hours, you will work through three exercises, including: <ul> <li>Use cuBLAS to accelerate a basic matrix multiply</li> <li>Combine libraries by adding some cuRAND API calls to the previous cuBLAS calls</li> <li>Use nvprof to profile code and optimize with some CUDA Runtime API calls </li> </ul>
Learn about shared memory, generalized ufuncs, and GPU dataframes, intermediate topics for CUDA Python programming with Numba.
Learn how to accelerate your Fortran application using GPU Libraries to harness the massively parallel power of NVIDIA GPUs. In less than an hour, you will work through three exercises, including: <ul> <li>Use cuBLAS to accelerate a basic matrix multiply</li> <li>Combine libraries by adding some cuRAND API calls to the previous cuBLAS calls</li> <li>Use nvprof to profile code and optimize with some CUDA Runtime API calls </li> </ul> <br> <b><span style="color: red;">Please read the instructions at the bottom of this page before clicking the Start Lab button!</span></b>
Learn how a neural network with an autoencoder can be used to dramatically speed up the removal of noise in ray-traced images. You will learn how to: <ul> <li>Determine whether noise exists in rendered images</li> <li>Use a pre-trained network to denoise sample images or your own images</li> <li>Train your own denoiser using the provided dataset</li> </ul> Upon completion of this Section, you will be able to use autoencoders inside neural networks to train your own rendered-image denoiser.
Leverage the NVIDIA Command-Line Profiler and an understanding of Unified Memory to iteratively optimize CUDA C/C++ accelerated applications.
Docker is a popular container infrastructure that allows programs and large software frameworks to be packaged (i.e. containerized) and distributed as a single preconfigured image, alleviating the need for a complex installation and configuration process on the local host. Together with the nvidia-docker plugin, which exposes the GPU hardware on the host inside of the container, it is possible to run production-grade deep learning workflows with considerably reduced host configuration and administration. In this lab we show you how to work with Docker images and manage the container lifecycle. We demonstrate how to access images on the public Docker image registry DockerHub for maximum reuse in creating composable lightweight containers. Finally, we give step-by-step examples of deep learning training in both TensorFlow and MXNet using nvidia-docker, and provide instructions for creating your own local registry for hosting Docker images on a private network. The lab concludes with a brief discussion on next steps, such as scaling container workflows for the datacenter, available tools in the Docker ecosystem and Cloud container services.
Deep learning allows us to map inputs to outputs in ways that are extremely computationally intense. Learn to deploy deep learning in applications that recognize images and detect pedestrians in real time by: <ul> <li>Accessing and understanding the files that make up a trained model</li> <li>Building from each function’s unique input and output</li> <li>Optimizing the most computationally intense parts of your application for different performance metrics like throughput and latency</li> </ul> Upon completion of this lab, you will be able to implement deep learning to solve problems in the real world.
Use the TF.Learn API in TensorFlow to solve a binary classification problem: given census data about a person, such as age, gender, education and occupation (the features), we will try to predict whether or not the person earns more than 50,000 dollars a year (the target label). We will train a logistic regression model that, given an individual's information, outputs a number between 0 and 1, which can be interpreted as the probability that the individual has an annual income of over 50,000 dollars.<br><br> In this lab, you will learn how to:<br> <ul> <li>Use Pandas and tf.contrib to load, view, and organize data.</li> <li>Select and engineer data.</li> <li>Train and evaluate a linear model.</li> <li>Use regularization to prevent overfitting.</li> </ul><br> On completion of this lab, you will be able to <b>go from dataset to trained linear model with multiple datasets.</b>
In this lab, we introduce some basic methods for utilizing a Convolutional Neural Network (CNN) to process Radio Frequency (RF) signals. More specifically, we look at the classic problem of detecting a weak signal corrupted by noise. We show you how to leverage the NVIDIA DIGITS application to read in a dataset, train a CNN, adjust hyper-parameters and then test and evaluate the performance of your model. <br><br> Lab created by <a href="https://kickview.com/" target="_blank">KickView - Intelligent Processing Applications</a>
Learn how to accelerate your Fortran application using CUDA to harness the massively parallel power of NVIDIA GPUs. In less than an hour, you will work through three exercises, including: <ul> <li>Hello Parallelism!</li> <li>Accelerate the simple SAXPY algorithm</li> <li>Accelerate a basic Matrix Multiply algorithm with CUDA</li> </ul> <br> <b><span style="color: red;">Please read the instructions at the bottom of this page before clicking the Start Lab button!</span></b>
Learn and employ the fundamental techniques for GPU-accelerating CPU-only applications on the world’s most performant parallel processors using CUDA C/C++.
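The starting point for that kind of port is typically a serial loop such as SAXPY; this CPU-only C version is the sort of code the course teaches you to re-express as a CUDA kernel (a sketch, not course material).

```c
#include <stddef.h>

/* SAXPY (y = a*x + y), the classic first loop to accelerate:
   a CPU-only version like this is the starting point that gets
   ported to a CUDA kernel launched across thousands of threads. */
void saxpy(float a, const float *x, float *y, size_t n) {
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```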
Many problems have established deep learning solutions, but sometimes the problem you want to solve does not. Learn to create custom solutions through the challenge of detecting whale faces in aerial images by: <ul> <li>Combining traditional computer vision with deep learning</li> <li>Performing minor “brain surgery” on an existing neural network using the deep learning framework Caffe</li> <li>Harnessing the knowledge of the deep learning community by identifying and using a purpose-built network and end-to-end labeled data</li> </ul> Upon completion of this lab, you will be able to solve custom problems with deep learning.
Learn how to accelerate your Python application using CUDA to harness the massively parallel power of NVIDIA GPUs. In less than an hour, you will work through three exercises, including: <ul> <li>Hello Parallelism!</li> <li>Accelerate the simple SAXPY algorithm</li> <li>Accelerate a basic Matrix Multiply algorithm with CUDA</li> </ul>
This lab explores various approaches to the problem of semantic image segmentation, a generalization of image classification where class predictions are made at the pixel level. We will use the Sunnybrook Cardiac Data to train a neural network to locate the left ventricle in MRI images. On completion of this lab, you will understand how to use popular image classification neural networks for semantic segmentation, learn how to extend Caffe with custom Python layers, become familiar with the concept of transfer learning, and train two neural networks from the family of Fully Convolutional Networks (FCNs).
Deep learning enables entirely new solutions by replacing hand-coded instructions with models learned from examples. Train a deep neural network to recognize handwritten digits by: <ul> <li>Loading image data to a training environment</li> <li>Choosing and training a network</li> <li>Testing with new data and iterating to improve performance</li> </ul> On completion of this lab, you will be able to assess what data you should be training from.
An introduction to CUDA Python programming with Numba.
There are a variety of important applications that need to go beyond detecting individual objects within an image, and that instead need to segment the image into spatial regions of interest. An example of image segmentation involves medical imagery analysis, where it is often important to separate the pixels corresponding to different types of tissue, blood or abnormal cells, so that you can isolate a particular organ. Another example includes self-driving cars, where segmenting an image into distinct areas is needed to understand road scenes. In this lab, you will learn how to train and evaluate an image segmentation network using TensorFlow.
Thanks to work being performed at Mayo Clinic, approaches using deep learning techniques to detect Radiomics from MRI imaging can lead to more effective treatments and yield better health outcomes for patients with brain tumors. Radiogenomics, specifically Imaging Genomics, refers to the correlation between cancer imaging features and gene expression. Imaging Genomics (Radiomics) can be used to create biomarkers that identify the genomics of a disease without the use of an invasive biopsy. The focus of this lab is detection of the 1p19q co-deletion biomarker using deep learning (specifically, convolutional neural networks) with Keras and TensorFlow. What is remarkable about this research and lab is the novelty and promising results of utilizing deep learning to predict Radiomics.
If you have not already registered, please sign-up for the <a href="https://developer.nvidia.com/openacc-course">OpenACC Lab series</a>.<br><br> It is highly recommended that you have a basic understanding of programming with OpenACC. If you do not, try the OpenACC - 2X in 4 Steps lab first!<br><br> This lab continues the work completed in the "Profiling and Parallelizing" lab by adding OpenACC data management directives and then optimizing the code using the OpenACC loop directive. Participants will use the PGI compiler and NVIDIA Visual Profiler to optimize the code. This lab is intended to be taken after completing the previous lab and after watching lecture 3 of the free OpenACC <a href="https://developer.nvidia.com/openacc-course">course</a> provided by NVIDIA.
Explore the fundamentals of deep learning by training neural networks and using the results to improve performance and capabilities. In this hands-on course, you will learn the basics of deep learning by training and deploying neural networks. You will: <ul> <li>Implement common deep learning workflows such as image classification and object detection</li> <li>Experiment with data, training parameters, network structure, and other strategies to increase performance and capability</li> <li>Deploy your networks to start solving real-world problems</li> </ul> On completion, you will be able to solve your own problems with deep learning. The Quest below is an older version of this course. For the newer, more up-to-date version, please explore the new DLI Cloud Platform here: <a href="https://courses.nvidia.com/courses/course-v1:DLI+C-FX-01+V2/about">https://courses.nvidia.com/courses/course-v1:DLI+C-FX-01+V2/about</a>
In this hands-on course, you will learn how to apply Convolutional Neural Networks (CNNs) to MRI scans to perform a variety of medical tasks and calculations. You will: <ul> <li>Perform image segmentation on MRI images to determine the location of the left ventricle</li> <li>Calculate ejection fractions by measuring differences between diastole and systole, using CNNs applied to MRI scans to detect heart disease</li> <li>Apply CNNs to MRI scans of LGGs to determine 1p/19q chromosome co-deletion status</li> </ul> Upon completion of this course, you’ll be able to apply CNNs to MRI scans to conduct a variety of medical tasks.
In this hands-on course, you will learn the basics of deep learning and how to apply deep learning to detect chromosome co-deletion and search for motifs in genomic sequences. You will: <ul> <li>Understand the basics of Convolutional Neural Networks (CNNs) and how they work</li> <li>Apply CNNs to MRI scans of LGGs to determine 1p/19q chromosome co-deletion status</li> <li>Use the DragoNN toolkit to simulate genomic data and to search for motifs</li> </ul> Upon completion of this course, you’ll be able to understand how CNNs work, evaluate MRI images using CNNs, and use real regulatory genomic data to research new motifs.
Accelerate your C/C++ applications on the massively parallel NVIDIA GPUs using CUDA. This course is for anyone with some C/C++ experience who’s interested in accelerating the performance of their applications beyond the limits of CPU-only programming. In this course, you’ll learn how to: <ul> <li>Extend your C/C++ code with the CUDA programming model</li> <li>Write and launch kernels that execute with massive parallelism on an NVIDIA GPU</li> <li>Profile and optimize your accelerated programs</li> </ul> Upon completion, you’ll be able to write massively parallel heterogeneous programs on powerful NVIDIA GPUs, and optimize their performance by utilizing NVVP.
Learn the basics of OpenACC, a high-level, directive-based programming model for GPUs. This course is for anyone with some C/C++ experience who is interested in accelerating the performance of their applications beyond the limits of CPU-only programming. In this course, you’ll learn: <ul> <li>Four simple steps to accelerating your already existing application with OpenACC</li> <li>How to profile and optimize your OpenACC codebase</li> <li>How to program on multi-GPU systems by combining OpenACC with MPI</li> </ul> Upon completion, you’ll be able to build and optimize accelerated heterogeneous applications on multiple GPU clusters using a combination of OpenACC, CUDA-aware MPI, and NVIDIA profiling tools.
Prerequisites: Basic Python competency<br> Duration: 8 hours<br><br> This hands-on course explores how to use Numba – the just-in-time, type-specializing Python function compiler – to accelerate Python programs to run on massively parallel NVIDIA GPUs. You’ll learn how to: <ul> <li>Use Numba to compile CUDA kernels from NumPy ufuncs</li> <li>Use Numba to create and launch custom CUDA kernels</li> <li>Apply key GPU memory management techniques</li> </ul> Upon completion, you’ll be able to use Numba to compile and launch CUDA kernels to accelerate your Python applications on NVIDIA GPUs.
The primary purpose of this lab is to explore the second annual National Data Science Bowl (NDSB2). The challenge posed by NDSB2 was to estimate ejection fraction from a sequence of MRI-derived images of a beating heart. The ejection fraction is the fraction of blood pumped out of the heart with each beat, computed from the difference between the heart's filled (end-diastolic) and contracted (end-systolic) volumes. The general notion is that an abnormal ejection fraction (too small or too large) is indicative of a serious medical condition. In a typical laboratory setting, it can take upwards of 20 minutes for medical professionals to analyze a single cardiac MRI scan -- no doubt time better spent with patients. For this qwiklab we use the popular R programming language with the deep learning framework MXNet to create a powerful GPU-accelerated convolutional neural network (CNN) solution. This lab will outline the process of preparing a large image dataset for training, as well as general considerations and common strategies for deep learning. In this brief encounter we will not be able to obtain the near-human performance levels achieved by the NDSB2 competition finalists; however, tutorial alumni will take away the essential knowledge and basic skills to be successful in creating their own deep learning workflows. Finally, we hope this interaction raises awareness of applications of deep learning in healthcare and inspires participants to contribute their own ideas in the next National Data Science Bowl.
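The quantity being estimated can be stated in a few lines; this C helper is an illustrative definition, not part of the lab's R/MXNet code.

```c
/* Ejection fraction from the heart's filled (end-diastolic, EDV)
   and contracted (end-systolic, ESV) volumes, in any consistent
   unit: EF = (EDV - ESV) / EDV.
   A commonly cited normal range is roughly 0.55 to 0.70. */
double ejection_fraction(double edv, double esv) {
    return (edv - esv) / edv;
}
```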
<p>Learn about the three techniques for accelerating code on a GPU: libraries, directives like OpenACC, and writing code directly in CUDA-enabled languages. In 45 minutes, you will work through a few different exercises demonstrating the potential speed-ups and ease of use of porting code to the GPU.</p>
Learn how to accelerate your Python application using GPU drop-in libraries to harness the massively parallel power of NVIDIA GPUs. In less than an hour, you will work through three exercises, including: <ul> <li>Use a Python profiler to determine which part of the code is consuming the most time</li> <li>Use a cuRAND API call to optimize this portion of code</li> <li>Profile again and use the CUDA Runtime API to optimize data movement to achieve more application speed-up</li> </ul> <br> <b>Please read the instructions below before starting the lab!</b>
Learn to write custom CUDA kernels and techniques for optimal memory migration for CUDA Python with Numba.
In this lab we introduce deep learning accelerated by GPUs. We tour popular software frameworks for deep learning by training Convolutional Neural Networks (CNNs) in each framework to classify images.
<p>OpenACC is a high-level language for programming GPUs using compiler hints. With OpenACC, a programmer can take advantage of the benefits of GPUs with little code change and incremental improvements to their existing code. This lab is intended for existing OpenACC programmers who want to take their OpenACC skills to the next level by overlapping data copies with GPU computation using a simple technique known as pipelining. When it’s impossible to completely eliminate the need to copy data to and from GPU memory, pipelining can make these copies nearly free.<br> In 90 minutes, you will work through a number of exercises including: <ol> <li>Using the OpenACC routine directive to allow on-device function calls</li> <li>Breaking up large work into bite-sized pieces</li> <li>Working on these pieces asynchronously from the CPU</li> <li>Overlapping GPU computation and PCIe data motion</li> </ol> Some OpenACC experience is required to take this lab. For an introduction to OpenACC, please see our other labs. </p>
<p>In this lab you will learn how to program multi-GPU systems or GPU clusters using the Message Passing Interface (MPI) and OpenACC. Basic knowledge of MPI and OpenACC is a prerequisite. The topics covered by this lab are:<br> <ul> <li>Exchanging data between different GPUs using CUDA-aware MPI and OpenACC</li> <li>Handling GPU affinity in multi-GPU systems</li> <li>Overlapping communication with computation to hide communication times</li> <li>Optionally, using the NVIDIA performance analysis tools</li> </ul></p> <p><span style="color:red;">Recommended prerequisites</span> for this lab are: C or Fortran, basic OpenACC and basic MPI.</p>
In this lab, we will take a look at three different approaches to convolution using CUDA, each suited to a different kind of problem domain:<ol> <li>Convolution of two equal-size signals</li> <li>Convolution of a large signal with a small signal</li> <li>Convolution of two signals using frequency-domain (FFT) methods</li> </ol><br> This lab will introduce constant memory and shared memory for optimizing global memory accesses. In addition, the NVIDIA Visual Profiler will be briefly used to help understand global memory access statistics.<br><br> <span style="color:red;"><b>Prerequisites</b></span>: You are expected to know how to write basic CUDA kernels and to be familiar with basic CUDA codes like vector add.
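For reference, here is a plain-C sketch of direct time-domain convolution of a large signal with a small kernel (the second case above); the CUDA versions in the lab would place the small kernel in constant memory and stage the signal through shared memory.

```c
#include <stddef.h>

/* Direct (time-domain) convolution of a signal of length n with
   a kernel of length m. Output length is n + m - 1. Each output
   sample is a sliding weighted sum -- the inner loop is what the
   lab parallelizes across CUDA threads. */
void convolve(const float *sig, size_t n,
              const float *ker, size_t m, float *out) {
    for (size_t i = 0; i < n + m - 1; ++i) {
        out[i] = 0.0f;
        for (size_t j = 0; j < m; ++j)
            if (i >= j && i - j < n)
                out[i] += ker[j] * sig[i - j];
    }
}
```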
In this lab, we use the DragoNN (Deep RegulAtory GenOmics Neural Network) toolkit on simulated and real regulatory genomic data, demystify popular DragoNN architectures, and provide guidelines for modeling and interpreting regulatory sequence with DragoNN models. We will answer questions such as: When is a DragoNN a good choice for a learning problem in genomics? How does one design a high-performance model? And, more importantly, can we interpret these models to discover predictive genome sequence patterns and gain new biological insights?
This Module will guide you through how to transfer the look and feel of one image to another by extracting distinct visual features. See how convolutional neural networks are used for feature extraction and feed into a generator that paints a new resultant image. You will learn how to: <ul> <li>Transfer the look and feel of one image to another image by extracting distinct visual features</li> <li>Qualitatively determine whether a style is transferred correctly using different techniques</li> <li>Use architectural innovations and training techniques for arbitrary style transfer</li> </ul> Upon completion of this Section, you will be able to use neural networks to do arbitrary style transfer fast enough to apply even to videos.
Effective descriptions of content within images and video clips can be generated with convolutional and recurrent neural networks. Users will apply a deep learning technique via a framework to create captions for data and generate their own captions.
Leverage the NVIDIA Visual Profiler and an understanding of concurrent CUDA streams to iteratively optimize CUDA C/C++ accelerated applications.
If you have not already registered, please sign-up for the <a href="https://developer.nvidia.com/openacc-course">OpenACC Lab series</a>.<br><br> It is highly recommended that you have a basic understanding of programming with OpenACC. If you do not, try the OpenACC - 2X in 4 Steps lab first!<br><br> In this lab, participants will gain experience with the first two steps of the OpenACC programming cycle: Identify and Express Parallelism. Participants will profile a provided C or Fortran application using NVIDIA NVPROF and use the PGI OpenACC compiler to accelerate the code. This lab is intended to be taken after lecture 2 of the OpenACC course provided by NVIDIA.
<p>In this self-paced, hands-on lab, you will learn how to improve a multi-GPU MPI+OpenACC program. It is a follow-up to the Introduction to Multi GPU Programming with MPI and OpenACC lab. Knowledge of how to program multiple GPUs with MPI and OpenACC is a prerequisite. The topics covered by this lab are:<br> <ul> <li>Overlapping communication with computation to hide communication times</li> <li>Handling noncontiguous halo updates with a 2D tiled domain decomposition</li> </ul></p> <p><span style="color:red;">Recommended prerequisites</span>: C or Fortran, basic OpenACC and basic MPI.</p>
This Module will guide you through the process of training a Generative Adversarial Network (GAN) to generate image contents in DIGITS. You will learn how to: <ul> <li>Use Generative Adversarial Networks (GANs) to create handwritten numbers</li> <li>Visualize the feature space and use attribute vectors to generate image analogies</li> <li>Train a GAN to generate images with set attributes</li> </ul> Upon completion of this Section, you will be able to use GANs to generate images by manipulating feature space.
OpenACC is methodically evolving to improve programmer productivity for exploration & production workloads, allowing developers to focus less on computer science and more on geoscience. This lab uses a profile-driven approach, with hints from the compiler and the underlying memory management, to accelerate an open-source seismic processing application. Using PGPROF, a profiling tool that helps accelerate both host and GPU code, this lab contains four tasks: <ul> <li>Assess: Identify critical regions and profile the baseline CPU code</li> <li>Parallelize: Decorate key loops with parallel directives and use managed memory to migrate pages automatically</li> <li>Optimize: Use the profile and verbose compiler output to decorate data directives, and measure best performance</li> <li>Deploy: Instead of using OpenMP or pthreads to maximize CPU cores, use the compiler “multicore” option with OpenACC directives for portable performance</li> </ul>