In this lab, you'll learn about a number of memory optimization techniques for programming with CUDA Fortran on an NVIDIA GPU. You'll work with a basic matrix transpose example. The only prerequisite is basic knowledge of CUDA Fortran programming. Please read the instructions at the bottom of this page before clicking the Start Lab button!
This lab teaches you how to use the Computational Network Toolkit (CNTK) from Microsoft for training and testing neural networks to recognize handwritten digits. You will work through a series of examples that will allow you to design, create, train and test a neural network to classify the MNIST handwritten digit dataset, illustrating the use of convolutional, pooling and fully connected layers as well as different types of activation functions. By the end of the lab you will have basic knowledge of convolutional neural networks, which will prepare you to move to more advanced usage of CNTK.
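To make the layer types concrete before diving into CNTK's abstractions, here is a pure-Python sketch of what the two characteristic layers of a convolutional network compute: a single-channel 2-D convolution (valid padding, stride 1) and 2x2 max pooling. This is an illustrative analog only, not CNTK code; the function names are ours.

```python
# Pure-Python sketch of the two layer types highlighted in this lab:
# a single-channel 2-D convolution (valid padding, stride 1) and
# 2x2 max pooling. Illustrative only -- the lab itself uses CNTK layers.
def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    # Slide the kernel over every valid position and sum the products.
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(ow)] for i in range(oh)]

def maxpool2x2(fmap):
    # Keep the maximum of each non-overlapping 2x2 window.
    return [[max(fmap[i][j], fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]
```

A diagonal kernel applied to a diagonal image responds strongly along the diagonal, which is exactly the feature-detection behavior the convolutional layers in the lab learn automatically.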
In this lab, you'll learn about a number of memory optimization techniques for programming with CUDA C/C++ on an NVIDIA GPU. You'll work with a basic matrix transpose example. The only prerequisite is basic knowledge of CUDA C/C++ programming.
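The central memory optimization in this lab is tiling the transpose so that both reads and writes can be coalesced. The tiling structure can be sketched on the host side in plain Python (a conceptual analog only; the lab itself stages each tile through CUDA shared memory, and the names here are illustrative):

```python
# Host-side sketch of the tiled-transpose pattern. On the GPU, TILE would
# match the thread-block dimension and each tile would be staged through
# shared memory so that both the load and the store are coalesced.
TILE = 32

def transpose_tiled(a, n):
    """Transpose an n x n matrix (list of lists), one tile at a time."""
    out = [[0] * n for _ in range(n)]
    for i0 in range(0, n, TILE):
        for j0 in range(0, n, TILE):
            # Process one TILE x TILE tile; this inner loop is what a
            # single thread block would perform cooperatively.
            for i in range(i0, min(i0 + TILE, n)):
                for j in range(j0, min(j0 + TILE, n)):
                    out[j][i] = a[i][j]
    return out
```

The payoff on a GPU is that each tile is read row-wise and written row-wise, avoiding the strided global-memory accesses of a naive transpose.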
The primary purpose of this lab is to explore how deep learning can be leveraged in a healthcare setting to predict severity of illness in patients from information provided in electronic health records (EHR). We will use the Python library pandas to manage a dataset provided in HDF5 format, and the deep learning framework Keras to build recurrent neural networks (RNNs). In particular, this lab constructs a special kind of deep recurrent neural network called a long short-term memory (LSTM) network. The general idea is to develop an analytic framework, powered by deep learning techniques, that gives medical professionals the ability to generate patient mortality predictions at any time of interest. Such a solution provides essential feedback to clinicians trying to assess the impact of treatment decisions, and can raise early warnings to flag at-risk patients in a busy hospital care setting. Finally, we will compare the performance of this LSTM approach against standard mortality indices such as PIM2 and PRISM3, and contrast it with alternative formulations using more traditional machine learning methods such as logistic regression. By launching this lab you agree to the following terms: The Dataset cannot be downloaded, shared, transferred, or provided to any users for any activities outside of the Permitted Use during the workshop. All Permitted Users agree not to use the information in the Dataset to identify or contact the individuals who are data subjects in the DDS, or their relatives, employers, or household members.
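What distinguishes an LSTM from a plain RNN is its gated cell-state update. A scalar version of one LSTM step can be sketched in plain Python; this is purely illustrative (the lab builds LSTMs with Keras, which uses weight matrices rather than the per-gate scalar weights assumed here):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One scalar LSTM step. w maps gate name -> (w_x, w_h, bias).
    Illustrative only; a real LSTM (e.g. in Keras) uses weight matrices."""
    f = sigmoid(w['f'][0] * x + w['f'][1] * h_prev + w['f'][2])    # forget gate
    i = sigmoid(w['i'][0] * x + w['i'][1] * h_prev + w['i'][2])    # input gate
    g = math.tanh(w['g'][0] * x + w['g'][1] * h_prev + w['g'][2])  # candidate
    o = sigmoid(w['o'][0] * x + w['o'][1] * h_prev + w['o'][2])    # output gate
    c = f * c_prev + i * g   # new cell state: forget some past, admit some new
    h = o * math.tanh(c)     # new hidden state exposed to the next layer
    return h, c
```

The cell state `c` is the long-term memory channel that lets the network carry a patient's earlier measurements forward across many time steps, which is why this architecture suits EHR time series.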
Thrust is a parallel algorithms library loosely based on the C++ Standard Template Library. Thrust provides a number of building blocks, such as sorts, scans, transforms, and reductions, to enable developers to quickly embrace the power of parallel computing. In addition to targeting the massive parallelism of NVIDIA GPUs, Thrust supports multiple system back-ends such as OpenMP and Intel's Threading Building Blocks, which means it's possible to compile your code for different parallel processors with a simple flick of a compiler switch. In 90 minutes, you will work through a number of exercises, including: basic iterators, containers, and functions; built-in and custom functors; fancy iterators; portability to CPU processing; exception and error handling; and a case study implementing all of the above.
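As a rough analog, Thrust's building blocks map onto familiar functional operations; the transform/sort/reduce pipeline you'll write in C++ against a `thrust::device_vector` looks like this in Python (a conceptual sketch only, running serially on the CPU):

```python
from functools import reduce

# Rough Python analog of a Thrust pipeline: transform a vector with a
# functor, sort it, then reduce it. In Thrust these steps would be
# thrust::transform, thrust::sort, and thrust::reduce on a device_vector,
# each executing in parallel on the selected back-end.
data = [5, 3, 1, 4, 2]
squared = list(map(lambda x: x * x, data))   # thrust::transform with a functor
squared.sort()                               # thrust::sort
total = reduce(lambda a, b: a + b, squared)  # thrust::reduce (a sum)
```

The point of the lab is that the same high-level composition carries over, while Thrust dispatches each step to the GPU, OpenMP, or TBB depending on the compiler switch.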
Learn how to accelerate your C/C++ or Fortran application using OpenACC to harness the massively parallel power of NVIDIA GPUs. OpenACC is a directive-based approach to computing in which you provide compiler hints to accelerate your code, instead of writing the accelerator code yourself. In 90 minutes, you will experience a four-step process for accelerating applications using OpenACC: (1) characterize and profile your application, (2) add compute directives, (3) add directives to optimize data movement, and (4) optimize your application using kernel scheduling.
Learn how to accelerate your C/C++ application using drop-in libraries to harness the massively parallel power of NVIDIA GPUs. In about two hours, you will work through three exercises: use cuBLAS to accelerate a basic matrix multiply; combine libraries by adding cuRAND API calls to the previous cuBLAS calls; and use nvprof to profile the code and optimize it with CUDA Runtime API calls.
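The baseline that the first exercise replaces with a single cuBLAS call is the classic triple-loop matrix multiply, sketched here in Python for reference (cuBLAS provides this operation as its gemm routines, executed in parallel on the GPU):

```python
def matmul(a, b):
    """Naive triple-loop matrix multiply -- the kind of baseline that a
    single cuBLAS gemm call replaces in the lab's first exercise."""
    n, k, m = len(a), len(b), len(b[0])
    # Each output element is the dot product of a row of a and a column of b.
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]
```

The "drop-in" point of the lab is that this whole routine collapses to one library call, with the heavy lifting moved to tuned GPU kernels.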
Leverage the NVIDIA Command-Line Profiler and an understanding of Unified Memory to iteratively optimize CUDA C/C++ accelerated applications.
Learn about shared memory, generalized ufuncs, and GPU dataframes, intermediate topics for CUDA Python programming with Numba.
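A generalized ufunc extends an elementwise ufunc to operate on inner array dimensions described by a signature such as `(m,n),(n)->(m)`. The looping that signature implies can be sketched in plain Python (a hypothetical illustration with names of our choosing; in the lab, Numba's `@guvectorize` generates the outer batch loop, and the GPU kernel, from just the inner function):

```python
# Hypothetical pure-Python sketch of what a generalized ufunc with
# signature "(m,n),(n)->(m)" does: a matrix-vector product applied
# independently to each item in a batch.
def matvec_inner(mat, vec):
    # The "core" function: one (m,n) matrix times one (n,) vector -> (m,).
    return [sum(r * v for r, v in zip(row, vec)) for row in mat]

def batched_matvec(mats, vecs):
    # The outer broadcast loop that the gufunc machinery supplies for free
    # (and parallelizes across GPU threads when compiled with Numba).
    return [matvec_inner(m, v) for m, v in zip(mats, vecs)]
```

Writing only the core function and letting the gufunc machinery handle batching and broadcasting is the pattern this lab develops with Numba.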
Learn how to accelerate your Fortran application using GPU libraries to harness the massively parallel power of NVIDIA GPUs. In less than an hour, you will work through three exercises: use cuBLAS to accelerate a basic matrix multiply; combine libraries by adding cuRAND API calls to the previous cuBLAS calls; and use nvprof to profile the code and optimize it with CUDA Runtime API calls.