The NVIDIA RAPIDS™ suite of software libraries, built on NVIDIA AI, gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.
Ways to Get Started With RAPIDS
cuDF is a GPU DataFrame library that provides a pandas-like API for loading, filtering, and manipulating data.
- 10 Minutes to cuDF
- GPU-Accelerated DataFrames in Python: Part 1 (Blog)
- GPU-Accelerated DataFrames in Python: Part 2 (Blog)
- Getting Started Notebook
- Speed up DataFrame Operations With cuDF (DLI Course)
- Accelerating Sequential Python User-Defined Functions With RAPIDS on GPUs for 100X Speedups (Blog)
- RAPIDS cuDF Feature Engineering + XGB (Notebook)
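As a minimal sketch of what "pandas-like" means in practice, the snippet below runs the same filter and group-by against cuDF when it is available and falls back to pandas otherwise, so it also runs on a machine without a GPU. The column names, the sample values, and the helper names `load_backend` and `mean_price_by_category` are illustrative assumptions, not part of any RAPIDS tutorial.

```python
import importlib


def load_backend():
    """Return the cudf module if it imports cleanly, else pandas, else None."""
    for name in ("cudf", "pandas"):
        try:
            return importlib.import_module(name)
        except Exception:
            # cudf can fail for reasons other than ImportError (e.g. no GPU).
            continue
    return None


def mean_price_by_category(xdf):
    """Filter and aggregate using calls that cuDF and pandas both provide."""
    df = xdf.DataFrame(
        {
            "category": ["a", "b", "a", "b", "a"],
            "price": [10.0, 20.0, 30.0, 40.0, 500.0],
        }
    )
    # Boolean-mask filtering: drop the outlier row priced at 500.
    filtered = df[df["price"] < 100.0]
    # Group-by aggregation; sort_index makes the order explicit.
    means = filtered.groupby("category")["price"].mean().sort_index()
    # Only cuDF objects have to_pandas(); it copies the result to host memory.
    if hasattr(means, "to_pandas"):
        means = means.to_pandas()
    return means.to_dict()


if __name__ == "__main__":
    backend = load_backend()
    if backend is None:
        print("Neither cudf nor pandas is installed.")
    else:
        print(backend.__name__, mean_price_by_category(backend))
```

The only backend-specific line is the `to_pandas()` call that moves the result off the GPU; the DataFrame construction, filter, and aggregation are written once.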
The RAPIDS Accelerator for Apache Spark provides a set of plug-ins for Apache Spark that leverage GPUs to accelerate processing via the RAPIDS libraries.
- Getting Started With Spark-RAPIDS (Docs)
- Spark Using RAPIDS Accelerator Examples (Code Sample)
- Migrating to GPU-Accelerated Apache Spark at NVIDIA (Webinar)
- Making Apache Spark More Concurrent (Blog)
- Cost-Effective Data Processing With Apache Spark and GPUs (Webinar)
- Saving Apache Spark Big Data Processing Costs on Google Cloud Dataproc (Blog)
- Improving Apache Spark Performance and Reducing Costs With Amazon EMR and NVIDIA (Blog)
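Because the accelerator is delivered as a plug-in, enabling it is mostly a matter of Spark configuration rather than application changes. The sketch below assembles a `spark-submit` command with the plug-in class and a few GPU resource settings. The jar path, the application file name, and the resource amounts are placeholder assumptions; the right values depend on the cluster and on the accelerator release in use.

```python
import shlex

# Spark settings commonly used to enable the RAPIDS Accelerator.
# The GPU amounts below are illustrative and should be tuned per cluster.
RAPIDS_CONF = {
    "spark.plugins": "com.nvidia.spark.SQLPlugin",
    "spark.rapids.sql.enabled": "true",
    "spark.executor.resource.gpu.amount": "1",
    "spark.task.resource.gpu.amount": "0.25",
}


def spark_submit_command(app, jar, conf=None):
    """Build a spark-submit argument list that loads the plug-in jar."""
    conf = RAPIDS_CONF if conf is None else conf
    cmd = ["spark-submit", "--jars", jar]
    for key, value in conf.items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)
    return cmd


if __name__ == "__main__":
    # Hypothetical file names used only for illustration.
    command = spark_submit_command("etl_job.py", "rapids-4-spark.jar")
    print(shlex.join(command))
```

The existing Spark job (`etl_job.py` here) is passed through unchanged; only the submit-time configuration differs from a CPU-only run.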
cuGraph is a GPU-accelerated graph analytics library that includes support for property graphs, remote (graph as a service) operations, and graph neural networks.
cuML is a suite of libraries that implements machine learning algorithms and mathematical primitive functions. It shares compatible APIs with other RAPIDS projects and, in most cases, matches the scikit-learn API.
- Beginner’s Guide to GPU-Accelerated Machine Learning Pipelines (Blog)
- cuML Documentation
- Getting Started With cuML (Notebook)
- Accelerating K-Nearest Neighbors by 600X using RAPIDS cuML (Blog)
- Accelerating Random Forests by up to 45X Using RAPIDS cuML (Blog)
- Advancing the State-of-the-Art in AutoML, Now 10X Faster With NVIDIA GPUs and RAPIDS (Blog)
- Text Classification With Naive Bayes Algorithms (Blog)
- Scaling-Out RAPIDS cuML and XGBoost With Dask on Google Kubernetes Engine (GKE) (Blog)
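To illustrate the scikit-learn-style API, here is a small sketch that clusters two synthetic blobs with `KMeans`, importing the estimator from cuML when possible and from scikit-learn otherwise. The data, the parameter values, and the helper names `load_kmeans` and `cluster_two_blobs` are assumptions chosen for the example; it is not taken from the cuML documentation.

```python
def load_kmeans():
    """Return cuML's KMeans if usable, else scikit-learn's, else None."""
    try:
        from cuml.cluster import KMeans
        return KMeans
    except Exception:
        # cuML missing or no usable GPU; try the CPU implementation.
        pass
    try:
        from sklearn.cluster import KMeans
        return KMeans
    except Exception:
        return None


def cluster_two_blobs(KMeans):
    """Fit KMeans on two well-separated 2-D blobs and return the labels."""
    import numpy as np

    rng = np.random.default_rng(0)
    blob_a = rng.normal(loc=0.0, scale=0.1, size=(20, 2))
    blob_b = rng.normal(loc=10.0, scale=0.1, size=(20, 2))
    X = np.vstack([blob_a, blob_b]).astype(np.float32)

    # Same constructor arguments and fit_predict call for both libraries.
    model = KMeans(n_clusters=2, random_state=0)
    labels = model.fit_predict(X)
    return [int(v) for v in np.asarray(labels)]


if __name__ == "__main__":
    estimator = load_kmeans()
    if estimator is None:
        print("Neither cuml nor scikit-learn is installed.")
    else:
        print(estimator.__module__, cluster_two_blobs(estimator))
```

Apart from the import line, the estimator is constructed and fitted identically in both cases, which is the compatibility the description above refers to.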
Learn How RAPIDS Is Being Deployed Today
AT&T applied the NVIDIA RAPIDS Accelerator for Apache Spark on GPU clusters for extract, transform, and load (ETL) and feature engineering stages in their data-to-AI pipeline, improving performance, reducing costs, and increasing simplicity compared to CPU-based Spark clusters and Databricks' Photon engine.
NVIDIA and NASA have been using RAPIDS to monitor air quality during the COVID-19 pandemic by combining surface monitoring data and near-real-time model data produced by the NASA GEOS-CF model. They use XGBoost to detect and quantify air pollution anomalies and build a bias-correction model that relates the model’s nitrogen dioxide predictions to observations.
TCS Optumera accelerated their demand forecasting pipeline using Spark and RAPIDS to generate accurate predictions at a granular level, resulting in a 6X acceleration of data pipelines and a 170X performance boost in model training.
Walmart leveraged Dask-RAPIDS to solve scalability issues with their product substitution algorithm, resulting in substantial hardware and cost reductions, improved runtime, and better insights for their business and customers.
Ping An, CAPE Analytics, Applica, Bank of Montreal, Capital One, Square, and Intuit are using NVIDIA's GPU-powered AI to improve customer service, prevent fraud, streamline processes, and accelerate growth, resulting in faster claims handling, more accurate underwriting decisions, elimination of manual errors, improved runtime, and better product design and selection.
IRS and Cloudera
The IRS team used Cloudera Data Platform with Spark 3.0 accelerated by NVIDIA to achieve 20X performance gains on a large fraud-detection dataset, allowing them to run previously impossible jobs and speed up their current work. As a next step, they plan to accelerate full AI inference jobs.
Free Hands-On RAPIDS Labs on NVIDIA LaunchPad
Experience RAPIDS through one of the following free hands-on AI labs on hosted infrastructure:
- Predict Prices With Accelerated Data Processing
- Data Processing, Tokenization, and Sentiment Analysis
- Accelerating Apache Spark With Zero Code Changes
RAPIDS libraries are open source, expose Python interfaces, and are built on Apache Arrow. The software is being developed in partnership with enterprises globally. Download RAPIDS to dramatically accelerate machine learning and data science.