
NDP Workspace

The Workspace is a collaborative environment designed to support a wide range of projects, including AI and machine learning (ML) workflows, exploratory data analysis (EDA), scientific research, and education. Each workspace runs within JupyterHub and integrates data resources from the NDP Data Catalog with external GitHub repositories.

Use Cases

Integrated Workflow Development

The workspace is the main unit for assembling and delivering complete research workflows. It brings together datasets from the NDP Data Catalog, source code from GitHub, and connections to computing resources, so researchers can explore, analyze, and experiment in a single environment.

Classrooms and Data Challenges

Within a classroom or data challenge environment, workspaces function as foundational units that support both structured learning and exploratory, project-based activities. These workspaces align with course or data challenge objectives and provide students with interactive, hands-on modules, assessments, and access to curated datasets and computing resources.

As part of the learning experience, students and participants are encouraged to develop their own workspaces.

For an example, see the Example Data Challenge and Onboarding.

Community Training

Research groups and agencies that contribute datasets or services to NDP can develop dedicated workspaces as demos or tutorials. These workspaces train the broader community to access, process, analyze, and visualize the contributed resources. For example, a workspace might walk users through a sample dataset, demonstrating a data workflow and techniques such as live streaming analysis or real-time data visualization, so users can apply the same resources in their own work.

Key Features of the NDP Workspace

Metadata Form

The metadata form collects the relevant information about a workspace: a clear description, step-by-step instructions for running it in JupyterHub, any prerequisites (for educational workspaces), tags that improve discoverability, and links to additional resources or references.

Data from NDP Catalog

Users can add data resources from the NDP Data Catalog to a workspace, which makes it easy to find the datasets relevant to a project and use them alongside its code.

Models Integration

Users can integrate models from open-source platforms such as Hugging Face, adding pretrained machine learning models to their workspaces.

GitHub Integration

GitHub integration lets users connect external repositories to a workspace, so source code, configuration files, and dependency lists (such as requirements.txt) are managed in one place.

JupyterHub

The NDP JupyterHub service, hosted on NRP's Nautilus cluster, lets users develop their workflows in a user-friendly, browser-based notebook environment.

NDP Widget

The NDP Widget is a JupyterLab extension that provides an interactive interface to key features of the NDP platform. It is available from the left sidebar in the JupyterHub service.

The NDP Widget consists of the following components:

  • File Manager
  • Git Extension
  • Current Folder: Displays the folder you are working in. Files are downloaded and Git repositories are cloned into this directory.

  • Select your workspace: Displays a list of your workspaces and educational modules. Selecting a workspace updates the resources displayed in the following sections.

  • Clone GitHub repository into workspace: Displays the GitHub repositories associated with the selected workspace. You can choose a specific repository and clone it to your Current Folder.

  • Install requirements.txt: If a requirements.txt file exists in the Current Folder, installs the packages it lists into the current kernel. A pip_install.log file is generated that confirms which libraries were installed successfully and reports any installation failures.

Environments are not persistent

Important: In the current version of NDP, installed libraries and packages are not saved between sessions. Each time you return, you need to reinstall the packages listed in your requirements.txt file before you can start working.

  • Add datasets and resources from workspace: Displays the data and resources associated with the selected workspace. You can choose specific resources and download them to your Current Folder. To keep downloads organized, select the checkbox to automatically create a directory named after the dataset inside your Current Folder; the downloaded files are saved in that directory.
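Because installed packages do not persist, the Install requirements.txt step has to be repeated every session. The sketch below approximates that step from a notebook cell. It is not the widget's actual implementation: the helper name install_requirements is hypothetical, and it assumes requirements.txt sits in the folder you pass in. It runs pip through the kernel's own interpreter (sys.executable) and writes pip's output to pip_install.log, mirroring the log the widget produces.

```python
import subprocess
import sys
from pathlib import Path


def install_requirements(folder: str = ".") -> bool:
    """Hypothetical helper: install the packages listed in folder/requirements.txt
    into the running kernel's environment and log pip's output.

    Returns True if pip exited successfully, False otherwise.
    The folder itself must already exist.
    """
    folder_path = Path(folder)
    req_file = folder_path / "requirements.txt"
    log_file = folder_path / "pip_install.log"

    if not req_file.is_file():
        # Nothing to install; record why so the log is never silently missing.
        log_file.write_text(f"No requirements.txt found in {folder_path.resolve()}\n")
        return False

    # sys.executable is the interpreter behind the current kernel, so the
    # packages land in the same environment the notebook imports from.
    result = subprocess.run(
        [sys.executable, "-m", "pip", "install", "-r", str(req_file)],
        capture_output=True,
        text=True,
    )
    # Keep both streams: stdout lists installed packages, stderr holds failures.
    log_file.write_text(result.stdout + result.stderr)
    return result.returncode == 0
```

In a notebook cell, calling install_requirements(".") reinstalls from the folder the notebook runs in. IPython's built-in %pip install -r requirements.txt magic also installs into the current kernel; the function form is shown here only because it keeps a log file the way the widget does.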

User Persistent Storage

After JupyterHub launches, you will see a User-Persistent-Storage directory. This directory is your persistent storage. Save your work in it: files stored anywhere else are lost when you disconnect from your server.

Each user is given 10 GB of storage. If you need more space, contact ndp@sdsc.edu.
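A short notebook snippet can check how much of the 10 GB allowance you are using and copy a result into persistent storage before you disconnect. This is a minimal sketch under stated assumptions: the function names are hypothetical, the directory is assumed to be User-Persistent-Storage relative to the folder the notebook server starts in, and "10 GB" is read as 10 × 10^9 bytes. It sums file sizes with os.walk, so it measures the files you store rather than the platform's own quota accounting.

```python
import os
import shutil
from pathlib import Path

# Assumptions: the directory name as shown after JupyterHub launches, resolved
# relative to the notebook server's starting folder, and "10 GB" taken as
# 10 * 10**9 bytes.
PERSISTENT_DIR = Path("User-Persistent-Storage")
QUOTA_BYTES = 10 * 10**9


def directory_size(path: Path) -> int:
    """Return the total size in bytes of regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks: getsize would follow them and count the target.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def quota_usage(path: Path = PERSISTENT_DIR, quota: int = QUOTA_BYTES) -> float:
    """Return the fraction of the quota used by files under `path`."""
    return directory_size(path) / quota


def save_to_persistent(src: Path, dest_dir: Path = PERSISTENT_DIR) -> Path:
    """Copy `src` into persistent storage, creating the directory if needed."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    # copy2 copies the file contents and preserves its timestamps.
    return Path(shutil.copy2(src, dest_dir / Path(src).name))
```

For example, print(f"{quota_usage():.1%} of 10 GB used") reports current usage, and save_to_persistent(Path("results.csv")) copies a hypothetical results.csv from the working folder into persistent storage.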