Setting up a dataset on Nuvolos


Last updated 2 years ago


This page explains how to create datasets with specific access control settings for your team on Nuvolos.

In short, you can create datasets on Nuvolos that are:

  • visible to all users in an organisation,

  • visible only to the faculty users of an organisation, or

  • visible only to invited users.

1. Setting up a dataset space

Required role: organisation faculty

As a first step, set up a dataset space. Any organisation faculty user can do this by choosing "New dataset" when creating a space.

Dataset spaces are special: you cannot run applications in them. The best way to populate a dataset space is to distribute content to it from another space.

During creation, you can set the visibility of the space:

  • A private space is visible only to users explicitly invited to it. This is the default.

  • A faculty-only space is visible to all faculty users of the organisation.

  • A public space is visible to all users of the organisation. Users still need to request access before they can use the data in a public space, but they are made aware that the space exists.

Note that you may need to expand the visibility options with the toggle on the space creation screen.
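The three visibility settings boil down to a simple rule for who sees a dataset space in their listing. The sketch below models that rule in Python purely for illustration. `Visibility`, `User`, `DatasetSpace` and `can_see` are hypothetical names, not part of the Nuvolos CLI or Python API, and the model assumes that an invited user can always see a space regardless of its visibility setting.

```python
from dataclasses import dataclass, field
from enum import Enum


class Visibility(Enum):
    PRIVATE = "private"            # default: only explicitly invited users
    FACULTY_ONLY = "faculty_only"  # every faculty user of the organisation
    PUBLIC = "public"              # every member of the organisation


@dataclass
class User:
    name: str
    is_faculty: bool = False


@dataclass
class DatasetSpace:
    name: str
    visibility: Visibility = Visibility.PRIVATE
    invited: set[str] = field(default_factory=set)


def can_see(space: DatasetSpace, user: User) -> bool:
    """Hypothetical model: is the space listed for this organisation member?

    Seeing a space is not the same as being able to read its data.
    """
    if user.name in space.invited:  # assumption: an invitation always grants visibility
        return True
    if space.visibility is Visibility.PUBLIC:
        return True
    if space.visibility is Visibility.FACULTY_ONLY:
        return user.is_faculty
    return False  # private space and the user was not invited


if __name__ == "__main__":
    prices = DatasetSpace("prices", Visibility.FACULTY_ONLY)
    print(can_see(prices, User("alice", is_faculty=True)))  # True
    print(can_see(prices, User("bob")))                      # False
```

In this model, visibility only controls whether the space is listed; the separate question of reading the data is handled by roles, as described for public datasets.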

2. Create your dataset

Required role: organisation faculty, or space administrator in an existing space

Dataset spaces hold static data. To generate the dataset, we suggest setting up a regular research space where you run a data pipeline, performing the analytical and transformation steps needed to arrive at the final data you want to store.

To see which tools are available for data pipelines, refer to this guide.
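As an illustration of the load, transform and write steps such a pipeline might perform in the research space, here is a minimal Python sketch using only the standard library. The sample data, the column names and the functions `load`, `clean`, `write` and `run_pipeline` are all hypothetical; a real pipeline would read from your own files or tables and may use entirely different tools.

```python
import csv
import io
from pathlib import Path

# Hypothetical raw extract standing in for the inputs of your research space.
RAW = """ticker,date,price
AAA,2024-01-02,10.5
AAA,2024-01-03,
BBB,2024-01-02,20.0
"""

FIELDS = ["ticker", "date", "price"]


def load(text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows: list[dict[str, str]]) -> list[dict]:
    """Drop rows with a missing price and convert prices to floats."""
    return [{**row, "price": float(row["price"])} for row in rows if row["price"]]


def write(rows: list[dict], path: Path) -> None:
    """Write the final table, ready to be distributed to the dataset space."""
    with path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def run_pipeline(raw: str, out_path: Path) -> int:
    """Run load -> clean -> write and return the number of rows kept."""
    rows = clean(load(raw))
    write(rows, out_path)
    return len(rows)
```

Keeping each step as a separate function makes it easy to rerun the whole pipeline from scratch for each new vintage.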

3. Distribute your data to the dataset space

Required role: editor in the appropriate instance of the dataset space

Once your pipeline has finished, the artefacts you want to store are available. Distribute your data (tables, files, or both) to the dataset space. You may also want to distribute an application containing a blueprint or a software library that makes it easier to work with your data; note, however, that the application cannot run in the dataset space itself.

If you update the dataset regularly, we suggest cleaning up its current state before distributing, so that the next data vintage contains no artefacts from previous ones.
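If you stage the files for a vintage in a folder before distributing them, the same clean-slate idea can be applied to that folder. The Python sketch below assumes such a staging folder exists (the name `dataset_export` used below is hypothetical) and that `reset_staging_dir` is your own helper, not a Nuvolos function. It only covers local files; removing outdated tables or files from the dataset space itself is still done in Nuvolos.

```python
import shutil
from pathlib import Path


def reset_staging_dir(path: Path) -> Path:
    """Empty a staging folder before writing the next vintage.

    Deletes `path` and everything under it, then recreates it empty, so no
    file left over from a previous vintage can be distributed by accident.
    """
    if path.exists():
        shutil.rmtree(path)
    path.mkdir(parents=True)
    return path
```

For example, calling `reset_staging_dir(Path("dataset_export"))` at the start of each pipeline run guarantees that only the files written during that run end up in the folder.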

4. Create a snapshot and name it

Required role: editor in the appropriate instance of the dataset space

Once the distribution has completed, create a new snapshot of the dataset space. We recommend creating a named snapshot with a full description of the circumstances in which it was taken. Datasets produced by the Nuvolos team always call these snapshots "vintages" to highlight that the same dataset may evolve over time.
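To keep vintage names consistent from one update to the next, you could generate the snapshot name and description from the cut-off date of the data. The Python sketch below only builds the two strings you would enter when creating the named snapshot; it does not create a snapshot. The `vintage_snapshot` function and the "<dataset> vintage YYYY-MM" format are hypothetical conventions, not ones prescribed by Nuvolos.

```python
from datetime import date


def vintage_snapshot(dataset: str, as_of: date, notes: str) -> dict[str, str]:
    """Compose a name and a description for a named snapshot of one vintage."""
    name = f"{dataset} vintage {as_of:%Y-%m}"
    description = (
        f"Vintage of {dataset} containing data up to {as_of.isoformat()}. "
        f"{notes}"
    )
    return {"name": name, "description": description}


if __name__ == "__main__":
    snap = vintage_snapshot("Prices", date(2024, 3, 31), "Built from the March export.")
    print(snap["name"])  # Prices vintage 2024-03
```

Putting the data cut-off date in the name keeps vintages sorted chronologically in the snapshot list, while the description records the circumstances of the snapshot.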

How public datasets work

Public datasets are visible to all members of an organisation. Visibility does not grant immediate access to the contents of a public dataset space: users are only granted the "observer" role.

To gain viewer access to the dataset space, users need to request it by navigating to the public space and requesting the viewer role.

Once the request is submitted, the manager of the organisation needs to review and accept it.
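The access flow for a public dataset can be summarised as a small state machine: observer, then a pending request, then viewer once an organisation manager approves. The Python class below is a hypothetical model of that flow for illustration only; `Role`, `PublicSpaceAccess` and its methods do not correspond to any Nuvolos API, and role changes in practice happen through the Nuvolos interface.

```python
from enum import Enum


class Role(Enum):
    OBSERVER = "observer"  # granted to organisation members on a public dataset space
    VIEWER = "viewer"      # needed to use the data; granted on approval


class PublicSpaceAccess:
    """Hypothetical model of one user's access to a public dataset space."""

    def __init__(self, user: str) -> None:
        self.user = user
        self.role = Role.OBSERVER
        self.request_pending = False

    def request_viewer(self) -> None:
        """The user asks for the viewer role from the space page."""
        if self.role is Role.VIEWER:
            return  # already has access, nothing to request
        self.request_pending = True

    def approve(self, approver_is_org_manager: bool) -> None:
        """An organisation manager reviews and accepts the pending request."""
        if not approver_is_org_manager:
            raise PermissionError("only an organisation manager can approve")
        if not self.request_pending:
            raise ValueError("no pending request to approve")
        self.role = Role.VIEWER
        self.request_pending = False
```

The model makes the two preconditions explicit: a request must exist before it can be approved, and only an organisation manager can approve it.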

[Figure: Create new dataset]
[Figure: Choose visibility of the space]