Importing data on Nuvolos

How do you import data onto Nuvolos?

Last updated 2 years ago

This page collects data-import use cases and best practices for transferring data into Nuvolos.

Data pipelines (ETL pipelines)

Data pipelines can be loosely defined as a chain of processes that extract data from a source and store it in a pre-defined format in a target.

Sources could be:

  • Files over the internet that a script collects,

  • Webpages that you scrape,

  • Data already in a database format.

Targets similarly could be:

  • Flat files in some data format (csv, parquet, etc.),

  • Some structured database format,

  • A data lake (a mix of structured, semi-structured and unstructured data).

In the case of Nuvolos, the targets generally are of three types:

  • The Nuvolos file system (flat file storage types),

  • The Nuvolos data warehouse (database format),

  • The Nuvolos Large File Storage.
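As a minimal illustration of the extract-transform-load chain described above (the column names and cleaning rules are invented for the example; nothing here is Nuvolos-specific):

```python
# Minimal extract-transform-load sketch. The columns and cleaning
# rules are placeholders, not a Nuvolos API.
import csv
import io

def extract(raw: str) -> list[dict]:
    """Extract: parse raw CSV text fetched from some source."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list[dict]) -> list[dict]:
    """Transform: drop incomplete rows and normalise values/types."""
    return [
        {"city": r["city"].strip().title(), "temp_c": float(r["temp_c"])}
        for r in rows
        if r.get("temp_c") not in (None, "")
    ]

def load(rows: list[dict], target) -> None:
    """Load: write the cleaned rows to a flat-file target (CSV here)."""
    writer = csv.DictWriter(target, fieldnames=["city", "temp_c"])
    writer.writeheader()
    writer.writerows(rows)

raw = "city,temp_c\n zurich ,21.5\nbern,\n"
cleaned = transform(extract(raw))
# cleaned == [{"city": "Zurich", "temp_c": 21.5}]
```

In a real pipeline the extract step would read from one of the sources above (a downloaded file, a scraped page, a database) and the load step would write to one of the Nuvolos targets.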

To run data pipelines on Nuvolos, we suggest taking advantage of the Airflow application. With its help you can manually or automatically trigger highly complex and tuneable workflows; we also suggest getting acquainted with Airflow best practices first.

Notably, Airflow lets you define a pipeline as a directed acyclic graph (DAG) that combines multiple workload types (scripts in various languages, command-line operations) and provides administrative tools for monitoring, failover, scheduling, and more.
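The DAG idea itself can be sketched with Python's standard-library graphlib. This is a conceptual illustration of dependency ordering, not Airflow's actual API, and the task names are invented:

```python
# Conceptual sketch of a pipeline as a directed acyclic graph (DAG).
# Uses Python's stdlib graphlib, NOT Airflow's API; task names are
# invented for illustration.
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on.
pipeline = {
    "extract_csv": set(),
    "extract_api": set(),
    "transform": {"extract_csv", "extract_api"},
    "load_warehouse": {"transform"},
    "report": {"load_warehouse"},
}

# A scheduler such as Airflow runs tasks in an order where every
# dependency finishes before its dependents start.
order = list(TopologicalSorter(pipeline).static_order())
```

Airflow adds the operational layer on top of this ordering: retries, scheduling, logging, and a monitoring UI.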

File uploads

There are multiple options to upload files to Nuvolos:

  • The File UI,

  • Application-specific UIs (sometimes more suitable for larger files).

Downloading files

If the data you want to have on Nuvolos is available at a link, there is no need to download it to your machine and then upload it again: you can download the data directly to Nuvolos.
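Such direct downloads can be scripted from any Nuvolos application. A minimal sketch using Python's standard library (the URL and destination path are placeholders; substitute your own data source):

```python
# Download a file directly into the workspace.
# The URL and destination path below are placeholders.
import urllib.request

def download(url: str, destination: str) -> str:
    """Stream the file at `url` to `destination` and return the path."""
    path, _headers = urllib.request.urlretrieve(url, destination)
    return path

# Example (placeholder URL):
# download("https://example.com/dataset.csv", "dataset.csv")
```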

Importing files

You might already have your data in some storage service. Most well-known storage services, including:

  • Azure Files

  • Amazon S3

  • Dropbox

  • Google Drive

  • Box

  • Mega

can be mounted (attached as a folder) to your applications after a short setup procedure. We have provided detailed guidance on some relevant, non-trivial use cases, including mounting Dropbox, accessing remote files with SSHFS, and accessing files on SharePoint Online.

In every Nuvolos application we have made the wget command-line tool available, with which you can download files from the internet; see its documentation for details.

You can also combine the Airflow tool with our guide to uploading data to create a pipeline that ends up writing Nuvolos tables.