Importing data on Nuvolos
How do you import data onto Nuvolos?
This page collects data import use cases and best practices for transferring data into Nuvolos.
Data pipelines (ETL pipelines)
Data pipelines can be loosely defined as a chain of processes that extract data from a source and store it in a pre-defined format in a target.
Sources could be:
Files over the internet that a script collects,
Webpages that you scrape,
Data already in a database format.
Targets similarly could be:
Flat files in some data format (csv, parquet, etc.)
Some structured database format,
A data lake (a mix of structured, semi-structured and unstructured data).
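The extract, transform and store chain described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not a Nuvolos-specific implementation: the sample data in `RAW_CSV`, the function names and the output file name are all hypothetical, and the "source" is an in-memory string standing in for a file collected from the internet.

```python
import csv
import io
import tempfile
from pathlib import Path

# Hypothetical raw input standing in for a file fetched from a source.
RAW_CSV = "city,temp_c\nZurich,21.5\nGeneva,\nBasel,19.0\n"


def extract(raw_text):
    """Parse raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Drop rows with a missing temperature and convert the rest to float."""
    return [
        {"city": row["city"], "temp_c": float(row["temp_c"])}
        for row in rows
        if row["temp_c"]
    ]


def load(rows, target):
    """Write the cleaned rows to a flat CSV file at the target path."""
    with target.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["city", "temp_c"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


def run_pipeline(raw_text, target):
    """Chain the three stages: extract -> transform -> load."""
    return load(transform(extract(raw_text)), target)


def demo():
    """Run the pipeline into a temporary directory and return the file lines."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "temperatures.csv"  # hypothetical file name
        run_pipeline(RAW_CSV, target)
        return target.read_text().splitlines()
```

Keeping each stage as its own function makes it straightforward to swap the source (for example, a scraped webpage) or the target (for example, a database table) without touching the other stages.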
In the case of Nuvolos, the targets are generally one of the following:
The Nuvolos file system (flat file storage types),
The Nuvolos data warehouse (database format).
To run data pipelines on Nuvolos, we suggest taking advantage of the Airflow applications on Nuvolos. With Airflow, you can trigger highly complex and tuneable workflows either manually or automatically. We also suggest getting acquainted with Airflow best practices.
In particular, Airflow lets you create a pipeline (defined as a directed acyclic graph) that combines multiple workload types (scripts in various languages, command line operations) and provides administrative tools for monitoring, failover, timetables, etc.
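To make the directed acyclic graph idea concrete, here is a small sketch that does not use Airflow at all: it relies on Python's standard `graphlib` module to order tasks by their dependencies, which is the same principle a pipeline scheduler applies. The task names and the helper functions are hypothetical and chosen only for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task names; each maps to the set of tasks it depends on.
DEPENDENCIES = {
    "download_raw": set(),
    "scrape_pages": set(),
    "clean": {"download_raw", "scrape_pages"},
    "write_table": {"clean"},
}


def execution_order(dependencies):
    """Return one valid order in which every task runs after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def is_valid_dag(dependencies):
    """Return False if the dependencies contain a cycle, True otherwise."""
    try:
        execution_order(dependencies)
    except CycleError:
        return False
    return True


def run(dependencies, tasks):
    """Call each task (a zero-argument callable) in dependency order."""
    return [tasks[name]() for name in execution_order(dependencies)]
```

A graph with a cycle (task A waits for B while B waits for A) has no valid execution order, which is why pipelines are required to be acyclic.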
You can combine Airflow with our guide on uploading data to build a pipeline that writes to Nuvolos tables.
File uploads
There are multiple options to upload files to Nuvolos:
The File UI,
Application specific UIs (sometimes more suitable for larger files).
Downloading files
If the data you want to have on Nuvolos is available at a URL, there is no need to download it to your machine and then upload it to Nuvolos. You can download the data directly to Nuvolos.
In every Nuvolos application we have made the wget command line tool available, with which you can download files from the internet. See its documentation here.
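If you want to call wget from a script rather than typing it into a terminal, one possible approach is sketched below. The function names, the example URL and the target directory are hypothetical. The only wget options used are `-c` (resume a partially downloaded file) and `-P` (directory to save into).

```python
import subprocess
from pathlib import Path


def build_wget_command(url, target_dir):
    """Assemble the wget argument list without running anything."""
    # -c resumes an interrupted download; -P sets the directory to save into.
    return ["wget", "-c", "-P", str(target_dir), url]


def download(url, target_dir):
    """Create the target directory if needed and run wget for the given URL."""
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if wget exits with an error code.
    subprocess.run(build_wget_command(url, target_dir), check=True)


# Example call (hypothetical URL and folder), left commented out so that
# nothing is downloaded on import:
# download("https://example.com/data.csv", "downloads")
```

Passing the arguments as a list, rather than as a single shell string, avoids quoting problems when URLs contain characters such as `&` or `?`.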
Importing files
You might already have your data in some storage service. Most well-known storage services, including:
Azure Files
Amazon S3
Dropbox
Google Drive
Box
Mega
can be mounted (attached as a folder) to your apps after a short setup procedure. The list of supported types can be found here. For the exact procedure, consult our documentation here.
We have provided detailed guidance on some relevant, non-trivial use-cases: