Importing data on Nuvolos
How do you import data onto Nuvolos?
This page collects data import use-cases and best practices on how data may be transferred into Nuvolos.
Data pipelines can be loosely defined as a chain of processes that extract data from a source and store it in a pre-defined format in a target.
Sources could be:
- Files over the internet that a script collects,
- Data already in a database format.
Targets similarly could be:
- Some structured database format,
- A data lake (a mix of structured, semi-structured and unstructured data).
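As a minimal illustration of the extract-and-store chain described above, the following Python sketch reads rows from CSV text (standing in for a file that a script collects), converts them to a fixed schema, and writes them to a database table. It uses an in-memory SQLite database purely as a stand-in for a structured database target. It is not the Nuvolos data warehouse, and the table name prices and the sample rows are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source: CSV text standing in for a file collected from the internet.
SOURCE_CSV = """date,ticker,close
2024-01-02,AAA,10.5
2024-01-02,BBB,20.25
2024-01-03,AAA,10.75
"""


def extract(text: str) -> list:
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list) -> list:
    """Convert each row into the target's pre-defined format (typed tuples)."""
    return [(r["date"], r["ticker"], float(r["close"])) for r in rows]


def load(records: list, conn: sqlite3.Connection) -> None:
    """Write records into a table with a fixed schema (hypothetical table name)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices (date TEXT, ticker TEXT, close REAL)"
    )
    conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(text: str, conn: sqlite3.Connection) -> int:
    """Run extract -> transform -> load and return the number of stored rows."""
    load(transform(extract(text)), conn)
    return conn.execute("SELECT COUNT(*) FROM prices").fetchone()[0]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_pipeline(SOURCE_CSV, connection))
```

Keeping extract, transform and load as separate functions mirrors how a pipeline tool would treat them as separate steps that can be rerun or monitored individually.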
On Nuvolos, targets are generally of three types:
- The Nuvolos file system (flat file storage types),
- The Nuvolos data warehouse (database format),
To run data pipelines on Nuvolos, we suggest taking advantage of the Airflow applications on Nuvolos. With Airflow, you can trigger complex, highly tunable workflows manually or automatically. We recommend becoming familiar with Airflow best practices.
In particular, Airflow lets you define a pipeline as a directed acyclic graph (DAG) that combines multiple workload types (scripts in various languages, command-line operations), and it provides administrative tools for monitoring, failover, timetables, and more.
You can combine Airflow with our guide to uploading data to build a pipeline that writes to Nuvolos tables.
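To make the DAG idea concrete, here is a plain-Python sketch of what a pipeline scheduler does at its core: each task runs only after all of its upstream tasks have finished, and a cycle is rejected. This is not Airflow's API (in Airflow, tasks are declared with operators inside a DAG object; see the Airflow documentation for that). The task names fetch, clean and store, and their bodies, are hypothetical. The clean step is a command-line workload, run here through the Python interpreter so the sketch stays portable.

```python
import subprocess
import sys
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, List, Set, Tuple


def fetch() -> str:
    # Hypothetical Python task: would normally download or read raw data.
    return "raw data"


def shell_step() -> str:
    # Hypothetical command-line task: runs an external process and captures its output.
    result = subprocess.run(
        [sys.executable, "-c", "print('cleaned')"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def store() -> str:
    # Hypothetical Python task: would normally write to a table.
    return "stored"


TASKS: Dict[str, Callable[[], str]] = {"fetch": fetch, "clean": shell_step, "store": store}

# Each key lists the tasks it depends on: fetch -> clean -> store.
DEPENDENCIES: Dict[str, Set[str]] = {"clean": {"fetch"}, "store": {"clean"}}


def run_dag(tasks: Dict[str, Callable[[], str]],
            dependencies: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Run every task once, in an order that respects its dependencies.

    Raises graphlib.CycleError (a ValueError) if the graph is not acyclic.
    """
    sorter = TopologicalSorter()
    for name in tasks:
        sorter.add(name, *dependencies.get(name, ()))
    log = []
    for name in sorter.static_order():
        log.append((name, tasks[name]()))
    return log


if __name__ == "__main__":
    for task_name, output in run_dag(TASKS, DEPENDENCIES):
        print(task_name, output)
```

A real orchestrator such as Airflow adds scheduling, retries, and monitoring on top of this ordering logic.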
There are multiple options to upload files to Nuvolos:
- Application-specific UIs (sometimes more suitable for larger files).
If the data you need is available at a URL, there is no need to download it to your machine and then upload it to Nuvolos: you can download it directly into Nuvolos.
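Besides command-line tools such as wget, a direct download can also be scripted from Python inside a Nuvolos application. The sketch below uses only the standard library (urllib.request) and streams the response to disk in chunks, roughly like wget -O dest url. The function name download, the example URL and the destination path are hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path
from typing import Union


def download(url: str, dest: Union[str, Path], chunk_size: int = 1 << 20) -> Path:
    """Stream the resource at url to dest without loading it fully into memory."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)  # create target folders if missing
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return dest


if __name__ == "__main__":
    # Hypothetical URL and destination path; replace with your own.
    download("https://example.com/data.csv", "data/data.csv")
```

Streaming in chunks keeps memory use flat, which matters for the larger files that are typical when skipping the download-to-laptop step.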
In every Nuvolos application we have made the wget command-line tool available, which you can use to download files from the internet. See the wget documentation for details.
You might already have your data in a storage service. Well-known storage services include:
- Azure files
- Amazon S3
- Dropbox
- Google Drive
- Box
- Mega
We have provided detailed guidance on some relevant, non-trivial use-cases: