OpenMetaData

OpenMetaData with Azure File Share support is available on Nuvolos.


Last updated 1 year ago


Overview

OpenMetaData is an end-to-end metadata management platform that helps unlock the value of data assets for common use cases such as data discovery and governance, as well as emerging use cases in data quality, observability, and collaboration.

OpenMetaData on Nuvolos supports the ingestion of files stored on Azure File Shares, which allows you to track operations performed on files stored in Azure File Shares.

Setting up OpenMetaData

Add a new OpenMetaData application to your working instance in Nuvolos:

OpenMetaData runs in a VSCode application on Nuvolos, along with a pre-installed Airflow application, which executes the ingestion workflows created by OpenMetaData.

Starting your application

Once you have added the OpenMetaData application to your Nuvolos instance, start your application.

After a couple of minutes, you should see an initialization screen:

This initialization can take a few minutes upon the first start of a new application as both the OpenMetaData and Airflow databases need to be set up in the background.

Once the application starts, you will see a VSCode interface:

VSCode is used so that the Airflow interface can also be accessed and DAGs can be created or refined as necessary. You can also install additional packages via the built-in Terminal.

Opening OpenMetaData

OpenMetaData opens in a new tab in VSCode:

Click on the Sign in with Auth0 button to log in to OpenMetaData. On the first start, a new user will be created for you.

If you are an administrator in your Nuvolos space, your OpenMetaData user will be an administrator within the OpenMetaData application. If you are not an administrator in the Nuvolos space, a non-privileged OpenMetaData user will be created.

OpenMetaData checks for administrators only on the first start of the application. If you have been granted Nuvolos space administrator privileges after the application was first started, you will need to ask your co-admins to grant you an admin role in OpenMetaData.

Adding an Azure File Share storage

Click on Settings -> Storages -> Add New Service and select AZFS from the available storage services:

Click Test Connection to test whether the credentials can be used to access the Azure File Share:

Adding an ingestion pipeline for Azure File Share

You can create an ingestion pipeline to create OpenMetaData containers and objects from folders and files in Azure File Share. The pipeline is an Airflow DAG, created and managed by OpenMetaData.

To create an ingestion pipeline, edit your new AZFS (Azure File Share) storage and click Add Metadata Ingestion on the Ingestions tab:

You can name your ingestion pipeline if you wish. You need to choose Storage Metadata Config AZFS as the value for Storage Metadata Config Service and provide the connection string for the Azure File Share and the name of the file share:
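An Azure Storage connection string is a semicolon-separated list of key=value pairs. The small helper below, a sketch with dummy placeholder values rather than real credentials, splits one into a dictionary so you can check that the pieces the ingestion needs (such as AccountName and AccountKey) are present before pasting the string into the form:

```python
def parse_connection_string(conn_str: str) -> dict:
    """Split an Azure Storage connection string into its key=value parts."""
    parts = {}
    for segment in conn_str.split(";"):
        if not segment:
            continue
        # partition on the FIRST '=' only, so base64 '=' padding in the
        # account key is preserved in the value
        key, _, value = segment.partition("=")
        parts[key] = value
    return parts

# Dummy example, not a real credential
example = (
    "DefaultEndpointsProtocol=https;"
    "AccountName=mystorageaccount;"
    "AccountKey=dummykey==;"
    "EndpointSuffix=core.windows.net"
)

parsed = parse_connection_string(example)
print(parsed["AccountName"])  # mystorageaccount
```

Keep in mind that the connection string embeds the account key, so treat it as a secret and avoid committing it to version control.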

In the next step, you can specify the schedule for the ingestion pipeline:
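Since the pipeline is deployed as an Airflow DAG, its schedule is typically expressed in five-field cron syntax (this is an assumption about the scheduling UI; it may also offer named presets). A few illustrative schedules, with a minimal sanity check:

```python
# Illustrative cron schedules for an ingestion pipeline; verify the exact
# presets and syntax offered by the scheduling UI.
SCHEDULE_EXAMPLES = {
    "every hour":          "0 * * * *",
    "every day at 02:00":  "0 2 * * *",
    "every Monday 06:00":  "0 6 * * 1",
}

def is_valid_cron(expr: str) -> bool:
    """Minimal sanity check: five whitespace-separated cron fields."""
    return len(expr.split()) == 5

for label, expr in SCHEDULE_EXAMPLES.items():
    assert is_valid_cron(expr)
    print(f"{label:20} -> {expr}")
```

Frequent schedules re-scan the file share more often but generate more Airflow runs; for large shares, a daily off-peak schedule is usually a reasonable starting point.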

Once the schedule is defined, click Add & Deploy to create the ingestion pipeline in Airflow:

Running the ingestion pipeline

Click Run to execute the ingestion pipeline on demand:

You can see the logs from Airflow by clicking on the Logs link.

Viewing the Airflow DAG

You can open Airflow with the Airflow: Show Airflow VSCode command:

Checking the newly ingested metadata

You can check the newly ingested metadata in the Explore -> Containers view:

To show OpenMetaData, open the Command Palette and issue the OpenMetaData: Show OpenMetadata command.

Give a name to your storage service and obtain the name of the Azure File Share to be used and a connection string (the credential) that provides read access to the Azure File Share.