Access data from applications


Last updated 3 months ago


You can easily access your data tables stored in the Scientific Data Warehouse (SDW) from either in-Nuvolos applications or external, non-Nuvolos applications.

Make sure to upgrade to the latest connectors in your existing Nuvolos apps by 2025-03-31 to avoid interruptions to SDW access.

Accessing data from applications running in Nuvolos

Currently, data access from Nuvolos is supported in the following applications: Python (Spyder, JupyterLab and VS Code), RStudio, Matlab and Stata.

Accessing data tables from Python in Nuvolos

If you want to use Nuvolos-hosted Python (via JupyterLab or Spyder), data access is straightforward:

  1. Make sure you have the data available.

  2. Run your application.

  3. Inside your app, use the nuvolos-odbc package developed by Alphacruncher, which is pre-installed in the Jupyter application.

Usage example:

from nuvolos import get_connection
import pandas as pd

con = get_connection()
df = pd.read_sql("SELECT * FROM table", con=con)
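If your query takes parameters, prefer pandas' params argument over string formatting. The following is a minimal sketch of the same read pattern, using an in-memory SQLite database as a stand-in for the Nuvolos connection (the table name and data are invented for illustration):

```python
import sqlite3
import pandas as pd

# Stand-in for nuvolos.get_connection(); any DB-API connection works the same way.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE prices (ticker TEXT, close REAL)")
con.executemany("INSERT INTO prices VALUES (?, ?)",
                [("AAPL", 189.5), ("MSFT", 402.1)])

# Parameterized query: the driver substitutes the values safely.
df = pd.read_sql("SELECT * FROM prices WHERE ticker = ?",
                 con=con, params=("AAPL",))
print(df)
```

Against the Nuvolos connection, only the connection object and the placeholder syntax of the underlying driver differ.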

Stopping queries from Python

Please refer to the Canceling queries section for the available SQL commands. You can execute them as:

df = pd.read_sql("<SQL_COMMAND>", con=con)

Upgrade instructions for nuvolos-odbc

You can use pip to upgrade nuvolos-odbc in existing applications:

pip install --upgrade nuvolos-odbc

Accessing data tables from R in Nuvolos

If you want to use Nuvolos-hosted R (via RStudio), data access is straightforward:

  1. Make sure you have the data available.

  2. Run your application.

  3. Inside RStudio, use the r-connector package developed by Alphacruncher.

Usage example:

library(DBI)
con <- nuvolos::get_connection()
result_data <- dbGetQuery(con, "SELECT * FROM table LIMIT 10")

Stopping queries from R

Please refer to the Canceling queries section for the available SQL commands. You can execute them as:

result_data <- dbGetQuery(con, "<SQL_COMMAND>")

Accessing data tables from Stata in Nuvolos

If you want to use Nuvolos-hosted Stata, data access is greatly simplified. Nuvolos has its own sysprofile.do that automatically sets up your access parameters. Stata communicates with the database using odbc, so you will need to issue the following command to load the query:

odbc load, exec(`"SELECT * FROM "table" LIMIT 10"') connectionstring($conn_str)

Stopping queries from Stata

Please refer to the Canceling queries section for the available SQL commands. You can execute them as:

odbc load, exec(`"<SQL_COMMAND>"') connectionstring($conn_str)

Accessing data tables from Matlab in Nuvolos

First test your query in the Tables view of your space (or create a new query using the query builder form). Then select Run from application > Matlab in the query results preview to see instructions for executing the given query inside Matlab on Nuvolos. We suggest using Matlab's select statement, as it returns results in the table data type. For example:

conn = get_connection();
result_data = select(conn,'SELECT * FROM TABLE_NAME LIMIT 10');

In this example, result_data will be of the table type, so column names will also be available to the programmer as part of the returned data structure.

Accessing data tables from external, non-Nuvolos applications

Connecting with R

First, install the r-connector package developed for Nuvolos:

options(repos = "https://cran.rstudio.com")
install.packages("remotes")
remotes::install_github("nuvolos-cloud/r-connector")

Finally, pass the SQL statement, the database name and the schema name to the read_sql() function:

result_data <- nuvolos::read_sql("SELECT * FROM \"TABLE\" LIMIT 10", dbname = "my_database", schemaname= "my_schema")

Attention: you need to follow the quotation approach shown in the example code. If the table name is case-insensitive, it can be referred to as table or \"TABLE\". If the table name is case-sensitive (it contains mixed-case letters or special characters), quoting is required. For example: \"Table\".
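The quoting rule can be captured in a small helper. This is an illustrative sketch, not part of the nuvolos package: an identifier consisting only of uppercase letters, digits and underscores can be passed bare; anything else needs double quotes, with embedded quotes doubled per SQL rules.

```python
import re

def quote_identifier(name: str) -> str:
    """Quote a table/column name for SQL only when necessary.

    Illustrative helper, not part of the nuvolos package: bare identifiers
    are case-insensitive, so an all-uppercase name needs no quoting; mixed
    case or special characters require double quotes.
    """
    if re.fullmatch(r"[A-Z_][A-Z0-9_]*", name):
        return name
    # Double any embedded quotes, then wrap the name in double quotes.
    return '"' + name.replace('"', '""') + '"'

print(quote_identifier("TABLE"))  # -> TABLE (bare is fine)
print(quote_identifier("Table"))  # -> "Table" (mixed case must be quoted)
```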

In case you need to re-input your credentials, please execute the command below in R's console.

nuvolos::input_nuvolos_credential()

Stopping queries from R

Please refer to the Canceling queries section for the available SQL commands. You can execute them as:

result_data <- nuvolos::execute("<SQL_COMMAND>", dbname = "my_database", schemaname = "my_schema")

Connecting with Python

First, install the nuvolos package developed for Nuvolos:

pip install --upgrade nuvolos

Since version 0.4.0 of the nuvolos package, installing the ODBC driver is no longer required.

Next, obtain access tokens and database/schema names from the Connection Guide on the Nuvolos Tables interface of the instance you wish to access:

from nuvolos import get_connection
import pandas as pd

con = get_connection(dbname = "dbname", schemaname="schemaname")
df = pd.read_sql("SELECT * FROM table", con=con)

The first time you access the Nuvolos database, you will be prompted for your credentials. Enter your local computer's password to allow the Python connector to read your stored Nuvolos credentials.

In case you need to re-input your credentials, please execute the command below in Python's console.

from nuvolos import input_nuvolos_credential
input_nuvolos_credential()

Stopping queries from Python

Please refer to the Canceling queries section for the available SQL commands. You can execute them as:

df = pd.read_sql("<SQL_COMMAND>", con=con)

Special Notes for Win10

If you encounter the error below:

RuntimeError: The current Numpy installation (...) fails to pass a sanity check due to a bug in the windows runtime...

install an older version of NumPy from a terminal to solve it. This is a temporary workaround specific to Windows 10.

pip install numpy==1.19.3

Connecting with Stata

Accessing data from out-of-Nuvolos Stata applications consists of the following steps:

  1. Install the Snowflake ODBC driver.

  2. Establish a connection.

To simplify work, we suggest saving your connection parameters to global macros and finally creating the connection string as a global macro. On Nuvolos, this is part of the application's sysprofile.do file.

You should only add these macros to your profile.do or sysprofile.do if you are going to work in only a single instance and state/snapshot.

To set up your access parameters, issue the following commands; they have to be issued only once. The values for username and snowflake_access_token, as well as database_name and schema_name, can be obtained from the Connection Guide on the Nuvolos Tables interface.

set odbcmgr unixodbc
global user "<username>"
global dbpwd "<snowflake_access_token>"
global dbpath_db `"<database_name>"'
global dbpath_schema `"<schema_name>"'
global conn_str `"DRIVER=SnowflakeDSIIDriver;SERVER=alphacruncher.eu-central-1.snowflakecomputing.com;DATABASE=$dbpath_db;SCHEMA=$dbpath_schema;UID=$user;PWD=$dbpwd"'
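The same connection string can be assembled programmatically. Below is an illustrative Python sketch mirroring the macros above; the field names and server address are taken from the Stata example, while build_conn_str is a hypothetical helper, not a Nuvolos API:

```python
def build_conn_str(user, token, database, schema,
                   server="alphacruncher.eu-central-1.snowflakecomputing.com"):
    """Assemble a Snowflake ODBC connection string (illustrative helper)."""
    parts = {
        "DRIVER": "SnowflakeDSIIDriver",
        "SERVER": server,
        "DATABASE": database,
        "SCHEMA": schema,
        "UID": user,
        "PWD": token,
    }
    return ";".join(f"{k}={v}" for k, v in parts.items())

conn_str = build_conn_str("<username>", "<snowflake_access_token>",
                          "<database_name>", "<schema_name>")
print(conn_str)
```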

You can then access data just as you would inside Nuvolos:

odbc load, exec(`"SELECT * FROM "table" LIMIT 10"') connectionstring($conn_str)

Stopping queries from Stata

Please refer to the Canceling queries section for the available SQL commands. You can execute them as:

odbc load, exec(`"<SQL_COMMAND>"') connectionstring($conn_str)

Connecting with Matlab

First, download and install the nuvolos toolbox developed for Nuvolos. You can also click "Get Add-ons", search for "nuvolos" in the Matlab Add-on Explorer, and then click "Add" to add it to your toolbox.

Next, obtain access tokens and database/schema names from the Connection Guide on the Nuvolos Tables interface of the instance you wish to access. Finally, pass the database and schema names specified in the Connection Guide to the get_connection() function:

conn = get_connection("dbname", "schemaname");
dataset = select(conn, "SELECT * FROM my_table");

For security, the Matlab connector displays a login dialog the first time you connect to the Nuvolos database. If you need to correct or change your credentials, use the command below to input them again.

create_credential(true)

Connecting with Excel

For both Windows and Mac OS, first install the Snowflake ODBC database driver for your platform, which is required to access the Nuvolos database service. You only need to satisfy the prerequisites and finish the ODBC driver installation (the first step); you don't need to further configure or test the driver.

Windows

1. After installing the ODBC driver, download the "Excelerator" package and unzip it into a local folder.

2. In the Windows File Explorer, navigate to the download location, right-click the SnowflakeExcelAddin.xlam file, click Properties, and tick "Unblock".

3. Open Excel. The add-in needs to be installed before it can be used:

  • Navigate to the add-in management window via "File" ⇒ "Options" ⇒ "Add-ins" ⇒ "Go...".

  • Click the "Browse" button and navigate to the folder with the unzipped "Excelerator" package.

  • Choose the SnowflakeExcelAddin.xlam file and click "OK".

4. You can find the installed "Excelerator" add-in on the Home tab of the Ribbon.

5. Click the "Connect" button to open the connection window:

6. Under the "User & Password" authentication type, enter the appropriate "server URL", "User ID" and "Password" values to log in. You can find the required access tokens and database/schema names in the Connection Guide on the "Tables" page of your Nuvolos space. Note: "Hostname" is the "server URL".

7. Click the "Query" button in the add-in. It opens the SQL execution window for querying data. Select the target "Database" and "Schema" first; the add-in will then load all the available tables in "Table". By default all columns are queried, unless you click "Choose" to select specific columns only. After entering the SQL command in the window below and clicking "Execute", the results are inserted as a table into the active Excel sheet.

Attention: the SQL command must fully qualify and quote the table as "database-name"."schema-name"."table-name".

Uploading Data

Table data can be updated, or a new table can be created from sheet data, using the Excelerator add-in.

To upload data to Nuvolos, perform the following steps:

  1. Click on the cell in the first row and first column of the data you wish to upload.

  2. Click "Define Data Types" on the Home ribbon. This adds a row above the first row, where you need to select the appropriate data types for the columns you wish to upload; unfortunately, the add-in cannot infer the data types automatically.

  3. Once you have the data types selected, click "Upload":

4. Select the upload action to be taken: An existing table can be updated, appended to, or replaced with the data in the sheet. Alternatively, you can tick "Create new table then upload", then enter the name of the new table. Click "Upload" to upload your data.

Mac OS

  1. After installation of the ODBC driver, please open a terminal and use the command below to copy all the folders and files created from downloading the Snowflake ODBC driver (typically the /opt/snowflake/ path) to the /Library/ODBC/ODBCDataSources folder:

    ~$ sudo cp -r /opt/snowflake/ /Library/ODBC/ODBCDataSources
  2. Manually update the Snowflake ODBC driver's configuration file at /Library/ODBC/ODBCDataSources/snowflakeodbc/lib/universal/simba.snowflake.ini. Open this file and change the two lines below to remap the new locations of its associated files:

ErrorMessagesPath = /Library/ODBC/ODBCDataSources/snowflakeodbc/ErrorMessages
CABundleFile = /Library/ODBC/ODBCDataSources/snowflakeodbc/lib/universal/cacert.pem

After making the changes, save the file.

3. Run the command below to open the application with administrator rights; otherwise, it will show a "General installer error" at a later stage.

sudo /Applications/iODBC/iODBC\ Administrator64.app/Contents/MacOS/iODBC\ Administrator64

4. Click on the ODBC Drivers tab, and check if the Snowflake driver is already present. If it is, verify that the file path is /Library/ODBC/ODBCDataSources/snowflakeodbc/lib/universal/Snowflake.dylib. If it is not, click Add a driver.

  • In the field Description of the driver, type a name for the driver, such as "SnowflakeODBC".

  • In the field Driver file name, click Browse and navigate to the driver file libSnowflake.dylib in the /Library/ODBC/ODBCDataSources/snowflakeodbc/lib/universal/ folder.

  • Click Ok.

  • Under the System DSN tab, click Add. A dialog opens.

  • Select the Snowflake driver you added. The DSN configuration window opens.

  • Enter a unique DSN name for your Snowflake connection, such as "SnowflakeExcel".

  • Click OK to save settings.

5. Launch Microsoft Excel. Go to Data > New Database Query > From Database.

6. With the iODBC Data Source Chooser window open, switch to the System DSN tab, select the DSN you created, and hit OK. In the keyword section, click the "+" button at the bottom left to add keyword and value pairs for "server", "port", "database" and "schema". You can obtain these values from the Connection Guide on the Nuvolos "Tables" interface.

7. Enter your Snowflake username and password. You can obtain access tokens from the Connection Guide on the Nuvolos "Tables" interface.

8. The Microsoft Query window opens.

  • Type the desired SQL statement, and click Run.

  • Click Return Data to import the results to the spreadsheet.

  • Attention: the SQL command must fully qualify and quote the table as "database-name"."schema-name"."table-name".
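Step 2 of the Mac OS instructions above edits simba.snowflake.ini by hand. If you prefer scripting the edit, the two keys can be rewritten with a short Python snippet; this is an illustrative sketch that assumes the default key names shown in that step (run it on a copy of the file first):

```python
# Rewrite the two path keys in simba.snowflake.ini (illustrative sketch).
BASE = "/Library/ODBC/ODBCDataSources/snowflakeodbc"
NEW_VALUES = {
    "ErrorMessagesPath": f"{BASE}/ErrorMessages",
    "CABundleFile": f"{BASE}/lib/universal/cacert.pem",
}

def remap_ini(text: str) -> str:
    """Replace the values of the two relocated keys, leaving other lines as-is."""
    lines = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key in NEW_VALUES:
            line = f"{key} = {NEW_VALUES[key]}"
        lines.append(line)
    return "\n".join(lines)

sample = ("ErrorMessagesPath = /opt/snowflake/ErrorMessages\n"
          "CABundleFile = /opt/snowflake/lib/cacert.pem\n")
print(remap_ini(sample))
```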

Canceling queries

Running queries can be listed and (selectively) canceled using SQL statements.

List running queries

You can check how many running queries you have with the command below. Substitute your Nuvolos username for the <USERNAME> placeholder; you can find your Nuvolos username on the Profile page of the Nuvolos web interface.

SELECT USER_NAME, QUERY_ID, SESSION_ID, QUERY_TEXT, START_TIME FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY_BY_USER(USER_NAME=>'<USERNAME>')) WHERE EXECUTION_STATUS = 'RUNNING'

Cancel a particular query

Once you know the query id, you can cancel a specific query with

SELECT SYSTEM$CANCEL_QUERY('<QUERY_ID>')

Cancel all running queries

Alternatively, you can cancel all your currently running queries with the command

ALTER USER <USERNAME> ABORT ALL QUERIES
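The three statements above can be wrapped in small helpers that only build the SQL text; executing them through your connection's cursor is left to the caller. The function names are illustrative, not part of any Nuvolos package:

```python
def list_running_queries_sql(username: str) -> str:
    """SQL to list your currently running queries (from the section above)."""
    return (
        "SELECT USER_NAME, QUERY_ID, SESSION_ID, QUERY_TEXT, START_TIME "
        "FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY_BY_USER("
        f"USER_NAME=>'{username}')) "
        "WHERE EXECUTION_STATUS = 'RUNNING'"
    )

def cancel_query_sql(query_id: str) -> str:
    """SQL to cancel one query by its id."""
    return f"SELECT SYSTEM$CANCEL_QUERY('{query_id}')"

def cancel_all_sql(username: str) -> str:
    """SQL to cancel all of a user's running queries."""
    return f"ALTER USER {username} ABORT ALL QUERIES"

print(cancel_all_sql("<USERNAME>"))
```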

Exporting big data

If your work requires frequently reading large amounts of tabular data from Nuvolos tables (in cluster batch jobs, for example), or you wish to export the result of a query that exceeds 1 million rows, it is recommended to retrieve the data locally as compressed Parquet files. Parquet files offer high data compression and significantly faster read times than CSV files, and are supported by most scientific applications (Matlab, Pandas, R).

Export steps

The general steps that need to be performed are the following:

  1. Create a staging area in the database where the database engine will export the query results as a Parquet file,

  2. Run the query and export the data,

  3. Retrieve the exported data from the staging area in the database,

  4. Drop the remote staging area.

This translates to the following SQL, assuming the compressed data will be less than 5 GB:

CREATE OR REPLACE FILE FORMAT PARQUET_FORMAT TYPE = PARQUET COMPRESSION = SNAPPY;
CREATE OR REPLACE STAGE PARQUET_STAGE FILE_FORMAT=PARQUET_FORMAT;

COPY INTO @PARQUET_STAGE/orders.parquet
FROM (SELECT 
    O_ORDERKEY, 
    O_CUSTKEY, 
    O_TOTALPRICE, 
    O_ORDERDATE, 
    O_ORDERPRIORITY, 
    O_CLERK, 
    O_SHIPPRIORITY, 
    O_COMMENT 
FROM ORDERS)
HEADER = TRUE
OVERWRITE = TRUE
SINGLE = TRUE
MAX_FILE_SIZE = 5000000000;


GET @PARQUET_STAGE file:///files/query_results/ PATTERN='orders.parquet' PARALLEL=4;

DROP STAGE PARQUET_STAGE;
DROP FILE FORMAT PARQUET_FORMAT;

If you anticipate that the result set of your query will be larger than 5 GB compressed (~50M rows), you will need to split the exported data into multiple Parquet files. The export will also be faster this way, as the database engine can parallelize it:

CREATE OR REPLACE FILE FORMAT PARQUET_FORMAT TYPE = PARQUET COMPRESSION = SNAPPY;
CREATE OR REPLACE STAGE PARQUET_STAGE FILE_FORMAT=PARQUET_FORMAT;

COPY INTO @PARQUET_STAGE/orders_
FROM (SELECT 
    O_ORDERKEY, 
    O_CUSTKEY, 
    O_TOTALPRICE, 
    O_ORDERDATE, 
    O_ORDERPRIORITY, 
    O_CLERK, 
    O_SHIPPRIORITY, 
    O_COMMENT 
FROM ORDERS)
HEADER = TRUE
OVERWRITE = TRUE
SINGLE = FALSE
MAX_FILE_SIZE = 5000000000;


GET @PARQUET_STAGE file:///files/query_results/ PATTERN='orders_*' PARALLEL=4;

DROP STAGE PARQUET_STAGE;
DROP FILE FORMAT PARQUET_FORMAT;
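The two variants above differ only in the target name, the GET pattern, and the SINGLE flag; a small generator makes that explicit. This is an illustrative sketch (export_sql is a hypothetical helper, not a Nuvolos API) producing the same statements as shown above:

```python
def export_sql(query: str, name: str, single: bool = True,
               max_file_size: int = 5_000_000_000) -> list:
    """Return the staged Parquet export statements (illustrative helper)."""
    # Single file: fixed name; multiple files: prefix that Snowflake suffixes.
    target = (f"@PARQUET_STAGE/{name}.parquet" if single
              else f"@PARQUET_STAGE/{name}_")
    pattern = f"{name}.parquet" if single else f"{name}_*"
    return [
        "CREATE OR REPLACE FILE FORMAT PARQUET_FORMAT "
        "TYPE = PARQUET COMPRESSION = SNAPPY;",
        "CREATE OR REPLACE STAGE PARQUET_STAGE FILE_FORMAT=PARQUET_FORMAT;",
        f"COPY INTO {target} FROM ({query}) HEADER = TRUE OVERWRITE = TRUE "
        f"SINGLE = {'TRUE' if single else 'FALSE'} "
        f"MAX_FILE_SIZE = {max_file_size};",
        f"GET @PARQUET_STAGE file:///files/query_results/ "
        f"PATTERN='{pattern}' PARALLEL=4;",
        "DROP STAGE PARQUET_STAGE;",
        "DROP FILE FORMAT PARQUET_FORMAT;",
    ]

for stmt in export_sql("SELECT * FROM ORDERS", "orders"):
    print(stmt)
```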

Matlab example

You can download data as Parquet files and read them with Matlab, as the following example demonstrates by retrieving and loading the ORDERS table of 15 million records. Note that Matlab requires integer columns to be explicitly cast to an integer type so that the logical data type of the Parquet field is INT: each field in a Parquet file has a physical type (INT64, DOUBLE, BYTE, etc.) and a logical type telling the processing application how to interpret the data saved in the field. Please refer to the Matlab documentation on the supported Parquet logical types.

conn = get_connection();
execute(conn, 'CREATE OR REPLACE FILE FORMAT PARQUET_FORMAT TYPE = PARQUET COMPRESSION = SNAPPY;');
execute(conn, 'CREATE OR REPLACE STAGE PARQUET_STAGE FILE_FORMAT=PARQUET_FORMAT;');
execute(conn, sprintf([ ...
'COPY INTO @PARQUET_STAGE/orders.parquet '...
'FROM (SELECT  '...
'    O_ORDERKEY::INT,  '...
'    O_CUSTKEY::INT,  '...
'    O_TOTALPRICE::DOUBLE,  '...
'    O_ORDERDATE,  '...
'    O_ORDERPRIORITY,  '...
'    O_CLERK,  '...
'    O_SHIPPRIORITY::INT,  '...
'    O_COMMENT  '...
'FROM ORDERS) '...
'HEADER = TRUE '...
'OVERWRITE = TRUE '...
'SINGLE = TRUE '...
'MAX_FILE_SIZE = 5000000000; '...
]));


try
    % This line will fail with an error currently, however, the file will be retrieved.
    execute(conn, 'GET @PARQUET_STAGE/orders.parquet file:///files/query_results/ PARALLEL=4 ;');
catch
    warning('Data has been retrieved');
end

% Please run these clean-up statements after the error.
execute(conn, 'DROP STAGE PARQUET_STAGE;');
execute(conn, 'DROP FILE FORMAT PARQUET_FORMAT;');

info = parquetinfo('/files/query_results/orders.parquet')
T = parquetread('/files/query_results/orders.parquet')
