NUVOLOS
Sign In
  • Getting Started
    • Introduction to Nuvolos
    • Documentation structure
    • Nuvolos basic concepts
      • Organisational hierarchy
      • Applications
      • Distribution
      • Data integration
      • Snapshots
      • Background tasks
    • Navigate in Nuvolos
    • Quickstart tutorials
      • Research
      • Education (instructor)
      • Education (student)
  • Features
    • Applications
      • Application resources
      • Sessions
        • Session Logs
      • Install a software package
      • Create a persistent .bashrc
      • Automatic code execution
      • Long-running applications
      • Troubleshooting applications
      • New applications or licenses
      • Configuring applications
      • Exporting applications
      • Add-ons
        • MariaDB add-on
        • PostgreSQL add-on
        • OpenSearch add-on
        • MongoDB add-on
        • Redis add-on
        • PostGIS add-on
        • Rclone mount add-on
        • Neo4j add-on
    • File system and storage
      • File navigator
      • Large File Storage
      • Preview files
      • Mount Dropbox
      • Access S3 buckets with RClone
      • Access remote files with SSHFS
      • Access files on SharePoint Online
    • Object distribution
      • Distribution strategies
      • The distributed instance
    • Snapshots
      • Create a snapshot
      • Restore a snapshot
      • Delete a snapshot
    • Database integration
      • Create datasets
      • View tables
      • Build queries
      • Upload data
      • Access data from applications
        • Set up ODBC drivers
        • Obtain tokens for data access
        • Find database and schema path
      • DBeaver integration
    • Environment variables and secrets
    • Searching
      • Page
      • Find an application
      • Find an organisation
      • Find a space
      • Find an instance
      • Find a state
    • Video library
    • Nuvolos CLI and Python API
      • Installing the CLI
      • Using the CLI
  • User Guides
    • Research guides
      • Inviting a reviewer
      • GPU computation
    • Education guides
      • Setting assignments
        • Programmatical assignment handling
      • Documenting your course
      • Setting up group projects
        • Collaborative application editing
      • Configuring student applications
      • Archiving your course
      • Student guides
        • Joining a course
        • Working on assignments
        • Leaving a course
    • Application-specific guides
      • JupyterLab
      • RStudio
      • VSCode
      • Stata
      • MATLAB
      • Terminal
      • Terminal [tmux]
      • Apache Airflow
      • Apache Superset
      • D-Wave Inspector
      • MLFlow
      • Databricks Connect
      • Dynare.jl
      • CloudBeaver
      • InveLab
      • Overleaf
      • Metabase
      • DNDCv.CAN
      • OpenMetaData
      • Uploading data to the Large File Storage
    • Data guides
      • Setting up a dataset on Nuvolos
      • Importing data on Nuvolos
      • A complete database research workflow (Matlab & RStudio)
      • Accessing data as data.frames in R
      • Working with CRSP and Compustat
      • Working with the S&P 500®
  • Pricing and Billing
    • Pricing structure
    • Resource pools and budgets
    • Nuvolos Compute Units (NCUs)
  • Administration
    • Roles
      • Requesting roles
    • Organisation management
    • Space management
      • Invite to a space
      • Revoke a space user
      • HPC spaces
      • Resting spaces
    • Instance management
      • Invite to an instance
    • Enabling extra services
    • Monitoring resource usage
  • Reference
    • Application reference
      • InveLab
        • Dataset selection
        • Modules
          • Time-series visualisation
          • Moment estimation
          • Mean-variance frontiers
          • Frontiers
          • Dynamic strategy
          • Portfolio analysis
          • Performance analysis
          • Benchmarking
          • Carry trade strategies
          • Risk measures
          • Conditional volatility
          • Replication
          • Factor factory
          • Factor tilting
          • Valuation
    • Glossary
  • FAQs
    • FAQs
    • Troubleshooting
      • Login troubleshooting
        • I forgot my email address
        • I forgot my identity provider
        • I can't log in to Nuvolos
        • I forgot my password
        • I haven't received the password reset email
        • I haven't received the invitation email
      • Application troubleshooting
        • I can't see an application
        • I can't start an application
        • I can't create an application
        • I can't delete an application
        • I can't stop a running application
        • JupyterLab 3 troubleshooting
        • Spyder 3.7 troubleshooting
      • Administration troubleshooting
        • I can't see a space
        • I can't create a space
        • I can't delete a space
        • I can't invite admins to my space
        • I can't see an instance
        • I can't create an instance
        • I can't delete an instance
        • I can't invite users to an instance
        • I can't see distributed content in my instance
        • I can't see a snapshot
        • I can't create a snapshot
        • I can't delete a snapshot
        • I can't revoke a user role
        • I can't upload a file
        • I can't delete a file
        • I can't invite students to my course
      • Content troubleshooting
        • I can't find my files in my Linux home
        • I can't find my files among the Workspace files
        • I restored a snapshot by mistake
Powered by GitBook
On this page
  • Matlab workflow
  • Querying relevant data
  • The simple analysis
  • Storing results in the database
  • RStudio workflow
  • Querying relevant data
  • The simple analysis
  • Storing results in the database

Was this helpful?

  1. User Guides
  2. Data guides

A complete database research workflow (Matlab & RStudio)

PreviousImporting data on NuvolosNextAccessing data as data.frames in R

Last updated 2 years ago

Was this helpful?

Nuvolos supports a query-analyze-insert cycle to help researchers query, analyse and store data safely in a simple, integrated workflow. The workflow is demonstrated in two languages, but can be extended to any languages supported by Nuvolos.

Matlab workflow

A standard skeleton scientific workflow in Matlab can be broken down into three main steps:

  1. Query research-relevant data

  2. Analyse, transform or otherwise manipulate data

  3. Store results

Querying relevant data

For the mock example, we are going to work with the Fama-French factor set that is available for our Demo user, we will be focusing on the 5-factor table. The NORTH_AMERICA_5_FACTORS table has been .

After opening the Matlab application, the following bit of code will return the entire database table as a Matlab table-type object. The query we are executing is a merge of a monthly stock series table (for Apple monthly stock prices) and the Fama-French factor table as follows.

The SQL query:

SELECT NAF.*, SM.MPRC, SM.MRET*100, SM.MTCAP 
FROM NORTH_AMERICA_5_FACTORS NAF 
INNER JOIN TIME_SERIES_MONTHLY SM 
ON SM.MCALDT = NAF.DATE 
WHERE KYPERMNO = 14593

The code that executes the query, the above string is saved in query_string.

con = get_connection();
dataset_factor = select(con, query_string);

The simple analysis

The previous step resulted in dataset_factor containing a Matlab Table object that holds the data. The fitlm method fits a linear regression on the table with an R-style formula. We then write back the fitted values to the Table as a column.

dataset_factor.EXCESS_RETURN = dataset_factor.SM_MRET_100 - dataset_factor.RF;
mod = fitlm(dataset_factor,'EXCESS_RETURN ~ (-1) + MKT_RF + SMB + HML + RMW + CMA')
dataset_factor.FIT_FACTOR_5 = mod.Fitted;

Storing results in the database

As a final step, we write back the results using the data upload command for Matlab:

sqlwrite(con,'APPLE_5FACTOR_FIT',dataset_factor);

As an end result, we can then find the table on the Nuvolos UI:

RStudio workflow

A standard skeleton scientific workflow in RStudio can be broken down into three main steps:

  1. Query research-relevant data

  2. Analyse, transform or otherwise manipulate data

  3. Store results

It is possible to create more complext workflows, but it will usually consist of three-step modules such as above.

Querying relevant data

After opening the RStudio application, the following bit of code will return the entire database table as an R data frame object. The query we are executing is a merge of a monthly stock series table (for Apple monthly stock prices) and the Fama-French factor table as follows.

The SQL query:

SELECT NAF.*, SM.MPRC, SM.MRET*100 AS SM_MRET_100, SM.MTCAP 
FROM NORTH_AMERICA_5_FACTORS NAF 
INNER JOIN TIME_SERIES_MONTHLY SM 
ON SM.MCALDT = NAF.DATE 
WHERE KYPERMNO = 14593

The code that executes the query, the above string is saved in query_string.

conn <- nuvolos::get_connection()
dataset_factor <- dbGetQuery(conn, query_string)

The simple analysis

The previous step resulted in dataset_factor containing an R data.frame object that holds the data. The lm method fits a linear regression on the data frame. We put the fitted values to the data frame.

dataset_factor$EXCESS_RETURN <- dataset_factor$SM_MRET_100 - dataset_factor$RF
mod <- lm(EXCESS_RETURN ~ (-1) + MKT_RF + SMB + HML + RMW + CMA, dataset_factor)
dataset_factor$FIT_FACTOR_5 <- mod$fitted.values

Storing results in the database

As a final step, we write back the results using the data upload command for R:

DBI::dbWriteTable(conn, name="APPLE_5FACTOR_FIT", value=dataset_factor, batch_rows = 10000)

For the mock example, we are going to work with the Fama-French factor set that is available for our Demo user, we will be focusing on the 5-factor table. The NORTH_AMERICA_5_FACTORS .

North America
North America
distributed to the instance we are working in
table has been distributed to the instance we are working in