Databricks Connect
Nuvolos now offers a VSCode application with Python 3.9, R 4.2, and Databricks Connect (databricks-connect) pre-installed. From this application, you can submit Spark jobs to Databricks-hosted Spark clusters.
PySpark and sparklyr are both installed in the application.
Prerequisites
Databricks Connect only supports Databricks clusters with versions up to 10.4 LTS.
To configure the connection to Databricks, you will need a personal access token, which is not available in the Databricks Community Edition.
First, create a "Databricks 10.4 LTS + Py39 + R 4.2" application in Nuvolos:
Start the new application, open a terminal, and configure your Databricks connection with the databricks-connect configure command. You will need the URL of your Databricks cluster and your personal access token.
To test your connection, run the command databricks-connect test.
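If the connection test fails, it can help to check that the configuration was saved completely. The sketch below is not part of the Nuvolos tooling. It assumes that databricks-connect configure stores its settings as JSON in ~/.databricks-connect with the keys host, token, cluster_id, org_id and port. The helper names missing_keys and load_config are hypothetical.

```python
import json
from pathlib import Path

# Assumption: these are the keys written by `databricks-connect configure`.
REQUIRED_KEYS = ("host", "token", "cluster_id", "org_id", "port")

# Assumption: the legacy client keeps its settings in this JSON file.
DEFAULT_CONFIG_PATH = Path.home() / ".databricks-connect"


def missing_keys(config: dict) -> list:
    """Return the required keys that are absent or empty in the config."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


def load_config(path: Path = DEFAULT_CONFIG_PATH) -> dict:
    """Read the saved configuration from disk."""
    with path.open() as handle:
        return json.load(handle)
```

For example, missing_keys(load_config()) returns an empty list when every expected setting is present.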
Python example
To run the example, install the slugify Python package with the following command:
conda install -y -c conda-forge python-slugify
Once you have configured the Databricks connection, you can try the following simple example to create a Databricks table and run a SQL query on the table:
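A minimal sketch of such an example, assuming the connection is already configured and the cluster lets you create tables in the default database. The sample rows, column names, table title, and the helper names build_query and run_example are made up for illustration. Call run_example() from a Python session in the application to execute it.

```python
# Illustrative sample data; not taken from any real dataset.
SAMPLE_ROWS = [(1, "Alice", 34), (2, "Bob", 45), (3, "Carol", 29)]
COLUMNS = ["id", "name", "age"]


def build_query(table_name: str, min_age: int) -> str:
    """Build a SQL query selecting people older than min_age."""
    return (
        f"SELECT name, age FROM {table_name} "
        f"WHERE age > {int(min_age)} ORDER BY age"
    )


def run_example(title: str = "Nuvolos demo table") -> None:
    """Create a table on the Databricks cluster and query it with SQL."""
    # Imported here so the helpers above work without Spark installed.
    from pyspark.sql import SparkSession
    from slugify import slugify

    # With Databricks Connect configured, this session runs on the remote cluster.
    spark = SparkSession.builder.getOrCreate()

    # Turn the title into a safe table identifier, e.g. "nuvolos_demo_table".
    table_name = slugify(title, separator="_")

    # Write the sample rows as a managed table, replacing any earlier copy.
    df = spark.createDataFrame(SAMPLE_ROWS, COLUMNS)
    df.write.mode("overwrite").saveAsTable(table_name)

    # Run the SQL query on the cluster and print the matching rows.
    spark.sql(build_query(table_name, 30)).show()
```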
You will see a result like:
R example
The sparklyr package, which is pre-installed in the application, lets you connect to Databricks Spark clusters through the connection configured with databricks-connect.
The following R script runs a simple job on your Databricks cluster:
You should see an output like: