Zero-copy data sharing (virtualization)


Virtualization lets you make a resource explorable in your marketplace without copying the underlying records into Huwise. The records remain stored in the source system, and the actions users take to explore them, such as searching or filtering, are performed directly on the source.

This is useful when you want to expose data in Huwise while avoiding duplication on the platform.

This feature is an add-on available to Advanced and Ultimate clients. It is currently available for Snowflake and Databricks sources. Contact your customer success manager for more information.

How zero-copy assets work

A zero-copy, or virtualized, dataset is an asset in Huwise whose records remain stored in the source system. Because exploration queries are sent directly to the source system, zero-copy datasets remain aligned with the source data without requiring duplication.

Like other datasets, zero-copy datasets have dedicated asset pages in the marketplace where users can explore both the data and the schema. Like any other asset, they also include metadata, and you can configure who can access them.

Zero-copy assets maintain the same security level as non-virtualized datasets, and standard sanitization techniques are applied to the real-time queries sent to the source system.

Zero-copy datasets currently support:

  • Full-text search

  • Filtering

  • Column sorting

  • Result counting

  • Pagination

  • Schema retrieval
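To illustrate how these operations can run on the source without copying records, here is a minimal sketch of query pushdown. It is not Huwise's implementation: an in-memory SQLite database stands in for the Snowflake or Databricks source, and the `ExploreRequest` and `VirtualDataset` names are hypothetical. Each exploration request is translated into a SQL query executed on the source. As a stand-in for sanitization, column names are checked against the source schema and every value is passed as a bound parameter.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ExploreRequest:
    # Hypothetical request shape covering the supported operations
    search: Optional[str] = None                 # full-text search term
    filters: dict = field(default_factory=dict)  # column -> exact value
    order_by: Optional[str] = None               # column sorting
    descending: bool = False
    limit: int = 10                              # pagination: page size
    offset: int = 0                              # pagination: first row

class VirtualDataset:
    """Hypothetical zero-copy dataset: keeps a connection and a table name, never the rows."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table
        # Schema retrieval: the column list is read from the source itself
        self.columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]

    def _check(self, col):
        # Only known column names may appear as SQL identifiers
        if col not in self.columns:
            raise ValueError(f"unknown column: {col!r}")

    def _where(self, req):
        clauses, params = [], []
        for col, value in req.filters.items():
            self._check(col)
            clauses.append(f'"{col}" = ?')
            params.append(value)  # values are always bound, never interpolated
        if req.search:
            # Naive full-text search: case-insensitive substring match on every column
            like = " OR ".join(f'CAST("{c}" AS TEXT) LIKE ?' for c in self.columns)
            clauses.append(f"({like})")
            params.extend([f"%{req.search}%"] * len(self.columns))
        where = " WHERE " + " AND ".join(clauses) if clauses else ""
        return where, params

    def count(self, req):
        # Result counting is computed by the source
        where, params = self._where(req)
        sql = f'SELECT COUNT(*) FROM "{self.table}"{where}'
        return self.conn.execute(sql, params).fetchone()[0]

    def records(self, req):
        where, params = self._where(req)
        sql = f'SELECT * FROM "{self.table}"{where}'
        if req.order_by:
            self._check(req.order_by)
            sql += f' ORDER BY "{req.order_by}" {"DESC" if req.descending else "ASC"}'
        sql += " LIMIT ? OFFSET ?"
        return self.conn.execute(sql, params + [req.limit, req.offset]).fetchall()

# Usage: the "source system" holds the records; the dataset only queries it
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stations (name TEXT, city TEXT, capacity INTEGER)")
conn.executemany("INSERT INTO stations VALUES (?, ?, ?)", [
    ("North Gate", "Paris", 20), ("Harbour", "Lyon", 12), ("Old Town", "Paris", 35)])
ds = VirtualDataset(conn, "stations")
req = ExploreRequest(filters={"city": "Paris"}, order_by="capacity", descending=True, limit=1)
print(ds.count(req))    # 2
print(ds.records(req))  # [('Old Town', 'Paris', 35)]
```

Because every call goes back to the source, the results always reflect its current state, which is why no synchronization step is needed.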

Some features available on copied datasets are not yet available on zero-copy datasets, though the experience is evolving. At the moment, zero-copy datasets do not support Explore by Query, AI insights, export, the API console, custom views, or Studio pages.

How to create zero-copy datasets

Zero-copy, or virtualized, datasets are created in bulk through harvesters:

  1. In your back office, go to Catalog > Harvesters and click Add harvester

  2. Select the relevant harvester and enter a name

  3. Configure the connection and the required source settings

  4. Review the available options and metadata configuration

  5. Click Preview to verify the result

  6. Click Start harvester

  7. Review the created assets and click Publish to make the assets available

As with other harvesters, the assets created are not published automatically. This lets you review them before making them available in the marketplace.
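The bulk-creation and review workflow above can be sketched as follows. This is a conceptual illustration only, not how Huwise harvesters are built: an in-memory SQLite database stands in for the source, and `ZeroCopyAsset`, `harvest`, and `publish` are hypothetical names. The harvester creates one asset per source table, storing only a reference and the schema, and every asset starts unpublished until it is explicitly approved.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class ZeroCopyAsset:
    # Hypothetical asset record: a reference to the source table, not its rows
    source_table: str
    columns: list
    published: bool = False  # harvested assets start unpublished

def harvest(conn):
    """Create one zero-copy asset per table found in the source (bulk creation)."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    assets = []
    for t in tables:
        cols = [r[1] for r in conn.execute(f'PRAGMA table_info("{t}")')]
        assets.append(ZeroCopyAsset(source_table=t, columns=cols))
    return assets

def publish(assets, approved):
    # Only assets explicitly approved after review become available
    for a in assets:
        if a.source_table in approved:
            a.published = True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.execute("CREATE TABLE stock (sku TEXT, qty INTEGER)")
assets = harvest(conn)
print([(a.source_table, a.published) for a in assets])  # [('sales', False), ('stock', False)]
publish(assets, approved={"sales"})
print([(a.source_table, a.published) for a in assets])  # [('sales', True), ('stock', False)]
```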

Configuring specific harvesters

The Snowflake and Databricks harvesters allow you to mass-import assets and virtualize them.

Note the difference between the two: the Snowflake and Databricks harvesters create multiple zero-copy assets at once, whereas the Snowflake and Databricks connectors import individual assets by copying their records into Huwise.

If you have already configured the corresponding Snowflake or Databricks connector, configuring these harvesters will feel familiar.

Snowflake harvester

See here for more detail about configuring the different parameters.

Databricks harvester

See here for more detail about configuring the different parameters.