PolyBase use cases clarified

I previously talked about PolyBase and its enhancements (see PASS Summit Announcements: PolyBase enhancements). There is some confusion on PolyBase use cases as they are different depending on whether you are using PolyBase with Azure SQL Data Warehouse (SQL DW) or SQL Server 2016, as well as the sources you are using it against. The three main use cases for using PolyBase are: Loading data, federating querying, and aging out data. Here is the support for those three uses cases in SQL DW and SQL Server 2016:

PolyBase in:	Parallelize Data Load (Blob and ADLS)	Federated Query (push down) HDInsights	Federated Query (push down) HDP/Cloudera (local or blob)	Federated Query (push down) five new sources*	Age Out Data
SQL DW	Yes	N/A	N/A	No support for on-prem sources	Maybe
SQL Server 2016	Yes via scale-out groups. Blob, not ADLS	N	Y (Creates MapReduce job)	Y	Maybe

* = Teradata, Oracle, SQL Server, MongoDB, generic ODBC (Spark, Hive, Impala, DB2)

For federated queries: “N” requires all data from the source to be copied into SQL Server 2016 and then filtered. For “Y”, the query is pushed down into the data source and only the results are returned back, which can be much faster for large amounts of data.

I mention “Maybe” for age out data in SQL DW as you can use PolyBase to access the aged-out data in blob or Azure Data Lake Storage (ADLS), but it will have to import all the data so may have slower performance (which is usually ok for accessing data that is aged-out). For SQL Server 2016, it will have to import the data unless you use HDP/Cloudera, in which case the creation of the MapReduce job will add overhead.

Here are details on what PolyBase supports for each product:

PolyBase (works with)	Azure Blob Store (WASB)	Push Down	Azure Data Lake Store (ADLS)	Push Down	HDI	Push Down	Cloudera (CDH)	Push Down	Horton Works (HDP)	Push Down
SQL 2016	Yes	N/A	No	N/A	No	No	Yes	Yes	Yes	Yes
Azure SQL DW	Yes	N/A	Yes	N/A	No	No	No	No	No	No
APS	Yes	N/A	No	N/A	Yes	Yes (internal region) No (external)	Yes	Yes	Yes	Yes

Here are some important notes:

The file types that PolyBase supports: UTF-8 and UTF-16 encoded delimited text, RC File, ORC, Parquet, gzip, zlib, Snappy. Not supported: extended ASCII, fixed-file format, WinZip, JSON, and XML
Azure SQL Database does not support PolyBase
SQL DW recently added PolyBase support for ADLS but does not support compute pushdown
ADLS in only in two regions (East US 2, Central US)
PolyBase supports row sizes up to 1MB
PolyBase can do writes to blob/ADLS and HDFS (using CETAS)
PolyBase requires the CREATE EXTERNAL TABLE command
PolyBase offers ability to create statistics on tables (but they are not auto-created or auto-updated)

PolyBase parallelized reads for data loading:

Supported: in SQL using CTAS or INSERT INTO
Not supported: BCP, Bulk Insert, SQLBulkCopy
Not supported: SSIS (unless used to call stored procedure containing CTAS)
Supported: ADF
- If source compatible with PolyBase, will directly copy
- If source not compatible with PolyBase, will stage to Blob
- If source is ADLS, will still stage to Blob (having to stage to blob will be removed week of 8/20, so PolyBase will copy from ADLS directly to target)

The bottom line is, for SQL DW, think of PolyBase as a mechanism for data loading. For SQL Server 2016, think of PolyBase for federated querying.

More info:

PolyBase scale-out groups

PolyBase Queries

Azure, SQL Server

About the Author

James Serra

About the Author

James is currently a Senior Business Intelligence Architect/Developer and has over 20 years of IT experience. James started his career as a software developer, then became a DBA 12 years ago, and for the last five years he has been working extensively with Business Intelligence using the SQL Server BI stack (SSAS, SSRS, and SSIS). James has been at times a permanent employee, consultant, contractor, and owner of his own business. All these experiences along with continuous learning has helped James to develop many successful data warehouse and BI projects. James has earned the MCITP Business Developer 2008, MCITP Database Administrator 2008, and MCITP Database Developer 2008, and has a Bachelor of Science degree in Computer Engineering.

PolyBase (works with)

Azure Blob Store (WASB)

Push Down

Azure Data Lake Store (ADLS)

Push Down

HDI

Push Down

Cloudera (CDH)

Push Down

Horton Works (HDP)

Push Down

SQL 2016

Yes

N/A

No

N/A

No

No

Yes

Yes

Yes

Yes

Azure SQL DW

Yes

N/A

Yes

N/A

No

No

No

No

No

No

APS

Yes

N/A

No

N/A

Yes

Yes (internal region) No (external)

Yes

Yes

Yes

Yes

About the Author

James Serra

About the Author

Start the discussion at forums.toadworld.com

Yes (internal region)

No (external)