Mar 15, 2017 10:00:02 AM by James Serra
I previously talked about PolyBase and its enhancements (see PASS Summit Announcements: PolyBase enhancements). There is some confusion on PolyBase use cases as they are different depending on whether you are using PolyBase with Azure SQL Data Warehouse (SQL DW) or SQL Server 2016, as well as the sources you are using it against. The three main use cases for using PolyBase are: Loading data, federating querying, and aging out data. Here is the support for those three uses cases in SQL DW and SQL Server 2016:
PolyBase in: | Parallelize Data Load (Blob and ADLS) | Federated Query (push down) HDInsights | Federated Query (push down) HDP/Cloudera (local or blob) | Federated Query (push down) five new sources* | Age Out Data |
SQL DW | Yes | N/A | N/A | No support for on-prem sources | Maybe |
SQL Server 2016 | Yes via scale-out groups. Blob, not ADLS | N | Y (Creates MapReduce job) | Y | Maybe |
* = Teradata, Oracle, SQL Server, MongoDB, generic ODBC (Spark, Hive, Impala, DB2)
For federated queries: “N” requires all data from the source to be copied into SQL Server 2016 and then filtered. For “Y”, the query is pushed down into the data source and only the results are returned back, which can be much faster for large amounts of data.
I mention “Maybe” for age out data in SQL DW as you can use PolyBase to access the aged-out data in blob or Azure Data Lake Storage (ADLS), but it will have to import all the data so may have slower performance (which is usually ok for accessing data that is aged-out). For SQL Server 2016, it will have to import the data unless you use HDP/Cloudera, in which case the creation of the MapReduce job will add overhead.
Here are details on what PolyBase supports for each product:
PolyBase (works with) |
Azure Blob Store (WASB) |
Push Down |
Azure Data Lake Store (ADLS) |
Push Down |
HDI |
Push Down |
Cloudera (CDH) |
Push Down |
Horton Works (HDP) |
Push Down |
SQL 2016 |
Yes |
N/A |
No |
N/A |
No |
No |
Yes |
Yes |
Yes |
Yes |
Azure SQL DW |
Yes |
N/A |
Yes |
N/A |
No |
No |
No |
No |
No |
No |
APS |
Yes |
N/A |
No |
N/A |
Yes |
Yes (internal region)
|
Yes |
Yes |
Yes |
Yes |
Here are some important notes:
PolyBase parallelized reads for data loading:
The bottom line is, for SQL DW, think of PolyBase as a mechanism for data loading. For SQL Server 2016, think of PolyBase for federated querying.
More info:
Tags: SQL Server Azure
Written by James Serra
James is currently a Senior Business Intelligence Architect/Developer and has over 20 years of IT experience. James started his career as a software developer, then became a DBA 12 years ago, and for the last five years he has been working extensively with Business Intelligence using the SQL Server BI stack (SSAS, SSRS, and SSIS). James has been at times a permanent employee, consultant, contractor, and owner of his own business. All these experiences along with continuous learning has helped James to develop many successful data warehouse and BI projects. James has earned the MCITP Business Developer 2008, MCITP Database Administrator 2008, and MCITP Database Developer 2008, and has a Bachelor of Science degree in Computer Engineering.
We use cookies to improve your experience with our site. By continuing to use this site, you consent to our use of cookies. Learn more.