In the previous part of this article series, we looked at the history of Amazon AWS, and how Amazon first released its Relational Database Service (Amazon RDS) supporting disparate and popular relational databases such as MySQL, Oracle, MS SQL server, PostgreSQL and MariaDB. This happened in 2009.
Five years later, in 2014, Amazon launched its own cloud-optimized Aurora database, with MySQL compatibility. We looked at how the storage architecture in the case of this new type of cloud-optimized database was designed to use the Amazon S3 service and its scale-out-on-demand capabilities, with the storage automatically replicated across three AWS Availability Zones (AZs) – two copies in each zone and achieving high availability up to 99.99%.
We noted that parallelized backups of the database take place to the S3 service, and this happens automatically and worry free, relieving the backup burden of the DBA. Another interesting point was that up to 15 Amazon Read replicas could be created from the primary database. These read replicas, with a replication lag of 10 to 20 milliseconds, are available for reads by the application. Or, they could also be used as failover targets. (By the way, read replicas cannot be created in the case of Oracle databases on RDS.)
Although read replicas achieve the purpose of offloading and scaling reads, the use of read replicas is not actually transparent to the application, as it would be in a true active-active read-write cluster. The application would have to be manually pointed to each read replica for read purposes, and to the primary database for write purposes. Also, since there is just one primary database handling all the writes, horizontal scaling for writes would not be possible in the RDS database. After exhausting other performance-boosting options such as increasing the number of IOPs in the storage, the only way to scale the primary database would be vertical, by moving to a more powerful RDS server. And this would have downtime.
At this point we need to compare this to Oracle’s Real Application Cluster (RAC) database, which is available in the Oracle Database Cloud. As part of the Oracle Exadata Cloud, a multi-node RAC cluster would have horizontal scalability for the application writes. This is one thing most people forget about RAC; it is not just for High Availability but also for horizontal scalability (scalability of the RAC database is one of the important features in Exadata RAC), and by its very nature of active-active instances, Oracle RAC is totally transparent to the application for reads and writes. No need to re-point the application.
However, for technical reasons, such as lack of shared storage and multicast IPs, Oracle RAC is not recommended to be set up on the AWS cloud. If you want to use RAC, you need to use the Oracle Database Cloud, and specifically the Exadata Cloud if you want more than two nodes for scalability.
Coming back to the Aurora discussion. After the Amazon Aurora MySQL-compatible database was released in 2014, users found it was easy to migrate MySQL databases to this type of database. Clients already using Amazon RDS for MySQL were among the first clients, and they could move to Aurora with a few clicks on the AWS console.
In late 2015, AWS announced a new AWS Database Migration Service (DMS). This made it possible to move on-premises Oracle, SQL Server, MySQL, MariaDB, or PostgreSQL databases to either the same database platform on the AWS cloud, or a new database platform. The AWS choice for a new platform to move to was obviously its own cloud-optimized database Aurora. When moving from Oracle to Aurora, for example, AWS also provided a Schema Conversion Tool to convert the on-premise schemas and stored procedures for use on the new platform.
However, there was one drawback: The Schema Conversion tool could not translate the Oracle stored procedures and their procedural language of PL/SQL; all the tool did when processing the procedures was to extract the SQL statements in the code. As any DBA would know, this is hardly enough. This meant that for this kind of migration, the application had to be virtually rewritten in the new database. The effect would be to increase the cost of the database migration project, since you now had to find someone who knew the nuances of both Oracle PL/SQL code, and the new procedural code that was to be used in the target database. Only such a person would be able to rewrite the application stored in the procedural language at the database end.
So, what was the next step for Amazon RDS? Amazon created a PostgreSQL compatible version of Aurora, which was announced recently in the last part of 2016. PostgreSQL has its own PL/SQL compatible language called as PL/pgSQL and this was obviously seen as a huge advantage.
But we wondered. Would that truly solve the issue of migration from Oracle to PostgreSQL? In the first place, is PostgreSQL fully compatible with Oracle?
Let’s take a look at some more details. PostgreSQL is an open source database, about 20 years mature, owned by a foundation, and it has an open source license which obviously is a big attraction. The code can be forked and modified. It supports many of the features of the SQL:2011 standard. PostgreSQL is supported by a large international community, mainly in postgresql.org and other countries in the world. The open source PostgreSQL is of course 100% free with no costs to download and install, but that doesn’t apply to PostgreSQL compatible Aurora where there would be a cost for using the Aurora database on the AWS cloud infrastructure.
PostgreSQL is ACID (Atomicity, Consistency, Isolation, Durability) compliant, object-oriented, ANSI-SQL:2008 compatible, and also has geospatial features. You can write stored procedures in 12 languages such as Java, Perl, Python, Ruby, Tcl, C/C++, or PL/pgSQL which is like Oracle’s PL/SQL. The database has partitioning, it has datatypes such as varchar, integer, numeric, XML, and JSON/B. The database can also enforce referential integrity with foreign keys. There are out-of-the-box functions such as to_char, to_date, and constraints such as check, not null, primary. In PostgreSQL, Foreign Data Wrappers (FDW) can be used to connect to external data sources, somewhat like the database links available in Oracle, except that it can connect to different databases like MongoDB.
User Defined Functions (UDF) can be created in PostgreSQL but an important consideration is that there are no packages (stored collections of procedures and functions) like in the Oracle Database, and as per reports this has stopped a number of Oracle Database users from moving to PostgreSQL.
In the next part of this article series, we will look at PostgreSQL replication and some of its drawbacks, and other compatibility issues.
(Disclaimer: The views expressed in this article are the author’s own views and do not reflect the views of Amazon Web Services (AWS) or Oracle Corporation. The author has written this article as an independent, unbiased cloud architect/consultant.)
Start the discussion at forums.toadworld.com