Amazon Elastic MapReduce (Amazon EMR) has received a major update to version 5.0. The release brings user interface improvements, better debugging, and application upgrades, along with support for 16 open-source Hadoop ecosystem projects.
Jeff Barr, a spokesperson for Amazon Web Services Inc. (AWS), said the team is announcing and releasing EMR 5.0.0. According to Barr, the major release includes major version upgrades for Spark and Hive, an upgraded Pig, Tez as the default execution engine for Hive and Pig, user interface improvements to Hue and Zeppelin, and enhanced debugging capabilities.
Barr explained that EMR 5.0 supports 16 Hadoop ecosystem projects, including Apache Hadoop, Apache Spark, Presto, Apache Hive, and Apache HBase, so that users have access to the latest versions of these technologies.
According to its Web site, Amazon EMR is a Web service that lets users process large amounts of data quickly and cost-effectively.
AWS also states on its site that Amazon EMR simplifies big data processing by providing a managed Hadoop framework that makes it simple, fast, and cost-effective to distribute and process large amounts of data across dynamically scalable Amazon EC2 instances. Users can also run popular distributed frameworks such as Presto and Apache Spark in Amazon EMR and interact with data stored in Amazon S3 or Amazon DynamoDB.
Barr singled out Spark and Hive for their upgrades.
He said EMR 5.0 upgrades Hive (a SQL-like interface to Tez and Hadoop MapReduce) from 1.0 to 2.0, along with a move to Java 8. It also upgrades Spark (an engine for large-scale data processing) from 1.6.2 to 2.0, along with a similar move to Scala 2.11. Both are major releases that include new features, performance enhancements, and bug fixes. Spark, for example, now includes a Structured Streaming API and better SQL support.
Updates to Zeppelin, “a notebook for interactive analytics,” and Hue, “an interface to analyze data with Hadoop,” include UI improvements. Hue now has notebooks that let users run multiple queries from a single page.
Barr said debugging has been improved for data developers, with features such as partial stack traces and links to S3 log files accessible from the console.
Barr said EMR 5.0 clusters can now be launched in any AWS Region. An Aug. 23 Webinar will introduce the new features of the upgrade.
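For readers who want to try the release, the following is a minimal sketch of how an EMR 5.0.0 cluster might be requested from Python with the boto3 SDK. It is an illustration under stated assumptions, not an official AWS example: the cluster name, instance type, instance count, and region are placeholders, and the role names assume the account uses the default EMR roles. The helper names `build_cluster_request` and `launch_cluster` are hypothetical.

```python
import json
from typing import Any, Dict, List


def build_cluster_request(
    name: str,
    applications: List[str],
    instance_type: str = "m3.xlarge",  # assumption: placeholder instance type
    instance_count: int = 3,           # assumption: one master plus two core nodes
) -> Dict[str, Any]:
    """Build keyword arguments for the boto3 EMR client's run_job_flow call."""
    return {
        "Name": name,
        # The release label selects the EMR version described in the article.
        "ReleaseLabel": "emr-5.0.0",
        "Applications": [{"Name": app} for app in applications],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Keep the cluster running after startup instead of terminating.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumption: the default EMR roles already exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
        "VisibleToAllUsers": True,
    }


def launch_cluster(request: Dict[str, Any], region_name: str) -> str:
    """Submit the request to AWS and return the new cluster (job flow) ID."""
    import boto3  # imported here so the request can be built without the SDK

    client = boto3.client("emr", region_name=region_name)
    response = client.run_job_flow(**request)
    return response["JobFlowId"]


if __name__ == "__main__":
    request = build_cluster_request("emr5-demo", ["Spark", "Hive", "Zeppelin"])
    # Print the request for review; calling launch_cluster creates billable resources.
    print(json.dumps(request, indent=2))
```

Running the script only prints the request. Calling `launch_cluster(request, "us-east-1")` would submit it and requires valid AWS credentials.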