Strata Summit, New York, NY - September 21, 2011 -
EMC Corporation (NYSE: EMC) today announced the creation of the Greenplum® Analytics Workbench, which will be used for regular integration tests on Apache Hadoop. The 1,000-plus node test bed cluster incorporates technology from the world's leading software and hardware manufacturers with the intention of providing the infrastructure needed to facilitate Apache Hadoop innovation. With the availability of a large-scale test bed, developers can have their contributions validated at scale, and enterprises can confidently deploy new releases in a production environment.
Apache Hadoop has rapidly emerged as the preferred solution for Big Data analytics across unstructured data. Organizations looking for opportunity in an ever-changing business environment are finding that Big Data analysis is the competitive advantage. In fact, according to a 2011 TDWI survey, 34% of companies do big data analytics today, and that number is growing. Hadoop-based batch processing of unstructured and structured data at massive scale using commodity hardware has led to a profound change in analytics. By extracting the knowledge wrapped within unstructured and machine-generated data, organizations can make better decisions that drive revenue, improve service and reduce costs.
Hadoop innovation and development rely on contributions made by open source developers. However, the Apache Hadoop community has consistently faced the challenge of provisioning the resources required to validate new releases of the open source software. Without access to a large cluster for scale validation, the Apache community – and enterprise users – must wait for Hadoop user communities to sponsor an effort to run scale validations. Such validations are run very infrequently, and a lot of time is spent stabilizing releases for enterprise adoption.
With an aggressive plan for testing on the Apache Hadoop trunk and its continuing releases, EMC is excited to contribute to the Hadoop open source community by providing the testing resources it lacks. These resources will help to quickly identify bugs, stabilize new releases and optimize hardware configurations in an effort to speed up the innovation of Hadoop. EMC plans to provide test results to the Apache Software Foundation and the open source community, and EMC's testing will be planned in coordination with the Apache Hadoop project.
The Greenplum Analytics Workbench is the result of a collaboration of several hardware and software vendors including:
The test bed cluster, which consists of 1,000-plus hardware nodes or 10,000 nodes with the addition of virtual machines, features 24 petabytes of physical storage. This is the equivalent of nearly half of the entire written works of mankind, from the beginning of recorded history.
"EMC and its partners have made a significant contribution to the Apache Hadoop community by promising to validate Apache Hadoop releases on clusters at petabyte scale. With access to continuous integration testing, the world's best unstructured data analytics software will get better and faster, allowing companies and organizations to gain better insights from their data."
- Dhruba Borthakur, Member of Hadoop Project Management Committee
"The EMC 1k node cluster fills a vital resource gap, one that has been missing up to this point: validating Apache Hadoop builds and releases at scale. I can't wait to take it out for a burn."
- Michael Stack, Engineer at StumbleUpon and Member of Hadoop Project Management Committee
"Apache Hadoop at this stage needs a standardized tool for testing and validating Hadoop releases at scale. EMC's 1,000 node test bed launch will facilitate the development of Apache Hadoop as a vital tool for Big Data analytics, advance its internal innovation, and lead to greater adoption of Hadoop. I am especially pleased that EMC is contributing its findings back to the open source community."
- Konstantin Shvachko of eBay, Member of Apache Project Management Committee
"Intel is excited to be a part of the largest Hadoop test bed cluster ever built. Being able to analyze Big Data sets and make use of the tremendous volume of unstructured data being created is an opportunity that could transform entire industries. The latest Intel® Xeon® 5600 series processors will provide the processing power required to scale Big Data analytics and realize the full potential of Apache Hadoop. The entire open source community, including Intel, will benefit from the key learnings from both development and testing on the cluster."
- David Tuhy, General Manager of the Storage Group, Intel Corporation
"Greenplum is excited to be part of the elite group of hardware and software manufacturers that made possible the Greenplum Analytics Workbench. The test bed cluster, at 1,000-plus hardware nodes, is itself an accomplishment. But more importantly, we are excited to make this test bed available to the open source community so that enterprises can feel comfortable deploying Apache Hadoop in a production environment and can reap the benefits of Big Data analytics."
- Luke Lonergan, Chief Technology Officer, Greenplum, a division of EMC