
Big Data Challenges Move from Tech to the Organization

This year I had the opportunity to lead our predictions for big data. Unlike most predictions made this time of year, ours don’t just look ahead to the coming 12 months. The effects of innovation, changes in the market and impact on IT budgets are hard to recognize over such a short timeframe. That’s why our predictions often extend to 36 months. (We also do lookbacks to see whether we were right, but that’s a topic for another blog post.)

What became clear during the process of selecting and refining predictions is the focus has changed. Technology is no longer the interesting part of big data. What’s interesting is how organizations deal with it. The hype is receding and big data is no longer viewed as a simple technology problem. Organizations have to focus on the building blocks of enterprise information management (EIM):

[Figure: EIM building blocks]

So far, only the most rudimentary elements of enabling infrastructure have been considered. This is not sustainable. One prediction from my colleague Roxane Edjlali is that 60% of big data projects will fail to make it into production either due to an inability to demonstrate value or because they cannot evolve into existing EIM processes.

This is only part of the story. Cultural or business model changes will be necessary to benefit from big data. And ethics must be a primary consideration as privacy concerns rise in importance.

Gartner clients can read the full report here: Predicts 2015: Big Data Challenges Move From Technology to the Organization. If you want to ensure your organization is on the right side of the analytical divide, join me and my Gartner colleagues at the Gartner Business Intelligence & Analytics Summit.


Spark and Tez Highlight MapReduce Problems

On February 3rd, Cloudera announced support for Apache Spark as part of Cloudera Enterprise. I’ve blogged about Spark before so I won’t go into substantial detail here, but the short version is Spark improves upon MapReduce by removing the need to write data to disk between steps. Spark also takes advantage of in-memory processing and data sharing for further optimizations.

The other successor to MapReduce (of course there is more than one) is Apache Tez. Tez improves upon MapReduce by removing the need to write data to disk between steps. (Sound familiar?) It also has in-memory capabilities similar to Spark’s. Thus far, Hortonworks has thrown its weight behind Tez development as part of the Stinger project.
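To make the "no disk write between steps" point concrete, here is a minimal single-process sketch in plain Python. It is not Spark or Tez code and uses neither project's API; the function names (`tokenize`, `count`, `run_with_disk_handoff`, `run_in_memory`) and the sample data are hypothetical, chosen only to contrast the two handoff styles.

```python
import json
import os
import tempfile


def tokenize(lines):
    """Step 1: split each line into lowercase words."""
    return [word.lower() for line in lines for word in line.split()]


def count(words):
    """Step 2: count occurrences of each word."""
    totals = {}
    for word in words:
        totals[word] = totals.get(word, 0) + 1
    return totals


def run_with_disk_handoff(lines, workdir):
    """MapReduce-style handoff: step 1 writes its output to disk,
    and step 2 has to read it back before it can start."""
    path = os.path.join(workdir, "step1_output.json")
    with open(path, "w") as f:
        json.dump(tokenize(lines), f)
    with open(path) as f:
        words = json.load(f)
    return count(words)


def run_in_memory(lines):
    """Spark/Tez-style handoff: step 2 consumes step 1's output
    directly from memory, with no intermediate file."""
    return count(tokenize(lines))


# Hypothetical sample input, for illustration only.
lines = ["Spark and Tez", "Tez and MapReduce"]

with tempfile.TemporaryDirectory() as workdir:
    disk_result = run_with_disk_handoff(lines, workdir)

memory_result = run_in_memory(lines)
```

Both paths produce the same counts; the difference is the extra write and read in the first one. In a real cluster that cost is paid at every step boundary, which is why removing it matters more as pipelines grow longer.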

Both Tez and Spark are described as supplementing MapReduce workloads. However, I don’t think this will be the case much longer. The world has changed since Google published the original MapReduce paper in 2004. Memory prices have plummeted while data volumes and sources have increased, making legacy MapReduce less appealing.

Vendors will likely begin distancing themselves from MapReduce in favor of more performant options once there are some high-profile customer references. It remains to be seen what this means for early adopters with legacy MapReduce applications.

Thanks to Josh Wills at Cloudera for helping clarify the advantage provided by Spark & Tez.

BI Hadoop Specialists Trail, Broader Tools Lead

There is a romantic notion of leaving the past behind and embracing the future unencumbered. Previous mistakes forgotten, we can venture forward to accomplish great things to the amazement of friends, colleagues and casual onlookers.

This is the promise made by BI and analytics vendors in the Hadoop-only ecosystem. After all, if your data moves to Hadoop, why concern yourself with data stored in legacy data warehouses? But judging by the audience response to a polling question during a webinar on Hadoop 2.0, you can’t escape your past. You can only embrace it.


Finding a Spark at Yahoo!

Recently I had an opportunity to learn a little more about Apache Spark, a new in-memory cluster computing system originally developed at the UC Berkeley AMPLab. By moving data into memory, Spark improves performance for tasks like interactive data analysis and iterative machine learning. These improvements are especially pronounced when compared with a batch-oriented, disk-bound system like Apache Hadoop. While Spark has seen rapid adoption at a number of companies, I learned how Yahoo! has started integrating Spark into its analytics.


Copyright © Nick Heudecker
