Posts tagged FREEDOM

How Open Source Succeeds In The Cloud—It Trades Freedom For Simplicity

Those new to open source won’t remember just how much of the early code amounted to little more than crappy-but-free clones of popular proprietary products. Boy, how times have changed.

Open source, once a clumsy (but free!) imitator of proprietary innovation, is now taking the lead on industry innovation, with Big Data being the most obvious example. While this is a hugely positive industry shift, it also introduces complexities. Namely, with so much exceptional open source software contending to power your next Big Data project, how do you choose which to use?

Opening Up Innovation

Black Duck Software recently named its annual “Open Source Rookies of the Year,” pulling data from thousands of projects on project activity, pace of commits, project team attributes, and other factors. Spanning cloud and virtualization, mobile, social media and more, the winners reflect the ever-increasing scope of code that is successfully developed in the open, rather than behind closed doors.

Nowhere is this trend more evident than in Big Data.

As Cloudera co-founder Mike Olson declares, “No dominant platform-level software infrastructure has emerged in the last ten years in closed-source, proprietary form.” That’s a stunning assessment, but it’s absolutely true. Open source may have come to life as an imitator, but it’s innovating at a frenetic pace in Big Data land.

Which may be a problem.

Spoiled By Open Source Riches

Big Data projects are now being released at such a frenetic pace that developers struggle to keep up. If you’re just getting your feet wet with Hadoop, for example, you now also need to consider Spark, Samza or a variety of other oddly named but increasingly important Big Data tools.

Importantly, these tools are largely being born within enterprises like LinkedIn that have serious Big Data needs that no commercial software can solve. Even the National Weather Service has jumped in, open sourcing the code that powers its global forecast system.

While most companies won’t need such niche code, they may want the sorts of things released by the big Web companies. Take, for instance, LinkedIn’s release of Apache Samza:

The LinkedIn-developed framework is designed to process complex real-time workloads that require special handling after ingestion. It embeds a local key-value store in every stream that makes it possible to store the kind of contextual information needed to carry out advanced operations such as merging datasets locally instead of having to query a remote system every time they’re needed.
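To make the local-state idea concrete, here is a minimal sketch in plain Python. It is not Samza's API: the `LocalJoinTask` class, the `run` helper, and the `profiles` and `page_views` stream names are all hypothetical, and an in-memory dict stands in for the embedded key-value store. It only illustrates the pattern described above, where one stream populates task-local state and another stream is enriched from that state without a remote lookup.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Iterator, Optional, Tuple

# (stream name, key, payload) -- a simplified stand-in for a keyed message.
Event = Tuple[str, str, Dict[str, Any]]


@dataclass
class LocalJoinTask:
    """Hypothetical stream task that joins two streams via task-local state."""

    # Assumption: a dict stands in for the embedded key-value store.
    store: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def process(
        self, stream: str, key: str, payload: Dict[str, Any]
    ) -> Optional[Dict[str, Any]]:
        if stream == "profiles":
            # Contextual data is written to the local store as it arrives.
            self.store[key] = payload
            return None
        if stream == "page_views":
            # The join is a local read; no remote system is queried.
            profile = self.store.get(key, {})
            return {**payload, "member_id": key, "profile": profile}
        raise ValueError(f"unknown stream: {stream}")


def run(task: LocalJoinTask, events: Iterable[Event]) -> Iterator[Dict[str, Any]]:
    """Feed events through the task and yield the enriched records."""
    for stream, key, payload in events:
        enriched = task.process(stream, key, payload)
        if enriched is not None:
            yield enriched


events = [
    ("profiles", "m1", {"industry": "software"}),
    ("page_views", "m1", {"page": "/jobs"}),
    ("page_views", "m2", {"page": "/feed"}),  # no profile seen yet
]
results = list(run(LocalJoinTask(), events))
```

In this sketch a page view for a member with no stored profile is still emitted, just with an empty profile; a real job would have to decide whether to drop, buffer, or backfill such records.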

This leads to fantastic performance. It also leads to the question: what should a developer use to tackle her organization’s data load?

On the database side, there are hundreds of options, ranging from NoSQL databases like MongoDB and Cassandra to relational mainstays like Oracle and MySQL. Should a developer choose the most popular database, picking from a list like DB-Engines’ ranking? That’s one approach, but you could easily end up with a big mismatch between the workload and the tool managing it.

If this seems like a trivial problem, it’s not. At all. I spent years working for Big Data infrastructure providers, and now work for a company trying to make sense of the deluge of open source Big Data tools. It’s hard to keep up, and very difficult to know which to use.

Closing Off Choices

One reason that Amazon Web Services (AWS) has become the go-to public cloud is that the company has managed to simultaneously offer a broad array of open source solutions to run (supported and unsupported) on its cloud, and a suite of proprietary services for everything from email to data warehousing.

Developers, anxious to “get stuff done,” can turn to AWS and know that they’ll have both a variety of options and the safety of a paved path.

Microsoft Azure has followed suit. Not content to roll out a Hadoop-based analytics service, for example, Microsoft is now close to releasing Cosmos, its parallel processing and storage service. Or take the company’s support for MongoDB, an open source document database, to appeal to those who want the popular NoSQL database. At the same time, Microsoft has rolled out its own document database as a service, for those who want a document database but may prefer Microsoft’s packaging of it.

Microsoft, in short, wants to provide choice to its customers, but curated and nicely packaged.

This looks like the future of open source infrastructure: free to download, but perhaps more useful rolled into a cloud service that removes complexity (and choice). It may not be what the open source crowd would prefer, but it may end up being the ideal way to turn open source Big Data innovation into solutions mainstream enterprises can actually use.

Photo by George Thomas

View full post on ReadWrite

Learn How to Build Your Business With a Virtual Staff: Virtual Freedom by Chris Ducker #SEJBookClub by @wonderwall7

Editor’s Note: This is the second in a monthly series of book reviews by the Search Engine Journal editorial team. Join us each month to discuss our picks on Facebook, Twitter, and Google+ using the hashtag #SEJBookClub and via the comments below. For our second SEJ Book Club, I read Virtual Freedom: How to Work with Virtual Staff to Buy More Time, Become More Productive, and Build Your Dream Business by Chris Ducker. For me, it was one of those books that you start reading and don’t put down until you’re done. I read it in two days between my […]

View full post on Search Engine Journal

Copyright © 1992-2015, DC2NET All rights reserved