

Tom Callway
on 19 February 2015

Ubuntu, Hortonworks and Microsoft = Big Data Hosted Solution

The first Microsoft Azure hosted service to run Linux (on Ubuntu) announced at Strata Conference

This week thousands of people are in California at Strata + Hadoop World to learn more about the technology and business of big data. At the Strata Conference, Microsoft yesterday announced the preview of Azure HDInsight on Ubuntu. This is a recognition that Ubuntu, the leading scale-out and cloud Linux, is great for running Big Data solutions.

Microsoft’s Ranga Rengarajan, corporate vice president, Data Platform, and Joseph Sirosh, corporate vice president, Machine Learning, noted that Azure HDInsight is Microsoft’s Apache Hadoop-based service in the Azure cloud. It is designed to make it easy for customers to evaluate petabytes of all types of data with fast, cost-effective scale on demand, as well as programming extensions so developers can use their favorite languages. Microsoft customers like Virginia Tech, Chr. Hanson, Mediatonic and many others are using it to find important data insights. And, yesterday, Microsoft announced that customers can run HDInsight on Ubuntu clusters (the leading scale-out Linux), in addition to Windows, with simple deployment, a managed service level agreement and full technical support. This is particularly compelling for people who already use Hadoop on Linux on-premises, for example on the Hortonworks Data Platform, because they can use common Linux tools, documentation and templates, and extend their deployments to Azure with hybrid cloud connections.

Combined with Juju, Canonical’s cloud orchestration tool, Ubuntu makes it a breeze to test, deploy, scale and manage Big Data architectures. This is the result of years of effort by our development teams to optimize Big Data workloads on Ubuntu.

For over a decade, DevOps teams have been working with “classical” configuration management tools. They have become very good at ensuring that each server under their watch runs in perfect accordance with their intentions and policies.

However, Big Data raises new questions, whether the goal is to process vast data sets, run real-time analytics on unpredictable data streams, or offer Data-as-a-Service: how do you embrace the fast-paced scalability such architectures demand, scaling out when the flow grows and in when business slows? How do you stay ahead of the game in a world of faster-than-ever changing technologies? Add multiple clouds to the equation to prevent single points of failure, and you end up with a nightmare for every decision maker.

Containerization has received a lot of positive attention as an attempt to fix some of these issues by maintaining a single, lightweight application “image” that is cloud-agnostic. But it also comes with a list of new and still-to-be-fixed concerns regarding security and, coming back to the first point, orchestration.

So what is good cloud orchestration? To answer that question we have to get back to the requirements for such a tool:

  • Be portable: orchestration is valuable if and only if it adapts to each and every substrate: public cloud, private cloud, hybrid cloud, bare metal, containers…
  • Manage scalability: deploying an architecture without being able to scale it from the same tool makes no sense. To the orchestrator, the pool of deployment targets should be effectively infinite; the tool must be able to claim any share of that infinity and change it at any point in time.
  • Manage services: consumers of the architecture do not need to know about each machine involved in a scale-out service. What matters is knowing how to access the service that the cluster provides.
  • Manage relations: at cloud scale, what matters is that the pieces of an architecture can communicate with one another.

What is our answer to those requirements? Juju.

  • Juju creates portable architectures: When deploying a service, Juju makes the minimum number of assumptions regarding the substrate. It always starts with a vanilla OS image, and adds software or containers on top. All configuration information is processed dynamically. Then it can export to a standard YAML file, and reproduce the same architecture regardless of the provider.
  • Juju can scale architectures in and out: Juju offers commands to add or remove service units, efficiently providing ways to scale in both directions. Complemented with a system collecting performance metrics and pointing to its API, it becomes very easy to design autoscalable solutions that do not rely on a cloud provider to function.
  • Juju manages services: The best illustration of Juju’s focus on service is its GUI: whether a cluster has 2 or 200 nodes, it still comes up as a single box.
  • Juju manages relations: Juju can create and manage relations between services by exposing parameters to other services and consuming the variables they expose. Juju plugs services into each other, adds credentials, and offers the smoothest way to run complex architectures.
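The portable, YAML-exportable architecture described above can be pictured as a small Juju bundle. This is only a sketch: the service names and unit counts here are hypothetical, and the exact file layout depends on the Juju version in use.

```yaml
# Hypothetical Juju bundle: two services deployed from Charm Store
# charms, related to each other, with unit counts that can later be
# changed from the same tool (e.g. with `juju add-unit`).
hadoop-cluster:
  services:
    hadoop-master:
      charm: "cs:trusty/hadoop"
      num_units: 1
    hadoop-slave:
      charm: "cs:trusty/hadoop"
      num_units: 3
  relations:
    - ["hadoop-master", "hadoop-slave"]
```

Because the bundle names only services, relations and unit counts, never individual machines, the same file can be replayed against Azure, a private cloud or bare metal without modification.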

On top of that, Juju comes with a centralized Charm Store, a unique marketplace where all charms are stored and exchanged. The main benefit of this approach is that you’ll always find the best charm currently available for a service. If it doesn’t match your own preferences, you can fork it and share your views with others, helping to create an even better experience for future users. For enterprises, this is a guarantee that their DevOps teams are always up to date and as agile as they can be when it comes to building new services for the company.

So take Juju, the best-in-class cloud orchestration tool, combine it with Ubuntu, the best OS for Big Data deployments, and with Azure, the most advanced enterprise cloud, and it becomes easy for customers to evaluate petabytes of all types of data, fast.
