Getting Started with Azure Databricks

Microsoft's Azure Databricks and Azure Machine Learning aim to make it easier to build large-scale data analyses without being tied to a single programming language or managing a lot of boilerplate R or Python code. You can use these tools to run analytics and machine learning jobs and to streamline data analysis and management in cloud environments.

Azure Databricks began when Microsoft decided to bring Databricks' Apache Spark-based analytics platform into the Azure cloud. Microsoft wasn't the first cloud provider to offer data science capabilities. Still, it created a service with a bevy of developer-centric features, including an application programming interface (API) to design, train, and run machine learning and analytics jobs.
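As a rough illustration of what driving jobs through an API looks like, here is a minimal Python sketch that builds a one-time notebook run request for the Databricks Jobs API 2.1 (POST /api/2.1/jobs/runs/submit) using only the standard library. The workspace URL, token, notebook path, runtime version string, VM size, and run name are placeholder assumptions, not values from this article; check the Jobs API reference and your own workspace for valid settings. The request is built but not sent.

```python
import json
import urllib.request


def build_notebook_run_request(
    workspace_url: str,
    token: str,
    notebook_path: str,
    spark_version: str,
    node_type_id: str,
    num_workers: int = 2,
) -> urllib.request.Request:
    """Build (but do not send) a one-time notebook run for the Jobs API 2.1."""
    payload = {
        "run_name": "example-analytics-run",  # hypothetical run name
        "tasks": [
            {
                "task_key": "main",
                "notebook_task": {"notebook_path": notebook_path},
                # A new job cluster is created for this run and terminated afterward.
                "new_cluster": {
                    "spark_version": spark_version,
                    "node_type_id": node_type_id,
                    "num_workers": num_workers,
                },
            }
        ],
    }
    return urllib.request.Request(
        url=workspace_url.rstrip("/") + "/api/2.1/jobs/runs/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # personal access token (placeholder)
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # All values below are placeholders, not real workspace settings.
    req = build_notebook_run_request(
        workspace_url="https://adb-0000000000000000.0.azuredatabricks.net",
        token="<personal-access-token>",
        notebook_path="/Shared/train-model",
        spark_version="13.3.x-scala2.12",
        node_type_id="Standard_DS3_v2",
    )
    print(req.get_method(), req.full_url)
    # To actually submit the run: urllib.request.urlopen(req)
```

Sending the request with urllib.request.urlopen(req) should return a JSON body containing a run_id, which can then be polled with the Jobs API's runs/get endpoint.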

Since the cloud is an increasingly important part of how data and analytics organizations run, Microsoft made Databricks available as a first-party Azure service so that any Azure subscriber can use the technology. Azure Databricks workspaces are deployed and managed through Azure Resource Manager like other Azure resources, and the service runs on Apache Spark.


What Are the New Services Available in Microsoft's Azure Databricks?

Microsoft has announced three new services: Azure Databricks Workbench, Azure Databricks Modeler, and Azure Databricks PubSub.

The Azure Databricks Workbench helps data scientists and analysts design, train, and deploy machine learning and analytics jobs, and it can be used with programming languages such as R and Python. The Azure Databricks Modeler lets users design and train machine learning models within Microsoft's machine learning service. Azure Databricks PubSub is a messaging layer that links components inside Microsoft's Azure public cloud, including Azure IoT Hub and Azure Data Lake Store.

How Do the Azure Databricks Tools Connect to Other Microsoft Tools?

The three Azure Databricks tools work with other Azure services, including the Azure Machine Learning service, Azure SQL Data Warehouse, and the Azure Functions service.

Currently, the Azure Databricks tools can connect to machine learning libraries available in Azure, such as Spark MLlib and MXNet. Microsoft also provides examples of data science tasks that can be run from the Databricks Workbench using these libraries.

The service also works with R, Python, TensorFlow, and Spark, as well as with packages for MATLAB and other programming languages.

What's Microsoft's Plan to Make the Azure Databricks Services Better?

Microsoft is adding several improvements to its services as it builds on the foundation it has already set up. One key area of focus is the interface and documentation. The Azure Databricks Workbench documentation and example code have received a major update, and Microsoft plans to revamp the documentation for the Azure Databricks Modeler and PubSub as well. Microsoft is also investing in training to establish Databricks as an accepted platform for building machine learning models, data engineering jobs, and large-scale data analysis workflows.

Microsoft also plans to improve support for structured and semi-structured data and to add more connectors to other systems, such as Azure Data Lake Store and SQL Server. Microsoft is working with the New York Data Science Academy to create a free course that teaches students how to build data science jobs using these technologies. Microsoft has also been helping data science and machine learning teams figure out where to get their data; for example, Azure's Spark cloud services make it easier for teams to import and connect to different kinds of data.
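From Spark code, sources like these are usually addressed by URI. The minimal Python sketch below builds an abfss:// URI for Azure Data Lake Storage Gen2 and a JDBC URL for an Azure SQL Database server. The helper names and the account, container, server, and database names are hypothetical, and the credential setup both sources require (for example, a service principal or a secret scope) is deliberately left out.

```python
def adls_gen2_uri(account: str, container: str, path: str = "") -> str:
    """Return an abfss:// URI for a path in Azure Data Lake Storage Gen2 (hypothetical helper)."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def sql_server_jdbc_url(server: str, database: str, port: int = 1433) -> str:
    """Return a JDBC URL for an Azure SQL Database logical server (hypothetical helper)."""
    return (
        f"jdbc:sqlserver://{server}.database.windows.net:{port};"
        f"database={database};encrypt=true"
    )


# Inside a Databricks notebook, where `spark` is predefined and credentials are
# already configured, the helpers could be used like this (names are placeholders):
#
#   events = spark.read.json(adls_gen2_uri("mystorageacct", "raw", "events/2024/"))
#   orders = (spark.read.format("jdbc")
#             .option("url", sql_server_jdbc_url("myserver", "salesdb"))
#             .option("dbtable", "dbo.orders")
#             .option("user", user)
#             .option("password", password)
#             .load())

if __name__ == "__main__":
    print(adls_gen2_uri("mystorageacct", "raw", "events/2024/"))
    print(sql_server_jdbc_url("myserver", "salesdb"))
```

Reading JSON from the lake covers the semi-structured case, while the JDBC source pulls structured tables from SQL Server; both end up as Spark DataFrames that can be joined in the same job.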


How Is Microsoft Differentiating Itself From Competitors in the Big Data and Analytics Market?

Microsoft is not the first company to launch tools for building machine learning and analytics jobs. Amazon offers Amazon Redshift for data warehousing and Amazon SageMaker, a service for building, training, and deploying machine learning models. IBM offers the Data Science Experience service (now Watson Studio), which includes data prep and query tools and is compatible with popular tools such as R and MATLAB. Google provides analytics services for Hadoop through Cloud Dataproc, as well as the Google BigQuery service.

But Microsoft's Azure Databricks service lets developers write applications that integrate closely with the rest of the Azure public cloud. That makes Azure Databricks particularly useful for data science and machine learning jobs, because these kinds of projects typically run in the public cloud.

Finally, Microsoft's services are more competitive because they share a familiar and proven integration framework, Microsoft Graph. Developers can build apps with the same Microsoft technologies that underpin products such as the Edge web browser, Skype, Microsoft Dynamics CRM, and Windows 10. That familiarity is a crucial differentiator for Microsoft.


What's Microsoft's Vision for Azure Databricks?

Microsoft has set a long-term goal of enabling developers to build their applications using Microsoft's services. It also plans to integrate these services with its ecosystem of products, including the Cortana virtual assistant and Microsoft Office 365, and to integrate Databricks with Microsoft's data warehousing services, SQL Server, and Microsoft Graph.

Today, the Azure Databricks tools work best with Google BigQuery, Microsoft Graph, Spark, and MongoDB. But Microsoft also plans to integrate the tools with Azure HDInsight and Azure Data Lake Store. Microsoft recently announced plans to bring Azure HDInsight capabilities to Azure Databricks, and those plans should soon become a reality. Microsoft also plans to bring Spark and TensorFlow to the Azure Machine Learning service and to improve Spark's interface with Databricks.

Technologies like Azure Databricks for machine learning are covered in Simplilearn's Post Graduate Program in Artificial Intelligence and Machine Learning in partnership with Purdue University. Other aspects of Azure are covered in our Post Graduate Program in Cloud Computing in partnership with Caltech CTME. These programs will help you develop your career and deepen your knowledge of Azure tools and technologies.

About the Author

Matthew David

Matt is a Digital Leader at Accenture. He is passionate about solving today's problems so organizations run more efficiently, using digital tools to improve tomorrow, and moving organizations toward new ways of working that shape the future.
