Machine learning is one of the most important techniques in data science and analytics. Azure Synapse Analytics supports several ways of implementing machine learning out of the box.
The basic process flow of a typical machine learning project is as follows:
- Understanding the business processes and rules
- Data acquisition and ingestion
- Analytical model selection and training
- Model deployment and scoring
The first step defines the basics of the project: its scope and its expected outcomes.
Once the data has been acquired and ingested, the next step is analytical model selection and training. Azure Synapse Analytics offers two main ways to do this:
Spark MLlib: Spark MLlib can be used to train models in the more classical sense. This option is ideal for those who are already familiar with Spark MLlib.
Automated ML: If you are new to machine learning, Azure Synapse Analytics provides this automated way to train models. It lets the user select the best model based on certain metrics.
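The idea of selecting the best model based on a metric can be illustrated with a minimal sketch in plain Python. This is not the Azure automated ML API. The candidate models, the toy data and the helper names (`fit_mean`, `fit_linear`, `rmse`, `select_best`) are all hypothetical and exist only to show the principle: train several candidates, score each on held-out data, and keep the one with the best metric.

```python
import math

def fit_mean(xs, ys):
    """Baseline candidate: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_linear(xs, ys):
    """Candidate: least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b

def rmse(model, xs, ys):
    """Root mean squared error of a model on (xs, ys)."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def select_best(candidates, train, valid):
    """Train every candidate and return (name, model, score) with the lowest validation RMSE."""
    results = []
    for name, fit in candidates.items():
        model = fit(*train)
        results.append((name, model, rmse(model, *valid)))
    return min(results, key=lambda r: r[2])

# Toy data (hypothetical): a roughly linear relationship.
train = ([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
valid = ([5.0, 6.0], [10.1, 11.9])
candidates = {"mean_baseline": fit_mean, "linear": fit_linear}

best_name, best_model, best_score = select_best(candidates, train, valid)
print(best_name, round(best_score, 3))
```

On this toy data the linear candidate has the lower validation error, so it is the one selected. A real automated ML run compares far more model families and settings, but the selection step follows the same pattern.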
Apart from Spark MLlib, Synapse Analytics supports other popular machine learning libraries, such as scikit-learn.
The final step in a machine learning project is model deployment and scoring. Synapse Analytics supports deployment and batch scoring of analytical models trained within Azure (using one of the techniques mentioned above) or outside the Azure environment. There are two ways to run batch scoring in Azure Synapse: the T-SQL PREDICT function and Apache Spark pools for Azure Synapse.
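Conceptually, batch scoring takes a model that has already been trained, possibly elsewhere, and applies it to a whole set of rows to produce a prediction for each one. The sketch below shows only that idea in plain Python and does not use any Synapse API. The serialized model format (JSON coefficients), the column names and the helpers `load_model` and `score_batch` are hypothetical.

```python
import json

# A model trained elsewhere and exported as plain coefficients (hypothetical format).
serialized_model = json.dumps({"weights": {"age": 0.5, "visits": 2.0}, "bias": 1.0})

def load_model(payload):
    """Rebuild a linear scoring function from the serialized coefficients."""
    spec = json.loads(payload)
    weights, bias = spec["weights"], spec["bias"]
    return lambda row: bias + sum(w * row[name] for name, w in weights.items())

def score_batch(model, rows):
    """Return copies of the input rows with an added 'prediction' column."""
    return [{**row, "prediction": model(row)} for row in rows]

# Toy input batch (hypothetical columns).
rows = [
    {"id": 1, "age": 30, "visits": 2},
    {"id": 2, "age": 40, "visits": 0},
]

scored = score_batch(load_model(serialized_model), rows)
for row in scored:
    print(row["id"], row["prediction"])
```

The input rows are left unchanged and the predictions come back as an extra column, which mirrors the general shape of a batch scoring job: model plus input table in, scored table out.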