Tokio School

Machine Learning

Machine Learning grew out of pattern recognition, but today it lets us build applications that improve their own performance by "learning" from data collected in past situations. In this Python specialisation you will apply Machine Learning to real projects, covering data preparation and related tasks, deployment to production, and the lifecycle of a model.

Unit 1: Introduction to Big Data and Machine Learning

  • Introduction to Machine Learning
    • The theory of gravity
    • The scientific method
    • Mathematical models
    • Scientific method applications
    • Data science
    • Introduction to Big Data
    • Introduction to Machine Learning
    • The equation of the straight line
    • Model training
    • Working with Machine Learning models
    • Machine Learning applications
    • AlphaGo
  • Linear algebra
    • Relationship to the areas of big data, machine learning and artificial intelligence
    • Elements
    • Operations and properties
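
The linear-algebra elements and operations listed above can be tried directly in NumPy; a minimal sketch with invented values:

```python
import numpy as np

# Elements: vectors and matrices
v = np.array([1.0, 2.0])
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Operations and properties
Av = A @ v        # matrix-vector product
At = A.T          # transpose
I = np.eye(2)     # identity matrix: A @ I == A

print(Av)         # product of A and v
```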

Unit 2: Work environment

Unit 3: Python numerical libraries and Scikit-learn

Unit 1: Linear regression 

  • Simple
    • Model equation
    • Graphical representation
    • Types of variables
  • Multivariable
    • Data modelling
    • Curve modelling
    • Analytical solution
    • Cost function
    • Solving by iterative methods
    • Solution algorithm
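
A minimal sketch of the analytical solution via the normal equation, using invented data (variable names are illustrative, not from the course materials):

```python
import numpy as np

# Illustrative data: y = 2x + 1 plus a little noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 0.1, size=50)

# Add a bias column so the intercept is part of theta
Xb = np.c_[np.ones(len(X)), X]

# Analytical solution: solve the normal equation (X^T X) theta = X^T y
theta = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# Mean squared error cost on the training data
cost = np.mean((Xb @ theta - y) ** 2) / 2
```

The iterative alternative (gradient descent) is the subject of the next unit.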

Unit 2: Gradient descent optimisation

  • Gradient descent
  • Convergence
  • Local and global minima
  • Learning rate
    • Learning rate choice
  • Training algorithm
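
The gradient-descent loop described in this unit can be sketched as follows; the learning rate and iteration count are illustrative choices:

```python
import numpy as np

# Illustrative noiseless data: y = 3x - 0.5
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(100, 1))
y = 3.0 * X[:, 0] - 0.5

Xb = np.c_[np.ones(len(X)), X]   # bias column + feature
theta = np.zeros(2)
alpha = 0.1                      # learning rate: too large diverges, too small is slow

for _ in range(1000):
    grad = Xb.T @ (Xb @ theta - y) / len(y)   # gradient of the MSE cost
    theta -= alpha * grad                     # step against the gradient
```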

Unit 3: Standardisation, regularisation and validation

  • Standardisation
    • Problem
    • What is standardisation?
    • Updated training algorithm
  • Regularisation
    • Bias and variance
    • Regularisation
    • Regularised cost function
  • Cross-validation
    • Solution methods
    • Dataset subdivision
    • K-fold
    • Updated training algorithm
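
A sketch combining the three topics above with scikit-learn on synthetic data; the model and parameter choices are illustrative:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)

# Standardise the features, then fit an L2-regularised (ridge) linear model
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))

# K-fold cross-validation: 5 train/validation subdivisions of the dataset
scores = cross_val_score(model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))
```

`scores` holds one R² value per fold; their mean is the cross-validated estimate of model quality.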

Unit 4: Bayesian models and model evaluation

  • Example: classification of cancerous cells
  • Sensitivity and specificity
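
Sensitivity and specificity can be computed from the four confusion-matrix counts; a minimal sketch with invented labels (1 = positive, e.g. malignant):

```python
import numpy as np

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])   # ground truth
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 0])   # model predictions

tp = np.sum((y_true == 1) & (y_pred == 1))   # true positives
fn = np.sum((y_true == 1) & (y_pred == 0))   # false negatives
tn = np.sum((y_true == 0) & (y_pred == 0))   # true negatives
fp = np.sum((y_true == 0) & (y_pred == 1))   # false positives

sensitivity = tp / (tp + fn)   # fraction of real positives detected
specificity = tn / (tn + fp)   # fraction of real negatives correctly rejected
```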

Unit 5: Classification

  • Decision trees
    • Representation
    • Main concepts
    • Categorical and continuous target variables
    • Node splitting
    • Advantages and disadvantages of decision trees
    • Limitations on tree size
    • Tree pruning
    • Decision trees vs. linear models
    • Bootstrapping
    • Training algorithm
  • Logistic regression
    • Data modelling
    • Binary and multi-class classification
    • Hypothesis
    • Activation function: sigmoid
    • Cost function
    • Training algorithm: binary classification
    • Training algorithm: multiclass classification
  • Classification by SVM
    • Logistic regression vs. SVM
    • Hypothesis
    • Kernels and landmarks
    • Hypothesis transformation
    • Types of kernels available
    • Cost functions
    • Regularisation parameter
    • Training algorithm: multiclass classification
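
A brief sketch comparing logistic regression against an RBF-kernel SVM in scikit-learn on synthetic data; the parameter values are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Logistic regression: sigmoid hypothesis over a linear combination of features
log_reg = LogisticRegression().fit(X_tr, y_tr)

# SVM with an RBF kernel; C plays the role of the regularisation parameter
svm = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)

acc_lr = log_reg.score(X_te, y_te)
acc_svm = svm.score(X_te, y_te)
```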

Unit 6: Introduction to neural networks 

  • Natural neurons
  • Artificial neurons
  • Perceptron
  • Multi-layer or deep neural networks
    • Propagation of predictions
    • Cost function
    • Training
    • Multi-class classification
    • Training algorithm: binary classification
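
Propagation of predictions through a small multi-layer network can be sketched by hand. The weights below are hand-picked so a 2-2-1 sigmoid network computes XOR; this is an illustrative construction, not a result of training:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hidden layer: unit 1 approximates OR, unit 2 approximates NAND
W1 = np.array([[20.0, 20.0],
               [-20.0, -20.0]])
b1 = np.array([-10.0, 30.0])

# Output layer: approximates AND of the two hidden units -> XOR overall
W2 = np.array([20.0, 20.0])
b2 = -30.0

def predict(x):
    h = sigmoid(W1 @ x + b1)     # forward-propagate through the hidden layer
    return sigmoid(W2 @ h + b2)  # output activation

outputs = [round(predict(np.array(x))) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

Training would instead learn `W1`, `b1`, `W2`, `b2` by minimising the cost function with backpropagation.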

Unit 1: Optimisation by randomisation

  • Problem: local minima
  • Multiple initialisations
  • Implementation
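
The idea above can be sketched as follows: run plain gradient descent from several starting points on a cost with one local and one global minimum, and keep the best result (the function and step size are invented for illustration):

```python
import numpy as np

def f(x):
    # Two minima: a local one near x = +1 and a global one near x = -1
    return (x**2 - 1)**2 + 0.2 * x

def grad(x):
    return 4 * x * (x**2 - 1) + 0.2

best_x, best_f = None, np.inf
for x0 in np.linspace(-2.0, 2.0, 5):   # several initialisations across the domain
    x = x0
    for _ in range(500):               # plain gradient descent from this start
        x -= 0.01 * grad(x)
    if f(x) < best_f:                  # keep the best of all runs
        best_x, best_f = x, f(x)
```

A single run started at a positive x would stall in the local minimum; taking the best over several initialisations recovers the global one.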

Unit 2: Clustering

  • Differences between clustering and classification
  • K-means 
  • Other clustering algorithms
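
A minimal K-means sketch with scikit-learn on two invented, well-separated blobs; note that no labels are used (unsupervised learning):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two blobs centred at (0, 0) and (5, 5)
X = np.vstack([rng.normal(0, 0.5, (50, 2)),
               rng.normal(5, 0.5, (50, 2))])

# n_init=10 restarts the algorithm from 10 random centroid initialisations
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
centres = km.cluster_centers_
```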

Unit 1: Anomaly detection

  • The problem
  • Anomalies in supervised vs. unsupervised and semi-supervised learning
  • Model representation
  • Choice of features
  • Normal or Gaussian multivariate distribution
  • Training algorithm
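
A sketch of Gaussian anomaly detection: fit a multivariate normal distribution to "normal" training examples and flag low-density points. The threshold `eps` is an illustrative value; in practice it is chosen on a validation set:

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(0, 1, size=(500, 2))   # examples of normal behaviour

# Fit the multivariate Gaussian: estimate mean and covariance
mu = X_train.mean(axis=0)
cov = np.cov(X_train, rowvar=False)

def gaussian_pdf(x, mu, cov):
    d = len(mu)
    diff = x - mu
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)

eps = 1e-3                                  # density threshold (illustrative)

def is_anomaly(x):
    return gaussian_pdf(x, mu, cov) < eps

normal_point = np.array([0.1, -0.2])
odd_point = np.array([6.0, 6.0])
```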

Unit 2: Recommendation systems

  • Linear regression recommendation systems
  • Recommendation systems approach
  • Cost function
  • Training algorithms
  • Prediction performance
  • Similarity between examples
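
Similarity between examples is often measured with cosine similarity; a minimal sketch with invented user rating vectors:

```python
import numpy as np

def cosine_similarity(a, b):
    # 1.0 for identical directions, 0.0 for orthogonal vectors
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Illustrative rating vectors (one entry per item)
user_a = np.array([5.0, 4.0, 1.0])
user_b = np.array([4.0, 5.0, 1.0])   # similar tastes to user_a
user_c = np.array([1.0, 1.0, 5.0])   # different tastes

sim_ab = cosine_similarity(user_a, user_b)
sim_ac = cosine_similarity(user_a, user_c)
```

A recommender can then suggest to `user_a` items rated highly by the most similar users.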

Unit 3: Genetic algorithms

  • Natural evolution
  • Natural evolution of behaviour
  • Main concepts
  • Algorithms applied to optimisation
  • Examples
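
The main concepts (selection, crossover, mutation) can be sketched on the classic OneMax toy problem, maximising the number of 1 bits in a string; population size and rates are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N_BITS, POP, GENS = 20, 30, 60

def fitness(ind):
    return ind.sum()   # OneMax: count of 1 bits

pop = rng.integers(0, 2, size=(POP, N_BITS))
for _ in range(GENS):
    # Selection: binary tournament between random pairs of individuals
    i, j = rng.integers(0, POP, size=(2, POP))
    better = (pop[i].sum(axis=1) >= pop[j].sum(axis=1))[:, None]
    winners = np.where(better, pop[i], pop[j])

    # Crossover: single cut point per pair of winners
    children = winners.copy()
    for k in range(0, POP - 1, 2):
        cut = rng.integers(1, N_BITS)
        children[k, cut:], children[k + 1, cut:] = (
            winners[k + 1, cut:].copy(), winners[k, cut:].copy())

    # Mutation: flip each bit with a small probability
    flip = rng.random(children.shape) < 0.01
    children[flip] ^= 1
    pop = children

best = max(pop, key=fitness)
```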

Unit 1: ML systems approach

  • Initial approach
    • Data cleansing and transformation
    • Large-scale implementation

Unit 2: Feature engineering

  • Definition and characteristics
  • Feature creation
  • Problems and solutions
  • Data quality

Unit 3: Principal Component Analysis (PCA)

  • Variable representation
  • Dimensionality reduction
  • Definition and applications
  • Visual representation
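
A minimal PCA sketch with scikit-learn: invented 2-D data whose variance lies almost entirely along one direction is reduced to a single component:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Points stretched along the direction (1, 0.5), plus a little noise
t = rng.normal(0, 3, 200)
X = np.c_[t, t * 0.5] + rng.normal(0, 0.1, (200, 2))

pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)                 # project onto the first component
explained = pca.explained_variance_ratio_[0]     # share of variance retained
```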

Unit 4: Ensembles

  • Definition and applications
  • Types of errors
  • Ensemble techniques
  • Bagging
  • Max voting
  • Mean and weighted mean
  • Random forest
  • Boosting and adaptive boosting (AdaBoost)
  • Stacking
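
A brief sketch contrasting a single decision tree with a bagged ensemble of trees (a random forest) in scikit-learn on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One tree, prone to high variance
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Bagging: 100 trees on bootstrapped samples, predictions aggregated by vote
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

acc_tree = tree.score(X_te, y_te)
acc_forest = forest.score(X_te, y_te)
```

Averaging many decorrelated trees typically reduces variance relative to a single tree.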

Unit 5: Model evaluation and improvement

  • Bias and variance
  • Evaluation metrics: linear regression
  • Evaluation metrics: classification
  • Avoiding high bias and high variance
  • Error analysis and evaluation of results 

Unit 6: Operations in ML

  • ML Engineering 
  • Operations in ML