Planning a Machine Learning Project


Figure 1: ML modeling and deployment
  • Define the list of datasets needed for the project.
  • Define the strategy for labeling data: in-house, outsourced, or crowdsourced.
  • Describe other datasets you believe are important to this project, especially meta-data (the data about data).
  • Human-level performance (HLP)
  • Literature search for state-of-the-art and open-source work
  • Quick-and-dirty implementation
  • Performance of an older system
  • Linear regression when predicting continuous values
  • Logistic regression when classifying structured data
  • Pre-trained convolutional neural networks for vision related tasks
  • Recurrent neural networks and gradient boosted trees for sequence modeling
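A quick-and-dirty baseline for a continuous target can be a one-feature linear regression fitted by ordinary least squares, compared against a trivial predictor that always outputs the mean. The sketch below uses plain Python so it has no dependencies. The toy data is made up for illustration, and a real project would normally use a library implementation instead.

```python
from statistics import mean

def fit_linear_regression(xs, ys):
    """Ordinary least squares for a single feature: y ~ slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has zero variance")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def mean_absolute_error(y_true, y_pred):
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))

# Hypothetical toy data: one numeric feature and a continuous target.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_linear_regression(xs, ys)
preds = [slope * x + intercept for x in xs]

# The most trivial baseline: always predict the mean of the targets.
mean_baseline = [mean(ys)] * len(ys)
print(f"linear MAE: {mean_absolute_error(ys, preds):.3f}")
print(f"mean-predictor MAE: {mean_absolute_error(ys, mean_baseline):.3f}")
```

If the fitted model does not clearly beat the mean predictor, revisit the features or data quality before moving to larger models.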
  • Brainstorm the ways the system might go wrong.
  • Performance on subsets of data (e.g., ethnicity, gender).
  • Prevalence of specific errors/outputs (e.g., false positives and false negatives).
  • Performance on rare classes.
  • Establish metrics to assess performance against these issues on appropriate slices of data.
  • Get business/product owner buy-in.
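To make the slice-based checks concrete, you can compute confusion counts separately for each slice of an evaluation set. The minimal sketch below assumes a binary classifier and records of the form (slice, true label, predicted label). The slice names `group_a` and `group_b` are hypothetical placeholders for attributes such as those listed above.

```python
from collections import defaultdict

def slice_metrics(records):
    """Compute per-slice metrics for a binary classifier.

    Each record is a (slice_name, y_true, y_pred) tuple with labels in {0, 1}.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for slice_name, y_true, y_pred in records:
        c = counts[slice_name]
        if y_pred == 1 and y_true == 1:
            c["tp"] += 1
        elif y_pred == 1 and y_true == 0:
            c["fp"] += 1
        elif y_pred == 0 and y_true == 1:
            c["fn"] += 1
        else:
            c["tn"] += 1

    report = {}
    for slice_name, c in counts.items():
        total = sum(c.values())
        predicted_pos = c["tp"] + c["fp"]
        actual_pos = c["tp"] + c["fn"]
        report[slice_name] = {
            "n": total,
            "accuracy": (c["tp"] + c["tn"]) / total,
            # None when undefined (no predicted or actual positives), rather than 0.
            "precision": c["tp"] / predicted_pos if predicted_pos else None,
            "recall": c["tp"] / actual_pos if actual_pos else None,
            "fp": c["fp"],
            "fn": c["fn"],
        }
    return report

# Hypothetical evaluation records: (slice, true label, predicted label).
records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 1),
    ("group_b", 1, 0), ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 1),
]
for name, m in slice_metrics(records).items():
    print(name, m)
```

Look first at any slice whose accuracy or recall is well below the others, and at any slice with very few examples, such as a rare class.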
  • Real-time or batch
  • Cloud vs. edge/browser
  • Compute resources (CPU/GPU/memory)
  • Latency, throughput (QPS)
  • Logging
  • Security and privacy
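Latency and throughput targets are easier to agree on once they can be measured. The sketch below times sequential calls to a prediction function and reports p50/p95/p99 latency and queries per second. `dummy_predict` is a hypothetical stand-in for a real model.

```python
import statistics
import time

def measure_latency(predict, inputs, warmup=5):
    """Time sequential calls to `predict` and summarize latency and throughput."""
    for x in inputs[:warmup]:
        predict(x)  # warm-up calls, excluded from the measurements
    latencies = []
    start = time.perf_counter()
    for x in inputs:
        t0 = time.perf_counter()
        predict(x)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    # With n=100, statistics.quantiles returns the 99 percentile cut points.
    pct = statistics.quantiles(latencies, n=100)
    return {
        "p50_ms": pct[49] * 1000,
        "p95_ms": pct[94] * 1000,
        "p99_ms": pct[98] * 1000,
        "qps": len(inputs) / elapsed,
    }

def dummy_predict(x):
    # Hypothetical stand-in for a real model call.
    return sum(v * v for v in x)

inputs = [[float(i)] * 100 for i in range(1000)]
print(measure_latency(dummy_predict, inputs))
```

The calls are made one at a time, so the QPS figure reflects single-threaded throughput only. A serving system that handles concurrent requests would need a proper load test.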
  • Tools such as TensorFlow Transform, Apache Beam, Airflow, …
  • Keep track of data provenance (where it comes from) and lineage (the sequence of steps that produced it)
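In production, pipeline tools such as those named above handle provenance and lineage. The sketch below only illustrates the idea in plain Python and does not use the API of any of those tools. It records where a dataset came from and appends one entry per transformation, with content hashes of the data before and after each step. The source path and step names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone

def fingerprint(rows):
    """Stable content hash of a dataset (here, a list of JSON-serializable rows)."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]

@dataclass
class TrackedDataset:
    rows: list
    source: str                                  # provenance: where the data came from
    lineage: list = field(default_factory=list)  # lineage: ordered processing steps

    def apply(self, step_name, fn):
        """Apply a transformation and return a new dataset with the step recorded."""
        new_rows = fn(self.rows)
        entry = {
            "step": step_name,
            "input_hash": fingerprint(self.rows),
            "output_hash": fingerprint(new_rows),
            "at": datetime.now(timezone.utc).isoformat(),
        }
        return TrackedDataset(new_rows, self.source, self.lineage + [entry])

# Hypothetical source path and transformation steps.
raw = TrackedDataset([{"age": 34}, {"age": None}, {"age": 51}], source="data/users.csv")
clean = raw.apply("drop_missing_age", lambda rows: [r for r in rows if r["age"] is not None])
scaled = clean.apply("age_in_decades", lambda rows: [{"age": r["age"] / 10} for r in rows])

print(scaled.source)
for entry in scaled.lineage:
    print(entry["step"], entry["input_hash"], "->", entry["output_hash"])
```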
  • New product/capacity
  • Automate or assist with a manual task (shadow deployment)
  • Replace previous ML system
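In shadow deployment, the ML system runs alongside the existing decision-maker (a human or a legacy system), but its output is only logged and never acted on. The sketch below assumes a simple function-based interface. `human_inspector`, `toy_model`, and the image IDs are hypothetical.

```python
def serve_in_shadow_mode(request, current_decision, candidate_model, log_rows):
    """Return the current decision (human or legacy system); only log the ML output."""
    decision = current_decision(request)
    try:
        candidate = candidate_model(request)
        log_rows.append({"request": request, "decision": decision,
                         "candidate": candidate, "agree": candidate == decision})
    except Exception as exc:
        # A failing candidate model must never change what the user receives.
        log_rows.append({"request": request, "decision": decision, "error": repr(exc)})
    return decision

def agreement_rate(log_rows):
    """Fraction of logged requests where the candidate matched the current decision."""
    scored = [r for r in log_rows if "agree" in r]
    return sum(r["agree"] for r in scored) / len(scored) if scored else None

# Hypothetical example: a human inspector's labels vs. a toy rule-based model.
human_labels = {"img1": "defect", "img2": "ok", "img3": "ok", "img4": "defect"}

def human_inspector(img_id):
    return human_labels[img_id]

def toy_model(img_id):
    return "defect" if img_id in {"img1", "img3"} else "ok"

rows = []
for img in human_labels:
    serve_in_shadow_mode(img, human_inspector, toy_model, rows)
print(agreement_rate(rows))
```

The agreement rate collected this way helps show whether the model is ready to assist with or take over the manual step, before it affects any user.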
  • Canary deployment
    Send a small fraction of traffic to the new version first, monitor the system, and ramp up traffic gradually.
  • Blue-green deployment
    The old version runs in the blue environment and the new version in the green environment. While you deploy to and test the green environment, the blue environment keeps serving production users without interruption. Once deployment and testing on green succeed, traffic is switched from blue to green.
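A canary rollout needs a way to send a chosen fraction of traffic to the new version and to raise that fraction step by step. The sketch below uses one common approach, assumed here rather than taken from the article: hash each request or user ID into [0, 1) and compare the result with the current fraction. The ramp schedule and the `metrics_healthy` signal are hypothetical.

```python
import hashlib

def routes_to_new(request_id: str, new_fraction: float) -> bool:
    """Deterministically send roughly `new_fraction` of IDs to the new version.

    Hashing the ID keeps each ID on the same version across calls,
    instead of flipping randomly between versions.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # value in [0, 1)
    return bucket < new_fraction

# Hypothetical ramp schedule: hold each step while monitoring, then advance.
RAMP = [0.01, 0.05, 0.25, 0.5, 1.0]

def next_fraction(current: float, metrics_healthy: bool) -> float:
    """Advance one ramp step if monitoring looks healthy; otherwise roll back to 0."""
    if not metrics_healthy:
        return 0.0
    for step in RAMP:
        if step > current:
            return step
    return current

ids = [f"user-{i}" for i in range(10_000)]
share = sum(routes_to_new(i, 0.05) for i in ids) / len(ids)
print(f"share routed to new version at 5%: {share:.3f}")
```

An ID's bucket never changes, so any ID routed to the new version at 5% stays on it at 25%. This keeps each user's experience consistent during the ramp. A blue-green switch corresponds to moving the fraction from 0 straight to 1.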
  • Software metrics
    – Memory
    – Compute
    – Latency
    – Throughput
    – Server load
  • Input metrics
    – Average image brightness
    – Number of missing values
    – Average input volume
  • Output metrics
    – Number of times the system returns null
    – Number of times the user redoes a search
    – Number of times the user switches to typing
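The input and output metrics above can be computed for each batch of logged requests and compared against alert thresholds. In the sketch below, the request fields, metric names, and threshold values are all hypothetical. Real thresholds would come from observing the system under normal traffic.

```python
from statistics import mean

def batch_metrics(requests):
    """Compute a few input and output metrics over one batch of logged requests.

    Each request is a dict with hypothetical keys:
      "features": dict of feature name -> value (None if missing)
      "volume":   average input audio volume for the request
      "output":   model output, or None if the system returned nothing
    """
    return {
        "num_missing_values": sum(
            v is None for r in requests for v in r["features"].values()
        ),
        "avg_input_volume": mean(r["volume"] for r in requests),
        "null_output_rate": sum(r["output"] is None for r in requests) / len(requests),
    }

# Hypothetical alert thresholds (min, max); None means unbounded on that side.
THRESHOLDS = {
    "num_missing_values": (None, 5),
    "avg_input_volume": (0.2, None),
    "null_output_rate": (None, 0.1),
}

def check_alerts(metrics, thresholds=THRESHOLDS):
    """Return a message for every metric outside its allowed range."""
    alerts = []
    for name, (low, high) in thresholds.items():
        value = metrics[name]
        if low is not None and value < low:
            alerts.append(f"{name}={value:.3g} below {low}")
        if high is not None and value > high:
            alerts.append(f"{name}={value:.3g} above {high}")
    return alerts

batch = [
    {"features": {"lang": "en", "device": None}, "volume": 0.15, "output": "weather today"},
    {"features": {"lang": "en", "device": "phone"}, "volume": 0.18, "output": None},
    {"features": {"lang": None, "device": "phone"}, "volume": 0.12, "output": None},
]
print(check_alerts(batch_metrics(batch)))
```

This sketch leaves out metrics such as how often a user redoes a search or switches to typing. Those depend on session-level logs rather than single requests.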
  • ML metrics (accuracy, precision/recall, etc.)
  • Software metrics (latency, throughput, etc. given compute resources)
  • Business metrics (revenue, etc.)


