Preparing the data - Deepstash

Preparing the data

This step is the most time-consuming: ML engineers spend around 80% of AI model development time on it, mostly cleaning the data and transforming it into the required format.

Things to consider include:

  • Transforming the data into the required format.
  • Cleaning the data set of inaccurate and irrelevant data.
  • Enhancing and augmenting the data set if the quality is low.
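
The first two points above can be sketched in a few lines of plain Python. This is only a minimal illustration: the field names (`region`, `units`, `price`), the sample rows, and the rule that negative values count as inaccurate are all assumptions made for the example, not part of the original article.

```python
# Hypothetical raw records; field names and values are illustrative only.
raw_rows = [
    {"region": " North ", "units": "12", "price": "3.50"},
    {"region": "south", "units": "", "price": "4.00"},     # missing units
    {"region": "North", "units": "-5", "price": "3.50"},   # impossible value
    {"region": "East", "units": "7", "price": "n/a"},      # unparseable price
    {"region": "south", "units": "9", "price": "4.25"},
]


def to_number(text):
    """Return text as a float, or None when it cannot be parsed."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def prepare(rows):
    """Convert rows to typed records and drop rows that fail basic checks."""
    cleaned = []
    for row in rows:
        units = to_number(row.get("units"))
        price = to_number(row.get("price"))
        # Cleaning: skip rows with missing, unparseable, or negative values
        # (treating negatives as inaccurate is an assumption of this sketch).
        if units is None or price is None or units < 0 or price < 0:
            continue
        # Transforming: normalise text and convert strings to numbers.
        cleaned.append({
            "region": row["region"].strip().lower(),
            "units": int(units),
            "price": price,
        })
    return cleaned


prepared = prepare(raw_rows)
print(len(raw_rows), "raw rows ->", len(prepared), "prepared rows")
```

In a real project the validity rules would come from the business problem and from subject matter experts rather than from a hard-coded check like this one.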

MORE IDEAS FROM THE ARTICLE

Define the business problem you are trying to solve.

  • What results are you expecting from the process?
  • What processes are used to solve this problem?
  • Do you see AI improving the current process?
  • What are the key performance indicators (KPIs) that will help track progress?
  • What resources will be needed?
  • Consider how to break down the problem into iterative sprints.

Once you have answers, identify how you can solve the problem using AI.

AI adoption

In 2019, nearly 87% of data science projects did not get into production. However, due to COVID-19, most companies have scaled up their AI adoption and increased their AI investment.

In 2020, almost 50% of enterprises employed an ML model. But to completely harness the power of AI, multiple models need to be created and deployed.

AI model development involves multiple stages that interconnect with one another.

  1. Identify the business problem. Instead of asking how to improve your artificial intelligence, ask how to improve your business.
  2. Identify and collect data. Identifying the correct data is vital to ensure model accuracy and relevance.
  3. Prepare the data.
  4. Build and train the model.
  5. Test the model. The model is trained and tuned using the training and validation data sets.
  6. Deploy the model. Once the model is tested with different data sets, validate its performance against the parameters from Step 1.

Ask questions such as:

  • What data is needed to solve the business problem?
  • What quantity of data is required?
  • Do you have enough data to build a model?
  • Do you need more data to extend the existing data?
  • How is the data obtained, and where is it stored?
  • Can you use a pre-trained model?

Consider whether your model will operate in real time, since that determines whether you need to create data pipelines to feed the model.

Consider what form of data is required:

  • Structured data in the form of rows and columns.
  • Unstructured data, such as images.
  • Static data, such as previous sales data.
  • Streaming data.

At this step, all the requirements have been collected for the solution modelling to proceed.

ML engineers will define the features of the model, taking the following into account:

  • Use the same features for training and testing the model to avoid inaccurate results.
  • Consider working with Subject Matter Experts to direct you on what features would be necessary for the model.
  • Be wary of using multiple features that might be irrelevant to the model.
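
One way to keep the features identical for training and testing is to route both data sets through a single feature function. The sketch below assumes hypothetical record fields (`units`, `price`, `region`) and a hypothetical `extract_features` helper; it is an illustration of the idea, not a prescribed method.

```python
def extract_features(record):
    """Map one raw record to the feature vector the model expects.

    Every data set goes through this one function, so the feature
    order and encoding cannot drift between training and testing.
    """
    return [
        float(record["units"]),
        float(record["price"]),
        1.0 if record["region"] == "north" else 0.0,  # simple binary flag
    ]


# Hypothetical records, for illustration only.
train_records = [
    {"region": "north", "units": 12, "price": 3.5},
    {"region": "south", "units": 9, "price": 4.25},
]
test_records = [
    {"region": "north", "units": 4, "price": 3.75},
]

X_train = [extract_features(r) for r in train_records]
X_test = [extract_features(r) for r in test_records]
```

Adding or dropping a feature then means changing one function, and both data sets pick up the change together.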

Once the features are defined, choose the most suitable algorithm.

While the model is trained and tuned using the training and validation data sets, it will behave differently when used in the real world, which is expected.

The main objective is to minimise the change in model behaviour when it is deployed. Three data sets are used when experiments are carried out: training, validation, and testing.

  • If the model performs poorly on the training data, select a better algorithm, increase data quality, or feed more data into the model.
  • If the model does not perform well on the testing data, it may not be generalising beyond the training data, and more data needs to be added.
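
The three data sets and the two checks above can be sketched as follows. The 70/15/15 split, the `target` and `max_gap` thresholds, and the assumption that a higher score is better are all illustrative choices, not values from the article.

```python
import random


def split_dataset(rows, train_pct=70, val_pct=15, seed=0):
    """Shuffle rows and split them into training, validation, and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, val, test


def diagnose(train_score, test_score, target=0.8, max_gap=0.1):
    """Suggest a next step from two scores (higher is assumed to be better)."""
    if train_score < target:
        return "underfit: try a better algorithm, better data, or more data"
    if train_score - test_score > max_gap:
        return "not generalising: consider adding more data"
    return "ok"


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
print(diagnose(train_score=0.95, test_score=0.70))
```

The validation set is used while tuning, and the test set is held back until the end so that it gives an honest estimate of how the model behaves on unseen data.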

Analyse whether the KPIs and the business objective of the model are achieved. If the parameters are not met, consider changing the model or improving the quality and quantity of the data.

Before deployment:

  • Measure and monitor the model performance continuously.
  • Define a baseline to measure future iterations of the model.
  • Keep iterating the model to improve model performance.

When all the defined parameters are met, deploy the model into the intended infrastructure.
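
A deployment gate along these lines could compare a candidate model against both the KPI targets and a recorded baseline. The metric names, numbers, and the `ready_to_deploy` helper are hypothetical, and the sketch assumes every metric is "higher is better".

```python
def ready_to_deploy(candidate, baseline, kpi_targets):
    """Return (ok, reasons) for a candidate model's metrics.

    A metric fails if it misses its KPI target, or if it meets the
    target but is worse than the baseline from an earlier iteration.
    """
    reasons = []
    for name, target in kpi_targets.items():
        value = candidate.get(name)
        if value is None or value < target:
            reasons.append(f"{name} below KPI target {target}")
        elif value < baseline.get(name, float("-inf")):
            reasons.append(f"{name} worse than baseline")
    return (not reasons), reasons


# Hypothetical numbers, for illustration only.
kpi_targets = {"accuracy": 0.90, "recall": 0.75}
baseline = {"accuracy": 0.88, "recall": 0.80}
candidate = {"accuracy": 0.91, "recall": 0.78}

ok, reasons = ready_to_deploy(candidate, baseline, kpi_targets)
print(ok, reasons)
```

Here the candidate meets both KPI targets but its recall has slipped below the baseline, so the gate reports a reason instead of approving deployment; whether that should block a release is a judgment call for the team.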

RELATED IDEAS

Building ethical AI

Companies are leveraging data and artificial intelligence to create scalable solutions — but they’re also scaling their reputational, regulatory, and legal risks. For instance, Los Angeles is suing IBM for allegedly misappropriating data it collected with its ubiquitous weather app. Optum is being investigated by regulators for creating an algorithm that allegedly recommended that doctors and nurses pay more attention to white patients than to sicker black patients. Goldman Sachs is being investigated by regulators for using an AI algorithm that allegedly discriminated against women by granting larger credit limits to men than women on their Apple cards. Facebook infamously granted Cambridge Analytica, a political firm, access to the personal data of more than 50 million users.

Just a few years ago discussions of “data ethics” and “AI ethics” were reserved for nonprofit organizations and academics. Today the biggest tech companies in the world — Microsoft, Facebook, Twitter, Google, and more — are putting together fast-growing teams to tackle the ethical problems that arise from the widespread collection, analysis, and use of massive troves of data, particularly when that data is used to train machine learning models, aka AI.

AI and Equality
  • Designing systems that are fair for all.
