Unsupervised Learning

As the name suggests, unsupervised learning is a machine learning technique in which models are not supervised using a labelled training dataset.

Instead, the model itself finds hidden patterns and insights in the given data.

This is similar to how a human learns to think from their own experiences, which is why it is often described as closer to real AI.

In the real world, we do not always have input data with the corresponding output (the labelled training data that acts as the supervisor). Unsupervised learning is needed for such cases.

Steps:

Step 1: The very first step is to load the unlabeled data into the system.

Step 2: Once the data is loaded into the system, the algorithm analyses the data.

Step 3: Once the analysis is complete, the algorithm looks for patterns based on the behavior or attributes of the dataset.

Step 4: Once pattern identification and grouping are done, the algorithm returns the output.

Example: Input dataset containing images of different types of fruits.

Now, let’s take these fruits and feed them to an unsupervised learning model.

The model determines the features associated with the data and understands that all the apples are similar in nature and thus groups them together.

Similarly, it recognises that all the oranges share the same features and groups them together, and it does the same for the mangoes (if the dataset contains any).

Here, the unsupervised learning algorithm performs this task by clustering the image dataset into groups according to the similarities between images.
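The grouping described above can be sketched with k-means, one common clustering algorithm. This is only an illustration: the text does not name a specific algorithm, and the two numeric features per fruit (redness and elongation) are made-up stand-ins for whatever features a real model would extract from images.

```python
import math

def kmeans(points, k, iterations=20):
    """Group points into k clusters with plain k-means."""
    # Assumption: start from the first k points so the sketch is deterministic.
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return clusters

# Hypothetical (redness, elongation) features, with no labels attached.
fruits = [
    (0.90, 0.10), (0.50, 0.10), (0.60, 0.80),
    (0.85, 0.15), (0.45, 0.05), (0.65, 0.75),
    (0.95, 0.12), (0.55, 0.12), (0.55, 0.85),
]

clusters = kmeans(fruits, k=3)
for number, members in enumerate(clusters, start=1):
    print(f"Group {number}: {members}")
```

Note that the algorithm never sees the words "apple", "orange", or "mango". It only returns three groups of similar points, and naming the groups is left to a human.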

Example 2:

For instance, given a database of movie reviews, you could identify clusters of users who rate action movies similarly, and use those correlations to predict how much one member might like a particular movie that they have not yet seen but others have rated.
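A rough sketch of the movie idea follows. To keep it short, it uses a nearest-neighbour comparison instead of a full clustering algorithm: it finds the users whose existing ratings are closest to the member's and averages their ratings for the unseen movie. All user names, movie titles, and ratings are invented for illustration.

```python
# Hypothetical ratings on a 1-5 scale.
ratings = {
    "asha": {"Action A": 5, "Action B": 4, "Drama C": 1},
    "ben":  {"Action A": 5, "Action B": 5, "Drama C": 2, "Action D": 5},
    "chen": {"Action A": 4, "Action B": 5, "Action D": 4},
    "dana": {"Action A": 1, "Action B": 2, "Drama C": 5, "Action D": 1},
}

def distance(a, b):
    """Mean absolute rating difference over the movies both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return float("inf")
    return sum(abs(a[m] - b[m]) for m in shared) / len(shared)

def predict(user, movie, ratings, k=2):
    """Average the movie's rating among the k users most similar to `user`."""
    candidates = [
        (distance(ratings[user], other), other[movie])
        for name, other in ratings.items()
        if name != user and movie in other
    ]
    nearest = sorted(candidates)[:k]
    return sum(rating for _, rating in nearest) / len(nearest)

prediction = predict("asha", "Action D", ratings)
print(prediction)  # 4.5: ben and chen rate most like asha, and they gave 5 and 4
```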

Types of Supervised learning

Supervised learning can be further divided into two types of problems:

  1. Classification

Classification algorithms are used when the output variable is categorical, for example when there are two classes such as Yes-No, Male-Female, or True-False.

Multiple classes may also be present.

The output variable or the dependent variable should be categorical in nature.

Example: Diagnosis

“Prone to lung cancer” (output variable) is the dependent variable and “Weight” and “Number of cigarettes smoked” are the independent variables.
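As a minimal sketch of a classifier for this kind of problem, the code below fits a "decision stump": a single rule of the form "feature >= threshold". The patient rows are invented purely for illustration and carry no medical meaning, and the stump is just one simple choice of classification algorithm, not the method the example above prescribes.

```python
def fit_stump(rows, labels):
    """Find the single feature/threshold rule that classifies the most rows correctly."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            predictions = [row[feature] >= threshold for row in rows]
            correct = sum(p == y for p, y in zip(predictions, labels))
            if best is None or correct > best[0]:
                best = (correct, feature, threshold)
    return best[1], best[2]

def predict(row, feature, threshold):
    """Apply the learned rule to a new row."""
    return row[feature] >= threshold

# Hypothetical rows: (weight in kg, cigarettes smoked per day).
rows = [(60, 0), (82, 2), (75, 5), (68, 15), (90, 20), (72, 30)]
# Hypothetical categorical output: prone to lung cancer (True) or not (False).
labels = [False, False, False, True, True, True]

feature, threshold = fit_stump(rows, labels)
print(feature, threshold)                      # 1 15
print(predict((70, 25), feature, threshold))   # True
print(predict((85, 1), feature, threshold))    # False
```

The output is one of two categories, which is what makes this a classification problem and not a regression problem.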

  2. Regression

Regression algorithms are used when there is a relationship between the input variable and a continuous output variable. They are used to predict continuous values, as in weather forecasting or market trends. A regression model will:

  • Analyse the existing data, and
  • Predict future data points.

Let’s say you have two variables, “Number of hours studied” and “Number of marks scored”. Here we want to understand how the number of marks scored by a student changes with the number of hours studied by the student.

“Marks scored” is the dependent variable, and “Hours studied” is the independent variable.

Note that the dependent variable, “marks scored”, is a continuous numerical value.

Question: “How many hours should a student study to score 60 marks?”

Ans: The regression model would learn that each extra hour studied adds 10 marks (starting from zero marks for zero hours), so to score 60 marks the student must study for 6 hours.
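The answer above can be reproduced with a small least-squares fit. The five data points are made up so that they match the "10 marks per hour" pattern in the text; real data would be noisier and the fitted line would only approximate it.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations: hours studied and marks scored.
hours = [1, 2, 3, 4, 5]
marks = [10, 20, 30, 40, 50]

slope, intercept = fit_line(hours, marks)

# Invert the fitted line to find the hours needed for a target mark.
target = 60
hours_needed = (target - intercept) / slope
print(slope, intercept, hours_needed)  # 10.0 0.0 6.0
```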

Example: Weather app on a mobile phone

This app predicts the weather for the entire next week. How does it do this?

By analysing previous data (say, the past 10 years of weather reports) and predicting the pattern for the next week.

Because this involves a large amount of data, it would be difficult for humans to work through it. Hence, machines are fed large amounts of data and made to predict the future data points.

How does supervised learning work?

In supervised learning, models are trained using a labelled dataset, from which the model learns about each type of data.

Once the training process is completed, the model is tested on test data (a portion of the labelled dataset held back from training), and then it predicts the output.

Example:

Task

Suppose we have a dataset of different types of shapes, which includes square, rectangle, triangle, and polygon. The first step is to train the model on each shape.

Training Experience

If the given shape has four sides, and all the sides are equal, then it will be labelled as a Square.

If the given shape has three sides, then it will be labelled as a triangle.

If the given shape has six equal sides, then it will be labelled as a hexagon.

Now, after training, we test our model using the test set, and the task of the model is to identify the shape.

Performance

The machine is already trained on all types of shapes. Performance is measured by how well it classifies a new shape based on its number of sides and predicts the output.
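The task, training experience, and performance described above can be sketched as follows. The "model" here is deliberately trivial: it memorises the most common label for each feature pair. The feature encoding (number of sides, whether all sides are equal) and the example rows are assumptions made for this illustration.

```python
from collections import Counter, defaultdict

def train(examples):
    """Learn the most common label for each (sides, all_sides_equal) pair."""
    votes = defaultdict(Counter)
    for features, label in examples:
        votes[features][label] += 1
    return {features: counts.most_common(1)[0][0] for features, counts in votes.items()}

def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the true label."""
    hits = sum(model.get(features) == label for features, label in examples)
    return hits / len(examples)

# Training experience: labelled shapes.
training_set = [
    ((4, True), "square"), ((4, True), "square"),
    ((4, False), "rectangle"),
    ((3, True), "triangle"),
    ((6, True), "hexagon"),
]

# Performance: shapes held back for testing, including one never seen in training.
test_set = [
    ((4, True), "square"),
    ((3, True), "triangle"),
    ((6, True), "hexagon"),
    ((5, True), "pentagon"),
]

model = train(training_set)
print(accuracy(model, test_set))  # 0.75
```

The pentagon is misclassified because no five-sided shape appeared in the training data, which shows why the training set needs to cover the shapes the model is expected to recognise.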

Steps involved

  • First, determine the type of training dataset.
  • Collect the labelled training data.
  • Split the dataset into a training set, a validation set, and a test set.
  • Determine the input features of the training dataset, which should carry enough information for the model to predict the output accurately.
  • Choose a suitable algorithm for the model, such as a support vector machine or a decision tree.
  • Execute the algorithm on the training dataset. Sometimes validation sets, held out from the training data, are needed to tune control parameters.
  • Evaluate the accuracy of the model on the test set. If the model predicts the correct outputs, the model is accurate.
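The splitting step in the list above can be sketched like this. The 60/20/20 proportions, the fixed seed, and the function name `split_dataset` are assumptions for illustration. The right proportions depend on how much data is available.

```python
import random

def split_dataset(rows, train=0.6, validation=0.2, seed=0):
    """Shuffle the rows and cut them into training, validation, and test subsets."""
    shuffled = rows[:]                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split repeatable
    n_train = round(len(shuffled) * train)
    n_validation = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_validation],
        shuffled[n_train + n_validation:],  # whatever remains is the test set
    )

# Stand-in for ten labelled examples.
rows = list(range(10))
train_rows, validation_rows, test_rows = split_dataset(rows)
print(len(train_rows), len(validation_rows), len(test_rows))  # 6 2 2
```

The three subsets do not overlap, so the accuracy measured on the test set reflects examples the model never saw during training or tuning.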


Supervised learning

The machines are trained using well-labelled training data, and on the basis of that data, they predict the output.

Labelled data means that some input data is already tagged with the correct output.

The supervisor is this training data (labelled data), which helps the model predict the output correctly when a new data point is given as input.

The aim of a supervised learning algorithm is to find a mapping function that maps the input variable (x) to the output variable (y).

Step 1: The very first step of Supervised Machine Learning is to load labelled data into the system. This step is a bit time-consuming because the preparation of labelled data is often done by a human trainer.

Step 2: The next step is to train the model, building connections (a function) between inputs and outputs. This step is also known as training the model.

Step 3: Then comes the step known as testing the model. As the name suggests, you test the model by introducing it to a set of new data.

Here, the input is an independent variable, and the output is a dependent variable. The goal is to generate a mapping function that is accurate enough so that the algorithm can predict the output when we feed new input.

Example of labelled data:

We have a labelled dataset that consists of images of apples and oranges, with different attributes such as shape, colour, etc.

Consider an image of an apple with the labels shape, colour, and apple.

We train the model with this image. Then, we repeat the same training process with other images of both apples and oranges with their attributes.

What we are doing is this:

Here, the input data is the independent variable, and “Apple” or “Orange” is the dependent variable, as it depends on the input picture given.

Our goal is to generate a mapping function between the independent and dependent variables so that we can determine the output when we feed in a new data point.

Once the model is trained and the algorithm is built, the accuracy can be tested with the help of a test dataset.

When we feed the model with a new apple image, it scans the image and matches the attributes of the image with other trained images. Then depending upon the accuracy of the model, it returns the output ‘apple’.

When a new data point, say an apple image, is given as input, the machine should be able to predict the output as “Apple”.

This labelled data, or training data, acts as the supervisor and helps the model predict the output as “Apple”.
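One simple way to picture "matching the attributes of a new image against the trained images" is a 1-nearest-neighbour classifier, sketched below. The two numeric attributes (redness and surface roughness) and their values are hypothetical. A real image model would work with far richer features.

```python
import math

# Labelled training data: (redness, surface roughness) tagged with the correct output.
training_data = [
    ((0.90, 0.20), "Apple"),
    ((0.80, 0.25), "Apple"),
    ((0.40, 0.80), "Orange"),
    ((0.50, 0.70), "Orange"),
]

def predict(features, training_data):
    """Return the label of the closest labelled example (1-nearest neighbour)."""
    _, label = min(training_data, key=lambda pair: math.dist(pair[0], features))
    return label

# New, unlabelled data points.
print(predict((0.85, 0.30), training_data))  # Apple
print(predict((0.45, 0.75), training_data))  # Orange
```

The labels in `training_data` play the role of the supervisor: without them, the model could group similar points but could not name the output.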

Machine learning

Machine learning (ML) is a type of artificial intelligence (AI) that allows software applications to become more accurate at predicting outcomes without being explicitly programmed to do so.

Machine learning algorithms are readily available. For example, Python has many libraries that support these machine learning algorithms.

Using these libraries we can:

  1. Import a data set
  2. Fit a model to the data set
  3. Find the accuracy (how well it predicts a new data point)

Then why learn the basics when we can use the pre-built algorithms that are readily available?

Still, we should learn the basics of machine learning (the mathematics behind these concepts) to be able to make informed decisions about the performance of the model.


We can fit a model to a data set and get some level of performance, but how do we judge whether the performance reported by the machine learning algorithm is adequate?

The design of experiment                                                                                                                                              


To make these decisions with confidence, we should know the fundamental concepts.

Category management in supermarkets: what's important for newcomers to know?

Category management is the categorisation of products according to buyers’ shopping habits. It is used within any retail store, including a supermarket or its website with the possibility of ordering goods for home delivery. Within one retail chain (for example, several supermarkets), category management is similar for physical stores and online trading. However, the categories may vary from point to point, because they depend on the customers who most often visit each store.

What does a manager have to know when starting to work on categories in the supermarket?

It’s going to be difficult to analyze customer segments

Simply because they are going to be buying goods offline. When you move at least 50 per cent of your goods online to collect goods in-store or pay for home delivery, the whole situation will improve. However, this still does not guarantee that the customer segments will meet your expectations. Yes, you can track customers through recurring patterns in receipts. You can also come up with hypotheses based on the data obtained from your competitors and other market analyses. Still, you won’t be able to study your customers under a microscope.

The cost of a product will often take precedence over its other qualities

It is a major problem for category management in supermarkets and retail in general. Most likely, grocery store visitors care more about the product’s price, and low cost always wins over other characteristics. Therefore, it will be more challenging to create categories: inevitably, you should be guided by the price of the goods.

At some point, it will be difficult to fight off competitors

You will be selling the same items which are available in other supermarkets. Some manufacturers may agree to cooperate exclusively with your chain of stores; however, this is unlikely because this arrangement is simply unprofitable for the supplier. Moreover, it’s impossible to lower the bar and start undercutting, that is, trading at a loss. After all, a business needs to earn money. In some instances, it may only be achievable through introducing discounts.

You will be able to test hypotheses only on a small number of consumers

Why? Most trade will occur in offline stores rather than in their online equivalents. And this, as we remember, complicates the task of marketers and, unfortunately, category managers. Only a small percentage of consumers shop online. That makes sense: you have to pay for delivery. You can’t select your products or ensure that all your fruit and vegetables are intact. Your delivery guy might also be late or you may face logistical problems like mixed-up orders.

How can you test your hypotheses on those customers who have turned to your supermarket website? First, you have to study your portal analytics: how users have found you, how they behave on the site, which tabs they open first, and how they filter products in various categories. If you have social media accounts, be sure to use them. The statistics offered by Instagram will demonstrate who your customers are and how they behave. By understanding the core of your audience accessing your resources, you can create category management hypotheses and test them on users before displaying the goods in stores. Of course, many of your customers will not shop online. Still, you can learn all about them, for example, through their choice of purchases on receipts.

The variety of categories will depend on the location of the supermarket

This limits your ability to influence the profits of the business. The categories will depend largely on areas where your shops are located, as well as people who live there and their buying power. For example, it is pointless to introduce a category of personal care products if jars of face cream have been sitting on the shelves for months in a particular store. It makes no sense to reduce the number of household goods with an above-average price tag if they sell like hotcakes. If you work in a supermarket in a deprived area, be prepared for a succession of identical categories with low-cost products.

Start learning category management now to secure a job within this field in a few weeks! This area has been actively booming in India and the surrounding regions. Therefore, your knowledge will definitely be in demand.