IBM Artificial Intelligence Analyst Mastery Award Preparation Guide

Introduction to machine learning

Machine learning

Machine learning is a field of artificial intelligence. It uses statistical methods to give computers the ability to "learn" from data without being explicitly programmed.

Model: The learning process uses training data to develop a model, and the model is then used to make predictions on future data.

A statistical model is a mathematical function that represents a relationship or mapping between a set of inputs and a set of outputs.

Machine learning algorithms

A machine learning algorithm is a technique that extracts patterns from historical data. These patterns can be applied to new data.
Data quality is critical for prediction accuracy.

Machine learning approaches
1) Supervised learning:
1.1) Classification is the task of predicting a discrete class label, such as “black, white, or gray” and “tumor or not tumor”.
1.2) Regression is the task of predicting a continuous quantity, such as “weight”, “probability”, and “cost”.

2) Unsupervised learning: Detect patterns and relationships between data without using labeled data.
2.1) Clustering algorithms: Discover how to split the data set into a number of groups such that the data points in the same groups are more similar to each other compared to data points in other groups. Examples for applications include customer segmentation, image segmentation, and recommendation systems.

3) Reinforcement learning
- uses trial and error (a rewarding approach).
- The algorithm discovers an association between the goal and the sequence of events that leads to a successful outcome.

Sample of Machine learning algorithms:
•Naïve Bayes classification (supervised classification, probabilistic): assumes that the features are independent of each other
•Linear regression (supervised regression, predict continuous value)
•Logistic regression (supervised classification, predict discrete category)
•Support vector machine (SVM) (supervised linear or non-linear classification)
•K-means clustering (unsupervised learning)
•Decision tree (supervised non-linear classification)

-Entropy and information gain are used to construct a decision tree.
-Entropy: a measure of the amount of uncertainty and randomness in a set of data for the classification task.

-Information gain: used to rank the attributes or features to split on at a given node in the tree.
-Information gain = (entropy of the distribution before the split) - (entropy of the distribution after it)
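The entropy and information gain definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a full decision tree builder. It assumes Shannon entropy in bits and that the entropy after the split is the size-weighted average over the resulting subsets (the usual convention). The function names and the toy labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(parent, subsets):
    """Entropy before the split minus the size-weighted entropy after it."""
    total = len(parent)
    after = sum(len(s) / total * entropy(s) for s in subsets)
    return entropy(parent) - after

# Toy data: a perfectly mixed parent node split into two purer children.
parent = ["yes"] * 5 + ["no"] * 5
split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
gain = information_gain(parent, split)
print(round(entropy(parent), 3), round(gain, 3))
```

When a tree is built, the attribute whose split gives the largest gain would be chosen at each node.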

Artificial Neural networks
Artificial neural networks are collections of nodes
Each node applies a mathematical transformation to the data it receives; it then passes its result to the other nodes in its path.
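As a rough illustration of the transformation that a single node applies, the sketch below computes a weighted sum of the inputs plus a bias and passes it through an activation function. The choice of ReLU as the activation, and all names and numbers, are assumptions made for the example.

```python
def relu(x):
    """Rectified linear unit: negative values become 0."""
    return max(0.0, x)

def node_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through ReLU."""
    z = sum(i * w for i, w in zip(inputs, weights)) + bias
    return relu(z)

# The same inputs with two different sets of weights.
inactive = node_output([1.0, 2.0], [0.5, -1.0], 0.25)  # weighted sum is negative
active = node_output([1.0, 2.0], [0.5, 1.0], 0.25)     # weighted sum is positive
print(inactive, active)
```

In a network, the value returned by one node becomes one of the inputs of the nodes in the next layer.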

Deep learning

A deep learning network is similar to a traditional neural network, but it has many more hidden layers.

Deep learning has emerged now because of
- big data, which requires data processing
- GPUs to train neural networks
- Advances in algorithms, such as the rectified linear unit (ReLU) activation function

Deep learning Applications:
1) Multilayer perceptron (MLP): Classification and regression, for example, a house price prediction.
2) Convolutional neural network (CNN): For image processing like facial recognition.
3) Recurrent neural network (RNN): For one-dimensional sequence input data, such as audio and language.
4) Hybrid neural network: Covering more complex neural networks, for example, autonomous cars.

Model evaluation

Overfitting (high variance): occurs when a machine learning model fits the training set perfectly but fails with unseen future data.
- Reason: Too many features are used, or training samples are reused in testing.
- Solution: a) Fewer features b) More data c) Cross-validation

Underfitting (high bias): occurs when a machine learning model can neither fit the training data nor generalize to new data.
- Reason: The model is using an estimator that is too simple.
- Solution: Add more features or use a different estimator.

Cross-validation (CV)
is a process to evaluate a machine learning model by splitting a data set once or several times to train and test the model. The data set can be split into a training set to train the model and a validation set to pre-test the model. Select the model that has the least error. Finally, a test set is used to evaluate the selected model. For example, the data set can be split 60% - 20% - 20% into training, validation, and testing sets.
One criticism of this process is that splitting the data set into three parts reduces the number of samples that can be used for training the model.

Hold-out method
partitions the data set into a majority set for training and a minority set for testing. The split of training set to test set is commonly 80% - 20% or 70% - 30%, with no fixed rule.
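A hold-out split can be sketched with the standard library alone. The function name, the default 80% - 20% ratio, and the fixed seed are assumptions for illustration. Shuffling before the split is a common precaution, not something the method requires.

```python
import random

def hold_out_split(samples, test_ratio=0.2, seed=0):
    """Shuffle the samples, then return (training set, test set)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

train, test = hold_out_split(range(10), test_ratio=0.2)
print(len(train), len(test))
```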

K-fold cross validation
randomly partitions data into K equal-sized subsamples. For each iteration, one subsample is kept as the validation set and the remaining K-1 subsamples form the training set. The iterations are repeated K times, so each subsample has exactly one chance to be the validation set. The K results can then be averaged to produce a single estimate. The biggest advantage of K-fold is that all data gets a chance to be used for both training and validation. There is no strict rule for the number K, but it is commonly K=5 or K=10, which gives 5-fold or 10-fold cross-validation. If each subsample maintains approximately the same percentage of data of each target class as in the complete set, the method is known as stratified K-fold.
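The K-fold procedure can be sketched as an index generator. This is a minimal, non-stratified illustration. The function name and the seed are hypothetical, and fold sizes are equal only when the sample count is divisible by K.

```python
import random

def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (training indices, validation indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal subsamples
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

splits = list(k_fold_splits(10, k=5))
for training, validation in splits:
    print(sorted(validation), len(training))
```

A model would be trained and scored once per split, and the K scores averaged. Setting k equal to the number of samples makes every fold a single data point, which is the leave-one-out case.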

Leave one out CV (LOO-CV)
is the extreme case of K-fold, with N-1 samples as the training set and only 1 sample as the test set.
It is similar to K-fold, but each single data point is held out in turn as the validation set, and the rest of the data set is the training set. Comparing LOO-CV and K-fold, K-fold is faster and requires less computation, but in terms of accuracy, LOO-CV often has a high variance as an estimator.

Watson services

Chatbot

Watson Assistant

Quickly build a chat bot by using tools and dialog trees.

Data

Watson Studio	Collaborative environment with AI tools that a team can use to collect and prepare training data, and to design, train, and deploy machine learning models. It supports TensorFlow, Caffe, PyTorch, and Keras.
Watson Machine Learning	Enables users to perform the two fundamental operations of machine learning: training and scoring. Uses GPUs for faster training and hosts trained models. Uses hyperparameter optimization (HPO) for training complex neural networks. HPO is a mechanism for automatically exploring hyperparameter values, building a series of models, and comparing the models by using metrics of interest. To use HPO, you must specify the ranges of values to explore for each hyperparameter.
Watson Knowledge Catalog	Machine learning data catalog (MLDC) that enables you to access, curate, categorize, and share data, knowledge assets, and their relationships, wherever they reside. It allows remote connections to DB2, S3, Hadoop,...

Knowledge

Watson Discovery	Adds cognitive search and content analytics to applications to identify patterns, trends, and insights. Searches structured and unstructured data, and can be used to find answers to FAQs.
Watson Discovery News	Explore news and blogs with smarter news from Watson that includes concepts, sentiment, relationships and categories
Watson Natural Language Understanding	Analyzes semantic features of text input, including the following items: concepts, entities, keywords, categories, sentiment, emotion, relations, and semantic roles. Can be used to categorize news articles and blog posts.
Watson Knowledge Studio	Teach Watson to discover meaningful insights in unstructured text. The model can be deployed directly to Watson Natural Language Understanding and Watson Discovery. For a rule-based model, enter a custom dictionary by using a three-column CSV file [Lemma, Poscode, Surface].

Vision

Watson Visual Recognition

Tag and classify visual content by using machine learning. You can train custom models to create specialized classes.

Speech

Speech to Text	Converts audio and voice into written text. Supports English (US and UK), Japanese, Portuguese (Brazil), French, German, Spanish, Korean, Arabic, and Mandarin Chinese.
Text to Speech	Converts written text into natural-sounding audio. Supports English (UK and US), Japanese, Portuguese (Brazil), French, German, Spanish, and Italian.

Language

Language Translator	Identifies the language of text and translates it into different languages programmatically
Natural Language Classifier	Interprets and classifies natural language. Applies natural language processing and machine learning techniques to return the best matching classes for a sentence or phrase. Trained by using a two-column CSV file [Text, Class]. Can be used to classify mail as spam or non-spam.

Empathy

Personality Insights	Predicts personality characteristics through text. Used in market segmentation and campaigns. Cannot be trained by the user.
Tone Analyzer	Detects three types of tones: emotion (anger, disgust, fear, joy, and sadness), social propensities (openness, extroversion/introversion, agreeableness, emotional range), and language styles (analytical, confident, and tentative). Cannot be trained by the user.

Watson SDKs

Developers should consider using the SDKs instead of calling the REST APIs directly.

IBM SDKs:
.NET SDK, Python SDK, Java SDK, Node.js SDK, Ruby SDK, Swift SDK, Android SDK, and Unity SDK

Community SDKs:
Go SDK, PHP SDK, Scala SDK

IBM Watson services offerings
- A set of services on IBM Cloud
- Software as a Service (SaaS)
- Set of industry solutions

Watson Studio

A collaborative platform for data scientists that is built on open-source components and IBM added value, and is available in the cloud and on-premises.

Open-source components: Python, Scala, R, SQL, and notebooks (Jupyter and Zeppelin)

IBM added value: Watson Machine Learning, Flow Editor, Decision Optimization, SPSS predictive analytics algorithms, analytics dashboard

Watson Studio is designed for

- Data engineer: Designs how data is organized and ensures operability.

- Data scientist: Goes deep into the data to draw hidden insights for the business.

- Business analyst: Works with data to apply insights to the business strategy.

- App developer: Plugs into data and models and writes code to build apps.

Watson Studio Project: Project is a way to organize resources for a specific data science task or goal.

Project access level:
- Viewer: View the project.
- Editor: Control project assets.
- Admin: Control project assets, collaborators, and settings.

Projects consist of:
- Data assets are the files in your object store or connections, such as a database, and other external files.
- Collaborators can be assigned to your projects as admins, editors, or viewers.
- Analytic assets are the notebooks and the models that you develop.
- Tools: You can think of Watson Studio AI tools in four categories:
• Visual recognition
• Natural language classification
• Machine learning
• Deep learning

Watson Machine Learning

Steps To prepare data for a machine learning algorithm:
- Data selection: Handle sampling noise and sampling bias, and validate that the sample is an accurate representation of the entire population.
- Data preprocessing: Handle noise and outliers, missing values, inconsistent values, and duplicate data.
- Data transformation: Scaling, aggregation, and decomposition.

Watson Machine Learning performs two fundamental operations:
- Training is the process of refining an algorithm so that it can learn from a data set.
- Scoring is the operation of predicting an outcome by using a trained model.

Creating a Watson Machine Learning model requires the Cloud Object Storage service.

Tools to create a machine learning model:
1- AutoAI:
1.1) Data pre-processing
1.2) Automated model selection
1.3) Automated feature engineering
1.4) Hyperparameter optimization

2- SPSS Modeler (Flow Editor): Create machine learning, deep learning, and SparkML flows.

3- A Notebook to prepare data, train the model, and deploy the model.

Deploying the model: The model is saved to the model repository (.PMML file extension).

Deployment methods:
- From a notebook.
- From the Flow Editor.
- Deploy a batch or streaming model.

Natural Language Processing

Popular NLP tasks:
- Machine translation: Automatically translating one language to another
- Information retrieval: Search engines, such as Google and Bing
- Spell checkers
- Natural language assistants, such as Siri and Alexa

Basic concepts and terminologies

Synonyms	Clever and smart	Words that are written differently but are similar in meaning.
Antonyms	Clever and stupid	Words that have meanings that are opposite to each other
Homonyms		Words that have the same form but have unrelated meanings.
-Homographs		- This answer is right. - The building is on the right side of the river. - You have the right to remain silent. - Come here right now.
-Homophones	“left” and “lift”. “right” and “write”.	Words that sound similar when spoken but have different meanings and spellings
Polysemy	- face your fear. - face is beautiful.	Words that have the same written form and a related meaning
Hyponymy	Orange is a hyponym of fruit	One word represents a subclass of the other word
Hypernymy	Fruit is a hypernym of orange	One word represents a superclass of the other word

Natural language processing tools and services

- Apache OpenNLP: Provides tokenizers, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, co-reference resolution, and more.
- Stanford Core NLP: A suite of NLP tools that provide part-of-speech tagging, a named entity recognizer, a co-reference resolution system, sentiment analysis, and more.
- Natural Language Toolkit (NLTK): A Python library that provides modules for processing text, classifying, tokenizing, stemming, tagging, parsing, and more.
- WordNet: A lexical database for the English language. It is supported by APIs in several programming languages.

NLP categories
- Natural Language Understanding (NLU)
- Natural Language Generation (NLG)

Natural language understanding
- Unstructured to structured
- Question and answer system
- Sentiment analysis

Natural language generation
- Machine translation
- Text summarization
- Weather forecasting systems that convert weather charts and numbers to text

Natural language processing pipeline:
1.Sentence segmentation: split by and
2.Tokenization: split by space
3.Parts of speech (POS) tagging: tag each token with its grammatical representation, such as noun, verb, or adjective
4.Morphological processing: "happily" is broken into "happy" and "-ly"; "football" is divided into the two tokens "foot" and "ball"
5.Word-level (lexical) semantics: Deals with the meaning of words, for example, replacing the token "got" with "received".
6.Parsing: Evaluate the sentence in terms of grammatical correctness (if required)

Information retrieval
1- Stemming: Reduce a word to its word stem (glasses→glass)
2- Normalization

2.1) Case folding: Child --> child
2.2) Duplication removal: Hiiiiii --> Hi
2.3) Acronyms processing: WHO --> World Health Organization
2.4) Format normalization: $100 --> 100 dollars
2.5) Value normalization: 2 July 1980 --> DATE

3- Term Frequency
- TF: Term Frequency measures how many times a term t occurs in a document d.
- IDF: Inverse Document Frequency measures how rare a term is.
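The two measures can be sketched as follows. TF is computed as a raw count, which matches the definition above. The logarithmic form of IDF is an assumption (it is one common variant; the notes only say that IDF measures rarity). The function names and toy documents are hypothetical.

```python
import math

def term_frequency(term, document):
    """Number of times the term occurs in a tokenized document."""
    return document.count(term)

def inverse_document_frequency(term, documents):
    """log(N / number of documents containing the term); 0.0 if none do."""
    containing = sum(1 for doc in documents if term in doc)
    return math.log(len(documents) / containing) if containing else 0.0

docs = [
    ["watson", "is", "a", "service"],
    ["watson", "discovery", "is", "a", "search", "service"],
    ["search", "engines", "retrieve", "documents"],
]
tf = term_frequency("watson", docs[0])
idf_watson = inverse_document_frequency("watson", docs)    # appears in 2 of 3 docs
idf_engines = inverse_document_frequency("engines", docs)  # appears in 1 of 3 docs
print(tf, round(idf_watson, 3), round(idf_engines, 3))
```

The rarer term gets the larger IDF, so multiplying TF by IDF gives more weight to terms that are frequent in one document but uncommon across the collection.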

Watson NLP services:
- Watson Natural Language Classifier
- Watson Natural Language Understanding
- Watson Discovery

Watson NLP services that extract information from unstructured text:
- Watson Natural Language Understanding
- Watson Discovery
- Watson Tone Analyzer
- Watson Personality Insights

Watson Natural Language Understanding
analyzes the input text and provides an output that includes:
- Entities and relationships
- Sentiment analysis
- Keywords

Watson Discovery
cognitive search and content analytics to identify patterns, trends, and insights in structured and unstructured data.
Accepted document formats: .pdf, .docx, HTML, JSON

Model Evaluation

Accuracy = (Tp+Tn)/(Tp+Tn+Fp+Fn)

Precision = Tp/(Tp+Fp)

Recall = Tp/(Tp+Fn)

F-Score = 2*Precision*Recall/(Precision+Recall) --> The higher the F-score value is, the better the algorithm is
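The four formulas above translate directly into code. The sketch below is a minimal illustration with made-up confusion-matrix counts. It does not guard against division by zero, which a real evaluation script would need to handle.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }

# Hypothetical counts for a binary classifier evaluated on 100 samples.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(name, round(value, 3))
```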

Watson Knowledge Studio

Knowledge Studio is a tool for annotating unstructured domain documents. It uses those annotations to create a custom model.

Use Knowledge Studio to create:
1- A machine learning model
2- A rule-based model

- The model created by Knowledge Studio can be plugged into a natural language processing (NLP) pipeline.
- Apply your models to Watson Discovery, Watson Natural Language Understanding, and Watson Explorer.

Machine learning model	Rule-based model
Statistical approach to finding entities and relationships in documents. Learns from new data and is scalable. Used for complex text extraction that has many variations.	Declarative approach to finding entities in documents. Does not learn from new data. It can find only patterns that it has been trained to find. Useful for extracting emails, URLs, and phone numbers.
Disadvantage: Requires work to develop a supervised corpus (ground truth), and it requires a certain amount of data.	Disadvantage: Requires work to develop, write, and define a set of rules.
Developed by human annotators	Developed by using custom dictionaries that are mapped to a class name, and the regular expression tool (Regex tool)

How to build a model:
1- Import documents.
2- Human annotators annotate the documents.
3- Knowledge Studio uses a ground truth to train a model.
4- A trained model is ready to find entities, relationships, and co-references in new documents.

Notes: A rule-based model can be used to pre-annotate documents to speed up the human annotation process.

Ground truth editor: used to manually add annotations (mentions, relations, and co-references ) to small sets of documents.

Workspace

- Create a single workspace for each model.
- The workspace contains the artifacts and resources needed to build the model.
- One workspace may contain one rule-based model and one machine learning model.

Workspace resources: You add the following types of resources to the workspace:
1- Type system: Defines the entities and relationships between entities that matter to you

1.1- Mentions. Example: “Watson”, “IBM”.
1.2- Entities types: Example: “PERSON”, “ORGANIZATION”.
1.3- Relation types: Example: founderOf, employedBy.

2- Dictionaries: Group words and phrases that should be treated equivalently by a model.
3- Documents

CSV file dictionary
•The standard dictionary format.
•The maximum size of a CSV file that you can upload is 1 MB.
•The first row in the file must specify the following column headers: lemma,poscode,surface
where:
- Lemma: Specifies the most representative word form for the entry.
- Poscode: Specifies a code that identifies the part of speech.
- Surface: Specifies equivalent terms, also called surface forms.

Poscode:
0 - Unknown
1 - Pronoun
2 - Verb
3 - Noun
4 - Adjective
5 - Adverb
6 - Adposition
7 - Interjection
8 - Conjunction
9 - Determiner
10 - Quantifier
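A dictionary file in this format can be generated with Python's csv module. The sketch below is only an illustration. The entries are invented, the 1 MB limit is interpreted as 1024 * 1024 bytes, and it assumes that each additional surface form is written as an extra comma-separated value after the poscode, which should be checked against the Knowledge Studio documentation.

```python
import csv
import io

MAX_BYTES = 1024 * 1024  # assumed interpretation of the 1 MB upload limit

# (lemma, poscode, surface forms); poscode 3 = noun, 4 = adjective
entries = [
    ("IBM", 3, ["IBM", "International Business Machines"]),
    ("premium", 4, ["premium", "premium-grade"]),
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["lemma", "poscode", "surface"])  # required header row
for lemma, poscode, surfaces in entries:
    writer.writerow([lemma, poscode, *surfaces])

dictionary_csv = buffer.getvalue()
fits_limit = len(dictionary_csv.encode("utf-8")) <= MAX_BYTES
print(dictionary_csv)
```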

Creation of a model stages:
1.Knowledge curation: collect the documents (import specific documents)
2.Ground truth generation: a collection of vetted data (annotated documents) that can be used to train Watson on a new domain.
3.Model development: Only documents that became ground truth can be used to train the model
4.Model evaluation
5.Model deployment

Watson Assistant

Conversational design key factors
1-Positioning

1.1- Purpose: What is the purpose of the solution?
1.2- Viewpoint: What role should the solution play? For example, assistant, coach, or salesperson.
1.3- Proactivity:

1.3.1- Proactive (lean forward): The chatbot reaches out to the user and asks
1.3.2- Reactive (lean backward): The chatbot waits for the user to ask
1.3.3- Combination: The chatbot uses both techniques.

2-Tone and personality (friendly, informal tone versus formal tone): Humor increases the user's understanding and satisfaction.

Design issues
1.Understand the limitations of a chatbot.
2.Acknowledge limitations, “I don’t know”, and give some suggestions
3.Handle frustration. “Would you like to talk to a real person?”

Chatbot components
- Intents
- Entities
- Dialog

Watson Assistant components
- Assistants: Can be deployed through channels such as Slack, Facebook Messenger, or a website widget.
- Skills: A skill (workspace) contains intents, entities, and dialog.
- Intents: Represent the purpose of a user's input. An intent name is always prefixed with '#'.
- Entities: Represent nouns. An entity name is always prefixed with '@', for example, @street or @contact_info.
- Dialog: Defines how the assistant responds when it recognizes the defined intents and entities, by using a branching conversation flow.
- Content catalog: Prebuilt common intents that you can add to your application.

Entities: An entity is recognized when the user input matches the values, synonyms, or patterns that you define for the entity.

Enable fuzzy matching for an entity to:
- Recognize terms that are similar to the entity value and synonyms.
- Detect different grammatical forms. For example: bananas > banana
- Detect misspelled entities.
- Detect partial matches.

Context
- The context is an object that is passed between the application and the Watson Assistant service. It maintains state information, such as a customer's name or account number.

Context variables
- A context variable is a variable that you define in a dialog node or from an application.
- You can specify a default value for a context variable.
- Nodes and application logic can change its value.
- The context object can be edited in the context editor interface.
- Context variables (for example, a username variable) are stored in context objects that are described as a JSON entry in the node.
- A context variable is always prefixed with '$', for example, $username.
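To make the idea concrete, the sketch below models a context object as a plain Python dictionary, prints it as the kind of JSON entry that a node might carry, and shows a default value being overridden by application logic. The render helper is a hypothetical stand-in that mimics $variable substitution in response text; it is not the Watson Assistant implementation.

```python
import json
import re

# Context object with a default value for the username variable.
context = {"username": "guest"}

# Roughly how the context could appear as a JSON entry in a dialog node.
node_json = json.dumps({"context": context}, indent=2)

def render(text, context):
    """Replace each $name in the text with its value from the context, if defined."""
    return re.sub(
        r"\$(\w+)",
        lambda m: str(context.get(m.group(1), m.group(0))),
        text,
    )

# Application logic (or another node) changes the value later in the conversation.
updated = {**context, "username": "Maria"}

print(node_json)
print(render("Hello, $username!", context))
print(render("Hello, $username!", updated))
```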

Conditions
- Logical expressions that are evaluated to true or false.
- A node condition determines whether that node is used in the conversation.
- Conditions evaluate the intents, entities, and context that are identified in the user responses.

Responses
- The dialog response defines how to reply to the user.
- Responses are based on intents, entities, or context.
- You can add variations to the response for a more natural experience.
- A response can be text, an image, a list of options, or a pause that waits for a specified number of milliseconds.

Slots
- Add slots to a dialog node to gather multiple pieces of information from a user within that node.
- Slots collect information at the user’s pace. Details that the user provides are saved, and the service asks only for the details that the user did not provide.

Enriching the chatbot
- Persona (Avatar, voice)
- Emotion and tone ( Sentiment analysis, Tone analysis)
- Interfaces with other chat applications
- Speech recognition

To enable communication with Watson Assistant
1- Create a wrapper for the Watson service.
2- Specify the service credentials.
3- Specify the service version.
4- Watson SDKs provide mechanisms for instantiating a service wrapper

Watson Assistant integration:
1) Front end or channel: where the users type their questions

2) Application layer:
2.1- Pre-processing: can add Tone Analyzer
2.2- Watson Assistant: Detects the intent and returns an action to perform and some text, or calls external services directly from the Watson Assistant dialog.
2.3- Post-processing: For example, use the Watson Discovery service or write information to a database.

Notes:
- Watson Assistant dialog passes intent, entities, and context to the application.
- Application calls "assistant.message" API by passing the payload object and callback function in its arguments.
- Watson Assistant processes the task, sets the output, intents, entities, and context in the response, and then returns to the callback function.
- The response is processed by the application in the callback function.

Calling external applications from a dialog node
- Create an orchestration application that acts as the middle layer.
- Call the external service directly from dialog node.
- use IBM Cloud Functions

IBM Cloud Functions is based on Apache OpenWhisk. It is a Function-as-a-Service (FaaS) platform that runs functions in response to incoming events or direct invocations.

Actions Array

- To make programmatic calls directly from a dialog node, add an actions array to the node by using the JSON editor.
- An actions array can define up to five separate programmatic calls.

An actions array consists of
1) action name
2) type of call to make:
- client: Sends a message response to client application.
- cloud_function: Calls an IBM Cloud Functions action (one or more) directly.
- web_action: Calls an IBM Cloud Functions web action (one or more) directly.
*You must define action itself separately by using IBM Cloud Functions.

3) parameters list, and the response from the external function or service.
4) result_variable: reference the JSON object that is returned by the external call.
5) credentials
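The fields listed above can be pictured as one entry in a dialog node's JSON. The sketch below builds such a node as a Python dictionary and checks it against the limits stated in these notes (at most five calls, and one of the three listed call types). The action name, parameter, credential reference, and result variable are hypothetical placeholders, and validate_actions is an illustrative helper, not part of any Watson SDK.

```python
import json

MAX_ACTIONS = 5
ALLOWED_TYPES = {"client", "cloud_function", "web_action"}

node = {
    "output": {"text": "Checking the weather..."},
    "actions": [
        {
            "name": "/example_org/actions/get_weather",   # hypothetical action name
            "type": "cloud_function",
            "parameters": {"city": "$city"},               # hypothetical parameter
            "credentials": "$private.my_credentials",      # hypothetical credential reference
            "result_variable": "context.weather",          # where the returned JSON is stored
        }
    ],
}

def validate_actions(node):
    """Check the action count and that each action has a name and a known type."""
    actions = node.get("actions", [])
    if len(actions) > MAX_ACTIONS:
        return False
    return all(a.get("type") in ALLOWED_TYPES and "name" in a for a in actions)

print(json.dumps(node, indent=2))
print(validate_actions(node))
```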

Watson Visual Recognition

Convolutional neural networks were proposed in a 1998 research paper by Yann LeCun and Léon Bottou.

Computer vision use cases
-Facial recognition: Tagging friends on social media.
-Augmented reality: adding computer-generated images into the user view

Visual pattern recognition pipeline
- Image Acquisition
- Pre-Processing: Resizing images, Noise reduction, Contrast adjustment
- Segmentation: into foreground and background
- Feature Extraction: distinct color, shapes(line, corner)
- Selection: choosing Features subset to reduce dimensionality
- Classification

Watson Visual Recognition built-in models (without training):
- General model: Default classification from thousands of classes.
- Face model: Facial analysis with age and gender.
- Explicit model: Assess whether an image contains objectionable or adult content.
- Food model: Specifically for images of food items.
- Text model (Private beta): Text extraction from natural scene images.

Watson Visual Recognition custom models
- Classification models: predict the existence of objects in an image
- Object Detection models: Used to locate or count objects in an image

Custom Model Notes:
- To train a custom model, you must provide at least two example .zip files: two positive example files, or one positive and one negative example file.
- If an image is duplicated in both the negative and positive sets, the rule is that a duplicate image is kept in the positive set.
- Images in the training and testing sets should resemble each other with regard to angle, lighting, distance, size of subject, and other factors.
- Custom model uses binary “one versus the rest” models to train each class against the other classes

Computer vision tasks

1- Object detection and recognition: Detect certain patterns within the image (for example, red eyes or a face).

2- Content-based image retrieval: Image retrieval from a database based on user’s image query.

3- Optical character recognition (OCR): Converting hand-written text to a digital format.

4- Object tracking: Following the position changes of a target object from one frame to another

5- Image restoration: Fixing and restoring images that are corrupted by noise, such as motion blur

6- Scene reconstruction: Creation of a 3D model by supplying the system with multiple 2D images from different views