
Unit I AI and its Subfields

  • Introduction to Artificial Intelligence (AI), History, Definition, Artificial General Intelligence (AGI), Industry Applications of AI, Challenges in AI.
  • Knowledge Engineering, Machine Learning (ML), Computer Vision, Natural Language Processing (NLP), Robotics.

Introduction to Artificial Intelligence

Artificial Intelligence (AI) is a broad and rapidly evolving field of computer science focused on designing and building machines that can perform tasks that normally require human intelligence.

These tasks include:

  • Learning – the ability to improve performance based on experience (e.g., machine learning algorithms that learn from data)
  • Problem-solving – finding solutions to complex problems (e.g., optimizing routes, solving puzzles)
  • Decision-making – making choices under uncertainty (e.g., medical diagnosis systems)
  • Perception – interpreting sensory data like images, sounds, or videos (e.g., facial recognition)
  • Language understanding – processing and generating human language (e.g., chatbots, translation tools)

AI systems aim to replicate or simulate human cognitive functions using algorithms, statistical models, and data-driven approaches.

Types of AI

  1. Narrow AI (Weak AI) – Designed for a specific task (e.g., voice assistants, recommendation systems).
  2. General AI (Strong AI) – Hypothetical systems able to perform any intellectual task a human can (see the AGI section below).
  3. Super AI – Hypothetical systems that surpass human intelligence across all domains.

Img - AI unit 1_2.jpg

AI Techniques

  • Machine Learning – Systems that learn from data without being explicitly programmed.
  • Deep Learning – A subset of machine learning using neural networks with many layers.
  • Natural Language Processing (NLP) – Enabling machines to understand and generate human language.
  • Computer Vision – Allowing machines to interpret visual information.

History of Artificial Intelligence (AI)

The history of Artificial Intelligence spans several decades of research, innovation, and milestones. Here’s a structured overview from its early ideas to present-day advancements:

  1. Early Ideas (Pre–20th Century)
  • The concept of intelligent machines can be traced back to ancient myths, stories, and automata.
  • Greek myths told of mechanical beings, and ancient philosophers such as Aristotle explored logic and reasoning, laying philosophical foundations.
  • In the 17th century, Gottfried Wilhelm Leibniz worked on symbolic logic, which influenced later computational theories.
  2. Birth of AI (1940s–1950s)
  • Alan Turing (1950): Introduced the Turing Test to assess whether a machine can exhibit intelligent behavior indistinguishable from a human.
  • John von Neumann: His work on stored-program computers provided the architecture for computational processes.
  • 1956 – Dartmouth Conference: Considered the official birth of AI as a field. Researchers like John McCarthy, Marvin Minsky, Herbert Simon, and Allen Newell gathered to explore machine intelligence.
  3. Early AI Programs (1950s–1970s)
  • Logic Theorist (1956): Created by Allen Newell and Herbert Simon; it could prove mathematical theorems.
  • General Problem Solver (1957): Another early AI program capable of solving puzzles.
  • ELIZA (1966): An early chatbot by Joseph Weizenbaum that simulated conversation.
  4. The First AI Winter (1970s–1980s)
  • Progress slowed due to limitations in computing power and unrealistic expectations.
  • Funding and interest declined because early systems couldn't handle real-world complexities.
  • AI research faced skepticism, and this period became known as the AI Winter.
  5. Expert Systems Era (1980s)
  • AI revived with Expert Systems, programs that used a set of rules to solve specific problems.
  • Example: MYCIN – A medical diagnosis system.
  • Governments and industries invested heavily, but these systems were expensive and hard to maintain.
  • The field again faced challenges, leading to a second AI Winter in the late 1980s.
  6. Machine Learning and Big Data (1990s–2000s)
  • Researchers shifted focus from rule-based systems to machine learning, where systems learn from data.
  • Support Vector Machines (SVMs), decision trees, and other algorithms gained popularity.
  • With improved computing power and access to large datasets, AI systems became more practical.
  • AI started being used in areas like speech recognition and recommendation systems.
  7. Deep Learning and Modern AI (2010s–Present)
  • Deep learning using neural networks revolutionized AI.
  • Image recognition, natural language processing, and self-driving cars became achievable.
  • Landmark achievements:
    • AlphaGo (2016): Defeated human champions in the game of Go.
    • GPT models (2018 onward): Large language models like GPT-3 and GPT-4 that understand and generate human-like text.
  • AI is now widely used in healthcare, finance, education, entertainment, and more.
  8. Current Trends and Future Outlook
  • Ethical AI – Addressing concerns about bias, privacy, and fairness.
  • Explainable AI (XAI) – Making AI decisions understandable to humans.
  • AI Governance – Creating policies and regulations to ensure responsible use.
  • Artificial General Intelligence (AGI) – Still a long-term goal, but research continues toward machines that can think, reason, and learn like humans.

Summary Timeline

| Year/Period | Key Event |
|---|---|
| Ancient times | Myths and philosophical ideas on intelligence |
| 1950 | Alan Turing’s Turing Test |
| 1956 | Dartmouth Conference – Birth of AI |
| 1960s | ELIZA chatbot, early theorem-proving programs |
| 1970s | First AI Winter due to limitations |
| 1980s | Rise of expert systems |
| Late 1980s | Second AI Winter |
| 1990s–2000s | Machine learning becomes dominant |
| 2010s–present | Deep learning breakthroughs and real-world AI apps |
| Future | Ethical AI, AGI research, global regulation efforts |

Artificial General Intelligence (AGI)

Artificial General Intelligence (AGI) refers to a type of machine intelligence that can perform any intellectual task that a human being is capable of. It is also known as strong AI or full AI. Current AI systems are limited to specific functions, whereas AGI aims to replicate human cognitive abilities such as:

  • Reasoning
  • Problem-solving
  • Learning from experience
  • Understanding language, context, and emotions
  • Adapting to new situations

AGI systems are designed to apply knowledge across different domains without needing specialized programming. They can transfer skills, learn from new experiences, and improve themselves over time, much like humans do.

Key Features of AGI:

  1. Broad Abilities: Able to perform diverse tasks without needing separate training for each.
  2. Learning and Adaptation: Learns from experience and applies it to unfamiliar situations.
  3. Human-like Understanding: Interprets language, context, and behavior intelligently.
  4. Transferability: Applies knowledge across various domains and tasks.
  5. Self-improvement: Enhances its own capabilities without external intervention.

Why is AGI Important? AGI represents the next frontier in artificial intelligence. It has the potential to revolutionize technology and society by creating machines that think, reason, and learn like humans. However, developing AGI is a major challenge due to its complexity and ethical considerations.

Differences between Artificial General Intelligence (AGI) and Narrow or Weak AI

| Component | AGI (Artificial General Intelligence) | Narrow or Weak AI |
|---|---|---|
| Definition | A machine intelligence capable of performing any intellectual task a human can do. | Designed to perform a specific task or a limited set of tasks. |
| Scope of Abilities | Broad cognitive capabilities; can handle multiple tasks without specialized programming. | Limited to one or a few tasks; cannot adapt beyond its programmed scope. |
| Learning | Learns from experience and applies knowledge across different domains. | Learns only within a narrow context; cannot generalize learning. |
| Adaptability | Can adapt to new challenges and tasks autonomously. | Cannot adapt to new tasks without human intervention or reprogramming. |
| Understanding | Understands context, language nuances, meaning, and behavior like humans. | Operates based on rules and patterns; lacks deeper understanding. |
| Transferability | Applies skills and knowledge across various fields and domains. | Restricted to specific domains; cannot transfer knowledge to unrelated tasks. |
| Self-improvement | Capable of learning and improving independently over time. | Requires external updates or modifications for improvements. |
| Interaction with Humans | Communicates naturally and intelligently, resembling human interaction. | Interaction is rigid, rule-based, and task-specific. |
| Complexity | Highly complex; mirrors human reasoning and decision-making processes. | Relatively simple and task-oriented. |
| Goal | To create machines with human-like cognitive abilities and versatility. | To solve particular problems efficiently without human-like intelligence. |

Img - AI unit 1_6.jpg

Industry Applications of AI

Artificial Intelligence (AI) is being widely applied across industries to improve efficiency, reduce costs, and enhance customer experience.

Table: Industry Applications of AI

| Industry | AI Applications |
|---|---|
| Healthcare | - Medical diagnosis and imaging analysis (e.g., identifying diseases from scans)<br>- Personalized treatment plans<br>- Drug discovery and development<br>- Virtual health assistants and chatbots for patient interaction<br>- Monitoring patient health using wearable devices |
| Finance | - Fraud detection and risk management<br>- Algorithmic trading and investment strategies<br>- Credit scoring and loan approval<br>- Customer service automation using AI-powered chatbots<br>- Predictive analytics for market trends |
| Retail & E-commerce | - Personalized product recommendations<br>- Inventory management and demand forecasting<br>- Visual search and virtual try-on features<br>- Customer service automation<br>- Price optimization |
| Manufacturing | - Predictive maintenance to avoid equipment failure<br>- Quality control and defect detection<br>- Supply chain optimization<br>- Robotics for assembly and packaging<br>- Process automation |
| Automotive | - Autonomous vehicles and driver-assist systems<br>- Traffic pattern analysis for smart navigation<br>- Predictive maintenance<br>- Enhanced safety features using sensors and AI algorithms |
| Education | - Intelligent tutoring systems that personalize learning<br>- Automated grading and assessment<br>- Virtual classrooms and AI-driven content recommendations<br>- Learning analytics for student performance tracking |
| Entertainment & Media | - Content recommendation engines (movies, music, games)<br>- Automated video editing and production<br>- AI-driven storytelling and scriptwriting tools<br>- Enhanced user experience through interactive platforms |
| Energy & Utilities | - Smart grids and energy management<br>- Forecasting energy consumption<br>- Predictive maintenance of infrastructure<br>- Optimization of renewable energy sources |
| Agriculture | - Crop monitoring using drone imagery<br>- Pest and disease detection<br>- Precision farming with sensor data<br>- Automated irrigation and yield prediction |
| Human Resources | - Resume screening and candidate selection<br>- Employee performance analytics<br>- Predictive workforce planning<br>- Training and onboarding automation |

Advantages of AI in Industry:

  • Improved decision-making through data analysis
  • Automation of routine and repetitive tasks
  • Enhanced customer satisfaction through personalization
  • Increased efficiency and reduced operational costs
  • Faster innovation and development cycles

Challenges in AI

Artificial Intelligence offers tremendous benefits, but its development and deployment face several challenges that must be addressed for safe, ethical, and effective use.

Table: Challenges in AI

| Category | Challenges |
|---|---|
| Technical Challenges | - Data Quality and Availability: AI systems require large amounts of high-quality data, which may not always be available or may be biased.<br>- Interpretability: Many AI models, especially deep learning ones, are “black boxes” whose decision-making processes are difficult to understand.<br>- Scalability: Developing AI systems that perform reliably across different environments and large datasets is complex.<br>- Robustness: AI models can be sensitive to small changes in input, leading to incorrect or unsafe outcomes.<br>- Integration: Incorporating AI into existing systems and workflows can be technically challenging. |
| Ethical Challenges | - Bias and Fairness: AI can inherit biases from training data, leading to unfair or discriminatory outcomes.<br>- Privacy: Collecting and using personal data for AI can compromise individual privacy rights.<br>- Transparency: There is a need for clear explanations of how AI makes decisions to ensure accountability.<br>- Accountability: Determining who is responsible when AI systems cause harm or make mistakes is complex. |
| Social Challenges | - Job Displacement: Automation may replace human jobs, leading to unemployment and economic disparity.<br>- Trust: Users may be reluctant to trust AI systems due to fears about reliability, control, and misuse.<br>- Security: AI systems can be vulnerable to attacks, such as adversarial inputs or data manipulation.<br>- Access Inequality: Advanced AI technologies may be accessible only to wealthier organizations or countries, increasing global inequality. |
| Regulatory Challenges | - Lack of Standards: AI development often lacks standardized frameworks, making governance and safety regulation difficult.<br>- Legal Frameworks: Existing laws may not cover AI’s unique risks, requiring new policies and legislation.<br>- Cross-border Coordination: AI impacts are global, requiring cooperation between governments and industries. |
| Environmental Challenges | - High Energy Consumption: Training large AI models demands significant computational power, contributing to carbon emissions.<br>- Sustainability: Efficient use of resources and energy management in AI systems is still a major concern. |

Knowledge Engineering

Knowledge Engineering is a branch of Artificial Intelligence (AI) and Computer Science dedicated to the design, development, and maintenance of knowledge-based systems. Its primary objective is to capture, represent, organize, and utilize human expertise and domain-specific knowledge in a machine-readable form. This structured knowledge enables intelligent systems to reason, make informed decisions, and solve complex problems within specialized domains.

Steps in Knowledge Engineering

  1. Knowledge Acquisition – The process of gathering expertise from domain specialists and converting it into a form understandable by computers. Techniques include expert interviews, surveys, observations, and analysis of existing documents or case studies.
  2. Knowledge Representation – Organizing and structuring acquired knowledge in a machine-interpretable format. Ontologies, semantic networks, rules, frames, and logic-based models are common representation methods.
  3. Knowledge Integration – Combining knowledge from different sources such as structured databases, unstructured text, and expert rules into a unified knowledge base.
  4. Inference and Reasoning – Developing algorithms and mechanisms that allow the system to draw conclusions, make inferences, and apply logical reasoning.
  5. Maintenance and Refinement – Continuously updating and improving the knowledge base to keep pace with domain changes. This ensures accuracy, relevance, and adaptability of the system over time.
  6. Verification and Validation – Checking that knowledge has been correctly captured and represented, and ensuring that the system’s results are consistent with real-world expectations and expert judgment.
  7. Deployment and Interaction – Integrating the knowledge-based system into real-world applications, and designing user-friendly interfaces that allow users to query the system and receive accurate, meaningful responses. A minimal sketch of steps 2 and 4 follows this list.
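
To make knowledge representation (step 2) and inference (step 4) concrete, here is a minimal forward-chaining sketch in Python. The if-then rules and facts are invented purely for illustration, not taken from a real expert system.

```python
# Minimal forward-chaining inference over if-then rules.
# Rules map a set of required conditions to a conclusion;
# the medical-style rules and facts below are invented for illustration.

rules = [
    ({"has_fever", "has_cough"}, "possible_flu"),
    ({"possible_flu", "high_risk_patient"}, "refer_to_doctor"),
]

facts = {"has_fever", "has_cough", "high_risk_patient"}

# Keep applying rules until no new conclusions can be drawn.
changed = True
while changed:
    changed = False
    for conditions, conclusion in rules:
        if conditions <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print(facts)  # now includes 'possible_flu' and 'refer_to_doctor'
```

Real knowledge-based systems use richer representations (ontologies, frames, logic programs), but the acquire-represent-infer cycle is the same.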

Img - AI unit 1_10.jpg

Machine Learning

Machine Learning (ML) is a branch of Artificial Intelligence (AI) that allows computers to automatically learn from data and enhance their performance on tasks without being explicitly programmed. In other words, ML systems identify patterns in data and use them to make predictions or decisions.

Here’s Tom Mitchell’s widely cited definition of Machine Learning: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

Components:

  • T = Task (what the program is supposed to do)
  • P = Performance measure (how we evaluate the model)
  • E = Experience (data or feedback used for learning)

Example:

  • Task (T) = Playing chess
  • Performance (P) = Win rate against opponents
  • Experience (E) = Games played

Types of Machine Learning

The following table presents the different types of machine learning and their applications.

Table: Types of Machine Learning

| Type | Definition | Example | Applications |
|---|---|---|---|
| 1. Supervised Learning | The model learns from labeled data (input–output pairs) to make predictions. | Classification, prediction (regression) | - Email spam detection<br>- Stock price prediction<br>- Medical diagnosis |
| 2. Unsupervised Learning | The model learns patterns from unlabeled data, finding structure or relationships. | Clustering | - Customer segmentation<br>- Market basket analysis<br>- Anomaly detection |
| 3. Reinforcement Learning | The model learns by trial and error, receiving rewards or penalties for actions. | Training a robot to navigate a maze | - Robotics<br>- Game AI (e.g., AlphaGo)<br>- Self-driving cars |
| 4. Semi-Supervised Learning | Uses both labeled and unlabeled data to improve learning. | Classifying web content when only some pages are labeled | - Text classification<br>- Image recognition with limited labels |
| 5. Self-Supervised Learning | The model generates labels from the input data itself to learn representations. | Predicting missing words in a sentence (used in NLP models like GPT) | - Natural Language Processing (NLP)<br>- Computer vision<br>- Speech recognition |

Classification is a type of supervised learning task where the goal is to predict the category or class label of new observations based on past data.

It involves two phases:

  1. Training Phase: The algorithm learns from a labeled dataset where each input has a known class.
  2. Prediction Phase: The trained model predicts the class for new, unseen data.

| Example | Input Features | Class Labels |
|---|---|---|
| Email spam detection | Email content, sender, subject | Spam / Not Spam |
| Disease diagnosis | Symptoms, age, test results | Disease / No Disease |
| Credit card fraud detection | Transaction amount, location | Fraud / Not Fraud |
| Handwritten digit recognition | Pixel values of the image | 0, 1, 2, …, 9 |

Common Classification Algorithms:

  • Decision Tree
  • Random Forest
  • Support Vector Machine (SVM)
  • k-Nearest Neighbors (k-NN)
  • Naive Bayesian Classification
  • Artificial Neural Networks (ANN)
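
To illustrate the two phases, here is a minimal classification sketch using scikit-learn (assuming it is installed). The decision tree and the tiny spam-style dataset, with invented features such as link count and capitalized-word count, are purely for illustration.

```python
# Minimal supervised classification sketch with scikit-learn.
# Each input is [number of links, number of capitalized words] in an email;
# the data values are made up for illustration.
from sklearn.tree import DecisionTreeClassifier

X_train = [[8, 20], [7, 15], [0, 1], [1, 2]]   # inputs with known classes
y_train = ["spam", "spam", "not spam", "not spam"]

model = DecisionTreeClassifier()
model.fit(X_train, y_train)                    # training phase

print(model.predict([[6, 18]]))                # prediction phase -> ['spam']
```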

Regression is a supervised learning task used to predict continuous numeric values from input features, such as predicting house prices from size.
Regression involves two phases: training and prediction.

  • In the training phase, the model learns from a labeled dataset where each input has a known numeric output.
  • In the prediction phase, the trained model predicts the numeric value for new, unseen inputs.

| Example | Input Features | Output (Continuous Value) |
|---|---|---|
| House price prediction | Area, location, number of rooms | Price in dollars |
| Temperature forecasting | Humidity, pressure, time | Temperature in °C |
| Stock price prediction | Historical prices, market trends | Stock price in dollars |
| Car resale value estimation | Age, mileage, brand | Price in dollars |

The most important regression algorithms are:

  1. Simple linear regression – One independent variable, one response variable
  2. Multiple linear regression – Two or more independent variables, one response variable
  3. Non-linear regression – Models relationships between variables that are not linear
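
A minimal simple linear regression sketch using scikit-learn, with an invented house-price dataset (area in square feet as the single independent variable):

```python
# Minimal simple linear regression sketch with scikit-learn.
# Toy data: house area (sq. ft.) -> price (dollars); values are invented.
from sklearn.linear_model import LinearRegression

X = [[500], [1000], [1500], [2000]]        # one independent variable (area)
y = [100000, 180000, 260000, 340000]       # continuous response (price)

model = LinearRegression()
model.fit(X, y)                            # training phase

print(model.predict([[1200]]))             # predicted price for 1200 sq. ft.
```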

Clustering

Clustering is an unsupervised learning technique that aims to group similar data points based on their features, without relying on predefined labels.

  • Input: Unlabeled data
  • Output: Groups or clusters of similar items

The algorithm examines the data, measures the similarity between data points, and assigns them to clusters so that points in the same cluster are more similar to each other than to points in other clusters.

Examples of Clustering Tasks

| Example | Input Features | Output (Clusters) |
|---|---|---|
| Customer segmentation | Age, income, buying behavior | High-value, medium-value, low-value customers |
| Document grouping | Text content, keywords | Sports, politics, technology documents |
| Image segmentation | Pixel values, color, texture | Different objects or regions in images |
| Market analysis | Purchase history, demographics | Customer groups with similar preferences |

Img - AI unit 1_13.jpg

Important Clustering Algorithms

  • K-Means Clustering
  • Hierarchical Clustering
  • DBSCAN (Density-Based Spatial Clustering)
  • Gaussian Mixture Models (GMM)
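
A minimal K-Means sketch with scikit-learn, clustering invented customer records by age and income; the choice of three clusters is an illustrative assumption.

```python
# Minimal K-Means clustering sketch with scikit-learn.
# Toy customer data: [age, annual income in $1000s]; values are invented.
from sklearn.cluster import KMeans

X = [[25, 30], [27, 32], [45, 80], [47, 85], [60, 40], [62, 38]]

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)             # no labels given: unsupervised

print(labels)                              # cluster index for each customer
print(kmeans.cluster_centers_)             # centers of the learned clusters
```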

Artificial Neurons

An artificial neuron (also called a perceptron) is the basic computational unit of a neural network, inspired by the biological neuron in the human brain. It receives one or more inputs, processes them, and produces an output based on a function.

Structure of an Artificial Neuron:

(Inputs x₁, x₂, … xₙ → weights w₁, w₂, … wₙ → Summation → Activation Function → Output)
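
A minimal sketch of this structure in Python: a weighted sum plus a bias, passed through a step activation. The weights and bias below are hand-picked assumptions that happen to implement a logical AND gate.

```python
# A single artificial neuron: weighted sum of inputs plus a bias,
# passed through a step activation function.
def neuron(inputs, weights, bias):
    s = sum(x * w for x, w in zip(inputs, weights)) + bias  # summation
    return 1 if s >= 0 else 0                               # step activation

# Hand-picked weights/bias (illustrative assumption): behaves as an AND gate.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", neuron([x1, x2], weights=[1, 1], bias=-1.5))
```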

Differences Between Biological and Artificial Neurons

| Feature | Biological Neuron | Artificial Neuron |
|---|---|---|
| Basic Unit | Nerve cell in the brain or nervous system | Computational unit in an artificial neural network |
| Signal Type | Electrical (action potentials) and chemical (neurotransmitters) | Numerical values (real numbers) |
| Structure | Dendrites, cell body (soma), axon, synapses | Inputs, weights, bias, activation function, output |
| Input Handling | Receives signals from thousands of other neurons through dendrites | Receives multiple weighted inputs from other neurons or features |
| Processing | Non-linear integration of signals in the soma | Computes a weighted sum of inputs and applies an activation function |
| Output | Action potential transmitted via axon to other neurons | Output value sent to next layer or final result |
| Learning Mechanism | Synaptic plasticity: strengthens or weakens connections (Hebbian learning, LTP/LTD) | Adjusts weights and biases through optimization algorithms (e.g., gradient descent) |
| Communication Speed | Slower (~milliseconds per signal) | Very fast (microseconds per computation) |
| Energy Source | Metabolic energy (ATP) | Electrical energy in a computer |
| Complexity | Highly complex; can self-organize, repair, and adapt | Simpler mathematical model, fully controlled by code and data |
| Flexibility | Can handle ambiguous and incomplete information naturally | Requires training data and a defined network structure |

A single-layer perceptron can solve linearly separable problems such as AND and OR.
It cannot solve linearly inseparable problems such as XOR.
Solving XOR requires a multilayer perceptron.

Multilayer Perceptron (MLP)

A Multilayer Perceptron (MLP) is an artificial neural network composed of several layers of interconnected neurons (nodes) arranged in a feedforward structure.
It has:

  • one input layer
  • one or more hidden layers
  • one output layer

The Backpropagation algorithm is used to train multilayer perceptron networks.
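
A minimal sketch using scikit-learn's MLPClassifier: one small hidden layer learns the XOR function that no single-layer perceptron can represent. The hyperparameters are illustrative choices, not prescribed values.

```python
# An MLP learning XOR, which is not linearly separable.
# Hyperparameters are illustrative; a different random seed may be
# needed for convergence on some scikit-learn versions.
from sklearn.neural_network import MLPClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]                              # XOR truth table

mlp = MLPClassifier(hidden_layer_sizes=(4,), activation="tanh",
                    solver="lbfgs", max_iter=2000, random_state=1)
mlp.fit(X, y)                                 # weights fit via backpropagated gradients

print(mlp.predict(X))                         # expected: [0 1 1 0]
```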

Structure of an MLP

Deep Learning

Deep Learning (DL) is a type of Machine Learning that uses multi-layered neural networks to automatically learn features from large and complex data such as images, audio, and text, without manual feature engineering.

The following table represents different deep learning architectures:

Img - AI unit 1_15.jpg

Reinforcement Learning (RL)

Reinforcement Learning is a branch of machine learning in which an agent learns to make decisions by interacting with an environment, aiming to maximize cumulative rewards. Unlike supervised learning, it does not rely on labeled data; instead, the agent improves its performance through trial and error.

The following table represents key components in reinforcement learning:

| Component | Description |
|---|---|
| Agent | The learner or decision maker. |
| Environment | The world the agent interacts with. |
| State (s) | The current situation of the agent in the environment. |
| Action (a) | Choices the agent can make in a given state. |
| Reward (r) | Feedback received after taking an action; can be positive or negative. |

Img - AI unit 1_16.jpg

Key Components of Reinforcement Learning

| Component | Description |
|---|---|
| Policy (π) | Strategy used by the agent to decide actions based on states. |
| Value Function (V or Q) | Estimates the expected reward from a state (V) or state–action pair (Q). |
| Model | Optional; predicts the next state and reward given the current state and action. |

Working Procedure of Reinforcement Learning

  1. The agent observes the current state.
  2. Based on its policy, it selects an action (a).
  3. The environment responds with a reward (r) and the next state (s’).
  4. The agent updates its policy or value function to improve future decisions.
  5. Repeat until the agent learns an optimal policy.
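
The sketch below walks this loop on an invented toy problem: a five-state corridor where the agent is rewarded for reaching the rightmost state, learned with tabular Q-Learning (introduced in the next subsection). All hyperparameters are illustrative assumptions.

```python
# Minimal tabular Q-Learning sketch on an invented 5-state corridor:
# states 0..4; actions: 0 = left, 1 = right; reward +1 for reaching state 4.
import random

n_states, n_actions = 5, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1   # learning rate, discount, exploration

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
    reward = 1.0 if nxt == n_states - 1 else 0.0   # goal: reach state 4
    return nxt, reward

for episode in range(200):
    s = 0
    while s != n_states - 1:
        # step 2: select an action with an epsilon-greedy policy
        a = random.randrange(n_actions) if random.random() < epsilon else Q[s].index(max(Q[s]))
        # step 3: the environment returns the reward and the next state
        s2, r = step(s, a)
        # step 4: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# Learned policy: best action per non-terminal state (all 1 = go right).
print([max(range(n_actions), key=lambda a: Q[s][a]) for s in range(n_states - 1)])
```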

Types of Reinforcement Learning

  1. Model-Free RL – The agent learns without knowing the environment’s dynamics. Examples:
    • Q-Learning
    • SARSA
  2. Model-Based RL – The agent tries to learn a model of the environment and uses it for planning.

Popular RL Algorithms

| Category | Algorithm |
|---|---|
| Value-Based | Q-Learning, Deep Q-Networks (DQN) |
| Policy-Based | REINFORCE, Policy Gradient |
| Actor-Critic | A3C, PPO, DDPG |

Applications of RL:

  • Gaming: AlphaGo, Chess, Atari games
  • Robotics: Robot navigation, manipulation
  • Finance: Portfolio management, trading
  • Healthcare: Treatment planning, drug discovery
  • Autonomous Vehicles: Self-driving car decision making.

Img - AI unit 1_17.jpg

Computer Vision

Computer Vision is a field of artificial intelligence and computer science that enables computers to interpret, analyze, and understand visual information from the world, such as images or videos, in a way similar to human vision.
The goal is to automate tasks that the human visual system can perform.

Key Tasks in Computer Vision

| Task | Description |
|---|---|
| Image Classification | Assigning a label to an entire image (e.g., cat, dog). |
| Object Detection | Identifying and locating objects in an image with bounding boxes. |
| Image Segmentation | Dividing an image into meaningful regions (semantic or instance segmentation). |
| Face Recognition | Identifying or verifying a person from facial images. |
| Optical Character Recognition (OCR) | Converting printed or handwritten text into machine-readable text. |
| Pose Estimation | Detecting human body keypoints and posture. |
| Image Generation / Enhancement | Tasks like super-resolution, image inpainting, and style transfer. |

Object Detection is a computer vision task focused on not only identifying the class of objects in an image or video but also locating them. Unlike image classification, which assigns a single label to the entire image, object detection provides both the category of each object and its spatial location using bounding boxes.

Example Use Cases of Object Detection

| Application Area | Example Use Case | Companies / Countries Using It |
|---|---|---|
| Autonomous Vehicles | Detecting pedestrians, vehicles, traffic signs | Tesla Autopilot (USA), Waymo (USA), Baidu Apollo (China) |
| Security & Surveillance | Monitoring public spaces for intruders | Airports & banks globally, Hikvision (China), Dahua (China) |
| Retail & Inventory Management | Product detection, automated checkout | Amazon Go (USA), Walmart smart shelves (USA), JD.com (China) |
| Healthcare & Medical Imaging | Detecting tumors or abnormalities in scans | IBM Watson Health (USA), Zebra Medical Vision (Israel), Aidoc (Israel) |
| Industrial Automation & Robotics | Detecting defects, sorting objects | Siemens (Germany), FANUC (Japan), ABB Robotics (Sweden/Switzerland) |
| Agriculture | Crop disease detection, yield estimation | Drone-based monitoring in USA, Netherlands, India |
| Augmented Reality & Gaming | Overlaying virtual elements on real objects | Pokémon Go (Global), IKEA Place (Global) |
| Wildlife & Environmental Monitoring | Counting animals, detecting poaching | African safari reserves, WWF projects, global camera traps |

Face Recognition

Face Recognition is a computer vision technology that identifies or verifies a person by analyzing facial features from an image or video. It matches the detected face against a database to recognize the individual.

Components of Face Recognition

| Component | Description |
|---|---|
| Face Detection | Locating a face within an image or video frame. |
| Feature Extraction | Measuring unique facial characteristics like distance between eyes, nose shape, or jawline. |
| Face Matching / Recognition | Comparing extracted features with a database to identify or verify a person. |

Applications of Face Recognition

  • Security & Surveillance: Airport security, public safety monitoring
  • Smartphones & Devices: Unlocking phones using Face ID
  • Banking & Payments: Biometric authentication for transactions
  • Social Media: Automatic tagging of people in photos
  • Law Enforcement: Identifying suspects or missing persons

Scene Understanding

Scene Understanding is a computer vision task where a system interprets an entire scene in an image or video, identifying objects, relationships, spatial layout, and context to understand what is happening.
It goes beyond object detection to analyze the overall environment and interactions.

Components of Scene Understanding

| Component | Description |
|---|---|
| Object Detection | Identifying individual objects within the scene. |
| Semantic Segmentation | Classifying each pixel in the image according to object type or region (e.g., road, sky, car). |
| Instance Segmentation | Differentiating multiple instances of the same object type. |
| Contextual Understanding | Recognizing relationships and interactions between objects (e.g., a person riding a bicycle). |
| Scene Classification | Determining the overall type of scene (e.g., beach, city street, forest). |

Applications of Scene Understanding

  • Autonomous Vehicles: Understanding traffic scenes, predicting pedestrian and vehicle behavior
  • Robotics: Navigation and manipulation in complex environments
  • Surveillance: Detecting unusual or suspicious activities in public spaces
  • Augmented Reality: Accurately overlaying virtual objects in real-world scenes
  • Smart Cities: Monitoring urban environments for traffic and crowd analysis

Medical Imaging

Medical Imaging refers to techniques and processes used to create visual representations of the interior of the body for clinical analysis, diagnosis, and treatment planning. AI and computer vision enhance medical imaging by automatically analyzing images, detecting anomalies, and assisting healthcare professionals.

Natural Language Processing (NLP)

Natural Language Processing (NLP) is a branch of artificial intelligence and linguistics that enables computers to comprehend, interpret, generate, and interact with human language, effectively bridging the gap between human communication and machine understanding.

Text Pre-processing

Text pre-processing is an essential step in NLP. Common techniques include:

  • Lowercasing: Converting text to lowercase to ensure consistency and avoid case-related duplication (e.g., computer and Computer).
  • Tokenization: Splitting text into individual words or tokens (e.g., ‘and’, ‘the’).
  • Stop Word Removal: Eliminating common, uninformative words that don’t add meaning to the analysis.
  • Punctuation Removal: Removing punctuation marks that are often irrelevant in many NLP tasks.
  • Numerical and Special Character Removal: Removing numbers and other non-alphabetic characters based on the analysis need.
  • Whitespace Trimming: Removing unnecessary spaces, tabs, or line breaks.
  • Lemmatization and Stemming: Reducing words to their base or root form (e.g., running → run), consolidating related words.
  • Spell Checking and Correction: Identifying and correcting spelling errors.
  • Handling Contractions and Abbreviations: Expanding contractions (e.g., can’t → cannot) and standardizing abbreviations.
  • Handling HTML Tags: Removing or stripping HTML tags in text data.
  • Text Normalization: Standardizing text formats, such as converting dates to a consistent format.
  • Removing or Masking Personal Identifiable Information (PII): Replacing or removing sensitive information like names, addresses, or social security numbers for privacy and compliance.
  • Removing URLs and Email Addresses: Eliminating URLs and email addresses that may not be relevant for analysis.
  • Text Segmentation: Splitting text into segments or paragraphs as required by analysis tasks.
  • Sentence and Document Length Normalization: Ensuring uniform sentence/document lengths for tasks such as text classification.
  • Encoding and Decoding: Converting text between different character encodings whenever necessary.
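
A minimal sketch of a few of these steps (lowercasing, punctuation removal, tokenization, stop word removal) in plain Python; the tiny stop word set is an illustrative subset, not a standard list.

```python
# Minimal text pre-processing pipeline in plain Python.
import re

STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to"}  # illustrative subset

def preprocess(text):
    text = text.lower()                      # lowercasing
    text = re.sub(r"[^a-z\s]", " ", text)    # punctuation/number removal
    tokens = text.split()                    # tokenization (on whitespace)
    return [t for t in tokens if t not in STOP_WORDS]  # stop word removal

print(preprocess("The Computer is learning, and THE computer improves!"))
# -> ['computer', 'learning', 'computer', 'improves']
```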

Important Tasks in Natural Language Processing

| Task | Description |
|---|---|
| Text Classification | Categorizing text into predefined labels (e.g., spam detection, sentiment analysis). |
| Named Entity Recognition (NER) | Identifying entities such as names, locations, dates, and organizations in text. |
| Part-of-Speech (POS) Tagging | Determining the grammatical role of each word in a sentence. |
| Machine Translation | Translating text from one language to another (e.g., English → French). |
| Question Answering | Extracting answers from text based on a given question. |
| Text Summarization | Producing concise summaries from longer documents. |
| Sentiment Analysis | Detecting emotions or opinions expressed in text. |
| Speech Recognition & Generation | Converting speech to text (ASR) or text to speech (TTS). |
| Dialogue Systems / Chatbots | Understanding and generating human-like conversational responses. |

Text Classification

Text classification is the process of assigning predefined labels or categories to a given piece of text.

  • In sentiment analysis, text is classified as positive, negative, or neutral based on the expressed opinion.
  • In topic classification, text is categorized into specific subjects or domains, making it easier to organize and manage information.

A simple sentiment analysis model takes text as input and outputs a label (e.g., positive, negative, neutral).
Although basic models exist, real-world applications often require more advanced techniques.

Examples of Sentiment Analysis:

“Very good, solid, good balance, comfortable… loved it” → Positive

Example of Topic Classification: Categorizing a news article under topics like Politics, Sports, or Technology for better organization.
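
A minimal sentiment classification sketch using scikit-learn: a bag-of-words representation feeding a Naive Bayes classifier. The four training reviews are invented, and a real system would need far more data.

```python
# Minimal sentiment classifier: bag-of-words + Naive Bayes (scikit-learn).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["loved it, very comfortable", "solid and good balance",
         "terrible quality, broke fast", "very bad, do not buy"]
labels = ["positive", "positive", "negative", "negative"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)                           # training phase

print(model.predict(["good balance, loved it"]))   # -> ['positive']
```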

Named Entity Recognition (NER)

Named Entity Recognition (NER) is a technique in Natural Language Processing (NLP) that focuses on identifying and extracting specific entities from text, such as:

  • names of people
  • organizations
  • locations
  • dates
  • times
  • other predefined categories

NER plays a key role in information extraction and enhances the overall contextual understanding of text by machines.

Examples of NER

  1. “The company TCS was founded in 1968.” NER identifies TCS as an organization and 1968 as a date.
  2. “The meeting is going to be held at 10:00 AM today.” NER identifies 10:00 AM as a time and today as a date.
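
A minimal NER sketch using the spaCy library, assuming it is installed along with its small English model (installed via python -m spacy download en_core_web_sm):

```python
# Minimal NER sketch with spaCy's small pretrained English model.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The company TCS was founded in 1968.")

for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g., TCS ORG / 1968 DATE
```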

Applications of NER

  • Information Extraction: Retrieves details such as people, organizations, and locations for tasks like building knowledge bases or generating reports.
  • Machine Translation: Improves translation accuracy by correctly recognizing named entities in the source text.
  • Text Summarization: Enhances summarization by identifying key entities and ensuring they are properly represented.

NER is essential in many NLP systems, including:

  • chatbots
  • sentiment analysis tools
  • search engines

It is widely used in domains requiring structured insights from unstructured text.

Question Answering (QA)

Question Answering (QA) systems are designed to automatically provide answers to questions posed in natural language.
These systems analyze the query, search for relevant information, and generate appropriate responses. QA systems can retrieve answers from:

  • databases
  • documents
  • knowledge bases
  • real-time sources

Types of QA Tasks

  1. Extractive QA – The system extracts the exact answer directly from the given text.
     Example: “What is the capital of India?” Answer: New Delhi
  2. Abstractive QA – The system generates answers in its own words, not limited to the original text.
     Example: “What is the meaning of life?” Answer: a reflective or philosophical response.

Classifications Based on Scope

  • Closed-domain QA: Focuses on questions within a specific field (e.g., medicine, law).
  • Open-domain QA: Handles questions across diverse topics without domain restrictions.
  • Knowledge-base QA: Uses structured knowledge sources (e.g., DBpedia, Freebase) to answer fact-based queries.

Applications of QA Systems

  • Search engines: Provide direct answers to user queries.
  • Virtual assistants: Power systems like Amazon Alexa and Google Assistant.
  • Knowledge bases: Help users query structured information (product catalogs, medical databases).

QA systems are also used in platforms like Quora and in Kaggle competitions (e.g., identifying duplicate questions).
Modern large language models (LLMs) like ChatGPT combine retrieval and generative approaches to produce high-quality answers.

Machine Translation (MT)

Machine Translation is the process of automatically converting text or speech from one language to another.
It works by understanding the input language, creating an intermediate form, and then producing the translated output in the target language.

Three Main Approaches to Machine Translation:

  1. Rule-based MT: Uses grammar rules and dictionaries of both languages to perform translation.
  2. Statistical MT: Learns translation patterns by analyzing large collections of parallel texts in two languages.
  3. Neural / AI-based MT: Uses artificial neural networks to learn complex language patterns and produce more natural translations.

Example: Governments (in India, for instance) promote machine translation to support digital empowerment and digital inclusion. Initiatives like Bhasha Daan encourage citizens to contribute open-source language datasets, and global tools like Google Translate support more than 120 languages.

Text Generation

Text Generation is the process of automatically creating meaningful and natural-sounding text.
It can handle simple tasks (product descriptions) as well as advanced ones (story writing, language modelling).

Approaches to Text Generation

  1. Rule-based methods: Follow predefined grammar rules.
  2. Statistical methods: Use patterns learned from large text datasets.
  3. Neural / AI-based methods: Use deep learning models to produce fluent, human-like text.

Common Uses of Text Generation

  • Chatbots: AI programs simulating conversations with humans; widely used in customer support.
  • Content creation: Automatically generating articles, blogs, or social media posts to help produce engaging content quickly.

Text Summarization

Text Summarization is the process of creating short and clear summaries of long texts while keeping the main ideas and important details. It helps people quickly understand large amounts of information.

Types of Summarization

  1. Extractive Summarization
  • Picks important sentences or phrases directly from the original text.
  • Keeps the exact wording but may sound less smooth or natural.

Img - AI unit 1_24.jpg

  2. Abstractive Summarization
  • Creates new sentences that may not exist in the original text.
  • Rewrites the content in a shorter and more natural way.
  • Produces more human-like summaries, but is harder to do.

Like machine translation, summarization can use rules, statistical methods, or AI-based techniques; a minimal extractive sketch follows.
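
A minimal extractive summarization sketch in plain Python: sentences are scored by the frequency of the words they contain, and the top-scoring ones are kept in their original order. The scoring scheme is a simple illustrative choice, not a production method.

```python
# Minimal extractive summarizer: frequency-based sentence scoring.
import re
from collections import Counter

def summarize(text, n_sentences=2):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())   # sentence split
    freq = Counter(re.findall(r"[a-z']+", text.lower()))   # word frequencies
    # Score each sentence by the total frequency of its words.
    scores = [(sum(freq[w] for w in re.findall(r"[a-z']+", s.lower())), i, s)
              for i, s in enumerate(sentences)]
    # Keep the n best sentences, restored to their original order.
    top = sorted(sorted(scores, reverse=True)[:n_sentences], key=lambda t: t[1])
    return " ".join(s for _, _, s in top)

print(summarize("AI is changing industry. AI systems learn from data. "
                "Many tools exist. Data quality matters for AI systems."))
```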

Common Uses of Text Summarization

  • News summarization: Quick summaries of news articles for easy reading.
  • Document summarization: Short versions of long research papers, reports, or documents.
  • Social media summarization: Condensing long posts or discussions into key points.
  • Content curation: Combining multiple sources into a single concise overview.

Text summarization helps people save time and easily find important information in large volumes of text.

Robotics

Robotics is a branch of science and engineering focused on designing, building, and using robots.
Robots are programmable machines that can perform tasks automatically or with human guidance.

Robotics integrates knowledge from:

  • mechanical engineering
  • electrical engineering
  • computer science
  • artificial intelligence

Key Components of Robotics

  1. Sensors – Help robots sense surroundings (e.g., cameras, microphones, touch sensors).
  2. Actuators – Parts that move the robot (e.g., motors, wheels, arms).
  3. Control System – The “brain” of the robot; processes information and makes decisions.
  4. Power Supply – Provides energy for the robot to operate.

Types of Robots

  • Industrial Robots: Used in factories for manufacturing, welding, and assembly.
  • Service Robots: Used in sectors like healthcare, cleaning, and customer service.
    Img - AI unit 1_25.jpg
  • Military Robots: Used for surveillance, bomb disposal, and defense.
  • Humanoid Robots: Designed to look or behave like humans.
  • Autonomous Robots: Self-driving cars and drones that operate with minimal human input.

Applications of Robotics

  • Manufacturing: Automating repetitive tasks in industries.
  • Healthcare: Assisting in surgeries, rehabilitation, and patient care.
  • Exploration: Space rovers (e.g., on Mars) and deep-sea robots.
  • Agriculture: Automated harvesting, planting, and monitoring crops.
  • Household: Robotic vacuum cleaners and personal assistants.
  • World domination: a popular conspiracy theory claims that robots will one day take over the world from their human creators.

Robotics has advanced rapidly with AI and machine learning, making robots more intelligent, adaptive, and capable of working alongside humans.

Comparison of Humanoid Robots

| Robot | Developer | Year | Height | Purpose | Mobility | AI / Interaction |
|---|---|---|---|---|---|---|
| ASIMO | Honda | 2000 | 130 cm | Research, assistance | Walks, runs, climbs stairs | Recognizes faces, voices, interacts with humans |
| Atlas | Boston Dynamics | 2013 | 150–180 cm | Research, disaster response | Walks, runs, jumps, parkour | Limited AI; focus on navigation and manipulation |
| Pepper | SoftBank Robotics | 2014 | 120 cm | Customer service, social interaction | Wheels, limited movement | Recognizes emotions, talks, guides people |
| Nao | SoftBank Robotics | 2006 | 58 cm | Education, research | Walks, dances, gestures | Programmable for interaction and teaching |
| Sophia | Hanson Robotics | 2016 | 165 cm | Social interaction, AI research | Limited mobility | Conversational AI, facial expressions, emotion recognition |
| iCub | Italian Institute of Technology | 2004 | 104 cm | Cognitive research | Walks, manipulates objects | Learns via exploration and interaction |
| Robonaut 2 (R2) | NASA / GM | 2011 | 180 cm | Space station assistance | Works in microgravity | Teleoperated with some autonomy |
