NATIONAL INSTITUTE OF TECHNOLOGY, TIRUCHIRAPPALLI
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
COURSE PLAN – PART I
Name of the programme and specialization: B.Tech. Computer Science and Engineering
Course Title: Data Science
Course Code: CSPE81
No. of Credits: 3
Course Code of Pre-requisite subject(s): NIL
Session: January 2023
Section (if applicable): A&B, VIII Semester
Name of Faculty: Dr. M. Brindha
Department: CSE
Email: brindham@nitt.edu
Telephone No.: 9944627902
Name of Course Coordinator(s) (if applicable): NA
E-mail:
Telephone No.:
Course Type: Elective Course
Syllabus (approved in Senate)
UNIT I
Introduction to data science - case for data science - data science classification - data science algorithms - Data Science Process - prior Knowledge - Data Preparation - Modeling - Application - Knowledge - Data Exploration - Objectives of data Exploration - Datasets - Descriptive Statistics - Data Visualization - Roadmap for data exploration.
UNIT II
Natural language Processing basics - Language Syntax and Structure - Language Semantics - Natural language Processing - Text Analytics - Text Preprocessing and Wrangling - Understanding Text Syntax and Structure - Feature Engineering for Text Representation - Traditional Feature Engineering Models - bag of words model - bag of N-Grams model - TF-IDF Model - Topic Models - Text Classification - Automated Text Classification - Text Classification Blueprint - Classification Models - Multinomial Naïve Bayes - Logistic Regression - Support Vector Machines - Ensemble Models - Random Forest - Gradient Boosting Machines - Evaluating Classification Models.
UNIT III
Text Similarity and clustering - Essential Concepts - Analyzing term Similarity - Analyzing Document Similarity - Document Clustering - Feature Engineering - K-means Clustering - Affinity Propagation - Ward’s Agglomerative Hierarchical Clustering - Semantic Analysis - Exploring Wordnet - Word Sense Disambiguation - Named Entity Recognition - Analyzing Semantic Representations - Sentiment Analysis - Unsupervised Lexicon-Based Models - Bing Liu’s Lexicon - MPQA Subjectivity Lexicon - Pattern Lexicon - TextBlob Lexicon - AFINN Lexicon - SentiWordNet Lexicon - VADER Lexicon - Classifying Sentiment with Supervised Learning.
UNIT IV
Speech - Phonetics - Speech Sounds and Phonetic Transcription - Articulatory Phonetics - Phonological Categories and Pronunciation variation - Acoustic Phonetics and Signals - Speech Synthesis - Phonetic Analysis - Prosodic Analysis - Diphone Waveform synthesis - Automatic Speech Recognition - Speech Recognition Architecture - Applying Hidden Markov Model to Speech - Feature Extraction: MFCC Vectors - Computing Acoustic Likelihoods - The Lexicon and Language Model - Search and Decoding.
UNIT V
Time series Forecasting - Time series Decomposition - Smoothing based Methods - Regression based Methods - Machine Learning Methods - Performance evaluation - Anomaly Detection - Concepts - Distance based outlier Detection - Density based outlier Detection - Local outlier factor - Feature Selection - Classifying feature selection Methods - Principal Component Analysis - Information theory based filtering - chi-square based filtering - Wrapper-type feature selection.
COURSE OBJECTIVES
➢ To understand the data science process and exploration
➢ To learn Machine learning algorithms
➢ To gain knowledge of types of learning, processes, techniques and models
➢ To know about the research that requires the integration of large amounts of data
COURSE OUTCOMES (CO)
➢ Understand the data science concepts, techniques and models
➢ Forecast the time series data
➢ Build recommendation systems
➢ Learn and apply different mining algorithms and recommendation systems for large volumes of data
➢ Perform analytics on data streams
Course Outcome (CO) | Aligned Programme Outcome
Understand the data science concepts, techniques and models | 2, 5, 9
Forecast the time series data | 1, 11
Build recommendation systems | 1, 3, 6, 12
Learn and apply different mining algorithms and recommendation systems for large volumes of data | 2, 3, 5, 6, 9
Perform analytics on data streams | 1, 3, 6, 12
COURSE PLAN – PART II
COURSE OVERVIEW
This course mainly describes the concepts and techniques of Data science for Business applications.
COURSE TEACHING AND LEARNING ACTIVITIES
S.No. Week Topic Mode of Delivery
1. I Week
Introduction to data science - case for data science - data science classification - data science algorithms - Data Science Process - prior Knowledge - Data Preparation
PPT/Chalk
2. II Week
Modeling - Application - Knowledge - Data Exploration - Objectives of data Exploration - Datasets - Descriptive Statistics - Data Visualization - Roadmap for data exploration
PPT/Chalk
3. III Week
Natural language Processing basics - Language Syntax and Structure - Language Semantics - Natural language Processing - Text Analytics - Text Preprocessing and Wrangling - Understanding Text Syntax and Structure - Feature Engineering for Text Representation
PPT/Chalk
4. IV Week
Traditional Feature Engineering Models - bag of words model - bag of N-Grams model - TF-IDF Model - Topic Models - Text Classification - Automated Text Classification - Text Classification Blueprint
PPT/Chalk
5. V Week
Classification Models - Multinomial Naïve Bayes - Logistic Regression - Support Vector Machines - Ensemble Models - Random Forest - Gradient Boosting Machines - Evaluating Classification Models.
PPT/Chalk
6. VI Week
Text Similarity and clustering - Essential Concepts - Analyzing term Similarity - Analyzing Document Similarity - Document Clustering - Feature Engineering - K-means Clustering - Affinity Propagation - Ward’s Agglomerative Hierarchical Clustering
PPT/Chalk
7. VII Week
Semantic Analysis - Exploring Wordnet - Word Sense Disambiguation - Named Entity Recognition - Analyzing Semantic Representations
PPT/Chalk
8. VIII Week
Sentiment Analysis - Unsupervised Lexicon-Based Models - Bing Liu’s Lexicon - MPQA Subjectivity Lexicon - Pattern Lexicon - TextBlob Lexicon - AFINN Lexicon - SentiWordNet Lexicon - VADER Lexicon - Classifying Sentiment with Supervised Learning.
PPT/Chalk
9. IX Week
Speech - Phonetics - Speech Sounds and Phonetic Transcription - Articulatory Phonetics - Phonological Categories and Pronunciation variation - Acoustic Phonetics and Signals - Speech Synthesis - Phonetic Analysis - Prosodic Analysis
PPT/Chalk
10. X Week
Diphone Waveform synthesis - Automatic Speech Recognition - Speech Recognition Architecture - Applying Hidden Markov Model to Speech - Feature Extraction: MFCC Vectors - Computing Acoustic Likelihoods - The Lexicon and Language Model - Search and Decoding.
PPT/Chalk
11. XI Week
Time series Forecasting - Time series Decomposition - Smoothing based Methods - Regression based Methods - Machine Learning Methods - Performance evaluation
PPT/Chalk
12. XII Week
Anomaly Detection - Concepts - Distance based outlier Detection - Density based outlier Detection - Local outlier factor
PPT/Chalk
13. XIII Week
Feature Selection - Classifying feature selection Methods - Principal Component Analysis - Information theory based filtering - chi-square based filtering - Wrapper-type feature selection.
PPT/Chalk
Text Book
1. Vijay Kotu, Bala Deshpande, “Data Science: Concepts and Practice”, Second Edition, Elsevier Publications, 2019.
2. Brandon Reagen, Robert Adolf, Paul Whatmough, Gu-Yeon Wei, David Brooks, “Deep Learning for Computer Architects”, Morgan & Claypool Publishers, 2017.
3. Dipanjan Sarkar, “Text Analytics with Python: A Practitioner’s Guide to Natural Language Processing”, Apress, 2019.
4. Daniel Jurafsky, James H. Martin, “Speech and Language Processing”, Pearson, 2009.
Reference Books
1. Ethem Alpaydin, “Introduction to Machine Learning”, Third Edition, Adaptive Computation and Machine Learning Series, MIT Press, 2014.
2. Stephen Marsland, “Machine Learning – An Algorithmic Perspective”, Second Edition, Machine Learning and Pattern Recognition Series, Chapman and Hall/CRC, 2014.
3. Dietmar Jannach, Markus Zanker, “Recommender Systems: An Introduction”, Cambridge University Press, 2010.
COURSE ASSESSMENT METHODS-THEORY (shall range from 4 to 6)
S.No. | Mode of Assessment | Week/Date | Duration | % Weightage
1. | CT1 | As per Dean (Academic) Schedule | 1 Hour | 10%
2. | CT2 | As per Dean (Academic) Schedule | 1 Hour | 10%
3. | Assignments (2) | Before and After CT1 | Non-contact Hours | 30%
4. | Seminar/Demo | IV, V, VI, VII, VIII, IX, X, XI, XII, XIII | Contact Hours | 20%
   | Compensation Assessment* | As per Dean (Academic) Schedule | |
5. | Final Assessment* | | 3 hours | 30%
TOTAL | 100%
*mandatory
COURSE EXIT SURVEY (mention the ways in which the feedback about the course shall be assessed)
1. Students' feedback through class committee meetings.
2. Feedback questionnaire from students - from MIS at the end of the semester.
COURSE POLICY (preferred mode of correspondence with students, compensation assessment policy to be specified)
MODE OF CORRESPONDENCE (email/ phone etc)
Correspondence with students is by phone or email.
COMPENSATION ASSESSMENT POLICY
If any student is not able to attend Assessment-1 and/or Assessment-2 due to genuine reasons, the student is permitted to attend the compensation assessment (CPA) with 10% weightage.
ATTENDANCE POLICY (A uniform attendance policy as specified below shall be followed)
At least 75% attendance in each course is mandatory.
A maximum of 10% shall be allowed under On Duty (OD) category.
Students with less than 65% of attendance shall be prevented from writing the final assessment and shall be awarded 'V' grade.
ACADEMIC DISHONESTY & PLAGIARISM
Talking to other students or copying from others during an assessment will be treated as punishable dishonesty.
The offenders shall be awarded zero marks. For copying from another student, both students get the same penalty of zero marks.
The departmental disciplinary committee, including the course faculty member, PAC chairperson and the HoD as members, shall verify the facts of the malpractice and award the punishment if the student is found guilty. The report shall be submitted to the Academic office.
The above policy against academic dishonesty shall be applicable for all the programmes.
ADDITIONAL INFORMATION
The students can get their doubts clarified at any time with their faculty member.
FOR APPROVAL
Course Faculty    CC-Chairperson    HOD