Academic year: 2023


NATIONAL INSTITUTE OF TECHNOLOGY, TIRUCHIRAPPALLI

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

COURSE PLAN – PART I

Name of the programme and specialization: B.Tech. Computer Science and Engineering

Course Title: Data Science

Course Code: CSPE81

No. of Credits: 3

Course Code of Pre-requisite subject(s): NIL

Session: January 2023

Section (if applicable): A & B, VIII Semester

Name of Faculty: Dr. M. Brindha

Department: CSE

Email: brindham@nitt.edu

Telephone No.: 9944627902

Name of Course Coordinator(s) (if applicable): NA

E-mail:

Telephone No.:

Course Type: Elective Course

Syllabus (approved in Senate)

UNIT I

Introduction to data science - case for data science - data science classification - data science algorithms - Data Science Process - prior Knowledge - Data Preparation - Modeling - Application - Knowledge - Data Exploration - Objectives of data Exploration - Datasets - Descriptive Statistics - Data Visualization - Roadmap for data exploration.

UNIT II

Natural language Processing basics - Language Syntax and Structure - Language Semantics - Natural language Processing - Text Analytics - Text Preprocessing and Wrangling - Understanding Text Syntax and Structure - Feature Engineering for Text Representation - Traditional Feature Engineering Models - bag of words model - bag of N-Grams model - TF-IDF Model - Topic Models - Text Classification - Automated Text Classification - Text Classification Blueprint - Classification Models - Multinomial Naïve Bayes - Logistic Regression - Support Vector Machines - Ensemble Models - Random Forest - Gradient Boosting Machines - Evaluating Classification Models.

UNIT III

Text Similarity and clustering - Essential Concepts - Analyzing term Similarity - Analyzing Document Similarity - Document Clustering - Feature Engineering - K-means Clustering - Affinity Propagation - Ward’s Agglomerative Hierarchical Clustering - Semantic Analysis - Exploring Wordnet - Word Sense Disambiguation - Named Entity Recognition - Analyzing Semantic Representations - Sentiment Analysis - Unsupervised Lexicon-Based Models - Bing Liu’s Lexicon - MPQA Subjectivity Lexicon - Pattern Lexicon - TextBlob Lexicon - AFINN Lexicon - SentiWordNet Lexicon - VADER Lexicon - Classifying Sentiment with Supervised Learning.

UNIT IV

Speech - Phonetics - Speech Sounds and Phonetic Transcription - Articulatory Phonetics - Phonological Categories and Pronunciation variation - Acoustic Phonetics and Signals - Speech Synthesis - Phonetic Analysis - Prosodic Analysis - Diphone Waveform synthesis - Automatic Speech Recognition - Speech Recognition Architecture - Applying Hidden Markov Model to Speech - Feature Extraction: MFCC Vectors - Computing Acoustic Likelihoods - The Lexicon and language Model - Search and decoding.

UNIT V

Time series Forecasting - Time series Decomposition - Smoothing based Methods - Regression based Methods - Machine Learning Methods - Performance evaluation - Anomaly Detection - Concepts - Distance based outlier Detection - Density based outlier Detection - Local outlier factor - Feature Selection - Classifying feature selection Methods - Principal Component Analysis - Information theory based filtering - chi-square based filtering - Wrapper-type feature selection.

COURSE OBJECTIVES

➢ To understand the data science process and exploration

➢ To learn Machine learning algorithms

➢ To gain knowledge of types of learning, processes, techniques and models

➢ To know about the research that requires the integration of large amounts of data

COURSE OUTCOMES (CO)

➢ Understand the data science concepts, techniques and models

➢ Forecast the time series data

➢ Build recommendation systems

➢ Learn and apply different mining algorithms and recommendation systems for large volumes of data

➢ Perform analytics on data streams

Course Outcome (CO) | Aligned Programme Outcome
Understand the data science concepts, techniques and models | 2, 5, 9
Forecast the time series data | 1, 11
Build recommendation systems | 1, 3, 6, 12
Learn and apply different mining algorithms and recommendation systems for large volumes of data | 2, 3, 5, 6, 9
Perform analytics on data streams | 1, 3, 6, 12

COURSE PLAN – PART II

COURSE OVERVIEW

This course mainly describes the concepts and techniques of Data science for Business applications.

COURSE TEACHING AND LEARNING ACTIVITIES

S.No. Week Topic Mode of Delivery

1. I Week

Introduction to data science - case for data science - data science classification - data science algorithms - Data Science Process - prior Knowledge - Data Preparation

PPT/Chalk

2. II Week

Modeling - Application - Knowledge - Data Exploration - Objectives of data Exploration - Datasets - Descriptive Statistics - Data Visualization - Roadmap for data exploration

PPT/Chalk

3. III Week

Natural language Processing basics - Language Syntax and Structure - Language Semantics - Natural language Processing - Text Analytics - Text Preprocessing and Wrangling - Understanding Text Syntax and Structure - Feature Engineering for Text Representation

PPT/Chalk

4. IV Week

Traditional Feature Engineering Models - bag of words model - bag of N-Grams model - TF-IDF Model - Topic Models - Text Classification - Automated Text Classification - Text Classification Blueprint

PPT/Chalk

5. V Week

Classification Models - Multinomial Naïve Bayes - Logistic Regression - Support Vector Machines - Ensemble Models - Random Forest - Gradient Boosting Machines - Evaluating Classification Models.

PPT/Chalk

6. VI Week

Text Similarity and clustering - Essential Concepts - Analyzing term Similarity - Analyzing Document Similarity - Document Clustering - Feature Engineering - K-means Clustering - Affinity Propagation - Ward’s Agglomerative Hierarchical Clustering

PPT/Chalk

7. VII Week

Semantic Analysis - Exploring Wordnet - Word Sense Disambiguation - Named Entity Recognition - Analyzing Semantic Representations

PPT/Chalk

8. VIII Week

Sentiment Analysis - Unsupervised Lexicon-Based Models - Bing Liu‟s Lexicon - MPQA Subjectivity Lexicon - Pattern Lexicon - TextBlob Lexicon - AFINN Lexicon - SentiWordNet Lexicon - VADER Lexicon - Classifying Sentiment with Supervised Learning.

PPT/Chalk

9. IX Week

Speech - Phonetics - Speech Sounds and Phonetic Transcription - Articulatory Phonetics - Phonological Categories and Pronunciation variation - Acoustic Phonetics and Signals - Speech Synthesis - Phonetic Analysis - Prosodic Analysis

PPT/Chalk

10. X Week

Diphone Waveform synthesis - Automatic Speech Recognition - Speech Recognition Architecture - Applying Hidden Markov Model to Speech - Feature Extraction: MFCC Vectors - Computing Acoustic Likelihoods - The Lexicon and language Model - Search and decoding.

PPT/Chalk

11. XI Week

Time series Forecasting - Time series Decomposition - Smoothing based Methods - Regression based Methods - Machine Learning Methods - Performance evaluation

PPT/Chalk

12. XII Week

Anomaly Detection - Concepts - Distance based outlier Detection - Density based outlier Detection - Local outlier factor

PPT/Chalk

13. XIII Week

Feature Selection - Classifying feature selection Methods - Principal Component Analysis - Information theory based filtering - chi-square based filtering - Wrapper-type feature selection.

PPT/Chalk

Text Books

1. Vijay Kotu, Bala Deshpande, “Data Science: Concepts and Practice”, Second Edition, Elsevier Publications, 2019.

2. Brandon Reagen, Robert Adolf, Paul Whatmough, Gu-Yeon Wei, David Brooks, “Deep Learning for Computer Architects”, Morgan & Claypool Publishers, 2017.

3. Dipanjan Sarkar, “Text Analytics with Python: A Practitioner’s Guide to Natural Language Processing”, Apress, 2019.

4. Daniel Jurafsky, James H. Martin, “Speech and Language Processing”, Pearson, 2009.

Reference Books

1. Ethem Alpaydin, “Introduction to Machine Learning”, Third Edition, Adaptive Computation and Machine Learning Series, MIT Press, 2014.

2. Stephen Marsland, “Machine Learning – An Algorithmic Perspective”, Second Edition, Machine Learning and Pattern Recognition Series, Chapman and Hall/CRC, 2014.

3. Dietmar Jannach, Markus Zanker, “Recommender Systems: An Introduction”, Cambridge University Press, 2010.

COURSE ASSESSMENT METHODS-THEORY (shall range from 4 to 6)

S.No. | Mode of Assessment | Week/Date | Duration | % Weightage
1. | CT1 | As per Dean (Academic) Schedule | 1 Hour | 10%
2. | CT2 | As per Dean (Academic) Schedule | 1 Hour | 10%
3. | Assignments (2) | Before and After CT1 | Non-contact Hours | 30%
4. | Seminar/Demo | IV, V, VI, VII, VIII, IX, X, XI, XII, XIII | Contact Hours | 20%
 | Compensation Assessment* | As per Dean (Academic) Schedule | |
5. | Final Assessment* | | 3 hours | 30%
TOTAL | | | | 100%

*mandatory

COURSE EXIT SURVEY (mention the ways in which the feedback about the course shall be assessed)

1. Students' feedback through class committee meetings.

2. Feedback questionnaire from students - from MIS at the end of the semester.

COURSE POLICY (preferred mode of correspondence with students, compensation assessment policy to be specified)

MODE OF CORRESPONDENCE (email/ phone etc)

Correspondence with students is through phone and email.

COMPENSATION ASSESSMENT POLICY

If any student is not able to attend Assessment-1 and/or Assessment-2 due to genuine reasons, the student is permitted to attend the compensation assessment (CPA) with 10% weightage.

ATTENDANCE POLICY (A uniform attendance policy as specified below shall be followed)

At least 75% attendance in each course is mandatory.

A maximum of 10% shall be allowed under On Duty (OD) category.

Students with less than 65% of attendance shall be prevented from writing the final assessment and shall be awarded 'V' grade.

ACADEMIC DISHONESTY & PLAGIARISM

Talking to other students or copying from others during an assessment will be treated as punishable dishonesty.

Zero marks shall be awarded to the offenders. For copying from another student, both students get the same penalty of zero marks.

The departmental disciplinary committee, including the course faculty member, PAC chairperson and the HoD as members, shall verify the facts of the malpractice and award the punishment if the student is found guilty. The report shall be submitted to the Academic office.

The above policy against academic dishonesty shall be applicable for all the programmes.

ADDITIONAL INFORMATION

The students can get their doubts clarified at any time with their faculty member.

FOR APPROVAL

Course Faculty

CC-Chairperson

HOD
