Module title: Data Wrangling

SCQF level: 11
SCQF credit value: 20.00
ECTS credit value: 10

Module code: SET11521
Module leader: Dimitra Gkatzia
School: School of Computing
Subject area group: Software Engineering
Prerequisites

n/a

2019/0, Trimester 2, FACE-TO-FACE
Occurrence: 001
Primary mode of delivery: FACE-TO-FACE
Location of delivery: MERCHISTON
Partner:
Member of staff responsible for delivering module: Dimitra Gkatzia
Module Organiser:


Learning, Teaching and Assessment (LTA) Approach:
Teaching comprises a blend of lectures and lab-based practical sessions, with support materials and resources available online. The lecture programme will be enhanced by material from guest speakers, which will also be made available online.

Teaching will concentrate on the critical analysis of the underlying principles and theories, and on their implementation in the Python language and relevant specialised code libraries (LOs 1-4). Students are expected to spend a substantial proportion of their time doing practical programming exercises and researching the underlying principles and theories and the related academic literature (LOs 1-4). The practical materials are selected and organised to enhance students' understanding of the theories and principles covered.


Formative Assessment:
Formative assessment will be provided during lab-based practical sessions. There will also be a series of online quizzes and exercises that will give formative feedback.



Summative Assessment:
The summative assessment will comprise one practical coursework worth 100% of the final mark (covering LOs 1-4). An element of this coursework will be submitted around week 7 to give some formative feedback (30%: LOs 1, 2), with the main submission at the end of the module (70%: LOs 3, 4), after which the final feedback will be given.



Student Activity (Notional Equivalent Study Hours (NESH))
Mode of activity       Learning & Teaching Activity      NESH (Study Hours)
Face To Face           Lecture                           24
Face To Face           Practical classes and workshops   24
Independent Learning   Guided independent study          112
Face To Face           Tutorial                          40
Total Study Hours: 200
Expected Total Study Hours for Module: 200


Assessment
Type of Assessment    Weighting %   LOs covered   Week due   Length in Hours/Words
Project - Practical   30            1, 2          7          HOURS= 12, WORDS= 1000
Project - Practical   70            3, 4          14/15      HOURS= 28, WORDS= 1000
Component 1 subtotal: 100
Component 2 subtotal: 0
Module subtotal: 100

Description of module content:

Data Wrangling is the process of transforming and mapping data from "raw" formats into other formats, with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics. These purposes may include further data processing, visualisation, aggregation and training a statistical model, as well as many other potential uses. Data Wrangling includes several steps: extracting the data in raw form from the data source, processing the raw data using specialised algorithms (e.g. NLP approaches for text processing), storing it using appropriate data structures (e.g. lists, matrices) and finally depositing the resulting content in a data sink for storage and future use, such as training machine learning models.
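The steps described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not material taken from the module itself: the sample data, the function names (extract, process, load) and the choice of JSON as the data sink are all assumptions made for the example.

```python
import csv
import io
import json

# Hypothetical raw data standing in for a real data source (file, API, web page).
RAW = "city,temp_c\nEdinburgh, 11.5\nGlasgow,\nAberdeen, 9.0\n"


def extract(raw_text):
    """Extract: read raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def process(rows):
    """Process: strip whitespace, convert types, drop rows with missing values."""
    cleaned = []
    for row in rows:
        temp = row["temp_c"].strip()
        if not temp:
            continue  # skip incomplete records
        cleaned.append({"city": row["city"].strip(), "temp_c": float(temp)})
    return cleaned


def load(records):
    """Load: serialise the cleaned records to JSON, acting as a simple data sink."""
    return json.dumps(records)


records = process(extract(RAW))  # stored as a list of dictionaries
sink = load(records)             # JSON text ready to be written to storage
```

Keeping each step in its own function makes it possible to swap the source or the sink (for example, a web API or a database) without touching the processing logic.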

Contemporary data acquisition and analysis have to address several challenges, including the variety of data sources, the volume of data and its validity. These require the use of specialised data storage, aggregation and processing techniques. This module introduces a range of tools and techniques necessary for working with data in a variety of formats, with a view to developing data-driven applications. The module focuses primarily on developing applications using the Python scripting language and associated libraries, and will also introduce a range of associated data processing technologies and techniques.

The module covers the following topics:

• Data types and formats: numerical and time series, textual, unstructured
• Data sources and interfaces: open data, APIs, social media, web-based
• Techniques for dealing with text data such as vectorisation, bag of words, word embeddings
• Supervised Machine Learning approaches
• Developing and evaluating Data-Driven Applications in Python
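
As an illustration of the text-vectorisation topic listed above, the following is a minimal bag-of-words sketch in plain Python. The whitespace tokeniser, the function names and the two sample sentences are assumptions made for the example; in practice a specialised library would normally be used for this step.

```python
from collections import Counter


def tokenise(text):
    """Lower-case the text and split on whitespace (a deliberately naive tokeniser)."""
    return text.lower().split()


def bag_of_words(documents):
    """Return a sorted vocabulary and one term-count vector per document."""
    tokenised = [tokenise(doc) for doc in documents]
    vocabulary = sorted({token for doc in tokenised for token in doc})
    vectors = []
    for doc in tokenised:
        counts = Counter(doc)
        # A Counter returns 0 for tokens that do not occur in the document.
        vectors.append([counts[token] for token in vocabulary])
    return vocabulary, vectors


docs = ["the cat sat", "the cat saw the dog"]  # hypothetical sample corpus
vocabulary, vectors = bag_of_words(docs)
# vocabulary is ['cat', 'dog', 'sat', 'saw', 'the']
# vectors is [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

Each document becomes a fixed-length numeric vector over the shared vocabulary, which is the form that supervised learning algorithms typically expect as input.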

The Benchmark Statement for Computing specifies the range of skills and knowledge that should be incorporated in computing courses. This module encompasses cognitive skills in Computational Thinking, Modelling and Methods and Tools, and Requirements Analysis; practical skills in specification, development and testing, and in the deployment and use of tools; and critical evaluation. It also provides useful generic skills for employment.

Learning Outcomes for module:

On completion of this module, students will be able to:
LO1: Critically evaluate the tools and techniques of data extraction, interfacing, aggregation and processing
LO2: Select and apply a range of specialised data types, tools and techniques for data extraction, interfacing, aggregation and processing
LO3: Employ specialised techniques for dealing with complex data sets
LO4: Design, develop and critically evaluate data-driven applications in Python

Indicative References and Reading List:

Core - McKinney, W. (2012) Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O'Reilly, 1st ed.
Core - Cielen, D. & Meysman, A. (2016) Introducing Data Science: Big Data, Machine Learning,... Manning Publications, 1st ed.