Core Module Information
Module title: Data Wrangling

SCQF level: 11
SCQF credit value: 20.00
ECTS credit value: 10

Module code: SET11821
Module leader: Dimitra Gkatzia
School: School of Computing, Engineering and the Built Environment
Subject area group: Computer Science
Prerequisites

There are no prerequisites for this module.

Description of module content:

Data Wrangling is the process of transforming and mapping data from "raw" formats into other formats, with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics. These purposes may include further data processing, visualisation, aggregation and training a statistical model, among many other potential uses. Data Wrangling includes several steps: extracting the data in raw form from the data source, processing the raw data using specialised algorithms (e.g. NLP approaches for text processing), storing it using appropriate data structures (e.g. lists, matrices) and finally delivering the resulting content to a data sink for storage and future use, such as training machine learning models.

Contemporary data acquisition and analysis have to address several challenges, such as the variety of data sources, the volume of data and data validity. These require the use of specialised data storage, aggregation and processing techniques. This module introduces a range of tools and techniques necessary for working with data in a variety of formats with a view to developing data-driven applications. The module focuses primarily on developing applications using the Python scripting language and associated libraries, and will also introduce a range of associated data processing technologies and techniques.

The module covers the following topics:
• Data types and formats: numerical and time series, textual, unstructured
• Data sources and interfaces: open data, APIs, social media, web-based
• Techniques for dealing with text data such as vectorisation, bag of words, word embeddings
• Supervised Machine Learning approaches
• Developing and evaluating Data-Driven Applications in Python

The Benchmark Statement for Computing specifies the range of skills and knowledge that should be incorporated in computing courses. This module encompasses cognitive skills in Computational Thinking, Modelling and Methods and Tools, and Requirements Analysis; practical skills in specification, development and testing, and in the deployment and use of tools; and critical evaluation, in addition to providing useful generic skills for employment.
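
As an illustration only, the wrangling steps named in the description (extract, process, store, sink) might be sketched in Python as follows. This is a minimal sketch using the standard library; the sample data, the function names and the JSON sink are assumptions made for the example and are not taken from the module materials.

```python
import csv
import io
import json
import re
from collections import Counter

# Hypothetical raw data standing in for a real source (file, API, web page).
RAW_CSV = """id,text
1,Data wrangling turns raw data into usable data
2,Raw text needs cleaning before analysis
"""


def extract(raw):
    """Extract: read raw CSV rows into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(raw)))


def tokenise(text):
    """Process: lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(docs):
    """Store: build a sorted vocabulary and a document-term count matrix."""
    vocab = sorted({token for doc in docs for token in doc})
    matrix = []
    for doc in docs:
        counts = Counter(doc)  # missing terms count as 0
        matrix.append([counts[term] for term in vocab])
    return vocab, matrix


def sink(vocab, matrix):
    """Sink: serialise the result to JSON for storage or later use."""
    return json.dumps({"vocabulary": vocab, "matrix": matrix})


rows = extract(RAW_CSV)
docs = [tokenise(row["text"]) for row in rows]
vocab, matrix = bag_of_words(docs)
payload = sink(vocab, matrix)
```

Each row of `matrix` is one document and each column one vocabulary term, which is the bag-of-words representation listed among the module topics. In practice a library vectoriser would likely replace the hand-written `bag_of_words` helper.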

Learning Outcomes for module:

Upon completion of this module you will be able to:

LO1: Critically evaluate the tools and techniques of data extraction, interfacing, aggregation and processing.

LO2: Select and apply a range of specialised data types, tools and techniques for data extraction, interfacing, aggregation and processing.

LO3: Employ specialised techniques for dealing with complex data sets.

LO4: Design, develop and critically evaluate data-driven applications in Python.

Full Details of Teaching and Assessment
2024/5, Trimester 2, Blended
Occurrence: 001
Primary mode of delivery: Blended
Location of delivery: WORLDWIDE
Partner:
Member of staff responsible for delivering module: Dimitra Gkatzia
Module Organiser:


Student Activity (Notional Equivalent Study Hours (NESH))
Mode of activity | Learning & Teaching Activity | NESH (Study Hours) | NESH Description
Independent Learning | Lecture | 10 | The lectures will present you with the core material of data wrangling. They can be accessed at any time, and are associated with the workbooks.
Independent Learning | Practical classes and workshops | 20 | Guided by the practical materials and workbook, you will have the chance to apply the knowledge learnt in the lectures to practical problems in the field of data wrangling. You can receive feedback on your work and progress from the module team.
Independent Learning | Tutorial | 10 | In the tutorial sessions, you will have the opportunity to ask questions of the module team, to enable you to succeed in your studies.
Online | Guided independent study | 160 | Guided by the further reading on the Moodle page, you will have a chance to extend your knowledge of data wrangling and embed core concepts.
Total Study Hours: 200
Expected Total Study Hours for Module: 200


Assessment
Type of Assessment | Weighting % | LOs covered | Week due | Length in Hours/Words | Description
Project - Practical | 30 | 1~2 | Week 7 | HOURS= 1000 | In this first coursework, you will begin the process of data wrangling, using authentic assessment.
Project - Practical | 70 | 3~4 | Exam Period | HOURS= 1000 | Continuing your work from coursework 1, you will refine and develop it in this coursework to demonstrate the whole lifecycle of data wrangling.
Component 1 subtotal: 100
Component 2 subtotal: 0
Module subtotal: 100

Indicative References and Reading List - URL:
Contact your module leader