Core Module Information
Module title: Data Management and Processing

SCQF level: 10
SCQF credit value: 20.00
ECTS credit value: 10

Module code: SET10115
Module leader: Md Zia Ullah
School: School of Computing, Engineering and the Built Environment
Subject area group: Computer Science
Prerequisites

There are no prerequisites for this module.

Description of module content:

This module will explore and develop data management and processing solutions that work on dirty, complex, real-world data. This module will examine the key concepts of data warehousing, data cleaning, and data processing in the context of business requirements and focus on how to combine these steps into a coherent data processing pipeline.

First, modern tools and techniques in data management will be examined, emphasizing good practice and professional approaches to storing and handling data. Next, the module will explore ways of cleaning noisy real-world data to make it suitable for data processing. Finally, data processing and collation techniques such as machine learning or deep learning will be applied to the data to extract structure and elicit comprehension of the data. Throughout the module, the advantages and disadvantages of using local and cloud approaches will be explored, alongside discussing common parallel approaches to facilitate faster solutions.

In short, the goal of this module is to allow students to understand a data processing pipeline from raw data to final delivery. It will cover:
• Data warehousing and storage techniques
• Data cleaning techniques
• A discussion of cloud approaches
• Data processing and collation techniques
• An introduction to parallel data pipeline approaches
• An introduction to natural language processing
• Topic modelling

Learning Outcomes for module:

Upon completion of this module you will be able to:

LO1: Compare different data warehousing techniques and technologies related to data management.

LO2: Critically reflect on the advantages of local and cloud solutions for data processing.

LO3: Appraise different methods of data cleaning in the context of large or complex data sets using R language.

LO4: Integrate industry-standard data collation techniques.

LO5: Create a data processing and management pipeline from raw data to a final delivery using R language.

Full Details of Teaching and Assessment
2024/5, Trimester 1, Blended
Occurrence: 001
Primary mode of delivery: Blended
Location of delivery: MERCHISTON
Partner:
Member of staff responsible for delivering module: Md Zia Ullah
Module Organiser:


Student Activity (Notional Equivalent Study Hours (NESH))
Mode of activity | Learning & Teaching Activity | NESH (Study Hours) | NESH Description
Face To Face | Lecture | 20 | Ten lectures will cover the following topics: data warehousing; storage solutions and data management; data quality; data pipelines; data processing; text data pre-processing; data summarisation and visualisation; NLP; and topic modelling.
Face To Face | Practical classes and workshops | 20 | Ten practical sessions will help students solve problems aligned with the corresponding lectures, with one-to-one support from the lecturer and demonstrator. The solution to each practical problem will be released one week after the session.
Online | Guided independent study | 160 | Guided independent study
Total Study Hours: 200
Expected Total Study Hours for Module: 200


Assessment
Type of Assessment | Weighting % | LOs covered | Week due | Length in Hours/Words | Description
Project - Practical | 30 | 1, 2 | Week 8 | HOURS= 2000 | Project - Practical coursework: writing a reflective report on data warehousing and storage techniques contextualised for a specific use case, supported by the relevant literature.
Project - Practical | 70 | 3, 4, 5 | Week 13 | HOURS= 2000 | Project - Practical coursework: developing a complete workflow using R, including the reports.
Component 1 subtotal: 30
Component 2 subtotal: 70
Module subtotal: 100

Indicative References and Reading List - URL:
Contact your module leader