Module title: Scripting for Data Science

SCQF level: 08
SCQF credit value: 20.00
ECTS credit value: 10

Module code: SET08423
Module leader: Sean McKeown
School: School of Computing
Subject area group: Software Engineering
Prerequisites:

Good understanding of fundamental programming/scripting concepts and some understanding of how they can be used to automate statistical analysis.


Description of module content:

The aim of the module is to deepen the students' understanding of fundamental programming concepts, introduce more advanced concepts pertaining to script development, and develop an ability to utilise publicly available software libraries to solve problems. Throughout the module, the underlying concepts will be contextualised through case studies relevant to the students' Data Science programmes of study. The chosen scripting language is widely used by Data Scientists in both academia and industry and has a thriving community which provides supporting software packages of relevance to the programme of study.
The module provides a fundamental introduction to the chosen scripting language and makes no assumptions about students' prior exposure to it. The latter parts of the module will focus on applying these concepts to data processing, such that students will develop insight into automating common statistical analyses on imported datasets.

The syllabus includes topics such as:
• An introduction to building scripts using a popular scripting language widely used in Data Science
• Core programming and language concepts, such as data types, control structures, functions, importing libraries, and re-usable design
• Techniques for creating robust scripts, including exception handling, testing and debugging
• Importing and working with externally sourced data (e.g. text and CSV files)
• The use of open-source libraries for automating basic data processing (e.g. calculating point statistics, plotting histograms)
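
As a minimal illustration of how the syllabus topics above fit together (functions, exception handling, and importing CSV data), the sketch below reads one numeric column from a CSV source and skips rows that cannot be parsed. The descriptor does not name the scripting language, so Python is assumed here purely for illustration; the function name `read_numeric_column` and the sample data are hypothetical.

```python
import csv
import io


def read_numeric_column(source, column):
    """Return (values, skipped): the floats parsed from `column` and a count of unparseable rows."""
    reader = csv.DictReader(source)
    # Fail early with a clear message if the requested column is absent.
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in header")
    values = []
    skipped = 0
    for row in reader:
        try:
            values.append(float(row[column]))
        except (TypeError, ValueError):
            # Missing or non-numeric cell: count it rather than crash.
            skipped += 1
    return values, skipped


# Made-up sample data standing in for an imported file (assumption).
sample = io.StringIO("city,temp_c\nA,12.5\nB,n/a\nC,14.0\n")
temps, skipped = read_numeric_column(sample, "temp_c")
print(temps, skipped)  # prints: [12.5, 14.0] 1
```

Counting skipped rows instead of silently dropping them is one possible design choice; it lets the caller decide whether the data quality is acceptable.
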
Indicative case studies:
• How to download, format, and import open source datasets using the scripting language.
• Answering basic questions relating to open datasets, such as what the median, mode, and mean values and the interquartile ranges are, and why these values are important.
• Basic plotting to understand the distribution of the underlying data, with examples of how point statistics may be misleading.
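
The case studies above can be sketched in a few lines. The example below, again assuming Python as the scripting language (the descriptor does not specify one), computes the mean, median, mode, and interquartile range for two made-up samples using only the standard `statistics` module. The samples and the helper name `summarise` are hypothetical and chosen so that the central point statistics agree while the spread differs, which is one way point statistics can mislead.

```python
import statistics

# Two made-up samples (assumption): identical mean, median, and mode,
# but very different spreads.
tight = [48, 49, 50, 50, 50, 51, 52]
spread = [5, 20, 50, 50, 50, 80, 95]


def summarise(values):
    """Return basic point statistics and the interquartile range for a list of numbers."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "iqr": q3 - q1,
    }


tight_summary = summarise(tight)
spread_summary = summarise(spread)
print(tight_summary)
print(spread_summary)
```

Both samples report a mean, median, and mode of 50, yet the interquartile range of the second is far larger. A histogram of each sample would make the difference visible at a glance, which is why plotting the distribution complements the point statistics.
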

Learning Outcomes for module:

LO1: Design, implement and test substantial software scripts which solve problems relating to statistics and data science.
LO2: Employ good practice programming and scripting techniques to develop well-written modular code which is reusable, well documented and uses comprehensive error handling techniques.
LO3: Solve complex, applied problems through abstraction by identifying, utilising and integrating publicly available software libraries as appropriate.

Indicative References and Reading List - URL:

Please contact your Module Leader for details