ETL: A Simple Package to Load Data from Views

A common, native way to load data into tables in Oracle is to create a view to load the data from. Depending on how the view is built, you can either refresh (i.e. overwrite) the data in a table or append fresh data to the table. Here, I present a simple package ETL that only requires you to maintain a configuration table and obviously the source views (or tables) and target tables.

read on

The Case for Industrial Data Science

It has – perhaps somewhat prematurely – been called the sexiest job of the twenty-first century, but whether you buy into the Big Data hype or not, data science is here to stay.

The available literature, the majority of courses in both the virtual and real world, and the media all purport the image of the data science ‘artiste’: a data bohemian who lives among free, like-minded spirits in lofty surroundings, who receives sacks of money in exchange for genuine works of art created with any possible ‘cool’ tool that flutters by in whatever direction the wind is blowing that day.

The reality for many in the field is quite different. Corporations rarely grant anyone unfettered access to all data, and similarly they are not willing to try and buy every new tool that hits the market, simply to satisfy someone’s curiosity. Furthermore, industrial data science has requirements that are much stricter than what is commonly taught in programmes around the world, and it’s time to make the case for industrial data science.

read on

Setting up Scala for Spark App Development

Apache Spark is a popular framework for distributed computing, both within and without the Hadoop ecosystem. Spark offers interactive shells for Scala as well as Python. Applications can be written in any language for which there is an API: Scala, Python, Java, or R. Since it can be daunting to set up your environment to begin developing applications, I have created a presentation that gets you up and running with Spark, Scala, sbt, and ScalaTest in (almost) no time.

read on

Mapping a Value Stream in Neo4j

The canonical use cases of a graph database such as Neo4j are social networks. In logistics and manufacturing networks also arise naturally. In particular, supply chains and value streams spring to mind. They may not be as large as Facebook’s social graph of all its users, but seeing them for the beasts they truly are can be beneficial. In this post I therefore want to talk about how you can model a value stream in Neo4j and how you can extract valuable information from it.

read on