ETL: A Simple Package to Load Data from Views

A common, native way to load data into an Oracle table is to select it from a view. Depending on how the view is built, you can either refresh (i.e. overwrite) the data in the table or append fresh data to it. Here, I present a simple package, ETL, that only requires you to maintain a configuration table and, of course, the source views (or tables) and target tables.
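To make the refresh-versus-append idea concrete, here is a minimal sketch in Python. It is not the ETL package itself: it uses an in-memory SQLite database as a stand-in for Oracle, and the names etl_config, load_mode, v_sales_big, t_refresh and t_append are hypothetical, chosen only for illustration.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a source view, two targets and a config table.

    All object names here are hypothetical; SQLite stands in for Oracle.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sales (id INTEGER, amount REAL);
        CREATE VIEW v_sales_big AS
            SELECT id, amount FROM sales WHERE amount >= 100;
        CREATE TABLE t_refresh (id INTEGER, amount REAL);
        CREATE TABLE t_append (id INTEGER, amount REAL);
        CREATE TABLE etl_config (
            id INTEGER PRIMARY KEY,
            source_name TEXT,
            target_name TEXT,
            load_mode TEXT  -- 'refresh' or 'append'
        );
        INSERT INTO sales VALUES (1, 50), (2, 150), (3, 300);
        INSERT INTO etl_config (source_name, target_name, load_mode) VALUES
            ('v_sales_big', 't_refresh', 'refresh'),
            ('v_sales_big', 't_append', 'append');
        """
    )
    return conn


def run_etl(conn):
    """Load every target listed in etl_config from its source view or table."""
    rows = conn.execute(
        "SELECT source_name, target_name, load_mode FROM etl_config ORDER BY id"
    ).fetchall()
    for source, target, mode in rows:
        if mode == "refresh":
            # Refresh: empty the target first so the load overwrites it.
            conn.execute(f'DELETE FROM "{target}"')
        elif mode != "append":
            raise ValueError(f"unknown load mode: {mode!r}")
        # Object names come from the trusted config table, not from user input.
        conn.execute(f'INSERT INTO "{target}" SELECT * FROM "{source}"')
    conn.commit()


if __name__ == "__main__":
    conn = build_demo()
    run_etl(conn)
    run_etl(conn)  # a second run shows the difference between the two modes
    for table in ("t_refresh", "t_append"):
        count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        print(table, count)
```

Running the load twice leaves the refreshed table with the view's current rows only, whereas the appended table accumulates a copy of those rows on every run.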


The Case for Industrial Data Science

It has – perhaps somewhat prematurely – been called the sexiest job of the twenty-first century, but whether you buy into the Big Data hype or not, data science is here to stay.

The available literature, the majority of courses in both the virtual and real world, and the media all project the image of the data science ‘artiste’: a data bohemian who lives among free, like-minded spirits in lofty surroundings, and who receives sacks of money in exchange for genuine works of art created with whatever ‘cool’ tool flutters by in whichever direction the wind is blowing that day.

The reality for many in the field is quite different. Corporations rarely grant anyone unfettered access to all data, and they are equally unwilling to try and buy every new tool that hits the market simply to satisfy someone’s curiosity. Furthermore, industrial data science has requirements that are much stricter than what is commonly taught in programmes around the world. It’s time to make the case for industrial data science.


Setting up Scala for Spark App Development

Apache Spark is a popular framework for distributed computing, both within and without the Hadoop ecosystem. Spark offers interactive shells for Scala as well as Python. Applications can be written in any language for which there is an API: Scala, Python, Java, or R. Since it can be daunting to set up your environment to begin developing applications, I have created a presentation that gets you up and running with Spark, Scala, sbt, and ScalaTest in (almost) no time.


Mapping a Value Stream in Neo4j

The canonical use case of a graph database such as Neo4j is the social network. In logistics and manufacturing, networks also arise naturally; supply chains and value streams in particular spring to mind. They may not be as large as Facebook’s social graph of all its users, but seeing them for the beasts they truly are can be beneficial. In this post I therefore want to talk about how you can model a value stream in Neo4j and how you can extract valuable information from it.
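As a rough illustration of the modelling idea, the sketch below represents a value stream as a directed graph in plain Python rather than in Neo4j. The step names, the cycle times and the waiting times are made up for the example, and the stream is assumed to be a simple linear chain; a real model in Neo4j would store the same information as nodes and relationships.

```python
# Hypothetical process steps with their value-adding cycle time in minutes.
steps = {"Stamping": 1, "Welding": 3, "Assembly": 5, "Shipping": 2}

# Hypothetical flows: (from step, to step, waiting time between them in minutes).
flows = [
    ("Stamping", "Welding", 120),
    ("Welding", "Assembly", 60),
    ("Assembly", "Shipping", 240),
]


def stream_times(steps, flows, start):
    """Walk a linear value stream from `start` and sum both kinds of time.

    Returns (value_added_minutes, waiting_minutes).
    Assumes each step has at most one outgoing flow and there are no cycles.
    """
    next_step = {src: (dst, wait) for src, dst, wait in flows}
    value_added = steps[start]
    waiting = 0
    current = start
    while current in next_step:
        current, wait = next_step[current]
        waiting += wait
        value_added += steps[current]
    return value_added, waiting


if __name__ == "__main__":
    value_added, waiting = stream_times(steps, flows, "Stamping")
    total = value_added + waiting
    print(f"value-added: {value_added} min, waiting: {waiting} min")
    print(f"share of lead time that adds value: {value_added / total:.1%}")
```

Even in this toy stream, almost all of the lead time is spent waiting between steps, which is the kind of information a graph model makes easy to extract.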


Oracle Date Arithmetic Weirdness

Although the date arithmetic in Oracle Database is well documented, it is not always as clear as it could be. In this blog post I want to point out a few common traps in Oracle date calculations that you should be aware of, especially where intervals are involved.
