OSS DSS Formalization

The next step in our open source solutions (OSS) for decision support systems (DSS) study guide (SG), according to the syllabus, is to make our first decision: a formal definition of "Decision Support System". Next, and soon, will be a post listing the technologies that will contribute to our studies.

The first stop in looking for a definition of anything today is Wikipedia. And indeed, Wikipedia does have a nice article on DSS. One of the things that I find most informative about a Wikipedia article is its "Talk" page. The DSS discussion is rather mild though, with no ongoing debate of the kind found on some other talk pages, such as the discussion about Business Intelligence. The talk pages also change more often, and provide insight into the thoughts that go into the main article.

And of course, the second stop is a Google search for Decision Support System; a search on DSS is not nearly as fruitful for our purposes. :)

Once upon a time, we might have gone to a library and thumbed through the card catalog to find some books on Decision Support Systems. A more popular approach today would be to search Amazon for Decision Support books. There are several books in my library that you might find interesting for different reasons:

  1. Pentaho Solutions: Business Intelligence and Data Warehousing with Pentaho and MySQL by Roland Bouman & Jos van Dongen provides a very good overview of data warehousing, business intelligence and data mining, all key components to a DSS, and does so within the context of the open source Pentaho suite
  2. Smart Enough Systems: How to Deliver Competitive Advantage by Automating Hidden Decisions by James Taylor & Neil Raden introduces business concepts for truly managing information and using decision support systems, as well as being a primer on data warehousing and business intelligence, but goes beyond this by automating the data flow and decision making processes
  3. Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support Applications by Larissa T. Moss & Shaku Atre takes a business, program and project management approach to implementing DSS within a company, introducing fundamental concepts at a clear, though simplistic, level
  4. Competing on Analytics: The New Science of Winning by Thomas H. Davenport & Jeanne G. Harris in many ways goes into the next generation of decision support by showing how data, statistical and quantitative analysis, within the context of specific processes, give businesses a strong lead over their competition, although it does so at a very simplistic, formulaic level

These books range from being technology focused to being general business books, but they all provide insight into how various components of DSS fit into a business, and different approaches to implementing them. None of them actually provide a complete DSS, and only the first focuses on OSS. If you followed the Amazon search link given previously, you might also have noticed that there are books that show Excel as a DSS, and there is a preponderance of books that focus on the biomedical/pharmaceutical/healthcare industry. Another focus area is in using geographic information systems (actually one of the first uses for multi-dimensional databases) for decision support. There are several books in this search that look good, but haven't made it into my library as yet. I would love to hear your recommendations (perhaps in the comments).

From all of this, and our experiences in implementing various DW, BI and DSS programs, I'm going to give a definition of DSS. From a previous post in this DSS SG, we have the following:

A DSS is a set of processes and technology that help an individual to make a better decision than they could without the DSS.
-- Questions and Commonality

As we stated, this is vague and generic. Now that we've done some reading, let's see if we can do better.

A DSS assists an individual in reaching the best possible conclusion, resolution or course of action in stand-alone, iterative or interdependent situations, by using historical and current structured and unstructured data, collaboration with colleagues, and personal knowledge to predict the outcome or infer the consequences.

I like that definition, but your comments will help to refine it.

Note that we make no mention of specific processes, nor any technology whatsoever. It reflects my bias that decisions are made by individuals, not groups (electoral systems notwithstanding). To be true to our "TeleInterActive Lifestyle" ;) I should point out that the DSS must be available when and where the individual needs to make the decision.

Any comments?

R the next Big Thing or Not

Recently, AnnMaria De Mars, PhD (multiple) and Dr. Peter Flom, PhD have stirred up a bit of a tempest in a tweet-pot, as well as in the statistical blogosphere, with comparisons of R and SAS, IBM/SPSS and the like. I've commented on both of their blogs, but decided to expand a bit here, as the choice of R is something that we planned to cover in a later post to our Open Source Solutions Decision Support Systems Study Guide. First, let me say that Dr. De Mars and Dr. Flom appear to have posted completely independently of each other, and further, that their posts have different goals.

In The Next Big Thing, Dr. De Mars is looking for the next big thing, both to keep her own career on-track, and to guide students into areas of study that will survive in the job market in the coming decades. This is always difficult for mentors, as we can't always anticipate the "black swan" events that might change things drastically. The tempestuous nature of her post came from one little sentence:

Contrary to what some people seem to think, R is definitely not the next big thing, either. -- AnnMaria De Mars, The Next Big Thing, AnnMaria's Blog

In SAS vs. R, Introduction and Request, Dr. Flom starts a series comparing R and SAS from the standpoint of a statistician deciding upon tools to use.

There are several threads in Dr. De Mars' post. I agree with Dr. De Mars that two of the "next big things" in data management & analysis are data visualization and dealing with unstructured data. I'm of the opinion that there is a third area, related to the "Internet of Things" and the tsunami of data that will be generated by it. These are conceptual areas, however. Dr. De Mars quickly moves on to discussing the tools that might be a part of the solutions to these next big things. The concepts cited are neither software packages nor computing languages. As for the software packages SAS, IBM/SPSS, Stata, Pentaho and the like, and the computing language S, with its open source distribution R and its proprietary distribution S+, none of these are likely to be the next big thing; they are tools that are already useful to know today.

I find it interesting that both Dr. De Mars and Dr. Flom, as well as the various commenters, tweeters, and other posters, are comparing software suites and applications with a computing language. I think that a bit more historical perspective might be needed in bringing these threads together.

In 1979, when I first sat down with a FORTRAN programmer to turn my Bayesian methodologies into practical applications to determine the reliability and risk associated with the STAR48 kick motor and associated Payload Assist Module (PAM), the statistical libraries for FORTRAN seemed amazing. The ease with which we were able to create the program and churn through decades of NASA data (after buying a 1MB memory box for the mainframe) was wondrous ;)

Today, there's not so much wonder from such a feat. The evolution of computing has drastically affected the way in which we apply mathematics and statistics today. Several of the comments to these posts argue both sides of the statement that anyone doing statistics today should be a programmer, or shouldn't. It's an interesting argument, that I've also seen reflected in chemistry, as fewer technicians are used in the lab, and the Ph.D.s work directly with the robots to prepare the samples and interpret the results.

Approximately 15 years ago, I moved from solving scientific and engineering problems directly with statistics, to solving business problems through vendors' software suites. The marketing names for this endeavor have gone through several changes: Decision Support Systems, Very Large Databases, Data Warehousing, Data Marts, Corporate Information Factory, Business Intelligence, and the like. Today, Data Mining, Data Visualization, Sentiment Analysis, "Big Data", SQL Streaming, and similar buzzwords reflect the new "big thing". Software applications, from new as well as established vendors, both open source and proprietary, are coming to the fore to handle these new areas that represent real problems.

So, one question to answer for students is which, if any, of these software packages will best survive alongside, and aid the growth of, their maturing careers. Will Tableau, LyzaSoft, QlikView or Viney@rd be in a better spot in 20 years, through growth or acquisition, than SAS or IBM/SPSS? Will the open source movement take down the proprietary vendors or be subsumed by them? Is Pentaho/Weka the BI & data mining solution for their career? Maybe, maybe not. But what about that other beast of which everyone speaks? Namely, R, the r-project, the R Statistical Language. What is it? Is it a worthy alternative to SAS or IBM/SPSS or Pentaho/Weka? Or is it a different genus altogether? That's a question I've been seeking to answer for myself, in my own career evolution. After 15 years, software such as SAP/Business Objects and IBM/Cognos hasn't evolved into anything that I like, with its pinnacle of statistical computation being the "average", the arithmetic mean. SAS and IBM/SPSS are certainly better and, with data mining, machine learning and predictives becoming important to business, are likely to be a good choice for the future. But are they really powerful enough? Are they flexible enough? Can they be used to solve the next generation of data problems? They're very likely to evolve into software that can do so. But how quickly? And like all vendor software, they have limitations based upon the market studies and business decisions of the corporation.

How is R different?

Well, first, R is a computing language. Unlike SAP/Business Objects, IBM/Cognos, IBM/SPSS, SAS, Pentaho, JasperSoft, SpagoBI, or Oracle, it's not a company, nor a BI Suite, nor even a collection of software applications. Second, R is an open source project. It's an open source implementation of S. Like C and the other single-letter-named languages, S came out of Bell Labs, in S's case in 1976. The open source implementation, R, comes from R. Ihaka and R. Gentleman. It was first revealed in 1996 through the article, R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5:299–314, and is often associated with the Department of Statistics, University of Auckland.

While I'm not a software engineer, I find R a very compelling statistical tool. As a language, it's very intuitive… for a statistician. It's an interactive, interpreted, functional, object-oriented, statistical programming language. R itself is written in R, C, C++ and FORTRAN; it's powerful. As an open source project, it has attracted thousands upon thousands of users who have formed a strong community. There are thousands upon thousands of community contributed packages for R. It's flexible, and growing. One of the main goals of R was data visualization, and it has a wonderful new package for data visualization in ggplot2. It's ahead of the curve. There are packages for parallel processing (some quite specific), for big data beyond in-memory capacity, for servers, and for embedding in a web site. Get the idea? If you think you need something in R, search CRAN, RForge, BioConductor or Omegahat.

As you can tell, I like R. :) However, in all honesty, I don't think that the SAS vs. R controversy is an either/or situation. SAS, IBM/SPSS and Pentaho complement R and vice-versa. Pentaho, IBM/SPSS and some SAS products support R. R can read data from SAS, IBM/SPSS, relational databases, Excel, MapReduce and more. The real question isn't whether one tool is better than another, but rather which tool is best for answering a particular question. That being said, I'm looking forward to Dr. Flom's comparison, as well as the continuing discussion on Dr. De Mars' blog.

For us, the question is building a decision support system or stack from open source components. It looks like we'll have a good time doing so.

OSS DSS Studies Introduction

First, let me say that we're talking about systems supporting the decisions that are made by human beings, not "expert systems" that automate decisions. As an example, let's look at inventory management. A human might use various components of a DSS to determine the amount of an item in stock, the demand for that item as a trend to determine when it might be out of stock, and predictives as to various factors (internal, external, environmental, political, etc.) that might affect supply, to come to a decision as to how much and when to order more of that item. An expert system might be created that could also determine when and how much of an item to order, using neural networks, Bayesian nets or other algorithms. The expert system might even take from the same DSS components (or directly from their underlying data) as the human might. One could even run the expert system in parallel with humans making the decisions, scoring or otherwise evaluating the two, until the expert system is comparable to or better than the humans. But, we're not really interested in expert systems in this study guide. We'll be focusing on systems that help humans to make better decisions, not on automated feedback and control loops.
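To make the inventory example a little more concrete, here is a minimal sketch in Python of the kind of calculation a DSS component might put in front of the human: fit a straight-line trend to past demand, project it forward, and estimate how many periods the current stock will cover. Everything here is an assumption made for illustration. The function names, the linear trend, the 52-period horizon and the lead-time rule are hypothetical and not taken from any of the tools discussed in this study guide. The result only informs the person making the decision; it does not place the order.

```python
from statistics import mean


def fit_linear_trend(demand):
    """Least-squares line through (period index, demand).

    Returns (slope, intercept). Needs at least two periods of history.
    """
    xs = range(len(demand))
    x_bar = mean(xs)
    y_bar = mean(demand)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, demand))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def periods_until_stockout(stock, demand, horizon=52):
    """Count the whole future periods that the current stock can cover.

    Demand is projected along the fitted trend (hypothetical model choice).
    Returns None if the stock lasts through the whole horizon.
    """
    slope, intercept = fit_linear_trend(demand)
    remaining = stock
    for step in range(horizon):
        period = len(demand) + step
        forecast = max(0.0, intercept + slope * period)  # no negative demand
        remaining -= forecast
        if remaining < 0:
            return step
    return None


def should_reorder(stock, demand, lead_time):
    """Flag a reorder when the projected stockout falls within the lead time."""
    periods = periods_until_stockout(stock, demand)
    return periods is not None and periods <= lead_time
```

A person would still weigh this flag against the internal, external, environmental and political factors mentioned above, since none of those appear in a simple trend line.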

To me, a technology doesn't matter very much if it's not supporting some process, or a step within a process.  That process may be for personal reasons or supporting work activities. For this study guide, let's begin by continuing the discussion that we began in the previous posts, about the process by which one makes a decision, the steps, the events, the triggers and the consequences of making a decision.

I have my own process in making decisions.  I've played in executive and management roles for many years, and have been responsible for 5 P/L centers.  But this is a study guide, and while I intend to offer my own opinions and interpretations, we need some objective sources to study.  Let's start with a Google search.  Of course, Wikipedia has an article.  A site of which I've not heard before has the first hit with their article on problem-solving and decision-making.  Science Daily has a timely article from 2010 March 13 on how we really make decisions, our brain activity during decision making.  I also like the map from The Institute for Strategic Clarity.  Mindtools sets out a list of techniques and tools for aiding in the decision making process, and provides an important caveat "Do remember, though, that the tools in this chapter exist only to assist your intelligence and common sense. These are your most important assets in good Decision Making".  Reading through various reviews, the one book on decision making that I want to add to my library is The Managerial Decision-Making Process, 5th ed. by E. Frank Harrison.  From the Glossary of Political Economy Terms, we have:


Where formal organizations are the setting in which decisions are made, the particular decisions or policies chosen by decision-makers can often be explained through reference to the organization's particular structure and procedural rules. Such explanations typically involve looking at the distribution of responsibilities among organizational sub-units, the activities of committees and ad hoc coordinating groups, meeting schedules, rules of order etc. The notion of fixed-in-advance standard operating procedures (SOPs) typically plays an important role in such explanations of individual decisions made. -- Organizational process models of decision-making


Let's revisit and expand upon the summary that we gave in the third post in this series.

  1. As an individual faced with making a decision, I may want input from others, I may want consensus, but in the end, it is an individual decision, and I will bear the fruits of having made that decision.
  2. I need to put the problem, and my decision making, into context.  I have a variety of resources at my disposal to do so:

    • historical data
    • current information
    • structured data from transactional systems, master data, metadata, data warehouse, and other possible sources
    • unstructured data from blogs, wikis, Zotero libraries, Evernote, searches, bookmarks and similar sources
    • email
    • non-electronic correspondence, notes and conversations
    • personal experience
    • the experience of others garnered through water cooler and hallway conversations, formal meetings, twitter, phone calls and the like
  3. Now I need to understand all of these facts, opinions and conjecture at my disposal.  Part of this is sifting all of it through my internal filters, using my "gut".  Part is using the various reporting and analytical tools at my disposal, and then filtering those through my gut.  And really, this and the next point will constitute the majority of this OSS DSS Study Guide - the tools we use.
  4. As I contemplate the various decisions that I might make from all of this, I want to understand the consequences of each potential decision: might this decision lead to a better product, more profit, less profit, broader market penetration, higher reliability, or even an alternate universe.
  5. As I make this decision, I'll want to collaborate with others.  Ideally, I'll want to collaborate within the context of my decision support system. Once upon a time we would do this by embedding the tools within a portal system; now we take a more master data management approach, and use a service-oriented architecture with either web services description language (WSDL) or representational state transfer (REST) application programming interfaces (APIs) to the collaborative environment, usually a wiki.
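Step 4 above, understanding the consequences of each potential decision, is where decision theory usually enters. As a minimal sketch, assuming we can list a few scenarios, attach a probability to each, and estimate a payoff for every decision under every scenario, we can compare decisions by expected payoff. The function names, scenarios and numbers below are hypothetical, and expected value is only one of several possible criteria.

```python
def expected_payoffs(payoffs, probabilities):
    """Return {decision: expected payoff}.

    payoffs[decision][scenario] is the estimated payoff of a decision if
    that scenario occurs; probabilities[scenario] must sum to 1.
    """
    if abs(sum(probabilities.values()) - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    return {
        decision: sum(probabilities[s] * value for s, value in by_scenario.items())
        for decision, by_scenario in payoffs.items()
    }


def best_decision(payoffs, probabilities):
    """Pick the decision with the highest expected payoff."""
    expected = expected_payoffs(payoffs, probabilities)
    return max(expected, key=expected.get)


# Hypothetical numbers, for illustration only.
probabilities = {"strong_demand": 0.25, "weak_demand": 0.75}
payoffs = {
    "launch_now": {"strong_demand": 120, "weak_demand": -40},
    "delay": {"strong_demand": 60, "weak_demand": 10},
}

print(expected_payoffs(payoffs, probabilities))
print(best_decision(payoffs, probabilities))
```

The probabilities and payoffs are exactly the kind of input that comes from the historical data, current information and personal experience listed in step 2, so the ranking is only as good as those estimates, and the individual still makes the call.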

In summary, this introduction has set up a framework for a decision-making process for an individual to use a decision support system.  The majority of this study guide will be to explore the actual decision support system, and the open source tools from which we can build such a system.

Syllabus for OSS DSS Studies

As promised, here's the syllabus for our study guide to decision support systems using open source solutions. We'll start with a first draft on 2010-03-23, and update and change based on ideas, comments and lessons learned. So, please comment. :) The updates will be marked. Deletions will be marked with a strike-through and not removed.

  1. Introduction
    1. Continuing the discussion of the processes and technologies that constitute a decision support system
    2. Formalizing a definition of DSS as well as the components, such as business intelligence (BI) that contribute to a DSS
    3. Providing [and updating] the list of references for this study guide
  2. Preparation
    1. Discussing the technology for use in this study guide including the client(s) and server (Red Hat Enterprise Linux 5)
    2. Checking for prerequisites for the open source solutions that will be used
    3. Hands-on exercises for preparing the system
  3. Installation
    1. Pointers and examples for installing the open source server-side packages including but not limited to:
      1. LucidDB
      2. Pentaho BI-Server, including PAT, and Administrative Console
      3. RServe and/or RApache
    2. Pointers for installation of client-side software and some examples on MacOSX
  4. Modeling
    1. Generally, we would determine the models, the architecture and then one (or more competing) design(s) to satisfy that architecture, including selecting the right technical solutions for the job at hand. Here, we're creating a learning environment for certain tools, so we're introducing the architecture and design studies after the technology installs.
    2. In general, this section will explore the various means of modeling processes, systems and data, specifically as these relate to making decisions.
    3. Decision Making Processes
      1. Decision Theory
      2. Game Theory
      3. Machine Learning & Data Mining
      4. Bayes and Iterations
      5. Predictives
    4. Information Flow
    5. Mathematical Modeling
    6. Data Modeling
    7. UML
    8. Dimensional Modeling
    9. PMML
  5. Architecture and Design
    1. In this section, we'll examine the differences between enterprise and system architecture, and between architecture and design. We'll look at various architectural and design elements that might influence both policy and technology directions.
    2. Discussing Enterprise Architecture, especially the translation between the user needs and technology/operational realities
    3. System Architecture
    4. SOA, ReST, WSDL, and Master Data Management
    5. Technology selection and vendor bake-offs
  6. Implementation Considerations
    1. Discussing the various philosophies and considerations for implementing any DSS, or really, any system integration project. We'll look at our own three track implementation methodology, as well as how the new Pentaho Agile BI tools support our method. In addition, we'll consider how we'll get all these OSS tools working together, on the same data sets, as well as, the importance of managing data about the data.
    2. Pentaho Agile BI and our own 8D™ Method
    3. System and Data Integration
    4. Metadata
  7. Using the Tools
    1. This is the vaguest part of our syllabus. We'll be using the examples from our various references, but with the system we've set up here, rather than the exact systems that the references use. For example, we'll be using LucidDB and not MySQL for the examples from Pentaho Solutions. Remember too, that this is a study guide, and is not meant to be a book written as a series of blog posts, so while we might vary from the reference materials, we'll always refer to them.
    2. ETL
    3. Reporting
    4. OLAP
    5. Data Mining & Machine Learning
    6. Statistical Analysis
    7. Predictives
    8. Workflow
    9. Collaboration
    10. Hmm, this should take years :D

Renewables and Smart Grid

We are currently in, at least, the fourth era of growth and interest in renewable energy. The first two of which I'm aware, in the late 1800s into the turn of that century, and in the 1950s, both concentrated on solar (Photovoltaics and Solar Thermal), with some wind power in the first. The third was during the Carter Administration in the 1970s (famously ending when Ronald Reagan ordered the solar panels off the roof of the White House). Disclosure: I was doing photovoltaic research at SES, Inc (now part of Royal Dutch Shell) as a physical electrochemist during this time.

During the recent upswing in interest, investment and installations of renewable energy sources (photovoltaics, solar thermal, wind, wave, tidal, geothermal, biomass, etc.) I've been worried that the bubble would soon burst. But today, I've had a thought that encourages me, that maybe renewables will take their place alongside coal, oil and nuclear. The reason for this is complex, more social than technical, more due to business than to science.

Many point to the past failures of renewables, of whatever type, as due to inefficiencies and to long periods, or infinite time, for a return on the upfront investment. But I think that much of what prevented adoption of renewables was social and business in nature. For the most part, the past marketing effort for renewables was to get people off the grid. This was scary for the individual, not justified by the ROI, and inimical to business interests.

Today, however, we have the prospect of the Smart Grid. What exactly defines the Smart Grid is still being debated, but here's my hopeful thought. Just as the Internet evolved to combine data, communication and collaboration protocols into what we now term Web2.0 or read-write-web or social media, allowing anyone who desires to do so to become a producer of content as well as a consumer, the Smart Grid will not force users of renewable energy sources off the grid, but will allow whoever desires to do so to become a producer as well as a consumer of utility services, starting with electricity, but perhaps evolving to include other utility services as well. Let me also point out that I'm not [just] talking about the individual, I'm talking about communities and small businesses. For example, the Smart Grid would allow a small business such as our local Coastside Scavengers to install an AdaptiveARC reactor, transforming the waste they pick up from our homes into electricity, and additional cash flow.

This possibility has social, business and economic implications that the previous generations of renewables lacked. This gives me hope. This also strengthens my desire to see workable standards, and working implementations of the Smart Grid(s) - whatever that turns out to really mean.

The TeleInterActive Press is a collection of blogs by Clarise Z. Doval Santos and Joseph A. di Paolantonio, covering the Internet of Things, Data Management and Analytics, and other topics for business and pleasure.