Comment to BBBT Blog on WhereScape

Today started for me with a great Boulder BI Brain Trust [BBBT] session featuring WhereScape and the launch of WhereScape 3D [registration or account required to download], their new data warehouse planning tool. Beyond my interest in all things related to data management and analysis [DMA], WhereScape 3D is particularly interesting to me for its potential use in Agile environments and its flexibility in working with other data integration tools, not just WhereScape Red. Richard Hackathorn, who has already downloaded and used the tool, does a great job describing WhereScape 3D, which launched in beta at the BBBT, complete with cake for those in the room. [I'm awaiting the promised cross-platform JAR to try it out on my MacBook Pro.]

Unfortunately, Twitter search is letting me down today; I normally gather all the #BBBT tweets from a session, send them to Evernote, and check these "notes" as I write a blog post.

WhereScape 3D is a planning tool that allows a data warehouse developer to profile source systems, model the data warehouse or data mart, and automagically create metadata-driven documentation. Further, one can iterate through this process, creating new versions of the models and documentation without destroying the old. The documentation can be exported as HTML and included in any web-based collaboration platform. So, there is the potential of using the documentation against Scrum-style burn-down lists and for lightweight Agile artifacts.

WhereScape 3D and Red come with a variety of ODBC drivers and, with the proper Teradata licensing, the Teradata JDBC driver as well. One can also add other ODBC and JDBC drivers. However, neither WhereScape product currently allows connections to non-relational data sources. I find this severely limiting: in traditional enterprises, we've never worked on a DMA project that didn't include legacy systems requiring us to pull from flat files, from systems written in Pick BASIC against UniVerse or another multi-value database management system [MVDBMS], from electronic data interchange [EDI] files, from XML, or from Java or RESTful services. In other cases, we're facing new data science challenges of extreme volumetric flows of data from web, sensor and transaction logs, requiring real-time analytics, such as can be had with SQLstream, or storage in NoSQL data sources, such as Hadoop and its offshoots.

Which leads us to another interesting feature of WhereScape 3D: it's designed to be used with any data integration tool, not just WhereScape Red. I'm looking forward to getting that JAR file, currently hiding in an MS Windows EXE file, trying WhereScape 3D in conjunction with Pentaho Data Integration [PDI or KETTLE], and seeing how the nimble nature of WhereScape 3D planning works with PDI Spoon AgileBI against all sorts of data flows targeting the LucidDB ADBMS and data vault. Yeehah!

Full360 on BBBT

Today, Friday the 13th of May, 2011, the Boulder BI Brain Trust heard from Larry Hill [find @lkhill1 on Twitter] and Rohit Amarnath [find @ramarnat on Twitter] of Full360 [find @full360 on Twitter] about the company's elasticBI™ offering.

Serving up business intelligence in the Cloud has gone through the same hype cycles as all other software applications, from the early application service providers (ASP), through the software as a service (SaaS) pitches, to the current Cloud hype, including infrastructure and platform as a service (IaaS and PaaS). All the early efforts failed. To my mind, there have been three reasons for these failures.

  1. Security concerns on the part of customers
  2. Logistics difficulties in bringing large amounts of data into the cloud
  3. Operational problems in scaling single-tenant instances of the BI stack to large numbers of customers

Full360, a 15-year-old system integrator and consultancy with a clientele ranging from startups to the top ten global financial institutions, has come up with a compelling Cloud BI story in elasticBI™, using a combination of open source and proprietary software to build a full BI stack, from ETL [Talend Open Studio, as available through Jaspersoft], to the data mart/warehouse [Vertica], to BI reporting, dashboards and data mining [Jaspersoft partnered with Revolution Analytics], all available through Amazon Web Services (AWS). Full360 is building upon their success as Jaspersoft's primary cloud partner, and upon their involvement in the RightScale Cloud Management stack, a 2010 SIIA CODiE award winner built on essentially the same stack as elasticBI.

Full360 has an excellent price point for medium-size businesses, or for departments within larger organizations. Initial deployment, covering set-up, engineering time and the first month's subscription, comes to less than a proof of concept might cost for a single piece of their stack. The entry-level monthly subscription, extended out for one year, is far less than an annual subscription or the licensing costs for similar software, once you consider depreciation on the hardware and the cost of personnel to maintain the system. Especially considering that the monthly fee includes operations management and a small amount of consulting time, this is a great deal for medium-size businesses.

The stack being offered is full-featured. Jaspersoft has, arguably, the best open source reporting tool available. Talend Open Studio is a very competitive data integration tool, with options for master data management, data quality and even an enterprise service bus for complete data integration from internal and external data sources and web services. Vertica is a very robust and high-performance column-store Analytic Database Management System (ADBMS) with "big data" capabilities that was recently purchased by HP.

All of this is wonderful, but none of it is really new, nor a differentiator from the failed BI services of the past, nor from the on-going competition today. Where Full360 may win, however, is in how they answer the three challenges that caused the failure of those past efforts.

Security

Full360's elasticBI™ handles the security question with the answer that they're using AWS security. More importantly, they recognize the security concerns: one of their presentation sections today, "Hurdles for Cloud BI", named cloud security, data security and application security, all three of which are handled by AWS standard security practices. Whether or not this is sufficient, especially in the eyes of customers, is uncertain.

Operations

Operations and maintenance is one area where Full360 is taking great advantage of the evolution of current Cloud services best known methods and "devops", using Opscode Chef recipes to handle deployment, maintenance, ETL and upgrades. However, whether or not this level of automation will be sufficient to counter the lack of a multi-tenant architecture remains to be seen. There are those who argue that the true Cloud, or even the older SaaS, differentiators, and the ability to scale profitably at these price points, depend on multi-tenancy, which keeps all customers on the same version of the stack. The heart of providing multi-tenancy is in the database, and this is the point where most SaaS vendors, other than salesforce-dot-com (SFDC), fail. However, Jaspersoft does claim support for multi-tenant architecture. It may be that Full360 will be able to maintain the balance between security/privacy and scalability with their use of devops, and without creating a new multi-tenant architecture.

Also, the point of Cloud services isn't the cloud at all. That is, the fact that the hardware, software, platform, what-have-you is in a remote or distributed data center isn't the point. The point is elastic self-provisioning: the ability of customers to add resources on their own, and to be charged accordingly.

Data Volume

The entry-level data volume for elasticBI™ is the size of a departmental data mart today. But even today, successfully loading that much data into the Cloud in a nightly ETL run simply isn't feasible. Full360 is leveraging Aspera's technology for high-speed data transfer, and AWS does support a form of good ol' fashioned "sneaker net", allowing customers to mail in hard drives. In addition, current customers with larger data volumes are drawing that data from the cloud, with the source being in AWS already, or from SFDC. This is a problem that will continue to be an "arms race" into the future, with data volumes, source location and bandwidth in a three-way pile-up.

In conclusion, Full360 has developed an excellent BI service to supplement their professional services offerings. Larger organizations are still wary of allowing their data out of their control, or may be afraid of the target that web services provide for hackers, as exemplified by the recent bank & retailer email scammers, er, marketers, and the Sony break-ins. Smaller companies, which might find the price attractive enough to offset security concerns, haven't seen the need for BI. So, the question remains as to whether or not the market is interested in BI in the Cloud.

This post was simultaneously published on the Blog of the Boulder BI Brain Trust, of which I'm a member.

Setting up the Server for OSS DSS

The first thing to do when setting up your server with open source solutions [OSS] for a decision support system [DSS] is to check all the dependencies and system requirements for the software that you're installing.

Generally, in our case, once you make sure that your software will work on the version of the operating system that you're running, the major dependency is Java. Some of the software that we're running may have trouble with OpenJDK, and some may require the full Java software development kit [JDK or Java SDK], not just the runtime environment [JRE]. For example, Hadoop 0.20.2 may have problems with OpenJDK, and versions of LucidDB before 0.9.3 required the JDK. Once upon a time, two famous database companies would each issue system patches that were required for their RDBMS to run, but that would break the other's, forcing customers to have only one system per host. A true pain for development environments.
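
For example, before installing anything, it's worth a quick check of which Java your server will actually use, and whether it's a full JDK; a minimal sanity check at the CLI:

java -version    # OpenJDK identifies itself as such; Sun's Java reports "Java(TM)"
javac -version   # if javac is missing, you have only the JRE, not the JDK
which java       # shows which java binary is first on your PATH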

Since I don't know when you'll be reading this, or if you're planning to use different software than I'm using, I'm just going to suggest that you check very carefully that the system requirements and software dependencies are fulfilled by your server.

Now that we're sure that the *Nix or Microsoft operating system that we're using will support the software that we're using, the next step is to set up a system user for each software package. Here are examples for *Nix operating systems: Linux kernel 2.x derived, and the BSD-derived MacOSX. I've tested this on Red Hat Enterprise Linux 5, OpenSUSE 11, and MacOSX 10.5 [Leopard] and 10.6 [Snow Leopard].

On Linux, at the command line interface [CLI]:

useradd -c "name your software Server" -s /bin/bash -mr USERNAME
-c COMMENT is the comment field, used as the user's full name
-s SHELL defines the login shell
-m creates the home directory
-r creates a system user

Likely, you will need to run this command through sudo, and may need the full path:

/usr/sbin/useradd

Change the password:

sudo passwd USERNAME

Here's one example, setting up the Pentaho system user.

poc@elf:~> sudo /usr/sbin/useradd -c "Pentaho BI Server" -s /bin/bash -mr pentaho
poc@elf:~> sudo passwd pentaho
root's password:
Changing password for pentaho.
New Password:
Reenter New Password:
Password changed.
poc@elf:~>

On the Mac, do the following:

vate:~ poc$ sudo dscl /Local/Default -create /Users/_pentaho
vate:~ poc$ sudo dscl /Local/Default -create /Users/_pentaho RealName "PentahoCE BI Server"
vate:~ poc$ sudo dscl /Local/Default -create /Users/_pentaho UserShell /bin/bash
vate:~ poc$ sudo passwd _pentaho
Changing password for _pentaho.
New Password:
Reenter New Password:
Password changed.
vate:~ poc$

On Windows, you'll want to set up your server software as a service after the installation.

If you haven't already done so, you'll want to download the software that you want to use from the appropriate place. In many cases this will be SourceForge. Alternate sources might be the Enterprise Editions of Pentaho, the DynamoBI downloads for LucidDB, SQLstream, SpagoWorld, The R Project, Hadoop, and many more.

Installing this software is no different than installing any other software on your particular operating system:

  • On any system, you may need to unpack an archive indicated by a .zip, .rar, .gz or .tar file extension. On Windows and MacOSX you will likely just double-click the archive file to unpack it. On *Nix systems, including MacOSX and Linux, you may also use the CLI and a command such as gunzip, unzip, or tar xvzf.
  • On Windows, you'll likely double-click a .exe file and follow the instructions from the installer.
  • On MacOSX, you might double-click a .dmg file and drag the application into the Applications directory, or you'll do something more *Nix like.
  • On Linux systems, you might, at the CLI, execute the .bin file as the system user that you set up for this software.
  • On *Nix systems, you may wish to install the server-side somewhere other than a user-specific or local Applications directory, such as /usr/local/ or even in a web-root.
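
As a concrete sketch of the *Nix case, here's what an install into /usr/local might look like for the Pentaho BI Server community edition; the archive name below is illustrative only, so substitute whatever version you actually downloaded:

cd /usr/local
# unpack the archive [adjust the file name to match your download]
sudo tar xvzf ~/Downloads/biserver-ce-3.8.0-stable.tar.gz
# hand ownership to the pentaho system user created earlier
sudo chown -R pentaho:pentaho /usr/local/biserver-ce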

One thing to note is that most of the software that you'll use for an OSS DSS uses Java, and that the latest Pentaho includes the latest Java distribution. Most other software doesn't. Depending on your platform, and on the supporting software that you have installed, you may wish to point [softwareNAME]_JAVA_HOME to the Pentaho Java installation, especially if the version of Java included with Pentaho meets the system requirements for the other software that you want to use, and you don't have any other compatible Java on your system.
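
For instance, if another package should use the Java that ships with Pentaho, you might set its Java home in the system user's shell profile; a sketch, assuming a biserver-ce install under /usr/local [your paths and variable names will vary by package]:

# point the package's Java home at Pentaho's bundled JRE
export PENTAHO_JAVA_HOME=/usr/local/biserver-ce/jre
export JAVA_HOME=$PENTAHO_JAVA_HOME
export PATH=$JAVA_HOME/bin:$PATH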

For both security and to avoid any confusion, you might want to change the ports used by the software you installed from their defaults.
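
As an example, the Pentaho BI Server runs inside Tomcat, which listens on port 8080 by default; a sketch of moving it elsewhere, assuming the stock biserver-ce layout:

# edit the Connector port in Tomcat's configuration
sudo -u pentaho vi /usr/local/biserver-ce/tomcat/conf/server.xml
# change <Connector port="8080" ... /> to, say, <Connector port="18080" ... />
# then update the base-url in the Pentaho web.xml to match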

You may need to change other configuration files from their defaults for various reasons as well, though I generally find the defaults to be satisfactory. You may also need to install components from one package into another, for compatibility or interchange. For example, if you're trying out, or have purchased, Pentaho Enterprise Edition with Hadoop, Pentaho provides Java libraries [JAR files] and licenses to install on each Hadoop node, including code that Pentaho has contributed to the Hadoop project.
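
A hedged sketch of what distributing those libraries might look like; the JAR names, node list and Hadoop path here are hypothetical, and Pentaho's own installation instructions take precedence:

# copy the Pentaho-provided libraries to each Hadoop node's lib directory
# [file names, hosts and paths are illustrative only]
for node in hadoop1 hadoop2 hadoop3; do
    scp pentaho-hadoop-*.jar $node:/usr/local/hadoop/lib/
done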

Also remember that Hadoop is a top-level Apache project, and not usable software in and of itself. It contains subprojects that make it useful:

  • Hadoop Common - the utilities that support all the rest
  • HDFS - the Hadoop Distributed File System
  • MapReduce - the software framework for distributed processing of data on clusters

You may also want one or more of the other Apache subprojects related to Hadoop:

  • Avro - a data serialization system
  • Chukwa - a data collection system
  • HBase - a distributed database management system for structured data
  • Hive - a data warehouse infrastructure
  • Mahout - a data mining library
  • Pig - a high-level data processing language for parallelization
  • Zookeeper - a coordination service for distributed applications
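
Once the Hadoop core is installed, a quick smoke test exercises both HDFS and MapReduce; this sketch uses the wordcount sample bundled with Hadoop 0.20.2, run from the Hadoop installation directory:

# copy a local file into HDFS
bin/hadoop fs -mkdir input
bin/hadoop fs -put /etc/hosts input/
# run the bundled wordcount example [MapReduce] and inspect the results
bin/hadoop jar hadoop-0.20.2-examples.jar wordcount input output
bin/hadoop fs -cat output/part-r-00000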

Welcome to the Data Archon

We've been blogging for over five years about open source solutions for data management & analytics, collaboration, mobile, project management, Agile implementations and the like. In addition to these topics, we'll be discussing statistical, mathematical and computerized modeling, management and analysis of data.

Chili for a Chilly Day

I haven't posted a recipe in a while, but as Friday swung our weather from bright blue, warm days into chilly, rainy winter in a quick snap of the fingers, I thought it was time to make the first chili of the year. I've been building this chili recipe since high school, when I first added a block of unsweetened chocolate to the mix, and into college, when I first added dark beer. Now the recipe contains hints of a molé sauce as well, and I make my spice mix in advance, to allow the flavours to blend. Oh, and open a bottle of your favorite dark beer, or two if you want to start drinking B) Set the beer aside to become flat.

Beans

While adding beans is optional, I'm planning to do so, and since I'll be using dried beans, this step has the longest lead time. First, I buy my dried beans at Phipps Country Store and Farm in Pescadero, CA. They are about a half-hour drive from me, and I'll visit them several times a year to replenish my supply of dried beans. They have a huge selection. For red chili, I use a combination of black beans and one or more dried beans from the kidney family: Big Mexican Red Kidney, Cranberry, Pinto or Red beans. Cranberry are my favorite; they're a big, meaty bean, with a nutty flavour that complements the creamy black bean nicely. Phipps now has an online store, so you can buy their great beans even if they aren't a convenient drive from you.

I'll use about two pounds of beans: one pound of dried black beans, and the second pound made up of whatever kidney varietals I'm using. As I said, Cranberry beans are my favorite for chili, and that's what I'll be using with the black beans today. Put the dried beans in a strainer, and rinse under cold water. Carefully check the beans, removing any discolored, withered or soft beans, as well as any foreign material such as stems or stones. Place the beans in a kettle and cover with enough cold water to top the beans by two inches. Remove any "floaters". Add a bay leaf. Do not add any salt or acids [tomato, vinegar, etc.], as these will wrinkle the beans. You can let the beans soak overnight, or bring the kettle to a boil, simmer the beans for five minutes, and then let the beans soak in the hot water, covered, for an hour. After the hour's soak, remove the beans, retaining about a cup of the water and the bay leaf. All of this just prepares the beans; they're not cooked and ready to eat yet.

Prepared Beans in a Ceramic Bowl

Either in advance, or an hour before serving, put the prepared beans back into the kettle; add the reserved soaking liquor and bay leaf, and a red [hot] or yellow [sweet] onion, peeled and studded with cloves. Do not add salt or acids. Cover with enough cold water to just top the beans. Bring to a boil, place on simmering bricks, and simmer for 45 minutes or until the beans are tender.

Chili Base

This is the real "Chili", with Tex-Mex, "Texas Red", Chili con Carne, and Chili with Beans being stews based upon it. I start with about five pounds of tomatoes and five pounds of peppers. The tomatoes can be heirloom, cluster, or whatever you have in your garden or local store that is fresh, feels heavy for its size and is very ripe. If you use anything other than red tomatoes, your chili may have an odd colour, but the flavour will be great. I usually use an equal combination, by weight, of chili peppers and bell peppers. In California, at this time of year, there is a great selection of chilies: poblano, anaheim, astor, etc. I generally avoid green bell peppers, as I prefer the flavour of the red, orange and yellow ones. Today I'm using almost three pounds of poblano chilies and two pounds of red, orange and yellow bell peppers.

I start with the tomatoes, as they'll take a while as well, and can also be prepared the day before, as the beans can. Bring a large pot or kettle of cold, salted water to a boil. While it's coming to a boil, using a very sharp knife [I use a "bird's beak" hooked knife], remove the stem end from each tomato, and score an "X" in the skin at the opposite end from the stem. Place the tomatoes into the boiling water. Once the pot has returned to a boil, after two minutes or so, the skin will begin to peel back from the scored end of the tomatoes. Remove the tomatoes, and place in a bowl to cool. As soon as you can handle them, remove the skin from the tomatoes. Cut each tomato in half, cross-wise, and remove the seeds with a small spoon. Place the peeled, cored tomatoes, cut-side down, into a colander, and allow to drain for at least an hour; overnight in the refrigerator [over a bowl] is fine too. You'll be amazed how much water you'll collect. You can save this tomato juice to use in place of water in stock or stews, or to thin this chili, if needed.

Peeled Tomatoes
Tomatoes Cut Along the Cross Section
Tomatoes With Seeds Removed

Next, rinse the peppers and fire-roast them, either over the flames of a gas stove, under a broiler, or over a grill. Leave the peppers whole, and flame them until the skin is blackened all over. Place the peppers into a paper bag, or wrap in parchment paper - this traps enough steam to help loosen the skins, and allows them to cool. If the peppers are hot to the taste [such as a jalapeño chili], wear rubber gloves and a mask to avoid capsaicin burns. Scrape as much skin as possible off of the peppers with a knife, core them, cut in half lengthwise, and remove the white veins. Cut the peppers into strips, lengthwise.

Peppers Being Fire Roasted over a Gas Stove Top

Make a soffritto of one sliced large red onion, two crushed cloves of garlic, the peppers, your favorite chili powder, and cilantro leaves that have been rinsed, dried and chopped. A soffritto is just a slowly cooked medley of vegetables, spices and herbs in olive oil. After an hour or two or three, chop the tomatoes that have been draining and add them, along with that bottle of beer that you left to go flat, a block of very good dark, unsweetened chocolate, and two tablespoons of freshly ground, roasted, unsalted valencia [sweet, and what I prefer] or virginia [meatier tasting] peanuts. Stir it around, add a teaspoon each of coarse sea salt, Mexican oregano and cayenne pepper, and cook on the simmering bricks, or in an oven on low, for about an hour. Add salt and seasonings to taste.

Soffritto
Drained Tomatoes on the Soffritto
Block of Dark Chocolate on a Plate
Chocolate Added to the Soffritto
Ground Peanuts added to the Soffritto
Chili Simmering in the Pot

Chili con Carne

If you're going to make a chili con carne, take two pounds of cubed beef, and brown the cubes in bacon fat. Add the chili over the meat to cover, and let simmer another hour. Shred the meat cubes apart using two forks and return to the stew, or slice thinly. Add the cooked beans and serve.

Chili Con Carne
Chili Con Carne with Beans in the Pan

Vegetarian Chili Frijole

Add the cooked beans to about a quart of the chili and serve.

Serving Suggestions

  1. Put a cooked, hot tamale of your favorite variation into a bowl and cover with the chili. I like a pork tamale with the chili con carne, and a chilies & cheese tamale with the vegetarian chili.
  2. Put cooked, brown, short-grain rice into a bowl, top with chili.
  3. Fill a bowl with chili and serve with hot cornbread. I especially like the recipe from Recipes for Living in Big Sur. This corn bread is stuffed with chilies and corn, and layered with cheese.
  4. Enjoy it diner style, served in a bowl with oyster or soda crackers.
  5. Forget the beans, don't shred the beef, and enjoy a bowl of Texas Red.
  6. Put out various condiments: chopped onions, chopped, fire-roasted jalapeños or hotter chilies, hot pepper sauces [there are many on the market, try something new], shredded cheese - especially Mexican cheeses.
  7. Serve over a bowl of different grains, rather than rice: quinoa is a great choice, especially for the vegetarian version; soft polenta is another good choice; and you can check out many more suitable grains at Bob's Red Mill.

  8. Take a crunchy deli roll, hollow it out, fill with thinly sliced beef and top with the chili and cheese. Or forget the beef, and use a kielbasa sausage.
Chili Served over a Corn Muffin in a Flat Bowl

Variations

First and foremost, recipes are guidelines, not exact instructions that you must follow. Add more or less of anything. Consider every recipe a starting point for your own imagination and taste.

We already talked about the various beans you can use, as well as the variety of chilies. There are many more types, of course. Be adventurous.

Instead of beef, try other red or dark meats: venison, buffalo or beefalo, elk, duck, turkey thighs - especially from a wild turkey, or other game meat. Go wild.

Rather than cayenne pepper, use ground ancho [sweet and fruity] or chipotle [smoky] chilies.

Try a chili verde: use tomatillos rather than tomatoes, and forget the chocolate and peanuts. Use white beans rather than red, and white meats rather than red.

Chili Powder

I make my own chili powder. I start by filling an old spice jar 50/50 with cumin and coriander seeds, and shoving in a cinnamon stick. When I need chili powder, I take a teaspoon of the mixture and toast the seeds. Allow the cumin and coriander to cool, then grind in a mortar and pestle, and add a 1/4 teaspoon of ground cinnamon and a 1/2 teaspoon of ground allspice.

Enjoy!
