DataGrok

For all the silliness surrounding Big Data and Data Science, all the hype and all the controversy, there are genuinely innovative and disruptive technologies coming from this new approach to data management and analytics [DMA]. How do we categorize vendors and technologies that have never existed before?

Predictives

One new area is Predictive Analytics, also called Predictive Intelligence. Since predictions are not analytics as the term is used in BI, and certainly not the Intelligence used in BI, I don't like either term and prefer the simpler "Predictives". Four companies with which I've had briefings fall into the Predictives category, but each of these companies has very different approaches and technologies for performing predictives. These companies are Opera Solutions, Alpine Data Labs, INRIX and Zementis. There are other companies, such as KXEN, Soft10 and Numenta, that I'll include in a full report after receiving briefings. By the way, Numenta's product is named "Grok". Given their differences, do they really all belong in the same category?

Opera Solutions: Acting on petabytes of data, Opera Solutions provides a signal hub stack that starts with data management, moves through pattern matching in the signal layer and, enhanced by their own Data Science teams, produces predictions and inferences that support better decisions and enterprise advantage. For Opera Solutions, understanding the "signal" is more important than the underlying technology. The goal is front-line productivity, with signals informing and adjusting "gut feel": machines don't direct humans, but they do the heavy lifting.

Alpine Data Labs: Alpine Data Labs brings mathematical, statistical and machine learning predictive methods to the data in situ, no matter how small or how big the data sets, within a variety of RDBMS technologies and Hadoop distributions. Alpine Data Labs helps data science teams address the data where it lies, across data types and functional areas, working with all the data to bring insight to bear on better decisions.
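To make "bringing the methods to the data in situ" a little more concrete, here is a minimal sketch of the general idea of in-database analytics. It is not Alpine Data Labs' product or API. A simple least-squares line is fit using SQL aggregates that run inside the database (SQLite here), so only five sums, not the rows themselves, travel back to the analyst. The table name `readings` and its columns `x` and `y` are hypothetical.

```python
import sqlite3

def fit_line_in_database(conn, table="readings"):
    """Least-squares fit of y = a + b*x computed with SQL aggregates.

    Only the aggregate sums leave the database; the individual rows stay put.
    `table` is a hypothetical name interpolated into the SQL, so it must come
    from trusted code, never from user input.
    """
    n, sx, sy, sxx, sxy = conn.execute(
        f"SELECT COUNT(*), SUM(x), SUM(y), SUM(x*x), SUM(x*y) FROM {table}"
    ).fetchone()
    if n < 2:
        raise ValueError("need at least two rows")
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("need at least two distinct x values")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return intercept, slope

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (x REAL, y REAL)")
    # Synthetic points lying exactly on y = 1 + 2x.
    conn.executemany("INSERT INTO readings VALUES (?, ?)",
                     [(x, 1 + 2 * x) for x in range(10)])
    print(fit_line_in_database(conn))
```

The same pattern scales to larger engines: the heavier the data, the more it pays to ship the computation to the data rather than the other way around.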

INRIX: INRIX's data science teams and technology provide unique predictives using data from connected cars, connected devices and connected people.

Zementis: Zementis brings predictive modeling into decision management through their data science teams, their Adapa product and a strong commitment to the Predictive Model Markup Language [PMML]. Through partners and customers, Zementis works with traditional and innovative data sources to provide decision management from predictives, data mining and machine learning for marketing solutions, financial services, predictive maintenance and energy/water sustainability.
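PMML's appeal is that a model trained in one tool can be described in XML and scored somewhere else. The following is a toy illustration of that idea, not how Zementis' Adapa works. It uses Python's standard `xml.etree.ElementTree` to read a single linear `RegressionModel` in the PMML 4.4 namespace and score one record. The field names `age` and `income` and the coefficients are made up for the example, and only `NumericPredictor` terms are handled.

```python
import xml.etree.ElementTree as ET

# A hypothetical PMML 4.4 document holding one linear regression model.
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header description="toy example"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression" modelName="toy">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="income"/>
      <MiningField name="score" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="10">
      <NumericPredictor name="age" coefficient="0.5"/>
      <NumericPredictor name="income" coefficient="0.001"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-4_4"}

def score(pmml_text, record):
    """Score one record: intercept + sum(coefficient * value ** exponent)."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    result = float(table.get("intercept"))
    for pred in table.findall("p:NumericPredictor", NS):
        exponent = int(pred.get("exponent", "1"))  # PMML's default exponent is 1
        result += float(pred.get("coefficient")) * record[pred.get("name")] ** exponent
    return result

if __name__ == "__main__":
    print(score(PMML_DOC, {"age": 40, "income": 50000}))
```

A production scoring engine would also validate the record against the `MiningSchema`, handle missing values and support the many other PMML model types. This sketch does none of that.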

DataGrok

One of the more interesting questions to come out of data science is how to really understand the data being gathered and presented. Two of the companies with which I've recently had briefings challenge the categories of Data Discovery and Data Exploration. However, each of these companies has different technologies and different approaches to fully, deeply understanding your data, and to drawing conclusions from the data before doing other, more formal analytics. Over the past month, I've had the good fortune of very in-depth, in-person briefings with both of these companies. Both are helping those who need it most to truly, fully, deeply, easily understand their data. These approaches, while very, very different, both constitute an entirely new category. Beyond data discovery, beyond data exploration, I call this new category Data Grokking.

"Grok", as I wrote in 2007, means

"to fully and deeply understand" [but you need some background on the word's origins]. It's Martian and not from any Terran language at all. It comes from the fertile mind of Robert A. Heinlein, and was brought to Earth by Valentine Michael Smith in Heinlein's wonderful 1961 novel Stranger in a Strange Land.

One of these companies is still in stealth mode, and I won't mention their name here. The other is Ayasdi, and Ayasdi takes a very, very interesting approach to grokking your data.

These two very different technologies, based upon very different science and mathematics, do indeed allow us to fully and deeply understand our data. Much like the Martian ceremony, the DataGrok allows us to mentally ingest our data, to realize creative insights from our data sets, and to recognize the fundamental interweaving among the data, that, prior to these two innovative firms, could only come about through a long, arduous struggle with the data sets.

As I mentioned, the one company is still in stealth mode, so I'll write about Ayasdi here.

Ayasdi

Ayasdi comes out of the intersection of Topology and Computer Science, as brought together by Stanford Professor Gunnar Carlsson and Gurjeet Singh. The work started as a DARPA project, CompTop, that spanned more than four years and included nodes at Duke, Rutgers and Stanford. Topological methods discover the shape, or structure, of the data. This is somewhat analogous to, but not the same as, describing data with a probability density function or cumulative distribution function [pdf or cdf].
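One well-known topological construction from this line of research is Mapper, described by Singh, Mémoli and Carlsson. The following is a minimal, generic sketch of the Mapper idea, not Ayasdi's implementation. The steps are:

1. Project every point through a "lens" (filter) function.
2. Cover the lens values with overlapping intervals.
3. Cluster the points that fall in each interval.
4. Join clusters that share points.

The function names, the single-linkage clustering and the default parameters are all assumptions chosen for illustration.

```python
import math
from itertools import combinations

def single_linkage(points, idx, eps):
    """Cluster the indices in idx: points linked by a chain of steps <= eps share a cluster."""
    parent = {i: i for i in idx}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the union-find trees shallow
            i = parent[i]
        return i

    for i, j in combinations(idx, 2):
        if math.dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)
    groups = {}
    for i in idx:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())

def mapper(points, lens, n_intervals=4, overlap=0.25, eps=0.6):
    """Return (nodes, edges): nodes are sets of point indices, edges join overlapping nodes."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals or 1.0  # avoid a zero-width cover
    pad = overlap * width                    # how far each interval reaches into its neighbors
    nodes = []
    for k in range(n_intervals):
        start = lo + k * width - pad
        end = lo + (k + 1) * width + pad
        idx = [i for i, v in enumerate(values) if start <= v <= end]
        nodes.extend(single_linkage(points, idx, eps))
    edges = {(a, b) for a, b in combinations(range(len(nodes)), 2) if nodes[a] & nodes[b]}
    return nodes, edges

if __name__ == "__main__":
    # Twelve points evenly spaced on a unit circle; the lens is the x-coordinate.
    circle = [(math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12)) for k in range(12)]
    nodes, edges = mapper(circle, lens=lambda p: p[0])
    print(len(nodes), "nodes,", len(edges), "edges")
```

For the circle, the result is six nodes joined in a single loop. The graph recovers the "hole" in the data, which no single summary statistic or density function would reveal directly.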

Ayasdi is focused on four markets:

  1. Pharmaceuticals, Healthcare and Biotech
  2. Oil & gas
  3. Government
  4. Financial Services

From this, you can see that Ayasdi's customers go after expensive data: data that is expensive to collect and expensive to use. Iris is the front end to the Ayasdi Platform, and while it is available as a private cloud, the offering is primarily SaaS.

The analyst community is trying to figure out where to put Ayasdi, thus my category of DataGrok. Another area of confusion is "What is the right tool for each step of the process, from DataGrok to inferences and predictions?" Some of this stems from mistrust of machines, but we need machines that do more than count and sort; we need machines that help us find insight and improve performance.

Sensors Sensors Everywhere

A sensor is anything that can create data about its environs. A more formal definition is

a device that detects or measures a physical property and records, indicates, or otherwise responds to it -New Oxford American Dictionary

A very simple example is a thermocouple.

This is a picture of a k-type thermocouple showing the standard connector, taken from the FAA under a CC BY license.

Essentially, two dissimilar metals are joined together such that when the environment around the junction becomes hotter or colder, the metals produce a voltage. Through this thermoelectric (Seebeck) effect, the temperature difference between the junction and the other end of the wires translates into a voltage differential, producing an electrical signal. A simple voltmeter can read this signal, and one could calibrate that electrical signal to be read as degrees of temperature.
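The calibration step can be sketched as a rough linear conversion. This assumes a constant sensitivity of about 41 µV per °C, a common approximation for type K near room temperature. Real instruments use NIST reference polynomials instead, because the curve is not truly linear. The function name and the 25 °C reference temperature are illustrative assumptions.

```python
# Approximate Seebeck coefficient of a type K thermocouple near room temperature.
# Assumption: treated as constant; real instruments use NIST polynomial tables.
K_TYPE_UV_PER_C = 41.0

def thermocouple_temp_c(measured_mv, cold_junction_c=25.0, uv_per_c=K_TYPE_UV_PER_C):
    """Estimate the measuring-junction temperature from a voltmeter reading.

    The voltage reflects the temperature *difference* between the measuring
    junction and the reference (cold) junction at the meter, so the reference
    temperature is added back in (cold-junction compensation).
    """
    delta_c = measured_mv * 1000.0 / uv_per_c  # mV -> µV, then µV -> °C
    return cold_junction_c + delta_c

if __name__ == "__main__":
    # A 2.05 mV reading with a 25 °C reference works out to about 75 °C.
    print(round(thermocouple_temp_c(2.05), 1))
```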

You likely have a temperature sensor of some kind in your home thermostat. Perhaps you have a very simple thermostat that turns your home heater on and off.

This is a picture of an older model, simple home thermostat, with the cover removed, showing the inner workings, under a CC BY license.
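The logic of a simple on/off thermostat fits in a few lines. The sketch below assumes the common "deadband" (hysteresis) design. The heater switches on a little below the setpoint and off a little above it, and holds its current state in between so it doesn't cycle rapidly. The 20 °C setpoint and 0.5 °C deadband are illustrative values.

```python
def thermostat_step(temp_c, heater_on, setpoint_c=20.0, deadband_c=0.5):
    """Decide whether the heater should run, using a deadband (hysteresis)."""
    if temp_c <= setpoint_c - deadband_c:
        return True    # too cold: switch (or keep) the heater on
    if temp_c >= setpoint_c + deadband_c:
        return False   # warm enough: switch (or keep) the heater off
    return heater_on   # inside the deadband: keep the current state

def run(temps, heater_on=False):
    """Feed a sequence of temperature readings through the thermostat."""
    states = []
    for t in temps:
        heater_on = thermostat_step(t, heater_on)
        states.append(heater_on)
    return states

if __name__ == "__main__":
    print(run([19.0, 19.6, 20.2, 20.6, 20.1, 19.6, 19.4]))
```

Note that the readings of 19.6 °C produce different decisions depending on whether the house is warming up or cooling down. That memory is what the deadband adds.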

Perhaps you have a more complex, programmable thermostat that can control the temperature and humidity of your home through a furnace, air conditioner, humidifier/dehumidifier and fans, with different settings for different times of the day and days of the week.
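A programmable thermostat's schedule is essentially a lookup table keyed by day type and time of day. The following is a minimal sketch that covers only a heating setpoint. The periods and temperatures are hypothetical, and a real device would also schedule cooling, humidity and fans.

```python
from datetime import datetime, time

# Hypothetical schedule: (start time, heating setpoint in °C), sorted by start time.
SCHEDULE = {
    "weekday": [(time(6, 0), 20.0), (time(8, 30), 16.0), (time(17, 30), 20.5), (time(22, 0), 17.0)],
    "weekend": [(time(7, 30), 20.5), (time(23, 0), 17.0)],
}

def setpoint_for(moment: datetime) -> float:
    """Return the setpoint in force at `moment`."""
    periods = SCHEDULE["weekend" if moment.weekday() >= 5 else "weekday"]
    # Before the first period starts, reuse the day type's last (overnight) setting;
    # a simplification that ignores a weekday/weekend change at midnight.
    current = periods[-1][1]
    for start, setpoint in periods:
        if moment.time() >= start:
            current = setpoint
    return current

if __name__ == "__main__":
    print(setpoint_for(datetime(2024, 1, 1, 7, 0)))   # a Monday morning
    print(setpoint_for(datetime(2024, 1, 6, 12, 0)))  # a Saturday midday
```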

This is a picture of an advanced Honeywell Programmable Home Thermostat with a green backlit LCD display, from the Honeywell website.

Perhaps you have something that looks very simple, but is now part of a complex system that includes not only your home HVAC system, but your computer and smartphone, and computers and analytic software at your utility company.

This is a picture of the very advanced Nest home thermostat, which looks very simple but connects to your computers, smartphones, tablets and more, from the Nest website press downloads.

And this progression is why the Internet of Things is about to explode with Connected Data, with sensors being the new nerve endings of an increasingly intelligent world.

This is a section of my Internet of Things mindmap showing just the sensor branches.

Imagine sensors streaming Connected Data from your home entertainment system, refrigerator & most of its contents, toaster, coffee maker, alarm clock, garden, irrigation, home security, parking on the street in front of your home, traffic flowing by your home to your destination, air quality, and so much more.

We will interact with the world around us in ways that will change our decision making processes in our personal lives, in business, and in the regulatory processes of governments.

If you want to learn more, join IBM and my fellow panelists on Thursday, Sept. 13, from 4 to 5 p.m. ET to chat about cloud and the connected home using hashtag #cloudchat.

The Internet of Things and Change

Will You Be Ready For the M2M World?

The Internet of Things, the Connected World, the Smart Planet… All these terms indicate that the number of devices connected to, communicating through, and building relationships on the Internet has exceeded the number of humans using the Internet. But what does this really mean? Is it about the number of devices, and what devices? Is it about the data, so much data, so fast, so disparate, that will make current big data look like teeny-weeny data?

I think that it's about change: the way we live our lives, the way we conduct business, the way we walk down a street, drive a car, or think about relationships. All will change over the next decade:

  1. Sensors are everywhere. The camera at the traffic light and the one overseeing the freeway: those are sensors. That new bump in the parking space and new box on the street lamp: those are sensors. From listening for gunshots to monitoring a chicken coop, sensors are cropping up in every area of your life.
  2. Machine to Machine [M2M] relationships will generate connected data that will affect every aspect of your life. Connected Data will be used to fine-tune predictives that will prevent crimes, anticipate your next purchase and take control of your car to avoid traffic jams. The nascent form of this is already here: Los Angeles and Santa Cruz police are using PredPol to predict and prevent crimes, location-aware ads are popping up in your favorite smartphone apps, and Nevada and California are giving driver licenses to robotic cars.
  3. Sustainability isn't about saving the planet, it's about saving money. Saving the planet, reducing dependence on polluting energy sources and reducing waste in landfills are all good things, but they aren't part of the fiduciary responsibilities of most executives. However, Smart Buildings, recycling & composting, and Green IT all increase a company's bottom line and that does fall under every executive's fiduciary goals.

Making Sense of Inter-Connectedness - Introducing My Internet of Things Mind Map

As you can tell from the mindmap associated with this post, I've been thinking about the Internet of Things quite a bit lately. It's a natural progression for me. I'm fascinated by all the new sensors, the Connected Data [you heard it here first] that will swamp Big Data, the advances in data management and analytics that will be needed, the impact upon policy and regulation, and the vision of the people and companies bringing about the Internet of Things. But more than that, as I've been reading and thinking about the SmartPlanet, SmartCities, SmartGrid and SmartPhones, and all that ConnectedData, I realized that I can never look at the world around me in the same way again.

Let's look at some of the "facts" [read guesses] that have been written about the IoT.

Looking to the future, Cisco IBSG predicts there will be 25 billion devices connected to the Internet by 2015 and 50 billion by 2020. From The Internet of Things: How the Next Evolution of the Internet Is Changing Everything by Dave Evans, April 2011 [links to PDF]

Between 2011 and 2020 the number of connected devices globally will grow from 9 billion to 24 billion as the benefit of connecting more and varied devices is realised. The Connected Life: A USD4.5 trillion global impact in 2020, [links to PDF] February 2012 by Machina Research for the GSMA.

Two different estimates, one of 24 billion devices of many different types, connected by wireless broadband, and one of 50 billion mobile devices using different types of cellular networks, all by the year 2020. And neither of these estimates includes the trillions of other types of things that will be deployed over the next eight years. Trillions, not billions, using a variety of personal, local, and wide-area wireless networks.
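One way to compare two forecasts with different start years is the compound annual growth rate each one implies. The following is a quick back-of-the-envelope sketch that uses only the figures quoted in the two reports. It assumes smooth exponential growth between the endpoints, which neither report necessarily claims.

```python
def cagr(start, end, years):
    """Compound annual growth rate implied by going from start to end over `years`."""
    return (end / start) ** (1 / years) - 1

if __name__ == "__main__":
    # Device counts in billions, as quoted in the two reports.
    estimates = {
        "GSMA (2011 -> 2020)": cagr(9, 24, 2020 - 2011),
        "Cisco IBSG (2015 -> 2020)": cagr(25, 50, 2020 - 2015),
    }
    for source, rate in estimates.items():
        print(f"{source}: {rate:.1%} per year")
```

The two forecasts imply roughly 11.5% and 14.9% annual growth respectively. They differ more in their starting counts than in their assumed pace of growth.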

My Focus Starts at The Intersection of Sensors, Analytics and Smart Cities, with Energy Management and Sustainability

One of the things that will change over time is the way that I look at the Internet of Things. All of it is interesting. But for now, I'll be focusing on the intersection of Sensors, Analytics and Smart Cities, with Energy Management and Sustainability.

Count RFID, Zigbee, MEMS, Smartdust and more traditional sensors, Robots, autonomous vehicles, Healthcare monitors, Smart Meters and more, being distributed in cities, cars, factories, trains, farms, planes, animals and people, and the number of connected devices in 2020 will be in the trillions. Data generated by fewer than one billion humans using the Internet a few times a day swamped traditional data management & analytics systems, spawning "Big Data". Trillions of devices updating ConnectedData every few nanoseconds will indeed change everything.

Of paramount importance moving forward is determining how to extract business, personal and social value from the intersections, interfaces and interstices of the infrastructure, connected data, objects and people building relationships through the Internet of Things.

Come join me as I look at this convergence and the business impact ahead of us.

Numb3rs Prototyped Data Science

Data Science is a new term and a new job title that has been receiving quite a bit of hype. There have been arguments over the definition of this term, and whether or not it truly describes a new field of endeavor or is just an offshoot of statistics, software programming or business intelligence. Another take is that many definitions of a Data Scientist describe a combination of skills that few, if any, individuals possess, and really define a team. The television show Numb3rs ran with new episodes in the USA from 2005 through 2010. In many ways, the show was a prototype for such a Data Science team. Let's look at the roles on the show, and see how they might translate into your organization.

The Mathematician or Statistician

Charlie Eppes is a boy genius who grew into a young professor of Mathematics at the fictional CalSci. Like so many professors, he supplemented his income by consulting; in his case, for the FBI and his brother, applying mathematics and statistics to solving crimes of all sorts. His breadth and depth of knowledge were remarkable, and unlikely to be matched in the real world. However, an applied mathematician or statistician with knowledge of a branch of mathematics or statistics relevant to your problems is essential, whether in a single Data Scientist or on a data science team.

Computational Statistician, Computer Scientist or Software Developer

Amita Ramanujan, a student at CalSci who achieves her doctorate in computational mathematics, becomes a professor at CalSci, helps with the consultations to the FBI and, as a side note, dates her one-time thesis advisor, Charlie Eppes. In many ways, Amita is the closest to being a Data Scientist of anyone in the show. Equally adept at mathematics, physics, statistics and hacking, Amita often acquires the required data from disparate sources, and transforms Charlie's mathematical visions into working code. If you hire an Amita, you might just have all you need to get Data Science producing real solutions for you.

Over time, both Charlie and Amita gain a fairly impressive domain knowledge of criminalistics.

Subject Matter Expert (SME)

And speaking of subject matter experts, this is where Numb3rs really prototyped what is required to make Data Science valuable in solving the crimes, er, problems at your organization. There were many SMEs, both as regular characters, and as special roles for specific shows. The regular characters included the FBI agents, from Don Eppes, the lead agent, and Charlie's brother, to his team, with David Sinclair and Colby Granger surviving the entire series. By the end of the series, these two FBI agents were taking turns suggesting and explaining mathematical approaches. Other FBI agents who were on the team for one or more seasons include Terry Lake, Megan Reaves, Liz Warner and Nikki Betancourt. Megan was also a profiler.

Also of note was the Eppes brothers' father, Alan Eppes. Alan was a retired city planner for Los Angeles. His knowledge of buildings, regulations, and the city's byways, processes, neighborhoods and interactions was often instrumental in understanding the results of Charlie and Amita's calculations.

Another regular was Larry Fleinhardt, holding the Walter T. Merrick chair at CalSci. It may surprise you to learn that a theoretical physicist and cosmologist, interested in studying the heavens, string theory and zero point energy, was a crucial member of a crime-solving data science team, but he was, apart from a brief hiatus aboard the International Space Station.

More important than these regular cast members, however, were the guest SMEs. There were some recurring roles, such as the drop-out with a knack for baseball stats, or the mechanical engineer who was more interested in how things failed than in how to build them. Specialists in flame propagation, biology, disease control, cryptology, cognition, gaming, chemistry, forensic accounting and more were all needed at one time or another to solve the crime.

Retrospective

For me, the lesson is that while you may find an individual who can be creative with the right math or statistics; find, extract and massage data; have sufficient domain expertise; and write sophisticated code, turning their creative algorithms into real solutions, you're more likely to need a team. And even that team will need additional help, from within the organization or from outside consultants. Your regular team and "guest stars" will include mathematicians, frequentists, Bayesians, engineers, scientists, accountants, business analysts, and others, all working to bring the best decisions out of your data.

And if you want to see how realistic the mathematics in Numb3rs was, check out The Math of Numb3rs from Cornell University. For more on the cast, characters and show, look for the official CBS Numb3rs site and, of course, the Wikipedia article on Numb3rs.

Update: 20120722: I'm honored by the mentions and retweets on Twitter from those who love the Numb3rs analogy. Thank you all.

Caggionetti

A Lone Raw Caggionetti
Caggionetti Dusted with Turbinado Sugar and the Spice Nutmeg
A Plateful of Fried Caggionetti dusted with turbinado sugar and nutmeg

Caggionetti are a fried Christmas cookie from the Abruzzo region of Italy. My paternal Grandmother, Leni, made them every year. Unfortunately, no one in the family ever got her recipe. They look like a fried ravioli, filled with a chestnut paste and dusted with sugar and spices. I've been making them for the past few years, playing with ingredients, and I finally have a recipe that I wish to share. This makes between 50 and 60 cookies.

The dough is made with olive oil, white wine and flour. If you don't have a pasta machine to roll out thin, flat sheets of dough, won ton wrappers may be substituted.

Pastry

4 to 4 & 1/2 cups of whole wheat pastry flour
1/3 cup of extra virgin olive oil - the fruitier the better
white wine

Mound the flour up on a [marble if you have it] pastry board, make a well in the center, add the olive oil, begin kneading the oil into the flour and add the white wine until you have a very stiff dough, similar to a pasta dough. Run it through your pasta machine at least twice until it is nice and thin.

Use a ravioli cutter, round cookie cutter or a glass to make 2 & 1/2 inch round circles of dough.

Filling

My grandmother made a filling of chestnut, cocoa, raisins, figs and hazelnuts. I've seen recipes using citron, walnuts, almonds, chocolate, or cicci instead of some or all of those ingredients, and ones with no cocoa or chocolate.

12 ounces of roasted chestnuts
1/4 cup of raisins soaked in the wine must before boiling or tawny port
1 pint of Grape or Wine must boiled down to about two ounces of syrup, if you can find it, or 1/2 cup of turbinado sugar and/or honey plus tawny port
1 cup of hazelnut meal
6 donatto figs done Melissese style with the tough stem removed, quartered length-wise and chopped coarsely
1/4 cup of fine quality, unsweetened cocoa
a few grinds of allspice

Mix all of these ingredients together.

Making the cookies

Using two spoons, take a chestnut-sized ball of the filling, and make it egg-shaped by scraping it between the spoons, then place it in the center of a dough circle. Rub water around the outside edge of the dough. Pull the dough up around the filling, press together at the watered edge, and then crimp with a fork, turn it over, and crimp the other side.

Heat a cast iron pan, add about a quarter-inch of olive oil. When hot, add enough cookies to the oil to fill the pan. Turn every two minutes until the dough is golden brown [usually about 8 minutes total]. Transfer to a plate lined with paper towels. Allow to cool for a few minutes, and then dust with sugar and spice [I used nutmeg, but cinnamon, clove, allspice, cardamom, or any combination works too].

The TeleInterActive Press is a collection of blogs by Clarise Z. Doval Santos and Joseph A. di Paolantonio, covering the Internet of Things, Data Management and Analytics, and other topics for business and pleasure.


Mindmaps

Our current thinking on sensor analytics ecosystems (SAE), bringing together critical solution spaces best addressed by the Internet of Things (IoT) and advances in Data Management and Analytics (DMA), is updated frequently. A static, scalable vector graphic of the mindmap is also available.
