I was having a conversation late last week with a SaaS BI vendor about how organizations get data into their online data warehouse (FTP seems to be the most popular method) when it struck me: why couldn’t they use an RSS (or, for that matter, Atom) feed from a transactional system to feed data into the data warehouse for BI purposes? Near-real-time data is essential for many types of BI analysis, so there has to be something better than once-daily uploads.
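As a rough illustration of the idea, here is a minimal Python sketch, using only the standard library, of a transactional system rendering recent transactions as an Atom feed that a SaaS BI service could poll. The record fields (`txn_id`, `updated`, `amount`), the `urn:example:orders` identifier and the function name `build_atom_feed` are hypothetical; a real feed would carry whatever payload the warehouse schema needs.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"
# Serialize Atom elements in the default namespace (no "ns0:" prefixes)
ET.register_namespace("", ATOM_NS)


def q(tag):
    """Qualify a tag name with the Atom namespace."""
    return f"{{{ATOM_NS}}}{tag}"


def build_atom_feed(feed_id, title, author, transactions):
    """Render transaction records as an Atom feed document (UTF-8 bytes).

    Each record is a dict with the hypothetical keys 'txn_id',
    'updated' (timezone-aware datetime) and 'amount'.
    """
    feed = ET.Element(q("feed"))
    ET.SubElement(feed, q("id")).text = feed_id
    ET.SubElement(feed, q("title")).text = title
    ET.SubElement(ET.SubElement(feed, q("author")), q("name")).text = author
    # The feed's <updated> is the newest transaction's timestamp
    newest = max((t["updated"] for t in transactions),
                 default=datetime.now(timezone.utc))
    ET.SubElement(feed, q("updated")).text = newest.isoformat()
    # Newest entries first, as feed readers conventionally expect
    for t in sorted(transactions, key=lambda t: t["updated"], reverse=True):
        entry = ET.SubElement(feed, q("entry"))
        # A stable, unique id per transaction lets consumers de-duplicate
        ET.SubElement(entry, q("id")).text = f"{feed_id}:txn:{t['txn_id']}"
        ET.SubElement(entry, q("title")).text = f"Transaction {t['txn_id']}"
        ET.SubElement(entry, q("updated")).text = t["updated"].isoformat()
        ET.SubElement(entry, q("content"), type="text").text = f"amount={t['amount']}"
    return ET.tostring(feed, encoding="utf-8", xml_declaration=True)
```

Calling `build_atom_feed("urn:example:orders", "Order transactions", "Order system", txns)` on a list of such records returns a feed document the BI side can fetch over plain HTTP, which is what makes the approach attractive for a SaaS consumer.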
5 Comments
The problem with RSS/Atom is that the producer doesn’t know whether the consumer has read the data, so you either stuff the RSS feed with more data than it needs or (really, and/or) run the risk of data dropouts. Now, there are technological solutions to this, using ETag tokens and various clever techniques, but since you’re coding this yourself, the question has to be: why not just use TIBCO, MQ, or whatever?
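To make David’s point concrete, here is a hedged Python sketch of the consumer-side bookkeeping: an ETag conditional GET (`If-None-Match`, which the server can answer with 304 Not Modified), de-duplication of entries that reappear in an overstuffed feed, and dropout detection. It assumes, hypothetically, that the producer stamps each entry with an integer `seq` starting at 1 and increasing by 1 per transaction; the class name `FeedPoller` is illustrative, and the HTTP fetch itself is left out so the logic can be exercised on its own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeedPoller:
    """Consumer-side state for polling a transaction feed via HTTP GET.

    Hypothetical assumption: the producer stamps every entry with an
    integer 'seq' that starts at 1 and increases by 1 per transaction.
    """
    etag: Optional[str] = None
    last_seq: int = 0  # highest sequence number processed so far

    def request_headers(self):
        # Conditional GET: if the feed is unchanged since the ETag we last
        # saw, the server can reply 304 Not Modified with no body.
        return {"If-None-Match": self.etag} if self.etag else {}

    def handle_response(self, status, etag, entries):
        """Return (new_entries, dropout_detected) for one poll."""
        if status == 304:
            return [], False
        self.etag = etag
        # Drop entries already processed: a feed "stuffed" with extra
        # history repeats items across polls. Keying by seq also removes
        # duplicates within a single response.
        unseen = {e["seq"]: e for e in entries if e["seq"] > self.last_seq}
        fresh = [unseen[s] for s in sorted(unseen)]
        # Any break in the sequence means the producer rolled entries off
        # the feed before we read them.
        expected = self.last_seq + 1
        dropout = False
        for e in fresh:
            if e["seq"] != expected:
                dropout = True
            expected = e["seq"] + 1
        if fresh:
            self.last_seq = fresh[-1]["seq"]
        return fresh, dropout
```

Note that detection is the best a plain feed can offer: once the producer has rolled an entry off, the consumer has no way to get it back, which is exactly the delivery guarantee a message bus provides and a feed does not.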
I have to agree with David on this. You want system-to-system synchronization to be transaction-safe, and you will need appropriate technology for that. Data push and publish/subscribe technologies that work fine for this are already out there.
That said, I do see a use case for RSS feeds of certain process data aimed at people who do day-to-day monitoring: non-critical monitoring (for the C & I in the RACI of certain processes).
Regarding the daily uploads: yes, I agree there. We need to start moving away from batch thinking in both tasks and data transfer. As a driver, I would not want my car to upload and show my speed, gasoline level and traffic jam info only every hour or so, let alone every 24 hours. So why do we settle for less in business? For some reason people end up firefighting as a result of delayed information, yet firefighting never appears in their job description…
Unfortunately, many IT people still have a lot of mainframe thinking in their heads (send a blob of data here, perform update XYZ, print signal list, send data further) instead of asking: what business processes are we running here (24×7), and what operational control do we need over them?
Regards,
Roeland
David and Roeland, thanks for your comments. I agree that the lack of a delivery guarantee mechanism is an issue, but there could be a lot of intraday BI, focused more on aggregates than on individual data points, for which this might be suitable.
David, I agree that if you’re doing this between two on-premise systems, then a proper message bus is the way to go; I’m thinking of the case of an on-premise transaction system sending data to a SaaS BI system, where the SaaS system is more likely to consume RSS feeds than TIBCO or MQ messages, at least until the SaaS offerings mature somewhat.
Roeland, the batch thinking definitely needs to go, but it’s a constant battle with old-style IT departments. The problem is, as you state, that business needs near-real-time information to manage things effectively, and IT is only giving them daily updates.
Also check out the post after this one on TIBCO PageBus, which was just announced today; it’s client-side, but it’s interesting to start seeing the pub-sub paradigm used more widely.
Hi Sandy,
Ah, but then the issue is that RSS isn’t “sent”: it’s consumed via HTTP GET, which means the SaaS consumer has to tunnel in somehow to reach the business data. This is a quibble, I know.
We built the [Large Car Company] website using MQ messages to replicate parts of their DB. The site was off-premises and outside the firewall, but once again… Funnily enough, they didn’t send the messages as updates happened, but in exactly the batch-oriented way that Roeland was talking about.
There are still options. First, the Atom Publishing Protocol (not to be confused with the Atom Syndication Format, which is more analogous to RSS) lets the producer push items one at a time to the consumer over a secure connection. This is how Google populates Google Base (they call the protocol GData, but it’s APP). The nice thing is that good tool support for this is starting to appear, and if I were doing SaaS services I would look at it seriously. You’d still have to deal with queuing internally, but with a single consumer this isn’t too tough.
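A minimal Python sketch of the push approach, under stated assumptions: per the Atom Publishing Protocol (RFC 5023), a client creates a member by POSTing an Atom Entry document with media type `application/atom+xml;type=entry` to a collection URI, and the server answers 201 Created. The collection URI, the `Authorization` value and the names `build_post` and `OutboundQueue` are hypothetical; the queue illustrates the single-consumer internal queuing, sending strictly in order and keeping each entry until the server confirms it.

```python
import urllib.request
from collections import deque

ATOM_ENTRY_TYPE = "application/atom+xml;type=entry"


def build_post(collection_uri, entry_xml, auth_header=None):
    """Build (but do not send) an AtomPub request creating one member entry.

    entry_xml is an Atom Entry document as bytes; the server is expected
    to reply 201 Created with a Location header for the new member.
    """
    headers = {"Content-Type": ATOM_ENTRY_TYPE}
    if auth_header:
        headers["Authorization"] = auth_header
    return urllib.request.Request(collection_uri, data=entry_xml,
                                  headers=headers, method="POST")


class OutboundQueue:
    """Single-consumer FIFO: an entry leaves only after a confirmed 201."""

    def __init__(self, send):
        self._pending = deque()
        self._send = send  # callable(Request) -> HTTP status code or None

    def enqueue(self, request):
        self._pending.append(request)

    def drain(self):
        """Send in order; stop at the first failure so order is preserved."""
        sent = 0
        while self._pending:
            if self._send(self._pending[0]) != 201:
                break  # leave it at the head and retry on the next drain()
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self):
        return len(self._pending)
```

In practice the `send` callable might wrap `urllib.request.urlopen(req)` and return the response’s `status`, treating any raised `OSError` (which includes `urllib.error.URLError`) as a failure so the entry stays queued. Stopping at the first failure trades throughput for ordering, which matters if the BI side applies updates in sequence.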
Another SaaS option is the Amazon Simple Queue Service:
http://www.amazon.com/gp/browse.html?node=13584001
I’d seriously consider this if there were “freeware” alternatives, just in case Amazon decides to get out of the software services industry.
Some great suggestions, David, thanks. My real focus was on some sort of non-proprietary way to do near-real-time updates; it doesn’t have to be RSS.