Oracle Backup Failure Major Factor In American Eagle 8-Day Crash

Written by Evan Schuman
July 30th, 2010
A failure in an Oracle backup utility, coupled with the failure of IBM hosting managers to detect it and to verify that a disaster recovery site was operational, appears to have been the key factor in turning a standard site outage at American Eagle Outfitters into an 8-day-long disaster, according to an IT source involved in the probe.

The initial problem was pretty much along the lines of what StorefrontBacktalk reported on Thursday (July 29), which was a series of server failures. But the problems with two of the biggest names in retail tech--IBM and Oracle--are what made this situation balloon into a nightmare.

10 Comments

  1. Sean Connolly Says:

    Sounds very similar to the Microsoft/Danger/T-Mobile event last year!

    Sean

  2. Bill Bittner Says:

    As Mr. Reagan would have said: “Trust but verify.”

    Any backup plan must also include full scale dry runs. Pick a slow night and go through the whole thing including switching to the backup site for a day AND coming back to the primary. Do this at least once a month and as the last step as you begin your holiday season systems freeze.

  3. Jim B Says:

    This is another reason showing why outsourcing without brains is a very bad thing. Outsourcing doesn’t mean “abdicating,” yet in many situations that is what seems to happen. You hand off the job to someone else but don’t keep the managing and monitoring in place in your own shop to make sure they are supporting the business each and every second of every day.

    The issue with penalties is that it’s like holding a bigger stick over your dog: eventually it loses its punch. Penalties like this will not spur or enable better performance. You are just one of many customers hosted at IBM’s site and to them, that’s it, one of many.

    I’m not saying yes or no to outsourcing – as Mr Bittner says it has to be verified and that is the company’s responsibility not the fox’s.

  4. Anonymous Says:

    Was there an audit clause in the contract between the two parties? Does IBM or Oracle conduct SAS 70 Type II Audits/Agreed Upon Procedures? Does IBM or Oracle conduct tests of their backups to ensure they can recover? Just asking…

  5. Fabien Tiburce, President, Compliantia Says:

    It would be interesting to know the liability implications and penalties involved. Outsourcing contracts typically include such clauses. Our own SaaS contract entitles customers to a 2% discount for each 1% drop in service as monitored by pingdom.com. Bruised reputation aside, I wonder what this is going to cost IBM…

  6. Sid Sidner Says:

    IBM out-sourcing has a very bad reputation among their clients. In my experience, many were looking for ways to get out of their 5 year contracts. It is sad, really.

  7. Ace DBA Says:

    Quote:
    “Once replaced, they tried to do a restore, and backups would not restore with the Oracle backup utility. They had 400 gigabytes (of data) and they were only getting 1 gigabyte per hour restoring. They got it up to 5 gigabytes per hour, but the restores kept failing. I don’t know if there was data corruption or a faulty process.”
    This seems more like a hardware problem with the tape management system. A modern LTO-3 drive can output data at better than 200MB per second (or about 5 seconds per GB, a bit more than 33 minutes for 400GB), faster than many SANs can pass it. I would question the hardware and SAN used to do the backups here, not the Oracle recovery software. By the way, at 5GB per hour, it would have taken more than three days to do the restore – one good reason to have the Data Guard fail over site.

    As for the Oracle Data Guard. There is no excuse at all for this not working. It is all about monitoring here – very easy to do. Now, if the redo logs were not being applied, it is quite simple to discover when this stopped happening. There is quite a bit that a really good DBA might do to get the fail over site up and running. For instance, he might seek to pull the missing redo logs from the backup tapes and manually apply them to the standby database to catch it up.

    Ultimately, it comes down to who was watching the fail over system and, more importantly, who is watching the watchers?

  8. HA Guy Says:

    >> As for the Oracle Data Guard. There is no excuse at all for this not working. It is all about monitoring here – very easy to do.

    Actually, this article states that they didn’t even implement Data Guard and they knew that. Oracle Data Guard or some similar host-based replication technology would have likely saved them from this outage. Note that I say “host-based”, because any storage mirroring technology would have propagated the corrupted bits to the remote DR volumes (if physical data corruption was indeed the cause of their outage).

  9. Ravi Says:

    If you don’t have a good DBA team then you have to suffer.

  10. RJWitty Says:

    A reduction in outsourcing/hosting fees is not adequate. There has to be compensation for lost business. Outsourcers can’t hide behind the customer’s business interruption insurance.
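
The restore-time arithmetic discussed in comment 7 can be checked with a few lines of Python. This is only a rough sketch: the 400 GB size and the 1 and 5 GB-per-hour rates come from the passage quoted in that comment, the 200 MB-per-second figure is the commenter's own claim about drive speed, and the function names and the decimal conversion (1 GB = 1000 MB) are assumptions made for illustration.

```python
def mb_per_s_to_gb_per_hour(mb_per_s: float) -> float:
    """Convert MB/s to GB/hour, assuming decimal units (1 GB = 1000 MB)."""
    return mb_per_s * 3600 / 1000


def restore_hours(size_gb: float, rate_gb_per_hour: float) -> float:
    """Hours needed to restore size_gb at a sustained rate_gb_per_hour."""
    if rate_gb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return size_gb / rate_gb_per_hour


DB_SIZE_GB = 400  # database size given in the quoted passage

# Rates taken from the quote and from comment 7; labels are illustrative.
scenarios = {
    "1 GB/hour (initial restore rate)": 1.0,
    "5 GB/hour (improved restore rate)": 5.0,
    "200 MB/s (drive rate claimed in comment 7)": mb_per_s_to_gb_per_hour(200),
}

for label, rate in scenarios.items():
    hours = restore_hours(DB_SIZE_GB, rate)
    print(f"{label}: {hours:.1f} hours ({hours * 60:.0f} minutes)")
```

Under these assumptions, the restore would need 400 hours at 1 GB per hour and 80 hours at 5 GB per hour, while a device sustaining 200 MB per second would move the same 400 GB in roughly 33 minutes. The gap between those figures is what points the commenter toward the hardware path instead of the recovery software.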
