We have been working to track down the nasty bug on the NVIDIA GPU WSs that is causing problems for donors sending back WUs. We have been trying different fixes over the last week, but this has been very tricky to figure out.
After another brainstorming session this afternoon, I think we have a good plan for the short term and long term. Because we are rerouting assignments, I hope that newly assigned WUs won't see this problem. Joe is also going to pound out the bugs on his new WS on vspg11a to get that going.
I'm very sorry for this major issue. This has been called the worst outage we've had, and I think we agree. I've had a long chat with the development team about this and we've talked about how to fix issues in the WS code release cycle. I think the plan we have in place will stop this from happening in the future, but the main issue right now is to solve the problems at hand.
UPDATE 6pm 2/19/2010 — after a week of working on this, trying lots of stuff, and nothing working, I think we've found something promising. I'm nervous typing this as everything looked promising before, but at least I think Joe's found the reason for the problem, which is the hard part.
UPDATE 11pm — so far so good. It looks like this fix may be sticking.
UPDATE 7:30am 2/20/2010 — looks like the fix is indeed working. We will continue to monitor the servers closely over the weekend.