Sunday, September 9, 2007

rsync bugs?

One of the web sites that I built runs on two servers. The main server has two RAID sets that back each other up, and a second machine has an extra RAID set for off-machine backup.

Recently, the RAID on the second machine died and needed to be reconfigured. When I started rsync to copy the files from the primary machine to the secondary machine, the process mysteriously failed with a timeout message after copying several hundred files.

It seemed to be caused by dropped network packets, so I tried the --bwlimit parameter on rsync to reduce the transfer speed (the machines are connected over a gigabit network). And BINGO. It is a bit slow, but at least it gets the job done.
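
For anyone who wants to try the same workaround, here is a rough sketch in Python of how the throttled copy could be launched. The paths, the host name backup-host, the helper name build_rsync_command, and the limit and timeout values are all made up for illustration; they are not the ones used on these servers. The only rsync options it relies on are -a, --partial, --bwlimit and --timeout.

```python
import subprocess


def build_rsync_command(source, destination, bwlimit_kbps=20000, timeout_s=300):
    """Build an rsync argument list with a bandwidth cap.

    bwlimit_kbps and timeout_s are illustrative defaults, not tuned values.
    """
    return [
        "rsync",
        "-a",                          # archive mode: recurse and keep permissions, times, etc.
        "--partial",                   # keep partially transferred files so a retry can resume
        f"--bwlimit={bwlimit_kbps}",   # cap the transfer rate (rsync reads a bare number as KB/s)
        f"--timeout={timeout_s}",      # give up if no data moves for this many seconds
        source,
        destination,
    ]


if __name__ == "__main__":
    # Hypothetical source path and destination host.
    cmd = build_rsync_command("/srv/www/", "backup-host:/srv/www/")
    print(" ".join(cmd))
    # Uncomment to actually run the copy:
    # subprocess.run(cmd, check=True)
```

Starting with a low limit and raising it until the timeouts come back would be one way to find out how much throughput the link can really sustain.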

I still have no idea whether the problem is caused by the rsync program, the network stack, or the kernel, though.

