Sunday, September 9, 2007

rsync bugs?

One of the web sites that I built runs on 2 servers. The main server has 2 RAID sets that back each other up, and there is another machine with an extra RAID set for off-machine backup.

Recently, the RAID on the second machine died and needed to be reconfigured. When I started rsync to copy the files from the primary machine to the secondary machine, the process mysteriously failed with a timeout message after copying several hundred files.

The cause appeared to be dropped network packets, so I tried rsync's --bwlimit parameter to reduce the transfer speed (the machines are connected over a gigabit network). And BINGO. It is a bit slow, but at least it gets the job done.
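For anyone who wants to script the same workaround, here is a rough sketch in Python. It is not the exact command I ran. The paths, the host name, the 20000 KB/s limit, the 300-second timeout and the retry count are all made-up values for illustration, and the helper names are hypothetical. The rsync options themselves (-a, --partial, --bwlimit, --timeout) are standard ones.

```python
import subprocess
import time


def build_rsync_command(src, dest, bwlimit_kbps=20000, io_timeout_s=300):
    """Build an rsync argument list with a bandwidth cap.

    bwlimit_kbps and io_timeout_s are illustrative defaults, not tuned values.
    """
    return [
        "rsync",
        "-a",                          # archive mode: recurse, keep perms/times
        "--partial",                   # keep partly transferred files for retries
        f"--bwlimit={bwlimit_kbps}",   # cap transfer rate, in kilobytes per second
        f"--timeout={io_timeout_s}",   # give up if no I/O happens for this long
        src,
        dest,
    ]


def sync_with_retries(src, dest, attempts=3, pause_s=10, **options):
    """Run rsync, retrying a few times if it exits with a non-zero status."""
    cmd = build_rsync_command(src, dest, **options)
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return True
        print(f"rsync attempt {attempt} failed with exit code {result.returncode}")
        time.sleep(pause_s)
    return False


# Hypothetical usage (paths and host name are placeholders):
# sync_with_retries("/data/www/", "backup-host:/data/www/", bwlimit_kbps=20000)
```

Lowering bwlimit_kbps trades speed for stability, so it probably makes sense to start high and step down until the timeouts stop.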

I still have no idea whether the problem lies in the rsync program, the network stack, or the kernel, though.
