Discussion Forums > Category: Storage & Content Delivery > Forum: Amazon Simple Storage Service > Thread: s3sync -- a simple rsync look-alike perl script for S3
s3sync -- a simple rsync look-alike perl script for S3
Posted on: Mar 27, 2006 12:29 AM
Attachment s3 (20.0 KB)
Attachment s3sync (12.7 KB)
Attached is a new version of my s3 perl command line script to access S3 and a new s3sync perl script that has rsync-like functionality. s3sync uses the s3 script and you need to set the path to that script properly at the beginning of the s3sync script.

s3sync -h prints out help information. I highly recommend the -n flag to get comfortable with what it does! Here's the help:

#usage:
# s3sync [options] <src> <bucket>:<prefix>
# s3sync [options] <bucket>:<prefix> <dst>
# -h: print help
# -d: print debug info
# -r: recursive
# -s: use secure (HTTPS) connection (default HTTP)
# -n: dry-run - don't put/get anything, just print what would be done
#
#S3 object naming:
# A local src named /foo/bar/baz.ext is stored with key foo/bar/baz.ext
# A local src named foo/bar/baz.ext is stored with key foo/bar/baz.ext
# This is intended to be identical to rsync's behavior if your remote
# home directory were / except that keys in S3 don't start with a /

Some simple sample usage:

dev aws # ls -R NM-October-00/20001003/
NM-October-00/20001003/:
20001003-083749.jpg 20001003-150357.jpg
20001003-150223.jpg 20001003-150650.jpg

dev aws # ./s3sync -n NM-October-00/20001003 tve-test:/img
NM-October-00/20001003/20001003-083749.jpg -> tve-test:img/20001003/20001003-083749.jpg
NM-October-00/20001003/20001003-150223.jpg -> tve-test:img/20001003/20001003-150223.jpg
NM-October-00/20001003/20001003-150357.jpg -> tve-test:img/20001003/20001003-150357.jpg
NM-October-00/20001003/20001003-150650.jpg -> tve-test:img/20001003/20001003-150650.jpg

dev aws # ./s3sync -n NM-October-00/20001003/ tve-test:/img
NM-October-00/20001003/20001003-083749.jpg -> tve-test:img/20001003-083749.jpg
NM-October-00/20001003/20001003-150223.jpg -> tve-test:img/20001003-150223.jpg
NM-October-00/20001003/20001003-150357.jpg -> tve-test:img/20001003-150357.jpg
NM-October-00/20001003/20001003-150650.jpg -> tve-test:img/20001003-150650.jpg
dev aws # # notice trailing slash in source path..
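The trailing slash behaves like rsync's: without it, the source directory's basename becomes part of the remote key; with it, only the directory's contents are copied. A rough Python sketch of that key computation (my own illustration of the rule, not the script's actual Perl code):

```python
import os

def s3_keys(src, prefix):
    """Map local files under src to S3 keys, rsync-style.

    Without a trailing slash on src, the basename of src is kept in
    the key; with a trailing slash, only the contents map over.
    """
    keep_basename = not src.endswith("/")
    base = src.rstrip("/")
    keys = {}
    for root, _dirs, files in os.walk(base):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, base)
            if keep_basename:
                rel = os.path.join(os.path.basename(base), rel)
            # S3 keys never start with a slash
            keys[path] = "/".join([prefix, rel]).replace(os.sep, "/").lstrip("/")
    return keys
```

With prefix img, a src of NM-October-00/20001003 yields keys like img/20001003/20001003-083749.jpg, while NM-October-00/20001003/ yields img/20001003-083749.jpg, matching the two dry runs shown.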

dev aws # ./s3sync NM-October-00/20001003/ tve-test:/img
dev aws # # took a loong time to transfer...

dev aws # ./s3sync -d NM-October-00/20001003/ tve-test:/img
load_etags command: ./s3 -l ls 'tve-test' 'img'
img/20001003-083749.jpg -> 255018 -- 02052099847a6ff3c268eda97582a1c5
img/20001003-150223.jpg -> 240357 -- ac485ce64c3075ef245ed2cdf2337f80
img/20001003-150357.jpg -> 243913 -- 9b6d7e24e20763d7f2d65ffab13447f5
img/20001003-150650.jpg -> 140159 -- ef6b362054b112a3905121f36a288b7d
Putting directory NM-October-00/20001003 to img
skip dir .
skip dir ..
put file NM-October-00/20001003/20001003-083749.jpg -> img/20001003-083749.jpg
sizes match, checking md5/etag
md5/etag match - skipping put
put file NM-October-00/20001003/20001003-150223.jpg -> img/20001003-150223.jpg
sizes match, checking md5/etag
md5/etag match - skipping put
put file NM-October-00/20001003/20001003-150357.jpg -> img/20001003-150357.jpg
sizes match, checking md5/etag
md5/etag match - skipping put
put file NM-October-00/20001003/20001003-150650.jpg -> img/20001003-150650.jpg
sizes match, checking md5/etag
md5/etag match - skipping put
dev aws # # this was real fast since the files were already there

dev aws # ./s3 ls tve-test img
img/20001003-083749.jpg
img/20001003-150223.jpg
img/20001003-150357.jpg
img/20001003-150650.jpg

dev aws # ./s3sync -n tve-test:img/ images
mkdir images
tve-test:img/20001003-150650.jpg -> images/20001003-150650.jpg
tve-test:img/20001003-150357.jpg -> images/20001003-150357.jpg
tve-test:img/20001003-083749.jpg -> images/20001003-083749.jpg
tve-test:img/20001003-150223.jpg -> images/20001003-150223.jpg

dev aws # ./s3sync tve-test:img/ images
dev aws # # took a while to fetch all the files...

dev aws # du images
888 images
dev aws # diff -r images NM-October-00/20001003

-Thorsten
Replies: 49 | Pages: 2 - Last Post: Nov 28, 2006 3:56 AM by: philipjohnnybob
Replies
Re: s3sync -- a simple rsync look-alike perl script for S3
Posted by: lylepratt
Posted on: Mar 28, 2006 2:56 PM
in response to: Thorsten von Eicken
Have you thought of putting an option to compare what is already in the bucket with the directory you want to sync with and only sync what has changed? That would be extremely helpful.
Re: s3sync -- a simple rsync look-alike perl script for S3
Posted on: Mar 28, 2006 5:21 PM
in response to: lylepratt
That's what it does. The version uploaded only handles buckets with at most 100 items, but I have a new version that handles an arbitrary number. I can upload it if there's interest.

If you put files to S3, what s3 does internally is first list the appropriate prefix of object keys, extract the etags, and then grind through the files one-by-one and upload only those where the etag differs. If you pass the -d option you can see this in gory detail. If you get files, it first does an md5sum on any file that already exists locally and then does a conditional get from S3 based on the ETag.
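That skip logic boils down to comparing sizes and MD5 digests against the listed etags. A rough Python sketch (illustrative only; the actual scripts are Perl, and this assumes single-part puts, where S3's ETag is the MD5 of the object body):

```python
import hashlib

def files_to_put(local_files, remote_etags):
    """Return the subset of local files that actually need uploading.

    local_files:  dict of key -> local file path
    remote_etags: dict of key -> (size, md5-hex) from a bucket listing
    """
    todo = []
    for key, path in local_files.items():
        data = open(path, "rb").read()
        remote = remote_etags.get(key)
        if remote is not None:
            size, etag = remote
            if size == len(data) and etag == hashlib.md5(data).hexdigest():
                continue  # sizes match and md5/etag match - skip the put
        todo.append(key)
    return todo
```

This mirrors the -d output in the first post: "sizes match, checking md5/etag" then "md5/etag match - skipping put".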

-Thorsten
Re: s3sync -- a simple rsync look-alike perl script for S3
Posted by: greg13070
Posted on: Mar 28, 2006 10:09 PM
in response to: Thorsten von Eicken
There sure is interest! I was just here looking for a way to use S3 for backups, and this should fit the bill nicely =)

It would rule if someone smart could develop a linux filesystem driver that backends to s3.

Man this is such a cool technology..
Re: s3sync -- a simple rsync look-alike perl script for S3
Posted on: Mar 29, 2006 9:15 AM
in response to: Thorsten von Eicken
New versions of the s3 and s3sync scripts attached. Minor fixes such as handling buckets with more than 1000 objects.

Something new on the to-do list: when putting a file, use the filename extension to set the content-type, so that browsing to the object directly with a standard web browser renders the content correctly.
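That extension-to-content-type lookup is the kind of thing Python's mimetypes module does; a sketch of the idea (illustrative only, not the Perl implementation, and the binary/octet-stream fallback is my assumption):

```python
import mimetypes

def content_type_for(filename, default="binary/octet-stream"):
    """Guess a Content-Type header from the filename extension,
    so a stock web browser renders the object correctly."""
    ctype, _encoding = mimetypes.guess_type(filename)
    return ctype or default
```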
-Thorsten

[Mhhh, the forum software won't let me attach files today, so they will follow in another post, I hope]
Re: s3sync -- a simple rsync look-alike perl script for S3
Posted on: Mar 29, 2006 9:17 AM
in response to: Thorsten von Eicken
Attachment s3 (20.4 KB)
Attachment s3sync (12.7 KB)
New versions of s3 and s3sync as promised in previous post...
-Thorsten
Re: s3sync -- a simple rsync look-alike perl script for S3
Posted by: lylepratt
Posted on: Mar 29, 2006 10:58 AM
in response to: Thorsten von Eicken
What if I already have over 12000 objects uploaded to my bucket? Will it see those and skip re-uploading them?
Re: s3sync -- a simple rsync look-alike perl script for S3
Posted by: lylepratt
Posted on: Mar 29, 2006 11:49 AM
in response to: Thorsten von Eicken
I have also found a bug when trying to sync file names that have ( or ) in them.
sh: -c: line 1: syntax error near unexpected token `('
Re: s3sync -- a simple rsync look-alike perl script for S3
Posted by: lylepratt
Posted on: Apr 11, 2006 11:18 AM
in response to: Thorsten von Eicken
when s3sync is syncing...does it make the files public or private?
Re: s3sync -- a simple rsync look-alike perl script for S3
Posted by: greg13070
Posted on: Apr 16, 2006 2:05 AM
in response to: lylepratt
This bug can apparently be fixed pretty easily.

In the s3sync file there are two lines that contain `md5sum $src` and `md5sum $dst` respectively.

Just put single quotes around the argument to the md5sum program to properly escape bogon characters like parens.

so they become:
`md5sum '$src'`
etc.
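Single-quoting handles parens, but it still breaks on filenames that themselves contain a single quote. The robust fix is to bypass the shell entirely; in Python that looks like the following (a general illustration, not a patch to the Perl script):

```python
import hashlib
import subprocess

def md5_via_subprocess(path):
    """Run md5sum without a shell: the filename is passed as an argv
    element, so (, ), spaces and quotes need no escaping at all."""
    out = subprocess.run(["md5sum", path], capture_output=True,
                         text=True, check=True).stdout
    return out.split()[0]

def md5_in_process(path):
    """Even simpler: skip the external tool and hash in-process."""
    return hashlib.md5(open(path, "rb").read()).hexdigest()
```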
Re: s3sync -- a simple rsync look-alike perl script for S3
Posted by: greg13070
Posted on: Apr 18, 2006 8:10 AM
in response to: Thorsten von Eicken
My mods, in case anyone wants them. They make the script incompatible with data stored by Thorsten's version.

For me this is a stopgap solution until the s3fs project has enough features to be useable. (http://dev.extensibleforge.net/wiki/s3/fuse) Until then, s3sync is the only way I have found to practically store and retrieve unix directory structures from s3. There's an s3 backup manager patch too (http://rbdixon.googlepages.com/amazons3backupmanager) but it suffers from two crippling problems in my humble opinion. First, it hasn't been fixed to understand IsTruncated, and second, backup manager does not make restoration of files easy.

My modifications are listed below. They are what I considered to be a minimal set of features to make the script useable on my systems (which are symlink heavy).

# Modified 4/2006 by Greg Bell

# Added support for storing symlinks instead of processing them as normal
# dirs/files.
# Added support for sending/receiving permissions and ownership info.
# This meta info will only be transferred if the dir/file itself is
# updated (i.e. we don't always verify ownership/permissions on all files).
# Ownership/permissions are stored in a #P# file right after the real entry.
# There's probably a better way to do this meta info, but I'm not equal
# to the task of reading all the s3 docs today so this is how I'm doing it.
# Added better shell escaping so there won't be further pathname problems.
# This change will break compatibility with Thorsten's version (majorly).
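The "#P#" companion-entry idea could be sketched like this (a hypothetical reconstruction in Python; the key naming and record format are my assumptions, not Greg's actual Perl code):

```python
import os
import stat

def meta_record(path):
    """Encode the ownership/permission info that would live in the
    '#P#' entry stored right after the real file's entry."""
    st = os.lstat(path)
    if stat.S_ISLNK(st.st_mode):
        # symlinks are stored as metadata (their target), not followed
        return "symlink %s" % os.readlink(path)
    return "mode %o uid %d gid %d" % (stat.S_IMODE(st.st_mode),
                                      st.st_uid, st.st_gid)

def meta_key(key):
    """Key of the companion metadata object for a given data key
    (hypothetical naming scheme)."""
    return key + "#P#"
```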

I can't attach the file at this time for some reason. I'll try again shortly...
Re: s3sync -- a simple rsync look-alike perl script for S3
Posted by: greg13070
Posted on: Apr 18, 2006 8:11 AM
in response to: greg13070
Attachment s3sync_mod (21.0 KB)
Attached file contains mods referred to in above post.
Re: s3sync -- a simple rsync look-alike perl script for S3
Posted by: R. B. Dixon
Posted on: Apr 18, 2006 4:19 PM
in response to: greg13070
I modified Backup Manager (http://backup-manager.org) to use S3 as a backup target. You might take a look at that. You'll have to check it out of SVN to get the S3 mods but they are there and working for me locally for about 2 weeks doing incremental and weekly full backups. When paired with the dar archive utility it is quite powerful.
Re: s3sync -- a simple rsync look-alike perl script for S3
Posted by: R. B. Dixon
Posted on: Apr 18, 2006 4:23 PM
in response to: greg13070

The SVN repository version of backup-manager does 1000+ keys fine. I ran into that limitation in about a week so I had to fix it.

I agree about restoration and am working on a solution for that when I have time. I've got a script that maintains, locally, a dar_manager catalog of the contents of every dar archive that is on S3. The next step will be automating the pull of dar archive slices from s3 based on the catalog contents.

Till then restoration is manual and I use your s3 utility to pull down dar archive slices. :)

Re: s3sync -- a simple rsync look-alike perl script for S3
Posted by: greg13070
Posted on: Apr 18, 2006 7:03 PM
in response to: R. B. Dixon
I think that backup manager will be a more efficient solution than s3sync, but I can't use it until restoration is very easy (must minimize down time in the event of a failure, or a "failure" where I accidentally break something).

I'm continuing to tweak Thorsten's s3sync now-- adding retry for failed s3 operations. Does anyone want these modifications or should I stop beating a dead utility?

Greg
Re: s3sync -- a simple rsync look-alike perl script for S3
Posted by: greg13070
Posted on: Apr 18, 2006 7:39 PM
in response to: lylepratt
In response to lylepratt's question on 4/11: the files are set to the default private ACL, as mentioned in the usage for the "s3" script, which you can get via "s3 -h".
Re: s3sync -- a simple rsync look-alike perl script for S3
Posted by: lylepratt
Posted on: Apr 18, 2006 7:43 PM
in response to: greg13070
Thanks for the info! I am still interested in the changes for sure!
Re: s3sync -- a simple rsync look-alike perl script for S3
Posted by: greg13070
Posted on: Apr 18, 2006 8:02 PM
in response to: lylepratt
OK I'll be keeping the latest version here:
http://s3.amazonaws.com/ServEdge_pub/s3sync

Right now it's at 0.3.1, which added configurable "retry" support for when s3 times out or returns a bogon. It's just a dumb loop but seems to get the job done.
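A retry loop of the "dumb loop" sort described above might look like this (illustrative Python; the real change is in the Perl script, and the names here are mine):

```python
import time

def with_retries(operation, attempts=3, delay=1.0):
    """Retry a flaky S3 operation a fixed number of times,
    sleeping between tries; re-raise after the last failure."""
    for i in range(attempts):
        try:
            return operation()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(delay)
```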

If I make any earth-shattering changes in the future I'll post back to the thread. I wonder if Thorsten is still around. Always a little odd to be hacking on something written by another. Never quite sure if I follow the original author's intentions =)

I am also monitoring the forum so let me know if questions/etc.
Re: s3sync -- a simple rsync look-alike perl script for S3
Posted by: lylepratt
Posted on: Apr 18, 2006 8:05 PM
in response to: greg13070
Greg,

Earlier you said that your changes "make the script incompatible with data stored by Thorsten's version." I currently have over 300 gigs stored in my S3 account. Would I need to delete and re-add these files in order to use your modified script?
Re: s3sync -- a simple rsync look-alike perl script for S3
Posted by: greg13070
Posted on: Apr 18, 2006 10:02 PM
in response to: lylepratt
The differences are basically two.

My version does not traverse into symlinks, but stores meta-data for them instead. If you have any data on s3 that was made by the old s3sync looking at a symlink, you need to remove those files from s3 before running my version. Otherwise when you try to restore, there will be contention between the old "real" file and the new symlink version which are both on s3.

My version stores permission and ownership data when putting files, and reads it back when getting them. It does not monitor for permission changes or detect that permission files are missing from your s3. So basically any data you have out there now lacks this info, and when you restore it (with any s3sync version) all files and directories will be owned by root and moded per root's current umask. This is probably not what you want.

As with the original version, this is provided as-is. I hope you find it useful. Let me know if you have any other questions!
Re: s3sync -- a simple rsync look-alike perl script for S3
Posted by: 3edges
Posted on: May 15, 2006 12:44 PM
in response to: Thorsten von Eicken
Am I missing something crucial here, or is s3sync badly broken?

As far as I can tell, s3sync does not recreate the directory structure when copying back from S3 - you have to do that yourself, by hand - next to impossible if you have hundreds or thousands of nested directories.

Also, s3sync does not delete files on the remote side that are deleted on the local side, making s3sync useless as a backup tool.
Re: s3sync -- a simple rsync look-alike perl script for S3
Posted by: greg13070
Posted on: May 15, 2006 7:53 PM
in response to: 3edges
It does create the directory structure when copying back, although I just tried and it looks like there's a bug in it (it only got part way through and then stopped with an error).

You are correct that it doesn't currently delete things in either direction. This is just a feature that I didn't feel the need to add, and the original author is apparently not working on this any more.

I'll see if I can fix the directory thing because I need that to work for my own backups =)

As to your larger question regarding whether the tool is useful, I would say that it serves a limited purpose and is incomplete. Myself, I want to wait for s3fs (or one of the parallel projects) and then back up to s3 directly as a file system. I am only using s3sync until such time as I can do that.

Thanks for the bug report. More later.
Re: s3sync -- a simple rsync look-alike perl script for S3
Posted by: greg13070
Posted on: May 15, 2006 8:39 PM
in response to: 3edges
NO, I'm apparently wrong. It doesn't seem to be making the directories at all. What the heck... Oh well, easy enough to fix.
Re: s3sync -- a simple rsync look-alike perl script for S3
Posted by: greg13070
Posted on: May 15, 2006 9:15 PM
in response to: 3edges
Missing local directories are fixed now. Restore works as expected.
As always, my updated version is at:
http://s3.amazonaws.com/ServEdge_pub/s3sync

As for delete... maybe next version. Unless someone else wants to take a crack at it.
Re: s3sync -- a simple rsync look-alike perl script for S3
Posted by: greg13070
Posted on: May 15, 2006 9:44 PM
in response to: Thorsten von Eicken
By the way, my usage is as follows (for example):

backup the /etc folder:
envdir env ./s3sync -r -s /etc Bucket:prefix

restore it to somewhere else:
envdir env ./s3sync -r -s Bucket:prefix/etc/ /restorepoint/etc
(which would restore under a different root just for safety's sake)

I use envdir (from daemontools) to set up my environment variables easily (ID,KEY, etc) so that's what the first two words are all about...

For my "prefix" I use the name of a server. That way I can keep separate servers' backups in the same actual bucket without conflict.

Hope this is helpful.