Extensions and changes from LANL #5

Open · wants to merge 37 commits into master
Conversation

@jti-lanl commented Dec 3, 2014

[The following is the contents of README_0.5.2 ...]

This is not (yet) an officially-released version of aws4c. 0.5.2
represents unofficial extensions to the 0.5 release, by Los Alamos National
Laboratory.

We attempted to preserve functionality of all interfaces provided by 0.5.
Where new functionality was added, we've typically added new interfaces.

(1) Extensions to IOBuf / IOBufNode.

The 0.5 version simply adds strings into a linked list, via a malloc and
strcpy(). This might be acceptable when all that's needed is to parse
individual response headers, but we wanted to support more-efficient
operations when large buffers are being sent or received. Thus, it is now
possible to add data for writing by installing user buffers directly into
an IOBuf's linked list. It is also possible to add storage for receiving
data in a similar way. In either case, the added storage may be static or
dynamic. Here are some typical scenarios.

    // GET an object from the server into user's <data_ptr>
    //     NOTE: this uses the "extend" functions to add unused storage
    aws_iobuf_reset(io_buf);
    aws_iobuf_extend_static(io_buf, data_ptr, data_length);
    AWS4C_CHECK( s3_get(io_buf, obj_name) );
    AWS4C_CHECK_OK( io_buf );

    // PUT the contents of user's <data_ptr>
    //     NOTE: this uses the "append" functions to add data
    aws_iobuf_reset(io_buf);
    aws_iobuf_append_static(io_buf, data_ptr, data_length);
    AWS4C_CHECK( s3_put(io_buf, obj_name) );
    AWS4C_CHECK_OK( io_buf );

These use cases will typically also want to call aws_iobuf_reset() after
completion, so that the IOBuf doesn't retain a pointer to user storage
that may go out of scope.

(2) Re-using connections

In 0.5, every call to aws4c functions creates a new CURL connection. This
can add overhead to an application that is performing many operations. We
allow the user to specify that connections should be preserved, or to reset
the connection at a specific time.

    aws_reuse_connections(1); // begin reusing connections
    // ...
    aws_reset_connection();   // reset the connection once
    // ...
    aws_reuse_connections(0); // stop reusing connections

(3) GET/PUT to/from file

Companion-functions to the 0.5 head/get/put/post functions allow the user
to specify a file.  They also allow the user to provide one IOBuf to
capture the response, in addition to the one used to send the request.
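
As a sketch only (the companion functions' names and argument order below
are assumptions, not the library's confirmed API; see aws4c.h for the real
prototypes), a file-based PUT with a separate response-capturing IOBuf
might look like this:

    // hypothetical: PUT the contents of a local file, capturing the
    // server's response in a second IOBuf
    aws_iobuf_reset(io_buf);
    aws_iobuf_reset(response);
    AWS4C_CHECK( s3_put2(io_buf, obj_name, "/tmp/src_file", response) );
    AWS4C_CHECK_OK( io_buf );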

(4) binary data

The old getline() function was not very useful for reading arbitrary
streams of binary (or text) data.  We added a get_raw() function that
ignores newlines.
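
For example (the exact name and signature of the raw getter are
assumptions here; the text above only says a get_raw() function was
added):

    // hypothetical signature: copy up to <size> bytes out of the IOBuf,
    // without treating newlines specially
    char buf[4096];
    int  count = aws_iobuf_get_raw(io_buf, buf, sizeof(buf));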

(5) EMC extensions

EMC supports some extensions to "pure" S3, such as using byte-ranges to
write parts of an object in parallel, or to append to an object.  These
normally aren't legal in S3, so you must call a special function to enable
the library's support for them:

    // use byte-range to append to an object
    s3_enable_EMC_extensions(1);
    s3_set_byte_range(-1, -1);   // creates "Range: bytes=-1-"
    s3_put(io_buf, obj_name);

    // another interface to append to an object
    s3_enable_EMC_extensions(1);
    emc_put_append(io_buf, obj_name);

    // instead of multi-part upload ...
    s3_enable_EMC_extensions(1);
    s3_set_byte_range(offset, length);
    s3_put(io_buf, obj_name);

(6) extras

The 0.5.2 makefile builds a library, libaws4c. We also provide some
debugging functions and XML support, which probably should not be part of
the default library. Therefore, these have their own header
(aws4c_extras.h) and are built into a separate library libaws4c_extras.
This allows test-apps to use extra functionality, without requiring the
production library to be as big.

(7) unit-tests

Feel free to add new unit-tests to test_aws.c.  These provide simple tests
of new functions, and a crude regression-test.

jti-lanl and others added 30 commits December 3, 2014 12:44
Manipulate a linked-list of key-value pairs, to create the meta-data list.
Then install it onto an iobuf.  This will be added to the object when it is
written.  During reading, a parser will retrieve these values and install
them onto the iobuf used for the get.  You can build a list of values once,
and reuse it many times.
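
A sketch of that flow, using hypothetical helper names (this commit
doesn't spell out the actual prototypes; see aws4c.h):

    // hypothetical helpers: build a key/value list once ...
    MetaNode* meta = NULL;
    aws_metadata_set(&meta, "author",  "jti");
    aws_metadata_set(&meta, "purpose", "demo");

    // ... install it onto an iobuf, and reuse it across many writes
    aws_iobuf_set_metadata(io_buf, meta);
    AWS4C_CHECK( s3_put(io_buf, obj_name) );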

Also, changed const-ness in the arguments of some functions.
The user can provide a custom readfunc/writefunc.  This allows threaded
interaction with curl, such that a series of writes (or reads) can be
incrementally added to the data portion of a PUT (or GET).

There is example code in test_aws (cases 11 and 12).  These tests rely on
pthreads, which may not be available on all platforms that want to use
libaws4c.  Therefore, these tests are not compiled by default.  If you
want to run them, you have to build with 'make ... PTHREADS=1'.
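
A minimal sketch of a custom readfunc, assuming the callback uses
libcurl's read-callback signature and receives the IOBuf as its last
argument, with aws_iobuf_readfunc() as an assumed name for the installer:

    #include "aws4c.h"

    // custom readfunc: curl calls this to pull the next piece of the PUT body
    size_t my_readfunc(void* ptr, size_t size, size_t nmemb, void* stream)
    {
        IOBuf* b = (IOBuf*)stream;
        // ... copy up to size*nmemb bytes of pending data into <ptr>,
        //     returning the number of bytes copied ...
        return 0;   // returning 0 tells curl the request body is complete
    }

    aws_iobuf_readfunc(io_buf, &my_readfunc);  // hypothetical installer
    AWS4C_CHECK( s3_put(io_buf, obj_name) );   // curl pulls data via my_readfunc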

Also added support for chunked-transfer-encoding.  This allows a PUT to be
sent when the total size of the final object is not known at the time the
PUT is invoked.  This could be combined with a streaming write.
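
A sketch, with s3_chunked_transfer_encoding() as an assumed name for the
enabling call:

    // send a PUT whose total size isn't known up front
    s3_chunked_transfer_encoding(1);          // "Transfer-Encoding: chunked"
    AWS4C_CHECK( s3_put(io_buf, obj_name) );
    s3_chunked_transfer_encoding(0);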

Extended IOBuf so that it keeps track of how much unread data is available.
This can be useful for implementing a streaming readfunc.

Also extended IOBuf to provide a pointer to user data.  This can also be
useful in a threaded readfunc/writefunc.  In libaws4c, these functions
receive callbacks from curl, receiving a pointer to an IOBuf.  If they need
some other context, this context can be placed into IOBuf.user_data.
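
For instance (a sketch only; the StreamCtx struct and the direct field
access are illustrative assumptions):

    #include <semaphore.h>   // sem_t, sem_wait()
    #include <string.h>      // memcpy()
    #include "aws4c.h"       // IOBuf

    // context shared with a producer thread, recovered inside the callback
    typedef struct {
        sem_t* data_ready;   // producer posts when new data is available
        char*  src;
        size_t remain;
    } StreamCtx;

    size_t stream_readfunc(void* ptr, size_t size, size_t nmemb, void* stream)
    {
        IOBuf*     b   = (IOBuf*)stream;
        StreamCtx* ctx = (StreamCtx*)b->user_data;   // recover our context
        sem_wait(ctx->data_ready);                   // wait for producer thread
        size_t n = (ctx->remain < size*nmemb) ? ctx->remain : size*nmemb;
        memcpy(ptr, ctx->src, n);
        ctx->src    += n;
        ctx->remain -= n;
        return n;                                    // 0 would mean end-of-body
    }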

Finally, test_aws adds an error-message if loading of the user's
config-file fails.  Otherwise, this failure causes an obscure segfault at
runtime.
Added support for user readfunc/writefunc/headerfunc.  These are useful to
support streaming transactions, where a client starts a request, then
provides buffers which are filled (emptied) by successive calls to
read (write), controlled via locking.

Exported default read/write/header functions, so custom functions can
hand off to them, if desired.  Renamed the default functions by prepending
"aws_" to their names, in order to avoid conflicts with functions in
standard libraries.

aws_iobuf_reset() doesn't clear read/write/headerfunc, user_data, or
growth_size.  This allows streaming I/O to communicate with these
functions by calling aws_iobuf_reset() to indicate empty, without having
to carefully reinstall these functions.
All global variables have been moved into a new AWSContext struct.  There
is a global instance of this struct, which is used by default everywhere.
However, users can now also create their own contexts and attach them to
individual IOBufs.  This allows multiple threads calling into libaws4c to
avoid stepping on each other's parameters.

The GET/PUT/DELETE requests generated by this library will look for a
context in the IOBuf, falling back to the default context, otherwise.
Thus, old code should continue to work without modification.
All the old interfaces continue to work the same as they did before
(i.e. not thread-safe).  The old functions that manipulated global
variables (e.g. aws_set_id(), or s3_set_host()) now just manipulate the
default context.  However, these functions now also have "_r" variants
(e.g. aws_set_id_r(), or s3_set_host_r()), which take an extra context
argument, changing the settings only in that context.

In other words, if you want/need thread-safety, you can now do something
like this:

   AWSContext* ctx = aws_context_new();
   aws_set_host_r(ctx, myhost);
   IOBuf* b = aws_iobuf_new();
   aws_iobuf_context(b, ctx);
   s3_put(b ...);
   aws_iobuf_reset_hard(b);  // frees context, if present
   aws_iobuf_free(b);
This fixes a problem where aws_iobuf_reset() was wiping out IOBuf.flags
when chunked-transfer-encoding was enabled.  The solution would be either
(a) add IOBuf.flags to the things that are preserved across calls to
aws_iobuf_reset(), or (b) move it to the context, which is already preserved.

TBD: In general, maybe a lot of the stuff that is currently preserved in
calls to aws_iobuf_reset() should just be moved into AWSContext, one way or
another.  Maybe it makes sense to keep IOBuf.user_data in IOBuf, and
preserve it through aws_iobuf_reset(), but many of the other things
(e.g. read_fn, write_fn, etc) should probably just be context-things.
See test_aws.c for an example, and there's discussion in README_lanl.
When using SSL (https), whether insecure or not, we need to provide the
string "https", rather than "http", in the header-generating code.
Separated these as two different settings, to be enabled via s3_https() and
s3_https_insecure(), with appropriate changes to the header-generating
code.
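
So, assuming both take an enable/disable flag like the library's other
toggles (an assumption; the commit only names the two functions):

    s3_https(1);            // generate "https://..." and verify the peer
    // or:
    s3_https_insecure(1);   // "https://...", but skip certificate verification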
…alues.

This behavior caused a subtle bug in MarFS, where the curl callback to a
custom readfunction happened at exactly the moment when the IOBuf.user_data
value was zeroed out, when another thread called aws_iobuf_reset().  The
other thread thought this was safe, because the custom readfunc always
waits on a semaphore before accessing the IOBuf contents. However, the
readfunc gets access to the semaphore via the user_data.  Even though
user_data was "not altered", in the long-run, by aws_iobuf_reset(), it was
actually being temporarily wiped, and then restored.

The upshot is that aws_iobuf_reset() cannot temporarily wipe everything
and then restore selected values, unless we wrap a lot more locking around
things.  The simpler approach is to tweak aws_iobuf_reset() so that it only
wipes those values that it is supposed to wipe, and leaves everything else
alone.
This is a workhorse that moves data to and from curl buffers within the
custom readfuncs used in MarFS.  Instead of iterating through chars, we
move swaths of storage via memcpy().
If we aren't already inside one of the library functions, then resetting
the connection should call curl_easy_cleanup(), instead of waiting for the
next use of this connection.  Otherwise, if the connection is never used
again, we leave file-descriptors sitting in CLOSE_WAIT, forever.
Use set_byte_range[_r] with a negative length to cause an open-ended HTTP
Range header to be used with the GET request.  This is useful in the MarFS
fuse implementation, to allow streams to stay open across calls to read().
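
For example:

    // read from <offset> to end-of-object: "Range: bytes=<offset>-"
    s3_set_byte_range(offset, -1);            // negative length => open-ended
    AWS4C_CHECK( s3_get(io_buf, obj_name) );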
s3_sproxyd() was setting CTE instead of SPROXYD.  This would force everyone
to use chunked-transfer-encoding, and would also compute S3 encrypted
headers even when they were not needed.

Compiling optimized, by default.  This turned up a possible use of an
uninitialized variable in sqs_example.c.

AWS4C_CHECK1() now returns non-zero return-codes directly, allowing
more-nuanced handling of curl errors.

aws_iobuf_extend_internal() returns early if <len> is zero.  This allows
more-efficient ways to send special signals to curl callback functions.
The idea is that you call s3_set_content_length() or
s3_set_content_length_r() before a call to some put/post operation
(e.g. s3_put()).  The content-length field in the AWSContext gets reset
during the put/post, so you must call it again before every put/post.

Testing with command-line curl shows significant bandwidth improvement (to
Scality sproxyd) using known content-length, as opposed to
chunked-transfer-encoding.

However, this attempt to invoke the same functionality through libcurl
apparently doesn't actually work, as of libcurl 7.19.7.  I'm still running
into something very similar to the 8-year-old bug reported here:

  http://curl.haxx.se/mail/archive-2008-05/0032.html

Or maybe it's this:

  http://curl.haxx.se/mail/archive-2011-08/0106.html

Anyhow, I want this support in place, in case we can fix it later.
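
Usage, per the description above (the argument form shown here is an
assumption; remember that the setting is consumed by each put/post):

    s3_set_content_length(total_len);         // must be re-set before every PUT
    AWS4C_CHECK( s3_put(io_buf, obj_name) );
    // NOTE: as described above, this didn't yet work against libcurl 7.19.7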
Added some comments, to warn that the content-length tools of the previous
commit are apparently not working yet, as noted in the previous commit-log
entry.

Conflicts:
	aws4c.c
	aws4c.h
This is useful in the case where a "streaming" write (e.g. from a pipe)
knows the length of data it is ultimately going to write.  This allows
libcurl to do some things more efficiently.  Used in MarFS.
Only if your libcurl is >= 7.38.  I actually haven't been able to test this
yet, because I have an older libcurl.  But some colleagues may be
downloading this soon, to test against a newer libcurl, and we want to try
this feature.
…om the

IOBuf, before pushing new ones to be handed to the curl-interaction thread.
However, the streaming_writeheaderfunc(), which parses and installs
response-header values into the IOBuf, may be invoked well before all the
streaming data has arrived.  In that case, aws_iobuf_reset() will wipe the
parsed results.  Therefore, we provide aws_iobuf_reset_lite(), which leaves
any parsed header-fields untouched.  Thus, the asynchrony between the two
threads is not a problem.
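
In other words (sketch):

    // between streamed chunks: empty the IOBuf's data, but keep any
    // response-header fields already parsed by streaming_writeheaderfunc()
    aws_iobuf_reset_lite(io_buf);
    aws_iobuf_append_static(io_buf, next_ptr, next_len);  // hand over next chunk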
Use s3_http_digest() to enable/disable HTTP digest authentication.  We use
the user/pass parsed from the ~/.awsAuth file in aws_read_config() as the
user/password for libcurl, which is ultimately where the authentication is
performed at runtime.  Subsequent calls to aws_context_clone() will get a
context that can still do this authentication.

This approach allows a process running as root to load the credentials
(from ~/.awsAuth) at initialization time, then de-escalate and continue to
use the context to do authentication, leaving the /root/.awsAuth file
unreadable by other users.
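
A sketch of that initialization order (the aws_read_config() argument form
is an assumption):

    // as root, at startup: load user/pass from /root/.awsAuth ...
    aws_read_config("myuser");
    s3_http_digest(1);   // ... then use HTTP digest auth from here on
    // ... de-escalate privileges; the loaded credentials remain usable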
…passwd file in place of HOME to find password.
…es to NULL.

This appears to address a MarFS problem, where streaming_writeheader() was
accessing illegal memory.
jti-lanl added 7 commits May 11, 2016 11:07
Optional arg allows access to object-store that uses HTTP-digest authentication.
…date.

GetStringToSign() takes a new DateConv struct, instead of a char** to
receive the date that is computed internally.  If the time_t* inside
the DateConv is non-null, we use that, instead of the current time, to
generate the signature.

This allows a custom server to authenticate a signed request by
generating its own signature, using a date supplied in the request,
and a password looked up locally using a user-name supplied in a
request.