Update boto3 version

0

I need to use a newer boto3 package in an AWS Glue Python 3 shell job (Glue version 1.0). I uploaded a wheel file, boto3-1.13.21-py2.py3-none-any.whl, to S3 and referenced it under Python library path. When I print boto3.__version__, the Glue Python shell job still reports 1.9.203, even though I see the following log: "Successfully installed boto3-1.13.21 botocore-1.16.26 docutils-0.15.2 jmespath-0.10.0 python-dateutil-2.8.1 s3transfer-0.3.3 six-1.15.0 urllib3-1.25.10". Is there any way to override the boto3 package version for a Glue Python shell job?

Edited by: blastia on Aug 28, 2020 6:54 PM

Edited by: blastia on Sep 1, 2020 5:19 AM

blastia
asked 4 years ago, 3581 views
4 Answers
2

You can upgrade the boto3 version with the steps below.

  1. Upload the boto3 wheel file to your S3 bucket. The wheel file is available on pypi.org (https://pypi.org/project/boto3/#files).
  2. In the job configuration of your Glue Python shell job, set 'Python library path' to the S3 path of the wheel file.
  3. Insert the code below at the beginning of your Python script. (The print statements can be omitted.)
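import sys

# Put the directory where the wheel was installed ahead of the bundled packages.
sys.path.insert(0, '/glue/lib/installation')

# Forget any boto modules that are already loaded so the next import picks up the newer version.
keys = [k for k in sys.modules.keys() if 'boto' in k]
for k in keys:
    del sys.modules[k]

import boto3
print('boto3 version')
print(boto3.__version__)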
  4. Then you can import boto3 and start scripting with the newer boto3.
    For example, you can use Athena ListDataCatalogs, which is not available in the default boto3 yet.
athena = boto3.client("athena")
res = athena.list_data_catalogs()
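
The purge-and-reimport step from the snippet above can also be wrapped in a small helper. This is only a minimal sketch: the function name `force_reimport_from` is hypothetical, and unlike the original loop it matches top-level package names exactly (by default `boto3` and `botocore`) rather than any module whose name contains 'boto'.

```python
import importlib
import sys


def force_reimport_from(path, packages=("boto3", "botocore")):
    """Make `path` the first import location and forget cached copies of `packages`.

    Returns the names of the modules that were dropped from sys.modules.
    """
    # Move (or add) the directory to the front of the module search path.
    if path in sys.path:
        sys.path.remove(path)
    sys.path.insert(0, path)

    # Drop the packages and all of their submodules from the import cache.
    stale = [name for name in list(sys.modules) if name.split(".")[0] in packages]
    for name in stale:
        del sys.modules[name]

    # Make sure the import system notices files added since startup.
    importlib.invalidate_caches()
    return stale


# Usage inside the Glue script (the path is the one given in the answer above):
# force_reimport_from("/glue/lib/installation")
# import boto3
```

Returning the list of dropped modules makes it easy to log what was actually replaced, which may help when the printed version does not change as expected.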

Edited by: NoritakaS-AWS on Oct 22, 2020 9:59 PM

AWS
answered 4 years ago
0

Hi,

We got the AWS Glue Python shell working with all dependencies as follows. Glue has a dependency on awscli in addition to boto3.

AWS Glue Python Shell with Internet

Add the awscli and boto3 whl files to the Python library path so that they are installed when the Glue job runs. This option is slow because the job has to download and install the dependencies.

Download the following whl files

  1. awscli-1.18.183-py2.py3-none-any.whl https://pypi.org/project/awscli/#files

  2. boto3-1.16.23-py2.py3-none-any.whl https://pypi.org/project/boto3/#files

  3. Upload the files to the S3 location you use for the Python library path

  4. Add the S3 paths of the whl files to the Python library path. Give the full S3 path of each whl file, separated by commas
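
To illustrate what the comma-separated value from step 4 could look like, here is a small sketch that builds it from a bucket, a prefix, and the wheel file names. The helper name `python_library_path` and the bucket and prefix names are placeholders for illustration only; substitute your own S3 location.

```python
def python_library_path(bucket, prefix, wheel_files):
    """Join the full S3 URIs of the wheel files into one comma-separated string."""
    prefix = prefix.strip("/")
    base = "s3://{}/{}".format(bucket, prefix) if prefix else "s3://{}".format(bucket)
    return ",".join("{}/{}".format(base, name) for name in wheel_files)


wheels = [
    "awscli-1.18.183-py2.py3-none-any.whl",
    "boto3-1.16.23-py2.py3-none-any.whl",
]
# "my-glue-bucket" and "python-libs" are placeholder names, not real locations.
library_path = python_library_path("my-glue-bucket", "python-libs", wheels)
print(library_path)
```

The resulting string has no spaces around the comma, which matches the "separated by comma" wording above.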

AWS Glue Python Shell without Internet connectivity

Reference: AWS Wrangler Glue dependency build https://github.com/corvuslee/public/blob/master/awswrangler_glue.md

  1. We followed the steps mentioned above for awscli and boto3 whl files

  2. Below is the latest requirements.txt compiled for the newest versions

colorama==0.4.3
docutils==0.15.2
rsa==4.5.0
s3transfer==0.3.3
PyYAML==5.3.1
botocore==1.19.23
pyasn1==0.4.8
jmespath==0.10.0
urllib3==1.26.2
python_dateutil==2.8.1
six==1.15.0

  3. Download the dependencies to the libs folder

pip download -r requirements.txt -d libs

  4. Also move the two main whl files into the libs directory

     awscli-1.18.183-py2.py3-none-any.whl https://pypi.org/project/awscli/#files

     boto3-1.16.23-py2.py3-none-any.whl https://pypi.org/project/boto3/#files

  5. Package as a zip file
    cd libs
    zip ../boto3-depends.zip *

  6. Upload boto3-depends.zip to S3 and add its path to the Glue job's Referenced files path
    Note: It is Referenced files path and not Python library path

  7. Placeholder code to install the latest awscli and boto3 and load them into the AWS Glue Python shell.

import os.path
import subprocess
import sys

# borrowed from https://stackoverflow.com/questions/48596627/how-to-import-referenced-files-in-etl-scripts
def get_referenced_filepath(file_name, matchFunc=os.path.isfile):
    # Search every sys.path entry for the referenced file.
    for dir_name in sys.path:
        candidate = os.path.join(dir_name, file_name)
        if matchFunc(candidate):
            return candidate
    raise Exception("Can't find file: {}".format(file_name))

# Must match the name of the zip uploaded to the Referenced files path
zip_file = get_referenced_filepath("boto3-depends.zip")

subprocess.run(["unzip", zip_file])

# Can't install --user, or without "-t ." because of permissions issues on the filesystem
subprocess.run("pip3 install --no-index --find-links=. -t . *.whl", shell=True)

# Additional code as part of AWS Thread https://forums.aws.amazon.com/thread.jspa?messageID=954344
sys.path.insert(0, '/glue/lib/installation')
keys = [k for k in sys.modules.keys() if 'boto' in k]
for k in keys:
    del sys.modules[k]

import boto3
print('boto3 version')
print(boto3.__version__)

Finally, check that the code works with the latest AWS CLI API.
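
If the zip command line tool is not available on the machine where you build the package, the packaging step above could be done in Python instead. This is a rough sketch using the standard zipfile module; the helper name `zip_flat` is hypothetical. It puts every regular file of the libs folder at the root of the archive, roughly like running zip inside that folder (note that, unlike the shell glob, it also includes hidden files).

```python
import os
import zipfile


def zip_flat(source_dir, zip_path):
    """Write every regular file in source_dir to the root of a new zip archive."""
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(os.listdir(source_dir)):
            full_path = os.path.join(source_dir, name)
            if os.path.isfile(full_path):
                # arcname=name keeps the archive flat, with no "libs/" prefix.
                archive.write(full_path, arcname=name)
                names.append(name)
    return names


# Usage, run from the directory that contains libs:
# zip_flat("libs", "boto3-depends.zip")
```

Keeping the archive flat matters here because the install code runs pip against the *.whl files in the directory where the zip is extracted.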

Thanks,
Sarath

Edited by: SarathMohan on Nov 23, 2020 10:32 AM

Edited by: SarathMohan on Nov 23, 2020 10:35 AM

Edited by: SarathMohan on Nov 23, 2020 10:41 AM

answered 3 years ago
0

In case anyone else ends up on this thread, Glue v2.0 simplified this process drastically with the addition of the --additional-python-modules parameter. See

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html#aws-glue-programming-python-libraries-glue-20

https://docs.aws.amazon.com/glue/latest/dg/reduced-start-times-spark-etl-jobs.html

The following job parameter made the Athena client functions in boto3 v1.17.12 available without any need to add extra code modifying the python path.

"--additional-python-modules", "botocore>=1.20.12,boto3>=1.17.12"

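If you set job parameters from code rather than in the console, the pair above can be assembled as a plain dictionary. A minimal sketch follows; the helper name `additional_modules_argument` is hypothetical. A dictionary of this shape is what you would typically pass, for example, as `Arguments` to the boto3 Glue client's `start_job_run`, but check that against your own setup.

```python
def additional_modules_argument(requirements):
    """Build the job-argument pair for --additional-python-modules."""
    # Drop empty entries and surrounding whitespace before joining with commas.
    cleaned = [r.strip() for r in requirements if r.strip()]
    return {"--additional-python-modules": ",".join(cleaned)}


arguments = additional_modules_argument(["botocore>=1.20.12", "boto3>=1.17.12"])
print(arguments)
```
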
rluckey
answered 3 years ago
0

This doesn't work for some reason...

After importing the newly updated boto3 library, and checking the version:

print(boto3.__version__)

"1.17.9" is printed

But when I try to access the list_data_catalogs() method:

res = athena.list_data_catalogs()

I receive the following error: "AttributeError: module 'boto3' has no attribute 'list_data_catalogs'"
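
A note for anyone hitting the same message: the error names the module 'boto3', not a client. That suggests (an assumption, since the full script is not shown here) that list_data_catalogs() was looked up on the boto3 module itself rather than on an Athena client object. The sketch below takes the client as a parameter to make that distinction explicit; the helper name `list_catalog_names` is hypothetical, and it reads only the first page of results.

```python
def list_catalog_names(athena_client):
    """Return the catalog names reported by an Athena client object."""
    response = athena_client.list_data_catalogs()
    return [entry["CatalogName"] for entry in response.get("DataCatalogsSummary", [])]


# Usage in the Glue script, after the newer boto3 has been imported:
# import boto3
# athena = boto3.client("athena")   # a client object, not the boto3 module
# print(list_catalog_names(athena))
```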

Edited by: g-crmf on Feb 23, 2021 11:43 AM


g-crmf
answered 3 years ago
