One of the most fascinating things about Azure Functions is how easily you can deploy HTTP- or timer-triggered web services. But what if you want a function that looks for updates on your SFTP server and applies your logic to the new files?
In this example we are going to create a function that looks for new files on your SFTP server and runs your logic on them. Up we go!
Create a Function project
Connect to the SFTP server
Get all the filenames from the output folder
Get all the filenames from the input folder
Compare the lists
Apply your function to the difference: in our case, all the files that are in the input folder but not in the output folder.
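The comparison in the last two steps can be tried out on its own, without a server. This is only a minimal sketch: the helper name `unprocessed_names` and its `extension` parameter are made up for illustration, and it assumes that an output file keeps the base name of the input file it was produced from.

```python
import os


def unprocessed_names(input_files, output_files, extension=".pdf"):
    """Return base names found in the input listing but not in the output listing."""
    # Keep only input files with the wanted extension, identified by base name.
    candidates = {
        os.path.splitext(name)[0]
        for name in input_files
        if os.path.splitext(name)[1] == extension
    }
    # Output files may have a different extension, so only base names are compared.
    processed = {os.path.splitext(name)[0] for name in output_files}
    # Sort the result so the processing order is deterministic.
    return sorted(candidates - processed)


if __name__ == "__main__":
    # "a" already has an output file and "notes.txt" is not a PDF, so only "b" is left.
    print(unprocessed_names(["a.pdf", "b.pdf", "notes.txt"], ["a.json"]))
```

Comparing base names rather than full filenames is what lets the output folder hold a different file type than the input folder.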
Here is the complete example:
import logging
import os
from io import BytesIO

import azure.functions as func
import pysftp


def main(req: func.HttpRequest) -> func.HttpResponse:
    # SFTP server parameters
    USERNAME = "your user name"
    PASSWORD = "your password"
    SERVER = "your server"

    # base names (without extension) of the processed and the input files
    processed_array = []
    raw_data_array = []

    # workaround to avoid
    # "AttributeError: 'Connection' object has no attribute '_sftp_live'";
    # it disables host key verification, so only use it with a server you trust
    cnopts = pysftp.CnOpts()
    cnopts.hostkeys = None

    # establish a connection to the SFTP server
    with pysftp.Connection(host=SERVER, username=USERNAME, password=PASSWORD, cnopts=cnopts) as sftp:
        logging.info("SFTP connection successfully established ...")

        # switch to the output folder and list its contents
        sftp.cwd('/OUTPUT_FOLDER')
        directory_structure = sftp.listdir_attr()
        # collect the names of the files that have already been processed
        for attr in directory_structure:
            fname = os.path.splitext(attr.filename)[0]
            processed_array.append(fname)

        # now switch to the input folder and list its contents
        sftp.cwd('/RAWDATA')
        directory_structure = sftp.listdir_attr()
        # collect the names of the PDF files waiting in the input folder
        for attr in directory_structure:
            fname, extension = os.path.splitext(attr.filename)
            if extension == '.pdf':
                raw_data_array.append(fname)

        # get all the files from the input folder that have not been processed yet
        difference_array = list(set(raw_data_array) - set(processed_array))
        logging.info('############################################')
        logging.info(difference_array)

        for unprocessed_file in difference_array:
            # read the remote file directly into an in-memory file object
            file_object = BytesIO()
            sftp.getfo(unprocessed_file + '.pdf', file_object)
            # rewind the buffer so that it can be read from the beginning
            file_object.seek(0)
            # process your files
            # file name without extension, used to name the output
            clean_filename = unprocessed_file

    return func.HttpResponse("Success")
Finally, you need to switch to the output folder and save the result there, so that the next run can compare the two folders. And don't forget to add pysftp to your requirements.txt.
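The save step could look roughly like the sketch below. The helper `save_output`, its parameters and the `.txt` extension are assumptions for illustration, not part of the original function; `sftp` is expected to be the open pysftp connection, and `putfo` is the pysftp call that uploads from a file-like object.

```python
from io import BytesIO


def save_output(sftp, clean_filename, payload,
                output_dir="/OUTPUT_FOLDER", input_dir="/RAWDATA",
                extension=".txt"):
    """Upload the processed bytes under the same base name as the input file."""
    # Hypothetical naming scheme: same base name, new extension.
    remote_name = clean_filename + extension
    # Switch to the output folder before uploading.
    sftp.cwd(output_dir)
    try:
        # putfo uploads from memory, so nothing is written to the local disk.
        sftp.putfo(BytesIO(payload), remote_name)
    finally:
        # Switch back so that the next getfo call still resolves in the input folder.
        sftp.cwd(input_dir)
    return remote_name
```

Inside the processing loop this would be called as `save_output(sftp, clean_filename, result_bytes)`, where `result_bytes` stands for whatever your own logic produces. Keeping the base name unchanged is what makes the file show up as processed on the next run.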
Hope this was helpful