©2018 by macnabbs.

 
  • Alibek Jakupov

Connecting to SFTP server from Azure Functions


One of the most fascinating things about Azure Functions is how easily you can deploy HTTP- or timer-triggered web services. What if you wanted to create a function that looks for updates on your SFTP server and applies your logic to the relevant files?


In this example, we are going to create a function that looks for new files on your SFTP server and runs your processing logic on them. Up we go!


  1. Create a Function project

  2. Connect to the SFTP server

  3. Get all the filenames from the output folder

  4. Get all the filenames from the input folder

  5. Compare the lists

  6. Apply your function to the difference: in our case, we take all the files that are in the input folder but not in the output folder.
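Steps 3 to 5 boil down to a set difference on file names without their extensions. The sketch below isolates that comparison as plain functions so it can be checked without a server; the helper names `base_names` and `find_unprocessed` are hypothetical, and `.pdf` as the input extension simply mirrors the example that follows.

```python
import os
from typing import Iterable, List, Optional


def base_names(filenames: Iterable[str], extension: Optional[str] = None) -> List[str]:
    """Strip extensions; if `extension` is given, keep only files that have it."""
    names = []
    for filename in filenames:
        stem, ext = os.path.splitext(filename)
        if extension is None or ext == extension:
            names.append(stem)
    return names


def find_unprocessed(input_filenames: Iterable[str],
                     output_filenames: Iterable[str],
                     extension: str = ".pdf") -> List[str]:
    """Return (sorted) input base names that have no counterpart in the output folder."""
    pending = set(base_names(input_filenames, extension)) - set(base_names(output_filenames))
    return sorted(pending)
```

For example, `find_unprocessed(["a.pdf", "b.pdf", "notes.txt"], ["a.json"])` returns `["b"]`. With a live pysftp connection, the two lists could come from `sftp.listdir('/RAWDATA')` and `sftp.listdir('/OUTPUT_FOLDER')`.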

Here is the example:


import logging
import os
from io import BytesIO

import azure.functions as func
import pysftp


def main(req: func.HttpRequest) -> func.HttpResponse:
    # SFTP server parameters
    USERNAME = "your user name"
    PASSWORD = "your password"
    SERVER = 'your server'
    import_count = 0

    # lists with the names of the input and the processed (output) files
    input_files_array = []
    output_files_array = []

    # workaround to set pysftp up and running on Python 3
    # and avoid "AttributeError: 'Connection' object has no attribute '_sftp_live'"
    # (note: this disables host key verification)
    cnopts = pysftp.CnOpts()
    cnopts.hostkeys = None

    # establish a connection to the SFTP server
    with pysftp.Connection(host=SERVER, username=USERNAME, password=PASSWORD, cnopts=cnopts) as sftp:
        logging.info("SFTP connection successfully established ... ")

        # switch to the output folder and list its contents
        sftp.cwd('/OUTPUT_FOLDER')
        directory_structure = sftp.listdir_attr()

        # collect the names of the processed files (without extension)
        for attr in directory_structure:
            fname = os.path.splitext(attr.filename)[0]
            output_files_array.append(fname)

        # now get the input data
        sftp.cwd('/RAWDATA')
        directory_structure = sftp.listdir_attr()

        # collect the names of the input PDF files (without extension)
        for attr in directory_structure:
            fname, extension = os.path.splitext(attr.filename)
            if extension == '.pdf':
                input_files_array.append(fname)

        # get all the files from the input folder that have not been processed yet
        difference_array = list(set(input_files_array) - set(output_files_array))
        logging.info('############################################')
        logging.info(difference_array)

        for unprocessed_file in difference_array:
            # read the remote file (we are still in /RAWDATA) directly into a variable;
            # the extension was stripped above, so add it back
            file_object = BytesIO()
            sftp.getfo(unprocessed_file + '.pdf', file_object)
            file_object.seek(0)
            # process your file here
            # file name without extension
            clean_filename = unprocessed_file
            import_count += 1

    logging.info("Processed %d file(s)", import_count)
    return func.HttpResponse("Success")
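The example hardcodes the server name and credentials. As a hedged alternative, they could be read from the function app's application settings, which Azure Functions exposes to your code as environment variables (locally, the `Values` section of `local.settings.json` plays the same role). The setting names `SFTP_SERVER`, `SFTP_USERNAME` and `SFTP_PASSWORD` are hypothetical; pick whatever names your app uses.

```python
import os


def get_sftp_settings() -> dict:
    """Read SFTP connection settings from environment variables (hypothetical names)."""
    try:
        return {
            "host": os.environ["SFTP_SERVER"],
            "username": os.environ["SFTP_USERNAME"],
            "password": os.environ["SFTP_PASSWORD"],
        }
    except KeyError as missing:
        # fail with a readable message instead of a bare KeyError
        raise RuntimeError(f"Missing application setting: {missing.args[0]}") from None
```

The returned dictionary matches the keyword arguments used above, so the connection line could become `with pysftp.Connection(cnopts=cnopts, **get_sftp_settings()) as sftp:`.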


Finally, you need to switch to the output folder and save the result there under the same base name as the input file (the comparison ignores extensions), so that the file is recognized as processed on the next run. And don't forget to add pysftp to your requirements.txt.
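A minimal sketch of that last step, assuming the processed result is available as bytes: it uploads the data with pysftp's `putfo` to an absolute path in the output folder, which has the same effect as switching directory first but leaves the current directory (`/RAWDATA`) untouched for the next `getfo` call. The helper name `save_output` and the `.json` output extension are hypothetical.

```python
import posixpath
from io import BytesIO


def save_output(sftp, output_folder: str, base_name: str, data: bytes,
                extension: str = ".json") -> str:
    """Upload processed bytes to the output folder under the input file's base name."""
    # SFTP paths are POSIX-style, so join with posixpath regardless of the local OS
    remote_path = posixpath.join(output_folder, base_name + extension)
    # putfo copies the contents of a file-like object to remote_path on the server
    sftp.putfo(BytesIO(data), remote_path)
    return remote_path
```

Inside the loop, this could be called as `save_output(sftp, '/OUTPUT_FOLDER', clean_filename, result_bytes)`, where `result_bytes` stands for whatever your processing produces.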


Hope this was helpful