Alibek Jakupov

Connecting to SFTP server from Azure Functions

Updated: May 3


One of the most fascinating things about Azure Functions is how easily you can deploy HTTP- or timer-triggered web services. But what if you want a function that checks your SFTP server for updates and applies your processing logic to the relevant files?


In this example we are going to create a function that looks for new files on your SFTP server and runs the processing logic on them. Up we go!


  1. Create a Function project

  2. Connect to the SFTP server

  3. Get all the filenames from the output folder

  4. Get all the filenames from the input folder

  5. Compare the lists

  6. Apply your function to the difference: in our case, all the files that are in the input folder but not in the output folder.
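Steps 3 to 5 (listing both folders and comparing them) come down to a set difference on base names. Below is a minimal sketch that is independent of any SFTP library, assuming the file names have already been listed (for example with pysftp's `listdir()`); `find_unprocessed` is a hypothetical helper name, and the `.pdf` filter mirrors the example that follows.

```python
import os


def find_unprocessed(input_names, output_names, extension=".pdf"):
    """Return base names of input files that have no counterpart in the output folder.

    input_names / output_names are plain file names, e.g. as returned by a directory listing.
    Only input files with the given extension are considered; output files are
    matched on their base name, whatever their extension.
    """
    # base names already present in the output folder (extension ignored)
    processed = {os.path.splitext(name)[0] for name in output_names}
    # base names of input files that have the wanted extension
    candidates = {
        base
        for base, ext in (os.path.splitext(name) for name in input_names)
        if ext == extension
    }
    # set difference: in the input folder but not in the output folder,
    # sorted so the processing order is stable between runs
    return sorted(candidates - processed)


print(find_unprocessed(["a.pdf", "b.pdf", "c.txt"], ["a.json"]))  # prints ['b']
```

Matching on base names lets the output files carry a different extension than the inputs, which is also how the example below compares the two folders.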

Here is the example:


import logging
import os
from io import BytesIO

import azure.functions as func
import pysftp


def main(req: func.HttpRequest) -> func.HttpResponse:
    # SFTP server parameters (replace with your own values)
    USERNAME = "your user name"
    PASSWORD = "your password"
    SERVER = 'your server'

    # number of files processed during this run
    import_count = 0

    # lists of input (raw) and output (processed) file names, without extensions
    input_files_array = []
    output_files_array = []

    # workaround for pysftp: skip host key verification so the connection does not fail
    # when the server's key is missing from known_hosts (that failure is what surfaces as
    # "AttributeError: 'Connection' object has no attribute '_sftp_live'").
    # Disabling host key checks is insecure; load the server's host key in production.
    cnopts = pysftp.CnOpts()
    cnopts.hostkeys = None

    # establish a connection to the SFTP server
    with pysftp.Connection(host=SERVER, username=USERNAME, password=PASSWORD, cnopts=cnopts) as sftp:
        logging.info("SFTP connection successfully established ...")

        # switch to the remote output directory
        sftp.cwd('/OUTPUT_FOLDER')
        # obtain the structure of the current remote directory
        directory_structure = sftp.listdir_attr()
        # collect the names of the files that have already been processed
        for attr in directory_structure:
            fname = os.path.splitext(attr.filename)[0]
            output_files_array.append(fname)

        # now get the initial (raw) data
        sftp.cwd('/RAWDATA')
        directory_structure = sftp.listdir_attr()
        # keep only the PDF files
        for attr in directory_structure:
            fname, extension = os.path.splitext(attr.filename)
            if extension == '.pdf':
                input_files_array.append(fname)

        # get all the files from the input folder that have not been processed yet
        difference_array = list(set(input_files_array) - set(output_files_array))
        logging.info('############################################')
        logging.info(difference_array)

        for unprocessed_file in difference_array:
            # read the remote file directly into an in-memory buffer;
            # the extension was stripped above, so add it back
            file_object = BytesIO()
            sftp.getfo(unprocessed_file + '.pdf', file_object)
            # rewind the buffer before reading from it
            file_object.seek(0)
            # file name without extension, e.g. for naming the output file
            clean_filename = unprocessed_file
            # process your files here
            import_count += 1

    return func.HttpResponse(f"Success: {import_count} file(s) processed")
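The example hard-codes the credentials. In a deployed Function App, application settings are exposed to your code as environment variables, so a small helper like the sketch below can keep secrets out of the source. The setting names `SFTP_HOST`, `SFTP_USERNAME` and `SFTP_PASSWORD`, as well as `load_sftp_settings` and `SftpSettings`, are hypothetical; use whatever names you define in your app configuration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SftpSettings:
    host: str
    username: str
    password: str


# hypothetical app setting names; define them in your Function App configuration
REQUIRED_KEYS = ("SFTP_HOST", "SFTP_USERNAME", "SFTP_PASSWORD")


def load_sftp_settings(env=os.environ):
    """Read the SFTP credentials from app settings (exposed as environment variables).

    `env` defaults to the process environment; any mapping can be passed for testing.
    Raises KeyError listing every missing or empty setting.
    """
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError(f"missing app settings: {', '.join(missing)}")
    return SftpSettings(env["SFTP_HOST"], env["SFTP_USERNAME"], env["SFTP_PASSWORD"])
```

Inside `main`, you would then call `settings = load_sftp_settings()` and pass `settings.host`, `settings.username` and `settings.password` to `pysftp.Connection`. Failing early with the full list of missing settings makes a misconfigured deployment easier to diagnose than a failed login.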

Finally, you need to switch to the output directory and save the results there, so that later runs can compare against them. And don't forget to add pysftp to your requirements.txt.
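A minimal sketch of that last step, assuming `sftp` is the open `pysftp.Connection` from the example; only its `putfo()` method is used, the counterpart of the `getfo()` call used for the download. The helper name `upload_result` and the `.json` output extension are hypothetical placeholders for whatever your processing produces.

```python
import io
import posixpath


def upload_result(sftp, base_name, data: bytes, output_dir="/OUTPUT_FOLDER", extension=".json"):
    """Write processed bytes to the output folder so later runs skip this input file.

    `sftp` is expected to be an open pysftp.Connection (only putfo() is called).
    Returns the remote path that was written.
    """
    # SFTP paths always use '/', so join with posixpath rather than os.path
    remote_path = posixpath.join(output_dir, base_name + extension)
    # upload from an in-memory file-like object, mirroring getfo() on the download side
    sftp.putfo(io.BytesIO(data), remote_path)
    return remote_path
```

Passing an absolute remote path instead of calling `sftp.cwd('/OUTPUT_FOLDER')` first avoids accidentally writing into `/RAWDATA`, which is still the working directory at the end of the loop. Because the output keeps the input's base name, the comparison on the next run will treat that file as processed.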


Hope this was helpful

 

Since 2018 by ©alirookie