Renewable Energy sources monitoring: Part 3
This is a continuation of the previous article. For a better understanding, it is recommended to read the earlier blog posts first.
Non-uniform data collection and processing methods
The data from the sources mentioned above must be consolidated and then divided into target domains. In the context of this topic, the most relevant domains are the most popular green energy sources, three of which are of particular interest (the list is given in descending order of the volume of energy produced):
The division of the data by domain is shown in the figure below.
The main problem with the consolidated data, as mentioned above, is the absence of a uniform standard for exchanging weather data. The same is true for climate parameters and environmental information. This is the main reason why a renewable energy monitoring system must be able to work with heterogeneous data provided in the different formats mentioned above. The common practice is to preprocess the data manually and then convert it into the required format. Data gathering can be automated (by means of web service clients, FTP downloaders, etc.). Converting the data into a uniform format, however, requires considerable resources, both to define the format and to convert the data into it. For instance, GRIB data has to be unzipped and decoded with special utilities, converted into CSV, and then loaded into a database. This issue mainly concerns Extract, Transform, Load (ETL) systems. ETL is one of the main processes of data storage management and includes:
extracting data from external sources
transforming and parsing the obtained data to make it suitable for the needs of a particular business model
loading the clean data into a data storage
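The three steps above can be illustrated with a minimal sketch. It is not the system described in this series: the CSV content, the column names, and the `wind` table are assumptions made up for the example, and an in-memory SQLite database stands in for the real data storage.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from one weather source; column names are assumptions.
RAW_CSV = """station,timestamp,wind_speed_ms
A1,2015-06-01T00:00:00,5.2
A1,2015-06-01T01:00:00,
B7,2015-06-01T00:00:00,7.9
"""

def extract(text):
    """Extract: read rows from an external source (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with a missing measurement and convert types."""
    clean = []
    for row in rows:
        if not row["wind_speed_ms"]:
            continue  # skip incomplete records
        clean.append((row["station"], row["timestamp"], float(row["wind_speed_ms"])))
    return clean

def load(conn, records):
    """Load: write the clean records into the data storage."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS wind (station TEXT, ts TEXT, speed_ms REAL)"
    )
    conn.executemany("INSERT INTO wind VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
row_count = conn.execute("SELECT COUNT(*) FROM wind").fetchone()[0]
```

Keeping each step in its own function means a different source only needs its own `extract` and `transform`, while `load` can stay shared.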
From the point of view of ETL processes, the data storage architecture can be described in terms of three key components:
data source: contains structured data in the form of a table, several tables, or simply a file (for example, a comma-separated values file)
intermediate area: contains auxiliary tables that are created temporarily and only for managing the upload process
data receiver: a data storage or a database where all the obtained data is finally saved
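As a rough illustration of how these three components relate, the sketch below pushes a few rows through a temporary staging table before they reach the permanent table. The table names, the schema, and the validity rule (the value must start with a digit) are assumptions chosen for brevity, and SQLite is used only because it needs no setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data receiver: the permanent table (hypothetical schema).
conn.execute("CREATE TABLE irradiance (station TEXT, ts TEXT, w_per_m2 REAL)")

# Intermediate area: a temporary table that exists only for this load.
conn.execute("CREATE TEMP TABLE staging (station TEXT, ts TEXT, w_per_m2 TEXT)")

# Data source: rows as they might arrive from a file, still untyped text.
source_rows = [
    ("S1", "2015-06-01T12:00:00", "812.5"),
    ("S1", "2015-06-01T13:00:00", "n/a"),
    ("S2", "2015-06-01T12:00:00", "790.0"),
]
conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", source_rows)

# Move only the rows whose value starts with a digit, converting them to numbers.
conn.execute(
    """
    INSERT INTO irradiance
    SELECT station, ts, CAST(w_per_m2 AS REAL)
    FROM staging
    WHERE w_per_m2 GLOB '[0-9]*'
    """
)
conn.execute("DROP TABLE staging")
conn.commit()

loaded = conn.execute(
    "SELECT station, w_per_m2 FROM irradiance ORDER BY station"
).fetchall()
```

The receiver never sees the malformed `n/a` row, because it is filtered out while the data is still in the intermediate area.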
The movement of data from a data source to a receiver is called a data stream. The requirements for organizing the data stream are defined by the analyst. The ETL process should be considered not only as data transmission from one application to another but also as a data preprocessing tool that prepares data for analytics. Creating such a system may require a lot of resources, including time.
In general, building the system involves the following components:
Data upload mechanisms (separate for each data source)
Data update manager (to detect ‘outdated’ data and to start ETL process for the certain source)
Data converter (separate for each data source)
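The data update manager from the list above could, in its simplest form, compare the time of the last successful load with a maximum allowed age per source. The sketch below assumes exactly that; the source names and refresh intervals are hypothetical and would depend on each data provider.

```python
from datetime import datetime, timedelta

# Hypothetical sources and refresh intervals; real values depend on each provider.
MAX_AGE = {
    "forecast_grib": timedelta(hours=6),
    "station_csv": timedelta(hours=1),
}

def outdated_sources(last_loaded, now):
    """Return the names of sources whose data is older than the allowed age."""
    stale = []
    for name, max_age in MAX_AGE.items():
        loaded_at = last_loaded.get(name)
        # A source that has never been loaded is treated as outdated.
        if loaded_at is None or now - loaded_at > max_age:
            stale.append(name)
    return sorted(stale)

now = datetime(2015, 6, 1, 12, 0)
last_loaded = {
    "forecast_grib": datetime(2015, 6, 1, 9, 0),  # 3 hours old: still fresh
    "station_csv": datetime(2015, 6, 1, 10, 0),   # 2 hours old: outdated
}
to_refresh = outdated_sources(last_loaded, now)
```

Each name in `to_refresh` would then trigger the upload mechanism and the converter for that particular source.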
In the context of creating a renewable energy sources monitoring system, using prepared datasets seems the most reasonable and efficient approach. Creating a fully working ETL service of one's own is inefficient in terms of time and human resources.
Once the ETL processes are complete, the next issue is storing the consolidated data. At present there is a great variety of data storages, which may in turn be classified by different parameters, for instance usage purpose, the way data is stored, or the data distribution principle. The way data is stored is important because it determines how complex data conversion is when downloading from or uploading to a database. By the way they are organized, databases are divided into:
One of the most important criteria for choosing a storage is how widely used the database management system (DBMS) is, as this directly influences the processing speed and the quality of the developed data storage. From this point of view, one of the most efficient solutions for data consolidation is the relational DBMS PostgreSQL with its spatial extension PostGIS. One of its main features is that it implements the SQL/MED (Management of External Data) standard, which describes how non-uniform data storages are connected to a master database. The connection of non-uniform data sources using the SQL/MED implementation in PostgreSQL is shown schematically in the following figure. These data sources may be used either directly or for later extraction of certain data.
Data analysis tasks, especially from the point of view of energy sources monitoring, require constant and fast access to the data. In this case, the second option is the most suitable, i.e. the data is uploaded to the server for later analysis.
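In PostgreSQL, SQL/MED is exposed through foreign data wrappers. As a hedged illustration, the helper below only builds the SQL statements that would attach a CSV file as a foreign table using the `file_fdw` contrib module; it does not connect to a database. The server name, table name, columns, and file path are hypothetical, and the identifiers are assumed to come from trusted configuration, since they are not escaped.

```python
def foreign_csv_table_sql(table, columns, csv_path, server="weather_files"):
    """Build SQL that exposes a CSV file as a PostgreSQL foreign table via file_fdw.

    `columns` is a list of (name, sql_type) pairs. Identifiers are inserted
    as-is, so they must come from trusted configuration.
    """
    column_defs = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    safe_path = csv_path.replace("'", "''")  # escape quotes inside the SQL literal
    return [
        "CREATE EXTENSION IF NOT EXISTS file_fdw;",
        f"CREATE SERVER {server} FOREIGN DATA WRAPPER file_fdw;",
        (
            f"CREATE FOREIGN TABLE {table} ({column_defs}) "
            f"SERVER {server} "
            f"OPTIONS (filename '{safe_path}', format 'csv', header 'true');"
        ),
    ]

# Hypothetical example: hourly wind observations stored as a CSV file.
statements = foreign_csv_table_sql(
    table="wind_obs",
    columns=[("station", "text"), ("ts", "timestamp"), ("speed_ms", "real")],
    csv_path="/data/wind.csv",
)
```

Once such a foreign table exists, it can be queried directly, or its rows can be copied into a regular table with an ordinary `INSERT ... SELECT`, which corresponds to the two usage options mentioned above.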
Besides data from open network sources, it is possible to use data from weather stations and weather sensors. Data collection from weather sensors differs slightly from the process described above, as the data arrives as a continuous stream. In the current project it is proposed to create a prototype network of autonomous weather stations in order to gather high-resolution data in real time. For this reason, the time interval between data transmission sessions should be small enough to guarantee good time resolution. The weather station prototype uses an Internet connection (GPRS) for data transmission, so all the data is sent to the server by a simple HTTP request. In a large weather station network, the network load increases in proportion to the number of weather stations and the measurement frequency. To cope with a large number of HTTP requests, a load balancer (such as Nginx) is applied. It is worth mentioning that such solutions are used to handle loads on the order of a hundred thousand requests per second (depending on the equipment). If horizontal scaling is applied and the sensor network is split across several servers, the problem of request overload can easily be solved.
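The proportionality between load, number of stations, and measurement frequency, as well as the idea of splitting the sensor network across servers, can be sketched as follows. The station count, reporting interval, server names, and the hash-based assignment are assumptions for illustration, not the configuration of the prototype.

```python
import hashlib

def requests_per_second(stations, interval_seconds):
    """Average HTTP request rate if every station reports once per interval."""
    return stations / interval_seconds

def assign_server(station_id, servers):
    """Map a station to one server deterministically by hashing its ID."""
    digest = hashlib.sha256(station_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

# Hypothetical ingest servers and network size.
servers = ["ingest-1.example.org", "ingest-2.example.org", "ingest-3.example.org"]

# 6000 stations reporting once a minute produce 100 requests per second on average.
rate = requests_per_second(stations=6000, interval_seconds=60)

# The same station always lands on the same server.
target = assign_server("ws-0001", servers)
```

Simple modulo hashing keeps the example short, but it reassigns most stations whenever the number of servers changes; a production setup would more likely leave the distribution to the load balancer.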
Data from the queue is constantly recorded to the database. This data forms another information source that is especially interesting for analysis. The ability to analyze the data in real time is provided by the Azure Stream Analytics tools. These tools are oriented towards analyzing data obtained from Internet of Things (IoT) devices, a class to which weather stations belong. Moreover, this tool makes it possible to detect anomalies in the data and to report them. For instance, if one of the sensors fails, its faulty measurements will be immediately visible.
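To give an idea of what anomaly detection on a sensor stream involves, here is a minimal sketch based on a rolling mean and standard deviation. It does not use Azure Stream Analytics or reproduce its algorithms; the window size, the threshold, and the sample temperatures are assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, stdev

def find_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings that deviate strongly from the preceding window."""
    recent = deque(maxlen=window)
    anomalies = []
    for index, value in enumerate(readings):
        if len(recent) == window:
            mu = mean(recent)
            sigma = stdev(recent)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                anomalies.append(index)
                continue  # keep the outlier out of the reference window
        recent.append(value)
    return anomalies

# Hypothetical air temperature readings with one faulty measurement.
temperatures = [20.1, 20.3, 20.2, 20.4, 20.3, 20.2, -40.0, 20.4, 20.3]
flagged = find_anomalies(temperatures)
```

Here only the reading of -40.0 is flagged, which is the kind of value a failed sensor might report.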
The chapter outlines the data sources used in renewable energy monitoring systems. Their properties, such as spatial and time resolution, volume, data representation formats, and cost, have also been analyzed. The objective was set of creating a system of data parsers, or Extract, Transform, Load (ETL) systems, that allows non-uniform data sources to be uploaded and later used. We described a general approach to building a system for consolidating non-uniform data sources on the basis of open source software. The results show that it is possible to create a system for gathering and processing non-uniform data that can be applied in renewable energy monitoring, and they allow precise goals and objectives to be set for later research.
To be continued.