This post continues the previous article. For context, we recommend reading the earlier blog posts first.
Heterogeneous data processing in the context of monitoring systems
Several research centers currently study the use and monitoring of renewable energy sources. The first is the National Renewable Energy Laboratory (NREL). Others include the research team behind the “Russia’s Renewable Energy Sources” GIS project; the Institute of Computational Technologies at the Novosibirsk Scientific Centre, Russia, founded and led ever since by academician (full member of the Russian Academy of Sciences) Yuri Shokin, which pays particular attention to the design and implementation of software systems and the application of remote sensing data; the international research group “Sustainable, Unconventional and Renewable Energy”, led by Professor M. Panfilov (Université de Lorraine, France), whose research focuses on energy storage and Smart Grids; and the Centre for Energy, Environment and Health (CEEH) in Denmark.
Heterogeneous data monitoring and processing
Large-scale monitoring systems are commonly built on remote observation of the Earth’s surface. Examples include environmental monitoring systems and, in particular, the solar and wind energy monitoring system of NREL (a more detailed overview is given below). This issue is also frequently addressed in Central Asia. Still, most of these systems treat different monitoring features and directions as separate systems that, in some cases, are unrelated to each other. Unifying heterogeneous information sources is currently considered one of the most relevant research problems and is often discussed in the scientific community.
The work “Questions to creation of geoportals for assessing the environmental safety of mining companies” considers an approach commonly applied when building a web-oriented system on a cloud platform that integrates heterogeneous spatial information for the environmental monitoring of coal-mining enterprises. The authors note that the results of environmental assessment cannot be considered satisfactory for several reasons: the large area of the regions being assessed, measurements at a limited number of observation sites, delays in event registration, and the absence of a unified representation of the environmental state. To solve this problem, they propose applying cloud technologies to distributed data processing. The system uses the following components: the Google App Engine cloud service, the Google Users API authentication service, the Google Maps API map service, and the PostgreSQL database management system. Data is collected by means of crowdsourcing. It is proposed to collect technological, spatial and environmental data and to verify it against information obtained by remote sensing (radar and hyperspectral data).
The main goal of hyperspectral imaging is to gather and process information from across the electromagnetic spectrum. It aims to obtain the spectrum for every pixel in the image of a scene in order to find objects, identify materials, or detect processes. Hyperspectral sensors and processing systems are built for applications in astronomy, agriculture, biomedical imaging, geosciences, physics, and surveillance. Hyperspectral sensors observe objects using a wide portion of the electromagnetic spectrum. Certain objects leave unique 'fingerprints' in the electromagnetic spectrum. These fingerprints are referred to as spectral signatures, and they generally make it possible to identify the materials that a scanned object is composed of. For instance, oil companies use the spectral signature of oil when looking for new oil fields.
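One common way to compare a pixel's spectrum with a known spectral signature is the spectral angle, that is, the angle between the two spectra treated as vectors. The sketch below is only an illustration of that idea, not the method of any system discussed here. The band values and material names are made up for the example.

```python
import math

def spectral_angle(pixel, reference):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm_p = math.sqrt(sum(p * p for p in pixel))
    norm_r = math.sqrt(sum(r * r for r in reference))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (norm_p * norm_r)))
    return math.acos(cos_angle)

def best_match(pixel, signatures):
    """Name of the signature with the smallest angle to the pixel."""
    return min(signatures, key=lambda name: spectral_angle(pixel, signatures[name]))

# Hypothetical reflectance values in four bands (illustrative only).
SIGNATURES = {
    "water": [0.10, 0.08, 0.05, 0.02],
    "vegetation": [0.05, 0.10, 0.06, 0.50],
    "bare_soil": [0.20, 0.25, 0.30, 0.35],
}

pixel = [0.06, 0.11, 0.07, 0.45]
matched = best_match(pixel, SIGNATURES)
```

Because the angle ignores the overall brightness of a spectrum, a pixel that is simply darker or brighter than the reference still matches it.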
Imaging radar is the use of radar to create two-dimensional images, typically of landscapes. The radar supplies its own illumination: it emits radio waves that illuminate an area on the surface and takes a picture at radio wavelengths. Images are recorded with an antenna and digital computer storage. A radar image does not show the object itself; it shows only the energy reflected from the object’s surface back towards the radar antenna. In most cases a radar image is used to display the position and motion of highly reflective objects. In short, a radio wave signal is sent out, and the direction and delay of the reflected signal are then measured to create the image.
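The link between the measured delay and the position of a reflector can be sketched in a few lines. The signal travels to the object and back, so the distance is half of the delay multiplied by the propagation speed. The function names and the azimuth convention (clockwise from north) are assumptions of this example.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0  # propagation speed of radio waves in vacuum

def range_from_delay(delay_s):
    """Distance to the reflector in metres; the signal covers the path twice."""
    return SPEED_OF_LIGHT_M_S * delay_s / 2.0

def target_position(delay_s, azimuth_deg):
    """Convert echo delay and antenna direction to x/y offsets from the radar."""
    distance = range_from_delay(delay_s)
    # Assumption of this sketch: azimuth is measured clockwise from north.
    azimuth = math.radians(azimuth_deg)
    return distance * math.sin(azimuth), distance * math.cos(azimuth)

# An echo that returns after 2 microseconds from due east of the antenna.
x_m, y_m = target_position(2e-6, 90.0)
```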
Analytical data processing may draw on information taken from different sources (in this case, meta descriptions and cloud system services are used). The calculation services module of the system is organized as a tree, and all calculations are performed while traversing that tree.
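One possible reading of a tree of calculation services is sketched below: each node combines the results of its children, so evaluating the root traverses the whole tree. The source does not describe the actual module, so the class, the node names and the readings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class ServiceNode:
    """A calculation service that combines the results of its child services."""
    name: str
    compute: Callable[[List[float]], float]
    children: List["ServiceNode"] = field(default_factory=list)

def evaluate(node):
    """Post-order traversal: evaluate the children first, then the node itself."""
    child_results = [evaluate(child) for child in node.children]
    return node.compute(child_results)

# Hypothetical leaves stand in for data sources and return fixed readings.
station_a = ServiceNode("station_a", lambda _: 4.0)
station_b = ServiceNode("station_b", lambda _: 6.0)
satellite = ServiceNode("satellite", lambda _: 7.0)

average = ServiceNode("average", lambda xs: sum(xs) / len(xs), [station_a, station_b])
root = ServiceNode("maximum", max, [average, satellite])

result = evaluate(root)
```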
The work “ZooSPACE platform as integration of technology solutions for implementation of access to heterogeneous information resources” describes a methodology for integrating distributed heterogeneous information sources on the basis of loosely coupled distributed subsystems. The platform, which its developers named ZooSPACE, aims to integrate heterogeneous databases virtually. The system infrastructure is based on an arbitrary number of loosely interconnected nodes. Nodes interact through network protocols built on TCP/IP, namely SRW/SRU and Z39.50, which have strong potential for integrating data from different DBMSs. System operation is supported by several LDAP server nodes that store all configuration data in a single hierarchical database. All the LDAP servers are replicated. Replication means sharing data in order to ensure consistency between redundant resources (software or hardware components). It is commonly applied to improve reliability, fault tolerance, or accessibility. Thus, some heavy computational tasks may be executed on separate devices.
ZooSPACE servers are based on those from the authors’ previous project. They act as data access interfaces that work over the SRW/SRU and Z39.50 protocols and interact with DBMS servers. Requests can also be forwarded to other servers, even ones outside the system network. Besides these servers, the system includes a monitoring subsystem, statistics gathering, and administrative and user interfaces. The system components are written in PHP and run on the Apache server. A pilot project was deployed on four nodes (the Institute of Computational Technologies of the Novosibirsk Scientific Centre, Russia; Novosibirsk State University; the State Public Scientific Technological Library of the Siberian Branch of the Russian Academy of Sciences; and the Tomsk Scientific Center of the Siberian Branch of the Russian Academy of Sciences), providing access to 60 data sources containing more than 40 million records.
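To give a feel for the SRU side of such an interface, the sketch below builds an SRU 1.2 searchRetrieve request, which is an ordinary HTTP GET URL carrying a CQL query. The endpoint address and the query are hypothetical and do not refer to a real ZooSPACE node.

```python
from urllib.parse import urlencode

def build_sru_url(base_url, cql_query, start_record=1, maximum_records=10):
    """Build an SRU 1.2 searchRetrieve request URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    return base_url + "?" + urlencode(params)

# Hypothetical endpoint and query, used only for illustration.
url = build_sru_url("http://sru.example.org/catalog", 'dc.title = "wind energy"')
```

The server answers such a request with an XML document containing the matching records, which the client still has to parse.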
The work “GIS modeling and monitoring of territorial processes: design, implementation, usage” discusses the development of a GIS for the space monitoring of agrotechnical activities and furrow utilization efficiency, the geoinformation modelling of urban air pollution, and the architecture of the system. The system is built on two different GIS: one performs monitoring, and the other performs engineering modelling.
A great many GIS now provide data on renewable energy at the local and international level. An overview of GIS commonly used in Russia and the USA, including that of the National Renewable Energy Laboratory, is given below. All data used in the overview comes from the project descriptions published on the web portals under discussion. Each system has been analyzed in terms of the data used on the portal, its visualization tools, and its maps (wind, solar, etc.).
Data is the most valuable asset of such systems. Where certain data is confidential, several open datasets still provide reliable data on renewable energy, in some cases with relatively high precision. The following table provides a brief overview of some datasets with information on solar energy.
The main issue in creating a geographical information system is finding a way to represent the features being visualized (geographic parameters, in this case).
There are several ways of describing geographical features. First, the data types must be identified. Two data types are normally defined: spatial data, which describes location, and attribute data, which specifies the characteristics of the spatial data (what, when, or how much).
Second, a way of representing the data in the GIS must be found. Digitally, the data can be represented by grouping it into layers and by selecting appropriate data properties.
Grouping data into layers relies on finding similarities or related features in the target data (for example by source type, such as hydrography, elevation, water lines, sewer lines, or grocery sales). One of the following data models can be used here: the vector data model (coverage in ARC/INFO, shapefile in ArcView) or the raster data model (GRID or Image in ARC/INFO and ArcView).
Data properties should be selected for each layer separately. They are chosen with respect to projection, scale, accuracy, and resolution.
Finally, a means of incorporating the data into a computer application system must be found. This is commonly done by means of a relational database management system (DBMS).
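The split between spatial data, attribute data and layers maps naturally onto relational tables. The following is a minimal sketch that uses Python's built-in sqlite3 module and an in-memory database. The table names, column names and values are assumptions made for the example and are not taken from any of the systems described.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Spatial data: where each feature is, and which layer it belongs to.
    CREATE TABLE spatial (
        feature_id INTEGER PRIMARY KEY,
        layer TEXT NOT NULL,
        lon REAL NOT NULL,
        lat REAL NOT NULL
    );
    -- Attribute data: what, when or how much, linked by feature_id.
    CREATE TABLE attribute (
        feature_id INTEGER REFERENCES spatial(feature_id),
        name TEXT NOT NULL,
        value REAL
    );
""")

# Illustrative values only.
conn.executemany(
    "INSERT INTO spatial VALUES (?, ?, ?, ?)",
    [(1, "wind_stations", 82.9, 55.0), (2, "wind_stations", 84.9, 56.5)],
)
conn.executemany(
    "INSERT INTO attribute VALUES (?, ?, ?)",
    [(1, "mean_wind_speed_m_s", 4.2), (2, "mean_wind_speed_m_s", 3.7)],
)

# Join location and attribute for one layer.
rows = conn.execute(
    """
    SELECT s.feature_id, s.lon, s.lat, a.value
    FROM spatial AS s
    JOIN attribute AS a ON a.feature_id = s.feature_id
    WHERE s.layer = ? AND a.name = ?
    ORDER BY s.feature_id
    """,
    ("wind_stations", "mean_wind_speed_m_s"),
).fetchall()
```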
Each data type deserves separate discussion. Spatial data types are represented as continuous data, areas, networks, and points. Examples of continuous data are elevation, rainfall, and ocean salinity.
Areas are generally defined as:
unbounded: landuse, market areas, soils, rock type
bounded: city/county/state boundaries, ownership parcels, zoning
moving: air masses, animal herds, schools of fish
Networks may be classified as:
Points are generally described as:
fixed: wells, street lamps, addresses
moving: cars, fish, deer
Attribute data often comes as special data tables that contain locational information in the form of addresses, longitude/latitude (or x/y) coordinate pairs, and so on. Systems such as ArcView call these tables event tables. The spatial data in a real system, however, is stored as a shapefile, so every event table has to be converted to that strict format. Geocoding can be used to convert the data to the shapefile format, after which the data can be displayed as a map.
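The conversion of an event table into true spatial features can be sketched as follows. To stay within the standard library, the example writes GeoJSON-style point features rather than a shapefile, and it assumes that the table already holds longitude/latitude columns, so no address geocoding is involved. The column names and rows are hypothetical.

```python
import csv
import io

# A small event table with hypothetical stations (illustrative values).
EVENT_TABLE = """name,lon,lat
station_a,82.90,55.00
station_b,84.95,56.50
"""

def event_table_to_features(text):
    """Turn rows with lon/lat columns into a GeoJSON-style feature collection."""
    features = []
    for row in csv.DictReader(io.StringIO(text)):
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON order is longitude first, then latitude.
                "coordinates": [float(row["lon"]), float(row["lat"])],
            },
            "properties": {"name": row["name"]},
        })
    return {"type": "FeatureCollection", "features": features}

collection = event_table_to_features(EVENT_TABLE)
```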
Attribute data is commonly divided into categorical and numerical data. Categorical data represents values by name and is divided into:
nominal (no inherent ordering; examples are land use types and county names)
ordinal (a strict inherent order; examples are classes, such as road classes and stream classes)
Categorical data is frequently coded as numbers (for example, an SSN), but such numbers cannot be used in arithmetic operations (i.e. they cannot be treated as numeric data).
For numeric data, the difference between values is known and meaningful. Numeric data is classified into interval and ratio data.
interval (no natural zero; ratios of values are not meaningful; an example is temperature in Celsius or Fahrenheit)
ratio (a natural zero; ratios of values are meaningful, so the values may be multiplied and divided; examples are income, age, and rainfall)
Numeric attribute data is expressed either as whole numbers (integer) or as decimal fractions (floating point).
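The practical difference between these measurement levels shows up as soon as arithmetic is attempted. The short sketch below contrasts ordinal, interval and ratio values. The road class names and the numbers are hypothetical.

```python
# Ordinal: the order is meaningful, arithmetic on the codes is not.
ROAD_CLASSES = ["track", "local", "regional", "highway"]  # hypothetical classes

def road_rank(name):
    """Position of a road class in the ordering (usable only for comparison)."""
    return ROAD_CLASSES.index(name)

def celsius_to_kelvin(celsius):
    """Kelvin has a natural zero, Celsius does not."""
    return celsius + 273.15

# Interval: 20 degrees Celsius is not "twice as warm" as 10 degrees Celsius.
naive_ratio = 20.0 / 10.0
physical_ratio = celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0)

# Ratio: 40 mm of rainfall really is twice 20 mm.
rainfall_ratio = 40.0 / 20.0
```

Here `physical_ratio` comes out close to 1.035 rather than 2, which is why ratios should only be computed on ratio-level data.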
As a rule, the maps provided by renewable energy monitoring systems are classified by the energy source they represent. They are also classified by the data model used to create them (raster or vector).
Raster data is simpler and faster to implement, but in certain cases the map resolution may not be sufficient for proper analysis. The main distinguishing characteristics of the raster data model are:
a location is defined by a cell in a rectangular array or matrix (a grid cell)
attributes are represented only as a single value stored in the cell
this form can hold a large amount of data (remote sensing images such as those from LANDSAT and SPOT; scanned maps; elevation data from USGS)
continuous features that this form represents efficiently include elevation, land use, soil type, and temperature
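The raster characteristics listed above can be sketched as a small grid class: a location maps to a row and column of a matrix, and each cell stores exactly one value. The class name, the grid origin convention (row 0 is the top row) and the elevation values are assumptions of this example.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Raster:
    x_min: float               # x coordinate of the left edge
    y_max: float               # y coordinate of the top edge
    cell_size: float           # width and height of a square cell
    cells: List[List[float]]   # one value per cell; row 0 is the top row

    def cell_index(self, x, y):
        """Map a coordinate pair to the (row, column) of its grid cell."""
        col = int((x - self.x_min) // self.cell_size)
        row = int((self.y_max - y) // self.cell_size)
        if not (0 <= row < len(self.cells) and 0 <= col < len(self.cells[0])):
            raise ValueError("point lies outside the raster")
        return row, col

    def value_at(self, x, y):
        """Return the single attribute value stored in the cell."""
        row, col = self.cell_index(x, y)
        return self.cells[row][col]

# Hypothetical 3 x 3 elevation grid with 10-unit cells.
elevation = Raster(
    x_min=0.0,
    y_max=30.0,
    cell_size=10.0,
    cells=[
        [120.0, 125.0, 130.0],
        [110.0, 115.0, 118.0],
        [100.0, 104.0, 109.0],
    ],
)

height = elevation.value_at(25.0, 5.0)
```

Any two points that fall into the same cell get the same value, which is the resolution limit mentioned above.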
The vector data model requires more complex and sophisticated tools, but in return it represents weather data more accurately. It provides high resolution, which may be extremely important in certain cases.
The main characteristics of the vector data model are:
an object's location is defined by a set of longitude/latitude or x/y coordinates (which are then linked to form lines and polygons)
every feature is assigned a unique ID through which its attributes can be referenced
much of the data is provided in one of three forms: DIME and TIGER files from the US Census; DLG from USGS for streams, roads, etc.; census data (tabular)
features with discrete boundaries that vector data describes efficiently include property lines, political boundaries, and transportation
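The vector characteristics can be illustrated in the same spirit: coordinates are linked into lines and polygons, and a unique ID connects each geometry to its attributes. The IDs, coordinates and attribute values below are hypothetical, and the coordinates are treated as planar x/y values.

```python
import math

def line_length(vertices):
    """Length of a line formed by linking consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))

def polygon_area(vertices):
    """Area of a simple polygon by the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# Geometry and attributes are stored separately and linked by a unique ID.
GEOMETRIES = {
    101: [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)],  # a parcel polygon
    202: [(0.0, 0.0), (3.0, 4.0)],                           # a road segment
}
ATTRIBUTES = {
    101: {"kind": "parcel", "zoning": "residential"},
    202: {"kind": "road", "class": "local"},
}

parcel_area = polygon_area(GEOMETRIES[101])
road_length = line_length(GEOMETRIES[202])
parcel_zoning = ATTRIBUTES[101]["zoning"]
```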
To be continued.