Architecture for Generic Dataset/Report Generation
Diagrams
Component Diagram
Sequence Diagram
IDatasetBuilder, IDataset and DatasetExtensions
Inside the Mobilize.ReportGenerator project, there are multiple classes related to the generation of custom datasets.
To generate a custom dataset:
1. Initialize a new SimpleDatasetBuilder<D> with a data source (an IEnumerable<D>).
   The data source should contain all the information that is going to be used. If data from multiple "tables" or lists is required, you will need to perform joins (System.Linq's Enumerable.Join) or create custom data structures that hold all the necessary information.
2. Add parameters as necessary, using the IDatasetBuilder<D>.Parameterize<P> method. You can add as many parameters as you want.
3. Build the dataset with the structure you want, using the different "Build" methods available in the IDatasetBuilder<D> interface.
4. Apply additional transformations to the dataset, if necessary. You can:
   - use the extension methods from the DatasetExtensions class;
   - use the TransformData method in the IDataset<V> interface;
   - add new extension methods to the DatasetExtensions class. Inside them, you will likely use the TransformData method in the IDataset<V> interface.
5. Convert the dataset to JSON and return it to the sender of the request, using the ToJson method in the IDataset<V> interface.
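The steps above can be sketched in Python as a language-neutral stand-in for the C# API. All of the following are hypothetical analogues chosen for illustration, not the actual Mobilize.ReportGenerator signatures:

- the names SimpleDatasetBuilder, parameterize, build_labelled_values, transform_data and to_json;
- the FileRecord type;
- the "|"-joined keys for parameter combinations.

```python
import json
from dataclasses import dataclass
from itertools import product
from typing import Any, Callable, Dict, Generic, Iterable, List, Tuple, TypeVar

D = TypeVar("D")  # type of one record of the data source


@dataclass
class FileRecord:
    # Hypothetical record of an already-joined data source
    technology: str
    lines: int


class Dataset:
    """Hypothetical stand-in for IDataset<V>: parameter combination -> structured value."""

    def __init__(self, data: Dict[str, Any]):
        self.data = data

    def transform_data(self, fn: Callable[[Any], Any]) -> "Dataset":
        # Step 4: apply fn to the value stored for every parameter combination
        return Dataset({key: fn(value) for key, value in self.data.items()})

    def to_json(self) -> str:
        # Step 5: serialize the dataset
        return json.dumps({"data": self.data}, sort_keys=True)


class SimpleDatasetBuilder(Generic[D]):
    """Hypothetical Python analogue of SimpleDatasetBuilder<D>."""

    def __init__(self, source: Iterable[D]):
        # Step 1: the data source holds everything the dataset needs
        self.source = list(source)
        self.parameters: List[Tuple[str, List[Any]]] = []

    def parameterize(self, name: str, values: Iterable[Any]) -> "SimpleDatasetBuilder[D]":
        # Step 2: every parameter multiplies the number of parameter combinations
        self.parameters.append((name, list(values)))
        return self

    def build_labelled_values(self, label: Callable[[D], str],
                              value: Callable[[List[D], Dict[str, Any]], Any]) -> Dataset:
        # Step 3: one {label: value} mapping per combination of parameter values
        groups: Dict[str, List[D]] = {}
        for record in self.source:
            groups.setdefault(label(record), []).append(record)
        names = [name for name, _ in self.parameters]
        data = {}
        for combo in product(*(values for _, values in self.parameters)):
            params = dict(zip(names, combo))
            data["|".join(map(str, combo))] = {
                lbl: value(records, params) for lbl, records in groups.items()
            }
        return Dataset(data)


source = [FileRecord("C#", 120), FileRecord("C#", 30), FileRecord("SQL", 50)]
measures = {"Files": len, "Lines": lambda records: sum(r.lines for r in records)}
json_text = (
    SimpleDatasetBuilder(source)
    .parameterize("UnitOfMeasure", ["Files", "Lines"])
    .build_labelled_values(label=lambda r: r.technology,
                           value=lambda records, p: measures[p["UnitOfMeasure"]](records))
    .transform_data(lambda values: dict(sorted(values.items())))
    .to_json()
)
print(json_text)
```

In this sketch a parameter does not filter the records. Its selected value is handed to the value function, which is how a "Unit of Measure" parameter can switch between counting files and summing lines.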
IDatasetBuilder
This interface allows for the construction of IDatasets with an arbitrary number of parameters and a custom structure (there are three predefined structures: labelled values, labelled sequences and labelled dictionaries).
There are two default implementations of this interface:
SimpleDatasetBuilder
ParametricDatasetBuilder
When designing a dataset, instantiate a new SimpleDatasetBuilder and use the methods it provides, instead of instantiating a ParametricDatasetBuilder.
Some of the methods used with the builder are actually extension methods, which can be found in the static class DatasetExtensions.
IDataset and ITransformableDataset<V>
The ITransformableDataset interface is implemented by the objects created by the IDatasetBuilder, and it extends the IDataset interface.
The user should use the following methods provided by ITransformableDataset:
- TransformData, to change the way in which the data inside the dataset is structured. Examples of transformations:
   - truncating the enumerable to only show the top 5 labels;
   - adding a label to the enumerable that holds totalized data;
   - any other kind of transformation.
- AddMetadataBasedOnDataAndSelectedParameters, to add metadata that is relevant to a specific combination of parameters (rather than to the whole dataset).
- AddGlobalMetadata, to add metadata that is relevant to the whole dataset.
- ToJson, to convert the dataset to JSON, the format used for communication with other components.
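The difference between the two metadata methods can be illustrated with a small Python sketch:

- metadata added per parameter combination is computed only from that combination's data;
- global metadata applies to the whole dataset.

The class and method names are hypothetical analogues of the C# interface. The "title" entry is made up for illustration. The data/metadata/globalMetadata property names follow the NullDataset description in this document. Whether the real methods mutate the dataset or return a new one is not stated here; this sketch mutates and returns self for chaining.

```python
import json
from typing import Any, Callable, Dict


class TransformableDataset:
    """Hypothetical analogue of ITransformableDataset<V>.

    `data` maps each parameter combination (a "|"-joined key here) to its value.
    """

    def __init__(self, data: Dict[str, Any]):
        self.data = data
        self.metadata: Dict[str, Dict[str, Any]] = {}
        self.global_metadata: Dict[str, Any] = {}

    def transform_data(self, fn: Callable[[Any], Any]) -> "TransformableDataset":
        # Restructure the value of every parameter combination
        self.data = {key: fn(value) for key, value in self.data.items()}
        return self

    def add_metadata_based_on_data_and_selected_parameters(
            self, fn: Callable[[Any, str], Dict[str, Any]]) -> "TransformableDataset":
        # Metadata for one combination is computed from that combination's data only
        for key, value in self.data.items():
            self.metadata.setdefault(key, {}).update(fn(value, key))
        return self

    def add_global_metadata(self, **entries: Any) -> "TransformableDataset":
        # Metadata that applies to the whole dataset
        self.global_metadata.update(entries)
        return self

    def to_json(self) -> str:
        return json.dumps({"data": self.data, "metadata": self.metadata,
                           "globalMetadata": self.global_metadata}, sort_keys=True)


dataset = TransformableDataset({"Files": {"C#": 2, "SQL": 1},
                                "Lines": {"C#": 150, "SQL": 50}})
payload = json.loads(
    dataset
    .add_metadata_based_on_data_and_selected_parameters(
        lambda values, key: {"total": sum(values.values())})
    .add_global_metadata(title="Size by technology")
    .to_json()
)
print(payload["metadata"])  # {'Files': {'total': 3}, 'Lines': {'total': 200}}
```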
DatasetExtensions
This class includes extension methods for the IDataset interface. Its main purpose is to allow this interface to be extended, mainly through the use of the TransformData method.
Adding extension methods to this class avoids duplicating code that is generic enough to be used with any IDataset.
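The two transformations mentioned earlier (top 5 labels, a totalized label) are good candidates for such reusable helpers. The Python sketch below writes them once on top of a transform_data method. In C# they would naturally be static extension methods on IDataset. The helper names top_n and with_total, and the minimal Dataset class, are hypothetical.

```python
from typing import Callable, Dict

Values = Dict[str, float]  # labelled values for one parameter combination


class Dataset:
    """Minimal hypothetical stand-in for IDataset: parameter combination -> labelled values."""

    def __init__(self, data: Dict[str, Values]):
        self.data = data

    def transform_data(self, fn: Callable[[Values], Values]) -> "Dataset":
        return Dataset({key: fn(values) for key, values in self.data.items()})


# Hypothetical reusable helpers playing the role of DatasetExtensions methods:
# each one is written once and works with any dataset of labelled values.
def top_n(dataset: Dataset, n: int) -> Dataset:
    """Keep only the n labels with the largest values in every parameter combination."""
    def keep_top(values: Values) -> Values:
        ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
        return dict(ranked[:n])
    return dataset.transform_data(keep_top)


def with_total(dataset: Dataset, label: str = "Total") -> Dataset:
    """Add a label that holds the sum of the other labels' values."""
    return dataset.transform_data(lambda values: {**values, label: sum(values.values())})


sizes = Dataset({"Lines": {".cs": 900, ".sql": 400, ".csproj": 80, ".config": 20}})
result = with_total(top_n(sizes, 2)).data
print(result)  # {'Lines': {'.cs': 900, '.sql': 400, 'Total': 1300}}
```

The order of composition matters. Applying with_total after top_n totals only the labels that remain visible. Applying it first would include the truncated labels in the total.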
Dataset Catalogs and Dataset Generator
IDatasetCatalog
The IDatasetCatalog interface must be implemented by any new catalog. A name and version must be specified, and the GenerateDatasets method is the core of the catalog. Inside this method:
1. Multiple IDatasets must be created (using an IDatasetBuilder and the methods from IDataset).
2. Each of these datasets should be converted to JSON using the ToJson method.
3. Each of these datasets in JSON format must be added to a JArray, which is then returned by the catalog.
An important consideration for error handling: if any of the datasets can't be generated, a NullDataset must be generated in its place. A NullDataset is a dataset that has no data and no metadata; it only has a globalMetadata property, which contains a successfulGeneration property set to false.
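A minimal Python sketch of a catalog that follows these rules is shown below. A Python list plays the role of the JArray. The following are hypothetical:

- the decomposition into one factory per dataset (dataset_factories);
- the class names DatasetCatalog and ExampleCatalog.

Only the NullDataset shape comes from the description above.

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List

DatasetFactory = Callable[[Dict[str, Any]], str]  # assessment model -> dataset JSON text


def null_dataset() -> Dict[str, Any]:
    # No data and no metadata: only globalMetadata with successfulGeneration = false
    return {"globalMetadata": {"successfulGeneration": False}}


class DatasetCatalog(ABC):
    """Hypothetical analogue of IDatasetCatalog."""

    name: str
    version: int

    @abstractmethod
    def dataset_factories(self, configuration: Dict[str, Any]) -> List[DatasetFactory]:
        """Return one factory per dataset that the configuration asks for."""

    def generate_datasets(self, model: Dict[str, Any],
                          configuration: Dict[str, Any]) -> List[Dict[str, Any]]:
        datasets: List[Dict[str, Any]] = []  # plays the role of the returned JArray
        for factory in self.dataset_factories(configuration):
            try:
                datasets.append(json.loads(factory(model)))
            except Exception:
                # A failing dataset is replaced, so the other datasets are still returned
                datasets.append(null_dataset())
        return datasets


class ExampleCatalog(DatasetCatalog):
    # Hypothetical catalog with one working and one failing dataset
    name = "Example"
    version = 1

    def dataset_factories(self, configuration):
        def file_count(model):
            return json.dumps({"data": {"Files": len(model["files"])},
                               "metadata": {}, "globalMetadata": {}})

        def broken(model):
            return json.dumps({"data": model["missing"]})  # raises KeyError

        return [file_count, broken]


datasets = ExampleCatalog().generate_datasets({"files": ["a.cs", "b.sql"]}, configuration={})
print(datasets[1])  # {'globalMetadata': {'successfulGeneration': False}}
```

Only errors raised while building an individual dataset are turned into NullDatasets here. An error outside that loop (for example, in dataset_factories itself) still propagates as an unhandled error.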
Dataset Generator
The dataset generator is the component in charge of looking up a catalog by name and version, and asking that catalog to generate the datasets.
There is a method inside the dataset generator where new dataset catalogs can be registered. For example, that method currently includes only one catalog (Name = "RapidScan" and Version = 1).
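A lookup keyed by the (name, version) pair can be sketched in Python as follows. The following are hypothetical, and the registered RapidScan entry is a placeholder rather than the real catalog:

- the class and method names DatasetGenerator, _register_catalogs and generate;
- the CatalogNotFoundError type.

Keying on both values lets older versions of a catalog stay registered next to newer ones.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# A catalog is reduced to a function (model, configuration) -> list of datasets
Catalog = Callable[[Any, Dict[str, Any]], List[Any]]


class CatalogNotFoundError(LookupError):
    """Hypothetical error: no catalog is registered for the requested name and version."""


class DatasetGenerator:
    """Hypothetical sketch: find a catalog by (name, version) and delegate to it."""

    def __init__(self) -> None:
        self._catalogs: Dict[Tuple[str, int], Catalog] = {}
        self._register_catalogs()

    def _register_catalogs(self) -> None:
        # The single place where new catalogs, or new versions of a catalog, are added.
        # Placeholder body: a real entry would call the RapidScan catalog.
        self._catalogs[("RapidScan", 1)] = lambda model, config: ["RapidScan v1 datasets"]

    def generate(self, name: str, version: int, model: Any,
                 configuration: Dict[str, Any]) -> List[Any]:
        catalog = self._catalogs.get((name, version))
        if catalog is None:
            raise CatalogNotFoundError(f"No catalog named {name!r} with version {version}")
        return catalog(model, configuration)


generator = DatasetGenerator()
print(generator.generate("RapidScan", 1, model={}, configuration={}))  # ['RapidScan v1 datasets']

missing_error: Optional[CatalogNotFoundError] = None
try:
    generator.generate("RapidScan", 2, model={}, configuration={})
except CatalogNotFoundError as error:
    missing_error = error
print(missing_error)  # No catalog named 'RapidScan' with version 2
```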
Communication with Assessment Web API
Assessment Controller
The AssessmentController will expose a method called GenerateDatasets, which receives a DatasetGenerationRequestDto and returns a JArray with the information for each dataset.
Although caching or storing assessment models could improve performance, the Assessment API will not do so. It would require the Assessment API to tell whether two codebases are exactly the same, and to hold data that may never be used again. The difficulty of such an implementation can be assessed in the future; at the moment, however, it does not seem likely that this approach will be used.
There is also a privacy consideration: we should not store user data without the user's consent.
DatasetGenerationRequestDto
This is the object received by the Assessment API whenever a dataset generation request is performed.
The compressed output folder must contain the assessment model that will be used to generate the datasets.
The catalog name can be any name used to identify the catalog of datasets; for instance, RapidScan.
The catalog version is the version of the catalog. Older versions should not be removed, in order to preserve backward compatibility.
The configuration is used by the dataset catalog to decide whether a dataset must be included, and to perform any other custom logic that the specific catalog implements.
If the catalog name and version are not found in the list of catalogs supported by the Dataset Generator, the request is rejected with a 400 Bad Request, as described under Error Management.
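The request fields described above could be modelled as a small Python data class. The following are all assumptions made for illustration:

- the field names;
- the choice of a ZIP archive for the compressed output folder;
- the includeBinaryFiles configuration option.

```python
import io
import zipfile
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DatasetGenerationRequestDto:
    """Hypothetical shape of the request; the field names are assumptions."""

    compressed_output_folder: bytes  # assumed ZIP archive containing the assessment model
    catalog_name: str                # e.g. "RapidScan"
    catalog_version: int             # older versions remain available
    configuration: Dict[str, Any] = field(default_factory=dict)  # catalog-specific options

    def model_file_names(self) -> List[str]:
        # Inspect the archive without extracting it to disk
        with zipfile.ZipFile(io.BytesIO(self.compressed_output_folder)) as archive:
            return archive.namelist()


# Build a small in-memory archive to stand in for a real output folder
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("model/assessment.json", "{}")

request = DatasetGenerationRequestDto(
    compressed_output_folder=buffer.getvalue(),
    catalog_name="RapidScan",
    catalog_version=1,
    configuration={"includeBinaryFiles": False},  # hypothetical option
)
print(request.model_file_names())  # ['model/assessment.json']
```

Keeping the catalog name and version as separate fields lets the dataset generator look up the exact (name, version) pair.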
Structure of the generated datasets
General Structure of the Output (JArray of DatasetDto)
Examples of the JSON representation of Datasets
Example of DatasetDto
1 parameter:
- Unit of Measure: Files, Bytes, Lines
For each parameter value, we have labelled values, where each label corresponds to a technology.
Example of DatasetDto
2 parameters:
- Technology: C#, SQL
- Unit of measure: Kilobytes, Lines
For each pair of parameter values, we have labelled sequences:
- Each label corresponds to a file extension.
- Each sequence contains the top 5 values (in the selected unit of measure) among the files with that extension.
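The shape of this two-parameter example can be reproduced with a short Python sketch. The following are hypothetical, and 1 kilobyte is taken to be 1024 bytes:

- the input records, which are made-up sample files;
- the SourceFile type and the labelled_sequences function;
- the "|"-joined keys for each (Technology, Unit of measure) pair.

```python
from itertools import product
from typing import Callable, Dict, List, NamedTuple


class SourceFile(NamedTuple):
    # Hypothetical input record
    technology: str
    extension: str
    size_bytes: int
    lines: int


MEASURES: Dict[str, Callable[[SourceFile], float]] = {
    "Kilobytes": lambda f: f.size_bytes / 1024,
    "Lines": lambda f: f.lines,
}


def labelled_sequences(files: List[SourceFile], top: int = 5) -> Dict[str, Dict[str, List[float]]]:
    data: Dict[str, Dict[str, List[float]]] = {}
    for technology, unit in product(["C#", "SQL"], MEASURES):
        by_extension: Dict[str, List[float]] = {}
        for f in files:
            if f.technology == technology:
                by_extension.setdefault(f.extension, []).append(MEASURES[unit](f))
        # Label = extension; sequence = its `top` largest values, largest first
        data[f"{technology}|{unit}"] = {
            ext: sorted(values, reverse=True)[:top] for ext, values in by_extension.items()
        }
    return data


files = [
    SourceFile("C#", ".cs", 2048, 300),
    SourceFile("C#", ".cs", 1024, 120),
    SourceFile("C#", ".csproj", 512, 40),
    SourceFile("SQL", ".sql", 4096, 210),
]
data = labelled_sequences(files)
print(data["C#|Lines"])  # {'.cs': [300, 120], '.csproj': [40]}
```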
Example of DatasetDto
3 parameters:
- Technology: C#, SQL
- BinaryType: Binary, Non-Binary
- Extension: .sql, .cs, .csproj, .dll, .sqlbin (this extension does not exist, but we assume it is an SQL binary extension)
For each triplet of parameter values, we have labelled dictionaries with:
- Content Lines
- Comment Lines
- Control Flow Keywords
Each dictionary corresponds to a file and is labelled with the file's name.
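A Python sketch of this three-parameter example is shown below. The following are hypothetical:

- the FileStats record and its sample values;
- the labelled_dictionaries function;
- the "|"-joined keys.

Every triplet is present in the result, even when no file matches, so a consumer can select any combination without checking for missing keys.

```python
from itertools import product
from typing import Dict, List, NamedTuple


class FileStats(NamedTuple):
    # Hypothetical per-file statistics taken from the assessment model
    name: str
    technology: str
    is_binary: bool
    extension: str
    content_lines: int
    comment_lines: int
    control_flow_keywords: int


TECHNOLOGIES = ["C#", "SQL"]
BINARY_TYPES = ["Binary", "Non-Binary"]
EXTENSIONS = [".sql", ".cs", ".csproj", ".dll", ".sqlbin"]


def labelled_dictionaries(files: List[FileStats]) -> Dict[str, Dict[str, Dict[str, int]]]:
    data = {}
    for technology, binary_type, extension in product(TECHNOLOGIES, BINARY_TYPES, EXTENSIONS):
        matches = [f for f in files
                   if f.technology == technology
                   and f.extension == extension
                   and f.is_binary == (binary_type == "Binary")]
        # One dictionary per matching file, labelled with the file's name
        data[f"{technology}|{binary_type}|{extension}"] = {
            f.name: {"Content Lines": f.content_lines,
                     "Comment Lines": f.comment_lines,
                     "Control Flow Keywords": f.control_flow_keywords}
            for f in matches
        }
    return data


files = [
    FileStats("Program.cs", "C#", False, ".cs", 120, 15, 9),
    FileStats("Schema.sql", "SQL", False, ".sql", 80, 10, 2),
]
data = labelled_dictionaries(files)
print(data["C#|Non-Binary|.cs"])
# {'Program.cs': {'Content Lines': 120, 'Comment Lines': 15, 'Control Flow Keywords': 9}}
```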
Error Management
There are two main types of errors:
Partial Errors
A dataset was not generated because errors occurred during its generation, most likely due to corrupt data in the assessment model or faulty logic in the dataset catalog.
In this case, the data and metadata of the dataset can be ignored, and the "successfulGeneration" property of globalMetadata should be set to false.
A mechanism to generate "NullDatasets" must be provided so that
The server will return a 200 OK.
No such Catalog
The catalog that was requested does not exist: either the catalog name or the catalog version is wrong.
The server will return a 400 Bad Request.
Unhandled error
The assessment controller handles any unhandled error that occurs in the dataset generation process (including unhandled errors in the catalog).
The server will return a 500 Internal Server Error.
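The mapping from these error types to HTTP status codes can be sketched in Python as a plain function that returns a (status code, body) pair. The following are hypothetical stand-ins, not the actual AssessmentController code:

- the names generate_datasets_endpoint and CatalogNotFoundError;
- the error body format.

```python
from typing import Any, Callable, Dict, List, Tuple


class CatalogNotFoundError(LookupError):
    """Hypothetical: raised when the catalog name or version is wrong."""


def generate_datasets_endpoint(
        generate: Callable[[], List[Dict[str, Any]]]) -> Tuple[int, Any]:
    """Hypothetical sketch of GenerateDatasets' error handling: (status code, body)."""
    try:
        # Partial errors were already replaced by NullDatasets inside the catalog,
        # so a result that contains NullDatasets is still a 200 OK
        return 200, generate()
    except CatalogNotFoundError as error:
        return 400, {"error": str(error)}
    except Exception:
        # Any other unhandled error, including one raised inside the catalog
        return 500, {"error": "Internal Server Error"}


def partial() -> List[Dict[str, Any]]:
    return [{"globalMetadata": {"successfulGeneration": False}}]


def unknown_catalog() -> List[Dict[str, Any]]:
    raise CatalogNotFoundError("No catalog named 'RapidScan' with version 9")


def crash() -> List[Dict[str, Any]]:
    raise RuntimeError("unexpected failure")


statuses = [generate_datasets_endpoint(f)[0] for f in (partial, unknown_catalog, crash)]
print(statuses)  # [200, 400, 500]
```

The 500 body deliberately omits the exception message, so internal details of the catalog are not leaked to the caller. The 400 body names the missing catalog because that information came from the request itself.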
Generic UI Overview
Communication between the UI components
This communication works the same way as before:
- A ChartingService allows for communication between a UI component and a handler (DatasetsHandler).
- A DatasetsHandler allows for communication between a service (ChartingService) and the Controller.
- The Controller handles any request sent by the DatasetsHandler and returns the result of that request, which passes through the DatasetsHandler and the ChartingService before "arriving" at the UI component that called the ChartingService method.
- In this particular case, the Controller will talk to the Assessment API to generate the requested datasets with the given configuration, catalog name, catalog version and current assessment model.
- The UI component is then free to generate the chart using the best-suited library or framework.
Chart component
A chart component has been created to expose the Chart.js library.
An object of type ChartInfo can be passed as a prop to this chart component.
This implementation is tentative, since there are many charting libraries besides Chart.js.
The component may remain, but it should probably have a different name.
Pending Work
Complete the TODOs in the front-end code
Decide what to do with the Chart component in the UI.
Complete the TODOs in the back-end code
Handle errors of type Partial Errors and Unhandled Error
Add more methods to fill the metadata/globalMetadata of a dataset
Add more unit testing
Improve the internal documentation of the interfaces