Architecture for Light Scanners

Requirements

Functional

  • The Light Scanners should generate the following data for the Assessment Model:

    • Inventories

      • Usings, with their files

      • Types found in fields, with the files in which they are defined

    • Project

    • ProjectFile

    • InternalProjectReference

    • ExternalLibraryReference

    • ExternalLibrary

  • The Light Scanners should be able to handle generic:

    • Regular expressions

    • XML extraction

    • JSON extraction

Non-Functional

  • The architecture should build on the existing infrastructure that supports multiple code processors

  • It must be very robust: a failure in one part should not make the whole scan fail

  • As light as possible

  • Extensible

  • Multiple phases could be implemented, to keep the work as granular as possible

Architecture

How to tackle each requirement?

  • How to implement phases? / How to treat failures?

    • Each phase has multiple tasks that can be performed in parallel.

    • The results of all the tasks from a phase are merged into the result of the phase.

    • The result of a phase is passed on to the next phase, in order to be able to support.

    • Maybe each phase should have a "default" result that can be combined with the results from each task. If a task fails, its result is not included in the computation. If all tasks fail, the default is returned. Tasks should be granular enough that, if one part fails, the rest of the task can continue.

    • More details can be found in the class diagram.
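The failure handling described above can be sketched as follows. This is a minimal illustration in Python, not the actual implementation: `run_phase`, `merge`, `find_usings`, `broken_task`, and the dictionary-based result type are all hypothetical names chosen for the example. It assumes that a task failure surfaces as an exception and that the phase starts from the default result.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical phase result: inventory name -> items found.
PhaseResult = Dict[str, List[str]]
Task = Callable[[str], PhaseResult]


def merge(left: PhaseResult, right: PhaseResult) -> PhaseResult:
    """Combine two results without mutating either of them."""
    merged = {key: list(values) for key, values in left.items()}
    for key, values in right.items():
        merged.setdefault(key, []).extend(values)
    return merged


def run_phase(tasks: List[Task], phase_input: str, default: PhaseResult) -> PhaseResult:
    """Run all tasks in parallel, starting from the default result."""
    result = merge(default, {})  # copy, so the default itself is never modified
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(task, phase_input) for task in tasks]
        for future in futures:  # submission order keeps the merge deterministic
            try:
                result = merge(result, future.result())
            except Exception:
                # A failed task contributes nothing; the other tasks still count.
                continue
    return result


def find_usings(source: str) -> PhaseResult:
    lines = [line for line in source.splitlines() if line.startswith("using ")]
    return {"usings": [line.split()[1].rstrip(";") for line in lines]}


def broken_task(source: str) -> PhaseResult:
    raise RuntimeError("simulated task failure")


source = "using System;\nusing System.IO;\nclass A {}"
default = {"usings": [], "types": []}
partial = run_phase([find_usings, broken_task], source, default)
all_failed = run_phase([broken_task], source, default)
```

With one failing task, `partial` still holds the usings found by the other task. When every task fails, `all_failed` is simply the default result.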

  • How to make sure that the scanners are small?

    • The idea is to focus mostly on configuration files that contain the patterns we are searching for. With this, there is no need for a lot of code for each specific case (as happens with parsers, symbol tables, etc.).

    • At this point in the research we can't be sure that the scanners will be small, but the expectation is that they will not need much code.
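To illustrate the configuration-driven idea, here is a small sketch in Python. Everything in it is an assumption made for the example: the JSON layout, the two regular expressions, and the function names `load_patterns` and `scan`. The field pattern only looks at `private` fields, which is a deliberate simplification.

```python
import json
import re
from typing import Dict, List

# Hypothetical configuration: inventory name -> regex with one capture group.
CONFIG_TEXT = r"""
{
  "usings": "^\\s*using\\s+([A-Za-z_][\\w.]*)\\s*;",
  "field_types": "^\\s*private\\s+([A-Za-z_][\\w.<>]*)\\s+\\w+\\s*;"
}
"""


def load_patterns(config_text: str) -> Dict[str, re.Pattern]:
    """Compile every pattern in the configuration; no per-case code is needed."""
    raw = json.loads(config_text)
    return {name: re.compile(pattern, re.MULTILINE) for name, pattern in raw.items()}


def scan(source: str, patterns: Dict[str, re.Pattern]) -> Dict[str, List[str]]:
    """Apply each configured pattern and collect the captured groups."""
    return {name: pattern.findall(source) for name, pattern in patterns.items()}


SOURCE = """using System;
using System.Collections.Generic;

class Invoice
{
    private List<string> lines;
    private Customer customer;
}
"""

inventory = scan(SOURCE, load_patterns(CONFIG_TEXT))
```

Supporting a new kind of inventory then means adding an entry to the configuration rather than writing new scanning code, which is the property the bullets above are after.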

  • How to use the Generic Infrastructure?

    • An object containing the Code Model writers and readers, loggers, telemetry object, etc. should be passed to the scanner.

    • This object could be the TConfiguration type that is passed to the BaseLightScanner (you can check this in the class diagram).

  • How will this tool be extensible?

    • Add new light scanners

    • MEF plugins

      • Should they be looked up in an APPDATA folder?

      • Use ImportMany with interface ILightScanner

    • Configuration files which can be modified

      • Can change the behavior of a specific scanner

      • Should they be looked up in an APPDATA folder?

      • When the configuration file is updated, the update should not overwrite the user configuration

    • Can these configuration files be manually modified?

    • Should these configuration files have an auto-update mechanism?

    • The extensibility related to configuration files seems to be more important to the stakeholders in the short term
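One possible way to keep updates from overwriting user changes is to store the shipped defaults and the user overrides in separate files and combine them at load time. The sketch below, in Python, only illustrates that idea. The file names, the flat JSON layout, and the functions `load_effective_config` and `install_update` are hypothetical, and the questions about the APPDATA location and auto-update remain open.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def load_effective_config(shipped: Path, user: Path) -> Dict[str, str]:
    """Overlay the user's entries on top of the shipped defaults."""
    config = json.loads(shipped.read_text(encoding="utf-8"))
    if user.exists():
        config.update(json.loads(user.read_text(encoding="utf-8")))
    return config


def install_update(shipped: Path, new_defaults: Dict[str, str]) -> None:
    """An update rewrites only the shipped file; the user file is never touched."""
    shipped.write_text(json.dumps(new_defaults), encoding="utf-8")


with tempfile.TemporaryDirectory() as folder:
    shipped = Path(folder) / "scanner.default.json"  # hypothetical file names
    user = Path(folder) / "scanner.user.json"

    install_update(shipped, {"usings": "v1-pattern", "types": "v1-pattern"})
    user.write_text(json.dumps({"usings": "my-pattern"}), encoding="utf-8")

    # A later update replaces the defaults but leaves the user override alone.
    install_update(shipped, {"usings": "v2-pattern", "types": "v2-pattern"})
    effective = load_effective_config(shipped, user)
```

After the update, `effective` keeps the user's `usings` entry and picks up the new default for `types`.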

Diagrams

Classes

  • The ILightScanner interface extends ICodeProcessorWorker:

    • It is recommended to have this interface even if it does not declare any methods of its own, because in the future we might want to add behavior that is specific to the light scanners, or use this interface in a MEF contract.

  • The BaseLightScanner<TConfiguration> is an abstract class with an abstract method that must be overridden (LoadConfiguration). When the scanner is executed, its onlyPhase is executed. This phase should be assigned when the light scanner is created, and multiple phases can be merged into a single phase by using the Concatenate method provided by ILightScannerPhase.

  • The ILightScannerPhase<TInput, TConfiguration, TResult> has two methods: Concatenate and Execute.

    • Concatenate allows two phases to be merged into a single phase (which, in turn, allows an arbitrary number of phases to be merged into a single phase that can be used to create the concrete BaseLightScanner).

    • Execute will execute the phase with an input (of type TInput) and a configuration (of type TConfiguration), producing a result (of type TResult).

  • The BaseLightScannerPhase<TInput, TConfiguration, TResult> is an implementation of ILightScannerPhase<TInput, TConfiguration, TResult> that has multiple tasks and executes them asynchronously before merging the results from all tasks into a result of type TResult (this happens inside the Execute method).

  • An instance of CompoundLightScannerPhase<TInput, TConfiguration, TResult> is the result of concatenating two BaseLightScannerPhases. When it is executed, it executes both phases, one after the other. It inherits from BaseLightScannerPhase so that it can itself be concatenated again (which allows for the composition of an arbitrary number of phases).
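The Concatenate mechanism can be sketched as below. This is an untyped Python illustration, not the class design itself: `Phase` and `CompoundPhase` are hypothetical stand-ins for the phase types above, and the sketch assumes that the result of the first phase becomes the input of the second, which the class descriptions do not spell out.

```python
from typing import Any, Callable


class Phase:
    """Hypothetical stand-in for a phase: Execute plus Concatenate."""

    def __init__(self, work: Callable[[Any, dict], Any]) -> None:
        self._work = work

    def execute(self, phase_input: Any, configuration: dict) -> Any:
        return self._work(phase_input, configuration)

    def concatenate(self, following: "Phase") -> "Phase":
        return CompoundPhase(self, following)


class CompoundPhase(Phase):
    """Runs two phases in order. Being a Phase itself, it can be concatenated again."""

    def __init__(self, first: Phase, second: Phase) -> None:
        self._first = first
        self._second = second

    def execute(self, phase_input: Any, configuration: dict) -> Any:
        # Assumption: the first phase's result is the second phase's input.
        intermediate = self._first.execute(phase_input, configuration)
        return self._second.execute(intermediate, configuration)


split_lines = Phase(lambda text, cfg: text.splitlines())
keep_matching = Phase(
    lambda lines, cfg: [line for line in lines if line.startswith(cfg["keyword"])]
)
count = Phase(lambda lines, cfg: len(lines))

# Three phases collapse into the single phase a scanner would hold.
only_phase = split_lines.concatenate(keep_matching).concatenate(count)
total = only_phase.execute("using A;\nusing B;\nclass C {}", {"keyword": "using "})
```

Because the compound object is itself a phase, chaining `concatenate` calls works for any number of phases, and the scanner only ever needs to know about one.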

  • The ILightScannerTask<TInput, TConfiguration, TTaskResult, TPhaseResult> is the interface that every task must implement.

    • The Execute method will perform the work required by this task and produce a result of type TTaskResult. It receives an input (of type TInput) and a configuration (of type TConfiguration).

    • The TTaskResult generic type must implement the interface IPartialPhaseResult<TPhaseResult> so that the result of the Execute method can be combined with the result of other tasks with the same TPhaseResult generic type.

  • The IPartialPhaseResult<TResult> is an interface that represents a result that can be "appended" to a result of type TResult. The idea is that multiple tasks of the same phase return different types of results, all of which implement the same IPartialPhaseResult<TResult> (with the same TResult). All of these results can be merged into the result of the phase, which will have type TResult.

  • The main point that requires more work is:

    • How to pass the SubProgressDescriptors to each Phase and Task?
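The IPartialPhaseResult idea described above (different task result types that can all be appended to the same phase result) can be sketched like this in Python. The names `Inventory`, `PartialPhaseResult`, `UsingsFound`, `FieldTypesFound`, `append_to`, and `merge_partials` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class Inventory:
    """Hypothetical phase result: file name -> items found in that file."""
    usings: Dict[str, List[str]] = field(default_factory=dict)
    field_types: Dict[str, List[str]] = field(default_factory=dict)


class PartialPhaseResult(Protocol):
    """Anything that knows how to append itself to an Inventory."""

    def append_to(self, result: Inventory) -> Inventory: ...


@dataclass
class UsingsFound:
    file: str
    namespaces: List[str]

    def append_to(self, result: Inventory) -> Inventory:
        result.usings.setdefault(self.file, []).extend(self.namespaces)
        return result


@dataclass
class FieldTypesFound:
    file: str
    type_names: List[str]

    def append_to(self, result: Inventory) -> Inventory:
        result.field_types.setdefault(self.file, []).extend(self.type_names)
        return result


def merge_partials(partials: List[PartialPhaseResult], start: Inventory) -> Inventory:
    """Fold every partial result into the phase result, in order."""
    result = start
    for partial in partials:
        result = partial.append_to(result)
    return result


inventory = merge_partials(
    [
        UsingsFound("A.cs", ["System"]),
        FieldTypesFound("A.cs", ["Customer"]),
        UsingsFound("A.cs", ["System.IO"]),
    ],
    Inventory(),
)
```

The phase does not need to know which concrete result type a task returned. It only relies on each result being able to append itself to the shared phase result.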

Phases and Tasks
