Manage Dataset
    • 28 Feb 2025

    Manage Datasets

    From the Zetaly Data Platform (ZDP), you can access a page giving a complete view of collected data and associated strategies.
    For each dataset collected, life cycle information and aggregations are available.
    The information provided allows you to monitor the processing of your data in near real time.

    For each dataset, you can access details of the data collected, as well as statistics on the space used and information on the rules that make up the lifecycle defined by the strategy.

    From the dataset list you can assign the strategy to be applied per dataset, choose which fields to keep or exclude, define indexes, assign operations performed during the various aggregations, archive data and delete datasets no longer of interest to your business.

    It is also from this list that you can activate or deactivate the execution of a strategy associated with a dataset.

    Displaying Datasets List

    A dataset overview is available to facilitate data management and provide a status report.

    To access this view, select "Data Enrich" from the Zetaly main menu, then "Data Management".
    By default, the dataset table is sorted by dataset status (see rule status for more information).



    The table contains the following columns:

    Column Label        Description
    State               Status of the global dataset rules. If a strategy rule is in error, the error status is indicated.
    Label               Dataset label
    Related API         API for querying dataset data
    Strategy            Associated strategy label
    Rules               Number of associated rules (and child datasets)
    Author              User who associated the strategy (hidden by default)
    Association Date    Association date
    Editor              User who updated the association (hidden by default)
    Update Date         Update date
    Tags                User business information

    Note: To show or hide a table column, click on the icon at the top right of the table.

    All dataset management operations are performed from this global view.
    The following sections describe the various possible actions and their operating modes.


    Display Dataset Details

    The dataset sheet provides detailed information about the dataset.

    It describes the type of data, the collection method, the information collected, and the lifecycle policy and its status.
    The example below shows a dataset with an active policy and the status of its rules.

    To open this page, click on one of the rows in the dataset table described in the previous section.

    From the detail page, the following functions are available:

    • Activate the strategy (available only if a strategy is assigned)
    • Stop the strategy
    • Assign and configure a strategy
    • Add tags

    This detail view has three main sections, described below.

    Global Information

    General dataset information is provided in the ribbon. This information is available for all datasets.

    The ribbon contains the following cells:

    Cell Label          Description
    Description         Dataset description
    Collection Method   Collection method
    Category            Category of data collected
    Strategy            Strategy associated with the dataset ("Orphan" by default)
    State               State of the strategy
    API Name            API used to access the selected dataset
    Storage             Storage allocated to the selected dataset
    Association Date    Date on which the dataset and strategy were associated
    Associated By       User who associated the dataset and strategy


    Related Dataset

    Information on the datasets attached to the Global Dataset.


    A Global Dataset represents the logical view of all the datasets resulting from the collection and/or the various stages of the lifecycle.

    A Global Dataset is always attached to a "Collected" dataset.

    • "Collected": Contains the raw data collected. No aggregation operation has been performed on the data in this dataset.
    • "Child": Contains data aggregated according to the rule associated with this dataset. For each Global Dataset, there are as many Child datasets as there are rules defined in the associated strategy.
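    As an illustration only, the relationship described above can be sketched as a small data model. The names used here (GlobalDataset, RelatedDataset, build_global_dataset, and the generated labels) are hypothetical and are not part of any Zetaly API; the sketch only shows one "Collected" dataset plus one "Child" dataset per rule of the strategy.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RelatedDataset:
    label: str
    dataset_type: str      # "Collected" or "Child"
    granularity: str = ""  # empty for raw, non-aggregated data


@dataclass
class GlobalDataset:
    label: str
    collected: RelatedDataset
    children: List[RelatedDataset] = field(default_factory=list)


def build_global_dataset(label: str, rule_granularities: List[str]) -> GlobalDataset:
    """Create one Collected dataset and one Child dataset per strategy rule."""
    collected = RelatedDataset(label=f"{label}_collected", dataset_type="Collected")
    children = [
        RelatedDataset(label=f"{label}_rule{i}", dataset_type="Child", granularity=g)
        for i, g in enumerate(rule_granularities, start=1)
    ]
    return GlobalDataset(label=label, collected=collected, children=children)


# Hypothetical strategy with three rules, hence three Child datasets.
demo = build_global_dataset("cpu_usage", ["1 hour", "1 day", "1 month"])
print(len(demo.children))  # 3
```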


    The table contains the following columns:

    Column Label    Description
    Label           Dataset label (and API used to access this dataset)
    Granularity     Data aggregation granularity
    Type            Dataset type (Collected or Child)
    State           State of the rule associated with the dataset (see rule management for more information about rule status and color)

    Selecting a "Collected" or "Child" dataset updates the collected fields table and modifies the menu, giving access to statistics on the selected dataset or, for a Child dataset, to information on the associated rule.
    When the associated strategy is active, rules operate automatically.
    If one or more rules are in the "Error" state, it is recommended to analyse the context using the Rule Log and the Dataset Log.
    Once the issue has been analysed and fixed, you can restart the rule individually to resume operation.


    Restarting a rule:


    From the "Related Datasets" table, select the "Restart" menu option.

    While a rule is restarting, the state "Ready" should be displayed.
    After a moment, a new rule cycle starts and normal operation resumes.


    Dataset Fields

    Displays information on the fields in the dataset. This information is available at every stage of the strategy.
    When you click on one of the datasets (Collected or Child), this view is updated according to the definitions made during the dataset/strategy association.
    For each field present in the selected dataset, information on its type, operation and description is displayed.

    The table contains the following columns:

    Column Label    Description
    Label           Dataset field label
    Data Type       Data type
    Operations      SQL operation applied to the field
    Description     Optional. Information about the data itself.


    Assigning a strategy

    To implement dataset lifecycle management, you need to assign a strategy to the dataset.
    Before doing so, rules and strategies must have been defined.

    All collected or imported data is stored in a “Collected” dataset.
    The name of the Global Dataset ("Collected" and "Child" together) is fixed for "LEGACY" type data and comes from the parser label for "RAW" type data.
    Data collection is handled by the Zetaly Streaming Agent module (see ZSA module for more information).

    • RAW data is data collected without any knowledge of its structure. The data structure is declared by users with the Custom Parsers feature (see ZUP module for more information).
    • LEGACY data is collected data whose structure is known by default in the Zetaly solution.

    From the dataset list, click on the table menu on the right and choose "Assign a Strategy".


    A dialog box opens, allowing you to define the attributes necessary for associating a strategy and a dataset.


    File Identify Tab

    By default, the "File Identify" tab is selected. This tab provides information on the collected dataset.
    No action is possible on this tab; it is only an information sheet about the collected data.




    Indexes Tab

    This tab allows you to define which dataset fields are indexed. By default, all fields of type "Text" are indexed.
    You can modify the default selection and choose which fields are indexed and which are not.

    To index a field, check the box attached to the field. Unchecking the box removes indexing from the field.
    This operation is allowed only if the strategy is not active.
    To search for a specific field, use the search function at the top of the table.



    Strategy Tab

    This tab is used to assign a strategy to the dataset in order to manage its lifecycle.

    Field descriptions:

    1 "Older Than" field selector
    2 Strategy assignment
    3 Time zone offset applied for aggregation
    4 Lifecycle applied to the dataset (filled only when a strategy is selected)



    Before assigning a strategy, you must specify which dataset field is used to evaluate the "Older Than" condition of each rule in the strategy (see rules management for more information).

    The "Older Than" field must be of type "Date" or "Datetime".
    The selected field cannot be deleted when defining fields and aggregations. It is mandatory throughout the dataset lifecycle.
    The aggregation operation "GROUPBY" is automatically assigned to the field selected as "Older Than". It is not possible to remove this operation. However, you can add additional aggregation operations such as "MAX" or "MIN".
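    The constraints on the "Older Than" field can be summarised with a minimal sketch. The function names and the dictionary representation of fields are hypothetical and are used only for illustration; they do not describe how the product implements these checks.

```python
ALLOWED_OLDER_THAN_TYPES = {"DATE", "DATETIME"}


def select_older_than(field_types, field_name):
    """Validate the Older Than field and return its initial operations."""
    data_type = field_types[field_name].upper()
    if data_type not in ALLOWED_OLDER_THAN_TYPES:
        raise ValueError(f"{field_name!r} is {data_type}, expected Date or Datetime")
    # GROUPBY is assigned automatically to the Older Than field.
    return ["GROUPBY"]


def remove_operation(operations, operation, is_older_than):
    """Remove an operation, refusing to drop GROUPBY from the Older Than field."""
    if is_older_than and operation == "GROUPBY":
        raise ValueError("GROUPBY cannot be removed from the Older Than field")
    operations.remove(operation)


# Hypothetical field list: field name -> data type.
field_types = {"timestamp": "Datetime", "host": "Text", "cpu": "Numeric"}
ops = select_older_than(field_types, "timestamp")
ops.append("MAX")  # additional operations such as MAX or MIN may be added
print(ops)  # ['GROUPBY', 'MAX']
```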


    Once the "Older Than" field is set, the list of available strategies is enabled.
    Select the strategy to be applied to the dataset from the list. When a strategy is selected, the list of rules to be applied is displayed.
    It is not possible to modify strategy properties from this dialog. If no existing strategy meets your needs, you must create a new one (see strategies management for more information).


    All LEGACY data in the Zetaly solution is stored in UTC time.
    Depending on your needs, you may have to adjust aggregation to your own time zone.
    As the "Older Than" condition is evaluated on the stored data, for aggregations of less than a month it is necessary to take your time zone into account so that a day really represents the period from 00:00 to 23:59 expressed in local time.

    To specify the desired time zone, select it in selector 3. This parameter does not modify the dates; it only shifts the selection.
    By default, aggregations are processed using UTC time (offset 00).
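    To make the effect of the offset concrete, the sketch below computes which UTC range of stored timestamps corresponds to one local calendar day. It is an illustration under assumptions: the function name is hypothetical, the offset is treated as a fixed number of hours east of UTC, and daylight saving time is ignored. The stored dates themselves are not changed; only the selection window moves.

```python
from datetime import date, datetime, time, timedelta, timezone


def utc_window_for_local_day(local_day: date, offset_hours: int):
    """Return the [start, end) UTC range covering one local calendar day."""
    local_tz = timezone(timedelta(hours=offset_hours))
    local_start = datetime.combine(local_day, time.min, tzinfo=local_tz)
    start_utc = local_start.astimezone(timezone.utc)
    return start_utc, start_utc + timedelta(days=1)


# With an offset of +2 hours, the local day starts two hours earlier in UTC.
start, end = utc_window_for_local_day(date(2025, 2, 28), offset_hours=2)
print(start.isoformat())  # 2025-02-27T22:00:00+00:00
print(end.isoformat())    # 2025-02-28T22:00:00+00:00
```

    With the default offset of 0, the window is simply the UTC day itself.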

    To specify aggregation operations and fields, switch to the "Aggregations" tab. Otherwise, click "Apply" to save your configuration; the dataset/strategy association is then defined.

    Define aggregation operations

    The "Aggregations" tab allows you to define all aggregation operations performed on the selected fields.
    A strategy must first be selected for this dataset, and it must be inactive.

    At this stage, you can immediately drop fields that have no value for the company. This optimizes data and storage.
    To keep or drop a field, simply check or uncheck the box in the left-hand column of the table.

    When a field is retained (checked), it must be associated with an aggregation operation.
    By default, "Text" type fields inherit the "GROUPBY" operation and "Numeric" type fields the "AVG" (average) operation.
    As mentioned, the field selected for the "Older Than" condition cannot be dropped, and its aggregation operation is set to "GROUPBY".

    For each selected field, you can modify the default operation. The list of operations is available in the "Operations List" table.
    A field can have several operations, allowing you to enrich the data collected. For example, when aggregating a numeric field, you can store its average value (AVG), minimum value (MIN), maximum value (MAX), etc.
    Each additional aggregation operation creates a new field with the same name plus a suffix representing the operation (e.g. R4h_AVG, R4h_MIN, R4h_MAX).
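    The naming of aggregated fields can be illustrated with a short sketch. It assumes that every operation produces a field named with the original name followed by "_" and the operation; the function name and the sample values are hypothetical.

```python
from statistics import mean

# Python equivalents of a few aggregation operations, for illustration only.
OPERATIONS = {"AVG": mean, "MIN": min, "MAX": max, "SUM": sum, "COUNT": len}


def aggregate_field(name, values, operations):
    """Return one output field per operation, named <field>_<OPERATION>."""
    return {f"{name}_{op}": OPERATIONS[op](values) for op in operations}


# Three raw values of a numeric field aggregated into one record.
result = aggregate_field("R4h", [10, 20, 60], ["AVG", "MIN", "MAX"])
print(sorted(result))  # ['R4h_AVG', 'R4h_MAX', 'R4h_MIN']
```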

    To add a new operation, click on the button on the interface.
    A new field for selecting an aggregation operation is added. You can add as many operations as you like. However, each operation can be used only once for a given field.

    To quickly find a field, use the search function.


    Once all operations are defined, click "Apply" to save your configuration. The dataset/strategy association is now defined.


    Operations List

    Data Type    Available operations
    TEXT         GROUPBY (default), UNIQUE_OF (distinct), COUNT
    NUMERIC      GROUPBY, COUNT, MAX, MIN, AVG (default), SUM
    DATETIME     GROUPBY, MAX (default), MIN
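    The operations list above can be expressed as a lookup table, which makes the two constraints from this section easy to check: an operation must be available for the field's data type, and it can appear only once per field. This is a sketch; the function and variable names are hypothetical.

```python
AVAILABLE_OPERATIONS = {
    "TEXT": ["GROUPBY", "UNIQUE_OF", "COUNT"],
    "NUMERIC": ["GROUPBY", "COUNT", "MAX", "MIN", "AVG", "SUM"],
    "DATETIME": ["GROUPBY", "MAX", "MIN"],
}
DEFAULT_OPERATION = {"TEXT": "GROUPBY", "NUMERIC": "AVG", "DATETIME": "MAX"}


def validate_operations(data_type, operations):
    """Check the operations chosen for a field; fall back to the default."""
    allowed = AVAILABLE_OPERATIONS[data_type]
    if len(set(operations)) != len(operations):
        raise ValueError("each operation can be used only once per field")
    unknown = [op for op in operations if op not in allowed]
    if unknown:
        raise ValueError(f"{unknown} not available for {data_type} fields")
    # A retained field with no explicit choice gets the default operation.
    return list(operations) or [DEFAULT_OPERATION[data_type]]


print(validate_operations("NUMERIC", []))                     # ['AVG']
print(validate_operations("NUMERIC", ["AVG", "MIN", "MAX"]))  # ['AVG', 'MIN', 'MAX']
```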


    Adding Tags

    The "Tags" tab allows you to attach business information to your data. This step is optional.

    To add a tag, select one of the available tags in the dropdown list or create a new tag, then apply the changes.
    For more information, see Tags management.


    Define fields for the dataset lifecycle

    Throughout the lifecycle, fields that are no longer of interest can be dropped at any stage.
    This makes it possible but not mandatory to keep only those fields that are of value to the company over a long retention period, and to minimize storage space requirements.
    By default, all fields defined when the strategy is assigned are retained for the duration of the lifecycle.

    To configure fields management over time, select the “Set Rule Fields” menu item from the overview table.


    A dialog box opens with one tab per rule defined in the applied strategy.

    1 Rule information
    2 Available dataset fields

    Each tab is named after its rule, from "Rule 1" to "Rule 8".
    "Rule 1" is the first rule in the strategy and "Rule 8" the last one.

    On any tab, you can uncheck (discard) fields that are no longer required in the lifecycle. When a field is unchecked in a step, it is automatically removed from subsequent steps (rules), if any.

    An unchecked field can only be checked again if it is enabled in the immediately preceding rule.
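    The check and uncheck behaviour described above can be modelled with one set of kept fields per rule. This is only a sketch: the function names are hypothetical, and it assumes that a field can always be re-enabled in the first rule because the fields defined at association time are available there.

```python
def uncheck_field(rule_fields, rule_index, field_name):
    """Drop a field in one rule and in every subsequent rule."""
    for kept in rule_fields[rule_index:]:
        kept.discard(field_name)


def check_field(rule_fields, rule_index, field_name):
    """Re-enable a field, allowed only if the preceding rule still keeps it."""
    if rule_index > 0 and field_name not in rule_fields[rule_index - 1]:
        raise ValueError(f"{field_name!r} is not enabled in the preceding rule")
    rule_fields[rule_index].add(field_name)


# Index 0 stands for "Rule 1"; every rule starts with all fields kept.
all_fields = {"timestamp", "host", "cpu", "mem"}
rule_fields = [set(all_fields) for _ in range(3)]

uncheck_field(rule_fields, 1, "mem")   # dropped in Rule 2 and Rule 3
print(sorted(rule_fields[2]))          # ['cpu', 'host', 'timestamp']
check_field(rule_fields, 1, "mem")     # allowed: Rule 1 still keeps it
```

    After the last call, "mem" is kept in Rule 1 and Rule 2 but remains dropped in Rule 3 until it is checked there as well.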
    Fields can no longer be modified once the strategy has been activated.
    Once the definition is complete, click the "Apply" button to save the configuration.
    The "Reset" button restores the default selection (all fields).


    Activate lifecycle 

    At this stage, you must have associated a strategy and defined the complete lifecycle configuration expected for the dataset.
    Activating the strategy enables the lifecycle. Strategy rules automatically start their cycles and aggregate or delete data that matches the defined rule condition.
    No further action is required. Rules are scheduled automatically according to their own properties.

    To activate dataset management from the dataset overview, choose the "Activate" menu option.


    The activation process may take a few minutes, depending on the amount of data already stored in the "Collected" dataset.
    Before activating the strategy, the process defines the indexes based on the configuration you have declared.


    After the strategy has been activated, the State column changes to show the status of current operations.

    The state color depends on the state of each rule in the strategy applied to the dataset.
    In normal operation (Ready, Running), green is displayed.

    However, if one of the strategy rules is in a different state, the worst-case color is displayed.
    Use the detail view to obtain information on the rule(s) requiring attention.



    State Color    Rule State
    Green          Ready (waiting for the next cycle)
    Green          Running (performing the associated action)
    Yellow         Waiting (the rule has operations to perform but is waiting for an execution slot)
    Orange         Rollback (automatic recovery)
    Red            Error (the rule is in error and requires a manual restart)
    Grey           Inactive (strategy deactivated)

    See Rules management for more information.
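    The worst-case color rule can be sketched as follows. The mapping comes from the table above; the severity ordering (Grey, Green, Yellow, Orange, Red) and the function name are assumptions made for this illustration.

```python
STATE_COLOR = {
    "Ready": "Green",
    "Running": "Green",
    "Waiting": "Yellow",
    "Rollback": "Orange",
    "Error": "Red",
    "Inactive": "Grey",
}
# Assumed ordering, from least to most severe.
SEVERITY = ["Grey", "Green", "Yellow", "Orange", "Red"]


def dataset_color(rule_states):
    """Return the most severe color among the rules of a dataset."""
    colors = [STATE_COLOR[state] for state in rule_states]
    if not colors:
        return "Grey"
    return max(colors, key=SEVERITY.index)


print(dataset_color(["Ready", "Running"]))           # Green
print(dataset_color(["Ready", "Waiting", "Error"]))  # Red
```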

    Once the strategy is activated, you can obtain details on its management from the detail view.


    Deactivate lifecycle

    This operation is only available if the current strategy is active or in error.
    To stop the lifecycle applied to a dataset, you need to deactivate it.
    This operation is performed from the overview table: choose the "Deactivate" menu option.

    When deactivation is requested, rules finish their current cycle and stop. The lifecycle is disabled and the displayed state is "Inactive".

    This action does not disable data collection. Data continues to be collected and stored in the "Collected" dataset.
    If you do not want to continue data collection, you must also disable collection for this dataset (see Zetaly Streaming Agent for more information).
    You can reactivate the lifecycle if required.


    Unassign strategy

    This operation is only available if the current strategy is deactivated.
    Unassignment removes the association between the strategy and the dataset. When this operation is completed, the dataset reverts to "Orphan".
    If this action is performed while data is still being collected, the data is stored in the "Collected" dataset without any lifecycle management.
    The "Collected" dataset will continue to grow over time. You may disable data collection for this dataset if required (see Zetaly Streaming Agent for more information).
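    The availability conditions stated in this document (activation requires an assigned strategy, deactivation requires an active or errored strategy, unassignment and field changes require a deactivated strategy) can be summarised in a small sketch. The function name and the state labels are hypothetical, and the actual menu may expose other options.

```python
def available_actions(strategy_assigned, state):
    """Return the menu options allowed for a dataset in a given state.

    `state` is one of "Active", "Error" or "Inactive" (illustrative labels).
    """
    if not strategy_assigned:
        # An orphan dataset can only receive a strategy.
        return {"Assign a Strategy"}
    if state in ("Active", "Error"):
        return {"Deactivate"}
    # Strategy assigned but deactivated.
    return {"Activate", "Unassign Strategy", "Set Rule Fields"}


print(sorted(available_actions(True, "Inactive")))
# ['Activate', 'Set Rule Fields', 'Unassign Strategy']
```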

    This operation is performed from the overview table: choose the "Unassign Strategy" menu option.

    The unassignment dialog opens.

    1 Related dataset list
    2 Unassignment option
    3 File archive name





    When you unassign a strategy, you can choose either to delete all data from all linked datasets (Collected and Child), or to archive the data stored in the various datasets.

    Delete All Child Datasets
    All data and Child datasets are deleted. This action cannot be undone.
    Once the operation is complete, the dataset is associated with the "Orphan" strategy.

    • For datasets in the "LEGACY" category, the "Collected" dataset is not deleted, but all data stored in it is deleted.
    • For datasets in the "SMF" category (based on custom parsers), the "Collected" table is deleted.

    Archive to

    All datasets (Collected and Child) are aggregated using the aggregation granularity defined by the last rule of the currently associated strategy.

    This operation creates a new dataset containing all the data, associated with the "Orphan" strategy.

    When you choose this option, you can use the default file name or rename it according to your own file naming policy. The archive name must be unique.


    This operation may take time. During this process, the state of the dataset is "Archiving".

    An API is automatically created with the same label as the archive name. You can use this API to access the archived data.


    Monitoring dataset 

    Data lifecycle management provides statistics on all data and on rule execution.

    Statistics are only available for datasets associated with a strategy.

    Global Dataset statistics

    From the dataset detail page, you can directly access its statistics.
    Click on "Statistics" in the top right menu.


    The statistics page opens, providing information about the dataset.
    By default, the information covers the last week. The displayed period is always compared to the same previous period to provide information such as trend or growth rate.

    Four widgets are shown.

    Widget Number   Title                        Purpose
    1               Ranking By Storage           Position of the global dataset in terms of allocated space. The number is the rank of the global dataset; 1 is the largest dataset. The percentage is estimated from the allocation compared to the same previous period.
    2               Average Growth Rate          Growth rate of the dataset relative to the same previous period.
    3               Average Storage Allocation   Average allocated storage for the period.
    4               Allocated Storage History    Storage history profile of the dataset, in GB.
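    The document does not give the formulas behind these widgets. As a rough illustration, the sketch below assumes that the growth rate is the relative change between the current period and the same previous period, and that the rank is obtained by sorting datasets by allocated storage in descending order. Function names and sample figures are hypothetical.

```python
def growth_rate(current, previous):
    """Percentage change versus the previous period (None if undefined)."""
    if previous == 0:
        return None
    return (current - previous) * 100 / previous


def storage_rank(allocations, label):
    """Rank of a dataset by allocated storage; 1 is the largest dataset."""
    ordered = sorted(allocations, key=allocations.get, reverse=True)
    return ordered.index(label) + 1


# Hypothetical average allocations in GB for the current period.
allocations = {"cpu_usage": 120.0, "jobs": 300.0, "network": 45.0}
print(storage_rank(allocations, "cpu_usage"))  # 2
print(growth_rate(120.0, 100.0))               # 20.0
```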


    Collected Dataset 

    Each global dataset is linked to a "Collected" dataset. This dataset contains the raw data from the collections.
    For each Collected dataset, the solution provides statistics.

    From the dataset detail page, select the Collected dataset in the "Related Dataset" table.


    Then select "Dataset Statistics" in the top right corner menu.


    Four widgets are shown.

    Widget Number   Title                              Purpose
    1               Collected Records                  Number of records collected for the selected period.
    2               Average Growth Rate Record         Growth rate of the dataset relative to the same previous period.
    3               Average Collected Records By Day   Average number of records collected per day, with trend for the displayed period.
    4               Collected Records History          History profile of the number of collected records.



    Data Lifecycle Management Log


    Global Dataset Logs

    From the dataset detail page, you can directly access the Global Dataset log.
    Click on "Logs" in the top right menu.

    The log provides information about all activities on the Global Dataset, such as assigning or activating a strategy.
    The last 24 hours are available from this interface.

    To export the log content, click on the export button.

    Filtering the log level of displayed messages

    Clicking on this icon opens the window below.

    From the dropdown list, select the desired log level.


