In our pre-training task
Posted: Thu Mar 27, 2025 10:42 am
.“flatten” multivariate time series, considering all variates as a single sequence. Patches are subsequently projected into vector representations via a multi-patch size input projection layer. The [mask] signifies a learnable embedding that replaces patches falling within the forecast horizon. The output tokens are then decoded via the multi-patch size output projection into the parameters of the mixture distribution. While not visualized, (non-learnable) instance normalization is applied to inputs/outputs, aligning with the current standard dentist data practice for deep forecasting models.
, we formulate the objective to optimize the mixture distribution log-likelihood. The design of both the data distribution and task distribution are two critical aspects of the pre-training pipeline. This design imparts versatile capabilities to our Large Time Series Model (LTM), enabling it to adapt to a range of downstream tasks. This flexibility stands in contrast to the prevailing deep forecasting paradigm, where models are typically specialized for specific datasets and settings.
, we formulate the objective to optimize the mixture distribution log-likelihood. The design of both the data distribution and task distribution are two critical aspects of the pre-training pipeline. This design imparts versatile capabilities to our Large Time Series Model (LTM), enabling it to adapt to a range of downstream tasks. This flexibility stands in contrast to the prevailing deep forecasting paradigm, where models are typically specialized for specific datasets and settings.