Skip to content

odsc-65115 : Transform columns names to make them compatible across models #1046

New issue

Have a question about this project? Sign up for a free account to open an issue and contact its maintainers and the community.

By clicking “Sign up for ”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on ? Sign in to your account

Merged
merged 11 commits into from
Feb 4, 2025

Conversation

codeloop
Copy link
Member

@codeloop codeloop commented Jan 24, 2025

odsc-65115
Add the column name transformations, also save a mapping for reverse transformation as per model requirement.

@oracle-contributor-agreementoracle-contributor-agreement bot added the OCA VerifiedAll contributors have signed the Oracle Contributor Agreement.label Jan 24, 2025
@codeloopcodeloop changed the title Make column name LGB compatible Transform columns names to make them compatible across models Jan 27, 2025
@codeloopcodeloop changed the title Transform columns names to make them compatible across models odsc-65115 : Transform columns names to make them compatible across models Jan 27, 2025
@codeloopcodeloop marked this pull request as ready for review January 28, 2025 04:50
@codeloopcodeloop enabled auto-merge January 28, 2025 04:50
@codeloopcodeloop requested a review from prasankh January 28, 2025 04:51
ahosler
ahosler previously requested changes Jan 28, 2025
@@ -59,7 +60,8 @@ def run(self, data):

"""
clean_df = self._remove_trailing_whitespace(data)
# clean_df = self._normalize_column_names(clean_df)
if hasattr(self.dataset_info, 'horizon'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a proxy for "if it's forecasting"?
If so it may be better to use kind/type for future compatibility

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the transformations we use OperatorSpec, incase of the forecasting it is ForecastOperatorSpec, type is available in the OperatorConfig which is not required here and adding type to spec will make it redundant, so I have doing that. Updated this to us operatorspec instance type check for better readability .

self.target_category_columns = [
self.raw_column_names.get(col, col) for col in self.target_category_columns
]
df.columns = df.columns.str.replace(" ", "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be doing this replace 2x? Is there a way to do it once so we ensure it's done the same way

@@ -226,6 +254,10 @@ def _check_historical_dataset(self, df):
expected_names = [self.target_column_name, self.dt_column_name] + (
self.target_category_columns if self.target_category_columns else []
)

if self.raw_column_names:
expected_names.extend(list(self.raw_column_names.values()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these guaranteed to be in historical data?

Copy link
Member Author

@codeloop codeloop Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, as we are making this transformation to the historical data columns, expected columns has to be extended using the same post transformed columns

@codeloopcodeloop merged commit ccabc04 into main Feb 4, 2025
9 checks passed
Sign up for free to join this conversation on . Already have an account? Sign in to comment
Labels
OCA VerifiedAll contributors have signed the Oracle Contributor Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants