Added sliding window feature to Gaudi Gemma2 model #1736

Draft
wants to merge 2 commits into
base: main
from

Conversation

slokesha
Contributor

@slokesha slokesha commented Jan 30, 2025

Enabled the sliding window feature for the Gemma2 model.
Gemma2 uses a sliding window in every other layer of Gemma2DecoderLayer (the implementation is here). The same logic is now applied in GaudiGemma2DecoderLayer.

The value of sliding_window is passed through the config file.
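
For reviewers, here is a minimal sketch of what "sliding window in every other layer" means for the attention mask. It is not the code in this PR. The helper names (`uses_sliding_window`, `sliding_window_causal_mask`, `layer_mask`) are hypothetical, the choice of even-indexed layers as the sliding ones is an assumption, and so is the convention that a window of size `window` covers the current token plus the `window - 1` tokens before it.

```python
from typing import List


def uses_sliding_window(layer_idx: int) -> bool:
    """Hypothetical helper: True for layers that use local (sliding) attention.

    Assumption: even-indexed layers slide. The PR text only says
    "every other layer", so the parity here is illustrative.
    """
    return layer_idx % 2 == 0


def sliding_window_causal_mask(seq_len: int, window: int) -> List[List[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.

    A key is visible if it is not in the future (k <= q) and lies within
    the last `window` positions ending at q.
    """
    return [
        [k <= q and q - k < window for k in range(seq_len)]
        for q in range(seq_len)
    ]


def layer_mask(layer_idx: int, seq_len: int, window: int) -> List[List[bool]]:
    """Use the local mask on sliding layers and the full causal mask otherwise."""
    if uses_sliding_window(layer_idx):
        return sliding_window_causal_mask(seq_len, window)
    # A window as long as the sequence is equivalent to plain causal attention.
    return sliding_window_causal_mask(seq_len, seq_len)
```

With `seq_len=4` and `window=2`, the last query position in a sliding layer sees only the two most recent keys, while in a non-sliding layer it sees all four. In the real model, `window` would come from the `sliding_window` value in the config.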

Gemma2 can use flash_attention_2 as its attention class, so this PR also enables flash_attention_2 as a command-line argument.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@slokesha slokesha marked this pull request as ready for review February 4, 2025 03:58
@slokesha slokesha requested a review from regisss as a code owner February 4, 2025 03:58
@jiminha
Collaborator

jiminha commented Feb 5, 2025

Do you see perf improvement with this sliding window enabled?

@jiminha jiminha requested a review from libinta February 5, 2025 00:07
@slokesha slokesha marked this pull request as draft February 5, 2025 18:10