-
Notifications
You must be signed in to change notification settings - Fork 5
Add kernel fusion #43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
benchmarks/CMakeCache.txt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CMakeFiles, CMake caches and generated makefiles should be added to gitignore
|
|
||
| Operation *op = getOperation(); | ||
| if (!op->hasTrait<OpTrait::SymbolTable>()) { | ||
| op->emitError() << " was scheduled to be run under the inliner, but does " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| op->emitError() << " was scheduled to be run under the inliner, but does " | |
| op->emitError() << " was scheduled to be run under the inliner, but does not " |
| @@ -0,0 +1,38 @@ | |||
| /* Top level routine for automatic kernel fusion. Creates "fusion sets", i.e. | |||
| * sets of subkernels that are to be fused together. Major steps: | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * sets of subkernels that are to be fused together. Major steps: | |
| * sets of subkernels that are to be fused together. |
Or finish the comment, I'm fine either way :)
| if (write == bufferStores.end()) | ||
| return WalkResult::advance(); | ||
|
|
||
| if (!write->second.size()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens here if loadMem's rank is 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind, I see that an interrupt result means that there could be a read after write.
| ploopChains.clear(); | ||
| ploopChains.push_back({}); | ||
|
|
||
| bool noSideEffects = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to add some comments about this logic. It looks like it will always group all parallels in a block into chains of 2: (i, i+1), (i+2, i+3), ...
|
|
||
| // {{{ generic -> einsum | ||
|
|
||
| typedef struct EinsumArg { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer C++ style struct declarations like
struct EinsumArg
{
};
| if ( | ||
| !isa<arith::AddFOp>(op) && | ||
| !isa<arith::MulFOp>(op) && | ||
| !isa<linalg::YieldOp>(op) | ||
| ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if ( | |
| !isa<arith::AddFOp>(op) && | |
| !isa<arith::MulFOp>(op) && | |
| !isa<linalg::YieldOp>(op) | |
| ) { | |
| if (!isa<arith::AddFOp, arith::MulFOp, linalg::YieldOp>(op)) { |
| !isa<arith::MulFOp>(op) && | ||
| !isa<linalg::YieldOp>(op) | ||
| ) { | ||
| op.dump(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| op.dump(); | |
| llvm::outs() << "Note: cannot convert following linalg.generic to einsum:\n"; | |
| op.dump(); |
OK to dump the op but should give a message for why it's being printed. Or just get rid of the dump since this is only one possible case for why a generic isn't a valid einsum.
| } | ||
| } | ||
|
|
||
| std::set<char> outputIndices; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is nice and elegant
| FusedEinsum fusedEinsum; | ||
|
|
||
| std::vector<EinsumSpecification> fused_einsums; | ||
| for (auto outer = einsums.rbegin(); outer != einsums.rend(); ++outer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function should have some comments explaining the steps
No description provided.