import logging
from typing import Callable, List, Optional, Union

import torch
from torch import nn
from torchtune.utils._import_guard import _SUPPORTS_FLEX_ATTENTION
from torchtune.utils._logging import get_logger, log_once

_log: logging.Logger = get_logger()

if _SUPPORTS_FLEX_ATTENTION:
    from torch.nn.attention.flex_attention import (
        BlockMask,
        create_block_mask as create_block_causal_mask_flex,
        flex_attention,
    )

    def compile_flex_attention():
        # Try a plain compile first; fall back to max-autotune if that fails.
        try:
            return torch.compile(flex_attention, dynamic=False)
        except Exception as e:
            _log.info(
                f"Compiling flex_attention failed with error '{e}'. "
                "Retrying with mode='max-autotune'."
            )
            try:
                return torch.compile(flex_attention, dynamic=False, mode="max-autotune")
            except Exception as e:
                _log.info(
                    f"Compiling flex_attention failed with error: '{e}'. "
                    "Updating your PyTorch version to a nightly build may solve it, "
                    "or you can set dataset.packed=False in your config to avoid "
                    "using flex attention."
                )
                raise

    flex_attention_compiled = compile_flex_attention()

    # Keep the outer compiler (if the model itself is compiled) from tracing
    # into the already-compiled flex attention kernel.
    @torch.compiler.disable(recursive=False)
    def compile_friendly_flex_attention(
        q: torch.Tensor,
        k: torch.Tensor,
        v: torch.Tensor,
        block_mask: BlockMask,
    ) -> torch.Tensor:
        return flex_attention_compiled(q, k, v, block_mask=block_mask)

    _MaskType = Union[torch.Tensor, BlockMask]
else:
    _MaskType = torch.Tensor


def _get_document_ids_from_seq_lens(
    seq_lens: List[torch.Tensor],
) -> torch.Tensor:
    """
    Convert a batch of seq lens into integer IDs denoting sample ownership.
    For example, seq_lens = [2, 3, 1] would return [0, 0, 1, 1, 1, 2].

    Args:
        seq_lens (List[torch.Tensor]): Sequence lengths of samples in each pack in the batch,
            shape (batch_size, n), where n is the max number of sequences in a pack and can vary
            across packs.

    Returns:
        Tensor: Document IDs of shape (batch_size, max_seq_len).
    """
    batch_size = len(seq_lens)
    batch_document_ids = []
    for sample_idx in range(batch_size):
        # Repeat each document index seq_len times, then concatenate over the pack.
        document_ids = torch.cat(
            [
                torch.full((seq_len,), i, dtype=torch.long, device=seq_len.device)
                for i, seq_len in enumerate(seq_lens[sample_idx])
            ]
        )
        batch_document_ids.append(document_ids)
    batch_document_ids = torch.stack(batch_document_ids)
    return batch_document_ids


def create_block_causal_mask(seq_lens: List[torch.Tensor]) -> torch.Tensor:
    """
    Given a batch tensor of seq lens defining the lengths of samples in each pack,
    construct a 2D block causal mask for each pack in the batch. For example, if
    a single sample's seq_lens is [3, 2, 1], the mask would be::

        mask = [
            [1, 0, 0, 0, 0, 0],
            [1, 1, 0, 0, 0, 0],
            [1, 1, 1, 0, 0, 0],
            [0, 0, 0, 1, 0, 0],
            [0, 0, 0, 1, 1, 0],
            [0, 0, 0, 0, 0, 1],
        ]

    Args:
        seq_lens (List[torch.Tensor]): Sequence lengths of samples in each pack in the batch,
            shape (batch_size, n), where n is the max number of sequences in a pack and can vary
            across packs.

    Returns:
        Tensor: Block causal mask of shape (batch_size, max_seq_len, max_seq_len).
    """
    batch_block_attn_masks = []
    batch_size = len(seq_lens)
    for sample_idx in range(batch_size):
        # One lower-triangular (causal) block per document in the pack.
        block_attn_masks = [
            torch.tril(
                torch.ones(seq_len, seq_len, dtype=torch.bool, device=seq_len.device)
            )
            for i, seq_len in enumerate(seq_lens[sample_idx])
        ]
        # Place the blocks on the diagonal so documents cannot attend to each other.
        batch_block_attn_masks.append(torch.block_diag(*block_attn_masks))
    return torch.stack(batch_block_attn_masks)


def packed_block_causal_mask(
    seq_lens: List[torch.Tensor],
) -> _MaskType:
    """
    Create a block causal document mask for a batch of packed sequences. If
    flex attention is supported by the current hardware, this function defines
    the block causal logic as a ``mask_mod`` and passes it into
    :func:`torch.nn.attention.flex_attention.create_block_mask`. The resulting
    BlockMask is a compressed representation of the full block causal mask.
    Otherwise (e.g. on torch versions older than 2.5.0), a standard 2D block
    causal mask is created and returned.

    Args:
        seq_lens (List[torch.Tensor]): Sequence lengths of samples in each pack in the batch,
            shape (batch_size, n), where n is the max number of sequences in a pack and can vary
            across packs.

    Returns:
        _MaskType: BlockMask or Tensor if torch version < 2.5.0.
    """
    if _SUPPORTS_FLEX_ATTENTION:
        document_ids = _get_document_ids_from_seq_lens(seq_lens)
        batch_size, max_seq_len = document_ids.shape
        document_ids = document_ids.to("cuda")

        def mask_mod(b, h, q_idx, kv_idx):
            """
            Defines the logic of a block causal mask by combining both a standard causal mask
            and a block diagonal document mask.

            See :func:`~torchtune.modules.attention_utils.create_block_causal_mask`
            for an illustration.
            """
            causal_mask = q_idx >= kv_idx
            document_mask = document_ids[b, q_idx] == document_ids[b, kv_idx]
            return causal_mask & document_mask

        # H=None broadcasts the same mask across all attention heads.
        return create_block_causal_mask_flex(
            mask_mod,
            batch_size,
            None,
            max_seq_len,
            max_seq_len,
            device="cuda",
        )
    else:
        return create_block_causal_mask(seq_lens=seq_lens)


def _sdpa_or_flex_attention() -> Callable:
    """
    Helper function to decide when to call flex attention or SDPA. It will use
    flex attention if ALL of the following conditions are met, otherwise it will
    default to SDPA:

    - torch version >= 2.5.0
    - we are sample packing, therefore mask is a BlockMask
    - torch.cuda.get_device_capability() >= (7, 5)
    """

    if _SUPPORTS_FLEX_ATTENTION:

        def _attention_call(
            q: torch.Tensor,
            k: torch.Tensor,
            v: torch.Tensor,
            mask: Optional[_MaskType],
            dropout_p: float,
            is_causal: bool,
        ) -> torch.Tensor:
            # A BlockMask signals sample packing, so route to flex attention.
            if isinstance(mask, BlockMask):
                log_once(
                    _log,
                    "Using flex attention for attention computation since a BlockMask was passed in.",
                    level=logging.DEBUG,
                )
                if dropout_p > 0.0:
                    raise ValueError(
                        "Flex attention does not support dropout. Please set dropout to 0.0."
                    )
                return compile_friendly_flex_attention(
                    q,
                    k,
                    v,
                    block_mask=mask,
                )
            # Otherwise the mask is a boolean tensor or None: use SDPA.
            else:
                if mask is not None:
                    # shape: [b, s, s] -> [b, 1, s, s] to broadcast over heads
                    mask = mask[:, None, :, :]
                return nn.functional.scaled_dot_product_attention(
                    q,
                    k,
                    v,
                    attn_mask=mask,
                    dropout_p=dropout_p,
                    is_causal=is_causal,
                )

    else:

        def _attention_call(
            q: torch.Tensor,
            k: torch.Tensor,
            v: torch.Tensor,
            mask: Optional[_MaskType],
            dropout_p: float,
            is_causal: bool,
        ) -> torch.Tensor:
            if mask is not None:
                # shape: [b, s, s] -> [b, 1, s, s] to broadcast over heads
                mask = mask[:, None, :, :]
            return nn.functional.scaled_dot_product_attention(
                q,
                k,
                v,
                attn_mask=mask,
                dropout_p=dropout_p,
                is_causal=is_causal,
            )

    return _attention_call
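
# ---------------------------------------------------------------------------
# Illustrative sketch (hypothetical helpers, not part of the torchtune API).
#
# A minimal pure-Python restatement of the block causal mask semantics,
# assuming seq_lens for a single pack is a plain list of ints that sums to the
# packed sequence length. It mirrors _get_document_ids_from_seq_lens and the
# mask_mod predicate used with flex attention: query position q may attend to
# key position kv only if kv <= q (causal) and both positions belong to the
# same document. This makes it possible to check small cases, such as the
# [3, 2, 1] example in the create_block_causal_mask docstring, without torch
# or a GPU.
# ---------------------------------------------------------------------------


def _reference_document_ids(seq_lens: list[int]) -> list[int]:
    """Hypothetical helper: per-token document IDs, e.g. [2, 3, 1] -> [0, 0, 1, 1, 1, 2]."""
    return [doc_id for doc_id, length in enumerate(seq_lens) for _ in range(length)]


def _reference_block_causal_mask(seq_lens: list[int]) -> list[list[int]]:
    """Hypothetical helper: dense 0/1 block causal mask for a single pack."""
    document_ids = _reference_document_ids(seq_lens)
    total_len = len(document_ids)
    return [
        [
            # Same predicate as mask_mod: causal AND same document.
            int(q_idx >= kv_idx and document_ids[q_idx] == document_ids[kv_idx])
            for kv_idx in range(total_len)
        ]
        for q_idx in range(total_len)
    ]


# Example: row 3 of _reference_block_causal_mask([3, 2, 1]) is [0, 0, 0, 1, 0, 0],
# because the first token of the second document sees only itself.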