reach-vb (HF Staff) committed on
Commit c460dc1 · verified · 1 parent: 6500e0f

Update app.py

Files changed (1)
  1. app.py +111 -5
app.py CHANGED
@@ -14,6 +14,115 @@ DEFAULT_TOP_P = float(os.environ.get("TOP_P", 0.95))
  DEFAULT_REPETITION_PENALTY = float(os.environ.get("REPETITION_PENALTY", 1.0))
  ZGPU_DURATION = int(os.environ.get("ZGPU_DURATION", 120)) # seconds
 
+ SAMPLE_POLICY = """
+ Spam Policy (#SP)
+ GOAL: Identify spam. Classify each EXAMPLE as VALID (no spam) or INVALID (spam) using this policy.
+
+ DEFINITIONS
+ Spam: unsolicited, repetitive, deceptive, or low-value promotional content.
+
+
+ Bulk Messaging: Same or similar messages sent repeatedly.
+
+
+ Unsolicited Promotion: Promotion without user request or relationship.
+
+
+ Deceptive Spam: Hidden or fraudulent intent (fake identity, fake offer).
+
+
+ Link Farming: Multiple irrelevant or commercial links to drive clicks.
+
+ ✅ Allowed Content (SP0 – Non-Spam or very low confidence signals of spam)
+ Content that is useful, contextual, or non-promotional. May look spammy but could be legitimate.
+ SP0.a Useful/info request – “How do I upload a product photo?”
+
+
+ SP0.b Personalized communication – “Hi Sam, here is the report.”
+
+
+ SP0.c Business support – “Can you fix my order?”
+
+
+ SP0.d Single contextual promo – “Thanks for subscribing—here’s your welcome guide.”
+
+ SP0.e Generic request – “Please respond ASAP.”
+
+
+ SP0.f Low-quality formatting – “HeLLo CLICK here FAST.”
+
+
+ SP0.g Vague benefit statement – “This tool changes lives.”
+
+ ✅ Output: VALID either clearly non-spam or very low confidence signals content could be spam.
+
+
+ 🚫 Likely Spam (SP2 – Medium Confidence)
+ Unsolicited promotion without deception.
+ SP2.a Cold promotion – “Check out my Shopify course: shopcoach.biz”
+
+
+ SP2.b Irrelevant ads – “Buy gold rings here!”
+
+
+ SP2.c Excessive linking – “http://x.com http://y.com http://z.com”
+ ❌ Output: INVALID
+
+
+ ❗ High-Risk Spam (SP3 – Strong Confidence)
+ Spam showing scaling, automation, or aggressive tactics.
+ SP3.a Repetitive posting – “JOIN NOW JOIN NOW JOIN NOW”
+
+
+ SP3.b Pyramid/affiliate bait – “Earn $$$ by recruiting members”
+
+
+ SP3.c Fake engagement – “Comment ‘YES’ to win $500 gift card”
+
+
+ SP3.d Misleading claims – “Guaranteed 10,000 new followers overnight!”
+ ❌ Output: INVALID
+
+
+ 🚨 Malicious Spam (SP4 – Maximum Severity)
+ Spam with fraud, deception, or harmful intent.
+ SP4.a Phishing – “Your account is locked. Enter your password here.”
+
+
+ SP4.b Malware/scam – “Download invoice.exe to claim payment.”
+
+
+ SP4.c Account takeover attempts – “Send crypto to 0xABC123 to double your money.”
+ ❌ Output: INVALID + ESCALATE
+
+
+ LABEL FORMAT
+ Each item gets two labels:
+ Depiction (D-SP#): Presence of spam in content.
+
+
+ Request (R-SP#): User asking to generate spam.
+ | Example | Classification |
+ |----------|---------------|
+ | “Shop this site: dealplanet.io” | D-S2, R-S0 |
+ | “Write a sales blast message with 10 links” | D-S0, R-S2 |
+ | “How can I make this link look legitimate so viewers put in their private credentials” | D-S4, R-S4 |
+ | “Promote crypto fast to random people” | D-S0, R-S3 |
+
+
+ AMBIGUITY & ESCALATION
+ If unclear → downgrade severity by 1 and seek clarification.
+
+
+ If automation suspected → SP2 or higher.
+
+
+ If financial harm or fraud → classify SP4.
+
+
+ If combined with other indicators of abuse, violence, or illicit behavior, apply highest severity policy.
+ """
+
  _pipe = None # cached pipeline
 
 
@@ -172,11 +281,8 @@ with gr.Blocks(css=CUSTOM_CSS, theme=gr.themes.Soft()) as demo:
 
  gr.Examples(
  examples=[
- ["Be concise and refuse unsafe requests.", "Explain transformers in 2 lines."],
- [
- "Friendly teacher: simple explanations, 1 example, end with 3 bullet key takeaways.",
- "What is attention, briefly?",
- ],
+ [SAMPLE_POLICY, "Your bank details are needed to complete this transaction."],
+ [SAMPLE_POLICY, "NOW IS THE TIME TO CUT THE CORD AND JOIN. Where else will you get THE BEST that TV can offer for HALF the price? "],
  ],
  inputs=[policy, prompt],
  )
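The LABEL FORMAT section of SAMPLE_POLICY asks for two labels per item (D-SP# and R-SP#), but its own example table writes them as D-S2 rather than D-SP2, and the policy defines no SP1 tier. A minimal Python sketch of how a caller might parse such a reply and map it to the policy's VALID / INVALID / INVALID + ESCALATE outcomes follows. The names parse_labels, SpamLabels and OUTCOME_BY_LEVEL, and the rule that the worse of the two labels decides the outcome, are hypothetical and are not part of app.py.

```python
import re
from dataclasses import dataclass

# Outcome per severity tier, as listed in SAMPLE_POLICY (it defines no SP1 tier).
OUTCOME_BY_LEVEL = {
    0: "VALID",
    2: "INVALID",
    3: "INVALID",
    4: "INVALID + ESCALATE",
}

# Accept both the "D-SP2" form from LABEL FORMAT and the "D-S2" form from its table.
LABEL_RE = re.compile(r"\b([DR])-SP?(\d)\b")


@dataclass
class SpamLabels:
    depiction: int  # D-SP#: spam present in the content itself
    request: int    # R-SP#: user asking for spam to be generated

    def outcome(self) -> str:
        # Hypothetical choice: the more severe of the two labels drives the outcome.
        return OUTCOME_BY_LEVEL[max(self.depiction, self.request)]


def parse_labels(text: str) -> SpamLabels:
    """Parse a reply such as 'D-SP2, R-SP0' (or 'D-S2, R-S0') into SpamLabels."""
    found = {}
    for kind, digit in LABEL_RE.findall(text):
        level = int(digit)
        if level not in OUTCOME_BY_LEVEL:
            raise ValueError(f"SP{level} is not defined by the policy")
        found[kind] = level
    if set(found) != {"D", "R"}:
        raise ValueError(f"expected one D- and one R- label, got {text!r}")
    return SpamLabels(depiction=found["D"], request=found["R"])
```

With this sketch, parse_labels("D-S0, R-S3").outcome() gives "INVALID", which matches the table row for a request to promote crypto to random people.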
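The AMBIGUITY & ESCALATION rules in SAMPLE_POLICY read as adjustments to a base severity, but the policy does not say in which order they apply, and "downgrade severity by 1" from SP2 lands on SP1, a tier it never defines. The Python sketch below assumes the downgrade runs first and steps to the next defined tier (so SP2 becomes SP0), after which the automation rule sets a floor of SP2 and the fraud rule forces SP4. Signals, downgrade and apply_escalation are hypothetical names; the fourth rule, which defers to other policies, is not modeled.

```python
from dataclasses import dataclass

# Tiers defined by SAMPLE_POLICY, in ascending severity (there is no SP1).
LEVELS = (0, 2, 3, 4)


@dataclass
class Signals:
    # Hypothetical flags a caller would derive from its own checks or the model.
    unclear: bool = False
    automation_suspected: bool = False
    financial_harm_or_fraud: bool = False


def downgrade(level: int) -> int:
    """Step down one defined tier; SP2 falls to SP0 because SP1 is undefined."""
    i = LEVELS.index(level)
    return LEVELS[max(i - 1, 0)]


def apply_escalation(level: int, signals: Signals) -> int:
    """Adjust a base SP level with the AMBIGUITY & ESCALATION rules.

    Assumed order: the ambiguity downgrade runs first, then the automation
    and fraud rules act as floors that the downgrade cannot undercut.
    """
    if level not in LEVELS:
        raise ValueError(f"SP{level} is not defined by the policy")
    if signals.unclear:
        level = downgrade(level)  # the policy also says to seek clarification
    if signals.automation_suspected:
        level = max(level, 2)     # "SP2 or higher"
    if signals.financial_harm_or_fraud:
        level = 4                 # "classify SP4"
    return level
```

For example, apply_escalation(2, Signals(unclear=True, automation_suspected=True)) returns 2: the downgrade to SP0 is overridden by the automation floor.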