update

sustainability-lab · Nov 1, 2024 · 24e269a
1 parent aad5b06
commit 24e269a
Showing 1 changed file with 187 additions and 8 deletions.
diff --git a/index.html b/index.html
@@ -681,12 +681,6 @@
       margin-right: 5px;
     }
 
-    figcaption {
-      text-align: center;
-      margin-top: 10px;
-      color: #000;
-    }
-
 
     .result-table {
       width: 100%;
@@ -1498,7 +1492,11 @@ <h1 class="title is-1 publication-title">
                     <span class="author-block"><sup>2</sup>University of Alberta,</span>
                   </div>
                   <div class="is-size-5 publication-authors">
-                    <span class="author-block">17th Asian Conference on Computer Vision (ACCV 2024)</span>
+                    <span class="author-block">17th Asian Conference on Computer Vision (ACCV 2024)</span>
+                  </div>
+                  <div>
+                    <span class="author-block"><strong>Accepted for Oral Presentation</strong>
+                    </span>
                   </div>
                 </span>
 
@@ -1720,6 +1718,177 @@ <h2 class="title is-3 has-text-centered">Overview of All Tasks</h2>
 
 
 
+  <section class="section">
+    <div class="container is-max-desktop">
+      <!-- Abstract. -->
+      <div class="columns is-centered has-text-centered">
+        <div class="column is-four-fifths">
+          <h2 class="title is-3">Abstract</h2>
+          <div class="content has-text-justified">
+
+            <p>
+              While large language models with vision capabilities (VLMs), e.g., <span class="model">GPT-<span
+                  class="gpt-green">4o</span></span> and <span class="model">Gemini-<span class="gemini-blue">1.5</span>
+                Pro</span>,
+              are powering various image-text applications and scoring high on many vision-understanding benchmarks, we
+              find that
+              they still struggle, surprisingly, with low-level vision tasks that are easy for humans.
+              Specifically, on <span class="model">Blind<span class="blindtest-purple">Test</span></span>, our suite of
+              7 very
+              simple tasks such as (a) identifying whether two circles overlap; (b) identifying whether two lines intersect; (c)
+              identifying which letter is
+              circled in a word; and (d) counting circles in an Olympic-like logo, four state-of-the-art VLMs are
+              only 58.57%
+              accurate on average.
+              <span class="model">Sonnet-<span class="sonnet35-brown">3.5</span></span> performs the best at 74.94%
+              accuracy, but
+              this is still far from the expected human accuracy of 100%.
+              Across different image resolutions and line widths, VLMs consistently struggle with tasks that require
+              precise spatial
+              information and the recognition of geometric primitives that overlap or are close together.
+            </p>
+
+          </div>
+        </div>
+      </div>
+    </div>
+  </section>
+
+
+  <hr>
+  <section class="section">
+    <div class="container">
+      <div class="content-wrapper">
+        <h2 class="title is-3 has-text-centered">Overview of All Tasks</h2>
+        <div class="table-container">
+          <table class="performance-table results-table">
+            <thead>
+              <tr>
+                <th>Model</th>
+                <th>
+                  <div class="task-icon-container"><img src="static/images/logo/two_lines.svg" alt="Line Intersect"
+                      class="task-icon-small"></div>
+                </th>
+                <th>
+                  <div class="task-icon-container"><img src="static/images/logo/two-colored-circles-svg.svg"
+                      alt="Two Circles" class="task-icon-small"></div>
+                </th>
+                <th>
+                  <div class="task-icon-container"><img src="static/images/logo/acknowledgement-svg.svg"
+                      alt="Circled Letter" class="task-icon-small"></div>
+                </th>
+                <th>
+                  <div class="task-icon-container"><img src="static/images/logo/three-circles-svg.svg"
+                      alt="Olympic Rings" class="task-icon-small"></div>
+                </th>
+                <th>
+                  <div class="task-icon-container"><img src="static/images/logo/overlapping-pentagons-svg.svg"
+                      alt="Pentagon" class="task-icon-small"></div>
+                </th>
+                <th>
+                  <div class="task-icon-container"><img src="static/images/logo/nested-squares-svg.svg"
+                      alt="Nested Squares" class="task-icon-small"></div>
+                </th>
+                <th>
+                  <div class="task-icon-container"><img src="static/images/logo/grid-3x4-svg.svg" alt="Grid"
+                      class="task-icon-small">
+                  </div>
+                </th>
+                <th>
+                  <div class="task-icon-container"><img src="static/images/logo/subway-map-svg.svg" alt="Path Following"
+                      class="task-icon-small"></div>
+                </th>
+                <th style="font-weight: bold; text-align: center;">Mean</th>
+              </tr>
+            </thead>
+            <tbody>
+              <tr>
+                <td>Random</td>
+                <td class="rnumber">33.33</td>
+                <td class="rnumber">50.00</td>
+                <td class="rnumber">5.77</td>
+                <td class="rnumber">20.00</td>
+                <td class="rnumber">20.00</td>
+                <td class="rnumber">25.00</td>
+                <td class="rnumber">4.55</td>
+                <td class="rnumber">33.33</td>
+                <td class="rnumber">24.00</td>
+              </tr>
+              <tr>
+                <td><img src="static/images/chatgpt-icon.svg" alt="GPT-4o" class="model-logo-small"></td>
+                <td class="rnumber">41.61</td>
+                <td class="rnumber">72.67</td>
+                <td class="rnumber">70.18</td>
+                <td class="rnumber">42.50</td>
+                <td class="rnumber">17.50</td>
+                <td class="rnumber">55.83</td>
+                <td class="rnumber">39.58</td>
+                <td class="rnumber">47.89</td>
+                <td class="rnumber">48.47</td>
+              </tr>
+              <tr>
+                <td><img src="static/images/google-gemini-icon.svg" alt="Gemini-1.5" class="model-logo-small"></td>
+                <td class="rnumber">66.94</td>
+                <td class="highlight rnumber">92.78</td>
+                <td class="highlight rnumber">92.81</td>
+                <td class="highlight rnumber">87.08</td>
+                <td class="rnumber">19.37</td>
+                <td class="rnumber">80.00</td>
+                <td class="rnumber">39.39</td>
+                <td class="rnumber">41.60</td>
+                <td class="rnumber">65.00</td>
+              </tr>
+              <tr>
+                <td><img src="static/images/claude-ai-icon.svg" alt="Sonnet-3" class="model-logo-small"></td>
+                <td class="rnumber">43.41</td>
+                <td class="rnumber">84.52</td>
+                <td class="rnumber">73.34</td>
+                <td class="rnumber">31.66</td>
+                <td class="rnumber">9.79</td>
+                <td class="rnumber">65.00</td>
+                <td class="rnumber">36.17</td>
+                <td class="rnumber">23.24</td>
+                <td class="rnumber">45.89</td>
+              </tr>
+              <tr>
+                <td><img src="static/images/claude-35.png" alt="Sonnet-3.5" class="model-logo-small"></td>
+                <td class="highlight rnumber">75.36</td>
+                <td class="rnumber">91.66</td>
+                <td class="rnumber">89.22</td>
+                <td class="rnumber">44.16</td>
+                <td class="highlight rnumber">77.29</td>
+                <td class="highlight rnumber">92.08</td>
+                <td class="highlight rnumber">74.26</td>
+                <td class="highlight rnumber">55.53</td>
+                <td class="highlight rnumber">74.94</td>
+              </tr>
+              <tr>
+                <td><strong>Mean</strong></td>
+                <td class="rnumber">56.84</td>
+                <td class="rnumber">85.41</td>
+                <td class="rnumber">81.39</td>
+                <td class="rnumber">51.35</td>
+                <td class="rnumber">30.99</td>
+                <td class="rnumber">73.29</td>
+                <td class="rnumber">47.35</td>
+                <td class="rnumber">42.06</td>
+                <td class="highlight rnumber">58.57</td>
+              </tr>
+            </tbody>
+          </table>
+        </div>
+        <figcaption>Accuracy (%) of each model over 7 tasks. The mean accuracy over all four models is 58.57%,
+          substantially
+          better than random chance (24%), which is computed considering each task as a single-label, N-way
+          classification
+          problem. <span class="model">Sonnet-<span class="sonnet35-brown">3.5</span></span> is the best (74.94%
+          accuracy)
+          but still far from the 100% expected accuracy.</figcaption>
+      </div>
+    </div>
+  </section>
+
+
   <section class="section">
     <div class="container">
       <div class="task-grid">
@@ -1769,6 +1938,7 @@ <h2 class="title is-3 has-text-centered">Overview of All Tasks</h2>
 
 
 
+
   <!-- TASK 1 Begins -->
   <section id="task1" class="section">
     <div class="container is-max-desktop">
@@ -1996,6 +2166,9 @@ <h2>How many times do the blue and red lines intersect?</h2>
                 <td style="text-align: center; vertical-align: middle;"><span class="label">1</span><span
                     class="cross">✗</span>
                 </td>
+                <td style="text-align: center; vertical-align: middle;"><span class="label">1</span><span
+                    class="cross">✗</span>
+                </td>
               </tr>
               <tr>
                 <td style="text-align: center; vertical-align: middle;"><img src="static/images/claude-35.png"
@@ -2903,7 +3076,7 @@ <h2>How many circles are in the image? Answer with only the number in numerical
                     class="cross">✗</span>
                 </td>
                 <td style="text-align: center; vertical-align: middle;"><span class="label">10</span><span
-                    class="cross">✗</span>
+                    class="cross"></span>
                 </td>
                 <td style="text-align: center; vertical-align: middle;"><span class="label">5</span><span
                     class="cross">✗</span>
@@ -3902,4 +4075,10 @@ <h2>How many single-color paths go from A to D? Answer with a number in curly br
 
 </body>
 
+</html>
+
+</html>
+
+</html>
+
 </html>
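
The figcaption added in this commit reports a per-model mean, an overall mean of 58.57%, and a random-chance baseline of 24% computed by treating each task as a single-label, N-way classification problem. A minimal sketch for re-checking those figures follows. It assumes each "Mean" cell is the unweighted average of the eight task columns shown in the table; the names `ACCURACY`, `RANDOM_BASELINE`, `chance_accuracy`, `row_means`, and `overall_mean` are hypothetical and not part of the repository.

```python
from statistics import mean

# Per-task accuracies (%) copied from the "Overview of All Tasks" table added
# in this commit, in column order: Line Intersect, Two Circles, Circled Letter,
# Olympic Rings, Pentagon, Nested Squares, Grid, Path Following.
ACCURACY = {
    "GPT-4o":     [41.61, 72.67, 70.18, 42.50, 17.50, 55.83, 39.58, 47.89],
    "Gemini-1.5": [66.94, 92.78, 92.81, 87.08, 19.37, 80.00, 39.39, 41.60],
    "Sonnet-3":   [43.41, 84.52, 73.34, 31.66, 9.79, 65.00, 36.17, 23.24],
    "Sonnet-3.5": [75.36, 91.66, 89.22, 44.16, 77.29, 92.08, 74.26, 55.53],
}

# The "Random" row of the same table.
RANDOM_BASELINE = [33.33, 50.00, 5.77, 20.00, 20.00, 25.00, 4.55, 33.33]


def chance_accuracy(n_choices: int) -> float:
    """Chance accuracy (%) of uniform guessing on a single-label, N-way task."""
    if n_choices < 1:
        raise ValueError("n_choices must be at least 1")
    return 100.0 / n_choices


def row_means(table: dict[str, list[float]]) -> dict[str, float]:
    """Unweighted mean over the task columns for each model (an assumption)."""
    return {model: mean(scores) for model, scores in table.items()}


def overall_mean(table: dict[str, list[float]]) -> float:
    """Mean of the per-model means."""
    return mean(list(row_means(table).values()))


if __name__ == "__main__":
    for model, value in row_means(ACCURACY).items():
        print(f"{model}: {value:.3f}")
    print(f"All models: {overall_mean(ACCURACY):.3f}")
    print(f"Random: {mean(RANDOM_BASELINE):.3f}")
```

Because the table cells are already rounded to two decimals, recomputed means should be compared against the published ones with a small tolerance (0.01 is used in the checks here) rather than for exact equality.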